
# Atropos BaseEnv Reference

Source: `atroposlib/envs/base.py` (~2124 lines)

## Abstract Methods (MUST implement)

| Method | Signature | Description |
| --- | --- | --- |
| `get_next_item()` | `async def get_next_item(self) -> Item` | Return the next item for a trajectory. Return `None` to pause. |
| `evaluate()` | `async def evaluate(self, *args, **kwargs)` | Called every `steps_per_eval` steps. |
| `setup()` | `async def setup(self)` | Called once at start. Load datasets, init models. |
| `collect_trajectory()` | `async def collect_trajectory(self, item) -> Tuple[Optional[ScoredDataItem], List[Item]]` | Single rollout. Or override `collect_trajectories` instead. |
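The four abstract methods can be sketched on a minimal subclass. This is a hedged illustration, not Atropos code: `BaseEnv` is stubbed here so the example is self-contained (in practice you would subclass `atroposlib.envs.base.BaseEnv`), and the item/score values are made up.

```python
from typing import Any, List, Optional, Tuple


class BaseEnv:  # stand-in for atroposlib.envs.base.BaseEnv
    pass


class MyEnv(BaseEnv):
    def __init__(self) -> None:
        self.items: List[str] = []

    async def setup(self) -> None:
        # Called once at start: load datasets, init models.
        self.items = ["prompt-1", "prompt-2"]

    async def get_next_item(self) -> Optional[str]:
        # Return the next item for a trajectory, or None to pause.
        return self.items.pop() if self.items else None

    async def collect_trajectory(self, item: str) -> Tuple[Optional[dict], List[str]]:
        # Single rollout: return (scored item, backlog of follow-up items).
        scored = {"tokens": [[1, 2, 3]], "masks": [[-100, 2, 3]], "scores": [1.0]}
        return scored, []

    async def evaluate(self, *args: Any, **kwargs: Any) -> None:
        # Called every steps_per_eval steps; run held-out eval here.
        pass
```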

## Overridable Methods

| Method | Default Behavior | Override When |
| --- | --- | --- |
| `collect_trajectories()` | Runs `collect_trajectory` `group_size` times in parallel | Batch generation, MCTS, coupled rollouts |
| `wandb_log()` | Logs completion lengths, rollout table, perf stats | Adding custom metrics (always call `super`) |
| `config_init()` | Returns `(env_config_cls(), ServerBaseline())` | Custom defaults + server configs |
| `postprocess_histories()` | Passthrough | Final processing before sending to trainer |
| `save_checkpoint()` | Saves JSON to `checkpoint_dir` | Custom serialization |
| `cleanup()` | No-op | Releasing resources after each rollout |
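The "always call `super`" note for `wandb_log` can be sketched like this. The base class here is a stand-in whose default logging is reduced to one line, and the `train/percent_correct` metric name is hypothetical; the point is only the pattern of adding metrics before delegating upward.

```python
from typing import Dict, Optional


class BaseEnv:  # stand-in for atroposlib.envs.base.BaseEnv
    async def wandb_log(self, wandb_metrics: Optional[Dict] = None) -> None:
        # Default behavior: log completion lengths, rollout table, perf stats.
        self.logged = dict(wandb_metrics or {})


class MyEnv(BaseEnv):
    def __init__(self) -> None:
        self.percent_correct = 0.75  # hypothetical running metric

    async def wandb_log(self, wandb_metrics: Optional[Dict] = None) -> None:
        wandb_metrics = wandb_metrics or {}
        wandb_metrics["train/percent_correct"] = self.percent_correct
        await super().wandb_log(wandb_metrics)  # always call super
```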

## ScoredDataGroup Structure

```
ScoredDataGroup = TypedDict with:
    tokens:             List[List[int]]       # Token IDs per rollout
    masks:              List[List[int]]       # -100=prompt, token_id=completion
    scores:             List[float]           # Score per rollout
    advantages:         Optional[...]         # Per-token advantages
    ref_logprobs:       Optional[...]         # Reference model logprobs
    messages:           Optional[...]         # OpenAI-format messages
    inference_logprobs: Optional[...]         # Inference logprobs
```
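The mask convention is the part that most often trips people up, so here is a sketch of a `ScoredDataGroup`-shaped dict for a group of two rollouts. The `-100` entries mask prompt tokens out of the loss; completion positions carry the token id itself. All token ids and scores are made up for illustration.

```python
group = {
    "tokens": [
        [101, 102, 7, 8],    # prompt (101, 102) + completion (7, 8)
        [101, 102, 9, 10],
    ],
    "masks": [
        [-100, -100, 7, 8],  # prompt masked out, completion kept
        [-100, -100, 9, 10],
    ],
    "scores": [1.0, -0.5],   # one score per rollout
}

# Sanity check: each mask aligns with its tokens, and every position is
# either masked (-100) or equal to the token id at that position.
for toks, mask in zip(group["tokens"], group["masks"]):
    assert len(toks) == len(mask)
    assert all(m == -100 or m == t for t, m in zip(toks, mask))
```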

## BaseEnvConfig Key Fields

| Field | Default | Description |
| --- | --- | --- |
| `group_size` | 4 | Responses grouped together for scoring |
| `steps_per_eval` | 100 | Steps between evaluations |
| `max_token_length` | 2048 | Max token length for generations |
| `total_steps` | 1000 | Total training steps |
| `use_wandb` | `True` | Enable wandb logging |
| `tokenizer_name` | DeepHermes-3 | Tokenizer used for token encoding |
| `ensure_scores_are_not_same` | `True` | Skip groups whose scores are all identical |
| `worker_timeout` | 600 | Task timeout in seconds |
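The fields above can be mirrored as a plain dataclass to show how an environment might supply custom defaults (as `config_init` allows). This is a stand-in sketch, not the real `BaseEnvConfig` class; the override values are arbitrary examples.

```python
from dataclasses import dataclass


@dataclass
class EnvConfigSketch:
    # Defaults mirror the table above (tokenizer_name omitted for brevity).
    group_size: int = 4
    steps_per_eval: int = 100
    max_token_length: int = 2048
    total_steps: int = 1000
    use_wandb: bool = True
    ensure_scores_are_not_same: bool = True
    worker_timeout: int = 600


# An env with larger groups and a longer run, everything else default.
config = EnvConfigSketch(group_size=16, total_steps=5000)
```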

## Data Flow

```
env_manager() → add_train_workers() → handle_env()
    → collect_trajectories() → postprocess_histories()
    → handle_send_to_api() → training server
```
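The tail of that pipeline can be sketched as one async loop. Function names mirror the diagram, but the bodies are stand-ins, not Atropos internals: `collect_trajectories` fabricates one scored group, `postprocess_histories` is the default passthrough, and `handle_send_to_api` just appends to a list instead of POSTing to the training server.

```python
import asyncio
from typing import Dict, List

sent: List[Dict] = []  # stand-in for the training server's inbox


async def collect_trajectories(item: str) -> List[Dict]:
    # Stand-in: one scored group per item (real code runs group_size rollouts).
    return [{"tokens": [[1]], "masks": [[1]], "scores": [0.0], "item": item}]


async def postprocess_histories(groups: List[Dict]) -> List[Dict]:
    return groups  # default behavior: passthrough


async def handle_send_to_api(groups: List[Dict]) -> None:
    sent.extend(groups)  # stand-in for sending to the training server


async def handle_env(item: str) -> None:
    groups = await collect_trajectories(item)
    groups = await postprocess_histories(groups)
    await handle_send_to_api(groups)


asyncio.run(handle_env("prompt-1"))
```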

## Atropos Environment Statistics (82 environments analyzed)

- 95% implement `setup`, `collect_trajectories`, `evaluate`, `get_next_item`
- 76% override `wandb_log`
- 54% define a custom config class
- Most use `collect_trajectories` (plural), not `collect_trajectory` (singular)
- Common reward patterns: LLM-judge (~40), regex extraction (~35), code execution (~12)