Sovereign backup of all Hermes Agent configuration and data. Excludes: secrets, auth tokens, sessions, caches, code (separate repo). Tracked: - config.yaml (model, fallback chain, toolsets, display prefs) - SOUL.md (Timmy personality charter) - memories/ (persistent MEMORY.md + USER.md) - skills/ (371 files — full skill library) - cron/jobs.json (scheduled tasks) - channel_directory.json (platform channels) - hooks/ (custom hooks)
66 lines
3.0 KiB
Markdown
66 lines
3.0 KiB
Markdown
# Atropos BaseEnv Reference
|
|
|
|
Source: `atroposlib/envs/base.py` (~2124 lines)
|
|
|
|
## Abstract Methods (MUST implement)
|
|
|
|
| Method | Signature | Description |
|
|
|--------|-----------|-------------|
|
|
| `get_next_item()` | `async def get_next_item(self) -> Item` | Return next item for trajectory. Return None to pause. |
|
|
| `evaluate()` | `async def evaluate(self, *args, **kwargs)` | Called every steps_per_eval steps. |
|
|
| `setup()` | `async def setup(self)` | Called once at start. Load datasets, init models. |
|
|
| `collect_trajectory()` | `async def collect_trajectory(self, item) -> Tuple[Optional[ScoredDataItem], List[Item]]` | Single rollout. Or override collect_trajectories instead. |
|
|
|
|
## Overridable Methods
|
|
|
|
| Method | Default Behavior | Override When |
|
|
|--------|-----------------|---------------|
|
|
| `collect_trajectories()` | Runs collect_trajectory group_size times in parallel | Batch generation, MCTS, coupled rollouts |
|
|
| `wandb_log()` | Logs completion lengths, rollout table, perf stats | Add custom metrics (always call super) |
|
|
| `config_init()` | Returns (env_config_cls(), ServerBaseline()) | Custom defaults + server configs |
|
|
| `postprocess_histories()` | Passthrough | Final processing before sending to trainer |
|
|
| `save_checkpoint()` | Saves JSON to checkpoint_dir | Custom serialization |
|
|
| `cleanup()` | No-op | Release resources after each rollout |
|
|
|
|
## ScoredDataGroup Structure
|
|
|
|
```python
|
|
ScoredDataGroup = TypedDict with:
|
|
tokens: List[List[int]] # Token IDs per rollout
|
|
masks: List[List[int]] # -100=prompt, token_id=completion
|
|
scores: List[float] # Score per rollout
|
|
advantages: Optional[...] # Per-token advantages
|
|
ref_logprobs: Optional[...] # Reference model logprobs
|
|
messages: Optional[...] # OpenAI-format messages
|
|
inference_logprobs: Optional[...] # Inference logprobs
|
|
```
|
|
|
|
## BaseEnvConfig Key Fields
|
|
|
|
| Field | Default | Description |
|
|
|-------|---------|-------------|
|
|
| `group_size` | 4 | Responses grouped for scoring |
|
|
| `steps_per_eval` | 100 | Steps between evaluations |
|
|
| `max_token_length` | 2048 | Max token length for generations |
|
|
| `total_steps` | 1000 | Total training steps |
|
|
| `use_wandb` | True | Enable wandb logging |
|
|
| `tokenizer_name` | DeepHermes-3 | Tokenizer for token encoding |
|
|
| `ensure_scores_are_not_same` | True | Skip groups with identical scores |
|
|
| `worker_timeout` | 600 | Task timeout seconds |
|
|
|
|
## Data Flow
|
|
|
|
```
|
|
env_manager() → add_train_workers() → handle_env()
|
|
→ collect_trajectories() → postprocess_histories()
|
|
→ handle_send_to_api() → training server
|
|
```
|
|
|
|
## Atropos Environment Statistics (82 environments analyzed)
|
|
|
|
- 95% implement setup, collect_trajectories, evaluate, get_next_item
|
|
- 76% override wandb_log
|
|
- 54% have custom config class
|
|
- Most use collect_trajectories (plural), not collect_trajectory (singular)
|
|
- Common reward patterns: LLM-judge (~40), regex-extract (~35), code-exec (~12)
|