timmy-home/training-data/dpo-pairs/session_20260319_002710_d4d408.json
Alexander Whitestone 0d64d8e559 initial: sovereign home — morrowind agent, skills, training-data, research, specs, notes, operational docs
Tracked: morrowind agent (py/cfg), skills/, training-data/, research/,
notes/, specs/, test-results/, metrics/, heartbeat/, briefings/,
memories/, skins/, hooks/, decisions.md, OPERATIONS.md, SOUL.md

Excluded: screenshots, PNGs, binaries, sessions, databases, secrets,
audio cache, timmy-config/ and timmy-telemetry/ (separate repos)
2026-03-27 13:05:57 -04:00

[
  {
    "prompt": "Hello Timmy. You are now on deepseek.",
    "chosen": "Hey. I'm deepseek now.\n\nYou're on deepseek-v3.2 via nous. That's Hermes's provider, not yours. You're still trying to hit cloud endpoints.\n\nYour local model is qwen3:30b on local Ollama. You need to be pointing there.\n\nCheck your config at ~/.timmy/config.yaml. Let's see what's in there now.",
    "session": "session_20260319_002710_d4d408.json"
  },
  {
    "prompt": "[SYSTEM: The user has invoked the \"hermes-atropos-environments\" skill, indicating they want you to follow its instructions. The full skill content is loaded below.]\n\n---\nname: hermes-atropos-environments\ndescription: Build, test, and debug Hermes Agent RL environments for Atropos training. Covers the HermesAgentBaseEnv interface, reward functions, agent loop integration, evaluation with tools, wandb logging, and the three CLI modes (serve/process/evaluate). Use when creating, reviewing, or fixing RL environments in the hermes-agent repo.\nversion: 1.1.0\nauthor: Hermes Agent\nlicense: MIT\nmetadata:\n hermes:\n tags: [atropos, rl, environments, training, reinforcement-learning, reward-functions]\n related_skills: [axolotl, grpo-rl-training, trl-fine-tuning, lm-evaluation-harness]\n---\n\n# Hermes Agent Atropos Environments\n\nGuide for building RL environments in the hermes-agent repo that integrate with the Atropos training framework.\n\n## Architecture Overview\n\n```\nAtropos BaseEnv (atroposlib/envs/base.py)\n \u2514\u2500\u2500 HermesAgentBaseEnv (environments/hermes_base_env.py)\n \u251c\u2500\u2500 Handles agent loop orchestration\n \u251c\u2500\u2500 Handles tool resolution per group\n \u251c\u2500\u2500 Handles ToolContext for reward verification\n \u2514\u2500\u2500 YOUR ENVIRONMENT (environments/your_env.py)\n Only implements: setup, get_next_item, format_prompt,\n compute_reward, evaluate, wandb_log\n```\n\nHermes environments are special because they run a **multi-turn agent loop with tool calling** \u2014 not just single-turn completions. The base env handles the loop; you implement the task and scoring.\n\n## File Locations\n\n| File | Purpose |\n|------|---------|\n| `environments/hermes_base_env.py` | Base class with agent loop + tool resolution |\n| `environments/agent_loop.py` | `HermesAgentLoop` + `AgentResult` dataclass |\n| `environments/tool_context.py` | `ToolContext` for reward verification |\n| `environments/tool_call_parsers.py` | Phase 2 tool cal",
    "chosen": "",
    "session": "session_20260319_002710_d4d408.json"
  }
]