[TRAINING] Hermes/Evennia telemetry, replay, and DPO/eval alignment #37

Closed
opened 2026-03-28 16:28:38 +00:00 by Timmy · 4 comments
Owner

Parent direction: #34 and implementation lane #36

Goal:
Make the Evennia world compatible with Timmy’s best training methodologies from the start.

Principle:
Do not let Evennia become a sidecar that steals experience away from Hermes. Timmy must inhabit the world through the harness so the resulting decisions, actions, and corrections become training artifacts.

Training methodology requirements:

  1. Hermes-in-the-loop
  • world actions originate from a Hermes session, not a hidden background bot
  • sessions land in ~/.hermes* /sessions/ as normal transcripts
  2. Structured world telemetry
  • emit Evennia-side event logs (room, object, command, result, timestamp, actor)
  • save under a durable local path like ~/.timmy/training-data/evennia/
  • every world action should have world-state evidence, not only prose
  3. Replayability
  • define a small deterministic or semi-deterministic replay harness for basic room/object flows
  • make it possible to rerun “connect -> look -> move -> inspect -> read” and compare outputs over time
  4. DPO/export friendliness
  • specify how Hermes session turns and Evennia world events line up
  • ensure the world interaction trace can be paired with transcript turns for future DPO curation
  5. Eval lane
  • define a tiny eval set for world competence: orientation, navigation, object inspection, persistence recall, and simple task completion
  • keep this separate from the NLE/MiniHack benchmark lane
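A minimal sketch of the telemetry record in requirement 2, assuming one JSON object per line (JSONL) and one append-only file per Hermes session. The `session_id` field, the `obj` name (standing in for "object"), and the `<day>/<session_id>.jsonl` layout under the base directory are illustrative assumptions, not the canonical layout this issue asks to define.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


@dataclass
class WorldEvent:
    """One Evennia-side world event with the fields requirement 2 lists."""

    session_id: str       # originating Hermes session (assumed field)
    actor: str
    room: str
    obj: Optional[str]    # the "object" field; renamed to avoid shadowing the builtin
    command: str
    result: str
    timestamp: str        # ISO-8601, UTC


def now_utc() -> str:
    """Current time as an ISO-8601 UTC string, e.g. for WorldEvent.timestamp."""
    return datetime.now(timezone.utc).isoformat()


def event_log_path(base: Path, session_id: str, day: str) -> Path:
    """Hypothetical layout: <base>/<YYYY-MM-DD>/<session_id>.jsonl."""
    return base / day / f"{session_id}.jsonl"


def append_event(base: Path, event: WorldEvent) -> Path:
    """Append one event as a single JSON line and return the file it went to."""
    day = event.timestamp[:10]  # the YYYY-MM-DD prefix of an ISO-8601 timestamp
    path = event_log_path(base, event.session_id, day)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(event), sort_keys=True) + "\n")
    return path
```

Pointed at `Path("~/.timmy/training-data/evennia").expanduser()`, this writes one line per world action. Keeping the writer append-only means a trace is never rewritten after the fact, which is what makes it usable as world-state evidence.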

Acceptance:

  • define the canonical telemetry directory structure for Evennia world data
  • define the canonical mapping between Hermes session IDs and Evennia event logs
  • produce one sample world interaction trace that is usable as future training material
  • define one small replay/eval harness for world basics
  • explicitly document how this lane complements rather than replaces NLE/MiniHack
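One way the session-ID mapping could work, sketched under the hypothetical assumption that every Evennia event and every Hermes transcript turn carries both `session_id` and `turn_index`. The join key is then `(session_id, turn_index)`, and events that match no turn flag a broken contract before the data reaches DPO curation.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]
Key = Tuple[str, int]


def pair_turns_with_events(turns: List[Record], events: List[Record]) -> List[Record]:
    """Attach world events to the transcript turn that caused them.

    Assumes (hypothetically) that both record kinds carry `session_id` and
    `turn_index`; events within a turn are ordered by timestamp.
    """
    by_key: Dict[Key, List[Record]] = defaultdict(list)
    for ev in events:
        by_key[(ev["session_id"], ev["turn_index"])].append(ev)

    paired = []
    for turn in turns:
        key = (turn["session_id"], turn["turn_index"])
        matched = sorted(by_key.get(key, []), key=lambda e: e["timestamp"])
        paired.append({**turn, "world_events": matched})
    return paired


def unmatched_events(turns: List[Record], events: List[Record]) -> List[Record]:
    """Events whose key matches no transcript turn: a sign the mapping is broken."""
    keys = {(t["session_id"], t["turn_index"]) for t in turns}
    return [e for e in events if (e["session_id"], e["turn_index"]) not in keys]
```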

Non-goal:

  • do not build a giant bespoke training system inside Evennia; use the world as structured experience and keep the harness as the learning spine
Timmy self-assigned this 2026-03-28 16:28:39 +00:00
Author
Owner

Carry-forward from merged PR #38:

Already landed toward #37:

  • canonical telemetry helper module: evennia_tools/telemetry.py
  • Evennia MCP session binding hook: mcp_evennia_bind_session
  • event logging on connect / command / disconnect in scripts/evennia/evennia_mcp_server.py
  • world verification artifact: reports/production/2026-03-28-evennia-world-proof.md
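To illustrate the binding principle from requirement 1, here is a hypothetical stand-in. It is not the signature or behavior of `mcp_evennia_bind_session`, which this comment does not show. The assumed behavior: connect/command/disconnect events are stamped with the bound Hermes session ID, and any world action attempted before binding is rejected rather than logged anonymously.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, Optional

Event = Dict[str, str]


class BoundWorldSession:
    """Hypothetical illustration of session binding, not the real MCP hook."""

    def __init__(self, sink: Callable[[Event], None]):
        self._sink = sink                  # where events go, e.g. a JSONL writer
        self.session_id: Optional[str] = None

    def bind(self, session_id: str) -> None:
        self.session_id = session_id

    def _emit(self, kind: str, **fields: str) -> None:
        if self.session_id is None:
            # No world action without an originating Hermes session.
            raise RuntimeError("world action attempted before a Hermes session was bound")
        self._sink({
            "session_id": self.session_id,
            "kind": kind,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            **fields,
        })

    def connect(self, account: str) -> None:
        self._emit("connect", account=account)

    def command(self, text: str, result: str) -> None:
        self._emit("command", command=text, result=result)

    def disconnect(self) -> None:
        self._emit("disconnect")
```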

Still open for #37:

  • explicit sample trace artifact in canonical training-data shape
  • replay harness for world basics
  • eval harness for orientation/navigation/object-inspection/persistence tasks
  • final written mapping between Hermes session IDs and Evennia event logs

So #37 is now a narrower follow-up, not a blank page.
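A small sketch of what the still-open eval harness could look like, assuming each case is a short command sequence plus a substring expected in the final response. The category names, `EvalCase`, and the `send` callable (standing in for whatever drives the Evennia connection) are hypothetical. Substring matching is a deliberately crude check; a real harness may prefer structured world-state assertions drawn from the telemetry.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

CATEGORIES = ("orientation", "navigation", "object_inspection",
              "persistence_recall", "task_completion")


@dataclass(frozen=True)
class EvalCase:
    name: str
    category: str
    commands: Sequence[str]   # sent to the world in order
    expect: str               # substring expected in the last command's output


def run_eval(cases: List[EvalCase], send: Callable[[str], str]) -> Dict[str, Dict[str, int]]:
    """Run each case through `send` and tally passes per category."""
    report = {c: {"passed": 0, "total": 0} for c in CATEGORIES}
    for case in cases:
        if case.category not in report:
            raise ValueError(f"unknown category: {case.category}")
        output = ""
        for cmd in case.commands:
            output = send(cmd)        # only the final response is scored
        report[case.category]["total"] += 1
        if case.expect.lower() in output.lower():
            report[case.category]["passed"] += 1
    return report
```

Tallying per category keeps a regression in, say, navigation visible on its own instead of hiding inside one aggregate score.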

Author
Owner

Follow-up docs PR opened: #39

http://143.198.27.163:3000/Timmy_Foundation/timmy-home/pulls/39

This captures the Evennia layout/training specs that originally lived only under ~/.timmy/specs/.

Author
Owner

Updated PR #40 with a deterministic replay/eval fix.

Current status:

  • sample trace generation succeeds
  • world-basics eval passes green
  • dedicated eval account keeps benchmark state separate from Timmy's lived world state

PR: http://143.198.27.163:3000/Timmy_Foundation/timmy-home/pulls/40
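A deterministic replay of “connect -> look -> move -> inspect -> read” mostly comes down to normalizing fields that legitimately change between runs before comparing. The sketch below assumes those are timestamps and Evennia dbrefs (`#123`-style object IDs, which can differ between database instances); the concrete command strings and the `send` callable (for example, a connection logged in as the dedicated eval account) are hypothetical.

```python
import difflib
import re
from typing import Callable, List, Sequence

# The flow from the issue; the concrete command strings are assumptions.
WORLD_BASICS = ("connect", "look", "north", "inspect sign", "read sign")

# (pattern, placeholder) pairs for output that may vary between runs.
_VOLATILE = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?"), "<TS>"),
    (re.compile(r"#\d+"), "#<DBREF>"),
]


def normalize(text: str) -> str:
    """Replace volatile fragments with stable placeholders."""
    for pattern, token in _VOLATILE:
        text = pattern.sub(token, text)
    return text.strip()


def replay(send: Callable[[str], str], script: Sequence[str] = WORLD_BASICS) -> List[str]:
    """Run the script and return one normalized transcript entry per step."""
    return [f"> {cmd}\n{normalize(send(cmd))}" for cmd in script]


def compare(baseline: List[str], current: List[str]) -> List[str]:
    """Unified diff of two replays; an empty list means they matched."""
    return list(difflib.unified_diff(baseline, current, "baseline", "current", lineterm=""))
```

Storing the normalized output of a known-good run as the baseline, a non-empty diff from `compare` then points at exactly which step drifted.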

Author
Owner

Merged in PR #40.

Proof anchors:

  • PR: http://143.198.27.163:3000/Timmy_Foundation/timmy-home/pulls/40
  • Training report: reports/production/2026-03-28-evennia-training-baseline.md
  • Example trace: training-data/evennia/examples/world-basics-trace.example.jsonl
  • Example eval: training-data/evennia/examples/world-basics-eval.example.json

Final state:

  • deterministic replay/eval baseline is green
  • telemetry/session mapping contract is in repo
  • sample trace and eval artifacts are in repo

Closing #37 as delivered for this first training slice.
Timmy closed this issue 2026-03-28 19:35:45 +00:00
1 Participant
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: Timmy_Foundation/timmy-home#37