Timmy_Foundation/the-nexus

Alexander Whitestone c5c752f9be

fix: [TRAINING] Capture the first replayable local Bannerlord session trace for Timmy

Closes #1197

Automated squash merge by mimo swarm.

2026-04-11 00:20:37 +00:00

Bannerlord Session Trace — Replay & Eval Guide

Storage Layout

All traces live under ~/.timmy/traces/bannerlord/:

~/.timmy/traces/bannerlord/
  trace_<trace_id>.jsonl    # One line per ODA cycle (full state + actions)
  manifest_<trace_id>.json  # Session metadata, counts, replay command

Trace Format (JSONL)

Each line is one ODA cycle:

{
  "cycle_index": 0,
  "timestamp_start": "2026-04-10T20:15:00+00:00",
  "timestamp_end": "2026-04-10T20:15:45+00:00",
  "duration_ms": 45000,

  "screenshot_path": "/tmp/bannerlord_capture_1744320900.png",
  "window_found": true,
  "screen_size": [1920, 1080],
  "mouse_position": [960, 540],
  "playtime_hours": 142.5,
  "players_online": 8421,
  "is_running": true,

  "actions_planned": [{"type": "move_to", "x": 960, "y": 540}],
  "actions_executed": [{"success": true, "action": "move_to", ...}],
  "actions_succeeded": 1,
  "actions_failed": 0,

  "hermes_session_id": "f47ac10b",
  "hermes_log_id": "",
  "harness_session_id": "f47ac10b"
}
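The example above is pretty-printed for readability; in the file itself each cycle occupies a single line. A rough sketch of reading such lines and totalling the per-cycle counters follows. `parse_trace_lines` and `summarize` are hypothetical helpers for illustration, not functions from the nexus package.

```python
import json
from typing import Iterable, List


def parse_trace_lines(lines: Iterable[str]) -> List[dict]:
    """Parse JSONL text: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def summarize(cycles: List[dict]) -> dict:
    """Aggregate the counters that every cycle record carries."""
    return {
        "cycles": len(cycles),
        "total_duration_ms": sum(c["duration_ms"] for c in cycles),
        "actions_succeeded": sum(c["actions_succeeded"] for c in cycles),
        "actions_failed": sum(c["actions_failed"] for c in cycles),
    }
```

To summarize a trace on disk, pass the open file object directly, since iterating over a text file yields its lines: `summarize(parse_trace_lines(open(path)))`.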

Capturing a Trace

# Run harness with trace logging enabled
cd /path/to/the-nexus
python -m nexus.bannerlord_harness --mock --trace --iterations 3

The trace and manifest are written to ~/.timmy/traces/bannerlord/ on harness shutdown.

Replay Protocol

1. Load a trace: BannerlordTraceLogger.load_trace(trace_file)
2. Create a fresh harness in mock mode
3. For each cycle in the trace:
   - Re-execute the actions_planned list
   - Compare actual actions_executed outcomes against the recorded ones
4. Score: (matching_actions / total_actions) * 100
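The scoring step can be sketched as below. The guide does not define what counts as a "matching" action, so the rule used here (same action name and same success flag, compared position by position) is an assumption, and both function names are hypothetical.

```python
from typing import List


def action_matches(recorded: dict, replayed: dict) -> bool:
    """Assumed matching rule: same action name and same success flag."""
    return (
        recorded.get("action") == replayed.get("action")
        and recorded.get("success") == replayed.get("success")
    )


def replay_score(recorded: List[dict], replayed: List[dict]) -> float:
    """Return (matching_actions / total_actions) * 100.

    total_actions is taken from the recorded trace; recorded actions with no
    replayed counterpart count as mismatches because zip() stops early.
    """
    total = len(recorded)
    if total == 0:
        return 100.0  # assumption: nothing to replay counts as a perfect match
    matching = sum(action_matches(a, b) for a, b in zip(recorded, replayed))
    return matching / total * 100
```

Positional comparison keeps the sketch simple; if replayed actions can be reordered or retried, a keyed comparison would be needed instead.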

Eval Criteria

Score	Grade	Meaning
>= 90%	PASS	Replay matches original closely
70% to < 90%	PARTIAL	Some divergence; investigate the differences
< 70%	FAIL	Significant drift, review action semantics
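The thresholds translate directly into a small grading function. This is a sketch with a hypothetical name; placing fractional scores just below 90 (for example 89.5) in the PARTIAL band is an assumption.

```python
def grade(score: float) -> str:
    """Map a replay score in percent to the grade defined in the table."""
    if score >= 90:
        return "PASS"
    if score >= 70:
        return "PARTIAL"
    return "FAIL"
```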

Replay Script (sketch)

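import asyncio
from pathlib import Path

from nexus.bannerlord_trace import BannerlordTraceLogger
from nexus.bannerlord_harness import BannerlordHarness


async def replay(trace_file: Path) -> None:
    # Load the recorded cycles (one dict per ODA cycle)
    cycles = BannerlordTraceLogger.load_trace(trace_file)

    # Replay in a fresh mock harness with trace logging disabled
    harness = BannerlordHarness(enable_mock=True, enable_trace=False)
    await harness.start()
    try:
        for cycle in cycles:
            for action in cycle["actions_planned"]:
                result = await harness.execute_action(action)
                # Compare result against cycle["actions_executed"]
    finally:
        # Always shut the harness down, even if a replayed action raises
        await harness.stop()


# Substitute a real trace ID for the trace_bl_xxx placeholder
asyncio.run(replay(
    Path.home() / ".timmy" / "traces" / "bannerlord" / "trace_bl_xxx.jsonl"
))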

Hermes Session Mapping

The hermes_session_id and hermes_log_id fields link traces to Hermes session logs. When a trace is captured during a live Hermes session, populate these fields so the trace can be correlated with the broader agent conversation context.
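As a rough illustration of that correlation, the sketch below indexes the cycles of a loaded trace by hermes_session_id. The helper name is hypothetical, and treating an empty string as "not linked to a Hermes session" is an assumption based on the empty hermes_log_id in the format example.

```python
from collections import defaultdict
from typing import Dict, List


def index_by_hermes_session(cycles: List[dict]) -> Dict[str, List[int]]:
    """Map each non-empty hermes_session_id to the cycle_index values that carry it."""
    index: Dict[str, List[int]] = defaultdict(list)
    for cycle in cycles:
        session_id = cycle.get("hermes_session_id", "")
        if session_id:  # assumption: empty or missing means no Hermes link
            index[session_id].append(cycle["cycle_index"])
    return dict(index)
```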