Bannerlord Session Trace — Replay & Eval Guide
Storage Layout
All traces live under ~/.timmy/traces/bannerlord/:
```text
~/.timmy/traces/bannerlord/
  trace_<trace_id>.jsonl      # One line per ODA cycle (full state + actions)
  manifest_<trace_id>.json    # Session metadata, counts, replay command
```
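Because each session produces a `trace_<trace_id>.jsonl` and a `manifest_<trace_id>.json` side by side, a small helper can pair them up. The following is a minimal sketch, assuming only the naming scheme shown in the layout. The function name `find_sessions` is hypothetical and not part of the harness. A manifest that is missing (for example, if the harness did not shut down cleanly) is reported as `None`.

```python
from __future__ import annotations

import re
from pathlib import Path

TRACE_DIR = Path.home() / ".timmy" / "traces" / "bannerlord"

# Matches the trace_<trace_id>.jsonl naming from the storage layout
_TRACE_RE = re.compile(r"^trace_(?P<trace_id>.+)\.jsonl$")


def find_sessions(trace_dir: Path = TRACE_DIR) -> dict[str, dict[str, Path | None]]:
    """Hypothetical helper: map each trace_id to its trace file and manifest (or None)."""
    sessions: dict[str, dict[str, Path | None]] = {}
    if not trace_dir.is_dir():
        return sessions
    for path in sorted(trace_dir.iterdir()):
        match = _TRACE_RE.match(path.name)
        if match is None:
            continue  # skip manifests and unrelated files
        trace_id = match.group("trace_id")
        manifest = trace_dir / f"manifest_{trace_id}.json"
        sessions[trace_id] = {
            "trace": path,
            "manifest": manifest if manifest.exists() else None,
        }
    return sessions
```

For example, `find_sessions()` with no arguments lists every captured session, and an entry whose `"manifest"` is `None` points to a trace whose session metadata was never written.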
Trace Format (JSONL)
Each line is one ODA cycle:
```json
{
  "cycle_index": 0,
  "timestamp_start": "2026-04-10T20:15:00+00:00",
  "timestamp_end": "2026-04-10T20:15:45+00:00",
  "duration_ms": 45000,
  "screenshot_path": "/tmp/bannerlord_capture_1744320900.png",
  "window_found": true,
  "screen_size": [1920, 1080],
  "mouse_position": [960, 540],
  "playtime_hours": 142.5,
  "players_online": 8421,
  "is_running": true,
  "actions_planned": [{"type": "move_to", "x": 960, "y": 540}],
  "actions_executed": [{"success": true, "action": "move_to", ...}],
  "actions_succeeded": 1,
  "actions_failed": 0,
  "hermes_session_id": "f47ac10b",
  "hermes_log_id": "",
  "harness_session_id": "f47ac10b"
}
```
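Several fields in a cycle are redundant with each other, which makes a cheap sanity check possible before a trace is used for replay. In the example, the 45 seconds between `timestamp_start` and `timestamp_end` match `duration_ms: 45000`, and the single successful entry in `actions_executed` matches `actions_succeeded: 1` / `actions_failed: 0`. The sketch below assumes that these relationships are meant to hold. The guide does not state this, so treat it as an assumption. It allows a 1000 ms tolerance because the timestamps in the example carry only whole seconds. `check_cycle` is a hypothetical name.

```python
from __future__ import annotations

import json
from datetime import datetime


def check_cycle(line: str, tolerance_ms: int = 1000) -> list[str]:
    """Hypothetical check: return consistency problems in one JSONL line (empty = OK).

    Assumes duration_ms ~= timestamp_end - timestamp_start (within tolerance_ms)
    and that the success/failure counters agree with actions_executed.
    """
    cycle = json.loads(line)
    problems: list[str] = []

    # ISO-8601 timestamps with a UTC offset, e.g. "2026-04-10T20:15:00+00:00"
    start = datetime.fromisoformat(cycle["timestamp_start"])
    end = datetime.fromisoformat(cycle["timestamp_end"])
    span_ms = round((end - start).total_seconds() * 1000)
    if abs(span_ms - cycle["duration_ms"]) > tolerance_ms:
        problems.append(f"duration_ms={cycle['duration_ms']} but timestamps span {span_ms} ms")

    # Count successes from the per-action "success" flags
    executed = cycle["actions_executed"]
    succeeded = sum(1 for action in executed if action.get("success"))
    if succeeded != cycle["actions_succeeded"]:
        problems.append(f"actions_succeeded={cycle['actions_succeeded']} but {succeeded} entries succeeded")
    failed = len(executed) - succeeded
    if failed != cycle["actions_failed"]:
        problems.append(f"actions_failed={cycle['actions_failed']} but {failed} entries failed")
    return problems
```

To check a whole trace, call `check_cycle` on each non-empty line of the `.jsonl` file and report the line number alongside any problems.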
Capturing a Trace
```bash
# Run harness with trace logging enabled
cd /path/to/the-nexus
python -m nexus.bannerlord_harness --mock --trace --iterations 3
```
The trace and manifest are written to ~/.timmy/traces/bannerlord/ on harness shutdown.
Replay Protocol
- Load a trace: `BannerlordTraceLogger.load_trace(trace_file)`
- Create a fresh harness in mock mode
- For each cycle in the trace:
  - Re-execute the `actions_planned` list
  - Compare the actual outcomes against the recorded `actions_executed` entries
- Score: `(matching_actions / total_actions) * 100`
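The scoring step can be sketched as a pure function over the recorded cycles and the outcomes collected during replay. The guide does not define what counts as a "matching" action. This sketch assumes a hypothetical rule: a replayed action matches when its `success` flag equals the recorded one at the same position in the same cycle. Replayed actions that are missing count as mismatches. `replay_score` is a hypothetical name. Returning 100.0 for an empty trace is a design choice made here, and it also avoids dividing by zero.

```python
from __future__ import annotations


def replay_score(recorded_cycles: list[dict], replayed_cycles: list[list[dict]]) -> float:
    """Hypothetical scorer: (matching_actions / total_actions) * 100.

    An action "matches" when its replayed "success" flag equals the recorded
    one at the same index in the same cycle (an assumed rule).
    """
    matching_actions = 0
    total_actions = 0
    for i, cycle in enumerate(recorded_cycles):
        # Outcomes gathered while replaying cycle i; missing cycles count as empty
        replayed = replayed_cycles[i] if i < len(replayed_cycles) else []
        for j, recorded in enumerate(cycle["actions_executed"]):
            total_actions += 1
            if j < len(replayed) and replayed[j].get("success") == recorded.get("success"):
                matching_actions += 1
    if total_actions == 0:
        # Nothing was recorded, so there is nothing to diverge from
        return 100.0
    return (matching_actions / total_actions) * 100
```

A stricter rule, such as also comparing the `action` name or other result fields, would only change the inner comparison.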
Eval Criteria
| Score | Grade | Meaning |
|---|---|---|
| >= 90% | PASS | Replay matches the original closely |
| >= 70% and < 90% | PARTIAL | Some divergence; investigate the differing actions |
| < 70% | FAIL | Significant drift; review action semantics |
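The grade boundaries translate directly into a threshold function. This is a minimal sketch with a hypothetical name, `grade`. It assumes the score is a percentage from 0 to 100 and treats fractional scores such as 89.5% as PARTIAL, because only 90% and above passes.

```python
def grade(score: float) -> str:
    """Hypothetical helper: map a replay score (percent, 0-100) to PASS/PARTIAL/FAIL."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be a percentage in [0, 100], got {score}")
    if score >= 90:
        return "PASS"
    if score >= 70:
        return "PARTIAL"
    return "FAIL"
```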
Replay Script (sketch)
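```python
import asyncio
from pathlib import Path

from nexus.bannerlord_harness import BannerlordHarness
from nexus.bannerlord_trace import BannerlordTraceLogger

TRACE_DIR = Path.home() / ".timmy" / "traces" / "bannerlord"


async def replay(trace_file: Path) -> None:
    # Load the recorded cycles (one dict per JSONL line)
    cycles = BannerlordTraceLogger.load_trace(trace_file)

    # Fresh harness in mock mode, with trace logging off for the replay run
    harness = BannerlordHarness(enable_mock=True, enable_trace=False)
    await harness.start()
    try:
        for cycle in cycles:
            for action in cycle["actions_planned"]:
                result = await harness.execute_action(action)
                # Compare result against cycle["actions_executed"]
    finally:
        await harness.stop()  # shut down even if an action raises


asyncio.run(replay(TRACE_DIR / "trace_bl_xxx.jsonl"))
```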
Hermes Session Mapping
The hermes_session_id and hermes_log_id fields link traces to Hermes session logs.
When a trace is captured during a live Hermes session, populate these fields so
the trace can be correlated with the broader agent conversation context.
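Once these fields are populated, a trace can be searched from the Hermes side. The following is a minimal sketch, assuming the storage layout and per-line fields shown earlier in this guide. It scans every `trace_*.jsonl` file and collects the `cycle_index` values recorded under a given `hermes_session_id`. The function name `cycles_for_hermes_session` is hypothetical. Because an empty string (like `hermes_log_id` in the example) means the field was not populated, the sketch rejects an empty session id rather than matching every unlinked cycle.

```python
from __future__ import annotations

import json
from pathlib import Path


def cycles_for_hermes_session(trace_dir: Path, session_id: str) -> dict[str, list[int]]:
    """Hypothetical helper: map trace file name -> cycle_index values for one Hermes session."""
    if not session_id:
        # "" marks an unpopulated field, so it must not be used as a search key
        raise ValueError("session_id must be non-empty")
    hits: dict[str, list[int]] = {}
    for trace_file in sorted(trace_dir.glob("trace_*.jsonl")):
        with trace_file.open(encoding="utf-8") as fh:
            for line in fh:
                if not line.strip():
                    continue  # tolerate blank lines
                cycle = json.loads(line)
                if cycle.get("hermes_session_id") == session_id:
                    hits.setdefault(trace_file.name, []).append(cycle["cycle_index"])
    return hits
```

For example, `cycles_for_hermes_session(Path.home() / ".timmy" / "traces" / "bannerlord", "f47ac10b")` returns the cycles from every trace that was captured during that Hermes session.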