Files
the-nexus/nexus/traces/bannerlord/REPLAY.md
Alexander Whitestone c5c752f9be
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
Staging Verification Gate / verify-staging (push) Has been cancelled
fix: [TRAINING] Capture the first replayable local Bannerlord session trace for Timmy
Closes #1197

Automated squash merge by mimo swarm.
2026-04-11 00:20:37 +00:00

98 lines
2.8 KiB
Markdown

# Bannerlord Session Trace — Replay & Eval Guide
## Storage Layout
All traces live under `~/.timmy/traces/bannerlord/`:
```
~/.timmy/traces/bannerlord/
trace_<trace_id>.jsonl # One line per ODA cycle (full state + actions)
manifest_<trace_id>.json # Session metadata, counts, replay command
```
## Trace Format (JSONL)
Each line is one ODA cycle:
```json
{
"cycle_index": 0,
"timestamp_start": "2026-04-10T20:15:00+00:00",
"timestamp_end": "2026-04-10T20:15:45+00:00",
"duration_ms": 45000,
"screenshot_path": "/tmp/bannerlord_capture_1744320900.png",
"window_found": true,
"screen_size": [1920, 1080],
"mouse_position": [960, 540],
"playtime_hours": 142.5,
"players_online": 8421,
"is_running": true,
"actions_planned": [{"type": "move_to", "x": 960, "y": 540}],
"actions_executed": [{"success": true, "action": "move_to", ...}],
"actions_succeeded": 1,
"actions_failed": 0,
"hermes_session_id": "f47ac10b",
"hermes_log_id": "",
"harness_session_id": "f47ac10b"
}
```
## Capturing a Trace
```bash
# Run harness with trace logging enabled
cd /path/to/the-nexus
python -m nexus.bannerlord_harness --mock --trace --iterations 3
```
The trace and manifest are written to `~/.timmy/traces/bannerlord/` on harness shutdown.
## Replay Protocol
1. Load a trace: `BannerlordTraceLogger.load_trace(trace_file)`
2. Create a fresh harness in mock mode
3. For each cycle in the trace:
- Re-execute the `actions_planned` list
- Compare actual `actions_executed` outcomes against the recorded ones
4. Score: `(matching_actions / total_actions) * 100`
### Eval Criteria
| Score | Grade | Meaning |
|---------|----------|--------------------------------------------|
| >= 90% | PASS | Replay matches original closely |
| 70-89% | PARTIAL | Some divergence, investigate differences |
| < 70% | FAIL | Significant drift, review action semantics |
## Replay Script (sketch)
```python
from nexus.bannerlord_trace import BannerlordTraceLogger
from nexus.bannerlord_harness import BannerlordHarness
# Load trace
cycles = BannerlordTraceLogger.load_trace(
Path.home() / ".timmy" / "traces" / "bannerlord" / "trace_bl_xxx.jsonl"
)
# Replay
harness = BannerlordHarness(enable_mock=True, enable_trace=False)
await harness.start()
for cycle in cycles:
for action in cycle["actions_planned"]:
result = await harness.execute_action(action)
# Compare result against cycle["actions_executed"]
await harness.stop()
```
## Hermes Session Mapping
The `hermes_session_id` and `hermes_log_id` fields link traces to Hermes session logs.
When a trace is captured during a live Hermes session, populate these fields so
the trace can be correlated with the broader agent conversation context.