the-nexus/nexus/traces/bannerlord/REPLAY.md

# Bannerlord Session Trace — Replay & Eval Guide

## Storage Layout

All traces live under `~/.timmy/traces/bannerlord/`:

```
~/.timmy/traces/bannerlord/
  trace_<trace_id>.jsonl    # One line per ODA cycle (full state + actions)
  manifest_<trace_id>.json  # Session metadata, counts, replay command
```
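The naming scheme above makes it straightforward to pair each trace with its manifest. The following is a minimal sketch using only the standard library; `list_traces` is a hypothetical helper, not part of the harness, and it assumes the `trace_<trace_id>.jsonl` / `manifest_<trace_id>.json` naming shown above.

```python
from pathlib import Path
from typing import Dict, Optional

# Default location taken from the layout above.
TRACE_DIR = Path.home() / ".timmy" / "traces" / "bannerlord"


def list_traces(trace_dir: Path = TRACE_DIR) -> Dict[str, Dict[str, Optional[Path]]]:
    """Map each trace_id to its trace path and manifest path (None if missing)."""
    found: Dict[str, Dict[str, Optional[Path]]] = {}
    for trace_path in sorted(trace_dir.glob("trace_*.jsonl")):
        # "trace_abc.jsonl" -> stem "trace_abc" -> trace_id "abc"
        trace_id = trace_path.stem[len("trace_"):]
        manifest_path = trace_dir / f"manifest_{trace_id}.json"
        found[trace_id] = {
            "trace": trace_path,
            "manifest": manifest_path if manifest_path.exists() else None,
        }
    return found
```

A trace without a manifest is still listed, with `manifest` set to `None`, so an incomplete session remains visible rather than silently skipped.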

## Trace Format (JSONL)

Each line is one ODA cycle:

```json
{
  "cycle_index": 0,
  "timestamp_start": "2026-04-10T20:15:00+00:00",
  "timestamp_end": "2026-04-10T20:15:45+00:00",
  "duration_ms": 45000,

  "screenshot_path": "/tmp/bannerlord_capture_1744320900.png",
  "window_found": true,
  "screen_size": [1920, 1080],
  "mouse_position": [960, 540],
  "playtime_hours": 142.5,
  "players_online": 8421,
  "is_running": true,

  "actions_planned": [{"type": "move_to", "x": 960, "y": 540}],
  "actions_executed": [{"success": true, "action": "move_to", ...}],
  "actions_succeeded": 1,
  "actions_failed": 0,

  "hermes_session_id": "f47ac10b",
  "hermes_log_id": "",
  "harness_session_id": "f47ac10b"
}
```
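Because each line is an independent JSON object, a trace can be inspected with the standard library alone. The sketch below is illustrative and does not use `BannerlordTraceLogger`; `iter_cycles` and `summarize` are hypothetical helpers that rely only on the field names shown in the example above.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, Iterator


def iter_cycles(trace_path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one dict per non-empty line of a JSONL trace file."""
    with trace_path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)


def summarize(cycles: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Total the per-cycle action counters across a whole trace."""
    totals = {"cycles": 0, "actions_succeeded": 0, "actions_failed": 0}
    for cycle in cycles:
        totals["cycles"] += 1
        # Missing counters are treated as 0 (an assumption of this sketch).
        totals["actions_succeeded"] += cycle.get("actions_succeeded", 0)
        totals["actions_failed"] += cycle.get("actions_failed", 0)
    return totals
```

Reading lazily with a generator keeps memory use flat for long sessions, since only one cycle is held at a time.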

## Capturing a Trace

```bash
# Run harness with trace logging enabled
cd /path/to/the-nexus
python -m nexus.bannerlord_harness --mock --trace --iterations 3
```

The trace and manifest are written to `~/.timmy/traces/bannerlord/` on harness shutdown.

## Replay Protocol

1. Load a trace: `BannerlordTraceLogger.load_trace(trace_file)`
2. Create a fresh harness in mock mode
3. For each cycle in the trace:
   - Re-execute the `actions_planned` list
   - Compare actual `actions_executed` outcomes against the recorded ones
4. Score: `(matching_actions / total_actions) * 100`
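The comparison and scoring steps above can be sketched as follows. This is one possible reading, not the harness's own implementation: it assumes two actions "match" when their `action` and `success` fields agree (the two fields visible in the example trace), and that missing or extra replayed actions count as mismatches. `count_matches` and `replay_score` are hypothetical names.

```python
from typing import Any, Dict, Iterable, Sequence, Tuple

Action = Dict[str, Any]


def count_matches(recorded: Sequence[Action], replayed: Sequence[Action]) -> Tuple[int, int]:
    """Return (matching, total) for one cycle, comparing actions position by position."""
    # A length difference counts against the score: total is the longer list.
    total = max(len(recorded), len(replayed))
    matching = sum(
        1
        for rec, rep in zip(recorded, replayed)
        if rec.get("action") == rep.get("action")
        and rec.get("success") == rep.get("success")
    )
    return matching, total


def replay_score(pairs: Iterable[Tuple[Sequence[Action], Sequence[Action]]]) -> float:
    """Compute (matching_actions / total_actions) * 100 over all cycles."""
    matching = total = 0
    for recorded, replayed in pairs:
        m, t = count_matches(recorded, replayed)
        matching += m
        total += t
    # Assumption: a trace with no actions scores 100 (nothing diverged).
    return 100.0 if total == 0 else matching / total * 100
```

Each element of `pairs` is the recorded `actions_executed` list for a cycle alongside the results collected while replaying that cycle's `actions_planned`.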

### Eval Criteria

| Score        | Grade    | Meaning                                    |
|--------------|----------|--------------------------------------------|
| >= 90%       | PASS     | Replay matches original closely            |
| 70% to < 90% | PARTIAL  | Some divergence, investigate differences   |
| < 70%        | FAIL     | Significant drift, review action semantics |
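The thresholds in the table translate directly into a small function. In this sketch, `grade` is a hypothetical helper; it treats every score from 70 up to (but not including) 90 as PARTIAL, which is an assumption that covers fractional scores such as 89.5.

```python
def grade(score: float) -> str:
    """Map a replay score (0-100) to the grade from the Eval Criteria table."""
    if score >= 90:
        return "PASS"
    if score >= 70:
        return "PARTIAL"
    return "FAIL"
```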

## Replay Script (sketch)

```python
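import asyncio
from pathlib import Path

from nexus.bannerlord_trace import BannerlordTraceLogger
from nexus.bannerlord_harness import BannerlordHarness


async def replay(trace_file: Path) -> None:
    # Load the recorded cycles (one dict per ODA cycle)
    cycles = BannerlordTraceLogger.load_trace(trace_file)

    # Replay against a fresh harness in mock mode, with trace logging disabled
    harness = BannerlordHarness(enable_mock=True, enable_trace=False)
    await harness.start()
    try:
        for cycle in cycles:
            for action in cycle["actions_planned"]:
                result = await harness.execute_action(action)
                # Compare result against cycle["actions_executed"]
    finally:
        # Always shut the harness down, even if an action raises
        await harness.stop()


asyncio.run(
    replay(Path.home() / ".timmy" / "traces" / "bannerlord" / "trace_bl_xxx.jsonl")
)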
```

## Hermes Session Mapping

The `hermes_session_id` and `hermes_log_id` fields link traces to Hermes session logs.
When a trace is captured during a live Hermes session, populate these fields so
the trace can be correlated with the broader agent conversation context.
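As an illustration of that correlation, loaded cycles can be grouped by their `hermes_session_id`. This is a minimal sketch: `group_by_hermes_session` is a hypothetical helper, and grouping empty or missing IDs under `""` is an assumption of this example, not documented harness behavior.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def group_by_hermes_session(
    cycles: Iterable[Dict[str, Any]],
) -> Dict[str, List[Dict[str, Any]]]:
    """Group trace cycles by hermes_session_id, preserving cycle order."""
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for cycle in cycles:
        # Empty or missing IDs fall into the "" bucket (assumption of this sketch).
        groups[cycle.get("hermes_session_id") or ""].append(cycle)
    return dict(groups)
```

The resulting keys can then be looked up in the Hermes session logs to find the conversation that produced each group of cycles.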