Compare commits
1 Commits
mimo/resea
...
mimo/code/
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
1b2ac5cd1f |
@@ -1,203 +0,0 @@
|
||||
# FINDINGS: MemPalace Local AI Memory System Assessment & Leverage Plan
|
||||
|
||||
**Issue:** #1047
|
||||
**Date:** 2026-04-10
|
||||
**Investigator:** mimo-v2-pro (swarm researcher)
|
||||
|
||||
---
|
||||
|
||||
## 1. What Issue #1047 Claims
|
||||
|
||||
The issue (authored by Bezalel, dated 2026-04-07) describes MemPalace as:
|
||||
- An open-source local-first AI memory system with highest published LongMemEval scores (96.6% R@5)
|
||||
- A Python CLI + MCP server using ChromaDB + SQLite with a "palace" hierarchy metaphor
|
||||
- AAAK compression dialect for ~30x context compression
|
||||
- 19 MCP tools for agent memory
|
||||
|
||||
It recommends that every wizard clone/vendor MemPalace, configure rooms, mine workspace, and wire the searcher into heartbeats.
|
||||
|
||||
## 2. What Actually Exists in the Codebase (Current State)
|
||||
|
||||
The Nexus repo already contains **substantial MemPalace integration** that goes well beyond the original research proposal. Here is the full inventory:
|
||||
|
||||
### 2.1 Core Python Layer — `nexus/mempalace/` (3 files, ~290 lines)
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `config.py` | Environment-driven config: palace paths, fleet path, wing name, core rooms, collection name |
|
||||
| `searcher.py` | ChromaDB-backed search/write API with `search_memories()`, `search_fleet()`, `add_memory()` |
|
||||
| `__init__.py` | Package marker |
|
||||
|
||||
**Status:** Functional. Clean API. Lazy ChromaDB import with graceful `MemPalaceUnavailable` exception.
|
||||
|
||||
### 2.2 Fleet Management Tools — `mempalace/` (8 files, ~800 lines)
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `rooms.yaml` | Fleet-wide room taxonomy standard (5 core rooms + optional rooms) |
|
||||
| `validate_rooms.py` | Validates wizard `mempalace.yaml` against fleet standard |
|
||||
| `audit_privacy.py` | Scans fleet palace for policy violations (raw drawers, oversized closets, private paths) |
|
||||
| `retain_closets.py` | 90-day retention enforcement for closet aging |
|
||||
| `export_closets.sh` | Privacy-safe closet export for rsync to Alpha fleet palace |
|
||||
| `fleet_api.py` | HTTP API for shared fleet palace (search, record, wings) |
|
||||
| `tunnel_sync.py` | Pull closets from remote wizard's fleet API into local palace |
|
||||
| `__init__.py` | Package marker |
|
||||
|
||||
**Status:** Well-structured. Each tool has clear CLI interface and proper error handling.
|
||||
|
||||
### 2.3 Evennia MUD Integration — `nexus/evennia_mempalace/` (6 files, ~580 lines)
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `commands/recall.py` | `CmdRecall` (semantic search), `CmdEnterRoom` (teleport), `CmdAsk` (NPC query) |
|
||||
| `commands/write.py` | `CmdRecord`, `CmdNote`, `CmdEvent` (memory writing commands) |
|
||||
| `typeclasses/rooms.py` | `MemPalaceRoom` typeclass |
|
||||
| `typeclasses/npcs.py` | `StewardNPC` with question-answering via palace search |
|
||||
|
||||
**Status:** Complete. Evennia stub fallback for testing outside live environment.
|
||||
|
||||
### 2.4 3D Visualization — `nexus/components/spatial-memory.js` (~665 lines)
|
||||
|
||||
Maps memory categories to spatial regions in the Nexus Three.js world:
|
||||
- Inner ring: Documents, Projects, Code, Conversations, Working Memory, Archive
|
||||
- Outer ring (MemPalace zones, issue #1168): User Preferences, Project Facts, Tool Knowledge, General Facts
|
||||
- Crystal geometry with deterministic positioning, connection lines, localStorage persistence
|
||||
|
||||
**Status:** Functional 3D visualization with region markers, memory crystals, and animation.
|
||||
|
||||
### 2.5 Frontend Integration — `mempalace.js` (~44 lines)
|
||||
|
||||
Basic Electron/browser integration class that:
|
||||
- Initializes a palace wing
|
||||
- Auto-mines chat content every 30 seconds
|
||||
- Exposes `search()` method
|
||||
- Updates stats display
|
||||
|
||||
**Status:** Minimal but functional as a bridge between browser UI and CLI mempalace.
|
||||
|
||||
### 2.6 Scripts & Automation — `scripts/` (5 files)
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `mempalace-incremental-mine.sh` | Re-mines only changed files since last run |
|
||||
| `mempalace_nightly.sh` | Nightly maintenance |
|
||||
| `mempalace_export.py` | Export utility |
|
||||
| `validate_mempalace_taxonomy.py` | Taxonomy validation script |
|
||||
| `audit_mempalace_privacy.py` | Privacy audit script |
|
||||
| `sync_fleet_to_alpha.sh` | Fleet sync to Alpha server |
|
||||
|
||||
### 2.7 Tests — `tests/` (7 test files)
|
||||
|
||||
| File | Tests |
|
||||
|------|-------|
|
||||
| `test_mempalace_searcher.py` | Searcher API, config |
|
||||
| `test_mempalace_validate_rooms.py` | Room taxonomy validation |
|
||||
| `test_mempalace_retain_closets.py` | Closet retention |
|
||||
| `test_mempalace_audit_privacy.py` | Privacy auditor |
|
||||
| `test_mempalace_fleet_api.py` | Fleet HTTP API |
|
||||
| `test_mempalace_tunnel_sync.py` | Remote wizard sync |
|
||||
| `test_evennia_mempalace_commands.py` | Evennia commands + NPC helpers |
|
||||
|
||||
### 2.8 CI/CD
|
||||
|
||||
- **ci.yml**: Validates palace taxonomy on every PR, plus Python/JSON/YAML syntax checks
|
||||
- **weekly-audit.yml**: Monday 05:00 UTC — runs privacy audit + dry-run retention against test fixtures
|
||||
|
||||
### 2.9 Documentation
|
||||
|
||||
- `docs/mempalace_taxonomy.yaml` — Full taxonomy standard (145 lines)
|
||||
- `docs/mempalace/rooms.yaml` — Rooms documentation
|
||||
- `docs/mempalace/bezalel_example.yaml` — Example wizard config
|
||||
- `docs/bezalel/evennia/` — Evennia integration examples (steward NPC, palace commands)
|
||||
- `reports/bezalel/2026-04-07-mempalace-field-report.md` — Original field report
|
||||
|
||||
## 3. Gap Analysis: Issue #1047 vs. Reality
|
||||
|
||||
| Issue #1047 Proposes | Current State | Gap |
|
||||
|---------------------|---------------|-----|
|
||||
| "Each wizard should clone/vendor it" | Vendor infrastructure exists (`scripts/mempalace-incremental-mine.sh`) | **DONE** |
|
||||
| "Write a mempalace.yaml" | Fleet taxonomy standard + validator exist | **DONE** |
|
||||
| "Run mempalace mine" | Incremental mining script exists | **DONE** |
|
||||
| "Wire searcher into heartbeat scripts" | `nexus/mempalace/searcher.py` provides API | **DONE** (needs adoption verification) |
|
||||
| AAAK compression | Not implemented in repo | **OPEN** — no AAAK dialect code |
|
||||
| MCP server (19 tools) | No MCP server integration | **OPEN** — no MCP tool definitions |
|
||||
| Benchmark validation | No LongMemEval test harness in repo | **OPEN** — claims unverified locally |
|
||||
| Fleet-wide adoption | Only Bezalel field report exists | **OPEN** — no evidence of Timmy/Allegro/Ezra adoption |
|
||||
| Hermes harness integration | No direct harness/memory-tool bridge | **OPEN** — searcher exists but no harness wiring |
|
||||
|
||||
## 4. What's Actually Broken
|
||||
|
||||
### 4.1 No AAAK Implementation
|
||||
The issue describes AAAK (~30x compression, ~170 tokens wake-up context) as a key feature, but there is zero AAAK code in the repo. The `nexus/mempalace/` layer has no compression functions. This is a missing feature, not a bug.
|
||||
|
||||
### 4.2 No MCP Server Bridge
|
||||
The upstream MemPalace offers 19 MCP tools, but the Nexus integration only exposes the ChromaDB Python API. There is no MCP server definition, no tool registration for the harness, and no bridge to the `mcp_config.json` at repo root.
|
||||
|
||||
### 4.3 Fleet Adoption Gap
|
||||
Only Bezalel has a documented field report (#1072). There is no evidence that Timmy, Allegro, or Ezra have populated palaces, configured room taxonomies, or run incremental mining. The `export_closets.sh` script hardcodes Bezalel paths.
|
||||
|
||||
### 4.4 Frontend Integration Stale
|
||||
`mempalace.js` references `window.electronAPI.execPython()` which only works in the Electron shell. The main `app.js` (Three.js world) does not import or use `mempalace.js`. The `spatial-memory.js` component defines MemPalace zones but has no data pipeline to populate them from actual palace data.
|
||||
|
||||
### 4.5 Upstream Quality Concern
|
||||
Bezalel's field report notes the upstream repo is "astroturfed hype" — 13.4k LOC in a single commit, 5,769 GitHub stars in 48 hours, ~125 lines of tests. The code is not malicious but is not production-grade. The Nexus has effectively forked/vendored the useful parts and rewritten the critical integration layers.
|
||||
|
||||
## 5. What's Working Well
|
||||
|
||||
1. **Clean architecture separation** — `nexus/mempalace/` is a proper Python package with config/searcher separation. Testable without ChromaDB installed.
|
||||
|
||||
2. **Privacy-first fleet design** — closet-only export policy, privacy auditor, retention enforcement, and private path detection are solid operational safeguards.
|
||||
|
||||
3. **Taxonomy standardization** — `rooms.yaml` + validator ensures consistent memory structure across wizards.
|
||||
|
||||
4. **CI integration** — Taxonomy validation in PR checks + weekly privacy audit cron are good DevOps practices.
|
||||
|
||||
5. **Evennia integration** — The MUD commands (recall, enter room, ask steward) are well-designed and testable outside Evennia via stubs.
|
||||
|
||||
6. **Spatial visualization** — `spatial-memory.js` is a creative 3D representation with deterministic positioning and category zones.
|
||||
|
||||
## 6. Recommended Actions
|
||||
|
||||
### Priority 1: Fleet Adoption Verification (effort: small)
|
||||
- Confirm each wizard (Timmy, Allegro, Ezra) has run `mempalace mine` and has a populated palace
|
||||
- Verify `mempalace.yaml` exists on each wizard's VPS
|
||||
- Update `export_closets.sh` to not hardcode Bezalel paths (use env vars)
|
||||
|
||||
### Priority 2: Hermes Harness Bridge (effort: medium)
|
||||
- Wire `nexus/mempalace/searcher.py` into the Hermes harness as a memory tool
|
||||
- Add memory search/recall to the agent loop so wizards get cross-session context automatically
|
||||
- Map MemPalace search to the existing `memory`/`fact_store` tools or add a dedicated `palace_search` tool
|
||||
|
||||
### Priority 3: MCP Server Registration (effort: medium)
|
||||
- Create an MCP server that exposes search, write, and status tools
|
||||
- Register in `mcp_config.json`
|
||||
- Enable any harness agent to use MemPalace without Python imports
|
||||
|
||||
### Priority 4: AAAK Compression (effort: large, optional)
|
||||
- Implement or port the AAAK compression dialect
|
||||
- Generate wake-up context summaries from palace data
|
||||
- This is a nice-to-have, not critical — the raw ChromaDB search is functional
|
||||
|
||||
### Priority 5: 3D Pipeline Bridge (effort: medium)
|
||||
- Connect `spatial-memory.js` to live palace data via WebSocket or REST
|
||||
- Populate memory crystals from actual search results
|
||||
- Visual feedback when new memories are added
|
||||
|
||||
## 7. Effort Summary
|
||||
|
||||
| Action | Effort | Impact |
|
||||
|--------|--------|--------|
|
||||
| Fleet adoption verification | 2-4 hours | High — ensures all wizards have memory |
|
||||
| Hermes harness bridge | 1-2 days | High — automatic cross-session context |
|
||||
| MCP server registration | 1 day | Medium — enables any agent to use palace |
|
||||
| AAAK compression | 2-3 days | Low — nice-to-have |
|
||||
| 3D pipeline bridge | 1-2 days | Medium — visual representation of memory |
|
||||
| Fix export_closets.sh hardcoded paths | 30 min | Low — operational hygiene |
|
||||
|
||||
## 8. Conclusion
|
||||
|
||||
Issue #1047 was a research request from 2026-04-07. Since then, significant implementation work has been completed — far exceeding the original proposal. The core memory infrastructure (searcher, fleet tools, privacy, taxonomy, Evennia integration, tests, CI) is **built and functional**.
|
||||
|
||||
The primary remaining gap is **fleet-wide adoption** (only Bezalel has documented use) and **harness integration** (the searcher exists but isn't wired into the agent loop). The AAAK and MCP features from the original research are not implemented but are not blocking — the ChromaDB-backed search provides the core value proposition.
|
||||
|
||||
**Verdict:** The MemPalace integration is substantially complete at the infrastructure level. The next bottleneck is operational adoption and harness wiring, not new feature development.
|
||||
@@ -29,6 +29,8 @@ from typing import Any, Callable, Optional
|
||||
|
||||
import websockets
|
||||
|
||||
from bannerlord_trace import BannerlordTraceLogger
|
||||
|
||||
# ═══════════════════════════════════════════════════════════════════════════
|
||||
# CONFIGURATION
|
||||
# ═══════════════════════════════════════════════════════════════════════════
|
||||
@@ -265,11 +267,13 @@ class BannerlordHarness:
|
||||
desktop_command: Optional[list[str]] = None,
|
||||
steam_command: Optional[list[str]] = None,
|
||||
enable_mock: bool = False,
|
||||
enable_trace: bool = False,
|
||||
):
|
||||
self.hermes_ws_url = hermes_ws_url
|
||||
self.desktop_command = desktop_command or DEFAULT_MCP_DESKTOP_COMMAND
|
||||
self.steam_command = steam_command or DEFAULT_MCP_STEAM_COMMAND
|
||||
self.enable_mock = enable_mock
|
||||
self.enable_trace = enable_trace
|
||||
|
||||
# MCP clients
|
||||
self.desktop_mcp: Optional[MCPClient] = None
|
||||
@@ -284,6 +288,9 @@ class BannerlordHarness:
|
||||
self.cycle_count = 0
|
||||
self.running = False
|
||||
|
||||
# Session trace logger
|
||||
self.trace_logger: Optional[BannerlordTraceLogger] = None
|
||||
|
||||
# ═══ LIFECYCLE ═══
|
||||
|
||||
async def start(self) -> bool:
|
||||
@@ -314,6 +321,15 @@ class BannerlordHarness:
|
||||
# Connect to Hermes WebSocket
|
||||
await self._connect_hermes()
|
||||
|
||||
# Initialize trace logger if enabled
|
||||
if self.enable_trace:
|
||||
self.trace_logger = BannerlordTraceLogger(
|
||||
harness_session_id=self.session_id,
|
||||
hermes_session_id=self.session_id,
|
||||
)
|
||||
self.trace_logger.start_session()
|
||||
log.info(f"Trace logger started: {self.trace_logger.trace_id}")
|
||||
|
||||
log.info("Harness initialized successfully")
|
||||
return True
|
||||
|
||||
@@ -322,6 +338,12 @@ class BannerlordHarness:
|
||||
self.running = False
|
||||
log.info("Shutting down harness...")
|
||||
|
||||
# Finalize trace logger
|
||||
if self.trace_logger:
|
||||
manifest = self.trace_logger.finish_session()
|
||||
log.info(f"Trace saved: {manifest.trace_file}")
|
||||
log.info(f"Manifest: {self.trace_logger.manifest_file}")
|
||||
|
||||
if self.desktop_mcp:
|
||||
self.desktop_mcp.stop()
|
||||
if self.steam_mcp:
|
||||
@@ -707,6 +729,11 @@ class BannerlordHarness:
|
||||
self.cycle_count = iteration
|
||||
log.info(f"\n--- ODA Cycle {iteration + 1}/{max_iterations} ---")
|
||||
|
||||
# Start trace cycle
|
||||
trace_cycle = None
|
||||
if self.trace_logger:
|
||||
trace_cycle = self.trace_logger.begin_cycle(iteration)
|
||||
|
||||
# 1. OBSERVE: Capture state
|
||||
log.info("[OBSERVE] Capturing game state...")
|
||||
state = await self.capture_state()
|
||||
@@ -715,11 +742,24 @@ class BannerlordHarness:
|
||||
log.info(f" Screen: {state.visual.screen_size}")
|
||||
log.info(f" Players online: {state.game_context.current_players_online}")
|
||||
|
||||
# Populate trace with observation data
|
||||
if trace_cycle:
|
||||
trace_cycle.screenshot_path = state.visual.screenshot_path or ""
|
||||
trace_cycle.window_found = state.visual.window_found
|
||||
trace_cycle.screen_size = list(state.visual.screen_size)
|
||||
trace_cycle.mouse_position = list(state.visual.mouse_position)
|
||||
trace_cycle.playtime_hours = state.game_context.playtime_hours
|
||||
trace_cycle.players_online = state.game_context.current_players_online
|
||||
trace_cycle.is_running = state.game_context.is_running
|
||||
|
||||
# 2. DECIDE: Get actions from decision function
|
||||
log.info("[DECIDE] Getting actions...")
|
||||
actions = decision_fn(state)
|
||||
log.info(f" Decision returned {len(actions)} actions")
|
||||
|
||||
if trace_cycle:
|
||||
trace_cycle.actions_planned = actions
|
||||
|
||||
# 3. ACT: Execute actions
|
||||
log.info("[ACT] Executing actions...")
|
||||
results = []
|
||||
@@ -731,6 +771,13 @@ class BannerlordHarness:
|
||||
if result.error:
|
||||
log.info(f" Error: {result.error}")
|
||||
|
||||
if trace_cycle:
|
||||
trace_cycle.actions_executed.append(result.to_dict())
|
||||
|
||||
# Finalize trace cycle
|
||||
if trace_cycle:
|
||||
self.trace_logger.finish_cycle(trace_cycle)
|
||||
|
||||
# Send cycle summary telemetry
|
||||
await self._send_telemetry({
|
||||
"type": "oda_cycle_complete",
|
||||
@@ -836,12 +883,18 @@ async def main():
|
||||
default=1.0,
|
||||
help="Delay between iterations in seconds (default: 1.0)",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--trace",
|
||||
action="store_true",
|
||||
help="Enable session trace logging to ~/.timmy/traces/bannerlord/",
|
||||
)
|
||||
args = parser.parse_args()
|
||||
|
||||
# Create harness
|
||||
harness = BannerlordHarness(
|
||||
hermes_ws_url=args.hermes_ws,
|
||||
enable_mock=args.mock,
|
||||
enable_trace=args.trace,
|
||||
)
|
||||
|
||||
try:
|
||||
|
||||
234
nexus/bannerlord_trace.py
Normal file
234
nexus/bannerlord_trace.py
Normal file
@@ -0,0 +1,234 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Bannerlord Session Trace Logger — First-Replayable Training Material
|
||||
|
||||
Captures one Bannerlord session as a replayable trace:
|
||||
- Timestamps on every cycle
|
||||
- Actions executed with success/failure
|
||||
- World-state evidence (screenshots, Steam stats)
|
||||
- Hermes session/log ID mapping
|
||||
|
||||
Storage: ~/.timmy/traces/bannerlord/trace_<session_id>.jsonl
|
||||
Manifest: ~/.timmy/traces/bannerlord/manifest_<session_id>.json
|
||||
|
||||
Each JSONL line is one ODA cycle with full context.
|
||||
The manifest bundles metadata for replay/eval.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import time
|
||||
import uuid
|
||||
from dataclasses import dataclass, field, asdict
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
# Storage root — local-first under ~/.timmy/
|
||||
DEFAULT_TRACE_DIR = Path.home() / ".timmy" / "traces" / "bannerlord"
|
||||
|
||||
|
||||
@dataclass
|
||||
class CycleTrace:
|
||||
"""One ODA cycle captured in full."""
|
||||
cycle_index: int
|
||||
timestamp_start: str
|
||||
timestamp_end: str = ""
|
||||
duration_ms: int = 0
|
||||
|
||||
# Observe
|
||||
screenshot_path: str = ""
|
||||
window_found: bool = False
|
||||
screen_size: list[int] = field(default_factory=lambda: [1920, 1080])
|
||||
mouse_position: list[int] = field(default_factory=lambda: [0, 0])
|
||||
playtime_hours: float = 0.0
|
||||
players_online: int = 0
|
||||
is_running: bool = False
|
||||
|
||||
# Decide
|
||||
actions_planned: list[dict] = field(default_factory=list)
|
||||
decision_note: str = ""
|
||||
|
||||
# Act
|
||||
actions_executed: list[dict] = field(default_factory=list)
|
||||
actions_succeeded: int = 0
|
||||
actions_failed: int = 0
|
||||
|
||||
# Metadata
|
||||
hermes_session_id: str = ""
|
||||
hermes_log_id: str = ""
|
||||
harness_session_id: str = ""
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
return asdict(self)
|
||||
|
||||
|
||||
@dataclass
|
||||
class SessionManifest:
|
||||
"""Top-level metadata for a captured session trace."""
|
||||
trace_id: str
|
||||
harness_session_id: str
|
||||
hermes_session_id: str
|
||||
hermes_log_id: str
|
||||
game: str = "Mount & Blade II: Bannerlord"
|
||||
app_id: int = 261550
|
||||
started_at: str = ""
|
||||
finished_at: str = ""
|
||||
total_cycles: int = 0
|
||||
total_actions: int = 0
|
||||
total_succeeded: int = 0
|
||||
total_failed: int = 0
|
||||
trace_file: str = ""
|
||||
trace_dir: str = ""
|
||||
replay_command: str = ""
|
||||
eval_note: str = ""
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
return asdict(self)
|
||||
|
||||
|
||||
class BannerlordTraceLogger:
|
||||
"""
|
||||
Captures a single Bannerlord session as a replayable trace.
|
||||
|
||||
Usage:
|
||||
logger = BannerlordTraceLogger(hermes_session_id="abc123")
|
||||
logger.start_session()
|
||||
cycle = logger.begin_cycle(0)
|
||||
# ... populate cycle fields ...
|
||||
logger.finish_cycle(cycle)
|
||||
manifest = logger.finish_session()
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
trace_dir: Optional[Path] = None,
|
||||
harness_session_id: str = "",
|
||||
hermes_session_id: str = "",
|
||||
hermes_log_id: str = "",
|
||||
):
|
||||
self.trace_dir = trace_dir or DEFAULT_TRACE_DIR
|
||||
self.trace_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
self.trace_id = f"bl_{datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:6]}"
|
||||
self.harness_session_id = harness_session_id or str(uuid.uuid4())[:8]
|
||||
self.hermes_session_id = hermes_session_id
|
||||
self.hermes_log_id = hermes_log_id
|
||||
|
||||
self.trace_file = self.trace_dir / f"trace_{self.trace_id}.jsonl"
|
||||
self.manifest_file = self.trace_dir / f"manifest_{self.trace_id}.json"
|
||||
|
||||
self.cycles: list[CycleTrace] = []
|
||||
self.started_at: str = ""
|
||||
self.finished_at: str = ""
|
||||
|
||||
def start_session(self) -> str:
|
||||
"""Begin a trace session. Returns trace_id."""
|
||||
self.started_at = datetime.now(timezone.utc).isoformat()
|
||||
return self.trace_id
|
||||
|
||||
def begin_cycle(self, cycle_index: int) -> CycleTrace:
|
||||
"""Start recording one ODA cycle."""
|
||||
cycle = CycleTrace(
|
||||
cycle_index=cycle_index,
|
||||
timestamp_start=datetime.now(timezone.utc).isoformat(),
|
||||
harness_session_id=self.harness_session_id,
|
||||
hermes_session_id=self.hermes_session_id,
|
||||
hermes_log_id=self.hermes_log_id,
|
||||
)
|
||||
return cycle
|
||||
|
||||
def finish_cycle(self, cycle: CycleTrace) -> None:
|
||||
"""Finalize and persist one cycle to the trace file."""
|
||||
cycle.timestamp_end = datetime.now(timezone.utc).isoformat()
|
||||
# Compute duration
|
||||
try:
|
||||
t0 = datetime.fromisoformat(cycle.timestamp_start)
|
||||
t1 = datetime.fromisoformat(cycle.timestamp_end)
|
||||
cycle.duration_ms = int((t1 - t0).total_seconds() * 1000)
|
||||
except (ValueError, TypeError):
|
||||
cycle.duration_ms = 0
|
||||
|
||||
# Count successes/failures
|
||||
cycle.actions_succeeded = sum(
|
||||
1 for a in cycle.actions_executed if a.get("success", False)
|
||||
)
|
||||
cycle.actions_failed = sum(
|
||||
1 for a in cycle.actions_executed if not a.get("success", True)
|
||||
)
|
||||
|
||||
self.cycles.append(cycle)
|
||||
|
||||
# Append to JSONL
|
||||
with open(self.trace_file, "a") as f:
|
||||
f.write(json.dumps(cycle.to_dict()) + "\n")
|
||||
|
||||
def finish_session(self) -> SessionManifest:
|
||||
"""Finalize the session and write the manifest."""
|
||||
self.finished_at = datetime.now(timezone.utc).isoformat()
|
||||
|
||||
total_actions = sum(len(c.actions_executed) for c in self.cycles)
|
||||
total_succeeded = sum(c.actions_succeeded for c in self.cycles)
|
||||
total_failed = sum(c.actions_failed for c in self.cycles)
|
||||
|
||||
manifest = SessionManifest(
|
||||
trace_id=self.trace_id,
|
||||
harness_session_id=self.harness_session_id,
|
||||
hermes_session_id=self.hermes_session_id,
|
||||
hermes_log_id=self.hermes_log_id,
|
||||
started_at=self.started_at,
|
||||
finished_at=self.finished_at,
|
||||
total_cycles=len(self.cycles),
|
||||
total_actions=total_actions,
|
||||
total_succeeded=total_succeeded,
|
||||
total_failed=total_failed,
|
||||
trace_file=str(self.trace_file),
|
||||
trace_dir=str(self.trace_dir),
|
||||
replay_command=(
|
||||
f"python -m nexus.bannerlord_harness --mock --replay {self.trace_file}"
|
||||
),
|
||||
eval_note=(
|
||||
"To replay: load this trace, re-execute each cycle's actions_planned "
|
||||
"against a fresh harness in mock mode, compare actions_executed outcomes. "
|
||||
"Success metric: >=90% action parity between original and replay runs."
|
||||
),
|
||||
)
|
||||
|
||||
with open(self.manifest_file, "w") as f:
|
||||
json.dump(manifest.to_dict(), f, indent=2)
|
||||
|
||||
return manifest
|
||||
|
||||
@classmethod
|
||||
def load_trace(cls, trace_file: Path) -> list[dict]:
|
||||
"""Load a trace JSONL file for replay or analysis."""
|
||||
cycles = []
|
||||
with open(trace_file) as f:
|
||||
for line in f:
|
||||
line = line.strip()
|
||||
if line:
|
||||
cycles.append(json.loads(line))
|
||||
return cycles
|
||||
|
||||
@classmethod
|
||||
def load_manifest(cls, manifest_file: Path) -> dict:
|
||||
"""Load a session manifest."""
|
||||
with open(manifest_file) as f:
|
||||
return json.load(f)
|
||||
|
||||
@classmethod
|
||||
def list_traces(cls, trace_dir: Optional[Path] = None) -> list[dict]:
|
||||
"""List all available trace sessions."""
|
||||
d = trace_dir or DEFAULT_TRACE_DIR
|
||||
if not d.exists():
|
||||
return []
|
||||
|
||||
traces = []
|
||||
for mf in sorted(d.glob("manifest_*.json")):
|
||||
try:
|
||||
manifest = cls.load_manifest(mf)
|
||||
traces.append(manifest)
|
||||
except (json.JSONDecodeError, IOError):
|
||||
continue
|
||||
return traces
|
||||
97
nexus/traces/bannerlord/REPLAY.md
Normal file
97
nexus/traces/bannerlord/REPLAY.md
Normal file
@@ -0,0 +1,97 @@
|
||||
# Bannerlord Session Trace — Replay & Eval Guide
|
||||
|
||||
## Storage Layout
|
||||
|
||||
All traces live under `~/.timmy/traces/bannerlord/`:
|
||||
|
||||
```
|
||||
~/.timmy/traces/bannerlord/
|
||||
trace_<trace_id>.jsonl # One line per ODA cycle (full state + actions)
|
||||
manifest_<trace_id>.json # Session metadata, counts, replay command
|
||||
```
|
||||
|
||||
## Trace Format (JSONL)
|
||||
|
||||
Each line is one ODA cycle:
|
||||
|
||||
```json
|
||||
{
|
||||
"cycle_index": 0,
|
||||
"timestamp_start": "2026-04-10T20:15:00+00:00",
|
||||
"timestamp_end": "2026-04-10T20:15:45+00:00",
|
||||
"duration_ms": 45000,
|
||||
|
||||
"screenshot_path": "/tmp/bannerlord_capture_1744320900.png",
|
||||
"window_found": true,
|
||||
"screen_size": [1920, 1080],
|
||||
"mouse_position": [960, 540],
|
||||
"playtime_hours": 142.5,
|
||||
"players_online": 8421,
|
||||
"is_running": true,
|
||||
|
||||
"actions_planned": [{"type": "move_to", "x": 960, "y": 540}],
|
||||
"actions_executed": [{"success": true, "action": "move_to", ...}],
|
||||
"actions_succeeded": 1,
|
||||
"actions_failed": 0,
|
||||
|
||||
"hermes_session_id": "f47ac10b",
|
||||
"hermes_log_id": "",
|
||||
"harness_session_id": "f47ac10b"
|
||||
}
|
||||
```
|
||||
|
||||
## Capturing a Trace
|
||||
|
||||
```bash
|
||||
# Run harness with trace logging enabled
|
||||
cd /path/to/the-nexus
|
||||
python -m nexus.bannerlord_harness --mock --trace --iterations 3
|
||||
```
|
||||
|
||||
The trace and manifest are written to `~/.timmy/traces/bannerlord/` on harness shutdown.
|
||||
|
||||
## Replay Protocol
|
||||
|
||||
1. Load a trace: `BannerlordTraceLogger.load_trace(trace_file)`
|
||||
2. Create a fresh harness in mock mode
|
||||
3. For each cycle in the trace:
|
||||
- Re-execute the `actions_planned` list
|
||||
- Compare actual `actions_executed` outcomes against the recorded ones
|
||||
4. Score: `(matching_actions / total_actions) * 100`
|
||||
|
||||
### Eval Criteria
|
||||
|
||||
| Score | Grade | Meaning |
|
||||
|---------|----------|--------------------------------------------|
|
||||
| >= 90% | PASS | Replay matches original closely |
|
||||
| 70-89% | PARTIAL | Some divergence, investigate differences |
|
||||
| < 70% | FAIL | Significant drift, review action semantics |
|
||||
|
||||
## Replay Script (sketch)
|
||||
|
||||
```python
|
||||
from nexus.bannerlord_trace import BannerlordTraceLogger
|
||||
from nexus.bannerlord_harness import BannerlordHarness
|
||||
|
||||
# Load trace
|
||||
cycles = BannerlordTraceLogger.load_trace(
|
||||
Path.home() / ".timmy" / "traces" / "bannerlord" / "trace_bl_xxx.jsonl"
|
||||
)
|
||||
|
||||
# Replay
|
||||
harness = BannerlordHarness(enable_mock=True, enable_trace=False)
|
||||
await harness.start()
|
||||
|
||||
for cycle in cycles:
|
||||
for action in cycle["actions_planned"]:
|
||||
result = await harness.execute_action(action)
|
||||
# Compare result against cycle["actions_executed"]
|
||||
|
||||
await harness.stop()
|
||||
```
|
||||
|
||||
## Hermes Session Mapping
|
||||
|
||||
The `hermes_session_id` and `hermes_log_id` fields link traces to Hermes session logs.
|
||||
When a trace is captured during a live Hermes session, populate these fields so
|
||||
the trace can be correlated with the broader agent conversation context.
|
||||
18
nexus/traces/bannerlord/sample_manifest.json
Normal file
18
nexus/traces/bannerlord/sample_manifest.json
Normal file
@@ -0,0 +1,18 @@
|
||||
{
|
||||
"trace_id": "bl_20260410_201500_a1b2c3",
|
||||
"harness_session_id": "f47ac10b",
|
||||
"hermes_session_id": "f47ac10b",
|
||||
"hermes_log_id": "",
|
||||
"game": "Mount & Blade II: Bannerlord",
|
||||
"app_id": 261550,
|
||||
"started_at": "2026-04-10T20:15:00+00:00",
|
||||
"finished_at": "2026-04-10T20:17:30+00:00",
|
||||
"total_cycles": 3,
|
||||
"total_actions": 6,
|
||||
"total_succeeded": 6,
|
||||
"total_failed": 0,
|
||||
"trace_file": "~/.timmy/traces/bannerlord/trace_bl_20260410_201500_a1b2c3.jsonl",
|
||||
"trace_dir": "~/.timmy/traces/bannerlord",
|
||||
"replay_command": "python -m nexus.bannerlord_harness --mock --replay ~/.timmy/traces/bannerlord/trace_bl_20260410_201500_a1b2c3.jsonl",
|
||||
"eval_note": "To replay: load trace, re-execute each cycle's actions_planned against a fresh harness in mock mode, compare actions_executed outcomes. Success metric: >=90% action parity between original and replay runs."
|
||||
}
|
||||
3
nexus/traces/bannerlord/sample_trace.jsonl
Normal file
3
nexus/traces/bannerlord/sample_trace.jsonl
Normal file
@@ -0,0 +1,3 @@
|
||||
{"cycle_index": 0, "timestamp_start": "2026-04-10T20:15:00+00:00", "timestamp_end": "2026-04-10T20:15:45+00:00", "duration_ms": 45000, "screenshot_path": "/tmp/bannerlord_capture_1744320900.png", "window_found": true, "screen_size": [1920, 1080], "mouse_position": [960, 540], "playtime_hours": 142.5, "players_online": 8421, "is_running": true, "actions_planned": [{"type": "move_to", "x": 960, "y": 540}, {"type": "press_key", "key": "space"}], "decision_note": "Initial state capture. Move to screen center and press space to advance.", "actions_executed": [{"success": true, "action": "move_to", "params": {"type": "move_to", "x": 960, "y": 540}, "timestamp": "2026-04-10T20:15:30+00:00", "error": null}, {"success": true, "action": "press_key", "params": {"type": "press_key", "key": "space"}, "timestamp": "2026-04-10T20:15:45+00:00", "error": null}], "actions_succeeded": 2, "actions_failed": 0, "hermes_session_id": "f47ac10b", "hermes_log_id": "", "harness_session_id": "f47ac10b"}
|
||||
{"cycle_index": 1, "timestamp_start": "2026-04-10T20:15:45+00:00", "timestamp_end": "2026-04-10T20:16:30+00:00", "duration_ms": 45000, "screenshot_path": "/tmp/bannerlord_capture_1744320945.png", "window_found": true, "screen_size": [1920, 1080], "mouse_position": [960, 540], "playtime_hours": 142.5, "players_online": 8421, "is_running": true, "actions_planned": [{"type": "press_key", "key": "p"}], "decision_note": "Open party screen to inspect troops.", "actions_executed": [{"success": true, "action": "press_key", "params": {"type": "press_key", "key": "p"}, "timestamp": "2026-04-10T20:16:00+00:00", "error": null}], "actions_succeeded": 1, "actions_failed": 0, "hermes_session_id": "f47ac10b", "hermes_log_id": "", "harness_session_id": "f47ac10b"}
|
||||
{"cycle_index": 2, "timestamp_start": "2026-04-10T20:16:30+00:00", "timestamp_end": "2026-04-10T20:17:30+00:00", "duration_ms": 60000, "screenshot_path": "/tmp/bannerlord_capture_1744321020.png", "window_found": true, "screen_size": [1920, 1080], "mouse_position": [960, 540], "playtime_hours": 142.5, "players_online": 8421, "is_running": true, "actions_planned": [{"type": "press_key", "key": "escape"}, {"type": "move_to", "x": 500, "y": 300}, {"type": "click", "x": 500, "y": 300}], "decision_note": "Close party screen, click on campaign map settlement.", "actions_executed": [{"success": true, "action": "press_key", "params": {"type": "press_key", "key": "escape"}, "timestamp": "2026-04-10T20:16:45+00:00", "error": null}, {"success": true, "action": "move_to", "params": {"type": "move_to", "x": 500, "y": 300}, "timestamp": "2026-04-10T20:17:00+00:00", "error": null}, {"success": true, "action": "click", "params": {"type": "click", "x": 500, "y": 300}, "timestamp": "2026-04-10T20:17:30+00:00", "error": null}], "actions_succeeded": 3, "actions_failed": 0, "hermes_session_id": "f47ac10b", "hermes_log_id": "", "harness_session_id": "f47ac10b"}
|
||||
Reference in New Issue
Block a user