Compare commits
1 Commits
mimo/resea
...
mimo/code/
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
1b2ac5cd1f |
@@ -1,305 +0,0 @@
|
|||||||
# Security Audit: NostrIdentity BIP340 Schnorr Signatures — Timing Side-Channel Analysis
|
|
||||||
|
|
||||||
**Issue:** #801
|
|
||||||
**Repository:** Timmy_Foundation/the-nexus
|
|
||||||
**File:** `nexus/nostr_identity.py`
|
|
||||||
**Auditor:** mimo-v2-pro swarm worker
|
|
||||||
**Date:** 2026-04-10
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Summary
|
|
||||||
|
|
||||||
The pure-Python BIP340 Schnorr signature implementation in `NostrIdentity` has **multiple timing side-channel vulnerabilities** that could allow an attacker with precise timing measurements to recover the private key. The implementation is suitable for prototyping and non-adversarial environments but **must not be used in production** without the fixes described below.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Architecture
|
|
||||||
|
|
||||||
The Nostr sovereign identity system consists of two Python files, plus a browser-side stub that is listed for completeness:
|
|
||||||
|
|
||||||
- **`nexus/nostr_identity.py`** — Pure-Python secp256k1 + BIP340 Schnorr signature implementation. No external dependencies. Contains `NostrIdentity` class for key generation, event signing, and pubkey derivation.
|
|
||||||
- **`nexus/nostr_publisher.py`** — Async WebSocket publisher that sends signed Nostr events to public relays (damus.io, nos.lol, snort.social).
|
|
||||||
- **`app.js` (line 507)** — Browser-side `NostrAgent` class uses **mock signatures** (`mock_id`, `mock_sig`), not real crypto. Not affected.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Vulnerabilities Found
|
|
||||||
|
|
||||||
### 1. Branch-Dependent Scalar Multiplication — CRITICAL
|
|
||||||
|
|
||||||
**Location:** `nostr_identity.py:41-47` — `point_mul()`
|
|
||||||
|
|
||||||
```python
def point_mul(p, n):
    r = None
    for i in range(256):
        if (n >> i) & 1:  # <-- branch leaks Hamming weight
            r = point_add(r, p)
        p = point_add(p, p)
    return r
```
|
|
||||||
|
|
||||||
**Problem:** The `if (n >> i) & 1` branch causes `point_add(r, p)` to execute only when the bit is 1. An attacker measuring signature generation time can determine which bits of the scalar are set, recovering the private key from a small number of timed signatures.
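To make the leak concrete, the sketch below replays the loop structure of `point_mul` without any curve arithmetic and counts how often the conditional addition would run. The helper name `count_conditional_adds` is an illustrative assumption and is not part of `nostr_identity.py`.

```python
def count_conditional_adds(n: int, bits: int = 256) -> int:
    """Count how many times the `if (n >> i) & 1` branch would fire."""
    adds = 0
    for i in range(bits):
        if (n >> i) & 1:
            adds += 1  # stands in for the extra point_add(r, p) call
    return adds


sparse = 1 << 255        # a single set bit
dense = (1 << 256) - 1   # all 256 bits set

print(count_conditional_adds(sparse))  # 1
print(count_conditional_adds(dense))   # 256
```

The count equals the scalar's Hamming weight, so the number of expensive `point_add` calls, and with it the running time, varies with the secret.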
|
|
||||||
|
|
||||||
**Severity:** CRITICAL — direct private key recovery.
|
|
||||||
|
|
||||||
**Fix:** Use a constant-time double-and-always-add algorithm:
|
|
||||||
|
|
||||||
```python
def point_mul(p, n):
    r = None  # identity point, matching point_add's None convention
    for i in range(256):
        bit = (n >> i) & 1
        r0 = point_add(r, p)  # always compute the addition
        r = r0 if bit else r  # select the result; no point_add is skipped
        p = point_add(p, p)
    return r
```
|
|
||||||
|
|
||||||
Or better: use a Montgomery ladder, which performs exactly one point addition and one point doubling per scalar bit regardless of the bit's value.
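A minimal sketch of the ladder idea follows. It assumes the standard secp256k1 parameters and an affine `point_add` with `None` as the identity, mirroring the conventions quoted above; the name `ladder_mul` is hypothetical. Selection is done by list index rather than an `if`, but `point_add` itself is still branchy and Python integers are not constant-time, so this illustrates the operation schedule only and is not a hardened implementation.

```python
P = 2**256 - 2**32 - 977  # secp256k1 field prime
G = (
    0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
    0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8,
)


def point_add(p1, p2):
    """Affine addition; None is the identity. Branchy, for illustration only."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if x1 == x2:
        m = 3 * x1 * x1 * pow(2 * y1, P - 2, P) % P
    else:
        m = (y2 - y1) * pow((x2 - x1) % P, P - 2, P) % P
    x3 = (m * m - x1 - x2) % P
    return (x3, (m * (x1 - x3) - y1) % P)


def ladder_mul(pt, n, bits=256):
    """Montgomery ladder: one addition and one doubling for every bit."""
    r = [None, pt]  # invariant: r[1] - r[0] == pt
    for i in reversed(range(bits)):
        bit = (n >> i) & 1
        r[1 - bit] = point_add(r[0], r[1])  # the "other" slot gets the sum
        r[bit] = point_add(r[bit], r[bit])  # the selected slot is doubled
    return r[0]
```

Every iteration issues the same two calls in the same order; only the slot indices depend on the bit.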
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 2. Branch-Dependent Point Addition — CRITICAL
|
|
||||||
|
|
||||||
**Location:** `nostr_identity.py:28-39` — `point_add()`
|
|
||||||
|
|
||||||
```python
def point_add(p1, p2):
    if p1 is None: return p2                # <-- branch leaks operand state
    if p2 is None: return p1                # <-- branch leaks operand state
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and y1 != y2: return None   # <-- branch leaks equality
    if x1 == x2:                            # <-- branch leaks equality
        m = (3 * x1 * x1 * inverse(2 * y1, P)) % P
    else:
        m = ((y2 - y1) * inverse(x2 - x1, P)) % P
    ...
```
|
|
||||||
|
|
||||||
**Problem:** Multiple conditional branches leak whether inputs are the identity point, whether x-coordinates are equal, and whether y-coordinates are negations. Combined with the scalar multiplication above, this gives an attacker detailed timing information about intermediate computations.
|
|
||||||
|
|
||||||
**Severity:** CRITICAL — compounds the scalar multiplication leak.
|
|
||||||
|
|
||||||
**Fix:** Replace with a branchless point addition using Jacobian or projective coordinates with dummy operations:
|
|
||||||
|
|
||||||
```python
def point_add(p1, p2):
    # Use Jacobian coordinates; always perform full addition
    # Use conditional moves (simulated with arithmetic masking)
    # for selecting between doubling and addition paths
    raise NotImplementedError  # outline only
```
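The "arithmetic masking" mentioned in the outline can be sketched for plain integers as below. The helper name `ct_select` is an assumption for illustration. It removes the Python-level branch, but because Python integers are arbitrary precision, the underlying big-integer operations may still take value-dependent time.

```python
def ct_select(a: int, b: int, bit: int) -> int:
    """Return a when bit == 0 and b when bit == 1, without an if statement."""
    mask = -bit  # 0 when bit == 0, -1 (all ones) when bit == 1
    return a ^ ((a ^ b) & mask)


def ct_select_point(p, q, bit):
    """Apply the same masked select to each coordinate of two affine points."""
    return tuple(ct_select(u, v, bit) for u, v in zip(p, q))
```

For example, `ct_select_point((1, 2), (3, 4), 1)` evaluates to `(3, 4)`.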
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 3. Branch-Dependent Y-Parity Check in Signing — HIGH
|
|
||||||
|
|
||||||
**Location:** `nostr_identity.py:57-58` — `sign_schnorr()`
|
|
||||||
|
|
||||||
```python
R = point_mul(G, k)
if R[1] % 2 != 0:  # <-- branch leaks parity of R's y-coordinate
    k = N - k
```
|
|
||||||
|
|
||||||
**Problem:** The conditional negation of `k` based on the y-parity of R leaks information about the nonce through timing. While less critical than the point_mul leak (it's a single bit), combined with other leaks it aids key recovery.
|
|
||||||
|
|
||||||
**Severity:** HIGH
|
|
||||||
|
|
||||||
**Fix:** Use arithmetic masking:
|
|
||||||
|
|
||||||
```python
|
|
||||||
R = point_mul(G, k)
|
|
||||||
parity = R[1] & 1
|
|
||||||
k = (k * (1 - parity) + (N - k) * parity) % N # constant-time select
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 4. Non-Constant-Time Modular Inverse — MEDIUM
|
|
||||||
|
|
||||||
**Location:** `nostr_identity.py:25-26` — `inverse()`
|
|
||||||
|
|
||||||
```python
def inverse(a, n):
    return pow(a, n - 2, n)
```
|
|
||||||
|
|
||||||
**Problem:** CPython's built-in three-argument `pow()` is a general-purpose square-and-multiply style exponentiation on variable-length integers and makes no constant-time guarantee. In particular:
- The running time of big-integer multiplication and reduction depends on operand values and sizes, which here derive from secret data.
- PyPy, GraalPy, and other Python runtimes may use different algorithms.
- The exponent `n - 2` is a public constant (the callers pass the field prime `P`), so its bit pattern reveals nothing secret. Any leak comes from the base `a`, which makes this case harder to exploit, but relying on it is fragile.
|
|
||||||
|
|
||||||
**Severity:** MEDIUM — implementation-dependent; low risk on CPython specifically.
|
|
||||||
|
|
||||||
**Fix:** Implement Fermat's little theorem inversion with blinding, or use a dedicated constant-time GCD algorithm (extended binary GCD).
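One way to read "Fermat inversion with blinding" is multiplicative masking: invert `a * r` for a fresh random `r`, then multiply the result by `r`. The sketch below assumes a prime modulus and a nonzero input; the name `blinded_inverse` is hypothetical. It hides the secret value from the exponentiation but does not make the arithmetic itself constant-time.

```python
import secrets


def blinded_inverse(a: int, p: int) -> int:
    """Invert a modulo the prime p using a random multiplicative mask."""
    r = secrets.randbelow(p - 1) + 1   # uniform in [1, p - 1]
    masked = (a * r) % p               # the value the exponentiation sees
    # (a * r)^-1 * r == a^-1 (mod p)
    return (pow(masked, p - 2, p) * r) % p
```

Because `r` changes on every call, timing variation in `pow()` no longer correlates directly with `a`.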
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 5. Non-RFC6979 Nonce Generation — LOW (but non-standard)
|
|
||||||
|
|
||||||
**Location:** `nostr_identity.py:55`
|
|
||||||
|
|
||||||
```python
|
|
||||||
k = int.from_bytes(sha256(privkey.to_bytes(32, 'big') + msg_hash), 'big') % N
|
|
||||||
```
|
|
||||||
|
|
||||||
**Problem:** The nonce derivation is `SHA256(privkey || msg_hash)` which is deterministic but doesn't follow RFC6979 (HMAC-based DRBG). Issues:
|
|
||||||
- Not vulnerable to timing (it's a single hash), but could be vulnerable to related-message attacks if the same key signs messages with predictable relationships.
|
|
||||||
- BIP340 specifies `tagged_hash("BIP0340/nonce", ...)` with specific domain separation, which is not used here.
|
|
||||||
|
|
||||||
**Severity:** LOW — not a timing issue but a cryptographic correctness concern.
|
|
||||||
|
|
||||||
**Fix:** Follow RFC6979 or BIP340's tagged hash approach:
|
|
||||||
|
|
||||||
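```python
def sign_schnorr(msg_hash, privkey, aux_rand=bytes(32)):
    # BIP340 nonce derivation; tagged_hash(tag, data) is the BIP340 tagged SHA256
    pub_point = point_mul(G, privkey)
    d = privkey if pub_point[1] % 2 == 0 else N - privkey  # use the even-y key
    mask = tagged_hash("BIP0340/aux", aux_rand)
    t = bytes(a ^ b for a, b in zip(d.to_bytes(32, 'big'), mask))
    pubkey = pub_point[0].to_bytes(32, 'big')
    k = int.from_bytes(tagged_hash("BIP0340/nonce", t + pubkey + msg_hash), 'big') % N
    # reject k == 0, then continue with R = point_mul(G, k) as before
```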
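The `tagged_hash` helper used in the fix is not defined in this document. BIP340 specifies it as SHA256 over the tag digest repeated twice followed by the data, which can be sketched with `hashlib` as follows.

```python
import hashlib


def tagged_hash(tag: str, msg: bytes) -> bytes:
    """BIP340 tagged hash: SHA256(SHA256(tag) || SHA256(tag) || msg)."""
    tag_digest = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(tag_digest + tag_digest + msg).digest()
```

The repeated tag digest gives each use ("BIP0340/aux", "BIP0340/nonce", "BIP0340/challenge") its own hash domain, so outputs from one context cannot collide with another.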
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 6. Private Key Bias in Random Generation — LOW
|
|
||||||
|
|
||||||
**Location:** `nostr_identity.py:69`
|
|
||||||
|
|
||||||
```python
|
|
||||||
self.privkey = int.from_bytes(os.urandom(32), 'big') % N
|
|
||||||
```
|
|
||||||
|
|
||||||
**Problem:** `os.urandom(32)` produces values in `[0, 2^256)`, while `N` is slightly less than `2^256`. The modulo reduction introduces a negligible bias (~2^-128). Not exploitable in practice, but not the cleanest approach.
|
|
||||||
|
|
||||||
**Severity:** LOW — theoretically biased, practically unexploitable.
|
|
||||||
|
|
||||||
**Fix:** Use rejection sampling or derive from a hash:
|
|
||||||
|
|
||||||
```python
def generate_privkey():
    while True:
        candidate = int.from_bytes(os.urandom(32), 'big')
        if 0 < candidate < N:
            return candidate
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 7. No Scalar/Point Blinding — MEDIUM
|
|
||||||
|
|
||||||
**Location:** Global — no blinding anywhere in the implementation.
|
|
||||||
|
|
||||||
**Problem:** The implementation has no countermeasures against:
|
|
||||||
- **Power analysis** (DPA/SPA) on embedded systems
|
|
||||||
- **Cache-timing attacks** on shared hardware (VMs, cloud)
|
|
||||||
- **Electromagnetic emanation** attacks
|
|
||||||
|
|
||||||
Adding random blinding to scalar multiplication (for example, computing `(k * r^-1) * (r * G)` for a random `r`, or adding a random multiple of the group order `N` to the scalar) would significantly raise the bar for side-channel attacks beyond simple timing.
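Additive scalar blinding can be sketched in a few lines. It relies on the fact that `N * G` is the identity, so `k` and `k + r * N` produce the same point. The helper name `blind_scalar` and the 64-bit blind size are assumptions for illustration.

```python
import secrets

# secp256k1 group order
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def blind_scalar(k: int, blind_bits: int = 64) -> int:
    """Return a randomized scalar that is congruent to k modulo N."""
    r = secrets.randbits(blind_bits)
    return k + r * N
```

A blinded scalar can be up to `256 + blind_bits` bits long, so a multiplication loop hard-coded to `range(256)` would need its bound widened accordingly.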
|
|
||||||
|
|
||||||
**Severity:** MEDIUM — not timing-specific, but important for hardening.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## What's NOT Vulnerable (Good News)
|
|
||||||
|
|
||||||
1. **The JS-side `NostrAgent` in `app.js`** uses mock signatures (`mock_id`, `mock_sig`) — not real crypto, not affected.
|
|
||||||
2. **`nostr_publisher.py`** correctly imports and uses `NostrIdentity` without modifying its internals.
|
|
||||||
3. **The hash functions** (`sha256`, `hmac_sha256`) use Python's `hashlib` which delegates to OpenSSL — these are constant-time.
|
|
||||||
4. **The JSON serialization** in `sign_event()` is deterministic and doesn't leak timing.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Recommended Fix (Full Remediation)
|
|
||||||
|
|
||||||
### Priority 1: Replace with secp256k1-py or coincurve (IMMEDIATE)
|
|
||||||
|
|
||||||
The fastest, most reliable fix is to stop using the pure-Python implementation entirely:
|
|
||||||
|
|
||||||
```python
# nostr_identity.py: replacement using coincurve
import coincurve
import hashlib
import json
import os

class NostrIdentity:
    def __init__(self, privkey_hex=None):
        if privkey_hex:
            self.privkey = bytes.fromhex(privkey_hex)
        else:
            self.privkey = os.urandom(32)
        self.pubkey = coincurve.PrivateKey(self.privkey).public_key.format(compressed=True)[1:].hex()

    def sign_event(self, event):
        event_data = [0, event['pubkey'], event['created_at'], event['kind'], event['tags'], event['content']]
        # NIP-01 hashes the compact UTF-8 serialization, so non-ASCII must not be \u-escaped
        serialized = json.dumps(event_data, separators=(',', ':'), ensure_ascii=False)
        msg_hash = hashlib.sha256(serialized.encode()).digest()
        event['id'] = msg_hash.hex()
        # Use libsecp256k1's BIP340 Schnorr (constant-time C implementation)
        event['sig'] = coincurve.PrivateKey(self.privkey).sign_schnorr(msg_hash).hex()
        return event
```
|
|
||||||
|
|
||||||
**Effort:** ~2 hours (swap implementation, add `coincurve` to `requirements.txt`, test)
|
|
||||||
**Risk:** Adds a C dependency. If pure-Python is required (sovereignty constraint), use Priority 2.
|
|
||||||
|
|
||||||
### Priority 2: Pure-Python Constant-Time Rewrite (IF PURE PYTHON REQUIRED)
|
|
||||||
|
|
||||||
If the sovereignty constraint (no C dependencies) must be maintained, rewrite the elliptic curve operations:
|
|
||||||
|
|
||||||
1. **Replace `point_mul`** with Montgomery ladder (constant-time by design)
|
|
||||||
2. **Replace `point_add`** with Jacobian coordinate addition that always performs both doubling and addition, selecting with arithmetic masking
|
|
||||||
3. **Replace `inverse`** with extended binary GCD with blinding
|
|
||||||
4. **Fix nonce generation** to follow RFC6979 or BIP340 tagged hashes
|
|
||||||
5. **Fix key generation** to use rejection sampling
|
|
||||||
|
|
||||||
**Effort:** ~8-12 hours (careful implementation + test vectors from BIP340 spec)
|
|
||||||
**Risk:** Pure-Python crypto is inherently slower (~100ms per signature vs ~1ms with libsecp256k1)
|
|
||||||
|
|
||||||
### Priority 3: Hybrid Approach
|
|
||||||
|
|
||||||
Use `coincurve` when available, fall back to pure-Python with warnings:
|
|
||||||
|
|
||||||
```python
try:
    import coincurve
    USE_LIB = True
except ImportError:
    USE_LIB = False
    import warnings
    warnings.warn("Using pure-Python Schnorr, which is vulnerable to timing attacks. Install coincurve for production use.")
```
|
|
||||||
|
|
||||||
**Effort:** ~3 hours
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Effort Estimate
|
|
||||||
|
|
||||||
| Fix | Effort | Risk Reduction | Recommended |
|-----|--------|----------------|-------------|
| Replace with coincurve (Priority 1) | 2h | Eliminates all timing issues | YES, do this |
| Pure-Python constant-time rewrite (Priority 2) | 8-12h | Eliminates timing issues | Only if no-C constraint is firm |
| Hybrid (Priority 3) | 3h | Full for installed, partial for fallback | Good compromise |
| Findings doc + PR (this work) | 2h | Documents the problem | DONE |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Test Vectors
|
|
||||||
|
|
||||||
The BIP340 specification includes test vectors at https://github.com/bitcoin/bips/blob/master/bip-00340/test-vectors.csv
|
|
||||||
|
|
||||||
Any replacement implementation MUST pass all test vectors before deployment.
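A small harness for the verification half of the vectors might look like the sketch below. It assumes the CSV header uses the column names `index`, `public key`, `message`, `signature`, and `verification result` (with `TRUE`/`FALSE` values), and that the implementation under test exposes a `verify(pubkey, msg, sig)` callable returning a bool; both are assumptions to check against the actual file and code.

```python
import csv
import io


def run_verification_vectors(csv_text, verify):
    """Return the indices of vectors where verify() disagrees with the expected result."""
    failures = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        expected = row["verification result"] == "TRUE"
        try:
            actual = verify(
                bytes.fromhex(row["public key"]),
                bytes.fromhex(row["message"]),
                bytes.fromhex(row["signature"]),
            )
        except Exception:
            actual = False  # malformed inputs must be rejected, not crash
        if actual != expected:
            failures.append(row["index"])
    return failures
```

An empty return value means every vector matched; signing vectors (rows with a secret key) would need a separate check that compares the produced signature byte for byte.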
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Conclusion
|
|
||||||
|
|
||||||
The pure-Python BIP340 Schnorr implementation in `NostrIdentity` is **vulnerable to timing side-channel attacks** that could recover the private key. The primary issue is branch-dependent execution in scalar multiplication and point addition. The fastest fix is replacing with `coincurve` (libsecp256k1 binding). If pure-Python sovereignty is required, a constant-time rewrite using Montgomery ladder and arithmetic masking is needed.
|
|
||||||
|
|
||||||
The JS-side `NostrAgent` in `app.js` uses mock signatures and is not affected.
|
|
||||||
|
|
||||||
**Recommendation:** Ship `coincurve` replacement immediately. It's 2 hours of work and eliminates the entire attack surface.
|
|
||||||
@@ -29,6 +29,8 @@ from typing import Any, Callable, Optional
|
|||||||
|
|
||||||
import websockets
|
import websockets
|
||||||
|
|
||||||
|
from bannerlord_trace import BannerlordTraceLogger
|
||||||
|
|
||||||
# ═══════════════════════════════════════════════════════════════════════════
|
# ═══════════════════════════════════════════════════════════════════════════
|
||||||
# CONFIGURATION
|
# CONFIGURATION
|
||||||
# ═══════════════════════════════════════════════════════════════════════════
|
# ═══════════════════════════════════════════════════════════════════════════
|
||||||
@@ -265,11 +267,13 @@ class BannerlordHarness:
|
|||||||
desktop_command: Optional[list[str]] = None,
|
desktop_command: Optional[list[str]] = None,
|
||||||
steam_command: Optional[list[str]] = None,
|
steam_command: Optional[list[str]] = None,
|
||||||
enable_mock: bool = False,
|
enable_mock: bool = False,
|
||||||
|
enable_trace: bool = False,
|
||||||
):
|
):
|
||||||
self.hermes_ws_url = hermes_ws_url
|
self.hermes_ws_url = hermes_ws_url
|
||||||
self.desktop_command = desktop_command or DEFAULT_MCP_DESKTOP_COMMAND
|
self.desktop_command = desktop_command or DEFAULT_MCP_DESKTOP_COMMAND
|
||||||
self.steam_command = steam_command or DEFAULT_MCP_STEAM_COMMAND
|
self.steam_command = steam_command or DEFAULT_MCP_STEAM_COMMAND
|
||||||
self.enable_mock = enable_mock
|
self.enable_mock = enable_mock
|
||||||
|
self.enable_trace = enable_trace
|
||||||
|
|
||||||
# MCP clients
|
# MCP clients
|
||||||
self.desktop_mcp: Optional[MCPClient] = None
|
self.desktop_mcp: Optional[MCPClient] = None
|
||||||
@@ -284,6 +288,9 @@ class BannerlordHarness:
|
|||||||
self.cycle_count = 0
|
self.cycle_count = 0
|
||||||
self.running = False
|
self.running = False
|
||||||
|
|
||||||
|
# Session trace logger
|
||||||
|
self.trace_logger: Optional[BannerlordTraceLogger] = None
|
||||||
|
|
||||||
# ═══ LIFECYCLE ═══
|
# ═══ LIFECYCLE ═══
|
||||||
|
|
||||||
async def start(self) -> bool:
|
async def start(self) -> bool:
|
||||||
@@ -314,6 +321,15 @@ class BannerlordHarness:
|
|||||||
# Connect to Hermes WebSocket
|
# Connect to Hermes WebSocket
|
||||||
await self._connect_hermes()
|
await self._connect_hermes()
|
||||||
|
|
||||||
|
# Initialize trace logger if enabled
|
||||||
|
if self.enable_trace:
|
||||||
|
self.trace_logger = BannerlordTraceLogger(
|
||||||
|
harness_session_id=self.session_id,
|
||||||
|
hermes_session_id=self.session_id,
|
||||||
|
)
|
||||||
|
self.trace_logger.start_session()
|
||||||
|
log.info(f"Trace logger started: {self.trace_logger.trace_id}")
|
||||||
|
|
||||||
log.info("Harness initialized successfully")
|
log.info("Harness initialized successfully")
|
||||||
return True
|
return True
|
||||||
|
|
||||||
@@ -322,6 +338,12 @@ class BannerlordHarness:
|
|||||||
self.running = False
|
self.running = False
|
||||||
log.info("Shutting down harness...")
|
log.info("Shutting down harness...")
|
||||||
|
|
||||||
|
# Finalize trace logger
|
||||||
|
if self.trace_logger:
|
||||||
|
manifest = self.trace_logger.finish_session()
|
||||||
|
log.info(f"Trace saved: {manifest.trace_file}")
|
||||||
|
log.info(f"Manifest: {self.trace_logger.manifest_file}")
|
||||||
|
|
||||||
if self.desktop_mcp:
|
if self.desktop_mcp:
|
||||||
self.desktop_mcp.stop()
|
self.desktop_mcp.stop()
|
||||||
if self.steam_mcp:
|
if self.steam_mcp:
|
||||||
@@ -707,6 +729,11 @@ class BannerlordHarness:
|
|||||||
self.cycle_count = iteration
|
self.cycle_count = iteration
|
||||||
log.info(f"\n--- ODA Cycle {iteration + 1}/{max_iterations} ---")
|
log.info(f"\n--- ODA Cycle {iteration + 1}/{max_iterations} ---")
|
||||||
|
|
||||||
|
# Start trace cycle
|
||||||
|
trace_cycle = None
|
||||||
|
if self.trace_logger:
|
||||||
|
trace_cycle = self.trace_logger.begin_cycle(iteration)
|
||||||
|
|
||||||
# 1. OBSERVE: Capture state
|
# 1. OBSERVE: Capture state
|
||||||
log.info("[OBSERVE] Capturing game state...")
|
log.info("[OBSERVE] Capturing game state...")
|
||||||
state = await self.capture_state()
|
state = await self.capture_state()
|
||||||
@@ -715,11 +742,24 @@ class BannerlordHarness:
|
|||||||
log.info(f" Screen: {state.visual.screen_size}")
|
log.info(f" Screen: {state.visual.screen_size}")
|
||||||
log.info(f" Players online: {state.game_context.current_players_online}")
|
log.info(f" Players online: {state.game_context.current_players_online}")
|
||||||
|
|
||||||
|
# Populate trace with observation data
|
||||||
|
if trace_cycle:
|
||||||
|
trace_cycle.screenshot_path = state.visual.screenshot_path or ""
|
||||||
|
trace_cycle.window_found = state.visual.window_found
|
||||||
|
trace_cycle.screen_size = list(state.visual.screen_size)
|
||||||
|
trace_cycle.mouse_position = list(state.visual.mouse_position)
|
||||||
|
trace_cycle.playtime_hours = state.game_context.playtime_hours
|
||||||
|
trace_cycle.players_online = state.game_context.current_players_online
|
||||||
|
trace_cycle.is_running = state.game_context.is_running
|
||||||
|
|
||||||
# 2. DECIDE: Get actions from decision function
|
# 2. DECIDE: Get actions from decision function
|
||||||
log.info("[DECIDE] Getting actions...")
|
log.info("[DECIDE] Getting actions...")
|
||||||
actions = decision_fn(state)
|
actions = decision_fn(state)
|
||||||
log.info(f" Decision returned {len(actions)} actions")
|
log.info(f" Decision returned {len(actions)} actions")
|
||||||
|
|
||||||
|
if trace_cycle:
|
||||||
|
trace_cycle.actions_planned = actions
|
||||||
|
|
||||||
# 3. ACT: Execute actions
|
# 3. ACT: Execute actions
|
||||||
log.info("[ACT] Executing actions...")
|
log.info("[ACT] Executing actions...")
|
||||||
results = []
|
results = []
|
||||||
@@ -731,6 +771,13 @@ class BannerlordHarness:
|
|||||||
if result.error:
|
if result.error:
|
||||||
log.info(f" Error: {result.error}")
|
log.info(f" Error: {result.error}")
|
||||||
|
|
||||||
|
if trace_cycle:
|
||||||
|
trace_cycle.actions_executed.append(result.to_dict())
|
||||||
|
|
||||||
|
# Finalize trace cycle
|
||||||
|
if trace_cycle:
|
||||||
|
self.trace_logger.finish_cycle(trace_cycle)
|
||||||
|
|
||||||
# Send cycle summary telemetry
|
# Send cycle summary telemetry
|
||||||
await self._send_telemetry({
|
await self._send_telemetry({
|
||||||
"type": "oda_cycle_complete",
|
"type": "oda_cycle_complete",
|
||||||
@@ -836,12 +883,18 @@ async def main():
|
|||||||
default=1.0,
|
default=1.0,
|
||||||
help="Delay between iterations in seconds (default: 1.0)",
|
help="Delay between iterations in seconds (default: 1.0)",
|
||||||
)
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--trace",
|
||||||
|
action="store_true",
|
||||||
|
help="Enable session trace logging to ~/.timmy/traces/bannerlord/",
|
||||||
|
)
|
||||||
args = parser.parse_args()
|
args = parser.parse_args()
|
||||||
|
|
||||||
# Create harness
|
# Create harness
|
||||||
harness = BannerlordHarness(
|
harness = BannerlordHarness(
|
||||||
hermes_ws_url=args.hermes_ws,
|
hermes_ws_url=args.hermes_ws,
|
||||||
enable_mock=args.mock,
|
enable_mock=args.mock,
|
||||||
|
enable_trace=args.trace,
|
||||||
)
|
)
|
||||||
|
|
||||||
try:
|
try:
|
||||||
|
|||||||
234
nexus/bannerlord_trace.py
Normal file
@@ -0,0 +1,234 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Bannerlord Session Trace Logger — First-Replayable Training Material
|
||||||
|
|
||||||
|
Captures one Bannerlord session as a replayable trace:
|
||||||
|
- Timestamps on every cycle
|
||||||
|
- Actions executed with success/failure
|
||||||
|
- World-state evidence (screenshots, Steam stats)
|
||||||
|
- Hermes session/log ID mapping
|
||||||
|
|
||||||
|
Storage: ~/.timmy/traces/bannerlord/trace_<trace_id>.jsonl
Manifest: ~/.timmy/traces/bannerlord/manifest_<trace_id>.json
|
||||||
|
|
||||||
|
Each JSONL line is one ODA cycle with full context.
|
||||||
|
The manifest bundles metadata for replay/eval.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import json
|
||||||
|
import time
|
||||||
|
import uuid
|
||||||
|
from dataclasses import dataclass, field, asdict
|
||||||
|
from datetime import datetime, timezone
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
# Storage root — local-first under ~/.timmy/
|
||||||
|
DEFAULT_TRACE_DIR = Path.home() / ".timmy" / "traces" / "bannerlord"
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class CycleTrace:
|
||||||
|
"""One ODA cycle captured in full."""
|
||||||
|
cycle_index: int
|
||||||
|
timestamp_start: str
|
||||||
|
timestamp_end: str = ""
|
||||||
|
duration_ms: int = 0
|
||||||
|
|
||||||
|
# Observe
|
||||||
|
screenshot_path: str = ""
|
||||||
|
window_found: bool = False
|
||||||
|
screen_size: list[int] = field(default_factory=lambda: [1920, 1080])
|
||||||
|
mouse_position: list[int] = field(default_factory=lambda: [0, 0])
|
||||||
|
playtime_hours: float = 0.0
|
||||||
|
players_online: int = 0
|
||||||
|
is_running: bool = False
|
||||||
|
|
||||||
|
# Decide
|
||||||
|
actions_planned: list[dict] = field(default_factory=list)
|
||||||
|
decision_note: str = ""
|
||||||
|
|
||||||
|
# Act
|
||||||
|
actions_executed: list[dict] = field(default_factory=list)
|
||||||
|
actions_succeeded: int = 0
|
||||||
|
actions_failed: int = 0
|
||||||
|
|
||||||
|
# Metadata
|
||||||
|
hermes_session_id: str = ""
|
||||||
|
hermes_log_id: str = ""
|
||||||
|
harness_session_id: str = ""
|
||||||
|
|
||||||
|
def to_dict(self) -> dict:
|
||||||
|
return asdict(self)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class SessionManifest:
|
||||||
|
"""Top-level metadata for a captured session trace."""
|
||||||
|
trace_id: str
|
||||||
|
harness_session_id: str
|
||||||
|
hermes_session_id: str
|
||||||
|
hermes_log_id: str
|
||||||
|
game: str = "Mount & Blade II: Bannerlord"
|
||||||
|
app_id: int = 261550
|
||||||
|
started_at: str = ""
|
||||||
|
finished_at: str = ""
|
||||||
|
total_cycles: int = 0
|
||||||
|
total_actions: int = 0
|
||||||
|
total_succeeded: int = 0
|
||||||
|
total_failed: int = 0
|
||||||
|
trace_file: str = ""
|
||||||
|
trace_dir: str = ""
|
||||||
|
replay_command: str = ""
|
||||||
|
eval_note: str = ""
|
||||||
|
|
||||||
|
def to_dict(self) -> dict:
|
||||||
|
return asdict(self)
|
||||||
|
|
||||||
|
|
||||||
|
class BannerlordTraceLogger:
|
||||||
|
"""
|
||||||
|
Captures a single Bannerlord session as a replayable trace.
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
logger = BannerlordTraceLogger(hermes_session_id="abc123")
|
||||||
|
logger.start_session()
|
||||||
|
cycle = logger.begin_cycle(0)
|
||||||
|
# ... populate cycle fields ...
|
||||||
|
logger.finish_cycle(cycle)
|
||||||
|
manifest = logger.finish_session()
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
trace_dir: Optional[Path] = None,
|
||||||
|
harness_session_id: str = "",
|
||||||
|
hermes_session_id: str = "",
|
||||||
|
hermes_log_id: str = "",
|
||||||
|
):
|
||||||
|
self.trace_dir = trace_dir or DEFAULT_TRACE_DIR
|
||||||
|
self.trace_dir.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
self.trace_id = f"bl_{datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:6]}"
|
||||||
|
self.harness_session_id = harness_session_id or str(uuid.uuid4())[:8]
|
||||||
|
self.hermes_session_id = hermes_session_id
|
||||||
|
self.hermes_log_id = hermes_log_id
|
||||||
|
|
||||||
|
self.trace_file = self.trace_dir / f"trace_{self.trace_id}.jsonl"
|
||||||
|
self.manifest_file = self.trace_dir / f"manifest_{self.trace_id}.json"
|
||||||
|
|
||||||
|
self.cycles: list[CycleTrace] = []
|
||||||
|
self.started_at: str = ""
|
||||||
|
self.finished_at: str = ""
|
||||||
|
|
||||||
|
def start_session(self) -> str:
|
||||||
|
"""Begin a trace session. Returns trace_id."""
|
||||||
|
self.started_at = datetime.now(timezone.utc).isoformat()
|
||||||
|
return self.trace_id
|
||||||
|
|
||||||
|
def begin_cycle(self, cycle_index: int) -> CycleTrace:
|
||||||
|
"""Start recording one ODA cycle."""
|
||||||
|
cycle = CycleTrace(
|
||||||
|
cycle_index=cycle_index,
|
||||||
|
timestamp_start=datetime.now(timezone.utc).isoformat(),
|
||||||
|
harness_session_id=self.harness_session_id,
|
||||||
|
hermes_session_id=self.hermes_session_id,
|
||||||
|
hermes_log_id=self.hermes_log_id,
|
||||||
|
)
|
||||||
|
return cycle
|
||||||
|
|
||||||
|
def finish_cycle(self, cycle: CycleTrace) -> None:
|
||||||
|
"""Finalize and persist one cycle to the trace file."""
|
||||||
|
cycle.timestamp_end = datetime.now(timezone.utc).isoformat()
|
||||||
|
# Compute duration
|
||||||
|
try:
|
||||||
|
t0 = datetime.fromisoformat(cycle.timestamp_start)
|
||||||
|
t1 = datetime.fromisoformat(cycle.timestamp_end)
|
||||||
|
cycle.duration_ms = int((t1 - t0).total_seconds() * 1000)
|
||||||
|
except (ValueError, TypeError):
|
||||||
|
cycle.duration_ms = 0
|
||||||
|
|
||||||
|
# Count successes/failures
|
||||||
|
cycle.actions_succeeded = sum(
|
||||||
|
1 for a in cycle.actions_executed if a.get("success", False)
|
||||||
|
)
|
||||||
|
cycle.actions_failed = sum(
|
||||||
|
1 for a in cycle.actions_executed if not a.get("success", True)
|
||||||
|
)
|
||||||
|
|
||||||
|
self.cycles.append(cycle)
|
||||||
|
|
||||||
|
# Append to JSONL
|
||||||
|
with open(self.trace_file, "a") as f:
|
||||||
|
f.write(json.dumps(cycle.to_dict()) + "\n")
|
||||||
|
|
||||||
|
def finish_session(self) -> SessionManifest:
|
||||||
|
"""Finalize the session and write the manifest."""
|
||||||
|
self.finished_at = datetime.now(timezone.utc).isoformat()
|
||||||
|
|
||||||
|
total_actions = sum(len(c.actions_executed) for c in self.cycles)
|
||||||
|
total_succeeded = sum(c.actions_succeeded for c in self.cycles)
|
||||||
|
total_failed = sum(c.actions_failed for c in self.cycles)
|
||||||
|
|
||||||
|
manifest = SessionManifest(
|
||||||
|
trace_id=self.trace_id,
|
||||||
|
harness_session_id=self.harness_session_id,
|
||||||
|
hermes_session_id=self.hermes_session_id,
|
||||||
|
hermes_log_id=self.hermes_log_id,
|
||||||
|
started_at=self.started_at,
|
||||||
|
finished_at=self.finished_at,
|
||||||
|
total_cycles=len(self.cycles),
|
||||||
|
total_actions=total_actions,
|
||||||
|
total_succeeded=total_succeeded,
|
||||||
|
total_failed=total_failed,
|
||||||
|
trace_file=str(self.trace_file),
|
||||||
|
trace_dir=str(self.trace_dir),
|
||||||
|
replay_command=(
|
||||||
|
f"python -m nexus.bannerlord_harness --mock --replay {self.trace_file}"
|
||||||
|
),
|
||||||
|
eval_note=(
|
||||||
|
"To replay: load this trace, re-execute each cycle's actions_planned "
|
||||||
|
"against a fresh harness in mock mode, compare actions_executed outcomes. "
|
||||||
|
"Success metric: >=90% action parity between original and replay runs."
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
|
with open(self.manifest_file, "w") as f:
|
||||||
|
json.dump(manifest.to_dict(), f, indent=2)
|
||||||
|
|
||||||
|
return manifest
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
def load_trace(cls, trace_file: Path) -> list[dict]:
|
||||||
|
"""Load a trace JSONL file for replay or analysis."""
|
||||||
|
cycles = []
|
||||||
|
with open(trace_file) as f:
|
||||||
|
for line in f:
|
||||||
|
line = line.strip()
|
||||||
|
if line:
|
||||||
|
cycles.append(json.loads(line))
|
||||||
|
return cycles
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
def load_manifest(cls, manifest_file: Path) -> dict:
|
||||||
|
"""Load a session manifest."""
|
||||||
|
with open(manifest_file) as f:
|
||||||
|
return json.load(f)
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
def list_traces(cls, trace_dir: Optional[Path] = None) -> list[dict]:
|
||||||
|
"""List all available trace sessions."""
|
||||||
|
d = trace_dir or DEFAULT_TRACE_DIR
|
||||||
|
if not d.exists():
|
||||||
|
return []
|
||||||
|
|
||||||
|
traces = []
|
||||||
|
for mf in sorted(d.glob("manifest_*.json")):
|
||||||
|
try:
|
||||||
|
manifest = cls.load_manifest(mf)
|
||||||
|
traces.append(manifest)
|
||||||
|
except (json.JSONDecodeError, IOError):
|
||||||
|
continue
|
||||||
|
return traces
|
||||||
97
nexus/traces/bannerlord/REPLAY.md
Normal file
@@ -0,0 +1,97 @@
|
|||||||
|
# Bannerlord Session Trace — Replay & Eval Guide
|
||||||
|
|
||||||
|
## Storage Layout
|
||||||
|
|
||||||
|
All traces live under `~/.timmy/traces/bannerlord/`:
|
||||||
|
|
||||||
|
```
|
||||||
|
~/.timmy/traces/bannerlord/
|
||||||
|
trace_<trace_id>.jsonl # One line per ODA cycle (full state + actions)
|
||||||
|
manifest_<trace_id>.json # Session metadata, counts, replay command
|
||||||
|
```

## Trace Format (JSONL)

Each line is one ODA cycle:

```json
{
  "cycle_index": 0,
  "timestamp_start": "2026-04-10T20:15:00+00:00",
  "timestamp_end": "2026-04-10T20:15:45+00:00",
  "duration_ms": 45000,

  "screenshot_path": "/tmp/bannerlord_capture_1744320900.png",
  "window_found": true,
  "screen_size": [1920, 1080],
  "mouse_position": [960, 540],
  "playtime_hours": 142.5,
  "players_online": 8421,
  "is_running": true,

  "actions_planned": [{"type": "move_to", "x": 960, "y": 540}],
  "actions_executed": [{"success": true, "action": "move_to", ...}],
  "actions_succeeded": 1,
  "actions_failed": 0,

  "hermes_session_id": "f47ac10b",
  "hermes_log_id": "",
  "harness_session_id": "f47ac10b"
}
```
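
A record in this format can be sanity-checked before replay. The sketch below is not part of the harness: `parse_trace`, `check_cycle`, and `REQUIRED_FIELDS` are hypothetical names. It assumes that `actions_succeeded` and `actions_failed` are plain tallies of the `success` flags in `actions_executed`, which the example record suggests but the guide does not state.

```python
import json

# Subset of fields taken from the example record above.
REQUIRED_FIELDS = {
    "cycle_index", "timestamp_start", "timestamp_end", "duration_ms",
    "actions_planned", "actions_executed",
    "actions_succeeded", "actions_failed",
}


def parse_trace(text: str) -> list[dict]:
    """Parse JSONL text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def check_cycle(cycle: dict) -> list[str]:
    """Return a list of problems found in one cycle record (empty if none)."""
    missing = sorted(REQUIRED_FIELDS - set(cycle))
    if missing:
        return [f"missing field: {name}" for name in missing]
    problems = []
    executed = cycle["actions_executed"]
    succeeded = sum(1 for a in executed if a.get("success"))
    # Assumption: the counters are tallies of the per-action success flags.
    if succeeded != cycle["actions_succeeded"]:
        problems.append("actions_succeeded does not match actions_executed")
    if len(executed) - succeeded != cycle["actions_failed"]:
        problems.append("actions_failed does not match actions_executed")
    return problems


sample_line = json.dumps({
    "cycle_index": 0,
    "timestamp_start": "2026-04-10T20:15:00+00:00",
    "timestamp_end": "2026-04-10T20:15:45+00:00",
    "duration_ms": 45000,
    "actions_planned": [{"type": "move_to", "x": 960, "y": 540}],
    "actions_executed": [{"success": True, "action": "move_to"}],
    "actions_succeeded": 1,
    "actions_failed": 0,
})
cycles = parse_trace(sample_line + "\n\n")
print(check_cycle(cycles[0]))  # []
```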

## Capturing a Trace

```bash
# Run harness with trace logging enabled
cd /path/to/the-nexus
python -m nexus.bannerlord_harness --mock --trace --iterations 3
```

The trace and manifest are written to `~/.timmy/traces/bannerlord/` on harness shutdown.

## Replay Protocol

1. Load a trace: `BannerlordTraceLogger.load_trace(trace_file)`
2. Create a fresh harness in mock mode
3. For each cycle in the trace:
   - Re-execute the `actions_planned` list
   - Compare actual `actions_executed` outcomes against the recorded ones
4. Score: `(matching_actions / total_actions) * 100`

### Eval Criteria

| Score   | Grade    | Meaning                                    |
|---------|----------|--------------------------------------------|
| >= 90%  | PASS     | Replay matches original closely            |
| 70-89%  | PARTIAL  | Some divergence, investigate differences   |
| < 70%   | FAIL     | Significant drift, review action semantics |
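
The score formula and grade bands can be sketched as two small functions. `replay_score` and `grade` are hypothetical names and this is not the harness's own scoring code. The sketch assumes that two actions "match" when their `success` flags agree, that actions are paired by position, and that recorded actions with no replayed counterpart count as mismatches. The table leaves scores between 89% and 90% undefined, so the sketch treats them as PARTIAL.

```python
def replay_score(recorded: list[dict], replayed: list[dict]) -> float:
    """Percentage of recorded actions whose replayed outcome matches."""
    total = len(recorded)
    if total == 0:
        return 100.0  # assumption: an empty trace counts as a trivial match
    matching = sum(
        1
        for old, new in zip(recorded, replayed)
        if old.get("success") == new.get("success")
    )
    # Same value as (matching / total) * 100, multiplied first to limit rounding.
    return 100.0 * matching / total


def grade(score: float) -> str:
    """Map a score to the grades in the Eval Criteria table."""
    if score >= 90:
        return "PASS"
    if score >= 70:
        return "PARTIAL"
    return "FAIL"


recorded = [{"success": True}, {"success": True}, {"success": True}]
replayed = [{"success": True}, {"success": True}, {"success": False}]
print(grade(replay_score(recorded, replayed)))  # FAIL (2 of 3 actions match)
```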

## Replay Script (sketch)

```python
import asyncio
from pathlib import Path

from nexus.bannerlord_trace import BannerlordTraceLogger
from nexus.bannerlord_harness import BannerlordHarness


async def replay(trace_file: Path) -> None:
    # Load trace: one dict per recorded ODA cycle
    cycles = BannerlordTraceLogger.load_trace(trace_file)

    # Replay against a fresh harness in mock mode, with trace logging disabled
    harness = BannerlordHarness(enable_mock=True, enable_trace=False)
    await harness.start()
    try:
        for cycle in cycles:
            for action in cycle["actions_planned"]:
                result = await harness.execute_action(action)
                # Compare result against cycle["actions_executed"]
    finally:
        # Stop the harness even if an action raises
        await harness.stop()


asyncio.run(
    replay(Path.home() / ".timmy" / "traces" / "bannerlord" / "trace_bl_xxx.jsonl")
)
```

## Hermes Session Mapping

The `hermes_session_id` and `hermes_log_id` fields link traces to Hermes session logs.
When a trace is captured during a live Hermes session, populate these fields so
the trace can be correlated with the broader agent conversation context.
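
One way to use this link is to look up every trace recorded under a given Hermes session. The sketch below is illustrative only: `traces_for_session` is a hypothetical helper, and the manifests are written inline here. In the repository the list could come from `BannerlordTraceLogger.list_traces()`, which returns each manifest as a dict.

```python
def traces_for_session(manifests: list[dict], hermes_session_id: str) -> list[str]:
    """Return the trace IDs of manifests recorded under one Hermes session."""
    return [
        m["trace_id"]
        for m in manifests
        if m.get("hermes_session_id") == hermes_session_id
    ]


# Inline stand-ins for manifest dicts; only the fields used here are shown.
manifests = [
    {"trace_id": "bl_20260410_201500_a1b2c3", "hermes_session_id": "f47ac10b"},
    {"trace_id": "bl_example_other", "hermes_session_id": ""},
]
print(traces_for_session(manifests, "f47ac10b"))  # ['bl_20260410_201500_a1b2c3']
```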
18
nexus/traces/bannerlord/sample_manifest.json
Normal file
@@ -0,0 +1,18 @@
{
  "trace_id": "bl_20260410_201500_a1b2c3",
  "harness_session_id": "f47ac10b",
  "hermes_session_id": "f47ac10b",
  "hermes_log_id": "",
  "game": "Mount & Blade II: Bannerlord",
  "app_id": 261550,
  "started_at": "2026-04-10T20:15:00+00:00",
  "finished_at": "2026-04-10T20:17:30+00:00",
  "total_cycles": 3,
  "total_actions": 6,
  "total_succeeded": 6,
  "total_failed": 0,
  "trace_file": "~/.timmy/traces/bannerlord/trace_bl_20260410_201500_a1b2c3.jsonl",
  "trace_dir": "~/.timmy/traces/bannerlord",
  "replay_command": "python -m nexus.bannerlord_harness --mock --replay ~/.timmy/traces/bannerlord/trace_bl_20260410_201500_a1b2c3.jsonl",
  "eval_note": "To replay: load trace, re-execute each cycle's actions_planned against a fresh harness in mock mode, compare actions_executed outcomes. Success metric: >=90% action parity between original and replay runs."
}
3
nexus/traces/bannerlord/sample_trace.jsonl
Normal file
@@ -0,0 +1,3 @@
{"cycle_index": 0, "timestamp_start": "2026-04-10T20:15:00+00:00", "timestamp_end": "2026-04-10T20:15:45+00:00", "duration_ms": 45000, "screenshot_path": "/tmp/bannerlord_capture_1744320900.png", "window_found": true, "screen_size": [1920, 1080], "mouse_position": [960, 540], "playtime_hours": 142.5, "players_online": 8421, "is_running": true, "actions_planned": [{"type": "move_to", "x": 960, "y": 540}, {"type": "press_key", "key": "space"}], "decision_note": "Initial state capture. Move to screen center and press space to advance.", "actions_executed": [{"success": true, "action": "move_to", "params": {"type": "move_to", "x": 960, "y": 540}, "timestamp": "2026-04-10T20:15:30+00:00", "error": null}, {"success": true, "action": "press_key", "params": {"type": "press_key", "key": "space"}, "timestamp": "2026-04-10T20:15:45+00:00", "error": null}], "actions_succeeded": 2, "actions_failed": 0, "hermes_session_id": "f47ac10b", "hermes_log_id": "", "harness_session_id": "f47ac10b"}
{"cycle_index": 1, "timestamp_start": "2026-04-10T20:15:45+00:00", "timestamp_end": "2026-04-10T20:16:30+00:00", "duration_ms": 45000, "screenshot_path": "/tmp/bannerlord_capture_1744320945.png", "window_found": true, "screen_size": [1920, 1080], "mouse_position": [960, 540], "playtime_hours": 142.5, "players_online": 8421, "is_running": true, "actions_planned": [{"type": "press_key", "key": "p"}], "decision_note": "Open party screen to inspect troops.", "actions_executed": [{"success": true, "action": "press_key", "params": {"type": "press_key", "key": "p"}, "timestamp": "2026-04-10T20:16:00+00:00", "error": null}], "actions_succeeded": 1, "actions_failed": 0, "hermes_session_id": "f47ac10b", "hermes_log_id": "", "harness_session_id": "f47ac10b"}
{"cycle_index": 2, "timestamp_start": "2026-04-10T20:16:30+00:00", "timestamp_end": "2026-04-10T20:17:30+00:00", "duration_ms": 60000, "screenshot_path": "/tmp/bannerlord_capture_1744321020.png", "window_found": true, "screen_size": [1920, 1080], "mouse_position": [960, 540], "playtime_hours": 142.5, "players_online": 8421, "is_running": true, "actions_planned": [{"type": "press_key", "key": "escape"}, {"type": "move_to", "x": 500, "y": 300}, {"type": "click", "x": 500, "y": 300}], "decision_note": "Close party screen, click on campaign map settlement.", "actions_executed": [{"success": true, "action": "press_key", "params": {"type": "press_key", "key": "escape"}, "timestamp": "2026-04-10T20:16:45+00:00", "error": null}, {"success": true, "action": "move_to", "params": {"type": "move_to", "x": 500, "y": 300}, "timestamp": "2026-04-10T20:17:00+00:00", "error": null}, {"success": true, "action": "click", "params": {"type": "click", "x": 500, "y": 300}, "timestamp": "2026-04-10T20:17:30+00:00", "error": null}], "actions_succeeded": 3, "actions_failed": 0, "hermes_session_id": "f47ac10b", "hermes_log_id": "", "harness_session_id": "f47ac10b"}