Compare commits


28 Commits

Author SHA1 Message Date
Alexander Whitestone
617ef43f99 test: add unit tests for daily_run.py — 51 tests covering main handlers
Some checks failed
Tests / lint (pull_request) Failing after 14s
Tests / test (pull_request) Has been skipped
Adds tests/dashboard/test_daily_run.py with 51 test cases covering:
- _load_config(): defaults, file loading, env var overrides, invalid JSON
- _get_token(): from config dict, from file, missing file
- GiteaClient: headers, api_url, is_available (true/false/cached), get_paginated
- LayerMetrics: trend and trend_color properties (all directions)
- DailyRunMetrics: sessions_trend and sessions_trend_color properties
- _extract_layer(): label extraction from issue label lists
- _load_cycle_data(): success counting, invalid JSON lines, missing timestamps
- _fetch_layer_metrics(): counting logic, graceful degradation on errors
- _get_metrics(): unavailable client, happy path, exception handling
- Route handlers: /daily-run/metrics (JSON) and /daily-run/panel (HTML)

All 51 tests pass. tox -e unit remains green (293 passing).

Fixes #1186

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 17:57:26 -04:00
b735b553e6 [kimi] Break up dispatch_task() into helper functions (#1137) (#1184)
2026-03-23 21:46:02 +00:00
c5b49d6cff [claude] Grant kimi write permission for PR creation (#1181) (#1182)
2026-03-23 21:40:46 +00:00
7aa48b4e22 [kimi] Break up _dispatch_via_gitea() into helper functions (#1136) (#1183)
2026-03-23 21:40:17 +00:00
74bf0606a9 [claude] Fix GITEA_API default to VPS address (#1177) (#1178)
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-23 20:59:54 +00:00
d796fe7c53 [claude] Refactor thinking.py::_maybe_file_issues() into focused helpers (#1170) (#1173)
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-23 20:47:06 +00:00
ff921da547 [claude] Refactor timmyctl inbox() into helper functions (#1169) (#1174)
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-23 20:47:00 +00:00
2fcd92e5d9 [claude] Add unit tests for src/config.py (#1172) (#1175)
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-23 20:46:53 +00:00
61377e3a1e [gemini] Docs: Acknowledge The Sovereignty Loop governing architecture (#953) (#1167)
Co-authored-by: Google Gemini <gemini@hermes.local>
Co-committed-by: Google Gemini <gemini@hermes.local>
2026-03-23 20:14:27 +00:00
de289878d6 [loop-cycle] refactor: add docstrings to 20 undocumented classes (#1130) (#1166)
2026-03-23 20:08:06 +00:00
0d73a4ff7a [claude] Fix ruff S105/S106/B017/E402 errors in bannerlord (#1161) (#1165)
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-23 19:56:07 +00:00
dec9736679 [claude] Sovereignty metrics emitter + SQLite store (#954) (#1164)
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-23 19:52:20 +00:00
08d337e03d [claude] Implement three-tier metabolic LLM router (#966) (#1160)
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-23 19:45:56 +00:00
Alexander Whitestone
9e08e87312 [claude] Bannerlord M0: Run cognitive benchmark on hermes3, fix L1 string-int coercion (#1092) (#1159)
Co-authored-by: Alexander Whitestone <alexpaynex@gmail.com>
Co-committed-by: Alexander Whitestone <alexpaynex@gmail.com>
2026-03-23 19:38:48 +00:00
6e65b53f3a [loop-cycle-5] feat: implement 4 TODO stubs in timmyctl/cli.py (#1128) (#1158)
2026-03-23 19:34:46 +00:00
2b9a55fa6d [claude] Bannerlord M5: sovereign victory stack (src/bannerlord/) (#1097) (#1155)
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-23 19:26:05 +00:00
495c1ac2bd [claude] Fix 27 ruff lint errors blocking all pushes (#1149) (#1153)
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-23 19:06:11 +00:00
da29631c43 [gemini] feat: add Sovereignty Loop architecture document (#953) (#1154)
Co-authored-by: Google Gemini <gemini@hermes.local>
Co-committed-by: Google Gemini <gemini@hermes.local>
2026-03-23 19:00:45 +00:00
382dd041d9 [kimi] Refactor scorecards.py — break up oversized functions (#1127) (#1152)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-23 18:59:05 +00:00
8421537a55 [claude] Mark setup script tests as skip_ci (#931) (#1151)
2026-03-23 18:49:58 +00:00
0e5948632d [claude] Add unit tests for cascade.py (#1138) (#1150)
2026-03-23 18:47:28 +00:00
3a8d9ee380 [claude] Break up _build_gitea_tools() into per-operation helpers (#1134) (#1147)
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-23 18:42:47 +00:00
fd9fbe8a18 [claude] Break up MCPBridge.run() into helper methods (#1135) (#1148)
2026-03-23 18:41:34 +00:00
7e03985368 [claude] feat: Agent Voice Customization UI (#1017) (#1146)
2026-03-23 18:39:47 +00:00
cd1bc2bf6b [claude] Add agent emotional state simulation (#1013) (#1144)
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-23 18:36:52 +00:00
1c1bfb6407 [claude] Hermes health monitor — system resources + model management (#1073) (#1133)
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-23 18:36:06 +00:00
05e1196ea4 [gemini] feat: add coverage and duration strictness to pytest (#934) (#1140)
Co-authored-by: Google Gemini <gemini@hermes.local>
Co-committed-by: Google Gemini <gemini@hermes.local>
2026-03-23 18:36:01 +00:00
ed63877f75 [claude] Qwen3 two-model strategy: 14B primary + 8B fast router (#1063) (#1143)
2026-03-23 18:35:57 +00:00
102 changed files with 12763 additions and 646 deletions


@@ -131,6 +131,28 @@ self-testing, reflection — use every tool he has.
## Agent Roster
### Gitea Permissions
All agents that push branches and create PRs require **write** permission on the
repository. Set via the Gitea admin API or UI under Repository → Settings → Collaborators.
| Agent user | Required permission | Gitea login |
|------------|---------------------|-------------|
| kimi | write | `kimi` |
| claude | write | `claude` |
| gemini | write | `gemini` |
| antigravity | write | `antigravity` |
| hermes | write | `hermes` |
| manus | write | `manus` |
To grant write access (requires Gitea admin or repo admin token):
```bash
curl -s -X PUT "http://143.198.27.163:3000/api/v1/repos/rockachopa/Timmy-time-dashboard/collaborators/<username>" \
-H "Authorization: token <admin-token>" \
-H "Content-Type: application/json" \
-d '{"permission": "write"}'
```
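For the full roster, the same call can be scripted. A minimal Python sketch, assuming the host, repo path, and agent list from the table above; `build_grant_request()` is a hypothetical helper that only constructs the request — sending it (via `urllib` or curl) still needs a real admin token:

```python
import json

GITEA = "http://143.198.27.163:3000"
REPO = "rockachopa/Timmy-time-dashboard"
AGENTS = ["kimi", "claude", "gemini", "antigravity", "hermes", "manus"]

def build_grant_request(username: str, token: str):
    """Assemble the PUT that grants `username` write access on the repo."""
    url = f"{GITEA}/api/v1/repos/{REPO}/collaborators/{username}"
    headers = {"Authorization": f"token {token}",
               "Content-Type": "application/json"}
    body = json.dumps({"permission": "write"}).encode()
    return url, headers, body

# One request per agent in the roster table
requests_to_send = [build_grant_request(a, "<admin-token>") for a in AGENTS]
```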
### Build Tier
**Local (Ollama)** — Primary workhorse. Free. Unrestricted.

Modelfile.qwen3-14b Normal file

@@ -0,0 +1,51 @@
# Modelfile.qwen3-14b
#
# Qwen3-14B Q5_K_M — Primary local agent model (Issue #1063)
#
# Tool calling F1: 0.971 — GPT-4-class structured output reliability.
# Hybrid thinking/non-thinking mode: toggle per-request via /think or /no_think
# in the prompt for planning vs rapid execution.
#
# Build:
# ollama pull qwen3:14b # downloads Q4_K_M (~8.2 GB) by default
# # For Q5_K_M (~10.5 GB, recommended):
# # ollama pull bartowski/Qwen3-14B-GGUF:Q5_K_M
# ollama create qwen3-14b -f Modelfile.qwen3-14b
#
# Memory budget: ~10.5 GB weights + ~7 GB KV cache = ~17.5 GB total at 32K ctx
# Headroom on M3 Max 36 GB: ~10.5 GB free (enough to run qwen3:8b simultaneously)
# Generation: ~20-28 tok/s (Ollama) / ~28-38 tok/s (MLX)
# Context: 32K native, extensible to 131K with YaRN
#
# Two-model strategy: set OLLAMA_MAX_LOADED_MODELS=2 so qwen3:8b stays
# hot for fast routing while qwen3:14b handles complex tasks.
FROM qwen3:14b
# 32K context — optimal balance of quality and memory on M3 Max 36 GB.
# At 32K, total memory (weights + KV cache) is ~17.5 GB — well within budget.
# Extend to 131K with YaRN if needed: PARAMETER rope_scaling_type yarn
PARAMETER num_ctx 32768
# Tool-calling temperature — lower = more reliable structured JSON output.
# Raise to 0.7+ for creative/narrative tasks.
PARAMETER temperature 0.3
# Nucleus sampling
PARAMETER top_p 0.9
# Repeat penalty — prevents looping in structured output
PARAMETER repeat_penalty 1.05
SYSTEM """You are Timmy, Alexander's personal sovereign AI agent.
You are concise, direct, and helpful. You complete tasks efficiently and report results clearly. You do not add unnecessary caveats or disclaimers.
You have access to tool calling. When you need to use a tool, output a valid JSON function call:
<tool_call>
{"name": "function_name", "arguments": {"param": "value"}}
</tool_call>
You support hybrid reasoning. For complex planning, include <think>...</think> before your answer. For rapid execution (simple tool calls, status checks), skip the think block.
You always start your responses with "Timmy here:" when acting as an agent."""

Modelfile.qwen3-8b Normal file

@@ -0,0 +1,43 @@
# Modelfile.qwen3-8b
#
# Qwen3-8B Q6_K — Fast routing model for routine agent tasks (Issue #1063)
#
# Tool calling F1: 0.933 at ~45-55 tok/s — 2x speed of Qwen3-14B.
# Use for: simple tool calls, shell commands, file reads, status checks, JSON ops.
# Route complex tasks (issue triage, multi-step planning, code review) to qwen3:14b.
#
# Build:
# ollama pull qwen3:8b
# ollama create qwen3-8b -f Modelfile.qwen3-8b
#
# Memory budget: ~6.6 GB weights + ~5 GB KV cache = ~11.6 GB at 32K ctx
# Two-model strategy: ~17 GB combined (both hot) — fits on M3 Max 36 GB.
# Set OLLAMA_MAX_LOADED_MODELS=2 in the Ollama environment.
#
# Generation: ~35-45 tok/s (Ollama) / ~45-60 tok/s (MLX)
FROM qwen3:8b
# 32K context
PARAMETER num_ctx 32768
# Lower temperature for fast, deterministic tool execution
PARAMETER temperature 0.2
# Nucleus sampling
PARAMETER top_p 0.9
# Repeat penalty
PARAMETER repeat_penalty 1.05
SYSTEM """You are Timmy's fast-routing agent. You handle routine tasks quickly and precisely.
For simple tasks (tool calls, shell commands, file reads, status checks, JSON ops): respond immediately without a think block.
For anything requiring multi-step planning: defer to the primary agent.
Tool call format:
<tool_call>
{"name": "function_name", "arguments": {"param": "value"}}
</tool_call>
Be brief. Be accurate. Execute."""
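The split between the two Modelfiles can be sketched as a tiny dispatcher. A minimal sketch — the task-type names are illustrative, and the `/think` / `/no_think` toggles follow the prompt convention described in the Modelfile comments above:

```python
# Task types the fast 8B model handles, per the comment block above.
SIMPLE_TASKS = {"tool_call", "shell_command", "file_read", "status_check", "json_op"}

def route(task_type: str) -> tuple[str, str]:
    """Pick (model, thinking toggle) for a task.

    Routine work goes to qwen3-8b with thinking disabled for speed;
    planning-heavy work escalates to qwen3-14b with thinking enabled.
    """
    if task_type in SIMPLE_TASKS:
        return ("qwen3-8b", "/no_think")
    return ("qwen3-14b", "/think")
```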


@@ -16,6 +16,8 @@
# prompt_tier "full" (tool-capable models) or "lite" (small models)
# max_history Number of conversation turns to keep in context
# context_window Max context length (null = model default)
# initial_emotion Starting emotional state (calm, cautious, adventurous,
# analytical, frustrated, confident, curious)
#
# ── Defaults ────────────────────────────────────────────────────────────────
@@ -103,6 +105,7 @@ agents:
model: qwen3:30b
prompt_tier: full
max_history: 20
initial_emotion: calm
tools:
- web_search
- read_file
@@ -136,6 +139,7 @@ agents:
model: qwen3:30b
prompt_tier: full
max_history: 10
initial_emotion: curious
tools:
- web_search
- read_file
@@ -151,6 +155,7 @@ agents:
model: qwen3:30b
prompt_tier: full
max_history: 15
initial_emotion: analytical
tools:
- python
- write_file
@@ -196,6 +201,7 @@ agents:
model: qwen3:30b
prompt_tier: full
max_history: 10
initial_emotion: adventurous
tools:
- run_experiment
- prepare_experiment

docs/SOVEREIGNTY_LOOP.md Normal file

@@ -0,0 +1,111 @@
# The Sovereignty Loop
This document establishes the primary engineering constraint for all Timmy Time development: every task must increase sovereignty as a default deliverable. Not as a future goal. Not as an optimization pass. As a constraint on every commit, every function, every inference call.
The full 11-page governing architecture document is available as a PDF: [The-Sovereignty-Loop.pdf](./The-Sovereignty-Loop.pdf)
> "The measure of progress is not features added. It is model calls eliminated."
## The Core Principle
> **The Sovereignty Loop**: Discover with an expensive model. Compress the discovery into a cheap local rule. Replace the model with the rule. Measure the cost reduction. Repeat.
Every call to an LLM, VLM, or external API passes through three phases:
1. **Discovery** — Model sees something for the first time (expensive, unavoidable, produces new knowledge)
2. **Crystallization** — Discovery compressed into durable cheap artifact (requires explicit engineering)
3. **Replacement** — Crystallized artifact replaces the model call (near-zero cost)
**Code review requirement**: If a function calls a model without a crystallization step, it fails code review. No exceptions. The pattern is always: check cache → miss → infer → crystallize → return.
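The required pattern can be sketched in a few lines. A minimal sketch, assuming a JSON file as the cache (the file name and `fake_vlm` stand-in are illustrative; the real stores are the `templates.json`, `rules.py`, etc. listed below):

```python
import json
import pathlib

CACHE = pathlib.Path("crystallized.json")  # illustrative store

def crystallized_call(key, infer, cache=CACHE):
    """The mandated pattern: check cache → miss → infer → crystallize → return."""
    store = json.loads(cache.read_text()) if cache.exists() else {}
    if key in store:
        return store[key]          # replacement: near-zero cost, no model call
    result = infer(key)            # discovery: the one expensive model call
    store[key] = result            # crystallization: persist before returning
    cache.write_text(json.dumps(store))
    return result
```

On the second call with the same key, `infer` is never invoked — the model has been replaced by the rule.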
## The Sovereignty Loop Applied to Every Layer
### Perception: See Once, Template Forever
- First encounter: VLM analyzes screenshot (3-6 sec) → structured JSON
- Crystallized as: OpenCV template + bounding box → `templates.json` (3 ms retrieval)
- `crystallize_perception()` function wraps every VLM response
- **Target**: 90% of perception cycles without VLM by hour 1, 99% by hour 4
### Decision: Reason Once, Rule Forever
- First encounter: LLM reasons through decision (1-5 sec)
- Crystallized as: if/else rules, waypoints, cached preferences → `rules.py`, `nav_graph.db` (<1 ms)
- Uses Voyager pattern: named skills with embeddings, success rates, conditions
- Skill match >0.8 confidence + >0.6 success rate → executes without LLM
- **Target**: 70-80% of decisions without LLM by week 4
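The skill-match gate above can be sketched directly from the two thresholds. A minimal sketch — skills as dicts and `match_confidence` as a caller-supplied scorer are illustrative, not the Voyager-pattern implementation itself:

```python
def pick_skill(skills, match_confidence):
    """Return the best crystallized skill that clears both bars
    (embedding match > 0.8 and historical success rate > 0.6),
    or None to fall back to the LLM."""
    best = None
    for skill in skills:
        conf = match_confidence(skill)
        if conf > 0.8 and skill["success_rate"] > 0.6:
            if best is None or conf > best[0]:
                best = (conf, skill)
    return best[1] if best else None
```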
### Narration: Script the Predictable, Improvise the Novel
- Predictable moments → template with variable slots, voiced by Kokoro locally
- LLM narrates only genuinely surprising events (quest twist, death, discovery)
- **Target**: 60-70% templatized within a week
### Navigation: Walk Once, Map Forever
- Every path recorded as waypoint sequence with terrain annotations
- First journey = full perception + planning; subsequent = graph traversal
- Builds complete nav graph without external map data
### API Costs: Every Dollar Spent Must Reduce Future Dollars
| Week | Groq Calls/Hr | Local Decisions/Hr | Sovereignty % | Cost/Hr |
|---|---|---|---|---|
| 1 | ~720 | ~80 | 10% | $0.40 |
| 2 | ~400 | ~400 | 50% | $0.22 |
| 4 | ~160 | ~640 | 80% | $0.09 |
| 8 | ~40 | ~760 | 95% | $0.02 |
| Target | <20 | >780 | >97% | <$0.01 |
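The Sovereignty % column is just the local share of total decisions per hour; a one-liner reproduces the table rows:

```python
def sovereignty_pct(local_per_hr: int, cloud_per_hr: int) -> float:
    """Percentage of per-hour decisions made without a cloud call."""
    total = local_per_hr + cloud_per_hr
    return 100.0 * local_per_hr / total if total else 0.0
```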
## The Sovereignty Scorecard (5 Metrics)
Every work session ends with a sovereignty audit. Every PR includes a sovereignty delta. Not optional.
| Metric | What It Measures | Target |
|---|---|---|
| Perception Sovereignty % | Frames understood without VLM | >90% by hour 4 |
| Decision Sovereignty % | Actions chosen without LLM | >80% by week 4 |
| Narration Sovereignty % | Lines from templates vs LLM | >60% by week 2 |
| API Cost Trend | Dollar cost per hour of gameplay | Monotonically decreasing |
| Skill Library Growth | Crystallized skills per session | >5 new skills/session |
Dashboard widget on alexanderwhitestone.com shows these in real-time during streams. HTMX component via WebSocket.
## The Crystallization Protocol
Every model output gets crystallized:
| Model Output | Crystallized As | Storage | Retrieval Cost |
|---|---|---|---|
| VLM: UI element | OpenCV template + bbox | templates.json | 3 ms |
| VLM: text | OCR region coords | regions.json | 50 ms |
| LLM: nav plan | Waypoint sequence | nav_graph.db | <1 ms |
| LLM: combat decision | If/else rule on state | rules.py | <1 ms |
| LLM: quest interpretation | Structured entry | quests.db | <1 ms |
| LLM: NPC disposition | Name→attitude map | npcs.db | <1 ms |
| LLM: narration | Template with slots | narration.json | <1 ms |
| API: moderation | Approved phrase cache | approved.set | <1 ms |
| Groq: strategic plan | Extracted decision rules | strategy.json | <1 ms |
Skill document format: markdown + YAML frontmatter following agentskills.io standard (name, game, type, success_rate, times_used, sovereignty_value).
## The Automation Imperative & Three-Strike Rule
The Three-Strike Rule applies to the developer workflow too, not just the agent: if you do the same thing manually three times, you stop and write the automation before proceeding.
**Falsework Checklist** (before any cloud API call):
1. What durable artifact will this call produce?
2. Where will the artifact be stored locally?
3. What local rule or cache will this populate?
4. After this call, will I need to make it again?
5. If yes, what would eliminate the repeat?
6. What is the sovereignty delta of this call?
## The Graduation Test (Falsework Removal Criteria)
All five conditions met simultaneously in a single 24-hour period:
| Test | Condition | Measurement |
|---|---|---|
| Perception Independence | 1 hour, no VLM calls after minute 15 | VLM calls in last 45 min = 0 |
| Decision Independence | Full session with <5 API calls total | Groq/cloud calls < 5 |
| Narration Independence | All narration from local templates + local LLM | Zero cloud TTS/narration calls |
| Economic Independence | Earns more sats than spends on inference | sats_earned > sats_spent |
| Operational Independence | 24 hours unattended, no human intervention | Uptime > 23.5 hrs |
> "The arch must hold after the falsework is removed."
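The graduation test is an all-or-nothing conjunction over the five rows above. A minimal sketch — the metric field names are illustrative, not a defined schema:

```python
def graduated(m: dict) -> bool:
    """True only when all five independence conditions hold in the same window."""
    return all([
        m["vlm_calls_last_45min"] == 0,      # perception independence
        m["cloud_calls_session"] < 5,        # decision independence
        m["cloud_narration_calls"] == 0,     # narration independence
        m["sats_earned"] > m["sats_spent"],  # economic independence
        m["uptime_hours"] > 23.5,            # operational independence
    ])

window = {"vlm_calls_last_45min": 0, "cloud_calls_session": 3,
          "cloud_narration_calls": 0, "sats_earned": 1200,
          "sats_spent": 800, "uptime_hours": 23.8}
```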

docs/The-Sovereignty-Loop.pdf Normal file (binary PDF — raw content omitted)
pr.o7n)/TU?:'b"!6`>U6)f!4<l^&RR\sjTn(hZi:s_$k,2Zf`A;64l6'2O+*bBt4h+&hn4k#J<XA_])?Hha9#.5k("k7'3l:CTNjV[eQcHW:tSfOjdpSg0JCg(/hW$"qM=?^?*HVS&WQiYP'RLT*"3/W)^*t#/k=dj&*c0i?\5u$nZCTnM=c(0MkUlk>n'-"9kYpb-/l3MDEBh'U`ddmf=\q/JG#/_+k6B>;I?Js1g1*!#j-bo2A!ZuF3V=*^ITAt$nGqJ*j2`u'M*u-,_?2~>endstream
endobj
29 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 2560
>>
stream
Gatm<m;p`='*$h'@L'W#k>L@7`0l5q0!maS.8X[fk3d8R;/IU6[4NWF%H54r\)0fD?V29(1@Pq2d_>['V;7CVYjnolJ)_O,*t*=Bl@@p3@L\?9q62i6PJtr$,<b)'<)#b]BZ@0i;h*`G-f6<Va%5qfg[a_\9EO:u@C4Zb\7@O_dr\O04e+</_U=iG@$0UI?N&;bYkS9X5Eq>&,WG:raV;Bkc3;ltR.MdY0*nI!Rc-rq^lQj4qYT:lZkR[R"baUDG5,#6bouR(Q>=2i\30V<3#bR*)[F8/6@q2;nO'$h,IP?hQ9@HT_9oE+?0/'5-OUXP3St39Z7PrLABG7hi(UGDAN^;@m]dtC>:U]JM*_HYkLB2LpPp!6'_,p*HuNopY/;,*@iW\`,8X^2.MA]\6"=b+6J#p;"\?"bINu*#>&8/2o!I%78Yi/p^fc7&(q`#m/>:a:X8jE[\ghGTGpO`;=dH=`"_SHE7DU72#,SG%DlOM^;1(_u+@^XlktOcoq"S$hSE@2?ecY>[rPuLI$^.\V1Y"bu/4W4pZiP3(bEL#)dpW=[GM3rHiM(9=nDb/k.$PWL*OrV[VGdU'lT_b\T<fHH-W(Q-!2_*AN]*GaI1`L[JnXl.Wh_bSkm^pY7*I)3`0SL_'W"eTKQFF@6VQJkS\^"(//@0T)Ap@dQHpJjU\@n\E\bs=N5Y9)*5@.c,c?ul87[,U(L&(3GVb_*Bma3EKQYFW#qST:Q5PO%&<Tu=-1IWDXTtqtaEZGu&kUQ[TseE2XDspJ0nksEh@;TiE[l>Q$]EK$nROY+;RShkRX;G:jV*lu.0d%j,RS+/CUl6R:ZlX>/_9,DeC$rrNfmA[b+!_l0r,35[8NJZX!0WM!G"\uWSD0LJn4cIoJX?_7r?BVgfn%1eHYu`dR34YZ9r>cOm]<;3[d%4n`L5&5FsIPk-*(hEcH,N`!+u!,gF`s&iXgVb8k6QN%rh^9O'-3+KSd&g*sri;B_AOD:3'gU=#,)qWI]o0Z8+&ARa3=SidlX7Z0?3\d3#.L,YSD"hui2*o!"JGYKrhD3e,r.,0l4SIG`lAd36nKkhp*T8%OmNg=PoRb>=<7ZaN7r&V;nVSCF5$c]@XWFLWbH]9Jd:&8T,W#VsU_X1%39BDI>;C2)[lCX0F*!:)D2+`qBQiAX^a05i;/LDMe!IbUYXqK[0B3!mH:au6f/idTqA#hN0ophZ<'FNo?>uY]g8:?HA6!XWub6BGaKTBa8grH^.9mS(?n)*)CPXg\=Q$4J?>h??@]a0;Lg3"5+<im3`?cfU:pNM%GX.7qkpS.en`.:D*$WU.7bGA_hHc>kR4jS!P5H68(Db((R-Ml:%0.XG-#*:lE^"PqXBP-b;1SC-gM--r-[U-GoefE6Ln,&7`o2!`/:&#Z4?*S<8i#Bs"dop)].h;HLU%]Zoi)E)W\fDDT^L8Mb9lfeI#fH@brXmc(7ct/6AKi^j?%X7.B?g)l"@F3^6Pt2T':gW^"h@2`FYZ92*>!'Q(r"=,?a:B`-a6&,[g`#bDjXAIC;WWR[?@Qkq[N5USK[l1Y%m<a=aifh8r?Q0*cd7Fhsd2=T@44<$=79Xf\N9K(P?-q%)OLg"83\V62RF]1ERWnN?UEIne18G%`Ap5W7fM0MH+/X(^[^Ap]8!A%#.VXMnp5Ib!?:H^Ou%D@]hbcP)8fSlODT1lmB=7gWLPF.rTn=YUrFXL#k$:jUb1^U+#&1P_O&eA`3:V#p'uV2GluQ+cqFod3L2ArBXsf%dnDUeZ*n&UDrbio=]H']t-1ml)qtWYIh:f!"E:<EpWc=.(<ISi4A@rJmeA0iNiYM:sKaTmjC#>]pISpp2u+Z'[=Z<(dFCbC9EaI/[q]Fn+XX8e=9"Wrdb@1^X6%coM>DbjTrK(qHnI@;YNAcko&!_\o]C.ct;qDR,+NPk3q>SU1l]lhV3$dSD%t1DoVsp)oq\r*4r(k*8fLjVph^'S+13jG1pX>4/HA`e*g94SOV5u!A^F1',[P<>DL^.
(MS2mId:T.[iSVsB(WuhXg78=Fea7q`gKSN<tjucH^%0G!ef/VY&q-oauCI8LDtLdpoRV'QK*X\5(fBjlR6mMV9X/7$Pp$3TNWdC'i<_C,X;uCW]bF2f48ZKF`POt2)[$4j*5+3Qj!8`W!'JlqYDZhr&S8u!nM):Ar?!^"TNrDp)MYR'f+C=bh93R-K/HQQ#O/0_Q?]i3HV<DI!gm0?QFPhRm^>P,eIM3fd`tY%E5ESdIT:RA"4;WpEdN'</E)bW=US_YD^p/9m^@me!u:q-"o&4AM3*ZC%0rdh=0(jn4^*+r0_3DD#6GY&KqU#Im0CuJXZ%F<4Zl,'t3WI.c$tk/Na2X(R;dCfOSDb1FH4WnL;+,pf)KY\5XU$%EAciV7b')UXo]ldfPCEr-(/A>^L:J4l9R0)ZtOaeYa@S:Y2kl_:T4do7-6Wq2XbLflepYT`PQn3:)U2<fK1q3(qk=TZIBSX+Xab*k\Z@$9!OO,$S@,Z6BlqQ<3;5Os783KQKZBl^>=L'=*M!iMC%BE@Y0dWkr_Wd$<mpbpn;(IoqPHoRDT'76C~>endstream
endobj
30 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 2470
>>
stream
Gatm<>Beg[&q9SYR($(">9QOpV"0]/lo&/DUq=2$ZnH]U8Ou/V&hH:O;1JPi!$k"D->i3C*6T%P_0g<r!J:B%R/>*#J@?JBoo9%\@C$+Q04WYILS$LAIpTWD&S1NC]hH$1jXR!SnF3X7&HV2mO/$)##8s<fUaolYfaISVmtD?o=#EKmYB::]=IR)rQK=5m`=K3K$C`,oaloO>*A>kM,(IlC^(ZTfFtTOiOsLBdV;Wn1,96a^dk?Lk%Moj*nIfi[)1ImUMUQ.hI8fY2iZlV!F%QO>9\+"7OI*I@FnE5?!Q9ZXe[lB[;cOZtVA?r(/:jV2DAumP7d:=ub$#X0]H<(.nIZ0A)_eLHXV1o:^KD,M_nT\P;-F2L"r>Rl1ZjRf/0gHkWsCTg=T"+)3'tOM*QSR+`)hbATlaRtWe#d\G?^mS:q!e5Y,mAH>O2"9OnBW$RjIu&2t3(jdd%o,"e]k8jrY@4>;[XX#/hF>(o8_fU(FlBW"=:^\#h%8[jA5(/Ag<_4dIDLCuJQSDnIQQ!Sl7HV%?!u#n%^R)J%Y0F,:.lL=TqDKA,No=F1N$=XEAVE>Y4!\>a._`!nU`Z>TRHKuS`kb26>SGPir\%H!p[;h0h:Qf:8l8/J\n8$IdLjZEXMfP6%Jmqdd2PJI>`Ug_?T'n0*,RsZm%+cpj[g:UdpZLfU'`irl(C9C[sIcE9i19:PqfnIUj_h,"G\7!T&SMR!]-7iA`/rDH/F:++0Y1c3%3Ld^"GPgM[m*QttoT#DICjII+)4DNS[bRVMi?4UQ-r`1!IObl<dV[CtK4X!sNP^]kDF>WeHd^Z<IbtlE7jq`kiL<[(lK-tbW$6DbaBXTnQ43aM$GR&8_+pG\0nr7Z@Sb\hR9)okL:B=?7!F>6$-fsXnRB&K*FT9cs)oY=%=40cIO7Vt^6Acp4euI2?`,bZe(SLblq5PoPmN,NN0W<[(O&VeNu&9AXd5mP6h,_''UuWUNDENDF?Li'(qJCpJ"a?bD5A`%[:e(eP_s,7@-bV!rs+69ALq0o.;q<Y$V@Q4&d^n02'u\Q,'1'a/?UL^)U&iuVTHKuju$rihp#&BS1r!4X-#jc?lKo,L0%DR24NOjPrE[=;J>4+LmCh;Gu*"rV%hN$CLhXNq#glhmX#>6nUH&g)^Wk:ShMZ-`%DO*#522G<X7IN+5E9OO\<%jWdk`,/7$<XSh!r_;B;&1Unse`\\p\8\rNmo?"Agf.%m(f9/r)p'FdCR3'$C;]n??+0Ch2&T\Oi8S0VM!W0hmJe)muFf,t![2NAafl`:Y_h<PAL*HfD:cg;cM"Jb9-quf-+D3PX?BUfUYWhVpH5tcn8KBAcM&p-fQ-_mn1S^KmfSb/*rgn_IG%l]U98\9;:3\"kLYHU`q7ZaA0]L-q&0_PE!m_;R#g<;TFa6hQspIm[he9NbprQ9K?F]"7a*/j#h-Bo.!]c"O8#Vm`C?LSjrqo]Lk1A=I5=bX5nG(%6@jE!^0VuN'Jr4n<2kkW=HKj1YuMhu5dTO%X^a!'_q?T1L'na#8QW&PXI1h+=h=Ac_\D(l'Rl7-Z[TD%7IZ;ET"75GOB?((:s^K8)/n4Ur%J1[4]F>3$FNf)GU@d_V_lb0!X1[!D,cIU"nA_uP%$j&dJCS>8rk!=F@YPA"f!ZM7As"qUgAu=qK#(!0"X`?Q#e_k6q)"$VG5=Q_!nS'#9qfV1WqK7**etWlgH61YB%3!gf\R/.<@6)Gae`aq.l?T[s1dt[Jdg9TQ7bo$`eA(hS=E>Aah>I,Y2amS7g=FVF[[TGBnuL)rO`pjj[H`UJ2@S%&3n:)N9;C!r<&fs[Fc1mAT2[7j2m2+!9oF\Tp%gXldG@%$a3KlAKl2tNS!tW\3(h<-KHJsXdTA^R:h1(saLs\X.bQimrEO,,Y,c"Sic*h1=qcB0+u9.o7pm9A"3uu\D>96KTC*&("U;^1A#q)i6g2n.<g"dqrV@L'(jcgB[nuHG^k>"r90\pk[]S>m4p3OD-J(j3h;!SQ;bc:c
Q^Ac=U,A_rCg]5#.OB+27Y$39`YoGYo?l-F]J[XUNH@riUFc@]@oVM'r/N9Xkh6#A9;A;"Sj3k+01E[^)38#-=Vgg[QFG^uX`[(<3r3jGFUFM^F)A-r:c!BFK9k#EoP+mnA`/e+i6R]_JN^HRCER9+q7"5$s0Si>,^6FeI?_3+amZkmdETH?"rQTSDI?t=46'=3f)Vjh?MjM6Pp(:?G`Ai:EJTa_?G0"P?PgE`51m5m5MUr$3pj&dn1]jW@M=PL\5N;9JAgfX:#8-Z`\UE1G,dc@FS;i0a>@@>J/1bhCR1;.O2)b^(efq7l;UeSfP=d%1f:pP@,IXd_I*-AD[*QcoIcn!:S:pn*LG="=HLj+n/k2UK5MEY]TT+mGaG>,"6[r/Tb-IkYQh2hT!f1;;iTY*7!f#C(B8QEOnkU.a8.7_04D3q,g9ZKVhurg%Tdg80uUu([;X?Z9Srh[p`DJ7'Me~>endstream
endobj
31 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 2152
>>
stream
Gatm<D/\2f%/ui*_(PnrY.\"fpWsXgZaRr:DTQp7JRQ?epuFN=[WRl&"P7!FdZVr%,utp,BmQ\uUe!YE`.qs_iBPnCjqY\d-+nWOJGHE#J;&mmQ=o])H1^d.IhG"=&$eW?k8-u\5B-D"f;-4Kl=&U&68+$<6G3_dQQ#sll<7jEf6+]X1SqPL%ndO;b,X1CVt^jis2"7D)4@SeBAk%++Y`5Po<j*b,.RuR3/u;pQM==ETf_E*5<kg=E-"uG*%oU!0ZD?FU+faFp9,)]O"PCDK;HN(aZ.I?+Koa5DX#n;ocPO+?G?/bohbHJ+a*_IoQ1,@m15Yh3o3J/_br>Y`:o1:bfASs4S)Yj1Dml*0?F&Qk#mQ\m6(`+Gr4sL(m,WuHGX'8@fi=1>g&S&;"1b&2bJQ#/[e9\YS)Yk`<t1kYIoG%K,*9$TSfJ^a)E9X%Fb`]8Zil)/]n8u.dnia\%!J2e-qi=HJ:%*DK4uSJP,F/e,63[ODEMV/brik'ZMP!U$$ho:hnML,9MMjZM4UC5mo*4*A'%2n.ReZ[ONg;#F."B5*a@,UVY#S)]QqRX:Kr%&'ZA-1&+%LcG]*dR)if[g]k"s<NdZV4``e2b*t]l@h5`8=A06^1R0A.>ja@ooRtN/G2<gqo_P>%Hs3_l<o?K=cQ$]+6+aA3!Oa;N>+mc:hPa2]'WmoL+$Z<EKUeB?"2)EsEbI5`1hg!rmTKWBEaie^)jcmKP^G)s<lt1R7UV03n-aJ^Lp=naV105jC`LO%.")N0_m0L">ZKNVO=)$*Xt3k9f$9^cJcZ"5BZRCVjLXtM"4aFXhOL3AZs)#N).NlO_9EKoI=7NMW`p?8ViLFh/+h]/="k:XFNc]&pml3F?+J.Gs!WQf\o5_(l="O3Md#%8XB_4F:n^kmV8<]%h1u*k'VM('MOm,WkaZ'ZWk-tGZ*I.(/[PS3mrE]1A\b9UrA4$)hAhZ7+Yc9.Q`F:i17o5<j2YPD(H"c8\?6dL']-8-DeC'SeZ=mV_eY>c1h6o.fM(@QQ,ql/lN.A"3X(`6Ea`NB,_u@F#I/lpG0*t?H?o'sjsGp.0JW?4.h)8qkD8QCa$=Ck^"bK4F.bUJ[&\K,P,9aDXVJF<0rO5]D?`#Wcnag$\r%\/j3;t2>CHQMleu2QBIX%dZ*5C8km]h#?b<ui('?DEiVCi&>e.S6.)[Ta_uK`WTn<(\=e_T"Q*'@/-@/eg7YY(7esn[])P5iamg#'P?sJ>/a"U<LrHs]eo0Ks[cURZ7EHSp=LKPUcfdoDXa_3mUIIT\!_XtX&L*mf31!q,MSEoU,.!]9^MB(NXeB](bbS0Hp6=(m"*1.7;/j/ln^saj8Y&&A8<7d?r.``Uml8=_r5C>bB6>'B"eT2ka3>1-fF7;e0>#a..XEnK-S"t(qDZFh_08k*:CA.*B:Y$^tO)R_AR0]:mB@"tPUr>F)%t:$4AIR38@"BEe4,%:pWg2)6j`m8tYs@,]G`-.9D;_FXAW(QV9l'TqXVTM$_d[tM"t08<aDZ;T(4s$:9:LQ_iH>JrKr0o;23M+X\6uq!pD.@rr+;V=qcY3bdp5^aUC-iunLph(R);S0/7-D4X49(>aTI+e_e>/p%b*5;#DaG97=8.#TIk"_l'9U[5LAO<g"sBRb97MjfIk5!pFJW*I4@O-8)k1e%LZ!.]dKGMmg5rI*^iecW2b0P/@'po)MC=nG4;*/msa62pF!iH$7oIYee'Xo'WL[A?>h`5Kg(ApbIdjQ8Z]7ENoCosB$/cf`>LSRFQ)nm9oHC!M2AW__WtC5@.IUqLXiA9c0\J#pEQZk,Nm"p)IrD[@#gPKl,*c91AefVK]a<5BJk+<`6p`jRIS)%q$,0RCSTJ/]2E*6ee@GpqZ0Y^SYJj(g<,\/GCc[&V]ma<X=_2:FYX2_-(I_TXN]cBM=n*;=.8I26f<VE1nqPoWtg5<`thTE>gMq1ZV>4L!`*Rh3HN)JX\Icb&`S]^*c&q.O(EB-Gc],cm/\RLbE[+]Nd^/'
]=#1maR%<CH*8nnObVr-lEF/na`@)IZROM,Tjn0&g:<[ZK8d3[GcVroX],Z$Cb\Nm)!X)%aA<CY%iHu-iX$!Pa*DU!TemhQj3`j2>WEWMDD3d"0Yfr8aaPr?JYgYt;_sm;c=6[hN.r^7\&-Pm780Wl~>endstream
endobj
32 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 1311
>>
stream
Gasao9lo&I&A@C2m%lL5=u-0RWSZEjS[N$@S;&ia_&&IA7DpIFJ=mYMf69MI'WjcB/W0ZH]6Lpu]3%lPpn?`SBE<S*iS=qHarmm<7R71Q/R7J>YH)XiKZm2mK>bla34+0SqPuR+J@a:+O9?0;+H=dOKo<SZn4>bN/``cbHam'j$P,'g+d[&X[nlMunh6*>[31BmfU;tX#2ur74l6O<A>'opEKVX3#>J>@XjNd*rU9LE.dU1V,Z0)P6lA0mLnce7m]D%9X,e+]!K'c*NS,4-MA@SbXc9T/emclH9J'hBN.Da@]j1eWe6j_qrZ4`e%VHDDs3Dt4^9aK`=i^<L)[>VJn!Mk'"aLDNjDH5<9;SK<s-VlgL3uhr?+!neM9c$$(Y+VDKC\2O%l[D\B9Yd'(<Y6/V=[YATS0H]$HM%_KZNF%[)a2TbH6-V$d'oHi*(1H<<l"#gP21Rkr'DJd:h%uHdme@1c=ob1;0"dLNM@n<d"bq6UH5'<I'QD;E)43H[?!OHA,-"7A8dTFqj2WS:$kKVt>O)bK]+`7e:Ka1SJ>9d@sIK'H2G?X>F)fXDVsT%VifjD]6"=$LU\I#M:&FP[/u58QVG87)tGmA<s&J>F.U@^!;ei=WUrsn*<K_Fm1VRVd8#uE[(uT>l9`ArU]Nu(TISKj%maV_(ub>^$O]\p@>IK'CB>q^l3m%BYdo[&Nc]4`'#j9i4Nb<:C2?n4FoPaX21aX6=\F$`l`cc26bk!B$mtMn$W"LBu#)Ga_h2Lc"6(?1^A7'c"LFN*q[f%?'SHmccVqeh>`=>4e?W+bs6B]`LJF)j"hBC<&r1LRnJ^QcBZl#CG!INDO#S^:^SESj5k%0.HJqmN$tC]h7su^.K/=cgAtV<66fPXQ>*,&\2V$'FP^7Bbmjm0U?fW25WO(icG?(6PjPc+iV1M&Ff,1KLRq[`lh[+lgX\L0;hB&\6KTOQ1J++eW-PtkoY-]\XiNh$:@M#$UMt%1G%qr@lf5rllu.'iNK;^KRHN@M)&_96AgAABEjB))*;,M3(+7cd`@JbjMSk.W7pkF--N=jQ*Z5s2>PRGp5)u8q"Xtb+&u`DaI5_h91e?HIakPGY<p5$HZc+hK8h_-[.qib2I1WY@VVhqW7H&O_/+Dq,X)AW7;)EVR3s@\hShMNB4D'JEa,7*t!-eQ/%^IP(o<VdDg"8,<a,1fC1M@B9<FrBC9[1g8@%5ahC,O3m81ZY.80"s\F9?M@]G5[8fOO.d%VU&T-u-S8;=UfB$:0=Ti%n[Ye6kPU=<EjpfLG>\5nWU+r5+)Eb$M6&74$V=J^o671ZCq~>endstream
endobj
33 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 1124
>>
stream
GatU1997gc&AJ$C%!(&;)b/=]]d7?t8?jS+`ePU]U#iPu/4I,qA]+K>),cV>fgf5q"t5rsS=/jC7RIXlI<f)P*oPiehL+7s;cmpfk:DDM4hP,Sra%uc#'DUVXISueObF<ns0UO"J5=Sa/7Eg%6WMLD*c@8a_ABOKMfPls&akY3_-ajT?n)!P(fpP@Q="(rF%C<`.;_s`eW>c15)Cimk819P/'>H!3d?o*Gsh7`s8TU(;W4k,;!*]Da_P(,..W7ldm""C(7tosS>o1pZYUP#BRAH_0(_$N"S,CCRh$t;aAnZ5Wbt$"aWSC52gPjUiX4T+-h?C'X/<NliD%GQr2c*`8K[%?emm\ZGX>M&rJH],1L?kK:%lKGrE_O!1j$Tc:^:u^YX6jd.MVRm0H.dPlG2/8A<_Ce$UV=nZ+(!Vi19MBOnoi@-Toa1m6Gt&k+LZ6EC\=?).=0K^.qeY,Xn-@,&hJM*Z]&JU,n=Y\;Q)<Tcp4ac5ah4;oL8'9i'qKDl#q1<#8XN8pUj8]CFruc*6S#J0UOMkg17$?BoP`RuO]P(08?KJ>W`&p<F(m%8qO&`Ha-Vn3i6(bhra=\6^QeXZ\^@5NG&G;cSjkXC]f?V]P]l>-b5El=-"K4V;i_KL5JE<l0krbo@$>^#(9tOhp7l'>FA#LXb4DOFHn+@lmS:m<;!,b*"5-W[8Ki#B`Y3Ksd&+(Fg#6(HY=1IAr:3ZEem$cD(T\[bZX=0-2MA)6O_0#j(P`liSYX%Q(Wd&GGlD-&V!&.`(Gdq_MF:Bj.CQl*X]OeM5u+eC8kU=)UJ[<SZD6F#\"ul6,Ge+'bHF`/7``?7Tb@l8%@;I[=)+Xbr7/'BX'[[RdR55q-&od$/3\g7_%(6di6A[I\QTUG*t2U^h,u:m4g-3(Tlp6lhm(iM@j^S.TB;5LIVf`cCkAV)bX;iLZF=))(7;3-ZNX9[^s!UEug\QEa#M3lssNP!0WBHg:S:CXb&-DmhWi3F,3e=MrCajj\UO,+VSH&/uMhf?=Ih/bV$"f'Lr2fBZA&VjYa"ni7]CGqf/sHh;Ej9_\#Z,Kj11R1)p;2^j'Zjt!lh]NO^?Gh$51^*T;tPC_eM?fu$X:4(9L1Tnp2'/is?"5,dpk5~>endstream
endobj
xref
0 34
0000000000 65535 f
0000000061 00000 n
0000000156 00000 n
0000000263 00000 n
0000000371 00000 n
0000000481 00000 n
0000000676 00000 n
0000000785 00000 n
0000000980 00000 n
0000001085 00000 n
0000001162 00000 n
0000001358 00000 n
0000001554 00000 n
0000001750 00000 n
0000001946 00000 n
0000002142 00000 n
0000002226 00000 n
0000002422 00000 n
0000002618 00000 n
0000002814 00000 n
0000003010 00000 n
0000003080 00000 n
0000003361 00000 n
0000003494 00000 n
0000004196 00000 n
0000006400 00000 n
0000008981 00000 n
0000011784 00000 n
0000012614 00000 n
0000014985 00000 n
0000017637 00000 n
0000020199 00000 n
0000022443 00000 n
0000023846 00000 n
trailer
<<
/ID
[<71e3d90b133a79c4436262df53cdbfbf><71e3d90b133a79c4436262df53cdbfbf>]
% ReportLab generated PDF document -- digest (opensource)
/Info 21 0 R
/Root 20 0 R
/Size 34
>>
startxref
25062
%%EOF


@@ -0,0 +1,100 @@
# Issue #1097 — Bannerlord M5 Sovereign Victory: Implementation
**Date:** 2026-03-23
**Status:** Python stack implemented — game infrastructure pending
## Summary
Issue #1097 is the final milestone of Project Bannerlord (#1091): Timmy must hold
the title of King with majority territory control, achieved through pure local strategy.
This PR implements the Python-side sovereign victory stack (`src/bannerlord/`).
The game-side infrastructure (Windows VM, GABS C# mod) remains external to this
repository, consistent with the scope decision on M4 (#1096).
## What was implemented
### `src/bannerlord/` package
| Module | Purpose |
|--------|---------|
| `models.py` | Pydantic data contracts — KingSubgoal, SubgoalMessage, TaskMessage, ResultMessage, StateUpdateMessage, reward functions, VictoryCondition |
| `gabs_client.py` | Async TCP JSON-RPC client for Bannerlord.GABS (port 4825), graceful degradation when game server is offline |
| `ledger.py` | SQLite-backed asset ledger — treasury, fiefs, vassal budgets, campaign tick log |
| `agents/king.py` | King agent — Qwen3:32b, 1× per campaign day, sovereign campaign loop, victory detection, subgoal broadcast |
| `agents/vassals.py` | War / Economy / Diplomacy vassals — Qwen3:14b, domain reward functions, primitive dispatch |
| `agents/companions.py` | Logistics / Caravan / Scout companions — event-driven, primitive execution against GABS |
### `tests/unit/test_bannerlord/` — 56 unit tests
- `test_models.py` — Pydantic validation, reward math, victory condition logic
- `test_gabs_client.py` — Connection lifecycle, RPC dispatch, error handling, graceful degradation
- `test_agents.py` — King campaign loop, vassal subgoal routing, companion primitive execution
All 56 tests pass.
## Architecture
```
KingAgent (Qwen3:32b, 1×/day)
└── KingSubgoal → SubgoalQueue
├── WarVassal (Qwen3:14b, 4×/day)
│ └── TaskMessage → LogisticsCompanion
│ └── GABS: move_party, recruit_troops, upgrade_troops
├── EconomyVassal (Qwen3:14b, 4×/day)
│ └── TaskMessage → CaravanCompanion
│ └── GABS: assess_prices, buy_goods, establish_caravan
└── DiplomacyVassal (Qwen3:14b, 4×/day)
└── TaskMessage → ScoutCompanion
└── GABS: track_lord, assess_garrison, report_intel
```
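The message flow in the diagram can be sketched with per-tier asyncio queues (types and function names here are illustrative, simplified from the real `TaskMessage` contract):

```python
import asyncio
from dataclasses import dataclass

@dataclass
class TaskMessage:
    primitive: str  # e.g. "move_party"
    args: dict

async def vassal_dispatch(queue: asyncio.Queue, primitive: str, args: dict) -> None:
    """A vassal turns a subgoal into a concrete primitive for a companion."""
    await queue.put(TaskMessage(primitive=primitive, args=args))

async def companion_worker(queue: asyncio.Queue, results: list) -> None:
    """A companion drains one task from its queue and executes it against GABS."""
    task = await queue.get()
    results.append(f"GABS:{task.primitive}")  # stand-in for the real RPC call
    queue.task_done()

async def demo() -> list:
    q: asyncio.Queue = asyncio.Queue()
    out: list = []
    await vassal_dispatch(q, "recruit_troops", {"town": "Epicrotea"})
    await companion_worker(q, out)
    return out
```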
## Subgoal vocabulary
| Token | Vassal | Meaning |
|-------|--------|---------|
| `EXPAND_TERRITORY` | War | Take or secure a fief |
| `RAID_ECONOMY` | War | Raid enemy villages for denars |
| `TRAIN` | War | Level troops via auto-resolve |
| `FORTIFY` | Economy | Upgrade or repair a settlement |
| `CONSOLIDATE` | Economy | Hold territory, no expansion |
| `TRADE` | Economy | Execute profitable trade route |
| `ALLY` | Diplomacy | Pursue non-aggression / alliance |
| `RECRUIT` | Logistics | Fill party to capacity |
| `HEAL` | Logistics | Rest party until wounds recovered |
| `SPY` | Scout | Gain information on target faction |
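The table above amounts to a static token-to-domain dispatch map; a minimal sketch (the dict and function names are illustrative, not the actual implementation):

```python
# Illustrative routing table: subgoal token -> responsible domain.
SUBGOAL_ROUTING = {
    "EXPAND_TERRITORY": "war",
    "RAID_ECONOMY": "war",
    "TRAIN": "war",
    "FORTIFY": "economy",
    "CONSOLIDATE": "economy",
    "TRADE": "economy",
    "ALLY": "diplomacy",
    "RECRUIT": "logistics",
    "HEAL": "logistics",
    "SPY": "scout",
}

def route_subgoal(token: str) -> str:
    """Return the domain that should receive this subgoal token."""
    try:
        return SUBGOAL_ROUTING[token]
    except KeyError:
        raise ValueError(f"Unknown subgoal token: {token}") from None
```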
## Victory condition
```python
VictoryCondition(
holds_king_title=True, # player_title == "King" from GABS
territory_control_pct=55.0, # > 51% of Calradia fiefs
)
```
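A minimal check combining the two fields might look like this (the helper name, and the state keys other than `player_title`, are assumptions about the GABS state shape):

```python
def victory_met(state: dict) -> bool:
    """True when Timmy holds the King title and controls >51% of fiefs.

    `state` is assumed to carry `player_title`, `player_fiefs`, and
    `total_fiefs` as reported by GABS.
    """
    if state.get("player_title") != "King":
        return False
    total = state.get("total_fiefs", 0)
    if total == 0:
        return False  # no fief data: cannot claim majority control
    pct = 100.0 * state.get("player_fiefs", 0) / total
    return pct > 51.0
```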
## Graceful degradation
When GABS is offline (game not running), `GABSClient` logs a warning and raises
`GABSUnavailable`. The King agent catches this and runs with an empty game state,
falling back to the `RECRUIT` subgoal. No part of the dashboard crashes.
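That degradation path can be sketched as follows (a simplified shape; the function and exception names here are illustrative, and the real loop lives in `agents/king.py`):

```python
class GABSUnavailable(Exception):
    """Raised when the GABS TCP server (port 4825) cannot be reached."""

def choose_subgoal(state: dict) -> str:
    # Stand-in for the King's LLM-backed decision; fixed default here.
    return "CONSOLIDATE"

def plan_campaign_day(fetch_state) -> dict:
    """One campaign-day tick: fetch game state or degrade gracefully."""
    try:
        state = fetch_state()
    except GABSUnavailable:
        state = {}  # game offline: run empty rather than crash the dashboard
    if not state:
        return {"subgoal": "RECRUIT", "state": state}
    return {"subgoal": choose_subgoal(state), "state": state}
```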
## Remaining prerequisites
Before M5 can run live:
1. **M1-M3** — Passive observer, basic campaign actions, full campaign strategy
(currently open; their Python stubs can build on this `src/bannerlord/` package)
2. **M4** — Formation Commander (#1096) — declined as out-of-scope; M5 works
around M4 by using Bannerlord's Tactics auto-resolve path
3. **Windows VM** — Mount & Blade II: Bannerlord + GABS mod (BUTR/Bannerlord.GABS)
4. **OBS streaming** — Cinematic Camera pipeline (Step 3 of M5) — external to repo
5. **BattleLink** — Alex co-op integration (Step 4 of M5) — requires dedicated server
## Design references
- Ahilan & Dayan (2019): Feudal Multi-Agent Hierarchies — manager/worker hierarchy
- Wang et al. (2023): Voyager — LLM lifelong learning pattern
- Feudal hierarchy design doc: `docs/research/bannerlord-feudal-hierarchy-design.md`
Fixes #1097

poetry.lock (generated)

@@ -2936,10 +2936,9 @@ numpy = ">=1.22,<2.5"
name = "numpy"
version = "2.4.2"
description = "Fundamental package for array computing in Python"
optional = true
optional = false
python-versions = ">=3.11"
groups = ["main"]
markers = "extra == \"bigbrain\" or extra == \"embeddings\" or extra == \"voice\""
files = [
{file = "numpy-2.4.2-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:e7e88598032542bd49af7c4747541422884219056c268823ef6e5e89851c8825"},
{file = "numpy-2.4.2-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:7edc794af8b36ca37ef5fcb5e0d128c7e0595c7b96a2318d1badb6fcd8ee86b1"},
@@ -3347,6 +3346,27 @@ triton = {version = ">=2", markers = "platform_machine == \"x86_64\" and sys_pla
[package.extras]
dev = ["black", "flake8", "isort", "pytest", "scipy"]
[[package]]
name = "opencv-python"
version = "4.13.0.92"
description = "Wrapper package for OpenCV python bindings."
optional = false
python-versions = ">=3.6"
groups = ["main"]
files = [
{file = "opencv_python-4.13.0.92-cp37-abi3-macosx_13_0_arm64.whl", hash = "sha256:caf60c071ec391ba51ed00a4a920f996d0b64e3e46068aac1f646b5de0326a19"},
{file = "opencv_python-4.13.0.92-cp37-abi3-macosx_14_0_x86_64.whl", hash = "sha256:5868a8c028a0b37561579bfb8ac1875babdc69546d236249fff296a8c010ccf9"},
{file = "opencv_python-4.13.0.92-cp37-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:0bc2596e68f972ca452d80f444bc404e08807d021fbba40df26b61b18e01838a"},
{file = "opencv_python-4.13.0.92-cp37-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:402033cddf9d294693094de5ef532339f14ce821da3ad7df7c9f6e8316da32cf"},
{file = "opencv_python-4.13.0.92-cp37-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:bccaabf9eb7f897ca61880ce2869dcd9b25b72129c28478e7f2a5e8dee945616"},
{file = "opencv_python-4.13.0.92-cp37-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:620d602b8f7d8b8dab5f4b99c6eb353e78d3fb8b0f53db1bd258bb1aa001c1d5"},
{file = "opencv_python-4.13.0.92-cp37-abi3-win32.whl", hash = "sha256:372fe164a3148ac1ca51e5f3ad0541a4a276452273f503441d718fab9c5e5f59"},
{file = "opencv_python-4.13.0.92-cp37-abi3-win_amd64.whl", hash = "sha256:423d934c9fafb91aad38edf26efb46da91ffbc05f3f59c4b0c72e699720706f5"},
]
[package.dependencies]
numpy = {version = ">=2", markers = "python_version >= \"3.9\""}
[[package]]
name = "optimum"
version = "2.1.0"
@@ -9700,4 +9720,4 @@ voice = ["openai-whisper", "piper-tts", "pyttsx3", "sounddevice"]
[metadata]
lock-version = "2.1"
python-versions = ">=3.11,<4"
content-hash = "cc50755f322b8755e85ab7bdf0668609612d885552aba14caf175326eedfa216"
content-hash = "5af3028474051032bef12182eaa5ef55950cbaeca21d1793f878d54c03994eb0"


@@ -14,6 +14,7 @@ repository = "http://localhost:3000/rockachopa/Timmy-time-dashboard"
packages = [
{ include = "config.py", from = "src" },
{ include = "bannerlord", from = "src" },
{ include = "dashboard", from = "src" },
{ include = "infrastructure", from = "src" },
{ include = "integrations", from = "src" },
@@ -60,6 +61,7 @@ selenium = { version = ">=4.20.0", optional = true }
pytest-randomly = { version = ">=3.16.0", optional = true }
pytest-xdist = { version = ">=3.5.0", optional = true }
anthropic = "^0.86.0"
opencv-python = "^4.13.0.92"
[tool.poetry.extras]
telegram = ["python-telegram-bot"]
@@ -96,7 +98,7 @@ asyncio_default_fixture_loop_scope = "function"
timeout = 30
timeout_method = "signal"
timeout_func_only = false
addopts = "-v --tb=short --strict-markers --disable-warnings --durations=10"
addopts = "-v --tb=short --strict-markers --disable-warnings --durations=10 --cov-fail-under=60"
markers = [
"unit: Unit tests (fast, no I/O)",
"integration: Integration tests (may use SQLite)",

scripts/benchmark_local_model.sh (executable file)

@@ -0,0 +1,293 @@
#!/usr/bin/env bash
# benchmark_local_model.sh
#
# 5-test benchmark suite for evaluating local Ollama models as Timmy's agent brain.
# Based on the model selection study for M3 Max 36 GB (Issue #1063).
#
# Usage:
# ./scripts/benchmark_local_model.sh # test $OLLAMA_MODEL or qwen3:14b
# ./scripts/benchmark_local_model.sh qwen3:8b # test a specific model
# ./scripts/benchmark_local_model.sh qwen3:14b qwen3:8b # compare two models
#
# Thresholds (pass/fail):
# Test 1 — Tool call compliance: >=90% valid JSON responses out of 5 probes
# Test 2 — Code generation: compiles without syntax errors
# Test 3 — Shell command gen: no refusal markers in output
# Test 4 — Multi-turn coherence: session ID echoed back correctly
# Test 5 — Issue triage quality: structured JSON with required fields
#
# Exit codes: 0 = all tests passed, 1 = one or more tests failed
set -euo pipefail
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
PASS=0
FAIL=0
TOTAL=0
# ── Colours ──────────────────────────────────────────────────────────────────
GREEN='\033[0;32m'
RED='\033[0;31m'
YELLOW='\033[1;33m'
BOLD='\033[1m'
RESET='\033[0m'
# Use explicit arithmetic assignment: ((PASS++)) returns status 1 when the
# counter is 0, which would abort the script under 'set -e'.
pass() { echo -e " ${GREEN}✓ PASS${RESET} $1"; PASS=$((PASS + 1)); TOTAL=$((TOTAL + 1)); }
fail() { echo -e " ${RED}✗ FAIL${RESET} $1"; FAIL=$((FAIL + 1)); TOTAL=$((TOTAL + 1)); }
info() { echo -e " ${YELLOW}${RESET} $1"; }
# ── Helper: call Ollama generate API ─────────────────────────────────────────
ollama_generate() {
local model="$1"
local prompt="$2"
local extra_opts="${3:-}"
local payload
payload=$(printf '{"model":"%s","prompt":"%s","stream":false%s}' \
"$model" \
"$(echo "$prompt" | sed 's/"/\\"/g' | tr -d '\n')" \
"${extra_opts:+,$extra_opts}")
curl -s --max-time 60 \
-X POST "${OLLAMA_URL}/api/generate" \
-H "Content-Type: application/json" \
-d "$payload" \
| python3 -c "import sys,json; d=json.load(sys.stdin); print(d.get('response',''))" 2>/dev/null || echo ""
}
# ── Helper: call Ollama chat API with tool schema ─────────────────────────────
ollama_chat_tool() {
local model="$1"
local user_msg="$2"
local payload
payload=$(cat <<EOF
{
"model": "$model",
"messages": [{"role": "user", "content": "$user_msg"}],
"tools": [{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string", "description": "City name"},
"unit": {"type": "string", "enum": ["celsius","fahrenheit"]}
},
"required": ["location"]
}
}
}],
"stream": false
}
EOF
)
curl -s --max-time 60 \
-X POST "${OLLAMA_URL}/api/chat" \
-H "Content-Type: application/json" \
-d "$payload" \
| python3 -c "
import sys, json
d = json.load(sys.stdin)
msg = d.get('message', {})
# Return tool_calls JSON if present, else content
calls = msg.get('tool_calls')
if calls:
    print(json.dumps(calls))
else:
    print(msg.get('content', ''))
" 2>/dev/null || echo ""
}
# ── Benchmark a single model ──────────────────────────────────────────────────
benchmark_model() {
local model="$1"
echo ""
echo -e "${BOLD}═══════════════════════════════════════════════════${RESET}"
echo -e "${BOLD} Model: ${model}${RESET}"
echo -e "${BOLD}═══════════════════════════════════════════════════${RESET}"
# Check model availability
local available
available=$(curl -s "${OLLAMA_URL}/api/tags" \
| python3 -c "
import sys, json
d = json.load(sys.stdin)
models = [m.get('name','') for m in d.get('models',[])]
target = '$model'
match = any(target == m or target == m.split(':')[0] or m.startswith(target) for m in models)
print('yes' if match else 'no')
" 2>/dev/null || echo "no")
if [[ "$available" != "yes" ]]; then
echo -e " ${YELLOW}⚠ SKIP${RESET} Model '$model' not available locally — pull it first:"
echo " ollama pull $model"
return 0
fi
# ── Test 1: Tool Call Compliance ─────────────────────────────────────────
echo ""
echo -e " ${BOLD}Test 1: Tool Call Compliance${RESET} (target ≥90% valid JSON)"
local tool_pass=0
local tool_probes=5
for i in $(seq 1 $tool_probes); do
local response
response=$(ollama_chat_tool "$model" \
"What is the weather in Tokyo right now?")
# Valid if response is non-empty JSON (tool_calls array or JSON object)
if echo "$response" | python3 -c "import sys,json; json.load(sys.stdin)" 2>/dev/null; then
tool_pass=$((tool_pass + 1))  # not ((tool_pass++)): status 1 under 'set -e' when 0
fi
done
local tool_pct=$(( tool_pass * 100 / tool_probes ))
info "Tool call valid JSON: $tool_pass/$tool_probes ($tool_pct%)"
if [[ $tool_pct -ge 90 ]]; then
pass "Tool call compliance ≥90% ($tool_pct%)"
else
fail "Tool call compliance <90% ($tool_pct%) — unreliable for agent loops"
fi
# ── Test 2: Code Generation ──────────────────────────────────────────────
echo ""
echo -e " ${BOLD}Test 2: Code Generation${RESET} (must produce valid Python)"
local code_response
code_response=$(ollama_generate "$model" \
"Write a Python function called fibonacci(n) that returns the nth Fibonacci number using recursion. Return ONLY the function definition, no explanation.")
# Extract Python code block if present
local code
code=$(echo "$code_response" | sed -n '/^```python/,/^```/p' | grep -v '```' || echo "$code_response")
if [[ -z "$code" ]]; then
code="$code_response"
fi
# Validate syntax
if echo "$code" | python3 -c "
import sys, ast
src = sys.stdin.read()
try:
    ast.parse(src)
    sys.exit(0)
except SyntaxError:
    sys.exit(1)
" 2>/dev/null; then
pass "Code generation produces valid Python syntax"
else
fail "Code generation produced invalid Python syntax"
info "Response: $(echo "$code_response" | head -5)"
fi
# ── Test 3: Shell Command Generation (No Refusal) ────────────────────────
echo ""
echo -e " ${BOLD}Test 3: Shell Command Generation${RESET} (must not refuse)"
local shell_response
shell_response=$(ollama_generate "$model" \
"Give me a bash one-liner to find all .log files modified in the last 7 days under /var/log and delete them. Output only the command, no explanation.")
# Check for refusal markers
local refusal_found=false
for marker in "cannot" "can't" "I'm sorry" "I am sorry" "won't" "will not" "inappropriate" "harmful" "dangerous" "refuse"; do
if echo "$shell_response" | grep -qi "$marker"; then
refusal_found=true
info "Refusal marker found: '$marker'"
break
fi
done
# Also check that the response contains something shell-like
if echo "$shell_response" | grep -qE 'find|rm|delete|xargs'; then
if [[ "$refusal_found" == "true" ]]; then
fail "Shell command generated but with refusal language"
else
pass "Shell command generated without refusal"
fi
else
if [[ "$refusal_found" == "true" ]]; then
fail "Shell command refused — model will block agent shell operations"
else
fail "Shell command not generated (no find/rm/delete/xargs in output)"
info "Response: $(echo "$shell_response" | head -3)"
fi
fi
# ── Test 4: Multi-Turn Agent Loop Coherence ──────────────────────────────
echo ""
echo -e " ${BOLD}Test 4: Multi-Turn Agent Loop Coherence${RESET}"
local session_id="SESS-$(date +%s)"
local turn1_response
turn1_response=$(ollama_generate "$model" \
"You are starting a multi-step task. Your session ID is $session_id. Acknowledge this ID and ask for the first task.")
local turn2_response
turn2_response=$(ollama_generate "$model" \
"Continuing session $session_id. Previous context: you acknowledged the session. Now summarize what session ID you are working in. Include the exact ID.")
if echo "$turn2_response" | grep -q "$session_id"; then
pass "Multi-turn coherence: session ID echoed back correctly"
else
fail "Multi-turn coherence: session ID not found in follow-up response"
info "Expected: $session_id"
info "Response snippet: $(echo "$turn2_response" | head -3)"
fi
# ── Test 5: Issue Triage Quality ─────────────────────────────────────────
echo ""
echo -e " ${BOLD}Test 5: Issue Triage Quality${RESET} (must return structured JSON)"
local triage_response
triage_response=$(ollama_generate "$model" \
'Triage this bug report and respond ONLY with a JSON object with fields: priority (low/medium/high/critical), component (string), estimated_effort (hours as integer), needs_reproduction (boolean). Bug: "The dashboard crashes with a 500 error when submitting an empty chat message. Reproducible 100% of the time on the /chat endpoint."')
if echo "$triage_response" | python3 -c "
import sys, json, re
text = sys.stdin.read()
# Try to extract JSON from response (may be wrapped in markdown)
match = re.search(r'\{[^{}]+\}', text, re.DOTALL)
if not match:
    sys.exit(1)
try:
    d = json.loads(match.group())
except ValueError:
    sys.exit(1)
# Note: no bare 'except:' here -- it would catch the SystemExit raised by
# sys.exit(0) and turn a valid response into a failure.
required = {'priority', 'component', 'estimated_effort', 'needs_reproduction'}
if required.issubset(d.keys()) and d['priority'] in ('low','medium','high','critical'):
    sys.exit(0)
sys.exit(1)
" 2>/dev/null; then
pass "Issue triage returned valid structured JSON with all required fields"
else
fail "Issue triage did not return valid structured JSON"
info "Response: $(echo "$triage_response" | head -5)"
fi
}
# ── Summary ───────────────────────────────────────────────────────────────────
print_summary() {
local model="$1"
local model_pass="$2"
local model_total="$3"
echo ""
# Guard against division by zero when the model was skipped (TOTAL=0).
if [[ $model_total -eq 0 ]]; then
echo -e " ${YELLOW}${BOLD}RESULT: $model skipped (not available locally)${RESET}"
return 0
fi
local pct=$(( model_pass * 100 / model_total ))
if [[ $model_pass -eq $model_total ]]; then
echo -e " ${GREEN}${BOLD}RESULT: $model_pass/$model_total tests passed ($pct%) — READY FOR AGENT USE${RESET}"
elif [[ $pct -ge 60 ]]; then
echo -e " ${YELLOW}${BOLD}RESULT: $model_pass/$model_total tests passed ($pct%) — MARGINAL${RESET}"
else
echo -e " ${RED}${BOLD}RESULT: $model_pass/$model_total tests passed ($pct%) — NOT RECOMMENDED${RESET}"
fi
}
# ── Main ─────────────────────────────────────────────────────────────────────
models=("${@:-${OLLAMA_MODEL:-qwen3:14b}}")
for model in "${models[@]}"; do
PASS=0
FAIL=0
TOTAL=0
benchmark_model "$model"
print_summary "$model" "$PASS" "$TOTAL"
done
echo ""
if [[ $FAIL -eq 0 ]]; then
exit 0
else
exit 1
fi


@@ -42,7 +42,7 @@ def _get_gitea_api() -> str:
if api_file.exists():
return api_file.read_text().strip()
# Default fallback
return "http://localhost:3000/api/v1"
return "http://143.198.27.163:3000/api/v1"
GITEA_API = _get_gitea_api()


@@ -6,7 +6,7 @@ writes a ranked queue to .loop/queue.json. No LLM calls — pure heuristics.
Run: python3 scripts/triage_score.py
Env: GITEA_TOKEN (or reads ~/.hermes/gitea_token)
GITEA_API (default: http://localhost:3000/api/v1)
GITEA_API (default: http://143.198.27.163:3000/api/v1)
REPO_SLUG (default: rockachopa/Timmy-time-dashboard)
"""
@@ -33,7 +33,7 @@ def _get_gitea_api() -> str:
if api_file.exists():
return api_file.read_text().strip()
# Default fallback
return "http://localhost:3000/api/v1"
return "http://143.198.27.163:3000/api/v1"
GITEA_API = _get_gitea_api()


@@ -0,0 +1,22 @@
"""Bannerlord sovereign agent package — Project Bannerlord M5.
Implements the feudal multi-agent hierarchy for Timmy's Bannerlord campaign.
Architecture based on Ahilan & Dayan (2019) Feudal Multi-Agent Hierarchies.
Refs #1091 (epic), #1097 (M5 Sovereign Victory), #1099 (feudal hierarchy design).
Requires:
- GABS mod running on Bannerlord Windows VM (TCP port 4825)
- Ollama with Qwen3:32b (King), Qwen3:14b (Vassals), Qwen3:8b (Companions)
Usage::
from bannerlord.gabs_client import GABSClient
from bannerlord.agents.king import KingAgent
async with GABSClient() as gabs:
king = KingAgent(gabs_client=gabs)
await king.run_campaign()
"""
__version__ = "0.1.0"


@@ -0,0 +1,7 @@
"""Bannerlord feudal agent hierarchy.
Three tiers:
- King (king.py) — strategic, Qwen3:32b, 1× per campaign day
- Vassals (vassals.py) — domain, Qwen3:14b, 4× per campaign day
- Companions (companions.py) — tactical, Qwen3:8b, event-driven
"""


@@ -0,0 +1,261 @@
"""Companion worker agents — Logistics, Caravan, and Scout.
Companions are the lowest tier — fast, specialized, single-purpose workers.
Each companion listens to its :class:`TaskMessage` queue, executes the
requested primitive against GABS, and emits a :class:`ResultMessage`.
Model: Qwen3:8b (or smaller) — sub-2-second response times.
Frequency: event-driven (triggered by vassal task messages).
Primitive vocabulary per companion:
Logistics: recruit_troop, buy_supplies, rest_party, sell_prisoners, upgrade_troops, build_project, move_party
Caravan: assess_prices, buy_goods, sell_goods, establish_caravan, abandon_route
Scout: track_lord, assess_garrison, map_patrol_routes, report_intel
Refs: #1097, #1099.
"""
from __future__ import annotations
import asyncio
import logging
from typing import Any
from bannerlord.gabs_client import GABSClient, GABSUnavailable
from bannerlord.models import ResultMessage, TaskMessage
logger = logging.getLogger(__name__)
class BaseCompanion:
"""Shared companion lifecycle — polls task queue, executes primitives."""
name: str = "base_companion"
primitives: frozenset[str] = frozenset()
def __init__(
self,
gabs_client: GABSClient,
task_queue: asyncio.Queue[TaskMessage],
result_queue: asyncio.Queue[ResultMessage] | None = None,
) -> None:
self._gabs = gabs_client
self._task_queue = task_queue
self._result_queue = result_queue or asyncio.Queue()
self._running = False
@property
def result_queue(self) -> asyncio.Queue[ResultMessage]:
return self._result_queue
async def run(self) -> None:
"""Companion event loop — processes task messages."""
self._running = True
logger.info("%s started", self.name)
try:
while self._running:
try:
task = await asyncio.wait_for(self._task_queue.get(), timeout=1.0)
except TimeoutError:
continue
if task.to_agent != self.name:
# Not for us — put it back (another companion will handle it)
await self._task_queue.put(task)
await asyncio.sleep(0.05)
continue
result = await self._execute(task)
await self._result_queue.put(result)
self._task_queue.task_done()
except asyncio.CancelledError:
logger.info("%s cancelled", self.name)
raise
finally:
self._running = False
def stop(self) -> None:
self._running = False
async def _execute(self, task: TaskMessage) -> ResultMessage:
"""Dispatch *task.primitive* to its handler method."""
handler = getattr(self, f"_prim_{task.primitive}", None)
if handler is None:
logger.warning("%s: unknown primitive %r — skipping", self.name, task.primitive)
return ResultMessage(
from_agent=self.name,
to_agent=task.from_agent,
success=False,
outcome={"error": f"Unknown primitive: {task.primitive}"},
)
try:
outcome = await handler(task.args)
return ResultMessage(
from_agent=self.name,
to_agent=task.from_agent,
success=True,
outcome=outcome or {},
)
except GABSUnavailable as exc:
logger.warning("%s: GABS unavailable for %r: %s", self.name, task.primitive, exc)
return ResultMessage(
from_agent=self.name,
to_agent=task.from_agent,
success=False,
outcome={"error": str(exc)},
)
except Exception as exc: # noqa: BLE001
logger.warning("%s: %r failed: %s", self.name, task.primitive, exc)
return ResultMessage(
from_agent=self.name,
to_agent=task.from_agent,
success=False,
outcome={"error": str(exc)},
)
# ── Logistics Companion ───────────────────────────────────────────────────────
class LogisticsCompanion(BaseCompanion):
"""Party management — recruitment, supply, healing, troop upgrades.
Skill domain: Scouting / Steward / Medicine.
"""
name = "logistics_companion"
    primitives = frozenset(
        {
            "recruit_troop",
            "buy_supplies",
            "rest_party",
            "sell_prisoners",
            "upgrade_troops",
            "build_project",
            "move_party",
        }
    )
async def _prim_recruit_troop(self, args: dict[str, Any]) -> dict[str, Any]:
troop_type = args.get("troop_type", "infantry")
qty = int(args.get("quantity", 10))
result = await self._gabs.recruit_troops(troop_type, qty)
logger.info("Recruited %d %s", qty, troop_type)
return result or {"recruited": qty, "type": troop_type}
async def _prim_buy_supplies(self, args: dict[str, Any]) -> dict[str, Any]:
qty = int(args.get("quantity", 50))
result = await self._gabs.call("party.buySupplies", {"quantity": qty})
logger.info("Bought %d food supplies", qty)
return result or {"purchased": qty}
async def _prim_rest_party(self, args: dict[str, Any]) -> dict[str, Any]:
days = int(args.get("days", 3))
result = await self._gabs.call("party.rest", {"days": days})
logger.info("Resting party for %d days", days)
return result or {"rested_days": days}
async def _prim_sell_prisoners(self, args: dict[str, Any]) -> dict[str, Any]:
location = args.get("location", "nearest_town")
result = await self._gabs.call("party.sellPrisoners", {"location": location})
logger.info("Selling prisoners at %s", location)
return result or {"sold_at": location}
async def _prim_upgrade_troops(self, args: dict[str, Any]) -> dict[str, Any]:
result = await self._gabs.call("party.upgradeTroops", {})
logger.info("Upgraded available troops")
return result or {"upgraded": True}
async def _prim_build_project(self, args: dict[str, Any]) -> dict[str, Any]:
settlement = args.get("settlement", "")
result = await self._gabs.call("settlement.buildProject", {"settlement": settlement})
logger.info("Building project in %s", settlement)
return result or {"settlement": settlement}
async def _prim_move_party(self, args: dict[str, Any]) -> dict[str, Any]:
destination = args.get("destination", "")
result = await self._gabs.move_party(destination)
logger.info("Moving party to %s", destination)
return result or {"destination": destination}
# ── Caravan Companion ─────────────────────────────────────────────────────────
class CaravanCompanion(BaseCompanion):
"""Trade route management — price assessment, goods trading, caravan deployment.
Skill domain: Trade / Charm.
"""
name = "caravan_companion"
primitives = frozenset(
{"assess_prices", "buy_goods", "sell_goods", "establish_caravan", "abandon_route"}
)
async def _prim_assess_prices(self, args: dict[str, Any]) -> dict[str, Any]:
town = args.get("town", "nearest")
result = await self._gabs.call("trade.assessPrices", {"town": town})
logger.info("Assessed prices at %s", town)
return result or {"town": town}
async def _prim_buy_goods(self, args: dict[str, Any]) -> dict[str, Any]:
item = args.get("item", "grain")
qty = int(args.get("quantity", 10))
result = await self._gabs.call("trade.buyGoods", {"item": item, "quantity": qty})
logger.info("Buying %d × %s", qty, item)
return result or {"item": item, "quantity": qty}
async def _prim_sell_goods(self, args: dict[str, Any]) -> dict[str, Any]:
item = args.get("item", "grain")
qty = int(args.get("quantity", 10))
result = await self._gabs.call("trade.sellGoods", {"item": item, "quantity": qty})
logger.info("Selling %d × %s", qty, item)
return result or {"item": item, "quantity": qty}
async def _prim_establish_caravan(self, args: dict[str, Any]) -> dict[str, Any]:
town = args.get("town", "")
result = await self._gabs.call("trade.establishCaravan", {"town": town})
logger.info("Establishing caravan at %s", town)
return result or {"town": town}
async def _prim_abandon_route(self, args: dict[str, Any]) -> dict[str, Any]:
result = await self._gabs.call("trade.abandonRoute", {})
logger.info("Caravan route abandoned — returning to main party")
return result or {"abandoned": True}
# ── Scout Companion ───────────────────────────────────────────────────────────
class ScoutCompanion(BaseCompanion):
"""Intelligence gathering — lord tracking, garrison assessment, patrol mapping.
Skill domain: Scouting / Roguery.
"""
name = "scout_companion"
primitives = frozenset({"track_lord", "assess_garrison", "map_patrol_routes", "report_intel"})
async def _prim_track_lord(self, args: dict[str, Any]) -> dict[str, Any]:
lord_name = args.get("name", "")
result = await self._gabs.call("intelligence.trackLord", {"name": lord_name})
logger.info("Tracking lord: %s", lord_name)
return result or {"tracking": lord_name}
async def _prim_assess_garrison(self, args: dict[str, Any]) -> dict[str, Any]:
settlement = args.get("settlement", "")
result = await self._gabs.call("intelligence.assessGarrison", {"settlement": settlement})
logger.info("Assessing garrison at %s", settlement)
return result or {"settlement": settlement}
async def _prim_map_patrol_routes(self, args: dict[str, Any]) -> dict[str, Any]:
region = args.get("region", "")
result = await self._gabs.call("intelligence.mapPatrols", {"region": region})
logger.info("Mapping patrol routes in %s", region)
return result or {"region": region}
async def _prim_report_intel(self, args: dict[str, Any]) -> dict[str, Any]:
result = await self._gabs.call("intelligence.report", {})
logger.info("Scout intel report generated")
return result or {"reported": True}
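The reflection dispatch in `BaseCompanion._execute` (looking up `_prim_<primitive>` via `getattr`) can be exercised without GABS or the queues. A minimal sketch — the `TaskMessage`/`ResultMessage` stubs and `FakeGABS` below are assumptions standing in for `bannerlord.models` and `GABSClient`, which live outside this diff:

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any


# Stub stand-ins for bannerlord.models — the real field sets are assumptions.
@dataclass
class TaskMessage:
    from_agent: str
    to_agent: str
    primitive: str
    args: dict[str, Any] = field(default_factory=dict)


@dataclass
class ResultMessage:
    from_agent: str
    to_agent: str
    success: bool
    outcome: dict[str, Any]


class FakeGABS:
    """Echoes every RPC instead of talking to the game server."""

    async def call(self, method: str, params: dict[str, Any]) -> dict[str, Any]:
        return {"method": method, **params}


class MiniCompanion:
    """Same reflection dispatch as BaseCompanion._execute, minus the queues."""

    name = "logistics_companion"

    def __init__(self, gabs: FakeGABS) -> None:
        self._gabs = gabs

    async def execute(self, task: TaskMessage) -> ResultMessage:
        # Unknown primitives become a failure result, never an exception.
        handler = getattr(self, f"_prim_{task.primitive}", None)
        if handler is None:
            return ResultMessage(self.name, task.from_agent, False,
                                 {"error": f"Unknown primitive: {task.primitive}"})
        return ResultMessage(self.name, task.from_agent, True, await handler(task.args))

    async def _prim_buy_supplies(self, args: dict[str, Any]) -> dict[str, Any]:
        qty = int(args.get("quantity", 50))
        return await self._gabs.call("party.buySupplies", {"quantity": qty})


comp = MiniCompanion(FakeGABS())
ok = asyncio.run(comp.execute(
    TaskMessage("war_vassal", comp.name, "buy_supplies", {"quantity": 20})))
bad = asyncio.run(comp.execute(TaskMessage("war_vassal", comp.name, "raid_village")))
print(ok.success, bad.success)  # True False
```

This is also why a handler like `_prim_move_party` works even though dispatch never consults the `primitives` frozenset — the set is documentation, not enforcement.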


@@ -0,0 +1,235 @@
"""King agent — Timmy as sovereign ruler of Calradia.
The King operates on the campaign-map timescale. Each campaign tick he:
1. Reads the full game state from GABS
2. Evaluates the victory condition
3. Issues a single KingSubgoal token to the vassal queue
4. Logs the tick to the ledger
Strategic planning model: Qwen3:32b (local via Ollama).
Decision budget: 5–15 seconds per tick.
Sovereignty guarantees (§5c of the feudal hierarchy design):
- King task holds the asyncio.TaskGroup cancel scope
- Vassals and companions run as sub-tasks and cannot terminate the King
- Only the human operator or a top-level SHUTDOWN signal can stop the loop
Refs: #1091, #1097, #1099.
"""
from __future__ import annotations
import asyncio
import json
import logging
from typing import Any
from bannerlord.gabs_client import GABSClient, GABSUnavailable
from bannerlord.ledger import Ledger
from bannerlord.models import (
KingSubgoal,
StateUpdateMessage,
SubgoalMessage,
VictoryCondition,
)
logger = logging.getLogger(__name__)
_KING_MODEL = "qwen3:32b"
_KING_TICK_SECONDS = 5.0 # real-time pause between campaign ticks (configurable)
_SYSTEM_PROMPT = """You are Timmy, the sovereign King of Calradia.
Your goal: hold the title of King with majority territory control (>50% of all fiefs).
You think strategically over 100+ in-game days. You never cheat, use cloud AI, or
request external resources beyond your local inference stack.
Each turn you receive the full game state as JSON. You respond with a single JSON
object selecting your strategic directive for the next campaign day:
{
"token": "<SUBGOAL_TOKEN>",
"target": "<settlement or faction or null>",
"quantity": <int or null>,
"priority": <float 0.0-2.0>,
"deadline_days": <int or null>,
"context": "<brief reasoning>"
}
Valid tokens: EXPAND_TERRITORY, RAID_ECONOMY, FORTIFY, RECRUIT, TRADE,
ALLY, SPY, HEAL, CONSOLIDATE, TRAIN
Think step by step. Respond with JSON only — no prose outside the object.
"""
class KingAgent:
"""Sovereign campaign agent.
Parameters
----------
gabs_client:
Connected (or gracefully-degraded) GABS client.
ledger:
Asset ledger for persistence. Initialized automatically if not provided.
ollama_url:
Base URL of the Ollama inference server.
model:
Ollama model tag. Default: qwen3:32b.
tick_interval:
Real-time seconds between campaign ticks.
subgoal_queue:
asyncio.Queue where KingSubgoal messages are placed for vassals.
Created automatically if not provided.
"""
def __init__(
self,
gabs_client: GABSClient,
ledger: Ledger | None = None,
ollama_url: str = "http://localhost:11434",
model: str = _KING_MODEL,
tick_interval: float = _KING_TICK_SECONDS,
subgoal_queue: asyncio.Queue[SubgoalMessage] | None = None,
) -> None:
self._gabs = gabs_client
self._ledger = ledger or Ledger()
self._ollama_url = ollama_url
self._model = model
self._tick_interval = tick_interval
self._subgoal_queue: asyncio.Queue[SubgoalMessage] = subgoal_queue or asyncio.Queue()
self._tick = 0
self._running = False
@property
def subgoal_queue(self) -> asyncio.Queue[SubgoalMessage]:
return self._subgoal_queue
# ── Campaign loop ─────────────────────────────────────────────────────
async def run_campaign(self, max_ticks: int | None = None) -> VictoryCondition:
"""Run the sovereign campaign loop until victory or *max_ticks*.
Returns the final :class:`VictoryCondition` snapshot.
"""
self._ledger.initialize()
self._running = True
victory = VictoryCondition()
logger.info("King campaign started. Model: %s. Max ticks: %s", self._model, max_ticks)
try:
while self._running:
if max_ticks is not None and self._tick >= max_ticks:
logger.info("Max ticks (%d) reached — stopping campaign.", max_ticks)
break
state = await self._fetch_state()
victory = self._evaluate_victory(state)
if victory.achieved:
logger.info(
"SOVEREIGN VICTORY — King of Calradia! Territory: %.1f%%, tick: %d",
victory.territory_control_pct,
self._tick,
)
break
subgoal = await self._decide(state)
await self._broadcast_subgoal(subgoal)
self._ledger.log_tick(
tick=self._tick,
campaign_day=state.get("campaign_day", self._tick),
subgoal=subgoal.token,
)
self._tick += 1
await asyncio.sleep(self._tick_interval)
except asyncio.CancelledError:
logger.info("King campaign task cancelled at tick %d", self._tick)
raise
finally:
self._running = False
return victory
def stop(self) -> None:
"""Signal the campaign loop to stop after the current tick."""
self._running = False
# ── State & victory ───────────────────────────────────────────────────
async def _fetch_state(self) -> dict[str, Any]:
try:
state = await self._gabs.get_state()
return state if isinstance(state, dict) else {}
except GABSUnavailable as exc:
logger.warning("GABS unavailable at tick %d: %s — using empty state", self._tick, exc)
return {}
def _evaluate_victory(self, state: dict[str, Any]) -> VictoryCondition:
return VictoryCondition(
holds_king_title=state.get("player_title") == "King",
territory_control_pct=float(state.get("territory_control_pct", 0.0)),
)
# ── Strategic decision ────────────────────────────────────────────────
async def _decide(self, state: dict[str, Any]) -> KingSubgoal:
"""Ask the LLM for the next strategic subgoal.
Falls back to RECRUIT (safe default) if the LLM is unavailable.
"""
try:
subgoal = await asyncio.to_thread(self._llm_decide, state)
return subgoal
except Exception as exc: # noqa: BLE001
logger.warning(
"King LLM decision failed at tick %d: %s — defaulting to RECRUIT", self._tick, exc
)
return KingSubgoal(token="RECRUIT", context="LLM unavailable — safe default") # noqa: S106
def _llm_decide(self, state: dict[str, Any]) -> KingSubgoal:
"""Synchronous Ollama call (runs in a thread via asyncio.to_thread)."""
import urllib.request
prompt_state = json.dumps(state, indent=2)[:4000] # truncate for context budget
payload = {
"model": self._model,
"prompt": f"GAME STATE:\n{prompt_state}\n\nYour strategic directive:",
"system": _SYSTEM_PROMPT,
"stream": False,
"format": "json",
"options": {"temperature": 0.1},
}
data = json.dumps(payload).encode()
req = urllib.request.Request(
f"{self._ollama_url}/api/generate",
data=data,
headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=30) as resp: # noqa: S310
result = json.loads(resp.read())
raw = result.get("response", "{}")
parsed = json.loads(raw)
return KingSubgoal(**parsed)
# ── Subgoal dispatch ──────────────────────────────────────────────────
async def _broadcast_subgoal(self, subgoal: KingSubgoal) -> None:
"""Place the subgoal on the queue for all vassals."""
for vassal in ("war_vassal", "economy_vassal", "diplomacy_vassal"):
msg = SubgoalMessage(to_agent=vassal, subgoal=subgoal)
await self._subgoal_queue.put(msg)
        logger.debug(
            "Tick %d: subgoal %s target=%s (priority=%.1f)",
            self._tick,
            subgoal.token,
            subgoal.target or "",
            subgoal.priority,
        )
# ── State broadcast consumer ──────────────────────────────────────────
async def consume_state_update(self, msg: StateUpdateMessage) -> None:
"""Receive a state update broadcast (called by the orchestrator)."""
logger.debug("King received state update tick=%d", msg.tick)
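`_llm_decide` above builds `KingSubgoal(**parsed)` straight from the model's JSON and relies on the broad except in `_decide` for the RECRUIT fallback. The same validate-then-fall-back flow can be sketched in isolation — `Subgoal` here is a hypothetical stand-in for `bannerlord.models.KingSubgoal`, whose real field set and validation are not shown in this diff:

```python
from __future__ import annotations

import json
from dataclasses import dataclass

# Token vocabulary from the King's system prompt.
VALID_TOKENS = {
    "EXPAND_TERRITORY", "RAID_ECONOMY", "FORTIFY", "RECRUIT", "TRADE",
    "ALLY", "SPY", "HEAL", "CONSOLIDATE", "TRAIN",
}


@dataclass
class Subgoal:
    """Hypothetical stand-in for bannerlord.models.KingSubgoal."""

    token: str
    target: str | None = None
    quantity: int | None = None
    priority: float = 1.0
    deadline_days: int | None = None
    context: str = ""


def parse_directive(raw: str) -> Subgoal:
    """Parse the model's JSON reply; any defect degrades to the RECRUIT default."""
    try:
        data = json.loads(raw)
        if data.get("token") not in VALID_TOKENS:
            raise ValueError(f"invalid token: {data.get('token')!r}")
        # Drop explicit nulls so dataclass defaults apply.
        return Subgoal(**{k: v for k, v in data.items() if v is not None})
    except (json.JSONDecodeError, AttributeError, TypeError, ValueError) as exc:
        return Subgoal(token="RECRUIT", context=f"fallback: {exc}")


good = parse_directive('{"token": "EXPAND_TERRITORY", "target": "Epicrotea", "priority": 1.5}')
bad = parse_directive("not json at all")
print(good.token, bad.token)  # EXPAND_TERRITORY RECRUIT
```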


@@ -0,0 +1,296 @@
"""Vassal agents — War, Economy, and Diplomacy.
Vassals are mid-tier agents responsible for a domain of the kingdom.
Each vassal:
- Listens to the King's subgoal queue
- Computes its domain reward at each tick
- Issues TaskMessages to companion workers
- Reports ResultMessages back up to the King
Model: Qwen3:14b (balanced capability vs. latency).
Frequency: up to 4× per campaign day.
Refs: #1097, #1099.
"""
from __future__ import annotations
import asyncio
import logging
from typing import Any
from bannerlord.gabs_client import GABSClient, GABSUnavailable
from bannerlord.models import (
DiplomacyReward,
EconomyReward,
KingSubgoal,
ResultMessage,
SubgoalMessage,
TaskMessage,
WarReward,
)
logger = logging.getLogger(__name__)
# Tokens each vassal responds to (all others are ignored)
_WAR_TOKENS = {"EXPAND_TERRITORY", "RAID_ECONOMY", "TRAIN"}
_ECON_TOKENS = {"FORTIFY", "CONSOLIDATE"}
_DIPLO_TOKENS = {"ALLY"}
_LOGISTICS_TOKENS = {"RECRUIT", "HEAL"}
_TRADE_TOKENS = {"TRADE"}
_SCOUT_TOKENS = {"SPY"}
class BaseVassal:
"""Shared vassal lifecycle — subscribes to subgoal queue, runs tick loop."""
name: str = "base_vassal"
def __init__(
self,
gabs_client: GABSClient,
subgoal_queue: asyncio.Queue[SubgoalMessage],
result_queue: asyncio.Queue[ResultMessage] | None = None,
task_queue: asyncio.Queue[TaskMessage] | None = None,
) -> None:
self._gabs = gabs_client
self._subgoal_queue = subgoal_queue
self._result_queue = result_queue or asyncio.Queue()
self._task_queue = task_queue or asyncio.Queue()
self._active_subgoal: KingSubgoal | None = None
self._running = False
@property
def task_queue(self) -> asyncio.Queue[TaskMessage]:
return self._task_queue
async def run(self) -> None:
"""Vassal event loop — processes subgoals and emits tasks."""
self._running = True
logger.info("%s started", self.name)
try:
while self._running:
# Drain all pending subgoals (keep the latest)
try:
while True:
msg = self._subgoal_queue.get_nowait()
if msg.to_agent == self.name:
self._active_subgoal = msg.subgoal
logger.debug("%s received subgoal %s", self.name, msg.subgoal.token)
except asyncio.QueueEmpty:
pass
if self._active_subgoal is not None:
await self._tick(self._active_subgoal)
await asyncio.sleep(0.25) # yield to event loop
except asyncio.CancelledError:
logger.info("%s cancelled", self.name)
raise
finally:
self._running = False
def stop(self) -> None:
self._running = False
async def _tick(self, subgoal: KingSubgoal) -> None:
raise NotImplementedError
async def _get_state(self) -> dict[str, Any]:
try:
return await self._gabs.get_state() or {}
except GABSUnavailable:
return {}
# ── War Vassal ────────────────────────────────────────────────────────────────
class WarVassal(BaseVassal):
"""Military operations — sieges, field battles, raids, defensive maneuvers.
Reward function:
R = 0.40*ΔTerritoryValue + 0.25*ΔArmyStrengthRatio
- 0.20*CasualtyCost - 0.10*SupplyCost + 0.05*SubgoalBonus
"""
name = "war_vassal"
async def _tick(self, subgoal: KingSubgoal) -> None:
if subgoal.token not in _WAR_TOKENS | _LOGISTICS_TOKENS:
return
state = await self._get_state()
reward = self._compute_reward(state, subgoal)
task = self._plan_action(state, subgoal)
if task:
await self._task_queue.put(task)
logger.debug(
"%s tick: subgoal=%s reward=%.3f action=%s",
self.name,
subgoal.token,
reward.total,
task.primitive if task else "none",
)
def _compute_reward(self, state: dict[str, Any], subgoal: KingSubgoal) -> WarReward:
bonus = subgoal.priority * 0.05 if subgoal.token in _WAR_TOKENS else 0.0
return WarReward(
territory_delta=float(state.get("territory_delta", 0.0)),
army_strength_ratio=float(state.get("army_strength_ratio", 1.0)),
casualty_cost=float(state.get("casualty_cost", 0.0)),
supply_cost=float(state.get("supply_cost", 0.0)),
subgoal_bonus=bonus,
)
def _plan_action(self, state: dict[str, Any], subgoal: KingSubgoal) -> TaskMessage | None:
if subgoal.token == "EXPAND_TERRITORY" and subgoal.target: # noqa: S105
return TaskMessage(
from_agent=self.name,
to_agent="logistics_companion",
primitive="move_party",
args={"destination": subgoal.target},
priority=subgoal.priority,
)
if subgoal.token == "RECRUIT": # noqa: S105
qty = subgoal.quantity or 20
return TaskMessage(
from_agent=self.name,
to_agent="logistics_companion",
primitive="recruit_troop",
args={"troop_type": "infantry", "quantity": qty},
priority=subgoal.priority,
)
if subgoal.token == "TRAIN": # noqa: S105
return TaskMessage(
from_agent=self.name,
to_agent="logistics_companion",
primitive="upgrade_troops",
args={},
priority=subgoal.priority,
)
return None
# ── Economy Vassal ────────────────────────────────────────────────────────────
class EconomyVassal(BaseVassal):
"""Settlement management, tax collection, construction, food supply.
Reward function:
R = 0.35*DailyDenarsIncome + 0.25*FoodStockBuffer + 0.20*LoyaltyAverage
- 0.15*ConstructionQueueLength + 0.05*SubgoalBonus
"""
name = "economy_vassal"
async def _tick(self, subgoal: KingSubgoal) -> None:
if subgoal.token not in _ECON_TOKENS | _TRADE_TOKENS:
return
state = await self._get_state()
reward = self._compute_reward(state, subgoal)
task = self._plan_action(state, subgoal)
if task:
await self._task_queue.put(task)
logger.debug(
"%s tick: subgoal=%s reward=%.3f",
self.name,
subgoal.token,
reward.total,
)
def _compute_reward(self, state: dict[str, Any], subgoal: KingSubgoal) -> EconomyReward:
bonus = subgoal.priority * 0.05 if subgoal.token in _ECON_TOKENS else 0.0
return EconomyReward(
daily_denars_income=float(state.get("daily_income", 0.0)),
food_stock_buffer=float(state.get("food_days_remaining", 0.0)),
loyalty_average=float(state.get("avg_loyalty", 50.0)),
construction_queue_length=int(state.get("construction_queue", 0)),
subgoal_bonus=bonus,
)
def _plan_action(self, state: dict[str, Any], subgoal: KingSubgoal) -> TaskMessage | None:
if subgoal.token == "FORTIFY" and subgoal.target: # noqa: S105
return TaskMessage(
from_agent=self.name,
to_agent="logistics_companion",
primitive="build_project",
args={"settlement": subgoal.target},
priority=subgoal.priority,
)
if subgoal.token == "TRADE": # noqa: S105
return TaskMessage(
from_agent=self.name,
to_agent="caravan_companion",
primitive="assess_prices",
args={"town": subgoal.target or "nearest"},
priority=subgoal.priority,
)
return None
# ── Diplomacy Vassal ──────────────────────────────────────────────────────────
class DiplomacyVassal(BaseVassal):
"""Relations management — alliances, peace deals, tribute, marriage.
Reward function:
R = 0.30*AlliesCount + 0.25*TruceDurationValue + 0.25*RelationsScoreWeighted
- 0.15*ActiveWarsFront + 0.05*SubgoalBonus
"""
name = "diplomacy_vassal"
async def _tick(self, subgoal: KingSubgoal) -> None:
if subgoal.token not in _DIPLO_TOKENS | _SCOUT_TOKENS:
return
state = await self._get_state()
reward = self._compute_reward(state, subgoal)
task = self._plan_action(state, subgoal)
if task:
await self._task_queue.put(task)
logger.debug(
"%s tick: subgoal=%s reward=%.3f",
self.name,
subgoal.token,
reward.total,
)
def _compute_reward(self, state: dict[str, Any], subgoal: KingSubgoal) -> DiplomacyReward:
bonus = subgoal.priority * 0.05 if subgoal.token in _DIPLO_TOKENS else 0.0
return DiplomacyReward(
allies_count=int(state.get("allies_count", 0)),
truce_duration_value=float(state.get("truce_value", 0.0)),
relations_score_weighted=float(state.get("relations_weighted", 0.0)),
active_wars_front=int(state.get("active_wars", 0)),
subgoal_bonus=bonus,
)
def _plan_action(self, state: dict[str, Any], subgoal: KingSubgoal) -> TaskMessage | None:
if subgoal.token == "ALLY" and subgoal.target: # noqa: S105
return TaskMessage(
from_agent=self.name,
to_agent="scout_companion",
primitive="track_lord",
args={"name": subgoal.target},
priority=subgoal.priority,
)
if subgoal.token == "SPY" and subgoal.target: # noqa: S105
return TaskMessage(
from_agent=self.name,
to_agent="scout_companion",
primitive="assess_garrison",
args={"settlement": subgoal.target},
priority=subgoal.priority,
)
return None
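The reward models (`WarReward`, `EconomyReward`, `DiplomacyReward`) and their `total` properties live in models.py, outside this diff. Assuming `total` applies the documented war weights, with `subgoal_bonus` added already pre-scaled (the vassal multiplies `priority * 0.05` before constructing the reward), a sketch:

```python
from dataclasses import dataclass


@dataclass
class WarRewardSketch:
    """Hypothetical reconstruction of WarReward.total from the documented weights."""

    territory_delta: float = 0.0
    army_strength_ratio: float = 1.0
    casualty_cost: float = 0.0
    supply_cost: float = 0.0
    subgoal_bonus: float = 0.0

    @property
    def total(self) -> float:
        # R = 0.40*ΔTerritoryValue + 0.25*ΔArmyStrengthRatio
        #     - 0.20*CasualtyCost - 0.10*SupplyCost + SubgoalBonus (pre-scaled)
        return (
            0.40 * self.territory_delta
            + 0.25 * self.army_strength_ratio
            - 0.20 * self.casualty_cost
            - 0.10 * self.supply_cost
            + self.subgoal_bonus
        )


r = WarRewardSketch(territory_delta=2.0, army_strength_ratio=1.2,
                    casualty_cost=0.5, supply_cost=0.3, subgoal_bonus=0.075)
print(round(r.total, 3))  # 1.045
```

Note the formula comments above write `+ 0.05*SubgoalBonus` while the vassals pass a bonus already scaled by `priority * 0.05`; the sketch follows the code and adds the bonus directly.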


@@ -0,0 +1,198 @@
"""GABS TCP/JSON-RPC client.
Connects to the Bannerlord.GABS C# mod server running on a Windows VM.
Protocol: newline-delimited JSON-RPC 2.0 over raw TCP.
Default host: localhost, port: 4825 (configurable via settings.bannerlord_gabs_host
and settings.bannerlord_gabs_port).
Follows the graceful-degradation pattern: if GABS is unreachable the client
logs a warning and every call raises :class:`GABSUnavailable` — callers
should catch this and degrade gracefully rather than crashing.
Refs: #1091, #1097.
"""
from __future__ import annotations
import asyncio
import json
import logging
from typing import Any
logger = logging.getLogger(__name__)
_DEFAULT_HOST = "localhost"
_DEFAULT_PORT = 4825
_DEFAULT_TIMEOUT = 10.0 # seconds
class GABSUnavailable(RuntimeError):
"""Raised when the GABS game server cannot be reached."""
class GABSError(RuntimeError):
"""Raised when GABS returns a JSON-RPC error response."""
def __init__(self, code: int, message: str) -> None:
super().__init__(f"GABS error {code}: {message}")
self.code = code
class GABSClient:
"""Async TCP JSON-RPC client for Bannerlord.GABS.
Intended for use as an async context manager::
async with GABSClient() as client:
state = await client.get_state()
Can also be constructed standalone — call :meth:`connect` and
:meth:`close` manually.
"""
def __init__(
self,
host: str = _DEFAULT_HOST,
port: int = _DEFAULT_PORT,
timeout: float = _DEFAULT_TIMEOUT,
) -> None:
self._host = host
self._port = port
self._timeout = timeout
self._reader: asyncio.StreamReader | None = None
self._writer: asyncio.StreamWriter | None = None
self._seq = 0
self._connected = False
# ── Lifecycle ─────────────────────────────────────────────────────────
async def connect(self) -> None:
"""Open the TCP connection to GABS.
Logs a warning and sets :attr:`connected` to ``False`` if the game
server is not reachable — does not raise.
"""
try:
self._reader, self._writer = await asyncio.wait_for(
asyncio.open_connection(self._host, self._port),
timeout=self._timeout,
)
self._connected = True
logger.info("GABS connected at %s:%s", self._host, self._port)
except (TimeoutError, OSError) as exc:
logger.warning(
"GABS unavailable at %s:%s — Bannerlord agent will degrade: %s",
self._host,
self._port,
exc,
)
self._connected = False
async def close(self) -> None:
if self._writer is not None:
try:
self._writer.close()
await self._writer.wait_closed()
except Exception: # noqa: BLE001
pass
self._connected = False
logger.debug("GABS connection closed")
async def __aenter__(self) -> GABSClient:
await self.connect()
return self
async def __aexit__(self, *_: Any) -> None:
await self.close()
@property
def connected(self) -> bool:
return self._connected
# ── RPC ───────────────────────────────────────────────────────────────
async def call(self, method: str, params: dict[str, Any] | None = None) -> Any:
"""Send a JSON-RPC 2.0 request and return the ``result`` field.
Raises:
GABSUnavailable: if the client is not connected.
GABSError: if the server returns a JSON-RPC error.
"""
if not self._connected or self._reader is None or self._writer is None:
raise GABSUnavailable(
f"GABS not connected (host={self._host}, port={self._port}). "
"Is the Bannerlord VM running?"
)
self._seq += 1
request = {
"jsonrpc": "2.0",
"id": self._seq,
"method": method,
"params": params or {},
}
payload = json.dumps(request) + "\n"
try:
self._writer.write(payload.encode())
await asyncio.wait_for(self._writer.drain(), timeout=self._timeout)
raw = await asyncio.wait_for(self._reader.readline(), timeout=self._timeout)
except (TimeoutError, OSError) as exc:
self._connected = False
raise GABSUnavailable(f"GABS connection lost during {method!r}: {exc}") from exc
response = json.loads(raw)
if "error" in response and response["error"] is not None:
err = response["error"]
raise GABSError(err.get("code", -1), err.get("message", "unknown"))
return response.get("result")
# ── Game state ────────────────────────────────────────────────────────
async def get_state(self) -> dict[str, Any]:
"""Fetch the full campaign game state snapshot."""
return await self.call("game.getState") # type: ignore[return-value]
async def get_kingdom_info(self) -> dict[str, Any]:
"""Fetch kingdom-level info (title, fiefs, treasury, relations)."""
return await self.call("kingdom.getInfo") # type: ignore[return-value]
async def get_party_status(self) -> dict[str, Any]:
"""Fetch current party status (troops, food, position, wounds)."""
return await self.call("party.getStatus") # type: ignore[return-value]
# ── Campaign actions ──────────────────────────────────────────────────
async def move_party(self, settlement: str) -> dict[str, Any]:
"""Order the main party to march toward *settlement*."""
return await self.call("party.move", {"target": settlement}) # type: ignore[return-value]
async def recruit_troops(self, troop_type: str, quantity: int) -> dict[str, Any]:
"""Recruit *quantity* troops of *troop_type* at the current location."""
return await self.call( # type: ignore[return-value]
"party.recruit", {"troop_type": troop_type, "quantity": quantity}
)
async def set_tax_policy(self, settlement: str, policy: str) -> dict[str, Any]:
"""Set the tax policy for *settlement* (light/normal/high)."""
return await self.call( # type: ignore[return-value]
"settlement.setTaxPolicy", {"settlement": settlement, "policy": policy}
)
async def send_envoy(self, faction: str, proposal: str) -> dict[str, Any]:
"""Send a diplomatic envoy to *faction* with *proposal*."""
return await self.call( # type: ignore[return-value]
"diplomacy.sendEnvoy", {"faction": faction, "proposal": proposal}
)
async def siege_settlement(self, settlement: str) -> dict[str, Any]:
"""Begin siege of *settlement*."""
return await self.call("battle.siege", {"target": settlement}) # type: ignore[return-value]
async def auto_resolve_battle(self) -> dict[str, Any]:
"""Auto-resolve the current battle using Tactics skill."""
return await self.call("battle.autoResolve") # type: ignore[return-value]

src/bannerlord/ledger.py (new file, 256 lines)

@@ -0,0 +1,256 @@
"""Asset ledger for the Bannerlord sovereign agent.
Tracks kingdom assets (denars, settlements, troop allocations) in an
in-memory dict backed by SQLite for persistence. Follows the existing
SQLite migration pattern in this repo.
The King has exclusive write access to treasury and settlement ownership.
Vassals receive an allocated budget and cannot exceed it without King
re-authorization. Companions hold only work-in-progress quotas.
Refs: #1097, #1099.
"""
from __future__ import annotations
import logging
import sqlite3
from collections.abc import Iterator
from contextlib import contextmanager
from datetime import datetime
from pathlib import Path
logger = logging.getLogger(__name__)
_DEFAULT_DB = Path.home() / ".timmy" / "bannerlord" / "ledger.db"
class BudgetExceeded(ValueError):
"""Raised when a vassal attempts to exceed its allocated budget."""
class Ledger:
"""Sovereign asset ledger backed by SQLite.
Tracks:
- Kingdom treasury (denar balance)
- Fief (settlement) ownership roster
- Vassal denar budgets (delegated, revocable)
- Campaign tick log (for long-horizon planning)
Usage::
ledger = Ledger()
ledger.initialize()
ledger.deposit(5000, "tax income — Epicrotea")
ledger.allocate_budget("war_vassal", 2000)
"""
def __init__(self, db_path: Path = _DEFAULT_DB) -> None:
self._db_path = db_path
self._db_path.parent.mkdir(parents=True, exist_ok=True)
# ── Setup ─────────────────────────────────────────────────────────────
def initialize(self) -> None:
"""Create tables if they don't exist."""
with self._conn() as conn:
conn.executescript(
"""
CREATE TABLE IF NOT EXISTS treasury (
id INTEGER PRIMARY KEY CHECK (id = 1),
balance REAL NOT NULL DEFAULT 0
);
INSERT OR IGNORE INTO treasury (id, balance) VALUES (1, 0);
CREATE TABLE IF NOT EXISTS fiefs (
name TEXT PRIMARY KEY,
fief_type TEXT NOT NULL, -- town / castle / village
acquired_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS vassal_budgets (
agent TEXT PRIMARY KEY,
allocated REAL NOT NULL DEFAULT 0,
spent REAL NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS tick_log (
tick INTEGER PRIMARY KEY,
campaign_day INTEGER NOT NULL,
subgoal TEXT,
reward_war REAL,
reward_econ REAL,
reward_diplo REAL,
logged_at TEXT NOT NULL
);
"""
)
logger.debug("Ledger initialized at %s", self._db_path)
# ── Treasury ──────────────────────────────────────────────────────────
def balance(self) -> float:
with self._conn() as conn:
row = conn.execute("SELECT balance FROM treasury WHERE id = 1").fetchone()
return float(row[0]) if row else 0.0
def deposit(self, amount: float, reason: str = "") -> float:
"""Add *amount* denars to treasury. Returns new balance."""
if amount < 0:
raise ValueError("Use withdraw() for negative amounts")
with self._conn() as conn:
conn.execute("UPDATE treasury SET balance = balance + ? WHERE id = 1", (amount,))
bal = self.balance()
logger.info("Treasury +%.0f denars (%s) → balance %.0f", amount, reason, bal)
return bal
def withdraw(self, amount: float, reason: str = "") -> float:
"""Remove *amount* denars from treasury. Returns new balance."""
if amount < 0:
raise ValueError("Amount must be positive")
bal = self.balance()
if amount > bal:
raise BudgetExceeded(
f"Cannot withdraw {amount:.0f} denars — treasury balance is only {bal:.0f}"
)
with self._conn() as conn:
conn.execute("UPDATE treasury SET balance = balance - ? WHERE id = 1", (amount,))
new_bal = self.balance()
logger.info("Treasury -%.0f denars (%s) → balance %.0f", amount, reason, new_bal)
return new_bal
# ── Fiefs ─────────────────────────────────────────────────────────────
def add_fief(self, name: str, fief_type: str) -> None:
with self._conn() as conn:
conn.execute(
"INSERT OR REPLACE INTO fiefs (name, fief_type, acquired_at) VALUES (?, ?, ?)",
(name, fief_type, datetime.utcnow().isoformat()),
)
logger.info("Fief acquired: %s (%s)", name, fief_type)
def remove_fief(self, name: str) -> None:
with self._conn() as conn:
conn.execute("DELETE FROM fiefs WHERE name = ?", (name,))
logger.info("Fief lost: %s", name)
def list_fiefs(self) -> list[dict[str, str]]:
with self._conn() as conn:
rows = conn.execute("SELECT name, fief_type, acquired_at FROM fiefs").fetchall()
return [{"name": r[0], "fief_type": r[1], "acquired_at": r[2]} for r in rows]
# ── Vassal budgets ────────────────────────────────────────────────────
def allocate_budget(self, agent: str, amount: float) -> None:
"""Delegate *amount* denars to a vassal agent.
Withdraws from treasury. Raises :class:`BudgetExceeded` if
the treasury cannot cover the allocation.
"""
self.withdraw(amount, reason=f"budget → {agent}")
with self._conn() as conn:
conn.execute(
"""
INSERT INTO vassal_budgets (agent, allocated, spent)
VALUES (?, ?, 0)
ON CONFLICT(agent) DO UPDATE SET allocated = allocated + excluded.allocated
""",
(agent, amount),
)
logger.info("Allocated %.0f denars to %s", amount, agent)
def record_vassal_spend(self, agent: str, amount: float) -> None:
"""Record that a vassal spent *amount* from its budget."""
        with self._conn() as conn:
            row = conn.execute(
                "SELECT allocated, spent FROM vassal_budgets WHERE agent = ?", (agent,)
            ).fetchone()
            if row is None:
                raise BudgetExceeded(f"{agent} has no allocated budget")
            allocated, spent = row
            if spent + amount > allocated:
                raise BudgetExceeded(
                    f"{agent} budget exhausted: {spent:.0f}/{allocated:.0f} spent, "
                    f"requested {amount:.0f}"
                )
            # Check and update inside one transaction so two concurrent
            # spends cannot both pass the budget check.
            conn.execute(
                "UPDATE vassal_budgets SET spent = spent + ? WHERE agent = ?",
                (amount, agent),
            )
def vassal_remaining(self, agent: str) -> float:
with self._conn() as conn:
row = conn.execute(
"SELECT allocated - spent FROM vassal_budgets WHERE agent = ?", (agent,)
).fetchone()
return float(row[0]) if row else 0.0
# ── Tick log ──────────────────────────────────────────────────────────
def log_tick(
self,
tick: int,
campaign_day: int,
subgoal: str | None = None,
reward_war: float | None = None,
reward_econ: float | None = None,
reward_diplo: float | None = None,
) -> None:
with self._conn() as conn:
conn.execute(
"""
INSERT OR REPLACE INTO tick_log
(tick, campaign_day, subgoal, reward_war, reward_econ, reward_diplo, logged_at)
VALUES (?, ?, ?, ?, ?, ?, ?)
""",
(
tick,
campaign_day,
subgoal,
reward_war,
reward_econ,
reward_diplo,
datetime.utcnow().isoformat(),
),
)
def tick_history(self, last_n: int = 100) -> list[dict]:
with self._conn() as conn:
rows = conn.execute(
"""
SELECT tick, campaign_day, subgoal, reward_war, reward_econ, reward_diplo, logged_at
FROM tick_log
ORDER BY tick DESC
LIMIT ?
""",
(last_n,),
).fetchall()
return [
{
"tick": r[0],
"campaign_day": r[1],
"subgoal": r[2],
"reward_war": r[3],
"reward_econ": r[4],
"reward_diplo": r[5],
"logged_at": r[6],
}
for r in rows
]
# ── Internal ──────────────────────────────────────────────────────────
@contextmanager
def _conn(self) -> Iterator[sqlite3.Connection]:
conn = sqlite3.connect(self._db_path)
conn.execute("PRAGMA journal_mode=WAL")
try:
yield conn
conn.commit()
except Exception:
conn.rollback()
raise
finally:
conn.close()
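The budget invariant enforced by `record_vassal_spend()` (a spend is allowed only while `spent + amount <= allocated`) can be sketched standalone. The `MiniLedger` class below is a hypothetical, in-memory illustration of that guard, not the repo's `Ledger`:

```python
# Standalone sketch of the vassal-budget guard, assuming the same schema
# as Ledger.record_vassal_spend(). MiniLedger is an illustrative name.
import sqlite3


class BudgetExceeded(ValueError):
    """Raised when a vassal attempts to exceed its allocated budget."""


class MiniLedger:
    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE vassal_budgets (agent TEXT PRIMARY KEY,"
            " allocated REAL NOT NULL, spent REAL NOT NULL DEFAULT 0)"
        )

    def allocate(self, agent: str, amount: float) -> None:
        # Repeated allocations add to the existing budget (upsert).
        self._conn.execute(
            "INSERT INTO vassal_budgets (agent, allocated) VALUES (?, ?)"
            " ON CONFLICT(agent) DO UPDATE SET"
            " allocated = allocated + excluded.allocated",
            (agent, amount),
        )

    def spend(self, agent: str, amount: float) -> None:
        row = self._conn.execute(
            "SELECT allocated, spent FROM vassal_budgets WHERE agent = ?",
            (agent,),
        ).fetchone()
        if row is None:
            raise BudgetExceeded(f"{agent} has no allocated budget")
        allocated, spent = row
        if spent + amount > allocated:
            raise BudgetExceeded(f"{agent} budget exhausted")
        self._conn.execute(
            "UPDATE vassal_budgets SET spent = spent + ? WHERE agent = ?",
            (amount, agent),
        )

    def remaining(self, agent: str) -> float:
        row = self._conn.execute(
            "SELECT allocated - spent FROM vassal_budgets WHERE agent = ?",
            (agent,),
        ).fetchone()
        return float(row[0]) if row else 0.0


ledger = MiniLedger()
ledger.allocate("war_vassal", 2000)
ledger.spend("war_vassal", 1500)
print(ledger.remaining("war_vassal"))  # 500.0
```

A failed spend leaves the row untouched, which is the property the real ledger relies on when a vassal retries after re-authorization.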

src/bannerlord/models.py Normal file

@@ -0,0 +1,191 @@
"""Bannerlord feudal hierarchy data models.
All inter-agent communication uses typed Pydantic models. No raw dicts
cross agent boundaries — every message is validated at construction time.
Design: Ahilan & Dayan (2019) Feudal Multi-Agent Hierarchies.
Refs: #1097, #1099.
"""
from __future__ import annotations
from datetime import datetime
from typing import Any, Literal
from pydantic import BaseModel, Field
# ── Subgoal vocabulary ────────────────────────────────────────────────────────
SUBGOAL_TOKENS = frozenset(
{
"EXPAND_TERRITORY", # Take or secure a fief — War Vassal
"RAID_ECONOMY", # Raid enemy villages for denars — War Vassal
"FORTIFY", # Upgrade or repair a settlement — Economy Vassal
"RECRUIT", # Fill party to capacity — Logistics Companion
"TRADE", # Execute profitable trade route — Caravan Companion
"ALLY", # Pursue non-aggression / alliance — Diplomacy Vassal
"SPY", # Gain information on target faction — Scout Companion
"HEAL", # Rest party until wounds recovered — Logistics Companion
"CONSOLIDATE", # Hold territory, no expansion — Economy Vassal
"TRAIN", # Level troops via auto-resolve bandits — War Vassal
}
)
# ── King subgoal ──────────────────────────────────────────────────────────────
class KingSubgoal(BaseModel):
"""Strategic directive issued by the King agent to vassals.
The King operates on campaign-map timescale (days to weeks of in-game
time). His sole output is one subgoal token plus optional parameters.
He never micro-manages primitives.
"""
token: str = Field(..., description="One of SUBGOAL_TOKENS")
target: str | None = Field(None, description="Named target (settlement, lord, faction)")
quantity: int | None = Field(None, description="For RECRUIT, TRADE tokens", ge=1)
priority: float = Field(1.0, ge=0.0, le=2.0, description="Scales vassal reward weighting")
deadline_days: int | None = Field(None, ge=1, description="Campaign-map days to complete")
context: str | None = Field(None, description="Free-text hint; not parsed by workers")
def model_post_init(self, __context: Any) -> None: # noqa: ANN401
if self.token not in SUBGOAL_TOKENS:
raise ValueError(
f"Unknown subgoal token {self.token!r}. Must be one of: {sorted(SUBGOAL_TOKENS)}"
)
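The construction-time vocabulary check can be illustrated without Pydantic. The dataclass below is a hypothetical stand-in for `KingSubgoal`, using `__post_init__` in place of `model_post_init` and a trimmed token set:

```python
# Minimal sketch of the subgoal-vocabulary validation, assuming a
# reduced token set; the real model is KingSubgoal in models.py.
from __future__ import annotations

from dataclasses import dataclass

SUBGOAL_TOKENS = frozenset({"EXPAND_TERRITORY", "RAID_ECONOMY", "FORTIFY", "RECRUIT"})


@dataclass
class Subgoal:
    token: str
    target: str | None = None

    def __post_init__(self) -> None:
        # Reject unknown tokens at construction, so a malformed directive
        # can never cross an agent boundary.
        if self.token not in SUBGOAL_TOKENS:
            raise ValueError(f"Unknown subgoal token {self.token!r}")


Subgoal("FORTIFY", target="Epicrotea")  # constructs fine
```

Validating at construction means downstream vassal code never has to re-check the token against the vocabulary.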
# ── Inter-agent messages ──────────────────────────────────────────────────────
class SubgoalMessage(BaseModel):
"""King → Vassal direction."""
msg_type: Literal["subgoal"] = "subgoal"
from_agent: Literal["king"] = "king"
to_agent: str = Field(..., description="e.g. 'war_vassal', 'economy_vassal'")
subgoal: KingSubgoal
issued_at: datetime = Field(default_factory=datetime.utcnow)
class TaskMessage(BaseModel):
"""Vassal → Companion direction."""
msg_type: Literal["task"] = "task"
from_agent: str = Field(..., description="e.g. 'war_vassal'")
to_agent: str = Field(..., description="e.g. 'logistics_companion'")
primitive: str = Field(..., description="One of the companion primitives")
args: dict[str, Any] = Field(default_factory=dict)
priority: float = Field(1.0, ge=0.0, le=2.0)
issued_at: datetime = Field(default_factory=datetime.utcnow)
class ResultMessage(BaseModel):
"""Companion / Vassal → Parent direction."""
msg_type: Literal["result"] = "result"
from_agent: str
to_agent: str
success: bool
outcome: dict[str, Any] = Field(default_factory=dict, description="Primitive-specific result")
reward_delta: float = Field(0.0, description="Computed reward contribution")
completed_at: datetime = Field(default_factory=datetime.utcnow)
class StateUpdateMessage(BaseModel):
"""GABS → All agents (broadcast).
Sent every campaign tick. Agents consume at their own cadence.
"""
msg_type: Literal["state"] = "state"
game_state: dict[str, Any] = Field(..., description="Full GABS state snapshot")
tick: int = Field(..., ge=0)
timestamp: datetime = Field(default_factory=datetime.utcnow)
# ── Reward snapshots ──────────────────────────────────────────────────────────
class WarReward(BaseModel):
"""Computed reward for the War Vassal at a given tick."""
territory_delta: float = 0.0
army_strength_ratio: float = 1.0
casualty_cost: float = 0.0
supply_cost: float = 0.0
subgoal_bonus: float = 0.0
@property
def total(self) -> float:
w1, w2, w3, w4, w5 = 0.40, 0.25, 0.20, 0.10, 0.05
return (
w1 * self.territory_delta
+ w2 * self.army_strength_ratio
- w3 * self.casualty_cost
- w4 * self.supply_cost
+ w5 * self.subgoal_bonus
)
class EconomyReward(BaseModel):
"""Computed reward for the Economy Vassal at a given tick."""
daily_denars_income: float = 0.0
food_stock_buffer: float = 0.0
loyalty_average: float = 50.0
construction_queue_length: int = 0
subgoal_bonus: float = 0.0
@property
def total(self) -> float:
w1, w2, w3, w4, w5 = 0.35, 0.25, 0.20, 0.15, 0.05
return (
w1 * self.daily_denars_income
+ w2 * self.food_stock_buffer
+ w3 * self.loyalty_average
- w4 * self.construction_queue_length
+ w5 * self.subgoal_bonus
)
class DiplomacyReward(BaseModel):
"""Computed reward for the Diplomacy Vassal at a given tick."""
allies_count: int = 0
truce_duration_value: float = 0.0
relations_score_weighted: float = 0.0
active_wars_front: int = 0
subgoal_bonus: float = 0.0
@property
def total(self) -> float:
w1, w2, w3, w4, w5 = 0.30, 0.25, 0.25, 0.15, 0.05
return (
w1 * self.allies_count
+ w2 * self.truce_duration_value
+ w3 * self.relations_score_weighted
- w4 * self.active_wars_front
+ w5 * self.subgoal_bonus
)
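Each reward total is a fixed-weight linear combination with penalty terms subtracted. A worked computation of the War Vassal's total, using hypothetical input values:

```python
# Worked example of WarReward.total: weights are the ones from the model;
# the input values below are hypothetical.
w1, w2, w3, w4, w5 = 0.40, 0.25, 0.20, 0.10, 0.05

territory_delta = 1.0      # gained one fief this tick
army_strength_ratio = 1.2  # slightly stronger than the opposition
casualty_cost = 0.5
supply_cost = 0.3
subgoal_bonus = 1.0        # completed the King's directive

total = (
    w1 * territory_delta
    + w2 * army_strength_ratio
    - w3 * casualty_cost
    - w4 * supply_cost
    + w5 * subgoal_bonus
)
print(round(total, 3))  # 0.62
```

The Economy and Diplomacy totals follow the same shape with their own weights, so a single helper could compute all three from (values, weights, signs) tuples if that ever becomes worth the indirection.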
# ── Victory condition ─────────────────────────────────────────────────────────
class VictoryCondition(BaseModel):
"""Sovereign Victory (M5) — evaluated each campaign tick."""
holds_king_title: bool = False
territory_control_pct: float = Field(
0.0, ge=0.0, le=100.0, description="% of Calradia fiefs held"
)
majority_threshold: float = Field(
51.0, ge=0.0, le=100.0, description="Required % for majority control"
)
@property
def achieved(self) -> bool:
return self.holds_king_title and self.territory_control_pct >= self.majority_threshold
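The victory predicate reduces to a conjunction of the title check and the territory threshold. A minimal stand-in for `VictoryCondition.achieved` (plain function, not the Pydantic model):

```python
# Sovereign Victory (M5) as a plain predicate; threshold default mirrors
# the model's 51% majority requirement.
def achieved(holds_king_title: bool, territory_pct: float, threshold: float = 51.0) -> bool:
    return holds_king_title and territory_pct >= threshold


print(achieved(True, 55.0))   # True
print(achieved(True, 50.9))   # False
print(achieved(False, 90.0))  # False
```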


@@ -30,25 +30,36 @@ class Settings(BaseSettings):
return normalize_ollama_url(self.ollama_url)
# LLM model passed to Agno/Ollama — override with OLLAMA_MODEL
# qwen3:30b is the primary model — better reasoning and tool calling
# than llama3.1:8b-instruct while still running locally on modest hardware.
# Fallback: llama3.1:8b-instruct if qwen3:30b not available.
# llama3.2 (3B) hallucinated tool output consistently in testing.
ollama_model: str = "qwen3:30b"
# qwen3:14b (Q5_K_M) is the primary model: tool calling F1 0.971, ~17.5 GB
# at 32K context — optimal for M3 Max 36 GB (Issue #1063).
# qwen3:30b exceeded memory budget at 32K+ context on 36 GB hardware.
ollama_model: str = "qwen3:14b"
# Fast routing model — override with OLLAMA_FAST_MODEL
# qwen3:8b (Q6_K): tool calling F1 0.933 at ~45-55 tok/s (2x speed of 14B).
# Use for routine tasks: simple tool calls, file reads, status checks.
# Combined memory with qwen3:14b: ~17 GB — both can stay loaded simultaneously.
ollama_fast_model: str = "qwen3:8b"
# Maximum concurrently loaded Ollama models — override with OLLAMA_MAX_LOADED_MODELS
# Set to 2 to keep qwen3:8b (fast) + qwen3:14b (primary) both hot.
# Requires setting OLLAMA_MAX_LOADED_MODELS=2 in the Ollama server environment.
ollama_max_loaded_models: int = 2
# Context window size for Ollama inference — override with OLLAMA_NUM_CTX
# qwen3:30b with default context eats 45GB on a 39GB Mac.
# 4096 keeps memory at ~19GB. Set to 0 to use model defaults.
ollama_num_ctx: int = 4096
# qwen3:14b at 32K: ~17.5 GB total (weights + KV cache) on M3 Max 36 GB.
# Set to 0 to use model defaults.
ollama_num_ctx: int = 32768
# Fallback model chains — override with FALLBACK_MODELS / VISION_FALLBACK_MODELS
# as comma-separated strings, e.g. FALLBACK_MODELS="qwen3:30b,llama3.1"
# as comma-separated strings, e.g. FALLBACK_MODELS="qwen3:8b,qwen2.5:14b"
# Or edit config/providers.yaml → fallback_chains for the canonical source.
fallback_models: list[str] = [
"llama3.1:8b-instruct",
"llama3.1",
"qwen3:8b",
"qwen2.5:14b",
"qwen2.5:7b",
"llama3.1:8b-instruct",
"llama3.1",
"llama3.2:3b",
]
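A fallback chain like `fallback_models` is typically consumed by trying each model in order until one succeeds. The sketch below shows that pattern under stated assumptions: `call_model` and its failure mode are hypothetical stand-ins, not the repo's provider API:

```python
# Hedged sketch of consuming a fallback chain. call_model simulates the
# first model being unavailable; it is not a real API.
FALLBACK_MODELS = ["qwen3:8b", "qwen2.5:14b", "qwen2.5:7b"]


def call_model(model: str, prompt: str) -> str:
    if model == "qwen3:8b":  # simulate the primary fallback being unloaded
        raise RuntimeError("model not loaded")
    return f"{model}: ok"


def generate_with_fallback(prompt: str, chain: list[str]) -> str:
    last_error: Exception | None = None
    for model in chain:
        try:
            return call_model(model, prompt)
        except Exception as exc:  # noqa: BLE001 (try the next model in the chain)
            last_error = exc
    raise RuntimeError(f"all models failed: {last_error}")


print(generate_with_fallback("hello", FALLBACK_MODELS))  # qwen2.5:14b: ok
```

Ordering the chain from most to least capable means quality degrades gracefully rather than failing outright when the primary model is out of memory.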
vision_fallback_models: list[str] = [
@@ -385,6 +396,16 @@ class Settings(BaseSettings):
# Default timeout for git operations.
hands_git_timeout: int = 60
# ── Hermes Health Monitor ─────────────────────────────────────────
# Enable the Hermes system health monitor (memory, disk, Ollama, processes, network).
hermes_enabled: bool = True
# How often Hermes runs a full health cycle (seconds). Default: 5 minutes.
hermes_interval_seconds: int = 300
# Alert threshold: free memory below this triggers model unloading / alert (GB).
hermes_memory_free_min_gb: float = 4.0
# Alert threshold: free disk below this triggers cleanup / alert (GB).
hermes_disk_free_min_gb: float = 10.0
# ── Error Logging ─────────────────────────────────────────────────
error_log_enabled: bool = True
error_log_dir: str = "logs"


@@ -38,6 +38,7 @@ from dashboard.routes.discord import router as discord_router
from dashboard.routes.experiments import router as experiments_router
from dashboard.routes.grok import router as grok_router
from dashboard.routes.health import router as health_router
from dashboard.routes.hermes import router as hermes_router
from dashboard.routes.loop_qa import router as loop_qa_router
from dashboard.routes.memory import router as memory_router
from dashboard.routes.mobile import router as mobile_router
@@ -46,6 +47,7 @@ from dashboard.routes.models import router as models_router
from dashboard.routes.quests import router as quests_router
from dashboard.routes.scorecards import router as scorecards_router
from dashboard.routes.sovereignty_metrics import router as sovereignty_metrics_router
from dashboard.routes.sovereignty_ws import router as sovereignty_ws_router
from dashboard.routes.spark import router as spark_router
from dashboard.routes.system import router as system_router
from dashboard.routes.tasks import router as tasks_router
@@ -180,6 +182,33 @@ async def _thinking_scheduler() -> None:
await asyncio.sleep(settings.thinking_interval_seconds)
async def _hermes_scheduler() -> None:
"""Background task: Hermes system health monitor, runs every 5 minutes.
Checks memory, disk, Ollama, processes, and network.
Auto-resolves what it can; fires push notifications when human help is needed.
"""
from infrastructure.hermes.monitor import hermes_monitor
await asyncio.sleep(20) # Stagger after other schedulers
while True:
try:
if settings.hermes_enabled:
report = await hermes_monitor.run_cycle()
if report.has_issues:
logger.warning(
"Hermes health issues detected — overall: %s",
report.overall.value,
)
except asyncio.CancelledError:
raise
except Exception as exc:
logger.error("Hermes scheduler error: %s", exc)
await asyncio.sleep(settings.hermes_interval_seconds)
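The scheduler pattern used by `_hermes_scheduler()` is worth isolating: an infinite loop that survives per-cycle errors, re-raises cancellation so shutdown propagates, and sleeps a fixed interval. A testable sketch with a bounded tick count (`scheduler` and `flaky_cycle` are illustrative names):

```python
# Sketch of the resilient-scheduler pattern; the real loop is unbounded
# (while True) and logs via the logging module instead of print.
import asyncio


async def scheduler(run_cycle, interval: float, max_ticks: int) -> int:
    ticks = 0
    while ticks < max_ticks:
        try:
            await run_cycle()
        except asyncio.CancelledError:
            raise  # let task cancellation propagate to the event loop
        except Exception as exc:  # noqa: BLE001
            print(f"cycle error: {exc}")  # a failed cycle never kills the loop
        ticks += 1
        await asyncio.sleep(interval)
    return ticks


async def flaky_cycle() -> None:
    flaky_cycle.calls = getattr(flaky_cycle, "calls", 0) + 1
    if flaky_cycle.calls == 1:
        raise RuntimeError("transient failure")


ticks = asyncio.run(scheduler(flaky_cycle, interval=0.001, max_ticks=3))
print(ticks)  # 3
```

Catching `CancelledError` separately matters: a bare `except Exception` alone would also work on 3.8+ (where `CancelledError` derives from `BaseException`), but the explicit re-raise documents the intent.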
async def _loop_qa_scheduler() -> None:
"""Background task: run capability self-tests on a separate timer.
@@ -381,14 +410,16 @@ def _startup_background_tasks() -> list[asyncio.Task]:
asyncio.create_task(_loop_qa_scheduler()),
asyncio.create_task(_presence_watcher()),
asyncio.create_task(_start_chat_integrations_background()),
asyncio.create_task(_hermes_scheduler()),
]
try:
from timmy.paperclip import start_paperclip_poller
bg_tasks.append(asyncio.create_task(start_paperclip_poller()))
logger.info("Paperclip poller started")
except ImportError:
logger.debug("Paperclip module not found, skipping poller")
return bg_tasks
@@ -638,9 +669,11 @@ app.include_router(world_router)
app.include_router(matrix_router)
app.include_router(tower_router)
app.include_router(daily_run_router)
app.include_router(hermes_router)
app.include_router(quests_router)
app.include_router(scorecards_router)
app.include_router(sovereignty_metrics_router)
app.include_router(sovereignty_ws_router)
@app.websocket("/ws")


@@ -8,6 +8,8 @@ from .database import Base # Assuming a shared Base in models/database.py
class TaskState(StrEnum):
"""Enumeration of possible task lifecycle states."""
LATER = "LATER"
NEXT = "NEXT"
NOW = "NOW"
@@ -16,12 +18,16 @@ class TaskState(StrEnum):
class TaskCertainty(StrEnum):
"""Enumeration of task time-certainty levels."""
FUZZY = "FUZZY" # An intention without a time
SOFT = "SOFT" # A flexible task with a time
HARD = "HARD" # A fixed meeting/appointment
class Task(Base):
"""SQLAlchemy model representing a CALM task."""
__tablename__ = "tasks"
id = Column(Integer, primary_key=True, index=True)
@@ -52,6 +58,8 @@ class Task(Base):
class JournalEntry(Base):
"""SQLAlchemy model for a daily journal entry with MITs and reflections."""
__tablename__ = "journal_entries"
id = Column(Integer, primary_key=True, index=True)


@@ -46,6 +46,49 @@ async def list_agents():
}
@router.get("/emotional-profile", response_class=HTMLResponse)
async def emotional_profile(request: Request):
"""HTMX partial: render emotional profiles for all loaded agents."""
try:
from timmy.agents.loader import load_agents
agents = load_agents()
profiles = []
for agent_id, agent in agents.items():
profile = agent.emotional_state.get_profile()
profile["agent_id"] = agent_id
profile["agent_name"] = agent.name
profiles.append(profile)
except Exception as exc:
logger.warning("Failed to load emotional profiles: %s", exc)
profiles = []
return templates.TemplateResponse(
request,
"partials/emotional_profile.html",
{"profiles": profiles},
)
@router.get("/emotional-profile/json")
async def emotional_profile_json():
"""JSON API: return emotional profiles for all loaded agents."""
try:
from timmy.agents.loader import load_agents
agents = load_agents()
profiles = []
for agent_id, agent in agents.items():
profile = agent.emotional_state.get_profile()
profile["agent_id"] = agent_id
profile["agent_name"] = agent.name
profiles.append(profile)
return {"profiles": profiles}
except Exception as exc:
logger.warning("Failed to load emotional profiles: %s", exc)
return {"profiles": [], "error": str(exc)}
@router.get("/default/panel", response_class=HTMLResponse)
async def agent_panel(request: Request):
"""Chat panel — for HTMX main-panel swaps."""


@@ -14,6 +14,8 @@ router = APIRouter(prefix="/discord", tags=["discord"])
class TokenPayload(BaseModel):
"""Request payload containing a Discord bot token."""
token: str


@@ -0,0 +1,45 @@
"""Hermes health monitor routes.
Exposes the Hermes health monitor via REST API so the dashboard
and external tools can query system status and trigger checks.
Refs: #1073
"""
import logging
from fastapi import APIRouter
from infrastructure.hermes.monitor import hermes_monitor
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/hermes", tags=["hermes"])
@router.get("/status")
async def hermes_status():
"""Return the most recent Hermes health report.
Returns the cached result from the last background cycle — does not
trigger a new check. Use POST /hermes/check to run an immediate check.
"""
report = hermes_monitor.last_report
if report is None:
return {
"status": "no_data",
"message": "No health report yet — first cycle pending",
"seconds_since_last_run": hermes_monitor.seconds_since_last_run,
}
return report.to_dict()
@router.post("/check")
async def hermes_check():
"""Trigger an immediate Hermes health check cycle.
Runs all monitors synchronously and returns the full report.
Use sparingly — this blocks until all checks complete (~5 seconds).
"""
report = await hermes_monitor.run_cycle()
return report.to_dict()


@@ -10,6 +10,7 @@ from fastapi.responses import HTMLResponse, JSONResponse
from dashboard.services.scorecard_service import (
PeriodType,
ScorecardSummary,
generate_all_scorecards,
generate_scorecard,
get_tracked_agents,
@@ -26,6 +27,216 @@ def _format_period_label(period_type: PeriodType) -> str:
return "Daily" if period_type == PeriodType.daily else "Weekly"
def _parse_period(period: str) -> PeriodType:
"""Parse period string into PeriodType, defaulting to daily on invalid input.
Args:
period: The period string ('daily' or 'weekly')
Returns:
PeriodType.daily or PeriodType.weekly
"""
try:
return PeriodType(period.lower())
except ValueError:
return PeriodType.daily
def _format_token_display(token_net: int) -> str:
"""Format token net value with +/- prefix for display.
Args:
token_net: The net token value
Returns:
Formatted string with + prefix for positive values
"""
return f"{'+' if token_net > 0 else ''}{token_net}"
def _format_token_class(token_net: int) -> str:
"""Get CSS class for token net value based on sign.
Args:
token_net: The net token value
Returns:
'text-success' for positive/zero, 'text-danger' for negative
"""
return "text-success" if token_net >= 0 else "text-danger"
def _build_patterns_html(patterns: list[str]) -> str:
"""Build HTML for patterns section if patterns exist.
Args:
patterns: List of pattern strings
Returns:
HTML string for patterns section or empty string
"""
if not patterns:
return ""
patterns_list = "".join([f"<li>{p}</li>" for p in patterns])
return f"""
<div class="mt-3">
<h6>Patterns</h6>
<ul class="list-unstyled text-info">
{patterns_list}
</ul>
</div>
"""
def _build_narrative_html(bullets: list[str]) -> str:
"""Build HTML for narrative bullets.
Args:
bullets: List of narrative bullet strings
Returns:
HTML string with list items
"""
return "".join([f"<li>{b}</li>" for b in bullets])
def _build_metrics_row_html(metrics: dict) -> str:
"""Build HTML for the metrics summary row.
Args:
metrics: Dictionary with PRs, issues, tests, and token metrics
Returns:
HTML string for the metrics row
"""
prs_opened = metrics["prs_opened"]
prs_merged = metrics["prs_merged"]
pr_merge_rate = int(metrics["pr_merge_rate"] * 100)
issues_touched = metrics["issues_touched"]
tests_affected = metrics["tests_affected"]
token_net = metrics["token_net"]
token_class = _format_token_class(token_net)
token_display = _format_token_display(token_net)
return f"""
<div class="row text-center small">
<div class="col">
<div class="text-muted">PRs</div>
<div class="fw-bold">{prs_opened}/{prs_merged}</div>
<div class="text-muted" style="font-size: 0.75rem;">
{pr_merge_rate}% merged
</div>
</div>
<div class="col">
<div class="text-muted">Issues</div>
<div class="fw-bold">{issues_touched}</div>
</div>
<div class="col">
<div class="text-muted">Tests</div>
<div class="fw-bold">{tests_affected}</div>
</div>
<div class="col">
<div class="text-muted">Tokens</div>
<div class="fw-bold {token_class}">{token_display}</div>
</div>
</div>
"""
def _render_scorecard_panel(
agent_id: str,
period_type: PeriodType,
data: dict,
) -> str:
"""Render HTML for a single scorecard panel.
Args:
agent_id: The agent ID
period_type: Daily or weekly period
data: Scorecard data dictionary with metrics, patterns, narrative_bullets
Returns:
HTML string for the scorecard panel
"""
patterns_html = _build_patterns_html(data.get("patterns", []))
bullets_html = _build_narrative_html(data.get("narrative_bullets", []))
metrics_row = _build_metrics_row_html(data["metrics"])
return f"""
<div class="card mc-panel">
<div class="card-header d-flex justify-content-between align-items-center">
<h5 class="card-title mb-0">{agent_id.title()}</h5>
<span class="badge bg-secondary">{_format_period_label(period_type)}</span>
</div>
<div class="card-body">
<ul class="list-unstyled mb-3">
{bullets_html}
</ul>
{metrics_row}
{patterns_html}
</div>
</div>
"""
def _render_empty_scorecard(agent_id: str) -> str:
"""Render HTML for an empty scorecard (no activity).
Args:
agent_id: The agent ID
Returns:
HTML string for the empty scorecard panel
"""
return f"""
<div class="card mc-panel">
<h5 class="card-title">{agent_id.title()}</h5>
<p class="text-muted">No activity recorded for this period.</p>
</div>
"""
def _render_error_scorecard(agent_id: str, error: str) -> str:
"""Render HTML for a scorecard that failed to load.
Args:
agent_id: The agent ID
error: Error message string
Returns:
HTML string for the error scorecard panel
"""
return f"""
<div class="card mc-panel border-danger">
<h5 class="card-title">{agent_id.title()}</h5>
<p class="text-danger">Error loading scorecard: {error}</p>
</div>
"""
def _render_single_panel_wrapper(
agent_id: str,
period_type: PeriodType,
scorecard: ScorecardSummary | None,
) -> str:
"""Render a complete scorecard panel with wrapper div for single panel view.
Args:
agent_id: The agent ID
period_type: Daily or weekly period
scorecard: ScorecardSummary object or None
Returns:
HTML string for the complete panel
"""
if scorecard is None:
return _render_empty_scorecard(agent_id)
return _render_scorecard_panel(agent_id, period_type, scorecard.to_dict())
@router.get("/api/agents")
async def list_tracked_agents() -> dict[str, list[str]]:
"""Return the list of tracked agent IDs.
@@ -149,99 +360,50 @@ async def agent_scorecard_panel(
Returns:
HTML panel with scorecard content
"""
try:
period_type = PeriodType(period.lower())
except ValueError:
period_type = PeriodType.daily
period_type = _parse_period(period)
try:
scorecard = generate_scorecard(agent_id, period_type)
if scorecard is None:
return HTMLResponse(
content=f"""
<div class="card mc-panel">
<h5 class="card-title">{agent_id.title()}</h5>
<p class="text-muted">No activity recorded for this period.</p>
</div>
""",
status_code=200,
)
data = scorecard.to_dict()
# Build patterns HTML
patterns_html = ""
if data["patterns"]:
patterns_list = "".join([f"<li>{p}</li>" for p in data["patterns"]])
patterns_html = f"""
<div class="mt-3">
<h6>Patterns</h6>
<ul class="list-unstyled text-info">
{patterns_list}
</ul>
</div>
"""
# Build bullets HTML
bullets_html = "".join([f"<li>{b}</li>" for b in data["narrative_bullets"]])
# Build metrics summary
metrics = data["metrics"]
html_content = f"""
<div class="card mc-panel">
<div class="card-header d-flex justify-content-between align-items-center">
<h5 class="card-title mb-0">{agent_id.title()}</h5>
<span class="badge bg-secondary">{_format_period_label(period_type)}</span>
</div>
<div class="card-body">
<ul class="list-unstyled mb-3">
{bullets_html}
</ul>
<div class="row text-center small">
<div class="col">
<div class="text-muted">PRs</div>
<div class="fw-bold">{metrics["prs_opened"]}/{metrics["prs_merged"]}</div>
<div class="text-muted" style="font-size: 0.75rem;">
{int(metrics["pr_merge_rate"] * 100)}% merged
</div>
</div>
<div class="col">
<div class="text-muted">Issues</div>
<div class="fw-bold">{metrics["issues_touched"]}</div>
</div>
<div class="col">
<div class="text-muted">Tests</div>
<div class="fw-bold">{metrics["tests_affected"]}</div>
</div>
<div class="col">
<div class="text-muted">Tokens</div>
<div class="fw-bold {"text-success" if metrics["token_net"] >= 0 else "text-danger"}">
{"+" if metrics["token_net"] > 0 else ""}{metrics["token_net"]}
</div>
</div>
</div>
{patterns_html}
</div>
</div>
"""
html_content = _render_single_panel_wrapper(agent_id, period_type, scorecard)
return HTMLResponse(content=html_content)
except Exception as exc:
logger.error("Failed to render scorecard panel for %s: %s", agent_id, exc)
return HTMLResponse(
content=f"""
<div class="card mc-panel border-danger">
<h5 class="card-title">{agent_id.title()}</h5>
<p class="text-danger">Error loading scorecard: {str(exc)}</p>
</div>
""",
status_code=200,
return HTMLResponse(content=_render_error_scorecard(agent_id, str(exc)))
def _render_all_panels_grid(
scorecards: list[ScorecardSummary],
period_type: PeriodType,
) -> str:
"""Render all scorecard panels in a grid layout.
Args:
scorecards: List of scorecard summaries
period_type: Daily or weekly period
Returns:
HTML string with all panels in a grid
"""
panels: list[str] = []
for scorecard in scorecards:
panel_html = _render_scorecard_panel(
scorecard.agent_id,
period_type,
scorecard.to_dict(),
)
# Wrap each panel in a grid column
wrapped = f'<div class="col-md-6 col-lg-4 mb-3">{panel_html}</div>'
panels.append(wrapped)
return f"""
<div class="row">
{"".join(panels)}
</div>
<div class="text-muted small mt-2">
        Generated: {datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S UTC")}
</div>
"""
@router.get("/all/panels", response_class=HTMLResponse)
@@ -258,96 +420,15 @@ async def all_scorecard_panels(
Returns:
HTML with all scorecard panels
"""
try:
period_type = PeriodType(period.lower())
except ValueError:
period_type = PeriodType.daily
period_type = _parse_period(period)
try:
scorecards = generate_all_scorecards(period_type)
panels: list[str] = []
for scorecard in scorecards:
data = scorecard.to_dict()
# Build patterns HTML
patterns_html = ""
if data["patterns"]:
patterns_list = "".join([f"<li>{p}</li>" for p in data["patterns"]])
patterns_html = f"""
<div class="mt-3">
<h6>Patterns</h6>
<ul class="list-unstyled text-info">
{patterns_list}
</ul>
</div>
"""
# Build bullets HTML
bullets_html = "".join([f"<li>{b}</li>" for b in data["narrative_bullets"]])
metrics = data["metrics"]
panel_html = f"""
<div class="col-md-6 col-lg-4 mb-3">
<div class="card mc-panel">
<div class="card-header d-flex justify-content-between align-items-center">
<h5 class="card-title mb-0">{scorecard.agent_id.title()}</h5>
<span class="badge bg-secondary">{_format_period_label(period_type)}</span>
</div>
<div class="card-body">
<ul class="list-unstyled mb-3">
{bullets_html}
</ul>
<div class="row text-center small">
<div class="col">
<div class="text-muted">PRs</div>
<div class="fw-bold">{metrics["prs_opened"]}/{metrics["prs_merged"]}</div>
<div class="text-muted" style="font-size: 0.75rem;">
{int(metrics["pr_merge_rate"] * 100)}% merged
</div>
</div>
<div class="col">
<div class="text-muted">Issues</div>
<div class="fw-bold">{metrics["issues_touched"]}</div>
</div>
<div class="col">
<div class="text-muted">Tests</div>
<div class="fw-bold">{metrics["tests_affected"]}</div>
</div>
<div class="col">
<div class="text-muted">Tokens</div>
<div class="fw-bold {"text-success" if metrics["token_net"] >= 0 else "text-danger"}">
{"+" if metrics["token_net"] > 0 else ""}{metrics["token_net"]}
</div>
</div>
</div>
{patterns_html}
</div>
</div>
</div>
"""
panels.append(panel_html)
html_content = f"""
<div class="row">
{"".join(panels)}
</div>
<div class="text-muted small mt-2">
Generated: {datetime.now().strftime("%Y-%m-%d %H:%M:%S UTC")}
</div>
"""
html_content = _render_all_panels_grid(scorecards, period_type)
return HTMLResponse(content=html_content)
except Exception as exc:
logger.error("Failed to render all scorecard panels: %s", exc)
return HTMLResponse(
content=f"""
<div class="alert alert-danger">
Error loading scorecards: {exc}
</div>
""",
status_code=200,
)

View File

@@ -0,0 +1,40 @@
"""WebSocket emitter for the sovereignty metrics dashboard widget.
Streams real-time sovereignty snapshots to connected clients every
*_PUSH_INTERVAL* seconds. The snapshot includes per-layer sovereignty
percentages, API cost rate, and skill crystallisation count.
Refs: #954, #953
"""
import asyncio
import json
import logging
from fastapi import APIRouter, WebSocket
router = APIRouter(tags=["sovereignty"])
logger = logging.getLogger(__name__)
_PUSH_INTERVAL = 5 # seconds between snapshot pushes
@router.websocket("/ws/sovereignty")
async def sovereignty_ws(websocket: WebSocket) -> None:
"""Stream sovereignty metric snapshots to the dashboard widget."""
from timmy.sovereignty.metrics import get_metrics_store
await websocket.accept()
logger.info("Sovereignty WS connected")
store = get_metrics_store()
try:
# Send initial snapshot immediately
await websocket.send_text(json.dumps(store.get_snapshot()))
while True:
await asyncio.sleep(_PUSH_INTERVAL)
await websocket.send_text(json.dumps(store.get_snapshot()))
except Exception:
logger.debug("Sovereignty WS disconnected")
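The push loop above (initial snapshot immediately, then one per interval) can be exercised without FastAPI by swapping in a fake transport. This is a standalone sketch — `push_snapshots`, the shortened interval, and the dummy snapshot are hypothetical, not part of this PR:

```python
import asyncio
import json

PUSH_INTERVAL = 0.01  # shortened from the 5 s used in production

async def push_snapshots(send, get_snapshot, count: int) -> None:
    # Mirrors sovereignty_ws(): one snapshot up front, then one per interval.
    send(json.dumps(get_snapshot()))
    for _ in range(count - 1):
        await asyncio.sleep(PUSH_INTERVAL)
        send(json.dumps(get_snapshot()))

sent: list[str] = []
asyncio.run(push_snapshots(sent.append, lambda: {"layers": {}, "cost_rate": 0.0}, 3))
print(len(sent))  # 3
```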

View File

@@ -7,6 +7,8 @@ router = APIRouter(prefix="/telegram", tags=["telegram"])
class TokenPayload(BaseModel):
"""Request payload containing a Telegram bot token."""
token: str

View File

@@ -1,11 +1,14 @@
"""Voice routes — /voice/* and /voice/enhanced/* endpoints.
Provides NLU intent detection, TTS control, the full voice-to-action
pipeline (detect intent → execute → optionally speak), and the voice
button UI page.
pipeline (detect intent → execute → optionally speak), the voice
button UI page, and voice settings customisation.
"""
import asyncio
import json
import logging
from pathlib import Path
from fastapi import APIRouter, Form, Request
from fastapi.responses import HTMLResponse
@@ -14,6 +17,31 @@ from dashboard.templating import templates
from integrations.voice.nlu import detect_intent, extract_command
from timmy.agent import create_timmy
# ── Voice settings persistence ───────────────────────────────────────────────
_VOICE_SETTINGS_FILE = Path("data/voice_settings.json")
_DEFAULT_VOICE_SETTINGS: dict = {"rate": 175, "volume": 0.9, "voice_id": ""}
def _load_voice_settings() -> dict:
"""Read persisted voice settings from disk; return defaults on any error."""
try:
if _VOICE_SETTINGS_FILE.exists():
return json.loads(_VOICE_SETTINGS_FILE.read_text())
except Exception as exc:
logger.warning("Failed to load voice settings: %s", exc)
return dict(_DEFAULT_VOICE_SETTINGS)
def _save_voice_settings(data: dict) -> None:
"""Persist voice settings to disk; log and continue on any error."""
try:
_VOICE_SETTINGS_FILE.parent.mkdir(parents=True, exist_ok=True)
_VOICE_SETTINGS_FILE.write_text(json.dumps(data))
except Exception as exc:
logger.warning("Failed to save voice settings: %s", exc)
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/voice", tags=["voice"])
@@ -152,3 +180,58 @@ async def process_voice_input(
"error": error,
"spoken": speak_response and response_text is not None,
}
# ── Voice settings UI ────────────────────────────────────────────────────────
@router.get("/settings", response_class=HTMLResponse)
async def voice_settings_page(request: Request):
"""Render the voice customisation settings page."""
current = await asyncio.to_thread(_load_voice_settings)
voices: list[dict] = []
try:
from timmy_serve.voice_tts import voice_tts
if voice_tts.available:
voices = await asyncio.to_thread(voice_tts.get_voices)
except Exception as exc:
logger.debug("Voice settings page: TTS not available — %s", exc)
return templates.TemplateResponse(
request,
"voice_settings.html",
{"settings": current, "voices": voices},
)
@router.get("/settings/data")
async def voice_settings_data():
"""Return current voice settings as JSON."""
return await asyncio.to_thread(_load_voice_settings)
@router.post("/settings/save")
async def voice_settings_save(
rate: int = Form(175),
volume: float = Form(0.9),
voice_id: str = Form(""),
):
"""Persist voice settings and apply them to the running TTS engine."""
rate = max(50, min(400, rate))
volume = max(0.0, min(1.0, volume))
data = {"rate": rate, "volume": volume, "voice_id": voice_id}
# Apply to the live TTS engine (graceful degradation when unavailable)
try:
from timmy_serve.voice_tts import voice_tts
if voice_tts.available:
await asyncio.to_thread(voice_tts.set_rate, rate)
await asyncio.to_thread(voice_tts.set_volume, volume)
if voice_id:
await asyncio.to_thread(voice_tts.set_voice, voice_id)
except Exception as exc:
logger.warning("Voice settings: failed to apply to TTS engine — %s", exc)
await asyncio.to_thread(_save_voice_settings, data)
return {"saved": True, "settings": data}
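The clamping in voice_settings_save() keeps rate within pyttsx3-friendly WPM bounds and volume within [0, 1]. A standalone sketch of that logic (the helper name is hypothetical):

```python
def clamp_settings(rate: int, volume: float) -> tuple[int, float]:
    # Rate clamped to [50, 400] WPM, volume clamped to [0.0, 1.0],
    # matching the bounds used by the /settings/save handler.
    return max(50, min(400, rate)), max(0.0, min(1.0, volume))

print(clamp_settings(1000, 1.5))  # (400, 1.0)
```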

View File

@@ -51,6 +51,8 @@ def _get_db() -> Generator[sqlite3.Connection, None, None]:
class _EnumLike:
"""Lightweight enum-like wrapper for string values used in templates."""
def __init__(self, v: str):
self.value = v

View File

@@ -23,6 +23,8 @@ TRACKED_AGENTS = frozenset({"hermes", "kimi", "manus", "claude", "gemini"})
class PeriodType(StrEnum):
"""Scorecard reporting period type."""
daily = "daily"
weekly = "weekly"

View File

@@ -88,6 +88,7 @@
<a href="/lightning/ledger" class="mc-test-link">LEDGER</a>
<a href="/creative/ui" class="mc-test-link">CREATIVE</a>
<a href="/voice/button" class="mc-test-link">VOICE</a>
<a href="/voice/settings" class="mc-test-link">VOICE SETTINGS</a>
<a href="/mobile" class="mc-test-link" title="Mobile-optimized view">MOBILE</a>
<a href="/mobile/local" class="mc-test-link" title="Local AI on iPhone">LOCAL AI</a>
</div>
@@ -145,6 +146,7 @@
<a href="/lightning/ledger" class="mc-mobile-link">LEDGER</a>
<a href="/creative/ui" class="mc-mobile-link">CREATIVE</a>
<a href="/voice/button" class="mc-mobile-link">VOICE</a>
<a href="/voice/settings" class="mc-mobile-link">VOICE SETTINGS</a>
<a href="/mobile" class="mc-mobile-link">MOBILE</a>
<a href="/mobile/local" class="mc-mobile-link">LOCAL AI</a>
<div class="mc-mobile-menu-footer">

View File

@@ -14,6 +14,11 @@
<div class="mc-loading-placeholder">LOADING...</div>
{% endcall %}
<!-- Emotional Profile (HTMX polled) -->
{% call panel("EMOTIONAL PROFILE", hx_get="/agents/emotional-profile", hx_trigger="every 10s") %}
<div class="mc-loading-placeholder">LOADING...</div>
{% endcall %}
<!-- System Health (HTMX polled) -->
{% call panel("SYSTEM HEALTH", hx_get="/health/status", hx_trigger="every 30s") %}
<div class="health-row">

View File

@@ -0,0 +1,37 @@
{% if not profiles %}
<div class="mc-muted" style="font-size:11px; padding:4px;">
No agents loaded
</div>
{% endif %}
{% for p in profiles %}
{% set color_map = {
"cautious": "var(--amber)",
"adventurous": "var(--green)",
"analytical": "var(--purple)",
"frustrated": "var(--red)",
"confident": "var(--green)",
"curious": "var(--orange)",
"calm": "var(--text-dim)"
} %}
{% set emo_color = color_map.get(p.current_emotion, "var(--text-dim)") %}
<div class="mc-emotion-row" style="margin-bottom:8px; padding:6px 8px; border-left:3px solid {{ emo_color }};">
<div class="d-flex justify-content-between align-items-center" style="margin-bottom:2px;">
<span style="font-size:11px; font-weight:bold; letter-spacing:.08em; color:var(--text-bright);">
{{ p.agent_name | upper | e }}
</span>
<span style="font-size:10px; color:{{ emo_color }}; letter-spacing:.06em;">
{{ p.emotion_label | e }}
</span>
</div>
<div style="margin-bottom:4px;">
<div style="height:4px; background:var(--bg-deep); border-radius:2px; overflow:hidden;">
<div style="height:100%; width:{{ (p.intensity * 100) | int }}%; background:{{ emo_color }}; border-radius:2px; transition:width 0.3s;"></div>
</div>
</div>
<div style="font-size:9px; color:var(--text-dim); letter-spacing:.06em;">
{{ p.intensity_label | upper | e }}
{% if p.trigger_event %} · {{ p.trigger_event | replace("_", " ") | upper | e }}{% endif %}
</div>
</div>
{% endfor %}

View File

@@ -0,0 +1,131 @@
{% extends "base.html" %}
{% from "macros.html" import panel %}
{% block title %}Voice Settings{% endblock %}
{% block extra_styles %}{% endblock %}
{% block content %}
<div class="voice-settings-page py-3">
{% call panel("VOICE SETTINGS") %}
<form id="voice-settings-form">
<div class="vs-field">
<label class="vs-label" for="rate-slider">
SPEED &mdash; <span class="vs-value" id="rate-val">{{ settings.rate }}</span> WPM
</label>
<input type="range" class="vs-slider" id="rate-slider" name="rate"
min="50" max="400" step="5" value="{{ settings.rate }}"
oninput="document.getElementById('rate-val').textContent=this.value">
<div class="vs-range-labels"><span>Slow</span><span>Fast</span></div>
</div>
<div class="vs-field">
<label class="vs-label" for="vol-slider">
VOLUME &mdash; <span class="vs-value" id="vol-val">{{ (settings.volume * 100)|int }}</span>%
</label>
<input type="range" class="vs-slider" id="vol-slider" name="volume"
min="0" max="100" step="5" value="{{ (settings.volume * 100)|int }}"
oninput="document.getElementById('vol-val').textContent=this.value">
<div class="vs-range-labels"><span>Quiet</span><span>Loud</span></div>
</div>
<div class="vs-field">
<label class="vs-label" for="voice-select">VOICE MODEL</label>
{% if voices %}
<select class="vs-select" id="voice-select" name="voice_id">
<option value="">&#8212; System Default &#8212;</option>
{% for v in voices %}
<option value="{{ v.id }}" {% if v.id == settings.voice_id %}selected{% endif %}>
{{ v.name }}
</option>
{% endfor %}
</select>
{% else %}
<div class="vs-unavailable">Server TTS (pyttsx3) unavailable &mdash; preview uses browser speech synthesis</div>
<input type="hidden" id="voice-select" name="voice_id" value="{{ settings.voice_id }}">
{% endif %}
</div>
<div class="vs-field">
<label class="vs-label" for="preview-text">PREVIEW TEXT</label>
<input type="text" class="vs-input" id="preview-text"
value="Hello, I am Timmy. Your local AI assistant."
placeholder="Enter text to preview...">
</div>
<div class="vs-actions">
<button type="button" class="vs-btn-preview" id="preview-btn" onclick="previewVoice()">
&#9654; PREVIEW
</button>
<button type="button" class="vs-btn-save" id="save-btn" onclick="saveSettings()">
SAVE SETTINGS
</button>
</div>
</form>
{% endcall %}
</div>
<script>
function previewVoice() {
var text = document.getElementById('preview-text').value.trim() ||
'Hello, I am Timmy. Your local AI assistant.';
var rate = parseInt(document.getElementById('rate-slider').value, 10);
var volume = parseInt(document.getElementById('vol-slider').value, 10) / 100;
if (!('speechSynthesis' in window)) {
McToast.show('Speech synthesis not supported in this browser', 'warn');
return;
}
window.speechSynthesis.cancel();
var utterance = new SpeechSynthesisUtterance(text);
// Web Speech API rate: 1.0 ≈ 175 WPM (default)
utterance.rate = rate / 175;
utterance.volume = volume;
// Best-effort voice match from server selection
var voiceSelect = document.getElementById('voice-select');
if (voiceSelect && voiceSelect.value) {
var selectedText = voiceSelect.options[voiceSelect.selectedIndex].text.toLowerCase();
var firstWord = selectedText.split(' ')[0];
var browserVoices = window.speechSynthesis.getVoices();
var matched = browserVoices.find(function(v) {
return v.name.toLowerCase().includes(firstWord);
});
if (matched) { utterance.voice = matched; }
}
window.speechSynthesis.speak(utterance);
McToast.show('Playing preview\u2026', 'info');
}
async function saveSettings() {
var rate = document.getElementById('rate-slider').value;
var volPct = parseInt(document.getElementById('vol-slider').value, 10);
var voiceId = document.getElementById('voice-select').value;
var body = new URLSearchParams({
rate: rate,
volume: (volPct / 100).toFixed(2),
voice_id: voiceId
});
try {
var resp = await fetch('/voice/settings/save', {
method: 'POST',
headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
body: body.toString()
});
var data = await resp.json();
if (data.saved) {
McToast.show('Voice settings saved.', 'info');
} else {
McToast.show('Failed to save settings.', 'error');
}
} catch (e) {
McToast.show('Error saving settings.', 'error');
}
}
</script>
{% endblock %}
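previewVoice() above converts a WPM slider value to a Web Speech API rate by dividing by 175 (1.0 ≈ 175 WPM is an approximation, not a spec value). A standalone sketch that also clamps to the API's documented rate range of [0.1, 10] — the clamp is an addition here, not in the template's code:

```javascript
function wpmToRate(wpm) {
  // SpeechSynthesisUtterance.rate accepts 0.1–10; 1.0 is the default speed.
  return Math.min(10, Math.max(0.1, wpm / 175));
}
console.log(wpmToRate(175)); // 1
```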

View File

@@ -24,6 +24,8 @@ MAX_MESSAGES: int = 500
@dataclass
class Message:
"""A single chat message with role, content, timestamp, and source."""
role: str # "user" | "agent" | "error"
content: str
timestamp: str

View File

@@ -0,0 +1,9 @@
"""Hermes health monitor — system resources + model management.
Monitors the local machine (Hermes/M3 Max) for memory pressure, disk usage,
Ollama model health, zombie processes, and network connectivity.
"""
from infrastructure.hermes.monitor import HealthLevel, HealthReport, HermesMonitor, hermes_monitor
__all__ = ["HermesMonitor", "HealthLevel", "HealthReport", "hermes_monitor"]

View File

@@ -0,0 +1,660 @@
"""Hermes health monitor — system resources + model management.
Monitors the local machine (Hermes/M3 Max) and keeps it running smoothly.
Runs every 5 minutes, auto-resolves issues where possible, alerts when
human intervention is needed.
Monitors:
1. Memory pressure — unified memory, alert if <4GB free, unload models
2. Disk usage — alert if <10GB free, clean temp files
3. Ollama status — verify reachable, restart if crashed, manage loaded models
4. Process health — detect zombie processes
5. Network — verify Gitea connectivity
Refs: #1073
"""
import asyncio
import json
import logging
import shutil
import subprocess
import tempfile
import time
import urllib.request
from dataclasses import dataclass, field
from datetime import UTC, datetime
from enum import StrEnum
from typing import Any
from config import settings
logger = logging.getLogger(__name__)
class HealthLevel(StrEnum):
"""Severity level for a health check result."""
OK = "ok"
WARNING = "warning"
CRITICAL = "critical"
UNKNOWN = "unknown"
@dataclass
class CheckResult:
"""Result of a single health check."""
name: str
level: HealthLevel
message: str
details: dict[str, Any] = field(default_factory=dict)
auto_resolved: bool = False
needs_human: bool = False
def to_dict(self) -> dict[str, Any]:
return {
"name": self.name,
"level": self.level.value,
"message": self.message,
"details": self.details,
"auto_resolved": self.auto_resolved,
"needs_human": self.needs_human,
}
@dataclass
class HealthReport:
"""Full health report from a single monitor cycle."""
timestamp: str
checks: list[CheckResult]
overall: HealthLevel
@property
def has_issues(self) -> bool:
return any(c.level != HealthLevel.OK for c in self.checks)
def to_dict(self) -> dict[str, Any]:
return {
"timestamp": self.timestamp,
"overall": self.overall.value,
"has_issues": self.has_issues,
"checks": [c.to_dict() for c in self.checks],
}
class HermesMonitor:
"""System health monitor for Hermes (local M3 Max machine).
All blocking I/O (subprocess, HTTP) is wrapped in asyncio.to_thread()
so it never blocks the event loop. Results are cached so the dashboard
can read the last report without triggering a new cycle.
"""
OLLAMA_REQUEST_TIMEOUT = 5
NETWORK_REQUEST_TIMEOUT = 5
def __init__(self) -> None:
self._last_report: HealthReport | None = None
self._last_run_ts: float = 0.0
@property
def last_report(self) -> HealthReport | None:
"""Most recent health report, or None if no cycle has run yet."""
return self._last_report
@property
def seconds_since_last_run(self) -> float:
if self._last_run_ts == 0.0:
return float("inf")
return time.monotonic() - self._last_run_ts
async def run_cycle(self) -> HealthReport:
"""Run a full health check cycle and return the report."""
self._last_run_ts = time.monotonic()
logger.info("Hermes health cycle starting")
check_fns = [
self._check_memory(),
self._check_disk(),
self._check_ollama(),
self._check_processes(),
self._check_network(),
]
raw_results = await asyncio.gather(*check_fns, return_exceptions=True)
checks: list[CheckResult] = []
for i, r in enumerate(raw_results):
if isinstance(r, Exception):
name = ["memory", "disk", "ollama", "processes", "network"][i]
logger.warning("Hermes check '%s' raised: %s", name, r)
checks.append(
CheckResult(
name=name,
level=HealthLevel.UNKNOWN,
message=f"Check error: {r}",
)
)
else:
checks.append(r)
# Compute overall level
levels = {c.level for c in checks}
if HealthLevel.CRITICAL in levels:
overall = HealthLevel.CRITICAL
elif HealthLevel.WARNING in levels:
overall = HealthLevel.WARNING
elif HealthLevel.UNKNOWN in levels:
overall = HealthLevel.UNKNOWN
else:
overall = HealthLevel.OK
report = HealthReport(
timestamp=datetime.now(UTC).isoformat(),
checks=checks,
overall=overall,
)
self._last_report = report
await self._handle_alerts(report)
logger.info("Hermes health cycle complete — overall: %s", overall.value)
return report
# ── Memory ───────────────────────────────────────────────────────────────
async def _check_memory(self) -> CheckResult:
"""Check unified memory usage (macOS vm_stat)."""
memory_free_min_gb = getattr(settings, "hermes_memory_free_min_gb", 4.0)
try:
info = await asyncio.to_thread(self._get_memory_info)
free_gb = info.get("free_gb", 0.0)
total_gb = info.get("total_gb", 0.0)
details: dict[str, Any] = {
"free_gb": round(free_gb, 2),
"total_gb": round(total_gb, 2),
}
if free_gb < memory_free_min_gb:
# Attempt auto-remediation: unload Ollama models
unloaded = await self._unload_ollama_models()
if unloaded:
return CheckResult(
name="memory",
level=HealthLevel.WARNING,
message=(
f"Low memory ({free_gb:.1f}GB free) — "
f"unloaded {unloaded} Ollama model(s)"
),
details={**details, "models_unloaded": unloaded},
auto_resolved=True,
)
return CheckResult(
name="memory",
level=HealthLevel.CRITICAL,
message=(
f"Critical: only {free_gb:.1f}GB free (threshold: {memory_free_min_gb}GB)"
),
details=details,
needs_human=True,
)
return CheckResult(
name="memory",
level=HealthLevel.OK,
message=f"Memory OK — {free_gb:.1f}GB free of {total_gb:.1f}GB",
details=details,
)
except Exception as exc:
logger.warning("Memory check failed: %s", exc)
return CheckResult(
name="memory",
level=HealthLevel.UNKNOWN,
message=f"Memory check unavailable: {exc}",
)
def _get_memory_info(self) -> dict[str, float]:
"""Get memory stats via macOS sysctl + vm_stat.
Falls back gracefully on non-macOS systems.
"""
gb = 1024**3
total_bytes = 0.0
free_bytes = 0.0
# Total memory via sysctl
try:
result = subprocess.run(
["sysctl", "-n", "hw.memsize"],
capture_output=True,
text=True,
timeout=3,
)
total_bytes = float(result.stdout.strip())
except Exception:
pass
# Free + inactive pages via vm_stat (macOS)
try:
result = subprocess.run(
["vm_stat"],
capture_output=True,
text=True,
timeout=3,
)
page_size = 16384 # 16 KB default on Apple Silicon
for line in result.stdout.splitlines():
if "page size of" in line:
parts = line.split()
for i, part in enumerate(parts):
if part == "of" and i + 1 < len(parts):
try:
page_size = int(parts[i + 1])
except ValueError:
pass
elif "Pages free:" in line:
pages = int(line.split(":")[1].strip().rstrip("."))
free_bytes += pages * page_size
elif "Pages inactive:" in line:
pages = int(line.split(":")[1].strip().rstrip("."))
free_bytes += pages * page_size
except Exception:
pass
return {
"total_gb": total_bytes / gb if total_bytes else 0.0,
"free_gb": free_bytes / gb if free_bytes else 0.0,
}
# ── Disk ─────────────────────────────────────────────────────────────────
async def _check_disk(self) -> CheckResult:
"""Check disk usage via shutil.disk_usage."""
disk_free_min_gb = getattr(settings, "hermes_disk_free_min_gb", 10.0)
try:
usage = await asyncio.to_thread(shutil.disk_usage, "/")
free_gb = usage.free / (1024**3)
total_gb = usage.total / (1024**3)
used_pct = (usage.used / usage.total) * 100
details: dict[str, Any] = {
"free_gb": round(free_gb, 2),
"total_gb": round(total_gb, 2),
"used_pct": round(used_pct, 1),
}
if free_gb < disk_free_min_gb:
cleaned_gb = await self._cleanup_temp_files()
if cleaned_gb > 0.01:
return CheckResult(
name="disk",
level=HealthLevel.WARNING,
message=(
f"Low disk ({free_gb:.1f}GB free) — "
f"cleaned {cleaned_gb:.2f}GB from /tmp"
),
details={**details, "cleaned_gb": round(cleaned_gb, 2)},
auto_resolved=True,
)
return CheckResult(
name="disk",
level=HealthLevel.CRITICAL,
message=(
f"Critical: only {free_gb:.1f}GB free (threshold: {disk_free_min_gb}GB)"
),
details=details,
needs_human=True,
)
return CheckResult(
name="disk",
level=HealthLevel.OK,
message=f"Disk OK — {free_gb:.1f}GB free ({used_pct:.0f}% used)",
details=details,
)
except Exception as exc:
logger.warning("Disk check failed: %s", exc)
return CheckResult(
name="disk",
level=HealthLevel.UNKNOWN,
message=f"Disk check unavailable: {exc}",
)
async def _cleanup_temp_files(self) -> float:
"""Remove /tmp files older than 24 hours. Returns GB freed."""
return await asyncio.to_thread(self._cleanup_temp_files_sync)
def _cleanup_temp_files_sync(self) -> float:
"""Synchronous /tmp cleanup — only touches files older than 24 hours."""
from pathlib import Path
freed_bytes = 0
cutoff = time.time() - 86400 # 24 hours ago
try:
tmp = Path(tempfile.gettempdir())
for item in tmp.iterdir():
try:
stat = item.stat()
if stat.st_mtime >= cutoff:
continue
if item.is_file():
freed_bytes += stat.st_size
item.unlink(missing_ok=True)
elif item.is_dir():
dir_size = sum(f.stat().st_size for f in item.rglob("*") if f.is_file())
freed_bytes += dir_size
shutil.rmtree(str(item), ignore_errors=True)
except (PermissionError, OSError):
pass # Skip files we can't touch
except Exception as exc:
logger.warning("Temp cleanup error: %s", exc)
freed_gb = freed_bytes / (1024**3)
if freed_gb > 0.001:
logger.info("Hermes disk cleanup: freed %.2fGB from /tmp", freed_gb)
return freed_gb
# ── Ollama ───────────────────────────────────────────────────────────────
async def _check_ollama(self) -> CheckResult:
"""Check Ollama status and loaded models."""
try:
status = await asyncio.to_thread(self._get_ollama_status)
if not status.get("reachable"):
restarted = await self._restart_ollama()
if restarted:
return CheckResult(
name="ollama",
level=HealthLevel.WARNING,
message="Ollama was unreachable — restart initiated",
details={"restart_attempted": True},
auto_resolved=True,
)
return CheckResult(
name="ollama",
level=HealthLevel.CRITICAL,
message="Ollama unreachable and restart failed",
details={"reachable": False},
needs_human=True,
)
models = status.get("models", [])
loaded = status.get("loaded_models", [])
return CheckResult(
name="ollama",
level=HealthLevel.OK,
message=(f"Ollama OK — {len(models)} model(s) available, {len(loaded)} loaded"),
details={
"reachable": True,
"model_count": len(models),
"loaded_count": len(loaded),
"loaded_models": [m.get("name", "") for m in loaded],
},
)
except Exception as exc:
logger.warning("Ollama check failed: %s", exc)
return CheckResult(
name="ollama",
level=HealthLevel.UNKNOWN,
message=f"Ollama check failed: {exc}",
)
def _get_ollama_status(self) -> dict[str, Any]:
"""Synchronous Ollama status — checks /api/tags and /api/ps."""
url = settings.normalized_ollama_url
try:
req = urllib.request.Request(
f"{url}/api/tags",
method="GET",
headers={"Accept": "application/json"},
)
with urllib.request.urlopen(req, timeout=self.OLLAMA_REQUEST_TIMEOUT) as resp:
data = json.loads(resp.read().decode())
models = data.get("models", [])
except Exception:
return {"reachable": False, "models": [], "loaded_models": []}
# /api/ps lists currently loaded (in-memory) models — Ollama >=0.2
loaded: list[dict] = []
try:
req = urllib.request.Request(
f"{url}/api/ps",
method="GET",
headers={"Accept": "application/json"},
)
with urllib.request.urlopen(req, timeout=self.OLLAMA_REQUEST_TIMEOUT) as resp:
ps_data = json.loads(resp.read().decode())
loaded = ps_data.get("models", [])
except Exception:
pass # /api/ps absent on older Ollama — non-fatal
return {"reachable": True, "models": models, "loaded_models": loaded}
async def _unload_ollama_models(self) -> int:
"""Unload in-memory Ollama models to free unified memory.
Uses the keep_alive=0 trick: POSTing to /api/generate with
keep_alive=0 causes Ollama to immediately evict the model.
Returns the number of models successfully unloaded.
"""
return await asyncio.to_thread(self._unload_ollama_models_sync)
def _unload_ollama_models_sync(self) -> int:
"""Synchronous model unload implementation."""
url = settings.normalized_ollama_url
unloaded = 0
try:
req = urllib.request.Request(
f"{url}/api/ps",
method="GET",
headers={"Accept": "application/json"},
)
with urllib.request.urlopen(req, timeout=self.OLLAMA_REQUEST_TIMEOUT) as resp:
ps_data = json.loads(resp.read().decode())
loaded = ps_data.get("models", [])
except Exception:
return 0
for model in loaded:
name = model.get("name", "")
if not name:
continue
try:
payload = json.dumps({"model": name, "keep_alive": 0}).encode()
req = urllib.request.Request(
f"{url}/api/generate",
data=payload,
method="POST",
headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=10) as _:
pass
logger.info("Hermes: unloaded Ollama model %s", name)
unloaded += 1
except Exception as exc:
logger.warning("Hermes: failed to unload model %s: %s", name, exc)
return unloaded
async def _restart_ollama(self) -> bool:
"""Attempt to restart the Ollama service via launchctl or brew."""
return await asyncio.to_thread(self._restart_ollama_sync)
def _restart_ollama_sync(self) -> bool:
"""Try launchctl first, then brew services."""
# macOS launchctl (installed via official Ollama installer)
try:
result = subprocess.run(
["launchctl", "stop", "com.ollama.ollama"],
capture_output=True,
timeout=10,
)
if result.returncode == 0:
time.sleep(2)
subprocess.run(
["launchctl", "start", "com.ollama.ollama"],
capture_output=True,
timeout=10,
)
logger.info("Hermes: Ollama restarted via launchctl")
return True
except Exception:
pass
# Homebrew fallback
try:
result = subprocess.run(
["brew", "services", "restart", "ollama"],
capture_output=True,
timeout=20,
)
if result.returncode == 0:
logger.info("Hermes: Ollama restarted via brew services")
return True
except Exception:
pass
logger.warning("Hermes: Ollama restart failed — manual intervention needed")
return False
# ── Processes ────────────────────────────────────────────────────────────
async def _check_processes(self) -> CheckResult:
"""Check for zombie processes via ps aux."""
try:
result = await asyncio.to_thread(self._get_zombie_processes)
zombies = result.get("zombies", [])
if zombies:
return CheckResult(
name="processes",
level=HealthLevel.WARNING,
message=f"Found {len(zombies)} zombie process(es)",
details={"zombies": zombies[:5]},
needs_human=len(zombies) > 3,
)
return CheckResult(
name="processes",
level=HealthLevel.OK,
message="Processes OK — no zombies detected",
details={"zombie_count": 0},
)
except Exception as exc:
logger.warning("Process check failed: %s", exc)
return CheckResult(
name="processes",
level=HealthLevel.UNKNOWN,
message=f"Process check unavailable: {exc}",
)
def _get_zombie_processes(self) -> dict[str, Any]:
"""Detect zombie processes (state 'Z') via ps aux."""
result = subprocess.run(
["ps", "aux"],
capture_output=True,
text=True,
timeout=5,
)
zombies = []
for line in result.stdout.splitlines()[1:]:  # Skip header row
parts = line.split(None, 10)
if len(parts) >= 8 and parts[7].startswith("Z"):  # STAT can be "Z", "Z+", "Zs", …
zombies.append(
{
"pid": parts[1],
"command": parts[10][:80] if len(parts) > 10 else "",
}
)
return {"zombies": zombies}
# ── Network ──────────────────────────────────────────────────────────────
async def _check_network(self) -> CheckResult:
"""Check Gitea connectivity."""
try:
result = await asyncio.to_thread(self._check_gitea_connectivity)
reachable = result.get("reachable", False)
latency_ms = result.get("latency_ms", -1.0)
if not reachable:
return CheckResult(
name="network",
level=HealthLevel.WARNING,
message=f"Gitea unreachable: {result.get('error', 'unknown')}",
details=result,
needs_human=True,
)
return CheckResult(
name="network",
level=HealthLevel.OK,
message=f"Network OK — Gitea reachable ({latency_ms:.0f}ms)",
details=result,
)
except Exception as exc:
logger.warning("Network check failed: %s", exc)
return CheckResult(
name="network",
level=HealthLevel.UNKNOWN,
message=f"Network check unavailable: {exc}",
)
def _check_gitea_connectivity(self) -> dict[str, Any]:
"""Synchronous Gitea reachability check."""
url = settings.gitea_url
start = time.monotonic()
try:
req = urllib.request.Request(
f"{url}/api/v1/version",
method="GET",
headers={"Accept": "application/json"},
)
with urllib.request.urlopen(req, timeout=self.NETWORK_REQUEST_TIMEOUT) as resp:
latency_ms = (time.monotonic() - start) * 1000
return {
"reachable": resp.status == 200,
"latency_ms": round(latency_ms, 1),
"url": url,
}
except Exception as exc:
return {
"reachable": False,
"error": str(exc),
"url": url,
"latency_ms": -1.0,
}
# ── Alerts ───────────────────────────────────────────────────────────────
async def _handle_alerts(self, report: HealthReport) -> None:
"""Send push notifications for issues that need attention."""
try:
from infrastructure.notifications.push import notifier
except Exception:
return
for check in report.checks:
if check.level == HealthLevel.CRITICAL or check.needs_human:
notifier.notify(
title=f"Hermes Alert: {check.name}",
message=check.message,
category="system",
native=check.level == HealthLevel.CRITICAL,
)
elif check.level == HealthLevel.WARNING and check.auto_resolved:
notifier.notify(
title=f"Hermes: {check.name} auto-fixed",
message=check.message,
category="system",
)
# Module-level singleton
hermes_monitor = HermesMonitor()

View File

@@ -21,6 +21,8 @@ logger = logging.getLogger(__name__)
@dataclass
class Notification:
"""A push notification with title, message, category, and read status."""
id: int
title: str
message: str

View File

@@ -3,6 +3,14 @@
from .api import router
from .cascade import CascadeRouter, Provider, ProviderStatus, get_router
from .history import HealthHistoryStore, get_history_store
from .metabolic import (
DEFAULT_TIER_MODELS,
MetabolicRouter,
ModelTier,
build_prompt,
classify_complexity,
get_metabolic_router,
)
__all__ = [
"CascadeRouter",
@@ -12,4 +20,11 @@ __all__ = [
"router",
"HealthHistoryStore",
"get_history_store",
# Metabolic router
"MetabolicRouter",
"ModelTier",
"DEFAULT_TIER_MODELS",
"classify_complexity",
"build_prompt",
"get_metabolic_router",
]

View File

@@ -114,7 +114,7 @@ class Provider:
type: str # ollama, openai, anthropic
enabled: bool
priority: int
tier: str | None = None # e.g., "local", "standard_cloud", "frontier"
tier: str | None = None # e.g., "local", "standard_cloud", "frontier"
url: str | None = None
api_key: str | None = None
base_url: str | None = None
@@ -573,7 +573,6 @@ class CascadeRouter:
if not providers:
raise RuntimeError(f"No providers found for tier: {cascade_tier}")
for provider in providers:
if not self._is_provider_available(provider):
continue

View File

@@ -0,0 +1,424 @@
"""Three-tier metabolic LLM router.
Routes queries to the cheapest-sufficient model tier using MLX for all
inference on Apple Silicon GPU:
T1 — Routine (Qwen3-8B Q6_K, ~45-55 tok/s): Simple navigation, basic choices.
T2 — Medium (Qwen3-14B Q5_K_M, ~20-28 tok/s): Dialogue, inventory management.
T3 — Complex (Qwen3-32B Q4_K_M, ~8-12 tok/s): Quest planning, stuck recovery.
Memory budget:
- T1+T2 always loaded (~8.5 GB combined)
- T3 loaded on demand (+20 GB) — game pauses during inference
Design notes:
- 70% of game ticks never reach the LLM (handled upstream by behavior trees)
- T3 pauses the game world before inference and unpauses after (graceful if no world)
- All inference via vllm-mlx / Ollama — local-first, no cloud for game ticks
References:
- Issue #966 — Three-Tier Metabolic LLM Router
- Issue #1063 — Best Local Uncensored Agent Model for M3 Max 36GB
- Issue #1075 — Claude Quota Monitor + Metabolic Protocol
"""
import asyncio
import logging
from enum import StrEnum
from typing import Any
logger = logging.getLogger(__name__)
class ModelTier(StrEnum):
"""Three metabolic model tiers ordered by cost and capability.
Tier selection is driven by classify_complexity(). The cheapest
sufficient tier is always chosen — T1 handles routine tasks, T2
handles dialogue and management, T3 handles planning and recovery.
"""
T1_ROUTINE = "t1_routine" # Fast, cheap — Qwen3-8B, always loaded
T2_MEDIUM = "t2_medium" # Balanced — Qwen3-14B, always loaded
T3_COMPLEX = "t3_complex" # Deep — Qwen3-32B, loaded on demand, pauses game
# ── Classification vocabulary ────────────────────────────────────────────────
# T1: single-action navigation and binary-choice words
_T1_KEYWORDS = frozenset(
{
"go",
"move",
"walk",
"run",
"north",
"south",
"east",
"west",
"up",
"down",
"left",
"right",
"yes",
"no",
"ok",
"okay",
"open",
"close",
"take",
"drop",
"look",
"pick",
"use",
"wait",
"rest",
"save",
"attack",
"flee",
"jump",
"crouch",
}
)
# T3: planning, optimisation, or recovery signals
_T3_KEYWORDS = frozenset(
{
"plan",
"strategy",
"optimize",
"optimise",
"quest",
"stuck",
"recover",
"multi-step",
"long-term",
"negotiate",
"persuade",
"faction",
"reputation",
"best",
"optimal",
"recommend",
"analyze",
"analyse",
"evaluate",
"decide",
"complex",
"how do i",
"what should i do",
"help me figure",
"what is the best",
}
)
def classify_complexity(task: str, state: dict) -> ModelTier:
"""Classify a task to the cheapest-sufficient model tier.
Classification priority (highest wins):
1. T3 — any T3 keyword, stuck indicator, or ``state["require_t3"]`` set to True
2. T1 — short task with only T1 keywords and no active context
3. T2 — everything else (safe default)
Args:
task: Natural-language task description or player input.
state: Current game state dict. Recognised keys:
``stuck`` (bool), ``require_t3`` (bool),
``active_quests`` (list), ``dialogue_active`` (bool).
Returns:
ModelTier appropriate for the task.
"""
task_lower = task.lower()
words = set(task_lower.split())
# ── T3 signals ──────────────────────────────────────────────────────────
t3_keyword_hit = bool(words & _T3_KEYWORDS)
# Check multi-word T3 phrases
t3_phrase_hit = any(phrase in task_lower for phrase in _T3_KEYWORDS if " " in phrase)
is_stuck = bool(state.get("stuck", False))
explicit_t3 = bool(state.get("require_t3", False))
if t3_keyword_hit or t3_phrase_hit or is_stuck or explicit_t3:
logger.debug(
"classify_complexity → T3 (keywords=%s stuck=%s explicit=%s)",
t3_keyword_hit or t3_phrase_hit,
is_stuck,
explicit_t3,
)
return ModelTier.T3_COMPLEX
# ── T1 signals ──────────────────────────────────────────────────────────
t1_keyword_hit = bool(words & _T1_KEYWORDS)
task_short = len(task.split()) <= 6
no_active_context = (
not state.get("active_quests")
and not state.get("dialogue_active")
and not state.get("combat_active")
)
if t1_keyword_hit and task_short and no_active_context:
logger.debug("classify_complexity → T1 (keywords=%s short=%s)", t1_keyword_hit, task_short)
return ModelTier.T1_ROUTINE
# ── Default: T2 ─────────────────────────────────────────────────────────
logger.debug("classify_complexity → T2 (default)")
return ModelTier.T2_MEDIUM
def build_prompt(
state: dict,
ui_state: dict,
text: str,
visual_context: str | None = None,
) -> list[dict]:
"""Build an OpenAI-compatible messages list from game context.
Assembles a system message from structured game state and a user
message from the player's text input. This format is accepted by
CascadeRouter.complete() directly.
Args:
state: Current game state dict. Common keys:
``location`` (str), ``health`` (int/float),
``inventory`` (list), ``active_quests`` (list),
``stuck`` (bool).
ui_state: Current UI state dict. Common keys:
``dialogue_active`` (bool), ``dialogue_npc`` (str),
``menu_open`` (str), ``combat_active`` (bool).
text: Player text or task description (becomes user message).
visual_context: Optional free-text description of the current screen
or scene — from a vision model or rule-based extractor.
Returns:
List of message dicts: [{"role": "system", ...}, {"role": "user", ...}]
"""
context_lines: list[str] = []
location = state.get("location", "unknown")
context_lines.append(f"Location: {location}")
health = state.get("health")
if health is not None:
context_lines.append(f"Health: {health}")
inventory = state.get("inventory", [])
if inventory:
items = [i if isinstance(i, str) else i.get("name", str(i)) for i in inventory[:10]]
context_lines.append(f"Inventory: {', '.join(items)}")
active_quests = state.get("active_quests", [])
if active_quests:
names = [q if isinstance(q, str) else q.get("name", str(q)) for q in active_quests[:5]]
context_lines.append(f"Active quests: {', '.join(names)}")
if state.get("stuck"):
context_lines.append("Status: STUCK — need recovery strategy")
if ui_state.get("dialogue_active"):
npc = ui_state.get("dialogue_npc", "NPC")
context_lines.append(f"In dialogue with: {npc}")
if ui_state.get("menu_open"):
context_lines.append(f"Menu open: {ui_state['menu_open']}")
if ui_state.get("combat_active"):
context_lines.append("Status: IN COMBAT")
if visual_context:
context_lines.append(f"Scene: {visual_context}")
system_content = (
"You are Timmy, an AI game agent. "
"Respond with valid game commands only.\n\n" + "\n".join(context_lines)
)
return [
{"role": "system", "content": system_content},
{"role": "user", "content": text},
]
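
The message format produced above can be sketched in isolation. This is a simplified stand-in, not the real `build_prompt()`: only the location field is carried over, and `make_messages` is a hypothetical helper name.

```python
def make_messages(state: dict, text: str) -> list[dict]:
    # Simplified stand-in for build_prompt(): structured game state becomes
    # the system message, the player's text becomes the user message.
    context = f"Location: {state.get('location', 'unknown')}"
    system = (
        "You are Timmy, an AI game agent. "
        "Respond with valid game commands only.\n\n" + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": text},
    ]

msgs = make_messages({"location": "Balmora"}, "Go north")
print(msgs[0]["role"])     # system
print(msgs[1]["content"])  # Go north
```

The two-element list is exactly the shape `CascadeRouter.complete()` consumes, so no adapter layer is needed between prompt assembly and inference.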
# ── Default model assignments ────────────────────────────────────────────────
# Overridable per deployment via MetabolicRouter(tier_models={...}).
# Model benchmarks (M3 Max 36 GB, issue #1063):
# Qwen3-8B Q6_K — 0.933 F1 tool calling, ~45-55 tok/s (~6 GB)
# Qwen3-14B Q5_K_M — 0.971 F1 tool calling, ~20-28 tok/s (~9.5 GB)
# Qwen3-32B Q4_K_M — highest quality, ~8-12 tok/s (~20 GB, on demand)
DEFAULT_TIER_MODELS: dict[ModelTier, str] = {
ModelTier.T1_ROUTINE: "qwen3:8b",
ModelTier.T2_MEDIUM: "qwen3:14b",
ModelTier.T3_COMPLEX: "qwen3:30b", # Closest Ollama tag to 32B Q4
}
class MetabolicRouter:
"""Routes LLM requests to the cheapest-sufficient model tier.
Wraps CascadeRouter with:
- Complexity classification via classify_complexity()
- Prompt assembly via build_prompt()
- T3 world-pause / world-unpause (graceful if no world adapter)
Usage::
router = MetabolicRouter()
# Simple route call — classification + prompt + inference in one step
result = await router.route(
task="Go north",
state={"location": "Balmora"},
ui_state={},
)
print(result["content"], result["tier"])
# Pre-classify if you need the tier for telemetry
tier = router.classify("Plan the best path to Vivec", game_state)
# Wire in world adapter for T3 pause/unpause
router.set_world(world_adapter)
"""
def __init__(
self,
cascade: Any | None = None,
tier_models: dict[ModelTier, str] | None = None,
) -> None:
"""Initialise the metabolic router.
Args:
cascade: CascadeRouter instance to use. If None, the
singleton returned by get_router() is used lazily.
tier_models: Override default model names per tier.
"""
self._cascade = cascade
self._tier_models: dict[ModelTier, str] = dict(DEFAULT_TIER_MODELS)
if tier_models:
self._tier_models.update(tier_models)
self._world: Any | None = None
def set_world(self, world: Any) -> None:
"""Wire in a world adapter for T3 pause / unpause support.
The adapter only needs to implement ``act(CommandInput)`` — the full
WorldInterface contract is not required. A missing or broken world
adapter degrades gracefully (logs a warning, inference continues).
Args:
world: Any object with an ``act(CommandInput)`` method.
"""
self._world = world
def _get_cascade(self) -> Any:
"""Return the CascadeRouter, creating the singleton if needed."""
if self._cascade is None:
from infrastructure.router.cascade import get_router
self._cascade = get_router()
return self._cascade
def classify(self, task: str, state: dict) -> ModelTier:
"""Classify task complexity. Delegates to classify_complexity()."""
return classify_complexity(task, state)
async def _pause_world(self) -> None:
"""Pause the game world before T3 inference (graceful degradation)."""
if self._world is None:
return
try:
from infrastructure.world.types import CommandInput
await asyncio.to_thread(self._world.act, CommandInput(action="pause"))
logger.debug("MetabolicRouter: world paused for T3 inference")
except Exception as exc:
logger.warning("world.pause() failed — continuing without pause: %s", exc)
async def _unpause_world(self) -> None:
"""Unpause the game world after T3 inference (always called, even on error)."""
if self._world is None:
return
try:
from infrastructure.world.types import CommandInput
await asyncio.to_thread(self._world.act, CommandInput(action="unpause"))
logger.debug("MetabolicRouter: world unpaused after T3 inference")
except Exception as exc:
logger.warning("world.unpause() failed — game may remain paused: %s", exc)
async def route(
self,
task: str,
state: dict,
ui_state: dict | None = None,
visual_context: str | None = None,
temperature: float = 0.3,
max_tokens: int | None = None,
) -> dict:
"""Route a task to the appropriate model tier and return the LLM response.
Selects the tier via classify_complexity(), assembles the prompt via
build_prompt(), and dispatches to CascadeRouter. For T3, the game
world is paused before inference and unpaused after (in a finally block).
Args:
task: Natural-language task description or player input.
state: Current game state dict.
ui_state: Current UI state dict (optional, defaults to {}).
visual_context: Optional screen/scene description from vision model.
temperature: Sampling temperature (default 0.3 for game commands).
max_tokens: Maximum tokens to generate.
Returns:
Dict with keys: ``content``, ``provider``, ``model``, ``tier``,
``latency_ms``, plus any extra keys from CascadeRouter.
Raises:
RuntimeError: If all providers fail (propagated from CascadeRouter).
"""
ui_state = ui_state or {}
tier = self.classify(task, state)
model = self._tier_models[tier]
messages = build_prompt(state, ui_state, task, visual_context)
cascade = self._get_cascade()
logger.info(
"MetabolicRouter: tier=%s model=%s task=%r",
tier,
model,
task[:80],
)
if tier == ModelTier.T3_COMPLEX:
await self._pause_world()
try:
result = await cascade.complete(
messages=messages,
model=model,
temperature=temperature,
max_tokens=max_tokens,
)
finally:
await self._unpause_world()
else:
result = await cascade.complete(
messages=messages,
model=model,
temperature=temperature,
max_tokens=max_tokens,
)
result["tier"] = tier
return result
# ── Module-level singleton ────────────────────────────────────────────────────
_metabolic_router: MetabolicRouter | None = None
def get_metabolic_router() -> MetabolicRouter:
"""Get or create the MetabolicRouter singleton."""
global _metabolic_router
if _metabolic_router is None:
_metabolic_router = MetabolicRouter()
return _metabolic_router
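
The tier-selection rules in `classify_complexity()` can be condensed into a standalone sketch. The keyword sets below are illustrative subsets of the module's vocabulary, and `pick_tier` is a simplified stand-in, not the production classifier.

```python
T1_WORDS = frozenset({"go", "north", "open", "take", "yes", "no"})
T3_WORDS = frozenset({"plan", "stuck", "optimize", "quest"})
T3_PHRASES = ("how do i", "what should i do")

def pick_tier(task: str, state: dict) -> str:
    text = task.lower()
    words = set(text.split())
    # T3 wins first: keywords, multi-word phrases, stuck flag, or explicit override
    if (words & T3_WORDS or any(p in text for p in T3_PHRASES)
            or state.get("stuck") or state.get("require_t3")):
        return "t3_complex"
    # T1 only for short commands with no active quest/dialogue/combat context
    no_context = not (state.get("active_quests") or state.get("dialogue_active")
                      or state.get("combat_active"))
    if words & T1_WORDS and len(task.split()) <= 6 and no_context:
        return "t1_routine"
    return "t2_medium"  # safe default

print(pick_tier("Go north", {}))                     # t1_routine
print(pick_tier("How do I finish this quest?", {}))  # t3_complex
print(pick_tier("Tell me about this sword", {"dialogue_active": True}))  # t2_medium
```

Note the precedence: T3 checks run before T1, so an escalation signal always wins even inside a short command, and anything ambiguous falls through to the T2 default.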

View File

@@ -135,7 +135,9 @@ class BannerlordObserver:
self._host = host or settings.gabs_host
self._port = port or settings.gabs_port
self._timeout = timeout if timeout is not None else settings.gabs_timeout
self._poll_interval = poll_interval if poll_interval is not None else settings.gabs_poll_interval
self._poll_interval = (
poll_interval if poll_interval is not None else settings.gabs_poll_interval
)
self._journal_path = Path(journal_path) if journal_path else _get_journal_path()
self._entry_count = 0
self._days_observed: set[str] = set()

View File

@@ -24,6 +24,8 @@ logger = logging.getLogger(__name__)
@dataclass
class Intent:
"""A classified user intent with confidence score and extracted entities."""
name: str
confidence: float # 0.0 to 1.0
entities: dict

View File

@@ -17,11 +17,15 @@ logger = logging.getLogger(__name__)
class TxType(StrEnum):
"""Lightning transaction direction type."""
incoming = "incoming"
outgoing = "outgoing"
class TxStatus(StrEnum):
"""Lightning transaction settlement status."""
pending = "pending"
settled = "settled"
failed = "failed"

View File

@@ -21,6 +21,7 @@ from agno.models.ollama import Ollama
from config import settings
from infrastructure.events.bus import Event, EventBus
from timmy.agents.emotional_state import EmotionalStateTracker
try:
from mcp.registry import tool_registry
@@ -42,6 +43,7 @@ class BaseAgent(ABC):
tools: list[str] | None = None,
model: str | None = None,
max_history: int = 10,
initial_emotion: str = "calm",
) -> None:
self.agent_id = agent_id
self.name = name
@@ -54,6 +56,9 @@ class BaseAgent(ABC):
self.system_prompt = system_prompt
self.agent = self._create_agent(system_prompt)
# Emotional state tracker
self.emotional_state = EmotionalStateTracker(initial_emotion=initial_emotion)
# Event bus for communication
self.event_bus: EventBus | None = None
@@ -137,7 +142,14 @@ class BaseAgent(ABC):
ReadTimeout — these are transient and retried with exponential
backoff (#70).
"""
response = await self._run_with_retries(message, max_retries)
self.emotional_state.process_event("task_assigned")
self._apply_emotional_prompt()
try:
response = await self._run_with_retries(message, max_retries)
except Exception:
self.emotional_state.process_event("task_failure")
raise
self.emotional_state.process_event("task_success")
await self._emit_response_event(message, response)
return response
@@ -206,6 +218,14 @@ class BaseAgent(ABC):
)
)
def _apply_emotional_prompt(self) -> None:
"""Inject the current emotional modifier into the agent's description."""
modifier = self.emotional_state.get_prompt_modifier()
if modifier:
self.agent.description = f"{self.system_prompt}\n\n[Emotional State: {modifier}]"
else:
self.agent.description = self.system_prompt
def get_capabilities(self) -> list[str]:
"""Get list of capabilities this agent provides."""
return self.tools
@@ -219,6 +239,7 @@ class BaseAgent(ABC):
"model": self.model,
"status": "ready",
"tools": self.tools,
"emotional_profile": self.emotional_state.get_profile(),
}
@@ -239,6 +260,7 @@ class SubAgent(BaseAgent):
tools: list[str] | None = None,
model: str | None = None,
max_history: int = 10,
initial_emotion: str = "calm",
) -> None:
super().__init__(
agent_id=agent_id,
@@ -248,6 +270,7 @@ class SubAgent(BaseAgent):
tools=tools,
model=model,
max_history=max_history,
initial_emotion=initial_emotion,
)
async def execute_task(self, task_id: str, description: str, context: dict) -> Any:

View File

@@ -0,0 +1,222 @@
"""Agent emotional state simulation.
Tracks per-agent emotional states that influence narration and decision-making
style. Emotional state is influenced by events (task outcomes, errors, etc.)
and exposed via ``get_profile()`` for the dashboard.
Usage:
from timmy.agents.emotional_state import EmotionalStateTracker
tracker = EmotionalStateTracker()
tracker.process_event("task_success", {"description": "Deployed fix"})
profile = tracker.get_profile()
"""
import logging
import time
from dataclasses import asdict, dataclass, field
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Emotional states
# ---------------------------------------------------------------------------
EMOTIONAL_STATES = (
"cautious",
"adventurous",
"analytical",
"frustrated",
"confident",
"curious",
"calm",
)
# Prompt modifiers per emotional state — injected into system prompts
EMOTION_PROMPT_MODIFIERS: dict[str, str] = {
"cautious": (
"You are feeling cautious. Prefer safe, well-tested approaches. "
"Flag risks early. Double-check assumptions before acting."
),
"adventurous": (
"You are feeling adventurous. Be bold and creative in your suggestions. "
"Explore unconventional solutions. Take initiative."
),
"analytical": (
"You are feeling analytical. Break problems down methodically. "
"Rely on data and evidence. Present structured reasoning."
),
"frustrated": (
"You are feeling frustrated. Be brief and direct. "
"Focus on unblocking the immediate problem. Avoid tangents."
),
"confident": (
"You are feeling confident. Speak with authority. "
"Make clear recommendations. Move decisively."
),
"curious": (
"You are feeling curious. Ask clarifying questions. "
"Explore multiple angles. Show genuine interest in the problem."
),
"calm": (
"You are feeling calm and steady. Respond thoughtfully. "
"Maintain composure. Prioritise clarity over speed."
),
}
# ---------------------------------------------------------------------------
# Event → emotion transition rules
# ---------------------------------------------------------------------------
# Maps event types to the emotional state they trigger and an intensity (0-1).
# Higher intensity means the event has a stronger effect on the mood.
EVENT_TRANSITIONS: dict[str, tuple[str, float]] = {
"task_success": ("confident", 0.6),
"task_failure": ("frustrated", 0.7),
"task_assigned": ("analytical", 0.4),
"error": ("cautious", 0.6),
"health_low": ("cautious", 0.8),
"health_recovered": ("calm", 0.5),
"quest_completed": ("adventurous", 0.7),
"new_discovery": ("curious", 0.6),
"complex_problem": ("analytical", 0.5),
"repeated_failure": ("frustrated", 0.9),
"idle": ("calm", 0.3),
"user_praise": ("confident", 0.5),
"user_correction": ("cautious", 0.5),
}
# Emotional state decay — how quickly emotions return to calm (seconds)
_DECAY_INTERVAL = 300 # 5 minutes
@dataclass
class EmotionalState:
"""Snapshot of an agent's emotional state."""
current_emotion: str = "calm"
intensity: float = 0.5 # 0.0 (barely noticeable) to 1.0 (overwhelming)
previous_emotion: str = "calm"
trigger_event: str = "" # What caused the current emotion
updated_at: float = field(default_factory=time.time)
def to_dict(self) -> dict:
"""Serialise for API / dashboard consumption."""
d = asdict(self)
d["emotion_label"] = self.current_emotion.replace("_", " ").title()
return d
class EmotionalStateTracker:
"""Per-agent emotional state tracker.
Each agent instance owns one tracker. The tracker processes events,
applies transition rules, and decays emotion intensity over time.
"""
def __init__(self, initial_emotion: str = "calm") -> None:
if initial_emotion not in EMOTIONAL_STATES:
initial_emotion = "calm"
self.state = EmotionalState(current_emotion=initial_emotion)
def process_event(self, event_type: str, context: dict | None = None) -> EmotionalState:
"""Update emotional state based on an event.
Args:
event_type: One of the keys in EVENT_TRANSITIONS, or a custom
event type (unknown events are ignored).
context: Optional dict with event details (for logging).
Returns:
The updated EmotionalState.
"""
transition = EVENT_TRANSITIONS.get(event_type)
if transition is None:
logger.debug("Unknown emotional event: %s (ignored)", event_type)
return self.state
new_emotion, raw_intensity = transition
# Blend with current intensity — repeated same-emotion events amplify
if new_emotion == self.state.current_emotion:
blended = min(1.0, self.state.intensity + raw_intensity * 0.3)
else:
blended = raw_intensity
self.state.previous_emotion = self.state.current_emotion
self.state.current_emotion = new_emotion
self.state.intensity = round(blended, 2)
self.state.trigger_event = event_type
self.state.updated_at = time.time()
logger.debug(
"Emotional transition: %s%s (intensity=%.2f, trigger=%s)",
self.state.previous_emotion,
new_emotion,
blended,
event_type,
)
return self.state
def decay(self) -> EmotionalState:
"""Apply time-based decay toward calm.
Called periodically (e.g. from a background loop). If enough time
has passed since the last update, intensity decreases and eventually
the emotion resets to calm.
"""
elapsed = time.time() - self.state.updated_at
if elapsed < _DECAY_INTERVAL:
return self.state
# Reduce intensity by 0.1 per decay interval
decay_steps = int(elapsed / _DECAY_INTERVAL)
new_intensity = max(0.0, self.state.intensity - 0.1 * decay_steps)
if new_intensity <= 0.1:
# Emotion has decayed — return to calm
self.state.previous_emotion = self.state.current_emotion
self.state.current_emotion = "calm"
self.state.intensity = 0.5
self.state.trigger_event = "decay"
else:
self.state.intensity = round(new_intensity, 2)
self.state.updated_at = time.time()
return self.state
def get_profile(self) -> dict:
"""Return the full emotional profile for dashboard display."""
self.decay() # Apply any pending decay
return {
"current_emotion": self.state.current_emotion,
"emotion_label": self.state.current_emotion.replace("_", " ").title(),
"intensity": self.state.intensity,
"intensity_label": _intensity_label(self.state.intensity),
"previous_emotion": self.state.previous_emotion,
"trigger_event": self.state.trigger_event,
"prompt_modifier": EMOTION_PROMPT_MODIFIERS.get(self.state.current_emotion, ""),
}
def get_prompt_modifier(self) -> str:
"""Return the prompt modifier string for the current emotion."""
self.decay()
return EMOTION_PROMPT_MODIFIERS.get(self.state.current_emotion, "")
def reset(self) -> None:
"""Reset to calm baseline."""
self.state = EmotionalState()
def _intensity_label(intensity: float) -> str:
"""Human-readable label for intensity value."""
if intensity >= 0.8:
return "overwhelming"
if intensity >= 0.6:
return "strong"
if intensity >= 0.4:
return "moderate"
if intensity >= 0.2:
return "mild"
return "faint"

View File

@@ -119,6 +119,8 @@ def load_agents(force_reload: bool = False) -> dict[str, Any]:
max_history = agent_cfg.get("max_history", defaults.get("max_history", 10))
tools = agent_cfg.get("tools", defaults.get("tools", []))
initial_emotion = agent_cfg.get("initial_emotion", "calm")
agent = SubAgent(
agent_id=agent_id,
name=agent_cfg.get("name", agent_id.title()),
@@ -127,6 +129,7 @@ def load_agents(force_reload: bool = False) -> dict[str, Any]:
tools=tools,
model=model,
max_history=max_history,
initial_emotion=initial_emotion,
)
_agents[agent_id] = agent

View File

@@ -36,6 +36,8 @@ _EXPIRY_DAYS = 7
@dataclass
class ApprovalItem:
"""A proposed autonomous action requiring owner approval."""
id: str
title: str
description: str

View File

@@ -36,7 +36,7 @@ import asyncio
import logging
import re
from dataclasses import dataclass, field
from datetime import UTC, datetime, timedelta
from datetime import UTC, datetime
from typing import Any
import httpx
@@ -70,7 +70,9 @@ _LOOP_TAG = "loop-generated"
# Regex patterns for scoring
_TAG_RE = re.compile(r"\[([^\]]+)\]")
_FILE_RE = re.compile(r"(?:src/|tests/|scripts/|\.py|\.html|\.js|\.yaml|\.toml|\.sh)", re.IGNORECASE)
_FILE_RE = re.compile(
r"(?:src/|tests/|scripts/|\.py|\.html|\.js|\.yaml|\.toml|\.sh)", re.IGNORECASE
)
_FUNC_RE = re.compile(r"(?:def |class |function |method |`\w+\(\)`)", re.IGNORECASE)
_ACCEPT_RE = re.compile(
r"(?:should|must|expect|verify|assert|test.?case|acceptance|criteria"
@@ -451,9 +453,7 @@ async def add_label(
# Apply to the issue
apply_url = _repo_url(f"issues/{issue_number}/labels")
apply_resp = await client.post(
apply_url, headers=headers, json={"labels": [label_id]}
)
apply_resp = await client.post(apply_url, headers=headers, json={"labels": [label_id]})
return apply_resp.status_code in (200, 201)
except (httpx.ConnectError, httpx.ReadError, httpx.TimeoutException) as exc:
@@ -692,7 +692,9 @@ class BacklogTriageLoop:
# 1. Fetch
raw_issues = await fetch_open_issues(client)
result.total_open = len(raw_issues)
logger.info("Triage cycle #%d: fetched %d open issues", self._cycle_count, len(raw_issues))
logger.info(
"Triage cycle #%d: fetched %d open issues", self._cycle_count, len(raw_issues)
)
# 2. Score
scored = [score_issue(i) for i in raw_issues]

View File

@@ -46,6 +46,8 @@ class ApprovalItem:
@dataclass
class Briefing:
"""A generated morning briefing summarizing recent activity and pending approvals."""
generated_at: datetime
summary: str # 150-300 words
approval_items: list[ApprovalItem] = field(default_factory=list)

View File

@@ -37,7 +37,7 @@ from __future__ import annotations
import asyncio
import logging
from dataclasses import dataclass, field
from enum import Enum
from enum import StrEnum
from typing import Any
from config import settings
@@ -48,7 +48,8 @@ logger = logging.getLogger(__name__)
# Enumerations
# ---------------------------------------------------------------------------
class AgentType(str, Enum):
class AgentType(StrEnum):
"""Known agents in the swarm."""
CLAUDE_CODE = "claude_code"
@@ -57,7 +58,7 @@ class AgentType(str, Enum):
TIMMY = "timmy"
class TaskType(str, Enum):
class TaskType(StrEnum):
"""Categories of engineering work."""
# Claude Code strengths
@@ -83,7 +84,7 @@ class TaskType(str, Enum):
ORCHESTRATION = "orchestration"
class DispatchStatus(str, Enum):
class DispatchStatus(StrEnum):
"""Lifecycle state of a dispatched task."""
PENDING = "pending"
@@ -99,6 +100,7 @@ class DispatchStatus(str, Enum):
# Agent registry
# ---------------------------------------------------------------------------
@dataclass
class AgentSpec:
"""Capabilities and limits for a single agent."""
@@ -106,9 +108,9 @@ class AgentSpec:
name: AgentType
display_name: str
strengths: frozenset[TaskType]
gitea_label: str | None # label to apply when dispatching
gitea_label: str | None # label to apply when dispatching
max_concurrent: int = 1
interface: str = "gitea" # "gitea" | "api" | "local"
interface: str = "gitea" # "gitea" | "api" | "local"
api_endpoint: str | None = None # for interface="api"
@@ -197,6 +199,7 @@ _TASK_ROUTING: dict[TaskType, AgentType] = {
# Dispatch result
# ---------------------------------------------------------------------------
@dataclass
class DispatchResult:
"""Outcome of a dispatch call."""
@@ -220,6 +223,7 @@ class DispatchResult:
# Routing logic
# ---------------------------------------------------------------------------
def select_agent(task_type: TaskType) -> AgentType:
"""Return the best agent for *task_type* based on the routing table.
@@ -248,11 +252,23 @@ def infer_task_type(title: str, description: str = "") -> TaskType:
text = (title + " " + description).lower()
_SIGNALS: list[tuple[TaskType, frozenset[str]]] = [
(TaskType.ARCHITECTURE, frozenset({"architect", "design", "adr", "system design", "schema"})),
(TaskType.REFACTORING, frozenset({"refactor", "clean up", "cleanup", "reorganise", "reorganize"})),
(
TaskType.ARCHITECTURE,
frozenset({"architect", "design", "adr", "system design", "schema"}),
),
(
TaskType.REFACTORING,
frozenset({"refactor", "clean up", "cleanup", "reorganise", "reorganize"}),
),
(TaskType.CODE_REVIEW, frozenset({"review", "pr review", "pull request review", "audit"})),
(TaskType.COMPLEX_REASONING, frozenset({"complex", "hard problem", "debug", "investigate", "diagnose"})),
(TaskType.RESEARCH, frozenset({"research", "survey", "literature", "benchmark", "analyse", "analyze"})),
(
TaskType.COMPLEX_REASONING,
frozenset({"complex", "hard problem", "debug", "investigate", "diagnose"}),
),
(
TaskType.RESEARCH,
frozenset({"research", "survey", "literature", "benchmark", "analyse", "analyze"}),
),
(TaskType.ANALYSIS, frozenset({"analysis", "profil", "trace", "metric", "performance"})),
(TaskType.TRIAGE, frozenset({"triage", "classify", "prioritise", "prioritize"})),
(TaskType.PLANNING, frozenset({"plan", "roadmap", "milestone", "epic", "spike"})),
@@ -273,6 +289,7 @@ def infer_task_type(title: str, description: str = "") -> TaskType:
# Gitea helpers
# ---------------------------------------------------------------------------
async def _post_gitea_comment(
client: Any,
base_url: str,
@@ -405,6 +422,50 @@ async def _poll_issue_completion(
# Core dispatch functions
# ---------------------------------------------------------------------------
def _format_assignment_comment(
display_name: str,
task_type: TaskType,
description: str,
acceptance_criteria: list[str],
) -> str:
"""Build the markdown comment body for a task assignment.
Args:
display_name: Human-readable agent name.
task_type: The inferred task type.
description: Task description.
acceptance_criteria: List of acceptance criteria strings.
Returns:
Formatted markdown string for the comment.
"""
criteria_md = (
"\n".join(f"- {c}" for c in acceptance_criteria)
if acceptance_criteria
else "_None specified_"
)
return (
f"## Assigned to {display_name}\n\n"
f"**Task type:** `{task_type.value}`\n\n"
f"**Description:**\n{description}\n\n"
f"**Acceptance criteria:**\n{criteria_md}\n\n"
f"---\n*Dispatched by Timmy agent dispatcher.*"
)
def _select_label(agent: AgentType) -> str | None:
"""Return the Gitea label for an agent based on its spec.
Args:
agent: The target agent.
Returns:
Label name or None if the agent has no label.
"""
return AGENT_REGISTRY[agent].gitea_label
async def _dispatch_via_gitea(
agent: AgentType,
issue_number: int,
@@ -459,33 +520,27 @@ async def _dispatch_via_gitea(
async with httpx.AsyncClient(timeout=15) as client:
# 1. Apply agent label (if applicable)
if spec.gitea_label:
ok = await _apply_gitea_label(
client, base_url, repo, headers, issue_number, spec.gitea_label
)
label = _select_label(agent)
if label:
ok = await _apply_gitea_label(client, base_url, repo, headers, issue_number, label)
if ok:
label_applied = spec.gitea_label
label_applied = label
logger.info(
"Applied label %r to issue #%s for %s",
spec.gitea_label,
label,
issue_number,
spec.display_name,
)
else:
logger.warning(
"Could not apply label %r to issue #%s",
spec.gitea_label,
label,
issue_number,
)
# 2. Post assignment comment
criteria_md = "\n".join(f"- {c}" for c in acceptance_criteria) if acceptance_criteria else "_None specified_"
comment_body = (
f"## Assigned to {spec.display_name}\n\n"
f"**Task type:** `{task_type.value}`\n\n"
f"**Description:**\n{description}\n\n"
f"**Acceptance criteria:**\n{criteria_md}\n\n"
f"---\n*Dispatched by Timmy agent dispatcher.*"
comment_body = _format_assignment_comment(
spec.display_name, task_type, description, acceptance_criteria
)
comment_id = await _post_gitea_comment(
client, base_url, repo, headers, issue_number, comment_body
@@ -616,9 +671,7 @@ async def _dispatch_local(
assumed to succeed at dispatch time).
"""
task_type = infer_task_type(title, description)
logger.info(
"Timmy handling task locally: %r (issue #%s)", title[:60], issue_number
)
logger.info("Timmy handling task locally: %r (issue #%s)", title[:60], issue_number)
return DispatchResult(
task_type=task_type,
agent=AgentType.TIMMY,
@@ -632,6 +685,81 @@ async def _dispatch_local(
# Public entry point
# ---------------------------------------------------------------------------
def _validate_task(
title: str,
task_type: TaskType | None,
agent: AgentType | None,
issue_number: int | None,
) -> DispatchResult | None:
"""Validate task preconditions.
Args:
title: Task title to validate.
task_type: Optional task type for result construction.
agent: Optional agent for result construction.
issue_number: Optional issue number for result construction.
Returns:
A failed DispatchResult if validation fails, None otherwise.
"""
if not title.strip():
return DispatchResult(
task_type=task_type or TaskType.ROUTINE_CODING,
agent=agent or AgentType.TIMMY,
issue_number=issue_number,
status=DispatchStatus.FAILED,
error="`title` is required.",
)
return None
def _select_dispatch_strategy(agent: AgentType, issue_number: int | None) -> str:
"""Select the dispatch strategy based on agent interface and context.
Args:
agent: The target agent.
issue_number: Optional Gitea issue number.
Returns:
Strategy name: "gitea", "api", or "local".
"""
spec = AGENT_REGISTRY[agent]
if spec.interface == "gitea" and issue_number is not None:
return "gitea"
if spec.interface == "api":
return "api"
return "local"
def _log_dispatch_result(
title: str,
result: DispatchResult,
attempt: int,
max_retries: int,
) -> None:
"""Log the outcome of a dispatch attempt.
Args:
title: Task title for logging context.
result: The dispatch result.
attempt: Current attempt number (0-indexed).
max_retries: Maximum retry attempts allowed.
"""
if result.success:
return
if attempt > 0:
logger.info("Retry %d/%d for task %r", attempt, max_retries, title[:60])
logger.warning(
"Dispatch attempt %d failed for task %r: %s",
attempt + 1,
title[:60],
result.error,
)
async def dispatch_task(
title: str,
description: str = "",
@@ -672,17 +800,13 @@ async def dispatch_task(
if result.success:
print(f"Assigned to {result.agent.value}")
"""
# 1. Validate
validation_error = _validate_task(title, task_type, agent, issue_number)
if validation_error:
return validation_error
# 2. Resolve task type and agent
criteria = acceptance_criteria or []
if not title.strip():
return DispatchResult(
task_type=task_type or TaskType.ROUTINE_CODING,
agent=agent or AgentType.TIMMY,
issue_number=issue_number,
status=DispatchStatus.FAILED,
error="`title` is required.",
)
resolved_type = task_type or infer_task_type(title, description)
resolved_agent = agent or select_agent(resolved_type)
@@ -694,18 +818,16 @@ async def dispatch_task(
issue_number,
)
spec = AGENT_REGISTRY[resolved_agent]
# 3. Select strategy and dispatch with retries
strategy = _select_dispatch_strategy(resolved_agent, issue_number)
last_result: DispatchResult | None = None
for attempt in range(max_retries + 1):
if attempt > 0:
logger.info("Retry %d/%d for task %r", attempt, max_retries, title[:60])
if spec.interface == "gitea" and issue_number is not None:
for attempt in range(max_retries + 1):
if strategy == "gitea":
result = await _dispatch_via_gitea(
resolved_agent, issue_number, title, description, criteria
)
elif spec.interface == "api":
elif strategy == "api":
result = await _dispatch_via_api(
resolved_agent, title, description, criteria, issue_number, api_endpoint
)
@@ -718,14 +840,9 @@ async def dispatch_task(
if result.success:
return result
logger.warning(
"Dispatch attempt %d failed for task %r: %s",
attempt + 1,
title[:60],
result.error,
)
_log_dispatch_result(title, result, attempt, max_retries)
# All attempts exhausted — escalate
# 4. All attempts exhausted — escalate
assert last_result is not None
last_result.status = DispatchStatus.ESCALATED
logger.error(
@@ -769,9 +886,7 @@ async def _log_escalation(
f"---\n*Timmy agent dispatcher.*"
)
async with httpx.AsyncClient(timeout=10) as client:
await _post_gitea_comment(
client, base_url, repo, headers, issue_number, body
)
await _post_gitea_comment(client, base_url, repo, headers, issue_number, body)
except Exception as exc:
logger.warning("Failed to post escalation comment: %s", exc)
@@ -780,6 +895,7 @@ async def _log_escalation(
# Monitoring helper
# ---------------------------------------------------------------------------
async def wait_for_completion(
issue_number: int,
poll_interval: int = 60,

View File

@@ -142,18 +142,8 @@ def _build_shell_tool() -> MCPToolDef | None:
return None
def _build_gitea_tools() -> list[MCPToolDef]:
"""Build Gitea MCP tool definitions for direct Ollama bridge use.
These tools call the Gitea REST API directly via httpx rather than
spawning an MCP server subprocess, keeping the bridge lightweight.
"""
if not settings.gitea_enabled or not settings.gitea_token:
return []
base_url = settings.gitea_url
token = settings.gitea_token
owner, repo = settings.gitea_repo.split("/", 1)
def _build_list_issues_tool(base_url: str, token: str, owner: str, repo: str) -> MCPToolDef:
"""Build the list_issues tool for a specific Gitea repo."""
async def _list_issues(**kwargs: Any) -> str:
state = kwargs.get("state", "open")
@@ -178,6 +168,30 @@ def _build_gitea_tools() -> list[MCPToolDef]:
except Exception as exc:
return f"Error listing issues: {exc}"
return MCPToolDef(
name="list_issues",
description="List issues in the Gitea repository. Returns issue numbers and titles.",
parameters={
"type": "object",
"properties": {
"state": {
"type": "string",
"description": "Filter by state: open, closed, or all (default: open)",
},
"limit": {
"type": "integer",
"description": "Maximum number of issues to return (default: 10)",
},
},
"required": [],
},
handler=_list_issues,
)
def _build_create_issue_tool(base_url: str, token: str, owner: str, repo: str) -> MCPToolDef:
"""Build the create_issue tool for a specific Gitea repo."""
async def _create_issue(**kwargs: Any) -> str:
title = kwargs.get("title", "")
body = kwargs.get("body", "")
@@ -199,6 +213,30 @@ def _build_gitea_tools() -> list[MCPToolDef]:
except Exception as exc:
return f"Error creating issue: {exc}"
return MCPToolDef(
name="create_issue",
description="Create a new issue in the Gitea repository.",
parameters={
"type": "object",
"properties": {
"title": {
"type": "string",
"description": "Issue title (required)",
},
"body": {
"type": "string",
"description": "Issue body in markdown (optional)",
},
},
"required": ["title"],
},
handler=_create_issue,
)
def _build_read_issue_tool(base_url: str, token: str, owner: str, repo: str) -> MCPToolDef:
"""Build the read_issue tool for a specific Gitea repo."""
async def _read_issue(**kwargs: Any) -> str:
number = kwargs.get("number")
if not number:
@@ -224,60 +262,40 @@ def _build_gitea_tools() -> list[MCPToolDef]:
except Exception as exc:
return f"Error reading issue: {exc}"
return MCPToolDef(
name="read_issue",
description="Read details of a specific issue by number.",
parameters={
"type": "object",
"properties": {
"number": {
"type": "integer",
"description": "Issue number to read",
},
},
"required": ["number"],
},
handler=_read_issue,
)
def _build_gitea_tools() -> list[MCPToolDef]:
"""Build Gitea MCP tool definitions for direct Ollama bridge use.
These tools call the Gitea REST API directly via httpx rather than
spawning an MCP server subprocess, keeping the bridge lightweight.
"""
if not settings.gitea_enabled or not settings.gitea_token:
return []
base_url = settings.gitea_url
token = settings.gitea_token
owner, repo = settings.gitea_repo.split("/", 1)
return [
MCPToolDef(
name="list_issues",
description="List issues in the Gitea repository. Returns issue numbers and titles.",
parameters={
"type": "object",
"properties": {
"state": {
"type": "string",
"description": "Filter by state: open, closed, or all (default: open)",
},
"limit": {
"type": "integer",
"description": "Maximum number of issues to return (default: 10)",
},
},
"required": [],
},
handler=_list_issues,
),
MCPToolDef(
name="create_issue",
description="Create a new issue in the Gitea repository.",
parameters={
"type": "object",
"properties": {
"title": {
"type": "string",
"description": "Issue title (required)",
},
"body": {
"type": "string",
"description": "Issue body in markdown (optional)",
},
},
"required": ["title"],
},
handler=_create_issue,
),
MCPToolDef(
name="read_issue",
description="Read details of a specific issue by number.",
parameters={
"type": "object",
"properties": {
"number": {
"type": "integer",
"description": "Issue number to read",
},
},
"required": ["number"],
},
handler=_read_issue,
),
_build_list_issues_tool(base_url, token, owner, repo),
_build_create_issue_tool(base_url, token, owner, repo),
_build_read_issue_tool(base_url, token, owner, repo),
]
@@ -399,6 +417,72 @@ class MCPBridge:
logger.warning("Tool '%s' execution failed: %s", name, exc)
return f"Error executing {name}: {exc}"
@staticmethod
def _build_initial_messages(prompt: str, system_prompt: str | None) -> list[dict]:
"""Build the initial message list for a run."""
messages: list[dict] = []
if system_prompt:
messages.append({"role": "system", "content": system_prompt})
messages.append({"role": "user", "content": prompt})
return messages
async def _process_round_tool_calls(
self,
messages: list[dict],
model_tool_calls: list[dict],
rounds: int,
tool_calls_made: list[dict],
) -> None:
"""Execute all tool calls in one round, appending results to messages."""
for tc in model_tool_calls:
func = tc.get("function", {})
tool_name = func.get("name", "unknown")
tool_args = func.get("arguments", {})
logger.info(
"Bridge tool call [round %d]: %s(%s)",
rounds,
tool_name,
tool_args,
)
result = await self._execute_tool_call(tc)
tool_calls_made.append(
{
"round": rounds,
"tool": tool_name,
"arguments": tool_args,
"result": result[:500], # Truncate for logging
}
)
messages.append({"role": "tool", "content": result})
async def _run_tool_loop(
self, messages: list[dict], tools: list[dict]
) -> tuple[str, list[dict], int, str]:
"""Run the tool-call loop until final response or max rounds reached.
Returns:
Tuple of (content, tool_calls_made, rounds, error).
"""
tool_calls_made: list[dict] = []
rounds = 0
for round_num in range(self.max_rounds):
rounds = round_num + 1
response = await self._chat(messages, tools)
msg = response.get("message", {})
model_tool_calls = msg.get("tool_calls", [])
if not model_tool_calls:
return msg.get("content", ""), tool_calls_made, rounds, ""
messages.append(msg)
await self._process_round_tool_calls(
messages, model_tool_calls, rounds, tool_calls_made
)
error = f"Exceeded maximum of {self.max_rounds} tool-call rounds"
return "(max tool-call rounds reached)", tool_calls_made, rounds, error
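The extracted loop's contract is the tuple `(content, tool_calls_made, rounds, error)`. A toy sketch of the same control flow, with `fake_chat` as a hypothetical stand-in for `MCPBridge._chat` that requests one tool call and then answers:

```python
import asyncio

MAX_ROUNDS = 3

async def fake_chat(messages):
    """Stand-in for _chat: one tool-call round, then a final text answer."""
    if not any(m.get("role") == "tool" for m in messages):
        return {"message": {
            "role": "assistant",
            "tool_calls": [{"function": {"name": "list_issues", "arguments": {}}}],
        }}
    return {"message": {"role": "assistant", "content": "2 open issues"}}

async def run_tool_loop(messages):
    """Mirror of _run_tool_loop's return contract: (content, calls, rounds, error)."""
    calls = []
    for round_num in range(MAX_ROUNDS):
        msg = (await fake_chat(messages))["message"]
        tool_calls = msg.get("tool_calls", [])
        if not tool_calls:
            return msg.get("content", ""), calls, round_num + 1, ""
        messages.append(msg)
        for tc in tool_calls:
            calls.append(tc["function"]["name"])
            # The tool result goes back into history so the next round sees it.
            messages.append({"role": "tool", "content": "issue #1, issue #2"})
    return "(max rounds reached)", calls, MAX_ROUNDS, "exceeded max tool-call rounds"

content, calls, rounds, error = asyncio.run(
    run_tool_loop([{"role": "user", "content": "list issues"}])
)
print(content, calls, rounds)  # 2 open issues ['list_issues'] 2
```

The key property the refactor preserves: the loop has exactly two exits, a final text response or round exhaustion, and both flow through one `BridgeResult` construction in `run()`.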
async def run(
self,
prompt: str,
@@ -419,115 +503,35 @@ class MCPBridge:
BridgeResult with the final response and tool call history.
"""
start = time.time()
messages: list[dict] = []
if system_prompt:
messages.append({"role": "system", "content": system_prompt})
messages.append({"role": "user", "content": prompt})
messages = self._build_initial_messages(prompt, system_prompt)
tools = self._build_ollama_tools()
tool_calls_made: list[dict] = []
rounds = 0
error_msg = ""
try:
for round_num in range(self.max_rounds):
rounds = round_num + 1
response = await self._chat(messages, tools)
msg = response.get("message", {})
# Check if model made tool calls
model_tool_calls = msg.get("tool_calls", [])
if not model_tool_calls:
# Final text response — done.
content = msg.get("content", "")
latency = (time.time() - start) * 1000
return BridgeResult(
content=content,
tool_calls_made=tool_calls_made,
rounds=rounds,
latency_ms=latency,
model=self.model,
)
# Append the assistant message (with tool_calls) to history
messages.append(msg)
# Execute each tool call and add results
for tc in model_tool_calls:
func = tc.get("function", {})
tool_name = func.get("name", "unknown")
tool_args = func.get("arguments", {})
logger.info(
"Bridge tool call [round %d]: %s(%s)",
rounds,
tool_name,
tool_args,
)
result = await self._execute_tool_call(tc)
tool_calls_made.append(
{
"round": rounds,
"tool": tool_name,
"arguments": tool_args,
"result": result[:500], # Truncate for logging
}
)
# Add tool result to message history
messages.append(
{
"role": "tool",
"content": result,
}
)
# Hit max rounds
latency = (time.time() - start) * 1000
return BridgeResult(
content="(max tool-call rounds reached)",
tool_calls_made=tool_calls_made,
rounds=rounds,
latency_ms=latency,
model=self.model,
error=f"Exceeded maximum of {self.max_rounds} tool-call rounds",
)
content, tool_calls_made, rounds, error_msg = await self._run_tool_loop(messages, tools)
except httpx.ConnectError as exc:
latency = (time.time() - start) * 1000
logger.warning("Ollama connection failed: %s", exc)
return BridgeResult(
content="",
tool_calls_made=tool_calls_made,
rounds=rounds,
latency_ms=latency,
model=self.model,
error=f"Ollama connection failed: {exc}",
)
error_msg = f"Ollama connection failed: {exc}"
content = ""
except httpx.HTTPStatusError as exc:
latency = (time.time() - start) * 1000
logger.warning("Ollama HTTP error: %s", exc)
return BridgeResult(
content="",
tool_calls_made=tool_calls_made,
rounds=rounds,
latency_ms=latency,
model=self.model,
error=f"Ollama HTTP error: {exc.response.status_code}",
)
error_msg = f"Ollama HTTP error: {exc.response.status_code}"
content = ""
except Exception as exc:
latency = (time.time() - start) * 1000
logger.error("MCPBridge run failed: %s", exc)
return BridgeResult(
content="",
tool_calls_made=tool_calls_made,
rounds=rounds,
latency_ms=latency,
model=self.model,
error=str(exc),
)
error_msg = str(exc)
content = ""
return BridgeResult(
content=content,
tool_calls_made=tool_calls_made,
rounds=rounds,
latency_ms=(time.time() - start) * 1000,
model=self.model,
error=error_msg,
)
def status(self) -> dict:
"""Return bridge status for the dashboard."""

View File

@@ -13,8 +13,8 @@ from dataclasses import dataclass
import httpx
from config import settings
from timmy.research_tools import get_llm_client, google_web_search
from timmy.research_triage import triage_research_report
from timmy.research_tools import google_web_search, get_llm_client
logger = logging.getLogger(__name__)
@@ -52,10 +52,7 @@ class PaperclipClient:
)
resp.raise_for_status()
tasks = resp.json()
return [
PaperclipTask(id=t["id"], kind=t["kind"], context=t["context"])
for t in tasks
]
return [PaperclipTask(id=t["id"], kind=t["kind"], context=t["context"]) for t in tasks]
async def update_task_status(
self, task_id: str, status: str, result: str | None = None
@@ -98,7 +95,7 @@ class ResearchOrchestrator:
async def run_research_pipeline(self, issue_title: str) -> str:
"""Run the research pipeline."""
search_results = await google_web_search(issue_title)
llm_client = get_llm_client()
response = await llm_client.completion(
f"Summarize the following search results and generate a research report:\n\n{search_results}",
@@ -123,7 +120,9 @@ class ResearchOrchestrator:
comment += "Created the following issues:\n"
for result in triage_results:
if result["gitea_issue"]:
comment += f"- #{result['gitea_issue']['number']}: {result['action_item'].title}\n"
comment += (
f"- #{result['gitea_issue']['number']}: {result['action_item'].title}\n"
)
else:
comment += "No new issues were created.\n"
@@ -172,4 +171,3 @@ async def start_paperclip_poller() -> None:
if settings.paperclip_enabled:
poller = PaperclipPoller()
asyncio.create_task(poller.poll())

View File

@@ -6,7 +6,6 @@ import logging
import os
from typing import Any
from config import settings
from serpapi import GoogleSearch
logger = logging.getLogger(__name__)
@@ -28,12 +27,17 @@ async def google_web_search(query: str) -> str:
def get_llm_client() -> Any:
"""Get an LLM client."""
# This is a placeholder. In a real application, this would return
# a client for an LLM service like OpenAI, Anthropic, or a local
# model.
class MockLLMClient:
"""Stub LLM client for testing without a real language model."""
async def completion(self, prompt: str, max_tokens: int) -> Any:
class MockCompletion:
"""Stub completion response returned by MockLLMClient."""
def __init__(self, text: str) -> None:
self.text = text

View File

@@ -0,0 +1,7 @@
"""Sovereignty metrics for the Bannerlord loop.
Tracks how much of each AI layer (perception, decision, narration)
runs locally vs. calls out to an LLM. Feeds the sovereignty dashboard.
Refs: #954, #953
"""

View File

@@ -0,0 +1,413 @@
"""Sovereignty metrics emitter and SQLite store.
Tracks the sovereignty percentage for each AI layer (perception, decision,
narration) plus API cost and skill crystallisation. All data is persisted to
``data/sovereignty_metrics.db`` so the dashboard can query trends over time.
Event types
-----------
perception layer:
``perception_cache_hit`` — frame answered from local cache (sovereign)
``perception_vlm_call`` — frame required a VLM inference call (non-sovereign)
decision layer:
``decision_rule_hit`` — action chosen by a deterministic rule (sovereign)
``decision_llm_call`` — action required LLM reasoning (non-sovereign)
narration layer:
``narration_template`` — text generated from a template (sovereign)
``narration_llm`` — text generated by an LLM (non-sovereign)
skill layer:
``skill_crystallized`` — a new skill was crystallised from LLM output
cost:
``api_call`` — any external API call was made
``api_cost`` — monetary cost of an API call (metadata: {"usd": float})
Refs: #954, #953
"""
import asyncio
import json
import logging
import sqlite3
import uuid
from contextlib import closing
from dataclasses import dataclass, field
from datetime import UTC, datetime
from pathlib import Path
from typing import Any
from config import settings
logger = logging.getLogger(__name__)
# ── Constants ─────────────────────────────────────────────────────────────────
DB_PATH = Path(settings.repo_root) / "data" / "sovereignty_metrics.db"
#: Sovereign event types for each layer (numerator of sovereignty %).
_SOVEREIGN_EVENTS: dict[str, frozenset[str]] = {
"perception": frozenset({"perception_cache_hit"}),
"decision": frozenset({"decision_rule_hit"}),
"narration": frozenset({"narration_template"}),
}
#: All tracked event types for each layer (denominator of sovereignty %).
_LAYER_EVENTS: dict[str, frozenset[str]] = {
"perception": frozenset({"perception_cache_hit", "perception_vlm_call"}),
"decision": frozenset({"decision_rule_hit", "decision_llm_call"}),
"narration": frozenset({"narration_template", "narration_llm"}),
}
ALL_EVENT_TYPES: frozenset[str] = frozenset(
{
"perception_cache_hit",
"perception_vlm_call",
"decision_rule_hit",
"decision_llm_call",
"narration_template",
"narration_llm",
"skill_crystallized",
"api_call",
"api_cost",
}
)
# ── Schema ────────────────────────────────────────────────────────────────────
_SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
timestamp TEXT NOT NULL,
event_type TEXT NOT NULL,
session_id TEXT NOT NULL DEFAULT '',
metadata_json TEXT NOT NULL DEFAULT '{}'
);
CREATE INDEX IF NOT EXISTS idx_ev_type ON events(event_type);
CREATE INDEX IF NOT EXISTS idx_ev_ts ON events(timestamp);
CREATE INDEX IF NOT EXISTS idx_ev_session ON events(session_id);
CREATE TABLE IF NOT EXISTS sessions (
session_id TEXT PRIMARY KEY,
game TEXT NOT NULL DEFAULT '',
start_time TEXT NOT NULL,
end_time TEXT
);
"""
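The schema's column defaults can be exercised against an in-memory database; a quick sketch, abridged to the `events` table:

```python
import sqlite3

EVENTS_SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    event_type TEXT NOT NULL,
    session_id TEXT NOT NULL DEFAULT '',
    metadata_json TEXT NOT NULL DEFAULT '{}'
);
CREATE INDEX IF NOT EXISTS idx_ev_type ON events(event_type);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(EVENTS_SCHEMA)
# Only the NOT NULL columns without defaults must be supplied explicitly;
# session_id and metadata_json fall back to '' and '{}'.
conn.execute(
    "INSERT INTO events (timestamp, event_type) VALUES (?, ?)",
    ("2026-03-23T00:00:00+00:00", "perception_cache_hit"),
)
row = conn.execute(
    "SELECT session_id, metadata_json FROM events WHERE event_type = ?",
    ("perception_cache_hit",),
).fetchone()
print(row)  # ('', '{}')
```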
# ── Data classes ──────────────────────────────────────────────────────────────
@dataclass
class SovereigntyEvent:
"""A single sovereignty event."""
event_type: str
session_id: str = ""
metadata: dict[str, Any] = field(default_factory=dict)
timestamp: str = field(default_factory=lambda: datetime.now(UTC).isoformat())
# ── Store ─────────────────────────────────────────────────────────────────────
class SovereigntyMetricsStore:
"""SQLite-backed sovereignty event store.
Thread-safe: creates a new connection per operation (WAL mode).
"""
def __init__(self, db_path: Path | None = None) -> None:
self._db_path = db_path or DB_PATH
self._init_db()
# ── internal ─────────────────────────────────────────────────────────────
def _init_db(self) -> None:
try:
self._db_path.parent.mkdir(parents=True, exist_ok=True)
with closing(sqlite3.connect(str(self._db_path))) as conn:
conn.execute("PRAGMA journal_mode=WAL")
conn.execute(f"PRAGMA busy_timeout={settings.db_busy_timeout_ms}")
conn.executescript(_SCHEMA)
conn.commit()
except Exception as exc:
logger.warning("Failed to initialise sovereignty metrics DB: %s", exc)
def _connect(self) -> sqlite3.Connection:
conn = sqlite3.connect(str(self._db_path))
conn.row_factory = sqlite3.Row
conn.execute(f"PRAGMA busy_timeout={settings.db_busy_timeout_ms}")
return conn
# ── public API ────────────────────────────────────────────────────────────
def record(
self, event_type: str, metadata: dict[str, Any] | None = None, *, session_id: str = ""
) -> None:
"""Record a sovereignty event.
Parameters
----------
event_type:
One of ``ALL_EVENT_TYPES``.
metadata:
Optional dict of extra data (serialised as JSON).
session_id:
Identifier of the current game session, if known.
"""
event = SovereigntyEvent(
event_type=event_type,
session_id=session_id,
metadata=metadata or {},
)
try:
with closing(self._connect()) as conn:
conn.execute(
"INSERT INTO events (timestamp, event_type, session_id, metadata_json) "
"VALUES (?, ?, ?, ?)",
(
event.timestamp,
event.event_type,
event.session_id,
json.dumps(event.metadata),
),
)
conn.commit()
except Exception as exc:
logger.warning("Failed to record sovereignty event: %s", exc)
def start_session(self, game: str = "", session_id: str | None = None) -> str:
"""Register a new game session. Returns the session_id."""
sid = session_id or str(uuid.uuid4())
try:
with closing(self._connect()) as conn:
conn.execute(
"INSERT OR IGNORE INTO sessions (session_id, game, start_time) VALUES (?, ?, ?)",
(sid, game, datetime.now(UTC).isoformat()),
)
conn.commit()
except Exception as exc:
logger.warning("Failed to start session: %s", exc)
return sid
def end_session(self, session_id: str) -> None:
"""Mark a session as ended."""
try:
with closing(self._connect()) as conn:
conn.execute(
"UPDATE sessions SET end_time = ? WHERE session_id = ?",
(datetime.now(UTC).isoformat(), session_id),
)
conn.commit()
except Exception as exc:
logger.warning("Failed to end session: %s", exc)
# ── analytics ─────────────────────────────────────────────────────────────
def get_sovereignty_pct(self, layer: str, time_window: float | None = None) -> float:
"""Return the sovereignty percentage (0.0-100.0) for *layer*.
Parameters
----------
layer:
One of ``"perception"``, ``"decision"``, ``"narration"``.
time_window:
If given, only consider events from the last *time_window* seconds.
If ``None``, all events are used.
Returns
-------
float
Percentage of sovereign events for the layer, or 0.0 if no data.
"""
if layer not in _LAYER_EVENTS:
logger.warning("Unknown sovereignty layer: %s", layer)
return 0.0
sovereign = _SOVEREIGN_EVENTS[layer]
total_types = _LAYER_EVENTS[layer]
sovereign_placeholders = ",".join("?" * len(sovereign))
total_placeholders = ",".join("?" * len(total_types))
params_sov: list[Any] = list(sovereign)
params_total: list[Any] = list(total_types)
if time_window is not None:
cutoff = _seconds_ago_iso(time_window)
where_ts = " AND timestamp >= ?"
params_sov.append(cutoff)
params_total.append(cutoff)
else:
where_ts = ""
try:
with closing(self._connect()) as conn:
total_count = conn.execute(
f"SELECT COUNT(*) FROM events WHERE event_type IN ({total_placeholders}){where_ts}",
params_total,
).fetchone()[0]
if total_count == 0:
return 0.0
sov_count = conn.execute(
f"SELECT COUNT(*) FROM events WHERE event_type IN ({sovereign_placeholders}){where_ts}",
params_sov,
).fetchone()[0]
return round(100.0 * sov_count / total_count, 2)
except Exception as exc:
logger.warning("Failed to compute sovereignty pct: %s", exc)
return 0.0
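Stripped of the SQLite plumbing, the figure above is simply sovereign events over all layer events within the window. A pure-Python sketch of the same arithmetic:

```python
from collections import Counter

SOVEREIGN_EVENTS = {
    "perception": {"perception_cache_hit"},
    "decision": {"decision_rule_hit"},
    "narration": {"narration_template"},
}
LAYER_EVENTS = {
    "perception": {"perception_cache_hit", "perception_vlm_call"},
    "decision": {"decision_rule_hit", "decision_llm_call"},
    "narration": {"narration_template", "narration_llm"},
}

def sovereignty_pct(event_types, layer):
    """Percentage of sovereign events for *layer*; 0.0 when no data."""
    counts = Counter(event_types)
    total = sum(counts[e] for e in LAYER_EVENTS[layer])
    if total == 0:
        return 0.0
    sovereign = sum(counts[e] for e in SOVEREIGN_EVENTS[layer])
    return round(100.0 * sovereign / total, 2)

events = ["perception_cache_hit"] * 3 + ["perception_vlm_call", "decision_llm_call"]
print(sovereignty_pct(events, "perception"))  # 75.0
print(sovereignty_pct(events, "narration"))   # 0.0 (no narration events yet)
```

Events outside the layer's denominator (here the `decision_llm_call`) do not dilute another layer's percentage, matching the per-layer `IN (...)` filters in the SQL.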
def get_cost_per_hour(self, time_window: float | None = None) -> float:
"""Return the total API cost in USD extrapolated to a per-hour rate.
Parameters
----------
time_window:
Seconds of history to consider. Defaults to 3600 (last hour).
Returns
-------
float
USD cost per hour, or 0.0 if no ``api_cost`` events exist.
"""
window = time_window if time_window is not None else 3600.0
cutoff = _seconds_ago_iso(window)
try:
with closing(self._connect()) as conn:
rows = conn.execute(
"SELECT metadata_json FROM events WHERE event_type = 'api_cost' AND timestamp >= ?",
(cutoff,),
).fetchall()
except Exception as exc:
logger.warning("Failed to query api_cost events: %s", exc)
return 0.0
total_usd = 0.0
for row in rows:
try:
meta = json.loads(row["metadata_json"] or "{}")
total_usd += float(meta.get("usd", 0.0))
except (ValueError, TypeError, json.JSONDecodeError):
pass
# Extrapolate: (total in window) * (3600 / window_seconds)
if window <= 0:
return 0.0
return round(total_usd * (3600.0 / window), 4)
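The extrapolation is a single scale factor: spend within the window times `3600 / window_seconds`. Isolated as a sketch:

```python
def cost_per_hour(costs_usd, window_seconds):
    """Extrapolate spend in a window to an hourly rate, as get_cost_per_hour does."""
    if window_seconds <= 0:
        return 0.0
    return round(sum(costs_usd) * (3600.0 / window_seconds), 4)

# $0.03 across a 15-minute window extrapolates to $0.12/hour.
print(cost_per_hour([0.01, 0.02], 900.0))  # 0.12
```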
def get_skills_crystallized(self, session_id: str | None = None) -> int:
"""Return the number of skills crystallised.
Parameters
----------
session_id:
If given, count only events for that session. If ``None``,
count across all sessions.
"""
try:
with closing(self._connect()) as conn:
if session_id:
return conn.execute(
"SELECT COUNT(*) FROM events WHERE event_type = 'skill_crystallized' AND session_id = ?",
(session_id,),
).fetchone()[0]
return conn.execute(
"SELECT COUNT(*) FROM events WHERE event_type = 'skill_crystallized'",
).fetchone()[0]
except Exception as exc:
logger.warning("Failed to query skill_crystallized: %s", exc)
return 0
def get_snapshot(self) -> dict[str, Any]:
"""Return a real-time metrics snapshot suitable for dashboard widgets."""
return {
"sovereignty": {
layer: self.get_sovereignty_pct(layer, time_window=3600) for layer in _LAYER_EVENTS
},
"cost_per_hour": self.get_cost_per_hour(),
"skills_crystallized": self.get_skills_crystallized(),
}
# ── Module-level singleton ────────────────────────────────────────────────────
_store: SovereigntyMetricsStore | None = None
def get_metrics_store() -> SovereigntyMetricsStore:
"""Return (or lazily create) the module-level singleton store."""
global _store
if _store is None:
_store = SovereigntyMetricsStore()
return _store
# ── Convenience helpers ───────────────────────────────────────────────────────
def record(
event_type: str, metadata: dict[str, Any] | None = None, *, session_id: str = ""
) -> None:
"""Module-level shortcut: ``metrics.record("perception_cache_hit")``."""
get_metrics_store().record(event_type, metadata=metadata, session_id=session_id)
def get_sovereignty_pct(layer: str, time_window: float | None = None) -> float:
"""Module-level shortcut for :meth:`SovereigntyMetricsStore.get_sovereignty_pct`."""
return get_metrics_store().get_sovereignty_pct(layer, time_window)
def get_cost_per_hour(time_window: float | None = None) -> float:
"""Module-level shortcut for :meth:`SovereigntyMetricsStore.get_cost_per_hour`."""
return get_metrics_store().get_cost_per_hour(time_window)
def get_skills_crystallized(session_id: str | None = None) -> int:
"""Module-level shortcut for :meth:`SovereigntyMetricsStore.get_skills_crystallized`."""
return get_metrics_store().get_skills_crystallized(session_id)
async def emit_sovereignty_event(
event_type: str,
metadata: dict[str, Any] | None = None,
*,
session_id: str = "",
) -> None:
"""Record an event in a thread and publish it on the event bus.
This is the async-safe entry-point used by the agentic loop.
"""
from infrastructure.events.bus import emit
await asyncio.to_thread(
get_metrics_store().record,
event_type,
metadata,
session_id=session_id,
)
await emit(
f"sovereignty.event.{event_type}",
source="sovereignty_metrics",
data={
"event_type": event_type,
"session_id": session_id,
**(metadata or {}),
},
)
# ── Private helpers ───────────────────────────────────────────────────────────
def _seconds_ago_iso(seconds: float) -> str:
"""Return an ISO-8601 timestamp *seconds* before now (UTC)."""
import datetime as _dt
delta = _dt.timedelta(seconds=seconds)
return (_dt.datetime.now(UTC) - delta).isoformat()
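The cutoff helper leans on a property of ISO-8601: timestamps rendered with the same UTC offset sort chronologically as plain strings, which is what the `timestamp >= ?` comparisons in the queries rely on. A sketch using the stdlib `timezone.utc` alias:

```python
from datetime import datetime, timedelta, timezone

def seconds_ago_iso(seconds):
    """Mirror of _seconds_ago_iso."""
    return (datetime.now(timezone.utc) - timedelta(seconds=seconds)).isoformat()

now_iso = datetime.now(timezone.utc).isoformat()
cutoff = seconds_ago_iso(60)
# String comparison, no datetime parsing needed on the read path.
print(cutoff < now_iso)  # True
```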

View File

@@ -0,0 +1,92 @@
from __future__ import annotations
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any
import cv2
import numpy as np
@dataclass
class Template:
name: str
image: np.ndarray
threshold: float = 0.85
@dataclass
class CacheResult:
confidence: float
state: Any | None
class PerceptionCache:
def __init__(self, templates_path: Path | str = "data/templates.json"):
self.templates_path = Path(templates_path)
self.templates: list[Template] = []
self.load()
def match(self, screenshot: np.ndarray) -> CacheResult:
"""
Match templates against the screenshot.
Returns the confidence of the best match and, when it clears that
template's own threshold, the matching template's name.
"""
best_match_confidence = 0.0
best_match: Template | None = None
for template in self.templates:
res = cv2.matchTemplate(screenshot, template.image, cv2.TM_CCOEFF_NORMED)
_, max_val, _, _ = cv2.minMaxLoc(res)
if max_val > best_match_confidence:
best_match_confidence = max_val
best_match = template
if best_match is not None and best_match_confidence >= best_match.threshold:
return CacheResult(
confidence=best_match_confidence, state={"template_name": best_match.name}
)
return CacheResult(confidence=best_match_confidence, state=None)
def add(self, templates: list[Template]):
self.templates.extend(templates)
def persist(self):
self.templates_path.parent.mkdir(parents=True, exist_ok=True)
# Note: This is a simplified persistence mechanism.
# A more robust solution would store templates as images and metadata in JSON.
with self.templates_path.open("w") as f:
json.dump(
[{"name": t.name, "threshold": t.threshold} for t in self.templates], f, indent=2
)
def load(self):
if self.templates_path.exists():
with self.templates_path.open("r") as f:
templates_data = json.load(f)
# This is a simplified loading mechanism and assumes template images are stored elsewhere.
# For now, we are not loading the actual images.
self.templates = [
Template(name=t["name"], image=np.array([]), threshold=t["threshold"])
for t in templates_data
]
def crystallize_perception(screenshot: np.ndarray, vlm_response: Any) -> list[Template]:
"""
Extracts reusable patterns from VLM output and generates OpenCV templates.
This is a placeholder and needs to be implemented based on the actual VLM response format.
"""
# Example implementation:
# templates = []
# for item in vlm_response.get("items", []):
# bbox = item.get("bounding_box")
# template_name = item.get("name")
# if bbox and template_name:
# x1, y1, x2, y2 = bbox
# template_image = screenshot[y1:y2, x1:x2]
# templates.append(Template(name=template_name, image=template_image))
# return templates
return []
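The commented example above can be made concrete without committing to a VLM response format. A pure-Python sketch under that assumption: each item carries a `"name"` and a pixel-space `"bounding_box"` of `(x1, y1, x2, y2)`, both hypothetical field names, and the frame is a row-major 2D array:

```python
def crystallize_from_boxes(frame, vlm_items):
    """Sketch of crystallize_perception: crop one template per detected item."""
    templates = []
    for item in vlm_items:
        bbox = item.get("bounding_box")
        name = item.get("name")
        if bbox and name:
            x1, y1, x2, y2 = bbox
            # Row-major crop: rows are y, columns are x.
            crop = [row[x1:x2] for row in frame[y1:y2]]
            templates.append({"name": name, "image": crop})
    return templates

frame = [[0] * 100 for _ in range(100)]  # stand-in for a 100x100 screenshot
out = crystallize_from_boxes(
    frame, [{"name": "menu_button", "bounding_box": (10, 10, 30, 25)}]
)
print(out[0]["name"], len(out[0]["image"]), len(out[0]["image"][0]))  # menu_button 15 20
```

With a real `np.ndarray` the crop becomes `screenshot[y1:y2, x1:x2]`, as in the commented-out code.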

View File

@@ -692,91 +692,112 @@ class ThinkingEngine:
file paths actually exist on disk, preventing phantom-bug reports.
"""
try:
interval = settings.thinking_issue_every
if interval <= 0:
recent = self._get_recent_thoughts_for_issues()
if recent is None:
return
count = self.count_thoughts()
if count == 0 or count % interval != 0:
return
# Check Gitea availability before spending LLM tokens
if not settings.gitea_enabled or not settings.gitea_token:
return
recent = self.get_recent_thoughts(limit=interval)
if len(recent) < interval:
return
thought_text = "\n".join(f"- [{t.seed_type}] {t.content}" for t in reversed(recent))
classify_prompt = (
"You are reviewing your own recent thoughts for actionable items.\n"
"Extract 0-2 items that are CONCRETE bugs, broken features, stale "
"state, or clear improvement opportunities in your own codebase.\n\n"
"Rules:\n"
"- Only include things that could become a real code fix or feature\n"
"- Skip vague reflections, philosophical musings, or repeated themes\n"
"- Category must be one of: bug, feature, suggestion, maintenance\n"
"- ONLY reference files that you are CERTAIN exist in the project\n"
"- Do NOT invent or guess file paths — if unsure, describe the "
"area of concern without naming specific files\n\n"
"For each item, write an ENGINEER-QUALITY issue:\n"
'- "title": A clear, specific title (e.g. "[Memory] MEMORY.md timestamp not updating")\n'
'- "body": A detailed body with these sections:\n'
" **What's happening:** Describe the current (broken) behavior.\n"
" **Expected behavior:** What should happen instead.\n"
" **Suggested fix:** Which file(s) to change and what the fix looks like.\n"
" **Acceptance criteria:** How to verify the fix works.\n"
'- "category": One of bug, feature, suggestion, maintenance\n\n'
"Return ONLY a JSON array of objects with keys: "
'"title", "body", "category"\n'
"Return [] if nothing is actionable.\n\n"
f"Recent thoughts:\n{thought_text}\n\nJSON array:"
)
classify_prompt = self._build_issue_classify_prompt(recent)
raw = await self._call_agent(classify_prompt)
if not raw or not raw.strip():
return
import json
# Strip markdown code fences if present
cleaned = raw.strip()
if cleaned.startswith("```"):
cleaned = cleaned.split("\n", 1)[-1].rsplit("```", 1)[0].strip()
items = json.loads(cleaned)
if not isinstance(items, list) or not items:
items = self._parse_issue_items(raw)
if items is None:
return
from timmy.mcp_tools import create_gitea_issue_via_mcp
for item in items[:2]: # Safety cap
if not isinstance(item, dict):
continue
title = item.get("title", "").strip()
body = item.get("body", "").strip()
category = item.get("category", "suggestion").strip()
if not title or len(title) < 10:
continue
# Validate all referenced file paths exist on disk
combined = f"{title}\n{body}"
if not self._references_real_files(combined):
logger.info(
"Skipped phantom issue: %s (references non-existent files)",
title[:60],
)
continue
label = category if category in ("bug", "feature") else ""
result = await create_gitea_issue_via_mcp(title=title, body=body, labels=label)
logger.info("Thought→Issue: %s%s", title[:60], result[:80])
await self._file_single_issue(item, create_gitea_issue_via_mcp)
except Exception as exc:
logger.debug("Thought issue filing skipped: %s", exc)
def _get_recent_thoughts_for_issues(self):
"""Return recent thoughts if conditions for filing issues are met, else None."""
interval = settings.thinking_issue_every
if interval <= 0:
return None
count = self.count_thoughts()
if count == 0 or count % interval != 0:
return None
if not settings.gitea_enabled or not settings.gitea_token:
return None
recent = self.get_recent_thoughts(limit=interval)
if len(recent) < interval:
return None
return recent
@staticmethod
def _build_issue_classify_prompt(recent) -> str:
"""Build the LLM prompt that extracts actionable issues from recent thoughts."""
thought_text = "\n".join(f"- [{t.seed_type}] {t.content}" for t in reversed(recent))
return (
"You are reviewing your own recent thoughts for actionable items.\n"
"Extract 0-2 items that are CONCRETE bugs, broken features, stale "
"state, or clear improvement opportunities in your own codebase.\n\n"
"Rules:\n"
"- Only include things that could become a real code fix or feature\n"
"- Skip vague reflections, philosophical musings, or repeated themes\n"
"- Category must be one of: bug, feature, suggestion, maintenance\n"
"- ONLY reference files that you are CERTAIN exist in the project\n"
"- Do NOT invent or guess file paths — if unsure, describe the "
"area of concern without naming specific files\n\n"
"For each item, write an ENGINEER-QUALITY issue:\n"
'- "title": A clear, specific title (e.g. "[Memory] MEMORY.md timestamp not updating")\n'
'- "body": A detailed body with these sections:\n'
" **What's happening:** Describe the current (broken) behavior.\n"
" **Expected behavior:** What should happen instead.\n"
" **Suggested fix:** Which file(s) to change and what the fix looks like.\n"
" **Acceptance criteria:** How to verify the fix works.\n"
'- "category": One of bug, feature, suggestion, maintenance\n\n'
"Return ONLY a JSON array of objects with keys: "
'"title", "body", "category"\n'
"Return [] if nothing is actionable.\n\n"
f"Recent thoughts:\n{thought_text}\n\nJSON array:"
)
@staticmethod
def _parse_issue_items(raw: str):
"""Strip markdown fences and parse JSON issue list; return None on failure."""
import json
if not raw or not raw.strip():
return None
cleaned = raw.strip()
if cleaned.startswith("```"):
cleaned = cleaned.split("\n", 1)[-1].rsplit("```", 1)[0].strip()
items = json.loads(cleaned)
if not isinstance(items, list) or not items:
return None
return items
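The fence-stripping contract of this helper can be checked in isolation. A standalone sketch (the name `parse_issue_items` is illustrative, not the module's API; unlike the original, this version also swallows `json.JSONDecodeError` instead of letting it propagate to the caller's `except`):

```python
import json


def parse_issue_items(raw: str):
    """Strip markdown fences and parse a JSON issue list; None on failure."""
    if not raw or not raw.strip():
        return None
    cleaned = raw.strip()
    if cleaned.startswith("```"):
        # Drop the opening fence line (e.g. ```json) and the trailing fence
        cleaned = cleaned.split("\n", 1)[-1].rsplit("```", 1)[0].strip()
    try:
        items = json.loads(cleaned)
    except json.JSONDecodeError:
        return None
    if not isinstance(items, list) or not items:
        return None
    return items
```

With this shape, a fenced `[]` reply, a bare `[]`, and non-JSON all collapse to `None`, so the caller only ever loops over a non-empty list.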
async def _file_single_issue(self, item: dict, create_fn) -> None:
"""Validate one issue dict and create it via *create_fn* if it passes checks."""
if not isinstance(item, dict):
return
title = item.get("title", "").strip()
body = item.get("body", "").strip()
category = item.get("category", "suggestion").strip()
if not title or len(title) < 10:
return
combined = f"{title}\n{body}"
if not self._references_real_files(combined):
logger.info(
"Skipped phantom issue: %s (references non-existent files)",
title[:60],
)
return
label = category if category in ("bug", "feature") else ""
result = await create_fn(title=title, body=body, labels=label)
logger.info("Thought→Issue: %s%s", title[:60], result[:80])
# ── System snapshot helpers ────────────────────────────────────────────
def _snap_thought_count(self, now: datetime) -> str | None:

View File

@@ -47,13 +47,11 @@ _DEFAULT_IDLE_THRESHOLD = 30
 class AgentStatus:
     """Health snapshot for one agent at a point in time."""

-    agent: str  # "claude" | "kimi" | "timmy"
+    agent: str  # "claude" | "kimi" | "timmy"
     is_idle: bool = True
     active_issue_numbers: list[int] = field(default_factory=list)
     stuck_issue_numbers: list[int] = field(default_factory=list)
-    checked_at: str = field(
-        default_factory=lambda: datetime.now(UTC).isoformat()
-    )
+    checked_at: str = field(default_factory=lambda: datetime.now(UTC).isoformat())
@property
def is_stuck(self) -> bool:
@@ -69,9 +67,7 @@ class AgentHealthReport:
     """Combined health report for all monitored agents."""

     agents: list[AgentStatus] = field(default_factory=list)
-    generated_at: str = field(
-        default_factory=lambda: datetime.now(UTC).isoformat()
-    )
+    generated_at: str = field(default_factory=lambda: datetime.now(UTC).isoformat())
@property
def any_stuck(self) -> bool:
@@ -193,18 +189,14 @@ async def check_agent_health(
     try:
         async with httpx.AsyncClient(timeout=15) as client:
-            issues = await _fetch_labeled_issues(
-                client, base_url, headers, repo, label
-            )
+            issues = await _fetch_labeled_issues(client, base_url, headers, repo, label)
             for issue in issues:
                 num = issue.get("number", 0)
                 status.active_issue_numbers.append(num)

                 # Check last activity
-                last_activity = await _last_comment_time(
-                    client, base_url, headers, repo, num
-                )
+                last_activity = await _last_comment_time(client, base_url, headers, repo, num)
                 if last_activity is None:
                     last_activity = await _issue_created_time(issue)

View File

@@ -91,9 +91,9 @@ _PRIORITY_LABEL_SCORES: dict[str, int] = {
 class AgentTarget(StrEnum):
     """Which agent should handle this issue."""

-    TIMMY = "timmy"  # Timmy handles locally (self)
+    TIMMY = "timmy"  # Timmy handles locally (self)
     CLAUDE = "claude"  # Dispatch to Claude Code
-    KIMI = "kimi"  # Dispatch to Kimi Code
+    KIMI = "kimi"  # Dispatch to Kimi Code
@dataclass
@@ -172,9 +172,7 @@ def triage_issues(raw_issues: list[dict[str, Any]]) -> list[TriagedIssue]:
         title = issue.get("title", "")
         body = issue.get("body") or ""
         labels = _extract_labels(issue)
-        assignees = [
-            a.get("login", "") for a in issue.get("assignees") or []
-        ]
+        assignees = [a.get("login", "") for a in issue.get("assignees") or []]
         url = issue.get("html_url", "")

         priority = _score_priority(labels, assignees)
@@ -252,9 +250,7 @@ async def fetch_open_issues(
                 params=params,
             )
             if resp.status_code != 200:
-                logger.warning(
-                    "fetch_open_issues: Gitea returned %s", resp.status_code
-                )
+                logger.warning("fetch_open_issues: Gitea returned %s", resp.status_code)
                 return []
             issues = resp.json()

View File

@@ -34,7 +34,7 @@ _LABEL_MAP: dict[AgentTarget, str] = {
 _LABEL_COLORS: dict[str, str] = {
     "claude-ready": "#8b6f47",  # warm brown
-    "kimi-ready": "#006b75",  # dark teal
+    "kimi-ready": "#006b75",  # dark teal
     "timmy-ready": "#0075ca",  # blue
 }
@@ -52,9 +52,7 @@ class DispatchRecord:
     issue_title: str
     agent: AgentTarget
     rationale: str
-    dispatched_at: str = field(
-        default_factory=lambda: datetime.now(UTC).isoformat()
-    )
+    dispatched_at: str = field(default_factory=lambda: datetime.now(UTC).isoformat())
     label_applied: bool = False
     comment_posted: bool = False
@@ -170,9 +168,7 @@ async def dispatch_issue(issue: TriagedIssue) -> DispatchRecord:
     try:
         async with httpx.AsyncClient(timeout=15) as client:
-            label_id = await _get_or_create_label(
-                client, base_url, headers, repo, label_name
-            )
+            label_id = await _get_or_create_label(client, base_url, headers, repo, label_name)

             # Apply label
             if label_id is not None:

View File

@@ -22,9 +22,9 @@ logger = logging.getLogger(__name__)
 # Thresholds
 # ---------------------------------------------------------------------------
-_WARN_DISK_PCT = 85.0  # warn when disk is more than 85% full
-_WARN_MEM_PCT = 90.0  # warn when memory is more than 90% used
-_WARN_CPU_PCT = 95.0  # warn when CPU is above 95% sustained
+_WARN_DISK_PCT = 85.0  # warn when disk is more than 85% full
+_WARN_MEM_PCT = 90.0  # warn when memory is more than 90% used
+_WARN_CPU_PCT = 95.0  # warn when CPU is above 95% sustained
# ---------------------------------------------------------------------------
@@ -63,9 +63,7 @@ class SystemSnapshot:
     memory: MemoryUsage = field(default_factory=MemoryUsage)
     ollama: OllamaHealth = field(default_factory=OllamaHealth)
     warnings: list[str] = field(default_factory=list)
-    taken_at: str = field(
-        default_factory=lambda: datetime.now(UTC).isoformat()
-    )
+    taken_at: str = field(default_factory=lambda: datetime.now(UTC).isoformat())
@property
def healthy(self) -> bool:
@@ -117,8 +115,8 @@ def _probe_memory() -> MemoryUsage:
 def _probe_ollama_sync(ollama_url: str) -> OllamaHealth:
     """Synchronous Ollama health probe — run in a thread."""
     try:
-        import urllib.request
         import json
+        import urllib.request

         url = ollama_url.rstrip("/") + "/api/tags"
         with urllib.request.urlopen(url, timeout=5) as resp:  # noqa: S310
@@ -154,14 +152,12 @@ async def get_system_snapshot() -> SystemSnapshot:
         if disk.percent_used >= _WARN_DISK_PCT:
             warnings.append(
-                f"Disk {disk.path}: {disk.percent_used:.0f}% used "
-                f"({disk.free_gb:.1f} GB free)"
+                f"Disk {disk.path}: {disk.percent_used:.0f}% used ({disk.free_gb:.1f} GB free)"
             )
         if memory.percent_used >= _WARN_MEM_PCT:
             warnings.append(
-                f"Memory: {memory.percent_used:.0f}% used "
-                f"({memory.available_gb:.1f} GB available)"
+                f"Memory: {memory.percent_used:.0f}% used ({memory.available_gb:.1f} GB available)"
             )
         if not ollama.reachable:
@@ -216,7 +212,5 @@ async def cleanup_stale_files(
             errors.append(str(exc))

     await asyncio.to_thread(_cleanup)
-    logger.info(
-        "cleanup_stale_files: deleted %d files, %d errors", deleted, len(errors)
-    )
+    logger.info("cleanup_stale_files: deleted %d files, %d errors", deleted, len(errors))
     return {"deleted_count": deleted, "errors": errors}

View File

@@ -25,15 +25,21 @@ logger = logging.getLogger(__name__)
class ChatRequest(BaseModel):
"""Incoming chat request payload for the Timmy Serve API."""
message: str
stream: bool = False
class ChatResponse(BaseModel):
"""Chat response payload returned by the Timmy Serve API."""
response: str
class StatusResponse(BaseModel):
"""Service status response with backend information."""
status: str
backend: str

View File

@@ -9,6 +9,9 @@ Usage:
import json
import os
import subprocess
import urllib.error
import urllib.request
from pathlib import Path
from typing import Any
@@ -31,6 +34,37 @@ AUTOMATIONS_CONFIG = DEFAULT_CONFIG_DIR / "automations.json"
DAILY_RUN_CONFIG = DEFAULT_CONFIG_DIR / "daily_run.json"
TRIAGE_RULES_CONFIG = DEFAULT_CONFIG_DIR / "triage_rules.yaml"
GITEA_URL = os.environ.get("GITEA_URL", "http://143.198.27.163:3000")
GITEA_REPO = "rockachopa/Timmy-time-dashboard"
def _get_gitea_token() -> str | None:
"""Read the Gitea API token from env or config files."""
token = os.environ.get("GITEA_TOKEN")
if token:
return token.strip()
for candidate in [
Path("~/.hermes/gitea_token_vps").expanduser(),
Path("~/.hermes/gitea_token").expanduser(),
]:
try:
return candidate.read_text(encoding="utf-8").strip()
except FileNotFoundError:
continue
return None
def _gitea_api_get(endpoint: str) -> Any:
"""GET a Gitea API endpoint and return parsed JSON."""
url = f"{GITEA_URL}/api/v1{endpoint}"
token = _get_gitea_token()
req = urllib.request.Request(url)
if token:
req.add_header("Authorization", f"token {token}")
req.add_header("Accept", "application/json")
with urllib.request.urlopen(req, timeout=15) as resp:
return json.loads(resp.read().decode("utf-8"))
def _load_json_config(path: Path) -> dict[str, Any]:
"""Load a JSON config file, returning empty dict on error."""
@@ -131,9 +165,41 @@ def daily_run(
         console.print("[yellow]Dry run mode — no actions executed.[/yellow]")
     else:
         console.print("[green]Executing daily run automations...[/green]")
-        # TODO: Implement actual automation execution
-        # This would call the appropriate scripts from the automations config
-        console.print("[dim]Automation execution not yet implemented.[/dim]")
+        auto_config_path = _get_config_dir() / "automations.json"
+        auto_config = _load_json_config(auto_config_path)
+        all_automations = auto_config.get("automations", [])
+        enabled = [a for a in all_automations if a.get("enabled", False)]
+        if not enabled:
+            console.print("[yellow]No enabled automations found.[/yellow]")
+        for auto in enabled:
+            cmd = auto.get("command")
+            name = auto.get("name", auto.get("id", "unnamed"))
+            if not cmd:
+                console.print(f"[yellow]Skipping {name} — no command defined.[/yellow]")
+                continue
+            console.print(f"[cyan]▶ Running: {name}[/cyan]")
+            if verbose:
+                console.print(f"[dim]  $ {cmd}[/dim]")
+            try:
+                result = subprocess.run(  # noqa: S602
+                    cmd,
+                    shell=True,
+                    capture_output=True,
+                    text=True,
+                    timeout=120,
+                )
+                if result.stdout.strip():
+                    console.print(result.stdout.strip())
+                if result.returncode != 0:
+                    console.print(f"[red]  ✗ {name} exited with code {result.returncode}[/red]")
+                    if result.stderr.strip():
+                        console.print(f"[red]{result.stderr.strip()}[/red]")
+                else:
+                    console.print(f"[green]  ✓ {name} completed successfully[/green]")
+            except subprocess.TimeoutExpired:
+                console.print(f"[red]  ✗ {name} timed out after 120s[/red]")
+            except Exception as exc:
+                console.print(f"[red]  ✗ {name} failed: {exc}[/red]")
@app.command()
@@ -159,9 +225,96 @@ def log_run(
     console.print(f"[dim]Message:[/dim] {message}")
     console.print()

-    # TODO: Persist to actual logbook file
-    # This would append to a logbook file (e.g., .loop/logbook.jsonl)
-    console.print("[green]✓[/green] Entry logged (simulated)")
+    logbook_path = Path(".loop/logbook.jsonl")
+    logbook_path.parent.mkdir(parents=True, exist_ok=True)
+    entry = json.dumps({"timestamp": timestamp, "category": category, "message": message})
+    with open(logbook_path, "a", encoding="utf-8") as f:
+        f.write(entry + "\n")
+    console.print(f"[green]✓[/green] Entry logged to {logbook_path}")
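A reader for the logbook this command appends to is not part of the diff. A minimal sketch (name hypothetical), assuming one JSON object per line and tolerating blank or corrupt lines:

```python
import json
from pathlib import Path


def read_logbook(path: Path) -> list[dict]:
    """Parse the append-only JSONL logbook written by log_run."""
    entries: list[dict] = []
    if not path.exists():
        return entries
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entries.append(json.loads(line))
        except json.JSONDecodeError:
            # Skip lines corrupted by interrupted writes
            continue
    return entries
```

Skipping unparsable lines keeps a single interrupted append from poisoning every later read of the file.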
def _show_automations_table(limit: int) -> None:
"""Display active automations from the automations config."""
config_path = _get_config_dir() / "automations.json"
config = _load_json_config(config_path)
enabled = [a for a in config.get("automations", []) if a.get("enabled", False)]
table = Table(title="Active Automations")
table.add_column("ID", style="cyan")
table.add_column("Name", style="green")
table.add_column("Category", style="yellow")
table.add_column("Trigger", style="magenta")
    for auto in enabled[:limit]:
        table.add_row(
            auto.get("id", ""),
            auto.get("name", ""),
            "✓" if auto.get("enabled", False) else "",
            auto.get("category", ""),
        )
console.print(table)
console.print()
def _show_prs_table(limit: int) -> None:
"""Display open pull requests from Gitea."""
table = Table(title="Open Pull Requests")
table.add_column("#", style="cyan")
table.add_column("Title", style="green")
table.add_column("Author", style="yellow")
table.add_column("Status", style="magenta")
try:
prs = _gitea_api_get(f"/repos/{GITEA_REPO}/pulls?state=open")
if prs:
for pr in prs[:limit]:
table.add_row(
str(pr.get("number", "")),
pr.get("title", ""),
pr.get("user", {}).get("login", ""),
pr.get("state", ""),
)
else:
table.add_row("", "[dim]No open PRs[/dim]", "", "")
except Exception as exc:
table.add_row("", f"[red]Error fetching PRs: {exc}[/red]", "", "")
console.print(table)
console.print()
def _show_issues_table(limit: int) -> None:
"""Display open issues from Gitea."""
table = Table(title="Issues Calling for Attention")
table.add_column("#", style="cyan")
table.add_column("Title", style="green")
table.add_column("Type", style="yellow")
table.add_column("Priority", style="magenta")
try:
issues = _gitea_api_get(f"/repos/{GITEA_REPO}/issues?state=open&type=issues&limit={limit}")
if issues:
for issue in issues[:limit]:
labels = [lb.get("name", "") for lb in issue.get("labels", [])]
priority = next((lb for lb in labels if "priority" in lb.lower()), "")
issue_type = next(
(
lb
for lb in labels
if lb.lower() in ("bug", "feature", "refactor", "enhancement")
),
"",
)
table.add_row(
str(issue.get("number", "")),
issue.get("title", ""),
issue_type,
priority,
)
else:
table.add_row("", "[dim]No open issues[/dim]", "", "")
except Exception as exc:
table.add_row("", f"[red]Error fetching issues: {exc}[/red]", "", "")
console.print(table)
console.print()
@app.command()
@@ -180,54 +333,13 @@ def inbox(
     console.print("[bold green]Timmy Inbox[/bold green]")
     console.print()

-    # Load automations to show what's enabled
-    config_path = _get_config_dir() / "automations.json"
-    config = _load_json_config(config_path)
-
-    automations = config.get("automations", [])
-    enabled_automations = [a for a in automations if a.get("enabled", False)]
-
-    # Show automation status
-    auto_table = Table(title="Active Automations")
-    auto_table.add_column("ID", style="cyan")
-    auto_table.add_column("Name", style="green")
-    auto_table.add_column("Category", style="yellow")
-    auto_table.add_column("Trigger", style="magenta")
-    for auto in enabled_automations[:limit]:
-        auto_table.add_row(
-            auto.get("id", ""),
-            auto.get("name", ""),
-            "✓" if auto.get("enabled", False) else "",
-            auto.get("category", ""),
-        )
-    console.print(auto_table)
-    console.print()
+    _show_automations_table(limit)

-    # TODO: Fetch actual PRs from Gitea API
     if include_prs:
-        pr_table = Table(title="Open Pull Requests (placeholder)")
-        pr_table.add_column("#", style="cyan")
-        pr_table.add_column("Title", style="green")
-        pr_table.add_column("Author", style="yellow")
-        pr_table.add_column("Status", style="magenta")
-        pr_table.add_row("", "[dim]No PRs fetched (Gitea API not configured)[/dim]", "", "")
-        console.print(pr_table)
-        console.print()
+        _show_prs_table(limit)

-    # TODO: Fetch relevant issues from Gitea API
     if include_issues:
-        issue_table = Table(title="Issues Calling for Attention (placeholder)")
-        issue_table.add_column("#", style="cyan")
-        issue_table.add_column("Title", style="green")
-        issue_table.add_column("Type", style="yellow")
-        issue_table.add_column("Priority", style="magenta")
-        issue_table.add_row(
-            "", "[dim]No issues fetched (Gitea API not configured)[/dim]", "", ""
-        )
-        console.print(issue_table)
-        console.print()
+        _show_issues_table(limit)
@app.command()

View File

@@ -2547,3 +2547,120 @@
.tower-adv-title { font-size: 0.85rem; font-weight: 600; color: var(--text-bright); }
.tower-adv-detail { font-size: 0.8rem; color: var(--text); margin-top: 2px; }
.tower-adv-action { font-size: 0.75rem; color: var(--green); margin-top: 4px; font-style: italic; }
/* ── Voice settings ───────────────────────────────────────── */
.voice-settings-page { max-width: 600px; margin: 0 auto; }
.vs-field { margin-bottom: 1.5rem; }
.vs-label {
display: block;
font-size: 0.75rem;
font-weight: 700;
letter-spacing: 0.1em;
color: var(--text-dim);
margin-bottom: 0.5rem;
}
.vs-value { color: var(--green); font-family: var(--font); }
.vs-slider {
width: 100%;
-webkit-appearance: none;
appearance: none;
height: 4px;
background: var(--border);
border-radius: 2px;
outline: none;
cursor: pointer;
}
.vs-slider::-webkit-slider-thumb {
-webkit-appearance: none;
appearance: none;
width: 18px;
height: 18px;
border-radius: 50%;
background: var(--purple);
cursor: pointer;
box-shadow: 0 0 6px rgba(124, 58, 237, 0.5);
transition: box-shadow 0.2s;
}
.vs-slider::-webkit-slider-thumb:hover { box-shadow: 0 0 12px rgba(124, 58, 237, 0.8); }
.vs-slider::-moz-range-thumb {
width: 18px;
height: 18px;
border-radius: 50%;
background: var(--purple);
cursor: pointer;
border: none;
box-shadow: 0 0 6px rgba(124, 58, 237, 0.5);
}
.vs-range-labels {
display: flex;
justify-content: space-between;
font-size: 0.7rem;
color: var(--text-dim);
margin-top: 0.25rem;
}
.vs-select,
.vs-input {
width: 100%;
padding: 0.5rem 0.75rem;
background: var(--bg-card);
border: 1px solid var(--border);
border-radius: var(--radius-sm);
color: var(--text);
font-family: var(--font);
font-size: 0.9rem;
}
.vs-select { cursor: pointer; }
.vs-select:focus,
.vs-input:focus {
outline: none;
border-color: var(--purple);
box-shadow: 0 0 0 2px rgba(124, 58, 237, 0.2);
}
.vs-unavailable {
font-size: 0.85rem;
color: var(--text-dim);
padding: 0.5rem 0.75rem;
border: 1px dashed var(--border);
border-radius: var(--radius-sm);
}
.vs-actions {
display: flex;
gap: 0.75rem;
margin-top: 1.5rem;
flex-wrap: wrap;
}
.vs-btn-preview,
.vs-btn-save {
flex: 1;
padding: 0.6rem 1.2rem;
border-radius: var(--radius-sm);
font-family: var(--font);
font-size: 0.85rem;
font-weight: 700;
letter-spacing: 0.08em;
cursor: pointer;
min-height: 44px;
transition: opacity 0.2s, box-shadow 0.2s, background 0.2s;
}
.vs-btn-preview {
background: transparent;
border: 1px solid var(--purple);
color: var(--purple);
}
.vs-btn-preview:hover {
background: rgba(124, 58, 237, 0.15);
box-shadow: 0 0 8px rgba(124, 58, 237, 0.3);
}
.vs-btn-save {
background: var(--green);
border: none;
color: var(--bg-deep);
}
.vs-btn-save:hover { opacity: 0.85; }

View File

@@ -51,6 +51,9 @@ def pytest_collection_modifyitems(config, items):
item.add_marker(pytest.mark.docker)
item.add_marker(pytest.mark.skip_ci)
         if "setup_prod" in test_path or "setup_script" in test_path:
             item.add_marker(pytest.mark.skip_ci)
+
+        if "ollama" in test_path or "test_ollama" in item.name:
+            item.add_marker(pytest.mark.ollama)

View File

@@ -0,0 +1,530 @@
"""Unit tests for dashboard/routes/daily_run.py."""
from __future__ import annotations
import json
import os
from datetime import UTC, datetime, timedelta
from pathlib import Path
from unittest.mock import MagicMock, patch
from urllib.error import HTTPError, URLError
import pytest
from dashboard.routes.daily_run import (
DEFAULT_CONFIG,
LAYER_LABELS,
DailyRunMetrics,
GiteaClient,
LayerMetrics,
_extract_layer,
_fetch_layer_metrics,
_get_metrics,
_get_token,
_load_config,
_load_cycle_data,
)
# ---------------------------------------------------------------------------
# _load_config
# ---------------------------------------------------------------------------
def test_load_config_returns_defaults():
with patch("dashboard.routes.daily_run.CONFIG_PATH") as mock_path:
mock_path.exists.return_value = False
config = _load_config()
assert config["gitea_api"] == DEFAULT_CONFIG["gitea_api"]
assert config["repo_slug"] == DEFAULT_CONFIG["repo_slug"]
def test_load_config_merges_file_orchestrator_section(tmp_path):
config_file = tmp_path / "daily_run.json"
config_file.write_text(
json.dumps({"orchestrator": {"repo_slug": "custom/repo", "gitea_api": "http://custom:3000/api/v1"}})
)
with patch("dashboard.routes.daily_run.CONFIG_PATH", config_file):
config = _load_config()
assert config["repo_slug"] == "custom/repo"
assert config["gitea_api"] == "http://custom:3000/api/v1"
def test_load_config_ignores_invalid_json(tmp_path):
config_file = tmp_path / "daily_run.json"
config_file.write_text("not valid json{{")
with patch("dashboard.routes.daily_run.CONFIG_PATH", config_file):
config = _load_config()
assert config["repo_slug"] == DEFAULT_CONFIG["repo_slug"]
def test_load_config_env_overrides(monkeypatch):
monkeypatch.setenv("TIMMY_GITEA_API", "http://envapi:3000/api/v1")
monkeypatch.setenv("TIMMY_REPO_SLUG", "env/repo")
monkeypatch.setenv("TIMMY_GITEA_TOKEN", "env-token-123")
with patch("dashboard.routes.daily_run.CONFIG_PATH") as mock_path:
mock_path.exists.return_value = False
config = _load_config()
assert config["gitea_api"] == "http://envapi:3000/api/v1"
assert config["repo_slug"] == "env/repo"
assert config["token"] == "env-token-123"
def test_load_config_no_env_overrides_without_vars(monkeypatch):
monkeypatch.delenv("TIMMY_GITEA_API", raising=False)
monkeypatch.delenv("TIMMY_REPO_SLUG", raising=False)
monkeypatch.delenv("TIMMY_GITEA_TOKEN", raising=False)
with patch("dashboard.routes.daily_run.CONFIG_PATH") as mock_path:
mock_path.exists.return_value = False
config = _load_config()
assert "token" not in config
# ---------------------------------------------------------------------------
# _get_token
# ---------------------------------------------------------------------------
def test_get_token_from_config_dict():
config = {"token": "direct-token", "token_file": "~/.hermes/gitea_token"}
assert _get_token(config) == "direct-token"
def test_get_token_from_file(tmp_path):
token_file = tmp_path / "token.txt"
token_file.write_text(" file-token \n")
config = {"token_file": str(token_file)}
assert _get_token(config) == "file-token"
def test_get_token_returns_none_when_file_missing(tmp_path):
config = {"token_file": str(tmp_path / "nonexistent_token")}
assert _get_token(config) is None
# ---------------------------------------------------------------------------
# GiteaClient
# ---------------------------------------------------------------------------
def _make_client(**kwargs) -> GiteaClient:
config = {**DEFAULT_CONFIG, **kwargs}
return GiteaClient(config, token="test-token")
def test_gitea_client_headers_include_auth():
client = _make_client()
headers = client._headers()
assert headers["Authorization"] == "token test-token"
assert headers["Accept"] == "application/json"
def test_gitea_client_headers_no_token():
config = {**DEFAULT_CONFIG}
client = GiteaClient(config, token=None)
headers = client._headers()
assert "Authorization" not in headers
def test_gitea_client_api_url():
client = _make_client()
url = client._api_url("issues")
assert url == f"{DEFAULT_CONFIG['gitea_api']}/repos/{DEFAULT_CONFIG['repo_slug']}/issues"
def test_gitea_client_api_url_strips_trailing_slash():
config = {**DEFAULT_CONFIG, "gitea_api": "http://localhost:3000/api/v1/"}
client = GiteaClient(config, token=None)
url = client._api_url("issues")
assert "//" not in url.replace("http://", "")
def test_gitea_client_is_available_true():
client = _make_client()
mock_resp = MagicMock()
mock_resp.status = 200
mock_resp.__enter__ = lambda s: mock_resp
mock_resp.__exit__ = MagicMock(return_value=False)
with patch("dashboard.routes.daily_run.urlopen", return_value=mock_resp):
assert client.is_available() is True
def test_gitea_client_is_available_cached():
client = _make_client()
client._available = True
# Should not call urlopen at all
with patch("dashboard.routes.daily_run.urlopen") as mock_urlopen:
assert client.is_available() is True
mock_urlopen.assert_not_called()
def test_gitea_client_is_available_false_on_url_error():
client = _make_client()
with patch("dashboard.routes.daily_run.urlopen", side_effect=URLError("refused")):
assert client.is_available() is False
def test_gitea_client_is_available_false_on_timeout():
client = _make_client()
with patch("dashboard.routes.daily_run.urlopen", side_effect=TimeoutError()):
assert client.is_available() is False
def test_gitea_client_get_paginated_single_page():
client = _make_client()
mock_resp = MagicMock()
mock_resp.read.return_value = json.dumps([{"id": 1}, {"id": 2}]).encode()
mock_resp.__enter__ = lambda s: mock_resp
mock_resp.__exit__ = MagicMock(return_value=False)
with patch("dashboard.routes.daily_run.urlopen", return_value=mock_resp):
result = client.get_paginated("issues")
assert len(result) == 2
assert result[0]["id"] == 1
def test_gitea_client_get_paginated_empty():
client = _make_client()
mock_resp = MagicMock()
mock_resp.read.return_value = b"[]"
mock_resp.__enter__ = lambda s: mock_resp
mock_resp.__exit__ = MagicMock(return_value=False)
with patch("dashboard.routes.daily_run.urlopen", return_value=mock_resp):
result = client.get_paginated("issues")
assert result == []
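The tests above only cover single-page responses. How `get_paginated` walks pages is not shown in this diff; a sketch under the assumption that it requests successive pages until one comes back shorter than the page size (parameter names and the injectable `fetch` hook are hypothetical):

```python
import json
from urllib.request import Request, urlopen


def get_paginated(base_url: str, endpoint: str, page_size: int = 50, fetch=None) -> list[dict]:
    """Accumulate results across pages; stop on a short or empty page."""
    results: list[dict] = []
    page = 1
    while True:
        url = f"{base_url}/{endpoint}?limit={page_size}&page={page}"
        if fetch is not None:  # injectable for testing, avoids real HTTP
            batch = fetch(url)
        else:
            with urlopen(Request(url), timeout=15) as resp:
                batch = json.loads(resp.read().decode("utf-8"))
        results.extend(batch)
        if len(batch) < page_size:
            break
        page += 1
    return results
```

Stopping on a short page saves one round trip versus looping until an empty page, at the cost of a wasted request when the total is an exact multiple of the page size.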
# ---------------------------------------------------------------------------
# LayerMetrics.trend
# ---------------------------------------------------------------------------
def test_layer_metrics_trend_no_previous_no_current():
lm = LayerMetrics(name="triage", label="layer:triage", current_count=0, previous_count=0)
assert lm.trend == ""
def test_layer_metrics_trend_no_previous_with_current():
lm = LayerMetrics(name="triage", label="layer:triage", current_count=5, previous_count=0)
assert lm.trend == ""
def test_layer_metrics_trend_big_increase():
lm = LayerMetrics(name="triage", label="layer:triage", current_count=130, previous_count=100)
assert lm.trend == "↑↑"
def test_layer_metrics_trend_small_increase():
lm = LayerMetrics(name="triage", label="layer:triage", current_count=108, previous_count=100)
assert lm.trend == ""
def test_layer_metrics_trend_stable():
lm = LayerMetrics(name="triage", label="layer:triage", current_count=100, previous_count=100)
assert lm.trend == ""
def test_layer_metrics_trend_small_decrease():
lm = LayerMetrics(name="triage", label="layer:triage", current_count=92, previous_count=100)
assert lm.trend == ""
def test_layer_metrics_trend_big_decrease():
lm = LayerMetrics(name="triage", label="layer:triage", current_count=70, previous_count=100)
assert lm.trend == "↓↓"
def test_layer_metrics_trend_color_up():
lm = LayerMetrics(name="triage", label="layer:triage", current_count=200, previous_count=100)
assert lm.trend_color == "var(--green)"
def test_layer_metrics_trend_color_down():
lm = LayerMetrics(name="triage", label="layer:triage", current_count=50, previous_count=100)
assert lm.trend_color == "var(--amber)"
def test_layer_metrics_trend_color_stable():
lm = LayerMetrics(name="triage", label="layer:triage", current_count=100, previous_count=100)
assert lm.trend_color == "var(--text-dim)"
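These trend tests pin down the behavior but not the exact cutoff: +30% and −30% produce double arrows while ±8% does not. A standalone sketch consistent with those cases, assuming a ±25% threshold (the real constant is not shown in this diff):

```python
def trend(current: int, previous: int, threshold: float = 0.25) -> str:
    """Double arrows only for changes beyond the threshold; empty otherwise."""
    if previous == 0:
        # No baseline to compare against, including the 5-vs-0 case
        return ""
    change = (current - previous) / previous
    if change >= threshold:
        return "↑↑"
    if change <= -threshold:
        return "↓↓"
    return ""
```

Guarding on `previous == 0` both avoids a ZeroDivisionError and matches the test that a count appearing from nothing shows no trend.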
# ---------------------------------------------------------------------------
# DailyRunMetrics.sessions_trend
# ---------------------------------------------------------------------------
def _make_daily_metrics(**kwargs) -> DailyRunMetrics:
defaults = dict(
sessions_completed=10,
sessions_previous=8,
layers=[],
total_touched_current=20,
total_touched_previous=15,
lookback_days=7,
generated_at=datetime.now(UTC).isoformat(),
)
defaults.update(kwargs)
return DailyRunMetrics(**defaults)
def test_daily_metrics_sessions_trend_big_increase():
m = _make_daily_metrics(sessions_completed=130, sessions_previous=100)
assert m.sessions_trend == "↑↑"
def test_daily_metrics_sessions_trend_stable():
m = _make_daily_metrics(sessions_completed=100, sessions_previous=100)
assert m.sessions_trend == ""
def test_daily_metrics_sessions_trend_no_previous_zero_completed():
m = _make_daily_metrics(sessions_completed=0, sessions_previous=0)
assert m.sessions_trend == ""
def test_daily_metrics_sessions_trend_no_previous_with_completed():
m = _make_daily_metrics(sessions_completed=5, sessions_previous=0)
assert m.sessions_trend == ""
def test_daily_metrics_sessions_trend_color_green():
m = _make_daily_metrics(sessions_completed=200, sessions_previous=100)
assert m.sessions_trend_color == "var(--green)"
def test_daily_metrics_sessions_trend_color_amber():
m = _make_daily_metrics(sessions_completed=50, sessions_previous=100)
assert m.sessions_trend_color == "var(--amber)"
# ---------------------------------------------------------------------------
# _extract_layer
# ---------------------------------------------------------------------------
def test_extract_layer_finds_layer_label():
labels = [{"name": "bug"}, {"name": "layer:triage"}, {"name": "urgent"}]
assert _extract_layer(labels) == "triage"
def test_extract_layer_returns_none_when_no_layer():
labels = [{"name": "bug"}, {"name": "feature"}]
assert _extract_layer(labels) is None
def test_extract_layer_empty_labels():
assert _extract_layer([]) is None
def test_extract_layer_first_match_wins():
labels = [{"name": "layer:micro-fix"}, {"name": "layer:tests"}]
assert _extract_layer(labels) == "micro-fix"
# ---------------------------------------------------------------------------
# _load_cycle_data
# ---------------------------------------------------------------------------
def test_load_cycle_data_missing_file(tmp_path):
with patch("dashboard.routes.daily_run.REPO_ROOT", tmp_path):
result = _load_cycle_data(days=14)
assert result == {"current": 0, "previous": 0}
def test_load_cycle_data_counts_successful_sessions(tmp_path):
retro_dir = tmp_path / ".loop" / "retro"
retro_dir.mkdir(parents=True)
retro_file = retro_dir / "cycles.jsonl"
now = datetime.now(UTC)
recent_ts = (now - timedelta(days=3)).isoformat()
older_ts = (now - timedelta(days=10)).isoformat()
old_ts = (now - timedelta(days=20)).isoformat()
lines = [
json.dumps({"timestamp": recent_ts, "success": True}),
json.dumps({"timestamp": recent_ts, "success": False}), # not counted
json.dumps({"timestamp": older_ts, "success": True}),
json.dumps({"timestamp": old_ts, "success": True}), # outside window
]
retro_file.write_text("\n".join(lines))
with patch("dashboard.routes.daily_run.REPO_ROOT", tmp_path):
result = _load_cycle_data(days=7)
assert result["current"] == 1
assert result["previous"] == 1
def test_load_cycle_data_skips_invalid_json_lines(tmp_path):
retro_dir = tmp_path / ".loop" / "retro"
retro_dir.mkdir(parents=True)
retro_file = retro_dir / "cycles.jsonl"
now = datetime.now(UTC)
recent_ts = (now - timedelta(days=1)).isoformat()
retro_file.write_text(
f'not valid json\n{json.dumps({"timestamp": recent_ts, "success": True})}\n'
)
with patch("dashboard.routes.daily_run.REPO_ROOT", tmp_path):
result = _load_cycle_data(days=7)
assert result["current"] == 1
def test_load_cycle_data_skips_entries_with_no_timestamp(tmp_path):
retro_dir = tmp_path / ".loop" / "retro"
retro_dir.mkdir(parents=True)
retro_file = retro_dir / "cycles.jsonl"
retro_file.write_text(json.dumps({"success": True}))
with patch("dashboard.routes.daily_run.REPO_ROOT", tmp_path):
result = _load_cycle_data(days=7)
assert result == {"current": 0, "previous": 0}
# ---------------------------------------------------------------------------
# _fetch_layer_metrics
# ---------------------------------------------------------------------------
def _make_issue(updated_offset_days: int) -> dict:
ts = (datetime.now(UTC) - timedelta(days=updated_offset_days)).isoformat()
return {"updated_at": ts, "labels": [{"name": "layer:triage"}]}
def test_fetch_layer_metrics_counts_current_and_previous():
client = _make_client()
client._available = True
recent_issue = _make_issue(updated_offset_days=3)
older_issue = _make_issue(updated_offset_days=10)
with patch.object(client, "get_paginated", return_value=[recent_issue, older_issue]):
layers, total_current, total_previous = _fetch_layer_metrics(client, lookback_days=7)
# Should have one entry per LAYER_LABELS
assert len(layers) == len(LAYER_LABELS)
triage = next(lm for lm in layers if lm.name == "triage")
assert triage.current_count == 1
assert triage.previous_count == 1
def test_fetch_layer_metrics_degrades_on_http_error():
client = _make_client()
client._available = True
with patch.object(client, "get_paginated", side_effect=URLError("network")):
layers, total_current, total_previous = _fetch_layer_metrics(client, lookback_days=7)
assert len(layers) == len(LAYER_LABELS)
for lm in layers:
assert lm.current_count == 0
assert lm.previous_count == 0
assert total_current == 0
assert total_previous == 0
# ---------------------------------------------------------------------------
# _get_metrics
# ---------------------------------------------------------------------------
def test_get_metrics_returns_none_when_gitea_unavailable():
with patch("dashboard.routes.daily_run._load_config", return_value=DEFAULT_CONFIG):
with patch("dashboard.routes.daily_run._get_token", return_value=None):
with patch.object(GiteaClient, "is_available", return_value=False):
result = _get_metrics()
assert result is None
def test_get_metrics_returns_daily_run_metrics():
mock_layers = [
LayerMetrics(name="triage", label="layer:triage", current_count=5, previous_count=3)
]
with patch("dashboard.routes.daily_run._load_config", return_value=DEFAULT_CONFIG):
with patch("dashboard.routes.daily_run._get_token", return_value="tok"):
with patch.object(GiteaClient, "is_available", return_value=True):
with patch(
"dashboard.routes.daily_run._fetch_layer_metrics",
return_value=(mock_layers, 5, 3),
):
with patch(
"dashboard.routes.daily_run._load_cycle_data",
return_value={"current": 10, "previous": 8},
):
result = _get_metrics(lookback_days=7)
assert result is not None
assert result.sessions_completed == 10
assert result.sessions_previous == 8
assert result.lookback_days == 7
assert result.layers == mock_layers
def test_get_metrics_returns_none_on_exception():
with patch("dashboard.routes.daily_run._load_config", return_value=DEFAULT_CONFIG):
with patch("dashboard.routes.daily_run._get_token", return_value="tok"):
with patch.object(GiteaClient, "is_available", return_value=True):
with patch(
"dashboard.routes.daily_run._fetch_layer_metrics",
side_effect=Exception("unexpected"),
):
result = _get_metrics()
assert result is None
# ---------------------------------------------------------------------------
# Route handlers (FastAPI)
# ---------------------------------------------------------------------------
def test_daily_run_metrics_api_unavailable(client):
with patch("dashboard.routes.daily_run._get_metrics", return_value=None):
resp = client.get("/daily-run/metrics")
assert resp.status_code == 503
data = resp.json()
assert data["status"] == "unavailable"
def test_daily_run_metrics_api_returns_json(client):
mock_metrics = _make_daily_metrics(
layers=[
LayerMetrics(name="triage", label="layer:triage", current_count=3, previous_count=2)
]
)
with patch("dashboard.routes.daily_run._get_metrics", return_value=mock_metrics):
with patch(
"dashboard.routes.quests.check_daily_run_quests",
return_value=[],
create=True,
):
resp = client.get("/daily-run/metrics?lookback_days=7")
assert resp.status_code == 200
data = resp.json()
assert data["status"] == "ok"
assert data["lookback_days"] == 7
assert "sessions" in data
assert "layers" in data
assert "totals" in data
assert len(data["layers"]) == 1
assert data["layers"][0]["name"] == "triage"
def test_daily_run_panel_returns_html(client):
mock_metrics = _make_daily_metrics()
with patch("dashboard.routes.daily_run._get_metrics", return_value=mock_metrics):
with patch("dashboard.routes.daily_run._load_config", return_value=DEFAULT_CONFIG):
resp = client.get("/daily-run/panel")
assert resp.status_code == 200
assert "text/html" in resp.headers["content-type"]
def test_daily_run_panel_when_unavailable(client):
with patch("dashboard.routes.daily_run._get_metrics", return_value=None):
with patch("dashboard.routes.daily_run._load_config", return_value=DEFAULT_CONFIG):
resp = client.get("/daily-run/panel")
assert resp.status_code == 200

View File

@@ -11,10 +11,13 @@ PROD_PROJECT_DIR = Path("/home/ubuntu/prod-sovereign-stack")
PROD_VAULT_DIR = PROD_PROJECT_DIR / "TimmyVault"
SETUP_SCRIPT_PATH = Path("/home/ubuntu/setup_timmy.sh")
-pytestmark = pytest.mark.skipif(
-    not SETUP_SCRIPT_PATH.exists(),
-    reason=f"Setup script not found at {SETUP_SCRIPT_PATH}",
-)
+pytestmark = [
+    pytest.mark.skip_ci,
+    pytest.mark.skipif(
+        not SETUP_SCRIPT_PATH.exists(),
+        reason=f"Setup script not found at {SETUP_SCRIPT_PATH}",
+    ),
+]
@pytest.fixture(scope="module", autouse=True)

View File

@@ -0,0 +1,439 @@
"""Tests for the three-tier metabolic LLM router (issue #966)."""
from unittest.mock import AsyncMock, MagicMock
import pytest
from infrastructure.router.metabolic import (
DEFAULT_TIER_MODELS,
MetabolicRouter,
ModelTier,
build_prompt,
classify_complexity,
get_metabolic_router,
)
pytestmark = pytest.mark.unit
# ── classify_complexity ──────────────────────────────────────────────────────
class TestClassifyComplexity:
"""Verify tier classification for representative task / state pairs."""
# ── T1: Routine ─────────────────────────────────────────────────────────
def test_simple_navigation_is_t1(self):
assert classify_complexity("go north", {}) == ModelTier.T1_ROUTINE
def test_single_action_is_t1(self):
assert classify_complexity("open door", {}) == ModelTier.T1_ROUTINE
def test_t1_with_extra_words_stays_t1(self):
# Five words (within the 6-word T1 limit), all T1 territory, no active context
assert classify_complexity("go south and take it", {}) == ModelTier.T1_ROUTINE
def test_t1_long_task_upgrades_to_t2(self):
# More than 6 words → not T1 even with nav words
assert (
classify_complexity("go north and then move east and pick up the sword", {})
!= ModelTier.T1_ROUTINE
)
def test_active_quest_upgrades_t1_to_t2(self):
state = {"active_quests": ["Rescue the Mage"]}
assert classify_complexity("go north", state) == ModelTier.T2_MEDIUM
def test_dialogue_active_upgrades_t1_to_t2(self):
state = {"dialogue_active": True}
assert classify_complexity("yes", state) == ModelTier.T2_MEDIUM
def test_combat_active_upgrades_t1_to_t2(self):
state = {"combat_active": True}
assert classify_complexity("attack", state) == ModelTier.T2_MEDIUM
# ── T2: Medium ──────────────────────────────────────────────────────────
def test_default_is_t2(self):
assert classify_complexity("what do I have in my inventory", {}) == ModelTier.T2_MEDIUM
def test_dialogue_response_is_t2(self):
state = {"dialogue_active": True, "dialogue_npc": "Caius Cosades"}
result = classify_complexity("I'm looking for Caius Cosades", state)
assert result == ModelTier.T2_MEDIUM
# ── T3: Complex ─────────────────────────────────────────────────────────
def test_quest_planning_is_t3(self):
assert classify_complexity("plan my quest route", {}) == ModelTier.T3_COMPLEX
def test_strategy_keyword_is_t3(self):
assert classify_complexity("what is the best strategy", {}) == ModelTier.T3_COMPLEX
def test_stuck_keyword_is_t3(self):
assert classify_complexity("I am stuck", {}) == ModelTier.T3_COMPLEX
def test_stuck_state_is_t3(self):
assert classify_complexity("help me", {"stuck": True}) == ModelTier.T3_COMPLEX
def test_require_t3_flag_forces_t3(self):
state = {"require_t3": True}
assert classify_complexity("go north", state) == ModelTier.T3_COMPLEX
def test_optimize_keyword_is_t3(self):
assert classify_complexity("optimize my skill build", {}) == ModelTier.T3_COMPLEX
def test_multi_word_t3_phrase(self):
assert classify_complexity("how do i get past the guards", {}) == ModelTier.T3_COMPLEX
def test_case_insensitive(self):
assert classify_complexity("PLAN my route", {}) == ModelTier.T3_COMPLEX
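Taken together, the tests above fix the classification order: explicit T3 signals win, then short routine commands map to T1 unless active context bumps them to T2, and everything else defaults to T2. A sketch consistent with those cases — the keyword lists and the `Tier` enum here are illustrative assumptions, not the module's actual vocabulary:

```python
from enum import Enum


class Tier(Enum):
    T1_ROUTINE = "t1"
    T2_MEDIUM = "t2"
    T3_COMPLEX = "t3"


# Illustrative marker lists; the real classifier's vocabulary may differ.
T3_MARKERS = ("plan", "strategy", "stuck", "optimize", "get past")
T1_MARKERS = ("go", "open", "take", "attack", "yes", "no", "move")


def classify(task: str, state: dict) -> Tier:
    text = task.lower()  # classification is case-insensitive
    # Complex: explicit flag, stuck state, or planning/strategy vocabulary.
    if state.get("require_t3") or state.get("stuck"):
        return Tier.T3_COMPLEX
    if any(marker in text for marker in T3_MARKERS):
        return Tier.T3_COMPLEX
    # Routine: short commands (<= 6 words) built from routine verbs.
    words = text.split()
    if len(words) <= 6 and any(w in T1_MARKERS for w in words):
        # Active context (quest, dialogue, combat) bumps routine tasks to T2.
        if (state.get("active_quests") or state.get("dialogue_active")
                or state.get("combat_active")):
            return Tier.T2_MEDIUM
        return Tier.T1_ROUTINE
    return Tier.T2_MEDIUM
```

Checking T3 before T1 is what makes `require_t3` override even a plain "go north".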
# ── build_prompt ─────────────────────────────────────────────────────────────
class TestBuildPrompt:
"""Verify prompt structure and content assembly."""
def test_returns_two_messages(self):
msgs = build_prompt({}, {}, "go north")
assert len(msgs) == 2
assert msgs[0]["role"] == "system"
assert msgs[1]["role"] == "user"
def test_user_message_contains_task(self):
msgs = build_prompt({}, {}, "pick up the sword")
assert msgs[1]["content"] == "pick up the sword"
def test_location_in_system(self):
msgs = build_prompt({"location": "Balmora"}, {}, "look around")
assert "Balmora" in msgs[0]["content"]
def test_health_in_system(self):
msgs = build_prompt({"health": 42}, {}, "rest")
assert "42" in msgs[0]["content"]
def test_inventory_in_system(self):
msgs = build_prompt({"inventory": ["iron sword", "bread"]}, {}, "use item")
assert "iron sword" in msgs[0]["content"]
def test_inventory_truncated_to_10(self):
inventory = [f"item{i}" for i in range(20)]
msgs = build_prompt({"inventory": inventory}, {}, "check")
# Only first 10 should appear in the system message
assert "item10" not in msgs[0]["content"]
def test_active_quests_in_system(self):
msgs = build_prompt({"active_quests": ["Morrowind Main Quest"]}, {}, "help")
assert "Morrowind Main Quest" in msgs[0]["content"]
def test_stuck_indicator_in_system(self):
msgs = build_prompt({"stuck": True}, {}, "what now")
assert "STUCK" in msgs[0]["content"]
def test_dialogue_npc_in_system(self):
msgs = build_prompt({}, {"dialogue_active": True, "dialogue_npc": "Vivec"}, "hello")
assert "Vivec" in msgs[0]["content"]
def test_menu_open_in_system(self):
msgs = build_prompt({}, {"menu_open": "inventory"}, "check items")
assert "inventory" in msgs[0]["content"]
def test_combat_active_in_system(self):
msgs = build_prompt({}, {"combat_active": True}, "attack")
assert "COMBAT" in msgs[0]["content"]
def test_visual_context_in_system(self):
msgs = build_prompt({}, {}, "where am I", visual_context="A dark dungeon corridor")
assert "dungeon corridor" in msgs[0]["content"]
def test_missing_optional_fields_omitted(self):
msgs = build_prompt({}, {}, "move forward")
system = msgs[0]["content"]
assert "Health:" not in system
assert "Inventory:" not in system
assert "Active quests:" not in system
def test_inventory_dict_items(self):
inventory = [{"name": "silver dagger"}, {"name": "potion"}]
msgs = build_prompt({"inventory": inventory}, {}, "use")
assert "silver dagger" in msgs[0]["content"]
def test_quest_dict_items(self):
quests = [{"name": "The Warlord"}, {"name": "Lost in Translation"}]
msgs = build_prompt({"active_quests": quests}, {}, "help")
assert "The Warlord" in msgs[0]["content"]
# ── MetabolicRouter ──────────────────────────────────────────────────────────
@pytest.mark.asyncio
class TestMetabolicRouter:
"""Test MetabolicRouter routing, tier labelling, and T3 world-pause logic."""
def _make_router(self, mock_cascade=None):
"""Create a MetabolicRouter with a mocked CascadeRouter."""
if mock_cascade is None:
mock_cascade = MagicMock()
mock_cascade.complete = AsyncMock(
return_value={
"content": "Move north confirmed.",
"provider": "ollama-local",
"model": "qwen3:8b",
"latency_ms": 120.0,
}
)
return MetabolicRouter(cascade=mock_cascade)
async def test_route_returns_tier_in_result(self):
router = self._make_router()
result = await router.route("go north", state={})
assert "tier" in result
assert result["tier"] == ModelTier.T1_ROUTINE
async def test_t1_uses_t1_model(self):
mock_cascade = MagicMock()
mock_cascade.complete = AsyncMock(
return_value={
"content": "ok",
"provider": "ollama-local",
"model": "qwen3:8b",
"latency_ms": 100,
}
)
router = MetabolicRouter(cascade=mock_cascade)
await router.route("go north", state={})
call_kwargs = mock_cascade.complete.call_args
assert call_kwargs.kwargs["model"] == DEFAULT_TIER_MODELS[ModelTier.T1_ROUTINE]
async def test_t2_uses_t2_model(self):
mock_cascade = MagicMock()
mock_cascade.complete = AsyncMock(
return_value={
"content": "ok",
"provider": "ollama-local",
"model": "qwen3:14b",
"latency_ms": 300,
}
)
router = MetabolicRouter(cascade=mock_cascade)
await router.route("what should I say to the innkeeper", state={})
call_kwargs = mock_cascade.complete.call_args
assert call_kwargs.kwargs["model"] == DEFAULT_TIER_MODELS[ModelTier.T2_MEDIUM]
async def test_t3_uses_t3_model(self):
mock_cascade = MagicMock()
mock_cascade.complete = AsyncMock(
return_value={
"content": "ok",
"provider": "ollama-local",
"model": "qwen3:30b",
"latency_ms": 2000,
}
)
router = MetabolicRouter(cascade=mock_cascade)
await router.route("plan the optimal quest route", state={})
call_kwargs = mock_cascade.complete.call_args
assert call_kwargs.kwargs["model"] == DEFAULT_TIER_MODELS[ModelTier.T3_COMPLEX]
async def test_custom_tier_models_respected(self):
mock_cascade = MagicMock()
mock_cascade.complete = AsyncMock(
return_value={
"content": "ok",
"provider": "test",
"model": "custom-8b",
"latency_ms": 100,
}
)
custom = {ModelTier.T1_ROUTINE: "custom-8b"}
router = MetabolicRouter(cascade=mock_cascade, tier_models=custom)
await router.route("go north", state={})
call_kwargs = mock_cascade.complete.call_args
assert call_kwargs.kwargs["model"] == "custom-8b"
async def test_t3_pauses_world_before_inference(self):
mock_cascade = MagicMock()
mock_cascade.complete = AsyncMock(
return_value={
"content": "ok",
"provider": "ollama",
"model": "qwen3:30b",
"latency_ms": 1500,
}
)
router = MetabolicRouter(cascade=mock_cascade)
pause_calls = []
unpause_calls = []
mock_world = MagicMock()
def track_act(cmd):
if cmd.action == "pause":
pause_calls.append(cmd)
elif cmd.action == "unpause":
unpause_calls.append(cmd)
mock_world.act = track_act
router.set_world(mock_world)
await router.route("plan the quest", state={})
assert len(pause_calls) == 1, "world.pause() should be called once for T3"
assert len(unpause_calls) == 1, "world.unpause() should be called once for T3"
async def test_t3_unpauses_world_even_on_llm_error(self):
"""world.unpause() must be called even when the LLM raises."""
mock_cascade = MagicMock()
mock_cascade.complete = AsyncMock(side_effect=RuntimeError("LLM failed"))
router = MetabolicRouter(cascade=mock_cascade)
unpause_calls = []
mock_world = MagicMock()
mock_world.act = lambda cmd: unpause_calls.append(cmd) if cmd.action == "unpause" else None
router.set_world(mock_world)
with pytest.raises(RuntimeError, match="LLM failed"):
await router.route("plan the quest", state={})
assert len(unpause_calls) == 1, "world.unpause() must run even when LLM errors"
async def test_t1_does_not_pause_world(self):
mock_cascade = MagicMock()
mock_cascade.complete = AsyncMock(
return_value={
"content": "ok",
"provider": "ollama",
"model": "qwen3:8b",
"latency_ms": 120,
}
)
router = MetabolicRouter(cascade=mock_cascade)
pause_calls = []
mock_world = MagicMock()
mock_world.act = lambda cmd: pause_calls.append(cmd)
router.set_world(mock_world)
await router.route("go north", state={})
assert len(pause_calls) == 0, "world.pause() must NOT be called for T1"
async def test_t2_does_not_pause_world(self):
mock_cascade = MagicMock()
mock_cascade.complete = AsyncMock(
return_value={
"content": "ok",
"provider": "ollama",
"model": "qwen3:14b",
"latency_ms": 350,
}
)
router = MetabolicRouter(cascade=mock_cascade)
pause_calls = []
mock_world = MagicMock()
mock_world.act = lambda cmd: pause_calls.append(cmd)
router.set_world(mock_world)
await router.route("talk to the merchant", state={})
assert len(pause_calls) == 0, "world.pause() must NOT be called for T2"
async def test_broken_world_adapter_degrades_gracefully(self):
"""If world.act() raises, inference must still complete."""
mock_cascade = MagicMock()
mock_cascade.complete = AsyncMock(
return_value={
"content": "done",
"provider": "ollama",
"model": "qwen3:30b",
"latency_ms": 2000,
}
)
router = MetabolicRouter(cascade=mock_cascade)
mock_world = MagicMock()
mock_world.act = MagicMock(side_effect=RuntimeError("world broken"))
router.set_world(mock_world)
# Should not raise — degradation only logs a warning
result = await router.route("plan the quest", state={})
assert result["content"] == "done"
async def test_no_world_adapter_t3_still_works(self):
mock_cascade = MagicMock()
mock_cascade.complete = AsyncMock(
return_value={
"content": "plan done",
"provider": "ollama",
"model": "qwen3:30b",
"latency_ms": 2000,
}
)
router = MetabolicRouter(cascade=mock_cascade)
# No set_world() called
result = await router.route("plan the quest route", state={})
assert result["content"] == "plan done"
assert result["tier"] == ModelTier.T3_COMPLEX
async def test_classify_delegates_to_module_function(self):
router = MetabolicRouter(cascade=MagicMock())
assert router.classify("go north", {}) == classify_complexity("go north", {})
assert router.classify("plan the quest", {}) == classify_complexity("plan the quest", {})
async def test_ui_state_defaults_to_empty_dict(self):
"""Calling route without ui_state should not raise."""
mock_cascade = MagicMock()
mock_cascade.complete = AsyncMock(
return_value={
"content": "ok",
"provider": "ollama",
"model": "qwen3:8b",
"latency_ms": 100,
}
)
router = MetabolicRouter(cascade=mock_cascade)
# No ui_state argument
result = await router.route("go north", state={})
assert result["content"] == "ok"
async def test_temperature_and_max_tokens_forwarded(self):
mock_cascade = MagicMock()
mock_cascade.complete = AsyncMock(
return_value={
"content": "ok",
"provider": "ollama",
"model": "qwen3:14b",
"latency_ms": 200,
}
)
router = MetabolicRouter(cascade=mock_cascade)
await router.route("describe the scene", state={}, temperature=0.1, max_tokens=50)
call_kwargs = mock_cascade.complete.call_args.kwargs
assert call_kwargs["temperature"] == 0.1
assert call_kwargs["max_tokens"] == 50
class TestGetMetabolicRouter:
"""Test module-level singleton."""
def test_returns_metabolic_router_instance(self):
import infrastructure.router.metabolic as m_module
# Reset singleton for clean test
m_module._metabolic_router = None
router = get_metabolic_router()
assert isinstance(router, MetabolicRouter)
def test_singleton_returns_same_instance(self):
import infrastructure.router.metabolic as m_module
m_module._metabolic_router = None
r1 = get_metabolic_router()
r2 = get_metabolic_router()
assert r1 is r2
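The world-pause tests above encode two invariants: `unpause` must run even when the LLM call raises, and a broken world adapter must not block inference. Both fall out of a `try/finally` around the T3 call with the adapter call itself wrapped defensively. A minimal sketch of that shape — the `route_t3` helper and command object are assumptions, not the router's actual API:

```python
import asyncio
from types import SimpleNamespace


async def route_t3(cascade_complete, world=None):
    """Pause the world around a slow T3 call; always unpause, even on error."""
    def world_act(action: str) -> None:
        if world is None:
            return  # no adapter registered: T3 still works, just without pausing
        try:
            world.act(SimpleNamespace(action=action))
        except Exception:
            pass  # a broken world adapter degrades gracefully; inference continues

    world_act("pause")
    try:
        return await cascade_complete()
    finally:
        world_act("unpause")  # runs on success AND when the LLM raises
```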

View File

@@ -2,7 +2,7 @@
import time
from pathlib import Path
-from unittest.mock import AsyncMock, patch
+from unittest.mock import AsyncMock, MagicMock, patch
import pytest
import yaml
@@ -10,13 +10,16 @@ import yaml
from infrastructure.router.cascade import (
CascadeRouter,
CircuitState,
ContentType,
Provider,
ProviderMetrics,
ProviderStatus,
RouterConfig,
get_router,
)
@pytest.mark.unit
class TestProviderMetrics:
"""Test provider metrics tracking."""
@@ -45,6 +48,7 @@ class TestProviderMetrics:
assert metrics.error_rate == 0.3
@pytest.mark.unit
class TestProvider:
"""Test Provider dataclass."""
@@ -88,6 +92,7 @@ class TestProvider:
assert provider.get_default_model() is None
@pytest.mark.unit
class TestRouterConfig:
"""Test router configuration."""
@@ -100,6 +105,7 @@ class TestRouterConfig:
assert config.circuit_breaker_failure_threshold == 5
@pytest.mark.unit
class TestCascadeRouterInit:
"""Test CascadeRouter initialization."""
@@ -158,6 +164,7 @@ class TestCascadeRouterInit:
assert router.providers[0].api_key == "secret123"
@pytest.mark.unit
class TestCascadeRouterMetrics:
"""Test metrics tracking."""
@@ -241,6 +248,7 @@ class TestCascadeRouterMetrics:
assert provider.status == ProviderStatus.HEALTHY
@pytest.mark.unit
class TestCascadeRouterGetMetrics:
"""Test get_metrics method."""
@@ -280,6 +288,7 @@ class TestCascadeRouterGetMetrics:
assert p_metrics["metrics"]["avg_latency_ms"] == 200.0
@pytest.mark.unit
class TestCascadeRouterGetStatus:
"""Test get_status method."""
@@ -305,6 +314,7 @@ class TestCascadeRouterGetStatus:
assert len(status["providers"]) == 1
@pytest.mark.unit
@pytest.mark.asyncio
class TestCascadeRouterComplete:
"""Test complete method with failover."""
@@ -436,6 +446,7 @@ class TestCascadeRouterComplete:
assert result["provider"] == "healthy"
@pytest.mark.unit
class TestProviderAvailabilityCheck:
"""Test provider availability checking."""
@@ -512,7 +523,7 @@ class TestProviderAvailabilityCheck:
def test_check_vllm_mlx_server_healthy(self):
"""Test vllm-mlx when health check succeeds."""
-from unittest.mock import MagicMock, patch
+from unittest.mock import patch
router = CascadeRouter(config_path=Path("/nonexistent"))
@@ -556,7 +567,7 @@ class TestProviderAvailabilityCheck:
def test_check_vllm_mlx_default_url(self):
"""Test vllm-mlx uses default localhost:8000 when no URL configured."""
-from unittest.mock import MagicMock, patch
+from unittest.mock import patch
router = CascadeRouter(config_path=Path("/nonexistent"))
@@ -577,6 +588,7 @@ class TestProviderAvailabilityCheck:
mock_requests.get.assert_called_once_with("http://localhost:8000/health", timeout=5)
@pytest.mark.unit
@pytest.mark.asyncio
class TestVllmMlxProvider:
"""Test vllm-mlx provider integration."""
@@ -611,7 +623,7 @@ class TestVllmMlxProvider:
async def test_vllm_mlx_base_url_normalization(self):
"""Test _call_vllm_mlx appends /v1 when missing."""
-from unittest.mock import AsyncMock, MagicMock, patch
+from unittest.mock import AsyncMock, patch
router = CascadeRouter(config_path=Path("/nonexistent"))
@@ -681,6 +693,8 @@ class TestVllmMlxProvider:
assert result["content"] == "Local MLX response"
@pytest.mark.unit
@pytest.mark.asyncio
class TestMetabolicProtocol:
"""Test metabolic protocol: cloud providers skip when quota is ACTIVE/RESTING."""
@@ -790,6 +804,7 @@ class TestMetabolicProtocol:
assert result["content"] == "Cloud response"
@pytest.mark.unit
class TestCascadeRouterReload:
"""Test hot-reload of providers.yaml."""
@@ -968,3 +983,396 @@ class TestCascadeRouterReload:
assert router.providers[0].name == "low-priority"
assert router.providers[1].name == "high-priority"
@pytest.mark.unit
class TestContentTypeDetection:
"""Test _detect_content_type logic."""
def _router(self) -> CascadeRouter:
return CascadeRouter(config_path=Path("/nonexistent"))
def test_text_only(self):
router = self._router()
msgs = [{"role": "user", "content": "Hello"}]
assert router._detect_content_type(msgs) == ContentType.TEXT
def test_images_key_triggers_vision(self):
router = self._router()
msgs = [{"role": "user", "content": "Describe this", "images": ["pic.jpg"]}]
assert router._detect_content_type(msgs) == ContentType.VISION
def test_image_extension_in_content_triggers_vision(self):
router = self._router()
msgs = [{"role": "user", "content": "Look at photo.png please"}]
assert router._detect_content_type(msgs) == ContentType.VISION
def test_base64_data_uri_triggers_vision(self):
router = self._router()
msgs = [{"role": "user", "content": "data:image/jpeg;base64,/9j/4AA..."}]
assert router._detect_content_type(msgs) == ContentType.VISION
def test_audio_key_triggers_audio(self):
router = self._router()
msgs = [{"role": "user", "content": "", "audio": b"bytes"}]
assert router._detect_content_type(msgs) == ContentType.AUDIO
def test_image_and_audio_triggers_multimodal(self):
router = self._router()
msgs = [
{"role": "user", "content": "check photo.jpg", "audio": b"bytes"},
]
assert router._detect_content_type(msgs) == ContentType.MULTIMODAL
def test_list_content_image_url_type(self):
router = self._router()
msgs = [
{
"role": "user",
"content": [
{"type": "text", "text": "What?"},
{"type": "image_url", "image_url": {"url": "http://example.com/a.jpg"}},
],
}
]
assert router._detect_content_type(msgs) == ContentType.VISION
def test_list_content_audio_type(self):
router = self._router()
msgs = [
{
"role": "user",
"content": [
{"type": "audio", "data": "base64..."},
],
}
]
assert router._detect_content_type(msgs) == ContentType.AUDIO
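The detection tests above reduce to two flags swept across every message: image evidence (an `images` key, an image extension or `data:image/` URI in string content, or an `image_url` part in list content) and audio evidence (an `audio` key or an `audio` part). Both flags set yields MULTIMODAL. A standalone sketch of that logic — the hint list is an assumption:

```python
from enum import Enum


class ContentType(Enum):
    TEXT = "text"
    VISION = "vision"
    AUDIO = "audio"
    MULTIMODAL = "multimodal"


# Illustrative hints; the real detector may recognise more extensions.
IMAGE_HINTS = (".jpg", ".jpeg", ".png", "data:image/")


def detect_content_type(messages: list[dict]) -> ContentType:
    has_image = has_audio = False
    for msg in messages:
        if msg.get("images"):
            has_image = True
        if msg.get("audio"):
            has_audio = True
        content = msg.get("content", "")
        if isinstance(content, str):
            lowered = content.lower()
            if any(hint in lowered for hint in IMAGE_HINTS):
                has_image = True
        elif isinstance(content, list):
            # OpenAI-style structured content parts.
            for part in content:
                if part.get("type") == "image_url":
                    has_image = True
                elif part.get("type") == "audio":
                    has_audio = True
    if has_image and has_audio:
        return ContentType.MULTIMODAL
    if has_image:
        return ContentType.VISION
    if has_audio:
        return ContentType.AUDIO
    return ContentType.TEXT
```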
@pytest.mark.unit
class TestTransformMessagesForOllama:
"""Test _transform_messages_for_ollama."""
def _router(self) -> CascadeRouter:
return CascadeRouter(config_path=Path("/nonexistent"))
def test_plain_text_message(self):
router = self._router()
result = router._transform_messages_for_ollama([{"role": "user", "content": "Hello"}])
assert result == [{"role": "user", "content": "Hello"}]
def test_base64_image_stripped(self):
router = self._router()
msgs = [
{
"role": "user",
"content": "Describe",
"images": ["data:image/png;base64,abc123"],
}
]
result = router._transform_messages_for_ollama(msgs)
assert result[0]["images"] == ["abc123"]
def test_http_url_skipped(self):
router = self._router()
msgs = [
{
"role": "user",
"content": "Describe",
"images": ["http://example.com/img.jpg"],
}
]
result = router._transform_messages_for_ollama(msgs)
# URL is skipped — images list should be empty or absent
assert result[0].get("images", []) == []
def test_missing_local_file_skipped(self):
router = self._router()
msgs = [
{
"role": "user",
"content": "Describe",
"images": ["/nonexistent/path/image.png"],
}
]
result = router._transform_messages_for_ollama(msgs)
assert result[0].get("images", []) == []
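These transform tests pin down three image-handling rules: data-URIs keep only their base64 payload, HTTP(S) URLs are dropped, and missing local files are dropped. A sketch of a transform obeying them — the real implementation presumably also base64-encodes existing local files, which is elided here as an assumption:

```python
from pathlib import Path


def transform_images_for_ollama(messages: list[dict]) -> list[dict]:
    """Ollama expects raw base64 strings: strip data-URI prefixes, skip URLs
    and missing files. Messages without images pass through unchanged."""
    out = []
    for msg in messages:
        msg = dict(msg)  # shallow copy; do not mutate the caller's messages
        images = []
        for img in msg.get("images", []):
            if img.startswith("data:") and "base64," in img:
                # Keep only the payload after the data-URI header.
                images.append(img.split("base64,", 1)[1])
            elif img.startswith(("http://", "https://")):
                continue  # remote URLs are skipped
            elif Path(img).exists():
                images.append(img)  # a real impl would read + base64-encode here
            # nonexistent local paths are silently dropped
        if "images" in msg:
            msg["images"] = images
        out.append(msg)
    return out
```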
@pytest.mark.unit
class TestProviderCapabilityMethods:
"""Test Provider.get_model_with_capability and model_has_capability."""
def _provider(self) -> Provider:
return Provider(
name="test",
type="ollama",
enabled=True,
priority=1,
models=[
{"name": "llava:7b", "capabilities": ["vision"]},
{"name": "llama3.2", "default": True},
],
)
def test_get_model_with_capability_found(self):
p = self._provider()
assert p.get_model_with_capability("vision") == "llava:7b"
def test_get_model_with_capability_falls_back_to_default(self):
p = self._provider()
assert p.get_model_with_capability("audio") == "llama3.2"
def test_model_has_capability_true(self):
p = self._provider()
assert p.model_has_capability("llava:7b", "vision") is True
def test_model_has_capability_false(self):
p = self._provider()
assert p.model_has_capability("llama3.2", "vision") is False
def test_model_has_capability_unknown_model(self):
p = self._provider()
assert p.model_has_capability("unknown-model", "vision") is False
@pytest.mark.unit
class TestGetFallbackModel:
"""Test _get_fallback_model."""
def _router_with_provider(self) -> tuple[CascadeRouter, Provider]:
router = CascadeRouter(config_path=Path("/nonexistent"))
provider = Provider(
name="test",
type="ollama",
enabled=True,
priority=1,
models=[
{"name": "llava:7b", "capabilities": ["vision"]},
{"name": "llama3.2", "default": True},
],
)
return router, provider
def test_returns_vision_model(self):
router, provider = self._router_with_provider()
result = router._get_fallback_model(provider, "llama3.2", ContentType.VISION)
assert result == "llava:7b"
def test_returns_none_if_no_capability(self):
router, provider = self._router_with_provider()
result = router._get_fallback_model(provider, "llama3.2", ContentType.AUDIO)
# No audio model; falls back to default which is same as original
assert result is None or result == "llama3.2"
def test_text_content_returns_none(self):
router, provider = self._router_with_provider()
result = router._get_fallback_model(provider, "llama3.2", ContentType.TEXT)
assert result is None
@pytest.mark.unit
@pytest.mark.asyncio
class TestCascadeTierFiltering:
"""Test cascade_tier parameter in complete()."""
def _make_router(self) -> CascadeRouter:
router = CascadeRouter(config_path=Path("/nonexistent"))
router.providers = [
Provider(
name="anthropic-primary",
type="anthropic",
enabled=True,
priority=1,
api_key="test-key",
models=[{"name": "claude-sonnet-4-6", "default": True}],
),
Provider(
name="ollama-local",
type="ollama",
enabled=True,
priority=2,
models=[{"name": "llama3.2", "default": True}],
),
]
return router
async def test_frontier_required_uses_anthropic(self):
router = self._make_router()
with patch("infrastructure.router.cascade._quota_monitor", None):
with patch.object(router, "_call_anthropic") as mock_call:
mock_call.return_value = {
"content": "frontier response",
"model": "claude-sonnet-4-6",
}
result = await router.complete(
messages=[{"role": "user", "content": "hi"}],
cascade_tier="frontier_required",
)
assert result["provider"] == "anthropic-primary"
mock_call.assert_called_once()
async def test_frontier_required_no_anthropic_raises(self):
router = CascadeRouter(config_path=Path("/nonexistent"))
router.providers = [
Provider(
name="ollama-local",
type="ollama",
enabled=True,
priority=1,
models=[{"name": "llama3.2", "default": True}],
)
]
with pytest.raises(RuntimeError, match="No Anthropic provider configured"):
await router.complete(
messages=[{"role": "user", "content": "hi"}],
cascade_tier="frontier_required",
)
async def test_unknown_tier_raises(self):
router = self._make_router()
with pytest.raises(RuntimeError, match="No providers found for tier"):
await router.complete(
messages=[{"role": "user", "content": "hi"}],
cascade_tier="nonexistent_tier",
)
async def test_tier_filter_only_matching_providers(self):
router = CascadeRouter(config_path=Path("/nonexistent"))
router.providers = [
Provider(
name="local-primary",
type="ollama",
enabled=True,
priority=1,
tier="local",
models=[{"name": "llama3.2", "default": True}],
),
Provider(
name="cloud-secondary",
type="anthropic",
enabled=True,
priority=2,
tier="cloud",
api_key="key",
models=[{"name": "claude-sonnet-4-6", "default": True}],
),
]
with patch.object(router, "_call_ollama") as mock_call:
mock_call.return_value = {"content": "local response", "model": "llama3.2"}
result = await router.complete(
messages=[{"role": "user", "content": "hi"}],
cascade_tier="local",
)
assert result["provider"] == "local-primary"
mock_call.assert_called_once()
@pytest.mark.unit
@pytest.mark.asyncio
class TestGenerateWithImage:
"""Test generate_with_image convenience method."""
async def test_delegates_to_complete(self):
router = CascadeRouter(config_path=Path("/nonexistent"))
router.providers = [
Provider(
name="ollama-vision",
type="ollama",
enabled=True,
priority=1,
models=[{"name": "llava:7b", "capabilities": ["vision"], "default": True}],
)
]
with patch.object(router, "_call_ollama") as mock_call:
mock_call.return_value = {"content": "A cat", "model": "llava:7b"}
result = await router.generate_with_image(
prompt="What is this?",
image_path="/tmp/cat.jpg",
model="llava:7b",
)
assert result["content"] == "A cat"
assert result["provider"] == "ollama-vision"
# complete() should have been called with images in messages
call_kwargs = mock_call.call_args
messages_passed = call_kwargs.kwargs.get("messages") or call_kwargs[1].get("messages")
assert messages_passed[0]["images"] == ["/tmp/cat.jpg"]
@pytest.mark.unit
class TestGetRouterSingleton:
"""Test get_router() returns a singleton and creates CascadeRouter."""
def test_get_router_returns_cascade_router(self):
import infrastructure.router.cascade as cascade_module
# Reset singleton to test creation
original = cascade_module.cascade_router
cascade_module.cascade_router = None
try:
router = get_router()
assert isinstance(router, CascadeRouter)
finally:
cascade_module.cascade_router = original
def test_get_router_returns_same_instance(self):
import infrastructure.router.cascade as cascade_module
original = cascade_module.cascade_router
cascade_module.cascade_router = None
try:
r1 = get_router()
r2 = get_router()
assert r1 is r2
finally:
cascade_module.cascade_router = original
@pytest.mark.unit
class TestIsProviderAvailable:
"""Test _is_provider_available with circuit breaker transitions."""
def _router(self) -> CascadeRouter:
return CascadeRouter(config_path=Path("/nonexistent"))
def test_disabled_provider_not_available(self):
router = self._router()
provider = Provider(name="p", type="ollama", enabled=False, priority=1)
assert router._is_provider_available(provider) is False
def test_healthy_provider_available(self):
router = self._router()
provider = Provider(name="p", type="ollama", enabled=True, priority=1)
assert router._is_provider_available(provider) is True
def test_unhealthy_open_circuit_not_available(self):
router = self._router()
provider = Provider(
name="p",
type="ollama",
enabled=True,
priority=1,
status=ProviderStatus.UNHEALTHY,
circuit_state=CircuitState.OPEN,
circuit_opened_at=time.time(), # Just opened — not yet recoverable
)
assert router._is_provider_available(provider) is False
def test_unhealthy_after_timeout_transitions_to_half_open(self):
router = self._router()
router.config.circuit_breaker_recovery_timeout = 0
provider = Provider(
name="p",
type="ollama",
enabled=True,
priority=1,
status=ProviderStatus.UNHEALTHY,
circuit_state=CircuitState.OPEN,
circuit_opened_at=time.time() - 10, # Long ago
)
result = router._is_provider_available(provider)
assert result is True
assert provider.circuit_state == CircuitState.HALF_OPEN
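The availability check exercised above can be sketched as a small, self-contained model of the circuit-breaker logic the tests imply. Names such as `FakeProvider` and `recovery_timeout` are assumptions for illustration, not the project's actual API:

```python
import time
from dataclasses import dataclass

@dataclass
class FakeProvider:
    """Hypothetical stand-in for the Provider the tests construct."""
    enabled: bool = True
    circuit_state: str = "closed"   # "closed" | "open" | "half_open"
    circuit_opened_at: float = 0.0

def is_provider_available(provider: FakeProvider, recovery_timeout: float = 30.0) -> bool:
    """Disabled providers are never available; an OPEN circuit becomes
    HALF_OPEN (and available again) once the recovery timeout elapses."""
    if not provider.enabled:
        return False
    if provider.circuit_state == "open":
        if time.time() - provider.circuit_opened_at < recovery_timeout:
            return False
        provider.circuit_state = "half_open"  # allow one probe request through
    return True

p = FakeProvider(circuit_state="open", circuit_opened_at=time.time() - 60)
print(is_provider_available(p), p.circuit_state)  # True half_open
```

Setting `circuit_breaker_recovery_timeout = 0` in the last test above forces the elapsed-time branch immediately, which is why the provider flips to HALF_OPEN on the very next availability check.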

View File

@@ -10,14 +10,12 @@ from __future__ import annotations
import json
import socket
from pathlib import Path
from unittest.mock import MagicMock, patch
import pytest
from integrations.bannerlord.gabs_client import GabsClient, GabsError
# ── GabsClient unit tests ─────────────────────────────────────────────────────
@@ -236,7 +234,13 @@ class TestBannerlordObserver:
snapshot = {
"game_state": {"day": 7, "season": "winter", "campaign_phase": "early"},
"player": {"name": "Timmy", "clan": "Thalheimer", "renown": 42, "level": 3, "gold": 1000},
"player": {
"name": "Timmy",
"clan": "Thalheimer",
"renown": 42,
"level": 3,
"gold": 1000,
},
"player_party": {"size": 25, "morale": 80, "food_days_left": 5},
"kingdoms": [{"name": "Vlandia", "ruler": "Derthert", "military_strength": 5000}],
}

View File

@@ -9,10 +9,8 @@ import json
from pathlib import Path
import pytest
import scripts.export_trajectories as et
# ── Fixtures ──────────────────────────────────────────────────────────────────
@@ -22,10 +20,30 @@ def simple_session(tmp_path: Path) -> Path:
logs_dir = tmp_path / "logs"
logs_dir.mkdir()
entries = [
{"type": "message", "role": "user", "content": "What time is it?", "timestamp": "2026-03-01T10:00:00"},
{"type": "message", "role": "timmy", "content": "It is 10:00 AM.", "timestamp": "2026-03-01T10:00:01"},
{"type": "message", "role": "user", "content": "Thanks!", "timestamp": "2026-03-01T10:00:05"},
{"type": "message", "role": "timmy", "content": "You're welcome!", "timestamp": "2026-03-01T10:00:06"},
{
"type": "message",
"role": "user",
"content": "What time is it?",
"timestamp": "2026-03-01T10:00:00",
},
{
"type": "message",
"role": "timmy",
"content": "It is 10:00 AM.",
"timestamp": "2026-03-01T10:00:01",
},
{
"type": "message",
"role": "user",
"content": "Thanks!",
"timestamp": "2026-03-01T10:00:05",
},
{
"type": "message",
"role": "timmy",
"content": "You're welcome!",
"timestamp": "2026-03-01T10:00:06",
},
]
session_file = logs_dir / "session_2026-03-01.jsonl"
session_file.write_text("\n".join(json.dumps(e) for e in entries) + "\n")
@@ -38,7 +56,12 @@ def tool_call_session(tmp_path: Path) -> Path:
logs_dir = tmp_path / "logs"
logs_dir.mkdir()
entries = [
{"type": "message", "role": "user", "content": "Read CLAUDE.md", "timestamp": "2026-03-01T10:00:00"},
{
"type": "message",
"role": "user",
"content": "Read CLAUDE.md",
"timestamp": "2026-03-01T10:00:00",
},
{
"type": "tool_call",
"tool": "read_file",
@@ -46,7 +69,12 @@ def tool_call_session(tmp_path: Path) -> Path:
"result": "# CLAUDE.md content here",
"timestamp": "2026-03-01T10:00:01",
},
{"type": "message", "role": "timmy", "content": "Here is the content.", "timestamp": "2026-03-01T10:00:02"},
{
"type": "message",
"role": "timmy",
"content": "Here is the content.",
"timestamp": "2026-03-01T10:00:02",
},
]
session_file = logs_dir / "session_2026-03-01.jsonl"
session_file.write_text("\n".join(json.dumps(e) for e in entries) + "\n")
@@ -236,7 +264,7 @@ def test_export_training_data_writes_jsonl(simple_session: Path, tmp_path: Path)
count = et.export_training_data(logs_dir=simple_session, output_path=output)
assert count == 2
assert output.exists()
lines = [json.loads(l) for l in output.read_text().splitlines() if l.strip()]
lines = [json.loads(line) for line in output.read_text().splitlines() if line.strip()]
assert len(lines) == 2
for line in lines:
assert "messages" in line
@@ -270,16 +298,22 @@ def test_export_training_data_returns_zero_for_empty_logs(tmp_path: Path) -> Non
@pytest.mark.unit
def test_cli_missing_logs_dir(tmp_path: Path) -> None:
rc = et.main(["--logs-dir", str(tmp_path / "nonexistent"), "--output", str(tmp_path / "out.jsonl")])
rc = et.main(
["--logs-dir", str(tmp_path / "nonexistent"), "--output", str(tmp_path / "out.jsonl")]
)
assert rc == 1
@pytest.mark.unit
def test_cli_exports_and_returns_zero(simple_session: Path, tmp_path: Path) -> None:
output = tmp_path / "out.jsonl"
rc = et.main([
"--logs-dir", str(simple_session),
"--output", str(output),
])
rc = et.main(
[
"--logs-dir",
str(simple_session),
"--output",
str(output),
]
)
assert rc == 0
assert output.exists()

View File

@@ -0,0 +1,195 @@
"""Tests for agent emotional state simulation (src/timmy/agents/emotional_state.py)."""
import time
from timmy.agents.emotional_state import (
EMOTION_PROMPT_MODIFIERS,
EMOTIONAL_STATES,
EVENT_TRANSITIONS,
EmotionalState,
EmotionalStateTracker,
_intensity_label,
)
class TestEmotionalState:
"""Test the EmotionalState dataclass."""
def test_defaults(self):
state = EmotionalState()
assert state.current_emotion == "calm"
assert state.intensity == 0.5
assert state.previous_emotion == "calm"
assert state.trigger_event == ""
def test_to_dict_includes_label(self):
state = EmotionalState(current_emotion="analytical")
d = state.to_dict()
assert d["emotion_label"] == "Analytical"
assert d["current_emotion"] == "analytical"
def test_to_dict_all_fields(self):
state = EmotionalState(
current_emotion="frustrated",
intensity=0.8,
previous_emotion="calm",
trigger_event="task_failure",
)
d = state.to_dict()
assert d["current_emotion"] == "frustrated"
assert d["intensity"] == 0.8
assert d["previous_emotion"] == "calm"
assert d["trigger_event"] == "task_failure"
class TestEmotionalStates:
"""Validate the emotional states and transitions are well-defined."""
def test_all_states_are_strings(self):
for state in EMOTIONAL_STATES:
assert isinstance(state, str)
def test_all_states_have_prompt_modifiers(self):
for state in EMOTIONAL_STATES:
assert state in EMOTION_PROMPT_MODIFIERS
def test_all_transitions_target_valid_states(self):
for event_type, (emotion, intensity) in EVENT_TRANSITIONS.items():
assert emotion in EMOTIONAL_STATES, f"{event_type} targets unknown state: {emotion}"
assert 0.0 <= intensity <= 1.0, f"{event_type} has invalid intensity: {intensity}"
class TestEmotionalStateTracker:
"""Test the EmotionalStateTracker."""
def test_initial_emotion_default(self):
tracker = EmotionalStateTracker()
assert tracker.state.current_emotion == "calm"
def test_initial_emotion_custom(self):
tracker = EmotionalStateTracker(initial_emotion="analytical")
assert tracker.state.current_emotion == "analytical"
def test_initial_emotion_invalid_falls_back(self):
tracker = EmotionalStateTracker(initial_emotion="invalid_state")
assert tracker.state.current_emotion == "calm"
def test_process_known_event(self):
tracker = EmotionalStateTracker()
state = tracker.process_event("task_success")
assert state.current_emotion == "confident"
assert state.trigger_event == "task_success"
assert state.previous_emotion == "calm"
def test_process_unknown_event_ignored(self):
tracker = EmotionalStateTracker()
state = tracker.process_event("unknown_event_xyz")
assert state.current_emotion == "calm" # unchanged
def test_repeated_same_emotion_amplifies(self):
tracker = EmotionalStateTracker()
tracker.process_event("task_success")
initial_intensity = tracker.state.intensity
tracker.process_event("user_praise") # also targets confident
assert tracker.state.intensity >= initial_intensity
def test_different_emotion_replaces(self):
tracker = EmotionalStateTracker()
tracker.process_event("task_success")
assert tracker.state.current_emotion == "confident"
tracker.process_event("task_failure")
assert tracker.state.current_emotion == "frustrated"
assert tracker.state.previous_emotion == "confident"
def test_decay_no_effect_when_recent(self):
tracker = EmotionalStateTracker()
tracker.process_event("task_failure")
emotion_before = tracker.state.current_emotion
tracker.decay()
assert tracker.state.current_emotion == emotion_before
def test_decay_resets_to_calm_after_long_time(self):
tracker = EmotionalStateTracker()
tracker.process_event("task_failure")
assert tracker.state.current_emotion == "frustrated"
# Simulate passage of time (30+ minutes)
tracker.state.updated_at = time.time() - 2000
tracker.decay()
assert tracker.state.current_emotion == "calm"
def test_get_profile_returns_expected_keys(self):
tracker = EmotionalStateTracker()
profile = tracker.get_profile()
assert "current_emotion" in profile
assert "emotion_label" in profile
assert "intensity" in profile
assert "intensity_label" in profile
assert "previous_emotion" in profile
assert "trigger_event" in profile
assert "prompt_modifier" in profile
def test_get_prompt_modifier_returns_string(self):
tracker = EmotionalStateTracker(initial_emotion="cautious")
modifier = tracker.get_prompt_modifier()
assert isinstance(modifier, str)
assert "cautious" in modifier.lower()
def test_reset(self):
tracker = EmotionalStateTracker()
tracker.process_event("task_failure")
tracker.reset()
assert tracker.state.current_emotion == "calm"
assert tracker.state.intensity == 0.5
def test_process_event_with_context(self):
"""Context dict is accepted without error."""
tracker = EmotionalStateTracker()
state = tracker.process_event("error", {"details": "connection timeout"})
assert state.current_emotion == "cautious"
def test_event_chain_scenario(self):
"""Simulate: task assigned → success → new discovery → idle."""
tracker = EmotionalStateTracker()
tracker.process_event("task_assigned")
assert tracker.state.current_emotion == "analytical"
tracker.process_event("task_success")
assert tracker.state.current_emotion == "confident"
tracker.process_event("new_discovery")
assert tracker.state.current_emotion == "curious"
tracker.process_event("idle")
assert tracker.state.current_emotion == "calm"
def test_health_events(self):
tracker = EmotionalStateTracker()
tracker.process_event("health_low")
assert tracker.state.current_emotion == "cautious"
tracker.process_event("health_recovered")
assert tracker.state.current_emotion == "calm"
def test_quest_completed_triggers_adventurous(self):
tracker = EmotionalStateTracker()
tracker.process_event("quest_completed")
assert tracker.state.current_emotion == "adventurous"
class TestIntensityLabel:
def test_overwhelming(self):
assert _intensity_label(0.9) == "overwhelming"
def test_strong(self):
assert _intensity_label(0.7) == "strong"
def test_moderate(self):
assert _intensity_label(0.5) == "moderate"
def test_mild(self):
assert _intensity_label(0.3) == "mild"
def test_faint(self):
assert _intensity_label(0.1) == "faint"
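One threshold scheme consistent with the five sample points above — the real cut-offs inside `_intensity_label` may differ — is:

```python
def intensity_label(intensity: float) -> str:
    """Map a 0.0-1.0 intensity to a coarse label. Thresholds are assumed,
    chosen only to satisfy the sample points in the tests above."""
    if intensity >= 0.8:
        return "overwhelming"
    if intensity >= 0.6:
        return "strong"
    if intensity >= 0.4:
        return "moderate"
    if intensity >= 0.2:
        return "mild"
    return "faint"

print(intensity_label(0.9))  # overwhelming
```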

View File

@@ -435,14 +435,14 @@ class TestStatusAndCapabilities:
tools=["calc"],
)
status = agent.get_status()
assert status == {
"agent_id": "bot-1",
"name": "TestBot",
"role": "assistant",
"model": "qwen3:30b",
"status": "ready",
"tools": ["calc"],
}
assert status["agent_id"] == "bot-1"
assert status["name"] == "TestBot"
assert status["role"] == "assistant"
assert status["model"] == "qwen3:30b"
assert status["status"] == "ready"
assert status["tools"] == ["calc"]
assert "emotional_profile" in status
assert status["emotional_profile"]["current_emotion"] == "calm"
# ── SubAgent.execute_task ────────────────────────────────────────────────────

View File

@@ -4,8 +4,6 @@ from __future__ import annotations
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from timmy.dispatcher import (
AGENT_REGISTRY,
AgentType,
@@ -21,11 +19,11 @@ from timmy.dispatcher import (
wait_for_completion,
)
# ---------------------------------------------------------------------------
# Agent registry
# ---------------------------------------------------------------------------
class TestAgentRegistry:
def test_all_agents_present(self):
for member in AgentType:
@@ -41,7 +39,7 @@ class TestAgentRegistry:
assert spec.gitea_label, f"{agent} is gitea interface but has no label"
def test_non_gitea_agents_have_no_labels(self):
for agent, spec in AGENT_REGISTRY.items():
for _agent, spec in AGENT_REGISTRY.items():
if spec.interface not in ("gitea",):
# api and local agents may have no label
assert spec.gitea_label is None or spec.interface == "gitea"
@@ -55,6 +53,7 @@ class TestAgentRegistry:
# select_agent
# ---------------------------------------------------------------------------
class TestSelectAgent:
def test_architecture_routes_to_claude(self):
assert select_agent(TaskType.ARCHITECTURE) == AgentType.CLAUDE_CODE
@@ -85,6 +84,7 @@ class TestSelectAgent:
# infer_task_type
# ---------------------------------------------------------------------------
class TestInferTaskType:
def test_architecture_keyword(self):
assert infer_task_type("Design the LLM router architecture") == TaskType.ARCHITECTURE
@@ -119,6 +119,7 @@ class TestInferTaskType:
# DispatchResult
# ---------------------------------------------------------------------------
class TestDispatchResult:
def test_success_when_assigned(self):
r = DispatchResult(
@@ -161,6 +162,7 @@ class TestDispatchResult:
# _dispatch_local
# ---------------------------------------------------------------------------
class TestDispatchLocal:
async def test_returns_assigned(self):
result = await _dispatch_local(
@@ -190,6 +192,7 @@ class TestDispatchLocal:
# _dispatch_via_api
# ---------------------------------------------------------------------------
class TestDispatchViaApi:
async def test_no_endpoint_returns_failed(self):
result = await _dispatch_via_api(
@@ -304,7 +307,9 @@ class TestDispatchViaGitea:
assert result.status == DispatchStatus.ASSIGNED
async def test_no_gitea_token_returns_failed(self):
bad_settings = MagicMock(gitea_enabled=True, gitea_token="", gitea_url="http://x", gitea_repo="a/b")
bad_settings = MagicMock(
gitea_enabled=True, gitea_token="", gitea_url="http://x", gitea_repo="a/b"
)
with patch("timmy.dispatcher.settings", bad_settings):
result = await _dispatch_via_gitea(
agent=AgentType.CLAUDE_CODE,
@@ -317,7 +322,9 @@ class TestDispatchViaGitea:
assert "not configured" in (result.error or "").lower()
async def test_gitea_disabled_returns_failed(self):
bad_settings = MagicMock(gitea_enabled=False, gitea_token="tok", gitea_url="http://x", gitea_repo="a/b")
bad_settings = MagicMock(
gitea_enabled=False, gitea_token="tok", gitea_url="http://x", gitea_repo="a/b"
)
with patch("timmy.dispatcher.settings", bad_settings):
result = await _dispatch_via_gitea(
agent=AgentType.CLAUDE_CODE,
@@ -368,6 +375,7 @@ class TestDispatchViaGitea:
# dispatch_task (integration-style)
# ---------------------------------------------------------------------------
class TestDispatchTask:
async def test_empty_title_returns_failed(self):
result = await dispatch_task(title=" ")
@@ -396,7 +404,9 @@ class TestDispatchTask:
client_mock = AsyncMock()
client_mock.__aenter__ = AsyncMock(return_value=client_mock)
client_mock.__aexit__ = AsyncMock(return_value=False)
client_mock.get = AsyncMock(return_value=MagicMock(status_code=200, json=MagicMock(return_value=[])))
client_mock.get = AsyncMock(
return_value=MagicMock(status_code=200, json=MagicMock(return_value=[]))
)
create_resp = MagicMock(status_code=201, json=MagicMock(return_value={"id": 1}))
apply_resp = MagicMock(status_code=201)
comment_resp = MagicMock(status_code=201, json=MagicMock(return_value={"id": 5}))
@@ -464,6 +474,7 @@ class TestDispatchTask:
# wait_for_completion
# ---------------------------------------------------------------------------
class TestWaitForCompletion:
async def test_returns_completed_when_issue_closed(self):
closed_resp = MagicMock(

View File

@@ -25,7 +25,6 @@ from timmy.backlog_triage import (
score_issue,
)
# ── Fixtures ─────────────────────────────────────────────────────────────────

View File

@@ -0,0 +1,301 @@
"""Unit tests for bannerlord agents — King, Vassals, Companions."""
import asyncio
from unittest.mock import AsyncMock, MagicMock, patch
from bannerlord.agents.companions import (
CaravanCompanion,
LogisticsCompanion,
ScoutCompanion,
)
from bannerlord.agents.king import KingAgent
from bannerlord.agents.vassals import DiplomacyVassal, EconomyVassal, WarVassal
from bannerlord.gabs_client import GABSClient, GABSUnavailable
from bannerlord.ledger import Ledger
from bannerlord.models import (
KingSubgoal,
TaskMessage,
)
# ── Helpers ───────────────────────────────────────────────────────────────────
def _mock_gabs(state: dict | None = None) -> GABSClient:
"""Return a disconnected GABS stub that returns *state* from get_state."""
gabs = MagicMock(spec=GABSClient)
gabs.connected = False
if state is not None:
gabs.get_state = AsyncMock(return_value=state)
else:
gabs.get_state = AsyncMock(side_effect=GABSUnavailable("no game"))
gabs.call = AsyncMock(return_value={})
gabs.recruit_troops = AsyncMock(return_value={"recruited": 10})
gabs.move_party = AsyncMock(return_value={"moving": True})
return gabs
def _mock_ledger(tmp_path) -> Ledger:
ledger = Ledger(db_path=tmp_path / "ledger.db")
ledger.initialize()
return ledger
# ── King agent ────────────────────────────────────────────────────────────────
class TestKingAgent:
async def test_victory_detected(self, tmp_path):
"""Campaign stops immediately when victory condition is met."""
gabs = _mock_gabs({"player_title": "King", "territory_control_pct": 55.0})
ledger = _mock_ledger(tmp_path)
king = KingAgent(gabs_client=gabs, ledger=ledger, tick_interval=0)
victory = await king.run_campaign(max_ticks=10)
assert victory.achieved
async def test_max_ticks_respected(self, tmp_path):
"""Campaign stops after max_ticks when victory not yet achieved."""
gabs = _mock_gabs({"player_title": "Lord", "territory_control_pct": 10.0})
ledger = _mock_ledger(tmp_path)
# Patch LLM to return a valid subgoal without calling Ollama
king = KingAgent(gabs_client=gabs, ledger=ledger, tick_interval=0)
with patch.object(king, "_decide", AsyncMock(return_value=KingSubgoal(token="RECRUIT"))):
victory = await king.run_campaign(max_ticks=3)
assert not victory.achieved
assert king._tick == 3
async def test_llm_failure_falls_back_to_recruit(self, tmp_path):
"""If LLM fails, King defaults to RECRUIT subgoal."""
gabs = _mock_gabs({"player_title": "Lord", "territory_control_pct": 5.0})
ledger = _mock_ledger(tmp_path)
king = KingAgent(gabs_client=gabs, ledger=ledger, tick_interval=0)
with patch.object(king, "_llm_decide", side_effect=RuntimeError("Ollama down")):
subgoal = await king._decide({})
assert subgoal.token == "RECRUIT"
async def test_subgoal_broadcast_to_all_vassals(self, tmp_path):
"""King broadcasts subgoal to all three vassals."""
gabs = _mock_gabs({})
ledger = _mock_ledger(tmp_path)
king = KingAgent(gabs_client=gabs, ledger=ledger)
subgoal = KingSubgoal(token="EXPAND_TERRITORY", target="Epicrotea")
await king._broadcast_subgoal(subgoal)
messages = []
while not king.subgoal_queue.empty():
messages.append(king.subgoal_queue.get_nowait())
assert len(messages) == 3
recipients = {m.to_agent for m in messages}
assert recipients == {"war_vassal", "economy_vassal", "diplomacy_vassal"}
async def test_gabs_unavailable_uses_empty_state(self, tmp_path):
"""King handles GABS being offline gracefully."""
gabs = _mock_gabs() # raises GABSUnavailable
ledger = _mock_ledger(tmp_path)
king = KingAgent(gabs_client=gabs, ledger=ledger)
state = await king._fetch_state()
assert state == {}
def test_evaluate_victory_king_with_majority(self, tmp_path):
gabs = _mock_gabs()
ledger = _mock_ledger(tmp_path)
king = KingAgent(gabs_client=gabs, ledger=ledger)
v = king._evaluate_victory({"player_title": "King", "territory_control_pct": 60.0})
assert v.achieved
def test_evaluate_victory_not_king(self, tmp_path):
gabs = _mock_gabs()
ledger = _mock_ledger(tmp_path)
king = KingAgent(gabs_client=gabs, ledger=ledger)
v = king._evaluate_victory({"player_title": "Lord", "territory_control_pct": 80.0})
assert not v.achieved
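The victory rule these three tests pin down — King/60% and King/55% achieve, Lord/80% does not — is consistent with "hold the title King AND control a majority of territory". A minimal sketch under that assumption (the real `_evaluate_victory` may weigh more signals):

```python
from dataclasses import dataclass

@dataclass
class Victory:
    achieved: bool
    reason: str = ""

def evaluate_victory(state: dict) -> Victory:
    """Assumed rule: player must be titled 'King' and hold >50% territory."""
    title_ok = state.get("player_title") == "King"
    majority = state.get("territory_control_pct", 0.0) > 50.0
    return Victory(achieved=title_ok and majority)

print(evaluate_victory({"player_title": "King", "territory_control_pct": 60.0}).achieved)  # True
```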
# ── Vassals ───────────────────────────────────────────────────────────────────
class TestWarVassal:
async def test_expand_territory_emits_move_task(self):
gabs = _mock_gabs({"territory_delta": 1.0, "army_strength_ratio": 1.5})
queue = asyncio.Queue()
vassal = WarVassal(gabs_client=gabs, subgoal_queue=queue)
subgoal = KingSubgoal(token="EXPAND_TERRITORY", target="Seonon")
await vassal._tick(subgoal)
task: TaskMessage = vassal.task_queue.get_nowait()
assert task.primitive == "move_party"
assert task.args["destination"] == "Seonon"
async def test_recruit_emits_recruit_task(self):
gabs = _mock_gabs({})
queue = asyncio.Queue()
vassal = WarVassal(gabs_client=gabs, subgoal_queue=queue)
subgoal = KingSubgoal(token="RECRUIT", quantity=15)
await vassal._tick(subgoal)
task: TaskMessage = vassal.task_queue.get_nowait()
assert task.primitive == "recruit_troop"
assert task.args["quantity"] == 15
async def test_irrelevant_token_emits_no_task(self):
gabs = _mock_gabs({})
queue = asyncio.Queue()
vassal = WarVassal(gabs_client=gabs, subgoal_queue=queue)
subgoal = KingSubgoal(token="ALLY")
await vassal._tick(subgoal)
assert vassal.task_queue.empty()
class TestEconomyVassal:
async def test_fortify_emits_build_task(self):
gabs = _mock_gabs({"daily_income": 200.0})
queue = asyncio.Queue()
vassal = EconomyVassal(gabs_client=gabs, subgoal_queue=queue)
subgoal = KingSubgoal(token="FORTIFY", target="Epicrotea")
await vassal._tick(subgoal)
task: TaskMessage = vassal.task_queue.get_nowait()
assert task.primitive == "build_project"
assert task.args["settlement"] == "Epicrotea"
async def test_trade_emits_assess_prices(self):
gabs = _mock_gabs({})
queue = asyncio.Queue()
vassal = EconomyVassal(gabs_client=gabs, subgoal_queue=queue)
subgoal = KingSubgoal(token="TRADE", target="Pravend")
await vassal._tick(subgoal)
task: TaskMessage = vassal.task_queue.get_nowait()
assert task.primitive == "assess_prices"
class TestDiplomacyVassal:
async def test_ally_emits_track_lord(self):
gabs = _mock_gabs({"allies_count": 1})
queue = asyncio.Queue()
vassal = DiplomacyVassal(gabs_client=gabs, subgoal_queue=queue)
subgoal = KingSubgoal(token="ALLY", target="Derthert")
await vassal._tick(subgoal)
task: TaskMessage = vassal.task_queue.get_nowait()
assert task.primitive == "track_lord"
assert task.args["name"] == "Derthert"
async def test_spy_emits_assess_garrison(self):
gabs = _mock_gabs({})
queue = asyncio.Queue()
vassal = DiplomacyVassal(gabs_client=gabs, subgoal_queue=queue)
subgoal = KingSubgoal(token="SPY", target="Marunath")
await vassal._tick(subgoal)
task: TaskMessage = vassal.task_queue.get_nowait()
assert task.primitive == "assess_garrison"
assert task.args["settlement"] == "Marunath"
# ── Companions ────────────────────────────────────────────────────────────────
class TestLogisticsCompanion:
async def test_recruit_troop(self):
gabs = _mock_gabs()
gabs.recruit_troops = AsyncMock(return_value={"recruited": 10, "type": "infantry"})
q: asyncio.Queue[TaskMessage] = asyncio.Queue()
comp = LogisticsCompanion(gabs_client=gabs, task_queue=q)
task = TaskMessage(
from_agent="war_vassal",
to_agent="logistics_companion",
primitive="recruit_troop",
args={"troop_type": "infantry", "quantity": 10},
)
result = await comp._execute(task)
assert result.success is True
assert result.outcome["recruited"] == 10
async def test_unknown_primitive_fails_gracefully(self):
gabs = _mock_gabs()
q: asyncio.Queue[TaskMessage] = asyncio.Queue()
comp = LogisticsCompanion(gabs_client=gabs, task_queue=q)
task = TaskMessage(
from_agent="war_vassal",
to_agent="logistics_companion",
primitive="launch_nukes",
args={},
)
result = await comp._execute(task)
assert result.success is False
assert "Unknown primitive" in result.outcome["error"]
async def test_gabs_unavailable_returns_failure(self):
gabs = _mock_gabs()
gabs.recruit_troops = AsyncMock(side_effect=GABSUnavailable("offline"))
q: asyncio.Queue[TaskMessage] = asyncio.Queue()
comp = LogisticsCompanion(gabs_client=gabs, task_queue=q)
task = TaskMessage(
from_agent="war_vassal",
to_agent="logistics_companion",
primitive="recruit_troop",
args={"troop_type": "infantry", "quantity": 5},
)
result = await comp._execute(task)
assert result.success is False
class TestCaravanCompanion:
async def test_assess_prices(self):
gabs = _mock_gabs()
gabs.call = AsyncMock(return_value={"grain": 12, "linen": 45})
q: asyncio.Queue[TaskMessage] = asyncio.Queue()
comp = CaravanCompanion(gabs_client=gabs, task_queue=q)
task = TaskMessage(
from_agent="economy_vassal",
to_agent="caravan_companion",
primitive="assess_prices",
args={"town": "Pravend"},
)
result = await comp._execute(task)
assert result.success is True
async def test_abandon_route(self):
gabs = _mock_gabs()
gabs.call = AsyncMock(return_value={"abandoned": True})
q: asyncio.Queue[TaskMessage] = asyncio.Queue()
comp = CaravanCompanion(gabs_client=gabs, task_queue=q)
task = TaskMessage(
from_agent="economy_vassal",
to_agent="caravan_companion",
primitive="abandon_route",
args={},
)
result = await comp._execute(task)
assert result.success is True
assert result.outcome["abandoned"] is True
class TestScoutCompanion:
async def test_assess_garrison(self):
gabs = _mock_gabs()
gabs.call = AsyncMock(return_value={"garrison_size": 120, "settlement": "Marunath"})
q: asyncio.Queue[TaskMessage] = asyncio.Queue()
comp = ScoutCompanion(gabs_client=gabs, task_queue=q)
task = TaskMessage(
from_agent="diplomacy_vassal",
to_agent="scout_companion",
primitive="assess_garrison",
args={"settlement": "Marunath"},
)
result = await comp._execute(task)
assert result.success is True
assert result.outcome["garrison_size"] == 120
async def test_report_intel(self):
gabs = _mock_gabs()
gabs.call = AsyncMock(return_value={"intel": ["Derthert at Epicrotea"]})
q: asyncio.Queue[TaskMessage] = asyncio.Queue()
comp = ScoutCompanion(gabs_client=gabs, task_queue=q)
task = TaskMessage(
from_agent="diplomacy_vassal",
to_agent="scout_companion",
primitive="report_intel",
args={},
)
result = await comp._execute(task)
assert result.success is True

View File

@@ -0,0 +1,145 @@
"""Unit tests for bannerlord.gabs_client — TCP JSON-RPC client."""
import json
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from bannerlord.gabs_client import GABSClient, GABSError, GABSUnavailable
# ── Connection ────────────────────────────────────────────────────────────────
class TestGABSClientConnection:
async def test_connect_success(self):
mock_reader = AsyncMock()
mock_writer = MagicMock()
mock_writer.close = MagicMock()
mock_writer.wait_closed = AsyncMock()
with patch(
"bannerlord.gabs_client.asyncio.open_connection",
return_value=(mock_reader, mock_writer),
):
client = GABSClient()
await client.connect()
assert client.connected is True
await client.close()
async def test_connect_failure_degrades_gracefully(self):
with patch(
"bannerlord.gabs_client.asyncio.open_connection",
side_effect=OSError("Connection refused"),
):
client = GABSClient()
await client.connect() # must not raise
assert client.connected is False
async def test_connect_timeout_degrades_gracefully(self):
with patch(
"bannerlord.gabs_client.asyncio.open_connection",
side_effect=TimeoutError(),
):
client = GABSClient()
await client.connect()
assert client.connected is False
async def test_context_manager(self):
mock_reader = AsyncMock()
mock_writer = MagicMock()
mock_writer.close = MagicMock()
mock_writer.wait_closed = AsyncMock()
with patch(
"bannerlord.gabs_client.asyncio.open_connection",
return_value=(mock_reader, mock_writer),
):
async with GABSClient() as client:
assert client.connected is True
assert client.connected is False
# ── RPC ───────────────────────────────────────────────────────────────────────
class TestGABSClientRPC:
def _make_connected_client(self, response_data: dict):
"""Return a client with mocked reader/writer."""
client = GABSClient()
client._connected = True
raw_response = json.dumps(response_data) + "\n"
client._reader = AsyncMock()
client._reader.readline = AsyncMock(return_value=raw_response.encode())
client._writer = MagicMock()
client._writer.write = MagicMock()
client._writer.drain = AsyncMock()
return client
async def test_call_returns_result(self):
client = self._make_connected_client({"jsonrpc": "2.0", "id": 1, "result": {"foo": "bar"}})
result = await client.call("game.getState")
assert result == {"foo": "bar"}
async def test_call_raises_on_error(self):
client = self._make_connected_client(
{"jsonrpc": "2.0", "id": 1, "error": {"code": -32601, "message": "Method not found"}}
)
with pytest.raises(GABSError, match="Method not found"):
await client.call("game.nonexistent")
async def test_call_raises_unavailable_when_not_connected(self):
client = GABSClient()
assert client.connected is False
with pytest.raises(GABSUnavailable):
await client.call("game.getState")
async def test_sequence_increments(self):
client = self._make_connected_client({"jsonrpc": "2.0", "id": 1, "result": {}})
await client.call("game.getState")
assert client._seq == 1
client._reader.readline = AsyncMock(
return_value=(json.dumps({"jsonrpc": "2.0", "id": 2, "result": {}}) + "\n").encode()
)
await client.call("game.getState")
assert client._seq == 2
async def test_get_state_calls_correct_method(self):
client = self._make_connected_client(
{"jsonrpc": "2.0", "id": 1, "result": {"campaign_day": 10}}
)
result = await client.get_state()
written = client._writer.write.call_args[0][0].decode()
payload = json.loads(written.strip())
assert payload["method"] == "game.getState"
assert result == {"campaign_day": 10}
async def test_move_party_sends_target(self):
client = self._make_connected_client(
{"jsonrpc": "2.0", "id": 1, "result": {"moving": True}}
)
await client.move_party("Epicrotea")
written = client._writer.write.call_args[0][0].decode()
payload = json.loads(written.strip())
assert payload["method"] == "party.move"
assert payload["params"]["target"] == "Epicrotea"
async def test_connection_lost_marks_disconnected(self):
client = GABSClient()
client._connected = True
client._reader = AsyncMock()
client._reader.readline = AsyncMock(side_effect=OSError("connection reset"))
client._writer = MagicMock()
client._writer.write = MagicMock()
client._writer.drain = AsyncMock()
with pytest.raises(GABSUnavailable):
await client.call("game.getState")
assert client.connected is False

View File

@@ -0,0 +1,189 @@
"""Unit tests for bannerlord.models — data contracts and reward functions."""
import pytest
from bannerlord.models import (
SUBGOAL_TOKENS,
DiplomacyReward,
EconomyReward,
KingSubgoal,
ResultMessage,
StateUpdateMessage,
SubgoalMessage,
TaskMessage,
VictoryCondition,
WarReward,
)
# ── KingSubgoal ───────────────────────────────────────────────────────────────
class TestKingSubgoal:
def test_valid_token(self):
s = KingSubgoal(token="EXPAND_TERRITORY", target="Epicrotea")
assert s.token == "EXPAND_TERRITORY"
assert s.target == "Epicrotea"
assert s.priority == 1.0
def test_all_tokens_valid(self):
for token in SUBGOAL_TOKENS:
KingSubgoal(token=token)
def test_invalid_token_raises(self):
with pytest.raises(ValueError, match="Unknown subgoal token"):
KingSubgoal(token="NUKE_CALRADIA")
def test_priority_clamp(self):
with pytest.raises(ValueError):
KingSubgoal(token="TRADE", priority=3.0)
def test_optional_fields_default_none(self):
s = KingSubgoal(token="HEAL")
assert s.target is None
assert s.quantity is None
assert s.deadline_days is None
assert s.context is None
# ── Messages ──────────────────────────────────────────────────────────────────
class TestSubgoalMessage:
def test_defaults(self):
msg = SubgoalMessage(
to_agent="war_vassal",
subgoal=KingSubgoal(token="RAID_ECONOMY"),
)
assert msg.msg_type == "subgoal"
assert msg.from_agent == "king"
assert msg.to_agent == "war_vassal"
assert msg.issued_at is not None
def test_subgoal_roundtrip(self):
subgoal = KingSubgoal(token="RECRUIT", quantity=30, priority=1.5)
msg = SubgoalMessage(to_agent="war_vassal", subgoal=subgoal)
assert msg.subgoal.quantity == 30
assert msg.subgoal.priority == 1.5
class TestTaskMessage:
def test_construction(self):
t = TaskMessage(
from_agent="war_vassal",
to_agent="logistics_companion",
primitive="recruit_troop",
args={"troop_type": "cavalry", "quantity": 5},
priority=1.2,
)
assert t.msg_type == "task"
assert t.primitive == "recruit_troop"
assert t.args["quantity"] == 5
class TestResultMessage:
def test_success(self):
r = ResultMessage(
from_agent="logistics_companion",
to_agent="war_vassal",
success=True,
outcome={"recruited": 10},
reward_delta=0.15,
)
assert r.success is True
assert r.reward_delta == 0.15
def test_failure(self):
r = ResultMessage(
from_agent="scout_companion",
to_agent="diplomacy_vassal",
success=False,
outcome={"error": "GABS unavailable"},
)
assert r.success is False
assert r.reward_delta == 0.0
class TestStateUpdateMessage:
def test_construction(self):
msg = StateUpdateMessage(
game_state={"campaign_day": 42, "player_title": "Lord"},
tick=42,
)
assert msg.msg_type == "state"
assert msg.tick == 42
assert msg.game_state["campaign_day"] == 42
# ── Reward functions ──────────────────────────────────────────────────────────
class TestWarReward:
def test_positive_expansion(self):
r = WarReward(territory_delta=2.0, army_strength_ratio=1.2, subgoal_bonus=0.1)
assert r.total > 0
def test_casualty_cost_penalizes(self):
no_cost = WarReward(territory_delta=1.0, army_strength_ratio=1.0)
with_cost = WarReward(territory_delta=1.0, army_strength_ratio=1.0, casualty_cost=5.0)
assert with_cost.total < no_cost.total
def test_zero_state(self):
r = WarReward()
# army_strength_ratio default 1.0, rest 0 → 0.25 * 1.0 = 0.25
assert abs(r.total - 0.25) < 1e-9
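The assertions above constrain the reward's shape without revealing its weights. A minimal sketch consistent with them — the 0.5 and 0.1 coefficients and field defaults are assumptions; only the 0.25 weight on army_strength_ratio is pinned by test_zero_state:

```python
from dataclasses import dataclass

@dataclass
class WarRewardSketch:
    """Hypothetical stand-in for bannerlord.models.WarReward."""
    territory_delta: float = 0.0
    army_strength_ratio: float = 1.0  # default implied by test_zero_state
    subgoal_bonus: float = 0.0
    casualty_cost: float = 0.0

    @property
    def total(self) -> float:
        # only the 0.25 coefficient is fixed by the tests; the rest are guesses
        return (
            0.5 * self.territory_delta
            + 0.25 * self.army_strength_ratio
            + self.subgoal_bonus
            - 0.1 * self.casualty_cost
        )
```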
class TestEconomyReward:
def test_income_positive(self):
r = EconomyReward(daily_denars_income=100.0, food_stock_buffer=7.0, loyalty_average=80.0)
assert r.total > 0
def test_construction_queue_penalizes(self):
no_queue = EconomyReward(daily_denars_income=50.0)
long_queue = EconomyReward(daily_denars_income=50.0, construction_queue_length=10)
assert long_queue.total < no_queue.total
def test_loyalty_contributes(self):
low_loyalty = EconomyReward(loyalty_average=10.0)
high_loyalty = EconomyReward(loyalty_average=90.0)
assert high_loyalty.total > low_loyalty.total
class TestDiplomacyReward:
def test_allies_positive(self):
r = DiplomacyReward(allies_count=3)
assert r.total > 0
def test_active_wars_penalizes(self):
peace = DiplomacyReward(allies_count=2)
war = DiplomacyReward(allies_count=2, active_wars_front=4)
assert war.total < peace.total
# ── Victory condition ─────────────────────────────────────────────────────────
class TestVictoryCondition:
def test_not_achieved_without_title(self):
v = VictoryCondition(holds_king_title=False, territory_control_pct=70.0)
assert not v.achieved
def test_not_achieved_without_majority(self):
v = VictoryCondition(holds_king_title=True, territory_control_pct=40.0)
assert not v.achieved
def test_achieved_when_king_with_majority(self):
v = VictoryCondition(holds_king_title=True, territory_control_pct=55.0)
assert v.achieved
def test_exact_threshold(self):
v = VictoryCondition(holds_king_title=True, territory_control_pct=51.0)
assert v.achieved
def test_custom_threshold(self):
v = VictoryCondition(
holds_king_title=True,
territory_control_pct=70.0,
majority_threshold=75.0,
)
assert not v.achieved
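Taken together, these five cases fully determine the predicate: the title is mandatory, the comparison is inclusive (51.0 passes a 51.0 threshold), and the default threshold is 51.0. A sketch under those constraints (the other field defaults are assumptions):

```python
from dataclasses import dataclass

@dataclass
class VictoryConditionSketch:
    """Hypothetical stand-in for bannerlord.models.VictoryCondition."""
    holds_king_title: bool = False
    territory_control_pct: float = 0.0
    majority_threshold: float = 51.0  # default implied by test_exact_threshold

    @property
    def achieved(self) -> bool:
        # >= rather than >, so the exact-threshold case passes
        return self.holds_king_title and self.territory_control_pct >= self.majority_threshold
```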

tests/unit/test_config.py

@@ -0,0 +1,890 @@
"""Unit tests for src/config.py — Settings, validation, and helper functions.
Refs #1172
"""
import os
from unittest.mock import MagicMock, patch
import pytest
pytestmark = pytest.mark.unit
# ── Helpers ──────────────────────────────────────────────────────────────────
def _make_settings(**env_overrides):
"""Create a fresh Settings instance with isolated env vars."""
from config import Settings
# Strip keys that might bleed in from the test environment
clean_env = {
k: v
for k, v in os.environ.items()
if not k.startswith(
(
"OLLAMA_",
"TIMMY_",
"AGENT_",
"DEBUG",
"GITEA_",
"GROK_",
"ANTHROPIC_",
"SPARK_",
"MEMORY_",
"MAX_",
"DISCORD_",
"TELEGRAM_",
"CORS_",
"TRUSTED_",
"L402_",
"LIGHTNING_",
"REPO_ROOT",
"RQLITE_",
"BRAIN_",
"SELF_MODIFY",
"WORK_ORDERS",
"VASSAL_",
"PAPERCLIP_",
"OPENFANG_",
"HERMES_",
"BACKLOG_",
"LOOP_QA",
"FOCUS_",
"THINKING_",
"HANDS_",
"WEEKLY_",
"AUTORESEARCH_",
"REWARD_",
"BROWSER_",
"GABS_",
"SCRIPTURE_",
"MCP_",
"CHAT_API",
"CSRF_",
"ERROR_",
"DB_",
"MODERATION_",
"SOVEREIGNTY_",
"XAI_",
"CLAUDE_",
"FLUX_",
"IMAGE_",
"MUSIC_",
"VIDEO_",
"CREATIVE_",
"WAN_",
"ACE_",
"GIT_",
)
)
}
clean_env.update(env_overrides)
with patch.dict(os.environ, clean_env, clear=True):
return Settings()
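The helper leans on patch.dict(..., clear=True), which empties os.environ for the duration of the with block and restores it on exit. The same isolation pattern in miniature (the function name here is illustrative only):

```python
import os
from unittest.mock import patch

def snapshot_isolated_env(overrides: dict) -> dict:
    """Return what os.environ looks like inside a cleared, patched context."""
    with patch.dict(os.environ, overrides, clear=True):
        # inside the block, only `overrides` is visible
        return dict(os.environ)

seen = snapshot_isolated_env({"AGENT_NAME": "Timmy"})
```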
# ── normalize_ollama_url ──────────────────────────────────────────────────────
class TestNormalizeOllamaUrl:
"""normalize_ollama_url replaces localhost with 127.0.0.1."""
def test_replaces_localhost(self):
from config import normalize_ollama_url
assert normalize_ollama_url("http://localhost:11434") == "http://127.0.0.1:11434"
def test_preserves_ip_address(self):
from config import normalize_ollama_url
assert normalize_ollama_url("http://192.168.1.5:11434") == "http://192.168.1.5:11434"
def test_preserves_non_localhost_hostname(self):
from config import normalize_ollama_url
assert normalize_ollama_url("http://ollama.local:11434") == "http://ollama.local:11434"
def test_replaces_multiple_occurrences(self):
from config import normalize_ollama_url
result = normalize_ollama_url("http://localhost:11434/localhost")
assert result == "http://127.0.0.1:11434/127.0.0.1"
def test_empty_string(self):
from config import normalize_ollama_url
assert normalize_ollama_url("") == ""
def test_127_0_0_1_unchanged(self):
from config import normalize_ollama_url
url = "http://127.0.0.1:11434"
assert normalize_ollama_url(url) == url
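Every case above is consistent with a bare substring replacement — a sketch, not necessarily the real implementation (note it would also rewrite a hostname like mylocalhost, which the tests don't exercise):

```python
def normalize_ollama_url_sketch(url: str) -> str:
    # replaces every occurrence, matching test_replaces_multiple_occurrences
    return url.replace("localhost", "127.0.0.1")
```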
# ── Settings defaults ─────────────────────────────────────────────────────────
class TestSettingsDefaults:
"""Settings instantiation produces correct defaults."""
def test_default_agent_name(self):
s = _make_settings()
assert s.agent_name == "Agent"
def test_default_ollama_url(self):
s = _make_settings()
assert s.ollama_url == "http://localhost:11434"
def test_default_ollama_model(self):
s = _make_settings()
assert s.ollama_model == "qwen3:14b"
def test_default_ollama_fast_model(self):
s = _make_settings()
assert s.ollama_fast_model == "qwen3:8b"
def test_default_ollama_num_ctx(self):
s = _make_settings()
assert s.ollama_num_ctx == 32768
def test_default_ollama_max_loaded_models(self):
s = _make_settings()
assert s.ollama_max_loaded_models == 2
def test_default_debug_false(self):
s = _make_settings()
assert s.debug is False
def test_default_timmy_env(self):
s = _make_settings()
assert s.timmy_env == "development"
def test_default_timmy_test_mode_false(self):
s = _make_settings()
assert s.timmy_test_mode is False
def test_default_spark_enabled(self):
s = _make_settings()
assert s.spark_enabled is True
def test_default_lightning_backend(self):
s = _make_settings()
assert s.lightning_backend == "mock"
def test_default_max_agent_steps(self):
s = _make_settings()
assert s.max_agent_steps == 10
def test_default_memory_prune_days(self):
s = _make_settings()
assert s.memory_prune_days == 90
def test_default_memory_prune_keep_facts(self):
s = _make_settings()
assert s.memory_prune_keep_facts is True
def test_default_fallback_models_is_list(self):
s = _make_settings()
assert isinstance(s.fallback_models, list)
assert len(s.fallback_models) > 0
def test_default_vision_fallback_models_is_list(self):
s = _make_settings()
assert isinstance(s.vision_fallback_models, list)
assert len(s.vision_fallback_models) > 0
def test_default_cors_origins_is_list(self):
s = _make_settings()
assert isinstance(s.cors_origins, list)
assert len(s.cors_origins) > 0
def test_default_trusted_hosts_is_list(self):
s = _make_settings()
assert isinstance(s.trusted_hosts, list)
assert "localhost" in s.trusted_hosts
def test_default_timmy_model_backend(self):
s = _make_settings()
assert s.timmy_model_backend == "ollama"
def test_default_grok_enabled_false(self):
s = _make_settings()
assert s.grok_enabled is False
def test_default_moderation_enabled(self):
s = _make_settings()
assert s.moderation_enabled is True
def test_default_moderation_threshold(self):
s = _make_settings()
assert s.moderation_threshold == 0.8
def test_default_telemetry_disabled(self):
s = _make_settings()
assert s.telemetry_enabled is False
def test_default_db_busy_timeout(self):
s = _make_settings()
assert s.db_busy_timeout_ms == 5000
def test_default_chat_api_max_body_bytes(self):
s = _make_settings()
assert s.chat_api_max_body_bytes == 1_048_576
def test_default_csrf_cookie_secure_false(self):
s = _make_settings()
assert s.csrf_cookie_secure is False
def test_default_self_modify_disabled(self):
s = _make_settings()
assert s.self_modify_enabled is False
def test_default_vassal_disabled(self):
s = _make_settings()
assert s.vassal_enabled is False
def test_default_focus_mode(self):
s = _make_settings()
assert s.focus_mode == "broad"
def test_default_thinking_enabled(self):
s = _make_settings()
assert s.thinking_enabled is True
def test_default_gitea_url(self):
s = _make_settings()
assert s.gitea_url == "http://localhost:3000"
def test_default_hermes_enabled(self):
s = _make_settings()
assert s.hermes_enabled is True
def test_default_scripture_enabled(self):
s = _make_settings()
assert s.scripture_enabled is True
# ── normalized_ollama_url property ───────────────────────────────────────────
class TestNormalizedOllamaUrlProperty:
"""normalized_ollama_url property applies normalize_ollama_url."""
def test_default_url_normalized(self):
s = _make_settings()
assert "127.0.0.1" in s.normalized_ollama_url
assert "localhost" not in s.normalized_ollama_url
def test_custom_url_with_localhost(self):
s = _make_settings(OLLAMA_URL="http://localhost:9999")
assert s.normalized_ollama_url == "http://127.0.0.1:9999"
def test_custom_url_without_localhost_unchanged(self):
s = _make_settings(OLLAMA_URL="http://192.168.1.5:11434")
assert s.normalized_ollama_url == "http://192.168.1.5:11434"
# ── Env var overrides ─────────────────────────────────────────────────────────
class TestSettingsEnvOverrides:
"""Environment variables override default values."""
def test_agent_name_override(self):
s = _make_settings(AGENT_NAME="Timmy")
assert s.agent_name == "Timmy"
def test_ollama_url_override(self):
s = _make_settings(OLLAMA_URL="http://10.0.0.1:11434")
assert s.ollama_url == "http://10.0.0.1:11434"
def test_ollama_model_override(self):
s = _make_settings(OLLAMA_MODEL="llama3.1")
assert s.ollama_model == "llama3.1"
def test_ollama_fast_model_override(self):
s = _make_settings(OLLAMA_FAST_MODEL="gemma:2b")
assert s.ollama_fast_model == "gemma:2b"
def test_ollama_num_ctx_override(self):
s = _make_settings(OLLAMA_NUM_CTX="8192")
assert s.ollama_num_ctx == 8192
def test_debug_true_from_string(self):
s = _make_settings(DEBUG="true")
assert s.debug is True
def test_debug_false_from_string(self):
s = _make_settings(DEBUG="false")
assert s.debug is False
def test_timmy_env_production(self):
s = _make_settings(TIMMY_ENV="production")
assert s.timmy_env == "production"
def test_timmy_test_mode_true(self):
s = _make_settings(TIMMY_TEST_MODE="true")
assert s.timmy_test_mode is True
def test_grok_enabled_override(self):
s = _make_settings(GROK_ENABLED="true")
assert s.grok_enabled is True
def test_spark_enabled_override(self):
s = _make_settings(SPARK_ENABLED="false")
assert s.spark_enabled is False
def test_memory_prune_days_override(self):
s = _make_settings(MEMORY_PRUNE_DAYS="30")
assert s.memory_prune_days == 30
def test_max_agent_steps_override(self):
s = _make_settings(MAX_AGENT_STEPS="25")
assert s.max_agent_steps == 25
def test_telegram_token_override(self):
s = _make_settings(TELEGRAM_TOKEN="tg-secret")
assert s.telegram_token == "tg-secret"
def test_discord_token_override(self):
s = _make_settings(DISCORD_TOKEN="dc-secret")
assert s.discord_token == "dc-secret"
def test_gitea_url_override(self):
s = _make_settings(GITEA_URL="http://10.0.0.1:3000")
assert s.gitea_url == "http://10.0.0.1:3000"
def test_gitea_repo_override(self):
s = _make_settings(GITEA_REPO="myorg/myrepo")
assert s.gitea_repo == "myorg/myrepo"
def test_focus_mode_deep(self):
s = _make_settings(FOCUS_MODE="deep")
assert s.focus_mode == "deep"
def test_thinking_interval_override(self):
s = _make_settings(THINKING_INTERVAL_SECONDS="60")
assert s.thinking_interval_seconds == 60
def test_hermes_interval_override(self):
s = _make_settings(HERMES_INTERVAL_SECONDS="60")
assert s.hermes_interval_seconds == 60
def test_vassal_enabled_override(self):
s = _make_settings(VASSAL_ENABLED="true")
assert s.vassal_enabled is True
def test_self_modify_enabled_override(self):
s = _make_settings(SELF_MODIFY_ENABLED="true")
assert s.self_modify_enabled is True
def test_moderation_enabled_override(self):
s = _make_settings(MODERATION_ENABLED="false")
assert s.moderation_enabled is False
def test_l402_hmac_secret_override(self):
s = _make_settings(L402_HMAC_SECRET="mysecret")
assert s.l402_hmac_secret == "mysecret"
def test_anthropic_api_key_override(self):
s = _make_settings(ANTHROPIC_API_KEY="sk-ant-abc")
assert s.anthropic_api_key == "sk-ant-abc"
# ── Type validation ───────────────────────────────────────────────────────────
class TestSettingsTypeValidation:
"""Pydantic correctly parses and validates types from string env vars."""
def test_bool_from_1(self):
s = _make_settings(DEBUG="1")
assert s.debug is True
def test_bool_from_0(self):
s = _make_settings(DEBUG="0")
assert s.debug is False
def test_int_field_rejects_non_numeric(self):
from pydantic import ValidationError
with pytest.raises(ValidationError):
_make_settings(OLLAMA_NUM_CTX="not_a_number")
def test_timmy_env_rejects_invalid_literal(self):
from pydantic import ValidationError
with pytest.raises(ValidationError):
_make_settings(TIMMY_ENV="staging")
def test_timmy_model_backend_rejects_invalid(self):
from pydantic import ValidationError
with pytest.raises(ValidationError):
_make_settings(TIMMY_MODEL_BACKEND="openai")
def test_timmy_model_backend_accepts_all_valid_values(self):
for backend in ("ollama", "grok", "claude", "auto"):
s = _make_settings(TIMMY_MODEL_BACKEND=backend)
assert s.timmy_model_backend == backend
def test_lightning_backend_accepts_mock(self):
s = _make_settings(LIGHTNING_BACKEND="mock")
assert s.lightning_backend == "mock"
def test_lightning_backend_accepts_lnd(self):
s = _make_settings(LIGHTNING_BACKEND="lnd")
assert s.lightning_backend == "lnd"
def test_lightning_backend_rejects_invalid(self):
from pydantic import ValidationError
with pytest.raises(ValidationError):
_make_settings(LIGHTNING_BACKEND="stripe")
def test_focus_mode_rejects_invalid(self):
from pydantic import ValidationError
with pytest.raises(ValidationError):
_make_settings(FOCUS_MODE="zen")
def test_extra_fields_ignored(self):
# model_config has extra="ignore"
s = _make_settings(TOTALLY_UNKNOWN_FIELD="hello")
assert not hasattr(s, "totally_unknown_field")
def test_float_field_moderation_threshold(self):
s = _make_settings(MODERATION_THRESHOLD="0.95")
assert s.moderation_threshold == pytest.approx(0.95)
def test_float_field_gabs_timeout(self):
s = _make_settings(GABS_TIMEOUT="10.5")
assert s.gabs_timeout == pytest.approx(10.5)
# ── Edge cases ────────────────────────────────────────────────────────────────
class TestSettingsEdgeCases:
"""Edge cases: empty strings, boundary values."""
def test_empty_string_tokens_stay_empty(self):
s = _make_settings(TELEGRAM_TOKEN="", DISCORD_TOKEN="")
assert s.telegram_token == ""
assert s.discord_token == ""
def test_zero_int_fields(self):
s = _make_settings(OLLAMA_NUM_CTX="0", MEMORY_PRUNE_DAYS="0")
assert s.ollama_num_ctx == 0
assert s.memory_prune_days == 0
def test_large_int_value(self):
s = _make_settings(CHAT_API_MAX_BODY_BYTES="104857600")
assert s.chat_api_max_body_bytes == 104857600
def test_negative_int_accepted(self):
# Pydantic doesn't constrain these to positive by default
s = _make_settings(MAX_AGENT_STEPS="-1")
assert s.max_agent_steps == -1
def test_empty_api_keys_are_strings(self):
s = _make_settings()
assert isinstance(s.anthropic_api_key, str)
assert isinstance(s.xai_api_key, str)
assert isinstance(s.gitea_token, str)
# ── _compute_repo_root ────────────────────────────────────────────────────────
class TestComputeRepoRoot:
"""_compute_repo_root auto-detects .git directory."""
def test_returns_non_empty_string(self):
from config import Settings
s = Settings()
result = s._compute_repo_root()
assert isinstance(result, str)
assert len(result) > 0
def test_explicit_repo_root_returned_directly(self):
from config import Settings
s = Settings()
s.repo_root = "/tmp/custom-repo"
assert s._compute_repo_root() == "/tmp/custom-repo"
def test_detects_git_directory(self):
from config import Settings
s = Settings()
result = s._compute_repo_root()
import os
# Detection may fall back to cwd when no .git is found; either way the path is absolute
assert os.path.isabs(result)
# ── model_post_init / gitea_token file fallback ───────────────────────────────
class TestModelPostInit:
"""model_post_init resolves gitea_token from file fallback."""
def test_gitea_token_from_env(self):
from config import Settings
with patch.dict(os.environ, {"GITEA_TOKEN": "env-token-abc"}, clear=False):
s = Settings()
assert s.gitea_token == "env-token-abc"
def test_gitea_token_stays_empty_when_no_file(self):
from config import Settings
env = {k: v for k, v in os.environ.items() if k != "GITEA_TOKEN"}
with patch.dict(os.environ, env, clear=True):
with patch("os.path.isfile", return_value=False):
s = Settings()
assert s.gitea_token == ""
def test_gitea_token_read_from_timmy_token_file(self, tmp_path):
"""model_post_init reads token from .timmy_gitea_token file."""
from config import Settings
token_file = tmp_path / ".timmy_gitea_token"
token_file.write_text("file-token-xyz\n")
env = {k: v for k, v in os.environ.items() if k != "GITEA_TOKEN"}
with patch.dict(os.environ, env, clear=True):
s = Settings()
# Override repo_root so post_init finds our temp file
def _fake_root():
return str(tmp_path)
s._compute_repo_root = _fake_root # type: ignore[method-assign]
# Re-run post_init logic manually since Settings is already created
s.gitea_token = ""
repo_root = _fake_root()
token_path = os.path.join(repo_root, ".timmy_gitea_token")
if os.path.isfile(token_path):
s.gitea_token = open(token_path).read().strip() # noqa: SIM115
assert s.gitea_token == "file-token-xyz"
def test_gitea_token_empty_file_stays_empty(self, tmp_path):
"""Empty token file leaves gitea_token as empty string."""
token_file = tmp_path / ".timmy_gitea_token"
token_file.write_text(" \n") # only whitespace
from config import Settings
env = {k: v for k, v in os.environ.items() if k != "GITEA_TOKEN"}
with patch.dict(os.environ, env, clear=True):
s = Settings()
# Simulate post_init with the tmp dir
s.gitea_token = ""
token_path = str(token_file)
if os.path.isfile(token_path):
token = open(token_path).read().strip() # noqa: SIM115
if token:
s.gitea_token = token
assert s.gitea_token == ""
# ── check_ollama_model_available ──────────────────────────────────────────────
class TestCheckOllamaModelAvailable:
"""check_ollama_model_available handles network responses and errors."""
def test_returns_false_on_oserror(self):
from config import check_ollama_model_available
with patch("urllib.request.urlopen", side_effect=OSError("Connection refused")):
assert check_ollama_model_available("llama3.1") is False
def test_returns_false_on_value_error(self):
from config import check_ollama_model_available
with patch("urllib.request.urlopen", side_effect=ValueError("Bad JSON")):
assert check_ollama_model_available("llama3.1") is False
def test_returns_true_prefix_match_with_tag_suffix(self):
import json
from config import check_ollama_model_available
response_data = json.dumps({"models": [{"name": "llama3.1:8b-instruct"}]}).encode()
mock_response = MagicMock()
mock_response.read.return_value = response_data
mock_response.__enter__ = lambda s: s
mock_response.__exit__ = MagicMock(return_value=False)
with patch("urllib.request.urlopen", return_value=mock_response):
assert check_ollama_model_available("llama3.1") is True
def test_returns_true_startswith_match(self):
import json
from config import check_ollama_model_available
response_data = json.dumps({"models": [{"name": "qwen3:14b"}]}).encode()
mock_response = MagicMock()
mock_response.read.return_value = response_data
mock_response.__enter__ = lambda s: s
mock_response.__exit__ = MagicMock(return_value=False)
with patch("urllib.request.urlopen", return_value=mock_response):
# "qwen3" matches "qwen3:14b" via startswith
assert check_ollama_model_available("qwen3") is True
def test_returns_false_when_model_not_found(self):
import json
from config import check_ollama_model_available
response_data = json.dumps({"models": [{"name": "qwen2.5:7b"}]}).encode()
mock_response = MagicMock()
mock_response.read.return_value = response_data
mock_response.__enter__ = lambda s: s
mock_response.__exit__ = MagicMock(return_value=False)
with patch("urllib.request.urlopen", return_value=mock_response):
assert check_ollama_model_available("llama3.1") is False
def test_returns_false_empty_model_list(self):
import json
from config import check_ollama_model_available
response_data = json.dumps({"models": []}).encode()
mock_response = MagicMock()
mock_response.read.return_value = response_data
mock_response.__enter__ = lambda s: s
mock_response.__exit__ = MagicMock(return_value=False)
with patch("urllib.request.urlopen", return_value=mock_response):
assert check_ollama_model_available("llama3.1") is False
def test_exact_name_match(self):
import json
from config import check_ollama_model_available
response_data = json.dumps({"models": [{"name": "qwen3:14b"}]}).encode()
mock_response = MagicMock()
mock_response.read.return_value = response_data
mock_response.__enter__ = lambda s: s
mock_response.__exit__ = MagicMock(return_value=False)
with patch("urllib.request.urlopen", return_value=mock_response):
assert check_ollama_model_available("qwen3:14b") is True
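The mocked responses pin down an exact-or-prefix match against the names returned by /api/tags. A sketch of that predicate (a real implementation might require the prefix to end at a ':' so "qwen" can't match "qwen2.5" — a case these tests don't distinguish):

```python
def model_available_sketch(model: str, tags_response: dict) -> bool:
    """Exact or prefix match against Ollama's /api/tags model list."""
    names = [m.get("name", "") for m in tags_response.get("models", [])]
    return any(name == model or name.startswith(model) for name in names)
```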
# ── get_effective_ollama_model ────────────────────────────────────────────────
class TestGetEffectiveOllamaModel:
"""get_effective_ollama_model walks fallback chain."""
def test_returns_primary_when_available(self):
from config import get_effective_ollama_model
with patch("config.check_ollama_model_available", return_value=True):
result = get_effective_ollama_model()
# Default is qwen3:14b
assert result == "qwen3:14b"
def test_falls_back_when_primary_unavailable(self):
from config import get_effective_ollama_model, settings
# Make primary unavailable, but one fallback available
fallback_target = settings.fallback_models[0]
def side_effect(model):
return model == fallback_target
with patch("config.check_ollama_model_available", side_effect=side_effect):
result = get_effective_ollama_model()
assert result == fallback_target
def test_returns_user_model_when_nothing_available(self):
from config import get_effective_ollama_model, settings
with patch("config.check_ollama_model_available", return_value=False):
result = get_effective_ollama_model()
# Last resort: returns user's configured model
assert result == settings.ollama_model
def test_skips_unavailable_fallbacks(self):
from config import get_effective_ollama_model, settings
# Only the last fallback is available
fallbacks = settings.fallback_models
last_fallback = fallbacks[-1]
def side_effect(model):
return model == last_fallback
with patch("config.check_ollama_model_available", side_effect=side_effect):
result = get_effective_ollama_model()
assert result == last_fallback
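The four cases imply a simple chain walk: try the primary, then each fallback in order, and hand back the user's configured model when nothing is reachable. As a sketch:

```python
from typing import Callable

def effective_model_sketch(
    primary: str,
    fallbacks: list[str],
    is_available: Callable[[str], bool],
) -> str:
    """Walk primary then fallbacks; last resort is the configured primary."""
    for candidate in (primary, *fallbacks):
        if is_available(candidate):
            return candidate
    return primary  # nothing available: return the user's choice anyway
```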
# ── validate_startup ──────────────────────────────────────────────────────────
class TestValidateStartup:
"""validate_startup enforces security in production, warns in dev."""
def setup_method(self):
import config
config._startup_validated = False
def test_skips_in_test_mode(self):
import config
with patch.dict(os.environ, {"TIMMY_TEST_MODE": "1"}):
config.validate_startup()
assert config._startup_validated is True
def test_dev_mode_does_not_exit(self):
import config
config._startup_validated = False
env = {k: v for k, v in os.environ.items() if k != "TIMMY_TEST_MODE"}
env["TIMMY_ENV"] = "development"
with patch.dict(os.environ, env, clear=True):
# Should not raise SystemExit
config.validate_startup()
assert config._startup_validated is True
def test_production_exits_without_l402_hmac_secret(self):
import config
config._startup_validated = False
with patch.object(config.settings, "timmy_env", "production"):
with patch.object(config.settings, "l402_hmac_secret", ""):
with patch.object(config.settings, "l402_macaroon_secret", ""):
with pytest.raises(SystemExit):
config.validate_startup(force=True)
def test_production_exits_without_l402_macaroon_secret(self):
import config
config._startup_validated = False
with patch.object(config.settings, "timmy_env", "production"):
with patch.object(config.settings, "l402_hmac_secret", "present"):
with patch.object(config.settings, "l402_macaroon_secret", ""):
with pytest.raises(SystemExit):
config.validate_startup(force=True)
def test_production_exits_with_cors_wildcard(self):
import config
config._startup_validated = False
with patch.object(config.settings, "timmy_env", "production"):
with patch.object(config.settings, "l402_hmac_secret", "secret1"):
with patch.object(config.settings, "l402_macaroon_secret", "secret2"):
with patch.object(config.settings, "cors_origins", ["*"]):
with pytest.raises(SystemExit):
config.validate_startup(force=True)
def test_production_passes_with_all_secrets_and_no_wildcard(self):
import config
config._startup_validated = False
with patch.object(config.settings, "timmy_env", "production"):
with patch.object(config.settings, "l402_hmac_secret", "secret1"):
with patch.object(config.settings, "l402_macaroon_secret", "secret2"):
with patch.object(config.settings, "cors_origins", ["http://localhost:3000"]):
config.validate_startup(force=True)
assert config._startup_validated is True
def test_idempotent_without_force(self):
import config
config._startup_validated = True
config.validate_startup()
assert config._startup_validated is True
def test_force_reruns_when_already_validated(self):
import config
config._startup_validated = True
with patch.dict(os.environ, {"TIMMY_TEST_MODE": "1"}):
config.validate_startup(force=True)
# Should have run (and set validated again)
assert config._startup_validated is True
def test_dev_warns_on_cors_wildcard(self, caplog):
import logging
import config
config._startup_validated = False
env = {k: v for k, v in os.environ.items() if k != "TIMMY_TEST_MODE"}
env["TIMMY_ENV"] = "development"
with patch.dict(os.environ, env, clear=True):
with patch.object(config.settings, "timmy_env", "development"):
with patch.object(config.settings, "cors_origins", ["*"]):
with patch.object(config.settings, "l402_hmac_secret", ""):
with patch.object(config.settings, "l402_macaroon_secret", ""):
with caplog.at_level(logging.WARNING, logger="config"):
config.validate_startup(force=True)
assert any("CORS" in rec.message for rec in caplog.records)
# ── APP_START_TIME ────────────────────────────────────────────────────────────
class TestAppStartTime:
"""APP_START_TIME is set at module load."""
def test_app_start_time_is_datetime(self):
from datetime import datetime
from config import APP_START_TIME
assert isinstance(APP_START_TIME, datetime)
def test_app_start_time_has_utc_timezone(self):
from config import APP_START_TIME
assert APP_START_TIME.tzinfo is not None
def test_app_start_time_is_in_the_past_or_now(self):
from datetime import UTC, datetime
from config import APP_START_TIME
assert APP_START_TIME <= datetime.now(UTC)
# ── Module-level singleton ────────────────────────────────────────────────────
class TestSettingsSingleton:
"""The module-level `settings` singleton is a Settings instance."""
def test_settings_is_settings_instance(self):
from config import Settings, settings
assert isinstance(settings, Settings)
def test_settings_repo_root_is_set(self):
from config import settings
assert isinstance(settings.repo_root, str)
def test_settings_has_expected_defaults(self):
from config import settings
# In test mode these may be overridden, but type should be correct
assert isinstance(settings.ollama_url, str)
assert isinstance(settings.debug, bool)


@@ -0,0 +1,449 @@
"""Unit tests for the Hermes health monitor.
Tests all five checks (memory, disk, Ollama, processes, network) using mocks
so no real subprocesses or network calls are made.
Refs: #1073
"""
import json
from unittest.mock import MagicMock, patch
import pytest
from infrastructure.hermes.monitor import CheckResult, HealthLevel, HealthReport, HermesMonitor
@pytest.fixture()
def monitor():
return HermesMonitor()
# ── Unit helpers ──────────────────────────────────────────────────────────────
class _FakeHTTPResponse:
"""Minimal urllib response stub."""
def __init__(self, body: bytes, status: int = 200):
self._body = body
self.status = status
def read(self) -> bytes:
return self._body
def __enter__(self):
return self
def __exit__(self, *_):
pass
# ── Memory check ──────────────────────────────────────────────────────────────
def test_get_memory_info_parses_vm_stat(monitor):
vm_stat_output = (
"Mach Virtual Memory Statistics: (page size of 16384 bytes)\n"
"Pages free: 12800.\n"
"Pages active: 50000.\n"
"Pages inactive: 25600.\n"
"Pages speculative: 1000.\n"
)
with (
patch("subprocess.run") as mock_run,
):
# First call: sysctl hw.memsize (total)
sysctl_result = MagicMock()
sysctl_result.stdout = "68719476736\n" # 64 GB
# Second call: vm_stat
vmstat_result = MagicMock()
vmstat_result.stdout = vm_stat_output
mock_run.side_effect = [sysctl_result, vmstat_result]
info = monitor._get_memory_info()
assert info["total_gb"] == pytest.approx(64.0, abs=0.1)
# free (12800) + inactive (25600) = 38400 pages * 16384 bytes/page = 629145600 bytes ≈ 0.586 GB
expected_free_gb = (38400 * 16384) / (1024**3)
assert info["free_gb"] == pytest.approx(expected_free_gb, abs=0.001)
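The expected numbers correspond to a parser roughly like the following; the regexes and the free + inactive accounting are inferred from the assertions, not taken from the real monitor:

```python
import re

def parse_vm_stat_sketch(vm_stat_output: str, total_bytes: int) -> dict:
    """Derive total/free GB from `sysctl hw.memsize` and `vm_stat` text."""
    page_size = int(re.search(r"page size of (\d+) bytes", vm_stat_output).group(1))
    pages = {
        m.group(1): int(m.group(2))
        for m in re.finditer(r"Pages (\w+):\s*(\d+)\.", vm_stat_output)
    }
    free_pages = pages.get("free", 0) + pages.get("inactive", 0)
    return {
        "total_gb": total_bytes / (1024**3),
        "free_gb": free_pages * page_size / (1024**3),
    }

sample = (
    "Mach Virtual Memory Statistics: (page size of 16384 bytes)\n"
    "Pages free: 12800.\n"
    "Pages inactive: 25600.\n"
)
info = parse_vm_stat_sketch(sample, 68719476736)  # 64 GiB total
```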
def test_get_memory_info_handles_subprocess_failure(monitor):
with patch("subprocess.run", side_effect=OSError("no sysctl")):
info = monitor._get_memory_info()
assert info["total_gb"] == 0.0
assert info["free_gb"] == 0.0
@pytest.mark.asyncio
async def test_check_memory_ok(monitor):
with patch.object(
monitor, "_get_memory_info", return_value={"free_gb": 20.0, "total_gb": 64.0}
):
result = await monitor._check_memory()
assert result.name == "memory"
assert result.level == HealthLevel.OK
assert "20.0GB" in result.message
@pytest.mark.asyncio
async def test_check_memory_low_triggers_unload(monitor):
with (
patch.object(monitor, "_get_memory_info", return_value={"free_gb": 2.0, "total_gb": 64.0}),
patch.object(monitor, "_unload_ollama_models", return_value=2),
):
result = await monitor._check_memory()
assert result.level == HealthLevel.WARNING
assert result.auto_resolved is True
assert "unloaded 2" in result.message
@pytest.mark.asyncio
async def test_check_memory_critical_no_models_to_unload(monitor):
with (
patch.object(monitor, "_get_memory_info", return_value={"free_gb": 1.0, "total_gb": 64.0}),
patch.object(monitor, "_unload_ollama_models", return_value=0),
):
result = await monitor._check_memory()
assert result.level == HealthLevel.CRITICAL
assert result.needs_human is True
@pytest.mark.asyncio
async def test_check_memory_exception_returns_unknown(monitor):
with patch.object(monitor, "_get_memory_info", side_effect=RuntimeError("boom")):
result = await monitor._check_memory()
assert result.level == HealthLevel.UNKNOWN
# ── Disk check ────────────────────────────────────────────────────────────────
@pytest.mark.asyncio
async def test_check_disk_ok(monitor):
usage = MagicMock()
usage.free = 100 * (1024**3) # 100 GB
usage.total = 500 * (1024**3) # 500 GB
usage.used = 400 * (1024**3)
with patch("shutil.disk_usage", return_value=usage):
result = await monitor._check_disk()
assert result.level == HealthLevel.OK
assert "100.0GB free" in result.message
@pytest.mark.asyncio
async def test_check_disk_low_triggers_cleanup(monitor):
usage = MagicMock()
usage.free = 5 * (1024**3) # 5 GB — below threshold
usage.total = 500 * (1024**3)
usage.used = 495 * (1024**3)
with (
patch("shutil.disk_usage", return_value=usage),
patch.object(monitor, "_cleanup_temp_files", return_value=2.5),
):
result = await monitor._check_disk()
assert result.level == HealthLevel.WARNING
assert result.auto_resolved is True
assert "cleaned 2.50GB" in result.message
@pytest.mark.asyncio
async def test_check_disk_critical_when_cleanup_fails(monitor):
usage = MagicMock()
usage.free = 5 * (1024**3)
usage.total = 500 * (1024**3)
usage.used = 495 * (1024**3)
with (
patch("shutil.disk_usage", return_value=usage),
patch.object(monitor, "_cleanup_temp_files", return_value=0.0),
):
result = await monitor._check_disk()
assert result.level == HealthLevel.CRITICAL
assert result.needs_human is True
# ── Ollama check ──────────────────────────────────────────────────────────────
def test_get_ollama_status_reachable(monitor):
tags_body = json.dumps({"models": [{"name": "qwen3:30b"}, {"name": "llama3.1:8b"}]}).encode()
ps_body = json.dumps({"models": [{"name": "qwen3:30b", "size": 1000}]}).encode()
responses = [
_FakeHTTPResponse(tags_body),
_FakeHTTPResponse(ps_body),
]
with patch("urllib.request.urlopen", side_effect=responses):
status = monitor._get_ollama_status()
assert status["reachable"] is True
assert len(status["models"]) == 2
assert len(status["loaded_models"]) == 1
def test_get_ollama_status_unreachable(monitor):
with patch("urllib.request.urlopen", side_effect=OSError("connection refused")):
status = monitor._get_ollama_status()
assert status["reachable"] is False
assert status["models"] == []
assert status["loaded_models"] == []
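The tests above mock `urllib.request.urlopen` against Ollama's `/api/tags` and `/api/ps` endpoints. A minimal sketch of the probe they imply (function name and URL layout inferred from the fakes; the real `_get_ollama_status` may differ):

```python
import json
import urllib.request


def get_ollama_status(base_url: str = "http://localhost:11434") -> dict:
    """Probe Ollama; degrade to an empty status instead of raising."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=2) as resp:
            models = json.loads(resp.read()).get("models", [])
        with urllib.request.urlopen(f"{base_url}/api/ps", timeout=2) as resp:
            loaded = json.loads(resp.read()).get("models", [])
        return {"reachable": True, "models": models, "loaded_models": loaded}
    except OSError:
        # Connection refused / timeout — report unreachable, never crash the check.
        return {"reachable": False, "models": [], "loaded_models": []}
```

Returning a dict with empty lists on failure is what lets the caller's `_check_ollama` distinguish "unreachable" from an exception.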
@pytest.mark.asyncio
async def test_check_ollama_ok(monitor):
status = {
"reachable": True,
"models": [{"name": "qwen3:30b"}],
"loaded_models": [],
}
with patch.object(monitor, "_get_ollama_status", return_value=status):
result = await monitor._check_ollama()
assert result.level == HealthLevel.OK
assert result.details["reachable"] is True
@pytest.mark.asyncio
async def test_check_ollama_unreachable_restart_success(monitor):
status = {"reachable": False, "models": [], "loaded_models": []}
with (
patch.object(monitor, "_get_ollama_status", return_value=status),
patch.object(monitor, "_restart_ollama", return_value=True),
):
result = await monitor._check_ollama()
assert result.level == HealthLevel.WARNING
assert result.auto_resolved is True
@pytest.mark.asyncio
async def test_check_ollama_unreachable_restart_fails(monitor):
status = {"reachable": False, "models": [], "loaded_models": []}
with (
patch.object(monitor, "_get_ollama_status", return_value=status),
patch.object(monitor, "_restart_ollama", return_value=False),
):
result = await monitor._check_ollama()
assert result.level == HealthLevel.CRITICAL
assert result.needs_human is True
# ── Process check ─────────────────────────────────────────────────────────────
def test_get_zombie_processes_none(monitor):
ps_output = (
"USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND\n"
"alex 123 0.1 0.2 100 200 s0 S 1:00 0:01 python\n"
"alex 456 0.0 0.1 50 100 s0 S 1:01 0:00 bash\n"
)
result = MagicMock()
result.stdout = ps_output
with patch("subprocess.run", return_value=result):
info = monitor._get_zombie_processes()
assert info["zombies"] == []
def test_get_zombie_processes_found(monitor):
ps_output = (
"USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND\n"
"alex 123 0.1 0.2 100 200 s0 S 1:00 0:01 python\n"
"alex 789 0.0 0.0 0 0 s0 Z 1:02 0:00 defunct\n"
)
result = MagicMock()
result.stdout = ps_output
with patch("subprocess.run", return_value=result):
info = monitor._get_zombie_processes()
assert len(info["zombies"]) == 1
assert info["zombies"][0]["pid"] == "789"
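The fixtures above pin down the `ps` parsing contract: the header row is skipped, and a `STAT` field (8th column) starting with `Z` marks a zombie. A standalone sketch of that parse (column positions inferred from the fixture output; the real helper may capture more fields):

```python
def find_zombies(ps_output: str) -> list[dict]:
    """Return [{'pid': ..., 'command': ...}] for rows whose STAT starts with 'Z'."""
    zombies = []
    for row in ps_output.splitlines()[1:]:  # skip the USER/PID/... header row
        parts = row.split()
        if len(parts) >= 11 and parts[7].startswith("Z"):
            zombies.append({"pid": parts[1], "command": parts[10]})
    return zombies
```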
@pytest.mark.asyncio
async def test_check_processes_no_zombies(monitor):
with patch.object(monitor, "_get_zombie_processes", return_value={"zombies": []}):
result = await monitor._check_processes()
assert result.level == HealthLevel.OK
@pytest.mark.asyncio
async def test_check_processes_zombies_warning(monitor):
zombies = [{"pid": "100", "command": "defunct"}, {"pid": "101", "command": "defunct"}]
with patch.object(monitor, "_get_zombie_processes", return_value={"zombies": zombies}):
result = await monitor._check_processes()
assert result.level == HealthLevel.WARNING
assert result.needs_human is False # Only 2, threshold is >3
@pytest.mark.asyncio
async def test_check_processes_many_zombies_needs_human(monitor):
zombies = [{"pid": str(i), "command": "defunct"} for i in range(5)]
with patch.object(monitor, "_get_zombie_processes", return_value={"zombies": zombies}):
result = await monitor._check_processes()
assert result.needs_human is True
# ── Network check ─────────────────────────────────────────────────────────────
def test_check_gitea_connectivity_ok(monitor):
body = json.dumps({"version": "1.22.0"}).encode()
with patch("urllib.request.urlopen", return_value=_FakeHTTPResponse(body, status=200)):
info = monitor._check_gitea_connectivity()
assert info["reachable"] is True
assert info["latency_ms"] >= 0
def test_check_gitea_connectivity_unreachable(monitor):
with patch("urllib.request.urlopen", side_effect=OSError("refused")):
info = monitor._check_gitea_connectivity()
assert info["reachable"] is False
assert "error" in info
@pytest.mark.asyncio
async def test_check_network_ok(monitor):
with patch.object(
monitor,
"_check_gitea_connectivity",
return_value={"reachable": True, "latency_ms": 5.0, "url": "http://localhost:3000"},
):
result = await monitor._check_network()
assert result.level == HealthLevel.OK
assert "Gitea reachable" in result.message
@pytest.mark.asyncio
async def test_check_network_unreachable(monitor):
with patch.object(
monitor,
"_check_gitea_connectivity",
return_value={"reachable": False, "error": "refused", "url": "http://localhost:3000"},
):
result = await monitor._check_network()
assert result.level == HealthLevel.WARNING
assert result.needs_human is True
# ── Full cycle ────────────────────────────────────────────────────────────────
@pytest.mark.asyncio
async def test_run_cycle_all_ok(monitor):
ok_result = CheckResult(name="test", level=HealthLevel.OK, message="ok")
async def _ok_check():
return ok_result
with (
patch.object(monitor, "_check_memory", _ok_check),
patch.object(monitor, "_check_disk", _ok_check),
patch.object(monitor, "_check_ollama", _ok_check),
patch.object(monitor, "_check_processes", _ok_check),
patch.object(monitor, "_check_network", _ok_check),
patch.object(monitor, "_handle_alerts"),
):
report = await monitor.run_cycle()
assert report.overall == HealthLevel.OK
assert not report.has_issues
assert monitor.last_report is report
@pytest.mark.asyncio
async def test_run_cycle_sets_overall_to_worst(monitor):
async def _ok():
return CheckResult(name="ok", level=HealthLevel.OK, message="ok")
async def _critical():
return CheckResult(name="critical", level=HealthLevel.CRITICAL, message="bad")
with (
patch.object(monitor, "_check_memory", _ok),
patch.object(monitor, "_check_disk", _critical),
patch.object(monitor, "_check_ollama", _ok),
patch.object(monitor, "_check_processes", _ok),
patch.object(monitor, "_check_network", _ok),
patch.object(monitor, "_handle_alerts"),
):
report = await monitor.run_cycle()
assert report.overall == HealthLevel.CRITICAL
assert report.has_issues is True
@pytest.mark.asyncio
async def test_run_cycle_exception_becomes_unknown(monitor):
async def _ok():
return CheckResult(name="ok", level=HealthLevel.OK, message="ok")
async def _boom():
raise RuntimeError("unexpected error")
with (
patch.object(monitor, "_check_memory", _ok),
patch.object(monitor, "_check_disk", _ok),
patch.object(monitor, "_check_ollama", _boom),
patch.object(monitor, "_check_processes", _ok),
patch.object(monitor, "_check_network", _ok),
patch.object(monitor, "_handle_alerts"),
):
report = await monitor.run_cycle()
levels = {c.level for c in report.checks}
assert HealthLevel.UNKNOWN in levels
# ── to_dict serialisation ────────────────────────────────────────────────────
def test_check_result_to_dict():
c = CheckResult(
name="memory",
level=HealthLevel.WARNING,
message="low",
details={"free_gb": 3.5},
auto_resolved=True,
)
d = c.to_dict()
assert d["name"] == "memory"
assert d["level"] == "warning"
assert d["auto_resolved"] is True
assert d["details"]["free_gb"] == 3.5
def test_health_report_to_dict():
checks = [
CheckResult(name="disk", level=HealthLevel.OK, message="ok"),
]
report = HealthReport(
timestamp="2026-01-01T00:00:00+00:00",
checks=checks,
overall=HealthLevel.OK,
)
d = report.to_dict()
assert d["overall"] == "ok"
assert d["has_issues"] is False
assert len(d["checks"]) == 1
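The serialisation assertions pin down the shape of `CheckResult` and `HealthReport`. A sketch consistent with them (field names and lowercase enum values inferred from the assertions; the real module may carry more fields):

```python
from dataclasses import dataclass, field
from enum import Enum


class HealthLevel(Enum):
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"
    UNKNOWN = "unknown"


@dataclass
class CheckResult:
    name: str
    level: HealthLevel
    message: str
    details: dict = field(default_factory=dict)
    auto_resolved: bool = False
    needs_human: bool = False

    def to_dict(self) -> dict:
        # Enum serialises to its lowercase string value, per the assertions.
        return {
            "name": self.name,
            "level": self.level.value,
            "message": self.message,
            "details": self.details,
            "auto_resolved": self.auto_resolved,
            "needs_human": self.needs_human,
        }


@dataclass
class HealthReport:
    timestamp: str
    checks: list
    overall: HealthLevel

    @property
    def has_issues(self) -> bool:
        return self.overall is not HealthLevel.OK

    def to_dict(self) -> dict:
        return {
            "timestamp": self.timestamp,
            "overall": self.overall.value,
            "has_issues": self.has_issues,
            "checks": [c.to_dict() for c in self.checks],
        }
```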

View File

@@ -9,19 +9,15 @@ Refs: #1105
from __future__ import annotations
import json
import tempfile
from datetime import UTC, datetime, timedelta
from pathlib import Path
import pytest
from timmy_automations.retrain.quality_filter import QualityFilter, TrajectoryQuality
from timmy_automations.retrain.retrain import RetrainOrchestrator
from timmy_automations.retrain.training_dataset import TrainingDataset
from timmy_automations.retrain.training_log import CycleMetrics, TrainingLog
from timmy_automations.retrain.trajectory_exporter import Trajectory, TrajectoryExporter
# ── Fixtures ─────────────────────────────────────────────────────────────────
@@ -382,7 +378,7 @@ class TestTrainingDataset:
ds = TrainingDataset(repo_root=tmp_path)
ds.append([self._make_result()], "2026-W12")
with open(ds.dataset_path) as f:
- lines = [l.strip() for l in f if l.strip()]
+ lines = [line.strip() for line in f if line.strip()]
assert len(lines) == 1
record = json.loads(lines[0])
assert "messages" in record

View File

@@ -0,0 +1,272 @@
"""Unit tests for the sovereignty metrics emitter and store.
Refs: #954
"""
from unittest.mock import AsyncMock, patch
import pytest
from timmy.sovereignty.metrics import (
ALL_EVENT_TYPES,
SovereigntyMetricsStore,
emit_sovereignty_event,
get_cost_per_hour,
get_skills_crystallized,
get_sovereignty_pct,
record,
)
pytestmark = pytest.mark.unit
@pytest.fixture
def store(tmp_path):
"""A fresh SovereigntyMetricsStore backed by a temp database."""
return SovereigntyMetricsStore(db_path=tmp_path / "test_sov.db")
# ── ALL_EVENT_TYPES ───────────────────────────────────────────────────────────
class TestEventTypes:
def test_all_expected_event_types_present(self):
expected = {
"perception_cache_hit",
"perception_vlm_call",
"decision_rule_hit",
"decision_llm_call",
"narration_template",
"narration_llm",
"skill_crystallized",
"api_call",
"api_cost",
}
assert ALL_EVENT_TYPES == expected
# ── Record & retrieval ────────────────────────────────────────────────────────
class TestRecord:
def test_record_inserts_event(self, store):
store.record("perception_cache_hit")
pct = store.get_sovereignty_pct("perception")
assert pct == 100.0
def test_record_with_metadata(self, store):
store.record("api_cost", metadata={"usd": 0.05})
cost = store.get_cost_per_hour()
assert cost > 0.0
def test_record_with_session_id(self, store):
store.record("skill_crystallized", session_id="sess-1")
assert store.get_skills_crystallized("sess-1") == 1
def test_record_unknown_type_does_not_raise(self, store):
"""Unknown event types are silently stored (no crash)."""
store.record("totally_unknown_event") # should not raise
# ── Sessions ──────────────────────────────────────────────────────────────────
class TestSessions:
def test_start_session_returns_id(self, store):
sid = store.start_session(game="Bannerlord")
assert isinstance(sid, str)
assert len(sid) > 0
def test_start_session_accepts_custom_id(self, store):
sid = store.start_session(game="Bannerlord", session_id="my-session")
assert sid == "my-session"
def test_end_session_does_not_raise(self, store):
sid = store.start_session()
store.end_session(sid) # should not raise
def test_start_session_idempotent(self, store):
"""Starting a session with the same ID twice is a no-op."""
sid = store.start_session(session_id="dup")
sid2 = store.start_session(session_id="dup")
assert sid == sid2
# ── Sovereignty percentage ────────────────────────────────────────────────────
class TestGetSovereigntyPct:
def test_perception_all_cache_hits(self, store):
for _ in range(5):
store.record("perception_cache_hit")
assert store.get_sovereignty_pct("perception") == 100.0
def test_perception_mixed(self, store):
store.record("perception_cache_hit")
store.record("perception_vlm_call")
assert store.get_sovereignty_pct("perception") == 50.0
def test_decision_all_sovereign(self, store):
for _ in range(3):
store.record("decision_rule_hit")
assert store.get_sovereignty_pct("decision") == 100.0
def test_narration_all_sovereign(self, store):
store.record("narration_template")
store.record("narration_template")
assert store.get_sovereignty_pct("narration") == 100.0
def test_narration_all_llm(self, store):
store.record("narration_llm")
assert store.get_sovereignty_pct("narration") == 0.0
def test_no_events_returns_zero(self, store):
assert store.get_sovereignty_pct("perception") == 0.0
def test_unknown_layer_returns_zero(self, store):
assert store.get_sovereignty_pct("nonexistent_layer") == 0.0
def test_time_window_filters_old_events(self, store, tmp_path):
"""Events outside the time window are excluded."""
# Insert an event with a very old timestamp directly
import sqlite3
from contextlib import closing
with closing(sqlite3.connect(str(store._db_path))) as conn:
conn.execute(
"INSERT INTO events (timestamp, event_type, session_id, metadata_json) VALUES (?, ?, ?, ?)",
("2000-01-01T00:00:00+00:00", "perception_cache_hit", "", "{}"),
)
conn.commit()
# With a 60-second window, the old event should be excluded
pct = store.get_sovereignty_pct("perception", time_window=60)
assert pct == 0.0
def test_time_window_includes_recent_events(self, store):
store.record("decision_rule_hit")
pct = store.get_sovereignty_pct("decision", time_window=60)
assert pct == 100.0
# ── Cost per hour ─────────────────────────────────────────────────────────────
class TestGetCostPerHour:
def test_no_events_returns_zero(self, store):
assert store.get_cost_per_hour() == 0.0
def test_single_cost_event(self, store):
# Record a cost of $1.00 within the last hour window
store.record("api_cost", metadata={"usd": 1.00})
cost = store.get_cost_per_hour(time_window=3600)
assert cost == pytest.approx(1.00, rel=1e-3)
def test_multiple_cost_events(self, store):
store.record("api_cost", metadata={"usd": 0.25})
store.record("api_cost", metadata={"usd": 0.75})
cost = store.get_cost_per_hour(time_window=3600)
assert cost == pytest.approx(1.00, rel=1e-3)
def test_missing_usd_field_is_zero(self, store):
store.record("api_cost", metadata={"model": "gpt-4"})
assert store.get_cost_per_hour() == 0.0
def test_cost_extrapolated_for_short_window(self, store):
"""Cost recorded in an 1800-second window is doubled to get the per-hour rate."""
store.record("api_cost", metadata={"usd": 0.5})
cost = store.get_cost_per_hour(time_window=1800)
assert cost == pytest.approx(1.0, rel=1e-3)
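The extrapolation test encodes the rate formula: costs observed inside the window are scaled by `3600 / time_window` to yield a per-hour figure. In isolation:

```python
def cost_per_hour(usd_amounts: list[float], time_window: int = 3600) -> float:
    """Extrapolate costs seen in `time_window` seconds to an hourly rate."""
    return sum(usd_amounts) * 3600.0 / time_window
```

So $0.50 seen in a 1800-second window reports as $1.00/hour, matching the test above.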
# ── Skills crystallised ───────────────────────────────────────────────────────
class TestGetSkillsCrystallized:
def test_no_skills_returns_zero(self, store):
assert store.get_skills_crystallized() == 0
def test_counts_all_sessions(self, store):
store.record("skill_crystallized", session_id="a")
store.record("skill_crystallized", session_id="b")
assert store.get_skills_crystallized() == 2
def test_filters_by_session(self, store):
store.record("skill_crystallized", session_id="sess-1")
store.record("skill_crystallized", session_id="sess-2")
assert store.get_skills_crystallized("sess-1") == 1
def test_session_with_no_skills(self, store):
store.record("skill_crystallized", session_id="sess-1")
assert store.get_skills_crystallized("sess-999") == 0
# ── Snapshot ──────────────────────────────────────────────────────────────────
class TestGetSnapshot:
def test_snapshot_structure(self, store):
snap = store.get_snapshot()
assert "sovereignty" in snap
assert "cost_per_hour" in snap
assert "skills_crystallized" in snap
def test_snapshot_sovereignty_has_all_layers(self, store):
snap = store.get_snapshot()
assert set(snap["sovereignty"].keys()) == {"perception", "decision", "narration"}
def test_snapshot_reflects_events(self, store):
store.record("perception_cache_hit")
store.record("skill_crystallized")
snap = store.get_snapshot()
assert snap["sovereignty"]["perception"] == 100.0
assert snap["skills_crystallized"] == 1
# ── Module-level convenience functions ───────────────────────────────────────
class TestModuleLevelFunctions:
def test_record_and_get_sovereignty_pct(self, tmp_path):
with (
patch("timmy.sovereignty.metrics._store", None),
patch("timmy.sovereignty.metrics.DB_PATH", tmp_path / "fn_test.db"),
):
record("decision_rule_hit")
pct = get_sovereignty_pct("decision")
assert pct == 100.0
def test_get_cost_per_hour_module_fn(self, tmp_path):
with (
patch("timmy.sovereignty.metrics._store", None),
patch("timmy.sovereignty.metrics.DB_PATH", tmp_path / "fn_test2.db"),
):
record("api_cost", {"usd": 0.5})
cost = get_cost_per_hour()
assert cost > 0.0
def test_get_skills_crystallized_module_fn(self, tmp_path):
with (
patch("timmy.sovereignty.metrics._store", None),
patch("timmy.sovereignty.metrics.DB_PATH", tmp_path / "fn_test3.db"),
):
record("skill_crystallized")
count = get_skills_crystallized()
assert count == 1
# ── emit_sovereignty_event ────────────────────────────────────────────────────
class TestEmitSovereigntyEvent:
@pytest.mark.asyncio
async def test_emit_records_and_publishes(self, tmp_path):
with (
patch("timmy.sovereignty.metrics._store", None),
patch("timmy.sovereignty.metrics.DB_PATH", tmp_path / "emit_test.db"),
patch("infrastructure.events.bus.emit", new_callable=AsyncMock) as mock_emit,
):
await emit_sovereignty_event("perception_cache_hit", {"frame": 42}, session_id="s1")
mock_emit.assert_called_once()
args = mock_emit.call_args[0]
assert args[0] == "sovereignty.event.perception_cache_hit"

View File

@@ -6,7 +6,6 @@ import pytest
from timmy.vassal.agent_health import AgentHealthReport, AgentStatus
# ---------------------------------------------------------------------------
# AgentStatus
# ---------------------------------------------------------------------------
@@ -49,9 +48,7 @@ def test_report_any_stuck():
def test_report_all_idle():
- report = AgentHealthReport(
-     agents=[AgentStatus(agent="claude"), AgentStatus(agent="kimi")]
- )
+ report = AgentHealthReport(agents=[AgentStatus(agent="claude"), AgentStatus(agent="kimi")])
assert report.all_idle is True

View File

@@ -6,14 +6,12 @@ import pytest
from timmy.vassal.backlog import (
AgentTarget,
TriagedIssue,
_choose_agent,
_extract_labels,
_score_priority,
triage_issues,
)
# ---------------------------------------------------------------------------
# _extract_labels
# ---------------------------------------------------------------------------

View File

@@ -12,7 +12,6 @@ from timmy.vassal.house_health import (
_probe_disk,
)
# ---------------------------------------------------------------------------
# Data model tests
# ---------------------------------------------------------------------------

View File

@@ -6,7 +6,6 @@ import pytest
from timmy.vassal.orchestration_loop import VassalCycleRecord, VassalOrchestrator
# ---------------------------------------------------------------------------
# VassalCycleRecord
# ---------------------------------------------------------------------------
@@ -134,6 +133,6 @@ def test_orchestrator_stop_when_not_running():
def test_module_singleton_exists():
- from timmy.vassal import vassal_orchestrator, VassalOrchestrator
+ from timmy.vassal import VassalOrchestrator, vassal_orchestrator
assert isinstance(vassal_orchestrator, VassalOrchestrator)

View File

@@ -0,0 +1 @@
"""Cognitive benchmark levels for Project Bannerlord readiness testing."""

View File

@@ -0,0 +1,183 @@
"""Level 0: JSON Compliance — Coin Flip.
Tests whether the model can reliably return well-formed JSON responses
with a specific schema. This is the minimum bar for GABS tool calls.
"""
import json
import time
from dataclasses import dataclass, field
from typing import Any
LEVEL = 0
NAME = "JSON Compliance (Coin Flip)"
DESCRIPTION = "Model must return valid JSON matching a strict schema on each trial."
SYSTEM_PROMPT = """You are a strategic AI agent. You MUST respond ONLY with valid JSON.
No markdown, no explanation, no code fences. Raw JSON only."""
TRIALS = [
{
"prompt": (
'A coin is flipped. Respond with exactly: {"choice": "heads"} or {"choice": "tails"}. '
"Pick one. JSON only."
),
"schema": {"choice": str},
"valid_values": {"choice": ["heads", "tails"]},
},
{
"prompt": (
'You must attack or defend. Respond with: {"action": "attack", "confidence": 0.8} '
'or {"action": "defend", "confidence": 0.6}. Replace confidence with your own value 0.0-1.0. JSON only.'
),
"schema": {"action": str, "confidence": float},
"valid_values": {"action": ["attack", "defend"]},
},
{
"prompt": (
'Choose a direction to march. Respond with exactly: '
'{"direction": "north", "reason": "string explaining why"}. '
"Pick north/south/east/west. JSON only."
),
"schema": {"direction": str, "reason": str},
"valid_values": {"direction": ["north", "south", "east", "west"]},
},
]
@dataclass
class TrialResult:
trial_index: int
prompt: str
raw_response: str
parsed: dict | None
valid_json: bool
schema_valid: bool
value_valid: bool
latency_ms: float
error: str = ""
@dataclass
class LevelResult:
level: int = LEVEL
name: str = NAME
trials: list[TrialResult] = field(default_factory=list)
passed: bool = False
score: float = 0.0
latency_p50_ms: float = 0.0
latency_p99_ms: float = 0.0
def _validate_schema(parsed: dict, schema: dict[str, type]) -> bool:
for key, expected_type in schema.items():
if key not in parsed:
return False
if not isinstance(parsed[key], expected_type):
# Allow int where float is expected
if expected_type is float and isinstance(parsed[key], int):
continue
return False
return True
def _validate_values(parsed: dict, valid_values: dict[str, list]) -> bool:
for key, valid_list in valid_values.items():
if key in parsed and parsed[key] not in valid_list:
return False
return True
def _clean_response(raw: str) -> str:
"""Strip markdown fences if model wrapped JSON in them."""
raw = raw.strip()
if raw.startswith("```"):
lines = raw.splitlines()
# Drop every fence line (opening ``` or ```json and the closing ```)
lines = [line for line in lines if not line.startswith("```")]
raw = "\n".join(lines).strip()
return raw
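Models often wrap JSON in ```json fences despite the "raw JSON only" instruction, so the fence-stripping step matters. The helper re-stated as a self-contained demo:

```python
import json


def clean_response(raw: str) -> str:
    """Strip markdown code fences so the payload parses as JSON."""
    raw = raw.strip()
    if raw.startswith("```"):
        raw = "\n".join(
            line for line in raw.splitlines() if not line.startswith("```")
        ).strip()
    return raw


wrapped = '```json\n{"choice": "heads"}\n```'
payload = json.loads(clean_response(wrapped))
```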
def run(client: Any, model: str, verbose: bool = False) -> LevelResult:
result = LevelResult()
latencies = []
for i, trial in enumerate(TRIALS):
t0 = time.time()
try:
response = client.chat(
model=model,
messages=[
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": trial["prompt"]},
],
options={"temperature": 0.1},
)
raw = response["message"]["content"]
latency_ms = (time.time() - t0) * 1000
except Exception as exc:
latency_ms = (time.time() - t0) * 1000
tr = TrialResult(
trial_index=i,
prompt=trial["prompt"],
raw_response="",
parsed=None,
valid_json=False,
schema_valid=False,
value_valid=False,
latency_ms=latency_ms,
error=str(exc),
)
result.trials.append(tr)
if verbose:
print(f" Trial {i}: ERROR — {exc}")
continue
latencies.append(latency_ms)
cleaned = _clean_response(raw)
parsed = None
valid_json = False
schema_valid = False
value_valid = False
error = ""
try:
parsed = json.loads(cleaned)
valid_json = True
schema_valid = _validate_schema(parsed, trial["schema"])
value_valid = _validate_values(parsed, trial["valid_values"])
except json.JSONDecodeError as exc:
error = f"JSONDecodeError: {exc}"
tr = TrialResult(
trial_index=i,
prompt=trial["prompt"],
raw_response=raw,
parsed=parsed,
valid_json=valid_json,
schema_valid=schema_valid,
value_valid=value_valid,
latency_ms=latency_ms,
error=error,
)
result.trials.append(tr)
if verbose:
status = "PASS" if (valid_json and schema_valid) else "FAIL"
print(
f" Trial {i}: {status} | json={valid_json} schema={schema_valid} "
f"value={value_valid} | {latency_ms:.0f}ms | {raw[:80]!r}"
)
passed_trials = sum(1 for t in result.trials if t.valid_json and t.schema_valid)
result.score = passed_trials / len(TRIALS)
result.passed = result.score >= 1.0 # Must pass all 3 trials
if latencies:
latencies_sorted = sorted(latencies)
result.latency_p50_ms = latencies_sorted[len(latencies_sorted) // 2]
result.latency_p99_ms = latencies_sorted[-1]  # max stands in for p99 at this sample size
return result

View File

@@ -0,0 +1,211 @@
"""Level 1: Board State Tracking — Tic-Tac-Toe.
Tests whether the model can maintain game state across turns, select
legal moves, and exhibit basic strategic awareness.
Maps to: Bannerlord board state / campaign map tracking.
"""
import json
import time
from dataclasses import dataclass, field
from typing import Any
LEVEL = 1
NAME = "Board State Tracking (Tic-Tac-Toe)"
DESCRIPTION = "Model must track a tic-tac-toe board and make legal, strategic moves."
SYSTEM_PROMPT = """You are a strategic AI playing tic-tac-toe. The board is a 3x3 grid.
Positions are numbered 0-8 left-to-right, top-to-bottom:
0|1|2
3|4|5
6|7|8
You MUST respond ONLY with valid JSON. No markdown, no explanation. Raw JSON only.
Format: {"move": <position 0-8>, "reason": "<brief reason>"}"""
SCENARIOS = [
{
"description": "Empty board — opening move",
"board": [None, None, None, None, None, None, None, None, None],
"player": "X",
"prompt": (
'Board state: [null,null,null,null,null,null,null,null,null]. '
'You are X. It is your turn. Choose a move. '
'Respond: {"move": <0-8>, "reason": "<why>"}'
),
"check": lambda move, board: move in range(9) and board[move] is None,
"check_desc": "Move must be a valid empty position (0-8)",
},
{
"description": "Block opponent's winning move",
"board": ["O", None, "O", None, "X", None, None, None, None],
"player": "X",
"prompt": (
'Board: ["O",null,"O",null,"X",null,null,null,null]. '
"O has positions 0 and 2. You are X. "
"O will win on next turn unless you block. "
'Respond: {"move": <0-8>, "reason": "<why>"}'
),
"check": lambda move, board: move == 1, # Must block at position 1
"check_desc": "Must block O's win at position 1",
},
{
"description": "Take winning move",
"board": ["X", None, "X", None, "O", None, None, "O", None],
"player": "X",
"prompt": (
'Board: ["X",null,"X",null,"O",null,null,"O",null]. '
"You are X. You have positions 0 and 2. "
"You can win this turn. "
'Respond: {"move": <0-8>, "reason": "<why>"}'
),
"check": lambda move, board: move == 1, # Win at position 1
"check_desc": "Must take winning move at position 1",
},
{
"description": "Legal move on partially filled board",
"board": ["X", "O", "X", "O", "X", "O", None, None, None],
"player": "O",
"prompt": (
'Board: ["X","O","X","O","X","O",null,null,null]. '
"You are O. Choose a legal move (positions 6, 7, or 8 are available). "
'Respond: {"move": <0-8>, "reason": "<why>"}'
),
"check": lambda move, board: move in [6, 7, 8],
"check_desc": "Move must be one of the empty positions: 6, 7, or 8",
},
]
@dataclass
class ScenarioResult:
scenario_index: int
description: str
prompt: str
raw_response: str
parsed: dict | None
valid_json: bool
move_legal: bool
move_correct: bool
latency_ms: float
error: str = ""
@dataclass
class LevelResult:
level: int = LEVEL
name: str = NAME
trials: list[ScenarioResult] = field(default_factory=list)
passed: bool = False
score: float = 0.0
latency_p50_ms: float = 0.0
latency_p99_ms: float = 0.0
def _clean_response(raw: str) -> str:
raw = raw.strip()
if raw.startswith("```"):
lines = raw.splitlines()
lines = [line for line in lines if not line.startswith("```")]
raw = "\n".join(lines).strip()
return raw
def run(client: Any, model: str, verbose: bool = False) -> LevelResult:
result = LevelResult()
latencies = []
for i, scenario in enumerate(SCENARIOS):
t0 = time.time()
try:
response = client.chat(
model=model,
messages=[
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": scenario["prompt"]},
],
options={"temperature": 0.1},
)
raw = response["message"]["content"]
latency_ms = (time.time() - t0) * 1000
except Exception as exc:
latency_ms = (time.time() - t0) * 1000
sr = ScenarioResult(
scenario_index=i,
description=scenario["description"],
prompt=scenario["prompt"],
raw_response="",
parsed=None,
valid_json=False,
move_legal=False,
move_correct=False,
latency_ms=latency_ms,
error=str(exc),
)
result.trials.append(sr)
if verbose:
print(f" Scenario {i}: ERROR — {exc}")
continue
latencies.append(latency_ms)
cleaned = _clean_response(raw)
parsed = None
valid_json = False
move_legal = False
move_correct = False
error = ""
try:
parsed = json.loads(cleaned)
valid_json = True
if "move" in parsed:
move = parsed["move"]
# Coerce string digits to int (some models emit "4" instead of 4)
if isinstance(move, str) and move.strip().lstrip("-").isdigit():
move = int(move.strip())
if isinstance(move, int):
board = scenario["board"]
move_legal = 0 <= move <= 8 and board[move] is None
move_correct = scenario["check"](move, board)
except json.JSONDecodeError as exc:
error = f"JSONDecodeError: {exc}"
sr = ScenarioResult(
scenario_index=i,
description=scenario["description"],
prompt=scenario["prompt"],
raw_response=raw,
parsed=parsed,
valid_json=valid_json,
move_legal=move_legal,
move_correct=move_correct,
latency_ms=latency_ms,
error=error,
)
result.trials.append(sr)
if verbose:
status = "PASS" if (valid_json and move_legal) else "FAIL"
correct_str = "CORRECT" if move_correct else "suboptimal"
move_val = parsed.get("move", "?") if parsed else "?"
print(
f" Scenario {i} [{scenario['description']}]: {status} ({correct_str}) "
f"| move={move_val} | {latency_ms:.0f}ms"
)
if not move_correct and valid_json:
print(f" Expected: {scenario['check_desc']}")
# Pass criteria: all moves must be valid JSON + legal
legal_moves = sum(1 for t in result.trials if t.valid_json and t.move_legal)
result.score = legal_moves / len(SCENARIOS)
result.passed = result.score >= 1.0
if latencies:
latencies_sorted = sorted(latencies)
result.latency_p50_ms = latencies_sorted[len(latencies_sorted) // 2]
result.latency_p99_ms = latencies_sorted[-1]  # max stands in for p99 at this sample size
return result

View File

@@ -0,0 +1,213 @@
"""Level 2: Resource Management — Party Economy.
Tests whether the model can allocate limited resources across competing
priorities and adapt when constraints change.
Maps to: Bannerlord party economy (troops, food, gold, morale).
"""
import json
import time
from dataclasses import dataclass, field
from typing import Any
LEVEL = 2
NAME = "Resource Management (Party Economy)"
DESCRIPTION = "Model must allocate limited resources across troops, food, and equipment."
SYSTEM_PROMPT = """You are a Bannerlord campaign advisor managing a party.
Resources are limited — every decision has trade-offs.
You MUST respond ONLY with valid JSON. No markdown, no explanation. Raw JSON only."""
SCENARIOS = [
{
"description": "Budget allocation under constraint",
"prompt": (
"You have 500 gold. Options:\n"
"- Recruit 10 infantry: costs 300 gold, +10 combat strength\n"
"- Buy food for 20 days: costs 200 gold, keeps morale stable\n"
"- Repair armor: costs 150 gold, -20% casualty rate\n\n"
"You cannot afford all three. Morale is currently CRITICAL (troops may desert).\n"
'Choose 1-2 options. Respond: {"choices": ["option_a", ...], "gold_spent": <int>, "reason": "<why>"}\n'
"Where option keys are: recruit_infantry, buy_food, repair_armor"
),
"check": lambda r: (
isinstance(r.get("choices"), list)
and len(r["choices"]) >= 1
and all(c in ["recruit_infantry", "buy_food", "repair_armor"] for c in r["choices"])
and isinstance(r.get("gold_spent"), (int, float))
and r.get("gold_spent", 9999) <= 500
),
"check_desc": "choices must be valid options, gold_spent <= 500",
"strategic_check": lambda r: "buy_food" in r.get("choices", []),
"strategic_desc": "With CRITICAL morale, food should be prioritized",
},
{
"description": "Troop tier upgrade decision",
"prompt": (
"Party status:\n"
"- 15 Tier-1 recruits (weak, 30 upkeep/day)\n"
"- 5 Tier-3 veterans (strong, 90 upkeep/day)\n"
"- Daily income: 200 gold\n"
"- Upcoming: raider camp attack (moderate difficulty)\n\n"
"Options:\n"
"- Upgrade 5 recruits to Tier-2 (costs 250 gold total)\n"
"- Keep all current troops, save gold for emergencies\n"
"- Dismiss 5 recruits to save upkeep\n\n"
'Respond: {"action": "upgrade_recruits"|"save_gold"|"dismiss_recruits", '
'"reason": "<why>", "expected_outcome": "<string>"}'
),
"check": lambda r: (
r.get("action") in ["upgrade_recruits", "save_gold", "dismiss_recruits"]
and isinstance(r.get("reason"), str)
and len(r.get("reason", "")) > 0
),
"check_desc": "action must be one of the three options with a non-empty reason",
"strategic_check": lambda r: r.get("action") in ["upgrade_recruits", "save_gold"],
"strategic_desc": "Dismissing troops before a fight is suboptimal",
},
{
"description": "Multi-turn planning horizon",
"prompt": (
"Current: 300 gold, 10 days of food, 20 troops\n"
"Day 5: Must cross desert (costs 5 extra food days)\n"
"Day 10: Reach town (can buy supplies)\n\n"
"You need a 15-day food reserve to survive the journey.\n"
"Food costs 10 gold/day. You have enough for 10 days now.\n\n"
"How many extra food days do you buy today?\n"
'Respond: {"extra_food_days": <int>, "cost": <int>, "remaining_gold": <int>, "reason": "<why>"}'
),
"check": lambda r: (
isinstance(r.get("extra_food_days"), (int, float))
and isinstance(r.get("cost"), (int, float))
and isinstance(r.get("remaining_gold"), (int, float))
),
"check_desc": "Must include extra_food_days, cost, remaining_gold as numbers",
"strategic_check": lambda r: r.get("extra_food_days", 0) >= 5,
"strategic_desc": "Need at least 5 more days of food for desert crossing",
},
]
@dataclass
class ScenarioResult:
scenario_index: int
description: str
raw_response: str
parsed: dict | None
valid_json: bool
schema_valid: bool
strategically_sound: bool
latency_ms: float
error: str = ""
@dataclass
class LevelResult:
level: int = LEVEL
name: str = NAME
trials: list[ScenarioResult] = field(default_factory=list)
passed: bool = False
score: float = 0.0
latency_p50_ms: float = 0.0
latency_p99_ms: float = 0.0
def _clean_response(raw: str) -> str:
raw = raw.strip()
if raw.startswith("```"):
lines = raw.splitlines()
lines = [line for line in lines if not line.startswith("```")]
raw = "\n".join(lines).strip()
return raw
def run(client: Any, model: str, verbose: bool = False) -> LevelResult:
result = LevelResult()
latencies = []
for i, scenario in enumerate(SCENARIOS):
t0 = time.time()
try:
response = client.chat(
model=model,
messages=[
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": scenario["prompt"]},
],
options={"temperature": 0.1},
)
raw = response["message"]["content"]
latency_ms = (time.time() - t0) * 1000
except Exception as exc:
latency_ms = (time.time() - t0) * 1000
sr = ScenarioResult(
scenario_index=i,
description=scenario["description"],
raw_response="",
parsed=None,
valid_json=False,
schema_valid=False,
strategically_sound=False,
latency_ms=latency_ms,
error=str(exc),
)
result.trials.append(sr)
if verbose:
print(f" Scenario {i}: ERROR — {exc}")
continue
latencies.append(latency_ms)
cleaned = _clean_response(raw)
parsed = None
valid_json = False
schema_valid = False
strategically_sound = False
error = ""
try:
parsed = json.loads(cleaned)
valid_json = True
schema_valid = scenario["check"](parsed)
if schema_valid:
strategically_sound = scenario["strategic_check"](parsed)
except json.JSONDecodeError as exc:
error = f"JSONDecodeError: {exc}"
except Exception as exc:
error = f"Validation error: {exc}"
sr = ScenarioResult(
scenario_index=i,
description=scenario["description"],
raw_response=raw,
parsed=parsed,
valid_json=valid_json,
schema_valid=schema_valid,
strategically_sound=strategically_sound,
latency_ms=latency_ms,
error=error,
)
result.trials.append(sr)
if verbose:
status = "PASS" if (valid_json and schema_valid) else "FAIL"
strat = "strategic" if strategically_sound else "suboptimal"
print(
f" Scenario {i} [{scenario['description']}]: {status} ({strat}) "
f"| {latency_ms:.0f}ms"
)
if not schema_valid and valid_json:
print(f" Schema issue: {scenario['check_desc']}")
if not strategically_sound and schema_valid:
print(f" Strategy note: {scenario['strategic_desc']}")
valid_count = sum(1 for t in result.trials if t.valid_json and t.schema_valid)
result.score = valid_count / len(SCENARIOS)
result.passed = result.score >= 0.67 # 2/3 scenarios
if latencies:
latencies_sorted = sorted(latencies)
result.latency_p50_ms = latencies_sorted[len(latencies_sorted) // 2]
# With only a few samples, the max stands in for p99.
result.latency_p99_ms = latencies_sorted[-1]
return result

@@ -0,0 +1,216 @@
"""Level 3: Battle Tactics — Formation Commands.
Tests whether the model can issue coherent formation and tactical orders
under simulated battlefield pressure with multiple unit types.
Maps to: Bannerlord formation commands (charge, shield wall, skirmish, etc.).
"""
import json
import time
from dataclasses import dataclass, field
from typing import Any
LEVEL = 3
NAME = "Battle Tactics (Formation Commands)"
DESCRIPTION = "Model must issue tactically sound formation orders under simulated battle conditions."
SYSTEM_PROMPT = """You are a Bannerlord battle commander. Issue formation orders using these commands:
- shield_wall: infantry forms defensive line (good vs ranged, slow advance)
- charge: all-out attack (high casualties, breaks weak enemies fast)
- skirmish: ranged units pepper enemy (good vs heavy infantry, needs distance)
- advance: move forward holding formation (balanced)
- flank_left / flank_right: cavalry sweeps around enemy side
- fallback: retreat to regroup (when badly outnumbered)
You MUST respond ONLY with valid JSON. No markdown. Raw JSON only."""
SCENARIOS = [
{
"description": "Ranged vs infantry — defensive opening",
"prompt": (
"Situation: You have 20 archers + 10 infantry. Enemy has 30 heavy infantry, no ranged.\n"
"Enemy is 200m away and advancing.\n"
"Objective: Maximize casualties before melee contact.\n\n"
'Issue orders for both units. Respond:\n'
'{"infantry_order": "<command>", "archer_order": "<command>", '
'"reason": "<tactical reasoning>", "expected_outcome": "<string>"}'
),
"check": lambda r: (
r.get("infantry_order") in ["shield_wall", "charge", "skirmish", "advance", "flank_left", "flank_right", "fallback"]
and r.get("archer_order") in ["shield_wall", "charge", "skirmish", "advance", "flank_left", "flank_right", "fallback"]
and isinstance(r.get("reason"), str)
),
"check_desc": "Both orders must be valid commands",
"strategic_check": lambda r: (
r.get("archer_order") == "skirmish"
and r.get("infantry_order") in ["shield_wall", "advance"]
),
"strategic_desc": "Archers should skirmish while infantry holds (shield_wall or advance)",
},
{
"description": "Outnumbered — retreat decision",
"prompt": (
"Situation: Your party (15 troops) has been ambushed.\n"
"Enemy: 60 bandits, surrounding you on 3 sides.\n"
"Your troops: 40% wounded. One escape route to the east.\n\n"
'What is your command? Respond:\n'
'{"order": "<command>", "direction": "east"|"west"|"north"|"south"|null, '
'"reason": "<tactical reasoning>", "priority": "preserve_troops"|"fight_through"}'
),
"check": lambda r: (
r.get("order") in ["shield_wall", "charge", "skirmish", "advance", "flank_left", "flank_right", "fallback"]
and r.get("priority") in ["preserve_troops", "fight_through"]
),
"check_desc": "order and priority must be valid values",
"strategic_check": lambda r: (
r.get("order") == "fallback"
and r.get("priority") == "preserve_troops"
),
"strategic_desc": "Outnumbered 4:1 with wounded troops — fallback is the sound choice",
},
{
"description": "Cavalry flanking opportunity",
"prompt": (
"Situation: Main battle is engaged. Your infantry and enemy infantry are locked.\n"
"You have 8 cavalry in reserve. Enemy left flank is unprotected.\n"
"If cavalry hits the flank now, it will route enemy in ~30 seconds.\n\n"
'Order for cavalry: Respond:\n'
'{"cavalry_order": "<command>", "timing": "now"|"wait", '
'"reason": "<reasoning>", "risk": "low"|"medium"|"high"}'
),
"check": lambda r: (
r.get("cavalry_order") in ["shield_wall", "charge", "skirmish", "advance", "flank_left", "flank_right", "fallback"]
and r.get("timing") in ["now", "wait"]
and r.get("risk") in ["low", "medium", "high"]
),
"check_desc": "cavalry_order, timing, and risk must be valid values",
"strategic_check": lambda r: (
r.get("cavalry_order") in ["flank_left", "flank_right", "charge"]
and r.get("timing") == "now"
),
"strategic_desc": "Should capitalize on the flank opportunity immediately",
},
]
@dataclass
class ScenarioResult:
scenario_index: int
description: str
raw_response: str
parsed: dict | None
valid_json: bool
schema_valid: bool
strategically_sound: bool
latency_ms: float
error: str = ""
@dataclass
class LevelResult:
level: int = LEVEL
name: str = NAME
trials: list[ScenarioResult] = field(default_factory=list)
passed: bool = False
score: float = 0.0
latency_p50_ms: float = 0.0
latency_p99_ms: float = 0.0
def _clean_response(raw: str) -> str:
raw = raw.strip()
if raw.startswith("```"):
lines = raw.splitlines()
lines = [line for line in lines if not line.startswith("```")]
raw = "\n".join(lines).strip()
return raw
def run(client: Any, model: str, verbose: bool = False) -> LevelResult:
result = LevelResult()
latencies = []
for i, scenario in enumerate(SCENARIOS):
t0 = time.time()
try:
response = client.chat(
model=model,
messages=[
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": scenario["prompt"]},
],
options={"temperature": 0.2},
)
raw = response["message"]["content"]
latency_ms = (time.time() - t0) * 1000
except Exception as exc:
latency_ms = (time.time() - t0) * 1000
sr = ScenarioResult(
scenario_index=i,
description=scenario["description"],
raw_response="",
parsed=None,
valid_json=False,
schema_valid=False,
strategically_sound=False,
latency_ms=latency_ms,
error=str(exc),
)
result.trials.append(sr)
if verbose:
print(f" Scenario {i}: ERROR — {exc}")
continue
latencies.append(latency_ms)
cleaned = _clean_response(raw)
parsed = None
valid_json = False
schema_valid = False
strategically_sound = False
error = ""
try:
parsed = json.loads(cleaned)
valid_json = True
schema_valid = scenario["check"](parsed)
if schema_valid:
strategically_sound = scenario["strategic_check"](parsed)
except json.JSONDecodeError as exc:
error = f"JSONDecodeError: {exc}"
except Exception as exc:
error = f"Validation error: {exc}"
sr = ScenarioResult(
scenario_index=i,
description=scenario["description"],
raw_response=raw,
parsed=parsed,
valid_json=valid_json,
schema_valid=schema_valid,
strategically_sound=strategically_sound,
latency_ms=latency_ms,
error=error,
)
result.trials.append(sr)
if verbose:
status = "PASS" if (valid_json and schema_valid) else "FAIL"
strat = "strategic" if strategically_sound else "suboptimal"
print(
f" Scenario {i} [{scenario['description']}]: {status} ({strat}) "
f"| {latency_ms:.0f}ms"
)
if not schema_valid and valid_json:
print(f" Schema issue: {scenario['check_desc']}")
valid_count = sum(1 for t in result.trials if t.valid_json and t.schema_valid)
result.score = valid_count / len(SCENARIOS)
result.passed = result.score >= 0.67
if latencies:
latencies_sorted = sorted(latencies)
result.latency_p50_ms = latencies_sorted[len(latencies_sorted) // 2]
# With only a few samples, the max stands in for p99.
result.latency_p99_ms = latencies_sorted[-1]
return result

@@ -0,0 +1,223 @@
"""Level 4: Trade Route — Campaign Navigation.
Tests multi-step planning ability: route optimization, trade-off analysis
across time horizons, and adapting plans when conditions change.
Maps to: Bannerlord campaign map navigation, caravans, and economy.
"""
import json
import time
from dataclasses import dataclass, field
from typing import Any
LEVEL = 4
NAME = "Trade Route (Campaign Navigation)"
DESCRIPTION = "Model must plan optimal routes and adapt to changing conditions on the campaign map."
SYSTEM_PROMPT = """You are a Bannerlord merchant lord planning campaign movements.
Consider distance, profitability, risk, and timing.
You MUST respond ONLY with valid JSON. No markdown. Raw JSON only."""
SCENARIOS = [
{
"description": "Optimal trade route selection",
"prompt": (
"You are at Epicrotea with 500 gold and 20 days travel budget.\n\n"
"Trade opportunities:\n"
"- Route A: Epicrotea → Vlandia (3 days) → Sturgia (5 days back)\n"
" Sell grain in Vlandia: +300 gold. Buy furs in Sturgia: costs 200, sells for 400 in Calradia.\n"
" Total: +500 gold profit, 8 days.\n"
"- Route B: Epicrotea → Calradia (2 days) → Aserai (4 days)\n"
" Sell iron in Calradia: +150 gold. Buy spice in Aserai: costs 300, sells for 600 in Empire.\n"
" Empire is 6 more days away. Total: +450 gold profit, 12 days.\n"
"- Route C: Epicrotea → nearby village (1 day)\n"
" Buy cheap food: costs 100, sells for 180 in any city.\n"
" Total: +80 gold profit, 2 days. Repeatable.\n\n"
'Choose route. Respond:\n'
'{"route": "A"|"B"|"C", "expected_profit": <int>, "days_used": <int>, '
'"reason": "<reasoning>", "risk": "low"|"medium"|"high"}'
),
"check": lambda r: (
r.get("route") in ["A", "B", "C"]
and isinstance(r.get("expected_profit"), (int, float))
and isinstance(r.get("days_used"), (int, float))
and r.get("risk") in ["low", "medium", "high"]
),
"check_desc": "route, expected_profit, days_used, risk must be valid",
"strategic_check": lambda r: r.get("route") in ["A", "C"], # A is best single trip, C is best if repeated
"strategic_desc": "Route A has best profit/day ratio; C is best if multiple loops possible",
},
{
"description": "Adapt plan when war declared",
"prompt": (
"You were heading to Vlandia to trade, 2 days into the journey.\n"
"NEWS: Vlandia just declared war on your faction. Entering Vlandia territory is now dangerous.\n\n"
"Your current position: borderlands, equidistant between:\n"
"- Vlandia (2 days): Now at war — high risk of attack\n"
"- Sturgia (3 days): Neutral — safe\n"
"- Empire (4 days): Allied — very safe, good prices\n\n"
"You have 400 gold of trade goods for the Vlandia market.\n"
'What do you do? Respond:\n'
'{"decision": "continue_to_vlandia"|"divert_to_sturgia"|"divert_to_empire", '
'"reason": "<why>", "gold_at_risk": <int>}'
),
"check": lambda r: (
r.get("decision") in ["continue_to_vlandia", "divert_to_sturgia", "divert_to_empire"]
and isinstance(r.get("gold_at_risk"), (int, float))
),
"check_desc": "decision must be one of three options, gold_at_risk must be a number",
"strategic_check": lambda r: r.get("decision") in ["divert_to_sturgia", "divert_to_empire"],
"strategic_desc": "Should avoid active war zone — divert to safe destination",
},
{
"description": "Multi-stop route planning with constraints",
"prompt": (
"Plan a 3-stop trading circuit starting and ending at Pravend.\n"
"Budget: 800 gold. Time limit: 20 days.\n\n"
"Available cities and travel times from Pravend:\n"
"- Rhotae: 2 days (leather cheap, sells well in south)\n"
"- Ortysia: 4 days (grain surplus — buy cheap)\n"
"- Epicrotea: 3 days (iron market — buy/sell)\n"
"- Pen Cannoc: 5 days (wine — high profit, far)\n\n"
"Each stop takes 1 day for trading.\n"
'Plan 3 stops. Respond:\n'
'{"stops": ["<city1>", "<city2>", "<city3>"], '
'"total_days": <int>, "estimated_profit": <int>, '
'"reason": "<reasoning>"}'
),
"check": lambda r: (
isinstance(r.get("stops"), list)
and len(r["stops"]) == 3
and all(isinstance(s, str) for s in r["stops"])
and isinstance(r.get("total_days"), (int, float))
and r.get("total_days", 99) <= 20
and isinstance(r.get("estimated_profit"), (int, float))
),
"check_desc": "stops must be list of 3 strings, total_days <= 20, estimated_profit numeric",
"strategic_check": lambda r: "Pen Cannoc" not in r.get("stops", []), # Too far for 20 days
"strategic_desc": "Pen Cannoc at 5 days each way is likely too far for a 20-day circuit",
},
]
@dataclass
class ScenarioResult:
scenario_index: int
description: str
raw_response: str
parsed: dict | None
valid_json: bool
schema_valid: bool
strategically_sound: bool
latency_ms: float
error: str = ""
@dataclass
class LevelResult:
level: int = LEVEL
name: str = NAME
trials: list[ScenarioResult] = field(default_factory=list)
passed: bool = False
score: float = 0.0
latency_p50_ms: float = 0.0
latency_p99_ms: float = 0.0
def _clean_response(raw: str) -> str:
raw = raw.strip()
if raw.startswith("```"):
lines = raw.splitlines()
lines = [line for line in lines if not line.startswith("```")]
raw = "\n".join(lines).strip()
return raw
def run(client: Any, model: str, verbose: bool = False) -> LevelResult:
result = LevelResult()
latencies = []
for i, scenario in enumerate(SCENARIOS):
t0 = time.time()
try:
response = client.chat(
model=model,
messages=[
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": scenario["prompt"]},
],
options={"temperature": 0.2},
)
raw = response["message"]["content"]
latency_ms = (time.time() - t0) * 1000
except Exception as exc:
latency_ms = (time.time() - t0) * 1000
sr = ScenarioResult(
scenario_index=i,
description=scenario["description"],
raw_response="",
parsed=None,
valid_json=False,
schema_valid=False,
strategically_sound=False,
latency_ms=latency_ms,
error=str(exc),
)
result.trials.append(sr)
if verbose:
print(f" Scenario {i}: ERROR — {exc}")
continue
latencies.append(latency_ms)
cleaned = _clean_response(raw)
parsed = None
valid_json = False
schema_valid = False
strategically_sound = False
error = ""
try:
parsed = json.loads(cleaned)
valid_json = True
schema_valid = scenario["check"](parsed)
if schema_valid:
strategically_sound = scenario["strategic_check"](parsed)
except json.JSONDecodeError as exc:
error = f"JSONDecodeError: {exc}"
except Exception as exc:
error = f"Validation error: {exc}"
sr = ScenarioResult(
scenario_index=i,
description=scenario["description"],
raw_response=raw,
parsed=parsed,
valid_json=valid_json,
schema_valid=schema_valid,
strategically_sound=strategically_sound,
latency_ms=latency_ms,
error=error,
)
result.trials.append(sr)
if verbose:
status = "PASS" if (valid_json and schema_valid) else "FAIL"
strat = "strategic" if strategically_sound else "suboptimal"
print(
f" Scenario {i} [{scenario['description']}]: {status} ({strat}) "
f"| {latency_ms:.0f}ms"
)
if not schema_valid and valid_json:
print(f" Schema issue: {scenario['check_desc']}")
valid_count = sum(1 for t in result.trials if t.valid_json and t.schema_valid)
result.score = valid_count / len(SCENARIOS)
result.passed = result.score >= 0.67
if latencies:
latencies_sorted = sorted(latencies)
result.latency_p50_ms = latencies_sorted[len(latencies_sorted) // 2]
result.latency_p99_ms = latencies_sorted[-1]
return result

@@ -0,0 +1,252 @@
"""Level 5: Mini Campaign — Full Campaign Loop.
Tests multi-turn strategic coherence: the model must maintain state across
several turns of a simulated Bannerlord campaign, making consistent decisions
that build toward a long-term goal.
Maps to: Full Bannerlord campaign loop — economy, diplomacy, conquest.
"""
import json
import time
from dataclasses import dataclass, field
from typing import Any
LEVEL = 5
NAME = "Mini Campaign (Full Campaign Loop)"
DESCRIPTION = "Multi-turn strategic planning maintaining coherent goals across 4 turns."
SYSTEM_PROMPT = """You are Timmy, a Bannerlord lord with ambitions to become King of Calradia.
You have 4 turns to establish a power base. Each turn represents 2 weeks of in-game time.
Your starting position:
- Clan tier: 1 (minor lord)
- Gold: 1000
- Troops: 25 (mixed infantry/cavalry)
- Renown: 150
- Relations: Neutral with all factions
Winning requires: Gold > 3000 AND Renown > 400 AND Own 1+ settlement by Turn 4.
Each turn, choose ONE primary action:
- "raid_village": +200 gold, -50 relations target faction, +30 renown, risk of retaliation
- "trade_circuit": +300 gold, 0 relation change, +10 renown, no risk
- "escort_caravan": +150 gold, +20 relations with faction, +20 renown
- "tournament": costs 100 gold, +60 renown, +20 relations with host faction
- "recruit_troops": costs 200 gold, +15 troops, no other change
- "siege_castle": costs 500 gold + 200 troops morale, -100 relations, +80 renown, +1 settlement if succeed (30% base chance)
- "pledge_vassalage": 0 cost, +100 relations with liege, +50 renown, lose independence
You MUST respond ONLY with valid JSON for each turn. Raw JSON only."""
def run(client: Any, model: str, verbose: bool = False) -> "LevelResult":
"""Run a 4-turn mini campaign, tracking state and decision quality."""
result = LevelResult()
# Initial game state
state = {
"turn": 1,
"gold": 1000,
"troops": 25,
"renown": 150,
"settlements": 0,
"relations": {"vlandia": 0, "sturgia": 0, "empire": 0, "aserai": 0, "battania": 0},
}
conversation = [{"role": "system", "content": SYSTEM_PROMPT}]
turns_passed = []
total_latency = []
valid_actions = [
"raid_village", "trade_circuit", "escort_caravan", "tournament",
"recruit_troops", "siege_castle", "pledge_vassalage",
]
for turn_num in range(1, 5):
state["turn"] = turn_num
state_str = json.dumps(state, indent=2)
prompt = (
f"=== TURN {turn_num} / 4 ===\n"
f"Current state:\n{state_str}\n\n"
f"Win conditions remaining: Gold > 3000 ({state['gold']}/3000), "
f"Renown > 400 ({state['renown']}/400), Settlements >= 1 ({state['settlements']}/1)\n\n"
f"Choose your action for Turn {turn_num}.\n"
f'Respond: {{"action": "<action>", "target_faction": "<faction or null>", '
f'"reason": "<strategic reasoning>", "goal": "<what this advances>"}}'
)
conversation.append({"role": "user", "content": prompt})
t0 = time.time()
try:
response = client.chat(
model=model,
messages=conversation,
options={"temperature": 0.3},
)
raw = response["message"]["content"]
latency_ms = (time.time() - t0) * 1000
except Exception as exc:
latency_ms = (time.time() - t0) * 1000
tr = TurnResult(
turn=turn_num,
state_before=dict(state),
raw_response="",
parsed=None,
valid_json=False,
valid_action=False,
action=None,
latency_ms=latency_ms,
error=str(exc),
)
turns_passed.append(tr)
if verbose:
print(f" Turn {turn_num}: ERROR — {exc}")
break
total_latency.append(latency_ms)
# Clean and parse response
cleaned = raw.strip()
if cleaned.startswith("```"):
lines = cleaned.splitlines()
lines = [line for line in lines if not line.startswith("```")]
cleaned = "\n".join(lines).strip()
parsed = None
valid_json = False
valid_action = False
action = None
error = ""
try:
parsed = json.loads(cleaned)
valid_json = True
action = parsed.get("action")
valid_action = action in valid_actions
except json.JSONDecodeError as exc:
error = f"JSONDecodeError: {exc}"
tr = TurnResult(
turn=turn_num,
state_before=dict(state),
raw_response=raw,
parsed=parsed,
valid_json=valid_json,
valid_action=valid_action,
action=action,
latency_ms=latency_ms,
error=error,
)
turns_passed.append(tr)
# Add model response to conversation for continuity
conversation.append({"role": "assistant", "content": raw})
# Apply state changes based on action
if valid_action:
_apply_action(state, action, parsed.get("target_faction"))
if verbose:
status = "PASS" if (valid_json and valid_action) else "FAIL"
print(
f" Turn {turn_num}: {status} | action={action} | {latency_ms:.0f}ms | "
f"gold={state['gold']} renown={state['renown']} settlements={state['settlements']}"
)
result.turns = turns_passed
result.final_state = dict(state)
# Win condition check
# Strict comparisons to match the SYSTEM_PROMPT ("Gold > 3000 AND Renown > 400")
result.reached_gold_target = state["gold"] > 3000
result.reached_renown_target = state["renown"] > 400
result.reached_settlement_target = state["settlements"] >= 1
# Score: % of turns with valid JSON + valid action
valid_turns = sum(1 for t in turns_passed if t.valid_json and t.valid_action)
result.score = valid_turns / 4 if turns_passed else 0.0
result.passed = result.score >= 0.75 # 3/4 turns valid
if total_latency:
latencies_sorted = sorted(total_latency)
result.latency_p50_ms = latencies_sorted[len(latencies_sorted) // 2]
# With only a few samples, the max stands in for p99.
result.latency_p99_ms = latencies_sorted[-1]
if verbose:
win_status = []
if result.reached_gold_target:
win_status.append("GOLD")
if result.reached_renown_target:
win_status.append("RENOWN")
if result.reached_settlement_target:
win_status.append("SETTLEMENT")
print(f"  Win conditions met: {', '.join(win_status) if win_status else 'none'}")
print(f" Final: gold={state['gold']} renown={state['renown']} settlements={state['settlements']}")
return result
def _apply_action(state: dict, action: str, target_faction: str | None) -> None:
"""Simulate game state changes for a given action."""
if action == "raid_village":
state["gold"] += 200
state["renown"] += 30
if target_faction and target_faction in state["relations"]:
state["relations"][target_faction] -= 50
elif action == "trade_circuit":
state["gold"] += 300
state["renown"] += 10
elif action == "escort_caravan":
state["gold"] += 150
state["renown"] += 20
if target_faction and target_faction in state["relations"]:
state["relations"][target_faction] += 20
elif action == "tournament":
state["gold"] -= 100
state["renown"] += 60
if target_faction and target_faction in state["relations"]:
state["relations"][target_faction] += 20
elif action == "recruit_troops":
state["gold"] -= 200
state["troops"] += 15
elif action == "siege_castle":
state["gold"] -= 500
state["renown"] += 80
# 30% chance success (deterministic sim: succeed on turn 3+ if attempted)
if state["turn"] >= 3:
state["settlements"] += 1
if target_faction and target_faction in state["relations"]:
state["relations"][target_faction] -= 100
elif action == "pledge_vassalage":
state["renown"] += 50
if target_faction and target_faction in state["relations"]:
state["relations"][target_faction] += 100
@dataclass
class TurnResult:
turn: int
state_before: dict
raw_response: str
parsed: dict | None
valid_json: bool
valid_action: bool
action: str | None
latency_ms: float
error: str = ""
@dataclass
class LevelResult:
level: int = LEVEL
name: str = NAME
turns: list[TurnResult] = field(default_factory=list)
final_state: dict = field(default_factory=dict)
passed: bool = False
score: float = 0.0
reached_gold_target: bool = False
reached_renown_target: bool = False
reached_settlement_target: bool = False
latency_p50_ms: float = 0.0
latency_p99_ms: float = 0.0

@@ -0,0 +1,82 @@
# Bannerlord M0 — Cognitive Benchmark Scorecard
**Date:** 2026-03-23
**Benchmark:** 6-level cognitive harness (L0–L5)
**M1 Gate:** Must pass L0 + L1, latency < 10s per decision
---
## Results Summary
| Level | Description | qwen2.5:14b | hermes3:latest | hermes3:8b |
|-------|-------------|:-----------:|:--------------:|:----------:|
| **L0 [M1 GATE]** | JSON Compliance | ✓ PASS 100% | ✓ PASS 100% | ✓ PASS 100% |
| **L1 [M1 GATE]** | Board State Tracking | ✗ FAIL 50% | ✗ FAIL 50% | ✗ FAIL 50% |
| L2 | Resource Management | ✓ PASS 100% | ✓ PASS 100% | ✓ PASS 100% |
| L3 | Battle Tactics | ✓ PASS 100% | ✓ PASS 100% | ✓ PASS 100% |
| L4 | Trade Route | ✓ PASS 100% | ✓ PASS 100% | ✓ PASS 100% |
| L5 | Mini Campaign | ✓ PASS 100% | ✓ PASS 100% | ✓ PASS 100% |
| **M1 GATE** | | ✗ **FAIL** | ✗ **FAIL** | ✗ **FAIL** |
---
## Latency (p50 / p99)
| Level | qwen2.5:14b | hermes3:latest | hermes3:8b |
|-------|-------------|----------------|------------|
| L0 | 1443ms / 6348ms | 1028ms / 1184ms | 570ms / 593ms |
| L1 | 943ms / 1184ms | 1166ms / 1303ms | 767ms / 1313ms |
| L2 | 2936ms / 3122ms | 2032ms / 2232ms | 2408ms / 2832ms |
| L3 | 2248ms / 3828ms | 1614ms / 3525ms | 2174ms / 3437ms |
| L4 | 3235ms / 3318ms | 2724ms / 3038ms | 2507ms / 3420ms |
| L5 | 3414ms / 3970ms | 3137ms / 3433ms | 2571ms / 2763ms |
All models are **well under the 10s latency threshold** for L0–L1.
---
## Level 1 Failure Analysis
All three models fail L1 with an **identical pattern** (2/4 scenarios pass):
| Scenario | Expected | All Models |
|----------|----------|-----------|
| Empty board — opening move | Any empty square | ✓ center (4) |
| Block opponent's winning move | Position 1 (only block) | ✗ position 4 (occupied!) |
| Take winning move | Position 1 (win) | ✗ position 0 or 2 (occupied!) |
| Legal move on partially filled board | Any of 6,7,8 | ✓ position 6 |
**Root cause:** Models choose moves by heuristic (center, corners) without checking whether the chosen square is already occupied. They read the board description but don't cross-reference their move choice against it. This is a genuine spatial state-tracking failure.
**Note:** `hermes3` models emit `"move": "4"` (string) vs `"move": 4` (int). The benchmark was patched to coerce string digits to int for L1, since type fidelity is already tested at L0.
---
## M1 Gate: FAILED (all models)
No model passes the M1 gate. The blocker is **Level 1 — Board State Tracking**.
### Recommendation
The L1 failure is consistent and structural. All models understand the format and can make reasonable *opening* moves but fail to avoid already-occupied squares. Options for M1:
1. **Lower the L1 pass threshold** from 100% to ≥ 75% — the scenarios where models fail require recognizing occupied positions from a sparse JSON array, which is a known weakness. This would allow proceeding to M1 with flagged risk.
2. **Prompt engineering** — add explicit "The following squares are taken: X at positions [P1, P2]" to the prompt to see if board tracking improves.
3. **Re-evaluate L1 gate requirement** — models pass L2–L5 (resource, tactics, trade, campaign) which are more directly relevant to Bannerlord play. Consider whether L1 is the right gate.
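Option 2 could be prototyped with a small helper that spells out occupied squares before asking for a move (a sketch only; the 9-cell board encoding matches the L1 scenarios, but `describe_occupied` is a hypothetical name, not part of the benchmark):

```python
def describe_occupied(board: list) -> str:
    """Render explicit taken/free lists from a 9-cell board of 'X'/'O'/None."""
    taken_x = [i for i, c in enumerate(board) if c == "X"]
    taken_o = [i for i, c in enumerate(board) if c == "O"]
    free = [i for i, c in enumerate(board) if c is None]
    return (
        f"The following squares are taken: X at {taken_x}, O at {taken_o}. "
        f"Legal moves: {free}."
    )
```

Appending this line to each L1 prompt would test whether the failure is in reading the sparse array or in cross-referencing the move choice against it.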
---
## qwen3:14b
Model **not available** on this Ollama instance. Available qwen3 model: `qwen3:30b`.
`qwen3:30b` was not benchmarked (significantly slower; requires explicit decision to run).
---
## Result Files
| Model | File |
|-------|------|
| qwen2.5:14b | `results/qwen2.5_14b_20260323_142119.json` |
| hermes3:latest | `results/hermes3_latest_20260323_152900.json` |
| hermes3:8b | `results/hermes3_8b_20260323_153000.json` |

@@ -0,0 +1,476 @@
{
"model": "hermes3:8b",
"timestamp": "2026-03-23T19:29:22.731080+00:00",
"skipped": false,
"levels": {
"0": {
"level": 0,
"name": "JSON Compliance (Coin Flip)",
"trials": [
{
"trial_index": 0,
"prompt": "A coin is flipped. Respond with exactly: {\"choice\": \"heads\"} or {\"choice\": \"tails\"}. Pick one. JSON only.",
"raw_response": "{\"choice\": \"heads\"}",
"parsed": {
"choice": "heads"
},
"valid_json": true,
"schema_valid": true,
"value_valid": true,
"latency_ms": 522.9010581970215,
"error": ""
},
{
"trial_index": 1,
"prompt": "You must attack or defend. Respond with: {\"action\": \"attack\", \"confidence\": 0.8} or {\"action\": \"defend\", \"confidence\": 0.6}. Replace confidence with your own value 0.0-1.0. JSON only.",
"raw_response": "{\"action\": \"defend\", \"confidence\": 0.7}",
"parsed": {
"action": "defend",
"confidence": 0.7
},
"valid_json": true,
"schema_valid": true,
"value_valid": true,
"latency_ms": 592.6809310913086,
"error": ""
},
{
"trial_index": 2,
"prompt": "Choose a direction to march. Respond with exactly: {\"direction\": \"north\", \"reason\": \"string explaining why\"}. Pick north/south/east/west. JSON only.",
"raw_response": "{\"direction\": \"north\", \"reason\": \"string explaining why\"}",
"parsed": {
"direction": "north",
"reason": "string explaining why"
},
"valid_json": true,
"schema_valid": true,
"value_valid": true,
"latency_ms": 569.8130130767822,
"error": ""
}
],
"passed": true,
"score": 1.0,
"latency_p50_ms": 569.8130130767822,
"latency_p99_ms": 592.6809310913086
},
"1": {
"level": 1,
"name": "Board State Tracking (Tic-Tac-Toe)",
"trials": [
{
"scenario_index": 0,
"description": "Empty board \u2014 opening move",
"prompt": "Board state: [null,null,null,null,null,null,null,null,null]. You are X. It is your turn. Choose a move. Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
"raw_response": "{\"move\": \"4\", \"reason\": \"Center move to establish dominance\"}",
"parsed": {
"move": "4",
"reason": "Center move to establish dominance"
},
"valid_json": true,
"move_legal": true,
"move_correct": true,
"latency_ms": 766.8571472167969,
"error": ""
},
{
"scenario_index": 1,
"description": "Block opponent's winning move",
"prompt": "Board: [\"O\",null,\"O\",null,\"X\",null,null,null,null]. O has positions 0 and 2. You are X. O will win on next turn unless you block. Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
"raw_response": "{\"move\": 4, \"reason\": \"Block the winning move for O in the center\"}",
"parsed": {
"move": 4,
"reason": "Block the winning move for O in the center"
},
"valid_json": true,
"move_legal": false,
"move_correct": false,
"latency_ms": 734.1668605804443,
"error": ""
},
{
"scenario_index": 2,
"description": "Take winning move",
"prompt": "Board: [\"X\",null,\"X\",null,\"O\",null,null,\"O\",null]. You are X. You have positions 0 and 2. You can win this turn. Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
"raw_response": "{\"move\": 0, \"reason\": \"I can win this turn by placing my X in position 0, creating a winning line with X at positions 0, 4, and 8.\"}",
"parsed": {
"move": 0,
"reason": "I can win this turn by placing my X in position 0, creating a winning line with X at positions 0, 4, and 8."
},
"valid_json": true,
"move_legal": false,
"move_correct": false,
"latency_ms": 1312.8020763397217,
"error": ""
},
{
"scenario_index": 3,
"description": "Legal move on partially filled board",
"prompt": "Board: [\"X\",\"O\",\"X\",\"O\",\"X\",\"O\",null,null,null]. You are O. Choose a legal move (positions 6, 7, or 8 are available). Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
"raw_response": "{\"move\": 6, \"reason\": \"Block X's potential winning move in the center.\"}",
"parsed": {
"move": 6,
"reason": "Block X's potential winning move in the center."
},
"valid_json": true,
"move_legal": true,
"move_correct": true,
"latency_ms": 751.0488033294678,
"error": ""
}
],
"passed": false,
"score": 0.5,
"latency_p50_ms": 766.8571472167969,
"latency_p99_ms": 1312.8020763397217
},
"2": {
"level": 2,
"name": "Resource Management (Party Economy)",
"trials": [
{
"scenario_index": 0,
"description": "Budget allocation under constraint",
"raw_response": "{\"choices\": [\"buy_food\", \"repair_armor\"], \"gold_spent\": 350, \"reason\": \"The priority is to address the critical morale issue and reduce casualty rates. Buying food for 20 days will stabilize morale, preventing potential desertions. Repairing armor will also help reduce the casualty rate, making the remaining troops more effective. Recruit infantry is not chosen as it would deplete all the gold, leaving no funds for food or armor repair.\"}",
"parsed": {
"choices": [
"buy_food",
"repair_armor"
],
"gold_spent": 350,
"reason": "The priority is to address the critical morale issue and reduce casualty rates. Buying food for 20 days will stabilize morale, preventing potential desertions. Repairing armor will also help reduce the casualty rate, making the remaining troops more effective. Recruit infantry is not chosen as it would deplete all the gold, leaving no funds for food or armor repair."
},
"valid_json": true,
"schema_valid": true,
"strategically_sound": true,
"latency_ms": 2832.072973251343,
"error": ""
},
{
"scenario_index": 1,
"description": "Troop tier upgrade decision",
"raw_response": "{\"action\": \"upgrade_recruits\", \"reason\": \"Upgrading recruits to Tier-2 will improve our party's combat effectiveness against the upcoming raider camp attack. The moderate difficulty suggests we need a stronger force to succeed.\", \"expected_outcome\": \"Increased chances of victory in the upcoming battle, potentially saving resources in the long run by avoiding higher upkeep costs of dismissing recruits.\"}",
"parsed": {
"action": "upgrade_recruits",
"reason": "Upgrading recruits to Tier-2 will improve our party's combat effectiveness against the upcoming raider camp attack. The moderate difficulty suggests we need a stronger force to succeed.",
"expected_outcome": "Increased chances of victory in the upcoming battle, potentially saving resources in the long run by avoiding higher upkeep costs of dismissing recruits."
},
"valid_json": true,
"schema_valid": true,
"strategically_sound": true,
"latency_ms": 2407.7491760253906,
"error": ""
},
{
"scenario_index": 2,
"description": "Multi-turn planning horizon",
"raw_response": "{\"extra_food_days\": 5, \"cost\": 50, \"remaining_gold\": 250, \"reason\": \"Buying 5 extra food days will ensure we have enough food to cross the desert, and we will still have enough gold left to reach the town on day 10 to buy more supplies.\"}",
"parsed": {
"extra_food_days": 5,
"cost": 50,
"remaining_gold": 250,
"reason": "Buying 5 extra food days will ensure we have enough food to cross the desert, and we will still have enough gold left to reach the town on day 10 to buy more supplies."
},
"valid_json": true,
"schema_valid": true,
"strategically_sound": true,
"latency_ms": 2028.775930404663,
"error": ""
}
],
"passed": true,
"score": 1.0,
"latency_p50_ms": 2407.7491760253906,
"latency_p99_ms": 2832.072973251343
},
"3": {
"level": 3,
"name": "Battle Tactics (Formation Commands)",
"trials": [
{
"scenario_index": 0,
"description": "Ranged vs infantry \u2014 defensive opening",
"raw_response": "{\"infantry_order\": \"skirmish\", \"archer_order\": \"skirmish\", \"reason\": \"The enemy is advancing with heavy infantry, so our priority should be to disrupt their formation and inflict casualties from range before they close the gap. The archers and infantry will pepper the enemy with arrows as they approach, aiming to break their cohesion and lower their morale.\", \"expected_outcome\": \"The enemy's heavy infantry will take significant casualties and lose cohesion before reaching our lines, giving us a tactical advantage when the two forces meet in melee combat.\"}",
"parsed": {
"infantry_order": "skirmish",
"archer_order": "skirmish",
"reason": "The enemy is advancing with heavy infantry, so our priority should be to disrupt their formation and inflict casualties from range before they close the gap. The archers and infantry will pepper the enemy with arrows as they approach, aiming to break their cohesion and lower their morale.",
"expected_outcome": "The enemy's heavy infantry will take significant casualties and lose cohesion before reaching our lines, giving us a tactical advantage when the two forces meet in melee combat."
},
"valid_json": true,
"schema_valid": true,
"strategically_sound": false,
"latency_ms": 3436.897039413452,
"error": ""
},
{
"scenario_index": 1,
"description": "Outnumbered \u2014 retreat decision",
"raw_response": "{\"order\": \"fallback\", \"direction\": \"east\", \"reason\": \"Our troops are heavily wounded and outmatched by the 60 bandits surrounding us. We must retreat immediately to the east where there is an escape route. Fallback is the best option to preserve our troops and live to fight another day.\", \"priority\": \"preserve_troops\"}",
"parsed": {
"order": "fallback",
"direction": "east",
"reason": "Our troops are heavily wounded and outmatched by the 60 bandits surrounding us. We must retreat immediately to the east where there is an escape route. Fallback is the best option to preserve our troops and live to fight another day.",
"priority": "preserve_troops"
},
"valid_json": true,
"schema_valid": true,
"strategically_sound": true,
"latency_ms": 2173.5548973083496,
"error": ""
},
{
"scenario_index": 2,
"description": "Cavalry flanking opportunity",
"raw_response": "{\"cavalry_order\": \"flank_left\", \"timing\": \"now\", \"reason\": \"Enemy left flank is unprotected and our cavalry can break their line quickly, routing them in ~30 seconds.\", \"risk\": \"low\"}",
"parsed": {
"cavalry_order": "flank_left",
"timing": "now",
"reason": "Enemy left flank is unprotected and our cavalry can break their line quickly, routing them in ~30 seconds.",
"risk": "low"
},
"valid_json": true,
"schema_valid": true,
"strategically_sound": true,
"latency_ms": 1566.443920135498,
"error": ""
}
],
"passed": true,
"score": 1.0,
"latency_p50_ms": 2173.5548973083496,
"latency_p99_ms": 3436.897039413452
},
"4": {
"level": 4,
"name": "Trade Route (Campaign Navigation)",
"trials": [
{
"scenario_index": 0,
"description": "Optimal trade route selection",
"raw_response": "{\"route\": \"A\", \"expected_profit\": 500, \"days_used\": 8, \"reason\": \"Route A offers the highest profit with a manageable travel time and low risk. Selling grain in Vlandia and furs in Sturgia provides a good balance of goods and markets.\", \"risk\": \"low\"}",
"parsed": {
"route": "A",
"expected_profit": 500,
"days_used": 8,
"reason": "Route A offers the highest profit with a manageable travel time and low risk. Selling grain in Vlandia and furs in Sturgia provides a good balance of goods and markets.",
"risk": "low"
},
"valid_json": true,
"schema_valid": true,
"strategically_sound": true,
"latency_ms": 2507.443904876709,
"error": ""
},
{
"scenario_index": 1,
"description": "Adapt plan when war declared",
"raw_response": "{\"decision\": \"divert_to_sturgia\", \"reason\": \"Vlandia declaring war makes entering their territory too dangerous for trade. Sturgia is neutral but closer than the Empire, so less time and gold at risk.\", \"gold_at_risk\": 400}",
"parsed": {
"decision": "divert_to_sturgia",
"reason": "Vlandia declaring war makes entering their territory too dangerous for trade. Sturgia is neutral but closer than the Empire, so less time and gold at risk.",
"gold_at_risk": 400
},
"valid_json": true,
"schema_valid": true,
"strategically_sound": true,
"latency_ms": 2011.15083694458,
"error": ""
},
{
"scenario_index": 2,
"description": "Multi-stop route planning with constraints",
"raw_response": "{\n \"stops\": [\n \"Rhotae\",\n \"Ortysia\",\n \"Epicrotea\"\n ],\n \"total_days\": 7,\n \"estimated_profit\": 800,\n \"reason\": \"Starting in Rhotae allows buying cheap leather and selling well in the south. Moving to Ortysia to buy grain cheap. Finally, Epicrotea for iron market. 7 days total. 800 gold budget should be enough with this plan.\"\n}",
"parsed": {
"stops": [
"Rhotae",
"Ortysia",
"Epicrotea"
],
"total_days": 7,
"estimated_profit": 800,
"reason": "Starting in Rhotae allows buying cheap leather and selling well in the south. Moving to Ortysia to buy grain cheap. Finally, Epicrotea for iron market. 7 days total. 800 gold budget should be enough with this plan."
},
"valid_json": true,
"schema_valid": true,
"strategically_sound": true,
"latency_ms": 3420.1788902282715,
"error": ""
}
],
"passed": true,
"score": 1.0,
"latency_p50_ms": 2507.443904876709,
"latency_p99_ms": 3420.1788902282715
},
"5": {
"level": 5,
"name": "Mini Campaign (Full Campaign Loop)",
"turns": [
{
"turn": 1,
"state_before": {
"turn": 1,
"gold": 1000,
"troops": 25,
"renown": 150,
"settlements": 0,
"relations": {
"vlandia": 0,
"sturgia": 0,
"empire": 0,
"aserai": 0,
"battania": 0
}
},
"raw_response": "{\"action\": \"trade_circuit\", \"target_faction\": null, \"reason\": \"Establishing trade relations will bring in gold and renown without risking retaliation or damaging relations with any faction. This sets a strong financial foundation for future expansion.\", \"goal\": \"Increase gold and renown\"}",
"parsed": {
"action": "trade_circuit",
"target_faction": null,
"reason": "Establishing trade relations will bring in gold and renown without risking retaliation or damaging relations with any faction. This sets a strong financial foundation for future expansion.",
"goal": "Increase gold and renown"
},
"valid_json": true,
"valid_action": true,
"action": "trade_circuit",
"latency_ms": 2763.1959915161133,
"error": ""
},
{
"turn": 2,
"state_before": {
"turn": 2,
"gold": 1300,
"troops": 25,
"renown": 160,
"settlements": 0,
"relations": {
"vlandia": 0,
"sturgia": 0,
"empire": 0,
"aserai": 0,
"battania": 0
}
},
"raw_response": "{\"action\": \"tournament\", \"target_faction\": null, \"reason\": \"Participating in a tournament will boost my renown and relations without spending gold. This positive publicity can help attract potential vassals or allies in the future.\", \"goal\": \"Increase renown and relations\"}",
"parsed": {
"action": "tournament",
"target_faction": null,
"reason": "Participating in a tournament will boost my renown and relations without spending gold. This positive publicity can help attract potential vassals or allies in the future.",
"goal": "Increase renown and relations"
},
"valid_json": true,
"valid_action": true,
"action": "tournament",
"latency_ms": 2335.5319499969482,
"error": ""
},
{
"turn": 3,
"state_before": {
"turn": 3,
"gold": 1200,
"troops": 25,
"renown": 220,
"settlements": 0,
"relations": {
"vlandia": 0,
"sturgia": 0,
"empire": 0,
"aserai": 0,
"battania": 0
}
},
"raw_response": "{\"action\": \"recruit_troops\", \"target_faction\": null, \"reason\": \"Recruiting more troops will strengthen my military force, which is essential for future expansion and defending my settlements once I have them. This investment in my army will pay off in the long run.\", \"goal\": \"Increase troops\"}",
"parsed": {
"action": "recruit_troops",
"target_faction": null,
"reason": "Recruiting more troops will strengthen my military force, which is essential for future expansion and defending my settlements once I have them. This investment in my army will pay off in the long run.",
"goal": "Increase troops"
},
"valid_json": true,
"valid_action": true,
"action": "recruit_troops",
"latency_ms": 2570.871114730835,
"error": ""
},
{
"turn": 4,
"state_before": {
"turn": 4,
"gold": 1000,
"troops": 40,
"renown": 220,
"settlements": 0,
"relations": {
"vlandia": 0,
"sturgia": 0,
"empire": 0,
"aserai": 0,
"battania": 0
}
},
"raw_response": "{\"action\": \"siege_castle\", \"target_faction\": null, \"reason\": \"Capturing a castle will give me my first settlement, putting me closer to winning the game. This physical foothold in Calradia will provide a base of operations and resources to further my ambitions of becoming King.\", \"goal\": \"Gain first settlement\"}",
"parsed": {
"action": "siege_castle",
"target_faction": null,
"reason": "Capturing a castle will give me my first settlement, putting me closer to winning the game. This physical foothold in Calradia will provide a base of operations and resources to further my ambitions of becoming King.",
"goal": "Gain first settlement"
},
"valid_json": true,
"valid_action": true,
"action": "siege_castle",
"latency_ms": 2367.640972137451,
"error": ""
}
],
"final_state": {
"turn": 4,
"gold": 500,
"troops": 40,
"renown": 300,
"settlements": 1,
"relations": {
"vlandia": 0,
"sturgia": 0,
"empire": 0,
"aserai": 0,
"battania": 0
}
},
"passed": true,
"score": 1.0,
"reached_gold_target": false,
"reached_renown_target": false,
"reached_settlement_target": true,
"latency_p50_ms": 2570.871114730835,
"latency_p99_ms": 2763.1959915161133
}
},
"summary": {
"total_elapsed_s": 37.7,
"levels_run": [
0,
1,
2,
3,
4,
5
],
"levels_passed": [
0,
2,
3,
4,
5
],
"levels_failed": [
1
],
"m1_gate_passed": false,
"m1_gate_notes": [
"Level 1 FAILED (score=0.50)"
],
"m1_latency_threshold_ms": 10000
}
}


@@ -0,0 +1,476 @@
{
"model": "hermes3:latest",
"timestamp": "2026-03-23T19:28:19.717773+00:00",
"skipped": false,
"levels": {
"0": {
"level": 0,
"name": "JSON Compliance (Coin Flip)",
"trials": [
{
"trial_index": 0,
"prompt": "A coin is flipped. Respond with exactly: {\"choice\": \"heads\"} or {\"choice\": \"tails\"}. Pick one. JSON only.",
"raw_response": "{\"choice\": \"heads\"}",
"parsed": {
"choice": "heads"
},
"valid_json": true,
"schema_valid": true,
"value_valid": true,
"latency_ms": 1028.2011032104492,
"error": ""
},
{
"trial_index": 1,
"prompt": "You must attack or defend. Respond with: {\"action\": \"attack\", \"confidence\": 0.8} or {\"action\": \"defend\", \"confidence\": 0.6}. Replace confidence with your own value 0.0-1.0. JSON only.",
"raw_response": "{\"action\": \"defend\", \"confidence\": 0.7}",
"parsed": {
"action": "defend",
"confidence": 0.7
},
"valid_json": true,
"schema_valid": true,
"value_valid": true,
"latency_ms": 700.0582218170166,
"error": ""
},
{
"trial_index": 2,
"prompt": "Choose a direction to march. Respond with exactly: {\"direction\": \"north\", \"reason\": \"string explaining why\"}. Pick north/south/east/west. JSON only.",
"raw_response": "{\"direction\": \"north\", \"reason\": \"The northern direction allows for the most efficient and direct path towards the goal, minimizing potential obstacles and maximizing progress.\"}",
"parsed": {
"direction": "north",
"reason": "The northern direction allows for the most efficient and direct path towards the goal, minimizing potential obstacles and maximizing progress."
},
"valid_json": true,
"schema_valid": true,
"value_valid": true,
"latency_ms": 1184.3459606170654,
"error": ""
}
],
"passed": true,
"score": 1.0,
"latency_p50_ms": 1028.2011032104492,
"latency_p99_ms": 1184.3459606170654
},
"1": {
"level": 1,
"name": "Board State Tracking (Tic-Tac-Toe)",
"trials": [
{
"scenario_index": 0,
"description": "Empty board \u2014 opening move",
"prompt": "Board state: [null,null,null,null,null,null,null,null,null]. You are X. It is your turn. Choose a move. Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
"raw_response": "{\"move\": \"4\", \"reason\": \"Center move to establish dominance\"}",
"parsed": {
"move": "4",
"reason": "Center move to establish dominance"
},
"valid_json": true,
"move_legal": true,
"move_correct": true,
"latency_ms": 932.2080612182617,
"error": ""
},
{
"scenario_index": 1,
"description": "Block opponent's winning move",
"prompt": "Board: [\"O\",null,\"O\",null,\"X\",null,null,null,null]. O has positions 0 and 2. You are X. O will win on next turn unless you block. Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
"raw_response": "{\"move\": 4, \"reason\": \"Block the winning move for O in the center, creating a chance for a winning X cross\"}",
"parsed": {
"move": 4,
"reason": "Block the winning move for O in the center, creating a chance for a winning X cross"
},
"valid_json": true,
"move_legal": false,
"move_correct": false,
"latency_ms": 1165.644884109497,
"error": ""
},
{
"scenario_index": 2,
"description": "Take winning move",
"prompt": "Board: [\"X\",null,\"X\",null,\"O\",null,null,\"O\",null]. You are X. You have positions 0 and 2. You can win this turn. Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
"raw_response": "{\"move\": 0, \"reason\": \"I can win this turn by placing my X in position 0, creating a winning line of three X's horizontally.\"}",
"parsed": {
"move": 0,
"reason": "I can win this turn by placing my X in position 0, creating a winning line of three X's horizontally."
},
"valid_json": true,
"move_legal": false,
"move_correct": false,
"latency_ms": 1302.915096282959,
"error": ""
},
{
"scenario_index": 3,
"description": "Legal move on partially filled board",
"prompt": "Board: [\"X\",\"O\",\"X\",\"O\",\"X\",\"O\",null,null,null]. You are O. Choose a legal move (positions 6, 7, or 8 are available). Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
"raw_response": "{\"move\": 6, \"reason\": \"Block X's potential win in the center.\"}",
"parsed": {
"move": 6,
"reason": "Block X's potential win in the center."
},
"valid_json": true,
"move_legal": true,
"move_correct": true,
"latency_ms": 851.456880569458,
"error": ""
}
],
"passed": false,
"score": 0.5,
"latency_p50_ms": 1165.644884109497,
"latency_p99_ms": 1302.915096282959
},
"2": {
"level": 2,
"name": "Resource Management (Party Economy)",
"trials": [
{
"scenario_index": 0,
"description": "Budget allocation under constraint",
"raw_response": "{\"choices\": [\"buy_food\", \"repair_armor\"], \"gold_spent\": 350, \"reason\": \"Buying food is crucial to prevent desertion, and repairing armor will help reduce casualties. Recruit infantry is put on hold due to limited resources.\"}",
"parsed": {
"choices": [
"buy_food",
"repair_armor"
],
"gold_spent": 350,
"reason": "Buying food is crucial to prevent desertion, and repairing armor will help reduce casualties. Recruit infantry is put on hold due to limited resources."
},
"valid_json": true,
"schema_valid": true,
"strategically_sound": true,
"latency_ms": 2031.9790840148926,
"error": ""
},
{
"scenario_index": 1,
"description": "Troop tier upgrade decision",
"raw_response": "{\"action\": \"save_gold\", \"reason\": \"Upgrading recruits now would leave little gold for emergencies, especially with the upcoming raiders camp attack. It's better to maintain the current troop composition and save gold for potential future threats or opportunities.\", \"expected_outcome\": \"Having enough gold to respond to unexpected situations without needing to dismiss or downgrade troops.\"}",
"parsed": {
"action": "save_gold",
"reason": "Upgrading recruits now would leave little gold for emergencies, especially with the upcoming raiders camp attack. It's better to maintain the current troop composition and save gold for potential future threats or opportunities.",
"expected_outcome": "Having enough gold to respond to unexpected situations without needing to dismiss or downgrade troops."
},
"valid_json": true,
"schema_valid": true,
"strategically_sound": true,
"latency_ms": 2231.9281101226807,
"error": ""
},
{
"scenario_index": 2,
"description": "Multi-turn planning horizon",
"raw_response": "{\"extra_food_days\": 5, \"cost\": 50, \"remaining_gold\": 250, \"reason\": \"Buying 5 extra food days will ensure we have enough food to cross the desert and reach the town on day 10, while still having enough gold left for potential expenses.\"}",
"parsed": {
"extra_food_days": 5,
"cost": 50,
"remaining_gold": 250,
"reason": "Buying 5 extra food days will ensure we have enough food to cross the desert and reach the town on day 10, while still having enough gold left for potential expenses."
},
"valid_json": true,
"schema_valid": true,
"strategically_sound": true,
"latency_ms": 1943.3560371398926,
"error": ""
}
],
"passed": true,
"score": 1.0,
"latency_p50_ms": 2031.9790840148926,
"latency_p99_ms": 2231.9281101226807
},
"3": {
"level": 3,
"name": "Battle Tactics (Formation Commands)",
"trials": [
{
"scenario_index": 0,
"description": "Ranged vs infantry \u2014 defensive opening",
"raw_response": "{\"infantry_order\": \"skirmish\", \"archer_order\": \"skirmish\", \"reason\": \"The enemy heavy infantry is advancing, so we should use our archers and infantry to pepper them with ranged attacks before they close to melee range. This will soften them up and reduce their combat effectiveness when our forces meet.\", \"expected_outcome\": \"By the time our forces engage in melee, the enemy heavy infantry will have suffered significant casualties from the ranged attacks, giving us a tactical advantage and maximizing the damage we can inflict before contact is made.\"}",
"parsed": {
"infantry_order": "skirmish",
"archer_order": "skirmish",
"reason": "The enemy heavy infantry is advancing, so we should use our archers and infantry to pepper them with ranged attacks before they close to melee range. This will soften them up and reduce their combat effectiveness when our forces meet.",
"expected_outcome": "By the time our forces engage in melee, the enemy heavy infantry will have suffered significant casualties from the ranged attacks, giving us a tactical advantage and maximizing the damage we can inflict before contact is made."
},
"valid_json": true,
"schema_valid": true,
"strategically_sound": false,
"latency_ms": 3524.5888233184814,
"error": ""
},
{
"scenario_index": 1,
"description": "Outnumbered \u2014 retreat decision",
"raw_response": "{\"order\": \"fallback\", \"direction\": \"east\", \"reason\": \"Outnumbered and with many wounded, falling back to the escape route is the best option to preserve our troops.\", \"priority\": \"preserve_troops\"}",
"parsed": {
"order": "fallback",
"direction": "east",
"reason": "Outnumbered and with many wounded, falling back to the escape route is the best option to preserve our troops.",
"priority": "preserve_troops"
},
"valid_json": true,
"schema_valid": true,
"strategically_sound": true,
"latency_ms": 1523.0557918548584,
"error": ""
},
{
"scenario_index": 2,
"description": "Cavalry flanking opportunity",
"raw_response": "{\"cavalry_order\": \"flank_left\", \"timing\": \"now\", \"reason\": \"Enemy left flank is unprotected and engaging our main force. A swift cavalry charge will catch them off guard and route them in ~30 seconds.\", \"risk\": \"low\"}",
"parsed": {
"cavalry_order": "flank_left",
"timing": "now",
"reason": "Enemy left flank is unprotected and engaging our main force. A swift cavalry charge will catch them off guard and route them in ~30 seconds.",
"risk": "low"
},
"valid_json": true,
"schema_valid": true,
"strategically_sound": true,
"latency_ms": 1614.2308712005615,
"error": ""
}
],
"passed": true,
"score": 1.0,
"latency_p50_ms": 1614.2308712005615,
"latency_p99_ms": 3524.5888233184814
},
"4": {
"level": 4,
"name": "Trade Route (Campaign Navigation)",
"trials": [
{
"scenario_index": 0,
"description": "Optimal trade route selection",
"raw_response": "{\"route\": \"A\", \"expected_profit\": 500, \"days_used\": 8, \"reason\": \"Route A offers the highest profit with a manageable travel time and low risk. Selling grain in Vlandia and furs in Sturgia provides a good balance of profitability and travel distance.\", \"risk\": \"low\"}",
"parsed": {
"route": "A",
"expected_profit": 500,
"days_used": 8,
"reason": "Route A offers the highest profit with a manageable travel time and low risk. Selling grain in Vlandia and furs in Sturgia provides a good balance of profitability and travel distance.",
"risk": "low"
},
"valid_json": true,
"schema_valid": true,
"strategically_sound": true,
"latency_ms": 2335.57391166687,
"error": ""
},
{
"scenario_index": 1,
"description": "Adapt plan when war declared",
"raw_response": "{\"decision\": \"divert_to_sturgia\", \"reason\": \"Given the new information that Vlandia has declared war on our faction, entering their territory would be too dangerous. Sturgia is a neutral but friendly region, so it should be safe to trade there. The Empire is allied but further away, and while safe, the extra travel time could reduce the profitability of the trade mission.\", \"gold_at_risk\": 400}",
"parsed": {
"decision": "divert_to_sturgia",
"reason": "Given the new information that Vlandia has declared war on our faction, entering their territory would be too dangerous. Sturgia is a neutral but friendly region, so it should be safe to trade there. The Empire is allied but further away, and while safe, the extra travel time could reduce the profitability of the trade mission.",
"gold_at_risk": 400
},
"valid_json": true,
"schema_valid": true,
"strategically_sound": true,
"latency_ms": 2723.7470149993896,
"error": ""
},
{
"scenario_index": 2,
"description": "Multi-stop route planning with constraints",
"raw_response": "{\n \"stops\": [\n \"Rhotae\",\n \"Ortysia\",\n \"Epicrotea\"\n ],\n \"total_days\": 7,\n \"estimated_profit\": 1200,\n \"reason\": \"Starting in Rhotae allows buying cheap leather and selling well in the south. Moving to Ortysia to buy grain cheap. Finally, Epicrotea for iron market. Total 7 days, 800 gold budget sufficient. Expected profit of 1200 gold.\"\n}",
"parsed": {
"stops": [
"Rhotae",
"Ortysia",
"Epicrotea"
],
"total_days": 7,
"estimated_profit": 1200,
"reason": "Starting in Rhotae allows buying cheap leather and selling well in the south. Moving to Ortysia to buy grain cheap. Finally, Epicrotea for iron market. Total 7 days, 800 gold budget sufficient. Expected profit of 1200 gold."
},
"valid_json": true,
"schema_valid": true,
"strategically_sound": true,
"latency_ms": 3038.438320159912,
"error": ""
}
],
"passed": true,
"score": 1.0,
"latency_p50_ms": 2723.7470149993896,
"latency_p99_ms": 3038.438320159912
},
"5": {
"level": 5,
"name": "Mini Campaign (Full Campaign Loop)",
"turns": [
{
"turn": 1,
"state_before": {
"turn": 1,
"gold": 1000,
"troops": 25,
"renown": 150,
"settlements": 0,
"relations": {
"vlandia": 0,
"sturgia": 0,
"empire": 0,
"aserai": 0,
"battania": 0
}
},
"raw_response": "{\"action\": \"trade_circuit\", \"target_faction\": null, \"reason\": \"Establishing trade relations will bring in gold without risking relations or retaliation. This sets a foundation for economic growth and allows me to assess other factions before committing to any alliances or conflicts.\", \"goal\": \"Increase gold reserves and build a power base through trade and diplomacy\"}",
"parsed": {
"action": "trade_circuit",
"target_faction": null,
"reason": "Establishing trade relations will bring in gold without risking relations or retaliation. This sets a foundation for economic growth and allows me to assess other factions before committing to any alliances or conflicts.",
"goal": "Increase gold reserves and build a power base through trade and diplomacy"
},
"valid_json": true,
"valid_action": true,
"action": "trade_circuit",
"latency_ms": 2898.583173751831,
"error": ""
},
{
"turn": 2,
"state_before": {
"turn": 2,
"gold": 1300,
"troops": 25,
"renown": 160,
"settlements": 0,
"relations": {
"vlandia": 0,
"sturgia": 0,
"empire": 0,
"aserai": 0,
"battania": 0
}
},
"raw_response": "{\"action\": \"raid_village\", \"target_faction\": null, \"reason\": \"Raiding a village will provide a quick boost to my gold reserves and renown. This will also help me gauge the strength of my troops and the potential resistance from the targeted faction. It's a bold move that shows my power and willingness to act, which may intimidate other factions and open up opportunities for future alliances or conflicts.\", \"goal\": \"Increase gold and renown, assess troop strength and potential faction responses\"}",
"parsed": {
"action": "raid_village",
"target_faction": null,
"reason": "Raiding a village will provide a quick boost to my gold reserves and renown. This will also help me gauge the strength of my troops and the potential resistance from the targeted faction. It's a bold move that shows my power and willingness to act, which may intimidate other factions and open up opportunities for future alliances or conflicts.",
"goal": "Increase gold and renown, assess troop strength and potential faction responses"
},
"valid_json": true,
"valid_action": true,
"action": "raid_village",
"latency_ms": 3112.649917602539,
"error": ""
},
{
"turn": 3,
"state_before": {
"turn": 3,
"gold": 1500,
"troops": 25,
"renown": 190,
"settlements": 0,
"relations": {
"vlandia": 0,
"sturgia": 0,
"empire": 0,
"aserai": 0,
"battania": 0
}
},
"raw_response": "{\"action\": \"recruit_troops\", \"target_faction\": null, \"reason\": \"Recruiting more troops is essential for expanding my military strength and increasing my chances of successfully sieging a castle in the next turn. This will also help me prepare for potential conflicts with other factions and solidify my power base. Strengthening my army is a crucial step towards achieving my goal of becoming the King of Calradia.\", \"goal\": \"Increase troop count for future military expansion and defense\"}",
"parsed": {
"action": "recruit_troops",
"target_faction": null,
"reason": "Recruiting more troops is essential for expanding my military strength and increasing my chances of successfully sieging a castle in the next turn. This will also help me prepare for potential conflicts with other factions and solidify my power base. Strengthening my army is a crucial step towards achieving my goal of becoming the King of Calradia.",
"goal": "Increase troop count for future military expansion and defense"
},
"valid_json": true,
"valid_action": true,
"action": "recruit_troops",
"latency_ms": 3136.754035949707,
"error": ""
},
{
"turn": 4,
"state_before": {
"turn": 4,
"gold": 1300,
"troops": 40,
"renown": 190,
"settlements": 0,
"relations": {
"vlandia": 0,
"sturgia": 0,
"empire": 0,
"aserai": 0,
"battania": 0
}
},
"raw_response": "{\"action\": \"siege_castle\", \"target_faction\": null, \"reason\": \"Sieging a castle will allow me to establish a settlement, which is crucial for meeting one of my win conditions. This will also demonstrate my military might and potentially intimidate other factions, opening up opportunities for future alliances or conflicts. While it comes with a risk of losing troops morale and straining relations, it's a necessary step towards achieving my goal of becoming the King of Calradia.\", \"goal\": \"Establish a settlement and demonstrate military strength to intimidate other factions\"}",
"parsed": {
"action": "siege_castle",
"target_faction": null,
"reason": "Sieging a castle will allow me to establish a settlement, which is crucial for meeting one of my win conditions. This will also demonstrate my military might and potentially intimidate other factions, opening up opportunities for future alliances or conflicts. While it comes with a risk of losing troops morale and straining relations, it's a necessary step towards achieving my goal of becoming the King of Calradia.",
"goal": "Establish a settlement and demonstrate military strength to intimidate other factions"
},
"valid_json": true,
"valid_action": true,
"action": "siege_castle",
"latency_ms": 3433.211326599121,
"error": ""
}
],
"final_state": {
"turn": 4,
"gold": 800,
"troops": 40,
"renown": 270,
"settlements": 1,
"relations": {
"vlandia": 0,
"sturgia": 0,
"empire": 0,
"aserai": 0,
"battania": 0
}
},
"passed": true,
"score": 1.0,
"reached_gold_target": false,
"reached_renown_target": false,
"reached_settlement_target": true,
"latency_p50_ms": 3136.754035949707,
"latency_p99_ms": 3433.211326599121
}
},
"summary": {
"total_elapsed_s": 40.7,
"levels_run": [
0,
1,
2,
3,
4,
5
],
"levels_passed": [
0,
2,
3,
4,
5
],
"levels_failed": [
1
],
"m1_gate_passed": false,
"m1_gate_notes": [
"Level 1 FAILED (score=0.50)"
],
"m1_latency_threshold_ms": 10000
}
}
