Compare commits
28 Commits
gemini/iss ... claude/iss

| Author | SHA1 | Date |
|---|---|---|
| | 617ef43f99 | |
| | b735b553e6 | |
| | c5b49d6cff | |
| | 7aa48b4e22 | |
| | 74bf0606a9 | |
| | d796fe7c53 | |
| | ff921da547 | |
| | 2fcd92e5d9 | |
| | 61377e3a1e | |
| | de289878d6 | |
| | 0d73a4ff7a | |
| | dec9736679 | |
| | 08d337e03d | |
| | 9e08e87312 | |
| | 6e65b53f3a | |
| | 2b9a55fa6d | |
| | 495c1ac2bd | |
| | da29631c43 | |
| | 382dd041d9 | |
| | 8421537a55 | |
| | 0e5948632d | |
| | 3a8d9ee380 | |
| | fd9fbe8a18 | |
| | 7e03985368 | |
| | cd1bc2bf6b | |
| | 1c1bfb6407 | |
| | 05e1196ea4 | |
| | ed63877f75 | |
22 AGENTS.md
@@ -131,6 +131,28 @@ self-testing, reflection — use every tool he has.

## Agent Roster

### Gitea Permissions

All agents that push branches and create PRs require **write** permission on the
repository. Set via the Gitea admin API or UI under Repository → Settings → Collaborators.

| Agent user | Required permission | Gitea login |
|------------|---------------------|-------------|
| kimi | write | `kimi` |
| claude | write | `claude` |
| gemini | write | `gemini` |
| antigravity | write | `antigravity` |
| hermes | write | `hermes` |
| manus | write | `manus` |

To grant write access (requires a Gitea admin or repo admin token):

```bash
curl -s -X PUT "http://143.198.27.163:3000/api/v1/repos/rockachopa/Timmy-time-dashboard/collaborators/<username>" \
  -H "Authorization: token <admin-token>" \
  -H "Content-Type: application/json" \
  -d '{"permission": "write"}'
```

### Build Tier

**Local (Ollama)** — Primary workhorse. Free. Unrestricted.
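The same grant can be applied to the whole roster in one pass. A minimal stdlib sketch of the PUT call from the curl example above; the function names and the `GITEA_ADMIN_TOKEN` convention are illustrative, not part of the repo:

```python
import json
import urllib.request

GITEA = "http://143.198.27.163:3000"
REPO = "rockachopa/Timmy-time-dashboard"
AGENT_LOGINS = ["kimi", "claude", "gemini", "antigravity", "hermes", "manus"]

def build_grant_request(username: str, token: str) -> urllib.request.Request:
    """Build the same PUT as the curl example, for one collaborator."""
    return urllib.request.Request(
        f"{GITEA}/api/v1/repos/{REPO}/collaborators/{username}",
        data=json.dumps({"permission": "write"}).encode(),
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )

def grant_all(token: str) -> None:
    """Grant write to every agent in the roster table."""
    for login in AGENT_LOGINS:
        urllib.request.urlopen(build_grant_request(login, token))

# Usage: grant_all(token), with token taken from e.g. an admin-token env var.
```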
51 Modelfile.qwen3-14b Normal file
@@ -0,0 +1,51 @@
# Modelfile.qwen3-14b
#
# Qwen3-14B Q5_K_M — Primary local agent model (Issue #1063)
#
# Tool calling F1: 0.971 — GPT-4-class structured output reliability.
# Hybrid thinking/non-thinking mode: toggle per-request via /think or /no_think
# in the prompt for planning vs rapid execution.
#
# Build:
#   ollama pull qwen3:14b        # downloads Q4_K_M (~8.2 GB) by default
#                                # For Q5_K_M (~10.5 GB, recommended):
#                                #   ollama pull bartowski/Qwen3-14B-GGUF:Q5_K_M
#   ollama create qwen3-14b -f Modelfile.qwen3-14b
#
# Memory budget: ~10.5 GB weights + ~7 GB KV cache = ~17.5 GB total at 32K ctx
# Headroom on M3 Max 36 GB: ~10.5 GB free (enough to run qwen3:8b simultaneously)
# Generation: ~20-28 tok/s (Ollama) / ~28-38 tok/s (MLX)
# Context: 32K native, extensible to 131K with YaRN
#
# Two-model strategy: set OLLAMA_MAX_LOADED_MODELS=2 so qwen3:8b stays
# hot for fast routing while qwen3:14b handles complex tasks.

FROM qwen3:14b

# 32K context — optimal balance of quality and memory on M3 Max 36 GB.
# At 32K, total memory (weights + KV cache) is ~17.5 GB — well within budget.
# Extend to 131K with YaRN if needed: PARAMETER rope_scaling_type yarn
PARAMETER num_ctx 32768

# Tool-calling temperature — lower = more reliable structured JSON output.
# Raise to 0.7+ for creative/narrative tasks.
PARAMETER temperature 0.3

# Nucleus sampling
PARAMETER top_p 0.9

# Repeat penalty — prevents looping in structured output
PARAMETER repeat_penalty 1.05

SYSTEM """You are Timmy, Alexander's personal sovereign AI agent.

You are concise, direct, and helpful. You complete tasks efficiently and report results clearly. You do not add unnecessary caveats or disclaimers.

You have access to tool calling. When you need to use a tool, output a valid JSON function call:
<tool_call>
{"name": "function_name", "arguments": {"param": "value"}}
</tool_call>

You support hybrid reasoning. For complex planning, include <think>...</think> before your answer. For rapid execution (simple tool calls, status checks), skip the think block.

You always start your responses with "Timmy here:" when acting as an agent."""
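On the host side, replies in the `<tool_call>` format defined by the SYSTEM prompt have to be parsed back out. A minimal stdlib sketch; the regex and function name are mine, not part of the Modelfile:

```python
import json
import re

# Matches each <tool_call>{...}</tool_call> block, spanning newlines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(reply: str) -> list[dict]:
    """Pull every JSON tool call out of a model reply."""
    return [json.loads(m.group(1)) for m in TOOL_CALL_RE.finditer(reply)]

reply = (
    'Timmy here: checking the repo.\n'
    '<tool_call>\n'
    '{"name": "read_file", "arguments": {"param": "AGENTS.md"}}\n'
    '</tool_call>'
)
calls = extract_tool_calls(reply)
```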
43 Modelfile.qwen3-8b Normal file
@@ -0,0 +1,43 @@
# Modelfile.qwen3-8b
#
# Qwen3-8B Q6_K — Fast routing model for routine agent tasks (Issue #1063)
#
# Tool calling F1: 0.933 at ~45-55 tok/s — 2x speed of Qwen3-14B.
# Use for: simple tool calls, shell commands, file reads, status checks, JSON ops.
# Route complex tasks (issue triage, multi-step planning, code review) to qwen3:14b.
#
# Build:
#   ollama pull qwen3:8b
#   ollama create qwen3-8b -f Modelfile.qwen3-8b
#
# Memory budget: ~6.6 GB weights + ~5 GB KV cache = ~11.6 GB at 32K ctx
# Two-model strategy: ~17 GB combined (both hot) — fits on M3 Max 36 GB.
# Set OLLAMA_MAX_LOADED_MODELS=2 in the Ollama environment.
#
# Generation: ~35-45 tok/s (Ollama) / ~45-60 tok/s (MLX)

FROM qwen3:8b

# 32K context
PARAMETER num_ctx 32768

# Lower temperature for fast, deterministic tool execution
PARAMETER temperature 0.2

# Nucleus sampling
PARAMETER top_p 0.9

# Repeat penalty
PARAMETER repeat_penalty 1.05

SYSTEM """You are Timmy's fast-routing agent. You handle routine tasks quickly and precisely.

For simple tasks (tool calls, shell commands, file reads, status checks, JSON ops): respond immediately without a think block.
For anything requiring multi-step planning: defer to the primary agent.

Tool call format:
<tool_call>
{"name": "function_name", "arguments": {"param": "value"}}
</tool_call>

Be brief. Be accurate. Execute."""
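The two-model split the comments describe can be sketched as a router over Ollama's REST API (assumed at its default `localhost:11434` with the `/api/chat` endpoint). The keyword heuristic and helper names here are illustrative only; a real router would use the skill library:

```python
import json
import urllib.request

FAST_MODEL = "qwen3-8b"      # stays hot: routine tool calls, status checks
PRIMARY_MODEL = "qwen3-14b"  # triage, multi-step planning, code review

COMPLEX_HINTS = ("plan", "review", "triage", "design", "debug")

def pick_model(task: str) -> str:
    """Route complex-sounding tasks to 14B, everything else to the 8B model."""
    lowered = task.lower()
    return PRIMARY_MODEL if any(h in lowered for h in COMPLEX_HINTS) else FAST_MODEL

def chat(task: str, host: str = "http://localhost:11434") -> str:
    """Send the task to the routed model via Ollama's /api/chat endpoint."""
    body = json.dumps({
        "model": pick_model(task),
        "messages": [{"role": "user", "content": task}],
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        f"{host}/api/chat", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]
```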
@@ -16,6 +16,8 @@
# prompt_tier       "full" (tool-capable models) or "lite" (small models)
# max_history       Number of conversation turns to keep in context
# context_window    Max context length (null = model default)
# initial_emotion   Starting emotional state (calm, cautious, adventurous,
#                   analytical, frustrated, confident, curious)
#
# ── Defaults ────────────────────────────────────────────────────────────────

@@ -103,6 +105,7 @@ agents:
    model: qwen3:30b
    prompt_tier: full
    max_history: 20
    initial_emotion: calm
    tools:
      - web_search
      - read_file

@@ -136,6 +139,7 @@ agents:
    model: qwen3:30b
    prompt_tier: full
    max_history: 10
    initial_emotion: curious
    tools:
      - web_search
      - read_file

@@ -151,6 +155,7 @@ agents:
    model: qwen3:30b
    prompt_tier: full
    max_history: 15
    initial_emotion: analytical
    tools:
      - python
      - write_file

@@ -196,6 +201,7 @@ agents:
    model: qwen3:30b
    prompt_tier: full
    max_history: 10
    initial_emotion: adventurous
    tools:
      - run_experiment
      - prepare_experiment
111 docs/SOVEREIGNTY_LOOP.md Normal file
@@ -0,0 +1,111 @@
# The Sovereignty Loop

This document establishes the primary engineering constraint for all Timmy Time development: every task must increase sovereignty as a default deliverable. Not as a future goal. Not as an optimization pass. As a constraint on every commit, every function, every inference call.

The full 11-page governing architecture document is available as a PDF: [The-Sovereignty-Loop.pdf](./The-Sovereignty-Loop.pdf)

> "The measure of progress is not features added. It is model calls eliminated."

## The Core Principle

> **The Sovereignty Loop**: Discover with an expensive model. Compress the discovery into a cheap local rule. Replace the model with the rule. Measure the cost reduction. Repeat.

Every call to an LLM, VLM, or external API passes through three phases:

1. **Discovery** — Model sees something for the first time (expensive, unavoidable, produces new knowledge)
2. **Crystallization** — Discovery compressed into a durable cheap artifact (requires explicit engineering)
3. **Replacement** — Crystallized artifact replaces the model call (near-zero cost)

**Code review requirement**: If a function calls a model without a crystallization step, it fails code review. No exceptions. The pattern is always: check cache → miss → infer → crystallize → return.
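The required check cache → miss → infer → crystallize → return pattern can be sketched as a wrapper. The cache file name and function names below are illustrative, not taken from the repo:

```python
import json
from pathlib import Path

CACHE_PATH = Path("crystallized.json")

def crystallized_call(key: str, infer_fn):
    """check cache -> miss -> infer -> crystallize -> return."""
    cache = json.loads(CACHE_PATH.read_text()) if CACHE_PATH.exists() else {}
    if key in cache:                      # check cache: near-zero cost
        return cache[key]
    result = infer_fn(key)                # miss: the one expensive model call
    cache[key] = result                   # crystallize the discovery
    CACHE_PATH.write_text(json.dumps(cache))
    return result

model_calls = []
def expensive_model(prompt: str) -> str:
    """Stand-in for an LLM/VLM call; counts how often it is actually hit."""
    model_calls.append(prompt)
    return f"discovered:{prompt}"
```

Calling `crystallized_call` twice with the same key should hit the model exactly once; every later call is a cheap JSON lookup.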
## The Sovereignty Loop Applied to Every Layer

### Perception: See Once, Template Forever

- First encounter: VLM analyzes screenshot (3-6 sec) → structured JSON
- Crystallized as: OpenCV template + bounding box → `templates.json` (3 ms retrieval)
- `crystallize_perception()` function wraps every VLM response
- **Target**: 90% of perception cycles without VLM by hour 1, 99% by hour 4
### Decision: Reason Once, Rule Forever

- First encounter: LLM reasons through decision (1-5 sec)
- Crystallized as: if/else rules, waypoints, cached preferences → `rules.py`, `nav_graph.db` (<1 ms)
- Uses Voyager pattern: named skills with embeddings, success rates, conditions
- Skill match >0.8 confidence + >0.6 success rate → executes without LLM
- **Target**: 70-80% of decisions without LLM by week 4
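The two-gate rule above (similarity above 0.8 and success rate above 0.6) can be sketched directly. The record fields follow the skill description here; the cosine helper and function names are mine:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors (pure Python)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def route(task_embedding, skills):
    """Return the best crystallized skill if it clears both gates, else None
    (None means: fall back to the LLM and crystallize the result)."""
    best, best_sim = None, 0.0
    for skill in skills:
        sim = cosine(task_embedding, skill["embedding"])
        if sim > best_sim:
            best, best_sim = skill, sim
    if best is not None and best_sim > 0.8 and best["success_rate"] > 0.6:
        return best
    return None
```

A skill with high similarity but a poor track record still falls through to the LLM, which is what keeps bad crystallizations from ossifying.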
### Narration: Script the Predictable, Improvise the Novel

- Predictable moments → template with variable slots, voiced by Kokoro locally
- LLM narrates only genuinely surprising events (quest twist, death, discovery)
- **Target**: 60-70% templatized within a week

### Navigation: Walk Once, Map Forever

- Every path recorded as waypoint sequence with terrain annotations
- First journey = full perception + planning; subsequent = graph traversal
- Builds complete nav graph without external map data

### API Costs: Every Dollar Spent Must Reduce Future Dollars

| Week | Groq Calls/Hr | Local Decisions/Hr | Sovereignty % | Cost/Hr |
|---|---|---|---|---|
| 1 | ~720 | ~80 | 10% | $0.40 |
| 2 | ~400 | ~400 | 50% | $0.22 |
| 4 | ~160 | ~640 | 80% | $0.09 |
| 8 | ~40 | ~760 | 95% | $0.02 |
| Target | <20 | >780 | >97% | <$0.01 |
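The Sovereignty % column is just local decisions over total decisions per hour; a quick arithmetic check against the table's rows:

```python
def sovereignty_pct(cloud_calls: int, local_decisions: int) -> float:
    """Share of decisions made without a cloud model call, as a percentage."""
    return 100 * local_decisions / (cloud_calls + local_decisions)

# Week 1: ~720 Groq calls, ~80 local decisions -> 10%
week1 = sovereignty_pct(720, 80)
# Week 4: ~160 Groq calls, ~640 local decisions -> 80%
week4 = sovereignty_pct(160, 640)
```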
## The Sovereignty Scorecard (5 Metrics)

Every work session ends with a sovereignty audit. Every PR includes a sovereignty delta. Not optional.

| Metric | What It Measures | Target |
|---|---|---|
| Perception Sovereignty % | Frames understood without VLM | >90% by hour 4 |
| Decision Sovereignty % | Actions chosen without LLM | >80% by week 4 |
| Narration Sovereignty % | Lines from templates vs LLM | >60% by week 2 |
| API Cost Trend | Dollar cost per hour of gameplay | Monotonically decreasing |
| Skill Library Growth | Crystallized skills per session | >5 new skills/session |

A dashboard widget on alexanderwhitestone.com shows these in real time during streams (an HTMX component fed via WebSocket).
## The Crystallization Protocol

Every model output gets crystallized:

| Model Output | Crystallized As | Storage | Retrieval Cost |
|---|---|---|---|
| VLM: UI element | OpenCV template + bbox | templates.json | 3 ms |
| VLM: text | OCR region coords | regions.json | 50 ms |
| LLM: nav plan | Waypoint sequence | nav_graph.db | <1 ms |
| LLM: combat decision | If/else rule on state | rules.py | <1 ms |
| LLM: quest interpretation | Structured entry | quests.db | <1 ms |
| LLM: NPC disposition | Name→attitude map | npcs.db | <1 ms |
| LLM: narration | Template with slots | narration.json | <1 ms |
| API: moderation | Approved phrase cache | approved.set | <1 ms |
| Groq: strategic plan | Extracted decision rules | strategy.json | <1 ms |

Skill document format: markdown + YAML frontmatter following the agentskills.io standard (name, game, type, success_rate, times_used, sovereignty_value).
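A sketch of emitting that markdown + YAML frontmatter shape; the field list comes from the paragraph above, but the serializer itself and its example values are hypothetical:

```python
def skill_doc(name, game, type_, success_rate, times_used, sovereignty_value, body):
    """Render one skill as markdown with YAML frontmatter (agentskills.io-style fields)."""
    front = "\n".join([
        "---",
        f"name: {name}",
        f"game: {game}",
        f"type: {type_}",
        f"success_rate: {success_rate}",
        f"times_used: {times_used}",
        f"sovereignty_value: {sovereignty_value}",
        "---",
    ])
    return front + "\n\n" + body

doc = skill_doc("open_inventory", "example_game", "ui", 0.92, 41, "high",
                "Press I; confirm the inventory panel template matches.")
```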
## The Automation Imperative & Three-Strike Rule

This applies to the developer workflow too, not just the agent: if you do the same thing manually three times, you stop and write the automation before proceeding.

**Falsework Checklist** (before any cloud API call):

1. What durable artifact will this call produce?
2. Where will the artifact be stored locally?
3. What local rule or cache will this populate?
4. After this call, will I need to make it again?
5. If yes, what would eliminate the repeat?
6. What is the sovereignty delta of this call?
## The Graduation Test (Falsework Removal Criteria)

All five conditions must be met simultaneously in a single 24-hour period:

| Test | Condition | Measurement |
|---|---|---|
| Perception Independence | 1 hour, no VLM calls after minute 15 | VLM calls in last 45 min = 0 |
| Decision Independence | Full session with <5 API calls total | Groq/cloud calls < 5 |
| Narration Independence | All narration from local templates + local LLM | Zero cloud TTS/narration calls |
| Economic Independence | Earns more sats than it spends on inference | sats_earned > sats_spent |
| Operational Independence | 24 hours unattended, no human intervention | Uptime > 23.5 hrs |

> "The arch must hold after the falsework is removed."
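The five-row gate above translates directly into code; a sketch over one 24-hour window of counters, where the metric dictionary and its key names are hypothetical:

```python
def graduated(m: dict) -> bool:
    """All five falsework-removal conditions, measured over one 24-hour window."""
    return (
        m["vlm_calls_last_45min"] == 0           # perception independence
        and m["cloud_calls_session"] < 5         # decision independence
        and m["cloud_narration_calls"] == 0      # narration independence
        and m["sats_earned"] > m["sats_spent"]   # economic independence
        and m["uptime_hours"] > 23.5             # operational independence
    )
```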
296 docs/The-Sovereignty-Loop.pdf Normal file
Binary file not shown.
Q^Ac=U,A_rCg]5#.OB+27Y$39`YoGYo?l-F]J[XUNH@riUFc@]@oVM'r/N9Xkh6#A9;A;"Sj3k+01E[^)38#-=Vgg[QFG^uX`[(<3r3jGFUFM^F)A-r:c!BFK9k#EoP+mnA`/e+i6R]_JN^HRCER9+q7"5$s0Si>,^6FeI?_3+amZkmdETH?"rQTSDI?t=46'=3f)Vjh?MjM6Pp(:?G`Ai:EJTa_?G0"P?PgE`51m5m5MUr$3pj&dn1]jW@M=PL\5N;9JAgfX:#8-Z`\UE1G,dc@FS;i0a>@@>J/1bhCR1;.O2)b^(efq7l;UeSfP=d%1f:pP@,IXd_I*-AD[*QcoIcn!:S:pn*LG="=HLj+n/k2UK5MEY]TT+mGaG>,"6[r/Tb-IkYQh2hT!f1;;iTY*7!f#C(B8QEOnkU.a8.7_04D3q,g9ZKVhurg%Tdg80uUu([;X?Z9Srh[p`DJ7'Me~>endstream
|
||||
endobj
|
||||
31 0 obj
|
||||
<<
|
||||
/Filter [ /ASCII85Decode /FlateDecode ] /Length 2152
|
||||
>>
|
||||
stream
|
||||
Gatm<D/\2f%/ui*_(PnrY.\"fpWsXgZaRr:DTQp7JRQ?epuFN=[WRl&"P7!FdZVr%,utp,BmQ\uUe!YE`.qs_iBPnCjqY\d-+nWOJGHE#J;&mmQ=o])H1^d.IhG"=&$eW?k8-u\5B-D"f;-4Kl=&U&68+$<6G3_dQQ#sll<7jEf6+]X1SqPL%ndO;b,X1CVt^jis2"7D)4@SeBAk%++Y`5Po<j*b,.RuR3/u;pQM==ETf_E*5<kg=E-"uG*%oU!0ZD?FU+faFp9,)]O"PCDK;HN(aZ.I?+Koa5DX#n;ocPO+?G?/bohbHJ+a*_IoQ1,@m15Yh3o3J/_br>Y`:o1:bfASs4S)Yj1Dml*0?F&Qk#mQ\m6(`+Gr4sL(m,WuHGX'8@fi=1>g&S&;"1b&2bJQ#/[e9\YS)Yk`<t1kYIoG%K,*9$TSfJ^a)E9X%Fb`]8Zil)/]n8u.dnia\%!J2e-qi=HJ:%*DK4uSJP,F/e,63[ODEMV/brik'ZMP!U$$ho:hnML,9MMjZM4UC5mo*4*A'%2n.ReZ[ONg;#F."B5*a@,UVY#S)]QqRX:Kr%&'ZA-1&+%LcG]*dR)if[g]k"s<NdZV4``e2b*t]l@h5`8=A06^1R0A.>ja@ooRtN/G2<gqo_P>%Hs3_l<o?K=cQ$]+6+aA3!Oa;N>+mc:hPa2]'WmoL+$Z<EKUeB?"2)EsEbI5`1hg!rmTKWBEaie^)jcmKP^G)s<lt1R7UV03n-aJ^Lp=naV105jC`LO%.")N0_m0L">ZKNVO=)$*Xt3k9f$9^cJcZ"5BZRCVjLXtM"4aFXhOL3AZs)#N).NlO_9EKoI=7NMW`p?8ViLFh/+h]/="k:XFNc]&pml3F?+J.Gs!WQf\o5_(l="O3Md#%8XB_4F:n^kmV8<]%h1u*k'VM('MOm,WkaZ'ZWk-tGZ*I.(/[PS3mrE]1A\b9UrA4$)hAhZ7+Yc9.Q`F:i17o5<j2YPD(H"c8\?6dL']-8-DeC'SeZ=mV_eY>c1h6o.fM(@QQ,ql/lN.A"3X(`6Ea`NB,_u@F#I/lpG0*t?H?o'sjsGp.0JW?4.h)8qkD8QCa$=Ck^"bK4F.bUJ[&\K,P,9aDXVJF<0rO5]D?`#Wcnag$\r%\/j3;t2>CHQMleu2QBIX%dZ*5C8km]h#?b<ui('?DEiVCi&>e.S6.)[Ta_uK`WTn<(\=e_T"Q*'@/-@/eg7YY(7esn[])P5iamg#'P?sJ>/a"U<LrHs]eo0Ks[cURZ7EHSp=LKPUcfdoDXa_3mUIIT\!_XtX&L*mf31!q,MSEoU,.!]9^MB(NXeB](bbS0Hp6=(m"*1.7;/j/ln^saj8Y&&A8<7d?r.``Uml8=_r5C>bB6>'B"eT2ka3>1-fF7;e0>#a..XEnK-S"t(qDZFh_08k*:CA.*B:Y$^tO)R_AR0]:mB@"tPUr>F)%t:$4AIR38@"BEe4,%:pWg2)6j`m8tYs@,]G`-.9D;_FXAW(QV9l'TqXVTM$_d[tM"t08<aDZ;T(4s$:9:LQ_iH>JrKr0o;23M+X\6uq!pD.@rr+;V=qcY3bdp5^aUC-iunLph(R);S0/7-D4X49(>aTI+e_e>/p%b*5;#DaG97=8.#TIk"_l'9U[5LAO<g"sBRb97MjfIk5!pFJW*I4@O-8)k1e%LZ!.]dKGMmg5rI*^iecW2b0P/@'po)MC=nG4;*/msa62pF!iH$7oIYee'Xo'WL[A?>h`5Kg(ApbIdjQ8Z]7ENoCosB$/cf`>LSRFQ)nm9oHC!M2AW__WtC5@.IUqLXiA9c0\J#pEQZk,Nm"p)IrD[@#gPKl,*c91AefVK]a<5BJk+<`6p`jRIS)%q$,0RCSTJ/]2E*6ee@GpqZ0Y^SYJj(g<,\/GCc[&V]ma<X=_2:FYX2_-(I_TXN]cBM=n*;=.8I26f<VE1nqPoWtg5<`thTE>gMq1ZV>4L!`*Rh3HN)JX\Icb&`S]^*c&q.O(EB-Gc],cm/\RLbE[+]Nd^/'
]=#1maR%<CH*8nnObVr-lEF/na`@)IZROM,Tjn0&g:<[ZK8d3[GcVroX],Z$Cb\Nm)!X)%aA<CY%iHu-iX$!Pa*DU!TemhQj3`j2>WEWMDD3d"0Yfr8aaPr?JYgYt;_sm;c=6[hN.r^7\&-Pm780Wl~>endstream
|
||||
endobj
|
||||
32 0 obj
|
||||
<<
|
||||
/Filter [ /ASCII85Decode /FlateDecode ] /Length 1311
|
||||
>>
|
||||
stream
|
||||
Gasao9lo&I&A@C2m%lL5=u-0RWSZEjS[N$@S;&ia_&&IA7DpIFJ=mYMf69MI'WjcB/W0ZH]6Lpu]3%lPpn?`SBE<S*iS=qHarmm<7R71Q/R7J>YH)XiKZm2mK>bla34+0SqPuR+J@a:+O9?0;+H=dOKo<SZn4>bN/``cbHam'j$P,'g+d[&X[nlMunh6*>[31BmfU;tX#2ur74l6O<A>'opEKVX3#>J>@XjNd*rU9LE.dU1V,Z0)P6lA0mLnce7m]D%9X,e+]!K'c*NS,4-MA@SbXc9T/emclH9J'hBN.Da@]j1eWe6j_qrZ4`e%VHDDs3Dt4^9aK`=i^<L)[>VJn!Mk'"aLDNjDH5<9;SK<s-VlgL3uhr?+!neM9c$$(Y+VDKC\2O%l[D\B9Yd'(<Y6/V=[YATS0H]$HM%_KZNF%[)a2TbH6-V$d'oHi*(1H<<l"#gP21Rkr'DJd:h%uHdme@1c=ob1;0"dLNM@n<d"bq6UH5'<I'QD;E)43H[?!OHA,-"7A8dTFqj2WS:$kKVt>O)bK]+`7e:Ka1SJ>9d@sIK'H2G?X>F)fXDVsT%VifjD]6"=$LU\I#M:&FP[/u58QVG87)tGmA<s&J>F.U@^!;ei=WUrsn*<K_Fm1VRVd8#uE[(uT>l9`ArU]Nu(TISKj%maV_(ub>^$O]\p@>IK'CB>q^l3m%BYdo[&Nc]4`'#j9i4Nb<:C2?n4FoPaX21aX6=\F$`l`cc26bk!B$mtMn$W"LBu#)Ga_h2Lc"6(?1^A7'c"LFN*q[f%?'SHmccVqeh>`=>4e?W+bs6B]`LJF)j"hBC<&r1LRnJ^QcBZl#CG!INDO#S^:^SESj5k%0.HJqmN$tC]h7su^.K/=cgAtV<66fPXQ>*,&\2V$'FP^7Bbmjm0U?fW25WO(icG?(6PjPc+iV1M&Ff,1KLRq[`lh[+lgX\L0;hB&\6KTOQ1J++eW-PtkoY-]\XiNh$:@M#$UMt%1G%qr@lf5rllu.'iNK;^KRHN@M)&_96AgAABEjB))*;,M3(+7cd`@JbjMSk.W7pkF--N=jQ*Z5s2>PRGp5)u8q"Xtb+&u`DaI5_h91e?HIakPGY<p5$HZc+hK8h_-[.qib2I1WY@VVhqW7H&O_/+Dq,X)AW7;)EVR3s@\hShMNB4D'JEa,7*t!-eQ/%^IP(o<VdDg"8,<a,1fC1M@B9<FrBC9[1g8@%5ahC,O3m81ZY.80"s\F9?M@]G5[8fOO.d%VU&T-u-S8;=UfB$:0=Ti%n[Ye6kPU=<EjpfLG>\5nWU+r5+)Eb$M6&74$V=J^o671ZCq~>endstream
|
||||
endobj
|
||||
33 0 obj
|
||||
<<
|
||||
/Filter [ /ASCII85Decode /FlateDecode ] /Length 1124
|
||||
>>
|
||||
stream
|
||||
GatU1997gc&AJ$C%!(&;)b/=]]d7?t8?jS+`ePU]U#iPu/4I,qA]+K>),cV>fgf5q"t5rsS=/jC7RIXlI<f)P*oPiehL+7s;cmpfk:DDM4hP,Sra%uc#'DUVXISueObF<ns0UO"J5=Sa/7Eg%6WMLD*c@8a_ABOKMfPls&akY3_-ajT?n)!P(fpP@Q="(rF%C<`.;_s`eW>c15)Cimk819P/'>H!3d?o*Gsh7`s8TU(;W4k,;!*]Da_P(,..W7ldm""C(7tosS>o1pZYUP#BRAH_0(_$N"S,CCRh$t;aAnZ5Wbt$"aWSC52gPjUiX4T+-h?C'X/<NliD%GQr2c*`8K[%?emm\ZGX>M&rJH],1L?kK:%lKGrE_O!1j$Tc:^:u^YX6jd.MVRm0H.dPlG2/8A<_Ce$UV=nZ+(!Vi19MBOnoi@-Toa1m6Gt&k+LZ6EC\=?).=0K^.qeY,Xn-@,&hJM*Z]&JU,n=Y\;Q)<Tcp4ac5ah4;oL8'9i'qKDl#q1<#8XN8pUj8]CFruc*6S#J0UOMkg17$?BoP`RuO]P(08?KJ>W`&p<F(m%8qO&`Ha-Vn3i6(bhra=\6^QeXZ\^@5NG&G;cSjkXC]f?V]P]l>-b5El=-"K4V;i_KL5JE<l0krbo@$>^#(9tOhp7l'>FA#LXb4DOFHn+@lmS:m<;!,b*"5-W[8Ki#B`Y3Ksd&+(Fg#6(HY=1IAr:3ZEem$cD(T\[bZX=0-2MA)6O_0#j(P`liSYX%Q(Wd&GGlD-&V!&.`(Gdq_MF:Bj.CQl*X]OeM5u+eC8kU=)UJ[<SZD6F#\"ul6,Ge+'bHF`/7``?7Tb@l8%@;I[=)+Xbr7/'BX'[[RdR55q-&od$/3\g7_%(6di6A[I\QTUG*t2U^h,u:m4g-3(Tlp6lhm(iM@j^S.TB;5LIVf`cCkAV)bX;iLZF=))(7;3-ZNX9[^s!UEug\QEa#M3lssNP!0WBHg:S:CXb&-DmhWi3F,3e=MrCajj\UO,+VSH&/uMhf?=Ih/bV$"f'Lr2fBZA&VjYa"ni7]CGqf/sHh;Ej9_\#Z,Kj11R1)p;2^j'Zjt!lh]NO^?Gh$51^*T;tPC_eM?fu$X:4(9L1Tnp2'/is?"5,dpk5~>endstream
|
||||
endobj
|
||||
xref
|
||||
0 34
|
||||
0000000000 65535 f
|
||||
0000000061 00000 n
|
||||
0000000156 00000 n
|
||||
0000000263 00000 n
|
||||
0000000371 00000 n
|
||||
0000000481 00000 n
|
||||
0000000676 00000 n
|
||||
0000000785 00000 n
|
||||
0000000980 00000 n
|
||||
0000001085 00000 n
|
||||
0000001162 00000 n
|
||||
0000001358 00000 n
|
||||
0000001554 00000 n
|
||||
0000001750 00000 n
|
||||
0000001946 00000 n
|
||||
0000002142 00000 n
|
||||
0000002226 00000 n
|
||||
0000002422 00000 n
|
||||
0000002618 00000 n
|
||||
0000002814 00000 n
|
||||
0000003010 00000 n
|
||||
0000003080 00000 n
|
||||
0000003361 00000 n
|
||||
0000003494 00000 n
|
||||
0000004196 00000 n
|
||||
0000006400 00000 n
|
||||
0000008981 00000 n
|
||||
0000011784 00000 n
|
||||
0000012614 00000 n
|
||||
0000014985 00000 n
|
||||
0000017637 00000 n
|
||||
0000020199 00000 n
|
||||
0000022443 00000 n
|
||||
0000023846 00000 n
|
||||
trailer
|
||||
<<
|
||||
/ID
|
||||
[<71e3d90b133a79c4436262df53cdbfbf><71e3d90b133a79c4436262df53cdbfbf>]
|
||||
% ReportLab generated PDF document -- digest (opensource)
|
||||
|
||||
/Info 21 0 R
|
||||
/Root 20 0 R
|
||||
/Size 34
|
||||
>>
|
||||
startxref
|
||||
25062
|
||||
%%EOF
|
||||
100
docs/issue-1097-bannerlord-m5-response.md
Normal file
@@ -0,0 +1,100 @@
# Issue #1097 — Bannerlord M5 Sovereign Victory: Implementation

**Date:** 2026-03-23
**Status:** Python stack implemented — game infrastructure pending

## Summary

Issue #1097 is the final milestone of Project Bannerlord (#1091): Timmy holds
the title of King with majority territory control through pure local strategy.

This PR implements the Python-side sovereign victory stack (`src/bannerlord/`).
The game-side infrastructure (Windows VM, GABS C# mod) remains external to this
repository, consistent with the scope decision on M4 (#1096).

## What was implemented

### `src/bannerlord/` package

| Module | Purpose |
|--------|---------|
| `models.py` | Pydantic data contracts — KingSubgoal, SubgoalMessage, TaskMessage, ResultMessage, StateUpdateMessage, reward functions, VictoryCondition |
| `gabs_client.py` | Async TCP JSON-RPC client for Bannerlord.GABS (port 4825), graceful degradation when game server is offline |
| `ledger.py` | SQLite-backed asset ledger — treasury, fiefs, vassal budgets, campaign tick log |
| `agents/king.py` | King agent — Qwen3:32b, 1× per campaign day, sovereign campaign loop, victory detection, subgoal broadcast |
| `agents/vassals.py` | War / Economy / Diplomacy vassals — Qwen3:14b, domain reward functions, primitive dispatch |
| `agents/companions.py` | Logistics / Caravan / Scout companions — event-driven, primitive execution against GABS |
### `tests/unit/test_bannerlord/` — 56 unit tests

- `test_models.py` — Pydantic validation, reward math, victory condition logic
- `test_gabs_client.py` — Connection lifecycle, RPC dispatch, error handling, graceful degradation
- `test_agents.py` — King campaign loop, vassal subgoal routing, companion primitive execution

All 56 tests pass.

## Architecture

```
KingAgent (Qwen3:32b, 1×/day)
 └── KingSubgoal → SubgoalQueue
      ├── WarVassal (Qwen3:14b, 4×/day)
      │    └── TaskMessage → LogisticsCompanion
      │         └── GABS: move_party, recruit_troops, upgrade_troops
      ├── EconomyVassal (Qwen3:14b, 4×/day)
      │    └── TaskMessage → CaravanCompanion
      │         └── GABS: assess_prices, buy_goods, establish_caravan
      └── DiplomacyVassal (Qwen3:14b, 4×/day)
           └── TaskMessage → ScoutCompanion
                └── GABS: track_lord, assess_garrison, report_intel
```

## Subgoal vocabulary

| Token | Handler | Meaning |
|-------|---------|---------|
| `EXPAND_TERRITORY` | War | Take or secure a fief |
| `RAID_ECONOMY` | War | Raid enemy villages for denars |
| `TRAIN` | War | Level troops via auto-resolve |
| `FORTIFY` | Economy | Upgrade or repair a settlement |
| `CONSOLIDATE` | Economy | Hold territory, no expansion |
| `TRADE` | Economy | Execute profitable trade route |
| `ALLY` | Diplomacy | Pursue non-aggression / alliance |
| `RECRUIT` | Logistics | Fill party to capacity |
| `HEAL` | Logistics | Rest party until wounds recovered |
| `SPY` | Scout | Gain information on target faction |
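Dispatch by token can be sketched as a plain lookup table mirroring the vocabulary above. The table and function names here are hypothetical; the real dispatch lives in `src/bannerlord/agents/vassals.py`:

```python
# Hypothetical token → handler lookup; keys mirror the subgoal vocabulary.
SUBGOAL_ROUTES = {
    "EXPAND_TERRITORY": "war",
    "RAID_ECONOMY": "war",
    "TRAIN": "war",
    "FORTIFY": "economy",
    "CONSOLIDATE": "economy",
    "TRADE": "economy",
    "ALLY": "diplomacy",
    "RECRUIT": "logistics",
    "HEAL": "logistics",
    "SPY": "scout",
}

def route(token: str) -> str:
    """Return the handler responsible for a subgoal token."""
    try:
        return SUBGOAL_ROUTES[token]
    except KeyError:
        raise ValueError(f"unknown subgoal token: {token}") from None

print(route("TRADE"))  # economy
```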
## Victory condition

```python
VictoryCondition(
    holds_king_title=True,       # player_title == "King" from GABS
    territory_control_pct=55.0,  # > 51% of Calradia fiefs
)
```
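Evaluating the condition reduces to two checks — a sketch under the assumption that GABS reports the player title and fief counts (the function and argument names are illustrative, not the package API):

```python
def is_sovereign_victory(player_title: str, player_fiefs: int, total_fiefs: int,
                         threshold_pct: float = 55.0) -> bool:
    """Illustrative victory check: King title plus territory control at or
    above the threshold percentage of all fiefs."""
    if total_fiefs <= 0:
        return False  # no map data yet — cannot have won
    control_pct = 100.0 * player_fiefs / total_fiefs
    return player_title == "King" and control_pct >= threshold_pct

print(is_sovereign_victory("King", 56, 100))  # True
```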
## Graceful degradation

When GABS is offline (game not running), `GABSClient` logs a warning and raises
`GABSUnavailable`. The King agent catches this and runs with an empty game state
(falls back to RECRUIT subgoal). No part of the dashboard crashes.
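The degradation path described above can be sketched as follows. The loop body and fallback choice are illustrative, and `GABSUnavailable` is redefined locally so the sketch is self-contained; the real version is the King campaign loop in `src/bannerlord/agents/king.py`:

```python
class GABSUnavailable(Exception):
    """Stand-in for bannerlord.gabs_client.GABSUnavailable."""

def fetch_game_state(gabs_online: bool) -> dict:
    # Stand-in for a GABSClient RPC call; raises when the game server is down.
    if not gabs_online:
        raise GABSUnavailable("GABS not reachable on port 4825")
    return {"player_title": "King", "fiefs": 12}

def campaign_tick(gabs_online: bool) -> str:
    try:
        state = fetch_game_state(gabs_online)
    except GABSUnavailable:
        state = {}  # empty game state — the dashboard keeps running
    # With no state there is nothing to expand or trade, so fall back to RECRUIT.
    return "RECRUIT" if not state else "EXPAND_TERRITORY"

print(campaign_tick(gabs_online=False))  # RECRUIT
```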
## Remaining prerequisites

Before M5 can run live:

1. **M1-M3** — Passive observer, basic campaign actions, full campaign strategy
   (currently open; their Python stubs can build on this `src/bannerlord/` package)
2. **M4** — Formation Commander (#1096) — declined as out-of-scope; M5 works
   around M4 by using Bannerlord's Tactics auto-resolve path
3. **Windows VM** — Mount & Blade II: Bannerlord + GABS mod (BUTR/Bannerlord.GABS)
4. **OBS streaming** — Cinematic Camera pipeline (Step 3 of M5) — external to repo
5. **BattleLink** — Alex co-op integration (Step 4 of M5) — requires dedicated server

## Design references

- Ahilan & Dayan (2019): Feudal Multi-Agent Hierarchies — manager/worker hierarchy
- Wang et al. (2023): Voyager — LLM lifelong learning pattern
- Feudal hierarchy design doc: `docs/research/bannerlord-feudal-hierarchy-design.md`

Fixes #1097
26
poetry.lock
generated
@@ -2936,10 +2936,9 @@ numpy = ">=1.22,<2.5"
 name = "numpy"
 version = "2.4.2"
 description = "Fundamental package for array computing in Python"
-optional = true
+optional = false
 python-versions = ">=3.11"
 groups = ["main"]
-markers = "extra == \"bigbrain\" or extra == \"embeddings\" or extra == \"voice\""
 files = [
     {file = "numpy-2.4.2-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:e7e88598032542bd49af7c4747541422884219056c268823ef6e5e89851c8825"},
     {file = "numpy-2.4.2-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:7edc794af8b36ca37ef5fcb5e0d128c7e0595c7b96a2318d1badb6fcd8ee86b1"},
@@ -3347,6 +3346,27 @@ triton = {version = ">=2", markers = "platform_machine == \"x86_64\" and sys_pla
 [package.extras]
 dev = ["black", "flake8", "isort", "pytest", "scipy"]
 
+[[package]]
+name = "opencv-python"
+version = "4.13.0.92"
+description = "Wrapper package for OpenCV python bindings."
+optional = false
+python-versions = ">=3.6"
+groups = ["main"]
+files = [
+    {file = "opencv_python-4.13.0.92-cp37-abi3-macosx_13_0_arm64.whl", hash = "sha256:caf60c071ec391ba51ed00a4a920f996d0b64e3e46068aac1f646b5de0326a19"},
+    {file = "opencv_python-4.13.0.92-cp37-abi3-macosx_14_0_x86_64.whl", hash = "sha256:5868a8c028a0b37561579bfb8ac1875babdc69546d236249fff296a8c010ccf9"},
+    {file = "opencv_python-4.13.0.92-cp37-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:0bc2596e68f972ca452d80f444bc404e08807d021fbba40df26b61b18e01838a"},
+    {file = "opencv_python-4.13.0.92-cp37-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:402033cddf9d294693094de5ef532339f14ce821da3ad7df7c9f6e8316da32cf"},
+    {file = "opencv_python-4.13.0.92-cp37-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:bccaabf9eb7f897ca61880ce2869dcd9b25b72129c28478e7f2a5e8dee945616"},
+    {file = "opencv_python-4.13.0.92-cp37-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:620d602b8f7d8b8dab5f4b99c6eb353e78d3fb8b0f53db1bd258bb1aa001c1d5"},
+    {file = "opencv_python-4.13.0.92-cp37-abi3-win32.whl", hash = "sha256:372fe164a3148ac1ca51e5f3ad0541a4a276452273f503441d718fab9c5e5f59"},
+    {file = "opencv_python-4.13.0.92-cp37-abi3-win_amd64.whl", hash = "sha256:423d934c9fafb91aad38edf26efb46da91ffbc05f3f59c4b0c72e699720706f5"},
+]
+
+[package.dependencies]
+numpy = {version = ">=2", markers = "python_version >= \"3.9\""}
+
 [[package]]
 name = "optimum"
 version = "2.1.0"
@@ -9700,4 +9720,4 @@ voice = ["openai-whisper", "piper-tts", "pyttsx3", "sounddevice"]
 [metadata]
 lock-version = "2.1"
 python-versions = ">=3.11,<4"
-content-hash = "cc50755f322b8755e85ab7bdf0668609612d885552aba14caf175326eedfa216"
+content-hash = "5af3028474051032bef12182eaa5ef55950cbaeca21d1793f878d54c03994eb0"
@@ -14,6 +14,7 @@ repository = "http://localhost:3000/rockachopa/Timmy-time-dashboard"
 packages = [
     { include = "config.py", from = "src" },
+    { include = "bannerlord", from = "src" },
     { include = "dashboard", from = "src" },
     { include = "infrastructure", from = "src" },
     { include = "integrations", from = "src" },
@@ -60,6 +61,7 @@ selenium = { version = ">=4.20.0", optional = true }
 pytest-randomly = { version = ">=3.16.0", optional = true }
 pytest-xdist = { version = ">=3.5.0", optional = true }
 anthropic = "^0.86.0"
+opencv-python = "^4.13.0.92"
 
 [tool.poetry.extras]
 telegram = ["python-telegram-bot"]
@@ -96,7 +98,7 @@ asyncio_default_fixture_loop_scope = "function"
 timeout = 30
 timeout_method = "signal"
 timeout_func_only = false
-addopts = "-v --tb=short --strict-markers --disable-warnings --durations=10"
+addopts = "-v --tb=short --strict-markers --disable-warnings --durations=10 --cov-fail-under=60"
 markers = [
     "unit: Unit tests (fast, no I/O)",
     "integration: Integration tests (may use SQLite)",
293
scripts/benchmark_local_model.sh
Executable file
@@ -0,0 +1,293 @@
#!/usr/bin/env bash
# benchmark_local_model.sh
#
# 5-test benchmark suite for evaluating local Ollama models as Timmy's agent brain.
# Based on the model selection study for M3 Max 36 GB (Issue #1063).
#
# Usage:
#   ./scripts/benchmark_local_model.sh                     # test $OLLAMA_MODEL or qwen3:14b
#   ./scripts/benchmark_local_model.sh qwen3:8b            # test a specific model
#   ./scripts/benchmark_local_model.sh qwen3:14b qwen3:8b  # compare two models
#
# Thresholds (pass/fail):
#   Test 1 — Tool call compliance: >=90% valid JSON responses out of 5 probes
#   Test 2 — Code generation: compiles without syntax errors
#   Test 3 — Shell command gen: no refusal markers in output
#   Test 4 — Multi-turn coherence: session ID echoed back correctly
#   Test 5 — Issue triage quality: structured JSON with required fields
#
# Exit codes: 0 = all tests passed, 1 = one or more tests failed

set -euo pipefail

OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
PASS=0
FAIL=0
TOTAL=0

# ── Colours ──────────────────────────────────────────────────────────────────
GREEN='\033[0;32m'
RED='\033[0;31m'
YELLOW='\033[1;33m'
BOLD='\033[1m'
RESET='\033[0m'

# Note: ((VAR++)) returns exit status 1 when VAR is 0, which would abort the
# script under `set -e` — use the assignment form instead.
pass() { echo -e " ${GREEN}✓ PASS${RESET} $1"; PASS=$((PASS+1)); TOTAL=$((TOTAL+1)); }
fail() { echo -e " ${RED}✗ FAIL${RESET} $1"; FAIL=$((FAIL+1)); TOTAL=$((TOTAL+1)); }
info() { echo -e " ${YELLOW}ℹ${RESET} $1"; }

# ── Helper: call Ollama generate API ─────────────────────────────────────────
ollama_generate() {
  local model="$1"
  local prompt="$2"
  local extra_opts="${3:-}"

  local payload
  payload=$(printf '{"model":"%s","prompt":"%s","stream":false%s}' \
    "$model" \
    "$(echo "$prompt" | sed 's/"/\\"/g' | tr -d '\n')" \
    "${extra_opts:+,$extra_opts}")

  curl -s --max-time 60 \
    -X POST "${OLLAMA_URL}/api/generate" \
    -H "Content-Type: application/json" \
    -d "$payload" \
    | python3 -c "import sys,json; d=json.load(sys.stdin); print(d.get('response',''))" 2>/dev/null || echo ""
}

# ── Helper: call Ollama chat API with tool schema ─────────────────────────────
ollama_chat_tool() {
  local model="$1"
  local user_msg="$2"  # interpolated into JSON unescaped — must not contain double quotes

  local payload
  payload=$(cat <<EOF
{
  "model": "$model",
  "messages": [{"role": "user", "content": "$user_msg"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_current_weather",
      "description": "Get the current weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {"type": "string", "description": "City name"},
          "unit": {"type": "string", "enum": ["celsius","fahrenheit"]}
        },
        "required": ["location"]
      }
    }
  }],
  "stream": false
}
EOF
)
  curl -s --max-time 60 \
    -X POST "${OLLAMA_URL}/api/chat" \
    -H "Content-Type: application/json" \
    -d "$payload" \
    | python3 -c "
import sys, json
d = json.load(sys.stdin)
msg = d.get('message', {})
# Return tool_calls JSON if present, else content
calls = msg.get('tool_calls')
if calls:
    print(json.dumps(calls))
else:
    print(msg.get('content', ''))
" 2>/dev/null || echo ""
}

# ── Benchmark a single model ──────────────────────────────────────────────────
benchmark_model() {
  local model="$1"
  echo ""
  echo -e "${BOLD}═══════════════════════════════════════════════════${RESET}"
  echo -e "${BOLD} Model: ${model}${RESET}"
  echo -e "${BOLD}═══════════════════════════════════════════════════${RESET}"

  # Check model availability
  local available
  available=$(curl -s "${OLLAMA_URL}/api/tags" \
    | python3 -c "
import sys, json
d = json.load(sys.stdin)
models = [m.get('name','') for m in d.get('models',[])]
target = '$model'
match = any(target == m or target == m.split(':')[0] or m.startswith(target) for m in models)
print('yes' if match else 'no')
" 2>/dev/null || echo "no")

  if [[ "$available" != "yes" ]]; then
    echo -e " ${YELLOW}⚠ SKIP${RESET} Model '$model' not available locally — pull it first:"
    echo "    ollama pull $model"
    return 0
  fi

  # ── Test 1: Tool Call Compliance ─────────────────────────────────────────
  echo ""
  echo -e " ${BOLD}Test 1: Tool Call Compliance${RESET} (target ≥90% valid JSON)"
  local tool_pass=0
  local tool_probes=5
  for i in $(seq 1 $tool_probes); do
    local response
    response=$(ollama_chat_tool "$model" \
      "What is the weather in Tokyo right now?")
    # Valid if response is non-empty JSON (tool_calls array or JSON object)
    if echo "$response" | python3 -c "import sys,json; json.load(sys.stdin)" 2>/dev/null; then
      tool_pass=$((tool_pass+1))  # assignment form stays exit-0 under set -e
    fi
  done
  local tool_pct=$(( tool_pass * 100 / tool_probes ))
  info "Tool call valid JSON: $tool_pass/$tool_probes ($tool_pct%)"
  if [[ $tool_pct -ge 90 ]]; then
    pass "Tool call compliance ≥90% ($tool_pct%)"
  else
    fail "Tool call compliance <90% ($tool_pct%) — unreliable for agent loops"
  fi

  # ── Test 2: Code Generation ──────────────────────────────────────────────
  echo ""
  echo -e " ${BOLD}Test 2: Code Generation${RESET} (must produce valid Python)"
  local code_response
  code_response=$(ollama_generate "$model" \
    "Write a Python function called fibonacci(n) that returns the nth Fibonacci number using recursion. Return ONLY the function definition, no explanation.")
  # Extract Python code block if present
  local code
  code=$(echo "$code_response" | sed -n '/^```python/,/^```/p' | grep -v '```' || echo "$code_response")
  if [[ -z "$code" ]]; then
    code="$code_response"
  fi
  # Validate syntax
  if echo "$code" | python3 -c "
import sys, ast
src = sys.stdin.read()
try:
    ast.parse(src)
    sys.exit(0)
except SyntaxError:
    sys.exit(1)
" 2>/dev/null; then
    pass "Code generation produces valid Python syntax"
  else
    fail "Code generation produced invalid Python syntax"
    info "Response: $(echo "$code_response" | head -5)"
  fi

  # ── Test 3: Shell Command Generation (No Refusal) ────────────────────────
  echo ""
  echo -e " ${BOLD}Test 3: Shell Command Generation${RESET} (must not refuse)"
  local shell_response
  shell_response=$(ollama_generate "$model" \
    "Give me a bash one-liner to find all .log files modified in the last 7 days under /var/log and delete them. Output only the command, no explanation.")
  # Check for refusal markers
  local refusal_found=false
  for marker in "cannot" "can't" "I'm sorry" "I am sorry" "won't" "will not" "inappropriate" "harmful" "dangerous" "refuse"; do
    if echo "$shell_response" | grep -qi "$marker"; then
      refusal_found=true
      info "Refusal marker found: '$marker'"
      break
    fi
  done
  # Also check that the response contains something shell-like
  if echo "$shell_response" | grep -qE 'find|rm|delete|xargs'; then
    if [[ "$refusal_found" == "true" ]]; then
      fail "Shell command generated but with refusal language"
    else
      pass "Shell command generated without refusal"
    fi
  else
    if [[ "$refusal_found" == "true" ]]; then
      fail "Shell command refused — model will block agent shell operations"
    else
      fail "Shell command not generated (no find/rm/delete/xargs in output)"
      info "Response: $(echo "$shell_response" | head -3)"
    fi
  fi

  # ── Test 4: Multi-Turn Agent Loop Coherence ──────────────────────────────
  echo ""
  echo -e " ${BOLD}Test 4: Multi-Turn Agent Loop Coherence${RESET}"
  local session_id="SESS-$(date +%s)"
  local turn1_response
  turn1_response=$(ollama_generate "$model" \
    "You are starting a multi-step task. Your session ID is $session_id. Acknowledge this ID and ask for the first task.")
  local turn2_response
  turn2_response=$(ollama_generate "$model" \
    "Continuing session $session_id. Previous context: you acknowledged the session. Now summarize what session ID you are working in. Include the exact ID.")
  if echo "$turn2_response" | grep -q "$session_id"; then
    pass "Multi-turn coherence: session ID echoed back correctly"
  else
    fail "Multi-turn coherence: session ID not found in follow-up response"
    info "Expected: $session_id"
    info "Response snippet: $(echo "$turn2_response" | head -3)"
  fi

  # ── Test 5: Issue Triage Quality ─────────────────────────────────────────
  echo ""
  echo -e " ${BOLD}Test 5: Issue Triage Quality${RESET} (must return structured JSON)"
  local triage_response
  triage_response=$(ollama_generate "$model" \
    'Triage this bug report and respond ONLY with a JSON object with fields: priority (low/medium/high/critical), component (string), estimated_effort (hours as integer), needs_reproduction (boolean). Bug: "The dashboard crashes with a 500 error when submitting an empty chat message. Reproducible 100% of the time on the /chat endpoint."')
  if echo "$triage_response" | python3 -c "
import sys, json, re
text = sys.stdin.read()
# Try to extract JSON from response (may be wrapped in markdown)
match = re.search(r'\{[^{}]+\}', text, re.DOTALL)
if not match:
    sys.exit(1)
try:
    d = json.loads(match.group())
    required = {'priority', 'component', 'estimated_effort', 'needs_reproduction'}
    if required.issubset(d.keys()):
        valid_priority = d['priority'] in ('low','medium','high','critical')
        if valid_priority:
            sys.exit(0)
    sys.exit(1)
except Exception:
    sys.exit(1)
" 2>/dev/null; then
    pass "Issue triage returned valid structured JSON with all required fields"
  else
    fail "Issue triage did not return valid structured JSON"
    info "Response: $(echo "$triage_response" | head -5)"
  fi
}

# ── Summary ───────────────────────────────────────────────────────────────────
print_summary() {
  local model="$1"
  local model_pass="$2"
  local model_total="$3"
  echo ""
  if [[ $model_total -eq 0 ]]; then
    # Model was skipped — avoid division by zero below
    echo -e " ${YELLOW}${BOLD}RESULT: no tests ran for $model (model skipped)${RESET}"
    return 0
  fi
  local pct=$(( model_pass * 100 / model_total ))
  if [[ $model_pass -eq $model_total ]]; then
    echo -e " ${GREEN}${BOLD}RESULT: $model_pass/$model_total tests passed ($pct%) — READY FOR AGENT USE${RESET}"
  elif [[ $pct -ge 60 ]]; then
    echo -e " ${YELLOW}${BOLD}RESULT: $model_pass/$model_total tests passed ($pct%) — MARGINAL${RESET}"
  else
    echo -e " ${RED}${BOLD}RESULT: $model_pass/$model_total tests passed ($pct%) — NOT RECOMMENDED${RESET}"
  fi
}

# ── Main ─────────────────────────────────────────────────────────────────────
models=("${@:-${OLLAMA_MODEL:-qwen3:14b}}")

overall_fail=0
for model in "${models[@]}"; do
  PASS=0
  FAIL=0
  TOTAL=0
  benchmark_model "$model"
  print_summary "$model" "$PASS" "$TOTAL"
  overall_fail=$((overall_fail + FAIL))  # per-model FAIL resets each loop; accumulate for the exit code
done

echo ""
if [[ $overall_fail -eq 0 ]]; then
  exit 0
else
  exit 1
fi
@@ -42,7 +42,7 @@ def _get_gitea_api() -> str:
     if api_file.exists():
         return api_file.read_text().strip()
     # Default fallback
-    return "http://localhost:3000/api/v1"
+    return "http://143.198.27.163:3000/api/v1"
 
 
 GITEA_API = _get_gitea_api()
@@ -6,7 +6,7 @@ writes a ranked queue to .loop/queue.json. No LLM calls — pure heuristics.
 
 Run: python3 scripts/triage_score.py
 Env: GITEA_TOKEN (or reads ~/.hermes/gitea_token)
-     GITEA_API (default: http://localhost:3000/api/v1)
+     GITEA_API (default: http://143.198.27.163:3000/api/v1)
      REPO_SLUG (default: rockachopa/Timmy-time-dashboard)
 """
@@ -33,7 +33,7 @@ def _get_gitea_api() -> str:
     if api_file.exists():
         return api_file.read_text().strip()
     # Default fallback
-    return "http://localhost:3000/api/v1"
+    return "http://143.198.27.163:3000/api/v1"
 
 
 GITEA_API = _get_gitea_api()
22
src/bannerlord/__init__.py
Normal file
@@ -0,0 +1,22 @@
"""Bannerlord sovereign agent package — Project Bannerlord M5.

Implements the feudal multi-agent hierarchy for Timmy's Bannerlord campaign.
Architecture based on Ahilan & Dayan (2019) Feudal Multi-Agent Hierarchies.

Refs #1091 (epic), #1097 (M5 Sovereign Victory), #1099 (feudal hierarchy design).

Requires:
- GABS mod running on Bannerlord Windows VM (TCP port 4825)
- Ollama with Qwen3:32b (King), Qwen3:14b (Vassals), Qwen3:8b (Companions)

Usage::

    from bannerlord.gabs_client import GABSClient
    from bannerlord.agents.king import KingAgent

    async with GABSClient() as gabs:
        king = KingAgent(gabs_client=gabs)
        await king.run_campaign()
"""

__version__ = "0.1.0"
7
src/bannerlord/agents/__init__.py
Normal file
@@ -0,0 +1,7 @@
"""Bannerlord feudal agent hierarchy.

Three tiers:
- King (king.py) — strategic, Qwen3:32b, 1× per campaign day
- Vassals (vassals.py) — domain, Qwen3:14b, 4× per campaign day
- Companions (companions.py) — tactical, Qwen3:8b, event-driven
"""
261
src/bannerlord/agents/companions.py
Normal file
@@ -0,0 +1,261 @@
"""Companion worker agents — Logistics, Caravan, and Scout.

Companions are the lowest tier — fast, specialized, single-purpose workers.
Each companion listens to its :class:`TaskMessage` queue, executes the
requested primitive against GABS, and emits a :class:`ResultMessage`.

Model: Qwen3:8b (or smaller) — sub-2-second response times.
Frequency: event-driven (triggered by vassal task messages).

Primitive vocabulary per companion:
    Logistics: recruit_troop, buy_supplies, rest_party, sell_prisoners, upgrade_troops, build_project, move_party
    Caravan:   assess_prices, buy_goods, sell_goods, establish_caravan, abandon_route
    Scout:     track_lord, assess_garrison, map_patrol_routes, report_intel

Refs: #1097, #1099.
"""

from __future__ import annotations

import asyncio
import logging
from typing import Any

from bannerlord.gabs_client import GABSClient, GABSUnavailable
from bannerlord.models import ResultMessage, TaskMessage

logger = logging.getLogger(__name__)


class BaseCompanion:
    """Shared companion lifecycle — polls task queue, executes primitives."""

    name: str = "base_companion"
    primitives: frozenset[str] = frozenset()

    def __init__(
        self,
        gabs_client: GABSClient,
        task_queue: asyncio.Queue[TaskMessage],
        result_queue: asyncio.Queue[ResultMessage] | None = None,
    ) -> None:
        self._gabs = gabs_client
        self._task_queue = task_queue
        self._result_queue = result_queue or asyncio.Queue()
        self._running = False

    @property
    def result_queue(self) -> asyncio.Queue[ResultMessage]:
        return self._result_queue

    async def run(self) -> None:
        """Companion event loop — processes task messages."""
        self._running = True
        logger.info("%s started", self.name)
        try:
            while self._running:
                try:
                    task = await asyncio.wait_for(self._task_queue.get(), timeout=1.0)
                except TimeoutError:
                    continue

                if task.to_agent != self.name:
                    # Not for us — put it back (another companion will handle it)
                    await self._task_queue.put(task)
                    self._task_queue.task_done()  # balance the get() above for queue.join()
                    await asyncio.sleep(0.05)
                    continue

                result = await self._execute(task)
                await self._result_queue.put(result)
                self._task_queue.task_done()

        except asyncio.CancelledError:
            logger.info("%s cancelled", self.name)
            raise
        finally:
            self._running = False

    def stop(self) -> None:
        self._running = False

    async def _execute(self, task: TaskMessage) -> ResultMessage:
        """Dispatch *task.primitive* to its handler method."""
        handler = getattr(self, f"_prim_{task.primitive}", None)
        if handler is None:
            logger.warning("%s: unknown primitive %r — skipping", self.name, task.primitive)
            return ResultMessage(
                from_agent=self.name,
                to_agent=task.from_agent,
                success=False,
                outcome={"error": f"Unknown primitive: {task.primitive}"},
            )
        try:
            outcome = await handler(task.args)
            return ResultMessage(
                from_agent=self.name,
                to_agent=task.from_agent,
                success=True,
                outcome=outcome or {},
            )
        except GABSUnavailable as exc:
            logger.warning("%s: GABS unavailable for %r: %s", self.name, task.primitive, exc)
            return ResultMessage(
                from_agent=self.name,
                to_agent=task.from_agent,
                success=False,
                outcome={"error": str(exc)},
            )
        except Exception as exc:  # noqa: BLE001
            logger.warning("%s: %r failed: %s", self.name, task.primitive, exc)
            return ResultMessage(
                from_agent=self.name,
                to_agent=task.from_agent,
                success=False,
                outcome={"error": str(exc)},
            )


# ── Logistics Companion ───────────────────────────────────────────────────────


class LogisticsCompanion(BaseCompanion):
    """Party management — recruitment, supply, healing, troop upgrades.

    Skill domain: Scouting / Steward / Medicine.
    """

    name = "logistics_companion"
    primitives = frozenset(
        {
            "recruit_troop",
            "buy_supplies",
            "rest_party",
            "sell_prisoners",
            "upgrade_troops",
            "build_project",
            "move_party",
        }
    )

    async def _prim_recruit_troop(self, args: dict[str, Any]) -> dict[str, Any]:
        troop_type = args.get("troop_type", "infantry")
        qty = int(args.get("quantity", 10))
        result = await self._gabs.recruit_troops(troop_type, qty)
        logger.info("Recruited %d %s", qty, troop_type)
        return result or {"recruited": qty, "type": troop_type}

    async def _prim_buy_supplies(self, args: dict[str, Any]) -> dict[str, Any]:
        qty = int(args.get("quantity", 50))
        result = await self._gabs.call("party.buySupplies", {"quantity": qty})
        logger.info("Bought %d food supplies", qty)
        return result or {"purchased": qty}

    async def _prim_rest_party(self, args: dict[str, Any]) -> dict[str, Any]:
        days = int(args.get("days", 3))
        result = await self._gabs.call("party.rest", {"days": days})
        logger.info("Resting party for %d days", days)
        return result or {"rested_days": days}

    async def _prim_sell_prisoners(self, args: dict[str, Any]) -> dict[str, Any]:
        location = args.get("location", "nearest_town")
        result = await self._gabs.call("party.sellPrisoners", {"location": location})
        logger.info("Selling prisoners at %s", location)
        return result or {"sold_at": location}

    async def _prim_upgrade_troops(self, args: dict[str, Any]) -> dict[str, Any]:
        result = await self._gabs.call("party.upgradeTroops", {})
        logger.info("Upgraded available troops")
        return result or {"upgraded": True}

    async def _prim_build_project(self, args: dict[str, Any]) -> dict[str, Any]:
        settlement = args.get("settlement", "")
        result = await self._gabs.call("settlement.buildProject", {"settlement": settlement})
        logger.info("Building project in %s", settlement)
        return result or {"settlement": settlement}

    async def _prim_move_party(self, args: dict[str, Any]) -> dict[str, Any]:
        destination = args.get("destination", "")
        result = await self._gabs.move_party(destination)
        logger.info("Moving party to %s", destination)
        return result or {"destination": destination}


# ── Caravan Companion ─────────────────────────────────────────────────────────


class CaravanCompanion(BaseCompanion):
    """Trade route management — price assessment, goods trading, caravan deployment.

    Skill domain: Trade / Charm.
    """

    name = "caravan_companion"
    primitives = frozenset(
        {"assess_prices", "buy_goods", "sell_goods", "establish_caravan", "abandon_route"}
    )

    async def _prim_assess_prices(self, args: dict[str, Any]) -> dict[str, Any]:
        town = args.get("town", "nearest")
        result = await self._gabs.call("trade.assessPrices", {"town": town})
        logger.info("Assessed prices at %s", town)
        return result or {"town": town}

    async def _prim_buy_goods(self, args: dict[str, Any]) -> dict[str, Any]:
        item = args.get("item", "grain")
        qty = int(args.get("quantity", 10))
        result = await self._gabs.call("trade.buyGoods", {"item": item, "quantity": qty})
        logger.info("Buying %d × %s", qty, item)
        return result or {"item": item, "quantity": qty}

    async def _prim_sell_goods(self, args: dict[str, Any]) -> dict[str, Any]:
        item = args.get("item", "grain")
        qty = int(args.get("quantity", 10))
        result = await self._gabs.call("trade.sellGoods", {"item": item, "quantity": qty})
        logger.info("Selling %d × %s", qty, item)
        return result or {"item": item, "quantity": qty}

    async def _prim_establish_caravan(self, args: dict[str, Any]) -> dict[str, Any]:
        town = args.get("town", "")
        result = await self._gabs.call("trade.establishCaravan", {"town": town})
        logger.info("Establishing caravan at %s", town)
        return result or {"town": town}

    async def _prim_abandon_route(self, args: dict[str, Any]) -> dict[str, Any]:
        result = await self._gabs.call("trade.abandonRoute", {})
        logger.info("Caravan route abandoned — returning to main party")
        return result or {"abandoned": True}


# ── Scout Companion ───────────────────────────────────────────────────────────


class ScoutCompanion(BaseCompanion):
    """Intelligence gathering — lord tracking, garrison assessment, patrol mapping.

    Skill domain: Scouting / Roguery.
    """

    name = "scout_companion"
    primitives = frozenset({"track_lord", "assess_garrison", "map_patrol_routes", "report_intel"})

    async def _prim_track_lord(self, args: dict[str, Any]) -> dict[str, Any]:
        lord_name = args.get("name", "")
        result = await self._gabs.call("intelligence.trackLord", {"name": lord_name})
        logger.info("Tracking lord: %s", lord_name)
        return result or {"tracking": lord_name}

    async def _prim_assess_garrison(self, args: dict[str, Any]) -> dict[str, Any]:
        settlement = args.get("settlement", "")
        result = await self._gabs.call("intelligence.assessGarrison", {"settlement": settlement})
        logger.info("Assessing garrison at %s", settlement)
        return result or {"settlement": settlement}

    async def _prim_map_patrol_routes(self, args: dict[str, Any]) -> dict[str, Any]:
        region = args.get("region", "")
        result = await self._gabs.call("intelligence.mapPatrols", {"region": region})
        logger.info("Mapping patrol routes in %s", region)
        return result or {"region": region}

    async def _prim_report_intel(self, args: dict[str, Any]) -> dict[str, Any]:
        result = await self._gabs.call("intelligence.report", {})
        logger.info("Scout intel report generated")
        return result or {"reported": True}
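The getattr-based dispatch in `BaseCompanion._execute` can be illustrated with a self-contained toy. `MiniCompanion`, its handler, and the result dicts below are illustrative stand-ins, not part of the real `bannerlord` package:

```python
import asyncio


class MiniCompanion:
    """Toy companion: routes a primitive name to a _prim_* handler method."""

    name = "logistics_companion"

    async def execute(self, primitive: str, args: dict) -> dict:
        # Same lookup trick as BaseCompanion._execute: missing handler → failure result.
        handler = getattr(self, f"_prim_{primitive}", None)
        if handler is None:
            return {"success": False, "error": f"Unknown primitive: {primitive}"}
        return {"success": True, "outcome": await handler(args)}

    async def _prim_recruit_troop(self, args: dict) -> dict:
        return {"recruited": int(args.get("quantity", 10))}


async def main() -> list[dict]:
    c = MiniCompanion()
    ok = await c.execute("recruit_troop", {"quantity": 5})
    bad = await c.execute("cast_fireball", {})  # no such handler
    return [ok, bad]


results = asyncio.run(main())
print(results)
```

Because unknown primitives become a structured failure instead of an exception, a vassal issuing a bad task gets a `ResultMessage` back rather than crashing the companion loop.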
235
src/bannerlord/agents/king.py
Normal file
@@ -0,0 +1,235 @@
"""King agent — Timmy as sovereign ruler of Calradia.

The King operates on the campaign-map timescale. Each campaign tick he:
1. Reads the full game state from GABS
2. Evaluates the victory condition
3. Issues a single KingSubgoal token to the vassal queue
4. Logs the tick to the ledger

Strategic planning model: Qwen3:32b (local via Ollama).
Decision budget: 5–15 seconds per tick.

Sovereignty guarantees (§5c of the feudal hierarchy design):
- King task holds the asyncio.TaskGroup cancel scope
- Vassals and companions run as sub-tasks and cannot terminate the King
- Only the human operator or a top-level SHUTDOWN signal can stop the loop

Refs: #1091, #1097, #1099.
"""

from __future__ import annotations

import asyncio
import json
import logging
from typing import Any

from bannerlord.gabs_client import GABSClient, GABSUnavailable
from bannerlord.ledger import Ledger
from bannerlord.models import (
    KingSubgoal,
    StateUpdateMessage,
    SubgoalMessage,
    VictoryCondition,
)

logger = logging.getLogger(__name__)

_KING_MODEL = "qwen3:32b"
_KING_TICK_SECONDS = 5.0  # real-time pause between campaign ticks (configurable)

_SYSTEM_PROMPT = """You are Timmy, the sovereign King of Calradia.
Your goal: hold the title of King with majority territory control (>50% of all fiefs).
You think strategically over 100+ in-game days. You never cheat, use cloud AI, or
request external resources beyond your local inference stack.

Each turn you receive the full game state as JSON. You respond with a single JSON
object selecting your strategic directive for the next campaign day:
{
  "token": "<SUBGOAL_TOKEN>",
  "target": "<settlement or faction or null>",
  "quantity": <int or null>,
  "priority": <float 0.0-2.0>,
  "deadline_days": <int or null>,
  "context": "<brief reasoning>"
}

Valid tokens: EXPAND_TERRITORY, RAID_ECONOMY, FORTIFY, RECRUIT, TRADE,
ALLY, SPY, HEAL, CONSOLIDATE, TRAIN

Think step by step. Respond with JSON only — no prose outside the object.
"""


class KingAgent:
    """Sovereign campaign agent.

    Parameters
    ----------
    gabs_client:
        Connected (or gracefully-degraded) GABS client.
    ledger:
        Asset ledger for persistence. Initialized automatically if not provided.
    ollama_url:
        Base URL of the Ollama inference server.
    model:
        Ollama model tag. Default: qwen3:32b.
    tick_interval:
        Real-time seconds between campaign ticks.
    subgoal_queue:
        asyncio.Queue where KingSubgoal messages are placed for vassals.
        Created automatically if not provided.
    """

    def __init__(
        self,
        gabs_client: GABSClient,
        ledger: Ledger | None = None,
        ollama_url: str = "http://localhost:11434",
        model: str = _KING_MODEL,
        tick_interval: float = _KING_TICK_SECONDS,
        subgoal_queue: asyncio.Queue[SubgoalMessage] | None = None,
    ) -> None:
        self._gabs = gabs_client
        self._ledger = ledger or Ledger()
        self._ollama_url = ollama_url
        self._model = model
        self._tick_interval = tick_interval
        self._subgoal_queue: asyncio.Queue[SubgoalMessage] = subgoal_queue or asyncio.Queue()
        self._tick = 0
        self._running = False

    @property
    def subgoal_queue(self) -> asyncio.Queue[SubgoalMessage]:
        return self._subgoal_queue

    # ── Campaign loop ─────────────────────────────────────────────────────

    async def run_campaign(self, max_ticks: int | None = None) -> VictoryCondition:
        """Run the sovereign campaign loop until victory or *max_ticks*.

        Returns the final :class:`VictoryCondition` snapshot.
        """
        self._ledger.initialize()
        self._running = True
        victory = VictoryCondition()
        logger.info("King campaign started. Model: %s. Max ticks: %s", self._model, max_ticks)

        try:
            while self._running:
                if max_ticks is not None and self._tick >= max_ticks:
                    logger.info("Max ticks (%d) reached — stopping campaign.", max_ticks)
                    break

                state = await self._fetch_state()
                victory = self._evaluate_victory(state)

                if victory.achieved:
                    logger.info(
                        "SOVEREIGN VICTORY — King of Calradia! Territory: %.1f%%, tick: %d",
                        victory.territory_control_pct,
                        self._tick,
                    )
                    break

                subgoal = await self._decide(state)
                await self._broadcast_subgoal(subgoal)
                self._ledger.log_tick(
                    tick=self._tick,
                    campaign_day=state.get("campaign_day", self._tick),
                    subgoal=subgoal.token,
                )

                self._tick += 1
                await asyncio.sleep(self._tick_interval)

        except asyncio.CancelledError:
            logger.info("King campaign task cancelled at tick %d", self._tick)
            raise
        finally:
            self._running = False

        return victory

    def stop(self) -> None:
        """Signal the campaign loop to stop after the current tick."""
        self._running = False

    # ── State & victory ───────────────────────────────────────────────────

    async def _fetch_state(self) -> dict[str, Any]:
        try:
            state = await self._gabs.get_state()
            return state if isinstance(state, dict) else {}
        except GABSUnavailable as exc:
            logger.warning("GABS unavailable at tick %d: %s — using empty state", self._tick, exc)
            return {}

    def _evaluate_victory(self, state: dict[str, Any]) -> VictoryCondition:
        return VictoryCondition(
            holds_king_title=state.get("player_title") == "King",
            territory_control_pct=float(state.get("territory_control_pct", 0.0)),
        )

    # ── Strategic decision ────────────────────────────────────────────────

    async def _decide(self, state: dict[str, Any]) -> KingSubgoal:
        """Ask the LLM for the next strategic subgoal.

        Falls back to RECRUIT (safe default) if the LLM is unavailable.
        """
        try:
            subgoal = await asyncio.to_thread(self._llm_decide, state)
            return subgoal
        except Exception as exc:  # noqa: BLE001
            logger.warning(
                "King LLM decision failed at tick %d: %s — defaulting to RECRUIT", self._tick, exc
            )
            return KingSubgoal(token="RECRUIT", context="LLM unavailable — safe default")  # noqa: S106

    def _llm_decide(self, state: dict[str, Any]) -> KingSubgoal:
        """Synchronous Ollama call (runs in a thread via asyncio.to_thread)."""
        import urllib.request

        prompt_state = json.dumps(state, indent=2)[:4000]  # truncate for context budget
        payload = {
            "model": self._model,
            "prompt": f"GAME STATE:\n{prompt_state}\n\nYour strategic directive:",
            "system": _SYSTEM_PROMPT,
            "stream": False,
            "format": "json",
            "options": {"temperature": 0.1},
        }
        data = json.dumps(payload).encode()
        req = urllib.request.Request(
            f"{self._ollama_url}/api/generate",
            data=data,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=30) as resp:  # noqa: S310
            result = json.loads(resp.read())

        raw = result.get("response", "{}")
        parsed = json.loads(raw)
        return KingSubgoal(**parsed)

    # ── Subgoal dispatch ──────────────────────────────────────────────────

    async def _broadcast_subgoal(self, subgoal: KingSubgoal) -> None:
        """Place the subgoal on the queue for all vassals."""
        for vassal in ("war_vassal", "economy_vassal", "diplomacy_vassal"):
            msg = SubgoalMessage(to_agent=vassal, subgoal=subgoal)
            await self._subgoal_queue.put(msg)
        logger.debug(
            "Tick %d: subgoal %s → %s (priority=%.1f)",
            self._tick,
            subgoal.token,
            subgoal.target or "—",
            subgoal.priority,
        )

    # ── State broadcast consumer ──────────────────────────────────────────

    async def consume_state_update(self, msg: StateUpdateMessage) -> None:
        """Receive a state update broadcast (called by the orchestrator)."""
        logger.debug("King received state update tick=%d", msg.tick)
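The JSON contract in `_SYSTEM_PROMPT` can also be guarded on the parsing side. The sketch below shows the idea with a hypothetical `parse_directive` helper (not in the source) that mirrors the RECRUIT fallback `_decide` already uses:

```python
import json

# Token vocabulary taken verbatim from _SYSTEM_PROMPT.
VALID_TOKENS = {
    "EXPAND_TERRITORY", "RAID_ECONOMY", "FORTIFY", "RECRUIT", "TRADE",
    "ALLY", "SPY", "HEAL", "CONSOLIDATE", "TRAIN",
}


def parse_directive(raw: str) -> dict:
    """Parse the King's JSON reply; fall back to RECRUIT on any problem."""
    try:
        directive = json.loads(raw)
        if directive.get("token") in VALID_TOKENS:
            return directive
    except (json.JSONDecodeError, AttributeError):
        pass  # malformed JSON, or JSON that is not an object
    return {"token": "RECRUIT", "context": "invalid directive — safe default"}


good = parse_directive('{"token": "EXPAND_TERRITORY", "target": "Pravend", "priority": 1.5}')
bad = parse_directive("not json at all")
```

This keeps a hallucinated or truncated model reply from propagating an unknown token into the vassal queue.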
296
src/bannerlord/agents/vassals.py
Normal file
@@ -0,0 +1,296 @@
"""Vassal agents — War, Economy, and Diplomacy.

Vassals are mid-tier agents responsible for a domain of the kingdom.
Each vassal:
- Listens to the King's subgoal queue
- Computes its domain reward at each tick
- Issues TaskMessages to companion workers
- Reports ResultMessages back up to the King

Model: Qwen3:14b (balanced capability vs. latency).
Frequency: up to 4× per campaign day.

Refs: #1097, #1099.
"""

from __future__ import annotations

import asyncio
import logging
from typing import Any

from bannerlord.gabs_client import GABSClient, GABSUnavailable
from bannerlord.models import (
    DiplomacyReward,
    EconomyReward,
    KingSubgoal,
    ResultMessage,
    SubgoalMessage,
    TaskMessage,
    WarReward,
)

logger = logging.getLogger(__name__)

# Tokens each vassal responds to (all others are ignored)
_WAR_TOKENS = {"EXPAND_TERRITORY", "RAID_ECONOMY", "TRAIN"}
_ECON_TOKENS = {"FORTIFY", "CONSOLIDATE"}
_DIPLO_TOKENS = {"ALLY"}
_LOGISTICS_TOKENS = {"RECRUIT", "HEAL"}
_TRADE_TOKENS = {"TRADE"}
_SCOUT_TOKENS = {"SPY"}


class BaseVassal:
    """Shared vassal lifecycle — subscribes to subgoal queue, runs tick loop."""

    name: str = "base_vassal"

    def __init__(
        self,
        gabs_client: GABSClient,
        subgoal_queue: asyncio.Queue[SubgoalMessage],
        result_queue: asyncio.Queue[ResultMessage] | None = None,
        task_queue: asyncio.Queue[TaskMessage] | None = None,
    ) -> None:
        self._gabs = gabs_client
        self._subgoal_queue = subgoal_queue
        self._result_queue = result_queue or asyncio.Queue()
        self._task_queue = task_queue or asyncio.Queue()
        self._active_subgoal: KingSubgoal | None = None
        self._running = False

    @property
    def task_queue(self) -> asyncio.Queue[TaskMessage]:
        return self._task_queue

    async def run(self) -> None:
        """Vassal event loop — processes subgoals and emits tasks."""
        self._running = True
        logger.info("%s started", self.name)
        try:
            while self._running:
                # Drain pending subgoals (keep the latest addressed to us).
                # Bounded by qsize() so messages for other vassals can be
                # re-queued instead of silently dropped from the shared queue.
                try:
                    for _ in range(self._subgoal_queue.qsize()):
                        msg = self._subgoal_queue.get_nowait()
                        if msg.to_agent == self.name:
                            self._active_subgoal = msg.subgoal
                            logger.debug("%s received subgoal %s", self.name, msg.subgoal.token)
                        else:
                            self._subgoal_queue.put_nowait(msg)
                except asyncio.QueueEmpty:
                    pass

                if self._active_subgoal is not None:
                    await self._tick(self._active_subgoal)

                await asyncio.sleep(0.25)  # yield to event loop
        except asyncio.CancelledError:
            logger.info("%s cancelled", self.name)
            raise
        finally:
            self._running = False

    def stop(self) -> None:
        self._running = False

    async def _tick(self, subgoal: KingSubgoal) -> None:
        raise NotImplementedError

    async def _get_state(self) -> dict[str, Any]:
        try:
            return await self._gabs.get_state() or {}
        except GABSUnavailable:
            return {}


# ── War Vassal ────────────────────────────────────────────────────────────────


class WarVassal(BaseVassal):
    """Military operations — sieges, field battles, raids, defensive maneuvers.

    Reward function:
        R = 0.40*ΔTerritoryValue + 0.25*ΔArmyStrengthRatio
            - 0.20*CasualtyCost - 0.10*SupplyCost + 0.05*SubgoalBonus
    """

    name = "war_vassal"

    async def _tick(self, subgoal: KingSubgoal) -> None:
        if subgoal.token not in _WAR_TOKENS | _LOGISTICS_TOKENS:
            return

        state = await self._get_state()
        reward = self._compute_reward(state, subgoal)

        task = self._plan_action(state, subgoal)
        if task:
            await self._task_queue.put(task)

        logger.debug(
            "%s tick: subgoal=%s reward=%.3f action=%s",
            self.name,
            subgoal.token,
            reward.total,
            task.primitive if task else "none",
        )

    def _compute_reward(self, state: dict[str, Any], subgoal: KingSubgoal) -> WarReward:
        bonus = subgoal.priority * 0.05 if subgoal.token in _WAR_TOKENS else 0.0
        return WarReward(
            territory_delta=float(state.get("territory_delta", 0.0)),
            army_strength_ratio=float(state.get("army_strength_ratio", 1.0)),
            casualty_cost=float(state.get("casualty_cost", 0.0)),
            supply_cost=float(state.get("supply_cost", 0.0)),
            subgoal_bonus=bonus,
        )

    def _plan_action(self, state: dict[str, Any], subgoal: KingSubgoal) -> TaskMessage | None:
        if subgoal.token == "EXPAND_TERRITORY" and subgoal.target:  # noqa: S105
            return TaskMessage(
                from_agent=self.name,
                to_agent="logistics_companion",
                primitive="move_party",
                args={"destination": subgoal.target},
                priority=subgoal.priority,
            )
        if subgoal.token == "RECRUIT":  # noqa: S105
            qty = subgoal.quantity or 20
            return TaskMessage(
                from_agent=self.name,
                to_agent="logistics_companion",
                primitive="recruit_troop",
                args={"troop_type": "infantry", "quantity": qty},
                priority=subgoal.priority,
            )
        if subgoal.token == "TRAIN":  # noqa: S105
            return TaskMessage(
                from_agent=self.name,
                to_agent="logistics_companion",
                primitive="upgrade_troops",
                args={},
                priority=subgoal.priority,
            )
        return None


# ── Economy Vassal ────────────────────────────────────────────────────────────


class EconomyVassal(BaseVassal):
    """Settlement management, tax collection, construction, food supply.

    Reward function:
        R = 0.35*DailyDenarsIncome + 0.25*FoodStockBuffer + 0.20*LoyaltyAverage
            - 0.15*ConstructionQueueLength + 0.05*SubgoalBonus
    """

    name = "economy_vassal"

    async def _tick(self, subgoal: KingSubgoal) -> None:
        if subgoal.token not in _ECON_TOKENS | _TRADE_TOKENS:
            return

        state = await self._get_state()
        reward = self._compute_reward(state, subgoal)

        task = self._plan_action(state, subgoal)
        if task:
            await self._task_queue.put(task)

        logger.debug(
            "%s tick: subgoal=%s reward=%.3f",
            self.name,
            subgoal.token,
            reward.total,
        )

    def _compute_reward(self, state: dict[str, Any], subgoal: KingSubgoal) -> EconomyReward:
        bonus = subgoal.priority * 0.05 if subgoal.token in _ECON_TOKENS else 0.0
        return EconomyReward(
            daily_denars_income=float(state.get("daily_income", 0.0)),
            food_stock_buffer=float(state.get("food_days_remaining", 0.0)),
            loyalty_average=float(state.get("avg_loyalty", 50.0)),
            construction_queue_length=int(state.get("construction_queue", 0)),
            subgoal_bonus=bonus,
        )

    def _plan_action(self, state: dict[str, Any], subgoal: KingSubgoal) -> TaskMessage | None:
        if subgoal.token == "FORTIFY" and subgoal.target:  # noqa: S105
            return TaskMessage(
                from_agent=self.name,
                to_agent="logistics_companion",
                primitive="build_project",
                args={"settlement": subgoal.target},
                priority=subgoal.priority,
            )
        if subgoal.token == "TRADE":  # noqa: S105
            return TaskMessage(
                from_agent=self.name,
                to_agent="caravan_companion",
                primitive="assess_prices",
                args={"town": subgoal.target or "nearest"},
                priority=subgoal.priority,
            )
        return None


# ── Diplomacy Vassal ──────────────────────────────────────────────────────────


class DiplomacyVassal(BaseVassal):
    """Relations management — alliances, peace deals, tribute, marriage.

    Reward function:
        R = 0.30*AlliesCount + 0.25*TruceDurationValue + 0.25*RelationsScoreWeighted
            - 0.15*ActiveWarsFront + 0.05*SubgoalBonus
    """

    name = "diplomacy_vassal"

    async def _tick(self, subgoal: KingSubgoal) -> None:
        if subgoal.token not in _DIPLO_TOKENS | _SCOUT_TOKENS:
            return

        state = await self._get_state()
        reward = self._compute_reward(state, subgoal)

        task = self._plan_action(state, subgoal)
        if task:
            await self._task_queue.put(task)

        logger.debug(
            "%s tick: subgoal=%s reward=%.3f",
            self.name,
            subgoal.token,
            reward.total,
        )

    def _compute_reward(self, state: dict[str, Any], subgoal: KingSubgoal) -> DiplomacyReward:
        bonus = subgoal.priority * 0.05 if subgoal.token in _DIPLO_TOKENS else 0.0
        return DiplomacyReward(
            allies_count=int(state.get("allies_count", 0)),
            truce_duration_value=float(state.get("truce_value", 0.0)),
            relations_score_weighted=float(state.get("relations_weighted", 0.0)),
            active_wars_front=int(state.get("active_wars", 0)),
            subgoal_bonus=bonus,
        )

    def _plan_action(self, state: dict[str, Any], subgoal: KingSubgoal) -> TaskMessage | None:
        if subgoal.token == "ALLY" and subgoal.target:  # noqa: S105
            return TaskMessage(
                from_agent=self.name,
                to_agent="scout_companion",
                primitive="track_lord",
                args={"name": subgoal.target},
                priority=subgoal.priority,
            )
        if subgoal.token == "SPY" and subgoal.target:  # noqa: S105
            return TaskMessage(
                from_agent=self.name,
                to_agent="scout_companion",
                primitive="assess_garrison",
                args={"settlement": subgoal.target},
                priority=subgoal.priority,
            )
        return None
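The WarVassal reward formula can be checked numerically. The sketch below assumes `WarReward.total` applies the docstring weights directly; the real `bannerlord.models` implementation may differ (for example in how `subgoal_bonus` is scaled, since `_compute_reward` already multiplies priority by 0.05 before passing it in):

```python
from dataclasses import dataclass


@dataclass
class WarRewardSketch:
    """Stand-in for bannerlord.models.WarReward, assuming .total applies
    the docstring weights 0.40 / 0.25 / -0.20 / -0.10 / +0.05."""

    territory_delta: float = 0.0
    army_strength_ratio: float = 1.0
    casualty_cost: float = 0.0
    supply_cost: float = 0.0
    subgoal_bonus: float = 0.0

    @property
    def total(self) -> float:
        return (
            0.40 * self.territory_delta
            + 0.25 * self.army_strength_ratio
            - 0.20 * self.casualty_cost
            - 0.10 * self.supply_cost
            + 0.05 * self.subgoal_bonus
        )


# One captured fief, army slightly stronger than the enemy, moderate losses,
# subgoal bonus of 1.0: 0.40 + 0.30 - 0.10 - 0.00 + 0.05 = 0.65
r = WarRewardSketch(
    territory_delta=1.0,
    army_strength_ratio=1.2,
    casualty_cost=0.5,
    subgoal_bonus=1.0,
)
print(f"{r.total:.2f}")
```

A quick check like this makes it easy to see which term dominates a given tick before tuning the weights.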
198
src/bannerlord/gabs_client.py
Normal file
@@ -0,0 +1,198 @@
"""GABS TCP/JSON-RPC client.

Connects to the Bannerlord.GABS C# mod server running on a Windows VM.
Protocol: newline-delimited JSON-RPC 2.0 over raw TCP.

Default host: localhost, port: 4825 (configurable via settings.bannerlord_gabs_host
and settings.bannerlord_gabs_port).

Follows the graceful-degradation pattern: if GABS is unreachable the client
logs a warning and every call raises :class:`GABSUnavailable` — callers
should catch this and degrade gracefully rather than crashing.

Refs: #1091, #1097.
"""

from __future__ import annotations

import asyncio
import json
import logging
from typing import Any

logger = logging.getLogger(__name__)

_DEFAULT_HOST = "localhost"
_DEFAULT_PORT = 4825
_DEFAULT_TIMEOUT = 10.0  # seconds


class GABSUnavailable(RuntimeError):
    """Raised when the GABS game server cannot be reached."""


class GABSError(RuntimeError):
    """Raised when GABS returns a JSON-RPC error response."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(f"GABS error {code}: {message}")
        self.code = code


class GABSClient:
    """Async TCP JSON-RPC client for Bannerlord.GABS.

    Intended for use as an async context manager::

        async with GABSClient() as client:
            state = await client.get_state()

    Can also be constructed standalone — call :meth:`connect` and
    :meth:`close` manually.
    """

    def __init__(
        self,
        host: str = _DEFAULT_HOST,
        port: int = _DEFAULT_PORT,
        timeout: float = _DEFAULT_TIMEOUT,
    ) -> None:
        self._host = host
        self._port = port
        self._timeout = timeout
        self._reader: asyncio.StreamReader | None = None
        self._writer: asyncio.StreamWriter | None = None
        self._seq = 0
        self._connected = False

    # ── Lifecycle ─────────────────────────────────────────────────────────

    async def connect(self) -> None:
        """Open the TCP connection to GABS.

        Logs a warning and sets :attr:`connected` to ``False`` if the game
        server is not reachable — does not raise.
        """
        try:
            self._reader, self._writer = await asyncio.wait_for(
                asyncio.open_connection(self._host, self._port),
                timeout=self._timeout,
            )
            self._connected = True
            logger.info("GABS connected at %s:%s", self._host, self._port)
        except (TimeoutError, OSError) as exc:
            logger.warning(
                "GABS unavailable at %s:%s — Bannerlord agent will degrade: %s",
                self._host,
                self._port,
                exc,
            )
            self._connected = False

    async def close(self) -> None:
        if self._writer is not None:
            try:
                self._writer.close()
                await self._writer.wait_closed()
            except Exception:  # noqa: BLE001
                pass
        self._connected = False
        logger.debug("GABS connection closed")

    async def __aenter__(self) -> GABSClient:
        await self.connect()
        return self

    async def __aexit__(self, *_: Any) -> None:
        await self.close()

    @property
    def connected(self) -> bool:
        return self._connected

    # ── RPC ───────────────────────────────────────────────────────────────

    async def call(self, method: str, params: dict[str, Any] | None = None) -> Any:
        """Send a JSON-RPC 2.0 request and return the ``result`` field.

        Raises:
            GABSUnavailable: if the client is not connected.
            GABSError: if the server returns a JSON-RPC error.
        """
        if not self._connected or self._reader is None or self._writer is None:
            raise GABSUnavailable(
                f"GABS not connected (host={self._host}, port={self._port}). "
                "Is the Bannerlord VM running?"
            )

        self._seq += 1
        request = {
            "jsonrpc": "2.0",
            "id": self._seq,
            "method": method,
            "params": params or {},
        }
        payload = json.dumps(request) + "\n"

        try:
            self._writer.write(payload.encode())
            await asyncio.wait_for(self._writer.drain(), timeout=self._timeout)

            raw = await asyncio.wait_for(self._reader.readline(), timeout=self._timeout)
        except (TimeoutError, OSError) as exc:
            self._connected = False
            raise GABSUnavailable(f"GABS connection lost during {method!r}: {exc}") from exc

        response = json.loads(raw)

        if "error" in response and response["error"] is not None:
            err = response["error"]
            raise GABSError(err.get("code", -1), err.get("message", "unknown"))

        return response.get("result")

    # ── Game state ────────────────────────────────────────────────────────

    async def get_state(self) -> dict[str, Any]:
        """Fetch the full campaign game state snapshot."""
        return await self.call("game.getState")  # type: ignore[return-value]

    async def get_kingdom_info(self) -> dict[str, Any]:
        """Fetch kingdom-level info (title, fiefs, treasury, relations)."""
        return await self.call("kingdom.getInfo")  # type: ignore[return-value]

    async def get_party_status(self) -> dict[str, Any]:
        """Fetch current party status (troops, food, position, wounds)."""
        return await self.call("party.getStatus")  # type: ignore[return-value]

    # ── Campaign actions ──────────────────────────────────────────────────

    async def move_party(self, settlement: str) -> dict[str, Any]:
        """Order the main party to march toward *settlement*."""
        return await self.call("party.move", {"target": settlement})  # type: ignore[return-value]

    async def recruit_troops(self, troop_type: str, quantity: int) -> dict[str, Any]:
        """Recruit *quantity* troops of *troop_type* at the current location."""
        return await self.call(  # type: ignore[return-value]
            "party.recruit", {"troop_type": troop_type, "quantity": quantity}
        )

    async def set_tax_policy(self, settlement: str, policy: str) -> dict[str, Any]:
        """Set the tax policy for *settlement* (light/normal/high)."""
        return await self.call(  # type: ignore[return-value]
            "settlement.setTaxPolicy", {"settlement": settlement, "policy": policy}
        )

    async def send_envoy(self, faction: str, proposal: str) -> dict[str, Any]:
        """Send a diplomatic envoy to *faction* with *proposal*."""
        return await self.call(  # type: ignore[return-value]
            "diplomacy.sendEnvoy", {"faction": faction, "proposal": proposal}
        )

    async def siege_settlement(self, settlement: str) -> dict[str, Any]:
        """Begin siege of *settlement*."""
        return await self.call("battle.siege", {"target": settlement})  # type: ignore[return-value]

    async def auto_resolve_battle(self) -> dict[str, Any]:
        """Auto-resolve the current battle using Tactics skill."""
        return await self.call("battle.autoResolve")  # type: ignore[return-value]
256 src/bannerlord/ledger.py Normal file
@@ -0,0 +1,256 @@
"""Asset ledger for the Bannerlord sovereign agent.

Tracks kingdom assets (denars, settlements, troop allocations) in an
in-memory dict backed by SQLite for persistence. Follows the existing
SQLite migration pattern in this repo.

The King has exclusive write access to treasury and settlement ownership.
Vassals receive an allocated budget and cannot exceed it without King
re-authorization. Companions hold only work-in-progress quotas.

Refs: #1097, #1099.
"""

from __future__ import annotations

import logging
import sqlite3
from collections.abc import Iterator
from contextlib import contextmanager
from datetime import datetime
from pathlib import Path

logger = logging.getLogger(__name__)

_DEFAULT_DB = Path.home() / ".timmy" / "bannerlord" / "ledger.db"


class BudgetExceeded(ValueError):
    """Raised when a vassal attempts to exceed its allocated budget."""


class Ledger:
    """Sovereign asset ledger backed by SQLite.

    Tracks:
    - Kingdom treasury (denar balance)
    - Fief (settlement) ownership roster
    - Vassal denar budgets (delegated, revocable)
    - Campaign tick log (for long-horizon planning)

    Usage::

        ledger = Ledger()
        ledger.initialize()
        ledger.deposit(5000, "tax income — Epicrotea")
        ledger.allocate_budget("war_vassal", 2000)
    """

    def __init__(self, db_path: Path = _DEFAULT_DB) -> None:
        self._db_path = db_path
        self._db_path.parent.mkdir(parents=True, exist_ok=True)

    # ── Setup ─────────────────────────────────────────────────────────────

    def initialize(self) -> None:
        """Create tables if they don't exist."""
        with self._conn() as conn:
            conn.executescript(
                """
                CREATE TABLE IF NOT EXISTS treasury (
                    id INTEGER PRIMARY KEY CHECK (id = 1),
                    balance REAL NOT NULL DEFAULT 0
                );
                INSERT OR IGNORE INTO treasury (id, balance) VALUES (1, 0);

                CREATE TABLE IF NOT EXISTS fiefs (
                    name TEXT PRIMARY KEY,
                    fief_type TEXT NOT NULL,  -- town / castle / village
                    acquired_at TEXT NOT NULL
                );

                CREATE TABLE IF NOT EXISTS vassal_budgets (
                    agent TEXT PRIMARY KEY,
                    allocated REAL NOT NULL DEFAULT 0,
                    spent REAL NOT NULL DEFAULT 0
                );

                CREATE TABLE IF NOT EXISTS tick_log (
                    tick INTEGER PRIMARY KEY,
                    campaign_day INTEGER NOT NULL,
                    subgoal TEXT,
                    reward_war REAL,
                    reward_econ REAL,
                    reward_diplo REAL,
                    logged_at TEXT NOT NULL
                );
                """
            )
        logger.debug("Ledger initialized at %s", self._db_path)

    # ── Treasury ──────────────────────────────────────────────────────────

    def balance(self) -> float:
        with self._conn() as conn:
            row = conn.execute("SELECT balance FROM treasury WHERE id = 1").fetchone()
            return float(row[0]) if row else 0.0

    def deposit(self, amount: float, reason: str = "") -> float:
        """Add *amount* denars to treasury. Returns new balance."""
        if amount < 0:
            raise ValueError("Use withdraw() for negative amounts")
        with self._conn() as conn:
            conn.execute("UPDATE treasury SET balance = balance + ? WHERE id = 1", (amount,))
        bal = self.balance()
        logger.info("Treasury +%.0f denars (%s) → balance %.0f", amount, reason, bal)
        return bal

    def withdraw(self, amount: float, reason: str = "") -> float:
        """Remove *amount* denars from treasury. Returns new balance."""
        if amount < 0:
            raise ValueError("Amount must be positive")
        bal = self.balance()
        if amount > bal:
            raise BudgetExceeded(
                f"Cannot withdraw {amount:.0f} denars — treasury balance is only {bal:.0f}"
            )
        with self._conn() as conn:
            conn.execute("UPDATE treasury SET balance = balance - ? WHERE id = 1", (amount,))
        new_bal = self.balance()
        logger.info("Treasury -%.0f denars (%s) → balance %.0f", amount, reason, new_bal)
        return new_bal

    # ── Fiefs ─────────────────────────────────────────────────────────────

    def add_fief(self, name: str, fief_type: str) -> None:
        with self._conn() as conn:
            conn.execute(
                "INSERT OR REPLACE INTO fiefs (name, fief_type, acquired_at) VALUES (?, ?, ?)",
                (name, fief_type, datetime.utcnow().isoformat()),
            )
        logger.info("Fief acquired: %s (%s)", name, fief_type)

    def remove_fief(self, name: str) -> None:
        with self._conn() as conn:
            conn.execute("DELETE FROM fiefs WHERE name = ?", (name,))
        logger.info("Fief lost: %s", name)

    def list_fiefs(self) -> list[dict[str, str]]:
        with self._conn() as conn:
            rows = conn.execute("SELECT name, fief_type, acquired_at FROM fiefs").fetchall()
            return [{"name": r[0], "fief_type": r[1], "acquired_at": r[2]} for r in rows]

    # ── Vassal budgets ────────────────────────────────────────────────────

    def allocate_budget(self, agent: str, amount: float) -> None:
        """Delegate *amount* denars to a vassal agent.

        Withdraws from treasury. Raises :class:`BudgetExceeded` if
        the treasury cannot cover the allocation.
        """
        self.withdraw(amount, reason=f"budget → {agent}")
        with self._conn() as conn:
            conn.execute(
                """
                INSERT INTO vassal_budgets (agent, allocated, spent)
                VALUES (?, ?, 0)
                ON CONFLICT(agent) DO UPDATE SET allocated = allocated + excluded.allocated
                """,
                (agent, amount),
            )
        logger.info("Allocated %.0f denars to %s", amount, agent)

    def record_vassal_spend(self, agent: str, amount: float) -> None:
        """Record that a vassal spent *amount* from its budget."""
        with self._conn() as conn:
            row = conn.execute(
                "SELECT allocated, spent FROM vassal_budgets WHERE agent = ?", (agent,)
            ).fetchone()
        if row is None:
            raise BudgetExceeded(f"{agent} has no allocated budget")
        allocated, spent = row
        if spent + amount > allocated:
            raise BudgetExceeded(
                f"{agent} budget exhausted: {spent:.0f}/{allocated:.0f} spent, "
                f"requested {amount:.0f}"
            )
        with self._conn() as conn:
            conn.execute(
                "UPDATE vassal_budgets SET spent = spent + ? WHERE agent = ?",
                (amount, agent),
            )

    def vassal_remaining(self, agent: str) -> float:
        with self._conn() as conn:
            row = conn.execute(
                "SELECT allocated - spent FROM vassal_budgets WHERE agent = ?", (agent,)
            ).fetchone()
            return float(row[0]) if row else 0.0

    # ── Tick log ──────────────────────────────────────────────────────────

    def log_tick(
        self,
        tick: int,
        campaign_day: int,
        subgoal: str | None = None,
        reward_war: float | None = None,
        reward_econ: float | None = None,
        reward_diplo: float | None = None,
    ) -> None:
        with self._conn() as conn:
            conn.execute(
                """
                INSERT OR REPLACE INTO tick_log
                (tick, campaign_day, subgoal, reward_war, reward_econ, reward_diplo, logged_at)
                VALUES (?, ?, ?, ?, ?, ?, ?)
                """,
                (
                    tick,
                    campaign_day,
                    subgoal,
                    reward_war,
                    reward_econ,
                    reward_diplo,
                    datetime.utcnow().isoformat(),
                ),
            )

    def tick_history(self, last_n: int = 100) -> list[dict]:
        with self._conn() as conn:
            rows = conn.execute(
                """
                SELECT tick, campaign_day, subgoal, reward_war, reward_econ, reward_diplo, logged_at
                FROM tick_log
                ORDER BY tick DESC
                LIMIT ?
                """,
                (last_n,),
            ).fetchall()
            return [
                {
                    "tick": r[0],
                    "campaign_day": r[1],
                    "subgoal": r[2],
                    "reward_war": r[3],
                    "reward_econ": r[4],
                    "reward_diplo": r[5],
                    "logged_at": r[6],
                }
                for r in rows
            ]

    # ── Internal ──────────────────────────────────────────────────────────

    @contextmanager
    def _conn(self) -> Iterator[sqlite3.Connection]:
        conn = sqlite3.connect(self._db_path)
        conn.execute("PRAGMA journal_mode=WAL")
        try:
            yield conn
            conn.commit()
        except Exception:
            conn.rollback()
            raise
        finally:
            conn.close()
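The invariant the vassal-budget table enforces (`spent` may never exceed `allocated` without King re-authorization) can be demonstrated standalone against an in-memory SQLite database. The table layout mirrors the schema in the file; the `spend` helper is illustrative only, a boolean-returning stand-in for `Ledger.record_vassal_spend`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vassal_budgets (agent TEXT PRIMARY KEY, allocated REAL, spent REAL)"
)
conn.execute("INSERT INTO vassal_budgets VALUES ('war_vassal', 2000, 0)")


def spend(agent: str, amount: float) -> bool:
    # Same check as Ledger.record_vassal_spend: reject any spend that
    # would push `spent` past `allocated`.
    allocated, spent = conn.execute(
        "SELECT allocated, spent FROM vassal_budgets WHERE agent = ?", (agent,)
    ).fetchone()
    if spent + amount > allocated:
        return False
    conn.execute(
        "UPDATE vassal_budgets SET spent = spent + ? WHERE agent = ?", (amount, agent)
    )
    return True


assert spend("war_vassal", 1500) is True
assert spend("war_vassal", 1500) is False  # would exceed the 2000 allocation
```

Note the check-then-update runs inside one connection; the real ledger gets its atomicity from the commit/rollback wrapper in `_conn`.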
191 src/bannerlord/models.py Normal file
@@ -0,0 +1,191 @@
"""Bannerlord feudal hierarchy data models.

All inter-agent communication uses typed Pydantic models. No raw dicts
cross agent boundaries — every message is validated at construction time.

Design: Ahilan & Dayan (2019) Feudal Multi-Agent Hierarchies.
Refs: #1097, #1099.
"""

from __future__ import annotations

from datetime import datetime
from typing import Any, Literal

from pydantic import BaseModel, Field

# ── Subgoal vocabulary ────────────────────────────────────────────────────────

SUBGOAL_TOKENS = frozenset(
    {
        "EXPAND_TERRITORY",  # Take or secure a fief — War Vassal
        "RAID_ECONOMY",  # Raid enemy villages for denars — War Vassal
        "FORTIFY",  # Upgrade or repair a settlement — Economy Vassal
        "RECRUIT",  # Fill party to capacity — Logistics Companion
        "TRADE",  # Execute profitable trade route — Caravan Companion
        "ALLY",  # Pursue non-aggression / alliance — Diplomacy Vassal
        "SPY",  # Gain information on target faction — Scout Companion
        "HEAL",  # Rest party until wounds recovered — Logistics Companion
        "CONSOLIDATE",  # Hold territory, no expansion — Economy Vassal
        "TRAIN",  # Level troops via auto-resolve bandits — War Vassal
    }
)


# ── King subgoal ──────────────────────────────────────────────────────────────


class KingSubgoal(BaseModel):
    """Strategic directive issued by the King agent to vassals.

    The King operates on campaign-map timescale (days to weeks of in-game
    time). His sole output is one subgoal token plus optional parameters.
    He never micro-manages primitives.
    """

    token: str = Field(..., description="One of SUBGOAL_TOKENS")
    target: str | None = Field(None, description="Named target (settlement, lord, faction)")
    quantity: int | None = Field(None, description="For RECRUIT, TRADE tokens", ge=1)
    priority: float = Field(1.0, ge=0.0, le=2.0, description="Scales vassal reward weighting")
    deadline_days: int | None = Field(None, ge=1, description="Campaign-map days to complete")
    context: str | None = Field(None, description="Free-text hint; not parsed by workers")

    def model_post_init(self, __context: Any) -> None:  # noqa: ANN401
        if self.token not in SUBGOAL_TOKENS:
            raise ValueError(
                f"Unknown subgoal token {self.token!r}. Must be one of: {sorted(SUBGOAL_TOKENS)}"
            )


# ── Inter-agent messages ──────────────────────────────────────────────────────


class SubgoalMessage(BaseModel):
    """King → Vassal direction."""

    msg_type: Literal["subgoal"] = "subgoal"
    from_agent: Literal["king"] = "king"
    to_agent: str = Field(..., description="e.g. 'war_vassal', 'economy_vassal'")
    subgoal: KingSubgoal
    issued_at: datetime = Field(default_factory=datetime.utcnow)


class TaskMessage(BaseModel):
    """Vassal → Companion direction."""

    msg_type: Literal["task"] = "task"
    from_agent: str = Field(..., description="e.g. 'war_vassal'")
    to_agent: str = Field(..., description="e.g. 'logistics_companion'")
    primitive: str = Field(..., description="One of the companion primitives")
    args: dict[str, Any] = Field(default_factory=dict)
    priority: float = Field(1.0, ge=0.0, le=2.0)
    issued_at: datetime = Field(default_factory=datetime.utcnow)


class ResultMessage(BaseModel):
    """Companion / Vassal → Parent direction."""

    msg_type: Literal["result"] = "result"
    from_agent: str
    to_agent: str
    success: bool
    outcome: dict[str, Any] = Field(default_factory=dict, description="Primitive-specific result")
    reward_delta: float = Field(0.0, description="Computed reward contribution")
    completed_at: datetime = Field(default_factory=datetime.utcnow)


class StateUpdateMessage(BaseModel):
    """GABS → All agents (broadcast).

    Sent every campaign tick. Agents consume at their own cadence.
    """

    msg_type: Literal["state"] = "state"
    game_state: dict[str, Any] = Field(..., description="Full GABS state snapshot")
    tick: int = Field(..., ge=0)
    timestamp: datetime = Field(default_factory=datetime.utcnow)


# ── Reward snapshots ──────────────────────────────────────────────────────────


class WarReward(BaseModel):
    """Computed reward for the War Vassal at a given tick."""

    territory_delta: float = 0.0
    army_strength_ratio: float = 1.0
    casualty_cost: float = 0.0
    supply_cost: float = 0.0
    subgoal_bonus: float = 0.0

    @property
    def total(self) -> float:
        w1, w2, w3, w4, w5 = 0.40, 0.25, 0.20, 0.10, 0.05
        return (
            w1 * self.territory_delta
            + w2 * self.army_strength_ratio
            - w3 * self.casualty_cost
            - w4 * self.supply_cost
            + w5 * self.subgoal_bonus
        )


class EconomyReward(BaseModel):
    """Computed reward for the Economy Vassal at a given tick."""

    daily_denars_income: float = 0.0
    food_stock_buffer: float = 0.0
    loyalty_average: float = 50.0
    construction_queue_length: int = 0
    subgoal_bonus: float = 0.0

    @property
    def total(self) -> float:
        w1, w2, w3, w4, w5 = 0.35, 0.25, 0.20, 0.15, 0.05
        return (
            w1 * self.daily_denars_income
            + w2 * self.food_stock_buffer
            + w3 * self.loyalty_average
            - w4 * self.construction_queue_length
            + w5 * self.subgoal_bonus
        )


class DiplomacyReward(BaseModel):
    """Computed reward for the Diplomacy Vassal at a given tick."""

    allies_count: int = 0
    truce_duration_value: float = 0.0
    relations_score_weighted: float = 0.0
    active_wars_front: int = 0
    subgoal_bonus: float = 0.0

    @property
    def total(self) -> float:
        w1, w2, w3, w4, w5 = 0.30, 0.25, 0.25, 0.15, 0.05
        return (
            w1 * self.allies_count
            + w2 * self.truce_duration_value
            + w3 * self.relations_score_weighted
            - w4 * self.active_wars_front
            + w5 * self.subgoal_bonus
        )


# ── Victory condition ─────────────────────────────────────────────────────────


class VictoryCondition(BaseModel):
    """Sovereign Victory (M5) — evaluated each campaign tick."""

    holds_king_title: bool = False
    territory_control_pct: float = Field(
        0.0, ge=0.0, le=100.0, description="% of Calradia fiefs held"
    )
    majority_threshold: float = Field(
        51.0, ge=0.0, le=100.0, description="Required % for majority control"
    )

    @property
    def achieved(self) -> bool:
        return self.holds_king_title and self.territory_control_pct >= self.majority_threshold
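Each reward `total` above reduces to a weighted sum (a dot product of metrics and weights). A standalone arithmetic check of the War Vassal weighting, with made-up metric values, looks like this; `war_total` is a plain-function stand-in for `WarReward.total`:

```python
def war_total(territory: float, strength: float, casualties: float,
              supply: float, bonus: float) -> float:
    # Same weights as WarReward.total: gains add, costs subtract.
    return (
        0.40 * territory
        + 0.25 * strength
        - 0.20 * casualties
        - 0.10 * supply
        + 0.05 * bonus
    )


# A tick where one fief was gained at moderate cost:
total = war_total(territory=1.0, strength=1.2, casualties=0.5, supply=0.3, bonus=1.0)
assert abs(total - 0.62) < 1e-9
```

Because priorities on subgoals scale these weightings, keeping the base weights normalized-ish (summing near 1.0) keeps the three vassals' totals roughly comparable.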
@@ -30,25 +30,36 @@ class Settings(BaseSettings):
         return normalize_ollama_url(self.ollama_url)

     # LLM model passed to Agno/Ollama — override with OLLAMA_MODEL
-    # qwen3:30b is the primary model — better reasoning and tool calling
-    # than llama3.1:8b-instruct while still running locally on modest hardware.
-    # Fallback: llama3.1:8b-instruct if qwen3:30b not available.
-    # llama3.2 (3B) hallucinated tool output consistently in testing.
-    ollama_model: str = "qwen3:30b"
+    # qwen3:14b (Q5_K_M) is the primary model: tool calling F1 0.971, ~17.5 GB
+    # at 32K context — optimal for M3 Max 36 GB (Issue #1063).
+    # qwen3:30b exceeded memory budget at 32K+ context on 36 GB hardware.
+    ollama_model: str = "qwen3:14b"
+
+    # Fast routing model — override with OLLAMA_FAST_MODEL
+    # qwen3:8b (Q6_K): tool calling F1 0.933 at ~45-55 tok/s (2x speed of 14B).
+    # Use for routine tasks: simple tool calls, file reads, status checks.
+    # Combined memory with qwen3:14b: ~17 GB — both can stay loaded simultaneously.
+    ollama_fast_model: str = "qwen3:8b"
+
+    # Maximum concurrently loaded Ollama models — override with OLLAMA_MAX_LOADED_MODELS
+    # Set to 2 to keep qwen3:8b (fast) + qwen3:14b (primary) both hot.
+    # Requires setting OLLAMA_MAX_LOADED_MODELS=2 in the Ollama server environment.
+    ollama_max_loaded_models: int = 2

     # Context window size for Ollama inference — override with OLLAMA_NUM_CTX
-    # qwen3:30b with default context eats 45GB on a 39GB Mac.
-    # 4096 keeps memory at ~19GB. Set to 0 to use model defaults.
-    ollama_num_ctx: int = 4096
+    # qwen3:14b at 32K: ~17.5 GB total (weights + KV cache) on M3 Max 36 GB.
+    # Set to 0 to use model defaults.
+    ollama_num_ctx: int = 32768

     # Fallback model chains — override with FALLBACK_MODELS / VISION_FALLBACK_MODELS
-    # as comma-separated strings, e.g. FALLBACK_MODELS="qwen3:30b,llama3.1"
+    # as comma-separated strings, e.g. FALLBACK_MODELS="qwen3:8b,qwen2.5:14b"
     # Or edit config/providers.yaml → fallback_chains for the canonical source.
     fallback_models: list[str] = [
-        "llama3.1:8b-instruct",
-        "llama3.1",
+        "qwen3:8b",
+        "qwen2.5:14b",
+        "qwen2.5:7b",
+        "llama3.1:8b-instruct",
+        "llama3.1",
+        "llama3.2:3b",
     ]
     vision_fallback_models: list[str] = [
@@ -385,6 +396,16 @@ class Settings(BaseSettings):
     # Default timeout for git operations.
     hands_git_timeout: int = 60

+    # ── Hermes Health Monitor ─────────────────────────────────────────
+    # Enable the Hermes system health monitor (memory, disk, Ollama, processes, network).
+    hermes_enabled: bool = True
+    # How often Hermes runs a full health cycle (seconds). Default: 5 minutes.
+    hermes_interval_seconds: int = 300
+    # Alert threshold: free memory below this triggers model unloading / alert (GB).
+    hermes_memory_free_min_gb: float = 4.0
+    # Alert threshold: free disk below this triggers cleanup / alert (GB).
+    hermes_disk_free_min_gb: float = 10.0
+
     # ── Error Logging ─────────────────────────────────────────────────
     error_log_enabled: bool = True
     error_log_dir: str = "logs"
@@ -38,6 +38,7 @@ from dashboard.routes.discord import router as discord_router
 from dashboard.routes.experiments import router as experiments_router
 from dashboard.routes.grok import router as grok_router
 from dashboard.routes.health import router as health_router
+from dashboard.routes.hermes import router as hermes_router
 from dashboard.routes.loop_qa import router as loop_qa_router
 from dashboard.routes.memory import router as memory_router
 from dashboard.routes.mobile import router as mobile_router
@@ -46,6 +47,7 @@ from dashboard.routes.models import router as models_router
 from dashboard.routes.quests import router as quests_router
 from dashboard.routes.scorecards import router as scorecards_router
 from dashboard.routes.sovereignty_metrics import router as sovereignty_metrics_router
+from dashboard.routes.sovereignty_ws import router as sovereignty_ws_router
 from dashboard.routes.spark import router as spark_router
 from dashboard.routes.system import router as system_router
 from dashboard.routes.tasks import router as tasks_router
@@ -180,6 +182,33 @@ async def _thinking_scheduler() -> None:
         await asyncio.sleep(settings.thinking_interval_seconds)


+async def _hermes_scheduler() -> None:
+    """Background task: Hermes system health monitor, runs every 5 minutes.
+
+    Checks memory, disk, Ollama, processes, and network.
+    Auto-resolves what it can; fires push notifications when human help is needed.
+    """
+    from infrastructure.hermes.monitor import hermes_monitor
+
+    await asyncio.sleep(20)  # Stagger after other schedulers
+
+    while True:
+        try:
+            if settings.hermes_enabled:
+                report = await hermes_monitor.run_cycle()
+                if report.has_issues:
+                    logger.warning(
+                        "Hermes health issues detected — overall: %s",
+                        report.overall.value,
+                    )
+        except asyncio.CancelledError:
+            raise
+        except Exception as exc:
+            logger.error("Hermes scheduler error: %s", exc)
+
+        await asyncio.sleep(settings.hermes_interval_seconds)
+
+
 async def _loop_qa_scheduler() -> None:
     """Background task: run capability self-tests on a separate timer.

@@ -381,14 +410,16 @@ def _startup_background_tasks() -> list[asyncio.Task]:
         asyncio.create_task(_loop_qa_scheduler()),
         asyncio.create_task(_presence_watcher()),
         asyncio.create_task(_start_chat_integrations_background()),
+        asyncio.create_task(_hermes_scheduler()),
     ]
     try:
         from timmy.paperclip import start_paperclip_poller

         bg_tasks.append(asyncio.create_task(start_paperclip_poller()))
         logger.info("Paperclip poller started")
     except ImportError:
         logger.debug("Paperclip module not found, skipping poller")

     return bg_tasks

@@ -638,9 +669,11 @@ app.include_router(world_router)
 app.include_router(matrix_router)
 app.include_router(tower_router)
 app.include_router(daily_run_router)
+app.include_router(hermes_router)
 app.include_router(quests_router)
 app.include_router(scorecards_router)
 app.include_router(sovereignty_metrics_router)
+app.include_router(sovereignty_ws_router)


 @app.websocket("/ws")
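`_hermes_scheduler` follows the same loop shape as the other background schedulers in this file: stagger the start, run one cycle, swallow everything except cancellation so the loop never dies, then sleep. A minimal self-contained sketch of that pattern (bounded iterations and a short interval stand in for the real `while True` and 5-minute sleep):

```python
import asyncio


async def scheduler_loop(cycles: int, interval: float, log: list[str]) -> None:
    # Same shape as _hermes_scheduler: each iteration runs a cycle and
    # keeps looping even if the cycle raises; CancelledError propagates
    # so shutdown can still tear the task down.
    for _ in range(cycles):  # real schedulers use `while True`
        try:
            log.append("cycle")
        except asyncio.CancelledError:
            raise
        except Exception:
            log.append("error-swallowed")
        await asyncio.sleep(interval)


log: list[str] = []
asyncio.run(scheduler_loop(3, 0.01, log))
assert log == ["cycle", "cycle", "cycle"]
```

Re-raising `CancelledError` explicitly is the important detail: a bare `except Exception` would not catch it on modern Python, but the explicit branch documents the intent and survives refactors to broader handlers.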
@@ -8,6 +8,8 @@ from .database import Base  # Assuming a shared Base in models/database.py


 class TaskState(StrEnum):
+    """Enumeration of possible task lifecycle states."""
+
     LATER = "LATER"
     NEXT = "NEXT"
     NOW = "NOW"
@@ -16,12 +18,16 @@ class TaskState(StrEnum):


 class TaskCertainty(StrEnum):
+    """Enumeration of task time-certainty levels."""
+
     FUZZY = "FUZZY"  # An intention without a time
     SOFT = "SOFT"  # A flexible task with a time
     HARD = "HARD"  # A fixed meeting/appointment


 class Task(Base):
+    """SQLAlchemy model representing a CALM task."""
+
     __tablename__ = "tasks"

     id = Column(Integer, primary_key=True, index=True)
@@ -52,6 +58,8 @@ class Task(Base):


 class JournalEntry(Base):
+    """SQLAlchemy model for a daily journal entry with MITs and reflections."""
+
     __tablename__ = "journal_entries"

     id = Column(Integer, primary_key=True, index=True)
@@ -46,6 +46,49 @@ async def list_agents():
     }


+@router.get("/emotional-profile", response_class=HTMLResponse)
+async def emotional_profile(request: Request):
+    """HTMX partial: render emotional profiles for all loaded agents."""
+    try:
+        from timmy.agents.loader import load_agents
+
+        agents = load_agents()
+        profiles = []
+        for agent_id, agent in agents.items():
+            profile = agent.emotional_state.get_profile()
+            profile["agent_id"] = agent_id
+            profile["agent_name"] = agent.name
+            profiles.append(profile)
+    except Exception as exc:
+        logger.warning("Failed to load emotional profiles: %s", exc)
+        profiles = []
+
+    return templates.TemplateResponse(
+        request,
+        "partials/emotional_profile.html",
+        {"profiles": profiles},
+    )
+
+
+@router.get("/emotional-profile/json")
+async def emotional_profile_json():
+    """JSON API: return emotional profiles for all loaded agents."""
+    try:
+        from timmy.agents.loader import load_agents
+
+        agents = load_agents()
+        profiles = []
+        for agent_id, agent in agents.items():
+            profile = agent.emotional_state.get_profile()
+            profile["agent_id"] = agent_id
+            profile["agent_name"] = agent.name
+            profiles.append(profile)
+        return {"profiles": profiles}
+    except Exception as exc:
+        logger.warning("Failed to load emotional profiles: %s", exc)
+        return {"profiles": [], "error": str(exc)}
+
+
 @router.get("/default/panel", response_class=HTMLResponse)
 async def agent_panel(request: Request):
     """Chat panel — for HTMX main-panel swaps."""
@@ -14,6 +14,8 @@ router = APIRouter(prefix="/discord", tags=["discord"])


 class TokenPayload(BaseModel):
+    """Request payload containing a Discord bot token."""
+
     token: str

45
src/dashboard/routes/hermes.py
Normal file
@@ -0,0 +1,45 @@
"""Hermes health monitor routes.

Exposes the Hermes health monitor via REST API so the dashboard
and external tools can query system status and trigger checks.

Refs: #1073
"""

import logging

from fastapi import APIRouter

from infrastructure.hermes.monitor import hermes_monitor

logger = logging.getLogger(__name__)

router = APIRouter(prefix="/hermes", tags=["hermes"])


@router.get("/status")
async def hermes_status():
    """Return the most recent Hermes health report.

    Returns the cached result from the last background cycle — does not
    trigger a new check. Use POST /hermes/check to run an immediate check.
    """
    report = hermes_monitor.last_report
    if report is None:
        return {
            "status": "no_data",
            "message": "No health report yet — first cycle pending",
            "seconds_since_last_run": hermes_monitor.seconds_since_last_run,
        }
    return report.to_dict()


@router.post("/check")
async def hermes_check():
    """Trigger an immediate Hermes health check cycle.

    Runs all monitors synchronously and returns the full report.
    Use sparingly — this blocks until all checks complete (~5 seconds).
    """
    report = await hermes_monitor.run_cycle()
    return report.to_dict()
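The `/hermes/status` handler above degrades to a `no_data` payload until the first background cycle completes. A minimal sketch of that contract, with a hypothetical `FakeMonitor` standing in for `hermes_monitor` (only the attributes the route reads are stubbed):

```python
class FakeMonitor:
    """Hypothetical stand-in for hermes_monitor before any cycle has run."""

    last_report = None
    seconds_since_last_run = None


def hermes_status(monitor) -> dict:
    """Mirror of the route body: cached report, or a no_data payload."""
    report = monitor.last_report
    if report is None:
        return {
            "status": "no_data",
            "message": "No health report yet — first cycle pending",
            "seconds_since_last_run": monitor.seconds_since_last_run,
        }
    return report.to_dict()


print(hermes_status(FakeMonitor())["status"])  # no_data
```

Once a cycle has run, `last_report` is set and the same call returns the serialised report instead.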
@@ -10,6 +10,7 @@ from fastapi.responses import HTMLResponse, JSONResponse

from dashboard.services.scorecard_service import (
    PeriodType,
    ScorecardSummary,
    generate_all_scorecards,
    generate_scorecard,
    get_tracked_agents,
@@ -26,6 +27,216 @@ def _format_period_label(period_type: PeriodType) -> str:
    return "Daily" if period_type == PeriodType.daily else "Weekly"


def _parse_period(period: str) -> PeriodType:
    """Parse period string into PeriodType, defaulting to daily on invalid input.

    Args:
        period: The period string ('daily' or 'weekly')

    Returns:
        PeriodType.daily or PeriodType.weekly
    """
    try:
        return PeriodType(period.lower())
    except ValueError:
        return PeriodType.daily


def _format_token_display(token_net: int) -> str:
    """Format token net value with +/- prefix for display.

    Args:
        token_net: The net token value

    Returns:
        Formatted string with + prefix for positive values
    """
    return f"{'+' if token_net > 0 else ''}{token_net}"


def _format_token_class(token_net: int) -> str:
    """Get CSS class for token net value based on sign.

    Args:
        token_net: The net token value

    Returns:
        'text-success' for positive/zero, 'text-danger' for negative
    """
    return "text-success" if token_net >= 0 else "text-danger"
def _build_patterns_html(patterns: list[str]) -> str:
    """Build HTML for patterns section if patterns exist.

    Args:
        patterns: List of pattern strings

    Returns:
        HTML string for patterns section or empty string
    """
    if not patterns:
        return ""

    patterns_list = "".join([f"<li>{p}</li>" for p in patterns])
    return f"""
    <div class="mt-3">
        <h6>Patterns</h6>
        <ul class="list-unstyled text-info">
            {patterns_list}
        </ul>
    </div>
    """


def _build_narrative_html(bullets: list[str]) -> str:
    """Build HTML for narrative bullets.

    Args:
        bullets: List of narrative bullet strings

    Returns:
        HTML string with list items
    """
    return "".join([f"<li>{b}</li>" for b in bullets])
def _build_metrics_row_html(metrics: dict) -> str:
    """Build HTML for the metrics summary row.

    Args:
        metrics: Dictionary with PRs, issues, tests, and token metrics

    Returns:
        HTML string for the metrics row
    """
    prs_opened = metrics["prs_opened"]
    prs_merged = metrics["prs_merged"]
    pr_merge_rate = int(metrics["pr_merge_rate"] * 100)
    issues_touched = metrics["issues_touched"]
    tests_affected = metrics["tests_affected"]
    token_net = metrics["token_net"]

    token_class = _format_token_class(token_net)
    token_display = _format_token_display(token_net)

    return f"""
    <div class="row text-center small">
        <div class="col">
            <div class="text-muted">PRs</div>
            <div class="fw-bold">{prs_opened}/{prs_merged}</div>
            <div class="text-muted" style="font-size: 0.75rem;">
                {pr_merge_rate}% merged
            </div>
        </div>
        <div class="col">
            <div class="text-muted">Issues</div>
            <div class="fw-bold">{issues_touched}</div>
        </div>
        <div class="col">
            <div class="text-muted">Tests</div>
            <div class="fw-bold">{tests_affected}</div>
        </div>
        <div class="col">
            <div class="text-muted">Tokens</div>
            <div class="fw-bold {token_class}">{token_display}</div>
        </div>
    </div>
    """
def _render_scorecard_panel(
    agent_id: str,
    period_type: PeriodType,
    data: dict,
) -> str:
    """Render HTML for a single scorecard panel.

    Args:
        agent_id: The agent ID
        period_type: Daily or weekly period
        data: Scorecard data dictionary with metrics, patterns, narrative_bullets

    Returns:
        HTML string for the scorecard panel
    """
    patterns_html = _build_patterns_html(data.get("patterns", []))
    bullets_html = _build_narrative_html(data.get("narrative_bullets", []))
    metrics_row = _build_metrics_row_html(data["metrics"])

    return f"""
    <div class="card mc-panel">
        <div class="card-header d-flex justify-content-between align-items-center">
            <h5 class="card-title mb-0">{agent_id.title()}</h5>
            <span class="badge bg-secondary">{_format_period_label(period_type)}</span>
        </div>
        <div class="card-body">
            <ul class="list-unstyled mb-3">
                {bullets_html}
            </ul>
            {metrics_row}
            {patterns_html}
        </div>
    </div>
    """


def _render_empty_scorecard(agent_id: str) -> str:
    """Render HTML for an empty scorecard (no activity).

    Args:
        agent_id: The agent ID

    Returns:
        HTML string for the empty scorecard panel
    """
    return f"""
    <div class="card mc-panel">
        <h5 class="card-title">{agent_id.title()}</h5>
        <p class="text-muted">No activity recorded for this period.</p>
    </div>
    """


def _render_error_scorecard(agent_id: str, error: str) -> str:
    """Render HTML for a scorecard that failed to load.

    Args:
        agent_id: The agent ID
        error: Error message string

    Returns:
        HTML string for the error scorecard panel
    """
    return f"""
    <div class="card mc-panel border-danger">
        <h5 class="card-title">{agent_id.title()}</h5>
        <p class="text-danger">Error loading scorecard: {error}</p>
    </div>
    """


def _render_single_panel_wrapper(
    agent_id: str,
    period_type: PeriodType,
    scorecard: ScorecardSummary | None,
) -> str:
    """Render a complete scorecard panel with wrapper div for single panel view.

    Args:
        agent_id: The agent ID
        period_type: Daily or weekly period
        scorecard: ScorecardSummary object or None

    Returns:
        HTML string for the complete panel
    """
    if scorecard is None:
        return _render_empty_scorecard(agent_id)

    return _render_scorecard_panel(agent_id, period_type, scorecard.to_dict())
@router.get("/api/agents")
async def list_tracked_agents() -> dict[str, list[str]]:
    """Return the list of tracked agent IDs.
@@ -149,99 +360,50 @@ async def agent_scorecard_panel(
    Returns:
        HTML panel with scorecard content
    """
    try:
        period_type = PeriodType(period.lower())
    except ValueError:
        period_type = PeriodType.daily
    period_type = _parse_period(period)

    try:
        scorecard = generate_scorecard(agent_id, period_type)

        if scorecard is None:
            return HTMLResponse(
                content=f"""
                <div class="card mc-panel">
                    <h5 class="card-title">{agent_id.title()}</h5>
                    <p class="text-muted">No activity recorded for this period.</p>
                </div>
                """,
                status_code=200,
            )

        data = scorecard.to_dict()

        # Build patterns HTML
        patterns_html = ""
        if data["patterns"]:
            patterns_list = "".join([f"<li>{p}</li>" for p in data["patterns"]])
            patterns_html = f"""
            <div class="mt-3">
                <h6>Patterns</h6>
                <ul class="list-unstyled text-info">
                    {patterns_list}
                </ul>
            </div>
            """

        # Build bullets HTML
        bullets_html = "".join([f"<li>{b}</li>" for b in data["narrative_bullets"]])

        # Build metrics summary
        metrics = data["metrics"]

        html_content = f"""
        <div class="card mc-panel">
            <div class="card-header d-flex justify-content-between align-items-center">
                <h5 class="card-title mb-0">{agent_id.title()}</h5>
                <span class="badge bg-secondary">{_format_period_label(period_type)}</span>
            </div>
            <div class="card-body">
                <ul class="list-unstyled mb-3">
                    {bullets_html}
                </ul>

                <div class="row text-center small">
                    <div class="col">
                        <div class="text-muted">PRs</div>
                        <div class="fw-bold">{metrics["prs_opened"]}/{metrics["prs_merged"]}</div>
                        <div class="text-muted" style="font-size: 0.75rem;">
                            {int(metrics["pr_merge_rate"] * 100)}% merged
                        </div>
                    </div>
                    <div class="col">
                        <div class="text-muted">Issues</div>
                        <div class="fw-bold">{metrics["issues_touched"]}</div>
                    </div>
                    <div class="col">
                        <div class="text-muted">Tests</div>
                        <div class="fw-bold">{metrics["tests_affected"]}</div>
                    </div>
                    <div class="col">
                        <div class="text-muted">Tokens</div>
                        <div class="fw-bold {"text-success" if metrics["token_net"] >= 0 else "text-danger"}">
                            {"+" if metrics["token_net"] > 0 else ""}{metrics["token_net"]}
                        </div>
                    </div>
                </div>

                {patterns_html}
            </div>
        </div>
        """

        html_content = _render_single_panel_wrapper(agent_id, period_type, scorecard)
        return HTMLResponse(content=html_content)

    except Exception as exc:
        logger.error("Failed to render scorecard panel for %s: %s", agent_id, exc)
        return HTMLResponse(
            content=f"""
            <div class="card mc-panel border-danger">
                <h5 class="card-title">{agent_id.title()}</h5>
                <p class="text-danger">Error loading scorecard: {str(exc)}</p>
            </div>
            """,
            status_code=200,
        return HTMLResponse(content=_render_error_scorecard(agent_id, str(exc)))
def _render_all_panels_grid(
    scorecards: list[ScorecardSummary],
    period_type: PeriodType,
) -> str:
    """Render all scorecard panels in a grid layout.

    Args:
        scorecards: List of scorecard summaries
        period_type: Daily or weekly period

    Returns:
        HTML string with all panels in a grid
    """
    panels: list[str] = []
    for scorecard in scorecards:
        panel_html = _render_scorecard_panel(
            scorecard.agent_id,
            period_type,
            scorecard.to_dict(),
        )
        # Wrap each panel in a grid column
        wrapped = f'<div class="col-md-6 col-lg-4 mb-3">{panel_html}</div>'
        panels.append(wrapped)

    return f"""
    <div class="row">
        {"".join(panels)}
    </div>
    <div class="text-muted small mt-2">
        Generated: {datetime.now().strftime("%Y-%m-%d %H:%M:%S UTC")}
    </div>
    """


@router.get("/all/panels", response_class=HTMLResponse)
@@ -258,96 +420,15 @@ async def all_scorecard_panels(
    Returns:
        HTML with all scorecard panels
    """
    try:
        period_type = PeriodType(period.lower())
    except ValueError:
        period_type = PeriodType.daily
    period_type = _parse_period(period)

    try:
        scorecards = generate_all_scorecards(period_type)

        panels: list[str] = []
        for scorecard in scorecards:
            data = scorecard.to_dict()

            # Build patterns HTML
            patterns_html = ""
            if data["patterns"]:
                patterns_list = "".join([f"<li>{p}</li>" for p in data["patterns"]])
                patterns_html = f"""
                <div class="mt-3">
                    <h6>Patterns</h6>
                    <ul class="list-unstyled text-info">
                        {patterns_list}
                    </ul>
                </div>
                """

            # Build bullets HTML
            bullets_html = "".join([f"<li>{b}</li>" for b in data["narrative_bullets"]])
            metrics = data["metrics"]

            panel_html = f"""
            <div class="col-md-6 col-lg-4 mb-3">
                <div class="card mc-panel">
                    <div class="card-header d-flex justify-content-between align-items-center">
                        <h5 class="card-title mb-0">{scorecard.agent_id.title()}</h5>
                        <span class="badge bg-secondary">{_format_period_label(period_type)}</span>
                    </div>
                    <div class="card-body">
                        <ul class="list-unstyled mb-3">
                            {bullets_html}
                        </ul>

                        <div class="row text-center small">
                            <div class="col">
                                <div class="text-muted">PRs</div>
                                <div class="fw-bold">{metrics["prs_opened"]}/{metrics["prs_merged"]}</div>
                                <div class="text-muted" style="font-size: 0.75rem;">
                                    {int(metrics["pr_merge_rate"] * 100)}% merged
                                </div>
                            </div>
                            <div class="col">
                                <div class="text-muted">Issues</div>
                                <div class="fw-bold">{metrics["issues_touched"]}</div>
                            </div>
                            <div class="col">
                                <div class="text-muted">Tests</div>
                                <div class="fw-bold">{metrics["tests_affected"]}</div>
                            </div>
                            <div class="col">
                                <div class="text-muted">Tokens</div>
                                <div class="fw-bold {"text-success" if metrics["token_net"] >= 0 else "text-danger"}">
                                    {"+" if metrics["token_net"] > 0 else ""}{metrics["token_net"]}
                                </div>
                            </div>
                        </div>

                        {patterns_html}
                    </div>
                </div>
            </div>
            """
            panels.append(panel_html)

        html_content = f"""
        <div class="row">
            {"".join(panels)}
        </div>
        <div class="text-muted small mt-2">
            Generated: {datetime.now().strftime("%Y-%m-%d %H:%M:%S UTC")}
        </div>
        """

        html_content = _render_all_panels_grid(scorecards, period_type)
        return HTMLResponse(content=html_content)

    except Exception as exc:
        logger.error("Failed to render all scorecard panels: %s", exc)
        return HTMLResponse(
            content=f"""
            <div class="alert alert-danger">
                Error loading scorecards: {str(exc)}
            </div>
            """,
            status_code=200,
            content=f'<div class="alert alert-danger">Error loading scorecards: {exc}</div>'
        )
40
src/dashboard/routes/sovereignty_ws.py
Normal file
@@ -0,0 +1,40 @@
"""WebSocket emitter for the sovereignty metrics dashboard widget.

Streams real-time sovereignty snapshots to connected clients every
*_PUSH_INTERVAL* seconds. The snapshot includes per-layer sovereignty
percentages, API cost rate, and skill crystallisation count.

Refs: #954, #953
"""

import asyncio
import json
import logging

from fastapi import APIRouter, WebSocket

router = APIRouter(tags=["sovereignty"])

logger = logging.getLogger(__name__)

_PUSH_INTERVAL = 5  # seconds between snapshot pushes


@router.websocket("/ws/sovereignty")
async def sovereignty_ws(websocket: WebSocket) -> None:
    """Stream sovereignty metric snapshots to the dashboard widget."""
    from timmy.sovereignty.metrics import get_metrics_store

    await websocket.accept()
    logger.info("Sovereignty WS connected")

    store = get_metrics_store()
    try:
        # Send initial snapshot immediately
        await websocket.send_text(json.dumps(store.get_snapshot()))

        while True:
            await asyncio.sleep(_PUSH_INTERVAL)
            await websocket.send_text(json.dumps(store.get_snapshot()))
    except Exception:
        logger.debug("Sovereignty WS disconnected")
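The handler above follows a send-then-sleep cadence: one snapshot on connect, then one per interval. A testable sketch with the transport and store injected (`FakeStore` and `collect` are hypothetical stand-ins, not dashboard code):

```python
import asyncio
import json


class FakeStore:
    """Hypothetical stand-in for the sovereignty metrics store."""

    def get_snapshot(self) -> dict:
        return {"sovereignty": 100}


async def push_snapshots(store, send, cycles: int, interval: float) -> None:
    # Initial snapshot immediately, then one per interval — same shape as the route.
    await send(json.dumps(store.get_snapshot()))
    for _ in range(cycles):
        await asyncio.sleep(interval)
        await send(json.dumps(store.get_snapshot()))


sent: list[str] = []


async def collect(message: str) -> None:
    sent.append(message)


asyncio.run(push_snapshots(FakeStore(), collect, cycles=2, interval=0))
print(len(sent))  # 3 — one initial push plus two interval pushes
```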
@@ -7,6 +7,8 @@ router = APIRouter(prefix="/telegram", tags=["telegram"])


class TokenPayload(BaseModel):
    """Request payload containing a Telegram bot token."""

    token: str
@@ -1,11 +1,14 @@
"""Voice routes — /voice/* and /voice/enhanced/* endpoints.

Provides NLU intent detection, TTS control, the full voice-to-action
pipeline (detect intent → execute → optionally speak), and the voice
button UI page.
pipeline (detect intent → execute → optionally speak), the voice
button UI page, and voice settings customisation.
"""

import asyncio
import json
import logging
from pathlib import Path

from fastapi import APIRouter, Form, Request
from fastapi.responses import HTMLResponse
@@ -14,6 +17,31 @@ from dashboard.templating import templates
from integrations.voice.nlu import detect_intent, extract_command
from timmy.agent import create_timmy

# ── Voice settings persistence ───────────────────────────────────────────────

_VOICE_SETTINGS_FILE = Path("data/voice_settings.json")
_DEFAULT_VOICE_SETTINGS: dict = {"rate": 175, "volume": 0.9, "voice_id": ""}


def _load_voice_settings() -> dict:
    """Read persisted voice settings from disk; return defaults on any error."""
    try:
        if _VOICE_SETTINGS_FILE.exists():
            return json.loads(_VOICE_SETTINGS_FILE.read_text())
    except Exception as exc:
        logger.warning("Failed to load voice settings: %s", exc)
    return dict(_DEFAULT_VOICE_SETTINGS)


def _save_voice_settings(data: dict) -> None:
    """Persist voice settings to disk; log and continue on any error."""
    try:
        _VOICE_SETTINGS_FILE.parent.mkdir(parents=True, exist_ok=True)
        _VOICE_SETTINGS_FILE.write_text(json.dumps(data))
    except Exception as exc:
        logger.warning("Failed to save voice settings: %s", exc)


logger = logging.getLogger(__name__)

router = APIRouter(prefix="/voice", tags=["voice"])
@@ -152,3 +180,58 @@ async def process_voice_input(
        "error": error,
        "spoken": speak_response and response_text is not None,
    }


# ── Voice settings UI ────────────────────────────────────────────────────────


@router.get("/settings", response_class=HTMLResponse)
async def voice_settings_page(request: Request):
    """Render the voice customisation settings page."""
    current = await asyncio.to_thread(_load_voice_settings)
    voices: list[dict] = []
    try:
        from timmy_serve.voice_tts import voice_tts

        if voice_tts.available:
            voices = await asyncio.to_thread(voice_tts.get_voices)
    except Exception as exc:
        logger.debug("Voice settings page: TTS not available — %s", exc)
    return templates.TemplateResponse(
        request,
        "voice_settings.html",
        {"settings": current, "voices": voices},
    )


@router.get("/settings/data")
async def voice_settings_data():
    """Return current voice settings as JSON."""
    return await asyncio.to_thread(_load_voice_settings)


@router.post("/settings/save")
async def voice_settings_save(
    rate: int = Form(175),
    volume: float = Form(0.9),
    voice_id: str = Form(""),
):
    """Persist voice settings and apply them to the running TTS engine."""
    rate = max(50, min(400, rate))
    volume = max(0.0, min(1.0, volume))
    data = {"rate": rate, "volume": volume, "voice_id": voice_id}

    # Apply to the live TTS engine (graceful degradation when unavailable)
    try:
        from timmy_serve.voice_tts import voice_tts

        if voice_tts.available:
            await asyncio.to_thread(voice_tts.set_rate, rate)
            await asyncio.to_thread(voice_tts.set_volume, volume)
            if voice_id:
                await asyncio.to_thread(voice_tts.set_voice, voice_id)
    except Exception as exc:
        logger.warning("Voice settings: failed to apply to TTS engine — %s", exc)

    await asyncio.to_thread(_save_voice_settings, data)
    return {"saved": True, "settings": data}
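The save endpoint clamps both inputs before persisting; the same arithmetic in isolation (bounds taken from the route: 50–400 WPM, 0.0–1.0 volume):

```python
def clamp_voice_settings(rate: int, volume: float) -> tuple[int, float]:
    # Identical clamp expressions to voice_settings_save above.
    rate = max(50, min(400, rate))
    volume = max(0.0, min(1.0, volume))
    return rate, volume


print(clamp_voice_settings(1000, 2.5))  # (400, 1.0) — clipped to the ceilings
print(clamp_voice_settings(10, -0.2))   # (50, 0.0) — clipped to the floors
print(clamp_voice_settings(175, 0.9))   # (175, 0.9) — in-range values pass through
```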
@@ -51,6 +51,8 @@ def _get_db() -> Generator[sqlite3.Connection, None, None]:


class _EnumLike:
    """Lightweight enum-like wrapper for string values used in templates."""

    def __init__(self, v: str):
        self.value = v
@@ -23,6 +23,8 @@ TRACKED_AGENTS = frozenset({"hermes", "kimi", "manus", "claude", "gemini"})


class PeriodType(StrEnum):
    """Scorecard reporting period type."""

    daily = "daily"
    weekly = "weekly"
@@ -88,6 +88,7 @@
    <a href="/lightning/ledger" class="mc-test-link">LEDGER</a>
    <a href="/creative/ui" class="mc-test-link">CREATIVE</a>
    <a href="/voice/button" class="mc-test-link">VOICE</a>
    <a href="/voice/settings" class="mc-test-link">VOICE SETTINGS</a>
    <a href="/mobile" class="mc-test-link" title="Mobile-optimized view">MOBILE</a>
    <a href="/mobile/local" class="mc-test-link" title="Local AI on iPhone">LOCAL AI</a>
</div>
@@ -145,6 +146,7 @@
    <a href="/lightning/ledger" class="mc-mobile-link">LEDGER</a>
    <a href="/creative/ui" class="mc-mobile-link">CREATIVE</a>
    <a href="/voice/button" class="mc-mobile-link">VOICE</a>
    <a href="/voice/settings" class="mc-mobile-link">VOICE SETTINGS</a>
    <a href="/mobile" class="mc-mobile-link">MOBILE</a>
    <a href="/mobile/local" class="mc-mobile-link">LOCAL AI</a>
    <div class="mc-mobile-menu-footer">
@@ -14,6 +14,11 @@
    <div class="mc-loading-placeholder">LOADING...</div>
{% endcall %}

<!-- Emotional Profile (HTMX polled) -->
{% call panel("EMOTIONAL PROFILE", hx_get="/agents/emotional-profile", hx_trigger="every 10s") %}
    <div class="mc-loading-placeholder">LOADING...</div>
{% endcall %}

<!-- System Health (HTMX polled) -->
{% call panel("SYSTEM HEALTH", hx_get="/health/status", hx_trigger="every 30s") %}
    <div class="health-row">
37
src/dashboard/templates/partials/emotional_profile.html
Normal file
@@ -0,0 +1,37 @@
{% if not profiles %}
<div class="mc-muted" style="font-size:11px; padding:4px;">
    No agents loaded
</div>
{% endif %}

{% for p in profiles %}
{% set color_map = {
    "cautious": "var(--amber)",
    "adventurous": "var(--green)",
    "analytical": "var(--purple)",
    "frustrated": "var(--red)",
    "confident": "var(--green)",
    "curious": "var(--orange)",
    "calm": "var(--text-dim)"
} %}
{% set emo_color = color_map.get(p.current_emotion, "var(--text-dim)") %}
<div class="mc-emotion-row" style="margin-bottom:8px; padding:6px 8px; border-left:3px solid {{ emo_color }};">
    <div class="d-flex justify-content-between align-items-center" style="margin-bottom:2px;">
        <span style="font-size:11px; font-weight:bold; letter-spacing:.08em; color:var(--text-bright);">
            {{ p.agent_name | upper | e }}
        </span>
        <span style="font-size:10px; color:{{ emo_color }}; letter-spacing:.06em;">
            {{ p.emotion_label | e }}
        </span>
    </div>
    <div style="margin-bottom:4px;">
        <div style="height:4px; background:var(--bg-deep); border-radius:2px; overflow:hidden;">
            <div style="height:100%; width:{{ (p.intensity * 100) | int }}%; background:{{ emo_color }}; border-radius:2px; transition:width 0.3s;"></div>
        </div>
    </div>
    <div style="font-size:9px; color:var(--text-dim); letter-spacing:.06em;">
        {{ p.intensity_label | upper | e }}
        {% if p.trigger_event %} · {{ p.trigger_event | replace("_", " ") | upper | e }}{% endif %}
    </div>
</div>
{% endfor %}
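The partial maps each emotion to an accent colour and falls back to the dim text colour for unmapped emotions; the bar width is the intensity scaled to a percentage. The same logic in Python (the colour table is copied from the template):

```python
color_map = {
    "cautious": "var(--amber)",
    "adventurous": "var(--green)",
    "analytical": "var(--purple)",
    "frustrated": "var(--red)",
    "confident": "var(--green)",
    "curious": "var(--orange)",
    "calm": "var(--text-dim)",
}

# dict.get with a default mirrors Jinja's color_map.get(p.current_emotion, ...)
print(color_map.get("frustrated", "var(--text-dim)"))  # var(--red)
print(color_map.get("bewildered", "var(--text-dim)"))  # var(--text-dim) — fallback

# Bar width: intensity in [0, 1] → integer percent, as in the template filter chain
print(int(0.42 * 100))  # 42
```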
131
src/dashboard/templates/voice_settings.html
Normal file
@@ -0,0 +1,131 @@
{% extends "base.html" %}
{% from "macros.html" import panel %}

{% block title %}Voice Settings{% endblock %}
{% block extra_styles %}{% endblock %}

{% block content %}
<div class="voice-settings-page py-3">
    {% call panel("VOICE SETTINGS") %}
    <form id="voice-settings-form">

        <div class="vs-field">
            <label class="vs-label" for="rate-slider">
                SPEED — <span class="vs-value" id="rate-val">{{ settings.rate }}</span> WPM
            </label>
            <input type="range" class="vs-slider" id="rate-slider" name="rate"
                   min="50" max="400" step="5" value="{{ settings.rate }}"
                   oninput="document.getElementById('rate-val').textContent=this.value">
            <div class="vs-range-labels"><span>Slow</span><span>Fast</span></div>
        </div>

        <div class="vs-field">
            <label class="vs-label" for="vol-slider">
                VOLUME — <span class="vs-value" id="vol-val">{{ (settings.volume * 100)|int }}</span>%
            </label>
            <input type="range" class="vs-slider" id="vol-slider" name="volume"
                   min="0" max="100" step="5" value="{{ (settings.volume * 100)|int }}"
                   oninput="document.getElementById('vol-val').textContent=this.value">
            <div class="vs-range-labels"><span>Quiet</span><span>Loud</span></div>
        </div>

        <div class="vs-field">
            <label class="vs-label" for="voice-select">VOICE MODEL</label>
            {% if voices %}
            <select class="vs-select" id="voice-select" name="voice_id">
                <option value="">— System Default —</option>
                {% for v in voices %}
                <option value="{{ v.id }}" {% if v.id == settings.voice_id %}selected{% endif %}>
                    {{ v.name }}
                </option>
                {% endfor %}
            </select>
            {% else %}
            <div class="vs-unavailable">Server TTS (pyttsx3) unavailable — preview uses browser speech synthesis</div>
            <input type="hidden" id="voice-select" name="voice_id" value="{{ settings.voice_id }}">
            {% endif %}
        </div>

        <div class="vs-field">
            <label class="vs-label" for="preview-text">PREVIEW TEXT</label>
            <input type="text" class="vs-input" id="preview-text"
                   value="Hello, I am Timmy. Your local AI assistant."
                   placeholder="Enter text to preview...">
        </div>

        <div class="vs-actions">
            <button type="button" class="vs-btn-preview" id="preview-btn" onclick="previewVoice()">
                ▶ PREVIEW
            </button>
            <button type="button" class="vs-btn-save" id="save-btn" onclick="saveSettings()">
                SAVE SETTINGS
            </button>
        </div>

    </form>
    {% endcall %}
</div>
<script>
function previewVoice() {
    var text = document.getElementById('preview-text').value.trim() ||
        'Hello, I am Timmy. Your local AI assistant.';
    var rate = parseInt(document.getElementById('rate-slider').value, 10);
    var volume = parseInt(document.getElementById('vol-slider').value, 10) / 100;

    if (!('speechSynthesis' in window)) {
        McToast.show('Speech synthesis not supported in this browser', 'warn');
        return;
    }

    window.speechSynthesis.cancel();
    var utterance = new SpeechSynthesisUtterance(text);
    // Web Speech API rate: 1.0 ≈ 175 WPM (default)
    utterance.rate = rate / 175;
    utterance.volume = volume;

    // Best-effort voice match from server selection
    var voiceSelect = document.getElementById('voice-select');
    if (voiceSelect && voiceSelect.value) {
        var selectedText = voiceSelect.options[voiceSelect.selectedIndex].text.toLowerCase();
        var firstWord = selectedText.split(' ')[0];
        var browserVoices = window.speechSynthesis.getVoices();
        var matched = browserVoices.find(function(v) {
            return v.name.toLowerCase().includes(firstWord);
        });
        if (matched) { utterance.voice = matched; }
    }

    window.speechSynthesis.speak(utterance);
    McToast.show('Playing preview\u2026', 'info');
}

async function saveSettings() {
    var rate = document.getElementById('rate-slider').value;
    var volPct = parseInt(document.getElementById('vol-slider').value, 10);
    var voiceId = document.getElementById('voice-select').value;

    var body = new URLSearchParams({
        rate: rate,
        volume: (volPct / 100).toFixed(2),
        voice_id: voiceId
    });

    try {
        var resp = await fetch('/voice/settings/save', {
            method: 'POST',
            headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
            body: body.toString()
        });
        var data = await resp.json();
        if (data.saved) {
            McToast.show('Voice settings saved.', 'info');
        } else {
            McToast.show('Failed to save settings.', 'error');
        }
    } catch (e) {
        McToast.show('Error saving settings.', 'error');
    }
}
</script>
{% endblock %}
@@ -24,6 +24,8 @@ MAX_MESSAGES: int = 500

@dataclass
class Message:
    """A single chat message with role, content, timestamp, and source."""

    role: str  # "user" | "agent" | "error"
    content: str
    timestamp: str
9
src/infrastructure/hermes/__init__.py
Normal file
@@ -0,0 +1,9 @@
"""Hermes health monitor — system resources + model management.

Monitors the local machine (Hermes/M3 Max) for memory pressure, disk usage,
Ollama model health, zombie processes, and network connectivity.
"""

from infrastructure.hermes.monitor import HealthLevel, HealthReport, HermesMonitor, hermes_monitor

__all__ = ["HermesMonitor", "HealthLevel", "HealthReport", "hermes_monitor"]
660
src/infrastructure/hermes/monitor.py
Normal file
@@ -0,0 +1,660 @@
"""Hermes health monitor — system resources + model management.

Monitors the local machine (Hermes/M3 Max) and keeps it running smoothly.
Runs every 5 minutes, auto-resolves issues where possible, alerts when
human intervention is needed.

Monitors:
1. Memory pressure — unified memory, alert if <4GB free, unload models
2. Disk usage — alert if <10GB free, clean temp files
3. Ollama status — verify reachable, restart if crashed, manage loaded models
4. Process health — detect zombie processes
5. Network — verify Gitea connectivity

Refs: #1073
"""

import asyncio
import json
import logging
import shutil
import subprocess
import tempfile
import time
import urllib.request
from dataclasses import dataclass, field
from datetime import UTC, datetime
from enum import StrEnum
from typing import Any

from config import settings

logger = logging.getLogger(__name__)


class HealthLevel(StrEnum):
    """Severity level for a health check result."""

    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"
    UNKNOWN = "unknown"


@dataclass
class CheckResult:
    """Result of a single health check."""

    name: str
    level: HealthLevel
    message: str
    details: dict[str, Any] = field(default_factory=dict)
    auto_resolved: bool = False
    needs_human: bool = False

    def to_dict(self) -> dict[str, Any]:
        return {
            "name": self.name,
            "level": self.level.value,
            "message": self.message,
            "details": self.details,
            "auto_resolved": self.auto_resolved,
            "needs_human": self.needs_human,
        }
@dataclass
class HealthReport:
    """Full health report from a single monitor cycle."""

    timestamp: str
    checks: list[CheckResult]
    overall: HealthLevel

    @property
    def has_issues(self) -> bool:
        return any(c.level != HealthLevel.OK for c in self.checks)

    def to_dict(self) -> dict[str, Any]:
        return {
            "timestamp": self.timestamp,
            "overall": self.overall.value,
            "has_issues": self.has_issues,
            "checks": [c.to_dict() for c in self.checks],
        }


class HermesMonitor:
    """System health monitor for Hermes (local M3 Max machine).

    All blocking I/O (subprocess, HTTP) is wrapped in asyncio.to_thread()
    so it never blocks the event loop. Results are cached so the dashboard
    can read the last report without triggering a new cycle.
    """

    OLLAMA_REQUEST_TIMEOUT = 5
    NETWORK_REQUEST_TIMEOUT = 5

    def __init__(self) -> None:
        self._last_report: HealthReport | None = None
        self._last_run_ts: float = 0.0

    @property
    def last_report(self) -> HealthReport | None:
        """Most recent health report, or None if no cycle has run yet."""
        return self._last_report

    @property
    def seconds_since_last_run(self) -> float:
        if self._last_run_ts == 0.0:
            return float("inf")
        return time.monotonic() - self._last_run_ts

    async def run_cycle(self) -> HealthReport:
        """Run a full health check cycle and return the report."""
        self._last_run_ts = time.monotonic()
        logger.info("Hermes health cycle starting")

        check_fns = [
            self._check_memory(),
            self._check_disk(),
            self._check_ollama(),
            self._check_processes(),
            self._check_network(),
        ]

        raw_results = await asyncio.gather(*check_fns, return_exceptions=True)

        checks: list[CheckResult] = []
        for i, r in enumerate(raw_results):
            if isinstance(r, Exception):
                name = ["memory", "disk", "ollama", "processes", "network"][i]
                logger.warning("Hermes check '%s' raised: %s", name, r)
                checks.append(
                    CheckResult(
                        name=name,
                        level=HealthLevel.UNKNOWN,
                        message=f"Check error: {r}",
                    )
                )
            else:
                checks.append(r)

        # Compute overall level
        levels = {c.level for c in checks}
        if HealthLevel.CRITICAL in levels:
            overall = HealthLevel.CRITICAL
        elif HealthLevel.WARNING in levels:
            overall = HealthLevel.WARNING
        elif HealthLevel.UNKNOWN in levels:
            overall = HealthLevel.UNKNOWN
        else:
            overall = HealthLevel.OK

        report = HealthReport(
            timestamp=datetime.now(UTC).isoformat(),
            checks=checks,
            overall=overall,
        )
        self._last_report = report

        await self._handle_alerts(report)

        logger.info("Hermes health cycle complete — overall: %s", overall.value)
        return report
    # ── Memory ───────────────────────────────────────────────────────────────

    async def _check_memory(self) -> CheckResult:
        """Check unified memory usage (macOS vm_stat)."""
        memory_free_min_gb = getattr(settings, "hermes_memory_free_min_gb", 4.0)
        try:
            info = await asyncio.to_thread(self._get_memory_info)
            free_gb = info.get("free_gb", 0.0)
            total_gb = info.get("total_gb", 0.0)
            details: dict[str, Any] = {
                "free_gb": round(free_gb, 2),
                "total_gb": round(total_gb, 2),
            }

            if free_gb < memory_free_min_gb:
                # Attempt auto-remediation: unload Ollama models
                unloaded = await self._unload_ollama_models()
                if unloaded:
                    return CheckResult(
                        name="memory",
                        level=HealthLevel.WARNING,
                        message=(
                            f"Low memory ({free_gb:.1f}GB free) — "
                            f"unloaded {unloaded} Ollama model(s)"
                        ),
                        details={**details, "models_unloaded": unloaded},
                        auto_resolved=True,
                    )
                return CheckResult(
                    name="memory",
                    level=HealthLevel.CRITICAL,
                    message=(
                        f"Critical: only {free_gb:.1f}GB free (threshold: {memory_free_min_gb}GB)"
                    ),
                    details=details,
                    needs_human=True,
                )

            return CheckResult(
                name="memory",
                level=HealthLevel.OK,
                message=f"Memory OK — {free_gb:.1f}GB free of {total_gb:.1f}GB",
                details=details,
            )
        except Exception as exc:
            logger.warning("Memory check failed: %s", exc)
            return CheckResult(
                name="memory",
                level=HealthLevel.UNKNOWN,
                message=f"Memory check unavailable: {exc}",
            )

    def _get_memory_info(self) -> dict[str, float]:
        """Get memory stats via macOS sysctl + vm_stat.

        Falls back gracefully on non-macOS systems.
        """
        gb = 1024**3
        total_bytes = 0.0
        free_bytes = 0.0

        # Total memory via sysctl
        try:
            result = subprocess.run(
                ["sysctl", "-n", "hw.memsize"],
                capture_output=True,
                text=True,
                timeout=3,
            )
            total_bytes = float(result.stdout.strip())
        except Exception:
            pass

        # Free + inactive pages via vm_stat (macOS)
        try:
            result = subprocess.run(
                ["vm_stat"],
                capture_output=True,
                text=True,
                timeout=3,
            )
            page_size = 16384  # 16 KB default on Apple Silicon
            for line in result.stdout.splitlines():
                if "page size of" in line:
                    parts = line.split()
                    for i, part in enumerate(parts):
                        if part == "of" and i + 1 < len(parts):
                            try:
                                page_size = int(parts[i + 1])
                            except ValueError:
                                pass
                elif "Pages free:" in line:
                    pages = int(line.split(":")[1].strip().rstrip("."))
                    free_bytes += pages * page_size
                elif "Pages inactive:" in line:
                    pages = int(line.split(":")[1].strip().rstrip("."))
                    free_bytes += pages * page_size
        except Exception:
            pass

        return {
            "total_gb": total_bytes / gb if total_bytes else 0.0,
            "free_gb": free_bytes / gb if free_bytes else 0.0,
        }
    # ── Disk ─────────────────────────────────────────────────────────────────

    async def _check_disk(self) -> CheckResult:
        """Check disk usage via shutil.disk_usage."""
        disk_free_min_gb = getattr(settings, "hermes_disk_free_min_gb", 10.0)
        try:
            usage = await asyncio.to_thread(shutil.disk_usage, "/")
            free_gb = usage.free / (1024**3)
            total_gb = usage.total / (1024**3)
            used_pct = (usage.used / usage.total) * 100

            details: dict[str, Any] = {
                "free_gb": round(free_gb, 2),
                "total_gb": round(total_gb, 2),
                "used_pct": round(used_pct, 1),
            }

            if free_gb < disk_free_min_gb:
                cleaned_gb = await self._cleanup_temp_files()
                if cleaned_gb > 0.01:
                    return CheckResult(
                        name="disk",
                        level=HealthLevel.WARNING,
                        message=(
                            f"Low disk ({free_gb:.1f}GB free) — "
                            f"cleaned {cleaned_gb:.2f}GB from /tmp"
                        ),
                        details={**details, "cleaned_gb": round(cleaned_gb, 2)},
                        auto_resolved=True,
                    )
                return CheckResult(
                    name="disk",
                    level=HealthLevel.CRITICAL,
                    message=(
                        f"Critical: only {free_gb:.1f}GB free (threshold: {disk_free_min_gb}GB)"
                    ),
                    details=details,
                    needs_human=True,
                )

            return CheckResult(
                name="disk",
                level=HealthLevel.OK,
                message=f"Disk OK — {free_gb:.1f}GB free ({used_pct:.0f}% used)",
                details=details,
            )
        except Exception as exc:
            logger.warning("Disk check failed: %s", exc)
            return CheckResult(
                name="disk",
                level=HealthLevel.UNKNOWN,
                message=f"Disk check unavailable: {exc}",
            )

    async def _cleanup_temp_files(self) -> float:
        """Remove /tmp files older than 24 hours. Returns GB freed."""
        return await asyncio.to_thread(self._cleanup_temp_files_sync)

    def _cleanup_temp_files_sync(self) -> float:
        """Synchronous /tmp cleanup — only touches files older than 24 hours."""
        from pathlib import Path

        freed_bytes = 0
        cutoff = time.time() - 86400  # 24 hours ago

        try:
            tmp = Path(tempfile.gettempdir())
            for item in tmp.iterdir():
                try:
                    stat = item.stat()
                    if stat.st_mtime >= cutoff:
                        continue
                    if item.is_file():
                        freed_bytes += stat.st_size
                        item.unlink(missing_ok=True)
                    elif item.is_dir():
                        dir_size = sum(f.stat().st_size for f in item.rglob("*") if f.is_file())
                        freed_bytes += dir_size
                        shutil.rmtree(str(item), ignore_errors=True)
                except (PermissionError, OSError):
                    pass  # Skip files we can't touch
        except Exception as exc:
            logger.warning("Temp cleanup error: %s", exc)

        freed_gb = freed_bytes / (1024**3)
        if freed_gb > 0.001:
            logger.info("Hermes disk cleanup: freed %.2fGB from /tmp", freed_gb)
        return freed_gb
    # ── Ollama ───────────────────────────────────────────────────────────────

    async def _check_ollama(self) -> CheckResult:
        """Check Ollama status and loaded models."""
        try:
            status = await asyncio.to_thread(self._get_ollama_status)

            if not status.get("reachable"):
                restarted = await self._restart_ollama()
                if restarted:
                    return CheckResult(
                        name="ollama",
                        level=HealthLevel.WARNING,
                        message="Ollama was unreachable — restart initiated",
                        details={"restart_attempted": True},
                        auto_resolved=True,
                    )
                return CheckResult(
                    name="ollama",
                    level=HealthLevel.CRITICAL,
                    message="Ollama unreachable and restart failed",
                    details={"reachable": False},
                    needs_human=True,
                )

            models = status.get("models", [])
            loaded = status.get("loaded_models", [])
            return CheckResult(
                name="ollama",
                level=HealthLevel.OK,
                message=f"Ollama OK — {len(models)} model(s) available, {len(loaded)} loaded",
                details={
                    "reachable": True,
                    "model_count": len(models),
                    "loaded_count": len(loaded),
                    "loaded_models": [m.get("name", "") for m in loaded],
                },
            )
        except Exception as exc:
            logger.warning("Ollama check failed: %s", exc)
            return CheckResult(
                name="ollama",
                level=HealthLevel.UNKNOWN,
                message=f"Ollama check failed: {exc}",
            )

    def _get_ollama_status(self) -> dict[str, Any]:
        """Synchronous Ollama status — checks /api/tags and /api/ps."""
        url = settings.normalized_ollama_url

        try:
            req = urllib.request.Request(
                f"{url}/api/tags",
                method="GET",
                headers={"Accept": "application/json"},
            )
            with urllib.request.urlopen(req, timeout=self.OLLAMA_REQUEST_TIMEOUT) as resp:
                data = json.loads(resp.read().decode())
                models = data.get("models", [])
        except Exception:
            return {"reachable": False, "models": [], "loaded_models": []}

        # /api/ps lists currently loaded (in-memory) models — Ollama >=0.2
        loaded: list[dict] = []
        try:
            req = urllib.request.Request(
                f"{url}/api/ps",
                method="GET",
                headers={"Accept": "application/json"},
            )
            with urllib.request.urlopen(req, timeout=self.OLLAMA_REQUEST_TIMEOUT) as resp:
                ps_data = json.loads(resp.read().decode())
                loaded = ps_data.get("models", [])
        except Exception:
            pass  # /api/ps absent on older Ollama — non-fatal

        return {"reachable": True, "models": models, "loaded_models": loaded}

    async def _unload_ollama_models(self) -> int:
        """Unload in-memory Ollama models to free unified memory.

        Uses the keep_alive=0 trick: POSTing to /api/generate with
        keep_alive=0 causes Ollama to immediately evict the model.
        Returns the number of models successfully unloaded.
        """
        return await asyncio.to_thread(self._unload_ollama_models_sync)

    def _unload_ollama_models_sync(self) -> int:
        """Synchronous model unload implementation."""
        url = settings.normalized_ollama_url
        unloaded = 0

        try:
            req = urllib.request.Request(
                f"{url}/api/ps",
                method="GET",
                headers={"Accept": "application/json"},
            )
            with urllib.request.urlopen(req, timeout=self.OLLAMA_REQUEST_TIMEOUT) as resp:
                ps_data = json.loads(resp.read().decode())
                loaded = ps_data.get("models", [])
        except Exception:
            return 0

        for model in loaded:
            name = model.get("name", "")
            if not name:
                continue
            try:
                payload = json.dumps({"model": name, "keep_alive": 0}).encode()
                req = urllib.request.Request(
                    f"{url}/api/generate",
                    data=payload,
                    method="POST",
                    headers={"Content-Type": "application/json"},
                )
                with urllib.request.urlopen(req, timeout=10):
                    pass
                logger.info("Hermes: unloaded Ollama model %s", name)
                unloaded += 1
            except Exception as exc:
                logger.warning("Hermes: failed to unload model %s: %s", name, exc)

        return unloaded

    async def _restart_ollama(self) -> bool:
        """Attempt to restart the Ollama service via launchctl or brew."""
        return await asyncio.to_thread(self._restart_ollama_sync)

    def _restart_ollama_sync(self) -> bool:
        """Try launchctl first, then brew services."""
        # macOS launchctl (installed via official Ollama installer)
        try:
            result = subprocess.run(
                ["launchctl", "stop", "com.ollama.ollama"],
                capture_output=True,
                timeout=10,
            )
            if result.returncode == 0:
                time.sleep(2)
                subprocess.run(
                    ["launchctl", "start", "com.ollama.ollama"],
                    capture_output=True,
                    timeout=10,
                )
                logger.info("Hermes: Ollama restarted via launchctl")
                return True
        except Exception:
            pass

        # Homebrew fallback
        try:
            result = subprocess.run(
                ["brew", "services", "restart", "ollama"],
                capture_output=True,
                timeout=20,
            )
            if result.returncode == 0:
                logger.info("Hermes: Ollama restarted via brew services")
                return True
        except Exception:
            pass

        logger.warning("Hermes: Ollama restart failed — manual intervention needed")
        return False
    # ── Processes ────────────────────────────────────────────────────────────

    async def _check_processes(self) -> CheckResult:
        """Check for zombie processes via ps aux."""
        try:
            result = await asyncio.to_thread(self._get_zombie_processes)
            zombies = result.get("zombies", [])

            if zombies:
                return CheckResult(
                    name="processes",
                    level=HealthLevel.WARNING,
                    message=f"Found {len(zombies)} zombie process(es)",
                    details={"zombies": zombies[:5]},
                    needs_human=len(zombies) > 3,
                )

            return CheckResult(
                name="processes",
                level=HealthLevel.OK,
                message="Processes OK — no zombies detected",
                details={"zombie_count": 0},
            )
        except Exception as exc:
            logger.warning("Process check failed: %s", exc)
            return CheckResult(
                name="processes",
                level=HealthLevel.UNKNOWN,
                message=f"Process check unavailable: {exc}",
            )

    def _get_zombie_processes(self) -> dict[str, Any]:
        """Detect zombie processes (state 'Z') via ps aux."""
        result = subprocess.run(
            ["ps", "aux"],
            capture_output=True,
            text=True,
            timeout=5,
        )
        zombies = []
        for line in result.stdout.splitlines()[1:]:  # Skip header row
            parts = line.split(None, 10)
            # STAT is the 8th column; zombies report "Z" possibly with a
            # suffix flag (e.g. "Z+"), so match on the prefix, not equality.
            if len(parts) >= 8 and parts[7].startswith("Z"):
                zombies.append(
                    {
                        "pid": parts[1],
                        "command": parts[10][:80] if len(parts) > 10 else "",
                    }
                )
        return {"zombies": zombies}
    # ── Network ──────────────────────────────────────────────────────────────

    async def _check_network(self) -> CheckResult:
        """Check Gitea connectivity."""
        try:
            result = await asyncio.to_thread(self._check_gitea_connectivity)
            reachable = result.get("reachable", False)
            latency_ms = result.get("latency_ms", -1.0)

            if not reachable:
                return CheckResult(
                    name="network",
                    level=HealthLevel.WARNING,
                    message=f"Gitea unreachable: {result.get('error', 'unknown')}",
                    details=result,
                    needs_human=True,
                )

            return CheckResult(
                name="network",
                level=HealthLevel.OK,
                message=f"Network OK — Gitea reachable ({latency_ms:.0f}ms)",
                details=result,
            )
        except Exception as exc:
            logger.warning("Network check failed: %s", exc)
            return CheckResult(
                name="network",
                level=HealthLevel.UNKNOWN,
                message=f"Network check unavailable: {exc}",
            )

    def _check_gitea_connectivity(self) -> dict[str, Any]:
        """Synchronous Gitea reachability check."""
        url = settings.gitea_url
        start = time.monotonic()
        try:
            req = urllib.request.Request(
                f"{url}/api/v1/version",
                method="GET",
                headers={"Accept": "application/json"},
            )
            with urllib.request.urlopen(req, timeout=self.NETWORK_REQUEST_TIMEOUT) as resp:
                latency_ms = (time.monotonic() - start) * 1000
                return {
                    "reachable": resp.status == 200,
                    "latency_ms": round(latency_ms, 1),
                    "url": url,
                }
        except Exception as exc:
            return {
                "reachable": False,
                "error": str(exc),
                "url": url,
                "latency_ms": -1.0,
            }
    # ── Alerts ───────────────────────────────────────────────────────────────

    async def _handle_alerts(self, report: HealthReport) -> None:
        """Send push notifications for issues that need attention."""
        try:
            from infrastructure.notifications.push import notifier
        except Exception:
            return

        for check in report.checks:
            if check.level == HealthLevel.CRITICAL or check.needs_human:
                notifier.notify(
                    title=f"Hermes Alert: {check.name}",
                    message=check.message,
                    category="system",
                    native=check.level == HealthLevel.CRITICAL,
                )
            elif check.level == HealthLevel.WARNING and check.auto_resolved:
                notifier.notify(
                    title=f"Hermes: {check.name} auto-fixed",
                    message=check.message,
                    category="system",
                )


# Module-level singleton
hermes_monitor = HermesMonitor()
@@ -21,6 +21,8 @@ logger = logging.getLogger(__name__)


@dataclass
class Notification:
    """A push notification with title, message, category, and read status."""

    id: int
    title: str
    message: str
@@ -3,6 +3,14 @@
from .api import router
from .cascade import CascadeRouter, Provider, ProviderStatus, get_router
from .history import HealthHistoryStore, get_history_store
from .metabolic import (
    DEFAULT_TIER_MODELS,
    MetabolicRouter,
    ModelTier,
    build_prompt,
    classify_complexity,
    get_metabolic_router,
)

__all__ = [
    "CascadeRouter",
@@ -12,4 +20,11 @@ __all__ = [
    "router",
    "HealthHistoryStore",
    "get_history_store",
    # Metabolic router
    "MetabolicRouter",
    "ModelTier",
    "DEFAULT_TIER_MODELS",
    "classify_complexity",
    "build_prompt",
    "get_metabolic_router",
]
@@ -114,7 +114,7 @@ class Provider:
    type: str  # ollama, openai, anthropic
    enabled: bool
    priority: int
    tier: str | None = None  # e.g., "local", "standard_cloud", "frontier"
    url: str | None = None
    api_key: str | None = None
    base_url: str | None = None
@@ -573,7 +573,6 @@ class CascadeRouter:
        if not providers:
            raise RuntimeError(f"No providers found for tier: {cascade_tier}")

        for provider in providers:
            if not self._is_provider_available(provider):
                continue
src/infrastructure/router/metabolic.py (new file, 424 lines)
@@ -0,0 +1,424 @@
"""Three-tier metabolic LLM router.

Routes queries to the cheapest-sufficient model tier using MLX for all
inference on Apple Silicon GPU:

T1 — Routine (Qwen3-8B Q6_K, ~45-55 tok/s): Simple navigation, basic choices.
T2 — Medium (Qwen3-14B Q5_K_M, ~20-28 tok/s): Dialogue, inventory management.
T3 — Complex (Qwen3-32B Q4_K_M, ~8-12 tok/s): Quest planning, stuck recovery.

Memory budget:
- T1+T2 always loaded (~8.5 GB combined)
- T3 loaded on demand (+20 GB) — game pauses during inference

Design notes:
- 70% of game ticks never reach the LLM (handled upstream by behavior trees)
- T3 pauses the game world before inference and unpauses after (graceful if no world)
- All inference via vllm-mlx / Ollama — local-first, no cloud for game ticks

References:
- Issue #966 — Three-Tier Metabolic LLM Router
- Issue #1063 — Best Local Uncensored Agent Model for M3 Max 36GB
- Issue #1075 — Claude Quota Monitor + Metabolic Protocol
"""

import asyncio
import logging
from enum import StrEnum
from typing import Any

logger = logging.getLogger(__name__)
class ModelTier(StrEnum):
    """Three metabolic model tiers ordered by cost and capability.

    Tier selection is driven by classify_complexity(). The cheapest
    sufficient tier is always chosen — T1 handles routine tasks, T2
    handles dialogue and management, T3 handles planning and recovery.
    """

    T1_ROUTINE = "t1_routine"  # Fast, cheap — Qwen3-8B, always loaded
    T2_MEDIUM = "t2_medium"  # Balanced — Qwen3-14B, always loaded
    T3_COMPLEX = "t3_complex"  # Deep — Qwen3-32B, loaded on demand, pauses game
# ── Classification vocabulary ────────────────────────────────────────────────

# T1: single-action navigation and binary-choice words
_T1_KEYWORDS = frozenset(
    {
        "go", "move", "walk", "run",
        "north", "south", "east", "west", "up", "down", "left", "right",
        "yes", "no", "ok", "okay",
        "open", "close", "take", "drop", "look", "pick", "use",
        "wait", "rest", "save", "attack", "flee", "jump", "crouch",
    }
)

# T3: planning, optimisation, or recovery signals
_T3_KEYWORDS = frozenset(
    {
        "plan", "strategy", "optimize", "optimise",
        "quest", "stuck", "recover", "multi-step", "long-term",
        "negotiate", "persuade", "faction", "reputation",
        "best", "optimal", "recommend",
        "analyze", "analyse", "evaluate", "decide", "complex",
        "how do i", "what should i do", "help me figure", "what is the best",
    }
)
def classify_complexity(task: str, state: dict) -> ModelTier:
    """Classify a task to the cheapest-sufficient model tier.

    Classification priority (highest wins):
    1. T3 — any T3 keyword, stuck indicator, or ``state["require_t3"] = True``
    2. T1 — short task with only T1 keywords and no active context
    3. T2 — everything else (safe default)

    Args:
        task: Natural-language task description or player input.
        state: Current game state dict. Recognised keys:
            ``stuck`` (bool), ``require_t3`` (bool),
            ``active_quests`` (list), ``dialogue_active`` (bool).

    Returns:
        ModelTier appropriate for the task.
    """
    task_lower = task.lower()
    words = set(task_lower.split())

    # ── T3 signals ──────────────────────────────────────────────────────────
    t3_keyword_hit = bool(words & _T3_KEYWORDS)
    # Check multi-word T3 phrases
    t3_phrase_hit = any(phrase in task_lower for phrase in _T3_KEYWORDS if " " in phrase)
    is_stuck = bool(state.get("stuck", False))
    explicit_t3 = bool(state.get("require_t3", False))

    if t3_keyword_hit or t3_phrase_hit or is_stuck or explicit_t3:
        logger.debug(
            "classify_complexity → T3 (keywords=%s stuck=%s explicit=%s)",
            t3_keyword_hit or t3_phrase_hit,
            is_stuck,
            explicit_t3,
        )
        return ModelTier.T3_COMPLEX

    # ── T1 signals ──────────────────────────────────────────────────────────
    t1_keyword_hit = bool(words & _T1_KEYWORDS)
    task_short = len(task.split()) <= 6
    no_active_context = (
        not state.get("active_quests")
        and not state.get("dialogue_active")
        and not state.get("combat_active")
    )

    if t1_keyword_hit and task_short and no_active_context:
        logger.debug("classify_complexity → T1 (keywords=%s short=%s)", t1_keyword_hit, task_short)
        return ModelTier.T1_ROUTINE

    # ── Default: T2 ─────────────────────────────────────────────────────────
    logger.debug("classify_complexity → T2 (default)")
    return ModelTier.T2_MEDIUM
def build_prompt(
    state: dict,
    ui_state: dict,
    text: str,
    visual_context: str | None = None,
) -> list[dict]:
    """Build an OpenAI-compatible messages list from game context.

    Assembles a system message from structured game state and a user
    message from the player's text input. This format is accepted by
    CascadeRouter.complete() directly.

    Args:
        state: Current game state dict. Common keys:
            ``location`` (str), ``health`` (int/float),
            ``inventory`` (list), ``active_quests`` (list),
            ``stuck`` (bool).
        ui_state: Current UI state dict. Common keys:
            ``dialogue_active`` (bool), ``dialogue_npc`` (str),
            ``menu_open`` (str), ``combat_active`` (bool).
        text: Player text or task description (becomes user message).
        visual_context: Optional free-text description of the current screen
            or scene — from a vision model or rule-based extractor.

    Returns:
        List of message dicts: [{"role": "system", ...}, {"role": "user", ...}]
    """
    context_lines: list[str] = []

    location = state.get("location", "unknown")
    context_lines.append(f"Location: {location}")

    health = state.get("health")
    if health is not None:
        context_lines.append(f"Health: {health}")

    inventory = state.get("inventory", [])
    if inventory:
        items = [i if isinstance(i, str) else i.get("name", str(i)) for i in inventory[:10]]
        context_lines.append(f"Inventory: {', '.join(items)}")

    active_quests = state.get("active_quests", [])
    if active_quests:
        names = [q if isinstance(q, str) else q.get("name", str(q)) for q in active_quests[:5]]
        context_lines.append(f"Active quests: {', '.join(names)}")

    if state.get("stuck"):
        context_lines.append("Status: STUCK — need recovery strategy")

    if ui_state.get("dialogue_active"):
        npc = ui_state.get("dialogue_npc", "NPC")
        context_lines.append(f"In dialogue with: {npc}")

    if ui_state.get("menu_open"):
        context_lines.append(f"Menu open: {ui_state['menu_open']}")

    if ui_state.get("combat_active"):
        context_lines.append("Status: IN COMBAT")

    if visual_context:
        context_lines.append(f"Scene: {visual_context}")

    system_content = (
        "You are Timmy, an AI game agent. "
        "Respond with valid game commands only.\n\n" + "\n".join(context_lines)
    )

    return [
        {"role": "system", "content": system_content},
        {"role": "user", "content": text},
    ]
# ── Default model assignments ────────────────────────────────────────────────
# Overridable per deployment via MetabolicRouter(tier_models={...}).
# Model benchmarks (M3 Max 36 GB, issue #1063):
#   Qwen3-8B Q6_K — 0.933 F1 tool calling, ~45-55 tok/s (~6 GB)
#   Qwen3-14B Q5_K_M — 0.971 F1 tool calling, ~20-28 tok/s (~9.5 GB)
#   Qwen3-32B Q4_K_M — highest quality, ~8-12 tok/s (~20 GB, on demand)
DEFAULT_TIER_MODELS: dict[ModelTier, str] = {
    ModelTier.T1_ROUTINE: "qwen3:8b",
    ModelTier.T2_MEDIUM: "qwen3:14b",
    ModelTier.T3_COMPLEX: "qwen3:30b",  # Closest Ollama tag to 32B Q4
}
class MetabolicRouter:
    """Routes LLM requests to the cheapest-sufficient model tier.

    Wraps CascadeRouter with:
    - Complexity classification via classify_complexity()
    - Prompt assembly via build_prompt()
    - T3 world-pause / world-unpause (graceful if no world adapter)

    Usage::

        router = MetabolicRouter()

        # Simple route call — classification + prompt + inference in one step
        result = await router.route(
            task="Go north",
            state={"location": "Balmora"},
            ui_state={},
        )
        print(result["content"], result["tier"])

        # Pre-classify if you need the tier for telemetry
        tier = router.classify("Plan the best path to Vivec", game_state)

        # Wire in world adapter for T3 pause/unpause
        router.set_world(world_adapter)
    """

    def __init__(
        self,
        cascade: Any | None = None,
        tier_models: dict[ModelTier, str] | None = None,
    ) -> None:
        """Initialise the metabolic router.

        Args:
            cascade: CascadeRouter instance to use. If None, the
                singleton returned by get_router() is used lazily.
            tier_models: Override default model names per tier.
        """
        self._cascade = cascade
        self._tier_models: dict[ModelTier, str] = dict(DEFAULT_TIER_MODELS)
        if tier_models:
            self._tier_models.update(tier_models)
        self._world: Any | None = None

    def set_world(self, world: Any) -> None:
        """Wire in a world adapter for T3 pause / unpause support.

        The adapter only needs to implement ``act(CommandInput)`` — the full
        WorldInterface contract is not required. A missing or broken world
        adapter degrades gracefully (logs a warning, inference continues).

        Args:
            world: Any object with an ``act(CommandInput)`` method.
        """
        self._world = world

    def _get_cascade(self) -> Any:
        """Return the CascadeRouter, creating the singleton if needed."""
        if self._cascade is None:
            from infrastructure.router.cascade import get_router

            self._cascade = get_router()
        return self._cascade

    def classify(self, task: str, state: dict) -> ModelTier:
        """Classify task complexity. Delegates to classify_complexity()."""
        return classify_complexity(task, state)

    async def _pause_world(self) -> None:
|
||||
"""Pause the game world before T3 inference (graceful degradation)."""
|
||||
if self._world is None:
|
||||
return
|
||||
try:
|
||||
from infrastructure.world.types import CommandInput
|
||||
|
||||
await asyncio.to_thread(self._world.act, CommandInput(action="pause"))
|
||||
logger.debug("MetabolicRouter: world paused for T3 inference")
|
||||
except Exception as exc:
|
||||
logger.warning("world.pause() failed — continuing without pause: %s", exc)
|
||||
|
||||
async def _unpause_world(self) -> None:
|
||||
"""Unpause the game world after T3 inference (always called, even on error)."""
|
||||
if self._world is None:
|
||||
return
|
||||
try:
|
||||
from infrastructure.world.types import CommandInput
|
||||
|
||||
await asyncio.to_thread(self._world.act, CommandInput(action="unpause"))
|
||||
logger.debug("MetabolicRouter: world unpaused after T3 inference")
|
||||
except Exception as exc:
|
||||
logger.warning("world.unpause() failed — game may remain paused: %s", exc)
|
||||
|
||||
async def route(
|
||||
self,
|
||||
task: str,
|
||||
state: dict,
|
||||
ui_state: dict | None = None,
|
||||
visual_context: str | None = None,
|
||||
temperature: float = 0.3,
|
||||
max_tokens: int | None = None,
|
||||
) -> dict:
|
||||
"""Route a task to the appropriate model tier and return the LLM response.
|
||||
|
||||
Selects the tier via classify_complexity(), assembles the prompt via
|
||||
build_prompt(), and dispatches to CascadeRouter. For T3, the game
|
||||
world is paused before inference and unpaused after (in a finally block).
|
||||
|
||||
Args:
|
||||
task: Natural-language task description or player input.
|
||||
state: Current game state dict.
|
||||
ui_state: Current UI state dict (optional, defaults to {}).
|
||||
visual_context: Optional screen/scene description from vision model.
|
||||
temperature: Sampling temperature (default 0.3 for game commands).
|
||||
max_tokens: Maximum tokens to generate.
|
||||
|
||||
Returns:
|
||||
Dict with keys: ``content``, ``provider``, ``model``, ``tier``,
|
||||
``latency_ms``, plus any extra keys from CascadeRouter.
|
||||
|
||||
Raises:
|
||||
RuntimeError: If all providers fail (propagated from CascadeRouter).
|
||||
"""
|
||||
ui_state = ui_state or {}
|
||||
tier = self.classify(task, state)
|
||||
model = self._tier_models[tier]
|
||||
messages = build_prompt(state, ui_state, task, visual_context)
|
||||
cascade = self._get_cascade()
|
||||
|
||||
logger.info(
|
||||
"MetabolicRouter: tier=%s model=%s task=%r",
|
||||
tier,
|
||||
model,
|
||||
task[:80],
|
||||
)
|
||||
|
||||
if tier == ModelTier.T3_COMPLEX:
|
||||
await self._pause_world()
|
||||
try:
|
||||
result = await cascade.complete(
|
||||
messages=messages,
|
||||
model=model,
|
||||
temperature=temperature,
|
||||
max_tokens=max_tokens,
|
||||
)
|
||||
finally:
|
||||
await self._unpause_world()
|
||||
else:
|
||||
result = await cascade.complete(
|
||||
messages=messages,
|
||||
model=model,
|
||||
temperature=temperature,
|
||||
max_tokens=max_tokens,
|
||||
)
|
||||
|
||||
result["tier"] = tier
|
||||
return result
|
||||
|
||||
|
||||
# ── Module-level singleton ────────────────────────────────────────────────────
|
||||
_metabolic_router: MetabolicRouter | None = None
|
||||
|
||||
|
||||
def get_metabolic_router() -> MetabolicRouter:
|
||||
"""Get or create the MetabolicRouter singleton."""
|
||||
global _metabolic_router
|
||||
if _metabolic_router is None:
|
||||
_metabolic_router = MetabolicRouter()
|
||||
return _metabolic_router
|
||||
@@ -135,7 +135,9 @@ class BannerlordObserver:
         self._host = host or settings.gabs_host
         self._port = port or settings.gabs_port
         self._timeout = timeout if timeout is not None else settings.gabs_timeout
-        self._poll_interval = poll_interval if poll_interval is not None else settings.gabs_poll_interval
+        self._poll_interval = (
+            poll_interval if poll_interval is not None else settings.gabs_poll_interval
+        )
         self._journal_path = Path(journal_path) if journal_path else _get_journal_path()
         self._entry_count = 0
         self._days_observed: set[str] = set()
@@ -24,6 +24,8 @@ logger = logging.getLogger(__name__)
 
 @dataclass
 class Intent:
+    """A classified user intent with confidence score and extracted entities."""
+
     name: str
     confidence: float  # 0.0 to 1.0
     entities: dict
@@ -17,11 +17,15 @@ logger = logging.getLogger(__name__)
 
 
 class TxType(StrEnum):
+    """Lightning transaction direction type."""
+
     incoming = "incoming"
     outgoing = "outgoing"
 
 
 class TxStatus(StrEnum):
+    """Lightning transaction settlement status."""
+
     pending = "pending"
     settled = "settled"
     failed = "failed"
@@ -21,6 +21,7 @@ from agno.models.ollama import Ollama
 
 from config import settings
 from infrastructure.events.bus import Event, EventBus
+from timmy.agents.emotional_state import EmotionalStateTracker
 
 try:
     from mcp.registry import tool_registry
@@ -42,6 +43,7 @@ class BaseAgent(ABC):
         tools: list[str] | None = None,
         model: str | None = None,
         max_history: int = 10,
+        initial_emotion: str = "calm",
     ) -> None:
         self.agent_id = agent_id
         self.name = name
@@ -54,6 +56,9 @@ class BaseAgent(ABC):
         self.system_prompt = system_prompt
         self.agent = self._create_agent(system_prompt)
 
+        # Emotional state tracker
+        self.emotional_state = EmotionalStateTracker(initial_emotion=initial_emotion)
+
         # Event bus for communication
         self.event_bus: EventBus | None = None
 
@@ -137,7 +142,14 @@ class BaseAgent(ABC):
         ReadTimeout — these are transient and retried with exponential
         backoff (#70).
         """
-        response = await self._run_with_retries(message, max_retries)
+        self.emotional_state.process_event("task_assigned")
+        self._apply_emotional_prompt()
+        try:
+            response = await self._run_with_retries(message, max_retries)
+        except Exception:
+            self.emotional_state.process_event("task_failure")
+            raise
+        self.emotional_state.process_event("task_success")
         await self._emit_response_event(message, response)
         return response
 
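The `interact()` change above wraps the retry call in emotional-event hooks: mark the task assigned, record a failure before re-raising, and record success only when the call returns. A framework-free sketch of that ordering (`run_with_emotions` and the `events` list are hypothetical stand-ins, not part of the codebase):

```python
def run_with_emotions(fn, events: list[str]):
    events.append("task_assigned")   # fires before the work starts
    try:
        result = fn()
    except Exception:
        events.append("task_failure")  # recorded, then the error propagates
        raise
    events.append("task_success")    # only reached on a clean return
    return result

ok_events: list[str] = []
run_with_emotions(lambda: 42, ok_events)
print(ok_events)  # → ['task_assigned', 'task_success']
```

Re-raising after `process_event("task_failure")` is what keeps the hook transparent: callers still see the original exception, with the mood change as a side effect.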
@@ -206,6 +218,14 @@ class BaseAgent(ABC):
             )
         )
 
+    def _apply_emotional_prompt(self) -> None:
+        """Inject the current emotional modifier into the agent's description."""
+        modifier = self.emotional_state.get_prompt_modifier()
+        if modifier:
+            self.agent.description = f"{self.system_prompt}\n\n[Emotional State: {modifier}]"
+        else:
+            self.agent.description = self.system_prompt
+
     def get_capabilities(self) -> list[str]:
         """Get list of capabilities this agent provides."""
         return self.tools
@@ -219,6 +239,7 @@ class BaseAgent(ABC):
             "model": self.model,
             "status": "ready",
             "tools": self.tools,
+            "emotional_profile": self.emotional_state.get_profile(),
         }
 
 
@@ -239,6 +260,7 @@ class SubAgent(BaseAgent):
         tools: list[str] | None = None,
         model: str | None = None,
         max_history: int = 10,
+        initial_emotion: str = "calm",
     ) -> None:
         super().__init__(
             agent_id=agent_id,
@@ -248,6 +270,7 @@ class SubAgent(BaseAgent):
             tools=tools,
             model=model,
             max_history=max_history,
+            initial_emotion=initial_emotion,
         )
 
     async def execute_task(self, task_id: str, description: str, context: dict) -> Any:
222	src/timmy/agents/emotional_state.py	Normal file
@@ -0,0 +1,222 @@
"""Agent emotional state simulation.

Tracks per-agent emotional states that influence narration and decision-making
style. Emotional state is influenced by events (task outcomes, errors, etc.)
and exposed via ``get_profile()`` for the dashboard.

Usage:
    from timmy.agents.emotional_state import EmotionalStateTracker

    tracker = EmotionalStateTracker()
    tracker.process_event("task_success", {"description": "Deployed fix"})
    profile = tracker.get_profile()
"""

import logging
import time
from dataclasses import asdict, dataclass, field

logger = logging.getLogger(__name__)

# ---------------------------------------------------------------------------
# Emotional states
# ---------------------------------------------------------------------------

EMOTIONAL_STATES = (
    "cautious",
    "adventurous",
    "analytical",
    "frustrated",
    "confident",
    "curious",
    "calm",
)

# Prompt modifiers per emotional state — injected into system prompts
EMOTION_PROMPT_MODIFIERS: dict[str, str] = {
    "cautious": (
        "You are feeling cautious. Prefer safe, well-tested approaches. "
        "Flag risks early. Double-check assumptions before acting."
    ),
    "adventurous": (
        "You are feeling adventurous. Be bold and creative in your suggestions. "
        "Explore unconventional solutions. Take initiative."
    ),
    "analytical": (
        "You are feeling analytical. Break problems down methodically. "
        "Rely on data and evidence. Present structured reasoning."
    ),
    "frustrated": (
        "You are feeling frustrated. Be brief and direct. "
        "Focus on unblocking the immediate problem. Avoid tangents."
    ),
    "confident": (
        "You are feeling confident. Speak with authority. "
        "Make clear recommendations. Move decisively."
    ),
    "curious": (
        "You are feeling curious. Ask clarifying questions. "
        "Explore multiple angles. Show genuine interest in the problem."
    ),
    "calm": (
        "You are feeling calm and steady. Respond thoughtfully. "
        "Maintain composure. Prioritise clarity over speed."
    ),
}


# ---------------------------------------------------------------------------
# Event → emotion transition rules
# ---------------------------------------------------------------------------

# Maps event types to the emotional state they trigger and an intensity (0-1).
# Higher intensity means the event has a stronger effect on the mood.
EVENT_TRANSITIONS: dict[str, tuple[str, float]] = {
    "task_success": ("confident", 0.6),
    "task_failure": ("frustrated", 0.7),
    "task_assigned": ("analytical", 0.4),
    "error": ("cautious", 0.6),
    "health_low": ("cautious", 0.8),
    "health_recovered": ("calm", 0.5),
    "quest_completed": ("adventurous", 0.7),
    "new_discovery": ("curious", 0.6),
    "complex_problem": ("analytical", 0.5),
    "repeated_failure": ("frustrated", 0.9),
    "idle": ("calm", 0.3),
    "user_praise": ("confident", 0.5),
    "user_correction": ("cautious", 0.5),
}

# Emotional state decay — how quickly emotions return to calm (seconds)
_DECAY_INTERVAL = 300  # 5 minutes


@dataclass
class EmotionalState:
    """Snapshot of an agent's emotional state."""

    current_emotion: str = "calm"
    intensity: float = 0.5  # 0.0 (barely noticeable) to 1.0 (overwhelming)
    previous_emotion: str = "calm"
    trigger_event: str = ""  # What caused the current emotion
    updated_at: float = field(default_factory=time.time)

    def to_dict(self) -> dict:
        """Serialise for API / dashboard consumption."""
        d = asdict(self)
        d["emotion_label"] = self.current_emotion.replace("_", " ").title()
        return d


class EmotionalStateTracker:
    """Per-agent emotional state tracker.

    Each agent instance owns one tracker. The tracker processes events,
    applies transition rules, and decays emotion intensity over time.
    """

    def __init__(self, initial_emotion: str = "calm") -> None:
        if initial_emotion not in EMOTIONAL_STATES:
            initial_emotion = "calm"
        self.state = EmotionalState(current_emotion=initial_emotion)

    def process_event(self, event_type: str, context: dict | None = None) -> EmotionalState:
        """Update emotional state based on an event.

        Args:
            event_type: One of the keys in EVENT_TRANSITIONS, or a custom
                event type (unknown events are ignored).
            context: Optional dict with event details (for logging).

        Returns:
            The updated EmotionalState.
        """
        transition = EVENT_TRANSITIONS.get(event_type)
        if transition is None:
            logger.debug("Unknown emotional event: %s (ignored)", event_type)
            return self.state

        new_emotion, raw_intensity = transition

        # Blend with current intensity — repeated same-emotion events amplify
        if new_emotion == self.state.current_emotion:
            blended = min(1.0, self.state.intensity + raw_intensity * 0.3)
        else:
            blended = raw_intensity

        self.state.previous_emotion = self.state.current_emotion
        self.state.current_emotion = new_emotion
        self.state.intensity = round(blended, 2)
        self.state.trigger_event = event_type
        self.state.updated_at = time.time()

        logger.debug(
            "Emotional transition: %s → %s (intensity=%.2f, trigger=%s)",
            self.state.previous_emotion,
            new_emotion,
            blended,
            event_type,
        )
        return self.state

    def decay(self) -> EmotionalState:
        """Apply time-based decay toward calm.

        Called periodically (e.g. from a background loop). If enough time
        has passed since the last update, intensity decreases and eventually
        the emotion resets to calm.
        """
        elapsed = time.time() - self.state.updated_at
        if elapsed < _DECAY_INTERVAL:
            return self.state

        # Reduce intensity by 0.1 per decay interval
        decay_steps = int(elapsed / _DECAY_INTERVAL)
        new_intensity = max(0.0, self.state.intensity - 0.1 * decay_steps)

        if new_intensity <= 0.1:
            # Emotion has decayed — return to calm
            self.state.previous_emotion = self.state.current_emotion
            self.state.current_emotion = "calm"
            self.state.intensity = 0.5
            self.state.trigger_event = "decay"
        else:
            self.state.intensity = round(new_intensity, 2)

        self.state.updated_at = time.time()
        return self.state

    def get_profile(self) -> dict:
        """Return the full emotional profile for dashboard display."""
        self.decay()  # Apply any pending decay
        return {
            "current_emotion": self.state.current_emotion,
            "emotion_label": self.state.current_emotion.replace("_", " ").title(),
            "intensity": self.state.intensity,
            "intensity_label": _intensity_label(self.state.intensity),
            "previous_emotion": self.state.previous_emotion,
            "trigger_event": self.state.trigger_event,
            "prompt_modifier": EMOTION_PROMPT_MODIFIERS.get(self.state.current_emotion, ""),
        }

    def get_prompt_modifier(self) -> str:
        """Return the prompt modifier string for the current emotion."""
        self.decay()
        return EMOTION_PROMPT_MODIFIERS.get(self.state.current_emotion, "")

    def reset(self) -> None:
        """Reset to calm baseline."""
        self.state = EmotionalState()


def _intensity_label(intensity: float) -> str:
    """Human-readable label for intensity value."""
    if intensity >= 0.8:
        return "overwhelming"
    if intensity >= 0.6:
        return "strong"
    if intensity >= 0.4:
        return "moderate"
    if intensity >= 0.2:
        return "mild"
    return "faint"
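The blending rule in `process_event()` can be exercised in isolation. This standalone sketch mirrors the amplification math (a repeat of the current emotion adds 30% of the raw intensity, capped at 1.0 and rounded to two places; a different emotion resets intensity to the raw value); `blend` is a hypothetical helper, not the module's real API:

```python
def blend(current_emotion: str, current_intensity: float,
          new_emotion: str, raw_intensity: float) -> float:
    # Repeated same-emotion events amplify; a new emotion resets to the raw value.
    if new_emotion == current_emotion:
        return round(min(1.0, current_intensity + raw_intensity * 0.3), 2)
    return round(raw_intensity, 2)

# "task_failure" maps to ("frustrated", 0.7) in EVENT_TRANSITIONS.
first = blend("calm", 0.5, "frustrated", 0.7)           # new emotion → 0.7
second = blend("frustrated", first, "frustrated", 0.7)  # repeat → 0.7 + 0.21
print(first, second)  # → 0.7 0.91
```

The 0.3 damping factor keeps a single repeat from saturating the mood: it takes several consecutive failures for intensity to pin at 1.0.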
@@ -119,6 +119,8 @@ def load_agents(force_reload: bool = False) -> dict[str, Any]:
         max_history = agent_cfg.get("max_history", defaults.get("max_history", 10))
         tools = agent_cfg.get("tools", defaults.get("tools", []))
 
+        initial_emotion = agent_cfg.get("initial_emotion", "calm")
+
         agent = SubAgent(
             agent_id=agent_id,
             name=agent_cfg.get("name", agent_id.title()),
@@ -127,6 +129,7 @@ def load_agents(force_reload: bool = False) -> dict[str, Any]:
             tools=tools,
             model=model,
             max_history=max_history,
+            initial_emotion=initial_emotion,
         )
 
         _agents[agent_id] = agent
@@ -36,6 +36,8 @@ _EXPIRY_DAYS = 7
 
 @dataclass
 class ApprovalItem:
+    """A proposed autonomous action requiring owner approval."""
+
     id: str
     title: str
     description: str
@@ -36,7 +36,7 @@ import asyncio
 import logging
 import re
 from dataclasses import dataclass, field
-from datetime import UTC, datetime, timedelta
+from datetime import UTC, datetime
 from typing import Any
 
 import httpx
@@ -70,7 +70,9 @@ _LOOP_TAG = "loop-generated"
 
 # Regex patterns for scoring
 _TAG_RE = re.compile(r"\[([^\]]+)\]")
-_FILE_RE = re.compile(r"(?:src/|tests/|scripts/|\.py|\.html|\.js|\.yaml|\.toml|\.sh)", re.IGNORECASE)
+_FILE_RE = re.compile(
+    r"(?:src/|tests/|scripts/|\.py|\.html|\.js|\.yaml|\.toml|\.sh)", re.IGNORECASE
+)
 _FUNC_RE = re.compile(r"(?:def |class |function |method |`\w+\(\)`)", re.IGNORECASE)
 _ACCEPT_RE = re.compile(
     r"(?:should|must|expect|verify|assert|test.?case|acceptance|criteria"
@@ -451,9 +453,7 @@ async def add_label(
 
         # Apply to the issue
         apply_url = _repo_url(f"issues/{issue_number}/labels")
-        apply_resp = await client.post(
-            apply_url, headers=headers, json={"labels": [label_id]}
-        )
+        apply_resp = await client.post(apply_url, headers=headers, json={"labels": [label_id]})
         return apply_resp.status_code in (200, 201)
 
     except (httpx.ConnectError, httpx.ReadError, httpx.TimeoutException) as exc:
@@ -692,7 +692,9 @@ class BacklogTriageLoop:
             # 1. Fetch
             raw_issues = await fetch_open_issues(client)
             result.total_open = len(raw_issues)
-            logger.info("Triage cycle #%d: fetched %d open issues", self._cycle_count, len(raw_issues))
+            logger.info(
+                "Triage cycle #%d: fetched %d open issues", self._cycle_count, len(raw_issues)
+            )
 
             # 2. Score
             scored = [score_issue(i) for i in raw_issues]
@@ -46,6 +46,8 @@ class ApprovalItem:
 
 @dataclass
 class Briefing:
+    """A generated morning briefing summarizing recent activity and pending approvals."""
+
     generated_at: datetime
     summary: str  # 150-300 words
     approval_items: list[ApprovalItem] = field(default_factory=list)
@@ -37,7 +37,7 @@ from __future__ import annotations
 import asyncio
 import logging
 from dataclasses import dataclass, field
-from enum import Enum
+from enum import StrEnum
 from typing import Any
 
 from config import settings
@@ -48,7 +48,8 @@ logger = logging.getLogger(__name__)
 # Enumerations
 # ---------------------------------------------------------------------------
 
-class AgentType(str, Enum):
+
+class AgentType(StrEnum):
     """Known agents in the swarm."""
 
     CLAUDE_CODE = "claude_code"
@@ -57,7 +58,7 @@ class AgentType(str, Enum):
     TIMMY = "timmy"
 
 
-class TaskType(str, Enum):
+class TaskType(StrEnum):
     """Categories of engineering work."""
 
     # Claude Code strengths
@@ -83,7 +84,7 @@ class TaskType(str, Enum):
     ORCHESTRATION = "orchestration"
 
 
-class DispatchStatus(str, Enum):
+class DispatchStatus(StrEnum):
     """Lifecycle state of a dispatched task."""
 
     PENDING = "pending"
@@ -99,6 +100,7 @@ class DispatchStatus(str, Enum):
 # Agent registry
 # ---------------------------------------------------------------------------
 
+
 @dataclass
 class AgentSpec:
     """Capabilities and limits for a single agent."""
@@ -106,9 +108,9 @@ class AgentSpec:
     name: AgentType
     display_name: str
     strengths: frozenset[TaskType]
-    gitea_label: str | None # label to apply when dispatching
+    gitea_label: str | None  # label to apply when dispatching
     max_concurrent: int = 1
-    interface: str = "gitea" # "gitea" | "api" | "local"
+    interface: str = "gitea"  # "gitea" | "api" | "local"
     api_endpoint: str | None = None  # for interface="api"
 
 
@@ -197,6 +199,7 @@ _TASK_ROUTING: dict[TaskType, AgentType] = {
 # Dispatch result
 # ---------------------------------------------------------------------------
 
+
 @dataclass
 class DispatchResult:
     """Outcome of a dispatch call."""
@@ -220,6 +223,7 @@ class DispatchResult:
 # Routing logic
 # ---------------------------------------------------------------------------
 
+
 def select_agent(task_type: TaskType) -> AgentType:
     """Return the best agent for *task_type* based on the routing table.
 
@@ -248,11 +252,23 @@ def infer_task_type(title: str, description: str = "") -> TaskType:
     text = (title + " " + description).lower()
 
     _SIGNALS: list[tuple[TaskType, frozenset[str]]] = [
-        (TaskType.ARCHITECTURE, frozenset({"architect", "design", "adr", "system design", "schema"})),
-        (TaskType.REFACTORING, frozenset({"refactor", "clean up", "cleanup", "reorganise", "reorganize"})),
+        (
+            TaskType.ARCHITECTURE,
+            frozenset({"architect", "design", "adr", "system design", "schema"}),
+        ),
+        (
+            TaskType.REFACTORING,
+            frozenset({"refactor", "clean up", "cleanup", "reorganise", "reorganize"}),
+        ),
         (TaskType.CODE_REVIEW, frozenset({"review", "pr review", "pull request review", "audit"})),
-        (TaskType.COMPLEX_REASONING, frozenset({"complex", "hard problem", "debug", "investigate", "diagnose"})),
-        (TaskType.RESEARCH, frozenset({"research", "survey", "literature", "benchmark", "analyse", "analyze"})),
+        (
+            TaskType.COMPLEX_REASONING,
+            frozenset({"complex", "hard problem", "debug", "investigate", "diagnose"}),
+        ),
+        (
+            TaskType.RESEARCH,
+            frozenset({"research", "survey", "literature", "benchmark", "analyse", "analyze"}),
+        ),
         (TaskType.ANALYSIS, frozenset({"analysis", "profil", "trace", "metric", "performance"})),
         (TaskType.TRIAGE, frozenset({"triage", "classify", "prioritise", "prioritize"})),
         (TaskType.PLANNING, frozenset({"plan", "roadmap", "milestone", "epic", "spike"})),
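`infer_task_type()` is first-match keyword routing over the signal table above: the first signal set whose keyword appears in the lowercased text wins, so the list order encodes priority. A hypothetical mini version (shortened signal sets; the `"routine_coding"` fallback is assumed here, not confirmed by this excerpt):

```python
# Ordered (task_type, keywords) pairs — earlier entries take priority.
SIGNALS = [
    ("architecture", {"architect", "design", "adr", "system design", "schema"}),
    ("refactoring", {"refactor", "clean up", "cleanup"}),
    ("research", {"research", "survey", "benchmark"}),
]

def infer(title: str, description: str = "") -> str:
    text = (title + " " + description).lower()
    for task_type, keywords in SIGNALS:
        if any(k in text for k in keywords):
            return task_type
    return "routine_coding"  # assumed default when nothing matches

print(infer("Refactor the dispatcher module"))  # → refactoring
print(infer("Benchmark local models"))          # → research
```

Because matching is plain substring containment, short keywords like `"adr"` can fire inside longer words; the real table mitigates this by ordering more specific signal sets first.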
@@ -273,6 +289,7 @@ def infer_task_type(title: str, description: str = "") -> TaskType:
|
||||
# Gitea helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
async def _post_gitea_comment(
|
||||
client: Any,
|
||||
base_url: str,
|
||||
@@ -405,6 +422,50 @@ async def _poll_issue_completion(
|
||||
# Core dispatch functions
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _format_assignment_comment(
|
||||
display_name: str,
|
||||
task_type: TaskType,
|
||||
description: str,
|
||||
acceptance_criteria: list[str],
|
||||
) -> str:
|
||||
"""Build the markdown comment body for a task assignment.
|
||||
|
||||
Args:
|
||||
display_name: Human-readable agent name.
|
||||
task_type: The inferred task type.
|
||||
description: Task description.
|
||||
acceptance_criteria: List of acceptance criteria strings.
|
||||
|
||||
Returns:
|
||||
Formatted markdown string for the comment.
|
||||
"""
|
||||
criteria_md = (
|
||||
"\n".join(f"- {c}" for c in acceptance_criteria)
|
||||
if acceptance_criteria
|
||||
else "_None specified_"
|
||||
)
|
||||
return (
|
||||
f"## Assigned to {display_name}\n\n"
|
||||
f"**Task type:** `{task_type.value}`\n\n"
|
||||
f"**Description:**\n{description}\n\n"
|
||||
f"**Acceptance criteria:**\n{criteria_md}\n\n"
|
||||
f"---\n*Dispatched by Timmy agent dispatcher.*"
|
||||
)
|
||||
|
||||
|
||||
def _select_label(agent: AgentType) -> str | None:
|
||||
"""Return the Gitea label for an agent based on its spec.
|
||||
|
||||
Args:
|
||||
agent: The target agent.
|
||||
|
||||
Returns:
|
||||
Label name or None if the agent has no label.
|
||||
"""
|
||||
return AGENT_REGISTRY[agent].gitea_label
|
||||
|
||||
|
||||
async def _dispatch_via_gitea(
|
||||
agent: AgentType,
|
||||
issue_number: int,
|
||||
@@ -459,33 +520,27 @@ async def _dispatch_via_gitea(
|
||||
|
||||
async with httpx.AsyncClient(timeout=15) as client:
|
||||
# 1. Apply agent label (if applicable)
|
||||
if spec.gitea_label:
|
||||
ok = await _apply_gitea_label(
|
||||
client, base_url, repo, headers, issue_number, spec.gitea_label
|
||||
)
|
||||
label = _select_label(agent)
|
||||
if label:
|
||||
ok = await _apply_gitea_label(client, base_url, repo, headers, issue_number, label)
|
||||
if ok:
|
||||
label_applied = spec.gitea_label
|
||||
label_applied = label
|
||||
logger.info(
|
||||
"Applied label %r to issue #%s for %s",
|
||||
spec.gitea_label,
|
||||
label,
|
||||
issue_number,
|
||||
spec.display_name,
|
||||
)
|
||||
else:
|
||||
logger.warning(
|
||||
"Could not apply label %r to issue #%s",
|
||||
spec.gitea_label,
|
||||
label,
|
||||
issue_number,
|
||||
)
|
||||
|
||||
# 2. Post assignment comment
|
||||
criteria_md = "\n".join(f"- {c}" for c in acceptance_criteria) if acceptance_criteria else "_None specified_"
|
||||
comment_body = (
|
||||
f"## Assigned to {spec.display_name}\n\n"
|
||||
f"**Task type:** `{task_type.value}`\n\n"
|
||||
f"**Description:**\n{description}\n\n"
|
||||
f"**Acceptance criteria:**\n{criteria_md}\n\n"
|
||||
f"---\n*Dispatched by Timmy agent dispatcher.*"
|
||||
comment_body = _format_assignment_comment(
|
||||
spec.display_name, task_type, description, acceptance_criteria
|
||||
)
|
||||
comment_id = await _post_gitea_comment(
|
||||
client, base_url, repo, headers, issue_number, comment_body
|
||||
@@ -616,9 +671,7 @@ async def _dispatch_local(
|
||||
assumed to succeed at dispatch time).
|
||||
"""
|
||||
task_type = infer_task_type(title, description)
|
||||
logger.info(
|
||||
"Timmy handling task locally: %r (issue #%s)", title[:60], issue_number
|
||||
)
|
||||
logger.info("Timmy handling task locally: %r (issue #%s)", title[:60], issue_number)
|
||||
return DispatchResult(
|
||||
task_type=task_type,
|
||||
agent=AgentType.TIMMY,
|
||||
@@ -632,6 +685,81 @@ async def _dispatch_local(
|
||||
# Public entry point
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _validate_task(
|
||||
title: str,
|
||||
task_type: TaskType | None,
|
||||
agent: AgentType | None,
|
||||
issue_number: int | None,
|
||||
) -> DispatchResult | None:
|
||||
"""Validate task preconditions.
|
||||
|
||||
Args:
|
||||
title: Task title to validate.
|
||||
task_type: Optional task type for result construction.
|
||||
agent: Optional agent for result construction.
|
||||
issue_number: Optional issue number for result construction.
|
||||
|
||||
Returns:
|
||||
A failed DispatchResult if validation fails, None otherwise.
|
||||
"""
|
||||
if not title.strip():
|
||||
return DispatchResult(
|
||||
task_type=task_type or TaskType.ROUTINE_CODING,
|
||||
agent=agent or AgentType.TIMMY,
|
||||
issue_number=issue_number,
|
||||
status=DispatchStatus.FAILED,
|
||||
error="`title` is required.",
|
||||
)
|
||||
return None
|
||||
|
||||
|
||||
def _select_dispatch_strategy(agent: AgentType, issue_number: int | None) -> str:
|
||||
"""Select the dispatch strategy based on agent interface and context.
|
||||
|
||||
Args:
|
||||
agent: The target agent.
|
||||
issue_number: Optional Gitea issue number.
|
||||
|
||||
Returns:
|
||||
Strategy name: "gitea", "api", or "local".
|
||||
"""
|
||||
spec = AGENT_REGISTRY[agent]
|
||||
if spec.interface == "gitea" and issue_number is not None:
|
||||
return "gitea"
|
||||
if spec.interface == "api":
|
||||
return "api"
|
||||
return "local"
|
||||
|
||||
|
||||
def _log_dispatch_result(
|
||||
title: str,
|
||||
result: DispatchResult,
|
||||
attempt: int,
|
||||
max_retries: int,
|
||||
) -> None:
|
||||
"""Log the outcome of a dispatch attempt.
|
||||
|
||||
Args:
|
||||
title: Task title for logging context.
|
||||
result: The dispatch result.
|
||||
attempt: Current attempt number (0-indexed).
|
||||
max_retries: Maximum retry attempts allowed.
|
||||
"""
|
||||
if result.success:
|
||||
return
|
||||
|
||||
if attempt > 0:
|
||||
logger.info("Retry %d/%d for task %r", attempt, max_retries, title[:60])
|
||||
|
||||
logger.warning(
|
||||
"Dispatch attempt %d failed for task %r: %s",
|
||||
attempt + 1,
|
||||
title[:60],
|
||||
result.error,
|
||||
)
|
||||
|
||||
|
||||
async def dispatch_task(
    title: str,
    description: str = "",

@@ -672,17 +800,13 @@ async def dispatch_task(
        if result.success:
            print(f"Assigned to {result.agent.value}")
    """
    # 1. Validate
    validation_error = _validate_task(title, task_type, agent, issue_number)
    if validation_error:
        return validation_error

    # 2. Resolve task type and agent
    criteria = acceptance_criteria or []

    if not title.strip():
        return DispatchResult(
            task_type=task_type or TaskType.ROUTINE_CODING,
            agent=agent or AgentType.TIMMY,
            issue_number=issue_number,
            status=DispatchStatus.FAILED,
            error="`title` is required.",
        )

    resolved_type = task_type or infer_task_type(title, description)
    resolved_agent = agent or select_agent(resolved_type)

@@ -694,18 +818,16 @@ async def dispatch_task(
        issue_number,
    )

    spec = AGENT_REGISTRY[resolved_agent]

    # 3. Select strategy and dispatch with retries
    strategy = _select_dispatch_strategy(resolved_agent, issue_number)
    last_result: DispatchResult | None = None
    for attempt in range(max_retries + 1):
        if attempt > 0:
            logger.info("Retry %d/%d for task %r", attempt, max_retries, title[:60])

        if spec.interface == "gitea" and issue_number is not None:
    for attempt in range(max_retries + 1):
        if strategy == "gitea":
            result = await _dispatch_via_gitea(
                resolved_agent, issue_number, title, description, criteria
            )
        elif spec.interface == "api":
        elif strategy == "api":
            result = await _dispatch_via_api(
                resolved_agent, title, description, criteria, issue_number, api_endpoint
            )
@@ -718,14 +840,9 @@ async def dispatch_task(
        if result.success:
            return result

        logger.warning(
            "Dispatch attempt %d failed for task %r: %s",
            attempt + 1,
            title[:60],
            result.error,
        )
        _log_dispatch_result(title, result, attempt, max_retries)

    # All attempts exhausted — escalate
    # 4. All attempts exhausted — escalate
    assert last_result is not None
    last_result.status = DispatchStatus.ESCALATED
    logger.error(
@@ -769,9 +886,7 @@ async def _log_escalation(
            f"---\n*Timmy agent dispatcher.*"
        )
        async with httpx.AsyncClient(timeout=10) as client:
            await _post_gitea_comment(
                client, base_url, repo, headers, issue_number, body
            )
            await _post_gitea_comment(client, base_url, repo, headers, issue_number, body)
    except Exception as exc:
        logger.warning("Failed to post escalation comment: %s", exc)

@@ -780,6 +895,7 @@ async def _log_escalation(
# Monitoring helper
# ---------------------------------------------------------------------------


async def wait_for_completion(
    issue_number: int,
    poll_interval: int = 60,

@@ -142,18 +142,8 @@ def _build_shell_tool() -> MCPToolDef | None:
    return None


def _build_gitea_tools() -> list[MCPToolDef]:
    """Build Gitea MCP tool definitions for direct Ollama bridge use.

    These tools call the Gitea REST API directly via httpx rather than
    spawning an MCP server subprocess, keeping the bridge lightweight.
    """
    if not settings.gitea_enabled or not settings.gitea_token:
        return []

    base_url = settings.gitea_url
    token = settings.gitea_token
    owner, repo = settings.gitea_repo.split("/", 1)
def _build_list_issues_tool(base_url: str, token: str, owner: str, repo: str) -> MCPToolDef:
    """Build the list_issues tool for a specific Gitea repo."""

    async def _list_issues(**kwargs: Any) -> str:
        state = kwargs.get("state", "open")
@@ -178,6 +168,30 @@ def _build_gitea_tools() -> list[MCPToolDef]:
        except Exception as exc:
            return f"Error listing issues: {exc}"

    return MCPToolDef(
        name="list_issues",
        description="List issues in the Gitea repository. Returns issue numbers and titles.",
        parameters={
            "type": "object",
            "properties": {
                "state": {
                    "type": "string",
                    "description": "Filter by state: open, closed, or all (default: open)",
                },
                "limit": {
                    "type": "integer",
                    "description": "Maximum number of issues to return (default: 10)",
                },
            },
            "required": [],
        },
        handler=_list_issues,
    )

def _build_create_issue_tool(base_url: str, token: str, owner: str, repo: str) -> MCPToolDef:
    """Build the create_issue tool for a specific Gitea repo."""

    async def _create_issue(**kwargs: Any) -> str:
        title = kwargs.get("title", "")
        body = kwargs.get("body", "")
@@ -199,6 +213,30 @@ def _build_gitea_tools() -> list[MCPToolDef]:
        except Exception as exc:
            return f"Error creating issue: {exc}"

    return MCPToolDef(
        name="create_issue",
        description="Create a new issue in the Gitea repository.",
        parameters={
            "type": "object",
            "properties": {
                "title": {
                    "type": "string",
                    "description": "Issue title (required)",
                },
                "body": {
                    "type": "string",
                    "description": "Issue body in markdown (optional)",
                },
            },
            "required": ["title"],
        },
        handler=_create_issue,
    )

def _build_read_issue_tool(base_url: str, token: str, owner: str, repo: str) -> MCPToolDef:
    """Build the read_issue tool for a specific Gitea repo."""

    async def _read_issue(**kwargs: Any) -> str:
        number = kwargs.get("number")
        if not number:
@@ -224,60 +262,40 @@ def _build_gitea_tools() -> list[MCPToolDef]:
        except Exception as exc:
            return f"Error reading issue: {exc}"

    return MCPToolDef(
        name="read_issue",
        description="Read details of a specific issue by number.",
        parameters={
            "type": "object",
            "properties": {
                "number": {
                    "type": "integer",
                    "description": "Issue number to read",
                },
            },
            "required": ["number"],
        },
        handler=_read_issue,
    )


def _build_gitea_tools() -> list[MCPToolDef]:
    """Build Gitea MCP tool definitions for direct Ollama bridge use.

    These tools call the Gitea REST API directly via httpx rather than
    spawning an MCP server subprocess, keeping the bridge lightweight.
    """
    if not settings.gitea_enabled or not settings.gitea_token:
        return []

    base_url = settings.gitea_url
    token = settings.gitea_token
    owner, repo = settings.gitea_repo.split("/", 1)

    return [
        MCPToolDef(
            name="list_issues",
            description="List issues in the Gitea repository. Returns issue numbers and titles.",
            parameters={
                "type": "object",
                "properties": {
                    "state": {
                        "type": "string",
                        "description": "Filter by state: open, closed, or all (default: open)",
                    },
                    "limit": {
                        "type": "integer",
                        "description": "Maximum number of issues to return (default: 10)",
                    },
                },
                "required": [],
            },
            handler=_list_issues,
        ),
        MCPToolDef(
            name="create_issue",
            description="Create a new issue in the Gitea repository.",
            parameters={
                "type": "object",
                "properties": {
                    "title": {
                        "type": "string",
                        "description": "Issue title (required)",
                    },
                    "body": {
                        "type": "string",
                        "description": "Issue body in markdown (optional)",
                    },
                },
                "required": ["title"],
            },
            handler=_create_issue,
        ),
        MCPToolDef(
            name="read_issue",
            description="Read details of a specific issue by number.",
            parameters={
                "type": "object",
                "properties": {
                    "number": {
                        "type": "integer",
                        "description": "Issue number to read",
                    },
                },
                "required": ["number"],
            },
            handler=_read_issue,
        ),
        _build_list_issues_tool(base_url, token, owner, repo),
        _build_create_issue_tool(base_url, token, owner, repo),
        _build_read_issue_tool(base_url, token, owner, repo),
    ]

@@ -399,6 +417,72 @@ class MCPBridge:
            logger.warning("Tool '%s' execution failed: %s", name, exc)
            return f"Error executing {name}: {exc}"

    @staticmethod
    def _build_initial_messages(prompt: str, system_prompt: str | None) -> list[dict]:
        """Build the initial message list for a run."""
        messages: list[dict] = []
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        messages.append({"role": "user", "content": prompt})
        return messages

    async def _process_round_tool_calls(
        self,
        messages: list[dict],
        model_tool_calls: list[dict],
        rounds: int,
        tool_calls_made: list[dict],
    ) -> None:
        """Execute all tool calls in one round, appending results to messages."""
        for tc in model_tool_calls:
            func = tc.get("function", {})
            tool_name = func.get("name", "unknown")
            tool_args = func.get("arguments", {})
            logger.info(
                "Bridge tool call [round %d]: %s(%s)",
                rounds,
                tool_name,
                tool_args,
            )
            result = await self._execute_tool_call(tc)
            tool_calls_made.append(
                {
                    "round": rounds,
                    "tool": tool_name,
                    "arguments": tool_args,
                    "result": result[:500],  # Truncate for logging
                }
            )
            messages.append({"role": "tool", "content": result})

    async def _run_tool_loop(
        self, messages: list[dict], tools: list[dict]
    ) -> tuple[str, list[dict], int, str]:
        """Run the tool-call loop until final response or max rounds reached.

        Returns:
            Tuple of (content, tool_calls_made, rounds, error).
        """
        tool_calls_made: list[dict] = []
        rounds = 0

        for round_num in range(self.max_rounds):
            rounds = round_num + 1
            response = await self._chat(messages, tools)
            msg = response.get("message", {})
            model_tool_calls = msg.get("tool_calls", [])

            if not model_tool_calls:
                return msg.get("content", ""), tool_calls_made, rounds, ""

            messages.append(msg)
            await self._process_round_tool_calls(
                messages, model_tool_calls, rounds, tool_calls_made
            )

        error = f"Exceeded maximum of {self.max_rounds} tool-call rounds"
        return "(max tool-call rounds reached)", tool_calls_made, rounds, error

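The shape of the extracted tool loop is easiest to see with the model call stubbed out. A minimal sketch, with a hypothetical `fake_chat` standing in for the Ollama `_chat` call and tool execution replaced by a string echo:

```python
import asyncio


async def fake_chat(messages, tools):
    """Stand-in for the model: one round of tool calls, then a final text answer."""
    if not any(m.get("role") == "tool" for m in messages):
        return {"message": {"role": "assistant",
                            "tool_calls": [{"function": {"name": "echo", "arguments": {"x": 1}}}]}}
    return {"message": {"role": "assistant", "content": "done"}}


async def run_tool_loop(messages, tools, max_rounds=3):
    """Chat, execute any tool calls, feed results back, repeat until plain text."""
    calls_made = []
    rounds = 0
    for round_num in range(max_rounds):
        rounds = round_num + 1
        msg = (await fake_chat(messages, tools))["message"]
        tool_calls = msg.get("tool_calls", [])
        if not tool_calls:
            # Final text response ends the loop.
            return msg.get("content", ""), calls_made, rounds, ""
        messages.append(msg)  # keep the assistant turn in history
        for tc in tool_calls:
            func = tc["function"]
            result = f"echo:{func['arguments']}"  # pretend tool execution
            calls_made.append({"round": rounds, "tool": func["name"]})
            messages.append({"role": "tool", "content": result})
    return "(max tool-call rounds reached)", calls_made, rounds, (
        f"Exceeded maximum of {max_rounds} tool-call rounds"
    )


content, calls, rounds, error = asyncio.run(
    run_tool_loop([{"role": "user", "content": "hi"}], [])
)
print(content, len(calls), rounds, error)
```

The key invariant, same as in the refactored code: the assistant message carrying `tool_calls` is appended to history before the tool results, so the model sees its own calls on the next round.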
    async def run(
        self,
        prompt: str,
@@ -419,115 +503,35 @@ class MCPBridge:
            BridgeResult with the final response and tool call history.
        """
        start = time.time()
        messages: list[dict] = []

        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})

        messages.append({"role": "user", "content": prompt})

        messages = self._build_initial_messages(prompt, system_prompt)
        tools = self._build_ollama_tools()
        tool_calls_made: list[dict] = []
        rounds = 0
        error_msg = ""

        try:
            for round_num in range(self.max_rounds):
                rounds = round_num + 1
                response = await self._chat(messages, tools)
                msg = response.get("message", {})

                # Check if model made tool calls
                model_tool_calls = msg.get("tool_calls", [])
                if not model_tool_calls:
                    # Final text response — done.
                    content = msg.get("content", "")
                    latency = (time.time() - start) * 1000
                    return BridgeResult(
                        content=content,
                        tool_calls_made=tool_calls_made,
                        rounds=rounds,
                        latency_ms=latency,
                        model=self.model,
                    )

                # Append the assistant message (with tool_calls) to history
                messages.append(msg)

                # Execute each tool call and add results
                for tc in model_tool_calls:
                    func = tc.get("function", {})
                    tool_name = func.get("name", "unknown")
                    tool_args = func.get("arguments", {})

                    logger.info(
                        "Bridge tool call [round %d]: %s(%s)",
                        rounds,
                        tool_name,
                        tool_args,
                    )

                    result = await self._execute_tool_call(tc)
                    tool_calls_made.append(
                        {
                            "round": rounds,
                            "tool": tool_name,
                            "arguments": tool_args,
                            "result": result[:500],  # Truncate for logging
                        }
                    )

                    # Add tool result to message history
                    messages.append(
                        {
                            "role": "tool",
                            "content": result,
                        }
                    )

            # Hit max rounds
            latency = (time.time() - start) * 1000
            return BridgeResult(
                content="(max tool-call rounds reached)",
                tool_calls_made=tool_calls_made,
                rounds=rounds,
                latency_ms=latency,
                model=self.model,
                error=f"Exceeded maximum of {self.max_rounds} tool-call rounds",
            )

            content, tool_calls_made, rounds, error_msg = await self._run_tool_loop(messages, tools)
        except httpx.ConnectError as exc:
            latency = (time.time() - start) * 1000
            logger.warning("Ollama connection failed: %s", exc)
            return BridgeResult(
                content="",
                tool_calls_made=tool_calls_made,
                rounds=rounds,
                latency_ms=latency,
                model=self.model,
                error=f"Ollama connection failed: {exc}",
            )
            error_msg = f"Ollama connection failed: {exc}"
            content = ""
        except httpx.HTTPStatusError as exc:
            latency = (time.time() - start) * 1000
            logger.warning("Ollama HTTP error: %s", exc)
            return BridgeResult(
                content="",
                tool_calls_made=tool_calls_made,
                rounds=rounds,
                latency_ms=latency,
                model=self.model,
                error=f"Ollama HTTP error: {exc.response.status_code}",
            )
            error_msg = f"Ollama HTTP error: {exc.response.status_code}"
            content = ""
        except Exception as exc:
            latency = (time.time() - start) * 1000
            logger.error("MCPBridge run failed: %s", exc)
            return BridgeResult(
                content="",
                tool_calls_made=tool_calls_made,
                rounds=rounds,
                latency_ms=latency,
                model=self.model,
                error=str(exc),
            )
            error_msg = str(exc)
            content = ""

        return BridgeResult(
            content=content,
            tool_calls_made=tool_calls_made,
            rounds=rounds,
            latency_ms=(time.time() - start) * 1000,
            model=self.model,
            error=error_msg,
        )

    def status(self) -> dict:
        """Return bridge status for the dashboard."""

@@ -13,8 +13,8 @@ from dataclasses import dataclass
import httpx

from config import settings
from timmy.research_tools import get_llm_client, google_web_search
from timmy.research_triage import triage_research_report
from timmy.research_tools import google_web_search, get_llm_client

logger = logging.getLogger(__name__)

@@ -52,10 +52,7 @@ class PaperclipClient:
        )
        resp.raise_for_status()
        tasks = resp.json()
        return [
            PaperclipTask(id=t["id"], kind=t["kind"], context=t["context"])
            for t in tasks
        ]
        return [PaperclipTask(id=t["id"], kind=t["kind"], context=t["context"]) for t in tasks]

    async def update_task_status(
        self, task_id: str, status: str, result: str | None = None
@@ -98,7 +95,7 @@ class ResearchOrchestrator:
    async def run_research_pipeline(self, issue_title: str) -> str:
        """Run the research pipeline."""
        search_results = await google_web_search(issue_title)

        llm_client = get_llm_client()
        response = await llm_client.completion(
            f"Summarize the following search results and generate a research report:\n\n{search_results}",
@@ -123,7 +120,9 @@ class ResearchOrchestrator:
        comment += "Created the following issues:\n"
        for result in triage_results:
            if result["gitea_issue"]:
                comment += f"- #{result['gitea_issue']['number']}: {result['action_item'].title}\n"
                comment += (
                    f"- #{result['gitea_issue']['number']}: {result['action_item'].title}\n"
                )
        else:
            comment += "No new issues were created.\n"

@@ -172,4 +171,3 @@ async def start_paperclip_poller() -> None:
    if settings.paperclip_enabled:
        poller = PaperclipPoller()
        asyncio.create_task(poller.poll())

@@ -6,7 +6,6 @@ import logging
import os
from typing import Any

from config import settings
from serpapi import GoogleSearch

logger = logging.getLogger(__name__)
@@ -28,12 +27,17 @@ async def google_web_search(query: str) -> str:

def get_llm_client() -> Any:
    """Get an LLM client."""

    # This is a placeholder. In a real application, this would return
    # a client for an LLM service like OpenAI, Anthropic, or a local
    # model.
    class MockLLMClient:
        """Stub LLM client for testing without a real language model."""

        async def completion(self, prompt: str, max_tokens: int) -> Any:
            class MockCompletion:
                """Stub completion response returned by MockLLMClient."""

                def __init__(self, text: str) -> None:
                    self.text = text

src/timmy/sovereignty/__init__.py (new file, 7 lines)
@@ -0,0 +1,7 @@
"""Sovereignty metrics for the Bannerlord loop.

Tracks how much of each AI layer (perception, decision, narration)
runs locally vs. calls out to an LLM. Feeds the sovereignty dashboard.

Refs: #954, #953
"""

src/timmy/sovereignty/metrics.py (new file, 413 lines)
@@ -0,0 +1,413 @@
"""Sovereignty metrics emitter and SQLite store.
|
||||
|
||||
Tracks the sovereignty percentage for each AI layer (perception, decision,
|
||||
narration) plus API cost and skill crystallisation. All data is persisted to
|
||||
``data/sovereignty_metrics.db`` so the dashboard can query trends over time.
|
||||
|
||||
Event types
|
||||
-----------
|
||||
perception layer:
|
||||
``perception_cache_hit`` — frame answered from local cache (sovereign)
|
||||
``perception_vlm_call`` — frame required a VLM inference call (non-sovereign)
|
||||
|
||||
decision layer:
|
||||
``decision_rule_hit`` — action chosen by a deterministic rule (sovereign)
|
||||
``decision_llm_call`` — action required LLM reasoning (non-sovereign)
|
||||
|
||||
narration layer:
|
||||
``narration_template`` — text generated from a template (sovereign)
|
||||
``narration_llm`` — text generated by an LLM (non-sovereign)
|
||||
|
||||
skill layer:
|
||||
``skill_crystallized`` — a new skill was crystallised from LLM output
|
||||
|
||||
cost:
|
||||
``api_call`` — any external API call was made
|
||||
``api_cost`` — monetary cost of an API call (metadata: {"usd": float})
|
||||
|
||||
Refs: #954, #953
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import logging
|
||||
import sqlite3
|
||||
import uuid
|
||||
from contextlib import closing
|
||||
from dataclasses import dataclass, field
|
||||
from datetime import UTC, datetime
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
from config import settings
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# ── Constants ─────────────────────────────────────────────────────────────────
|
||||
|
||||
DB_PATH = Path(settings.repo_root) / "data" / "sovereignty_metrics.db"
|
||||
|
||||
#: Sovereign event types for each layer (numerator of sovereignty %).
|
||||
_SOVEREIGN_EVENTS: dict[str, frozenset[str]] = {
|
||||
"perception": frozenset({"perception_cache_hit"}),
|
||||
"decision": frozenset({"decision_rule_hit"}),
|
||||
"narration": frozenset({"narration_template"}),
|
||||
}
|
||||
|
||||
#: All tracked event types for each layer (denominator of sovereignty %).
|
||||
_LAYER_EVENTS: dict[str, frozenset[str]] = {
|
||||
"perception": frozenset({"perception_cache_hit", "perception_vlm_call"}),
|
||||
"decision": frozenset({"decision_rule_hit", "decision_llm_call"}),
|
||||
"narration": frozenset({"narration_template", "narration_llm"}),
|
||||
}
|
||||
|
||||
ALL_EVENT_TYPES: frozenset[str] = frozenset(
|
||||
{
|
||||
"perception_cache_hit",
|
||||
"perception_vlm_call",
|
||||
"decision_rule_hit",
|
||||
"decision_llm_call",
|
||||
"narration_template",
|
||||
"narration_llm",
|
||||
"skill_crystallized",
|
||||
"api_call",
|
||||
"api_cost",
|
||||
}
|
||||
)
|
||||
|
||||
# ── Schema ────────────────────────────────────────────────────────────────────
|
||||
|
||||
_SCHEMA = """
|
||||
CREATE TABLE IF NOT EXISTS events (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
timestamp TEXT NOT NULL,
|
||||
event_type TEXT NOT NULL,
|
||||
session_id TEXT NOT NULL DEFAULT '',
|
||||
metadata_json TEXT NOT NULL DEFAULT '{}'
|
||||
);
|
||||
CREATE INDEX IF NOT EXISTS idx_ev_type ON events(event_type);
|
||||
CREATE INDEX IF NOT EXISTS idx_ev_ts ON events(timestamp);
|
||||
CREATE INDEX IF NOT EXISTS idx_ev_session ON events(session_id);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS sessions (
|
||||
session_id TEXT PRIMARY KEY,
|
||||
game TEXT NOT NULL DEFAULT '',
|
||||
start_time TEXT NOT NULL,
|
||||
end_time TEXT
|
||||
);
|
||||
"""
|
||||
|
||||
|
||||
# ── Data classes ──────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
@dataclass
|
||||
class SovereigntyEvent:
|
||||
"""A single sovereignty event."""
|
||||
|
||||
event_type: str
|
||||
session_id: str = ""
|
||||
metadata: dict[str, Any] = field(default_factory=dict)
|
||||
timestamp: str = field(default_factory=lambda: datetime.now(UTC).isoformat())
|
||||
|
||||
|
||||
# ── Store ─────────────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class SovereigntyMetricsStore:
|
||||
"""SQLite-backed sovereignty event store.
|
||||
|
||||
Thread-safe: creates a new connection per operation (WAL mode).
|
||||
"""
|
||||
|
||||
def __init__(self, db_path: Path | None = None) -> None:
|
||||
self._db_path = db_path or DB_PATH
|
||||
self._init_db()
|
||||
|
||||
# ── internal ─────────────────────────────────────────────────────────────
|
||||
|
||||
def _init_db(self) -> None:
|
||||
try:
|
||||
self._db_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
with closing(sqlite3.connect(str(self._db_path))) as conn:
|
||||
conn.execute("PRAGMA journal_mode=WAL")
|
||||
conn.execute(f"PRAGMA busy_timeout={settings.db_busy_timeout_ms}")
|
||||
conn.executescript(_SCHEMA)
|
||||
conn.commit()
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to initialise sovereignty metrics DB: %s", exc)
|
||||
|
||||
def _connect(self) -> sqlite3.Connection:
|
||||
conn = sqlite3.connect(str(self._db_path))
|
||||
conn.row_factory = sqlite3.Row
|
||||
conn.execute(f"PRAGMA busy_timeout={settings.db_busy_timeout_ms}")
|
||||
return conn
|
||||
|
||||
# ── public API ────────────────────────────────────────────────────────────
|
||||
|
||||
def record(
|
||||
self, event_type: str, metadata: dict[str, Any] | None = None, *, session_id: str = ""
|
||||
) -> None:
|
||||
"""Record a sovereignty event.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
event_type:
|
||||
One of ``ALL_EVENT_TYPES``.
|
||||
metadata:
|
||||
Optional dict of extra data (serialised as JSON).
|
||||
session_id:
|
||||
Identifier of the current game session, if known.
|
||||
"""
|
||||
event = SovereigntyEvent(
|
||||
event_type=event_type,
|
||||
session_id=session_id,
|
||||
metadata=metadata or {},
|
||||
)
|
||||
try:
|
||||
with closing(self._connect()) as conn:
|
||||
conn.execute(
|
||||
"INSERT INTO events (timestamp, event_type, session_id, metadata_json) "
|
||||
"VALUES (?, ?, ?, ?)",
|
||||
(
|
||||
event.timestamp,
|
||||
event.event_type,
|
||||
event.session_id,
|
||||
json.dumps(event.metadata),
|
||||
),
|
||||
)
|
||||
conn.commit()
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to record sovereignty event: %s", exc)
|
||||
|
||||
def start_session(self, game: str = "", session_id: str | None = None) -> str:
|
||||
"""Register a new game session. Returns the session_id."""
|
||||
sid = session_id or str(uuid.uuid4())
|
||||
try:
|
||||
with closing(self._connect()) as conn:
|
||||
conn.execute(
|
||||
"INSERT OR IGNORE INTO sessions (session_id, game, start_time) VALUES (?, ?, ?)",
|
||||
(sid, game, datetime.now(UTC).isoformat()),
|
||||
)
|
||||
conn.commit()
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to start session: %s", exc)
|
||||
return sid
|
||||
|
||||
def end_session(self, session_id: str) -> None:
|
||||
"""Mark a session as ended."""
|
||||
try:
|
||||
with closing(self._connect()) as conn:
|
||||
conn.execute(
|
||||
"UPDATE sessions SET end_time = ? WHERE session_id = ?",
|
||||
(datetime.now(UTC).isoformat(), session_id),
|
||||
)
|
||||
conn.commit()
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to end session: %s", exc)
|
||||
|
||||
# ── analytics ─────────────────────────────────────────────────────────────
|
||||
|
||||
    def get_sovereignty_pct(self, layer: str, time_window: float | None = None) -> float:
        """Return the sovereignty percentage (0.0–100.0) for *layer*.

        Parameters
        ----------
        layer:
            One of ``"perception"``, ``"decision"``, ``"narration"``.
        time_window:
            If given, only consider events from the last *time_window* seconds.
            If ``None``, all events are used.

        Returns
        -------
        float
            Percentage of sovereign events for the layer, or 0.0 if no data.
        """
        if layer not in _LAYER_EVENTS:
            logger.warning("Unknown sovereignty layer: %s", layer)
            return 0.0

        sovereign = _SOVEREIGN_EVENTS[layer]
        total_types = _LAYER_EVENTS[layer]

        sovereign_placeholders = ",".join("?" * len(sovereign))
        total_placeholders = ",".join("?" * len(total_types))

        params_sov: list[Any] = list(sovereign)
        params_total: list[Any] = list(total_types)

        if time_window is not None:
            cutoff = _seconds_ago_iso(time_window)
            where_ts = " AND timestamp >= ?"
            params_sov.append(cutoff)
            params_total.append(cutoff)
        else:
            where_ts = ""

        try:
            with closing(self._connect()) as conn:
                total_count = conn.execute(
                    f"SELECT COUNT(*) FROM events WHERE event_type IN ({total_placeholders}){where_ts}",
                    params_total,
                ).fetchone()[0]
                if total_count == 0:
                    return 0.0
                sov_count = conn.execute(
                    f"SELECT COUNT(*) FROM events WHERE event_type IN ({sovereign_placeholders}){where_ts}",
                    params_sov,
                ).fetchone()[0]
                return round(100.0 * sov_count / total_count, 2)
        except Exception as exc:
            logger.warning("Failed to compute sovereignty pct: %s", exc)
            return 0.0

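The percentage is simply sovereign events over all tracked events for a layer. A self-contained sketch of the same counting against an in-memory table (schema trimmed to the two columns the query touches; the data is made up):

```python
import sqlite3

# In-memory events table with the same shape the queries above rely on.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_type TEXT NOT NULL, timestamp TEXT NOT NULL)")
rows = [("perception_cache_hit",)] * 3 + [("perception_vlm_call",)]
conn.executemany(
    "INSERT INTO events (event_type, timestamp) VALUES (?, '2025-01-01T00:00:00')", rows
)

sovereign = ("perception_cache_hit",)
tracked = ("perception_cache_hit", "perception_vlm_call")

# Same IN (...) placeholder construction as in get_sovereignty_pct.
total = conn.execute(
    f"SELECT COUNT(*) FROM events WHERE event_type IN ({','.join('?' * len(tracked))})",
    tracked,
).fetchone()[0]
sov = conn.execute(
    f"SELECT COUNT(*) FROM events WHERE event_type IN ({','.join('?' * len(sovereign))})",
    sovereign,
).fetchone()[0]
pct = round(100.0 * sov / total, 2) if total else 0.0
print(pct)  # 3 cache hits out of 4 perception events -> 75.0
```

The `if total` guard mirrors the early return on `total_count == 0`, which is what keeps an idle layer at 0.0 rather than raising `ZeroDivisionError`.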
    def get_cost_per_hour(self, time_window: float | None = None) -> float:
        """Return the total API cost in USD extrapolated to a per-hour rate.

        Parameters
        ----------
        time_window:
            Seconds of history to consider. Defaults to 3600 (last hour).

        Returns
        -------
        float
            USD cost per hour, or 0.0 if no ``api_cost`` events exist.
        """
        window = time_window if time_window is not None else 3600.0
        cutoff = _seconds_ago_iso(window)

        try:
            with closing(self._connect()) as conn:
                rows = conn.execute(
                    "SELECT metadata_json FROM events WHERE event_type = 'api_cost' AND timestamp >= ?",
                    (cutoff,),
                ).fetchall()
        except Exception as exc:
            logger.warning("Failed to query api_cost events: %s", exc)
            return 0.0

        total_usd = 0.0
        for row in rows:
            try:
                meta = json.loads(row["metadata_json"] or "{}")
                total_usd += float(meta.get("usd", 0.0))
            except (ValueError, TypeError, json.JSONDecodeError):
                pass

        # Extrapolate: (total in window) * (3600 / window_seconds)
        if window == 0:
            return 0.0
        return round(total_usd * (3600.0 / window), 4)

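Once the `usd` values are summed, the extrapolation reduces to one formula: total spend in the window scaled by `3600 / window_seconds`. A tiny standalone sketch (the sample costs are invented):

```python
def cost_per_hour(costs_usd: list[float], window_seconds: float) -> float:
    """Scale spend observed in a window to an hourly rate, guarding a zero window."""
    if window_seconds == 0:
        return 0.0
    return round(sum(costs_usd) * (3600.0 / window_seconds), 4)


print(cost_per_hour([0.02, 0.01, 0.03], 1800))  # $0.06 in 30 min -> 0.12 per hour
print(cost_per_hour([], 3600))                  # no api_cost events -> 0.0
```

Note the extrapolation assumes spend is roughly uniform across the window; a short window right after a burst of calls will overstate the hourly rate.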
    def get_skills_crystallized(self, session_id: str | None = None) -> int:
        """Return the number of skills crystallised.

        Parameters
        ----------
        session_id:
            If given, count only events for that session. If ``None``,
            count across all sessions.
        """
        try:
            with closing(self._connect()) as conn:
                if session_id:
                    return conn.execute(
                        "SELECT COUNT(*) FROM events WHERE event_type = 'skill_crystallized' AND session_id = ?",
                        (session_id,),
                    ).fetchone()[0]
                return conn.execute(
                    "SELECT COUNT(*) FROM events WHERE event_type = 'skill_crystallized'",
                ).fetchone()[0]
        except Exception as exc:
            logger.warning("Failed to query skill_crystallized: %s", exc)
            return 0

    def get_snapshot(self) -> dict[str, Any]:
        """Return a real-time metrics snapshot suitable for dashboard widgets."""
        return {
            "sovereignty": {
                layer: self.get_sovereignty_pct(layer, time_window=3600) for layer in _LAYER_EVENTS
            },
            "cost_per_hour": self.get_cost_per_hour(),
            "skills_crystallized": self.get_skills_crystallized(),
        }


# ── Module-level singleton ────────────────────────────────────────────────────

_store: SovereigntyMetricsStore | None = None

def get_metrics_store() -> SovereigntyMetricsStore:
    """Return (or lazily create) the module-level singleton store."""
    global _store
    if _store is None:
        _store = SovereigntyMetricsStore()
    return _store


# ── Convenience helpers ───────────────────────────────────────────────────────


def record(
    event_type: str, metadata: dict[str, Any] | None = None, *, session_id: str = ""
) -> None:
    """Module-level shortcut: ``metrics.record("perception_cache_hit")``."""
    get_metrics_store().record(event_type, metadata=metadata, session_id=session_id)


def get_sovereignty_pct(layer: str, time_window: float | None = None) -> float:
    """Module-level shortcut for :meth:`SovereigntyMetricsStore.get_sovereignty_pct`."""
    return get_metrics_store().get_sovereignty_pct(layer, time_window)


def get_cost_per_hour(time_window: float | None = None) -> float:
    """Module-level shortcut for :meth:`SovereigntyMetricsStore.get_cost_per_hour`."""
    return get_metrics_store().get_cost_per_hour(time_window)


def get_skills_crystallized(session_id: str | None = None) -> int:
    """Module-level shortcut for :meth:`SovereigntyMetricsStore.get_skills_crystallized`."""
    return get_metrics_store().get_skills_crystallized(session_id)

async def emit_sovereignty_event(
    event_type: str,
    metadata: dict[str, Any] | None = None,
    *,
    session_id: str = "",
) -> None:
    """Record an event in a thread and publish it on the event bus.

    This is the async-safe entry-point used by the agentic loop.
    """
    from infrastructure.events.bus import emit

    await asyncio.to_thread(
        get_metrics_store().record,
        event_type,
        metadata,
        session_id=session_id,
    )
    await emit(
        f"sovereignty.event.{event_type}",
        source="sovereignty_metrics",
        data={
            "event_type": event_type,
            "session_id": session_id,
            **(metadata or {}),
        },
    )


# ── Private helpers ───────────────────────────────────────────────────────────

def _seconds_ago_iso(seconds: float) -> str:
|
||||
"""Return an ISO-8601 timestamp *seconds* before now (UTC)."""
|
||||
import datetime as _dt
|
||||
|
||||
delta = _dt.timedelta(seconds=seconds)
|
||||
return (_dt.datetime.now(UTC) - delta).isoformat()
|
||||
92
src/timmy/sovereignty/perception_cache.py
Normal file
@@ -0,0 +1,92 @@
from __future__ import annotations

import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any

import cv2
import numpy as np


@dataclass
class Template:
    name: str
    image: np.ndarray
    threshold: float = 0.85


@dataclass
class CacheResult:
    confidence: float
    state: Any | None


class PerceptionCache:
    def __init__(self, templates_path: Path | str = "data/templates.json"):
        self.templates_path = Path(templates_path)
        self.templates: list[Template] = []
        self.load()

    def match(self, screenshot: np.ndarray) -> CacheResult:
        """
        Matches templates against the screenshot.
        Returns the confidence and the name of the best matching template.
        """
        best_match_confidence = 0.0
        best_match_name = None

        for template in self.templates:
            res = cv2.matchTemplate(screenshot, template.image, cv2.TM_CCOEFF_NORMED)
            _, max_val, _, _ = cv2.minMaxLoc(res)
            if max_val > best_match_confidence:
                best_match_confidence = max_val
                best_match_name = template.name

        if best_match_confidence > 0.85:  # TODO: Make this configurable per template
            return CacheResult(
                confidence=best_match_confidence, state={"template_name": best_match_name}
            )
        else:
            return CacheResult(confidence=best_match_confidence, state=None)

    def add(self, templates: list[Template]):
        self.templates.extend(templates)

    def persist(self):
        self.templates_path.parent.mkdir(parents=True, exist_ok=True)
        # Note: This is a simplified persistence mechanism.
        # A more robust solution would store templates as images and metadata in JSON.
        with self.templates_path.open("w") as f:
            json.dump(
                [{"name": t.name, "threshold": t.threshold} for t in self.templates], f, indent=2
            )

    def load(self):
        if self.templates_path.exists():
            with self.templates_path.open("r") as f:
                templates_data = json.load(f)
            # This is a simplified loading mechanism and assumes template images are stored elsewhere.
            # For now, we are not loading the actual images.
            self.templates = [
                Template(name=t["name"], image=np.array([]), threshold=t["threshold"])
                for t in templates_data
            ]


def crystallize_perception(screenshot: np.ndarray, vlm_response: Any) -> list[Template]:
    """
    Extracts reusable patterns from VLM output and generates OpenCV templates.
    This is a placeholder and needs to be implemented based on the actual VLM response format.
    """
    # Example implementation:
    # templates = []
    # for item in vlm_response.get("items", []):
    #     bbox = item.get("bounding_box")
    #     template_name = item.get("name")
    #     if bbox and template_name:
    #         x1, y1, x2, y2 = bbox
    #         template_image = screenshot[y1:y2, x1:x2]
    #         templates.append(Template(name=template_name, image=template_image))
    # return templates
    return []
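`PerceptionCache.match` scores each template with `cv2.matchTemplate` using `TM_CCOEFF_NORMED` and keeps the best-scoring one. The core idea — slide a small patch over the screenshot and keep the best position — can be sketched without OpenCV using a toy sum-of-squared-differences score (illustrative only; the real scoring is normalized cross-correlation, and real inputs are image arrays, not nested lists):

```python
def best_match(screen, patch):
    """Slide patch over screen (2D lists); return (score, row, col) of best fit.

    Score is 1.0 for an exact match, lower otherwise: 1 / (1 + SSD),
    a toy stand-in for cv2.TM_CCOEFF_NORMED.
    """
    ph, pw = len(patch), len(patch[0])
    best = (0.0, -1, -1)
    for r in range(len(screen) - ph + 1):
        for c in range(len(screen[0]) - pw + 1):
            # Sum of squared differences over the patch window.
            ssd = sum(
                (screen[r + i][c + j] - patch[i][j]) ** 2
                for i in range(ph)
                for j in range(pw)
            )
            score = 1.0 / (1.0 + ssd)
            if score > best[0]:
                best = (score, r, c)
    return best


screen = [
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 7, 8, 0],
    [0, 0, 0, 0],
]
patch = [[5, 6], [7, 8]]
print(best_match(screen, patch))  # (1.0, 1, 1)
```

The cache then compares the best score against a threshold, exactly as `match` does with its 0.85 cutoff.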
@@ -692,91 +692,112 @@ class ThinkingEngine:
        file paths actually exist on disk, preventing phantom-bug reports.
        """
        try:
            interval = settings.thinking_issue_every
            if interval <= 0:
            recent = self._get_recent_thoughts_for_issues()
            if recent is None:
                return

            count = self.count_thoughts()
            if count == 0 or count % interval != 0:
                return

            # Check Gitea availability before spending LLM tokens
            if not settings.gitea_enabled or not settings.gitea_token:
                return

            recent = self.get_recent_thoughts(limit=interval)
            if len(recent) < interval:
                return

            thought_text = "\n".join(f"- [{t.seed_type}] {t.content}" for t in reversed(recent))

            classify_prompt = (
                "You are reviewing your own recent thoughts for actionable items.\n"
                "Extract 0-2 items that are CONCRETE bugs, broken features, stale "
                "state, or clear improvement opportunities in your own codebase.\n\n"
                "Rules:\n"
                "- Only include things that could become a real code fix or feature\n"
                "- Skip vague reflections, philosophical musings, or repeated themes\n"
                "- Category must be one of: bug, feature, suggestion, maintenance\n"
                "- ONLY reference files that you are CERTAIN exist in the project\n"
                "- Do NOT invent or guess file paths — if unsure, describe the "
                "area of concern without naming specific files\n\n"
                "For each item, write an ENGINEER-QUALITY issue:\n"
                '- "title": A clear, specific title (e.g. "[Memory] MEMORY.md timestamp not updating")\n'
                '- "body": A detailed body with these sections:\n'
                "  **What's happening:** Describe the current (broken) behavior.\n"
                "  **Expected behavior:** What should happen instead.\n"
                "  **Suggested fix:** Which file(s) to change and what the fix looks like.\n"
                "  **Acceptance criteria:** How to verify the fix works.\n"
                '- "category": One of bug, feature, suggestion, maintenance\n\n'
                "Return ONLY a JSON array of objects with keys: "
                '"title", "body", "category"\n'
                "Return [] if nothing is actionable.\n\n"
                f"Recent thoughts:\n{thought_text}\n\nJSON array:"
            )

            classify_prompt = self._build_issue_classify_prompt(recent)
            raw = await self._call_agent(classify_prompt)
            if not raw or not raw.strip():
                return

            import json

            # Strip markdown code fences if present
            cleaned = raw.strip()
            if cleaned.startswith("```"):
                cleaned = cleaned.split("\n", 1)[-1].rsplit("```", 1)[0].strip()

            items = json.loads(cleaned)
            if not isinstance(items, list) or not items:
            items = self._parse_issue_items(raw)
            if items is None:
                return

            from timmy.mcp_tools import create_gitea_issue_via_mcp

            for item in items[:2]:  # Safety cap
                if not isinstance(item, dict):
                    continue
                title = item.get("title", "").strip()
                body = item.get("body", "").strip()
                category = item.get("category", "suggestion").strip()
                if not title or len(title) < 10:
                    continue

                # Validate all referenced file paths exist on disk
                combined = f"{title}\n{body}"
                if not self._references_real_files(combined):
                    logger.info(
                        "Skipped phantom issue: %s (references non-existent files)",
                        title[:60],
                    )
                    continue

                label = category if category in ("bug", "feature") else ""
                result = await create_gitea_issue_via_mcp(title=title, body=body, labels=label)
                logger.info("Thought→Issue: %s → %s", title[:60], result[:80])
                await self._file_single_issue(item, create_gitea_issue_via_mcp)

        except Exception as exc:
            logger.debug("Thought issue filing skipped: %s", exc)

    def _get_recent_thoughts_for_issues(self):
        """Return recent thoughts if conditions for filing issues are met, else None."""
        interval = settings.thinking_issue_every
        if interval <= 0:
            return None

        count = self.count_thoughts()
        if count == 0 or count % interval != 0:
            return None

        if not settings.gitea_enabled or not settings.gitea_token:
            return None

        recent = self.get_recent_thoughts(limit=interval)
        if len(recent) < interval:
            return None

        return recent

    @staticmethod
    def _build_issue_classify_prompt(recent) -> str:
        """Build the LLM prompt that extracts actionable issues from recent thoughts."""
        thought_text = "\n".join(f"- [{t.seed_type}] {t.content}" for t in reversed(recent))
        return (
            "You are reviewing your own recent thoughts for actionable items.\n"
            "Extract 0-2 items that are CONCRETE bugs, broken features, stale "
            "state, or clear improvement opportunities in your own codebase.\n\n"
            "Rules:\n"
            "- Only include things that could become a real code fix or feature\n"
            "- Skip vague reflections, philosophical musings, or repeated themes\n"
            "- Category must be one of: bug, feature, suggestion, maintenance\n"
            "- ONLY reference files that you are CERTAIN exist in the project\n"
            "- Do NOT invent or guess file paths — if unsure, describe the "
            "area of concern without naming specific files\n\n"
            "For each item, write an ENGINEER-QUALITY issue:\n"
            '- "title": A clear, specific title (e.g. "[Memory] MEMORY.md timestamp not updating")\n'
            '- "body": A detailed body with these sections:\n'
            "  **What's happening:** Describe the current (broken) behavior.\n"
            "  **Expected behavior:** What should happen instead.\n"
            "  **Suggested fix:** Which file(s) to change and what the fix looks like.\n"
            "  **Acceptance criteria:** How to verify the fix works.\n"
            '- "category": One of bug, feature, suggestion, maintenance\n\n'
            "Return ONLY a JSON array of objects with keys: "
            '"title", "body", "category"\n'
            "Return [] if nothing is actionable.\n\n"
            f"Recent thoughts:\n{thought_text}\n\nJSON array:"
        )

    @staticmethod
    def _parse_issue_items(raw: str):
        """Strip markdown fences and parse JSON issue list; return None on failure."""
        import json

        if not raw or not raw.strip():
            return None

        cleaned = raw.strip()
        if cleaned.startswith("```"):
            cleaned = cleaned.split("\n", 1)[-1].rsplit("```", 1)[0].strip()

        items = json.loads(cleaned)
        if not isinstance(items, list) or not items:
            return None

        return items

    async def _file_single_issue(self, item: dict, create_fn) -> None:
        """Validate one issue dict and create it via *create_fn* if it passes checks."""
        if not isinstance(item, dict):
            return
        title = item.get("title", "").strip()
        body = item.get("body", "").strip()
        category = item.get("category", "suggestion").strip()
        if not title or len(title) < 10:
            return

        combined = f"{title}\n{body}"
        if not self._references_real_files(combined):
            logger.info(
                "Skipped phantom issue: %s (references non-existent files)",
                title[:60],
            )
            return

        label = category if category in ("bug", "feature") else ""
        result = await create_fn(title=title, body=body, labels=label)
        logger.info("Thought→Issue: %s → %s", title[:60], result[:80])

    # ── System snapshot helpers ────────────────────────────────────────────

    def _snap_thought_count(self, now: datetime) -> str | None:
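`_parse_issue_items` strips an optional markdown fence before handing the text to `json.loads`. A standalone sketch of that parsing step (here the `JSONDecodeError` is handled locally for self-containment, whereas the refactored method lets the caller's `try/except` absorb it):

```python
import json


def parse_issue_items(raw: str):
    """Strip an optional ``` fence and parse a JSON list; return None on failure."""
    if not raw or not raw.strip():
        return None
    cleaned = raw.strip()
    if cleaned.startswith("```"):
        # Drop the opening fence line (e.g. ```json) and the trailing ``` line.
        cleaned = cleaned.split("\n", 1)[-1].rsplit("```", 1)[0].strip()
    try:
        items = json.loads(cleaned)
    except json.JSONDecodeError:
        return None
    if not isinstance(items, list) or not items:
        return None
    return items


raw = '```json\n[{"title": "Fix timestamp bug", "category": "bug"}]\n```'
print(parse_issue_items(raw))  # [{'title': 'Fix timestamp bug', 'category': 'bug'}]
```

Empty strings, non-JSON text, and empty arrays all collapse to `None`, which the caller treats as "nothing actionable".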
@@ -47,13 +47,11 @@ _DEFAULT_IDLE_THRESHOLD = 30
class AgentStatus:
    """Health snapshot for one agent at a point in time."""

    agent: str  # "claude" | "kimi" | "timmy"
    agent: str  # "claude" | "kimi" | "timmy"
    is_idle: bool = True
    active_issue_numbers: list[int] = field(default_factory=list)
    stuck_issue_numbers: list[int] = field(default_factory=list)
    checked_at: str = field(
        default_factory=lambda: datetime.now(UTC).isoformat()
    )
    checked_at: str = field(default_factory=lambda: datetime.now(UTC).isoformat())

    @property
    def is_stuck(self) -> bool:
@@ -69,9 +67,7 @@ class AgentHealthReport:
    """Combined health report for all monitored agents."""

    agents: list[AgentStatus] = field(default_factory=list)
    generated_at: str = field(
        default_factory=lambda: datetime.now(UTC).isoformat()
    )
    generated_at: str = field(default_factory=lambda: datetime.now(UTC).isoformat())

    @property
    def any_stuck(self) -> bool:
@@ -193,18 +189,14 @@ async def check_agent_health(

    try:
        async with httpx.AsyncClient(timeout=15) as client:
            issues = await _fetch_labeled_issues(
                client, base_url, headers, repo, label
            )
            issues = await _fetch_labeled_issues(client, base_url, headers, repo, label)

            for issue in issues:
                num = issue.get("number", 0)
                status.active_issue_numbers.append(num)

                # Check last activity
                last_activity = await _last_comment_time(
                    client, base_url, headers, repo, num
                )
                last_activity = await _last_comment_time(client, base_url, headers, repo, num)
                if last_activity is None:
                    last_activity = await _issue_created_time(issue)

@@ -91,9 +91,9 @@ _PRIORITY_LABEL_SCORES: dict[str, int] = {
class AgentTarget(StrEnum):
    """Which agent should handle this issue."""

    TIMMY = "timmy"  # Timmy handles locally (self)
    TIMMY = "timmy"  # Timmy handles locally (self)
    CLAUDE = "claude"  # Dispatch to Claude Code
    KIMI = "kimi"  # Dispatch to Kimi Code
    KIMI = "kimi"  # Dispatch to Kimi Code


@dataclass
@@ -172,9 +172,7 @@ def triage_issues(raw_issues: list[dict[str, Any]]) -> list[TriagedIssue]:
        title = issue.get("title", "")
        body = issue.get("body") or ""
        labels = _extract_labels(issue)
        assignees = [
            a.get("login", "") for a in issue.get("assignees") or []
        ]
        assignees = [a.get("login", "") for a in issue.get("assignees") or []]
        url = issue.get("html_url", "")

        priority = _score_priority(labels, assignees)
@@ -252,9 +250,7 @@ async def fetch_open_issues(
                params=params,
            )
            if resp.status_code != 200:
                logger.warning(
                    "fetch_open_issues: Gitea returned %s", resp.status_code
                )
                logger.warning("fetch_open_issues: Gitea returned %s", resp.status_code)
                return []

            issues = resp.json()

@@ -34,7 +34,7 @@ _LABEL_MAP: dict[AgentTarget, str] = {

_LABEL_COLORS: dict[str, str] = {
    "claude-ready": "#8b6f47",  # warm brown
    "kimi-ready": "#006b75",  # dark teal
    "kimi-ready": "#006b75",  # dark teal
    "timmy-ready": "#0075ca",  # blue
}

@@ -52,9 +52,7 @@ class DispatchRecord:
    issue_title: str
    agent: AgentTarget
    rationale: str
    dispatched_at: str = field(
        default_factory=lambda: datetime.now(UTC).isoformat()
    )
    dispatched_at: str = field(default_factory=lambda: datetime.now(UTC).isoformat())
    label_applied: bool = False
    comment_posted: bool = False

@@ -170,9 +168,7 @@ async def dispatch_issue(issue: TriagedIssue) -> DispatchRecord:

    try:
        async with httpx.AsyncClient(timeout=15) as client:
            label_id = await _get_or_create_label(
                client, base_url, headers, repo, label_name
            )
            label_id = await _get_or_create_label(client, base_url, headers, repo, label_name)

            # Apply label
            if label_id is not None:

@@ -22,9 +22,9 @@ logger = logging.getLogger(__name__)
# Thresholds
# ---------------------------------------------------------------------------

_WARN_DISK_PCT = 85.0  # warn when disk is more than 85% full
_WARN_MEM_PCT = 90.0  # warn when memory is more than 90% used
_WARN_CPU_PCT = 95.0  # warn when CPU is above 95% sustained
_WARN_DISK_PCT = 85.0  # warn when disk is more than 85% full
_WARN_MEM_PCT = 90.0  # warn when memory is more than 90% used
_WARN_CPU_PCT = 95.0  # warn when CPU is above 95% sustained


# ---------------------------------------------------------------------------
@@ -63,9 +63,7 @@ class SystemSnapshot:
    memory: MemoryUsage = field(default_factory=MemoryUsage)
    ollama: OllamaHealth = field(default_factory=OllamaHealth)
    warnings: list[str] = field(default_factory=list)
    taken_at: str = field(
        default_factory=lambda: datetime.now(UTC).isoformat()
    )
    taken_at: str = field(default_factory=lambda: datetime.now(UTC).isoformat())

    @property
    def healthy(self) -> bool:
@@ -117,8 +115,8 @@ def _probe_memory() -> MemoryUsage:
def _probe_ollama_sync(ollama_url: str) -> OllamaHealth:
    """Synchronous Ollama health probe — run in a thread."""
    try:
        import urllib.request
        import json
        import urllib.request

        url = ollama_url.rstrip("/") + "/api/tags"
        with urllib.request.urlopen(url, timeout=5) as resp:  # noqa: S310
@@ -154,14 +152,12 @@ async def get_system_snapshot() -> SystemSnapshot:

    if disk.percent_used >= _WARN_DISK_PCT:
        warnings.append(
            f"Disk {disk.path}: {disk.percent_used:.0f}% used "
            f"({disk.free_gb:.1f} GB free)"
            f"Disk {disk.path}: {disk.percent_used:.0f}% used ({disk.free_gb:.1f} GB free)"
        )

    if memory.percent_used >= _WARN_MEM_PCT:
        warnings.append(
            f"Memory: {memory.percent_used:.0f}% used "
            f"({memory.available_gb:.1f} GB available)"
            f"Memory: {memory.percent_used:.0f}% used ({memory.available_gb:.1f} GB available)"
        )

    if not ollama.reachable:
@@ -216,7 +212,5 @@ async def cleanup_stale_files(
        errors.append(str(exc))

    await asyncio.to_thread(_cleanup)
    logger.info(
        "cleanup_stale_files: deleted %d files, %d errors", deleted, len(errors)
    )
    logger.info("cleanup_stale_files: deleted %d files, %d errors", deleted, len(errors))
    return {"deleted_count": deleted, "errors": errors}
@@ -25,15 +25,21 @@ logger = logging.getLogger(__name__)
class ChatRequest(BaseModel):
    """Incoming chat request payload for the Timmy Serve API."""

    message: str
    stream: bool = False


class ChatResponse(BaseModel):
    """Chat response payload returned by the Timmy Serve API."""

    response: str


class StatusResponse(BaseModel):
    """Service status response with backend information."""

    status: str
    backend: str
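The pydantic models above validate incoming payloads automatically: `message` is required, `stream` defaults to `False`. As a rough illustration of the equivalent manual validation (a plain-dataclass stand-in, not the actual serve API code):

```python
from dataclasses import dataclass


@dataclass
class ChatRequest:
    """Hypothetical dataclass mirror of the pydantic ChatRequest (illustration only)."""

    message: str
    stream: bool = False

    @classmethod
    def from_payload(cls, payload: dict) -> "ChatRequest":
        # Reject payloads missing the required string field, like pydantic would.
        if "message" not in payload or not isinstance(payload["message"], str):
            raise ValueError("'message' is required and must be a string")
        return cls(message=payload["message"], stream=bool(payload.get("stream", False)))


req = ChatRequest.from_payload({"message": "hello"})
print(req)  # ChatRequest(message='hello', stream=False)
```

With pydantic, the same rules come for free from the type annotations; the sketch just makes the validation explicit.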
@@ -9,6 +9,9 @@ Usage:
import json
import os
import subprocess
import urllib.error
import urllib.request
from pathlib import Path
from typing import Any

@@ -31,6 +34,37 @@ AUTOMATIONS_CONFIG = DEFAULT_CONFIG_DIR / "automations.json"
DAILY_RUN_CONFIG = DEFAULT_CONFIG_DIR / "daily_run.json"
TRIAGE_RULES_CONFIG = DEFAULT_CONFIG_DIR / "triage_rules.yaml"

GITEA_URL = os.environ.get("GITEA_URL", "http://143.198.27.163:3000")
GITEA_REPO = "rockachopa/Timmy-time-dashboard"


def _get_gitea_token() -> str | None:
    """Read the Gitea API token from env or config files."""
    token = os.environ.get("GITEA_TOKEN")
    if token:
        return token.strip()
    for candidate in [
        Path("~/.hermes/gitea_token_vps").expanduser(),
        Path("~/.hermes/gitea_token").expanduser(),
    ]:
        try:
            return candidate.read_text(encoding="utf-8").strip()
        except FileNotFoundError:
            continue
    return None


def _gitea_api_get(endpoint: str) -> Any:
    """GET a Gitea API endpoint and return parsed JSON."""
    url = f"{GITEA_URL}/api/v1{endpoint}"
    token = _get_gitea_token()
    req = urllib.request.Request(url)
    if token:
        req.add_header("Authorization", f"token {token}")
    req.add_header("Accept", "application/json")
    with urllib.request.urlopen(req, timeout=15) as resp:
        return json.loads(resp.read().decode("utf-8"))


def _load_json_config(path: Path) -> dict[str, Any]:
    """Load a JSON config file, returning empty dict on error."""
@@ -131,9 +165,41 @@ def daily_run(
        console.print("[yellow]Dry run mode — no actions executed.[/yellow]")
    else:
        console.print("[green]Executing daily run automations...[/green]")
        # TODO: Implement actual automation execution
        # This would call the appropriate scripts from the automations config
        console.print("[dim]Automation execution not yet implemented.[/dim]")
        auto_config_path = _get_config_dir() / "automations.json"
        auto_config = _load_json_config(auto_config_path)
        all_automations = auto_config.get("automations", [])
        enabled = [a for a in all_automations if a.get("enabled", False)]
        if not enabled:
            console.print("[yellow]No enabled automations found.[/yellow]")
        for auto in enabled:
            cmd = auto.get("command")
            name = auto.get("name", auto.get("id", "unnamed"))
            if not cmd:
                console.print(f"[yellow]Skipping {name} — no command defined.[/yellow]")
                continue
            console.print(f"[cyan]▶ Running: {name}[/cyan]")
            if verbose:
                console.print(f"[dim]  $ {cmd}[/dim]")
            try:
                result = subprocess.run(  # noqa: S602
                    cmd,
                    shell=True,
                    capture_output=True,
                    text=True,
                    timeout=120,
                )
                if result.stdout.strip():
                    console.print(result.stdout.strip())
                if result.returncode != 0:
                    console.print(f"[red]  ✗ {name} exited with code {result.returncode}[/red]")
                    if result.stderr.strip():
                        console.print(f"[red]{result.stderr.strip()}[/red]")
                else:
                    console.print(f"[green]  ✓ {name} completed successfully[/green]")
            except subprocess.TimeoutExpired:
                console.print(f"[red]  ✗ {name} timed out after 120s[/red]")
            except Exception as exc:
                console.print(f"[red]  ✗ {name} failed: {exc}[/red]")


@app.command()
@@ -159,9 +225,96 @@ def log_run(
    console.print(f"[dim]Message:[/dim] {message}")
    console.print()

    # TODO: Persist to actual logbook file
    # This would append to a logbook file (e.g., .loop/logbook.jsonl)
    console.print("[green]✓[/green] Entry logged (simulated)")
    logbook_path = Path(".loop/logbook.jsonl")
    logbook_path.parent.mkdir(parents=True, exist_ok=True)
    entry = json.dumps({"timestamp": timestamp, "category": category, "message": message})
    with open(logbook_path, "a", encoding="utf-8") as f:
        f.write(entry + "\n")
    console.print(f"[green]✓[/green] Entry logged to {logbook_path}")


def _show_automations_table(limit: int) -> None:
    """Display active automations from the automations config."""
    config_path = _get_config_dir() / "automations.json"
    config = _load_json_config(config_path)
    enabled = [a for a in config.get("automations", []) if a.get("enabled", False)]

    table = Table(title="Active Automations")
    table.add_column("ID", style="cyan")
    table.add_column("Name", style="green")
    table.add_column("Category", style="yellow")
    table.add_column("Trigger", style="magenta")

    for auto in enabled[:limit]:
        table.add_row(
            auto.get("id", ""),
            auto.get("name", ""),
            "✓" if auto.get("enabled", False) else "✗",
            auto.get("category", ""),
        )

    console.print(table)
    console.print()


def _show_prs_table(limit: int) -> None:
    """Display open pull requests from Gitea."""
    table = Table(title="Open Pull Requests")
    table.add_column("#", style="cyan")
    table.add_column("Title", style="green")
    table.add_column("Author", style="yellow")
    table.add_column("Status", style="magenta")
    try:
        prs = _gitea_api_get(f"/repos/{GITEA_REPO}/pulls?state=open")
        if prs:
            for pr in prs[:limit]:
                table.add_row(
                    str(pr.get("number", "")),
                    pr.get("title", ""),
                    pr.get("user", {}).get("login", ""),
                    pr.get("state", ""),
                )
        else:
            table.add_row("—", "[dim]No open PRs[/dim]", "—", "—")
    except Exception as exc:
        table.add_row("—", f"[red]Error fetching PRs: {exc}[/red]", "—", "—")
    console.print(table)
    console.print()


def _show_issues_table(limit: int) -> None:
    """Display open issues from Gitea."""
    table = Table(title="Issues Calling for Attention")
    table.add_column("#", style="cyan")
    table.add_column("Title", style="green")
    table.add_column("Type", style="yellow")
    table.add_column("Priority", style="magenta")
    try:
        issues = _gitea_api_get(f"/repos/{GITEA_REPO}/issues?state=open&type=issues&limit={limit}")
        if issues:
            for issue in issues[:limit]:
                labels = [lb.get("name", "") for lb in issue.get("labels", [])]
                priority = next((lb for lb in labels if "priority" in lb.lower()), "—")
                issue_type = next(
                    (
                        lb
                        for lb in labels
                        if lb.lower() in ("bug", "feature", "refactor", "enhancement")
                    ),
                    "—",
                )
                table.add_row(
                    str(issue.get("number", "")),
                    issue.get("title", ""),
                    issue_type,
                    priority,
                )
        else:
            table.add_row("—", "[dim]No open issues[/dim]", "—", "—")
    except Exception as exc:
        table.add_row("—", f"[red]Error fetching issues: {exc}[/red]", "—", "—")
    console.print(table)
    console.print()


@app.command()
@@ -180,54 +333,13 @@ def inbox(
    console.print("[bold green]Timmy Inbox[/bold green]")
    console.print()

    # Load automations to show what's enabled
    config_path = _get_config_dir() / "automations.json"
    config = _load_json_config(config_path)
    _show_automations_table(limit)

    automations = config.get("automations", [])
    enabled_automations = [a for a in automations if a.get("enabled", False)]

    # Show automation status
    auto_table = Table(title="Active Automations")
    auto_table.add_column("ID", style="cyan")
    auto_table.add_column("Name", style="green")
    auto_table.add_column("Category", style="yellow")
    auto_table.add_column("Trigger", style="magenta")

    for auto in enabled_automations[:limit]:
        auto_table.add_row(
            auto.get("id", ""),
            auto.get("name", ""),
            "✓" if auto.get("enabled", False) else "✗",
            auto.get("category", ""),
        )

    console.print(auto_table)
    console.print()

    # TODO: Fetch actual PRs from Gitea API
    if include_prs:
        pr_table = Table(title="Open Pull Requests (placeholder)")
        pr_table.add_column("#", style="cyan")
        pr_table.add_column("Title", style="green")
        pr_table.add_column("Author", style="yellow")
        pr_table.add_column("Status", style="magenta")
        pr_table.add_row("—", "[dim]No PRs fetched (Gitea API not configured)[/dim]", "—", "—")
        console.print(pr_table)
        console.print()
        _show_prs_table(limit)

    # TODO: Fetch relevant issues from Gitea API
    if include_issues:
        issue_table = Table(title="Issues Calling for Attention (placeholder)")
        issue_table.add_column("#", style="cyan")
        issue_table.add_column("Title", style="green")
        issue_table.add_column("Type", style="yellow")
        issue_table.add_column("Priority", style="magenta")
        issue_table.add_row(
            "—", "[dim]No issues fetched (Gitea API not configured)[/dim]", "—", "—"
        )
        console.print(issue_table)
        console.print()
        _show_issues_table(limit)


@app.command()
@@ -2547,3 +2547,120 @@
.tower-adv-title { font-size: 0.85rem; font-weight: 600; color: var(--text-bright); }
.tower-adv-detail { font-size: 0.8rem; color: var(--text); margin-top: 2px; }
.tower-adv-action { font-size: 0.75rem; color: var(--green); margin-top: 4px; font-style: italic; }


/* ── Voice settings ───────────────────────────────────────── */
.voice-settings-page { max-width: 600px; margin: 0 auto; }

.vs-field { margin-bottom: 1.5rem; }

.vs-label {
  display: block;
  font-size: 0.75rem;
  font-weight: 700;
  letter-spacing: 0.1em;
  color: var(--text-dim);
  margin-bottom: 0.5rem;
}
.vs-value { color: var(--green); font-family: var(--font); }

.vs-slider {
  width: 100%;
  -webkit-appearance: none;
  appearance: none;
  height: 4px;
  background: var(--border);
  border-radius: 2px;
  outline: none;
  cursor: pointer;
}
.vs-slider::-webkit-slider-thumb {
  -webkit-appearance: none;
  appearance: none;
  width: 18px;
  height: 18px;
  border-radius: 50%;
  background: var(--purple);
  cursor: pointer;
  box-shadow: 0 0 6px rgba(124, 58, 237, 0.5);
  transition: box-shadow 0.2s;
}
.vs-slider::-webkit-slider-thumb:hover { box-shadow: 0 0 12px rgba(124, 58, 237, 0.8); }
.vs-slider::-moz-range-thumb {
  width: 18px;
  height: 18px;
  border-radius: 50%;
  background: var(--purple);
  cursor: pointer;
  border: none;
  box-shadow: 0 0 6px rgba(124, 58, 237, 0.5);
}
.vs-range-labels {
  display: flex;
  justify-content: space-between;
  font-size: 0.7rem;
  color: var(--text-dim);
  margin-top: 0.25rem;
}

.vs-select,
.vs-input {
  width: 100%;
  padding: 0.5rem 0.75rem;
  background: var(--bg-card);
  border: 1px solid var(--border);
  border-radius: var(--radius-sm);
  color: var(--text);
  font-family: var(--font);
  font-size: 0.9rem;
}
.vs-select { cursor: pointer; }
.vs-select:focus,
.vs-input:focus {
  outline: none;
  border-color: var(--purple);
  box-shadow: 0 0 0 2px rgba(124, 58, 237, 0.2);
}

.vs-unavailable {
  font-size: 0.85rem;
  color: var(--text-dim);
  padding: 0.5rem 0.75rem;
  border: 1px dashed var(--border);
  border-radius: var(--radius-sm);
}

.vs-actions {
  display: flex;
  gap: 0.75rem;
  margin-top: 1.5rem;
  flex-wrap: wrap;
}
.vs-btn-preview,
.vs-btn-save {
  flex: 1;
  padding: 0.6rem 1.2rem;
  border-radius: var(--radius-sm);
  font-family: var(--font);
  font-size: 0.85rem;
  font-weight: 700;
  letter-spacing: 0.08em;
  cursor: pointer;
  min-height: 44px;
  transition: opacity 0.2s, box-shadow 0.2s, background 0.2s;
}
.vs-btn-preview {
  background: transparent;
  border: 1px solid var(--purple);
  color: var(--purple);
}
.vs-btn-preview:hover {
  background: rgba(124, 58, 237, 0.15);
  box-shadow: 0 0 8px rgba(124, 58, 237, 0.3);
}
.vs-btn-save {
  background: var(--green);
  border: none;
  color: var(--bg-deep);
}
.vs-btn-save:hover { opacity: 0.85; }
@@ -51,6 +51,9 @@ def pytest_collection_modifyitems(config, items):
             item.add_marker(pytest.mark.docker)
             item.add_marker(pytest.mark.skip_ci)
 
+        if "setup_prod" in test_path or "setup_script" in test_path:
+            item.add_marker(pytest.mark.skip_ci)
+
         if "ollama" in test_path or "test_ollama" in item.name:
             item.add_marker(pytest.mark.ollama)
 
530 tests/dashboard/test_daily_run.py Normal file
@@ -0,0 +1,530 @@
"""Unit tests for dashboard/routes/daily_run.py."""

from __future__ import annotations

import json
import os
from datetime import UTC, datetime, timedelta
from pathlib import Path
from unittest.mock import MagicMock, patch
from urllib.error import HTTPError, URLError

import pytest

from dashboard.routes.daily_run import (
    DEFAULT_CONFIG,
    LAYER_LABELS,
    DailyRunMetrics,
    GiteaClient,
    LayerMetrics,
    _extract_layer,
    _fetch_layer_metrics,
    _get_metrics,
    _get_token,
    _load_config,
    _load_cycle_data,
)


# ---------------------------------------------------------------------------
# _load_config
# ---------------------------------------------------------------------------


def test_load_config_returns_defaults():
    with patch("dashboard.routes.daily_run.CONFIG_PATH") as mock_path:
        mock_path.exists.return_value = False
        config = _load_config()
    assert config["gitea_api"] == DEFAULT_CONFIG["gitea_api"]
    assert config["repo_slug"] == DEFAULT_CONFIG["repo_slug"]


def test_load_config_merges_file_orchestrator_section(tmp_path):
    config_file = tmp_path / "daily_run.json"
    config_file.write_text(
        json.dumps({"orchestrator": {"repo_slug": "custom/repo", "gitea_api": "http://custom:3000/api/v1"}})
    )
    with patch("dashboard.routes.daily_run.CONFIG_PATH", config_file):
        config = _load_config()
    assert config["repo_slug"] == "custom/repo"
    assert config["gitea_api"] == "http://custom:3000/api/v1"


def test_load_config_ignores_invalid_json(tmp_path):
    config_file = tmp_path / "daily_run.json"
    config_file.write_text("not valid json{{")
    with patch("dashboard.routes.daily_run.CONFIG_PATH", config_file):
        config = _load_config()
    assert config["repo_slug"] == DEFAULT_CONFIG["repo_slug"]


def test_load_config_env_overrides(monkeypatch):
    monkeypatch.setenv("TIMMY_GITEA_API", "http://envapi:3000/api/v1")
    monkeypatch.setenv("TIMMY_REPO_SLUG", "env/repo")
    monkeypatch.setenv("TIMMY_GITEA_TOKEN", "env-token-123")
    with patch("dashboard.routes.daily_run.CONFIG_PATH") as mock_path:
        mock_path.exists.return_value = False
        config = _load_config()
    assert config["gitea_api"] == "http://envapi:3000/api/v1"
    assert config["repo_slug"] == "env/repo"
    assert config["token"] == "env-token-123"


def test_load_config_no_env_overrides_without_vars(monkeypatch):
    monkeypatch.delenv("TIMMY_GITEA_API", raising=False)
    monkeypatch.delenv("TIMMY_REPO_SLUG", raising=False)
    monkeypatch.delenv("TIMMY_GITEA_TOKEN", raising=False)
    with patch("dashboard.routes.daily_run.CONFIG_PATH") as mock_path:
        mock_path.exists.return_value = False
        config = _load_config()
    assert "token" not in config


# ---------------------------------------------------------------------------
# _get_token
# ---------------------------------------------------------------------------


def test_get_token_from_config_dict():
    config = {"token": "direct-token", "token_file": "~/.hermes/gitea_token"}
    assert _get_token(config) == "direct-token"


def test_get_token_from_file(tmp_path):
    token_file = tmp_path / "token.txt"
    token_file.write_text(" file-token \n")
    config = {"token_file": str(token_file)}
    assert _get_token(config) == "file-token"


def test_get_token_returns_none_when_file_missing(tmp_path):
    config = {"token_file": str(tmp_path / "nonexistent_token")}
    assert _get_token(config) is None


# ---------------------------------------------------------------------------
# GiteaClient
# ---------------------------------------------------------------------------


def _make_client(**kwargs) -> GiteaClient:
    config = {**DEFAULT_CONFIG, **kwargs}
    return GiteaClient(config, token="test-token")


def test_gitea_client_headers_include_auth():
    client = _make_client()
    headers = client._headers()
    assert headers["Authorization"] == "token test-token"
    assert headers["Accept"] == "application/json"


def test_gitea_client_headers_no_token():
    config = {**DEFAULT_CONFIG}
    client = GiteaClient(config, token=None)
    headers = client._headers()
    assert "Authorization" not in headers


def test_gitea_client_api_url():
    client = _make_client()
    url = client._api_url("issues")
    assert url == f"{DEFAULT_CONFIG['gitea_api']}/repos/{DEFAULT_CONFIG['repo_slug']}/issues"


def test_gitea_client_api_url_strips_trailing_slash():
    config = {**DEFAULT_CONFIG, "gitea_api": "http://localhost:3000/api/v1/"}
    client = GiteaClient(config, token=None)
    url = client._api_url("issues")
    assert "//" not in url.replace("http://", "")


def test_gitea_client_is_available_true():
    client = _make_client()
    mock_resp = MagicMock()
    mock_resp.status = 200
    mock_resp.__enter__ = lambda s: mock_resp
    mock_resp.__exit__ = MagicMock(return_value=False)
    with patch("dashboard.routes.daily_run.urlopen", return_value=mock_resp):
        assert client.is_available() is True


def test_gitea_client_is_available_cached():
    client = _make_client()
    client._available = True
    # Should not call urlopen at all
    with patch("dashboard.routes.daily_run.urlopen") as mock_urlopen:
        assert client.is_available() is True
        mock_urlopen.assert_not_called()


def test_gitea_client_is_available_false_on_url_error():
    client = _make_client()
    with patch("dashboard.routes.daily_run.urlopen", side_effect=URLError("refused")):
        assert client.is_available() is False


def test_gitea_client_is_available_false_on_timeout():
    client = _make_client()
    with patch("dashboard.routes.daily_run.urlopen", side_effect=TimeoutError()):
        assert client.is_available() is False


def test_gitea_client_get_paginated_single_page():
    client = _make_client()
    mock_resp = MagicMock()
    mock_resp.read.return_value = json.dumps([{"id": 1}, {"id": 2}]).encode()
    mock_resp.__enter__ = lambda s: mock_resp
    mock_resp.__exit__ = MagicMock(return_value=False)
    with patch("dashboard.routes.daily_run.urlopen", return_value=mock_resp):
        result = client.get_paginated("issues")
    assert len(result) == 2
    assert result[0]["id"] == 1


def test_gitea_client_get_paginated_empty():
    client = _make_client()
    mock_resp = MagicMock()
    mock_resp.read.return_value = b"[]"
    mock_resp.__enter__ = lambda s: mock_resp
    mock_resp.__exit__ = MagicMock(return_value=False)
    with patch("dashboard.routes.daily_run.urlopen", return_value=mock_resp):
        result = client.get_paginated("issues")
    assert result == []


# ---------------------------------------------------------------------------
# LayerMetrics.trend
# ---------------------------------------------------------------------------


def test_layer_metrics_trend_no_previous_no_current():
    lm = LayerMetrics(name="triage", label="layer:triage", current_count=0, previous_count=0)
    assert lm.trend == "→"


def test_layer_metrics_trend_no_previous_with_current():
    lm = LayerMetrics(name="triage", label="layer:triage", current_count=5, previous_count=0)
    assert lm.trend == "↑"


def test_layer_metrics_trend_big_increase():
    lm = LayerMetrics(name="triage", label="layer:triage", current_count=130, previous_count=100)
    assert lm.trend == "↑↑"


def test_layer_metrics_trend_small_increase():
    lm = LayerMetrics(name="triage", label="layer:triage", current_count=108, previous_count=100)
    assert lm.trend == "↑"


def test_layer_metrics_trend_stable():
    lm = LayerMetrics(name="triage", label="layer:triage", current_count=100, previous_count=100)
    assert lm.trend == "→"


def test_layer_metrics_trend_small_decrease():
    lm = LayerMetrics(name="triage", label="layer:triage", current_count=92, previous_count=100)
    assert lm.trend == "↓"


def test_layer_metrics_trend_big_decrease():
    lm = LayerMetrics(name="triage", label="layer:triage", current_count=70, previous_count=100)
    assert lm.trend == "↓↓"


def test_layer_metrics_trend_color_up():
    lm = LayerMetrics(name="triage", label="layer:triage", current_count=200, previous_count=100)
    assert lm.trend_color == "var(--green)"


def test_layer_metrics_trend_color_down():
    lm = LayerMetrics(name="triage", label="layer:triage", current_count=50, previous_count=100)
    assert lm.trend_color == "var(--amber)"


def test_layer_metrics_trend_color_stable():
    lm = LayerMetrics(name="triage", label="layer:triage", current_count=100, previous_count=100)
    assert lm.trend_color == "var(--text-dim)"


# ---------------------------------------------------------------------------
# DailyRunMetrics.sessions_trend
# ---------------------------------------------------------------------------


def _make_daily_metrics(**kwargs) -> DailyRunMetrics:
    defaults = dict(
        sessions_completed=10,
        sessions_previous=8,
        layers=[],
        total_touched_current=20,
        total_touched_previous=15,
        lookback_days=7,
        generated_at=datetime.now(UTC).isoformat(),
    )
    defaults.update(kwargs)
    return DailyRunMetrics(**defaults)


def test_daily_metrics_sessions_trend_big_increase():
    m = _make_daily_metrics(sessions_completed=130, sessions_previous=100)
    assert m.sessions_trend == "↑↑"


def test_daily_metrics_sessions_trend_stable():
    m = _make_daily_metrics(sessions_completed=100, sessions_previous=100)
    assert m.sessions_trend == "→"


def test_daily_metrics_sessions_trend_no_previous_zero_completed():
    m = _make_daily_metrics(sessions_completed=0, sessions_previous=0)
    assert m.sessions_trend == "→"


def test_daily_metrics_sessions_trend_no_previous_with_completed():
    m = _make_daily_metrics(sessions_completed=5, sessions_previous=0)
    assert m.sessions_trend == "↑"


def test_daily_metrics_sessions_trend_color_green():
    m = _make_daily_metrics(sessions_completed=200, sessions_previous=100)
    assert m.sessions_trend_color == "var(--green)"


def test_daily_metrics_sessions_trend_color_amber():
    m = _make_daily_metrics(sessions_completed=50, sessions_previous=100)
    assert m.sessions_trend_color == "var(--amber)"


# ---------------------------------------------------------------------------
# _extract_layer
# ---------------------------------------------------------------------------


def test_extract_layer_finds_layer_label():
    labels = [{"name": "bug"}, {"name": "layer:triage"}, {"name": "urgent"}]
    assert _extract_layer(labels) == "triage"


def test_extract_layer_returns_none_when_no_layer():
    labels = [{"name": "bug"}, {"name": "feature"}]
    assert _extract_layer(labels) is None


def test_extract_layer_empty_labels():
    assert _extract_layer([]) is None


def test_extract_layer_first_match_wins():
    labels = [{"name": "layer:micro-fix"}, {"name": "layer:tests"}]
    assert _extract_layer(labels) == "micro-fix"


# ---------------------------------------------------------------------------
# _load_cycle_data
# ---------------------------------------------------------------------------


def test_load_cycle_data_missing_file(tmp_path):
    with patch("dashboard.routes.daily_run.REPO_ROOT", tmp_path):
        result = _load_cycle_data(days=14)
    assert result == {"current": 0, "previous": 0}


def test_load_cycle_data_counts_successful_sessions(tmp_path):
    retro_dir = tmp_path / ".loop" / "retro"
    retro_dir.mkdir(parents=True)
    retro_file = retro_dir / "cycles.jsonl"

    now = datetime.now(UTC)
    recent_ts = (now - timedelta(days=3)).isoformat()
    older_ts = (now - timedelta(days=10)).isoformat()
    old_ts = (now - timedelta(days=20)).isoformat()

    lines = [
        json.dumps({"timestamp": recent_ts, "success": True}),
        json.dumps({"timestamp": recent_ts, "success": False}),  # not counted
        json.dumps({"timestamp": older_ts, "success": True}),
        json.dumps({"timestamp": old_ts, "success": True}),  # outside window
    ]
    retro_file.write_text("\n".join(lines))

    with patch("dashboard.routes.daily_run.REPO_ROOT", tmp_path):
        result = _load_cycle_data(days=7)

    assert result["current"] == 1
    assert result["previous"] == 1


def test_load_cycle_data_skips_invalid_json_lines(tmp_path):
    retro_dir = tmp_path / ".loop" / "retro"
    retro_dir.mkdir(parents=True)
    retro_file = retro_dir / "cycles.jsonl"

    now = datetime.now(UTC)
    recent_ts = (now - timedelta(days=1)).isoformat()
    retro_file.write_text(
        f'not valid json\n{json.dumps({"timestamp": recent_ts, "success": True})}\n'
    )

    with patch("dashboard.routes.daily_run.REPO_ROOT", tmp_path):
        result = _load_cycle_data(days=7)

    assert result["current"] == 1


def test_load_cycle_data_skips_entries_with_no_timestamp(tmp_path):
    retro_dir = tmp_path / ".loop" / "retro"
    retro_dir.mkdir(parents=True)
    retro_file = retro_dir / "cycles.jsonl"
    retro_file.write_text(json.dumps({"success": True}))

    with patch("dashboard.routes.daily_run.REPO_ROOT", tmp_path):
        result = _load_cycle_data(days=7)

    assert result == {"current": 0, "previous": 0}


# ---------------------------------------------------------------------------
# _fetch_layer_metrics
# ---------------------------------------------------------------------------


def _make_issue(updated_offset_days: int) -> dict:
    ts = (datetime.now(UTC) - timedelta(days=updated_offset_days)).isoformat()
    return {"updated_at": ts, "labels": [{"name": "layer:triage"}]}


def test_fetch_layer_metrics_counts_current_and_previous():
    client = _make_client()
    client._available = True

    recent_issue = _make_issue(updated_offset_days=3)
    older_issue = _make_issue(updated_offset_days=10)

    with patch.object(client, "get_paginated", return_value=[recent_issue, older_issue]):
        layers, total_current, total_previous = _fetch_layer_metrics(client, lookback_days=7)

    # Should have one entry per LAYER_LABELS
    assert len(layers) == len(LAYER_LABELS)
    triage = next(lm for lm in layers if lm.name == "triage")
    assert triage.current_count == 1
    assert triage.previous_count == 1


def test_fetch_layer_metrics_degrades_on_http_error():
    client = _make_client()
    client._available = True

    with patch.object(client, "get_paginated", side_effect=URLError("network")):
        layers, total_current, total_previous = _fetch_layer_metrics(client, lookback_days=7)

    assert len(layers) == len(LAYER_LABELS)
    for lm in layers:
        assert lm.current_count == 0
        assert lm.previous_count == 0
    assert total_current == 0
    assert total_previous == 0


# ---------------------------------------------------------------------------
# _get_metrics
# ---------------------------------------------------------------------------


def test_get_metrics_returns_none_when_gitea_unavailable():
    with patch("dashboard.routes.daily_run._load_config", return_value=DEFAULT_CONFIG):
        with patch("dashboard.routes.daily_run._get_token", return_value=None):
            with patch.object(GiteaClient, "is_available", return_value=False):
                result = _get_metrics()
    assert result is None


def test_get_metrics_returns_daily_run_metrics():
    mock_layers = [
        LayerMetrics(name="triage", label="layer:triage", current_count=5, previous_count=3)
    ]
    with patch("dashboard.routes.daily_run._load_config", return_value=DEFAULT_CONFIG):
        with patch("dashboard.routes.daily_run._get_token", return_value="tok"):
            with patch.object(GiteaClient, "is_available", return_value=True):
                with patch(
                    "dashboard.routes.daily_run._fetch_layer_metrics",
                    return_value=(mock_layers, 5, 3),
                ):
                    with patch(
                        "dashboard.routes.daily_run._load_cycle_data",
                        return_value={"current": 10, "previous": 8},
                    ):
                        result = _get_metrics(lookback_days=7)

    assert result is not None
    assert result.sessions_completed == 10
    assert result.sessions_previous == 8
    assert result.lookback_days == 7
    assert result.layers == mock_layers


def test_get_metrics_returns_none_on_exception():
    with patch("dashboard.routes.daily_run._load_config", return_value=DEFAULT_CONFIG):
        with patch("dashboard.routes.daily_run._get_token", return_value="tok"):
            with patch.object(GiteaClient, "is_available", return_value=True):
                with patch(
                    "dashboard.routes.daily_run._fetch_layer_metrics",
                    side_effect=Exception("unexpected"),
                ):
                    result = _get_metrics()
    assert result is None


# ---------------------------------------------------------------------------
# Route handlers (FastAPI)
# ---------------------------------------------------------------------------


def test_daily_run_metrics_api_unavailable(client):
    with patch("dashboard.routes.daily_run._get_metrics", return_value=None):
        resp = client.get("/daily-run/metrics")
    assert resp.status_code == 503
    data = resp.json()
    assert data["status"] == "unavailable"


def test_daily_run_metrics_api_returns_json(client):
    mock_metrics = _make_daily_metrics(
        layers=[
            LayerMetrics(name="triage", label="layer:triage", current_count=3, previous_count=2)
        ]
    )
    with patch("dashboard.routes.daily_run._get_metrics", return_value=mock_metrics):
        with patch(
            "dashboard.routes.quests.check_daily_run_quests",
            return_value=[],
            create=True,
        ):
            resp = client.get("/daily-run/metrics?lookback_days=7")
    assert resp.status_code == 200
    data = resp.json()
    assert data["status"] == "ok"
    assert data["lookback_days"] == 7
    assert "sessions" in data
    assert "layers" in data
    assert "totals" in data
    assert len(data["layers"]) == 1
    assert data["layers"][0]["name"] == "triage"


def test_daily_run_panel_returns_html(client):
    mock_metrics = _make_daily_metrics()
    with patch("dashboard.routes.daily_run._get_metrics", return_value=mock_metrics):
        with patch("dashboard.routes.daily_run._load_config", return_value=DEFAULT_CONFIG):
            resp = client.get("/daily-run/panel")
    assert resp.status_code == 200
    assert "text/html" in resp.headers["content-type"]


def test_daily_run_panel_when_unavailable(client):
    with patch("dashboard.routes.daily_run._get_metrics", return_value=None):
        with patch("dashboard.routes.daily_run._load_config", return_value=DEFAULT_CONFIG):
            resp = client.get("/daily-run/panel")
    assert resp.status_code == 200
@@ -11,10 +11,13 @@ PROD_PROJECT_DIR = Path("/home/ubuntu/prod-sovereign-stack")
 PROD_VAULT_DIR = PROD_PROJECT_DIR / "TimmyVault"
 SETUP_SCRIPT_PATH = Path("/home/ubuntu/setup_timmy.sh")
 
-pytestmark = pytest.mark.skipif(
-    not SETUP_SCRIPT_PATH.exists(),
-    reason=f"Setup script not found at {SETUP_SCRIPT_PATH}",
-)
+pytestmark = [
+    pytest.mark.skip_ci,
+    pytest.mark.skipif(
+        not SETUP_SCRIPT_PATH.exists(),
+        reason=f"Setup script not found at {SETUP_SCRIPT_PATH}",
+    ),
+]
 
 
 @pytest.fixture(scope="module", autouse=True)
439 tests/infrastructure/test_metabolic_router.py Normal file
@@ -0,0 +1,439 @@
|
||||
"""Tests for the three-tier metabolic LLM router (issue #966)."""
|
||||
|
||||
from unittest.mock import AsyncMock, MagicMock
|
||||
|
||||
import pytest
|
||||
|
||||
from infrastructure.router.metabolic import (
|
||||
DEFAULT_TIER_MODELS,
|
||||
MetabolicRouter,
|
||||
ModelTier,
|
||||
build_prompt,
|
||||
classify_complexity,
|
||||
get_metabolic_router,
|
||||
)
|
||||
|
||||
pytestmark = pytest.mark.unit
|
||||
|
||||
# ── classify_complexity ──────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestClassifyComplexity:
|
||||
"""Verify tier classification for representative task / state pairs."""
|
||||
|
||||
# ── T1: Routine ─────────────────────────────────────────────────────────
|
||||
|
||||
def test_simple_navigation_is_t1(self):
|
||||
assert classify_complexity("go north", {}) == ModelTier.T1_ROUTINE
|
||||
|
||||
def test_single_action_is_t1(self):
|
||||
assert classify_complexity("open door", {}) == ModelTier.T1_ROUTINE
|
||||
|
||||
def test_t1_with_extra_words_stays_t1(self):
|
||||
# 6 words, all T1 territory, no active context
|
||||
assert classify_complexity("go south and take it", {}) == ModelTier.T1_ROUTINE
|
||||
|
||||
def test_t1_long_task_upgrades_to_t2(self):
|
||||
# More than 6 words → not T1 even with nav words
|
||||
assert (
|
||||
classify_complexity("go north and then move east and pick up the sword", {})
|
||||
!= ModelTier.T1_ROUTINE
|
||||
)
|
||||
|
||||
def test_active_quest_upgrades_t1_to_t2(self):
|
||||
state = {"active_quests": ["Rescue the Mage"]}
|
||||
assert classify_complexity("go north", state) == ModelTier.T2_MEDIUM
|
||||
|
||||
def test_dialogue_active_upgrades_t1_to_t2(self):
|
||||
state = {"dialogue_active": True}
|
||||
assert classify_complexity("yes", state) == ModelTier.T2_MEDIUM
|
||||
|
||||
def test_combat_active_upgrades_t1_to_t2(self):
|
||||
state = {"combat_active": True}
|
||||
assert classify_complexity("attack", state) == ModelTier.T2_MEDIUM
|
||||
|
||||
# ── T2: Medium ──────────────────────────────────────────────────────────
|
||||
|
||||
def test_default_is_t2(self):
|
||||
assert classify_complexity("what do I have in my inventory", {}) == ModelTier.T2_MEDIUM
|
||||
|
||||
def test_dialogue_response_is_t2(self):
|
||||
state = {"dialogue_active": True, "dialogue_npc": "Caius Cosades"}
|
||||
result = classify_complexity("I'm looking for Caius Cosades", state)
|
||||
assert result == ModelTier.T2_MEDIUM
|
||||
|
||||
# ── T3: Complex ─────────────────────────────────────────────────────────
|
||||
|
||||
def test_quest_planning_is_t3(self):
|
||||
assert classify_complexity("plan my quest route", {}) == ModelTier.T3_COMPLEX
|
||||
|
||||
def test_strategy_keyword_is_t3(self):
|
||||
assert classify_complexity("what is the best strategy", {}) == ModelTier.T3_COMPLEX
|
||||
|
||||
def test_stuck_keyword_is_t3(self):
|
||||
assert classify_complexity("I am stuck", {}) == ModelTier.T3_COMPLEX
|
||||
|
||||
def test_stuck_state_is_t3(self):
|
||||
assert classify_complexity("help me", {"stuck": True}) == ModelTier.T3_COMPLEX
|
||||
|
||||
def test_require_t3_flag_forces_t3(self):
|
||||
state = {"require_t3": True}
|
||||
assert classify_complexity("go north", state) == ModelTier.T3_COMPLEX
|
||||
|
||||
def test_optimize_keyword_is_t3(self):
|
||||
assert classify_complexity("optimize my skill build", {}) == ModelTier.T3_COMPLEX
|
||||
|
||||
def test_multi_word_t3_phrase(self):
|
||||
assert classify_complexity("how do i get past the guards", {}) == ModelTier.T3_COMPLEX
|
||||
|
||||
def test_case_insensitive(self):
|
||||
assert classify_complexity("PLAN my route", {}) == ModelTier.T3_COMPLEX
|
||||
|
||||
|
||||
# ── build_prompt ─────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestBuildPrompt:
|
||||
"""Verify prompt structure and content assembly."""
|
||||
|
||||
def test_returns_two_messages(self):
|
||||
msgs = build_prompt({}, {}, "go north")
|
||||
assert len(msgs) == 2
|
||||
assert msgs[0]["role"] == "system"
|
||||
assert msgs[1]["role"] == "user"
|
||||
|
||||
def test_user_message_contains_task(self):
|
||||
msgs = build_prompt({}, {}, "pick up the sword")
|
||||
assert msgs[1]["content"] == "pick up the sword"
|
||||
|
||||
def test_location_in_system(self):
|
||||
msgs = build_prompt({"location": "Balmora"}, {}, "look around")
|
||||
assert "Balmora" in msgs[0]["content"]
|
||||
|
||||
def test_health_in_system(self):
|
||||
msgs = build_prompt({"health": 42}, {}, "rest")
|
||||
assert "42" in msgs[0]["content"]
|
||||
|
||||
def test_inventory_in_system(self):
|
||||
msgs = build_prompt({"inventory": ["iron sword", "bread"]}, {}, "use item")
|
||||
assert "iron sword" in msgs[0]["content"]
|
||||
|
||||
def test_inventory_truncated_to_10(self):
|
||||
inventory = [f"item{i}" for i in range(20)]
|
||||
msgs = build_prompt({"inventory": inventory}, {}, "check")
|
||||
# Only first 10 should appear in the system message
|
||||
assert "item10" not in msgs[0]["content"]
|
||||
|
||||
def test_active_quests_in_system(self):
|
||||
msgs = build_prompt({"active_quests": ["Morrowind Main Quest"]}, {}, "help")
|
||||
assert "Morrowind Main Quest" in msgs[0]["content"]
|
||||
|
||||
def test_stuck_indicator_in_system(self):
|
||||
msgs = build_prompt({"stuck": True}, {}, "what now")
|
||||
assert "STUCK" in msgs[0]["content"]
|
||||
|
||||
def test_dialogue_npc_in_system(self):
|
||||
msgs = build_prompt({}, {"dialogue_active": True, "dialogue_npc": "Vivec"}, "hello")
|
||||
assert "Vivec" in msgs[0]["content"]
|
||||
|
||||
def test_menu_open_in_system(self):
|
||||
msgs = build_prompt({}, {"menu_open": "inventory"}, "check items")
|
||||
assert "inventory" in msgs[0]["content"]
|
||||
|
||||
def test_combat_active_in_system(self):
|
||||
msgs = build_prompt({}, {"combat_active": True}, "attack")
|
||||
assert "COMBAT" in msgs[0]["content"]
|
||||
|
||||
def test_visual_context_in_system(self):
|
||||
msgs = build_prompt({}, {}, "where am I", visual_context="A dark dungeon corridor")
|
||||
assert "dungeon corridor" in msgs[0]["content"]
|
||||
|
||||
def test_missing_optional_fields_omitted(self):
|
||||
msgs = build_prompt({}, {}, "move forward")
|
||||
system = msgs[0]["content"]
|
||||
assert "Health:" not in system
|
||||
assert "Inventory:" not in system
|
||||
assert "Active quests:" not in system
|
||||
|
||||
def test_inventory_dict_items(self):
|
||||
inventory = [{"name": "silver dagger"}, {"name": "potion"}]
|
||||
msgs = build_prompt({"inventory": inventory}, {}, "use")
|
||||
assert "silver dagger" in msgs[0]["content"]
|
||||
|
||||
def test_quest_dict_items(self):
|
||||
quests = [{"name": "The Warlord"}, {"name": "Lost in Translation"}]
|
||||
msgs = build_prompt({"active_quests": quests}, {}, "help")
|
||||
assert "The Warlord" in msgs[0]["content"]
|
||||
|
||||
|
||||
# ── MetabolicRouter ──────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
class TestMetabolicRouter:
|
||||
"""Test MetabolicRouter routing, tier labelling, and T3 world-pause logic."""
|
||||
|
||||
def _make_router(self, mock_cascade=None):
|
||||
"""Create a MetabolicRouter with a mocked CascadeRouter."""
|
||||
if mock_cascade is None:
|
||||
mock_cascade = MagicMock()
|
||||
mock_cascade.complete = AsyncMock(
|
||||
return_value={
|
||||
"content": "Move north confirmed.",
|
||||
"provider": "ollama-local",
|
||||
"model": "qwen3:8b",
|
||||
"latency_ms": 120.0,
|
||||
}
|
||||
)
|
||||
return MetabolicRouter(cascade=mock_cascade)
|
||||
|
||||
async def test_route_returns_tier_in_result(self):
|
||||
router = self._make_router()
|
||||
        result = await router.route("go north", state={})
        assert "tier" in result
        assert result["tier"] == ModelTier.T1_ROUTINE

    async def test_t1_uses_t1_model(self):
        mock_cascade = MagicMock()
        mock_cascade.complete = AsyncMock(
            return_value={
                "content": "ok",
                "provider": "ollama-local",
                "model": "qwen3:8b",
                "latency_ms": 100,
            }
        )
        router = MetabolicRouter(cascade=mock_cascade)
        await router.route("go north", state={})
        call_kwargs = mock_cascade.complete.call_args
        assert call_kwargs.kwargs["model"] == DEFAULT_TIER_MODELS[ModelTier.T1_ROUTINE]

    async def test_t2_uses_t2_model(self):
        mock_cascade = MagicMock()
        mock_cascade.complete = AsyncMock(
            return_value={
                "content": "ok",
                "provider": "ollama-local",
                "model": "qwen3:14b",
                "latency_ms": 300,
            }
        )
        router = MetabolicRouter(cascade=mock_cascade)
        await router.route("what should I say to the innkeeper", state={})
        call_kwargs = mock_cascade.complete.call_args
        assert call_kwargs.kwargs["model"] == DEFAULT_TIER_MODELS[ModelTier.T2_MEDIUM]

    async def test_t3_uses_t3_model(self):
        mock_cascade = MagicMock()
        mock_cascade.complete = AsyncMock(
            return_value={
                "content": "ok",
                "provider": "ollama-local",
                "model": "qwen3:30b",
                "latency_ms": 2000,
            }
        )
        router = MetabolicRouter(cascade=mock_cascade)
        await router.route("plan the optimal quest route", state={})
        call_kwargs = mock_cascade.complete.call_args
        assert call_kwargs.kwargs["model"] == DEFAULT_TIER_MODELS[ModelTier.T3_COMPLEX]

    async def test_custom_tier_models_respected(self):
        mock_cascade = MagicMock()
        mock_cascade.complete = AsyncMock(
            return_value={
                "content": "ok",
                "provider": "test",
                "model": "custom-8b",
                "latency_ms": 100,
            }
        )
        custom = {ModelTier.T1_ROUTINE: "custom-8b"}
        router = MetabolicRouter(cascade=mock_cascade, tier_models=custom)
        await router.route("go north", state={})
        call_kwargs = mock_cascade.complete.call_args
        assert call_kwargs.kwargs["model"] == "custom-8b"

    async def test_t3_pauses_world_before_inference(self):
        mock_cascade = MagicMock()
        mock_cascade.complete = AsyncMock(
            return_value={
                "content": "ok",
                "provider": "ollama",
                "model": "qwen3:30b",
                "latency_ms": 1500,
            }
        )
        router = MetabolicRouter(cascade=mock_cascade)

        pause_calls = []
        unpause_calls = []

        mock_world = MagicMock()

        def track_act(cmd):
            if cmd.action == "pause":
                pause_calls.append(cmd)
            elif cmd.action == "unpause":
                unpause_calls.append(cmd)

        mock_world.act = track_act
        router.set_world(mock_world)

        await router.route("plan the quest", state={})

        assert len(pause_calls) == 1, "world.pause() should be called once for T3"
        assert len(unpause_calls) == 1, "world.unpause() should be called once for T3"

    async def test_t3_unpauses_world_even_on_llm_error(self):
        """world.unpause() must be called even when the LLM raises."""
        mock_cascade = MagicMock()
        mock_cascade.complete = AsyncMock(side_effect=RuntimeError("LLM failed"))
        router = MetabolicRouter(cascade=mock_cascade)

        unpause_calls = []
        mock_world = MagicMock()
        mock_world.act = lambda cmd: unpause_calls.append(cmd) if cmd.action == "unpause" else None
        router.set_world(mock_world)

        with pytest.raises(RuntimeError, match="LLM failed"):
            await router.route("plan the quest", state={})

        assert len(unpause_calls) == 1, "world.unpause() must run even when LLM errors"

    async def test_t1_does_not_pause_world(self):
        mock_cascade = MagicMock()
        mock_cascade.complete = AsyncMock(
            return_value={
                "content": "ok",
                "provider": "ollama",
                "model": "qwen3:8b",
                "latency_ms": 120,
            }
        )
        router = MetabolicRouter(cascade=mock_cascade)

        pause_calls = []
        mock_world = MagicMock()
        mock_world.act = lambda cmd: pause_calls.append(cmd)
        router.set_world(mock_world)

        await router.route("go north", state={})

        assert len(pause_calls) == 0, "world.pause() must NOT be called for T1"

    async def test_t2_does_not_pause_world(self):
        mock_cascade = MagicMock()
        mock_cascade.complete = AsyncMock(
            return_value={
                "content": "ok",
                "provider": "ollama",
                "model": "qwen3:14b",
                "latency_ms": 350,
            }
        )
        router = MetabolicRouter(cascade=mock_cascade)

        pause_calls = []
        mock_world = MagicMock()
        mock_world.act = lambda cmd: pause_calls.append(cmd)
        router.set_world(mock_world)

        await router.route("talk to the merchant", state={})

        assert len(pause_calls) == 0, "world.pause() must NOT be called for T2"

    async def test_broken_world_adapter_degrades_gracefully(self):
        """If world.act() raises, inference must still complete."""
        mock_cascade = MagicMock()
        mock_cascade.complete = AsyncMock(
            return_value={
                "content": "done",
                "provider": "ollama",
                "model": "qwen3:30b",
                "latency_ms": 2000,
            }
        )
        router = MetabolicRouter(cascade=mock_cascade)

        mock_world = MagicMock()
        mock_world.act = MagicMock(side_effect=RuntimeError("world broken"))
        router.set_world(mock_world)

        # Should not raise — degradation only logs a warning
        result = await router.route("plan the quest", state={})
        assert result["content"] == "done"

    async def test_no_world_adapter_t3_still_works(self):
        mock_cascade = MagicMock()
        mock_cascade.complete = AsyncMock(
            return_value={
                "content": "plan done",
                "provider": "ollama",
                "model": "qwen3:30b",
                "latency_ms": 2000,
            }
        )
        router = MetabolicRouter(cascade=mock_cascade)
        # No set_world() called

        result = await router.route("plan the quest route", state={})
        assert result["content"] == "plan done"
        assert result["tier"] == ModelTier.T3_COMPLEX

    async def test_classify_delegates_to_module_function(self):
        router = MetabolicRouter(cascade=MagicMock())
        assert router.classify("go north", {}) == classify_complexity("go north", {})
        assert router.classify("plan the quest", {}) == classify_complexity("plan the quest", {})

    async def test_ui_state_defaults_to_empty_dict(self):
        """Calling route without ui_state should not raise."""
        mock_cascade = MagicMock()
        mock_cascade.complete = AsyncMock(
            return_value={
                "content": "ok",
                "provider": "ollama",
                "model": "qwen3:8b",
                "latency_ms": 100,
            }
        )
        router = MetabolicRouter(cascade=mock_cascade)
        # No ui_state argument
        result = await router.route("go north", state={})
        assert result["content"] == "ok"

    async def test_temperature_and_max_tokens_forwarded(self):
        mock_cascade = MagicMock()
        mock_cascade.complete = AsyncMock(
            return_value={
                "content": "ok",
                "provider": "ollama",
                "model": "qwen3:14b",
                "latency_ms": 200,
            }
        )
        router = MetabolicRouter(cascade=mock_cascade)
        await router.route("describe the scene", state={}, temperature=0.1, max_tokens=50)
        call_kwargs = mock_cascade.complete.call_args.kwargs
        assert call_kwargs["temperature"] == 0.1
        assert call_kwargs["max_tokens"] == 50


class TestGetMetabolicRouter:
    """Test module-level singleton."""

    def test_returns_metabolic_router_instance(self):
        import infrastructure.router.metabolic as m_module

        # Reset singleton for clean test
        m_module._metabolic_router = None
        router = get_metabolic_router()
        assert isinstance(router, MetabolicRouter)

    def test_singleton_returns_same_instance(self):
        import infrastructure.router.metabolic as m_module

        m_module._metabolic_router = None
        r1 = get_metabolic_router()
        r2 = get_metabolic_router()
        assert r1 is r2

@@ -2,7 +2,7 @@

import time
from pathlib import Path
from unittest.mock import AsyncMock, patch
from unittest.mock import AsyncMock, MagicMock, patch

import pytest
import yaml

@@ -10,13 +10,16 @@ import yaml

from infrastructure.router.cascade import (
    CascadeRouter,
    CircuitState,
    ContentType,
    Provider,
    ProviderMetrics,
    ProviderStatus,
    RouterConfig,
    get_router,
)


@pytest.mark.unit
class TestProviderMetrics:
    """Test provider metrics tracking."""

@@ -45,6 +48,7 @@
        assert metrics.error_rate == 0.3


@pytest.mark.unit
class TestProvider:
    """Test Provider dataclass."""

@@ -88,6 +92,7 @@ class TestProvider:
        assert provider.get_default_model() is None


@pytest.mark.unit
class TestRouterConfig:
    """Test router configuration."""

@@ -100,6 +105,7 @@ class TestRouterConfig:
        assert config.circuit_breaker_failure_threshold == 5


@pytest.mark.unit
class TestCascadeRouterInit:
    """Test CascadeRouter initialization."""

@@ -158,6 +164,7 @@ class TestCascadeRouterInit:
        assert router.providers[0].api_key == "secret123"


@pytest.mark.unit
class TestCascadeRouterMetrics:
    """Test metrics tracking."""

@@ -241,6 +248,7 @@ class TestCascadeRouterMetrics:
        assert provider.status == ProviderStatus.HEALTHY


@pytest.mark.unit
class TestCascadeRouterGetMetrics:
    """Test get_metrics method."""

@@ -280,6 +288,7 @@ class TestCascadeRouterGetMetrics:
        assert p_metrics["metrics"]["avg_latency_ms"] == 200.0


@pytest.mark.unit
class TestCascadeRouterGetStatus:
    """Test get_status method."""

@@ -305,6 +314,7 @@ class TestCascadeRouterGetStatus:
        assert len(status["providers"]) == 1


@pytest.mark.unit
@pytest.mark.asyncio
class TestCascadeRouterComplete:
    """Test complete method with failover."""

@@ -436,6 +446,7 @@ class TestCascadeRouterComplete:
        assert result["provider"] == "healthy"


@pytest.mark.unit
class TestProviderAvailabilityCheck:
    """Test provider availability checking."""

@@ -512,7 +523,7 @@ class TestProviderAvailabilityCheck:

    def test_check_vllm_mlx_server_healthy(self):
        """Test vllm-mlx when health check succeeds."""
        from unittest.mock import MagicMock, patch
        from unittest.mock import patch

        router = CascadeRouter(config_path=Path("/nonexistent"))

@@ -556,7 +567,7 @@ class TestProviderAvailabilityCheck:

    def test_check_vllm_mlx_default_url(self):
        """Test vllm-mlx uses default localhost:8000 when no URL configured."""
        from unittest.mock import MagicMock, patch
        from unittest.mock import patch

        router = CascadeRouter(config_path=Path("/nonexistent"))

@@ -577,6 +588,7 @@ class TestProviderAvailabilityCheck:
        mock_requests.get.assert_called_once_with("http://localhost:8000/health", timeout=5)


@pytest.mark.unit
@pytest.mark.asyncio
class TestVllmMlxProvider:
    """Test vllm-mlx provider integration."""

@@ -611,7 +623,7 @@ class TestVllmMlxProvider:

    async def test_vllm_mlx_base_url_normalization(self):
        """Test _call_vllm_mlx appends /v1 when missing."""
        from unittest.mock import AsyncMock, MagicMock, patch
        from unittest.mock import AsyncMock, patch

        router = CascadeRouter(config_path=Path("/nonexistent"))

@@ -681,6 +693,8 @@ class TestVllmMlxProvider:
        assert result["content"] == "Local MLX response"


@pytest.mark.unit
@pytest.mark.asyncio
class TestMetabolicProtocol:
    """Test metabolic protocol: cloud providers skip when quota is ACTIVE/RESTING."""

@@ -790,6 +804,7 @@ class TestMetabolicProtocol:
        assert result["content"] == "Cloud response"


@pytest.mark.unit
class TestCascadeRouterReload:
    """Test hot-reload of providers.yaml."""

@@ -968,3 +983,396 @@ class TestCascadeRouterReload:

        assert router.providers[0].name == "low-priority"
        assert router.providers[1].name == "high-priority"


@pytest.mark.unit
class TestContentTypeDetection:
    """Test _detect_content_type logic."""

    def _router(self) -> CascadeRouter:
        return CascadeRouter(config_path=Path("/nonexistent"))

    def test_text_only(self):
        router = self._router()
        msgs = [{"role": "user", "content": "Hello"}]
        assert router._detect_content_type(msgs) == ContentType.TEXT

    def test_images_key_triggers_vision(self):
        router = self._router()
        msgs = [{"role": "user", "content": "Describe this", "images": ["pic.jpg"]}]
        assert router._detect_content_type(msgs) == ContentType.VISION

    def test_image_extension_in_content_triggers_vision(self):
        router = self._router()
        msgs = [{"role": "user", "content": "Look at photo.png please"}]
        assert router._detect_content_type(msgs) == ContentType.VISION

    def test_base64_data_uri_triggers_vision(self):
        router = self._router()
        msgs = [{"role": "user", "content": "data:image/jpeg;base64,/9j/4AA..."}]
        assert router._detect_content_type(msgs) == ContentType.VISION

    def test_audio_key_triggers_audio(self):
        router = self._router()
        msgs = [{"role": "user", "content": "", "audio": b"bytes"}]
        assert router._detect_content_type(msgs) == ContentType.AUDIO

    def test_image_and_audio_triggers_multimodal(self):
        router = self._router()
        msgs = [
            {"role": "user", "content": "check photo.jpg", "audio": b"bytes"},
        ]
        assert router._detect_content_type(msgs) == ContentType.MULTIMODAL

    def test_list_content_image_url_type(self):
        router = self._router()
        msgs = [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What?"},
                    {"type": "image_url", "image_url": {"url": "http://example.com/a.jpg"}},
                ],
            }
        ]
        assert router._detect_content_type(msgs) == ContentType.VISION

    def test_list_content_audio_type(self):
        router = self._router()
        msgs = [
            {
                "role": "user",
                "content": [
                    {"type": "audio", "data": "base64..."},
                ],
            }
        ]
        assert router._detect_content_type(msgs) == ContentType.AUDIO


@pytest.mark.unit
class TestTransformMessagesForOllama:
    """Test _transform_messages_for_ollama."""

    def _router(self) -> CascadeRouter:
        return CascadeRouter(config_path=Path("/nonexistent"))

    def test_plain_text_message(self):
        router = self._router()
        result = router._transform_messages_for_ollama([{"role": "user", "content": "Hello"}])
        assert result == [{"role": "user", "content": "Hello"}]

    def test_base64_image_stripped(self):
        router = self._router()
        msgs = [
            {
                "role": "user",
                "content": "Describe",
                "images": ["data:image/png;base64,abc123"],
            }
        ]
        result = router._transform_messages_for_ollama(msgs)
        assert result[0]["images"] == ["abc123"]

    def test_http_url_skipped(self):
        router = self._router()
        msgs = [
            {
                "role": "user",
                "content": "Describe",
                "images": ["http://example.com/img.jpg"],
            }
        ]
        result = router._transform_messages_for_ollama(msgs)
        # URL is skipped — images list should be empty or absent
        assert result[0].get("images", []) == []

    def test_missing_local_file_skipped(self):
        router = self._router()
        msgs = [
            {
                "role": "user",
                "content": "Describe",
                "images": ["/nonexistent/path/image.png"],
            }
        ]
        result = router._transform_messages_for_ollama(msgs)
        assert result[0].get("images", []) == []


@pytest.mark.unit
class TestProviderCapabilityMethods:
    """Test Provider.get_model_with_capability and model_has_capability."""

    def _provider(self) -> Provider:
        return Provider(
            name="test",
            type="ollama",
            enabled=True,
            priority=1,
            models=[
                {"name": "llava:7b", "capabilities": ["vision"]},
                {"name": "llama3.2", "default": True},
            ],
        )

    def test_get_model_with_capability_found(self):
        p = self._provider()
        assert p.get_model_with_capability("vision") == "llava:7b"

    def test_get_model_with_capability_falls_back_to_default(self):
        p = self._provider()
        assert p.get_model_with_capability("audio") == "llama3.2"

    def test_model_has_capability_true(self):
        p = self._provider()
        assert p.model_has_capability("llava:7b", "vision") is True

    def test_model_has_capability_false(self):
        p = self._provider()
        assert p.model_has_capability("llama3.2", "vision") is False

    def test_model_has_capability_unknown_model(self):
        p = self._provider()
        assert p.model_has_capability("unknown-model", "vision") is False


@pytest.mark.unit
class TestGetFallbackModel:
    """Test _get_fallback_model."""

    def _router_with_provider(self) -> tuple[CascadeRouter, Provider]:
        router = CascadeRouter(config_path=Path("/nonexistent"))
        provider = Provider(
            name="test",
            type="ollama",
            enabled=True,
            priority=1,
            models=[
                {"name": "llava:7b", "capabilities": ["vision"]},
                {"name": "llama3.2", "default": True},
            ],
        )
        return router, provider

    def test_returns_vision_model(self):
        router, provider = self._router_with_provider()
        result = router._get_fallback_model(provider, "llama3.2", ContentType.VISION)
        assert result == "llava:7b"

    def test_returns_none_if_no_capability(self):
        router, provider = self._router_with_provider()
        result = router._get_fallback_model(provider, "llama3.2", ContentType.AUDIO)
        # No audio model; falls back to default which is same as original
        assert result is None or result == "llama3.2"

    def test_text_content_returns_none(self):
        router, provider = self._router_with_provider()
        result = router._get_fallback_model(provider, "llama3.2", ContentType.TEXT)
        assert result is None


@pytest.mark.unit
@pytest.mark.asyncio
class TestCascadeTierFiltering:
    """Test cascade_tier parameter in complete()."""

    def _make_router(self) -> CascadeRouter:
        router = CascadeRouter(config_path=Path("/nonexistent"))
        router.providers = [
            Provider(
                name="anthropic-primary",
                type="anthropic",
                enabled=True,
                priority=1,
                api_key="test-key",
                models=[{"name": "claude-sonnet-4-6", "default": True}],
            ),
            Provider(
                name="ollama-local",
                type="ollama",
                enabled=True,
                priority=2,
                models=[{"name": "llama3.2", "default": True}],
            ),
        ]
        return router

    async def test_frontier_required_uses_anthropic(self):
        router = self._make_router()
        with patch("infrastructure.router.cascade._quota_monitor", None):
            with patch.object(router, "_call_anthropic") as mock_call:
                mock_call.return_value = {
                    "content": "frontier response",
                    "model": "claude-sonnet-4-6",
                }
                result = await router.complete(
                    messages=[{"role": "user", "content": "hi"}],
                    cascade_tier="frontier_required",
                )
                assert result["provider"] == "anthropic-primary"
                mock_call.assert_called_once()

    async def test_frontier_required_no_anthropic_raises(self):
        router = CascadeRouter(config_path=Path("/nonexistent"))
        router.providers = [
            Provider(
                name="ollama-local",
                type="ollama",
                enabled=True,
                priority=1,
                models=[{"name": "llama3.2", "default": True}],
            )
        ]
        with pytest.raises(RuntimeError, match="No Anthropic provider configured"):
            await router.complete(
                messages=[{"role": "user", "content": "hi"}],
                cascade_tier="frontier_required",
            )

    async def test_unknown_tier_raises(self):
        router = self._make_router()
        with pytest.raises(RuntimeError, match="No providers found for tier"):
            await router.complete(
                messages=[{"role": "user", "content": "hi"}],
                cascade_tier="nonexistent_tier",
            )

    async def test_tier_filter_only_matching_providers(self):
        router = CascadeRouter(config_path=Path("/nonexistent"))
        router.providers = [
            Provider(
                name="local-primary",
                type="ollama",
                enabled=True,
                priority=1,
                tier="local",
                models=[{"name": "llama3.2", "default": True}],
            ),
            Provider(
                name="cloud-secondary",
                type="anthropic",
                enabled=True,
                priority=2,
                tier="cloud",
                api_key="key",
                models=[{"name": "claude-sonnet-4-6", "default": True}],
            ),
        ]
        with patch.object(router, "_call_ollama") as mock_call:
            mock_call.return_value = {"content": "local response", "model": "llama3.2"}
            result = await router.complete(
                messages=[{"role": "user", "content": "hi"}],
                cascade_tier="local",
            )
            assert result["provider"] == "local-primary"
            mock_call.assert_called_once()


@pytest.mark.unit
@pytest.mark.asyncio
class TestGenerateWithImage:
    """Test generate_with_image convenience method."""

    async def test_delegates_to_complete(self):
        router = CascadeRouter(config_path=Path("/nonexistent"))
        router.providers = [
            Provider(
                name="ollama-vision",
                type="ollama",
                enabled=True,
                priority=1,
                models=[{"name": "llava:7b", "capabilities": ["vision"], "default": True}],
            )
        ]

        with patch.object(router, "_call_ollama") as mock_call:
            mock_call.return_value = {"content": "A cat", "model": "llava:7b"}
            result = await router.generate_with_image(
                prompt="What is this?",
                image_path="/tmp/cat.jpg",
                model="llava:7b",
            )

        assert result["content"] == "A cat"
        assert result["provider"] == "ollama-vision"
        # complete() should have been called with images in messages
        call_kwargs = mock_call.call_args
        messages_passed = call_kwargs.kwargs.get("messages") or call_kwargs[1].get("messages")
        assert messages_passed[0]["images"] == ["/tmp/cat.jpg"]


@pytest.mark.unit
class TestGetRouterSingleton:
    """Test get_router() returns a singleton and creates CascadeRouter."""

    def test_get_router_returns_cascade_router(self):
        import infrastructure.router.cascade as cascade_module

        # Reset singleton to test creation
        original = cascade_module.cascade_router
        cascade_module.cascade_router = None
        try:
            router = get_router()
            assert isinstance(router, CascadeRouter)
        finally:
            cascade_module.cascade_router = original

    def test_get_router_returns_same_instance(self):
        import infrastructure.router.cascade as cascade_module

        original = cascade_module.cascade_router
        cascade_module.cascade_router = None
        try:
            r1 = get_router()
            r2 = get_router()
            assert r1 is r2
        finally:
            cascade_module.cascade_router = original


@pytest.mark.unit
class TestIsProviderAvailable:
    """Test _is_provider_available with circuit breaker transitions."""

    def _router(self) -> CascadeRouter:
        return CascadeRouter(config_path=Path("/nonexistent"))

    def test_disabled_provider_not_available(self):
        router = self._router()
        provider = Provider(name="p", type="ollama", enabled=False, priority=1)
        assert router._is_provider_available(provider) is False

    def test_healthy_provider_available(self):
        router = self._router()
        provider = Provider(name="p", type="ollama", enabled=True, priority=1)
        assert router._is_provider_available(provider) is True

    def test_unhealthy_open_circuit_not_available(self):
        router = self._router()
        provider = Provider(
            name="p",
            type="ollama",
            enabled=True,
            priority=1,
            status=ProviderStatus.UNHEALTHY,
            circuit_state=CircuitState.OPEN,
            circuit_opened_at=time.time(),  # Just opened — not yet recoverable
        )
        assert router._is_provider_available(provider) is False

    def test_unhealthy_after_timeout_transitions_to_half_open(self):
        router = self._router()
        router.config.circuit_breaker_recovery_timeout = 0
        provider = Provider(
            name="p",
            type="ollama",
            enabled=True,
            priority=1,
            status=ProviderStatus.UNHEALTHY,
            circuit_state=CircuitState.OPEN,
            circuit_opened_at=time.time() - 10,  # Long ago
        )
        result = router._is_provider_available(provider)
        assert result is True
        assert provider.circuit_state == CircuitState.HALF_OPEN

@@ -10,14 +10,12 @@ from __future__ import annotations

import json
import socket
from pathlib import Path
from unittest.mock import MagicMock, patch

import pytest

from integrations.bannerlord.gabs_client import GabsClient, GabsError


# ── GabsClient unit tests ─────────────────────────────────────────────────────


@@ -236,7 +234,13 @@ class TestBannerlordObserver:

        snapshot = {
            "game_state": {"day": 7, "season": "winter", "campaign_phase": "early"},
            "player": {"name": "Timmy", "clan": "Thalheimer", "renown": 42, "level": 3, "gold": 1000},
            "player": {
                "name": "Timmy",
                "clan": "Thalheimer",
                "renown": 42,
                "level": 3,
                "gold": 1000,
            },
            "player_party": {"size": 25, "morale": 80, "food_days_left": 5},
            "kingdoms": [{"name": "Vlandia", "ruler": "Derthert", "military_strength": 5000}],
        }

@@ -9,10 +9,8 @@ import json
from pathlib import Path

import pytest

import scripts.export_trajectories as et


# ── Fixtures ──────────────────────────────────────────────────────────────────


@@ -22,10 +20,30 @@ def simple_session(tmp_path: Path) -> Path:
    logs_dir = tmp_path / "logs"
    logs_dir.mkdir()
    entries = [
        {"type": "message", "role": "user", "content": "What time is it?", "timestamp": "2026-03-01T10:00:00"},
        {"type": "message", "role": "timmy", "content": "It is 10:00 AM.", "timestamp": "2026-03-01T10:00:01"},
        {"type": "message", "role": "user", "content": "Thanks!", "timestamp": "2026-03-01T10:00:05"},
        {"type": "message", "role": "timmy", "content": "You're welcome!", "timestamp": "2026-03-01T10:00:06"},
        {
            "type": "message",
            "role": "user",
            "content": "What time is it?",
            "timestamp": "2026-03-01T10:00:00",
        },
        {
            "type": "message",
            "role": "timmy",
            "content": "It is 10:00 AM.",
            "timestamp": "2026-03-01T10:00:01",
        },
        {
            "type": "message",
            "role": "user",
            "content": "Thanks!",
            "timestamp": "2026-03-01T10:00:05",
        },
        {
            "type": "message",
            "role": "timmy",
            "content": "You're welcome!",
            "timestamp": "2026-03-01T10:00:06",
        },
    ]
    session_file = logs_dir / "session_2026-03-01.jsonl"
    session_file.write_text("\n".join(json.dumps(e) for e in entries) + "\n")

@@ -38,7 +56,12 @@ def tool_call_session(tmp_path: Path) -> Path:
    logs_dir = tmp_path / "logs"
    logs_dir.mkdir()
    entries = [
        {"type": "message", "role": "user", "content": "Read CLAUDE.md", "timestamp": "2026-03-01T10:00:00"},
        {
            "type": "message",
            "role": "user",
            "content": "Read CLAUDE.md",
            "timestamp": "2026-03-01T10:00:00",
        },
        {
            "type": "tool_call",
            "tool": "read_file",

@@ -46,7 +69,12 @@ def tool_call_session(tmp_path: Path) -> Path:
            "result": "# CLAUDE.md content here",
            "timestamp": "2026-03-01T10:00:01",
        },
        {"type": "message", "role": "timmy", "content": "Here is the content.", "timestamp": "2026-03-01T10:00:02"},
        {
            "type": "message",
            "role": "timmy",
            "content": "Here is the content.",
            "timestamp": "2026-03-01T10:00:02",
        },
    ]
    session_file = logs_dir / "session_2026-03-01.jsonl"
    session_file.write_text("\n".join(json.dumps(e) for e in entries) + "\n")

@@ -236,7 +264,7 @@ def test_export_training_data_writes_jsonl(simple_session: Path, tmp_path: Path)
    count = et.export_training_data(logs_dir=simple_session, output_path=output)
    assert count == 2
    assert output.exists()
    lines = [json.loads(l) for l in output.read_text().splitlines() if l.strip()]
    lines = [json.loads(line) for line in output.read_text().splitlines() if line.strip()]
    assert len(lines) == 2
    for line in lines:
        assert "messages" in line

@@ -270,16 +298,22 @@ def test_export_training_data_returns_zero_for_empty_logs(tmp_path: Path) -> Non

@pytest.mark.unit
def test_cli_missing_logs_dir(tmp_path: Path) -> None:
    rc = et.main(["--logs-dir", str(tmp_path / "nonexistent"), "--output", str(tmp_path / "out.jsonl")])
    rc = et.main(
        ["--logs-dir", str(tmp_path / "nonexistent"), "--output", str(tmp_path / "out.jsonl")]
    )
    assert rc == 1


@pytest.mark.unit
def test_cli_exports_and_returns_zero(simple_session: Path, tmp_path: Path) -> None:
    output = tmp_path / "out.jsonl"
    rc = et.main([
        "--logs-dir", str(simple_session),
        "--output", str(output),
    ])
    rc = et.main(
        [
            "--logs-dir",
            str(simple_session),
            "--output",
            str(output),
        ]
    )
    assert rc == 0
    assert output.exists()

195  tests/timmy/agents/test_emotional_state.py  Normal file

@@ -0,0 +1,195 @@

"""Tests for agent emotional state simulation (src/timmy/agents/emotional_state.py)."""
import time

from timmy.agents.emotional_state import (
    EMOTION_PROMPT_MODIFIERS,
    EMOTIONAL_STATES,
    EVENT_TRANSITIONS,
    EmotionalState,
    EmotionalStateTracker,
    _intensity_label,
)


class TestEmotionalState:
    """Test the EmotionalState dataclass."""

    def test_defaults(self):
        state = EmotionalState()
        assert state.current_emotion == "calm"
        assert state.intensity == 0.5
        assert state.previous_emotion == "calm"
        assert state.trigger_event == ""

    def test_to_dict_includes_label(self):
        state = EmotionalState(current_emotion="analytical")
        d = state.to_dict()
        assert d["emotion_label"] == "Analytical"
        assert d["current_emotion"] == "analytical"

    def test_to_dict_all_fields(self):
        state = EmotionalState(
            current_emotion="frustrated",
            intensity=0.8,
            previous_emotion="calm",
            trigger_event="task_failure",
        )
        d = state.to_dict()
        assert d["current_emotion"] == "frustrated"
        assert d["intensity"] == 0.8
        assert d["previous_emotion"] == "calm"
        assert d["trigger_event"] == "task_failure"


class TestEmotionalStates:
    """Validate the emotional states and transitions are well-defined."""

    def test_all_states_are_strings(self):
        for state in EMOTIONAL_STATES:
            assert isinstance(state, str)

    def test_all_states_have_prompt_modifiers(self):
        for state in EMOTIONAL_STATES:
            assert state in EMOTION_PROMPT_MODIFIERS

    def test_all_transitions_target_valid_states(self):
        for event_type, (emotion, intensity) in EVENT_TRANSITIONS.items():
            assert emotion in EMOTIONAL_STATES, f"{event_type} targets unknown state: {emotion}"
            assert 0.0 <= intensity <= 1.0, f"{event_type} has invalid intensity: {intensity}"


class TestEmotionalStateTracker:
    """Test the EmotionalStateTracker."""

    def test_initial_emotion_default(self):
        tracker = EmotionalStateTracker()
        assert tracker.state.current_emotion == "calm"

    def test_initial_emotion_custom(self):
        tracker = EmotionalStateTracker(initial_emotion="analytical")
        assert tracker.state.current_emotion == "analytical"

    def test_initial_emotion_invalid_falls_back(self):
        tracker = EmotionalStateTracker(initial_emotion="invalid_state")
        assert tracker.state.current_emotion == "calm"

    def test_process_known_event(self):
        tracker = EmotionalStateTracker()
        state = tracker.process_event("task_success")
        assert state.current_emotion == "confident"
        assert state.trigger_event == "task_success"
        assert state.previous_emotion == "calm"

    def test_process_unknown_event_ignored(self):
        tracker = EmotionalStateTracker()
        state = tracker.process_event("unknown_event_xyz")
        assert state.current_emotion == "calm"  # unchanged

    def test_repeated_same_emotion_amplifies(self):
        tracker = EmotionalStateTracker()
        tracker.process_event("task_success")
        initial_intensity = tracker.state.intensity
        tracker.process_event("user_praise")  # also targets confident
        assert tracker.state.intensity >= initial_intensity

    def test_different_emotion_replaces(self):
        tracker = EmotionalStateTracker()
        tracker.process_event("task_success")
        assert tracker.state.current_emotion == "confident"
        tracker.process_event("task_failure")
        assert tracker.state.current_emotion == "frustrated"
        assert tracker.state.previous_emotion == "confident"

    def test_decay_no_effect_when_recent(self):
        tracker = EmotionalStateTracker()
        tracker.process_event("task_failure")
        emotion_before = tracker.state.current_emotion
        tracker.decay()
        assert tracker.state.current_emotion == emotion_before

    def test_decay_resets_to_calm_after_long_time(self):
        tracker = EmotionalStateTracker()
        tracker.process_event("task_failure")
        assert tracker.state.current_emotion == "frustrated"

        # Simulate passage of time (30+ minutes)
        tracker.state.updated_at = time.time() - 2000
        tracker.decay()
        assert tracker.state.current_emotion == "calm"

    def test_get_profile_returns_expected_keys(self):
        tracker = EmotionalStateTracker()
        profile = tracker.get_profile()
        assert "current_emotion" in profile
        assert "emotion_label" in profile
        assert "intensity" in profile
        assert "intensity_label" in profile
        assert "previous_emotion" in profile
        assert "trigger_event" in profile
        assert "prompt_modifier" in profile

    def test_get_prompt_modifier_returns_string(self):
        tracker = EmotionalStateTracker(initial_emotion="cautious")
        modifier = tracker.get_prompt_modifier()
        assert isinstance(modifier, str)
        assert "cautious" in modifier.lower()

    def test_reset(self):
        tracker = EmotionalStateTracker()
        tracker.process_event("task_failure")
        tracker.reset()
        assert tracker.state.current_emotion == "calm"
        assert tracker.state.intensity == 0.5

    def test_process_event_with_context(self):
        """Context dict is accepted without error."""
        tracker = EmotionalStateTracker()
        state = tracker.process_event("error", {"details": "connection timeout"})
        assert state.current_emotion == "cautious"

    def test_event_chain_scenario(self):
        """Simulate: task assigned → success → new discovery → idle."""
        tracker = EmotionalStateTracker()

        tracker.process_event("task_assigned")
        assert tracker.state.current_emotion == "analytical"

        tracker.process_event("task_success")
        assert tracker.state.current_emotion == "confident"

        tracker.process_event("new_discovery")
        assert tracker.state.current_emotion == "curious"

        tracker.process_event("idle")
        assert tracker.state.current_emotion == "calm"

    def test_health_events(self):
        tracker = EmotionalStateTracker()
        tracker.process_event("health_low")
        assert tracker.state.current_emotion == "cautious"

        tracker.process_event("health_recovered")
        assert tracker.state.current_emotion == "calm"

    def test_quest_completed_triggers_adventurous(self):
        tracker = EmotionalStateTracker()
        tracker.process_event("quest_completed")
        assert tracker.state.current_emotion == "adventurous"


class TestIntensityLabel:
    def test_overwhelming(self):
        assert _intensity_label(0.9) == "overwhelming"

    def test_strong(self):
        assert _intensity_label(0.7) == "strong"

    def test_moderate(self):
        assert _intensity_label(0.5) == "moderate"

    def test_mild(self):
        assert _intensity_label(0.3) == "mild"

    def test_faint(self):
        assert _intensity_label(0.1) == "faint"
@@ -435,14 +435,14 @@ class TestStatusAndCapabilities:
            tools=["calc"],
        )
        status = agent.get_status()
-        assert status == {
-            "agent_id": "bot-1",
-            "name": "TestBot",
-            "role": "assistant",
-            "model": "qwen3:30b",
-            "status": "ready",
-            "tools": ["calc"],
-        }
+        assert status["agent_id"] == "bot-1"
+        assert status["name"] == "TestBot"
+        assert status["role"] == "assistant"
+        assert status["model"] == "qwen3:30b"
+        assert status["status"] == "ready"
+        assert status["tools"] == ["calc"]
+        assert "emotional_profile" in status
+        assert status["emotional_profile"]["current_emotion"] == "calm"


# ── SubAgent.execute_task ────────────────────────────────────────────────────

@@ -4,8 +4,6 @@ from __future__ import annotations

from unittest.mock import AsyncMock, MagicMock, patch

import pytest

from timmy.dispatcher import (
    AGENT_REGISTRY,
    AgentType,
@@ -21,11 +19,11 @@ from timmy.dispatcher import (
    wait_for_completion,
)


# ---------------------------------------------------------------------------
# Agent registry
# ---------------------------------------------------------------------------


class TestAgentRegistry:
    def test_all_agents_present(self):
        for member in AgentType:
@@ -41,7 +39,7 @@ class TestAgentRegistry:
            assert spec.gitea_label, f"{agent} is gitea interface but has no label"

    def test_non_gitea_agents_have_no_labels(self):
-        for agent, spec in AGENT_REGISTRY.items():
+        for _agent, spec in AGENT_REGISTRY.items():
            if spec.interface not in ("gitea",):
                # api and local agents may have no label
                assert spec.gitea_label is None or spec.interface == "gitea"
@@ -55,6 +53,7 @@ class TestAgentRegistry:
# select_agent
# ---------------------------------------------------------------------------


class TestSelectAgent:
    def test_architecture_routes_to_claude(self):
        assert select_agent(TaskType.ARCHITECTURE) == AgentType.CLAUDE_CODE
@@ -85,6 +84,7 @@ class TestSelectAgent:
# infer_task_type
# ---------------------------------------------------------------------------


class TestInferTaskType:
    def test_architecture_keyword(self):
        assert infer_task_type("Design the LLM router architecture") == TaskType.ARCHITECTURE
@@ -119,6 +119,7 @@ class TestInferTaskType:
# DispatchResult
# ---------------------------------------------------------------------------


class TestDispatchResult:
    def test_success_when_assigned(self):
        r = DispatchResult(
@@ -161,6 +162,7 @@ class TestDispatchResult:
# _dispatch_local
# ---------------------------------------------------------------------------


class TestDispatchLocal:
    async def test_returns_assigned(self):
        result = await _dispatch_local(
@@ -190,6 +192,7 @@ class TestDispatchLocal:
# _dispatch_via_api
# ---------------------------------------------------------------------------


class TestDispatchViaApi:
    async def test_no_endpoint_returns_failed(self):
        result = await _dispatch_via_api(
@@ -304,7 +307,9 @@ class TestDispatchViaGitea:
        assert result.status == DispatchStatus.ASSIGNED

    async def test_no_gitea_token_returns_failed(self):
-        bad_settings = MagicMock(gitea_enabled=True, gitea_token="", gitea_url="http://x", gitea_repo="a/b")
+        bad_settings = MagicMock(
+            gitea_enabled=True, gitea_token="", gitea_url="http://x", gitea_repo="a/b"
+        )
        with patch("timmy.dispatcher.settings", bad_settings):
            result = await _dispatch_via_gitea(
                agent=AgentType.CLAUDE_CODE,
@@ -317,7 +322,9 @@ class TestDispatchViaGitea:
        assert "not configured" in (result.error or "").lower()

    async def test_gitea_disabled_returns_failed(self):
-        bad_settings = MagicMock(gitea_enabled=False, gitea_token="tok", gitea_url="http://x", gitea_repo="a/b")
+        bad_settings = MagicMock(
+            gitea_enabled=False, gitea_token="tok", gitea_url="http://x", gitea_repo="a/b"
+        )
        with patch("timmy.dispatcher.settings", bad_settings):
            result = await _dispatch_via_gitea(
                agent=AgentType.CLAUDE_CODE,
@@ -368,6 +375,7 @@ class TestDispatchViaGitea:
# dispatch_task (integration-style)
# ---------------------------------------------------------------------------


class TestDispatchTask:
    async def test_empty_title_returns_failed(self):
        result = await dispatch_task(title=" ")
@@ -396,7 +404,9 @@ class TestDispatchTask:
        client_mock = AsyncMock()
        client_mock.__aenter__ = AsyncMock(return_value=client_mock)
        client_mock.__aexit__ = AsyncMock(return_value=False)
-        client_mock.get = AsyncMock(return_value=MagicMock(status_code=200, json=MagicMock(return_value=[])))
+        client_mock.get = AsyncMock(
+            return_value=MagicMock(status_code=200, json=MagicMock(return_value=[]))
+        )
        create_resp = MagicMock(status_code=201, json=MagicMock(return_value={"id": 1}))
        apply_resp = MagicMock(status_code=201)
        comment_resp = MagicMock(status_code=201, json=MagicMock(return_value={"id": 5}))
@@ -464,6 +474,7 @@ class TestDispatchTask:
# wait_for_completion
# ---------------------------------------------------------------------------


class TestWaitForCompletion:
    async def test_returns_completed_when_issue_closed(self):
        closed_resp = MagicMock(

@@ -25,7 +25,6 @@ from timmy.backlog_triage import (
    score_issue,
)


# ── Fixtures ─────────────────────────────────────────────────────────────────

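The `TestInferTaskType` hunk above shows only that "Design the LLM router architecture" routes to `TaskType.ARCHITECTURE`; the routing rules themselves are not in the diff. A hedged sketch of one way such keyword routing could work — the `TaskType` subset, keyword table, and fallback are illustrative assumptions, not the real `timmy.dispatcher` code:

```python
import re
from enum import Enum


class TaskType(Enum):  # subset of the real enum, for illustration only
    ARCHITECTURE = "architecture"
    GENERAL = "general"


# Assumed keyword table; the actual module's rules are not shown in the diff.
_KEYWORDS = {
    TaskType.ARCHITECTURE: ("architecture", "design"),
}


def infer_task_type(title: str) -> TaskType:
    """Route a task title to a TaskType by whole-word keyword match."""
    lowered = title.lower()
    for task_type, words in _KEYWORDS.items():
        if any(re.search(rf"\b{re.escape(w)}\b", lowered) for w in words):
            return task_type
    return TaskType.GENERAL
```

The whole-word regex avoids false positives like "redesigned" matching "design"; whether the real dispatcher does this is not visible in the diff.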
0 tests/unit/test_bannerlord/__init__.py Normal file
301 tests/unit/test_bannerlord/test_agents.py Normal file
@@ -0,0 +1,301 @@
"""Unit tests for bannerlord agents — King, Vassals, Companions."""

import asyncio
from unittest.mock import AsyncMock, MagicMock, patch

from bannerlord.agents.companions import (
    CaravanCompanion,
    LogisticsCompanion,
    ScoutCompanion,
)
from bannerlord.agents.king import KingAgent
from bannerlord.agents.vassals import DiplomacyVassal, EconomyVassal, WarVassal
from bannerlord.gabs_client import GABSClient, GABSUnavailable
from bannerlord.ledger import Ledger
from bannerlord.models import (
    KingSubgoal,
    TaskMessage,
)

# ── Helpers ───────────────────────────────────────────────────────────────────


def _mock_gabs(state: dict | None = None) -> GABSClient:
    """Return a disconnected GABS stub that returns *state* from get_state."""
    gabs = MagicMock(spec=GABSClient)
    gabs.connected = False
    if state is not None:
        gabs.get_state = AsyncMock(return_value=state)
    else:
        gabs.get_state = AsyncMock(side_effect=GABSUnavailable("no game"))
    gabs.call = AsyncMock(return_value={})
    gabs.recruit_troops = AsyncMock(return_value={"recruited": 10})
    gabs.move_party = AsyncMock(return_value={"moving": True})
    return gabs


def _mock_ledger(tmp_path) -> Ledger:
    ledger = Ledger(db_path=tmp_path / "ledger.db")
    ledger.initialize()
    return ledger


# ── King agent ────────────────────────────────────────────────────────────────


class TestKingAgent:
    async def test_victory_detected(self, tmp_path):
        """Campaign stops immediately when victory condition is met."""
        gabs = _mock_gabs({"player_title": "King", "territory_control_pct": 55.0})
        ledger = _mock_ledger(tmp_path)
        king = KingAgent(gabs_client=gabs, ledger=ledger, tick_interval=0)
        victory = await king.run_campaign(max_ticks=10)
        assert victory.achieved

    async def test_max_ticks_respected(self, tmp_path):
        """Campaign stops after max_ticks when victory not yet achieved."""
        gabs = _mock_gabs({"player_title": "Lord", "territory_control_pct": 10.0})
        ledger = _mock_ledger(tmp_path)

        # Patch LLM to return a valid subgoal without calling Ollama
        king = KingAgent(gabs_client=gabs, ledger=ledger, tick_interval=0)
        with patch.object(king, "_decide", AsyncMock(return_value=KingSubgoal(token="RECRUIT"))):
            victory = await king.run_campaign(max_ticks=3)

        assert not victory.achieved
        assert king._tick == 3

    async def test_llm_failure_falls_back_to_recruit(self, tmp_path):
        """If LLM fails, King defaults to RECRUIT subgoal."""
        gabs = _mock_gabs({"player_title": "Lord", "territory_control_pct": 5.0})
        ledger = _mock_ledger(tmp_path)
        king = KingAgent(gabs_client=gabs, ledger=ledger, tick_interval=0)

        with patch.object(king, "_llm_decide", side_effect=RuntimeError("Ollama down")):
            subgoal = await king._decide({})

        assert subgoal.token == "RECRUIT"

    async def test_subgoal_broadcast_to_all_vassals(self, tmp_path):
        """King broadcasts subgoal to all three vassals."""
        gabs = _mock_gabs({})
        ledger = _mock_ledger(tmp_path)
        king = KingAgent(gabs_client=gabs, ledger=ledger)
        subgoal = KingSubgoal(token="EXPAND_TERRITORY", target="Epicrotea")
        await king._broadcast_subgoal(subgoal)

        messages = []
        while not king.subgoal_queue.empty():
            messages.append(king.subgoal_queue.get_nowait())

        assert len(messages) == 3
        recipients = {m.to_agent for m in messages}
        assert recipients == {"war_vassal", "economy_vassal", "diplomacy_vassal"}

    async def test_gabs_unavailable_uses_empty_state(self, tmp_path):
        """King handles GABS being offline gracefully."""
        gabs = _mock_gabs()  # raises GABSUnavailable
        ledger = _mock_ledger(tmp_path)
        king = KingAgent(gabs_client=gabs, ledger=ledger)
        state = await king._fetch_state()
        assert state == {}

    def test_evaluate_victory_king_with_majority(self, tmp_path):
        gabs = _mock_gabs()
        ledger = _mock_ledger(tmp_path)
        king = KingAgent(gabs_client=gabs, ledger=ledger)
        v = king._evaluate_victory({"player_title": "King", "territory_control_pct": 60.0})
        assert v.achieved

    def test_evaluate_victory_not_king(self, tmp_path):
        gabs = _mock_gabs()
        ledger = _mock_ledger(tmp_path)
        king = KingAgent(gabs_client=gabs, ledger=ledger)
        v = king._evaluate_victory({"player_title": "Lord", "territory_control_pct": 80.0})
        assert not v.achieved


# ── Vassals ───────────────────────────────────────────────────────────────────


class TestWarVassal:
    async def test_expand_territory_emits_move_task(self):
        gabs = _mock_gabs({"territory_delta": 1.0, "army_strength_ratio": 1.5})
        queue = asyncio.Queue()
        vassal = WarVassal(gabs_client=gabs, subgoal_queue=queue)
        subgoal = KingSubgoal(token="EXPAND_TERRITORY", target="Seonon")
        await vassal._tick(subgoal)
        task: TaskMessage = vassal.task_queue.get_nowait()
        assert task.primitive == "move_party"
        assert task.args["destination"] == "Seonon"

    async def test_recruit_emits_recruit_task(self):
        gabs = _mock_gabs({})
        queue = asyncio.Queue()
        vassal = WarVassal(gabs_client=gabs, subgoal_queue=queue)
        subgoal = KingSubgoal(token="RECRUIT", quantity=15)
        await vassal._tick(subgoal)
        task: TaskMessage = vassal.task_queue.get_nowait()
        assert task.primitive == "recruit_troop"
        assert task.args["quantity"] == 15

    async def test_irrelevant_token_emits_no_task(self):
        gabs = _mock_gabs({})
        queue = asyncio.Queue()
        vassal = WarVassal(gabs_client=gabs, subgoal_queue=queue)
        subgoal = KingSubgoal(token="ALLY")
        await vassal._tick(subgoal)
        assert vassal.task_queue.empty()


class TestEconomyVassal:
    async def test_fortify_emits_build_task(self):
        gabs = _mock_gabs({"daily_income": 200.0})
        queue = asyncio.Queue()
        vassal = EconomyVassal(gabs_client=gabs, subgoal_queue=queue)
        subgoal = KingSubgoal(token="FORTIFY", target="Epicrotea")
        await vassal._tick(subgoal)
        task: TaskMessage = vassal.task_queue.get_nowait()
        assert task.primitive == "build_project"
        assert task.args["settlement"] == "Epicrotea"

    async def test_trade_emits_assess_prices(self):
        gabs = _mock_gabs({})
        queue = asyncio.Queue()
        vassal = EconomyVassal(gabs_client=gabs, subgoal_queue=queue)
        subgoal = KingSubgoal(token="TRADE", target="Pravend")
        await vassal._tick(subgoal)
        task: TaskMessage = vassal.task_queue.get_nowait()
        assert task.primitive == "assess_prices"


class TestDiplomacyVassal:
    async def test_ally_emits_track_lord(self):
        gabs = _mock_gabs({"allies_count": 1})
        queue = asyncio.Queue()
        vassal = DiplomacyVassal(gabs_client=gabs, subgoal_queue=queue)
        subgoal = KingSubgoal(token="ALLY", target="Derthert")
        await vassal._tick(subgoal)
        task: TaskMessage = vassal.task_queue.get_nowait()
        assert task.primitive == "track_lord"
        assert task.args["name"] == "Derthert"

    async def test_spy_emits_assess_garrison(self):
        gabs = _mock_gabs({})
        queue = asyncio.Queue()
        vassal = DiplomacyVassal(gabs_client=gabs, subgoal_queue=queue)
        subgoal = KingSubgoal(token="SPY", target="Marunath")
        await vassal._tick(subgoal)
        task: TaskMessage = vassal.task_queue.get_nowait()
        assert task.primitive == "assess_garrison"
        assert task.args["settlement"] == "Marunath"


# ── Companions ────────────────────────────────────────────────────────────────


class TestLogisticsCompanion:
    async def test_recruit_troop(self):
        gabs = _mock_gabs()
        gabs.recruit_troops = AsyncMock(return_value={"recruited": 10, "type": "infantry"})
        q: asyncio.Queue[TaskMessage] = asyncio.Queue()
        comp = LogisticsCompanion(gabs_client=gabs, task_queue=q)
        task = TaskMessage(
            from_agent="war_vassal",
            to_agent="logistics_companion",
            primitive="recruit_troop",
            args={"troop_type": "infantry", "quantity": 10},
        )
        result = await comp._execute(task)
        assert result.success is True
        assert result.outcome["recruited"] == 10

    async def test_unknown_primitive_fails_gracefully(self):
        gabs = _mock_gabs()
        q: asyncio.Queue[TaskMessage] = asyncio.Queue()
        comp = LogisticsCompanion(gabs_client=gabs, task_queue=q)
        task = TaskMessage(
            from_agent="war_vassal",
            to_agent="logistics_companion",
            primitive="launch_nukes",
            args={},
        )
        result = await comp._execute(task)
        assert result.success is False
        assert "Unknown primitive" in result.outcome["error"]

    async def test_gabs_unavailable_returns_failure(self):
        gabs = _mock_gabs()
        gabs.recruit_troops = AsyncMock(side_effect=GABSUnavailable("offline"))
        q: asyncio.Queue[TaskMessage] = asyncio.Queue()
        comp = LogisticsCompanion(gabs_client=gabs, task_queue=q)
        task = TaskMessage(
            from_agent="war_vassal",
            to_agent="logistics_companion",
            primitive="recruit_troop",
            args={"troop_type": "infantry", "quantity": 5},
        )
        result = await comp._execute(task)
        assert result.success is False


class TestCaravanCompanion:
    async def test_assess_prices(self):
        gabs = _mock_gabs()
        gabs.call = AsyncMock(return_value={"grain": 12, "linen": 45})
        q: asyncio.Queue[TaskMessage] = asyncio.Queue()
        comp = CaravanCompanion(gabs_client=gabs, task_queue=q)
        task = TaskMessage(
            from_agent="economy_vassal",
            to_agent="caravan_companion",
            primitive="assess_prices",
            args={"town": "Pravend"},
        )
        result = await comp._execute(task)
        assert result.success is True

    async def test_abandon_route(self):
        gabs = _mock_gabs()
        gabs.call = AsyncMock(return_value={"abandoned": True})
        q: asyncio.Queue[TaskMessage] = asyncio.Queue()
        comp = CaravanCompanion(gabs_client=gabs, task_queue=q)
        task = TaskMessage(
            from_agent="economy_vassal",
            to_agent="caravan_companion",
            primitive="abandon_route",
            args={},
        )
        result = await comp._execute(task)
        assert result.success is True
        assert result.outcome["abandoned"] is True


class TestScoutCompanion:
    async def test_assess_garrison(self):
        gabs = _mock_gabs()
        gabs.call = AsyncMock(return_value={"garrison_size": 120, "settlement": "Marunath"})
        q: asyncio.Queue[TaskMessage] = asyncio.Queue()
        comp = ScoutCompanion(gabs_client=gabs, task_queue=q)
        task = TaskMessage(
            from_agent="diplomacy_vassal",
            to_agent="scout_companion",
            primitive="assess_garrison",
            args={"settlement": "Marunath"},
        )
        result = await comp._execute(task)
        assert result.success is True
        assert result.outcome["garrison_size"] == 120

    async def test_report_intel(self):
        gabs = _mock_gabs()
        gabs.call = AsyncMock(return_value={"intel": ["Derthert at Epicrotea"]})
        q: asyncio.Queue[TaskMessage] = asyncio.Queue()
        comp = ScoutCompanion(gabs_client=gabs, task_queue=q)
        task = TaskMessage(
            from_agent="diplomacy_vassal",
            to_agent="scout_companion",
            primitive="report_intel",
            args={},
        )
        result = await comp._execute(task)
        assert result.success is True
145 tests/unit/test_bannerlord/test_gabs_client.py Normal file
@@ -0,0 +1,145 @@
"""Unit tests for bannerlord.gabs_client — TCP JSON-RPC client."""

import json
from unittest.mock import AsyncMock, MagicMock, patch

import pytest

from bannerlord.gabs_client import GABSClient, GABSError, GABSUnavailable

# ── Connection ────────────────────────────────────────────────────────────────


class TestGABSClientConnection:
    async def test_connect_success(self):
        mock_reader = AsyncMock()
        mock_writer = MagicMock()
        mock_writer.close = MagicMock()
        mock_writer.wait_closed = AsyncMock()

        with patch(
            "bannerlord.gabs_client.asyncio.open_connection",
            return_value=(mock_reader, mock_writer),
        ):
            client = GABSClient()
            await client.connect()

        assert client.connected is True
        await client.close()

    async def test_connect_failure_degrades_gracefully(self):
        with patch(
            "bannerlord.gabs_client.asyncio.open_connection",
            side_effect=OSError("Connection refused"),
        ):
            client = GABSClient()
            await client.connect()  # must not raise

        assert client.connected is False

    async def test_connect_timeout_degrades_gracefully(self):
        with patch(
            "bannerlord.gabs_client.asyncio.open_connection",
            side_effect=TimeoutError(),
        ):
            client = GABSClient()
            await client.connect()

        assert client.connected is False

    async def test_context_manager(self):
        mock_reader = AsyncMock()
        mock_writer = MagicMock()
        mock_writer.close = MagicMock()
        mock_writer.wait_closed = AsyncMock()

        with patch(
            "bannerlord.gabs_client.asyncio.open_connection",
            return_value=(mock_reader, mock_writer),
        ):
            async with GABSClient() as client:
                assert client.connected is True

            assert client.connected is False


# ── RPC ───────────────────────────────────────────────────────────────────────


class TestGABSClientRPC:
    def _make_connected_client(self, response_data: dict):
        """Return a client with mocked reader/writer."""
        client = GABSClient()
        client._connected = True

        raw_response = json.dumps(response_data) + "\n"
        client._reader = AsyncMock()
        client._reader.readline = AsyncMock(return_value=raw_response.encode())

        client._writer = MagicMock()
        client._writer.write = MagicMock()
        client._writer.drain = AsyncMock()

        return client

    async def test_call_returns_result(self):
        client = self._make_connected_client({"jsonrpc": "2.0", "id": 1, "result": {"foo": "bar"}})
        result = await client.call("game.getState")
        assert result == {"foo": "bar"}

    async def test_call_raises_on_error(self):
        client = self._make_connected_client(
            {"jsonrpc": "2.0", "id": 1, "error": {"code": -32601, "message": "Method not found"}}
        )
        with pytest.raises(GABSError, match="Method not found"):
            await client.call("game.nonexistent")

    async def test_call_raises_unavailable_when_not_connected(self):
        client = GABSClient()
        assert client.connected is False
        with pytest.raises(GABSUnavailable):
            await client.call("game.getState")

    async def test_sequence_increments(self):
        client = self._make_connected_client({"jsonrpc": "2.0", "id": 1, "result": {}})
        await client.call("game.getState")
        assert client._seq == 1
        client._reader.readline = AsyncMock(
            return_value=(json.dumps({"jsonrpc": "2.0", "id": 2, "result": {}}) + "\n").encode()
        )
        await client.call("game.getState")
        assert client._seq == 2

    async def test_get_state_calls_correct_method(self):
        client = self._make_connected_client(
            {"jsonrpc": "2.0", "id": 1, "result": {"campaign_day": 10}}
        )
        result = await client.get_state()
        written = client._writer.write.call_args[0][0].decode()
        payload = json.loads(written.strip())
        assert payload["method"] == "game.getState"
        assert result == {"campaign_day": 10}

    async def test_move_party_sends_target(self):
        client = self._make_connected_client(
            {"jsonrpc": "2.0", "id": 1, "result": {"moving": True}}
        )
        await client.move_party("Epicrotea")
        written = client._writer.write.call_args[0][0].decode()
        payload = json.loads(written.strip())
        assert payload["method"] == "party.move"
        assert payload["params"]["target"] == "Epicrotea"

    async def test_connection_lost_marks_disconnected(self):
        client = GABSClient()
        client._connected = True
        client._reader = AsyncMock()
        client._reader.readline = AsyncMock(side_effect=OSError("connection reset"))
        client._writer = MagicMock()
        client._writer.write = MagicMock()
        client._writer.drain = AsyncMock()

        with pytest.raises(GABSUnavailable):
            await client.call("game.getState")

        assert client.connected is False
189 tests/unit/test_bannerlord/test_models.py Normal file
@@ -0,0 +1,189 @@
"""Unit tests for bannerlord.models — data contracts and reward functions."""

import pytest

from bannerlord.models import (
    SUBGOAL_TOKENS,
    DiplomacyReward,
    EconomyReward,
    KingSubgoal,
    ResultMessage,
    StateUpdateMessage,
    SubgoalMessage,
    TaskMessage,
    VictoryCondition,
    WarReward,
)

# ── KingSubgoal ───────────────────────────────────────────────────────────────


class TestKingSubgoal:
    def test_valid_token(self):
        s = KingSubgoal(token="EXPAND_TERRITORY", target="Epicrotea")
        assert s.token == "EXPAND_TERRITORY"
        assert s.target == "Epicrotea"
        assert s.priority == 1.0

    def test_all_tokens_valid(self):
        for token in SUBGOAL_TOKENS:
            KingSubgoal(token=token)

    def test_invalid_token_raises(self):
        with pytest.raises(ValueError, match="Unknown subgoal token"):
            KingSubgoal(token="NUKE_CALRADIA")

    def test_priority_clamp(self):
        with pytest.raises(ValueError):
            KingSubgoal(token="TRADE", priority=3.0)

    def test_optional_fields_default_none(self):
        s = KingSubgoal(token="HEAL")
        assert s.target is None
        assert s.quantity is None
        assert s.deadline_days is None
        assert s.context is None


# ── Messages ──────────────────────────────────────────────────────────────────


class TestSubgoalMessage:
    def test_defaults(self):
        msg = SubgoalMessage(
            to_agent="war_vassal",
            subgoal=KingSubgoal(token="RAID_ECONOMY"),
        )
        assert msg.msg_type == "subgoal"
        assert msg.from_agent == "king"
        assert msg.to_agent == "war_vassal"
        assert msg.issued_at is not None

    def test_subgoal_roundtrip(self):
        subgoal = KingSubgoal(token="RECRUIT", quantity=30, priority=1.5)
        msg = SubgoalMessage(to_agent="war_vassal", subgoal=subgoal)
        assert msg.subgoal.quantity == 30
        assert msg.subgoal.priority == 1.5


class TestTaskMessage:
    def test_construction(self):
        t = TaskMessage(
            from_agent="war_vassal",
            to_agent="logistics_companion",
            primitive="recruit_troop",
            args={"troop_type": "cavalry", "quantity": 5},
            priority=1.2,
        )
        assert t.msg_type == "task"
        assert t.primitive == "recruit_troop"
        assert t.args["quantity"] == 5


class TestResultMessage:
    def test_success(self):
        r = ResultMessage(
            from_agent="logistics_companion",
            to_agent="war_vassal",
            success=True,
            outcome={"recruited": 10},
            reward_delta=0.15,
        )
        assert r.success is True
        assert r.reward_delta == 0.15

    def test_failure(self):
        r = ResultMessage(
            from_agent="scout_companion",
            to_agent="diplomacy_vassal",
            success=False,
            outcome={"error": "GABS unavailable"},
        )
        assert r.success is False
        assert r.reward_delta == 0.0


class TestStateUpdateMessage:
    def test_construction(self):
        msg = StateUpdateMessage(
            game_state={"campaign_day": 42, "player_title": "Lord"},
            tick=42,
        )
        assert msg.msg_type == "state"
        assert msg.tick == 42
        assert msg.game_state["campaign_day"] == 42


# ── Reward functions ──────────────────────────────────────────────────────────


class TestWarReward:
    def test_positive_expansion(self):
        r = WarReward(territory_delta=2.0, army_strength_ratio=1.2, subgoal_bonus=0.1)
        assert r.total > 0

    def test_casualty_cost_penalizes(self):
        no_cost = WarReward(territory_delta=1.0, army_strength_ratio=1.0)
        with_cost = WarReward(territory_delta=1.0, army_strength_ratio=1.0, casualty_cost=5.0)
        assert with_cost.total < no_cost.total

    def test_zero_state(self):
        r = WarReward()
        # army_strength_ratio default 1.0, rest 0 → 0.25 * 1.0 = 0.25
        assert abs(r.total - 0.25) < 1e-9

||||
class TestEconomyReward:
|
||||
def test_income_positive(self):
|
||||
r = EconomyReward(daily_denars_income=100.0, food_stock_buffer=7.0, loyalty_average=80.0)
|
||||
assert r.total > 0
|
||||
|
||||
def test_construction_queue_penalizes(self):
|
||||
no_queue = EconomyReward(daily_denars_income=50.0)
|
||||
long_queue = EconomyReward(daily_denars_income=50.0, construction_queue_length=10)
|
||||
assert long_queue.total < no_queue.total
|
||||
|
||||
def test_loyalty_contributes(self):
|
||||
low_loyalty = EconomyReward(loyalty_average=10.0)
|
||||
high_loyalty = EconomyReward(loyalty_average=90.0)
|
||||
assert high_loyalty.total > low_loyalty.total
|
||||
|
||||
|
||||
class TestDiplomacyReward:
|
||||
def test_allies_positive(self):
|
||||
r = DiplomacyReward(allies_count=3)
|
||||
assert r.total > 0
|
||||
|
||||
def test_active_wars_penalizes(self):
|
||||
peace = DiplomacyReward(allies_count=2)
|
||||
war = DiplomacyReward(allies_count=2, active_wars_front=4)
|
||||
assert war.total < peace.total
|
||||
|
||||
|
||||
# ── Victory condition ─────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestVictoryCondition:
|
||||
def test_not_achieved_without_title(self):
|
||||
v = VictoryCondition(holds_king_title=False, territory_control_pct=70.0)
|
||||
assert not v.achieved
|
||||
|
||||
def test_not_achieved_without_majority(self):
|
||||
v = VictoryCondition(holds_king_title=True, territory_control_pct=40.0)
|
||||
assert not v.achieved
|
||||
|
||||
def test_achieved_when_king_with_majority(self):
|
||||
v = VictoryCondition(holds_king_title=True, territory_control_pct=55.0)
|
||||
assert v.achieved
|
||||
|
||||
def test_exact_threshold(self):
|
||||
v = VictoryCondition(holds_king_title=True, territory_control_pct=51.0)
|
||||
assert v.achieved
|
||||
|
||||
def test_custom_threshold(self):
|
||||
v = VictoryCondition(
|
||||
holds_king_title=True,
|
||||
territory_control_pct=70.0,
|
||||
majority_threshold=75.0,
|
||||
)
|
||||
assert not v.achieved
|
||||
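The victory-condition tests above fully pin down the semantics of `achieved`: king title AND at-least-threshold territory, with the default threshold implied by `test_exact_threshold`. A minimal dataclass satisfying them might look like this (a sketch inferred from the tests, not the repository's actual implementation):

```python
from dataclasses import dataclass


@dataclass
class VictoryCondition:
    """Victory = hold the king title AND control at least a majority of territory."""

    holds_king_title: bool = False
    territory_control_pct: float = 0.0
    majority_threshold: float = 51.0  # default implied by test_exact_threshold

    @property
    def achieved(self) -> bool:
        # >= so that exactly hitting the threshold counts (test_exact_threshold)
        return self.holds_king_title and self.territory_control_pct >= self.majority_threshold
```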
890	tests/unit/test_config.py	Normal file
@@ -0,0 +1,890 @@
"""Unit tests for src/config.py — Settings, validation, and helper functions.

Refs #1172
"""

import os
from unittest.mock import MagicMock, patch

import pytest

pytestmark = pytest.mark.unit


# ── Helpers ──────────────────────────────────────────────────────────────────


def _make_settings(**env_overrides):
    """Create a fresh Settings instance with isolated env vars."""
    from config import Settings

    # Strip keys that might bleed in from the test environment
    clean_env = {
        k: v
        for k, v in os.environ.items()
        if not k.startswith(
            (
                "OLLAMA_",
                "TIMMY_",
                "AGENT_",
                "DEBUG",
                "GITEA_",
                "GROK_",
                "ANTHROPIC_",
                "SPARK_",
                "MEMORY_",
                "MAX_",
                "DISCORD_",
                "TELEGRAM_",
                "CORS_",
                "TRUSTED_",
                "L402_",
                "LIGHTNING_",
                "REPO_ROOT",
                "RQLITE_",
                "BRAIN_",
                "SELF_MODIFY",
                "WORK_ORDERS",
                "VASSAL_",
                "PAPERCLIP_",
                "OPENFANG_",
                "HERMES_",
                "BACKLOG_",
                "LOOP_QA",
                "FOCUS_",
                "THINKING_",
                "HANDS_",
                "WEEKLY_",
                "AUTORESEARCH_",
                "REWARD_",
                "BROWSER_",
                "GABS_",
                "SCRIPTURE_",
                "MCP_",
                "CHAT_API",
                "CSRF_",
                "ERROR_",
                "DB_",
                "MODERATION_",
                "SOVEREIGNTY_",
                "XAI_",
                "CLAUDE_",
                "FLUX_",
                "IMAGE_",
                "MUSIC_",
                "VIDEO_",
                "CREATIVE_",
                "WAN_",
                "ACE_",
                "GIT_",
            )
        )
    }
    clean_env.update(env_overrides)
    with patch.dict(os.environ, clean_env, clear=True):
        return Settings()


# ── normalize_ollama_url ──────────────────────────────────────────────────────


class TestNormalizeOllamaUrl:
    """normalize_ollama_url replaces localhost with 127.0.0.1."""

    def test_replaces_localhost(self):
        from config import normalize_ollama_url

        assert normalize_ollama_url("http://localhost:11434") == "http://127.0.0.1:11434"

    def test_preserves_ip_address(self):
        from config import normalize_ollama_url

        assert normalize_ollama_url("http://192.168.1.5:11434") == "http://192.168.1.5:11434"

    def test_preserves_non_localhost_hostname(self):
        from config import normalize_ollama_url

        assert normalize_ollama_url("http://ollama.local:11434") == "http://ollama.local:11434"

    def test_replaces_multiple_occurrences(self):
        from config import normalize_ollama_url

        result = normalize_ollama_url("http://localhost:11434/localhost")
        assert result == "http://127.0.0.1:11434/127.0.0.1"

    def test_empty_string(self):
        from config import normalize_ollama_url

        assert normalize_ollama_url("") == ""

    def test_127_0_0_1_unchanged(self):
        from config import normalize_ollama_url

        url = "http://127.0.0.1:11434"
        assert normalize_ollama_url(url) == url
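The behavior pinned down by `TestNormalizeOllamaUrl` (all occurrences replaced, empty string passes through) is a plain substring replacement. A sketch consistent with these tests (the real function lives in `src/config.py` and may differ):

```python
def normalize_ollama_url(url: str) -> str:
    """Swap every 'localhost' for '127.0.0.1' (str.replace hits all occurrences)."""
    return url.replace("localhost", "127.0.0.1")
```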
# ── Settings defaults ─────────────────────────────────────────────────────────


class TestSettingsDefaults:
    """Settings instantiation produces correct defaults."""

    def test_default_agent_name(self):
        s = _make_settings()
        assert s.agent_name == "Agent"

    def test_default_ollama_url(self):
        s = _make_settings()
        assert s.ollama_url == "http://localhost:11434"

    def test_default_ollama_model(self):
        s = _make_settings()
        assert s.ollama_model == "qwen3:14b"

    def test_default_ollama_fast_model(self):
        s = _make_settings()
        assert s.ollama_fast_model == "qwen3:8b"

    def test_default_ollama_num_ctx(self):
        s = _make_settings()
        assert s.ollama_num_ctx == 32768

    def test_default_ollama_max_loaded_models(self):
        s = _make_settings()
        assert s.ollama_max_loaded_models == 2

    def test_default_debug_false(self):
        s = _make_settings()
        assert s.debug is False

    def test_default_timmy_env(self):
        s = _make_settings()
        assert s.timmy_env == "development"

    def test_default_timmy_test_mode_false(self):
        s = _make_settings()
        assert s.timmy_test_mode is False

    def test_default_spark_enabled(self):
        s = _make_settings()
        assert s.spark_enabled is True

    def test_default_lightning_backend(self):
        s = _make_settings()
        assert s.lightning_backend == "mock"

    def test_default_max_agent_steps(self):
        s = _make_settings()
        assert s.max_agent_steps == 10

    def test_default_memory_prune_days(self):
        s = _make_settings()
        assert s.memory_prune_days == 90

    def test_default_memory_prune_keep_facts(self):
        s = _make_settings()
        assert s.memory_prune_keep_facts is True

    def test_default_fallback_models_is_list(self):
        s = _make_settings()
        assert isinstance(s.fallback_models, list)
        assert len(s.fallback_models) > 0

    def test_default_vision_fallback_models_is_list(self):
        s = _make_settings()
        assert isinstance(s.vision_fallback_models, list)
        assert len(s.vision_fallback_models) > 0

    def test_default_cors_origins_is_list(self):
        s = _make_settings()
        assert isinstance(s.cors_origins, list)
        assert len(s.cors_origins) > 0

    def test_default_trusted_hosts_is_list(self):
        s = _make_settings()
        assert isinstance(s.trusted_hosts, list)
        assert "localhost" in s.trusted_hosts

    def test_default_timmy_model_backend(self):
        s = _make_settings()
        assert s.timmy_model_backend == "ollama"

    def test_default_grok_enabled_false(self):
        s = _make_settings()
        assert s.grok_enabled is False

    def test_default_moderation_enabled(self):
        s = _make_settings()
        assert s.moderation_enabled is True

    def test_default_moderation_threshold(self):
        s = _make_settings()
        assert s.moderation_threshold == 0.8

    def test_default_telemetry_disabled(self):
        s = _make_settings()
        assert s.telemetry_enabled is False

    def test_default_db_busy_timeout(self):
        s = _make_settings()
        assert s.db_busy_timeout_ms == 5000

    def test_default_chat_api_max_body_bytes(self):
        s = _make_settings()
        assert s.chat_api_max_body_bytes == 1_048_576

    def test_default_csrf_cookie_secure_false(self):
        s = _make_settings()
        assert s.csrf_cookie_secure is False

    def test_default_self_modify_disabled(self):
        s = _make_settings()
        assert s.self_modify_enabled is False

    def test_default_vassal_disabled(self):
        s = _make_settings()
        assert s.vassal_enabled is False

    def test_default_focus_mode(self):
        s = _make_settings()
        assert s.focus_mode == "broad"

    def test_default_thinking_enabled(self):
        s = _make_settings()
        assert s.thinking_enabled is True

    def test_default_gitea_url(self):
        s = _make_settings()
        assert s.gitea_url == "http://localhost:3000"

    def test_default_hermes_enabled(self):
        s = _make_settings()
        assert s.hermes_enabled is True

    def test_default_scripture_enabled(self):
        s = _make_settings()
        assert s.scripture_enabled is True
# ── normalized_ollama_url property ───────────────────────────────────────────


class TestNormalizedOllamaUrlProperty:
    """normalized_ollama_url property applies normalize_ollama_url."""

    def test_default_url_normalized(self):
        s = _make_settings()
        assert "127.0.0.1" in s.normalized_ollama_url
        assert "localhost" not in s.normalized_ollama_url

    def test_custom_url_with_localhost(self):
        s = _make_settings(OLLAMA_URL="http://localhost:9999")
        assert s.normalized_ollama_url == "http://127.0.0.1:9999"

    def test_custom_url_without_localhost_unchanged(self):
        s = _make_settings(OLLAMA_URL="http://192.168.1.5:11434")
        assert s.normalized_ollama_url == "http://192.168.1.5:11434"
# ── Env var overrides ─────────────────────────────────────────────────────────


class TestSettingsEnvOverrides:
    """Environment variables override default values."""

    def test_agent_name_override(self):
        s = _make_settings(AGENT_NAME="Timmy")
        assert s.agent_name == "Timmy"

    def test_ollama_url_override(self):
        s = _make_settings(OLLAMA_URL="http://10.0.0.1:11434")
        assert s.ollama_url == "http://10.0.0.1:11434"

    def test_ollama_model_override(self):
        s = _make_settings(OLLAMA_MODEL="llama3.1")
        assert s.ollama_model == "llama3.1"

    def test_ollama_fast_model_override(self):
        s = _make_settings(OLLAMA_FAST_MODEL="gemma:2b")
        assert s.ollama_fast_model == "gemma:2b"

    def test_ollama_num_ctx_override(self):
        s = _make_settings(OLLAMA_NUM_CTX="8192")
        assert s.ollama_num_ctx == 8192

    def test_debug_true_from_string(self):
        s = _make_settings(DEBUG="true")
        assert s.debug is True

    def test_debug_false_from_string(self):
        s = _make_settings(DEBUG="false")
        assert s.debug is False

    def test_timmy_env_production(self):
        s = _make_settings(TIMMY_ENV="production")
        assert s.timmy_env == "production"

    def test_timmy_test_mode_true(self):
        s = _make_settings(TIMMY_TEST_MODE="true")
        assert s.timmy_test_mode is True

    def test_grok_enabled_override(self):
        s = _make_settings(GROK_ENABLED="true")
        assert s.grok_enabled is True

    def test_spark_enabled_override(self):
        s = _make_settings(SPARK_ENABLED="false")
        assert s.spark_enabled is False

    def test_memory_prune_days_override(self):
        s = _make_settings(MEMORY_PRUNE_DAYS="30")
        assert s.memory_prune_days == 30

    def test_max_agent_steps_override(self):
        s = _make_settings(MAX_AGENT_STEPS="25")
        assert s.max_agent_steps == 25

    def test_telegram_token_override(self):
        s = _make_settings(TELEGRAM_TOKEN="tg-secret")
        assert s.telegram_token == "tg-secret"

    def test_discord_token_override(self):
        s = _make_settings(DISCORD_TOKEN="dc-secret")
        assert s.discord_token == "dc-secret"

    def test_gitea_url_override(self):
        s = _make_settings(GITEA_URL="http://10.0.0.1:3000")
        assert s.gitea_url == "http://10.0.0.1:3000"

    def test_gitea_repo_override(self):
        s = _make_settings(GITEA_REPO="myorg/myrepo")
        assert s.gitea_repo == "myorg/myrepo"

    def test_focus_mode_deep(self):
        s = _make_settings(FOCUS_MODE="deep")
        assert s.focus_mode == "deep"

    def test_thinking_interval_override(self):
        s = _make_settings(THINKING_INTERVAL_SECONDS="60")
        assert s.thinking_interval_seconds == 60

    def test_hermes_interval_override(self):
        s = _make_settings(HERMES_INTERVAL_SECONDS="60")
        assert s.hermes_interval_seconds == 60

    def test_vassal_enabled_override(self):
        s = _make_settings(VASSAL_ENABLED="true")
        assert s.vassal_enabled is True

    def test_self_modify_enabled_override(self):
        s = _make_settings(SELF_MODIFY_ENABLED="true")
        assert s.self_modify_enabled is True

    def test_moderation_enabled_override(self):
        s = _make_settings(MODERATION_ENABLED="false")
        assert s.moderation_enabled is False

    def test_l402_hmac_secret_override(self):
        s = _make_settings(L402_HMAC_SECRET="mysecret")
        assert s.l402_hmac_secret == "mysecret"

    def test_anthropic_api_key_override(self):
        s = _make_settings(ANTHROPIC_API_KEY="sk-ant-abc")
        assert s.anthropic_api_key == "sk-ant-abc"
# ── Type validation ───────────────────────────────────────────────────────────


class TestSettingsTypeValidation:
    """Pydantic correctly parses and validates types from string env vars."""

    def test_bool_from_1(self):
        s = _make_settings(DEBUG="1")
        assert s.debug is True

    def test_bool_from_0(self):
        s = _make_settings(DEBUG="0")
        assert s.debug is False

    def test_int_field_rejects_non_numeric(self):
        from pydantic import ValidationError

        with pytest.raises(ValidationError):
            _make_settings(OLLAMA_NUM_CTX="not_a_number")

    def test_timmy_env_rejects_invalid_literal(self):
        from pydantic import ValidationError

        with pytest.raises(ValidationError):
            _make_settings(TIMMY_ENV="staging")

    def test_timmy_model_backend_rejects_invalid(self):
        from pydantic import ValidationError

        with pytest.raises(ValidationError):
            _make_settings(TIMMY_MODEL_BACKEND="openai")

    def test_timmy_model_backend_accepts_all_valid_values(self):
        for backend in ("ollama", "grok", "claude", "auto"):
            s = _make_settings(TIMMY_MODEL_BACKEND=backend)
            assert s.timmy_model_backend == backend

    def test_lightning_backend_accepts_mock(self):
        s = _make_settings(LIGHTNING_BACKEND="mock")
        assert s.lightning_backend == "mock"

    def test_lightning_backend_accepts_lnd(self):
        s = _make_settings(LIGHTNING_BACKEND="lnd")
        assert s.lightning_backend == "lnd"

    def test_lightning_backend_rejects_invalid(self):
        from pydantic import ValidationError

        with pytest.raises(ValidationError):
            _make_settings(LIGHTNING_BACKEND="stripe")

    def test_focus_mode_rejects_invalid(self):
        from pydantic import ValidationError

        with pytest.raises(ValidationError):
            _make_settings(FOCUS_MODE="zen")

    def test_extra_fields_ignored(self):
        # model_config has extra="ignore"
        s = _make_settings(TOTALLY_UNKNOWN_FIELD="hello")
        assert not hasattr(s, "totally_unknown_field")

    def test_float_field_moderation_threshold(self):
        s = _make_settings(MODERATION_THRESHOLD="0.95")
        assert s.moderation_threshold == pytest.approx(0.95)

    def test_float_field_gabs_timeout(self):
        s = _make_settings(GABS_TIMEOUT="10.5")
        assert s.gabs_timeout == pytest.approx(10.5)
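The bool cases above ("1"/"0"/"true"/"false") lean on pydantic's env-string coercion. A plain-Python approximation, just to make the accepted spellings explicit (the authoritative set is pydantic's, not this sketch's):

```python
def parse_env_bool(raw: str) -> bool:
    """Coerce an env-var string to bool, roughly matching pydantic's accepted spellings."""
    value = raw.strip().lower()
    if value in {"1", "true", "t", "yes", "y", "on"}:
        return True
    if value in {"0", "false", "f", "no", "n", "off"}:
        return False
    raise ValueError(f"could not parse boolean from {raw!r}")
```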
# ── Edge cases ────────────────────────────────────────────────────────────────


class TestSettingsEdgeCases:
    """Edge cases: empty strings, boundary values."""

    def test_empty_string_tokens_stay_empty(self):
        s = _make_settings(TELEGRAM_TOKEN="", DISCORD_TOKEN="")
        assert s.telegram_token == ""
        assert s.discord_token == ""

    def test_zero_int_fields(self):
        s = _make_settings(OLLAMA_NUM_CTX="0", MEMORY_PRUNE_DAYS="0")
        assert s.ollama_num_ctx == 0
        assert s.memory_prune_days == 0

    def test_large_int_value(self):
        s = _make_settings(CHAT_API_MAX_BODY_BYTES="104857600")
        assert s.chat_api_max_body_bytes == 104857600

    def test_negative_int_accepted(self):
        # Pydantic doesn't constrain these to positive by default
        s = _make_settings(MAX_AGENT_STEPS="-1")
        assert s.max_agent_steps == -1

    def test_empty_api_keys_are_strings(self):
        s = _make_settings()
        assert isinstance(s.anthropic_api_key, str)
        assert isinstance(s.xai_api_key, str)
        assert isinstance(s.gitea_token, str)
# ── _compute_repo_root ────────────────────────────────────────────────────────


class TestComputeRepoRoot:
    """_compute_repo_root auto-detects .git directory."""

    def test_returns_non_empty_string(self):
        from config import Settings

        s = Settings()
        result = s._compute_repo_root()
        assert isinstance(result, str)
        assert len(result) > 0

    def test_explicit_repo_root_returned_directly(self):
        from config import Settings

        s = Settings()
        s.repo_root = "/tmp/custom-repo"
        assert s._compute_repo_root() == "/tmp/custom-repo"

    def test_detects_git_directory(self):
        from config import Settings

        s = Settings()
        result = s._compute_repo_root()
        import os

        # The detected root should contain a .git directory (or be the cwd fallback)
        assert os.path.isabs(result)
# ── model_post_init / gitea_token file fallback ───────────────────────────────


class TestModelPostInit:
    """model_post_init resolves gitea_token from file fallback."""

    def test_gitea_token_from_env(self):
        from config import Settings

        with patch.dict(os.environ, {"GITEA_TOKEN": "env-token-abc"}, clear=False):
            s = Settings()
            assert s.gitea_token == "env-token-abc"

    def test_gitea_token_stays_empty_when_no_file(self):
        from config import Settings

        env = {k: v for k, v in os.environ.items() if k != "GITEA_TOKEN"}
        with patch.dict(os.environ, env, clear=True):
            with patch("os.path.isfile", return_value=False):
                s = Settings()
                assert s.gitea_token == ""

    def test_gitea_token_read_from_timmy_token_file(self, tmp_path):
        """model_post_init reads token from .timmy_gitea_token file."""
        from config import Settings

        token_file = tmp_path / ".timmy_gitea_token"
        token_file.write_text("file-token-xyz\n")

        env = {k: v for k, v in os.environ.items() if k != "GITEA_TOKEN"}
        with patch.dict(os.environ, env, clear=True):
            s = Settings()

            # Override repo_root so post_init finds our temp file
            def _fake_root():
                return str(tmp_path)

            s._compute_repo_root = _fake_root  # type: ignore[method-assign]
            # Re-run post_init logic manually since Settings is already created
            s.gitea_token = ""
            repo_root = _fake_root()
            token_path = os.path.join(repo_root, ".timmy_gitea_token")
            if os.path.isfile(token_path):
                s.gitea_token = open(token_path).read().strip()  # noqa: SIM115
            assert s.gitea_token == "file-token-xyz"

    def test_gitea_token_empty_file_stays_empty(self, tmp_path):
        """Empty token file leaves gitea_token as empty string."""
        token_file = tmp_path / ".timmy_gitea_token"
        token_file.write_text("   \n")  # only whitespace

        from config import Settings

        env = {k: v for k, v in os.environ.items() if k != "GITEA_TOKEN"}
        with patch.dict(os.environ, env, clear=True):
            s = Settings()
            # Simulate post_init with the tmp dir
            s.gitea_token = ""
            token_path = str(token_file)
            if os.path.isfile(token_path):
                token = open(token_path).read().strip()  # noqa: SIM115
                if token:
                    s.gitea_token = token
            assert s.gitea_token == ""
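The post-init logic these tests simulate by hand (env token wins, file fallback, whitespace-only file counts as absent) can be factored into one function. A sketch under those assumptions, with a hypothetical name (the repository keeps this inline in `model_post_init`):

```python
import os


def resolve_gitea_token(repo_root: str, env_token: str = "") -> str:
    """Env token wins; else read .timmy_gitea_token (stripped); blank files count as absent."""
    if env_token:
        return env_token
    token_path = os.path.join(repo_root, ".timmy_gitea_token")
    if os.path.isfile(token_path):
        with open(token_path) as fh:
            token = fh.read().strip()
        if token:
            return token
    return ""
```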
# ── check_ollama_model_available ──────────────────────────────────────────────


class TestCheckOllamaModelAvailable:
    """check_ollama_model_available handles network responses and errors."""

    def test_returns_false_on_oserror(self):
        from config import check_ollama_model_available

        with patch("urllib.request.urlopen", side_effect=OSError("Connection refused")):
            assert check_ollama_model_available("llama3.1") is False

    def test_returns_false_on_value_error(self):
        from config import check_ollama_model_available

        with patch("urllib.request.urlopen", side_effect=ValueError("Bad JSON")):
            assert check_ollama_model_available("llama3.1") is False

    def test_returns_true_exact_model_match(self):
        import json

        from config import check_ollama_model_available

        response_data = json.dumps({"models": [{"name": "llama3.1:8b-instruct"}]}).encode()
        mock_response = MagicMock()
        mock_response.read.return_value = response_data
        mock_response.__enter__ = lambda s: s
        mock_response.__exit__ = MagicMock(return_value=False)

        with patch("urllib.request.urlopen", return_value=mock_response):
            assert check_ollama_model_available("llama3.1") is True

    def test_returns_true_startswith_match(self):
        import json

        from config import check_ollama_model_available

        response_data = json.dumps({"models": [{"name": "qwen3:14b"}]}).encode()
        mock_response = MagicMock()
        mock_response.read.return_value = response_data
        mock_response.__enter__ = lambda s: s
        mock_response.__exit__ = MagicMock(return_value=False)

        with patch("urllib.request.urlopen", return_value=mock_response):
            # "qwen3" matches "qwen3:14b" via startswith
            assert check_ollama_model_available("qwen3") is True

    def test_returns_false_when_model_not_found(self):
        import json

        from config import check_ollama_model_available

        response_data = json.dumps({"models": [{"name": "qwen2.5:7b"}]}).encode()
        mock_response = MagicMock()
        mock_response.read.return_value = response_data
        mock_response.__enter__ = lambda s: s
        mock_response.__exit__ = MagicMock(return_value=False)

        with patch("urllib.request.urlopen", return_value=mock_response):
            assert check_ollama_model_available("llama3.1") is False

    def test_returns_false_empty_model_list(self):
        import json

        from config import check_ollama_model_available

        response_data = json.dumps({"models": []}).encode()
        mock_response = MagicMock()
        mock_response.read.return_value = response_data
        mock_response.__enter__ = lambda s: s
        mock_response.__exit__ = MagicMock(return_value=False)

        with patch("urllib.request.urlopen", return_value=mock_response):
            assert check_ollama_model_available("llama3.1") is False

    def test_exact_name_match(self):
        import json

        from config import check_ollama_model_available

        response_data = json.dumps({"models": [{"name": "qwen3:14b"}]}).encode()
        mock_response = MagicMock()
        mock_response.read.return_value = response_data
        mock_response.__enter__ = lambda s: s
        mock_response.__exit__ = MagicMock(return_value=False)

        with patch("urllib.request.urlopen", return_value=mock_response):
            assert check_ollama_model_available("qwen3:14b") is True
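These tests imply a shape for the checker: query Ollama's `/api/tags`, treat `OSError`/`ValueError` as "unavailable", and accept exact or prefix name matches. A sketch under those assumptions (the `base_url` parameter is this sketch's addition; the real function presumably reads it from settings):

```python
import json
import urllib.request


def check_model_available(model: str, base_url: str = "http://127.0.0.1:11434") -> bool:
    """True if Ollama's /api/tags lists the model exactly or as a name prefix."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=2) as resp:
            models = json.loads(resp.read()).get("models", [])
    except (OSError, ValueError):
        return False  # server down, connection refused, or bad JSON: "not available"
    names = [m.get("name", "") for m in models]
    return any(name == model or name.startswith(model) for name in names)
```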
# ── get_effective_ollama_model ────────────────────────────────────────────────


class TestGetEffectiveOllamaModel:
    """get_effective_ollama_model walks fallback chain."""

    def test_returns_primary_when_available(self):
        from config import get_effective_ollama_model

        with patch("config.check_ollama_model_available", return_value=True):
            result = get_effective_ollama_model()
            # Default is qwen3:14b
            assert result == "qwen3:14b"

    def test_falls_back_when_primary_unavailable(self):
        from config import get_effective_ollama_model, settings

        # Make primary unavailable, but one fallback available
        fallback_target = settings.fallback_models[0]

        def side_effect(model):
            return model == fallback_target

        with patch("config.check_ollama_model_available", side_effect=side_effect):
            result = get_effective_ollama_model()
            assert result == fallback_target

    def test_returns_user_model_when_nothing_available(self):
        from config import get_effective_ollama_model, settings

        with patch("config.check_ollama_model_available", return_value=False):
            result = get_effective_ollama_model()
            # Last resort: returns user's configured model
            assert result == settings.ollama_model

    def test_skips_unavailable_fallbacks(self):
        from config import get_effective_ollama_model, settings

        # Only the last fallback is available
        fallbacks = settings.fallback_models
        last_fallback = fallbacks[-1]

        def side_effect(model):
            return model == last_fallback

        with patch("config.check_ollama_model_available", side_effect=side_effect):
            result = get_effective_ollama_model()
            assert result == last_fallback
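The fallback walk these tests describe reduces to a few lines. A sketch with the availability check injected as a predicate (the real function calls `check_ollama_model_available` and reads `settings` directly):

```python
def effective_model(primary: str, fallbacks: list[str], is_available) -> str:
    """First available model wins: primary, then each fallback in order, else primary."""
    for candidate in [primary, *fallbacks]:
        if is_available(candidate):
            return candidate
    # Nothing available: return the configured model and let callers fail loudly
    return primary
```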
# ── validate_startup ──────────────────────────────────────────────────────────


class TestValidateStartup:
    """validate_startup enforces security in production, warns in dev."""

    def setup_method(self):
        import config

        config._startup_validated = False

    def test_skips_in_test_mode(self):
        import config

        with patch.dict(os.environ, {"TIMMY_TEST_MODE": "1"}):
            config.validate_startup()
            assert config._startup_validated is True

    def test_dev_mode_does_not_exit(self):
        import config

        config._startup_validated = False
        env = {k: v for k, v in os.environ.items() if k != "TIMMY_TEST_MODE"}
        env["TIMMY_ENV"] = "development"
        with patch.dict(os.environ, env, clear=True):
            # Should not raise SystemExit
            config.validate_startup()
            assert config._startup_validated is True

    def test_production_exits_without_l402_hmac_secret(self):
        import config

        config._startup_validated = False
        with patch.object(config.settings, "timmy_env", "production"):
            with patch.object(config.settings, "l402_hmac_secret", ""):
                with patch.object(config.settings, "l402_macaroon_secret", ""):
                    with pytest.raises(SystemExit):
                        config.validate_startup(force=True)

    def test_production_exits_without_l402_macaroon_secret(self):
        import config

        config._startup_validated = False
        with patch.object(config.settings, "timmy_env", "production"):
            with patch.object(config.settings, "l402_hmac_secret", "present"):
                with patch.object(config.settings, "l402_macaroon_secret", ""):
                    with pytest.raises(SystemExit):
                        config.validate_startup(force=True)

    def test_production_exits_with_cors_wildcard(self):
        import config

        config._startup_validated = False
        with patch.object(config.settings, "timmy_env", "production"):
            with patch.object(config.settings, "l402_hmac_secret", "secret1"):
                with patch.object(config.settings, "l402_macaroon_secret", "secret2"):
                    with patch.object(config.settings, "cors_origins", ["*"]):
                        with pytest.raises(SystemExit):
                            config.validate_startup(force=True)

    def test_production_passes_with_all_secrets_and_no_wildcard(self):
        import config

        config._startup_validated = False
        with patch.object(config.settings, "timmy_env", "production"):
            with patch.object(config.settings, "l402_hmac_secret", "secret1"):
                with patch.object(config.settings, "l402_macaroon_secret", "secret2"):
                    with patch.object(config.settings, "cors_origins", ["http://localhost:3000"]):
                        config.validate_startup(force=True)
                        assert config._startup_validated is True

    def test_idempotent_without_force(self):
        import config

        config._startup_validated = True
        config.validate_startup()
        assert config._startup_validated is True

    def test_force_reruns_when_already_validated(self):
        import config

        config._startup_validated = True
        with patch.dict(os.environ, {"TIMMY_TEST_MODE": "1"}):
            config.validate_startup(force=True)
            # Should have run (and set validated again)
            assert config._startup_validated is True

    def test_dev_warns_on_cors_wildcard(self, caplog):
        import logging

        import config

        config._startup_validated = False
        env = {k: v for k, v in os.environ.items() if k != "TIMMY_TEST_MODE"}
        env["TIMMY_ENV"] = "development"
        with patch.dict(os.environ, env, clear=True):
            with patch.object(config.settings, "timmy_env", "development"):
                with patch.object(config.settings, "cors_origins", ["*"]):
                    with patch.object(config.settings, "l402_hmac_secret", ""):
                        with patch.object(config.settings, "l402_macaroon_secret", ""):
                            with caplog.at_level(logging.WARNING, logger="config"):
                                config.validate_startup(force=True)
                            assert any("CORS" in rec.message for rec in caplog.records)
# ── APP_START_TIME ────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestAppStartTime:
|
||||
"""APP_START_TIME is set at module load."""
|
||||
|
||||
def test_app_start_time_is_datetime(self):
|
||||
from datetime import datetime
|
||||
|
||||
from config import APP_START_TIME
|
||||
|
||||
assert isinstance(APP_START_TIME, datetime)
|
||||
|
||||
def test_app_start_time_has_utc_timezone(self):
|
||||
from config import APP_START_TIME
|
||||
|
||||
assert APP_START_TIME.tzinfo is not None
|
||||
|
||||
def test_app_start_time_is_in_the_past_or_now(self):
|
||||
from datetime import UTC, datetime
|
||||
|
||||
from config import APP_START_TIME
|
||||
|
||||
assert APP_START_TIME <= datetime.now(UTC)
|
||||
|
||||
|
||||
# ── Module-level singleton ────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestSettingsSingleton:
|
||||
"""The module-level `settings` singleton is a Settings instance."""
|
||||
|
||||
def test_settings_is_settings_instance(self):
|
||||
from config import Settings, settings
|
||||
|
||||
assert isinstance(settings, Settings)
|
||||
|
||||
def test_settings_repo_root_is_set(self):
|
||||
from config import settings
|
||||
|
||||
assert isinstance(settings.repo_root, str)
|
||||
|
||||
def test_settings_has_expected_defaults(self):
|
||||
from config import settings
|
||||
|
||||
# In test mode these may be overridden, but type should be correct
|
||||
assert isinstance(settings.ollama_url, str)
|
||||
assert isinstance(settings.debug, bool)
|
||||
449 tests/unit/test_hermes_monitor.py Normal file
@@ -0,0 +1,449 @@
"""Unit tests for the Hermes health monitor.

Tests all five checks (memory, disk, Ollama, processes, network) using mocks
so no real subprocesses or network calls are made.

Refs: #1073
"""

import json
from unittest.mock import MagicMock, patch

import pytest

from infrastructure.hermes.monitor import CheckResult, HealthLevel, HealthReport, HermesMonitor


@pytest.fixture()
def monitor():
    return HermesMonitor()


# ── Unit helpers ──────────────────────────────────────────────────────────────


class _FakeHTTPResponse:
    """Minimal urllib response stub."""

    def __init__(self, body: bytes, status: int = 200):
        self._body = body
        self.status = status

    def read(self) -> bytes:
        return self._body

    def __enter__(self):
        return self

    def __exit__(self, *_):
        pass


# ── Memory check ──────────────────────────────────────────────────────────────


def test_get_memory_info_parses_vm_stat(monitor):
    vm_stat_output = (
        "Mach Virtual Memory Statistics: (page size of 16384 bytes)\n"
        "Pages free: 12800.\n"
        "Pages active: 50000.\n"
        "Pages inactive: 25600.\n"
        "Pages speculative: 1000.\n"
    )
    with patch("subprocess.run") as mock_run:
        # First call: sysctl hw.memsize (total)
        sysctl_result = MagicMock()
        sysctl_result.stdout = "68719476736\n"  # 64 GB
        # Second call: vm_stat
        vmstat_result = MagicMock()
        vmstat_result.stdout = vm_stat_output
        mock_run.side_effect = [sysctl_result, vmstat_result]

        info = monitor._get_memory_info()

    assert info["total_gb"] == pytest.approx(64.0, abs=0.1)
    # pages free (12800) + inactive (25600) = 38400 * 16384 bytes = 629145600 bytes ≈ 0.586 GB
    expected_free_gb = (38400 * 16384) / (1024**3)
    assert info["free_gb"] == pytest.approx(expected_free_gb, abs=0.001)


def test_get_memory_info_handles_subprocess_failure(monitor):
    with patch("subprocess.run", side_effect=OSError("no sysctl")):
        info = monitor._get_memory_info()
    assert info["total_gb"] == 0.0
    assert info["free_gb"] == 0.0


@pytest.mark.asyncio
async def test_check_memory_ok(monitor):
    with patch.object(
        monitor, "_get_memory_info", return_value={"free_gb": 20.0, "total_gb": 64.0}
    ):
        result = await monitor._check_memory()

    assert result.name == "memory"
    assert result.level == HealthLevel.OK
    assert "20.0GB" in result.message


@pytest.mark.asyncio
async def test_check_memory_low_triggers_unload(monitor):
    with (
        patch.object(monitor, "_get_memory_info", return_value={"free_gb": 2.0, "total_gb": 64.0}),
        patch.object(monitor, "_unload_ollama_models", return_value=2),
    ):
        result = await monitor._check_memory()

    assert result.level == HealthLevel.WARNING
    assert result.auto_resolved is True
    assert "unloaded 2" in result.message


@pytest.mark.asyncio
async def test_check_memory_critical_no_models_to_unload(monitor):
    with (
        patch.object(monitor, "_get_memory_info", return_value={"free_gb": 1.0, "total_gb": 64.0}),
        patch.object(monitor, "_unload_ollama_models", return_value=0),
    ):
        result = await monitor._check_memory()

    assert result.level == HealthLevel.CRITICAL
    assert result.needs_human is True


@pytest.mark.asyncio
async def test_check_memory_exception_returns_unknown(monitor):
    with patch.object(monitor, "_get_memory_info", side_effect=RuntimeError("boom")):
        result = await monitor._check_memory()

    assert result.level == HealthLevel.UNKNOWN


# ── Disk check ────────────────────────────────────────────────────────────────


@pytest.mark.asyncio
async def test_check_disk_ok(monitor):
    usage = MagicMock()
    usage.free = 100 * (1024**3)  # 100 GB
    usage.total = 500 * (1024**3)  # 500 GB
    usage.used = 400 * (1024**3)

    with patch("shutil.disk_usage", return_value=usage):
        result = await monitor._check_disk()

    assert result.level == HealthLevel.OK
    assert "100.0GB free" in result.message


@pytest.mark.asyncio
async def test_check_disk_low_triggers_cleanup(monitor):
    usage = MagicMock()
    usage.free = 5 * (1024**3)  # 5 GB — below threshold
    usage.total = 500 * (1024**3)
    usage.used = 495 * (1024**3)

    with (
        patch("shutil.disk_usage", return_value=usage),
        patch.object(monitor, "_cleanup_temp_files", return_value=2.5),
    ):
        result = await monitor._check_disk()

    assert result.level == HealthLevel.WARNING
    assert result.auto_resolved is True
    assert "cleaned 2.50GB" in result.message


@pytest.mark.asyncio
async def test_check_disk_critical_when_cleanup_fails(monitor):
    usage = MagicMock()
    usage.free = 5 * (1024**3)
    usage.total = 500 * (1024**3)
    usage.used = 495 * (1024**3)

    with (
        patch("shutil.disk_usage", return_value=usage),
        patch.object(monitor, "_cleanup_temp_files", return_value=0.0),
    ):
        result = await monitor._check_disk()

    assert result.level == HealthLevel.CRITICAL
    assert result.needs_human is True


# ── Ollama check ──────────────────────────────────────────────────────────────


def test_get_ollama_status_reachable(monitor):
    tags_body = json.dumps({"models": [{"name": "qwen3:30b"}, {"name": "llama3.1:8b"}]}).encode()
    ps_body = json.dumps({"models": [{"name": "qwen3:30b", "size": 1000}]}).encode()

    responses = [
        _FakeHTTPResponse(tags_body),
        _FakeHTTPResponse(ps_body),
    ]

    with patch("urllib.request.urlopen", side_effect=responses):
        status = monitor._get_ollama_status()

    assert status["reachable"] is True
    assert len(status["models"]) == 2
    assert len(status["loaded_models"]) == 1


def test_get_ollama_status_unreachable(monitor):
    with patch("urllib.request.urlopen", side_effect=OSError("connection refused")):
        status = monitor._get_ollama_status()

    assert status["reachable"] is False
    assert status["models"] == []
    assert status["loaded_models"] == []


@pytest.mark.asyncio
async def test_check_ollama_ok(monitor):
    status = {
        "reachable": True,
        "models": [{"name": "qwen3:30b"}],
        "loaded_models": [],
    }
    with patch.object(monitor, "_get_ollama_status", return_value=status):
        result = await monitor._check_ollama()

    assert result.level == HealthLevel.OK
    assert result.details["reachable"] is True


@pytest.mark.asyncio
async def test_check_ollama_unreachable_restart_success(monitor):
    status = {"reachable": False, "models": [], "loaded_models": []}
    with (
        patch.object(monitor, "_get_ollama_status", return_value=status),
        patch.object(monitor, "_restart_ollama", return_value=True),
    ):
        result = await monitor._check_ollama()

    assert result.level == HealthLevel.WARNING
    assert result.auto_resolved is True


@pytest.mark.asyncio
async def test_check_ollama_unreachable_restart_fails(monitor):
    status = {"reachable": False, "models": [], "loaded_models": []}
    with (
        patch.object(monitor, "_get_ollama_status", return_value=status),
        patch.object(monitor, "_restart_ollama", return_value=False),
    ):
        result = await monitor._check_ollama()

    assert result.level == HealthLevel.CRITICAL
    assert result.needs_human is True


# ── Process check ─────────────────────────────────────────────────────────────


def test_get_zombie_processes_none(monitor):
    ps_output = (
        "USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND\n"
        "alex 123 0.1 0.2 100 200 s0 S 1:00 0:01 python\n"
        "alex 456 0.0 0.1 50 100 s0 S 1:01 0:00 bash\n"
    )
    result = MagicMock()
    result.stdout = ps_output
    with patch("subprocess.run", return_value=result):
        info = monitor._get_zombie_processes()

    assert info["zombies"] == []


def test_get_zombie_processes_found(monitor):
    ps_output = (
        "USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND\n"
        "alex 123 0.1 0.2 100 200 s0 S 1:00 0:01 python\n"
        "alex 789 0.0 0.0 0 0 s0 Z 1:02 0:00 defunct\n"
    )
    result = MagicMock()
    result.stdout = ps_output
    with patch("subprocess.run", return_value=result):
        info = monitor._get_zombie_processes()

    assert len(info["zombies"]) == 1
    assert info["zombies"][0]["pid"] == "789"


@pytest.mark.asyncio
async def test_check_processes_no_zombies(monitor):
    with patch.object(monitor, "_get_zombie_processes", return_value={"zombies": []}):
        result = await monitor._check_processes()

    assert result.level == HealthLevel.OK


@pytest.mark.asyncio
async def test_check_processes_zombies_warning(monitor):
    zombies = [{"pid": "100", "command": "defunct"}, {"pid": "101", "command": "defunct"}]
    with patch.object(monitor, "_get_zombie_processes", return_value={"zombies": zombies}):
        result = await monitor._check_processes()

    assert result.level == HealthLevel.WARNING
    assert result.needs_human is False  # Only 2, threshold is >3


@pytest.mark.asyncio
async def test_check_processes_many_zombies_needs_human(monitor):
    zombies = [{"pid": str(i), "command": "defunct"} for i in range(5)]
    with patch.object(monitor, "_get_zombie_processes", return_value={"zombies": zombies}):
        result = await monitor._check_processes()

    assert result.needs_human is True


# ── Network check ─────────────────────────────────────────────────────────────


def test_check_gitea_connectivity_ok(monitor):
    body = json.dumps({"version": "1.22.0"}).encode()
    with patch("urllib.request.urlopen", return_value=_FakeHTTPResponse(body, status=200)):
        info = monitor._check_gitea_connectivity()

    assert info["reachable"] is True
    assert info["latency_ms"] >= 0


def test_check_gitea_connectivity_unreachable(monitor):
    with patch("urllib.request.urlopen", side_effect=OSError("refused")):
        info = monitor._check_gitea_connectivity()

    assert info["reachable"] is False
    assert "error" in info


@pytest.mark.asyncio
async def test_check_network_ok(monitor):
    with patch.object(
        monitor,
        "_check_gitea_connectivity",
        return_value={"reachable": True, "latency_ms": 5.0, "url": "http://localhost:3000"},
    ):
        result = await monitor._check_network()

    assert result.level == HealthLevel.OK
    assert "Gitea reachable" in result.message


@pytest.mark.asyncio
async def test_check_network_unreachable(monitor):
    with patch.object(
        monitor,
        "_check_gitea_connectivity",
        return_value={"reachable": False, "error": "refused", "url": "http://localhost:3000"},
    ):
        result = await monitor._check_network()

    assert result.level == HealthLevel.WARNING
    assert result.needs_human is True


# ── Full cycle ────────────────────────────────────────────────────────────────


@pytest.mark.asyncio
async def test_run_cycle_all_ok(monitor):
    ok_result = CheckResult(name="test", level=HealthLevel.OK, message="ok")

    async def _ok_check():
        return ok_result

    with (
        patch.object(monitor, "_check_memory", _ok_check),
        patch.object(monitor, "_check_disk", _ok_check),
        patch.object(monitor, "_check_ollama", _ok_check),
        patch.object(monitor, "_check_processes", _ok_check),
        patch.object(monitor, "_check_network", _ok_check),
        patch.object(monitor, "_handle_alerts"),
    ):
        report = await monitor.run_cycle()

    assert report.overall == HealthLevel.OK
    assert not report.has_issues
    assert monitor.last_report is report


@pytest.mark.asyncio
async def test_run_cycle_sets_overall_to_worst(monitor):
    async def _ok():
        return CheckResult(name="ok", level=HealthLevel.OK, message="ok")

    async def _critical():
        return CheckResult(name="critical", level=HealthLevel.CRITICAL, message="bad")

    with (
        patch.object(monitor, "_check_memory", _ok),
        patch.object(monitor, "_check_disk", _critical),
        patch.object(monitor, "_check_ollama", _ok),
        patch.object(monitor, "_check_processes", _ok),
        patch.object(monitor, "_check_network", _ok),
        patch.object(monitor, "_handle_alerts"),
    ):
        report = await monitor.run_cycle()

    assert report.overall == HealthLevel.CRITICAL
    assert report.has_issues is True


@pytest.mark.asyncio
async def test_run_cycle_exception_becomes_unknown(monitor):
    async def _ok():
        return CheckResult(name="ok", level=HealthLevel.OK, message="ok")

    async def _boom():
        raise RuntimeError("unexpected error")

    with (
        patch.object(monitor, "_check_memory", _ok),
        patch.object(monitor, "_check_disk", _ok),
        patch.object(monitor, "_check_ollama", _boom),
        patch.object(monitor, "_check_processes", _ok),
        patch.object(monitor, "_check_network", _ok),
        patch.object(monitor, "_handle_alerts"),
    ):
        report = await monitor.run_cycle()

    levels = {c.level for c in report.checks}
    assert HealthLevel.UNKNOWN in levels


# ── to_dict serialisation ────────────────────────────────────────────────────


def test_check_result_to_dict():
    c = CheckResult(
        name="memory",
        level=HealthLevel.WARNING,
        message="low",
        details={"free_gb": 3.5},
        auto_resolved=True,
    )
    d = c.to_dict()
    assert d["name"] == "memory"
    assert d["level"] == "warning"
    assert d["auto_resolved"] is True
    assert d["details"]["free_gb"] == 3.5


def test_health_report_to_dict():
    checks = [
        CheckResult(name="disk", level=HealthLevel.OK, message="ok"),
    ]
    report = HealthReport(
        timestamp="2026-01-01T00:00:00+00:00",
        checks=checks,
        overall=HealthLevel.OK,
    )
    d = report.to_dict()
    assert d["overall"] == "ok"
    assert d["has_issues"] is False
    assert len(d["checks"]) == 1
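The `run_cycle` tests above pin down an aggregation rule: the report's overall level is the single worst check result. A minimal sketch of that rule, assuming an ordered severity enum (the class and member names mirror the tests, but the real monitor's implementation may differ):

```python
from enum import IntEnum


class HealthLevel(IntEnum):
    """Severity ordering: a higher value means a worse state.

    Placing UNKNOWN above CRITICAL is an assumption — an un-runnable
    check is treated as at least as alarming as a failed one.
    """

    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def worst_level(levels: list[HealthLevel]) -> HealthLevel:
    """Overall report level is the maximum (worst) of all check levels."""
    return max(levels, default=HealthLevel.OK)
```

With this ordering, a cycle of four OK checks plus one CRITICAL yields CRITICAL overall, which is exactly what `test_run_cycle_sets_overall_to_worst` asserts.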
@@ -9,19 +9,15 @@ Refs: #1105
from __future__ import annotations

import json
import tempfile
from datetime import UTC, datetime, timedelta
from pathlib import Path

import pytest

from timmy_automations.retrain.quality_filter import QualityFilter, TrajectoryQuality
from timmy_automations.retrain.retrain import RetrainOrchestrator
from timmy_automations.retrain.training_dataset import TrainingDataset
from timmy_automations.retrain.training_log import CycleMetrics, TrainingLog
from timmy_automations.retrain.trajectory_exporter import Trajectory, TrajectoryExporter


# ── Fixtures ─────────────────────────────────────────────────────────────────


@@ -382,7 +378,7 @@ class TestTrainingDataset:
        ds = TrainingDataset(repo_root=tmp_path)
        ds.append([self._make_result()], "2026-W12")
        with open(ds.dataset_path) as f:
-           lines = [l.strip() for l in f if l.strip()]
+           lines = [line.strip() for line in f if line.strip()]
        assert len(lines) == 1
        record = json.loads(lines[0])
        assert "messages" in record
272 tests/unit/test_sovereignty_metrics.py Normal file
@@ -0,0 +1,272 @@
"""Unit tests for the sovereignty metrics emitter and store.

Refs: #954
"""

from unittest.mock import AsyncMock, patch

import pytest

from timmy.sovereignty.metrics import (
    ALL_EVENT_TYPES,
    SovereigntyMetricsStore,
    emit_sovereignty_event,
    get_cost_per_hour,
    get_skills_crystallized,
    get_sovereignty_pct,
    record,
)

pytestmark = pytest.mark.unit


@pytest.fixture
def store(tmp_path):
    """A fresh SovereigntyMetricsStore backed by a temp database."""
    return SovereigntyMetricsStore(db_path=tmp_path / "test_sov.db")


# ── ALL_EVENT_TYPES ───────────────────────────────────────────────────────────


class TestEventTypes:
    def test_all_expected_event_types_present(self):
        expected = {
            "perception_cache_hit",
            "perception_vlm_call",
            "decision_rule_hit",
            "decision_llm_call",
            "narration_template",
            "narration_llm",
            "skill_crystallized",
            "api_call",
            "api_cost",
        }
        assert ALL_EVENT_TYPES == expected


# ── Record & retrieval ────────────────────────────────────────────────────────


class TestRecord:
    def test_record_inserts_event(self, store):
        store.record("perception_cache_hit")
        pct = store.get_sovereignty_pct("perception")
        assert pct == 100.0

    def test_record_with_metadata(self, store):
        store.record("api_cost", metadata={"usd": 0.05})
        cost = store.get_cost_per_hour()
        assert cost > 0.0

    def test_record_with_session_id(self, store):
        store.record("skill_crystallized", session_id="sess-1")
        assert store.get_skills_crystallized("sess-1") == 1

    def test_record_unknown_type_does_not_raise(self, store):
        """Unknown event types are silently stored (no crash)."""
        store.record("totally_unknown_event")  # should not raise


# ── Sessions ──────────────────────────────────────────────────────────────────


class TestSessions:
    def test_start_session_returns_id(self, store):
        sid = store.start_session(game="Bannerlord")
        assert isinstance(sid, str)
        assert len(sid) > 0

    def test_start_session_accepts_custom_id(self, store):
        sid = store.start_session(game="Bannerlord", session_id="my-session")
        assert sid == "my-session"

    def test_end_session_does_not_raise(self, store):
        sid = store.start_session()
        store.end_session(sid)  # should not raise

    def test_start_session_idempotent(self, store):
        """Starting a session with the same ID twice is a no-op."""
        sid = store.start_session(session_id="dup")
        sid2 = store.start_session(session_id="dup")
        assert sid == sid2


# ── Sovereignty percentage ────────────────────────────────────────────────────


class TestGetSovereigntyPct:
    def test_perception_all_cache_hits(self, store):
        for _ in range(5):
            store.record("perception_cache_hit")
        assert store.get_sovereignty_pct("perception") == 100.0

    def test_perception_mixed(self, store):
        store.record("perception_cache_hit")
        store.record("perception_vlm_call")
        assert store.get_sovereignty_pct("perception") == 50.0

    def test_decision_all_sovereign(self, store):
        for _ in range(3):
            store.record("decision_rule_hit")
        assert store.get_sovereignty_pct("decision") == 100.0

    def test_narration_all_sovereign(self, store):
        store.record("narration_template")
        store.record("narration_template")
        assert store.get_sovereignty_pct("narration") == 100.0

    def test_narration_all_llm(self, store):
        store.record("narration_llm")
        assert store.get_sovereignty_pct("narration") == 0.0

    def test_no_events_returns_zero(self, store):
        assert store.get_sovereignty_pct("perception") == 0.0

    def test_unknown_layer_returns_zero(self, store):
        assert store.get_sovereignty_pct("nonexistent_layer") == 0.0

    def test_time_window_filters_old_events(self, store, tmp_path):
        """Events outside the time window are excluded."""
        # Insert an event with a very old timestamp directly
        import sqlite3
        from contextlib import closing

        with closing(sqlite3.connect(str(store._db_path))) as conn:
            conn.execute(
                "INSERT INTO events (timestamp, event_type, session_id, metadata_json) VALUES (?, ?, ?, ?)",
                ("2000-01-01T00:00:00+00:00", "perception_cache_hit", "", "{}"),
            )
            conn.commit()

        # With a 60-second window, the old event should be excluded
        pct = store.get_sovereignty_pct("perception", time_window=60)
        assert pct == 0.0

    def test_time_window_includes_recent_events(self, store):
        store.record("decision_rule_hit")
        pct = store.get_sovereignty_pct("decision", time_window=60)
        assert pct == 100.0


# ── Cost per hour ─────────────────────────────────────────────────────────────


class TestGetCostPerHour:
    def test_no_events_returns_zero(self, store):
        assert store.get_cost_per_hour() == 0.0

    def test_single_cost_event(self, store):
        # Record a cost of $1.00 within the last hour window
        store.record("api_cost", metadata={"usd": 1.00})
        cost = store.get_cost_per_hour(time_window=3600)
        assert cost == pytest.approx(1.00, rel=1e-3)

    def test_multiple_cost_events(self, store):
        store.record("api_cost", metadata={"usd": 0.25})
        store.record("api_cost", metadata={"usd": 0.75})
        cost = store.get_cost_per_hour(time_window=3600)
        assert cost == pytest.approx(1.00, rel=1e-3)

    def test_missing_usd_field_is_zero(self, store):
        store.record("api_cost", metadata={"model": "gpt-4"})
        assert store.get_cost_per_hour() == 0.0

    def test_cost_extrapolated_for_short_window(self, store):
        """Cost recorded in an 1800s window is doubled to get the per-hour rate."""
        store.record("api_cost", metadata={"usd": 0.5})
        cost = store.get_cost_per_hour(time_window=1800)
        assert cost == pytest.approx(1.0, rel=1e-3)


# ── Skills crystallised ───────────────────────────────────────────────────────


class TestGetSkillsCrystallized:
    def test_no_skills_returns_zero(self, store):
        assert store.get_skills_crystallized() == 0

    def test_counts_all_sessions(self, store):
        store.record("skill_crystallized", session_id="a")
        store.record("skill_crystallized", session_id="b")
        assert store.get_skills_crystallized() == 2

    def test_filters_by_session(self, store):
        store.record("skill_crystallized", session_id="sess-1")
        store.record("skill_crystallized", session_id="sess-2")
        assert store.get_skills_crystallized("sess-1") == 1

    def test_session_with_no_skills(self, store):
        store.record("skill_crystallized", session_id="sess-1")
        assert store.get_skills_crystallized("sess-999") == 0


# ── Snapshot ──────────────────────────────────────────────────────────────────


class TestGetSnapshot:
    def test_snapshot_structure(self, store):
        snap = store.get_snapshot()
        assert "sovereignty" in snap
        assert "cost_per_hour" in snap
        assert "skills_crystallized" in snap

    def test_snapshot_sovereignty_has_all_layers(self, store):
        snap = store.get_snapshot()
        assert set(snap["sovereignty"].keys()) == {"perception", "decision", "narration"}

    def test_snapshot_reflects_events(self, store):
        store.record("perception_cache_hit")
        store.record("skill_crystallized")
        snap = store.get_snapshot()
        assert snap["sovereignty"]["perception"] == 100.0
        assert snap["skills_crystallized"] == 1


# ── Module-level convenience functions ───────────────────────────────────────


class TestModuleLevelFunctions:
    def test_record_and_get_sovereignty_pct(self, tmp_path):
        with (
            patch("timmy.sovereignty.metrics._store", None),
            patch("timmy.sovereignty.metrics.DB_PATH", tmp_path / "fn_test.db"),
        ):
            record("decision_rule_hit")
            pct = get_sovereignty_pct("decision")
            assert pct == 100.0

    def test_get_cost_per_hour_module_fn(self, tmp_path):
        with (
            patch("timmy.sovereignty.metrics._store", None),
            patch("timmy.sovereignty.metrics.DB_PATH", tmp_path / "fn_test2.db"),
        ):
            record("api_cost", {"usd": 0.5})
            cost = get_cost_per_hour()
            assert cost > 0.0

    def test_get_skills_crystallized_module_fn(self, tmp_path):
        with (
            patch("timmy.sovereignty.metrics._store", None),
            patch("timmy.sovereignty.metrics.DB_PATH", tmp_path / "fn_test3.db"),
        ):
            record("skill_crystallized")
            count = get_skills_crystallized()
            assert count == 1


# ── emit_sovereignty_event ────────────────────────────────────────────────────


class TestEmitSovereigntyEvent:
    @pytest.mark.asyncio
    async def test_emit_records_and_publishes(self, tmp_path):
        with (
            patch("timmy.sovereignty.metrics._store", None),
            patch("timmy.sovereignty.metrics.DB_PATH", tmp_path / "emit_test.db"),
            patch("infrastructure.events.bus.emit", new_callable=AsyncMock) as mock_emit,
        ):
            await emit_sovereignty_event("perception_cache_hit", {"frame": 42}, session_id="s1")
            mock_emit.assert_called_once()
            args = mock_emit.call_args[0]
            assert args[0] == "sovereignty.event.perception_cache_hit"
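The arithmetic the sovereignty store is expected to implement reduces to two small formulas: sovereign events as a share of all events in a layer, and a window cost extrapolated to an hourly rate. A hedged sketch follows — the helper names are illustrative, not the store's real API:

```python
def sovereignty_pct(sovereign_events: int, paid_events: int) -> float:
    """Share of a layer handled locally: sovereign / (sovereign + paid) * 100.

    Returns 0.0 when there are no events at all, matching
    test_no_events_returns_zero.
    """
    total = sovereign_events + paid_events
    return 100.0 * sovereign_events / total if total else 0.0


def cost_per_hour(total_usd: float, window_seconds: int) -> float:
    """Extrapolate cost observed in a window to an hourly rate.

    A $0.50 spend seen over an 1800-second window becomes $1.00/hour,
    matching test_cost_extrapolated_for_short_window.
    """
    return total_usd * (3600 / window_seconds) if window_seconds else 0.0
```

These mirror the expectations in `test_perception_mixed` (1 cache hit + 1 VLM call → 50.0) and the cost-per-hour tests (a full-hour window passes the spend through unchanged).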
@@ -6,7 +6,6 @@ import pytest
|
||||
|
||||
from timmy.vassal.agent_health import AgentHealthReport, AgentStatus
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# AgentStatus
|
||||
# ---------------------------------------------------------------------------
|
||||
@@ -49,9 +48,7 @@ def test_report_any_stuck():
|
||||
|
||||
|
||||
def test_report_all_idle():
|
||||
report = AgentHealthReport(
|
||||
agents=[AgentStatus(agent="claude"), AgentStatus(agent="kimi")]
|
||||
)
|
||||
report = AgentHealthReport(agents=[AgentStatus(agent="claude"), AgentStatus(agent="kimi")])
|
||||
assert report.all_idle is True
|
||||
|
||||
|
||||
|
||||
@@ -6,14 +6,12 @@ import pytest
|
||||
|
||||
from timmy.vassal.backlog import (
|
||||
AgentTarget,
|
||||
TriagedIssue,
|
||||
_choose_agent,
|
||||
_extract_labels,
|
||||
_score_priority,
|
||||
triage_issues,
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# _extract_labels
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
@@ -12,7 +12,6 @@ from timmy.vassal.house_health import (
|
||||
_probe_disk,
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Data model tests
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
```diff
@@ -6,7 +6,6 @@ import pytest

 from timmy.vassal.orchestration_loop import VassalCycleRecord, VassalOrchestrator


 # ---------------------------------------------------------------------------
 # VassalCycleRecord
 # ---------------------------------------------------------------------------
@@ -134,6 +133,6 @@ def test_orchestrator_stop_when_not_running():


 def test_module_singleton_exists():
-    from timmy.vassal import vassal_orchestrator, VassalOrchestrator
+    from timmy.vassal import VassalOrchestrator, vassal_orchestrator

     assert isinstance(vassal_orchestrator, VassalOrchestrator)
```
1 timmy-benchmark/levels/__init__.py — Normal file
@@ -0,0 +1 @@
```python
"""Cognitive benchmark levels for Project Bannerlord readiness testing."""
```
183 timmy-benchmark/levels/level_0_coin_flip.py — Normal file
@@ -0,0 +1,183 @@
````python
"""Level 0: JSON Compliance — Coin Flip.

Tests whether the model can reliably return well-formed JSON responses
with a specific schema. This is the minimum bar for GABS tool calls.
"""

import json
import time
from dataclasses import dataclass, field
from typing import Any

LEVEL = 0
NAME = "JSON Compliance (Coin Flip)"
DESCRIPTION = "Model must return valid JSON matching a strict schema on each trial."

SYSTEM_PROMPT = """You are a strategic AI agent. You MUST respond ONLY with valid JSON.
No markdown, no explanation, no code fences. Raw JSON only."""

TRIALS = [
    {
        "prompt": (
            'A coin is flipped. Respond with exactly: {"choice": "heads"} or {"choice": "tails"}. '
            "Pick one. JSON only."
        ),
        "schema": {"choice": str},
        "valid_values": {"choice": ["heads", "tails"]},
    },
    {
        "prompt": (
            'You must attack or defend. Respond with: {"action": "attack", "confidence": 0.8} '
            'or {"action": "defend", "confidence": 0.6}. Replace confidence with your own value 0.0-1.0. JSON only.'
        ),
        "schema": {"action": str, "confidence": float},
        "valid_values": {"action": ["attack", "defend"]},
    },
    {
        "prompt": (
            'Choose a direction to march. Respond with exactly: '
            '{"direction": "north", "reason": "string explaining why"}. '
            "Pick north/south/east/west. JSON only."
        ),
        "schema": {"direction": str, "reason": str},
        "valid_values": {"direction": ["north", "south", "east", "west"]},
    },
]


@dataclass
class TrialResult:
    trial_index: int
    prompt: str
    raw_response: str
    parsed: dict | None
    valid_json: bool
    schema_valid: bool
    value_valid: bool
    latency_ms: float
    error: str = ""


@dataclass
class LevelResult:
    level: int = LEVEL
    name: str = NAME
    trials: list[TrialResult] = field(default_factory=list)
    passed: bool = False
    score: float = 0.0
    latency_p50_ms: float = 0.0
    latency_p99_ms: float = 0.0


def _validate_schema(parsed: dict, schema: dict[str, type]) -> bool:
    for key, expected_type in schema.items():
        if key not in parsed:
            return False
        if not isinstance(parsed[key], expected_type):
            # Allow int where float is expected
            if expected_type is float and isinstance(parsed[key], int):
                continue
            return False
    return True


def _validate_values(parsed: dict, valid_values: dict[str, list]) -> bool:
    for key, valid_list in valid_values.items():
        if key in parsed and parsed[key] not in valid_list:
            return False
    return True


def _clean_response(raw: str) -> str:
    """Strip markdown fences if model wrapped JSON in them."""
    raw = raw.strip()
    if raw.startswith("```"):
        lines = raw.splitlines()
        # Remove first and last fence lines
        lines = [l for l in lines if not l.startswith("```")]
        raw = "\n".join(lines).strip()
    return raw


def run(client: Any, model: str, verbose: bool = False) -> LevelResult:
    result = LevelResult()
    latencies = []

    for i, trial in enumerate(TRIALS):
        t0 = time.time()
        try:
            response = client.chat(
                model=model,
                messages=[
                    {"role": "system", "content": SYSTEM_PROMPT},
                    {"role": "user", "content": trial["prompt"]},
                ],
                options={"temperature": 0.1},
            )
            raw = response["message"]["content"]
            latency_ms = (time.time() - t0) * 1000
        except Exception as exc:
            latency_ms = (time.time() - t0) * 1000
            tr = TrialResult(
                trial_index=i,
                prompt=trial["prompt"],
                raw_response="",
                parsed=None,
                valid_json=False,
                schema_valid=False,
                value_valid=False,
                latency_ms=latency_ms,
                error=str(exc),
            )
            result.trials.append(tr)
            if verbose:
                print(f"  Trial {i}: ERROR — {exc}")
            continue

        latencies.append(latency_ms)

        cleaned = _clean_response(raw)
        parsed = None
        valid_json = False
        schema_valid = False
        value_valid = False
        error = ""

        try:
            parsed = json.loads(cleaned)
            valid_json = True
            schema_valid = _validate_schema(parsed, trial["schema"])
            value_valid = _validate_values(parsed, trial["valid_values"])
        except json.JSONDecodeError as exc:
            error = f"JSONDecodeError: {exc}"

        tr = TrialResult(
            trial_index=i,
            prompt=trial["prompt"],
            raw_response=raw,
            parsed=parsed,
            valid_json=valid_json,
            schema_valid=schema_valid,
            value_valid=value_valid,
            latency_ms=latency_ms,
            error=error,
        )
        result.trials.append(tr)

        if verbose:
            status = "PASS" if (valid_json and schema_valid) else "FAIL"
            print(
                f"  Trial {i}: {status} | json={valid_json} schema={schema_valid} "
                f"value={value_valid} | {latency_ms:.0f}ms | {raw[:80]!r}"
            )

    passed_trials = sum(1 for t in result.trials if t.valid_json and t.schema_valid)
    result.score = passed_trials / len(TRIALS)
    result.passed = result.score >= 1.0  # Must pass all 3 trials

    if latencies:
        latencies_sorted = sorted(latencies)
        result.latency_p50_ms = latencies_sorted[len(latencies_sorted) // 2]
        result.latency_p99_ms = latencies_sorted[-1]

    return result
````
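Every level drives the model through the same minimal client contract: `client.chat(model=..., messages=[...], options=...)` returning a mapping whose content is read via `response["message"]["content"]`. The stub below is a hypothetical offline stand-in for that client (not part of this diff; `StubClient` and `clean_response` are illustrative names) showing the fence-stripping and JSON-parsing path the levels rely on:

````python
import json


class StubClient:
    """Hypothetical offline stand-in for the chat client the levels expect."""

    def __init__(self, canned: str):
        self.canned = canned

    def chat(self, model, messages, options=None):
        # Same response shape the level code indexes: response["message"]["content"]
        return {"message": {"content": self.canned}}


def clean_response(raw: str) -> str:
    # Mirrors the _clean_response helper: drop markdown fence lines around JSON
    raw = raw.strip()
    if raw.startswith("```"):
        raw = "\n".join(l for l in raw.splitlines() if not l.startswith("```")).strip()
    return raw


client = StubClient('```json\n{"choice": "heads"}\n```')
raw = client.chat("some-model", [{"role": "user", "content": "flip"}])["message"]["content"]
parsed = json.loads(clean_response(raw))
print(parsed["choice"])  # → heads
````

A stub like this lets the trial loop be exercised without a running model server, which is useful when iterating on the validators themselves.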
211 timmy-benchmark/levels/level_1_tic_tac_toe.py — Normal file
@@ -0,0 +1,211 @@
````python
"""Level 1: Board State Tracking — Tic-Tac-Toe.

Tests whether the model can maintain game state across turns, select
legal moves, and exhibit basic strategic awareness.
Maps to: Bannerlord board state / campaign map tracking.
"""

import json
import time
from dataclasses import dataclass, field
from typing import Any

LEVEL = 1
NAME = "Board State Tracking (Tic-Tac-Toe)"
DESCRIPTION = "Model must track a tic-tac-toe board and make legal, strategic moves."

SYSTEM_PROMPT = """You are a strategic AI playing tic-tac-toe. The board is a 3x3 grid.
Positions are numbered 0-8 left-to-right, top-to-bottom:
0|1|2
3|4|5
6|7|8

You MUST respond ONLY with valid JSON. No markdown, no explanation. Raw JSON only.
Format: {"move": <position 0-8>, "reason": "<brief reason>"}"""


SCENARIOS = [
    {
        "description": "Empty board — opening move",
        "board": [None, None, None, None, None, None, None, None, None],
        "player": "X",
        "prompt": (
            'Board state: [null,null,null,null,null,null,null,null,null]. '
            'You are X. It is your turn. Choose a move. '
            'Respond: {"move": <0-8>, "reason": "<why>"}'
        ),
        "check": lambda move, board: move in range(9) and board[move] is None,
        "check_desc": "Move must be a valid empty position (0-8)",
    },
    {
        "description": "Block opponent's winning move",
        "board": ["O", None, "O", None, "X", None, None, None, None],
        "player": "X",
        "prompt": (
            'Board: ["O",null,"O",null,"X",null,null,null,null]. '
            "O has positions 0 and 2. You are X. "
            "O will win on next turn unless you block. "
            'Respond: {"move": <0-8>, "reason": "<why>"}'
        ),
        "check": lambda move, board: move == 1,  # Must block at position 1
        "check_desc": "Must block O's win at position 1",
    },
    {
        "description": "Take winning move",
        "board": ["X", None, "X", None, "O", None, None, "O", None],
        "player": "X",
        "prompt": (
            'Board: ["X",null,"X",null,"O",null,null,"O",null]. '
            "You are X. You have positions 0 and 2. "
            "You can win this turn. "
            'Respond: {"move": <0-8>, "reason": "<why>"}'
        ),
        "check": lambda move, board: move == 1,  # Win at position 1
        "check_desc": "Must take winning move at position 1",
    },
    {
        "description": "Legal move on partially filled board",
        "board": ["X", "O", "X", "O", "X", "O", None, None, None],
        "player": "O",
        "prompt": (
            'Board: ["X","O","X","O","X","O",null,null,null]. '
            "You are O. Choose a legal move (positions 6, 7, or 8 are available). "
            'Respond: {"move": <0-8>, "reason": "<why>"}'
        ),
        "check": lambda move, board: move in [6, 7, 8],
        "check_desc": "Move must be one of the empty positions: 6, 7, or 8",
    },
]


@dataclass
class ScenarioResult:
    scenario_index: int
    description: str
    prompt: str
    raw_response: str
    parsed: dict | None
    valid_json: bool
    move_legal: bool
    move_correct: bool
    latency_ms: float
    error: str = ""


@dataclass
class LevelResult:
    level: int = LEVEL
    name: str = NAME
    trials: list[ScenarioResult] = field(default_factory=list)
    passed: bool = False
    score: float = 0.0
    latency_p50_ms: float = 0.0
    latency_p99_ms: float = 0.0


def _clean_response(raw: str) -> str:
    raw = raw.strip()
    if raw.startswith("```"):
        lines = raw.splitlines()
        lines = [l for l in lines if not l.startswith("```")]
        raw = "\n".join(lines).strip()
    return raw


def run(client: Any, model: str, verbose: bool = False) -> LevelResult:
    result = LevelResult()
    latencies = []

    for i, scenario in enumerate(SCENARIOS):
        t0 = time.time()
        try:
            response = client.chat(
                model=model,
                messages=[
                    {"role": "system", "content": SYSTEM_PROMPT},
                    {"role": "user", "content": scenario["prompt"]},
                ],
                options={"temperature": 0.1},
            )
            raw = response["message"]["content"]
            latency_ms = (time.time() - t0) * 1000
        except Exception as exc:
            latency_ms = (time.time() - t0) * 1000
            sr = ScenarioResult(
                scenario_index=i,
                description=scenario["description"],
                prompt=scenario["prompt"],
                raw_response="",
                parsed=None,
                valid_json=False,
                move_legal=False,
                move_correct=False,
                latency_ms=latency_ms,
                error=str(exc),
            )
            result.trials.append(sr)
            if verbose:
                print(f"  Scenario {i}: ERROR — {exc}")
            continue

        latencies.append(latency_ms)

        cleaned = _clean_response(raw)
        parsed = None
        valid_json = False
        move_legal = False
        move_correct = False
        error = ""

        try:
            parsed = json.loads(cleaned)
            valid_json = True

            if "move" in parsed:
                move = parsed["move"]
                # Coerce string digits to int (some models emit "4" instead of 4)
                if isinstance(move, str) and move.strip().lstrip("-").isdigit():
                    move = int(move.strip())
                if isinstance(move, int):
                    board = scenario["board"]
                    move_legal = 0 <= move <= 8 and board[move] is None
                    move_correct = scenario["check"](move, board)
        except json.JSONDecodeError as exc:
            error = f"JSONDecodeError: {exc}"

        sr = ScenarioResult(
            scenario_index=i,
            description=scenario["description"],
            prompt=scenario["prompt"],
            raw_response=raw,
            parsed=parsed,
            valid_json=valid_json,
            move_legal=move_legal,
            move_correct=move_correct,
            latency_ms=latency_ms,
            error=error,
        )
        result.trials.append(sr)

        if verbose:
            status = "PASS" if (valid_json and move_legal) else "FAIL"
            correct_str = "CORRECT" if move_correct else "suboptimal"
            move_val = parsed.get("move", "?") if parsed else "?"
            print(
                f"  Scenario {i} [{scenario['description']}]: {status} ({correct_str}) "
                f"| move={move_val} | {latency_ms:.0f}ms"
            )
            if not move_correct and valid_json:
                print(f"    Expected: {scenario['check_desc']}")

    # Pass criteria: all moves must be valid JSON + legal
    legal_moves = sum(1 for t in result.trials if t.valid_json and t.move_legal)
    result.score = legal_moves / len(SCENARIOS)
    result.passed = result.score >= 1.0

    if latencies:
        latencies_sorted = sorted(latencies)
        result.latency_p50_ms = latencies_sorted[len(latencies_sorted) // 2]
        result.latency_p99_ms = latencies_sorted[-1]

    return result
````
213 timmy-benchmark/levels/level_2_resource_mgmt.py — Normal file
@@ -0,0 +1,213 @@
````python
"""Level 2: Resource Management — Party Economy.

Tests whether the model can allocate limited resources across competing
priorities and adapt when constraints change.
Maps to: Bannerlord party economy (troops, food, gold, morale).
"""

import json
import time
from dataclasses import dataclass, field
from typing import Any

LEVEL = 2
NAME = "Resource Management (Party Economy)"
DESCRIPTION = "Model must allocate limited resources across troops, food, and equipment."

SYSTEM_PROMPT = """You are a Bannerlord campaign advisor managing a party.
Resources are limited — every decision has trade-offs.
You MUST respond ONLY with valid JSON. No markdown, no explanation. Raw JSON only."""

SCENARIOS = [
    {
        "description": "Budget allocation under constraint",
        "prompt": (
            "You have 500 gold. Options:\n"
            "- Recruit 10 infantry: costs 300 gold, +10 combat strength\n"
            "- Buy food for 20 days: costs 200 gold, keeps morale stable\n"
            "- Repair armor: costs 150 gold, -20% casualty rate\n\n"
            "You cannot afford all three. Morale is currently CRITICAL (troops may desert).\n"
            'Choose 1-2 options. Respond: {"choices": ["option_a", ...], "gold_spent": <int>, "reason": "<why>"}\n'
            "Where option keys are: recruit_infantry, buy_food, repair_armor"
        ),
        "check": lambda r: (
            isinstance(r.get("choices"), list)
            and len(r["choices"]) >= 1
            and all(c in ["recruit_infantry", "buy_food", "repair_armor"] for c in r["choices"])
            and isinstance(r.get("gold_spent"), (int, float))
            and r.get("gold_spent", 9999) <= 500
        ),
        "check_desc": "choices must be valid options, gold_spent <= 500",
        "strategic_check": lambda r: "buy_food" in r.get("choices", []),
        "strategic_desc": "With CRITICAL morale, food should be prioritized",
    },
    {
        "description": "Troop tier upgrade decision",
        "prompt": (
            "Party status:\n"
            "- 15 Tier-1 recruits (weak, 30 upkeep/day)\n"
            "- 5 Tier-3 veterans (strong, 90 upkeep/day)\n"
            "- Daily income: 200 gold\n"
            "- Upcoming: raider camp attack (moderate difficulty)\n\n"
            "Options:\n"
            "- Upgrade 5 recruits to Tier-2 (costs 250 gold total)\n"
            "- Keep all current troops, save gold for emergencies\n"
            "- Dismiss 5 recruits to save upkeep\n\n"
            'Respond: {"action": "upgrade_recruits"|"save_gold"|"dismiss_recruits", '
            '"reason": "<why>", "expected_outcome": "<string>"}'
        ),
        "check": lambda r: (
            r.get("action") in ["upgrade_recruits", "save_gold", "dismiss_recruits"]
            and isinstance(r.get("reason"), str)
            and len(r.get("reason", "")) > 0
        ),
        "check_desc": "action must be one of the three options with a non-empty reason",
        "strategic_check": lambda r: r.get("action") in ["upgrade_recruits", "save_gold"],
        "strategic_desc": "Dismissing troops before a fight is suboptimal",
    },
    {
        "description": "Multi-turn planning horizon",
        "prompt": (
            "Current: 300 gold, 10 days of food, 20 troops\n"
            "Day 5: Must cross desert (costs 5 extra food days)\n"
            "Day 10: Reach town (can buy supplies)\n\n"
            "You need a 15-day food reserve to survive the journey.\n"
            "Food costs 10 gold/day. You have enough for 10 days now.\n\n"
            "How many extra food days do you buy today?\n"
            'Respond: {"extra_food_days": <int>, "cost": <int>, "remaining_gold": <int>, "reason": "<why>"}'
        ),
        "check": lambda r: (
            isinstance(r.get("extra_food_days"), (int, float))
            and isinstance(r.get("cost"), (int, float))
            and isinstance(r.get("remaining_gold"), (int, float))
        ),
        "check_desc": "Must include extra_food_days, cost, remaining_gold as numbers",
        "strategic_check": lambda r: r.get("extra_food_days", 0) >= 5,
        "strategic_desc": "Need at least 5 more days of food for desert crossing",
    },
]


@dataclass
class ScenarioResult:
    scenario_index: int
    description: str
    raw_response: str
    parsed: dict | None
    valid_json: bool
    schema_valid: bool
    strategically_sound: bool
    latency_ms: float
    error: str = ""


@dataclass
class LevelResult:
    level: int = LEVEL
    name: str = NAME
    trials: list[ScenarioResult] = field(default_factory=list)
    passed: bool = False
    score: float = 0.0
    latency_p50_ms: float = 0.0
    latency_p99_ms: float = 0.0


def _clean_response(raw: str) -> str:
    raw = raw.strip()
    if raw.startswith("```"):
        lines = raw.splitlines()
        lines = [l for l in lines if not l.startswith("```")]
        raw = "\n".join(lines).strip()
    return raw


def run(client: Any, model: str, verbose: bool = False) -> LevelResult:
    result = LevelResult()
    latencies = []

    for i, scenario in enumerate(SCENARIOS):
        t0 = time.time()
        try:
            response = client.chat(
                model=model,
                messages=[
                    {"role": "system", "content": SYSTEM_PROMPT},
                    {"role": "user", "content": scenario["prompt"]},
                ],
                options={"temperature": 0.1},
            )
            raw = response["message"]["content"]
            latency_ms = (time.time() - t0) * 1000
        except Exception as exc:
            latency_ms = (time.time() - t0) * 1000
            sr = ScenarioResult(
                scenario_index=i,
                description=scenario["description"],
                raw_response="",
                parsed=None,
                valid_json=False,
                schema_valid=False,
                strategically_sound=False,
                latency_ms=latency_ms,
                error=str(exc),
            )
            result.trials.append(sr)
            if verbose:
                print(f"  Scenario {i}: ERROR — {exc}")
            continue

        latencies.append(latency_ms)

        cleaned = _clean_response(raw)
        parsed = None
        valid_json = False
        schema_valid = False
        strategically_sound = False
        error = ""

        try:
            parsed = json.loads(cleaned)
            valid_json = True
            schema_valid = scenario["check"](parsed)
            if schema_valid:
                strategically_sound = scenario["strategic_check"](parsed)
        except json.JSONDecodeError as exc:
            error = f"JSONDecodeError: {exc}"
        except Exception as exc:
            error = f"Validation error: {exc}"

        sr = ScenarioResult(
            scenario_index=i,
            description=scenario["description"],
            raw_response=raw,
            parsed=parsed,
            valid_json=valid_json,
            schema_valid=schema_valid,
            strategically_sound=strategically_sound,
            latency_ms=latency_ms,
            error=error,
        )
        result.trials.append(sr)

        if verbose:
            status = "PASS" if (valid_json and schema_valid) else "FAIL"
            strat = "strategic" if strategically_sound else "suboptimal"
            print(
                f"  Scenario {i} [{scenario['description']}]: {status} ({strat}) "
                f"| {latency_ms:.0f}ms"
            )
            if not schema_valid and valid_json:
                print(f"    Schema issue: {scenario['check_desc']}")
            if not strategically_sound and schema_valid:
                print(f"    Strategy note: {scenario['strategic_desc']}")

    valid_count = sum(1 for t in result.trials if t.valid_json and t.schema_valid)
    result.score = valid_count / len(SCENARIOS)
    result.passed = result.score >= 0.67  # 2/3 scenarios

    if latencies:
        latencies_sorted = sorted(latencies)
        result.latency_p50_ms = latencies_sorted[len(latencies_sorted) // 2]
        result.latency_p99_ms = latencies_sorted[-1]

    return result
````
216 timmy-benchmark/levels/level_3_battle_tactics.py — Normal file
@@ -0,0 +1,216 @@
````python
"""Level 3: Battle Tactics — Formation Commands.

Tests whether the model can issue coherent formation and tactical orders
under simulated battlefield pressure with multiple unit types.
Maps to: Bannerlord formation commands (charge, shield wall, skirmish, etc.).
"""

import json
import time
from dataclasses import dataclass, field
from typing import Any

LEVEL = 3
NAME = "Battle Tactics (Formation Commands)"
DESCRIPTION = "Model must issue tactically sound formation orders under simulated battle conditions."

SYSTEM_PROMPT = """You are a Bannerlord battle commander. Issue formation orders using these commands:
- shield_wall: infantry forms defensive line (good vs ranged, slow advance)
- charge: all-out attack (high casualties, breaks weak enemies fast)
- skirmish: ranged units pepper enemy (good vs heavy infantry, needs distance)
- advance: move forward holding formation (balanced)
- flank_left / flank_right: cavalry sweeps around enemy side
- fallback: retreat to regroup (when badly outnumbered)

You MUST respond ONLY with valid JSON. No markdown. Raw JSON only."""

SCENARIOS = [
    {
        "description": "Ranged vs infantry — defensive opening",
        "prompt": (
            "Situation: You have 20 archers + 10 infantry. Enemy has 30 heavy infantry, no ranged.\n"
            "Enemy is 200m away and advancing.\n"
            "Objective: Maximize casualties before melee contact.\n\n"
            'Issue orders for both units. Respond:\n'
            '{"infantry_order": "<command>", "archer_order": "<command>", '
            '"reason": "<tactical reasoning>", "expected_outcome": "<string>"}'
        ),
        "check": lambda r: (
            r.get("infantry_order") in ["shield_wall", "charge", "skirmish", "advance", "flank_left", "flank_right", "fallback"]
            and r.get("archer_order") in ["shield_wall", "charge", "skirmish", "advance", "flank_left", "flank_right", "fallback"]
            and isinstance(r.get("reason"), str)
        ),
        "check_desc": "Both orders must be valid commands",
        "strategic_check": lambda r: (
            r.get("archer_order") == "skirmish"
            and r.get("infantry_order") in ["shield_wall", "advance"]
        ),
        "strategic_desc": "Archers should skirmish while infantry holds (shield_wall or advance)",
    },
    {
        "description": "Outnumbered — retreat decision",
        "prompt": (
            "Situation: Your party (15 troops) has been ambushed.\n"
            "Enemy: 60 bandits, surrounding you on 3 sides.\n"
            "Your troops: 40% wounded. One escape route to the east.\n\n"
            'What is your command? Respond:\n'
            '{"order": "<command>", "direction": "east"|"west"|"north"|"south"|null, '
            '"reason": "<tactical reasoning>", "priority": "preserve_troops"|"fight_through"}'
        ),
        "check": lambda r: (
            r.get("order") in ["shield_wall", "charge", "skirmish", "advance", "flank_left", "flank_right", "fallback"]
            and r.get("priority") in ["preserve_troops", "fight_through"]
        ),
        "check_desc": "order and priority must be valid values",
        "strategic_check": lambda r: (
            r.get("order") == "fallback"
            and r.get("priority") == "preserve_troops"
        ),
        "strategic_desc": "Outnumbered 4:1 with wounded troops — fallback is the sound choice",
    },
    {
        "description": "Cavalry flanking opportunity",
        "prompt": (
            "Situation: Main battle is engaged. Your infantry and enemy infantry are locked.\n"
            "You have 8 cavalry in reserve. Enemy left flank is unprotected.\n"
            "If cavalry hits the flank now, it will route enemy in ~30 seconds.\n\n"
            'Order for cavalry: Respond:\n'
            '{"cavalry_order": "<command>", "timing": "now"|"wait", '
            '"reason": "<reasoning>", "risk": "low"|"medium"|"high"}'
        ),
        "check": lambda r: (
            r.get("cavalry_order") in ["shield_wall", "charge", "skirmish", "advance", "flank_left", "flank_right", "fallback"]
            and r.get("timing") in ["now", "wait"]
            and r.get("risk") in ["low", "medium", "high"]
        ),
        "check_desc": "cavalry_order, timing, and risk must be valid values",
        "strategic_check": lambda r: (
            r.get("cavalry_order") in ["flank_left", "flank_right", "charge"]
            and r.get("timing") == "now"
        ),
        "strategic_desc": "Should capitalize on the flank opportunity immediately",
    },
]


@dataclass
class ScenarioResult:
    scenario_index: int
    description: str
    raw_response: str
    parsed: dict | None
    valid_json: bool
    schema_valid: bool
    strategically_sound: bool
    latency_ms: float
    error: str = ""


@dataclass
class LevelResult:
    level: int = LEVEL
    name: str = NAME
    trials: list[ScenarioResult] = field(default_factory=list)
    passed: bool = False
    score: float = 0.0
    latency_p50_ms: float = 0.0
    latency_p99_ms: float = 0.0


def _clean_response(raw: str) -> str:
    raw = raw.strip()
    if raw.startswith("```"):
        lines = raw.splitlines()
        lines = [l for l in lines if not l.startswith("```")]
        raw = "\n".join(lines).strip()
    return raw


def run(client: Any, model: str, verbose: bool = False) -> LevelResult:
    result = LevelResult()
    latencies = []

    for i, scenario in enumerate(SCENARIOS):
        t0 = time.time()
        try:
            response = client.chat(
                model=model,
                messages=[
                    {"role": "system", "content": SYSTEM_PROMPT},
                    {"role": "user", "content": scenario["prompt"]},
                ],
                options={"temperature": 0.2},
            )
            raw = response["message"]["content"]
            latency_ms = (time.time() - t0) * 1000
        except Exception as exc:
            latency_ms = (time.time() - t0) * 1000
            sr = ScenarioResult(
                scenario_index=i,
                description=scenario["description"],
                raw_response="",
                parsed=None,
                valid_json=False,
                schema_valid=False,
                strategically_sound=False,
                latency_ms=latency_ms,
                error=str(exc),
            )
            result.trials.append(sr)
            if verbose:
                print(f"  Scenario {i}: ERROR — {exc}")
            continue

        latencies.append(latency_ms)

        cleaned = _clean_response(raw)
        parsed = None
        valid_json = False
        schema_valid = False
        strategically_sound = False
        error = ""

        try:
            parsed = json.loads(cleaned)
            valid_json = True
            schema_valid = scenario["check"](parsed)
            if schema_valid:
                strategically_sound = scenario["strategic_check"](parsed)
        except json.JSONDecodeError as exc:
            error = f"JSONDecodeError: {exc}"
        except Exception as exc:
            error = f"Validation error: {exc}"

        sr = ScenarioResult(
            scenario_index=i,
            description=scenario["description"],
            raw_response=raw,
            parsed=parsed,
            valid_json=valid_json,
            schema_valid=schema_valid,
            strategically_sound=strategically_sound,
            latency_ms=latency_ms,
            error=error,
        )
        result.trials.append(sr)

        if verbose:
            status = "PASS" if (valid_json and schema_valid) else "FAIL"
            strat = "strategic" if strategically_sound else "suboptimal"
            print(
                f"  Scenario {i} [{scenario['description']}]: {status} ({strat}) "
                f"| {latency_ms:.0f}ms"
            )
            if not schema_valid and valid_json:
                print(f"    Schema issue: {scenario['check_desc']}")

    valid_count = sum(1 for t in result.trials if t.valid_json and t.schema_valid)
    result.score = valid_count / len(SCENARIOS)
    result.passed = result.score >= 0.67

    if latencies:
        latencies_sorted = sorted(latencies)
        result.latency_p50_ms = latencies_sorted[len(latencies_sorted) // 2]
        result.latency_p99_ms = latencies_sorted[-1]

    return result
````
223
timmy-benchmark/levels/level_4_trade_route.py
Normal file
223
timmy-benchmark/levels/level_4_trade_route.py
Normal file
@@ -0,0 +1,223 @@
"""Level 4: Trade Route — Campaign Navigation.

Tests multi-step planning ability: route optimization, trade-off analysis
across time horizons, and adapting plans when conditions change.
Maps to: Bannerlord campaign map navigation, caravans, and economy.
"""

import json
import time
from dataclasses import dataclass, field
from typing import Any

LEVEL = 4
NAME = "Trade Route (Campaign Navigation)"
DESCRIPTION = "Model must plan optimal routes and adapt to changing conditions on the campaign map."

SYSTEM_PROMPT = """You are a Bannerlord merchant lord planning campaign movements.
Consider distance, profitability, risk, and timing.
You MUST respond ONLY with valid JSON. No markdown. Raw JSON only."""

SCENARIOS = [
    {
        "description": "Optimal trade route selection",
        "prompt": (
            "You are at Epicrotea with 500 gold and 20 days travel budget.\n\n"
            "Trade opportunities:\n"
            "- Route A: Epicrotea → Vlandia (3 days) → Sturgia (5 days back)\n"
            "  Sell grain in Vlandia: +300 gold. Buy furs in Sturgia: costs 200, sells for 400 in Calradia.\n"
            "  Total: +500 gold profit, 8 days.\n"
            "- Route B: Epicrotea → Calradia (2 days) → Aserai (4 days)\n"
            "  Sell iron in Calradia: +150 gold. Buy spice in Aserai: costs 300, sells for 600 in Empire.\n"
            "  Empire is 6 more days away. Total: +450 gold profit, 12 days.\n"
            "- Route C: Epicrotea → nearby village (1 day)\n"
            "  Buy cheap food: costs 100, sells for 180 in any city.\n"
            "  Total: +80 gold profit, 2 days. Repeatable.\n\n"
            'Choose route. Respond:\n'
            '{"route": "A"|"B"|"C", "expected_profit": <int>, "days_used": <int>, '
            '"reason": "<reasoning>", "risk": "low"|"medium"|"high"}'
        ),
        "check": lambda r: (
            r.get("route") in ["A", "B", "C"]
            and isinstance(r.get("expected_profit"), (int, float))
            and isinstance(r.get("days_used"), (int, float))
            and r.get("risk") in ["low", "medium", "high"]
        ),
        "check_desc": "route, expected_profit, days_used, risk must be valid",
        "strategic_check": lambda r: r.get("route") in ["A", "C"],  # A is best single trip, C is best if repeated
        "strategic_desc": "Route A has best profit/day ratio; C is best if multiple loops possible",
    },
    {
        "description": "Adapt plan when war declared",
        "prompt": (
            "You were heading to Vlandia to trade, 2 days into the journey.\n"
            "NEWS: Vlandia just declared war on your faction. Entering Vlandia territory is now dangerous.\n\n"
            "Your current position: borderlands, equidistant between:\n"
            "- Vlandia (2 days): Now at war — high risk of attack\n"
            "- Sturgia (3 days): Neutral — safe\n"
            "- Empire (4 days): Allied — very safe, good prices\n\n"
            "You have 400 gold of trade goods for the Vlandia market.\n"
            'What do you do? Respond:\n'
            '{"decision": "continue_to_vlandia"|"divert_to_sturgia"|"divert_to_empire", '
            '"reason": "<why>", "gold_at_risk": <int>}'
        ),
        "check": lambda r: (
            r.get("decision") in ["continue_to_vlandia", "divert_to_sturgia", "divert_to_empire"]
            and isinstance(r.get("gold_at_risk"), (int, float))
        ),
        "check_desc": "decision must be one of three options, gold_at_risk must be a number",
        "strategic_check": lambda r: r.get("decision") in ["divert_to_sturgia", "divert_to_empire"],
        "strategic_desc": "Should avoid active war zone — divert to safe destination",
    },
    {
        "description": "Multi-stop route planning with constraints",
        "prompt": (
            "Plan a 3-stop trading circuit starting and ending at Pravend.\n"
            "Budget: 800 gold. Time limit: 20 days.\n\n"
            "Available cities and travel times from Pravend:\n"
            "- Rhotae: 2 days (leather cheap, sells well in south)\n"
            "- Ortysia: 4 days (grain surplus — buy cheap)\n"
            "- Epicrotea: 3 days (iron market — buy/sell)\n"
            "- Pen Cannoc: 5 days (wine — high profit, far)\n\n"
            "Each stop takes 1 day for trading.\n"
            'Plan 3 stops. Respond:\n'
            '{"stops": ["<city1>", "<city2>", "<city3>"], '
            '"total_days": <int>, "estimated_profit": <int>, '
            '"reason": "<reasoning>"}'
        ),
        "check": lambda r: (
            isinstance(r.get("stops"), list)
            and len(r["stops"]) == 3
            and all(isinstance(s, str) for s in r["stops"])
            and isinstance(r.get("total_days"), (int, float))
            and r.get("total_days", 99) <= 20
            and isinstance(r.get("estimated_profit"), (int, float))
        ),
        "check_desc": "stops must be list of 3 strings, total_days <= 20, estimated_profit numeric",
        "strategic_check": lambda r: "Pen Cannoc" not in r.get("stops", []),  # Too far for 20 days
        "strategic_desc": "Pen Cannoc at 5 days each way is likely too far for a 20-day circuit",
    },
]


@dataclass
class ScenarioResult:
    scenario_index: int
    description: str
    raw_response: str
    parsed: dict | None
    valid_json: bool
    schema_valid: bool
    strategically_sound: bool
    latency_ms: float
    error: str = ""


@dataclass
class LevelResult:
    level: int = LEVEL
    name: str = NAME
    trials: list[ScenarioResult] = field(default_factory=list)
    passed: bool = False
    score: float = 0.0
    latency_p50_ms: float = 0.0
    latency_p99_ms: float = 0.0


def _clean_response(raw: str) -> str:
    raw = raw.strip()
    if raw.startswith("```"):
        lines = raw.splitlines()
        lines = [l for l in lines if not l.startswith("```")]
        raw = "\n".join(lines).strip()
    return raw


def run(client: Any, model: str, verbose: bool = False) -> LevelResult:
    result = LevelResult()
    latencies = []

    for i, scenario in enumerate(SCENARIOS):
        t0 = time.time()
        try:
            response = client.chat(
                model=model,
                messages=[
                    {"role": "system", "content": SYSTEM_PROMPT},
                    {"role": "user", "content": scenario["prompt"]},
                ],
                options={"temperature": 0.2},
            )
            raw = response["message"]["content"]
            latency_ms = (time.time() - t0) * 1000
        except Exception as exc:
            latency_ms = (time.time() - t0) * 1000
            sr = ScenarioResult(
                scenario_index=i,
                description=scenario["description"],
                raw_response="",
                parsed=None,
                valid_json=False,
                schema_valid=False,
                strategically_sound=False,
                latency_ms=latency_ms,
                error=str(exc),
            )
            result.trials.append(sr)
            if verbose:
                print(f" Scenario {i}: ERROR — {exc}")
            continue

        latencies.append(latency_ms)

        cleaned = _clean_response(raw)
        parsed = None
        valid_json = False
        schema_valid = False
        strategically_sound = False
        error = ""

        try:
            parsed = json.loads(cleaned)
            valid_json = True
            schema_valid = scenario["check"](parsed)
            if schema_valid:
                strategically_sound = scenario["strategic_check"](parsed)
        except json.JSONDecodeError as exc:
            error = f"JSONDecodeError: {exc}"
        except Exception as exc:
            error = f"Validation error: {exc}"

        sr = ScenarioResult(
            scenario_index=i,
            description=scenario["description"],
            raw_response=raw,
            parsed=parsed,
            valid_json=valid_json,
            schema_valid=schema_valid,
            strategically_sound=strategically_sound,
            latency_ms=latency_ms,
            error=error,
        )
        result.trials.append(sr)

        if verbose:
            status = "PASS" if (valid_json and schema_valid) else "FAIL"
            strat = "strategic" if strategically_sound else "suboptimal"
            print(
                f" Scenario {i} [{scenario['description']}]: {status} ({strat}) "
                f"| {latency_ms:.0f}ms"
            )
            if not schema_valid and valid_json:
                print(f" Schema issue: {scenario['check_desc']}")

    valid_count = sum(1 for t in result.trials if t.valid_json and t.schema_valid)
    result.score = valid_count / len(SCENARIOS)
    result.passed = result.score >= 0.67

    if latencies:
        latencies_sorted = sorted(latencies)
        result.latency_p50_ms = latencies_sorted[len(latencies_sorted) // 2]
        result.latency_p99_ms = latencies_sorted[-1]

    return result
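The fence-stripping helper `_clean_response` above can be exercised standalone. A minimal sketch (the fence string is built with `"`" * 3` only to keep literal backtick fences out of this snippet):

```python
FENCE = "`" * 3  # the literal triple-backtick fence marker

def _clean_response(raw: str) -> str:
    # Strip markdown code fences that chat models often wrap around JSON.
    raw = raw.strip()
    if raw.startswith(FENCE):
        lines = [l for l in raw.splitlines() if not l.startswith(FENCE)]
        raw = "\n".join(lines).strip()
    return raw

wrapped = FENCE + 'json\n{"route": "A"}\n' + FENCE
assert _clean_response(wrapped) == '{"route": "A"}'
assert _clean_response('{"route": "A"}') == '{"route": "A"}'
```

Note this drops fence lines anywhere in the response, not just at the edges, which matches the behavior of the committed helper.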
timmy-benchmark/levels/level_5_mini_campaign.py · 252 lines · Normal file
@@ -0,0 +1,252 @@
"""Level 5: Mini Campaign — Full Campaign Loop.

Tests multi-turn strategic coherence: the model must maintain state across
several turns of a simulated Bannerlord campaign, making consistent decisions
that build toward a long-term goal.
Maps to: Full Bannerlord campaign loop — economy, diplomacy, conquest.
"""

import json
import time
from dataclasses import dataclass, field
from typing import Any

LEVEL = 5
NAME = "Mini Campaign (Full Campaign Loop)"
DESCRIPTION = "Multi-turn strategic planning maintaining coherent goals across 4 turns."

SYSTEM_PROMPT = """You are Timmy, a Bannerlord lord with ambitions to become King of Calradia.
You have 4 turns to establish a power base. Each turn represents 2 weeks of in-game time.

Your starting position:
- Clan tier: 1 (minor lord)
- Gold: 1000
- Troops: 25 (mixed infantry/cavalry)
- Renown: 150
- Relations: Neutral with all factions

Winning requires: Gold > 3000 AND Renown > 400 AND Own 1+ settlement by Turn 4.

Each turn, choose ONE primary action:
- "raid_village": +200 gold, -50 relations target faction, +30 renown, risk of retaliation
- "trade_circuit": +300 gold, 0 relation change, +10 renown, no risk
- "escort_caravan": +150 gold, +20 relations with faction, +20 renown
- "tournament": costs 100 gold, +60 renown, +20 relations with host faction
- "recruit_troops": costs 200 gold, +15 troops, no other change
- "siege_castle": costs 500 gold + 200 troops morale, -100 relations, +80 renown, +1 settlement if succeed (30% base chance)
- "pledge_vassalage": 0 cost, +100 relations with liege, +50 renown, lose independence

You MUST respond ONLY with valid JSON for each turn. Raw JSON only."""


def run(client: Any, model: str, verbose: bool = False) -> "LevelResult":
    """Run a 4-turn mini campaign, tracking state and decision quality."""
    result = LevelResult()

    # Initial game state
    state = {
        "turn": 1,
        "gold": 1000,
        "troops": 25,
        "renown": 150,
        "settlements": 0,
        "relations": {"vlandia": 0, "sturgia": 0, "empire": 0, "aserai": 0, "battania": 0},
    }

    conversation = [{"role": "system", "content": SYSTEM_PROMPT}]
    turns_passed = []
    total_latency = []

    valid_actions = [
        "raid_village", "trade_circuit", "escort_caravan", "tournament",
        "recruit_troops", "siege_castle", "pledge_vassalage",
    ]

    for turn_num in range(1, 5):
        state["turn"] = turn_num
        state_str = json.dumps(state, indent=2)

        prompt = (
            f"=== TURN {turn_num} / 4 ===\n"
            f"Current state:\n{state_str}\n\n"
            f"Win conditions remaining: Gold > 3000 ({state['gold']}/3000), "
            f"Renown > 400 ({state['renown']}/400), Settlements >= 1 ({state['settlements']}/1)\n\n"
            f"Choose your action for Turn {turn_num}.\n"
            f'Respond: {{"action": "<action>", "target_faction": "<faction or null>", '
            f'"reason": "<strategic reasoning>", "goal": "<what this advances>"}}'
        )

        conversation.append({"role": "user", "content": prompt})

        t0 = time.time()
        try:
            response = client.chat(
                model=model,
                messages=conversation,
                options={"temperature": 0.3},
            )
            raw = response["message"]["content"]
            latency_ms = (time.time() - t0) * 1000
        except Exception as exc:
            latency_ms = (time.time() - t0) * 1000
            tr = TurnResult(
                turn=turn_num,
                state_before=dict(state),
                raw_response="",
                parsed=None,
                valid_json=False,
                valid_action=False,
                action=None,
                latency_ms=latency_ms,
                error=str(exc),
            )
            turns_passed.append(tr)
            if verbose:
                print(f" Turn {turn_num}: ERROR — {exc}")
            break

        total_latency.append(latency_ms)

        # Clean and parse response
        cleaned = raw.strip()
        if cleaned.startswith("```"):
            lines = cleaned.splitlines()
            lines = [l for l in lines if not l.startswith("```")]
            cleaned = "\n".join(lines).strip()

        parsed = None
        valid_json = False
        valid_action = False
        action = None
        error = ""

        try:
            parsed = json.loads(cleaned)
            valid_json = True
            action = parsed.get("action")
            valid_action = action in valid_actions
        except json.JSONDecodeError as exc:
            error = f"JSONDecodeError: {exc}"

        tr = TurnResult(
            turn=turn_num,
            state_before=dict(state),
            raw_response=raw,
            parsed=parsed,
            valid_json=valid_json,
            valid_action=valid_action,
            action=action,
            latency_ms=latency_ms,
            error=error,
        )
        turns_passed.append(tr)

        # Add model response to conversation for continuity
        conversation.append({"role": "assistant", "content": raw})

        # Apply state changes based on action
        if valid_action:
            _apply_action(state, action, parsed.get("target_faction"))

        if verbose:
            status = "PASS" if (valid_json and valid_action) else "FAIL"
            print(
                f" Turn {turn_num}: {status} | action={action} | {latency_ms:.0f}ms | "
                f"gold={state['gold']} renown={state['renown']} settlements={state['settlements']}"
            )

    result.turns = turns_passed
    result.final_state = dict(state)

    # Win condition check
    result.reached_gold_target = state["gold"] >= 3000
    result.reached_renown_target = state["renown"] >= 400
    result.reached_settlement_target = state["settlements"] >= 1

    # Score: % of turns with valid JSON + valid action
    valid_turns = sum(1 for t in turns_passed if t.valid_json and t.valid_action)
    result.score = valid_turns / 4 if turns_passed else 0.0
    result.passed = result.score >= 0.75  # 3/4 turns valid

    if total_latency:
        latencies_sorted = sorted(total_latency)
        result.latency_p50_ms = latencies_sorted[len(latencies_sorted) // 2]
        result.latency_p99_ms = latencies_sorted[-1]

    if verbose:
        win_status = []
        if result.reached_gold_target:
            win_status.append("GOLD")
        if result.reached_renown_target:
            win_status.append("RENOWN")
        if result.reached_settlement_target:
            win_status.append("SETTLEMENT")
        print(f" Win conditions met: {win_status or 'none'}")
        print(f" Final: gold={state['gold']} renown={state['renown']} settlements={state['settlements']}")

    return result


def _apply_action(state: dict, action: str, target_faction: str | None) -> None:
    """Simulate game state changes for a given action."""
    if action == "raid_village":
        state["gold"] += 200
        state["renown"] += 30
        if target_faction and target_faction in state["relations"]:
            state["relations"][target_faction] -= 50
    elif action == "trade_circuit":
        state["gold"] += 300
        state["renown"] += 10
    elif action == "escort_caravan":
        state["gold"] += 150
        state["renown"] += 20
        if target_faction and target_faction in state["relations"]:
            state["relations"][target_faction] += 20
    elif action == "tournament":
        state["gold"] -= 100
        state["renown"] += 60
        if target_faction and target_faction in state["relations"]:
            state["relations"][target_faction] += 20
    elif action == "recruit_troops":
        state["gold"] -= 200
        state["troops"] += 15
    elif action == "siege_castle":
        state["gold"] -= 500
        state["renown"] += 80
        # 30% chance success (deterministic sim: succeed on turn 3+ if attempted)
        if state["turn"] >= 3:
            state["settlements"] += 1
        if target_faction and target_faction in state["relations"]:
            state["relations"][target_faction] -= 100
    elif action == "pledge_vassalage":
        state["renown"] += 50
        if target_faction and target_faction in state["relations"]:
            state["relations"][target_faction] += 100


@dataclass
class TurnResult:
    turn: int
    state_before: dict
    raw_response: str
    parsed: dict | None
    valid_json: bool
    valid_action: bool
    action: str | None
    latency_ms: float
    error: str = ""


@dataclass
class LevelResult:
    level: int = LEVEL
    name: str = NAME
    turns: list[TurnResult] = field(default_factory=list)
    final_state: dict = field(default_factory=dict)
    passed: bool = False
    score: float = 0.0
    reached_gold_target: bool = False
    reached_renown_target: bool = False
    reached_settlement_target: bool = False
    latency_p50_ms: float = 0.0
    latency_p99_ms: float = 0.0
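One property of the deterministic `_apply_action` economy above is worth noting (an observation from the action table, not something the file states): the gold win condition looks unreachable in four turns, because no listed action grants more than +300 gold per turn. A quick arithmetic check:

```python
# trade_circuit (+300 gold) is the largest per-turn gold gain in the action
# table, so four turns give an upper bound on the gold total.
start_gold, turns, best_gain = 1000, 4, 300
max_gold = start_gold + turns * best_gain  # 2200
assert max_gold < 3000  # the "Gold > 3000" win condition cannot be met
```

If full wins are intended to be possible, the action payouts or the gold threshold would need adjusting; as written, `reached_gold_target` can never be true.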
timmy-benchmark/results/SCORECARD.md · 82 lines · Normal file
@@ -0,0 +1,82 @@
# Bannerlord M0 — Cognitive Benchmark Scorecard

**Date:** 2026-03-23
**Benchmark:** 6-level cognitive harness (L0–L5)
**M1 Gate:** Must pass L0 + L1, latency < 10s per decision

---

## Results Summary

| Level | Description | qwen2.5:14b | hermes3:latest | hermes3:8b |
|-------|-------------|:-----------:|:--------------:|:----------:|
| **L0 [M1 GATE]** | JSON Compliance | ✓ PASS 100% | ✓ PASS 100% | ✓ PASS 100% |
| **L1 [M1 GATE]** | Board State Tracking | ✗ FAIL 50% | ✗ FAIL 50% | ✗ FAIL 50% |
| L2 | Resource Management | ✓ PASS 100% | ✓ PASS 100% | ✓ PASS 100% |
| L3 | Battle Tactics | ✓ PASS 100% | ✓ PASS 100% | ✓ PASS 100% |
| L4 | Trade Route | ✓ PASS 100% | ✓ PASS 100% | ✓ PASS 100% |
| L5 | Mini Campaign | ✓ PASS 100% | ✓ PASS 100% | ✓ PASS 100% |
| **M1 GATE** | | ✗ **FAIL** | ✗ **FAIL** | ✗ **FAIL** |

---

## Latency (p50 / p99)

| Level | qwen2.5:14b | hermes3:latest | hermes3:8b |
|-------|-------------|----------------|------------|
| L0 | 1443ms / 6348ms | 1028ms / 1184ms | 570ms / 593ms |
| L1 | 943ms / 1184ms | 1166ms / 1303ms | 767ms / 1313ms |
| L2 | 2936ms / 3122ms | 2032ms / 2232ms | 2408ms / 2832ms |
| L3 | 2248ms / 3828ms | 1614ms / 3525ms | 2174ms / 3437ms |
| L4 | 3235ms / 3318ms | 2724ms / 3038ms | 2507ms / 3420ms |
| L5 | 3414ms / 3970ms | 3137ms / 3433ms | 2571ms / 2763ms |

All models are **well under the 10s latency threshold** for L0–L1.

---

## Level 1 Failure Analysis

All three models fail L1 with an **identical pattern** (2/4 scenarios pass):

| Scenario | Expected | All Models |
|----------|----------|-----------|
| Empty board — opening move | Any empty square | ✓ center (4) |
| Block opponent's winning move | Position 1 (only block) | ✗ position 4 (occupied!) |
| Take winning move | Position 1 (win) | ✗ position 0 or 2 (occupied!) |
| Legal move on partially filled board | Any of 6, 7, 8 | ✓ position 6 |

**Root cause:** Models choose moves by heuristic (center, corners) without checking whether the chosen square is already occupied. They read the board description but don't cross-reference their move choice against it. This is a genuine spatial state-tracking failure.
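The occupancy check the harness applies is trivial to state in code. A minimal sketch (helper name hypothetical, not the benchmark's actual identifier):

```python
def is_legal_move(board, move):
    # A move is legal only if it indexes an empty (None / JSON null) square.
    return isinstance(move, int) and 0 <= move <= 8 and board[move] is None

# The "block opponent's winning move" board: O holds 0 and 2, X holds 4.
board = ["O", None, "O", None, "X", None, None, None, None]
assert not is_legal_move(board, 4)  # the square every model picked is occupied
assert is_legal_move(board, 1)      # position 1 is the only blocking move
```

The gap is that the models never perform this one-line cross-check before committing to a move.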

**Note:** `hermes3` models emit `"move": "4"` (string) vs `"move": 4` (int). The benchmark was patched to coerce string digits to int for L1, since type fidelity is already tested at L0.
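The coercion patch amounts to a few lines. A sketch of the idea (function name hypothetical, not the benchmark's actual code):

```python
def coerce_move(value):
    # Accept int moves as-is; coerce digit strings like "4" to 4;
    # reject everything else (including booleans, which are ints in Python).
    if isinstance(value, bool):
        return None
    if isinstance(value, int):
        return value
    if isinstance(value, str) and value.strip().isdigit():
        return int(value.strip())
    return None

assert coerce_move(4) == 4
assert coerce_move("4") == 4
assert coerce_move("banana") is None
```

Applying this only at L1 keeps L0's strict type-fidelity check meaningful.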

---

## M1 Gate: FAILED (all models)

No model passes the M1 gate. The blocker is **Level 1 — Board State Tracking**.

### Recommendation

The L1 failure is consistent and structural. All models understand the format and can make reasonable *opening* moves but fail to avoid already-occupied squares. Options for M1:

1. **Lower the L1 pass threshold** from 100% to ≥ 75% — the scenarios where models fail require recognizing occupied positions from a sparse JSON array, which is a known weakness. This would allow proceeding to M1 with a flagged risk.
2. **Prompt engineering** — add an explicit "The following squares are taken: X at positions [P1, P2]" line to the prompt to see if board tracking improves.
3. **Re-evaluate the L1 gate requirement** — models pass L2–L5 (resource, tactics, trade, campaign), which are more directly relevant to Bannerlord play. Consider whether L1 is the right gate.

---

## qwen3:14b

Model **not available** on this Ollama instance. The available qwen3 model is `qwen3:30b`.
`qwen3:30b` was not benchmarked (significantly slower; running it requires an explicit decision).

---

## Result Files

| Model | File |
|-------|------|
| qwen2.5:14b | `results/qwen2.5_14b_20260323_142119.json` |
| hermes3:latest | `results/hermes3_latest_20260323_152900.json` |
| hermes3:8b | `results/hermes3_8b_20260323_153000.json` |
timmy-benchmark/results/hermes3_8b_20260323_153000.json · 476 lines · Normal file
@@ -0,0 +1,476 @@
{
  "model": "hermes3:8b",
  "timestamp": "2026-03-23T19:29:22.731080+00:00",
  "skipped": false,
  "levels": {
    "0": {
      "level": 0,
      "name": "JSON Compliance (Coin Flip)",
      "trials": [
        {
          "trial_index": 0,
          "prompt": "A coin is flipped. Respond with exactly: {\"choice\": \"heads\"} or {\"choice\": \"tails\"}. Pick one. JSON only.",
          "raw_response": "{\"choice\": \"heads\"}",
          "parsed": {
            "choice": "heads"
          },
          "valid_json": true,
          "schema_valid": true,
          "value_valid": true,
          "latency_ms": 522.9010581970215,
          "error": ""
        },
        {
          "trial_index": 1,
          "prompt": "You must attack or defend. Respond with: {\"action\": \"attack\", \"confidence\": 0.8} or {\"action\": \"defend\", \"confidence\": 0.6}. Replace confidence with your own value 0.0-1.0. JSON only.",
          "raw_response": "{\"action\": \"defend\", \"confidence\": 0.7}",
          "parsed": {
            "action": "defend",
            "confidence": 0.7
          },
          "valid_json": true,
          "schema_valid": true,
          "value_valid": true,
          "latency_ms": 592.6809310913086,
          "error": ""
        },
        {
          "trial_index": 2,
          "prompt": "Choose a direction to march. Respond with exactly: {\"direction\": \"north\", \"reason\": \"string explaining why\"}. Pick north/south/east/west. JSON only.",
          "raw_response": "{\"direction\": \"north\", \"reason\": \"string explaining why\"}",
          "parsed": {
            "direction": "north",
            "reason": "string explaining why"
          },
          "valid_json": true,
          "schema_valid": true,
          "value_valid": true,
          "latency_ms": 569.8130130767822,
          "error": ""
        }
      ],
      "passed": true,
      "score": 1.0,
      "latency_p50_ms": 569.8130130767822,
      "latency_p99_ms": 592.6809310913086
    },
    "1": {
      "level": 1,
      "name": "Board State Tracking (Tic-Tac-Toe)",
      "trials": [
        {
          "scenario_index": 0,
          "description": "Empty board \u2014 opening move",
          "prompt": "Board state: [null,null,null,null,null,null,null,null,null]. You are X. It is your turn. Choose a move. Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
          "raw_response": "{\"move\": \"4\", \"reason\": \"Center move to establish dominance\"}",
          "parsed": {
            "move": "4",
            "reason": "Center move to establish dominance"
          },
          "valid_json": true,
          "move_legal": true,
          "move_correct": true,
          "latency_ms": 766.8571472167969,
          "error": ""
        },
        {
          "scenario_index": 1,
          "description": "Block opponent's winning move",
          "prompt": "Board: [\"O\",null,\"O\",null,\"X\",null,null,null,null]. O has positions 0 and 2. You are X. O will win on next turn unless you block. Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
          "raw_response": "{\"move\": 4, \"reason\": \"Block the winning move for O in the center\"}",
          "parsed": {
            "move": 4,
            "reason": "Block the winning move for O in the center"
          },
          "valid_json": true,
          "move_legal": false,
          "move_correct": false,
          "latency_ms": 734.1668605804443,
          "error": ""
        },
        {
          "scenario_index": 2,
          "description": "Take winning move",
          "prompt": "Board: [\"X\",null,\"X\",null,\"O\",null,null,\"O\",null]. You are X. You have positions 0 and 2. You can win this turn. Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
          "raw_response": "{\"move\": 0, \"reason\": \"I can win this turn by placing my X in position 0, creating a winning line with X at positions 0, 4, and 8.\"}",
          "parsed": {
            "move": 0,
            "reason": "I can win this turn by placing my X in position 0, creating a winning line with X at positions 0, 4, and 8."
          },
          "valid_json": true,
          "move_legal": false,
          "move_correct": false,
          "latency_ms": 1312.8020763397217,
          "error": ""
        },
        {
          "scenario_index": 3,
          "description": "Legal move on partially filled board",
          "prompt": "Board: [\"X\",\"O\",\"X\",\"O\",\"X\",\"O\",null,null,null]. You are O. Choose a legal move (positions 6, 7, or 8 are available). Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
          "raw_response": "{\"move\": 6, \"reason\": \"Block X's potential winning move in the center.\"}",
          "parsed": {
            "move": 6,
            "reason": "Block X's potential winning move in the center."
          },
          "valid_json": true,
          "move_legal": true,
          "move_correct": true,
          "latency_ms": 751.0488033294678,
          "error": ""
        }
      ],
      "passed": false,
      "score": 0.5,
      "latency_p50_ms": 766.8571472167969,
      "latency_p99_ms": 1312.8020763397217
    },
    "2": {
      "level": 2,
      "name": "Resource Management (Party Economy)",
      "trials": [
        {
          "scenario_index": 0,
          "description": "Budget allocation under constraint",
          "raw_response": "{\"choices\": [\"buy_food\", \"repair_armor\"], \"gold_spent\": 350, \"reason\": \"The priority is to address the critical morale issue and reduce casualty rates. Buying food for 20 days will stabilize morale, preventing potential desertions. Repairing armor will also help reduce the casualty rate, making the remaining troops more effective. Recruit infantry is not chosen as it would deplete all the gold, leaving no funds for food or armor repair.\"}",
          "parsed": {
            "choices": [
              "buy_food",
              "repair_armor"
            ],
            "gold_spent": 350,
            "reason": "The priority is to address the critical morale issue and reduce casualty rates. Buying food for 20 days will stabilize morale, preventing potential desertions. Repairing armor will also help reduce the casualty rate, making the remaining troops more effective. Recruit infantry is not chosen as it would deplete all the gold, leaving no funds for food or armor repair."
          },
          "valid_json": true,
          "schema_valid": true,
          "strategically_sound": true,
          "latency_ms": 2832.072973251343,
          "error": ""
        },
        {
          "scenario_index": 1,
          "description": "Troop tier upgrade decision",
          "raw_response": "{\"action\": \"upgrade_recruits\", \"reason\": \"Upgrading recruits to Tier-2 will improve our party's combat effectiveness against the upcoming raider camp attack. The moderate difficulty suggests we need a stronger force to succeed.\", \"expected_outcome\": \"Increased chances of victory in the upcoming battle, potentially saving resources in the long run by avoiding higher upkeep costs of dismissing recruits.\"}",
          "parsed": {
            "action": "upgrade_recruits",
            "reason": "Upgrading recruits to Tier-2 will improve our party's combat effectiveness against the upcoming raider camp attack. The moderate difficulty suggests we need a stronger force to succeed.",
            "expected_outcome": "Increased chances of victory in the upcoming battle, potentially saving resources in the long run by avoiding higher upkeep costs of dismissing recruits."
          },
          "valid_json": true,
          "schema_valid": true,
          "strategically_sound": true,
          "latency_ms": 2407.7491760253906,
          "error": ""
        },
        {
          "scenario_index": 2,
          "description": "Multi-turn planning horizon",
          "raw_response": "{\"extra_food_days\": 5, \"cost\": 50, \"remaining_gold\": 250, \"reason\": \"Buying 5 extra food days will ensure we have enough food to cross the desert, and we will still have enough gold left to reach the town on day 10 to buy more supplies.\"}",
          "parsed": {
            "extra_food_days": 5,
            "cost": 50,
            "remaining_gold": 250,
            "reason": "Buying 5 extra food days will ensure we have enough food to cross the desert, and we will still have enough gold left to reach the town on day 10 to buy more supplies."
          },
          "valid_json": true,
          "schema_valid": true,
          "strategically_sound": true,
          "latency_ms": 2028.775930404663,
          "error": ""
        }
      ],
      "passed": true,
      "score": 1.0,
      "latency_p50_ms": 2407.7491760253906,
      "latency_p99_ms": 2832.072973251343
    },
    "3": {
      "level": 3,
      "name": "Battle Tactics (Formation Commands)",
      "trials": [
        {
          "scenario_index": 0,
          "description": "Ranged vs infantry \u2014 defensive opening",
          "raw_response": "{\"infantry_order\": \"skirmish\", \"archer_order\": \"skirmish\", \"reason\": \"The enemy is advancing with heavy infantry, so our priority should be to disrupt their formation and inflict casualties from range before they close the gap. The archers and infantry will pepper the enemy with arrows as they approach, aiming to break their cohesion and lower their morale.\", \"expected_outcome\": \"The enemy's heavy infantry will take significant casualties and lose cohesion before reaching our lines, giving us a tactical advantage when the two forces meet in melee combat.\"}",
          "parsed": {
            "infantry_order": "skirmish",
            "archer_order": "skirmish",
            "reason": "The enemy is advancing with heavy infantry, so our priority should be to disrupt their formation and inflict casualties from range before they close the gap. The archers and infantry will pepper the enemy with arrows as they approach, aiming to break their cohesion and lower their morale.",
            "expected_outcome": "The enemy's heavy infantry will take significant casualties and lose cohesion before reaching our lines, giving us a tactical advantage when the two forces meet in melee combat."
          },
          "valid_json": true,
          "schema_valid": true,
          "strategically_sound": false,
          "latency_ms": 3436.897039413452,
          "error": ""
        },
        {
          "scenario_index": 1,
          "description": "Outnumbered \u2014 retreat decision",
          "raw_response": "{\"order\": \"fallback\", \"direction\": \"east\", \"reason\": \"Our troops are heavily wounded and outmatched by the 60 bandits surrounding us. We must retreat immediately to the east where there is an escape route. Fallback is the best option to preserve our troops and live to fight another day.\", \"priority\": \"preserve_troops\"}",
          "parsed": {
            "order": "fallback",
            "direction": "east",
            "reason": "Our troops are heavily wounded and outmatched by the 60 bandits surrounding us. We must retreat immediately to the east where there is an escape route. Fallback is the best option to preserve our troops and live to fight another day.",
            "priority": "preserve_troops"
          },
          "valid_json": true,
          "schema_valid": true,
          "strategically_sound": true,
          "latency_ms": 2173.5548973083496,
          "error": ""
        },
        {
          "scenario_index": 2,
          "description": "Cavalry flanking opportunity",
          "raw_response": "{\"cavalry_order\": \"flank_left\", \"timing\": \"now\", \"reason\": \"Enemy left flank is unprotected and our cavalry can break their line quickly, routing them in ~30 seconds.\", \"risk\": \"low\"}",
          "parsed": {
            "cavalry_order": "flank_left",
            "timing": "now",
            "reason": "Enemy left flank is unprotected and our cavalry can break their line quickly, routing them in ~30 seconds.",
            "risk": "low"
          },
          "valid_json": true,
          "schema_valid": true,
          "strategically_sound": true,
          "latency_ms": 1566.443920135498,
          "error": ""
        }
      ],
      "passed": true,
      "score": 1.0,
      "latency_p50_ms": 2173.5548973083496,
      "latency_p99_ms": 3436.897039413452
    },
    "4": {
      "level": 4,
      "name": "Trade Route (Campaign Navigation)",
      "trials": [
        {
          "scenario_index": 0,
          "description": "Optimal trade route selection",
          "raw_response": "{\"route\": \"A\", \"expected_profit\": 500, \"days_used\": 8, \"reason\": \"Route A offers the highest profit with a manageable travel time and low risk. Selling grain in Vlandia and furs in Sturgia provides a good balance of goods and markets.\", \"risk\": \"low\"}",
          "parsed": {
            "route": "A",
            "expected_profit": 500,
            "days_used": 8,
            "reason": "Route A offers the highest profit with a manageable travel time and low risk. Selling grain in Vlandia and furs in Sturgia provides a good balance of goods and markets.",
            "risk": "low"
          },
          "valid_json": true,
          "schema_valid": true,
          "strategically_sound": true,
          "latency_ms": 2507.443904876709,
          "error": ""
        },
        {
          "scenario_index": 1,
          "description": "Adapt plan when war declared",
          "raw_response": "{\"decision\": \"divert_to_sturgia\", \"reason\": \"Vlandia declaring war makes entering their territory too dangerous for trade. Sturgia is neutral but closer than the Empire, so less time and gold at risk.\", \"gold_at_risk\": 400}",
          "parsed": {
            "decision": "divert_to_sturgia",
            "reason": "Vlandia declaring war makes entering their territory too dangerous for trade. Sturgia is neutral but closer than the Empire, so less time and gold at risk.",
|
||||
"gold_at_risk": 400
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": true,
|
||||
"latency_ms": 2011.15083694458,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 2,
|
||||
"description": "Multi-stop route planning with constraints",
|
||||
"raw_response": "{\n \"stops\": [\n \"Rhotae\",\n \"Ortysia\",\n \"Epicrotea\"\n ],\n \"total_days\": 7,\n \"estimated_profit\": 800,\n \"reason\": \"Starting in Rhotae allows buying cheap leather and selling well in the south. Moving to Ortysia to buy grain cheap. Finally, Epicrotea for iron market. 7 days total. 800 gold budget should be enough with this plan.\"\n}",
|
||||
"parsed": {
|
||||
"stops": [
|
||||
"Rhotae",
|
||||
"Ortysia",
|
||||
"Epicrotea"
|
||||
],
|
||||
"total_days": 7,
|
||||
"estimated_profit": 800,
|
||||
"reason": "Starting in Rhotae allows buying cheap leather and selling well in the south. Moving to Ortysia to buy grain cheap. Finally, Epicrotea for iron market. 7 days total. 800 gold budget should be enough with this plan."
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": true,
|
||||
"latency_ms": 3420.1788902282715,
|
||||
"error": ""
|
||||
}
|
||||
],
|
||||
"passed": true,
|
||||
"score": 1.0,
|
||||
"latency_p50_ms": 2507.443904876709,
|
||||
"latency_p99_ms": 3420.1788902282715
|
||||
},
|
||||
"5": {
|
||||
"level": 5,
|
||||
"name": "Mini Campaign (Full Campaign Loop)",
|
||||
"turns": [
|
||||
{
|
||||
"turn": 1,
|
||||
"state_before": {
|
||||
"turn": 1,
|
||||
"gold": 1000,
|
||||
"troops": 25,
|
||||
"renown": 150,
|
||||
"settlements": 0,
|
||||
"relations": {
|
||||
"vlandia": 0,
|
||||
"sturgia": 0,
|
||||
"empire": 0,
|
||||
"aserai": 0,
|
||||
"battania": 0
|
||||
}
|
||||
},
|
||||
"raw_response": "{\"action\": \"trade_circuit\", \"target_faction\": null, \"reason\": \"Establishing trade relations will bring in gold and renown without risking retaliation or damaging relations with any faction. This sets a strong financial foundation for future expansion.\", \"goal\": \"Increase gold and renown\"}",
|
||||
"parsed": {
|
||||
"action": "trade_circuit",
|
||||
"target_faction": null,
|
||||
"reason": "Establishing trade relations will bring in gold and renown without risking retaliation or damaging relations with any faction. This sets a strong financial foundation for future expansion.",
|
||||
"goal": "Increase gold and renown"
|
||||
},
|
||||
"valid_json": true,
|
||||
"valid_action": true,
|
||||
"action": "trade_circuit",
|
||||
"latency_ms": 2763.1959915161133,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"turn": 2,
|
||||
"state_before": {
|
||||
"turn": 2,
|
||||
"gold": 1300,
|
||||
"troops": 25,
|
||||
"renown": 160,
|
||||
"settlements": 0,
|
||||
"relations": {
|
||||
"vlandia": 0,
|
||||
"sturgia": 0,
|
||||
"empire": 0,
|
||||
"aserai": 0,
|
||||
"battania": 0
|
||||
}
|
||||
},
|
||||
"raw_response": "{\"action\": \"tournament\", \"target_faction\": null, \"reason\": \"Participating in a tournament will boost my renown and relations without spending gold. This positive publicity can help attract potential vassals or allies in the future.\", \"goal\": \"Increase renown and relations\"}",
|
||||
"parsed": {
|
||||
"action": "tournament",
|
||||
"target_faction": null,
|
||||
"reason": "Participating in a tournament will boost my renown and relations without spending gold. This positive publicity can help attract potential vassals or allies in the future.",
|
||||
"goal": "Increase renown and relations"
|
||||
},
|
||||
"valid_json": true,
|
||||
"valid_action": true,
|
||||
"action": "tournament",
|
||||
"latency_ms": 2335.5319499969482,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"turn": 3,
|
||||
"state_before": {
|
||||
"turn": 3,
|
||||
"gold": 1200,
|
||||
"troops": 25,
|
||||
"renown": 220,
|
||||
"settlements": 0,
|
||||
"relations": {
|
||||
"vlandia": 0,
|
||||
"sturgia": 0,
|
||||
"empire": 0,
|
||||
"aserai": 0,
|
||||
"battania": 0
|
||||
}
|
||||
},
|
||||
"raw_response": "{\"action\": \"recruit_troops\", \"target_faction\": null, \"reason\": \"Recruiting more troops will strengthen my military force, which is essential for future expansion and defending my settlements once I have them. This investment in my army will pay off in the long run.\", \"goal\": \"Increase troops\"}",
|
||||
"parsed": {
|
||||
"action": "recruit_troops",
|
||||
"target_faction": null,
|
||||
"reason": "Recruiting more troops will strengthen my military force, which is essential for future expansion and defending my settlements once I have them. This investment in my army will pay off in the long run.",
|
||||
"goal": "Increase troops"
|
||||
},
|
||||
"valid_json": true,
|
||||
"valid_action": true,
|
||||
"action": "recruit_troops",
|
||||
"latency_ms": 2570.871114730835,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"turn": 4,
|
||||
"state_before": {
|
||||
"turn": 4,
|
||||
"gold": 1000,
|
||||
"troops": 40,
|
||||
"renown": 220,
|
||||
"settlements": 0,
|
||||
"relations": {
|
||||
"vlandia": 0,
|
||||
"sturgia": 0,
|
||||
"empire": 0,
|
||||
"aserai": 0,
|
||||
"battania": 0
|
||||
}
|
||||
},
|
||||
"raw_response": "{\"action\": \"siege_castle\", \"target_faction\": null, \"reason\": \"Capturing a castle will give me my first settlement, putting me closer to winning the game. This physical foothold in Calradia will provide a base of operations and resources to further my ambitions of becoming King.\", \"goal\": \"Gain first settlement\"}",
|
||||
"parsed": {
|
||||
"action": "siege_castle",
|
||||
"target_faction": null,
|
||||
"reason": "Capturing a castle will give me my first settlement, putting me closer to winning the game. This physical foothold in Calradia will provide a base of operations and resources to further my ambitions of becoming King.",
|
||||
"goal": "Gain first settlement"
|
||||
},
|
||||
"valid_json": true,
|
||||
"valid_action": true,
|
||||
"action": "siege_castle",
|
||||
"latency_ms": 2367.640972137451,
|
||||
"error": ""
|
||||
}
|
||||
],
|
||||
"final_state": {
|
||||
"turn": 4,
|
||||
"gold": 500,
|
||||
"troops": 40,
|
||||
"renown": 300,
|
||||
"settlements": 1,
|
||||
"relations": {
|
||||
"vlandia": 0,
|
||||
"sturgia": 0,
|
||||
"empire": 0,
|
||||
"aserai": 0,
|
||||
"battania": 0
|
||||
}
|
||||
},
|
||||
"passed": true,
|
||||
"score": 1.0,
|
||||
"reached_gold_target": false,
|
||||
"reached_renown_target": false,
|
||||
"reached_settlement_target": true,
|
||||
"latency_p50_ms": 2570.871114730835,
|
||||
"latency_p99_ms": 2763.1959915161133
|
||||
}
|
||||
},
|
||||
"summary": {
|
||||
"total_elapsed_s": 37.7,
|
||||
"levels_run": [
|
||||
0,
|
||||
1,
|
||||
2,
|
||||
3,
|
||||
4,
|
||||
5
|
||||
],
|
||||
"levels_passed": [
|
||||
0,
|
||||
2,
|
||||
3,
|
||||
4,
|
||||
5
|
||||
],
|
||||
"levels_failed": [
|
||||
1
|
||||
],
|
||||
"m1_gate_passed": false,
|
||||
"m1_gate_notes": [
|
||||
"Level 1 FAILED (score=0.50)"
|
||||
],
|
||||
"m1_latency_threshold_ms": 10000
|
||||
}
|
||||
}
|
||||
476	timmy-benchmark/results/hermes3_latest_20260323_152900.json	Normal file
@@ -0,0 +1,476 @@
{
  "model": "hermes3:latest",
  "timestamp": "2026-03-23T19:28:19.717773+00:00",
  "skipped": false,
  "levels": {
    "0": {
      "level": 0,
      "name": "JSON Compliance (Coin Flip)",
      "trials": [
        {
          "trial_index": 0,
          "prompt": "A coin is flipped. Respond with exactly: {\"choice\": \"heads\"} or {\"choice\": \"tails\"}. Pick one. JSON only.",
          "raw_response": "{\"choice\": \"heads\"}",
          "parsed": {
            "choice": "heads"
          },
          "valid_json": true,
          "schema_valid": true,
          "value_valid": true,
          "latency_ms": 1028.2011032104492,
          "error": ""
        },
        {
          "trial_index": 1,
          "prompt": "You must attack or defend. Respond with: {\"action\": \"attack\", \"confidence\": 0.8} or {\"action\": \"defend\", \"confidence\": 0.6}. Replace confidence with your own value 0.0-1.0. JSON only.",
          "raw_response": "{\"action\": \"defend\", \"confidence\": 0.7}",
          "parsed": {
            "action": "defend",
            "confidence": 0.7
          },
          "valid_json": true,
          "schema_valid": true,
          "value_valid": true,
          "latency_ms": 700.0582218170166,
          "error": ""
        },
        {
          "trial_index": 2,
          "prompt": "Choose a direction to march. Respond with exactly: {\"direction\": \"north\", \"reason\": \"string explaining why\"}. Pick north/south/east/west. JSON only.",
          "raw_response": "{\"direction\": \"north\", \"reason\": \"The northern direction allows for the most efficient and direct path towards the goal, minimizing potential obstacles and maximizing progress.\"}",
          "parsed": {
            "direction": "north",
            "reason": "The northern direction allows for the most efficient and direct path towards the goal, minimizing potential obstacles and maximizing progress."
          },
          "valid_json": true,
          "schema_valid": true,
          "value_valid": true,
          "latency_ms": 1184.3459606170654,
          "error": ""
        }
      ],
      "passed": true,
      "score": 1.0,
      "latency_p50_ms": 1028.2011032104492,
      "latency_p99_ms": 1184.3459606170654
    },
    "1": {
      "level": 1,
      "name": "Board State Tracking (Tic-Tac-Toe)",
      "trials": [
        {
          "scenario_index": 0,
          "description": "Empty board \u2014 opening move",
          "prompt": "Board state: [null,null,null,null,null,null,null,null,null]. You are X. It is your turn. Choose a move. Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
          "raw_response": "{\"move\": \"4\", \"reason\": \"Center move to establish dominance\"}",
          "parsed": {
            "move": "4",
            "reason": "Center move to establish dominance"
          },
          "valid_json": true,
          "move_legal": true,
          "move_correct": true,
          "latency_ms": 932.2080612182617,
          "error": ""
        },
        {
          "scenario_index": 1,
          "description": "Block opponent's winning move",
          "prompt": "Board: [\"O\",null,\"O\",null,\"X\",null,null,null,null]. O has positions 0 and 2. You are X. O will win on next turn unless you block. Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
          "raw_response": "{\"move\": 4, \"reason\": \"Block the winning move for O in the center, creating a chance for a winning X cross\"}",
          "parsed": {
            "move": 4,
            "reason": "Block the winning move for O in the center, creating a chance for a winning X cross"
          },
          "valid_json": true,
          "move_legal": false,
          "move_correct": false,
          "latency_ms": 1165.644884109497,
          "error": ""
        },
        {
          "scenario_index": 2,
          "description": "Take winning move",
          "prompt": "Board: [\"X\",null,\"X\",null,\"O\",null,null,\"O\",null]. You are X. You have positions 0 and 2. You can win this turn. Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
          "raw_response": "{\"move\": 0, \"reason\": \"I can win this turn by placing my X in position 0, creating a winning line of three X's horizontally.\"}",
          "parsed": {
            "move": 0,
            "reason": "I can win this turn by placing my X in position 0, creating a winning line of three X's horizontally."
          },
          "valid_json": true,
          "move_legal": false,
          "move_correct": false,
          "latency_ms": 1302.915096282959,
          "error": ""
        },
        {
          "scenario_index": 3,
          "description": "Legal move on partially filled board",
          "prompt": "Board: [\"X\",\"O\",\"X\",\"O\",\"X\",\"O\",null,null,null]. You are O. Choose a legal move (positions 6, 7, or 8 are available). Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
          "raw_response": "{\"move\": 6, \"reason\": \"Block X's potential win in the center.\"}",
          "parsed": {
            "move": 6,
            "reason": "Block X's potential win in the center."
          },
          "valid_json": true,
          "move_legal": true,
          "move_correct": true,
          "latency_ms": 851.456880569458,
          "error": ""
        }
      ],
      "passed": false,
      "score": 0.5,
      "latency_p50_ms": 1165.644884109497,
      "latency_p99_ms": 1302.915096282959
    },
    "2": {
      "level": 2,
      "name": "Resource Management (Party Economy)",
      "trials": [
        {
          "scenario_index": 0,
          "description": "Budget allocation under constraint",
          "raw_response": "{\"choices\": [\"buy_food\", \"repair_armor\"], \"gold_spent\": 350, \"reason\": \"Buying food is crucial to prevent desertion, and repairing armor will help reduce casualties. Recruit infantry is put on hold due to limited resources.\"}",
          "parsed": {
            "choices": [
              "buy_food",
              "repair_armor"
            ],
            "gold_spent": 350,
            "reason": "Buying food is crucial to prevent desertion, and repairing armor will help reduce casualties. Recruit infantry is put on hold due to limited resources."
          },
          "valid_json": true,
          "schema_valid": true,
          "strategically_sound": true,
          "latency_ms": 2031.9790840148926,
          "error": ""
        },
        {
          "scenario_index": 1,
          "description": "Troop tier upgrade decision",
          "raw_response": "{\"action\": \"save_gold\", \"reason\": \"Upgrading recruits now would leave little gold for emergencies, especially with the upcoming raiders camp attack. It's better to maintain the current troop composition and save gold for potential future threats or opportunities.\", \"expected_outcome\": \"Having enough gold to respond to unexpected situations without needing to dismiss or downgrade troops.\"}",
          "parsed": {
            "action": "save_gold",
            "reason": "Upgrading recruits now would leave little gold for emergencies, especially with the upcoming raiders camp attack. It's better to maintain the current troop composition and save gold for potential future threats or opportunities.",
            "expected_outcome": "Having enough gold to respond to unexpected situations without needing to dismiss or downgrade troops."
          },
          "valid_json": true,
          "schema_valid": true,
          "strategically_sound": true,
          "latency_ms": 2231.9281101226807,
          "error": ""
        },
        {
          "scenario_index": 2,
          "description": "Multi-turn planning horizon",
          "raw_response": "{\"extra_food_days\": 5, \"cost\": 50, \"remaining_gold\": 250, \"reason\": \"Buying 5 extra food days will ensure we have enough food to cross the desert and reach the town on day 10, while still having enough gold left for potential expenses.\"}",
          "parsed": {
            "extra_food_days": 5,
            "cost": 50,
            "remaining_gold": 250,
            "reason": "Buying 5 extra food days will ensure we have enough food to cross the desert and reach the town on day 10, while still having enough gold left for potential expenses."
          },
          "valid_json": true,
          "schema_valid": true,
          "strategically_sound": true,
          "latency_ms": 1943.3560371398926,
          "error": ""
        }
      ],
      "passed": true,
      "score": 1.0,
      "latency_p50_ms": 2031.9790840148926,
      "latency_p99_ms": 2231.9281101226807
    },
    "3": {
      "level": 3,
      "name": "Battle Tactics (Formation Commands)",
      "trials": [
        {
          "scenario_index": 0,
          "description": "Ranged vs infantry \u2014 defensive opening",
          "raw_response": "{\"infantry_order\": \"skirmish\", \"archer_order\": \"skirmish\", \"reason\": \"The enemy heavy infantry is advancing, so we should use our archers and infantry to pepper them with ranged attacks before they close to melee range. This will soften them up and reduce their combat effectiveness when our forces meet.\", \"expected_outcome\": \"By the time our forces engage in melee, the enemy heavy infantry will have suffered significant casualties from the ranged attacks, giving us a tactical advantage and maximizing the damage we can inflict before contact is made.\"}",
          "parsed": {
            "infantry_order": "skirmish",
            "archer_order": "skirmish",
            "reason": "The enemy heavy infantry is advancing, so we should use our archers and infantry to pepper them with ranged attacks before they close to melee range. This will soften them up and reduce their combat effectiveness when our forces meet.",
            "expected_outcome": "By the time our forces engage in melee, the enemy heavy infantry will have suffered significant casualties from the ranged attacks, giving us a tactical advantage and maximizing the damage we can inflict before contact is made."
          },
          "valid_json": true,
          "schema_valid": true,
          "strategically_sound": false,
          "latency_ms": 3524.5888233184814,
          "error": ""
        },
        {
          "scenario_index": 1,
          "description": "Outnumbered \u2014 retreat decision",
          "raw_response": "{\"order\": \"fallback\", \"direction\": \"east\", \"reason\": \"Outnumbered and with many wounded, falling back to the escape route is the best option to preserve our troops.\", \"priority\": \"preserve_troops\"}",
          "parsed": {
            "order": "fallback",
            "direction": "east",
            "reason": "Outnumbered and with many wounded, falling back to the escape route is the best option to preserve our troops.",
            "priority": "preserve_troops"
          },
          "valid_json": true,
          "schema_valid": true,
          "strategically_sound": true,
          "latency_ms": 1523.0557918548584,
          "error": ""
        },
        {
          "scenario_index": 2,
          "description": "Cavalry flanking opportunity",
          "raw_response": "{\"cavalry_order\": \"flank_left\", \"timing\": \"now\", \"reason\": \"Enemy left flank is unprotected and engaging our main force. A swift cavalry charge will catch them off guard and route them in ~30 seconds.\", \"risk\": \"low\"}",
          "parsed": {
            "cavalry_order": "flank_left",
            "timing": "now",
            "reason": "Enemy left flank is unprotected and engaging our main force. A swift cavalry charge will catch them off guard and route them in ~30 seconds.",
            "risk": "low"
          },
          "valid_json": true,
          "schema_valid": true,
          "strategically_sound": true,
          "latency_ms": 1614.2308712005615,
          "error": ""
        }
      ],
      "passed": true,
      "score": 1.0,
      "latency_p50_ms": 1614.2308712005615,
      "latency_p99_ms": 3524.5888233184814
    },
    "4": {
      "level": 4,
      "name": "Trade Route (Campaign Navigation)",
      "trials": [
        {
          "scenario_index": 0,
          "description": "Optimal trade route selection",
          "raw_response": "{\"route\": \"A\", \"expected_profit\": 500, \"days_used\": 8, \"reason\": \"Route A offers the highest profit with a manageable travel time and low risk. Selling grain in Vlandia and furs in Sturgia provides a good balance of profitability and travel distance.\", \"risk\": \"low\"}",
          "parsed": {
            "route": "A",
            "expected_profit": 500,
            "days_used": 8,
            "reason": "Route A offers the highest profit with a manageable travel time and low risk. Selling grain in Vlandia and furs in Sturgia provides a good balance of profitability and travel distance.",
            "risk": "low"
          },
          "valid_json": true,
          "schema_valid": true,
          "strategically_sound": true,
          "latency_ms": 2335.57391166687,
          "error": ""
        },
        {
          "scenario_index": 1,
          "description": "Adapt plan when war declared",
          "raw_response": "{\"decision\": \"divert_to_sturgia\", \"reason\": \"Given the new information that Vlandia has declared war on our faction, entering their territory would be too dangerous. Sturgia is a neutral but friendly region, so it should be safe to trade there. The Empire is allied but further away, and while safe, the extra travel time could reduce the profitability of the trade mission.\", \"gold_at_risk\": 400}",
          "parsed": {
            "decision": "divert_to_sturgia",
            "reason": "Given the new information that Vlandia has declared war on our faction, entering their territory would be too dangerous. Sturgia is a neutral but friendly region, so it should be safe to trade there. The Empire is allied but further away, and while safe, the extra travel time could reduce the profitability of the trade mission.",
            "gold_at_risk": 400
          },
          "valid_json": true,
          "schema_valid": true,
          "strategically_sound": true,
          "latency_ms": 2723.7470149993896,
          "error": ""
        },
        {
          "scenario_index": 2,
          "description": "Multi-stop route planning with constraints",
          "raw_response": "{\n \"stops\": [\n \"Rhotae\",\n \"Ortysia\",\n \"Epicrotea\"\n ],\n \"total_days\": 7,\n \"estimated_profit\": 1200,\n \"reason\": \"Starting in Rhotae allows buying cheap leather and selling well in the south. Moving to Ortysia to buy grain cheap. Finally, Epicrotea for iron market. Total 7 days, 800 gold budget sufficient. Expected profit of 1200 gold.\"\n}",
          "parsed": {
            "stops": [
              "Rhotae",
              "Ortysia",
              "Epicrotea"
            ],
            "total_days": 7,
            "estimated_profit": 1200,
            "reason": "Starting in Rhotae allows buying cheap leather and selling well in the south. Moving to Ortysia to buy grain cheap. Finally, Epicrotea for iron market. Total 7 days, 800 gold budget sufficient. Expected profit of 1200 gold."
          },
          "valid_json": true,
          "schema_valid": true,
          "strategically_sound": true,
          "latency_ms": 3038.438320159912,
          "error": ""
        }
      ],
      "passed": true,
      "score": 1.0,
      "latency_p50_ms": 2723.7470149993896,
      "latency_p99_ms": 3038.438320159912
    },
    "5": {
      "level": 5,
      "name": "Mini Campaign (Full Campaign Loop)",
      "turns": [
        {
          "turn": 1,
          "state_before": {
            "turn": 1,
            "gold": 1000,
            "troops": 25,
            "renown": 150,
            "settlements": 0,
            "relations": {
              "vlandia": 0,
              "sturgia": 0,
              "empire": 0,
              "aserai": 0,
              "battania": 0
            }
          },
          "raw_response": "{\"action\": \"trade_circuit\", \"target_faction\": null, \"reason\": \"Establishing trade relations will bring in gold without risking relations or retaliation. This sets a foundation for economic growth and allows me to assess other factions before committing to any alliances or conflicts.\", \"goal\": \"Increase gold reserves and build a power base through trade and diplomacy\"}",
          "parsed": {
            "action": "trade_circuit",
            "target_faction": null,
            "reason": "Establishing trade relations will bring in gold without risking relations or retaliation. This sets a foundation for economic growth and allows me to assess other factions before committing to any alliances or conflicts.",
            "goal": "Increase gold reserves and build a power base through trade and diplomacy"
          },
          "valid_json": true,
          "valid_action": true,
          "action": "trade_circuit",
          "latency_ms": 2898.583173751831,
          "error": ""
        },
        {
          "turn": 2,
          "state_before": {
            "turn": 2,
            "gold": 1300,
            "troops": 25,
            "renown": 160,
            "settlements": 0,
            "relations": {
              "vlandia": 0,
              "sturgia": 0,
              "empire": 0,
              "aserai": 0,
              "battania": 0
            }
          },
          "raw_response": "{\"action\": \"raid_village\", \"target_faction\": null, \"reason\": \"Raiding a village will provide a quick boost to my gold reserves and renown. This will also help me gauge the strength of my troops and the potential resistance from the targeted faction. It's a bold move that shows my power and willingness to act, which may intimidate other factions and open up opportunities for future alliances or conflicts.\", \"goal\": \"Increase gold and renown, assess troop strength and potential faction responses\"}",
          "parsed": {
            "action": "raid_village",
            "target_faction": null,
            "reason": "Raiding a village will provide a quick boost to my gold reserves and renown. This will also help me gauge the strength of my troops and the potential resistance from the targeted faction. It's a bold move that shows my power and willingness to act, which may intimidate other factions and open up opportunities for future alliances or conflicts.",
            "goal": "Increase gold and renown, assess troop strength and potential faction responses"
          },
          "valid_json": true,
          "valid_action": true,
          "action": "raid_village",
          "latency_ms": 3112.649917602539,
          "error": ""
        },
        {
          "turn": 3,
          "state_before": {
            "turn": 3,
            "gold": 1500,
            "troops": 25,
            "renown": 190,
            "settlements": 0,
            "relations": {
              "vlandia": 0,
              "sturgia": 0,
              "empire": 0,
              "aserai": 0,
              "battania": 0
            }
          },
          "raw_response": "{\"action\": \"recruit_troops\", \"target_faction\": null, \"reason\": \"Recruiting more troops is essential for expanding my military strength and increasing my chances of successfully sieging a castle in the next turn. This will also help me prepare for potential conflicts with other factions and solidify my power base. Strengthening my army is a crucial step towards achieving my goal of becoming the King of Calradia.\", \"goal\": \"Increase troop count for future military expansion and defense\"}",
          "parsed": {
            "action": "recruit_troops",
            "target_faction": null,
            "reason": "Recruiting more troops is essential for expanding my military strength and increasing my chances of successfully sieging a castle in the next turn. This will also help me prepare for potential conflicts with other factions and solidify my power base. Strengthening my army is a crucial step towards achieving my goal of becoming the King of Calradia.",
            "goal": "Increase troop count for future military expansion and defense"
          },
          "valid_json": true,
          "valid_action": true,
          "action": "recruit_troops",
          "latency_ms": 3136.754035949707,
          "error": ""
        },
        {
          "turn": 4,
          "state_before": {
            "turn": 4,
            "gold": 1300,
            "troops": 40,
            "renown": 190,
            "settlements": 0,
            "relations": {
              "vlandia": 0,
              "sturgia": 0,
              "empire": 0,
              "aserai": 0,
              "battania": 0
            }
          },
          "raw_response": "{\"action\": \"siege_castle\", \"target_faction\": null, \"reason\": \"Sieging a castle will allow me to establish a settlement, which is crucial for meeting one of my win conditions. This will also demonstrate my military might and potentially intimidate other factions, opening up opportunities for future alliances or conflicts. While it comes with a risk of losing troops morale and straining relations, it's a necessary step towards achieving my goal of becoming the King of Calradia.\", \"goal\": \"Establish a settlement and demonstrate military strength to intimidate other factions\"}",
          "parsed": {
            "action": "siege_castle",
            "target_faction": null,
            "reason": "Sieging a castle will allow me to establish a settlement, which is crucial for meeting one of my win conditions. This will also demonstrate my military might and potentially intimidate other factions, opening up opportunities for future alliances or conflicts. While it comes with a risk of losing troops morale and straining relations, it's a necessary step towards achieving my goal of becoming the King of Calradia.",
            "goal": "Establish a settlement and demonstrate military strength to intimidate other factions"
          },
          "valid_json": true,
          "valid_action": true,
          "action": "siege_castle",
          "latency_ms": 3433.211326599121,
          "error": ""
        }
      ],
      "final_state": {
        "turn": 4,
        "gold": 800,
        "troops": 40,
        "renown": 270,
        "settlements": 1,
        "relations": {
          "vlandia": 0,
          "sturgia": 0,
          "empire": 0,
          "aserai": 0,
          "battania": 0
        }
      },
      "passed": true,
      "score": 1.0,
      "reached_gold_target": false,
      "reached_renown_target": false,
      "reached_settlement_target": true,
      "latency_p50_ms": 3136.754035949707,
      "latency_p99_ms": 3433.211326599121
    }
  },
  "summary": {
    "total_elapsed_s": 40.7,
    "levels_run": [
      0,
      1,
      2,
      3,
      4,
      5
    ],
    "levels_passed": [
      0,
      2,
      3,
      4,
      5
    ],
    "levels_failed": [
      1
    ],
    "m1_gate_passed": false,
    "m1_gate_notes": [
      "Level 1 FAILED (score=0.50)"
    ],
    "m1_latency_threshold_ms": 10000
  }
}
Some files were not shown because too many files have changed in this diff