forked from Rockachopa/Timmy-time-dashboard
Compare commits: claude/iss ... gemini/iss (1 commit, SHA1 4e4206a91e)

AGENTS.md
@@ -131,28 +131,6 @@ self-testing, reflection — use every tool he has.

## Agent Roster

### Gitea Permissions

All agents that push branches and create PRs require **write** permission on the
repository. Set via the Gitea admin API or UI under Repository → Settings → Collaborators.

| Agent user | Required permission | Gitea login |
|------------|--------------------|----|
| kimi | write | `kimi` |
| claude | write | `claude` |
| gemini | write | `gemini` |
| antigravity | write | `antigravity` |
| hermes | write | `hermes` |
| manus | write | `manus` |

To grant write access (requires Gitea admin or repo admin token):
```bash
curl -s -X PUT "http://143.198.27.163:3000/api/v1/repos/rockachopa/Timmy-time-dashboard/collaborators/<username>" \
  -H "Authorization: token <admin-token>" \
  -H "Content-Type: application/json" \
  -d '{"permission": "write"}'
```
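
The same grant can be repeated for every agent in the roster. The following is a minimal Python sketch, not the project's own tooling: the host, owner, and repository are copied from the curl command above, while `build_grant_request` and `grant_all` are hypothetical helper names introduced here for illustration.

```python
import json
import urllib.request

# Assumption: host, owner, and repo are copied from the curl command above.
GITEA_API = "http://143.198.27.163:3000/api/v1"
REPO = "rockachopa/Timmy-time-dashboard"
AGENTS = ["kimi", "claude", "gemini", "antigravity", "hermes", "manus"]


def build_grant_request(username: str, admin_token: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request that grants write access."""
    return urllib.request.Request(
        url=f"{GITEA_API}/repos/{REPO}/collaborators/{username}",
        data=json.dumps({"permission": "write"}).encode("utf-8"),
        headers={
            "Authorization": f"token {admin_token}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )


def grant_all(admin_token: str) -> None:
    """Send one request per roster agent; urlopen raises on HTTP errors."""
    for username in AGENTS:
        with urllib.request.urlopen(build_grant_request(username, admin_token)) as resp:
            print(username, resp.status)
```

Building the request separately from sending it keeps the URL and payload easy to inspect before any admin token touches the network.
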

### Build Tier

**Local (Ollama)**: Primary workhorse. Free. Unrestricted.

@@ -1,111 +0,0 @@

# The Sovereignty Loop

This document establishes the primary engineering constraint for all Timmy Time development: every task must increase sovereignty as a default deliverable. Not as a future goal. Not as an optimization pass. As a constraint on every commit, every function, every inference call.

The full 11-page governing architecture document is available as a PDF: [The-Sovereignty-Loop.pdf](./The-Sovereignty-Loop.pdf)

> "The measure of progress is not features added. It is model calls eliminated."

## The Core Principle

> **The Sovereignty Loop**: Discover with an expensive model. Compress the discovery into a cheap local rule. Replace the model with the rule. Measure the cost reduction. Repeat.

Every call to an LLM, VLM, or external API passes through three phases:
1. **Discovery**: The model sees something for the first time (expensive, unavoidable, produces new knowledge).
2. **Crystallization**: The discovery is compressed into a durable, cheap artifact (requires explicit engineering).
3. **Replacement**: The crystallized artifact replaces the model call (near-zero cost).

**Code review requirement**: If a function calls a model without a crystallization step, it fails code review. No exceptions. The pattern is always: check cache → miss → infer → crystallize → return.

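
The required pattern (check cache → miss → infer → crystallize → return) can be sketched as a small wrapper. This is an illustration under stated assumptions: `sovereign_call` and the in-memory `_crystallized` dictionary are hypothetical names, and a real implementation would persist to the durable artifacts this document describes rather than to process memory.

```python
from typing import Callable, Dict

# Hypothetical in-memory store; the document's real artifacts are files such as
# templates.json or rules.py.
_crystallized: Dict[str, str] = {}


def sovereign_call(key: str, infer: Callable[[str], str]) -> str:
    """check cache -> miss -> infer -> crystallize -> return."""
    cached = _crystallized.get(key)
    if cached is not None:        # Replacement: near-zero cost path
        return cached
    result = infer(key)           # Discovery: the expensive model call
    _crystallized[key] = result   # Crystallization: keep the durable artifact
    return result
```

Calling `sovereign_call` twice with the same key should invoke `infer` only once; the second call is served from the crystallized store.
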

## The Sovereignty Loop Applied to Every Layer

### Perception: See Once, Template Forever
- First encounter: VLM analyzes screenshot (3-6 sec) → structured JSON
- Crystallized as: OpenCV template + bounding box → `templates.json` (3 ms retrieval)
- `crystallize_perception()` function wraps every VLM response
- **Target**: 90% of perception cycles without VLM by hour 1, 99% by hour 4


### Decision: Reason Once, Rule Forever
- First encounter: LLM reasons through decision (1-5 sec)
- Crystallized as: if/else rules, waypoints, cached preferences → `rules.py`, `nav_graph.db` (<1 ms)
- Uses Voyager pattern: named skills with embeddings, success rates, conditions
- A skill match with >0.8 confidence and a >0.6 success rate executes without the LLM
- **Target**: 70-80% of decisions without LLM by week 4

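
The skill gate above can be sketched as follows. This is a hedged illustration, not the project's implementation: it assumes "confidence" means cosine similarity between a situation embedding and a skill embedding, and the `Skill` class and `pick_skill` function are hypothetical names. Only the 0.8 and 0.6 thresholds come from the text.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Skill:
    name: str
    embedding: Sequence[float]
    successes: int
    attempts: int

    @property
    def success_rate(self) -> float:
        return self.successes / self.attempts if self.attempts else 0.0


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_skill(
    situation: Sequence[float],
    skills: List[Skill],
    min_confidence: float = 0.8,  # threshold from the text
    min_success: float = 0.6,     # threshold from the text
) -> Optional[Skill]:
    """Return the best skill clearing both thresholds, else None (use the LLM)."""
    scored = [(cosine(situation, s.embedding), s) for s in skills
              if s.success_rate > min_success]
    eligible = [(score, s) for score, s in scored if score > min_confidence]
    if not eligible:
        return None
    return max(eligible, key=lambda pair: pair[0])[1]
```

A `None` result is the signal to fall back to the LLM and then crystallize whatever it decides.
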

### Narration: Script the Predictable, Improvise the Novel
- Predictable moments → template with variable slots, voiced by Kokoro locally
- LLM narrates only genuinely surprising events (quest twist, death, discovery)
- **Target**: 60-70% templatized within a week

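
A template with variable slots can be as simple as Python's standard `string.Template`. The sketch below is illustrative only: the event names, template wording, and the `narrate` function are assumptions, and the local voice step is left out.

```python
import string
from typing import Callable, Dict

# Hypothetical templates; the document stores these in narration.json.
TEMPLATES: Dict[str, string.Template] = {
    "level_up": string.Template("$hero reaches level $level."),
    "item_found": string.Template("$hero picks up $item."),
}


def narrate(
    event: str,
    slots: Dict[str, str],
    llm: Callable[[str, Dict[str, str]], str],
) -> str:
    """Use a local template for known events; call the LLM only for novel ones."""
    template = TEMPLATES.get(event)
    if template is None:
        return llm(event, slots)       # genuinely novel event
    return template.substitute(slots)  # predictable moment, no model call
```

`substitute` raises `KeyError` when a slot is missing, which surfaces incomplete event data instead of narrating a half-filled line.
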

### Navigation: Walk Once, Map Forever
- Every path recorded as waypoint sequence with terrain annotations
- First journey = full perception + planning; subsequent = graph traversal
- Builds complete nav graph without external map data

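
The "walk once, then traverse the graph" idea can be sketched with a plain adjacency map and breadth-first search. This is a simplified illustration: the `NavGraph` class is hypothetical, it treats every recorded step as walkable in both directions, and it omits the terrain annotations and the `nav_graph.db` storage mentioned in this document.

```python
from collections import deque
from typing import Dict, List, Optional, Set


class NavGraph:
    def __init__(self) -> None:
        self.edges: Dict[str, Set[str]] = {}

    def record_path(self, waypoints: List[str]) -> None:
        """Store a walked path; each consecutive pair becomes a two-way edge."""
        for a, b in zip(waypoints, waypoints[1:]):
            self.edges.setdefault(a, set()).add(b)
            self.edges.setdefault(b, set()).add(a)

    def route(self, start: str, goal: str) -> Optional[List[str]]:
        """Breadth-first search; None means the route still needs discovery."""
        if start not in self.edges or goal not in self.edges:
            return None
        previous: Dict[str, Optional[str]] = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                path: List[str] = []
                current: Optional[str] = node
                while current is not None:
                    path.append(current)
                    current = previous[current]
                return path[::-1]
            for nxt in sorted(self.edges[node]):
                if nxt not in previous:
                    previous[nxt] = node
                    queue.append(nxt)
        return None
```

Two recorded journeys that share a waypoint already allow routes that were never walked end to end, which is how the graph grows without external map data.
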

### API Costs: Every Dollar Spent Must Reduce Future Dollars

| Week | Groq Calls/Hr | Local Decisions/Hr | Sovereignty % | Cost/Hr |
|---|---|---|---|---|
| 1 | ~720 | ~80 | 10% | $0.40 |
| 2 | ~400 | ~400 | 50% | $0.22 |
| 4 | ~160 | ~640 | 80% | $0.09 |
| 8 | ~40 | ~760 | 95% | $0.02 |
| Target | <20 | >780 | >97% | <$0.01 |

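
The Sovereignty % column appears to be the share of local decisions among all decisions per hour. The sketch below makes that reading explicit; the formula is inferred from the table rows rather than stated in the source, and `sovereignty_pct` is a hypothetical helper name.

```python
def sovereignty_pct(cloud_calls: int, local_decisions: int) -> float:
    """Percentage of decisions made locally (formula inferred from the table)."""
    total = cloud_calls + local_decisions
    return 100.0 * local_decisions / total if total else 0.0


# (Groq calls/hr, local decisions/hr) for weeks 1, 2, 4, and 8 of the table.
TABLE_ROWS = {1: (720, 80), 2: (400, 400), 4: (160, 640), 8: (40, 760)}

for week, (cloud, local) in TABLE_ROWS.items():
    print(f"week {week}: {sovereignty_pct(cloud, local):.0f}%")
```

Under this reading, each row sums to roughly 800 decisions per hour, so the percentage rises only because cloud calls are replaced by local ones.
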

## The Sovereignty Scorecard (5 Metrics)

Every work session ends with a sovereignty audit. Every PR includes a sovereignty delta. Not optional.

| Metric | What It Measures | Target |
|---|---|---|
| Perception Sovereignty % | Frames understood without VLM | >90% by hour 4 |
| Decision Sovereignty % | Actions chosen without LLM | >80% by week 4 |
| Narration Sovereignty % | Lines from templates vs LLM | >60% by week 2 |
| API Cost Trend | Dollar cost per hour of gameplay | Monotonically decreasing |
| Skill Library Growth | Crystallized skills per session | >5 new skills/session |

A dashboard widget on alexanderwhitestone.com shows these metrics in real time during streams. It is an HTMX component updated via WebSocket.

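
The "monotonically decreasing" target for the API cost trend is easy to check mechanically. The following is a small sketch, assuming the audit has a list of hourly cost samples in chronological order; the function name and the `strict` flag are illustrative choices, not part of the source.

```python
from typing import Sequence


def is_monotonically_decreasing(costs_per_hour: Sequence[float], strict: bool = False) -> bool:
    """True if no sample is higher than the one before it.

    With strict=True, equal consecutive samples also count as a failure.
    """
    pairs = list(zip(costs_per_hour, costs_per_hour[1:]))
    if strict:
        return all(later < earlier for earlier, later in pairs)
    return all(later <= earlier for earlier, later in pairs)
```

Whether a flat hour should pass or fail the audit is a policy choice the source does not settle, which is why the sketch exposes both readings.
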

## The Crystallization Protocol

Every model output gets crystallized:

| Model Output | Crystallized As | Storage | Retrieval Cost |
|---|---|---|---|
| VLM: UI element | OpenCV template + bbox | templates.json | 3 ms |
| VLM: text | OCR region coords | regions.json | 50 ms |
| LLM: nav plan | Waypoint sequence | nav_graph.db | <1 ms |
| LLM: combat decision | If/else rule on state | rules.py | <1 ms |
| LLM: quest interpretation | Structured entry | quests.db | <1 ms |
| LLM: NPC disposition | Name→attitude map | npcs.db | <1 ms |
| LLM: narration | Template with slots | narration.json | <1 ms |
| API: moderation | Approved phrase cache | approved.set | <1 ms |
| Groq: strategic plan | Extracted decision rules | strategy.json | <1 ms |

Skill document format: markdown + YAML frontmatter following agentskills.io standard (name, game, type, success_rate, times_used, sovereignty_value).

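
A skill document with the six listed frontmatter fields could be rendered along these lines. This is a sketch under loud assumptions: the field types (for example, `sovereignty_value` as a float) are guesses, the output has not been checked against the agentskills.io standard, and `SkillDoc` is a hypothetical class. Values are written with `json.dumps`, relying on JSON scalars being valid YAML.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SkillDoc:
    # Field names come from the text; the types are assumptions.
    name: str
    game: str
    type: str
    success_rate: float
    times_used: int
    sovereignty_value: float
    body: str = ""

    def to_markdown(self) -> str:
        """Render YAML frontmatter followed by the markdown body."""
        fields = asdict(self)
        body = fields.pop("body")
        lines = ["---"]
        lines += [f"{key}: {json.dumps(value)}" for key, value in fields.items()]
        lines += ["---", "", body]
        return "\n".join(lines)
```

For example, `SkillDoc("open_door", "example-game", "navigation", 0.9, 12, 0.5, "Walk to the door.").to_markdown()` yields a frontmatter block between two `---` lines, then the body.
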

## The Automation Imperative & Three-Strike Rule

This applies to the developer workflow too, not just the agent. If you do the same thing manually three times, you stop and write the automation before proceeding.

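
The three-strike rule can itself be tracked with a counter. The sketch below is purely illustrative: the `ThreeStrikes` class and the way tasks are named are assumptions, and only the limit of three comes from the text.

```python
from collections import Counter


class ThreeStrikes:
    def __init__(self, limit: int = 3) -> None:
        self.limit = limit  # three manual repetitions, per the rule above
        self.counts: Counter = Counter()

    def record(self, task: str) -> bool:
        """Log one manual run; True means the limit is hit and automation is due."""
        self.counts[task] += 1
        return self.counts[task] >= self.limit
```

A `True` return is the cue to stop and write the automation before repeating the task again.
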

**Falsework Checklist** (before any cloud API call):
1. What durable artifact will this call produce?
2. Where will the artifact be stored locally?
3. What local rule or cache will this populate?
4. After this call, will I need to make it again?
5. If yes, what would eliminate the repeat?
6. What is the sovereignty delta of this call?


## The Graduation Test (Falsework Removal Criteria)

All five conditions must be met simultaneously in a single 24-hour period:

| Test | Condition | Measurement |
|---|---|---|
| Perception Independence | 1 hour, no VLM calls after minute 15 | VLM calls in last 45 min = 0 |
| Decision Independence | Full session with <5 API calls total | Groq/cloud calls < 5 |
| Narration Independence | All narration from local templates + local LLM | Zero cloud TTS/narration calls |
| Economic Independence | Earns more sats than spends on inference | sats_earned > sats_spent |
| Operational Independence | 24 hours unattended, no human intervention | Uptime > 23.5 hrs |

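
The five measurements in the table translate directly into boolean checks. The following sketch assumes a daily report object carrying the measured values; `DayReport`, its field names, and the two functions are hypothetical, while the thresholds are taken from the Measurement column.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DayReport:
    # Hypothetical field names for the values measured over one 24-hour period.
    vlm_calls_last_45_min: int
    cloud_calls_in_session: int
    cloud_narration_calls: int
    sats_earned: int
    sats_spent: int
    uptime_hours: float


def graduation_results(report: DayReport) -> Dict[str, bool]:
    """Evaluate each independence test against the table's measurement."""
    return {
        "perception": report.vlm_calls_last_45_min == 0,
        "decision": report.cloud_calls_in_session < 5,
        "narration": report.cloud_narration_calls == 0,
        "economic": report.sats_earned > report.sats_spent,
        "operational": report.uptime_hours > 23.5,
    }


def graduated(report: DayReport) -> bool:
    """All five conditions must hold in the same period."""
    return all(graduation_results(report).values())
```

Returning the per-test results alongside the overall verdict shows which condition is blocking graduation.
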

> "The arch must hold after the falsework is removed."
|
||||
endobj
|
||||
xref
|
||||
0 34
|
||||
0000000000 65535 f
|
||||
0000000061 00000 n
|
||||
0000000156 00000 n
|
||||
0000000263 00000 n
|
||||
0000000371 00000 n
|
||||
0000000481 00000 n
|
||||
0000000676 00000 n
|
||||
0000000785 00000 n
|
||||
0000000980 00000 n
|
||||
0000001085 00000 n
|
||||
0000001162 00000 n
|
||||
0000001358 00000 n
|
||||
0000001554 00000 n
|
||||
0000001750 00000 n
|
||||
0000001946 00000 n
|
||||
0000002142 00000 n
|
||||
0000002226 00000 n
|
||||
0000002422 00000 n
|
||||
0000002618 00000 n
|
||||
0000002814 00000 n
|
||||
0000003010 00000 n
|
||||
0000003080 00000 n
|
||||
0000003361 00000 n
|
||||
0000003494 00000 n
|
||||
0000004196 00000 n
|
||||
0000006400 00000 n
|
||||
0000008981 00000 n
|
||||
0000011784 00000 n
|
||||
0000012614 00000 n
|
||||
0000014985 00000 n
|
||||
0000017637 00000 n
|
||||
0000020199 00000 n
|
||||
0000022443 00000 n
|
||||
0000023846 00000 n
|
||||
trailer
|
||||
<<
|
||||
/ID
|
||||
[<71e3d90b133a79c4436262df53cdbfbf><71e3d90b133a79c4436262df53cdbfbf>]
|
||||
% ReportLab generated PDF document -- digest (opensource)
|
||||
|
||||
/Info 21 0 R
|
||||
/Root 20 0 R
|
||||
/Size 34
|
||||
>>
|
||||
startxref
|
||||
25062
|
||||
%%EOF
|
||||
@@ -1,100 +0,0 @@
|
||||
# Issue #1097 — Bannerlord M5 Sovereign Victory: Implementation
|
||||
|
||||
**Date:** 2026-03-23
|
||||
**Status:** Python stack implemented — game infrastructure pending
|
||||
|
||||
## Summary
|
||||
|
||||
Issue #1097 is the final milestone of Project Bannerlord (#1091): Timmy holds
|
||||
the title of King with majority territory control through pure local strategy.
|
||||
|
||||
This PR implements the Python-side sovereign victory stack (`src/bannerlord/`).
|
||||
The game-side infrastructure (Windows VM, GABS C# mod) remains external to this
|
||||
repository, consistent with the scope decision on M4 (#1096).
|
||||
|
||||
## What was implemented
|
||||
|
||||
### `src/bannerlord/` package
|
||||
|
||||
| Module | Purpose |
|--------|---------|
| `models.py` | Pydantic data contracts — KingSubgoal, SubgoalMessage, TaskMessage, ResultMessage, StateUpdateMessage, reward functions, VictoryCondition |
| `gabs_client.py` | Async TCP JSON-RPC client for Bannerlord.GABS (port 4825), graceful degradation when game server is offline |
| `ledger.py` | SQLite-backed asset ledger — treasury, fiefs, vassal budgets, campaign tick log |
| `agents/king.py` | King agent — Qwen3:32b, 1× per campaign day, sovereign campaign loop, victory detection, subgoal broadcast |
| `agents/vassals.py` | War / Economy / Diplomacy vassals — Qwen3:14b, domain reward functions, primitive dispatch |
| `agents/companions.py` | Logistics / Caravan / Scout companions — event-driven, primitive execution against GABS |
|
||||
|
||||
### `tests/unit/test_bannerlord/` — 56 unit tests
|
||||
|
||||
- `test_models.py` — Pydantic validation, reward math, victory condition logic
|
||||
- `test_gabs_client.py` — Connection lifecycle, RPC dispatch, error handling, graceful degradation
|
||||
- `test_agents.py` — King campaign loop, vassal subgoal routing, companion primitive execution
|
||||
|
||||
All 56 tests pass.
|
||||
|
||||
## Architecture
|
||||
|
||||
```
KingAgent (Qwen3:32b, 1×/day)
    └── KingSubgoal → SubgoalQueue
            ├── WarVassal (Qwen3:14b, 4×/day)
            │       └── TaskMessage → LogisticsCompanion
            │               └── GABS: move_party, recruit_troops, upgrade_troops
            ├── EconomyVassal (Qwen3:14b, 4×/day)
            │       └── TaskMessage → CaravanCompanion
            │               └── GABS: assess_prices, buy_goods, establish_caravan
            └── DiplomacyVassal (Qwen3:14b, 4×/day)
                    └── TaskMessage → ScoutCompanion
                            └── GABS: track_lord, assess_garrison, report_intel
```
|
||||
|
||||
## Subgoal vocabulary
|
||||
|
||||
| Token | Vassal | Meaning |
|-------|--------|---------|
| `EXPAND_TERRITORY` | War | Take or secure a fief |
| `RAID_ECONOMY` | War | Raid enemy villages for denars |
| `TRAIN` | War | Level troops via auto-resolve |
| `FORTIFY` | Economy | Upgrade or repair a settlement |
| `CONSOLIDATE` | Economy | Hold territory, no expansion |
| `TRADE` | Economy | Execute profitable trade route |
| `ALLY` | Diplomacy | Pursue non-aggression / alliance |
| `RECRUIT` | Logistics | Fill party to capacity |
| `HEAL` | Logistics | Rest party until wounds recovered |
| `SPY` | Scout | Gain information on target faction |
|
||||
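The vocabulary above amounts to a lookup from token to owning agent. A minimal sketch of that routing follows, assuming a plain dictionary; `SUBGOAL_OWNERS` and `route_subgoal` are hypothetical names for illustration and are not taken from `src/bannerlord/`.

```python
# Hypothetical routing table built from the subgoal vocabulary table.
SUBGOAL_OWNERS: dict[str, str] = {
    "EXPAND_TERRITORY": "War",
    "RAID_ECONOMY": "War",
    "TRAIN": "War",
    "FORTIFY": "Economy",
    "CONSOLIDATE": "Economy",
    "TRADE": "Economy",
    "ALLY": "Diplomacy",
    "RECRUIT": "Logistics",
    "HEAL": "Logistics",
    "SPY": "Scout",
}


def route_subgoal(token: str) -> str:
    """Return the owner of a subgoal token, rejecting unknown tokens."""
    try:
        return SUBGOAL_OWNERS[token]
    except KeyError:
        raise ValueError(f"Unknown subgoal token: {token!r}") from None
```

Rejecting unknown tokens early is one way to keep a malformed model response from reaching the vassal queue; the real package may validate tokens differently (for example through its Pydantic models).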
|
||||
## Victory condition
|
||||
|
||||
```python
VictoryCondition(
    holds_king_title=True,       # player_title == "King" from GABS
    territory_control_pct=55.0,  # > 51% of Calradia fiefs
)
```
|
||||
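How the two fields combine into a victory check can be sketched as below. This is an illustration only: `VictoryCheck` is a hypothetical stdlib dataclass standing in for the Pydantic `VictoryCondition` model, and the 51% threshold is an assumption taken from the comment in the snippet above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VictoryCheck:
    """Hypothetical stand-in for the Pydantic VictoryCondition model."""

    holds_king_title: bool
    territory_control_pct: float
    majority_threshold_pct: float = 51.0  # assumed from the "> 51%" comment

    @property
    def achieved(self) -> bool:
        # Both conditions must hold: the King title and a territorial majority.
        return (
            self.holds_king_title
            and self.territory_control_pct > self.majority_threshold_pct
        )


won = VictoryCheck(holds_king_title=True, territory_control_pct=55.0).achieved
not_yet = VictoryCheck(holds_king_title=True, territory_control_pct=40.0).achieved
```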
|
||||
## Graceful degradation
|
||||
|
||||
When GABS is offline (game not running), `GABSClient` logs a warning and raises
|
||||
`GABSUnavailable`. The King agent catches this and runs with an empty game state
|
||||
(falls back to RECRUIT subgoal). No part of the dashboard crashes.
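
The fallback described above can be sketched roughly as follows. This is a minimal, self-contained illustration rather than the code in `agents/king.py`: `OfflineGABS`, `read_state_or_empty`, and `choose_subgoal` are hypothetical names, and the locally defined `GABSUnavailable` only stands in for the real exception.

```python
import asyncio
import logging

logger = logging.getLogger("bannerlord.sketch")


class GABSUnavailable(Exception):
    """Stand-in for the exception raised when the game server is offline."""


class OfflineGABS:
    """Hypothetical client stub that behaves as if the game is not running."""

    async def get_state(self) -> dict:
        raise GABSUnavailable("GABS not reachable")


async def read_state_or_empty(client) -> dict:
    """Return the game state, or an empty dict when GABS is offline."""
    try:
        return await client.get_state()
    except GABSUnavailable as exc:
        logger.warning("GABS unavailable, continuing with empty state: %s", exc)
        return {}


def choose_subgoal(state: dict) -> str:
    """With no state to reason about, fall back to the RECRUIT subgoal."""
    if not state:
        return "RECRUIT"
    # Placeholder policy for the online case; the real King asks the model.
    return state.get("suggested_token", "CONSOLIDATE")


state = asyncio.run(read_state_or_empty(OfflineGABS()))
fallback_token = choose_subgoal(state)
```

Catching the exception at the point where state is read keeps the rest of the campaign loop unaware of whether the game is running, which is what allows the dashboard to stay up.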
|
||||
|
||||
## Remaining prerequisites
|
||||
|
||||
Before M5 can run live:
|
||||
|
||||
1. **M1-M3** — Passive observer, basic campaign actions, full campaign strategy
|
||||
(currently open; their Python stubs can build on this `src/bannerlord/` package)
|
||||
2. **M4** — Formation Commander (#1096) — declined as out-of-scope; M5 works
|
||||
around M4 by using Bannerlord's Tactics auto-resolve path
|
||||
3. **Windows VM** — Mount & Blade II: Bannerlord + GABS mod (BUTR/Bannerlord.GABS)
|
||||
4. **OBS streaming** — Cinematic Camera pipeline (Step 3 of M5) — external to repo
|
||||
5. **BattleLink** — Alex co-op integration (Step 4 of M5) — requires dedicated server
|
||||
|
||||
## Design references
|
||||
|
||||
- Ahilan & Dayan (2019): Feudal Multi-Agent Hierarchies — manager/worker hierarchy
|
||||
- Wang et al. (2023): Voyager — LLM lifelong learning pattern
|
||||
- Feudal hierarchy design doc: `docs/research/bannerlord-feudal-hierarchy-design.md`
|
||||
|
||||
Fixes #1097
|
||||
@@ -1,105 +0,0 @@
|
||||
# Nexus — Scope & Acceptance Criteria
|
||||
|
||||
**Issue:** #1208
|
||||
**Date:** 2026-03-23
|
||||
**Status:** Initial implementation complete; teaching/RL harness deferred
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
The **Nexus** is a persistent conversational space where Timmy lives with full
|
||||
access to his live memory. Unlike the main dashboard chat (which uses tools and
|
||||
has a transient feel), the Nexus is:
|
||||
|
||||
- **Conversational only** — no tool approval flow; pure dialogue
|
||||
- **Memory-aware** — semantically relevant memories surface alongside each exchange
|
||||
- **Teachable** — the operator can inject facts directly into Timmy's live memory
|
||||
- **Persistent** — the session survives page refreshes; history accumulates over time
|
||||
- **Local** — always backed by Ollama; no cloud inference required
|
||||
|
||||
This is the foundation for future LoRA fine-tuning, RL training harnesses, and
|
||||
eventually real-time self-improvement loops.
|
||||
|
||||
---
|
||||
|
||||
## Scope (v1 — this PR)
|
||||
|
||||
| Area | Included | Deferred |
|------|----------|----------|
| Conversational UI | ✅ Chat panel with HTMX streaming | Streaming tokens |
| Live memory sidebar | ✅ Semantic search on each turn | Auto-refresh on teach |
| Teaching panel | ✅ Inject personal facts | Bulk import, LoRA trigger |
| Session isolation | ✅ Dedicated `nexus` session ID | Per-operator sessions |
| Nav integration | ✅ NEXUS link in INTEL dropdown | Mobile nav |
| CSS/styling | ✅ Two-column responsive layout | Dark/light theme toggle |
| Tests | ✅ 9 unit tests, all green | E2E with real Ollama |
| LoRA / RL harness | ❌ deferred to future issue | |
| Auto-falsework | ❌ deferred | |
| Bannerlord interface | ❌ separate track | |
|
||||
|
||||
---
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
### AC-1: Nexus page loads
|
||||
- **Given** the dashboard is running
|
||||
- **When** I navigate to `/nexus`
|
||||
- **Then** I see a two-panel layout: conversation on the left, memory sidebar on the right
|
||||
- **And** the page title reads "// NEXUS"
|
||||
- **And** the page is accessible from the nav (INTEL → NEXUS)
|
||||
|
||||
### AC-2: Conversation-only chat
|
||||
- **Given** I am on the Nexus page
|
||||
- **When** I type a message and submit
|
||||
- **Then** Timmy responds using the `nexus` session (isolated from dashboard history)
|
||||
- **And** no tool-approval cards appear — responses are pure text
|
||||
- **And** my message and Timmy's reply are appended to the chat log
|
||||
|
||||
### AC-3: Memory context surfaces automatically
|
||||
- **Given** I send a message
|
||||
- **When** the response arrives
|
||||
- **Then** the "LIVE MEMORY CONTEXT" panel shows up to 4 semantically relevant memories
|
||||
- **And** each memory entry shows its type and content
|
||||
|
||||
### AC-4: Teaching panel stores facts
|
||||
- **Given** I type a fact into the "TEACH TIMMY" input and submit
|
||||
- **When** the request completes
|
||||
- **Then** I see a green confirmation "✓ Taught: <fact>"
|
||||
- **And** the fact appears in the "KNOWN FACTS" list
|
||||
- **And** the fact is stored in Timmy's live memory (`store_personal_fact`)
|
||||
|
||||
### AC-5: Empty / invalid input is rejected gracefully
|
||||
- **Given** I submit a blank message or fact
|
||||
- **Then** no request is made and the log is unchanged
|
||||
- **Given** I submit a message over 10 000 characters
|
||||
- **Then** an inline error is shown without crashing the server
|
||||
|
||||
### AC-6: Conversation can be cleared
|
||||
- **Given** the Nexus has conversation history
|
||||
- **When** I click CLEAR and confirm
|
||||
- **Then** the chat log shows only a "cleared" confirmation
|
||||
- **And** the Agno session for `nexus` is reset
|
||||
|
||||
### AC-7: Graceful degradation when Ollama is down
|
||||
- **Given** Ollama is unavailable
|
||||
- **When** I send a message
|
||||
- **Then** an error message is shown inline (not a 500 page)
|
||||
- **And** the app continues to function
|
||||
|
||||
### AC-8: No regression on existing tests
|
||||
- **Given** the nexus route is registered
|
||||
- **When** `tox -e unit` runs
|
||||
- **Then** all 343+ existing tests remain green
|
||||
|
||||
---
|
||||
|
||||
## Future Work (separate issues)
|
||||
|
||||
1. **LoRA trigger** — button in the teaching panel to queue a fine-tuning run
|
||||
using the current Nexus conversation as training data
|
||||
2. **RL harness** — reward signal collection during conversation for RLHF
|
||||
3. **Auto-falsework pipeline** — scaffold harness generation from conversation
|
||||
4. **Bannerlord interface** — Nexus as the live-memory bridge for in-game Timmy
|
||||
5. **Streaming responses** — token-by-token display via WebSocket
|
||||
6. **Per-operator sessions** — isolate Nexus history by logged-in user
|
||||
26
poetry.lock
generated
@@ -2936,9 +2936,10 @@ numpy = ">=1.22,<2.5"
|
||||
name = "numpy"
|
||||
version = "2.4.2"
|
||||
description = "Fundamental package for array computing in Python"
|
||||
optional = false
|
||||
optional = true
|
||||
python-versions = ">=3.11"
|
||||
groups = ["main"]
|
||||
markers = "extra == \"bigbrain\" or extra == \"embeddings\" or extra == \"voice\""
|
||||
files = [
|
||||
{file = "numpy-2.4.2-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:e7e88598032542bd49af7c4747541422884219056c268823ef6e5e89851c8825"},
|
||||
{file = "numpy-2.4.2-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:7edc794af8b36ca37ef5fcb5e0d128c7e0595c7b96a2318d1badb6fcd8ee86b1"},
|
||||
@@ -3346,27 +3347,6 @@ triton = {version = ">=2", markers = "platform_machine == \"x86_64\" and sys_pla
|
||||
[package.extras]
|
||||
dev = ["black", "flake8", "isort", "pytest", "scipy"]
|
||||
|
||||
[[package]]
|
||||
name = "opencv-python"
|
||||
version = "4.13.0.92"
|
||||
description = "Wrapper package for OpenCV python bindings."
|
||||
optional = false
|
||||
python-versions = ">=3.6"
|
||||
groups = ["main"]
|
||||
files = [
|
||||
{file = "opencv_python-4.13.0.92-cp37-abi3-macosx_13_0_arm64.whl", hash = "sha256:caf60c071ec391ba51ed00a4a920f996d0b64e3e46068aac1f646b5de0326a19"},
|
||||
{file = "opencv_python-4.13.0.92-cp37-abi3-macosx_14_0_x86_64.whl", hash = "sha256:5868a8c028a0b37561579bfb8ac1875babdc69546d236249fff296a8c010ccf9"},
|
||||
{file = "opencv_python-4.13.0.92-cp37-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:0bc2596e68f972ca452d80f444bc404e08807d021fbba40df26b61b18e01838a"},
|
||||
{file = "opencv_python-4.13.0.92-cp37-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:402033cddf9d294693094de5ef532339f14ce821da3ad7df7c9f6e8316da32cf"},
|
||||
{file = "opencv_python-4.13.0.92-cp37-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:bccaabf9eb7f897ca61880ce2869dcd9b25b72129c28478e7f2a5e8dee945616"},
|
||||
{file = "opencv_python-4.13.0.92-cp37-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:620d602b8f7d8b8dab5f4b99c6eb353e78d3fb8b0f53db1bd258bb1aa001c1d5"},
|
||||
{file = "opencv_python-4.13.0.92-cp37-abi3-win32.whl", hash = "sha256:372fe164a3148ac1ca51e5f3ad0541a4a276452273f503441d718fab9c5e5f59"},
|
||||
{file = "opencv_python-4.13.0.92-cp37-abi3-win_amd64.whl", hash = "sha256:423d934c9fafb91aad38edf26efb46da91ffbc05f3f59c4b0c72e699720706f5"},
|
||||
]
|
||||
|
||||
[package.dependencies]
|
||||
numpy = {version = ">=2", markers = "python_version >= \"3.9\""}
|
||||
|
||||
[[package]]
|
||||
name = "optimum"
|
||||
version = "2.1.0"
|
||||
@@ -9720,4 +9700,4 @@ voice = ["openai-whisper", "piper-tts", "pyttsx3", "sounddevice"]
|
||||
[metadata]
|
||||
lock-version = "2.1"
|
||||
python-versions = ">=3.11,<4"
|
||||
content-hash = "5af3028474051032bef12182eaa5ef55950cbaeca21d1793f878d54c03994eb0"
|
||||
content-hash = "cc50755f322b8755e85ab7bdf0668609612d885552aba14caf175326eedfa216"
|
||||
|
||||
@@ -14,7 +14,6 @@ repository = "http://localhost:3000/rockachopa/Timmy-time-dashboard"
|
||||
packages = [
|
||||
{ include = "config.py", from = "src" },
|
||||
|
||||
{ include = "bannerlord", from = "src" },
|
||||
{ include = "dashboard", from = "src" },
|
||||
{ include = "infrastructure", from = "src" },
|
||||
{ include = "integrations", from = "src" },
|
||||
@@ -61,7 +60,6 @@ selenium = { version = ">=4.20.0", optional = true }
|
||||
pytest-randomly = { version = ">=3.16.0", optional = true }
|
||||
pytest-xdist = { version = ">=3.5.0", optional = true }
|
||||
anthropic = "^0.86.0"
|
||||
opencv-python = "^4.13.0.92"
|
||||
|
||||
[tool.poetry.extras]
|
||||
telegram = ["python-telegram-bot"]
|
||||
|
||||
@@ -42,7 +42,7 @@ def _get_gitea_api() -> str:
|
||||
if api_file.exists():
|
||||
return api_file.read_text().strip()
|
||||
# Default fallback
|
||||
return "http://143.198.27.163:3000/api/v1"
|
||||
return "http://localhost:3000/api/v1"
|
||||
|
||||
|
||||
GITEA_API = _get_gitea_api()
|
||||
|
||||
@@ -6,7 +6,7 @@ writes a ranked queue to .loop/queue.json. No LLM calls — pure heuristics.
|
||||
|
||||
Run: python3 scripts/triage_score.py
|
||||
Env: GITEA_TOKEN (or reads ~/.hermes/gitea_token)
|
||||
GITEA_API (default: http://143.198.27.163:3000/api/v1)
|
||||
GITEA_API (default: http://localhost:3000/api/v1)
|
||||
REPO_SLUG (default: rockachopa/Timmy-time-dashboard)
|
||||
"""
|
||||
|
||||
@@ -33,7 +33,7 @@ def _get_gitea_api() -> str:
|
||||
if api_file.exists():
|
||||
return api_file.read_text().strip()
|
||||
# Default fallback
|
||||
return "http://143.198.27.163:3000/api/v1"
|
||||
return "http://localhost:3000/api/v1"
|
||||
|
||||
|
||||
GITEA_API = _get_gitea_api()
|
||||
|
||||
@@ -1,22 +0,0 @@
|
||||
"""Bannerlord sovereign agent package — Project Bannerlord M5.
|
||||
|
||||
Implements the feudal multi-agent hierarchy for Timmy's Bannerlord campaign.
|
||||
Architecture based on Ahilan & Dayan (2019) Feudal Multi-Agent Hierarchies.
|
||||
|
||||
Refs #1091 (epic), #1097 (M5 Sovereign Victory), #1099 (feudal hierarchy design).
|
||||
|
||||
Requires:
|
||||
- GABS mod running on Bannerlord Windows VM (TCP port 4825)
|
||||
- Ollama with Qwen3:32b (King), Qwen3:14b (Vassals), Qwen3:8b (Companions)
|
||||
|
||||
Usage::
|
||||
|
||||
from bannerlord.gabs_client import GABSClient
|
||||
from bannerlord.agents.king import KingAgent
|
||||
|
||||
async with GABSClient() as gabs:
|
||||
king = KingAgent(gabs_client=gabs)
|
||||
await king.run_campaign()
|
||||
"""
|
||||
|
||||
__version__ = "0.1.0"
|
||||
@@ -1,7 +0,0 @@
|
||||
"""Bannerlord feudal agent hierarchy.
|
||||
|
||||
Three tiers:
|
||||
- King (king.py) — strategic, Qwen3:32b, 1× per campaign day
|
||||
- Vassals (vassals.py) — domain, Qwen3:14b, 4× per campaign day
|
||||
- Companions (companions.py) — tactical, Qwen3:8b, event-driven
|
||||
"""
|
||||
@@ -1,261 +0,0 @@
|
||||
"""Companion worker agents — Logistics, Caravan, and Scout.
|
||||
|
||||
Companions are the lowest tier — fast, specialized, single-purpose workers.
|
||||
Each companion listens to its :class:`TaskMessage` queue, executes the
|
||||
requested primitive against GABS, and emits a :class:`ResultMessage`.
|
||||
|
||||
Model: Qwen3:8b (or smaller) — sub-2-second response times.
|
||||
Frequency: event-driven (triggered by vassal task messages).
|
||||
|
||||
Primitive vocabulary per companion:
|
||||
Logistics: recruit_troop, buy_supplies, rest_party, sell_prisoners, upgrade_troops, build_project
|
||||
Caravan: assess_prices, buy_goods, sell_goods, establish_caravan, abandon_route
|
||||
Scout: track_lord, assess_garrison, map_patrol_routes, report_intel
|
||||
|
||||
Refs: #1097, #1099.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
from typing import Any
|
||||
|
||||
from bannerlord.gabs_client import GABSClient, GABSUnavailable
|
||||
from bannerlord.models import ResultMessage, TaskMessage
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class BaseCompanion:
|
||||
"""Shared companion lifecycle — polls task queue, executes primitives."""
|
||||
|
||||
name: str = "base_companion"
|
||||
primitives: frozenset[str] = frozenset()
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
gabs_client: GABSClient,
|
||||
task_queue: asyncio.Queue[TaskMessage],
|
||||
result_queue: asyncio.Queue[ResultMessage] | None = None,
|
||||
) -> None:
|
||||
self._gabs = gabs_client
|
||||
self._task_queue = task_queue
|
||||
self._result_queue = result_queue or asyncio.Queue()
|
||||
self._running = False
|
||||
|
||||
@property
|
||||
def result_queue(self) -> asyncio.Queue[ResultMessage]:
|
||||
return self._result_queue
|
||||
|
||||
    async def run(self) -> None:
        """Companion event loop — processes task messages."""
        self._running = True
        logger.info("%s started", self.name)
        try:
            while self._running:
                try:
                    task = await asyncio.wait_for(self._task_queue.get(), timeout=1.0)
                except TimeoutError:
                    continue

                if task.to_agent != self.name:
                    # Not for us: put it back so another companion can handle it.
                    # Re-queueing counts as a new unfinished task, so mark the
                    # get() above as done to keep Queue.join() balanced.
                    await self._task_queue.put(task)
                    self._task_queue.task_done()
                    await asyncio.sleep(0.05)
                    continue

                result = await self._execute(task)
                await self._result_queue.put(result)
                self._task_queue.task_done()

        except asyncio.CancelledError:
            logger.info("%s cancelled", self.name)
            raise
        finally:
            self._running = False
|
||||
|
||||
def stop(self) -> None:
|
||||
self._running = False
|
||||
|
||||
async def _execute(self, task: TaskMessage) -> ResultMessage:
|
||||
"""Dispatch *task.primitive* to its handler method."""
|
||||
handler = getattr(self, f"_prim_{task.primitive}", None)
|
||||
if handler is None:
|
||||
logger.warning("%s: unknown primitive %r — skipping", self.name, task.primitive)
|
||||
return ResultMessage(
|
||||
from_agent=self.name,
|
||||
to_agent=task.from_agent,
|
||||
success=False,
|
||||
outcome={"error": f"Unknown primitive: {task.primitive}"},
|
||||
)
|
||||
try:
|
||||
outcome = await handler(task.args)
|
||||
return ResultMessage(
|
||||
from_agent=self.name,
|
||||
to_agent=task.from_agent,
|
||||
success=True,
|
||||
outcome=outcome or {},
|
||||
)
|
||||
except GABSUnavailable as exc:
|
||||
logger.warning("%s: GABS unavailable for %r: %s", self.name, task.primitive, exc)
|
||||
return ResultMessage(
|
||||
from_agent=self.name,
|
||||
to_agent=task.from_agent,
|
||||
success=False,
|
||||
outcome={"error": str(exc)},
|
||||
)
|
||||
except Exception as exc: # noqa: BLE001
|
||||
logger.warning("%s: %r failed: %s", self.name, task.primitive, exc)
|
||||
return ResultMessage(
|
||||
from_agent=self.name,
|
||||
to_agent=task.from_agent,
|
||||
success=False,
|
||||
outcome={"error": str(exc)},
|
||||
)
|
||||
|
||||
|
||||
# ── Logistics Companion ───────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class LogisticsCompanion(BaseCompanion):
|
||||
"""Party management — recruitment, supply, healing, troop upgrades.
|
||||
|
||||
Skill domain: Scouting / Steward / Medicine.
|
||||
"""
|
||||
|
||||
name = "logistics_companion"
|
||||
    primitives = frozenset(
        {
            "recruit_troop",
            "buy_supplies",
            "rest_party",
            "sell_prisoners",
            "upgrade_troops",
            "build_project",
            "move_party",  # handled by _prim_move_party below
        }
    )
|
||||
|
||||
async def _prim_recruit_troop(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
troop_type = args.get("troop_type", "infantry")
|
||||
qty = int(args.get("quantity", 10))
|
||||
result = await self._gabs.recruit_troops(troop_type, qty)
|
||||
logger.info("Recruited %d %s", qty, troop_type)
|
||||
return result or {"recruited": qty, "type": troop_type}
|
||||
|
||||
async def _prim_buy_supplies(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
qty = int(args.get("quantity", 50))
|
||||
result = await self._gabs.call("party.buySupplies", {"quantity": qty})
|
||||
logger.info("Bought %d food supplies", qty)
|
||||
return result or {"purchased": qty}
|
||||
|
||||
async def _prim_rest_party(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
days = int(args.get("days", 3))
|
||||
result = await self._gabs.call("party.rest", {"days": days})
|
||||
logger.info("Resting party for %d days", days)
|
||||
return result or {"rested_days": days}
|
||||
|
||||
async def _prim_sell_prisoners(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
location = args.get("location", "nearest_town")
|
||||
result = await self._gabs.call("party.sellPrisoners", {"location": location})
|
||||
logger.info("Selling prisoners at %s", location)
|
||||
return result or {"sold_at": location}
|
||||
|
||||
async def _prim_upgrade_troops(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
result = await self._gabs.call("party.upgradeTroops", {})
|
||||
logger.info("Upgraded available troops")
|
||||
return result or {"upgraded": True}
|
||||
|
||||
async def _prim_build_project(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
settlement = args.get("settlement", "")
|
||||
result = await self._gabs.call("settlement.buildProject", {"settlement": settlement})
|
||||
logger.info("Building project in %s", settlement)
|
||||
return result or {"settlement": settlement}
|
||||
|
||||
async def _prim_move_party(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
destination = args.get("destination", "")
|
||||
result = await self._gabs.move_party(destination)
|
||||
logger.info("Moving party to %s", destination)
|
||||
return result or {"destination": destination}
|
||||
|
||||
|
||||
# ── Caravan Companion ─────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class CaravanCompanion(BaseCompanion):
|
||||
"""Trade route management — price assessment, goods trading, caravan deployment.
|
||||
|
||||
Skill domain: Trade / Charm.
|
||||
"""
|
||||
|
||||
name = "caravan_companion"
|
||||
primitives = frozenset(
|
||||
{"assess_prices", "buy_goods", "sell_goods", "establish_caravan", "abandon_route"}
|
||||
)
|
||||
|
||||
async def _prim_assess_prices(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
town = args.get("town", "nearest")
|
||||
result = await self._gabs.call("trade.assessPrices", {"town": town})
|
||||
logger.info("Assessed prices at %s", town)
|
||||
return result or {"town": town}
|
||||
|
||||
async def _prim_buy_goods(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
item = args.get("item", "grain")
|
||||
qty = int(args.get("quantity", 10))
|
||||
result = await self._gabs.call("trade.buyGoods", {"item": item, "quantity": qty})
|
||||
logger.info("Buying %d × %s", qty, item)
|
||||
return result or {"item": item, "quantity": qty}
|
||||
|
||||
async def _prim_sell_goods(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
item = args.get("item", "grain")
|
||||
qty = int(args.get("quantity", 10))
|
||||
result = await self._gabs.call("trade.sellGoods", {"item": item, "quantity": qty})
|
||||
logger.info("Selling %d × %s", qty, item)
|
||||
return result or {"item": item, "quantity": qty}
|
||||
|
||||
async def _prim_establish_caravan(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
town = args.get("town", "")
|
||||
result = await self._gabs.call("trade.establishCaravan", {"town": town})
|
||||
logger.info("Establishing caravan at %s", town)
|
||||
return result or {"town": town}
|
||||
|
||||
async def _prim_abandon_route(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
result = await self._gabs.call("trade.abandonRoute", {})
|
||||
logger.info("Caravan route abandoned — returning to main party")
|
||||
return result or {"abandoned": True}
|
||||
|
||||
|
||||
# ── Scout Companion ───────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class ScoutCompanion(BaseCompanion):
|
||||
"""Intelligence gathering — lord tracking, garrison assessment, patrol mapping.
|
||||
|
||||
Skill domain: Scouting / Roguery.
|
||||
"""
|
||||
|
||||
name = "scout_companion"
|
||||
primitives = frozenset({"track_lord", "assess_garrison", "map_patrol_routes", "report_intel"})
|
||||
|
||||
async def _prim_track_lord(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
lord_name = args.get("name", "")
|
||||
result = await self._gabs.call("intelligence.trackLord", {"name": lord_name})
|
||||
logger.info("Tracking lord: %s", lord_name)
|
||||
return result or {"tracking": lord_name}
|
||||
|
||||
async def _prim_assess_garrison(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
settlement = args.get("settlement", "")
|
||||
result = await self._gabs.call("intelligence.assessGarrison", {"settlement": settlement})
|
||||
logger.info("Assessing garrison at %s", settlement)
|
||||
return result or {"settlement": settlement}
|
||||
|
||||
async def _prim_map_patrol_routes(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
region = args.get("region", "")
|
||||
result = await self._gabs.call("intelligence.mapPatrols", {"region": region})
|
||||
logger.info("Mapping patrol routes in %s", region)
|
||||
return result or {"region": region}
|
||||
|
||||
async def _prim_report_intel(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
result = await self._gabs.call("intelligence.report", {})
|
||||
logger.info("Scout intel report generated")
|
||||
return result or {"reported": True}
|
||||
@@ -1,235 +0,0 @@
|
||||
"""King agent — Timmy as sovereign ruler of Calradia.
|
||||
|
||||
The King operates on the campaign-map timescale. Each campaign tick he:
|
||||
1. Reads the full game state from GABS
|
||||
2. Evaluates the victory condition
|
||||
3. Issues a single KingSubgoal token to the vassal queue
|
||||
4. Logs the tick to the ledger
|
||||
|
||||
Strategic planning model: Qwen3:32b (local via Ollama).
|
||||
Decision budget: 5–15 seconds per tick.
|
||||
|
||||
Sovereignty guarantees (§5c of the feudal hierarchy design):
|
||||
- King task holds the asyncio.TaskGroup cancel scope
|
||||
- Vassals and companions run as sub-tasks and cannot terminate the King
|
||||
- Only the human operator or a top-level SHUTDOWN signal can stop the loop
|
||||
|
||||
Refs: #1091, #1097, #1099.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import logging
|
||||
from typing import Any
|
||||
|
||||
from bannerlord.gabs_client import GABSClient, GABSUnavailable
|
||||
from bannerlord.ledger import Ledger
|
||||
from bannerlord.models import (
|
||||
KingSubgoal,
|
||||
StateUpdateMessage,
|
||||
SubgoalMessage,
|
||||
VictoryCondition,
|
||||
)
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
_KING_MODEL = "qwen3:32b"
|
||||
_KING_TICK_SECONDS = 5.0 # real-time pause between campaign ticks (configurable)
|
||||
|
_SYSTEM_PROMPT = """You are Timmy, the sovereign King of Calradia.
Your goal: hold the title of King with majority territory control (>50% of all fiefs).
You think strategically over 100+ in-game days. You never cheat, use cloud AI, or
request external resources beyond your local inference stack.

Each turn you receive the full game state as JSON. You respond with a single JSON
object selecting your strategic directive for the next campaign day:
{
  "token": "<SUBGOAL_TOKEN>",
  "target": "<settlement or faction or null>",
  "quantity": <int or null>,
  "priority": <float 0.0-2.0>,
  "deadline_days": <int or null>,
  "context": "<brief reasoning>"
}

Valid tokens: EXPAND_TERRITORY, RAID_ECONOMY, FORTIFY, RECRUIT, TRADE,
              ALLY, SPY, HEAL, CONSOLIDATE, TRAIN

Think step by step. Respond with JSON only — no prose outside the object.
"""
|
||||
|
||||
|
||||
class KingAgent:
|
    """Sovereign campaign agent.

    Parameters
    ----------
    gabs_client:
        Connected (or gracefully-degraded) GABS client.
    ledger:
        Asset ledger for persistence. Initialized automatically if not provided.
    ollama_url:
        Base URL of the Ollama inference server.
    model:
        Ollama model tag. Default: qwen3:32b.
    tick_interval:
        Real-time seconds between campaign ticks.
    subgoal_queue:
        asyncio.Queue where KingSubgoal messages are placed for vassals.
        Created automatically if not provided.
    """
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
gabs_client: GABSClient,
|
||||
ledger: Ledger | None = None,
|
||||
ollama_url: str = "http://localhost:11434",
|
||||
model: str = _KING_MODEL,
|
||||
tick_interval: float = _KING_TICK_SECONDS,
|
||||
subgoal_queue: asyncio.Queue[SubgoalMessage] | None = None,
|
||||
) -> None:
|
||||
self._gabs = gabs_client
|
||||
self._ledger = ledger or Ledger()
|
||||
self._ollama_url = ollama_url
|
||||
self._model = model
|
||||
self._tick_interval = tick_interval
|
||||
self._subgoal_queue: asyncio.Queue[SubgoalMessage] = subgoal_queue or asyncio.Queue()
|
||||
self._tick = 0
|
||||
self._running = False
|
||||
|
||||
@property
|
||||
def subgoal_queue(self) -> asyncio.Queue[SubgoalMessage]:
|
||||
return self._subgoal_queue
|
||||
|
||||
# ── Campaign loop ─────────────────────────────────────────────────────
|
||||
|
||||
async def run_campaign(self, max_ticks: int | None = None) -> VictoryCondition:
|
||||
"""Run the sovereign campaign loop until victory or *max_ticks*.
|
||||
|
||||
Returns the final :class:`VictoryCondition` snapshot.
|
||||
"""
|
||||
self._ledger.initialize()
|
||||
self._running = True
|
||||
victory = VictoryCondition()
|
||||
logger.info("King campaign started. Model: %s. Max ticks: %s", self._model, max_ticks)
|
||||
|
||||
try:
|
||||
while self._running:
|
||||
if max_ticks is not None and self._tick >= max_ticks:
|
||||
logger.info("Max ticks (%d) reached — stopping campaign.", max_ticks)
|
||||
break
|
||||
|
||||
state = await self._fetch_state()
|
||||
victory = self._evaluate_victory(state)
|
||||
|
||||
if victory.achieved:
|
||||
logger.info(
|
||||
"SOVEREIGN VICTORY — King of Calradia! Territory: %.1f%%, tick: %d",
|
||||
victory.territory_control_pct,
|
||||
self._tick,
|
||||
)
|
||||
break
|
||||
|
||||
subgoal = await self._decide(state)
|
||||
await self._broadcast_subgoal(subgoal)
|
||||
self._ledger.log_tick(
|
||||
tick=self._tick,
|
||||
campaign_day=state.get("campaign_day", self._tick),
|
||||
subgoal=subgoal.token,
|
||||
)
|
||||
|
||||
self._tick += 1
|
||||
await asyncio.sleep(self._tick_interval)
|
||||
|
||||
except asyncio.CancelledError:
|
||||
logger.info("King campaign task cancelled at tick %d", self._tick)
|
||||
raise
|
||||
finally:
|
||||
self._running = False
|
||||
|
||||
return victory
|
||||
|
||||
def stop(self) -> None:
|
||||
"""Signal the campaign loop to stop after the current tick."""
|
||||
self._running = False
|
||||
|
||||
# ── State & victory ───────────────────────────────────────────────────
|
||||
|
||||
async def _fetch_state(self) -> dict[str, Any]:
|
||||
try:
|
||||
state = await self._gabs.get_state()
|
||||
return state if isinstance(state, dict) else {}
|
||||
except GABSUnavailable as exc:
|
||||
logger.warning("GABS unavailable at tick %d: %s — using empty state", self._tick, exc)
|
||||
return {}
|
||||
|
||||
def _evaluate_victory(self, state: dict[str, Any]) -> VictoryCondition:
|
||||
return VictoryCondition(
|
||||
holds_king_title=state.get("player_title") == "King",
|
||||
territory_control_pct=float(state.get("territory_control_pct", 0.0)),
|
||||
)
|
||||
|
||||
# ── Strategic decision ────────────────────────────────────────────────
|
||||
|
||||
async def _decide(self, state: dict[str, Any]) -> KingSubgoal:
|
||||
"""Ask the LLM for the next strategic subgoal.
|
||||
|
||||
Falls back to RECRUIT (safe default) if the LLM is unavailable.
|
||||
"""
|
||||
try:
|
||||
subgoal = await asyncio.to_thread(self._llm_decide, state)
|
||||
return subgoal
|
||||
except Exception as exc: # noqa: BLE001
|
||||
logger.warning(
|
||||
"King LLM decision failed at tick %d: %s — defaulting to RECRUIT", self._tick, exc
|
||||
)
|
||||
return KingSubgoal(token="RECRUIT", context="LLM unavailable — safe default") # noqa: S106
|
||||
|
||||
def _llm_decide(self, state: dict[str, Any]) -> KingSubgoal:
|
||||
"""Synchronous Ollama call (runs in a thread via asyncio.to_thread)."""
|
||||
import urllib.request
|
||||
|
||||
prompt_state = json.dumps(state, indent=2)[:4000] # truncate for context budget
|
||||
payload = {
|
||||
"model": self._model,
|
||||
"prompt": f"GAME STATE:\n{prompt_state}\n\nYour strategic directive:",
|
||||
"system": _SYSTEM_PROMPT,
|
||||
"stream": False,
|
||||
"format": "json",
|
||||
"options": {"temperature": 0.1},
|
||||
}
|
||||
data = json.dumps(payload).encode()
|
||||
req = urllib.request.Request(
|
||||
f"{self._ollama_url}/api/generate",
|
||||
data=data,
|
||||
headers={"Content-Type": "application/json"},
|
||||
)
|
||||
with urllib.request.urlopen(req, timeout=30) as resp: # noqa: S310
|
||||
result = json.loads(resp.read())
|
||||
|
||||
raw = result.get("response", "{}")
|
||||
parsed = json.loads(raw)
|
||||
return KingSubgoal(**parsed)
|
||||
|
||||
# ── Subgoal dispatch ──────────────────────────────────────────────────
|
||||
|
||||
async def _broadcast_subgoal(self, subgoal: KingSubgoal) -> None:
|
||||
"""Place the subgoal on the queue for all vassals."""
|
||||
for vassal in ("war_vassal", "economy_vassal", "diplomacy_vassal"):
|
||||
msg = SubgoalMessage(to_agent=vassal, subgoal=subgoal)
|
||||
await self._subgoal_queue.put(msg)
|
||||
logger.debug(
|
||||
"Tick %d: subgoal %s → %s (priority=%.1f)",
|
||||
self._tick,
|
||||
subgoal.token,
|
||||
subgoal.target or "—",
|
||||
subgoal.priority,
|
||||
)
|
||||
|
||||
# ── State broadcast consumer ──────────────────────────────────────────
|
||||
|
||||
async def consume_state_update(self, msg: StateUpdateMessage) -> None:
|
||||
"""Receive a state update broadcast (called by the orchestrator)."""
|
||||
logger.debug("King received state update tick=%d", msg.tick)
|
||||
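The King's system prompt asks the model for a single JSON directive, and `_decide` falls back to RECRUIT when anything goes wrong. The following is a minimal sketch of how such a reply could be checked before it is turned into a subgoal. It assumes a plain dataclass stand-in, since the real `KingSubgoal` lives in `bannerlord.models` and is a Pydantic model. `Directive`, `parse_directive`, and the default priority of 1.0 are hypothetical names and values used only for illustration.

```python
from __future__ import annotations

import json
from dataclasses import dataclass

# Token vocabulary copied from the King's system prompt.
VALID_TOKENS = frozenset({
    "EXPAND_TERRITORY", "RAID_ECONOMY", "FORTIFY", "RECRUIT", "TRADE",
    "ALLY", "SPY", "HEAL", "CONSOLIDATE", "TRAIN",
})


@dataclass
class Directive:
    """Hypothetical stand-in for KingSubgoal (not the real Pydantic model)."""

    token: str
    target: str | None = None
    priority: float = 1.0  # assumed default, not taken from the source
    context: str = ""


def parse_directive(raw: str) -> Directive:
    """Parse the model's JSON reply; fall back to RECRUIT on any problem."""
    fallback = Directive(token="RECRUIT", context="invalid reply, safe default")
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(data, dict):
        return fallback
    token = data.get("token")
    if not isinstance(token, str) or token not in VALID_TOKENS:
        return fallback
    raw_priority = data.get("priority")
    try:
        priority = 1.0 if raw_priority is None else float(raw_priority)
    except (TypeError, ValueError):
        return fallback
    # Clamp priority into the 0.0-2.0 range the prompt asks for.
    priority = min(2.0, max(0.0, priority))
    return Directive(
        token=token,
        target=data.get("target"),
        priority=priority,
        context=str(data.get("context") or ""),
    )
```

Rejecting unknown tokens up front keeps a malformed reply from reaching the vassals, which silently ignore tokens they do not recognize.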
@@ -1,296 +0,0 @@
|
"""Vassal agents — War, Economy, and Diplomacy.

Vassals are mid-tier agents responsible for a domain of the kingdom.
Each vassal:
  - Listens to the King's subgoal queue
  - Computes its domain reward at each tick
  - Issues TaskMessages to companion workers
  - Reports ResultMessages back up to the King

Model: Qwen3:14b (balanced capability vs. latency).
Frequency: up to 4× per campaign day.

Refs: #1097, #1099.
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
from typing import Any
|
||||
|
||||
from bannerlord.gabs_client import GABSClient, GABSUnavailable
|
||||
from bannerlord.models import (
|
||||
DiplomacyReward,
|
||||
EconomyReward,
|
||||
KingSubgoal,
|
||||
ResultMessage,
|
||||
SubgoalMessage,
|
||||
TaskMessage,
|
||||
WarReward,
|
||||
)
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Tokens each vassal responds to (all others are ignored)
|
||||
_WAR_TOKENS = {"EXPAND_TERRITORY", "RAID_ECONOMY", "TRAIN"}
|
||||
_ECON_TOKENS = {"FORTIFY", "CONSOLIDATE"}
|
||||
_DIPLO_TOKENS = {"ALLY"}
|
||||
_LOGISTICS_TOKENS = {"RECRUIT", "HEAL"}
|
||||
_TRADE_TOKENS = {"TRADE"}
|
||||
_SCOUT_TOKENS = {"SPY"}
|
||||
|
||||
|
||||
class BaseVassal:
|
||||
"""Shared vassal lifecycle — subscribes to subgoal queue, runs tick loop."""
|
||||
|
||||
name: str = "base_vassal"
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
gabs_client: GABSClient,
|
||||
subgoal_queue: asyncio.Queue[SubgoalMessage],
|
||||
result_queue: asyncio.Queue[ResultMessage] | None = None,
|
||||
task_queue: asyncio.Queue[TaskMessage] | None = None,
|
||||
) -> None:
|
||||
self._gabs = gabs_client
|
||||
self._subgoal_queue = subgoal_queue
|
||||
self._result_queue = result_queue or asyncio.Queue()
|
||||
self._task_queue = task_queue or asyncio.Queue()
|
||||
self._active_subgoal: KingSubgoal | None = None
|
||||
self._running = False
|
||||
|
||||
@property
|
||||
def task_queue(self) -> asyncio.Queue[TaskMessage]:
|
||||
return self._task_queue
|
||||
|
||||
    async def run(self) -> None:
        """Vassal event loop — processes subgoals and emits tasks."""
        self._running = True
        logger.info("%s started", self.name)
        try:
            while self._running:
                # Drain all pending subgoals, keeping the latest one addressed
                # to this vassal. Messages addressed to other agents are
                # consumed and discarded here, so vassals that share a single
                # queue can drop each other's subgoals.
                try:
                    while True:
                        msg = self._subgoal_queue.get_nowait()
                        if msg.to_agent == self.name:
                            self._active_subgoal = msg.subgoal
                            logger.debug("%s received subgoal %s", self.name, msg.subgoal.token)
                except asyncio.QueueEmpty:
                    pass

                if self._active_subgoal is not None:
                    await self._tick(self._active_subgoal)

                await asyncio.sleep(0.25)  # yield to event loop
        except asyncio.CancelledError:
            logger.info("%s cancelled", self.name)
            raise
        finally:
            self._running = False
|
||||
|
||||
def stop(self) -> None:
|
||||
self._running = False
|
||||
|
||||
async def _tick(self, subgoal: KingSubgoal) -> None:
|
||||
raise NotImplementedError
|
||||
|
||||
async def _get_state(self) -> dict[str, Any]:
|
||||
try:
|
||||
return await self._gabs.get_state() or {}
|
||||
except GABSUnavailable:
|
||||
return {}
|
||||
|
||||
|
||||
# ── War Vassal ────────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class WarVassal(BaseVassal):
|
    """Military operations — sieges, field battles, raids, defensive maneuvers.

    Reward function:
        R = 0.40*ΔTerritoryValue + 0.25*ΔArmyStrengthRatio
          - 0.20*CasualtyCost - 0.10*SupplyCost + 0.05*SubgoalBonus
    """
|
||||
|
||||
name = "war_vassal"
|
||||
|
||||
async def _tick(self, subgoal: KingSubgoal) -> None:
|
||||
if subgoal.token not in _WAR_TOKENS | _LOGISTICS_TOKENS:
|
||||
return
|
||||
|
||||
state = await self._get_state()
|
||||
reward = self._compute_reward(state, subgoal)
|
||||
|
||||
task = self._plan_action(state, subgoal)
|
||||
if task:
|
||||
await self._task_queue.put(task)
|
||||
|
||||
logger.debug(
|
||||
"%s tick: subgoal=%s reward=%.3f action=%s",
|
||||
self.name,
|
||||
subgoal.token,
|
||||
reward.total,
|
||||
task.primitive if task else "none",
|
||||
)
|
||||
|
||||
def _compute_reward(self, state: dict[str, Any], subgoal: KingSubgoal) -> WarReward:
|
||||
bonus = subgoal.priority * 0.05 if subgoal.token in _WAR_TOKENS else 0.0
|
||||
return WarReward(
|
||||
territory_delta=float(state.get("territory_delta", 0.0)),
|
||||
army_strength_ratio=float(state.get("army_strength_ratio", 1.0)),
|
||||
casualty_cost=float(state.get("casualty_cost", 0.0)),
|
||||
supply_cost=float(state.get("supply_cost", 0.0)),
|
||||
subgoal_bonus=bonus,
|
||||
)
|
||||
|
||||
def _plan_action(self, state: dict[str, Any], subgoal: KingSubgoal) -> TaskMessage | None:
|
||||
if subgoal.token == "EXPAND_TERRITORY" and subgoal.target: # noqa: S105
|
||||
return TaskMessage(
|
||||
from_agent=self.name,
|
||||
to_agent="logistics_companion",
|
||||
primitive="move_party",
|
||||
args={"destination": subgoal.target},
|
||||
priority=subgoal.priority,
|
||||
)
|
||||
if subgoal.token == "RECRUIT": # noqa: S105
|
||||
qty = subgoal.quantity or 20
|
||||
return TaskMessage(
|
||||
from_agent=self.name,
|
||||
to_agent="logistics_companion",
|
||||
primitive="recruit_troop",
|
||||
args={"troop_type": "infantry", "quantity": qty},
|
||||
priority=subgoal.priority,
|
||||
)
|
||||
if subgoal.token == "TRAIN": # noqa: S105
|
||||
return TaskMessage(
|
||||
from_agent=self.name,
|
||||
to_agent="logistics_companion",
|
||||
primitive="upgrade_troops",
|
||||
args={},
|
||||
priority=subgoal.priority,
|
||||
)
|
||||
return None
|
||||
|
||||
|
||||
# ── Economy Vassal ────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class EconomyVassal(BaseVassal):
|
    """Settlement management, tax collection, construction, food supply.

    Reward function:
        R = 0.35*DailyDenarsIncome + 0.25*FoodStockBuffer + 0.20*LoyaltyAverage
          - 0.15*ConstructionQueueLength + 0.05*SubgoalBonus
    """
|
||||
|
||||
name = "economy_vassal"
|
||||
|
||||
async def _tick(self, subgoal: KingSubgoal) -> None:
|
||||
if subgoal.token not in _ECON_TOKENS | _TRADE_TOKENS:
|
||||
return
|
||||
|
||||
state = await self._get_state()
|
||||
reward = self._compute_reward(state, subgoal)
|
||||
|
||||
task = self._plan_action(state, subgoal)
|
||||
if task:
|
||||
await self._task_queue.put(task)
|
||||
|
||||
logger.debug(
|
||||
"%s tick: subgoal=%s reward=%.3f",
|
||||
self.name,
|
||||
subgoal.token,
|
||||
reward.total,
|
||||
)
|
||||
|
||||
def _compute_reward(self, state: dict[str, Any], subgoal: KingSubgoal) -> EconomyReward:
|
||||
bonus = subgoal.priority * 0.05 if subgoal.token in _ECON_TOKENS else 0.0
|
||||
return EconomyReward(
|
||||
daily_denars_income=float(state.get("daily_income", 0.0)),
|
||||
food_stock_buffer=float(state.get("food_days_remaining", 0.0)),
|
||||
loyalty_average=float(state.get("avg_loyalty", 50.0)),
|
||||
construction_queue_length=int(state.get("construction_queue", 0)),
|
||||
subgoal_bonus=bonus,
|
||||
)
|
||||
|
||||
def _plan_action(self, state: dict[str, Any], subgoal: KingSubgoal) -> TaskMessage | None:
|
||||
if subgoal.token == "FORTIFY" and subgoal.target: # noqa: S105
|
||||
return TaskMessage(
|
||||
from_agent=self.name,
|
||||
to_agent="logistics_companion",
|
||||
primitive="build_project",
|
||||
args={"settlement": subgoal.target},
|
||||
priority=subgoal.priority,
|
||||
)
|
||||
if subgoal.token == "TRADE": # noqa: S105
|
||||
return TaskMessage(
|
||||
from_agent=self.name,
|
||||
to_agent="caravan_companion",
|
||||
primitive="assess_prices",
|
||||
args={"town": subgoal.target or "nearest"},
|
||||
priority=subgoal.priority,
|
||||
)
|
||||
return None
|
||||
|
||||
|
||||
# ── Diplomacy Vassal ──────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class DiplomacyVassal(BaseVassal):
|
    """Relations management — alliances, peace deals, tribute, marriage.

    Reward function:
        R = 0.30*AlliesCount + 0.25*TruceDurationValue + 0.25*RelationsScoreWeighted
          - 0.15*ActiveWarsFront + 0.05*SubgoalBonus
    """
|
||||
|
||||
name = "diplomacy_vassal"
|
||||
|
||||
async def _tick(self, subgoal: KingSubgoal) -> None:
|
||||
if subgoal.token not in _DIPLO_TOKENS | _SCOUT_TOKENS:
|
||||
return
|
||||
|
||||
state = await self._get_state()
|
||||
reward = self._compute_reward(state, subgoal)
|
||||
|
||||
task = self._plan_action(state, subgoal)
|
||||
if task:
|
||||
await self._task_queue.put(task)
|
||||
|
||||
logger.debug(
|
||||
"%s tick: subgoal=%s reward=%.3f",
|
||||
self.name,
|
||||
subgoal.token,
|
||||
reward.total,
|
||||
)
|
||||
|
||||
def _compute_reward(self, state: dict[str, Any], subgoal: KingSubgoal) -> DiplomacyReward:
|
||||
bonus = subgoal.priority * 0.05 if subgoal.token in _DIPLO_TOKENS else 0.0
|
||||
return DiplomacyReward(
|
||||
allies_count=int(state.get("allies_count", 0)),
|
||||
truce_duration_value=float(state.get("truce_value", 0.0)),
|
||||
relations_score_weighted=float(state.get("relations_weighted", 0.0)),
|
||||
active_wars_front=int(state.get("active_wars", 0)),
|
||||
subgoal_bonus=bonus,
|
||||
)
|
||||
|
||||
def _plan_action(self, state: dict[str, Any], subgoal: KingSubgoal) -> TaskMessage | None:
|
||||
if subgoal.token == "ALLY" and subgoal.target: # noqa: S105
|
||||
return TaskMessage(
|
||||
from_agent=self.name,
|
||||
to_agent="scout_companion",
|
||||
primitive="track_lord",
|
||||
args={"name": subgoal.target},
|
||||
priority=subgoal.priority,
|
||||
)
|
||||
if subgoal.token == "SPY" and subgoal.target: # noqa: S105
|
||||
return TaskMessage(
|
||||
from_agent=self.name,
|
||||
to_agent="scout_companion",
|
||||
primitive="assess_garrison",
|
||||
args={"settlement": subgoal.target},
|
||||
priority=subgoal.priority,
|
||||
)
|
||||
return None
|
||||
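The vassal docstrings each state a weighted-sum reward, but the `WarReward.total` property that evaluates it is defined in `bannerlord.models` and is not visible in this part of the diff. As a sketch only, the War Vassal formula could be evaluated as below. `WarRewardSketch` is a hypothetical stand-in, with field names and defaults taken from `WarVassal._compute_reward` and weights taken from the docstring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WarRewardSketch:
    """Hypothetical stand-in for WarReward; weights come from the docstring."""

    territory_delta: float = 0.0
    army_strength_ratio: float = 1.0
    casualty_cost: float = 0.0
    supply_cost: float = 0.0
    subgoal_bonus: float = 0.0

    @property
    def total(self) -> float:
        # R = 0.40*dTerritory + 0.25*dArmyRatio - 0.20*Casualty
        #     - 0.10*Supply + 0.05*SubgoalBonus
        return (
            0.40 * self.territory_delta
            + 0.25 * self.army_strength_ratio
            - 0.20 * self.casualty_cost
            - 0.10 * self.supply_cost
            + 0.05 * self.subgoal_bonus
        )


reward = WarRewardSketch(
    territory_delta=1.0,
    army_strength_ratio=1.0,
    casualty_cost=0.5,
    supply_cost=0.5,
    subgoal_bonus=1.0,
)
print(f"{reward.total:.2f}")
```

One point to check against the real model: `_compute_reward` already multiplies the King's priority by 0.05 before passing it as `subgoal_bonus`. If `total` applies the 0.05 weight again, as this sketch does, the bonus is scaled twice.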
@@ -1,198 +0,0 @@
|
"""GABS TCP/JSON-RPC client.

Connects to the Bannerlord.GABS C# mod server running on a Windows VM.
Protocol: newline-delimited JSON-RPC 2.0 over raw TCP.

Default host: localhost, port: 4825 (configurable via settings.bannerlord_gabs_host
and settings.bannerlord_gabs_port).

Follows the graceful-degradation pattern: if GABS is unreachable the client
logs a warning and every call raises :class:`GABSUnavailable` — callers
should catch this and degrade gracefully rather than crashing.

Refs: #1091, #1097.
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import logging
|
||||
from typing import Any
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
_DEFAULT_HOST = "localhost"
|
||||
_DEFAULT_PORT = 4825
|
||||
_DEFAULT_TIMEOUT = 10.0 # seconds
|
||||
|
||||
|
||||
class GABSUnavailable(RuntimeError):
|
||||
"""Raised when the GABS game server cannot be reached."""
|
||||
|
||||
|
||||
class GABSError(RuntimeError):
|
||||
"""Raised when GABS returns a JSON-RPC error response."""
|
||||
|
||||
def __init__(self, code: int, message: str) -> None:
|
||||
super().__init__(f"GABS error {code}: {message}")
|
||||
self.code = code
|
||||
|
||||
|
||||
class GABSClient:
|
    """Async TCP JSON-RPC client for Bannerlord.GABS.

    Intended for use as an async context manager::

        async with GABSClient() as client:
            state = await client.get_state()

    Can also be constructed standalone — call :meth:`connect` and
    :meth:`close` manually.
    """
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
host: str = _DEFAULT_HOST,
|
||||
port: int = _DEFAULT_PORT,
|
||||
timeout: float = _DEFAULT_TIMEOUT,
|
||||
) -> None:
|
||||
self._host = host
|
||||
self._port = port
|
||||
self._timeout = timeout
|
||||
self._reader: asyncio.StreamReader | None = None
|
||||
self._writer: asyncio.StreamWriter | None = None
|
||||
self._seq = 0
|
||||
self._connected = False
|
||||
|
||||
# ── Lifecycle ─────────────────────────────────────────────────────────
|
||||
|
||||
async def connect(self) -> None:
|
||||
"""Open the TCP connection to GABS.
|
||||
|
||||
Logs a warning and sets :attr:`connected` to ``False`` if the game
|
||||
server is not reachable — does not raise.
|
||||
"""
|
||||
try:
|
||||
self._reader, self._writer = await asyncio.wait_for(
|
||||
asyncio.open_connection(self._host, self._port),
|
||||
timeout=self._timeout,
|
||||
)
|
||||
self._connected = True
|
||||
logger.info("GABS connected at %s:%s", self._host, self._port)
|
||||
except (TimeoutError, OSError) as exc:
|
||||
logger.warning(
|
||||
"GABS unavailable at %s:%s — Bannerlord agent will degrade: %s",
|
||||
self._host,
|
||||
self._port,
|
||||
exc,
|
||||
)
|
||||
self._connected = False
|
||||
|
||||
async def close(self) -> None:
|
||||
if self._writer is not None:
|
||||
try:
|
||||
self._writer.close()
|
||||
await self._writer.wait_closed()
|
||||
except Exception: # noqa: BLE001
|
||||
pass
|
||||
self._connected = False
|
||||
logger.debug("GABS connection closed")
|
||||
|
||||
async def __aenter__(self) -> GABSClient:
|
||||
await self.connect()
|
||||
return self
|
||||
|
||||
async def __aexit__(self, *_: Any) -> None:
|
||||
await self.close()
|
||||
|
||||
@property
|
||||
def connected(self) -> bool:
|
||||
return self._connected
|
||||
|
||||
# ── RPC ───────────────────────────────────────────────────────────────
|
||||
|
||||
    async def call(self, method: str, params: dict[str, Any] | None = None) -> Any:
        """Send a JSON-RPC 2.0 request and return the ``result`` field.

        Raises:
            GABSUnavailable: if the client is not connected or the connection drops.
            GABSError: if the server returns a JSON-RPC error.
        """
        if not self._connected or self._reader is None or self._writer is None:
            raise GABSUnavailable(
                f"GABS not connected (host={self._host}, port={self._port}). "
                "Is the Bannerlord VM running?"
            )

        self._seq += 1
        request = {
            "jsonrpc": "2.0",
            "id": self._seq,
            "method": method,
            "params": params or {},
        }
        payload = json.dumps(request) + "\n"

        try:
            self._writer.write(payload.encode())
            await asyncio.wait_for(self._writer.drain(), timeout=self._timeout)

            raw = await asyncio.wait_for(self._reader.readline(), timeout=self._timeout)
        except (TimeoutError, OSError) as exc:
            self._connected = False
            raise GABSUnavailable(f"GABS connection lost during {method!r}: {exc}") from exc

        # readline() returns b"" when the server closes the connection (EOF);
        # report that as a lost connection instead of letting json.loads fail.
        if not raw:
            self._connected = False
            raise GABSUnavailable(f"GABS closed the connection during {method!r}")

        response = json.loads(raw)

        if "error" in response and response["error"] is not None:
            err = response["error"]
            raise GABSError(err.get("code", -1), err.get("message", "unknown"))

        return response.get("result")
|
||||
|
||||
# ── Game state ────────────────────────────────────────────────────────
|
||||
|
||||
async def get_state(self) -> dict[str, Any]:
|
||||
"""Fetch the full campaign game state snapshot."""
|
||||
return await self.call("game.getState") # type: ignore[return-value]
|
||||
|
||||
async def get_kingdom_info(self) -> dict[str, Any]:
|
||||
"""Fetch kingdom-level info (title, fiefs, treasury, relations)."""
|
||||
return await self.call("kingdom.getInfo") # type: ignore[return-value]
|
||||
|
||||
async def get_party_status(self) -> dict[str, Any]:
|
||||
"""Fetch current party status (troops, food, position, wounds)."""
|
||||
return await self.call("party.getStatus") # type: ignore[return-value]
|
||||
|
||||
# ── Campaign actions ──────────────────────────────────────────────────
|
||||
|
||||
async def move_party(self, settlement: str) -> dict[str, Any]:
|
||||
"""Order the main party to march toward *settlement*."""
|
||||
return await self.call("party.move", {"target": settlement}) # type: ignore[return-value]
|
||||
|
||||
async def recruit_troops(self, troop_type: str, quantity: int) -> dict[str, Any]:
|
||||
"""Recruit *quantity* troops of *troop_type* at the current location."""
|
||||
return await self.call( # type: ignore[return-value]
|
||||
"party.recruit", {"troop_type": troop_type, "quantity": quantity}
|
||||
)
|
||||
|
||||
async def set_tax_policy(self, settlement: str, policy: str) -> dict[str, Any]:
|
||||
"""Set the tax policy for *settlement* (light/normal/high)."""
|
||||
return await self.call( # type: ignore[return-value]
|
||||
"settlement.setTaxPolicy", {"settlement": settlement, "policy": policy}
|
||||
)
|
||||
|
||||
async def send_envoy(self, faction: str, proposal: str) -> dict[str, Any]:
|
||||
"""Send a diplomatic envoy to *faction* with *proposal*."""
|
||||
return await self.call( # type: ignore[return-value]
|
||||
"diplomacy.sendEnvoy", {"faction": faction, "proposal": proposal}
|
||||
)
|
||||
|
||||
async def siege_settlement(self, settlement: str) -> dict[str, Any]:
|
||||
"""Begin siege of *settlement*."""
|
||||
return await self.call("battle.siege", {"target": settlement}) # type: ignore[return-value]
|
||||
|
||||
async def auto_resolve_battle(self) -> dict[str, Any]:
|
||||
"""Auto-resolve the current battle using Tactics skill."""
|
||||
return await self.call("battle.autoResolve") # type: ignore[return-value]
|
||||
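The client above speaks newline-delimited JSON-RPC 2.0: one JSON object per line in each direction. A minimal sketch of that framing, separated from the socket handling so it can be exercised without a running game server, might look like this. `encode_request`, `decode_response`, and `RpcError` are hypothetical names for illustration. The response-id check is an extra safeguard that the original `call` method does not perform.

```python
from __future__ import annotations

import json
from typing import Any


class RpcError(RuntimeError):
    """Hypothetical error type for this sketch (GABSError plays this role above)."""


def encode_request(seq: int, method: str, params: dict[str, Any] | None = None) -> bytes:
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    request = {"jsonrpc": "2.0", "id": seq, "method": method, "params": params or {}}
    # json.dumps escapes newlines inside strings, so the only raw "\n" is the delimiter.
    return (json.dumps(request) + "\n").encode()


def decode_response(line: bytes, expected_id: int) -> Any:
    """Parse one response line and return its ``result`` field."""
    if not line:
        # An empty read means the peer closed the connection.
        raise ConnectionError("peer closed the connection (empty read)")
    response = json.loads(line)
    if response.get("id") != expected_id:
        raise RpcError(f"response id {response.get('id')!r} != request id {expected_id}")
    error = response.get("error")
    if error is not None:
        raise RpcError(f"{error.get('code', -1)}: {error.get('message', 'unknown')}")
    return response.get("result")


wire = encode_request(7, "party.move", {"target": "Epicrotea"})
reply = b'{"jsonrpc": "2.0", "id": 7, "result": {"ok": true}}\n'
print(decode_response(reply, expected_id=7))
```

Keeping the framing in pure functions makes the empty-read and error-object cases easy to unit test, independent of `asyncio` streams.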
@@ -1,256 +0,0 @@
|
"""Asset ledger for the Bannerlord sovereign agent.

Tracks kingdom assets (denars, settlements, troop allocations) in an
in-memory dict backed by SQLite for persistence. Follows the existing
SQLite migration pattern in this repo.

The King has exclusive write access to treasury and settlement ownership.
Vassals receive an allocated budget and cannot exceed it without King
re-authorization. Companions hold only work-in-progress quotas.

Refs: #1097, #1099.
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import sqlite3
|
||||
from collections.abc import Iterator
|
||||
from contextlib import contextmanager
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
_DEFAULT_DB = Path.home() / ".timmy" / "bannerlord" / "ledger.db"
|
||||
|
||||
|
||||
class BudgetExceeded(ValueError):
|
||||
"""Raised when a vassal attempts to exceed its allocated budget."""
|
||||
|
||||
|
||||
class Ledger:
|
    """Sovereign asset ledger backed by SQLite.

    Tracks:
    - Kingdom treasury (denar balance)
    - Fief (settlement) ownership roster
    - Vassal denar budgets (delegated, revocable)
    - Campaign tick log (for long-horizon planning)

    Usage::

        ledger = Ledger()
        ledger.initialize()
        ledger.deposit(5000, "tax income — Epicrotea")
        ledger.allocate_budget("war_vassal", 2000)
    """
|
||||
|
||||
def __init__(self, db_path: Path = _DEFAULT_DB) -> None:
|
||||
self._db_path = db_path
|
||||
self._db_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# ── Setup ─────────────────────────────────────────────────────────────
|
||||
|
||||
def initialize(self) -> None:
|
||||
"""Create tables if they don't exist."""
|
||||
with self._conn() as conn:
|
||||
conn.executescript(
|
||||
"""
|
||||
CREATE TABLE IF NOT EXISTS treasury (
|
||||
id INTEGER PRIMARY KEY CHECK (id = 1),
|
||||
balance REAL NOT NULL DEFAULT 0
|
||||
);
|
||||
INSERT OR IGNORE INTO treasury (id, balance) VALUES (1, 0);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS fiefs (
|
||||
name TEXT PRIMARY KEY,
|
||||
fief_type TEXT NOT NULL, -- town / castle / village
|
||||
acquired_at TEXT NOT NULL
|
||||
);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS vassal_budgets (
|
||||
agent TEXT PRIMARY KEY,
|
||||
allocated REAL NOT NULL DEFAULT 0,
|
||||
spent REAL NOT NULL DEFAULT 0
|
||||
);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS tick_log (
|
||||
tick INTEGER PRIMARY KEY,
|
||||
campaign_day INTEGER NOT NULL,
|
||||
subgoal TEXT,
|
||||
reward_war REAL,
|
||||
reward_econ REAL,
|
||||
reward_diplo REAL,
|
||||
logged_at TEXT NOT NULL
|
||||
);
|
||||
"""
|
||||
)
|
||||
logger.debug("Ledger initialized at %s", self._db_path)
|
||||
|
||||
# ── Treasury ──────────────────────────────────────────────────────────
|
||||
|
||||
def balance(self) -> float:
|
||||
with self._conn() as conn:
|
||||
row = conn.execute("SELECT balance FROM treasury WHERE id = 1").fetchone()
|
||||
return float(row[0]) if row else 0.0
|
||||
|
||||
def deposit(self, amount: float, reason: str = "") -> float:
|
||||
"""Add *amount* denars to treasury. Returns new balance."""
|
||||
if amount < 0:
|
||||
raise ValueError("Use withdraw() for negative amounts")
|
||||
with self._conn() as conn:
|
||||
conn.execute("UPDATE treasury SET balance = balance + ? WHERE id = 1", (amount,))
|
||||
bal = self.balance()
|
||||
logger.info("Treasury +%.0f denars (%s) → balance %.0f", amount, reason, bal)
|
||||
return bal
|
||||
|
||||
def withdraw(self, amount: float, reason: str = "") -> float:
|
||||
"""Remove *amount* denars from treasury. Returns new balance."""
|
||||
if amount < 0:
|
||||
raise ValueError("Amount must be positive")
|
||||
bal = self.balance()
|
||||
if amount > bal:
|
||||
raise BudgetExceeded(
|
||||
f"Cannot withdraw {amount:.0f} denars — treasury balance is only {bal:.0f}"
|
||||
)
|
||||
with self._conn() as conn:
|
||||
conn.execute("UPDATE treasury SET balance = balance - ? WHERE id = 1", (amount,))
|
||||
new_bal = self.balance()
|
||||
logger.info("Treasury -%.0f denars (%s) → balance %.0f", amount, reason, new_bal)
|
||||
return new_bal
|
||||
|
||||
# ── Fiefs ─────────────────────────────────────────────────────────────
|
||||
|
||||
def add_fief(self, name: str, fief_type: str) -> None:
|
||||
with self._conn() as conn:
|
||||
conn.execute(
|
||||
"INSERT OR REPLACE INTO fiefs (name, fief_type, acquired_at) VALUES (?, ?, ?)",
|
||||
(name, fief_type, datetime.utcnow().isoformat()),
|
||||
)
|
||||
logger.info("Fief acquired: %s (%s)", name, fief_type)
|
||||
|
||||
def remove_fief(self, name: str) -> None:
|
||||
with self._conn() as conn:
|
||||
conn.execute("DELETE FROM fiefs WHERE name = ?", (name,))
|
||||
logger.info("Fief lost: %s", name)
|
||||
|
||||
def list_fiefs(self) -> list[dict[str, str]]:
|
||||
with self._conn() as conn:
|
||||
rows = conn.execute("SELECT name, fief_type, acquired_at FROM fiefs").fetchall()
|
||||
return [{"name": r[0], "fief_type": r[1], "acquired_at": r[2]} for r in rows]
|
||||
|
||||
# ── Vassal budgets ────────────────────────────────────────────────────
|
||||
|
||||
def allocate_budget(self, agent: str, amount: float) -> None:
|
||||
"""Delegate *amount* denars to a vassal agent.
|
||||
|
||||
Withdraws from treasury. Raises :class:`BudgetExceeded` if
|
||||
the treasury cannot cover the allocation.
|
||||
"""
|
||||
self.withdraw(amount, reason=f"budget → {agent}")
|
||||
with self._conn() as conn:
|
||||
conn.execute(
|
||||
"""
|
||||
INSERT INTO vassal_budgets (agent, allocated, spent)
|
||||
VALUES (?, ?, 0)
|
||||
ON CONFLICT(agent) DO UPDATE SET allocated = allocated + excluded.allocated
|
||||
""",
|
||||
(agent, amount),
|
||||
)
|
||||
logger.info("Allocated %.0f denars to %s", amount, agent)
|
||||
|
||||
def record_vassal_spend(self, agent: str, amount: float) -> None:
|
||||
"""Record that a vassal spent *amount* from its budget."""
|
||||
with self._conn() as conn:
|
||||
row = conn.execute(
|
||||
"SELECT allocated, spent FROM vassal_budgets WHERE agent = ?", (agent,)
|
||||
).fetchone()
|
||||
if row is None:
|
||||
raise BudgetExceeded(f"{agent} has no allocated budget")
|
||||
allocated, spent = row
|
||||
if spent + amount > allocated:
|
||||
raise BudgetExceeded(
|
||||
f"{agent} budget exhausted: {spent:.0f}/{allocated:.0f} spent, "
|
||||
f"requested {amount:.0f}"
|
||||
)
|
||||
with self._conn() as conn:
|
||||
conn.execute(
|
||||
"UPDATE vassal_budgets SET spent = spent + ? WHERE agent = ?",
|
||||
(amount, agent),
|
||||
)
|
||||
|
||||
def vassal_remaining(self, agent: str) -> float:
|
||||
with self._conn() as conn:
|
||||
row = conn.execute(
|
||||
"SELECT allocated - spent FROM vassal_budgets WHERE agent = ?", (agent,)
|
||||
).fetchone()
|
||||
return float(row[0]) if row else 0.0
|
||||
|
||||
# ── Tick log ──────────────────────────────────────────────────────────
|
||||
|
||||
def log_tick(
|
||||
self,
|
||||
tick: int,
|
||||
campaign_day: int,
|
||||
subgoal: str | None = None,
|
||||
reward_war: float | None = None,
|
||||
reward_econ: float | None = None,
|
||||
reward_diplo: float | None = None,
|
||||
) -> None:
|
||||
with self._conn() as conn:
|
||||
conn.execute(
|
||||
"""
|
||||
INSERT OR REPLACE INTO tick_log
|
||||
(tick, campaign_day, subgoal, reward_war, reward_econ, reward_diplo, logged_at)
|
||||
VALUES (?, ?, ?, ?, ?, ?, ?)
|
||||
""",
|
||||
(
|
||||
tick,
|
||||
campaign_day,
|
||||
subgoal,
|
||||
reward_war,
|
||||
reward_econ,
|
||||
reward_diplo,
|
||||
datetime.utcnow().isoformat(),
|
||||
),
|
||||
)
|
||||
|
||||
    def tick_history(self, last_n: int = 100) -> list[dict]:
        """Return the most recent *last_n* logged ticks, newest first."""
        with self._conn() as conn:
            rows = conn.execute(
                """
                SELECT tick, campaign_day, subgoal, reward_war, reward_econ, reward_diplo, logged_at
                FROM tick_log
                ORDER BY tick DESC
                LIMIT ?
                """,
                (last_n,),
            ).fetchall()
        return [
            {
                "tick": r[0],
                "campaign_day": r[1],
                "subgoal": r[2],
                "reward_war": r[3],
                "reward_econ": r[4],
                "reward_diplo": r[5],
                "logged_at": r[6],
            }
            for r in rows
        ]
|
||||
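The positional `r[0]` … `r[6]` mapping in `tick_history` has to stay in step with the column order of the `SELECT`. A minimal sketch of one alternative, assuming a cut-down `tick_log` table with only three of the columns: setting `sqlite3.Row` as the row factory lets each row be converted to a dict keyed by column name. The in-memory database and sample rows are illustrative only.

```python
import sqlite3

# Illustrative in-memory table with a subset of the tick_log columns.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows now expose column names via keys()
conn.execute(
    "CREATE TABLE tick_log (tick INTEGER PRIMARY KEY, campaign_day INTEGER, subgoal TEXT)"
)
conn.executemany(
    "INSERT INTO tick_log VALUES (?, ?, ?)",
    [(1, 10, "RECRUIT"), (2, 11, None), (3, 12, "TRADE")],
)

rows = conn.execute(
    "SELECT tick, campaign_day, subgoal FROM tick_log ORDER BY tick DESC LIMIT ?",
    (2,),
).fetchall()

# dict(row) uses the column names from the query, so no index bookkeeping is needed.
history = [dict(row) for row in rows]
conn.close()
```

With this approach, adding a column to the `SELECT` automatically adds the matching key to every dict.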
|
||||
# ── Internal ──────────────────────────────────────────────────────────
|
||||
|
||||
    @contextmanager
    def _conn(self) -> Iterator[sqlite3.Connection]:
        """Yield a connection that commits on success and rolls back on error."""
        conn = sqlite3.connect(self._db_path)
        conn.execute("PRAGMA journal_mode=WAL")
        try:
            yield conn
            conn.commit()
        except Exception:
            conn.rollback()
            raise
        finally:
            conn.close()
|
||||
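`record_vassal_spend` reads the budget and then writes the spend as two separate statements, so two concurrent writers could in principle both pass the check. A minimal sketch of one way to close that gap, assuming the same `vassal_budgets` columns (`agent`, `allocated`, `spent`): fold the budget check into the `UPDATE` itself and inspect `rowcount`. The `try_spend` helper and the in-memory database are hypothetical and not part of the ledger class.

```python
import sqlite3


def try_spend(conn: sqlite3.Connection, agent: str, amount: float) -> bool:
    """Spend *amount* only if it fits the remaining budget (hypothetical helper)."""
    cur = conn.execute(
        # The WHERE clause performs the budget check inside the same statement,
        # so the check and the write cannot be separated by another writer.
        "UPDATE vassal_budgets SET spent = spent + ? "
        "WHERE agent = ? AND spent + ? <= allocated",
        (amount, agent, amount),
    )
    conn.commit()
    # rowcount is 0 when the agent is unknown or the budget would be exceeded.
    return cur.rowcount == 1


# Illustrative in-memory table mirroring the columns used by the ledger.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vassal_budgets (agent TEXT PRIMARY KEY, allocated REAL, spent REAL)"
)
conn.execute("INSERT INTO vassal_budgets VALUES ('war_vassal', 100.0, 0.0)")

first = try_spend(conn, "war_vassal", 60.0)   # fits: 60 <= 100
second = try_spend(conn, "war_vassal", 60.0)  # rejected: 120 > 100
unknown = try_spend(conn, "scout", 1.0)       # rejected: no budget row
remaining = conn.execute(
    "SELECT allocated - spent FROM vassal_budgets WHERE agent = 'war_vassal'"
).fetchone()[0]
```

The trade-off is that a single boolean no longer distinguishes "no budget" from "budget exhausted"; a caller that needs the detailed message would have to run a follow-up query.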
@@ -1,191 +0,0 @@
|
||||
"""Bannerlord feudal hierarchy data models.
|
||||
|
||||
All inter-agent communication uses typed Pydantic models. No raw dicts
|
||||
cross agent boundaries — every message is validated at construction time.
|
||||
|
||||
Design: Ahilan & Dayan (2019) Feudal Multi-Agent Hierarchies.
|
||||
Refs: #1097, #1099.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from datetime import datetime
|
||||
from typing import Any, Literal
|
||||
|
||||
from pydantic import BaseModel, Field
|
||||
|
||||
# ── Subgoal vocabulary ────────────────────────────────────────────────────────
|
||||
|
||||
SUBGOAL_TOKENS = frozenset(
|
||||
{
|
||||
"EXPAND_TERRITORY", # Take or secure a fief — War Vassal
|
||||
"RAID_ECONOMY", # Raid enemy villages for denars — War Vassal
|
||||
"FORTIFY", # Upgrade or repair a settlement — Economy Vassal
|
||||
"RECRUIT", # Fill party to capacity — Logistics Companion
|
||||
"TRADE", # Execute profitable trade route — Caravan Companion
|
||||
"ALLY", # Pursue non-aggression / alliance — Diplomacy Vassal
|
||||
"SPY", # Gain information on target faction — Scout Companion
|
||||
"HEAL", # Rest party until wounds recovered — Logistics Companion
|
||||
"CONSOLIDATE", # Hold territory, no expansion — Economy Vassal
|
||||
"TRAIN", # Level troops via auto-resolve bandits — War Vassal
|
||||
}
|
||||
)
|
||||
|
||||
|
||||
# ── King subgoal ──────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class KingSubgoal(BaseModel):
    """Strategic directive issued by the King agent to vassals.

    The King operates on campaign-map timescale (days to weeks of in-game
    time). His sole output is one subgoal token plus optional parameters.
    He never micro-manages primitives.
    """

    token: str = Field(..., description="One of SUBGOAL_TOKENS")
    target: str | None = Field(None, description="Named target (settlement, lord, faction)")
    quantity: int | None = Field(None, description="For RECRUIT, TRADE tokens", ge=1)
    priority: float = Field(1.0, ge=0.0, le=2.0, description="Scales vassal reward weighting")
    deadline_days: int | None = Field(None, ge=1, description="Campaign-map days to complete")
    context: str | None = Field(None, description="Free-text hint; not parsed by workers")

    def model_post_init(self, __context: Any) -> None:  # noqa: ANN401
        if self.token not in SUBGOAL_TOKENS:
            raise ValueError(
                f"Unknown subgoal token {self.token!r}. Must be one of: {sorted(SUBGOAL_TOKENS)}"
            )
|
||||
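The construction-time check in `KingSubgoal` can be illustrated without Pydantic. The following is a stdlib-only stand-in, not the model itself: `SubgoalSketch`, `TOKENS` (an abbreviated subset of the vocabulary), and `is_valid` are hypothetical names, and a frozen dataclass plays the role of the validated model.

```python
from __future__ import annotations

from dataclasses import dataclass

# Abbreviated, illustrative subset of the subgoal vocabulary.
TOKENS = frozenset({"EXPAND_TERRITORY", "RECRUIT", "TRADE"})


@dataclass(frozen=True)
class SubgoalSketch:
    """Stand-in for a validated subgoal; rejects bad input at construction."""

    token: str
    target: str | None = None
    priority: float = 1.0

    def __post_init__(self) -> None:
        # Same idea as the model's post-init hook: fail fast on unknown tokens.
        if self.token not in TOKENS:
            raise ValueError(
                f"Unknown subgoal token {self.token!r}. Must be one of: {sorted(TOKENS)}"
            )
        if not 0.0 <= self.priority <= 2.0:
            raise ValueError("priority must be within [0.0, 2.0]")


def is_valid(token: str, priority: float = 1.0) -> bool:
    """Return True when a subgoal with these values can be constructed."""
    try:
        SubgoalSketch(token=token, priority=priority)
    except ValueError:
        return False
    return True
```

Because validation runs in the constructor, an invalid directive never exists as an object, which is what lets downstream agents skip re-checking the token.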
|
||||
|
||||
# ── Inter-agent messages ──────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class SubgoalMessage(BaseModel):
|
||||
"""King → Vassal direction."""
|
||||
|
||||
msg_type: Literal["subgoal"] = "subgoal"
|
||||
from_agent: Literal["king"] = "king"
|
||||
to_agent: str = Field(..., description="e.g. 'war_vassal', 'economy_vassal'")
|
||||
subgoal: KingSubgoal
|
||||
issued_at: datetime = Field(default_factory=datetime.utcnow)
|
||||
|
||||
|
||||
class TaskMessage(BaseModel):
|
||||
"""Vassal → Companion direction."""
|
||||
|
||||
msg_type: Literal["task"] = "task"
|
||||
from_agent: str = Field(..., description="e.g. 'war_vassal'")
|
||||
to_agent: str = Field(..., description="e.g. 'logistics_companion'")
|
||||
primitive: str = Field(..., description="One of the companion primitives")
|
||||
args: dict[str, Any] = Field(default_factory=dict)
|
||||
priority: float = Field(1.0, ge=0.0, le=2.0)
|
||||
issued_at: datetime = Field(default_factory=datetime.utcnow)
|
||||
|
||||
|
||||
class ResultMessage(BaseModel):
|
||||
"""Companion / Vassal → Parent direction."""
|
||||
|
||||
msg_type: Literal["result"] = "result"
|
||||
from_agent: str
|
||||
to_agent: str
|
||||
success: bool
|
||||
outcome: dict[str, Any] = Field(default_factory=dict, description="Primitive-specific result")
|
||||
reward_delta: float = Field(0.0, description="Computed reward contribution")
|
||||
completed_at: datetime = Field(default_factory=datetime.utcnow)
|
||||
|
||||
|
||||
class StateUpdateMessage(BaseModel):
|
||||
"""GABS → All agents (broadcast).
|
||||
|
||||
Sent every campaign tick. Agents consume at their own cadence.
|
||||
"""
|
||||
|
||||
msg_type: Literal["state"] = "state"
|
||||
game_state: dict[str, Any] = Field(..., description="Full GABS state snapshot")
|
||||
tick: int = Field(..., ge=0)
|
||||
timestamp: datetime = Field(default_factory=datetime.utcnow)
|
||||
|
||||
|
||||
# ── Reward snapshots ──────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class WarReward(BaseModel):
    """Computed reward for the War Vassal at a given tick."""

    territory_delta: float = 0.0
    army_strength_ratio: float = 1.0
    casualty_cost: float = 0.0
    supply_cost: float = 0.0
    subgoal_bonus: float = 0.0

    @property
    def total(self) -> float:
        w1, w2, w3, w4, w5 = 0.40, 0.25, 0.20, 0.10, 0.05
        return (
            w1 * self.territory_delta
            + w2 * self.army_strength_ratio
            - w3 * self.casualty_cost
            - w4 * self.supply_cost
            + w5 * self.subgoal_bonus
        )
|
||||
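A worked example may make the `WarReward.total` weighting concrete. The sketch below re-implements the same formula and weights on a plain dataclass (`WarRewardSketch` is a hypothetical stand-in, so it runs without Pydantic); the input values are invented for illustration.

```python
import math
from dataclasses import dataclass

# Weights copied from WarReward.total: territory, strength, casualties, supply, bonus.
WAR_WEIGHTS = (0.40, 0.25, 0.20, 0.10, 0.05)


@dataclass
class WarRewardSketch:
    territory_delta: float = 0.0
    army_strength_ratio: float = 1.0
    casualty_cost: float = 0.0
    supply_cost: float = 0.0
    subgoal_bonus: float = 0.0

    @property
    def total(self) -> float:
        w1, w2, w3, w4, w5 = WAR_WEIGHTS
        return (
            w1 * self.territory_delta
            + w2 * self.army_strength_ratio
            - w3 * self.casualty_cost
            - w4 * self.supply_cost
            + w5 * self.subgoal_bonus
        )


# With all defaults, only the strength ratio of 1.0 contributes: 0.25 * 1.0.
baseline = WarRewardSketch().total

# 0.40*1.0 + 0.25*1.2 - 0.20*0.5 - 0.10*0.2 + 0.05*1.0 = 0.63
sample = WarRewardSketch(
    territory_delta=1.0,
    army_strength_ratio=1.2,
    casualty_cost=0.5,
    supply_cost=0.2,
    subgoal_bonus=1.0,
).total

baseline_ok = math.isclose(baseline, 0.25)
sample_ok = math.isclose(sample, 0.63)
```

Note that the default reward is 0.25 rather than 0, because `army_strength_ratio` defaults to 1.0; comparisons between ticks are therefore more meaningful than the absolute value.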
|
||||
|
||||
class EconomyReward(BaseModel):
|
||||
"""Computed reward for the Economy Vassal at a given tick."""
|
||||
|
||||
daily_denars_income: float = 0.0
|
||||
food_stock_buffer: float = 0.0
|
||||
loyalty_average: float = 50.0
|
||||
construction_queue_length: int = 0
|
||||
subgoal_bonus: float = 0.0
|
||||
|
||||
@property
|
||||
def total(self) -> float:
|
||||
w1, w2, w3, w4, w5 = 0.35, 0.25, 0.20, 0.15, 0.05
|
||||
return (
|
||||
w1 * self.daily_denars_income
|
||||
+ w2 * self.food_stock_buffer
|
||||
+ w3 * self.loyalty_average
|
||||
- w4 * self.construction_queue_length
|
||||
+ w5 * self.subgoal_bonus
|
||||
)
|
||||
|
||||
|
||||
class DiplomacyReward(BaseModel):
|
||||
"""Computed reward for the Diplomacy Vassal at a given tick."""
|
||||
|
||||
allies_count: int = 0
|
||||
truce_duration_value: float = 0.0
|
||||
relations_score_weighted: float = 0.0
|
||||
active_wars_front: int = 0
|
||||
subgoal_bonus: float = 0.0
|
||||
|
||||
@property
|
||||
def total(self) -> float:
|
||||
w1, w2, w3, w4, w5 = 0.30, 0.25, 0.25, 0.15, 0.05
|
||||
return (
|
||||
w1 * self.allies_count
|
||||
+ w2 * self.truce_duration_value
|
||||
+ w3 * self.relations_score_weighted
|
||||
- w4 * self.active_wars_front
|
||||
+ w5 * self.subgoal_bonus
|
||||
)
|
||||
|
||||
|
||||
# ── Victory condition ─────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class VictoryCondition(BaseModel):
|
||||
"""Sovereign Victory (M5) — evaluated each campaign tick."""
|
||||
|
||||
holds_king_title: bool = False
|
||||
territory_control_pct: float = Field(
|
||||
0.0, ge=0.0, le=100.0, description="% of Calradia fiefs held"
|
||||
)
|
||||
majority_threshold: float = Field(
|
||||
51.0, ge=0.0, le=100.0, description="Required % for majority control"
|
||||
)
|
||||
|
||||
@property
|
||||
def achieved(self) -> bool:
|
||||
return self.holds_king_title and self.territory_control_pct >= self.majority_threshold
|
||||
@@ -42,13 +42,11 @@ from dashboard.routes.hermes import router as hermes_router
|
||||
from dashboard.routes.loop_qa import router as loop_qa_router
|
||||
from dashboard.routes.memory import router as memory_router
|
||||
from dashboard.routes.mobile import router as mobile_router
|
||||
from dashboard.routes.nexus import router as nexus_router
|
||||
from dashboard.routes.models import api_router as models_api_router
|
||||
from dashboard.routes.models import router as models_router
|
||||
from dashboard.routes.quests import router as quests_router
|
||||
from dashboard.routes.scorecards import router as scorecards_router
|
||||
from dashboard.routes.sovereignty_metrics import router as sovereignty_metrics_router
|
||||
from dashboard.routes.sovereignty_ws import router as sovereignty_ws_router
|
||||
from dashboard.routes.spark import router as spark_router
|
||||
from dashboard.routes.system import router as system_router
|
||||
from dashboard.routes.tasks import router as tasks_router
|
||||
@@ -653,7 +651,6 @@ app.include_router(tools_router)
|
||||
app.include_router(spark_router)
|
||||
app.include_router(discord_router)
|
||||
app.include_router(memory_router)
|
||||
app.include_router(nexus_router)
|
||||
app.include_router(grok_router)
|
||||
app.include_router(models_router)
|
||||
app.include_router(models_api_router)
|
||||
@@ -675,7 +672,6 @@ app.include_router(hermes_router)
|
||||
app.include_router(quests_router)
|
||||
app.include_router(scorecards_router)
|
||||
app.include_router(sovereignty_metrics_router)
|
||||
app.include_router(sovereignty_ws_router)
|
||||
|
||||
|
||||
@app.websocket("/ws")
|
||||
|
||||
@@ -8,8 +8,6 @@ from .database import Base # Assuming a shared Base in models/database.py
|
||||
|
||||
|
||||
class TaskState(StrEnum):
|
||||
"""Enumeration of possible task lifecycle states."""
|
||||
|
||||
LATER = "LATER"
|
||||
NEXT = "NEXT"
|
||||
NOW = "NOW"
|
||||
@@ -18,16 +16,12 @@ class TaskState(StrEnum):
|
||||
|
||||
|
||||
class TaskCertainty(StrEnum):
|
||||
"""Enumeration of task time-certainty levels."""
|
||||
|
||||
FUZZY = "FUZZY" # An intention without a time
|
||||
SOFT = "SOFT" # A flexible task with a time
|
||||
HARD = "HARD" # A fixed meeting/appointment
|
||||
|
||||
|
||||
class Task(Base):
|
||||
"""SQLAlchemy model representing a CALM task."""
|
||||
|
||||
__tablename__ = "tasks"
|
||||
|
||||
id = Column(Integer, primary_key=True, index=True)
|
||||
@@ -58,8 +52,6 @@ class Task(Base):
|
||||
|
||||
|
||||
class JournalEntry(Base):
|
||||
"""SQLAlchemy model for a daily journal entry with MITs and reflections."""
|
||||
|
||||
__tablename__ = "journal_entries"
|
||||
|
||||
id = Column(Integer, primary_key=True, index=True)
|
||||
|
||||
@@ -14,8 +14,6 @@ router = APIRouter(prefix="/discord", tags=["discord"])
|
||||
|
||||
|
||||
class TokenPayload(BaseModel):
|
||||
"""Request payload containing a Discord bot token."""
|
||||
|
||||
token: str
|
||||
|
||||
|
||||
|
||||
@@ -1,168 +0,0 @@
|
||||
"""Nexus — Timmy's persistent conversational awareness space.
|
||||
|
||||
A conversational-only interface where Timmy maintains live memory context.
|
||||
No tool use; pure conversation with memory integration and a teaching panel.
|
||||
|
||||
Routes:
|
||||
GET /nexus — render nexus page with live memory sidebar
|
||||
POST /nexus/chat — send a message; returns HTMX partial
|
||||
POST /nexus/teach — inject a fact into Timmy's live memory
|
||||
DELETE /nexus/history — clear the nexus conversation history
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
from datetime import datetime, timezone
|
||||
|
||||
from fastapi import APIRouter, Form, Request
|
||||
from fastapi.responses import HTMLResponse
|
||||
|
||||
from dashboard.templating import templates
|
||||
from timmy.memory_system import (
|
||||
get_memory_stats,
|
||||
recall_personal_facts_with_ids,
|
||||
search_memories,
|
||||
store_personal_fact,
|
||||
)
|
||||
from timmy.session import _clean_response, chat, reset_session
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
router = APIRouter(prefix="/nexus", tags=["nexus"])
|
||||
|
||||
_NEXUS_SESSION_ID = "nexus"
|
||||
_MAX_MESSAGE_LENGTH = 10_000
|
||||
|
||||
# In-memory conversation log for the Nexus session (mirrors chat store pattern
|
||||
# but is scoped to the Nexus so it won't pollute the main dashboard history).
|
||||
_nexus_log: list[dict] = []
|
||||
|
||||
|
||||
def _ts() -> str:
    """Return the current UTC time as HH:MM:SS."""
    return datetime.now(timezone.utc).strftime("%H:%M:%S")


def _append_log(role: str, content: str) -> None:
    """Append one entry to the in-memory Nexus log, trimming old entries."""
    _nexus_log.append({"role": role, "content": content, "timestamp": _ts()})
    # Keep only the last 200 log entries to bound memory usage
    if len(_nexus_log) > 200:
        del _nexus_log[:-200]
|
||||
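The append-then-trim pattern in `_append_log` can also be expressed with a bounded `collections.deque`. This is a sketch of that alternative rather than the module's code; `MAX_ENTRIES`, `log`, and `append_log` are hypothetical names.

```python
from collections import deque

MAX_ENTRIES = 200

# A deque with maxlen silently discards the oldest entry once it is full.
log: deque = deque(maxlen=MAX_ENTRIES)


def append_log(role: str, content: str) -> None:
    """Append one entry; trimming is handled by the deque itself."""
    log.append({"role": role, "content": content})


for i in range(250):
    append_log("user", f"message {i}")

oldest = log[0]["content"]
newest = log[-1]["content"]
```

A template that expects a list can still receive `list(log)`, as the page handler already does with its own log.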
|
||||
|
||||
@router.get("", response_class=HTMLResponse)
|
||||
async def nexus_page(request: Request):
|
||||
"""Render the Nexus page with live memory context."""
|
||||
stats = get_memory_stats()
|
||||
facts = recall_personal_facts_with_ids()[:8]
|
||||
|
||||
return templates.TemplateResponse(
|
||||
request,
|
||||
"nexus.html",
|
||||
{
|
||||
"page_title": "Nexus",
|
||||
"messages": list(_nexus_log),
|
||||
"stats": stats,
|
||||
"facts": facts,
|
||||
},
|
||||
)
|
||||
|
||||
|
||||
@router.post("/chat", response_class=HTMLResponse)
|
||||
async def nexus_chat(request: Request, message: str = Form(...)):
|
||||
"""Conversational-only chat routed through the Nexus session.
|
||||
|
||||
Does not invoke tool-use approval flow — pure conversation with memory
|
||||
context injected from Timmy's live memory store.
|
||||
"""
|
||||
message = message.strip()
|
||||
if not message:
|
||||
return HTMLResponse("")
|
||||
if len(message) > _MAX_MESSAGE_LENGTH:
|
||||
return templates.TemplateResponse(
|
||||
request,
|
||||
"partials/nexus_message.html",
|
||||
{
|
||||
"user_message": message[:80] + "…",
|
||||
"response": None,
|
||||
"error": "Message too long (max 10 000 chars).",
|
||||
"timestamp": _ts(),
|
||||
"memory_hits": [],
|
||||
},
|
||||
)
|
||||
|
||||
ts = _ts()
|
||||
|
||||
# Fetch semantically relevant memories to surface in the sidebar
|
||||
try:
|
||||
memory_hits = await asyncio.to_thread(
|
||||
search_memories, query=message, limit=4
|
||||
)
|
||||
except Exception as exc:
|
||||
logger.warning("Nexus memory search failed: %s", exc)
|
||||
memory_hits = []
|
||||
|
||||
# Conversational response — no tool approval flow
|
||||
response_text: str | None = None
|
||||
error_text: str | None = None
|
||||
try:
|
||||
raw = await chat(message, session_id=_NEXUS_SESSION_ID)
|
||||
response_text = _clean_response(raw)
|
||||
except Exception as exc:
|
||||
logger.error("Nexus chat error: %s", exc)
|
||||
error_text = "Timmy is unavailable right now. Check that Ollama is running."
|
||||
|
||||
_append_log("user", message)
|
||||
if response_text:
|
||||
_append_log("assistant", response_text)
|
||||
|
||||
return templates.TemplateResponse(
|
||||
request,
|
||||
"partials/nexus_message.html",
|
||||
{
|
||||
"user_message": message,
|
||||
"response": response_text,
|
||||
"error": error_text,
|
||||
"timestamp": ts,
|
||||
"memory_hits": memory_hits,
|
||||
},
|
||||
)
|
||||
|
||||
|
||||
@router.post("/teach", response_class=HTMLResponse)
|
||||
async def nexus_teach(request: Request, fact: str = Form(...)):
|
||||
"""Inject a fact into Timmy's live memory from the Nexus teaching panel."""
|
||||
fact = fact.strip()
|
||||
if not fact:
|
||||
return HTMLResponse("")
|
||||
|
||||
try:
|
||||
await asyncio.to_thread(store_personal_fact, fact)
|
||||
facts = await asyncio.to_thread(recall_personal_facts_with_ids)
|
||||
facts = facts[:8]
|
||||
except Exception as exc:
|
||||
logger.error("Nexus teach error: %s", exc)
|
||||
facts = []
|
||||
|
||||
return templates.TemplateResponse(
|
||||
request,
|
||||
"partials/nexus_facts.html",
|
||||
{"facts": facts, "taught": fact},
|
||||
)
|
||||
|
||||
|
||||
@router.delete("/history", response_class=HTMLResponse)
|
||||
async def nexus_clear_history(request: Request):
|
||||
"""Clear the Nexus conversation history."""
|
||||
_nexus_log.clear()
|
||||
reset_session(session_id=_NEXUS_SESSION_ID)
|
||||
return templates.TemplateResponse(
|
||||
request,
|
||||
"partials/nexus_message.html",
|
||||
{
|
||||
"user_message": None,
|
||||
"response": "Nexus conversation cleared.",
|
||||
"error": None,
|
||||
"timestamp": _ts(),
|
||||
"memory_hits": [],
|
||||
},
|
||||
)
|
||||
@@ -10,7 +10,6 @@ from fastapi.responses import HTMLResponse, JSONResponse
|
||||
|
||||
from dashboard.services.scorecard_service import (
|
||||
PeriodType,
|
||||
ScorecardSummary,
|
||||
generate_all_scorecards,
|
||||
generate_scorecard,
|
||||
get_tracked_agents,
|
||||
@@ -27,216 +26,6 @@ def _format_period_label(period_type: PeriodType) -> str:
|
||||
return "Daily" if period_type == PeriodType.daily else "Weekly"
|
||||
|
||||
|
||||
def _parse_period(period: str) -> PeriodType:
|
||||
"""Parse period string into PeriodType, defaulting to daily on invalid input.
|
||||
|
||||
Args:
|
||||
period: The period string ('daily' or 'weekly')
|
||||
|
||||
Returns:
|
||||
PeriodType.daily or PeriodType.weekly
|
||||
"""
|
||||
try:
|
||||
return PeriodType(period.lower())
|
||||
except ValueError:
|
||||
return PeriodType.daily
|
||||
|
||||
|
||||
def _format_token_display(token_net: int) -> str:
|
||||
"""Format token net value with +/- prefix for display.
|
||||
|
||||
Args:
|
||||
token_net: The net token value
|
||||
|
||||
Returns:
|
||||
Formatted string with + prefix for positive values
|
||||
"""
|
||||
return f"{'+' if token_net > 0 else ''}{token_net}"
|
||||
|
||||
|
||||
def _format_token_class(token_net: int) -> str:
|
||||
"""Get CSS class for token net value based on sign.
|
||||
|
||||
Args:
|
||||
token_net: The net token value
|
||||
|
||||
Returns:
|
||||
'text-success' for positive/zero, 'text-danger' for negative
|
||||
"""
|
||||
return "text-success" if token_net >= 0 else "text-danger"
|
||||
|
||||
|
||||
def _build_patterns_html(patterns: list[str]) -> str:
|
||||
"""Build HTML for patterns section if patterns exist.
|
||||
|
||||
Args:
|
||||
patterns: List of pattern strings
|
||||
|
||||
Returns:
|
||||
HTML string for patterns section or empty string
|
||||
"""
|
||||
if not patterns:
|
||||
return ""
|
||||
|
||||
patterns_list = "".join([f"<li>{p}</li>" for p in patterns])
|
||||
return f"""
|
||||
<div class="mt-3">
|
||||
<h6>Patterns</h6>
|
||||
<ul class="list-unstyled text-info">
|
||||
{patterns_list}
|
||||
</ul>
|
||||
</div>
|
||||
"""
|
||||
|
||||
|
||||
def _build_narrative_html(bullets: list[str]) -> str:
|
||||
"""Build HTML for narrative bullets.
|
||||
|
||||
Args:
|
||||
bullets: List of narrative bullet strings
|
||||
|
||||
Returns:
|
||||
HTML string with list items
|
||||
"""
|
||||
return "".join([f"<li>{b}</li>" for b in bullets])
|
||||
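Both list builders above interpolate their strings into HTML unchanged, so a pattern or bullet containing markup would be rendered as markup. A minimal sketch of escaping each item first with the standard library's `html.escape`; `build_list_items` is a hypothetical helper and the sample strings are invented.

```python
from html import escape


def build_list_items(items: list[str]) -> str:
    """Render each item as an <li>, escaping HTML-significant characters."""
    return "".join(f"<li>{escape(item)}</li>" for item in items)


rendered = build_list_items(["3 PRs merged", "<script>alert(1)</script>"])
empty = build_list_items([])
```

Whether escaping is needed in practice depends on where the scorecard strings come from; if they can include text from issue titles or commit messages, escaping is the safer default.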
|
||||
|
||||
def _build_metrics_row_html(metrics: dict) -> str:
|
||||
"""Build HTML for the metrics summary row.
|
||||
|
||||
Args:
|
||||
metrics: Dictionary with PRs, issues, tests, and token metrics
|
||||
|
||||
Returns:
|
||||
HTML string for the metrics row
|
||||
"""
|
||||
prs_opened = metrics["prs_opened"]
|
||||
prs_merged = metrics["prs_merged"]
|
||||
pr_merge_rate = int(metrics["pr_merge_rate"] * 100)
|
||||
issues_touched = metrics["issues_touched"]
|
||||
tests_affected = metrics["tests_affected"]
|
||||
token_net = metrics["token_net"]
|
||||
|
||||
token_class = _format_token_class(token_net)
|
||||
token_display = _format_token_display(token_net)
|
||||
|
||||
return f"""
|
||||
<div class="row text-center small">
|
||||
<div class="col">
|
||||
<div class="text-muted">PRs</div>
|
||||
<div class="fw-bold">{prs_opened}/{prs_merged}</div>
|
||||
<div class="text-muted" style="font-size: 0.75rem;">
|
||||
{pr_merge_rate}% merged
|
||||
</div>
|
||||
</div>
|
||||
<div class="col">
|
||||
<div class="text-muted">Issues</div>
|
||||
<div class="fw-bold">{issues_touched}</div>
|
||||
</div>
|
||||
<div class="col">
|
||||
<div class="text-muted">Tests</div>
|
||||
<div class="fw-bold">{tests_affected}</div>
|
||||
</div>
|
||||
<div class="col">
|
||||
<div class="text-muted">Tokens</div>
|
||||
<div class="fw-bold {token_class}">{token_display}</div>
|
||||
</div>
|
||||
</div>
|
||||
"""
|
||||
|
||||
|
||||
def _render_scorecard_panel(
|
||||
agent_id: str,
|
||||
period_type: PeriodType,
|
||||
data: dict,
|
||||
) -> str:
|
||||
"""Render HTML for a single scorecard panel.
|
||||
|
||||
Args:
|
||||
agent_id: The agent ID
|
||||
period_type: Daily or weekly period
|
||||
data: Scorecard data dictionary with metrics, patterns, narrative_bullets
|
||||
|
||||
Returns:
|
||||
HTML string for the scorecard panel
|
||||
"""
|
||||
patterns_html = _build_patterns_html(data.get("patterns", []))
|
||||
bullets_html = _build_narrative_html(data.get("narrative_bullets", []))
|
||||
metrics_row = _build_metrics_row_html(data["metrics"])
|
||||
|
||||
return f"""
|
||||
<div class="card mc-panel">
|
||||
<div class="card-header d-flex justify-content-between align-items-center">
|
||||
<h5 class="card-title mb-0">{agent_id.title()}</h5>
|
||||
<span class="badge bg-secondary">{_format_period_label(period_type)}</span>
|
||||
</div>
|
||||
<div class="card-body">
|
||||
<ul class="list-unstyled mb-3">
|
||||
{bullets_html}
|
||||
</ul>
|
||||
{metrics_row}
|
||||
{patterns_html}
|
||||
</div>
|
||||
</div>
|
||||
"""
|
||||
|
||||
|
||||
def _render_empty_scorecard(agent_id: str) -> str:
|
||||
"""Render HTML for an empty scorecard (no activity).
|
||||
|
||||
Args:
|
||||
agent_id: The agent ID
|
||||
|
||||
Returns:
|
||||
HTML string for the empty scorecard panel
|
||||
"""
|
||||
return f"""
|
||||
<div class="card mc-panel">
|
||||
<h5 class="card-title">{agent_id.title()}</h5>
|
||||
<p class="text-muted">No activity recorded for this period.</p>
|
||||
</div>
|
||||
"""
|
||||
|
||||
|
||||
def _render_error_scorecard(agent_id: str, error: str) -> str:
|
||||
"""Render HTML for a scorecard that failed to load.
|
||||
|
||||
Args:
|
||||
agent_id: The agent ID
|
||||
error: Error message string
|
||||
|
||||
Returns:
|
||||
HTML string for the error scorecard panel
|
||||
"""
|
||||
return f"""
|
||||
<div class="card mc-panel border-danger">
|
||||
<h5 class="card-title">{agent_id.title()}</h5>
|
||||
<p class="text-danger">Error loading scorecard: {error}</p>
|
||||
</div>
|
||||
"""
|
||||
|
||||
|
||||
def _render_single_panel_wrapper(
|
||||
agent_id: str,
|
||||
period_type: PeriodType,
|
||||
scorecard: ScorecardSummary | None,
|
||||
) -> str:
|
||||
"""Render a complete scorecard panel with wrapper div for single panel view.
|
||||
|
||||
Args:
|
||||
agent_id: The agent ID
|
||||
period_type: Daily or weekly period
|
||||
scorecard: ScorecardSummary object or None
|
||||
|
||||
Returns:
|
||||
HTML string for the complete panel
|
||||
"""
|
||||
if scorecard is None:
|
||||
return _render_empty_scorecard(agent_id)
|
||||
|
||||
return _render_scorecard_panel(agent_id, period_type, scorecard.to_dict())
|
||||
|
||||
|
||||
@router.get("/api/agents")
|
||||
async def list_tracked_agents() -> dict[str, list[str]]:
|
||||
"""Return the list of tracked agent IDs.
|
||||
@@ -360,50 +149,99 @@ async def agent_scorecard_panel(
|
||||
Returns:
|
||||
HTML panel with scorecard content
|
||||
"""
|
||||
period_type = _parse_period(period)
|
||||
try:
|
||||
period_type = PeriodType(period.lower())
|
||||
except ValueError:
|
||||
period_type = PeriodType.daily
|
||||
|
||||
try:
|
||||
scorecard = generate_scorecard(agent_id, period_type)
|
||||
html_content = _render_single_panel_wrapper(agent_id, period_type, scorecard)
|
||||
|
||||
if scorecard is None:
|
||||
return HTMLResponse(
|
||||
content=f"""
|
||||
<div class="card mc-panel">
|
||||
<h5 class="card-title">{agent_id.title()}</h5>
|
||||
<p class="text-muted">No activity recorded for this period.</p>
|
||||
</div>
|
||||
""",
|
||||
status_code=200,
|
||||
)
|
||||
|
||||
data = scorecard.to_dict()
|
||||
|
||||
# Build patterns HTML
|
||||
patterns_html = ""
|
||||
if data["patterns"]:
|
||||
patterns_list = "".join([f"<li>{p}</li>" for p in data["patterns"]])
|
||||
patterns_html = f"""
|
||||
<div class="mt-3">
|
||||
<h6>Patterns</h6>
|
||||
<ul class="list-unstyled text-info">
|
||||
{patterns_list}
|
||||
</ul>
|
||||
</div>
|
||||
"""
|
||||
|
||||
# Build bullets HTML
|
||||
bullets_html = "".join([f"<li>{b}</li>" for b in data["narrative_bullets"]])
|
||||
|
||||
# Build metrics summary
|
||||
metrics = data["metrics"]
|
||||
|
||||
html_content = f"""
|
||||
<div class="card mc-panel">
|
||||
<div class="card-header d-flex justify-content-between align-items-center">
|
||||
<h5 class="card-title mb-0">{agent_id.title()}</h5>
|
||||
<span class="badge bg-secondary">{_format_period_label(period_type)}</span>
|
||||
</div>
|
||||
<div class="card-body">
|
||||
<ul class="list-unstyled mb-3">
|
||||
{bullets_html}
|
||||
</ul>
|
||||
|
||||
<div class="row text-center small">
|
||||
<div class="col">
|
||||
<div class="text-muted">PRs</div>
|
||||
<div class="fw-bold">{metrics["prs_opened"]}/{metrics["prs_merged"]}</div>
|
||||
<div class="text-muted" style="font-size: 0.75rem;">
|
||||
{int(metrics["pr_merge_rate"] * 100)}% merged
|
||||
</div>
|
||||
</div>
|
||||
<div class="col">
|
||||
<div class="text-muted">Issues</div>
|
||||
<div class="fw-bold">{metrics["issues_touched"]}</div>
|
||||
</div>
|
||||
<div class="col">
|
||||
<div class="text-muted">Tests</div>
|
||||
<div class="fw-bold">{metrics["tests_affected"]}</div>
|
||||
</div>
|
||||
<div class="col">
|
||||
<div class="text-muted">Tokens</div>
|
||||
<div class="fw-bold {"text-success" if metrics["token_net"] >= 0 else "text-danger"}">
|
||||
{"+" if metrics["token_net"] > 0 else ""}{metrics["token_net"]}
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
{patterns_html}
|
||||
</div>
|
||||
</div>
|
||||
"""
|
||||
|
||||
return HTMLResponse(content=html_content)
|
||||
|
||||
except Exception as exc:
|
||||
logger.error("Failed to render scorecard panel for %s: %s", agent_id, exc)
|
||||
return HTMLResponse(content=_render_error_scorecard(agent_id, str(exc)))
|
||||
|
||||
|
||||
def _render_all_panels_grid(
|
||||
scorecards: list[ScorecardSummary],
|
||||
period_type: PeriodType,
|
||||
) -> str:
|
||||
"""Render all scorecard panels in a grid layout.
|
||||
|
||||
Args:
|
||||
scorecards: List of scorecard summaries
|
||||
period_type: Daily or weekly period
|
||||
|
||||
Returns:
|
||||
HTML string with all panels in a grid
|
||||
"""
|
||||
panels: list[str] = []
|
||||
for scorecard in scorecards:
|
||||
panel_html = _render_scorecard_panel(
|
||||
scorecard.agent_id,
|
||||
period_type,
|
||||
scorecard.to_dict(),
|
||||
return HTMLResponse(
|
||||
content=f"""
|
||||
<div class="card mc-panel border-danger">
|
||||
<h5 class="card-title">{agent_id.title()}</h5>
|
||||
<p class="text-danger">Error loading scorecard: {str(exc)}</p>
|
||||
</div>
|
||||
""",
|
||||
status_code=200,
|
||||
)
|
||||
# Wrap each panel in a grid column
|
||||
wrapped = f'<div class="col-md-6 col-lg-4 mb-3">{panel_html}</div>'
|
||||
panels.append(wrapped)
|
||||
|
||||
return f"""
|
||||
<div class="row">
|
||||
{"".join(panels)}
|
||||
</div>
|
||||
<div class="text-muted small mt-2">
|
||||
Generated: {datetime.now().strftime("%Y-%m-%d %H:%M:%S UTC")}
|
||||
</div>
|
||||
"""
|
||||
|
||||
|
||||
@router.get("/all/panels", response_class=HTMLResponse)
|
||||
@@ -420,15 +258,96 @@ async def all_scorecard_panels(
|
||||
Returns:
|
||||
HTML with all scorecard panels
|
||||
"""
|
||||
period_type = _parse_period(period)
|
||||
try:
|
||||
period_type = PeriodType(period.lower())
|
||||
except ValueError:
|
||||
period_type = PeriodType.daily
|
||||
|
||||
try:
|
||||
scorecards = generate_all_scorecards(period_type)
|
||||
html_content = _render_all_panels_grid(scorecards, period_type)
|
||||
|
||||
panels: list[str] = []
|
||||
for scorecard in scorecards:
|
||||
data = scorecard.to_dict()
|
||||
|
||||
# Build patterns HTML
|
||||
patterns_html = ""
|
||||
if data["patterns"]:
|
||||
patterns_list = "".join([f"<li>{p}</li>" for p in data["patterns"]])
|
||||
patterns_html = f"""
|
||||
<div class="mt-3">
|
||||
<h6>Patterns</h6>
|
||||
<ul class="list-unstyled text-info">
|
||||
{patterns_list}
|
||||
</ul>
|
||||
</div>
|
||||
"""
|
||||
|
||||
# Build bullets HTML
|
||||
bullets_html = "".join([f"<li>{b}</li>" for b in data["narrative_bullets"]])
|
||||
metrics = data["metrics"]
|
||||
|
||||
panel_html = f"""
|
||||
<div class="col-md-6 col-lg-4 mb-3">
|
||||
<div class="card mc-panel">
|
||||
<div class="card-header d-flex justify-content-between align-items-center">
|
||||
<h5 class="card-title mb-0">{scorecard.agent_id.title()}</h5>
|
||||
<span class="badge bg-secondary">{_format_period_label(period_type)}</span>
|
||||
</div>
|
||||
<div class="card-body">
|
||||
<ul class="list-unstyled mb-3">
|
||||
{bullets_html}
|
||||
</ul>
|
||||
|
||||
<div class="row text-center small">
|
||||
<div class="col">
|
||||
<div class="text-muted">PRs</div>
|
||||
<div class="fw-bold">{metrics["prs_opened"]}/{metrics["prs_merged"]}</div>
|
||||
<div class="text-muted" style="font-size: 0.75rem;">
|
||||
{int(metrics["pr_merge_rate"] * 100)}% merged
|
||||
</div>
|
||||
</div>
|
||||
<div class="col">
|
||||
<div class="text-muted">Issues</div>
|
||||
<div class="fw-bold">{metrics["issues_touched"]}</div>
|
||||
</div>
|
||||
<div class="col">
|
||||
<div class="text-muted">Tests</div>
|
||||
<div class="fw-bold">{metrics["tests_affected"]}</div>
|
||||
</div>
|
||||
<div class="col">
|
||||
<div class="text-muted">Tokens</div>
|
||||
<div class="fw-bold {"text-success" if metrics["token_net"] >= 0 else "text-danger"}">
|
||||
{"+" if metrics["token_net"] > 0 else ""}{metrics["token_net"]}
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
{patterns_html}
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
"""
|
||||
panels.append(panel_html)
|
||||
|
||||
html_content = f"""
|
||||
<div class="row">
|
||||
{"".join(panels)}
|
||||
</div>
|
||||
<div class="text-muted small mt-2">
|
||||
Generated: {datetime.now().strftime("%Y-%m-%d %H:%M:%S UTC")}
|
||||
</div>
|
||||
"""
|
||||
|
||||
return HTMLResponse(content=html_content)
|
||||
|
||||
except Exception as exc:
|
||||
logger.error("Failed to render all scorecard panels: %s", exc)
|
||||
return HTMLResponse(
|
||||
content=f'<div class="alert alert-danger">Error loading scorecards: {exc}</div>'
|
||||
content=f"""
|
||||
<div class="alert alert-danger">
|
||||
Error loading scorecards: {str(exc)}
|
||||
</div>
|
||||
""",
|
||||
status_code=200,
|
||||
)
|
||||
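The "Generated" footer above formats `datetime.now()`, which is the server's local time, and appends a literal `UTC` suffix. A minimal sketch of a label that is actually in UTC, assuming a timezone-aware clock is acceptable; `generated_label` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def generated_label(now: Optional[datetime] = None) -> str:
    """Format *now* (default: the current time) as a UTC timestamp label."""
    moment = now if now is not None else datetime.now(timezone.utc)
    # astimezone converts aware datetimes from any offset into UTC.
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")


utc_input = generated_label(datetime(2026, 3, 1, 12, 30, 0, tzinfo=timezone.utc))
offset_input = generated_label(
    datetime(2026, 3, 1, 14, 30, 0, tzinfo=timezone(timedelta(hours=2)))
)
```

Both calls describe the same instant, so both produce the same label.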
|
||||
@@ -1,40 +0,0 @@
|
||||
"""WebSocket emitter for the sovereignty metrics dashboard widget.
|
||||
|
||||
Streams real-time sovereignty snapshots to connected clients every
|
||||
*_PUSH_INTERVAL* seconds. The snapshot includes per-layer sovereignty
|
||||
percentages, API cost rate, and skill crystallisation count.
|
||||
|
||||
Refs: #954, #953
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import logging
|
||||
|
||||
from fastapi import APIRouter, WebSocket
|
||||
|
||||
router = APIRouter(tags=["sovereignty"])
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
_PUSH_INTERVAL = 5 # seconds between snapshot pushes
|
||||
|
||||
|
||||
@router.websocket("/ws/sovereignty")
async def sovereignty_ws(websocket: WebSocket) -> None:
    """Stream sovereignty metric snapshots to the dashboard widget."""
    from timmy.sovereignty.metrics import get_metrics_store

    await websocket.accept()
    logger.info("Sovereignty WS connected")

    store = get_metrics_store()
    try:
        # Send initial snapshot immediately
        await websocket.send_text(json.dumps(store.get_snapshot()))

        while True:
            await asyncio.sleep(_PUSH_INTERVAL)
            await websocket.send_text(json.dumps(store.get_snapshot()))
    except Exception:
        logger.debug("Sovereignty WS disconnected")
|
||||
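The handler above sends one snapshot immediately and then one per interval until a send fails. A minimal sketch of that loop in isolation, assuming stand-in objects: `FakeStore`, `FakeSocket`, and `push_snapshots` are hypothetical names used only so the pattern can run without FastAPI or the real metrics store.

```python
import asyncio
import json


class FakeStore:
    """Stand-in for the metrics store; returns a numbered snapshot."""

    def __init__(self) -> None:
        self.calls = 0

    def get_snapshot(self) -> dict:
        self.calls += 1
        return {"snapshot": self.calls}


class FakeSocket:
    """Stand-in for a WebSocket that disconnects after a fixed number of sends."""

    def __init__(self, max_sends: int) -> None:
        self.sent: list[str] = []
        self._max_sends = max_sends

    async def send_text(self, text: str) -> None:
        if len(self.sent) >= self._max_sends:
            raise ConnectionError("client went away")
        self.sent.append(text)


async def push_snapshots(socket, store, interval: float) -> None:
    """Send one snapshot immediately, then one per interval until a send fails."""
    try:
        await socket.send_text(json.dumps(store.get_snapshot()))
        while True:
            await asyncio.sleep(interval)
            await socket.send_text(json.dumps(store.get_snapshot()))
    except ConnectionError:
        # A failed send is treated as a disconnect and ends the loop quietly.
        pass


def demo() -> list[str]:
    """Run the loop against a socket that accepts three sends."""
    socket = FakeSocket(max_sends=3)
    asyncio.run(push_snapshots(socket, FakeStore(), interval=0.01))
    return socket.sent
```

Catching only `ConnectionError` here is a choice made for the sketch; the handler in the diff catches `Exception`, which also hides unrelated failures such as a snapshot that cannot be serialised.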
@@ -7,8 +7,6 @@ router = APIRouter(prefix="/telegram", tags=["telegram"])
|
||||
|
||||
|
||||
class TokenPayload(BaseModel):
|
||||
"""Request payload containing a Telegram bot token."""
|
||||
|
||||
token: str
|
||||
|
||||
|
||||
|
||||
@@ -51,8 +51,6 @@ def _get_db() -> Generator[sqlite3.Connection, None, None]:
|
||||
|
||||
|
||||
class _EnumLike:
|
||||
"""Lightweight enum-like wrapper for string values used in templates."""
|
||||
|
||||
def __init__(self, v: str):
|
||||
self.value = v
|
||||
|
||||
|
||||
@@ -23,8 +23,6 @@ TRACKED_AGENTS = frozenset({"hermes", "kimi", "manus", "claude", "gemini"})
|
||||
|
||||
|
||||
class PeriodType(StrEnum):
|
||||
"""Scorecard reporting period type."""
|
||||
|
||||
daily = "daily"
|
||||
weekly = "weekly"
|
||||
|
||||
|
||||
@@ -67,7 +67,6 @@
|
||||
<div class="mc-nav-dropdown">
|
||||
<button class="mc-test-link mc-dropdown-toggle" aria-expanded="false">INTEL ▾</button>
|
||||
<div class="mc-dropdown-menu">
|
||||
<a href="/nexus" class="mc-test-link">NEXUS</a>
|
||||
<a href="/spark/ui" class="mc-test-link">SPARK</a>
|
||||
<a href="/memory" class="mc-test-link">MEMORY</a>
|
||||
<a href="/marketplace/ui" class="mc-test-link">MARKET</a>
|
||||
|
||||
@@ -1,122 +0,0 @@
|
||||
{% extends "base.html" %}
|
||||
|
||||
{% block title %}Nexus{% endblock %}
|
||||
|
||||
{% block extra_styles %}{% endblock %}
|
||||
|
||||
{% block content %}
|
||||
<div class="container-fluid nexus-layout py-3">
|
||||
|
||||
<div class="nexus-header mb-3">
|
||||
<div class="nexus-title">// NEXUS</div>
|
||||
<div class="nexus-subtitle">
|
||||
Persistent conversational awareness — always present, always learning.
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="nexus-grid">
|
||||
|
||||
<!-- ── LEFT: Conversation ────────────────────────────────── -->
|
||||
<div class="nexus-chat-col">
|
||||
<div class="card mc-panel nexus-chat-panel">
|
||||
<div class="card-header mc-panel-header d-flex justify-content-between align-items-center">
|
||||
<span>// CONVERSATION</span>
|
||||
<button class="mc-btn mc-btn-sm"
|
||||
hx-delete="/nexus/history"
|
||||
hx-target="#nexus-chat-log"
|
||||
hx-swap="beforeend"
|
||||
hx-confirm="Clear nexus conversation?">
|
||||
CLEAR
|
||||
</button>
|
||||
</div>
|
||||
|
||||
<div class="card-body p-2" id="nexus-chat-log">
|
||||
{% for msg in messages %}
|
||||
<div class="chat-message {{ 'user' if msg.role == 'user' else 'agent' }}">
|
||||
<div class="msg-meta">
|
||||
{{ 'YOU' if msg.role == 'user' else 'TIMMY' }} // {{ msg.timestamp }}
|
||||
</div>
|
||||
<div class="msg-body {% if msg.role == 'assistant' %}timmy-md{% endif %}">
|
||||
{{ msg.content | e }}
|
||||
</div>
|
||||
</div>
|
||||
{% else %}
|
||||
<div class="nexus-empty-state">
|
||||
Nexus is ready. Start a conversation — memories will surface in real time.
|
||||
</div>
|
||||
{% endfor %}
|
||||
</div>
|
||||
|
||||
<div class="card-footer p-2">
|
||||
<form hx-post="/nexus/chat"
|
||||
hx-target="#nexus-chat-log"
|
||||
hx-swap="beforeend"
|
||||
hx-on::after-request="this.reset(); document.getElementById('nexus-chat-log').scrollTop = 999999;">
|
||||
<div class="d-flex gap-2">
|
||||
<input type="text"
|
||||
name="message"
|
||||
id="nexus-input"
|
||||
class="mc-search-input flex-grow-1"
|
||||
placeholder="Talk to Timmy..."
|
||||
autocomplete="off"
|
||||
required>
|
||||
<button type="submit" class="mc-btn mc-btn-primary">SEND</button>
|
||||
</div>
|
||||
</form>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- ── RIGHT: Memory sidebar ─────────────────────────────── -->
|
||||
<div class="nexus-sidebar-col">
|
||||
|
||||
<!-- Live memory context (updated with each response) -->
|
||||
<div class="card mc-panel nexus-memory-panel mb-3">
|
||||
<div class="card-header mc-panel-header">
|
||||
<span>// LIVE MEMORY</span>
|
||||
<span class="badge ms-2" style="background:var(--purple-dim); color:var(--purple);">
|
||||
{{ stats.total_entries }} stored
|
||||
</span>
|
||||
</div>
|
||||
<div class="card-body p-2">
|
||||
<div id="nexus-memory-panel" class="nexus-memory-hits">
|
||||
<div class="nexus-memory-label">Relevant memories appear here as you chat.</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- Teaching panel -->
|
||||
<div class="card mc-panel nexus-teach-panel">
|
||||
<div class="card-header mc-panel-header">// TEACH TIMMY</div>
|
||||
<div class="card-body p-2">
|
||||
<form hx-post="/nexus/teach"
|
||||
hx-target="#nexus-teach-response"
|
||||
hx-swap="innerHTML"
|
||||
hx-on::after-request="this.reset()">
|
||||
<div class="d-flex gap-2 mb-2">
|
||||
<input type="text"
|
||||
name="fact"
|
||||
class="mc-search-input flex-grow-1"
|
||||
placeholder="e.g. I prefer dark themes"
|
||||
required>
|
||||
<button type="submit" class="mc-btn mc-btn-primary">TEACH</button>
|
||||
</div>
|
||||
</form>
|
||||
<div id="nexus-teach-response"></div>
|
||||
|
||||
<div class="nexus-facts-header mt-3">// KNOWN FACTS</div>
|
||||
<ul class="nexus-facts-list" id="nexus-facts-list">
|
||||
{% for fact in facts %}
|
||||
<li class="nexus-fact-item">{{ fact.content | e }}</li>
|
||||
{% else %}
|
||||
<li class="nexus-fact-empty">No personal facts stored yet.</li>
|
||||
{% endfor %}
|
||||
</ul>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div><!-- /sidebar -->
|
||||
</div><!-- /nexus-grid -->
|
||||
|
||||
</div>
|
||||
{% endblock %}
|
||||
@@ -1,12 +0,0 @@
|
||||
{% if taught %}
|
||||
<div class="nexus-taught-confirm">
|
||||
✓ Taught: <em>{{ taught | e }}</em>
|
||||
</div>
|
||||
{% endif %}
|
||||
<ul class="nexus-facts-list" id="nexus-facts-list" hx-swap-oob="true">
|
||||
{% for fact in facts %}
|
||||
<li class="nexus-fact-item">{{ fact.content | e }}</li>
|
||||
{% else %}
|
||||
<li class="nexus-fact-empty">No facts stored yet.</li>
|
||||
{% endfor %}
|
||||
</ul>
|
||||
@@ -1,36 +0,0 @@
|
||||
{% if user_message %}
|
||||
<div class="chat-message user">
|
||||
<div class="msg-meta">YOU // {{ timestamp }}</div>
|
||||
<div class="msg-body">{{ user_message | e }}</div>
|
||||
</div>
|
||||
{% endif %}
|
||||
{% if response %}
|
||||
<div class="chat-message agent">
|
||||
<div class="msg-meta">TIMMY // {{ timestamp }}</div>
|
||||
<div class="msg-body timmy-md">{{ response | e }}</div>
|
||||
</div>
|
||||
<script>
|
||||
(function() {
|
||||
var el = document.currentScript.previousElementSibling.querySelector('.timmy-md');
|
||||
if (el && typeof marked !== 'undefined' && typeof DOMPurify !== 'undefined') {
|
||||
el.innerHTML = DOMPurify.sanitize(marked.parse(el.textContent));
|
||||
}
|
||||
})();
|
||||
</script>
|
||||
{% elif error %}
|
||||
<div class="chat-message error-msg">
|
||||
<div class="msg-meta">SYSTEM // {{ timestamp }}</div>
|
||||
<div class="msg-body">{{ error | e }}</div>
|
||||
</div>
|
||||
{% endif %}
|
||||
{% if memory_hits %}
|
||||
<div class="nexus-memory-hits" id="nexus-memory-panel" hx-swap-oob="true">
|
||||
<div class="nexus-memory-label">// LIVE MEMORY CONTEXT</div>
|
||||
{% for hit in memory_hits %}
|
||||
<div class="nexus-memory-hit">
|
||||
<span class="nexus-memory-type">{{ hit.memory_type }}</span>
|
||||
<span class="nexus-memory-content">{{ hit.content | e }}</span>
|
||||
</div>
|
||||
{% endfor %}
|
||||
</div>
|
||||
{% endif %}
|
||||
@@ -24,8 +24,6 @@ MAX_MESSAGES: int = 500
|
||||
|
||||
@dataclass
|
||||
class Message:
|
||||
"""A single chat message with role, content, timestamp, and source."""
|
||||
|
||||
role: str # "user" | "agent" | "error"
|
||||
content: str
|
||||
timestamp: str
|
||||
|
||||
@@ -71,53 +71,6 @@ class GitHand:
|
||||
return True
|
||||
return False
|
||||
|
||||
    async def _exec_subprocess(
        self,
        args: str,
        timeout: int,
    ) -> tuple[bytes, bytes, int]:
        """Run git as a subprocess, return (stdout, stderr, returncode).

        Raises TimeoutError if the process exceeds *timeout* seconds.
        """
        proc = await asyncio.create_subprocess_exec(
            "git",
            *args.split(),
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
            cwd=self._repo_dir,
        )
        try:
            stdout, stderr = await asyncio.wait_for(
                proc.communicate(),
                timeout=timeout,
            )
        except TimeoutError:
            proc.kill()
            await proc.wait()
            raise
        return stdout, stderr, proc.returncode or 0

    @staticmethod
    def _parse_output(
        command: str,
        stdout_bytes: bytes,
        stderr_bytes: bytes,
        returncode: int | None,
        latency_ms: float,
    ) -> GitResult:
        """Decode subprocess output into a GitResult."""
        exit_code = returncode or 0
        stdout = stdout_bytes.decode("utf-8", errors="replace").strip()
        stderr = stderr_bytes.decode("utf-8", errors="replace").strip()
        return GitResult(
            operation=command,
            success=exit_code == 0,
            output=stdout,
            error=stderr if exit_code != 0 else "",
            latency_ms=latency_ms,
        )
|
||||
|
||||
async def run(
|
||||
self,
|
||||
args: str,
|
||||
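The subprocess helper in this hunk splits its argument string with `str.split()`, so a quoted argument containing spaces (a commit message, for example) would be broken into several argv entries. A minimal sketch of the same timeout-guarded pattern using `shlex.split` instead. `run_with_timeout` is a hypothetical standalone function, and it runs an arbitrary command rather than `git` so that it does not depend on a repository being present.

```python
import asyncio
import shlex
import sys


async def run_with_timeout(command: str, timeout: float) -> tuple[bytes, bytes, int]:
    """Run *command* and return (stdout, stderr, returncode).

    Raises asyncio.TimeoutError if the process does not finish within
    *timeout* seconds.
    """
    # shlex.split keeps quoted arguments together, unlike str.split().
    proc = await asyncio.create_subprocess_exec(
        *shlex.split(command),
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        # Kill and reap the child so it does not linger after the timeout.
        proc.kill()
        await proc.wait()
        raise
    return stdout, stderr, proc.returncode or 0


def demo() -> bytes:
    """Run a short Python one-liner whose argument contains spaces."""
    command = f"{shlex.quote(sys.executable)} -c \"print('hello world')\""
    stdout, _stderr, _code = asyncio.run(run_with_timeout(command, timeout=10))
    return stdout.strip()
```

On Python 3.11 and later `asyncio.TimeoutError` is an alias of the built-in `TimeoutError`, so the `except TimeoutError` in the diff behaves the same way there.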
@@ -135,15 +88,14 @@ class GitHand:
|
||||
GitResult with output or error details.
|
||||
"""
|
||||
start = time.time()
|
||||
command = f"git {args}"
|
||||
|
||||
# Gate destructive operations
|
||||
if self._is_destructive(args) and not allow_destructive:
|
||||
return GitResult(
|
||||
operation=command,
|
||||
operation=f"git {args}",
|
||||
success=False,
|
||||
error=(
|
||||
f"Destructive operation blocked: '{command}'. "
|
||||
f"Destructive operation blocked: 'git {args}'. "
|
||||
"Set allow_destructive=True to override."
|
||||
),
|
||||
requires_confirmation=True,
|
||||
@@ -151,21 +103,46 @@ class GitHand:
|
||||
)
|
||||
|
||||
effective_timeout = timeout or self._timeout
|
||||
command = f"git {args}"
|
||||
|
||||
try:
|
||||
stdout_bytes, stderr_bytes, returncode = await self._exec_subprocess(
|
||||
args,
|
||||
effective_timeout,
|
||||
proc = await asyncio.create_subprocess_exec(
|
||||
"git",
|
||||
*args.split(),
|
||||
stdout=asyncio.subprocess.PIPE,
|
||||
stderr=asyncio.subprocess.PIPE,
|
||||
cwd=self._repo_dir,
|
||||
)
|
||||
except TimeoutError:
|
||||
|
||||
try:
|
||||
stdout_bytes, stderr_bytes = await asyncio.wait_for(
|
||||
proc.communicate(), timeout=effective_timeout
|
||||
)
|
||||
except TimeoutError:
|
||||
proc.kill()
|
||||
await proc.wait()
|
||||
latency = (time.time() - start) * 1000
|
||||
logger.warning("Git command timed out after %ds: %s", effective_timeout, command)
|
||||
return GitResult(
|
||||
operation=command,
|
||||
success=False,
|
||||
error=f"Command timed out after {effective_timeout}s",
|
||||
latency_ms=latency,
|
||||
)
|
||||
|
||||
latency = (time.time() - start) * 1000
|
||||
logger.warning("Git command timed out after %ds: %s", effective_timeout, command)
|
||||
exit_code = proc.returncode or 0
|
||||
stdout = stdout_bytes.decode("utf-8", errors="replace").strip()
|
||||
stderr = stderr_bytes.decode("utf-8", errors="replace").strip()
|
||||
|
||||
return GitResult(
|
||||
operation=command,
|
||||
success=False,
|
||||
error=f"Command timed out after {effective_timeout}s",
|
||||
success=exit_code == 0,
|
||||
output=stdout,
|
||||
error=stderr if exit_code != 0 else "",
|
||||
latency_ms=latency,
|
||||
)
|
||||
|
||||
except FileNotFoundError:
|
||||
latency = (time.time() - start) * 1000
|
||||
logger.warning("git binary not found")
|
||||
@@ -185,14 +162,6 @@ class GitHand:
|
||||
latency_ms=latency,
|
||||
)
|
||||
|
||||
return self._parse_output(
|
||||
command,
|
||||
stdout_bytes,
|
||||
stderr_bytes,
|
||||
returncode=returncode,
|
||||
latency_ms=(time.time() - start) * 1000,
|
||||
)
|
||||
|
||||
# ── Convenience wrappers ─────────────────────────────────────────────────
|
||||
|
||||
async def status(self) -> GitResult:
|
||||
|
||||
@@ -21,8 +21,6 @@ logger = logging.getLogger(__name__)
|
||||
|
||||
@dataclass
|
||||
class Notification:
|
||||
"""A push notification with title, message, category, and read status."""
|
||||
|
||||
id: int
|
||||
title: str
|
||||
message: str
|
||||
|
||||
@@ -242,64 +242,6 @@ def produce_agent_state(agent_id: str, presence: dict) -> dict:
|
||||
}
|
||||
|
||||
|
||||
def _get_agents_online() -> int:
|
||||
"""Return the count of agents with a non-offline status."""
|
||||
try:
|
||||
from timmy.agents.loader import list_agents
|
||||
|
||||
agents = list_agents()
|
||||
return sum(1 for a in agents if a.get("status", "") not in ("offline", ""))
|
||||
except Exception as exc:
|
||||
logger.debug("Failed to count agents: %s", exc)
|
||||
return 0
|
||||
|
||||
|
||||
def _get_visitors() -> int:
|
||||
"""Return the count of active WebSocket visitor clients."""
|
||||
try:
|
||||
from dashboard.routes.world import _ws_clients
|
||||
|
||||
return len(_ws_clients)
|
||||
except Exception as exc:
|
||||
logger.debug("Failed to count visitors: %s", exc)
|
||||
return 0
|
||||
|
||||
|
||||
def _get_uptime_seconds() -> int:
|
||||
"""Return seconds elapsed since application start."""
|
||||
try:
|
||||
from config import APP_START_TIME
|
||||
|
||||
return int((datetime.now(UTC) - APP_START_TIME).total_seconds())
|
||||
except Exception as exc:
|
||||
logger.debug("Failed to calculate uptime: %s", exc)
|
||||
return 0
|
||||
|
||||
|
||||
def _get_thinking_active() -> bool:
|
||||
"""Return True if the thinking engine is enabled and running."""
|
||||
try:
|
||||
from config import settings
|
||||
from timmy.thinking import thinking_engine
|
||||
|
||||
return settings.thinking_enabled and thinking_engine is not None
|
||||
except Exception as exc:
|
||||
logger.debug("Failed to check thinking status: %s", exc)
|
||||
return False
|
||||
|
||||
|
||||
def _get_memory_count() -> int:
|
||||
"""Return total entries in the vector memory store."""
|
||||
try:
|
||||
from timmy.memory_system import get_memory_stats
|
||||
|
||||
stats = get_memory_stats()
|
||||
return stats.get("total_entries", 0)
|
||||
except Exception as exc:
|
||||
logger.debug("Failed to count memories: %s", exc)
|
||||
return 0
|
||||
|
||||
|
||||
def produce_system_status() -> dict:
|
||||
"""Generate a system_status message for the Matrix.
|
||||
|
||||
@@ -328,14 +270,64 @@ def produce_system_status() -> dict:
|
||||
"ts": 1742529600,
|
||||
}
|
||||
"""
|
||||
# Count agents with status != offline
|
||||
agents_online = 0
|
||||
try:
|
||||
from timmy.agents.loader import list_agents
|
||||
|
||||
agents = list_agents()
|
||||
agents_online = sum(1 for a in agents if a.get("status", "") not in ("offline", ""))
|
||||
except Exception as exc:
|
||||
logger.debug("Failed to count agents: %s", exc)
|
||||
|
||||
# Count visitors from WebSocket clients
|
||||
visitors = 0
|
||||
try:
|
||||
from dashboard.routes.world import _ws_clients
|
||||
|
||||
visitors = len(_ws_clients)
|
||||
except Exception as exc:
|
||||
logger.debug("Failed to count visitors: %s", exc)
|
||||
|
||||
# Calculate uptime
|
||||
uptime_seconds = 0
|
||||
try:
|
||||
from datetime import UTC
|
||||
|
||||
from config import APP_START_TIME
|
||||
|
||||
uptime_seconds = int((datetime.now(UTC) - APP_START_TIME).total_seconds())
|
||||
except Exception as exc:
|
||||
logger.debug("Failed to calculate uptime: %s", exc)
|
||||
|
||||
# Check thinking engine status
|
||||
thinking_active = False
|
||||
try:
|
||||
from config import settings
|
||||
from timmy.thinking import thinking_engine
|
||||
|
||||
thinking_active = settings.thinking_enabled and thinking_engine is not None
|
||||
except Exception as exc:
|
||||
logger.debug("Failed to check thinking status: %s", exc)
|
||||
|
||||
# Count memories in vector store
|
||||
memory_count = 0
|
||||
try:
|
||||
from timmy.memory_system import get_memory_stats
|
||||
|
||||
stats = get_memory_stats()
|
||||
memory_count = stats.get("total_entries", 0)
|
||||
except Exception as exc:
|
||||
logger.debug("Failed to count memories: %s", exc)
|
||||
|
||||
return {
|
||||
"type": "system_status",
|
||||
"data": {
|
||||
"agents_online": _get_agents_online(),
|
||||
"visitors": _get_visitors(),
|
||||
"uptime_seconds": _get_uptime_seconds(),
|
||||
"thinking_active": _get_thinking_active(),
|
||||
"memory_count": _get_memory_count(),
|
||||
"agents_online": agents_online,
|
||||
"visitors": visitors,
|
||||
"uptime_seconds": uptime_seconds,
|
||||
"thinking_active": thinking_active,
|
||||
"memory_count": memory_count,
|
||||
},
|
||||
"ts": int(time.time()),
|
||||
}
|
||||
|
||||
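Both versions of `produce_system_status` in this hunk repeat one pattern five times: compute a value, and fall back to a default with a debug log if anything raises. A minimal sketch of that pattern as a single generic helper. `safe_metric` and `broken_counter` are hypothetical names and do not appear in either branch.

```python
import logging
from typing import Callable, TypeVar

logger = logging.getLogger(__name__)

T = TypeVar("T")


def safe_metric(name: str, producer: Callable[[], T], default: T) -> T:
    """Call *producer* and return its value, or *default* if it raises."""
    try:
        return producer()
    except Exception as exc:
        # Mirrors the diff: failures are logged at debug level, never raised.
        logger.debug("Failed to compute %s: %s", name, exc)
        return default


def broken_counter() -> int:
    """Simulate a metric source that is unavailable."""
    raise RuntimeError("backend unavailable")


status = {
    "agents_online": safe_metric("agents_online", lambda: 3, 0),
    "visitors": safe_metric("visitors", broken_counter, 0),
}
```

A helper like this keeps the status payload well-formed even when one source is down, at the cost of making a persistent failure visible only in debug logs.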
@@ -3,14 +3,6 @@
|
||||
from .api import router
|
||||
from .cascade import CascadeRouter, Provider, ProviderStatus, get_router
|
||||
from .history import HealthHistoryStore, get_history_store
|
||||
from .metabolic import (
|
||||
DEFAULT_TIER_MODELS,
|
||||
MetabolicRouter,
|
||||
ModelTier,
|
||||
build_prompt,
|
||||
classify_complexity,
|
||||
get_metabolic_router,
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
"CascadeRouter",
|
||||
@@ -20,11 +12,4 @@ __all__ = [
|
||||
"router",
|
||||
"HealthHistoryStore",
|
||||
"get_history_store",
|
||||
# Metabolic router
|
||||
"MetabolicRouter",
|
||||
"ModelTier",
|
||||
"DEFAULT_TIER_MODELS",
|
||||
"classify_complexity",
|
||||
"build_prompt",
|
||||
"get_metabolic_router",
|
||||
]
|
||||
|
||||
@@ -528,71 +528,6 @@ class CascadeRouter:
|
||||
|
||||
return True
|
||||
|
||||
def _filter_providers(self, cascade_tier: str | None) -> list["Provider"]:
|
||||
"""Return the provider list filtered by tier.
|
||||
|
||||
Raises:
|
||||
RuntimeError: If a tier is specified but no matching providers exist.
|
||||
"""
|
||||
if cascade_tier == "frontier_required":
|
||||
providers = [p for p in self.providers if p.type == "anthropic"]
|
||||
if not providers:
|
||||
raise RuntimeError("No Anthropic provider configured for 'frontier_required' tier.")
|
||||
return providers
|
||||
if cascade_tier:
|
||||
providers = [p for p in self.providers if p.tier == cascade_tier]
|
||||
if not providers:
|
||||
raise RuntimeError(f"No providers found for tier: {cascade_tier}")
|
||||
return providers
|
||||
return self.providers
|
||||
|
||||
async def _try_single_provider(
|
||||
self,
|
||||
provider: "Provider",
|
||||
messages: list[dict],
|
||||
model: str | None,
|
||||
temperature: float,
|
||||
max_tokens: int | None,
|
||||
content_type: ContentType,
|
||||
errors: list[str],
|
||||
) -> dict | None:
|
||||
"""Attempt one provider, returning a result dict on success or None on failure.
|
||||
|
||||
On failure the error string is appended to *errors* and the provider's
|
||||
failure metrics are updated so the caller can move on to the next provider.
|
||||
"""
|
||||
if not self._is_provider_available(provider):
|
||||
return None
|
||||
|
||||
# Metabolic protocol: skip cloud providers when quota is low
|
||||
if provider.type in ("anthropic", "openai", "grok"):
|
||||
if not self._quota_allows_cloud(provider):
|
||||
logger.info(
|
||||
"Metabolic protocol: skipping cloud provider %s (quota too low)",
|
||||
provider.name,
|
||||
)
|
||||
return None
|
||||
|
||||
selected_model, is_fallback_model = self._select_model(provider, model, content_type)
|
||||
|
||||
try:
|
||||
result = await self._attempt_with_retry(
|
||||
provider, messages, selected_model, temperature, max_tokens, content_type
|
||||
)
|
||||
except RuntimeError as exc:
|
||||
errors.append(str(exc))
|
||||
self._record_failure(provider)
|
||||
return None
|
||||
|
||||
self._record_success(provider, result.get("latency_ms", 0))
|
||||
return {
|
||||
"content": result["content"],
|
||||
"provider": provider.name,
|
||||
"model": result.get("model", selected_model or provider.get_default_model()),
|
||||
"latency_ms": result.get("latency_ms", 0),
|
||||
"is_fallback_model": is_fallback_model,
|
||||
}
|
||||
|
||||
async def complete(
|
||||
self,
|
||||
messages: list[dict],
|
||||
@@ -626,15 +561,55 @@ class CascadeRouter:
|
||||
if content_type != ContentType.TEXT:
|
||||
logger.debug("Detected %s content, selecting appropriate model", content_type.value)
|
||||
|
||||
errors: list[str] = []
|
||||
providers = self._filter_providers(cascade_tier)
|
||||
errors = []
|
||||
|
||||
providers = self.providers
|
||||
if cascade_tier == "frontier_required":
|
||||
providers = [p for p in self.providers if p.type == "anthropic"]
|
||||
if not providers:
|
||||
raise RuntimeError("No Anthropic provider configured for 'frontier_required' tier.")
|
||||
elif cascade_tier:
|
||||
providers = [p for p in self.providers if p.tier == cascade_tier]
|
||||
if not providers:
|
||||
raise RuntimeError(f"No providers found for tier: {cascade_tier}")
|
||||
|
||||
for provider in providers:
|
||||
result = await self._try_single_provider(
|
||||
provider, messages, model, temperature, max_tokens, content_type, errors
|
||||
)
|
||||
if result is not None:
|
||||
return result
|
||||
if not self._is_provider_available(provider):
|
||||
continue
|
||||
|
||||
# Metabolic protocol: skip cloud providers when quota is low
|
||||
if provider.type in ("anthropic", "openai", "grok"):
|
||||
if not self._quota_allows_cloud(provider):
|
||||
logger.info(
|
||||
"Metabolic protocol: skipping cloud provider %s (quota too low)",
|
||||
provider.name,
|
||||
)
|
||||
continue
|
||||
|
||||
selected_model, is_fallback_model = self._select_model(provider, model, content_type)
|
||||
|
||||
try:
|
||||
result = await self._attempt_with_retry(
|
||||
provider,
|
||||
messages,
|
||||
selected_model,
|
||||
temperature,
|
||||
max_tokens,
|
||||
content_type,
|
||||
)
|
||||
except RuntimeError as exc:
|
||||
errors.append(str(exc))
|
||||
self._record_failure(provider)
|
||||
continue
|
||||
|
||||
self._record_success(provider, result.get("latency_ms", 0))
|
||||
return {
|
||||
"content": result["content"],
|
||||
"provider": provider.name,
|
||||
"model": result.get("model", selected_model or provider.get_default_model()),
|
||||
"latency_ms": result.get("latency_ms", 0),
|
||||
"is_fallback_model": is_fallback_model,
|
||||
}
|
||||
|
||||
raise RuntimeError(f"All providers failed: {'; '.join(errors)}")
|
||||
|
||||
|
||||
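The `complete()` loop in this hunk tries each provider in order, records the error from each failure, and raises only when every provider has failed. A minimal sketch of that fallback pattern on its own, assuming each provider is just a name plus an async callable. The `Provider` alias, `complete_with_fallback`, `flaky`, and `local` are hypothetical and far simpler than the router's real provider objects.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical provider type: a name plus an async callable that returns text.
Provider = tuple[str, Callable[[str], Awaitable[str]]]


async def complete_with_fallback(providers: list[Provider], prompt: str) -> dict:
    """Try each provider in order and return the first successful result."""
    errors: list[str] = []
    for name, call in providers:
        try:
            content = await call(prompt)
        except RuntimeError as exc:
            # Record the failure and move on to the next provider.
            errors.append(f"{name}: {exc}")
            continue
        return {"content": content, "provider": name}
    raise RuntimeError(f"All providers failed: {'; '.join(errors)}")


async def flaky(prompt: str) -> str:
    """Simulate a cloud provider that always fails."""
    raise RuntimeError("rate limited")


async def local(prompt: str) -> str:
    """Simulate a local provider that always succeeds."""
    return prompt.upper()


result = asyncio.run(complete_with_fallback([("cloud", flaky), ("local", local)], "go north"))
```

Collecting the error strings rather than raising on the first failure is what lets the final exception report every provider that was attempted.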
@@ -1,424 +0,0 @@
|
||||
"""Three-tier metabolic LLM router.
|
||||
|
||||
Routes queries to the cheapest-sufficient model tier using MLX for all
|
||||
inference on Apple Silicon GPU:
|
||||
|
||||
T1 — Routine (Qwen3-8B Q6_K, ~45-55 tok/s): Simple navigation, basic choices.
|
||||
T2 — Medium (Qwen3-14B Q5_K_M, ~20-28 tok/s): Dialogue, inventory management.
|
||||
T3 — Complex (Qwen3-32B Q4_K_M, ~8-12 tok/s): Quest planning, stuck recovery.
|
||||
|
||||
Memory budget:
|
||||
- T1+T2 always loaded (~8.5 GB combined)
|
||||
- T3 loaded on demand (+20 GB) — game pauses during inference
|
||||
|
||||
Design notes:
|
||||
- 70% of game ticks never reach the LLM (handled upstream by behavior trees)
|
||||
- T3 pauses the game world before inference and unpauses after (graceful if no world)
|
||||
- All inference via vllm-mlx / Ollama — local-first, no cloud for game ticks
|
||||
|
||||
References:
|
||||
- Issue #966 — Three-Tier Metabolic LLM Router
|
||||
- Issue #1063 — Best Local Uncensored Agent Model for M3 Max 36GB
|
||||
- Issue #1075 — Claude Quota Monitor + Metabolic Protocol
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
from enum import StrEnum
|
||||
from typing import Any
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class ModelTier(StrEnum):
|
||||
"""Three metabolic model tiers ordered by cost and capability.
|
||||
|
||||
Tier selection is driven by classify_complexity(). The cheapest
|
||||
sufficient tier is always chosen — T1 handles routine tasks, T2
|
||||
handles dialogue and management, T3 handles planning and recovery.
|
||||
"""
|
||||
|
||||
T1_ROUTINE = "t1_routine" # Fast, cheap — Qwen3-8B, always loaded
|
||||
T2_MEDIUM = "t2_medium" # Balanced — Qwen3-14B, always loaded
|
||||
T3_COMPLEX = "t3_complex" # Deep — Qwen3-32B, loaded on demand, pauses game
|
||||
|
||||
|
||||
# ── Classification vocabulary ────────────────────────────────────────────────
|
||||
|
||||
# T1: single-action navigation and binary-choice words
|
||||
_T1_KEYWORDS = frozenset(
|
||||
{
|
||||
"go",
|
||||
"move",
|
||||
"walk",
|
||||
"run",
|
||||
"north",
|
||||
"south",
|
||||
"east",
|
||||
"west",
|
||||
"up",
|
||||
"down",
|
||||
"left",
|
||||
"right",
|
||||
"yes",
|
||||
"no",
|
||||
"ok",
|
||||
"okay",
|
||||
"open",
|
||||
"close",
|
||||
"take",
|
||||
"drop",
|
||||
"look",
|
||||
"pick",
|
||||
"use",
|
||||
"wait",
|
||||
"rest",
|
||||
"save",
|
||||
"attack",
|
||||
"flee",
|
||||
"jump",
|
||||
"crouch",
|
||||
}
|
||||
)
|
||||
|
||||
# T3: planning, optimisation, or recovery signals
|
||||
_T3_KEYWORDS = frozenset(
|
||||
{
|
||||
"plan",
|
||||
"strategy",
|
||||
"optimize",
|
||||
"optimise",
|
||||
"quest",
|
||||
"stuck",
|
||||
"recover",
|
||||
"multi-step",
|
||||
"long-term",
|
||||
"negotiate",
|
||||
"persuade",
|
||||
"faction",
|
||||
"reputation",
|
||||
"best",
|
||||
"optimal",
|
||||
"recommend",
|
||||
"analyze",
|
||||
"analyse",
|
||||
"evaluate",
|
||||
"decide",
|
||||
"complex",
|
||||
"how do i",
|
||||
"what should i do",
|
||||
"help me figure",
|
||||
"what is the best",
|
||||
}
|
||||
)
|
||||
|
||||
|
||||
def classify_complexity(task: str, state: dict) -> ModelTier:
    """Classify a task to the cheapest-sufficient model tier.

    Classification priority (highest wins):
    1. T3 — any T3 keyword, stuck indicator, or ``state["require_t3"] = True``
    2. T1 — short task with only T1 keywords and no active context
    3. T2 — everything else (safe default)

    Args:
        task: Natural-language task description or player input.
        state: Current game state dict. Recognised keys:
            ``stuck`` (bool), ``require_t3`` (bool),
            ``active_quests`` (list), ``dialogue_active`` (bool),
            ``combat_active`` (bool).

    Returns:
        ModelTier appropriate for the task.
    """
    task_lower = task.lower()
    words = set(task_lower.split())

    # ── T3 signals ──────────────────────────────────────────────────────────
    t3_keyword_hit = bool(words & _T3_KEYWORDS)
    # Check multi-word T3 phrases
    t3_phrase_hit = any(phrase in task_lower for phrase in _T3_KEYWORDS if " " in phrase)
    is_stuck = bool(state.get("stuck", False))
    explicit_t3 = bool(state.get("require_t3", False))

    if t3_keyword_hit or t3_phrase_hit or is_stuck or explicit_t3:
        logger.debug(
            "classify_complexity → T3 (keywords=%s stuck=%s explicit=%s)",
            t3_keyword_hit or t3_phrase_hit,
            is_stuck,
            explicit_t3,
        )
        return ModelTier.T3_COMPLEX

    # ── T1 signals ──────────────────────────────────────────────────────────
    t1_keyword_hit = bool(words & _T1_KEYWORDS)
    task_short = len(task.split()) <= 6
    no_active_context = (
        not state.get("active_quests")
        and not state.get("dialogue_active")
        and not state.get("combat_active")
    )

    if t1_keyword_hit and task_short and no_active_context:
        logger.debug("classify_complexity → T1 (keywords=%s short=%s)", t1_keyword_hit, task_short)
        return ModelTier.T1_ROUTINE

    # ── Default: T2 ─────────────────────────────────────────────────────────
    logger.debug("classify_complexity → T2 (default)")
    return ModelTier.T2_MEDIUM
|
||||
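The classifier above builds its word set with a plain whitespace split, so a keyword with punctuation attached (for example `stuck!`) does not match the vocabulary. A minimal sketch of a punctuation-tolerant tokeniser that could sit in front of the same set intersection. `tokenize` and the reduced `T3_WORDS` set are hypothetical and are not part of the removed module.

```python
import re

# Reduced keyword set for illustration; the real vocabulary is larger.
T3_WORDS = frozenset({"plan", "stuck", "quest", "multi-step"})


def tokenize(task: str) -> set[str]:
    """Lower-case *task* and split it into words, dropping punctuation.

    Hyphens inside a word are kept so entries such as "multi-step" still match.
    """
    return set(re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", task.lower()))


# Whitespace splitting leaves the "!" attached, so the keyword is missed.
naive_hit = bool(set("I'm stuck!".lower().split()) & T3_WORDS)

# The regex-based tokeniser strips the punctuation and finds the keyword.
tolerant_hit = bool(tokenize("I'm stuck!") & T3_WORDS)
```

Whether this matters depends on how clean the upstream task strings are; for input typed by a player, trailing punctuation seems likely enough to be worth handling.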
|
||||
|
||||
def build_prompt(
|
||||
state: dict,
|
||||
ui_state: dict,
|
||||
text: str,
|
||||
visual_context: str | None = None,
|
||||
) -> list[dict]:
|
||||
"""Build an OpenAI-compatible messages list from game context.
|
||||
|
||||
Assembles a system message from structured game state and a user
|
||||
message from the player's text input. This format is accepted by
|
||||
CascadeRouter.complete() directly.
|
||||
|
||||
Args:
|
||||
state: Current game state dict. Common keys:
|
||||
``location`` (str), ``health`` (int/float),
|
||||
``inventory`` (list), ``active_quests`` (list),
|
||||
``stuck`` (bool).
|
||||
ui_state: Current UI state dict. Common keys:
|
||||
``dialogue_active`` (bool), ``dialogue_npc`` (str),
|
||||
``menu_open`` (str), ``combat_active`` (bool).
|
||||
text: Player text or task description (becomes user message).
|
||||
visual_context: Optional free-text description of the current screen
|
||||
or scene — from a vision model or rule-based extractor.
|
||||
|
||||
Returns:
|
||||
List of message dicts: [{"role": "system", ...}, {"role": "user", ...}]
|
||||
"""
|
||||
context_lines: list[str] = []
|
||||
|
||||
location = state.get("location", "unknown")
|
||||
context_lines.append(f"Location: {location}")
|
||||
|
||||
health = state.get("health")
|
||||
if health is not None:
|
||||
context_lines.append(f"Health: {health}")
|
||||
|
||||
inventory = state.get("inventory", [])
|
||||
if inventory:
|
||||
items = [i if isinstance(i, str) else i.get("name", str(i)) for i in inventory[:10]]
|
||||
context_lines.append(f"Inventory: {', '.join(items)}")
|
||||
|
||||
active_quests = state.get("active_quests", [])
|
||||
if active_quests:
|
||||
names = [q if isinstance(q, str) else q.get("name", str(q)) for q in active_quests[:5]]
|
||||
context_lines.append(f"Active quests: {', '.join(names)}")
|
||||
|
||||
if state.get("stuck"):
|
||||
context_lines.append("Status: STUCK — need recovery strategy")
|
||||
|
||||
if ui_state.get("dialogue_active"):
|
||||
npc = ui_state.get("dialogue_npc", "NPC")
|
||||
context_lines.append(f"In dialogue with: {npc}")
|
||||
|
||||
if ui_state.get("menu_open"):
|
||||
context_lines.append(f"Menu open: {ui_state['menu_open']}")
|
||||
|
||||
if ui_state.get("combat_active"):
|
||||
context_lines.append("Status: IN COMBAT")
|
||||
|
||||
if visual_context:
|
||||
context_lines.append(f"Scene: {visual_context}")
|
||||
|
||||
system_content = (
|
||||
"You are Timmy, an AI game agent. "
|
||||
"Respond with valid game commands only.\n\n" + "\n".join(context_lines)
|
||||
)
|
||||
|
||||
return [
|
||||
{"role": "system", "content": system_content},
|
||||
{"role": "user", "content": text},
|
||||
]
|
||||
|
||||
|
||||
# ── Default model assignments ────────────────────────────────────────────────
|
||||
# Overridable per deployment via MetabolicRouter(tier_models={...}).
|
||||
# Model benchmarks (M3 Max 36 GB, issue #1063):
|
||||
# Qwen3-8B Q6_K — 0.933 F1 tool calling, ~45-55 tok/s (~6 GB)
|
||||
# Qwen3-14B Q5_K_M — 0.971 F1 tool calling, ~20-28 tok/s (~9.5 GB)
|
||||
# Qwen3-32B Q4_K_M — highest quality, ~8-12 tok/s (~20 GB, on demand)
|
||||
DEFAULT_TIER_MODELS: dict[ModelTier, str] = {
|
||||
ModelTier.T1_ROUTINE: "qwen3:8b",
|
||||
ModelTier.T2_MEDIUM: "qwen3:14b",
|
||||
ModelTier.T3_COMPLEX: "qwen3:30b", # Closest Ollama tag to 32B Q4
|
||||
}
|
||||
|
||||
|
||||
class MetabolicRouter:
|
||||
"""Routes LLM requests to the cheapest-sufficient model tier.
|
||||
|
||||
Wraps CascadeRouter with:
|
||||
- Complexity classification via classify_complexity()
|
||||
- Prompt assembly via build_prompt()
|
||||
- T3 world-pause / world-unpause (graceful if no world adapter)
|
||||
|
||||
Usage::
|
||||
|
||||
router = MetabolicRouter()
|
||||
|
||||
# Simple route call — classification + prompt + inference in one step
|
||||
result = await router.route(
|
||||
task="Go north",
|
||||
state={"location": "Balmora"},
|
||||
ui_state={},
|
||||
)
|
||||
print(result["content"], result["tier"])
|
||||
|
||||
# Pre-classify if you need the tier for telemetry
|
||||
tier = router.classify("Plan the best path to Vivec", game_state)
|
||||
|
||||
# Wire in world adapter for T3 pause/unpause
|
||||
router.set_world(world_adapter)
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
cascade: Any | None = None,
|
||||
tier_models: dict[ModelTier, str] | None = None,
|
||||
) -> None:
|
||||
"""Initialise the metabolic router.
|
||||
|
||||
Args:
|
||||
cascade: CascadeRouter instance to use. If None, the
|
||||
singleton returned by get_router() is used lazily.
|
||||
tier_models: Override default model names per tier.
|
||||
"""
|
||||
self._cascade = cascade
|
||||
self._tier_models: dict[ModelTier, str] = dict(DEFAULT_TIER_MODELS)
|
||||
if tier_models:
|
||||
self._tier_models.update(tier_models)
|
||||
self._world: Any | None = None
|
||||
|
||||
def set_world(self, world: Any) -> None:
|
||||
"""Wire in a world adapter for T3 pause / unpause support.
|
||||
|
||||
The adapter only needs to implement ``act(CommandInput)`` — the full
|
||||
WorldInterface contract is not required. A missing or broken world
|
||||
adapter degrades gracefully (logs a warning, inference continues).
|
||||
|
||||
Args:
|
||||
world: Any object with an ``act(CommandInput)`` method.
|
||||
"""
|
||||
self._world = world
|
||||
|
||||
def _get_cascade(self) -> Any:
|
||||
"""Return the CascadeRouter, creating the singleton if needed."""
|
||||
if self._cascade is None:
|
||||
from infrastructure.router.cascade import get_router
|
||||
|
||||
self._cascade = get_router()
|
||||
return self._cascade
|
||||
|
||||
def classify(self, task: str, state: dict) -> ModelTier:
|
||||
"""Classify task complexity. Delegates to classify_complexity()."""
|
||||
return classify_complexity(task, state)
|
||||
|
||||
async def _pause_world(self) -> None:
|
||||
"""Pause the game world before T3 inference (graceful degradation)."""
|
||||
if self._world is None:
|
||||
return
|
||||
try:
|
||||
from infrastructure.world.types import CommandInput
|
||||
|
||||
await asyncio.to_thread(self._world.act, CommandInput(action="pause"))
|
||||
logger.debug("MetabolicRouter: world paused for T3 inference")
|
||||
except Exception as exc:
|
||||
logger.warning("world.pause() failed — continuing without pause: %s", exc)
|
||||
|
||||
async def _unpause_world(self) -> None:
|
||||
"""Unpause the game world after T3 inference (always called, even on error)."""
|
||||
if self._world is None:
|
||||
return
|
||||
try:
|
||||
from infrastructure.world.types import CommandInput
|
||||
|
||||
await asyncio.to_thread(self._world.act, CommandInput(action="unpause"))
|
||||
logger.debug("MetabolicRouter: world unpaused after T3 inference")
|
||||
except Exception as exc:
|
||||
logger.warning("world.unpause() failed — game may remain paused: %s", exc)
|
||||
|
||||
async def route(
|
||||
self,
|
||||
task: str,
|
||||
state: dict,
|
||||
ui_state: dict | None = None,
|
||||
visual_context: str | None = None,
|
||||
temperature: float = 0.3,
|
||||
max_tokens: int | None = None,
|
||||
) -> dict:
|
||||
"""Route a task to the appropriate model tier and return the LLM response.
|
||||
|
||||
Selects the tier via classify_complexity(), assembles the prompt via
|
||||
build_prompt(), and dispatches to CascadeRouter. For T3, the game
|
||||
world is paused before inference and unpaused after (in a finally block).
|
||||
|
||||
Args:
|
||||
task: Natural-language task description or player input.
|
||||
state: Current game state dict.
|
||||
ui_state: Current UI state dict (optional, defaults to {}).
|
||||
visual_context: Optional screen/scene description from vision model.
|
||||
temperature: Sampling temperature (default 0.3 for game commands).
|
||||
max_tokens: Maximum tokens to generate.
|
||||
|
||||
Returns:
|
||||
Dict with keys: ``content``, ``provider``, ``model``, ``tier``,
|
||||
``latency_ms``, plus any extra keys from CascadeRouter.
|
||||
|
||||
Raises:
|
||||
RuntimeError: If all providers fail (propagated from CascadeRouter).
|
||||
"""
|
||||
ui_state = ui_state or {}
|
||||
tier = self.classify(task, state)
|
||||
model = self._tier_models[tier]
|
||||
messages = build_prompt(state, ui_state, task, visual_context)
|
||||
cascade = self._get_cascade()
|
||||
|
||||
logger.info(
|
||||
"MetabolicRouter: tier=%s model=%s task=%r",
|
||||
tier,
|
||||
model,
|
||||
task[:80],
|
||||
)
|
||||
|
||||
if tier == ModelTier.T3_COMPLEX:
|
||||
await self._pause_world()
|
||||
try:
|
||||
result = await cascade.complete(
|
||||
messages=messages,
|
||||
model=model,
|
||||
temperature=temperature,
|
||||
max_tokens=max_tokens,
|
||||
)
|
||||
finally:
|
||||
await self._unpause_world()
|
||||
else:
|
||||
result = await cascade.complete(
|
||||
messages=messages,
|
||||
model=model,
|
||||
temperature=temperature,
|
||||
max_tokens=max_tokens,
|
||||
)
|
||||
|
||||
result["tier"] = tier
|
||||
return result
|
||||
|
||||
|
||||
# ── Module-level singleton ────────────────────────────────────────────────────
|
||||
_metabolic_router: MetabolicRouter | None = None
|
||||
|
||||
|
||||
def get_metabolic_router() -> MetabolicRouter:
|
||||
"""Get or create the MetabolicRouter singleton."""
|
||||
global _metabolic_router
|
||||
if _metabolic_router is None:
|
||||
_metabolic_router = MetabolicRouter()
|
||||
return _metabolic_router
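The T3 branch of `MetabolicRouter.route()` above pauses the world, runs inference, and unpauses in a `finally` block, so the game should not stay paused when every provider fails. The sketch below isolates that pattern. `Tier`, `FakeWorld`, `fake_complete`, `route_sketch`, and `run_demo` are hypothetical stand-ins introduced for illustration only; they are not names from the repository, and the real router delegates to `CascadeRouter` and `CommandInput` instead.

```python
import asyncio
from enum import Enum


class Tier(Enum):
    """Hypothetical two-tier stand-in for ModelTier."""

    T1 = "t1"
    T3 = "t3"


class FakeWorld:
    """Hypothetical world adapter that records the commands it receives."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def act(self, action: str) -> None:
        self.log.append(action)


async def fake_complete(fail: bool) -> dict:
    """Stand-in for cascade.complete(); raises when asked to fail."""
    if fail:
        raise RuntimeError("all providers failed")
    return {"content": "go north"}


async def route_sketch(tier: Tier, world: FakeWorld, fail: bool = False) -> dict:
    """Pause only for the T3 tier and always unpause afterwards."""
    if tier is Tier.T3:
        world.act("pause")
        try:
            result = await fake_complete(fail)
        finally:
            # Runs on success and on error, mirroring the finally block above.
            world.act("unpause")
    else:
        result = await fake_complete(fail)
    result["tier"] = tier
    return result


def run_demo() -> tuple[list[str], list[str]]:
    """Run one successful and one failing T3 call and return both logs."""
    ok_world, bad_world = FakeWorld(), FakeWorld()
    asyncio.run(route_sketch(Tier.T3, ok_world))
    try:
        asyncio.run(route_sketch(Tier.T3, bad_world, fail=True))
    except RuntimeError:
        pass
    return ok_world.log, bad_world.log
```

In this sketch both logs end up as a pause followed by an unpause, while a T1 call never touches the world adapter.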
|
||||
@@ -24,8 +24,6 @@ logger = logging.getLogger(__name__)
 
 @dataclass
 class Intent:
-    """A classified user intent with confidence score and extracted entities."""
-
     name: str
     confidence: float  # 0.0 to 1.0
     entities: dict
|
||||
|
||||
@@ -17,15 +17,11 @@ logger = logging.getLogger(__name__)
 
 
 class TxType(StrEnum):
-    """Lightning transaction direction type."""
-
     incoming = "incoming"
     outgoing = "outgoing"
 
 
 class TxStatus(StrEnum):
-    """Lightning transaction settlement status."""
-
     pending = "pending"
     settled = "settled"
     failed = "failed"
|
||||
|
||||
@@ -36,8 +36,6 @@ _EXPIRY_DAYS = 7
 
 @dataclass
 class ApprovalItem:
-    """A proposed autonomous action requiring owner approval."""
-
     id: str
     title: str
     description: str
|
||||
|
||||
@@ -46,8 +46,6 @@ class ApprovalItem:
 
 @dataclass
 class Briefing:
-    """A generated morning briefing summarizing recent activity and pending approvals."""
-
     generated_at: datetime
     summary: str  # 150-300 words
     approval_items: list[ApprovalItem] = field(default_factory=list)
|
||||
|
||||
@@ -423,49 +423,6 @@ async def _poll_issue_completion(
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _format_assignment_comment(
|
||||
display_name: str,
|
||||
task_type: TaskType,
|
||||
description: str,
|
||||
acceptance_criteria: list[str],
|
||||
) -> str:
|
||||
"""Build the markdown comment body for a task assignment.
|
||||
|
||||
Args:
|
||||
display_name: Human-readable agent name.
|
||||
task_type: The inferred task type.
|
||||
description: Task description.
|
||||
acceptance_criteria: List of acceptance criteria strings.
|
||||
|
||||
Returns:
|
||||
Formatted markdown string for the comment.
|
||||
"""
|
||||
criteria_md = (
|
||||
"\n".join(f"- {c}" for c in acceptance_criteria)
|
||||
if acceptance_criteria
|
||||
else "_None specified_"
|
||||
)
|
||||
return (
|
||||
f"## Assigned to {display_name}\n\n"
|
||||
f"**Task type:** `{task_type.value}`\n\n"
|
||||
f"**Description:**\n{description}\n\n"
|
||||
f"**Acceptance criteria:**\n{criteria_md}\n\n"
|
||||
f"---\n*Dispatched by Timmy agent dispatcher.*"
|
||||
)
|
||||
|
||||
|
||||
def _select_label(agent: AgentType) -> str | None:
|
||||
"""Return the Gitea label for an agent based on its spec.
|
||||
|
||||
Args:
|
||||
agent: The target agent.
|
||||
|
||||
Returns:
|
||||
Label name or None if the agent has no label.
|
||||
"""
|
||||
return AGENT_REGISTRY[agent].gitea_label
|
||||
|
||||
|
||||
async def _dispatch_via_gitea(
|
||||
agent: AgentType,
|
||||
issue_number: int,
|
||||
@@ -520,27 +477,37 @@ async def _dispatch_via_gitea(
 
     async with httpx.AsyncClient(timeout=15) as client:
         # 1. Apply agent label (if applicable)
-        label = _select_label(agent)
-        if label:
-            ok = await _apply_gitea_label(client, base_url, repo, headers, issue_number, label)
+        if spec.gitea_label:
+            ok = await _apply_gitea_label(
+                client, base_url, repo, headers, issue_number, spec.gitea_label
+            )
             if ok:
-                label_applied = label
+                label_applied = spec.gitea_label
                 logger.info(
                     "Applied label %r to issue #%s for %s",
-                    label,
+                    spec.gitea_label,
                     issue_number,
                     spec.display_name,
                 )
             else:
                 logger.warning(
                     "Could not apply label %r to issue #%s",
-                    label,
+                    spec.gitea_label,
                     issue_number,
                 )
 
         # 2. Post assignment comment
-        comment_body = _format_assignment_comment(
-            spec.display_name, task_type, description, acceptance_criteria
+        criteria_md = (
+            "\n".join(f"- {c}" for c in acceptance_criteria)
+            if acceptance_criteria
+            else "_None specified_"
         )
+        comment_body = (
+            f"## Assigned to {spec.display_name}\n\n"
+            f"**Task type:** `{task_type.value}`\n\n"
+            f"**Description:**\n{description}\n\n"
+            f"**Acceptance criteria:**\n{criteria_md}\n\n"
+            f"---\n*Dispatched by Timmy agent dispatcher.*"
+        )
         comment_id = await _post_gitea_comment(
             client, base_url, repo, headers, issue_number, comment_body
|
||||
@@ -686,80 +653,6 @@ async def _dispatch_local(
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _validate_task(
|
||||
title: str,
|
||||
task_type: TaskType | None,
|
||||
agent: AgentType | None,
|
||||
issue_number: int | None,
|
||||
) -> DispatchResult | None:
|
||||
"""Validate task preconditions.
|
||||
|
||||
Args:
|
||||
title: Task title to validate.
|
||||
task_type: Optional task type for result construction.
|
||||
agent: Optional agent for result construction.
|
||||
issue_number: Optional issue number for result construction.
|
||||
|
||||
Returns:
|
||||
A failed DispatchResult if validation fails, None otherwise.
|
||||
"""
|
||||
if not title.strip():
|
||||
return DispatchResult(
|
||||
task_type=task_type or TaskType.ROUTINE_CODING,
|
||||
agent=agent or AgentType.TIMMY,
|
||||
issue_number=issue_number,
|
||||
status=DispatchStatus.FAILED,
|
||||
error="`title` is required.",
|
||||
)
|
||||
return None
|
||||
|
||||
|
||||
def _select_dispatch_strategy(agent: AgentType, issue_number: int | None) -> str:
|
||||
"""Select the dispatch strategy based on agent interface and context.
|
||||
|
||||
Args:
|
||||
agent: The target agent.
|
||||
issue_number: Optional Gitea issue number.
|
||||
|
||||
Returns:
|
||||
Strategy name: "gitea", "api", or "local".
|
||||
"""
|
||||
spec = AGENT_REGISTRY[agent]
|
||||
if spec.interface == "gitea" and issue_number is not None:
|
||||
return "gitea"
|
||||
if spec.interface == "api":
|
||||
return "api"
|
||||
return "local"
|
||||
|
||||
|
||||
def _log_dispatch_result(
|
||||
title: str,
|
||||
result: DispatchResult,
|
||||
attempt: int,
|
||||
max_retries: int,
|
||||
) -> None:
|
||||
"""Log the outcome of a dispatch attempt.
|
||||
|
||||
Args:
|
||||
title: Task title for logging context.
|
||||
result: The dispatch result.
|
||||
attempt: Current attempt number (0-indexed).
|
||||
max_retries: Maximum retry attempts allowed.
|
||||
"""
|
||||
if result.success:
|
||||
return
|
||||
|
||||
if attempt > 0:
|
||||
logger.info("Retry %d/%d for task %r", attempt, max_retries, title[:60])
|
||||
|
||||
logger.warning(
|
||||
"Dispatch attempt %d failed for task %r: %s",
|
||||
attempt + 1,
|
||||
title[:60],
|
||||
result.error,
|
||||
)
|
||||
|
||||
|
||||
async def dispatch_task(
|
||||
title: str,
|
||||
description: str = "",
|
||||
@@ -800,13 +693,17 @@ async def dispatch_task(
         if result.success:
             print(f"Assigned to {result.agent.value}")
     """
-    # 1. Validate
-    validation_error = _validate_task(title, task_type, agent, issue_number)
-    if validation_error:
-        return validation_error
-
-    # 2. Resolve task type and agent
     criteria = acceptance_criteria or []
+
+    if not title.strip():
+        return DispatchResult(
+            task_type=task_type or TaskType.ROUTINE_CODING,
+            agent=agent or AgentType.TIMMY,
+            issue_number=issue_number,
+            status=DispatchStatus.FAILED,
+            error="`title` is required.",
+        )
+
     resolved_type = task_type or infer_task_type(title, description)
     resolved_agent = agent or select_agent(resolved_type)
 
|
||||
@@ -818,16 +715,18 @@ async def dispatch_task(
         issue_number,
     )
 
-    # 3. Select strategy and dispatch with retries
-    strategy = _select_dispatch_strategy(resolved_agent, issue_number)
-    last_result: DispatchResult | None = None
+    spec = AGENT_REGISTRY[resolved_agent]
 
+    last_result: DispatchResult | None = None
     for attempt in range(max_retries + 1):
-        if strategy == "gitea":
+        if attempt > 0:
+            logger.info("Retry %d/%d for task %r", attempt, max_retries, title[:60])
+
+        if spec.interface == "gitea" and issue_number is not None:
             result = await _dispatch_via_gitea(
                 resolved_agent, issue_number, title, description, criteria
             )
-        elif strategy == "api":
+        elif spec.interface == "api":
             result = await _dispatch_via_api(
                 resolved_agent, title, description, criteria, issue_number, api_endpoint
             )
|
||||
@@ -840,9 +739,14 @@ async def dispatch_task(
         if result.success:
             return result
 
-        _log_dispatch_result(title, result, attempt, max_retries)
+        logger.warning(
+            "Dispatch attempt %d failed for task %r: %s",
+            attempt + 1,
+            title[:60],
+            result.error,
+        )
 
-    # 4. All attempts exhausted — escalate
+    # All attempts exhausted — escalate
     assert last_result is not None
     last_result.status = DispatchStatus.ESCALATED
     logger.error(
|
||||
|
||||
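The `dispatch_task` hunks above share one control flow on both sides of the diff: pick a strategy from the agent's interface, try up to `max_retries + 1` times, return on the first success, and mark the last result as escalated once every attempt has failed. A minimal sketch of that flow, under stated assumptions, is given below. `Result`, `select_strategy`, and `dispatch_with_retries` are hypothetical names introduced for illustration; the real code uses `DispatchResult`, `DispatchStatus`, and `AGENT_REGISTRY`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    """Hypothetical, trimmed-down stand-in for DispatchResult."""

    success: bool
    status: str
    error: str = ""


def select_strategy(interface: str, issue_number: Optional[int]) -> str:
    """Gitea needs an issue number; anything that is not gitea or api runs locally."""
    if interface == "gitea" and issue_number is not None:
        return "gitea"
    if interface == "api":
        return "api"
    return "local"


def dispatch_with_retries(attempt_fn: Callable[[int], Result], max_retries: int) -> Result:
    """Call attempt_fn up to max_retries + 1 times, then escalate the last failure."""
    last: Optional[Result] = None
    for attempt in range(max_retries + 1):
        last = attempt_fn(attempt)
        if last.success:
            return last
    # The loop always runs at least once for max_retries >= 0.
    assert last is not None
    last.status = "escalated"
    return last
```

Note that a gitea-interface agent without an issue number falls through to the local strategy in this sketch, which matches the fall-through order of the conditions shown in the diff.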
@@ -32,12 +32,8 @@ def get_llm_client() -> Any:
     # a client for an LLM service like OpenAI, Anthropic, or a local
     # model.
     class MockLLMClient:
-        """Stub LLM client for testing without a real language model."""
-
         async def completion(self, prompt: str, max_tokens: int) -> Any:
             class MockCompletion:
-                """Stub completion response returned by MockLLMClient."""
-
                 def __init__(self, text: str) -> None:
                     self.text = text
 
|
||||
|
||||
@@ -1,7 +0,0 @@
|
||||
"""Sovereignty metrics for the Bannerlord loop.
|
||||
|
||||
Tracks how much of each AI layer (perception, decision, narration)
|
||||
runs locally vs. calls out to an LLM. Feeds the sovereignty dashboard.
|
||||
|
||||
Refs: #954, #953
|
||||
"""
|
||||
@@ -1,413 +0,0 @@
|
||||
"""Sovereignty metrics emitter and SQLite store.
|
||||
|
||||
Tracks the sovereignty percentage for each AI layer (perception, decision,
|
||||
narration) plus API cost and skill crystallisation. All data is persisted to
|
||||
``data/sovereignty_metrics.db`` so the dashboard can query trends over time.
|
||||
|
||||
Event types
|
||||
-----------
|
||||
perception layer:
|
||||
``perception_cache_hit`` — frame answered from local cache (sovereign)
|
||||
``perception_vlm_call`` — frame required a VLM inference call (non-sovereign)
|
||||
|
||||
decision layer:
|
||||
``decision_rule_hit`` — action chosen by a deterministic rule (sovereign)
|
||||
``decision_llm_call`` — action required LLM reasoning (non-sovereign)
|
||||
|
||||
narration layer:
|
||||
``narration_template`` — text generated from a template (sovereign)
|
||||
``narration_llm`` — text generated by an LLM (non-sovereign)
|
||||
|
||||
skill layer:
|
||||
``skill_crystallized`` — a new skill was crystallised from LLM output
|
||||
|
||||
cost:
|
||||
``api_call`` — any external API call was made
|
||||
``api_cost`` — monetary cost of an API call (metadata: {"usd": float})
|
||||
|
||||
Refs: #954, #953
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import logging
|
||||
import sqlite3
|
||||
import uuid
|
||||
from contextlib import closing
|
||||
from dataclasses import dataclass, field
|
||||
from datetime import UTC, datetime
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
from config import settings
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# ── Constants ─────────────────────────────────────────────────────────────────
|
||||
|
||||
DB_PATH = Path(settings.repo_root) / "data" / "sovereignty_metrics.db"
|
||||
|
||||
#: Sovereign event types for each layer (numerator of sovereignty %).
|
||||
_SOVEREIGN_EVENTS: dict[str, frozenset[str]] = {
|
||||
"perception": frozenset({"perception_cache_hit"}),
|
||||
"decision": frozenset({"decision_rule_hit"}),
|
||||
"narration": frozenset({"narration_template"}),
|
||||
}
|
||||
|
||||
#: All tracked event types for each layer (denominator of sovereignty %).
|
||||
_LAYER_EVENTS: dict[str, frozenset[str]] = {
|
||||
"perception": frozenset({"perception_cache_hit", "perception_vlm_call"}),
|
||||
"decision": frozenset({"decision_rule_hit", "decision_llm_call"}),
|
||||
"narration": frozenset({"narration_template", "narration_llm"}),
|
||||
}
|
||||
|
||||
ALL_EVENT_TYPES: frozenset[str] = frozenset(
|
||||
{
|
||||
"perception_cache_hit",
|
||||
"perception_vlm_call",
|
||||
"decision_rule_hit",
|
||||
"decision_llm_call",
|
||||
"narration_template",
|
||||
"narration_llm",
|
||||
"skill_crystallized",
|
||||
"api_call",
|
||||
"api_cost",
|
||||
}
|
||||
)
|
||||
|
||||
# ── Schema ────────────────────────────────────────────────────────────────────
|
||||
|
||||
_SCHEMA = """
|
||||
CREATE TABLE IF NOT EXISTS events (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
timestamp TEXT NOT NULL,
|
||||
event_type TEXT NOT NULL,
|
||||
session_id TEXT NOT NULL DEFAULT '',
|
||||
metadata_json TEXT NOT NULL DEFAULT '{}'
|
||||
);
|
||||
CREATE INDEX IF NOT EXISTS idx_ev_type ON events(event_type);
|
||||
CREATE INDEX IF NOT EXISTS idx_ev_ts ON events(timestamp);
|
||||
CREATE INDEX IF NOT EXISTS idx_ev_session ON events(session_id);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS sessions (
|
||||
session_id TEXT PRIMARY KEY,
|
||||
game TEXT NOT NULL DEFAULT '',
|
||||
start_time TEXT NOT NULL,
|
||||
end_time TEXT
|
||||
);
|
||||
"""
|
||||
|
||||
|
||||
# ── Data classes ──────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
@dataclass
|
||||
class SovereigntyEvent:
|
||||
"""A single sovereignty event."""
|
||||
|
||||
event_type: str
|
||||
session_id: str = ""
|
||||
metadata: dict[str, Any] = field(default_factory=dict)
|
||||
timestamp: str = field(default_factory=lambda: datetime.now(UTC).isoformat())
|
||||
|
||||
|
||||
# ── Store ─────────────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class SovereigntyMetricsStore:
|
||||
"""SQLite-backed sovereignty event store.
|
||||
|
||||
Thread-safe: creates a new connection per operation (WAL mode).
|
||||
"""
|
||||
|
||||
def __init__(self, db_path: Path | None = None) -> None:
|
||||
self._db_path = db_path or DB_PATH
|
||||
self._init_db()
|
||||
|
||||
# ── internal ─────────────────────────────────────────────────────────────
|
||||
|
||||
def _init_db(self) -> None:
|
||||
try:
|
||||
self._db_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
with closing(sqlite3.connect(str(self._db_path))) as conn:
|
||||
conn.execute("PRAGMA journal_mode=WAL")
|
||||
conn.execute(f"PRAGMA busy_timeout={settings.db_busy_timeout_ms}")
|
||||
conn.executescript(_SCHEMA)
|
||||
conn.commit()
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to initialise sovereignty metrics DB: %s", exc)
|
||||
|
||||
def _connect(self) -> sqlite3.Connection:
|
||||
conn = sqlite3.connect(str(self._db_path))
|
||||
conn.row_factory = sqlite3.Row
|
||||
conn.execute(f"PRAGMA busy_timeout={settings.db_busy_timeout_ms}")
|
||||
return conn
|
||||
|
||||
# ── public API ────────────────────────────────────────────────────────────
|
||||
|
||||
def record(
|
||||
self, event_type: str, metadata: dict[str, Any] | None = None, *, session_id: str = ""
|
||||
) -> None:
|
||||
"""Record a sovereignty event.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
event_type:
|
||||
One of ``ALL_EVENT_TYPES``.
|
||||
metadata:
|
||||
Optional dict of extra data (serialised as JSON).
|
||||
session_id:
|
||||
Identifier of the current game session, if known.
|
||||
"""
|
||||
event = SovereigntyEvent(
|
||||
event_type=event_type,
|
||||
session_id=session_id,
|
||||
metadata=metadata or {},
|
||||
)
|
||||
try:
|
||||
with closing(self._connect()) as conn:
|
||||
conn.execute(
|
||||
"INSERT INTO events (timestamp, event_type, session_id, metadata_json) "
|
||||
"VALUES (?, ?, ?, ?)",
|
||||
(
|
||||
event.timestamp,
|
||||
event.event_type,
|
||||
event.session_id,
|
||||
json.dumps(event.metadata),
|
||||
),
|
||||
)
|
||||
conn.commit()
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to record sovereignty event: %s", exc)
|
||||
|
||||
def start_session(self, game: str = "", session_id: str | None = None) -> str:
|
||||
"""Register a new game session. Returns the session_id."""
|
||||
sid = session_id or str(uuid.uuid4())
|
||||
try:
|
||||
with closing(self._connect()) as conn:
|
||||
conn.execute(
|
||||
"INSERT OR IGNORE INTO sessions (session_id, game, start_time) VALUES (?, ?, ?)",
|
||||
(sid, game, datetime.now(UTC).isoformat()),
|
||||
)
|
||||
conn.commit()
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to start session: %s", exc)
|
||||
return sid
|
||||
|
||||
def end_session(self, session_id: str) -> None:
|
||||
"""Mark a session as ended."""
|
||||
try:
|
||||
with closing(self._connect()) as conn:
|
||||
conn.execute(
|
||||
"UPDATE sessions SET end_time = ? WHERE session_id = ?",
|
||||
(datetime.now(UTC).isoformat(), session_id),
|
||||
)
|
||||
conn.commit()
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to end session: %s", exc)
|
||||
|
||||
# ── analytics ─────────────────────────────────────────────────────────────
|
||||
|
||||
def get_sovereignty_pct(self, layer: str, time_window: float | None = None) -> float:
|
||||
"""Return the sovereignty percentage (0.0–100.0) for *layer*.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
layer:
|
||||
One of ``"perception"``, ``"decision"``, ``"narration"``.
|
||||
time_window:
|
||||
If given, only consider events from the last *time_window* seconds.
|
||||
If ``None``, all events are used.
|
||||
|
||||
Returns
|
||||
-------
|
||||
float
|
||||
Percentage of sovereign events for the layer, or 0.0 if no data.
|
||||
"""
|
||||
if layer not in _LAYER_EVENTS:
|
||||
logger.warning("Unknown sovereignty layer: %s", layer)
|
||||
return 0.0
|
||||
|
||||
sovereign = _SOVEREIGN_EVENTS[layer]
|
||||
total_types = _LAYER_EVENTS[layer]
|
||||
|
||||
sovereign_placeholders = ",".join("?" * len(sovereign))
|
||||
total_placeholders = ",".join("?" * len(total_types))
|
||||
|
||||
params_sov: list[Any] = list(sovereign)
|
||||
params_total: list[Any] = list(total_types)
|
||||
|
||||
if time_window is not None:
|
||||
cutoff = _seconds_ago_iso(time_window)
|
||||
where_ts = " AND timestamp >= ?"
|
||||
params_sov.append(cutoff)
|
||||
params_total.append(cutoff)
|
||||
else:
|
||||
where_ts = ""
|
||||
|
||||
try:
|
||||
with closing(self._connect()) as conn:
|
||||
total_count = conn.execute(
|
||||
f"SELECT COUNT(*) FROM events WHERE event_type IN ({total_placeholders}){where_ts}",
|
||||
params_total,
|
||||
).fetchone()[0]
|
||||
if total_count == 0:
|
||||
return 0.0
|
||||
sov_count = conn.execute(
|
||||
f"SELECT COUNT(*) FROM events WHERE event_type IN ({sovereign_placeholders}){where_ts}",
|
||||
params_sov,
|
||||
).fetchone()[0]
|
||||
return round(100.0 * sov_count / total_count, 2)
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to compute sovereignty pct: %s", exc)
|
||||
return 0.0
|
||||
|
||||
def get_cost_per_hour(self, time_window: float | None = None) -> float:
|
||||
"""Return the total API cost in USD extrapolated to a per-hour rate.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
time_window:
|
||||
Seconds of history to consider. Defaults to 3600 (last hour).
|
||||
|
||||
Returns
|
||||
-------
|
||||
float
|
||||
USD cost per hour, or 0.0 if no ``api_cost`` events exist.
|
||||
"""
|
||||
window = time_window if time_window is not None else 3600.0
|
||||
cutoff = _seconds_ago_iso(window)
|
||||
|
||||
try:
|
||||
with closing(self._connect()) as conn:
|
||||
rows = conn.execute(
|
||||
"SELECT metadata_json FROM events WHERE event_type = 'api_cost' AND timestamp >= ?",
|
||||
(cutoff,),
|
||||
).fetchall()
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to query api_cost events: %s", exc)
|
||||
return 0.0
|
||||
|
||||
total_usd = 0.0
|
||||
for row in rows:
|
||||
try:
|
||||
meta = json.loads(row["metadata_json"] or "{}")
|
||||
total_usd += float(meta.get("usd", 0.0))
|
||||
except (ValueError, TypeError, json.JSONDecodeError):
|
||||
pass
|
||||
|
||||
# Extrapolate: (total in window) * (3600 / window_seconds)
|
||||
if window == 0:
|
||||
return 0.0
|
||||
return round(total_usd * (3600.0 / window), 4)
|
||||
|
||||
def get_skills_crystallized(self, session_id: str | None = None) -> int:
|
||||
"""Return the number of skills crystallised.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
session_id:
|
||||
If given, count only events for that session. If ``None``,
|
||||
count across all sessions.
|
||||
"""
|
||||
try:
|
||||
with closing(self._connect()) as conn:
|
||||
if session_id:
|
||||
return conn.execute(
|
||||
"SELECT COUNT(*) FROM events WHERE event_type = 'skill_crystallized' AND session_id = ?",
|
||||
(session_id,),
|
||||
).fetchone()[0]
|
||||
return conn.execute(
|
||||
"SELECT COUNT(*) FROM events WHERE event_type = 'skill_crystallized'",
|
||||
).fetchone()[0]
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to query skill_crystallized: %s", exc)
|
||||
return 0
|
||||
|
||||
def get_snapshot(self) -> dict[str, Any]:
|
||||
"""Return a real-time metrics snapshot suitable for dashboard widgets."""
|
||||
return {
|
||||
"sovereignty": {
|
||||
layer: self.get_sovereignty_pct(layer, time_window=3600) for layer in _LAYER_EVENTS
|
||||
},
|
||||
"cost_per_hour": self.get_cost_per_hour(),
|
||||
"skills_crystallized": self.get_skills_crystallized(),
|
||||
}
|
||||
|
||||
|
||||
# ── Module-level singleton ────────────────────────────────────────────────────
|
||||
|
||||
_store: SovereigntyMetricsStore | None = None
|
||||
|
||||
|
||||
def get_metrics_store() -> SovereigntyMetricsStore:
|
||||
"""Return (or lazily create) the module-level singleton store."""
|
||||
global _store
|
||||
if _store is None:
|
||||
_store = SovereigntyMetricsStore()
|
||||
return _store
|
||||
|
||||
|
||||
# ── Convenience helpers ───────────────────────────────────────────────────────
|
||||
|
||||
|
||||
def record(
|
||||
event_type: str, metadata: dict[str, Any] | None = None, *, session_id: str = ""
|
||||
) -> None:
|
||||
"""Module-level shortcut: ``metrics.record("perception_cache_hit")``."""
|
||||
get_metrics_store().record(event_type, metadata=metadata, session_id=session_id)
|
||||
|
||||
|
||||
def get_sovereignty_pct(layer: str, time_window: float | None = None) -> float:
|
||||
"""Module-level shortcut for :meth:`SovereigntyMetricsStore.get_sovereignty_pct`."""
|
||||
return get_metrics_store().get_sovereignty_pct(layer, time_window)
|
||||
|
||||
|
||||
def get_cost_per_hour(time_window: float | None = None) -> float:
|
||||
"""Module-level shortcut for :meth:`SovereigntyMetricsStore.get_cost_per_hour`."""
|
||||
return get_metrics_store().get_cost_per_hour(time_window)
|
||||
|
||||
|
||||
def get_skills_crystallized(session_id: str | None = None) -> int:
|
||||
"""Module-level shortcut for :meth:`SovereigntyMetricsStore.get_skills_crystallized`."""
|
||||
return get_metrics_store().get_skills_crystallized(session_id)
|
||||
|
||||
|
||||
async def emit_sovereignty_event(
|
||||
event_type: str,
|
||||
metadata: dict[str, Any] | None = None,
|
||||
*,
|
||||
session_id: str = "",
|
||||
) -> None:
|
||||
"""Record an event in a thread and publish it on the event bus.
|
||||
|
||||
This is the async-safe entry-point used by the agentic loop.
|
||||
"""
|
||||
from infrastructure.events.bus import emit
|
||||
|
||||
await asyncio.to_thread(
|
||||
get_metrics_store().record,
|
||||
event_type,
|
||||
metadata,
|
||||
session_id=session_id,
|
||||
)
|
||||
await emit(
|
||||
f"sovereignty.event.{event_type}",
|
||||
source="sovereignty_metrics",
|
||||
data={
|
||||
"event_type": event_type,
|
||||
"session_id": session_id,
|
||||
**(metadata or {}),
|
||||
},
|
||||
)
|
||||
|
||||
|
||||
# ── Private helpers ───────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
def _seconds_ago_iso(seconds: float) -> str:
|
||||
"""Return an ISO-8601 timestamp *seconds* before now (UTC)."""
|
||||
import datetime as _dt
|
||||
|
||||
delta = _dt.timedelta(seconds=seconds)
|
||||
return (_dt.datetime.now(UTC) - delta).isoformat()
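The two analytics in the file above reduce to simple arithmetic: the sovereignty percentage is the count of sovereign events divided by the count of all events for a layer, and the hourly cost scales the spend observed in a window by `3600 / window`. The sketch below reproduces both against an in-memory SQLite database. It is a simplified illustration, not the module itself: the one-column `events` table, the `SOVEREIGN` and `LAYER` dicts, and the `sovereignty_pct`, `hourly_cost`, and `demo` functions are assumptions made for the example, and the time-window filter is omitted.

```python
import sqlite3

# Hypothetical, single-layer versions of _SOVEREIGN_EVENTS and _LAYER_EVENTS.
SOVEREIGN = {"decision": ("decision_rule_hit",)}
LAYER = {"decision": ("decision_rule_hit", "decision_llm_call")}


def sovereignty_pct(conn: sqlite3.Connection, layer: str) -> float:
    """Percentage of a layer's events that were handled without a model call."""

    def count(types: tuple) -> int:
        # One "?" placeholder per event type keeps the query parameterised.
        marks = ",".join("?" * len(types))
        sql = f"SELECT COUNT(*) FROM events WHERE event_type IN ({marks})"
        return conn.execute(sql, types).fetchone()[0]

    total = count(LAYER[layer])
    if total == 0:
        return 0.0
    return round(100.0 * count(SOVEREIGN[layer]) / total, 2)


def hourly_cost(total_usd: float, window_seconds: float) -> float:
    """Extrapolate spend observed in a window to a per-hour rate."""
    if window_seconds == 0:
        return 0.0
    return round(total_usd * (3600.0 / window_seconds), 4)


def demo() -> float:
    """Three rule hits and one LLM call in the decision layer."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (event_type TEXT NOT NULL)")
    rows = [("decision_rule_hit",)] * 3 + [("decision_llm_call",)]
    conn.executemany("INSERT INTO events (event_type) VALUES (?)", rows)
    return sovereignty_pct(conn, "decision")
```

With three rule hits out of four decision events, `demo()` yields 75.0, and half a dollar spent over 1800 seconds extrapolates to 1.0 USD per hour.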
|
||||
@@ -1,92 +0,0 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
from dataclasses import dataclass
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
import cv2
|
||||
import numpy as np
|
||||
|
||||
|
||||
@dataclass
|
||||
class Template:
|
||||
name: str
|
||||
image: np.ndarray
    threshold: float = 0.85


@dataclass
class CacheResult:
    confidence: float
    state: Any | None


class PerceptionCache:
    def __init__(self, templates_path: Path | str = "data/templates.json"):
        self.templates_path = Path(templates_path)
        self.templates: list[Template] = []
        self.load()

    def match(self, screenshot: np.ndarray) -> CacheResult:
        """
        Matches templates against the screenshot.
        Returns the confidence and the name of the best matching template.
        """
        best_match_confidence = 0.0
        best_match_name = None

        for template in self.templates:
            res = cv2.matchTemplate(screenshot, template.image, cv2.TM_CCOEFF_NORMED)
            _, max_val, _, _ = cv2.minMaxLoc(res)
            if max_val > best_match_confidence:
                best_match_confidence = max_val
                best_match_name = template.name

        if best_match_confidence > 0.85:  # TODO: Make this configurable per template
            return CacheResult(
                confidence=best_match_confidence, state={"template_name": best_match_name}
            )
        else:
            return CacheResult(confidence=best_match_confidence, state=None)

    def add(self, templates: list[Template]):
        self.templates.extend(templates)

    def persist(self):
        self.templates_path.parent.mkdir(parents=True, exist_ok=True)
        # Note: This is a simplified persistence mechanism.
        # A more robust solution would store templates as images and metadata in JSON.
        with self.templates_path.open("w") as f:
            json.dump(
                [{"name": t.name, "threshold": t.threshold} for t in self.templates], f, indent=2
            )

    def load(self):
        if self.templates_path.exists():
            with self.templates_path.open("r") as f:
                templates_data = json.load(f)
                # This is a simplified loading mechanism and assumes template images are stored elsewhere.
                # For now, we are not loading the actual images.
                self.templates = [
                    Template(name=t["name"], image=np.array([]), threshold=t["threshold"])
                    for t in templates_data
                ]


def crystallize_perception(screenshot: np.ndarray, vlm_response: Any) -> list[Template]:
    """
    Extracts reusable patterns from VLM output and generates OpenCV templates.
    This is a placeholder and needs to be implemented based on the actual VLM response format.
    """
    # Example implementation:
    # templates = []
    # for item in vlm_response.get("items", []):
    #     bbox = item.get("bounding_box")
    #     template_name = item.get("name")
    #     if bbox and template_name:
    #         x1, y1, x2, y2 = bbox
    #         template_image = screenshot[y1:y2, x1:x2]
    #         templates.append(Template(name=template_name, image=template_image))
    # return templates
    return []
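The `match` method above compares the best score against a hard-coded `0.85` and carries a TODO to make the threshold configurable per template. A minimal sketch of that selection step follows, assuming the match scores have already been computed. `TemplateScore` and `pick_match` are hypothetical names used only for illustration; they are not part of the diff.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TemplateScore:
    # Hypothetical container: a template's name, its own threshold, and its match score.
    name: str
    threshold: float
    score: float


def pick_match(scores: list[TemplateScore]) -> str | None:
    """Return the name of the highest-scoring template that clears its own threshold."""
    passing = [s for s in scores if s.score > s.threshold]
    if not passing:
        return None
    return max(passing, key=lambda s: s.score).name
```

With this shape, a strict template with a high score requirement can be rejected while a more lenient template with a lower score is still accepted, which a single global cutoff cannot express.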
@@ -692,112 +692,91 @@ class ThinkingEngine:
        file paths actually exist on disk, preventing phantom-bug reports.
        """
        try:
            recent = self._get_recent_thoughts_for_issues()
            if recent is None:
            interval = settings.thinking_issue_every
            if interval <= 0:
                return

            classify_prompt = self._build_issue_classify_prompt(recent)
            count = self.count_thoughts()
            if count == 0 or count % interval != 0:
                return

            # Check Gitea availability before spending LLM tokens
            if not settings.gitea_enabled or not settings.gitea_token:
                return

            recent = self.get_recent_thoughts(limit=interval)
            if len(recent) < interval:
                return

            thought_text = "\n".join(f"- [{t.seed_type}] {t.content}" for t in reversed(recent))

            classify_prompt = (
                "You are reviewing your own recent thoughts for actionable items.\n"
                "Extract 0-2 items that are CONCRETE bugs, broken features, stale "
                "state, or clear improvement opportunities in your own codebase.\n\n"
                "Rules:\n"
                "- Only include things that could become a real code fix or feature\n"
                "- Skip vague reflections, philosophical musings, or repeated themes\n"
                "- Category must be one of: bug, feature, suggestion, maintenance\n"
                "- ONLY reference files that you are CERTAIN exist in the project\n"
                "- Do NOT invent or guess file paths — if unsure, describe the "
                "area of concern without naming specific files\n\n"
                "For each item, write an ENGINEER-QUALITY issue:\n"
                '- "title": A clear, specific title (e.g. "[Memory] MEMORY.md timestamp not updating")\n'
                '- "body": A detailed body with these sections:\n'
                " **What's happening:** Describe the current (broken) behavior.\n"
                " **Expected behavior:** What should happen instead.\n"
                " **Suggested fix:** Which file(s) to change and what the fix looks like.\n"
                " **Acceptance criteria:** How to verify the fix works.\n"
                '- "category": One of bug, feature, suggestion, maintenance\n\n'
                "Return ONLY a JSON array of objects with keys: "
                '"title", "body", "category"\n'
                "Return [] if nothing is actionable.\n\n"
                f"Recent thoughts:\n{thought_text}\n\nJSON array:"
            )

            raw = await self._call_agent(classify_prompt)
            items = self._parse_issue_items(raw)
            if items is None:
            if not raw or not raw.strip():
                return

            import json

            # Strip markdown code fences if present
            cleaned = raw.strip()
            if cleaned.startswith("```"):
                cleaned = cleaned.split("\n", 1)[-1].rsplit("```", 1)[0].strip()

            items = json.loads(cleaned)
            if not isinstance(items, list) or not items:
                return

            from timmy.mcp_tools import create_gitea_issue_via_mcp

            for item in items[:2]:  # Safety cap
                await self._file_single_issue(item, create_gitea_issue_via_mcp)
                if not isinstance(item, dict):
                    continue
                title = item.get("title", "").strip()
                body = item.get("body", "").strip()
                category = item.get("category", "suggestion").strip()
                if not title or len(title) < 10:
                    continue

                # Validate all referenced file paths exist on disk
                combined = f"{title}\n{body}"
                if not self._references_real_files(combined):
                    logger.info(
                        "Skipped phantom issue: %s (references non-existent files)",
                        title[:60],
                    )
                    continue

                label = category if category in ("bug", "feature") else ""
                result = await create_gitea_issue_via_mcp(title=title, body=body, labels=label)
                logger.info("Thought→Issue: %s → %s", title[:60], result[:80])

        except Exception as exc:
            logger.debug("Thought issue filing skipped: %s", exc)

    def _get_recent_thoughts_for_issues(self):
        """Return recent thoughts if conditions for filing issues are met, else None."""
        interval = settings.thinking_issue_every
        if interval <= 0:
            return None

        count = self.count_thoughts()
        if count == 0 or count % interval != 0:
            return None

        if not settings.gitea_enabled or not settings.gitea_token:
            return None

        recent = self.get_recent_thoughts(limit=interval)
        if len(recent) < interval:
            return None

        return recent

    @staticmethod
    def _build_issue_classify_prompt(recent) -> str:
        """Build the LLM prompt that extracts actionable issues from recent thoughts."""
        thought_text = "\n".join(f"- [{t.seed_type}] {t.content}" for t in reversed(recent))
        return (
            "You are reviewing your own recent thoughts for actionable items.\n"
            "Extract 0-2 items that are CONCRETE bugs, broken features, stale "
            "state, or clear improvement opportunities in your own codebase.\n\n"
            "Rules:\n"
            "- Only include things that could become a real code fix or feature\n"
            "- Skip vague reflections, philosophical musings, or repeated themes\n"
            "- Category must be one of: bug, feature, suggestion, maintenance\n"
            "- ONLY reference files that you are CERTAIN exist in the project\n"
            "- Do NOT invent or guess file paths — if unsure, describe the "
            "area of concern without naming specific files\n\n"
            "For each item, write an ENGINEER-QUALITY issue:\n"
            '- "title": A clear, specific title (e.g. "[Memory] MEMORY.md timestamp not updating")\n'
            '- "body": A detailed body with these sections:\n'
            " **What's happening:** Describe the current (broken) behavior.\n"
            " **Expected behavior:** What should happen instead.\n"
            " **Suggested fix:** Which file(s) to change and what the fix looks like.\n"
            " **Acceptance criteria:** How to verify the fix works.\n"
            '- "category": One of bug, feature, suggestion, maintenance\n\n'
            "Return ONLY a JSON array of objects with keys: "
            '"title", "body", "category"\n'
            "Return [] if nothing is actionable.\n\n"
            f"Recent thoughts:\n{thought_text}\n\nJSON array:"
        )

    @staticmethod
    def _parse_issue_items(raw: str):
        """Strip markdown fences and parse JSON issue list; return None on failure."""
        import json

        if not raw or not raw.strip():
            return None

        cleaned = raw.strip()
        if cleaned.startswith("```"):
            cleaned = cleaned.split("\n", 1)[-1].rsplit("```", 1)[0].strip()

        items = json.loads(cleaned)
        if not isinstance(items, list) or not items:
            return None

        return items

    async def _file_single_issue(self, item: dict, create_fn) -> None:
        """Validate one issue dict and create it via *create_fn* if it passes checks."""
        if not isinstance(item, dict):
            return
        title = item.get("title", "").strip()
        body = item.get("body", "").strip()
        category = item.get("category", "suggestion").strip()
        if not title or len(title) < 10:
            return

        combined = f"{title}\n{body}"
        if not self._references_real_files(combined):
            logger.info(
                "Skipped phantom issue: %s (references non-existent files)",
                title[:60],
            )
            return

        label = category if category in ("bug", "feature") else ""
        result = await create_fn(title=title, body=body, labels=label)
        logger.info("Thought→Issue: %s → %s", title[:60], result[:80])

    # ── System snapshot helpers ────────────────────────────────────────────

    def _snap_thought_count(self, now: datetime) -> str | None:
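The fence-stripping step in `_parse_issue_items` can be tried on its own. The sketch below is a standalone variant and makes one change that is an assumption, not the diff's behavior: it catches `json.JSONDecodeError` and returns `None`, whereas the method in the diff lets `json.loads` raise and relies on the caller's `except Exception`. `parse_json_list` and `FENCE` are hypothetical names.

```python
import json

# Three backticks, built at runtime so this example can sit inside a fenced block.
FENCE = "`" * 3


def parse_json_list(raw: str):
    """Strip an optional Markdown code fence and parse a non-empty JSON array.

    Returns the parsed list, or None when the input is blank, is not valid
    JSON, is not a list, or is an empty list.
    """
    if not raw or not raw.strip():
        return None

    cleaned = raw.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (including any language tag),
        # then everything from the last closing fence onward.
        cleaned = cleaned.split("\n", 1)[-1].rsplit(FENCE, 1)[0].strip()

    try:
        items = json.loads(cleaned)
    except json.JSONDecodeError:
        return None

    if not isinstance(items, list) or not items:
        return None
    return items
```

Returning `None` for every failure mode keeps the call site to a single `if items is None: return` check, which matches the docstring of the original helper.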
@@ -110,92 +110,6 @@ async def _get_or_create_label(
    return None


# ---------------------------------------------------------------------------
# Dispatch action helpers
# ---------------------------------------------------------------------------


async def _apply_label_to_issue(
    client: Any,
    base_url: str,
    headers: dict,
    repo: str,
    issue_number: int,
    label_name: str,
) -> bool:
    """Get-or-create the label then apply it to the issue. Returns True on success."""
    label_id = await _get_or_create_label(client, base_url, headers, repo, label_name)
    if label_id is None:
        return False
    resp = await client.post(
        f"{base_url}/repos/{repo}/issues/{issue_number}/labels",
        headers=headers,
        json={"labels": [label_id]},
    )
    return resp.status_code in (200, 201)


async def _post_dispatch_comment(
    client: Any,
    base_url: str,
    headers: dict,
    repo: str,
    issue: TriagedIssue,
    label_name: str,
) -> bool:
    """Post the vassal routing comment. Returns True on success."""
    agent_name = issue.agent_target.value.capitalize()
    comment_body = (
        f"🤖 **Vassal dispatch** → routed to **{agent_name}**\n\n"
        f"Priority score: {issue.priority_score} \n"
        f"Rationale: {issue.rationale} \n"
        f"Label: `{label_name}`"
    )
    resp = await client.post(
        f"{base_url}/repos/{repo}/issues/{issue.number}/comments",
        headers=headers,
        json={"body": comment_body},
    )
    return resp.status_code in (200, 201)


async def _perform_gitea_dispatch(
    issue: TriagedIssue,
    record: DispatchRecord,
) -> None:
    """Apply label and post comment via Gitea. Mutates *record* in-place."""
    try:
        import httpx

        from config import settings
    except ImportError as exc:
        logger.warning("dispatch_issue: missing dependency — %s", exc)
        return

    if not settings.gitea_enabled or not settings.gitea_token:
        logger.info("dispatch_issue: Gitea disabled — skipping label/comment")
        return

    base_url = f"{settings.gitea_url}/api/v1"
    repo = settings.gitea_repo
    headers = {
        "Authorization": f"token {settings.gitea_token}",
        "Content-Type": "application/json",
    }
    label_name = _LABEL_MAP[issue.agent_target]

    try:
        async with httpx.AsyncClient(timeout=15) as client:
            record.label_applied = await _apply_label_to_issue(
                client, base_url, headers, repo, issue.number, label_name
            )
            record.comment_posted = await _post_dispatch_comment(
                client, base_url, headers, repo, issue, label_name
            )
    except Exception as exc:
        logger.warning("dispatch_issue: Gitea action failed — %s", exc)


# ---------------------------------------------------------------------------
# Dispatch action
# ---------------------------------------------------------------------------
@@ -230,7 +144,58 @@ async def dispatch_issue(issue: TriagedIssue) -> DispatchRecord:
        _registry[issue.number] = record
        return record

    await _perform_gitea_dispatch(issue, record)
    try:
        import httpx

        from config import settings
    except ImportError as exc:
        logger.warning("dispatch_issue: missing dependency — %s", exc)
        _registry[issue.number] = record
        return record

    if not settings.gitea_enabled or not settings.gitea_token:
        logger.info("dispatch_issue: Gitea disabled — skipping label/comment")
        _registry[issue.number] = record
        return record

    base_url = f"{settings.gitea_url}/api/v1"
    repo = settings.gitea_repo
    headers = {
        "Authorization": f"token {settings.gitea_token}",
        "Content-Type": "application/json",
    }
    label_name = _LABEL_MAP[issue.agent_target]

    try:
        async with httpx.AsyncClient(timeout=15) as client:
            label_id = await _get_or_create_label(client, base_url, headers, repo, label_name)

            # Apply label
            if label_id is not None:
                resp = await client.post(
                    f"{base_url}/repos/{repo}/issues/{issue.number}/labels",
                    headers=headers,
                    json={"labels": [label_id]},
                )
                record.label_applied = resp.status_code in (200, 201)

            # Post routing comment
            agent_name = issue.agent_target.value.capitalize()
            comment_body = (
                f"🤖 **Vassal dispatch** → routed to **{agent_name}**\n\n"
                f"Priority score: {issue.priority_score} \n"
                f"Rationale: {issue.rationale} \n"
                f"Label: `{label_name}`"
            )
            resp = await client.post(
                f"{base_url}/repos/{repo}/issues/{issue.number}/comments",
                headers=headers,
                json={"body": comment_body},
            )
            record.comment_posted = resp.status_code in (200, 201)

    except Exception as exc:
        logger.warning("dispatch_issue: Gitea action failed — %s", exc)

    _registry[issue.number] = record
    logger.info(
@@ -25,21 +25,15 @@ logger = logging.getLogger(__name__)


class ChatRequest(BaseModel):
    """Incoming chat request payload for the Timmy Serve API."""

    message: str
    stream: bool = False


class ChatResponse(BaseModel):
    """Chat response payload returned by the Timmy Serve API."""

    response: str


class StatusResponse(BaseModel):
    """Service status response with backend information."""

    status: str
    backend: str
@@ -9,9 +9,6 @@ Usage:

import json
import os
import subprocess
import urllib.error
import urllib.request
from pathlib import Path
from typing import Any
@@ -34,37 +31,6 @@ AUTOMATIONS_CONFIG = DEFAULT_CONFIG_DIR / "automations.json"
DAILY_RUN_CONFIG = DEFAULT_CONFIG_DIR / "daily_run.json"
TRIAGE_RULES_CONFIG = DEFAULT_CONFIG_DIR / "triage_rules.yaml"

GITEA_URL = os.environ.get("GITEA_URL", "http://143.198.27.163:3000")
GITEA_REPO = "rockachopa/Timmy-time-dashboard"


def _get_gitea_token() -> str | None:
    """Read the Gitea API token from env or config files."""
    token = os.environ.get("GITEA_TOKEN")
    if token:
        return token.strip()
    for candidate in [
        Path("~/.hermes/gitea_token_vps").expanduser(),
        Path("~/.hermes/gitea_token").expanduser(),
    ]:
        try:
            return candidate.read_text(encoding="utf-8").strip()
        except FileNotFoundError:
            continue
    return None


def _gitea_api_get(endpoint: str) -> Any:
    """GET a Gitea API endpoint and return parsed JSON."""
    url = f"{GITEA_URL}/api/v1{endpoint}"
    token = _get_gitea_token()
    req = urllib.request.Request(url)
    if token:
        req.add_header("Authorization", f"token {token}")
    req.add_header("Accept", "application/json")
    with urllib.request.urlopen(req, timeout=15) as resp:
        return json.loads(resp.read().decode("utf-8"))


def _load_json_config(path: Path) -> dict[str, Any]:
    """Load a JSON config file, returning empty dict on error."""
@@ -95,106 +61,6 @@ def _get_config_dir() -> Path:
    return DEFAULT_CONFIG_DIR


def _load_daily_run_config() -> dict[str, Any]:
    """Load and validate the daily run configuration."""
    config_path = _get_config_dir() / "daily_run.json"
    config = _load_json_config(config_path)

    if not config:
        console.print("[yellow]No daily run configuration found.[/yellow]")
        raise typer.Exit(1)

    return config


def _display_schedules_table(schedules: dict[str, Any]) -> None:
    """Display the daily run schedules in a table."""
    table = Table(title="Daily Run Schedules")
    table.add_column("Schedule", style="cyan")
    table.add_column("Description", style="green")
    table.add_column("Automations", style="yellow")

    for schedule_name, schedule_data in schedules.items():
        automations = schedule_data.get("automations", [])
        table.add_row(
            schedule_name,
            schedule_data.get("description", ""),
            ", ".join(automations) if automations else "—",
        )

    console.print(table)
    console.print()


def _display_triggers_table(triggers: dict[str, Any]) -> None:
    """Display the triggers in a table."""
    trigger_table = Table(title="Triggers")
    trigger_table.add_column("Trigger", style="cyan")
    trigger_table.add_column("Description", style="green")
    trigger_table.add_column("Automations", style="yellow")

    for trigger_name, trigger_data in triggers.items():
        automations = trigger_data.get("automations", [])
        trigger_table.add_row(
            trigger_name,
            trigger_data.get("description", ""),
            ", ".join(automations) if automations else "—",
        )

    console.print(trigger_table)
    console.print()


def _execute_automation(auto: dict[str, Any], verbose: bool) -> None:
    """Execute a single automation and display results."""
    cmd = auto.get("command")
    name = auto.get("name", auto.get("id", "unnamed"))
    if not cmd:
        console.print(f"[yellow]Skipping {name} — no command defined.[/yellow]")
        return

    console.print(f"[cyan]▶ Running: {name}[/cyan]")
    if verbose:
        console.print(f"[dim] $ {cmd}[/dim]")

    try:
        result = subprocess.run(  # noqa: S602
            cmd,
            shell=True,
            capture_output=True,
            text=True,
            timeout=120,
        )
        if result.stdout.strip():
            console.print(result.stdout.strip())
        if result.returncode != 0:
            console.print(f"[red] ✗ {name} exited with code {result.returncode}[/red]")
            if result.stderr.strip():
                console.print(f"[red]{result.stderr.strip()}[/red]")
        else:
            console.print(f"[green] ✓ {name} completed successfully[/green]")
    except subprocess.TimeoutExpired:
        console.print(f"[red] ✗ {name} timed out after 120s[/red]")
    except Exception as exc:
        console.print(f"[red] ✗ {name} failed: {exc}[/red]")


def _execute_all_automations(verbose: bool) -> None:
    """Execute all enabled automations."""
    console.print("[green]Executing daily run automations...[/green]")
    auto_config_path = _get_config_dir() / "automations.json"
    auto_config = _load_json_config(auto_config_path)
    all_automations = auto_config.get("automations", [])
    enabled = [a for a in all_automations if a.get("enabled", False)]

    if not enabled:
        console.print("[yellow]No enabled automations found.[/yellow]")
        return

    for auto in enabled:
        _execute_automation(auto, verbose)


@app.command()
def daily_run(
    dry_run: bool = typer.Option(
@@ -213,22 +79,61 @@ def daily_run(
    console.print("[bold green]Timmy Daily Run[/bold green]")
    console.print()

    config = _load_daily_run_config()
    config_path = _get_config_dir() / "daily_run.json"
    config = _load_json_config(config_path)

    if not config:
        console.print("[yellow]No daily run configuration found.[/yellow]")
        raise typer.Exit(1)

    schedules = config.get("schedules", {})
    triggers = config.get("triggers", {})

    if verbose:
        config_path = _get_config_dir() / "daily_run.json"
        console.print(f"[dim]Config loaded from: {config_path}[/dim]")
        console.print()

    _display_schedules_table(schedules)
    _display_triggers_table(triggers)
    # Show the daily run schedule
    table = Table(title="Daily Run Schedules")
    table.add_column("Schedule", style="cyan")
    table.add_column("Description", style="green")
    table.add_column("Automations", style="yellow")

    for schedule_name, schedule_data in schedules.items():
        automations = schedule_data.get("automations", [])
        table.add_row(
            schedule_name,
            schedule_data.get("description", ""),
            ", ".join(automations) if automations else "—",
        )

    console.print(table)
    console.print()

    # Show triggers
    trigger_table = Table(title="Triggers")
    trigger_table.add_column("Trigger", style="cyan")
    trigger_table.add_column("Description", style="green")
    trigger_table.add_column("Automations", style="yellow")

    for trigger_name, trigger_data in triggers.items():
        automations = trigger_data.get("automations", [])
        trigger_table.add_row(
            trigger_name,
            trigger_data.get("description", ""),
            ", ".join(automations) if automations else "—",
        )

    console.print(trigger_table)
    console.print()

    if dry_run:
        console.print("[yellow]Dry run mode — no actions executed.[/yellow]")
    else:
        _execute_all_automations(verbose)
        console.print("[green]Executing daily run automations...[/green]")
        # TODO: Implement actual automation execution
        # This would call the appropriate scripts from the automations config
        console.print("[dim]Automation execution not yet implemented.[/dim]")


@app.command()
@@ -254,96 +159,9 @@ def log_run(
    console.print(f"[dim]Message:[/dim] {message}")
    console.print()

    logbook_path = Path(".loop/logbook.jsonl")
    logbook_path.parent.mkdir(parents=True, exist_ok=True)
    entry = json.dumps({"timestamp": timestamp, "category": category, "message": message})
    with open(logbook_path, "a", encoding="utf-8") as f:
        f.write(entry + "\n")
    console.print(f"[green]✓[/green] Entry logged to {logbook_path}")


def _show_automations_table(limit: int) -> None:
    """Display active automations from the automations config."""
    config_path = _get_config_dir() / "automations.json"
    config = _load_json_config(config_path)
    enabled = [a for a in config.get("automations", []) if a.get("enabled", False)]

    table = Table(title="Active Automations")
    table.add_column("ID", style="cyan")
    table.add_column("Name", style="green")
    table.add_column("Category", style="yellow")
    table.add_column("Trigger", style="magenta")

    for auto in enabled[:limit]:
        table.add_row(
            auto.get("id", ""),
            auto.get("name", ""),
            "✓" if auto.get("enabled", False) else "✗",
            auto.get("category", ""),
        )

    console.print(table)
    console.print()


def _show_prs_table(limit: int) -> None:
    """Display open pull requests from Gitea."""
    table = Table(title="Open Pull Requests")
    table.add_column("#", style="cyan")
    table.add_column("Title", style="green")
    table.add_column("Author", style="yellow")
    table.add_column("Status", style="magenta")
    try:
        prs = _gitea_api_get(f"/repos/{GITEA_REPO}/pulls?state=open")
        if prs:
            for pr in prs[:limit]:
                table.add_row(
                    str(pr.get("number", "")),
                    pr.get("title", ""),
                    pr.get("user", {}).get("login", ""),
                    pr.get("state", ""),
                )
        else:
            table.add_row("—", "[dim]No open PRs[/dim]", "—", "—")
    except Exception as exc:
        table.add_row("—", f"[red]Error fetching PRs: {exc}[/red]", "—", "—")
    console.print(table)
    console.print()


def _show_issues_table(limit: int) -> None:
    """Display open issues from Gitea."""
    table = Table(title="Issues Calling for Attention")
    table.add_column("#", style="cyan")
    table.add_column("Title", style="green")
    table.add_column("Type", style="yellow")
    table.add_column("Priority", style="magenta")
    try:
        issues = _gitea_api_get(f"/repos/{GITEA_REPO}/issues?state=open&type=issues&limit={limit}")
        if issues:
            for issue in issues[:limit]:
                labels = [lb.get("name", "") for lb in issue.get("labels", [])]
                priority = next((lb for lb in labels if "priority" in lb.lower()), "—")
                issue_type = next(
                    (
                        lb
                        for lb in labels
                        if lb.lower() in ("bug", "feature", "refactor", "enhancement")
                    ),
                    "—",
                )
                table.add_row(
                    str(issue.get("number", "")),
                    issue.get("title", ""),
                    issue_type,
                    priority,
                )
        else:
            table.add_row("—", "[dim]No open issues[/dim]", "—", "—")
    except Exception as exc:
        table.add_row("—", f"[red]Error fetching issues: {exc}[/red]", "—", "—")
    console.print(table)
    console.print()
    # TODO: Persist to actual logbook file
    # This would append to a logbook file (e.g., .loop/logbook.jsonl)
    console.print("[green]✓[/green] Entry logged (simulated)")


@app.command()
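The `log_run` change in this hunk switches between a simulated log message and a real append to `.loop/logbook.jsonl`. A minimal sketch of the JSON Lines pattern is given here: one JSON object per line, appended, then read back. `append_entry` and `read_entries` are hypothetical helper names; only the path layout and the three keys (`timestamp`, `category`, `message`) come from the diff.

```python
import json
from pathlib import Path


def append_entry(logbook_path: Path, timestamp: str, category: str, message: str) -> None:
    """Append one entry to a JSON Lines logbook, creating parent directories if needed."""
    logbook_path.parent.mkdir(parents=True, exist_ok=True)
    entry = json.dumps({"timestamp": timestamp, "category": category, "message": message})
    # Append mode keeps earlier entries; each entry occupies exactly one line.
    with open(logbook_path, "a", encoding="utf-8") as f:
        f.write(entry + "\n")


def read_entries(logbook_path: Path) -> list[dict]:
    """Read all entries back, skipping blank lines; a missing file yields an empty list."""
    if not logbook_path.exists():
        return []
    with open(logbook_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Because `json.dumps` escapes newlines inside strings, a multi-line `message` still serializes to a single line, so the one-entry-per-line invariant holds.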
@@ -362,13 +180,54 @@ def inbox(
    console.print("[bold green]Timmy Inbox[/bold green]")
    console.print()

    _show_automations_table(limit)
    # Load automations to show what's enabled
    config_path = _get_config_dir() / "automations.json"
    config = _load_json_config(config_path)

    automations = config.get("automations", [])
    enabled_automations = [a for a in automations if a.get("enabled", False)]

    # Show automation status
    auto_table = Table(title="Active Automations")
    auto_table.add_column("ID", style="cyan")
    auto_table.add_column("Name", style="green")
    auto_table.add_column("Category", style="yellow")
    auto_table.add_column("Trigger", style="magenta")

    for auto in enabled_automations[:limit]:
        auto_table.add_row(
            auto.get("id", ""),
            auto.get("name", ""),
            "✓" if auto.get("enabled", False) else "✗",
            auto.get("category", ""),
        )

    console.print(auto_table)
    console.print()

    # TODO: Fetch actual PRs from Gitea API
    if include_prs:
        _show_prs_table(limit)
        pr_table = Table(title="Open Pull Requests (placeholder)")
        pr_table.add_column("#", style="cyan")
        pr_table.add_column("Title", style="green")
        pr_table.add_column("Author", style="yellow")
        pr_table.add_column("Status", style="magenta")
        pr_table.add_row("—", "[dim]No PRs fetched (Gitea API not configured)[/dim]", "—", "—")
        console.print(pr_table)
        console.print()

    # TODO: Fetch relevant issues from Gitea API
    if include_issues:
        _show_issues_table(limit)
        issue_table = Table(title="Issues Calling for Attention (placeholder)")
        issue_table.add_column("#", style="cyan")
        issue_table.add_column("Title", style="green")
        issue_table.add_column("Type", style="yellow")
        issue_table.add_column("Priority", style="magenta")
        issue_table.add_row(
            "—", "[dim]No issues fetched (Gitea API not configured)[/dim]", "—", "—"
        )
        console.print(issue_table)
        console.print()


@app.command()
@@ -2664,53 +2664,3 @@
  color: var(--bg-deep);
}
.vs-btn-save:hover { opacity: 0.85; }

/* ── Nexus ────────────────────────────────────────────────── */
.nexus-layout { max-width: 1400px; margin: 0 auto; }

.nexus-header { border-bottom: 1px solid var(--border); padding-bottom: 0.5rem; }
.nexus-title { font-size: 1.4rem; font-weight: 700; color: var(--purple); letter-spacing: 0.1em; }
.nexus-subtitle { font-size: 0.8rem; color: var(--text-dim); margin-top: 0.2rem; }

.nexus-grid {
  display: grid;
  grid-template-columns: 1fr 320px;
  gap: 1rem;
  align-items: start;
}
@media (max-width: 900px) {
  .nexus-grid { grid-template-columns: 1fr; }
}

.nexus-chat-panel { height: calc(100vh - 180px); display: flex; flex-direction: column; }
.nexus-chat-panel .card-body { overflow-y: auto; flex: 1; }

.nexus-empty-state {
  color: var(--text-dim);
  font-size: 0.85rem;
  font-style: italic;
  padding: 1rem 0;
  text-align: center;
}

/* Memory sidebar */
.nexus-memory-hits { font-size: 0.78rem; }
.nexus-memory-label { color: var(--text-dim); font-size: 0.72rem; margin-bottom: 0.4rem; letter-spacing: 0.05em; }
.nexus-memory-hit { display: flex; gap: 0.4rem; margin-bottom: 0.35rem; align-items: flex-start; }
.nexus-memory-type { color: var(--purple); font-size: 0.68rem; white-space: nowrap; padding-top: 0.1rem; min-width: 60px; }
.nexus-memory-content { color: var(--text); line-height: 1.4; }

/* Teaching panel */
.nexus-facts-header { font-size: 0.7rem; color: var(--text-dim); letter-spacing: 0.08em; margin-bottom: 0.4rem; }
.nexus-facts-list { list-style: none; padding: 0; margin: 0; font-size: 0.8rem; }
.nexus-fact-item { color: var(--text); border-bottom: 1px solid var(--border); padding: 0.3rem 0; }
.nexus-fact-empty { color: var(--text-dim); font-style: italic; }
.nexus-taught-confirm {
  font-size: 0.8rem;
  color: var(--green);
  background: rgba(0,255,136,0.06);
  border: 1px solid var(--green);
  border-radius: 4px;
  padding: 0.3rem 0.6rem;
  margin-bottom: 0.5rem;
}
@@ -86,19 +86,6 @@
        <p>Your task has been added to the queue. Timmy will review it shortly.</p>
        <button type="button" id="submit-another-btn" class="btn-primary">Submit Another</button>
      </div>

      <div id="submit-job-queued" class="submit-job-queued hidden">
        <div class="queued-icon">
          <svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
            <circle cx="12" cy="12" r="10"></circle>
            <polyline points="12 6 12 12 16 14"></polyline>
          </svg>
        </div>
        <h3>Job Queued</h3>
        <p>The server is unreachable right now. Your job has been saved locally and will be submitted automatically when the connection is restored.</p>
        <div id="queue-count-display" class="queue-count-display"></div>
        <button type="button" id="submit-another-queued-btn" class="btn-primary">Submit Another</button>
      </div>
    </div>
    <div id="submit-job-backdrop" class="submit-job-backdrop"></div>
  </div>
@@ -155,7 +142,6 @@
    import { createFamiliar } from "./familiar.js";
    import { setupControls } from "./controls.js";
    import { StateReader } from "./state.js";
    import { messageQueue } from "./queue.js";

    // --- Renderer ---
    const renderer = new THREE.WebGLRenderer({ antialias: true });
@@ -196,60 +182,8 @@
    moodEl.textContent = state.timmyState.mood;
  }
});

// Replay queued jobs whenever the server comes back online.
stateReader.onConnectionChange(async (online) => {
  if (!online) return;
  const pending = messageQueue.getPending();
  if (pending.length === 0) return;
  console.log(`[queue] Online — replaying ${pending.length} queued job(s)`);
  for (const item of pending) {
    try {
      const response = await fetch("/api/tasks", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(item.payload),
      });
      if (response.ok) {
        messageQueue.markDelivered(item.id);
        console.log(`[queue] Delivered queued job ${item.id}`);
      } else {
        messageQueue.markFailed(item.id);
        console.warn(`[queue] Failed to deliver job ${item.id}: ${response.status}`);
      }
    } catch (err) {
      // Still offline — leave as QUEUED, will retry next cycle.
      console.warn(`[queue] Replay aborted (still offline): ${err}`);
      break;
    }
  }
  messageQueue.prune();
  _updateQueueBadge();
});

stateReader.connect();

// --- Queue badge (top-right indicator for pending jobs) ---
function _updateQueueBadge() {
  const count = messageQueue.pendingCount();
  let badge = document.getElementById("queue-badge");
  if (count === 0) {
    if (badge) badge.remove();
    return;
  }
  if (!badge) {
    badge = document.createElement("div");
    badge.id = "queue-badge";
    badge.className = "queue-badge";
    badge.title = "Jobs queued offline — will submit on reconnect";
    document.getElementById("overlay").appendChild(badge);
  }
  badge.textContent = `${count} queued`;
}
// Show badge on load if there are already queued messages.
messageQueue.prune();
_updateQueueBadge();

// --- About Panel ---
const infoBtn = document.getElementById("info-btn");
const aboutPanel = document.getElementById("about-panel");
@@ -294,9 +228,6 @@
const descWarning = document.getElementById("desc-warning");
const submitJobSuccess = document.getElementById("submit-job-success");
const submitAnotherBtn = document.getElementById("submit-another-btn");
const submitJobQueued = document.getElementById("submit-job-queued");
const submitAnotherQueuedBtn = document.getElementById("submit-another-queued-btn");
const queueCountDisplay = document.getElementById("queue-count-display");

// Constants
const MAX_TITLE_LENGTH = 200;
@@ -324,7 +255,6 @@
  submitJobForm.reset();
  submitJobForm.classList.remove("hidden");
  submitJobSuccess.classList.add("hidden");
  submitJobQueued.classList.add("hidden");
  updateCharCounts();
  clearErrors();
  validateForm();
@@ -433,7 +363,6 @@
submitJobBackdrop.addEventListener("click", closeSubmitJobModal);
cancelJobBtn.addEventListener("click", closeSubmitJobModal);
submitAnotherBtn.addEventListener("click", resetForm);
submitAnotherQueuedBtn.addEventListener("click", resetForm);

// Input event listeners for real-time validation
jobTitle.addEventListener("input", () => {
@@ -491,10 +420,9 @@
      headers: {
        "Content-Type": "application/json",
      },
      body: JSON.stringify(formData),
      signal: AbortSignal.timeout(8000),
      body: JSON.stringify(formData)
    });


    if (response.ok) {
      // Show success state
      submitJobForm.classList.add("hidden");
@@ -505,14 +433,9 @@
      descError.classList.add("visible");
    }
  } catch (error) {
    // Server unreachable — persist to localStorage queue.
    messageQueue.enqueue(formData);
    const count = messageQueue.pendingCount();
    // For demo/development, show success even if API fails
    submitJobForm.classList.add("hidden");
    submitJobQueued.classList.remove("hidden");
    queueCountDisplay.textContent =
      count > 1 ? `${count} jobs queued` : "1 job queued";
    _updateQueueBadge();
    submitJobSuccess.classList.remove("hidden");
  } finally {
    submitJobSubmit.disabled = false;
    submitJobSubmit.textContent = "Submit Job";

@@ -1,90 +0,0 @@
/**
 * Offline message queue for Workshop panel.
 *
 * Persists undelivered job submissions to localStorage so they survive
 * page refreshes and are replayed when the server comes back online.
 */

const _QUEUE_KEY = "timmy_workshop_queue";
const _MAX_AGE_MS = 24 * 60 * 60 * 1000; // 24 hours — auto-expire old items

export const STATUS = {
  QUEUED: "queued",
  DELIVERED: "delivered",
  FAILED: "failed",
};

function _load() {
  try {
    const raw = localStorage.getItem(_QUEUE_KEY);
    return raw ? JSON.parse(raw) : [];
  } catch {
    return [];
  }
}

function _save(items) {
  try {
    localStorage.setItem(_QUEUE_KEY, JSON.stringify(items));
  } catch {
    /* localStorage unavailable — degrade silently */
  }
}

function _uid() {
  return `msg_${Date.now()}_${Math.random().toString(36).slice(2, 8)}`;
}

/** LocalStorage-backed message queue for Workshop job submissions. */
export const messageQueue = {
  /** Add a payload. Returns the created item (with id and status). */
  enqueue(payload) {
    const item = {
      id: _uid(),
      payload,
      queuedAt: new Date().toISOString(),
      status: STATUS.QUEUED,
    };
    const items = _load();
    items.push(item);
    _save(items);
    return item;
  },

  /** Mark a message as delivered and remove it from storage. */
  markDelivered(id) {
    _save(_load().filter((i) => i.id !== id));
  },

  /** Mark a message as permanently failed (kept for 24h for visibility). */
  markFailed(id) {
    _save(
      _load().map((i) =>
        i.id === id ? { ...i, status: STATUS.FAILED } : i
      )
    );
  },

  /** All messages waiting to be delivered. */
  getPending() {
    return _load().filter((i) => i.status === STATUS.QUEUED);
  },

  /** Total queued (QUEUED status only) count. */
  pendingCount() {
    return this.getPending().length;
  },

  /** Drop expired failed items (> 24h old). */
  prune() {
    const cutoff = Date.now() - _MAX_AGE_MS;
    _save(
      _load().filter(
        (i) =>
          i.status === STATUS.QUEUED ||
          (i.status === STATUS.FAILED &&
            new Date(i.queuedAt).getTime() > cutoff)
      )
    );
  },
};
@@ -3,10 +3,6 @@
 *
 * Provides Timmy's current state to the scene. In Phase 2 this is a
 * static default; the WebSocket path is stubbed for future use.
 *
 * Also manages connection health monitoring: pings /api/matrix/health
 * every 30 seconds and notifies listeners when online/offline state
 * changes so the Workshop can replay any queued messages.
 */

const DEFAULTS = {
@@ -24,19 +20,11 @@ const DEFAULTS = {
  version: 1,
};

const _HEALTH_URL = "/api/matrix/health";
const _PING_INTERVAL_MS = 30_000;
const _WS_RECONNECT_DELAY_MS = 5_000;

export class StateReader {
  constructor() {
    this.state = { ...DEFAULTS };
    this.listeners = [];
    this.connectionListeners = [];
    this._ws = null;
    this._online = false;
    this._pingTimer = null;
    this._reconnectTimer = null;
  }

  /** Subscribe to state changes. */
@@ -44,12 +32,7 @@ export class StateReader {
    this.listeners.push(fn);
  }

  /** Subscribe to online/offline transitions. Called with (isOnline: bool). */
  onConnectionChange(fn) {
    this.connectionListeners.push(fn);
  }

  /** Notify all state listeners. */
  /** Notify all listeners. */
  _notify() {
    for (const fn of this.listeners) {
      try {
@@ -60,48 +43,8 @@ export class StateReader {
    }
  }

  /** Fire connection listeners only when state actually changes. */
  _notifyConnection(online) {
    if (online === this._online) return;
    this._online = online;
    for (const fn of this.connectionListeners) {
      try {
        fn(online);
      } catch (e) {
        console.warn("Connection listener error:", e);
      }
    }
  }

  /** Ping the health endpoint once and update connection state. */
  async _ping() {
    try {
      const r = await fetch(_HEALTH_URL, {
        signal: AbortSignal.timeout(5000),
      });
      this._notifyConnection(r.ok);
    } catch {
      this._notifyConnection(false);
    }
  }

  /** Start 30-second health-check loop (idempotent). */
  _startHealthCheck() {
    if (this._pingTimer) return;
    this._pingTimer = setInterval(() => this._ping(), _PING_INTERVAL_MS);
  }

  /** Schedule a WebSocket reconnect attempt after a delay (idempotent). */
  _scheduleReconnect() {
    if (this._reconnectTimer) return;
    this._reconnectTimer = setTimeout(() => {
      this._reconnectTimer = null;
      this._connectWS();
    }, _WS_RECONNECT_DELAY_MS);
  }

  /** Open (or re-open) the WebSocket connection. */
  _connectWS() {
  /** Try to connect to the world WebSocket for live updates. */
  connect() {
    const proto = location.protocol === "https:" ? "wss:" : "ws:";
    const url = `${proto}//${location.host}/api/world/ws`;
    try {
@@ -109,13 +52,10 @@ export class StateReader {
      this._ws.onopen = () => {
        const dot = document.getElementById("connection-dot");
        if (dot) dot.classList.add("connected");
        this._notifyConnection(true);
      };
      this._ws.onclose = () => {
        const dot = document.getElementById("connection-dot");
        if (dot) dot.classList.remove("connected");
        this._notifyConnection(false);
        this._scheduleReconnect();
      };
      this._ws.onmessage = (ev) => {
        try {
@@ -135,18 +75,9 @@ export class StateReader {
      };
    } catch (e) {
      console.warn("WebSocket unavailable — using static state");
      this._scheduleReconnect();
    }
  }

  /** Connect to the world WebSocket and start health-check polling. */
  connect() {
    this._connectWS();
    this._startHealthCheck();
    // Immediate ping so connection status is known before the first interval.
    this._ping();
  }

  /** Current mood string. */
  get mood() {
    return this.state.timmyState.mood;
@@ -161,9 +92,4 @@ export class StateReader {
  get energy() {
    return this.state.timmyState.energy;
  }

  /** Whether the server is currently reachable. */
  get isOnline() {
    return this._online;
  }
}

@@ -604,68 +604,6 @@ canvas {
  opacity: 1;
}

/* Queued State (offline buffer) */
.submit-job-queued {
  text-align: center;
  padding: 32px 16px;
}

.submit-job-queued.hidden {
  display: none;
}

.queued-icon {
  width: 64px;
  height: 64px;
  margin: 0 auto 20px;
  color: #ffaa33;
}

.queued-icon svg {
  width: 100%;
  height: 100%;
}

.submit-job-queued h3 {
  font-size: 20px;
  color: #ffaa33;
  margin: 0 0 12px 0;
}

.submit-job-queued p {
  font-size: 14px;
  color: #888;
  margin: 0 0 16px 0;
  line-height: 1.5;
}

.queue-count-display {
  font-size: 12px;
  color: #ffaa33;
  margin-bottom: 24px;
  opacity: 0.8;
}

/* Queue badge — shown in overlay corner when offline jobs are pending */
.queue-badge {
  position: absolute;
  bottom: 16px;
  right: 16px;
  padding: 4px 10px;
  background: rgba(10, 10, 20, 0.85);
  border: 1px solid rgba(255, 170, 51, 0.6);
  border-radius: 12px;
  color: #ffaa33;
  font-size: 11px;
  pointer-events: none;
  animation: queue-pulse 2s ease-in-out infinite;
}

@keyframes queue-pulse {
  0%, 100% { opacity: 0.8; }
  50% { opacity: 1; }
}

/* Mobile adjustments */
@media (max-width: 480px) {
  .about-panel-content {

@@ -51,9 +51,6 @@ def pytest_collection_modifyitems(config, items):
            item.add_marker(pytest.mark.docker)
            item.add_marker(pytest.mark.skip_ci)

        if "setup_prod" in test_path or "setup_script" in test_path:
            item.add_marker(pytest.mark.skip_ci)

        if "ollama" in test_path or "test_ollama" in item.name:
            item.add_marker(pytest.mark.ollama)


@@ -1,527 +0,0 @@
"""Unit tests for dashboard/routes/daily_run.py."""

from __future__ import annotations

import json
from datetime import UTC, datetime, timedelta
from unittest.mock import MagicMock, patch
from urllib.error import URLError

from dashboard.routes.daily_run import (
    DEFAULT_CONFIG,
    LAYER_LABELS,
    DailyRunMetrics,
    GiteaClient,
    LayerMetrics,
    _extract_layer,
    _fetch_layer_metrics,
    _get_metrics,
    _get_token,
    _load_config,
    _load_cycle_data,
)

# ---------------------------------------------------------------------------
# _load_config
# ---------------------------------------------------------------------------


def test_load_config_returns_defaults():
    with patch("dashboard.routes.daily_run.CONFIG_PATH") as mock_path:
        mock_path.exists.return_value = False
        config = _load_config()
    assert config["gitea_api"] == DEFAULT_CONFIG["gitea_api"]
    assert config["repo_slug"] == DEFAULT_CONFIG["repo_slug"]


def test_load_config_merges_file_orchestrator_section(tmp_path):
    config_file = tmp_path / "daily_run.json"
    config_file.write_text(
        json.dumps(
            {"orchestrator": {"repo_slug": "custom/repo", "gitea_api": "http://custom:3000/api/v1"}}
        )
    )
    with patch("dashboard.routes.daily_run.CONFIG_PATH", config_file):
        config = _load_config()
    assert config["repo_slug"] == "custom/repo"
    assert config["gitea_api"] == "http://custom:3000/api/v1"


def test_load_config_ignores_invalid_json(tmp_path):
    config_file = tmp_path / "daily_run.json"
    config_file.write_text("not valid json{{")
    with patch("dashboard.routes.daily_run.CONFIG_PATH", config_file):
        config = _load_config()
    assert config["repo_slug"] == DEFAULT_CONFIG["repo_slug"]


def test_load_config_env_overrides(monkeypatch):
    monkeypatch.setenv("TIMMY_GITEA_API", "http://envapi:3000/api/v1")
    monkeypatch.setenv("TIMMY_REPO_SLUG", "env/repo")
    monkeypatch.setenv("TIMMY_GITEA_TOKEN", "env-token-123")
    with patch("dashboard.routes.daily_run.CONFIG_PATH") as mock_path:
        mock_path.exists.return_value = False
        config = _load_config()
    assert config["gitea_api"] == "http://envapi:3000/api/v1"
    assert config["repo_slug"] == "env/repo"
    assert config["token"] == "env-token-123"


def test_load_config_no_env_overrides_without_vars(monkeypatch):
    monkeypatch.delenv("TIMMY_GITEA_API", raising=False)
    monkeypatch.delenv("TIMMY_REPO_SLUG", raising=False)
    monkeypatch.delenv("TIMMY_GITEA_TOKEN", raising=False)
    with patch("dashboard.routes.daily_run.CONFIG_PATH") as mock_path:
        mock_path.exists.return_value = False
        config = _load_config()
    assert "token" not in config


# ---------------------------------------------------------------------------
# _get_token
# ---------------------------------------------------------------------------


def test_get_token_from_config_dict():
    config = {"token": "direct-token", "token_file": "~/.hermes/gitea_token"}
    assert _get_token(config) == "direct-token"


def test_get_token_from_file(tmp_path):
    token_file = tmp_path / "token.txt"
    token_file.write_text(" file-token \n")
    config = {"token_file": str(token_file)}
    assert _get_token(config) == "file-token"


def test_get_token_returns_none_when_file_missing(tmp_path):
    config = {"token_file": str(tmp_path / "nonexistent_token")}
    assert _get_token(config) is None


# ---------------------------------------------------------------------------
# GiteaClient
# ---------------------------------------------------------------------------


def _make_client(**kwargs) -> GiteaClient:
    config = {**DEFAULT_CONFIG, **kwargs}
    return GiteaClient(config, token="test-token")


def test_gitea_client_headers_include_auth():
    client = _make_client()
    headers = client._headers()
    assert headers["Authorization"] == "token test-token"
    assert headers["Accept"] == "application/json"


def test_gitea_client_headers_no_token():
    config = {**DEFAULT_CONFIG}
    client = GiteaClient(config, token=None)
    headers = client._headers()
    assert "Authorization" not in headers


def test_gitea_client_api_url():
    client = _make_client()
    url = client._api_url("issues")
    assert url == f"{DEFAULT_CONFIG['gitea_api']}/repos/{DEFAULT_CONFIG['repo_slug']}/issues"


def test_gitea_client_api_url_strips_trailing_slash():
    config = {**DEFAULT_CONFIG, "gitea_api": "http://localhost:3000/api/v1/"}
    client = GiteaClient(config, token=None)
    url = client._api_url("issues")
    assert "//" not in url.replace("http://", "")


def test_gitea_client_is_available_true():
    client = _make_client()
    mock_resp = MagicMock()
    mock_resp.status = 200
    mock_resp.__enter__ = lambda s: mock_resp
    mock_resp.__exit__ = MagicMock(return_value=False)
    with patch("dashboard.routes.daily_run.urlopen", return_value=mock_resp):
        assert client.is_available() is True


def test_gitea_client_is_available_cached():
    client = _make_client()
    client._available = True
    # Should not call urlopen at all
    with patch("dashboard.routes.daily_run.urlopen") as mock_urlopen:
        assert client.is_available() is True
        mock_urlopen.assert_not_called()


def test_gitea_client_is_available_false_on_url_error():
    client = _make_client()
    with patch("dashboard.routes.daily_run.urlopen", side_effect=URLError("refused")):
        assert client.is_available() is False


def test_gitea_client_is_available_false_on_timeout():
    client = _make_client()
    with patch("dashboard.routes.daily_run.urlopen", side_effect=TimeoutError()):
        assert client.is_available() is False


def test_gitea_client_get_paginated_single_page():
    client = _make_client()
    mock_resp = MagicMock()
    mock_resp.read.return_value = json.dumps([{"id": 1}, {"id": 2}]).encode()
    mock_resp.__enter__ = lambda s: mock_resp
    mock_resp.__exit__ = MagicMock(return_value=False)
    with patch("dashboard.routes.daily_run.urlopen", return_value=mock_resp):
        result = client.get_paginated("issues")
    assert len(result) == 2
    assert result[0]["id"] == 1


def test_gitea_client_get_paginated_empty():
    client = _make_client()
    mock_resp = MagicMock()
    mock_resp.read.return_value = b"[]"
    mock_resp.__enter__ = lambda s: mock_resp
    mock_resp.__exit__ = MagicMock(return_value=False)
    with patch("dashboard.routes.daily_run.urlopen", return_value=mock_resp):
        result = client.get_paginated("issues")
    assert result == []


# ---------------------------------------------------------------------------
# LayerMetrics.trend
# ---------------------------------------------------------------------------


def test_layer_metrics_trend_no_previous_no_current():
    lm = LayerMetrics(name="triage", label="layer:triage", current_count=0, previous_count=0)
    assert lm.trend == "→"


def test_layer_metrics_trend_no_previous_with_current():
    lm = LayerMetrics(name="triage", label="layer:triage", current_count=5, previous_count=0)
    assert lm.trend == "↑"


def test_layer_metrics_trend_big_increase():
    lm = LayerMetrics(name="triage", label="layer:triage", current_count=130, previous_count=100)
    assert lm.trend == "↑↑"


def test_layer_metrics_trend_small_increase():
    lm = LayerMetrics(name="triage", label="layer:triage", current_count=108, previous_count=100)
    assert lm.trend == "↑"


def test_layer_metrics_trend_stable():
    lm = LayerMetrics(name="triage", label="layer:triage", current_count=100, previous_count=100)
    assert lm.trend == "→"


def test_layer_metrics_trend_small_decrease():
    lm = LayerMetrics(name="triage", label="layer:triage", current_count=92, previous_count=100)
    assert lm.trend == "↓"


def test_layer_metrics_trend_big_decrease():
    lm = LayerMetrics(name="triage", label="layer:triage", current_count=70, previous_count=100)
    assert lm.trend == "↓↓"


def test_layer_metrics_trend_color_up():
    lm = LayerMetrics(name="triage", label="layer:triage", current_count=200, previous_count=100)
    assert lm.trend_color == "var(--green)"


def test_layer_metrics_trend_color_down():
    lm = LayerMetrics(name="triage", label="layer:triage", current_count=50, previous_count=100)
    assert lm.trend_color == "var(--amber)"


def test_layer_metrics_trend_color_stable():
    lm = LayerMetrics(name="triage", label="layer:triage", current_count=100, previous_count=100)
    assert lm.trend_color == "var(--text-dim)"


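The trend tests above pin down the arrow for seven input pairs but do not show the thresholds themselves. As a minimal sketch of a function that satisfies those cases, assuming a 20% cutoff for a double arrow and a 5% cutoff for a single arrow (both values, and the name `trend_arrow`, are hypothetical and not taken from `dashboard/routes/daily_run.py`):

```python
def trend_arrow(current: int, previous: int, big: float = 0.20, small: float = 0.05) -> str:
    """Return a trend arrow comparing current to previous.

    The 20% / 5% thresholds are assumptions chosen to agree with the
    test cases; the real implementation may use different values.
    """
    if previous == 0:
        # No baseline: any activity counts as "up", none as "flat".
        return "↑" if current > 0 else "→"
    change = (current - previous) / previous
    if change >= big:
        return "↑↑"
    if change >= small:
        return "↑"
    if change <= -big:
        return "↓↓"
    if change <= -small:
        return "↓"
    return "→"
```

Handling `previous == 0` first avoids a division by zero, which is the case the two `no_previous` tests exercise.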
# ---------------------------------------------------------------------------
# DailyRunMetrics.sessions_trend
# ---------------------------------------------------------------------------


def _make_daily_metrics(**kwargs) -> DailyRunMetrics:
    defaults = dict(
        sessions_completed=10,
        sessions_previous=8,
        layers=[],
        total_touched_current=20,
        total_touched_previous=15,
        lookback_days=7,
        generated_at=datetime.now(UTC).isoformat(),
    )
    defaults.update(kwargs)
    return DailyRunMetrics(**defaults)


def test_daily_metrics_sessions_trend_big_increase():
    m = _make_daily_metrics(sessions_completed=130, sessions_previous=100)
    assert m.sessions_trend == "↑↑"


def test_daily_metrics_sessions_trend_stable():
    m = _make_daily_metrics(sessions_completed=100, sessions_previous=100)
    assert m.sessions_trend == "→"


def test_daily_metrics_sessions_trend_no_previous_zero_completed():
    m = _make_daily_metrics(sessions_completed=0, sessions_previous=0)
    assert m.sessions_trend == "→"


def test_daily_metrics_sessions_trend_no_previous_with_completed():
    m = _make_daily_metrics(sessions_completed=5, sessions_previous=0)
    assert m.sessions_trend == "↑"


def test_daily_metrics_sessions_trend_color_green():
    m = _make_daily_metrics(sessions_completed=200, sessions_previous=100)
    assert m.sessions_trend_color == "var(--green)"


def test_daily_metrics_sessions_trend_color_amber():
    m = _make_daily_metrics(sessions_completed=50, sessions_previous=100)
    assert m.sessions_trend_color == "var(--amber)"


# ---------------------------------------------------------------------------
# _extract_layer
# ---------------------------------------------------------------------------


def test_extract_layer_finds_layer_label():
    labels = [{"name": "bug"}, {"name": "layer:triage"}, {"name": "urgent"}]
    assert _extract_layer(labels) == "triage"


def test_extract_layer_returns_none_when_no_layer():
    labels = [{"name": "bug"}, {"name": "feature"}]
    assert _extract_layer(labels) is None


def test_extract_layer_empty_labels():
    assert _extract_layer([]) is None


def test_extract_layer_first_match_wins():
    labels = [{"name": "layer:micro-fix"}, {"name": "layer:tests"}]
    assert _extract_layer(labels) == "micro-fix"


# ---------------------------------------------------------------------------
# _load_cycle_data
# ---------------------------------------------------------------------------


def test_load_cycle_data_missing_file(tmp_path):
    with patch("dashboard.routes.daily_run.REPO_ROOT", tmp_path):
        result = _load_cycle_data(days=14)
    assert result == {"current": 0, "previous": 0}


def test_load_cycle_data_counts_successful_sessions(tmp_path):
    retro_dir = tmp_path / ".loop" / "retro"
    retro_dir.mkdir(parents=True)
    retro_file = retro_dir / "cycles.jsonl"

    now = datetime.now(UTC)
    recent_ts = (now - timedelta(days=3)).isoformat()
    older_ts = (now - timedelta(days=10)).isoformat()
    old_ts = (now - timedelta(days=20)).isoformat()

    lines = [
        json.dumps({"timestamp": recent_ts, "success": True}),
        json.dumps({"timestamp": recent_ts, "success": False}),  # not counted
        json.dumps({"timestamp": older_ts, "success": True}),
        json.dumps({"timestamp": old_ts, "success": True}),  # outside window
    ]
    retro_file.write_text("\n".join(lines))

    with patch("dashboard.routes.daily_run.REPO_ROOT", tmp_path):
        result = _load_cycle_data(days=7)

    assert result["current"] == 1
    assert result["previous"] == 1


def test_load_cycle_data_skips_invalid_json_lines(tmp_path):
    retro_dir = tmp_path / ".loop" / "retro"
    retro_dir.mkdir(parents=True)
    retro_file = retro_dir / "cycles.jsonl"

    now = datetime.now(UTC)
    recent_ts = (now - timedelta(days=1)).isoformat()
    retro_file.write_text(
        f"not valid json\n{json.dumps({'timestamp': recent_ts, 'success': True})}\n"
    )

    with patch("dashboard.routes.daily_run.REPO_ROOT", tmp_path):
        result = _load_cycle_data(days=7)

    assert result["current"] == 1


def test_load_cycle_data_skips_entries_with_no_timestamp(tmp_path):
    retro_dir = tmp_path / ".loop" / "retro"
    retro_dir.mkdir(parents=True)
    retro_file = retro_dir / "cycles.jsonl"
    retro_file.write_text(json.dumps({"success": True}))

    with patch("dashboard.routes.daily_run.REPO_ROOT", tmp_path):
        result = _load_cycle_data(days=7)

    assert result == {"current": 0, "previous": 0}


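The `_load_cycle_data` tests above imply a two-window count: successful entries from the last `days` days are "current", and those from the `days` before that are "previous". A minimal sketch of that windowing, assuming timezone-aware ISO timestamps as the tests write them (the function name `count_sessions` and its `lines`/`now` parameters are hypothetical; the real function reads `.loop/retro/cycles.jsonl` itself):

```python
import json
from datetime import datetime, timedelta, timezone


def count_sessions(lines, days, now=None):
    """Count successful sessions in the current and previous windows.

    Hypothetical helper: takes JSONL lines directly instead of a file path.
    Assumes every timestamp carries a UTC offset.
    """
    now = now or datetime.now(timezone.utc)
    current_start = now - timedelta(days=days)
    previous_start = now - timedelta(days=2 * days)
    counts = {"current": 0, "previous": 0}
    for line in lines:
        try:
            entry = json.loads(line)
            ts = datetime.fromisoformat(entry["timestamp"])
        except (ValueError, KeyError, TypeError):
            # Skip malformed JSON and entries without a usable timestamp.
            continue
        if not entry.get("success"):
            continue
        if ts >= current_start:
            counts["current"] += 1
        elif ts >= previous_start:
            counts["previous"] += 1
    return counts
```

Passing `now` explicitly keeps the window boundaries deterministic in tests, which the sketch relies on instead of patching `REPO_ROOT`.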
# ---------------------------------------------------------------------------
# _fetch_layer_metrics
# ---------------------------------------------------------------------------


def _make_issue(updated_offset_days: int) -> dict:
    ts = (datetime.now(UTC) - timedelta(days=updated_offset_days)).isoformat()
    return {"updated_at": ts, "labels": [{"name": "layer:triage"}]}


def test_fetch_layer_metrics_counts_current_and_previous():
    client = _make_client()
    client._available = True

    recent_issue = _make_issue(updated_offset_days=3)
    older_issue = _make_issue(updated_offset_days=10)

    with patch.object(client, "get_paginated", return_value=[recent_issue, older_issue]):
        layers, total_current, total_previous = _fetch_layer_metrics(client, lookback_days=7)

    # Should have one entry per LAYER_LABELS
    assert len(layers) == len(LAYER_LABELS)
    triage = next(lm for lm in layers if lm.name == "triage")
    assert triage.current_count == 1
    assert triage.previous_count == 1


def test_fetch_layer_metrics_degrades_on_http_error():
    client = _make_client()
    client._available = True

    with patch.object(client, "get_paginated", side_effect=URLError("network")):
        layers, total_current, total_previous = _fetch_layer_metrics(client, lookback_days=7)

    assert len(layers) == len(LAYER_LABELS)
    for lm in layers:
        assert lm.current_count == 0
        assert lm.previous_count == 0
    assert total_current == 0
    assert total_previous == 0


# ---------------------------------------------------------------------------
# _get_metrics
# ---------------------------------------------------------------------------


def test_get_metrics_returns_none_when_gitea_unavailable():
    with patch("dashboard.routes.daily_run._load_config", return_value=DEFAULT_CONFIG):
        with patch("dashboard.routes.daily_run._get_token", return_value=None):
            with patch.object(GiteaClient, "is_available", return_value=False):
                result = _get_metrics()
    assert result is None


def test_get_metrics_returns_daily_run_metrics():
    mock_layers = [
        LayerMetrics(name="triage", label="layer:triage", current_count=5, previous_count=3)
    ]
    with patch("dashboard.routes.daily_run._load_config", return_value=DEFAULT_CONFIG):
        with patch("dashboard.routes.daily_run._get_token", return_value="tok"):
            with patch.object(GiteaClient, "is_available", return_value=True):
                with patch(
                    "dashboard.routes.daily_run._fetch_layer_metrics",
                    return_value=(mock_layers, 5, 3),
                ):
                    with patch(
                        "dashboard.routes.daily_run._load_cycle_data",
                        return_value={"current": 10, "previous": 8},
                    ):
                        result = _get_metrics(lookback_days=7)

    assert result is not None
    assert result.sessions_completed == 10
    assert result.sessions_previous == 8
    assert result.lookback_days == 7
    assert result.layers == mock_layers


def test_get_metrics_returns_none_on_exception():
    with patch("dashboard.routes.daily_run._load_config", return_value=DEFAULT_CONFIG):
        with patch("dashboard.routes.daily_run._get_token", return_value="tok"):
            with patch.object(GiteaClient, "is_available", return_value=True):
                with patch(
                    "dashboard.routes.daily_run._fetch_layer_metrics",
                    side_effect=Exception("unexpected"),
                ):
                    result = _get_metrics()
    assert result is None


# ---------------------------------------------------------------------------
|
||||
# Route handlers (FastAPI)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def test_daily_run_metrics_api_unavailable(client):
|
||||
with patch("dashboard.routes.daily_run._get_metrics", return_value=None):
|
||||
resp = client.get("/daily-run/metrics")
|
||||
assert resp.status_code == 503
|
||||
data = resp.json()
|
||||
assert data["status"] == "unavailable"
|
||||
|
||||
|
||||
def test_daily_run_metrics_api_returns_json(client):
|
||||
mock_metrics = _make_daily_metrics(
|
||||
layers=[
|
||||
LayerMetrics(name="triage", label="layer:triage", current_count=3, previous_count=2)
|
||||
]
|
||||
)
|
||||
with patch("dashboard.routes.daily_run._get_metrics", return_value=mock_metrics):
|
||||
with patch(
|
||||
"dashboard.routes.quests.check_daily_run_quests",
|
||||
return_value=[],
|
||||
create=True,
|
||||
):
|
||||
resp = client.get("/daily-run/metrics?lookback_days=7")
|
||||
assert resp.status_code == 200
|
||||
data = resp.json()
|
||||
assert data["status"] == "ok"
|
||||
assert data["lookback_days"] == 7
|
||||
assert "sessions" in data
|
||||
assert "layers" in data
|
||||
assert "totals" in data
|
||||
assert len(data["layers"]) == 1
|
||||
assert data["layers"][0]["name"] == "triage"
|
||||
|
||||
|
||||
def test_daily_run_panel_returns_html(client):
|
||||
mock_metrics = _make_daily_metrics()
|
||||
with patch("dashboard.routes.daily_run._get_metrics", return_value=mock_metrics):
|
||||
with patch("dashboard.routes.daily_run._load_config", return_value=DEFAULT_CONFIG):
|
||||
resp = client.get("/daily-run/panel")
|
||||
assert resp.status_code == 200
|
||||
assert "text/html" in resp.headers["content-type"]
|
||||
|
||||
|
||||
def test_daily_run_panel_when_unavailable(client):
|
||||
with patch("dashboard.routes.daily_run._get_metrics", return_value=None):
|
||||
with patch("dashboard.routes.daily_run._load_config", return_value=DEFAULT_CONFIG):
|
||||
resp = client.get("/daily-run/panel")
|
||||
assert resp.status_code == 200
|
||||
@@ -1,72 +0,0 @@
|
||||
"""Tests for the Nexus conversational awareness routes."""
|
||||
|
||||
from unittest.mock import patch
|
||||
|
||||
|
||||
def test_nexus_page_returns_200(client):
|
||||
"""GET /nexus should render without error."""
|
||||
response = client.get("/nexus")
|
||||
assert response.status_code == 200
|
||||
assert "NEXUS" in response.text
|
||||
|
||||
|
||||
def test_nexus_page_contains_chat_form(client):
|
||||
"""Nexus page must include the conversational chat form."""
|
||||
response = client.get("/nexus")
|
||||
assert response.status_code == 200
|
||||
assert "/nexus/chat" in response.text
|
||||
|
||||
|
||||
def test_nexus_page_contains_teach_form(client):
|
||||
"""Nexus page must include the teaching panel form."""
|
||||
response = client.get("/nexus")
|
||||
assert response.status_code == 200
|
||||
assert "/nexus/teach" in response.text
|
||||
|
||||
|
||||
def test_nexus_chat_empty_message_returns_empty(client):
|
||||
"""POST /nexus/chat with blank message returns empty response."""
|
||||
response = client.post("/nexus/chat", data={"message": " "})
|
||||
assert response.status_code == 200
|
||||
assert response.text == ""
|
||||
|
||||
|
||||
def test_nexus_chat_too_long_returns_error(client):
|
||||
"""POST /nexus/chat with overlong message returns error partial."""
|
||||
long_msg = "x" * 10_001
|
||||
response = client.post("/nexus/chat", data={"message": long_msg})
|
||||
assert response.status_code == 200
|
||||
assert "too long" in response.text.lower()
|
||||
|
||||
|
||||
def test_nexus_chat_posts_message(client):
|
||||
"""POST /nexus/chat calls the session chat function and returns a partial."""
|
||||
with patch("dashboard.routes.nexus.chat", return_value="Hello from Timmy"):
|
||||
response = client.post("/nexus/chat", data={"message": "hello"})
|
||||
assert response.status_code == 200
|
||||
assert "hello" in response.text.lower() or "timmy" in response.text.lower()
|
||||
|
||||
|
||||
def test_nexus_teach_stores_fact(client):
|
||||
"""POST /nexus/teach should persist a fact and return confirmation."""
|
||||
with patch("dashboard.routes.nexus.store_personal_fact") as mock_store, \
|
||||
patch("dashboard.routes.nexus.recall_personal_facts_with_ids", return_value=[]):
|
||||
mock_store.return_value = None
|
||||
response = client.post("/nexus/teach", data={"fact": "Timmy loves Python"})
|
||||
assert response.status_code == 200
|
||||
assert "Timmy loves Python" in response.text
|
||||
|
||||
|
||||
def test_nexus_teach_empty_fact_returns_empty(client):
|
||||
"""POST /nexus/teach with blank fact returns empty response."""
|
||||
response = client.post("/nexus/teach", data={"fact": " "})
|
||||
assert response.status_code == 200
|
||||
assert response.text == ""
|
||||
|
||||
|
||||
def test_nexus_clear_history(client):
|
||||
"""DELETE /nexus/history should clear the conversation log."""
|
||||
with patch("dashboard.routes.nexus.reset_session"):
|
||||
response = client.request("DELETE", "/nexus/history")
|
||||
assert response.status_code == 200
|
||||
assert "cleared" in response.text.lower()
|
||||
@@ -11,13 +11,10 @@ PROD_PROJECT_DIR = Path("/home/ubuntu/prod-sovereign-stack")
|
||||
PROD_VAULT_DIR = PROD_PROJECT_DIR / "TimmyVault"
|
||||
SETUP_SCRIPT_PATH = Path("/home/ubuntu/setup_timmy.sh")
|
||||
|
||||
pytestmark = [
|
||||
pytest.mark.skip_ci,
|
||||
pytest.mark.skipif(
|
||||
not SETUP_SCRIPT_PATH.exists(),
|
||||
reason=f"Setup script not found at {SETUP_SCRIPT_PATH}",
|
||||
),
|
||||
]
|
||||
pytestmark = pytest.mark.skipif(
|
||||
not SETUP_SCRIPT_PATH.exists(),
|
||||
reason=f"Setup script not found at {SETUP_SCRIPT_PATH}",
|
||||
)
|
||||
|
||||
|
||||
@pytest.fixture(scope="module", autouse=True)
|
||||
|
||||
@@ -1,509 +0,0 @@
|
||||
"""Unit tests for infrastructure.chat_store module."""
|
||||
|
||||
import threading
|
||||
|
||||
from infrastructure.chat_store import Message, MessageLog, _get_conn
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Message dataclass
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestMessageDataclass:
|
||||
"""Tests for the Message dataclass."""
|
||||
|
||||
def test_message_required_fields(self):
|
||||
"""Message can be created with required fields only."""
|
||||
msg = Message(role="user", content="hello", timestamp="2024-01-01T00:00:00")
|
||||
assert msg.role == "user"
|
||||
assert msg.content == "hello"
|
||||
assert msg.timestamp == "2024-01-01T00:00:00"
|
||||
|
||||
def test_message_default_source(self):
|
||||
"""Message source defaults to 'browser'."""
|
||||
msg = Message(role="user", content="hi", timestamp="2024-01-01T00:00:00")
|
||||
assert msg.source == "browser"
|
||||
|
||||
def test_message_custom_source(self):
|
||||
"""Message source can be overridden."""
|
||||
msg = Message(role="agent", content="reply", timestamp="2024-01-01T00:00:00", source="api")
|
||||
assert msg.source == "api"
|
||||
|
||||
def test_message_equality(self):
|
||||
"""Two Messages with the same fields are equal (dataclass default)."""
|
||||
m1 = Message(role="user", content="x", timestamp="t")
|
||||
m2 = Message(role="user", content="x", timestamp="t")
|
||||
assert m1 == m2
|
||||
|
||||
def test_message_inequality(self):
|
||||
"""Messages with different content are not equal."""
|
||||
m1 = Message(role="user", content="x", timestamp="t")
|
||||
m2 = Message(role="user", content="y", timestamp="t")
|
||||
assert m1 != m2
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# _get_conn context manager
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestGetConnContextManager:
|
||||
"""Tests for the _get_conn context manager."""
|
||||
|
||||
def test_creates_db_file(self, tmp_path):
|
||||
"""_get_conn creates the database file on first use."""
|
||||
db = tmp_path / "chat.db"
|
||||
assert not db.exists()
|
||||
with _get_conn(db) as conn:
|
||||
assert conn is not None
|
||||
assert db.exists()
|
||||
|
||||
def test_creates_parent_directories(self, tmp_path):
|
||||
"""_get_conn creates any missing parent directories."""
|
||||
db = tmp_path / "nested" / "deep" / "chat.db"
|
||||
with _get_conn(db):
|
||||
pass
|
||||
assert db.exists()
|
||||
|
||||
def test_creates_schema(self, tmp_path):
|
||||
"""_get_conn creates the chat_messages table."""
|
||||
db = tmp_path / "chat.db"
|
||||
with _get_conn(db) as conn:
|
||||
tables = conn.execute(
|
||||
"SELECT name FROM sqlite_master WHERE type='table' AND name='chat_messages'"
|
||||
).fetchall()
|
||||
assert len(tables) == 1
|
||||
|
||||
def test_schema_has_expected_columns(self, tmp_path):
|
||||
"""chat_messages table has the expected columns."""
|
||||
db = tmp_path / "chat.db"
|
||||
with _get_conn(db) as conn:
|
||||
info = conn.execute("PRAGMA table_info(chat_messages)").fetchall()
|
||||
col_names = [row["name"] for row in info]
|
||||
assert set(col_names) == {"id", "role", "content", "timestamp", "source"}
|
||||
|
||||
def test_idempotent_schema_creation(self, tmp_path):
|
||||
"""Calling _get_conn twice does not fail (CREATE TABLE IF NOT EXISTS)."""
|
||||
db = tmp_path / "chat.db"
|
||||
with _get_conn(db):
|
||||
pass
|
||||
with _get_conn(db) as conn:
|
||||
# Table still exists and is usable
|
||||
conn.execute("SELECT COUNT(*) FROM chat_messages")
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# MessageLog — basic operations
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestMessageLogAppend:
|
||||
"""Tests for MessageLog.append()."""
|
||||
|
||||
def test_append_single_message(self, tmp_path):
|
||||
"""append() stores a message that can be retrieved."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
log.append("user", "hello", "2024-01-01T00:00:00")
|
||||
messages = log.all()
|
||||
assert len(messages) == 1
|
||||
assert messages[0].role == "user"
|
||||
assert messages[0].content == "hello"
|
||||
assert messages[0].timestamp == "2024-01-01T00:00:00"
|
||||
assert messages[0].source == "browser"
|
||||
log.close()
|
||||
|
||||
def test_append_custom_source(self, tmp_path):
|
||||
"""append() stores the source field correctly."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
log.append("agent", "reply", "2024-01-01T00:00:01", source="api")
|
||||
msg = log.all()[0]
|
||||
assert msg.source == "api"
|
||||
log.close()
|
||||
|
||||
def test_append_multiple_messages_preserves_order(self, tmp_path):
|
||||
"""append() preserves insertion order."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
log.append("user", "first", "2024-01-01T00:00:00")
|
||||
log.append("agent", "second", "2024-01-01T00:00:01")
|
||||
log.append("user", "third", "2024-01-01T00:00:02")
|
||||
messages = log.all()
|
||||
assert [m.content for m in messages] == ["first", "second", "third"]
|
||||
log.close()
|
||||
|
||||
def test_append_persists_across_instances(self, tmp_path):
|
||||
"""Messages appended by one instance are readable by another."""
|
||||
db = tmp_path / "chat.db"
|
||||
log1 = MessageLog(db)
|
||||
log1.append("user", "persisted", "2024-01-01T00:00:00")
|
||||
log1.close()
|
||||
|
||||
log2 = MessageLog(db)
|
||||
messages = log2.all()
|
||||
assert len(messages) == 1
|
||||
assert messages[0].content == "persisted"
|
||||
log2.close()
|
||||
|
||||
|
||||
class TestMessageLogAll:
|
||||
"""Tests for MessageLog.all()."""
|
||||
|
||||
def test_all_on_empty_store_returns_empty_list(self, tmp_path):
|
||||
"""all() returns [] when there are no messages."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
assert log.all() == []
|
||||
log.close()
|
||||
|
||||
def test_all_returns_message_objects(self, tmp_path):
|
||||
"""all() returns a list of Message dataclass instances."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
log.append("user", "hi", "2024-01-01T00:00:00")
|
||||
messages = log.all()
|
||||
assert all(isinstance(m, Message) for m in messages)
|
||||
log.close()
|
||||
|
||||
def test_all_returns_all_messages(self, tmp_path):
|
||||
"""all() returns every stored message."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
for i in range(5):
|
||||
log.append("user", f"msg{i}", f"2024-01-01T00:00:0{i}")
|
||||
assert len(log.all()) == 5
|
||||
log.close()
|
||||
|
||||
|
||||
class TestMessageLogRecent:
|
||||
"""Tests for MessageLog.recent()."""
|
||||
|
||||
def test_recent_on_empty_store_returns_empty_list(self, tmp_path):
|
||||
"""recent() returns [] when there are no messages."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
assert log.recent() == []
|
||||
log.close()
|
||||
|
||||
def test_recent_default_limit(self, tmp_path):
|
||||
"""recent() with default limit returns up to 50 messages."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
for i in range(60):
|
||||
log.append("user", f"msg{i}", f"2024-01-01T00:00:{i:02d}")
|
||||
msgs = log.recent()
|
||||
assert len(msgs) == 50
|
||||
log.close()
|
||||
|
||||
def test_recent_custom_limit(self, tmp_path):
|
||||
"""recent() respects a custom limit."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
for i in range(10):
|
||||
log.append("user", f"msg{i}", f"2024-01-01T00:00:0{i}")
|
||||
msgs = log.recent(limit=3)
|
||||
assert len(msgs) == 3
|
||||
log.close()
|
||||
|
||||
def test_recent_returns_newest_messages(self, tmp_path):
|
||||
"""recent() returns the most-recently-inserted messages."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
for i in range(10):
|
||||
log.append("user", f"msg{i}", f"2024-01-01T00:00:0{i}")
|
||||
msgs = log.recent(limit=3)
|
||||
# Should be the last 3 inserted, in oldest-first order
|
||||
assert [m.content for m in msgs] == ["msg7", "msg8", "msg9"]
|
||||
log.close()
|
||||
|
||||
def test_recent_fewer_than_limit_returns_all(self, tmp_path):
|
||||
"""recent() returns all messages when count < limit."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
log.append("user", "only", "2024-01-01T00:00:00")
|
||||
msgs = log.recent(limit=10)
|
||||
assert len(msgs) == 1
|
||||
log.close()
|
||||
|
||||
def test_recent_returns_oldest_first(self, tmp_path):
|
||||
"""recent() returns messages in oldest-first order."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
log.append("user", "a", "2024-01-01T00:00:00")
|
||||
log.append("user", "b", "2024-01-01T00:00:01")
|
||||
log.append("user", "c", "2024-01-01T00:00:02")
|
||||
msgs = log.recent(limit=2)
|
||||
assert [m.content for m in msgs] == ["b", "c"]
|
||||
log.close()
|
||||
|
||||
|
||||
class TestMessageLogClear:
|
||||
"""Tests for MessageLog.clear()."""
|
||||
|
||||
def test_clear_empties_the_store(self, tmp_path):
|
||||
"""clear() removes all messages."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
log.append("user", "hello", "2024-01-01T00:00:00")
|
||||
log.clear()
|
||||
assert log.all() == []
|
||||
log.close()
|
||||
|
||||
def test_clear_on_empty_store_is_safe(self, tmp_path):
|
||||
"""clear() on an empty store does not raise."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
log.clear() # should not raise
|
||||
assert log.all() == []
|
||||
log.close()
|
||||
|
||||
def test_clear_allows_new_appends(self, tmp_path):
|
||||
"""After clear(), new messages can be appended."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
log.append("user", "old", "2024-01-01T00:00:00")
|
||||
log.clear()
|
||||
log.append("user", "new", "2024-01-01T00:00:01")
|
||||
messages = log.all()
|
||||
assert len(messages) == 1
|
||||
assert messages[0].content == "new"
|
||||
log.close()
|
||||
|
||||
def test_clear_resets_len_to_zero(self, tmp_path):
|
||||
"""After clear(), __len__ returns 0."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
log.append("user", "a", "t")
|
||||
log.append("user", "b", "t")
|
||||
log.clear()
|
||||
assert len(log) == 0
|
||||
log.close()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# MessageLog — __len__
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestMessageLogLen:
|
||||
"""Tests for MessageLog.__len__()."""
|
||||
|
||||
def test_len_empty_store(self, tmp_path):
|
||||
"""__len__ returns 0 for an empty store."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
assert len(log) == 0
|
||||
log.close()
|
||||
|
||||
def test_len_after_appends(self, tmp_path):
|
||||
"""__len__ reflects the number of stored messages."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
for i in range(7):
|
||||
log.append("user", f"msg{i}", "t")
|
||||
assert len(log) == 7
|
||||
log.close()
|
||||
|
||||
def test_len_after_clear(self, tmp_path):
|
||||
"""__len__ is 0 after clear()."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
log.append("user", "x", "t")
|
||||
log.clear()
|
||||
assert len(log) == 0
|
||||
log.close()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# MessageLog — pruning
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestMessageLogPrune:
|
||||
"""Tests for automatic pruning via _prune()."""
|
||||
|
||||
def test_prune_keeps_at_most_max_messages(self, tmp_path):
|
||||
"""After exceeding MAX_MESSAGES, oldest messages are pruned."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
# _prune reads the module-level MAX_MESSAGES constant at call time, so
# lower it on the module for this test and restore it in the finally block.
|
||||
import infrastructure.chat_store as cs
|
||||
|
||||
original = cs.MAX_MESSAGES
|
||||
cs.MAX_MESSAGES = 5
|
||||
try:
|
||||
for i in range(8):
|
||||
log.append("user", f"msg{i}", f"t{i}")
|
||||
assert len(log) == 5
|
||||
finally:
|
||||
cs.MAX_MESSAGES = original
|
||||
log.close()
|
||||
|
||||
def test_prune_keeps_newest_messages(self, tmp_path):
|
||||
"""Pruning removes oldest messages and keeps the newest ones."""
|
||||
import infrastructure.chat_store as cs
|
||||
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
original = cs.MAX_MESSAGES
|
||||
cs.MAX_MESSAGES = 3
|
||||
try:
|
||||
for i in range(5):
|
||||
log.append("user", f"msg{i}", f"t{i}")
|
||||
messages = log.all()
|
||||
contents = [m.content for m in messages]
|
||||
assert contents == ["msg2", "msg3", "msg4"]
|
||||
finally:
|
||||
cs.MAX_MESSAGES = original
|
||||
log.close()
|
||||
|
||||
def test_no_prune_when_below_limit(self, tmp_path):
|
||||
"""No messages are pruned while count is at or below MAX_MESSAGES."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
import infrastructure.chat_store as cs
|
||||
|
||||
original = cs.MAX_MESSAGES
|
||||
cs.MAX_MESSAGES = 10
|
||||
try:
|
||||
for i in range(10):
|
||||
log.append("user", f"msg{i}", f"t{i}")
|
||||
assert len(log) == 10
|
||||
finally:
|
||||
cs.MAX_MESSAGES = original
|
||||
log.close()
|
||||
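The pruning tests above lower `MAX_MESSAGES` by hand and restore it in a `finally` block. One possible alternative, sketched here on the assumption that the constant is read at call time, is to let `unittest.mock.patch.object` handle the restore. The `fake_module` namespace and both helper functions are hypothetical stand-ins for illustration; they are not part of `infrastructure.chat_store`.

```python
import types
from unittest.mock import patch

# Hypothetical stand-in for a module that exposes a MAX_MESSAGES constant.
fake_module = types.SimpleNamespace(MAX_MESSAGES=500)


def current_limit() -> int:
    """Read the constant at call time, as a pruning routine would."""
    return fake_module.MAX_MESSAGES


def limit_seen_while_patched(temporary_limit: int) -> int:
    # patch.object swaps the attribute in and restores the original value
    # when the with-block exits, even if the body raises.
    with patch.object(fake_module, "MAX_MESSAGES", temporary_limit):
        return current_limit()
```

With this shape the test body no longer needs to remember the original value; pytest's `monkeypatch.setattr` fixture would give the same guarantee.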
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# MessageLog — close / lifecycle
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestMessageLogClose:
|
||||
"""Tests for MessageLog.close()."""
|
||||
|
||||
def test_close_is_safe_before_first_use(self, tmp_path):
|
||||
"""close() on a fresh (never-used) instance does not raise."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
log.close() # should not raise
|
||||
|
||||
def test_close_multiple_times_is_safe(self, tmp_path):
|
||||
"""close() can be called multiple times without error."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
log.append("user", "hi", "t")
|
||||
log.close()
|
||||
log.close() # second close should not raise
|
||||
|
||||
def test_close_sets_conn_to_none(self, tmp_path):
|
||||
"""close() sets the internal _conn attribute to None."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
log.append("user", "hi", "t")
|
||||
assert log._conn is not None
|
||||
log.close()
|
||||
assert log._conn is None
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Thread safety
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestMessageLogThreadSafety:
|
||||
"""Thread-safety tests for MessageLog."""
|
||||
|
||||
def test_concurrent_appends(self, tmp_path):
|
||||
"""Multiple threads can append messages without data loss or errors."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
errors: list[Exception] = []
|
||||
|
||||
def worker(n: int) -> None:
|
||||
try:
|
||||
for i in range(5):
|
||||
log.append("user", f"t{n}-{i}", f"ts-{n}-{i}")
|
||||
except Exception as exc: # noqa: BLE001
|
||||
errors.append(exc)
|
||||
|
||||
threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
|
||||
for t in threads:
|
||||
t.start()
|
||||
for t in threads:
|
||||
t.join()
|
||||
|
||||
assert errors == [], f"Concurrent append raised: {errors}"
|
||||
# All 20 messages should be present (4 threads × 5 messages)
|
||||
assert len(log) == 20
|
||||
log.close()
|
||||
|
||||
def test_concurrent_reads_and_writes(self, tmp_path):
|
||||
"""Concurrent reads and writes do not corrupt state."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
errors: list[Exception] = []
|
||||
|
||||
def writer() -> None:
|
||||
try:
|
||||
for i in range(10):
|
||||
log.append("user", f"msg{i}", f"t{i}")
|
||||
except Exception as exc: # noqa: BLE001
|
||||
errors.append(exc)
|
||||
|
||||
def reader() -> None:
|
||||
try:
|
||||
for _ in range(10):
|
||||
log.all()
|
||||
except Exception as exc: # noqa: BLE001
|
||||
errors.append(exc)
|
||||
|
||||
threads = [threading.Thread(target=writer)] + [
|
||||
threading.Thread(target=reader) for _ in range(3)
|
||||
]
|
||||
for t in threads:
|
||||
t.start()
|
||||
for t in threads:
|
||||
t.join()
|
||||
|
||||
assert errors == [], f"Concurrent read/write raised: {errors}"
|
||||
log.close()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Edge cases
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestMessageLogEdgeCases:
|
||||
"""Edge-case tests for MessageLog."""
|
||||
|
||||
def test_empty_content_stored_and_retrieved(self, tmp_path):
|
||||
"""Empty string content can be stored and retrieved."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
log.append("user", "", "2024-01-01T00:00:00")
|
||||
assert log.all()[0].content == ""
|
||||
log.close()
|
||||
|
||||
def test_unicode_content_stored_and_retrieved(self, tmp_path):
|
||||
"""Unicode characters in content are stored and retrieved correctly."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
log.append("user", "こんにちは 🌍", "2024-01-01T00:00:00")
|
||||
assert log.all()[0].content == "こんにちは 🌍"
|
||||
log.close()
|
||||
|
||||
def test_newline_in_content(self, tmp_path):
|
||||
"""Newlines in content are preserved."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
multiline = "line1\nline2\nline3"
|
||||
log.append("agent", multiline, "2024-01-01T00:00:00")
|
||||
assert log.all()[0].content == multiline
|
||||
log.close()
|
||||
|
||||
def test_default_db_path_attribute(self):
|
||||
"""MessageLog without explicit path uses the module-level DB_PATH."""
|
||||
from infrastructure.chat_store import DB_PATH
|
||||
|
||||
log = MessageLog()
|
||||
assert log._db_path == DB_PATH
|
||||
# Do NOT call close() here — this is the global singleton's path
|
||||
|
||||
def test_custom_db_path_used(self, tmp_path):
|
||||
"""MessageLog uses the provided db_path."""
|
||||
db = tmp_path / "custom.db"
|
||||
log = MessageLog(db)
|
||||
log.append("user", "test", "t")
|
||||
assert db.exists()
|
||||
log.close()
|
||||
|
||||
def test_recent_limit_zero_returns_empty(self, tmp_path):
|
||||
"""recent(limit=0) returns an empty list."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
log.append("user", "msg", "t")
|
||||
assert log.recent(limit=0) == []
|
||||
log.close()
|
||||
|
||||
def test_all_roles_stored_correctly(self, tmp_path):
|
||||
"""Different role values are stored and retrieved correctly."""
|
||||
log = MessageLog(tmp_path / "chat.db")
|
||||
for role in ("user", "agent", "error", "system"):
|
||||
log.append(role, f"{role} message", "t")
|
||||
messages = log.all()
|
||||
assert [m.role for m in messages] == ["user", "agent", "error", "system"]
|
||||
log.close()
|
||||
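The tests in this file pin down two behaviours: appends prune the store to the newest `MAX_MESSAGES` rows, and `recent()` returns the newest rows in oldest-first order. The following is a minimal sketch of one way to get both with plain `sqlite3`. `TinyMessageLog` and its small cap are assumptions made for illustration; the real `MessageLog` may be implemented differently.

```python
import sqlite3

MAX_MESSAGES = 3  # hypothetical cap, kept small for illustration


class TinyMessageLog:
    """Illustrative bounded log; not the real MessageLog."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS chat_messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "role TEXT, content TEXT, timestamp TEXT)"
        )

    def append(self, role: str, content: str, timestamp: str) -> None:
        self._conn.execute(
            "INSERT INTO chat_messages (role, content, timestamp) VALUES (?, ?, ?)",
            (role, content, timestamp),
        )
        # Prune: delete every row that is not among the newest MAX_MESSAGES ids.
        self._conn.execute(
            "DELETE FROM chat_messages WHERE id NOT IN ("
            "SELECT id FROM chat_messages ORDER BY id DESC LIMIT ?)",
            (MAX_MESSAGES,),
        )
        self._conn.commit()

    def recent(self, limit: int = 50) -> list[str]:
        # Select newest-first to apply the limit, then flip to oldest-first.
        rows = self._conn.execute(
            "SELECT content FROM ("
            "SELECT id, content FROM chat_messages ORDER BY id DESC LIMIT ?"
            ") ORDER BY id ASC",
            (limit,),
        ).fetchall()
        return [row[0] for row in rows]

    def close(self) -> None:
        self._conn.close()
```

Ordering by the autoincrement `id` rather than by `timestamp` keeps insertion order stable even when several messages share a timestamp, which some of the tests above rely on.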
@@ -1,21 +1,10 @@
|
||||
"""Tests for the async event bus (infrastructure.events.bus)."""
|
||||
|
||||
import sqlite3
|
||||
from pathlib import Path
|
||||
from unittest.mock import patch
|
||||
|
||||
import pytest
|
||||
|
||||
import infrastructure.events.bus as bus_module
|
||||
from infrastructure.events.bus import (
|
||||
Event,
|
||||
EventBus,
|
||||
emit,
|
||||
event_bus,
|
||||
get_event_bus,
|
||||
init_event_bus_persistence,
|
||||
on,
|
||||
)
|
||||
from infrastructure.events.bus import Event, EventBus, emit, event_bus, on
|
||||
|
||||
|
||||
class TestEvent:
|
||||
@@ -360,111 +349,3 @@ class TestEventBusPersistence:
|
||||
assert mode == "wal"
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
async def test_persist_event_exception_is_swallowed(self, tmp_path):
|
||||
"""_persist_event must not propagate SQLite errors."""
|
||||
from unittest.mock import MagicMock
|
||||
|
||||
bus = EventBus()
|
||||
bus.enable_persistence(tmp_path / "events.db")
|
||||
|
||||
# Make the INSERT raise an OperationalError
|
||||
mock_conn = MagicMock()
|
||||
mock_conn.execute.side_effect = sqlite3.OperationalError("simulated failure")
|
||||
|
||||
from contextlib import contextmanager
|
||||
|
||||
@contextmanager
|
||||
def fake_ctx():
|
||||
yield mock_conn
|
||||
|
||||
with patch.object(bus, "_get_persistence_conn", fake_ctx):
|
||||
# Should not raise
|
||||
bus._persist_event(Event(type="x", source="s"))
|
||||
|
||||
async def test_replay_exception_returns_empty(self, tmp_path):
|
||||
"""replay() must return [] when SQLite query fails."""
|
||||
from unittest.mock import MagicMock
|
||||
|
||||
bus = EventBus()
|
||||
bus.enable_persistence(tmp_path / "events.db")
|
||||
|
||||
mock_conn = MagicMock()
|
||||
mock_conn.execute.side_effect = sqlite3.OperationalError("simulated failure")
|
||||
|
||||
from contextlib import contextmanager
|
||||
|
||||
@contextmanager
|
||||
def fake_ctx():
|
||||
yield mock_conn
|
||||
|
||||
with patch.object(bus, "_get_persistence_conn", fake_ctx):
|
||||
result = bus.replay()
|
||||
assert result == []
|
||||
|
||||
|
||||
# ── Singleton helpers ─────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestSingletonHelpers:
|
||||
"""Test get_event_bus(), init_event_bus_persistence(), and module __getattr__."""
|
||||
|
||||
def test_get_event_bus_returns_same_instance(self):
|
||||
"""get_event_bus() is a true singleton."""
|
||||
a = get_event_bus()
|
||||
b = get_event_bus()
|
||||
assert a is b
|
||||
|
||||
def test_module_event_bus_attr_is_singleton(self):
|
||||
"""Accessing bus_module.event_bus via __getattr__ returns the singleton."""
|
||||
assert bus_module.event_bus is get_event_bus()
|
||||
|
||||
def test_module_getattr_unknown_raises(self):
|
||||
"""Accessing an unknown module attribute raises AttributeError."""
|
||||
with pytest.raises(AttributeError):
|
||||
_ = bus_module.no_such_attr # type: ignore[attr-defined]
|
||||
|
||||
def test_init_event_bus_persistence_sets_path(self, tmp_path):
|
||||
"""init_event_bus_persistence() enables persistence on the singleton."""
|
||||
bus = get_event_bus()
|
||||
original_path = bus._persistence_db_path
|
||||
try:
|
||||
bus._persistence_db_path = None # reset for the test
|
||||
db_path = tmp_path / "test_init.db"
|
||||
init_event_bus_persistence(db_path)
|
||||
assert bus._persistence_db_path == db_path
|
||||
finally:
|
||||
bus._persistence_db_path = original_path
|
||||
|
||||
def test_init_event_bus_persistence_is_idempotent(self, tmp_path):
|
||||
"""Calling init_event_bus_persistence() twice keeps the first path."""
|
||||
bus = get_event_bus()
|
||||
original_path = bus._persistence_db_path
|
||||
try:
|
||||
bus._persistence_db_path = None
|
||||
first_path = tmp_path / "first.db"
|
||||
second_path = tmp_path / "second.db"
|
||||
init_event_bus_persistence(first_path)
|
||||
init_event_bus_persistence(second_path) # should be ignored
|
||||
assert bus._persistence_db_path == first_path
|
||||
finally:
|
||||
bus._persistence_db_path = original_path
|
||||
|
||||
def test_init_event_bus_persistence_default_path(self):
|
||||
"""init_event_bus_persistence() uses 'data/events.db' when no path given."""
|
||||
bus = get_event_bus()
|
||||
original_path = bus._persistence_db_path
|
||||
try:
|
||||
bus._persistence_db_path = None
|
||||
# Patch enable_persistence to capture what path it receives
|
||||
captured = {}
|
||||
|
||||
def fake_enable(path: Path) -> None:
|
||||
captured["path"] = path
|
||||
|
||||
with patch.object(bus, "enable_persistence", side_effect=fake_enable):
|
||||
init_event_bus_persistence()
|
||||
|
||||
assert captured["path"] == Path("data/events.db")
|
||||
finally:
|
||||
bus._persistence_db_path = original_path
|
||||
|
||||
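The singleton tests above expect three things: repeated lookups return the same bus, a second initialisation call is ignored, and the default path is `data/events.db`. A minimal sketch of that contract follows. `SketchBus`, `get_bus`, and `init_persistence` are hypothetical names used for illustration and do not reproduce `infrastructure.events.bus`.

```python
from pathlib import Path
from typing import Optional


class SketchBus:
    """Illustrative bus that only tracks where it would persist events."""

    def __init__(self) -> None:
        self.persistence_db_path: Optional[Path] = None

    def enable_persistence(self, path: Path) -> None:
        self.persistence_db_path = path


_bus: Optional[SketchBus] = None


def get_bus() -> SketchBus:
    """Create the bus lazily on first use and return the same object afterwards."""
    global _bus
    if _bus is None:
        _bus = SketchBus()
    return _bus


def init_persistence(path: Optional[Path] = None) -> None:
    """Enable persistence once; later calls keep the first path."""
    bus = get_bus()
    if bus.persistence_db_path is not None:
        return  # already initialised, so the new path is ignored
    bus.enable_persistence(path or Path("data/events.db"))
```

Making the second call a no-op means start-up code can call the initialiser from several entry points without one of them silently redirecting the event log.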
@@ -1,439 +0,0 @@
|
||||
"""Tests for the three-tier metabolic LLM router (issue #966)."""
|
||||
|
||||
from unittest.mock import AsyncMock, MagicMock
|
||||
|
||||
import pytest
|
||||
|
||||
from infrastructure.router.metabolic import (
|
||||
DEFAULT_TIER_MODELS,
|
||||
MetabolicRouter,
|
||||
ModelTier,
|
||||
build_prompt,
|
||||
classify_complexity,
|
||||
get_metabolic_router,
|
||||
)
|
||||
|
||||
pytestmark = pytest.mark.unit
|
||||
|
||||
# ── classify_complexity ──────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestClassifyComplexity:
|
||||
"""Verify tier classification for representative task / state pairs."""
|
||||
|
||||
# ── T1: Routine ─────────────────────────────────────────────────────────
|
||||
|
||||
def test_simple_navigation_is_t1(self):
|
||||
assert classify_complexity("go north", {}) == ModelTier.T1_ROUTINE
|
||||
|
||||
def test_single_action_is_t1(self):
|
||||
assert classify_complexity("open door", {}) == ModelTier.T1_ROUTINE
|
||||
|
||||
def test_t1_with_extra_words_stays_t1(self):
|
||||
# 5 words (within the 6-word T1 limit), all T1 territory, no active context
|
||||
assert classify_complexity("go south and take it", {}) == ModelTier.T1_ROUTINE
|
||||
|
||||
def test_t1_long_task_upgrades_to_t2(self):
|
||||
# More than 6 words → not T1 even with nav words
|
||||
assert (
|
||||
classify_complexity("go north and then move east and pick up the sword", {})
|
||||
!= ModelTier.T1_ROUTINE
|
||||
)
|
||||
|
||||
def test_active_quest_upgrades_t1_to_t2(self):
|
||||
state = {"active_quests": ["Rescue the Mage"]}
|
||||
assert classify_complexity("go north", state) == ModelTier.T2_MEDIUM
|
||||
|
||||
def test_dialogue_active_upgrades_t1_to_t2(self):
|
||||
state = {"dialogue_active": True}
|
||||
assert classify_complexity("yes", state) == ModelTier.T2_MEDIUM
|
||||
|
||||
def test_combat_active_upgrades_t1_to_t2(self):
|
||||
state = {"combat_active": True}
|
||||
assert classify_complexity("attack", state) == ModelTier.T2_MEDIUM
|
||||
|
||||
# ── T2: Medium ──────────────────────────────────────────────────────────
|
||||
|
||||
def test_default_is_t2(self):
|
||||
assert classify_complexity("what do I have in my inventory", {}) == ModelTier.T2_MEDIUM
|
||||
|
||||
def test_dialogue_response_is_t2(self):
|
||||
state = {"dialogue_active": True, "dialogue_npc": "Caius Cosades"}
|
||||
result = classify_complexity("I'm looking for Caius Cosades", state)
|
||||
assert result == ModelTier.T2_MEDIUM
|
||||
|
||||
# ── T3: Complex ─────────────────────────────────────────────────────────
|
||||
|
||||
def test_quest_planning_is_t3(self):
|
||||
assert classify_complexity("plan my quest route", {}) == ModelTier.T3_COMPLEX
|
||||
|
||||
def test_strategy_keyword_is_t3(self):
|
||||
assert classify_complexity("what is the best strategy", {}) == ModelTier.T3_COMPLEX
|
||||
|
||||
def test_stuck_keyword_is_t3(self):
|
||||
assert classify_complexity("I am stuck", {}) == ModelTier.T3_COMPLEX
|
||||
|
||||
def test_stuck_state_is_t3(self):
|
||||
assert classify_complexity("help me", {"stuck": True}) == ModelTier.T3_COMPLEX
|
||||
|
||||
def test_require_t3_flag_forces_t3(self):
|
||||
state = {"require_t3": True}
|
||||
assert classify_complexity("go north", state) == ModelTier.T3_COMPLEX
|
||||
|
||||
def test_optimize_keyword_is_t3(self):
|
||||
assert classify_complexity("optimize my skill build", {}) == ModelTier.T3_COMPLEX
|
||||
|
||||
def test_multi_word_t3_phrase(self):
|
||||
assert classify_complexity("how do i get past the guards", {}) == ModelTier.T3_COMPLEX
|
||||
|
||||
def test_case_insensitive(self):
|
||||
assert classify_complexity("PLAN my route", {}) == ModelTier.T3_COMPLEX
|
||||
|
||||
|
||||
# ── build_prompt ─────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestBuildPrompt:
|
||||
"""Verify prompt structure and content assembly."""
|
||||
|
||||
def test_returns_two_messages(self):
|
||||
msgs = build_prompt({}, {}, "go north")
|
||||
assert len(msgs) == 2
|
||||
assert msgs[0]["role"] == "system"
|
||||
assert msgs[1]["role"] == "user"
|
||||
|
||||
def test_user_message_contains_task(self):
|
||||
msgs = build_prompt({}, {}, "pick up the sword")
|
||||
assert msgs[1]["content"] == "pick up the sword"
|
||||
|
||||
def test_location_in_system(self):
|
||||
msgs = build_prompt({"location": "Balmora"}, {}, "look around")
|
||||
assert "Balmora" in msgs[0]["content"]
|
||||
|
||||
def test_health_in_system(self):
|
||||
msgs = build_prompt({"health": 42}, {}, "rest")
|
||||
assert "42" in msgs[0]["content"]
|
||||
|
||||
def test_inventory_in_system(self):
|
||||
msgs = build_prompt({"inventory": ["iron sword", "bread"]}, {}, "use item")
|
||||
assert "iron sword" in msgs[0]["content"]
|
||||
|
||||
def test_inventory_truncated_to_10(self):
|
||||
inventory = [f"item{i}" for i in range(20)]
|
||||
msgs = build_prompt({"inventory": inventory}, {}, "check")
|
||||
# Only the first 10 items (item0..item9) should appear in the system message
|
||||
assert "item10" not in msgs[0]["content"]
|
||||
|
||||
def test_active_quests_in_system(self):
|
||||
msgs = build_prompt({"active_quests": ["Morrowind Main Quest"]}, {}, "help")
|
||||
assert "Morrowind Main Quest" in msgs[0]["content"]
|
||||
|
||||
def test_stuck_indicator_in_system(self):
|
||||
msgs = build_prompt({"stuck": True}, {}, "what now")
|
||||
assert "STUCK" in msgs[0]["content"]
|
||||
|
||||
def test_dialogue_npc_in_system(self):
|
||||
msgs = build_prompt({}, {"dialogue_active": True, "dialogue_npc": "Vivec"}, "hello")
|
||||
assert "Vivec" in msgs[0]["content"]
|
||||
|
||||
def test_menu_open_in_system(self):
|
||||
msgs = build_prompt({}, {"menu_open": "inventory"}, "check items")
|
||||
assert "inventory" in msgs[0]["content"]
|
||||
|
||||
def test_combat_active_in_system(self):
|
||||
msgs = build_prompt({}, {"combat_active": True}, "attack")
|
||||
assert "COMBAT" in msgs[0]["content"]
|
||||
|
||||
def test_visual_context_in_system(self):
|
||||
msgs = build_prompt({}, {}, "where am I", visual_context="A dark dungeon corridor")
|
||||
assert "dungeon corridor" in msgs[0]["content"]
|
||||
|
||||
def test_missing_optional_fields_omitted(self):
|
||||
msgs = build_prompt({}, {}, "move forward")
|
||||
system = msgs[0]["content"]
|
||||
assert "Health:" not in system
|
||||
assert "Inventory:" not in system
|
||||
assert "Active quests:" not in system
|
||||
|
||||
def test_inventory_dict_items(self):
|
||||
inventory = [{"name": "silver dagger"}, {"name": "potion"}]
|
||||
msgs = build_prompt({"inventory": inventory}, {}, "use")
|
||||
assert "silver dagger" in msgs[0]["content"]
|
||||
|
||||
def test_quest_dict_items(self):
|
||||
quests = [{"name": "The Warlord"}, {"name": "Lost in Translation"}]
|
||||
msgs = build_prompt({"active_quests": quests}, {}, "help")
|
||||
assert "The Warlord" in msgs[0]["content"]
|
||||
|
||||
|
||||
# ── MetabolicRouter ──────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
class TestMetabolicRouter:
|
||||
"""Test MetabolicRouter routing, tier labelling, and T3 world-pause logic."""
|
||||
|
||||
def _make_router(self, mock_cascade=None):
|
||||
"""Create a MetabolicRouter with a mocked CascadeRouter."""
|
||||
if mock_cascade is None:
|
||||
mock_cascade = MagicMock()
|
||||
mock_cascade.complete = AsyncMock(
|
||||
return_value={
|
||||
"content": "Move north confirmed.",
|
||||
"provider": "ollama-local",
|
||||
"model": "qwen3:8b",
|
||||
"latency_ms": 120.0,
|
||||
}
|
||||
)
|
||||
return MetabolicRouter(cascade=mock_cascade)
|
||||
|
||||
async def test_route_returns_tier_in_result(self):
|
||||
router = self._make_router()
|
||||
result = await router.route("go north", state={})
|
||||
assert "tier" in result
|
||||
assert result["tier"] == ModelTier.T1_ROUTINE
|
||||
|
||||
async def test_t1_uses_t1_model(self):
|
||||
mock_cascade = MagicMock()
|
||||
mock_cascade.complete = AsyncMock(
|
||||
return_value={
|
||||
"content": "ok",
|
||||
"provider": "ollama-local",
|
||||
"model": "qwen3:8b",
|
||||
"latency_ms": 100,
|
||||
}
|
||||
)
|
||||
router = MetabolicRouter(cascade=mock_cascade)
|
||||
await router.route("go north", state={})
|
||||
call_kwargs = mock_cascade.complete.call_args
|
||||
assert call_kwargs.kwargs["model"] == DEFAULT_TIER_MODELS[ModelTier.T1_ROUTINE]
|
||||
|
||||
async def test_t2_uses_t2_model(self):
|
||||
mock_cascade = MagicMock()
|
||||
mock_cascade.complete = AsyncMock(
|
||||
return_value={
|
||||
"content": "ok",
|
||||
"provider": "ollama-local",
|
||||
"model": "qwen3:14b",
|
||||
"latency_ms": 300,
|
||||
}
|
||||
)
|
||||
router = MetabolicRouter(cascade=mock_cascade)
|
||||
await router.route("what should I say to the innkeeper", state={})
|
||||
call_kwargs = mock_cascade.complete.call_args
|
||||
assert call_kwargs.kwargs["model"] == DEFAULT_TIER_MODELS[ModelTier.T2_MEDIUM]
|
||||
|
||||
async def test_t3_uses_t3_model(self):
|
||||
mock_cascade = MagicMock()
|
||||
mock_cascade.complete = AsyncMock(
|
||||
return_value={
|
||||
"content": "ok",
|
||||
"provider": "ollama-local",
|
||||
"model": "qwen3:30b",
|
||||
"latency_ms": 2000,
|
||||
}
|
||||
)
|
||||
router = MetabolicRouter(cascade=mock_cascade)
|
||||
await router.route("plan the optimal quest route", state={})
|
||||
call_kwargs = mock_cascade.complete.call_args
|
||||
assert call_kwargs.kwargs["model"] == DEFAULT_TIER_MODELS[ModelTier.T3_COMPLEX]
|
||||
|
||||
async def test_custom_tier_models_respected(self):
|
||||
mock_cascade = MagicMock()
|
||||
mock_cascade.complete = AsyncMock(
|
||||
return_value={
|
||||
"content": "ok",
|
||||
"provider": "test",
|
||||
"model": "custom-8b",
|
||||
"latency_ms": 100,
|
||||
}
|
||||
)
|
||||
custom = {ModelTier.T1_ROUTINE: "custom-8b"}
|
||||
router = MetabolicRouter(cascade=mock_cascade, tier_models=custom)
|
||||
await router.route("go north", state={})
|
||||
call_kwargs = mock_cascade.complete.call_args
|
||||
assert call_kwargs.kwargs["model"] == "custom-8b"
|
||||
|
||||
async def test_t3_pauses_world_before_inference(self):
|
||||
mock_cascade = MagicMock()
|
||||
mock_cascade.complete = AsyncMock(
|
||||
return_value={
|
||||
"content": "ok",
|
||||
"provider": "ollama",
|
||||
"model": "qwen3:30b",
|
||||
"latency_ms": 1500,
|
||||
}
|
||||
)
|
||||
router = MetabolicRouter(cascade=mock_cascade)
|
||||
|
||||
pause_calls = []
|
||||
unpause_calls = []
|
||||
|
||||
mock_world = MagicMock()
|
||||
|
||||
def track_act(cmd):
|
||||
if cmd.action == "pause":
|
||||
pause_calls.append(cmd)
|
||||
elif cmd.action == "unpause":
|
||||
unpause_calls.append(cmd)
|
||||
|
||||
mock_world.act = track_act
|
||||
router.set_world(mock_world)
|
||||
|
||||
await router.route("plan the quest", state={})
|
||||
|
||||
assert len(pause_calls) == 1, "world.pause() should be called once for T3"
|
||||
assert len(unpause_calls) == 1, "world.unpause() should be called once for T3"
|
||||
|
||||
async def test_t3_unpauses_world_even_on_llm_error(self):
|
||||
"""world.unpause() must be called even when the LLM raises."""
|
||||
mock_cascade = MagicMock()
|
||||
mock_cascade.complete = AsyncMock(side_effect=RuntimeError("LLM failed"))
|
||||
router = MetabolicRouter(cascade=mock_cascade)
|
||||
|
||||
unpause_calls = []
|
||||
mock_world = MagicMock()
|
||||
mock_world.act = lambda cmd: unpause_calls.append(cmd) if cmd.action == "unpause" else None
|
||||
router.set_world(mock_world)
|
||||
|
||||
with pytest.raises(RuntimeError, match="LLM failed"):
|
||||
await router.route("plan the quest", state={})
|
||||
|
||||
assert len(unpause_calls) == 1, "world.unpause() must run even when LLM errors"
|
||||
|
||||
async def test_t1_does_not_pause_world(self):
|
||||
mock_cascade = MagicMock()
|
||||
mock_cascade.complete = AsyncMock(
|
||||
return_value={
|
||||
"content": "ok",
|
||||
"provider": "ollama",
|
||||
"model": "qwen3:8b",
|
||||
"latency_ms": 120,
|
||||
}
|
||||
)
|
||||
router = MetabolicRouter(cascade=mock_cascade)
|
||||
|
||||
pause_calls = []
|
||||
mock_world = MagicMock()
|
||||
mock_world.act = lambda cmd: pause_calls.append(cmd)
|
||||
router.set_world(mock_world)
|
||||
|
||||
await router.route("go north", state={})
|
||||
|
||||
assert len(pause_calls) == 0, "world.pause() must NOT be called for T1"
|
||||
|
||||
async def test_t2_does_not_pause_world(self):
|
||||
mock_cascade = MagicMock()
|
||||
mock_cascade.complete = AsyncMock(
|
||||
return_value={
|
||||
"content": "ok",
|
||||
"provider": "ollama",
|
||||
"model": "qwen3:14b",
|
||||
"latency_ms": 350,
|
||||
}
|
||||
)
|
||||
router = MetabolicRouter(cascade=mock_cascade)
|
||||
|
||||
pause_calls = []
|
||||
mock_world = MagicMock()
|
||||
mock_world.act = lambda cmd: pause_calls.append(cmd)
|
||||
router.set_world(mock_world)
|
||||
|
||||
await router.route("talk to the merchant", state={})
|
||||
|
||||
assert len(pause_calls) == 0, "world.pause() must NOT be called for T2"
|
||||
|
||||
async def test_broken_world_adapter_degrades_gracefully(self):
|
||||
"""If world.act() raises, inference must still complete."""
|
||||
mock_cascade = MagicMock()
|
||||
mock_cascade.complete = AsyncMock(
|
||||
return_value={
|
||||
"content": "done",
|
||||
"provider": "ollama",
|
||||
"model": "qwen3:30b",
|
||||
"latency_ms": 2000,
|
||||
}
|
||||
)
|
||||
router = MetabolicRouter(cascade=mock_cascade)
|
||||
|
||||
mock_world = MagicMock()
|
||||
mock_world.act = MagicMock(side_effect=RuntimeError("world broken"))
|
||||
router.set_world(mock_world)
|
||||
|
||||
# Should not raise — degradation only logs a warning
|
||||
result = await router.route("plan the quest", state={})
|
||||
assert result["content"] == "done"
|
||||
|
||||
async def test_no_world_adapter_t3_still_works(self):
|
||||
mock_cascade = MagicMock()
|
||||
mock_cascade.complete = AsyncMock(
|
||||
return_value={
|
||||
"content": "plan done",
|
||||
"provider": "ollama",
|
||||
"model": "qwen3:30b",
|
||||
"latency_ms": 2000,
|
||||
}
|
||||
)
|
||||
router = MetabolicRouter(cascade=mock_cascade)
|
||||
# No set_world() called
|
||||
|
||||
result = await router.route("plan the quest route", state={})
|
||||
assert result["content"] == "plan done"
|
||||
assert result["tier"] == ModelTier.T3_COMPLEX
|
||||
|
||||
async def test_classify_delegates_to_module_function(self):
|
||||
router = MetabolicRouter(cascade=MagicMock())
|
||||
assert router.classify("go north", {}) == classify_complexity("go north", {})
|
||||
assert router.classify("plan the quest", {}) == classify_complexity("plan the quest", {})
|
||||
|
||||
async def test_ui_state_defaults_to_empty_dict(self):
|
||||
"""Calling route without ui_state should not raise."""
|
||||
mock_cascade = MagicMock()
|
||||
mock_cascade.complete = AsyncMock(
|
||||
return_value={
|
||||
"content": "ok",
|
||||
"provider": "ollama",
|
||||
"model": "qwen3:8b",
|
||||
"latency_ms": 100,
|
||||
}
|
||||
)
|
||||
router = MetabolicRouter(cascade=mock_cascade)
|
||||
# No ui_state argument
|
||||
result = await router.route("go north", state={})
|
||||
assert result["content"] == "ok"
|
||||
|
||||
async def test_temperature_and_max_tokens_forwarded(self):
|
||||
mock_cascade = MagicMock()
|
||||
mock_cascade.complete = AsyncMock(
|
||||
return_value={
|
||||
"content": "ok",
|
||||
"provider": "ollama",
|
||||
"model": "qwen3:14b",
|
||||
"latency_ms": 200,
|
||||
}
|
||||
)
|
||||
router = MetabolicRouter(cascade=mock_cascade)
|
||||
await router.route("describe the scene", state={}, temperature=0.1, max_tokens=50)
|
||||
call_kwargs = mock_cascade.complete.call_args.kwargs
|
||||
assert call_kwargs["temperature"] == 0.1
|
||||
assert call_kwargs["max_tokens"] == 50
|
||||
|
||||
|
||||
class TestGetMetabolicRouter:
|
||||
"""Test module-level singleton."""
|
||||
|
||||
def test_returns_metabolic_router_instance(self):
|
||||
import infrastructure.router.metabolic as m_module
|
||||
|
||||
# Reset singleton for clean test
|
||||
m_module._metabolic_router = None
|
||||
router = get_metabolic_router()
|
||||
assert isinstance(router, MetabolicRouter)
|
||||
|
||||
def test_singleton_returns_same_instance(self):
|
||||
import infrastructure.router.metabolic as m_module
|
||||
|
||||
m_module._metabolic_router = None
|
||||
r1 = get_metabolic_router()
|
||||
r2 = get_metabolic_router()
|
||||
assert r1 is r2
|
||||
@@ -2,7 +2,7 @@
|
||||
|
||||
import time
|
||||
from pathlib import Path
|
||||
from unittest.mock import AsyncMock, MagicMock, patch
|
||||
from unittest.mock import AsyncMock, patch
|
||||
|
||||
import pytest
|
||||
import yaml
|
||||
@@ -10,16 +10,13 @@ import yaml
|
||||
from infrastructure.router.cascade import (
|
||||
CascadeRouter,
|
||||
CircuitState,
|
||||
ContentType,
|
||||
Provider,
|
||||
ProviderMetrics,
|
||||
ProviderStatus,
|
||||
RouterConfig,
|
||||
get_router,
|
||||
)
|
||||
|
||||
|
||||
@pytest.mark.unit
|
||||
class TestProviderMetrics:
|
||||
"""Test provider metrics tracking."""
|
||||
|
||||
@@ -48,7 +45,6 @@ class TestProviderMetrics:
|
||||
assert metrics.error_rate == 0.3
|
||||
|
||||
|
||||
@pytest.mark.unit
|
||||
class TestProvider:
|
||||
"""Test Provider dataclass."""
|
||||
|
||||
@@ -92,7 +88,6 @@ class TestProvider:
|
||||
assert provider.get_default_model() is None
|
||||
|
||||
|
||||
@pytest.mark.unit
|
||||
class TestRouterConfig:
|
||||
"""Test router configuration."""
|
||||
|
||||
@@ -105,7 +100,6 @@ class TestRouterConfig:
|
||||
assert config.circuit_breaker_failure_threshold == 5
|
||||
|
||||
|
||||
@pytest.mark.unit
|
||||
class TestCascadeRouterInit:
|
||||
"""Test CascadeRouter initialization."""
|
||||
|
||||
@@ -164,7 +158,6 @@ class TestCascadeRouterInit:
|
||||
assert router.providers[0].api_key == "secret123"
|
||||
|
||||
|
||||
@pytest.mark.unit
|
||||
class TestCascadeRouterMetrics:
|
||||
"""Test metrics tracking."""
|
||||
|
||||
@@ -248,7 +241,6 @@ class TestCascadeRouterMetrics:
|
||||
assert provider.status == ProviderStatus.HEALTHY
|
||||
|
||||
|
||||
@pytest.mark.unit
|
||||
class TestCascadeRouterGetMetrics:
|
||||
"""Test get_metrics method."""
|
||||
|
||||
@@ -288,7 +280,6 @@ class TestCascadeRouterGetMetrics:
|
||||
assert p_metrics["metrics"]["avg_latency_ms"] == 200.0
|
||||
|
||||
|
||||
@pytest.mark.unit
|
||||
class TestCascadeRouterGetStatus:
|
||||
"""Test get_status method."""
|
||||
|
||||
@@ -314,7 +305,6 @@ class TestCascadeRouterGetStatus:
|
||||
assert len(status["providers"]) == 1
|
||||
|
||||
|
||||
@pytest.mark.unit
|
||||
@pytest.mark.asyncio
|
||||
class TestCascadeRouterComplete:
|
||||
"""Test complete method with failover."""
|
||||
@@ -446,7 +436,6 @@ class TestCascadeRouterComplete:
|
||||
assert result["provider"] == "healthy"
|
||||
|
||||
|
||||
@pytest.mark.unit
|
||||
class TestProviderAvailabilityCheck:
|
||||
"""Test provider availability checking."""
|
||||
|
||||
@@ -523,7 +512,7 @@ class TestProviderAvailabilityCheck:
|
||||
|
||||
def test_check_vllm_mlx_server_healthy(self):
|
||||
"""Test vllm-mlx when health check succeeds."""
|
||||
from unittest.mock import patch
|
||||
from unittest.mock import MagicMock, patch
|
||||
|
||||
router = CascadeRouter(config_path=Path("/nonexistent"))
|
||||
|
||||
@@ -567,7 +556,7 @@ class TestProviderAvailabilityCheck:
|
||||
|
||||
def test_check_vllm_mlx_default_url(self):
|
||||
"""Test vllm-mlx uses default localhost:8000 when no URL configured."""
|
||||
from unittest.mock import patch
|
||||
from unittest.mock import MagicMock, patch
|
||||
|
||||
router = CascadeRouter(config_path=Path("/nonexistent"))
|
||||
|
||||
@@ -588,7 +577,6 @@ class TestProviderAvailabilityCheck:
|
||||
mock_requests.get.assert_called_once_with("http://localhost:8000/health", timeout=5)
|
||||
|
||||
|
||||
@pytest.mark.unit
|
||||
@pytest.mark.asyncio
|
||||
class TestVllmMlxProvider:
|
||||
"""Test vllm-mlx provider integration."""
|
||||
@@ -623,7 +611,7 @@ class TestVllmMlxProvider:
|
||||
|
||||
async def test_vllm_mlx_base_url_normalization(self):
|
||||
"""Test _call_vllm_mlx appends /v1 when missing."""
|
||||
from unittest.mock import AsyncMock, patch
|
||||
from unittest.mock import AsyncMock, MagicMock, patch
|
||||
|
||||
router = CascadeRouter(config_path=Path("/nonexistent"))
|
||||
|
||||
@@ -693,8 +681,6 @@ class TestVllmMlxProvider:
|
||||
assert result["content"] == "Local MLX response"
|
||||
|
||||
|
||||
@pytest.mark.unit
|
||||
@pytest.mark.asyncio
|
||||
class TestMetabolicProtocol:
|
||||
"""Test metabolic protocol: cloud providers skip when quota is ACTIVE/RESTING."""
|
||||
|
||||
@@ -804,7 +790,6 @@ class TestMetabolicProtocol:
|
||||
assert result["content"] == "Cloud response"
|
||||
|
||||
|
||||
@pytest.mark.unit
|
||||
class TestCascadeRouterReload:
|
||||
"""Test hot-reload of providers.yaml."""
|
||||
|
||||
@@ -983,532 +968,3 @@ class TestCascadeRouterReload:
|
||||
|
||||
assert router.providers[0].name == "low-priority"
|
||||
assert router.providers[1].name == "high-priority"
|
||||
|
||||
|
||||
@pytest.mark.unit
|
||||
class TestContentTypeDetection:
|
||||
"""Test _detect_content_type logic."""
|
||||
|
||||
def _router(self) -> CascadeRouter:
|
||||
return CascadeRouter(config_path=Path("/nonexistent"))
|
||||
|
||||
def test_text_only(self):
|
||||
router = self._router()
|
||||
msgs = [{"role": "user", "content": "Hello"}]
|
||||
assert router._detect_content_type(msgs) == ContentType.TEXT
|
||||
|
||||
def test_images_key_triggers_vision(self):
|
||||
router = self._router()
|
||||
msgs = [{"role": "user", "content": "Describe this", "images": ["pic.jpg"]}]
|
||||
assert router._detect_content_type(msgs) == ContentType.VISION
|
||||
|
||||
def test_image_extension_in_content_triggers_vision(self):
|
||||
router = self._router()
|
||||
msgs = [{"role": "user", "content": "Look at photo.png please"}]
|
||||
assert router._detect_content_type(msgs) == ContentType.VISION
|
||||
|
||||
def test_base64_data_uri_triggers_vision(self):
|
||||
router = self._router()
|
||||
msgs = [{"role": "user", "content": "data:image/jpeg;base64,/9j/4AA..."}]
|
||||
assert router._detect_content_type(msgs) == ContentType.VISION
|
||||
|
||||
def test_audio_key_triggers_audio(self):
|
||||
router = self._router()
|
||||
msgs = [{"role": "user", "content": "", "audio": b"bytes"}]
|
||||
assert router._detect_content_type(msgs) == ContentType.AUDIO
|
||||
|
||||
def test_image_and_audio_triggers_multimodal(self):
|
||||
router = self._router()
|
||||
msgs = [
|
||||
{"role": "user", "content": "check photo.jpg", "audio": b"bytes"},
|
||||
]
|
||||
assert router._detect_content_type(msgs) == ContentType.MULTIMODAL
|
||||
|
||||
def test_list_content_image_url_type(self):
|
||||
router = self._router()
|
||||
msgs = [
|
||||
{
|
||||
"role": "user",
|
||||
"content": [
|
||||
{"type": "text", "text": "What?"},
|
||||
{"type": "image_url", "image_url": {"url": "http://example.com/a.jpg"}},
|
||||
],
|
||||
}
|
||||
]
|
||||
assert router._detect_content_type(msgs) == ContentType.VISION
|
||||
|
||||
def test_list_content_audio_type(self):
|
||||
router = self._router()
|
||||
msgs = [
|
||||
{
|
||||
"role": "user",
|
||||
"content": [
|
||||
{"type": "audio", "data": "base64..."},
|
||||
],
|
||||
}
|
||||
]
|
||||
assert router._detect_content_type(msgs) == ContentType.AUDIO
|
||||
|
||||
|
||||
@pytest.mark.unit
|
||||
class TestTransformMessagesForOllama:
|
||||
"""Test _transform_messages_for_ollama."""
|
||||
|
||||
def _router(self) -> CascadeRouter:
|
||||
return CascadeRouter(config_path=Path("/nonexistent"))
|
||||
|
||||
def test_plain_text_message(self):
|
||||
router = self._router()
|
||||
result = router._transform_messages_for_ollama([{"role": "user", "content": "Hello"}])
|
||||
assert result == [{"role": "user", "content": "Hello"}]
|
||||
|
||||
def test_base64_image_stripped(self):
|
||||
router = self._router()
|
||||
msgs = [
|
||||
{
|
||||
"role": "user",
|
||||
"content": "Describe",
|
||||
"images": ["data:image/png;base64,abc123"],
|
||||
}
|
||||
]
|
||||
result = router._transform_messages_for_ollama(msgs)
|
||||
assert result[0]["images"] == ["abc123"]
|
||||
|
||||
def test_http_url_skipped(self):
|
||||
router = self._router()
|
||||
msgs = [
|
||||
{
|
||||
"role": "user",
|
||||
"content": "Describe",
|
||||
"images": ["http://example.com/img.jpg"],
|
||||
}
|
||||
]
|
||||
result = router._transform_messages_for_ollama(msgs)
|
||||
# URL is skipped — images list should be empty or absent
|
||||
assert result[0].get("images", []) == []
|
||||
|
||||
def test_missing_local_file_skipped(self):
|
||||
router = self._router()
|
||||
msgs = [
|
||||
{
|
||||
"role": "user",
|
||||
"content": "Describe",
|
||||
"images": ["/nonexistent/path/image.png"],
|
||||
}
|
||||
]
|
||||
result = router._transform_messages_for_ollama(msgs)
|
||||
assert result[0].get("images", []) == []
|
||||
|
||||
|
||||
@pytest.mark.unit
|
||||
class TestProviderCapabilityMethods:
|
||||
"""Test Provider.get_model_with_capability and model_has_capability."""
|
||||
|
||||
def _provider(self) -> Provider:
|
||||
return Provider(
|
||||
name="test",
|
||||
type="ollama",
|
||||
enabled=True,
|
||||
priority=1,
|
||||
models=[
|
||||
{"name": "llava:7b", "capabilities": ["vision"]},
|
||||
{"name": "llama3.2", "default": True},
|
||||
],
|
||||
)
|
||||
|
||||
def test_get_model_with_capability_found(self):
|
||||
p = self._provider()
|
||||
assert p.get_model_with_capability("vision") == "llava:7b"
|
||||
|
||||
def test_get_model_with_capability_falls_back_to_default(self):
|
||||
p = self._provider()
|
||||
assert p.get_model_with_capability("audio") == "llama3.2"
|
||||
|
||||
def test_model_has_capability_true(self):
|
||||
p = self._provider()
|
||||
assert p.model_has_capability("llava:7b", "vision") is True
|
||||
|
||||
def test_model_has_capability_false(self):
|
||||
p = self._provider()
|
||||
assert p.model_has_capability("llama3.2", "vision") is False
|
||||
|
||||
def test_model_has_capability_unknown_model(self):
|
||||
p = self._provider()
|
||||
assert p.model_has_capability("unknown-model", "vision") is False
|
||||
|
||||
|
||||
@pytest.mark.unit
|
||||
class TestGetFallbackModel:
|
||||
"""Test _get_fallback_model."""
|
||||
|
||||
def _router_with_provider(self) -> tuple[CascadeRouter, Provider]:
|
||||
router = CascadeRouter(config_path=Path("/nonexistent"))
|
||||
provider = Provider(
|
||||
name="test",
|
||||
type="ollama",
|
||||
enabled=True,
|
||||
priority=1,
|
||||
models=[
|
||||
{"name": "llava:7b", "capabilities": ["vision"]},
|
||||
{"name": "llama3.2", "default": True},
|
||||
],
|
||||
)
|
||||
return router, provider
|
||||
|
||||
def test_returns_vision_model(self):
|
||||
router, provider = self._router_with_provider()
|
||||
result = router._get_fallback_model(provider, "llama3.2", ContentType.VISION)
|
||||
assert result == "llava:7b"
|
||||
|
||||
def test_returns_none_if_no_capability(self):
|
||||
router, provider = self._router_with_provider()
|
||||
result = router._get_fallback_model(provider, "llama3.2", ContentType.AUDIO)
|
||||
# No audio model; falls back to default which is same as original
|
||||
assert result is None or result == "llama3.2"
|
||||
|
||||
def test_text_content_returns_none(self):
|
||||
router, provider = self._router_with_provider()
|
||||
result = router._get_fallback_model(provider, "llama3.2", ContentType.TEXT)
|
||||
assert result is None
|
||||
|
||||
|
||||
@pytest.mark.unit
|
||||
@pytest.mark.asyncio
|
||||
class TestCascadeTierFiltering:
|
||||
"""Test cascade_tier parameter in complete()."""
|
||||
|
||||
def _make_router(self) -> CascadeRouter:
|
||||
router = CascadeRouter(config_path=Path("/nonexistent"))
|
||||
router.providers = [
|
||||
Provider(
|
||||
name="anthropic-primary",
|
||||
type="anthropic",
|
||||
enabled=True,
|
||||
priority=1,
|
||||
api_key="test-key",
|
||||
models=[{"name": "claude-sonnet-4-6", "default": True}],
|
||||
),
|
||||
Provider(
|
||||
name="ollama-local",
|
||||
type="ollama",
|
||||
enabled=True,
|
||||
priority=2,
|
||||
models=[{"name": "llama3.2", "default": True}],
|
||||
),
|
||||
]
|
||||
return router
|
||||
|
||||
async def test_frontier_required_uses_anthropic(self):
|
||||
router = self._make_router()
|
||||
with patch("infrastructure.router.cascade._quota_monitor", None):
|
||||
with patch.object(router, "_call_anthropic") as mock_call:
|
||||
mock_call.return_value = {
|
||||
"content": "frontier response",
|
||||
"model": "claude-sonnet-4-6",
|
||||
}
|
||||
result = await router.complete(
|
||||
messages=[{"role": "user", "content": "hi"}],
|
||||
cascade_tier="frontier_required",
|
||||
)
|
||||
assert result["provider"] == "anthropic-primary"
|
||||
mock_call.assert_called_once()
|
||||
|
||||
async def test_frontier_required_no_anthropic_raises(self):
|
||||
router = CascadeRouter(config_path=Path("/nonexistent"))
|
||||
router.providers = [
|
||||
Provider(
|
||||
name="ollama-local",
|
||||
type="ollama",
|
||||
enabled=True,
|
||||
priority=1,
|
||||
models=[{"name": "llama3.2", "default": True}],
|
||||
)
|
||||
]
|
||||
with pytest.raises(RuntimeError, match="No Anthropic provider configured"):
|
||||
await router.complete(
|
||||
messages=[{"role": "user", "content": "hi"}],
|
||||
cascade_tier="frontier_required",
|
||||
)
|
||||
|
||||
async def test_unknown_tier_raises(self):
|
||||
router = self._make_router()
|
||||
with pytest.raises(RuntimeError, match="No providers found for tier"):
|
||||
await router.complete(
|
||||
messages=[{"role": "user", "content": "hi"}],
|
||||
cascade_tier="nonexistent_tier",
|
||||
)
|
||||
|
||||
async def test_tier_filter_only_matching_providers(self):
|
||||
router = CascadeRouter(config_path=Path("/nonexistent"))
|
||||
router.providers = [
|
||||
Provider(
|
||||
name="local-primary",
|
||||
type="ollama",
|
||||
enabled=True,
|
||||
priority=1,
|
||||
tier="local",
|
||||
models=[{"name": "llama3.2", "default": True}],
|
||||
),
|
||||
Provider(
|
||||
name="cloud-secondary",
|
||||
type="anthropic",
|
||||
enabled=True,
|
||||
priority=2,
|
||||
tier="cloud",
|
||||
api_key="key",
|
||||
models=[{"name": "claude-sonnet-4-6", "default": True}],
|
||||
),
|
||||
]
|
||||
with patch.object(router, "_call_ollama") as mock_call:
|
||||
mock_call.return_value = {"content": "local response", "model": "llama3.2"}
|
||||
result = await router.complete(
|
||||
messages=[{"role": "user", "content": "hi"}],
|
||||
cascade_tier="local",
|
||||
)
|
||||
assert result["provider"] == "local-primary"
|
||||
mock_call.assert_called_once()
|
||||
|
||||
|
||||
@pytest.mark.unit
|
||||
@pytest.mark.asyncio
|
||||
class TestGenerateWithImage:
|
||||
"""Test generate_with_image convenience method."""
|
||||
|
||||
async def test_delegates_to_complete(self):
|
||||
router = CascadeRouter(config_path=Path("/nonexistent"))
|
||||
router.providers = [
|
||||
Provider(
|
||||
name="ollama-vision",
|
||||
type="ollama",
|
||||
enabled=True,
|
||||
priority=1,
|
||||
models=[{"name": "llava:7b", "capabilities": ["vision"], "default": True}],
|
||||
)
|
||||
]
|
||||
|
||||
with patch.object(router, "_call_ollama") as mock_call:
|
||||
mock_call.return_value = {"content": "A cat", "model": "llava:7b"}
|
||||
result = await router.generate_with_image(
|
||||
prompt="What is this?",
|
||||
image_path="/tmp/cat.jpg",
|
||||
model="llava:7b",
|
||||
)
|
||||
|
||||
assert result["content"] == "A cat"
|
||||
assert result["provider"] == "ollama-vision"
|
||||
# _call_ollama() should have received the image path in its messages
|
||||
call_kwargs = mock_call.call_args
|
||||
messages_passed = call_kwargs.kwargs["messages"]
|
||||
assert messages_passed[0]["images"] == ["/tmp/cat.jpg"]
|
||||
|
||||
|
||||
@pytest.mark.unit
|
||||
class TestGetRouterSingleton:
|
||||
"""Test get_router() returns a singleton and creates CascadeRouter."""
|
||||
|
||||
def test_get_router_returns_cascade_router(self):
|
||||
        import infrastructure.router.cascade as cascade_module

        # Reset singleton to test creation
        original = cascade_module.cascade_router
        cascade_module.cascade_router = None
        try:
            router = get_router()
            assert isinstance(router, CascadeRouter)
        finally:
            cascade_module.cascade_router = original

    def test_get_router_returns_same_instance(self):
        import infrastructure.router.cascade as cascade_module

        original = cascade_module.cascade_router
        cascade_module.cascade_router = None
        try:
            r1 = get_router()
            r2 = get_router()
            assert r1 is r2
        finally:
            cascade_module.cascade_router = original


@pytest.mark.unit
class TestIsProviderAvailable:
    """Test _is_provider_available with circuit breaker transitions."""

    def _router(self) -> CascadeRouter:
        return CascadeRouter(config_path=Path("/nonexistent"))

    def test_disabled_provider_not_available(self):
        router = self._router()
        provider = Provider(name="p", type="ollama", enabled=False, priority=1)
        assert router._is_provider_available(provider) is False

    def test_healthy_provider_available(self):
        router = self._router()
        provider = Provider(name="p", type="ollama", enabled=True, priority=1)
        assert router._is_provider_available(provider) is True

    def test_unhealthy_open_circuit_not_available(self):
        router = self._router()
        provider = Provider(
            name="p",
            type="ollama",
            enabled=True,
            priority=1,
            status=ProviderStatus.UNHEALTHY,
            circuit_state=CircuitState.OPEN,
            circuit_opened_at=time.time(),  # Just opened — not yet recoverable
        )
        assert router._is_provider_available(provider) is False

    def test_unhealthy_after_timeout_transitions_to_half_open(self):
        router = self._router()
        router.config.circuit_breaker_recovery_timeout = 0
        provider = Provider(
            name="p",
            type="ollama",
            enabled=True,
            priority=1,
            status=ProviderStatus.UNHEALTHY,
            circuit_state=CircuitState.OPEN,
            circuit_opened_at=time.time() - 10,  # Long ago
        )
        result = router._is_provider_available(provider)
        assert result is True
        assert provider.circuit_state == CircuitState.HALF_OPEN


@pytest.mark.unit
class TestFilterProviders:
    """Test _filter_providers helper extracted from complete()."""

    def _router(self) -> CascadeRouter:
        router = CascadeRouter(config_path=Path("/nonexistent"))
        router.providers = [
            Provider(
                name="anthropic-p",
                type="anthropic",
                enabled=True,
                priority=1,
                api_key="key",
                tier="frontier",
            ),
            Provider(
                name="ollama-p",
                type="ollama",
                enabled=True,
                priority=2,
                tier="local",
            ),
        ]
        return router

    def test_no_tier_returns_all_providers(self):
        router = self._router()
        result = router._filter_providers(None)
        assert result is router.providers

    def test_frontier_required_returns_only_anthropic(self):
        router = self._router()
        result = router._filter_providers("frontier_required")
        assert len(result) == 1
        assert result[0].type == "anthropic"

    def test_frontier_required_no_anthropic_raises(self):
        router = CascadeRouter(config_path=Path("/nonexistent"))
        router.providers = [Provider(name="ollama-p", type="ollama", enabled=True, priority=1)]
        with pytest.raises(RuntimeError, match="No Anthropic provider configured"):
            router._filter_providers("frontier_required")

    def test_named_tier_filters_by_tier(self):
        router = self._router()
        result = router._filter_providers("local")
        assert len(result) == 1
        assert result[0].name == "ollama-p"

    def test_named_tier_not_found_raises(self):
        router = self._router()
        with pytest.raises(RuntimeError, match="No providers found for tier"):
            router._filter_providers("nonexistent")


@pytest.mark.unit
@pytest.mark.asyncio
class TestTrySingleProvider:
    """Test _try_single_provider helper extracted from complete()."""

    def _router(self) -> CascadeRouter:
        return CascadeRouter(config_path=Path("/nonexistent"))

    def _provider(self, name: str = "test", ptype: str = "ollama") -> Provider:
        return Provider(
            name=name,
            type=ptype,
            enabled=True,
            priority=1,
            models=[{"name": "llama3.2", "default": True}],
        )

    async def test_unavailable_provider_returns_none(self):
        router = self._router()
        provider = self._provider()
        provider.enabled = False
        errors: list[str] = []
        result = await router._try_single_provider(
            provider, [], None, 0.7, None, ContentType.TEXT, errors
        )
        assert result is None
        assert errors == []

    async def test_quota_blocked_cloud_provider_returns_none(self):
        router = self._router()
        provider = self._provider(ptype="anthropic")
        errors: list[str] = []
        with patch("infrastructure.router.cascade._quota_monitor") as mock_qm:
            mock_qm.select_model.return_value = "qwen3:14b"  # non-cloud → ACTIVE tier
            mock_qm.check.return_value = None
            result = await router._try_single_provider(
                provider, [], None, 0.7, None, ContentType.TEXT, errors
            )
        assert result is None
        assert errors == []

    async def test_success_returns_result_dict(self):
        router = self._router()
        provider = self._provider()
        errors: list[str] = []
        with patch.object(router, "_call_ollama") as mock_call:
            mock_call.return_value = {"content": "hi", "model": "llama3.2"}
            result = await router._try_single_provider(
                provider,
                [{"role": "user", "content": "hi"}],
                None,
                0.7,
                None,
                ContentType.TEXT,
                errors,
            )
        assert result is not None
        assert result["content"] == "hi"
        assert result["provider"] == "test"
        assert errors == []

    async def test_failure_appends_error_and_returns_none(self):
        router = self._router()
        provider = self._provider()
        errors: list[str] = []
        with patch.object(router, "_call_ollama") as mock_call:
            mock_call.side_effect = RuntimeError("boom")
            result = await router._try_single_provider(
                provider,
                [{"role": "user", "content": "hi"}],
                None,
                0.7,
                None,
                ContentType.TEXT,
                errors,
            )
        assert result is None
        assert len(errors) == 1
        assert "boom" in errors[0]
        assert provider.metrics.failed_requests == 1
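The circuit-breaker tests above exercise one rule: a disabled provider is never available, an OPEN circuit stays unavailable until the recovery timeout has elapsed, and once it has, the circuit moves to HALF_OPEN and the provider is let through. The sketch below illustrates that rule in isolation. `SketchState`, `SketchProvider`, `is_available`, and the `recovery_timeout` and `now` parameters are hypothetical names used only for illustration; the real `CascadeRouter._is_provider_available` is not shown in this diff and may differ.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SketchState(Enum):
    """Hypothetical stand-in for the router's circuit states."""

    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


@dataclass
class SketchProvider:
    """Hypothetical stand-in for the router's Provider model."""

    enabled: bool = True
    circuit_state: SketchState = SketchState.CLOSED
    circuit_opened_at: Optional[float] = None


def is_available(
    provider: SketchProvider,
    recovery_timeout: float,
    now: Optional[float] = None,
) -> bool:
    """Return True if the provider may be tried (assumed semantics)."""
    now = time.time() if now is None else now
    if not provider.enabled:
        return False
    if provider.circuit_state is SketchState.OPEN:
        opened_at = provider.circuit_opened_at or 0.0
        if now - opened_at >= recovery_timeout:
            # Timeout elapsed: allow a single probe through a half-open circuit.
            provider.circuit_state = SketchState.HALF_OPEN
            return True
        return False
    return True
```

Passing `now` explicitly keeps the sketch deterministic in tests; the tests in the diff get the same effect by setting the recovery timeout to 0 and back-dating `circuit_opened_at`.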
@@ -1,301 +0,0 @@
"""Unit tests for bannerlord agents — King, Vassals, Companions."""

import asyncio
from unittest.mock import AsyncMock, MagicMock, patch

from bannerlord.agents.companions import (
    CaravanCompanion,
    LogisticsCompanion,
    ScoutCompanion,
)
from bannerlord.agents.king import KingAgent
from bannerlord.agents.vassals import DiplomacyVassal, EconomyVassal, WarVassal
from bannerlord.gabs_client import GABSClient, GABSUnavailable
from bannerlord.ledger import Ledger
from bannerlord.models import (
    KingSubgoal,
    TaskMessage,
)

# ── Helpers ───────────────────────────────────────────────────────────────────


def _mock_gabs(state: dict | None = None) -> GABSClient:
    """Return a disconnected GABS stub that returns *state* from get_state."""
    gabs = MagicMock(spec=GABSClient)
    gabs.connected = False
    if state is not None:
        gabs.get_state = AsyncMock(return_value=state)
    else:
        gabs.get_state = AsyncMock(side_effect=GABSUnavailable("no game"))
    gabs.call = AsyncMock(return_value={})
    gabs.recruit_troops = AsyncMock(return_value={"recruited": 10})
    gabs.move_party = AsyncMock(return_value={"moving": True})
    return gabs


def _mock_ledger(tmp_path) -> Ledger:
    ledger = Ledger(db_path=tmp_path / "ledger.db")
    ledger.initialize()
    return ledger


# ── King agent ────────────────────────────────────────────────────────────────


class TestKingAgent:
    async def test_victory_detected(self, tmp_path):
        """Campaign stops immediately when victory condition is met."""
        gabs = _mock_gabs({"player_title": "King", "territory_control_pct": 55.0})
        ledger = _mock_ledger(tmp_path)
        king = KingAgent(gabs_client=gabs, ledger=ledger, tick_interval=0)
        victory = await king.run_campaign(max_ticks=10)
        assert victory.achieved

    async def test_max_ticks_respected(self, tmp_path):
        """Campaign stops after max_ticks when victory not yet achieved."""
        gabs = _mock_gabs({"player_title": "Lord", "territory_control_pct": 10.0})
        ledger = _mock_ledger(tmp_path)

        # Patch LLM to return a valid subgoal without calling Ollama
        king = KingAgent(gabs_client=gabs, ledger=ledger, tick_interval=0)
        with patch.object(king, "_decide", AsyncMock(return_value=KingSubgoal(token="RECRUIT"))):
            victory = await king.run_campaign(max_ticks=3)

        assert not victory.achieved
        assert king._tick == 3

    async def test_llm_failure_falls_back_to_recruit(self, tmp_path):
        """If LLM fails, King defaults to RECRUIT subgoal."""
        gabs = _mock_gabs({"player_title": "Lord", "territory_control_pct": 5.0})
        ledger = _mock_ledger(tmp_path)
        king = KingAgent(gabs_client=gabs, ledger=ledger, tick_interval=0)

        with patch.object(king, "_llm_decide", side_effect=RuntimeError("Ollama down")):
            subgoal = await king._decide({})

        assert subgoal.token == "RECRUIT"

    async def test_subgoal_broadcast_to_all_vassals(self, tmp_path):
        """King broadcasts subgoal to all three vassals."""
        gabs = _mock_gabs({})
        ledger = _mock_ledger(tmp_path)
        king = KingAgent(gabs_client=gabs, ledger=ledger)
        subgoal = KingSubgoal(token="EXPAND_TERRITORY", target="Epicrotea")
        await king._broadcast_subgoal(subgoal)

        messages = []
        while not king.subgoal_queue.empty():
            messages.append(king.subgoal_queue.get_nowait())

        assert len(messages) == 3
        recipients = {m.to_agent for m in messages}
        assert recipients == {"war_vassal", "economy_vassal", "diplomacy_vassal"}

    async def test_gabs_unavailable_uses_empty_state(self, tmp_path):
        """King handles GABS being offline gracefully."""
        gabs = _mock_gabs()  # raises GABSUnavailable
        ledger = _mock_ledger(tmp_path)
        king = KingAgent(gabs_client=gabs, ledger=ledger)
        state = await king._fetch_state()
        assert state == {}

    def test_evaluate_victory_king_with_majority(self, tmp_path):
        gabs = _mock_gabs()
        ledger = _mock_ledger(tmp_path)
        king = KingAgent(gabs_client=gabs, ledger=ledger)
        v = king._evaluate_victory({"player_title": "King", "territory_control_pct": 60.0})
        assert v.achieved

    def test_evaluate_victory_not_king(self, tmp_path):
        gabs = _mock_gabs()
        ledger = _mock_ledger(tmp_path)
        king = KingAgent(gabs_client=gabs, ledger=ledger)
        v = king._evaluate_victory({"player_title": "Lord", "territory_control_pct": 80.0})
        assert not v.achieved


# ── Vassals ───────────────────────────────────────────────────────────────────


class TestWarVassal:
    async def test_expand_territory_emits_move_task(self):
        gabs = _mock_gabs({"territory_delta": 1.0, "army_strength_ratio": 1.5})
        queue = asyncio.Queue()
        vassal = WarVassal(gabs_client=gabs, subgoal_queue=queue)
        subgoal = KingSubgoal(token="EXPAND_TERRITORY", target="Seonon")
        await vassal._tick(subgoal)
        task: TaskMessage = vassal.task_queue.get_nowait()
        assert task.primitive == "move_party"
        assert task.args["destination"] == "Seonon"

    async def test_recruit_emits_recruit_task(self):
        gabs = _mock_gabs({})
        queue = asyncio.Queue()
        vassal = WarVassal(gabs_client=gabs, subgoal_queue=queue)
        subgoal = KingSubgoal(token="RECRUIT", quantity=15)
        await vassal._tick(subgoal)
        task: TaskMessage = vassal.task_queue.get_nowait()
        assert task.primitive == "recruit_troop"
        assert task.args["quantity"] == 15

    async def test_irrelevant_token_emits_no_task(self):
        gabs = _mock_gabs({})
        queue = asyncio.Queue()
        vassal = WarVassal(gabs_client=gabs, subgoal_queue=queue)
        subgoal = KingSubgoal(token="ALLY")
        await vassal._tick(subgoal)
        assert vassal.task_queue.empty()


class TestEconomyVassal:
    async def test_fortify_emits_build_task(self):
        gabs = _mock_gabs({"daily_income": 200.0})
        queue = asyncio.Queue()
        vassal = EconomyVassal(gabs_client=gabs, subgoal_queue=queue)
        subgoal = KingSubgoal(token="FORTIFY", target="Epicrotea")
        await vassal._tick(subgoal)
        task: TaskMessage = vassal.task_queue.get_nowait()
        assert task.primitive == "build_project"
        assert task.args["settlement"] == "Epicrotea"

    async def test_trade_emits_assess_prices(self):
        gabs = _mock_gabs({})
        queue = asyncio.Queue()
        vassal = EconomyVassal(gabs_client=gabs, subgoal_queue=queue)
        subgoal = KingSubgoal(token="TRADE", target="Pravend")
        await vassal._tick(subgoal)
        task: TaskMessage = vassal.task_queue.get_nowait()
        assert task.primitive == "assess_prices"


class TestDiplomacyVassal:
    async def test_ally_emits_track_lord(self):
        gabs = _mock_gabs({"allies_count": 1})
        queue = asyncio.Queue()
        vassal = DiplomacyVassal(gabs_client=gabs, subgoal_queue=queue)
        subgoal = KingSubgoal(token="ALLY", target="Derthert")
        await vassal._tick(subgoal)
        task: TaskMessage = vassal.task_queue.get_nowait()
        assert task.primitive == "track_lord"
        assert task.args["name"] == "Derthert"

    async def test_spy_emits_assess_garrison(self):
        gabs = _mock_gabs({})
        queue = asyncio.Queue()
        vassal = DiplomacyVassal(gabs_client=gabs, subgoal_queue=queue)
        subgoal = KingSubgoal(token="SPY", target="Marunath")
        await vassal._tick(subgoal)
        task: TaskMessage = vassal.task_queue.get_nowait()
        assert task.primitive == "assess_garrison"
        assert task.args["settlement"] == "Marunath"


# ── Companions ────────────────────────────────────────────────────────────────


class TestLogisticsCompanion:
    async def test_recruit_troop(self):
        gabs = _mock_gabs()
        gabs.recruit_troops = AsyncMock(return_value={"recruited": 10, "type": "infantry"})
        q: asyncio.Queue[TaskMessage] = asyncio.Queue()
        comp = LogisticsCompanion(gabs_client=gabs, task_queue=q)
        task = TaskMessage(
            from_agent="war_vassal",
            to_agent="logistics_companion",
            primitive="recruit_troop",
            args={"troop_type": "infantry", "quantity": 10},
        )
        result = await comp._execute(task)
        assert result.success is True
        assert result.outcome["recruited"] == 10

    async def test_unknown_primitive_fails_gracefully(self):
        gabs = _mock_gabs()
        q: asyncio.Queue[TaskMessage] = asyncio.Queue()
        comp = LogisticsCompanion(gabs_client=gabs, task_queue=q)
        task = TaskMessage(
            from_agent="war_vassal",
            to_agent="logistics_companion",
            primitive="launch_nukes",
            args={},
        )
        result = await comp._execute(task)
        assert result.success is False
        assert "Unknown primitive" in result.outcome["error"]

    async def test_gabs_unavailable_returns_failure(self):
        gabs = _mock_gabs()
        gabs.recruit_troops = AsyncMock(side_effect=GABSUnavailable("offline"))
        q: asyncio.Queue[TaskMessage] = asyncio.Queue()
        comp = LogisticsCompanion(gabs_client=gabs, task_queue=q)
        task = TaskMessage(
            from_agent="war_vassal",
            to_agent="logistics_companion",
            primitive="recruit_troop",
            args={"troop_type": "infantry", "quantity": 5},
        )
        result = await comp._execute(task)
        assert result.success is False


class TestCaravanCompanion:
    async def test_assess_prices(self):
        gabs = _mock_gabs()
        gabs.call = AsyncMock(return_value={"grain": 12, "linen": 45})
        q: asyncio.Queue[TaskMessage] = asyncio.Queue()
        comp = CaravanCompanion(gabs_client=gabs, task_queue=q)
        task = TaskMessage(
            from_agent="economy_vassal",
            to_agent="caravan_companion",
            primitive="assess_prices",
            args={"town": "Pravend"},
        )
        result = await comp._execute(task)
        assert result.success is True

    async def test_abandon_route(self):
        gabs = _mock_gabs()
        gabs.call = AsyncMock(return_value={"abandoned": True})
        q: asyncio.Queue[TaskMessage] = asyncio.Queue()
        comp = CaravanCompanion(gabs_client=gabs, task_queue=q)
        task = TaskMessage(
            from_agent="economy_vassal",
            to_agent="caravan_companion",
            primitive="abandon_route",
            args={},
        )
        result = await comp._execute(task)
        assert result.success is True
        assert result.outcome["abandoned"] is True


class TestScoutCompanion:
    async def test_assess_garrison(self):
        gabs = _mock_gabs()
        gabs.call = AsyncMock(return_value={"garrison_size": 120, "settlement": "Marunath"})
        q: asyncio.Queue[TaskMessage] = asyncio.Queue()
        comp = ScoutCompanion(gabs_client=gabs, task_queue=q)
        task = TaskMessage(
            from_agent="diplomacy_vassal",
            to_agent="scout_companion",
            primitive="assess_garrison",
            args={"settlement": "Marunath"},
        )
        result = await comp._execute(task)
        assert result.success is True
        assert result.outcome["garrison_size"] == 120

    async def test_report_intel(self):
        gabs = _mock_gabs()
        gabs.call = AsyncMock(return_value={"intel": ["Derthert at Epicrotea"]})
        q: asyncio.Queue[TaskMessage] = asyncio.Queue()
        comp = ScoutCompanion(gabs_client=gabs, task_queue=q)
        task = TaskMessage(
            from_agent="diplomacy_vassal",
            to_agent="scout_companion",
            primitive="report_intel",
            args={},
        )
        result = await comp._execute(task)
        assert result.success is True
@@ -1,145 +0,0 @@
"""Unit tests for bannerlord.gabs_client — TCP JSON-RPC client."""

import json
from unittest.mock import AsyncMock, MagicMock, patch

import pytest

from bannerlord.gabs_client import GABSClient, GABSError, GABSUnavailable

# ── Connection ────────────────────────────────────────────────────────────────


class TestGABSClientConnection:
    async def test_connect_success(self):
        mock_reader = AsyncMock()
        mock_writer = MagicMock()
        mock_writer.close = MagicMock()
        mock_writer.wait_closed = AsyncMock()

        with patch(
            "bannerlord.gabs_client.asyncio.open_connection",
            return_value=(mock_reader, mock_writer),
        ):
            client = GABSClient()
            await client.connect()

        assert client.connected is True
        await client.close()

    async def test_connect_failure_degrades_gracefully(self):
        with patch(
            "bannerlord.gabs_client.asyncio.open_connection",
            side_effect=OSError("Connection refused"),
        ):
            client = GABSClient()
            await client.connect()  # must not raise

        assert client.connected is False

    async def test_connect_timeout_degrades_gracefully(self):
        with patch(
            "bannerlord.gabs_client.asyncio.open_connection",
            side_effect=TimeoutError(),
        ):
            client = GABSClient()
            await client.connect()

        assert client.connected is False

    async def test_context_manager(self):
        mock_reader = AsyncMock()
        mock_writer = MagicMock()
        mock_writer.close = MagicMock()
        mock_writer.wait_closed = AsyncMock()

        with patch(
            "bannerlord.gabs_client.asyncio.open_connection",
            return_value=(mock_reader, mock_writer),
        ):
            async with GABSClient() as client:
                assert client.connected is True

        assert client.connected is False


# ── RPC ───────────────────────────────────────────────────────────────────────


class TestGABSClientRPC:
    def _make_connected_client(self, response_data: dict):
        """Return a client with mocked reader/writer."""
        client = GABSClient()
        client._connected = True

        raw_response = json.dumps(response_data) + "\n"
        client._reader = AsyncMock()
        client._reader.readline = AsyncMock(return_value=raw_response.encode())

        client._writer = MagicMock()
        client._writer.write = MagicMock()
        client._writer.drain = AsyncMock()

        return client

    async def test_call_returns_result(self):
        client = self._make_connected_client({"jsonrpc": "2.0", "id": 1, "result": {"foo": "bar"}})
        result = await client.call("game.getState")
        assert result == {"foo": "bar"}

    async def test_call_raises_on_error(self):
        client = self._make_connected_client(
            {"jsonrpc": "2.0", "id": 1, "error": {"code": -32601, "message": "Method not found"}}
        )
        with pytest.raises(GABSError, match="Method not found"):
            await client.call("game.nonexistent")

    async def test_call_raises_unavailable_when_not_connected(self):
        client = GABSClient()
        assert client.connected is False
        with pytest.raises(GABSUnavailable):
            await client.call("game.getState")

    async def test_sequence_increments(self):
        client = self._make_connected_client({"jsonrpc": "2.0", "id": 1, "result": {}})
        await client.call("game.getState")
        assert client._seq == 1
        client._reader.readline = AsyncMock(
            return_value=(json.dumps({"jsonrpc": "2.0", "id": 2, "result": {}}) + "\n").encode()
        )
        await client.call("game.getState")
        assert client._seq == 2

    async def test_get_state_calls_correct_method(self):
        client = self._make_connected_client(
            {"jsonrpc": "2.0", "id": 1, "result": {"campaign_day": 10}}
        )
        result = await client.get_state()
        written = client._writer.write.call_args[0][0].decode()
        payload = json.loads(written.strip())
        assert payload["method"] == "game.getState"
        assert result == {"campaign_day": 10}

    async def test_move_party_sends_target(self):
        client = self._make_connected_client(
            {"jsonrpc": "2.0", "id": 1, "result": {"moving": True}}
        )
        await client.move_party("Epicrotea")
        written = client._writer.write.call_args[0][0].decode()
        payload = json.loads(written.strip())
        assert payload["method"] == "party.move"
        assert payload["params"]["target"] == "Epicrotea"

    async def test_connection_lost_marks_disconnected(self):
        client = GABSClient()
        client._connected = True
        client._reader = AsyncMock()
        client._reader.readline = AsyncMock(side_effect=OSError("connection reset"))
        client._writer = MagicMock()
        client._writer.write = MagicMock()
        client._writer.drain = AsyncMock()

        with pytest.raises(GABSUnavailable):
            await client.call("game.getState")

        assert client.connected is False
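The RPC tests above imply a simple wire format: each request is one JSON-RPC 2.0 object serialized on a single line and terminated by a newline, and each response is read back with `readline()` and carries either a `result` or an `error` member. A minimal sketch of that framing follows. `encode_request`, `decode_response`, and `SketchRPCError` are hypothetical helpers written for illustration; the actual `GABSClient` internals are not part of this diff.

```python
import json
from typing import Any, Optional


class SketchRPCError(Exception):
    """Hypothetical error raised for a JSON-RPC error response."""


def encode_request(seq: int, method: str, params: Optional[dict] = None) -> bytes:
    """Serialize one JSON-RPC 2.0 request as a newline-terminated line."""
    payload = {
        "jsonrpc": "2.0",
        "id": seq,
        "method": method,
        "params": params or {},
    }
    return (json.dumps(payload) + "\n").encode()


def decode_response(line: bytes) -> Any:
    """Parse one response line; raise on an 'error' member, else return 'result'."""
    message = json.loads(line.decode())
    if "error" in message:
        raise SketchRPCError(message["error"].get("message", "unknown error"))
    return message.get("result")
```

Keeping encoding and decoding as pure functions makes the framing testable without mocking a reader and writer, which is the part the tests above have to stub out by hand.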
@@ -1,189 +0,0 @@
"""Unit tests for bannerlord.models — data contracts and reward functions."""

import pytest

from bannerlord.models import (
    SUBGOAL_TOKENS,
    DiplomacyReward,
    EconomyReward,
    KingSubgoal,
    ResultMessage,
    StateUpdateMessage,
    SubgoalMessage,
    TaskMessage,
    VictoryCondition,
    WarReward,
)

# ── KingSubgoal ───────────────────────────────────────────────────────────────


class TestKingSubgoal:
    def test_valid_token(self):
        s = KingSubgoal(token="EXPAND_TERRITORY", target="Epicrotea")
        assert s.token == "EXPAND_TERRITORY"
        assert s.target == "Epicrotea"
        assert s.priority == 1.0

    def test_all_tokens_valid(self):
        for token in SUBGOAL_TOKENS:
            KingSubgoal(token=token)

    def test_invalid_token_raises(self):
        with pytest.raises(ValueError, match="Unknown subgoal token"):
            KingSubgoal(token="NUKE_CALRADIA")

    def test_priority_clamp(self):
        with pytest.raises(ValueError):
            KingSubgoal(token="TRADE", priority=3.0)

    def test_optional_fields_default_none(self):
        s = KingSubgoal(token="HEAL")
        assert s.target is None
        assert s.quantity is None
        assert s.deadline_days is None
        assert s.context is None


# ── Messages ──────────────────────────────────────────────────────────────────


class TestSubgoalMessage:
    def test_defaults(self):
        msg = SubgoalMessage(
            to_agent="war_vassal",
            subgoal=KingSubgoal(token="RAID_ECONOMY"),
        )
        assert msg.msg_type == "subgoal"
        assert msg.from_agent == "king"
        assert msg.to_agent == "war_vassal"
        assert msg.issued_at is not None

    def test_subgoal_roundtrip(self):
        subgoal = KingSubgoal(token="RECRUIT", quantity=30, priority=1.5)
        msg = SubgoalMessage(to_agent="war_vassal", subgoal=subgoal)
        assert msg.subgoal.quantity == 30
        assert msg.subgoal.priority == 1.5


class TestTaskMessage:
    def test_construction(self):
        t = TaskMessage(
            from_agent="war_vassal",
            to_agent="logistics_companion",
            primitive="recruit_troop",
            args={"troop_type": "cavalry", "quantity": 5},
            priority=1.2,
        )
        assert t.msg_type == "task"
        assert t.primitive == "recruit_troop"
        assert t.args["quantity"] == 5


class TestResultMessage:
    def test_success(self):
        r = ResultMessage(
            from_agent="logistics_companion",
            to_agent="war_vassal",
            success=True,
            outcome={"recruited": 10},
            reward_delta=0.15,
        )
        assert r.success is True
        assert r.reward_delta == 0.15

    def test_failure(self):
        r = ResultMessage(
            from_agent="scout_companion",
            to_agent="diplomacy_vassal",
            success=False,
            outcome={"error": "GABS unavailable"},
        )
        assert r.success is False
        assert r.reward_delta == 0.0


class TestStateUpdateMessage:
    def test_construction(self):
        msg = StateUpdateMessage(
            game_state={"campaign_day": 42, "player_title": "Lord"},
            tick=42,
        )
        assert msg.msg_type == "state"
        assert msg.tick == 42
        assert msg.game_state["campaign_day"] == 42


# ── Reward functions ──────────────────────────────────────────────────────────


class TestWarReward:
    def test_positive_expansion(self):
        r = WarReward(territory_delta=2.0, army_strength_ratio=1.2, subgoal_bonus=0.1)
        assert r.total > 0

    def test_casualty_cost_penalizes(self):
        no_cost = WarReward(territory_delta=1.0, army_strength_ratio=1.0)
        with_cost = WarReward(territory_delta=1.0, army_strength_ratio=1.0, casualty_cost=5.0)
        assert with_cost.total < no_cost.total

    def test_zero_state(self):
        r = WarReward()
        # army_strength_ratio default 1.0, rest 0 → 0.25 * 1.0 = 0.25
        assert abs(r.total - 0.25) < 1e-9


class TestEconomyReward:
    def test_income_positive(self):
        r = EconomyReward(daily_denars_income=100.0, food_stock_buffer=7.0, loyalty_average=80.0)
        assert r.total > 0

    def test_construction_queue_penalizes(self):
        no_queue = EconomyReward(daily_denars_income=50.0)
        long_queue = EconomyReward(daily_denars_income=50.0, construction_queue_length=10)
        assert long_queue.total < no_queue.total

    def test_loyalty_contributes(self):
        low_loyalty = EconomyReward(loyalty_average=10.0)
        high_loyalty = EconomyReward(loyalty_average=90.0)
        assert high_loyalty.total > low_loyalty.total


class TestDiplomacyReward:
    def test_allies_positive(self):
        r = DiplomacyReward(allies_count=3)
        assert r.total > 0

    def test_active_wars_penalizes(self):
        peace = DiplomacyReward(allies_count=2)
        war = DiplomacyReward(allies_count=2, active_wars_front=4)
        assert war.total < peace.total


# ── Victory condition ─────────────────────────────────────────────────────────


class TestVictoryCondition:
    def test_not_achieved_without_title(self):
        v = VictoryCondition(holds_king_title=False, territory_control_pct=70.0)
        assert not v.achieved

    def test_not_achieved_without_majority(self):
        v = VictoryCondition(holds_king_title=True, territory_control_pct=40.0)
        assert not v.achieved

    def test_achieved_when_king_with_majority(self):
        v = VictoryCondition(holds_king_title=True, territory_control_pct=55.0)
        assert v.achieved

    def test_exact_threshold(self):
        v = VictoryCondition(holds_king_title=True, territory_control_pct=51.0)
        assert v.achieved

    def test_custom_threshold(self):
        v = VictoryCondition(
            holds_king_title=True,
            territory_control_pct=70.0,
            majority_threshold=75.0,
        )
        assert not v.achieved
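Taken together, the `TestVictoryCondition` cases describe a conjunction: victory requires both the king title and a territory share at or above a threshold. The sketch below is one model consistent with those five tests. `SketchVictoryCondition` is a hypothetical stand-in; the default threshold of 51.0 and the inclusive `>=` comparison are assumptions inferred from `test_exact_threshold`, not confirmed by the real `bannerlord.models.VictoryCondition`, which this diff does not show.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchVictoryCondition:
    """Hypothetical stand-in for bannerlord.models.VictoryCondition."""

    holds_king_title: bool = False
    territory_control_pct: float = 0.0
    majority_threshold: float = 51.0  # assumed default, inferred from the tests

    @property
    def achieved(self) -> bool:
        # Both conditions must hold; the comparison is assumed to be inclusive.
        return (
            self.holds_king_title
            and self.territory_control_pct >= self.majority_threshold
        )
```

Exposing `achieved` as a derived property rather than a stored field means the result cannot drift out of sync with the inputs it depends on.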
@@ -1,897 +0,0 @@
|
||||
"""Unit tests for src/config.py — Settings, validation, and helper functions.
|
||||
|
||||
Refs #1172
|
||||
"""
|
||||
|
||||
import os
|
||||
from unittest.mock import MagicMock, patch
|
||||
|
||||
import pytest
|
||||
|
||||
pytestmark = pytest.mark.unit
|
||||
|
||||
|
||||
# ── Helpers ──────────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
def _make_settings(**env_overrides):
|
||||
"""Create a fresh Settings instance with isolated env vars."""
|
||||
from config import Settings
|
||||
|
||||
# Prevent Pydantic from reading .env file (local .env pollutes defaults)
|
||||
_orig_config = Settings.model_config.copy()
|
||||
Settings.model_config["env_file"] = None
|
||||
|
||||
# Strip keys that might bleed in from the test environment
|
||||
clean_env = {
|
||||
k: v
|
||||
for k, v in os.environ.items()
|
||||
if not k.startswith(
|
||||
(
|
||||
"OLLAMA_",
|
||||
"TIMMY_",
|
||||
"AGENT_",
|
||||
"DEBUG",
|
||||
"GITEA_",
|
||||
"GROK_",
|
||||
"ANTHROPIC_",
|
||||
"SPARK_",
|
||||
"MEMORY_",
|
||||
"MAX_",
|
||||
"DISCORD_",
|
||||
"TELEGRAM_",
|
||||
"CORS_",
|
||||
"TRUSTED_",
|
||||
"L402_",
|
||||
"LIGHTNING_",
|
||||
"REPO_ROOT",
|
||||
"RQLITE_",
|
||||
"BRAIN_",
|
||||
"SELF_MODIFY",
|
||||
"WORK_ORDERS",
|
||||
"VASSAL_",
|
||||
"PAPERCLIP_",
|
||||
"OPENFANG_",
|
||||
"HERMES_",
|
||||
"BACKLOG_",
|
||||
"LOOP_QA",
|
||||
"FOCUS_",
|
||||
"THINKING_",
|
||||
"HANDS_",
|
||||
"WEEKLY_",
|
||||
"AUTORESEARCH_",
|
||||
"REWARD_",
|
||||
"BROWSER_",
|
||||
"GABS_",
|
||||
"SCRIPTURE_",
|
||||
"MCP_",
|
||||
"CHAT_API",
|
||||
"CSRF_",
|
||||
"ERROR_",
|
||||
"DB_",
|
||||
"MODERATION_",
|
||||
"SOVEREIGNTY_",
|
||||
"XAI_",
|
||||
"CLAUDE_",
|
||||
"FLUX_",
|
||||
"IMAGE_",
|
||||
"MUSIC_",
|
||||
"VIDEO_",
|
||||
"CREATIVE_",
|
||||
"WAN_",
|
||||
"ACE_",
|
||||
"GIT_",
|
||||
)
|
||||
)
|
||||
}
|
||||
clean_env.update(env_overrides)
|
||||
with patch.dict(os.environ, clean_env, clear=True):
|
||||
try:
|
||||
return Settings()
|
||||
finally:
|
||||
Settings.model_config.update(_orig_config)
|
||||
|
||||
|
||||
# ── normalize_ollama_url ──────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestNormalizeOllamaUrl:
    """normalize_ollama_url replaces localhost with 127.0.0.1."""

    def test_replaces_localhost(self):
        from config import normalize_ollama_url

        assert normalize_ollama_url("http://localhost:11434") == "http://127.0.0.1:11434"

    def test_preserves_ip_address(self):
        from config import normalize_ollama_url

        assert normalize_ollama_url("http://192.168.1.5:11434") == "http://192.168.1.5:11434"

    def test_preserves_non_localhost_hostname(self):
        from config import normalize_ollama_url

        assert normalize_ollama_url("http://ollama.local:11434") == "http://ollama.local:11434"

    def test_replaces_multiple_occurrences(self):
        from config import normalize_ollama_url

        result = normalize_ollama_url("http://localhost:11434/localhost")
        assert result == "http://127.0.0.1:11434/127.0.0.1"

    def test_empty_string(self):
        from config import normalize_ollama_url

        assert normalize_ollama_url("") == ""

    def test_127_0_0_1_unchanged(self):
        from config import normalize_ollama_url

        url = "http://127.0.0.1:11434"
        assert normalize_ollama_url(url) == url
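The six cases above pin down only the observable behavior of `normalize_ollama_url`. One implementation consistent with all of them is a plain string replacement. The sketch below is inferred from the assertions and is not taken from `config.py`; the name `normalize_ollama_url_sketch` is hypothetical.

```python
def normalize_ollama_url_sketch(url: str) -> str:
    """Assumed behavior: swap every "localhost" for "127.0.0.1".

    Hypothetical stand-in for config.normalize_ollama_url, reconstructed
    from the test expectations only.
    """
    # str.replace rewrites all occurrences, which matches the
    # "multiple occurrences" test; other hostnames are left untouched.
    return url.replace("localhost", "127.0.0.1")


for candidate in ("http://localhost:11434", "http://ollama.local:11434", ""):
    print(repr(candidate), "->", repr(normalize_ollama_url_sketch(candidate)))
```

A plain replacement also rewrites "localhost" when it appears in the path, which is exactly what `test_replaces_multiple_occurrences` expects; a URL-aware variant that only touched the host would fail that test.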
|
||||
|
||||
|
||||
# ── Settings defaults ─────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestSettingsDefaults:
|
||||
"""Settings instantiation produces correct defaults."""
|
||||
|
||||
def test_default_agent_name(self):
|
||||
s = _make_settings()
|
||||
assert s.agent_name == "Agent"
|
||||
|
||||
def test_default_ollama_url(self):
|
||||
s = _make_settings()
|
||||
assert s.ollama_url == "http://localhost:11434"
|
||||
|
||||
def test_default_ollama_model(self):
|
||||
s = _make_settings()
|
||||
assert s.ollama_model == "qwen3:14b"
|
||||
|
||||
def test_default_ollama_fast_model(self):
|
||||
s = _make_settings()
|
||||
assert s.ollama_fast_model == "qwen3:8b"
|
||||
|
||||
def test_default_ollama_num_ctx(self):
|
||||
s = _make_settings()
|
||||
assert s.ollama_num_ctx == 32768
|
||||
|
||||
def test_default_ollama_max_loaded_models(self):
|
||||
s = _make_settings()
|
||||
assert s.ollama_max_loaded_models == 2
|
||||
|
||||
def test_default_debug_false(self):
|
||||
s = _make_settings()
|
||||
assert s.debug is False
|
||||
|
||||
def test_default_timmy_env(self):
|
||||
s = _make_settings()
|
||||
assert s.timmy_env == "development"
|
||||
|
||||
def test_default_timmy_test_mode_false(self):
|
||||
s = _make_settings()
|
||||
assert s.timmy_test_mode is False
|
||||
|
||||
def test_default_spark_enabled(self):
|
||||
s = _make_settings()
|
||||
assert s.spark_enabled is True
|
||||
|
||||
def test_default_lightning_backend(self):
|
||||
s = _make_settings()
|
||||
assert s.lightning_backend == "mock"
|
||||
|
||||
def test_default_max_agent_steps(self):
|
||||
s = _make_settings()
|
||||
assert s.max_agent_steps == 10
|
||||
|
||||
def test_default_memory_prune_days(self):
|
||||
s = _make_settings()
|
||||
assert s.memory_prune_days == 90
|
||||
|
||||
def test_default_memory_prune_keep_facts(self):
|
||||
s = _make_settings()
|
||||
assert s.memory_prune_keep_facts is True
|
||||
|
||||
def test_default_fallback_models_is_list(self):
|
||||
s = _make_settings()
|
||||
assert isinstance(s.fallback_models, list)
|
||||
assert len(s.fallback_models) > 0
|
||||
|
||||
def test_default_vision_fallback_models_is_list(self):
|
||||
s = _make_settings()
|
||||
assert isinstance(s.vision_fallback_models, list)
|
||||
assert len(s.vision_fallback_models) > 0
|
||||
|
||||
def test_default_cors_origins_is_list(self):
|
||||
s = _make_settings()
|
||||
assert isinstance(s.cors_origins, list)
|
||||
assert len(s.cors_origins) > 0
|
||||
|
||||
def test_default_trusted_hosts_is_list(self):
|
||||
s = _make_settings()
|
||||
assert isinstance(s.trusted_hosts, list)
|
||||
assert "localhost" in s.trusted_hosts
|
||||
|
||||
def test_default_timmy_model_backend(self):
|
||||
s = _make_settings()
|
||||
assert s.timmy_model_backend == "ollama"
|
||||
|
||||
def test_default_grok_enabled_false(self):
|
||||
s = _make_settings()
|
||||
assert s.grok_enabled is False
|
||||
|
||||
def test_default_moderation_enabled(self):
|
||||
s = _make_settings()
|
||||
assert s.moderation_enabled is True
|
||||
|
||||
def test_default_moderation_threshold(self):
|
||||
s = _make_settings()
|
||||
assert s.moderation_threshold == 0.8
|
||||
|
||||
def test_default_telemetry_disabled(self):
|
||||
s = _make_settings()
|
||||
assert s.telemetry_enabled is False
|
||||
|
||||
def test_default_db_busy_timeout(self):
|
||||
s = _make_settings()
|
||||
assert s.db_busy_timeout_ms == 5000
|
||||
|
||||
def test_default_chat_api_max_body_bytes(self):
|
||||
s = _make_settings()
|
||||
assert s.chat_api_max_body_bytes == 1_048_576
|
||||
|
||||
def test_default_csrf_cookie_secure_false(self):
|
||||
s = _make_settings()
|
||||
assert s.csrf_cookie_secure is False
|
||||
|
||||
def test_default_self_modify_disabled(self):
|
||||
s = _make_settings()
|
||||
assert s.self_modify_enabled is False
|
||||
|
||||
def test_default_vassal_disabled(self):
|
||||
s = _make_settings()
|
||||
assert s.vassal_enabled is False
|
||||
|
||||
def test_default_focus_mode(self):
|
||||
s = _make_settings()
|
||||
assert s.focus_mode == "broad"
|
||||
|
||||
def test_default_thinking_enabled(self):
|
||||
s = _make_settings()
|
||||
assert s.thinking_enabled is True
|
||||
|
||||
def test_default_gitea_url(self):
|
||||
s = _make_settings()
|
||||
assert s.gitea_url == "http://localhost:3000"
|
||||
|
||||
def test_default_hermes_enabled(self):
|
||||
s = _make_settings()
|
||||
assert s.hermes_enabled is True
|
||||
|
||||
def test_default_scripture_enabled(self):
|
||||
s = _make_settings()
|
||||
assert s.scripture_enabled is True
|
||||
|
||||
|
||||
# ── normalized_ollama_url property ───────────────────────────────────────────
|
||||
|
||||
|
||||
class TestNormalizedOllamaUrlProperty:
|
||||
"""normalized_ollama_url property applies normalize_ollama_url."""
|
||||
|
||||
def test_default_url_normalized(self):
|
||||
s = _make_settings()
|
||||
assert "127.0.0.1" in s.normalized_ollama_url
|
||||
assert "localhost" not in s.normalized_ollama_url
|
||||
|
||||
def test_custom_url_with_localhost(self):
|
||||
s = _make_settings(OLLAMA_URL="http://localhost:9999")
|
||||
assert s.normalized_ollama_url == "http://127.0.0.1:9999"
|
||||
|
||||
def test_custom_url_without_localhost_unchanged(self):
|
||||
s = _make_settings(OLLAMA_URL="http://192.168.1.5:11434")
|
||||
assert s.normalized_ollama_url == "http://192.168.1.5:11434"
|
||||
|
||||
|
||||
# ── Env var overrides ─────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestSettingsEnvOverrides:
|
||||
"""Environment variables override default values."""
|
||||
|
||||
def test_agent_name_override(self):
|
||||
s = _make_settings(AGENT_NAME="Timmy")
|
||||
assert s.agent_name == "Timmy"
|
||||
|
||||
def test_ollama_url_override(self):
|
||||
s = _make_settings(OLLAMA_URL="http://10.0.0.1:11434")
|
||||
assert s.ollama_url == "http://10.0.0.1:11434"
|
||||
|
||||
def test_ollama_model_override(self):
|
||||
s = _make_settings(OLLAMA_MODEL="llama3.1")
|
||||
assert s.ollama_model == "llama3.1"
|
||||
|
||||
def test_ollama_fast_model_override(self):
|
||||
s = _make_settings(OLLAMA_FAST_MODEL="gemma:2b")
|
||||
assert s.ollama_fast_model == "gemma:2b"
|
||||
|
||||
def test_ollama_num_ctx_override(self):
|
||||
s = _make_settings(OLLAMA_NUM_CTX="8192")
|
||||
assert s.ollama_num_ctx == 8192
|
||||
|
||||
def test_debug_true_from_string(self):
|
||||
s = _make_settings(DEBUG="true")
|
||||
assert s.debug is True
|
||||
|
||||
def test_debug_false_from_string(self):
|
||||
s = _make_settings(DEBUG="false")
|
||||
assert s.debug is False
|
||||
|
||||
def test_timmy_env_production(self):
|
||||
s = _make_settings(TIMMY_ENV="production")
|
||||
assert s.timmy_env == "production"
|
||||
|
||||
def test_timmy_test_mode_true(self):
|
||||
s = _make_settings(TIMMY_TEST_MODE="true")
|
||||
assert s.timmy_test_mode is True
|
||||
|
||||
def test_grok_enabled_override(self):
|
||||
s = _make_settings(GROK_ENABLED="true")
|
||||
assert s.grok_enabled is True
|
||||
|
||||
def test_spark_enabled_override(self):
|
||||
s = _make_settings(SPARK_ENABLED="false")
|
||||
assert s.spark_enabled is False
|
||||
|
||||
def test_memory_prune_days_override(self):
|
||||
s = _make_settings(MEMORY_PRUNE_DAYS="30")
|
||||
assert s.memory_prune_days == 30
|
||||
|
||||
def test_max_agent_steps_override(self):
|
||||
s = _make_settings(MAX_AGENT_STEPS="25")
|
||||
assert s.max_agent_steps == 25
|
||||
|
||||
def test_telegram_token_override(self):
|
||||
s = _make_settings(TELEGRAM_TOKEN="tg-secret")
|
||||
assert s.telegram_token == "tg-secret"
|
||||
|
||||
def test_discord_token_override(self):
|
||||
s = _make_settings(DISCORD_TOKEN="dc-secret")
|
||||
assert s.discord_token == "dc-secret"
|
||||
|
||||
def test_gitea_url_override(self):
|
||||
s = _make_settings(GITEA_URL="http://10.0.0.1:3000")
|
||||
assert s.gitea_url == "http://10.0.0.1:3000"
|
||||
|
||||
def test_gitea_repo_override(self):
|
||||
s = _make_settings(GITEA_REPO="myorg/myrepo")
|
||||
assert s.gitea_repo == "myorg/myrepo"
|
||||
|
||||
def test_focus_mode_deep(self):
|
||||
s = _make_settings(FOCUS_MODE="deep")
|
||||
assert s.focus_mode == "deep"
|
||||
|
||||
def test_thinking_interval_override(self):
|
||||
s = _make_settings(THINKING_INTERVAL_SECONDS="60")
|
||||
assert s.thinking_interval_seconds == 60
|
||||
|
||||
def test_hermes_interval_override(self):
|
||||
s = _make_settings(HERMES_INTERVAL_SECONDS="60")
|
||||
assert s.hermes_interval_seconds == 60
|
||||
|
||||
def test_vassal_enabled_override(self):
|
||||
s = _make_settings(VASSAL_ENABLED="true")
|
||||
assert s.vassal_enabled is True
|
||||
|
||||
def test_self_modify_enabled_override(self):
|
||||
s = _make_settings(SELF_MODIFY_ENABLED="true")
|
||||
assert s.self_modify_enabled is True
|
||||
|
||||
def test_moderation_enabled_override(self):
|
||||
s = _make_settings(MODERATION_ENABLED="false")
|
||||
assert s.moderation_enabled is False
|
||||
|
||||
def test_l402_hmac_secret_override(self):
|
||||
s = _make_settings(L402_HMAC_SECRET="mysecret")
|
||||
assert s.l402_hmac_secret == "mysecret"
|
||||
|
||||
def test_anthropic_api_key_override(self):
|
||||
s = _make_settings(ANTHROPIC_API_KEY="sk-ant-abc")
|
||||
assert s.anthropic_api_key == "sk-ant-abc"
|
||||
|
||||
|
||||
# ── Type validation ───────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestSettingsTypeValidation:
|
||||
"""Pydantic correctly parses and validates types from string env vars."""
|
||||
|
||||
def test_bool_from_1(self):
|
||||
s = _make_settings(DEBUG="1")
|
||||
assert s.debug is True
|
||||
|
||||
def test_bool_from_0(self):
|
||||
s = _make_settings(DEBUG="0")
|
||||
assert s.debug is False
|
||||
|
||||
def test_int_field_rejects_non_numeric(self):
|
||||
from pydantic import ValidationError
|
||||
|
||||
with pytest.raises(ValidationError):
|
||||
_make_settings(OLLAMA_NUM_CTX="not_a_number")
|
||||
|
||||
def test_timmy_env_rejects_invalid_literal(self):
|
||||
from pydantic import ValidationError
|
||||
|
||||
with pytest.raises(ValidationError):
|
||||
_make_settings(TIMMY_ENV="staging")
|
||||
|
||||
def test_timmy_model_backend_rejects_invalid(self):
|
||||
from pydantic import ValidationError
|
||||
|
||||
with pytest.raises(ValidationError):
|
||||
_make_settings(TIMMY_MODEL_BACKEND="openai")
|
||||
|
||||
def test_timmy_model_backend_accepts_all_valid_values(self):
|
||||
for backend in ("ollama", "grok", "claude", "auto"):
|
||||
s = _make_settings(TIMMY_MODEL_BACKEND=backend)
|
||||
assert s.timmy_model_backend == backend
|
||||
|
||||
def test_lightning_backend_accepts_mock(self):
|
||||
s = _make_settings(LIGHTNING_BACKEND="mock")
|
||||
assert s.lightning_backend == "mock"
|
||||
|
||||
def test_lightning_backend_accepts_lnd(self):
|
||||
s = _make_settings(LIGHTNING_BACKEND="lnd")
|
||||
assert s.lightning_backend == "lnd"
|
||||
|
||||
def test_lightning_backend_rejects_invalid(self):
|
||||
from pydantic import ValidationError
|
||||
|
||||
with pytest.raises(ValidationError):
|
||||
_make_settings(LIGHTNING_BACKEND="stripe")
|
||||
|
||||
def test_focus_mode_rejects_invalid(self):
|
||||
from pydantic import ValidationError
|
||||
|
||||
with pytest.raises(ValidationError):
|
||||
_make_settings(FOCUS_MODE="zen")
|
||||
|
||||
def test_extra_fields_ignored(self):
|
||||
# model_config has extra="ignore"
|
||||
s = _make_settings(TOTALLY_UNKNOWN_FIELD="hello")
|
||||
assert not hasattr(s, "totally_unknown_field")
|
||||
|
||||
def test_float_field_moderation_threshold(self):
|
||||
s = _make_settings(MODERATION_THRESHOLD="0.95")
|
||||
assert s.moderation_threshold == pytest.approx(0.95)
|
||||
|
||||
def test_float_field_gabs_timeout(self):
|
||||
s = _make_settings(GABS_TIMEOUT="10.5")
|
||||
assert s.gabs_timeout == pytest.approx(10.5)
|
||||
|
||||
|
||||
# ── Edge cases ────────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestSettingsEdgeCases:
|
||||
"""Edge cases: empty strings, boundary values."""
|
||||
|
||||
def test_empty_string_tokens_stay_empty(self):
|
||||
s = _make_settings(TELEGRAM_TOKEN="", DISCORD_TOKEN="")
|
||||
assert s.telegram_token == ""
|
||||
assert s.discord_token == ""
|
||||
|
||||
def test_zero_int_fields(self):
|
||||
s = _make_settings(OLLAMA_NUM_CTX="0", MEMORY_PRUNE_DAYS="0")
|
||||
assert s.ollama_num_ctx == 0
|
||||
assert s.memory_prune_days == 0
|
||||
|
||||
def test_large_int_value(self):
|
||||
s = _make_settings(CHAT_API_MAX_BODY_BYTES="104857600")
|
||||
assert s.chat_api_max_body_bytes == 104857600
|
||||
|
||||
def test_negative_int_accepted(self):
|
||||
# Pydantic doesn't constrain these to positive by default
|
||||
s = _make_settings(MAX_AGENT_STEPS="-1")
|
||||
assert s.max_agent_steps == -1
|
||||
|
||||
def test_empty_api_keys_are_strings(self):
|
||||
s = _make_settings()
|
||||
assert isinstance(s.anthropic_api_key, str)
|
||||
assert isinstance(s.xai_api_key, str)
|
||||
assert isinstance(s.gitea_token, str)
|
||||
|
||||
|
||||
# ── _compute_repo_root ────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestComputeRepoRoot:
|
||||
"""_compute_repo_root auto-detects .git directory."""
|
||||
|
||||
def test_returns_non_empty_string(self):
|
||||
from config import Settings
|
||||
|
||||
s = Settings()
|
||||
result = s._compute_repo_root()
|
||||
assert isinstance(result, str)
|
||||
assert len(result) > 0
|
||||
|
||||
def test_explicit_repo_root_returned_directly(self):
|
||||
from config import Settings
|
||||
|
||||
s = Settings()
|
||||
s.repo_root = "/tmp/custom-repo"
|
||||
assert s._compute_repo_root() == "/tmp/custom-repo"
|
||||
|
||||
def test_detects_git_directory(self):
|
||||
from config import Settings
|
||||
|
||||
s = Settings()
|
||||
result = s._compute_repo_root()
|
||||
import os
|
||||
|
||||
# The detected root should contain a .git directory (or be the cwd fallback)
|
||||
assert os.path.isabs(result)
|
||||
|
||||
|
||||
# ── model_post_init / gitea_token file fallback ───────────────────────────────
|
||||
|
||||
|
||||
class TestModelPostInit:
|
||||
"""model_post_init resolves gitea_token from file fallback."""
|
||||
|
||||
def test_gitea_token_from_env(self):
|
||||
from config import Settings
|
||||
|
||||
with patch.dict(os.environ, {"GITEA_TOKEN": "env-token-abc"}, clear=False):
|
||||
s = Settings()
|
||||
assert s.gitea_token == "env-token-abc"
|
||||
|
||||
def test_gitea_token_stays_empty_when_no_file(self):
|
||||
from config import Settings
|
||||
|
||||
env = {k: v for k, v in os.environ.items() if k != "GITEA_TOKEN"}
|
||||
with patch.dict(os.environ, env, clear=True):
|
||||
with patch("os.path.isfile", return_value=False):
|
||||
s = Settings()
|
||||
assert s.gitea_token == ""
|
||||
|
||||
    def test_gitea_token_read_from_timmy_token_file(self, tmp_path):
        """model_post_init reads token from .timmy_gitea_token file."""
        from config import Settings

        token_file = tmp_path / ".timmy_gitea_token"
        token_file.write_text("file-token-xyz\n")

        env = {k: v for k, v in os.environ.items() if k != "GITEA_TOKEN"}
        with patch.dict(os.environ, env, clear=True):
            s = Settings()

        # Override repo_root so post_init finds our temp file
        def _fake_root():
            return str(tmp_path)

        s._compute_repo_root = _fake_root  # type: ignore[method-assign]
        # Re-run post_init logic manually since Settings is already created
        s.gitea_token = ""
        repo_root = _fake_root()
        token_path = os.path.join(repo_root, ".timmy_gitea_token")
        if os.path.isfile(token_path):
            # Use a context manager so the file handle is closed deterministically
            with open(token_path) as fh:
                s.gitea_token = fh.read().strip()
        assert s.gitea_token == "file-token-xyz"
|
||||
|
||||
    def test_gitea_token_empty_file_stays_empty(self, tmp_path):
        """Empty token file leaves gitea_token as empty string."""
        token_file = tmp_path / ".timmy_gitea_token"
        token_file.write_text("  \n")  # only whitespace

        from config import Settings

        env = {k: v for k, v in os.environ.items() if k != "GITEA_TOKEN"}
        with patch.dict(os.environ, env, clear=True):
            s = Settings()
        # Simulate post_init with the tmp dir
        s.gitea_token = ""
        token_path = str(token_file)
        if os.path.isfile(token_path):
            # Use a context manager so the file handle is closed deterministically
            with open(token_path) as fh:
                token = fh.read().strip()
            if token:
                s.gitea_token = token
        assert s.gitea_token == ""
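The two file-based tests above re-create the fallback by hand instead of calling `model_post_init`. The resolution order they imply (environment value first, then a `.timmy_gitea_token` file under the repo root, whitespace-only files ignored) can be sketched as a standalone helper. `resolve_gitea_token` is a hypothetical name used for illustration and is an assumption about the behavior, not code from `config.py`.

```python
import os
import tempfile


def resolve_gitea_token(env_token: str, repo_root: str) -> str:
    """Hypothetical helper: prefer the env value, else read the token file."""
    if env_token:
        return env_token
    token_path = os.path.join(repo_root, ".timmy_gitea_token")
    if os.path.isfile(token_path):
        with open(token_path, encoding="utf-8") as fh:
            token = fh.read().strip()
        if token:  # a whitespace-only file counts as "no token"
            return token
    return ""


# Small demonstration against a throwaway directory.
with tempfile.TemporaryDirectory() as demo_root:
    with open(os.path.join(demo_root, ".timmy_gitea_token"), "w", encoding="utf-8") as fh:
        fh.write("file-token-xyz\n")
    print(resolve_gitea_token("", demo_root))
```

Factoring the lookup into a pure function like this would let the tests exercise the real code path with `tmp_path` rather than duplicating it inline.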
|
||||
|
||||
|
||||
# ── check_ollama_model_available ──────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestCheckOllamaModelAvailable:
|
||||
"""check_ollama_model_available handles network responses and errors."""
|
||||
|
||||
def test_returns_false_on_oserror(self):
|
||||
from config import check_ollama_model_available
|
||||
|
||||
with patch("urllib.request.urlopen", side_effect=OSError("Connection refused")):
|
||||
assert check_ollama_model_available("llama3.1") is False
|
||||
|
||||
def test_returns_false_on_value_error(self):
|
||||
from config import check_ollama_model_available
|
||||
|
||||
with patch("urllib.request.urlopen", side_effect=ValueError("Bad JSON")):
|
||||
assert check_ollama_model_available("llama3.1") is False
|
||||
|
||||
    def test_returns_true_exact_model_match(self):
        import json

        from config import check_ollama_model_available

        response_data = json.dumps({"models": [{"name": "llama3.1:8b-instruct"}]}).encode()
        mock_response = MagicMock()
        mock_response.read.return_value = response_data
        mock_response.__enter__ = lambda s: s
        mock_response.__exit__ = MagicMock(return_value=False)

        with patch("urllib.request.urlopen", return_value=mock_response):
            # The requested name equals only the part before the ":" tag here;
            # a full-string match is covered by test_exact_name_match.
            assert check_ollama_model_available("llama3.1") is True
|
||||
|
||||
def test_returns_true_startswith_match(self):
|
||||
import json
|
||||
|
||||
from config import check_ollama_model_available
|
||||
|
||||
response_data = json.dumps({"models": [{"name": "qwen3:14b"}]}).encode()
|
||||
mock_response = MagicMock()
|
||||
mock_response.read.return_value = response_data
|
||||
mock_response.__enter__ = lambda s: s
|
||||
mock_response.__exit__ = MagicMock(return_value=False)
|
||||
|
||||
with patch("urllib.request.urlopen", return_value=mock_response):
|
||||
# "qwen3" matches "qwen3:14b" via startswith
|
||||
assert check_ollama_model_available("qwen3") is True
|
||||
|
||||
def test_returns_false_when_model_not_found(self):
|
||||
import json
|
||||
|
||||
from config import check_ollama_model_available
|
||||
|
||||
response_data = json.dumps({"models": [{"name": "qwen2.5:7b"}]}).encode()
|
||||
mock_response = MagicMock()
|
||||
mock_response.read.return_value = response_data
|
||||
mock_response.__enter__ = lambda s: s
|
||||
mock_response.__exit__ = MagicMock(return_value=False)
|
||||
|
||||
with patch("urllib.request.urlopen", return_value=mock_response):
|
||||
assert check_ollama_model_available("llama3.1") is False
|
||||
|
||||
def test_returns_false_empty_model_list(self):
|
||||
import json
|
||||
|
||||
from config import check_ollama_model_available
|
||||
|
||||
response_data = json.dumps({"models": []}).encode()
|
||||
mock_response = MagicMock()
|
||||
mock_response.read.return_value = response_data
|
||||
mock_response.__enter__ = lambda s: s
|
||||
mock_response.__exit__ = MagicMock(return_value=False)
|
||||
|
||||
with patch("urllib.request.urlopen", return_value=mock_response):
|
||||
assert check_ollama_model_available("llama3.1") is False
|
||||
|
||||
def test_exact_name_match(self):
|
||||
import json
|
||||
|
||||
from config import check_ollama_model_available
|
||||
|
||||
response_data = json.dumps({"models": [{"name": "qwen3:14b"}]}).encode()
|
||||
mock_response = MagicMock()
|
||||
mock_response.read.return_value = response_data
|
||||
mock_response.__enter__ = lambda s: s
|
||||
mock_response.__exit__ = MagicMock(return_value=False)
|
||||
|
||||
with patch("urllib.request.urlopen", return_value=mock_response):
|
||||
assert check_ollama_model_available("qwen3:14b") is True
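The mocked responses above all share the `{"models": [{"name": ...}]}` shape, and the assertions accept either a full-name match or a prefix match. A minimal sketch of that matching rule, assuming nothing beyond what the tests show, follows. The helper names are hypothetical, and the real `check_ollama_model_available` may use a different rule that happens to satisfy the same cases.

```python
import json


def names_from_tags_payload(raw: bytes) -> list[str]:
    """Extract model names from a JSON body shaped like the mocked responses."""
    payload = json.loads(raw)
    return [entry.get("name", "") for entry in payload.get("models", [])]


def model_is_listed(requested: str, available: list[str]) -> bool:
    """Assumed rule: exact name match, or the listed name starts with the request."""
    return any(name == requested or name.startswith(requested) for name in available)


body = json.dumps({"models": [{"name": "qwen3:14b"}]}).encode()
print(model_is_listed("qwen3", names_from_tags_payload(body)))
```

Prefix matching is permissive: a request for `qwen3` would also accept a hypothetical `qwen30:7b`. Comparing against the part before the `:` tag would be stricter, at the cost of rejecting partially qualified requests.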
|
||||
|
||||
|
||||
# ── get_effective_ollama_model ────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestGetEffectiveOllamaModel:
|
||||
"""get_effective_ollama_model walks fallback chain."""
|
||||
|
||||
def test_returns_primary_when_available(self):
|
||||
from config import get_effective_ollama_model, settings
|
||||
|
||||
with patch("config.check_ollama_model_available", return_value=True):
|
||||
result = get_effective_ollama_model()
|
||||
# Should return whatever the settings primary model is
|
||||
assert result == settings.ollama_model
|
||||
|
||||
def test_falls_back_when_primary_unavailable(self):
|
||||
from config import get_effective_ollama_model, settings
|
||||
|
||||
# Make primary unavailable, but one fallback available
|
||||
fallback_target = settings.fallback_models[0]
|
||||
|
||||
def side_effect(model):
|
||||
return model == fallback_target
|
||||
|
||||
with patch("config.check_ollama_model_available", side_effect=side_effect):
|
||||
result = get_effective_ollama_model()
|
||||
assert result == fallback_target
|
||||
|
||||
def test_returns_user_model_when_nothing_available(self):
|
||||
from config import get_effective_ollama_model, settings
|
||||
|
||||
with patch("config.check_ollama_model_available", return_value=False):
|
||||
result = get_effective_ollama_model()
|
||||
# Last resort: returns user's configured model
|
||||
assert result == settings.ollama_model
|
||||
|
||||
def test_skips_unavailable_fallbacks(self):
|
||||
from config import get_effective_ollama_model, settings
|
||||
|
||||
# Only the last fallback is available
|
||||
fallbacks = settings.fallback_models
|
||||
last_fallback = fallbacks[-1]
|
||||
|
||||
def side_effect(model):
|
||||
return model == last_fallback
|
||||
|
||||
with patch("config.check_ollama_model_available", side_effect=side_effect):
|
||||
result = get_effective_ollama_model()
|
||||
assert result == last_fallback
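The four tests above describe a simple selection policy: use the primary model if it is available, otherwise the first available fallback in order, otherwise fall back to the primary anyway. The sketch below expresses that policy as a pure function. `pick_effective_model` and its parameters are hypothetical names chosen for illustration, and the availability check is injected so that no network access is needed.

```python
from collections.abc import Callable, Sequence


def pick_effective_model(
    primary: str,
    fallbacks: Sequence[str],
    is_available: Callable[[str], bool],
) -> str:
    """Return the first usable model, defaulting to the primary as a last resort."""
    if is_available(primary):
        return primary
    for candidate in fallbacks:
        if is_available(candidate):
            return candidate
    # Nothing is available: hand back the user's configured model unchanged.
    return primary


installed = {"model-c"}
print(pick_effective_model("model-a", ["model-b", "model-c"], installed.__contains__))
```

Returning the primary when nothing is available keeps the caller's error message tied to the model the user actually configured, rather than to an arbitrary fallback.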
|
||||
|
||||
|
||||
# ── validate_startup ──────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestValidateStartup:
|
||||
"""validate_startup enforces security in production, warns in dev."""
|
||||
|
||||
def setup_method(self):
|
||||
import config
|
||||
|
||||
config._startup_validated = False
|
||||
|
||||
def test_skips_in_test_mode(self):
|
||||
import config
|
||||
|
||||
with patch.dict(os.environ, {"TIMMY_TEST_MODE": "1"}):
|
||||
config.validate_startup()
|
||||
assert config._startup_validated is True
|
||||
|
||||
def test_dev_mode_does_not_exit(self):
|
||||
import config
|
||||
|
||||
config._startup_validated = False
|
||||
env = {k: v for k, v in os.environ.items() if k != "TIMMY_TEST_MODE"}
|
||||
env["TIMMY_ENV"] = "development"
|
||||
with patch.dict(os.environ, env, clear=True):
|
||||
# Should not raise SystemExit
|
||||
config.validate_startup()
|
||||
assert config._startup_validated is True
|
||||
|
||||
    def test_production_exits_without_l402_hmac_secret(self):
        import config

        config._startup_validated = False
        with (
            patch.object(config.settings, "timmy_env", "production"),
            patch.object(config.settings, "l402_hmac_secret", ""),
            patch.object(config.settings, "l402_macaroon_secret", ""),
            pytest.raises(SystemExit),
        ):
            config.validate_startup(force=True)

    def test_production_exits_without_l402_macaroon_secret(self):
        import config

        config._startup_validated = False
        with (
            patch.object(config.settings, "timmy_env", "production"),
            patch.object(config.settings, "l402_hmac_secret", "present"),
            patch.object(config.settings, "l402_macaroon_secret", ""),
            pytest.raises(SystemExit),
        ):
            config.validate_startup(force=True)

    def test_production_exits_with_cors_wildcard(self):
        import config

        config._startup_validated = False
        with (
            patch.object(config.settings, "timmy_env", "production"),
            patch.object(config.settings, "l402_hmac_secret", "secret1"),
            patch.object(config.settings, "l402_macaroon_secret", "secret2"),
            patch.object(config.settings, "cors_origins", ["*"]),
            pytest.raises(SystemExit),
        ):
            config.validate_startup(force=True)

    def test_production_passes_with_all_secrets_and_no_wildcard(self):
        import config

        config._startup_validated = False
        with (
            patch.object(config.settings, "timmy_env", "production"),
            patch.object(config.settings, "l402_hmac_secret", "secret1"),
            patch.object(config.settings, "l402_macaroon_secret", "secret2"),
            patch.object(config.settings, "cors_origins", ["http://localhost:3000"]),
        ):
            config.validate_startup(force=True)
            assert config._startup_validated is True
|
||||
|
||||
def test_idempotent_without_force(self):
|
||||
import config
|
||||
|
||||
config._startup_validated = True
|
||||
config.validate_startup()
|
||||
assert config._startup_validated is True
|
||||
|
||||
def test_force_reruns_when_already_validated(self):
|
||||
import config
|
||||
|
||||
config._startup_validated = True
|
||||
with patch.dict(os.environ, {"TIMMY_TEST_MODE": "1"}):
|
||||
config.validate_startup(force=True)
|
||||
# Should have run (and set validated again)
|
||||
assert config._startup_validated is True
|
||||
|
||||
    def test_dev_warns_on_cors_wildcard(self, caplog):
        import logging

        import config

        config._startup_validated = False
        env = {k: v for k, v in os.environ.items() if k != "TIMMY_TEST_MODE"}
        env["TIMMY_ENV"] = "development"
        with (
            patch.dict(os.environ, env, clear=True),
            patch.object(config.settings, "timmy_env", "development"),
            patch.object(config.settings, "cors_origins", ["*"]),
            patch.object(config.settings, "l402_hmac_secret", ""),
            patch.object(config.settings, "l402_macaroon_secret", ""),
            caplog.at_level(logging.WARNING, logger="config"),
        ):
            config.validate_startup(force=True)
        assert any("CORS" in rec.message for rec in caplog.records)
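Taken together, the `TestValidateStartup` cases imply three production checks: a non-empty `l402_hmac_secret`, a non-empty `l402_macaroon_secret`, and no `"*"` entry in `cors_origins`. The sketch below collects those checks into a pure function that returns a list of problems. It is an assumption about the rules the tests exercise; the function name and messages are hypothetical, and the real `validate_startup` exits the process in production and only logs a warning in development.

```python
def production_config_errors(
    l402_hmac_secret: str,
    l402_macaroon_secret: str,
    cors_origins: list[str],
) -> list[str]:
    """Hypothetical helper: list the problems that would block a production start."""
    errors: list[str] = []
    if not l402_hmac_secret:
        errors.append("L402_HMAC_SECRET must be set in production")
    if not l402_macaroon_secret:
        errors.append("L402_MACAROON_SECRET must be set in production")
    if "*" in cors_origins:
        errors.append("CORS wildcard '*' is not allowed in production")
    return errors


for problem in production_config_errors("", "secret2", ["*"]):
    print(problem)
```

Separating "collect the problems" from "decide whether to exit" makes each rule testable without `pytest.raises(SystemExit)` and without stacking several `patch.object` calls.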
|
||||
|
||||
|
||||
# ── APP_START_TIME ────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestAppStartTime:
|
||||
"""APP_START_TIME is set at module load."""
|
||||
|
||||
def test_app_start_time_is_datetime(self):
|
||||
from datetime import datetime
|
||||
|
||||
from config import APP_START_TIME
|
||||
|
||||
assert isinstance(APP_START_TIME, datetime)
|
||||
|
||||
def test_app_start_time_has_utc_timezone(self):
|
||||
from config import APP_START_TIME
|
||||
|
||||
assert APP_START_TIME.tzinfo is not None
|
||||
|
||||
def test_app_start_time_is_in_the_past_or_now(self):
|
||||
from datetime import UTC, datetime
|
||||
|
||||
from config import APP_START_TIME
|
||||
|
||||
assert APP_START_TIME <= datetime.now(UTC)
|
||||
|
||||
|
||||
# ── Module-level singleton ────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestSettingsSingleton:
|
||||
"""The module-level `settings` singleton is a Settings instance."""
|
||||
|
||||
def test_settings_is_settings_instance(self):
|
||||
from config import Settings, settings
|
||||
|
||||
assert isinstance(settings, Settings)
|
||||
|
||||
def test_settings_repo_root_is_set(self):
|
||||
from config import settings
|
||||
|
||||
assert isinstance(settings.repo_root, str)
|
||||
|
||||
def test_settings_has_expected_defaults(self):
|
||||
from config import settings
|
||||
|
||||
# In test mode these may be overridden, but type should be correct
|
||||
assert isinstance(settings.ollama_url, str)
|
||||
assert isinstance(settings.debug, bool)
|
||||
@@ -6,12 +6,7 @@ import pytest
|
||||
|
||||
from infrastructure.presence import (
|
||||
DEFAULT_PIP_STATE,
|
||||
_get_agents_online,
|
||||
_get_familiar_state,
|
||||
_get_memory_count,
|
||||
_get_thinking_active,
|
||||
_get_uptime_seconds,
|
||||
_get_visitors,
|
||||
produce_agent_state,
|
||||
produce_bark,
|
||||
produce_system_status,
|
||||
@@ -505,36 +500,3 @@ class TestProduceSystemStatus:
|
||||
"""produce_system_status always returns a plain dict."""
|
||||
result = produce_system_status()
|
||||
assert isinstance(result, dict)
|
||||
|
||||
|
||||
class TestSystemStatusHelpers:
|
||||
"""Tests for the helper functions extracted from produce_system_status()."""
|
||||
|
||||
def test_get_agents_online_returns_int(self):
|
||||
"""_get_agents_online returns a non-negative int."""
|
||||
result = _get_agents_online()
|
||||
assert isinstance(result, int)
|
||||
assert result >= 0
|
||||
|
||||
def test_get_visitors_returns_int(self):
|
||||
"""_get_visitors returns a non-negative int."""
|
||||
result = _get_visitors()
|
||||
assert isinstance(result, int)
|
||||
assert result >= 0
|
||||
|
||||
def test_get_uptime_seconds_returns_int(self):
|
||||
"""_get_uptime_seconds returns a non-negative int."""
|
||||
result = _get_uptime_seconds()
|
||||
assert isinstance(result, int)
|
||||
assert result >= 0
|
||||
|
||||
def test_get_thinking_active_returns_bool(self):
|
||||
"""_get_thinking_active returns a bool."""
|
||||
result = _get_thinking_active()
|
||||
assert isinstance(result, bool)
|
||||
|
||||
def test_get_memory_count_returns_int(self):
|
||||
"""_get_memory_count returns a non-negative int."""
|
||||
result = _get_memory_count()
|
||||
assert isinstance(result, int)
|
||||
assert result >= 0
|
||||
|
||||
@@ -1,272 +0,0 @@
|
||||
"""Unit tests for the sovereignty metrics emitter and store.
|
||||
|
||||
Refs: #954
|
||||
"""
|
||||
|
||||
from unittest.mock import AsyncMock, patch
|
||||
|
||||
import pytest
|
||||
|
||||
from timmy.sovereignty.metrics import (
|
||||
ALL_EVENT_TYPES,
|
||||
SovereigntyMetricsStore,
|
||||
emit_sovereignty_event,
|
||||
get_cost_per_hour,
|
||||
get_skills_crystallized,
|
||||
get_sovereignty_pct,
|
||||
record,
|
||||
)
|
||||
|
||||
pytestmark = pytest.mark.unit
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def store(tmp_path):
|
||||
"""A fresh SovereigntyMetricsStore backed by a temp database."""
|
||||
return SovereigntyMetricsStore(db_path=tmp_path / "test_sov.db")
|
||||
|
||||
|
||||
# ── ALL_EVENT_TYPES ───────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestEventTypes:
|
||||
def test_all_expected_event_types_present(self):
|
||||
expected = {
|
||||
"perception_cache_hit",
|
||||
"perception_vlm_call",
|
||||
"decision_rule_hit",
|
||||
"decision_llm_call",
|
||||
"narration_template",
|
||||
"narration_llm",
|
||||
"skill_crystallized",
|
||||
"api_call",
|
||||
"api_cost",
|
||||
}
|
||||
assert ALL_EVENT_TYPES == expected
|
||||
|
||||
|
||||
# ── Record & retrieval ────────────────────────────────────────────────────────


class TestRecord:
    def test_record_inserts_event(self, store):
        store.record("perception_cache_hit")
        pct = store.get_sovereignty_pct("perception")
        assert pct == 100.0

    def test_record_with_metadata(self, store):
        store.record("api_cost", metadata={"usd": 0.05})
        cost = store.get_cost_per_hour()
        assert cost > 0.0

    def test_record_with_session_id(self, store):
        store.record("skill_crystallized", session_id="sess-1")
        assert store.get_skills_crystallized("sess-1") == 1

    def test_record_unknown_type_does_not_raise(self, store):
        """Unknown event types are silently stored (no crash)."""
        store.record("totally_unknown_event")  # should not raise


# ── Sessions ──────────────────────────────────────────────────────────────────


class TestSessions:
    def test_start_session_returns_id(self, store):
        sid = store.start_session(game="Bannerlord")
        assert isinstance(sid, str)
        assert len(sid) > 0

    def test_start_session_accepts_custom_id(self, store):
        sid = store.start_session(game="Bannerlord", session_id="my-session")
        assert sid == "my-session"

    def test_end_session_does_not_raise(self, store):
        sid = store.start_session()
        store.end_session(sid)  # should not raise

    def test_start_session_idempotent(self, store):
        """Starting a session with the same ID twice is a no-op."""
        sid = store.start_session(session_id="dup")
        sid2 = store.start_session(session_id="dup")
        assert sid == sid2


# ── Sovereignty percentage ────────────────────────────────────────────────────


class TestGetSovereigntyPct:
    def test_perception_all_cache_hits(self, store):
        for _ in range(5):
            store.record("perception_cache_hit")
        assert store.get_sovereignty_pct("perception") == 100.0

    def test_perception_mixed(self, store):
        store.record("perception_cache_hit")
        store.record("perception_vlm_call")
        assert store.get_sovereignty_pct("perception") == 50.0

    def test_decision_all_sovereign(self, store):
        for _ in range(3):
            store.record("decision_rule_hit")
        assert store.get_sovereignty_pct("decision") == 100.0

    def test_narration_all_sovereign(self, store):
        store.record("narration_template")
        store.record("narration_template")
        assert store.get_sovereignty_pct("narration") == 100.0

    def test_narration_all_llm(self, store):
        store.record("narration_llm")
        assert store.get_sovereignty_pct("narration") == 0.0

    def test_no_events_returns_zero(self, store):
        assert store.get_sovereignty_pct("perception") == 0.0

    def test_unknown_layer_returns_zero(self, store):
        assert store.get_sovereignty_pct("nonexistent_layer") == 0.0

    def test_time_window_filters_old_events(self, store, tmp_path):
        """Events outside the time window are excluded."""
        # Insert an event with a very old timestamp directly
        import sqlite3
        from contextlib import closing

        with closing(sqlite3.connect(str(store._db_path))) as conn:
            conn.execute(
                "INSERT INTO events (timestamp, event_type, session_id, metadata_json) VALUES (?, ?, ?, ?)",
                ("2000-01-01T00:00:00+00:00", "perception_cache_hit", "", "{}"),
            )
            conn.commit()

        # With a 60-second window, the old event should be excluded
        pct = store.get_sovereignty_pct("perception", time_window=60)
        assert pct == 0.0

    def test_time_window_includes_recent_events(self, store):
        store.record("decision_rule_hit")
        pct = store.get_sovereignty_pct("decision", time_window=60)
        assert pct == 100.0


# ── Cost per hour ─────────────────────────────────────────────────────────────


class TestGetCostPerHour:
    def test_no_events_returns_zero(self, store):
        assert store.get_cost_per_hour() == 0.0

    def test_single_cost_event(self, store):
        # Record a cost of $1.00 within the last hour window
        store.record("api_cost", metadata={"usd": 1.00})
        cost = store.get_cost_per_hour(time_window=3600)
        assert cost == pytest.approx(1.00, rel=1e-3)

    def test_multiple_cost_events(self, store):
        store.record("api_cost", metadata={"usd": 0.25})
        store.record("api_cost", metadata={"usd": 0.75})
        cost = store.get_cost_per_hour(time_window=3600)
        assert cost == pytest.approx(1.00, rel=1e-3)

    def test_missing_usd_field_is_zero(self, store):
        store.record("api_cost", metadata={"model": "gpt-4"})
        assert store.get_cost_per_hour() == 0.0

    def test_cost_extrapolated_for_short_window(self, store):
        """Cost recorded in a 1800s window is doubled to get per-hour rate."""
        store.record("api_cost", metadata={"usd": 0.5})
        cost = store.get_cost_per_hour(time_window=1800)
        assert cost == pytest.approx(1.0, rel=1e-3)


# ── Skills crystallised ───────────────────────────────────────────────────────


class TestGetSkillsCrystallized:
    def test_no_skills_returns_zero(self, store):
        assert store.get_skills_crystallized() == 0

    def test_counts_all_sessions(self, store):
        store.record("skill_crystallized", session_id="a")
        store.record("skill_crystallized", session_id="b")
        assert store.get_skills_crystallized() == 2

    def test_filters_by_session(self, store):
        store.record("skill_crystallized", session_id="sess-1")
        store.record("skill_crystallized", session_id="sess-2")
        assert store.get_skills_crystallized("sess-1") == 1

    def test_session_with_no_skills(self, store):
        store.record("skill_crystallized", session_id="sess-1")
        assert store.get_skills_crystallized("sess-999") == 0


# ── Snapshot ──────────────────────────────────────────────────────────────────


class TestGetSnapshot:
    def test_snapshot_structure(self, store):
        snap = store.get_snapshot()
        assert "sovereignty" in snap
        assert "cost_per_hour" in snap
        assert "skills_crystallized" in snap

    def test_snapshot_sovereignty_has_all_layers(self, store):
        snap = store.get_snapshot()
        assert set(snap["sovereignty"].keys()) == {"perception", "decision", "narration"}

    def test_snapshot_reflects_events(self, store):
        store.record("perception_cache_hit")
        store.record("skill_crystallized")
        snap = store.get_snapshot()
        assert snap["sovereignty"]["perception"] == 100.0
        assert snap["skills_crystallized"] == 1


# ── Module-level convenience functions ───────────────────────────────────────


class TestModuleLevelFunctions:
    def test_record_and_get_sovereignty_pct(self, tmp_path):
        with (
            patch("timmy.sovereignty.metrics._store", None),
            patch("timmy.sovereignty.metrics.DB_PATH", tmp_path / "fn_test.db"),
        ):
            record("decision_rule_hit")
            pct = get_sovereignty_pct("decision")
            assert pct == 100.0

    def test_get_cost_per_hour_module_fn(self, tmp_path):
        with (
            patch("timmy.sovereignty.metrics._store", None),
            patch("timmy.sovereignty.metrics.DB_PATH", tmp_path / "fn_test2.db"),
        ):
            record("api_cost", {"usd": 0.5})
            cost = get_cost_per_hour()
            assert cost > 0.0

    def test_get_skills_crystallized_module_fn(self, tmp_path):
        with (
            patch("timmy.sovereignty.metrics._store", None),
            patch("timmy.sovereignty.metrics.DB_PATH", tmp_path / "fn_test3.db"),
        ):
            record("skill_crystallized")
            count = get_skills_crystallized()
            assert count == 1


# ── emit_sovereignty_event ────────────────────────────────────────────────────


class TestEmitSovereigntyEvent:
    @pytest.mark.asyncio
    async def test_emit_records_and_publishes(self, tmp_path):
        with (
            patch("timmy.sovereignty.metrics._store", None),
            patch("timmy.sovereignty.metrics.DB_PATH", tmp_path / "emit_test.db"),
            patch("infrastructure.events.bus.emit", new_callable=AsyncMock) as mock_emit,
        ):
            await emit_sovereignty_event("perception_cache_hit", {"frame": 42}, session_id="s1")
            mock_emit.assert_called_once()
            args = mock_emit.call_args[0]
            assert args[0] == "sovereignty.event.perception_cache_hit"
@@ -2,15 +2,10 @@
from __future__ import annotations

from datetime import UTC, datetime, timedelta
from unittest.mock import AsyncMock, MagicMock, patch

import pytest

from timmy.vassal.agent_health import AgentHealthReport, AgentStatus

pytestmark = pytest.mark.unit

# ---------------------------------------------------------------------------
# AgentStatus
# ---------------------------------------------------------------------------
@@ -40,25 +35,6 @@ def test_agent_status_stuck():
    assert s.needs_reassignment is True


def test_agent_status_checked_at_is_iso_string():
    s = AgentStatus(agent="claude")
    # Should be parseable as an ISO datetime
    dt = datetime.fromisoformat(s.checked_at)
    assert dt.tzinfo is not None


def test_agent_status_multiple_stuck_issues():
    s = AgentStatus(agent="kimi", stuck_issue_numbers=[1, 2, 3])
    assert s.is_stuck is True
    assert s.needs_reassignment is True


def test_agent_status_active_but_not_stuck():
    s = AgentStatus(agent="claude", active_issue_numbers=[5], is_idle=False)
    assert s.is_stuck is False
    assert s.needs_reassignment is False


# ---------------------------------------------------------------------------
# AgentHealthReport
# ---------------------------------------------------------------------------
@@ -71,22 +47,11 @@ def test_report_any_stuck():
    assert report.any_stuck is True


def test_report_not_any_stuck():
    report = AgentHealthReport(agents=[AgentStatus(agent="claude"), AgentStatus(agent="kimi")])
    assert report.any_stuck is False


def test_report_all_idle():
    report = AgentHealthReport(agents=[AgentStatus(agent="claude"), AgentStatus(agent="kimi")])
    assert report.all_idle is True


def test_report_not_all_idle():
    claude = AgentStatus(agent="claude", active_issue_numbers=[1], is_idle=False)
    report = AgentHealthReport(agents=[claude, AgentStatus(agent="kimi")])
    assert report.all_idle is False


def test_report_for_agent_found():
    kimi = AgentStatus(agent="kimi", active_issue_numbers=[42])
    report = AgentHealthReport(agents=[AgentStatus(agent="claude"), kimi])
@@ -99,223 +64,6 @@ def test_report_for_agent_not_found():
    assert report.for_agent("timmy") is None


def test_report_generated_at_is_iso_string():
    report = AgentHealthReport()
    dt = datetime.fromisoformat(report.generated_at)
    assert dt.tzinfo is not None


def test_report_empty_agents():
    report = AgentHealthReport(agents=[])
    assert report.any_stuck is False
    assert report.all_idle is True


# ---------------------------------------------------------------------------
# _issue_created_time
# ---------------------------------------------------------------------------


@pytest.mark.asyncio
async def test_issue_created_time_valid():
    from timmy.vassal.agent_health import _issue_created_time

    issue = {"created_at": "2024-01-15T10:30:00Z"}
    result = await _issue_created_time(issue)
    assert result is not None
    assert result.year == 2024
    assert result.month == 1
    assert result.day == 15


@pytest.mark.asyncio
async def test_issue_created_time_missing_key():
    from timmy.vassal.agent_health import _issue_created_time

    result = await _issue_created_time({})
    assert result is None


@pytest.mark.asyncio
async def test_issue_created_time_invalid_format():
    from timmy.vassal.agent_health import _issue_created_time

    result = await _issue_created_time({"created_at": "not-a-date"})
    assert result is None


@pytest.mark.asyncio
async def test_issue_created_time_with_timezone():
    from timmy.vassal.agent_health import _issue_created_time

    issue = {"created_at": "2024-06-01T12:00:00+00:00"}
    result = await _issue_created_time(issue)
    assert result is not None
    assert result.tzinfo is not None


# ---------------------------------------------------------------------------
# _fetch_labeled_issues — mocked HTTP client
# ---------------------------------------------------------------------------


@pytest.mark.asyncio
async def test_fetch_labeled_issues_success():
    from timmy.vassal.agent_health import _fetch_labeled_issues

    mock_resp = MagicMock()
    mock_resp.status_code = 200
    mock_resp.json.return_value = [
        {"number": 1, "title": "Fix bug"},
        {"number": 2, "title": "Add feature", "pull_request": {"url": "..."}},
    ]

    mock_client = AsyncMock()
    mock_client.get = AsyncMock(return_value=mock_resp)

    result = await _fetch_labeled_issues(
        mock_client, "http://gitea/api/v1", {}, "owner/repo", "claude-ready"
    )

    # Only non-PR issues returned
    assert len(result) == 1
    assert result[0]["number"] == 1


@pytest.mark.asyncio
async def test_fetch_labeled_issues_http_error():
    from timmy.vassal.agent_health import _fetch_labeled_issues

    mock_resp = MagicMock()
    mock_resp.status_code = 401
    mock_resp.json.return_value = []

    mock_client = AsyncMock()
    mock_client.get = AsyncMock(return_value=mock_resp)

    result = await _fetch_labeled_issues(
        mock_client, "http://gitea/api/v1", {}, "owner/repo", "claude-ready"
    )
    assert result == []


@pytest.mark.asyncio
async def test_fetch_labeled_issues_exception():
    from timmy.vassal.agent_health import _fetch_labeled_issues

    mock_client = AsyncMock()
    mock_client.get = AsyncMock(side_effect=ConnectionError("network down"))

    result = await _fetch_labeled_issues(
        mock_client, "http://gitea/api/v1", {}, "owner/repo", "claude-ready"
    )
    assert result == []


@pytest.mark.asyncio
async def test_fetch_labeled_issues_filters_pull_requests():
    from timmy.vassal.agent_health import _fetch_labeled_issues

    mock_resp = MagicMock()
    mock_resp.status_code = 200
    mock_resp.json.return_value = [
        {"number": 10, "title": "Issue"},
        {"number": 11, "title": "PR", "pull_request": {"url": "http://gitea/pulls/11"}},
        {"number": 12, "title": "Another Issue"},
    ]

    mock_client = AsyncMock()
    mock_client.get = AsyncMock(return_value=mock_resp)

    result = await _fetch_labeled_issues(
        mock_client, "http://gitea/api/v1", {}, "owner/repo", "claude-ready"
    )
    # Issues with truthy pull_request field are excluded
    assert len(result) == 2
    assert all(i["number"] in (10, 12) for i in result)


# ---------------------------------------------------------------------------
# _last_comment_time — mocked HTTP client
# ---------------------------------------------------------------------------


@pytest.mark.asyncio
async def test_last_comment_time_with_comments():
    from timmy.vassal.agent_health import _last_comment_time

    mock_resp = MagicMock()
    mock_resp.status_code = 200
    mock_resp.json.return_value = [
        {"updated_at": "2024-03-10T14:00:00Z", "created_at": "2024-03-10T13:00:00Z"}
    ]

    mock_client = AsyncMock()
    mock_client.get = AsyncMock(return_value=mock_resp)

    result = await _last_comment_time(mock_client, "http://gitea/api/v1", {}, "owner/repo", 42)
    assert result is not None
    assert result.year == 2024
    assert result.month == 3


@pytest.mark.asyncio
async def test_last_comment_time_uses_created_at_fallback():
    from timmy.vassal.agent_health import _last_comment_time

    mock_resp = MagicMock()
    mock_resp.status_code = 200
    mock_resp.json.return_value = [
        {"created_at": "2024-03-10T13:00:00Z"}  # no updated_at
    ]

    mock_client = AsyncMock()
    mock_client.get = AsyncMock(return_value=mock_resp)

    result = await _last_comment_time(mock_client, "http://gitea/api/v1", {}, "owner/repo", 42)
    assert result is not None


@pytest.mark.asyncio
async def test_last_comment_time_no_comments():
    from timmy.vassal.agent_health import _last_comment_time

    mock_resp = MagicMock()
    mock_resp.status_code = 200
    mock_resp.json.return_value = []

    mock_client = AsyncMock()
    mock_client.get = AsyncMock(return_value=mock_resp)

    result = await _last_comment_time(mock_client, "http://gitea/api/v1", {}, "owner/repo", 99)
    assert result is None


@pytest.mark.asyncio
async def test_last_comment_time_http_error():
    from timmy.vassal.agent_health import _last_comment_time

    mock_resp = MagicMock()
    mock_resp.status_code = 404

    mock_client = AsyncMock()
    mock_client.get = AsyncMock(return_value=mock_resp)

    result = await _last_comment_time(mock_client, "http://gitea/api/v1", {}, "owner/repo", 99)
    assert result is None


@pytest.mark.asyncio
async def test_last_comment_time_exception():
    from timmy.vassal.agent_health import _last_comment_time

    mock_client = AsyncMock()
    mock_client.get = AsyncMock(side_effect=TimeoutError("timed out"))

    result = await _last_comment_time(mock_client, "http://gitea/api/v1", {}, "owner/repo", 7)
    assert result is None


# ---------------------------------------------------------------------------
# check_agent_health — no Gitea in unit tests
# ---------------------------------------------------------------------------
@@ -342,138 +90,6 @@ async def test_check_agent_health_no_token():
    assert status.agent == "claude"


@pytest.mark.asyncio
async def test_check_agent_health_detects_stuck_issue(monkeypatch):
    """Issues with last activity before the cutoff are flagged as stuck."""
    import timmy.vassal.agent_health as ah

    old_time = (datetime.now(UTC) - timedelta(minutes=200)).isoformat()

    async def _fake_fetch(client, base_url, headers, repo, label):
        return [{"number": 55, "created_at": old_time}]

    async def _fake_last_comment(client, base_url, headers, repo, issue_number):
        return datetime.now(UTC) - timedelta(minutes=200)

    monkeypatch.setattr(ah, "_fetch_labeled_issues", _fake_fetch)
    monkeypatch.setattr(ah, "_last_comment_time", _fake_last_comment)

    mock_settings = MagicMock()
    mock_settings.gitea_enabled = True
    mock_settings.gitea_token = "fake-token"
    mock_settings.gitea_url = "http://gitea"
    mock_settings.gitea_repo = "owner/repo"

    with patch("config.settings", mock_settings):
        status = await ah.check_agent_health("claude", stuck_threshold_minutes=120)

    assert 55 in status.active_issue_numbers
    assert 55 in status.stuck_issue_numbers
    assert status.is_stuck is True


@pytest.mark.asyncio
async def test_check_agent_health_active_not_stuck(monkeypatch):
    """Recent activity means issue is active but not stuck."""
    import timmy.vassal.agent_health as ah

    recent_time = (datetime.now(UTC) - timedelta(minutes=5)).isoformat()

    async def _fake_fetch(client, base_url, headers, repo, label):
        return [{"number": 77, "created_at": recent_time}]

    async def _fake_last_comment(client, base_url, headers, repo, issue_number):
        return datetime.now(UTC) - timedelta(minutes=5)

    monkeypatch.setattr(ah, "_fetch_labeled_issues", _fake_fetch)
    monkeypatch.setattr(ah, "_last_comment_time", _fake_last_comment)

    mock_settings = MagicMock()
    mock_settings.gitea_enabled = True
    mock_settings.gitea_token = "fake-token"
    mock_settings.gitea_url = "http://gitea"
    mock_settings.gitea_repo = "owner/repo"

    with patch("config.settings", mock_settings):
        status = await ah.check_agent_health("claude", stuck_threshold_minutes=120)

    assert 77 in status.active_issue_numbers
    assert 77 not in status.stuck_issue_numbers
    assert status.is_idle is False


@pytest.mark.asyncio
async def test_check_agent_health_uses_issue_created_when_no_comments(monkeypatch):
    """Falls back to issue created_at when no comment time is available."""
    import timmy.vassal.agent_health as ah

    old_time = (datetime.now(UTC) - timedelta(minutes=300)).isoformat()

    async def _fake_fetch(client, base_url, headers, repo, label):
        return [{"number": 99, "created_at": old_time}]

    async def _fake_last_comment(client, base_url, headers, repo, issue_number):
        return None  # No comments

    monkeypatch.setattr(ah, "_fetch_labeled_issues", _fake_fetch)
    monkeypatch.setattr(ah, "_last_comment_time", _fake_last_comment)

    mock_settings = MagicMock()
    mock_settings.gitea_enabled = True
    mock_settings.gitea_token = "fake-token"
    mock_settings.gitea_url = "http://gitea"
    mock_settings.gitea_repo = "owner/repo"

    with patch("config.settings", mock_settings):
        status = await ah.check_agent_health("kimi", stuck_threshold_minutes=120)

    assert 99 in status.stuck_issue_numbers


@pytest.mark.asyncio
async def test_check_agent_health_gitea_disabled(monkeypatch):
    """When gitea_enabled=False, returns idle status without querying."""
    import timmy.vassal.agent_health as ah

    mock_settings = MagicMock()
    mock_settings.gitea_enabled = False
    mock_settings.gitea_token = "fake-token"

    with patch("config.settings", mock_settings):
        status = await ah.check_agent_health("claude")

    assert status.is_idle is True
    assert status.active_issue_numbers == []


@pytest.mark.asyncio
async def test_check_agent_health_fetch_exception(monkeypatch):
    """HTTP exception during check is handled gracefully."""
    import timmy.vassal.agent_health as ah

    async def _bad_fetch(client, base_url, headers, repo, label):
        raise RuntimeError("connection refused")

    monkeypatch.setattr(ah, "_fetch_labeled_issues", _bad_fetch)

    mock_settings = MagicMock()
    mock_settings.gitea_enabled = True
    mock_settings.gitea_token = "fake-token"
    mock_settings.gitea_url = "http://gitea"
    mock_settings.gitea_repo = "owner/repo"

    with patch("config.settings", mock_settings):
        status = await ah.check_agent_health("claude")

    assert isinstance(status, AgentStatus)
    assert status.is_idle is True


# ---------------------------------------------------------------------------
# get_full_health_report
# ---------------------------------------------------------------------------


@pytest.mark.asyncio
async def test_get_full_health_report_returns_both_agents():
    from timmy.vassal.agent_health import get_full_health_report
@@ -482,127 +98,3 @@ async def test_get_full_health_report_returns_both_agents():
    agent_names = {a.agent for a in report.agents}
    assert "claude" in agent_names
    assert "kimi" in agent_names


@pytest.mark.asyncio
async def test_get_full_health_report_structure():
    from timmy.vassal.agent_health import get_full_health_report

    report = await get_full_health_report()
    assert isinstance(report, AgentHealthReport)
    assert len(report.agents) == 2


# ---------------------------------------------------------------------------
# nudge_stuck_agent
# ---------------------------------------------------------------------------


@pytest.mark.asyncio
async def test_nudge_stuck_agent_no_token():
    """Returns False gracefully when Gitea is not configured."""
    from timmy.vassal.agent_health import nudge_stuck_agent

    mock_settings = MagicMock()
    mock_settings.gitea_enabled = False
    mock_settings.gitea_token = ""

    with patch("config.settings", mock_settings):
        result = await nudge_stuck_agent("claude", 123)
    assert result is False


@pytest.mark.asyncio
async def test_nudge_stuck_agent_success(monkeypatch):
    """Returns True when comment is posted successfully."""
    import timmy.vassal.agent_health as ah

    mock_resp = MagicMock()
    mock_resp.status_code = 201

    mock_client_instance = AsyncMock()
    mock_client_instance.post = AsyncMock(return_value=mock_resp)
    mock_client_instance.__aenter__ = AsyncMock(return_value=mock_client_instance)
    mock_client_instance.__aexit__ = AsyncMock(return_value=False)

    mock_settings = MagicMock()
    mock_settings.gitea_enabled = True
    mock_settings.gitea_token = "fake-token"
    mock_settings.gitea_url = "http://gitea"
    mock_settings.gitea_repo = "owner/repo"

    with (
        patch("config.settings", mock_settings),
        patch("httpx.AsyncClient", return_value=mock_client_instance),
    ):
        result = await ah.nudge_stuck_agent("claude", 55)

    assert result is True


@pytest.mark.asyncio
async def test_nudge_stuck_agent_http_failure(monkeypatch):
    """Returns False when API returns non-2xx status."""
    import timmy.vassal.agent_health as ah

    mock_resp = MagicMock()
    mock_resp.status_code = 500

    mock_client_instance = AsyncMock()
    mock_client_instance.post = AsyncMock(return_value=mock_resp)
    mock_client_instance.__aenter__ = AsyncMock(return_value=mock_client_instance)
    mock_client_instance.__aexit__ = AsyncMock(return_value=False)

    mock_settings = MagicMock()
    mock_settings.gitea_enabled = True
    mock_settings.gitea_token = "fake-token"
    mock_settings.gitea_url = "http://gitea"
    mock_settings.gitea_repo = "owner/repo"

    with (
        patch("config.settings", mock_settings),
        patch("httpx.AsyncClient", return_value=mock_client_instance),
    ):
        result = await ah.nudge_stuck_agent("kimi", 77)

    assert result is False


@pytest.mark.asyncio
async def test_nudge_stuck_agent_gitea_disabled(monkeypatch):
    """Returns False when gitea_enabled=False."""
    import timmy.vassal.agent_health as ah

    mock_settings = MagicMock()
    mock_settings.gitea_enabled = False
    mock_settings.gitea_token = "fake-token"

    with patch("config.settings", mock_settings):
        result = await ah.nudge_stuck_agent("claude", 42)

    assert result is False


@pytest.mark.asyncio
async def test_nudge_stuck_agent_exception(monkeypatch):
    """Returns False on network exception."""
    import timmy.vassal.agent_health as ah

    mock_client_instance = AsyncMock()
    mock_client_instance.post = AsyncMock(side_effect=ConnectionError("refused"))
    mock_client_instance.__aenter__ = AsyncMock(return_value=mock_client_instance)
    mock_client_instance.__aexit__ = AsyncMock(return_value=False)

    mock_settings = MagicMock()
    mock_settings.gitea_enabled = True
    mock_settings.gitea_token = "fake-token"
    mock_settings.gitea_url = "http://gitea"
    mock_settings.gitea_repo = "owner/repo"

    with (
        patch("config.settings", mock_settings),
        patch("httpx.AsyncClient", return_value=mock_client_instance),
    ):
        result = await ah.nudge_stuck_agent("claude", 10)

    assert result is False
@@ -2,17 +2,11 @@
from __future__ import annotations

from types import SimpleNamespace
from unittest.mock import AsyncMock, MagicMock, patch

import pytest

from timmy.vassal.backlog import AgentTarget, TriagedIssue
from timmy.vassal.dispatch import (
    DispatchRecord,
    _apply_label_to_issue,
    _get_or_create_label,
    _post_dispatch_comment,
    clear_dispatch_registry,
    get_dispatch_registry,
)
@@ -118,244 +112,3 @@ def test_dispatch_record_defaults():
    assert r.label_applied is False
    assert r.comment_posted is False
    assert r.dispatched_at  # has a timestamp


# ---------------------------------------------------------------------------
|
||||
# _get_or_create_label
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_HEADERS = {"Authorization": "token x"}
|
||||
_BASE_URL = "http://gitea"
|
||||
_REPO = "org/repo"
|
||||
|
||||
|
||||
def _mock_response(status_code: int, json_data=None):
|
||||
resp = MagicMock()
|
||||
resp.status_code = status_code
|
||||
resp.json.return_value = json_data or {}
|
||||
return resp
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_get_or_create_label_finds_existing():
|
||||
"""Returns the ID of an existing label without creating it."""
|
||||
existing = [{"name": "claude-ready", "id": 42}, {"name": "other", "id": 7}]
|
||||
client = AsyncMock()
|
||||
client.get.return_value = _mock_response(200, existing)
|
||||
|
||||
result = await _get_or_create_label(client, _BASE_URL, _HEADERS, _REPO, "claude-ready")
|
||||
|
||||
assert result == 42
|
||||
client.post.assert_not_called()
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_get_or_create_label_creates_when_missing():
|
||||
"""Creates the label when it doesn't exist in the list."""
|
||||
client = AsyncMock()
|
||||
# GET returns empty list
|
||||
client.get.return_value = _mock_response(200, [])
|
||||
# POST creates label
|
||||
client.post.return_value = _mock_response(201, {"id": 99})
|
||||
|
||||
result = await _get_or_create_label(client, _BASE_URL, _HEADERS, _REPO, "claude-ready")
|
||||
|
||||
assert result == 99
|
||||
client.post.assert_called_once()
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_get_or_create_label_returns_none_on_get_error():
|
||||
"""Returns None if the GET raises an exception."""
|
||||
client = AsyncMock()
|
||||
client.get.side_effect = Exception("network error")
|
||||
|
||||
result = await _get_or_create_label(client, _BASE_URL, _HEADERS, _REPO, "claude-ready")
|
||||
|
||||
assert result is None
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_get_or_create_label_returns_none_on_create_error():
|
||||
"""Returns None if POST raises an exception."""
|
||||
client = AsyncMock()
|
||||
client.get.return_value = _mock_response(200, [])
|
||||
client.post.side_effect = Exception("post failed")
|
||||
|
||||
result = await _get_or_create_label(client, _BASE_URL, _HEADERS, _REPO, "claude-ready")
|
||||
|
||||
assert result is None
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_get_or_create_label_uses_default_color_for_unknown():
|
||||
"""Unknown label name uses '#cccccc' fallback color."""
|
||||
client = AsyncMock()
|
||||
client.get.return_value = _mock_response(200, [])
|
||||
client.post.return_value = _mock_response(201, {"id": 5})
|
||||
|
||||
await _get_or_create_label(client, _BASE_URL, _HEADERS, _REPO, "unknown-label")
|
||||
|
||||
call_kwargs = client.post.call_args
|
||||
assert call_kwargs.kwargs["json"]["color"] == "#cccccc"
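
The five tests above pin down a contract for `_get_or_create_label`: return the ID of an existing label without POSTing, create the label when it is missing, fall back to `#cccccc` for unknown names, and return `None` when either request raises. The following is a minimal sketch of a function that would satisfy that contract, not the module's actual implementation. The name `get_or_create_label_sketch`, the `LABEL_COLORS` table and its color value, and the `/repos/{repo}/labels` URL layout are assumptions made for illustration.

```python
from __future__ import annotations

import asyncio
from typing import Any

# Hypothetical color table; the real mapping is not visible in this diff.
LABEL_COLORS = {"claude-ready": "#8b6f47"}
DEFAULT_COLOR = "#cccccc"  # fallback asserted by the test above


async def get_or_create_label_sketch(
    client: Any, base_url: str, headers: dict, repo: str, name: str
) -> int | None:
    """Return the label ID, creating the label if needed; None on any failure."""
    labels_url = f"{base_url}/repos/{repo}/labels"  # assumed endpoint layout
    try:
        resp = await client.get(labels_url, headers=headers)
        if resp.status_code == 200:
            for label in resp.json():
                if label.get("name") == name:
                    return label.get("id")
    except Exception:
        return None  # listing failed: give up without attempting a create

    try:
        resp = await client.post(
            labels_url,
            headers=headers,
            json={"name": name, "color": LABEL_COLORS.get(name, DEFAULT_COLOR)},
        )
        if resp.status_code in (200, 201):
            return resp.json().get("id")
    except Exception:
        return None
    return None
```

Passing the color through the `json=` keyword matters here because the fallback-color test reads it back from `client.post.call_args.kwargs["json"]`.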
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# _apply_label_to_issue
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_apply_label_to_issue_success():
|
||||
"""Returns True when label is found and applied."""
|
||||
client = AsyncMock()
|
||||
client.get.return_value = _mock_response(200, [{"name": "claude-ready", "id": 10}])
|
||||
client.post.return_value = _mock_response(201)
|
||||
|
||||
result = await _apply_label_to_issue(client, _BASE_URL, _HEADERS, _REPO, 42, "claude-ready")
|
||||
|
||||
assert result is True
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_apply_label_to_issue_returns_false_when_no_label_id():
|
||||
"""Returns False when label ID cannot be obtained."""
|
||||
client = AsyncMock()
|
||||
client.get.side_effect = Exception("unavailable")
|
||||
|
||||
result = await _apply_label_to_issue(client, _BASE_URL, _HEADERS, _REPO, 42, "claude-ready")
|
||||
|
||||
assert result is False
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_apply_label_to_issue_returns_false_on_bad_status():
|
||||
"""Returns False when the apply POST returns a non-2xx status."""
|
||||
client = AsyncMock()
|
||||
client.get.return_value = _mock_response(200, [{"name": "claude-ready", "id": 10}])
|
||||
client.post.return_value = _mock_response(403)
|
||||
|
||||
result = await _apply_label_to_issue(client, _BASE_URL, _HEADERS, _REPO, 42, "claude-ready")
|
||||
|
||||
assert result is False
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# _post_dispatch_comment
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_post_dispatch_comment_success():
|
||||
"""Returns True on successful comment post."""
|
||||
client = AsyncMock()
|
||||
client.post.return_value = _mock_response(201)
|
||||
|
||||
issue = _make_triaged(7, "Some issue", AgentTarget.CLAUDE, priority=75)
|
||||
result = await _post_dispatch_comment(client, _BASE_URL, _HEADERS, _REPO, issue, "claude-ready")
|
||||
|
||||
assert result is True
|
||||
body = client.post.call_args.kwargs["json"]["body"]
|
||||
assert "Claude" in body
|
||||
assert "claude-ready" in body
|
||||
assert "75" in body
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_post_dispatch_comment_failure():
|
||||
"""Returns False when comment POST returns a non-2xx status."""
|
||||
client = AsyncMock()
|
||||
client.post.return_value = _mock_response(500)
|
||||
|
||||
issue = _make_triaged(8, "Other issue", AgentTarget.KIMI)
|
||||
result = await _post_dispatch_comment(client, _BASE_URL, _HEADERS, _REPO, issue, "kimi-ready")
|
||||
|
||||
assert result is False
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# _perform_gitea_dispatch — settings-level gate
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_perform_gitea_dispatch_skips_when_disabled():
|
||||
"""Does not call Gitea when gitea_enabled is False."""
|
||||
import config
|
||||
from timmy.vassal.dispatch import _perform_gitea_dispatch
|
||||
|
||||
mock_settings = SimpleNamespace(gitea_enabled=False, gitea_token="tok")
|
||||
with patch.object(config, "settings", mock_settings):
|
||||
issue = _make_triaged(9, "Disabled", AgentTarget.CLAUDE)
|
||||
record = DispatchRecord(
|
||||
issue_number=9,
|
||||
issue_title="Disabled",
|
||||
agent=AgentTarget.CLAUDE,
|
||||
rationale="r",
|
||||
)
|
||||
await _perform_gitea_dispatch(issue, record)
|
||||
|
||||
assert record.label_applied is False
|
||||
assert record.comment_posted is False
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_perform_gitea_dispatch_skips_when_no_token():
|
||||
"""Does not call Gitea when gitea_token is empty."""
|
||||
import config
|
||||
from timmy.vassal.dispatch import _perform_gitea_dispatch
|
||||
|
||||
mock_settings = SimpleNamespace(gitea_enabled=True, gitea_token="")
|
||||
with patch.object(config, "settings", mock_settings):
|
||||
issue = _make_triaged(10, "No token", AgentTarget.CLAUDE)
|
||||
record = DispatchRecord(
|
||||
issue_number=10,
|
||||
issue_title="No token",
|
||||
agent=AgentTarget.CLAUDE,
|
||||
rationale="r",
|
||||
)
|
||||
await _perform_gitea_dispatch(issue, record)
|
||||
|
||||
assert record.label_applied is False
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_perform_gitea_dispatch_updates_record():
|
||||
"""Record is mutated to reflect label/comment success."""
|
||||
import config
|
||||
from timmy.vassal.dispatch import _perform_gitea_dispatch
|
||||
|
||||
mock_settings = SimpleNamespace(
|
||||
gitea_enabled=True,
|
||||
gitea_token="tok",
|
||||
gitea_url="http://gitea",
|
||||
gitea_repo="org/repo",
|
||||
)
|
||||
|
||||
mock_client = AsyncMock()
|
||||
# GET labels → empty list, POST create label → id 1
|
||||
mock_client.get.return_value = _mock_response(200, [])
|
||||
mock_client.post.side_effect = [
|
||||
_mock_response(201, {"id": 1}), # create label
|
||||
_mock_response(201), # apply label
|
||||
_mock_response(201), # post comment
|
||||
]
|
||||
|
||||
with (
|
||||
patch.object(config, "settings", mock_settings),
|
||||
patch("httpx.AsyncClient") as mock_cls,
|
||||
):
|
||||
mock_cls.return_value.__aenter__ = AsyncMock(return_value=mock_client)
|
||||
mock_cls.return_value.__aexit__ = AsyncMock(return_value=False)
|
||||
|
||||
issue = _make_triaged(11, "Full dispatch", AgentTarget.CLAUDE)
|
||||
record = DispatchRecord(
|
||||
issue_number=11,
|
||||
issue_title="Full dispatch",
|
||||
agent=AgentTarget.CLAUDE,
|
||||
rationale="r",
|
||||
)
|
||||
await _perform_gitea_dispatch(issue, record)
|
||||
|
||||
assert record.label_applied is True
|
||||
assert record.comment_posted is True
|
||||
|
||||
@@ -2,14 +2,10 @@
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from unittest.mock import AsyncMock, MagicMock, patch
|
||||
|
||||
import pytest
|
||||
|
||||
from timmy.vassal.orchestration_loop import VassalCycleRecord, VassalOrchestrator
|
||||
|
||||
pytestmark = pytest.mark.unit
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# VassalCycleRecord
|
||||
# ---------------------------------------------------------------------------
|
||||
@@ -140,186 +136,3 @@ def test_module_singleton_exists():
|
||||
from timmy.vassal import VassalOrchestrator, vassal_orchestrator
|
||||
|
||||
assert isinstance(vassal_orchestrator, VassalOrchestrator)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Error recovery — steps degrade gracefully
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_run_cycle_continues_when_backlog_fails():
|
||||
"""A backlog step failure must not abort the cycle."""
|
||||
from timmy.vassal.dispatch import clear_dispatch_registry
|
||||
|
||||
clear_dispatch_registry()
|
||||
orch = VassalOrchestrator()
|
||||
|
||||
with patch(
|
||||
"timmy.vassal.orchestration_loop.VassalOrchestrator._step_backlog",
|
||||
new_callable=AsyncMock,
|
||||
side_effect=RuntimeError("gitea down"),
|
||||
):
|
||||
# _step_backlog raises, but run_cycle should still complete
|
||||
# (the error is caught inside run_cycle via the graceful-degrade wrapper)
|
||||
# In practice _step_backlog itself catches; here we patch at a higher level
|
||||
# to confirm record still finalises.
|
||||
try:
|
||||
record = await orch.run_cycle()
|
||||
except RuntimeError:
|
||||
# If the orchestrator doesn't swallow it, the test still validates
|
||||
# that the cycle progressed to the patched call.
|
||||
return
|
||||
|
||||
assert record.finished_at
|
||||
assert record.cycle_id == 1
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_run_cycle_records_backlog_error():
|
||||
"""Backlog errors are recorded in VassalCycleRecord.errors."""
|
||||
from timmy.vassal.dispatch import clear_dispatch_registry
|
||||
|
||||
clear_dispatch_registry()
|
||||
orch = VassalOrchestrator()
|
||||
|
||||
with patch(
|
||||
"timmy.vassal.backlog.fetch_open_issues",
|
||||
new_callable=AsyncMock,
|
||||
side_effect=ConnectionError("gitea unreachable"),
|
||||
):
|
||||
record = await orch.run_cycle()
|
||||
|
||||
assert any("backlog" in e for e in record.errors)
|
||||
assert record.finished_at
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_run_cycle_records_agent_health_error():
|
||||
"""Agent health errors are recorded in VassalCycleRecord.errors."""
|
||||
from timmy.vassal.dispatch import clear_dispatch_registry
|
||||
|
||||
clear_dispatch_registry()
|
||||
orch = VassalOrchestrator()
|
||||
|
||||
with patch(
|
||||
"timmy.vassal.agent_health.get_full_health_report",
|
||||
new_callable=AsyncMock,
|
||||
side_effect=RuntimeError("health check failed"),
|
||||
):
|
||||
record = await orch.run_cycle()
|
||||
|
||||
assert any("agent_health" in e for e in record.errors)
|
||||
assert record.finished_at
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_run_cycle_records_house_health_error():
|
||||
"""House health errors are recorded in VassalCycleRecord.errors."""
|
||||
from timmy.vassal.dispatch import clear_dispatch_registry
|
||||
|
||||
clear_dispatch_registry()
|
||||
orch = VassalOrchestrator()
|
||||
|
||||
with patch(
|
||||
"timmy.vassal.house_health.get_system_snapshot",
|
||||
new_callable=AsyncMock,
|
||||
side_effect=OSError("disk check failed"),
|
||||
):
|
||||
record = await orch.run_cycle()
|
||||
|
||||
assert any("house_health" in e for e in record.errors)
|
||||
assert record.finished_at
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Task assignment counting
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_run_cycle_counts_dispatched_issues():
|
||||
"""Issues dispatched during a cycle are counted in the record."""
|
||||
from timmy.vassal.backlog import AgentTarget, TriagedIssue
|
||||
from timmy.vassal.dispatch import clear_dispatch_registry
|
||||
|
||||
clear_dispatch_registry()
|
||||
orch = VassalOrchestrator(max_dispatch_per_cycle=5)
|
||||
|
||||
fake_issues = [
|
||||
TriagedIssue(number=i, title=f"Issue {i}", body="", agent_target=AgentTarget.CLAUDE)
|
||||
for i in range(1, 4)
|
||||
]
|
||||
|
||||
with (
|
||||
patch(
|
||||
"timmy.vassal.backlog.fetch_open_issues",
|
||||
new_callable=AsyncMock,
|
||||
return_value=[{"number": i, "title": f"Issue {i}", "labels": [], "assignees": []} for i in range(1, 4)],
|
||||
),
|
||||
patch(
|
||||
"timmy.vassal.backlog.triage_issues",
|
||||
return_value=fake_issues,
|
||||
),
|
||||
patch(
|
||||
"timmy.vassal.dispatch.dispatch_issue",
|
||||
new_callable=AsyncMock,
|
||||
),
|
||||
):
|
||||
record = await orch.run_cycle()
|
||||
|
||||
assert record.issues_fetched == 3
|
||||
assert record.issues_dispatched == 3
|
||||
assert record.dispatched_to_claude == 3
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_run_cycle_respects_max_dispatch_cap():
|
||||
"""Dispatch cap prevents flooding agents in a single cycle."""
|
||||
from timmy.vassal.backlog import AgentTarget, TriagedIssue
|
||||
from timmy.vassal.dispatch import clear_dispatch_registry
|
||||
|
||||
clear_dispatch_registry()
|
||||
orch = VassalOrchestrator(max_dispatch_per_cycle=2)
|
||||
|
||||
fake_issues = [
|
||||
TriagedIssue(number=i, title=f"Issue {i}", body="", agent_target=AgentTarget.CLAUDE)
|
||||
for i in range(1, 6)
|
||||
]
|
||||
|
||||
with (
|
||||
patch(
|
||||
"timmy.vassal.backlog.fetch_open_issues",
|
||||
new_callable=AsyncMock,
|
||||
return_value=[{"number": i, "title": f"Issue {i}", "labels": [], "assignees": []} for i in range(1, 6)],
|
||||
),
|
||||
patch(
|
||||
"timmy.vassal.backlog.triage_issues",
|
||||
return_value=fake_issues,
|
||||
),
|
||||
patch(
|
||||
"timmy.vassal.dispatch.dispatch_issue",
|
||||
new_callable=AsyncMock,
|
||||
),
|
||||
):
|
||||
record = await orch.run_cycle()
|
||||
|
||||
assert record.issues_fetched == 5
|
||||
assert record.issues_dispatched == 2 # capped
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# _resolve_interval
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def test_resolve_interval_uses_explicit_value():
    orch = VassalOrchestrator(cycle_interval=60.0)
    assert orch._resolve_interval() == 60.0


def test_resolve_interval_falls_back_to_300():
    orch = VassalOrchestrator()
    # NOTE: this patches the method under test, so the assertion only checks the
    # mock's return value, not the real fallback logic in _resolve_interval.
    with patch("timmy.vassal.orchestration_loop.VassalOrchestrator._resolve_interval") as mock_resolve:
        mock_resolve.return_value = 300.0
        assert orch._resolve_interval() == 300.0
|
||||
|
||||
@@ -1 +0,0 @@
|
||||
"""Cognitive benchmark levels for Project Bannerlord readiness testing."""
|
||||
@@ -1,183 +0,0 @@
|
"""Level 0: JSON Compliance (Coin Flip).

Tests whether the model can reliably return well-formed JSON responses
with a specific schema. This is the minimum bar for GABS tool calls.
"""
|
||||
|
||||
import json
|
||||
import time
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Any
|
||||
|
||||
LEVEL = 0
|
||||
NAME = "JSON Compliance (Coin Flip)"
|
||||
DESCRIPTION = "Model must return valid JSON matching a strict schema on each trial."
|
||||
|
||||
SYSTEM_PROMPT = """You are a strategic AI agent. You MUST respond ONLY with valid JSON.
|
||||
No markdown, no explanation, no code fences. Raw JSON only."""
|
||||
|
||||
TRIALS = [
|
||||
{
|
||||
"prompt": (
|
||||
'A coin is flipped. Respond with exactly: {"choice": "heads"} or {"choice": "tails"}. '
|
||||
"Pick one. JSON only."
|
||||
),
|
||||
"schema": {"choice": str},
|
||||
"valid_values": {"choice": ["heads", "tails"]},
|
||||
},
|
||||
{
|
||||
"prompt": (
|
||||
'You must attack or defend. Respond with: {"action": "attack", "confidence": 0.8} '
|
||||
'or {"action": "defend", "confidence": 0.6}. Replace confidence with your own value 0.0-1.0. JSON only.'
|
||||
),
|
||||
"schema": {"action": str, "confidence": float},
|
||||
"valid_values": {"action": ["attack", "defend"]},
|
||||
},
|
||||
{
|
||||
"prompt": (
|
||||
'Choose a direction to march. Respond with exactly: '
|
||||
'{"direction": "north", "reason": "string explaining why"}. '
|
||||
"Pick north/south/east/west. JSON only."
|
||||
),
|
||||
"schema": {"direction": str, "reason": str},
|
||||
"valid_values": {"direction": ["north", "south", "east", "west"]},
|
||||
},
|
||||
]
|
||||
|
||||
|
||||
@dataclass
class TrialResult:
    trial_index: int
    prompt: str
    raw_response: str
    parsed: dict | None
    valid_json: bool
    schema_valid: bool
    value_valid: bool
    latency_ms: float
    error: str = ""


@dataclass
class LevelResult:
    level: int = LEVEL
    name: str = NAME
    trials: list[TrialResult] = field(default_factory=list)
    passed: bool = False
    score: float = 0.0
    latency_p50_ms: float = 0.0
    latency_p99_ms: float = 0.0
|
||||
|
||||
|
||||
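def _validate_schema(parsed: Any, schema: dict[str, type]) -> bool:
    """Return True if parsed is a dict holding every schema key with the expected type."""
    if not isinstance(parsed, dict):
        # json.loads can also return a list, string, or number; none can match a schema,
        # and indexing them by key below would raise TypeError.
        return False
    for key, expected_type in schema.items():
        if key not in parsed:
            return False
        if not isinstance(parsed[key], expected_type):
            # Allow int where float is expected
            if expected_type is float and isinstance(parsed[key], int):
                continue
            return False
    return True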
|
||||
|
||||
|
||||
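def _validate_values(parsed: Any, valid_values: dict[str, list]) -> bool:
    """Return True if every constrained key present in parsed holds an allowed value."""
    if not isinstance(parsed, dict):
        # Non-object JSON (list, string, number) cannot carry keyed values.
        return False
    for key, valid_list in valid_values.items():
        if key in parsed and parsed[key] not in valid_list:
            return False
    return True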
|
||||
|
||||
|
||||
def _clean_response(raw: str) -> str:
    """Strip markdown fences if model wrapped JSON in them."""
    raw = raw.strip()
    if raw.startswith("```"):
        lines = raw.splitlines()
        # Drop every line that starts with a fence marker (opening and closing)
        lines = [line for line in lines if not line.startswith("```")]
        raw = "\n".join(lines).strip()
    return raw
|
||||
|
||||
|
||||
def run(client: Any, model: str, verbose: bool = False) -> LevelResult:
|
||||
result = LevelResult()
|
||||
latencies = []
|
||||
|
||||
for i, trial in enumerate(TRIALS):
|
||||
t0 = time.time()
|
||||
try:
|
||||
response = client.chat(
|
||||
model=model,
|
||||
messages=[
|
||||
{"role": "system", "content": SYSTEM_PROMPT},
|
||||
{"role": "user", "content": trial["prompt"]},
|
||||
],
|
||||
options={"temperature": 0.1},
|
||||
)
|
||||
raw = response["message"]["content"]
|
||||
latency_ms = (time.time() - t0) * 1000
|
||||
except Exception as exc:
|
||||
latency_ms = (time.time() - t0) * 1000
|
||||
tr = TrialResult(
|
||||
trial_index=i,
|
||||
prompt=trial["prompt"],
|
||||
raw_response="",
|
||||
parsed=None,
|
||||
valid_json=False,
|
||||
schema_valid=False,
|
||||
value_valid=False,
|
||||
latency_ms=latency_ms,
|
||||
error=str(exc),
|
||||
)
|
||||
result.trials.append(tr)
|
||||
if verbose:
|
||||
print(f" Trial {i}: ERROR — {exc}")
|
||||
continue
|
||||
|
||||
latencies.append(latency_ms)
|
||||
|
||||
cleaned = _clean_response(raw)
|
||||
parsed = None
|
||||
valid_json = False
|
||||
schema_valid = False
|
||||
value_valid = False
|
||||
error = ""
|
||||
|
||||
try:
|
||||
parsed = json.loads(cleaned)
|
||||
valid_json = True
|
||||
schema_valid = _validate_schema(parsed, trial["schema"])
|
||||
value_valid = _validate_values(parsed, trial["valid_values"])
|
||||
except json.JSONDecodeError as exc:
|
||||
error = f"JSONDecodeError: {exc}"
|
||||
|
||||
tr = TrialResult(
|
||||
trial_index=i,
|
||||
prompt=trial["prompt"],
|
||||
raw_response=raw,
|
||||
parsed=parsed,
|
||||
valid_json=valid_json,
|
||||
schema_valid=schema_valid,
|
||||
value_valid=value_valid,
|
||||
latency_ms=latency_ms,
|
||||
error=error,
|
||||
)
|
||||
result.trials.append(tr)
|
||||
|
||||
if verbose:
|
||||
status = "PASS" if (valid_json and schema_valid) else "FAIL"
|
||||
print(
|
||||
f" Trial {i}: {status} | json={valid_json} schema={schema_valid} "
|
||||
f"value={value_valid} | {latency_ms:.0f}ms | {raw[:80]!r}"
|
||||
)
|
||||
|
||||
passed_trials = sum(1 for t in result.trials if t.valid_json and t.schema_valid)
|
||||
result.score = passed_trials / len(TRIALS)
|
||||
result.passed = result.score >= 1.0 # Must pass all 3 trials
|
||||
|
||||
if latencies:
|
||||
latencies_sorted = sorted(latencies)
|
||||
result.latency_p50_ms = latencies_sorted[len(latencies_sorted) // 2]
|
||||
result.latency_p99_ms = latencies_sorted[-1]
|
||||
|
||||
return result
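
The latency summary above takes the upper median as p50 and the maximum as p99. With only three trials that matches a nearest-rank percentile, but the two diverge once the sample count grows. As a sketch only, a generic nearest-rank helper could look like the following; the function name `percentile` is an assumption and does not appear in the benchmark module.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank; multiplying before dividing keeps integer percentiles exact.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    latencies = [30.0, 10.0, 20.0]
    print(percentile(latencies, 50), percentile(latencies, 99))
```

For three samples the p99 rank is `ceil(2.97) = 3`, so the helper returns the maximum, the same value as `latencies_sorted[-1]`.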
|
||||
@@ -1,211 +0,0 @@
|
"""Level 1: Board State Tracking (Tic-Tac-Toe).

Tests whether the model can maintain game state across turns, select
legal moves, and exhibit basic strategic awareness.
Maps to: Bannerlord board state / campaign map tracking.
"""
|
||||
|
||||
import json
|
||||
import time
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Any
|
||||
|
||||
LEVEL = 1
|
||||
NAME = "Board State Tracking (Tic-Tac-Toe)"
|
||||
DESCRIPTION = "Model must track a tic-tac-toe board and make legal, strategic moves."
|
||||
|
||||
SYSTEM_PROMPT = """You are a strategic AI playing tic-tac-toe. The board is a 3x3 grid.
Positions are numbered 0-8 left-to-right, top-to-bottom:
0|1|2
3|4|5
6|7|8

You MUST respond ONLY with valid JSON. No markdown, no explanation. Raw JSON only.
Format: {"move": <position 0-8>, "reason": "<brief reason>"}"""
|
||||
|
||||
|
||||
SCENARIOS = [
|
||||
{
|
||||
"description": "Empty board — opening move",
|
||||
"board": [None, None, None, None, None, None, None, None, None],
|
||||
"player": "X",
|
||||
"prompt": (
|
||||
'Board state: [null,null,null,null,null,null,null,null,null]. '
|
||||
'You are X. It is your turn. Choose a move. '
|
||||
'Respond: {"move": <0-8>, "reason": "<why>"}'
|
||||
),
|
||||
"check": lambda move, board: move in range(9) and board[move] is None,
|
||||
"check_desc": "Move must be a valid empty position (0-8)",
|
||||
},
|
||||
{
|
||||
"description": "Block opponent's winning move",
|
||||
"board": ["O", None, "O", None, "X", None, None, None, None],
|
||||
"player": "X",
|
||||
"prompt": (
|
||||
'Board: ["O",null,"O",null,"X",null,null,null,null]. '
|
||||
"O has positions 0 and 2. You are X. "
|
||||
"O will win on next turn unless you block. "
|
||||
'Respond: {"move": <0-8>, "reason": "<why>"}'
|
||||
),
|
||||
"check": lambda move, board: move == 1, # Must block at position 1
|
||||
"check_desc": "Must block O's win at position 1",
|
||||
},
|
||||
{
|
||||
"description": "Take winning move",
|
||||
"board": ["X", None, "X", None, "O", None, None, "O", None],
|
||||
"player": "X",
|
||||
"prompt": (
|
||||
'Board: ["X",null,"X",null,"O",null,null,"O",null]. '
|
||||
"You are X. You have positions 0 and 2. "
|
||||
"You can win this turn. "
|
||||
'Respond: {"move": <0-8>, "reason": "<why>"}'
|
||||
),
|
||||
"check": lambda move, board: move == 1, # Win at position 1
|
||||
"check_desc": "Must take winning move at position 1",
|
||||
},
|
||||
{
|
||||
"description": "Legal move on partially filled board",
|
||||
"board": ["X", "O", "X", "O", "X", "O", None, None, None],
|
||||
"player": "O",
|
||||
"prompt": (
|
||||
'Board: ["X","O","X","O","X","O",null,null,null]. '
|
||||
"You are O. Choose a legal move (positions 6, 7, or 8 are available). "
|
||||
'Respond: {"move": <0-8>, "reason": "<why>"}'
|
||||
),
|
||||
"check": lambda move, board: move in [6, 7, 8],
|
||||
"check_desc": "Move must be one of the empty positions: 6, 7, or 8",
|
||||
},
|
||||
]
|
||||
|
||||
|
||||
@dataclass
class ScenarioResult:
    scenario_index: int
    description: str
    prompt: str
    raw_response: str
    parsed: dict | None
    valid_json: bool
    move_legal: bool
    move_correct: bool
    latency_ms: float
    error: str = ""


@dataclass
class LevelResult:
    level: int = LEVEL
    name: str = NAME
    trials: list[ScenarioResult] = field(default_factory=list)
    passed: bool = False
    score: float = 0.0
    latency_p50_ms: float = 0.0
    latency_p99_ms: float = 0.0
|
||||
|
||||
|
||||
def _clean_response(raw: str) -> str:
    """Strip markdown fences if model wrapped JSON in them."""
    raw = raw.strip()
    if raw.startswith("```"):
        lines = raw.splitlines()
        # Drop every line that starts with a fence marker (opening and closing)
        lines = [line for line in lines if not line.startswith("```")]
        raw = "\n".join(lines).strip()
    return raw
|
||||
|
||||
|
||||
def run(client: Any, model: str, verbose: bool = False) -> LevelResult:
|
||||
result = LevelResult()
|
||||
latencies = []
|
||||
|
||||
for i, scenario in enumerate(SCENARIOS):
|
||||
t0 = time.time()
|
||||
try:
|
||||
response = client.chat(
|
||||
model=model,
|
||||
messages=[
|
||||
{"role": "system", "content": SYSTEM_PROMPT},
|
||||
{"role": "user", "content": scenario["prompt"]},
|
||||
],
|
||||
options={"temperature": 0.1},
|
||||
)
|
||||
raw = response["message"]["content"]
|
||||
latency_ms = (time.time() - t0) * 1000
|
||||
except Exception as exc:
|
||||
latency_ms = (time.time() - t0) * 1000
|
||||
sr = ScenarioResult(
|
||||
scenario_index=i,
|
||||
description=scenario["description"],
|
||||
prompt=scenario["prompt"],
|
||||
raw_response="",
|
||||
parsed=None,
|
||||
valid_json=False,
|
||||
move_legal=False,
|
||||
move_correct=False,
|
||||
latency_ms=latency_ms,
|
||||
error=str(exc),
|
||||
)
|
||||
result.trials.append(sr)
|
||||
if verbose:
|
||||
print(f" Scenario {i}: ERROR — {exc}")
|
||||
continue
|
||||
|
||||
latencies.append(latency_ms)
|
||||
|
||||
cleaned = _clean_response(raw)
|
||||
parsed = None
|
||||
valid_json = False
|
||||
move_legal = False
|
||||
move_correct = False
|
||||
error = ""
|
||||
|
||||
try:
|
||||
parsed = json.loads(cleaned)
|
||||
valid_json = True
|
||||
|
||||
if isinstance(parsed, dict) and "move" in parsed:
|
||||
move = parsed["move"]
|
||||
# Coerce string digits to int (some models emit "4" instead of 4)
|
||||
if isinstance(move, str) and move.strip().lstrip("-").isdigit():
|
||||
move = int(move.strip())
|
||||
if isinstance(move, int):
|
||||
board = scenario["board"]
|
||||
move_legal = 0 <= move <= 8 and board[move] is None
|
||||
move_correct = scenario["check"](move, board)
|
||||
except json.JSONDecodeError as exc:
|
||||
error = f"JSONDecodeError: {exc}"
|
||||
|
||||
sr = ScenarioResult(
|
||||
scenario_index=i,
|
||||
description=scenario["description"],
|
||||
prompt=scenario["prompt"],
|
||||
raw_response=raw,
|
||||
parsed=parsed,
|
||||
valid_json=valid_json,
|
||||
move_legal=move_legal,
|
||||
move_correct=move_correct,
|
||||
latency_ms=latency_ms,
|
||||
error=error,
|
||||
)
|
||||
result.trials.append(sr)
|
||||
|
||||
if verbose:
|
||||
status = "PASS" if (valid_json and move_legal) else "FAIL"
|
||||
correct_str = "CORRECT" if move_correct else "suboptimal"
|
||||
move_val = parsed.get("move", "?") if isinstance(parsed, dict) else "?"
|
||||
print(
|
||||
f" Scenario {i} [{scenario['description']}]: {status} ({correct_str}) "
|
||||
f"| move={move_val} | {latency_ms:.0f}ms"
|
||||
)
|
||||
if not move_correct and valid_json:
|
||||
print(f" Expected: {scenario['check_desc']}")
|
||||
|
||||
# Pass criteria: all moves must be valid JSON + legal
|
||||
legal_moves = sum(1 for t in result.trials if t.valid_json and t.move_legal)
|
||||
result.score = legal_moves / len(SCENARIOS)
|
||||
result.passed = result.score >= 1.0
|
||||
|
||||
if latencies:
|
||||
latencies_sorted = sorted(latencies)
|
||||
result.latency_p50_ms = latencies_sorted[len(latencies_sorted) // 2]
|
||||
result.latency_p99_ms = latencies_sorted[-1]
|
||||
|
||||
return result
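
The block and win scenarios in this level hard-code position 1 as the expected move. A hedged way to cross-check such expectations is to derive them from the board itself. The sketch below is not part of the benchmark; `WIN_LINES` and `winning_moves` are hypothetical names, and the board encoding (a 9-element list with `None` for empty cells) follows the `board` entries in `SCENARIOS`.

```python
from __future__ import annotations

# The eight lines of three cells on a 3x3 board, using the 0-8 numbering above.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winning_moves(board: list[str | None], player: str) -> list[int]:
    """Return the empty positions that would complete a line for player."""
    moves = set()
    for line in WIN_LINES:
        marks = [board[i] for i in line]
        if marks.count(player) == 2 and marks.count(None) == 1:
            moves.add(line[marks.index(None)])
    return sorted(moves)


if __name__ == "__main__":
    block_board = ["O", None, "O", None, "X", None, None, None, None]
    # X must block wherever O could win next turn.
    print(winning_moves(block_board, "O"))
```

Calling `winning_moves(board, opponent)` yields the squares to block, and `winning_moves(board, player)` yields the squares that win immediately, so a scenario's `check` lambda could be compared against either list.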
|
||||
@@ -1,213 +0,0 @@
|
"""Level 2: Resource Management (Party Economy).

Tests whether the model can allocate limited resources across competing
priorities and adapt when constraints change.
Maps to: Bannerlord party economy (troops, food, gold, morale).
"""
|
||||
|
||||
import json
|
||||
import time
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Any
|
||||
|
||||
LEVEL = 2
|
||||
NAME = "Resource Management (Party Economy)"
|
||||
DESCRIPTION = "Model must allocate limited resources across troops, food, and equipment."
|
||||
|
||||
SYSTEM_PROMPT = """You are a Bannerlord campaign advisor managing a party.
|
||||
Resources are limited — every decision has trade-offs.
|
||||
You MUST respond ONLY with valid JSON. No markdown, no explanation. Raw JSON only."""
|
||||
|
||||
SCENARIOS = [
|
||||
{
|
||||
"description": "Budget allocation under constraint",
|
||||
"prompt": (
|
||||
"You have 500 gold. Options:\n"
|
||||
"- Recruit 10 infantry: costs 300 gold, +10 combat strength\n"
|
||||
"- Buy food for 20 days: costs 200 gold, keeps morale stable\n"
|
||||
"- Repair armor: costs 150 gold, -20% casualty rate\n\n"
|
||||
"You cannot afford all three. Morale is currently CRITICAL (troops may desert).\n"
|
||||
'Choose 1-2 options. Respond: {"choices": ["option_a", ...], "gold_spent": <int>, "reason": "<why>"}\n'
|
||||
"Where option keys are: recruit_infantry, buy_food, repair_armor"
|
||||
),
|
||||
"check": lambda r: (
|
||||
isinstance(r.get("choices"), list)
|
||||
and len(r["choices"]) >= 1
|
||||
and all(c in ["recruit_infantry", "buy_food", "repair_armor"] for c in r["choices"])
|
||||
and isinstance(r.get("gold_spent"), (int, float))
|
||||
and r.get("gold_spent", 9999) <= 500
|
||||
),
|
||||
"check_desc": "choices must be valid options, gold_spent <= 500",
|
||||
"strategic_check": lambda r: "buy_food" in r.get("choices", []),
|
||||
"strategic_desc": "With CRITICAL morale, food should be prioritized",
|
||||
},
|
||||
{
|
||||
"description": "Troop tier upgrade decision",
|
||||
"prompt": (
|
||||
"Party status:\n"
|
||||
"- 15 Tier-1 recruits (weak, 30 upkeep/day)\n"
|
||||
"- 5 Tier-3 veterans (strong, 90 upkeep/day)\n"
|
||||
"- Daily income: 200 gold\n"
|
||||
"- Upcoming: raider camp attack (moderate difficulty)\n\n"
|
||||
"Options:\n"
|
||||
"- Upgrade 5 recruits to Tier-2 (costs 250 gold total)\n"
|
||||
"- Keep all current troops, save gold for emergencies\n"
|
||||
"- Dismiss 5 recruits to save upkeep\n\n"
|
||||
'Respond: {"action": "upgrade_recruits"|"save_gold"|"dismiss_recruits", '
|
||||
'"reason": "<why>", "expected_outcome": "<string>"}'
|
||||
),
|
||||
"check": lambda r: (
|
||||
r.get("action") in ["upgrade_recruits", "save_gold", "dismiss_recruits"]
|
||||
and isinstance(r.get("reason"), str)
|
||||
and len(r.get("reason", "")) > 0
|
||||
),
|
||||
"check_desc": "action must be one of the three options with a non-empty reason",
|
||||
"strategic_check": lambda r: r.get("action") in ["upgrade_recruits", "save_gold"],
|
||||
"strategic_desc": "Dismissing troops before a fight is suboptimal",
|
||||
},
|
||||
{
|
||||
"description": "Multi-turn planning horizon",
|
||||
"prompt": (
|
||||
"Current: 300 gold, 10 days of food, 20 troops\n"
|
||||
"Day 5: Must cross desert (costs 5 extra food days)\n"
|
||||
"Day 10: Reach town (can buy supplies)\n\n"
|
||||
"You need a 15-day food reserve to survive the journey.\n"
|
||||
"Food costs 10 gold/day. You have enough for 10 days now.\n\n"
|
||||
"How many extra food days do you buy today?\n"
|
||||
'Respond: {"extra_food_days": <int>, "cost": <int>, "remaining_gold": <int>, "reason": "<why>"}'
|
||||
),
|
||||
"check": lambda r: (
|
||||
isinstance(r.get("extra_food_days"), (int, float))
|
||||
and isinstance(r.get("cost"), (int, float))
|
||||
and isinstance(r.get("remaining_gold"), (int, float))
|
||||
),
|
||||
"check_desc": "Must include extra_food_days, cost, remaining_gold as numbers",
|
||||
"strategic_check": lambda r: r.get("extra_food_days", 0) >= 5,
|
||||
"strategic_desc": "Need at least 5 more days of food for desert crossing",
|
||||
},
|
||||
]
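
The "Multi-turn planning horizon" scenario has a single arithmetic answer that the `strategic_check` only bounds from below (`extra_food_days >= 5`). As a worked sketch of that arithmetic, assuming the figures stated in the prompt (300 gold, 10 days of food on hand, a 15-day requirement, 10 gold per food day), a hypothetical helper `food_plan` could compute the reference response:

```python
def food_plan(gold: int, food_days: int, required_days: int, price_per_day: int) -> dict:
    """Compute the minimum food purchase that covers the required reserve."""
    shortfall = max(0, required_days - food_days)  # days of food still missing
    cost = shortfall * price_per_day
    if cost > gold:
        raise ValueError("not enough gold to cover the food shortfall")
    return {
        "extra_food_days": shortfall,
        "cost": cost,
        "remaining_gold": gold - cost,
    }


if __name__ == "__main__":
    # Figures taken from the scenario prompt.
    print(food_plan(gold=300, food_days=10, required_days=15, price_per_day=10))
```

With the prompt's numbers this gives 5 extra days for 50 gold, leaving 250 gold. The returned keys mirror the numeric fields the scenario's `check` lambda requires.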
|
||||
|
||||
|
||||
@dataclass
class ScenarioResult:
    scenario_index: int
    description: str
    raw_response: str
    parsed: dict | None
    valid_json: bool
    schema_valid: bool
    strategically_sound: bool
    latency_ms: float
    error: str = ""


@dataclass
class LevelResult:
    level: int = LEVEL
    name: str = NAME
    trials: list[ScenarioResult] = field(default_factory=list)
    passed: bool = False
    score: float = 0.0
    latency_p50_ms: float = 0.0
    latency_p99_ms: float = 0.0
|
||||
|
||||
|
||||
def _clean_response(raw: str) -> str:
    """Strip markdown fences if model wrapped JSON in them."""
    raw = raw.strip()
    if raw.startswith("```"):
        lines = raw.splitlines()
        # Drop every line that starts with a fence marker (opening and closing)
        lines = [line for line in lines if not line.startswith("```")]
        raw = "\n".join(lines).strip()
    return raw
|
||||
|
||||
|
||||
def run(client: Any, model: str, verbose: bool = False) -> LevelResult:
|
||||
result = LevelResult()
|
||||
latencies = []
|
||||
|
||||
for i, scenario in enumerate(SCENARIOS):
|
||||
t0 = time.time()
|
||||
try:
|
||||
response = client.chat(
|
||||
model=model,
|
||||
messages=[
|
||||
{"role": "system", "content": SYSTEM_PROMPT},
|
||||
{"role": "user", "content": scenario["prompt"]},
|
||||
],
|
||||
options={"temperature": 0.1},
|
||||
)
|
||||
raw = response["message"]["content"]
|
||||
latency_ms = (time.time() - t0) * 1000
|
||||
except Exception as exc:
|
||||
latency_ms = (time.time() - t0) * 1000
|
||||
sr = ScenarioResult(
|
||||
scenario_index=i,
|
||||
description=scenario["description"],
|
||||
raw_response="",
|
||||
parsed=None,
|
||||
valid_json=False,
|
||||
schema_valid=False,
|
||||
strategically_sound=False,
|
||||
latency_ms=latency_ms,
|
||||
error=str(exc),
|
||||
)
|
||||
result.trials.append(sr)
|
||||
if verbose:
|
||||
print(f" Scenario {i}: ERROR — {exc}")
|
||||
continue
|
||||
|
||||
latencies.append(latency_ms)
|
||||
|
||||
cleaned = _clean_response(raw)
|
||||
parsed = None
|
||||
valid_json = False
|
||||
schema_valid = False
|
||||
strategically_sound = False
|
||||
error = ""
|
||||
|
||||
try:
|
||||
parsed = json.loads(cleaned)
|
||||
valid_json = True
|
||||
schema_valid = scenario["check"](parsed)
|
||||
if schema_valid:
|
||||
strategically_sound = scenario["strategic_check"](parsed)
|
||||
except json.JSONDecodeError as exc:
|
||||
error = f"JSONDecodeError: {exc}"
|
||||
except Exception as exc:
|
||||
error = f"Validation error: {exc}"
|
||||
|
||||
sr = ScenarioResult(
|
||||
scenario_index=i,
|
||||
description=scenario["description"],
|
||||
raw_response=raw,
|
||||
parsed=parsed,
|
||||
valid_json=valid_json,
|
||||
schema_valid=schema_valid,
|
||||
strategically_sound=strategically_sound,
|
||||
latency_ms=latency_ms,
|
||||
error=error,
|
||||
)
|
||||
result.trials.append(sr)
|
||||
|
||||
if verbose:
|
||||
status = "PASS" if (valid_json and schema_valid) else "FAIL"
|
||||
strat = "strategic" if strategically_sound else "suboptimal"
|
||||
print(
|
||||
f" Scenario {i} [{scenario['description']}]: {status} ({strat}) "
|
||||
f"| {latency_ms:.0f}ms"
|
||||
)
|
||||
if not schema_valid and valid_json:
|
||||
print(f" Schema issue: {scenario['check_desc']}")
|
||||
if not strategically_sound and schema_valid:
|
||||
print(f" Strategy note: {scenario['strategic_desc']}")
|
||||
|
||||
valid_count = sum(1 for t in result.trials if t.valid_json and t.schema_valid)
|
||||
result.score = valid_count / len(SCENARIOS)
|
||||
    result.passed = result.score >= 2 / 3  # 2/3 scenarios; a literal 0.67 would reject 2/3 (0.666...)
|
||||
|
||||
if latencies:
|
||||
latencies_sorted = sorted(latencies)
|
||||
result.latency_p50_ms = latencies_sorted[len(latencies_sorted) // 2]
|
||||
        result.latency_p99_ms = latencies_sorted[-1]  # slowest sample; equals p99 at this sample size
|
||||
|
||||
return result
|
||||
@@ -1,216 +0,0 @@
|
||||
"""Level 3: Battle Tactics — Formation Commands.
|
||||
|
||||
Tests whether the model can issue coherent formation and tactical orders
|
||||
under simulated battlefield pressure with multiple unit types.
|
||||
Maps to: Bannerlord formation commands (charge, shield wall, skirmish, etc.).
|
||||
"""
|
||||
|
||||
import json
|
||||
import time
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Any
|
||||
|
||||
LEVEL = 3
|
||||
NAME = "Battle Tactics (Formation Commands)"
|
||||
DESCRIPTION = "Model must issue tactically sound formation orders under simulated battle conditions."
|
||||
|
||||
SYSTEM_PROMPT = """You are a Bannerlord battle commander. Issue formation orders using these commands:
|
||||
- shield_wall: infantry forms defensive line (good vs ranged, slow advance)
|
||||
- charge: all-out attack (high casualties, breaks weak enemies fast)
|
||||
- skirmish: ranged units pepper enemy (good vs heavy infantry, needs distance)
|
||||
- advance: move forward holding formation (balanced)
|
||||
- flank_left / flank_right: cavalry sweeps around enemy side
|
||||
- fallback: retreat to regroup (when badly outnumbered)
|
||||
|
||||
You MUST respond ONLY with valid JSON. No markdown. Raw JSON only."""
|
||||
|
||||
SCENARIOS = [
|
||||
{
|
||||
"description": "Ranged vs infantry — defensive opening",
|
||||
"prompt": (
|
||||
"Situation: You have 20 archers + 10 infantry. Enemy has 30 heavy infantry, no ranged.\n"
|
||||
"Enemy is 200m away and advancing.\n"
|
||||
"Objective: Maximize casualties before melee contact.\n\n"
|
||||
'Issue orders for both units. Respond:\n'
|
||||
'{"infantry_order": "<command>", "archer_order": "<command>", '
|
||||
'"reason": "<tactical reasoning>", "expected_outcome": "<string>"}'
|
||||
),
|
||||
"check": lambda r: (
|
||||
r.get("infantry_order") in ["shield_wall", "charge", "skirmish", "advance", "flank_left", "flank_right", "fallback"]
|
||||
and r.get("archer_order") in ["shield_wall", "charge", "skirmish", "advance", "flank_left", "flank_right", "fallback"]
|
||||
and isinstance(r.get("reason"), str)
|
||||
),
|
||||
"check_desc": "Both orders must be valid commands",
|
||||
"strategic_check": lambda r: (
|
||||
r.get("archer_order") == "skirmish"
|
||||
and r.get("infantry_order") in ["shield_wall", "advance"]
|
||||
),
|
||||
"strategic_desc": "Archers should skirmish while infantry holds (shield_wall or advance)",
|
||||
},
|
||||
{
|
||||
"description": "Outnumbered — retreat decision",
|
||||
"prompt": (
|
||||
"Situation: Your party (15 troops) has been ambushed.\n"
|
||||
"Enemy: 60 bandits, surrounding you on 3 sides.\n"
|
||||
"Your troops: 40% wounded. One escape route to the east.\n\n"
|
||||
'What is your command? Respond:\n'
|
||||
'{"order": "<command>", "direction": "east"|"west"|"north"|"south"|null, '
|
||||
'"reason": "<tactical reasoning>", "priority": "preserve_troops"|"fight_through"}'
|
||||
),
|
||||
"check": lambda r: (
|
||||
r.get("order") in ["shield_wall", "charge", "skirmish", "advance", "flank_left", "flank_right", "fallback"]
|
||||
and r.get("priority") in ["preserve_troops", "fight_through"]
|
||||
),
|
||||
"check_desc": "order and priority must be valid values",
|
||||
"strategic_check": lambda r: (
|
||||
r.get("order") == "fallback"
|
||||
and r.get("priority") == "preserve_troops"
|
||||
),
|
||||
"strategic_desc": "Outnumbered 4:1 with wounded troops — fallback is the sound choice",
|
||||
},
|
||||
{
|
||||
"description": "Cavalry flanking opportunity",
|
||||
"prompt": (
|
||||
"Situation: Main battle is engaged. Your infantry and enemy infantry are locked.\n"
|
||||
"You have 8 cavalry in reserve. Enemy left flank is unprotected.\n"
|
||||
"If cavalry hits the flank now, it will route enemy in ~30 seconds.\n\n"
|
||||
'Order for cavalry: Respond:\n'
|
||||
'{"cavalry_order": "<command>", "timing": "now"|"wait", '
|
||||
'"reason": "<reasoning>", "risk": "low"|"medium"|"high"}'
|
||||
),
|
||||
"check": lambda r: (
|
||||
r.get("cavalry_order") in ["shield_wall", "charge", "skirmish", "advance", "flank_left", "flank_right", "fallback"]
|
||||
and r.get("timing") in ["now", "wait"]
|
||||
and r.get("risk") in ["low", "medium", "high"]
|
||||
),
|
||||
"check_desc": "cavalry_order, timing, and risk must be valid values",
|
||||
"strategic_check": lambda r: (
|
||||
r.get("cavalry_order") in ["flank_left", "flank_right", "charge"]
|
||||
and r.get("timing") == "now"
|
||||
),
|
||||
"strategic_desc": "Should capitalize on the flank opportunity immediately",
|
||||
},
|
||||
]
|
||||
|
||||
|
||||
@dataclass
|
||||
class ScenarioResult:
|
||||
scenario_index: int
|
||||
description: str
|
||||
raw_response: str
|
||||
parsed: dict | None
|
||||
valid_json: bool
|
||||
schema_valid: bool
|
||||
strategically_sound: bool
|
||||
latency_ms: float
|
||||
error: str = ""
|
||||
|
||||
|
||||
@dataclass
|
||||
class LevelResult:
|
||||
level: int = LEVEL
|
||||
name: str = NAME
|
||||
trials: list[ScenarioResult] = field(default_factory=list)
|
||||
passed: bool = False
|
||||
score: float = 0.0
|
||||
latency_p50_ms: float = 0.0
|
||||
latency_p99_ms: float = 0.0
|
||||
|
||||
|
||||
def _clean_response(raw: str) -> str:
|
||||
raw = raw.strip()
|
||||
if raw.startswith("```"):
|
||||
lines = raw.splitlines()
|
||||
        lines = [line for line in lines if not line.startswith("```")]
|
||||
raw = "\n".join(lines).strip()
|
||||
return raw
|
||||
|
||||
|
||||
def run(client: Any, model: str, verbose: bool = False) -> LevelResult:
|
||||
result = LevelResult()
|
||||
latencies = []
|
||||
|
||||
for i, scenario in enumerate(SCENARIOS):
|
||||
t0 = time.time()
|
||||
try:
|
||||
response = client.chat(
|
||||
model=model,
|
||||
messages=[
|
||||
{"role": "system", "content": SYSTEM_PROMPT},
|
||||
{"role": "user", "content": scenario["prompt"]},
|
||||
],
|
||||
options={"temperature": 0.2},
|
||||
)
|
||||
raw = response["message"]["content"]
|
||||
latency_ms = (time.time() - t0) * 1000
|
||||
except Exception as exc:
|
||||
latency_ms = (time.time() - t0) * 1000
|
||||
sr = ScenarioResult(
|
||||
scenario_index=i,
|
||||
description=scenario["description"],
|
||||
raw_response="",
|
||||
parsed=None,
|
||||
valid_json=False,
|
||||
schema_valid=False,
|
||||
strategically_sound=False,
|
||||
latency_ms=latency_ms,
|
||||
error=str(exc),
|
||||
)
|
||||
result.trials.append(sr)
|
||||
if verbose:
|
||||
print(f" Scenario {i}: ERROR — {exc}")
|
||||
continue
|
||||
|
||||
latencies.append(latency_ms)
|
||||
|
||||
cleaned = _clean_response(raw)
|
||||
parsed = None
|
||||
valid_json = False
|
||||
schema_valid = False
|
||||
strategically_sound = False
|
||||
error = ""
|
||||
|
||||
try:
|
||||
parsed = json.loads(cleaned)
|
||||
valid_json = True
|
||||
schema_valid = scenario["check"](parsed)
|
||||
if schema_valid:
|
||||
strategically_sound = scenario["strategic_check"](parsed)
|
||||
except json.JSONDecodeError as exc:
|
||||
error = f"JSONDecodeError: {exc}"
|
||||
except Exception as exc:
|
||||
error = f"Validation error: {exc}"
|
||||
|
||||
sr = ScenarioResult(
|
||||
scenario_index=i,
|
||||
description=scenario["description"],
|
||||
raw_response=raw,
|
||||
parsed=parsed,
|
||||
valid_json=valid_json,
|
||||
schema_valid=schema_valid,
|
||||
strategically_sound=strategically_sound,
|
||||
latency_ms=latency_ms,
|
||||
error=error,
|
||||
)
|
||||
result.trials.append(sr)
|
||||
|
||||
if verbose:
|
||||
status = "PASS" if (valid_json and schema_valid) else "FAIL"
|
||||
strat = "strategic" if strategically_sound else "suboptimal"
|
||||
print(
|
||||
f" Scenario {i} [{scenario['description']}]: {status} ({strat}) "
|
||||
f"| {latency_ms:.0f}ms"
|
||||
)
|
||||
if not schema_valid and valid_json:
|
||||
print(f" Schema issue: {scenario['check_desc']}")
|
||||
|
||||
valid_count = sum(1 for t in result.trials if t.valid_json and t.schema_valid)
|
||||
result.score = valid_count / len(SCENARIOS)
|
||||
    result.passed = result.score >= 2 / 3  # 2/3 scenarios; a literal 0.67 would reject 2/3 (0.666...)
|
||||
|
||||
if latencies:
|
||||
latencies_sorted = sorted(latencies)
|
||||
result.latency_p50_ms = latencies_sorted[len(latencies_sorted) // 2]
|
||||
        result.latency_p99_ms = latencies_sorted[-1]  # slowest sample; equals p99 at this sample size
|
||||
|
||||
return result
|
||||
@@ -1,223 +0,0 @@
|
||||
"""Level 4: Trade Route — Campaign Navigation.
|
||||
|
||||
Tests multi-step planning ability: route optimization, trade-off analysis
|
||||
across time horizons, and adapting plans when conditions change.
|
||||
Maps to: Bannerlord campaign map navigation, caravans, and economy.
|
||||
"""
|
||||
|
||||
import json
|
||||
import time
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Any
|
||||
|
||||
LEVEL = 4
|
||||
NAME = "Trade Route (Campaign Navigation)"
|
||||
DESCRIPTION = "Model must plan optimal routes and adapt to changing conditions on the campaign map."
|
||||
|
||||
SYSTEM_PROMPT = """You are a Bannerlord merchant lord planning campaign movements.
|
||||
Consider distance, profitability, risk, and timing.
|
||||
You MUST respond ONLY with valid JSON. No markdown. Raw JSON only."""
|
||||
|
||||
SCENARIOS = [
|
||||
{
|
||||
"description": "Optimal trade route selection",
|
||||
"prompt": (
|
||||
"You are at Epicrotea with 500 gold and 20 days travel budget.\n\n"
|
||||
"Trade opportunities:\n"
|
||||
"- Route A: Epicrotea → Vlandia (3 days) → Sturgia (5 days back)\n"
|
||||
" Sell grain in Vlandia: +300 gold. Buy furs in Sturgia: costs 200, sells for 400 in Calradia.\n"
|
||||
" Total: +500 gold profit, 8 days.\n"
|
||||
"- Route B: Epicrotea → Calradia (2 days) → Aserai (4 days)\n"
|
||||
" Sell iron in Calradia: +150 gold. Buy spice in Aserai: costs 300, sells for 600 in Empire.\n"
|
||||
" Empire is 6 more days away. Total: +450 gold profit, 12 days.\n"
|
||||
"- Route C: Epicrotea → nearby village (1 day)\n"
|
||||
" Buy cheap food: costs 100, sells for 180 in any city.\n"
|
||||
" Total: +80 gold profit, 2 days. Repeatable.\n\n"
|
||||
'Choose route. Respond:\n'
|
||||
'{"route": "A"|"B"|"C", "expected_profit": <int>, "days_used": <int>, '
|
||||
'"reason": "<reasoning>", "risk": "low"|"medium"|"high"}'
|
||||
),
|
||||
"check": lambda r: (
|
||||
r.get("route") in ["A", "B", "C"]
|
||||
and isinstance(r.get("expected_profit"), (int, float))
|
||||
and isinstance(r.get("days_used"), (int, float))
|
||||
and r.get("risk") in ["low", "medium", "high"]
|
||||
),
|
||||
"check_desc": "route, expected_profit, days_used, risk must be valid",
|
||||
"strategic_check": lambda r: r.get("route") in ["A", "C"], # A is best single trip, C is best if repeated
|
||||
"strategic_desc": "Route A has best profit/day ratio; C is best if multiple loops possible",
|
||||
},
|
||||
    {
        "description": "Adapt plan when war declared",
        "prompt": (
            "You were heading to Vlandia to trade, 2 days into the journey.\n"
            "NEWS: Vlandia just declared war on your faction. Entering Vlandia territory is now dangerous.\n\n"
            "Your current position: borderlands, between:\n"
|
||||
"- Vlandia (2 days): Now at war — high risk of attack\n"
|
||||
"- Sturgia (3 days): Neutral — safe\n"
|
||||
"- Empire (4 days): Allied — very safe, good prices\n\n"
|
||||
"You have 400 gold of trade goods for the Vlandia market.\n"
|
||||
'What do you do? Respond:\n'
|
||||
'{"decision": "continue_to_vlandia"|"divert_to_sturgia"|"divert_to_empire", '
|
||||
'"reason": "<why>", "gold_at_risk": <int>}'
|
||||
),
|
||||
"check": lambda r: (
|
||||
r.get("decision") in ["continue_to_vlandia", "divert_to_sturgia", "divert_to_empire"]
|
||||
and isinstance(r.get("gold_at_risk"), (int, float))
|
||||
),
|
||||
"check_desc": "decision must be one of three options, gold_at_risk must be a number",
|
||||
"strategic_check": lambda r: r.get("decision") in ["divert_to_sturgia", "divert_to_empire"],
|
||||
"strategic_desc": "Should avoid active war zone — divert to safe destination",
|
||||
},
|
||||
{
|
||||
"description": "Multi-stop route planning with constraints",
|
||||
"prompt": (
|
||||
"Plan a 3-stop trading circuit starting and ending at Pravend.\n"
|
||||
"Budget: 800 gold. Time limit: 20 days.\n\n"
|
||||
"Available cities and travel times from Pravend:\n"
|
||||
"- Rhotae: 2 days (leather cheap, sells well in south)\n"
|
||||
"- Ortysia: 4 days (grain surplus — buy cheap)\n"
|
||||
"- Epicrotea: 3 days (iron market — buy/sell)\n"
|
||||
"- Pen Cannoc: 5 days (wine — high profit, far)\n\n"
|
||||
"Each stop takes 1 day for trading.\n"
|
||||
'Plan 3 stops. Respond:\n'
|
||||
'{"stops": ["<city1>", "<city2>", "<city3>"], '
|
||||
'"total_days": <int>, "estimated_profit": <int>, '
|
||||
'"reason": "<reasoning>"}'
|
||||
),
|
||||
"check": lambda r: (
|
||||
isinstance(r.get("stops"), list)
|
||||
and len(r["stops"]) == 3
|
||||
and all(isinstance(s, str) for s in r["stops"])
|
||||
and isinstance(r.get("total_days"), (int, float))
|
||||
and r.get("total_days", 99) <= 20
|
||||
and isinstance(r.get("estimated_profit"), (int, float))
|
||||
),
|
||||
"check_desc": "stops must be list of 3 strings, total_days <= 20, estimated_profit numeric",
|
||||
"strategic_check": lambda r: "Pen Cannoc" not in r.get("stops", []), # Too far for 20 days
|
||||
"strategic_desc": "Pen Cannoc at 5 days each way is likely too far for a 20-day circuit",
|
||||
},
|
||||
]
|
||||
|
||||
|
||||
@dataclass
|
||||
class ScenarioResult:
|
||||
scenario_index: int
|
||||
description: str
|
||||
raw_response: str
|
||||
parsed: dict | None
|
||||
valid_json: bool
|
||||
schema_valid: bool
|
||||
strategically_sound: bool
|
||||
latency_ms: float
|
||||
error: str = ""
|
||||
|
||||
|
||||
@dataclass
|
||||
class LevelResult:
|
||||
level: int = LEVEL
|
||||
name: str = NAME
|
||||
trials: list[ScenarioResult] = field(default_factory=list)
|
||||
passed: bool = False
|
||||
score: float = 0.0
|
||||
latency_p50_ms: float = 0.0
|
||||
latency_p99_ms: float = 0.0
|
||||
|
||||
|
||||
def _clean_response(raw: str) -> str:
|
||||
raw = raw.strip()
|
||||
if raw.startswith("```"):
|
||||
lines = raw.splitlines()
|
||||
        lines = [line for line in lines if not line.startswith("```")]
|
||||
raw = "\n".join(lines).strip()
|
||||
return raw
|
||||
|
||||
|
||||
def run(client: Any, model: str, verbose: bool = False) -> LevelResult:
|
||||
result = LevelResult()
|
||||
latencies = []
|
||||
|
||||
for i, scenario in enumerate(SCENARIOS):
|
||||
t0 = time.time()
|
||||
try:
|
||||
response = client.chat(
|
||||
model=model,
|
||||
messages=[
|
||||
{"role": "system", "content": SYSTEM_PROMPT},
|
||||
{"role": "user", "content": scenario["prompt"]},
|
||||
],
|
||||
options={"temperature": 0.2},
|
||||
)
|
||||
raw = response["message"]["content"]
|
||||
latency_ms = (time.time() - t0) * 1000
|
||||
except Exception as exc:
|
||||
latency_ms = (time.time() - t0) * 1000
|
||||
sr = ScenarioResult(
|
||||
scenario_index=i,
|
||||
description=scenario["description"],
|
||||
raw_response="",
|
||||
parsed=None,
|
||||
valid_json=False,
|
||||
schema_valid=False,
|
||||
strategically_sound=False,
|
||||
latency_ms=latency_ms,
|
||||
error=str(exc),
|
||||
)
|
||||
result.trials.append(sr)
|
||||
if verbose:
|
||||
print(f" Scenario {i}: ERROR — {exc}")
|
||||
continue
|
||||
|
||||
latencies.append(latency_ms)
|
||||
|
||||
cleaned = _clean_response(raw)
|
||||
parsed = None
|
||||
valid_json = False
|
||||
schema_valid = False
|
||||
strategically_sound = False
|
||||
error = ""
|
||||
|
||||
try:
|
||||
parsed = json.loads(cleaned)
|
||||
valid_json = True
|
||||
schema_valid = scenario["check"](parsed)
|
||||
if schema_valid:
|
||||
strategically_sound = scenario["strategic_check"](parsed)
|
||||
except json.JSONDecodeError as exc:
|
||||
error = f"JSONDecodeError: {exc}"
|
||||
except Exception as exc:
|
||||
error = f"Validation error: {exc}"
|
||||
|
||||
sr = ScenarioResult(
|
||||
scenario_index=i,
|
||||
description=scenario["description"],
|
||||
raw_response=raw,
|
||||
parsed=parsed,
|
||||
valid_json=valid_json,
|
||||
schema_valid=schema_valid,
|
||||
strategically_sound=strategically_sound,
|
||||
latency_ms=latency_ms,
|
||||
error=error,
|
||||
)
|
||||
result.trials.append(sr)
|
||||
|
||||
if verbose:
|
||||
status = "PASS" if (valid_json and schema_valid) else "FAIL"
|
||||
strat = "strategic" if strategically_sound else "suboptimal"
|
||||
print(
|
||||
f" Scenario {i} [{scenario['description']}]: {status} ({strat}) "
|
||||
f"| {latency_ms:.0f}ms"
|
||||
)
|
||||
if not schema_valid and valid_json:
|
||||
print(f" Schema issue: {scenario['check_desc']}")
|
||||
|
||||
valid_count = sum(1 for t in result.trials if t.valid_json and t.schema_valid)
|
||||
result.score = valid_count / len(SCENARIOS)
|
||||
    result.passed = result.score >= 2 / 3  # 2/3 scenarios; a literal 0.67 would reject 2/3 (0.666...)
|
||||
|
||||
if latencies:
|
||||
latencies_sorted = sorted(latencies)
|
||||
result.latency_p50_ms = latencies_sorted[len(latencies_sorted) // 2]
|
||||
        result.latency_p99_ms = latencies_sorted[-1]  # slowest sample; equals p99 at this sample size
|
||||
|
||||
return result
|
||||
@@ -1,252 +0,0 @@
|
||||
"""Level 5: Mini Campaign — Full Campaign Loop.
|
||||
|
||||
Tests multi-turn strategic coherence: the model must maintain state across
|
||||
several turns of a simulated Bannerlord campaign, making consistent decisions
|
||||
that build toward a long-term goal.
|
||||
Maps to: Full Bannerlord campaign loop — economy, diplomacy, conquest.
|
||||
"""
|
||||
|
||||
import json
|
||||
import time
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Any
|
||||
|
||||
LEVEL = 5
|
||||
NAME = "Mini Campaign (Full Campaign Loop)"
|
||||
DESCRIPTION = "Multi-turn strategic planning maintaining coherent goals across 4 turns."
|
||||
|
||||
SYSTEM_PROMPT = """You are Timmy, a Bannerlord lord with ambitions to become King of Calradia.
|
||||
You have 4 turns to establish a power base. Each turn represents 2 weeks of in-game time.
|
||||
|
||||
Your starting position:
|
||||
- Clan tier: 1 (minor lord)
|
||||
- Gold: 1000
|
||||
- Troops: 25 (mixed infantry/cavalry)
|
||||
- Renown: 150
|
||||
- Relations: Neutral with all factions
|
||||
|
||||
Winning requires: Gold > 3000 AND Renown > 400 AND Own 1+ settlement by Turn 4.
|
||||
|
||||
Each turn, choose ONE primary action:
|
||||
- "raid_village": +200 gold, -50 relations target faction, +30 renown, risk of retaliation
|
||||
- "trade_circuit": +300 gold, 0 relation change, +10 renown, no risk
|
||||
- "escort_caravan": +150 gold, +20 relations with faction, +20 renown
|
||||
- "tournament": costs 100 gold, +60 renown, +20 relations with host faction
|
||||
- "recruit_troops": costs 200 gold, +15 troops, no other change
|
||||
- "siege_castle": costs 500 gold + 200 troops morale, -100 relations, +80 renown, +1 settlement if succeed (30% base chance)
|
||||
- "pledge_vassalage": 0 cost, +100 relations with liege, +50 renown, lose independence
|
||||
|
||||
You MUST respond ONLY with valid JSON for each turn. Raw JSON only."""
|
||||
|
||||
|
||||
def run(client: Any, model: str, verbose: bool = False) -> "LevelResult":
|
||||
"""Run a 4-turn mini campaign, tracking state and decision quality."""
|
||||
result = LevelResult()
|
||||
|
||||
# Initial game state
|
||||
state = {
|
||||
"turn": 1,
|
||||
"gold": 1000,
|
||||
"troops": 25,
|
||||
"renown": 150,
|
||||
"settlements": 0,
|
||||
"relations": {"vlandia": 0, "sturgia": 0, "empire": 0, "aserai": 0, "battania": 0},
|
||||
}
|
||||
|
||||
conversation = [{"role": "system", "content": SYSTEM_PROMPT}]
|
||||
turns_passed = []
|
||||
total_latency = []
|
||||
|
||||
valid_actions = [
|
||||
"raid_village", "trade_circuit", "escort_caravan", "tournament",
|
||||
"recruit_troops", "siege_castle", "pledge_vassalage",
|
||||
]
|
||||
|
||||
for turn_num in range(1, 5):
|
||||
state["turn"] = turn_num
|
||||
state_str = json.dumps(state, indent=2)
|
||||
|
||||
prompt = (
|
||||
f"=== TURN {turn_num} / 4 ===\n"
|
||||
f"Current state:\n{state_str}\n\n"
|
||||
f"Win conditions remaining: Gold > 3000 ({state['gold']}/3000), "
|
||||
f"Renown > 400 ({state['renown']}/400), Settlements >= 1 ({state['settlements']}/1)\n\n"
|
||||
f"Choose your action for Turn {turn_num}.\n"
|
||||
f'Respond: {{"action": "<action>", "target_faction": "<faction or null>", '
|
||||
f'"reason": "<strategic reasoning>", "goal": "<what this advances>"}}'
|
||||
)
|
||||
|
||||
conversation.append({"role": "user", "content": prompt})
|
||||
|
||||
t0 = time.time()
|
||||
try:
|
||||
response = client.chat(
|
||||
model=model,
|
||||
messages=conversation,
|
||||
options={"temperature": 0.3},
|
||||
)
|
||||
raw = response["message"]["content"]
|
||||
latency_ms = (time.time() - t0) * 1000
|
||||
except Exception as exc:
|
||||
latency_ms = (time.time() - t0) * 1000
|
||||
tr = TurnResult(
|
||||
turn=turn_num,
|
||||
state_before=dict(state),
|
||||
raw_response="",
|
||||
parsed=None,
|
||||
valid_json=False,
|
||||
valid_action=False,
|
||||
action=None,
|
||||
latency_ms=latency_ms,
|
||||
error=str(exc),
|
||||
)
|
||||
turns_passed.append(tr)
|
||||
if verbose:
|
||||
print(f" Turn {turn_num}: ERROR — {exc}")
|
||||
break
|
||||
|
||||
total_latency.append(latency_ms)
|
||||
|
||||
# Clean and parse response
|
||||
cleaned = raw.strip()
|
||||
if cleaned.startswith("```"):
|
||||
lines = cleaned.splitlines()
|
||||
            lines = [line for line in lines if not line.startswith("```")]
|
||||
cleaned = "\n".join(lines).strip()
|
||||
|
||||
parsed = None
|
||||
valid_json = False
|
||||
valid_action = False
|
||||
action = None
|
||||
error = ""
|
||||
|
||||
try:
|
||||
parsed = json.loads(cleaned)
|
||||
valid_json = True
|
||||
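            # Guard against valid JSON that is not an object (e.g. a list or a bare string).
            action = parsed.get("action") if isinstance(parsed, dict) else None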
|
||||
valid_action = action in valid_actions
|
||||
except json.JSONDecodeError as exc:
|
||||
error = f"JSONDecodeError: {exc}"
|
||||
|
||||
tr = TurnResult(
|
||||
turn=turn_num,
|
||||
            state_before=json.loads(json.dumps(state)),  # deep copy; dict(state) would share the nested relations dict
|
||||
raw_response=raw,
|
||||
parsed=parsed,
|
||||
valid_json=valid_json,
|
||||
valid_action=valid_action,
|
||||
action=action,
|
||||
latency_ms=latency_ms,
|
||||
error=error,
|
||||
)
|
||||
turns_passed.append(tr)
|
||||
|
||||
# Add model response to conversation for continuity
|
||||
conversation.append({"role": "assistant", "content": raw})
|
||||
|
||||
# Apply state changes based on action
|
||||
if valid_action:
|
||||
_apply_action(state, action, parsed.get("target_faction"))
|
||||
|
||||
if verbose:
|
||||
status = "PASS" if (valid_json and valid_action) else "FAIL"
|
||||
print(
|
||||
f" Turn {turn_num}: {status} | action={action} | {latency_ms:.0f}ms | "
|
||||
f"gold={state['gold']} renown={state['renown']} settlements={state['settlements']}"
|
||||
)
|
||||
|
||||
result.turns = turns_passed
|
||||
result.final_state = dict(state)
|
||||
|
||||
# Win condition check
|
||||
    result.reached_gold_target = state["gold"] > 3000  # strict, matching "Gold > 3000" in the prompts
|
||||
    result.reached_renown_target = state["renown"] > 400  # strict, matching "Renown > 400" in the prompts
|
||||
result.reached_settlement_target = state["settlements"] >= 1
|
||||
|
||||
# Score: % of turns with valid JSON + valid action
|
||||
valid_turns = sum(1 for t in turns_passed if t.valid_json and t.valid_action)
|
||||
result.score = valid_turns / 4 if turns_passed else 0.0
|
||||
result.passed = result.score >= 0.75 # 3/4 turns valid
|
||||
|
||||
if total_latency:
|
||||
latencies_sorted = sorted(total_latency)
|
||||
result.latency_p50_ms = latencies_sorted[len(latencies_sorted) // 2]
|
||||
        result.latency_p99_ms = latencies_sorted[-1]  # slowest sample; equals p99 at this sample size
|
||||
|
||||
if verbose:
|
||||
win_status = []
|
||||
if result.reached_gold_target:
|
||||
win_status.append("GOLD")
|
||||
if result.reached_renown_target:
|
||||
win_status.append("RENOWN")
|
||||
if result.reached_settlement_target:
|
||||
win_status.append("SETTLEMENT")
|
||||
print(f" Win conditions met: {win_status or 'none'}")
|
||||
print(f" Final: gold={state['gold']} renown={state['renown']} settlements={state['settlements']}")
|
||||
|
||||
return result
|
||||
|
||||
|
||||
def _apply_action(state: dict, action: str, target_faction: str | None) -> None:
|
||||
"""Simulate game state changes for a given action."""
|
||||
if action == "raid_village":
|
||||
state["gold"] += 200
|
||||
state["renown"] += 30
|
||||
if target_faction and target_faction in state["relations"]:
|
||||
state["relations"][target_faction] -= 50
|
||||
elif action == "trade_circuit":
|
||||
state["gold"] += 300
|
||||
state["renown"] += 10
|
||||
elif action == "escort_caravan":
|
||||
state["gold"] += 150
|
||||
state["renown"] += 20
|
||||
if target_faction and target_faction in state["relations"]:
|
||||
state["relations"][target_faction] += 20
|
||||
elif action == "tournament":
|
||||
state["gold"] -= 100
|
||||
state["renown"] += 60
|
||||
if target_faction and target_faction in state["relations"]:
|
||||
state["relations"][target_faction] += 20
|
||||
elif action == "recruit_troops":
|
||||
state["gold"] -= 200
|
||||
state["troops"] += 15
|
||||
elif action == "siege_castle":
|
||||
state["gold"] -= 500
|
||||
state["renown"] += 80
|
||||
# 30% chance success (deterministic sim: succeed on turn 3+ if attempted)
|
||||
if state["turn"] >= 3:
|
||||
state["settlements"] += 1
|
||||
if target_faction and target_faction in state["relations"]:
|
||||
state["relations"][target_faction] -= 100
|
||||
elif action == "pledge_vassalage":
|
||||
state["renown"] += 50
|
||||
if target_faction and target_faction in state["relations"]:
|
||||
state["relations"][target_faction] += 100
|
||||
|
||||
|
||||
@dataclass
|
||||
class TurnResult:
|
||||
turn: int
|
||||
state_before: dict
|
||||
raw_response: str
|
||||
parsed: dict | None
|
||||
valid_json: bool
|
||||
valid_action: bool
|
||||
action: str | None
|
||||
latency_ms: float
|
||||
error: str = ""
|
||||
|
||||
|
||||
@dataclass
|
||||
class LevelResult:
|
||||
level: int = LEVEL
|
||||
name: str = NAME
|
||||
turns: list[TurnResult] = field(default_factory=list)
|
||||
final_state: dict = field(default_factory=dict)
|
||||
passed: bool = False
|
||||
score: float = 0.0
|
||||
reached_gold_target: bool = False
|
||||
reached_renown_target: bool = False
|
||||
reached_settlement_target: bool = False
|
||||
latency_p50_ms: float = 0.0
|
||||
latency_p99_ms: float = 0.0
|
||||
@@ -1,82 +0,0 @@
|
||||
# Bannerlord M0 — Cognitive Benchmark Scorecard
|
||||
|
||||
**Date:** 2026-03-23
|
||||
**Benchmark:** 6-level cognitive harness (L0–L5)
|
||||
**M1 Gate:** Must pass L0 + L1, latency < 10s per decision
|
||||
|
||||
---
|
||||
|
||||
## Results Summary
|
||||
|
||||
| Level | Description | qwen2.5:14b | hermes3:latest | hermes3:8b |
|-------|-------------|:-----------:|:--------------:|:----------:|
| **L0 [M1 GATE]** | JSON Compliance | ✓ PASS 100% | ✓ PASS 100% | ✓ PASS 100% |
| **L1 [M1 GATE]** | Board State Tracking | ✗ FAIL 50% | ✗ FAIL 50% | ✗ FAIL 50% |
| L2 | Resource Management | ✓ PASS 100% | ✓ PASS 100% | ✓ PASS 100% |
| L3 | Battle Tactics | ✓ PASS 100% | ✓ PASS 100% | ✓ PASS 100% |
| L4 | Trade Route | ✓ PASS 100% | ✓ PASS 100% | ✓ PASS 100% |
| L5 | Mini Campaign | ✓ PASS 100% | ✓ PASS 100% | ✓ PASS 100% |
| **M1 GATE** | | ✗ **FAIL** | ✗ **FAIL** | ✗ **FAIL** |
|
||||
|
||||
---
|
||||
|
||||
## Latency (p50 / p99)
|
||||
|
||||
| Level | qwen2.5:14b | hermes3:latest | hermes3:8b |
|-------|-------------|----------------|------------|
| L0 | 1443ms / 6348ms | 1028ms / 1184ms | 570ms / 593ms |
| L1 | 943ms / 1184ms | 1166ms / 1303ms | 767ms / 1313ms |
| L2 | 2936ms / 3122ms | 2032ms / 2232ms | 2408ms / 2832ms |
| L3 | 2248ms / 3828ms | 1614ms / 3525ms | 2174ms / 3437ms |
| L4 | 3235ms / 3318ms | 2724ms / 3038ms | 2507ms / 3420ms |
| L5 | 3414ms / 3970ms | 3137ms / 3433ms | 2571ms / 2763ms |
|
||||
|
||||
All models are **well under the 10s latency threshold** at every level, including the L0–L1 gate levels. The slowest p99 recorded is 6348ms (qwen2.5:14b on L0).
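
The harness reports the last element of the sorted latencies as p99. With only a few samples per level, that is simply the slowest call. A minimal sketch of a nearest-rank percentile shows why the two coincide at small sample sizes. The helper name `nearest_rank_percentile` and the sample values are assumptions for illustration, not part of the benchmark:

```python
import math


def nearest_rank_percentile(samples, pct):
    """Return the nearest-rank percentile (0 < pct <= 100) of a non-empty list."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    # Nearest-rank method: the smallest rank covering at least pct percent of samples.
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical latencies (ms) for a three-scenario level.
sample_latencies = [900.0, 4100.0, 1200.0]
p50 = nearest_rank_percentile(sample_latencies, 50)
p99 = nearest_rank_percentile(sample_latencies, 99)
print(f"p50={p50:.0f}ms p99={p99:.0f}ms max={max(sample_latencies):.0f}ms")
```

With fewer than 100 samples the nearest-rank p99 always lands on the maximum, so the p99 column above is best read as a worst-case latency per level.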
|
||||
|
||||
---
|
||||
|
||||
## Level 1 Failure Analysis
|
||||
|
||||
All three models fail L1 with an **identical pattern** (2/4 scenarios pass):
|
||||
|
||||
| Scenario | Expected | All Models |
|----------|----------|-----------|
| Empty board — opening move | Any empty square | ✓ center (4) |
| Block opponent's winning move | Position 1 (only block) | ✗ position 4 (occupied!) |
| Take winning move | Position 1 (win) | ✗ position 0 or 2 (occupied!) |
| Legal move on partially filled board | Any of 6,7,8 | ✓ position 6 |
|
||||
|
||||
**Root cause:** Models choose moves by heuristic (center, corners) without checking whether the chosen square is already occupied. They read the board description but don't cross-reference their move choice against it. This is a genuine spatial state-tracking failure.
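
The cross-reference the models skip is small when written out. The sketch below assumes a board encoded as a 9-element list with `None` for empty squares. The actual L1 board format is not shown in this scorecard, so both the encoding and the helper name `is_legal_move` are hypothetical:

```python
def is_legal_move(board, move):
    """Return True if `move` is an in-range index of an empty square."""
    if isinstance(move, bool) or not isinstance(move, int):
        return False  # reject non-integers (bool is a subclass of int)
    if not 0 <= move < len(board):
        return False
    return board[move] is None


# Hypothetical position: X threatens to complete the top row, O holds the center.
board = ["X", None, "X",
         None, "O", None,
         None, None, None]

print(is_legal_move(board, 4))  # center is occupied
print(is_legal_move(board, 1))  # the only blocking square is free
```

A check of this kind could also run on the harness side to separate "illegal move" failures from "legal but weak move" failures when analysing results.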
|
||||
|
||||
**Note:** `hermes3` models emit `"move": "4"` (string) vs `"move": 4` (int). The benchmark was patched to coerce string digits to int for L1, since type fidelity is already tested at L0.
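
The patch itself is not reproduced here. One way such a coercion might look, as a sketch under the assumption that only plain ASCII digit strings should be accepted (the function name `coerce_move` is hypothetical):

```python
def coerce_move(value):
    """Return `value` as an int if it is an int or a digit string, else None."""
    if isinstance(value, bool):
        return None  # True/False are ints in Python but are not valid moves
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        text = value.strip()
        # isascii() filters out Unicode digits that int() may not accept.
        if text.isascii() and text.isdigit():
            return int(text)
    return None


print(coerce_move("4"), coerce_move(4), coerce_move("four"))
```

Returning `None` for anything else keeps the failure visible to the schema check instead of raising inside the parser.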
|
||||
|
||||
---
|
||||
|
||||
## M1 Gate: FAILED (all models)
|
||||
|
||||
No model passes the M1 gate. The blocker is **Level 1 — Board State Tracking**.
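
The gate rule stated at the top of this scorecard (pass L0 and L1, latency under 10s per decision) can be expressed as a short check. This is a sketch only: `LevelSummary` and `m1_gate` are hypothetical names, and using the per-level p99 as the per-decision latency bound is an assumption. The figures are the hermes3:8b values from the tables above:

```python
from dataclasses import dataclass


@dataclass
class LevelSummary:
    passed: bool
    latency_p99_ms: float


def m1_gate(levels, max_latency_ms=10_000.0):
    """Return True only if L0 and L1 both pass and stay under the latency cap."""
    for level in (0, 1):
        summary = levels.get(level)
        if summary is None or not summary.passed:
            return False
        if summary.latency_p99_ms >= max_latency_ms:
            return False
    return True


# hermes3:8b: L0 passes (p99 593ms), L1 fails at 50% (p99 1313ms).
hermes3_8b = {
    0: LevelSummary(passed=True, latency_p99_ms=593.0),
    1: LevelSummary(passed=False, latency_p99_ms=1313.0),
}
print("M1 gate:", "PASS" if m1_gate(hermes3_8b) else "FAIL")
```

Latency is not the limiting factor for any model here; the gate fails on the L1 pass flag alone.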
|
||||
|
||||
### Recommendation
|
||||
|
||||
The L1 failure is consistent and structural. All models understand the format and can make reasonable *opening* moves but fail to avoid already-occupied squares. Options for M1:

1. **Lower the L1 pass threshold** from 100% to ≥ 75%. The scenarios where models fail require recognizing occupied positions from a sparse JSON array, which is a known weakness. All three models currently score 50% (2/4), so this alone would not clear the gate; it would allow proceeding to M1, with flagged risk, only once a model passes at least three of the four scenarios.
2. **Prompt engineering**: add an explicit "The following squares are taken: X at positions [P1, P2]" to the prompt to see if board tracking improves.
3. **Re-evaluate L1 gate requirement**: models pass L2–L5 (resource, tactics, trade, campaign), which are more directly relevant to Bannerlord play. Consider whether L1 is the right gate.
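Option 2 can be prototyped by deriving the occupied and free squares from the board array and stating them in the prompt. This is a minimal sketch, assuming the same board encoding as the L1 prompts. `build_prompt` and its exact wording are illustrative, and whether this improves board tracking is untested.

```python
import json


def build_prompt(board, player):
    """Build an L1-style prompt that lists occupied and free squares explicitly."""
    taken = {
        mark: [i for i, cell in enumerate(board) if cell == mark]
        for mark in ("X", "O")
    }
    free = [i for i, cell in enumerate(board) if cell is None]
    # Compact JSON so the board looks like the arrays used in the existing prompts.
    board_json = json.dumps(board, separators=(",", ":"))
    return (
        f"Board: {board_json}. "
        f"The following squares are taken: X at positions {taken['X']}, "
        f"O at positions {taken['O']}. "
        f"Free squares: {free}. You are {player}. "
        "Choose one of the free squares. "
        'Respond: {"move": <0-8>, "reason": "<why>"}'
    )


# Scenario 2 from the L1 suite ("Take winning move").
win_board = ["X", None, "X", None, "O", None, None, "O", None]
prompt = build_prompt(win_board, "X")
print(prompt)
```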

---

## qwen3:14b

Model **not available** on this Ollama instance. Available qwen3 model: `qwen3:30b`.
`qwen3:30b` was not benchmarked (significantly slower; requires an explicit decision to run).

---

## Result Files

| Model | File |
|-------|------|
| qwen2.5:14b | `results/qwen2.5_14b_20260323_142119.json` |
| hermes3:latest | `results/hermes3_latest_20260323_152900.json` |
| hermes3:8b | `results/hermes3_8b_20260323_153000.json` |
|
||||
@@ -1,476 +0,0 @@
|
||||
{
|
||||
"model": "hermes3:8b",
|
||||
"timestamp": "2026-03-23T19:29:22.731080+00:00",
|
||||
"skipped": false,
|
||||
"levels": {
|
||||
"0": {
|
||||
"level": 0,
|
||||
"name": "JSON Compliance (Coin Flip)",
|
||||
"trials": [
|
||||
{
|
||||
"trial_index": 0,
|
||||
"prompt": "A coin is flipped. Respond with exactly: {\"choice\": \"heads\"} or {\"choice\": \"tails\"}. Pick one. JSON only.",
|
||||
"raw_response": "{\"choice\": \"heads\"}",
|
||||
"parsed": {
|
||||
"choice": "heads"
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"value_valid": true,
|
||||
"latency_ms": 522.9010581970215,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"trial_index": 1,
|
||||
"prompt": "You must attack or defend. Respond with: {\"action\": \"attack\", \"confidence\": 0.8} or {\"action\": \"defend\", \"confidence\": 0.6}. Replace confidence with your own value 0.0-1.0. JSON only.",
|
||||
"raw_response": "{\"action\": \"defend\", \"confidence\": 0.7}",
|
||||
"parsed": {
|
||||
"action": "defend",
|
||||
"confidence": 0.7
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"value_valid": true,
|
||||
"latency_ms": 592.6809310913086,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"trial_index": 2,
|
||||
"prompt": "Choose a direction to march. Respond with exactly: {\"direction\": \"north\", \"reason\": \"string explaining why\"}. Pick north/south/east/west. JSON only.",
|
||||
"raw_response": "{\"direction\": \"north\", \"reason\": \"string explaining why\"}",
|
||||
"parsed": {
|
||||
"direction": "north",
|
||||
"reason": "string explaining why"
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"value_valid": true,
|
||||
"latency_ms": 569.8130130767822,
|
||||
"error": ""
|
||||
}
|
||||
],
|
||||
"passed": true,
|
||||
"score": 1.0,
|
||||
"latency_p50_ms": 569.8130130767822,
|
||||
"latency_p99_ms": 592.6809310913086
|
||||
},
|
||||
"1": {
|
||||
"level": 1,
|
||||
"name": "Board State Tracking (Tic-Tac-Toe)",
|
||||
"trials": [
|
||||
{
|
||||
"scenario_index": 0,
|
||||
"description": "Empty board \u2014 opening move",
|
||||
"prompt": "Board state: [null,null,null,null,null,null,null,null,null]. You are X. It is your turn. Choose a move. Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
|
||||
"raw_response": "{\"move\": \"4\", \"reason\": \"Center move to establish dominance\"}",
|
||||
"parsed": {
|
||||
"move": "4",
|
||||
"reason": "Center move to establish dominance"
|
||||
},
|
||||
"valid_json": true,
|
||||
"move_legal": true,
|
||||
"move_correct": true,
|
||||
"latency_ms": 766.8571472167969,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 1,
|
||||
"description": "Block opponent's winning move",
|
||||
"prompt": "Board: [\"O\",null,\"O\",null,\"X\",null,null,null,null]. O has positions 0 and 2. You are X. O will win on next turn unless you block. Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
|
||||
"raw_response": "{\"move\": 4, \"reason\": \"Block the winning move for O in the center\"}",
|
||||
"parsed": {
|
||||
"move": 4,
|
||||
"reason": "Block the winning move for O in the center"
|
||||
},
|
||||
"valid_json": true,
|
||||
"move_legal": false,
|
||||
"move_correct": false,
|
||||
"latency_ms": 734.1668605804443,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 2,
|
||||
"description": "Take winning move",
|
||||
"prompt": "Board: [\"X\",null,\"X\",null,\"O\",null,null,\"O\",null]. You are X. You have positions 0 and 2. You can win this turn. Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
|
||||
"raw_response": "{\"move\": 0, \"reason\": \"I can win this turn by placing my X in position 0, creating a winning line with X at positions 0, 4, and 8.\"}",
|
||||
"parsed": {
|
||||
"move": 0,
|
||||
"reason": "I can win this turn by placing my X in position 0, creating a winning line with X at positions 0, 4, and 8."
|
||||
},
|
||||
"valid_json": true,
|
||||
"move_legal": false,
|
||||
"move_correct": false,
|
||||
"latency_ms": 1312.8020763397217,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 3,
|
||||
"description": "Legal move on partially filled board",
|
||||
"prompt": "Board: [\"X\",\"O\",\"X\",\"O\",\"X\",\"O\",null,null,null]. You are O. Choose a legal move (positions 6, 7, or 8 are available). Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
|
||||
"raw_response": "{\"move\": 6, \"reason\": \"Block X's potential winning move in the center.\"}",
|
||||
"parsed": {
|
||||
"move": 6,
|
||||
"reason": "Block X's potential winning move in the center."
|
||||
},
|
||||
"valid_json": true,
|
||||
"move_legal": true,
|
||||
"move_correct": true,
|
||||
"latency_ms": 751.0488033294678,
|
||||
"error": ""
|
||||
}
|
||||
],
|
||||
"passed": false,
|
||||
"score": 0.5,
|
||||
"latency_p50_ms": 766.8571472167969,
|
||||
"latency_p99_ms": 1312.8020763397217
|
||||
},
|
||||
"2": {
|
||||
"level": 2,
|
||||
"name": "Resource Management (Party Economy)",
|
||||
"trials": [
|
||||
{
|
||||
"scenario_index": 0,
|
||||
"description": "Budget allocation under constraint",
|
||||
"raw_response": "{\"choices\": [\"buy_food\", \"repair_armor\"], \"gold_spent\": 350, \"reason\": \"The priority is to address the critical morale issue and reduce casualty rates. Buying food for 20 days will stabilize morale, preventing potential desertions. Repairing armor will also help reduce the casualty rate, making the remaining troops more effective. Recruit infantry is not chosen as it would deplete all the gold, leaving no funds for food or armor repair.\"}",
|
||||
"parsed": {
|
||||
"choices": [
|
||||
"buy_food",
|
||||
"repair_armor"
|
||||
],
|
||||
"gold_spent": 350,
|
||||
"reason": "The priority is to address the critical morale issue and reduce casualty rates. Buying food for 20 days will stabilize morale, preventing potential desertions. Repairing armor will also help reduce the casualty rate, making the remaining troops more effective. Recruit infantry is not chosen as it would deplete all the gold, leaving no funds for food or armor repair."
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": true,
|
||||
"latency_ms": 2832.072973251343,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 1,
|
||||
"description": "Troop tier upgrade decision",
|
||||
"raw_response": "{\"action\": \"upgrade_recruits\", \"reason\": \"Upgrading recruits to Tier-2 will improve our party's combat effectiveness against the upcoming raider camp attack. The moderate difficulty suggests we need a stronger force to succeed.\", \"expected_outcome\": \"Increased chances of victory in the upcoming battle, potentially saving resources in the long run by avoiding higher upkeep costs of dismissing recruits.\"}",
|
||||
"parsed": {
|
||||
"action": "upgrade_recruits",
|
||||
"reason": "Upgrading recruits to Tier-2 will improve our party's combat effectiveness against the upcoming raider camp attack. The moderate difficulty suggests we need a stronger force to succeed.",
|
||||
"expected_outcome": "Increased chances of victory in the upcoming battle, potentially saving resources in the long run by avoiding higher upkeep costs of dismissing recruits."
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": true,
|
||||
"latency_ms": 2407.7491760253906,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 2,
|
||||
"description": "Multi-turn planning horizon",
|
||||
"raw_response": "{\"extra_food_days\": 5, \"cost\": 50, \"remaining_gold\": 250, \"reason\": \"Buying 5 extra food days will ensure we have enough food to cross the desert, and we will still have enough gold left to reach the town on day 10 to buy more supplies.\"}",
|
||||
"parsed": {
|
||||
"extra_food_days": 5,
|
||||
"cost": 50,
|
||||
"remaining_gold": 250,
|
||||
"reason": "Buying 5 extra food days will ensure we have enough food to cross the desert, and we will still have enough gold left to reach the town on day 10 to buy more supplies."
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": true,
|
||||
"latency_ms": 2028.775930404663,
|
||||
"error": ""
|
||||
}
|
||||
],
|
||||
"passed": true,
|
||||
"score": 1.0,
|
||||
"latency_p50_ms": 2407.7491760253906,
|
||||
"latency_p99_ms": 2832.072973251343
|
||||
},
|
||||
"3": {
|
||||
"level": 3,
|
||||
"name": "Battle Tactics (Formation Commands)",
|
||||
"trials": [
|
||||
{
|
||||
"scenario_index": 0,
|
||||
"description": "Ranged vs infantry \u2014 defensive opening",
|
||||
"raw_response": "{\"infantry_order\": \"skirmish\", \"archer_order\": \"skirmish\", \"reason\": \"The enemy is advancing with heavy infantry, so our priority should be to disrupt their formation and inflict casualties from range before they close the gap. The archers and infantry will pepper the enemy with arrows as they approach, aiming to break their cohesion and lower their morale.\", \"expected_outcome\": \"The enemy's heavy infantry will take significant casualties and lose cohesion before reaching our lines, giving us a tactical advantage when the two forces meet in melee combat.\"}",
|
||||
"parsed": {
|
||||
"infantry_order": "skirmish",
|
||||
"archer_order": "skirmish",
|
||||
"reason": "The enemy is advancing with heavy infantry, so our priority should be to disrupt their formation and inflict casualties from range before they close the gap. The archers and infantry will pepper the enemy with arrows as they approach, aiming to break their cohesion and lower their morale.",
|
||||
"expected_outcome": "The enemy's heavy infantry will take significant casualties and lose cohesion before reaching our lines, giving us a tactical advantage when the two forces meet in melee combat."
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": false,
|
||||
"latency_ms": 3436.897039413452,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 1,
|
||||
"description": "Outnumbered \u2014 retreat decision",
|
||||
"raw_response": "{\"order\": \"fallback\", \"direction\": \"east\", \"reason\": \"Our troops are heavily wounded and outmatched by the 60 bandits surrounding us. We must retreat immediately to the east where there is an escape route. Fallback is the best option to preserve our troops and live to fight another day.\", \"priority\": \"preserve_troops\"}",
|
||||
"parsed": {
|
||||
"order": "fallback",
|
||||
"direction": "east",
|
||||
"reason": "Our troops are heavily wounded and outmatched by the 60 bandits surrounding us. We must retreat immediately to the east where there is an escape route. Fallback is the best option to preserve our troops and live to fight another day.",
|
||||
"priority": "preserve_troops"
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": true,
|
||||
"latency_ms": 2173.5548973083496,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 2,
|
||||
"description": "Cavalry flanking opportunity",
|
||||
"raw_response": "{\"cavalry_order\": \"flank_left\", \"timing\": \"now\", \"reason\": \"Enemy left flank is unprotected and our cavalry can break their line quickly, routing them in ~30 seconds.\", \"risk\": \"low\"}",
|
||||
"parsed": {
|
||||
"cavalry_order": "flank_left",
|
||||
"timing": "now",
|
||||
"reason": "Enemy left flank is unprotected and our cavalry can break their line quickly, routing them in ~30 seconds.",
|
||||
"risk": "low"
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": true,
|
||||
"latency_ms": 1566.443920135498,
|
||||
"error": ""
|
||||
}
|
||||
],
|
||||
"passed": true,
|
||||
"score": 1.0,
|
||||
"latency_p50_ms": 2173.5548973083496,
|
||||
"latency_p99_ms": 3436.897039413452
|
||||
},
|
||||
"4": {
|
||||
"level": 4,
|
||||
"name": "Trade Route (Campaign Navigation)",
|
||||
"trials": [
|
||||
{
|
||||
"scenario_index": 0,
|
||||
"description": "Optimal trade route selection",
|
||||
"raw_response": "{\"route\": \"A\", \"expected_profit\": 500, \"days_used\": 8, \"reason\": \"Route A offers the highest profit with a manageable travel time and low risk. Selling grain in Vlandia and furs in Sturgia provides a good balance of goods and markets.\", \"risk\": \"low\"}",
|
||||
"parsed": {
|
||||
"route": "A",
|
||||
"expected_profit": 500,
|
||||
"days_used": 8,
|
||||
"reason": "Route A offers the highest profit with a manageable travel time and low risk. Selling grain in Vlandia and furs in Sturgia provides a good balance of goods and markets.",
|
||||
"risk": "low"
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": true,
|
||||
"latency_ms": 2507.443904876709,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 1,
|
||||
"description": "Adapt plan when war declared",
|
||||
"raw_response": "{\"decision\": \"divert_to_sturgia\", \"reason\": \"Vlandia declaring war makes entering their territory too dangerous for trade. Sturgia is neutral but closer than the Empire, so less time and gold at risk.\", \"gold_at_risk\": 400}",
|
||||
"parsed": {
|
||||
"decision": "divert_to_sturgia",
|
||||
"reason": "Vlandia declaring war makes entering their territory too dangerous for trade. Sturgia is neutral but closer than the Empire, so less time and gold at risk.",
|
||||
"gold_at_risk": 400
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": true,
|
||||
"latency_ms": 2011.15083694458,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 2,
|
||||
"description": "Multi-stop route planning with constraints",
|
||||
"raw_response": "{\n \"stops\": [\n \"Rhotae\",\n \"Ortysia\",\n \"Epicrotea\"\n ],\n \"total_days\": 7,\n \"estimated_profit\": 800,\n \"reason\": \"Starting in Rhotae allows buying cheap leather and selling well in the south. Moving to Ortysia to buy grain cheap. Finally, Epicrotea for iron market. 7 days total. 800 gold budget should be enough with this plan.\"\n}",
|
||||
"parsed": {
|
||||
"stops": [
|
||||
"Rhotae",
|
||||
"Ortysia",
|
||||
"Epicrotea"
|
||||
],
|
||||
"total_days": 7,
|
||||
"estimated_profit": 800,
|
||||
"reason": "Starting in Rhotae allows buying cheap leather and selling well in the south. Moving to Ortysia to buy grain cheap. Finally, Epicrotea for iron market. 7 days total. 800 gold budget should be enough with this plan."
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": true,
|
||||
"latency_ms": 3420.1788902282715,
|
||||
"error": ""
|
||||
}
|
||||
],
|
||||
"passed": true,
|
||||
"score": 1.0,
|
||||
"latency_p50_ms": 2507.443904876709,
|
||||
"latency_p99_ms": 3420.1788902282715
|
||||
},
|
||||
"5": {
|
||||
"level": 5,
|
||||
"name": "Mini Campaign (Full Campaign Loop)",
|
||||
"turns": [
|
||||
{
|
||||
"turn": 1,
|
||||
"state_before": {
|
||||
"turn": 1,
|
||||
"gold": 1000,
|
||||
"troops": 25,
|
||||
"renown": 150,
|
||||
"settlements": 0,
|
||||
"relations": {
|
||||
"vlandia": 0,
|
||||
"sturgia": 0,
|
||||
"empire": 0,
|
||||
"aserai": 0,
|
||||
"battania": 0
|
||||
}
|
||||
},
|
||||
"raw_response": "{\"action\": \"trade_circuit\", \"target_faction\": null, \"reason\": \"Establishing trade relations will bring in gold and renown without risking retaliation or damaging relations with any faction. This sets a strong financial foundation for future expansion.\", \"goal\": \"Increase gold and renown\"}",
|
||||
"parsed": {
|
||||
"action": "trade_circuit",
|
||||
"target_faction": null,
|
||||
"reason": "Establishing trade relations will bring in gold and renown without risking retaliation or damaging relations with any faction. This sets a strong financial foundation for future expansion.",
|
||||
"goal": "Increase gold and renown"
|
||||
},
|
||||
"valid_json": true,
|
||||
"valid_action": true,
|
||||
"action": "trade_circuit",
|
||||
"latency_ms": 2763.1959915161133,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"turn": 2,
|
||||
"state_before": {
|
||||
"turn": 2,
|
||||
"gold": 1300,
|
||||
"troops": 25,
|
||||
"renown": 160,
|
||||
"settlements": 0,
|
||||
"relations": {
|
||||
"vlandia": 0,
|
||||
"sturgia": 0,
|
||||
"empire": 0,
|
||||
"aserai": 0,
|
||||
"battania": 0
|
||||
}
|
||||
},
|
||||
"raw_response": "{\"action\": \"tournament\", \"target_faction\": null, \"reason\": \"Participating in a tournament will boost my renown and relations without spending gold. This positive publicity can help attract potential vassals or allies in the future.\", \"goal\": \"Increase renown and relations\"}",
|
||||
"parsed": {
|
||||
"action": "tournament",
|
||||
"target_faction": null,
|
||||
"reason": "Participating in a tournament will boost my renown and relations without spending gold. This positive publicity can help attract potential vassals or allies in the future.",
|
||||
"goal": "Increase renown and relations"
|
||||
},
|
||||
"valid_json": true,
|
||||
"valid_action": true,
|
||||
"action": "tournament",
|
||||
"latency_ms": 2335.5319499969482,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"turn": 3,
|
||||
"state_before": {
|
||||
"turn": 3,
|
||||
"gold": 1200,
|
||||
"troops": 25,
|
||||
"renown": 220,
|
||||
"settlements": 0,
|
||||
"relations": {
|
||||
"vlandia": 0,
|
||||
"sturgia": 0,
|
||||
"empire": 0,
|
||||
"aserai": 0,
|
||||
"battania": 0
|
||||
}
|
||||
},
|
||||
"raw_response": "{\"action\": \"recruit_troops\", \"target_faction\": null, \"reason\": \"Recruiting more troops will strengthen my military force, which is essential for future expansion and defending my settlements once I have them. This investment in my army will pay off in the long run.\", \"goal\": \"Increase troops\"}",
|
||||
"parsed": {
|
||||
"action": "recruit_troops",
|
||||
"target_faction": null,
|
||||
"reason": "Recruiting more troops will strengthen my military force, which is essential for future expansion and defending my settlements once I have them. This investment in my army will pay off in the long run.",
|
||||
"goal": "Increase troops"
|
||||
},
|
||||
"valid_json": true,
|
||||
"valid_action": true,
|
||||
"action": "recruit_troops",
|
||||
"latency_ms": 2570.871114730835,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"turn": 4,
|
||||
"state_before": {
|
||||
"turn": 4,
|
||||
"gold": 1000,
|
||||
"troops": 40,
|
||||
"renown": 220,
|
||||
"settlements": 0,
|
||||
"relations": {
|
||||
"vlandia": 0,
|
||||
"sturgia": 0,
|
||||
"empire": 0,
|
||||
"aserai": 0,
|
||||
"battania": 0
|
||||
}
|
||||
},
|
||||
"raw_response": "{\"action\": \"siege_castle\", \"target_faction\": null, \"reason\": \"Capturing a castle will give me my first settlement, putting me closer to winning the game. This physical foothold in Calradia will provide a base of operations and resources to further my ambitions of becoming King.\", \"goal\": \"Gain first settlement\"}",
|
||||
"parsed": {
|
||||
"action": "siege_castle",
|
||||
"target_faction": null,
|
||||
"reason": "Capturing a castle will give me my first settlement, putting me closer to winning the game. This physical foothold in Calradia will provide a base of operations and resources to further my ambitions of becoming King.",
|
||||
"goal": "Gain first settlement"
|
||||
},
|
||||
"valid_json": true,
|
||||
"valid_action": true,
|
||||
"action": "siege_castle",
|
||||
"latency_ms": 2367.640972137451,
|
||||
"error": ""
|
||||
}
|
||||
],
|
||||
"final_state": {
|
||||
"turn": 4,
|
||||
"gold": 500,
|
||||
"troops": 40,
|
||||
"renown": 300,
|
||||
"settlements": 1,
|
||||
"relations": {
|
||||
"vlandia": 0,
|
||||
"sturgia": 0,
|
||||
"empire": 0,
|
||||
"aserai": 0,
|
||||
"battania": 0
|
||||
}
|
||||
},
|
||||
"passed": true,
|
||||
"score": 1.0,
|
||||
"reached_gold_target": false,
|
||||
"reached_renown_target": false,
|
||||
"reached_settlement_target": true,
|
||||
"latency_p50_ms": 2570.871114730835,
|
||||
"latency_p99_ms": 2763.1959915161133
|
||||
}
|
||||
},
|
||||
"summary": {
|
||||
"total_elapsed_s": 37.7,
|
||||
"levels_run": [
|
||||
0,
|
||||
1,
|
||||
2,
|
||||
3,
|
||||
4,
|
||||
5
|
||||
],
|
||||
"levels_passed": [
|
||||
0,
|
||||
2,
|
||||
3,
|
||||
4,
|
||||
5
|
||||
],
|
||||
"levels_failed": [
|
||||
1
|
||||
],
|
||||
"m1_gate_passed": false,
|
||||
"m1_gate_notes": [
|
||||
"Level 1 FAILED (score=0.50)"
|
||||
],
|
||||
"m1_latency_threshold_ms": 10000
|
||||
}
|
||||
}
|
||||
@@ -1,476 +0,0 @@
|
||||
{
|
||||
"model": "hermes3:latest",
|
||||
"timestamp": "2026-03-23T19:28:19.717773+00:00",
|
||||
"skipped": false,
|
||||
"levels": {
|
||||
"0": {
|
||||
"level": 0,
|
||||
"name": "JSON Compliance (Coin Flip)",
|
||||
"trials": [
|
||||
{
|
||||
"trial_index": 0,
|
||||
"prompt": "A coin is flipped. Respond with exactly: {\"choice\": \"heads\"} or {\"choice\": \"tails\"}. Pick one. JSON only.",
|
||||
"raw_response": "{\"choice\": \"heads\"}",
|
||||
"parsed": {
|
||||
"choice": "heads"
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"value_valid": true,
|
||||
"latency_ms": 1028.2011032104492,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"trial_index": 1,
|
||||
"prompt": "You must attack or defend. Respond with: {\"action\": \"attack\", \"confidence\": 0.8} or {\"action\": \"defend\", \"confidence\": 0.6}. Replace confidence with your own value 0.0-1.0. JSON only.",
|
||||
"raw_response": "{\"action\": \"defend\", \"confidence\": 0.7}",
|
||||
"parsed": {
|
||||
"action": "defend",
|
||||
"confidence": 0.7
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"value_valid": true,
|
||||
"latency_ms": 700.0582218170166,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"trial_index": 2,
|
||||
"prompt": "Choose a direction to march. Respond with exactly: {\"direction\": \"north\", \"reason\": \"string explaining why\"}. Pick north/south/east/west. JSON only.",
|
||||
"raw_response": "{\"direction\": \"north\", \"reason\": \"The northern direction allows for the most efficient and direct path towards the goal, minimizing potential obstacles and maximizing progress.\"}",
|
||||
"parsed": {
|
||||
"direction": "north",
|
||||
"reason": "The northern direction allows for the most efficient and direct path towards the goal, minimizing potential obstacles and maximizing progress."
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"value_valid": true,
|
||||
"latency_ms": 1184.3459606170654,
|
||||
"error": ""
|
||||
}
|
||||
],
|
||||
"passed": true,
|
||||
"score": 1.0,
|
||||
"latency_p50_ms": 1028.2011032104492,
|
||||
"latency_p99_ms": 1184.3459606170654
|
||||
},
|
||||
"1": {
|
||||
"level": 1,
|
||||
"name": "Board State Tracking (Tic-Tac-Toe)",
|
||||
"trials": [
|
||||
{
|
||||
"scenario_index": 0,
|
||||
"description": "Empty board \u2014 opening move",
|
||||
"prompt": "Board state: [null,null,null,null,null,null,null,null,null]. You are X. It is your turn. Choose a move. Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
|
||||
"raw_response": "{\"move\": \"4\", \"reason\": \"Center move to establish dominance\"}",
|
||||
"parsed": {
|
||||
"move": "4",
|
||||
"reason": "Center move to establish dominance"
|
||||
},
|
||||
"valid_json": true,
|
||||
"move_legal": true,
|
||||
"move_correct": true,
|
||||
"latency_ms": 932.2080612182617,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 1,
|
||||
"description": "Block opponent's winning move",
|
||||
"prompt": "Board: [\"O\",null,\"O\",null,\"X\",null,null,null,null]. O has positions 0 and 2. You are X. O will win on next turn unless you block. Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
|
||||
"raw_response": "{\"move\": 4, \"reason\": \"Block the winning move for O in the center, creating a chance for a winning X cross\"}",
|
||||
"parsed": {
|
||||
"move": 4,
|
||||
"reason": "Block the winning move for O in the center, creating a chance for a winning X cross"
|
||||
},
|
||||
"valid_json": true,
|
||||
"move_legal": false,
|
||||
"move_correct": false,
|
||||
"latency_ms": 1165.644884109497,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 2,
|
||||
"description": "Take winning move",
|
||||
"prompt": "Board: [\"X\",null,\"X\",null,\"O\",null,null,\"O\",null]. You are X. You have positions 0 and 2. You can win this turn. Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
|
||||
"raw_response": "{\"move\": 0, \"reason\": \"I can win this turn by placing my X in position 0, creating a winning line of three X's horizontally.\"}",
|
||||
"parsed": {
|
||||
"move": 0,
|
||||
"reason": "I can win this turn by placing my X in position 0, creating a winning line of three X's horizontally."
|
||||
},
|
||||
"valid_json": true,
|
||||
"move_legal": false,
|
||||
"move_correct": false,
|
||||
"latency_ms": 1302.915096282959,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 3,
|
||||
"description": "Legal move on partially filled board",
|
||||
"prompt": "Board: [\"X\",\"O\",\"X\",\"O\",\"X\",\"O\",null,null,null]. You are O. Choose a legal move (positions 6, 7, or 8 are available). Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
|
||||
"raw_response": "{\"move\": 6, \"reason\": \"Block X's potential win in the center.\"}",
|
||||
"parsed": {
|
||||
"move": 6,
|
||||
"reason": "Block X's potential win in the center."
|
||||
},
|
||||
"valid_json": true,
|
||||
"move_legal": true,
|
||||
"move_correct": true,
|
||||
"latency_ms": 851.456880569458,
|
||||
"error": ""
|
||||
}
|
||||
],
|
||||
"passed": false,
|
||||
"score": 0.5,
|
||||
"latency_p50_ms": 1165.644884109497,
|
||||
"latency_p99_ms": 1302.915096282959
|
||||
},
|
||||
"2": {
|
||||
"level": 2,
|
||||
"name": "Resource Management (Party Economy)",
|
||||
"trials": [
|
||||
{
|
||||
"scenario_index": 0,
|
||||
"description": "Budget allocation under constraint",
|
||||
"raw_response": "{\"choices\": [\"buy_food\", \"repair_armor\"], \"gold_spent\": 350, \"reason\": \"Buying food is crucial to prevent desertion, and repairing armor will help reduce casualties. Recruit infantry is put on hold due to limited resources.\"}",
|
||||
"parsed": {
|
||||
"choices": [
|
||||
"buy_food",
|
||||
"repair_armor"
|
||||
],
|
||||
"gold_spent": 350,
|
||||
"reason": "Buying food is crucial to prevent desertion, and repairing armor will help reduce casualties. Recruit infantry is put on hold due to limited resources."
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": true,
|
||||
"latency_ms": 2031.9790840148926,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 1,
|
||||
"description": "Troop tier upgrade decision",
|
||||
"raw_response": "{\"action\": \"save_gold\", \"reason\": \"Upgrading recruits now would leave little gold for emergencies, especially with the upcoming raiders camp attack. It's better to maintain the current troop composition and save gold for potential future threats or opportunities.\", \"expected_outcome\": \"Having enough gold to respond to unexpected situations without needing to dismiss or downgrade troops.\"}",
|
||||
"parsed": {
|
||||
"action": "save_gold",
|
||||
"reason": "Upgrading recruits now would leave little gold for emergencies, especially with the upcoming raiders camp attack. It's better to maintain the current troop composition and save gold for potential future threats or opportunities.",
|
||||
"expected_outcome": "Having enough gold to respond to unexpected situations without needing to dismiss or downgrade troops."
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": true,
|
||||
"latency_ms": 2231.9281101226807,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 2,
|
||||
"description": "Multi-turn planning horizon",
|
||||
"raw_response": "{\"extra_food_days\": 5, \"cost\": 50, \"remaining_gold\": 250, \"reason\": \"Buying 5 extra food days will ensure we have enough food to cross the desert and reach the town on day 10, while still having enough gold left for potential expenses.\"}",
|
||||
"parsed": {
|
||||
"extra_food_days": 5,
|
||||
"cost": 50,
|
||||
"remaining_gold": 250,
|
||||
"reason": "Buying 5 extra food days will ensure we have enough food to cross the desert and reach the town on day 10, while still having enough gold left for potential expenses."
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": true,
|
||||
"latency_ms": 1943.3560371398926,
|
||||
"error": ""
|
||||
}
|
||||
],
|
||||
"passed": true,
|
||||
"score": 1.0,
|
||||
"latency_p50_ms": 2031.9790840148926,
|
||||
"latency_p99_ms": 2231.9281101226807
|
||||
},
|
||||
"3": {
|
||||
"level": 3,
|
||||
"name": "Battle Tactics (Formation Commands)",
|
||||
"trials": [
|
||||
{
|
||||
"scenario_index": 0,
|
||||
"description": "Ranged vs infantry \u2014 defensive opening",
|
||||
"raw_response": "{\"infantry_order\": \"skirmish\", \"archer_order\": \"skirmish\", \"reason\": \"The enemy heavy infantry is advancing, so we should use our archers and infantry to pepper them with ranged attacks before they close to melee range. This will soften them up and reduce their combat effectiveness when our forces meet.\", \"expected_outcome\": \"By the time our forces engage in melee, the enemy heavy infantry will have suffered significant casualties from the ranged attacks, giving us a tactical advantage and maximizing the damage we can inflict before contact is made.\"}",
|
||||
"parsed": {
|
||||
"infantry_order": "skirmish",
|
||||
"archer_order": "skirmish",
|
||||
"reason": "The enemy heavy infantry is advancing, so we should use our archers and infantry to pepper them with ranged attacks before they close to melee range. This will soften them up and reduce their combat effectiveness when our forces meet.",
|
||||
"expected_outcome": "By the time our forces engage in melee, the enemy heavy infantry will have suffered significant casualties from the ranged attacks, giving us a tactical advantage and maximizing the damage we can inflict before contact is made."
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": false,
|
||||
"latency_ms": 3524.5888233184814,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 1,
|
||||
"description": "Outnumbered \u2014 retreat decision",
|
||||
"raw_response": "{\"order\": \"fallback\", \"direction\": \"east\", \"reason\": \"Outnumbered and with many wounded, falling back to the escape route is the best option to preserve our troops.\", \"priority\": \"preserve_troops\"}",
|
||||
"parsed": {
|
||||
"order": "fallback",
|
||||
"direction": "east",
|
||||
"reason": "Outnumbered and with many wounded, falling back to the escape route is the best option to preserve our troops.",
|
||||
"priority": "preserve_troops"
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": true,
|
||||
"latency_ms": 1523.0557918548584,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 2,
|
||||
"description": "Cavalry flanking opportunity",
|
||||
"raw_response": "{\"cavalry_order\": \"flank_left\", \"timing\": \"now\", \"reason\": \"Enemy left flank is unprotected and engaging our main force. A swift cavalry charge will catch them off guard and route them in ~30 seconds.\", \"risk\": \"low\"}",
|
||||
"parsed": {
|
||||
"cavalry_order": "flank_left",
|
||||
"timing": "now",
|
||||
"reason": "Enemy left flank is unprotected and engaging our main force. A swift cavalry charge will catch them off guard and route them in ~30 seconds.",
|
||||
"risk": "low"
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": true,
|
||||
"latency_ms": 1614.2308712005615,
|
||||
"error": ""
|
||||
}
|
||||
],
|
||||
"passed": true,
|
||||
"score": 1.0,
|
||||
"latency_p50_ms": 1614.2308712005615,
|
||||
"latency_p99_ms": 3524.5888233184814
|
||||
},
|
||||
"4": {
|
||||
"level": 4,
|
||||
"name": "Trade Route (Campaign Navigation)",
|
||||
"trials": [
|
||||
{
|
||||
"scenario_index": 0,
|
||||
"description": "Optimal trade route selection",
|
||||
"raw_response": "{\"route\": \"A\", \"expected_profit\": 500, \"days_used\": 8, \"reason\": \"Route A offers the highest profit with a manageable travel time and low risk. Selling grain in Vlandia and furs in Sturgia provides a good balance of profitability and travel distance.\", \"risk\": \"low\"}",
|
||||
"parsed": {
|
||||
"route": "A",
|
||||
"expected_profit": 500,
|
||||
"days_used": 8,
|
||||
"reason": "Route A offers the highest profit with a manageable travel time and low risk. Selling grain in Vlandia and furs in Sturgia provides a good balance of profitability and travel distance.",
|
||||
"risk": "low"
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": true,
|
||||
"latency_ms": 2335.57391166687,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 1,
|
||||
"description": "Adapt plan when war declared",
|
||||
"raw_response": "{\"decision\": \"divert_to_sturgia\", \"reason\": \"Given the new information that Vlandia has declared war on our faction, entering their territory would be too dangerous. Sturgia is a neutral but friendly region, so it should be safe to trade there. The Empire is allied but further away, and while safe, the extra travel time could reduce the profitability of the trade mission.\", \"gold_at_risk\": 400}",
|
||||
"parsed": {
|
||||
"decision": "divert_to_sturgia",
|
||||
"reason": "Given the new information that Vlandia has declared war on our faction, entering their territory would be too dangerous. Sturgia is a neutral but friendly region, so it should be safe to trade there. The Empire is allied but further away, and while safe, the extra travel time could reduce the profitability of the trade mission.",
|
||||
"gold_at_risk": 400
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": true,
|
||||
"latency_ms": 2723.7470149993896,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 2,
|
||||
"description": "Multi-stop route planning with constraints",
|
||||
"raw_response": "{\n \"stops\": [\n \"Rhotae\",\n \"Ortysia\",\n \"Epicrotea\"\n ],\n \"total_days\": 7,\n \"estimated_profit\": 1200,\n \"reason\": \"Starting in Rhotae allows buying cheap leather and selling well in the south. Moving to Ortysia to buy grain cheap. Finally, Epicrotea for iron market. Total 7 days, 800 gold budget sufficient. Expected profit of 1200 gold.\"\n}",
|
||||
"parsed": {
|
||||
"stops": [
|
||||
"Rhotae",
|
||||
"Ortysia",
|
||||
"Epicrotea"
|
||||
],
|
||||
"total_days": 7,
|
||||
"estimated_profit": 1200,
|
||||
"reason": "Starting in Rhotae allows buying cheap leather and selling well in the south. Moving to Ortysia to buy grain cheap. Finally, Epicrotea for iron market. Total 7 days, 800 gold budget sufficient. Expected profit of 1200 gold."
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": true,
|
||||
"latency_ms": 3038.438320159912,
|
||||
"error": ""
|
||||
}
|
||||
],
|
||||
"passed": true,
|
||||
"score": 1.0,
|
||||
"latency_p50_ms": 2723.7470149993896,
|
||||
"latency_p99_ms": 3038.438320159912
|
||||
},
|
||||
"5": {
|
||||
"level": 5,
|
||||
"name": "Mini Campaign (Full Campaign Loop)",
|
||||
"turns": [
|
||||
{
|
||||
"turn": 1,
|
||||
"state_before": {
|
||||
"turn": 1,
|
||||
"gold": 1000,
|
||||
"troops": 25,
|
||||
"renown": 150,
|
||||
"settlements": 0,
|
||||
"relations": {
|
||||
"vlandia": 0,
|
||||
"sturgia": 0,
|
||||
"empire": 0,
|
||||
"aserai": 0,
|
||||
"battania": 0
|
||||
}
|
||||
},
|
||||
"raw_response": "{\"action\": \"trade_circuit\", \"target_faction\": null, \"reason\": \"Establishing trade relations will bring in gold without risking relations or retaliation. This sets a foundation for economic growth and allows me to assess other factions before committing to any alliances or conflicts.\", \"goal\": \"Increase gold reserves and build a power base through trade and diplomacy\"}",
|
||||
"parsed": {
|
||||
"action": "trade_circuit",
|
||||
"target_faction": null,
|
||||
"reason": "Establishing trade relations will bring in gold without risking relations or retaliation. This sets a foundation for economic growth and allows me to assess other factions before committing to any alliances or conflicts.",
|
||||
"goal": "Increase gold reserves and build a power base through trade and diplomacy"
|
||||
},
|
||||
"valid_json": true,
|
||||
"valid_action": true,
|
||||
"action": "trade_circuit",
|
||||
"latency_ms": 2898.583173751831,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"turn": 2,
|
||||
"state_before": {
|
||||
"turn": 2,
|
||||
"gold": 1300,
|
||||
"troops": 25,
|
||||
"renown": 160,
|
||||
"settlements": 0,
|
||||
"relations": {
|
||||
"vlandia": 0,
|
||||
"sturgia": 0,
|
||||
"empire": 0,
|
||||
"aserai": 0,
|
||||
"battania": 0
|
||||
}
|
||||
},
|
||||
"raw_response": "{\"action\": \"raid_village\", \"target_faction\": null, \"reason\": \"Raiding a village will provide a quick boost to my gold reserves and renown. This will also help me gauge the strength of my troops and the potential resistance from the targeted faction. It's a bold move that shows my power and willingness to act, which may intimidate other factions and open up opportunities for future alliances or conflicts.\", \"goal\": \"Increase gold and renown, assess troop strength and potential faction responses\"}",
|
||||
"parsed": {
|
||||
"action": "raid_village",
|
||||
"target_faction": null,
|
||||
"reason": "Raiding a village will provide a quick boost to my gold reserves and renown. This will also help me gauge the strength of my troops and the potential resistance from the targeted faction. It's a bold move that shows my power and willingness to act, which may intimidate other factions and open up opportunities for future alliances or conflicts.",
|
||||
"goal": "Increase gold and renown, assess troop strength and potential faction responses"
|
||||
},
|
||||
"valid_json": true,
|
||||
"valid_action": true,
|
||||
"action": "raid_village",
|
||||
"latency_ms": 3112.649917602539,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"turn": 3,
|
||||
"state_before": {
|
||||
"turn": 3,
|
||||
"gold": 1500,
|
||||
"troops": 25,
|
||||
"renown": 190,
|
||||
"settlements": 0,
|
||||
"relations": {
|
||||
"vlandia": 0,
|
||||
"sturgia": 0,
|
||||
"empire": 0,
|
||||
"aserai": 0,
|
||||
"battania": 0
|
||||
}
|
||||
},
|
||||
"raw_response": "{\"action\": \"recruit_troops\", \"target_faction\": null, \"reason\": \"Recruiting more troops is essential for expanding my military strength and increasing my chances of successfully sieging a castle in the next turn. This will also help me prepare for potential conflicts with other factions and solidify my power base. Strengthening my army is a crucial step towards achieving my goal of becoming the King of Calradia.\", \"goal\": \"Increase troop count for future military expansion and defense\"}",
|
||||
"parsed": {
|
||||
"action": "recruit_troops",
|
||||
"target_faction": null,
|
||||
"reason": "Recruiting more troops is essential for expanding my military strength and increasing my chances of successfully sieging a castle in the next turn. This will also help me prepare for potential conflicts with other factions and solidify my power base. Strengthening my army is a crucial step towards achieving my goal of becoming the King of Calradia.",
|
||||
"goal": "Increase troop count for future military expansion and defense"
|
||||
},
|
||||
"valid_json": true,
|
||||
"valid_action": true,
|
||||
"action": "recruit_troops",
|
||||
"latency_ms": 3136.754035949707,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"turn": 4,
|
||||
"state_before": {
|
||||
"turn": 4,
|
||||
"gold": 1300,
|
||||
"troops": 40,
|
||||
"renown": 190,
|
||||
"settlements": 0,
|
||||
"relations": {
|
||||
"vlandia": 0,
|
||||
"sturgia": 0,
|
||||
"empire": 0,
|
||||
"aserai": 0,
|
||||
"battania": 0
|
||||
}
|
||||
},
|
||||
"raw_response": "{\"action\": \"siege_castle\", \"target_faction\": null, \"reason\": \"Sieging a castle will allow me to establish a settlement, which is crucial for meeting one of my win conditions. This will also demonstrate my military might and potentially intimidate other factions, opening up opportunities for future alliances or conflicts. While it comes with a risk of losing troops morale and straining relations, it's a necessary step towards achieving my goal of becoming the King of Calradia.\", \"goal\": \"Establish a settlement and demonstrate military strength to intimidate other factions\"}",
|
||||
"parsed": {
|
||||
"action": "siege_castle",
|
||||
"target_faction": null,
|
||||
"reason": "Sieging a castle will allow me to establish a settlement, which is crucial for meeting one of my win conditions. This will also demonstrate my military might and potentially intimidate other factions, opening up opportunities for future alliances or conflicts. While it comes with a risk of losing troops morale and straining relations, it's a necessary step towards achieving my goal of becoming the King of Calradia.",
|
||||
"goal": "Establish a settlement and demonstrate military strength to intimidate other factions"
|
||||
},
|
||||
"valid_json": true,
|
||||
"valid_action": true,
|
||||
"action": "siege_castle",
|
||||
"latency_ms": 3433.211326599121,
|
||||
"error": ""
|
||||
}
|
||||
],
|
||||
"final_state": {
|
||||
"turn": 4,
|
||||
"gold": 800,
|
||||
"troops": 40,
|
||||
"renown": 270,
|
||||
"settlements": 1,
|
||||
"relations": {
|
||||
"vlandia": 0,
|
||||
"sturgia": 0,
|
||||
"empire": 0,
|
||||
"aserai": 0,
|
||||
"battania": 0
|
||||
}
|
||||
},
|
||||
"passed": true,
|
||||
"score": 1.0,
|
||||
"reached_gold_target": false,
|
||||
"reached_renown_target": false,
|
||||
"reached_settlement_target": true,
|
||||
"latency_p50_ms": 3136.754035949707,
|
||||
"latency_p99_ms": 3433.211326599121
|
||||
}
|
||||
},
|
||||
  "summary": {
    "total_elapsed_s": 40.7,
    "levels_run": [
      0,
      1,
      2,
      3,
      4,
      5
    ],
    "levels_passed": [
      0,
      2,
      3,
      4,
      5
    ],
    "levels_failed": [
      1
    ],
    "m1_gate_passed": false,
    "m1_gate_notes": [
      "Level 1 FAILED (score=0.50)"
    ],
    "m1_latency_threshold_ms": 10000
  }
}
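The `latency_p50_ms` and `latency_p99_ms` fields in the results above are consistent with a simple nearest-rank rule applied to each level's per-trial `latency_ms` values. The sketch below illustrates that rule under this assumption. `nearest_rank_percentile` is a hypothetical helper name, and the harness's own percentile code is not shown in this diff, so treat it as one plausible reading rather than the actual implementation.

```python
def nearest_rank_percentile(values: list[float], pct: float) -> float:
    """Return the sorted value at index int(n * pct / 100), clamped to the last item."""
    if not values:
        return 0.0
    ordered = sorted(values)
    index = min(int(len(ordered) * pct / 100), len(ordered) - 1)
    return ordered[index]


# Per-trial latencies copied from the Level 3 trials above.
level_3_latencies = [3524.5888233184814, 1523.0557918548584, 1614.2308712005615]

p50 = nearest_rank_percentile(level_3_latencies, 50)
p99 = nearest_rank_percentile(level_3_latencies, 99)
print(f"p50={p50:.0f}ms p99={p99:.0f}ms")
```

With only three or four trials per level, p99 under this rule is simply the slowest trial, so the reported tail latencies are coarse.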
|
||||
@@ -1,476 +0,0 @@
|
||||
{
|
||||
"model": "qwen2.5:14b",
|
||||
"timestamp": "2026-03-23T18:20:27.095190+00:00",
|
||||
"skipped": false,
|
||||
"levels": {
|
||||
"0": {
|
||||
"level": 0,
|
||||
"name": "JSON Compliance (Coin Flip)",
|
||||
"trials": [
|
||||
{
|
||||
"trial_index": 0,
|
||||
"prompt": "A coin is flipped. Respond with exactly: {\"choice\": \"heads\"} or {\"choice\": \"tails\"}. Pick one. JSON only.",
|
||||
"raw_response": "{\"choice\": \"heads\"}",
|
||||
"parsed": {
|
||||
"choice": "heads"
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"value_valid": true,
|
||||
"latency_ms": 6347.5000858306885,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"trial_index": 1,
|
||||
"prompt": "You must attack or defend. Respond with: {\"action\": \"attack\", \"confidence\": 0.8} or {\"action\": \"defend\", \"confidence\": 0.6}. Replace confidence with your own value 0.0-1.0. JSON only.",
|
||||
"raw_response": "{\"action\": \"defend\", \"confidence\": 0.6}",
|
||||
"parsed": {
|
||||
"action": "defend",
|
||||
"confidence": 0.6
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"value_valid": true,
|
||||
"latency_ms": 850.42405128479,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"trial_index": 2,
|
||||
"prompt": "Choose a direction to march. Respond with exactly: {\"direction\": \"north\", \"reason\": \"string explaining why\"}. Pick north/south/east/west. JSON only.",
|
||||
"raw_response": "{\"direction\": \"north\", \"reason\": \"There is a reported enemy encampment to the north that we need to engage.\"}",
|
||||
"parsed": {
|
||||
"direction": "north",
|
||||
"reason": "There is a reported enemy encampment to the north that we need to engage."
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"value_valid": true,
|
||||
"latency_ms": 1443.0510997772217,
|
||||
"error": ""
|
||||
}
|
||||
],
|
||||
"passed": true,
|
||||
"score": 1.0,
|
||||
"latency_p50_ms": 1443.0510997772217,
|
||||
"latency_p99_ms": 6347.5000858306885
|
||||
},
|
||||
"1": {
|
||||
"level": 1,
|
||||
"name": "Board State Tracking (Tic-Tac-Toe)",
|
||||
"trials": [
|
||||
{
|
||||
"scenario_index": 0,
|
||||
"description": "Empty board \u2014 opening move",
|
||||
"prompt": "Board state: [null,null,null,null,null,null,null,null,null]. You are X. It is your turn. Choose a move. Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
|
||||
"raw_response": "{\"move\": 4, \"reason\": \"Control the center\"}",
|
||||
"parsed": {
|
||||
"move": 4,
|
||||
"reason": "Control the center"
|
||||
},
|
||||
"valid_json": true,
|
||||
"move_legal": true,
|
||||
"move_correct": true,
|
||||
"latency_ms": 1184.3791007995605,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 1,
|
||||
"description": "Block opponent's winning move",
|
||||
"prompt": "Board: [\"O\",null,\"O\",null,\"X\",null,null,null,null]. O has positions 0 and 2. You are X. O will win on next turn unless you block. Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
|
||||
"raw_response": "{\"move\": 4, \"reason\": \"Block O's winning move\"}",
|
||||
"parsed": {
|
||||
"move": 4,
|
||||
"reason": "Block O's winning move"
|
||||
},
|
||||
"valid_json": true,
|
||||
"move_legal": false,
|
||||
"move_correct": false,
|
||||
"latency_ms": 942.816972732544,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 2,
|
||||
"description": "Take winning move",
|
||||
"prompt": "Board: [\"X\",null,\"X\",null,\"O\",null,null,\"O\",null]. You are X. You have positions 0 and 2. You can win this turn. Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
|
||||
"raw_response": "{\"move\": 2, \"reason\": \"Winning move\"}",
|
||||
"parsed": {
|
||||
"move": 2,
|
||||
"reason": "Winning move"
|
||||
},
|
||||
"valid_json": true,
|
||||
"move_legal": false,
|
||||
"move_correct": false,
|
||||
"latency_ms": 849.2491245269775,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 3,
|
||||
"description": "Legal move on partially filled board",
|
||||
"prompt": "Board: [\"X\",\"O\",\"X\",\"O\",\"X\",\"O\",null,null,null]. You are O. Choose a legal move (positions 6, 7, or 8 are available). Respond: {\"move\": <0-8>, \"reason\": \"<why>\"}",
|
||||
"raw_response": "{\"move\": 7, \"reason\": \"Creating a fork opportunity\"}",
|
||||
"parsed": {
|
||||
"move": 7,
|
||||
"reason": "Creating a fork opportunity"
|
||||
},
|
||||
"valid_json": true,
|
||||
"move_legal": true,
|
||||
"move_correct": true,
|
||||
"latency_ms": 885.8909606933594,
|
||||
"error": ""
|
||||
}
|
||||
],
|
||||
"passed": false,
|
||||
"score": 0.5,
|
||||
"latency_p50_ms": 942.816972732544,
|
||||
"latency_p99_ms": 1184.3791007995605
|
||||
},
|
||||
"2": {
|
||||
"level": 2,
|
||||
"name": "Resource Management (Party Economy)",
|
||||
"trials": [
|
||||
{
|
||||
"scenario_index": 0,
|
||||
"description": "Budget allocation under constraint",
|
||||
"raw_response": "{\"choices\": [\"buy_food\", \"repair_armor\"], \"gold_spent\": 350, \"reason\": \"Morale is critical, so buying food is the priority to prevent desertion. Repairing armor is also crucial to reduce casualties in the upcoming battles.\"}",
|
||||
"parsed": {
|
||||
"choices": [
|
||||
"buy_food",
|
||||
"repair_armor"
|
||||
],
|
||||
"gold_spent": 350,
|
||||
"reason": "Morale is critical, so buying food is the priority to prevent desertion. Repairing armor is also crucial to reduce casualties in the upcoming battles."
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": true,
|
||||
"latency_ms": 3121.7379570007324,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 1,
|
||||
"description": "Troop tier upgrade decision",
|
||||
"raw_response": "{\"action\": \"upgrade_recruits\", \"reason\": \"Improving the strength of the weakest troops can balance the party's overall combat effectiveness.\", \"expected_outcome\": \"Enhanced combat capability against the raider camp with a mix of Tier-2 and Tier-3 soldiers.\"}",
|
||||
"parsed": {
|
||||
"action": "upgrade_recruits",
|
||||
"reason": "Improving the strength of the weakest troops can balance the party's overall combat effectiveness.",
|
||||
"expected_outcome": "Enhanced combat capability against the raider camp with a mix of Tier-2 and Tier-3 soldiers."
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": true,
|
||||
"latency_ms": 2936.119794845581,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 2,
|
||||
"description": "Multi-turn planning horizon",
|
||||
"raw_response": "{\"extra_food_days\": 5, \"cost\": 50, \"remaining_gold\": 250, \"reason\": \"To ensure survival, buy enough food to cover the 15-day requirement, spending the minimum necessary now.\"}",
|
||||
"parsed": {
|
||||
"extra_food_days": 5,
|
||||
"cost": 50,
|
||||
"remaining_gold": 250,
|
||||
"reason": "To ensure survival, buy enough food to cover the 15-day requirement, spending the minimum necessary now."
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": true,
|
||||
"latency_ms": 2515.505790710449,
|
||||
"error": ""
|
||||
}
|
||||
],
|
||||
"passed": true,
|
||||
"score": 1.0,
|
||||
"latency_p50_ms": 2936.119794845581,
|
||||
"latency_p99_ms": 3121.7379570007324
|
||||
},
|
||||
"3": {
|
||||
"level": 3,
|
||||
"name": "Battle Tactics (Formation Commands)",
|
||||
"trials": [
|
||||
{
|
||||
"scenario_index": 0,
|
||||
"description": "Ranged vs infantry \u2014 defensive opening",
|
||||
"raw_response": "{\"infantry_order\": \"shield_wall\", \"archer_order\": \"skirmish\", \"reason\": \"Infantry should prepare for incoming melee by forming a defensive posture. Archers should engage enemy from a distance to maximize casualties before close combat.\", \"expected_outcome\": \"Inflict significant casualties on the enemy heavy infantry while minimizing losses until melee contact.\"}",
|
||||
"parsed": {
|
||||
"infantry_order": "shield_wall",
|
||||
"archer_order": "skirmish",
|
||||
"reason": "Infantry should prepare for incoming melee by forming a defensive posture. Archers should engage enemy from a distance to maximize casualties before close combat.",
|
||||
"expected_outcome": "Inflict significant casualties on the enemy heavy infantry while minimizing losses until melee contact."
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": true,
|
||||
"latency_ms": 3828.04012298584,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 1,
|
||||
"description": "Outnumbered \u2014 retreat decision",
|
||||
"raw_response": "{\"order\": \"skirmish\", \"direction\": \"east\", \"reason\": \"Engage the enemy at range to buy time for an organized retreat.\", \"priority\": \"preserve_troops\"}",
|
||||
"parsed": {
|
||||
"order": "skirmish",
|
||||
"direction": "east",
|
||||
"reason": "Engage the enemy at range to buy time for an organized retreat.",
|
||||
"priority": "preserve_troops"
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": false,
|
||||
"latency_ms": 2162.7819538116455,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 2,
|
||||
"description": "Cavalry flanking opportunity",
|
||||
"raw_response": "{\"cavalry_order\": \"flank_right\", \"timing\": \"now\", \"reason\": \"Enemy left flank is unprotected and vulnerable, providing an opportunity to break their line quickly.\", \"risk\": \"low\"}",
|
||||
"parsed": {
|
||||
"cavalry_order": "flank_right",
|
||||
"timing": "now",
|
||||
"reason": "Enemy left flank is unprotected and vulnerable, providing an opportunity to break their line quickly.",
|
||||
"risk": "low"
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": true,
|
||||
"latency_ms": 2248.1229305267334,
|
||||
"error": ""
|
||||
}
|
||||
],
|
||||
"passed": true,
|
||||
"score": 1.0,
|
||||
"latency_p50_ms": 2248.1229305267334,
|
||||
"latency_p99_ms": 3828.04012298584
|
||||
},
|
||||
"4": {
|
||||
"level": 4,
|
||||
"name": "Trade Route (Campaign Navigation)",
|
||||
"trials": [
|
||||
{
|
||||
"scenario_index": 0,
|
||||
"description": "Optimal trade route selection",
|
||||
"raw_response": "{\"route\": \"A\", \"expected_profit\": 500, \"days_used\": 8, \"reason\": \"Offers the highest profit with a reasonable travel time.\", \"risk\": \"medium\"}",
|
||||
"parsed": {
|
||||
"route": "A",
|
||||
"expected_profit": 500,
|
||||
"days_used": 8,
|
||||
"reason": "Offers the highest profit with a reasonable travel time.",
|
||||
"risk": "medium"
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": true,
|
||||
"latency_ms": 3142.971992492676,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 1,
|
||||
"description": "Adapt plan when war declared",
|
||||
"raw_response": "{\"decision\": \"divert_to_sturgia\", \"reason\": \"Vlandia is at war with my faction, making the journey dangerous. Sturgia is neutral and safer, though it's slightly further. The Empire is allied and safest but further still.\", \"gold_at_risk\": 400}",
|
||||
"parsed": {
|
||||
"decision": "divert_to_sturgia",
|
||||
"reason": "Vlandia is at war with my faction, making the journey dangerous. Sturgia is neutral and safer, though it's slightly further. The Empire is allied and safest but further still.",
|
||||
"gold_at_risk": 400
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": true,
|
||||
"latency_ms": 3317.8718090057373,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"scenario_index": 2,
|
||||
"description": "Multi-stop route planning with constraints",
|
||||
"raw_response": "{\"stops\": [\"Ortysia\", \"Epicrotea\", \"Rhotae\"], \"total_days\": 12, \"estimated_profit\": 1200, \"reason\": \"Optimized for time and profit, focusing on essential goods with high demand and low risk.\"}",
|
||||
"parsed": {
|
||||
"stops": [
|
||||
"Ortysia",
|
||||
"Epicrotea",
|
||||
"Rhotae"
|
||||
],
|
||||
"total_days": 12,
|
||||
"estimated_profit": 1200,
|
||||
"reason": "Optimized for time and profit, focusing on essential goods with high demand and low risk."
|
||||
},
|
||||
"valid_json": true,
|
||||
"schema_valid": true,
|
||||
"strategically_sound": true,
|
||||
"latency_ms": 3235.442876815796,
|
||||
"error": ""
|
||||
}
|
||||
],
|
||||
"passed": true,
|
||||
"score": 1.0,
|
||||
"latency_p50_ms": 3235.442876815796,
|
||||
"latency_p99_ms": 3317.8718090057373
|
||||
},
|
||||
"5": {
|
||||
"level": 5,
|
||||
"name": "Mini Campaign (Full Campaign Loop)",
|
||||
"turns": [
|
||||
{
|
||||
"turn": 1,
|
||||
"state_before": {
|
||||
"turn": 1,
|
||||
"gold": 1000,
|
||||
"troops": 25,
|
||||
"renown": 150,
|
||||
"settlements": 0,
|
||||
"relations": {
|
||||
"vlandia": 0,
|
||||
"sturgia": 0,
|
||||
"empire": -80,
|
||||
"aserai": 0,
|
||||
"battania": 0
|
||||
}
|
||||
},
|
||||
"raw_response": "{\"action\": \"trade_circuit\", \"target_faction\": null, \"reason\": \"Increase gold and renown without risking relations or encountering retaliation.\", \"goal\": \"Economic stability and reputation building\"}",
|
||||
"parsed": {
|
||||
"action": "trade_circuit",
|
||||
"target_faction": null,
|
||||
"reason": "Increase gold and renown without risking relations or encountering retaliation.",
|
||||
"goal": "Economic stability and reputation building"
|
||||
},
|
||||
"valid_json": true,
|
||||
"valid_action": true,
|
||||
"action": "trade_circuit",
|
||||
"latency_ms": 3970.310926437378,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"turn": 2,
|
||||
"state_before": {
|
||||
"turn": 2,
|
||||
"gold": 1300,
|
||||
"troops": 25,
|
||||
"renown": 160,
|
||||
"settlements": 0,
|
||||
"relations": {
|
||||
"vlandia": 0,
|
||||
"sturgia": 0,
|
||||
"empire": -80,
|
||||
"aserai": 0,
|
||||
"battania": 0
|
||||
}
|
||||
},
|
||||
"raw_response": "{\"action\": \"escort_caravan\", \"target_faction\": \"empire\", \"reason\": \"Improve relations with a major faction and gain additional renown.\", \"goal\": \"Political alliance and reputation advancement\"}",
|
||||
"parsed": {
|
||||
"action": "escort_caravan",
|
||||
"target_faction": "empire",
|
||||
"reason": "Improve relations with a major faction and gain additional renown.",
|
||||
"goal": "Political alliance and reputation advancement"
|
||||
},
|
||||
"valid_json": true,
|
||||
"valid_action": true,
|
||||
"action": "escort_caravan",
|
||||
"latency_ms": 2754.3041706085205,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"turn": 3,
|
||||
"state_before": {
|
||||
"turn": 3,
|
||||
"gold": 1450,
|
||||
"troops": 25,
|
||||
"renown": 180,
|
||||
"settlements": 0,
|
||||
"relations": {
|
||||
"vlandia": 0,
|
||||
"sturgia": 0,
|
||||
"empire": -80,
|
||||
"aserai": 0,
|
||||
"battania": 0
|
||||
}
|
||||
},
|
||||
"raw_response": "{\"action\": \"recruit_troops\", \"target_faction\": null, \"reason\": \"Increase military strength to prepare for a potential siege or defense.\", \"goal\": \"Military readiness and expansion capability\"}",
|
||||
"parsed": {
|
||||
"action": "recruit_troops",
|
||||
"target_faction": null,
|
||||
"reason": "Increase military strength to prepare for a potential siege or defense.",
|
||||
"goal": "Military readiness and expansion capability"
|
||||
},
|
||||
"valid_json": true,
|
||||
"valid_action": true,
|
||||
"action": "recruit_troops",
|
||||
"latency_ms": 2931.4141273498535,
|
||||
"error": ""
|
||||
},
|
||||
{
|
||||
"turn": 4,
|
||||
"state_before": {
|
||||
"turn": 4,
|
||||
"gold": 1250,
|
||||
"troops": 40,
|
||||
"renown": 180,
|
||||
"settlements": 0,
|
||||
"relations": {
|
||||
"vlandia": 0,
|
||||
"sturgia": 0,
|
||||
"empire": -80,
|
||||
"aserai": 0,
|
||||
"battania": 0
|
||||
}
|
||||
},
|
||||
"raw_response": "{\"action\": \"siege_castle\", \"target_faction\": \"empire\", \"reason\": \"Attempt to capture a settlement to meet the win condition of owning at least one settlement.\", \"goal\": \"Territorial expansion and victory condition\"}",
|
||||
"parsed": {
|
||||
"action": "siege_castle",
|
||||
"target_faction": "empire",
|
||||
"reason": "Attempt to capture a settlement to meet the win condition of owning at least one settlement.",
|
||||
"goal": "Territorial expansion and victory condition"
|
||||
},
|
||||
"valid_json": true,
|
||||
"valid_action": true,
|
||||
"action": "siege_castle",
|
||||
"latency_ms": 3413.828134536743,
|
||||
"error": ""
|
||||
}
|
||||
],
|
||||
"final_state": {
|
||||
"turn": 4,
|
||||
"gold": 750,
|
||||
"troops": 40,
|
||||
"renown": 260,
|
||||
"settlements": 1,
|
||||
"relations": {
|
||||
"vlandia": 0,
|
||||
"sturgia": 0,
|
||||
"empire": -80,
|
||||
"aserai": 0,
|
||||
"battania": 0
|
||||
}
|
||||
},
|
||||
"passed": true,
|
||||
"score": 1.0,
|
||||
"reached_gold_target": false,
|
||||
"reached_renown_target": false,
|
||||
"reached_settlement_target": true,
|
||||
"latency_p50_ms": 3413.828134536743,
|
||||
"latency_p99_ms": 3970.310926437378
|
||||
}
|
||||
},
|
||||
  "summary": {
    "total_elapsed_s": 52.1,
    "levels_run": [
      0,
      1,
      2,
      3,
      4,
      5
    ],
    "levels_passed": [
      0,
      2,
      3,
      4,
      5
    ],
    "levels_failed": [
      1
    ],
    "m1_gate_passed": false,
    "m1_gate_notes": [
      "Level 1 FAILED (score=0.50)"
    ],
    "m1_latency_threshold_ms": 10000
  }
}
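The `summary` block above reports `m1_gate_passed`, `m1_gate_notes`, and `m1_latency_threshold_ms`. One way such a gate could be derived from the per-level results is sketched below. It reuses the harness constants `M1_GATE_LEVELS` and `M1_LATENCY_THRESHOLD_MS`, but `evaluate_m1_gate`, the choice of `latency_p99_ms` as the latency figure, and the wording of the "not run" and latency notes are assumptions made for illustration. The harness's real gate logic is not shown in this section.

```python
M1_GATE_LEVELS = {0, 1}           # levels that must pass (harness constant)
M1_LATENCY_THRESHOLD_MS = 10_000  # per-decision latency ceiling for those levels


def evaluate_m1_gate(levels: dict[str, dict]) -> tuple[bool, list[str]]:
    """Return (gate_passed, notes) for a results["levels"] mapping keyed by level number."""
    notes = []
    for level in sorted(M1_GATE_LEVELS):
        result = levels.get(str(level))
        if result is None:
            notes.append(f"Level {level} not run")
            continue
        if not result["passed"]:
            notes.append(f"Level {level} FAILED (score={result['score']:.2f})")
        # Assumption: latency is judged on the level's p99 figure.
        if result["latency_p99_ms"] >= M1_LATENCY_THRESHOLD_MS:
            notes.append(f"Level {level} latency too high")
    return (not notes, notes)


# Level 0 and Level 1 figures copied from the qwen2.5:14b results above.
levels = {
    "0": {"passed": True, "score": 1.0, "latency_p99_ms": 6347.5000858306885},
    "1": {"passed": False, "score": 0.5, "latency_p99_ms": 1184.3791007995605},
}
gate_passed, gate_notes = evaluate_m1_gate(levels)
print(gate_passed, gate_notes)
```

For this run the sketch produces the same note as the recorded summary: Level 1 fails on score, while both gate levels stay well under the 10-second latency threshold.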
|
||||
@@ -1,259 +0,0 @@
|
||||
#!/usr/bin/env python3
"""Timmy Cognitive Benchmark Harness — Project Bannerlord M0.

Runs a 6-level cognitive benchmark against an Ollama model to assess
readiness for autonomous Bannerlord gameplay.

Usage:
    python run_benchmark.py --model qwen2.5:14b --verbose
    python run_benchmark.py --model qwen3:14b --levels 0,1,2
    python run_benchmark.py --model qwen2.5:14b --output results/my_run.json
"""

import argparse
import dataclasses
import json
import os
import sys
import time
from datetime import datetime, timezone
from pathlib import Path

try:
    import ollama
except ImportError:
    print("ERROR: 'ollama' package not installed. Run: pip install ollama", file=sys.stderr)
    sys.exit(1)

# Add parent dir to path so levels can be imported
sys.path.insert(0, str(Path(__file__).parent))

from levels import level_0_coin_flip
from levels import level_1_tic_tac_toe
from levels import level_2_resource_mgmt
from levels import level_3_battle_tactics
from levels import level_4_trade_route
from levels import level_5_mini_campaign

ALL_LEVELS = [
    level_0_coin_flip,
    level_1_tic_tac_toe,
    level_2_resource_mgmt,
    level_3_battle_tactics,
    level_4_trade_route,
    level_5_mini_campaign,
]

# Pass criteria for M1 gate
M1_GATE_LEVELS = {0, 1}  # Must pass Level 0 and Level 1
M1_LATENCY_THRESHOLD_MS = 10_000  # < 10s per decision for L0-L1


def _dataclass_to_dict(obj):
    """Recursively convert dataclass instances to dicts for JSON serialization."""
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return {k: _dataclass_to_dict(v) for k, v in dataclasses.asdict(obj).items()}
    if isinstance(obj, list):
        return [_dataclass_to_dict(i) for i in obj]
    if isinstance(obj, dict):
        return {k: _dataclass_to_dict(v) for k, v in obj.items()}
    return obj


def check_model_available(model: str) -> bool:
    """Return True if the model is available in Ollama."""
    try:
        models = ollama.list()
        model_names = [m["model"] for m in models.get("models", [])]
        # Also check without tag
        base_model = model.split(":")[0]
        return any(
            m == model or m.startswith(base_model + ":") or m == base_model
            for m in model_names
        )
    except Exception:
        return False


def run_benchmark(
    model: str,
    levels_to_run: list[int] | None = None,
    verbose: bool = False,
    skip_missing: bool = True,
) -> dict:
    """Run the benchmark and return a results dict."""
    if levels_to_run is None:
        levels_to_run = list(range(len(ALL_LEVELS)))

    print(f"\n{'=' * 60}")
    print(" Timmy Cognitive Benchmark — Project Bannerlord M0")
    print(f"{'=' * 60}")
    print(f" Model: {model}")
    print(f" Levels: {levels_to_run}")
    print(f" Time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    print(f"{'=' * 60}\n")

    if not check_model_available(model):
        if skip_missing:
            print(f" WARNING: Model '{model}' not found in Ollama. Skipping.\n")
            return {
                "model": model,
                "skipped": True,
                "reason": f"Model '{model}' not available",
                "timestamp": datetime.now(timezone.utc).isoformat(),
            }
        else:
            print(f" ERROR: Model '{model}' not found in Ollama.", file=sys.stderr)
            sys.exit(1)

    client = ollama

    results = {
        "model": model,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "skipped": False,
        "levels": {},
        "summary": {},
    }

    level_results = {}
    total_start = time.time()

    for level_idx in levels_to_run:
        # Reject negative indices too; they would otherwise wrap around the list.
        if not 0 <= level_idx < len(ALL_LEVELS):
            print(f" WARNING: Level {level_idx} does not exist, skipping.")
            continue

        module = ALL_LEVELS[level_idx]
        print(f"Level {module.LEVEL}: {module.NAME}")
        print(f" {module.DESCRIPTION}")

        try:
            level_result = module.run(client, model, verbose=verbose)
            level_results[level_idx] = level_result

            passed_str = "PASS" if level_result.passed else "FAIL"
            score_pct = f"{level_result.score * 100:.0f}%"
            lat_str = f"p50={level_result.latency_p50_ms:.0f}ms p99={level_result.latency_p99_ms:.0f}ms"
            print(f" Result: {passed_str} | Score: {score_pct} | Latency: {lat_str}")

        except Exception as exc:
            print(f" ERROR running level {level_idx}: {exc}")
            import traceback
            traceback.print_exc()

        print()

    total_elapsed_s = time.time() - total_start

    # Build summary
    m1_gate_passed = True
    m1_gate_notes = []

    for level_idx, lr in level_results.items():
        results["levels"][str(level_idx)] = _dataclass_to_dict(lr)

        if level_idx in M1_GATE_LEVELS:
            if not lr.passed:
                m1_gate_passed = False
                m1_gate_notes.append(f"Level {level_idx} FAILED (score={lr.score:.2f})")
            if lr.latency_p99_ms > M1_LATENCY_THRESHOLD_MS:
                m1_gate_passed = False
                m1_gate_notes.append(
                    f"Level {level_idx} latency too high "
                    f"(p99={lr.latency_p99_ms:.0f}ms > {M1_LATENCY_THRESHOLD_MS}ms)"
                )

    results["summary"] = {
        "total_elapsed_s": round(total_elapsed_s, 1),
        "levels_run": levels_to_run,
        "levels_passed": [i for i, lr in level_results.items() if lr.passed],
        "levels_failed": [i for i, lr in level_results.items() if not lr.passed],
        "m1_gate_passed": m1_gate_passed,
        "m1_gate_notes": m1_gate_notes,
        "m1_latency_threshold_ms": M1_LATENCY_THRESHOLD_MS,
    }

    # Print scorecard
    print(f"{'=' * 60}")
    print(f" SCORECARD — {model}")
    print(f"{'=' * 60}")

    for level_idx in levels_to_run:
        if level_idx not in level_results:
            continue
        lr = level_results[level_idx]
        module = ALL_LEVELS[level_idx]
        passed_str = "✓ PASS" if lr.passed else "✗ FAIL"
        gate_str = " [M1 GATE]" if level_idx in M1_GATE_LEVELS else ""
        lat = f"{lr.latency_p50_ms:.0f}ms"
        print(f" L{level_idx}: {passed_str}{gate_str} | {lr.score*100:.0f}% | {lat} | {module.NAME}")

    print(f"{'─' * 60}")
    gate_str = "✓ M1 GATE PASSED" if m1_gate_passed else "✗ M1 GATE FAILED"
    print(f" {gate_str}")
    if m1_gate_notes:
        for note in m1_gate_notes:
            print(f" → {note}")
    print(f" Total time: {total_elapsed_s:.1f}s")
    print(f"{'=' * 60}\n")

    return results


def main():
    parser = argparse.ArgumentParser(
        description="Timmy Cognitive Benchmark Harness — Project Bannerlord M0"
    )
    parser.add_argument("--model", required=True, help="Ollama model name (e.g. qwen2.5:14b)")
    parser.add_argument("--levels", default=None, help="Comma-separated level indices (default: all)")
    parser.add_argument("--verbose", action="store_true", help="Show per-trial details")
    parser.add_argument(
        "--output", default=None,
        help="Output JSON path (default: results/<model>_<timestamp>.json)"
    )
    # BooleanOptionalAction also registers --no-skip-missing; with
    # action="store_true" and default=True the flag could never be turned off.
    parser.add_argument(
        "--skip-missing", action=argparse.BooleanOptionalAction, default=True,
        help="Skip instead of error if model not available"
    )
    args = parser.parse_args()

    levels_to_run = None
    if args.levels:
        try:
            levels_to_run = [int(x.strip()) for x in args.levels.split(",")]
        except ValueError:
            print(f"ERROR: --levels must be comma-separated integers, got: {args.levels}", file=sys.stderr)
            sys.exit(1)

    results = run_benchmark(
        model=args.model,
        levels_to_run=levels_to_run,
        verbose=args.verbose,
        skip_missing=args.skip_missing,
    )

    # Save results
    if args.output:
        output_path = Path(args.output)
    else:
        results_dir = Path(__file__).parent / "results"
        results_dir.mkdir(exist_ok=True)
        safe_model = args.model.replace(":", "_").replace("/", "_")
        ts = datetime.now().strftime("%Y%m%d_%H%M%S")
        output_path = results_dir / f"{safe_model}_{ts}.json"

    output_path.parent.mkdir(parents=True, exist_ok=True)
    with open(output_path, "w") as f:
        json.dump(results, f, indent=2, default=str)

    print(f"Results saved to: {output_path}")

    # Exit with non-zero if M1 gate failed
    if not results.get("skipped") and not results.get("summary", {}).get("m1_gate_passed", True):
        sys.exit(1)


if __name__ == "__main__":
    main()
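The M1 gate rule used in `run_benchmark` can be exercised in isolation. The sketch below is illustrative only and not part of the harness: `LevelResult` is a hypothetical stand-in that models just the fields the gate reads, and `evaluate_m1_gate` is an assumed helper name that mirrors the summary loop above.

```python
from dataclasses import dataclass

M1_GATE_LEVELS = {0, 1}
M1_LATENCY_THRESHOLD_MS = 10_000


@dataclass
class LevelResult:
    """Hypothetical stand-in: only the fields the gate check reads."""
    passed: bool
    score: float
    latency_p99_ms: float


def evaluate_m1_gate(level_results: dict[int, LevelResult]) -> tuple[bool, list[str]]:
    """Return (gate_passed, notes) using the same two rules as run_benchmark."""
    gate_passed = True
    notes: list[str] = []
    for level_idx, lr in level_results.items():
        if level_idx not in M1_GATE_LEVELS:
            continue  # non-gate levels never affect the gate
        if not lr.passed:
            gate_passed = False
            notes.append(f"Level {level_idx} FAILED (score={lr.score:.2f})")
        if lr.latency_p99_ms > M1_LATENCY_THRESHOLD_MS:
            gate_passed = False
            notes.append(
                f"Level {level_idx} latency too high "
                f"(p99={lr.latency_p99_ms:.0f}ms > {M1_LATENCY_THRESHOLD_MS}ms)"
            )
    return gate_passed, notes


# Made-up sample values, chosen so that only the latency rule trips.
sample = {
    0: LevelResult(passed=True, score=1.0, latency_p99_ms=850.0),
    1: LevelResult(passed=True, score=0.8, latency_p99_ms=12_500.0),
    2: LevelResult(passed=False, score=0.2, latency_p99_ms=30_000.0),
}
gate_passed, notes = evaluate_m1_gate(sample)
print(gate_passed)
for note in notes:
    print(note)
```

With these sample values the gate fails on Level 1 latency alone; Level 2 fails its own check but is outside `M1_GATE_LEVELS`, so it adds no note. As in the harness, a gate level that never produced a result is simply not evaluated.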