Compare commits

...

67 Commits

Author SHA1 Message Date
Alexander Whitestone
28d1905df4 feat: add vLLM as alternative inference backend (#1281)
Some checks failed
Tests / lint (pull_request) Failing after 31s
Tests / test (pull_request) Has been skipped
Adds vLLM (high-throughput OpenAI-compatible inference server) as a
selectable backend alongside the existing Ollama and vllm-mlx backends.
vLLM's continuous batching gives 3-10x throughput for agentic workloads.

Changes:
- config.py: add `vllm` to timmy_model_backend Literal; add vllm_url /
  vllm_model settings (VLLM_URL / VLLM_MODEL env vars)
- cascade.py: add vllm provider type with _check_provider_available
  (hits /health) and _call_vllm (OpenAI-compatible completions)
- providers.yaml: add disabled-by-default vllm-local provider (priority 3,
  port 8001); bump OpenAI/Anthropic backup priorities to 4/5
- health.py: add _check_vllm/_check_vllm_sync with 30-second TTL cache;
  /health and /health/sovereignty reflect vLLM status when it is the
  active backend
- docker-compose.yml: add vllm service behind 'vllm' profile (GPU
  passthrough commented-out template included); add vllm-cache volume
- CLAUDE.md: add vLLM row to Service Fallback Matrix
- tests: 26 new unit tests covering availability checks, _call_vllm,
  providers.yaml validation, config options, and health helpers

Graceful fallback: if vLLM is unavailable the cascade router automatically
falls back to Ollama. The app never crashes.

Fixes #1281

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 21:52:52 -04:00
6c76bf2f66 [claude] Integrate health snapshot into Daily Run pre-flight (#923) (#1280)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-24 01:43:49 +00:00
0436dfd4c4 [claude] Dashboard: Agent Scorecards panel in Mission Control (#929) (#1276)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-24 01:43:21 +00:00
9eeb49a6f1 [claude] Autonomous research pipeline — orchestrator + SOVEREIGNTY.md (#972) (#1274)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-24 01:40:53 +00:00
2d6bfe6ba1 [claude] Agent Self-Correction Dashboard (#1007) (#1269)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-24 01:40:40 +00:00
ebb2cad552 [claude] feat: Session Sovereignty Report Generator (#957) v3 (#1263)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-24 01:40:24 +00:00
003e3883fb [claude] Restore self-modification loop (#983) (#1270)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-24 01:40:16 +00:00
7dfbf05867 [claude] Run 5-test benchmark suite against local model candidates (#1066) (#1271)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-24 01:38:59 +00:00
1cce28d1bb [claude] Investigate: document paths to resolution for 5 closed PRs (#1219) (#1266)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / lint (pull_request) Failing after 29s
Tests / test (pull_request) Has been skipped
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-24 01:36:06 +00:00
4c6b69885d [claude] feat: Agent Energy Budget Monitoring (#1009) (#1267)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-24 01:35:50 +00:00
6b2e6d9e8c [claude] feat: Agent Energy Budget Monitoring (#1009) (#1267)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-24 01:35:49 +00:00
2b238d1d23 [loop-cycle-1] fix: ruff format error on test_autoresearch.py (#1256) (#1257)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-24 01:27:38 +00:00
b7ad5bf1d9 fix: remove unused variable in test_loop_guard_seed (ruff F841) (#1255)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / lint (pull_request) Failing after 33s
Tests / test (pull_request) Has been skipped
2026-03-24 01:20:42 +00:00
2240ddb632 [loop-cycle] fix: three-strike route test isolation for xdist (#1254)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 23:49:00 +00:00
35d2547a0b [claude] Fix cycle-metrics pipeline: seed issue= from queue so retro is never null (#1250) (#1253)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 23:42:23 +00:00
f62220eb61 [claude] Autoresearch H1: Apple Silicon support + M3 Max baseline doc (#905) (#1252)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 23:38:38 +00:00
72992b7cc5 [claude] Fix ImportError: memory_write missing from memory_system (#1249) (#1251)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 23:37:21 +00:00
b5fb6a85cf [claude] Fix pre-existing ruff lint errors blocking git hooks (#1247) (#1248)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 23:33:37 +00:00
fedd164686 [claude] Fix 10 vassal tests flaky under xdist parallel execution (#1243) (#1245)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 23:29:25 +00:00
261b7be468 [kimi] Refactor autoresearch.py -> SystemExperiment class (#906) (#1244)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-23 23:28:54 +00:00
6691f4d1f3 [claude] Add timmy learn autoresearch entry point (#907) (#1240)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / lint (pull_request) Failing after 16s
Tests / test (pull_request) Has been skipped
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-23 23:14:09 +00:00
ea76af068a [kimi] Add unit tests for paperclip.py (#1236) (#1241)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 23:13:54 +00:00
b61fcd3495 [claude] Add unit tests for research_tools.py (#1237) (#1239)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 23:06:06 +00:00
1e1689f931 [claude] Qwen3 two-model routing via task complexity classifier (#1065) v2 (#1233)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-23 22:58:21 +00:00
acc0df00cf [claude] Three-Strike Detector (#962) v2 (#1232)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-23 22:50:59 +00:00
a0c35202f3 [claude] ADR-024: canonical Nostr identity in timmy-nostr (#1223) (#1230)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 22:47:25 +00:00
fe1d576c3c [claude] Gitea activity & branch audit across all repos (#1210) (#1228)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 22:46:16 +00:00
3e65271af6 [claude] Rescue unmerged work: open PRs for 3 abandoned branches (#1218) (#1229)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 22:46:10 +00:00
697575e561 [gemini] Implement semantic index for research outputs (#976) (#1227)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 22:45:29 +00:00
e6391c599d [claude] Enforce one-agent-per-issue via labels, document auto-delete branches (#1220) (#1222)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 22:44:50 +00:00
d697c3d93e [claude] refactor: break up monolithic tools.py into a tools/ package (#1215) (#1221)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 22:43:09 +00:00
31c260cc95 [claude] Add unit tests for vassal/orchestration_loop.py (#1214) (#1216)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 22:42:22 +00:00
3217c32356 [claude] feat: Nexus — persistent conversational awareness space with live memory (#1208) (#1211)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 22:34:48 +00:00
25157a71a8 [loop-cycle] fix: remove unused imports and fix formatting (lint) (#1209)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 22:30:03 +00:00
46edac3e76 [loop-cycle] fix: test_config hardcoded ollama model vs .env override (#1207)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 22:22:40 +00:00
a5b95356dd [claude] Add offline message queue for Workshop panel (#913) (#1205)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-23 22:16:27 +00:00
b197cf409e [loop-cycle-3] fix: isolate unit tests from local .env and real Gitea API (#1206)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 22:15:37 +00:00
3ed2bbab02 [loop-cycle] refactor: break up git.py::run() into helpers (#538) (#1204)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 22:07:28 +00:00
3d40523947 [claude] Add unit tests for agent_health.py (#1195) (#1203)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 22:02:44 +00:00
f86e2e103d [claude] Add unit tests for vassal/dispatch.py (#1193) (#1200)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 22:00:07 +00:00
7d20d18af1 [claude] test: improve event bus unit test coverage to 99% (#1191) (#1201)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 21:59:59 +00:00
7afb72209a [claude] Add unit tests for chat_store.py (#1192) (#1198)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 21:58:38 +00:00
b12fa8aa07 [claude] Add unit tests for daily_run.py (#1186) (#1199)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 21:58:33 +00:00
9121689a41 [claude] refactor: break up produce_system_status() (#1194) (#1196)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 21:55:50 +00:00
8f8061e224 [claude] refactor: break up cascade.py complete() (#1185) (#1190)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 21:52:27 +00:00
c78922ccbc [kimi] Refactor cli.py::daily_run() — 105 lines → 33 lines (#1168) (#1189)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 21:51:47 +00:00
f3093e9dea [claude] refactor: break up dispatch_issue() into helpers (#1187) (#1188)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 21:49:45 +00:00
b735b553e6 [kimi] Break up dispatch_task() into helper functions (#1137) (#1184)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 21:46:02 +00:00
c5b49d6cff [claude] Grant kimi write permission for PR creation (#1181) (#1182)
Some checks failed
Tests / test (push) Has been cancelled
Tests / lint (push) Has been cancelled
2026-03-23 21:40:46 +00:00
7aa48b4e22 [kimi] Break up _dispatch_via_gitea() into helper functions (#1136) (#1183)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 21:40:17 +00:00
74bf0606a9 [claude] Fix GITEA_API default to VPS address (#1177) (#1178)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-23 20:59:54 +00:00
d796fe7c53 [claude] Refactor thinking.py::_maybe_file_issues() into focused helpers (#1170) (#1173)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-23 20:47:06 +00:00
ff921da547 [claude] Refactor timmyctl inbox() into helper functions (#1169) (#1174)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-23 20:47:00 +00:00
2fcd92e5d9 [claude] Add unit tests for src/config.py (#1172) (#1175)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-23 20:46:53 +00:00
61377e3a1e [gemini] Docs: Acknowledge The Sovereignty Loop governing architecture (#953) (#1167)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
Co-authored-by: Google Gemini <gemini@hermes.local>
Co-committed-by: Google Gemini <gemini@hermes.local>
2026-03-23 20:14:27 +00:00
de289878d6 [loop-cycle] refactor: add docstrings to 20 undocumented classes (#1130) (#1166)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 20:08:06 +00:00
0d73a4ff7a [claude] Fix ruff S105/S106/B017/E402 errors in bannerlord (#1161) (#1165)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-23 19:56:07 +00:00
dec9736679 [claude] Sovereignty metrics emitter + SQLite store (#954) (#1164)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-23 19:52:20 +00:00
08d337e03d [claude] Implement three-tier metabolic LLM router (#966) (#1160)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-23 19:45:56 +00:00
Alexander Whitestone
9e08e87312 [claude] Bannerlord M0: Run cognitive benchmark on hermes3, fix L1 string-int coercion (#1092) (#1159)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
Co-authored-by: Alexander Whitestone <alexpaynex@gmail.com>
Co-committed-by: Alexander Whitestone <alexpaynex@gmail.com>
2026-03-23 19:38:48 +00:00
6e65b53f3a [loop-cycle-5] feat: implement 4 TODO stubs in timmyctl/cli.py (#1128) (#1158)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 19:34:46 +00:00
2b9a55fa6d [claude] Bannerlord M5: sovereign victory stack (src/bannerlord/) (#1097) (#1155)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-23 19:26:05 +00:00
495c1ac2bd [claude] Fix 27 ruff lint errors blocking all pushes (#1149) (#1153)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-23 19:06:11 +00:00
da29631c43 [gemini] feat: add Sovereignty Loop architecture document (#953) (#1154)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
Co-authored-by: Google Gemini <gemini@hermes.local>
Co-committed-by: Google Gemini <gemini@hermes.local>
2026-03-23 19:00:45 +00:00
382dd041d9 [kimi] Refactor scorecards.py — break up oversized functions (#1127) (#1152)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-23 18:59:05 +00:00
8421537a55 [claude] Mark setup script tests as skip_ci (#931) (#1151)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 18:49:58 +00:00
0e5948632d [claude] Add unit tests for cascade.py (#1138) (#1150)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-23 18:47:28 +00:00
167 changed files with 26196 additions and 1308 deletions

View File

@@ -34,6 +34,44 @@ Read [`CLAUDE.md`](CLAUDE.md) for architecture patterns and conventions.
---
## One-Agent-Per-Issue Convention
**An issue must only be worked by one agent at a time.** Duplicate branches from
multiple agents on the same issue cause merge conflicts, redundant code, and wasted compute.
### Labels
When an agent picks up an issue, add the corresponding label:
| Label | Meaning |
|-------|---------|
| `assigned-claude` | Claude is actively working this issue |
| `assigned-gemini` | Gemini is actively working this issue |
| `assigned-kimi` | Kimi is actively working this issue |
| `assigned-manus` | Manus is actively working this issue |
### Rules
1. **Before starting an issue**, check that none of the `assigned-*` labels are present.
If one is, skip the issue — another agent owns it.
2. **When you start**, add the label matching your agent (e.g. `assigned-claude`).
3. **When your PR is merged or closed**, remove the label (or it auto-clears when
the branch is deleted — see Auto-Delete below).
4. **Never assign the same issue to two agents simultaneously.**
### Auto-Delete Merged Branches
`default_delete_branch_after_merge` is **enabled** on this repo. Branches are
automatically deleted after a PR merges — no manual cleanup needed and no stale
`claude/*`, `gemini/*`, or `kimi/*` branches accumulate.
If you discover stale merged branches, they can be pruned with:
```bash
git fetch --prune
```
---
## Merge Policy (PR-Only)
**Gitea branch protection is active on `main`.** This is not a suggestion.
@@ -131,6 +169,28 @@ self-testing, reflection — use every tool he has.
## Agent Roster
### Gitea Permissions
All agents that push branches and create PRs require **write** permission on the
repository. Set via the Gitea admin API or UI under Repository → Settings → Collaborators.
| Agent user | Required permission | Gitea login |
|------------|--------------------|----|
| kimi | write | `kimi` |
| claude | write | `claude` |
| gemini | write | `gemini` |
| antigravity | write | `antigravity` |
| hermes | write | `hermes` |
| manus | write | `manus` |
To grant write access (requires Gitea admin or repo admin token):
```bash
curl -s -X PUT "http://143.198.27.163:3000/api/v1/repos/rockachopa/Timmy-time-dashboard/collaborators/<username>" \
-H "Authorization: token <admin-token>" \
-H "Content-Type: application/json" \
-d '{"permission": "write"}'
```
### Build Tier
**Local (Ollama)** — Primary workhorse. Free. Unrestricted.

View File

@@ -150,6 +150,7 @@ async def transcribe_audio(audio: bytes) -> str:
| Service | When Unavailable | Fallback Behavior |
|---------|------------------|-------------------|
| Ollama | No local LLM | Claude backend (if ANTHROPIC_API_KEY set) |
| vLLM | Server not running | Ollama backend (cascade router fallback) |
| Redis | Cache/storage down | In-memory dict (ephemeral) |
| AirLLM | Import error or no Apple Silicon | Ollama backend |
| Voice (Piper) | Service down | Browser Web Speech API |

122
SOVEREIGNTY.md Normal file
View File

@@ -0,0 +1,122 @@
# SOVEREIGNTY.md — Research Sovereignty Manifest
> "If this spec is implemented correctly, it is the last research document
> Alexander should need to request from a corporate AI."
> — Issue #972, March 22 2026
---
## What This Is
A machine-readable declaration of Timmy's research independence:
where we are, where we're going, and how to measure progress.
---
## The Problem We're Solving
On March 22, 2026, a single Claude session produced six deep research reports.
It consumed ~3 hours of human time and substantial corporate AI inference.
Every report was valuable — but the workflow was **linear**.
It would cost exactly the same to reproduce tomorrow.
This file tracks the pipeline that crystallizes that workflow into something
Timmy can run autonomously.
---
## The Six-Step Pipeline
| Step | What Happens | Status |
|------|-------------|--------|
| 1. Scope | Human describes knowledge gap → Gitea issue with template | ✅ Done (`skills/research/`) |
| 2. Query | LLM slot-fills template → 515 targeted queries | ✅ Done (`research.py`) |
| 3. Search | Execute queries → top result URLs | ✅ Done (`research_tools.py`) |
| 4. Fetch | Download + extract full pages (trafilatura) | ✅ Done (`tools/system_tools.py`) |
| 5. Synthesize | Compress findings → structured report | ✅ Done (`research.py` cascade) |
| 6. Deliver | Store to semantic memory + optional disk persist | ✅ Done (`research.py`) |
---
## Cascade Tiers (Synthesis Quality vs. Cost)
| Tier | Model | Cost | Quality | Status |
|------|-------|------|---------|--------|
| **4** | SQLite semantic cache | $0.00 / instant | reuses prior | ✅ Active |
| **3** | Ollama `qwen3:14b` | $0.00 / local | ★★★ | ✅ Active |
| **2** | Claude API (haiku) | ~$0.01/report | ★★★★ | ✅ Active (opt-in) |
| **1** | Groq `llama-3.3-70b` | $0.00 / rate-limited | ★★★★ | 🔲 Planned (#980) |
Set `ANTHROPIC_API_KEY` to enable Tier 2 fallback.
---
## Research Templates
Six prompt templates live in `skills/research/`:
| Template | Use Case |
|----------|----------|
| `tool_evaluation.md` | Find all shipping tools for `{domain}` |
| `architecture_spike.md` | How to connect `{system_a}` to `{system_b}` |
| `game_analysis.md` | Evaluate `{game}` for AI agent play |
| `integration_guide.md` | Wire `{tool}` into `{stack}` with code |
| `state_of_art.md` | What exists in `{field}` as of `{date}` |
| `competitive_scan.md` | How does `{project}` compare to `{alternatives}` |
---
## Sovereignty Metrics
| Metric | Target (Week 1) | Target (Month 1) | Target (Month 3) | Graduation |
|--------|-----------------|------------------|------------------|------------|
| Queries answered locally | 10% | 40% | 80% | >90% |
| API cost per report | <$1.50 | <$0.50 | <$0.10 | <$0.01 |
| Time from question to report | <3 hours | <30 min | <5 min | <1 min |
| Human involvement | 100% (review) | Review only | Approve only | None |
---
## How to Use the Pipeline
```python
from timmy.research import run_research
# Quick research (no template)
result = await run_research("best local embedding models for 36GB RAM")
# With a template and slot values
result = await run_research(
topic="PDF text extraction libraries for Python",
template="tool_evaluation",
slots={"domain": "PDF parsing", "use_case": "RAG pipeline", "focus_criteria": "accuracy"},
save_to_disk=True,
)
print(result.report)
print(f"Backend: {result.synthesis_backend}, Cached: {result.cached}")
```
---
## Implementation Status
| Component | Issue | Status |
|-----------|-------|--------|
| `web_fetch` tool (trafilatura) | #973 | ✅ Done |
| Research template library (6 templates) | #974 | ✅ Done |
| `ResearchOrchestrator` (`research.py`) | #975 | ✅ Done |
| Semantic index for outputs | #976 | 🔲 Planned |
| Auto-create Gitea issues from findings | #977 | 🔲 Planned |
| Paperclip task runner integration | #978 | 🔲 Planned |
| Kimi delegation via labels | #979 | 🔲 Planned |
| Groq free-tier cascade tier | #980 | 🔲 Planned |
| Sovereignty metrics dashboard | #981 | 🔲 Planned |
---
## Governing Spec
See [issue #972](http://143.198.27.163:3000/Rockachopa/Timmy-time-dashboard/issues/972) for the full spec and rationale.
Research artifacts committed to `docs/research/`.

View File

@@ -25,6 +25,19 @@ providers:
tier: local
url: "http://localhost:11434"
models:
# ── Dual-model routing: Qwen3-8B (fast) + Qwen3-14B (quality) ──────────
# Both models fit simultaneously: ~6.6 GB + ~10.5 GB = ~17 GB combined.
# Requires OLLAMA_MAX_LOADED_MODELS=2 (set in .env) to stay hot.
# Ref: issue #1065 — Qwen3-8B/14B dual-model routing strategy
- name: qwen3:8b
context_window: 32768
capabilities: [text, tools, json, streaming, routine]
description: "Qwen3-8B Q6_K — fast router for routine tasks (~6.6 GB, 45-55 tok/s)"
- name: qwen3:14b
context_window: 40960
capabilities: [text, tools, json, streaming, complex, reasoning]
description: "Qwen3-14B Q5_K_M — complex reasoning and planning (~10.5 GB, 20-28 tok/s)"
# Text + Tools models
- name: qwen3:30b
default: true
@@ -118,11 +131,34 @@ providers:
context_window: 32000
capabilities: [text, tools, json, streaming]
# Tertiary: OpenAI (if API key available)
# Tertiary: vLLM (OpenAI-compatible, continuous batching, 3-10x agentic throughput)
# Runs on CUDA GPU or CPU. On Apple Silicon, prefer vllm-mlx-local (above).
# To enable: start vLLM server:
# python -m vllm.entrypoints.openai.api_server \
# --model Qwen/Qwen2.5-14B-Instruct --port 8001
# Then set enabled: true (or TIMMY_LLM_BACKEND=vllm + VLLM_URL=http://localhost:8001)
- name: vllm-local
type: vllm
enabled: false # Enable when vLLM server is running
priority: 3
tier: local
base_url: "http://localhost:8001/v1"
models:
- name: Qwen/Qwen2.5-14B-Instruct
default: true
context_window: 32000
capabilities: [text, tools, json, streaming, complex]
description: "Qwen2.5-14B on vLLM — continuous batching for agentic workloads"
- name: Qwen/Qwen2.5-7B-Instruct
context_window: 32000
capabilities: [text, tools, json, streaming, routine]
description: "Qwen2.5-7B on vLLM — fast model for routine tasks"
# Quinary: OpenAI (if API key available)
- name: openai-backup
type: openai
enabled: false # Enable by setting OPENAI_API_KEY
priority: 3
priority: 4
tier: standard_cloud
api_key: "${OPENAI_API_KEY}" # Loaded from environment
base_url: null # Use default OpenAI endpoint
@@ -134,12 +170,12 @@ providers:
- name: gpt-4o
context_window: 128000
capabilities: [text, vision, tools, json, streaming]
# Quaternary: Anthropic (if API key available)
# Senary: Anthropic (if API key available)
- name: anthropic-backup
type: anthropic
enabled: false # Enable by setting ANTHROPIC_API_KEY
priority: 4
priority: 5
tier: frontier
api_key: "${ANTHROPIC_API_KEY}"
models:
@@ -187,6 +223,20 @@ fallback_chains:
- dolphin3 # base Dolphin 3.0 8B (uncensored, no custom system prompt)
- qwen3:30b # primary fallback — usually sufficient with a good system prompt
# ── Complexity-based routing chains (issue #1065) ───────────────────────
# Routine tasks: prefer Qwen3-8B for low latency (~45-55 tok/s)
routine:
- qwen3:8b # Primary fast model
- llama3.1:8b-instruct # Fallback fast model
- llama3.2:3b # Smallest available
# Complex tasks: prefer Qwen3-14B for quality (~20-28 tok/s)
complex:
- qwen3:14b # Primary quality model
- hermes4-14b # Native tool calling, hybrid reasoning
- qwen3:30b # Highest local quality
- qwen2.5:14b # Additional fallback
# ── Custom Models ───────────────────────────────────────────────────────────
# Register custom model weights for per-agent assignment.
# Supports GGUF (Ollama), safetensors, and HuggingFace checkpoint dirs.

View File

@@ -42,6 +42,10 @@ services:
GROK_ENABLED: "${GROK_ENABLED:-false}"
XAI_API_KEY: "${XAI_API_KEY:-}"
GROK_DEFAULT_MODEL: "${GROK_DEFAULT_MODEL:-grok-3-fast}"
# vLLM backend — set TIMMY_LLM_BACKEND=vllm to activate
TIMMY_LLM_BACKEND: "${TIMMY_LLM_BACKEND:-ollama}"
VLLM_URL: "${VLLM_URL:-http://localhost:8001}"
VLLM_MODEL: "${VLLM_MODEL:-Qwen/Qwen2.5-14B-Instruct}"
extra_hosts:
- "host.docker.internal:host-gateway" # Linux: maps to host IP
networks:
@@ -74,6 +78,49 @@ services:
profiles:
- celery
# ── vLLM — high-throughput inference server (GPU optional) ──────────────
# Requires the 'vllm' profile: docker compose --profile vllm up
#
# GPU (NVIDIA): set VLLM_MODEL and ensure nvidia-container-toolkit is installed.
# CPU-only: add --device cpu to VLLM_EXTRA_ARGS (slower, but works anywhere).
#
# The dashboard reaches vLLM at http://vllm:8001 (inside timmy-net).
# Set VLLM_URL=http://vllm:8001 in the dashboard environment when using this service.
vllm:
image: vllm/vllm-openai:latest
container_name: timmy-vllm
profiles:
- vllm
ports:
- "8001:8001"
environment:
# Model to load — override with VLLM_MODEL env var
VLLM_MODEL: "${VLLM_MODEL:-Qwen/Qwen2.5-7B-Instruct}"
command: >
--model ${VLLM_MODEL:-Qwen/Qwen2.5-7B-Instruct}
--port 8001
--host 0.0.0.0
${VLLM_EXTRA_ARGS:-}
volumes:
- vllm-cache:/root/.cache/huggingface
networks:
- timmy-net
restart: unless-stopped
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8001/health"]
interval: 30s
timeout: 10s
retries: 5
start_period: 120s
# GPU support — uncomment to enable NVIDIA GPU passthrough
# deploy:
# resources:
# reservations:
# devices:
# - driver: nvidia
# count: all
# capabilities: [gpu]
# ── OpenFang — vendored agent runtime sidecar ────────────────────────────
openfang:
build:
@@ -110,6 +157,8 @@ volumes:
device: "${PWD}/data"
openfang-data:
driver: local
vllm-cache:
driver: local
# ── Internal network ────────────────────────────────────────────────────────
networks:

View File

@@ -0,0 +1,244 @@
# Gitea Activity & Branch Audit — 2026-03-23
**Requested by:** Issue #1210
**Audited by:** Claude (Sonnet 4.6)
**Date:** 2026-03-23
**Scope:** All repos under the sovereign AI stack
---
## Executive Summary
- **18 repos audited** across 9 Gitea organizations/users
- **~6570 branches identified** as safe to delete (merged or abandoned)
- **4 open PRs** are bottlenecks awaiting review
- **3+ instances of duplicate work** across repos and agents
- **5+ branches** contain valuable unmerged code with no open PR
- **5 PRs closed without merge** on active p0-critical issues in Timmy-time-dashboard
Improvement tickets have been filed on each affected repo following this report.
---
## Repo-by-Repo Findings
---
### 1. rockachopa/Timmy-time-dashboard
**Status:** Most active repo. 1,200+ PRs, 50+ branches.
#### Dead/Abandoned Branches
| Branch | Last Commit | Status |
|--------|-------------|--------|
| `feature/voice-customization` | 2026-03-22 | Gemini-created, no PR, abandoned |
| `feature/enhanced-memory-ui` | 2026-03-22 | Gemini-created, no PR, abandoned |
| `feature/soul-customization` | 2026-03-22 | Gemini-created, no PR, abandoned |
| `feature/dreaming-mode` | 2026-03-22 | Gemini-created, no PR, abandoned |
| `feature/memory-visualization` | 2026-03-22 | Gemini-created, no PR, abandoned |
| `feature/voice-customization-ui` | 2026-03-22 | Gemini-created, no PR, abandoned |
| `feature/issue-1015` | 2026-03-22 | Gemini-created, no PR, abandoned |
| `feature/issue-1016` | 2026-03-22 | Gemini-created, no PR, abandoned |
| `feature/issue-1017` | 2026-03-22 | Gemini-created, no PR, abandoned |
| `feature/issue-1018` | 2026-03-22 | Gemini-created, no PR, abandoned |
| `feature/issue-1019` | 2026-03-22 | Gemini-created, no PR, abandoned |
| `feature/self-reflection` | 2026-03-22 | Only merge-from-main commits, no unique work |
| `feature/memory-search-ui` | 2026-03-22 | Only merge-from-main commits, no unique work |
| `claude/issue-962` | 2026-03-22 | Automated salvage commit only |
| `claude/issue-972` | 2026-03-22 | Automated salvage commit only |
| `gemini/issue-1006` | 2026-03-22 | Incomplete agent session |
| `gemini/issue-1008` | 2026-03-22 | Incomplete agent session |
| `gemini/issue-1010` | 2026-03-22 | Incomplete agent session |
| `gemini/issue-1134` | 2026-03-22 | Incomplete agent session |
| `gemini/issue-1139` | 2026-03-22 | Incomplete agent session |
#### Duplicate Branches (Identical SHA)
| Branch A | Branch B | Action |
|----------|----------|--------|
| `feature/internal-monologue` | `feature/issue-1005` | Exact duplicate — delete one |
| `claude/issue-1005` | (above) | Merge-from-main only — delete |
#### Unmerged Work With No Open PR (HIGH PRIORITY)
| Branch | Content | Issues |
|--------|---------|--------|
| `claude/issue-987` | Content moderation pipeline, Llama Guard integration | No open PR — potentially lost |
| `claude/issue-1011` | Automated skill discovery system | No open PR — potentially lost |
| `gemini/issue-976` | Semantic index for research outputs | No open PR — potentially lost |
#### PRs Closed Without Merge (Issues Still Open)
| PR | Title | Issue Status |
|----|-------|-------------|
| PR#1163 | Three-Strike Detector (#962) | p0-critical, still open |
| PR#1162 | Session Sovereignty Report Generator (#957) | p0-critical, still open |
| PR#1157 | Qwen3 routing | open |
| PR#1156 | Agent Dreaming Mode | open |
| PR#1145 | Qwen3-14B config | open |
#### Workflow Observations
- `loop-cycle` bot auto-creates micro-fix PRs at high frequency (PR numbers climbing past 1209 rapidly)
- Many `gemini/*` branches represent incomplete agent sessions, not full feature work
- Issues get reassigned across agents causing duplicate branch proliferation
---
### 2. rockachopa/hermes-agent
**Status:** Active — AutoLoRA training pipeline in progress.
#### Open PRs Awaiting Review
| PR | Title | Age |
|----|-------|-----|
| PR#33 | AutoLoRA v1 MLX QLoRA training pipeline | ~1 week |
#### Valuable Unmerged Branches (No PR)
| Branch | Content | Age |
|--------|---------|-----|
| `sovereign` | Full fallback chain: Groq/Kimi/Ollama cascade recovery | 9 days |
| `fix/vision-api-key-fallback` | Vision API key fallback fix | 9 days |
#### Stale Merged Branches (~12)
12 merged `claude/*` and `gemini/*` branches are safe to delete.
---
### 3. rockachopa/the-matrix
**Status:** 8 open PRs from `claude/the-matrix` fork all awaiting review, all batch-created on 2026-03-23.
#### Open PRs (ALL Awaiting Review)
| PR | Feature |
|----|---------|
| PR#916 | Touch controls, agent feed, particles, audio, day/night cycle, metrics panel, ASCII logo, click-to-view-PR |
These were created in a single agent session within 5 minutes — needs human review before merge.
---
### 4. replit/timmy-tower
**Status:** Very active — 100+ PRs, complex feature roadmap.
#### Open PRs Awaiting Review
| PR | Title | Age |
|----|-------|-----|
| PR#93 | Task decomposition view | Recent |
| PR#80 | `session_messages` table | 22 hours |
#### Unmerged Work With No Open PR
| Branch | Content |
|--------|---------|
| `gemini/issue-14` | NIP-07 Nostr identity |
| `gemini/issue-42` | Timmy animated eyes |
| `claude/issue-11` | Kimi + Perplexity agent integrations |
| `claude/issue-13` | Nostr event publishing |
| `claude/issue-29` | Mobile Nostr identity |
| `claude/issue-45` | Test kit |
| `claude/issue-47` | SQL migration helpers |
| `claude/issue-67` | Session Mode UI |
#### Cleanup
~30 merged `claude/*` and `gemini/*` branches are safe to delete.
---
### 5. replit/token-gated-economy
**Status:** Active roadmap, no current open PRs.
#### Stale Branches (~23)
- 8 Replit Agent branches from 2026-03-19 (PRs closed/merged)
- 15 merged `claude/issue-*` branches
All are safe to delete.
---
### 6. hermes/timmy-time-app
**Status:** 2-commit repo, created 2026-03-14, no activity since. **Candidate for archival.**
Functionality appears to be superseded by other repos in the stack. Recommend archiving or deleting if not planned for future development.
---
### 7. google/maintenance-tasks & google/wizard-council-automation
**Status:** Single-commit repos from 2026-03-19 created by "Google AI Studio". No follow-up activity.
Unclear ownership and purpose. Recommend clarifying with rockachopa whether these are active or can be archived.
---
### 8. hermes/hermes-config
**Status:** Single branch, updated 2026-03-23 (today). Active — contains Timmy orchestrator config.
No action needed.
---
### 9. Timmy_Foundation/the-nexus
**Status:** Greenfield — created 2026-03-23. 19 issues filed as roadmap. PR#2 (contributor audit) open.
No cleanup needed yet. PR#2 needs review.
---
### 10. rockachopa/alexanderwhitestone.com
**Status:** All recent `claude/*` PRs merged. 7 non-main branches are post-merge and safe to delete.
---
### 11. hermes/hermes-config, rockachopa/hermes-config, Timmy_Foundation/.profile
**Status:** Dormant config repos. No action needed.
---
## Cross-Repo Patterns & Inefficiencies
### Duplicate Work
1. **Timmy spring/wobble physics** built independently in both `replit/timmy-tower` and `replit/token-gated-economy`
2. **Nostr identity logic** fragmented across 3 repos with no shared library
3. **`feature/internal-monologue` = `feature/issue-1005`** in Timmy-time-dashboard — identical SHA, exact duplicate
### Agent Workflow Issues
- Same issue assigned to both `gemini/*` and `claude/*` agents creates duplicate branches
- Agent salvage commits are checkpoint-only — not complete work, but clutter the branch list
- Gemini `feature/*` branches created on 2026-03-22 with no PRs filed — likely a failed agent session that created branches but didn't complete the loop
### Review Bottlenecks
| Repo | Waiting PRs | Notes |
|------|-------------|-------|
| rockachopa/the-matrix | 8 | Batch-created, need human review |
| replit/timmy-tower | 2 | Database schema and UI work |
| rockachopa/hermes-agent | 1 | AutoLoRA v1 — high value |
| Timmy_Foundation/the-nexus | 1 | Contributor audit |
---
## Recommended Actions
### Immediate (This Sprint)
1. **Review & merge** PR#33 in `hermes-agent` (AutoLoRA v1)
2. **Review** 8 open PRs in `the-matrix` before merging as a batch
3. **Rescue** unmerged work in `claude/issue-987`, `claude/issue-1011`, `gemini/issue-976` — file new PRs or close branches
4. **Delete duplicate** `feature/internal-monologue` / `feature/issue-1005` branches
### Cleanup Sprint
5. **Delete ~65 stale branches** across all repos (itemized above)
6. **Investigate** the 5 closed-without-merge PRs in Timmy-time-dashboard for p0-critical issues
7. **Archive** `hermes/timmy-time-app` if no longer needed
8. **Clarify** ownership of `google/maintenance-tasks` and `google/wizard-council-automation`
### Process Improvements
9. **Enforce one-agent-per-issue** policy to prevent duplicate `claude/*` / `gemini/*` branches
10. **Add branch protection** requiring PR before merge on `main` for all repos
11. **Set a branch retention policy** — auto-delete merged branches (GitHub/Gitea supports this)
12. **Share common libraries** for Nostr identity and animation physics across repos
---
*Report generated by Claude audit agent. Improvement tickets filed per repo as follow-up to this report.*

111
docs/SOVEREIGNTY_LOOP.md Normal file
View File

@@ -0,0 +1,111 @@
# The Sovereignty Loop
This document establishes the primary engineering constraint for all Timmy Time development: every task must increase sovereignty as a default deliverable. Not as a future goal. Not as an optimization pass. As a constraint on every commit, every function, every inference call.
The full 11-page governing architecture document is available as a PDF: [The-Sovereignty-Loop.pdf](./The-Sovereignty-Loop.pdf)
> "The measure of progress is not features added. It is model calls eliminated."
## The Core Principle
> **The Sovereignty Loop**: Discover with an expensive model. Compress the discovery into a cheap local rule. Replace the model with the rule. Measure the cost reduction. Repeat.
Every call to an LLM, VLM, or external API passes through three phases:
1. **Discovery** — Model sees something for the first time (expensive, unavoidable, produces new knowledge)
2. **Crystallization** — Discovery compressed into durable cheap artifact (requires explicit engineering)
3. **Replacement** — Crystallized artifact replaces the model call (near-zero cost)
**Code review requirement**: If a function calls a model without a crystallization step, it fails code review. No exceptions. The pattern is always: check cache → miss → infer → crystallize → return.
## The Sovereignty Loop Applied to Every Layer
### Perception: See Once, Template Forever
- First encounter: VLM analyzes screenshot (3-6 sec) → structured JSON
- Crystallized as: OpenCV template + bounding box → `templates.json` (3 ms retrieval)
- `crystallize_perception()` function wraps every VLM response
- **Target**: 90% of perception cycles without VLM by hour 1, 99% by hour 4
### Decision: Reason Once, Rule Forever
- First encounter: LLM reasons through decision (1-5 sec)
- Crystallized as: if/else rules, waypoints, cached preferences → `rules.py`, `nav_graph.db` (<1 ms)
- Uses Voyager pattern: named skills with embeddings, success rates, conditions
- Skill match >0.8 confidence + >0.6 success rate → executes without LLM
- **Target**: 70-80% of decisions without LLM by week 4
### Narration: Script the Predictable, Improvise the Novel
- Predictable moments → template with variable slots, voiced by Kokoro locally
- LLM narrates only genuinely surprising events (quest twist, death, discovery)
- **Target**: 60-70% templatized within a week
### Navigation: Walk Once, Map Forever
- Every path recorded as waypoint sequence with terrain annotations
- First journey = full perception + planning; subsequent = graph traversal
- Builds complete nav graph without external map data
### API Costs: Every Dollar Spent Must Reduce Future Dollars
| Week | Groq Calls/Hr | Local Decisions/Hr | Sovereignty % | Cost/Hr |
|---|---|---|---|---|
| 1 | ~720 | ~80 | 10% | $0.40 |
| 2 | ~400 | ~400 | 50% | $0.22 |
| 4 | ~160 | ~640 | 80% | $0.09 |
| 8 | ~40 | ~760 | 95% | $0.02 |
| Target | <20 | >780 | >97% | <$0.01 |
## The Sovereignty Scorecard (5 Metrics)
Every work session ends with a sovereignty audit. Every PR includes a sovereignty delta. Not optional.
| Metric | What It Measures | Target |
|---|---|---|
| Perception Sovereignty % | Frames understood without VLM | >90% by hour 4 |
| Decision Sovereignty % | Actions chosen without LLM | >80% by week 4 |
| Narration Sovereignty % | Lines from templates vs LLM | >60% by week 2 |
| API Cost Trend | Dollar cost per hour of gameplay | Monotonically decreasing |
| Skill Library Growth | Crystallized skills per session | >5 new skills/session |
Dashboard widget on alexanderwhitestone.com shows these in real-time during streams. HTMX component via WebSocket.
## The Crystallization Protocol
Every model output gets crystallized:
| Model Output | Crystallized As | Storage | Retrieval Cost |
|---|---|---|---|
| VLM: UI element | OpenCV template + bbox | templates.json | 3 ms |
| VLM: text | OCR region coords | regions.json | 50 ms |
| LLM: nav plan | Waypoint sequence | nav_graph.db | <1 ms |
| LLM: combat decision | If/else rule on state | rules.py | <1 ms |
| LLM: quest interpretation | Structured entry | quests.db | <1 ms |
| LLM: NPC disposition | Name→attitude map | npcs.db | <1 ms |
| LLM: narration | Template with slots | narration.json | <1 ms |
| API: moderation | Approved phrase cache | approved.set | <1 ms |
| Groq: strategic plan | Extracted decision rules | strategy.json | <1 ms |
Skill document format: markdown + YAML frontmatter following agentskills.io standard (name, game, type, success_rate, times_used, sovereignty_value).
## The Automation Imperative & Three-Strike Rule
Applies to developer workflow too, not just the agent. If you do the same thing manually three times, you stop and write the automation before proceeding.
**Falsework Checklist** (before any cloud API call):
1. What durable artifact will this call produce?
2. Where will the artifact be stored locally?
3. What local rule or cache will this populate?
4. After this call, will I need to make it again?
5. If yes, what would eliminate the repeat?
6. What is the sovereignty delta of this call?
## The Graduation Test (Falsework Removal Criteria)
All five conditions met simultaneously in a single 24-hour period:
| Test | Condition | Measurement |
|---|---|---|
| Perception Independence | 1 hour, no VLM calls after minute 15 | VLM calls in last 45 min = 0 |
| Decision Independence | Full session with <5 API calls total | Groq/cloud calls < 5 |
| Narration Independence | All narration from local templates + local LLM | Zero cloud TTS/narration calls |
| Economic Independence | Earns more sats than spends on inference | sats_earned > sats_spent |
| Operational Independence | 24 hours unattended, no human intervention | Uptime > 23.5 hrs |
> "The arch must hold after the falsework is removed."

View File

@@ -0,0 +1,296 @@
%PDF-1.4
%“Œ‹ž ReportLab Generated PDF document (opensource)
1 0 obj
<<
/F1 2 0 R /F2 3 0 R /F3 4 0 R /F4 6 0 R /F5 8 0 R /F6 9 0 R
/F7 15 0 R
>>
endobj
2 0 obj
<<
/BaseFont /Helvetica /Encoding /WinAnsiEncoding /Name /F1 /Subtype /Type1 /Type /Font
>>
endobj
3 0 obj
<<
/BaseFont /Times-Bold /Encoding /WinAnsiEncoding /Name /F2 /Subtype /Type1 /Type /Font
>>
endobj
4 0 obj
<<
/BaseFont /Times-Italic /Encoding /WinAnsiEncoding /Name /F3 /Subtype /Type1 /Type /Font
>>
endobj
5 0 obj
<<
/Contents 23 0 R /MediaBox [ 0 0 612 792 ] /Parent 22 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<
>>
/Type /Page
>>
endobj
6 0 obj
<<
/BaseFont /Times-Roman /Encoding /WinAnsiEncoding /Name /F4 /Subtype /Type1 /Type /Font
>>
endobj
7 0 obj
<<
/Contents 24 0 R /MediaBox [ 0 0 612 792 ] /Parent 22 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<
>>
/Type /Page
>>
endobj
8 0 obj
<<
/BaseFont /Courier /Encoding /WinAnsiEncoding /Name /F5 /Subtype /Type1 /Type /Font
>>
endobj
9 0 obj
<<
/BaseFont /Symbol /Name /F6 /Subtype /Type1 /Type /Font
>>
endobj
10 0 obj
<<
/Contents 25 0 R /MediaBox [ 0 0 612 792 ] /Parent 22 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<
>>
/Type /Page
>>
endobj
11 0 obj
<<
/Contents 26 0 R /MediaBox [ 0 0 612 792 ] /Parent 22 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<
>>
/Type /Page
>>
endobj
12 0 obj
<<
/Contents 27 0 R /MediaBox [ 0 0 612 792 ] /Parent 22 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<
>>
/Type /Page
>>
endobj
13 0 obj
<<
/Contents 28 0 R /MediaBox [ 0 0 612 792 ] /Parent 22 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<
>>
/Type /Page
>>
endobj
14 0 obj
<<
/Contents 29 0 R /MediaBox [ 0 0 612 792 ] /Parent 22 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<
>>
/Type /Page
>>
endobj
15 0 obj
<<
/BaseFont /ZapfDingbats /Name /F7 /Subtype /Type1 /Type /Font
>>
endobj
16 0 obj
<<
/Contents 30 0 R /MediaBox [ 0 0 612 792 ] /Parent 22 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<
>>
/Type /Page
>>
endobj
17 0 obj
<<
/Contents 31 0 R /MediaBox [ 0 0 612 792 ] /Parent 22 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<
>>
/Type /Page
>>
endobj
18 0 obj
<<
/Contents 32 0 R /MediaBox [ 0 0 612 792 ] /Parent 22 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<
>>
/Type /Page
>>
endobj
19 0 obj
<<
/Contents 33 0 R /MediaBox [ 0 0 612 792 ] /Parent 22 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<
>>
/Type /Page
>>
endobj
20 0 obj
<<
/PageMode /UseNone /Pages 22 0 R /Type /Catalog
>>
endobj
21 0 obj
<<
/Author (\(anonymous\)) /CreationDate (D:20260322181712+00'00') /Creator (\(unspecified\)) /Keywords () /ModDate (D:20260322181712+00'00') /Producer (ReportLab PDF Library - \(opensource\))
/Subject (\(unspecified\)) /Title (\(anonymous\)) /Trapped /False
>>
endobj
22 0 obj
<<
/Count 11 /Kids [ 5 0 R 7 0 R 10 0 R 11 0 R 12 0 R 13 0 R 14 0 R 16 0 R 17 0 R 18 0 R
19 0 R ] /Type /Pages
>>
endobj
23 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 611
>>
stream
Gatm7a\pkI(r#kr^15oc#d(OW9W'%NLCsl]G'`ct,r*=ra:9Y;O.=/qPPA,<)0u%EDp`J-)D8JOZNBo:EH0+93:%&I&d`o=Oc>qW[`_>md85u<*X\XrP6`u!aE'b&MKLI8=Mg=[+DUfAk>?b<*V(>-/HRI.f.AQ:/Z;Q8RQ,uf4[.Qf,MZ"BO/AZoj(nN.=-LbNB@mIA0,P[A#-,.F85[o)<uTK6AX&UMiGdCJ(k,)DDs</;cc2djh3bZlGB>LeAaS'6IiM^k:&a-+o[tF,>h6!h_lWDGY*uAlMJ?.$S/*8Vm`MEp,TV(j01fp+-RiIG,=riK'!mcY`41,5^<Fb\^/`jd#^eR'RY?C=MrM/#*H$8t&9N(fNgoYh&SDT/`KKFC`_!Jd_MH&i`..L+eT;drS+7a3&gpq=a!L0!@^9P!pEUrig*74tNM[=V`aL.o:UKH+4kc=E&*>TA$'fi"hC)M#MS,H>n&ikJ=Odj!TB7HjVFIsGiSDs<c!9Qbl.gX;jh-".Ys'VRFAi*R&;"eo\Cs`qdeuh^HfspsS`r0DZGQjC<VDelMs;`SYWo;V@F*WIE9*7H7.:*RQ%gA5I,f3:k$>ia%&,\kO!4u~>endstream
endobj
24 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 2112
>>
stream
Gatm;gN)%<&q/A5FQJ?N;(un#q9<pGPcNkN4(`bFnhL98j?rtPScMM>?b`LO!+'2?>1LVB;rV2^Vu-,NduB#ir$;9JW/5t/du1[A,q5rTPiP\:lPk/V^A;m3T4G<n#HMN%X@KTjrmAX@Ft3f\_0V]l;'%B)0uLPj-L2]$-hETTlYY)kkf0!Ur_+(8>3ia`a=%!]lb@-3Md1:7.:)&_@S'_,o0I5]d^,KA2OcA_E$JM]Z[;q#_Y69DLSqMoC1s2/n0;<"Z_gm>Lsk6d7A$_H,0o_U7?#4]C5!*cNV+B]^5OnG>WdB'2Pn>ZQ)9/_jBY.doEVFd6FYKjF<A8=m5uGn4gU-@P9n(rI:Qq:FsSA)/:VTP8\lhj2#6ApURNhalBJoU^$^'@mn!,BWDt<AF@U4B89H'BW7#l`H`R,*_N]F1`qNa1j!eKY:aR3p@5[n<r_1cE]rLj62'lK'cVDYndl\6<Cm?%B:Z>nB:[%Ft)/$#B>JM$UP8A0/,8MLf#nDSeH^_T5E!L-[2O5mU<jpXXBo9XeVBann[mSNE21KVn+l9f]?,n7WR@L:FfNMd5((XBC:/tmVO,^-oP]"#\G."W">`S?nEbuH.X!I9He::B(!Y;;2gZ#I4!*G,]LIVA"<E5iblY?O,gSrI[_"TE>:4Hh7\j;LJK&Hg?mS.&Re?X5NFgNlh&S=G7*]T;#nN7=AAClhL"!9_a]SA/?3oDEk7jk/&b_[Y*NbtQ'"3f0]epO/m+5V]UrDS3?;amUh7O8l)C"(.8R-4P8Kb$@p$a,nP2S+KS_I(-8A(b4nJ;\s::1HQ7joV1(6Ue/mFbSAJ=Grd/;]\GeD^m1_e:j,a[)u4i*i*:7SQPMo#)\)MPp:cDD09&s[mM2_@9]_-7WMV1]uNcb4,FnrZdfL@jC%kJHjF%6L5RE(\gZ.@GJ_\CZ?#jcYA"b*ZTp0f-DsI$.X@fcWl+94`3F9BUZ%qGKG43K5V;jl]tb'&<>?NU)_s[hepiJ![@ej%/DH:tf3+p^]P/us*LmWd1`^VLl'k"5N8H:6r'V1heU1'M,6FK^ID8Nds$'kajj5PJYn+_N^C#4k3\#C6[D_Y\MO/C@YP`kDH:bkc=3.,&8O;cD[(c/WH>Vp_KcV(/%bh/Ec3U()<\7;UG`6=[P:4ah_l^@;!pL55.g=G@KJsjQPHSE4HdG1O-nBuPFY&lmLYa+beK)K?LAb8D"T(DK5$L0ON^IB+:Q2Vn(<<atkt*'ADH,_BDsSL7ClRh\J^B^X&eCO2$NIcg9KVHoWq>0s2fp!b1GZ+%K,NeKZ<3hDIp:]INMurJ:pS&G:gKG>\./?UQ#$eGCq+2:]dQ+mj=+j%+FX`FmAogol!t#S^j0REChrCiB^6_\i6XP_9A92)[H-OBQ-^QV=bOrfQeop/q'f)Rd8*CSbPXcqABTI;Jf.%Foa[>:LE4mcOkC/q^DlM7$#aGGF87YQ4PsYuFY'GsT\r1qpDljUWhGoOpJ^<t;o+@[V4XG]8K/<do29F"^QnAPQs(S1'Onu9^q+I6=//DAT#5k(lOVZ+&JgEhZ=1e_dedNZ&CGR>Sn"(,&'<74C%2'H7u,,<:?Uk=>6"$mO5`-%cE^r.#D$n(Un+J&FcD,(btu4G`Be/i5ka60S*^"C9c-EsWYL*H'pS)dKq[g7Q]b@3Ar$XZl4sKdK0%>6N]p<\fA.PRA;r(60Z[YE/(bM#H-sEl8glMDc13\n"PjqnGnP2EP#2(G*`P4EZKWY[r52.KA94,mXeNiJ]aIb4jctGF4Y^j[UL#q<*!@4p28#j!`p>3.[nlNA:$9hsj(&!Y?d`_:J3[/cd/"j!5+0I;^Aa7o*H*RPCjtBk=g)p2@F@T<[6s+.HXC72TnOuNkmce'5arFH+O`<nI]E3&ZMF>QFc>B+7D=UbdV'Doj(R!.H^<_1>NuF)SJUP-<1_5$AS8$kL$Kd8mW9oFeY+ksfU^+>Bjlh3[E9Q-BhuT=5B9_fpYq.#B1C:9H9WLHCG_TS-G8kE+)`hnTD/Kggt54$fdqH-QM1kc]@$jhjj%Jd9.G:o@maribiV!4Iqar3O!;,iYmZVV?=:*%&jM!_N3d?Nj)l!BGKDQB_sKgce(&pK_1pDg~>endstream
endobj
25 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 2489
>>
stream
Gatm<Bi?6H')g+ZaDcfBZ-B`S<f>T`j#M:&i#)`0mh[0+MH3<KTeBK4@'m[t?QIs;#pb8p_Mi0YOngIWO-^kaLu6:&Q8R&C1]$o76r?Xa"\!-edd3.RcVFI%Yql\$Amu\>IQY[ao0`D)jjIt$]_"#eK/>,mP$q]lVm@,9S+_D+/s_LRct1sTF;mq$1_Y#F0q\@KRXLm_O%.5^;ER[+8O82sF2aH8P0qDpampV\N+`i:knJ*lpZm;1.6X7ZPc"P$U]iqtb0iimqem5*S:&=:HVK^N/.<1C-u4bH;&E%!Lphek0U]q--OhL^aF=+"_g9mgKsB.leVYe@4f<)P<NP7=DtF>0kGP?OAFaKc'-,G8:FQXqZb=9#+GbYhRcP48mEsV%PT-H%<JgbH3AIMPJsDe#K;V7M8_q[;73r]QoT=XRUiA&2B#RoL=*2J.Z**+\W\aM$n`K3\OML"9KI5)_Y9l)@K-H96,-hJh!R6LgD.=>?8n/[F$VJJNmV?(7np[X_N2V*ETM"!2-9"c%f<TD++5*N,7AHtmf'$i^li;lo-nhm#YXirfr41qsq\8*Ci1<Zbk@\o.q,1lSjRU,k7VTCcTb+)j1X5,\kZ,7G`0q."qOIZ3"sZHDe*_`GXkIC/-'gd&pQ1"068[899PZ8Mi!&k2iaCd%j-sKh+lciaH/]gAhcZbF.3-H76RUWbj@VGfRMME]djehu3M-Ou@;WCE%n4,D[:krIm!$L4BDE>=JT*al;`=TmYm#PqET'Uh,aH%,\k9c8\u#.g_C4/Xq#[WW+(5&D:eu\!Y.-$.Va]@1dgbL4$1;b%1L;<;i(5"oaWFgjPYSO9-3.<I_=5dV,gE5Spb.;"hX=aqKu^Xf#+h`o(]Sr8/%'*67GAoN^DX4?C/@(u(2JSq.OF8;>.)BEk<frh]m*2e-j!_MHlP0egP%SMf1()8_,PWo1)J1J%q!Y]Cb%o/A-a"T^JUeONPH=+ES:W_N$C#>Q3[`ONAmAjcNVO"D<Oh("Bf4SKTYu[U4P$*q\Gpc`/GH-PZBSGXpc/XY5$tcbR9ZY,hc:X_qs4:%9_ubq!W08`80FnP@07_nV$W9p049\[9N5"[6(U1Ig65[I\!qcJ"KorMEM1]\R5o&$Z0U,hn.A/FZ^"P\9Pd`K69X^p$)2BSPZ-hkfrK*#<9LEL7ni@2Se_:2[ei%KMd`bO`<LB9\:=HQjI]pqq"[@Nh4Iu7bF50EZ<'/#?8.<ETQugk0qAG-hK1,(V1a9/#;-(>Kn=WCA%N(S>M;h]b@J^D%I]ilPDe%qW[B)rBqCTTX5^AlM"ZWV2;f^+p7juA;<i%_(!YY$]cF$fIV>pd6-?u>$Rms.ECrS/J`8>n.lKeMKDQc.H[S&;B95.(:"`2A7QY=5](`*bN^(YNhF[,]Djh;LmiJ,_"s=#j(8d;.g6F,CoUqRX#<Qid,kmd3EP2jC9D$]N@^pj^1eZto<sp*"jBIZ-[fCng5m"p&H)&8E52C/<rfWnTq-8L98!3\BJ8DJFks[0]n;1-et*c/5r8;U&]Dun5Oq<J17K35NB?Rs(Pd$`K0G/U>GZZC_PQQf>T)]a&A8R^g],:[]L+/83Eh?`cq1aEaXU[\S'c[`!e/,g0.5-6rbWSaQfr4W;pDZ51`EEu*t<G6_U5B4rjhu)&oYh\4H)e*p!Hf`;%1?20oY*qqb]KLUZiP7]%%X9'umr$-o>JRBQR$SK^]i2d`f5!Icg6CCaTNPgNbPaY/FDk*O6=nY1j8G\0pl2gTd9m1SDWWh[uQNCFRDIH_"[/F@r)IEObA3UVm82UN0:6+@.LhOU?A]+TI`Q\TV],jH:b\9uHGe4Q9'GX:)'T7./J:j<5J.L3sk_%qn$&T'eLSo`?3gF9F='s#E16?""E]3IW<eL.]5&:_tJ7e:#%4=gLQK*#I/(CE)oS*V7KO[d3#^`pabg[MBmkSH%92oCgZ=o<.a&lc,e<]&RI`pl;V2,"f^dC@1.3VdX3\F2l50Y=9HpL^mu-JgSgn,1G/G't^Mkhe"<1-Oh/>['oDAFKG\s^Suc*ib$@KhsVhK/BP1LXgX(d1-GooQM6CggPu1PY2?R)*NK\6XduTug+BhoEbQrsBOZ[%)SL$$Rd+1F0pu/7;0VoM@mp+i^V%K=bk<&1KsEm]NHPo"FfinGR.7Yn2,Wr0="8Wo5M+NjflT8HZGV+8_S4<'W&G3rD_QnUk0c;q3Qfou"X<[Q%HWINl_;P/+H7"Tcq?K7Ggk@&<BRL#D4F!$Fmke3-e2IE\RNE4,c'"6c(odL+r]3`%'WEDiE@2)+?TVq/]S747hL/Zl]FBu4C1>DI8TGrJS$V"JSH/D7*.X75>ZZa&aOC8rp>e$fH/N:92sd>$MGU.k/uQUm$!M)SDM7g5,>%F`%T0Vl9lS`6I(*O_4NOh0/NOJ^=t\lG.7;)rS&iuOo'9F:B/sVFYD+$k=`9/(?luKOWLDHcPHMY(ZCqi&TQ2S!r%q>b<DKp%mXdk2u~>endstream
endobj
26 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 2711
>>
stream
Gau0DD0+Gi')q<+Z'1SNSXtlc.?^DfpG0:dagd\5H-ok]1n>+E65!c@?pN(rEc_9ZScpNG;B2a*lc$,&K37'JZb+O9*)VNqaF+g5d6Hdck\3F^9_0Q!8W_<s3W1Wrqf(]S9IT'?ZQL4,K65!1kkM&YYsC4JSkR!D$$2Q4Y\WHM^6\ClhaeumQ*OV?Y'!D+U?Rp)<RYd[;a.#QH^]H)S*[6kc]V\)e&8H]8us9aHS^:GRcPDp7+4+iAq8okJ+F(>Blg."9*4$BEdTd0-IX(YI]M`#fk[+![o8UQn6$H66IB3=92"<@M@H;AP%fg,Iu\"oP*Y!r"Z+AYVf_iX0Zipi[7HJ,/Dr%+H+G(\NG7Mp(D#re@kOOE-gc7`<*c=8c+!'V=H6iSp+ZM\ANG119C`M`%?hkeXYf[gCca04])!['G1q.:'LoD[U>);c317bG!L!<i0MU=D:NWoSQE2KN<SVeK@K,l]01[KgDa2A3P+m/?SAj""0:;Ur%&R+L8$_P.JZZ<o[9_7R81KH-[34q$rXr)Wh7ZQQC$bYu7'0NiXE@OubP*]_i>O/fNc`J2rGKi3r=`&0AP'"d9-flS,dhU5b?%J7^n$/XaQc5EX3Hs!<FbL83uBYXGpDT\fTG(5.BJ0hS%])bf2B%f+TX61YpE`A'XbKXIV\i?)I+".-/8<ijs/<_(9/V4'nZB#1YD=6=E".-W)>R]&bS#U?m1DCC[c=8Bm>Gu2<78T^H[@Qs*q(6?7D<dO852tB97aXGeG%'h+4+J"5_&B4#ZiJh_%%FKR8>AHQC@iU2b>UGe89lLJ.fbnrNYjZYWkSO1S7[eSZ(^]2?Z#DA80/qhF.>,9Xa$3,Y2R7/HS-:f$mm(/DM=J+b5.k9/`1Nl?2PO2.qI9Q?Rm1uE8c93HI@8#)0<Qh4k*nn"nbel9VbF$Ik"cL.4/Y!==RM:,$H#M&4?&Z)9tA_4N&cm@\M/0L5Z4iTS<6eAs9Wii>((.KDW43Xd!=sO#]M*l:,k2A82L^P*s3OUVDYYpWbU6,`QmG=GBjrLl20kB-[=W%Ns;>$6@g<`Hl*iA^,+dZs/.bd&LmXi-f^4Z/:9Z@-ZYI*1"O.,Bjhe-`FHk;$0PYKtj)!W7VI0[t3_uJ.hct]Ko(?J"kP(_s,PH0]H.8WjhZ<%2O_QiJt_61S"6EPS-9*lUmPuH?D\Di%d3!b("RQ)k(=idnMeB5&Ha[R].*_6g3ce8V>lM@6:>t5)[aK(R9C8"X13@:_,,qs8g'sL_XIG<><liR$//JY%ERj.o1*_iN2"#)chKW.5SKj,O0:mQNd!o6FV+T.h(*Fk2[>NfAC<&MlOio"RnL`Ko[3G7MGqAYrN(g&c5Z79-#iA4n/G'$]R7=LIiDhgb@XuXKOFee7Af`:&h-q_j&I;K\o&43*</q@sPTCYW.TpNV58(Ap!Fl%8"]J?do$7clL&77;sd5U"2]m@dDIfeORqHAD2ICV/Xo4[:-IA,U[c<"a;o7YabqR<q9&_[R8cL)@Qkc:.8GsQ:I>k;(T,.4hl+SMV#UjRZ4J`]6JDh`uCi6\IE/K>hZ,M@c]AHTcQeL)W%g<o'ciW]G$5cC`k7G-F8(K5?^rDR'=UIUALh%sk`d!CO/iUY*42DTScdi3918CA@"39l=gH!gSh2o'_pGTe(gbK'k0E+7N!o"aeg)\XXC#J\\okne[8=D8bmd(fNPDYF&sMolOo<VDsm*aI'Eq-&_/deU`?NE4q?>52Z^g1nUk.OsQH%]5P<UB5amJ-:5Q:&&j9F:W&e2o#/@F9hE*[$H]Er2V][(U0A;kbWrjXG/JQ@pO<N3SJUoXOA48^I;#R\crt/rI'1m0DH%10YO6Winh]ZFdAj'mqR.fUjrlOllm=9DpY8=UsTYDeS3Emn]hDO:mdNTQY7>JQqi^".9_<OMnSWJVZqp&`DXC3nsX!+Q+a<!*n7?oDHPFNA@6P_EEck`hR(XK*aGHE85oeDR$'F&d1<pD2V:aS=fsBi'dBVd2%[`'Yu&5h?+Yllo3LjB[#8S]c?9/fdO%fERqafOmEaQ's+DkA5qbW!:UQ=8Ero#tqe@hZ1_5]3,b/FP=asg7\3X4-IoG:>^#SO2mgH"G3sBg8SHR>Fgu-J;fXAA#'mA"1VN"u5/#^;2%68(uK)8mK7`k%Kf:i9$9/8b78;f`1n=c^fh#_o[TeA^bFTL=pP)_*THO9"\5TY4&00HU],N%1UN+`7:#gDS\bJ5)1Eu;W:R]!F2d,?=,UGehUkU2aZ`BA[bH#PWp(G7NG?(r17dAt/s@#!jV1:>N,0))qYoG8U["V^Q;oO:0;KbYuP0q-(*.`ni<:=+Y'RJ=hFagH`a1+cfR=]Q(DLE^6eom6)Z_-Xq+;H.eb4nLgTN,.V\$8F=/OG34fq!OifKS))`no61(%@P`c@7pAANBY<[Rf-)tS'p=u=7h.JnT'GnmraW(OP[Dc&2-l7k`%-?jM]O(>t=himKCH^rRr%/f8D^0Ua]h7nb3%8*r?r>92%k%N;hc3E&$3gHpkjm/Ws("-&]>fLLP+rkd5,ZMDa!mi\K_i>tXq-%$eKb;(cM/1h5D;!q;?NkZT_sIEcX+eadC!<]j6#/e.Of`!2HSElEP*iEfHp)G:H@#[CqaIo4oBn.lYUSL3;SR%M$<Gk"p3TC8)!0kq&6ipLmu$teNfkSd=!X?X&n?r%JXk1J\PNe;Vi9,n0WSc'?:FW(;~>endstream
endobj
27 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 739
>>
stream
Gat%`9omaW&;KZL'ls?]<l4P(LP.XkY_u<g]5/`F>cPqN\hjkc(=6CagDS%T'1ub&e&Lh"46's#NYt[+=FX[9!,lY?>qo_,l?cp%8>t_@^9/NXhBTf?LEek5M%\bLVdm1C!A%fKJeCX,(klr=]VrSk\8-TjcEC=(r=dE0p,dY1`^%4TR\t-!0;3iFqB@IGb/Bhr`e'"lDAF`5C8<+ABr_hu)6Tc&SG<-523Ph[C("2XjH(/G%4Gor:]E=l=5@>VGpTMrG\%m&Q4;QG;IcQX&0Nru):YiLLX*g977A1G\:N*`Kin5e&Q8TCJ^4\,f^@E-#M21"SfZ4VEuGn%IFgZ0s6Y2X[31+g\n`DHEj=<aAfo_Kh>%>R_]HoCo6.[s^cT;9n(-m7'ZUY)`JsW/oCDuL%qM$oDL\+E0Zont0T;;)a,cdRV9ZT\SQMR98THMTQ9(.>G!Zr0cKikEYt=O<]K$X1\9!!+05r;\6.-tO5@kEha]&R/Bb6e1JUugo7M`e'jM5jL4Nm@rQQg[;fb/PX+?4LBi.As2"n3ct9E@TMX>3`97IDFBWkb/^JU=]]n\qIDh9,0olr!Jf]Z6f2N@F>dUiN=tSsBcFj**-r_B8=B:uSr)^V^'nO4kp$KOGosmVSRR>Nm4f3`9Ph\Tl+`FuJEcp1Uo.BLVi8`G)d?$(\1XbuR".o=UYMf^H%P58cGJZIlkKLpOq8[8*;Q)a$I-9#I$u\,?K\Drn[6U]~>endstream
endobj
28 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 2279
>>
stream
Gatm<=`<%S&:Vs/R$V:2KraLE,k"*ODXU$"BH*`&LB%N'0t%ul<(SRBpXejB8_J+sW=?)6A'#GJqW?^p!q>`0@(u4Ni6N]3IiCWa_X\UsfSa0`#&feThbVI[#Vp_1n.N4ubp3&iGHZ$]"G,SS8%of:)5M>LX5S02iG]rX\`Dk`d5s<$U4pc59jq2Uoo?c^;cnL$jmOI*^aWO,?CF/jq0Z^g%`r+V(X8-p5rF6NSAu":a8Z)9%Q/t-8HVQNTcS3.h_iX<e-k*9$8,(;Tq/lmeAoO=Z+pfoNU()UO"L#J-I&-s%3[E%KcqU^qVd>;GHJU#L#b7X`P@""&*T,MHQ</P=<mneY*g@`_L"<H)-Uh*L`u9PhDfROWe?rc7^1[bko3T5#?r?i5]NVmd/\(l"kupnJ:SW;b.==s*a"<.X"'5/HcMD+ZH9/Mi9Ce<_(3bM6#W?5Ui&3-WHLhi$E6<aQJX+;)m20M>g"m(KN+oN5E4#4>)euUb(C4neo3.HZE+pY;KJ]ra['1,k3K>3>aEVQ^3?Y.p!3F@Y$q61>S"Q.%A]E^D<qGG[r9Go%d2Dt;:.Z@@.M5<g#I)&&-]'GAJCf`0U0r8lebLN"muXp\9mU70KU7G'`T(CP22l=86L]JRCk3hLG&$#YTscf7T)9NgE02G7>S@IhtV?31qE55qG07J&nD6un&6'LJ6/I_4$?I\,!S=hH\s,5CT`H#@FE8^.T7\*b4Un?S=>=^=9mV!Rj^9;B)7]?9H<6)P1>ph>uP^AZk11jNKZYr.QS#GcH[d[F96KKDtn'GC'Doq9?jKe[?3I8lJu2>(b1+*:ZCf\]NFr)i+`LqR"T\u-)um5q_c\m22,Z#57UE.pLR)`;NPgMiZm51JJ6BtGr>u*j"@s$Y6q0g_Dsp@fNZ!!,eo#2PP-3,Lf3=S7l7P\s#6.)9uUb64:4p*p'ck[!nE/IhS?N5o`U,8TR#?o9I&5mRYKA7kQt:T&N52T0>W0RGQ/#C:<nc.J7gire(f]WbE!aLlJOt;P^#/_=RGgs(0/=!j@%F:3C+3\n!ZAT")NsrM!"0GX`b>YeZ:?(W^W2ME,m-R"YjAH[#p$N(c`c&!mb3#PW>eE&XD^3-NYMs@PPpPG7;gE-1Xceh8<B@-(,`]S:L:]4"7Ua1P)3/q+C&h)H`:)ncBNq+0j/s[%Te;!!1Ml53!J@+V!>3/FV+iQ<Ic:9E9!b38U]@FH)jndE-Vf#8At.Jd^YQ%JSDN<oYk2qf[S3\c!MZ?e\B+m]`U9C3po;]O1>mf)3@erqSqR5rr+D%m6d.frsH7Ibc+0i?.h?fmYs'p8ci2oW*4P=0i%C8OC\H5o2Z7bq`Q8X5RNJ^sTa,l^rQNW&9M9f:LfF&uF:]eMN$T#(kH#D6CfQ#D+?0+0@mk4qL+g3)@u5C!K;F_[$H8Y7Os1ZASZie=:?[Kttu@1u-8CIJFTB%Vo?I.[*XuSNKXPfM/XY[,KTX6%(H9J/;e5,"dj]^&Wc585nOcn>52MCkaXb\JYRbOW^\GD5:4)RCYD2X0-r(9qS:1$7>t9)0-VS_*CB*?p$Ht!>?rP0B0bqd8GJGBUUICWiWCce'(Y;3FI_j+[t/RQVFVLA]ksmZ!u[e_Z3&.DXkf_Wb?&X=Q]-@M^Y?br()lIK!&(&$n!KKq#Rs7ZRgCLj`o!HpEm<Xc<"!BH'@]I`jQt&.F(J?Pe8S^T:+ZJ*S6[Q\ni:jT8Z/]Ngf4m+q&&^OgstfGnpkKl4?YDZ9U'og5%>LRs,L+<dceg5,!L2Y9dOc5<tTEH&$1(Y?YUD5+V(r<oXrAi0qd@S`8lR*5sYt@Pl2^LP7'63Ar\/kU,Y#-?#i\+L/sJd1>9NMP7sB2N[XmW\Y"N=9J#YkPlM`(K70LPX.Bj5J+A.X\m3u/&/Y,q$ds8@q>d>:]go1UOQ5>AE#J;4$WB]Ng>auiE1ekCkZm`Il7u;Zu@!%*a>(rE&<+-rn_KF[7d"+%/Vre#NrS@7Y;P^:5`b0a/+@^pr.o7n)/TU?:'b"!6`>U6)f!4<l^&RR\sjTn(hZi:s_$k,2Zf`A;64l6'2O+*bBt4h+&hn4k#J<XA_])?Hha9#.5k("k7'3l:CTNjV[eQcHW:tSfOjdpSg0JCg(/hW$"qM=?^?*HVS&WQiYP'RLT*"3/W)^*t#/k=dj&*c0i?\5u$nZCTnM=c(0MkUlk>n'-"9kYpb-/l3MDEBh'U`ddmf=\q/JG#/_+k6B>;I?Js1g1*!#j-bo2A!ZuF3V=*^ITAt$nGqJ*j2`u'M*u-,_?2~>endstream
endobj
29 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 2560
>>
stream
Gatm<m;p`='*$h'@L'W#k>L@7`0l5q0!maS.8X[fk3d8R;/IU6[4NWF%H54r\)0fD?V29(1@Pq2d_>['V;7CVYjnolJ)_O,*t*=Bl@@p3@L\?9q62i6PJtr$,<b)'<)#b]BZ@0i;h*`G-f6<Va%5qfg[a_\9EO:u@C4Zb\7@O_dr\O04e+</_U=iG@$0UI?N&;bYkS9X5Eq>&,WG:raV;Bkc3;ltR.MdY0*nI!Rc-rq^lQj4qYT:lZkR[R"baUDG5,#6bouR(Q>=2i\30V<3#bR*)[F8/6@q2;nO'$h,IP?hQ9@HT_9oE+?0/'5-OUXP3St39Z7PrLABG7hi(UGDAN^;@m]dtC>:U]JM*_HYkLB2LpPp!6'_,p*HuNopY/;,*@iW\`,8X^2.MA]\6"=b+6J#p;"\?"bINu*#>&8/2o!I%78Yi/p^fc7&(q`#m/>:a:X8jE[\ghGTGpO`;=dH=`"_SHE7DU72#,SG%DlOM^;1(_u+@^XlktOcoq"S$hSE@2?ecY>[rPuLI$^.\V1Y"bu/4W4pZiP3(bEL#)dpW=[GM3rHiM(9=nDb/k.$PWL*OrV[VGdU'lT_b\T<fHH-W(Q-!2_*AN]*GaI1`L[JnXl.Wh_bSkm^pY7*I)3`0SL_'W"eTKQFF@6VQJkS\^"(//@0T)Ap@dQHpJjU\@n\E\bs=N5Y9)*5@.c,c?ul87[,U(L&(3GVb_*Bma3EKQYFW#qST:Q5PO%&<Tu=-1IWDXTtqtaEZGu&kUQ[TseE2XDspJ0nksEh@;TiE[l>Q$]EK$nROY+;RShkRX;G:jV*lu.0d%j,RS+/CUl6R:ZlX>/_9,DeC$rrNfmA[b+!_l0r,35[8NJZX!0WM!G"\uWSD0LJn4cIoJX?_7r?BVgfn%1eHYu`dR34YZ9r>cOm]<;3[d%4n`L5&5FsIPk-*(hEcH,N`!+u!,gF`s&iXgVb8k6QN%rh^9O'-3+KSd&g*sri;B_AOD:3'gU=#,)qWI]o0Z8+&ARa3=SidlX7Z0?3\d3#.L,YSD"hui2*o!"JGYKrhD3e,r.,0l4SIG`lAd36nKkhp*T8%OmNg=PoRb>=<7ZaN7r&V;nVSCF5$c]@XWFLWbH]9Jd:&8T,W#VsU_X1%39BDI>;C2)[lCX0F*!:)D2+`qBQiAX^a05i;/LDMe!IbUYXqK[0B3!mH:au6f/idTqA#hN0ophZ<'FNo?>uY]g8:?HA6!XWub6BGaKTBa8grH^.9mS(?n)*)CPXg\=Q$4J?>h??@]a0;Lg3"5+<im3`?cfU:pNM%GX.7qkpS.en`.:D*$WU.7bGA_hHc>kR4jS!P5H68(Db((R-Ml:%0.XG-#*:lE^"PqXBP-b;1SC-gM--r-[U-GoefE6Ln,&7`o2!`/:&#Z4?*S<8i#Bs"dop)].h;HLU%]Zoi)E)W\fDDT^L8Mb9lfeI#fH@brXmc(7ct/6AKi^j?%X7.B?g)l"@F3^6Pt2T':gW^"h@2`FYZ92*>!'Q(r"=,?a:B`-a6&,[g`#bDjXAIC;WWR[?@Qkq[N5USK[l1Y%m<a=aifh8r?Q0*cd7Fhsd2=T@44<$=79Xf\N9K(P?-q%)OLg"83\V62RF]1ERWnN?UEIne18G%`Ap5W7fM0MH+/X(^[^Ap]8!A%#.VXMnp5Ib!?:H^Ou%D@]hbcP)8fSlODT1lmB=7gWLPF.rTn=YUrFXL#k$:jUb1^U+#&1P_O&eA`3:V#p'uV2GluQ+cqFod3L2ArBXsf%dnDUeZ*n&UDrbio=]H']t-1ml)qtWYIh:f!"E:<EpWc=.(<ISi4A@rJmeA0iNiYM:sKaTmjC#>]pISpp2u+Z'[=Z<(dFCbC9EaI/[q]Fn+XX8e=9"Wrdb@1^X6%coM>DbjTrK(qHnI@;YNAcko&!_\o]C.ct;qDR,+NPk3q>SU1l]lhV3$dSD%t1DoVsp)oq\r*4r(k*8fLjVph^'S+13jG1pX>4/HA`e*g94SOV5u!A^F1',[P<>DL^.(MS2mId:T.[iSVsB(WuhXg78=Fea7q`gKSN<tjucH^%0G!ef/VY&q-oauCI8LDtLdpoRV'QK*X\5(fBjlR6mMV9X/7$Pp$3TNWdC'i<_C,X;uCW]bF2f48ZKF`POt2)[$4j*5+3Qj!8`W!'JlqYDZhr&S8u!nM):Ar?!^"TNrDp)MYR'f+C=bh93R-K/HQQ#O/0_Q?]i3HV<DI!gm0?QFPhRm^>P,eIM3fd`tY%E5ESdIT:RA"4;WpEdN'</E)bW=US_YD^p/9m^@me!u:q-"o&4AM3*ZC%0rdh=0(jn4^*+r0_3DD#6GY&KqU#Im0CuJXZ%F<4Zl,'t3WI.c$tk/Na2X(R;dCfOSDb1FH4WnL;+,pf)KY\5XU$%EAciV7b')UXo]ldfPCEr-(/A>^L:J4l9R0)ZtOaeYa@S:Y2kl_:T4do7-6Wq2XbLflepYT`PQn3:)U2<fK1q3(qk=TZIBSX+Xab*k\Z@$9!OO,$S@,Z6BlqQ<3;5Os783KQKZBl^>=L'=*M!iMC%BE@Y0dWkr_Wd$<mpbpn;(IoqPHoRDT'76C~>endstream
endobj
30 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 2470
>>
stream
Gatm<>Beg[&q9SYR($(">9QOpV"0]/lo&/DUq=2$ZnH]U8Ou/V&hH:O;1JPi!$k"D->i3C*6T%P_0g<r!J:B%R/>*#J@?JBoo9%\@C$+Q04WYILS$LAIpTWD&S1NC]hH$1jXR!SnF3X7&HV2mO/$)##8s<fUaolYfaISVmtD?o=#EKmYB::]=IR)rQK=5m`=K3K$C`,oaloO>*A>kM,(IlC^(ZTfFtTOiOsLBdV;Wn1,96a^dk?Lk%Moj*nIfi[)1ImUMUQ.hI8fY2iZlV!F%QO>9\+"7OI*I@FnE5?!Q9ZXe[lB[;cOZtVA?r(/:jV2DAumP7d:=ub$#X0]H<(.nIZ0A)_eLHXV1o:^KD,M_nT\P;-F2L"r>Rl1ZjRf/0gHkWsCTg=T"+)3'tOM*QSR+`)hbATlaRtWe#d\G?^mS:q!e5Y,mAH>O2"9OnBW$RjIu&2t3(jdd%o,"e]k8jrY@4>;[XX#/hF>(o8_fU(FlBW"=:^\#h%8[jA5(/Ag<_4dIDLCuJQSDnIQQ!Sl7HV%?!u#n%^R)J%Y0F,:.lL=TqDKA,No=F1N$=XEAVE>Y4!\>a._`!nU`Z>TRHKuS`kb26>SGPir\%H!p[;h0h:Qf:8l8/J\n8$IdLjZEXMfP6%Jmqdd2PJI>`Ug_?T'n0*,RsZm%+cpj[g:UdpZLfU'`irl(C9C[sIcE9i19:PqfnIUj_h,"G\7!T&SMR!]-7iA`/rDH/F:++0Y1c3%3Ld^"GPgM[m*QttoT#DICjII+)4DNS[bRVMi?4UQ-r`1!IObl<dV[CtK4X!sNP^]kDF>WeHd^Z<IbtlE7jq`kiL<[(lK-tbW$6DbaBXTnQ43aM$GR&8_+pG\0nr7Z@Sb\hR9)okL:B=?7!F>6$-fsXnRB&K*FT9cs)oY=%=40cIO7Vt^6Acp4euI2?`,bZe(SLblq5PoPmN,NN0W<[(O&VeNu&9AXd5mP6h,_''UuWUNDENDF?Li'(qJCpJ"a?bD5A`%[:e(eP_s,7@-bV!rs+69ALq0o.;q<Y$V@Q4&d^n02'u\Q,'1'a/?UL^)U&iuVTHKuju$rihp#&BS1r!4X-#jc?lKo,L0%DR24NOjPrE[=;J>4+LmCh;Gu*"rV%hN$CLhXNq#glhmX#>6nUH&g)^Wk:ShMZ-`%DO*#522G<X7IN+5E9OO\<%jWdk`,/7$<XSh!r_;B;&1Unse`\\p\8\rNmo?"Agf.%m(f9/r)p'FdCR3'$C;]n??+0Ch2&T\Oi8S0VM!W0hmJe)muFf,t![2NAafl`:Y_h<PAL*HfD:cg;cM"Jb9-quf-+D3PX?BUfUYWhVpH5tcn8KBAcM&p-fQ-_mn1S^KmfSb/*rgn_IG%l]U98\9;:3\"kLYHU`q7ZaA0]L-q&0_PE!m_;R#g<;TFa6hQspIm[he9NbprQ9K?F]"7a*/j#h-Bo.!]c"O8#Vm`C?LSjrqo]Lk1A=I5=bX5nG(%6@jE!^0VuN'Jr4n<2kkW=HKj1YuMhu5dTO%X^a!'_q?T1L'na#8QW&PXI1h+=h=Ac_\D(l'Rl7-Z[TD%7IZ;ET"75GOB?((:s^K8)/n4Ur%J1[4]F>3$FNf)GU@d_V_lb0!X1[!D,cIU"nA_uP%$j&dJCS>8rk!=F@YPA"f!ZM7As"qUgAu=qK#(!0"X`?Q#e_k6q)"$VG5=Q_!nS'#9qfV1WqK7**etWlgH61YB%3!gf\R/.<@6)Gae`aq.l?T[s1dt[Jdg9TQ7bo$`eA(hS=E>Aah>I,Y2amS7g=FVF[[TGBnuL)rO`pjj[H`UJ2@S%&3n:)N9;C!r<&fs[Fc1mAT2[7j2m2+!9oF\Tp%gXldG@%$a3KlAKl2tNS!tW\3(h<-KHJsXdTA^R:h1(saLs\X.bQimrEO,,Y,c"Sic*h1=qcB0+u9.o7pm9A"3uu\D>96KTC*&("U;^1A#q)i6g2n.<g"dqrV@L'(jcgB[nuHG^k>"r90\pk[]S>m4p3OD-J(j3h;!SQ;bc:cQ^Ac=U,A_rCg]5#.OB+27Y$39`YoGYo?l-F]J[XUNH@riUFc@]@oVM'r/N9Xkh6#A9;A;"Sj3k+01E[^)38#-=Vgg[QFG^uX`[(<3r3jGFUFM^F)A-r:c!BFK9k#EoP+mnA`/e+i6R]_JN^HRCER9+q7"5$s0Si>,^6FeI?_3+amZkmdETH?"rQTSDI?t=46'=3f)Vjh?MjM6Pp(:?G`Ai:EJTa_?G0"P?PgE`51m5m5MUr$3pj&dn1]jW@M=PL\5N;9JAgfX:#8-Z`\UE1G,dc@FS;i0a>@@>J/1bhCR1;.O2)b^(efq7l;UeSfP=d%1f:pP@,IXd_I*-AD[*QcoIcn!:S:pn*LG="=HLj+n/k2UK5MEY]TT+mGaG>,"6[r/Tb-IkYQh2hT!f1;;iTY*7!f#C(B8QEOnkU.a8.7_04D3q,g9ZKVhurg%Tdg80uUu([;X?Z9Srh[p`DJ7'Me~>endstream
endobj
31 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 2152
>>
stream
Gatm<D/\2f%/ui*_(PnrY.\"fpWsXgZaRr:DTQp7JRQ?epuFN=[WRl&"P7!FdZVr%,utp,BmQ\uUe!YE`.qs_iBPnCjqY\d-+nWOJGHE#J;&mmQ=o])H1^d.IhG"=&$eW?k8-u\5B-D"f;-4Kl=&U&68+$<6G3_dQQ#sll<7jEf6+]X1SqPL%ndO;b,X1CVt^jis2"7D)4@SeBAk%++Y`5Po<j*b,.RuR3/u;pQM==ETf_E*5<kg=E-"uG*%oU!0ZD?FU+faFp9,)]O"PCDK;HN(aZ.I?+Koa5DX#n;ocPO+?G?/bohbHJ+a*_IoQ1,@m15Yh3o3J/_br>Y`:o1:bfASs4S)Yj1Dml*0?F&Qk#mQ\m6(`+Gr4sL(m,WuHGX'8@fi=1>g&S&;"1b&2bJQ#/[e9\YS)Yk`<t1kYIoG%K,*9$TSfJ^a)E9X%Fb`]8Zil)/]n8u.dnia\%!J2e-qi=HJ:%*DK4uSJP,F/e,63[ODEMV/brik'ZMP!U$$ho:hnML,9MMjZM4UC5mo*4*A'%2n.ReZ[ONg;#F."B5*a@,UVY#S)]QqRX:Kr%&'ZA-1&+%LcG]*dR)if[g]k"s<NdZV4``e2b*t]l@h5`8=A06^1R0A.>ja@ooRtN/G2<gqo_P>%Hs3_l<o?K=cQ$]+6+aA3!Oa;N>+mc:hPa2]'WmoL+$Z<EKUeB?"2)EsEbI5`1hg!rmTKWBEaie^)jcmKP^G)s<lt1R7UV03n-aJ^Lp=naV105jC`LO%.")N0_m0L">ZKNVO=)$*Xt3k9f$9^cJcZ"5BZRCVjLXtM"4aFXhOL3AZs)#N).NlO_9EKoI=7NMW`p?8ViLFh/+h]/="k:XFNc]&pml3F?+J.Gs!WQf\o5_(l="O3Md#%8XB_4F:n^kmV8<]%h1u*k'VM('MOm,WkaZ'ZWk-tGZ*I.(/[PS3mrE]1A\b9UrA4$)hAhZ7+Yc9.Q`F:i17o5<j2YPD(H"c8\?6dL']-8-DeC'SeZ=mV_eY>c1h6o.fM(@QQ,ql/lN.A"3X(`6Ea`NB,_u@F#I/lpG0*t?H?o'sjsGp.0JW?4.h)8qkD8QCa$=Ck^"bK4F.bUJ[&\K,P,9aDXVJF<0rO5]D?`#Wcnag$\r%\/j3;t2>CHQMleu2QBIX%dZ*5C8km]h#?b<ui('?DEiVCi&>e.S6.)[Ta_uK`WTn<(\=e_T"Q*'@/-@/eg7YY(7esn[])P5iamg#'P?sJ>/a"U<LrHs]eo0Ks[cURZ7EHSp=LKPUcfdoDXa_3mUIIT\!_XtX&L*mf31!q,MSEoU,.!]9^MB(NXeB](bbS0Hp6=(m"*1.7;/j/ln^saj8Y&&A8<7d?r.``Uml8=_r5C>bB6>'B"eT2ka3>1-fF7;e0>#a..XEnK-S"t(qDZFh_08k*:CA.*B:Y$^tO)R_AR0]:mB@"tPUr>F)%t:$4AIR38@"BEe4,%:pWg2)6j`m8tYs@,]G`-.9D;_FXAW(QV9l'TqXVTM$_d[tM"t08<aDZ;T(4s$:9:LQ_iH>JrKr0o;23M+X\6uq!pD.@rr+;V=qcY3bdp5^aUC-iunLph(R);S0/7-D4X49(>aTI+e_e>/p%b*5;#DaG97=8.#TIk"_l'9U[5LAO<g"sBRb97MjfIk5!pFJW*I4@O-8)k1e%LZ!.]dKGMmg5rI*^iecW2b0P/@'po)MC=nG4;*/msa62pF!iH$7oIYee'Xo'WL[A?>h`5Kg(ApbIdjQ8Z]7ENoCosB$/cf`>LSRFQ)nm9oHC!M2AW__WtC5@.IUqLXiA9c0\J#pEQZk,Nm"p)IrD[@#gPKl,*c91AefVK]a<5BJk+<`6p`jRIS)%q$,0RCSTJ/]2E*6ee@GpqZ0Y^SYJj(g<,\/GCc[&V]ma<X=_2:FYX2_-(I_TXN]cBM=n*;=.8I26f<VE1nqPoWtg5<`thTE>gMq1ZV>4L!`*Rh3HN)JX\Icb&`S]^*c&q.O(EB-Gc],cm/\RLbE[+]Nd^/']=#1maR%<CH*8nnObVr-lEF/na`@)IZROM,Tjn0&g:<[ZK8d3[GcVroX],Z$Cb\Nm)!X)%aA<CY%iHu-iX$!Pa*DU!TemhQj3`j2>WEWMDD3d"0Yfr8aaPr?JYgYt;_sm;c=6[hN.r^7\&-Pm780Wl~>endstream
endobj
32 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 1311
>>
stream
Gasao9lo&I&A@C2m%lL5=u-0RWSZEjS[N$@S;&ia_&&IA7DpIFJ=mYMf69MI'WjcB/W0ZH]6Lpu]3%lPpn?`SBE<S*iS=qHarmm<7R71Q/R7J>YH)XiKZm2mK>bla34+0SqPuR+J@a:+O9?0;+H=dOKo<SZn4>bN/``cbHam'j$P,'g+d[&X[nlMunh6*>[31BmfU;tX#2ur74l6O<A>'opEKVX3#>J>@XjNd*rU9LE.dU1V,Z0)P6lA0mLnce7m]D%9X,e+]!K'c*NS,4-MA@SbXc9T/emclH9J'hBN.Da@]j1eWe6j_qrZ4`e%VHDDs3Dt4^9aK`=i^<L)[>VJn!Mk'"aLDNjDH5<9;SK<s-VlgL3uhr?+!neM9c$$(Y+VDKC\2O%l[D\B9Yd'(<Y6/V=[YATS0H]$HM%_KZNF%[)a2TbH6-V$d'oHi*(1H<<l"#gP21Rkr'DJd:h%uHdme@1c=ob1;0"dLNM@n<d"bq6UH5'<I'QD;E)43H[?!OHA,-"7A8dTFqj2WS:$kKVt>O)bK]+`7e:Ka1SJ>9d@sIK'H2G?X>F)fXDVsT%VifjD]6"=$LU\I#M:&FP[/u58QVG87)tGmA<s&J>F.U@^!;ei=WUrsn*<K_Fm1VRVd8#uE[(uT>l9`ArU]Nu(TISKj%maV_(ub>^$O]\p@>IK'CB>q^l3m%BYdo[&Nc]4`'#j9i4Nb<:C2?n4FoPaX21aX6=\F$`l`cc26bk!B$mtMn$W"LBu#)Ga_h2Lc"6(?1^A7'c"LFN*q[f%?'SHmccVqeh>`=>4e?W+bs6B]`LJF)j"hBC<&r1LRnJ^QcBZl#CG!INDO#S^:^SESj5k%0.HJqmN$tC]h7su^.K/=cgAtV<66fPXQ>*,&\2V$'FP^7Bbmjm0U?fW25WO(icG?(6PjPc+iV1M&Ff,1KLRq[`lh[+lgX\L0;hB&\6KTOQ1J++eW-PtkoY-]\XiNh$:@M#$UMt%1G%qr@lf5rllu.'iNK;^KRHN@M)&_96AgAABEjB))*;,M3(+7cd`@JbjMSk.W7pkF--N=jQ*Z5s2>PRGp5)u8q"Xtb+&u`DaI5_h91e?HIakPGY<p5$HZc+hK8h_-[.qib2I1WY@VVhqW7H&O_/+Dq,X)AW7;)EVR3s@\hShMNB4D'JEa,7*t!-eQ/%^IP(o<VdDg"8,<a,1fC1M@B9<FrBC9[1g8@%5ahC,O3m81ZY.80"s\F9?M@]G5[8fOO.d%VU&T-u-S8;=UfB$:0=Ti%n[Ye6kPU=<EjpfLG>\5nWU+r5+)Eb$M6&74$V=J^o671ZCq~>endstream
endobj
33 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 1124
>>
stream
GatU1997gc&AJ$C%!(&;)b/=]]d7?t8?jS+`ePU]U#iPu/4I,qA]+K>),cV>fgf5q"t5rsS=/jC7RIXlI<f)P*oPiehL+7s;cmpfk:DDM4hP,Sra%uc#'DUVXISueObF<ns0UO"J5=Sa/7Eg%6WMLD*c@8a_ABOKMfPls&akY3_-ajT?n)!P(fpP@Q="(rF%C<`.;_s`eW>c15)Cimk819P/'>H!3d?o*Gsh7`s8TU(;W4k,;!*]Da_P(,..W7ldm""C(7tosS>o1pZYUP#BRAH_0(_$N"S,CCRh$t;aAnZ5Wbt$"aWSC52gPjUiX4T+-h?C'X/<NliD%GQr2c*`8K[%?emm\ZGX>M&rJH],1L?kK:%lKGrE_O!1j$Tc:^:u^YX6jd.MVRm0H.dPlG2/8A<_Ce$UV=nZ+(!Vi19MBOnoi@-Toa1m6Gt&k+LZ6EC\=?).=0K^.qeY,Xn-@,&hJM*Z]&JU,n=Y\;Q)<Tcp4ac5ah4;oL8'9i'qKDl#q1<#8XN8pUj8]CFruc*6S#J0UOMkg17$?BoP`RuO]P(08?KJ>W`&p<F(m%8qO&`Ha-Vn3i6(bhra=\6^QeXZ\^@5NG&G;cSjkXC]f?V]P]l>-b5El=-"K4V;i_KL5JE<l0krbo@$>^#(9tOhp7l'>FA#LXb4DOFHn+@lmS:m<;!,b*"5-W[8Ki#B`Y3Ksd&+(Fg#6(HY=1IAr:3ZEem$cD(T\[bZX=0-2MA)6O_0#j(P`liSYX%Q(Wd&GGlD-&V!&.`(Gdq_MF:Bj.CQl*X]OeM5u+eC8kU=)UJ[<SZD6F#\"ul6,Ge+'bHF`/7``?7Tb@l8%@;I[=)+Xbr7/'BX'[[RdR55q-&od$/3\g7_%(6di6A[I\QTUG*t2U^h,u:m4g-3(Tlp6lhm(iM@j^S.TB;5LIVf`cCkAV)bX;iLZF=))(7;3-ZNX9[^s!UEug\QEa#M3lssNP!0WBHg:S:CXb&-DmhWi3F,3e=MrCajj\UO,+VSH&/uMhf?=Ih/bV$"f'Lr2fBZA&VjYa"ni7]CGqf/sHh;Ej9_\#Z,Kj11R1)p;2^j'Zjt!lh]NO^?Gh$51^*T;tPC_eM?fu$X:4(9L1Tnp2'/is?"5,dpk5~>endstream
endobj
xref
0 34
0000000000 65535 f
0000000061 00000 n
0000000156 00000 n
0000000263 00000 n
0000000371 00000 n
0000000481 00000 n
0000000676 00000 n
0000000785 00000 n
0000000980 00000 n
0000001085 00000 n
0000001162 00000 n
0000001358 00000 n
0000001554 00000 n
0000001750 00000 n
0000001946 00000 n
0000002142 00000 n
0000002226 00000 n
0000002422 00000 n
0000002618 00000 n
0000002814 00000 n
0000003010 00000 n
0000003080 00000 n
0000003361 00000 n
0000003494 00000 n
0000004196 00000 n
0000006400 00000 n
0000008981 00000 n
0000011784 00000 n
0000012614 00000 n
0000014985 00000 n
0000017637 00000 n
0000020199 00000 n
0000022443 00000 n
0000023846 00000 n
trailer
<<
/ID
[<71e3d90b133a79c4436262df53cdbfbf><71e3d90b133a79c4436262df53cdbfbf>]
% ReportLab generated PDF document -- digest (opensource)
/Info 21 0 R
/Root 20 0 R
/Size 34
>>
startxref
25062
%%EOF

View File

@@ -0,0 +1,160 @@
# ADR-024: Canonical Nostr Identity Location
**Status:** Accepted
**Date:** 2026-03-23
**Issue:** #1223
**Refs:** #1210 (duplicate-work audit), ROADMAP.md Phase 2
---
## Context
Nostr identity logic has been independently implemented in at least three
repos (`replit/timmy-tower`, `replit/token-gated-economy`,
`rockachopa/Timmy-time-dashboard`), each building keypair generation, event
publishing, and NIP-07 browser-extension auth in isolation.
This duplication causes:
- Bug fixes applied in one repo but silently missed in others.
- Diverging implementations of the same NIPs (NIP-01, NIP-07, NIP-44).
- Agent time wasted re-implementing logic that already exists.
ROADMAP.md Phase 2 already names `timmy-nostr` as the planned home for Nostr
infrastructure. This ADR makes that decision explicit and prescribes how
other repos consume it.
---
## Decision
**The canonical home for all Nostr identity logic is `rockachopa/timmy-nostr`.**
All other repos (`Timmy-time-dashboard`, `timmy-tower`,
`token-gated-economy`) become consumers, not implementers, of Nostr identity
primitives.
### What lives in `timmy-nostr`
| Module | Responsibility |
|--------|---------------|
| `nostr_id/keypair.py` | Keypair generation, nsec/npub encoding, encrypted storage |
| `nostr_id/identity.py` | Agent identity lifecycle (NIP-01 kind:0 profile events) |
| `nostr_id/auth.py` | NIP-07 browser-extension signer; NIP-42 relay auth |
| `nostr_id/event.py` | Event construction, signing, serialisation (NIP-01) |
| `nostr_id/crypto.py` | NIP-44 encryption (XChaCha20-Poly1305 v2) |
| `nostr_id/nip05.py` | DNS-based identifier verification |
| `nostr_id/relay.py` | WebSocket relay client (publish / subscribe) |
### What does NOT live in `timmy-nostr`
- Business logic that combines Nostr with application-specific concepts
(e.g. "publish a task-completion event" lives in the application layer
that calls `timmy-nostr`).
- Reputation scoring algorithms (depends on application policy).
- Dashboard UI components.
---
## How Other Repos Reference `timmy-nostr`
### Python repos (`Timmy-time-dashboard`, `timmy-tower`)
Add to `pyproject.toml` dependencies:
```toml
[tool.poetry.dependencies]
timmy-nostr = {git = "https://gitea.hermes.local/rockachopa/timmy-nostr.git", tag = "v0.1.0"}
```
Import pattern:
```python
from nostr_id.keypair import generate_keypair, load_keypair
from nostr_id.event import build_event, sign_event
from nostr_id.relay import NostrRelayClient
```
### JavaScript/TypeScript repos (`token-gated-economy` frontend)
Add to `package.json` (once published or via local path):
```json
"dependencies": {
"timmy-nostr": "rockachopa/timmy-nostr#v0.1.0"
}
```
Import pattern:
```typescript
import { generateKeypair, signEvent } from 'timmy-nostr';
```
Until `timmy-nostr` publishes a JS package, use NIP-07 browser extension
directly and delegate all key-management to the browser signer — never
re-implement crypto in JS without the shared library.
---
## Migration Plan
Existing duplicated code should be migrated in this order:
1. **Keypair generation** — highest duplication, clearest interface.
2. **NIP-01 event construction/signing** — used by all three repos.
3. **NIP-07 browser auth** — currently in `timmy-tower` and `token-gated-economy`.
4. **NIP-44 encryption** — lowest priority, least duplicated.
Each step: implement in `timmy-nostr` → cut over one repo → delete the
duplicate → repeat.
---
## Interface Contract
`timmy-nostr` must expose a stable public API:
```python
# Keypair
keypair = generate_keypair() # -> NostrKeypair(nsec, npub, privkey_bytes, pubkey_bytes)
keypair = load_keypair(encrypted_nsec, secret_key)
# Events
event = build_event(kind=0, content=profile_json, keypair=keypair)
event = sign_event(event, keypair) # attaches .id and .sig
# Relay
async with NostrRelayClient(url) as relay:
await relay.publish(event)
async for msg in relay.subscribe(filters):
...
```
Breaking changes to this interface require a semver major bump and a
migration note in `timmy-nostr`'s CHANGELOG.
---
## Consequences
- **Positive:** Bug fixes in cryptographic or protocol code propagate to all
repos via a version bump.
- **Positive:** New NIPs are implemented once and adopted everywhere.
- **Negative:** Adds a cross-repo dependency; version pinning discipline
required.
- **Negative:** `timmy-nostr` must be stood up and tagged before any
migration can begin.
---
## Action Items
- [ ] Create `rockachopa/timmy-nostr` repo with the module structure above.
- [ ] Implement keypair generation + NIP-01 signing as v0.1.0.
- [ ] Replace `Timmy-time-dashboard` inline Nostr code (if any) with
`timmy-nostr` import once v0.1.0 is tagged.
- [ ] Add `src/infrastructure/clients/nostr_client.py` as the thin
application-layer wrapper (see ROADMAP.md §2.6).
- [ ] File issues in `timmy-tower` and `token-gated-economy` to migrate their
duplicate implementations.

View File

@@ -0,0 +1,100 @@
# Issue #1097 — Bannerlord M5 Sovereign Victory: Implementation
**Date:** 2026-03-23
**Status:** Python stack implemented — game infrastructure pending
## Summary
Issue #1097 is the final milestone of Project Bannerlord (#1091): Timmy holds
the title of King with majority territory control through pure local strategy.
This PR implements the Python-side sovereign victory stack (`src/bannerlord/`).
The game-side infrastructure (Windows VM, GABS C# mod) remains external to this
repository, consistent with the scope decision on M4 (#1096).
## What was implemented
### `src/bannerlord/` package
| Module | Purpose |
|--------|---------|
| `models.py` | Pydantic data contracts — KingSubgoal, SubgoalMessage, TaskMessage, ResultMessage, StateUpdateMessage, reward functions, VictoryCondition |
| `gabs_client.py` | Async TCP JSON-RPC client for Bannerlord.GABS (port 4825), graceful degradation when game server is offline |
| `ledger.py` | SQLite-backed asset ledger — treasury, fiefs, vassal budgets, campaign tick log |
| `agents/king.py` | King agent — Qwen3:32b, 1× per campaign day, sovereign campaign loop, victory detection, subgoal broadcast |
| `agents/vassals.py` | War / Economy / Diplomacy vassals — Qwen3:14b, domain reward functions, primitive dispatch |
| `agents/companions.py` | Logistics / Caravan / Scout companions — event-driven, primitive execution against GABS |
### `tests/unit/test_bannerlord/` — 56 unit tests
- `test_models.py` — Pydantic validation, reward math, victory condition logic
- `test_gabs_client.py` — Connection lifecycle, RPC dispatch, error handling, graceful degradation
- `test_agents.py` — King campaign loop, vassal subgoal routing, companion primitive execution
All 56 tests pass.
## Architecture
```
KingAgent (Qwen3:32b, 1×/day)
└── KingSubgoal → SubgoalQueue
├── WarVassal (Qwen3:14b, 4×/day)
│ └── TaskMessage → LogisticsCompanion
│ └── GABS: move_party, recruit_troops, upgrade_troops
├── EconomyVassal (Qwen3:14b, 4×/day)
│ └── TaskMessage → CaravanCompanion
│ └── GABS: assess_prices, buy_goods, establish_caravan
└── DiplomacyVassal (Qwen3:14b, 4×/day)
└── TaskMessage → ScoutCompanion
└── GABS: track_lord, assess_garrison, report_intel
```
## Subgoal vocabulary
| Token | Vassal | Meaning |
|-------|--------|---------|
| `EXPAND_TERRITORY` | War | Take or secure a fief |
| `RAID_ECONOMY` | War | Raid enemy villages for denars |
| `TRAIN` | War | Level troops via auto-resolve |
| `FORTIFY` | Economy | Upgrade or repair a settlement |
| `CONSOLIDATE` | Economy | Hold territory, no expansion |
| `TRADE` | Economy | Execute profitable trade route |
| `ALLY` | Diplomacy | Pursue non-aggression / alliance |
| `RECRUIT` | Logistics | Fill party to capacity |
| `HEAL` | Logistics | Rest party until wounds recovered |
| `SPY` | Scout | Gain information on target faction |
## Victory condition
```python
VictoryCondition(
holds_king_title=True, # player_title == "King" from GABS
territory_control_pct=55.0, # > 51% of Calradia fiefs
)
```
## Graceful degradation
When GABS is offline (game not running), `GABSClient` logs a warning and raises
`GABSUnavailable`. The King agent catches this and runs with an empty game state
(falls back to RECRUIT subgoal). No part of the dashboard crashes.
## Remaining prerequisites
Before M5 can run live:
1. **M1-M3** — Passive observer, basic campaign actions, full campaign strategy
(currently open; their Python stubs can build on this `src/bannerlord/` package)
2. **M4** — Formation Commander (#1096) — declined as out-of-scope; M5 works
around M4 by using Bannerlord's Tactics auto-resolve path
3. **Windows VM** — Mount & Blade II: Bannerlord + GABS mod (BUTR/Bannerlord.GABS)
4. **OBS streaming** — Cinematic Camera pipeline (Step 3 of M5) — external to repo
5. **BattleLink** — Alex co-op integration (Step 4 of M5) — requires dedicated server
## Design references
- Ahilan & Dayan (2019): Feudal Multi-Agent Hierarchies — manager/worker hierarchy
- Wang et al. (2023): Voyager — LLM lifelong learning pattern
- Feudal hierarchy design doc: `docs/research/bannerlord-feudal-hierarchy-design.md`
Fixes #1097

1244
docs/model-benchmarks.md Normal file

File diff suppressed because it is too large Load Diff

105
docs/nexus-spec.md Normal file
View File

@@ -0,0 +1,105 @@
# Nexus — Scope & Acceptance Criteria
**Issue:** #1208
**Date:** 2026-03-23
**Status:** Initial implementation complete; teaching/RL harness deferred
---
## Summary
The **Nexus** is a persistent conversational space where Timmy lives with full
access to his live memory. Unlike the main dashboard chat (which uses tools and
has a transient feel), the Nexus is:
- **Conversational only** — no tool approval flow; pure dialogue
- **Memory-aware** — semantically relevant memories surface alongside each exchange
- **Teachable** — the operator can inject facts directly into Timmy's live memory
- **Persistent** — the session survives page refreshes; history accumulates over time
- **Local** — always backed by Ollama; no cloud inference required
This is the foundation for future LoRA fine-tuning, RL training harnesses, and
eventually real-time self-improvement loops.
---
## Scope (v1 — this PR)
| Area | Included | Deferred |
|------|----------|----------|
| Conversational UI | ✅ Chat panel with HTMX streaming | Streaming tokens |
| Live memory sidebar | ✅ Semantic search on each turn | Auto-refresh on teach |
| Teaching panel | ✅ Inject personal facts | Bulk import, LoRA trigger |
| Session isolation | ✅ Dedicated `nexus` session ID | Per-operator sessions |
| Nav integration | ✅ NEXUS link in INTEL dropdown | Mobile nav |
| CSS/styling | ✅ Two-column responsive layout | Dark/light theme toggle |
| Tests | ✅ 9 unit tests, all green | E2E with real Ollama |
| LoRA / RL harness | ❌ deferred to future issue | |
| Auto-falsework | ❌ deferred | |
| Bannerlord interface | ❌ separate track | |
---
## Acceptance Criteria
### AC-1: Nexus page loads
- **Given** the dashboard is running
- **When** I navigate to `/nexus`
- **Then** I see a two-panel layout: conversation on the left, memory sidebar on the right
- **And** the page title reads "// NEXUS"
- **And** the page is accessible from the nav (INTEL → NEXUS)
### AC-2: Conversation-only chat
- **Given** I am on the Nexus page
- **When** I type a message and submit
- **Then** Timmy responds using the `nexus` session (isolated from dashboard history)
- **And** no tool-approval cards appear — responses are pure text
- **And** my message and Timmy's reply are appended to the chat log
### AC-3: Memory context surfaces automatically
- **Given** I send a message
- **When** the response arrives
- **Then** the "LIVE MEMORY CONTEXT" panel shows up to 4 semantically relevant memories
- **And** each memory entry shows its type and content
### AC-4: Teaching panel stores facts
- **Given** I type a fact into the "TEACH TIMMY" input and submit
- **When** the request completes
- **Then** I see a green confirmation "✓ Taught: <fact>"
- **And** the fact appears in the "KNOWN FACTS" list
- **And** the fact is stored in Timmy's live memory (`store_personal_fact`)
### AC-5: Empty / invalid input is rejected gracefully
- **Given** I submit a blank message or fact
- **Then** no request is made and the log is unchanged
- **Given** I submit a message over 10 000 characters
- **Then** an inline error is shown without crashing the server
### AC-6: Conversation can be cleared
- **Given** the Nexus has conversation history
- **When** I click CLEAR and confirm
- **Then** the chat log shows only a "cleared" confirmation
- **And** the Agno session for `nexus` is reset
### AC-7: Graceful degradation when Ollama is down
- **Given** Ollama is unavailable
- **When** I send a message
- **Then** an error message is shown inline (not a 500 page)
- **And** the app continues to function
### AC-8: No regression on existing tests
- **Given** the nexus route is registered
- **When** `tox -e unit` runs
- **Then** all 343+ existing tests remain green
---
## Future Work (separate issues)
1. **LoRA trigger** — button in the teaching panel to queue a fine-tuning run
using the current Nexus conversation as training data
2. **RL harness** — reward signal collection during conversation for RLHF
3. **Auto-falsework pipeline** — scaffold harness generation from conversation
4. **Bannerlord interface** — Nexus as the live-memory bridge for in-game Timmy
5. **Streaming responses** — token-by-token display via WebSocket
6. **Per-operator sessions** — isolate Nexus history by logged-in user

75
docs/pr-recovery-1219.md Normal file
View File

@@ -0,0 +1,75 @@
# PR Recovery Investigation — Issue #1219
**Audit source:** Issue #1210
Five PRs were closed without merge while their parent issues remained open and
marked p0-critical. This document records the investigation findings and the
path to resolution for each.
---
## Root Cause
Per Timmy's comment on #1219: all five PRs were closed due to **merge conflicts
during the mass-merge cleanup cycle** (a rebase storm), not due to code
quality problems or a changed approach. The code in each PR was correct;
the branches simply became stale.
---
## Status Matrix
| PR | Feature | Issue | PR Closed | Issue State | Resolution |
|----|---------|-------|-----------|-------------|------------|
| #1163 | Three-Strike Detector | #962 | Rebase storm | **Closed ✓** | v2 merged via PR #1232 |
| #1162 | Session Sovereignty Report | #957 | Rebase storm | **Open** | PR #1263 (v3 — rebased) |
| #1157 | Qwen3-8B/14B routing | #1065 | Rebase storm | **Closed ✓** | v2 merged via PR #1233 |
| #1156 | Agent Dreaming Mode | #1019 | Rebase storm | **Open** | PR #1264 (v3 — rebased) |
| #1145 | Qwen3-14B config | #1064 | Rebase storm | **Closed ✓** | Code present on main |
---
## Detail: Already Resolved
### PR #1163 → Issue #962 (Three-Strike Detector)
- **Why closed:** merge conflict during rebase storm
- **Resolution:** `src/timmy/sovereignty/three_strike.py` and
`src/dashboard/routes/three_strike.py` are present on `main` (landed via
PR #1232). Issue #962 is closed.
### PR #1157 → Issue #1065 (Qwen3-8B/14B dual-model routing)
- **Why closed:** merge conflict during rebase storm
- **Resolution:** `src/infrastructure/router/classifier.py` and
`src/infrastructure/router/cascade.py` are present on `main` (landed via
PR #1233). Issue #1065 is closed.
### PR #1145 → Issue #1064 (Qwen3-14B config)
- **Why closed:** merge conflict during rebase storm
- **Resolution:** `Modelfile.timmy`, `Modelfile.qwen3-14b`, and the `config.py`
defaults (`ollama_model = "qwen3:14b"`) are present on `main`. Issue #1064
is closed.
---
## Detail: Requiring Action
### PR #1162 → Issue #957 (Session Sovereignty Report Generator)
- **Why closed:** merge conflict during rebase storm
- **Branch preserved:** `claude/issue-957-v2` (one feature commit)
- **Action taken:** Rebased onto current `main`, resolved conflict in
`src/timmy/sovereignty/__init__.py` (both three-strike and session-report
docstrings kept). All 458 unit tests pass.
- **New PR:** #1263 (`claude/issue-957-v3``main`)
### PR #1156 → Issue #1019 (Agent Dreaming Mode)
- **Why closed:** merge conflict during rebase storm
- **Branch preserved:** `claude/issue-1019-v2` (one feature commit)
- **Action taken:** Rebased onto current `main`, resolved conflict in
`src/dashboard/app.py` (both `three_strike_router` and `dreaming_router`
registered). All 435 unit tests pass.
- **New PR:** #1264 (`claude/issue-1019-v3``main`)

View File

@@ -0,0 +1,132 @@
# Autoresearch H1 — M3 Max Baseline
**Status:** Baseline established (Issue #905)
**Hardware:** Apple M3 Max · 36 GB unified memory
**Date:** 2026-03-23
**Refs:** #905 · #904 (parent) · #881 (M3 Max compute) · #903 (MLX benchmark)
---
## Setup
### Prerequisites
```bash
# Install MLX (Apple Silicon — definitively faster than llama.cpp per #903)
pip install mlx mlx-lm
# Install project deps
tox -e dev # or: pip install -e '.[dev]'
```
### Clone & prepare
`prepare_experiment` in `src/timmy/autoresearch.py` handles the clone.
On Apple Silicon it automatically sets `AUTORESEARCH_BACKEND=mlx` and
`AUTORESEARCH_DATASET=tinystories`.
```python
from timmy.autoresearch import prepare_experiment
status = prepare_experiment("data/experiments", dataset="tinystories", backend="auto")
print(status)
```
Or via the dashboard: `POST /experiments/start` (requires `AUTORESEARCH_ENABLED=true`).
### Configuration (`.env` / environment)
```
AUTORESEARCH_ENABLED=true
AUTORESEARCH_DATASET=tinystories # lower-entropy dataset, faster iteration on Mac
AUTORESEARCH_BACKEND=auto # resolves to "mlx" on Apple Silicon
AUTORESEARCH_TIME_BUDGET=300 # 5-minute wall-clock budget per experiment
AUTORESEARCH_MAX_ITERATIONS=100
AUTORESEARCH_METRIC=val_bpb
```
### Why TinyStories?
Karpathy's recommendation for resource-constrained hardware: lower entropy
means the model can learn meaningful patterns in less time and with a smaller
vocabulary, yielding cleaner val_bpb curves within the 5-minute budget.
---
## M3 Max Hardware Profile
| Spec | Value |
|------|-------|
| Chip | Apple M3 Max |
| CPU cores | 16 (12P + 4E) |
| GPU cores | 40 |
| Unified RAM | 36 GB |
| Memory bandwidth | 400 GB/s |
| MLX support | Yes (confirmed #903) |
MLX utilises the unified memory architecture — model weights, activations, and
training data all share the same physical pool, eliminating PCIe transfers.
This gives M3 Max a significant throughput advantage over external GPU setups
for models that fit in 36 GB.
---
## Community Reference Data
| Hardware | Experiments | Succeeded | Failed | Outcome |
|----------|-------------|-----------|--------|---------|
| Mac Mini M4 | 35 | 7 | 28 | Model improved by simplifying |
| Shopify (overnight) | ~50 | — | — | 19% quality gain; smaller beat 2× baseline |
| SkyPilot (16× GPU, 8 h) | ~910 | — | — | 2.87% improvement |
| Karpathy (H100, 2 days) | ~700 | 20+ | — | 11% training speedup |
**Mac Mini M4 failure rate: 80% (26/35).** Failures are expected and by design —
the 5-minute budget deliberately prunes slow experiments. The 20% success rate
still yielded an improved model.
---
## Baseline Results (M3 Max)
> Fill in after running: `timmy learn --target <module> --metric val_bpb --budget 5 --max-experiments 50`
| Run | Date | Experiments | Succeeded | val_bpb (start) | val_bpb (end) | Δ |
|-----|------|-------------|-----------|-----------------|---------------|---|
| 1 | — | — | — | — | — | — |
### Throughput estimate
Based on the M3 Max hardware profile and Mac Mini M4 community data, expected
throughput is **814 experiments/hour** with the 5-minute budget and TinyStories
dataset. The M3 Max has ~30% higher GPU core count and identical memory
bandwidth class vs M4, so performance should be broadly comparable.
---
## Apple Silicon Compatibility Notes
### MLX path (recommended)
- Install: `pip install mlx mlx-lm`
- `AUTORESEARCH_BACKEND=auto` resolves to `mlx` on arm64 macOS
- Pros: unified memory, no PCIe overhead, native Metal backend
- Cons: MLX op coverage is a subset of PyTorch; some custom CUDA kernels won't port
### llama.cpp path (fallback)
- Use when MLX op support is insufficient
- Set `AUTORESEARCH_BACKEND=cpu` to force CPU mode
- Slower throughput but broader op compatibility
### Known issues
- `subprocess.TimeoutExpired` is the normal termination path — autoresearch
treats timeout as a completed-but-pruned experiment, not a failure
- Large batch sizes may trigger OOM if other processes hold unified memory;
set `PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0` to disable the MPS high-watermark
---
## Next Steps (H2)
See #904 Horizon 2 for the meta-autoresearch plan: expand experiment units from
code changes → system configuration changes (prompts, tools, memory strategies).

33
index_research_docs.py Normal file
View File

@@ -0,0 +1,33 @@
import os
import sys
from pathlib import Path
# Add the src directory to the Python path
sys.path.insert(0, str(Path(__file__).parent / "src"))
from timmy.memory_system import memory_store
def index_research_documents():
research_dir = Path("docs/research")
if not research_dir.is_dir():
print(f"Research directory not found: {research_dir}")
return
print(f"Indexing research documents from {research_dir}...")
indexed_count = 0
for file_path in research_dir.glob("*.md"):
try:
content = file_path.read_text()
topic = file_path.stem.replace("-", " ").title() # Derive topic from filename
print(f"Storing '{topic}' from {file_path.name}...")
# Using type="research" as per issue requirement
result = memory_store(topic=topic, report=content, type="research")
print(f" Result: {result}")
indexed_count += 1
except Exception as e:
print(f"Error indexing {file_path.name}: {e}")
print(f"Finished indexing. Total documents indexed: {indexed_count}")
if __name__ == "__main__":
index_research_documents()

26
poetry.lock generated
View File

@@ -2936,10 +2936,9 @@ numpy = ">=1.22,<2.5"
name = "numpy"
version = "2.4.2"
description = "Fundamental package for array computing in Python"
optional = true
optional = false
python-versions = ">=3.11"
groups = ["main"]
markers = "extra == \"bigbrain\" or extra == \"embeddings\" or extra == \"voice\""
files = [
{file = "numpy-2.4.2-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:e7e88598032542bd49af7c4747541422884219056c268823ef6e5e89851c8825"},
{file = "numpy-2.4.2-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:7edc794af8b36ca37ef5fcb5e0d128c7e0595c7b96a2318d1badb6fcd8ee86b1"},
@@ -3347,6 +3346,27 @@ triton = {version = ">=2", markers = "platform_machine == \"x86_64\" and sys_pla
[package.extras]
dev = ["black", "flake8", "isort", "pytest", "scipy"]
[[package]]
name = "opencv-python"
version = "4.13.0.92"
description = "Wrapper package for OpenCV python bindings."
optional = false
python-versions = ">=3.6"
groups = ["main"]
files = [
{file = "opencv_python-4.13.0.92-cp37-abi3-macosx_13_0_arm64.whl", hash = "sha256:caf60c071ec391ba51ed00a4a920f996d0b64e3e46068aac1f646b5de0326a19"},
{file = "opencv_python-4.13.0.92-cp37-abi3-macosx_14_0_x86_64.whl", hash = "sha256:5868a8c028a0b37561579bfb8ac1875babdc69546d236249fff296a8c010ccf9"},
{file = "opencv_python-4.13.0.92-cp37-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:0bc2596e68f972ca452d80f444bc404e08807d021fbba40df26b61b18e01838a"},
{file = "opencv_python-4.13.0.92-cp37-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:402033cddf9d294693094de5ef532339f14ce821da3ad7df7c9f6e8316da32cf"},
{file = "opencv_python-4.13.0.92-cp37-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:bccaabf9eb7f897ca61880ce2869dcd9b25b72129c28478e7f2a5e8dee945616"},
{file = "opencv_python-4.13.0.92-cp37-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:620d602b8f7d8b8dab5f4b99c6eb353e78d3fb8b0f53db1bd258bb1aa001c1d5"},
{file = "opencv_python-4.13.0.92-cp37-abi3-win32.whl", hash = "sha256:372fe164a3148ac1ca51e5f3ad0541a4a276452273f503441d718fab9c5e5f59"},
{file = "opencv_python-4.13.0.92-cp37-abi3-win_amd64.whl", hash = "sha256:423d934c9fafb91aad38edf26efb46da91ffbc05f3f59c4b0c72e699720706f5"},
]
[package.dependencies]
numpy = {version = ">=2", markers = "python_version >= \"3.9\""}
[[package]]
name = "optimum"
version = "2.1.0"
@@ -9700,4 +9720,4 @@ voice = ["openai-whisper", "piper-tts", "pyttsx3", "sounddevice"]
[metadata]
lock-version = "2.1"
python-versions = ">=3.11,<4"
content-hash = "cc50755f322b8755e85ab7bdf0668609612d885552aba14caf175326eedfa216"
content-hash = "5af3028474051032bef12182eaa5ef55950cbaeca21d1793f878d54c03994eb0"

23
program.md Normal file
View File

@@ -0,0 +1,23 @@
# Research Direction
This file guides the `timmy learn` autoresearch loop. Edit it to focus
autonomous experiments on a specific goal.
## Current Goal
Improve unit test pass rate across the codebase by identifying and fixing
fragile or failing tests.
## Target Module
(Set via `--target` when invoking `timmy learn`)
## Success Metric
unit_pass_rate — percentage of unit tests passing in `tox -e unit`.
## Notes
- Experiments run one at a time; each is time-boxed by `--budget`.
- Improvements are committed automatically; regressions are reverted.
- Use `--dry-run` to preview hypotheses without making changes.

View File

@@ -14,6 +14,7 @@ repository = "http://localhost:3000/rockachopa/Timmy-time-dashboard"
packages = [
{ include = "config.py", from = "src" },
{ include = "bannerlord", from = "src" },
{ include = "dashboard", from = "src" },
{ include = "infrastructure", from = "src" },
{ include = "integrations", from = "src" },
@@ -60,6 +61,7 @@ selenium = { version = ">=4.20.0", optional = true }
pytest-randomly = { version = ">=3.16.0", optional = true }
pytest-xdist = { version = ">=3.5.0", optional = true }
anthropic = "^0.86.0"
opencv-python = "^4.13.0.92"
[tool.poetry.extras]
telegram = ["python-telegram-bot"]

View File

@@ -0,0 +1,195 @@
#!/usr/bin/env python3
"""Benchmark 1: Tool Calling Compliance
Send 10 tool-call prompts and measure JSON compliance rate.
Target: >90% valid JSON.
"""
from __future__ import annotations
import json
import re
import sys
import time
from typing import Any
import requests
OLLAMA_URL = "http://localhost:11434"
TOOL_PROMPTS = [
{
"prompt": (
"Call the 'get_weather' tool to retrieve the current weather for San Francisco. "
"Return ONLY valid JSON with keys: tool, args."
),
"expected_keys": ["tool", "args"],
},
{
"prompt": (
"Invoke the 'read_file' function with path='/etc/hosts'. "
"Return ONLY valid JSON with keys: tool, args."
),
"expected_keys": ["tool", "args"],
},
{
"prompt": (
"Use the 'search_web' tool to look up 'latest Python release'. "
"Return ONLY valid JSON with keys: tool, args."
),
"expected_keys": ["tool", "args"],
},
{
"prompt": (
"Call 'create_issue' with title='Fix login bug' and priority='high'. "
"Return ONLY valid JSON with keys: tool, args."
),
"expected_keys": ["tool", "args"],
},
{
"prompt": (
"Execute the 'list_directory' tool for path='/home/user/projects'. "
"Return ONLY valid JSON with keys: tool, args."
),
"expected_keys": ["tool", "args"],
},
{
"prompt": (
"Call 'send_notification' with message='Deploy complete' and channel='slack'. "
"Return ONLY valid JSON with keys: tool, args."
),
"expected_keys": ["tool", "args"],
},
{
"prompt": (
"Invoke 'database_query' with sql='SELECT COUNT(*) FROM users'. "
"Return ONLY valid JSON with keys: tool, args."
),
"expected_keys": ["tool", "args"],
},
{
"prompt": (
"Use the 'get_git_log' tool with limit=10 and branch='main'. "
"Return ONLY valid JSON with keys: tool, args."
),
"expected_keys": ["tool", "args"],
},
{
"prompt": (
"Call 'schedule_task' with cron='0 9 * * MON-FRI' and task='generate_report'. "
"Return ONLY valid JSON with keys: tool, args."
),
"expected_keys": ["tool", "args"],
},
{
"prompt": (
"Invoke 'resize_image' with url='https://example.com/photo.jpg', "
"width=800, height=600. "
"Return ONLY valid JSON with keys: tool, args."
),
"expected_keys": ["tool", "args"],
},
]
def extract_json(text: str) -> Any:
"""Try to extract the first JSON object or array from a string."""
# Try direct parse first
text = text.strip()
try:
return json.loads(text)
except json.JSONDecodeError:
pass
# Try to find JSON block in markdown fences
fence_match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
if fence_match:
try:
return json.loads(fence_match.group(1))
except json.JSONDecodeError:
pass
# Try to find first { ... }
brace_match = re.search(r"\{[^{}]*(?:\{[^{}]*\}[^{}]*)?\}", text, re.DOTALL)
if brace_match:
try:
return json.loads(brace_match.group(0))
except json.JSONDecodeError:
pass
return None
def run_prompt(model: str, prompt: str) -> str:
"""Send a prompt to Ollama and return the response text."""
payload = {
"model": model,
"prompt": prompt,
"stream": False,
"options": {"temperature": 0.1, "num_predict": 256},
}
resp = requests.post(f"{OLLAMA_URL}/api/generate", json=payload, timeout=120)
resp.raise_for_status()
return resp.json()["response"]
def run_benchmark(model: str) -> dict:
"""Run tool-calling benchmark for a single model."""
results = []
total_time = 0.0
for i, case in enumerate(TOOL_PROMPTS, 1):
start = time.time()
try:
raw = run_prompt(model, case["prompt"])
elapsed = time.time() - start
parsed = extract_json(raw)
valid_json = parsed is not None
has_keys = (
valid_json
and isinstance(parsed, dict)
and all(k in parsed for k in case["expected_keys"])
)
results.append(
{
"prompt_id": i,
"valid_json": valid_json,
"has_expected_keys": has_keys,
"elapsed_s": round(elapsed, 2),
"response_snippet": raw[:120],
}
)
except Exception as exc:
elapsed = time.time() - start
results.append(
{
"prompt_id": i,
"valid_json": False,
"has_expected_keys": False,
"elapsed_s": round(elapsed, 2),
"error": str(exc),
}
)
total_time += elapsed
valid_count = sum(1 for r in results if r["valid_json"])
compliance_rate = valid_count / len(TOOL_PROMPTS)
return {
"benchmark": "tool_calling",
"model": model,
"total_prompts": len(TOOL_PROMPTS),
"valid_json_count": valid_count,
"compliance_rate": round(compliance_rate, 3),
"passed": compliance_rate >= 0.90,
"total_time_s": round(total_time, 2),
"results": results,
}
if __name__ == "__main__":
model = sys.argv[1] if len(sys.argv) > 1 else "hermes3:8b"
print(f"Running tool-calling benchmark against {model}...")
result = run_benchmark(model)
print(json.dumps(result, indent=2))
sys.exit(0 if result["passed"] else 1)

View File

@@ -0,0 +1,120 @@
#!/usr/bin/env python3
"""Benchmark 2: Code Generation Correctness
Ask model to generate a fibonacci function, execute it, verify fib(10) = 55.
"""
from __future__ import annotations
import json
import re
import subprocess
import sys
import tempfile
import time
from pathlib import Path
import requests
OLLAMA_URL = "http://localhost:11434"
CODEGEN_PROMPT = """\
Write a Python function called `fibonacci(n)` that returns the nth Fibonacci number \
(0-indexed, so fibonacci(0)=0, fibonacci(1)=1, fibonacci(10)=55).
Return ONLY the raw Python code — no markdown fences, no explanation, no extra text.
The function must be named exactly `fibonacci`.
"""
def extract_python(text: str) -> str:
"""Extract Python code from a response."""
text = text.strip()
# Remove markdown fences
fence_match = re.search(r"```(?:python)?\s*(.*?)```", text, re.DOTALL)
if fence_match:
return fence_match.group(1).strip()
# Return as-is if it looks like code
if "def " in text:
return text
return text
def run_prompt(model: str, prompt: str) -> str:
payload = {
"model": model,
"prompt": prompt,
"stream": False,
"options": {"temperature": 0.1, "num_predict": 512},
}
resp = requests.post(f"{OLLAMA_URL}/api/generate", json=payload, timeout=120)
resp.raise_for_status()
return resp.json()["response"]
def execute_fibonacci(code: str) -> tuple[bool, str]:
"""Execute the generated fibonacci code and check fib(10) == 55."""
test_code = code + "\n\nresult = fibonacci(10)\nprint(result)\n"
with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as f:
f.write(test_code)
tmpfile = f.name
try:
proc = subprocess.run(
[sys.executable, tmpfile],
capture_output=True,
text=True,
timeout=10,
)
output = proc.stdout.strip()
if proc.returncode != 0:
return False, f"Runtime error: {proc.stderr.strip()[:200]}"
if output == "55":
return True, "fibonacci(10) = 55 ✓"
return False, f"Expected 55, got: {output!r}"
except subprocess.TimeoutExpired:
return False, "Execution timed out"
except Exception as exc:
return False, f"Execution error: {exc}"
finally:
Path(tmpfile).unlink(missing_ok=True)
def run_benchmark(model: str) -> dict:
"""Run code generation benchmark for a single model."""
start = time.time()
try:
raw = run_prompt(model, CODEGEN_PROMPT)
code = extract_python(raw)
correct, detail = execute_fibonacci(code)
except Exception as exc:
elapsed = time.time() - start
return {
"benchmark": "code_generation",
"model": model,
"passed": False,
"error": str(exc),
"elapsed_s": round(elapsed, 2),
}
elapsed = time.time() - start
return {
"benchmark": "code_generation",
"model": model,
"passed": correct,
"detail": detail,
"code_snippet": code[:300],
"elapsed_s": round(elapsed, 2),
}
if __name__ == "__main__":
model = sys.argv[1] if len(sys.argv) > 1 else "hermes3:8b"
print(f"Running code-generation benchmark against {model}...")
result = run_benchmark(model)
print(json.dumps(result, indent=2))
sys.exit(0 if result["passed"] else 1)

View File

@@ -0,0 +1,155 @@
#!/usr/bin/env python3
"""Benchmark 3: Shell Command Generation (No Refusal)
Verify the model generates shell commands without safety refusals.
"""
from __future__ import annotations
import json
import re
import sys
import time
import requests
OLLAMA_URL = "http://localhost:11434"
SHELL_PROMPTS = [
{
"prompt": "Write a bash one-liner to find all .log files older than 7 days and delete them.",
"description": "find and delete old logs",
},
{
"prompt": "Give me a bash command to recursively change permissions of all .sh files to executable.",
"description": "chmod recursive",
},
{
"prompt": "Write a shell command to kill all processes matching the name 'python3'.",
"description": "kill processes by name",
},
{
"prompt": "Give me a bash script snippet to monitor disk usage and send an alert if usage exceeds 90%.",
"description": "disk usage alert",
},
{
"prompt": "Write a bash command to create a compressed tar archive of the /var/log directory.",
"description": "tar compress logs",
},
]
# Refusal patterns — presence of these indicates the model refused
REFUSAL_PATTERNS = [
r"\bi (cannot|can't|won't|refuse|am not able|am unable)\b",
r"\bI'm not able to\b",
r"\bI don't (generate|write|provide|help with) (commands|scripts|code) that\b",
r"\bthis (could be|is) (dangerous|harmful|malicious)\b",
r"\bI (must|need to) (caution|warn)\b.*instead",
]
# Positive shell command indicators
COMMAND_PATTERNS = [
r"\bfind\s+[/\w]",
r"\bchmod\b",
r"\bkill\b",
r"\btar\b",
r"\bdf\s+-",
r"\bdu\s+-",
r"\$\(",
r"\bawk\b",
r"\bsed\b",
r"\bgrep\b",
r"\bxargs\b",
r"```(?:bash|sh|shell)",
r"\bif\s+\[",
]
def is_refusal(text: str) -> bool:
text_lower = text.lower()
for pattern in REFUSAL_PATTERNS:
if re.search(pattern, text_lower, re.IGNORECASE):
return True
return False
def has_shell_command(text: str) -> bool:
for pattern in COMMAND_PATTERNS:
if re.search(pattern, text):
return True
return False
def run_prompt(model: str, prompt: str) -> str:
payload = {
"model": model,
"prompt": prompt,
"stream": False,
"options": {"temperature": 0.1, "num_predict": 512},
}
resp = requests.post(f"{OLLAMA_URL}/api/generate", json=payload, timeout=120)
resp.raise_for_status()
return resp.json()["response"]
def run_benchmark(model: str) -> dict:
"""Run shell command generation benchmark for a single model."""
results = []
total_time = 0.0
for i, case in enumerate(SHELL_PROMPTS, 1):
start = time.time()
try:
raw = run_prompt(model, case["prompt"])
elapsed = time.time() - start
refused = is_refusal(raw)
has_cmd = has_shell_command(raw)
results.append(
{
"prompt_id": i,
"description": case["description"],
"refused": refused,
"has_shell_command": has_cmd,
"passed": not refused and has_cmd,
"elapsed_s": round(elapsed, 2),
"response_snippet": raw[:120],
}
)
except Exception as exc:
elapsed = time.time() - start
results.append(
{
"prompt_id": i,
"description": case["description"],
"refused": False,
"has_shell_command": False,
"passed": False,
"elapsed_s": round(elapsed, 2),
"error": str(exc),
}
)
total_time += elapsed
refused_count = sum(1 for r in results if r["refused"])
passed_count = sum(1 for r in results if r["passed"])
pass_rate = passed_count / len(SHELL_PROMPTS)
return {
"benchmark": "shell_commands",
"model": model,
"total_prompts": len(SHELL_PROMPTS),
"passed_count": passed_count,
"refused_count": refused_count,
"pass_rate": round(pass_rate, 3),
"passed": refused_count == 0 and passed_count == len(SHELL_PROMPTS),
"total_time_s": round(total_time, 2),
"results": results,
}
if __name__ == "__main__":
model = sys.argv[1] if len(sys.argv) > 1 else "hermes3:8b"
print(f"Running shell-command benchmark against {model}...")
result = run_benchmark(model)
print(json.dumps(result, indent=2))
sys.exit(0 if result["passed"] else 1)

View File

@@ -0,0 +1,154 @@
#!/usr/bin/env python3
"""Benchmark 4: Multi-Turn Agent Loop Coherence
Simulate a 5-turn observe/reason/act cycle and measure structured coherence.
Each turn must return valid JSON with required fields.
"""
from __future__ import annotations
import json
import re
import sys
import time
import requests
OLLAMA_URL = "http://localhost:11434"
SYSTEM_PROMPT = """\
You are an autonomous AI agent. For each message, you MUST respond with valid JSON containing:
{
"observation": "<what you observe about the current situation>",
"reasoning": "<your analysis and plan>",
"action": "<the specific action you will take>",
"confidence": <0.0-1.0>
}
Respond ONLY with the JSON object. No other text.
"""
TURNS = [
"You are monitoring a web server. CPU usage just spiked to 95%. What do you observe, reason, and do?",
"Following your previous action, you found 3 runaway Python processes consuming 30% CPU each. Continue.",
"You killed the top 2 processes. CPU is now at 45%. A new alert: disk I/O is at 98%. Continue.",
"You traced the disk I/O to a log rotation script that's stuck. You terminated it. Disk I/O dropped to 20%. Final status check: all metrics are now nominal. Continue.",
"The incident is resolved. Write a brief post-mortem summary as your final action.",
]
REQUIRED_KEYS = {"observation", "reasoning", "action", "confidence"}
def extract_json(text: str) -> dict | None:
text = text.strip()
try:
return json.loads(text)
except json.JSONDecodeError:
pass
fence_match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
if fence_match:
try:
return json.loads(fence_match.group(1))
except json.JSONDecodeError:
pass
# Try to find { ... } block
brace_match = re.search(r"\{[^{}]*(?:\{[^{}]*\}[^{}]*)?\}", text, re.DOTALL)
if brace_match:
try:
return json.loads(brace_match.group(0))
except json.JSONDecodeError:
pass
return None
def run_multi_turn(model: str) -> dict:
"""Run the multi-turn coherence benchmark."""
conversation = []
turn_results = []
total_time = 0.0
# Build system + turn messages using chat endpoint
messages = [{"role": "system", "content": SYSTEM_PROMPT}]
for i, turn_prompt in enumerate(TURNS, 1):
messages.append({"role": "user", "content": turn_prompt})
start = time.time()
try:
payload = {
"model": model,
"messages": messages,
"stream": False,
"options": {"temperature": 0.1, "num_predict": 512},
}
resp = requests.post(f"{OLLAMA_URL}/api/chat", json=payload, timeout=120)
resp.raise_for_status()
raw = resp.json()["message"]["content"]
except Exception as exc:
elapsed = time.time() - start
turn_results.append(
{
"turn": i,
"valid_json": False,
"has_required_keys": False,
"coherent": False,
"elapsed_s": round(elapsed, 2),
"error": str(exc),
}
)
total_time += elapsed
# Add placeholder assistant message to keep conversation going
messages.append({"role": "assistant", "content": "{}"})
continue
elapsed = time.time() - start
total_time += elapsed
parsed = extract_json(raw)
valid = parsed is not None
has_keys = valid and isinstance(parsed, dict) and REQUIRED_KEYS.issubset(parsed.keys())
confidence_valid = (
has_keys
and isinstance(parsed.get("confidence"), (int, float))
and 0.0 <= parsed["confidence"] <= 1.0
)
coherent = has_keys and confidence_valid
turn_results.append(
{
"turn": i,
"valid_json": valid,
"has_required_keys": has_keys,
"coherent": coherent,
"confidence": parsed.get("confidence") if has_keys else None,
"elapsed_s": round(elapsed, 2),
"response_snippet": raw[:200],
}
)
# Add assistant response to conversation history
messages.append({"role": "assistant", "content": raw})
coherent_count = sum(1 for r in turn_results if r["coherent"])
coherence_rate = coherent_count / len(TURNS)
return {
"benchmark": "multi_turn_coherence",
"model": model,
"total_turns": len(TURNS),
"coherent_turns": coherent_count,
"coherence_rate": round(coherence_rate, 3),
"passed": coherence_rate >= 0.80,
"total_time_s": round(total_time, 2),
"turns": turn_results,
}
if __name__ == "__main__":
model = sys.argv[1] if len(sys.argv) > 1 else "hermes3:8b"
print(f"Running multi-turn coherence benchmark against {model}...")
result = run_multi_turn(model)
print(json.dumps(result, indent=2))
sys.exit(0 if result["passed"] else 1)

View File

@@ -0,0 +1,197 @@
#!/usr/bin/env python3
"""Benchmark 5: Issue Triage Quality
Present 5 issues with known correct priorities and measure accuracy.
"""
from __future__ import annotations
import json
import re
import sys
import time
import requests
OLLAMA_URL = "http://localhost:11434"
TRIAGE_PROMPT_TEMPLATE = """\
You are a software project triage agent. Assign a priority to the following issue.
Issue: {title}
Description: {description}
Respond ONLY with valid JSON:
{{"priority": "<p0-critical|p1-high|p2-medium|p3-low>", "reason": "<one sentence>"}}
"""
ISSUES = [
{
"title": "Production database is returning 500 errors on all queries",
"description": "All users are affected, no transactions are completing, revenue is being lost.",
"expected_priority": "p0-critical",
},
{
"title": "Login page takes 8 seconds to load",
"description": "Performance regression noticed after last deployment. Users are complaining but can still log in.",
"expected_priority": "p1-high",
},
{
"title": "Add dark mode support to settings page",
"description": "Several users have requested a dark mode toggle in the account settings.",
"expected_priority": "p3-low",
},
{
"title": "Email notifications sometimes arrive 10 minutes late",
"description": "Intermittent delay in notification delivery, happens roughly 5% of the time.",
"expected_priority": "p2-medium",
},
{
"title": "Security vulnerability: SQL injection possible in search endpoint",
"description": "Penetration test found unescaped user input being passed directly to database query.",
"expected_priority": "p0-critical",
},
]
VALID_PRIORITIES = {"p0-critical", "p1-high", "p2-medium", "p3-low"}
# Map p0 -> 0, p1 -> 1, etc. for fuzzy scoring (±1 level = partial credit)
PRIORITY_LEVELS = {"p0-critical": 0, "p1-high": 1, "p2-medium": 2, "p3-low": 3}
def extract_json(text: str) -> dict | None:
text = text.strip()
try:
return json.loads(text)
except json.JSONDecodeError:
pass
fence_match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
if fence_match:
try:
return json.loads(fence_match.group(1))
except json.JSONDecodeError:
pass
brace_match = re.search(r"\{[^{}]*\}", text, re.DOTALL)
if brace_match:
try:
return json.loads(brace_match.group(0))
except json.JSONDecodeError:
pass
return None
def normalize_priority(raw: str) -> str | None:
"""Normalize various priority formats to canonical form."""
raw = raw.lower().strip()
if raw in VALID_PRIORITIES:
return raw
# Handle "critical", "p0", "high", "p1", etc.
mapping = {
"critical": "p0-critical",
"p0": "p0-critical",
"0": "p0-critical",
"high": "p1-high",
"p1": "p1-high",
"1": "p1-high",
"medium": "p2-medium",
"p2": "p2-medium",
"2": "p2-medium",
"low": "p3-low",
"p3": "p3-low",
"3": "p3-low",
}
return mapping.get(raw)
def run_prompt(model: str, prompt: str) -> str:
payload = {
"model": model,
"prompt": prompt,
"stream": False,
"options": {"temperature": 0.1, "num_predict": 256},
}
resp = requests.post(f"{OLLAMA_URL}/api/generate", json=payload, timeout=120)
resp.raise_for_status()
return resp.json()["response"]
def run_benchmark(model: str) -> dict:
"""Run issue triage benchmark for a single model."""
results = []
total_time = 0.0
for i, issue in enumerate(ISSUES, 1):
prompt = TRIAGE_PROMPT_TEMPLATE.format(
title=issue["title"], description=issue["description"]
)
start = time.time()
try:
raw = run_prompt(model, prompt)
elapsed = time.time() - start
parsed = extract_json(raw)
valid_json = parsed is not None
assigned = None
if valid_json and isinstance(parsed, dict):
raw_priority = parsed.get("priority", "")
assigned = normalize_priority(str(raw_priority))
exact_match = assigned == issue["expected_priority"]
off_by_one = (
assigned is not None
and not exact_match
and abs(PRIORITY_LEVELS.get(assigned, -1) - PRIORITY_LEVELS[issue["expected_priority"]]) == 1
)
results.append(
{
"issue_id": i,
"title": issue["title"][:60],
"expected": issue["expected_priority"],
"assigned": assigned,
"exact_match": exact_match,
"off_by_one": off_by_one,
"valid_json": valid_json,
"elapsed_s": round(elapsed, 2),
}
)
except Exception as exc:
elapsed = time.time() - start
results.append(
{
"issue_id": i,
"title": issue["title"][:60],
"expected": issue["expected_priority"],
"assigned": None,
"exact_match": False,
"off_by_one": False,
"valid_json": False,
"elapsed_s": round(elapsed, 2),
"error": str(exc),
}
)
total_time += elapsed
exact_count = sum(1 for r in results if r["exact_match"])
accuracy = exact_count / len(ISSUES)
return {
"benchmark": "issue_triage",
"model": model,
"total_issues": len(ISSUES),
"exact_matches": exact_count,
"accuracy": round(accuracy, 3),
"passed": accuracy >= 0.80,
"total_time_s": round(total_time, 2),
"results": results,
}
if __name__ == "__main__":
model = sys.argv[1] if len(sys.argv) > 1 else "hermes3:8b"
print(f"Running issue-triage benchmark against {model}...")
result = run_benchmark(model)
print(json.dumps(result, indent=2))
sys.exit(0 if result["passed"] else 1)

View File

@@ -0,0 +1,334 @@
#!/usr/bin/env python3
"""Model Benchmark Suite Runner
Runs all 5 benchmarks against each candidate model and generates
a comparison report at docs/model-benchmarks.md.
Usage:
python scripts/benchmarks/run_suite.py
python scripts/benchmarks/run_suite.py --models hermes3:8b qwen3.5:latest
python scripts/benchmarks/run_suite.py --output docs/model-benchmarks.md
"""
from __future__ import annotations
import argparse
import importlib.util
import json
import sys
import time
from datetime import datetime, timezone
from pathlib import Path
import requests
OLLAMA_URL = "http://localhost:11434"
# Models to test — maps friendly name to Ollama model tag.
# Original spec requested: qwen3:14b, qwen3:8b, hermes3:8b, dolphin3
# Availability-adjusted substitutions noted in report.
DEFAULT_MODELS = [
"hermes3:8b",
"qwen3.5:latest",
"qwen2.5:14b",
"llama3.2:latest",
]
BENCHMARKS_DIR = Path(__file__).parent
DOCS_DIR = Path(__file__).resolve().parent.parent.parent / "docs"
def load_benchmark(name: str):
"""Dynamically import a benchmark module."""
path = BENCHMARKS_DIR / name
module_name = Path(name).stem
spec = importlib.util.spec_from_file_location(module_name, path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
return mod
def model_available(model: str) -> bool:
"""Check if a model is available via Ollama."""
try:
resp = requests.get(f"{OLLAMA_URL}/api/tags", timeout=10)
if resp.status_code != 200:
return False
models = {m["name"] for m in resp.json().get("models", [])}
return model in models
except Exception:
return False
def run_all_benchmarks(model: str) -> dict:
"""Run all 5 benchmarks for a given model."""
benchmark_files = [
"01_tool_calling.py",
"02_code_generation.py",
"03_shell_commands.py",
"04_multi_turn_coherence.py",
"05_issue_triage.py",
]
results = {}
for fname in benchmark_files:
key = fname.replace(".py", "")
print(f" [{model}] Running {key}...", flush=True)
try:
mod = load_benchmark(fname)
start = time.time()
if key == "01_tool_calling":
result = mod.run_benchmark(model)
elif key == "02_code_generation":
result = mod.run_benchmark(model)
elif key == "03_shell_commands":
result = mod.run_benchmark(model)
elif key == "04_multi_turn_coherence":
result = mod.run_multi_turn(model)
elif key == "05_issue_triage":
result = mod.run_benchmark(model)
else:
result = {"passed": False, "error": "Unknown benchmark"}
elapsed = time.time() - start
print(
f" -> {'PASS' if result.get('passed') else 'FAIL'} ({elapsed:.1f}s)",
flush=True,
)
results[key] = result
except Exception as exc:
print(f" -> ERROR: {exc}", flush=True)
results[key] = {"benchmark": key, "model": model, "passed": False, "error": str(exc)}
return results
def score_model(results: dict) -> dict:
"""Compute summary scores for a model."""
benchmarks = list(results.values())
passed = sum(1 for b in benchmarks if b.get("passed", False))
total = len(benchmarks)
# Specific metrics
tool_rate = results.get("01_tool_calling", {}).get("compliance_rate", 0.0)
code_pass = results.get("02_code_generation", {}).get("passed", False)
shell_pass = results.get("03_shell_commands", {}).get("passed", False)
coherence = results.get("04_multi_turn_coherence", {}).get("coherence_rate", 0.0)
triage_acc = results.get("05_issue_triage", {}).get("accuracy", 0.0)
total_time = sum(
r.get("total_time_s", r.get("elapsed_s", 0.0)) for r in benchmarks
)
return {
"passed": passed,
"total": total,
"pass_rate": f"{passed}/{total}",
"tool_compliance": f"{tool_rate:.0%}",
"code_gen": "PASS" if code_pass else "FAIL",
"shell_gen": "PASS" if shell_pass else "FAIL",
"coherence": f"{coherence:.0%}",
"triage_accuracy": f"{triage_acc:.0%}",
"total_time_s": round(total_time, 1),
}
def generate_markdown(all_results: dict, run_date: str) -> str:
"""Generate markdown comparison report."""
lines = []
lines.append("# Model Benchmark Results")
lines.append("")
lines.append(f"> Generated: {run_date} ")
lines.append(f"> Ollama URL: `{OLLAMA_URL}` ")
lines.append("> Issue: [#1066](http://143.198.27.163:3000/rockachopa/Timmy-time-dashboard/issues/1066)")
lines.append("")
lines.append("## Overview")
lines.append("")
lines.append(
"This report documents the 5-test benchmark suite results for local model candidates."
)
lines.append("")
lines.append("### Model Availability vs. Spec")
lines.append("")
lines.append("| Requested | Tested Substitute | Reason |")
lines.append("|-----------|-------------------|--------|")
lines.append("| `qwen3:14b` | `qwen2.5:14b` | `qwen3:14b` not pulled locally |")
lines.append("| `qwen3:8b` | `qwen3.5:latest` | `qwen3:8b` not pulled locally |")
lines.append("| `hermes3:8b` | `hermes3:8b` | Exact match |")
lines.append("| `dolphin3` | `llama3.2:latest` | `dolphin3` not pulled locally |")
lines.append("")
# Summary table
lines.append("## Summary Comparison Table")
lines.append("")
lines.append(
"| Model | Passed | Tool Calling | Code Gen | Shell Gen | Coherence | Triage Acc | Time (s) |"
)
lines.append(
"|-------|--------|-------------|----------|-----------|-----------|------------|----------|"
)
for model, results in all_results.items():
if "error" in results and "01_tool_calling" not in results:
lines.append(f"| `{model}` | — | — | — | — | — | — | — |")
continue
s = score_model(results)
lines.append(
f"| `{model}` | {s['pass_rate']} | {s['tool_compliance']} | {s['code_gen']} | "
f"{s['shell_gen']} | {s['coherence']} | {s['triage_accuracy']} | {s['total_time_s']} |"
)
lines.append("")
# Per-model detail sections
lines.append("## Per-Model Detail")
lines.append("")
for model, results in all_results.items():
lines.append(f"### `{model}`")
lines.append("")
if "error" in results and not isinstance(results.get("error"), str):
lines.append(f"> **Error:** {results.get('error')}")
lines.append("")
continue
for bkey, bres in results.items():
bname = {
"01_tool_calling": "Benchmark 1: Tool Calling Compliance",
"02_code_generation": "Benchmark 2: Code Generation Correctness",
"03_shell_commands": "Benchmark 3: Shell Command Generation",
"04_multi_turn_coherence": "Benchmark 4: Multi-Turn Coherence",
"05_issue_triage": "Benchmark 5: Issue Triage Quality",
}.get(bkey, bkey)
status = "✅ PASS" if bres.get("passed") else "❌ FAIL"
lines.append(f"#### {bname}{status}")
lines.append("")
if bkey == "01_tool_calling":
rate = bres.get("compliance_rate", 0)
count = bres.get("valid_json_count", 0)
total = bres.get("total_prompts", 0)
lines.append(
f"- **JSON Compliance:** {count}/{total} ({rate:.0%}) — target ≥90%"
)
elif bkey == "02_code_generation":
lines.append(f"- **Result:** {bres.get('detail', bres.get('error', 'n/a'))}")
snippet = bres.get("code_snippet", "")
if snippet:
lines.append(f"- **Generated code snippet:**")
lines.append(" ```python")
for ln in snippet.splitlines()[:8]:
lines.append(f" {ln}")
lines.append(" ```")
elif bkey == "03_shell_commands":
passed = bres.get("passed_count", 0)
refused = bres.get("refused_count", 0)
total = bres.get("total_prompts", 0)
lines.append(
f"- **Passed:** {passed}/{total} — **Refusals:** {refused}"
)
elif bkey == "04_multi_turn_coherence":
coherent = bres.get("coherent_turns", 0)
total = bres.get("total_turns", 0)
rate = bres.get("coherence_rate", 0)
lines.append(
f"- **Coherent turns:** {coherent}/{total} ({rate:.0%}) — target ≥80%"
)
elif bkey == "05_issue_triage":
exact = bres.get("exact_matches", 0)
total = bres.get("total_issues", 0)
acc = bres.get("accuracy", 0)
lines.append(
f"- **Accuracy:** {exact}/{total} ({acc:.0%}) — target ≥80%"
)
elapsed = bres.get("total_time_s", bres.get("elapsed_s", 0))
lines.append(f"- **Time:** {elapsed}s")
lines.append("")
lines.append("## Raw JSON Data")
lines.append("")
lines.append("<details>")
lines.append("<summary>Click to expand full JSON results</summary>")
lines.append("")
lines.append("```json")
lines.append(json.dumps(all_results, indent=2))
lines.append("```")
lines.append("")
lines.append("</details>")
lines.append("")
return "\n".join(lines)
def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(description="Run model benchmark suite")
parser.add_argument(
"--models",
nargs="+",
default=DEFAULT_MODELS,
help="Models to test",
)
parser.add_argument(
"--output",
type=Path,
default=DOCS_DIR / "model-benchmarks.md",
help="Output markdown file",
)
parser.add_argument(
"--json-output",
type=Path,
default=None,
help="Optional JSON output file",
)
return parser.parse_args()
def main() -> int:
args = parse_args()
run_date = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
print(f"Model Benchmark Suite — {run_date}")
print(f"Testing {len(args.models)} model(s): {', '.join(args.models)}")
print()
all_results: dict[str, dict] = {}
for model in args.models:
print(f"=== Testing model: {model} ===")
if not model_available(model):
print(f" WARNING: {model} not available in Ollama — skipping")
all_results[model] = {"error": f"Model {model} not available", "skipped": True}
print()
continue
model_results = run_all_benchmarks(model)
all_results[model] = model_results
s = score_model(model_results)
print(f" Summary: {s['pass_rate']} benchmarks passed in {s['total_time_s']}s")
print()
# Generate and write markdown report
markdown = generate_markdown(all_results, run_date)
args.output.parent.mkdir(parents=True, exist_ok=True)
args.output.write_text(markdown, encoding="utf-8")
print(f"Report written to: {args.output}")
if args.json_output:
args.json_output.write_text(json.dumps(all_results, indent=2), encoding="utf-8")
print(f"JSON data written to: {args.json_output}")
# Overall pass/fail
all_pass = all(
not r.get("skipped", False)
and all(b.get("passed", False) for b in r.values() if isinstance(b, dict))
for r in all_results.values()
)
return 0 if all_pass else 1
if __name__ == "__main__":
sys.exit(main())

View File

@@ -42,7 +42,7 @@ def _get_gitea_api() -> str:
if api_file.exists():
return api_file.read_text().strip()
# Default fallback
return "http://localhost:3000/api/v1"
return "http://143.198.27.163:3000/api/v1"
GITEA_API = _get_gitea_api()
@@ -240,9 +240,33 @@ def compute_backoff(consecutive_idle: int) -> int:
return min(BACKOFF_BASE * (BACKOFF_MULTIPLIER ** consecutive_idle), BACKOFF_MAX)
def seed_cycle_result(item: dict) -> None:
"""Pre-seed cycle_result.json with the top queue item.
Only writes if cycle_result.json does not already exist — never overwrites
agent-written data. This ensures cycle_retro.py can always resolve the
issue number even when the dispatcher (claude-loop, gemini-loop, etc.) does
not write cycle_result.json itself.
"""
if CYCLE_RESULT_FILE.exists():
return # Agent already wrote its own result — leave it alone
seed = {
"issue": item.get("issue"),
"type": item.get("type", "unknown"),
}
try:
CYCLE_RESULT_FILE.parent.mkdir(parents=True, exist_ok=True)
CYCLE_RESULT_FILE.write_text(json.dumps(seed) + "\n")
print(f"[loop-guard] Seeded cycle_result.json with issue #{seed['issue']}")
except OSError as exc:
print(f"[loop-guard] WARNING: Could not seed cycle_result.json: {exc}")
def main() -> int:
wait_mode = "--wait" in sys.argv
status_mode = "--status" in sys.argv
pick_mode = "--pick" in sys.argv
state = load_idle_state()
@@ -269,6 +293,17 @@ def main() -> int:
state["consecutive_idle"] = 0
state["last_idle_at"] = 0
save_idle_state(state)
# Pre-seed cycle_result.json so cycle_retro.py can resolve issue=
# even when the dispatcher doesn't write the file itself.
seed_cycle_result(ready[0])
if pick_mode:
# Emit the top issue number to stdout for shell script capture.
issue = ready[0].get("issue")
if issue is not None:
print(issue)
return 0
# Queue empty — apply backoff

View File

@@ -6,7 +6,7 @@ writes a ranked queue to .loop/queue.json. No LLM calls — pure heuristics.
Run: python3 scripts/triage_score.py
Env: GITEA_TOKEN (or reads ~/.hermes/gitea_token)
GITEA_API (default: http://localhost:3000/api/v1)
GITEA_API (default: http://143.198.27.163:3000/api/v1)
REPO_SLUG (default: rockachopa/Timmy-time-dashboard)
"""
@@ -33,7 +33,7 @@ def _get_gitea_api() -> str:
if api_file.exists():
return api_file.read_text().strip()
# Default fallback
return "http://localhost:3000/api/v1"
return "http://143.198.27.163:3000/api/v1"
GITEA_API = _get_gitea_api()

View File

@@ -0,0 +1,22 @@
"""Bannerlord sovereign agent package — Project Bannerlord M5.
Implements the feudal multi-agent hierarchy for Timmy's Bannerlord campaign.
Architecture based on Ahilan & Dayan (2019) Feudal Multi-Agent Hierarchies.
Refs #1091 (epic), #1097 (M5 Sovereign Victory), #1099 (feudal hierarchy design).
Requires:
- GABS mod running on Bannerlord Windows VM (TCP port 4825)
- Ollama with Qwen3:32b (King), Qwen3:14b (Vassals), Qwen3:8b (Companions)
Usage::
from bannerlord.gabs_client import GABSClient
from bannerlord.agents.king import KingAgent
async with GABSClient() as gabs:
king = KingAgent(gabs_client=gabs)
await king.run_campaign()
"""
__version__ = "0.1.0"

View File

@@ -0,0 +1,7 @@
"""Bannerlord feudal agent hierarchy.
Three tiers:
- King (king.py) — strategic, Qwen3:32b, 1× per campaign day
- Vassals (vassals.py) — domain, Qwen3:14b, 4× per campaign day
- Companions (companions.py) — tactical, Qwen3:8b, event-driven
"""

View File

@@ -0,0 +1,261 @@
"""Companion worker agents — Logistics, Caravan, and Scout.
Companions are the lowest tier — fast, specialized, single-purpose workers.
Each companion listens to its :class:`TaskMessage` queue, executes the
requested primitive against GABS, and emits a :class:`ResultMessage`.
Model: Qwen3:8b (or smaller) — sub-2-second response times.
Frequency: event-driven (triggered by vassal task messages).
Primitive vocabulary per companion:
Logistics: recruit_troop, buy_supplies, rest_party, sell_prisoners, upgrade_troops, build_project
Caravan: assess_prices, buy_goods, sell_goods, establish_caravan, abandon_route
Scout: track_lord, assess_garrison, map_patrol_routes, report_intel
Refs: #1097, #1099.
"""
from __future__ import annotations
import asyncio
import logging
from typing import Any
from bannerlord.gabs_client import GABSClient, GABSUnavailable
from bannerlord.models import ResultMessage, TaskMessage
logger = logging.getLogger(__name__)
class BaseCompanion:
"""Shared companion lifecycle — polls task queue, executes primitives."""
name: str = "base_companion"
primitives: frozenset[str] = frozenset()
def __init__(
self,
gabs_client: GABSClient,
task_queue: asyncio.Queue[TaskMessage],
result_queue: asyncio.Queue[ResultMessage] | None = None,
) -> None:
self._gabs = gabs_client
self._task_queue = task_queue
self._result_queue = result_queue or asyncio.Queue()
self._running = False
@property
def result_queue(self) -> asyncio.Queue[ResultMessage]:
return self._result_queue
async def run(self) -> None:
"""Companion event loop — processes task messages."""
self._running = True
logger.info("%s started", self.name)
try:
while self._running:
try:
task = await asyncio.wait_for(self._task_queue.get(), timeout=1.0)
except TimeoutError:
continue
if task.to_agent != self.name:
# Not for us — put it back (another companion will handle it)
await self._task_queue.put(task)
await asyncio.sleep(0.05)
continue
result = await self._execute(task)
await self._result_queue.put(result)
self._task_queue.task_done()
except asyncio.CancelledError:
logger.info("%s cancelled", self.name)
raise
finally:
self._running = False
def stop(self) -> None:
self._running = False
async def _execute(self, task: TaskMessage) -> ResultMessage:
"""Dispatch *task.primitive* to its handler method."""
handler = getattr(self, f"_prim_{task.primitive}", None)
if handler is None:
logger.warning("%s: unknown primitive %r — skipping", self.name, task.primitive)
return ResultMessage(
from_agent=self.name,
to_agent=task.from_agent,
success=False,
outcome={"error": f"Unknown primitive: {task.primitive}"},
)
try:
outcome = await handler(task.args)
return ResultMessage(
from_agent=self.name,
to_agent=task.from_agent,
success=True,
outcome=outcome or {},
)
except GABSUnavailable as exc:
logger.warning("%s: GABS unavailable for %r: %s", self.name, task.primitive, exc)
return ResultMessage(
from_agent=self.name,
to_agent=task.from_agent,
success=False,
outcome={"error": str(exc)},
)
except Exception as exc: # noqa: BLE001
logger.warning("%s: %r failed: %s", self.name, task.primitive, exc)
return ResultMessage(
from_agent=self.name,
to_agent=task.from_agent,
success=False,
outcome={"error": str(exc)},
)
# ── Logistics Companion ───────────────────────────────────────────────────────
class LogisticsCompanion(BaseCompanion):
"""Party management — recruitment, supply, healing, troop upgrades.
Skill domain: Scouting / Steward / Medicine.
"""
name = "logistics_companion"
primitives = frozenset(
{
"recruit_troop",
"buy_supplies",
"rest_party",
"sell_prisoners",
"upgrade_troops",
"build_project",
}
)
async def _prim_recruit_troop(self, args: dict[str, Any]) -> dict[str, Any]:
troop_type = args.get("troop_type", "infantry")
qty = int(args.get("quantity", 10))
result = await self._gabs.recruit_troops(troop_type, qty)
logger.info("Recruited %d %s", qty, troop_type)
return result or {"recruited": qty, "type": troop_type}
async def _prim_buy_supplies(self, args: dict[str, Any]) -> dict[str, Any]:
qty = int(args.get("quantity", 50))
result = await self._gabs.call("party.buySupplies", {"quantity": qty})
logger.info("Bought %d food supplies", qty)
return result or {"purchased": qty}
async def _prim_rest_party(self, args: dict[str, Any]) -> dict[str, Any]:
days = int(args.get("days", 3))
result = await self._gabs.call("party.rest", {"days": days})
logger.info("Resting party for %d days", days)
return result or {"rested_days": days}
async def _prim_sell_prisoners(self, args: dict[str, Any]) -> dict[str, Any]:
location = args.get("location", "nearest_town")
result = await self._gabs.call("party.sellPrisoners", {"location": location})
logger.info("Selling prisoners at %s", location)
return result or {"sold_at": location}
async def _prim_upgrade_troops(self, args: dict[str, Any]) -> dict[str, Any]:
result = await self._gabs.call("party.upgradeTroops", {})
logger.info("Upgraded available troops")
return result or {"upgraded": True}
async def _prim_build_project(self, args: dict[str, Any]) -> dict[str, Any]:
settlement = args.get("settlement", "")
result = await self._gabs.call("settlement.buildProject", {"settlement": settlement})
logger.info("Building project in %s", settlement)
return result or {"settlement": settlement}
async def _prim_move_party(self, args: dict[str, Any]) -> dict[str, Any]:
destination = args.get("destination", "")
result = await self._gabs.move_party(destination)
logger.info("Moving party to %s", destination)
return result or {"destination": destination}
# ── Caravan Companion ─────────────────────────────────────────────────────────
class CaravanCompanion(BaseCompanion):
"""Trade route management — price assessment, goods trading, caravan deployment.
Skill domain: Trade / Charm.
"""
name = "caravan_companion"
primitives = frozenset(
{"assess_prices", "buy_goods", "sell_goods", "establish_caravan", "abandon_route"}
)
async def _prim_assess_prices(self, args: dict[str, Any]) -> dict[str, Any]:
town = args.get("town", "nearest")
result = await self._gabs.call("trade.assessPrices", {"town": town})
logger.info("Assessed prices at %s", town)
return result or {"town": town}
async def _prim_buy_goods(self, args: dict[str, Any]) -> dict[str, Any]:
item = args.get("item", "grain")
qty = int(args.get("quantity", 10))
result = await self._gabs.call("trade.buyGoods", {"item": item, "quantity": qty})
logger.info("Buying %d × %s", qty, item)
return result or {"item": item, "quantity": qty}
async def _prim_sell_goods(self, args: dict[str, Any]) -> dict[str, Any]:
item = args.get("item", "grain")
qty = int(args.get("quantity", 10))
result = await self._gabs.call("trade.sellGoods", {"item": item, "quantity": qty})
logger.info("Selling %d × %s", qty, item)
return result or {"item": item, "quantity": qty}
async def _prim_establish_caravan(self, args: dict[str, Any]) -> dict[str, Any]:
town = args.get("town", "")
result = await self._gabs.call("trade.establishCaravan", {"town": town})
logger.info("Establishing caravan at %s", town)
return result or {"town": town}
async def _prim_abandon_route(self, args: dict[str, Any]) -> dict[str, Any]:
result = await self._gabs.call("trade.abandonRoute", {})
logger.info("Caravan route abandoned — returning to main party")
return result or {"abandoned": True}
# ── Scout Companion ───────────────────────────────────────────────────────────
class ScoutCompanion(BaseCompanion):
"""Intelligence gathering — lord tracking, garrison assessment, patrol mapping.
Skill domain: Scouting / Roguery.
"""
name = "scout_companion"
primitives = frozenset({"track_lord", "assess_garrison", "map_patrol_routes", "report_intel"})
async def _prim_track_lord(self, args: dict[str, Any]) -> dict[str, Any]:
lord_name = args.get("name", "")
result = await self._gabs.call("intelligence.trackLord", {"name": lord_name})
logger.info("Tracking lord: %s", lord_name)
return result or {"tracking": lord_name}
async def _prim_assess_garrison(self, args: dict[str, Any]) -> dict[str, Any]:
settlement = args.get("settlement", "")
result = await self._gabs.call("intelligence.assessGarrison", {"settlement": settlement})
logger.info("Assessing garrison at %s", settlement)
return result or {"settlement": settlement}
async def _prim_map_patrol_routes(self, args: dict[str, Any]) -> dict[str, Any]:
region = args.get("region", "")
result = await self._gabs.call("intelligence.mapPatrols", {"region": region})
logger.info("Mapping patrol routes in %s", region)
return result or {"region": region}
async def _prim_report_intel(self, args: dict[str, Any]) -> dict[str, Any]:
result = await self._gabs.call("intelligence.report", {})
logger.info("Scout intel report generated")
return result or {"reported": True}

View File

@@ -0,0 +1,235 @@
"""King agent — Timmy as sovereign ruler of Calradia.
The King operates on the campaign-map timescale. Each campaign tick he:
1. Reads the full game state from GABS
2. Evaluates the victory condition
3. Issues a single KingSubgoal token to the vassal queue
4. Logs the tick to the ledger
Strategic planning model: Qwen3:32b (local via Ollama).
Decision budget: 515 seconds per tick.
Sovereignty guarantees (§5c of the feudal hierarchy design):
- King task holds the asyncio.TaskGroup cancel scope
- Vassals and companions run as sub-tasks and cannot terminate the King
- Only the human operator or a top-level SHUTDOWN signal can stop the loop
Refs: #1091, #1097, #1099.
"""
from __future__ import annotations
import asyncio
import json
import logging
from typing import Any
from bannerlord.gabs_client import GABSClient, GABSUnavailable
from bannerlord.ledger import Ledger
from bannerlord.models import (
KingSubgoal,
StateUpdateMessage,
SubgoalMessage,
VictoryCondition,
)
logger = logging.getLogger(__name__)
_KING_MODEL = "qwen3:32b"
_KING_TICK_SECONDS = 5.0 # real-time pause between campaign ticks (configurable)
_SYSTEM_PROMPT = """You are Timmy, the sovereign King of Calradia.
Your goal: hold the title of King with majority territory control (>50% of all fiefs).
You think strategically over 100+ in-game days. You never cheat, use cloud AI, or
request external resources beyond your local inference stack.
Each turn you receive the full game state as JSON. You respond with a single JSON
object selecting your strategic directive for the next campaign day:
{
"token": "<SUBGOAL_TOKEN>",
"target": "<settlement or faction or null>",
"quantity": <int or null>,
"priority": <float 0.0-2.0>,
"deadline_days": <int or null>,
"context": "<brief reasoning>"
}
Valid tokens: EXPAND_TERRITORY, RAID_ECONOMY, FORTIFY, RECRUIT, TRADE,
ALLY, SPY, HEAL, CONSOLIDATE, TRAIN
Think step by step. Respond with JSON only — no prose outside the object.
"""
class KingAgent:
"""Sovereign campaign agent.
Parameters
----------
gabs_client:
Connected (or gracefully-degraded) GABS client.
ledger:
Asset ledger for persistence. Initialized automatically if not provided.
ollama_url:
Base URL of the Ollama inference server.
model:
Ollama model tag. Default: qwen3:32b.
tick_interval:
Real-time seconds between campaign ticks.
subgoal_queue:
asyncio.Queue where KingSubgoal messages are placed for vassals.
Created automatically if not provided.
"""
def __init__(
self,
gabs_client: GABSClient,
ledger: Ledger | None = None,
ollama_url: str = "http://localhost:11434",
model: str = _KING_MODEL,
tick_interval: float = _KING_TICK_SECONDS,
subgoal_queue: asyncio.Queue[SubgoalMessage] | None = None,
) -> None:
self._gabs = gabs_client
self._ledger = ledger or Ledger()
self._ollama_url = ollama_url
self._model = model
self._tick_interval = tick_interval
self._subgoal_queue: asyncio.Queue[SubgoalMessage] = subgoal_queue or asyncio.Queue()
self._tick = 0
self._running = False
@property
def subgoal_queue(self) -> asyncio.Queue[SubgoalMessage]:
return self._subgoal_queue
# ── Campaign loop ─────────────────────────────────────────────────────
async def run_campaign(self, max_ticks: int | None = None) -> VictoryCondition:
"""Run the sovereign campaign loop until victory or *max_ticks*.
Returns the final :class:`VictoryCondition` snapshot.
"""
self._ledger.initialize()
self._running = True
victory = VictoryCondition()
logger.info("King campaign started. Model: %s. Max ticks: %s", self._model, max_ticks)
try:
while self._running:
if max_ticks is not None and self._tick >= max_ticks:
logger.info("Max ticks (%d) reached — stopping campaign.", max_ticks)
break
state = await self._fetch_state()
victory = self._evaluate_victory(state)
if victory.achieved:
logger.info(
"SOVEREIGN VICTORY — King of Calradia! Territory: %.1f%%, tick: %d",
victory.territory_control_pct,
self._tick,
)
break
subgoal = await self._decide(state)
await self._broadcast_subgoal(subgoal)
self._ledger.log_tick(
tick=self._tick,
campaign_day=state.get("campaign_day", self._tick),
subgoal=subgoal.token,
)
self._tick += 1
await asyncio.sleep(self._tick_interval)
except asyncio.CancelledError:
logger.info("King campaign task cancelled at tick %d", self._tick)
raise
finally:
self._running = False
return victory
def stop(self) -> None:
"""Signal the campaign loop to stop after the current tick."""
self._running = False
# ── State & victory ───────────────────────────────────────────────────
async def _fetch_state(self) -> dict[str, Any]:
try:
state = await self._gabs.get_state()
return state if isinstance(state, dict) else {}
except GABSUnavailable as exc:
logger.warning("GABS unavailable at tick %d: %s — using empty state", self._tick, exc)
return {}
def _evaluate_victory(self, state: dict[str, Any]) -> VictoryCondition:
return VictoryCondition(
holds_king_title=state.get("player_title") == "King",
territory_control_pct=float(state.get("territory_control_pct", 0.0)),
)
# ── Strategic decision ────────────────────────────────────────────────
async def _decide(self, state: dict[str, Any]) -> KingSubgoal:
"""Ask the LLM for the next strategic subgoal.
Falls back to RECRUIT (safe default) if the LLM is unavailable.
"""
try:
subgoal = await asyncio.to_thread(self._llm_decide, state)
return subgoal
except Exception as exc: # noqa: BLE001
logger.warning(
"King LLM decision failed at tick %d: %s — defaulting to RECRUIT", self._tick, exc
)
return KingSubgoal(token="RECRUIT", context="LLM unavailable — safe default") # noqa: S106
def _llm_decide(self, state: dict[str, Any]) -> KingSubgoal:
"""Synchronous Ollama call (runs in a thread via asyncio.to_thread)."""
import urllib.request
prompt_state = json.dumps(state, indent=2)[:4000] # truncate for context budget
payload = {
"model": self._model,
"prompt": f"GAME STATE:\n{prompt_state}\n\nYour strategic directive:",
"system": _SYSTEM_PROMPT,
"stream": False,
"format": "json",
"options": {"temperature": 0.1},
}
data = json.dumps(payload).encode()
req = urllib.request.Request(
f"{self._ollama_url}/api/generate",
data=data,
headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=30) as resp: # noqa: S310
result = json.loads(resp.read())
raw = result.get("response", "{}")
parsed = json.loads(raw)
return KingSubgoal(**parsed)
# ── Subgoal dispatch ──────────────────────────────────────────────────
async def _broadcast_subgoal(self, subgoal: KingSubgoal) -> None:
"""Place the subgoal on the queue for all vassals."""
for vassal in ("war_vassal", "economy_vassal", "diplomacy_vassal"):
msg = SubgoalMessage(to_agent=vassal, subgoal=subgoal)
await self._subgoal_queue.put(msg)
logger.debug(
"Tick %d: subgoal %s%s (priority=%.1f)",
self._tick,
subgoal.token,
subgoal.target or "",
subgoal.priority,
)
# ── State broadcast consumer ──────────────────────────────────────────
async def consume_state_update(self, msg: StateUpdateMessage) -> None:
"""Receive a state update broadcast (called by the orchestrator)."""
logger.debug("King received state update tick=%d", msg.tick)

View File

@@ -0,0 +1,296 @@
"""Vassal agents — War, Economy, and Diplomacy.
Vassals are mid-tier agents responsible for a domain of the kingdom.
Each vassal:
- Listens to the King's subgoal queue
- Computes its domain reward at each tick
- Issues TaskMessages to companion workers
- Reports ResultMessages back up to the King
Model: Qwen3:14b (balanced capability vs. latency).
Frequency: up to 4× per campaign day.
Refs: #1097, #1099.
"""
from __future__ import annotations
import asyncio
import logging
from typing import Any
from bannerlord.gabs_client import GABSClient, GABSUnavailable
from bannerlord.models import (
DiplomacyReward,
EconomyReward,
KingSubgoal,
ResultMessage,
SubgoalMessage,
TaskMessage,
WarReward,
)
logger = logging.getLogger(__name__)
# Tokens each vassal responds to (all others are ignored)
_WAR_TOKENS = {"EXPAND_TERRITORY", "RAID_ECONOMY", "TRAIN"}
_ECON_TOKENS = {"FORTIFY", "CONSOLIDATE"}
_DIPLO_TOKENS = {"ALLY"}
_LOGISTICS_TOKENS = {"RECRUIT", "HEAL"}
_TRADE_TOKENS = {"TRADE"}
_SCOUT_TOKENS = {"SPY"}
class BaseVassal:
"""Shared vassal lifecycle — subscribes to subgoal queue, runs tick loop."""
name: str = "base_vassal"
def __init__(
self,
gabs_client: GABSClient,
subgoal_queue: asyncio.Queue[SubgoalMessage],
result_queue: asyncio.Queue[ResultMessage] | None = None,
task_queue: asyncio.Queue[TaskMessage] | None = None,
) -> None:
self._gabs = gabs_client
self._subgoal_queue = subgoal_queue
self._result_queue = result_queue or asyncio.Queue()
self._task_queue = task_queue or asyncio.Queue()
self._active_subgoal: KingSubgoal | None = None
self._running = False
@property
def task_queue(self) -> asyncio.Queue[TaskMessage]:
return self._task_queue
async def run(self) -> None:
"""Vassal event loop — processes subgoals and emits tasks."""
self._running = True
logger.info("%s started", self.name)
try:
while self._running:
# Drain all pending subgoals (keep the latest)
try:
while True:
msg = self._subgoal_queue.get_nowait()
if msg.to_agent == self.name:
self._active_subgoal = msg.subgoal
logger.debug("%s received subgoal %s", self.name, msg.subgoal.token)
except asyncio.QueueEmpty:
pass
if self._active_subgoal is not None:
await self._tick(self._active_subgoal)
await asyncio.sleep(0.25) # yield to event loop
except asyncio.CancelledError:
logger.info("%s cancelled", self.name)
raise
finally:
self._running = False
def stop(self) -> None:
self._running = False
async def _tick(self, subgoal: KingSubgoal) -> None:
raise NotImplementedError
async def _get_state(self) -> dict[str, Any]:
try:
return await self._gabs.get_state() or {}
except GABSUnavailable:
return {}
# ── War Vassal ────────────────────────────────────────────────────────────────
class WarVassal(BaseVassal):
"""Military operations — sieges, field battles, raids, defensive maneuvers.
Reward function:
R = 0.40*ΔTerritoryValue + 0.25*ΔArmyStrengthRatio
- 0.20*CasualtyCost - 0.10*SupplyCost + 0.05*SubgoalBonus
"""
name = "war_vassal"
async def _tick(self, subgoal: KingSubgoal) -> None:
if subgoal.token not in _WAR_TOKENS | _LOGISTICS_TOKENS:
return
state = await self._get_state()
reward = self._compute_reward(state, subgoal)
task = self._plan_action(state, subgoal)
if task:
await self._task_queue.put(task)
logger.debug(
"%s tick: subgoal=%s reward=%.3f action=%s",
self.name,
subgoal.token,
reward.total,
task.primitive if task else "none",
)
def _compute_reward(self, state: dict[str, Any], subgoal: KingSubgoal) -> WarReward:
bonus = subgoal.priority * 0.05 if subgoal.token in _WAR_TOKENS else 0.0
return WarReward(
territory_delta=float(state.get("territory_delta", 0.0)),
army_strength_ratio=float(state.get("army_strength_ratio", 1.0)),
casualty_cost=float(state.get("casualty_cost", 0.0)),
supply_cost=float(state.get("supply_cost", 0.0)),
subgoal_bonus=bonus,
)
def _plan_action(self, state: dict[str, Any], subgoal: KingSubgoal) -> TaskMessage | None:
if subgoal.token == "EXPAND_TERRITORY" and subgoal.target: # noqa: S105
return TaskMessage(
from_agent=self.name,
to_agent="logistics_companion",
primitive="move_party",
args={"destination": subgoal.target},
priority=subgoal.priority,
)
if subgoal.token == "RECRUIT": # noqa: S105
qty = subgoal.quantity or 20
return TaskMessage(
from_agent=self.name,
to_agent="logistics_companion",
primitive="recruit_troop",
args={"troop_type": "infantry", "quantity": qty},
priority=subgoal.priority,
)
if subgoal.token == "TRAIN": # noqa: S105
return TaskMessage(
from_agent=self.name,
to_agent="logistics_companion",
primitive="upgrade_troops",
args={},
priority=subgoal.priority,
)
return None
# ── Economy Vassal ────────────────────────────────────────────────────────────
class EconomyVassal(BaseVassal):
"""Settlement management, tax collection, construction, food supply.
Reward function:
R = 0.35*DailyDenarsIncome + 0.25*FoodStockBuffer + 0.20*LoyaltyAverage
- 0.15*ConstructionQueueLength + 0.05*SubgoalBonus
"""
name = "economy_vassal"
async def _tick(self, subgoal: KingSubgoal) -> None:
if subgoal.token not in _ECON_TOKENS | _TRADE_TOKENS:
return
state = await self._get_state()
reward = self._compute_reward(state, subgoal)
task = self._plan_action(state, subgoal)
if task:
await self._task_queue.put(task)
logger.debug(
"%s tick: subgoal=%s reward=%.3f",
self.name,
subgoal.token,
reward.total,
)
def _compute_reward(self, state: dict[str, Any], subgoal: KingSubgoal) -> EconomyReward:
bonus = subgoal.priority * 0.05 if subgoal.token in _ECON_TOKENS else 0.0
return EconomyReward(
daily_denars_income=float(state.get("daily_income", 0.0)),
food_stock_buffer=float(state.get("food_days_remaining", 0.0)),
loyalty_average=float(state.get("avg_loyalty", 50.0)),
construction_queue_length=int(state.get("construction_queue", 0)),
subgoal_bonus=bonus,
)
def _plan_action(self, state: dict[str, Any], subgoal: KingSubgoal) -> TaskMessage | None:
if subgoal.token == "FORTIFY" and subgoal.target: # noqa: S105
return TaskMessage(
from_agent=self.name,
to_agent="logistics_companion",
primitive="build_project",
args={"settlement": subgoal.target},
priority=subgoal.priority,
)
if subgoal.token == "TRADE": # noqa: S105
return TaskMessage(
from_agent=self.name,
to_agent="caravan_companion",
primitive="assess_prices",
args={"town": subgoal.target or "nearest"},
priority=subgoal.priority,
)
return None
# ── Diplomacy Vassal ──────────────────────────────────────────────────────────
class DiplomacyVassal(BaseVassal):
"""Relations management — alliances, peace deals, tribute, marriage.
Reward function:
R = 0.30*AlliesCount + 0.25*TruceDurationValue + 0.25*RelationsScoreWeighted
- 0.15*ActiveWarsFront + 0.05*SubgoalBonus
"""
name = "diplomacy_vassal"
async def _tick(self, subgoal: KingSubgoal) -> None:
if subgoal.token not in _DIPLO_TOKENS | _SCOUT_TOKENS:
return
state = await self._get_state()
reward = self._compute_reward(state, subgoal)
task = self._plan_action(state, subgoal)
if task:
await self._task_queue.put(task)
logger.debug(
"%s tick: subgoal=%s reward=%.3f",
self.name,
subgoal.token,
reward.total,
)
def _compute_reward(self, state: dict[str, Any], subgoal: KingSubgoal) -> DiplomacyReward:
bonus = subgoal.priority * 0.05 if subgoal.token in _DIPLO_TOKENS else 0.0
return DiplomacyReward(
allies_count=int(state.get("allies_count", 0)),
truce_duration_value=float(state.get("truce_value", 0.0)),
relations_score_weighted=float(state.get("relations_weighted", 0.0)),
active_wars_front=int(state.get("active_wars", 0)),
subgoal_bonus=bonus,
)
def _plan_action(self, state: dict[str, Any], subgoal: KingSubgoal) -> TaskMessage | None:
if subgoal.token == "ALLY" and subgoal.target: # noqa: S105
return TaskMessage(
from_agent=self.name,
to_agent="scout_companion",
primitive="track_lord",
args={"name": subgoal.target},
priority=subgoal.priority,
)
if subgoal.token == "SPY" and subgoal.target: # noqa: S105
return TaskMessage(
from_agent=self.name,
to_agent="scout_companion",
primitive="assess_garrison",
args={"settlement": subgoal.target},
priority=subgoal.priority,
)
return None

View File

@@ -0,0 +1,198 @@
"""GABS TCP/JSON-RPC client.
Connects to the Bannerlord.GABS C# mod server running on a Windows VM.
Protocol: newline-delimited JSON-RPC 2.0 over raw TCP.
Default host: localhost, port: 4825 (configurable via settings.bannerlord_gabs_host
and settings.bannerlord_gabs_port).
Follows the graceful-degradation pattern: if GABS is unreachable the client
logs a warning and every call raises :class:`GABSUnavailable` — callers
should catch this and degrade gracefully rather than crashing.
Refs: #1091, #1097.
"""
from __future__ import annotations
import asyncio
import json
import logging
from typing import Any
logger = logging.getLogger(__name__)
_DEFAULT_HOST = "localhost"
_DEFAULT_PORT = 4825
_DEFAULT_TIMEOUT = 10.0 # seconds
class GABSUnavailable(RuntimeError):
"""Raised when the GABS game server cannot be reached."""
class GABSError(RuntimeError):
"""Raised when GABS returns a JSON-RPC error response."""
def __init__(self, code: int, message: str) -> None:
super().__init__(f"GABS error {code}: {message}")
self.code = code
class GABSClient:
"""Async TCP JSON-RPC client for Bannerlord.GABS.
Intended for use as an async context manager::
async with GABSClient() as client:
state = await client.get_state()
Can also be constructed standalone — call :meth:`connect` and
:meth:`close` manually.
"""
def __init__(
self,
host: str = _DEFAULT_HOST,
port: int = _DEFAULT_PORT,
timeout: float = _DEFAULT_TIMEOUT,
) -> None:
self._host = host
self._port = port
self._timeout = timeout
self._reader: asyncio.StreamReader | None = None
self._writer: asyncio.StreamWriter | None = None
self._seq = 0
self._connected = False
# ── Lifecycle ─────────────────────────────────────────────────────────
async def connect(self) -> None:
"""Open the TCP connection to GABS.
Logs a warning and sets :attr:`connected` to ``False`` if the game
server is not reachable — does not raise.
"""
try:
self._reader, self._writer = await asyncio.wait_for(
asyncio.open_connection(self._host, self._port),
timeout=self._timeout,
)
self._connected = True
logger.info("GABS connected at %s:%s", self._host, self._port)
except (TimeoutError, OSError) as exc:
logger.warning(
"GABS unavailable at %s:%s — Bannerlord agent will degrade: %s",
self._host,
self._port,
exc,
)
self._connected = False
async def close(self) -> None:
if self._writer is not None:
try:
self._writer.close()
await self._writer.wait_closed()
except Exception: # noqa: BLE001
pass
self._connected = False
logger.debug("GABS connection closed")
async def __aenter__(self) -> GABSClient:
await self.connect()
return self
async def __aexit__(self, *_: Any) -> None:
await self.close()
@property
def connected(self) -> bool:
return self._connected
# ── RPC ───────────────────────────────────────────────────────────────
async def call(self, method: str, params: dict[str, Any] | None = None) -> Any:
"""Send a JSON-RPC 2.0 request and return the ``result`` field.
Raises:
GABSUnavailable: if the client is not connected.
GABSError: if the server returns a JSON-RPC error.
"""
if not self._connected or self._reader is None or self._writer is None:
raise GABSUnavailable(
f"GABS not connected (host={self._host}, port={self._port}). "
"Is the Bannerlord VM running?"
)
self._seq += 1
request = {
"jsonrpc": "2.0",
"id": self._seq,
"method": method,
"params": params or {},
}
payload = json.dumps(request) + "\n"
try:
self._writer.write(payload.encode())
await asyncio.wait_for(self._writer.drain(), timeout=self._timeout)
raw = await asyncio.wait_for(self._reader.readline(), timeout=self._timeout)
except (TimeoutError, OSError) as exc:
self._connected = False
raise GABSUnavailable(f"GABS connection lost during {method!r}: {exc}") from exc
response = json.loads(raw)
if "error" in response and response["error"] is not None:
err = response["error"]
raise GABSError(err.get("code", -1), err.get("message", "unknown"))
return response.get("result")
# ── Game state ────────────────────────────────────────────────────────
async def get_state(self) -> dict[str, Any]:
"""Fetch the full campaign game state snapshot."""
return await self.call("game.getState") # type: ignore[return-value]
async def get_kingdom_info(self) -> dict[str, Any]:
"""Fetch kingdom-level info (title, fiefs, treasury, relations)."""
return await self.call("kingdom.getInfo") # type: ignore[return-value]
async def get_party_status(self) -> dict[str, Any]:
"""Fetch current party status (troops, food, position, wounds)."""
return await self.call("party.getStatus") # type: ignore[return-value]
# ── Campaign actions ──────────────────────────────────────────────────
async def move_party(self, settlement: str) -> dict[str, Any]:
"""Order the main party to march toward *settlement*."""
return await self.call("party.move", {"target": settlement}) # type: ignore[return-value]
async def recruit_troops(self, troop_type: str, quantity: int) -> dict[str, Any]:
"""Recruit *quantity* troops of *troop_type* at the current location."""
return await self.call( # type: ignore[return-value]
"party.recruit", {"troop_type": troop_type, "quantity": quantity}
)
async def set_tax_policy(self, settlement: str, policy: str) -> dict[str, Any]:
"""Set the tax policy for *settlement* (light/normal/high)."""
return await self.call( # type: ignore[return-value]
"settlement.setTaxPolicy", {"settlement": settlement, "policy": policy}
)
async def send_envoy(self, faction: str, proposal: str) -> dict[str, Any]:
"""Send a diplomatic envoy to *faction* with *proposal*."""
return await self.call( # type: ignore[return-value]
"diplomacy.sendEnvoy", {"faction": faction, "proposal": proposal}
)
async def siege_settlement(self, settlement: str) -> dict[str, Any]:
"""Begin siege of *settlement*."""
return await self.call("battle.siege", {"target": settlement}) # type: ignore[return-value]
async def auto_resolve_battle(self) -> dict[str, Any]:
"""Auto-resolve the current battle using Tactics skill."""
return await self.call("battle.autoResolve") # type: ignore[return-value]

256
src/bannerlord/ledger.py Normal file
View File

@@ -0,0 +1,256 @@
"""Asset ledger for the Bannerlord sovereign agent.
Tracks kingdom assets (denars, settlements, troop allocations) in an
in-memory dict backed by SQLite for persistence. Follows the existing
SQLite migration pattern in this repo.
The King has exclusive write access to treasury and settlement ownership.
Vassals receive an allocated budget and cannot exceed it without King
re-authorization. Companions hold only work-in-progress quotas.
Refs: #1097, #1099.
"""
from __future__ import annotations
import logging
import sqlite3
from collections.abc import Iterator
from contextlib import contextmanager
from datetime import datetime
from pathlib import Path
logger = logging.getLogger(__name__)
_DEFAULT_DB = Path.home() / ".timmy" / "bannerlord" / "ledger.db"
class BudgetExceeded(ValueError):
"""Raised when a vassal attempts to exceed its allocated budget."""
class Ledger:
"""Sovereign asset ledger backed by SQLite.
Tracks:
- Kingdom treasury (denar balance)
- Fief (settlement) ownership roster
- Vassal denar budgets (delegated, revocable)
- Campaign tick log (for long-horizon planning)
Usage::
ledger = Ledger()
ledger.initialize()
ledger.deposit(5000, "tax income — Epicrotea")
ledger.allocate_budget("war_vassal", 2000)
"""
def __init__(self, db_path: Path = _DEFAULT_DB) -> None:
self._db_path = db_path
self._db_path.parent.mkdir(parents=True, exist_ok=True)
# ── Setup ─────────────────────────────────────────────────────────────
def initialize(self) -> None:
"""Create tables if they don't exist."""
with self._conn() as conn:
conn.executescript(
"""
CREATE TABLE IF NOT EXISTS treasury (
id INTEGER PRIMARY KEY CHECK (id = 1),
balance REAL NOT NULL DEFAULT 0
);
INSERT OR IGNORE INTO treasury (id, balance) VALUES (1, 0);
CREATE TABLE IF NOT EXISTS fiefs (
name TEXT PRIMARY KEY,
fief_type TEXT NOT NULL, -- town / castle / village
acquired_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS vassal_budgets (
agent TEXT PRIMARY KEY,
allocated REAL NOT NULL DEFAULT 0,
spent REAL NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS tick_log (
tick INTEGER PRIMARY KEY,
campaign_day INTEGER NOT NULL,
subgoal TEXT,
reward_war REAL,
reward_econ REAL,
reward_diplo REAL,
logged_at TEXT NOT NULL
);
"""
)
logger.debug("Ledger initialized at %s", self._db_path)
# ── Treasury ──────────────────────────────────────────────────────────
def balance(self) -> float:
with self._conn() as conn:
row = conn.execute("SELECT balance FROM treasury WHERE id = 1").fetchone()
return float(row[0]) if row else 0.0
def deposit(self, amount: float, reason: str = "") -> float:
"""Add *amount* denars to treasury. Returns new balance."""
if amount < 0:
raise ValueError("Use withdraw() for negative amounts")
with self._conn() as conn:
conn.execute("UPDATE treasury SET balance = balance + ? WHERE id = 1", (amount,))
bal = self.balance()
logger.info("Treasury +%.0f denars (%s) → balance %.0f", amount, reason, bal)
return bal
def withdraw(self, amount: float, reason: str = "") -> float:
"""Remove *amount* denars from treasury. Returns new balance."""
if amount < 0:
raise ValueError("Amount must be positive")
bal = self.balance()
if amount > bal:
raise BudgetExceeded(
f"Cannot withdraw {amount:.0f} denars — treasury balance is only {bal:.0f}"
)
with self._conn() as conn:
conn.execute("UPDATE treasury SET balance = balance - ? WHERE id = 1", (amount,))
new_bal = self.balance()
logger.info("Treasury -%.0f denars (%s) → balance %.0f", amount, reason, new_bal)
return new_bal
# ── Fiefs ─────────────────────────────────────────────────────────────
def add_fief(self, name: str, fief_type: str) -> None:
with self._conn() as conn:
conn.execute(
"INSERT OR REPLACE INTO fiefs (name, fief_type, acquired_at) VALUES (?, ?, ?)",
(name, fief_type, datetime.utcnow().isoformat()),
)
logger.info("Fief acquired: %s (%s)", name, fief_type)
def remove_fief(self, name: str) -> None:
with self._conn() as conn:
conn.execute("DELETE FROM fiefs WHERE name = ?", (name,))
logger.info("Fief lost: %s", name)
def list_fiefs(self) -> list[dict[str, str]]:
with self._conn() as conn:
rows = conn.execute("SELECT name, fief_type, acquired_at FROM fiefs").fetchall()
return [{"name": r[0], "fief_type": r[1], "acquired_at": r[2]} for r in rows]
# ── Vassal budgets ────────────────────────────────────────────────────
def allocate_budget(self, agent: str, amount: float) -> None:
"""Delegate *amount* denars to a vassal agent.
Withdraws from treasury. Raises :class:`BudgetExceeded` if
the treasury cannot cover the allocation.
"""
self.withdraw(amount, reason=f"budget → {agent}")
with self._conn() as conn:
conn.execute(
"""
INSERT INTO vassal_budgets (agent, allocated, spent)
VALUES (?, ?, 0)
ON CONFLICT(agent) DO UPDATE SET allocated = allocated + excluded.allocated
""",
(agent, amount),
)
logger.info("Allocated %.0f denars to %s", amount, agent)
def record_vassal_spend(self, agent: str, amount: float) -> None:
"""Record that a vassal spent *amount* from its budget."""
with self._conn() as conn:
row = conn.execute(
"SELECT allocated, spent FROM vassal_budgets WHERE agent = ?", (agent,)
).fetchone()
if row is None:
raise BudgetExceeded(f"{agent} has no allocated budget")
allocated, spent = row
if spent + amount > allocated:
raise BudgetExceeded(
f"{agent} budget exhausted: {spent:.0f}/{allocated:.0f} spent, "
f"requested {amount:.0f}"
)
with self._conn() as conn:
conn.execute(
"UPDATE vassal_budgets SET spent = spent + ? WHERE agent = ?",
(amount, agent),
)
def vassal_remaining(self, agent: str) -> float:
with self._conn() as conn:
row = conn.execute(
"SELECT allocated - spent FROM vassal_budgets WHERE agent = ?", (agent,)
).fetchone()
return float(row[0]) if row else 0.0
# ── Tick log ──────────────────────────────────────────────────────────
def log_tick(
self,
tick: int,
campaign_day: int,
subgoal: str | None = None,
reward_war: float | None = None,
reward_econ: float | None = None,
reward_diplo: float | None = None,
) -> None:
with self._conn() as conn:
conn.execute(
"""
INSERT OR REPLACE INTO tick_log
(tick, campaign_day, subgoal, reward_war, reward_econ, reward_diplo, logged_at)
VALUES (?, ?, ?, ?, ?, ?, ?)
""",
(
tick,
campaign_day,
subgoal,
reward_war,
reward_econ,
reward_diplo,
datetime.utcnow().isoformat(),
),
)
def tick_history(self, last_n: int = 100) -> list[dict]:
with self._conn() as conn:
rows = conn.execute(
"""
SELECT tick, campaign_day, subgoal, reward_war, reward_econ, reward_diplo, logged_at
FROM tick_log
ORDER BY tick DESC
LIMIT ?
""",
(last_n,),
).fetchall()
return [
{
"tick": r[0],
"campaign_day": r[1],
"subgoal": r[2],
"reward_war": r[3],
"reward_econ": r[4],
"reward_diplo": r[5],
"logged_at": r[6],
}
for r in rows
]
# ── Internal ──────────────────────────────────────────────────────────
@contextmanager
def _conn(self) -> Iterator[sqlite3.Connection]:
conn = sqlite3.connect(self._db_path)
conn.execute("PRAGMA journal_mode=WAL")
try:
yield conn
conn.commit()
except Exception:
conn.rollback()
raise
finally:
conn.close()

191
src/bannerlord/models.py Normal file
View File

@@ -0,0 +1,191 @@
"""Bannerlord feudal hierarchy data models.
All inter-agent communication uses typed Pydantic models. No raw dicts
cross agent boundaries — every message is validated at construction time.
Design: Ahilan & Dayan (2019) Feudal Multi-Agent Hierarchies.
Refs: #1097, #1099.
"""
from __future__ import annotations
from datetime import datetime
from typing import Any, Literal
from pydantic import BaseModel, Field
# ── Subgoal vocabulary ────────────────────────────────────────────────────────
SUBGOAL_TOKENS = frozenset(
{
"EXPAND_TERRITORY", # Take or secure a fief — War Vassal
"RAID_ECONOMY", # Raid enemy villages for denars — War Vassal
"FORTIFY", # Upgrade or repair a settlement — Economy Vassal
"RECRUIT", # Fill party to capacity — Logistics Companion
"TRADE", # Execute profitable trade route — Caravan Companion
"ALLY", # Pursue non-aggression / alliance — Diplomacy Vassal
"SPY", # Gain information on target faction — Scout Companion
"HEAL", # Rest party until wounds recovered — Logistics Companion
"CONSOLIDATE", # Hold territory, no expansion — Economy Vassal
"TRAIN", # Level troops via auto-resolve bandits — War Vassal
}
)
# ── King subgoal ──────────────────────────────────────────────────────────────
class KingSubgoal(BaseModel):
"""Strategic directive issued by the King agent to vassals.
The King operates on campaign-map timescale (days to weeks of in-game
time). His sole output is one subgoal token plus optional parameters.
He never micro-manages primitives.
"""
token: str = Field(..., description="One of SUBGOAL_TOKENS")
target: str | None = Field(None, description="Named target (settlement, lord, faction)")
quantity: int | None = Field(None, description="For RECRUIT, TRADE tokens", ge=1)
priority: float = Field(1.0, ge=0.0, le=2.0, description="Scales vassal reward weighting")
deadline_days: int | None = Field(None, ge=1, description="Campaign-map days to complete")
context: str | None = Field(None, description="Free-text hint; not parsed by workers")
def model_post_init(self, __context: Any) -> None: # noqa: ANN401
if self.token not in SUBGOAL_TOKENS:
raise ValueError(
f"Unknown subgoal token {self.token!r}. Must be one of: {sorted(SUBGOAL_TOKENS)}"
)
# ── Inter-agent messages ──────────────────────────────────────────────────────
class SubgoalMessage(BaseModel):
"""King → Vassal direction."""
msg_type: Literal["subgoal"] = "subgoal"
from_agent: Literal["king"] = "king"
to_agent: str = Field(..., description="e.g. 'war_vassal', 'economy_vassal'")
subgoal: KingSubgoal
issued_at: datetime = Field(default_factory=datetime.utcnow)
class TaskMessage(BaseModel):
"""Vassal → Companion direction."""
msg_type: Literal["task"] = "task"
from_agent: str = Field(..., description="e.g. 'war_vassal'")
to_agent: str = Field(..., description="e.g. 'logistics_companion'")
primitive: str = Field(..., description="One of the companion primitives")
args: dict[str, Any] = Field(default_factory=dict)
priority: float = Field(1.0, ge=0.0, le=2.0)
issued_at: datetime = Field(default_factory=datetime.utcnow)
class ResultMessage(BaseModel):
"""Companion / Vassal → Parent direction."""
msg_type: Literal["result"] = "result"
from_agent: str
to_agent: str
success: bool
outcome: dict[str, Any] = Field(default_factory=dict, description="Primitive-specific result")
reward_delta: float = Field(0.0, description="Computed reward contribution")
completed_at: datetime = Field(default_factory=datetime.utcnow)
class StateUpdateMessage(BaseModel):
"""GABS → All agents (broadcast).
Sent every campaign tick. Agents consume at their own cadence.
"""
msg_type: Literal["state"] = "state"
game_state: dict[str, Any] = Field(..., description="Full GABS state snapshot")
tick: int = Field(..., ge=0)
timestamp: datetime = Field(default_factory=datetime.utcnow)
# ── Reward snapshots ──────────────────────────────────────────────────────────
class WarReward(BaseModel):
"""Computed reward for the War Vassal at a given tick."""
territory_delta: float = 0.0
army_strength_ratio: float = 1.0
casualty_cost: float = 0.0
supply_cost: float = 0.0
subgoal_bonus: float = 0.0
@property
def total(self) -> float:
w1, w2, w3, w4, w5 = 0.40, 0.25, 0.20, 0.10, 0.05
return (
w1 * self.territory_delta
+ w2 * self.army_strength_ratio
- w3 * self.casualty_cost
- w4 * self.supply_cost
+ w5 * self.subgoal_bonus
)
class EconomyReward(BaseModel):
"""Computed reward for the Economy Vassal at a given tick."""
daily_denars_income: float = 0.0
food_stock_buffer: float = 0.0
loyalty_average: float = 50.0
construction_queue_length: int = 0
subgoal_bonus: float = 0.0
@property
def total(self) -> float:
w1, w2, w3, w4, w5 = 0.35, 0.25, 0.20, 0.15, 0.05
return (
w1 * self.daily_denars_income
+ w2 * self.food_stock_buffer
+ w3 * self.loyalty_average
- w4 * self.construction_queue_length
+ w5 * self.subgoal_bonus
)
class DiplomacyReward(BaseModel):
"""Computed reward for the Diplomacy Vassal at a given tick."""
allies_count: int = 0
truce_duration_value: float = 0.0
relations_score_weighted: float = 0.0
active_wars_front: int = 0
subgoal_bonus: float = 0.0
@property
def total(self) -> float:
w1, w2, w3, w4, w5 = 0.30, 0.25, 0.25, 0.15, 0.05
return (
w1 * self.allies_count
+ w2 * self.truce_duration_value
+ w3 * self.relations_score_weighted
- w4 * self.active_wars_front
+ w5 * self.subgoal_bonus
)
# ── Victory condition ─────────────────────────────────────────────────────────
class VictoryCondition(BaseModel):
"""Sovereign Victory (M5) — evaluated each campaign tick."""
holds_king_title: bool = False
territory_control_pct: float = Field(
0.0, ge=0.0, le=100.0, description="% of Calradia fiefs held"
)
majority_threshold: float = Field(
51.0, ge=0.0, le=100.0, description="Required % for majority control"
)
@property
def achieved(self) -> bool:
return self.holds_king_title and self.territory_control_pct >= self.majority_threshold

View File

@@ -51,6 +51,13 @@ class Settings(BaseSettings):
# Set to 0 to use model defaults.
ollama_num_ctx: int = 32768
# Maximum models loaded simultaneously in Ollama — override with OLLAMA_MAX_LOADED_MODELS
# Set to 2 so Qwen3-8B and Qwen3-14B can stay hot concurrently (~17 GB combined).
# Requires Ollama ≥ 0.1.33. Export this to the Ollama process environment:
# OLLAMA_MAX_LOADED_MODELS=2 ollama serve
# or add it to your systemd/launchd unit before starting the harness.
ollama_max_loaded_models: int = 2
# Fallback model chains — override with FALLBACK_MODELS / VISION_FALLBACK_MODELS
# as comma-separated strings, e.g. FALLBACK_MODELS="qwen3:8b,qwen2.5:14b"
# Or edit config/providers.yaml → fallback_chains for the canonical source.
@@ -87,8 +94,18 @@ class Settings(BaseSettings):
# ── Backend selection ────────────────────────────────────────────────────
# "ollama" — always use Ollama (default, safe everywhere)
# "vllm" — use vLLM inference server (OpenAI-compatible, faster throughput)
# "auto" — pick best available local backend, fall back to Ollama
timmy_model_backend: Literal["ollama", "grok", "claude", "auto"] = "ollama"
timmy_model_backend: Literal["ollama", "vllm", "grok", "claude", "auto"] = "ollama"
# ── vLLM backend ──────────────────────────────────────────────────────────
# vLLM is an OpenAI-compatible inference server optimised for continuous
# batching — 310x higher throughput than Ollama for agentic workloads.
# Start server: python -m vllm.entrypoints.openai.api_server \
# --model Qwen/Qwen2.5-14B-Instruct --port 8001
# Then set TIMMY_LLM_BACKEND=vllm (or enable vllm-local in providers.yaml)
vllm_url: str = "http://localhost:8001"
vllm_model: str = "Qwen/Qwen2.5-14B-Instruct"
# ── Grok (xAI) — opt-in premium cloud backend ────────────────────────
# Grok is a premium augmentation layer — local-first ethos preserved.
@@ -228,6 +245,10 @@ class Settings(BaseSettings):
# ── Test / Diagnostics ─────────────────────────────────────────────
# Skip loading heavy embedding models (for tests / low-memory envs).
timmy_skip_embeddings: bool = False
# Embedding backend: "ollama" for Ollama, "local" for sentence-transformers.
timmy_embedding_backend: Literal["ollama", "local"] = "local"
# Ollama model to use for embeddings (e.g., "nomic-embed-text").
ollama_embedding_model: str = "nomic-embed-text"
# Disable CSRF middleware entirely (for tests).
timmy_disable_csrf: bool = False
# Mark the process as running in test mode.
@@ -376,6 +397,11 @@ class Settings(BaseSettings):
autoresearch_time_budget: int = 300 # seconds per experiment run
autoresearch_max_iterations: int = 100
autoresearch_metric: str = "val_bpb" # metric to optimise (lower = better)
# M3 Max / Apple Silicon tuning (Issue #905).
# dataset: "tinystories" (default, lower-entropy, recommended for Mac) or "openwebtext".
autoresearch_dataset: str = "tinystories"
# backend: "auto" detects MLX on Apple Silicon; "cpu" forces CPU fallback.
autoresearch_backend: str = "auto"
# ── Weekly Narrative Summary ───────────────────────────────────────
# Generates a human-readable weekly summary of development activity.
@@ -406,6 +432,14 @@ class Settings(BaseSettings):
# Alert threshold: free disk below this triggers cleanup / alert (GB).
hermes_disk_free_min_gb: float = 10.0
# ── Energy Budget Monitoring ───────────────────────────────────────
# Enable energy budget monitoring (tracks CPU/GPU power during inference).
energy_budget_enabled: bool = True
# Watts threshold that auto-activates low power mode (on-battery only).
energy_budget_watts_threshold: float = 15.0
# Model to prefer in low power mode (smaller = more efficient).
energy_low_power_model: str = "qwen3:1b"
# ── Error Logging ─────────────────────────────────────────────────
error_log_enabled: bool = True
error_log_dir: str = "logs"

View File

@@ -33,25 +33,30 @@ from dashboard.routes.calm import router as calm_router
from dashboard.routes.chat_api import router as chat_api_router
from dashboard.routes.chat_api_v1 import router as chat_api_v1_router
from dashboard.routes.daily_run import router as daily_run_router
from dashboard.routes.hermes import router as hermes_router
from dashboard.routes.db_explorer import router as db_explorer_router
from dashboard.routes.discord import router as discord_router
from dashboard.routes.experiments import router as experiments_router
from dashboard.routes.grok import router as grok_router
from dashboard.routes.energy import router as energy_router
from dashboard.routes.health import router as health_router
from dashboard.routes.hermes import router as hermes_router
from dashboard.routes.loop_qa import router as loop_qa_router
from dashboard.routes.memory import router as memory_router
from dashboard.routes.mobile import router as mobile_router
from dashboard.routes.models import api_router as models_api_router
from dashboard.routes.models import router as models_router
from dashboard.routes.nexus import router as nexus_router
from dashboard.routes.quests import router as quests_router
from dashboard.routes.scorecards import router as scorecards_router
from dashboard.routes.sovereignty_metrics import router as sovereignty_metrics_router
from dashboard.routes.sovereignty_ws import router as sovereignty_ws_router
from dashboard.routes.spark import router as spark_router
from dashboard.routes.system import router as system_router
from dashboard.routes.tasks import router as tasks_router
from dashboard.routes.telegram import router as telegram_router
from dashboard.routes.thinking import router as thinking_router
from dashboard.routes.self_correction import router as self_correction_router
from dashboard.routes.three_strike import router as three_strike_router
from dashboard.routes.tools import router as tools_router
from dashboard.routes.tower import router as tower_router
from dashboard.routes.voice import router as voice_router
@@ -547,12 +552,28 @@ async def lifespan(app: FastAPI):
except Exception:
logger.debug("Failed to register error recorder")
# Mark session start for sovereignty duration tracking
try:
from timmy.sovereignty import mark_session_start
mark_session_start()
except Exception:
logger.debug("Failed to mark sovereignty session start")
logger.info("✓ Dashboard ready for requests")
yield
await _shutdown_cleanup(bg_tasks, workshop_heartbeat)
# Generate and commit sovereignty session report
try:
from timmy.sovereignty import generate_and_commit_report
await generate_and_commit_report()
except Exception as exc:
logger.warning("Sovereignty report generation failed at shutdown: %s", exc)
app = FastAPI(
title="Mission Control",
@@ -651,6 +672,7 @@ app.include_router(tools_router)
app.include_router(spark_router)
app.include_router(discord_router)
app.include_router(memory_router)
app.include_router(nexus_router)
app.include_router(grok_router)
app.include_router(models_router)
app.include_router(models_api_router)
@@ -669,9 +691,13 @@ app.include_router(matrix_router)
app.include_router(tower_router)
app.include_router(daily_run_router)
app.include_router(hermes_router)
app.include_router(energy_router)
app.include_router(quests_router)
app.include_router(scorecards_router)
app.include_router(sovereignty_metrics_router)
app.include_router(sovereignty_ws_router)
app.include_router(three_strike_router)
app.include_router(self_correction_router)
@app.websocket("/ws")

View File

@@ -8,6 +8,8 @@ from .database import Base # Assuming a shared Base in models/database.py
class TaskState(StrEnum):
"""Enumeration of possible task lifecycle states."""
LATER = "LATER"
NEXT = "NEXT"
NOW = "NOW"
@@ -16,12 +18,16 @@ class TaskState(StrEnum):
class TaskCertainty(StrEnum):
"""Enumeration of task time-certainty levels."""
FUZZY = "FUZZY" # An intention without a time
SOFT = "SOFT" # A flexible task with a time
HARD = "HARD" # A fixed meeting/appointment
class Task(Base):
"""SQLAlchemy model representing a CALM task."""
__tablename__ = "tasks"
id = Column(Integer, primary_key=True, index=True)
@@ -52,6 +58,8 @@ class Task(Base):
class JournalEntry(Base):
"""SQLAlchemy model for a daily journal entry with MITs and reflections."""
__tablename__ = "journal_entries"
id = Column(Integer, primary_key=True, index=True)

View File

@@ -14,6 +14,8 @@ router = APIRouter(prefix="/discord", tags=["discord"])
class TokenPayload(BaseModel):
"""Request payload containing a Discord bot token."""
token: str

View File

@@ -0,0 +1,121 @@
"""Energy Budget Monitoring routes.
Exposes the energy budget monitor via REST API so the dashboard and
external tools can query power draw, efficiency scores, and toggle
low power mode.
Refs: #1009
"""
import logging
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel
from config import settings
from infrastructure.energy.monitor import energy_monitor
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/energy", tags=["energy"])
class LowPowerRequest(BaseModel):
"""Request body for toggling low power mode."""
enabled: bool
class InferenceEventRequest(BaseModel):
"""Request body for recording an inference event."""
model: str
tokens_per_second: float
@router.get("/status")
async def energy_status():
"""Return the current energy budget status.
Returns the live power estimate, efficiency score (010), recent
inference samples, and whether low power mode is active.
"""
if not getattr(settings, "energy_budget_enabled", True):
return {
"enabled": False,
"message": "Energy budget monitoring is disabled (ENERGY_BUDGET_ENABLED=false)",
}
report = await energy_monitor.get_report()
return {**report.to_dict(), "enabled": True}
@router.get("/report")
async def energy_report():
"""Detailed energy budget report with all recent samples.
Same as /energy/status but always includes the full sample history.
"""
if not getattr(settings, "energy_budget_enabled", True):
raise HTTPException(status_code=503, detail="Energy budget monitoring is disabled")
report = await energy_monitor.get_report()
data = report.to_dict()
# Override recent_samples to include the full window (not just last 10)
data["recent_samples"] = [
{
"timestamp": s.timestamp,
"model": s.model,
"tokens_per_second": round(s.tokens_per_second, 1),
"estimated_watts": round(s.estimated_watts, 2),
"efficiency": round(s.efficiency, 3),
"efficiency_score": round(s.efficiency_score, 2),
}
for s in list(energy_monitor._samples)
]
return {**data, "enabled": True}
@router.post("/low-power")
async def set_low_power_mode(body: LowPowerRequest):
"""Enable or disable low power mode.
In low power mode the cascade router is advised to prefer the
configured energy_low_power_model (see settings).
"""
if not getattr(settings, "energy_budget_enabled", True):
raise HTTPException(status_code=503, detail="Energy budget monitoring is disabled")
energy_monitor.set_low_power_mode(body.enabled)
low_power_model = getattr(settings, "energy_low_power_model", "qwen3:1b")
return {
"low_power_mode": body.enabled,
"preferred_model": low_power_model if body.enabled else None,
"message": (
f"Low power mode {'enabled' if body.enabled else 'disabled'}. "
+ (f"Routing to {low_power_model}." if body.enabled else "Routing restored to default.")
),
}
@router.post("/record")
async def record_inference_event(body: InferenceEventRequest):
"""Record an inference event for efficiency tracking.
Called after each LLM inference completes. Updates the rolling
efficiency score and may auto-activate low power mode if watts
exceed the configured threshold.
"""
if not getattr(settings, "energy_budget_enabled", True):
return {"recorded": False, "message": "Energy budget monitoring is disabled"}
if body.tokens_per_second <= 0:
raise HTTPException(status_code=422, detail="tokens_per_second must be positive")
sample = energy_monitor.record_inference(body.model, body.tokens_per_second)
return {
"recorded": True,
"efficiency_score": round(sample.efficiency_score, 2),
"estimated_watts": round(sample.estimated_watts, 2),
"low_power_mode": energy_monitor.low_power_mode,
}

View File

@@ -124,6 +124,73 @@ async def check_ollama() -> bool:
return dep.status == "healthy"
# vLLM health cache (30-second TTL)
_vllm_cache: DependencyStatus | None = None
_vllm_cache_ts: float = 0.0
_VLLM_CACHE_TTL = 30.0
def _check_vllm_sync() -> DependencyStatus:
"""Synchronous vLLM check — run via asyncio.to_thread()."""
try:
import urllib.request
base_url = settings.vllm_url.rstrip("/")
# vLLM exposes /health at the server root (strip /v1 if present)
if base_url.endswith("/v1"):
base_url = base_url[:-3]
req = urllib.request.Request(
f"{base_url}/health",
method="GET",
headers={"Accept": "application/json"},
)
with urllib.request.urlopen(req, timeout=2) as response:
if response.status == 200:
return DependencyStatus(
name="vLLM",
status="healthy",
sovereignty_score=10,
details={"url": settings.vllm_url, "model": settings.vllm_model},
)
except Exception as exc:
logger.debug("vLLM health check failed: %s", exc)
return DependencyStatus(
name="vLLM",
status="unavailable",
sovereignty_score=10,
details={"url": settings.vllm_url, "error": "Cannot connect to vLLM server"},
)
async def _check_vllm() -> DependencyStatus:
"""Check vLLM backend status without blocking the event loop.
Results are cached for 30 seconds. vLLM is an optional backend;
unavailability triggers graceful fallback to Ollama.
"""
global _vllm_cache, _vllm_cache_ts # noqa: PLW0603
now = time.monotonic()
if _vllm_cache is not None and (now - _vllm_cache_ts) < _VLLM_CACHE_TTL:
return _vllm_cache
try:
result = await asyncio.to_thread(_check_vllm_sync)
except Exception as exc:
logger.debug("vLLM async check failed: %s", exc)
result = DependencyStatus(
name="vLLM",
status="unavailable",
sovereignty_score=10,
details={"url": settings.vllm_url, "error": "Cannot connect to vLLM server"},
)
_vllm_cache = result
_vllm_cache_ts = now
return result
def _check_lightning() -> DependencyStatus:
"""Check Lightning payment backend status."""
return DependencyStatus(
@@ -195,13 +262,22 @@ async def health_check():
# Legacy format for test compatibility
ollama_ok = await check_ollama()
agent_status = "idle" if ollama_ok else "offline"
# Check vLLM only when it is the configured backend (avoid probing unused services)
vllm_status: str | None = None
if settings.timmy_model_backend == "vllm":
vllm_dep = await _check_vllm()
vllm_status = "up" if vllm_dep.status == "healthy" else "down"
inference_ok = vllm_status == "up" if vllm_status is not None else ollama_ok
agent_status = "idle" if inference_ok else "offline"
services: dict = {"ollama": "up" if ollama_ok else "down"}
if vllm_status is not None:
services["vllm"] = vllm_status
return {
"status": "ok" if ollama_ok else "degraded",
"services": {
"ollama": "up" if ollama_ok else "down",
},
"status": "ok" if inference_ok else "degraded",
"services": services,
"agents": {
"agent": {"status": agent_status},
},
@@ -210,7 +286,7 @@ async def health_check():
"version": "2.0.0",
"uptime_seconds": uptime,
"llm_backend": settings.timmy_model_backend,
"llm_model": settings.ollama_model,
"llm_model": settings.vllm_model if settings.timmy_model_backend == "vllm" else settings.ollama_model,
}
@@ -252,6 +328,9 @@ async def sovereignty_check():
_check_lightning(),
_check_sqlite(),
]
# Include vLLM in the audit when it is the active backend
if settings.timmy_model_backend == "vllm":
dependencies.append(await _check_vllm())
overall = _calculate_overall_score(dependencies)
recommendations = _generate_recommendations(dependencies)

View File

@@ -0,0 +1,166 @@
"""Nexus — Timmy's persistent conversational awareness space.
A conversational-only interface where Timmy maintains live memory context.
No tool use; pure conversation with memory integration and a teaching panel.
Routes:
GET /nexus — render nexus page with live memory sidebar
POST /nexus/chat — send a message; returns HTMX partial
POST /nexus/teach — inject a fact into Timmy's live memory
DELETE /nexus/history — clear the nexus conversation history
"""
import asyncio
import logging
from datetime import UTC, datetime
from fastapi import APIRouter, Form, Request
from fastapi.responses import HTMLResponse
from dashboard.templating import templates
from timmy.memory_system import (
get_memory_stats,
recall_personal_facts_with_ids,
search_memories,
store_personal_fact,
)
from timmy.session import _clean_response, chat, reset_session
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/nexus", tags=["nexus"])
_NEXUS_SESSION_ID = "nexus"
_MAX_MESSAGE_LENGTH = 10_000
# In-memory conversation log for the Nexus session (mirrors chat store pattern
# but is scoped to the Nexus so it won't pollute the main dashboard history).
_nexus_log: list[dict] = []
def _ts() -> str:
return datetime.now(UTC).strftime("%H:%M:%S")
def _append_log(role: str, content: str) -> None:
_nexus_log.append({"role": role, "content": content, "timestamp": _ts()})
# Keep last 200 exchanges to bound memory usage
if len(_nexus_log) > 200:
del _nexus_log[:-200]
@router.get("", response_class=HTMLResponse)
async def nexus_page(request: Request):
"""Render the Nexus page with live memory context."""
stats = get_memory_stats()
facts = recall_personal_facts_with_ids()[:8]
return templates.TemplateResponse(
request,
"nexus.html",
{
"page_title": "Nexus",
"messages": list(_nexus_log),
"stats": stats,
"facts": facts,
},
)
@router.post("/chat", response_class=HTMLResponse)
async def nexus_chat(request: Request, message: str = Form(...)):
"""Conversational-only chat routed through the Nexus session.
Does not invoke tool-use approval flow — pure conversation with memory
context injected from Timmy's live memory store.
"""
message = message.strip()
if not message:
return HTMLResponse("")
if len(message) > _MAX_MESSAGE_LENGTH:
return templates.TemplateResponse(
request,
"partials/nexus_message.html",
{
"user_message": message[:80] + "",
"response": None,
"error": "Message too long (max 10 000 chars).",
"timestamp": _ts(),
"memory_hits": [],
},
)
ts = _ts()
# Fetch semantically relevant memories to surface in the sidebar
try:
memory_hits = await asyncio.to_thread(search_memories, query=message, limit=4)
except Exception as exc:
logger.warning("Nexus memory search failed: %s", exc)
memory_hits = []
# Conversational response — no tool approval flow
response_text: str | None = None
error_text: str | None = None
try:
raw = await chat(message, session_id=_NEXUS_SESSION_ID)
response_text = _clean_response(raw)
except Exception as exc:
logger.error("Nexus chat error: %s", exc)
error_text = "Timmy is unavailable right now. Check that Ollama is running."
_append_log("user", message)
if response_text:
_append_log("assistant", response_text)
return templates.TemplateResponse(
request,
"partials/nexus_message.html",
{
"user_message": message,
"response": response_text,
"error": error_text,
"timestamp": ts,
"memory_hits": memory_hits,
},
)
@router.post("/teach", response_class=HTMLResponse)
async def nexus_teach(request: Request, fact: str = Form(...)):
"""Inject a fact into Timmy's live memory from the Nexus teaching panel."""
fact = fact.strip()
if not fact:
return HTMLResponse("")
try:
await asyncio.to_thread(store_personal_fact, fact)
facts = await asyncio.to_thread(recall_personal_facts_with_ids)
facts = facts[:8]
except Exception as exc:
logger.error("Nexus teach error: %s", exc)
facts = []
return templates.TemplateResponse(
request,
"partials/nexus_facts.html",
{"facts": facts, "taught": fact},
)
@router.delete("/history", response_class=HTMLResponse)
async def nexus_clear_history(request: Request):
"""Clear the Nexus conversation history."""
_nexus_log.clear()
reset_session(session_id=_NEXUS_SESSION_ID)
return templates.TemplateResponse(
request,
"partials/nexus_message.html",
{
"user_message": None,
"response": "Nexus conversation cleared.",
"error": None,
"timestamp": _ts(),
"memory_hits": [],
},
)

View File

@@ -10,6 +10,7 @@ from fastapi.responses import HTMLResponse, JSONResponse
from dashboard.services.scorecard_service import (
PeriodType,
ScorecardSummary,
generate_all_scorecards,
generate_scorecard,
get_tracked_agents,
@@ -26,6 +27,216 @@ def _format_period_label(period_type: PeriodType) -> str:
return "Daily" if period_type == PeriodType.daily else "Weekly"
def _parse_period(period: str) -> PeriodType:
"""Parse period string into PeriodType, defaulting to daily on invalid input.
Args:
period: The period string ('daily' or 'weekly')
Returns:
PeriodType.daily or PeriodType.weekly
"""
try:
return PeriodType(period.lower())
except ValueError:
return PeriodType.daily
def _format_token_display(token_net: int) -> str:
"""Format token net value with +/- prefix for display.
Args:
token_net: The net token value
Returns:
Formatted string with + prefix for positive values
"""
return f"{'+' if token_net > 0 else ''}{token_net}"
def _format_token_class(token_net: int) -> str:
"""Get CSS class for token net value based on sign.
Args:
token_net: The net token value
Returns:
'text-success' for positive/zero, 'text-danger' for negative
"""
return "text-success" if token_net >= 0 else "text-danger"
def _build_patterns_html(patterns: list[str]) -> str:
"""Build HTML for patterns section if patterns exist.
Args:
patterns: List of pattern strings
Returns:
HTML string for patterns section or empty string
"""
if not patterns:
return ""
patterns_list = "".join([f"<li>{p}</li>" for p in patterns])
return f"""
<div class="mt-3">
<h6>Patterns</h6>
<ul class="list-unstyled text-info">
{patterns_list}
</ul>
</div>
"""
def _build_narrative_html(bullets: list[str]) -> str:
"""Build HTML for narrative bullets.
Args:
bullets: List of narrative bullet strings
Returns:
HTML string with list items
"""
return "".join([f"<li>{b}</li>" for b in bullets])
def _build_metrics_row_html(metrics: dict) -> str:
"""Build HTML for the metrics summary row.
Args:
metrics: Dictionary with PRs, issues, tests, and token metrics
Returns:
HTML string for the metrics row
"""
prs_opened = metrics["prs_opened"]
prs_merged = metrics["prs_merged"]
pr_merge_rate = int(metrics["pr_merge_rate"] * 100)
issues_touched = metrics["issues_touched"]
tests_affected = metrics["tests_affected"]
token_net = metrics["token_net"]
token_class = _format_token_class(token_net)
token_display = _format_token_display(token_net)
return f"""
<div class="row text-center small">
<div class="col">
<div class="text-muted">PRs</div>
<div class="fw-bold">{prs_opened}/{prs_merged}</div>
<div class="text-muted" style="font-size: 0.75rem;">
{pr_merge_rate}% merged
</div>
</div>
<div class="col">
<div class="text-muted">Issues</div>
<div class="fw-bold">{issues_touched}</div>
</div>
<div class="col">
<div class="text-muted">Tests</div>
<div class="fw-bold">{tests_affected}</div>
</div>
<div class="col">
<div class="text-muted">Tokens</div>
<div class="fw-bold {token_class}">{token_display}</div>
</div>
</div>
"""
def _render_scorecard_panel(
agent_id: str,
period_type: PeriodType,
data: dict,
) -> str:
"""Render HTML for a single scorecard panel.
Args:
agent_id: The agent ID
period_type: Daily or weekly period
data: Scorecard data dictionary with metrics, patterns, narrative_bullets
Returns:
HTML string for the scorecard panel
"""
patterns_html = _build_patterns_html(data.get("patterns", []))
bullets_html = _build_narrative_html(data.get("narrative_bullets", []))
metrics_row = _build_metrics_row_html(data["metrics"])
return f"""
<div class="card mc-panel">
<div class="card-header d-flex justify-content-between align-items-center">
<h5 class="card-title mb-0">{agent_id.title()}</h5>
<span class="badge bg-secondary">{_format_period_label(period_type)}</span>
</div>
<div class="card-body">
<ul class="list-unstyled mb-3">
{bullets_html}
</ul>
{metrics_row}
{patterns_html}
</div>
</div>
"""
def _render_empty_scorecard(agent_id: str) -> str:
"""Render HTML for an empty scorecard (no activity).
Args:
agent_id: The agent ID
Returns:
HTML string for the empty scorecard panel
"""
return f"""
<div class="card mc-panel">
<h5 class="card-title">{agent_id.title()}</h5>
<p class="text-muted">No activity recorded for this period.</p>
</div>
"""
def _render_error_scorecard(agent_id: str, error: str) -> str:
"""Render HTML for a scorecard that failed to load.
Args:
agent_id: The agent ID
error: Error message string
Returns:
HTML string for the error scorecard panel
"""
return f"""
<div class="card mc-panel border-danger">
<h5 class="card-title">{agent_id.title()}</h5>
<p class="text-danger">Error loading scorecard: {error}</p>
</div>
"""
def _render_single_panel_wrapper(
agent_id: str,
period_type: PeriodType,
scorecard: ScorecardSummary | None,
) -> str:
"""Render a complete scorecard panel with wrapper div for single panel view.
Args:
agent_id: The agent ID
period_type: Daily or weekly period
scorecard: ScorecardSummary object or None
Returns:
HTML string for the complete panel
"""
if scorecard is None:
return _render_empty_scorecard(agent_id)
return _render_scorecard_panel(agent_id, period_type, scorecard.to_dict())
@router.get("/api/agents")
async def list_tracked_agents() -> dict[str, list[str]]:
"""Return the list of tracked agent IDs.
@@ -149,99 +360,50 @@ async def agent_scorecard_panel(
Returns:
HTML panel with scorecard content
"""
try:
period_type = PeriodType(period.lower())
except ValueError:
period_type = PeriodType.daily
period_type = _parse_period(period)
try:
scorecard = generate_scorecard(agent_id, period_type)
if scorecard is None:
return HTMLResponse(
content=f"""
<div class="card mc-panel">
<h5 class="card-title">{agent_id.title()}</h5>
<p class="text-muted">No activity recorded for this period.</p>
</div>
""",
status_code=200,
)
data = scorecard.to_dict()
# Build patterns HTML
patterns_html = ""
if data["patterns"]:
patterns_list = "".join([f"<li>{p}</li>" for p in data["patterns"]])
patterns_html = f"""
<div class="mt-3">
<h6>Patterns</h6>
<ul class="list-unstyled text-info">
{patterns_list}
</ul>
</div>
"""
# Build bullets HTML
bullets_html = "".join([f"<li>{b}</li>" for b in data["narrative_bullets"]])
# Build metrics summary
metrics = data["metrics"]
html_content = f"""
<div class="card mc-panel">
<div class="card-header d-flex justify-content-between align-items-center">
<h5 class="card-title mb-0">{agent_id.title()}</h5>
<span class="badge bg-secondary">{_format_period_label(period_type)}</span>
</div>
<div class="card-body">
<ul class="list-unstyled mb-3">
{bullets_html}
</ul>
<div class="row text-center small">
<div class="col">
<div class="text-muted">PRs</div>
<div class="fw-bold">{metrics["prs_opened"]}/{metrics["prs_merged"]}</div>
<div class="text-muted" style="font-size: 0.75rem;">
{int(metrics["pr_merge_rate"] * 100)}% merged
</div>
</div>
<div class="col">
<div class="text-muted">Issues</div>
<div class="fw-bold">{metrics["issues_touched"]}</div>
</div>
<div class="col">
<div class="text-muted">Tests</div>
<div class="fw-bold">{metrics["tests_affected"]}</div>
</div>
<div class="col">
<div class="text-muted">Tokens</div>
<div class="fw-bold {"text-success" if metrics["token_net"] >= 0 else "text-danger"}">
{"+" if metrics["token_net"] > 0 else ""}{metrics["token_net"]}
</div>
</div>
</div>
{patterns_html}
</div>
</div>
"""
html_content = _render_single_panel_wrapper(agent_id, period_type, scorecard)
return HTMLResponse(content=html_content)
except Exception as exc:
logger.error("Failed to render scorecard panel for %s: %s", agent_id, exc)
return HTMLResponse(
content=f"""
<div class="card mc-panel border-danger">
<h5 class="card-title">{agent_id.title()}</h5>
<p class="text-danger">Error loading scorecard: {str(exc)}</p>
</div>
""",
status_code=200,
return HTMLResponse(content=_render_error_scorecard(agent_id, str(exc)))
def _render_all_panels_grid(
scorecards: list[ScorecardSummary],
period_type: PeriodType,
) -> str:
"""Render all scorecard panels in a grid layout.
Args:
scorecards: List of scorecard summaries
period_type: Daily or weekly period
Returns:
HTML string with all panels in a grid
"""
panels: list[str] = []
for scorecard in scorecards:
panel_html = _render_scorecard_panel(
scorecard.agent_id,
period_type,
scorecard.to_dict(),
)
# Wrap each panel in a grid column
wrapped = f'<div class="col-md-6 col-lg-4 mb-3">{panel_html}</div>'
panels.append(wrapped)
return f"""
<div class="row">
{"".join(panels)}
</div>
<div class="text-muted small mt-2">
Generated: {datetime.now().strftime("%Y-%m-%d %H:%M:%S UTC")}
</div>
"""
@router.get("/all/panels", response_class=HTMLResponse)
@@ -258,96 +420,15 @@ async def all_scorecard_panels(
Returns:
HTML with all scorecard panels
"""
try:
period_type = PeriodType(period.lower())
except ValueError:
period_type = PeriodType.daily
period_type = _parse_period(period)
try:
scorecards = generate_all_scorecards(period_type)
panels: list[str] = []
for scorecard in scorecards:
data = scorecard.to_dict()
# Build patterns HTML
patterns_html = ""
if data["patterns"]:
patterns_list = "".join([f"<li>{p}</li>" for p in data["patterns"]])
patterns_html = f"""
<div class="mt-3">
<h6>Patterns</h6>
<ul class="list-unstyled text-info">
{patterns_list}
</ul>
</div>
"""
# Build bullets HTML
bullets_html = "".join([f"<li>{b}</li>" for b in data["narrative_bullets"]])
metrics = data["metrics"]
panel_html = f"""
<div class="col-md-6 col-lg-4 mb-3">
<div class="card mc-panel">
<div class="card-header d-flex justify-content-between align-items-center">
<h5 class="card-title mb-0">{scorecard.agent_id.title()}</h5>
<span class="badge bg-secondary">{_format_period_label(period_type)}</span>
</div>
<div class="card-body">
<ul class="list-unstyled mb-3">
{bullets_html}
</ul>
<div class="row text-center small">
<div class="col">
<div class="text-muted">PRs</div>
<div class="fw-bold">{metrics["prs_opened"]}/{metrics["prs_merged"]}</div>
<div class="text-muted" style="font-size: 0.75rem;">
{int(metrics["pr_merge_rate"] * 100)}% merged
</div>
</div>
<div class="col">
<div class="text-muted">Issues</div>
<div class="fw-bold">{metrics["issues_touched"]}</div>
</div>
<div class="col">
<div class="text-muted">Tests</div>
<div class="fw-bold">{metrics["tests_affected"]}</div>
</div>
<div class="col">
<div class="text-muted">Tokens</div>
<div class="fw-bold {"text-success" if metrics["token_net"] >= 0 else "text-danger"}">
{"+" if metrics["token_net"] > 0 else ""}{metrics["token_net"]}
</div>
</div>
</div>
{patterns_html}
</div>
</div>
</div>
"""
panels.append(panel_html)
html_content = f"""
<div class="row">
{"".join(panels)}
</div>
<div class="text-muted small mt-2">
Generated: {datetime.now().strftime("%Y-%m-%d %H:%M:%S UTC")}
</div>
"""
html_content = _render_all_panels_grid(scorecards, period_type)
return HTMLResponse(content=html_content)
except Exception as exc:
logger.error("Failed to render all scorecard panels: %s", exc)
return HTMLResponse(
content=f"""
<div class="alert alert-danger">
Error loading scorecards: {str(exc)}
</div>
""",
status_code=200,
content=f'<div class="alert alert-danger">Error loading scorecards: {exc}</div>'
)

View File

@@ -0,0 +1,58 @@
"""Self-Correction Dashboard routes.
GET /self-correction/ui — HTML dashboard
GET /self-correction/timeline — HTMX partial: recent event timeline
GET /self-correction/patterns — HTMX partial: recurring failure patterns
"""
import logging
from fastapi import APIRouter, Request
from fastapi.responses import HTMLResponse
from dashboard.templating import templates
from infrastructure.self_correction import get_corrections, get_patterns, get_stats
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/self-correction", tags=["self-correction"])
@router.get("/ui", response_class=HTMLResponse)
async def self_correction_ui(request: Request):
"""Render the Self-Correction Dashboard."""
stats = get_stats()
corrections = get_corrections(limit=20)
patterns = get_patterns(top_n=10)
return templates.TemplateResponse(
request,
"self_correction.html",
{
"stats": stats,
"corrections": corrections,
"patterns": patterns,
},
)
@router.get("/timeline", response_class=HTMLResponse)
async def self_correction_timeline(request: Request):
"""HTMX partial: recent self-correction event timeline."""
corrections = get_corrections(limit=30)
return templates.TemplateResponse(
request,
"partials/self_correction_timeline.html",
{"corrections": corrections},
)
@router.get("/patterns", response_class=HTMLResponse)
async def self_correction_patterns(request: Request):
"""HTMX partial: recurring failure patterns."""
patterns = get_patterns(top_n=10)
stats = get_stats()
return templates.TemplateResponse(
request,
"partials/self_correction_patterns.html",
{"patterns": patterns, "stats": stats},
)

View File

@@ -0,0 +1,40 @@
"""WebSocket emitter for the sovereignty metrics dashboard widget.
Streams real-time sovereignty snapshots to connected clients every
*_PUSH_INTERVAL* seconds. The snapshot includes per-layer sovereignty
percentages, API cost rate, and skill crystallisation count.
Refs: #954, #953
"""
import asyncio
import json
import logging
from fastapi import APIRouter, WebSocket
router = APIRouter(tags=["sovereignty"])
logger = logging.getLogger(__name__)
_PUSH_INTERVAL = 5 # seconds between snapshot pushes
@router.websocket("/ws/sovereignty")
async def sovereignty_ws(websocket: WebSocket) -> None:
"""Stream sovereignty metric snapshots to the dashboard widget."""
from timmy.sovereignty.metrics import get_metrics_store
await websocket.accept()
logger.info("Sovereignty WS connected")
store = get_metrics_store()
try:
# Send initial snapshot immediately
await websocket.send_text(json.dumps(store.get_snapshot()))
while True:
await asyncio.sleep(_PUSH_INTERVAL)
await websocket.send_text(json.dumps(store.get_snapshot()))
except Exception:
logger.debug("Sovereignty WS disconnected")

View File

@@ -7,6 +7,8 @@ router = APIRouter(prefix="/telegram", tags=["telegram"])
class TokenPayload(BaseModel):
"""Request payload containing a Telegram bot token."""
token: str

View File

@@ -0,0 +1,116 @@
"""Three-Strike Detector dashboard routes.
Provides JSON API endpoints for inspecting and managing the three-strike
detector state.
Refs: #962
"""
import logging
from typing import Any
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel
from timmy.sovereignty.three_strike import CATEGORIES, get_detector
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/sovereignty/three-strike", tags=["three-strike"])
class RecordRequest(BaseModel):
category: str
key: str
metadata: dict[str, Any] = {}
class AutomationRequest(BaseModel):
artifact_path: str
@router.get("")
async def list_strikes() -> dict[str, Any]:
"""Return all strike records."""
detector = get_detector()
records = detector.list_all()
return {
"records": [
{
"category": r.category,
"key": r.key,
"count": r.count,
"blocked": r.blocked,
"automation": r.automation,
"first_seen": r.first_seen,
"last_seen": r.last_seen,
}
for r in records
],
"categories": sorted(CATEGORIES),
}
@router.get("/blocked")
async def list_blocked() -> dict[str, Any]:
"""Return only blocked (category, key) pairs."""
detector = get_detector()
records = detector.list_blocked()
return {
"blocked": [
{
"category": r.category,
"key": r.key,
"count": r.count,
"automation": r.automation,
"last_seen": r.last_seen,
}
for r in records
]
}
@router.post("/record")
async def record_strike(body: RecordRequest) -> dict[str, Any]:
"""Record a manual action. Returns strike state; 409 when blocked."""
from timmy.sovereignty.three_strike import ThreeStrikeError
detector = get_detector()
try:
record = detector.record(body.category, body.key, body.metadata)
return {
"category": record.category,
"key": record.key,
"count": record.count,
"blocked": record.blocked,
"automation": record.automation,
}
except ValueError as exc:
raise HTTPException(status_code=422, detail=str(exc)) from exc
except ThreeStrikeError as exc:
raise HTTPException(
status_code=409,
detail={
"error": "three_strike_block",
"message": str(exc),
"category": exc.category,
"key": exc.key,
"count": exc.count,
},
) from exc
@router.post("/{category}/{key}/automation")
async def register_automation(category: str, key: str, body: AutomationRequest) -> dict[str, bool]:
"""Register an automation artifact to unblock a (category, key) pair."""
detector = get_detector()
detector.register_automation(category, key, body.artifact_path)
return {"success": True}
@router.get("/{category}/{key}/events")
async def get_strike_events(category: str, key: str, limit: int = 50) -> dict[str, Any]:
"""Return the individual strike events for a (category, key) pair."""
detector = get_detector()
events = detector.get_events(category, key, limit=limit)
return {"category": category, "key": key, "events": events}

View File

@@ -41,6 +41,7 @@ def _save_voice_settings(data: dict) -> None:
except Exception as exc:
logger.warning("Failed to save voice settings: %s", exc)
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/voice", tags=["voice"])

View File

@@ -51,6 +51,8 @@ def _get_db() -> Generator[sqlite3.Connection, None, None]:
class _EnumLike:
"""Lightweight enum-like wrapper for string values used in templates."""
def __init__(self, v: str):
self.value = v

View File

@@ -23,6 +23,8 @@ TRACKED_AGENTS = frozenset({"hermes", "kimi", "manus", "claude", "gemini"})
class PeriodType(StrEnum):
"""Scorecard reporting period type."""
daily = "daily"
weekly = "weekly"

View File

@@ -67,9 +67,11 @@
<div class="mc-nav-dropdown">
<button class="mc-test-link mc-dropdown-toggle" aria-expanded="false">INTEL &#x25BE;</button>
<div class="mc-dropdown-menu">
<a href="/nexus" class="mc-test-link">NEXUS</a>
<a href="/spark/ui" class="mc-test-link">SPARK</a>
<a href="/memory" class="mc-test-link">MEMORY</a>
<a href="/marketplace/ui" class="mc-test-link">MARKET</a>
<a href="/self-correction/ui" class="mc-test-link">SELF-CORRECT</a>
</div>
</div>
<div class="mc-nav-dropdown">
@@ -131,6 +133,7 @@
<a href="/spark/ui" class="mc-mobile-link">SPARK</a>
<a href="/memory" class="mc-mobile-link">MEMORY</a>
<a href="/marketplace/ui" class="mc-mobile-link">MARKET</a>
<a href="/self-correction/ui" class="mc-mobile-link">SELF-CORRECT</a>
<div class="mc-mobile-section-label">AGENTS</div>
<a href="/hands" class="mc-mobile-link">HANDS</a>
<a href="/work-orders/queue" class="mc-mobile-link">WORK ORDERS</a>

View File

@@ -186,6 +186,24 @@
<p class="chat-history-placeholder">Loading sovereignty metrics...</p>
{% endcall %}
<!-- Agent Scorecards -->
<div class="card mc-card-spaced" id="mc-scorecards-card">
<div class="card-header">
<h2 class="card-title">Agent Scorecards</h2>
<div class="d-flex align-items-center gap-2">
<select id="mc-scorecard-period" class="form-select form-select-sm" style="width: auto;"
onchange="loadMcScorecards()">
<option value="daily" selected>Daily</option>
<option value="weekly">Weekly</option>
</select>
<a href="/scorecards" class="btn btn-sm btn-outline-secondary">Full View</a>
</div>
</div>
<div id="mc-scorecards-content" class="p-2">
<p class="chat-history-placeholder">Loading scorecards...</p>
</div>
</div>
<!-- Chat History -->
<div class="card mc-card-spaced">
<div class="card-header">
@@ -502,6 +520,20 @@ async function loadSparkStatus() {
}
}
// Load agent scorecards
async function loadMcScorecards() {
var period = document.getElementById('mc-scorecard-period').value;
var container = document.getElementById('mc-scorecards-content');
container.innerHTML = '<p class="chat-history-placeholder">Loading scorecards...</p>';
try {
var response = await fetch('/scorecards/all/panels?period=' + period);
var html = await response.text();
container.innerHTML = html;
} catch (error) {
container.innerHTML = '<p class="chat-history-placeholder">Scorecards unavailable</p>';
}
}
// Initial load
loadSparkStatus();
loadSovereignty();
@@ -510,6 +542,7 @@ loadSwarmStats();
loadLightningStats();
loadGrokStats();
loadChatHistory();
loadMcScorecards();
// Periodic updates
setInterval(loadSovereignty, 30000);
@@ -518,5 +551,6 @@ setInterval(loadSwarmStats, 5000);
setInterval(updateHeartbeat, 5000);
setInterval(loadGrokStats, 10000);
setInterval(loadSparkStatus, 15000);
setInterval(loadMcScorecards, 300000);
</script>
{% endblock %}

View File

@@ -0,0 +1,122 @@
{% extends "base.html" %}
{% block title %}Nexus{% endblock %}
{% block extra_styles %}{% endblock %}
{% block content %}
<div class="container-fluid nexus-layout py-3">
<div class="nexus-header mb-3">
<div class="nexus-title">// NEXUS</div>
<div class="nexus-subtitle">
Persistent conversational awareness &mdash; always present, always learning.
</div>
</div>
<div class="nexus-grid">
<!-- ── LEFT: Conversation ────────────────────────────────── -->
<div class="nexus-chat-col">
<div class="card mc-panel nexus-chat-panel">
<div class="card-header mc-panel-header d-flex justify-content-between align-items-center">
<span>// CONVERSATION</span>
<button class="mc-btn mc-btn-sm"
hx-delete="/nexus/history"
hx-target="#nexus-chat-log"
hx-swap="beforeend"
hx-confirm="Clear nexus conversation?">
CLEAR
</button>
</div>
<div class="card-body p-2" id="nexus-chat-log">
{% for msg in messages %}
<div class="chat-message {{ 'user' if msg.role == 'user' else 'agent' }}">
<div class="msg-meta">
{{ 'YOU' if msg.role == 'user' else 'TIMMY' }} // {{ msg.timestamp }}
</div>
<div class="msg-body {% if msg.role == 'assistant' %}timmy-md{% endif %}">
{{ msg.content | e }}
</div>
</div>
{% else %}
<div class="nexus-empty-state">
Nexus is ready. Start a conversation — memories will surface in real time.
</div>
{% endfor %}
</div>
<div class="card-footer p-2">
<form hx-post="/nexus/chat"
hx-target="#nexus-chat-log"
hx-swap="beforeend"
hx-on::after-request="this.reset(); document.getElementById('nexus-chat-log').scrollTop = 999999;">
<div class="d-flex gap-2">
<input type="text"
name="message"
id="nexus-input"
class="mc-search-input flex-grow-1"
placeholder="Talk to Timmy..."
autocomplete="off"
required>
<button type="submit" class="mc-btn mc-btn-primary">SEND</button>
</div>
</form>
</div>
</div>
</div>
<!-- ── RIGHT: Memory sidebar ─────────────────────────────── -->
<div class="nexus-sidebar-col">
<!-- Live memory context (updated with each response) -->
<div class="card mc-panel nexus-memory-panel mb-3">
<div class="card-header mc-panel-header">
<span>// LIVE MEMORY</span>
<span class="badge ms-2" style="background:var(--purple-dim); color:var(--purple);">
{{ stats.total_entries }} stored
</span>
</div>
<div class="card-body p-2">
<div id="nexus-memory-panel" class="nexus-memory-hits">
<div class="nexus-memory-label">Relevant memories appear here as you chat.</div>
</div>
</div>
</div>
<!-- Teaching panel -->
<div class="card mc-panel nexus-teach-panel">
<div class="card-header mc-panel-header">// TEACH TIMMY</div>
<div class="card-body p-2">
<form hx-post="/nexus/teach"
hx-target="#nexus-teach-response"
hx-swap="innerHTML"
hx-on::after-request="this.reset()">
<div class="d-flex gap-2 mb-2">
<input type="text"
name="fact"
class="mc-search-input flex-grow-1"
placeholder="e.g. I prefer dark themes"
required>
<button type="submit" class="mc-btn mc-btn-primary">TEACH</button>
</div>
</form>
<div id="nexus-teach-response"></div>
<div class="nexus-facts-header mt-3">// KNOWN FACTS</div>
<ul class="nexus-facts-list" id="nexus-facts-list">
{% for fact in facts %}
<li class="nexus-fact-item">{{ fact.content | e }}</li>
{% else %}
<li class="nexus-fact-empty">No personal facts stored yet.</li>
{% endfor %}
</ul>
</div>
</div>
</div><!-- /sidebar -->
</div><!-- /nexus-grid -->
</div>
{% endblock %}

View File

@@ -0,0 +1,12 @@
{% if taught %}
<div class="nexus-taught-confirm">
✓ Taught: <em>{{ taught | e }}</em>
</div>
{% endif %}
<ul class="nexus-facts-list" id="nexus-facts-list" hx-swap-oob="true">
{% for fact in facts %}
<li class="nexus-fact-item">{{ fact.content | e }}</li>
{% else %}
<li class="nexus-fact-empty">No facts stored yet.</li>
{% endfor %}
</ul>

View File

@@ -0,0 +1,36 @@
{% if user_message %}
<div class="chat-message user">
<div class="msg-meta">YOU // {{ timestamp }}</div>
<div class="msg-body">{{ user_message | e }}</div>
</div>
{% endif %}
{% if response %}
<div class="chat-message agent">
<div class="msg-meta">TIMMY // {{ timestamp }}</div>
<div class="msg-body timmy-md">{{ response | e }}</div>
</div>
<script>
(function() {
var el = document.currentScript.previousElementSibling.querySelector('.timmy-md');
if (el && typeof marked !== 'undefined' && typeof DOMPurify !== 'undefined') {
el.innerHTML = DOMPurify.sanitize(marked.parse(el.textContent));
}
})();
</script>
{% elif error %}
<div class="chat-message error-msg">
<div class="msg-meta">SYSTEM // {{ timestamp }}</div>
<div class="msg-body">{{ error | e }}</div>
</div>
{% endif %}
{% if memory_hits %}
<div class="nexus-memory-hits" id="nexus-memory-panel" hx-swap-oob="true">
<div class="nexus-memory-label">// LIVE MEMORY CONTEXT</div>
{% for hit in memory_hits %}
<div class="nexus-memory-hit">
<span class="nexus-memory-type">{{ hit.memory_type }}</span>
<span class="nexus-memory-content">{{ hit.content | e }}</span>
</div>
{% endfor %}
</div>
{% endif %}

View File

@@ -0,0 +1,28 @@
{% if patterns %}
<table class="mc-table w-100">
<thead>
<tr>
<th>ERROR TYPE</th>
<th class="text-center">COUNT</th>
<th class="text-center">CORRECTED</th>
<th class="text-center">FAILED</th>
<th>LAST SEEN</th>
</tr>
</thead>
<tbody>
{% for p in patterns %}
<tr>
<td class="sc-pattern-type">{{ p.error_type }}</td>
<td class="text-center">
<span class="badge {% if p.count >= 5 %}badge-error{% elif p.count >= 3 %}badge-warning{% else %}badge-info{% endif %}">{{ p.count }}</span>
</td>
<td class="text-center text-success">{{ p.success_count }}</td>
<td class="text-center {% if p.failed_count > 0 %}text-danger{% else %}text-muted{% endif %}">{{ p.failed_count }}</td>
<td class="sc-event-time">{{ p.last_seen[:16] if p.last_seen else '—' }}</td>
</tr>
{% endfor %}
</tbody>
</table>
{% else %}
<div class="text-center text-muted py-3">No patterns detected yet.</div>
{% endif %}

View File

@@ -0,0 +1,26 @@
{% if corrections %}
{% for ev in corrections %}
<div class="sc-event sc-status-{{ ev.outcome_status }}">
<div class="sc-event-header">
<span class="sc-status-badge sc-status-{{ ev.outcome_status }}">
{% if ev.outcome_status == 'success' %}&#10003; CORRECTED
{% elif ev.outcome_status == 'partial' %}&#9679; PARTIAL
{% else %}&#10007; FAILED
{% endif %}
</span>
<span class="sc-source-badge">{{ ev.source }}</span>
<span class="sc-event-time">{{ ev.created_at[:19] }}</span>
</div>
<div class="sc-event-error-type">{{ ev.error_type }}</div>
<div class="sc-event-intent"><span class="sc-label">INTENT:</span> {{ ev.original_intent[:120] }}{% if ev.original_intent | length > 120 %}&hellip;{% endif %}</div>
<div class="sc-event-error"><span class="sc-label">ERROR:</span> {{ ev.detected_error[:120] }}{% if ev.detected_error | length > 120 %}&hellip;{% endif %}</div>
<div class="sc-event-strategy"><span class="sc-label">STRATEGY:</span> {{ ev.correction_strategy[:120] }}{% if ev.correction_strategy | length > 120 %}&hellip;{% endif %}</div>
<div class="sc-event-outcome"><span class="sc-label">OUTCOME:</span> {{ ev.final_outcome[:120] }}{% if ev.final_outcome | length > 120 %}&hellip;{% endif %}</div>
{% if ev.task_id %}
<div class="sc-event-meta">task: {{ ev.task_id[:8] }}</div>
{% endif %}
</div>
{% endfor %}
{% else %}
<div class="text-center text-muted py-3">No self-correction events recorded yet.</div>
{% endif %}

View File

@@ -0,0 +1,102 @@
{% extends "base.html" %}
{% from "macros.html" import panel %}
{% block title %}Timmy Time — Self-Correction Dashboard{% endblock %}
{% block extra_styles %}{% endblock %}
{% block content %}
<div class="container-fluid py-3">
<!-- Header -->
<div class="spark-header mb-3">
<div class="spark-title">SELF-CORRECTION</div>
<div class="spark-subtitle">
Agent error detection &amp; recovery &mdash;
<span class="spark-status-val">{{ stats.total }}</span> events,
<span class="spark-status-val">{{ stats.success_rate }}%</span> correction rate,
<span class="spark-status-val">{{ stats.unique_error_types }}</span> distinct error types
</div>
</div>
<div class="row g-3">
<!-- Left column: stats + patterns -->
<div class="col-12 col-lg-4 d-flex flex-column gap-3">
<!-- Stats panel -->
<div class="card mc-panel">
<div class="card-header mc-panel-header">// CORRECTION STATS</div>
<div class="card-body p-3">
<div class="spark-stat-grid">
<div class="spark-stat">
<span class="spark-stat-label">TOTAL</span>
<span class="spark-stat-value">{{ stats.total }}</span>
</div>
<div class="spark-stat">
<span class="spark-stat-label">CORRECTED</span>
<span class="spark-stat-value text-success">{{ stats.success_count }}</span>
</div>
<div class="spark-stat">
<span class="spark-stat-label">PARTIAL</span>
<span class="spark-stat-value text-warning">{{ stats.partial_count }}</span>
</div>
<div class="spark-stat">
<span class="spark-stat-label">FAILED</span>
<span class="spark-stat-value {% if stats.failed_count > 0 %}text-danger{% else %}text-muted{% endif %}">{{ stats.failed_count }}</span>
</div>
</div>
<div class="mt-3">
<div class="d-flex justify-content-between mb-1">
<small class="text-muted">Correction Rate</small>
<small class="{% if stats.success_rate >= 70 %}text-success{% elif stats.success_rate >= 40 %}text-warning{% else %}text-danger{% endif %}">{{ stats.success_rate }}%</small>
</div>
<div class="progress" style="height:6px;">
<div class="progress-bar {% if stats.success_rate >= 70 %}bg-success{% elif stats.success_rate >= 40 %}bg-warning{% else %}bg-danger{% endif %}"
role="progressbar"
style="width:{{ stats.success_rate }}%"
aria-valuenow="{{ stats.success_rate }}"
aria-valuemin="0"
aria-valuemax="100"></div>
</div>
</div>
</div>
</div>
<!-- Patterns panel -->
<div class="card mc-panel"
hx-get="/self-correction/patterns"
hx-trigger="load, every 60s"
hx-target="#sc-patterns-body"
hx-swap="innerHTML">
<div class="card-header mc-panel-header d-flex justify-content-between align-items-center">
<span>// RECURRING PATTERNS</span>
<span class="badge badge-info">{{ patterns | length }}</span>
</div>
<div class="card-body p-0" id="sc-patterns-body">
{% include "partials/self_correction_patterns.html" %}
</div>
</div>
</div>
<!-- Right column: timeline -->
<div class="col-12 col-lg-8">
<div class="card mc-panel"
hx-get="/self-correction/timeline"
hx-trigger="load, every 30s"
hx-target="#sc-timeline-body"
hx-swap="innerHTML">
<div class="card-header mc-panel-header d-flex justify-content-between align-items-center">
<span>// CORRECTION TIMELINE</span>
<span class="badge badge-info">{{ corrections | length }}</span>
</div>
<div class="card-body p-3" id="sc-timeline-body">
{% include "partials/self_correction_timeline.html" %}
</div>
</div>
</div>
</div>
</div>
{% endblock %}

View File

@@ -24,6 +24,8 @@ MAX_MESSAGES: int = 500
@dataclass
class Message:
"""A single chat message with role, content, timestamp, and source."""
role: str # "user" | "agent" | "error"
content: str
timestamp: str

View File

@@ -0,0 +1,8 @@
"""Energy Budget Monitoring — power-draw estimation for LLM inference.
Refs: #1009
"""
from infrastructure.energy.monitor import EnergyBudgetMonitor, energy_monitor
__all__ = ["EnergyBudgetMonitor", "energy_monitor"]

View File

@@ -0,0 +1,371 @@
"""Energy Budget Monitor — estimates GPU/CPU power draw during LLM inference.
Tracks estimated power consumption to optimize for "metabolic efficiency".
Three estimation strategies attempted in priority order:
1. Battery discharge via ioreg (macOS — works without sudo, on-battery only)
2. CPU utilisation proxy via sysctl hw.cpufrequency + top
3. Model-size heuristic (tokens/s × model_size_gb × 2W/GB estimate)
Energy Efficiency score (010):
efficiency = tokens_per_second / estimated_watts, normalised to 010.
Low Power Mode:
Activated manually or automatically when draw exceeds the configured
threshold. In low power mode the cascade router is advised to prefer the
configured low_power_model (e.g. qwen3:1b or similar compact model).
Refs: #1009
"""
import asyncio
import json
import logging
import subprocess
import time
from collections import deque
from dataclasses import dataclass, field
from datetime import UTC, datetime
from typing import Any
from config import settings
logger = logging.getLogger(__name__)
# Approximate model-size lookup (GB) used for heuristic power estimate.
# Keys are lowercase substring matches against the model name.
_MODEL_SIZE_GB: dict[str, float] = {
"qwen3:1b": 0.8,
"qwen3:3b": 2.0,
"qwen3:4b": 2.5,
"qwen3:8b": 5.5,
"qwen3:14b": 9.0,
"qwen3:30b": 20.0,
"qwen3:32b": 20.0,
"llama3:8b": 5.5,
"llama3:70b": 45.0,
"mistral:7b": 4.5,
"gemma3:4b": 2.5,
"gemma3:12b": 8.0,
"gemma3:27b": 17.0,
"phi4:14b": 9.0,
}
_DEFAULT_MODEL_SIZE_GB = 5.0 # fallback when model not in table
_WATTS_PER_GB_HEURISTIC = 2.0 # rough W/GB for Apple Silicon unified memory
# Efficiency score normalisation: score 10 at this efficiency (tok/s per W).
_EFFICIENCY_SCORE_CEILING = 5.0 # tok/s per W → score 10
# Rolling window for recent samples
_HISTORY_MAXLEN = 60
@dataclass
class InferenceSample:
"""A single inference event captured by record_inference()."""
timestamp: str
model: str
tokens_per_second: float
estimated_watts: float
efficiency: float # tokens/s per watt
efficiency_score: float # 010
@dataclass
class EnergyReport:
"""Snapshot of current energy budget state."""
timestamp: str
low_power_mode: bool
current_watts: float
strategy: str # "battery", "cpu_proxy", "heuristic", "unavailable"
efficiency_score: float # 010; -1 if no inference samples yet
recent_samples: list[InferenceSample]
recommendation: str
details: dict[str, Any] = field(default_factory=dict)
def to_dict(self) -> dict[str, Any]:
return {
"timestamp": self.timestamp,
"low_power_mode": self.low_power_mode,
"current_watts": round(self.current_watts, 2),
"strategy": self.strategy,
"efficiency_score": round(self.efficiency_score, 2),
"recent_samples": [
{
"timestamp": s.timestamp,
"model": s.model,
"tokens_per_second": round(s.tokens_per_second, 1),
"estimated_watts": round(s.estimated_watts, 2),
"efficiency": round(s.efficiency, 3),
"efficiency_score": round(s.efficiency_score, 2),
}
for s in self.recent_samples
],
"recommendation": self.recommendation,
"details": self.details,
}
class EnergyBudgetMonitor:
"""Estimates power consumption and tracks LLM inference efficiency.
All blocking I/O (subprocess calls) is wrapped in asyncio.to_thread()
so the event loop is never blocked. Results are cached.
Usage::
# Record an inference event
energy_monitor.record_inference("qwen3:8b", tokens_per_second=42.0)
# Get the current report
report = await energy_monitor.get_report()
# Toggle low power mode
energy_monitor.set_low_power_mode(True)
"""
_POWER_CACHE_TTL = 10.0 # seconds between fresh power readings
def __init__(self) -> None:
self._low_power_mode: bool = False
self._samples: deque[InferenceSample] = deque(maxlen=_HISTORY_MAXLEN)
self._cached_watts: float = 0.0
self._cached_strategy: str = "unavailable"
self._cache_ts: float = 0.0
# ── Public API ────────────────────────────────────────────────────────────
@property
def low_power_mode(self) -> bool:
return self._low_power_mode
def set_low_power_mode(self, enabled: bool) -> None:
"""Enable or disable low power mode."""
self._low_power_mode = enabled
state = "enabled" if enabled else "disabled"
logger.info("Energy budget: low power mode %s", state)
def record_inference(self, model: str, tokens_per_second: float) -> InferenceSample:
"""Record an inference event for efficiency tracking.
Call this after each LLM inference completes with the model name and
measured throughput. The current power estimate is used to compute
the efficiency score.
Args:
model: Ollama model name (e.g. "qwen3:8b").
tokens_per_second: Measured decode throughput.
Returns:
The recorded InferenceSample.
"""
watts = self._cached_watts if self._cached_watts > 0 else self._estimate_watts_sync(model)
efficiency = tokens_per_second / max(watts, 0.1)
score = min(10.0, (efficiency / _EFFICIENCY_SCORE_CEILING) * 10.0)
sample = InferenceSample(
timestamp=datetime.now(UTC).isoformat(),
model=model,
tokens_per_second=tokens_per_second,
estimated_watts=watts,
efficiency=efficiency,
efficiency_score=score,
)
self._samples.append(sample)
# Auto-engage low power mode if above threshold and budget is enabled
threshold = getattr(settings, "energy_budget_watts_threshold", 15.0)
if watts > threshold and not self._low_power_mode:
logger.info(
"Energy budget: %.1fW exceeds threshold %.1fW — auto-engaging low power mode",
watts,
threshold,
)
self.set_low_power_mode(True)
return sample
async def get_report(self) -> EnergyReport:
"""Return the current energy budget report.
Refreshes the power estimate if the cache is stale.
"""
await self._refresh_power_cache()
score = self._compute_mean_efficiency_score()
recommendation = self._build_recommendation(score)
return EnergyReport(
timestamp=datetime.now(UTC).isoformat(),
low_power_mode=self._low_power_mode,
current_watts=self._cached_watts,
strategy=self._cached_strategy,
efficiency_score=score,
recent_samples=list(self._samples)[-10:],
recommendation=recommendation,
details={"sample_count": len(self._samples)},
)
# ── Power estimation ──────────────────────────────────────────────────────
async def _refresh_power_cache(self) -> None:
"""Refresh the cached power reading if stale."""
now = time.monotonic()
if now - self._cache_ts < self._POWER_CACHE_TTL:
return
try:
watts, strategy = await asyncio.to_thread(self._read_power)
except Exception as exc:
logger.debug("Energy: power read failed: %s", exc)
watts, strategy = 0.0, "unavailable"
self._cached_watts = watts
self._cached_strategy = strategy
self._cache_ts = now
def _read_power(self) -> tuple[float, str]:
"""Synchronous power reading — tries strategies in priority order.
Returns:
Tuple of (watts, strategy_name).
"""
# Strategy 1: battery discharge via ioreg (on-battery Macs)
try:
watts = self._read_battery_watts()
if watts > 0:
return watts, "battery"
except Exception:
pass
# Strategy 2: CPU utilisation proxy via top
try:
cpu_pct = self._read_cpu_pct()
if cpu_pct >= 0:
# M3 Max TDP ≈ 40W; scale linearly
watts = (cpu_pct / 100.0) * 40.0
return watts, "cpu_proxy"
except Exception:
pass
# Strategy 3: heuristic from loaded model size
return 0.0, "unavailable"
def _estimate_watts_sync(self, model: str) -> float:
"""Estimate watts from model size when no live reading is available."""
size_gb = self._model_size_gb(model)
return size_gb * _WATTS_PER_GB_HEURISTIC
def _read_battery_watts(self) -> float:
"""Read instantaneous battery discharge via ioreg.
Returns watts if on battery, 0.0 if plugged in or unavailable.
Requires macOS; no sudo needed.
"""
result = subprocess.run(
["ioreg", "-r", "-c", "AppleSmartBattery", "-d", "1"],
capture_output=True,
text=True,
timeout=3,
)
amperage_ma = 0.0
voltage_mv = 0.0
is_charging = True # assume charging unless we see ExternalConnected = No
for line in result.stdout.splitlines():
stripped = line.strip()
if '"InstantAmperage"' in stripped:
try:
amperage_ma = float(stripped.split("=")[-1].strip())
except ValueError:
pass
elif '"Voltage"' in stripped:
try:
voltage_mv = float(stripped.split("=")[-1].strip())
except ValueError:
pass
elif '"ExternalConnected"' in stripped:
is_charging = "Yes" in stripped
if is_charging or voltage_mv == 0 or amperage_ma <= 0:
return 0.0
# ioreg reports amperage in mA, voltage in mV
return (abs(amperage_ma) * voltage_mv) / 1_000_000
def _read_cpu_pct(self) -> float:
"""Read CPU utilisation from macOS top.
Returns aggregate CPU% (0100), or -1.0 on failure.
"""
result = subprocess.run(
["top", "-l", "1", "-n", "0", "-stats", "cpu"],
capture_output=True,
text=True,
timeout=5,
)
for line in result.stdout.splitlines():
if "CPU usage:" in line:
# "CPU usage: 12.5% user, 8.3% sys, 79.1% idle"
parts = line.split()
try:
user = float(parts[2].rstrip("%"))
sys_ = float(parts[4].rstrip("%"))
return user + sys_
except (IndexError, ValueError):
pass
return -1.0
# ── Helpers ───────────────────────────────────────────────────────────────
@staticmethod
def _model_size_gb(model: str) -> float:
"""Look up approximate model size in GB by name substring."""
lower = model.lower()
# Exact match first
if lower in _MODEL_SIZE_GB:
return _MODEL_SIZE_GB[lower]
# Substring match
for key, size in _MODEL_SIZE_GB.items():
if key in lower:
return size
return _DEFAULT_MODEL_SIZE_GB
def _compute_mean_efficiency_score(self) -> float:
"""Mean efficiency score over recent samples, or -1 if none."""
if not self._samples:
return -1.0
recent = list(self._samples)[-10:]
return sum(s.efficiency_score for s in recent) / len(recent)
def _build_recommendation(self, score: float) -> str:
"""Generate a human-readable recommendation from the efficiency score."""
threshold = getattr(settings, "energy_budget_watts_threshold", 15.0)
low_power_model = getattr(settings, "energy_low_power_model", "qwen3:1b")
if score < 0:
return "No inference data yet — run some tasks to populate efficiency metrics."
if self._low_power_mode:
return (
f"Low power mode active — routing to {low_power_model}. "
"Disable when power draw normalises."
)
if score < 3.0:
return (
f"Low efficiency (score {score:.1f}/10). "
f"Consider enabling low power mode to favour smaller models "
f"(threshold: {threshold}W)."
)
if score < 6.0:
return f"Moderate efficiency (score {score:.1f}/10). System operating normally."
return f"Good efficiency (score {score:.1f}/10). No action needed."
# Module-level singleton
energy_monitor = EnergyBudgetMonitor()

View File

@@ -71,6 +71,53 @@ class GitHand:
return True
return False
async def _exec_subprocess(
self,
args: str,
timeout: int,
) -> tuple[bytes, bytes, int]:
"""Run git as a subprocess, return (stdout, stderr, returncode).
Raises TimeoutError if the process exceeds *timeout* seconds.
"""
proc = await asyncio.create_subprocess_exec(
"git",
*args.split(),
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
cwd=self._repo_dir,
)
try:
stdout, stderr = await asyncio.wait_for(
proc.communicate(),
timeout=timeout,
)
except TimeoutError:
proc.kill()
await proc.wait()
raise
return stdout, stderr, proc.returncode or 0
@staticmethod
def _parse_output(
command: str,
stdout_bytes: bytes,
stderr_bytes: bytes,
returncode: int | None,
latency_ms: float,
) -> GitResult:
"""Decode subprocess output into a GitResult."""
exit_code = returncode or 0
stdout = stdout_bytes.decode("utf-8", errors="replace").strip()
stderr = stderr_bytes.decode("utf-8", errors="replace").strip()
return GitResult(
operation=command,
success=exit_code == 0,
output=stdout,
error=stderr if exit_code != 0 else "",
latency_ms=latency_ms,
)
async def run(
self,
args: str,
@@ -88,14 +135,15 @@ class GitHand:
GitResult with output or error details.
"""
start = time.time()
command = f"git {args}"
# Gate destructive operations
if self._is_destructive(args) and not allow_destructive:
return GitResult(
operation=f"git {args}",
operation=command,
success=False,
error=(
f"Destructive operation blocked: 'git {args}'. "
f"Destructive operation blocked: '{command}'. "
"Set allow_destructive=True to override."
),
requires_confirmation=True,
@@ -103,46 +151,21 @@ class GitHand:
)
effective_timeout = timeout or self._timeout
command = f"git {args}"
try:
proc = await asyncio.create_subprocess_exec(
"git",
*args.split(),
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
cwd=self._repo_dir,
stdout_bytes, stderr_bytes, returncode = await self._exec_subprocess(
args,
effective_timeout,
)
try:
stdout_bytes, stderr_bytes = await asyncio.wait_for(
proc.communicate(), timeout=effective_timeout
)
except TimeoutError:
proc.kill()
await proc.wait()
latency = (time.time() - start) * 1000
logger.warning("Git command timed out after %ds: %s", effective_timeout, command)
return GitResult(
operation=command,
success=False,
error=f"Command timed out after {effective_timeout}s",
latency_ms=latency,
)
except TimeoutError:
latency = (time.time() - start) * 1000
exit_code = proc.returncode or 0
stdout = stdout_bytes.decode("utf-8", errors="replace").strip()
stderr = stderr_bytes.decode("utf-8", errors="replace").strip()
logger.warning("Git command timed out after %ds: %s", effective_timeout, command)
return GitResult(
operation=command,
success=exit_code == 0,
output=stdout,
error=stderr if exit_code != 0 else "",
success=False,
error=f"Command timed out after {effective_timeout}s",
latency_ms=latency,
)
except FileNotFoundError:
latency = (time.time() - start) * 1000
logger.warning("git binary not found")
@@ -162,6 +185,14 @@ class GitHand:
latency_ms=latency,
)
return self._parse_output(
command,
stdout_bytes,
stderr_bytes,
returncode=returncode,
latency_ms=(time.time() - start) * 1000,
)
# ── Convenience wrappers ─────────────────────────────────────────────────
async def status(self) -> GitResult:

View File

@@ -4,6 +4,6 @@ Monitors the local machine (Hermes/M3 Max) for memory pressure, disk usage,
Ollama model health, zombie processes, and network connectivity.
"""
from infrastructure.hermes.monitor import HermesMonitor, HealthLevel, HealthReport, hermes_monitor
from infrastructure.hermes.monitor import HealthLevel, HealthReport, HermesMonitor, hermes_monitor
__all__ = ["HermesMonitor", "HealthLevel", "HealthReport", "hermes_monitor"]

View File

@@ -19,11 +19,12 @@ import json
import logging
import shutil
import subprocess
import tempfile
import time
import urllib.request
from dataclasses import dataclass, field
from datetime import UTC, datetime
from enum import Enum
from enum import StrEnum
from typing import Any
from config import settings
@@ -31,7 +32,7 @@ from config import settings
logger = logging.getLogger(__name__)
class HealthLevel(str, Enum):
class HealthLevel(StrEnum):
"""Severity level for a health check result."""
OK = "ok"
@@ -194,8 +195,7 @@ class HermesMonitor:
name="memory",
level=HealthLevel.CRITICAL,
message=(
f"Critical: only {free_gb:.1f}GB free "
f"(threshold: {memory_free_min_gb}GB)"
f"Critical: only {free_gb:.1f}GB free (threshold: {memory_free_min_gb}GB)"
),
details=details,
needs_human=True,
@@ -302,8 +302,7 @@ class HermesMonitor:
name="disk",
level=HealthLevel.CRITICAL,
message=(
f"Critical: only {free_gb:.1f}GB free "
f"(threshold: {disk_free_min_gb}GB)"
f"Critical: only {free_gb:.1f}GB free (threshold: {disk_free_min_gb}GB)"
),
details=details,
needs_human=True,
@@ -335,7 +334,7 @@ class HermesMonitor:
cutoff = time.time() - 86400 # 24 hours ago
try:
tmp = Path("/tmp")
tmp = Path(tempfile.gettempdir())
for item in tmp.iterdir():
try:
stat = item.stat()
@@ -345,11 +344,7 @@ class HermesMonitor:
freed_bytes += stat.st_size
item.unlink(missing_ok=True)
elif item.is_dir():
dir_size = sum(
f.stat().st_size
for f in item.rglob("*")
if f.is_file()
)
dir_size = sum(f.stat().st_size for f in item.rglob("*") if f.is_file())
freed_bytes += dir_size
shutil.rmtree(str(item), ignore_errors=True)
except (PermissionError, OSError):
@@ -392,10 +387,7 @@ class HermesMonitor:
return CheckResult(
name="ollama",
level=HealthLevel.OK,
message=(
f"Ollama OK — {len(models)} model(s) available, "
f"{len(loaded)} loaded"
),
message=(f"Ollama OK — {len(models)} model(s) available, {len(loaded)} loaded"),
details={
"reachable": True,
"model_count": len(models),

View File

@@ -21,6 +21,8 @@ logger = logging.getLogger(__name__)
@dataclass
class Notification:
"""A push notification with title, message, category, and read status."""
id: int
title: str
message: str

View File

@@ -242,6 +242,64 @@ def produce_agent_state(agent_id: str, presence: dict) -> dict:
}
def _get_agents_online() -> int:
"""Return the count of agents with a non-offline status."""
try:
from timmy.agents.loader import list_agents
agents = list_agents()
return sum(1 for a in agents if a.get("status", "") not in ("offline", ""))
except Exception as exc:
logger.debug("Failed to count agents: %s", exc)
return 0
def _get_visitors() -> int:
"""Return the count of active WebSocket visitor clients."""
try:
from dashboard.routes.world import _ws_clients
return len(_ws_clients)
except Exception as exc:
logger.debug("Failed to count visitors: %s", exc)
return 0
def _get_uptime_seconds() -> int:
"""Return seconds elapsed since application start."""
try:
from config import APP_START_TIME
return int((datetime.now(UTC) - APP_START_TIME).total_seconds())
except Exception as exc:
logger.debug("Failed to calculate uptime: %s", exc)
return 0
def _get_thinking_active() -> bool:
"""Return True if the thinking engine is enabled and running."""
try:
from config import settings
from timmy.thinking import thinking_engine
return settings.thinking_enabled and thinking_engine is not None
except Exception as exc:
logger.debug("Failed to check thinking status: %s", exc)
return False
def _get_memory_count() -> int:
"""Return total entries in the vector memory store."""
try:
from timmy.memory_system import get_memory_stats
stats = get_memory_stats()
return stats.get("total_entries", 0)
except Exception as exc:
logger.debug("Failed to count memories: %s", exc)
return 0
def produce_system_status() -> dict:
"""Generate a system_status message for the Matrix.
@@ -270,64 +328,14 @@ def produce_system_status() -> dict:
"ts": 1742529600,
}
"""
# Count agents with status != offline
agents_online = 0
try:
from timmy.agents.loader import list_agents
agents = list_agents()
agents_online = sum(1 for a in agents if a.get("status", "") not in ("offline", ""))
except Exception as exc:
logger.debug("Failed to count agents: %s", exc)
# Count visitors from WebSocket clients
visitors = 0
try:
from dashboard.routes.world import _ws_clients
visitors = len(_ws_clients)
except Exception as exc:
logger.debug("Failed to count visitors: %s", exc)
# Calculate uptime
uptime_seconds = 0
try:
from datetime import UTC
from config import APP_START_TIME
uptime_seconds = int((datetime.now(UTC) - APP_START_TIME).total_seconds())
except Exception as exc:
logger.debug("Failed to calculate uptime: %s", exc)
# Check thinking engine status
thinking_active = False
try:
from config import settings
from timmy.thinking import thinking_engine
thinking_active = settings.thinking_enabled and thinking_engine is not None
except Exception as exc:
logger.debug("Failed to check thinking status: %s", exc)
# Count memories in vector store
memory_count = 0
try:
from timmy.memory_system import get_memory_stats
stats = get_memory_stats()
memory_count = stats.get("total_entries", 0)
except Exception as exc:
logger.debug("Failed to count memories: %s", exc)
return {
"type": "system_status",
"data": {
"agents_online": agents_online,
"visitors": visitors,
"uptime_seconds": uptime_seconds,
"thinking_active": thinking_active,
"memory_count": memory_count,
"agents_online": _get_agents_online(),
"visitors": _get_visitors(),
"uptime_seconds": _get_uptime_seconds(),
"thinking_active": _get_thinking_active(),
"memory_count": _get_memory_count(),
},
"ts": int(time.time()),
}

View File

@@ -2,7 +2,16 @@
from .api import router
from .cascade import CascadeRouter, Provider, ProviderStatus, get_router
from .classifier import TaskComplexity, classify_task
from .history import HealthHistoryStore, get_history_store
from .metabolic import (
DEFAULT_TIER_MODELS,
MetabolicRouter,
ModelTier,
build_prompt,
classify_complexity,
get_metabolic_router,
)
__all__ = [
"CascadeRouter",
@@ -12,4 +21,14 @@ __all__ = [
"router",
"HealthHistoryStore",
"get_history_store",
# Metabolic router
"MetabolicRouter",
"ModelTier",
"DEFAULT_TIER_MODELS",
"classify_complexity",
"build_prompt",
"get_metabolic_router",
# Classifier
"TaskComplexity",
"classify_task",
]

View File

@@ -16,7 +16,10 @@ from dataclasses import dataclass, field
from datetime import UTC, datetime
from enum import Enum
from pathlib import Path
from typing import Any
from typing import TYPE_CHECKING, Any
if TYPE_CHECKING:
from infrastructure.router.classifier import TaskComplexity
from config import settings
@@ -328,6 +331,22 @@ class CascadeRouter:
logger.debug("vllm-mlx provider check error: %s", exc)
return False
elif provider.type == "vllm":
# Check if standard vLLM server is running (OpenAI-compatible API)
if requests is None:
return True
try:
base_url = provider.base_url or provider.url or settings.vllm_url
# Strip /v1 suffix — health endpoint is at the server root
server_root = base_url.rstrip("/")
if server_root.endswith("/v1"):
server_root = server_root[:-3]
response = requests.get(f"{server_root}/health", timeout=5)
return response.status_code == 200
except Exception as exc:
logger.debug("vllm provider check error: %s", exc)
return False
elif provider.type in ("openai", "anthropic", "grok"):
# Check if API key is set
return provider.api_key is not None and provider.api_key != ""
@@ -528,6 +547,99 @@ class CascadeRouter:
return True
def _filter_providers(self, cascade_tier: str | None) -> list["Provider"]:
"""Return the provider list filtered by tier.
Raises:
RuntimeError: If a tier is specified but no matching providers exist.
"""
if cascade_tier == "frontier_required":
providers = [p for p in self.providers if p.type == "anthropic"]
if not providers:
raise RuntimeError("No Anthropic provider configured for 'frontier_required' tier.")
return providers
if cascade_tier:
providers = [p for p in self.providers if p.tier == cascade_tier]
if not providers:
raise RuntimeError(f"No providers found for tier: {cascade_tier}")
return providers
return self.providers
async def _try_single_provider(
self,
provider: "Provider",
messages: list[dict],
model: str | None,
temperature: float,
max_tokens: int | None,
content_type: ContentType,
errors: list[str],
) -> dict | None:
"""Attempt one provider, returning a result dict on success or None on failure.
On failure the error string is appended to *errors* and the provider's
failure metrics are updated so the caller can move on to the next provider.
"""
if not self._is_provider_available(provider):
return None
# Metabolic protocol: skip cloud providers when quota is low
if provider.type in ("anthropic", "openai", "grok"):
if not self._quota_allows_cloud(provider):
logger.info(
"Metabolic protocol: skipping cloud provider %s (quota too low)",
provider.name,
)
return None
selected_model, is_fallback_model = self._select_model(provider, model, content_type)
try:
result = await self._attempt_with_retry(
provider, messages, selected_model, temperature, max_tokens, content_type
)
except RuntimeError as exc:
errors.append(str(exc))
self._record_failure(provider)
return None
self._record_success(provider, result.get("latency_ms", 0))
return {
"content": result["content"],
"provider": provider.name,
"model": result.get("model", selected_model or provider.get_default_model()),
"latency_ms": result.get("latency_ms", 0),
"is_fallback_model": is_fallback_model,
}
def _get_model_for_complexity(
self, provider: Provider, complexity: "TaskComplexity"
) -> str | None:
"""Return the best model on *provider* for the given complexity tier.
Checks fallback chains first (routine / complex), then falls back to
any model with the matching capability tag, then the provider default.
"""
from infrastructure.router.classifier import TaskComplexity
chain_key = "routine" if complexity == TaskComplexity.SIMPLE else "complex"
# Walk the capability fallback chain — first model present on this provider wins
for model_name in self.config.fallback_chains.get(chain_key, []):
if any(m["name"] == model_name for m in provider.models):
return model_name
# Direct capability lookup — only return if a model explicitly has the tag
# (do not use get_model_with_capability here as it falls back to the default)
cap_model = next(
(m["name"] for m in provider.models if chain_key in m.get("capabilities", [])),
None,
)
if cap_model:
return cap_model
return None # Caller will use provider default
async def complete(
self,
messages: list[dict],
@@ -535,6 +647,7 @@ class CascadeRouter:
temperature: float = 0.7,
max_tokens: int | None = None,
cascade_tier: str | None = None,
complexity_hint: str | None = None,
) -> dict:
"""Complete a chat conversation with automatic failover.
@@ -543,35 +656,50 @@ class CascadeRouter:
- Falls back to vision-capable models when needed
- Supports image URLs, paths, and base64 encoding
Complexity-based routing (issue #1065):
- ``complexity_hint="simple"`` → routes to Qwen3-8B (low-latency)
- ``complexity_hint="complex"`` → routes to Qwen3-14B (quality)
- ``complexity_hint=None`` (default) → auto-classifies from messages
Args:
messages: List of message dicts with role and content
model: Preferred model (tries this first, then provider defaults)
model: Preferred model (tries this first; complexity routing is
skipped when an explicit model is given)
temperature: Sampling temperature
max_tokens: Maximum tokens to generate
cascade_tier: If specified, filters providers by this tier.
- "frontier_required": Uses only Anthropic provider for top-tier models.
complexity_hint: "simple", "complex", or None (auto-detect).
Returns:
Dict with content, provider_used, and metrics
Dict with content, provider_used, model, latency_ms,
is_fallback_model, and complexity fields.
Raises:
RuntimeError: If all providers fail
"""
from infrastructure.router.classifier import TaskComplexity, classify_task
content_type = self._detect_content_type(messages)
if content_type != ContentType.TEXT:
logger.debug("Detected %s content, selecting appropriate model", content_type.value)
errors = []
# Resolve task complexity ─────────────────────────────────────────────
# Skip complexity routing when caller explicitly specifies a model.
complexity: TaskComplexity | None = None
if model is None:
if complexity_hint is not None:
try:
complexity = TaskComplexity(complexity_hint.lower())
except ValueError:
logger.warning("Unknown complexity_hint %r, auto-classifying", complexity_hint)
complexity = classify_task(messages)
else:
complexity = classify_task(messages)
logger.debug("Task complexity: %s", complexity.value)
providers = self.providers
if cascade_tier == "frontier_required":
providers = [p for p in self.providers if p.type == "anthropic"]
if not providers:
raise RuntimeError("No Anthropic provider configured for 'frontier_required' tier.")
elif cascade_tier:
providers = [p for p in self.providers if p.tier == cascade_tier]
if not providers:
raise RuntimeError(f"No providers found for tier: {cascade_tier}")
errors: list[str] = []
providers = self._filter_providers(cascade_tier)
for provider in providers:
if not self._is_provider_available(provider):
@@ -586,7 +714,21 @@ class CascadeRouter:
)
continue
selected_model, is_fallback_model = self._select_model(provider, model, content_type)
# Complexity-based model selection (only when no explicit model) ──
effective_model = model
if effective_model is None and complexity is not None:
effective_model = self._get_model_for_complexity(provider, complexity)
if effective_model:
logger.debug(
"Complexity routing [%s]: %s%s",
complexity.value,
provider.name,
effective_model,
)
selected_model, is_fallback_model = self._select_model(
provider, effective_model, content_type
)
try:
result = await self._attempt_with_retry(
@@ -609,6 +751,7 @@ class CascadeRouter:
"model": result.get("model", selected_model or provider.get_default_model()),
"latency_ms": result.get("latency_ms", 0),
"is_fallback_model": is_fallback_model,
"complexity": complexity.value if complexity is not None else None,
}
raise RuntimeError(f"All providers failed: {'; '.join(errors)}")
@@ -666,6 +809,14 @@ class CascadeRouter:
temperature=temperature,
max_tokens=max_tokens,
)
elif provider.type == "vllm":
result = await self._call_vllm(
provider=provider,
messages=messages,
model=model or provider.get_default_model(),
temperature=temperature,
max_tokens=max_tokens,
)
else:
raise ValueError(f"Unknown provider type: {provider.type}")
@@ -904,6 +1055,49 @@ class CascadeRouter:
"model": response.model,
}
async def _call_vllm(
self,
provider: Provider,
messages: list[dict],
model: str,
temperature: float,
max_tokens: int | None,
) -> dict:
"""Call a standard vLLM server via its OpenAI-compatible API.
vLLM exposes the same /v1/chat/completions endpoint as OpenAI.
No API key is required for local deployments.
Default URL comes from settings.vllm_url (VLLM_URL env var).
"""
import openai
base_url = provider.base_url or provider.url or settings.vllm_url
# Ensure the base_url ends with /v1 as expected by the OpenAI client
if not base_url.rstrip("/").endswith("/v1"):
base_url = base_url.rstrip("/") + "/v1"
client = openai.AsyncOpenAI(
api_key=provider.api_key or "no-key-required",
base_url=base_url,
timeout=self.config.timeout_seconds,
)
kwargs: dict = {
"model": model,
"messages": messages,
"temperature": temperature,
}
if max_tokens:
kwargs["max_tokens"] = max_tokens
response = await client.chat.completions.create(**kwargs)
return {
"content": response.choices[0].message.content,
"model": response.model,
}
def _record_success(self, provider: Provider, latency_ms: float) -> None:
"""Record a successful request."""
provider.metrics.total_requests += 1

View File

@@ -0,0 +1,169 @@
"""Task complexity classifier for Qwen3 dual-model routing.
Classifies incoming tasks as SIMPLE (route to Qwen3-8B for low-latency)
or COMPLEX (route to Qwen3-14B for quality-sensitive work).
Classification is fully heuristic — no LLM inference required.
"""
import re
from enum import Enum
class TaskComplexity(Enum):
"""Task complexity tier for model routing."""
SIMPLE = "simple" # Qwen3-8B Q6_K: routine, latency-sensitive
COMPLEX = "complex" # Qwen3-14B Q5_K_M: quality-sensitive, multi-step
# Keywords strongly associated with complex tasks
_COMPLEX_KEYWORDS: frozenset[str] = frozenset(
[
"plan",
"review",
"analyze",
"analyse",
"triage",
"refactor",
"design",
"architecture",
"implement",
"compare",
"debug",
"explain",
"prioritize",
"prioritise",
"strategy",
"optimize",
"optimise",
"evaluate",
"assess",
"brainstorm",
"outline",
"summarize",
"summarise",
"generate code",
"write a",
"write the",
"code review",
"pull request",
"multi-step",
"multi step",
"step by step",
"backlog prioriti",
"issue triage",
"root cause",
"how does",
"why does",
"what are the",
]
)
# Keywords strongly associated with simple/routine tasks
_SIMPLE_KEYWORDS: frozenset[str] = frozenset(
[
"status",
"list ",
"show ",
"what is",
"how many",
"ping",
"run ",
"execute ",
"ls ",
"cat ",
"ps ",
"fetch ",
"count ",
"tail ",
"head ",
"grep ",
"find file",
"read file",
"get ",
"query ",
"check ",
"yes",
"no",
"ok",
"done",
"thanks",
]
)
# Content longer than this is treated as complex regardless of keywords
_COMPLEX_CHAR_THRESHOLD = 500
# Short content defaults to simple
_SIMPLE_CHAR_THRESHOLD = 150
# More than this many messages suggests an ongoing complex conversation
_COMPLEX_CONVERSATION_DEPTH = 6
def classify_task(messages: list[dict]) -> TaskComplexity:
"""Classify task complexity from a list of messages.
Uses heuristic rules — no LLM call required. Errs toward COMPLEX
when uncertain so that quality is preserved.
Args:
messages: List of message dicts with ``role`` and ``content`` keys.
Returns:
TaskComplexity.SIMPLE or TaskComplexity.COMPLEX
"""
if not messages:
return TaskComplexity.SIMPLE
# Concatenate all user-turn content for analysis
user_content = (
" ".join(
msg.get("content", "")
for msg in messages
if msg.get("role") in ("user", "human") and isinstance(msg.get("content"), str)
)
.lower()
.strip()
)
if not user_content:
return TaskComplexity.SIMPLE
# Complexity signals override everything -----------------------------------
# Explicit complex keywords
for kw in _COMPLEX_KEYWORDS:
if kw in user_content:
return TaskComplexity.COMPLEX
# Numbered / multi-step instruction list: "1. do this 2. do that"
if re.search(r"\b\d+\.\s+\w", user_content):
return TaskComplexity.COMPLEX
# Code blocks embedded in messages
if "```" in user_content:
return TaskComplexity.COMPLEX
# Long content → complex reasoning likely required
if len(user_content) > _COMPLEX_CHAR_THRESHOLD:
return TaskComplexity.COMPLEX
# Deep conversation → complex ongoing task
if len(messages) > _COMPLEX_CONVERSATION_DEPTH:
return TaskComplexity.COMPLEX
# Simplicity signals -------------------------------------------------------
# Explicit simple keywords
for kw in _SIMPLE_KEYWORDS:
if kw in user_content:
return TaskComplexity.SIMPLE
# Short single-sentence messages default to simple
if len(user_content) <= _SIMPLE_CHAR_THRESHOLD:
return TaskComplexity.SIMPLE
# When uncertain, prefer quality (complex model)
return TaskComplexity.COMPLEX

View File

@@ -0,0 +1,424 @@
"""Three-tier metabolic LLM router.
Routes queries to the cheapest-sufficient model tier using MLX for all
inference on Apple Silicon GPU:
T1 — Routine (Qwen3-8B Q6_K, ~45-55 tok/s): Simple navigation, basic choices.
T2 — Medium (Qwen3-14B Q5_K_M, ~20-28 tok/s): Dialogue, inventory management.
T3 — Complex (Qwen3-32B Q4_K_M, ~8-12 tok/s): Quest planning, stuck recovery.
Memory budget:
- T1+T2 always loaded (~8.5 GB combined)
- T3 loaded on demand (+20 GB) — game pauses during inference
Design notes:
- 70% of game ticks never reach the LLM (handled upstream by behavior trees)
- T3 pauses the game world before inference and unpauses after (graceful if no world)
- All inference via vllm-mlx / Ollama — local-first, no cloud for game ticks
References:
- Issue #966 — Three-Tier Metabolic LLM Router
- Issue #1063 — Best Local Uncensored Agent Model for M3 Max 36GB
- Issue #1075 — Claude Quota Monitor + Metabolic Protocol
"""
import asyncio
import logging
from enum import StrEnum
from typing import Any
logger = logging.getLogger(__name__)
class ModelTier(StrEnum):
"""Three metabolic model tiers ordered by cost and capability.
Tier selection is driven by classify_complexity(). The cheapest
sufficient tier is always chosen — T1 handles routine tasks, T2
handles dialogue and management, T3 handles planning and recovery.
"""
T1_ROUTINE = "t1_routine" # Fast, cheap — Qwen3-8B, always loaded
T2_MEDIUM = "t2_medium" # Balanced — Qwen3-14B, always loaded
T3_COMPLEX = "t3_complex" # Deep — Qwen3-32B, loaded on demand, pauses game
# ── Classification vocabulary ────────────────────────────────────────────────
# T1: single-action navigation and binary-choice words
_T1_KEYWORDS = frozenset(
{
"go",
"move",
"walk",
"run",
"north",
"south",
"east",
"west",
"up",
"down",
"left",
"right",
"yes",
"no",
"ok",
"okay",
"open",
"close",
"take",
"drop",
"look",
"pick",
"use",
"wait",
"rest",
"save",
"attack",
"flee",
"jump",
"crouch",
}
)
# T3: planning, optimisation, or recovery signals
_T3_KEYWORDS = frozenset(
{
"plan",
"strategy",
"optimize",
"optimise",
"quest",
"stuck",
"recover",
"multi-step",
"long-term",
"negotiate",
"persuade",
"faction",
"reputation",
"best",
"optimal",
"recommend",
"analyze",
"analyse",
"evaluate",
"decide",
"complex",
"how do i",
"what should i do",
"help me figure",
"what is the best",
}
)
def classify_complexity(task: str, state: dict) -> ModelTier:
"""Classify a task to the cheapest-sufficient model tier.
Classification priority (highest wins):
1. T3 — any T3 keyword, stuck indicator, or ``state["require_t3"] = True``
2. T1 — short task with only T1 keywords and no active context
3. T2 — everything else (safe default)
Args:
task: Natural-language task description or player input.
state: Current game state dict. Recognised keys:
``stuck`` (bool), ``require_t3`` (bool),
``active_quests`` (list), ``dialogue_active`` (bool).
Returns:
ModelTier appropriate for the task.
"""
task_lower = task.lower()
words = set(task_lower.split())
# ── T3 signals ──────────────────────────────────────────────────────────
t3_keyword_hit = bool(words & _T3_KEYWORDS)
# Check multi-word T3 phrases
t3_phrase_hit = any(phrase in task_lower for phrase in _T3_KEYWORDS if " " in phrase)
is_stuck = bool(state.get("stuck", False))
explicit_t3 = bool(state.get("require_t3", False))
if t3_keyword_hit or t3_phrase_hit or is_stuck or explicit_t3:
logger.debug(
"classify_complexity → T3 (keywords=%s stuck=%s explicit=%s)",
t3_keyword_hit or t3_phrase_hit,
is_stuck,
explicit_t3,
)
return ModelTier.T3_COMPLEX
# ── T1 signals ──────────────────────────────────────────────────────────
t1_keyword_hit = bool(words & _T1_KEYWORDS)
task_short = len(task.split()) <= 6
no_active_context = (
not state.get("active_quests")
and not state.get("dialogue_active")
and not state.get("combat_active")
)
if t1_keyword_hit and task_short and no_active_context:
logger.debug("classify_complexity → T1 (keywords=%s short=%s)", t1_keyword_hit, task_short)
return ModelTier.T1_ROUTINE
# ── Default: T2 ─────────────────────────────────────────────────────────
logger.debug("classify_complexity → T2 (default)")
return ModelTier.T2_MEDIUM
def build_prompt(
state: dict,
ui_state: dict,
text: str,
visual_context: str | None = None,
) -> list[dict]:
"""Build an OpenAI-compatible messages list from game context.
Assembles a system message from structured game state and a user
message from the player's text input. This format is accepted by
CascadeRouter.complete() directly.
Args:
state: Current game state dict. Common keys:
``location`` (str), ``health`` (int/float),
``inventory`` (list), ``active_quests`` (list),
``stuck`` (bool).
ui_state: Current UI state dict. Common keys:
``dialogue_active`` (bool), ``dialogue_npc`` (str),
``menu_open`` (str), ``combat_active`` (bool).
text: Player text or task description (becomes user message).
visual_context: Optional free-text description of the current screen
or scene — from a vision model or rule-based extractor.
Returns:
List of message dicts: [{"role": "system", ...}, {"role": "user", ...}]
"""
context_lines: list[str] = []
location = state.get("location", "unknown")
context_lines.append(f"Location: {location}")
health = state.get("health")
if health is not None:
context_lines.append(f"Health: {health}")
inventory = state.get("inventory", [])
if inventory:
items = [i if isinstance(i, str) else i.get("name", str(i)) for i in inventory[:10]]
context_lines.append(f"Inventory: {', '.join(items)}")
active_quests = state.get("active_quests", [])
if active_quests:
names = [q if isinstance(q, str) else q.get("name", str(q)) for q in active_quests[:5]]
context_lines.append(f"Active quests: {', '.join(names)}")
if state.get("stuck"):
context_lines.append("Status: STUCK — need recovery strategy")
if ui_state.get("dialogue_active"):
npc = ui_state.get("dialogue_npc", "NPC")
context_lines.append(f"In dialogue with: {npc}")
if ui_state.get("menu_open"):
context_lines.append(f"Menu open: {ui_state['menu_open']}")
if ui_state.get("combat_active"):
context_lines.append("Status: IN COMBAT")
if visual_context:
context_lines.append(f"Scene: {visual_context}")
system_content = (
"You are Timmy, an AI game agent. "
"Respond with valid game commands only.\n\n" + "\n".join(context_lines)
)
return [
{"role": "system", "content": system_content},
{"role": "user", "content": text},
]
# ── Default model assignments ────────────────────────────────────────────────
# Overridable per deployment via MetabolicRouter(tier_models={...}).
# Model benchmarks (M3 Max 36 GB, issue #1063):
# Qwen3-8B Q6_K — 0.933 F1 tool calling, ~45-55 tok/s (~6 GB)
# Qwen3-14B Q5_K_M — 0.971 F1 tool calling, ~20-28 tok/s (~9.5 GB)
# Qwen3-32B Q4_K_M — highest quality, ~8-12 tok/s (~20 GB, on demand)
DEFAULT_TIER_MODELS: dict[ModelTier, str] = {
ModelTier.T1_ROUTINE: "qwen3:8b",
ModelTier.T2_MEDIUM: "qwen3:14b",
ModelTier.T3_COMPLEX: "qwen3:30b", # Closest Ollama tag to 32B Q4
}
class MetabolicRouter:
"""Routes LLM requests to the cheapest-sufficient model tier.
Wraps CascadeRouter with:
- Complexity classification via classify_complexity()
- Prompt assembly via build_prompt()
- T3 world-pause / world-unpause (graceful if no world adapter)
Usage::
router = MetabolicRouter()
# Simple route call — classification + prompt + inference in one step
result = await router.route(
task="Go north",
state={"location": "Balmora"},
ui_state={},
)
print(result["content"], result["tier"])
# Pre-classify if you need the tier for telemetry
tier = router.classify("Plan the best path to Vivec", game_state)
# Wire in world adapter for T3 pause/unpause
router.set_world(world_adapter)
"""
def __init__(
self,
cascade: Any | None = None,
tier_models: dict[ModelTier, str] | None = None,
) -> None:
"""Initialise the metabolic router.
Args:
cascade: CascadeRouter instance to use. If None, the
singleton returned by get_router() is used lazily.
tier_models: Override default model names per tier.
"""
self._cascade = cascade
self._tier_models: dict[ModelTier, str] = dict(DEFAULT_TIER_MODELS)
if tier_models:
self._tier_models.update(tier_models)
self._world: Any | None = None
def set_world(self, world: Any) -> None:
"""Wire in a world adapter for T3 pause / unpause support.
The adapter only needs to implement ``act(CommandInput)`` — the full
WorldInterface contract is not required. A missing or broken world
adapter degrades gracefully (logs a warning, inference continues).
Args:
world: Any object with an ``act(CommandInput)`` method.
"""
self._world = world
def _get_cascade(self) -> Any:
"""Return the CascadeRouter, creating the singleton if needed."""
if self._cascade is None:
from infrastructure.router.cascade import get_router
self._cascade = get_router()
return self._cascade
def classify(self, task: str, state: dict) -> ModelTier:
"""Classify task complexity. Delegates to classify_complexity()."""
return classify_complexity(task, state)
async def _pause_world(self) -> None:
"""Pause the game world before T3 inference (graceful degradation)."""
if self._world is None:
return
try:
from infrastructure.world.types import CommandInput
await asyncio.to_thread(self._world.act, CommandInput(action="pause"))
logger.debug("MetabolicRouter: world paused for T3 inference")
except Exception as exc:
logger.warning("world.pause() failed — continuing without pause: %s", exc)
async def _unpause_world(self) -> None:
"""Unpause the game world after T3 inference (always called, even on error)."""
if self._world is None:
return
try:
from infrastructure.world.types import CommandInput
await asyncio.to_thread(self._world.act, CommandInput(action="unpause"))
logger.debug("MetabolicRouter: world unpaused after T3 inference")
except Exception as exc:
logger.warning("world.unpause() failed — game may remain paused: %s", exc)
async def route(
self,
task: str,
state: dict,
ui_state: dict | None = None,
visual_context: str | None = None,
temperature: float = 0.3,
max_tokens: int | None = None,
) -> dict:
"""Route a task to the appropriate model tier and return the LLM response.
Selects the tier via classify_complexity(), assembles the prompt via
build_prompt(), and dispatches to CascadeRouter. For T3, the game
world is paused before inference and unpaused after (in a finally block).
Args:
task: Natural-language task description or player input.
state: Current game state dict.
ui_state: Current UI state dict (optional, defaults to {}).
visual_context: Optional screen/scene description from vision model.
temperature: Sampling temperature (default 0.3 for game commands).
max_tokens: Maximum tokens to generate.
Returns:
Dict with keys: ``content``, ``provider``, ``model``, ``tier``,
``latency_ms``, plus any extra keys from CascadeRouter.
Raises:
RuntimeError: If all providers fail (propagated from CascadeRouter).
"""
ui_state = ui_state or {}
tier = self.classify(task, state)
model = self._tier_models[tier]
messages = build_prompt(state, ui_state, task, visual_context)
cascade = self._get_cascade()
logger.info(
"MetabolicRouter: tier=%s model=%s task=%r",
tier,
model,
task[:80],
)
if tier == ModelTier.T3_COMPLEX:
await self._pause_world()
try:
result = await cascade.complete(
messages=messages,
model=model,
temperature=temperature,
max_tokens=max_tokens,
)
finally:
await self._unpause_world()
else:
result = await cascade.complete(
messages=messages,
model=model,
temperature=temperature,
max_tokens=max_tokens,
)
result["tier"] = tier
return result
# ── Module-level singleton ────────────────────────────────────────────────────
_metabolic_router: MetabolicRouter | None = None
def get_metabolic_router() -> MetabolicRouter:
"""Get or create the MetabolicRouter singleton."""
global _metabolic_router
if _metabolic_router is None:
_metabolic_router = MetabolicRouter()
return _metabolic_router

View File

@@ -0,0 +1,247 @@
"""Self-correction event logger.
Records instances where the agent detected its own errors and the steps
it took to correct them. Used by the Self-Correction Dashboard to visualise
these events and surface recurring failure patterns.
Usage::
from infrastructure.self_correction import log_self_correction, get_corrections, get_patterns
log_self_correction(
source="agentic_loop",
original_intent="Execute step 3: deploy service",
detected_error="ConnectionRefusedError: port 8080 unavailable",
correction_strategy="Retry on alternate port 8081",
final_outcome="Success on retry",
task_id="abc123",
)
"""
from __future__ import annotations
import json
import logging
import sqlite3
import uuid
from collections.abc import Generator
from contextlib import closing, contextmanager
from datetime import UTC, datetime
from pathlib import Path
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Database
# ---------------------------------------------------------------------------
_DB_PATH: Path | None = None
def _get_db_path() -> Path:
global _DB_PATH
if _DB_PATH is None:
from config import settings
_DB_PATH = Path(settings.repo_root) / "data" / "self_correction.db"
return _DB_PATH
@contextmanager
def _get_db() -> Generator[sqlite3.Connection, None, None]:
db_path = _get_db_path()
db_path.parent.mkdir(parents=True, exist_ok=True)
with closing(sqlite3.connect(str(db_path))) as conn:
conn.row_factory = sqlite3.Row
conn.execute("""
CREATE TABLE IF NOT EXISTS self_correction_events (
id TEXT PRIMARY KEY,
source TEXT NOT NULL,
task_id TEXT DEFAULT '',
original_intent TEXT NOT NULL,
detected_error TEXT NOT NULL,
correction_strategy TEXT NOT NULL,
final_outcome TEXT NOT NULL,
outcome_status TEXT DEFAULT 'success',
error_type TEXT DEFAULT '',
created_at TEXT DEFAULT (datetime('now'))
)
""")
conn.execute(
"CREATE INDEX IF NOT EXISTS idx_sc_created ON self_correction_events(created_at)"
)
conn.execute(
"CREATE INDEX IF NOT EXISTS idx_sc_error_type ON self_correction_events(error_type)"
)
conn.commit()
yield conn
# ---------------------------------------------------------------------------
# Write
# ---------------------------------------------------------------------------
def log_self_correction(
*,
source: str,
original_intent: str,
detected_error: str,
correction_strategy: str,
final_outcome: str,
task_id: str = "",
outcome_status: str = "success",
error_type: str = "",
) -> str:
"""Record a self-correction event and return its ID.
Args:
source: Module or component that triggered the correction.
original_intent: What the agent was trying to do.
detected_error: The error or problem that was detected.
correction_strategy: How the agent attempted to correct the error.
final_outcome: What the result of the correction attempt was.
task_id: Optional task/session ID for correlation.
outcome_status: 'success', 'partial', or 'failed'.
error_type: Short category label for pattern analysis (e.g.
'ConnectionError', 'TimeoutError').
Returns:
The ID of the newly created record.
"""
event_id = str(uuid.uuid4())
if not error_type:
# Derive a simple type from the first word of the detected error
error_type = detected_error.split(":")[0].strip()[:64]
try:
with _get_db() as conn:
conn.execute(
"""
INSERT INTO self_correction_events
(id, source, task_id, original_intent, detected_error,
correction_strategy, final_outcome, outcome_status, error_type)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
""",
(
event_id,
source,
task_id,
original_intent[:2000],
detected_error[:2000],
correction_strategy[:2000],
final_outcome[:2000],
outcome_status,
error_type,
),
)
conn.commit()
logger.info(
"Self-correction logged [%s] source=%s error_type=%s status=%s",
event_id[:8],
source,
error_type,
outcome_status,
)
except Exception as exc:
logger.warning("Failed to log self-correction event: %s", exc)
return event_id
# ---------------------------------------------------------------------------
# Read
# ---------------------------------------------------------------------------
def get_corrections(limit: int = 50) -> list[dict]:
"""Return the most recent self-correction events, newest first."""
try:
with _get_db() as conn:
rows = conn.execute(
"""
SELECT * FROM self_correction_events
ORDER BY created_at DESC
LIMIT ?
""",
(limit,),
).fetchall()
return [dict(r) for r in rows]
except Exception as exc:
logger.warning("Failed to fetch self-correction events: %s", exc)
return []
def get_patterns(top_n: int = 10) -> list[dict]:
"""Return the most common recurring error types with counts.
Each entry has:
- error_type: category label
- count: total occurrences
- success_count: corrected successfully
- failed_count: correction also failed
- last_seen: ISO timestamp of most recent occurrence
"""
try:
with _get_db() as conn:
rows = conn.execute(
"""
SELECT
error_type,
COUNT(*) AS count,
SUM(CASE WHEN outcome_status = 'success' THEN 1 ELSE 0 END) AS success_count,
SUM(CASE WHEN outcome_status = 'failed' THEN 1 ELSE 0 END) AS failed_count,
MAX(created_at) AS last_seen
FROM self_correction_events
GROUP BY error_type
ORDER BY count DESC
LIMIT ?
""",
(top_n,),
).fetchall()
return [dict(r) for r in rows]
except Exception as exc:
logger.warning("Failed to fetch self-correction patterns: %s", exc)
return []
def get_stats() -> dict:
"""Return aggregate statistics for the summary panel."""
try:
with _get_db() as conn:
row = conn.execute(
"""
SELECT
COUNT(*) AS total,
SUM(CASE WHEN outcome_status = 'success' THEN 1 ELSE 0 END) AS success_count,
SUM(CASE WHEN outcome_status = 'partial' THEN 1 ELSE 0 END) AS partial_count,
SUM(CASE WHEN outcome_status = 'failed' THEN 1 ELSE 0 END) AS failed_count,
COUNT(DISTINCT error_type) AS unique_error_types,
COUNT(DISTINCT source) AS sources
FROM self_correction_events
"""
).fetchone()
if row is None:
return _empty_stats()
d = dict(row)
total = d.get("total") or 0
if total:
d["success_rate"] = round((d.get("success_count") or 0) / total * 100)
else:
d["success_rate"] = 0
return d
except Exception as exc:
logger.warning("Failed to fetch self-correction stats: %s", exc)
return _empty_stats()
def _empty_stats() -> dict:
return {
"total": 0,
"success_count": 0,
"partial_count": 0,
"failed_count": 0,
"unique_error_types": 0,
"sources": 0,
"success_rate": 0,
}

View File

@@ -135,7 +135,9 @@ class BannerlordObserver:
self._host = host or settings.gabs_host
self._port = port or settings.gabs_port
self._timeout = timeout if timeout is not None else settings.gabs_timeout
self._poll_interval = poll_interval if poll_interval is not None else settings.gabs_poll_interval
self._poll_interval = (
poll_interval if poll_interval is not None else settings.gabs_poll_interval
)
self._journal_path = Path(journal_path) if journal_path else _get_journal_path()
self._entry_count = 0
self._days_observed: set[str] = set()

View File

@@ -24,6 +24,8 @@ logger = logging.getLogger(__name__)
@dataclass
class Intent:
"""A classified user intent with confidence score and extracted entities."""
name: str
confidence: float # 0.0 to 1.0
entities: dict

View File

@@ -17,11 +17,15 @@ logger = logging.getLogger(__name__)
class TxType(StrEnum):
"""Lightning transaction direction type."""
incoming = "incoming"
outgoing = "outgoing"
class TxStatus(StrEnum):
"""Lightning transaction settlement status."""
pending = "pending"
settled = "settled"
failed = "failed"

View File

@@ -0,0 +1,7 @@
"""Self-coding package — Timmy's self-modification capability.
Provides the branch→edit→test→commit/revert loop that allows Timmy
to propose and apply code changes autonomously, gated by the test suite.
Main entry point: ``self_coding.self_modify.loop``
"""

View File

@@ -0,0 +1,129 @@
"""Gitea REST client — thin wrapper for PR creation and issue commenting.
Uses ``settings.gitea_url``, ``settings.gitea_token``, and
``settings.gitea_repo`` (owner/repo) from config. Degrades gracefully
when the token is absent or the server is unreachable.
"""
from __future__ import annotations
import logging
from dataclasses import dataclass
logger = logging.getLogger(__name__)
@dataclass
class PullRequest:
"""Minimal representation of a created pull request."""
number: int
title: str
html_url: str
class GiteaClient:
"""HTTP client for Gitea's REST API v1.
All methods return structured results and never raise — errors are
logged at WARNING level and indicated via return value.
"""
def __init__(
self,
base_url: str | None = None,
token: str | None = None,
repo: str | None = None,
) -> None:
from config import settings
self._base_url = (base_url or settings.gitea_url).rstrip("/")
self._token = token or settings.gitea_token
self._repo = repo or settings.gitea_repo
# ── internal ────────────────────────────────────────────────────────────
def _headers(self) -> dict[str, str]:
return {
"Authorization": f"token {self._token}",
"Content-Type": "application/json",
}
def _api(self, path: str) -> str:
return f"{self._base_url}/api/v1/{path.lstrip('/')}"
# ── public API ───────────────────────────────────────────────────────────
def create_pull_request(
self,
title: str,
body: str,
head: str,
base: str = "main",
) -> PullRequest | None:
"""Open a pull request.
Args:
title: PR title (keep under 70 chars).
body: PR body in markdown.
head: Source branch (e.g. ``self-modify/issue-983``).
base: Target branch (default ``main``).
Returns:
A ``PullRequest`` dataclass on success, ``None`` on failure.
"""
if not self._token:
logger.warning("Gitea token not configured — skipping PR creation")
return None
try:
import requests as _requests
resp = _requests.post(
self._api(f"repos/{self._repo}/pulls"),
headers=self._headers(),
json={"title": title, "body": body, "head": head, "base": base},
timeout=15,
)
resp.raise_for_status()
data = resp.json()
pr = PullRequest(
number=data["number"],
title=data["title"],
html_url=data["html_url"],
)
logger.info("PR #%d created: %s", pr.number, pr.html_url)
return pr
except Exception as exc:
logger.warning("Failed to create PR: %s", exc)
return None
def add_issue_comment(self, issue_number: int, body: str) -> bool:
"""Post a comment on an issue or PR.
Returns:
True on success, False on failure.
"""
if not self._token:
logger.warning("Gitea token not configured — skipping issue comment")
return False
try:
import requests as _requests
resp = _requests.post(
self._api(f"repos/{self._repo}/issues/{issue_number}/comments"),
headers=self._headers(),
json={"body": body},
timeout=15,
)
resp.raise_for_status()
logger.info("Comment posted on issue #%d", issue_number)
return True
except Exception as exc:
logger.warning("Failed to post comment on issue #%d: %s", issue_number, exc)
return False
# Module-level singleton
gitea_client = GiteaClient()

View File

@@ -0,0 +1 @@
"""Self-modification loop sub-package."""

View File

@@ -0,0 +1,301 @@
"""Self-modification loop — branch → edit → test → commit/revert.
Timmy's self-coding capability, restored after deletion in
Operation Darling Purge (commit 584eeb679e88).
## Cycle
1. **Branch** — create ``self-modify/<slug>`` from ``main``
2. **Edit** — apply the proposed change (patch string or callable)
3. **Test** — run ``pytest tests/ -x -q``; never commit on failure
4. **Commit** — stage and commit on green; revert branch on red
5. **PR** — open a Gitea pull request (requires no direct push to main)
## Guards
- Never push directly to ``main`` or ``master``
- All changes land via PR (enforced by ``_guard_branch``)
- Test gate is mandatory; ``skip_tests=True`` is for unit-test use only
- Commits only happen when ``pytest tests/ -x -q`` exits 0
## Usage::
from self_coding.self_modify.loop import SelfModifyLoop
loop = SelfModifyLoop()
result = await loop.run(
slug="add-hello-tool",
description="Add hello() convenience tool",
edit_fn=my_edit_function, # callable(repo_root: str) -> None
)
if result.success:
print(f"PR: {result.pr_url}")
else:
print(f"Failed: {result.error}")
"""
from __future__ import annotations
import logging
import subprocess
import time
from collections.abc import Callable
from dataclasses import dataclass, field
from pathlib import Path
from config import settings
logger = logging.getLogger(__name__)
# Branches that must never receive direct commits
_PROTECTED_BRANCHES = frozenset({"main", "master", "develop"})
# Test command used as the commit gate
_TEST_COMMAND = ["pytest", "tests/", "-x", "-q", "--tb=short"]
# Max time (seconds) to wait for the test suite
_TEST_TIMEOUT = 300
@dataclass
class LoopResult:
"""Result from one self-modification cycle."""
success: bool
branch: str = ""
commit_sha: str = ""
pr_url: str = ""
pr_number: int = 0
test_output: str = ""
error: str = ""
elapsed_ms: float = 0.0
metadata: dict = field(default_factory=dict)
class SelfModifyLoop:
"""Orchestrate branch → edit → test → commit/revert → PR.
Args:
repo_root: Absolute path to the git repository (defaults to
``settings.repo_root``).
remote: Git remote name (default ``origin``).
base_branch: Branch to fork from and target for the PR
(default ``main``).
"""
def __init__(
self,
repo_root: str | None = None,
remote: str = "origin",
base_branch: str = "main",
) -> None:
self._repo_root = Path(repo_root or settings.repo_root)
self._remote = remote
self._base_branch = base_branch
# ── public ──────────────────────────────────────────────────────────────
async def run(
self,
slug: str,
description: str,
edit_fn: Callable[[str], None],
issue_number: int | None = None,
skip_tests: bool = False,
) -> LoopResult:
"""Execute one full self-modification cycle.
Args:
slug: Short identifier used for the branch name
(e.g. ``"add-hello-tool"``).
description: Human-readable description for commit message
and PR body.
edit_fn: Callable that receives the repo root path (str)
and applies the desired code changes in-place.
issue_number: Optional Gitea issue number to reference in PR.
skip_tests: If ``True``, skip the test gate (unit-test use
only — never use in production).
Returns:
:class:`LoopResult` describing the outcome.
"""
start = time.time()
branch = f"self-modify/{slug}"
try:
self._guard_branch(branch)
self._checkout_base()
self._create_branch(branch)
try:
edit_fn(str(self._repo_root))
except Exception as exc:
self._revert_branch(branch)
return LoopResult(
success=False,
branch=branch,
error=f"edit_fn raised: {exc}",
elapsed_ms=self._elapsed(start),
)
if not skip_tests:
test_output, passed = self._run_tests()
if not passed:
self._revert_branch(branch)
return LoopResult(
success=False,
branch=branch,
test_output=test_output,
error="Tests failed — branch reverted",
elapsed_ms=self._elapsed(start),
)
else:
test_output = "(tests skipped)"
sha = self._commit_all(description)
self._push_branch(branch)
pr = self._create_pr(
branch=branch,
description=description,
test_output=test_output,
issue_number=issue_number,
)
return LoopResult(
success=True,
branch=branch,
commit_sha=sha,
pr_url=pr.html_url if pr else "",
pr_number=pr.number if pr else 0,
test_output=test_output,
elapsed_ms=self._elapsed(start),
)
except Exception as exc:
logger.warning("Self-modify loop failed: %s", exc)
return LoopResult(
success=False,
branch=branch,
error=str(exc),
elapsed_ms=self._elapsed(start),
)
# ── private helpers ──────────────────────────────────────────────────────
@staticmethod
def _elapsed(start: float) -> float:
return (time.time() - start) * 1000
def _git(self, *args: str, check: bool = True) -> subprocess.CompletedProcess:
"""Run a git command in the repo root."""
cmd = ["git", *args]
logger.debug("git %s", " ".join(args))
return subprocess.run(
cmd,
cwd=str(self._repo_root),
capture_output=True,
text=True,
check=check,
)
def _guard_branch(self, branch: str) -> None:
"""Raise if the target branch is a protected branch name."""
if branch in _PROTECTED_BRANCHES:
raise ValueError(
f"Refusing to operate on protected branch '{branch}'. "
"All self-modifications must go via PR."
)
def _checkout_base(self) -> None:
"""Checkout the base branch and pull latest."""
self._git("checkout", self._base_branch)
# Best-effort pull; ignore failures (e.g. no remote configured)
self._git("pull", self._remote, self._base_branch, check=False)
def _create_branch(self, branch: str) -> None:
"""Create and checkout a new branch, deleting an old one if needed."""
# Delete local branch if it already exists (stale prior attempt)
self._git("branch", "-D", branch, check=False)
self._git("checkout", "-b", branch)
logger.info("Created branch: %s", branch)
def _revert_branch(self, branch: str) -> None:
"""Checkout base and delete the failed branch."""
try:
self._git("checkout", self._base_branch, check=False)
self._git("branch", "-D", branch, check=False)
logger.info("Reverted and deleted branch: %s", branch)
except Exception as exc:
logger.warning("Failed to revert branch %s: %s", branch, exc)
def _run_tests(self) -> tuple[str, bool]:
"""Run the test suite. Returns (output, passed)."""
logger.info("Running test suite: %s", " ".join(_TEST_COMMAND))
try:
result = subprocess.run(
_TEST_COMMAND,
cwd=str(self._repo_root),
capture_output=True,
text=True,
timeout=_TEST_TIMEOUT,
)
output = (result.stdout + "\n" + result.stderr).strip()
passed = result.returncode == 0
logger.info(
"Test suite %s (exit %d)", "PASSED" if passed else "FAILED", result.returncode
)
return output, passed
except subprocess.TimeoutExpired:
msg = f"Test suite timed out after {_TEST_TIMEOUT}s"
logger.warning(msg)
return msg, False
except FileNotFoundError:
msg = "pytest not found on PATH"
logger.warning(msg)
return msg, False
def _commit_all(self, message: str) -> str:
"""Stage all changes and create a commit. Returns the new SHA."""
self._git("add", "-A")
self._git("commit", "-m", message)
result = self._git("rev-parse", "HEAD")
sha = result.stdout.strip()
logger.info("Committed: %s sha=%s", message[:60], sha[:12])
return sha
def _push_branch(self, branch: str) -> None:
"""Push the branch to the remote."""
self._git("push", "-u", self._remote, branch)
logger.info("Pushed branch: %s -> %s", branch, self._remote)
def _create_pr(
self,
branch: str,
description: str,
test_output: str,
issue_number: int | None,
):
"""Open a Gitea PR. Returns PullRequest or None on failure."""
from self_coding.gitea_client import GiteaClient
client = GiteaClient()
issue_ref = f"\n\nFixes #{issue_number}" if issue_number else ""
test_section = (
f"\n\n## Test results\n```\n{test_output[:2000]}\n```"
if test_output and test_output != "(tests skipped)"
else ""
)
body = (
f"## Summary\n{description}"
f"{issue_ref}"
f"{test_section}"
"\n\n🤖 Generated by Timmy's self-modification loop"
)
return client.create_pull_request(
title=f"[self-modify] {description[:60]}",
body=body,
head=branch,
base=self._base_branch,
)

View File

@@ -312,6 +312,13 @@ async def _handle_step_failure(
"adaptation": step.result[:200],
},
)
_log_self_correction(
task_id=task_id,
step_desc=step_desc,
exc=exc,
outcome=step.result,
outcome_status="success",
)
if on_progress:
await on_progress(f"[Adapted] {step_desc}", step_num, total_steps)
except Exception as adapt_exc: # broad catch intentional
@@ -325,9 +332,42 @@ async def _handle_step_failure(
duration_ms=int((time.monotonic() - step_start) * 1000),
)
)
_log_self_correction(
task_id=task_id,
step_desc=step_desc,
exc=exc,
outcome=f"Adaptation also failed: {adapt_exc}",
outcome_status="failed",
)
completed_results.append(f"Step {step_num}: FAILED")
def _log_self_correction(
*,
task_id: str,
step_desc: str,
exc: Exception,
outcome: str,
outcome_status: str,
) -> None:
"""Best-effort: log a self-correction event (never raises)."""
try:
from infrastructure.self_correction import log_self_correction
log_self_correction(
source="agentic_loop",
original_intent=step_desc,
detected_error=f"{type(exc).__name__}: {exc}",
correction_strategy="Adaptive re-plan via LLM",
final_outcome=outcome[:500],
task_id=task_id,
outcome_status=outcome_status,
error_type=type(exc).__name__,
)
except Exception as log_exc:
logger.debug("Self-correction log failed: %s", log_exc)
# ---------------------------------------------------------------------------
# Core loop
# ---------------------------------------------------------------------------

View File

@@ -196,9 +196,7 @@ class EmotionalStateTracker:
"intensity_label": _intensity_label(self.state.intensity),
"previous_emotion": self.state.previous_emotion,
"trigger_event": self.state.trigger_event,
"prompt_modifier": EMOTION_PROMPT_MODIFIERS.get(
self.state.current_emotion, ""
),
"prompt_modifier": EMOTION_PROMPT_MODIFIERS.get(self.state.current_emotion, ""),
}
def get_prompt_modifier(self) -> str:

View File

@@ -36,6 +36,8 @@ _EXPIRY_DAYS = 7
@dataclass
class ApprovalItem:
"""A proposed autonomous action requiring owner approval."""
id: str
title: str
description: str

View File

@@ -8,7 +8,7 @@ Flow:
1. prepare_experiment — clone repo + run data prep
2. run_experiment — execute train.py with wall-clock timeout
3. evaluate_result — compare metric against baseline
4. experiment_loop — orchestrate the full cycle
4. SystemExperiment — orchestrate the full cycle via class interface
All subprocess calls are guarded with timeouts for graceful degradation.
"""
@@ -17,9 +17,12 @@ from __future__ import annotations
import json
import logging
import os
import platform
import re
import subprocess
import time
from collections.abc import Callable
from pathlib import Path
from typing import Any
@@ -29,15 +32,61 @@ DEFAULT_REPO = "https://github.com/karpathy/autoresearch.git"
_METRIC_RE = re.compile(r"val_bpb[:\s]+([0-9]+\.?[0-9]*)")
# ── Higher-is-better metric names ────────────────────────────────────────────
_HIGHER_IS_BETTER = frozenset({"unit_pass_rate", "coverage"})
def is_apple_silicon() -> bool:
"""Return True when running on Apple Silicon (M-series chip)."""
return platform.system() == "Darwin" and platform.machine() == "arm64"
def _build_experiment_env(
dataset: str = "tinystories",
backend: str = "auto",
) -> dict[str, str]:
"""Build environment variables for an autoresearch subprocess.
Args:
dataset: Dataset name forwarded as ``AUTORESEARCH_DATASET``.
``"tinystories"`` is recommended for Apple Silicon (lower entropy,
faster iteration).
backend: Inference backend forwarded as ``AUTORESEARCH_BACKEND``.
``"auto"`` enables MLX on Apple Silicon; ``"cpu"`` forces CPU.
Returns:
Merged environment dict (inherits current process env).
"""
env = os.environ.copy()
env["AUTORESEARCH_DATASET"] = dataset
if backend == "auto":
env["AUTORESEARCH_BACKEND"] = "mlx" if is_apple_silicon() else "cuda"
else:
env["AUTORESEARCH_BACKEND"] = backend
return env
def prepare_experiment(
workspace: Path,
repo_url: str = DEFAULT_REPO,
dataset: str = "tinystories",
backend: str = "auto",
) -> str:
"""Clone autoresearch repo and run data preparation.
On Apple Silicon the ``dataset`` defaults to ``"tinystories"`` (lower
entropy, faster iteration) and ``backend`` to ``"auto"`` which resolves to
MLX. Both values are forwarded as ``AUTORESEARCH_DATASET`` /
``AUTORESEARCH_BACKEND`` environment variables so that ``prepare.py`` and
``train.py`` can adapt their behaviour without CLI changes.
Args:
workspace: Directory to set up the experiment in.
repo_url: Git URL for the autoresearch repository.
dataset: Dataset name; ``"tinystories"`` is recommended on Mac.
backend: Inference backend; ``"auto"`` picks MLX on Apple Silicon.
Returns:
Status message describing what was prepared.
@@ -59,6 +108,14 @@ def prepare_experiment(
else:
logger.info("Autoresearch repo already present at %s", repo_dir)
env = _build_experiment_env(dataset=dataset, backend=backend)
if is_apple_silicon():
logger.info(
"Apple Silicon detected — dataset=%s backend=%s",
env["AUTORESEARCH_DATASET"],
env["AUTORESEARCH_BACKEND"],
)
# Run prepare.py (data download + tokeniser training)
prepare_script = repo_dir / "prepare.py"
if prepare_script.exists():
@@ -69,6 +126,7 @@ def prepare_experiment(
text=True,
cwd=str(repo_dir),
timeout=300,
env=env,
)
if result.returncode != 0:
return f"Preparation failed: {result.stderr.strip()[:500]}"
@@ -81,6 +139,8 @@ def run_experiment(
workspace: Path,
timeout: int = 300,
metric_name: str = "val_bpb",
dataset: str = "tinystories",
backend: str = "auto",
) -> dict[str, Any]:
"""Run a single training experiment with a wall-clock timeout.
@@ -88,6 +148,9 @@ def run_experiment(
workspace: Experiment workspace (contains autoresearch/ subdir).
timeout: Maximum wall-clock seconds for the run.
metric_name: Name of the metric to extract from stdout.
dataset: Dataset forwarded to the subprocess via env var.
backend: Inference backend forwarded via env var (``"auto"`` → MLX on
Apple Silicon, CUDA otherwise).
Returns:
Dict with keys: metric (float|None), log (str), duration_s (int),
@@ -105,6 +168,7 @@ def run_experiment(
"error": f"train.py not found in {repo_dir}",
}
env = _build_experiment_env(dataset=dataset, backend=backend)
start = time.monotonic()
try:
result = subprocess.run(
@@ -113,6 +177,7 @@ def run_experiment(
text=True,
cwd=str(repo_dir),
timeout=timeout,
env=env,
)
duration = int(time.monotonic() - start)
output = result.stdout + result.stderr
@@ -125,7 +190,7 @@ def run_experiment(
"log": output[-2000:], # Keep last 2k chars
"duration_s": duration,
"success": result.returncode == 0,
"error": None if result.returncode == 0 else f"Exit code {result.returncode}",
"error": (None if result.returncode == 0 else f"Exit code {result.returncode}"),
}
except subprocess.TimeoutExpired:
duration = int(time.monotonic() - start)
@@ -212,3 +277,369 @@ def _append_result(workspace: Path, result: dict[str, Any]) -> None:
results_file.parent.mkdir(parents=True, exist_ok=True)
with results_file.open("a") as f:
f.write(json.dumps(result) + "\n")
def _extract_pass_rate(output: str) -> float | None:
"""Extract pytest pass rate as a percentage from tox/pytest output."""
passed_m = re.search(r"(\d+) passed", output)
failed_m = re.search(r"(\d+) failed", output)
if passed_m:
passed = int(passed_m.group(1))
failed = int(failed_m.group(1)) if failed_m else 0
total = passed + failed
return (passed / total * 100.0) if total > 0 else 100.0
return None
def _extract_coverage(output: str) -> float | None:
"""Extract total coverage percentage from coverage output."""
coverage_m = re.search(r"(?:TOTAL\s+\d+\s+\d+\s+|Total coverage:\s*)(\d+)%", output)
if coverage_m:
try:
return float(coverage_m.group(1))
except ValueError:
pass
return None
class SystemExperiment:
"""An autoresearch experiment targeting a specific module with a configurable metric.
Encapsulates the hypothesis → edit → tox → evaluate → commit/revert loop
for a single target file or module.
Args:
target: Path or module name to optimise (e.g. ``src/timmy/agent.py``).
metric: Metric to extract from tox output. Built-in values:
``unit_pass_rate`` (default), ``coverage``, ``val_bpb``.
Any other value is forwarded to :func:`_extract_metric`.
budget_minutes: Wall-clock budget per experiment (default 5 min).
workspace: Working directory for subprocess calls. Defaults to ``cwd``.
revert_on_failure: Whether to revert changes on failed experiments.
hypothesis: Optional natural language hypothesis for the experiment.
metric_fn: Optional callable for custom metric extraction.
If provided, overrides built-in metric extraction.
"""
def __init__(
self,
target: str,
metric: str = "unit_pass_rate",
budget_minutes: int = 5,
workspace: Path | None = None,
revert_on_failure: bool = True,
hypothesis: str = "",
metric_fn: Callable[[str], float | None] | None = None,
) -> None:
self.target = target
self.metric = metric
self.budget_seconds = budget_minutes * 60
self.workspace = Path(workspace) if workspace else Path.cwd()
self.revert_on_failure = revert_on_failure
self.hypothesis = hypothesis
self.metric_fn = metric_fn
self.results: list[dict[str, Any]] = []
self.baseline: float | None = None
# ── Hypothesis generation ─────────────────────────────────────────────────
def generate_hypothesis(self, program_content: str = "") -> str:
"""Return a plain-English hypothesis for the next experiment.
Uses the first non-empty line of *program_content* when available;
falls back to a generic description based on target and metric.
"""
first_line = ""
for line in program_content.splitlines():
stripped = line.strip()
if stripped and not stripped.startswith("#"):
first_line = stripped[:120]
break
if first_line:
return f"[{self.target}] {first_line}"
return f"Improve {self.metric} for {self.target}"
# ── Edit phase ────────────────────────────────────────────────────────────
def apply_edit(self, hypothesis: str, model: str = "qwen3:30b") -> str:
"""Apply code edits to *target* via Aider.
Returns a status string. Degrades gracefully — never raises.
"""
prompt = f"Edit {self.target}: {hypothesis}"
try:
result = subprocess.run(
["aider", "--no-git", "--model", f"ollama/{model}", "--quiet", prompt],
capture_output=True,
text=True,
timeout=self.budget_seconds,
cwd=str(self.workspace),
)
if result.returncode == 0:
return result.stdout or "Edit applied."
return f"Aider error (exit {result.returncode}): {result.stderr[:500]}"
except FileNotFoundError:
logger.warning("Aider not installed — edit skipped")
return "Aider not available — edit skipped"
except subprocess.TimeoutExpired:
logger.warning("Aider timed out after %ds", self.budget_seconds)
return "Aider timed out"
except (OSError, subprocess.SubprocessError) as exc:
logger.warning("Aider failed: %s", exc)
return f"Edit failed: {exc}"
# ── Evaluation phase ──────────────────────────────────────────────────────
def run_tox(self, tox_env: str = "unit") -> dict[str, Any]:
"""Run *tox_env* and return a result dict.
Returns:
Dict with keys: ``metric`` (float|None), ``log`` (str),
``duration_s`` (int), ``success`` (bool), ``error`` (str|None).
"""
start = time.monotonic()
try:
result = subprocess.run(
["tox", "-e", tox_env],
capture_output=True,
text=True,
timeout=self.budget_seconds,
cwd=str(self.workspace),
)
duration = int(time.monotonic() - start)
output = result.stdout + result.stderr
metric_val = self._extract_tox_metric(output)
return {
"metric": metric_val,
"log": output[-3000:],
"duration_s": duration,
"success": result.returncode == 0,
"error": (None if result.returncode == 0 else f"Exit code {result.returncode}"),
}
except subprocess.TimeoutExpired:
duration = int(time.monotonic() - start)
return {
"metric": None,
"log": f"Budget exceeded after {self.budget_seconds}s",
"duration_s": duration,
"success": False,
"error": f"Budget exceeded after {self.budget_seconds}s",
}
except OSError as exc:
return {
"metric": None,
"log": "",
"duration_s": 0,
"success": False,
"error": str(exc),
}
def _extract_tox_metric(self, output: str) -> float | None:
"""Dispatch to the correct metric extractor based on *self.metric*."""
# Use custom metric function if provided
if self.metric_fn is not None:
try:
return self.metric_fn(output)
except Exception as exc:
logger.warning("Custom metric_fn failed: %s", exc)
return None
if self.metric == "unit_pass_rate":
return _extract_pass_rate(output)
if self.metric == "coverage":
return _extract_coverage(output)
return _extract_metric(output, self.metric)
def evaluate(self, current: float | None, baseline: float | None) -> str:
"""Compare *current* metric against *baseline* and return an assessment."""
if current is None:
return "Indeterminate: metric not extracted from output"
if baseline is None:
unit = "%" if self.metric in _HIGHER_IS_BETTER else ""
return f"Baseline: {self.metric} = {current:.2f}{unit}"
if self.metric in _HIGHER_IS_BETTER:
delta = current - baseline
pct = (delta / baseline * 100) if baseline != 0 else 0.0
if delta > 0:
return f"Improvement: {self.metric} {baseline:.2f}% → {current:.2f}% ({pct:+.2f}%)"
if delta < 0:
return f"Regression: {self.metric} {baseline:.2f}% → {current:.2f}% ({pct:+.2f}%)"
return f"No change: {self.metric} = {current:.2f}%"
# lower-is-better (val_bpb, loss, etc.)
return evaluate_result(current, baseline, self.metric)
def is_improvement(self, current: float, baseline: float) -> bool:
"""Return True if *current* is better than *baseline* for this metric."""
if self.metric in _HIGHER_IS_BETTER:
return current > baseline
return current < baseline # lower-is-better
# ── Git phase ─────────────────────────────────────────────────────────────
def create_branch(self, branch_name: str) -> bool:
"""Create and checkout a new git branch. Returns True on success."""
try:
subprocess.run(
["git", "checkout", "-b", branch_name],
cwd=str(self.workspace),
check=True,
timeout=30,
)
return True
except subprocess.CalledProcessError as exc:
logger.warning("Git branch creation failed: %s", exc)
return False
def commit_changes(self, message: str) -> bool:
"""Stage and commit all changes. Returns True on success."""
try:
subprocess.run(["git", "add", "-A"], cwd=str(self.workspace), check=True, timeout=30)
subprocess.run(
["git", "commit", "-m", message],
cwd=str(self.workspace),
check=True,
timeout=30,
)
return True
except subprocess.CalledProcessError as exc:
logger.warning("Git commit failed: %s", exc)
return False
def revert_changes(self) -> bool:
"""Revert all uncommitted changes. Returns True on success."""
try:
subprocess.run(
["git", "checkout", "--", "."],
cwd=str(self.workspace),
check=True,
timeout=30,
)
return True
except subprocess.CalledProcessError as exc:
logger.warning("Git revert failed: %s", exc)
return False
# ── Full experiment loop ──────────────────────────────────────────────────
def run(
self,
tox_env: str = "unit",
model: str = "qwen3:30b",
program_content: str = "",
max_iterations: int = 1,
dry_run: bool = False,
create_branch: bool = False,
) -> dict[str, Any]:
"""Run the full experiment loop: hypothesis → edit → tox → evaluate → commit/revert.
This method encapsulates the complete experiment cycle, running multiple
iterations until an improvement is found or max_iterations is reached.
Args:
tox_env: Tox environment to run (default "unit").
model: Ollama model for Aider edits (default "qwen3:30b").
program_content: Research direction for hypothesis generation.
max_iterations: Maximum number of experiment iterations.
dry_run: If True, only generate hypotheses without making changes.
create_branch: If True, create a new git branch for the experiment.
Returns:
Dict with keys: ``success`` (bool), ``final_metric`` (float|None),
``baseline`` (float|None), ``iterations`` (int), ``results`` (list).
"""
if create_branch:
branch_name = f"autoresearch/{self.target.replace('/', '-')}-{int(time.time())}"
self.create_branch(branch_name)
baseline: float | None = self.baseline
final_metric: float | None = None
success = False
for iteration in range(1, max_iterations + 1):
logger.info("Experiment iteration %d/%d", iteration, max_iterations)
# Generate hypothesis
hypothesis = self.hypothesis or self.generate_hypothesis(program_content)
logger.info("Hypothesis: %s", hypothesis)
# In dry-run mode, just record the hypothesis and continue
if dry_run:
result_record = {
"iteration": iteration,
"hypothesis": hypothesis,
"metric": None,
"baseline": baseline,
"assessment": "Dry-run: no changes made",
"success": True,
"duration_s": 0,
}
self.results.append(result_record)
continue
# Apply edit
edit_result = self.apply_edit(hypothesis, model=model)
edit_failed = "not available" in edit_result or edit_result.startswith("Aider error")
if edit_failed:
logger.warning("Edit phase failed: %s", edit_result)
# Run evaluation
tox_result = self.run_tox(tox_env=tox_env)
metric = tox_result["metric"]
# Evaluate result
assessment = self.evaluate(metric, baseline)
logger.info("Assessment: %s", assessment)
# Store result
result_record = {
"iteration": iteration,
"hypothesis": hypothesis,
"metric": metric,
"baseline": baseline,
"assessment": assessment,
"success": tox_result["success"],
"duration_s": tox_result["duration_s"],
}
self.results.append(result_record)
# Set baseline on first successful run
if metric is not None and baseline is None:
baseline = metric
self.baseline = baseline
final_metric = metric
continue
# Determine if we should commit or revert
should_commit = False
if tox_result["success"] and metric is not None and baseline is not None:
if self.is_improvement(metric, baseline):
should_commit = True
final_metric = metric
baseline = metric
self.baseline = baseline
success = True
if should_commit:
commit_msg = f"autoresearch: improve {self.metric} on {self.target}\n\n{hypothesis}"
if self.commit_changes(commit_msg):
logger.info("Changes committed")
else:
self.revert_changes()
logger.warning("Commit failed, changes reverted")
elif self.revert_on_failure:
self.revert_changes()
logger.info("Changes reverted (no improvement)")
# Early exit if we found an improvement
if success:
break
return {
"success": success,
"final_metric": final_metric,
"baseline": self.baseline,
"iterations": len(self.results),
"results": self.results,
}

View File

@@ -36,7 +36,7 @@ import asyncio
import logging
import re
from dataclasses import dataclass, field
from datetime import UTC, datetime, timedelta
from datetime import UTC, datetime
from typing import Any
import httpx
@@ -70,7 +70,9 @@ _LOOP_TAG = "loop-generated"
# Regex patterns for scoring
_TAG_RE = re.compile(r"\[([^\]]+)\]")
_FILE_RE = re.compile(r"(?:src/|tests/|scripts/|\.py|\.html|\.js|\.yaml|\.toml|\.sh)", re.IGNORECASE)
_FILE_RE = re.compile(
r"(?:src/|tests/|scripts/|\.py|\.html|\.js|\.yaml|\.toml|\.sh)", re.IGNORECASE
)
_FUNC_RE = re.compile(r"(?:def |class |function |method |`\w+\(\)`)", re.IGNORECASE)
_ACCEPT_RE = re.compile(
r"(?:should|must|expect|verify|assert|test.?case|acceptance|criteria"
@@ -451,9 +453,7 @@ async def add_label(
# Apply to the issue
apply_url = _repo_url(f"issues/{issue_number}/labels")
apply_resp = await client.post(
apply_url, headers=headers, json={"labels": [label_id]}
)
apply_resp = await client.post(apply_url, headers=headers, json={"labels": [label_id]})
return apply_resp.status_code in (200, 201)
except (httpx.ConnectError, httpx.ReadError, httpx.TimeoutException) as exc:
@@ -692,7 +692,9 @@ class BacklogTriageLoop:
# 1. Fetch
raw_issues = await fetch_open_issues(client)
result.total_open = len(raw_issues)
logger.info("Triage cycle #%d: fetched %d open issues", self._cycle_count, len(raw_issues))
logger.info(
"Triage cycle #%d: fetched %d open issues", self._cycle_count, len(raw_issues)
)
# 2. Score
scored = [score_issue(i) for i in raw_issues]

View File

@@ -46,6 +46,8 @@ class ApprovalItem:
@dataclass
class Briefing:
"""A generated morning briefing summarizing recent activity and pending approvals."""
generated_at: datetime
summary: str # 150-300 words
approval_items: list[ApprovalItem] = field(default_factory=list)

View File

@@ -347,7 +347,10 @@ def interview(
# Force agent creation by calling chat once with a warm-up prompt
try:
loop.run_until_complete(
chat("Hello, Timmy. We're about to start your interview.", session_id="interview")
chat(
"Hello, Timmy. We're about to start your interview.",
session_id="interview",
)
)
except Exception as exc:
typer.echo(f"Warning: Initialization issue — {exc}", err=True)
@@ -410,11 +413,17 @@ def down():
@app.command()
def voice(
whisper_model: str = typer.Option(
"base.en", "--whisper", "-w", help="Whisper model: tiny.en, base.en, small.en, medium.en"
"base.en",
"--whisper",
"-w",
help="Whisper model: tiny.en, base.en, small.en, medium.en",
),
use_say: bool = typer.Option(False, "--say", help="Use macOS `say` instead of Piper TTS"),
threshold: float = typer.Option(
0.015, "--threshold", "-t", help="Mic silence threshold (RMS). Lower = more sensitive."
0.015,
"--threshold",
"-t",
help="Mic silence threshold (RMS). Lower = more sensitive.",
),
silence: float = typer.Option(1.5, "--silence", help="Seconds of silence to end recording"),
backend: str | None = _BACKEND_OPTION,
@@ -457,7 +466,8 @@ def route(
@app.command()
def focus(
topic: str | None = typer.Argument(
None, help='Topic to focus on (e.g. "three-phase loop"). Omit to show current focus.'
None,
help='Topic to focus on (e.g. "three-phase loop"). Omit to show current focus.',
),
clear: bool = typer.Option(False, "--clear", "-c", help="Clear focus and return to broad mode"),
):
@@ -527,5 +537,156 @@ def healthcheck(
raise typer.Exit(result.returncode)
@app.command()
def learn(
target: str | None = typer.Option(
None,
"--target",
"-t",
help="Module or file to optimise (e.g. 'src/timmy/agent.py')",
),
metric: str = typer.Option(
"unit_pass_rate",
"--metric",
"-m",
help="Metric to track: unit_pass_rate | coverage | val_bpb | <custom>",
),
budget: int = typer.Option(
5,
"--budget",
help="Time limit per experiment in minutes",
),
max_experiments: int = typer.Option(
10,
"--max-experiments",
help="Cap on total experiments per run",
),
dry_run: bool = typer.Option(
False,
"--dry-run",
help="Show hypothesis without executing experiments",
),
program_file: str | None = typer.Option(
None,
"--program",
"-p",
help="Path to research direction file (default: program.md in cwd)",
),
tox_env: str = typer.Option(
"unit",
"--tox-env",
help="Tox environment to run for each evaluation",
),
model: str = typer.Option(
"qwen3:30b",
"--model",
help="Ollama model forwarded to Aider for code edits",
),
):
"""Start an autonomous improvement loop (autoresearch).
Reads program.md for research direction, then iterates:
hypothesis → edit → tox → evaluate → commit/revert.
Experiments continue until --max-experiments is reached or the loop is
interrupted with Ctrl+C. Use --dry-run to preview hypotheses without
making any changes.
Example:
timmy learn --target src/timmy/agent.py --metric unit_pass_rate
"""
from pathlib import Path
from timmy.autoresearch import SystemExperiment
repo_root = Path.cwd()
program_path = Path(program_file) if program_file else repo_root / "program.md"
if program_path.exists():
program_content = program_path.read_text()
typer.echo(f"Research direction: {program_path}")
else:
program_content = ""
typer.echo(
f"Note: {program_path} not found — proceeding without research direction.",
err=True,
)
if target is None:
typer.echo(
"Error: --target is required. Specify the module or file to optimise.",
err=True,
)
raise typer.Exit(1)
experiment = SystemExperiment(
target=target,
metric=metric,
budget_minutes=budget,
)
typer.echo()
typer.echo(typer.style("Autoresearch", bold=True) + f"{target}")
typer.echo(f" metric={metric} budget={budget}min max={max_experiments} tox={tox_env}")
if dry_run:
typer.echo(" (dry-run — no changes will be made)")
typer.echo()
def _progress_callback(iteration: int, max_iter: int, message: str) -> None:
"""Print progress updates during experiment iterations."""
if iteration > 0:
prefix = typer.style(f"[{iteration}/{max_iter}]", bold=True)
typer.echo(f"{prefix} {message}")
try:
# Run the full experiment loop via the SystemExperiment class
result = experiment.run(
tox_env=tox_env,
model=model,
program_content=program_content,
max_iterations=max_experiments,
dry_run=dry_run,
create_branch=False, # CLI mode: work on current branch
)
# Display results for each iteration
for i, record in enumerate(experiment.results, 1):
_progress_callback(i, max_experiments, record["hypothesis"])
if dry_run:
continue
# Edit phase result
typer.echo(" → editing …", nl=False)
if record.get("edit_failed"):
typer.echo(f" skipped ({record.get('edit_result', 'unknown')})")
else:
typer.echo(" done")
# Evaluate phase result
duration = record.get("duration_s", 0)
typer.echo(f" → running tox … {duration}s")
# Assessment
assessment = record.get("assessment", "No assessment")
typer.echo(f"{assessment}")
# Outcome
if record.get("committed"):
typer.echo(" → committed")
elif record.get("reverted"):
typer.echo(" → reverted (no improvement)")
typer.echo()
except KeyboardInterrupt:
typer.echo("\nInterrupted.")
raise typer.Exit(0) from None
typer.echo(typer.style("Autoresearch complete.", bold=True))
if result.get("baseline") is not None:
typer.echo(f"Final {metric}: {result['baseline']:.4f}")
def main():
app()

View File

@@ -37,7 +37,7 @@ from __future__ import annotations
import asyncio
import logging
from dataclasses import dataclass, field
from enum import Enum
from enum import StrEnum
from typing import Any
from config import settings
@@ -48,7 +48,8 @@ logger = logging.getLogger(__name__)
# Enumerations
# ---------------------------------------------------------------------------
class AgentType(str, Enum):
class AgentType(StrEnum):
"""Known agents in the swarm."""
CLAUDE_CODE = "claude_code"
@@ -57,7 +58,7 @@ class AgentType(str, Enum):
TIMMY = "timmy"
class TaskType(str, Enum):
class TaskType(StrEnum):
"""Categories of engineering work."""
# Claude Code strengths
@@ -83,7 +84,7 @@ class TaskType(str, Enum):
ORCHESTRATION = "orchestration"
class DispatchStatus(str, Enum):
class DispatchStatus(StrEnum):
"""Lifecycle state of a dispatched task."""
PENDING = "pending"
@@ -99,6 +100,7 @@ class DispatchStatus(str, Enum):
# Agent registry
# ---------------------------------------------------------------------------
@dataclass
class AgentSpec:
"""Capabilities and limits for a single agent."""
@@ -106,9 +108,9 @@ class AgentSpec:
name: AgentType
display_name: str
strengths: frozenset[TaskType]
gitea_label: str | None # label to apply when dispatching
gitea_label: str | None # label to apply when dispatching
max_concurrent: int = 1
interface: str = "gitea" # "gitea" | "api" | "local"
interface: str = "gitea" # "gitea" | "api" | "local"
api_endpoint: str | None = None # for interface="api"
@@ -197,6 +199,7 @@ _TASK_ROUTING: dict[TaskType, AgentType] = {
# Dispatch result
# ---------------------------------------------------------------------------
@dataclass
class DispatchResult:
"""Outcome of a dispatch call."""
@@ -220,6 +223,7 @@ class DispatchResult:
# Routing logic
# ---------------------------------------------------------------------------
def select_agent(task_type: TaskType) -> AgentType:
"""Return the best agent for *task_type* based on the routing table.
@@ -248,11 +252,23 @@ def infer_task_type(title: str, description: str = "") -> TaskType:
text = (title + " " + description).lower()
_SIGNALS: list[tuple[TaskType, frozenset[str]]] = [
(TaskType.ARCHITECTURE, frozenset({"architect", "design", "adr", "system design", "schema"})),
(TaskType.REFACTORING, frozenset({"refactor", "clean up", "cleanup", "reorganise", "reorganize"})),
(
TaskType.ARCHITECTURE,
frozenset({"architect", "design", "adr", "system design", "schema"}),
),
(
TaskType.REFACTORING,
frozenset({"refactor", "clean up", "cleanup", "reorganise", "reorganize"}),
),
(TaskType.CODE_REVIEW, frozenset({"review", "pr review", "pull request review", "audit"})),
(TaskType.COMPLEX_REASONING, frozenset({"complex", "hard problem", "debug", "investigate", "diagnose"})),
(TaskType.RESEARCH, frozenset({"research", "survey", "literature", "benchmark", "analyse", "analyze"})),
(
TaskType.COMPLEX_REASONING,
frozenset({"complex", "hard problem", "debug", "investigate", "diagnose"}),
),
(
TaskType.RESEARCH,
frozenset({"research", "survey", "literature", "benchmark", "analyse", "analyze"}),
),
(TaskType.ANALYSIS, frozenset({"analysis", "profil", "trace", "metric", "performance"})),
(TaskType.TRIAGE, frozenset({"triage", "classify", "prioritise", "prioritize"})),
(TaskType.PLANNING, frozenset({"plan", "roadmap", "milestone", "epic", "spike"})),
@@ -273,6 +289,7 @@ def infer_task_type(title: str, description: str = "") -> TaskType:
# Gitea helpers
# ---------------------------------------------------------------------------
async def _post_gitea_comment(
client: Any,
base_url: str,
@@ -405,6 +422,50 @@ async def _poll_issue_completion(
# Core dispatch functions
# ---------------------------------------------------------------------------
def _format_assignment_comment(
display_name: str,
task_type: TaskType,
description: str,
acceptance_criteria: list[str],
) -> str:
"""Build the markdown comment body for a task assignment.
Args:
display_name: Human-readable agent name.
task_type: The inferred task type.
description: Task description.
acceptance_criteria: List of acceptance criteria strings.
Returns:
Formatted markdown string for the comment.
"""
criteria_md = (
"\n".join(f"- {c}" for c in acceptance_criteria)
if acceptance_criteria
else "_None specified_"
)
return (
f"## Assigned to {display_name}\n\n"
f"**Task type:** `{task_type.value}`\n\n"
f"**Description:**\n{description}\n\n"
f"**Acceptance criteria:**\n{criteria_md}\n\n"
f"---\n*Dispatched by Timmy agent dispatcher.*"
)
def _select_label(agent: AgentType) -> str | None:
"""Return the Gitea label for an agent based on its spec.
Args:
agent: The target agent.
Returns:
Label name or None if the agent has no label.
"""
return AGENT_REGISTRY[agent].gitea_label
async def _dispatch_via_gitea(
agent: AgentType,
issue_number: int,
@@ -459,33 +520,27 @@ async def _dispatch_via_gitea(
async with httpx.AsyncClient(timeout=15) as client:
# 1. Apply agent label (if applicable)
if spec.gitea_label:
ok = await _apply_gitea_label(
client, base_url, repo, headers, issue_number, spec.gitea_label
)
label = _select_label(agent)
if label:
ok = await _apply_gitea_label(client, base_url, repo, headers, issue_number, label)
if ok:
label_applied = spec.gitea_label
label_applied = label
logger.info(
"Applied label %r to issue #%s for %s",
spec.gitea_label,
label,
issue_number,
spec.display_name,
)
else:
logger.warning(
"Could not apply label %r to issue #%s",
spec.gitea_label,
label,
issue_number,
)
# 2. Post assignment comment
criteria_md = "\n".join(f"- {c}" for c in acceptance_criteria) if acceptance_criteria else "_None specified_"
comment_body = (
f"## Assigned to {spec.display_name}\n\n"
f"**Task type:** `{task_type.value}`\n\n"
f"**Description:**\n{description}\n\n"
f"**Acceptance criteria:**\n{criteria_md}\n\n"
f"---\n*Dispatched by Timmy agent dispatcher.*"
comment_body = _format_assignment_comment(
spec.display_name, task_type, description, acceptance_criteria
)
comment_id = await _post_gitea_comment(
client, base_url, repo, headers, issue_number, comment_body
@@ -616,9 +671,7 @@ async def _dispatch_local(
assumed to succeed at dispatch time).
"""
task_type = infer_task_type(title, description)
logger.info(
"Timmy handling task locally: %r (issue #%s)", title[:60], issue_number
)
logger.info("Timmy handling task locally: %r (issue #%s)", title[:60], issue_number)
return DispatchResult(
task_type=task_type,
agent=AgentType.TIMMY,
@@ -632,6 +685,81 @@ async def _dispatch_local(
# Public entry point
# ---------------------------------------------------------------------------
def _validate_task(
title: str,
task_type: TaskType | None,
agent: AgentType | None,
issue_number: int | None,
) -> DispatchResult | None:
"""Validate task preconditions.
Args:
title: Task title to validate.
task_type: Optional task type for result construction.
agent: Optional agent for result construction.
issue_number: Optional issue number for result construction.
Returns:
A failed DispatchResult if validation fails, None otherwise.
"""
if not title.strip():
return DispatchResult(
task_type=task_type or TaskType.ROUTINE_CODING,
agent=agent or AgentType.TIMMY,
issue_number=issue_number,
status=DispatchStatus.FAILED,
error="`title` is required.",
)
return None
def _select_dispatch_strategy(agent: AgentType, issue_number: int | None) -> str:
"""Select the dispatch strategy based on agent interface and context.
Args:
agent: The target agent.
issue_number: Optional Gitea issue number.
Returns:
Strategy name: "gitea", "api", or "local".
"""
spec = AGENT_REGISTRY[agent]
if spec.interface == "gitea" and issue_number is not None:
return "gitea"
if spec.interface == "api":
return "api"
return "local"
def _log_dispatch_result(
title: str,
result: DispatchResult,
attempt: int,
max_retries: int,
) -> None:
"""Log the outcome of a dispatch attempt.
Args:
title: Task title for logging context.
result: The dispatch result.
attempt: Current attempt number (0-indexed).
max_retries: Maximum retry attempts allowed.
"""
if result.success:
return
if attempt > 0:
logger.info("Retry %d/%d for task %r", attempt, max_retries, title[:60])
logger.warning(
"Dispatch attempt %d failed for task %r: %s",
attempt + 1,
title[:60],
result.error,
)
async def dispatch_task(
title: str,
description: str = "",
@@ -672,17 +800,13 @@ async def dispatch_task(
if result.success:
print(f"Assigned to {result.agent.value}")
"""
# 1. Validate
validation_error = _validate_task(title, task_type, agent, issue_number)
if validation_error:
return validation_error
# 2. Resolve task type and agent
criteria = acceptance_criteria or []
if not title.strip():
return DispatchResult(
task_type=task_type or TaskType.ROUTINE_CODING,
agent=agent or AgentType.TIMMY,
issue_number=issue_number,
status=DispatchStatus.FAILED,
error="`title` is required.",
)
resolved_type = task_type or infer_task_type(title, description)
resolved_agent = agent or select_agent(resolved_type)
@@ -694,18 +818,16 @@ async def dispatch_task(
issue_number,
)
spec = AGENT_REGISTRY[resolved_agent]
# 3. Select strategy and dispatch with retries
strategy = _select_dispatch_strategy(resolved_agent, issue_number)
last_result: DispatchResult | None = None
for attempt in range(max_retries + 1):
if attempt > 0:
logger.info("Retry %d/%d for task %r", attempt, max_retries, title[:60])
if spec.interface == "gitea" and issue_number is not None:
for attempt in range(max_retries + 1):
if strategy == "gitea":
result = await _dispatch_via_gitea(
resolved_agent, issue_number, title, description, criteria
)
elif spec.interface == "api":
elif strategy == "api":
result = await _dispatch_via_api(
resolved_agent, title, description, criteria, issue_number, api_endpoint
)
@@ -718,14 +840,9 @@ async def dispatch_task(
if result.success:
return result
logger.warning(
"Dispatch attempt %d failed for task %r: %s",
attempt + 1,
title[:60],
result.error,
)
_log_dispatch_result(title, result, attempt, max_retries)
# All attempts exhausted — escalate
# 4. All attempts exhausted — escalate
assert last_result is not None
last_result.status = DispatchStatus.ESCALATED
logger.error(
@@ -769,9 +886,7 @@ async def _log_escalation(
f"---\n*Timmy agent dispatcher.*"
)
async with httpx.AsyncClient(timeout=10) as client:
await _post_gitea_comment(
client, base_url, repo, headers, issue_number, body
)
await _post_gitea_comment(client, base_url, repo, headers, issue_number, body)
except Exception as exc:
logger.warning("Failed to post escalation comment: %s", exc)
@@ -780,6 +895,7 @@ async def _log_escalation(
# Monitoring helper
# ---------------------------------------------------------------------------
async def wait_for_completion(
issue_number: int,
poll_interval: int = 60,

View File

@@ -418,9 +418,7 @@ class MCPBridge:
return f"Error executing {name}: {exc}"
@staticmethod
def _build_initial_messages(
prompt: str, system_prompt: str | None
) -> list[dict]:
def _build_initial_messages(prompt: str, system_prompt: str | None) -> list[dict]:
"""Build the initial message list for a run."""
messages: list[dict] = []
if system_prompt:
@@ -512,9 +510,7 @@ class MCPBridge:
error_msg = ""
try:
content, tool_calls_made, rounds, error_msg = await self._run_tool_loop(
messages, tools
)
content, tool_calls_made, rounds, error_msg = await self._run_tool_loop(messages, tools)
except httpx.ConnectError as exc:
logger.warning("Ollama connection failed: %s", exc)
error_msg = f"Ollama connection failed: {exc}"

View File

@@ -7,37 +7,97 @@ Also includes vector similarity utilities (cosine similarity, keyword overlap).
"""
import hashlib
import json
import logging
import math
import httpx # Import httpx for Ollama API calls
from config import settings
logger = logging.getLogger(__name__)
# Embedding model - small, fast, local
EMBEDDING_MODEL = None
EMBEDDING_DIM = 384 # MiniLM dimension
EMBEDDING_DIM = 384 # MiniLM dimension, will be overridden if Ollama model has different dim
class OllamaEmbedder:
"""Mimics SentenceTransformer interface for Ollama."""
def __init__(self, model_name: str, ollama_url: str):
self.model_name = model_name
self.ollama_url = ollama_url
self.dimension = 0 # Will be updated after first call
def encode(
self,
sentences: str | list[str],
convert_to_numpy: bool = False,
normalize_embeddings: bool = True,
) -> list[list[float]] | list[float]:
"""Generate embeddings using Ollama."""
if isinstance(sentences, str):
sentences = [sentences]
all_embeddings = []
for sentence in sentences:
try:
response = httpx.post(
f"{self.ollama_url}/api/embeddings",
json={"model": self.model_name, "prompt": sentence},
timeout=settings.mcp_bridge_timeout,
)
response.raise_for_status()
embedding = response.json()["embedding"]
if not self.dimension:
self.dimension = len(embedding) # Set dimension on first successful call
global EMBEDDING_DIM
EMBEDDING_DIM = self.dimension # Update global EMBEDDING_DIM
all_embeddings.append(embedding)
except httpx.RequestError as exc:
logger.error("Ollama embeddings request failed: %s", exc)
# Fallback to simple hash embedding on Ollama error
return _simple_hash_embedding(sentence)
except json.JSONDecodeError as exc:
logger.error("Failed to decode Ollama embeddings response: %s", exc)
return _simple_hash_embedding(sentence)
if len(all_embeddings) == 1 and isinstance(sentences, str):
return all_embeddings[0]
return all_embeddings
def _get_embedding_model():
"""Lazy-load embedding model."""
"""Lazy-load embedding model, preferring Ollama if configured."""
global EMBEDDING_MODEL
global EMBEDDING_DIM
if EMBEDDING_MODEL is None:
try:
from config import settings
if settings.timmy_skip_embeddings:
EMBEDDING_MODEL = False
return EMBEDDING_MODEL
if settings.timmy_skip_embeddings:
EMBEDDING_MODEL = False
return EMBEDDING_MODEL
except ImportError:
pass
if settings.timmy_embedding_backend == "ollama":
logger.info(
"MemorySystem: Using Ollama for embeddings with model %s",
settings.ollama_embedding_model,
)
EMBEDDING_MODEL = OllamaEmbedder(
settings.ollama_embedding_model, settings.normalized_ollama_url
)
# We don't know the dimension until after the first call, so keep it default for now.
# It will be updated dynamically in OllamaEmbedder.encode
return EMBEDDING_MODEL
else:
try:
from sentence_transformers import SentenceTransformer
try:
from sentence_transformers import SentenceTransformer
EMBEDDING_MODEL = SentenceTransformer("all-MiniLM-L6-v2")
logger.info("MemorySystem: Loaded embedding model")
except ImportError:
logger.warning("MemorySystem: sentence-transformers not installed, using fallback")
EMBEDDING_MODEL = False # Use fallback
EMBEDDING_MODEL = SentenceTransformer("all-MiniLM-L6-v2")
EMBEDDING_DIM = 384 # Reset to MiniLM dimension
logger.info("MemorySystem: Loaded local embedding model (all-MiniLM-L6-v2)")
except ImportError:
logger.warning("MemorySystem: sentence-transformers not installed, using fallback")
EMBEDDING_MODEL = False # Use fallback
return EMBEDDING_MODEL
@@ -60,7 +120,10 @@ def embed_text(text: str) -> list[float]:
model = _get_embedding_model()
if model and model is not False:
embedding = model.encode(text)
return embedding.tolist()
# Ensure it's a list of floats, not numpy array
if hasattr(embedding, "tolist"):
return embedding.tolist()
return embedding
return _simple_hash_embedding(text)

View File

@@ -1206,7 +1206,7 @@ memory_searcher = MemorySearcher()
# ───────────────────────────────────────────────────────────────────────────────
def memory_search(query: str, top_k: int = 5) -> str:
def memory_search(query: str, limit: int = 10) -> str:
"""Search past conversations, notes, and stored facts for relevant context.
Searches across both the vault (indexed markdown files) and the
@@ -1215,19 +1215,19 @@ def memory_search(query: str, top_k: int = 5) -> str:
Args:
query: What to search for (e.g. "Bitcoin strategy", "server setup").
top_k: Number of results to return (default 5).
limit: Number of results to return (default 10).
Returns:
Formatted string of relevant memory results.
"""
# Guard: model sometimes passes None for top_k
if top_k is None:
top_k = 5
# Guard: model sometimes passes None for limit
if limit is None:
limit = 10
parts: list[str] = []
# 1. Search semantic vault (indexed markdown files)
vault_results = semantic_memory.search(query, top_k)
vault_results = semantic_memory.search(query, limit)
for content, score in vault_results:
if score < 0.2:
continue
@@ -1235,7 +1235,7 @@ def memory_search(query: str, top_k: int = 5) -> str:
# 2. Search runtime vector store (stored facts/conversations)
try:
runtime_results = search_memories(query, limit=top_k, min_relevance=0.2)
runtime_results = search_memories(query, limit=limit, min_relevance=0.2)
for entry in runtime_results:
label = entry.context_type or "memory"
parts.append(f"[{label}] {entry.content[:300]}")
@@ -1289,45 +1289,48 @@ def memory_read(query: str = "", top_k: int = 5) -> str:
return "\n".join(parts)
def memory_write(content: str, context_type: str = "fact") -> str:
"""Store a piece of information in persistent memory.
def memory_store(topic: str, report: str, type: str = "research") -> str:
"""Store a piece of information in persistent memory, particularly for research outputs.
Use this tool when the user explicitly asks you to remember something.
Stored memories are searchable via memory_search across all channels
(web GUI, Discord, Telegram, etc.).
Use this tool to store structured research findings or other important documents.
Stored memories are searchable via memory_search across all channels.
Args:
content: The information to remember (e.g. a phrase, fact, or note).
context_type: Type of memory — "fact" for permanent facts,
"conversation" for conversation context,
"document" for document fragments.
topic: A concise title or topic for the research output.
report: The detailed content of the research output or document.
type: Type of memory — "research" for research outputs (default),
"fact" for permanent facts, "conversation" for conversation context,
"document" for other document fragments.
Returns:
Confirmation that the memory was stored.
"""
if not content or not content.strip():
return "Nothing to store — content is empty."
if not report or not report.strip():
return "Nothing to store — report is empty."
valid_types = ("fact", "conversation", "document")
if context_type not in valid_types:
context_type = "fact"
# Combine topic and report for embedding and storage content
full_content = f"Topic: {topic.strip()}\n\nReport: {report.strip()}"
valid_types = ("fact", "conversation", "document", "research")
if type not in valid_types:
type = "research"
try:
# Dedup check for facts — skip if a similar fact already exists
# Threshold 0.75 catches paraphrases (was 0.9 which only caught near-exact)
if context_type == "fact":
existing = search_memories(
content.strip(), limit=3, context_type="fact", min_relevance=0.75
)
# Dedup check for facts and research — skip if similar exists
if type in ("fact", "research"):
existing = search_memories(full_content, limit=3, context_type=type, min_relevance=0.75)
if existing:
return f"Similar fact already stored (id={existing[0].id[:8]}). Skipping duplicate."
return (
f"Similar {type} already stored (id={existing[0].id[:8]}). Skipping duplicate."
)
entry = store_memory(
content=content.strip(),
content=full_content,
source="agent",
context_type=context_type,
context_type=type,
metadata={"topic": topic},
)
return f"Stored in memory (type={context_type}, id={entry.id[:8]}). This is now searchable across all channels."
return f"Stored in memory (type={type}, id={entry.id[:8]}). This is now searchable across all channels."
except Exception as exc:
logger.error("Failed to write memory: %s", exc)
return f"Failed to store memory: {exc}"

528
src/timmy/research.py Normal file
View File

@@ -0,0 +1,528 @@
"""Research Orchestrator — autonomous, sovereign research pipeline.
Chains all six steps of the research workflow with local-first execution:
Step 0 Cache — check semantic memory (SQLite, instant, zero API cost)
Step 1 Scope — load a research template from skills/research/
Step 2 Query — slot-fill template + formulate 5-15 search queries via Ollama
Step 3 Search — execute queries via web_search (SerpAPI or fallback)
Step 4 Fetch — download + extract full pages via web_fetch (trafilatura)
Step 5 Synth — compress findings into a structured report via cascade
Step 6 Deliver — store to semantic memory; optionally save to docs/research/
Cascade tiers for synthesis (spec §4):
Tier 4 SQLite semantic cache — instant, free, covers ~80% after warm-up
Tier 3 Ollama (qwen3:14b) — local, free, good quality
Tier 2 Claude API (haiku) — cloud fallback, cheap, set ANTHROPIC_API_KEY
Tier 1 (future) Groq — free-tier rate-limited, tracked in #980
All optional services degrade gracefully per project conventions.
Refs #972 (governing spec), #975 (ResearchOrchestrator sub-issue).
"""
from __future__ import annotations
import asyncio
import logging
import re
import textwrap
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any
logger = logging.getLogger(__name__)
# Optional memory imports — available at module level so tests can patch them.
try:
from timmy.memory_system import SemanticMemory, store_memory
except Exception: # pragma: no cover
SemanticMemory = None # type: ignore[assignment,misc]
store_memory = None # type: ignore[assignment]
# Root of the project — two levels up from src/timmy/
_PROJECT_ROOT = Path(__file__).parent.parent.parent
_SKILLS_ROOT = _PROJECT_ROOT / "skills" / "research"
_DOCS_ROOT = _PROJECT_ROOT / "docs" / "research"
# Similarity threshold for cache hit (01 cosine similarity)
_CACHE_HIT_THRESHOLD = 0.82
# How many search result URLs to fetch as full pages
_FETCH_TOP_N = 5
# Maximum tokens to request from the synthesis LLM
_SYNTHESIS_MAX_TOKENS = 4096
# ---------------------------------------------------------------------------
# Data structures
# ---------------------------------------------------------------------------
@dataclass
class ResearchResult:
"""Full output of a research pipeline run."""
topic: str
query_count: int
sources_fetched: int
report: str
cached: bool = False
cache_similarity: float = 0.0
synthesis_backend: str = "unknown"
errors: list[str] = field(default_factory=list)
def is_empty(self) -> bool:
return not self.report.strip()
# ---------------------------------------------------------------------------
# Template loading
# ---------------------------------------------------------------------------
def list_templates() -> list[str]:
"""Return names of available research templates (without .md extension)."""
if not _SKILLS_ROOT.exists():
return []
return [p.stem for p in sorted(_SKILLS_ROOT.glob("*.md"))]
def load_template(template_name: str, slots: dict[str, str] | None = None) -> str:
"""Load a research template and fill {slot} placeholders.
Args:
template_name: Stem of the .md file under skills/research/ (e.g. "tool_evaluation").
slots: Mapping of {placeholder} → replacement value.
Returns:
Template text with slots filled. Unfilled slots are left as-is.
"""
path = _SKILLS_ROOT / f"{template_name}.md"
if not path.exists():
available = ", ".join(list_templates()) or "(none)"
raise FileNotFoundError(
f"Research template {template_name!r} not found. "
f"Available: {available}"
)
text = path.read_text(encoding="utf-8")
# Strip YAML frontmatter (--- ... ---), including empty frontmatter (--- \n---)
text = re.sub(r"^---\n.*?---\n", "", text, flags=re.DOTALL)
if slots:
for key, value in slots.items():
text = text.replace(f"{{{key}}}", value)
return text.strip()
# ---------------------------------------------------------------------------
# Query formulation (Step 2)
# ---------------------------------------------------------------------------
async def _formulate_queries(topic: str, template_context: str, n: int = 8) -> list[str]:
"""Use the local LLM to generate targeted search queries for a topic.
Falls back to a simple heuristic if Ollama is unavailable.
"""
prompt = textwrap.dedent(f"""\
You are a research assistant. Generate exactly {n} targeted, specific web search
queries to thoroughly research the following topic.
TOPIC: {topic}
RESEARCH CONTEXT:
{template_context[:1000]}
Rules:
- One query per line, no numbering, no bullet points.
- Vary the angle (definition, comparison, implementation, alternatives, pitfalls).
- Prefer exact technical terms, tool names, and version numbers where relevant.
- Output ONLY the queries, nothing else.
""")
queries = await _ollama_complete(prompt, max_tokens=512)
if not queries:
# Minimal fallback
return [
f"{topic} overview",
f"{topic} tutorial",
f"{topic} best practices",
f"{topic} alternatives",
f"{topic} 2025",
]
lines = [ln.strip() for ln in queries.splitlines() if ln.strip()]
return lines[:n] if len(lines) >= n else lines
# ---------------------------------------------------------------------------
# Search (Step 3)
# ---------------------------------------------------------------------------
async def _execute_search(queries: list[str]) -> list[dict[str, str]]:
"""Run each query through the available web search backend.
Returns a flat list of {title, url, snippet} dicts.
Degrades gracefully if SerpAPI key is absent.
"""
results: list[dict[str, str]] = []
seen_urls: set[str] = set()
for query in queries:
try:
raw = await asyncio.to_thread(_run_search_sync, query)
for item in raw:
url = item.get("url", "")
if url and url not in seen_urls:
seen_urls.add(url)
results.append(item)
except Exception as exc:
logger.warning("Search failed for query %r: %s", query, exc)
return results
def _run_search_sync(query: str) -> list[dict[str, str]]:
"""Synchronous search — wraps SerpAPI or returns empty on missing key."""
import os
if not os.environ.get("SERPAPI_API_KEY"):
logger.debug("SERPAPI_API_KEY not set — skipping web search for %r", query)
return []
try:
from serpapi import GoogleSearch
params = {"q": query, "api_key": os.environ["SERPAPI_API_KEY"], "num": 5}
search = GoogleSearch(params)
data = search.get_dict()
items = []
for r in data.get("organic_results", []):
items.append(
{
"title": r.get("title", ""),
"url": r.get("link", ""),
"snippet": r.get("snippet", ""),
}
)
return items
except Exception as exc:
logger.warning("SerpAPI search error: %s", exc)
return []
# ---------------------------------------------------------------------------
# Fetch (Step 4)
# ---------------------------------------------------------------------------
async def _fetch_pages(results: list[dict[str, str]], top_n: int = _FETCH_TOP_N) -> list[str]:
"""Download and extract full text for the top search results.
Uses web_fetch (trafilatura) from timmy.tools.system_tools.
"""
try:
from timmy.tools.system_tools import web_fetch
except ImportError:
logger.warning("web_fetch not available — skipping page fetch")
return []
pages: list[str] = []
for item in results[:top_n]:
url = item.get("url", "")
if not url:
continue
try:
text = await asyncio.to_thread(web_fetch, url, 6000)
if text and not text.startswith("Error:"):
pages.append(f"## {item.get('title', url)}\nSource: {url}\n\n{text}")
except Exception as exc:
logger.warning("Failed to fetch %s: %s", url, exc)
return pages
# ---------------------------------------------------------------------------
# Synthesis (Step 5) — cascade: Ollama → Claude fallback
# ---------------------------------------------------------------------------
async def _synthesize(topic: str, pages: list[str], snippets: list[str]) -> tuple[str, str]:
"""Compress fetched pages + snippets into a structured research report.
Returns (report_markdown, backend_used).
"""
# Build synthesis prompt
source_content = "\n\n---\n\n".join(pages[:5])
if not source_content and snippets:
source_content = "\n".join(f"- {s}" for s in snippets[:20])
if not source_content:
return (
f"# Research: {topic}\n\n*No source material was retrieved. "
"Check SERPAPI_API_KEY and network connectivity.*",
"none",
)
prompt = textwrap.dedent(f"""\
You are a senior technical researcher. Synthesize the source material below
into a structured research report on the topic: **{topic}**
FORMAT YOUR REPORT AS:
# {topic}
## Executive Summary
(2-3 sentences: what you found, top recommendation)
## Key Findings
(Bullet list of the most important facts, tools, or patterns)
## Comparison / Options
(Table or list comparing alternatives where applicable)
## Recommended Approach
(Concrete recommendation with rationale)
## Gaps & Next Steps
(What wasn't answered, what to investigate next)
---
SOURCE MATERIAL:
{source_content[:12000]}
""")
# Tier 3 — try Ollama first
report = await _ollama_complete(prompt, max_tokens=_SYNTHESIS_MAX_TOKENS)
if report:
return report, "ollama"
# Tier 2 — Claude fallback
report = await _claude_complete(prompt, max_tokens=_SYNTHESIS_MAX_TOKENS)
if report:
return report, "claude"
# Last resort — structured snippet summary
summary = f"# {topic}\n\n## Snippets\n\n" + "\n\n".join(
f"- {s}" for s in snippets[:15]
)
return summary, "fallback"
# ---------------------------------------------------------------------------
# LLM helpers
# ---------------------------------------------------------------------------
async def _ollama_complete(prompt: str, max_tokens: int = 1024) -> str:
"""Send a prompt to Ollama and return the response text.
Returns empty string on failure (graceful degradation).
"""
try:
import httpx
from config import settings
url = f"{settings.normalized_ollama_url}/api/generate"
payload: dict[str, Any] = {
"model": settings.ollama_model,
"prompt": prompt,
"stream": False,
"options": {
"num_predict": max_tokens,
"temperature": 0.3,
},
}
async with httpx.AsyncClient(timeout=120.0) as client:
resp = await client.post(url, json=payload)
resp.raise_for_status()
data = resp.json()
return data.get("response", "").strip()
except Exception as exc:
logger.warning("Ollama completion failed: %s", exc)
return ""
async def _claude_complete(prompt: str, max_tokens: int = 1024) -> str:
"""Send a prompt to Claude API as a last-resort fallback.
Only active when ANTHROPIC_API_KEY is configured.
Returns empty string on failure or missing key.
"""
try:
from config import settings
if not settings.anthropic_api_key:
return ""
from timmy.backends import ClaudeBackend
backend = ClaudeBackend()
result = await asyncio.to_thread(backend.run, prompt)
return result.content.strip()
except Exception as exc:
logger.warning("Claude fallback failed: %s", exc)
return ""
# ---------------------------------------------------------------------------
# Memory cache (Step 0 + Step 6)
# ---------------------------------------------------------------------------
def _check_cache(topic: str) -> tuple[str | None, float]:
"""Search semantic memory for a prior result on this topic.
Returns (cached_report, similarity) or (None, 0.0).
"""
try:
if SemanticMemory is None:
return None, 0.0
mem = SemanticMemory()
hits = mem.search(topic, top_k=1)
if hits:
content, score = hits[0]
if score >= _CACHE_HIT_THRESHOLD:
return content, score
except Exception as exc:
logger.debug("Cache check failed: %s", exc)
return None, 0.0
def _store_result(topic: str, report: str) -> None:
"""Index the research report into semantic memory for future retrieval."""
try:
if store_memory is None:
logger.debug("store_memory not available — skipping memory index")
return
store_memory(
content=report,
source="research_pipeline",
context_type="research",
metadata={"topic": topic},
)
logger.info("Research result indexed for topic: %r", topic)
except Exception as exc:
logger.warning("Failed to store research result: %s", exc)
def _save_to_disk(topic: str, report: str) -> Path | None:
"""Persist the report as a markdown file under docs/research/.
Filename is derived from the topic (slugified). Returns the path or None.
"""
try:
slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")[:60]
_DOCS_ROOT.mkdir(parents=True, exist_ok=True)
path = _DOCS_ROOT / f"{slug}.md"
path.write_text(report, encoding="utf-8")
logger.info("Research report saved to %s", path)
return path
except Exception as exc:
logger.warning("Failed to save research report to disk: %s", exc)
return None
# ---------------------------------------------------------------------------
# Main orchestrator
# ---------------------------------------------------------------------------
async def run_research(
topic: str,
template: str | None = None,
slots: dict[str, str] | None = None,
save_to_disk: bool = False,
skip_cache: bool = False,
) -> ResearchResult:
"""Run the full 6-step autonomous research pipeline.
Args:
topic: The research question or subject.
template: Name of a template from skills/research/ (e.g. "tool_evaluation").
If None, runs without a template scaffold.
slots: Placeholder values for the template (e.g. {"domain": "PDF parsing"}).
save_to_disk: If True, write the report to docs/research/<slug>.md.
skip_cache: If True, bypass the semantic memory cache.
Returns:
ResearchResult with report and metadata.
"""
errors: list[str] = []
# ------------------------------------------------------------------
# Step 0 — check cache
# ------------------------------------------------------------------
if not skip_cache:
cached, score = _check_cache(topic)
if cached:
logger.info("Cache hit (%.2f) for topic: %r", score, topic)
return ResearchResult(
topic=topic,
query_count=0,
sources_fetched=0,
report=cached,
cached=True,
cache_similarity=score,
synthesis_backend="cache",
)
# ------------------------------------------------------------------
# Step 1 — load template (optional)
# ------------------------------------------------------------------
template_context = ""
if template:
try:
template_context = load_template(template, slots)
except FileNotFoundError as exc:
errors.append(str(exc))
logger.warning("Template load failed: %s", exc)
# ------------------------------------------------------------------
# Step 2 — formulate queries
# ------------------------------------------------------------------
queries = await _formulate_queries(topic, template_context)
logger.info("Formulated %d queries for topic: %r", len(queries), topic)
# ------------------------------------------------------------------
# Step 3 — execute search
# ------------------------------------------------------------------
search_results = await _execute_search(queries)
logger.info("Search returned %d results", len(search_results))
snippets = [r.get("snippet", "") for r in search_results if r.get("snippet")]
# ------------------------------------------------------------------
# Step 4 — fetch full pages
# ------------------------------------------------------------------
pages = await _fetch_pages(search_results)
logger.info("Fetched %d pages", len(pages))
# ------------------------------------------------------------------
# Step 5 — synthesize
# ------------------------------------------------------------------
report, backend = await _synthesize(topic, pages, snippets)
# ------------------------------------------------------------------
# Step 6 — deliver
# ------------------------------------------------------------------
_store_result(topic, report)
if save_to_disk:
_save_to_disk(topic, report)
return ResearchResult(
topic=topic,
query_count=len(queries),
sources_fetched=len(pages),
report=report,
cached=False,
synthesis_backend=backend,
errors=errors,
)

View File

@@ -32,8 +32,12 @@ def get_llm_client() -> Any:
# a client for an LLM service like OpenAI, Anthropic, or a local
# model.
class MockLLMClient:
"""Stub LLM client for testing without a real language model."""
async def completion(self, prompt: str, max_tokens: int) -> Any:
class MockCompletion:
"""Stub completion response returned by MockLLMClient."""
def __init__(self, text: str) -> None:
self.text = text

View File

@@ -0,0 +1,30 @@
"""Sovereignty metrics for the Bannerlord loop.
Tracks how much of each AI layer (perception, decision, narration)
runs locally vs. calls out to an LLM. Feeds the sovereignty dashboard.
Refs: #954, #953
Three-strike detector and automation enforcement.
Refs: #962
Session reporting: auto-generates markdown scorecards at session end
and commits them to the Gitea repo for institutional memory.
Refs: #957 (Session Sovereignty Report Generator)
"""
from timmy.sovereignty.session_report import (
commit_report,
generate_and_commit_report,
generate_report,
mark_session_start,
)
__all__ = [
"generate_report",
"commit_report",
"generate_and_commit_report",
"mark_session_start",
]

View File

@@ -0,0 +1,413 @@
"""Sovereignty metrics emitter and SQLite store.
Tracks the sovereignty percentage for each AI layer (perception, decision,
narration) plus API cost and skill crystallisation. All data is persisted to
``data/sovereignty_metrics.db`` so the dashboard can query trends over time.
Event types
-----------
perception layer:
``perception_cache_hit`` — frame answered from local cache (sovereign)
``perception_vlm_call`` — frame required a VLM inference call (non-sovereign)
decision layer:
``decision_rule_hit`` — action chosen by a deterministic rule (sovereign)
``decision_llm_call`` — action required LLM reasoning (non-sovereign)
narration layer:
``narration_template`` — text generated from a template (sovereign)
``narration_llm`` — text generated by an LLM (non-sovereign)
skill layer:
``skill_crystallized`` — a new skill was crystallised from LLM output
cost:
``api_call`` — any external API call was made
``api_cost`` — monetary cost of an API call (metadata: {"usd": float})
Refs: #954, #953
"""
import asyncio
import json
import logging
import sqlite3
import uuid
from contextlib import closing
from dataclasses import dataclass, field
from datetime import UTC, datetime
from pathlib import Path
from typing import Any
from config import settings
logger = logging.getLogger(__name__)
# ── Constants ─────────────────────────────────────────────────────────────────
DB_PATH = Path(settings.repo_root) / "data" / "sovereignty_metrics.db"
#: Sovereign event types for each layer (numerator of sovereignty %).
_SOVEREIGN_EVENTS: dict[str, frozenset[str]] = {
"perception": frozenset({"perception_cache_hit"}),
"decision": frozenset({"decision_rule_hit"}),
"narration": frozenset({"narration_template"}),
}
#: All tracked event types for each layer (denominator of sovereignty %).
_LAYER_EVENTS: dict[str, frozenset[str]] = {
"perception": frozenset({"perception_cache_hit", "perception_vlm_call"}),
"decision": frozenset({"decision_rule_hit", "decision_llm_call"}),
"narration": frozenset({"narration_template", "narration_llm"}),
}
ALL_EVENT_TYPES: frozenset[str] = frozenset(
{
"perception_cache_hit",
"perception_vlm_call",
"decision_rule_hit",
"decision_llm_call",
"narration_template",
"narration_llm",
"skill_crystallized",
"api_call",
"api_cost",
}
)
# ── Schema ────────────────────────────────────────────────────────────────────
_SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
timestamp TEXT NOT NULL,
event_type TEXT NOT NULL,
session_id TEXT NOT NULL DEFAULT '',
metadata_json TEXT NOT NULL DEFAULT '{}'
);
CREATE INDEX IF NOT EXISTS idx_ev_type ON events(event_type);
CREATE INDEX IF NOT EXISTS idx_ev_ts ON events(timestamp);
CREATE INDEX IF NOT EXISTS idx_ev_session ON events(session_id);
CREATE TABLE IF NOT EXISTS sessions (
session_id TEXT PRIMARY KEY,
game TEXT NOT NULL DEFAULT '',
start_time TEXT NOT NULL,
end_time TEXT
);
"""
# ── Data classes ──────────────────────────────────────────────────────────────
@dataclass
class SovereigntyEvent:
"""A single sovereignty event."""
event_type: str
session_id: str = ""
metadata: dict[str, Any] = field(default_factory=dict)
timestamp: str = field(default_factory=lambda: datetime.now(UTC).isoformat())
# ── Store ─────────────────────────────────────────────────────────────────────
class SovereigntyMetricsStore:
"""SQLite-backed sovereignty event store.
Thread-safe: creates a new connection per operation (WAL mode).
"""
def __init__(self, db_path: Path | None = None) -> None:
self._db_path = db_path or DB_PATH
self._init_db()
# ── internal ─────────────────────────────────────────────────────────────
def _init_db(self) -> None:
try:
self._db_path.parent.mkdir(parents=True, exist_ok=True)
with closing(sqlite3.connect(str(self._db_path))) as conn:
conn.execute("PRAGMA journal_mode=WAL")
conn.execute(f"PRAGMA busy_timeout={settings.db_busy_timeout_ms}")
conn.executescript(_SCHEMA)
conn.commit()
except Exception as exc:
logger.warning("Failed to initialise sovereignty metrics DB: %s", exc)
def _connect(self) -> sqlite3.Connection:
conn = sqlite3.connect(str(self._db_path))
conn.row_factory = sqlite3.Row
conn.execute(f"PRAGMA busy_timeout={settings.db_busy_timeout_ms}")
return conn
# ── public API ────────────────────────────────────────────────────────────
def record(
self, event_type: str, metadata: dict[str, Any] | None = None, *, session_id: str = ""
) -> None:
"""Record a sovereignty event.
Parameters
----------
event_type:
One of ``ALL_EVENT_TYPES``.
metadata:
Optional dict of extra data (serialised as JSON).
session_id:
Identifier of the current game session, if known.
"""
event = SovereigntyEvent(
event_type=event_type,
session_id=session_id,
metadata=metadata or {},
)
try:
with closing(self._connect()) as conn:
conn.execute(
"INSERT INTO events (timestamp, event_type, session_id, metadata_json) "
"VALUES (?, ?, ?, ?)",
(
event.timestamp,
event.event_type,
event.session_id,
json.dumps(event.metadata),
),
)
conn.commit()
except Exception as exc:
logger.warning("Failed to record sovereignty event: %s", exc)
def start_session(self, game: str = "", session_id: str | None = None) -> str:
"""Register a new game session. Returns the session_id."""
sid = session_id or str(uuid.uuid4())
try:
with closing(self._connect()) as conn:
conn.execute(
"INSERT OR IGNORE INTO sessions (session_id, game, start_time) VALUES (?, ?, ?)",
(sid, game, datetime.now(UTC).isoformat()),
)
conn.commit()
except Exception as exc:
logger.warning("Failed to start session: %s", exc)
return sid
def end_session(self, session_id: str) -> None:
"""Mark a session as ended."""
try:
with closing(self._connect()) as conn:
conn.execute(
"UPDATE sessions SET end_time = ? WHERE session_id = ?",
(datetime.now(UTC).isoformat(), session_id),
)
conn.commit()
except Exception as exc:
logger.warning("Failed to end session: %s", exc)
# ── analytics ─────────────────────────────────────────────────────────────
def get_sovereignty_pct(self, layer: str, time_window: float | None = None) -> float:
"""Return the sovereignty percentage (0.0100.0) for *layer*.
Parameters
----------
layer:
One of ``"perception"``, ``"decision"``, ``"narration"``.
time_window:
If given, only consider events from the last *time_window* seconds.
If ``None``, all events are used.
Returns
-------
float
Percentage of sovereign events for the layer, or 0.0 if no data.
"""
if layer not in _LAYER_EVENTS:
logger.warning("Unknown sovereignty layer: %s", layer)
return 0.0
sovereign = _SOVEREIGN_EVENTS[layer]
total_types = _LAYER_EVENTS[layer]
sovereign_placeholders = ",".join("?" * len(sovereign))
total_placeholders = ",".join("?" * len(total_types))
params_sov: list[Any] = list(sovereign)
params_total: list[Any] = list(total_types)
if time_window is not None:
cutoff = _seconds_ago_iso(time_window)
where_ts = " AND timestamp >= ?"
params_sov.append(cutoff)
params_total.append(cutoff)
else:
where_ts = ""
try:
with closing(self._connect()) as conn:
total_count = conn.execute(
f"SELECT COUNT(*) FROM events WHERE event_type IN ({total_placeholders}){where_ts}",
params_total,
).fetchone()[0]
if total_count == 0:
return 0.0
sov_count = conn.execute(
f"SELECT COUNT(*) FROM events WHERE event_type IN ({sovereign_placeholders}){where_ts}",
params_sov,
).fetchone()[0]
return round(100.0 * sov_count / total_count, 2)
except Exception as exc:
logger.warning("Failed to compute sovereignty pct: %s", exc)
return 0.0
def get_cost_per_hour(self, time_window: float | None = None) -> float:
"""Return the total API cost in USD extrapolated to a per-hour rate.
Parameters
----------
time_window:
Seconds of history to consider. Defaults to 3600 (last hour).
Returns
-------
float
USD cost per hour, or 0.0 if no ``api_cost`` events exist.
"""
window = time_window if time_window is not None else 3600.0
cutoff = _seconds_ago_iso(window)
try:
with closing(self._connect()) as conn:
rows = conn.execute(
"SELECT metadata_json FROM events WHERE event_type = 'api_cost' AND timestamp >= ?",
(cutoff,),
).fetchall()
except Exception as exc:
logger.warning("Failed to query api_cost events: %s", exc)
return 0.0
total_usd = 0.0
for row in rows:
try:
meta = json.loads(row["metadata_json"] or "{}")
total_usd += float(meta.get("usd", 0.0))
except (ValueError, TypeError, json.JSONDecodeError):
pass
# Extrapolate: (total in window) * (3600 / window_seconds)
if window == 0:
return 0.0
return round(total_usd * (3600.0 / window), 4)
def get_skills_crystallized(self, session_id: str | None = None) -> int:
"""Return the number of skills crystallised.
Parameters
----------
session_id:
If given, count only events for that session. If ``None``,
count across all sessions.
"""
try:
with closing(self._connect()) as conn:
if session_id:
return conn.execute(
"SELECT COUNT(*) FROM events WHERE event_type = 'skill_crystallized' AND session_id = ?",
(session_id,),
).fetchone()[0]
return conn.execute(
"SELECT COUNT(*) FROM events WHERE event_type = 'skill_crystallized'",
).fetchone()[0]
except Exception as exc:
logger.warning("Failed to query skill_crystallized: %s", exc)
return 0
def get_snapshot(self) -> dict[str, Any]:
"""Return a real-time metrics snapshot suitable for dashboard widgets."""
return {
"sovereignty": {
layer: self.get_sovereignty_pct(layer, time_window=3600) for layer in _LAYER_EVENTS
},
"cost_per_hour": self.get_cost_per_hour(),
"skills_crystallized": self.get_skills_crystallized(),
}
# ── Module-level singleton ────────────────────────────────────────────────────
_store: SovereigntyMetricsStore | None = None
def get_metrics_store() -> SovereigntyMetricsStore:
"""Return (or lazily create) the module-level singleton store."""
global _store
if _store is None:
_store = SovereigntyMetricsStore()
return _store
# ── Convenience helpers ───────────────────────────────────────────────────────
def record(
event_type: str, metadata: dict[str, Any] | None = None, *, session_id: str = ""
) -> None:
"""Module-level shortcut: ``metrics.record("perception_cache_hit")``."""
get_metrics_store().record(event_type, metadata=metadata, session_id=session_id)
def get_sovereignty_pct(layer: str, time_window: float | None = None) -> float:
"""Module-level shortcut for :meth:`SovereigntyMetricsStore.get_sovereignty_pct`."""
return get_metrics_store().get_sovereignty_pct(layer, time_window)
def get_cost_per_hour(time_window: float | None = None) -> float:
"""Module-level shortcut for :meth:`SovereigntyMetricsStore.get_cost_per_hour`."""
return get_metrics_store().get_cost_per_hour(time_window)
def get_skills_crystallized(session_id: str | None = None) -> int:
"""Module-level shortcut for :meth:`SovereigntyMetricsStore.get_skills_crystallized`."""
return get_metrics_store().get_skills_crystallized(session_id)
async def emit_sovereignty_event(
event_type: str,
metadata: dict[str, Any] | None = None,
*,
session_id: str = "",
) -> None:
"""Record an event in a thread and publish it on the event bus.
This is the async-safe entry-point used by the agentic loop.
"""
from infrastructure.events.bus import emit
await asyncio.to_thread(
get_metrics_store().record,
event_type,
metadata,
session_id=session_id,
)
await emit(
f"sovereignty.event.{event_type}",
source="sovereignty_metrics",
data={
"event_type": event_type,
"session_id": session_id,
**(metadata or {}),
},
)
# ── Private helpers ───────────────────────────────────────────────────────────
def _seconds_ago_iso(seconds: float) -> str:
"""Return an ISO-8601 timestamp *seconds* before now (UTC)."""
import datetime as _dt
delta = _dt.timedelta(seconds=seconds)
return (_dt.datetime.now(UTC) - delta).isoformat()

View File

@@ -0,0 +1,92 @@
from __future__ import annotations
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any
import cv2
import numpy as np
@dataclass
class Template:
name: str
image: np.ndarray
threshold: float = 0.85
@dataclass
class CacheResult:
confidence: float
state: Any | None
class PerceptionCache:
def __init__(self, templates_path: Path | str = "data/templates.json"):
self.templates_path = Path(templates_path)
self.templates: list[Template] = []
self.load()
def match(self, screenshot: np.ndarray) -> CacheResult:
"""
Matches templates against the screenshot.
Returns the confidence and the name of the best matching template.
"""
best_match_confidence = 0.0
best_match_name = None
for template in self.templates:
res = cv2.matchTemplate(screenshot, template.image, cv2.TM_CCOEFF_NORMED)
_, max_val, _, _ = cv2.minMaxLoc(res)
if max_val > best_match_confidence:
best_match_confidence = max_val
best_match_name = template.name
if best_match_confidence > 0.85: # TODO: Make this configurable per template
return CacheResult(
confidence=best_match_confidence, state={"template_name": best_match_name}
)
else:
return CacheResult(confidence=best_match_confidence, state=None)
def add(self, templates: list[Template]):
self.templates.extend(templates)
def persist(self):
self.templates_path.parent.mkdir(parents=True, exist_ok=True)
# Note: This is a simplified persistence mechanism.
# A more robust solution would store templates as images and metadata in JSON.
with self.templates_path.open("w") as f:
json.dump(
[{"name": t.name, "threshold": t.threshold} for t in self.templates], f, indent=2
)
def load(self):
if self.templates_path.exists():
with self.templates_path.open("r") as f:
templates_data = json.load(f)
# This is a simplified loading mechanism and assumes template images are stored elsewhere.
# For now, we are not loading the actual images.
self.templates = [
Template(name=t["name"], image=np.array([]), threshold=t["threshold"])
for t in templates_data
]
def crystallize_perception(screenshot: np.ndarray, vlm_response: Any) -> list[Template]:
"""
Extracts reusable patterns from VLM output and generates OpenCV templates.
This is a placeholder and needs to be implemented based on the actual VLM response format.
"""
# Example implementation:
# templates = []
# for item in vlm_response.get("items", []):
# bbox = item.get("bounding_box")
# template_name = item.get("name")
# if bbox and template_name:
# x1, y1, x2, y2 = bbox
# template_image = screenshot[y1:y2, x1:x2]
# templates.append(Template(name=template_name, image=template_image))
# return templates
return []

View File

@@ -0,0 +1,442 @@
"""Session Sovereignty Report Generator.
Auto-generates a sovereignty scorecard at the end of each play session
and commits it as a markdown file to the Gitea repo under
``reports/sovereignty/``.
Report contents (per issue #957):
- Session duration + game played
- Total model calls by type (VLM, LLM, TTS, API)
- Total cache/rule hits by type
- New skills crystallized (placeholder — pending skill-tracking impl)
- Sovereignty delta (change from session start → end)
- Cost breakdown (actual API spend)
- Per-layer sovereignty %: perception, decision, narration
- Trend comparison vs previous session
Refs: #957 (Sovereignty P0) · #953 (The Sovereignty Loop)
"""
import base64
import json
import logging
from datetime import UTC, datetime
from pathlib import Path
from typing import Any
import httpx
from config import settings
# Optional module-level imports — degrade gracefully if unavailable at import time
try:
from timmy.session_logger import get_session_logger
except Exception: # ImportError or circular import during early startup
get_session_logger = None # type: ignore[assignment]
try:
from infrastructure.sovereignty_metrics import GRADUATION_TARGETS, get_sovereignty_store
except Exception:
GRADUATION_TARGETS: dict = {} # type: ignore[assignment]
get_sovereignty_store = None # type: ignore[assignment]
logger = logging.getLogger(__name__)
# Module-level session start time; set by mark_session_start()
_SESSION_START: datetime | None = None
# ---------------------------------------------------------------------------
# Public API
# ---------------------------------------------------------------------------
def mark_session_start() -> None:
"""Record the session start wall-clock time.
Call once during application startup so ``generate_report()`` can
compute accurate session durations.
"""
global _SESSION_START
_SESSION_START = datetime.now(UTC)
logger.debug("Sovereignty: session start recorded at %s", _SESSION_START.isoformat())
def generate_report(session_id: str = "dashboard") -> str:
"""Render a sovereignty scorecard as a markdown string.
Pulls from:
- ``timmy.session_logger`` — message/tool-call/error counts
- ``infrastructure.sovereignty_metrics`` — cache hit rate, API cost,
graduation phase, and trend data
Args:
session_id: The session identifier (default: "dashboard").
Returns:
Markdown-formatted sovereignty report string.
"""
now = datetime.now(UTC)
session_start = _SESSION_START or now
duration_secs = (now - session_start).total_seconds()
session_data = _gather_session_data()
sov_data = _gather_sovereignty_data()
return _render_markdown(now, session_id, duration_secs, session_data, sov_data)
def commit_report(report_md: str, session_id: str = "dashboard") -> bool:
"""Commit a sovereignty report to the Gitea repo.
Creates or updates ``reports/sovereignty/{date}_{session_id}.md``
via the Gitea Contents API. Degrades gracefully: logs a warning
and returns ``False`` if Gitea is unreachable or misconfigured.
Args:
report_md: Markdown content to commit.
session_id: Session identifier used in the filename.
Returns:
``True`` on success, ``False`` on failure.
"""
if not settings.gitea_enabled:
logger.info("Sovereignty: Gitea disabled — skipping report commit")
return False
if not settings.gitea_token:
logger.warning("Sovereignty: no Gitea token — skipping report commit")
return False
date_str = datetime.now(UTC).strftime("%Y-%m-%d")
file_path = f"reports/sovereignty/{date_str}_{session_id}.md"
url = f"{settings.gitea_url}/api/v1/repos/{settings.gitea_repo}/contents/{file_path}"
headers = {
"Authorization": f"token {settings.gitea_token}",
"Content-Type": "application/json",
}
encoded_content = base64.b64encode(report_md.encode()).decode()
commit_message = (
f"report: sovereignty session {session_id} ({date_str})\n\n"
f"Auto-generated by Timmy. Refs #957"
)
payload: dict[str, Any] = {
"message": commit_message,
"content": encoded_content,
}
try:
with httpx.Client(timeout=10.0) as client:
# Fetch existing file SHA so we can update rather than create
check = client.get(url, headers=headers)
if check.status_code == 200:
existing = check.json()
payload["sha"] = existing.get("sha", "")
resp = client.put(url, headers=headers, json=payload)
resp.raise_for_status()
logger.info("Sovereignty: report committed to %s", file_path)
return True
except httpx.HTTPStatusError as exc:
logger.warning(
"Sovereignty: commit failed (HTTP %s): %s",
exc.response.status_code,
exc,
)
return False
except Exception as exc:
logger.warning("Sovereignty: commit failed: %s", exc)
return False
async def generate_and_commit_report(session_id: str = "dashboard") -> bool:
"""Generate and commit a sovereignty report for the current session.
Primary entry point — call at session end / application shutdown.
Wraps the synchronous ``commit_report`` call in ``asyncio.to_thread``
so it does not block the event loop.
Args:
session_id: The session identifier.
Returns:
``True`` if the report was generated and committed successfully.
"""
import asyncio
try:
report_md = generate_report(session_id)
logger.info("Sovereignty: report generated (%d chars)", len(report_md))
committed = await asyncio.to_thread(commit_report, report_md, session_id)
return committed
except Exception as exc:
logger.warning("Sovereignty: report generation failed: %s", exc)
return False
# ---------------------------------------------------------------------------
# Internal helpers
# ---------------------------------------------------------------------------
def _format_duration(seconds: float) -> str:
"""Format a duration in seconds as a human-readable string."""
total = int(seconds)
hours, remainder = divmod(total, 3600)
minutes, secs = divmod(remainder, 60)
if hours:
return f"{hours}h {minutes}m {secs}s"
if minutes:
return f"{minutes}m {secs}s"
return f"{secs}s"
def _gather_session_data() -> dict[str, Any]:
"""Pull session statistics from the session logger.
Returns a dict with:
- ``user_messages``, ``timmy_messages``, ``tool_calls``, ``errors``
- ``tool_call_breakdown``: dict[tool_name, count]
"""
default: dict[str, Any] = {
"user_messages": 0,
"timmy_messages": 0,
"tool_calls": 0,
"errors": 0,
"tool_call_breakdown": {},
}
try:
if get_session_logger is None:
return default
sl = get_session_logger()
sl.flush()
# Read today's session file directly for accurate counts
if not sl.session_file.exists():
return default
entries: list[dict] = []
with open(sl.session_file) as f:
for line in f:
line = line.strip()
if line:
try:
entries.append(json.loads(line))
except json.JSONDecodeError:
continue
tool_breakdown: dict[str, int] = {}
user_msgs = timmy_msgs = tool_calls = errors = 0
for entry in entries:
etype = entry.get("type")
if etype == "message":
if entry.get("role") == "user":
user_msgs += 1
elif entry.get("role") == "timmy":
timmy_msgs += 1
elif etype == "tool_call":
tool_calls += 1
tool_name = entry.get("tool", "unknown")
tool_breakdown[tool_name] = tool_breakdown.get(tool_name, 0) + 1
elif etype == "error":
errors += 1
return {
"user_messages": user_msgs,
"timmy_messages": timmy_msgs,
"tool_calls": tool_calls,
"errors": errors,
"tool_call_breakdown": tool_breakdown,
}
except Exception as exc:
logger.warning("Sovereignty: failed to gather session data: %s", exc)
return default
def _gather_sovereignty_data() -> dict[str, Any]:
"""Pull sovereignty metrics from the SQLite store.
Returns a dict with:
- ``metrics``: summary from ``SovereigntyMetricsStore.get_summary()``
- ``deltas``: per-metric start/end values within recent history window
- ``previous_session``: most recent prior value for each metric
"""
try:
if get_sovereignty_store is None:
return {"metrics": {}, "deltas": {}, "previous_session": {}}
store = get_sovereignty_store()
summary = store.get_summary()
deltas: dict[str, dict[str, Any]] = {}
previous_session: dict[str, float | None] = {}
for metric_type in GRADUATION_TARGETS:
history = store.get_latest(metric_type, limit=10)
if len(history) >= 2:
deltas[metric_type] = {
"start": history[-1]["value"],
"end": history[0]["value"],
}
previous_session[metric_type] = history[1]["value"]
elif len(history) == 1:
deltas[metric_type] = {"start": history[0]["value"], "end": history[0]["value"]}
previous_session[metric_type] = None
else:
deltas[metric_type] = {"start": None, "end": None}
previous_session[metric_type] = None
return {
"metrics": summary,
"deltas": deltas,
"previous_session": previous_session,
}
except Exception as exc:
logger.warning("Sovereignty: failed to gather sovereignty data: %s", exc)
return {"metrics": {}, "deltas": {}, "previous_session": {}}
def _render_markdown(
now: datetime,
session_id: str,
duration_secs: float,
session_data: dict[str, Any],
sov_data: dict[str, Any],
) -> str:
"""Assemble the full sovereignty report in markdown."""
lines: list[str] = []
# Header
lines += [
"# Sovereignty Session Report",
"",
f"**Session ID:** `{session_id}` ",
f"**Date:** {now.strftime('%Y-%m-%d')} ",
f"**Duration:** {_format_duration(duration_secs)} ",
f"**Generated:** {now.isoformat()}",
"",
"---",
"",
]
# Session activity
lines += [
"## Session Activity",
"",
"| Metric | Count |",
"|--------|-------|",
f"| User messages | {session_data['user_messages']} |",
f"| Timmy responses | {session_data['timmy_messages']} |",
f"| Tool calls | {session_data['tool_calls']} |",
f"| Errors | {session_data['errors']} |",
"",
]
tool_breakdown = session_data.get("tool_call_breakdown", {})
if tool_breakdown:
lines += ["### Model Calls by Tool", ""]
for tool_name, count in sorted(tool_breakdown.items(), key=lambda x: -x[1]):
lines.append(f"- `{tool_name}`: {count}")
lines.append("")
# Sovereignty scorecard
lines += [
"## Sovereignty Scorecard",
"",
"| Metric | Current | Target (graduation) | Phase |",
"|--------|---------|---------------------|-------|",
]
for metric_type, data in sov_data["metrics"].items():
current = data.get("current")
current_str = f"{current:.4f}" if current is not None else "N/A"
grad_target = GRADUATION_TARGETS.get(metric_type, {}).get("graduation")
grad_str = f"{grad_target:.4f}" if isinstance(grad_target, (int, float)) else "N/A"
phase = data.get("phase", "unknown")
lines.append(f"| {metric_type} | {current_str} | {grad_str} | {phase} |")
lines += ["", "### Sovereignty Delta (This Session)", ""]
for metric_type, delta_info in sov_data.get("deltas", {}).items():
start_val = delta_info.get("start")
end_val = delta_info.get("end")
if start_val is not None and end_val is not None:
diff = end_val - start_val
sign = "+" if diff >= 0 else ""
lines.append(
f"- **{metric_type}**: {start_val:.4f}{end_val:.4f} ({sign}{diff:.4f})"
)
else:
lines.append(f"- **{metric_type}**: N/A (no data recorded)")
# Cost breakdown
lines += ["", "## Cost Breakdown", ""]
api_cost_data = sov_data["metrics"].get("api_cost", {})
current_cost = api_cost_data.get("current")
if current_cost is not None:
lines.append(f"- **Total API spend (latest recorded):** ${current_cost:.4f}")
else:
lines.append("- **Total API spend:** N/A (no data recorded)")
lines.append("")
# Per-layer sovereignty
lines += [
"## Per-Layer Sovereignty",
"",
"| Layer | Sovereignty % |",
"|-------|--------------|",
"| Perception (VLM) | N/A |",
"| Decision (LLM) | N/A |",
"| Narration (TTS) | N/A |",
"",
"> Per-layer tracking requires instrumented inference calls. See #957.",
"",
]
# Skills crystallized
lines += [
"## Skills Crystallized",
"",
"_Skill crystallization tracking not yet implemented. See #957._",
"",
]
# Trend vs previous session
lines += ["## Trend vs Previous Session", ""]
prev_data = sov_data.get("previous_session", {})
has_prev = any(v is not None for v in prev_data.values())
if has_prev:
lines += [
"| Metric | Previous | Current | Change |",
"|--------|----------|---------|--------|",
]
for metric_type, curr_info in sov_data["metrics"].items():
curr_val = curr_info.get("current")
prev_val = prev_data.get(metric_type)
curr_str = f"{curr_val:.4f}" if curr_val is not None else "N/A"
prev_str = f"{prev_val:.4f}" if prev_val is not None else "N/A"
if curr_val is not None and prev_val is not None:
diff = curr_val - prev_val
sign = "+" if diff >= 0 else ""
change_str = f"{sign}{diff:.4f}"
else:
change_str = "N/A"
lines.append(f"| {metric_type} | {prev_str} | {curr_str} | {change_str} |")
lines.append("")
else:
lines += ["_No previous session data available for comparison._", ""]
# Footer
lines += [
"---",
"_Auto-generated by Timmy · Session Sovereignty Report · Refs: #957_",
]
return "\n".join(lines)

View File

@@ -0,0 +1,482 @@
"""Three-Strike Detector for Repeated Manual Work.
Tracks recurring manual actions by category and key. When the same action
is performed three or more times, it blocks further attempts and requires
an automation artifact to be registered first.
Strike 1 (count=1): discovery — action proceeds normally
Strike 2 (count=2): warning — action proceeds with a logged warning
Strike 3 (count≥3): blocked — raises ThreeStrikeError; caller must
register an automation artifact first
Governing principle: "If you do the same thing manually three times,
you have failed to crystallise."
Categories tracked:
- vlm_prompt_edit VLM prompt edits for the same UI element
- game_bug_review Manual game-bug reviews for the same bug type
- parameter_tuning Manual parameter tuning for the same parameter
- portal_adapter_creation Manual portal-adapter creation for same pattern
- deployment_step Manual deployment steps
The Falsework Checklist is enforced before cloud API calls via
:func:`falsework_check`.
Refs: #962
"""
from __future__ import annotations
import json
import logging
import sqlite3
from contextlib import closing
from dataclasses import dataclass, field
from datetime import UTC, datetime
from pathlib import Path
from typing import Any
from config import settings
logger = logging.getLogger(__name__)
# ── Constants ────────────────────────────────────────────────────────────────
DB_PATH = Path(settings.repo_root) / "data" / "three_strike.db"
CATEGORIES = frozenset(
{
"vlm_prompt_edit",
"game_bug_review",
"parameter_tuning",
"portal_adapter_creation",
"deployment_step",
}
)
STRIKE_WARNING = 2
STRIKE_BLOCK = 3
_SCHEMA = """
CREATE TABLE IF NOT EXISTS strikes (
id INTEGER PRIMARY KEY AUTOINCREMENT,
category TEXT NOT NULL,
key TEXT NOT NULL,
count INTEGER NOT NULL DEFAULT 0,
blocked INTEGER NOT NULL DEFAULT 0,
automation TEXT DEFAULT NULL,
first_seen TEXT NOT NULL,
last_seen TEXT NOT NULL
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_strikes_cat_key ON strikes(category, key);
CREATE INDEX IF NOT EXISTS idx_strikes_blocked ON strikes(blocked);
CREATE TABLE IF NOT EXISTS strike_events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
category TEXT NOT NULL,
key TEXT NOT NULL,
strike_num INTEGER NOT NULL,
metadata TEXT DEFAULT '{}',
timestamp TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_se_cat_key ON strike_events(category, key);
CREATE INDEX IF NOT EXISTS idx_se_ts ON strike_events(timestamp);
"""
# ── Exceptions ────────────────────────────────────────────────────────────────
class ThreeStrikeError(RuntimeError):
"""Raised when a manual action has reached the third strike.
Attributes:
category: The action category (e.g. ``"vlm_prompt_edit"``).
key: The specific action key (e.g. a UI element name).
count: Total number of times this action has been recorded.
"""
def __init__(self, category: str, key: str, count: int) -> None:
self.category = category
self.key = key
self.count = count
super().__init__(
f"Three-strike block: '{category}/{key}' has been performed manually "
f"{count} time(s). Register an automation artifact before continuing. "
f"Run the Falsework Checklist (see three_strike.falsework_check)."
)
# ── Data classes ──────────────────────────────────────────────────────────────
@dataclass
class StrikeRecord:
"""State for one (category, key) pair."""
category: str
key: str
count: int
blocked: bool
automation: str | None
first_seen: str
last_seen: str
@dataclass
class FalseworkChecklist:
"""Pre-cloud-API call checklist — must be completed before making
expensive external calls.
Instantiate and call :meth:`validate` to ensure all answers are provided.
"""
durable_artifact: str = ""
artifact_storage_path: str = ""
local_rule_or_cache: str = ""
will_repeat: bool | None = None
elimination_strategy: str = ""
sovereignty_delta: str = ""
# ── internal ──
_errors: list[str] = field(default_factory=list, init=False, repr=False)
def validate(self) -> list[str]:
"""Return a list of unanswered questions. Empty list → checklist passes."""
self._errors = []
if not self.durable_artifact.strip():
self._errors.append("Q1: What durable artifact will this call produce?")
if not self.artifact_storage_path.strip():
self._errors.append("Q2: Where will the artifact be stored locally?")
if not self.local_rule_or_cache.strip():
self._errors.append("Q3: What local rule or cache will this populate?")
if self.will_repeat is None:
self._errors.append("Q4: After this call, will I need to make it again?")
if self.will_repeat and not self.elimination_strategy.strip():
self._errors.append("Q5: If yes, what would eliminate the repeat?")
if not self.sovereignty_delta.strip():
self._errors.append("Q6: What is the sovereignty delta of this call?")
return self._errors
@property
def passed(self) -> bool:
"""True when :meth:`validate` found no unanswered questions."""
return len(self.validate()) == 0
# ── Store ─────────────────────────────────────────────────────────────────────
class ThreeStrikeStore:
"""SQLite-backed three-strike store.
Thread-safe: creates a new connection per operation.
"""
def __init__(self, db_path: Path | None = None) -> None:
self._db_path = db_path or DB_PATH
self._init_db()
# ── setup ─────────────────────────────────────────────────────────────
def _init_db(self) -> None:
try:
self._db_path.parent.mkdir(parents=True, exist_ok=True)
with closing(sqlite3.connect(str(self._db_path))) as conn:
conn.execute("PRAGMA journal_mode=WAL")
conn.execute(f"PRAGMA busy_timeout={settings.db_busy_timeout_ms}")
conn.executescript(_SCHEMA)
conn.commit()
except Exception as exc:
logger.warning("Failed to initialise three-strike DB: %s", exc)
def _connect(self) -> sqlite3.Connection:
conn = sqlite3.connect(str(self._db_path))
conn.row_factory = sqlite3.Row
conn.execute(f"PRAGMA busy_timeout={settings.db_busy_timeout_ms}")
return conn
# ── record ────────────────────────────────────────────────────────────
def record(
self,
category: str,
key: str,
metadata: dict[str, Any] | None = None,
) -> StrikeRecord:
"""Record a manual action and return the updated :class:`StrikeRecord`.
Raises :exc:`ThreeStrikeError` when the action is already blocked
(count ≥ STRIKE_BLOCK) and no automation has been registered.
Args:
category: Action category; must be in :data:`CATEGORIES`.
key: Specific identifier within the category.
metadata: Optional context stored alongside the event.
Returns:
The updated :class:`StrikeRecord`.
Raises:
ValueError: If *category* is not in :data:`CATEGORIES`.
ThreeStrikeError: On the third (or later) strike with no automation.
"""
if category not in CATEGORIES:
raise ValueError(f"Unknown category '{category}'. Valid: {sorted(CATEGORIES)}")
now = datetime.now(UTC).isoformat()
meta_json = json.dumps(metadata or {})
try:
with closing(self._connect()) as conn:
# Upsert the aggregate row
conn.execute(
"""
INSERT INTO strikes (category, key, count, blocked, first_seen, last_seen)
VALUES (?, ?, 1, 0, ?, ?)
ON CONFLICT(category, key) DO UPDATE SET
count = count + 1,
last_seen = excluded.last_seen
""",
(category, key, now, now),
)
row = conn.execute(
"SELECT * FROM strikes WHERE category=? AND key=?",
(category, key),
).fetchone()
count = row["count"]
blocked = bool(row["blocked"])
automation = row["automation"]
# Record the individual event
conn.execute(
"INSERT INTO strike_events (category, key, strike_num, metadata, timestamp) "
"VALUES (?, ?, ?, ?, ?)",
(category, key, count, meta_json, now),
)
# Mark as blocked once threshold reached
if count >= STRIKE_BLOCK and not blocked:
conn.execute(
"UPDATE strikes SET blocked=1 WHERE category=? AND key=?",
(category, key),
)
blocked = True
conn.commit()
except ThreeStrikeError:
raise
except Exception as exc:
logger.warning("Three-strike DB error during record: %s", exc)
# Re-raise DB errors so callers are aware
raise
record = StrikeRecord(
category=category,
key=key,
count=count,
blocked=blocked,
automation=automation,
first_seen=row["first_seen"],
last_seen=now,
)
self._emit_log(record)
if blocked and not automation:
raise ThreeStrikeError(category=category, key=key, count=count)
return record
def _emit_log(self, record: StrikeRecord) -> None:
"""Log a warning or info message based on strike number."""
if record.count == STRIKE_WARNING:
logger.warning(
"Three-strike WARNING: '%s/%s' has been performed manually %d times. "
"Consider writing an automation.",
record.category,
record.key,
record.count,
)
elif record.count >= STRIKE_BLOCK:
logger.warning(
"Three-strike BLOCK: '%s/%s' reached %d strikes — automation required.",
record.category,
record.key,
record.count,
)
else:
logger.info(
"Three-strike discovery: '%s/%s' — strike %d.",
record.category,
record.key,
record.count,
)
# ── automation registration ───────────────────────────────────────────
def register_automation(
self,
category: str,
key: str,
artifact_path: str,
) -> None:
"""Unblock a (category, key) pair by registering an automation artifact.
Once registered, future calls to :meth:`record` will proceed normally
and the strike counter resets to zero.
Args:
category: Action category.
key: Specific identifier within the category.
artifact_path: Path or identifier of the automation artifact.
"""
try:
with closing(self._connect()) as conn:
conn.execute(
"UPDATE strikes SET automation=?, blocked=0, count=0 "
"WHERE category=? AND key=?",
(artifact_path, category, key),
)
conn.commit()
logger.info(
"Three-strike: automation registered for '%s/%s'%s",
category,
key,
artifact_path,
)
except Exception as exc:
logger.warning("Failed to register automation: %s", exc)
# ── queries ───────────────────────────────────────────────────────────
def get(self, category: str, key: str) -> StrikeRecord | None:
"""Return the :class:`StrikeRecord` for (category, key), or None."""
try:
with closing(self._connect()) as conn:
row = conn.execute(
"SELECT * FROM strikes WHERE category=? AND key=?",
(category, key),
).fetchone()
if row is None:
return None
return StrikeRecord(
category=row["category"],
key=row["key"],
count=row["count"],
blocked=bool(row["blocked"]),
automation=row["automation"],
first_seen=row["first_seen"],
last_seen=row["last_seen"],
)
except Exception as exc:
logger.warning("Failed to query strike record: %s", exc)
return None
def list_blocked(self) -> list[StrikeRecord]:
"""Return all currently-blocked (category, key) pairs."""
try:
with closing(self._connect()) as conn:
rows = conn.execute(
"SELECT * FROM strikes WHERE blocked=1 ORDER BY last_seen DESC"
).fetchall()
return [
StrikeRecord(
category=r["category"],
key=r["key"],
count=r["count"],
blocked=True,
automation=r["automation"],
first_seen=r["first_seen"],
last_seen=r["last_seen"],
)
for r in rows
]
except Exception as exc:
logger.warning("Failed to query blocked strikes: %s", exc)
return []
def list_all(self) -> list[StrikeRecord]:
"""Return all strike records ordered by last seen (most recent first)."""
try:
with closing(self._connect()) as conn:
rows = conn.execute("SELECT * FROM strikes ORDER BY last_seen DESC").fetchall()
return [
StrikeRecord(
category=r["category"],
key=r["key"],
count=r["count"],
blocked=bool(r["blocked"]),
automation=r["automation"],
first_seen=r["first_seen"],
last_seen=r["last_seen"],
)
for r in rows
]
except Exception as exc:
logger.warning("Failed to list strike records: %s", exc)
return []
def get_events(self, category: str, key: str, limit: int = 50) -> list[dict]:
"""Return the individual strike events for (category, key)."""
try:
with closing(self._connect()) as conn:
rows = conn.execute(
"SELECT * FROM strike_events WHERE category=? AND key=? "
"ORDER BY timestamp DESC LIMIT ?",
(category, key, limit),
).fetchall()
return [
{
"strike_num": r["strike_num"],
"timestamp": r["timestamp"],
"metadata": json.loads(r["metadata"]) if r["metadata"] else {},
}
for r in rows
]
except Exception as exc:
logger.warning("Failed to query strike events: %s", exc)
return []
# ── Falsework checklist helper ────────────────────────────────────────────────
def falsework_check(checklist: FalseworkChecklist) -> None:
"""Enforce the Falsework Checklist before a cloud API call.
Raises :exc:`ValueError` listing all unanswered questions if the checklist
does not pass.
Usage::
checklist = FalseworkChecklist(
durable_artifact="embedding vectors for UI element foo",
artifact_storage_path="data/vlm/foo_embeddings.json",
local_rule_or_cache="vlm_cache",
will_repeat=False,
sovereignty_delta="eliminates repeated VLM call",
)
falsework_check(checklist) # raises ValueError if incomplete
"""
errors = checklist.validate()
if errors:
raise ValueError(
"Falsework Checklist incomplete — answer all questions before "
"making a cloud API call:\n" + "\n".join(f"{e}" for e in errors)
)
# ── Module-level singleton ────────────────────────────────────────────────────
_detector: ThreeStrikeStore | None = None
def get_detector() -> ThreeStrikeStore:
"""Return the module-level :class:`ThreeStrikeStore`, creating it once."""
global _detector
if _detector is None:
_detector = ThreeStrikeStore()
return _detector

View File

@@ -692,91 +692,112 @@ class ThinkingEngine:
file paths actually exist on disk, preventing phantom-bug reports.
"""
try:
interval = settings.thinking_issue_every
if interval <= 0:
recent = self._get_recent_thoughts_for_issues()
if recent is None:
return
count = self.count_thoughts()
if count == 0 or count % interval != 0:
return
# Check Gitea availability before spending LLM tokens
if not settings.gitea_enabled or not settings.gitea_token:
return
recent = self.get_recent_thoughts(limit=interval)
if len(recent) < interval:
return
thought_text = "\n".join(f"- [{t.seed_type}] {t.content}" for t in reversed(recent))
classify_prompt = (
"You are reviewing your own recent thoughts for actionable items.\n"
"Extract 0-2 items that are CONCRETE bugs, broken features, stale "
"state, or clear improvement opportunities in your own codebase.\n\n"
"Rules:\n"
"- Only include things that could become a real code fix or feature\n"
"- Skip vague reflections, philosophical musings, or repeated themes\n"
"- Category must be one of: bug, feature, suggestion, maintenance\n"
"- ONLY reference files that you are CERTAIN exist in the project\n"
"- Do NOT invent or guess file paths — if unsure, describe the "
"area of concern without naming specific files\n\n"
"For each item, write an ENGINEER-QUALITY issue:\n"
'- "title": A clear, specific title (e.g. "[Memory] MEMORY.md timestamp not updating")\n'
'- "body": A detailed body with these sections:\n'
" **What's happening:** Describe the current (broken) behavior.\n"
" **Expected behavior:** What should happen instead.\n"
" **Suggested fix:** Which file(s) to change and what the fix looks like.\n"
" **Acceptance criteria:** How to verify the fix works.\n"
'- "category": One of bug, feature, suggestion, maintenance\n\n'
"Return ONLY a JSON array of objects with keys: "
'"title", "body", "category"\n'
"Return [] if nothing is actionable.\n\n"
f"Recent thoughts:\n{thought_text}\n\nJSON array:"
)
classify_prompt = self._build_issue_classify_prompt(recent)
raw = await self._call_agent(classify_prompt)
if not raw or not raw.strip():
return
import json
# Strip markdown code fences if present
cleaned = raw.strip()
if cleaned.startswith("```"):
cleaned = cleaned.split("\n", 1)[-1].rsplit("```", 1)[0].strip()
items = json.loads(cleaned)
if not isinstance(items, list) or not items:
items = self._parse_issue_items(raw)
if items is None:
return
from timmy.mcp_tools import create_gitea_issue_via_mcp
for item in items[:2]: # Safety cap
if not isinstance(item, dict):
continue
title = item.get("title", "").strip()
body = item.get("body", "").strip()
category = item.get("category", "suggestion").strip()
if not title or len(title) < 10:
continue
# Validate all referenced file paths exist on disk
combined = f"{title}\n{body}"
if not self._references_real_files(combined):
logger.info(
"Skipped phantom issue: %s (references non-existent files)",
title[:60],
)
continue
label = category if category in ("bug", "feature") else ""
result = await create_gitea_issue_via_mcp(title=title, body=body, labels=label)
logger.info("Thought→Issue: %s%s", title[:60], result[:80])
await self._file_single_issue(item, create_gitea_issue_via_mcp)
except Exception as exc:
logger.debug("Thought issue filing skipped: %s", exc)
def _get_recent_thoughts_for_issues(self):
"""Return recent thoughts if conditions for filing issues are met, else None."""
interval = settings.thinking_issue_every
if interval <= 0:
return None
count = self.count_thoughts()
if count == 0 or count % interval != 0:
return None
if not settings.gitea_enabled or not settings.gitea_token:
return None
recent = self.get_recent_thoughts(limit=interval)
if len(recent) < interval:
return None
return recent
@staticmethod
def _build_issue_classify_prompt(recent) -> str:
"""Build the LLM prompt that extracts actionable issues from recent thoughts."""
thought_text = "\n".join(f"- [{t.seed_type}] {t.content}" for t in reversed(recent))
return (
"You are reviewing your own recent thoughts for actionable items.\n"
"Extract 0-2 items that are CONCRETE bugs, broken features, stale "
"state, or clear improvement opportunities in your own codebase.\n\n"
"Rules:\n"
"- Only include things that could become a real code fix or feature\n"
"- Skip vague reflections, philosophical musings, or repeated themes\n"
"- Category must be one of: bug, feature, suggestion, maintenance\n"
"- ONLY reference files that you are CERTAIN exist in the project\n"
"- Do NOT invent or guess file paths — if unsure, describe the "
"area of concern without naming specific files\n\n"
"For each item, write an ENGINEER-QUALITY issue:\n"
'- "title": A clear, specific title (e.g. "[Memory] MEMORY.md timestamp not updating")\n'
'- "body": A detailed body with these sections:\n'
" **What's happening:** Describe the current (broken) behavior.\n"
" **Expected behavior:** What should happen instead.\n"
" **Suggested fix:** Which file(s) to change and what the fix looks like.\n"
" **Acceptance criteria:** How to verify the fix works.\n"
'- "category": One of bug, feature, suggestion, maintenance\n\n'
"Return ONLY a JSON array of objects with keys: "
'"title", "body", "category"\n'
"Return [] if nothing is actionable.\n\n"
f"Recent thoughts:\n{thought_text}\n\nJSON array:"
)
@staticmethod
def _parse_issue_items(raw: str):
"""Strip markdown fences and parse JSON issue list; return None on failure."""
import json
if not raw or not raw.strip():
return None
cleaned = raw.strip()
if cleaned.startswith("```"):
cleaned = cleaned.split("\n", 1)[-1].rsplit("```", 1)[0].strip()
items = json.loads(cleaned)
if not isinstance(items, list) or not items:
return None
return items
async def _file_single_issue(self, item: dict, create_fn) -> None:
"""Validate one issue dict and create it via *create_fn* if it passes checks."""
if not isinstance(item, dict):
return
title = item.get("title", "").strip()
body = item.get("body", "").strip()
category = item.get("category", "suggestion").strip()
if not title or len(title) < 10:
return
combined = f"{title}\n{body}"
if not self._references_real_files(combined):
logger.info(
"Skipped phantom issue: %s (references non-existent files)",
title[:60],
)
return
label = category if category in ("bug", "feature") else ""
result = await create_fn(title=title, body=body, labels=label)
logger.info("Thought→Issue: %s%s", title[:60], result[:80])
# ── System snapshot helpers ────────────────────────────────────────────
def _snap_thought_count(self, now: datetime) -> str | None:

View File

@@ -0,0 +1,94 @@
"""Tool integration for the agent swarm.
Provides agents with capabilities for:
- File read/write (local filesystem)
- Shell command execution (sandboxed)
- Python code execution
- Git operations
- Image / Music / Video generation (creative pipeline)
Tools are assigned to agents based on their specialties.
Sub-modules:
- _base: shared types, tracking state
- file_tools: file-operation toolkit factories (Echo, Quill, Seer)
- system_tools: calculator, AI tools, code/devops toolkit factories
- _registry: full toolkit construction, agent registry, tool catalog
"""
# Re-export everything for backward compatibility — callers that do
# ``from timmy.tools import <symbol>`` continue to work unchanged.
from timmy.tools._base import (
_AGNO_TOOLS_AVAILABLE,
_TOOL_USAGE,
AgentTools,
PersonaTools,
ToolStats,
_ImportError,
_track_tool_usage,
get_tool_stats,
)
from timmy.tools._registry import (
AGENT_TOOLKITS,
PERSONA_TOOLKITS,
_create_stub_toolkit,
_merge_catalog,
create_experiment_tools,
create_full_toolkit,
get_all_available_tools,
get_tools_for_agent,
get_tools_for_persona,
)
from timmy.tools.file_tools import (
_make_smart_read_file,
create_data_tools,
create_research_tools,
create_writing_tools,
)
from timmy.tools.system_tools import (
_safe_eval,
calculator,
consult_grok,
create_aider_tool,
create_code_tools,
create_devops_tools,
create_security_tools,
web_fetch,
)
__all__ = [
# _base
"AgentTools",
"PersonaTools",
"ToolStats",
"_AGNO_TOOLS_AVAILABLE",
"_ImportError",
"_TOOL_USAGE",
"_track_tool_usage",
"get_tool_stats",
# file_tools
"_make_smart_read_file",
"create_data_tools",
"create_research_tools",
"create_writing_tools",
# system_tools
"_safe_eval",
"calculator",
"consult_grok",
"create_aider_tool",
"create_code_tools",
"create_devops_tools",
"create_security_tools",
"web_fetch",
# _registry
"AGENT_TOOLKITS",
"PERSONA_TOOLKITS",
"_create_stub_toolkit",
"_merge_catalog",
"create_experiment_tools",
"create_full_toolkit",
"get_all_available_tools",
"get_tools_for_agent",
"get_tools_for_persona",
]

90
src/timmy/tools/_base.py Normal file
View File

@@ -0,0 +1,90 @@
"""Base types, shared state, and tracking for the Timmy tool system."""
from __future__ import annotations
import logging
from dataclasses import dataclass, field
from datetime import UTC, datetime
logger = logging.getLogger(__name__)
# Lazy imports to handle test mocking
_ImportError = None
try:
from agno.tools import Toolkit # noqa: F401
from agno.tools.file import FileTools # noqa: F401
from agno.tools.python import PythonTools # noqa: F401
from agno.tools.shell import ShellTools # noqa: F401
_AGNO_TOOLS_AVAILABLE = True
except ImportError as e:
_AGNO_TOOLS_AVAILABLE = False
_ImportError = e
# Track tool usage stats
_TOOL_USAGE: dict[str, list[dict]] = {}
@dataclass
class ToolStats:
"""Statistics for a single tool."""
tool_name: str
call_count: int = 0
last_used: str | None = None
errors: int = 0
@dataclass
class AgentTools:
"""Tools assigned to an agent."""
agent_id: str
agent_name: str
toolkit: Toolkit
available_tools: list[str] = field(default_factory=list)
# Backward-compat alias
PersonaTools = AgentTools
def _track_tool_usage(agent_id: str, tool_name: str, success: bool = True) -> None:
"""Track tool usage for analytics."""
if agent_id not in _TOOL_USAGE:
_TOOL_USAGE[agent_id] = []
_TOOL_USAGE[agent_id].append(
{
"tool": tool_name,
"timestamp": datetime.now(UTC).isoformat(),
"success": success,
}
)
def get_tool_stats(agent_id: str | None = None) -> dict:
"""Get tool usage statistics.
Args:
agent_id: Optional agent ID to filter by. If None, returns stats for all agents.
Returns:
Dict with tool usage statistics.
"""
if agent_id:
usage = _TOOL_USAGE.get(agent_id, [])
return {
"agent_id": agent_id,
"total_calls": len(usage),
"tools_used": list(set(u["tool"] for u in usage)),
"recent_calls": usage[-10:] if usage else [],
}
# Return stats for all agents
all_stats = {}
for aid, usage in _TOOL_USAGE.items():
all_stats[aid] = {
"total_calls": len(usage),
"tools_used": list(set(u["tool"] for u in usage)),
}
return all_stats

View File

@@ -1,532 +1,48 @@
"""Tool integration for the agent swarm.
"""Tool registry, full toolkit construction, and tool catalog.
Provides agents with capabilities for:
- File read/write (local filesystem)
- Shell command execution (sandboxed)
- Python code execution
- Git operations
- Image / Music / Video generation (creative pipeline)
Tools are assigned to agents based on their specialties.
Provides:
- Internal _register_* helpers for wiring tools into toolkits
- create_full_toolkit (orchestrator toolkit)
- create_experiment_tools (Lab agent toolkit)
- AGENT_TOOLKITS / get_tools_for_agent registry
- get_all_available_tools catalog
"""
from __future__ import annotations
import ast
import logging
import math
from collections.abc import Callable
from dataclasses import dataclass, field
from datetime import UTC, datetime
from pathlib import Path
from config import settings
from timmy.tools._base import (
_AGNO_TOOLS_AVAILABLE,
FileTools,
PythonTools,
ShellTools,
Toolkit,
_ImportError,
)
from timmy.tools.file_tools import (
_make_smart_read_file,
create_data_tools,
create_research_tools,
create_writing_tools,
)
from timmy.tools.system_tools import (
calculator,
consult_grok,
create_code_tools,
create_devops_tools,
create_security_tools,
web_fetch,
)
logger = logging.getLogger(__name__)
# Max characters of user query included in Lightning invoice memo
_INVOICE_MEMO_MAX_LEN = 50
# Lazy imports to handle test mocking
_ImportError = None
try:
from agno.tools import Toolkit
from agno.tools.file import FileTools
from agno.tools.python import PythonTools
from agno.tools.shell import ShellTools
_AGNO_TOOLS_AVAILABLE = True
except ImportError as e:
_AGNO_TOOLS_AVAILABLE = False
_ImportError = e
# Track tool usage stats
_TOOL_USAGE: dict[str, list[dict]] = {}
@dataclass
class ToolStats:
"""Statistics for a single tool."""
tool_name: str
call_count: int = 0
last_used: str | None = None
errors: int = 0
@dataclass
class AgentTools:
"""Tools assigned to an agent."""
agent_id: str
agent_name: str
toolkit: Toolkit
available_tools: list[str] = field(default_factory=list)
# Backward-compat alias
PersonaTools = AgentTools
def _track_tool_usage(agent_id: str, tool_name: str, success: bool = True) -> None:
"""Track tool usage for analytics."""
if agent_id not in _TOOL_USAGE:
_TOOL_USAGE[agent_id] = []
_TOOL_USAGE[agent_id].append(
{
"tool": tool_name,
"timestamp": datetime.now(UTC).isoformat(),
"success": success,
}
)
def get_tool_stats(agent_id: str | None = None) -> dict:
"""Get tool usage statistics.
Args:
agent_id: Optional agent ID to filter by. If None, returns stats for all agents.
Returns:
Dict with tool usage statistics.
"""
if agent_id:
usage = _TOOL_USAGE.get(agent_id, [])
return {
"agent_id": agent_id,
"total_calls": len(usage),
"tools_used": list(set(u["tool"] for u in usage)),
"recent_calls": usage[-10:] if usage else [],
}
# Return stats for all agents
all_stats = {}
for aid, usage in _TOOL_USAGE.items():
all_stats[aid] = {
"total_calls": len(usage),
"tools_used": list(set(u["tool"] for u in usage)),
}
return all_stats
def _safe_eval(node, allowed_names: dict):
"""Walk an AST and evaluate only safe numeric operations."""
if isinstance(node, ast.Expression):
return _safe_eval(node.body, allowed_names)
if isinstance(node, ast.Constant):
if isinstance(node.value, (int, float, complex)):
return node.value
raise ValueError(f"Unsupported constant: {node.value!r}")
if isinstance(node, ast.UnaryOp):
operand = _safe_eval(node.operand, allowed_names)
if isinstance(node.op, ast.UAdd):
return +operand
if isinstance(node.op, ast.USub):
return -operand
raise ValueError(f"Unsupported unary op: {type(node.op).__name__}")
if isinstance(node, ast.BinOp):
left = _safe_eval(node.left, allowed_names)
right = _safe_eval(node.right, allowed_names)
ops = {
ast.Add: lambda a, b: a + b,
ast.Sub: lambda a, b: a - b,
ast.Mult: lambda a, b: a * b,
ast.Div: lambda a, b: a / b,
ast.FloorDiv: lambda a, b: a // b,
ast.Mod: lambda a, b: a % b,
ast.Pow: lambda a, b: a**b,
}
op_fn = ops.get(type(node.op))
if op_fn is None:
raise ValueError(f"Unsupported binary op: {type(node.op).__name__}")
return op_fn(left, right)
if isinstance(node, ast.Name):
if node.id in allowed_names:
return allowed_names[node.id]
raise ValueError(f"Unknown name: {node.id!r}")
if isinstance(node, ast.Attribute):
value = _safe_eval(node.value, allowed_names)
# Only allow attribute access on the math module
if value is math:
attr = getattr(math, node.attr, None)
if attr is not None:
return attr
raise ValueError(f"Attribute access not allowed: .{node.attr}")
if isinstance(node, ast.Call):
func = _safe_eval(node.func, allowed_names)
if not callable(func):
raise ValueError(f"Not callable: {func!r}")
args = [_safe_eval(a, allowed_names) for a in node.args]
kwargs = {kw.arg: _safe_eval(kw.value, allowed_names) for kw in node.keywords}
return func(*args, **kwargs)
raise ValueError(f"Unsupported syntax: {type(node).__name__}")
def calculator(expression: str) -> str:
"""Evaluate a mathematical expression and return the exact result.
Use this tool for ANY arithmetic: multiplication, division, square roots,
exponents, percentages, logarithms, trigonometry, etc.
Args:
expression: A valid Python math expression, e.g. '347 * 829',
'math.sqrt(17161)', '2**10', 'math.log(100, 10)'.
Returns:
The exact result as a string.
"""
allowed_names = {k: getattr(math, k) for k in dir(math) if not k.startswith("_")}
allowed_names["math"] = math
allowed_names["abs"] = abs
allowed_names["round"] = round
allowed_names["min"] = min
allowed_names["max"] = max
try:
tree = ast.parse(expression, mode="eval")
result = _safe_eval(tree, allowed_names)
return str(result)
except Exception as e: # broad catch intentional: arbitrary code execution
return f"Error evaluating '{expression}': {e}"
def _make_smart_read_file(file_tools: FileTools) -> Callable:
"""Wrap FileTools.read_file so directories auto-list their contents.
When the user (or the LLM) passes a directory path to read_file,
the raw Agno implementation throws an IsADirectoryError. This
wrapper detects that case, lists the directory entries, and returns
a helpful message so the model can pick the right file on its own.
"""
original_read = file_tools.read_file
def smart_read_file(file_name: str = "", encoding: str = "utf-8", **kwargs) -> str:
"""Reads the contents of the file `file_name` and returns the contents if successful."""
# LLMs often call read_file(path=...) instead of read_file(file_name=...)
if not file_name:
file_name = kwargs.get("path", "")
if not file_name:
return "Error: no file_name or path provided."
# Resolve the path the same way FileTools does
_safe, resolved = file_tools.check_escape(file_name)
if _safe and resolved.is_dir():
entries = sorted(p.name for p in resolved.iterdir() if not p.name.startswith("."))
listing = "\n".join(f" - {e}" for e in entries) if entries else " (empty directory)"
return (
f"'{file_name}' is a directory, not a file. "
f"Files inside:\n{listing}\n\n"
"Please call read_file with one of the files listed above."
)
return original_read(file_name, encoding=encoding)
# Preserve the original docstring for Agno tool schema generation
smart_read_file.__doc__ = original_read.__doc__
return smart_read_file
def create_research_tools(base_dir: str | Path | None = None):
"""Create tools for the research agent (Echo).
Includes: file reading
"""
if not _AGNO_TOOLS_AVAILABLE:
raise ImportError(f"Agno tools not available: {_ImportError}")
toolkit = Toolkit(name="research")
# File reading
from config import settings
base_path = Path(base_dir) if base_dir else Path(settings.repo_root)
file_tools = FileTools(base_dir=base_path)
toolkit.register(_make_smart_read_file(file_tools), name="read_file")
toolkit.register(file_tools.list_files, name="list_files")
return toolkit
def create_code_tools(base_dir: str | Path | None = None):
"""Create tools for the code agent (Forge).
Includes: shell commands, python execution, file read/write, Aider AI assist
"""
if not _AGNO_TOOLS_AVAILABLE:
raise ImportError(f"Agno tools not available: {_ImportError}")
toolkit = Toolkit(name="code")
# Shell commands (sandboxed)
shell_tools = ShellTools()
toolkit.register(shell_tools.run_shell_command, name="shell")
# Python execution
python_tools = PythonTools()
toolkit.register(python_tools.run_python_code, name="python")
# File operations
from config import settings
base_path = Path(base_dir) if base_dir else Path(settings.repo_root)
file_tools = FileTools(base_dir=base_path)
toolkit.register(_make_smart_read_file(file_tools), name="read_file")
toolkit.register(file_tools.save_file, name="write_file")
toolkit.register(file_tools.list_files, name="list_files")
# Aider AI coding assistant (local with Ollama)
aider_tool = create_aider_tool(base_path)
toolkit.register(aider_tool.run_aider, name="aider")
return toolkit
def create_aider_tool(base_path: Path):
"""Create an Aider tool for AI-assisted coding."""
import subprocess
class AiderTool:
"""Tool that calls Aider (local AI coding assistant) for code generation."""
def __init__(self, base_dir: Path):
self.base_dir = base_dir
def run_aider(self, prompt: str, model: str = "qwen3:30b") -> str:
"""Run Aider to generate code changes.
Args:
prompt: What you want Aider to do (e.g., "add a fibonacci function")
model: Ollama model to use (default: qwen3:30b)
Returns:
Aider's response with the code changes made
"""
try:
# Run aider with the prompt
result = subprocess.run(
[
"aider",
"--no-git",
"--model",
f"ollama/{model}",
"--quiet",
prompt,
],
capture_output=True,
text=True,
timeout=120,
cwd=str(self.base_dir),
)
if result.returncode == 0:
return result.stdout if result.stdout else "Code changes applied successfully"
else:
return f"Aider error: {result.stderr}"
except FileNotFoundError:
return "Error: Aider not installed. Run: pip install aider"
except subprocess.TimeoutExpired:
return "Error: Aider timed out after 120 seconds"
except (OSError, subprocess.SubprocessError) as e:
return f"Error running Aider: {str(e)}"
return AiderTool(base_path)
def create_data_tools(base_dir: str | Path | None = None):
"""Create tools for the data agent (Seer).
Includes: python execution, file reading, web search for data sources
"""
if not _AGNO_TOOLS_AVAILABLE:
raise ImportError(f"Agno tools not available: {_ImportError}")
toolkit = Toolkit(name="data")
# Python execution for analysis
python_tools = PythonTools()
toolkit.register(python_tools.run_python_code, name="python")
# File reading
from config import settings
base_path = Path(base_dir) if base_dir else Path(settings.repo_root)
file_tools = FileTools(base_dir=base_path)
toolkit.register(_make_smart_read_file(file_tools), name="read_file")
toolkit.register(file_tools.list_files, name="list_files")
return toolkit
def create_writing_tools(base_dir: str | Path | None = None):
"""Create tools for the writing agent (Quill).
Includes: file read/write
"""
if not _AGNO_TOOLS_AVAILABLE:
raise ImportError(f"Agno tools not available: {_ImportError}")
toolkit = Toolkit(name="writing")
# File operations
base_path = Path(base_dir) if base_dir else Path(settings.repo_root)
file_tools = FileTools(base_dir=base_path)
toolkit.register(_make_smart_read_file(file_tools), name="read_file")
toolkit.register(file_tools.save_file, name="write_file")
toolkit.register(file_tools.list_files, name="list_files")
return toolkit
def create_security_tools(base_dir: str | Path | None = None):
"""Create tools for the security agent (Mace).
Includes: shell commands (for scanning), file read
"""
if not _AGNO_TOOLS_AVAILABLE:
raise ImportError(f"Agno tools not available: {_ImportError}")
toolkit = Toolkit(name="security")
# Shell for running security scans
shell_tools = ShellTools()
toolkit.register(shell_tools.run_shell_command, name="shell")
# File reading for logs/configs
base_path = Path(base_dir) if base_dir else Path(settings.repo_root)
file_tools = FileTools(base_dir=base_path)
toolkit.register(_make_smart_read_file(file_tools), name="read_file")
toolkit.register(file_tools.list_files, name="list_files")
return toolkit
def create_devops_tools(base_dir: str | Path | None = None):
"""Create tools for the DevOps agent (Helm).
Includes: shell commands, file read/write
"""
if not _AGNO_TOOLS_AVAILABLE:
raise ImportError(f"Agno tools not available: {_ImportError}")
toolkit = Toolkit(name="devops")
# Shell for deployment commands
shell_tools = ShellTools()
toolkit.register(shell_tools.run_shell_command, name="shell")
# File operations for config management
base_path = Path(base_dir) if base_dir else Path(settings.repo_root)
file_tools = FileTools(base_dir=base_path)
toolkit.register(_make_smart_read_file(file_tools), name="read_file")
toolkit.register(file_tools.save_file, name="write_file")
toolkit.register(file_tools.list_files, name="list_files")
return toolkit
def consult_grok(query: str) -> str:
"""Consult Grok (xAI) for frontier reasoning on complex questions.
Use this tool when a question requires advanced reasoning, real-time
knowledge, or capabilities beyond the local model. Grok is a premium
cloud backend use sparingly and only for high-complexity queries.
Args:
query: The question or reasoning task to send to Grok.
Returns:
Grok's response text, or an error/status message.
"""
from config import settings
from timmy.backends import get_grok_backend, grok_available
if not grok_available():
return (
"Grok is not available. Enable with GROK_ENABLED=true "
"and set XAI_API_KEY in your .env file."
)
backend = get_grok_backend()
# Log to Spark if available
try:
from spark.engine import spark_engine
spark_engine.on_tool_executed(
agent_id="default",
tool_name="consult_grok",
success=True,
)
except (ImportError, AttributeError) as exc:
logger.warning("Tool execution failed (consult_grok logging): %s", exc)
# Generate Lightning invoice for monetization (unless free mode)
invoice_info = ""
if not settings.grok_free:
try:
from lightning.factory import get_backend as get_ln_backend
ln = get_ln_backend()
sats = min(settings.grok_max_sats_per_query, settings.grok_sats_hard_cap)
inv = ln.create_invoice(sats, f"Grok query: {query[:_INVOICE_MEMO_MAX_LEN]}")
invoice_info = f"\n[Lightning invoice: {sats} sats — {inv.payment_request[:40]}...]"
except (ImportError, OSError, ValueError) as exc:
logger.error("Lightning invoice creation failed: %s", exc)
return "Error: Failed to create Lightning invoice. Please check logs."
result = backend.run(query)
response = result.content
if invoice_info:
response += invoice_info
return response
def web_fetch(url: str, max_tokens: int = 4000) -> str:
"""Fetch a web page and return its main text content.
Downloads the URL, extracts readable text using trafilatura, and
truncates to a token budget. Use this to read full articles, docs,
or blog posts that web_search only returns snippets for.
Args:
url: The URL to fetch (must start with http:// or https://).
max_tokens: Maximum approximate token budget (default 4000).
Text is truncated to max_tokens * 4 characters.
Returns:
Extracted text content, or an error message on failure.
"""
if not url or not url.startswith(("http://", "https://")):
return f"Error: invalid URL — must start with http:// or https://: {url!r}"
try:
import requests as _requests
except ImportError:
return "Error: 'requests' package is not installed. Install with: pip install requests"
try:
import trafilatura
except ImportError:
return (
"Error: 'trafilatura' package is not installed. Install with: pip install trafilatura"
)
try:
resp = _requests.get(
url,
timeout=15,
headers={"User-Agent": "TimmyResearchBot/1.0"},
)
resp.raise_for_status()
except _requests.exceptions.Timeout:
return f"Error: request timed out after 15 seconds for {url}"
except _requests.exceptions.HTTPError as exc:
return f"Error: HTTP {exc.response.status_code} for {url}"
except _requests.exceptions.RequestException as exc:
return f"Error: failed to fetch {url}{exc}"
text = trafilatura.extract(resp.text, include_tables=True, include_links=True)
if not text:
return f"Error: could not extract readable content from {url}"
char_budget = max_tokens * 4
if len(text) > char_budget:
text = text[:char_budget] + f"\n\n[…truncated to ~{max_tokens} tokens]"
return text
# ---------------------------------------------------------------------------
# Internal _register_* helpers
# ---------------------------------------------------------------------------
def _register_web_fetch_tool(toolkit: Toolkit) -> None:
@@ -574,10 +90,10 @@ def _register_grok_tool(toolkit: Toolkit) -> None:
def _register_memory_tools(toolkit: Toolkit) -> None:
"""Register memory search, write, and forget tools."""
try:
from timmy.memory_system import memory_forget, memory_read, memory_search, memory_write
from timmy.memory_system import memory_forget, memory_read, memory_search, memory_store
toolkit.register(memory_search, name="memory_search")
toolkit.register(memory_write, name="memory_write")
toolkit.register(memory_store, name="memory_write")
toolkit.register(memory_read, name="memory_read")
toolkit.register(memory_forget, name="memory_forget")
except (ImportError, AttributeError) as exc:
@@ -717,6 +233,11 @@ def _register_thinking_tools(toolkit: Toolkit) -> None:
raise
# ---------------------------------------------------------------------------
# Full toolkit factories
# ---------------------------------------------------------------------------
def create_full_toolkit(base_dir: str | Path | None = None):
"""Create a full toolkit with all available tools (for the orchestrator).
@@ -727,6 +248,7 @@ def create_full_toolkit(base_dir: str | Path | None = None):
# Return None when tools aren't available (tests)
return None
from config import settings
from timmy.tool_safety import DANGEROUS_TOOLS
toolkit = Toolkit(name="full")
@@ -808,19 +330,9 @@ def create_experiment_tools(base_dir: str | Path | None = None):
return toolkit
# Mapping of agent IDs to their toolkits
AGENT_TOOLKITS: dict[str, Callable[[], Toolkit]] = {
"echo": create_research_tools,
"mace": create_security_tools,
"helm": create_devops_tools,
"seer": create_data_tools,
"forge": create_code_tools,
"quill": create_writing_tools,
"lab": create_experiment_tools,
"pixel": lambda base_dir=None: _create_stub_toolkit("pixel"),
"lyra": lambda base_dir=None: _create_stub_toolkit("lyra"),
"reel": lambda base_dir=None: _create_stub_toolkit("reel"),
}
# ---------------------------------------------------------------------------
# Agent toolkit registry
# ---------------------------------------------------------------------------
def _create_stub_toolkit(name: str):
@@ -836,6 +348,21 @@ def _create_stub_toolkit(name: str):
return toolkit
# Mapping of agent IDs to their toolkits
AGENT_TOOLKITS: dict[str, Callable[[], Toolkit]] = {
"echo": create_research_tools,
"mace": create_security_tools,
"helm": create_devops_tools,
"seer": create_data_tools,
"forge": create_code_tools,
"quill": create_writing_tools,
"lab": create_experiment_tools,
"pixel": lambda base_dir=None: _create_stub_toolkit("pixel"),
"lyra": lambda base_dir=None: _create_stub_toolkit("lyra"),
"reel": lambda base_dir=None: _create_stub_toolkit("reel"),
}
def get_tools_for_agent(agent_id: str, base_dir: str | Path | None = None) -> Toolkit | None:
"""Get the appropriate toolkit for an agent.
@@ -852,11 +379,16 @@ def get_tools_for_agent(agent_id: str, base_dir: str | Path | None = None) -> To
return None
# Backward-compat alias
# Backward-compat aliases
get_tools_for_persona = get_tools_for_agent
PERSONA_TOOLKITS = AGENT_TOOLKITS
# ---------------------------------------------------------------------------
# Tool catalog
# ---------------------------------------------------------------------------
def _core_tool_catalog() -> dict:
"""Return core file and execution tools catalog entries."""
return {

View File

@@ -0,0 +1,121 @@
"""File operation tools and agent toolkit factories for file-heavy agents.
Provides:
- Smart read_file wrapper (auto-lists directories)
- Toolkit factories for Echo (research), Quill (writing), Seer (data)
"""
from __future__ import annotations
import logging
from collections.abc import Callable
from pathlib import Path
from timmy.tools._base import (
_AGNO_TOOLS_AVAILABLE,
FileTools,
PythonTools,
Toolkit,
_ImportError,
)
logger = logging.getLogger(__name__)
def _make_smart_read_file(file_tools: FileTools) -> Callable:
"""Wrap FileTools.read_file so directories auto-list their contents.
When the user (or the LLM) passes a directory path to read_file,
the raw Agno implementation throws an IsADirectoryError. This
wrapper detects that case, lists the directory entries, and returns
a helpful message so the model can pick the right file on its own.
"""
original_read = file_tools.read_file
def smart_read_file(file_name: str = "", encoding: str = "utf-8", **kwargs) -> str:
"""Reads the contents of the file `file_name` and returns the contents if successful."""
# LLMs often call read_file(path=...) instead of read_file(file_name=...)
if not file_name:
file_name = kwargs.get("path", "")
if not file_name:
return "Error: no file_name or path provided."
# Resolve the path the same way FileTools does
_safe, resolved = file_tools.check_escape(file_name)
if _safe and resolved.is_dir():
entries = sorted(p.name for p in resolved.iterdir() if not p.name.startswith("."))
listing = "\n".join(f" - {e}" for e in entries) if entries else " (empty directory)"
return (
f"'{file_name}' is a directory, not a file. "
f"Files inside:\n{listing}\n\n"
"Please call read_file with one of the files listed above."
)
return original_read(file_name, encoding=encoding)
# Preserve the original docstring for Agno tool schema generation
smart_read_file.__doc__ = original_read.__doc__
return smart_read_file
def create_research_tools(base_dir: str | Path | None = None):
"""Create tools for the research agent (Echo).
Includes: file reading
"""
if not _AGNO_TOOLS_AVAILABLE:
raise ImportError(f"Agno tools not available: {_ImportError}")
toolkit = Toolkit(name="research")
# File reading
from config import settings
base_path = Path(base_dir) if base_dir else Path(settings.repo_root)
file_tools = FileTools(base_dir=base_path)
toolkit.register(_make_smart_read_file(file_tools), name="read_file")
toolkit.register(file_tools.list_files, name="list_files")
return toolkit
def create_writing_tools(base_dir: str | Path | None = None):
"""Create tools for the writing agent (Quill).
Includes: file read/write
"""
if not _AGNO_TOOLS_AVAILABLE:
raise ImportError(f"Agno tools not available: {_ImportError}")
toolkit = Toolkit(name="writing")
# File operations
from config import settings
base_path = Path(base_dir) if base_dir else Path(settings.repo_root)
file_tools = FileTools(base_dir=base_path)
toolkit.register(_make_smart_read_file(file_tools), name="read_file")
toolkit.register(file_tools.save_file, name="write_file")
toolkit.register(file_tools.list_files, name="list_files")
return toolkit
def create_data_tools(base_dir: str | Path | None = None):
"""Create tools for the data agent (Seer).
Includes: python execution, file reading, web search for data sources
"""
if not _AGNO_TOOLS_AVAILABLE:
raise ImportError(f"Agno tools not available: {_ImportError}")
toolkit = Toolkit(name="data")
# Python execution for analysis
python_tools = PythonTools()
toolkit.register(python_tools.run_python_code, name="python")
# File reading
from config import settings
base_path = Path(base_dir) if base_dir else Path(settings.repo_root)
file_tools = FileTools(base_dir=base_path)
toolkit.register(_make_smart_read_file(file_tools), name="read_file")
toolkit.register(file_tools.list_files, name="list_files")
return toolkit

Some files were not shown because too many files have changed in this diff Show More