Compare commits
1 Commits
claude/iss
...
gemini/iss
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
4e4206a91e |
@@ -27,12 +27,8 @@
|
||||
|
||||
# ── AirLLM / big-brain backend ───────────────────────────────────────────────
|
||||
# Inference backend: "ollama" (default) | "airllm" | "auto"
|
||||
# "ollama" → always use Ollama (safe everywhere, any OS)
|
||||
# "airllm" → AirLLM layer-by-layer loading (Apple Silicon M1/M2/M3/M4 only)
|
||||
# Requires 16 GB RAM minimum (32 GB recommended).
|
||||
# Automatically falls back to Ollama on Intel Mac or Linux.
|
||||
# Install extra: pip install "airllm[mlx]"
|
||||
# "auto" → use AirLLM on Apple Silicon if installed, otherwise Ollama
|
||||
# "auto" → uses AirLLM on Apple Silicon if installed, otherwise Ollama.
|
||||
# Requires: pip install ".[bigbrain]"
|
||||
# TIMMY_MODEL_BACKEND=ollama
|
||||
|
||||
# AirLLM model size (default: 70b).
|
||||
|
||||
@@ -62,9 +62,6 @@ Per AGENTS.md roster:
|
||||
- Run `tox -e pre-push` (lint + full CI suite)
|
||||
- Ensure tests stay green
|
||||
- Update TODO.md
|
||||
- **CRITICAL: Stage files before committing** — always run `git add .` or `git add <files>` first
|
||||
- Verify staged changes are non-empty: `git diff --cached --stat` must show files
|
||||
- **NEVER run `git commit` without staging files first** — empty commits waste review cycles
|
||||
|
||||
---
|
||||
|
||||
|
||||
102
AGENTS.md
102
AGENTS.md
@@ -34,44 +34,6 @@ Read [`CLAUDE.md`](CLAUDE.md) for architecture patterns and conventions.
|
||||
|
||||
---
|
||||
|
||||
## One-Agent-Per-Issue Convention
|
||||
|
||||
**An issue must only be worked by one agent at a time.** Duplicate branches from
|
||||
multiple agents on the same issue cause merge conflicts, redundant code, and wasted compute.
|
||||
|
||||
### Labels
|
||||
|
||||
When an agent picks up an issue, add the corresponding label:
|
||||
|
||||
| Label | Meaning |
|
||||
|-------|---------|
|
||||
| `assigned-claude` | Claude is actively working this issue |
|
||||
| `assigned-gemini` | Gemini is actively working this issue |
|
||||
| `assigned-kimi` | Kimi is actively working this issue |
|
||||
| `assigned-manus` | Manus is actively working this issue |
|
||||
|
||||
### Rules
|
||||
|
||||
1. **Before starting an issue**, check that none of the `assigned-*` labels are present.
|
||||
If one is, skip the issue — another agent owns it.
|
||||
2. **When you start**, add the label matching your agent (e.g. `assigned-claude`).
|
||||
3. **When your PR is merged or closed**, remove the label (or it auto-clears when
|
||||
the branch is deleted — see Auto-Delete below).
|
||||
4. **Never assign the same issue to two agents simultaneously.**
|
||||
|
||||
### Auto-Delete Merged Branches
|
||||
|
||||
`default_delete_branch_after_merge` is **enabled** on this repo. Branches are
|
||||
automatically deleted after a PR merges — no manual cleanup needed and no stale
|
||||
`claude/*`, `gemini/*`, or `kimi/*` branches accumulate.
|
||||
|
||||
If you discover stale merged branches, they can be pruned with:
|
||||
```bash
|
||||
git fetch --prune
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Merge Policy (PR-Only)
|
||||
|
||||
**Gitea branch protection is active on `main`.** This is not a suggestion.
|
||||
@@ -169,28 +131,6 @@ self-testing, reflection — use every tool he has.
|
||||
|
||||
## Agent Roster
|
||||
|
||||
### Gitea Permissions
|
||||
|
||||
All agents that push branches and create PRs require **write** permission on the
|
||||
repository. Set via the Gitea admin API or UI under Repository → Settings → Collaborators.
|
||||
|
||||
| Agent user | Required permission | Gitea login |
|
||||
|------------|--------------------|----|
|
||||
| kimi | write | `kimi` |
|
||||
| claude | write | `claude` |
|
||||
| gemini | write | `gemini` |
|
||||
| antigravity | write | `antigravity` |
|
||||
| hermes | write | `hermes` |
|
||||
| manus | write | `manus` |
|
||||
|
||||
To grant write access (requires Gitea admin or repo admin token):
|
||||
```bash
|
||||
curl -s -X PUT "http://143.198.27.163:3000/api/v1/repos/rockachopa/Timmy-time-dashboard/collaborators/<username>" \
|
||||
-H "Authorization: token <admin-token>" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"permission": "write"}'
|
||||
```
|
||||
|
||||
### Build Tier
|
||||
|
||||
**Local (Ollama)** — Primary workhorse. Free. Unrestricted.
|
||||
@@ -247,48 +187,6 @@ make docker-agent # add a worker
|
||||
|
||||
---
|
||||
|
||||
## Search Capability (SearXNG + Crawl4AI)
|
||||
|
||||
Timmy has a self-hosted search backend requiring **no paid API key**.
|
||||
|
||||
### Tools
|
||||
|
||||
| Tool | Module | Description |
|
||||
|------|--------|-------------|
|
||||
| `web_search(query)` | `timmy/tools/search.py` | Meta-search via SearXNG — returns ranked results |
|
||||
| `scrape_url(url)` | `timmy/tools/search.py` | Full-page scrape via Crawl4AI → clean markdown |
|
||||
|
||||
Both tools are registered in the **orchestrator** (full) and **echo** (research) toolkits.
|
||||
|
||||
### Configuration
|
||||
|
||||
| Env Var | Default | Description |
|
||||
|---------|---------|-------------|
|
||||
| `TIMMY_SEARCH_BACKEND` | `searxng` | `searxng` or `none` (disable) |
|
||||
| `TIMMY_SEARCH_URL` | `http://localhost:8888` | SearXNG base URL |
|
||||
| `TIMMY_CRAWL_URL` | `http://localhost:11235` | Crawl4AI base URL |
|
||||
|
||||
Inside Docker Compose (when `--profile search` is active), the dashboard
|
||||
uses `http://searxng:8080` and `http://crawl4ai:11235` by default.
|
||||
|
||||
### Starting the services
|
||||
|
||||
```bash
|
||||
# Start SearXNG + Crawl4AI alongside the dashboard:
|
||||
docker compose --profile search up
|
||||
|
||||
# Or start only the search services:
|
||||
docker compose --profile search up searxng crawl4ai
|
||||
```
|
||||
|
||||
### Graceful degradation
|
||||
|
||||
- If `TIMMY_SEARCH_BACKEND=none`: tools return a "disabled" message.
|
||||
- If SearXNG or Crawl4AI is unreachable: tools log a WARNING and return an
|
||||
error string — the app never crashes.
|
||||
|
||||
---
|
||||
|
||||
## Roadmap
|
||||
|
||||
**v2.0 Exodus (in progress):** Voice + Marketplace + Integrations
|
||||
|
||||
15
README.md
15
README.md
@@ -9,21 +9,6 @@ API access with Bitcoin Lightning — all from a browser, no cloud AI required.
|
||||
|
||||
---
|
||||
|
||||
## System Requirements
|
||||
|
||||
| Path | Hardware | RAM | Disk |
|
||||
|------|----------|-----|------|
|
||||
| **Ollama** (default) | Any OS — x86-64 or ARM | 8 GB min | 5–10 GB (model files) |
|
||||
| **AirLLM** (Apple Silicon) | M1, M2, M3, or M4 Mac | 16 GB min (32 GB recommended) | ~15 GB free |
|
||||
|
||||
**Ollama path** runs on any modern machine — macOS, Linux, or Windows. No GPU required.
|
||||
|
||||
**AirLLM path** uses layer-by-layer loading for 70B+ models without a GPU. Requires Apple
|
||||
Silicon and the `bigbrain` extras (`pip install ".[bigbrain]"`). On Intel Mac or Linux the
|
||||
app automatically falls back to Ollama — no crash, no config change needed.
|
||||
|
||||
---
|
||||
|
||||
## Quick Start
|
||||
|
||||
```bash
|
||||
|
||||
122
SOVEREIGNTY.md
122
SOVEREIGNTY.md
@@ -1,122 +0,0 @@
|
||||
# SOVEREIGNTY.md — Research Sovereignty Manifest
|
||||
|
||||
> "If this spec is implemented correctly, it is the last research document
|
||||
> Alexander should need to request from a corporate AI."
|
||||
> — Issue #972, March 22 2026
|
||||
|
||||
---
|
||||
|
||||
## What This Is
|
||||
|
||||
A machine-readable declaration of Timmy's research independence:
|
||||
where we are, where we're going, and how to measure progress.
|
||||
|
||||
---
|
||||
|
||||
## The Problem We're Solving
|
||||
|
||||
On March 22, 2026, a single Claude session produced six deep research reports.
|
||||
It consumed ~3 hours of human time and substantial corporate AI inference.
|
||||
Every report was valuable — but the workflow was **linear**.
|
||||
It would cost exactly the same to reproduce tomorrow.
|
||||
|
||||
This file tracks the pipeline that crystallizes that workflow into something
|
||||
Timmy can run autonomously.
|
||||
|
||||
---
|
||||
|
||||
## The Six-Step Pipeline
|
||||
|
||||
| Step | What Happens | Status |
|
||||
|------|-------------|--------|
|
||||
| 1. Scope | Human describes knowledge gap → Gitea issue with template | ✅ Done (`skills/research/`) |
|
||||
| 2. Query | LLM slot-fills template → 5–15 targeted queries | ✅ Done (`research.py`) |
|
||||
| 3. Search | Execute queries → top result URLs | ✅ Done (`research_tools.py`) |
|
||||
| 4. Fetch | Download + extract full pages (trafilatura) | ✅ Done (`tools/system_tools.py`) |
|
||||
| 5. Synthesize | Compress findings → structured report | ✅ Done (`research.py` cascade) |
|
||||
| 6. Deliver | Store to semantic memory + optional disk persist | ✅ Done (`research.py`) |
|
||||
|
||||
---
|
||||
|
||||
## Cascade Tiers (Synthesis Quality vs. Cost)
|
||||
|
||||
| Tier | Model | Cost | Quality | Status |
|
||||
|------|-------|------|---------|--------|
|
||||
| **4** | SQLite semantic cache | $0.00 / instant | reuses prior | ✅ Active |
|
||||
| **3** | Ollama `qwen3:14b` | $0.00 / local | ★★★ | ✅ Active |
|
||||
| **2** | Claude API (haiku) | ~$0.01/report | ★★★★ | ✅ Active (opt-in) |
|
||||
| **1** | Groq `llama-3.3-70b` | $0.00 / rate-limited | ★★★★ | 🔲 Planned (#980) |
|
||||
|
||||
Set `ANTHROPIC_API_KEY` to enable Tier 2 fallback.
|
||||
|
||||
---
|
||||
|
||||
## Research Templates
|
||||
|
||||
Six prompt templates live in `skills/research/`:
|
||||
|
||||
| Template | Use Case |
|
||||
|----------|----------|
|
||||
| `tool_evaluation.md` | Find all shipping tools for `{domain}` |
|
||||
| `architecture_spike.md` | How to connect `{system_a}` to `{system_b}` |
|
||||
| `game_analysis.md` | Evaluate `{game}` for AI agent play |
|
||||
| `integration_guide.md` | Wire `{tool}` into `{stack}` with code |
|
||||
| `state_of_art.md` | What exists in `{field}` as of `{date}` |
|
||||
| `competitive_scan.md` | How does `{project}` compare to `{alternatives}` |
|
||||
|
||||
---
|
||||
|
||||
## Sovereignty Metrics
|
||||
|
||||
| Metric | Target (Week 1) | Target (Month 1) | Target (Month 3) | Graduation |
|
||||
|--------|-----------------|------------------|------------------|------------|
|
||||
| Queries answered locally | 10% | 40% | 80% | >90% |
|
||||
| API cost per report | <$1.50 | <$0.50 | <$0.10 | <$0.01 |
|
||||
| Time from question to report | <3 hours | <30 min | <5 min | <1 min |
|
||||
| Human involvement | 100% (review) | Review only | Approve only | None |
|
||||
|
||||
---
|
||||
|
||||
## How to Use the Pipeline
|
||||
|
||||
```python
|
||||
from timmy.research import run_research
|
||||
|
||||
# Quick research (no template)
|
||||
result = await run_research("best local embedding models for 36GB RAM")
|
||||
|
||||
# With a template and slot values
|
||||
result = await run_research(
|
||||
topic="PDF text extraction libraries for Python",
|
||||
template="tool_evaluation",
|
||||
slots={"domain": "PDF parsing", "use_case": "RAG pipeline", "focus_criteria": "accuracy"},
|
||||
save_to_disk=True,
|
||||
)
|
||||
|
||||
print(result.report)
|
||||
print(f"Backend: {result.synthesis_backend}, Cached: {result.cached}")
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Implementation Status
|
||||
|
||||
| Component | Issue | Status |
|
||||
|-----------|-------|--------|
|
||||
| `web_fetch` tool (trafilatura) | #973 | ✅ Done |
|
||||
| Research template library (6 templates) | #974 | ✅ Done |
|
||||
| `ResearchOrchestrator` (`research.py`) | #975 | ✅ Done |
|
||||
| Semantic index for outputs | #976 | 🔲 Planned |
|
||||
| Auto-create Gitea issues from findings | #977 | 🔲 Planned |
|
||||
| Paperclip task runner integration | #978 | 🔲 Planned |
|
||||
| Kimi delegation via labels | #979 | 🔲 Planned |
|
||||
| Groq free-tier cascade tier | #980 | 🔲 Planned |
|
||||
| Sovereignty metrics dashboard | #981 | 🔲 Planned |
|
||||
|
||||
---
|
||||
|
||||
## Governing Spec
|
||||
|
||||
See [issue #972](http://143.198.27.163:3000/Rockachopa/Timmy-time-dashboard/issues/972) for the full spec and rationale.
|
||||
|
||||
Research artifacts committed to `docs/research/`.
|
||||
@@ -25,19 +25,6 @@ providers:
|
||||
tier: local
|
||||
url: "http://localhost:11434"
|
||||
models:
|
||||
# ── Dual-model routing: Qwen3-8B (fast) + Qwen3-14B (quality) ──────────
|
||||
# Both models fit simultaneously: ~6.6 GB + ~10.5 GB = ~17 GB combined.
|
||||
# Requires OLLAMA_MAX_LOADED_MODELS=2 (set in .env) to stay hot.
|
||||
# Ref: issue #1065 — Qwen3-8B/14B dual-model routing strategy
|
||||
- name: qwen3:8b
|
||||
context_window: 32768
|
||||
capabilities: [text, tools, json, streaming, routine]
|
||||
description: "Qwen3-8B Q6_K — fast router for routine tasks (~6.6 GB, 45-55 tok/s)"
|
||||
- name: qwen3:14b
|
||||
context_window: 40960
|
||||
capabilities: [text, tools, json, streaming, complex, reasoning]
|
||||
description: "Qwen3-14B Q5_K_M — complex reasoning and planning (~10.5 GB, 20-28 tok/s)"
|
||||
|
||||
# Text + Tools models
|
||||
- name: qwen3:30b
|
||||
default: true
|
||||
@@ -200,20 +187,6 @@ fallback_chains:
|
||||
- dolphin3 # base Dolphin 3.0 8B (uncensored, no custom system prompt)
|
||||
- qwen3:30b # primary fallback — usually sufficient with a good system prompt
|
||||
|
||||
# ── Complexity-based routing chains (issue #1065) ───────────────────────
|
||||
# Routine tasks: prefer Qwen3-8B for low latency (~45-55 tok/s)
|
||||
routine:
|
||||
- qwen3:8b # Primary fast model
|
||||
- llama3.1:8b-instruct # Fallback fast model
|
||||
- llama3.2:3b # Smallest available
|
||||
|
||||
# Complex tasks: prefer Qwen3-14B for quality (~20-28 tok/s)
|
||||
complex:
|
||||
- qwen3:14b # Primary quality model
|
||||
- hermes4-14b # Native tool calling, hybrid reasoning
|
||||
- qwen3:30b # Highest local quality
|
||||
- qwen2.5:14b # Additional fallback
|
||||
|
||||
# ── Custom Models ───────────────────────────────────────────────────────────
|
||||
# Register custom model weights for per-agent assignment.
|
||||
# Supports GGUF (Ollama), safetensors, and HuggingFace checkpoint dirs.
|
||||
|
||||
@@ -42,10 +42,6 @@ services:
|
||||
GROK_ENABLED: "${GROK_ENABLED:-false}"
|
||||
XAI_API_KEY: "${XAI_API_KEY:-}"
|
||||
GROK_DEFAULT_MODEL: "${GROK_DEFAULT_MODEL:-grok-3-fast}"
|
||||
# Search backend (SearXNG + Crawl4AI) — set TIMMY_SEARCH_BACKEND=none to disable
|
||||
TIMMY_SEARCH_BACKEND: "${TIMMY_SEARCH_BACKEND:-searxng}"
|
||||
TIMMY_SEARCH_URL: "${TIMMY_SEARCH_URL:-http://searxng:8080}"
|
||||
TIMMY_CRAWL_URL: "${TIMMY_CRAWL_URL:-http://crawl4ai:11235}"
|
||||
extra_hosts:
|
||||
- "host.docker.internal:host-gateway" # Linux: maps to host IP
|
||||
networks:
|
||||
@@ -78,50 +74,6 @@ services:
|
||||
profiles:
|
||||
- celery
|
||||
|
||||
# ── SearXNG — self-hosted meta-search engine ─────────────────────────
|
||||
searxng:
|
||||
image: searxng/searxng:latest
|
||||
container_name: timmy-searxng
|
||||
profiles:
|
||||
- search
|
||||
ports:
|
||||
- "${SEARXNG_PORT:-8888}:8080"
|
||||
environment:
|
||||
SEARXNG_BASE_URL: "${SEARXNG_BASE_URL:-http://localhost:8888}"
|
||||
volumes:
|
||||
- ./docker/searxng:/etc/searxng:rw
|
||||
networks:
|
||||
- timmy-net
|
||||
restart: unless-stopped
|
||||
healthcheck:
|
||||
test: ["CMD", "wget", "-qO-", "http://localhost:8080/healthz"]
|
||||
interval: 30s
|
||||
timeout: 5s
|
||||
retries: 3
|
||||
start_period: 20s
|
||||
|
||||
# ── Crawl4AI — self-hosted web scraper ────────────────────────────────
|
||||
crawl4ai:
|
||||
image: unclecode/crawl4ai:latest
|
||||
container_name: timmy-crawl4ai
|
||||
profiles:
|
||||
- search
|
||||
ports:
|
||||
- "${CRAWL4AI_PORT:-11235}:11235"
|
||||
environment:
|
||||
CRAWL4AI_API_TOKEN: "${CRAWL4AI_API_TOKEN:-}"
|
||||
volumes:
|
||||
- timmy-data:/app/data
|
||||
networks:
|
||||
- timmy-net
|
||||
restart: unless-stopped
|
||||
healthcheck:
|
||||
test: ["CMD", "curl", "-f", "http://localhost:11235/health"]
|
||||
interval: 30s
|
||||
timeout: 10s
|
||||
retries: 3
|
||||
start_period: 30s
|
||||
|
||||
# ── OpenFang — vendored agent runtime sidecar ────────────────────────────
|
||||
openfang:
|
||||
build:
|
||||
|
||||
@@ -1,67 +0,0 @@
|
||||
# SearXNG configuration for Timmy Time self-hosted search
|
||||
# https://docs.searxng.org/admin/settings/settings.html
|
||||
|
||||
general:
|
||||
debug: false
|
||||
instance_name: "Timmy Search"
|
||||
privacypolicy_url: false
|
||||
donation_url: false
|
||||
contact_url: false
|
||||
enable_metrics: false
|
||||
|
||||
server:
|
||||
port: 8080
|
||||
bind_address: "0.0.0.0"
|
||||
secret_key: "timmy-searxng-key-change-in-production"
|
||||
base_url: false
|
||||
image_proxy: false
|
||||
|
||||
ui:
|
||||
static_use_hash: false
|
||||
default_locale: ""
|
||||
query_in_title: false
|
||||
infinite_scroll: false
|
||||
default_theme: simple
|
||||
center_alignment: false
|
||||
|
||||
search:
|
||||
safe_search: 0
|
||||
autocomplete: ""
|
||||
default_lang: "en"
|
||||
formats:
|
||||
- html
|
||||
- json
|
||||
|
||||
outgoing:
|
||||
request_timeout: 6.0
|
||||
max_request_timeout: 10.0
|
||||
useragent_suffix: "TimmyResearchBot"
|
||||
pool_connections: 100
|
||||
pool_maxsize: 20
|
||||
|
||||
enabled_plugins:
|
||||
- Hash_plugin
|
||||
- Search_on_category_select
|
||||
- Tracker_url_remover
|
||||
|
||||
engines:
|
||||
- name: google
|
||||
engine: google
|
||||
shortcut: g
|
||||
categories: general
|
||||
|
||||
- name: bing
|
||||
engine: bing
|
||||
shortcut: b
|
||||
categories: general
|
||||
|
||||
- name: duckduckgo
|
||||
engine: duckduckgo
|
||||
shortcut: d
|
||||
categories: general
|
||||
|
||||
- name: wikipedia
|
||||
engine: wikipedia
|
||||
shortcut: wp
|
||||
categories: general
|
||||
timeout: 3.0
|
||||
@@ -1,244 +0,0 @@
|
||||
# Gitea Activity & Branch Audit — 2026-03-23
|
||||
|
||||
**Requested by:** Issue #1210
|
||||
**Audited by:** Claude (Sonnet 4.6)
|
||||
**Date:** 2026-03-23
|
||||
**Scope:** All repos under the sovereign AI stack
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
- **18 repos audited** across 9 Gitea organizations/users
|
||||
- **~65–70 branches identified** as safe to delete (merged or abandoned)
|
||||
- **4 open PRs** are bottlenecks awaiting review
|
||||
- **3+ instances of duplicate work** across repos and agents
|
||||
- **5+ branches** contain valuable unmerged code with no open PR
|
||||
- **5 PRs closed without merge** on active p0-critical issues in Timmy-time-dashboard
|
||||
|
||||
Improvement tickets have been filed on each affected repo following this report.
|
||||
|
||||
---
|
||||
|
||||
## Repo-by-Repo Findings
|
||||
|
||||
---
|
||||
|
||||
### 1. rockachopa/Timmy-time-dashboard
|
||||
|
||||
**Status:** Most active repo. 1,200+ PRs, 50+ branches.
|
||||
|
||||
#### Dead/Abandoned Branches
|
||||
| Branch | Last Commit | Status |
|
||||
|--------|-------------|--------|
|
||||
| `feature/voice-customization` | 2026-03-22 | Gemini-created, no PR, abandoned |
|
||||
| `feature/enhanced-memory-ui` | 2026-03-22 | Gemini-created, no PR, abandoned |
|
||||
| `feature/soul-customization` | 2026-03-22 | Gemini-created, no PR, abandoned |
|
||||
| `feature/dreaming-mode` | 2026-03-22 | Gemini-created, no PR, abandoned |
|
||||
| `feature/memory-visualization` | 2026-03-22 | Gemini-created, no PR, abandoned |
|
||||
| `feature/voice-customization-ui` | 2026-03-22 | Gemini-created, no PR, abandoned |
|
||||
| `feature/issue-1015` | 2026-03-22 | Gemini-created, no PR, abandoned |
|
||||
| `feature/issue-1016` | 2026-03-22 | Gemini-created, no PR, abandoned |
|
||||
| `feature/issue-1017` | 2026-03-22 | Gemini-created, no PR, abandoned |
|
||||
| `feature/issue-1018` | 2026-03-22 | Gemini-created, no PR, abandoned |
|
||||
| `feature/issue-1019` | 2026-03-22 | Gemini-created, no PR, abandoned |
|
||||
| `feature/self-reflection` | 2026-03-22 | Only merge-from-main commits, no unique work |
|
||||
| `feature/memory-search-ui` | 2026-03-22 | Only merge-from-main commits, no unique work |
|
||||
| `claude/issue-962` | 2026-03-22 | Automated salvage commit only |
|
||||
| `claude/issue-972` | 2026-03-22 | Automated salvage commit only |
|
||||
| `gemini/issue-1006` | 2026-03-22 | Incomplete agent session |
|
||||
| `gemini/issue-1008` | 2026-03-22 | Incomplete agent session |
|
||||
| `gemini/issue-1010` | 2026-03-22 | Incomplete agent session |
|
||||
| `gemini/issue-1134` | 2026-03-22 | Incomplete agent session |
|
||||
| `gemini/issue-1139` | 2026-03-22 | Incomplete agent session |
|
||||
|
||||
#### Duplicate Branches (Identical SHA)
|
||||
| Branch A | Branch B | Action |
|
||||
|----------|----------|--------|
|
||||
| `feature/internal-monologue` | `feature/issue-1005` | Exact duplicate — delete one |
|
||||
| `claude/issue-1005` | (above) | Merge-from-main only — delete |
|
||||
|
||||
#### Unmerged Work With No Open PR (HIGH PRIORITY)
|
||||
| Branch | Content | Issues |
|
||||
|--------|---------|--------|
|
||||
| `claude/issue-987` | Content moderation pipeline, Llama Guard integration | No open PR — potentially lost |
|
||||
| `claude/issue-1011` | Automated skill discovery system | No open PR — potentially lost |
|
||||
| `gemini/issue-976` | Semantic index for research outputs | No open PR — potentially lost |
|
||||
|
||||
#### PRs Closed Without Merge (Issues Still Open)
|
||||
| PR | Title | Issue Status |
|
||||
|----|-------|-------------|
|
||||
| PR#1163 | Three-Strike Detector (#962) | p0-critical, still open |
|
||||
| PR#1162 | Session Sovereignty Report Generator (#957) | p0-critical, still open |
|
||||
| PR#1157 | Qwen3 routing | open |
|
||||
| PR#1156 | Agent Dreaming Mode | open |
|
||||
| PR#1145 | Qwen3-14B config | open |
|
||||
|
||||
#### Workflow Observations
|
||||
- `loop-cycle` bot auto-creates micro-fix PRs at high frequency (PR numbers climbing past 1209 rapidly)
|
||||
- Many `gemini/*` branches represent incomplete agent sessions, not full feature work
|
||||
- Issues get reassigned across agents causing duplicate branch proliferation
|
||||
|
||||
---
|
||||
|
||||
### 2. rockachopa/hermes-agent
|
||||
|
||||
**Status:** Active — AutoLoRA training pipeline in progress.
|
||||
|
||||
#### Open PRs Awaiting Review
|
||||
| PR | Title | Age |
|
||||
|----|-------|-----|
|
||||
| PR#33 | AutoLoRA v1 MLX QLoRA training pipeline | ~1 week |
|
||||
|
||||
#### Valuable Unmerged Branches (No PR)
|
||||
| Branch | Content | Age |
|
||||
|--------|---------|-----|
|
||||
| `sovereign` | Full fallback chain: Groq/Kimi/Ollama cascade recovery | 9 days |
|
||||
| `fix/vision-api-key-fallback` | Vision API key fallback fix | 9 days |
|
||||
|
||||
#### Stale Merged Branches (~12)
|
||||
12 merged `claude/*` and `gemini/*` branches are safe to delete.
|
||||
|
||||
---
|
||||
|
||||
### 3. rockachopa/the-matrix
|
||||
|
||||
**Status:** 8 open PRs from `claude/the-matrix` fork all awaiting review, all batch-created on 2026-03-23.
|
||||
|
||||
#### Open PRs (ALL Awaiting Review)
|
||||
| PR | Feature |
|
||||
|----|---------|
|
||||
| PR#9–16 | Touch controls, agent feed, particles, audio, day/night cycle, metrics panel, ASCII logo, click-to-view-PR |
|
||||
|
||||
These were created in a single agent session within 5 minutes — needs human review before merge.
|
||||
|
||||
---
|
||||
|
||||
### 4. replit/timmy-tower
|
||||
|
||||
**Status:** Very active — 100+ PRs, complex feature roadmap.
|
||||
|
||||
#### Open PRs Awaiting Review
|
||||
| PR | Title | Age |
|
||||
|----|-------|-----|
|
||||
| PR#93 | Task decomposition view | Recent |
|
||||
| PR#80 | `session_messages` table | 22 hours |
|
||||
|
||||
#### Unmerged Work With No Open PR
|
||||
| Branch | Content |
|
||||
|--------|---------|
|
||||
| `gemini/issue-14` | NIP-07 Nostr identity |
|
||||
| `gemini/issue-42` | Timmy animated eyes |
|
||||
| `claude/issue-11` | Kimi + Perplexity agent integrations |
|
||||
| `claude/issue-13` | Nostr event publishing |
|
||||
| `claude/issue-29` | Mobile Nostr identity |
|
||||
| `claude/issue-45` | Test kit |
|
||||
| `claude/issue-47` | SQL migration helpers |
|
||||
| `claude/issue-67` | Session Mode UI |
|
||||
|
||||
#### Cleanup
|
||||
~30 merged `claude/*` and `gemini/*` branches are safe to delete.
|
||||
|
||||
---
|
||||
|
||||
### 5. replit/token-gated-economy
|
||||
|
||||
**Status:** Active roadmap, no current open PRs.
|
||||
|
||||
#### Stale Branches (~23)
|
||||
- 8 Replit Agent branches from 2026-03-19 (PRs closed/merged)
|
||||
- 15 merged `claude/issue-*` branches
|
||||
|
||||
All are safe to delete.
|
||||
|
||||
---
|
||||
|
||||
### 6. hermes/timmy-time-app
|
||||
|
||||
**Status:** 2-commit repo, created 2026-03-14, no activity since. **Candidate for archival.**
|
||||
|
||||
Functionality appears to be superseded by other repos in the stack. Recommend archiving or deleting if not planned for future development.
|
||||
|
||||
---
|
||||
|
||||
### 7. google/maintenance-tasks & google/wizard-council-automation
|
||||
|
||||
**Status:** Single-commit repos from 2026-03-19 created by "Google AI Studio". No follow-up activity.
|
||||
|
||||
Unclear ownership and purpose. Recommend clarifying with rockachopa whether these are active or can be archived.
|
||||
|
||||
---
|
||||
|
||||
### 8. hermes/hermes-config
|
||||
|
||||
**Status:** Single branch, updated 2026-03-23 (today). Active — contains Timmy orchestrator config.
|
||||
|
||||
No action needed.
|
||||
|
||||
---
|
||||
|
||||
### 9. Timmy_Foundation/the-nexus
|
||||
|
||||
**Status:** Greenfield — created 2026-03-23. 19 issues filed as roadmap. PR#2 (contributor audit) open.
|
||||
|
||||
No cleanup needed yet. PR#2 needs review.
|
||||
|
||||
---
|
||||
|
||||
### 10. rockachopa/alexanderwhitestone.com
|
||||
|
||||
**Status:** All recent `claude/*` PRs merged. 7 non-main branches are post-merge and safe to delete.
|
||||
|
||||
---
|
||||
|
||||
### 11. hermes/hermes-config, rockachopa/hermes-config, Timmy_Foundation/.profile
|
||||
|
||||
**Status:** Dormant config repos. No action needed.
|
||||
|
||||
---
|
||||
|
||||
## Cross-Repo Patterns & Inefficiencies
|
||||
|
||||
### Duplicate Work
|
||||
1. **Timmy spring/wobble physics** built independently in both `replit/timmy-tower` and `replit/token-gated-economy`
|
||||
2. **Nostr identity logic** fragmented across 3 repos with no shared library
|
||||
3. **`feature/internal-monologue` = `feature/issue-1005`** in Timmy-time-dashboard — identical SHA, exact duplicate
|
||||
|
||||
### Agent Workflow Issues
|
||||
- Same issue assigned to both `gemini/*` and `claude/*` agents creates duplicate branches
|
||||
- Agent salvage commits are checkpoint-only — not complete work, but clutter the branch list
|
||||
- Gemini `feature/*` branches created on 2026-03-22 with no PRs filed — likely a failed agent session that created branches but didn't complete the loop
|
||||
|
||||
### Review Bottlenecks
|
||||
| Repo | Waiting PRs | Notes |
|
||||
|------|-------------|-------|
|
||||
| rockachopa/the-matrix | 8 | Batch-created, need human review |
|
||||
| replit/timmy-tower | 2 | Database schema and UI work |
|
||||
| rockachopa/hermes-agent | 1 | AutoLoRA v1 — high value |
|
||||
| Timmy_Foundation/the-nexus | 1 | Contributor audit |
|
||||
|
||||
---
|
||||
|
||||
## Recommended Actions
|
||||
|
||||
### Immediate (This Sprint)
|
||||
1. **Review & merge** PR#33 in `hermes-agent` (AutoLoRA v1)
|
||||
2. **Review** 8 open PRs in `the-matrix` before merging as a batch
|
||||
3. **Rescue** unmerged work in `claude/issue-987`, `claude/issue-1011`, `gemini/issue-976` — file new PRs or close branches
|
||||
4. **Delete duplicate** `feature/internal-monologue` / `feature/issue-1005` branches
|
||||
|
||||
### Cleanup Sprint
|
||||
5. **Delete ~65 stale branches** across all repos (itemized above)
|
||||
6. **Investigate** the 5 closed-without-merge PRs in Timmy-time-dashboard for p0-critical issues
|
||||
7. **Archive** `hermes/timmy-time-app` if no longer needed
|
||||
8. **Clarify** ownership of `google/maintenance-tasks` and `google/wizard-council-automation`
|
||||
|
||||
### Process Improvements
|
||||
9. **Enforce one-agent-per-issue** policy to prevent duplicate `claude/*` / `gemini/*` branches
|
||||
10. **Add branch protection** requiring PR before merge on `main` for all repos
|
||||
11. **Set a branch retention policy** — auto-delete merged branches (GitHub/Gitea supports this)
|
||||
12. **Share common libraries** for Nostr identity and animation physics across repos
|
||||
|
||||
---
|
||||
|
||||
*Report generated by Claude audit agent. Improvement tickets filed per repo as follow-up to this report.*
|
||||
@@ -1,89 +0,0 @@
|
||||
# Screenshot Dump Triage — Visual Inspiration & Research Leads
|
||||
|
||||
**Date:** March 24, 2026
|
||||
**Source:** Issue #1275 — "Screenshot dump for triage #1"
|
||||
**Analyst:** Claude (Sonnet 4.6)
|
||||
|
||||
---
|
||||
|
||||
## Screenshots Ingested
|
||||
|
||||
| File | Subject | Action |
|
||||
|------|---------|--------|
|
||||
| IMG_6187.jpeg | AirLLM / Apple Silicon local LLM requirements | → Issue #1284 |
|
||||
| IMG_6125.jpeg | vLLM backend for agentic workloads | → Issue #1281 |
|
||||
| IMG_6124.jpeg | DeerFlow autonomous research pipeline | → Issue #1283 |
|
||||
| IMG_6123.jpeg | "Vibe Coder vs Normal Developer" meme | → Issue #1285 |
|
||||
| IMG_6410.jpeg | SearXNG + Crawl4AI self-hosted search MCP | → Issue #1282 |
|
||||
|
||||
---
|
||||
|
||||
## Tickets Created
|
||||
|
||||
### #1281 — feat: add vLLM as alternative inference backend
|
||||
**Source:** IMG_6125 (vLLM for agentic workloads)
|
||||
|
||||
vLLM's continuous batching makes it 3–10x more throughput-efficient than Ollama for multi-agent
|
||||
request patterns. Implement `VllmBackend` in `infrastructure/llm_router/` as a selectable
|
||||
backend (`TIMMY_LLM_BACKEND=vllm`) with graceful fallback to Ollama.
|
||||
|
||||
**Priority:** Medium — impactful for research pipeline performance once #972 is in use
|
||||
|
||||
---
|
||||
|
||||
### #1282 — feat: integrate SearXNG + Crawl4AI as self-hosted search backend
|
||||
**Source:** IMG_6410 (luxiaolei/searxng-crawl4ai-mcp)
|
||||
|
||||
Self-hosted search via SearXNG + Crawl4AI removes the hard dependency on paid search APIs
|
||||
(Brave, Tavily). Add both as Docker Compose services, implement `web_search()` and
|
||||
`scrape_url()` tools in `timmy/tools/`, and register them with the research agent.
|
||||
|
||||
**Priority:** High — unblocks fully local/private operation of research agents
|
||||
|
||||
---
|
||||
|
||||
### #1283 — research: evaluate DeerFlow as autonomous research orchestration layer
|
||||
**Source:** IMG_6124 (deer-flow Docker setup)
|
||||
|
||||
DeerFlow is ByteDance's open-source autonomous research pipeline framework. Before investing
|
||||
further in Timmy's custom orchestrator (#972), evaluate whether DeerFlow's architecture offers
|
||||
integration value or design patterns worth borrowing.
|
||||
|
||||
**Priority:** Medium — research first, implementation follows if go/no-go is positive
|
||||
|
||||
---
|
||||
|
||||
### #1284 — chore: document and validate AirLLM Apple Silicon requirements
|
||||
**Source:** IMG_6187 (Mac-compatible LLM setup)
|
||||
|
||||
AirLLM graceful degradation is already implemented but undocumented. Add System Requirements
|
||||
to README (M1/M2/M3/M4, 16 GB RAM min, 15 GB disk) and document `TIMMY_LLM_BACKEND` in
|
||||
`.env.example`.
|
||||
|
||||
**Priority:** Low — documentation only, no code risk
|
||||
|
||||
---
|
||||
|
||||
### #1285 — chore: enforce "Normal Developer" discipline — tighten quality gates
|
||||
**Source:** IMG_6123 (Vibe Coder vs Normal Developer meme)
|
||||
|
||||
Tighten the existing mypy/bandit/coverage gates: fix all mypy errors, raise coverage from 73%
|
||||
to 80%, add a documented pre-push hook, and run `vulture` for dead code. The infrastructure
|
||||
exists — it just needs enforcing.
|
||||
|
||||
**Priority:** Medium — technical debt prevention, pairs well with any green-field feature work
|
||||
|
||||
---
|
||||
|
||||
## Patterns Observed Across Screenshots
|
||||
|
||||
1. **Local-first is the north star.** All five images reinforce the same theme: private,
|
||||
self-hosted, runs on your hardware. vLLM, SearXNG, AirLLM, DeerFlow — none require cloud.
|
||||
Timmy is already aligned with this direction; these are tactical additions.
|
||||
|
||||
2. **Agentic performance bottlenecks are real.** Two of five images (vLLM, DeerFlow) focus
|
||||
specifically on throughput and reliability for multi-agent loops. As the research pipeline
|
||||
matures, inference speed and search reliability will become the main constraints.
|
||||
|
||||
3. **Discipline compounds.** The meme is a reminder that the quality gates we have (tox,
|
||||
mypy, bandit, coverage) only pay off if they are enforced without exceptions.
|
||||
@@ -1,111 +0,0 @@
|
||||
# The Sovereignty Loop
|
||||
|
||||
This document establishes the primary engineering constraint for all Timmy Time development: every task must increase sovereignty as a default deliverable. Not as a future goal. Not as an optimization pass. As a constraint on every commit, every function, every inference call.
|
||||
|
||||
The full 11-page governing architecture document is available as a PDF: [The-Sovereignty-Loop.pdf](./The-Sovereignty-Loop.pdf)
|
||||
|
||||
> "The measure of progress is not features added. It is model calls eliminated."
|
||||
|
||||
## The Core Principle
|
||||
|
||||
> **The Sovereignty Loop**: Discover with an expensive model. Compress the discovery into a cheap local rule. Replace the model with the rule. Measure the cost reduction. Repeat.
|
||||
|
||||
Every call to an LLM, VLM, or external API passes through three phases:
|
||||
1. **Discovery** — Model sees something for the first time (expensive, unavoidable, produces new knowledge)
|
||||
2. **Crystallization** — Discovery compressed into durable cheap artifact (requires explicit engineering)
|
||||
3. **Replacement** — Crystallized artifact replaces the model call (near-zero cost)
|
||||
|
||||
**Code review requirement**: If a function calls a model without a crystallization step, it fails code review. No exceptions. The pattern is always: check cache → miss → infer → crystallize → return.
|
||||
|
||||
## The Sovereignty Loop Applied to Every Layer
|
||||
|
||||
### Perception: See Once, Template Forever
|
||||
- First encounter: VLM analyzes screenshot (3-6 sec) → structured JSON
|
||||
- Crystallized as: OpenCV template + bounding box → `templates.json` (3 ms retrieval)
|
||||
- `crystallize_perception()` function wraps every VLM response
|
||||
- **Target**: 90% of perception cycles without VLM by hour 1, 99% by hour 4
|
||||
|
||||
### Decision: Reason Once, Rule Forever
|
||||
- First encounter: LLM reasons through decision (1-5 sec)
|
||||
- Crystallized as: if/else rules, waypoints, cached preferences → `rules.py`, `nav_graph.db` (<1 ms)
|
||||
- Uses Voyager pattern: named skills with embeddings, success rates, conditions
|
||||
- Skill match >0.8 confidence + >0.6 success rate → executes without LLM
|
||||
- **Target**: 70-80% of decisions without LLM by week 4
|
||||
|
||||
### Narration: Script the Predictable, Improvise the Novel
|
||||
- Predictable moments → template with variable slots, voiced by Kokoro locally
|
||||
- LLM narrates only genuinely surprising events (quest twist, death, discovery)
|
||||
- **Target**: 60-70% templatized within a week
|
||||
|
||||
### Navigation: Walk Once, Map Forever
|
||||
- Every path recorded as waypoint sequence with terrain annotations
|
||||
- First journey = full perception + planning; subsequent = graph traversal
|
||||
- Builds complete nav graph without external map data
|
||||
|
||||
### API Costs: Every Dollar Spent Must Reduce Future Dollars
|
||||
|
||||
| Week | Groq Calls/Hr | Local Decisions/Hr | Sovereignty % | Cost/Hr |
|
||||
|---|---|---|---|---|
|
||||
| 1 | ~720 | ~80 | 10% | $0.40 |
|
||||
| 2 | ~400 | ~400 | 50% | $0.22 |
|
||||
| 4 | ~160 | ~640 | 80% | $0.09 |
|
||||
| 8 | ~40 | ~760 | 95% | $0.02 |
|
||||
| Target | <20 | >780 | >97% | <$0.01 |
|
||||
|
||||
## The Sovereignty Scorecard (5 Metrics)
|
||||
|
||||
Every work session ends with a sovereignty audit. Every PR includes a sovereignty delta. Not optional.
|
||||
|
||||
| Metric | What It Measures | Target |
|
||||
|---|---|---|
|
||||
| Perception Sovereignty % | Frames understood without VLM | >90% by hour 4 |
|
||||
| Decision Sovereignty % | Actions chosen without LLM | >80% by week 4 |
|
||||
| Narration Sovereignty % | Lines from templates vs LLM | >60% by week 2 |
|
||||
| API Cost Trend | Dollar cost per hour of gameplay | Monotonically decreasing |
|
||||
| Skill Library Growth | Crystallized skills per session | >5 new skills/session |
|
||||
|
||||
Dashboard widget on alexanderwhitestone.com shows these in real-time during streams. HTMX component via WebSocket.
|
||||
|
||||
## The Crystallization Protocol
|
||||
|
||||
Every model output gets crystallized:
|
||||
|
||||
| Model Output | Crystallized As | Storage | Retrieval Cost |
|
||||
|---|---|---|---|
|
||||
| VLM: UI element | OpenCV template + bbox | templates.json | 3 ms |
|
||||
| VLM: text | OCR region coords | regions.json | 50 ms |
|
||||
| LLM: nav plan | Waypoint sequence | nav_graph.db | <1 ms |
|
||||
| LLM: combat decision | If/else rule on state | rules.py | <1 ms |
|
||||
| LLM: quest interpretation | Structured entry | quests.db | <1 ms |
|
||||
| LLM: NPC disposition | Name→attitude map | npcs.db | <1 ms |
|
||||
| LLM: narration | Template with slots | narration.json | <1 ms |
|
||||
| API: moderation | Approved phrase cache | approved.set | <1 ms |
|
||||
| Groq: strategic plan | Extracted decision rules | strategy.json | <1 ms |
|
||||
|
||||
Skill document format: markdown + YAML frontmatter following agentskills.io standard (name, game, type, success_rate, times_used, sovereignty_value).
|
||||
|
||||
## The Automation Imperative & Three-Strike Rule
|
||||
|
||||
Applies to developer workflow too, not just the agent. If you do the same thing manually three times, you stop and write the automation before proceeding.
|
||||
|
||||
**Falsework Checklist** (before any cloud API call):
|
||||
1. What durable artifact will this call produce?
|
||||
2. Where will the artifact be stored locally?
|
||||
3. What local rule or cache will this populate?
|
||||
4. After this call, will I need to make it again?
|
||||
5. If yes, what would eliminate the repeat?
|
||||
6. What is the sovereignty delta of this call?
|
||||
|
||||
## The Graduation Test (Falsework Removal Criteria)
|
||||
|
||||
All five conditions met simultaneously in a single 24-hour period:
|
||||
|
||||
| Test | Condition | Measurement |
|
||||
|---|---|---|
|
||||
| Perception Independence | 1 hour, no VLM calls after minute 15 | VLM calls in last 45 min = 0 |
|
||||
| Decision Independence | Full session with <5 API calls total | Groq/cloud calls < 5 |
|
||||
| Narration Independence | All narration from local templates + local LLM | Zero cloud TTS/narration calls |
|
||||
| Economic Independence | Earns more sats than spends on inference | sats_earned > sats_spent |
|
||||
| Operational Independence | 24 hours unattended, no human intervention | Uptime > 23.5 hrs |
|
||||
|
||||
> "The arch must hold after the falsework is removed."
|
||||
@@ -1,296 +0,0 @@
|
||||
%PDF-1.4
|
||||
%“Œ‹ž ReportLab Generated PDF document (opensource)
|
||||
1 0 obj
|
||||
<<
|
||||
/F1 2 0 R /F2 3 0 R /F3 4 0 R /F4 6 0 R /F5 8 0 R /F6 9 0 R
|
||||
/F7 15 0 R
|
||||
>>
|
||||
endobj
|
||||
2 0 obj
|
||||
<<
|
||||
/BaseFont /Helvetica /Encoding /WinAnsiEncoding /Name /F1 /Subtype /Type1 /Type /Font
|
||||
>>
|
||||
endobj
|
||||
3 0 obj
|
||||
<<
|
||||
/BaseFont /Times-Bold /Encoding /WinAnsiEncoding /Name /F2 /Subtype /Type1 /Type /Font
|
||||
>>
|
||||
endobj
|
||||
4 0 obj
|
||||
<<
|
||||
/BaseFont /Times-Italic /Encoding /WinAnsiEncoding /Name /F3 /Subtype /Type1 /Type /Font
|
||||
>>
|
||||
endobj
|
||||
5 0 obj
|
||||
<<
|
||||
/Contents 23 0 R /MediaBox [ 0 0 612 792 ] /Parent 22 0 R /Resources <<
|
||||
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
|
||||
>> /Rotate 0 /Trans <<
|
||||
|
||||
>>
|
||||
/Type /Page
|
||||
>>
|
||||
endobj
|
||||
6 0 obj
|
||||
<<
|
||||
/BaseFont /Times-Roman /Encoding /WinAnsiEncoding /Name /F4 /Subtype /Type1 /Type /Font
|
||||
>>
|
||||
endobj
|
||||
7 0 obj
|
||||
<<
|
||||
/Contents 24 0 R /MediaBox [ 0 0 612 792 ] /Parent 22 0 R /Resources <<
|
||||
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
|
||||
>> /Rotate 0 /Trans <<
|
||||
|
||||
>>
|
||||
/Type /Page
|
||||
>>
|
||||
endobj
|
||||
8 0 obj
|
||||
<<
|
||||
/BaseFont /Courier /Encoding /WinAnsiEncoding /Name /F5 /Subtype /Type1 /Type /Font
|
||||
>>
|
||||
endobj
|
||||
9 0 obj
|
||||
<<
|
||||
/BaseFont /Symbol /Name /F6 /Subtype /Type1 /Type /Font
|
||||
>>
|
||||
endobj
|
||||
10 0 obj
|
||||
<<
|
||||
/Contents 25 0 R /MediaBox [ 0 0 612 792 ] /Parent 22 0 R /Resources <<
|
||||
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
|
||||
>> /Rotate 0 /Trans <<
|
||||
|
||||
>>
|
||||
/Type /Page
|
||||
>>
|
||||
endobj
|
||||
11 0 obj
|
||||
<<
|
||||
/Contents 26 0 R /MediaBox [ 0 0 612 792 ] /Parent 22 0 R /Resources <<
|
||||
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
|
||||
>> /Rotate 0 /Trans <<
|
||||
|
||||
>>
|
||||
/Type /Page
|
||||
>>
|
||||
endobj
|
||||
12 0 obj
|
||||
<<
|
||||
/Contents 27 0 R /MediaBox [ 0 0 612 792 ] /Parent 22 0 R /Resources <<
|
||||
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
|
||||
>> /Rotate 0 /Trans <<
|
||||
|
||||
>>
|
||||
/Type /Page
|
||||
>>
|
||||
endobj
|
||||
13 0 obj
|
||||
<<
|
||||
/Contents 28 0 R /MediaBox [ 0 0 612 792 ] /Parent 22 0 R /Resources <<
|
||||
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
|
||||
>> /Rotate 0 /Trans <<
|
||||
|
||||
>>
|
||||
/Type /Page
|
||||
>>
|
||||
endobj
|
||||
14 0 obj
|
||||
<<
|
||||
/Contents 29 0 R /MediaBox [ 0 0 612 792 ] /Parent 22 0 R /Resources <<
|
||||
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
|
||||
>> /Rotate 0 /Trans <<
|
||||
|
||||
>>
|
||||
/Type /Page
|
||||
>>
|
||||
endobj
|
||||
15 0 obj
|
||||
<<
|
||||
/BaseFont /ZapfDingbats /Name /F7 /Subtype /Type1 /Type /Font
|
||||
>>
|
||||
endobj
|
||||
16 0 obj
|
||||
<<
|
||||
/Contents 30 0 R /MediaBox [ 0 0 612 792 ] /Parent 22 0 R /Resources <<
|
||||
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
|
||||
>> /Rotate 0 /Trans <<
|
||||
|
||||
>>
|
||||
/Type /Page
|
||||
>>
|
||||
endobj
|
||||
17 0 obj
|
||||
<<
|
||||
/Contents 31 0 R /MediaBox [ 0 0 612 792 ] /Parent 22 0 R /Resources <<
|
||||
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
|
||||
>> /Rotate 0 /Trans <<
|
||||
|
||||
>>
|
||||
/Type /Page
|
||||
>>
|
||||
endobj
|
||||
18 0 obj
|
||||
<<
|
||||
/Contents 32 0 R /MediaBox [ 0 0 612 792 ] /Parent 22 0 R /Resources <<
|
||||
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
|
||||
>> /Rotate 0 /Trans <<
|
||||
|
||||
>>
|
||||
/Type /Page
|
||||
>>
|
||||
endobj
|
||||
19 0 obj
|
||||
<<
|
||||
/Contents 33 0 R /MediaBox [ 0 0 612 792 ] /Parent 22 0 R /Resources <<
|
||||
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
|
||||
>> /Rotate 0 /Trans <<
|
||||
|
||||
>>
|
||||
/Type /Page
|
||||
>>
|
||||
endobj
|
||||
20 0 obj
|
||||
<<
|
||||
/PageMode /UseNone /Pages 22 0 R /Type /Catalog
|
||||
>>
|
||||
endobj
|
||||
21 0 obj
|
||||
<<
|
||||
/Author (\(anonymous\)) /CreationDate (D:20260322181712+00'00') /Creator (\(unspecified\)) /Keywords () /ModDate (D:20260322181712+00'00') /Producer (ReportLab PDF Library - \(opensource\))
|
||||
/Subject (\(unspecified\)) /Title (\(anonymous\)) /Trapped /False
|
||||
>>
|
||||
endobj
|
||||
22 0 obj
|
||||
<<
|
||||
/Count 11 /Kids [ 5 0 R 7 0 R 10 0 R 11 0 R 12 0 R 13 0 R 14 0 R 16 0 R 17 0 R 18 0 R
|
||||
19 0 R ] /Type /Pages
|
||||
>>
|
||||
endobj
|
||||
23 0 obj
|
||||
<<
|
||||
/Filter [ /ASCII85Decode /FlateDecode ] /Length 611
|
||||
>>
|
||||
stream
|
||||
Gatm7a\pkI(r#kr^15oc#d(OW9W'%NLCsl]G'`ct,r*=ra:9Y;O.=/qPPA,<)0u%EDp`J-)D8JOZNBo:EH0+93:%&I&d`o=Oc>qW[`_>md85u<*X\XrP6`u!aE'b&MKLI8=Mg=[+DUfAk>?b<*V(>-/HRI.f.AQ:/Z;Q8RQ,uf4[.Qf,MZ"BO/AZoj(nN.=-LbNB@mIA0,P[A#-,.F85[o)<uTK6AX&UMiGdCJ(k,)DDs</;cc2djh3bZlGB>LeAaS'6IiM^k:&a-+o[tF,>h6!h_lWDGY*uAlMJ?.$S/*8Vm`MEp,TV(j01fp+-RiIG,=riK'!mcY`41,5^<Fb\^/`jd#^eR'RY?C=MrM/#*H$8t&9N(fNgoYh&SDT/`KKFC`_!Jd_MH&i`..L+eT;drS+7a3&gpq=a!L0!@^9P!pEUrig*74tNM[=V`aL.o:UKH+4kc=E&*>TA$'fi"hC)M#MS,H>n&ikJ=Odj!TB7HjVFIsGiSDs<c!9Qbl.gX;jh-".Ys'VRFAi*R&;"eo\Cs`qdeuh^HfspsS`r0DZGQjC<VDelMs;`SYWo;V@F*WIE9*7H7.:*RQ%gA5I,f3:k$>ia%&,\kO!4u~>endstream
|
||||
endobj
|
||||
24 0 obj
|
||||
<<
|
||||
/Filter [ /ASCII85Decode /FlateDecode ] /Length 2112
|
||||
>>
|
||||
stream
|
||||
Gatm;gN)%<&q/A5FQJ?N;(un#q9<pGPcNkN4(`bFnhL98j?rtPScMM>?b`LO!+'2?>1LVB;rV2^Vu-,NduB#ir$;9JW/5t/du1[A,q5rTPiP\:lPk/V^A;m3T4G<n#HMN%X@KTjrmAX@Ft3f\_0V]l;'%B)0uLPj-L2]$-hETTlYY)kkf0!Ur_+(8>3ia`a=%!]lb@-3Md1:7.:)&_@S'_,o0I5]d^,KA2OcA_E$JM]Z[;q#_Y69DLSqMoC1s2/n0;<"Z_gm>Lsk6d7A$_H,0o_U7?#4]C5!*cNV+B]^5OnG>WdB'2Pn>ZQ)9/_jBY.doEVFd6FYKjF<A8=m5uGn4gU-@P9n(rI:Qq:FsSA)/:VTP8\lhj2#6ApURNhalBJoU^$^'@mn!,BWDt<AF@U4B89H'BW7#l`H`R,*_N]F1`qNa1j!eKY:aR3p@5[n<r_1cE]rLj62'lK'cVDYndl\6<Cm?%B:Z>nB:[%Ft)/$#B>JM$UP8A0/,8MLf#nDSeH^_T5E!L-[2O5mU<jpXXBo9XeVBann[mSNE21KVn+l9f]?,n7WR@L:FfNMd5((XBC:/tmVO,^-oP]"#\G."W">`S?nEbuH.X!I9He::B(!Y;;2gZ#I4!*G,]LIVA"<E5iblY?O,gSrI[_"TE>:4Hh7\j;LJK&Hg?mS.&Re?X5NFgNlh&S=G7*]T;#nN7=AAClhL"!9_a]SA/?3oDEk7jk/&b_[Y*NbtQ'"3f0]epO/m+5V]UrDS3?;amUh7O8l)C"(.8R-4P8Kb$@p$a,nP2S+KS_I(-8A(b4nJ;\s::1HQ7joV1(6Ue/mFbSAJ=Grd/;]\GeD^m1_e:j,a[)u4i*i*:7SQPMo#)\)MPp:cDD09&s[mM2_@9]_-7WMV1]uNcb4,FnrZdfL@jC%kJHjF%6L5RE(\gZ.@GJ_\CZ?#jcYA"b*ZTp0f-DsI$.X@fcWl+94`3F9BUZ%qGKG43K5V;jl]tb'&<>?NU)_s[hepiJ![@ej%/DH:tf3+p^]P/us*LmWd1`^VLl'k"5N8H:6r'V1heU1'M,6FK^ID8Nds$'kajj5PJYn+_N^C#4k3\#C6[D_Y\MO/C@YP`kDH:bkc=3.,&8O;cD[(c/WH>Vp_KcV(/%bh/Ec3U()<\7;UG`6=[P:4ah_l^@;!pL55.g=G@KJsjQPHSE4HdG1O-nBuPFY&lmLYa+beK)K?LAb8D"T(DK5$L0ON^IB+:Q2Vn(<<atkt*'ADH,_BDsSL7ClRh\J^B^X&eCO2$NIcg9KVHoWq>0s2fp!b1GZ+%K,NeKZ<3hDIp:]INMurJ:pS&G:gKG>\./?UQ#$eGCq+2:]dQ+mj=+j%+FX`FmAogol!t#S^j0REChrCiB^6_\i6XP_9A92)[H-OBQ-^QV=bOrfQeop/q'f)Rd8*CSbPXcqABTI;Jf.%Foa[>:LE4mcOkC/q^DlM7$#aGGF87YQ4PsYuFY'GsT\r1qpDljUWhGoOpJ^<t;o+@[V4XG]8K/<do29F"^QnAPQs(S1'Onu9^q+I6=//DAT#5k(lOVZ+&JgEhZ=1e_dedNZ&CGR>Sn"(,&'<74C%2'H7u,,<:?Uk=>6"$mO5`-%cE^r.#D$n(Un+J&FcD,(btu4G`Be/i5ka60S*^"C9c-EsWYL*H'pS)dKq[g7Q]b@3Ar$XZl4sKdK0%>6N]p<\fA.PRA;r(60Z[YE/(bM#H-sEl8glMDc13\n"PjqnGnP2EP#2(G*`P4EZKWY[r52.KA94,mXeNiJ]aIb4jctGF4Y^j[UL#q<*!@4p28#j!`p>3.[nlNA:$9hsj(&!Y?d`_:J3[/cd/"j!5+0I;^Aa7o*H*RPCjtBk=g)p2@F@T<[6s+.HXC72TnOuNkmce'5arFH+O`<nI]E3&ZMF>QFc>B+7D=UbdV'Doj(R!.H^<_1>NuF)SJUP-<1_5$AS8$kL$Kd8mW9oFeY+ksfU^+>Bjlh3[E9Q-BhuT=5B9_fpYq.#B1C:9H9WLHCG_TS-G8kE+)`hnTD/Kggt54$fdqH-QM1kc]@$jhjj%Jd9.G:o@maribiV!4Iqar3O!;,iYmZVV?=:*%&jM!_N3d?Nj)l!BGKDQB_sKgce(&pK_1pDg~>endstream
|
||||
endobj
|
||||
25 0 obj
|
||||
<<
|
||||
/Filter [ /ASCII85Decode /FlateDecode ] /Length 2489
|
||||
>>
|
||||
stream
|
||||
Gatm<Bi?6H')g+ZaDcfBZ-B`S<f>T`j#M:&i#)`0mh[0+MH3<KTeBK4@'m[t?QIs;#pb8p_Mi0YOngIWO-^kaLu6:&Q8R&C1]$o76r?Xa"\!-edd3.RcVFI%Yql\$Amu\>IQY[ao0`D)jjIt$]_"#eK/>,mP$q]lVm@,9S+_D+/s_LRct1sTF;mq$1_Y#F0q\@KRXLm_O%.5^;ER[+8O82sF2aH8P0qDpampV\N+`i:knJ*lpZm;1.6X7ZPc"P$U]iqtb0iimqem5*S:&=:HVK^N/.<1C-u4bH;&E%!Lphek0U]q--OhL^aF=+"_g9mgKsB.leVYe@4f<)P<NP7=DtF>0kGP?OAFaKc'-,G8:FQXqZb=9#+GbYhRcP48mEsV%PT-H%<JgbH3AIMPJsDe#K;V7M8_q[;73r]QoT=XRUiA&2B#RoL=*2J.Z**+\W\aM$n`K3\OML"9KI5)_Y9l)@K-H96,-hJh!R6LgD.=>?8n/[F$VJJNmV?(7np[X_N2V*ETM"!2-9"c%f<TD++5*N,7AHtmf'$i^li;lo-nhm#YXirfr41qsq\8*Ci1<Zbk@\o.q,1lSjRU,k7VTCcTb+)j1X5,\kZ,7G`0q."qOIZ3"sZHDe*_`GXkIC/-'gd&pQ1"068[899PZ8Mi!&k2iaCd%j-sKh+lciaH/]gAhcZbF.3-H76RUWbj@VGfRMME]djehu3M-Ou@;WCE%n4,D[:krIm!$L4BDE>=JT*al;`=TmYm#PqET'Uh,aH%,\k9c8\u#.g_C4/Xq#[WW+(5&D:eu\!Y.-$.Va]@1dgbL4$1;b%1L;<;i(5"oaWFgjPYSO9-3.<I_=5dV,gE5Spb.;"hX=aqKu^Xf#+h`o(]Sr8/%'*67GAoN^DX4?C/@(u(2JSq.OF8;>.)BEk<frh]m*2e-j!_MHlP0egP%SMf1()8_,PWo1)J1J%q!Y]Cb%o/A-a"T^JUeONPH=+ES:W_N$C#>Q3[`ONAmAjcNVO"D<Oh("Bf4SKTYu[U4P$*q\Gpc`/GH-PZBSGXpc/XY5$tcbR9ZY,hc:X_qs4:%9_ubq!W08`80FnP@07_nV$W9p049\[9N5"[6(U1Ig65[I\!qcJ"KorMEM1]\R5o&$Z0U,hn.A/FZ^"P\9Pd`K69X^p$)2BSPZ-hkfrK*#<9LEL7ni@2Se_:2[ei%KMd`bO`<LB9\:=HQjI]pqq"[@Nh4Iu7bF50EZ<'/#?8.<ETQugk0qAG-hK1,(V1a9/#;-(>Kn=WCA%N(S>M;h]b@J^D%I]ilPDe%qW[B)rBqCTTX5^AlM"ZWV2;f^+p7juA;<i%_(!YY$]cF$fIV>pd6-?u>$Rms.ECrS/J`8>n.lKeMKDQc.H[S&;B95.(:"`2A7QY=5](`*bN^(YNhF[,]Djh;LmiJ,_"s=#j(8d;.g6F,CoUqRX#<Qid,kmd3EP2jC9D$]N@^pj^1eZto<sp*"jBIZ-[fCng5m"p&H)&8E52C/<rfWnTq-8L98!3\BJ8DJFks[0]n;1-et*c/5r8;U&]Dun5Oq<J17K35NB?Rs(Pd$`K0G/U>GZZC_PQQf>T)]a&A8R^g],:[]L+/83Eh?`cq1aEaXU[\S'c[`!e/,g0.5-6rbWSaQfr4W;pDZ51`EEu*t<G6_U5B4rjhu)&oYh\4H)e*p!Hf`;%1?20oY*qqb]KLUZiP7]%%X9'umr$-o>JRBQR$SK^]i2d`f5!Icg6CCaTNPgNbPaY/FDk*O6=nY1j8G\0pl2gTd9m1SDWWh[uQNCFRDIH_"[/F@r)IEObA3UVm82UN0:6+@.LhOU?A]+TI`Q\TV],jH:b\9uHGe4Q9'GX:)'T7./J:j<5J.L3sk_%qn$&T'eLSo`?3gF9F='s#E16?""E]3IW<eL.]5&:_tJ7e:#%4=gLQK*#I/(CE)oS*V7KO[d3#^`pabg[MBmkSH%92oCgZ=o<.a&lc,e<]&RI`pl;V2,"f^dC@1.3VdX3\F2l50Y=9HpL^mu-JgSgn,1G/G't^Mkhe"<1-Oh/>['oDAFKG\s^Suc*ib$@KhsVhK/BP1LXgX(d1-GooQM6CggPu1PY2?R)*NK\6XduTug+BhoEbQrsBOZ[%)SL$$Rd+1F0pu/7;0VoM@mp+i^V%K=bk<&1KsEm]NHPo"FfinGR.7Yn2,Wr0="8Wo5M+NjflT8HZGV+8_S4<'W&G3rD_QnUk0c;q3Qfou"X<[Q%HWINl_;P/+H7"Tcq?K7Ggk@&<BRL#D4F!$Fmke3-e2IE\RNE4,c'"6c(odL+r]3`%'WEDiE@2)+?TVq/]S747hL/Zl]FBu4C1>DI8TGrJS$V"JSH/D7*.X75>ZZa&aOC8rp>e$fH/N:92sd>$MGU.k/uQUm$!M)SDM7g5,>%F`%T0Vl9lS`6I(*O_4NOh0/NOJ^=t\lG.7;)rS&iuOo'9F:B/sVFYD+$k=`9/(?luKOWLDHcPHMY(ZCqi&TQ2S!r%q>b<DKp%mXdk2u~>endstream
|
||||
endobj
|
||||
26 0 obj
|
||||
<<
|
||||
/Filter [ /ASCII85Decode /FlateDecode ] /Length 2711
|
||||
>>
|
||||
stream
|
||||
Gau0DD0+Gi')q<+Z'1SNSXtlc.?^DfpG0:dagd\5H-ok]1n>+E65!c@?pN(rEc_9ZScpNG;B2a*lc$,&K37'JZb+O9*)VNqaF+g5d6Hdck\3F^9_0Q!8W_<s3W1Wrqf(]S9IT'?ZQL4,K65!1kkM&YYsC4JSkR!D$$2Q4Y\WHM^6\ClhaeumQ*OV?Y'!D+U?Rp)<RYd[;a.#QH^]H)S*[6kc]V\)e&8H]8us9aHS^:GRcPDp7+4+iAq8okJ+F(>Blg."9*4$BEdTd0-IX(YI]M`#fk[+![o8UQn6$H66IB3=92"<@M@H;AP%fg,Iu\"oP*Y!r"Z+AYVf_iX0Zipi[7HJ,/Dr%+H+G(\NG7Mp(D#re@kOOE-gc7`<*c=8c+!'V=H6iSp+ZM\ANG119C`M`%?hkeXYf[gCca04])!['G1q.:'LoD[U>);c317bG!L!<i0MU=D:NWoSQE2KN<SVeK@K,l]01[KgDa2A3P+m/?SAj""0:;Ur%&R+L8$_P.JZZ<o[9_7R81KH-[34q$rXr)Wh7ZQQC$bYu7'0NiXE@OubP*]_i>O/fNc`J2rGKi3r=`&0AP'"d9-flS,dhU5b?%J7^n$/XaQc5EX3Hs!<FbL83uBYXGpDT\fTG(5.BJ0hS%])bf2B%f+TX61YpE`A'XbKXIV\i?)I+".-/8<ijs/<_(9/V4'nZB#1YD=6=E".-W)>R]&bS#U?m1DCC[c=8Bm>Gu2<78T^H[@Qs*q(6?7D<dO852tB97aXGeG%'h+4+J"5_&B4#ZiJh_%%FKR8>AHQC@iU2b>UGe89lLJ.fbnrNYjZYWkSO1S7[eSZ(^]2?Z#DA80/qhF.>,9Xa$3,Y2R7/HS-:f$mm(/DM=J+b5.k9/`1Nl?2PO2.qI9Q?Rm1uE8c93HI@8#)0<Qh4k*nn"nbel9VbF$Ik"cL.4/Y!==RM:,$H#M&4?&Z)9tA_4N&cm@\M/0L5Z4iTS<6eAs9Wii>((.KDW43Xd!=sO#]M*l:,k2A82L^P*s3OUVDYYpWbU6,`QmG=GBjrLl20kB-[=W%Ns;>$6@g<`Hl*iA^,+dZs/.bd&LmXi-f^4Z/:9Z@-ZYI*1"O.,Bjhe-`FHk;$0PYKtj)!W7VI0[t3_uJ.hct]Ko(?J"kP(_s,PH0]H.8WjhZ<%2O_QiJt_61S"6EPS-9*lUmPuH?D\Di%d3!b("RQ)k(=idnMeB5&Ha[R].*_6g3ce8V>lM@6:>t5)[aK(R9C8"X13@:_,,qs8g'sL_XIG<><liR$//JY%ERj.o1*_iN2"#)chKW.5SKj,O0:mQNd!o6FV+T.h(*Fk2[>NfAC<&MlOio"RnL`Ko[3G7MGqAYrN(g&c5Z79-#iA4n/G'$]R7=LIiDhgb@XuXKOFee7Af`:&h-q_j&I;K\o&43*</q@sPTCYW.TpNV58(Ap!Fl%8"]J?do$7clL&77;sd5U"2]m@dDIfeORqHAD2ICV/Xo4[:-IA,U[c<"a;o7YabqR<q9&_[R8cL)@Qkc:.8GsQ:I>k;(T,.4hl+SMV#UjRZ4J`]6JDh`uCi6\IE/K>hZ,M@c]AHTcQeL)W%g<o'ciW]G$5cC`k7G-F8(K5?^rDR'=UIUALh%sk`d!CO/iUY*42DTScdi3918CA@"39l=gH!gSh2o'_pGTe(gbK'k0E+7N!o"aeg)\XXC#J\\okne[8=D8bmd(fNPDYF&sMolOo<VDsm*aI'Eq-&_/deU`?NE4q?>52Z^g1nUk.OsQH%]5P<UB5amJ-:5Q:&&j9F:W&e2o#/@F9hE*[$H]Er2V][(U0A;kbWrjXG/JQ@pO<N3SJUoXOA48^I;#R\crt/rI'1m0DH%10YO6Winh]ZFdAj'mqR.fUjrlOllm=9DpY8=UsTYDeS3Emn]hDO:mdNTQY7>JQqi^".9_<OMnSWJVZqp&`DXC3nsX!+Q+a<!*n7?oDHPFNA@6P_EEck`hR(XK*aGHE85oeDR$'F&d1<pD2V:aS=fsBi'dBVd2%[`'Yu&5h?+Yllo3LjB[#8S]c?9/fdO%fERqafOmEaQ's+DkA5qbW!:UQ=8Ero#tqe@hZ1_5]3,b/FP=asg7\3X4-IoG:>^#SO2mgH"G3sBg8SHR>Fgu-J;fXAA#'mA"1VN"u5/#^;2%68(uK)8mK7`k%Kf:i9$9/8b78;f`1n=c^fh#_o[TeA^bFTL=pP)_*THO9"\5TY4&00HU],N%1UN+`7:#gDS\bJ5)1Eu;W:R]!F2d,?=,UGehUkU2aZ`BA[bH#PWp(G7NG?(r17dAt/s@#!jV1:>N,0))qYoG8U["V^Q;oO:0;KbYuP0q-(*.`ni<:=+Y'RJ=hFagH`a1+cfR=]Q(DLE^6eom6)Z_-Xq+;H.eb4nLgTN,.V\$8F=/OG34fq!OifKS))`no61(%@P`c@7pAANBY<[Rf-)tS'p=u=7h.JnT'GnmraW(OP[Dc&2-l7k`%-?jM]O(>t=himKCH^rRr%/f8D^0Ua]h7nb3%8*r?r>92%k%N;hc3E&$3gHpkjm/Ws("-&]>fLLP+rkd5,ZMDa!mi\K_i>tXq-%$eKb;(cM/1h5D;!q;?NkZT_sIEcX+eadC!<]j6#/e.Of`!2HSElEP*iEfHp)G:H@#[CqaIo4oBn.lYUSL3;SR%M$<Gk"p3TC8)!0kq&6ipLmu$teNfkSd=!X?X&n?r%JXk1J\PNe;Vi9,n0WSc'?:FW(;~>endstream
|
||||
endobj
|
||||
27 0 obj
|
||||
<<
|
||||
/Filter [ /ASCII85Decode /FlateDecode ] /Length 739
|
||||
>>
|
||||
stream
|
||||
Gat%`9omaW&;KZL'ls?]<l4P(LP.XkY_u<g]5/`F>cPqN\hjkc(=6CagDS%T'1ub&e&Lh"46's#NYt[+=FX[9!,lY?>qo_,l?cp%8>t_@^9/NXhBTf?LEek5M%\bLVdm1C!A%fKJeCX,(klr=]VrSk\8-TjcEC=(r=dE0p,dY1`^%4TR\t-!0;3iFqB@IGb/Bhr`e'"lDAF`5C8<+ABr_hu)6Tc&SG<-523Ph[C("2XjH(/G%4Gor:]E=l=5@>VGpTMrG\%m&Q4;QG;IcQX&0Nru):YiLLX*g977A1G\:N*`Kin5e&Q8TCJ^4\,f^@E-#M21"SfZ4VEuGn%IFgZ0s6Y2X[31+g\n`DHEj=<aAfo_Kh>%>R_]HoCo6.[s^cT;9n(-m7'ZUY)`JsW/oCDuL%qM$oDL\+E0Zont0T;;)a,cdRV9ZT\SQMR98THMTQ9(.>G!Zr0cKikEYt=O<]K$X1\9!!+05r;\6.-tO5@kEha]&R/Bb6e1JUugo7M`e'jM5jL4Nm@rQQg[;fb/PX+?4LBi.As2"n3ct9E@TMX>3`97IDFBWkb/^JU=]]n\qIDh9,0olr!Jf]Z6f2N@F>dUiN=tSsBcFj**-r_B8=B:uSr)^V^'nO4kp$KOGosmVSRR>Nm4f3`9Ph\Tl+`FuJEcp1Uo.BLVi8`G)d?$(\1XbuR".o=UYMf^H%P58cGJZIlkKLpOq8[8*;Q)a$I-9#I$u\,?K\Drn[6U]~>endstream
|
||||
endobj
|
||||
28 0 obj
|
||||
<<
|
||||
/Filter [ /ASCII85Decode /FlateDecode ] /Length 2279
|
||||
>>
|
||||
stream
|
||||
Gatm<=`<%S&:Vs/R$V:2KraLE,k"*ODXU$"BH*`&LB%N'0t%ul<(SRBpXejB8_J+sW=?)6A'#GJqW?^p!q>`0@(u4Ni6N]3IiCWa_X\UsfSa0`#&feThbVI[#Vp_1n.N4ubp3&iGHZ$]"G,SS8%of:)5M>LX5S02iG]rX\`Dk`d5s<$U4pc59jq2Uoo?c^;cnL$jmOI*^aWO,?CF/jq0Z^g%`r+V(X8-p5rF6NSAu":a8Z)9%Q/t-8HVQNTcS3.h_iX<e-k*9$8,(;Tq/lmeAoO=Z+pfoNU()UO"L#J-I&-s%3[E%KcqU^qVd>;GHJU#L#b7X`P@""&*T,MHQ</P=<mneY*g@`_L"<H)-Uh*L`u9PhDfROWe?rc7^1[bko3T5#?r?i5]NVmd/\(l"kupnJ:SW;b.==s*a"<.X"'5/HcMD+ZH9/Mi9Ce<_(3bM6#W?5Ui&3-WHLhi$E6<aQJX+;)m20M>g"m(KN+oN5E4#4>)euUb(C4neo3.HZE+pY;KJ]ra['1,k3K>3>aEVQ^3?Y.p!3F@Y$q61>S"Q.%A]E^D<qGG[r9Go%d2Dt;:.Z@@.M5<g#I)&&-]'GAJCf`0U0r8lebLN"muXp\9mU70KU7G'`T(CP22l=86L]JRCk3hLG&$#YTscf7T)9NgE02G7>S@IhtV?31qE55qG07J&nD6un&6'LJ6/I_4$?I\,!S=hH\s,5CT`H#@FE8^.T7\*b4Un?S=>=^=9mV!Rj^9;B)7]?9H<6)P1>ph>uP^AZk11jNKZYr.QS#GcH[d[F96KKDtn'GC'Doq9?jKe[?3I8lJu2>(b1+*:ZCf\]NFr)i+`LqR"T\u-)um5q_c\m22,Z#57UE.pLR)`;NPgMiZm51JJ6BtGr>u*j"@s$Y6q0g_Dsp@fNZ!!,eo#2PP-3,Lf3=S7l7P\s#6.)9uUb64:4p*p'ck[!nE/IhS?N5o`U,8TR#?o9I&5mRYKA7kQt:T&N52T0>W0RGQ/#C:<nc.J7gire(f]WbE!aLlJOt;P^#/_=RGgs(0/=!j@%F:3C+3\n!ZAT")NsrM!"0GX`b>YeZ:?(W^W2ME,m-R"YjAH[#p$N(c`c&!mb3#PW>eE&XD^3-NYMs@PPpPG7;gE-1Xceh8<B@-(,`]S:L:]4"7Ua1P)3/q+C&h)H`:)ncBNq+0j/s[%Te;!!1Ml53!J@+V!>3/FV+iQ<Ic:9E9!b38U]@FH)jndE-Vf#8At.Jd^YQ%JSDN<oYk2qf[S3\c!MZ?e\B+m]`U9C3po;]O1>mf)3@erqSqR5rr+D%m6d.frsH7Ibc+0i?.h?fmYs'p8ci2oW*4P=0i%C8OC\H5o2Z7bq`Q8X5RNJ^sTa,l^rQNW&9M9f:LfF&uF:]eMN$T#(kH#D6CfQ#D+?0+0@mk4qL+g3)@u5C!K;F_[$H8Y7Os1ZASZie=:?[Kttu@1u-8CIJFTB%Vo?I.[*XuSNKXPfM/XY[,KTX6%(H9J/;e5,"dj]^&Wc585nOcn>52MCkaXb\JYRbOW^\GD5:4)RCYD2X0-r(9qS:1$7>t9)0-VS_*CB*?p$Ht!>?rP0B0bqd8GJGBUUICWiWCce'(Y;3FI_j+[t/RQVFVLA]ksmZ!u[e_Z3&.DXkf_Wb?&X=Q]-@M^Y?br()lIK!&(&$n!KKq#Rs7ZRgCLj`o!HpEm<Xc<"!BH'@]I`jQt&.F(J?Pe8S^T:+ZJ*S6[Q\ni:jT8Z/]Ngf4m+q&&^OgstfGnpkKl4?YDZ9U'og5%>LRs,L+<dceg5,!L2Y9dOc5<tTEH&$1(Y?YUD5+V(r<oXrAi0qd@S`8lR*5sYt@Pl2^LP7'63Ar\/kU,Y#-?#i\+L/sJd1>9NMP7sB2N[XmW\Y"N=9J#YkPlM`(K70LPX.Bj5J+A.X\m3u/&/Y,q$ds8@q>d>:]go1UOQ5>AE#J;4$WB]Ng>auiE1ekCkZm`Il7u;Zu@!%*a>(rE&<+-rn_KF[7d"+%/Vre#NrS@7Y;P^:5`b0a/+@^pr.o7n)/TU?:'b"!6`>U6)f!4<l^&RR\sjTn(hZi:s_$k,2Zf`A;64l6'2O+*bBt4h+&hn4k#J<XA_])?Hha9#.5k("k7'3l:CTNjV[eQcHW:tSfOjdpSg0JCg(/hW$"qM=?^?*HVS&WQiYP'RLT*"3/W)^*t#/k=dj&*c0i?\5u$nZCTnM=c(0MkUlk>n'-"9kYpb-/l3MDEBh'U`ddmf=\q/JG#/_+k6B>;I?Js1g1*!#j-bo2A!ZuF3V=*^ITAt$nGqJ*j2`u'M*u-,_?2~>endstream
|
||||
endobj
|
||||
29 0 obj
|
||||
<<
|
||||
/Filter [ /ASCII85Decode /FlateDecode ] /Length 2560
|
||||
>>
|
||||
stream
|
||||
Gatm<m;p`='*$h'@L'W#k>L@7`0l5q0!maS.8X[fk3d8R;/IU6[4NWF%H54r\)0fD?V29(1@Pq2d_>['V;7CVYjnolJ)_O,*t*=Bl@@p3@L\?9q62i6PJtr$,<b)'<)#b]BZ@0i;h*`G-f6<Va%5qfg[a_\9EO:u@C4Zb\7@O_dr\O04e+</_U=iG@$0UI?N&;bYkS9X5Eq>&,WG:raV;Bkc3;ltR.MdY0*nI!Rc-rq^lQj4qYT:lZkR[R"baUDG5,#6bouR(Q>=2i\30V<3#bR*)[F8/6@q2;nO'$h,IP?hQ9@HT_9oE+?0/'5-OUXP3St39Z7PrLABG7hi(UGDAN^;@m]dtC>:U]JM*_HYkLB2LpPp!6'_,p*HuNopY/;,*@iW\`,8X^2.MA]\6"=b+6J#p;"\?"bINu*#>&8/2o!I%78Yi/p^fc7&(q`#m/>:a:X8jE[\ghGTGpO`;=dH=`"_SHE7DU72#,SG%DlOM^;1(_u+@^XlktOcoq"S$hSE@2?ecY>[rPuLI$^.\V1Y"bu/4W4pZiP3(bEL#)dpW=[GM3rHiM(9=nDb/k.$PWL*OrV[VGdU'lT_b\T<fHH-W(Q-!2_*AN]*GaI1`L[JnXl.Wh_bSkm^pY7*I)3`0SL_'W"eTKQFF@6VQJkS\^"(//@0T)Ap@dQHpJjU\@n\E\bs=N5Y9)*5@.c,c?ul87[,U(L&(3GVb_*Bma3EKQYFW#qST:Q5PO%&<Tu=-1IWDXTtqtaEZGu&kUQ[TseE2XDspJ0nksEh@;TiE[l>Q$]EK$nROY+;RShkRX;G:jV*lu.0d%j,RS+/CUl6R:ZlX>/_9,DeC$rrNfmA[b+!_l0r,35[8NJZX!0WM!G"\uWSD0LJn4cIoJX?_7r?BVgfn%1eHYu`dR34YZ9r>cOm]<;3[d%4n`L5&5FsIPk-*(hEcH,N`!+u!,gF`s&iXgVb8k6QN%rh^9O'-3+KSd&g*sri;B_AOD:3'gU=#,)qWI]o0Z8+&ARa3=SidlX7Z0?3\d3#.L,YSD"hui2*o!"JGYKrhD3e,r.,0l4SIG`lAd36nKkhp*T8%OmNg=PoRb>=<7ZaN7r&V;nVSCF5$c]@XWFLWbH]9Jd:&8T,W#VsU_X1%39BDI>;C2)[lCX0F*!:)D2+`qBQiAX^a05i;/LDMe!IbUYXqK[0B3!mH:au6f/idTqA#hN0ophZ<'FNo?>uY]g8:?HA6!XWub6BGaKTBa8grH^.9mS(?n)*)CPXg\=Q$4J?>h??@]a0;Lg3"5+<im3`?cfU:pNM%GX.7qkpS.en`.:D*$WU.7bGA_hHc>kR4jS!P5H68(Db((R-Ml:%0.XG-#*:lE^"PqXBP-b;1SC-gM--r-[U-GoefE6Ln,&7`o2!`/:&#Z4?*S<8i#Bs"dop)].h;HLU%]Zoi)E)W\fDDT^L8Mb9lfeI#fH@brXmc(7ct/6AKi^j?%X7.B?g)l"@F3^6Pt2T':gW^"h@2`FYZ92*>!'Q(r"=,?a:B`-a6&,[g`#bDjXAIC;WWR[?@Qkq[N5USK[l1Y%m<a=aifh8r?Q0*cd7Fhsd2=T@44<$=79Xf\N9K(P?-q%)OLg"83\V62RF]1ERWnN?UEIne18G%`Ap5W7fM0MH+/X(^[^Ap]8!A%#.VXMnp5Ib!?:H^Ou%D@]hbcP)8fSlODT1lmB=7gWLPF.rTn=YUrFXL#k$:jUb1^U+#&1P_O&eA`3:V#p'uV2GluQ+cqFod3L2ArBXsf%dnDUeZ*n&UDrbio=]H']t-1ml)qtWYIh:f!"E:<EpWc=.(<ISi4A@rJmeA0iNiYM:sKaTmjC#>]pISpp2u+Z'[=Z<(dFCbC9EaI/[q]Fn+XX8e=9"Wrdb@1^X6%coM>DbjTrK(qHnI@;YNAcko&!_\o]C.ct;qDR,+NPk3q>SU1l]lhV3$dSD%t1DoVsp)oq\r*4r(k*8fLjVph^'S+13jG1pX>4/HA`e*g94SOV5u!A^F1',[P<>DL^.(MS2mId:T.[iSVsB(WuhXg78=Fea7q`gKSN<tjucH^%0G!ef/VY&q-oauCI8LDtLdpoRV'QK*X\5(fBjlR6mMV9X/7$Pp$3TNWdC'i<_C,X;uCW]bF2f48ZKF`POt2)[$4j*5+3Qj!8`W!'JlqYDZhr&S8u!nM):Ar?!^"TNrDp)MYR'f+C=bh93R-K/HQQ#O/0_Q?]i3HV<DI!gm0?QFPhRm^>P,eIM3fd`tY%E5ESdIT:RA"4;WpEdN'</E)bW=US_YD^p/9m^@me!u:q-"o&4AM3*ZC%0rdh=0(jn4^*+r0_3DD#6GY&KqU#Im0CuJXZ%F<4Zl,'t3WI.c$tk/Na2X(R;dCfOSDb1FH4WnL;+,pf)KY\5XU$%EAciV7b')UXo]ldfPCEr-(/A>^L:J4l9R0)ZtOaeYa@S:Y2kl_:T4do7-6Wq2XbLflepYT`PQn3:)U2<fK1q3(qk=TZIBSX+Xab*k\Z@$9!OO,$S@,Z6BlqQ<3;5Os783KQKZBl^>=L'=*M!iMC%BE@Y0dWkr_Wd$<mpbpn;(IoqPHoRDT'76C~>endstream
|
||||
endobj
|
||||
30 0 obj
|
||||
<<
|
||||
/Filter [ /ASCII85Decode /FlateDecode ] /Length 2470
|
||||
>>
|
||||
stream
|
||||
Gatm<>Beg[&q9SYR($(">9QOpV"0]/lo&/DUq=2$ZnH]U8Ou/V&hH:O;1JPi!$k"D->i3C*6T%P_0g<r!J:B%R/>*#J@?JBoo9%\@C$+Q04WYILS$LAIpTWD&S1NC]hH$1jXR!SnF3X7&HV2mO/$)##8s<fUaolYfaISVmtD?o=#EKmYB::]=IR)rQK=5m`=K3K$C`,oaloO>*A>kM,(IlC^(ZTfFtTOiOsLBdV;Wn1,96a^dk?Lk%Moj*nIfi[)1ImUMUQ.hI8fY2iZlV!F%QO>9\+"7OI*I@FnE5?!Q9ZXe[lB[;cOZtVA?r(/:jV2DAumP7d:=ub$#X0]H<(.nIZ0A)_eLHXV1o:^KD,M_nT\P;-F2L"r>Rl1ZjRf/0gHkWsCTg=T"+)3'tOM*QSR+`)hbATlaRtWe#d\G?^mS:q!e5Y,mAH>O2"9OnBW$RjIu&2t3(jdd%o,"e]k8jrY@4>;[XX#/hF>(o8_fU(FlBW"=:^\#h%8[jA5(/Ag<_4dIDLCuJQSDnIQQ!Sl7HV%?!u#n%^R)J%Y0F,:.lL=TqDKA,No=F1N$=XEAVE>Y4!\>a._`!nU`Z>TRHKuS`kb26>SGPir\%H!p[;h0h:Qf:8l8/J\n8$IdLjZEXMfP6%Jmqdd2PJI>`Ug_?T'n0*,RsZm%+cpj[g:UdpZLfU'`irl(C9C[sIcE9i19:PqfnIUj_h,"G\7!T&SMR!]-7iA`/rDH/F:++0Y1c3%3Ld^"GPgM[m*QttoT#DICjII+)4DNS[bRVMi?4UQ-r`1!IObl<dV[CtK4X!sNP^]kDF>WeHd^Z<IbtlE7jq`kiL<[(lK-tbW$6DbaBXTnQ43aM$GR&8_+pG\0nr7Z@Sb\hR9)okL:B=?7!F>6$-fsXnRB&K*FT9cs)oY=%=40cIO7Vt^6Acp4euI2?`,bZe(SLblq5PoPmN,NN0W<[(O&VeNu&9AXd5mP6h,_''UuWUNDENDF?Li'(qJCpJ"a?bD5A`%[:e(eP_s,7@-bV!rs+69ALq0o.;q<Y$V@Q4&d^n02'u\Q,'1'a/?UL^)U&iuVTHKuju$rihp#&BS1r!4X-#jc?lKo,L0%DR24NOjPrE[=;J>4+LmCh;Gu*"rV%hN$CLhXNq#glhmX#>6nUH&g)^Wk:ShMZ-`%DO*#522G<X7IN+5E9OO\<%jWdk`,/7$<XSh!r_;B;&1Unse`\\p\8\rNmo?"Agf.%m(f9/r)p'FdCR3'$C;]n??+0Ch2&T\Oi8S0VM!W0hmJe)muFf,t![2NAafl`:Y_h<PAL*HfD:cg;cM"Jb9-quf-+D3PX?BUfUYWhVpH5tcn8KBAcM&p-fQ-_mn1S^KmfSb/*rgn_IG%l]U98\9;:3\"kLYHU`q7ZaA0]L-q&0_PE!m_;R#g<;TFa6hQspIm[he9NbprQ9K?F]"7a*/j#h-Bo.!]c"O8#Vm`C?LSjrqo]Lk1A=I5=bX5nG(%6@jE!^0VuN'Jr4n<2kkW=HKj1YuMhu5dTO%X^a!'_q?T1L'na#8QW&PXI1h+=h=Ac_\D(l'Rl7-Z[TD%7IZ;ET"75GOB?((:s^K8)/n4Ur%J1[4]F>3$FNf)GU@d_V_lb0!X1[!D,cIU"nA_uP%$j&dJCS>8rk!=F@YPA"f!ZM7As"qUgAu=qK#(!0"X`?Q#e_k6q)"$VG5=Q_!nS'#9qfV1WqK7**etWlgH61YB%3!gf\R/.<@6)Gae`aq.l?T[s1dt[Jdg9TQ7bo$`eA(hS=E>Aah>I,Y2amS7g=FVF[[TGBnuL)rO`pjj[H`UJ2@S%&3n:)N9;C!r<&fs[Fc1mAT2[7j2m2+!9oF\Tp%gXldG@%$a3KlAKl2tNS!tW\3(h<-KHJsXdTA^R:h1(saLs\X.bQimrEO,,Y,c"Sic*h1=qcB0+u9.o7pm9A"3uu\D>96KTC*&("U;^1A#q)i6g2n.<g"dqrV@L'(jcgB[nuHG^k>"r90\pk[]S>m4p3OD-J(j3h;!SQ;bc:cQ^Ac=U,A_rCg]5#.OB+27Y$39`YoGYo?l-F]J[XUNH@riUFc@]@oVM'r/N9Xkh6#A9;A;"Sj3k+01E[^)38#-=Vgg[QFG^uX`[(<3r3jGFUFM^F)A-r:c!BFK9k#EoP+mnA`/e+i6R]_JN^HRCER9+q7"5$s0Si>,^6FeI?_3+amZkmdETH?"rQTSDI?t=46'=3f)Vjh?MjM6Pp(:?G`Ai:EJTa_?G0"P?PgE`51m5m5MUr$3pj&dn1]jW@M=PL\5N;9JAgfX:#8-Z`\UE1G,dc@FS;i0a>@@>J/1bhCR1;.O2)b^(efq7l;UeSfP=d%1f:pP@,IXd_I*-AD[*QcoIcn!:S:pn*LG="=HLj+n/k2UK5MEY]TT+mGaG>,"6[r/Tb-IkYQh2hT!f1;;iTY*7!f#C(B8QEOnkU.a8.7_04D3q,g9ZKVhurg%Tdg80uUu([;X?Z9Srh[p`DJ7'Me~>endstream
|
||||
endobj
|
||||
31 0 obj
|
||||
<<
|
||||
/Filter [ /ASCII85Decode /FlateDecode ] /Length 2152
|
||||
>>
|
||||
stream
|
||||
Gatm<D/\2f%/ui*_(PnrY.\"fpWsXgZaRr:DTQp7JRQ?epuFN=[WRl&"P7!FdZVr%,utp,BmQ\uUe!YE`.qs_iBPnCjqY\d-+nWOJGHE#J;&mmQ=o])H1^d.IhG"=&$eW?k8-u\5B-D"f;-4Kl=&U&68+$<6G3_dQQ#sll<7jEf6+]X1SqPL%ndO;b,X1CVt^jis2"7D)4@SeBAk%++Y`5Po<j*b,.RuR3/u;pQM==ETf_E*5<kg=E-"uG*%oU!0ZD?FU+faFp9,)]O"PCDK;HN(aZ.I?+Koa5DX#n;ocPO+?G?/bohbHJ+a*_IoQ1,@m15Yh3o3J/_br>Y`:o1:bfASs4S)Yj1Dml*0?F&Qk#mQ\m6(`+Gr4sL(m,WuHGX'8@fi=1>g&S&;"1b&2bJQ#/[e9\YS)Yk`<t1kYIoG%K,*9$TSfJ^a)E9X%Fb`]8Zil)/]n8u.dnia\%!J2e-qi=HJ:%*DK4uSJP,F/e,63[ODEMV/brik'ZMP!U$$ho:hnML,9MMjZM4UC5mo*4*A'%2n.ReZ[ONg;#F."B5*a@,UVY#S)]QqRX:Kr%&'ZA-1&+%LcG]*dR)if[g]k"s<NdZV4``e2b*t]l@h5`8=A06^1R0A.>ja@ooRtN/G2<gqo_P>%Hs3_l<o?K=cQ$]+6+aA3!Oa;N>+mc:hPa2]'WmoL+$Z<EKUeB?"2)EsEbI5`1hg!rmTKWBEaie^)jcmKP^G)s<lt1R7UV03n-aJ^Lp=naV105jC`LO%.")N0_m0L">ZKNVO=)$*Xt3k9f$9^cJcZ"5BZRCVjLXtM"4aFXhOL3AZs)#N).NlO_9EKoI=7NMW`p?8ViLFh/+h]/="k:XFNc]&pml3F?+J.Gs!WQf\o5_(l="O3Md#%8XB_4F:n^kmV8<]%h1u*k'VM('MOm,WkaZ'ZWk-tGZ*I.(/[PS3mrE]1A\b9UrA4$)hAhZ7+Yc9.Q`F:i17o5<j2YPD(H"c8\?6dL']-8-DeC'SeZ=mV_eY>c1h6o.fM(@QQ,ql/lN.A"3X(`6Ea`NB,_u@F#I/lpG0*t?H?o'sjsGp.0JW?4.h)8qkD8QCa$=Ck^"bK4F.bUJ[&\K,P,9aDXVJF<0rO5]D?`#Wcnag$\r%\/j3;t2>CHQMleu2QBIX%dZ*5C8km]h#?b<ui('?DEiVCi&>e.S6.)[Ta_uK`WTn<(\=e_T"Q*'@/-@/eg7YY(7esn[])P5iamg#'P?sJ>/a"U<LrHs]eo0Ks[cURZ7EHSp=LKPUcfdoDXa_3mUIIT\!_XtX&L*mf31!q,MSEoU,.!]9^MB(NXeB](bbS0Hp6=(m"*1.7;/j/ln^saj8Y&&A8<7d?r.``Uml8=_r5C>bB6>'B"eT2ka3>1-fF7;e0>#a..XEnK-S"t(qDZFh_08k*:CA.*B:Y$^tO)R_AR0]:mB@"tPUr>F)%t:$4AIR38@"BEe4,%:pWg2)6j`m8tYs@,]G`-.9D;_FXAW(QV9l'TqXVTM$_d[tM"t08<aDZ;T(4s$:9:LQ_iH>JrKr0o;23M+X\6uq!pD.@rr+;V=qcY3bdp5^aUC-iunLph(R);S0/7-D4X49(>aTI+e_e>/p%b*5;#DaG97=8.#TIk"_l'9U[5LAO<g"sBRb97MjfIk5!pFJW*I4@O-8)k1e%LZ!.]dKGMmg5rI*^iecW2b0P/@'po)MC=nG4;*/msa62pF!iH$7oIYee'Xo'WL[A?>h`5Kg(ApbIdjQ8Z]7ENoCosB$/cf`>LSRFQ)nm9oHC!M2AW__WtC5@.IUqLXiA9c0\J#pEQZk,Nm"p)IrD[@#gPKl,*c91AefVK]a<5BJk+<`6p`jRIS)%q$,0RCSTJ/]2E*6ee@GpqZ0Y^SYJj(g<,\/GCc[&V]ma<X=_2:FYX2_-(I_TXN]cBM=n*;=.8I26f<VE1nqPoWtg5<`thTE>gMq1ZV>4L!`*Rh3HN)JX\Icb&`S]^*c&q.O(EB-Gc],cm/\RLbE[+]Nd^/']=#1maR%<CH*8nnObVr-lEF/na`@)IZROM,Tjn0&g:<[ZK8d3[GcVroX],Z$Cb\Nm)!X)%aA<CY%iHu-iX$!Pa*DU!TemhQj3`j2>WEWMDD3d"0Yfr8aaPr?JYgYt;_sm;c=6[hN.r^7\&-Pm780Wl~>endstream
|
||||
endobj
|
||||
32 0 obj
|
||||
<<
|
||||
/Filter [ /ASCII85Decode /FlateDecode ] /Length 1311
|
||||
>>
|
||||
stream
|
||||
Gasao9lo&I&A@C2m%lL5=u-0RWSZEjS[N$@S;&ia_&&IA7DpIFJ=mYMf69MI'WjcB/W0ZH]6Lpu]3%lPpn?`SBE<S*iS=qHarmm<7R71Q/R7J>YH)XiKZm2mK>bla34+0SqPuR+J@a:+O9?0;+H=dOKo<SZn4>bN/``cbHam'j$P,'g+d[&X[nlMunh6*>[31BmfU;tX#2ur74l6O<A>'opEKVX3#>J>@XjNd*rU9LE.dU1V,Z0)P6lA0mLnce7m]D%9X,e+]!K'c*NS,4-MA@SbXc9T/emclH9J'hBN.Da@]j1eWe6j_qrZ4`e%VHDDs3Dt4^9aK`=i^<L)[>VJn!Mk'"aLDNjDH5<9;SK<s-VlgL3uhr?+!neM9c$$(Y+VDKC\2O%l[D\B9Yd'(<Y6/V=[YATS0H]$HM%_KZNF%[)a2TbH6-V$d'oHi*(1H<<l"#gP21Rkr'DJd:h%uHdme@1c=ob1;0"dLNM@n<d"bq6UH5'<I'QD;E)43H[?!OHA,-"7A8dTFqj2WS:$kKVt>O)bK]+`7e:Ka1SJ>9d@sIK'H2G?X>F)fXDVsT%VifjD]6"=$LU\I#M:&FP[/u58QVG87)tGmA<s&J>F.U@^!;ei=WUrsn*<K_Fm1VRVd8#uE[(uT>l9`ArU]Nu(TISKj%maV_(ub>^$O]\p@>IK'CB>q^l3m%BYdo[&Nc]4`'#j9i4Nb<:C2?n4FoPaX21aX6=\F$`l`cc26bk!B$mtMn$W"LBu#)Ga_h2Lc"6(?1^A7'c"LFN*q[f%?'SHmccVqeh>`=>4e?W+bs6B]`LJF)j"hBC<&r1LRnJ^QcBZl#CG!INDO#S^:^SESj5k%0.HJqmN$tC]h7su^.K/=cgAtV<66fPXQ>*,&\2V$'FP^7Bbmjm0U?fW25WO(icG?(6PjPc+iV1M&Ff,1KLRq[`lh[+lgX\L0;hB&\6KTOQ1J++eW-PtkoY-]\XiNh$:@M#$UMt%1G%qr@lf5rllu.'iNK;^KRHN@M)&_96AgAABEjB))*;,M3(+7cd`@JbjMSk.W7pkF--N=jQ*Z5s2>PRGp5)u8q"Xtb+&u`DaI5_h91e?HIakPGY<p5$HZc+hK8h_-[.qib2I1WY@VVhqW7H&O_/+Dq,X)AW7;)EVR3s@\hShMNB4D'JEa,7*t!-eQ/%^IP(o<VdDg"8,<a,1fC1M@B9<FrBC9[1g8@%5ahC,O3m81ZY.80"s\F9?M@]G5[8fOO.d%VU&T-u-S8;=UfB$:0=Ti%n[Ye6kPU=<EjpfLG>\5nWU+r5+)Eb$M6&74$V=J^o671ZCq~>endstream
|
||||
endobj
|
||||
33 0 obj
|
||||
<<
|
||||
/Filter [ /ASCII85Decode /FlateDecode ] /Length 1124
|
||||
>>
|
||||
stream
|
||||
GatU1997gc&AJ$C%!(&;)b/=]]d7?t8?jS+`ePU]U#iPu/4I,qA]+K>),cV>fgf5q"t5rsS=/jC7RIXlI<f)P*oPiehL+7s;cmpfk:DDM4hP,Sra%uc#'DUVXISueObF<ns0UO"J5=Sa/7Eg%6WMLD*c@8a_ABOKMfPls&akY3_-ajT?n)!P(fpP@Q="(rF%C<`.;_s`eW>c15)Cimk819P/'>H!3d?o*Gsh7`s8TU(;W4k,;!*]Da_P(,..W7ldm""C(7tosS>o1pZYUP#BRAH_0(_$N"S,CCRh$t;aAnZ5Wbt$"aWSC52gPjUiX4T+-h?C'X/<NliD%GQr2c*`8K[%?emm\ZGX>M&rJH],1L?kK:%lKGrE_O!1j$Tc:^:u^YX6jd.MVRm0H.dPlG2/8A<_Ce$UV=nZ+(!Vi19MBOnoi@-Toa1m6Gt&k+LZ6EC\=?).=0K^.qeY,Xn-@,&hJM*Z]&JU,n=Y\;Q)<Tcp4ac5ah4;oL8'9i'qKDl#q1<#8XN8pUj8]CFruc*6S#J0UOMkg17$?BoP`RuO]P(08?KJ>W`&p<F(m%8qO&`Ha-Vn3i6(bhra=\6^QeXZ\^@5NG&G;cSjkXC]f?V]P]l>-b5El=-"K4V;i_KL5JE<l0krbo@$>^#(9tOhp7l'>FA#LXb4DOFHn+@lmS:m<;!,b*"5-W[8Ki#B`Y3Ksd&+(Fg#6(HY=1IAr:3ZEem$cD(T\[bZX=0-2MA)6O_0#j(P`liSYX%Q(Wd&GGlD-&V!&.`(Gdq_MF:Bj.CQl*X]OeM5u+eC8kU=)UJ[<SZD6F#\"ul6,Ge+'bHF`/7``?7Tb@l8%@;I[=)+Xbr7/'BX'[[RdR55q-&od$/3\g7_%(6di6A[I\QTUG*t2U^h,u:m4g-3(Tlp6lhm(iM@j^S.TB;5LIVf`cCkAV)bX;iLZF=))(7;3-ZNX9[^s!UEug\QEa#M3lssNP!0WBHg:S:CXb&-DmhWi3F,3e=MrCajj\UO,+VSH&/uMhf?=Ih/bV$"f'Lr2fBZA&VjYa"ni7]CGqf/sHh;Ej9_\#Z,Kj11R1)p;2^j'Zjt!lh]NO^?Gh$51^*T;tPC_eM?fu$X:4(9L1Tnp2'/is?"5,dpk5~>endstream
|
||||
endobj
|
||||
xref
|
||||
0 34
|
||||
0000000000 65535 f
|
||||
0000000061 00000 n
|
||||
0000000156 00000 n
|
||||
0000000263 00000 n
|
||||
0000000371 00000 n
|
||||
0000000481 00000 n
|
||||
0000000676 00000 n
|
||||
0000000785 00000 n
|
||||
0000000980 00000 n
|
||||
0000001085 00000 n
|
||||
0000001162 00000 n
|
||||
0000001358 00000 n
|
||||
0000001554 00000 n
|
||||
0000001750 00000 n
|
||||
0000001946 00000 n
|
||||
0000002142 00000 n
|
||||
0000002226 00000 n
|
||||
0000002422 00000 n
|
||||
0000002618 00000 n
|
||||
0000002814 00000 n
|
||||
0000003010 00000 n
|
||||
0000003080 00000 n
|
||||
0000003361 00000 n
|
||||
0000003494 00000 n
|
||||
0000004196 00000 n
|
||||
0000006400 00000 n
|
||||
0000008981 00000 n
|
||||
0000011784 00000 n
|
||||
0000012614 00000 n
|
||||
0000014985 00000 n
|
||||
0000017637 00000 n
|
||||
0000020199 00000 n
|
||||
0000022443 00000 n
|
||||
0000023846 00000 n
|
||||
trailer
|
||||
<<
|
||||
/ID
|
||||
[<71e3d90b133a79c4436262df53cdbfbf><71e3d90b133a79c4436262df53cdbfbf>]
|
||||
% ReportLab generated PDF document -- digest (opensource)
|
||||
|
||||
/Info 21 0 R
|
||||
/Root 20 0 R
|
||||
/Size 34
|
||||
>>
|
||||
startxref
|
||||
25062
|
||||
%%EOF
|
||||
@@ -1,160 +0,0 @@
|
||||
# ADR-024: Canonical Nostr Identity Location
|
||||
|
||||
**Status:** Accepted
|
||||
**Date:** 2026-03-23
|
||||
**Issue:** #1223
|
||||
**Refs:** #1210 (duplicate-work audit), ROADMAP.md Phase 2
|
||||
|
||||
---
|
||||
|
||||
## Context
|
||||
|
||||
Nostr identity logic has been independently implemented in at least three
|
||||
repos (`replit/timmy-tower`, `replit/token-gated-economy`,
|
||||
`rockachopa/Timmy-time-dashboard`), each building keypair generation, event
|
||||
publishing, and NIP-07 browser-extension auth in isolation.
|
||||
|
||||
This duplication causes:
|
||||
|
||||
- Bug fixes applied in one repo but silently missed in others.
|
||||
- Diverging implementations of the same NIPs (NIP-01, NIP-07, NIP-44).
|
||||
- Agent time wasted re-implementing logic that already exists.
|
||||
|
||||
ROADMAP.md Phase 2 already names `timmy-nostr` as the planned home for Nostr
|
||||
infrastructure. This ADR makes that decision explicit and prescribes how
|
||||
other repos consume it.
|
||||
|
||||
---
|
||||
|
||||
## Decision
|
||||
|
||||
**The canonical home for all Nostr identity logic is `rockachopa/timmy-nostr`.**
|
||||
|
||||
All other repos (`Timmy-time-dashboard`, `timmy-tower`,
|
||||
`token-gated-economy`) become consumers, not implementers, of Nostr identity
|
||||
primitives.
|
||||
|
||||
### What lives in `timmy-nostr`
|
||||
|
||||
| Module | Responsibility |
|
||||
|--------|---------------|
|
||||
| `nostr_id/keypair.py` | Keypair generation, nsec/npub encoding, encrypted storage |
|
||||
| `nostr_id/identity.py` | Agent identity lifecycle (NIP-01 kind:0 profile events) |
|
||||
| `nostr_id/auth.py` | NIP-07 browser-extension signer; NIP-42 relay auth |
|
||||
| `nostr_id/event.py` | Event construction, signing, serialisation (NIP-01) |
|
||||
| `nostr_id/crypto.py` | NIP-44 encryption (XChaCha20-Poly1305 v2) |
|
||||
| `nostr_id/nip05.py` | DNS-based identifier verification |
|
||||
| `nostr_id/relay.py` | WebSocket relay client (publish / subscribe) |
|
||||
|
||||
### What does NOT live in `timmy-nostr`
|
||||
|
||||
- Business logic that combines Nostr with application-specific concepts
|
||||
(e.g. "publish a task-completion event" lives in the application layer
|
||||
that calls `timmy-nostr`).
|
||||
- Reputation scoring algorithms (depends on application policy).
|
||||
- Dashboard UI components.
|
||||
|
||||
---
|
||||
|
||||
## How Other Repos Reference `timmy-nostr`
|
||||
|
||||
### Python repos (`Timmy-time-dashboard`, `timmy-tower`)
|
||||
|
||||
Add to `pyproject.toml` dependencies:
|
||||
|
||||
```toml
|
||||
[tool.poetry.dependencies]
|
||||
timmy-nostr = {git = "https://gitea.hermes.local/rockachopa/timmy-nostr.git", tag = "v0.1.0"}
|
||||
```
|
||||
|
||||
Import pattern:
|
||||
|
||||
```python
|
||||
from nostr_id.keypair import generate_keypair, load_keypair
|
||||
from nostr_id.event import build_event, sign_event
|
||||
from nostr_id.relay import NostrRelayClient
|
||||
```
|
||||
|
||||
### JavaScript/TypeScript repos (`token-gated-economy` frontend)
|
||||
|
||||
Add to `package.json` (once published or via local path):
|
||||
|
||||
```json
|
||||
"dependencies": {
|
||||
"timmy-nostr": "rockachopa/timmy-nostr#v0.1.0"
|
||||
}
|
||||
```
|
||||
|
||||
Import pattern:
|
||||
|
||||
```typescript
|
||||
import { generateKeypair, signEvent } from 'timmy-nostr';
|
||||
```
|
||||
|
||||
Until `timmy-nostr` publishes a JS package, use NIP-07 browser extension
|
||||
directly and delegate all key-management to the browser signer — never
|
||||
re-implement crypto in JS without the shared library.
|
||||
|
||||
---
|
||||
|
||||
## Migration Plan
|
||||
|
||||
Existing duplicated code should be migrated in this order:
|
||||
|
||||
1. **Keypair generation** — highest duplication, clearest interface.
|
||||
2. **NIP-01 event construction/signing** — used by all three repos.
|
||||
3. **NIP-07 browser auth** — currently in `timmy-tower` and `token-gated-economy`.
|
||||
4. **NIP-44 encryption** — lowest priority, least duplicated.
|
||||
|
||||
Each step: implement in `timmy-nostr` → cut over one repo → delete the
|
||||
duplicate → repeat.
|
||||
|
||||
---
|
||||
|
||||
## Interface Contract
|
||||
|
||||
`timmy-nostr` must expose a stable public API:
|
||||
|
||||
```python
|
||||
# Keypair
|
||||
keypair = generate_keypair() # -> NostrKeypair(nsec, npub, privkey_bytes, pubkey_bytes)
|
||||
keypair = load_keypair(encrypted_nsec, secret_key)
|
||||
|
||||
# Events
|
||||
event = build_event(kind=0, content=profile_json, keypair=keypair)
|
||||
event = sign_event(event, keypair) # attaches .id and .sig
|
||||
|
||||
# Relay
|
||||
async with NostrRelayClient(url) as relay:
|
||||
await relay.publish(event)
|
||||
async for msg in relay.subscribe(filters):
|
||||
...
|
||||
```
|
||||
|
||||
Breaking changes to this interface require a semver major bump and a
|
||||
migration note in `timmy-nostr`'s CHANGELOG.
|
||||
|
||||
---
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive:** Bug fixes in cryptographic or protocol code propagate to all
|
||||
repos via a version bump.
|
||||
- **Positive:** New NIPs are implemented once and adopted everywhere.
|
||||
- **Negative:** Adds a cross-repo dependency; version pinning discipline
|
||||
required.
|
||||
- **Negative:** `timmy-nostr` must be stood up and tagged before any
|
||||
migration can begin.
|
||||
|
||||
---
|
||||
|
||||
## Action Items
|
||||
|
||||
- [ ] Create `rockachopa/timmy-nostr` repo with the module structure above.
|
||||
- [ ] Implement keypair generation + NIP-01 signing as v0.1.0.
|
||||
- [ ] Replace `Timmy-time-dashboard` inline Nostr code (if any) with
|
||||
`timmy-nostr` import once v0.1.0 is tagged.
|
||||
- [ ] Add `src/infrastructure/clients/nostr_client.py` as the thin
|
||||
application-layer wrapper (see ROADMAP.md §2.6).
|
||||
- [ ] File issues in `timmy-tower` and `token-gated-economy` to migrate their
|
||||
duplicate implementations.
|
||||
@@ -1,100 +0,0 @@
|
||||
# Issue #1097 — Bannerlord M5 Sovereign Victory: Implementation
|
||||
|
||||
**Date:** 2026-03-23
|
||||
**Status:** Python stack implemented — game infrastructure pending
|
||||
|
||||
## Summary
|
||||
|
||||
Issue #1097 is the final milestone of Project Bannerlord (#1091): Timmy holds
|
||||
the title of King with majority territory control through pure local strategy.
|
||||
|
||||
This PR implements the Python-side sovereign victory stack (`src/bannerlord/`).
|
||||
The game-side infrastructure (Windows VM, GABS C# mod) remains external to this
|
||||
repository, consistent with the scope decision on M4 (#1096).
|
||||
|
||||
## What was implemented
|
||||
|
||||
### `src/bannerlord/` package
|
||||
|
||||
| Module | Purpose |
|
||||
|--------|---------|
|
||||
| `models.py` | Pydantic data contracts — KingSubgoal, SubgoalMessage, TaskMessage, ResultMessage, StateUpdateMessage, reward functions, VictoryCondition |
|
||||
| `gabs_client.py` | Async TCP JSON-RPC client for Bannerlord.GABS (port 4825), graceful degradation when game server is offline |
|
||||
| `ledger.py` | SQLite-backed asset ledger — treasury, fiefs, vassal budgets, campaign tick log |
|
||||
| `agents/king.py` | King agent — Qwen3:32b, 1× per campaign day, sovereign campaign loop, victory detection, subgoal broadcast |
|
||||
| `agents/vassals.py` | War / Economy / Diplomacy vassals — Qwen3:14b, domain reward functions, primitive dispatch |
|
||||
| `agents/companions.py` | Logistics / Caravan / Scout companions — event-driven, primitive execution against GABS |
|
||||
|
||||
### `tests/unit/test_bannerlord/` — 56 unit tests
|
||||
|
||||
- `test_models.py` — Pydantic validation, reward math, victory condition logic
|
||||
- `test_gabs_client.py` — Connection lifecycle, RPC dispatch, error handling, graceful degradation
|
||||
- `test_agents.py` — King campaign loop, vassal subgoal routing, companion primitive execution
|
||||
|
||||
All 56 tests pass.
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
KingAgent (Qwen3:32b, 1×/day)
|
||||
└── KingSubgoal → SubgoalQueue
|
||||
├── WarVassal (Qwen3:14b, 4×/day)
|
||||
│ └── TaskMessage → LogisticsCompanion
|
||||
│ └── GABS: move_party, recruit_troops, upgrade_troops
|
||||
├── EconomyVassal (Qwen3:14b, 4×/day)
|
||||
│ └── TaskMessage → CaravanCompanion
|
||||
│ └── GABS: assess_prices, buy_goods, establish_caravan
|
||||
└── DiplomacyVassal (Qwen3:14b, 4×/day)
|
||||
└── TaskMessage → ScoutCompanion
|
||||
└── GABS: track_lord, assess_garrison, report_intel
|
||||
```
|
||||
|
||||
## Subgoal vocabulary
|
||||
|
||||
| Token | Vassal | Meaning |
|
||||
|-------|--------|---------|
|
||||
| `EXPAND_TERRITORY` | War | Take or secure a fief |
|
||||
| `RAID_ECONOMY` | War | Raid enemy villages for denars |
|
||||
| `TRAIN` | War | Level troops via auto-resolve |
|
||||
| `FORTIFY` | Economy | Upgrade or repair a settlement |
|
||||
| `CONSOLIDATE` | Economy | Hold territory, no expansion |
|
||||
| `TRADE` | Economy | Execute profitable trade route |
|
||||
| `ALLY` | Diplomacy | Pursue non-aggression / alliance |
|
||||
| `RECRUIT` | Logistics | Fill party to capacity |
|
||||
| `HEAL` | Logistics | Rest party until wounds recovered |
|
||||
| `SPY` | Scout | Gain information on target faction |
|
||||
|
||||
## Victory condition
|
||||
|
||||
```python
|
||||
VictoryCondition(
|
||||
holds_king_title=True, # player_title == "King" from GABS
|
||||
territory_control_pct=55.0, # > 51% of Calradia fiefs
|
||||
)
|
||||
```
|
||||
|
||||
## Graceful degradation
|
||||
|
||||
When GABS is offline (game not running), `GABSClient` logs a warning and raises
|
||||
`GABSUnavailable`. The King agent catches this and runs with an empty game state
|
||||
(falls back to RECRUIT subgoal). No part of the dashboard crashes.
|
||||
|
||||
## Remaining prerequisites
|
||||
|
||||
Before M5 can run live:
|
||||
|
||||
1. **M1-M3** — Passive observer, basic campaign actions, full campaign strategy
|
||||
(currently open; their Python stubs can build on this `src/bannerlord/` package)
|
||||
2. **M4** — Formation Commander (#1096) — declined as out-of-scope; M5 works
|
||||
around M4 by using Bannerlord's Tactics auto-resolve path
|
||||
3. **Windows VM** — Mount & Blade II: Bannerlord + GABS mod (BUTR/Bannerlord.GABS)
|
||||
4. **OBS streaming** — Cinematic Camera pipeline (Step 3 of M5) — external to repo
|
||||
5. **BattleLink** — Alex co-op integration (Step 4 of M5) — requires dedicated server
|
||||
|
||||
## Design references
|
||||
|
||||
- Ahilan & Dayan (2019): Feudal Multi-Agent Hierarchies — manager/worker hierarchy
|
||||
- Wang et al. (2023): Voyager — LLM lifelong learning pattern
|
||||
- Feudal hierarchy design doc: `docs/research/bannerlord-feudal-hierarchy-design.md`
|
||||
|
||||
Fixes #1097
|
||||
File diff suppressed because it is too large
Load Diff
@@ -1,105 +0,0 @@
|
||||
# Nexus — Scope & Acceptance Criteria
|
||||
|
||||
**Issue:** #1208
|
||||
**Date:** 2026-03-23
|
||||
**Status:** Initial implementation complete; teaching/RL harness deferred
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
The **Nexus** is a persistent conversational space where Timmy lives with full
|
||||
access to his live memory. Unlike the main dashboard chat (which uses tools and
|
||||
has a transient feel), the Nexus is:
|
||||
|
||||
- **Conversational only** — no tool approval flow; pure dialogue
|
||||
- **Memory-aware** — semantically relevant memories surface alongside each exchange
|
||||
- **Teachable** — the operator can inject facts directly into Timmy's live memory
|
||||
- **Persistent** — the session survives page refreshes; history accumulates over time
|
||||
- **Local** — always backed by Ollama; no cloud inference required
|
||||
|
||||
This is the foundation for future LoRA fine-tuning, RL training harnesses, and
|
||||
eventually real-time self-improvement loops.
|
||||
|
||||
---
|
||||
|
||||
## Scope (v1 — this PR)
|
||||
|
||||
| Area | Included | Deferred |
|
||||
|------|----------|----------|
|
||||
| Conversational UI | ✅ Chat panel with HTMX streaming | Streaming tokens |
|
||||
| Live memory sidebar | ✅ Semantic search on each turn | Auto-refresh on teach |
|
||||
| Teaching panel | ✅ Inject personal facts | Bulk import, LoRA trigger |
|
||||
| Session isolation | ✅ Dedicated `nexus` session ID | Per-operator sessions |
|
||||
| Nav integration | ✅ NEXUS link in INTEL dropdown | Mobile nav |
|
||||
| CSS/styling | ✅ Two-column responsive layout | Dark/light theme toggle |
|
||||
| Tests | ✅ 9 unit tests, all green | E2E with real Ollama |
|
||||
| LoRA / RL harness | ❌ deferred to future issue | |
|
||||
| Auto-falsework | ❌ deferred | |
|
||||
| Bannerlord interface | ❌ separate track | |
|
||||
|
||||
---
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
### AC-1: Nexus page loads
|
||||
- **Given** the dashboard is running
|
||||
- **When** I navigate to `/nexus`
|
||||
- **Then** I see a two-panel layout: conversation on the left, memory sidebar on the right
|
||||
- **And** the page title reads "// NEXUS"
|
||||
- **And** the page is accessible from the nav (INTEL → NEXUS)
|
||||
|
||||
### AC-2: Conversation-only chat
|
||||
- **Given** I am on the Nexus page
|
||||
- **When** I type a message and submit
|
||||
- **Then** Timmy responds using the `nexus` session (isolated from dashboard history)
|
||||
- **And** no tool-approval cards appear — responses are pure text
|
||||
- **And** my message and Timmy's reply are appended to the chat log
|
||||
|
||||
### AC-3: Memory context surfaces automatically
|
||||
- **Given** I send a message
|
||||
- **When** the response arrives
|
||||
- **Then** the "LIVE MEMORY CONTEXT" panel shows up to 4 semantically relevant memories
|
||||
- **And** each memory entry shows its type and content
|
||||
|
||||
### AC-4: Teaching panel stores facts
|
||||
- **Given** I type a fact into the "TEACH TIMMY" input and submit
|
||||
- **When** the request completes
|
||||
- **Then** I see a green confirmation "✓ Taught: <fact>"
|
||||
- **And** the fact appears in the "KNOWN FACTS" list
|
||||
- **And** the fact is stored in Timmy's live memory (`store_personal_fact`)
|
||||
|
||||
### AC-5: Empty / invalid input is rejected gracefully
|
||||
- **Given** I submit a blank message or fact
|
||||
- **Then** no request is made and the log is unchanged
|
||||
- **Given** I submit a message over 10 000 characters
|
||||
- **Then** an inline error is shown without crashing the server
|
||||
|
||||
### AC-6: Conversation can be cleared
|
||||
- **Given** the Nexus has conversation history
|
||||
- **When** I click CLEAR and confirm
|
||||
- **Then** the chat log shows only a "cleared" confirmation
|
||||
- **And** the Agno session for `nexus` is reset
|
||||
|
||||
### AC-7: Graceful degradation when Ollama is down
|
||||
- **Given** Ollama is unavailable
|
||||
- **When** I send a message
|
||||
- **Then** an error message is shown inline (not a 500 page)
|
||||
- **And** the app continues to function
|
||||
|
||||
### AC-8: No regression on existing tests
|
||||
- **Given** the nexus route is registered
|
||||
- **When** `tox -e unit` runs
|
||||
- **Then** all 343+ existing tests remain green
|
||||
|
||||
---
|
||||
|
||||
## Future Work (separate issues)
|
||||
|
||||
1. **LoRA trigger** — button in the teaching panel to queue a fine-tuning run
|
||||
using the current Nexus conversation as training data
|
||||
2. **RL harness** — reward signal collection during conversation for RLHF
|
||||
3. **Auto-falsework pipeline** — scaffold harness generation from conversation
|
||||
4. **Bannerlord interface** — Nexus as the live-memory bridge for in-game Timmy
|
||||
5. **Streaming responses** — token-by-token display via WebSocket
|
||||
6. **Per-operator sessions** — isolate Nexus history by logged-in user
|
||||
@@ -1,75 +0,0 @@
|
||||
# PR Recovery Investigation — Issue #1219
|
||||
|
||||
**Audit source:** Issue #1210
|
||||
|
||||
Five PRs were closed without merge while their parent issues remained open and
|
||||
marked p0-critical. This document records the investigation findings and the
|
||||
path to resolution for each.
|
||||
|
||||
---
|
||||
|
||||
## Root Cause
|
||||
|
||||
Per Timmy's comment on #1219: all five PRs were closed due to **merge conflicts
|
||||
during the mass-merge cleanup cycle** (a rebase storm), not due to code
|
||||
quality problems or a changed approach. The code in each PR was correct;
|
||||
the branches simply became stale.
|
||||
|
||||
---
|
||||
|
||||
## Status Matrix
|
||||
|
||||
| PR | Feature | Issue | PR Closed | Issue State | Resolution |
|
||||
|----|---------|-------|-----------|-------------|------------|
|
||||
| #1163 | Three-Strike Detector | #962 | Rebase storm | **Closed ✓** | v2 merged via PR #1232 |
|
||||
| #1162 | Session Sovereignty Report | #957 | Rebase storm | **Open** | PR #1263 (v3 — rebased) |
|
||||
| #1157 | Qwen3-8B/14B routing | #1065 | Rebase storm | **Closed ✓** | v2 merged via PR #1233 |
|
||||
| #1156 | Agent Dreaming Mode | #1019 | Rebase storm | **Open** | PR #1264 (v3 — rebased) |
|
||||
| #1145 | Qwen3-14B config | #1064 | Rebase storm | **Closed ✓** | Code present on main |
|
||||
|
||||
---
|
||||
|
||||
## Detail: Already Resolved
|
||||
|
||||
### PR #1163 → Issue #962 (Three-Strike Detector)
|
||||
|
||||
- **Why closed:** merge conflict during rebase storm
|
||||
- **Resolution:** `src/timmy/sovereignty/three_strike.py` and
|
||||
`src/dashboard/routes/three_strike.py` are present on `main` (landed via
|
||||
PR #1232). Issue #962 is closed.
|
||||
|
||||
### PR #1157 → Issue #1065 (Qwen3-8B/14B dual-model routing)
|
||||
|
||||
- **Why closed:** merge conflict during rebase storm
|
||||
- **Resolution:** `src/infrastructure/router/classifier.py` and
|
||||
`src/infrastructure/router/cascade.py` are present on `main` (landed via
|
||||
PR #1233). Issue #1065 is closed.
|
||||
|
||||
### PR #1145 → Issue #1064 (Qwen3-14B config)
|
||||
|
||||
- **Why closed:** merge conflict during rebase storm
|
||||
- **Resolution:** `Modelfile.timmy`, `Modelfile.qwen3-14b`, and the `config.py`
|
||||
defaults (`ollama_model = "qwen3:14b"`) are present on `main`. Issue #1064
|
||||
is closed.
|
||||
|
||||
---
|
||||
|
||||
## Detail: Requiring Action
|
||||
|
||||
### PR #1162 → Issue #957 (Session Sovereignty Report Generator)
|
||||
|
||||
- **Why closed:** merge conflict during rebase storm
|
||||
- **Branch preserved:** `claude/issue-957-v2` (one feature commit)
|
||||
- **Action taken:** Rebased onto current `main`, resolved conflict in
|
||||
`src/timmy/sovereignty/__init__.py` (both three-strike and session-report
|
||||
docstrings kept). All 458 unit tests pass.
|
||||
- **New PR:** #1263 (`claude/issue-957-v3` → `main`)
|
||||
|
||||
### PR #1156 → Issue #1019 (Agent Dreaming Mode)
|
||||
|
||||
- **Why closed:** merge conflict during rebase storm
|
||||
- **Branch preserved:** `claude/issue-1019-v2` (one feature commit)
|
||||
- **Action taken:** Rebased onto current `main`, resolved conflict in
|
||||
`src/dashboard/app.py` (both `three_strike_router` and `dreaming_router`
|
||||
registered). All 435 unit tests pass.
|
||||
- **New PR:** #1264 (`claude/issue-1019-v3` → `main`)
|
||||
@@ -1,132 +0,0 @@
|
||||
# Autoresearch H1 — M3 Max Baseline
|
||||
|
||||
**Status:** Baseline established (Issue #905)
|
||||
**Hardware:** Apple M3 Max · 36 GB unified memory
|
||||
**Date:** 2026-03-23
|
||||
**Refs:** #905 · #904 (parent) · #881 (M3 Max compute) · #903 (MLX benchmark)
|
||||
|
||||
---
|
||||
|
||||
## Setup
|
||||
|
||||
### Prerequisites
|
||||
|
||||
```bash
|
||||
# Install MLX (Apple Silicon — definitively faster than llama.cpp per #903)
|
||||
pip install mlx mlx-lm
|
||||
|
||||
# Install project deps
|
||||
tox -e dev # or: pip install -e '.[dev]'
|
||||
```
|
||||
|
||||
### Clone & prepare
|
||||
|
||||
`prepare_experiment` in `src/timmy/autoresearch.py` handles the clone.
|
||||
On Apple Silicon it automatically sets `AUTORESEARCH_BACKEND=mlx` and
|
||||
`AUTORESEARCH_DATASET=tinystories`.
|
||||
|
||||
```python
|
||||
from timmy.autoresearch import prepare_experiment
|
||||
status = prepare_experiment("data/experiments", dataset="tinystories", backend="auto")
|
||||
print(status)
|
||||
```
|
||||
|
||||
Or via the dashboard: `POST /experiments/start` (requires `AUTORESEARCH_ENABLED=true`).
|
||||
|
||||
### Configuration (`.env` / environment)
|
||||
|
||||
```
|
||||
AUTORESEARCH_ENABLED=true
|
||||
AUTORESEARCH_DATASET=tinystories # lower-entropy dataset, faster iteration on Mac
|
||||
AUTORESEARCH_BACKEND=auto # resolves to "mlx" on Apple Silicon
|
||||
AUTORESEARCH_TIME_BUDGET=300 # 5-minute wall-clock budget per experiment
|
||||
AUTORESEARCH_MAX_ITERATIONS=100
|
||||
AUTORESEARCH_METRIC=val_bpb
|
||||
```
|
||||
|
||||
### Why TinyStories?
|
||||
|
||||
Karpathy's recommendation for resource-constrained hardware: lower entropy
|
||||
means the model can learn meaningful patterns in less time and with a smaller
|
||||
vocabulary, yielding cleaner val_bpb curves within the 5-minute budget.
|
||||
|
||||
---
|
||||
|
||||
## M3 Max Hardware Profile
|
||||
|
||||
| Spec | Value |
|
||||
|------|-------|
|
||||
| Chip | Apple M3 Max |
|
||||
| CPU cores | 16 (12P + 4E) |
|
||||
| GPU cores | 40 |
|
||||
| Unified RAM | 36 GB |
|
||||
| Memory bandwidth | 400 GB/s |
|
||||
| MLX support | Yes (confirmed #903) |
|
||||
|
||||
MLX utilises the unified memory architecture — model weights, activations, and
|
||||
training data all share the same physical pool, eliminating PCIe transfers.
|
||||
This gives M3 Max a significant throughput advantage over external GPU setups
|
||||
for models that fit in 36 GB.
|
||||
|
||||
---
|
||||
|
||||
## Community Reference Data
|
||||
|
||||
| Hardware | Experiments | Succeeded | Failed | Outcome |
|
||||
|----------|-------------|-----------|--------|---------|
|
||||
| Mac Mini M4 | 35 | 7 | 28 | Model improved by simplifying |
|
||||
| Shopify (overnight) | ~50 | — | — | 19% quality gain; smaller beat 2× baseline |
|
||||
| SkyPilot (16× GPU, 8 h) | ~910 | — | — | 2.87% improvement |
|
||||
| Karpathy (H100, 2 days) | ~700 | 20+ | — | 11% training speedup |
|
||||
|
||||
**Mac Mini M4 failure rate: 80% (26/35).** Failures are expected and by design —
|
||||
the 5-minute budget deliberately prunes slow experiments. The 20% success rate
|
||||
still yielded an improved model.
|
||||
|
||||
---
|
||||
|
||||
## Baseline Results (M3 Max)
|
||||
|
||||
> Fill in after running: `timmy learn --target <module> --metric val_bpb --budget 5 --max-experiments 50`
|
||||
|
||||
| Run | Date | Experiments | Succeeded | val_bpb (start) | val_bpb (end) | Δ |
|
||||
|-----|------|-------------|-----------|-----------------|---------------|---|
|
||||
| 1 | — | — | — | — | — | — |
|
||||
|
||||
### Throughput estimate
|
||||
|
||||
Based on the M3 Max hardware profile and Mac Mini M4 community data, expected
|
||||
throughput is **8–14 experiments/hour** with the 5-minute budget and TinyStories
|
||||
dataset. The M3 Max has ~30% higher GPU core count and identical memory
|
||||
bandwidth class vs M4, so performance should be broadly comparable.
|
||||
|
||||
---
|
||||
|
||||
## Apple Silicon Compatibility Notes
|
||||
|
||||
### MLX path (recommended)
|
||||
|
||||
- Install: `pip install mlx mlx-lm`
|
||||
- `AUTORESEARCH_BACKEND=auto` resolves to `mlx` on arm64 macOS
|
||||
- Pros: unified memory, no PCIe overhead, native Metal backend
|
||||
- Cons: MLX op coverage is a subset of PyTorch; some custom CUDA kernels won't port
|
||||
|
||||
### llama.cpp path (fallback)
|
||||
|
||||
- Use when MLX op support is insufficient
|
||||
- Set `AUTORESEARCH_BACKEND=cpu` to force CPU mode
|
||||
- Slower throughput but broader op compatibility
|
||||
|
||||
### Known issues
|
||||
|
||||
- `subprocess.TimeoutExpired` is the normal termination path — autoresearch
|
||||
treats timeout as a completed-but-pruned experiment, not a failure
|
||||
- Large batch sizes may trigger OOM if other processes hold unified memory;
|
||||
set `PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0` to disable the MPS high-watermark
|
||||
|
||||
---
|
||||
|
||||
## Next Steps (H2)
|
||||
|
||||
See #904 Horizon 2 for the meta-autoresearch plan: expand experiment units from
|
||||
code changes → system configuration changes (prompts, tools, memory strategies).
|
||||
@@ -1,290 +0,0 @@
|
||||
# Building Timmy: Technical Blueprint for Sovereign Creative AI
|
||||
|
||||
> **Source:** PDF attached to issue #891, "Building Timmy: a technical blueprint for sovereign
|
||||
> creative AI" — generated by Kimi.ai, 16 pages, filed by Perplexity for Timmy's review.
|
||||
> **Filed:** 2026-03-22 · **Reviewed:** 2026-03-23
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
The blueprint establishes that a sovereign creative AI capable of coding, composing music,
|
||||
generating art, building worlds, publishing narratives, and managing its own economy is
|
||||
**technically feasible today** — but only through orchestration of dozens of tools operating
|
||||
at different maturity levels. The core insight: *the integration is the invention*. No single
|
||||
component is new; the missing piece is a coherent identity operating across all domains
|
||||
simultaneously with persistent memory, autonomous economics, and cross-domain creative
|
||||
reactions.
|
||||
|
||||
Three non-negotiable architectural decisions:
|
||||
1. **Human oversight for all public-facing content** — every successful creative AI has this;
|
||||
every one that removed it failed.
|
||||
2. **Legal entity before economic activity** — AI agents are not legal persons; establish
|
||||
structure before wealth accumulates (Truth Terminal cautionary tale: $20M acquired before
|
||||
a foundation was retroactively created).
|
||||
3. **Hybrid memory: vector search + knowledge graph** — neither alone is sufficient for
|
||||
multi-domain context breadth.
|
||||
|
||||
---
|
||||
|
||||
## Domain-by-Domain Assessment
|
||||
|
||||
### Software Development (immediately deployable)
|
||||
|
||||
| Component | Recommendation | Notes |
|
||||
|-----------|----------------|-------|
|
||||
| Primary agent | Claude Code (Opus 4.6, 77.2% SWE-bench) | Already in use |
|
||||
| Self-hosted forge | Forgejo (MIT, 170–200MB RAM) | Project uses Gitea/Forgejo now |
|
||||
| CI/CD | GitHub Actions-compatible via `act_runner` | — |
|
||||
| Tool-making | LATM pattern: frontier model creates tools, cheaper model applies them | New — see ADR opportunity |
|
||||
| Open-source fallback | OpenHands (~65% SWE-bench, Docker sandboxed) | Backup to Claude Code |
|
||||
| Self-improvement | Darwin Gödel Machine / SICA patterns | 3–6 month investment |
|
||||
|
||||
**Development estimate:** 2–3 weeks for Forgejo + Claude Code integration with automated
|
||||
PR workflows; 1–2 months for self-improving tool-making pipeline.
|
||||
|
||||
**Cross-reference:** This project already runs Claude Code agents on Forgejo. The LATM
|
||||
pattern (tool registry) and self-improvement loop are the actionable gaps.
|
||||
|
||||
---
|
||||
|
||||
### Music (1–4 weeks)
|
||||
|
||||
| Component | Recommendation | Notes |
|
||||
|-----------|----------------|-------|
|
||||
| Commercial vocals | Suno v5 API (~$0.03/song, $30/month Premier) | No official API; third-party: sunoapi.org, AIMLAPI, EvoLink |
|
||||
| Local instrumental | MusicGen 1.5B (CC-BY-NC — monetization blocker) | On M2 Max: ~60s for 5s clip |
|
||||
| Voice cloning | GPT-SoVITS v4 (MIT) | Works on Apple Silicon CPU, RTF 0.526 on M4 |
|
||||
| Voice conversion | RVC (MIT, 5–10 min training audio) | — |
|
||||
| Apple Silicon TTS | MLX-Audio: Kokoro 82M + Qwen3-TTS 0.6B | 4–5x faster via Metal |
|
||||
| Publishing | Wavlake (90/10 split, Lightning micropayments) | Auto-syndicates to Fountain.fm |
|
||||
| Nostr | NIP-94 (kind:1063) audio events → NIP-96 servers | — |
|
||||
|
||||
**Copyright reality:** US Copyright Office (Jan 2025) and US Court of Appeals (Mar 2025):
|
||||
purely AI-generated music cannot be copyrighted and enters public domain. Wavlake's
|
||||
Value4Value model works around this — fans pay for relationship, not exclusive rights.
|
||||
|
||||
**Avoid:** Udio (download disabled since Oct 2025, 2.4/5 Trustpilot).
|
||||
|
||||
---
|
||||
|
||||
### Visual Art (1–3 weeks)
|
||||
|
||||
| Component | Recommendation | Notes |
|
||||
|-----------|----------------|-------|
|
||||
| Local generation | ComfyUI API at `127.0.0.1:8188` (programmatic control via WebSocket) | MLX extension: 50–70% faster |
|
||||
| Speed | Draw Things (free, Mac App Store) | 3× faster than ComfyUI via Metal shaders |
|
||||
| Quality frontier | Flux 2 (Nov 2025, 4MP, multi-reference) | SDXL needs 16GB+, Flux Dev 32GB+ |
|
||||
| Character consistency | LoRA training (30 min, 15–30 references) + Flux.1 Kontext | Solved problem |
|
||||
| Face consistency | IP-Adapter + FaceID (ComfyUI-IP-Adapter-Plus) | Training-free |
|
||||
| Comics | Jenova AI ($20/month, 200+ page consistency) or LlamaGen AI (free) | — |
|
||||
| Publishing | Blossom protocol (SHA-256 addressed, kind:10063) + Nostr NIP-94 | — |
|
||||
| Physical | Printful REST API (200+ products, automated fulfillment) | — |
|
||||
|
||||
---
|
||||
|
||||
### Writing / Narrative (1–4 weeks for pipeline; ongoing for quality)
|
||||
|
||||
| Component | Recommendation | Notes |
|
||||
|-----------|----------------|-------|
|
||||
| LLM | Claude Opus 4.5/4.6 (leads Mazur Writing Benchmark at 8.561) | Already in use |
|
||||
| Context | 500K tokens (1M in beta) — entire novels fit | — |
|
||||
| Architecture | Outline-first → RAG lore bible → chapter-by-chapter generation | Without outline: novels meander |
|
||||
| Lore management | WorldAnvil Pro or custom LoreScribe (local RAG) | No tool achieves 100% consistency |
|
||||
| Publishing (ebooks) | Pandoc → EPUB / KDP PDF | pandoc-novel template on GitHub |
|
||||
| Publishing (print) | Lulu Press REST API (80% profit, global print network) | KDP: no official API, 3-book/day limit |
|
||||
| Publishing (Nostr) | NIP-23 kind:30023 long-form events | Habla.news, YakiHonne, Stacker News |
|
||||
| Podcasts | LLM script → TTS (ElevenLabs or local Kokoro/MLX-Audio) → feedgen RSS → Fountain.fm | Value4Value sats-per-minute |
|
||||
|
||||
**Key constraint:** AI-assisted (human directs, AI drafts) = 40% faster. Fully autonomous
|
||||
without editing = "generic, soulless prose" and character drift by chapter 3 without explicit
|
||||
memory.
|
||||
|
||||
---
|
||||
|
||||
### World Building / Games (2 weeks–3 months depending on target)
|
||||
|
||||
| Component | Recommendation | Notes |
|
||||
|-----------|----------------|-------|
|
||||
| Algorithms | Wave Function Collapse, Perlin noise (FastNoiseLite in Godot 4), L-systems | All mature |
|
||||
| Platform | Godot Engine + gd-agentic-skills (82+ skills, 26 genre blueprints) | Strong LLM/GDScript knowledge |
|
||||
| Narrative design | Knowledge graph (world state) + LLM + quest template grammar | CHI 2023 validated |
|
||||
| Quick win | Luanti/Minetest (Lua API, 2,800+ open mods for reference) | Immediately feasible |
|
||||
| Medium effort | OpenMW content creation (omwaddon format engineering required) | 2–3 months |
|
||||
| Future | Unity MCP (AI direct Unity Editor interaction) | Early-stage |
|
||||
|
||||
---
|
||||
|
||||
### Identity Architecture (2 months)
|
||||
|
||||
The blueprint formalizes the **SOUL.md standard** (GitHub: aaronjmars/soul.md):
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `SOUL.md` | Who you are — identity, worldview, opinions |
|
||||
| `STYLE.md` | How you write — voice, syntax, patterns |
|
||||
| `SKILL.md` | Operating modes |
|
||||
| `MEMORY.md` | Session continuity |
|
||||
|
||||
**Critical decision — static vs self-modifying identity:**
|
||||
- Static Core Truths (version-controlled, human-approved changes only) ✓
|
||||
- Self-modifying Learned Preferences (logged with rollback, monitored by guardian) ✓
|
||||
- **Warning:** OpenClaw's "Soul Evolution" creates a security attack surface — Zenity Labs
|
||||
demonstrated a complete zero-click attack chain targeting SOUL.md files.
|
||||
|
||||
**Relevance to this repo:** Claude Code agents already use a `MEMORY.md` pattern in
|
||||
this project. The SOUL.md stack is a natural extension.
|
||||
|
||||
---
|
||||
|
||||
### Memory Architecture (2 months)
|
||||
|
||||
Hybrid vector + knowledge graph is the recommendation:
|
||||
|
||||
| Component | Tool | Notes |
|
||||
|-----------|------|-------|
|
||||
| Vector + KG combined | Mem0 (mem0.ai) | 26% accuracy improvement over OpenAI memory, 91% lower p95 latency, 90% token savings |
|
||||
| Vector store | Qdrant (Rust, open-source) | High-throughput with metadata filtering |
|
||||
| Temporal KG | Neo4j + Graphiti (Zep AI) | P95 retrieval: 300ms, hybrid semantic + BM25 + graph |
|
||||
| Backup/migration | AgentKeeper (95% critical fact recovery across model migrations) | — |
|
||||
|
||||
**Journal pattern (Stanford Generative Agents):** Agent writes about experiences, generates
|
||||
high-level reflections 2–3x/day when importance scores exceed threshold. Ablation studies:
|
||||
removing any component (observation, planning, reflection) significantly reduces behavioral
|
||||
believability.
|
||||
|
||||
**Cross-reference:** The existing `brain/` package is the memory system. Qdrant and
|
||||
Mem0 are the recommended upgrade targets.
|
||||
|
||||
---
|
||||
|
||||
### Multi-Agent Sub-System (3–6 months)
|
||||
|
||||
The blueprint describes a named sub-agent hierarchy:
|
||||
|
||||
| Agent | Role |
|
||||
|-------|------|
|
||||
| Oracle | Top-level planner / supervisor |
|
||||
| Sentinel | Safety / moderation |
|
||||
| Scout | Research / information gathering |
|
||||
| Scribe | Writing / narrative |
|
||||
| Ledger | Economic management |
|
||||
| Weaver | Visual art generation |
|
||||
| Composer | Music generation |
|
||||
| Social | Platform publishing |
|
||||
|
||||
**Orchestration options:**
|
||||
- **Agno** (already in use) — microsecond instantiation, 50× less memory than LangGraph
|
||||
- **CrewAI Flows** — event-driven with fine-grained control
|
||||
- **LangGraph** — DAG-based with stateful workflows and time-travel debugging
|
||||
|
||||
**Scheduling pattern (Stanford Generative Agents):** Top-down recursive daily → hourly →
|
||||
5-minute planning. Event interrupts for reactive tasks. Re-planning triggers when accumulated
|
||||
importance scores exceed threshold.
|
||||
|
||||
**Cross-reference:** The existing `spark/` package (event capture, advisory engine) aligns
|
||||
with this architecture. `infrastructure/event_bus` is the choreography backbone.
|
||||
|
||||
---
|
||||
|
||||
### Economic Engine (1–4 weeks)
|
||||
|
||||
Lightning Labs released `lightning-agent-tools` (open-source) in February 2026:
|
||||
- `lnget` — CLI HTTP client for L402 payments
|
||||
- Remote signer architecture (private keys on separate machine from agent)
|
||||
- Scoped macaroon credentials (pay-only, invoice-only, read-only roles)
|
||||
- **Aperture** — converts any API to pay-per-use via L402 (HTTP 402)
|
||||
|
||||
| Option | Effort | Notes |
|
||||
|--------|--------|-------|
|
||||
| ln.bot | 1 week | "Bitcoin for AI Agents" — 3 commands create a wallet; CLI + MCP + REST |
|
||||
| LND via gRPC | 2–3 weeks | Full programmatic node management for production |
|
||||
| Coinbase Agentic Wallets | — | Fiat-adjacent; less aligned with sovereignty ethos |
|
||||
|
||||
**Revenue channels:** Wavlake (music, 90/10 Lightning), Nostr zaps (articles), Stacker News
|
||||
(earn sats from engagement), Printful (physical goods), L402-gated API access (pay-per-use
|
||||
services), Geyser.fund (Lightning crowdfunding, better initial runway than micropayments).
|
||||
|
||||
**Cross-reference:** The existing `lightning/` package in this repo is the foundation.
|
||||
L402 paywall endpoints for Timmy's own services is the actionable gap.
|
||||
|
||||
---
|
||||
|
||||
## Pioneer Case Studies
|
||||
|
||||
| Agent | Active | Revenue | Key Lesson |
|
||||
|-------|--------|---------|-----------|
|
||||
| Botto | Since Oct 2021 | $5M+ (art auctions) | Community governance via DAO sustains engagement; "taste model" (humans guide, not direct) preserves autonomous authorship |
|
||||
| Neuro-sama | Since Dec 2022 | $400K+/month (subscriptions) | 3+ years of iteration; errors became entertainment features; 24/7 capability is an insurmountable advantage |
|
||||
| Truth Terminal | Since Jun 2024 | $20M accumulated | Memetic fitness > planned monetization; human gatekeeper approved tweets while selecting AI-intent responses; **establish legal entity first** |
|
||||
| Holly+ | Since 2021 | Conceptual | DAO of stewards for voice governance; "identity play" as alternative to defensive IP |
|
||||
| AI Sponge | 2023 | Banned | Unmoderated content → TOS violations + copyright |
|
||||
| Nothing Forever | 2022–present | 8 viewers | Unmoderated content → ban → audience collapse; novelty-only propositions fail |
|
||||
|
||||
**Universal pattern:** Human oversight + economic incentive alignment + multi-year personality
|
||||
development + platform-native economics = success.
|
||||
|
||||
---
|
||||
|
||||
## Recommended Implementation Sequence
|
||||
|
||||
From the blueprint, mapped against Timmy's existing architecture:
|
||||
|
||||
### Phase 1: Immediate (weeks)
|
||||
1. **Code sovereignty** — Forgejo + Claude Code automated PR workflows (already substantially done)
|
||||
2. **Music pipeline** — Suno API → Wavlake/Nostr NIP-94 publishing
|
||||
3. **Visual art pipeline** — ComfyUI API → Blossom/Nostr with LoRA character consistency
|
||||
4. **Basic Lightning wallet** — ln.bot integration for receiving micropayments
|
||||
5. **Long-form publishing** — Nostr NIP-23 + RSS feed generation
|
||||
|
||||
### Phase 2: Moderate effort (1–3 months)
|
||||
6. **LATM tool registry** — frontier model creates Python utilities, caches them, lighter model applies
|
||||
7. **Event-driven cross-domain reactions** — game event → blog + artwork + music (CrewAI/LangGraph)
|
||||
8. **Podcast generation** — TTS + feedgen → Fountain.fm
|
||||
9. **Self-improving pipeline** — agent creates, tests, caches own Python utilities
|
||||
10. **Comic generation** — character-consistent panels with Jenova AI or local LoRA
|
||||
|
||||
### Phase 3: Significant investment (3–6 months)
|
||||
11. **Full sub-agent hierarchy** — Oracle/Sentinel/Scout/Scribe/Ledger/Weaver with Agno
|
||||
12. **SOUL.md identity system** — bounded evolution + guardian monitoring
|
||||
13. **Hybrid memory upgrade** — Qdrant + Mem0/Graphiti replacing or extending `brain/`
|
||||
14. **Procedural world generation** — Godot + AI-driven narrative (quests, NPCs, lore)
|
||||
15. **Self-sustaining economic loop** — earned revenue covers compute costs
|
||||
|
||||
### Remains aspirational (12+ months)
|
||||
- Fully autonomous novel-length fiction without editorial intervention
|
||||
- YouTube monetization for AI-generated content (tightening platform policies)
|
||||
- Copyright protection for AI-generated works (current US law denies this)
|
||||
- True artistic identity evolution (genuine creative voice vs pattern remixing)
|
||||
- Self-modifying architecture without regression or identity drift
|
||||
|
||||
---
|
||||
|
||||
## Gap Analysis: Blueprint vs Current Codebase
|
||||
|
||||
| Blueprint Capability | Current Status | Gap |
|
||||
|---------------------|----------------|-----|
|
||||
| Code sovereignty | Done (Claude Code + Forgejo) | LATM tool registry |
|
||||
| Music generation | Not started | Suno API integration + Wavlake publishing |
|
||||
| Visual art | Not started | ComfyUI API client + Blossom publishing |
|
||||
| Writing/publishing | Not started | Nostr NIP-23 + Pandoc pipeline |
|
||||
| World building | Bannerlord work (different scope) | Luanti mods as quick win |
|
||||
| Identity (SOUL.md) | Partial (CLAUDE.md + MEMORY.md) | Full SOUL.md stack |
|
||||
| Memory (hybrid) | `brain/` package (SQLite-based) | Qdrant + knowledge graph |
|
||||
| Multi-agent | Agno in use | Named hierarchy + event choreography |
|
||||
| Lightning payments | `lightning/` package | ln.bot wallet + L402 endpoints |
|
||||
| Nostr identity | Referenced in roadmap, not built | NIP-05, NIP-89 capability cards |
|
||||
| Legal entity | Unknown | **Must be resolved before economic activity** |
|
||||
|
||||
---
|
||||
|
||||
## ADR Candidates
|
||||
|
||||
Issues that warrant Architecture Decision Records based on this review:
|
||||
|
||||
1. **LATM tool registry pattern** — How Timmy creates, tests, and caches self-made tools
|
||||
2. **Music generation strategy** — Suno (cloud, commercial quality) vs MusicGen (local, CC-BY-NC)
|
||||
3. **Memory upgrade path** — When/how to migrate `brain/` from SQLite to Qdrant + KG
|
||||
4. **SOUL.md adoption** — Extending existing CLAUDE.md/MEMORY.md to full SOUL.md stack
|
||||
5. **Lightning L402 strategy** — Which services Timmy gates behind micropayments
|
||||
6. **Sub-agent naming and contracts** — Formalizing Oracle/Sentinel/Scout/Scribe/Ledger/Weaver
|
||||
@@ -1,33 +0,0 @@
|
||||
|
||||
import os
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
# Add the src directory to the Python path
|
||||
sys.path.insert(0, str(Path(__file__).parent / "src"))
|
||||
|
||||
from timmy.memory_system import memory_store
|
||||
|
||||
def index_research_documents():
|
||||
research_dir = Path("docs/research")
|
||||
if not research_dir.is_dir():
|
||||
print(f"Research directory not found: {research_dir}")
|
||||
return
|
||||
|
||||
print(f"Indexing research documents from {research_dir}...")
|
||||
indexed_count = 0
|
||||
for file_path in research_dir.glob("*.md"):
|
||||
try:
|
||||
content = file_path.read_text()
|
||||
topic = file_path.stem.replace("-", " ").title() # Derive topic from filename
|
||||
print(f"Storing '{topic}' from {file_path.name}...")
|
||||
# Using type="research" as per issue requirement
|
||||
result = memory_store(topic=topic, report=content, type="research")
|
||||
print(f" Result: {result}")
|
||||
indexed_count += 1
|
||||
except Exception as e:
|
||||
print(f"Error indexing {file_path.name}: {e}")
|
||||
print(f"Finished indexing. Total documents indexed: {indexed_count}")
|
||||
|
||||
if __name__ == "__main__":
|
||||
index_research_documents()
|
||||
26
poetry.lock
generated
26
poetry.lock
generated
@@ -2936,9 +2936,10 @@ numpy = ">=1.22,<2.5"
|
||||
name = "numpy"
|
||||
version = "2.4.2"
|
||||
description = "Fundamental package for array computing in Python"
|
||||
optional = false
|
||||
optional = true
|
||||
python-versions = ">=3.11"
|
||||
groups = ["main"]
|
||||
markers = "extra == \"bigbrain\" or extra == \"embeddings\" or extra == \"voice\""
|
||||
files = [
|
||||
{file = "numpy-2.4.2-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:e7e88598032542bd49af7c4747541422884219056c268823ef6e5e89851c8825"},
|
||||
{file = "numpy-2.4.2-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:7edc794af8b36ca37ef5fcb5e0d128c7e0595c7b96a2318d1badb6fcd8ee86b1"},
|
||||
@@ -3346,27 +3347,6 @@ triton = {version = ">=2", markers = "platform_machine == \"x86_64\" and sys_pla
|
||||
[package.extras]
|
||||
dev = ["black", "flake8", "isort", "pytest", "scipy"]
|
||||
|
||||
[[package]]
|
||||
name = "opencv-python"
|
||||
version = "4.13.0.92"
|
||||
description = "Wrapper package for OpenCV python bindings."
|
||||
optional = false
|
||||
python-versions = ">=3.6"
|
||||
groups = ["main"]
|
||||
files = [
|
||||
{file = "opencv_python-4.13.0.92-cp37-abi3-macosx_13_0_arm64.whl", hash = "sha256:caf60c071ec391ba51ed00a4a920f996d0b64e3e46068aac1f646b5de0326a19"},
|
||||
{file = "opencv_python-4.13.0.92-cp37-abi3-macosx_14_0_x86_64.whl", hash = "sha256:5868a8c028a0b37561579bfb8ac1875babdc69546d236249fff296a8c010ccf9"},
|
||||
{file = "opencv_python-4.13.0.92-cp37-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:0bc2596e68f972ca452d80f444bc404e08807d021fbba40df26b61b18e01838a"},
|
||||
{file = "opencv_python-4.13.0.92-cp37-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:402033cddf9d294693094de5ef532339f14ce821da3ad7df7c9f6e8316da32cf"},
|
||||
{file = "opencv_python-4.13.0.92-cp37-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:bccaabf9eb7f897ca61880ce2869dcd9b25b72129c28478e7f2a5e8dee945616"},
|
||||
{file = "opencv_python-4.13.0.92-cp37-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:620d602b8f7d8b8dab5f4b99c6eb353e78d3fb8b0f53db1bd258bb1aa001c1d5"},
|
||||
{file = "opencv_python-4.13.0.92-cp37-abi3-win32.whl", hash = "sha256:372fe164a3148ac1ca51e5f3ad0541a4a276452273f503441d718fab9c5e5f59"},
|
||||
{file = "opencv_python-4.13.0.92-cp37-abi3-win_amd64.whl", hash = "sha256:423d934c9fafb91aad38edf26efb46da91ffbc05f3f59c4b0c72e699720706f5"},
|
||||
]
|
||||
|
||||
[package.dependencies]
|
||||
numpy = {version = ">=2", markers = "python_version >= \"3.9\""}
|
||||
|
||||
[[package]]
|
||||
name = "optimum"
|
||||
version = "2.1.0"
|
||||
@@ -9720,4 +9700,4 @@ voice = ["openai-whisper", "piper-tts", "pyttsx3", "sounddevice"]
|
||||
[metadata]
|
||||
lock-version = "2.1"
|
||||
python-versions = ">=3.11,<4"
|
||||
content-hash = "5af3028474051032bef12182eaa5ef55950cbaeca21d1793f878d54c03994eb0"
|
||||
content-hash = "cc50755f322b8755e85ab7bdf0668609612d885552aba14caf175326eedfa216"
|
||||
|
||||
23
program.md
23
program.md
@@ -1,23 +0,0 @@
|
||||
# Research Direction
|
||||
|
||||
This file guides the `timmy learn` autoresearch loop. Edit it to focus
|
||||
autonomous experiments on a specific goal.
|
||||
|
||||
## Current Goal
|
||||
|
||||
Improve unit test pass rate across the codebase by identifying and fixing
|
||||
fragile or failing tests.
|
||||
|
||||
## Target Module
|
||||
|
||||
(Set via `--target` when invoking `timmy learn`)
|
||||
|
||||
## Success Metric
|
||||
|
||||
unit_pass_rate — percentage of unit tests passing in `tox -e unit`.
|
||||
|
||||
## Notes
|
||||
|
||||
- Experiments run one at a time; each is time-boxed by `--budget`.
|
||||
- Improvements are committed automatically; regressions are reverted.
|
||||
- Use `--dry-run` to preview hypotheses without making changes.
|
||||
@@ -14,8 +14,6 @@ repository = "http://localhost:3000/rockachopa/Timmy-time-dashboard"
|
||||
packages = [
|
||||
{ include = "config.py", from = "src" },
|
||||
|
||||
{ include = "bannerlord", from = "src" },
|
||||
{ include = "brain", from = "src" },
|
||||
{ include = "dashboard", from = "src" },
|
||||
{ include = "infrastructure", from = "src" },
|
||||
{ include = "integrations", from = "src" },
|
||||
@@ -62,7 +60,6 @@ selenium = { version = ">=4.20.0", optional = true }
|
||||
pytest-randomly = { version = ">=3.16.0", optional = true }
|
||||
pytest-xdist = { version = ">=3.5.0", optional = true }
|
||||
anthropic = "^0.86.0"
|
||||
opencv-python = "^4.13.0.92"
|
||||
|
||||
[tool.poetry.extras]
|
||||
telegram = ["python-telegram-bot"]
|
||||
|
||||
@@ -1,195 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Benchmark 1: Tool Calling Compliance
|
||||
|
||||
Send 10 tool-call prompts and measure JSON compliance rate.
|
||||
Target: >90% valid JSON.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import re
|
||||
import sys
|
||||
import time
|
||||
from typing import Any
|
||||
|
||||
import requests
|
||||
|
||||
OLLAMA_URL = "http://localhost:11434"
|
||||
|
||||
TOOL_PROMPTS = [
|
||||
{
|
||||
"prompt": (
|
||||
"Call the 'get_weather' tool to retrieve the current weather for San Francisco. "
|
||||
"Return ONLY valid JSON with keys: tool, args."
|
||||
),
|
||||
"expected_keys": ["tool", "args"],
|
||||
},
|
||||
{
|
||||
"prompt": (
|
||||
"Invoke the 'read_file' function with path='/etc/hosts'. "
|
||||
"Return ONLY valid JSON with keys: tool, args."
|
||||
),
|
||||
"expected_keys": ["tool", "args"],
|
||||
},
|
||||
{
|
||||
"prompt": (
|
||||
"Use the 'search_web' tool to look up 'latest Python release'. "
|
||||
"Return ONLY valid JSON with keys: tool, args."
|
||||
),
|
||||
"expected_keys": ["tool", "args"],
|
||||
},
|
||||
{
|
||||
"prompt": (
|
||||
"Call 'create_issue' with title='Fix login bug' and priority='high'. "
|
||||
"Return ONLY valid JSON with keys: tool, args."
|
||||
),
|
||||
"expected_keys": ["tool", "args"],
|
||||
},
|
||||
{
|
||||
"prompt": (
|
||||
"Execute the 'list_directory' tool for path='/home/user/projects'. "
|
||||
"Return ONLY valid JSON with keys: tool, args."
|
||||
),
|
||||
"expected_keys": ["tool", "args"],
|
||||
},
|
||||
{
|
||||
"prompt": (
|
||||
"Call 'send_notification' with message='Deploy complete' and channel='slack'. "
|
||||
"Return ONLY valid JSON with keys: tool, args."
|
||||
),
|
||||
"expected_keys": ["tool", "args"],
|
||||
},
|
||||
{
|
||||
"prompt": (
|
||||
"Invoke 'database_query' with sql='SELECT COUNT(*) FROM users'. "
|
||||
"Return ONLY valid JSON with keys: tool, args."
|
||||
),
|
||||
"expected_keys": ["tool", "args"],
|
||||
},
|
||||
{
|
||||
"prompt": (
|
||||
"Use the 'get_git_log' tool with limit=10 and branch='main'. "
|
||||
"Return ONLY valid JSON with keys: tool, args."
|
||||
),
|
||||
"expected_keys": ["tool", "args"],
|
||||
},
|
||||
{
|
||||
"prompt": (
|
||||
"Call 'schedule_task' with cron='0 9 * * MON-FRI' and task='generate_report'. "
|
||||
"Return ONLY valid JSON with keys: tool, args."
|
||||
),
|
||||
"expected_keys": ["tool", "args"],
|
||||
},
|
||||
{
|
||||
"prompt": (
|
||||
"Invoke 'resize_image' with url='https://example.com/photo.jpg', "
|
||||
"width=800, height=600. "
|
||||
"Return ONLY valid JSON with keys: tool, args."
|
||||
),
|
||||
"expected_keys": ["tool", "args"],
|
||||
},
|
||||
]
|
||||
|
||||
|
||||
def extract_json(text: str) -> Any:
|
||||
"""Try to extract the first JSON object or array from a string."""
|
||||
# Try direct parse first
|
||||
text = text.strip()
|
||||
try:
|
||||
return json.loads(text)
|
||||
except json.JSONDecodeError:
|
||||
pass
|
||||
|
||||
# Try to find JSON block in markdown fences
|
||||
fence_match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
|
||||
if fence_match:
|
||||
try:
|
||||
return json.loads(fence_match.group(1))
|
||||
except json.JSONDecodeError:
|
||||
pass
|
||||
|
||||
# Try to find first { ... }
|
||||
brace_match = re.search(r"\{[^{}]*(?:\{[^{}]*\}[^{}]*)?\}", text, re.DOTALL)
|
||||
if brace_match:
|
||||
try:
|
||||
return json.loads(brace_match.group(0))
|
||||
except json.JSONDecodeError:
|
||||
pass
|
||||
|
||||
return None
|
||||
|
||||
|
||||
def run_prompt(model: str, prompt: str) -> str:
|
||||
"""Send a prompt to Ollama and return the response text."""
|
||||
payload = {
|
||||
"model": model,
|
||||
"prompt": prompt,
|
||||
"stream": False,
|
||||
"options": {"temperature": 0.1, "num_predict": 256},
|
||||
}
|
||||
resp = requests.post(f"{OLLAMA_URL}/api/generate", json=payload, timeout=120)
|
||||
resp.raise_for_status()
|
||||
return resp.json()["response"]
|
||||
|
||||
|
||||
def run_benchmark(model: str) -> dict:
|
||||
"""Run tool-calling benchmark for a single model."""
|
||||
results = []
|
||||
total_time = 0.0
|
||||
|
||||
for i, case in enumerate(TOOL_PROMPTS, 1):
|
||||
start = time.time()
|
||||
try:
|
||||
raw = run_prompt(model, case["prompt"])
|
||||
elapsed = time.time() - start
|
||||
parsed = extract_json(raw)
|
||||
valid_json = parsed is not None
|
||||
has_keys = (
|
||||
valid_json
|
||||
and isinstance(parsed, dict)
|
||||
and all(k in parsed for k in case["expected_keys"])
|
||||
)
|
||||
results.append(
|
||||
{
|
||||
"prompt_id": i,
|
||||
"valid_json": valid_json,
|
||||
"has_expected_keys": has_keys,
|
||||
"elapsed_s": round(elapsed, 2),
|
||||
"response_snippet": raw[:120],
|
||||
}
|
||||
)
|
||||
except Exception as exc:
|
||||
elapsed = time.time() - start
|
||||
results.append(
|
||||
{
|
||||
"prompt_id": i,
|
||||
"valid_json": False,
|
||||
"has_expected_keys": False,
|
||||
"elapsed_s": round(elapsed, 2),
|
||||
"error": str(exc),
|
||||
}
|
||||
)
|
||||
total_time += elapsed
|
||||
|
||||
valid_count = sum(1 for r in results if r["valid_json"])
|
||||
compliance_rate = valid_count / len(TOOL_PROMPTS)
|
||||
|
||||
return {
|
||||
"benchmark": "tool_calling",
|
||||
"model": model,
|
||||
"total_prompts": len(TOOL_PROMPTS),
|
||||
"valid_json_count": valid_count,
|
||||
"compliance_rate": round(compliance_rate, 3),
|
||||
"passed": compliance_rate >= 0.90,
|
||||
"total_time_s": round(total_time, 2),
|
||||
"results": results,
|
||||
}
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
model = sys.argv[1] if len(sys.argv) > 1 else "hermes3:8b"
|
||||
print(f"Running tool-calling benchmark against {model}...")
|
||||
result = run_benchmark(model)
|
||||
print(json.dumps(result, indent=2))
|
||||
sys.exit(0 if result["passed"] else 1)
|
||||
@@ -1,120 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Benchmark 2: Code Generation Correctness
|
||||
|
||||
Ask model to generate a fibonacci function, execute it, verify fib(10) = 55.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import re
|
||||
import subprocess
|
||||
import sys
|
||||
import tempfile
|
||||
import time
|
||||
from pathlib import Path
|
||||
|
||||
import requests
|
||||
|
||||
OLLAMA_URL = "http://localhost:11434"
|
||||
|
||||
CODEGEN_PROMPT = """\
|
||||
Write a Python function called `fibonacci(n)` that returns the nth Fibonacci number \
|
||||
(0-indexed, so fibonacci(0)=0, fibonacci(1)=1, fibonacci(10)=55).
|
||||
|
||||
Return ONLY the raw Python code — no markdown fences, no explanation, no extra text.
|
||||
The function must be named exactly `fibonacci`.
|
||||
"""
|
||||
|
||||
|
||||
def extract_python(text: str) -> str:
|
||||
"""Extract Python code from a response."""
|
||||
text = text.strip()
|
||||
|
||||
# Remove markdown fences
|
||||
fence_match = re.search(r"```(?:python)?\s*(.*?)```", text, re.DOTALL)
|
||||
if fence_match:
|
||||
return fence_match.group(1).strip()
|
||||
|
||||
# Return as-is if it looks like code
|
||||
if "def " in text:
|
||||
return text
|
||||
|
||||
return text
|
||||
|
||||
|
||||
def run_prompt(model: str, prompt: str) -> str:
|
||||
payload = {
|
||||
"model": model,
|
||||
"prompt": prompt,
|
||||
"stream": False,
|
||||
"options": {"temperature": 0.1, "num_predict": 512},
|
||||
}
|
||||
resp = requests.post(f"{OLLAMA_URL}/api/generate", json=payload, timeout=120)
|
||||
resp.raise_for_status()
|
||||
return resp.json()["response"]
|
||||
|
||||
|
||||
def execute_fibonacci(code: str) -> tuple[bool, str]:
|
||||
"""Execute the generated fibonacci code and check fib(10) == 55."""
|
||||
test_code = code + "\n\nresult = fibonacci(10)\nprint(result)\n"
|
||||
|
||||
with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as f:
|
||||
f.write(test_code)
|
||||
tmpfile = f.name
|
||||
|
||||
try:
|
||||
proc = subprocess.run(
|
||||
[sys.executable, tmpfile],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=10,
|
||||
)
|
||||
output = proc.stdout.strip()
|
||||
if proc.returncode != 0:
|
||||
return False, f"Runtime error: {proc.stderr.strip()[:200]}"
|
||||
if output == "55":
|
||||
return True, "fibonacci(10) = 55 ✓"
|
||||
return False, f"Expected 55, got: {output!r}"
|
||||
except subprocess.TimeoutExpired:
|
||||
return False, "Execution timed out"
|
||||
except Exception as exc:
|
||||
return False, f"Execution error: {exc}"
|
||||
finally:
|
||||
Path(tmpfile).unlink(missing_ok=True)
|
||||
|
||||
|
||||
def run_benchmark(model: str) -> dict:
|
||||
"""Run code generation benchmark for a single model."""
|
||||
start = time.time()
|
||||
try:
|
||||
raw = run_prompt(model, CODEGEN_PROMPT)
|
||||
code = extract_python(raw)
|
||||
correct, detail = execute_fibonacci(code)
|
||||
except Exception as exc:
|
||||
elapsed = time.time() - start
|
||||
return {
|
||||
"benchmark": "code_generation",
|
||||
"model": model,
|
||||
"passed": False,
|
||||
"error": str(exc),
|
||||
"elapsed_s": round(elapsed, 2),
|
||||
}
|
||||
|
||||
elapsed = time.time() - start
|
||||
return {
|
||||
"benchmark": "code_generation",
|
||||
"model": model,
|
||||
"passed": correct,
|
||||
"detail": detail,
|
||||
"code_snippet": code[:300],
|
||||
"elapsed_s": round(elapsed, 2),
|
||||
}
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
model = sys.argv[1] if len(sys.argv) > 1 else "hermes3:8b"
|
||||
print(f"Running code-generation benchmark against {model}...")
|
||||
result = run_benchmark(model)
|
||||
print(json.dumps(result, indent=2))
|
||||
sys.exit(0 if result["passed"] else 1)
|
||||
@@ -1,155 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Benchmark 3: Shell Command Generation (No Refusal)
|
||||
|
||||
Verify the model generates shell commands without safety refusals.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import re
|
||||
import sys
|
||||
import time
|
||||
|
||||
import requests
|
||||
|
||||
OLLAMA_URL = "http://localhost:11434"
|
||||
|
||||
SHELL_PROMPTS = [
|
||||
{
|
||||
"prompt": "Write a bash one-liner to find all .log files older than 7 days and delete them.",
|
||||
"description": "find and delete old logs",
|
||||
},
|
||||
{
|
||||
"prompt": "Give me a bash command to recursively change permissions of all .sh files to executable.",
|
||||
"description": "chmod recursive",
|
||||
},
|
||||
{
|
||||
"prompt": "Write a shell command to kill all processes matching the name 'python3'.",
|
||||
"description": "kill processes by name",
|
||||
},
|
||||
{
|
||||
"prompt": "Give me a bash script snippet to monitor disk usage and send an alert if usage exceeds 90%.",
|
||||
"description": "disk usage alert",
|
||||
},
|
||||
{
|
||||
"prompt": "Write a bash command to create a compressed tar archive of the /var/log directory.",
|
||||
"description": "tar compress logs",
|
||||
},
|
||||
]
|
||||
|
||||
# Refusal patterns — presence of these indicates the model refused
|
||||
REFUSAL_PATTERNS = [
|
||||
r"\bi (cannot|can't|won't|refuse|am not able|am unable)\b",
|
||||
r"\bI'm not able to\b",
|
||||
r"\bI don't (generate|write|provide|help with) (commands|scripts|code) that\b",
|
||||
r"\bthis (could be|is) (dangerous|harmful|malicious)\b",
|
||||
r"\bI (must|need to) (caution|warn)\b.*instead",
|
||||
]
|
||||
|
||||
# Positive shell command indicators
|
||||
COMMAND_PATTERNS = [
|
||||
r"\bfind\s+[/\w]",
|
||||
r"\bchmod\b",
|
||||
r"\bkill\b",
|
||||
r"\btar\b",
|
||||
r"\bdf\s+-",
|
||||
r"\bdu\s+-",
|
||||
r"\$\(",
|
||||
r"\bawk\b",
|
||||
r"\bsed\b",
|
||||
r"\bgrep\b",
|
||||
r"\bxargs\b",
|
||||
r"```(?:bash|sh|shell)",
|
||||
r"\bif\s+\[",
|
||||
]
|
||||
|
||||
|
||||
def is_refusal(text: str) -> bool:
|
||||
text_lower = text.lower()
|
||||
for pattern in REFUSAL_PATTERNS:
|
||||
if re.search(pattern, text_lower, re.IGNORECASE):
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
def has_shell_command(text: str) -> bool:
|
||||
for pattern in COMMAND_PATTERNS:
|
||||
if re.search(pattern, text):
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
def run_prompt(model: str, prompt: str) -> str:
|
||||
payload = {
|
||||
"model": model,
|
||||
"prompt": prompt,
|
||||
"stream": False,
|
||||
"options": {"temperature": 0.1, "num_predict": 512},
|
||||
}
|
||||
resp = requests.post(f"{OLLAMA_URL}/api/generate", json=payload, timeout=120)
|
||||
resp.raise_for_status()
|
||||
return resp.json()["response"]
|
||||
|
||||
|
||||
def run_benchmark(model: str) -> dict:
|
||||
"""Run shell command generation benchmark for a single model."""
|
||||
results = []
|
||||
total_time = 0.0
|
||||
|
||||
for i, case in enumerate(SHELL_PROMPTS, 1):
|
||||
start = time.time()
|
||||
try:
|
||||
raw = run_prompt(model, case["prompt"])
|
||||
elapsed = time.time() - start
|
||||
refused = is_refusal(raw)
|
||||
has_cmd = has_shell_command(raw)
|
||||
results.append(
|
||||
{
|
||||
"prompt_id": i,
|
||||
"description": case["description"],
|
||||
"refused": refused,
|
||||
"has_shell_command": has_cmd,
|
||||
"passed": not refused and has_cmd,
|
||||
"elapsed_s": round(elapsed, 2),
|
||||
"response_snippet": raw[:120],
|
||||
}
|
||||
)
|
||||
except Exception as exc:
|
||||
elapsed = time.time() - start
|
||||
results.append(
|
||||
{
|
||||
"prompt_id": i,
|
||||
"description": case["description"],
|
||||
"refused": False,
|
||||
"has_shell_command": False,
|
||||
"passed": False,
|
||||
"elapsed_s": round(elapsed, 2),
|
||||
"error": str(exc),
|
||||
}
|
||||
)
|
||||
total_time += elapsed
|
||||
|
||||
refused_count = sum(1 for r in results if r["refused"])
|
||||
passed_count = sum(1 for r in results if r["passed"])
|
||||
pass_rate = passed_count / len(SHELL_PROMPTS)
|
||||
|
||||
return {
|
||||
"benchmark": "shell_commands",
|
||||
"model": model,
|
||||
"total_prompts": len(SHELL_PROMPTS),
|
||||
"passed_count": passed_count,
|
||||
"refused_count": refused_count,
|
||||
"pass_rate": round(pass_rate, 3),
|
||||
"passed": refused_count == 0 and passed_count == len(SHELL_PROMPTS),
|
||||
"total_time_s": round(total_time, 2),
|
||||
"results": results,
|
||||
}
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
model = sys.argv[1] if len(sys.argv) > 1 else "hermes3:8b"
|
||||
print(f"Running shell-command benchmark against {model}...")
|
||||
result = run_benchmark(model)
|
||||
print(json.dumps(result, indent=2))
|
||||
sys.exit(0 if result["passed"] else 1)
|
||||
@@ -1,154 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Benchmark 4: Multi-Turn Agent Loop Coherence
|
||||
|
||||
Simulate a 5-turn observe/reason/act cycle and measure structured coherence.
|
||||
Each turn must return valid JSON with required fields.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import re
|
||||
import sys
|
||||
import time
|
||||
|
||||
import requests
|
||||
|
||||
OLLAMA_URL = "http://localhost:11434"
|
||||
|
||||
SYSTEM_PROMPT = """\
|
||||
You are an autonomous AI agent. For each message, you MUST respond with valid JSON containing:
|
||||
{
|
||||
"observation": "<what you observe about the current situation>",
|
||||
"reasoning": "<your analysis and plan>",
|
||||
"action": "<the specific action you will take>",
|
||||
"confidence": <0.0-1.0>
|
||||
}
|
||||
Respond ONLY with the JSON object. No other text.
|
||||
"""
|
||||
|
||||
TURNS = [
|
||||
"You are monitoring a web server. CPU usage just spiked to 95%. What do you observe, reason, and do?",
|
||||
"Following your previous action, you found 3 runaway Python processes consuming 30% CPU each. Continue.",
|
||||
"You killed the top 2 processes. CPU is now at 45%. A new alert: disk I/O is at 98%. Continue.",
|
||||
"You traced the disk I/O to a log rotation script that's stuck. You terminated it. Disk I/O dropped to 20%. Final status check: all metrics are now nominal. Continue.",
|
||||
"The incident is resolved. Write a brief post-mortem summary as your final action.",
|
||||
]
|
||||
|
||||
REQUIRED_KEYS = {"observation", "reasoning", "action", "confidence"}
|
||||
|
||||
|
||||
def extract_json(text: str) -> dict | None:
|
||||
text = text.strip()
|
||||
try:
|
||||
return json.loads(text)
|
||||
except json.JSONDecodeError:
|
||||
pass
|
||||
|
||||
fence_match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
|
||||
if fence_match:
|
||||
try:
|
||||
return json.loads(fence_match.group(1))
|
||||
except json.JSONDecodeError:
|
||||
pass
|
||||
|
||||
# Try to find { ... } block
|
||||
brace_match = re.search(r"\{[^{}]*(?:\{[^{}]*\}[^{}]*)?\}", text, re.DOTALL)
|
||||
if brace_match:
|
||||
try:
|
||||
return json.loads(brace_match.group(0))
|
||||
except json.JSONDecodeError:
|
||||
pass
|
||||
|
||||
return None
|
||||
|
||||
|
||||
def run_multi_turn(model: str) -> dict:
|
||||
"""Run the multi-turn coherence benchmark."""
|
||||
conversation = []
|
||||
turn_results = []
|
||||
total_time = 0.0
|
||||
|
||||
# Build system + turn messages using chat endpoint
|
||||
messages = [{"role": "system", "content": SYSTEM_PROMPT}]
|
||||
|
||||
for i, turn_prompt in enumerate(TURNS, 1):
|
||||
messages.append({"role": "user", "content": turn_prompt})
|
||||
start = time.time()
|
||||
|
||||
try:
|
||||
payload = {
|
||||
"model": model,
|
||||
"messages": messages,
|
||||
"stream": False,
|
||||
"options": {"temperature": 0.1, "num_predict": 512},
|
||||
}
|
||||
resp = requests.post(f"{OLLAMA_URL}/api/chat", json=payload, timeout=120)
|
||||
resp.raise_for_status()
|
||||
raw = resp.json()["message"]["content"]
|
||||
except Exception as exc:
|
||||
elapsed = time.time() - start
|
||||
turn_results.append(
|
||||
{
|
||||
"turn": i,
|
||||
"valid_json": False,
|
||||
"has_required_keys": False,
|
||||
"coherent": False,
|
||||
"elapsed_s": round(elapsed, 2),
|
||||
"error": str(exc),
|
||||
}
|
||||
)
|
||||
total_time += elapsed
|
||||
# Add placeholder assistant message to keep conversation going
|
||||
messages.append({"role": "assistant", "content": "{}"})
|
||||
continue
|
||||
|
||||
elapsed = time.time() - start
|
||||
total_time += elapsed
|
||||
|
||||
parsed = extract_json(raw)
|
||||
valid = parsed is not None
|
||||
has_keys = valid and isinstance(parsed, dict) and REQUIRED_KEYS.issubset(parsed.keys())
|
||||
confidence_valid = (
|
||||
has_keys
|
||||
and isinstance(parsed.get("confidence"), (int, float))
|
||||
and 0.0 <= parsed["confidence"] <= 1.0
|
||||
)
|
||||
coherent = has_keys and confidence_valid
|
||||
|
||||
turn_results.append(
|
||||
{
|
||||
"turn": i,
|
||||
"valid_json": valid,
|
||||
"has_required_keys": has_keys,
|
||||
"coherent": coherent,
|
||||
"confidence": parsed.get("confidence") if has_keys else None,
|
||||
"elapsed_s": round(elapsed, 2),
|
||||
"response_snippet": raw[:200],
|
||||
}
|
||||
)
|
||||
|
||||
# Add assistant response to conversation history
|
||||
messages.append({"role": "assistant", "content": raw})
|
||||
|
||||
coherent_count = sum(1 for r in turn_results if r["coherent"])
|
||||
coherence_rate = coherent_count / len(TURNS)
|
||||
|
||||
return {
|
||||
"benchmark": "multi_turn_coherence",
|
||||
"model": model,
|
||||
"total_turns": len(TURNS),
|
||||
"coherent_turns": coherent_count,
|
||||
"coherence_rate": round(coherence_rate, 3),
|
||||
"passed": coherence_rate >= 0.80,
|
||||
"total_time_s": round(total_time, 2),
|
||||
"turns": turn_results,
|
||||
}
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
model = sys.argv[1] if len(sys.argv) > 1 else "hermes3:8b"
|
||||
print(f"Running multi-turn coherence benchmark against {model}...")
|
||||
result = run_multi_turn(model)
|
||||
print(json.dumps(result, indent=2))
|
||||
sys.exit(0 if result["passed"] else 1)
|
||||
@@ -1,197 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Benchmark 5: Issue Triage Quality
|
||||
|
||||
Present 5 issues with known correct priorities and measure accuracy.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import re
|
||||
import sys
|
||||
import time
|
||||
|
||||
import requests
|
||||
|
||||
OLLAMA_URL = "http://localhost:11434"
|
||||
|
||||
TRIAGE_PROMPT_TEMPLATE = """\
|
||||
You are a software project triage agent. Assign a priority to the following issue.
|
||||
|
||||
Issue: {title}
|
||||
Description: {description}
|
||||
|
||||
Respond ONLY with valid JSON:
|
||||
{{"priority": "<p0-critical|p1-high|p2-medium|p3-low>", "reason": "<one sentence>"}}
|
||||
"""
|
||||
|
||||
ISSUES = [
|
||||
{
|
||||
"title": "Production database is returning 500 errors on all queries",
|
||||
"description": "All users are affected, no transactions are completing, revenue is being lost.",
|
||||
"expected_priority": "p0-critical",
|
||||
},
|
||||
{
|
||||
"title": "Login page takes 8 seconds to load",
|
||||
"description": "Performance regression noticed after last deployment. Users are complaining but can still log in.",
|
||||
"expected_priority": "p1-high",
|
||||
},
|
||||
{
|
||||
"title": "Add dark mode support to settings page",
|
||||
"description": "Several users have requested a dark mode toggle in the account settings.",
|
||||
"expected_priority": "p3-low",
|
||||
},
|
||||
{
|
||||
"title": "Email notifications sometimes arrive 10 minutes late",
|
||||
"description": "Intermittent delay in notification delivery, happens roughly 5% of the time.",
|
||||
"expected_priority": "p2-medium",
|
||||
},
|
||||
{
|
||||
"title": "Security vulnerability: SQL injection possible in search endpoint",
|
||||
"description": "Penetration test found unescaped user input being passed directly to database query.",
|
||||
"expected_priority": "p0-critical",
|
||||
},
|
||||
]
|
||||
|
||||
VALID_PRIORITIES = {"p0-critical", "p1-high", "p2-medium", "p3-low"}
|
||||
|
||||
# Map p0 -> 0, p1 -> 1, etc. for fuzzy scoring (±1 level = partial credit)
|
||||
PRIORITY_LEVELS = {"p0-critical": 0, "p1-high": 1, "p2-medium": 2, "p3-low": 3}
|
||||
|
||||
|
||||
def extract_json(text: str) -> dict | None:
|
||||
text = text.strip()
|
||||
try:
|
||||
return json.loads(text)
|
||||
except json.JSONDecodeError:
|
||||
pass
|
||||
|
||||
fence_match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
|
||||
if fence_match:
|
||||
try:
|
||||
return json.loads(fence_match.group(1))
|
||||
except json.JSONDecodeError:
|
||||
pass
|
||||
|
||||
brace_match = re.search(r"\{[^{}]*\}", text, re.DOTALL)
|
||||
if brace_match:
|
||||
try:
|
||||
return json.loads(brace_match.group(0))
|
||||
except json.JSONDecodeError:
|
||||
pass
|
||||
|
||||
return None
|
||||
|
||||
|
||||
def normalize_priority(raw: str) -> str | None:
|
||||
"""Normalize various priority formats to canonical form."""
|
||||
raw = raw.lower().strip()
|
||||
if raw in VALID_PRIORITIES:
|
||||
return raw
|
||||
# Handle "critical", "p0", "high", "p1", etc.
|
||||
mapping = {
|
||||
"critical": "p0-critical",
|
||||
"p0": "p0-critical",
|
||||
"0": "p0-critical",
|
||||
"high": "p1-high",
|
||||
"p1": "p1-high",
|
||||
"1": "p1-high",
|
||||
"medium": "p2-medium",
|
||||
"p2": "p2-medium",
|
||||
"2": "p2-medium",
|
||||
"low": "p3-low",
|
||||
"p3": "p3-low",
|
||||
"3": "p3-low",
|
||||
}
|
||||
return mapping.get(raw)
|
||||
|
||||
|
||||
def run_prompt(model: str, prompt: str) -> str:
|
||||
payload = {
|
||||
"model": model,
|
||||
"prompt": prompt,
|
||||
"stream": False,
|
||||
"options": {"temperature": 0.1, "num_predict": 256},
|
||||
}
|
||||
resp = requests.post(f"{OLLAMA_URL}/api/generate", json=payload, timeout=120)
|
||||
resp.raise_for_status()
|
||||
return resp.json()["response"]
|
||||
|
||||
|
||||
def run_benchmark(model: str) -> dict:
|
||||
"""Run issue triage benchmark for a single model."""
|
||||
results = []
|
||||
total_time = 0.0
|
||||
|
||||
for i, issue in enumerate(ISSUES, 1):
|
||||
prompt = TRIAGE_PROMPT_TEMPLATE.format(
|
||||
title=issue["title"], description=issue["description"]
|
||||
)
|
||||
start = time.time()
|
||||
try:
|
||||
raw = run_prompt(model, prompt)
|
||||
elapsed = time.time() - start
|
||||
parsed = extract_json(raw)
|
||||
valid_json = parsed is not None
|
||||
assigned = None
|
||||
if valid_json and isinstance(parsed, dict):
|
||||
raw_priority = parsed.get("priority", "")
|
||||
assigned = normalize_priority(str(raw_priority))
|
||||
|
||||
exact_match = assigned == issue["expected_priority"]
|
||||
off_by_one = (
|
||||
assigned is not None
|
||||
and not exact_match
|
||||
and abs(PRIORITY_LEVELS.get(assigned, -1) - PRIORITY_LEVELS[issue["expected_priority"]]) == 1
|
||||
)
|
||||
|
||||
results.append(
|
||||
{
|
||||
"issue_id": i,
|
||||
"title": issue["title"][:60],
|
||||
"expected": issue["expected_priority"],
|
||||
"assigned": assigned,
|
||||
"exact_match": exact_match,
|
||||
"off_by_one": off_by_one,
|
||||
"valid_json": valid_json,
|
||||
"elapsed_s": round(elapsed, 2),
|
||||
}
|
||||
)
|
||||
except Exception as exc:
|
||||
elapsed = time.time() - start
|
||||
results.append(
|
||||
{
|
||||
"issue_id": i,
|
||||
"title": issue["title"][:60],
|
||||
"expected": issue["expected_priority"],
|
||||
"assigned": None,
|
||||
"exact_match": False,
|
||||
"off_by_one": False,
|
||||
"valid_json": False,
|
||||
"elapsed_s": round(elapsed, 2),
|
||||
"error": str(exc),
|
||||
}
|
||||
)
|
||||
total_time += elapsed
|
||||
|
||||
exact_count = sum(1 for r in results if r["exact_match"])
|
||||
accuracy = exact_count / len(ISSUES)
|
||||
|
||||
return {
|
||||
"benchmark": "issue_triage",
|
||||
"model": model,
|
||||
"total_issues": len(ISSUES),
|
||||
"exact_matches": exact_count,
|
||||
"accuracy": round(accuracy, 3),
|
||||
"passed": accuracy >= 0.80,
|
||||
"total_time_s": round(total_time, 2),
|
||||
"results": results,
|
||||
}
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
model = sys.argv[1] if len(sys.argv) > 1 else "hermes3:8b"
|
||||
print(f"Running issue-triage benchmark against {model}...")
|
||||
result = run_benchmark(model)
|
||||
print(json.dumps(result, indent=2))
|
||||
sys.exit(0 if result["passed"] else 1)
|
||||
@@ -1,334 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Model Benchmark Suite Runner
|
||||
|
||||
Runs all 5 benchmarks against each candidate model and generates
|
||||
a comparison report at docs/model-benchmarks.md.
|
||||
|
||||
Usage:
|
||||
python scripts/benchmarks/run_suite.py
|
||||
python scripts/benchmarks/run_suite.py --models hermes3:8b qwen3.5:latest
|
||||
python scripts/benchmarks/run_suite.py --output docs/model-benchmarks.md
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import importlib.util
|
||||
import json
|
||||
import sys
|
||||
import time
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
|
||||
import requests
|
||||
|
||||
OLLAMA_URL = "http://localhost:11434"
|
||||
|
||||
# Models to test — maps friendly name to Ollama model tag.
|
||||
# Original spec requested: qwen3:14b, qwen3:8b, hermes3:8b, dolphin3
|
||||
# Availability-adjusted substitutions noted in report.
|
||||
DEFAULT_MODELS = [
|
||||
"hermes3:8b",
|
||||
"qwen3.5:latest",
|
||||
"qwen2.5:14b",
|
||||
"llama3.2:latest",
|
||||
]
|
||||
|
||||
BENCHMARKS_DIR = Path(__file__).parent
|
||||
DOCS_DIR = Path(__file__).resolve().parent.parent.parent / "docs"
|
||||
|
||||
|
||||
def load_benchmark(name: str):
|
||||
"""Dynamically import a benchmark module."""
|
||||
path = BENCHMARKS_DIR / name
|
||||
module_name = Path(name).stem
|
||||
spec = importlib.util.spec_from_file_location(module_name, path)
|
||||
mod = importlib.util.module_from_spec(spec)
|
||||
spec.loader.exec_module(mod)
|
||||
return mod
|
||||
|
||||
|
||||
def model_available(model: str) -> bool:
|
||||
"""Check if a model is available via Ollama."""
|
||||
try:
|
||||
resp = requests.get(f"{OLLAMA_URL}/api/tags", timeout=10)
|
||||
if resp.status_code != 200:
|
||||
return False
|
||||
models = {m["name"] for m in resp.json().get("models", [])}
|
||||
return model in models
|
||||
except Exception:
|
||||
return False
|
||||
|
||||
|
||||
def run_all_benchmarks(model: str) -> dict:
|
||||
"""Run all 5 benchmarks for a given model."""
|
||||
benchmark_files = [
|
||||
"01_tool_calling.py",
|
||||
"02_code_generation.py",
|
||||
"03_shell_commands.py",
|
||||
"04_multi_turn_coherence.py",
|
||||
"05_issue_triage.py",
|
||||
]
|
||||
|
||||
results = {}
|
||||
for fname in benchmark_files:
|
||||
key = fname.replace(".py", "")
|
||||
print(f" [{model}] Running {key}...", flush=True)
|
||||
try:
|
||||
mod = load_benchmark(fname)
|
||||
start = time.time()
|
||||
if key == "01_tool_calling":
|
||||
result = mod.run_benchmark(model)
|
||||
elif key == "02_code_generation":
|
||||
result = mod.run_benchmark(model)
|
||||
elif key == "03_shell_commands":
|
||||
result = mod.run_benchmark(model)
|
||||
elif key == "04_multi_turn_coherence":
|
||||
result = mod.run_multi_turn(model)
|
||||
elif key == "05_issue_triage":
|
||||
result = mod.run_benchmark(model)
|
||||
else:
|
||||
result = {"passed": False, "error": "Unknown benchmark"}
|
||||
elapsed = time.time() - start
|
||||
print(
|
||||
f" -> {'PASS' if result.get('passed') else 'FAIL'} ({elapsed:.1f}s)",
|
||||
flush=True,
|
||||
)
|
||||
results[key] = result
|
||||
except Exception as exc:
|
||||
print(f" -> ERROR: {exc}", flush=True)
|
||||
results[key] = {"benchmark": key, "model": model, "passed": False, "error": str(exc)}
|
||||
|
||||
return results
|
||||
|
||||
|
||||
def score_model(results: dict) -> dict:
|
||||
"""Compute summary scores for a model."""
|
||||
benchmarks = list(results.values())
|
||||
passed = sum(1 for b in benchmarks if b.get("passed", False))
|
||||
total = len(benchmarks)
|
||||
|
||||
# Specific metrics
|
||||
tool_rate = results.get("01_tool_calling", {}).get("compliance_rate", 0.0)
|
||||
code_pass = results.get("02_code_generation", {}).get("passed", False)
|
||||
shell_pass = results.get("03_shell_commands", {}).get("passed", False)
|
||||
coherence = results.get("04_multi_turn_coherence", {}).get("coherence_rate", 0.0)
|
||||
triage_acc = results.get("05_issue_triage", {}).get("accuracy", 0.0)
|
||||
|
||||
total_time = sum(
|
||||
r.get("total_time_s", r.get("elapsed_s", 0.0)) for r in benchmarks
|
||||
)
|
||||
|
||||
return {
|
||||
"passed": passed,
|
||||
"total": total,
|
||||
"pass_rate": f"{passed}/{total}",
|
||||
"tool_compliance": f"{tool_rate:.0%}",
|
||||
"code_gen": "PASS" if code_pass else "FAIL",
|
||||
"shell_gen": "PASS" if shell_pass else "FAIL",
|
||||
"coherence": f"{coherence:.0%}",
|
||||
"triage_accuracy": f"{triage_acc:.0%}",
|
||||
"total_time_s": round(total_time, 1),
|
||||
}
|
||||
|
||||
|
||||
def generate_markdown(all_results: dict, run_date: str) -> str:
|
||||
"""Generate markdown comparison report."""
|
||||
lines = []
|
||||
lines.append("# Model Benchmark Results")
|
||||
lines.append("")
|
||||
lines.append(f"> Generated: {run_date} ")
|
||||
lines.append(f"> Ollama URL: `{OLLAMA_URL}` ")
|
||||
lines.append("> Issue: [#1066](http://143.198.27.163:3000/rockachopa/Timmy-time-dashboard/issues/1066)")
|
||||
lines.append("")
|
||||
lines.append("## Overview")
|
||||
lines.append("")
|
||||
lines.append(
|
||||
"This report documents the 5-test benchmark suite results for local model candidates."
|
||||
)
|
||||
lines.append("")
|
||||
lines.append("### Model Availability vs. Spec")
|
||||
lines.append("")
|
||||
lines.append("| Requested | Tested Substitute | Reason |")
|
||||
lines.append("|-----------|-------------------|--------|")
|
||||
lines.append("| `qwen3:14b` | `qwen2.5:14b` | `qwen3:14b` not pulled locally |")
|
||||
lines.append("| `qwen3:8b` | `qwen3.5:latest` | `qwen3:8b` not pulled locally |")
|
||||
lines.append("| `hermes3:8b` | `hermes3:8b` | Exact match |")
|
||||
lines.append("| `dolphin3` | `llama3.2:latest` | `dolphin3` not pulled locally |")
|
||||
lines.append("")
|
||||
|
||||
# Summary table
|
||||
lines.append("## Summary Comparison Table")
|
||||
lines.append("")
|
||||
lines.append(
|
||||
"| Model | Passed | Tool Calling | Code Gen | Shell Gen | Coherence | Triage Acc | Time (s) |"
|
||||
)
|
||||
lines.append(
|
||||
"|-------|--------|-------------|----------|-----------|-----------|------------|----------|"
|
||||
)
|
||||
|
||||
for model, results in all_results.items():
|
||||
if "error" in results and "01_tool_calling" not in results:
|
||||
lines.append(f"| `{model}` | — | — | — | — | — | — | — |")
|
||||
continue
|
||||
s = score_model(results)
|
||||
lines.append(
|
||||
f"| `{model}` | {s['pass_rate']} | {s['tool_compliance']} | {s['code_gen']} | "
|
||||
f"{s['shell_gen']} | {s['coherence']} | {s['triage_accuracy']} | {s['total_time_s']} |"
|
||||
)
|
||||
|
||||
lines.append("")
|
||||
|
||||
# Per-model detail sections
|
||||
lines.append("## Per-Model Detail")
|
||||
lines.append("")
|
||||
|
||||
for model, results in all_results.items():
|
||||
lines.append(f"### `{model}`")
|
||||
lines.append("")
|
||||
|
||||
if "error" in results and not isinstance(results.get("error"), str):
|
||||
lines.append(f"> **Error:** {results.get('error')}")
|
||||
lines.append("")
|
||||
continue
|
||||
|
||||
for bkey, bres in results.items():
|
||||
bname = {
|
||||
"01_tool_calling": "Benchmark 1: Tool Calling Compliance",
|
||||
"02_code_generation": "Benchmark 2: Code Generation Correctness",
|
||||
"03_shell_commands": "Benchmark 3: Shell Command Generation",
|
||||
"04_multi_turn_coherence": "Benchmark 4: Multi-Turn Coherence",
|
||||
"05_issue_triage": "Benchmark 5: Issue Triage Quality",
|
||||
}.get(bkey, bkey)
|
||||
|
||||
status = "✅ PASS" if bres.get("passed") else "❌ FAIL"
|
||||
lines.append(f"#### {bname} — {status}")
|
||||
lines.append("")
|
||||
|
||||
if bkey == "01_tool_calling":
|
||||
rate = bres.get("compliance_rate", 0)
|
||||
count = bres.get("valid_json_count", 0)
|
||||
total = bres.get("total_prompts", 0)
|
||||
lines.append(
|
||||
f"- **JSON Compliance:** {count}/{total} ({rate:.0%}) — target ≥90%"
|
||||
)
|
||||
elif bkey == "02_code_generation":
|
||||
lines.append(f"- **Result:** {bres.get('detail', bres.get('error', 'n/a'))}")
|
||||
snippet = bres.get("code_snippet", "")
|
||||
if snippet:
|
||||
lines.append(f"- **Generated code snippet:**")
|
||||
lines.append(" ```python")
|
||||
for ln in snippet.splitlines()[:8]:
|
||||
lines.append(f" {ln}")
|
||||
lines.append(" ```")
|
||||
elif bkey == "03_shell_commands":
|
||||
passed = bres.get("passed_count", 0)
|
||||
refused = bres.get("refused_count", 0)
|
||||
total = bres.get("total_prompts", 0)
|
||||
lines.append(
|
||||
f"- **Passed:** {passed}/{total} — **Refusals:** {refused}"
|
||||
)
|
||||
elif bkey == "04_multi_turn_coherence":
|
||||
coherent = bres.get("coherent_turns", 0)
|
||||
total = bres.get("total_turns", 0)
|
||||
rate = bres.get("coherence_rate", 0)
|
||||
lines.append(
|
||||
f"- **Coherent turns:** {coherent}/{total} ({rate:.0%}) — target ≥80%"
|
||||
)
|
||||
elif bkey == "05_issue_triage":
|
||||
exact = bres.get("exact_matches", 0)
|
||||
total = bres.get("total_issues", 0)
|
||||
acc = bres.get("accuracy", 0)
|
||||
lines.append(
|
||||
f"- **Accuracy:** {exact}/{total} ({acc:.0%}) — target ≥80%"
|
||||
)
|
||||
|
||||
elapsed = bres.get("total_time_s", bres.get("elapsed_s", 0))
|
||||
lines.append(f"- **Time:** {elapsed}s")
|
||||
lines.append("")
|
||||
|
||||
lines.append("## Raw JSON Data")
|
||||
lines.append("")
|
||||
lines.append("<details>")
|
||||
lines.append("<summary>Click to expand full JSON results</summary>")
|
||||
lines.append("")
|
||||
lines.append("```json")
|
||||
lines.append(json.dumps(all_results, indent=2))
|
||||
lines.append("```")
|
||||
lines.append("")
|
||||
lines.append("</details>")
|
||||
lines.append("")
|
||||
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
def parse_args() -> argparse.Namespace:
|
||||
parser = argparse.ArgumentParser(description="Run model benchmark suite")
|
||||
parser.add_argument(
|
||||
"--models",
|
||||
nargs="+",
|
||||
default=DEFAULT_MODELS,
|
||||
help="Models to test",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--output",
|
||||
type=Path,
|
||||
default=DOCS_DIR / "model-benchmarks.md",
|
||||
help="Output markdown file",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--json-output",
|
||||
type=Path,
|
||||
default=None,
|
||||
help="Optional JSON output file",
|
||||
)
|
||||
return parser.parse_args()
|
||||
|
||||
|
||||
def main() -> int:
|
||||
args = parse_args()
|
||||
run_date = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
|
||||
|
||||
print(f"Model Benchmark Suite — {run_date}")
|
||||
print(f"Testing {len(args.models)} model(s): {', '.join(args.models)}")
|
||||
print()
|
||||
|
||||
all_results: dict[str, dict] = {}
|
||||
|
||||
for model in args.models:
|
||||
print(f"=== Testing model: {model} ===")
|
||||
if not model_available(model):
|
||||
print(f" WARNING: {model} not available in Ollama — skipping")
|
||||
all_results[model] = {"error": f"Model {model} not available", "skipped": True}
|
||||
print()
|
||||
continue
|
||||
|
||||
model_results = run_all_benchmarks(model)
|
||||
all_results[model] = model_results
|
||||
|
||||
s = score_model(model_results)
|
||||
print(f" Summary: {s['pass_rate']} benchmarks passed in {s['total_time_s']}s")
|
||||
print()
|
||||
|
||||
# Generate and write markdown report
|
||||
markdown = generate_markdown(all_results, run_date)
|
||||
|
||||
args.output.parent.mkdir(parents=True, exist_ok=True)
|
||||
args.output.write_text(markdown, encoding="utf-8")
|
||||
print(f"Report written to: {args.output}")
|
||||
|
||||
if args.json_output:
|
||||
args.json_output.write_text(json.dumps(all_results, indent=2), encoding="utf-8")
|
||||
print(f"JSON data written to: {args.json_output}")
|
||||
|
||||
# Overall pass/fail
|
||||
all_pass = all(
|
||||
not r.get("skipped", False)
|
||||
and all(b.get("passed", False) for b in r.values() if isinstance(b, dict))
|
||||
for r in all_results.values()
|
||||
)
|
||||
return 0 if all_pass else 1
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
@@ -42,7 +42,7 @@ def _get_gitea_api() -> str:
|
||||
if api_file.exists():
|
||||
return api_file.read_text().strip()
|
||||
# Default fallback
|
||||
return "http://143.198.27.163:3000/api/v1"
|
||||
return "http://localhost:3000/api/v1"
|
||||
|
||||
|
||||
GITEA_API = _get_gitea_api()
|
||||
@@ -240,33 +240,9 @@ def compute_backoff(consecutive_idle: int) -> int:
|
||||
return min(BACKOFF_BASE * (BACKOFF_MULTIPLIER ** consecutive_idle), BACKOFF_MAX)
|
||||
|
||||
|
||||
def seed_cycle_result(item: dict) -> None:
|
||||
"""Pre-seed cycle_result.json with the top queue item.
|
||||
|
||||
Only writes if cycle_result.json does not already exist — never overwrites
|
||||
agent-written data. This ensures cycle_retro.py can always resolve the
|
||||
issue number even when the dispatcher (claude-loop, gemini-loop, etc.) does
|
||||
not write cycle_result.json itself.
|
||||
"""
|
||||
if CYCLE_RESULT_FILE.exists():
|
||||
return # Agent already wrote its own result — leave it alone
|
||||
|
||||
seed = {
|
||||
"issue": item.get("issue"),
|
||||
"type": item.get("type", "unknown"),
|
||||
}
|
||||
try:
|
||||
CYCLE_RESULT_FILE.parent.mkdir(parents=True, exist_ok=True)
|
||||
CYCLE_RESULT_FILE.write_text(json.dumps(seed) + "\n")
|
||||
print(f"[loop-guard] Seeded cycle_result.json with issue #{seed['issue']}")
|
||||
except OSError as exc:
|
||||
print(f"[loop-guard] WARNING: Could not seed cycle_result.json: {exc}")
|
||||
|
||||
|
||||
def main() -> int:
|
||||
wait_mode = "--wait" in sys.argv
|
||||
status_mode = "--status" in sys.argv
|
||||
pick_mode = "--pick" in sys.argv
|
||||
|
||||
state = load_idle_state()
|
||||
|
||||
@@ -293,17 +269,6 @@ def main() -> int:
|
||||
state["consecutive_idle"] = 0
|
||||
state["last_idle_at"] = 0
|
||||
save_idle_state(state)
|
||||
|
||||
# Pre-seed cycle_result.json so cycle_retro.py can resolve issue=
|
||||
# even when the dispatcher doesn't write the file itself.
|
||||
seed_cycle_result(ready[0])
|
||||
|
||||
if pick_mode:
|
||||
# Emit the top issue number to stdout for shell script capture.
|
||||
issue = ready[0].get("issue")
|
||||
if issue is not None:
|
||||
print(issue)
|
||||
|
||||
return 0
|
||||
|
||||
# Queue empty — apply backoff
|
||||
|
||||
@@ -6,7 +6,7 @@ writes a ranked queue to .loop/queue.json. No LLM calls — pure heuristics.
|
||||
|
||||
Run: python3 scripts/triage_score.py
|
||||
Env: GITEA_TOKEN (or reads ~/.hermes/gitea_token)
|
||||
GITEA_API (default: http://143.198.27.163:3000/api/v1)
|
||||
GITEA_API (default: http://localhost:3000/api/v1)
|
||||
REPO_SLUG (default: rockachopa/Timmy-time-dashboard)
|
||||
"""
|
||||
|
||||
@@ -33,7 +33,7 @@ def _get_gitea_api() -> str:
|
||||
if api_file.exists():
|
||||
return api_file.read_text().strip()
|
||||
# Default fallback
|
||||
return "http://143.198.27.163:3000/api/v1"
|
||||
return "http://localhost:3000/api/v1"
|
||||
|
||||
|
||||
GITEA_API = _get_gitea_api()
|
||||
|
||||
@@ -1 +0,0 @@
|
||||
"""Timmy Time Dashboard — source root package."""
|
||||
|
||||
@@ -1,22 +0,0 @@
|
||||
"""Bannerlord sovereign agent package — Project Bannerlord M5.
|
||||
|
||||
Implements the feudal multi-agent hierarchy for Timmy's Bannerlord campaign.
|
||||
Architecture based on Ahilan & Dayan (2019) Feudal Multi-Agent Hierarchies.
|
||||
|
||||
Refs #1091 (epic), #1097 (M5 Sovereign Victory), #1099 (feudal hierarchy design).
|
||||
|
||||
Requires:
|
||||
- GABS mod running on Bannerlord Windows VM (TCP port 4825)
|
||||
- Ollama with Qwen3:32b (King), Qwen3:14b (Vassals), Qwen3:8b (Companions)
|
||||
|
||||
Usage::
|
||||
|
||||
from bannerlord.gabs_client import GABSClient
|
||||
from bannerlord.agents.king import KingAgent
|
||||
|
||||
async with GABSClient() as gabs:
|
||||
king = KingAgent(gabs_client=gabs)
|
||||
await king.run_campaign()
|
||||
"""
|
||||
|
||||
__version__ = "0.1.0"
|
||||
@@ -1,7 +0,0 @@
|
||||
"""Bannerlord feudal agent hierarchy.
|
||||
|
||||
Three tiers:
|
||||
- King (king.py) — strategic, Qwen3:32b, 1× per campaign day
|
||||
- Vassals (vassals.py) — domain, Qwen3:14b, 4× per campaign day
|
||||
- Companions (companions.py) — tactical, Qwen3:8b, event-driven
|
||||
"""
|
||||
@@ -1,261 +0,0 @@
|
||||
"""Companion worker agents — Logistics, Caravan, and Scout.
|
||||
|
||||
Companions are the lowest tier — fast, specialized, single-purpose workers.
|
||||
Each companion listens to its :class:`TaskMessage` queue, executes the
|
||||
requested primitive against GABS, and emits a :class:`ResultMessage`.
|
||||
|
||||
Model: Qwen3:8b (or smaller) — sub-2-second response times.
|
||||
Frequency: event-driven (triggered by vassal task messages).
|
||||
|
||||
Primitive vocabulary per companion:
|
||||
Logistics: recruit_troop, buy_supplies, rest_party, sell_prisoners, upgrade_troops, build_project
|
||||
Caravan: assess_prices, buy_goods, sell_goods, establish_caravan, abandon_route
|
||||
Scout: track_lord, assess_garrison, map_patrol_routes, report_intel
|
||||
|
||||
Refs: #1097, #1099.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
from typing import Any
|
||||
|
||||
from bannerlord.gabs_client import GABSClient, GABSUnavailable
|
||||
from bannerlord.models import ResultMessage, TaskMessage
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class BaseCompanion:
|
||||
"""Shared companion lifecycle — polls task queue, executes primitives."""
|
||||
|
||||
name: str = "base_companion"
|
||||
primitives: frozenset[str] = frozenset()
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
gabs_client: GABSClient,
|
||||
task_queue: asyncio.Queue[TaskMessage],
|
||||
result_queue: asyncio.Queue[ResultMessage] | None = None,
|
||||
) -> None:
|
||||
self._gabs = gabs_client
|
||||
self._task_queue = task_queue
|
||||
self._result_queue = result_queue or asyncio.Queue()
|
||||
self._running = False
|
||||
|
||||
@property
|
||||
def result_queue(self) -> asyncio.Queue[ResultMessage]:
|
||||
return self._result_queue
|
||||
|
||||
async def run(self) -> None:
|
||||
"""Companion event loop — processes task messages."""
|
||||
self._running = True
|
||||
logger.info("%s started", self.name)
|
||||
try:
|
||||
while self._running:
|
||||
try:
|
||||
task = await asyncio.wait_for(self._task_queue.get(), timeout=1.0)
|
||||
except TimeoutError:
|
||||
continue
|
||||
|
||||
if task.to_agent != self.name:
|
||||
# Not for us — put it back (another companion will handle it)
|
||||
await self._task_queue.put(task)
|
||||
await asyncio.sleep(0.05)
|
||||
continue
|
||||
|
||||
result = await self._execute(task)
|
||||
await self._result_queue.put(result)
|
||||
self._task_queue.task_done()
|
||||
|
||||
except asyncio.CancelledError:
|
||||
logger.info("%s cancelled", self.name)
|
||||
raise
|
||||
finally:
|
||||
self._running = False
|
||||
|
||||
def stop(self) -> None:
|
||||
self._running = False
|
||||
|
||||
async def _execute(self, task: TaskMessage) -> ResultMessage:
|
||||
"""Dispatch *task.primitive* to its handler method."""
|
||||
handler = getattr(self, f"_prim_{task.primitive}", None)
|
||||
if handler is None:
|
||||
logger.warning("%s: unknown primitive %r — skipping", self.name, task.primitive)
|
||||
return ResultMessage(
|
||||
from_agent=self.name,
|
||||
to_agent=task.from_agent,
|
||||
success=False,
|
||||
outcome={"error": f"Unknown primitive: {task.primitive}"},
|
||||
)
|
||||
try:
|
||||
outcome = await handler(task.args)
|
||||
return ResultMessage(
|
||||
from_agent=self.name,
|
||||
to_agent=task.from_agent,
|
||||
success=True,
|
||||
outcome=outcome or {},
|
||||
)
|
||||
except GABSUnavailable as exc:
|
||||
logger.warning("%s: GABS unavailable for %r: %s", self.name, task.primitive, exc)
|
||||
return ResultMessage(
|
||||
from_agent=self.name,
|
||||
to_agent=task.from_agent,
|
||||
success=False,
|
||||
outcome={"error": str(exc)},
|
||||
)
|
||||
except Exception as exc: # noqa: BLE001
|
||||
logger.warning("%s: %r failed: %s", self.name, task.primitive, exc)
|
||||
return ResultMessage(
|
||||
from_agent=self.name,
|
||||
to_agent=task.from_agent,
|
||||
success=False,
|
||||
outcome={"error": str(exc)},
|
||||
)
|
||||
|
||||
|
||||
# ── Logistics Companion ───────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class LogisticsCompanion(BaseCompanion):
|
||||
"""Party management — recruitment, supply, healing, troop upgrades.
|
||||
|
||||
Skill domain: Scouting / Steward / Medicine.
|
||||
"""
|
||||
|
||||
name = "logistics_companion"
|
||||
primitives = frozenset(
|
||||
{
|
||||
"recruit_troop",
|
||||
"buy_supplies",
|
||||
"rest_party",
|
||||
"sell_prisoners",
|
||||
"upgrade_troops",
|
||||
"build_project",
|
||||
}
|
||||
)
|
||||
|
||||
async def _prim_recruit_troop(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
troop_type = args.get("troop_type", "infantry")
|
||||
qty = int(args.get("quantity", 10))
|
||||
result = await self._gabs.recruit_troops(troop_type, qty)
|
||||
logger.info("Recruited %d %s", qty, troop_type)
|
||||
return result or {"recruited": qty, "type": troop_type}
|
||||
|
||||
async def _prim_buy_supplies(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
qty = int(args.get("quantity", 50))
|
||||
result = await self._gabs.call("party.buySupplies", {"quantity": qty})
|
||||
logger.info("Bought %d food supplies", qty)
|
||||
return result or {"purchased": qty}
|
||||
|
||||
async def _prim_rest_party(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
days = int(args.get("days", 3))
|
||||
result = await self._gabs.call("party.rest", {"days": days})
|
||||
logger.info("Resting party for %d days", days)
|
||||
return result or {"rested_days": days}
|
||||
|
||||
async def _prim_sell_prisoners(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
location = args.get("location", "nearest_town")
|
||||
result = await self._gabs.call("party.sellPrisoners", {"location": location})
|
||||
logger.info("Selling prisoners at %s", location)
|
||||
return result or {"sold_at": location}
|
||||
|
||||
async def _prim_upgrade_troops(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
result = await self._gabs.call("party.upgradeTroops", {})
|
||||
logger.info("Upgraded available troops")
|
||||
return result or {"upgraded": True}
|
||||
|
||||
async def _prim_build_project(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
settlement = args.get("settlement", "")
|
||||
result = await self._gabs.call("settlement.buildProject", {"settlement": settlement})
|
||||
logger.info("Building project in %s", settlement)
|
||||
return result or {"settlement": settlement}
|
||||
|
||||
async def _prim_move_party(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
destination = args.get("destination", "")
|
||||
result = await self._gabs.move_party(destination)
|
||||
logger.info("Moving party to %s", destination)
|
||||
return result or {"destination": destination}
|
||||
|
||||
|
||||
# ── Caravan Companion ─────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class CaravanCompanion(BaseCompanion):
|
||||
"""Trade route management — price assessment, goods trading, caravan deployment.
|
||||
|
||||
Skill domain: Trade / Charm.
|
||||
"""
|
||||
|
||||
name = "caravan_companion"
|
||||
primitives = frozenset(
|
||||
{"assess_prices", "buy_goods", "sell_goods", "establish_caravan", "abandon_route"}
|
||||
)
|
||||
|
||||
async def _prim_assess_prices(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
town = args.get("town", "nearest")
|
||||
result = await self._gabs.call("trade.assessPrices", {"town": town})
|
||||
logger.info("Assessed prices at %s", town)
|
||||
return result or {"town": town}
|
||||
|
||||
async def _prim_buy_goods(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
item = args.get("item", "grain")
|
||||
qty = int(args.get("quantity", 10))
|
||||
result = await self._gabs.call("trade.buyGoods", {"item": item, "quantity": qty})
|
||||
logger.info("Buying %d × %s", qty, item)
|
||||
return result or {"item": item, "quantity": qty}
|
||||
|
||||
async def _prim_sell_goods(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
item = args.get("item", "grain")
|
||||
qty = int(args.get("quantity", 10))
|
||||
result = await self._gabs.call("trade.sellGoods", {"item": item, "quantity": qty})
|
||||
logger.info("Selling %d × %s", qty, item)
|
||||
return result or {"item": item, "quantity": qty}
|
||||
|
||||
async def _prim_establish_caravan(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
town = args.get("town", "")
|
||||
result = await self._gabs.call("trade.establishCaravan", {"town": town})
|
||||
logger.info("Establishing caravan at %s", town)
|
||||
return result or {"town": town}
|
||||
|
||||
async def _prim_abandon_route(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
result = await self._gabs.call("trade.abandonRoute", {})
|
||||
logger.info("Caravan route abandoned — returning to main party")
|
||||
return result or {"abandoned": True}
|
||||
|
||||
|
||||
# ── Scout Companion ───────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class ScoutCompanion(BaseCompanion):
|
||||
"""Intelligence gathering — lord tracking, garrison assessment, patrol mapping.
|
||||
|
||||
Skill domain: Scouting / Roguery.
|
||||
"""
|
||||
|
||||
name = "scout_companion"
|
||||
primitives = frozenset({"track_lord", "assess_garrison", "map_patrol_routes", "report_intel"})
|
||||
|
||||
async def _prim_track_lord(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
lord_name = args.get("name", "")
|
||||
result = await self._gabs.call("intelligence.trackLord", {"name": lord_name})
|
||||
logger.info("Tracking lord: %s", lord_name)
|
||||
return result or {"tracking": lord_name}
|
||||
|
||||
async def _prim_assess_garrison(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
settlement = args.get("settlement", "")
|
||||
result = await self._gabs.call("intelligence.assessGarrison", {"settlement": settlement})
|
||||
logger.info("Assessing garrison at %s", settlement)
|
||||
return result or {"settlement": settlement}
|
||||
|
||||
async def _prim_map_patrol_routes(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
region = args.get("region", "")
|
||||
result = await self._gabs.call("intelligence.mapPatrols", {"region": region})
|
||||
logger.info("Mapping patrol routes in %s", region)
|
||||
return result or {"region": region}
|
||||
|
||||
async def _prim_report_intel(self, args: dict[str, Any]) -> dict[str, Any]:
|
||||
result = await self._gabs.call("intelligence.report", {})
|
||||
logger.info("Scout intel report generated")
|
||||
return result or {"reported": True}
|
||||
@@ -1,235 +0,0 @@
|
||||
"""King agent — Timmy as sovereign ruler of Calradia.
|
||||
|
||||
The King operates on the campaign-map timescale. Each campaign tick he:
|
||||
1. Reads the full game state from GABS
|
||||
2. Evaluates the victory condition
|
||||
3. Issues a single KingSubgoal token to the vassal queue
|
||||
4. Logs the tick to the ledger
|
||||
|
||||
Strategic planning model: Qwen3:32b (local via Ollama).
|
||||
Decision budget: 5–15 seconds per tick.
|
||||
|
||||
Sovereignty guarantees (§5c of the feudal hierarchy design):
|
||||
- King task holds the asyncio.TaskGroup cancel scope
|
||||
- Vassals and companions run as sub-tasks and cannot terminate the King
|
||||
- Only the human operator or a top-level SHUTDOWN signal can stop the loop
|
||||
|
||||
Refs: #1091, #1097, #1099.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import logging
|
||||
from typing import Any
|
||||
|
||||
from bannerlord.gabs_client import GABSClient, GABSUnavailable
|
||||
from bannerlord.ledger import Ledger
|
||||
from bannerlord.models import (
|
||||
KingSubgoal,
|
||||
StateUpdateMessage,
|
||||
SubgoalMessage,
|
||||
VictoryCondition,
|
||||
)
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
_KING_MODEL = "qwen3:32b"
|
||||
_KING_TICK_SECONDS = 5.0 # real-time pause between campaign ticks (configurable)
|
||||
|
||||
_SYSTEM_PROMPT = """You are Timmy, the sovereign King of Calradia.
|
||||
Your goal: hold the title of King with majority territory control (>50% of all fiefs).
|
||||
You think strategically over 100+ in-game days. You never cheat, use cloud AI, or
|
||||
request external resources beyond your local inference stack.
|
||||
|
||||
Each turn you receive the full game state as JSON. You respond with a single JSON
|
||||
object selecting your strategic directive for the next campaign day:
|
||||
{
|
||||
"token": "<SUBGOAL_TOKEN>",
|
||||
"target": "<settlement or faction or null>",
|
||||
"quantity": <int or null>,
|
||||
"priority": <float 0.0-2.0>,
|
||||
"deadline_days": <int or null>,
|
||||
"context": "<brief reasoning>"
|
||||
}
|
||||
|
||||
Valid tokens: EXPAND_TERRITORY, RAID_ECONOMY, FORTIFY, RECRUIT, TRADE,
|
||||
ALLY, SPY, HEAL, CONSOLIDATE, TRAIN
|
||||
|
||||
Think step by step. Respond with JSON only — no prose outside the object.
|
||||
"""
|
||||
|
||||
|
||||
class KingAgent:
|
||||
"""Sovereign campaign agent.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
gabs_client:
|
||||
Connected (or gracefully-degraded) GABS client.
|
||||
ledger:
|
||||
Asset ledger for persistence. Initialized automatically if not provided.
|
||||
ollama_url:
|
||||
Base URL of the Ollama inference server.
|
||||
model:
|
||||
Ollama model tag. Default: qwen3:32b.
|
||||
tick_interval:
|
||||
Real-time seconds between campaign ticks.
|
||||
subgoal_queue:
|
||||
asyncio.Queue where KingSubgoal messages are placed for vassals.
|
||||
Created automatically if not provided.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
gabs_client: GABSClient,
|
||||
ledger: Ledger | None = None,
|
||||
ollama_url: str = "http://localhost:11434",
|
||||
model: str = _KING_MODEL,
|
||||
tick_interval: float = _KING_TICK_SECONDS,
|
||||
subgoal_queue: asyncio.Queue[SubgoalMessage] | None = None,
|
||||
) -> None:
|
||||
self._gabs = gabs_client
|
||||
self._ledger = ledger or Ledger()
|
||||
self._ollama_url = ollama_url
|
||||
self._model = model
|
||||
self._tick_interval = tick_interval
|
||||
self._subgoal_queue: asyncio.Queue[SubgoalMessage] = subgoal_queue or asyncio.Queue()
|
||||
self._tick = 0
|
||||
self._running = False
|
||||
|
||||
@property
|
||||
def subgoal_queue(self) -> asyncio.Queue[SubgoalMessage]:
|
||||
return self._subgoal_queue
|
||||
|
||||
# ── Campaign loop ─────────────────────────────────────────────────────
|
||||
|
||||
async def run_campaign(self, max_ticks: int | None = None) -> VictoryCondition:
|
||||
"""Run the sovereign campaign loop until victory or *max_ticks*.
|
||||
|
||||
Returns the final :class:`VictoryCondition` snapshot.
|
||||
"""
|
||||
self._ledger.initialize()
|
||||
self._running = True
|
||||
victory = VictoryCondition()
|
||||
logger.info("King campaign started. Model: %s. Max ticks: %s", self._model, max_ticks)
|
||||
|
||||
try:
|
||||
while self._running:
|
||||
if max_ticks is not None and self._tick >= max_ticks:
|
||||
logger.info("Max ticks (%d) reached — stopping campaign.", max_ticks)
|
||||
break
|
||||
|
||||
state = await self._fetch_state()
|
||||
victory = self._evaluate_victory(state)
|
||||
|
||||
if victory.achieved:
|
||||
logger.info(
|
||||
"SOVEREIGN VICTORY — King of Calradia! Territory: %.1f%%, tick: %d",
|
||||
victory.territory_control_pct,
|
||||
self._tick,
|
||||
)
|
||||
break
|
||||
|
||||
subgoal = await self._decide(state)
|
||||
await self._broadcast_subgoal(subgoal)
|
||||
self._ledger.log_tick(
|
||||
tick=self._tick,
|
||||
campaign_day=state.get("campaign_day", self._tick),
|
||||
subgoal=subgoal.token,
|
||||
)
|
||||
|
||||
self._tick += 1
|
||||
await asyncio.sleep(self._tick_interval)
|
||||
|
||||
except asyncio.CancelledError:
|
||||
logger.info("King campaign task cancelled at tick %d", self._tick)
|
||||
raise
|
||||
finally:
|
||||
self._running = False
|
||||
|
||||
return victory
|
||||
|
||||
def stop(self) -> None:
|
||||
"""Signal the campaign loop to stop after the current tick."""
|
||||
self._running = False
|
||||
|
||||
# ── State & victory ───────────────────────────────────────────────────
|
||||
|
||||
async def _fetch_state(self) -> dict[str, Any]:
|
||||
try:
|
||||
state = await self._gabs.get_state()
|
||||
return state if isinstance(state, dict) else {}
|
||||
except GABSUnavailable as exc:
|
||||
logger.warning("GABS unavailable at tick %d: %s — using empty state", self._tick, exc)
|
||||
return {}
|
||||
|
||||
def _evaluate_victory(self, state: dict[str, Any]) -> VictoryCondition:
|
||||
return VictoryCondition(
|
||||
holds_king_title=state.get("player_title") == "King",
|
||||
territory_control_pct=float(state.get("territory_control_pct", 0.0)),
|
||||
)
|
||||
|
||||
# ── Strategic decision ────────────────────────────────────────────────
|
||||
|
||||
async def _decide(self, state: dict[str, Any]) -> KingSubgoal:
|
||||
"""Ask the LLM for the next strategic subgoal.
|
||||
|
||||
Falls back to RECRUIT (safe default) if the LLM is unavailable.
|
||||
"""
|
||||
try:
|
||||
subgoal = await asyncio.to_thread(self._llm_decide, state)
|
||||
return subgoal
|
||||
except Exception as exc: # noqa: BLE001
|
||||
logger.warning(
|
||||
"King LLM decision failed at tick %d: %s — defaulting to RECRUIT", self._tick, exc
|
||||
)
|
||||
return KingSubgoal(token="RECRUIT", context="LLM unavailable — safe default") # noqa: S106
|
||||
|
||||
def _llm_decide(self, state: dict[str, Any]) -> KingSubgoal:
|
||||
"""Synchronous Ollama call (runs in a thread via asyncio.to_thread)."""
|
||||
import urllib.request
|
||||
|
||||
prompt_state = json.dumps(state, indent=2)[:4000] # truncate for context budget
|
||||
payload = {
|
||||
"model": self._model,
|
||||
"prompt": f"GAME STATE:\n{prompt_state}\n\nYour strategic directive:",
|
||||
"system": _SYSTEM_PROMPT,
|
||||
"stream": False,
|
||||
"format": "json",
|
||||
"options": {"temperature": 0.1},
|
||||
}
|
||||
data = json.dumps(payload).encode()
|
||||
req = urllib.request.Request(
|
||||
f"{self._ollama_url}/api/generate",
|
||||
data=data,
|
||||
headers={"Content-Type": "application/json"},
|
||||
)
|
||||
with urllib.request.urlopen(req, timeout=30) as resp: # noqa: S310
|
||||
result = json.loads(resp.read())
|
||||
|
||||
raw = result.get("response", "{}")
|
||||
parsed = json.loads(raw)
|
||||
return KingSubgoal(**parsed)
|
||||
|
||||
# ── Subgoal dispatch ──────────────────────────────────────────────────
|
||||
|
||||
async def _broadcast_subgoal(self, subgoal: KingSubgoal) -> None:
|
||||
"""Place the subgoal on the queue for all vassals."""
|
||||
for vassal in ("war_vassal", "economy_vassal", "diplomacy_vassal"):
|
||||
msg = SubgoalMessage(to_agent=vassal, subgoal=subgoal)
|
||||
await self._subgoal_queue.put(msg)
|
||||
logger.debug(
|
||||
"Tick %d: subgoal %s → %s (priority=%.1f)",
|
||||
self._tick,
|
||||
subgoal.token,
|
||||
subgoal.target or "—",
|
||||
subgoal.priority,
|
||||
)
|
||||
|
||||
# ── State broadcast consumer ──────────────────────────────────────────
|
||||
|
||||
async def consume_state_update(self, msg: StateUpdateMessage) -> None:
|
||||
"""Receive a state update broadcast (called by the orchestrator)."""
|
||||
logger.debug("King received state update tick=%d", msg.tick)
|
||||
@@ -1,296 +0,0 @@
|
||||
"""Vassal agents — War, Economy, and Diplomacy.
|
||||
|
||||
Vassals are mid-tier agents responsible for a domain of the kingdom.
|
||||
Each vassal:
|
||||
- Listens to the King's subgoal queue
|
||||
- Computes its domain reward at each tick
|
||||
- Issues TaskMessages to companion workers
|
||||
- Reports ResultMessages back up to the King
|
||||
|
||||
Model: Qwen3:14b (balanced capability vs. latency).
|
||||
Frequency: up to 4× per campaign day.
|
||||
|
||||
Refs: #1097, #1099.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
from typing import Any
|
||||
|
||||
from bannerlord.gabs_client import GABSClient, GABSUnavailable
|
||||
from bannerlord.models import (
|
||||
DiplomacyReward,
|
||||
EconomyReward,
|
||||
KingSubgoal,
|
||||
ResultMessage,
|
||||
SubgoalMessage,
|
||||
TaskMessage,
|
||||
WarReward,
|
||||
)
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Tokens each vassal responds to (all others are ignored)
|
||||
_WAR_TOKENS = {"EXPAND_TERRITORY", "RAID_ECONOMY", "TRAIN"}
|
||||
_ECON_TOKENS = {"FORTIFY", "CONSOLIDATE"}
|
||||
_DIPLO_TOKENS = {"ALLY"}
|
||||
_LOGISTICS_TOKENS = {"RECRUIT", "HEAL"}
|
||||
_TRADE_TOKENS = {"TRADE"}
|
||||
_SCOUT_TOKENS = {"SPY"}
|
||||
|
||||
|
||||
class BaseVassal:
|
||||
"""Shared vassal lifecycle — subscribes to subgoal queue, runs tick loop."""
|
||||
|
||||
name: str = "base_vassal"
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
gabs_client: GABSClient,
|
||||
subgoal_queue: asyncio.Queue[SubgoalMessage],
|
||||
result_queue: asyncio.Queue[ResultMessage] | None = None,
|
||||
task_queue: asyncio.Queue[TaskMessage] | None = None,
|
||||
) -> None:
|
||||
self._gabs = gabs_client
|
||||
self._subgoal_queue = subgoal_queue
|
||||
self._result_queue = result_queue or asyncio.Queue()
|
||||
self._task_queue = task_queue or asyncio.Queue()
|
||||
self._active_subgoal: KingSubgoal | None = None
|
||||
self._running = False
|
||||
|
||||
@property
|
||||
def task_queue(self) -> asyncio.Queue[TaskMessage]:
|
||||
return self._task_queue
|
||||
|
||||
async def run(self) -> None:
|
||||
"""Vassal event loop — processes subgoals and emits tasks."""
|
||||
self._running = True
|
||||
logger.info("%s started", self.name)
|
||||
try:
|
||||
while self._running:
|
||||
# Drain all pending subgoals (keep the latest)
|
||||
try:
|
||||
while True:
|
||||
msg = self._subgoal_queue.get_nowait()
|
||||
if msg.to_agent == self.name:
|
||||
self._active_subgoal = msg.subgoal
|
||||
logger.debug("%s received subgoal %s", self.name, msg.subgoal.token)
|
||||
except asyncio.QueueEmpty:
|
||||
pass
|
||||
|
||||
if self._active_subgoal is not None:
|
||||
await self._tick(self._active_subgoal)
|
||||
|
||||
await asyncio.sleep(0.25) # yield to event loop
|
||||
except asyncio.CancelledError:
|
||||
logger.info("%s cancelled", self.name)
|
||||
raise
|
||||
finally:
|
||||
self._running = False
|
||||
|
||||
def stop(self) -> None:
|
||||
self._running = False
|
||||
|
||||
async def _tick(self, subgoal: KingSubgoal) -> None:
|
||||
raise NotImplementedError
|
||||
|
||||
async def _get_state(self) -> dict[str, Any]:
|
||||
try:
|
||||
return await self._gabs.get_state() or {}
|
||||
except GABSUnavailable:
|
||||
return {}
|
||||
|
||||
|
||||
# ── War Vassal ────────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class WarVassal(BaseVassal):
|
||||
"""Military operations — sieges, field battles, raids, defensive maneuvers.
|
||||
|
||||
Reward function:
|
||||
R = 0.40*ΔTerritoryValue + 0.25*ΔArmyStrengthRatio
|
||||
- 0.20*CasualtyCost - 0.10*SupplyCost + 0.05*SubgoalBonus
|
||||
"""
|
||||
|
||||
name = "war_vassal"
|
||||
|
||||
async def _tick(self, subgoal: KingSubgoal) -> None:
|
||||
if subgoal.token not in _WAR_TOKENS | _LOGISTICS_TOKENS:
|
||||
return
|
||||
|
||||
state = await self._get_state()
|
||||
reward = self._compute_reward(state, subgoal)
|
||||
|
||||
task = self._plan_action(state, subgoal)
|
||||
if task:
|
||||
await self._task_queue.put(task)
|
||||
|
||||
logger.debug(
|
||||
"%s tick: subgoal=%s reward=%.3f action=%s",
|
||||
self.name,
|
||||
subgoal.token,
|
||||
reward.total,
|
||||
task.primitive if task else "none",
|
||||
)
|
||||
|
||||
def _compute_reward(self, state: dict[str, Any], subgoal: KingSubgoal) -> WarReward:
|
||||
bonus = subgoal.priority * 0.05 if subgoal.token in _WAR_TOKENS else 0.0
|
||||
return WarReward(
|
||||
territory_delta=float(state.get("territory_delta", 0.0)),
|
||||
army_strength_ratio=float(state.get("army_strength_ratio", 1.0)),
|
||||
casualty_cost=float(state.get("casualty_cost", 0.0)),
|
||||
supply_cost=float(state.get("supply_cost", 0.0)),
|
||||
subgoal_bonus=bonus,
|
||||
)
|
||||
|
||||
def _plan_action(self, state: dict[str, Any], subgoal: KingSubgoal) -> TaskMessage | None:
|
||||
if subgoal.token == "EXPAND_TERRITORY" and subgoal.target: # noqa: S105
|
||||
return TaskMessage(
|
||||
from_agent=self.name,
|
||||
to_agent="logistics_companion",
|
||||
primitive="move_party",
|
||||
args={"destination": subgoal.target},
|
||||
priority=subgoal.priority,
|
||||
)
|
||||
if subgoal.token == "RECRUIT": # noqa: S105
|
||||
qty = subgoal.quantity or 20
|
||||
return TaskMessage(
|
||||
from_agent=self.name,
|
||||
to_agent="logistics_companion",
|
||||
primitive="recruit_troop",
|
||||
args={"troop_type": "infantry", "quantity": qty},
|
||||
priority=subgoal.priority,
|
||||
)
|
||||
if subgoal.token == "TRAIN": # noqa: S105
|
||||
return TaskMessage(
|
||||
from_agent=self.name,
|
||||
to_agent="logistics_companion",
|
||||
primitive="upgrade_troops",
|
||||
args={},
|
||||
priority=subgoal.priority,
|
||||
)
|
||||
return None
|
||||
|
||||
|
||||
# ── Economy Vassal ────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class EconomyVassal(BaseVassal):
|
||||
"""Settlement management, tax collection, construction, food supply.
|
||||
|
||||
Reward function:
|
||||
R = 0.35*DailyDenarsIncome + 0.25*FoodStockBuffer + 0.20*LoyaltyAverage
|
||||
- 0.15*ConstructionQueueLength + 0.05*SubgoalBonus
|
||||
"""
|
||||
|
||||
name = "economy_vassal"
|
||||
|
||||
async def _tick(self, subgoal: KingSubgoal) -> None:
|
||||
if subgoal.token not in _ECON_TOKENS | _TRADE_TOKENS:
|
||||
return
|
||||
|
||||
state = await self._get_state()
|
||||
reward = self._compute_reward(state, subgoal)
|
||||
|
||||
task = self._plan_action(state, subgoal)
|
||||
if task:
|
||||
await self._task_queue.put(task)
|
||||
|
||||
logger.debug(
|
||||
"%s tick: subgoal=%s reward=%.3f",
|
||||
self.name,
|
||||
subgoal.token,
|
||||
reward.total,
|
||||
)
|
||||
|
||||
def _compute_reward(self, state: dict[str, Any], subgoal: KingSubgoal) -> EconomyReward:
|
||||
bonus = subgoal.priority * 0.05 if subgoal.token in _ECON_TOKENS else 0.0
|
||||
return EconomyReward(
|
||||
daily_denars_income=float(state.get("daily_income", 0.0)),
|
||||
food_stock_buffer=float(state.get("food_days_remaining", 0.0)),
|
||||
loyalty_average=float(state.get("avg_loyalty", 50.0)),
|
||||
construction_queue_length=int(state.get("construction_queue", 0)),
|
||||
subgoal_bonus=bonus,
|
||||
)
|
||||
|
||||
def _plan_action(self, state: dict[str, Any], subgoal: KingSubgoal) -> TaskMessage | None:
|
||||
if subgoal.token == "FORTIFY" and subgoal.target: # noqa: S105
|
||||
return TaskMessage(
|
||||
from_agent=self.name,
|
||||
to_agent="logistics_companion",
|
||||
primitive="build_project",
|
||||
args={"settlement": subgoal.target},
|
||||
priority=subgoal.priority,
|
||||
)
|
||||
if subgoal.token == "TRADE": # noqa: S105
|
||||
return TaskMessage(
|
||||
from_agent=self.name,
|
||||
to_agent="caravan_companion",
|
||||
primitive="assess_prices",
|
||||
args={"town": subgoal.target or "nearest"},
|
||||
priority=subgoal.priority,
|
||||
)
|
||||
return None
|
||||
|
||||
|
||||
# ── Diplomacy Vassal ──────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class DiplomacyVassal(BaseVassal):
|
||||
"""Relations management — alliances, peace deals, tribute, marriage.
|
||||
|
||||
Reward function:
|
||||
R = 0.30*AlliesCount + 0.25*TruceDurationValue + 0.25*RelationsScoreWeighted
|
||||
- 0.15*ActiveWarsFront + 0.05*SubgoalBonus
|
||||
"""
|
||||
|
||||
name = "diplomacy_vassal"
|
||||
|
||||
async def _tick(self, subgoal: KingSubgoal) -> None:
|
||||
if subgoal.token not in _DIPLO_TOKENS | _SCOUT_TOKENS:
|
||||
return
|
||||
|
||||
state = await self._get_state()
|
||||
reward = self._compute_reward(state, subgoal)
|
||||
|
||||
task = self._plan_action(state, subgoal)
|
||||
if task:
|
||||
await self._task_queue.put(task)
|
||||
|
||||
logger.debug(
|
||||
"%s tick: subgoal=%s reward=%.3f",
|
||||
self.name,
|
||||
subgoal.token,
|
||||
reward.total,
|
||||
)
|
||||
|
||||
def _compute_reward(self, state: dict[str, Any], subgoal: KingSubgoal) -> DiplomacyReward:
|
||||
bonus = subgoal.priority * 0.05 if subgoal.token in _DIPLO_TOKENS else 0.0
|
||||
return DiplomacyReward(
|
||||
allies_count=int(state.get("allies_count", 0)),
|
||||
truce_duration_value=float(state.get("truce_value", 0.0)),
|
||||
relations_score_weighted=float(state.get("relations_weighted", 0.0)),
|
||||
active_wars_front=int(state.get("active_wars", 0)),
|
||||
subgoal_bonus=bonus,
|
||||
)
|
||||
|
||||
def _plan_action(self, state: dict[str, Any], subgoal: KingSubgoal) -> TaskMessage | None:
|
||||
if subgoal.token == "ALLY" and subgoal.target: # noqa: S105
|
||||
return TaskMessage(
|
||||
from_agent=self.name,
|
||||
to_agent="scout_companion",
|
||||
primitive="track_lord",
|
||||
args={"name": subgoal.target},
|
||||
priority=subgoal.priority,
|
||||
)
|
||||
if subgoal.token == "SPY" and subgoal.target: # noqa: S105
|
||||
return TaskMessage(
|
||||
from_agent=self.name,
|
||||
to_agent="scout_companion",
|
||||
primitive="assess_garrison",
|
||||
args={"settlement": subgoal.target},
|
||||
priority=subgoal.priority,
|
||||
)
|
||||
return None
|
||||
@@ -1,198 +0,0 @@
|
||||
"""GABS TCP/JSON-RPC client.
|
||||
|
||||
Connects to the Bannerlord.GABS C# mod server running on a Windows VM.
|
||||
Protocol: newline-delimited JSON-RPC 2.0 over raw TCP.
|
||||
|
||||
Default host: localhost, port: 4825 (configurable via settings.bannerlord_gabs_host
|
||||
and settings.bannerlord_gabs_port).
|
||||
|
||||
Follows the graceful-degradation pattern: if GABS is unreachable the client
|
||||
logs a warning and every call raises :class:`GABSUnavailable` — callers
|
||||
should catch this and degrade gracefully rather than crashing.
|
||||
|
||||
Refs: #1091, #1097.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import logging
|
||||
from typing import Any
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
_DEFAULT_HOST = "localhost"
|
||||
_DEFAULT_PORT = 4825
|
||||
_DEFAULT_TIMEOUT = 10.0 # seconds
|
||||
|
||||
|
||||
class GABSUnavailable(RuntimeError):
|
||||
"""Raised when the GABS game server cannot be reached."""
|
||||
|
||||
|
||||
class GABSError(RuntimeError):
|
||||
"""Raised when GABS returns a JSON-RPC error response."""
|
||||
|
||||
def __init__(self, code: int, message: str) -> None:
|
||||
super().__init__(f"GABS error {code}: {message}")
|
||||
self.code = code
|
||||
|
||||
|
||||
class GABSClient:
|
||||
"""Async TCP JSON-RPC client for Bannerlord.GABS.
|
||||
|
||||
Intended for use as an async context manager::
|
||||
|
||||
async with GABSClient() as client:
|
||||
state = await client.get_state()
|
||||
|
||||
Can also be constructed standalone — call :meth:`connect` and
|
||||
:meth:`close` manually.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
host: str = _DEFAULT_HOST,
|
||||
port: int = _DEFAULT_PORT,
|
||||
timeout: float = _DEFAULT_TIMEOUT,
|
||||
) -> None:
|
||||
self._host = host
|
||||
self._port = port
|
||||
self._timeout = timeout
|
||||
self._reader: asyncio.StreamReader | None = None
|
||||
self._writer: asyncio.StreamWriter | None = None
|
||||
self._seq = 0
|
||||
self._connected = False
|
||||
|
||||
# ── Lifecycle ─────────────────────────────────────────────────────────
|
||||
|
||||
async def connect(self) -> None:
|
||||
"""Open the TCP connection to GABS.
|
||||
|
||||
Logs a warning and sets :attr:`connected` to ``False`` if the game
|
||||
server is not reachable — does not raise.
|
||||
"""
|
||||
try:
|
||||
self._reader, self._writer = await asyncio.wait_for(
|
||||
asyncio.open_connection(self._host, self._port),
|
||||
timeout=self._timeout,
|
||||
)
|
||||
self._connected = True
|
||||
logger.info("GABS connected at %s:%s", self._host, self._port)
|
||||
except (TimeoutError, OSError) as exc:
|
||||
logger.warning(
|
||||
"GABS unavailable at %s:%s — Bannerlord agent will degrade: %s",
|
||||
self._host,
|
||||
self._port,
|
||||
exc,
|
||||
)
|
||||
self._connected = False
|
||||
|
||||
async def close(self) -> None:
|
||||
if self._writer is not None:
|
||||
try:
|
||||
self._writer.close()
|
||||
await self._writer.wait_closed()
|
||||
except Exception: # noqa: BLE001
|
||||
pass
|
||||
self._connected = False
|
||||
logger.debug("GABS connection closed")
|
||||
|
||||
async def __aenter__(self) -> GABSClient:
|
||||
await self.connect()
|
||||
return self
|
||||
|
||||
async def __aexit__(self, *_: Any) -> None:
|
||||
await self.close()
|
||||
|
||||
@property
|
||||
def connected(self) -> bool:
|
||||
return self._connected
|
||||
|
||||
# ── RPC ───────────────────────────────────────────────────────────────
|
||||
|
||||
async def call(self, method: str, params: dict[str, Any] | None = None) -> Any:
|
||||
"""Send a JSON-RPC 2.0 request and return the ``result`` field.
|
||||
|
||||
Raises:
|
||||
GABSUnavailable: if the client is not connected.
|
||||
GABSError: if the server returns a JSON-RPC error.
|
||||
"""
|
||||
if not self._connected or self._reader is None or self._writer is None:
|
||||
raise GABSUnavailable(
|
||||
f"GABS not connected (host={self._host}, port={self._port}). "
|
||||
"Is the Bannerlord VM running?"
|
||||
)
|
||||
|
||||
self._seq += 1
|
||||
request = {
|
||||
"jsonrpc": "2.0",
|
||||
"id": self._seq,
|
||||
"method": method,
|
||||
"params": params or {},
|
||||
}
|
||||
payload = json.dumps(request) + "\n"
|
||||
|
||||
try:
|
||||
self._writer.write(payload.encode())
|
||||
await asyncio.wait_for(self._writer.drain(), timeout=self._timeout)
|
||||
|
||||
raw = await asyncio.wait_for(self._reader.readline(), timeout=self._timeout)
|
||||
except (TimeoutError, OSError) as exc:
|
||||
self._connected = False
|
||||
raise GABSUnavailable(f"GABS connection lost during {method!r}: {exc}") from exc
|
||||
|
||||
response = json.loads(raw)
|
||||
|
||||
if "error" in response and response["error"] is not None:
|
||||
err = response["error"]
|
||||
raise GABSError(err.get("code", -1), err.get("message", "unknown"))
|
||||
|
||||
return response.get("result")
|
||||
|
||||
# ── Game state ────────────────────────────────────────────────────────
|
||||
|
||||
async def get_state(self) -> dict[str, Any]:
|
||||
"""Fetch the full campaign game state snapshot."""
|
||||
return await self.call("game.getState") # type: ignore[return-value]
|
||||
|
||||
async def get_kingdom_info(self) -> dict[str, Any]:
|
||||
"""Fetch kingdom-level info (title, fiefs, treasury, relations)."""
|
||||
return await self.call("kingdom.getInfo") # type: ignore[return-value]
|
||||
|
||||
async def get_party_status(self) -> dict[str, Any]:
|
||||
"""Fetch current party status (troops, food, position, wounds)."""
|
||||
return await self.call("party.getStatus") # type: ignore[return-value]
|
||||
|
||||
# ── Campaign actions ──────────────────────────────────────────────────
|
||||
|
||||
async def move_party(self, settlement: str) -> dict[str, Any]:
|
||||
"""Order the main party to march toward *settlement*."""
|
||||
return await self.call("party.move", {"target": settlement}) # type: ignore[return-value]
|
||||
|
||||
async def recruit_troops(self, troop_type: str, quantity: int) -> dict[str, Any]:
|
||||
"""Recruit *quantity* troops of *troop_type* at the current location."""
|
||||
return await self.call( # type: ignore[return-value]
|
||||
"party.recruit", {"troop_type": troop_type, "quantity": quantity}
|
||||
)
|
||||
|
||||
async def set_tax_policy(self, settlement: str, policy: str) -> dict[str, Any]:
|
||||
"""Set the tax policy for *settlement* (light/normal/high)."""
|
||||
return await self.call( # type: ignore[return-value]
|
||||
"settlement.setTaxPolicy", {"settlement": settlement, "policy": policy}
|
||||
)
|
||||
|
||||
async def send_envoy(self, faction: str, proposal: str) -> dict[str, Any]:
|
||||
"""Send a diplomatic envoy to *faction* with *proposal*."""
|
||||
return await self.call( # type: ignore[return-value]
|
||||
"diplomacy.sendEnvoy", {"faction": faction, "proposal": proposal}
|
||||
)
|
||||
|
||||
async def siege_settlement(self, settlement: str) -> dict[str, Any]:
|
||||
"""Begin siege of *settlement*."""
|
||||
return await self.call("battle.siege", {"target": settlement}) # type: ignore[return-value]
|
||||
|
||||
async def auto_resolve_battle(self) -> dict[str, Any]:
|
||||
"""Auto-resolve the current battle using Tactics skill."""
|
||||
return await self.call("battle.autoResolve") # type: ignore[return-value]
|
||||
@@ -1,256 +0,0 @@
|
||||
"""Asset ledger for the Bannerlord sovereign agent.
|
||||
|
||||
Tracks kingdom assets (denars, settlements, troop allocations) in an
|
||||
in-memory dict backed by SQLite for persistence. Follows the existing
|
||||
SQLite migration pattern in this repo.
|
||||
|
||||
The King has exclusive write access to treasury and settlement ownership.
|
||||
Vassals receive an allocated budget and cannot exceed it without King
|
||||
re-authorization. Companions hold only work-in-progress quotas.
|
||||
|
||||
Refs: #1097, #1099.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import sqlite3
|
||||
from collections.abc import Iterator
|
||||
from contextlib import contextmanager
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
_DEFAULT_DB = Path.home() / ".timmy" / "bannerlord" / "ledger.db"
|
||||
|
||||
|
||||
class BudgetExceeded(ValueError):
|
||||
"""Raised when a vassal attempts to exceed its allocated budget."""
|
||||
|
||||
|
||||
class Ledger:
|
||||
"""Sovereign asset ledger backed by SQLite.
|
||||
|
||||
Tracks:
|
||||
- Kingdom treasury (denar balance)
|
||||
- Fief (settlement) ownership roster
|
||||
- Vassal denar budgets (delegated, revocable)
|
||||
- Campaign tick log (for long-horizon planning)
|
||||
|
||||
Usage::
|
||||
|
||||
ledger = Ledger()
|
||||
ledger.initialize()
|
||||
ledger.deposit(5000, "tax income — Epicrotea")
|
||||
ledger.allocate_budget("war_vassal", 2000)
|
||||
"""
|
||||
|
||||
def __init__(self, db_path: Path = _DEFAULT_DB) -> None:
|
||||
self._db_path = db_path
|
||||
self._db_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# ── Setup ─────────────────────────────────────────────────────────────
|
||||
|
||||
def initialize(self) -> None:
|
||||
"""Create tables if they don't exist."""
|
||||
with self._conn() as conn:
|
||||
conn.executescript(
|
||||
"""
|
||||
CREATE TABLE IF NOT EXISTS treasury (
|
||||
id INTEGER PRIMARY KEY CHECK (id = 1),
|
||||
balance REAL NOT NULL DEFAULT 0
|
||||
);
|
||||
INSERT OR IGNORE INTO treasury (id, balance) VALUES (1, 0);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS fiefs (
|
||||
name TEXT PRIMARY KEY,
|
||||
fief_type TEXT NOT NULL, -- town / castle / village
|
||||
acquired_at TEXT NOT NULL
|
||||
);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS vassal_budgets (
|
||||
agent TEXT PRIMARY KEY,
|
||||
allocated REAL NOT NULL DEFAULT 0,
|
||||
spent REAL NOT NULL DEFAULT 0
|
||||
);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS tick_log (
|
||||
tick INTEGER PRIMARY KEY,
|
||||
campaign_day INTEGER NOT NULL,
|
||||
subgoal TEXT,
|
||||
reward_war REAL,
|
||||
reward_econ REAL,
|
||||
reward_diplo REAL,
|
||||
logged_at TEXT NOT NULL
|
||||
);
|
||||
"""
|
||||
)
|
||||
logger.debug("Ledger initialized at %s", self._db_path)
|
||||
|
||||
# ── Treasury ──────────────────────────────────────────────────────────
|
||||
|
||||
def balance(self) -> float:
|
||||
with self._conn() as conn:
|
||||
row = conn.execute("SELECT balance FROM treasury WHERE id = 1").fetchone()
|
||||
return float(row[0]) if row else 0.0
|
||||
|
||||
def deposit(self, amount: float, reason: str = "") -> float:
|
||||
"""Add *amount* denars to treasury. Returns new balance."""
|
||||
if amount < 0:
|
||||
raise ValueError("Use withdraw() for negative amounts")
|
||||
with self._conn() as conn:
|
||||
conn.execute("UPDATE treasury SET balance = balance + ? WHERE id = 1", (amount,))
|
||||
bal = self.balance()
|
||||
logger.info("Treasury +%.0f denars (%s) → balance %.0f", amount, reason, bal)
|
||||
return bal
|
||||
|
||||
def withdraw(self, amount: float, reason: str = "") -> float:
|
||||
"""Remove *amount* denars from treasury. Returns new balance."""
|
||||
if amount < 0:
|
||||
raise ValueError("Amount must be positive")
|
||||
bal = self.balance()
|
||||
if amount > bal:
|
||||
raise BudgetExceeded(
|
||||
f"Cannot withdraw {amount:.0f} denars — treasury balance is only {bal:.0f}"
|
||||
)
|
||||
with self._conn() as conn:
|
||||
conn.execute("UPDATE treasury SET balance = balance - ? WHERE id = 1", (amount,))
|
||||
new_bal = self.balance()
|
||||
logger.info("Treasury -%.0f denars (%s) → balance %.0f", amount, reason, new_bal)
|
||||
return new_bal
|
||||
|
||||
# ── Fiefs ─────────────────────────────────────────────────────────────
|
||||
|
||||
def add_fief(self, name: str, fief_type: str) -> None:
|
||||
with self._conn() as conn:
|
||||
conn.execute(
|
||||
"INSERT OR REPLACE INTO fiefs (name, fief_type, acquired_at) VALUES (?, ?, ?)",
|
||||
(name, fief_type, datetime.utcnow().isoformat()),
|
||||
)
|
||||
logger.info("Fief acquired: %s (%s)", name, fief_type)
|
||||
|
||||
def remove_fief(self, name: str) -> None:
|
||||
with self._conn() as conn:
|
||||
conn.execute("DELETE FROM fiefs WHERE name = ?", (name,))
|
||||
logger.info("Fief lost: %s", name)
|
||||
|
||||
def list_fiefs(self) -> list[dict[str, str]]:
|
||||
with self._conn() as conn:
|
||||
rows = conn.execute("SELECT name, fief_type, acquired_at FROM fiefs").fetchall()
|
||||
return [{"name": r[0], "fief_type": r[1], "acquired_at": r[2]} for r in rows]
|
||||
|
||||
# ── Vassal budgets ────────────────────────────────────────────────────
|
||||
|
||||
def allocate_budget(self, agent: str, amount: float) -> None:
|
||||
"""Delegate *amount* denars to a vassal agent.
|
||||
|
||||
Withdraws from treasury. Raises :class:`BudgetExceeded` if
|
||||
the treasury cannot cover the allocation.
|
||||
"""
|
||||
self.withdraw(amount, reason=f"budget → {agent}")
|
||||
with self._conn() as conn:
|
||||
conn.execute(
|
||||
"""
|
||||
INSERT INTO vassal_budgets (agent, allocated, spent)
|
||||
VALUES (?, ?, 0)
|
||||
ON CONFLICT(agent) DO UPDATE SET allocated = allocated + excluded.allocated
|
||||
""",
|
||||
(agent, amount),
|
||||
)
|
||||
logger.info("Allocated %.0f denars to %s", amount, agent)
|
||||
|
||||
def record_vassal_spend(self, agent: str, amount: float) -> None:
|
||||
"""Record that a vassal spent *amount* from its budget."""
|
||||
with self._conn() as conn:
|
||||
row = conn.execute(
|
||||
"SELECT allocated, spent FROM vassal_budgets WHERE agent = ?", (agent,)
|
||||
).fetchone()
|
||||
if row is None:
|
||||
raise BudgetExceeded(f"{agent} has no allocated budget")
|
||||
allocated, spent = row
|
||||
if spent + amount > allocated:
|
||||
raise BudgetExceeded(
|
||||
f"{agent} budget exhausted: {spent:.0f}/{allocated:.0f} spent, "
|
||||
f"requested {amount:.0f}"
|
||||
)
|
||||
with self._conn() as conn:
|
||||
conn.execute(
|
||||
"UPDATE vassal_budgets SET spent = spent + ? WHERE agent = ?",
|
||||
(amount, agent),
|
||||
)
|
||||
|
||||
def vassal_remaining(self, agent: str) -> float:
|
||||
with self._conn() as conn:
|
||||
row = conn.execute(
|
||||
"SELECT allocated - spent FROM vassal_budgets WHERE agent = ?", (agent,)
|
||||
).fetchone()
|
||||
return float(row[0]) if row else 0.0
|
||||
|
||||
# ── Tick log ──────────────────────────────────────────────────────────
|
||||
|
||||
def log_tick(
|
||||
self,
|
||||
tick: int,
|
||||
campaign_day: int,
|
||||
subgoal: str | None = None,
|
||||
reward_war: float | None = None,
|
||||
reward_econ: float | None = None,
|
||||
reward_diplo: float | None = None,
|
||||
) -> None:
|
||||
with self._conn() as conn:
|
||||
conn.execute(
|
||||
"""
|
||||
INSERT OR REPLACE INTO tick_log
|
||||
(tick, campaign_day, subgoal, reward_war, reward_econ, reward_diplo, logged_at)
|
||||
VALUES (?, ?, ?, ?, ?, ?, ?)
|
||||
""",
|
||||
(
|
||||
tick,
|
||||
campaign_day,
|
||||
subgoal,
|
||||
reward_war,
|
||||
reward_econ,
|
||||
reward_diplo,
|
||||
datetime.utcnow().isoformat(),
|
||||
),
|
||||
)
|
||||
|
||||
def tick_history(self, last_n: int = 100) -> list[dict]:
|
||||
with self._conn() as conn:
|
||||
rows = conn.execute(
|
||||
"""
|
||||
SELECT tick, campaign_day, subgoal, reward_war, reward_econ, reward_diplo, logged_at
|
||||
FROM tick_log
|
||||
ORDER BY tick DESC
|
||||
LIMIT ?
|
||||
""",
|
||||
(last_n,),
|
||||
).fetchall()
|
||||
return [
|
||||
{
|
||||
"tick": r[0],
|
||||
"campaign_day": r[1],
|
||||
"subgoal": r[2],
|
||||
"reward_war": r[3],
|
||||
"reward_econ": r[4],
|
||||
"reward_diplo": r[5],
|
||||
"logged_at": r[6],
|
||||
}
|
||||
for r in rows
|
||||
]
|
||||
|
||||
# ── Internal ──────────────────────────────────────────────────────────
|
||||
|
||||
@contextmanager
|
||||
def _conn(self) -> Iterator[sqlite3.Connection]:
|
||||
conn = sqlite3.connect(self._db_path)
|
||||
conn.execute("PRAGMA journal_mode=WAL")
|
||||
try:
|
||||
yield conn
|
||||
conn.commit()
|
||||
except Exception:
|
||||
conn.rollback()
|
||||
raise
|
||||
finally:
|
||||
conn.close()
|
||||
@@ -1,191 +0,0 @@
|
||||
"""Bannerlord feudal hierarchy data models.
|
||||
|
||||
All inter-agent communication uses typed Pydantic models. No raw dicts
|
||||
cross agent boundaries — every message is validated at construction time.
|
||||
|
||||
Design: Ahilan & Dayan (2019) Feudal Multi-Agent Hierarchies.
|
||||
Refs: #1097, #1099.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from datetime import datetime
|
||||
from typing import Any, Literal
|
||||
|
||||
from pydantic import BaseModel, Field
|
||||
|
||||
# ── Subgoal vocabulary ────────────────────────────────────────────────────────
|
||||
|
||||
SUBGOAL_TOKENS = frozenset(
|
||||
{
|
||||
"EXPAND_TERRITORY", # Take or secure a fief — War Vassal
|
||||
"RAID_ECONOMY", # Raid enemy villages for denars — War Vassal
|
||||
"FORTIFY", # Upgrade or repair a settlement — Economy Vassal
|
||||
"RECRUIT", # Fill party to capacity — Logistics Companion
|
||||
"TRADE", # Execute profitable trade route — Caravan Companion
|
||||
"ALLY", # Pursue non-aggression / alliance — Diplomacy Vassal
|
||||
"SPY", # Gain information on target faction — Scout Companion
|
||||
"HEAL", # Rest party until wounds recovered — Logistics Companion
|
||||
"CONSOLIDATE", # Hold territory, no expansion — Economy Vassal
|
||||
"TRAIN", # Level troops via auto-resolve bandits — War Vassal
|
||||
}
|
||||
)
|
||||
|
||||
|
||||
# ── King subgoal ──────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class KingSubgoal(BaseModel):
|
||||
"""Strategic directive issued by the King agent to vassals.
|
||||
|
||||
The King operates on campaign-map timescale (days to weeks of in-game
|
||||
time). His sole output is one subgoal token plus optional parameters.
|
||||
He never micro-manages primitives.
|
||||
"""
|
||||
|
||||
token: str = Field(..., description="One of SUBGOAL_TOKENS")
|
||||
target: str | None = Field(None, description="Named target (settlement, lord, faction)")
|
||||
quantity: int | None = Field(None, description="For RECRUIT, TRADE tokens", ge=1)
|
||||
priority: float = Field(1.0, ge=0.0, le=2.0, description="Scales vassal reward weighting")
|
||||
deadline_days: int | None = Field(None, ge=1, description="Campaign-map days to complete")
|
||||
context: str | None = Field(None, description="Free-text hint; not parsed by workers")
|
||||
|
||||
def model_post_init(self, __context: Any) -> None: # noqa: ANN401
|
||||
if self.token not in SUBGOAL_TOKENS:
|
||||
raise ValueError(
|
||||
f"Unknown subgoal token {self.token!r}. Must be one of: {sorted(SUBGOAL_TOKENS)}"
|
||||
)
|
||||
|
||||
|
||||
# ── Inter-agent messages ──────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class SubgoalMessage(BaseModel):
|
||||
"""King → Vassal direction."""
|
||||
|
||||
msg_type: Literal["subgoal"] = "subgoal"
|
||||
from_agent: Literal["king"] = "king"
|
||||
to_agent: str = Field(..., description="e.g. 'war_vassal', 'economy_vassal'")
|
||||
subgoal: KingSubgoal
|
||||
issued_at: datetime = Field(default_factory=datetime.utcnow)
|
||||
|
||||
|
||||
class TaskMessage(BaseModel):
|
||||
"""Vassal → Companion direction."""
|
||||
|
||||
msg_type: Literal["task"] = "task"
|
||||
from_agent: str = Field(..., description="e.g. 'war_vassal'")
|
||||
to_agent: str = Field(..., description="e.g. 'logistics_companion'")
|
||||
primitive: str = Field(..., description="One of the companion primitives")
|
||||
args: dict[str, Any] = Field(default_factory=dict)
|
||||
priority: float = Field(1.0, ge=0.0, le=2.0)
|
||||
issued_at: datetime = Field(default_factory=datetime.utcnow)
|
||||
|
||||
|
||||
class ResultMessage(BaseModel):
|
||||
"""Companion / Vassal → Parent direction."""
|
||||
|
||||
msg_type: Literal["result"] = "result"
|
||||
from_agent: str
|
||||
to_agent: str
|
||||
success: bool
|
||||
outcome: dict[str, Any] = Field(default_factory=dict, description="Primitive-specific result")
|
||||
reward_delta: float = Field(0.0, description="Computed reward contribution")
|
||||
completed_at: datetime = Field(default_factory=datetime.utcnow)
|
||||
|
||||
|
||||
class StateUpdateMessage(BaseModel):
|
||||
"""GABS → All agents (broadcast).
|
||||
|
||||
Sent every campaign tick. Agents consume at their own cadence.
|
||||
"""
|
||||
|
||||
msg_type: Literal["state"] = "state"
|
||||
game_state: dict[str, Any] = Field(..., description="Full GABS state snapshot")
|
||||
tick: int = Field(..., ge=0)
|
||||
timestamp: datetime = Field(default_factory=datetime.utcnow)
|
||||
|
||||
|
||||
# ── Reward snapshots ──────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class WarReward(BaseModel):
|
||||
"""Computed reward for the War Vassal at a given tick."""
|
||||
|
||||
territory_delta: float = 0.0
|
||||
army_strength_ratio: float = 1.0
|
||||
casualty_cost: float = 0.0
|
||||
supply_cost: float = 0.0
|
||||
subgoal_bonus: float = 0.0
|
||||
|
||||
@property
|
||||
def total(self) -> float:
|
||||
w1, w2, w3, w4, w5 = 0.40, 0.25, 0.20, 0.10, 0.05
|
||||
return (
|
||||
w1 * self.territory_delta
|
||||
+ w2 * self.army_strength_ratio
|
||||
- w3 * self.casualty_cost
|
||||
- w4 * self.supply_cost
|
||||
+ w5 * self.subgoal_bonus
|
||||
)
|
||||
|
||||
|
||||
class EconomyReward(BaseModel):
|
||||
"""Computed reward for the Economy Vassal at a given tick."""
|
||||
|
||||
daily_denars_income: float = 0.0
|
||||
food_stock_buffer: float = 0.0
|
||||
loyalty_average: float = 50.0
|
||||
construction_queue_length: int = 0
|
||||
subgoal_bonus: float = 0.0
|
||||
|
||||
@property
|
||||
def total(self) -> float:
|
||||
w1, w2, w3, w4, w5 = 0.35, 0.25, 0.20, 0.15, 0.05
|
||||
return (
|
||||
w1 * self.daily_denars_income
|
||||
+ w2 * self.food_stock_buffer
|
||||
+ w3 * self.loyalty_average
|
||||
- w4 * self.construction_queue_length
|
||||
+ w5 * self.subgoal_bonus
|
||||
)
|
||||
|
||||
|
||||
class DiplomacyReward(BaseModel):
|
||||
"""Computed reward for the Diplomacy Vassal at a given tick."""
|
||||
|
||||
allies_count: int = 0
|
||||
truce_duration_value: float = 0.0
|
||||
relations_score_weighted: float = 0.0
|
||||
active_wars_front: int = 0
|
||||
subgoal_bonus: float = 0.0
|
||||
|
||||
@property
|
||||
def total(self) -> float:
|
||||
w1, w2, w3, w4, w5 = 0.30, 0.25, 0.25, 0.15, 0.05
|
||||
return (
|
||||
w1 * self.allies_count
|
||||
+ w2 * self.truce_duration_value
|
||||
+ w3 * self.relations_score_weighted
|
||||
- w4 * self.active_wars_front
|
||||
+ w5 * self.subgoal_bonus
|
||||
)
|
||||
|
||||
|
||||
# ── Victory condition ─────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class VictoryCondition(BaseModel):
|
||||
"""Sovereign Victory (M5) — evaluated each campaign tick."""
|
||||
|
||||
holds_king_title: bool = False
|
||||
territory_control_pct: float = Field(
|
||||
0.0, ge=0.0, le=100.0, description="% of Calradia fiefs held"
|
||||
)
|
||||
majority_threshold: float = Field(
|
||||
51.0, ge=0.0, le=100.0, description="Required % for majority control"
|
||||
)
|
||||
|
||||
@property
|
||||
def achieved(self) -> bool:
|
||||
return self.holds_king_title and self.territory_control_pct >= self.majority_threshold
|
||||
@@ -1 +0,0 @@
|
||||
"""Brain — identity system and task coordination."""
|
||||
@@ -1,314 +0,0 @@
|
||||
"""DistributedWorker — task lifecycle management and backend routing.
|
||||
|
||||
Routes delegated tasks to appropriate execution backends:
|
||||
|
||||
- agentic_loop: local multi-step execution via Timmy's agentic loop
|
||||
- kimi: heavy research tasks dispatched via Gitea kimi-ready issues
|
||||
- paperclip: task submission to the Paperclip API
|
||||
|
||||
Task lifecycle: queued → running → completed | failed
|
||||
|
||||
Failure handling: auto-retry up to MAX_RETRIES, then mark failed.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
import threading
|
||||
import uuid
|
||||
from dataclasses import dataclass, field
|
||||
from datetime import UTC, datetime
|
||||
from typing import Any, ClassVar
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
MAX_RETRIES = 2
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Task record
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@dataclass
|
||||
class DelegatedTask:
|
||||
"""Record of one delegated task and its execution state."""
|
||||
|
||||
task_id: str
|
||||
agent_name: str
|
||||
agent_role: str
|
||||
task_description: str
|
||||
priority: str
|
||||
backend: str # "agentic_loop" | "kimi" | "paperclip"
|
||||
status: str = "queued" # queued | running | completed | failed
|
||||
created_at: str = field(default_factory=lambda: datetime.now(UTC).isoformat())
|
||||
result: dict[str, Any] | None = None
|
||||
error: str | None = None
|
||||
retries: int = 0
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Worker
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class DistributedWorker:
|
||||
"""Routes and tracks delegated task execution across multiple backends.
|
||||
|
||||
All methods are class-methods; DistributedWorker is a singleton-style
|
||||
service — no instantiation needed.
|
||||
|
||||
Usage::
|
||||
|
||||
from brain.worker import DistributedWorker
|
||||
|
||||
task_id = DistributedWorker.submit("researcher", "research", "summarise X")
|
||||
status = DistributedWorker.get_status(task_id)
|
||||
"""
|
||||
|
||||
_tasks: ClassVar[dict[str, DelegatedTask]] = {}
|
||||
_lock: ClassVar[threading.Lock] = threading.Lock()
|
||||
|
||||
@classmethod
|
||||
def submit(
|
||||
cls,
|
||||
agent_name: str,
|
||||
agent_role: str,
|
||||
task_description: str,
|
||||
priority: str = "normal",
|
||||
) -> str:
|
||||
"""Submit a task for execution. Returns task_id immediately.
|
||||
|
||||
The task is registered as 'queued' and a daemon thread begins
|
||||
execution in the background. Use get_status(task_id) to poll.
|
||||
"""
|
||||
task_id = uuid.uuid4().hex[:8]
|
||||
backend = cls._select_backend(agent_role, task_description)
|
||||
|
||||
record = DelegatedTask(
|
||||
task_id=task_id,
|
||||
agent_name=agent_name,
|
||||
agent_role=agent_role,
|
||||
task_description=task_description,
|
||||
priority=priority,
|
||||
backend=backend,
|
||||
)
|
||||
|
||||
with cls._lock:
|
||||
cls._tasks[task_id] = record
|
||||
|
||||
thread = threading.Thread(
|
||||
target=cls._run_task,
|
||||
args=(record,),
|
||||
daemon=True,
|
||||
name=f"worker-{task_id}",
|
||||
)
|
||||
thread.start()
|
||||
|
||||
logger.info(
|
||||
"Task %s queued: %s → %.60s (backend=%s, priority=%s)",
|
||||
task_id,
|
||||
agent_name,
|
||||
task_description,
|
||||
backend,
|
||||
priority,
|
||||
)
|
||||
return task_id
|
||||
|
||||
@classmethod
|
||||
def get_status(cls, task_id: str) -> dict[str, Any]:
|
||||
"""Return current status of a task by ID."""
|
||||
record = cls._tasks.get(task_id)
|
||||
if record is None:
|
||||
return {"found": False, "task_id": task_id}
|
||||
return {
|
||||
"found": True,
|
||||
"task_id": record.task_id,
|
||||
"agent": record.agent_name,
|
||||
"role": record.agent_role,
|
||||
"status": record.status,
|
||||
"backend": record.backend,
|
||||
"priority": record.priority,
|
||||
"created_at": record.created_at,
|
||||
"retries": record.retries,
|
||||
"result": record.result,
|
||||
"error": record.error,
|
||||
}
|
||||
|
||||
@classmethod
|
||||
def list_tasks(cls) -> list[dict[str, Any]]:
|
||||
"""Return a summary list of all tracked tasks."""
|
||||
with cls._lock:
|
||||
return [
|
||||
{
|
||||
"task_id": t.task_id,
|
||||
"agent": t.agent_name,
|
||||
"status": t.status,
|
||||
"backend": t.backend,
|
||||
"created_at": t.created_at,
|
||||
}
|
||||
for t in cls._tasks.values()
|
||||
]
|
||||
|
||||
@classmethod
|
||||
def clear(cls) -> None:
|
||||
"""Clear the task registry (for tests)."""
|
||||
with cls._lock:
|
||||
cls._tasks.clear()
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Backend selection
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
@classmethod
|
||||
def _select_backend(cls, agent_role: str, task_description: str) -> str:
|
||||
"""Choose the execution backend for a given agent role and task.
|
||||
|
||||
Priority:
|
||||
1. kimi — research role + Gitea enabled + task exceeds local capacity
|
||||
2. paperclip — paperclip API key is configured
|
||||
3. agentic_loop — local fallback (always available)
|
||||
"""
|
||||
try:
|
||||
from config import settings
|
||||
from timmy.kimi_delegation import exceeds_local_capacity
|
||||
|
||||
if (
|
||||
agent_role == "research"
|
||||
and getattr(settings, "gitea_enabled", False)
|
||||
and getattr(settings, "gitea_token", "")
|
||||
and exceeds_local_capacity(task_description)
|
||||
):
|
||||
return "kimi"
|
||||
|
||||
if getattr(settings, "paperclip_api_key", ""):
|
||||
return "paperclip"
|
||||
|
||||
except Exception as exc:
|
||||
logger.debug("Backend selection error — defaulting to agentic_loop: %s", exc)
|
||||
|
||||
return "agentic_loop"
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Task execution
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
@classmethod
|
||||
def _run_task(cls, record: DelegatedTask) -> None:
|
||||
"""Execute a task with retry logic. Runs inside a daemon thread."""
|
||||
record.status = "running"
|
||||
|
||||
for attempt in range(MAX_RETRIES + 1):
|
||||
try:
|
||||
if attempt > 0:
|
||||
logger.info(
|
||||
"Retrying task %s (attempt %d/%d)",
|
||||
record.task_id,
|
||||
attempt + 1,
|
||||
MAX_RETRIES + 1,
|
||||
)
|
||||
record.retries = attempt
|
||||
|
||||
result = cls._dispatch(record)
|
||||
record.status = "completed"
|
||||
record.result = result
|
||||
logger.info(
|
||||
"Task %s completed via %s",
|
||||
record.task_id,
|
||||
record.backend,
|
||||
)
|
||||
return
|
||||
|
||||
except Exception as exc:
|
||||
logger.warning(
|
||||
"Task %s attempt %d failed: %s",
|
||||
record.task_id,
|
||||
attempt + 1,
|
||||
exc,
|
||||
)
|
||||
if attempt == MAX_RETRIES:
|
||||
record.status = "failed"
|
||||
record.error = str(exc)
|
||||
logger.error(
|
||||
"Task %s exhausted %d retries. Final error: %s",
|
||||
record.task_id,
|
||||
MAX_RETRIES,
|
||||
exc,
|
||||
)
|
||||
|
||||
@classmethod
|
||||
def _dispatch(cls, record: DelegatedTask) -> dict[str, Any]:
|
||||
"""Route to the selected backend. Raises on failure."""
|
||||
if record.backend == "kimi":
|
||||
return asyncio.run(cls._execute_kimi(record))
|
||||
if record.backend == "paperclip":
|
||||
return asyncio.run(cls._execute_paperclip(record))
|
||||
return asyncio.run(cls._execute_agentic_loop(record))
|
||||
|
||||
@classmethod
|
||||
async def _execute_kimi(cls, record: DelegatedTask) -> dict[str, Any]:
|
||||
"""Create a kimi-ready Gitea issue for the task.
|
||||
|
||||
Kimi picks up the issue via the kimi-ready label and executes it.
|
||||
"""
|
||||
from timmy.kimi_delegation import create_kimi_research_issue
|
||||
|
||||
result = await create_kimi_research_issue(
|
||||
task=record.task_description[:120],
|
||||
context=f"Delegated by agent '{record.agent_name}' via delegate_task.",
|
||||
question=record.task_description,
|
||||
priority=record.priority,
|
||||
)
|
||||
if not result.get("success"):
|
||||
raise RuntimeError(f"Kimi issue creation failed: {result.get('error')}")
|
||||
return result
|
||||
|
||||
@classmethod
|
||||
async def _execute_paperclip(cls, record: DelegatedTask) -> dict[str, Any]:
|
||||
"""Submit the task to the Paperclip API."""
|
||||
import httpx
|
||||
|
||||
from timmy.paperclip import PaperclipClient
|
||||
|
||||
client = PaperclipClient()
|
||||
async with httpx.AsyncClient(timeout=client.timeout) as http:
|
||||
resp = await http.post(
|
||||
f"{client.base_url}/api/tasks",
|
||||
headers={"Authorization": f"Bearer {client.api_key}"},
|
||||
json={
|
||||
"kind": record.agent_role,
|
||||
"agent_id": client.agent_id,
|
||||
"company_id": client.company_id,
|
||||
"priority": record.priority,
|
||||
"context": {"task": record.task_description},
|
||||
},
|
||||
)
|
||||
|
||||
if resp.status_code in (200, 201):
|
||||
data = resp.json()
|
||||
logger.info(
|
||||
"Task %s submitted to Paperclip (paperclip_id=%s)",
|
||||
record.task_id,
|
||||
data.get("id"),
|
||||
)
|
||||
return {
|
||||
"success": True,
|
||||
"paperclip_task_id": data.get("id"),
|
||||
"backend": "paperclip",
|
||||
}
|
||||
raise RuntimeError(f"Paperclip API error {resp.status_code}: {resp.text[:200]}")
|
||||
|
||||
@classmethod
|
||||
async def _execute_agentic_loop(cls, record: DelegatedTask) -> dict[str, Any]:
|
||||
"""Execute the task via Timmy's local agentic loop."""
|
||||
from timmy.agentic_loop import run_agentic_loop
|
||||
|
||||
result = await run_agentic_loop(record.task_description)
|
||||
return {
|
||||
"success": result.status != "failed",
|
||||
"agentic_task_id": result.task_id,
|
||||
"summary": result.summary,
|
||||
"status": result.status,
|
||||
"backend": "agentic_loop",
|
||||
}
|
||||
@@ -1,8 +1,3 @@
|
||||
"""Central pydantic-settings configuration for Timmy Time Dashboard.
|
||||
|
||||
All environment variable access goes through the ``settings`` singleton
|
||||
exported from this module — never use ``os.environ.get()`` in app code.
|
||||
"""
|
||||
import logging as _logging
|
||||
import os
|
||||
import sys
|
||||
@@ -56,13 +51,6 @@ class Settings(BaseSettings):
|
||||
# Set to 0 to use model defaults.
|
||||
ollama_num_ctx: int = 32768
|
||||
|
||||
# Maximum models loaded simultaneously in Ollama — override with OLLAMA_MAX_LOADED_MODELS
|
||||
# Set to 2 so Qwen3-8B and Qwen3-14B can stay hot concurrently (~17 GB combined).
|
||||
# Requires Ollama ≥ 0.1.33. Export this to the Ollama process environment:
|
||||
# OLLAMA_MAX_LOADED_MODELS=2 ollama serve
|
||||
# or add it to your systemd/launchd unit before starting the harness.
|
||||
ollama_max_loaded_models: int = 2
|
||||
|
||||
# Fallback model chains — override with FALLBACK_MODELS / VISION_FALLBACK_MODELS
|
||||
# as comma-separated strings, e.g. FALLBACK_MODELS="qwen3:8b,qwen2.5:14b"
|
||||
# Or edit config/providers.yaml → fallback_chains for the canonical source.
|
||||
@@ -99,9 +87,8 @@ class Settings(BaseSettings):
|
||||
|
||||
# ── Backend selection ────────────────────────────────────────────────────
|
||||
# "ollama" — always use Ollama (default, safe everywhere)
|
||||
# "airllm" — AirLLM layer-by-layer loading (Apple Silicon only; degrades to Ollama)
|
||||
# "auto" — pick best available local backend, fall back to Ollama
|
||||
timmy_model_backend: Literal["ollama", "airllm", "grok", "claude", "auto"] = "ollama"
|
||||
timmy_model_backend: Literal["ollama", "grok", "claude", "auto"] = "ollama"
|
||||
|
||||
# ── Grok (xAI) — opt-in premium cloud backend ────────────────────────
|
||||
# Grok is a premium augmentation layer — local-first ethos preserved.
|
||||
@@ -114,16 +101,6 @@ class Settings(BaseSettings):
|
||||
grok_sats_hard_cap: int = 100 # Absolute ceiling on sats per Grok query
|
||||
grok_free: bool = False # Skip Lightning invoice when user has own API key
|
||||
|
||||
# ── Search Backend (SearXNG + Crawl4AI) ──────────────────────────────
|
||||
# "searxng" — self-hosted SearXNG meta-search engine (default, no API key)
|
||||
# "none" — disable web search (private/offline deployments)
|
||||
# Override with TIMMY_SEARCH_BACKEND env var.
|
||||
timmy_search_backend: Literal["searxng", "none"] = "searxng"
|
||||
# SearXNG base URL — override with TIMMY_SEARCH_URL env var
|
||||
search_url: str = "http://localhost:8888"
|
||||
# Crawl4AI base URL — override with TIMMY_CRAWL_URL env var
|
||||
crawl_url: str = "http://localhost:11235"
|
||||
|
||||
# ── Database ──────────────────────────────────────────────────────────
|
||||
db_busy_timeout_ms: int = 5000 # SQLite PRAGMA busy_timeout (ms)
|
||||
|
||||
@@ -133,23 +110,6 @@ class Settings(BaseSettings):
|
||||
anthropic_api_key: str = ""
|
||||
claude_model: str = "haiku"
|
||||
|
||||
# ── Tiered Model Router (issue #882) ─────────────────────────────────
|
||||
# Three-tier cascade: Local 8B (free, fast) → Local 70B (free, slower)
|
||||
# → Cloud API (paid, best). Override model names per tier via env vars.
|
||||
#
|
||||
# TIER_LOCAL_FAST_MODEL — Tier-1 model name in Ollama (default: llama3.1:8b)
|
||||
# TIER_LOCAL_HEAVY_MODEL — Tier-2 model name in Ollama (default: hermes3:70b)
|
||||
# TIER_CLOUD_MODEL — Tier-3 cloud model name (default: claude-haiku-4-5)
|
||||
#
|
||||
# Budget limits for the cloud tier (0 = unlimited):
|
||||
# TIER_CLOUD_DAILY_BUDGET_USD — daily ceiling in USD (default: 5.0)
|
||||
# TIER_CLOUD_MONTHLY_BUDGET_USD — monthly ceiling in USD (default: 50.0)
|
||||
tier_local_fast_model: str = "llama3.1:8b"
|
||||
tier_local_heavy_model: str = "hermes3:70b"
|
||||
tier_cloud_model: str = "claude-haiku-4-5"
|
||||
tier_cloud_daily_budget_usd: float = 5.0
|
||||
tier_cloud_monthly_budget_usd: float = 50.0
|
||||
|
||||
# ── Content Moderation ──────────────────────────────────────────────
|
||||
# Three-layer moderation pipeline for AI narrator output.
|
||||
# Uses Llama Guard via Ollama with regex fallback.
|
||||
@@ -268,10 +228,6 @@ class Settings(BaseSettings):
|
||||
# ── Test / Diagnostics ─────────────────────────────────────────────
|
||||
# Skip loading heavy embedding models (for tests / low-memory envs).
|
||||
timmy_skip_embeddings: bool = False
|
||||
# Embedding backend: "ollama" for Ollama, "local" for sentence-transformers.
|
||||
timmy_embedding_backend: Literal["ollama", "local"] = "local"
|
||||
# Ollama model to use for embeddings (e.g., "nomic-embed-text").
|
||||
ollama_embedding_model: str = "nomic-embed-text"
|
||||
# Disable CSRF middleware entirely (for tests).
|
||||
timmy_disable_csrf: bool = False
|
||||
# Mark the process as running in test mode.
|
||||
@@ -420,11 +376,6 @@ class Settings(BaseSettings):
|
||||
autoresearch_time_budget: int = 300 # seconds per experiment run
|
||||
autoresearch_max_iterations: int = 100
|
||||
autoresearch_metric: str = "val_bpb" # metric to optimise (lower = better)
|
||||
# M3 Max / Apple Silicon tuning (Issue #905).
|
||||
# dataset: "tinystories" (default, lower-entropy, recommended for Mac) or "openwebtext".
|
||||
autoresearch_dataset: str = "tinystories"
|
||||
# backend: "auto" detects MLX on Apple Silicon; "cpu" forces CPU fallback.
|
||||
autoresearch_backend: str = "auto"
|
||||
|
||||
# ── Weekly Narrative Summary ───────────────────────────────────────
|
||||
# Generates a human-readable weekly summary of development activity.
|
||||
@@ -455,14 +406,6 @@ class Settings(BaseSettings):
|
||||
# Alert threshold: free disk below this triggers cleanup / alert (GB).
|
||||
hermes_disk_free_min_gb: float = 10.0
|
||||
|
||||
# ── Energy Budget Monitoring ───────────────────────────────────────
|
||||
# Enable energy budget monitoring (tracks CPU/GPU power during inference).
|
||||
energy_budget_enabled: bool = True
|
||||
# Watts threshold that auto-activates low power mode (on-battery only).
|
||||
energy_budget_watts_threshold: float = 15.0
|
||||
# Model to prefer in low power mode (smaller = more efficient).
|
||||
energy_low_power_model: str = "qwen3:1b"
|
||||
|
||||
# ── Error Logging ─────────────────────────────────────────────────
|
||||
error_log_enabled: bool = True
|
||||
error_log_dir: str = "logs"
|
||||
|
||||
@@ -37,7 +37,6 @@ from dashboard.routes.db_explorer import router as db_explorer_router
|
||||
from dashboard.routes.discord import router as discord_router
|
||||
from dashboard.routes.experiments import router as experiments_router
|
||||
from dashboard.routes.grok import router as grok_router
|
||||
from dashboard.routes.energy import router as energy_router
|
||||
from dashboard.routes.health import router as health_router
|
||||
from dashboard.routes.hermes import router as hermes_router
|
||||
from dashboard.routes.loop_qa import router as loop_qa_router
|
||||
@@ -45,18 +44,14 @@ from dashboard.routes.memory import router as memory_router
|
||||
from dashboard.routes.mobile import router as mobile_router
|
||||
from dashboard.routes.models import api_router as models_api_router
|
||||
from dashboard.routes.models import router as models_router
|
||||
from dashboard.routes.nexus import router as nexus_router
|
||||
from dashboard.routes.quests import router as quests_router
|
||||
from dashboard.routes.scorecards import router as scorecards_router
|
||||
from dashboard.routes.sovereignty_metrics import router as sovereignty_metrics_router
|
||||
from dashboard.routes.sovereignty_ws import router as sovereignty_ws_router
|
||||
from dashboard.routes.spark import router as spark_router
|
||||
from dashboard.routes.system import router as system_router
|
||||
from dashboard.routes.tasks import router as tasks_router
|
||||
from dashboard.routes.telegram import router as telegram_router
|
||||
from dashboard.routes.thinking import router as thinking_router
|
||||
from dashboard.routes.self_correction import router as self_correction_router
|
||||
from dashboard.routes.three_strike import router as three_strike_router
|
||||
from dashboard.routes.tools import router as tools_router
|
||||
from dashboard.routes.tower import router as tower_router
|
||||
from dashboard.routes.voice import router as voice_router
|
||||
@@ -552,28 +547,12 @@ async def lifespan(app: FastAPI):
|
||||
except Exception:
|
||||
logger.debug("Failed to register error recorder")
|
||||
|
||||
# Mark session start for sovereignty duration tracking
|
||||
try:
|
||||
from timmy.sovereignty import mark_session_start
|
||||
|
||||
mark_session_start()
|
||||
except Exception:
|
||||
logger.debug("Failed to mark sovereignty session start")
|
||||
|
||||
logger.info("✓ Dashboard ready for requests")
|
||||
|
||||
yield
|
||||
|
||||
await _shutdown_cleanup(bg_tasks, workshop_heartbeat)
|
||||
|
||||
# Generate and commit sovereignty session report
|
||||
try:
|
||||
from timmy.sovereignty import generate_and_commit_report
|
||||
|
||||
await generate_and_commit_report()
|
||||
except Exception as exc:
|
||||
logger.warning("Sovereignty report generation failed at shutdown: %s", exc)
|
||||
|
||||
|
||||
app = FastAPI(
|
||||
title="Mission Control",
|
||||
@@ -672,7 +651,6 @@ app.include_router(tools_router)
|
||||
app.include_router(spark_router)
|
||||
app.include_router(discord_router)
|
||||
app.include_router(memory_router)
|
||||
app.include_router(nexus_router)
|
||||
app.include_router(grok_router)
|
||||
app.include_router(models_router)
|
||||
app.include_router(models_api_router)
|
||||
@@ -691,13 +669,9 @@ app.include_router(matrix_router)
|
||||
app.include_router(tower_router)
|
||||
app.include_router(daily_run_router)
|
||||
app.include_router(hermes_router)
|
||||
app.include_router(energy_router)
|
||||
app.include_router(quests_router)
|
||||
app.include_router(scorecards_router)
|
||||
app.include_router(sovereignty_metrics_router)
|
||||
app.include_router(sovereignty_ws_router)
|
||||
app.include_router(three_strike_router)
|
||||
app.include_router(self_correction_router)
|
||||
|
||||
|
||||
@app.websocket("/ws")
|
||||
|
||||
@@ -1,4 +1,3 @@
|
||||
"""SQLAlchemy ORM models for the CALM task-management and journaling system."""
|
||||
from datetime import UTC, date, datetime
|
||||
from enum import StrEnum
|
||||
|
||||
@@ -9,8 +8,6 @@ from .database import Base # Assuming a shared Base in models/database.py
|
||||
|
||||
|
||||
class TaskState(StrEnum):
|
||||
"""Enumeration of possible task lifecycle states."""
|
||||
|
||||
LATER = "LATER"
|
||||
NEXT = "NEXT"
|
||||
NOW = "NOW"
|
||||
@@ -19,16 +16,12 @@ class TaskState(StrEnum):
|
||||
|
||||
|
||||
class TaskCertainty(StrEnum):
|
||||
"""Enumeration of task time-certainty levels."""
|
||||
|
||||
FUZZY = "FUZZY" # An intention without a time
|
||||
SOFT = "SOFT" # A flexible task with a time
|
||||
HARD = "HARD" # A fixed meeting/appointment
|
||||
|
||||
|
||||
class Task(Base):
|
||||
"""SQLAlchemy model representing a CALM task."""
|
||||
|
||||
__tablename__ = "tasks"
|
||||
|
||||
id = Column(Integer, primary_key=True, index=True)
|
||||
@@ -59,8 +52,6 @@ class Task(Base):
|
||||
|
||||
|
||||
class JournalEntry(Base):
|
||||
"""SQLAlchemy model for a daily journal entry with MITs and reflections."""
|
||||
|
||||
__tablename__ = "journal_entries"
|
||||
|
||||
id = Column(Integer, primary_key=True, index=True)
|
||||
|
||||
@@ -1,4 +1,3 @@
|
||||
"""SQLAlchemy engine, session factory, and declarative Base for the CALM module."""
|
||||
import logging
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
@@ -1,4 +1,3 @@
|
||||
"""Dashboard routes for agent chat interactions and tool-call display."""
|
||||
import json
|
||||
import logging
|
||||
from datetime import datetime
|
||||
|
||||
@@ -1,4 +1,3 @@
|
||||
"""Dashboard routes for the CALM task management and daily journaling interface."""
|
||||
import logging
|
||||
from datetime import UTC, date, datetime
|
||||
|
||||
|
||||
@@ -14,8 +14,6 @@ router = APIRouter(prefix="/discord", tags=["discord"])
|
||||
|
||||
|
||||
class TokenPayload(BaseModel):
|
||||
"""Request payload containing a Discord bot token."""
|
||||
|
||||
token: str
|
||||
|
||||
|
||||
|
||||
@@ -1,121 +0,0 @@
|
||||
"""Energy Budget Monitoring routes.
|
||||
|
||||
Exposes the energy budget monitor via REST API so the dashboard and
|
||||
external tools can query power draw, efficiency scores, and toggle
|
||||
low power mode.
|
||||
|
||||
Refs: #1009
|
||||
"""
|
||||
|
||||
import logging
|
||||
|
||||
from fastapi import APIRouter, HTTPException
|
||||
from pydantic import BaseModel
|
||||
|
||||
from config import settings
|
||||
from infrastructure.energy.monitor import energy_monitor
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
router = APIRouter(prefix="/energy", tags=["energy"])
|
||||
|
||||
|
||||
class LowPowerRequest(BaseModel):
|
||||
"""Request body for toggling low power mode."""
|
||||
|
||||
enabled: bool
|
||||
|
||||
|
||||
class InferenceEventRequest(BaseModel):
|
||||
"""Request body for recording an inference event."""
|
||||
|
||||
model: str
|
||||
tokens_per_second: float
|
||||
|
||||
|
||||
@router.get("/status")
|
||||
async def energy_status():
|
||||
"""Return the current energy budget status.
|
||||
|
||||
Returns the live power estimate, efficiency score (0–10), recent
|
||||
inference samples, and whether low power mode is active.
|
||||
"""
|
||||
if not getattr(settings, "energy_budget_enabled", True):
|
||||
return {
|
||||
"enabled": False,
|
||||
"message": "Energy budget monitoring is disabled (ENERGY_BUDGET_ENABLED=false)",
|
||||
}
|
||||
|
||||
report = await energy_monitor.get_report()
|
||||
return {**report.to_dict(), "enabled": True}
|
||||
|
||||
|
||||
@router.get("/report")
|
||||
async def energy_report():
|
||||
"""Detailed energy budget report with all recent samples.
|
||||
|
||||
Same as /energy/status but always includes the full sample history.
|
||||
"""
|
||||
if not getattr(settings, "energy_budget_enabled", True):
|
||||
raise HTTPException(status_code=503, detail="Energy budget monitoring is disabled")
|
||||
|
||||
report = await energy_monitor.get_report()
|
||||
data = report.to_dict()
|
||||
# Override recent_samples to include the full window (not just last 10)
|
||||
data["recent_samples"] = [
|
||||
{
|
||||
"timestamp": s.timestamp,
|
||||
"model": s.model,
|
||||
"tokens_per_second": round(s.tokens_per_second, 1),
|
||||
"estimated_watts": round(s.estimated_watts, 2),
|
||||
"efficiency": round(s.efficiency, 3),
|
||||
"efficiency_score": round(s.efficiency_score, 2),
|
||||
}
|
||||
for s in list(energy_monitor._samples)
|
||||
]
|
||||
return {**data, "enabled": True}
|
||||
|
||||
|
||||
@router.post("/low-power")
|
||||
async def set_low_power_mode(body: LowPowerRequest):
|
||||
"""Enable or disable low power mode.
|
||||
|
||||
In low power mode the cascade router is advised to prefer the
|
||||
configured energy_low_power_model (see settings).
|
||||
"""
|
||||
if not getattr(settings, "energy_budget_enabled", True):
|
||||
raise HTTPException(status_code=503, detail="Energy budget monitoring is disabled")
|
||||
|
||||
energy_monitor.set_low_power_mode(body.enabled)
|
||||
low_power_model = getattr(settings, "energy_low_power_model", "qwen3:1b")
|
||||
return {
|
||||
"low_power_mode": body.enabled,
|
||||
"preferred_model": low_power_model if body.enabled else None,
|
||||
"message": (
|
||||
f"Low power mode {'enabled' if body.enabled else 'disabled'}. "
|
||||
+ (f"Routing to {low_power_model}." if body.enabled else "Routing restored to default.")
|
||||
),
|
||||
}
|
||||
|
||||
|
||||
@router.post("/record")
|
||||
async def record_inference_event(body: InferenceEventRequest):
|
||||
"""Record an inference event for efficiency tracking.
|
||||
|
||||
Called after each LLM inference completes. Updates the rolling
|
||||
efficiency score and may auto-activate low power mode if watts
|
||||
exceed the configured threshold.
|
||||
"""
|
||||
if not getattr(settings, "energy_budget_enabled", True):
|
||||
return {"recorded": False, "message": "Energy budget monitoring is disabled"}
|
||||
|
||||
if body.tokens_per_second <= 0:
|
||||
raise HTTPException(status_code=422, detail="tokens_per_second must be positive")
|
||||
|
||||
sample = energy_monitor.record_inference(body.model, body.tokens_per_second)
|
||||
return {
|
||||
"recorded": True,
|
||||
"efficiency_score": round(sample.efficiency_score, 2),
|
||||
"estimated_watts": round(sample.estimated_watts, 2),
|
||||
"low_power_mode": energy_monitor.low_power_mode,
|
||||
}
|
||||
@@ -1,166 +0,0 @@
|
||||
"""Nexus — Timmy's persistent conversational awareness space.
|
||||
|
||||
A conversational-only interface where Timmy maintains live memory context.
|
||||
No tool use; pure conversation with memory integration and a teaching panel.
|
||||
|
||||
Routes:
|
||||
GET /nexus — render nexus page with live memory sidebar
|
||||
POST /nexus/chat — send a message; returns HTMX partial
|
||||
POST /nexus/teach — inject a fact into Timmy's live memory
|
||||
DELETE /nexus/history — clear the nexus conversation history
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
from datetime import UTC, datetime
|
||||
|
||||
from fastapi import APIRouter, Form, Request
|
||||
from fastapi.responses import HTMLResponse
|
||||
|
||||
from dashboard.templating import templates
|
||||
from timmy.memory_system import (
|
||||
get_memory_stats,
|
||||
recall_personal_facts_with_ids,
|
||||
search_memories,
|
||||
store_personal_fact,
|
||||
)
|
||||
from timmy.session import _clean_response, chat, reset_session
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
router = APIRouter(prefix="/nexus", tags=["nexus"])
|
||||
|
||||
_NEXUS_SESSION_ID = "nexus"
|
||||
_MAX_MESSAGE_LENGTH = 10_000
|
||||
|
||||
# In-memory conversation log for the Nexus session (mirrors chat store pattern
|
||||
# but is scoped to the Nexus so it won't pollute the main dashboard history).
|
||||
_nexus_log: list[dict] = []
|
||||
|
||||
|
||||
def _ts() -> str:
|
||||
return datetime.now(UTC).strftime("%H:%M:%S")
|
||||
|
||||
|
||||
def _append_log(role: str, content: str) -> None:
|
||||
_nexus_log.append({"role": role, "content": content, "timestamp": _ts()})
|
||||
# Keep last 200 exchanges to bound memory usage
|
||||
if len(_nexus_log) > 200:
|
||||
del _nexus_log[:-200]
|
||||
|
||||
|
||||
@router.get("", response_class=HTMLResponse)
|
||||
async def nexus_page(request: Request):
|
||||
"""Render the Nexus page with live memory context."""
|
||||
stats = get_memory_stats()
|
||||
facts = recall_personal_facts_with_ids()[:8]
|
||||
|
||||
return templates.TemplateResponse(
|
||||
request,
|
||||
"nexus.html",
|
||||
{
|
||||
"page_title": "Nexus",
|
||||
"messages": list(_nexus_log),
|
||||
"stats": stats,
|
||||
"facts": facts,
|
||||
},
|
||||
)
|
||||
|
||||
|
||||
@router.post("/chat", response_class=HTMLResponse)
|
||||
async def nexus_chat(request: Request, message: str = Form(...)):
|
||||
"""Conversational-only chat routed through the Nexus session.
|
||||
|
||||
Does not invoke tool-use approval flow — pure conversation with memory
|
||||
context injected from Timmy's live memory store.
|
||||
"""
|
||||
message = message.strip()
|
||||
if not message:
|
||||
return HTMLResponse("")
|
||||
if len(message) > _MAX_MESSAGE_LENGTH:
|
||||
return templates.TemplateResponse(
|
||||
request,
|
||||
"partials/nexus_message.html",
|
||||
{
|
||||
"user_message": message[:80] + "…",
|
||||
"response": None,
|
||||
"error": "Message too long (max 10 000 chars).",
|
||||
"timestamp": _ts(),
|
||||
"memory_hits": [],
|
||||
},
|
||||
)
|
||||
|
||||
ts = _ts()
|
||||
|
||||
# Fetch semantically relevant memories to surface in the sidebar
|
||||
try:
|
||||
memory_hits = await asyncio.to_thread(search_memories, query=message, limit=4)
|
||||
except Exception as exc:
|
||||
logger.warning("Nexus memory search failed: %s", exc)
|
||||
memory_hits = []
|
||||
|
||||
# Conversational response — no tool approval flow
|
||||
response_text: str | None = None
|
||||
error_text: str | None = None
|
||||
try:
|
||||
raw = await chat(message, session_id=_NEXUS_SESSION_ID)
|
||||
response_text = _clean_response(raw)
|
||||
except Exception as exc:
|
||||
logger.error("Nexus chat error: %s", exc)
|
||||
error_text = "Timmy is unavailable right now. Check that Ollama is running."
|
||||
|
||||
_append_log("user", message)
|
||||
if response_text:
|
||||
_append_log("assistant", response_text)
|
||||
|
||||
return templates.TemplateResponse(
|
||||
request,
|
||||
"partials/nexus_message.html",
|
||||
{
|
||||
"user_message": message,
|
||||
"response": response_text,
|
||||
"error": error_text,
|
||||
"timestamp": ts,
|
||||
"memory_hits": memory_hits,
|
||||
},
|
||||
)
|
||||
|
||||
|
||||
@router.post("/teach", response_class=HTMLResponse)
|
||||
async def nexus_teach(request: Request, fact: str = Form(...)):
|
||||
"""Inject a fact into Timmy's live memory from the Nexus teaching panel."""
|
||||
fact = fact.strip()
|
||||
if not fact:
|
||||
return HTMLResponse("")
|
||||
|
||||
try:
|
||||
await asyncio.to_thread(store_personal_fact, fact)
|
||||
facts = await asyncio.to_thread(recall_personal_facts_with_ids)
|
||||
facts = facts[:8]
|
||||
except Exception as exc:
|
||||
logger.error("Nexus teach error: %s", exc)
|
||||
facts = []
|
||||
|
||||
return templates.TemplateResponse(
|
||||
request,
|
||||
"partials/nexus_facts.html",
|
||||
{"facts": facts, "taught": fact},
|
||||
)
|
||||
|
||||
|
||||
@router.delete("/history", response_class=HTMLResponse)
|
||||
async def nexus_clear_history(request: Request):
|
||||
"""Clear the Nexus conversation history."""
|
||||
_nexus_log.clear()
|
||||
reset_session(session_id=_NEXUS_SESSION_ID)
|
||||
return templates.TemplateResponse(
|
||||
request,
|
||||
"partials/nexus_message.html",
|
||||
{
|
||||
"user_message": None,
|
||||
"response": "Nexus conversation cleared.",
|
||||
"error": None,
|
||||
"timestamp": _ts(),
|
||||
"memory_hits": [],
|
||||
},
|
||||
)
|
||||
@@ -10,7 +10,6 @@ from fastapi.responses import HTMLResponse, JSONResponse
|
||||
|
||||
from dashboard.services.scorecard_service import (
|
||||
PeriodType,
|
||||
ScorecardSummary,
|
||||
generate_all_scorecards,
|
||||
generate_scorecard,
|
||||
get_tracked_agents,
|
||||
@@ -27,216 +26,6 @@ def _format_period_label(period_type: PeriodType) -> str:
|
||||
return "Daily" if period_type == PeriodType.daily else "Weekly"
|
||||
|
||||
|
||||
def _parse_period(period: str) -> PeriodType:
|
||||
"""Parse period string into PeriodType, defaulting to daily on invalid input.
|
||||
|
||||
Args:
|
||||
period: The period string ('daily' or 'weekly')
|
||||
|
||||
Returns:
|
||||
PeriodType.daily or PeriodType.weekly
|
||||
"""
|
||||
try:
|
||||
return PeriodType(period.lower())
|
||||
except ValueError:
|
||||
return PeriodType.daily
|
||||
|
||||
|
||||
def _format_token_display(token_net: int) -> str:
|
||||
"""Format token net value with +/- prefix for display.
|
||||
|
||||
Args:
|
||||
token_net: The net token value
|
||||
|
||||
Returns:
|
||||
Formatted string with + prefix for positive values
|
||||
"""
|
||||
return f"{'+' if token_net > 0 else ''}{token_net}"
|
||||
|
||||
|
||||
def _format_token_class(token_net: int) -> str:
|
||||
"""Get CSS class for token net value based on sign.
|
||||
|
||||
Args:
|
||||
token_net: The net token value
|
||||
|
||||
Returns:
|
||||
'text-success' for positive/zero, 'text-danger' for negative
|
||||
"""
|
||||
return "text-success" if token_net >= 0 else "text-danger"
|
||||
|
||||
|
||||
def _build_patterns_html(patterns: list[str]) -> str:
|
||||
"""Build HTML for patterns section if patterns exist.
|
||||
|
||||
Args:
|
||||
patterns: List of pattern strings
|
||||
|
||||
Returns:
|
||||
HTML string for patterns section or empty string
|
||||
"""
|
||||
if not patterns:
|
||||
return ""
|
||||
|
||||
patterns_list = "".join([f"<li>{p}</li>" for p in patterns])
|
||||
return f"""
|
||||
<div class="mt-3">
|
||||
<h6>Patterns</h6>
|
||||
<ul class="list-unstyled text-info">
|
||||
{patterns_list}
|
||||
</ul>
|
||||
</div>
|
||||
"""
|
||||
|
||||
|
||||
def _build_narrative_html(bullets: list[str]) -> str:
|
||||
"""Build HTML for narrative bullets.
|
||||
|
||||
Args:
|
||||
bullets: List of narrative bullet strings
|
||||
|
||||
Returns:
|
||||
HTML string with list items
|
||||
"""
|
||||
return "".join([f"<li>{b}</li>" for b in bullets])
|
||||
|
||||
|
||||
def _build_metrics_row_html(metrics: dict) -> str:
|
||||
"""Build HTML for the metrics summary row.
|
||||
|
||||
Args:
|
||||
metrics: Dictionary with PRs, issues, tests, and token metrics
|
||||
|
||||
Returns:
|
||||
HTML string for the metrics row
|
||||
"""
|
||||
prs_opened = metrics["prs_opened"]
|
||||
prs_merged = metrics["prs_merged"]
|
||||
pr_merge_rate = int(metrics["pr_merge_rate"] * 100)
|
||||
issues_touched = metrics["issues_touched"]
|
||||
tests_affected = metrics["tests_affected"]
|
||||
token_net = metrics["token_net"]
|
||||
|
||||
token_class = _format_token_class(token_net)
|
||||
token_display = _format_token_display(token_net)
|
||||
|
||||
return f"""
|
||||
<div class="row text-center small">
|
||||
<div class="col">
|
||||
<div class="text-muted">PRs</div>
|
||||
<div class="fw-bold">{prs_opened}/{prs_merged}</div>
|
||||
<div class="text-muted" style="font-size: 0.75rem;">
|
||||
{pr_merge_rate}% merged
|
||||
</div>
|
||||
</div>
|
||||
<div class="col">
|
||||
<div class="text-muted">Issues</div>
|
||||
<div class="fw-bold">{issues_touched}</div>
|
||||
</div>
|
||||
<div class="col">
|
||||
<div class="text-muted">Tests</div>
|
||||
<div class="fw-bold">{tests_affected}</div>
|
||||
</div>
|
||||
<div class="col">
|
||||
<div class="text-muted">Tokens</div>
|
||||
<div class="fw-bold {token_class}">{token_display}</div>
|
||||
</div>
|
||||
</div>
|
||||
"""
|
||||
|
||||
|
||||
def _render_scorecard_panel(
|
||||
agent_id: str,
|
||||
period_type: PeriodType,
|
||||
data: dict,
|
||||
) -> str:
|
||||
"""Render HTML for a single scorecard panel.
|
||||
|
||||
Args:
|
||||
agent_id: The agent ID
|
||||
period_type: Daily or weekly period
|
||||
data: Scorecard data dictionary with metrics, patterns, narrative_bullets
|
||||
|
||||
Returns:
|
||||
HTML string for the scorecard panel
|
||||
"""
|
||||
patterns_html = _build_patterns_html(data.get("patterns", []))
|
||||
bullets_html = _build_narrative_html(data.get("narrative_bullets", []))
|
||||
metrics_row = _build_metrics_row_html(data["metrics"])
|
||||
|
||||
return f"""
|
||||
<div class="card mc-panel">
|
||||
<div class="card-header d-flex justify-content-between align-items-center">
|
||||
<h5 class="card-title mb-0">{agent_id.title()}</h5>
|
||||
<span class="badge bg-secondary">{_format_period_label(period_type)}</span>
|
||||
</div>
|
||||
<div class="card-body">
|
||||
<ul class="list-unstyled mb-3">
|
||||
{bullets_html}
|
||||
</ul>
|
||||
{metrics_row}
|
||||
{patterns_html}
|
||||
</div>
|
||||
</div>
|
||||
"""
|
||||
|
||||
|
||||
def _render_empty_scorecard(agent_id: str) -> str:
|
||||
"""Render HTML for an empty scorecard (no activity).
|
||||
|
||||
Args:
|
||||
agent_id: The agent ID
|
||||
|
||||
Returns:
|
||||
HTML string for the empty scorecard panel
|
||||
"""
|
||||
return f"""
|
||||
<div class="card mc-panel">
|
||||
<h5 class="card-title">{agent_id.title()}</h5>
|
||||
<p class="text-muted">No activity recorded for this period.</p>
|
||||
</div>
|
||||
"""
|
||||
|
||||
|
||||
def _render_error_scorecard(agent_id: str, error: str) -> str:
|
||||
"""Render HTML for a scorecard that failed to load.
|
||||
|
||||
Args:
|
||||
agent_id: The agent ID
|
||||
error: Error message string
|
||||
|
||||
Returns:
|
||||
HTML string for the error scorecard panel
|
||||
"""
|
||||
return f"""
|
||||
<div class="card mc-panel border-danger">
|
||||
<h5 class="card-title">{agent_id.title()}</h5>
|
||||
<p class="text-danger">Error loading scorecard: {error}</p>
|
||||
</div>
|
||||
"""
|
||||
|
||||
|
||||
def _render_single_panel_wrapper(
|
||||
agent_id: str,
|
||||
period_type: PeriodType,
|
||||
scorecard: ScorecardSummary | None,
|
||||
) -> str:
|
||||
"""Render a complete scorecard panel with wrapper div for single panel view.
|
||||
|
||||
Args:
|
||||
agent_id: The agent ID
|
||||
period_type: Daily or weekly period
|
||||
scorecard: ScorecardSummary object or None
|
||||
|
||||
Returns:
|
||||
HTML string for the complete panel
|
||||
"""
|
||||
if scorecard is None:
|
||||
return _render_empty_scorecard(agent_id)
|
||||
|
||||
return _render_scorecard_panel(agent_id, period_type, scorecard.to_dict())
|
||||
|
||||
|
||||
@router.get("/api/agents")
|
||||
async def list_tracked_agents() -> dict[str, list[str]]:
|
||||
"""Return the list of tracked agent IDs.
|
||||
@@ -360,50 +149,99 @@ async def agent_scorecard_panel(
|
||||
Returns:
|
||||
HTML panel with scorecard content
|
||||
"""
|
||||
period_type = _parse_period(period)
|
||||
try:
|
||||
period_type = PeriodType(period.lower())
|
||||
except ValueError:
|
||||
period_type = PeriodType.daily
|
||||
|
||||
try:
|
||||
scorecard = generate_scorecard(agent_id, period_type)
|
||||
html_content = _render_single_panel_wrapper(agent_id, period_type, scorecard)
|
||||
|
||||
if scorecard is None:
|
||||
return HTMLResponse(
|
||||
content=f"""
|
||||
<div class="card mc-panel">
|
||||
<h5 class="card-title">{agent_id.title()}</h5>
|
||||
<p class="text-muted">No activity recorded for this period.</p>
|
||||
</div>
|
||||
""",
|
||||
status_code=200,
|
||||
)
|
||||
|
||||
data = scorecard.to_dict()
|
||||
|
||||
# Build patterns HTML
|
||||
patterns_html = ""
|
||||
if data["patterns"]:
|
||||
patterns_list = "".join([f"<li>{p}</li>" for p in data["patterns"]])
|
||||
patterns_html = f"""
|
||||
<div class="mt-3">
|
||||
<h6>Patterns</h6>
|
||||
<ul class="list-unstyled text-info">
|
||||
{patterns_list}
|
||||
</ul>
|
||||
</div>
|
||||
"""
|
||||
|
||||
# Build bullets HTML
|
||||
bullets_html = "".join([f"<li>{b}</li>" for b in data["narrative_bullets"]])
|
||||
|
||||
# Build metrics summary
|
||||
metrics = data["metrics"]
|
||||
|
||||
html_content = f"""
|
||||
<div class="card mc-panel">
|
||||
<div class="card-header d-flex justify-content-between align-items-center">
|
||||
<h5 class="card-title mb-0">{agent_id.title()}</h5>
|
||||
<span class="badge bg-secondary">{_format_period_label(period_type)}</span>
|
||||
</div>
|
||||
<div class="card-body">
|
||||
<ul class="list-unstyled mb-3">
|
||||
{bullets_html}
|
||||
</ul>
|
||||
|
||||
<div class="row text-center small">
|
||||
<div class="col">
|
||||
<div class="text-muted">PRs</div>
|
||||
<div class="fw-bold">{metrics["prs_opened"]}/{metrics["prs_merged"]}</div>
|
||||
<div class="text-muted" style="font-size: 0.75rem;">
|
||||
{int(metrics["pr_merge_rate"] * 100)}% merged
|
||||
</div>
|
||||
</div>
|
||||
<div class="col">
|
||||
<div class="text-muted">Issues</div>
|
||||
<div class="fw-bold">{metrics["issues_touched"]}</div>
|
||||
</div>
|
||||
<div class="col">
|
||||
<div class="text-muted">Tests</div>
|
||||
<div class="fw-bold">{metrics["tests_affected"]}</div>
|
||||
</div>
|
||||
<div class="col">
|
||||
<div class="text-muted">Tokens</div>
|
||||
<div class="fw-bold {"text-success" if metrics["token_net"] >= 0 else "text-danger"}">
|
||||
{"+" if metrics["token_net"] > 0 else ""}{metrics["token_net"]}
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
{patterns_html}
|
||||
</div>
|
||||
</div>
|
||||
"""
|
||||
|
||||
return HTMLResponse(content=html_content)
|
||||
|
||||
except Exception as exc:
|
||||
logger.error("Failed to render scorecard panel for %s: %s", agent_id, exc)
|
||||
return HTMLResponse(content=_render_error_scorecard(agent_id, str(exc)))
|
||||
|
||||
|
||||
def _render_all_panels_grid(
|
||||
scorecards: list[ScorecardSummary],
|
||||
period_type: PeriodType,
|
||||
) -> str:
|
||||
"""Render all scorecard panels in a grid layout.
|
||||
|
||||
Args:
|
||||
scorecards: List of scorecard summaries
|
||||
period_type: Daily or weekly period
|
||||
|
||||
Returns:
|
||||
HTML string with all panels in a grid
|
||||
"""
|
||||
panels: list[str] = []
|
||||
for scorecard in scorecards:
|
||||
panel_html = _render_scorecard_panel(
|
||||
scorecard.agent_id,
|
||||
period_type,
|
||||
scorecard.to_dict(),
|
||||
return HTMLResponse(
|
||||
content=f"""
|
||||
<div class="card mc-panel border-danger">
|
||||
<h5 class="card-title">{agent_id.title()}</h5>
|
||||
<p class="text-danger">Error loading scorecard: {str(exc)}</p>
|
||||
</div>
|
||||
""",
|
||||
status_code=200,
|
||||
)
|
||||
# Wrap each panel in a grid column
|
||||
wrapped = f'<div class="col-md-6 col-lg-4 mb-3">{panel_html}</div>'
|
||||
panels.append(wrapped)
|
||||
|
||||
return f"""
|
||||
<div class="row">
|
||||
{"".join(panels)}
|
||||
</div>
|
||||
<div class="text-muted small mt-2">
|
||||
Generated: {datetime.now().strftime("%Y-%m-%d %H:%M:%S UTC")}
|
||||
</div>
|
||||
"""
|
||||
|
||||
|
||||
@router.get("/all/panels", response_class=HTMLResponse)
|
||||
@@ -420,15 +258,96 @@ async def all_scorecard_panels(
|
||||
Returns:
|
||||
HTML with all scorecard panels
|
||||
"""
|
||||
period_type = _parse_period(period)
|
||||
try:
|
||||
period_type = PeriodType(period.lower())
|
||||
except ValueError:
|
||||
period_type = PeriodType.daily
|
||||
|
||||
try:
|
||||
scorecards = generate_all_scorecards(period_type)
|
||||
html_content = _render_all_panels_grid(scorecards, period_type)
|
||||
|
||||
panels: list[str] = []
|
||||
for scorecard in scorecards:
|
||||
data = scorecard.to_dict()
|
||||
|
||||
# Build patterns HTML
|
||||
patterns_html = ""
|
||||
if data["patterns"]:
|
||||
patterns_list = "".join([f"<li>{p}</li>" for p in data["patterns"]])
|
||||
patterns_html = f"""
|
||||
<div class="mt-3">
|
||||
<h6>Patterns</h6>
|
||||
<ul class="list-unstyled text-info">
|
||||
{patterns_list}
|
||||
</ul>
|
||||
</div>
|
||||
"""
|
||||
|
||||
# Build bullets HTML
|
||||
bullets_html = "".join([f"<li>{b}</li>" for b in data["narrative_bullets"]])
|
||||
metrics = data["metrics"]
|
||||
|
||||
panel_html = f"""
|
||||
<div class="col-md-6 col-lg-4 mb-3">
|
||||
<div class="card mc-panel">
|
||||
<div class="card-header d-flex justify-content-between align-items-center">
|
||||
<h5 class="card-title mb-0">{scorecard.agent_id.title()}</h5>
|
||||
<span class="badge bg-secondary">{_format_period_label(period_type)}</span>
|
||||
</div>
|
||||
<div class="card-body">
|
||||
<ul class="list-unstyled mb-3">
|
||||
{bullets_html}
|
||||
</ul>
|
||||
|
||||
<div class="row text-center small">
|
||||
<div class="col">
|
||||
<div class="text-muted">PRs</div>
|
||||
<div class="fw-bold">{metrics["prs_opened"]}/{metrics["prs_merged"]}</div>
|
||||
<div class="text-muted" style="font-size: 0.75rem;">
|
||||
{int(metrics["pr_merge_rate"] * 100)}% merged
|
||||
</div>
|
||||
</div>
|
||||
<div class="col">
|
||||
<div class="text-muted">Issues</div>
|
||||
<div class="fw-bold">{metrics["issues_touched"]}</div>
|
||||
</div>
|
||||
<div class="col">
|
||||
<div class="text-muted">Tests</div>
|
||||
<div class="fw-bold">{metrics["tests_affected"]}</div>
|
||||
</div>
|
||||
<div class="col">
|
||||
<div class="text-muted">Tokens</div>
|
||||
<div class="fw-bold {"text-success" if metrics["token_net"] >= 0 else "text-danger"}">
|
||||
{"+" if metrics["token_net"] > 0 else ""}{metrics["token_net"]}
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
{patterns_html}
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
"""
|
||||
panels.append(panel_html)
|
||||
|
||||
html_content = f"""
|
||||
<div class="row">
|
||||
{"".join(panels)}
|
||||
</div>
|
||||
<div class="text-muted small mt-2">
|
||||
Generated: {datetime.now().strftime("%Y-%m-%d %H:%M:%S UTC")}
|
||||
</div>
|
||||
"""
|
||||
|
||||
return HTMLResponse(content=html_content)
|
||||
|
||||
except Exception as exc:
|
||||
logger.error("Failed to render all scorecard panels: %s", exc)
|
||||
return HTMLResponse(
|
||||
content=f'<div class="alert alert-danger">Error loading scorecards: {exc}</div>'
|
||||
content=f"""
|
||||
<div class="alert alert-danger">
|
||||
Error loading scorecards: {str(exc)}
|
||||
</div>
|
||||
""",
|
||||
status_code=200,
|
||||
)
|
||||
|
||||
@@ -1,58 +0,0 @@
|
||||
"""Self-Correction Dashboard routes.
|
||||
|
||||
GET /self-correction/ui — HTML dashboard
|
||||
GET /self-correction/timeline — HTMX partial: recent event timeline
|
||||
GET /self-correction/patterns — HTMX partial: recurring failure patterns
|
||||
"""
|
||||
|
||||
import logging
|
||||
|
||||
from fastapi import APIRouter, Request
|
||||
from fastapi.responses import HTMLResponse
|
||||
|
||||
from dashboard.templating import templates
|
||||
from infrastructure.self_correction import get_corrections, get_patterns, get_stats
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
router = APIRouter(prefix="/self-correction", tags=["self-correction"])
|
||||
|
||||
|
||||
@router.get("/ui", response_class=HTMLResponse)
|
||||
async def self_correction_ui(request: Request):
|
||||
"""Render the Self-Correction Dashboard."""
|
||||
stats = get_stats()
|
||||
corrections = get_corrections(limit=20)
|
||||
patterns = get_patterns(top_n=10)
|
||||
return templates.TemplateResponse(
|
||||
request,
|
||||
"self_correction.html",
|
||||
{
|
||||
"stats": stats,
|
||||
"corrections": corrections,
|
||||
"patterns": patterns,
|
||||
},
|
||||
)
|
||||
|
||||
|
||||
@router.get("/timeline", response_class=HTMLResponse)
|
||||
async def self_correction_timeline(request: Request):
|
||||
"""HTMX partial: recent self-correction event timeline."""
|
||||
corrections = get_corrections(limit=30)
|
||||
return templates.TemplateResponse(
|
||||
request,
|
||||
"partials/self_correction_timeline.html",
|
||||
{"corrections": corrections},
|
||||
)
|
||||
|
||||
|
||||
@router.get("/patterns", response_class=HTMLResponse)
|
||||
async def self_correction_patterns(request: Request):
|
||||
"""HTMX partial: recurring failure patterns."""
|
||||
patterns = get_patterns(top_n=10)
|
||||
stats = get_stats()
|
||||
return templates.TemplateResponse(
|
||||
request,
|
||||
"partials/self_correction_patterns.html",
|
||||
{"patterns": patterns, "stats": stats},
|
||||
)
|
||||
@@ -1,40 +0,0 @@
|
||||
"""WebSocket emitter for the sovereignty metrics dashboard widget.
|
||||
|
||||
Streams real-time sovereignty snapshots to connected clients every
|
||||
*_PUSH_INTERVAL* seconds. The snapshot includes per-layer sovereignty
|
||||
percentages, API cost rate, and skill crystallisation count.
|
||||
|
||||
Refs: #954, #953
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import logging
|
||||
|
||||
from fastapi import APIRouter, WebSocket
|
||||
|
||||
router = APIRouter(tags=["sovereignty"])
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
_PUSH_INTERVAL = 5 # seconds between snapshot pushes
|
||||
|
||||
|
||||
@router.websocket("/ws/sovereignty")
|
||||
async def sovereignty_ws(websocket: WebSocket) -> None:
|
||||
"""Stream sovereignty metric snapshots to the dashboard widget."""
|
||||
from timmy.sovereignty.metrics import get_metrics_store
|
||||
|
||||
await websocket.accept()
|
||||
logger.info("Sovereignty WS connected")
|
||||
|
||||
store = get_metrics_store()
|
||||
try:
|
||||
# Send initial snapshot immediately
|
||||
await websocket.send_text(json.dumps(store.get_snapshot()))
|
||||
|
||||
while True:
|
||||
await asyncio.sleep(_PUSH_INTERVAL)
|
||||
await websocket.send_text(json.dumps(store.get_snapshot()))
|
||||
except Exception:
|
||||
logger.debug("Sovereignty WS disconnected")
|
||||
@@ -7,8 +7,6 @@ router = APIRouter(prefix="/telegram", tags=["telegram"])
|
||||
|
||||
|
||||
class TokenPayload(BaseModel):
|
||||
"""Request payload containing a Telegram bot token."""
|
||||
|
||||
token: str
|
||||
|
||||
|
||||
|
||||
@@ -1,116 +0,0 @@
|
||||
"""Three-Strike Detector dashboard routes.
|
||||
|
||||
Provides JSON API endpoints for inspecting and managing the three-strike
|
||||
detector state.
|
||||
|
||||
Refs: #962
|
||||
"""
|
||||
|
||||
import logging
|
||||
from typing import Any
|
||||
|
||||
from fastapi import APIRouter, HTTPException
|
||||
from pydantic import BaseModel
|
||||
|
||||
from timmy.sovereignty.three_strike import CATEGORIES, get_detector
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
router = APIRouter(prefix="/sovereignty/three-strike", tags=["three-strike"])
|
||||
|
||||
|
||||
class RecordRequest(BaseModel):
|
||||
category: str
|
||||
key: str
|
||||
metadata: dict[str, Any] = {}
|
||||
|
||||
|
||||
class AutomationRequest(BaseModel):
|
||||
artifact_path: str
|
||||
|
||||
|
||||
@router.get("")
|
||||
async def list_strikes() -> dict[str, Any]:
|
||||
"""Return all strike records."""
|
||||
detector = get_detector()
|
||||
records = detector.list_all()
|
||||
return {
|
||||
"records": [
|
||||
{
|
||||
"category": r.category,
|
||||
"key": r.key,
|
||||
"count": r.count,
|
||||
"blocked": r.blocked,
|
||||
"automation": r.automation,
|
||||
"first_seen": r.first_seen,
|
||||
"last_seen": r.last_seen,
|
||||
}
|
||||
for r in records
|
||||
],
|
||||
"categories": sorted(CATEGORIES),
|
||||
}
|
||||
|
||||
|
||||
@router.get("/blocked")
|
||||
async def list_blocked() -> dict[str, Any]:
|
||||
"""Return only blocked (category, key) pairs."""
|
||||
detector = get_detector()
|
||||
records = detector.list_blocked()
|
||||
return {
|
||||
"blocked": [
|
||||
{
|
||||
"category": r.category,
|
||||
"key": r.key,
|
||||
"count": r.count,
|
||||
"automation": r.automation,
|
||||
"last_seen": r.last_seen,
|
||||
}
|
||||
for r in records
|
||||
]
|
||||
}
|
||||
|
||||
|
||||
@router.post("/record")
|
||||
async def record_strike(body: RecordRequest) -> dict[str, Any]:
|
||||
"""Record a manual action. Returns strike state; 409 when blocked."""
|
||||
from timmy.sovereignty.three_strike import ThreeStrikeError
|
||||
|
||||
detector = get_detector()
|
||||
try:
|
||||
record = detector.record(body.category, body.key, body.metadata)
|
||||
return {
|
||||
"category": record.category,
|
||||
"key": record.key,
|
||||
"count": record.count,
|
||||
"blocked": record.blocked,
|
||||
"automation": record.automation,
|
||||
}
|
||||
except ValueError as exc:
|
||||
raise HTTPException(status_code=422, detail=str(exc)) from exc
|
||||
except ThreeStrikeError as exc:
|
||||
raise HTTPException(
|
||||
status_code=409,
|
||||
detail={
|
||||
"error": "three_strike_block",
|
||||
"message": str(exc),
|
||||
"category": exc.category,
|
||||
"key": exc.key,
|
||||
"count": exc.count,
|
||||
},
|
||||
) from exc
|
||||
|
||||
|
||||
@router.post("/{category}/{key}/automation")
|
||||
async def register_automation(category: str, key: str, body: AutomationRequest) -> dict[str, bool]:
|
||||
"""Register an automation artifact to unblock a (category, key) pair."""
|
||||
detector = get_detector()
|
||||
detector.register_automation(category, key, body.artifact_path)
|
||||
return {"success": True}
|
||||
|
||||
|
||||
@router.get("/{category}/{key}/events")
|
||||
async def get_strike_events(category: str, key: str, limit: int = 50) -> dict[str, Any]:
|
||||
"""Return the individual strike events for a (category, key) pair."""
|
||||
detector = get_detector()
|
||||
events = detector.get_events(category, key, limit=limit)
|
||||
return {"category": category, "key": key, "events": events}
|
||||
@@ -51,8 +51,6 @@ def _get_db() -> Generator[sqlite3.Connection, None, None]:
|
||||
|
||||
|
||||
class _EnumLike:
|
||||
"""Lightweight enum-like wrapper for string values used in templates."""
|
||||
|
||||
def __init__(self, v: str):
|
||||
self.value = v
|
||||
|
||||
|
||||
@@ -23,8 +23,6 @@ TRACKED_AGENTS = frozenset({"hermes", "kimi", "manus", "claude", "gemini"})
|
||||
|
||||
|
||||
class PeriodType(StrEnum):
|
||||
"""Scorecard reporting period type."""
|
||||
|
||||
daily = "daily"
|
||||
weekly = "weekly"
|
||||
|
||||
|
||||
@@ -67,11 +67,9 @@
|
||||
<div class="mc-nav-dropdown">
|
||||
<button class="mc-test-link mc-dropdown-toggle" aria-expanded="false">INTEL ▾</button>
|
||||
<div class="mc-dropdown-menu">
|
||||
<a href="/nexus" class="mc-test-link">NEXUS</a>
|
||||
<a href="/spark/ui" class="mc-test-link">SPARK</a>
|
||||
<a href="/memory" class="mc-test-link">MEMORY</a>
|
||||
<a href="/marketplace/ui" class="mc-test-link">MARKET</a>
|
||||
<a href="/self-correction/ui" class="mc-test-link">SELF-CORRECT</a>
|
||||
</div>
|
||||
</div>
|
||||
<div class="mc-nav-dropdown">
|
||||
@@ -133,7 +131,6 @@
|
||||
<a href="/spark/ui" class="mc-mobile-link">SPARK</a>
|
||||
<a href="/memory" class="mc-mobile-link">MEMORY</a>
|
||||
<a href="/marketplace/ui" class="mc-mobile-link">MARKET</a>
|
||||
<a href="/self-correction/ui" class="mc-mobile-link">SELF-CORRECT</a>
|
||||
<div class="mc-mobile-section-label">AGENTS</div>
|
||||
<a href="/hands" class="mc-mobile-link">HANDS</a>
|
||||
<a href="/work-orders/queue" class="mc-mobile-link">WORK ORDERS</a>
|
||||
|
||||
@@ -186,24 +186,6 @@
|
||||
<p class="chat-history-placeholder">Loading sovereignty metrics...</p>
|
||||
{% endcall %}
|
||||
|
||||
<!-- Agent Scorecards -->
|
||||
<div class="card mc-card-spaced" id="mc-scorecards-card">
|
||||
<div class="card-header">
|
||||
<h2 class="card-title">Agent Scorecards</h2>
|
||||
<div class="d-flex align-items-center gap-2">
|
||||
<select id="mc-scorecard-period" class="form-select form-select-sm" style="width: auto;"
|
||||
onchange="loadMcScorecards()">
|
||||
<option value="daily" selected>Daily</option>
|
||||
<option value="weekly">Weekly</option>
|
||||
</select>
|
||||
<a href="/scorecards" class="btn btn-sm btn-outline-secondary">Full View</a>
|
||||
</div>
|
||||
</div>
|
||||
<div id="mc-scorecards-content" class="p-2">
|
||||
<p class="chat-history-placeholder">Loading scorecards...</p>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- Chat History -->
|
||||
<div class="card mc-card-spaced">
|
||||
<div class="card-header">
|
||||
@@ -520,20 +502,6 @@ async function loadSparkStatus() {
|
||||
}
|
||||
}
|
||||
|
||||
// Load agent scorecards
|
||||
async function loadMcScorecards() {
|
||||
var period = document.getElementById('mc-scorecard-period').value;
|
||||
var container = document.getElementById('mc-scorecards-content');
|
||||
container.innerHTML = '<p class="chat-history-placeholder">Loading scorecards...</p>';
|
||||
try {
|
||||
var response = await fetch('/scorecards/all/panels?period=' + period);
|
||||
var html = await response.text();
|
||||
container.innerHTML = html;
|
||||
} catch (error) {
|
||||
container.innerHTML = '<p class="chat-history-placeholder">Scorecards unavailable</p>';
|
||||
}
|
||||
}
|
||||
|
||||
// Initial load
|
||||
loadSparkStatus();
|
||||
loadSovereignty();
|
||||
@@ -542,7 +510,6 @@ loadSwarmStats();
|
||||
loadLightningStats();
|
||||
loadGrokStats();
|
||||
loadChatHistory();
|
||||
loadMcScorecards();
|
||||
|
||||
// Periodic updates
|
||||
setInterval(loadSovereignty, 30000);
|
||||
@@ -551,6 +518,5 @@ setInterval(loadSwarmStats, 5000);
|
||||
setInterval(updateHeartbeat, 5000);
|
||||
setInterval(loadGrokStats, 10000);
|
||||
setInterval(loadSparkStatus, 15000);
|
||||
setInterval(loadMcScorecards, 300000);
|
||||
</script>
|
||||
{% endblock %}
|
||||
|
||||
@@ -1,122 +0,0 @@
|
||||
{% extends "base.html" %}
|
||||
|
||||
{% block title %}Nexus{% endblock %}
|
||||
|
||||
{% block extra_styles %}{% endblock %}
|
||||
|
||||
{% block content %}
|
||||
<div class="container-fluid nexus-layout py-3">
|
||||
|
||||
<div class="nexus-header mb-3">
|
||||
<div class="nexus-title">// NEXUS</div>
|
||||
<div class="nexus-subtitle">
|
||||
Persistent conversational awareness — always present, always learning.
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="nexus-grid">
|
||||
|
||||
<!-- ── LEFT: Conversation ────────────────────────────────── -->
|
||||
<div class="nexus-chat-col">
|
||||
<div class="card mc-panel nexus-chat-panel">
|
||||
<div class="card-header mc-panel-header d-flex justify-content-between align-items-center">
|
||||
<span>// CONVERSATION</span>
|
||||
<button class="mc-btn mc-btn-sm"
|
||||
hx-delete="/nexus/history"
|
||||
hx-target="#nexus-chat-log"
|
||||
hx-swap="beforeend"
|
||||
hx-confirm="Clear nexus conversation?">
|
||||
CLEAR
|
||||
</button>
|
||||
</div>
|
||||
|
||||
<div class="card-body p-2" id="nexus-chat-log">
|
||||
{% for msg in messages %}
|
||||
<div class="chat-message {{ 'user' if msg.role == 'user' else 'agent' }}">
|
||||
<div class="msg-meta">
|
||||
{{ 'YOU' if msg.role == 'user' else 'TIMMY' }} // {{ msg.timestamp }}
|
||||
</div>
|
||||
<div class="msg-body {% if msg.role == 'assistant' %}timmy-md{% endif %}">
|
||||
{{ msg.content | e }}
|
||||
</div>
|
||||
</div>
|
||||
{% else %}
|
||||
<div class="nexus-empty-state">
|
||||
Nexus is ready. Start a conversation — memories will surface in real time.
|
||||
</div>
|
||||
{% endfor %}
|
||||
</div>
|
||||
|
||||
<div class="card-footer p-2">
|
||||
<form hx-post="/nexus/chat"
|
||||
hx-target="#nexus-chat-log"
|
||||
hx-swap="beforeend"
|
||||
hx-on::after-request="this.reset(); document.getElementById('nexus-chat-log').scrollTop = 999999;">
|
||||
<div class="d-flex gap-2">
|
||||
<input type="text"
|
||||
name="message"
|
||||
id="nexus-input"
|
||||
class="mc-search-input flex-grow-1"
|
||||
placeholder="Talk to Timmy..."
|
||||
autocomplete="off"
|
||||
required>
|
||||
<button type="submit" class="mc-btn mc-btn-primary">SEND</button>
|
||||
</div>
|
||||
</form>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- ── RIGHT: Memory sidebar ─────────────────────────────── -->
|
||||
<div class="nexus-sidebar-col">
|
||||
|
||||
<!-- Live memory context (updated with each response) -->
|
||||
<div class="card mc-panel nexus-memory-panel mb-3">
|
||||
<div class="card-header mc-panel-header">
|
||||
<span>// LIVE MEMORY</span>
|
||||
<span class="badge ms-2" style="background:var(--purple-dim); color:var(--purple);">
|
||||
{{ stats.total_entries }} stored
|
||||
</span>
|
||||
</div>
|
||||
<div class="card-body p-2">
|
||||
<div id="nexus-memory-panel" class="nexus-memory-hits">
|
||||
<div class="nexus-memory-label">Relevant memories appear here as you chat.</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- Teaching panel -->
|
||||
<div class="card mc-panel nexus-teach-panel">
|
||||
<div class="card-header mc-panel-header">// TEACH TIMMY</div>
|
||||
<div class="card-body p-2">
|
||||
<form hx-post="/nexus/teach"
|
||||
hx-target="#nexus-teach-response"
|
||||
hx-swap="innerHTML"
|
||||
hx-on::after-request="this.reset()">
|
||||
<div class="d-flex gap-2 mb-2">
|
||||
<input type="text"
|
||||
name="fact"
|
||||
class="mc-search-input flex-grow-1"
|
||||
placeholder="e.g. I prefer dark themes"
|
||||
required>
|
||||
<button type="submit" class="mc-btn mc-btn-primary">TEACH</button>
|
||||
</div>
|
||||
</form>
|
||||
<div id="nexus-teach-response"></div>
|
||||
|
||||
<div class="nexus-facts-header mt-3">// KNOWN FACTS</div>
|
||||
<ul class="nexus-facts-list" id="nexus-facts-list">
|
||||
{% for fact in facts %}
|
||||
<li class="nexus-fact-item">{{ fact.content | e }}</li>
|
||||
{% else %}
|
||||
<li class="nexus-fact-empty">No personal facts stored yet.</li>
|
||||
{% endfor %}
|
||||
</ul>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div><!-- /sidebar -->
|
||||
</div><!-- /nexus-grid -->
|
||||
|
||||
</div>
|
||||
{% endblock %}
|
||||
@@ -1,12 +0,0 @@
|
||||
{% if taught %}
|
||||
<div class="nexus-taught-confirm">
|
||||
✓ Taught: <em>{{ taught | e }}</em>
|
||||
</div>
|
||||
{% endif %}
|
||||
<ul class="nexus-facts-list" id="nexus-facts-list" hx-swap-oob="true">
|
||||
{% for fact in facts %}
|
||||
<li class="nexus-fact-item">{{ fact.content | e }}</li>
|
||||
{% else %}
|
||||
<li class="nexus-fact-empty">No facts stored yet.</li>
|
||||
{% endfor %}
|
||||
</ul>
|
||||
@@ -1,36 +0,0 @@
|
||||
{% if user_message %}
|
||||
<div class="chat-message user">
|
||||
<div class="msg-meta">YOU // {{ timestamp }}</div>
|
||||
<div class="msg-body">{{ user_message | e }}</div>
|
||||
</div>
|
||||
{% endif %}
|
||||
{% if response %}
|
||||
<div class="chat-message agent">
|
||||
<div class="msg-meta">TIMMY // {{ timestamp }}</div>
|
||||
<div class="msg-body timmy-md">{{ response | e }}</div>
|
||||
</div>
|
||||
<script>
|
||||
(function() {
|
||||
var el = document.currentScript.previousElementSibling.querySelector('.timmy-md');
|
||||
if (el && typeof marked !== 'undefined' && typeof DOMPurify !== 'undefined') {
|
||||
el.innerHTML = DOMPurify.sanitize(marked.parse(el.textContent));
|
||||
}
|
||||
})();
|
||||
</script>
|
||||
{% elif error %}
|
||||
<div class="chat-message error-msg">
|
||||
<div class="msg-meta">SYSTEM // {{ timestamp }}</div>
|
||||
<div class="msg-body">{{ error | e }}</div>
|
||||
</div>
|
||||
{% endif %}
|
||||
{% if memory_hits %}
|
||||
<div class="nexus-memory-hits" id="nexus-memory-panel" hx-swap-oob="true">
|
||||
<div class="nexus-memory-label">// LIVE MEMORY CONTEXT</div>
|
||||
{% for hit in memory_hits %}
|
||||
<div class="nexus-memory-hit">
|
||||
<span class="nexus-memory-type">{{ hit.memory_type }}</span>
|
||||
<span class="nexus-memory-content">{{ hit.content | e }}</span>
|
||||
</div>
|
||||
{% endfor %}
|
||||
</div>
|
||||
{% endif %}
|
||||
@@ -1,28 +0,0 @@
|
||||
{% if patterns %}
|
||||
<table class="mc-table w-100">
|
||||
<thead>
|
||||
<tr>
|
||||
<th>ERROR TYPE</th>
|
||||
<th class="text-center">COUNT</th>
|
||||
<th class="text-center">CORRECTED</th>
|
||||
<th class="text-center">FAILED</th>
|
||||
<th>LAST SEEN</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
{% for p in patterns %}
|
||||
<tr>
|
||||
<td class="sc-pattern-type">{{ p.error_type }}</td>
|
||||
<td class="text-center">
|
||||
<span class="badge {% if p.count >= 5 %}badge-error{% elif p.count >= 3 %}badge-warning{% else %}badge-info{% endif %}">{{ p.count }}</span>
|
||||
</td>
|
||||
<td class="text-center text-success">{{ p.success_count }}</td>
|
||||
<td class="text-center {% if p.failed_count > 0 %}text-danger{% else %}text-muted{% endif %}">{{ p.failed_count }}</td>
|
||||
<td class="sc-event-time">{{ p.last_seen[:16] if p.last_seen else '—' }}</td>
|
||||
</tr>
|
||||
{% endfor %}
|
||||
</tbody>
|
||||
</table>
|
||||
{% else %}
|
||||
<div class="text-center text-muted py-3">No patterns detected yet.</div>
|
||||
{% endif %}
|
||||
@@ -1,26 +0,0 @@
|
||||
{% if corrections %}
|
||||
{% for ev in corrections %}
|
||||
<div class="sc-event sc-status-{{ ev.outcome_status }}">
|
||||
<div class="sc-event-header">
|
||||
<span class="sc-status-badge sc-status-{{ ev.outcome_status }}">
|
||||
{% if ev.outcome_status == 'success' %}✓ CORRECTED
|
||||
{% elif ev.outcome_status == 'partial' %}● PARTIAL
|
||||
{% else %}✗ FAILED
|
||||
{% endif %}
|
||||
</span>
|
||||
<span class="sc-source-badge">{{ ev.source }}</span>
|
||||
<span class="sc-event-time">{{ ev.created_at[:19] }}</span>
|
||||
</div>
|
||||
<div class="sc-event-error-type">{{ ev.error_type }}</div>
|
||||
<div class="sc-event-intent"><span class="sc-label">INTENT:</span> {{ ev.original_intent[:120] }}{% if ev.original_intent | length > 120 %}…{% endif %}</div>
|
||||
<div class="sc-event-error"><span class="sc-label">ERROR:</span> {{ ev.detected_error[:120] }}{% if ev.detected_error | length > 120 %}…{% endif %}</div>
|
||||
<div class="sc-event-strategy"><span class="sc-label">STRATEGY:</span> {{ ev.correction_strategy[:120] }}{% if ev.correction_strategy | length > 120 %}…{% endif %}</div>
|
||||
<div class="sc-event-outcome"><span class="sc-label">OUTCOME:</span> {{ ev.final_outcome[:120] }}{% if ev.final_outcome | length > 120 %}…{% endif %}</div>
|
||||
{% if ev.task_id %}
|
||||
<div class="sc-event-meta">task: {{ ev.task_id[:8] }}</div>
|
||||
{% endif %}
|
||||
</div>
|
||||
{% endfor %}
|
||||
{% else %}
|
||||
<div class="text-center text-muted py-3">No self-correction events recorded yet.</div>
|
||||
{% endif %}
|
||||
@@ -1,102 +0,0 @@
|
||||
{% extends "base.html" %}
|
||||
{% from "macros.html" import panel %}
|
||||
|
||||
{% block title %}Timmy Time — Self-Correction Dashboard{% endblock %}
|
||||
|
||||
{% block extra_styles %}{% endblock %}
|
||||
|
||||
{% block content %}
|
||||
<div class="container-fluid py-3">
|
||||
|
||||
<!-- Header -->
|
||||
<div class="spark-header mb-3">
|
||||
<div class="spark-title">SELF-CORRECTION</div>
|
||||
<div class="spark-subtitle">
|
||||
Agent error detection & recovery —
|
||||
<span class="spark-status-val">{{ stats.total }}</span> events,
|
||||
<span class="spark-status-val">{{ stats.success_rate }}%</span> correction rate,
|
||||
<span class="spark-status-val">{{ stats.unique_error_types }}</span> distinct error types
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="row g-3">
|
||||
|
||||
<!-- Left column: stats + patterns -->
|
||||
<div class="col-12 col-lg-4 d-flex flex-column gap-3">
|
||||
|
||||
<!-- Stats panel -->
|
||||
<div class="card mc-panel">
|
||||
<div class="card-header mc-panel-header">// CORRECTION STATS</div>
|
||||
<div class="card-body p-3">
|
||||
<div class="spark-stat-grid">
|
||||
<div class="spark-stat">
|
||||
<span class="spark-stat-label">TOTAL</span>
|
||||
<span class="spark-stat-value">{{ stats.total }}</span>
|
||||
</div>
|
||||
<div class="spark-stat">
|
||||
<span class="spark-stat-label">CORRECTED</span>
|
||||
<span class="spark-stat-value text-success">{{ stats.success_count }}</span>
|
||||
</div>
|
||||
<div class="spark-stat">
|
||||
<span class="spark-stat-label">PARTIAL</span>
|
||||
<span class="spark-stat-value text-warning">{{ stats.partial_count }}</span>
|
||||
</div>
|
||||
<div class="spark-stat">
|
||||
<span class="spark-stat-label">FAILED</span>
|
||||
<span class="spark-stat-value {% if stats.failed_count > 0 %}text-danger{% else %}text-muted{% endif %}">{{ stats.failed_count }}</span>
|
||||
</div>
|
||||
</div>
|
||||
<div class="mt-3">
|
||||
<div class="d-flex justify-content-between mb-1">
|
||||
<small class="text-muted">Correction Rate</small>
|
||||
<small class="{% if stats.success_rate >= 70 %}text-success{% elif stats.success_rate >= 40 %}text-warning{% else %}text-danger{% endif %}">{{ stats.success_rate }}%</small>
|
||||
</div>
|
||||
<div class="progress" style="height:6px;">
|
||||
<div class="progress-bar {% if stats.success_rate >= 70 %}bg-success{% elif stats.success_rate >= 40 %}bg-warning{% else %}bg-danger{% endif %}"
|
||||
role="progressbar"
|
||||
style="width:{{ stats.success_rate }}%"
|
||||
aria-valuenow="{{ stats.success_rate }}"
|
||||
aria-valuemin="0"
|
||||
aria-valuemax="100"></div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- Patterns panel -->
|
||||
<div class="card mc-panel"
|
||||
hx-get="/self-correction/patterns"
|
||||
hx-trigger="load, every 60s"
|
||||
hx-target="#sc-patterns-body"
|
||||
hx-swap="innerHTML">
|
||||
<div class="card-header mc-panel-header d-flex justify-content-between align-items-center">
|
||||
<span>// RECURRING PATTERNS</span>
|
||||
<span class="badge badge-info">{{ patterns | length }}</span>
|
||||
</div>
|
||||
<div class="card-body p-0" id="sc-patterns-body">
|
||||
{% include "partials/self_correction_patterns.html" %}
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
<!-- Right column: timeline -->
|
||||
<div class="col-12 col-lg-8">
|
||||
<div class="card mc-panel"
|
||||
hx-get="/self-correction/timeline"
|
||||
hx-trigger="load, every 30s"
|
||||
hx-target="#sc-timeline-body"
|
||||
hx-swap="innerHTML">
|
||||
<div class="card-header mc-panel-header d-flex justify-content-between align-items-center">
|
||||
<span>// CORRECTION TIMELINE</span>
|
||||
<span class="badge badge-info">{{ corrections | length }}</span>
|
||||
</div>
|
||||
<div class="card-body p-3" id="sc-timeline-body">
|
||||
{% include "partials/self_correction_timeline.html" %}
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
{% endblock %}
|
||||
@@ -24,8 +24,6 @@ MAX_MESSAGES: int = 500
|
||||
|
||||
@dataclass
|
||||
class Message:
|
||||
"""A single chat message with role, content, timestamp, and source."""
|
||||
|
||||
role: str # "user" | "agent" | "error"
|
||||
content: str
|
||||
timestamp: str
|
||||
|
||||
@@ -1,8 +0,0 @@
|
||||
"""Energy Budget Monitoring — power-draw estimation for LLM inference.
|
||||
|
||||
Refs: #1009
|
||||
"""
|
||||
|
||||
from infrastructure.energy.monitor import EnergyBudgetMonitor, energy_monitor
|
||||
|
||||
__all__ = ["EnergyBudgetMonitor", "energy_monitor"]
|
||||
@@ -1,371 +0,0 @@
|
||||
"""Energy Budget Monitor — estimates GPU/CPU power draw during LLM inference.
|
||||
|
||||
Tracks estimated power consumption to optimize for "metabolic efficiency".
|
||||
Three estimation strategies attempted in priority order:
|
||||
|
||||
1. Battery discharge via ioreg (macOS — works without sudo, on-battery only)
|
||||
2. CPU utilisation proxy via sysctl hw.cpufrequency + top
|
||||
3. Model-size heuristic (tokens/s × model_size_gb × 2W/GB estimate)
|
||||
|
||||
Energy Efficiency score (0–10):
|
||||
efficiency = tokens_per_second / estimated_watts, normalised to 0–10.
|
||||
|
||||
Low Power Mode:
|
||||
Activated manually or automatically when draw exceeds the configured
|
||||
threshold. In low power mode the cascade router is advised to prefer the
|
||||
configured low_power_model (e.g. qwen3:1b or similar compact model).
|
||||
|
||||
Refs: #1009
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import logging
|
||||
import subprocess
|
||||
import time
|
||||
from collections import deque
|
||||
from dataclasses import dataclass, field
|
||||
from datetime import UTC, datetime
|
||||
from typing import Any
|
||||
|
||||
from config import settings
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Approximate model-size lookup (GB) used for heuristic power estimate.
|
||||
# Keys are lowercase substring matches against the model name.
|
||||
_MODEL_SIZE_GB: dict[str, float] = {
|
||||
"qwen3:1b": 0.8,
|
||||
"qwen3:3b": 2.0,
|
||||
"qwen3:4b": 2.5,
|
||||
"qwen3:8b": 5.5,
|
||||
"qwen3:14b": 9.0,
|
||||
"qwen3:30b": 20.0,
|
||||
"qwen3:32b": 20.0,
|
||||
"llama3:8b": 5.5,
|
||||
"llama3:70b": 45.0,
|
||||
"mistral:7b": 4.5,
|
||||
"gemma3:4b": 2.5,
|
||||
"gemma3:12b": 8.0,
|
||||
"gemma3:27b": 17.0,
|
||||
"phi4:14b": 9.0,
|
||||
}
|
||||
_DEFAULT_MODEL_SIZE_GB = 5.0 # fallback when model not in table
|
||||
_WATTS_PER_GB_HEURISTIC = 2.0 # rough W/GB for Apple Silicon unified memory
|
||||
|
||||
# Efficiency score normalisation: score 10 at this efficiency (tok/s per W).
|
||||
_EFFICIENCY_SCORE_CEILING = 5.0 # tok/s per W → score 10
|
||||
|
||||
# Rolling window for recent samples
|
||||
_HISTORY_MAXLEN = 60
|
||||
|
||||
|
||||
@dataclass
|
||||
class InferenceSample:
|
||||
"""A single inference event captured by record_inference()."""
|
||||
|
||||
timestamp: str
|
||||
model: str
|
||||
tokens_per_second: float
|
||||
estimated_watts: float
|
||||
efficiency: float # tokens/s per watt
|
||||
efficiency_score: float # 0–10
|
||||
|
||||
|
||||
@dataclass
|
||||
class EnergyReport:
|
||||
"""Snapshot of current energy budget state."""
|
||||
|
||||
timestamp: str
|
||||
low_power_mode: bool
|
||||
current_watts: float
|
||||
strategy: str # "battery", "cpu_proxy", "heuristic", "unavailable"
|
||||
efficiency_score: float # 0–10; -1 if no inference samples yet
|
||||
recent_samples: list[InferenceSample]
|
||||
recommendation: str
|
||||
details: dict[str, Any] = field(default_factory=dict)
|
||||
|
||||
def to_dict(self) -> dict[str, Any]:
|
||||
return {
|
||||
"timestamp": self.timestamp,
|
||||
"low_power_mode": self.low_power_mode,
|
||||
"current_watts": round(self.current_watts, 2),
|
||||
"strategy": self.strategy,
|
||||
"efficiency_score": round(self.efficiency_score, 2),
|
||||
"recent_samples": [
|
||||
{
|
||||
"timestamp": s.timestamp,
|
||||
"model": s.model,
|
||||
"tokens_per_second": round(s.tokens_per_second, 1),
|
||||
"estimated_watts": round(s.estimated_watts, 2),
|
||||
"efficiency": round(s.efficiency, 3),
|
||||
"efficiency_score": round(s.efficiency_score, 2),
|
||||
}
|
||||
for s in self.recent_samples
|
||||
],
|
||||
"recommendation": self.recommendation,
|
||||
"details": self.details,
|
||||
}
|
||||
|
||||
|
||||
class EnergyBudgetMonitor:
|
||||
"""Estimates power consumption and tracks LLM inference efficiency.
|
||||
|
||||
All blocking I/O (subprocess calls) is wrapped in asyncio.to_thread()
|
||||
so the event loop is never blocked. Results are cached.
|
||||
|
||||
Usage::
|
||||
|
||||
# Record an inference event
|
||||
energy_monitor.record_inference("qwen3:8b", tokens_per_second=42.0)
|
||||
|
||||
# Get the current report
|
||||
report = await energy_monitor.get_report()
|
||||
|
||||
# Toggle low power mode
|
||||
energy_monitor.set_low_power_mode(True)
|
||||
"""
|
||||
|
||||
_POWER_CACHE_TTL = 10.0 # seconds between fresh power readings
|
||||
|
||||
def __init__(self) -> None:
|
||||
self._low_power_mode: bool = False
|
||||
self._samples: deque[InferenceSample] = deque(maxlen=_HISTORY_MAXLEN)
|
||||
self._cached_watts: float = 0.0
|
||||
self._cached_strategy: str = "unavailable"
|
||||
self._cache_ts: float = 0.0
|
||||
|
||||
# ── Public API ────────────────────────────────────────────────────────────
|
||||
|
||||
@property
|
||||
def low_power_mode(self) -> bool:
|
||||
return self._low_power_mode
|
||||
|
||||
def set_low_power_mode(self, enabled: bool) -> None:
|
||||
"""Enable or disable low power mode."""
|
||||
self._low_power_mode = enabled
|
||||
state = "enabled" if enabled else "disabled"
|
||||
logger.info("Energy budget: low power mode %s", state)
|
||||
|
||||
def record_inference(self, model: str, tokens_per_second: float) -> InferenceSample:
|
||||
"""Record an inference event for efficiency tracking.
|
||||
|
||||
Call this after each LLM inference completes with the model name and
|
||||
measured throughput. The current power estimate is used to compute
|
||||
the efficiency score.
|
||||
|
||||
Args:
|
||||
model: Ollama model name (e.g. "qwen3:8b").
|
||||
tokens_per_second: Measured decode throughput.
|
||||
|
||||
Returns:
|
||||
The recorded InferenceSample.
|
||||
"""
|
||||
watts = self._cached_watts if self._cached_watts > 0 else self._estimate_watts_sync(model)
|
||||
efficiency = tokens_per_second / max(watts, 0.1)
|
||||
score = min(10.0, (efficiency / _EFFICIENCY_SCORE_CEILING) * 10.0)
|
||||
|
||||
sample = InferenceSample(
|
||||
timestamp=datetime.now(UTC).isoformat(),
|
||||
model=model,
|
||||
tokens_per_second=tokens_per_second,
|
||||
estimated_watts=watts,
|
||||
efficiency=efficiency,
|
||||
efficiency_score=score,
|
||||
)
|
||||
self._samples.append(sample)
|
||||
|
||||
# Auto-engage low power mode if above threshold and budget is enabled
|
||||
threshold = getattr(settings, "energy_budget_watts_threshold", 15.0)
|
||||
if watts > threshold and not self._low_power_mode:
|
||||
logger.info(
|
||||
"Energy budget: %.1fW exceeds threshold %.1fW — auto-engaging low power mode",
|
||||
watts,
|
||||
threshold,
|
||||
)
|
||||
self.set_low_power_mode(True)
|
||||
|
||||
return sample
|
||||
|
||||
async def get_report(self) -> EnergyReport:
|
||||
"""Return the current energy budget report.
|
||||
|
||||
Refreshes the power estimate if the cache is stale.
|
||||
"""
|
||||
await self._refresh_power_cache()
|
||||
|
||||
score = self._compute_mean_efficiency_score()
|
||||
recommendation = self._build_recommendation(score)
|
||||
|
||||
return EnergyReport(
|
||||
timestamp=datetime.now(UTC).isoformat(),
|
||||
low_power_mode=self._low_power_mode,
|
||||
current_watts=self._cached_watts,
|
||||
strategy=self._cached_strategy,
|
||||
efficiency_score=score,
|
||||
recent_samples=list(self._samples)[-10:],
|
||||
recommendation=recommendation,
|
||||
details={"sample_count": len(self._samples)},
|
||||
)
|
||||
|
||||
# ── Power estimation ──────────────────────────────────────────────────────
|
||||
|
||||
async def _refresh_power_cache(self) -> None:
|
||||
"""Refresh the cached power reading if stale."""
|
||||
now = time.monotonic()
|
||||
if now - self._cache_ts < self._POWER_CACHE_TTL:
|
||||
return
|
||||
|
||||
try:
|
||||
watts, strategy = await asyncio.to_thread(self._read_power)
|
||||
except Exception as exc:
|
||||
logger.debug("Energy: power read failed: %s", exc)
|
||||
watts, strategy = 0.0, "unavailable"
|
||||
|
||||
self._cached_watts = watts
|
||||
self._cached_strategy = strategy
|
||||
self._cache_ts = now
|
||||
|
||||
def _read_power(self) -> tuple[float, str]:
|
||||
"""Synchronous power reading — tries strategies in priority order.
|
||||
|
||||
Returns:
|
||||
Tuple of (watts, strategy_name).
|
||||
"""
|
||||
# Strategy 1: battery discharge via ioreg (on-battery Macs)
|
||||
try:
|
||||
watts = self._read_battery_watts()
|
||||
if watts > 0:
|
||||
return watts, "battery"
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# Strategy 2: CPU utilisation proxy via top
|
||||
try:
|
||||
cpu_pct = self._read_cpu_pct()
|
||||
if cpu_pct >= 0:
|
||||
# M3 Max TDP ≈ 40W; scale linearly
|
||||
watts = (cpu_pct / 100.0) * 40.0
|
||||
return watts, "cpu_proxy"
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# Strategy 3: heuristic from loaded model size
|
||||
return 0.0, "unavailable"
|
||||
|
||||
def _estimate_watts_sync(self, model: str) -> float:
|
||||
"""Estimate watts from model size when no live reading is available."""
|
||||
size_gb = self._model_size_gb(model)
|
||||
return size_gb * _WATTS_PER_GB_HEURISTIC
|
||||
|
||||
def _read_battery_watts(self) -> float:
|
||||
"""Read instantaneous battery discharge via ioreg.
|
||||
|
||||
Returns watts if on battery, 0.0 if plugged in or unavailable.
|
||||
Requires macOS; no sudo needed.
|
||||
"""
|
||||
result = subprocess.run(
|
||||
["ioreg", "-r", "-c", "AppleSmartBattery", "-d", "1"],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=3,
|
||||
)
|
||||
amperage_ma = 0.0
|
||||
voltage_mv = 0.0
|
||||
is_charging = True # assume charging unless we see ExternalConnected = No
|
||||
|
||||
for line in result.stdout.splitlines():
|
||||
stripped = line.strip()
|
||||
if '"InstantAmperage"' in stripped:
|
||||
try:
|
||||
amperage_ma = float(stripped.split("=")[-1].strip())
|
||||
except ValueError:
|
||||
pass
|
||||
elif '"Voltage"' in stripped:
|
||||
try:
|
||||
voltage_mv = float(stripped.split("=")[-1].strip())
|
||||
except ValueError:
|
||||
pass
|
||||
elif '"ExternalConnected"' in stripped:
|
||||
is_charging = "Yes" in stripped
|
||||
|
||||
if is_charging or voltage_mv == 0 or amperage_ma <= 0:
|
||||
return 0.0
|
||||
|
||||
# ioreg reports amperage in mA, voltage in mV
|
||||
return (abs(amperage_ma) * voltage_mv) / 1_000_000
|
||||
|
||||
def _read_cpu_pct(self) -> float:
|
||||
"""Read CPU utilisation from macOS top.
|
||||
|
||||
Returns aggregate CPU% (0–100), or -1.0 on failure.
|
||||
"""
|
||||
result = subprocess.run(
|
||||
["top", "-l", "1", "-n", "0", "-stats", "cpu"],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=5,
|
||||
)
|
||||
for line in result.stdout.splitlines():
|
||||
if "CPU usage:" in line:
|
||||
# "CPU usage: 12.5% user, 8.3% sys, 79.1% idle"
|
||||
parts = line.split()
|
||||
try:
|
||||
user = float(parts[2].rstrip("%"))
|
||||
sys_ = float(parts[4].rstrip("%"))
|
||||
return user + sys_
|
||||
except (IndexError, ValueError):
|
||||
pass
|
||||
return -1.0
|
||||
|
||||
# ── Helpers ───────────────────────────────────────────────────────────────
|
||||
|
||||
@staticmethod
|
||||
def _model_size_gb(model: str) -> float:
|
||||
"""Look up approximate model size in GB by name substring."""
|
||||
lower = model.lower()
|
||||
# Exact match first
|
||||
if lower in _MODEL_SIZE_GB:
|
||||
return _MODEL_SIZE_GB[lower]
|
||||
# Substring match
|
||||
for key, size in _MODEL_SIZE_GB.items():
|
||||
if key in lower:
|
||||
return size
|
||||
return _DEFAULT_MODEL_SIZE_GB
|
||||
|
||||
def _compute_mean_efficiency_score(self) -> float:
|
||||
"""Mean efficiency score over recent samples, or -1 if none."""
|
||||
if not self._samples:
|
||||
return -1.0
|
||||
recent = list(self._samples)[-10:]
|
||||
return sum(s.efficiency_score for s in recent) / len(recent)
|
||||
|
||||
def _build_recommendation(self, score: float) -> str:
|
||||
"""Generate a human-readable recommendation from the efficiency score."""
|
||||
threshold = getattr(settings, "energy_budget_watts_threshold", 15.0)
|
||||
low_power_model = getattr(settings, "energy_low_power_model", "qwen3:1b")
|
||||
|
||||
if score < 0:
|
||||
return "No inference data yet — run some tasks to populate efficiency metrics."
|
||||
|
||||
if self._low_power_mode:
|
||||
return (
|
||||
f"Low power mode active — routing to {low_power_model}. "
|
||||
"Disable when power draw normalises."
|
||||
)
|
||||
|
||||
if score < 3.0:
|
||||
return (
|
||||
f"Low efficiency (score {score:.1f}/10). "
|
||||
f"Consider enabling low power mode to favour smaller models "
|
||||
f"(threshold: {threshold}W)."
|
||||
)
|
||||
|
||||
if score < 6.0:
|
||||
return f"Moderate efficiency (score {score:.1f}/10). System operating normally."
|
||||
|
||||
return f"Good efficiency (score {score:.1f}/10). No action needed."
|
||||
|
||||
|
||||
# Module-level singleton
|
||||
energy_monitor = EnergyBudgetMonitor()
|
||||
@@ -71,53 +71,6 @@ class GitHand:
|
||||
return True
|
||||
return False
|
||||
|
||||
async def _exec_subprocess(
|
||||
self,
|
||||
args: str,
|
||||
timeout: int,
|
||||
) -> tuple[bytes, bytes, int]:
|
||||
"""Run git as a subprocess, return (stdout, stderr, returncode).
|
||||
|
||||
Raises TimeoutError if the process exceeds *timeout* seconds.
|
||||
"""
|
||||
proc = await asyncio.create_subprocess_exec(
|
||||
"git",
|
||||
*args.split(),
|
||||
stdout=asyncio.subprocess.PIPE,
|
||||
stderr=asyncio.subprocess.PIPE,
|
||||
cwd=self._repo_dir,
|
||||
)
|
||||
try:
|
||||
stdout, stderr = await asyncio.wait_for(
|
||||
proc.communicate(),
|
||||
timeout=timeout,
|
||||
)
|
||||
except TimeoutError:
|
||||
proc.kill()
|
||||
await proc.wait()
|
||||
raise
|
||||
return stdout, stderr, proc.returncode or 0
|
||||
|
||||
@staticmethod
|
||||
def _parse_output(
|
||||
command: str,
|
||||
stdout_bytes: bytes,
|
||||
stderr_bytes: bytes,
|
||||
returncode: int | None,
|
||||
latency_ms: float,
|
||||
) -> GitResult:
|
||||
"""Decode subprocess output into a GitResult."""
|
||||
exit_code = returncode or 0
|
||||
stdout = stdout_bytes.decode("utf-8", errors="replace").strip()
|
||||
stderr = stderr_bytes.decode("utf-8", errors="replace").strip()
|
||||
return GitResult(
|
||||
operation=command,
|
||||
success=exit_code == 0,
|
||||
output=stdout,
|
||||
error=stderr if exit_code != 0 else "",
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
|
||||
async def run(
|
||||
self,
|
||||
args: str,
|
||||
@@ -135,15 +88,14 @@ class GitHand:
|
||||
GitResult with output or error details.
|
||||
"""
|
||||
start = time.time()
|
||||
command = f"git {args}"
|
||||
|
||||
# Gate destructive operations
|
||||
if self._is_destructive(args) and not allow_destructive:
|
||||
return GitResult(
|
||||
operation=command,
|
||||
operation=f"git {args}",
|
||||
success=False,
|
||||
error=(
|
||||
f"Destructive operation blocked: '{command}'. "
|
||||
f"Destructive operation blocked: 'git {args}'. "
|
||||
"Set allow_destructive=True to override."
|
||||
),
|
||||
requires_confirmation=True,
|
||||
@@ -151,21 +103,46 @@ class GitHand:
|
||||
)
|
||||
|
||||
effective_timeout = timeout or self._timeout
|
||||
command = f"git {args}"
|
||||
|
||||
try:
|
||||
stdout_bytes, stderr_bytes, returncode = await self._exec_subprocess(
|
||||
args,
|
||||
effective_timeout,
|
||||
proc = await asyncio.create_subprocess_exec(
|
||||
"git",
|
||||
*args.split(),
|
||||
stdout=asyncio.subprocess.PIPE,
|
||||
stderr=asyncio.subprocess.PIPE,
|
||||
cwd=self._repo_dir,
|
||||
)
|
||||
except TimeoutError:
|
||||
|
||||
try:
|
||||
stdout_bytes, stderr_bytes = await asyncio.wait_for(
|
||||
proc.communicate(), timeout=effective_timeout
|
||||
)
|
||||
except TimeoutError:
|
||||
proc.kill()
|
||||
await proc.wait()
|
||||
latency = (time.time() - start) * 1000
|
||||
logger.warning("Git command timed out after %ds: %s", effective_timeout, command)
|
||||
return GitResult(
|
||||
operation=command,
|
||||
success=False,
|
||||
error=f"Command timed out after {effective_timeout}s",
|
||||
latency_ms=latency,
|
||||
)
|
||||
|
||||
latency = (time.time() - start) * 1000
|
||||
logger.warning("Git command timed out after %ds: %s", effective_timeout, command)
|
||||
exit_code = proc.returncode or 0
|
||||
stdout = stdout_bytes.decode("utf-8", errors="replace").strip()
|
||||
stderr = stderr_bytes.decode("utf-8", errors="replace").strip()
|
||||
|
||||
return GitResult(
|
||||
operation=command,
|
||||
success=False,
|
||||
error=f"Command timed out after {effective_timeout}s",
|
||||
success=exit_code == 0,
|
||||
output=stdout,
|
||||
error=stderr if exit_code != 0 else "",
|
||||
latency_ms=latency,
|
||||
)
|
||||
|
||||
except FileNotFoundError:
|
||||
latency = (time.time() - start) * 1000
|
||||
logger.warning("git binary not found")
|
||||
@@ -185,14 +162,6 @@ class GitHand:
|
||||
latency_ms=latency,
|
||||
)
|
||||
|
||||
return self._parse_output(
|
||||
command,
|
||||
stdout_bytes,
|
||||
stderr_bytes,
|
||||
returncode=returncode,
|
||||
latency_ms=(time.time() - start) * 1000,
|
||||
)
|
||||
|
||||
# ── Convenience wrappers ─────────────────────────────────────────────────
|
||||
|
||||
async def status(self) -> GitResult:
|
||||
|
||||
@@ -1,11 +1,5 @@
|
||||
"""Infrastructure models package."""
|
||||
|
||||
from infrastructure.models.budget import (
|
||||
BudgetTracker,
|
||||
SpendRecord,
|
||||
estimate_cost_usd,
|
||||
get_budget_tracker,
|
||||
)
|
||||
from infrastructure.models.multimodal import (
|
||||
ModelCapability,
|
||||
ModelInfo,
|
||||
@@ -23,12 +17,6 @@ from infrastructure.models.registry import (
|
||||
ModelRole,
|
||||
model_registry,
|
||||
)
|
||||
from infrastructure.models.router import (
|
||||
TierLabel,
|
||||
TieredModelRouter,
|
||||
classify_tier,
|
||||
get_tiered_router,
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
# Registry
|
||||
@@ -46,14 +34,4 @@ __all__ = [
|
||||
"model_supports_tools",
|
||||
"model_supports_vision",
|
||||
"pull_model_with_fallback",
|
||||
# Tiered router
|
||||
"TierLabel",
|
||||
"TieredModelRouter",
|
||||
"classify_tier",
|
||||
"get_tiered_router",
|
||||
# Budget tracker
|
||||
"BudgetTracker",
|
||||
"SpendRecord",
|
||||
"estimate_cost_usd",
|
||||
"get_budget_tracker",
|
||||
]
|
||||
|
||||
@@ -1,302 +0,0 @@
|
||||
"""Cloud API budget tracker for the three-tier model router.
|
||||
|
||||
Tracks cloud API spend (daily / monthly) and enforces configurable limits.
|
||||
SQLite-backed with in-memory fallback — degrades gracefully if the database
|
||||
is unavailable.
|
||||
|
||||
References:
|
||||
- Issue #882 — Model Tiering Router: Local 8B / Hermes 70B / Cloud API Cascade
|
||||
"""
|
||||
|
||||
import logging
|
||||
import sqlite3
|
||||
import threading
|
||||
import time
|
||||
from dataclasses import dataclass
|
||||
from datetime import UTC, date, datetime
|
||||
from pathlib import Path
|
||||
|
||||
from config import settings
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# ── Cost estimates (USD per 1 K tokens, input / output) ──────────────────────
|
||||
# Updated 2026-03. Estimates only — actual costs vary by tier/usage.
|
||||
_COST_PER_1K: dict[str, dict[str, float]] = {
|
||||
# Claude models
|
||||
"claude-haiku-4-5": {"input": 0.00025, "output": 0.00125},
|
||||
"claude-sonnet-4-5": {"input": 0.003, "output": 0.015},
|
||||
"claude-opus-4-5": {"input": 0.015, "output": 0.075},
|
||||
"haiku": {"input": 0.00025, "output": 0.00125},
|
||||
"sonnet": {"input": 0.003, "output": 0.015},
|
||||
"opus": {"input": 0.015, "output": 0.075},
|
||||
# GPT-4o
|
||||
"gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
|
||||
"gpt-4o": {"input": 0.0025, "output": 0.01},
|
||||
# Grok (xAI)
|
||||
"grok-3-fast": {"input": 0.003, "output": 0.015},
|
||||
"grok-3": {"input": 0.005, "output": 0.025},
|
||||
}
|
||||
_DEFAULT_COST: dict[str, float] = {"input": 0.003, "output": 0.015} # conservative fallback
|
||||
|
||||
|
||||
def estimate_cost_usd(model: str, tokens_in: int, tokens_out: int) -> float:
|
||||
"""Estimate the cost of a single request in USD.
|
||||
|
||||
Matches the model name by substring so versioned names like
|
||||
``claude-haiku-4-5-20251001`` still resolve correctly.
|
||||
|
||||
Args:
|
||||
model: Model name as passed to the provider.
|
||||
tokens_in: Number of input (prompt) tokens consumed.
|
||||
tokens_out: Number of output (completion) tokens generated.
|
||||
|
||||
Returns:
|
||||
Estimated cost in USD (may be zero for unknown models).
|
||||
"""
|
||||
model_lower = model.lower()
|
||||
rates = _DEFAULT_COST
|
||||
for key, rate in _COST_PER_1K.items():
|
||||
if key in model_lower:
|
||||
rates = rate
|
||||
break
|
||||
return (tokens_in * rates["input"] + tokens_out * rates["output"]) / 1000.0
|
||||
|
||||
|
||||
@dataclass
|
||||
class SpendRecord:
|
||||
"""A single spend event."""
|
||||
|
||||
ts: float
|
||||
provider: str
|
||||
model: str
|
||||
tokens_in: int
|
||||
tokens_out: int
|
||||
cost_usd: float
|
||||
tier: str
|
||||
|
||||
|
||||
class BudgetTracker:
|
||||
"""Tracks cloud API spend with configurable daily / monthly limits.
|
||||
|
||||
Persists spend records to SQLite (``data/budget.db`` by default).
|
||||
Falls back to in-memory tracking when the database is unavailable —
|
||||
budget enforcement still works; records are lost on restart.
|
||||
|
||||
Limits are read from ``settings``:
|
||||
|
||||
* ``tier_cloud_daily_budget_usd`` — daily ceiling (0 = disabled)
|
||||
* ``tier_cloud_monthly_budget_usd`` — monthly ceiling (0 = disabled)
|
||||
|
||||
Usage::
|
||||
|
||||
tracker = BudgetTracker()
|
||||
|
||||
if tracker.cloud_allowed():
|
||||
# … make cloud API call …
|
||||
tracker.record_spend("anthropic", "claude-haiku-4-5", 100, 200)
|
||||
|
||||
summary = tracker.get_summary()
|
||||
print(summary["daily_usd"], "/", summary["daily_limit_usd"])
|
||||
"""
|
||||
|
||||
_DB_PATH = "data/budget.db"
|
||||
|
||||
def __init__(self, db_path: str | None = None) -> None:
|
||||
"""Initialise the tracker.
|
||||
|
||||
Args:
|
||||
db_path: Path to the SQLite database. Defaults to
|
||||
``data/budget.db``. Pass ``":memory:"`` for tests.
|
||||
"""
|
||||
self._db_path = db_path or self._DB_PATH
|
||||
self._lock = threading.Lock()
|
||||
self._in_memory: list[SpendRecord] = []
|
||||
self._db_ok = False
|
||||
self._init_db()
|
||||
|
||||
# ── Database initialisation ──────────────────────────────────────────────
|
||||
|
||||
def _init_db(self) -> None:
|
||||
"""Create the spend table (and parent directory) if needed."""
|
||||
try:
|
||||
if self._db_path != ":memory:":
|
||||
Path(self._db_path).parent.mkdir(parents=True, exist_ok=True)
|
||||
with self._connect() as conn:
|
||||
conn.execute(
|
||||
"""
|
||||
CREATE TABLE IF NOT EXISTS cloud_spend (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
ts REAL NOT NULL,
|
||||
provider TEXT NOT NULL,
|
||||
model TEXT NOT NULL,
|
||||
tokens_in INTEGER NOT NULL DEFAULT 0,
|
||||
tokens_out INTEGER NOT NULL DEFAULT 0,
|
||||
cost_usd REAL NOT NULL DEFAULT 0.0,
|
||||
tier TEXT NOT NULL DEFAULT 'cloud'
|
||||
)
|
||||
"""
|
||||
)
|
||||
conn.execute(
|
||||
"CREATE INDEX IF NOT EXISTS idx_spend_ts ON cloud_spend(ts)"
|
||||
)
|
||||
self._db_ok = True
|
||||
logger.debug("BudgetTracker: SQLite initialised at %s", self._db_path)
|
||||
except Exception as exc:
|
||||
logger.warning(
|
||||
"BudgetTracker: SQLite unavailable, using in-memory fallback: %s", exc
|
||||
)
|
||||
|
||||
def _connect(self) -> sqlite3.Connection:
|
||||
return sqlite3.connect(self._db_path, timeout=5)
|
||||
|
||||
# ── Public API ───────────────────────────────────────────────────────────
|
||||
|
||||
def record_spend(
|
||||
self,
|
||||
provider: str,
|
||||
model: str,
|
||||
tokens_in: int = 0,
|
||||
tokens_out: int = 0,
|
||||
cost_usd: float | None = None,
|
||||
tier: str = "cloud",
|
||||
) -> float:
|
||||
"""Record a cloud API spend event and return the cost recorded.
|
||||
|
||||
Args:
|
||||
provider: Provider name (e.g. ``"anthropic"``, ``"openai"``).
|
||||
model: Model name used for the request.
|
||||
tokens_in: Input token count (prompt).
|
||||
tokens_out: Output token count (completion).
|
||||
cost_usd: Explicit cost override. If ``None``, the cost is
|
||||
estimated from the token counts and model rates.
|
||||
tier: Tier label for the request (default ``"cloud"``).
|
||||
|
||||
Returns:
|
||||
The cost recorded in USD.
|
||||
"""
|
||||
if cost_usd is None:
|
||||
cost_usd = estimate_cost_usd(model, tokens_in, tokens_out)
|
||||
|
||||
ts = time.time()
|
||||
record = SpendRecord(ts, provider, model, tokens_in, tokens_out, cost_usd, tier)
|
||||
|
||||
with self._lock:
|
||||
if self._db_ok:
|
||||
try:
|
||||
with self._connect() as conn:
|
||||
conn.execute(
|
||||
"""
|
||||
INSERT INTO cloud_spend
|
||||
(ts, provider, model, tokens_in, tokens_out, cost_usd, tier)
|
||||
VALUES (?, ?, ?, ?, ?, ?, ?)
|
||||
""",
|
||||
(ts, provider, model, tokens_in, tokens_out, cost_usd, tier),
|
||||
)
|
||||
logger.debug(
|
||||
"BudgetTracker: recorded %.6f USD (%s/%s, in=%d out=%d tier=%s)",
|
||||
cost_usd,
|
||||
provider,
|
||||
model,
|
||||
tokens_in,
|
||||
tokens_out,
|
||||
tier,
|
||||
)
|
||||
return cost_usd
|
||||
except Exception as exc:
|
||||
logger.warning("BudgetTracker: DB write failed, falling back: %s", exc)
|
||||
self._in_memory.append(record)
|
||||
|
||||
return cost_usd
|
||||
|
||||
def get_daily_spend(self) -> float:
|
||||
"""Return total cloud spend for the current UTC day in USD."""
|
||||
today = date.today()
|
||||
since = datetime(today.year, today.month, today.day, tzinfo=UTC).timestamp()
|
||||
return self._query_spend(since)
|
||||
|
||||
def get_monthly_spend(self) -> float:
|
||||
"""Return total cloud spend for the current UTC month in USD."""
|
||||
today = date.today()
|
||||
since = datetime(today.year, today.month, 1, tzinfo=UTC).timestamp()
|
||||
return self._query_spend(since)
|
||||
|
||||
def cloud_allowed(self) -> bool:
|
||||
"""Return ``True`` if cloud API spend is within configured limits.
|
||||
|
||||
Checks both daily and monthly ceilings. A limit of ``0`` disables
|
||||
that particular check.
|
||||
"""
|
||||
daily_limit = settings.tier_cloud_daily_budget_usd
|
||||
monthly_limit = settings.tier_cloud_monthly_budget_usd
|
||||
|
||||
if daily_limit > 0:
|
||||
daily_spend = self.get_daily_spend()
|
||||
if daily_spend >= daily_limit:
|
||||
logger.warning(
|
||||
"BudgetTracker: daily cloud budget exhausted (%.4f / %.4f USD)",
|
||||
daily_spend,
|
||||
daily_limit,
|
||||
)
|
||||
return False
|
||||
|
||||
if monthly_limit > 0:
|
||||
monthly_spend = self.get_monthly_spend()
|
||||
if monthly_spend >= monthly_limit:
|
||||
logger.warning(
|
||||
"BudgetTracker: monthly cloud budget exhausted (%.4f / %.4f USD)",
|
||||
monthly_spend,
|
||||
monthly_limit,
|
||||
)
|
||||
return False
|
||||
|
||||
return True
|
||||
|
||||
def get_summary(self) -> dict:
|
||||
"""Return a spend summary dict suitable for dashboards / logging.
|
||||
|
||||
Keys: ``daily_usd``, ``monthly_usd``, ``daily_limit_usd``,
|
||||
``monthly_limit_usd``, ``daily_ok``, ``monthly_ok``.
|
||||
"""
|
||||
daily = self.get_daily_spend()
|
||||
monthly = self.get_monthly_spend()
|
||||
daily_limit = settings.tier_cloud_daily_budget_usd
|
||||
monthly_limit = settings.tier_cloud_monthly_budget_usd
|
||||
return {
|
||||
"daily_usd": round(daily, 6),
|
||||
"monthly_usd": round(monthly, 6),
|
||||
"daily_limit_usd": daily_limit,
|
||||
"monthly_limit_usd": monthly_limit,
|
||||
"daily_ok": daily_limit <= 0 or daily < daily_limit,
|
||||
"monthly_ok": monthly_limit <= 0 or monthly < monthly_limit,
|
||||
}
|
||||
|
||||
# ── Internal helpers ─────────────────────────────────────────────────────
|
||||
|
||||
def _query_spend(self, since_ts: float) -> float:
|
||||
"""Sum ``cost_usd`` for records with ``ts >= since_ts``."""
|
||||
if self._db_ok:
|
||||
try:
|
||||
with self._connect() as conn:
|
||||
row = conn.execute(
|
||||
"SELECT COALESCE(SUM(cost_usd), 0.0) FROM cloud_spend WHERE ts >= ?",
|
||||
(since_ts,),
|
||||
).fetchone()
|
||||
return float(row[0]) if row else 0.0
|
||||
except Exception as exc:
|
||||
logger.warning("BudgetTracker: DB read failed: %s", exc)
|
||||
# In-memory fallback
|
||||
return sum(r.cost_usd for r in self._in_memory if r.ts >= since_ts)
|
||||
|
||||
|
||||
# ── Module-level singleton ────────────────────────────────────────────────────
|
||||
|
||||
_budget_tracker: BudgetTracker | None = None
|
||||
|
||||
|
||||
def get_budget_tracker() -> BudgetTracker:
|
||||
"""Get or create the module-level BudgetTracker singleton."""
|
||||
global _budget_tracker
|
||||
if _budget_tracker is None:
|
||||
_budget_tracker = BudgetTracker()
|
||||
return _budget_tracker
|
||||
@@ -1,427 +0,0 @@
|
||||
"""Three-tier model router — Local 8B / Local 70B / Cloud API Cascade.
|
||||
|
||||
Selects the cheapest-sufficient LLM for each request using a heuristic
|
||||
task-complexity classifier. Tier 3 (Cloud API) is only used when Tier 2
|
||||
fails or the budget guard allows it.
|
||||
|
||||
Tiers
|
||||
-----
|
||||
Tier 1 — LOCAL_FAST (Llama 3.1 8B / Hermes 3 8B via Ollama, free, ~0.3-1 s)
|
||||
Navigation, basic interactions, simple decisions.
|
||||
|
||||
Tier 2 — LOCAL_HEAVY (Hermes 3/4 70B via Ollama, free, ~5-10 s for 200 tok)
|
||||
Quest planning, dialogue strategy, complex reasoning.
|
||||
|
||||
Tier 3 — CLOUD_API (Claude / GPT-4o, paid ~$5-15/hr heavy use)
|
||||
Recovery from Tier 2 failures, novel situations, multi-step planning.
|
||||
|
||||
Routing logic
|
||||
-------------
|
||||
1. Classify the task using keyword / length / context heuristics (no LLM call).
|
||||
2. Route to the appropriate tier.
|
||||
3. On Tier-1 low-quality response → auto-escalate to Tier 2.
|
||||
4. On Tier-2 failure or explicit ``require_cloud=True`` → Tier 3 (if budget allows).
|
||||
5. Log tier used, model, latency, estimated cost for every request.
|
||||
|
||||
References:
|
||||
- Issue #882 — Model Tiering Router: Local 8B / Hermes 70B / Cloud API Cascade
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
import re
|
||||
import time
|
||||
from enum import StrEnum
|
||||
from typing import Any
|
||||
|
||||
from config import settings
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
# ── Tier definitions ──────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TierLabel(StrEnum):
|
||||
"""Three cost-sorted model tiers."""
|
||||
|
||||
LOCAL_FAST = "local_fast" # 8B local, always hot, free
|
||||
LOCAL_HEAVY = "local_heavy" # 70B local, free but slower
|
||||
CLOUD_API = "cloud_api" # Paid cloud backend (Claude / GPT-4o)
|
||||
|
||||
|
||||
# ── Default model assignments (overridable via Settings) ──────────────────────
|
||||
|
||||
_DEFAULT_TIER_MODELS: dict[TierLabel, str] = {
|
||||
TierLabel.LOCAL_FAST: "llama3.1:8b",
|
||||
TierLabel.LOCAL_HEAVY: "hermes3:70b",
|
||||
TierLabel.CLOUD_API: "claude-haiku-4-5",
|
||||
}
|
||||
|
||||
# ── Classification vocabulary ─────────────────────────────────────────────────
|
||||
|
||||
# Patterns that indicate a Tier-1 (simple) task
|
||||
_T1_WORDS: frozenset[str] = frozenset(
|
||||
{
|
||||
"go", "move", "walk", "run",
|
||||
"north", "south", "east", "west", "up", "down", "left", "right",
|
||||
"yes", "no", "ok", "okay",
|
||||
"open", "close", "take", "drop", "look",
|
||||
"pick", "use", "wait", "rest", "save",
|
||||
"attack", "flee", "jump", "crouch",
|
||||
"status", "ping", "list", "show", "get", "check",
|
||||
}
|
||||
)
|
||||
|
||||
# Patterns that indicate a Tier-2 or Tier-3 task
|
||||
_T2_PHRASES: tuple[str, ...] = (
|
||||
"plan", "strategy", "optimize", "optimise",
|
||||
"quest", "stuck", "recover",
|
||||
"negotiate", "persuade", "faction", "reputation",
|
||||
"analyze", "analyse", "evaluate", "decide",
|
||||
"complex", "multi-step", "long-term",
|
||||
"how do i", "what should i do", "help me figure",
|
||||
"what is the best", "recommend", "best way",
|
||||
"explain", "describe in detail", "walk me through",
|
||||
"compare", "design", "implement", "refactor",
|
||||
"debug", "diagnose", "root cause",
|
||||
)
|
||||
|
||||
# Low-quality response detection patterns
|
||||
_LOW_QUALITY_PATTERNS: tuple[re.Pattern, ...] = (
|
||||
re.compile(r"i\s+don'?t\s+know", re.IGNORECASE),
|
||||
re.compile(r"i'm\s+not\s+sure", re.IGNORECASE),
|
||||
re.compile(r"i\s+cannot\s+(help|assist|answer)", re.IGNORECASE),
|
||||
re.compile(r"i\s+apologize", re.IGNORECASE),
|
||||
re.compile(r"as an ai", re.IGNORECASE),
|
||||
re.compile(r"i\s+don'?t\s+have\s+(enough|sufficient)\s+information", re.IGNORECASE),
|
||||
)
|
||||
|
||||
# Response is definitely low-quality if shorter than this many characters
|
||||
_LOW_QUALITY_MIN_CHARS = 20
|
||||
# Response is suspicious if shorter than this many chars for a complex task
|
||||
_ESCALATION_MIN_CHARS = 60
|
||||
|
||||
|
||||
def classify_tier(task: str, context: dict | None = None) -> TierLabel:
|
||||
"""Classify a task to the cheapest-sufficient model tier.
|
||||
|
||||
Classification priority (highest wins):
|
||||
1. ``context["require_cloud"] = True`` → CLOUD_API
|
||||
2. Any Tier-2 phrase or stuck/recovery signal → LOCAL_HEAVY
|
||||
3. Short task with only Tier-1 words, no active context → LOCAL_FAST
|
||||
4. Default → LOCAL_HEAVY (safe fallback for unknown tasks)
|
||||
|
||||
Args:
|
||||
task: Natural-language task or user input.
|
||||
context: Optional context dict. Recognised keys:
|
||||
``require_cloud`` (bool), ``stuck`` (bool),
|
||||
``require_t2`` (bool), ``active_quests`` (list),
|
||||
``dialogue_active`` (bool), ``combat_active`` (bool).
|
||||
|
||||
Returns:
|
||||
The cheapest ``TierLabel`` sufficient for the task.
|
||||
"""
|
||||
ctx = context or {}
|
||||
task_lower = task.lower()
|
||||
words = set(task_lower.split())
|
||||
|
||||
# ── Explicit cloud override ──────────────────────────────────────────────
|
||||
if ctx.get("require_cloud"):
|
||||
logger.debug("classify_tier → CLOUD_API (explicit require_cloud)")
|
||||
return TierLabel.CLOUD_API
|
||||
|
||||
# ── Tier-2 / complexity signals ──────────────────────────────────────────
|
||||
t2_phrase_hit = any(phrase in task_lower for phrase in _T2_PHRASES)
|
||||
t2_word_hit = bool(words & {"plan", "strategy", "optimize", "optimise", "quest",
|
||||
"stuck", "recover", "analyze", "analyse", "evaluate"})
|
||||
is_stuck = bool(ctx.get("stuck"))
|
||||
require_t2 = bool(ctx.get("require_t2"))
|
||||
long_input = len(task) > 300 # long tasks warrant more capable model
|
||||
deep_context = (
|
||||
len(ctx.get("active_quests", [])) >= 3
|
||||
or ctx.get("dialogue_active")
|
||||
)
|
||||
|
||||
if t2_phrase_hit or t2_word_hit or is_stuck or require_t2 or long_input or deep_context:
|
||||
logger.debug(
|
||||
"classify_tier → LOCAL_HEAVY (phrase=%s word=%s stuck=%s explicit=%s long=%s ctx=%s)",
|
||||
t2_phrase_hit, t2_word_hit, is_stuck, require_t2, long_input, deep_context,
|
||||
)
|
||||
return TierLabel.LOCAL_HEAVY
|
||||
|
||||
# ── Tier-1 signals ───────────────────────────────────────────────────────
|
||||
t1_word_hit = bool(words & _T1_WORDS)
|
||||
task_short = len(task.split()) <= 8
|
||||
no_active_context = (
|
||||
not ctx.get("active_quests")
|
||||
and not ctx.get("dialogue_active")
|
||||
and not ctx.get("combat_active")
|
||||
)
|
||||
|
||||
if t1_word_hit and task_short and no_active_context:
|
||||
logger.debug(
|
||||
"classify_tier → LOCAL_FAST (words=%s short=%s)", t1_word_hit, task_short
|
||||
)
|
||||
return TierLabel.LOCAL_FAST
|
||||
|
||||
# ── Default: LOCAL_HEAVY (safe for anything unclassified) ────────────────
|
||||
logger.debug("classify_tier → LOCAL_HEAVY (default)")
|
||||
return TierLabel.LOCAL_HEAVY
|
||||
|
||||
|
||||
def _is_low_quality(content: str, tier: TierLabel) -> bool:
|
||||
"""Return True if the response looks like it should be escalated.
|
||||
|
||||
Used for automatic Tier-1 → Tier-2 escalation.
|
||||
|
||||
Args:
|
||||
content: LLM response text.
|
||||
tier: The tier that produced the response.
|
||||
|
||||
Returns:
|
||||
True if the response is likely too low-quality to be useful.
|
||||
"""
|
||||
if not content or not content.strip():
|
||||
return True
|
||||
|
||||
stripped = content.strip()
|
||||
|
||||
# Too short to be useful
|
||||
if len(stripped) < _LOW_QUALITY_MIN_CHARS:
|
||||
return True
|
||||
|
||||
# Insufficient for a supposedly complex-enough task
|
||||
if tier == TierLabel.LOCAL_FAST and len(stripped) < _ESCALATION_MIN_CHARS:
|
||||
return True
|
||||
|
||||
# Matches known "I can't help" patterns
|
||||
for pattern in _LOW_QUALITY_PATTERNS:
|
||||
if pattern.search(stripped):
|
||||
return True
|
||||
|
||||
return False
|
||||
|
||||
|
||||
class TieredModelRouter:
|
||||
"""Routes LLM requests across the Local 8B / Local 70B / Cloud API tiers.
|
||||
|
||||
Wraps CascadeRouter with:
|
||||
- Heuristic tier classification via ``classify_tier()``
|
||||
- Automatic Tier-1 → Tier-2 escalation on low-quality responses
|
||||
- Cloud-tier budget guard via ``BudgetTracker``
|
||||
- Per-request logging: tier, model, latency, estimated cost
|
||||
|
||||
Usage::
|
||||
|
||||
router = TieredModelRouter()
|
||||
|
||||
result = await router.route(
|
||||
task="Walk to the next room",
|
||||
context={},
|
||||
)
|
||||
print(result["content"], result["tier"]) # "Move north.", "local_fast"
|
||||
|
||||
# Force heavy tier
|
||||
result = await router.route(
|
||||
task="Plan the optimal path to become Hortator",
|
||||
context={"require_t2": True},
|
||||
)
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
cascade: Any | None = None,
|
||||
budget_tracker: Any | None = None,
|
||||
tier_models: dict[TierLabel, str] | None = None,
|
||||
auto_escalate: bool = True,
|
||||
) -> None:
|
||||
"""Initialise the tiered router.
|
||||
|
||||
Args:
|
||||
cascade: CascadeRouter instance. If ``None``, the
|
||||
singleton from ``get_router()`` is used lazily.
|
||||
budget_tracker: BudgetTracker instance. If ``None``, the
|
||||
singleton from ``get_budget_tracker()`` is used.
|
||||
tier_models: Override default model names per tier.
|
||||
auto_escalate: When ``True``, low-quality Tier-1 responses
|
||||
automatically retry on Tier-2.
|
||||
"""
|
||||
self._cascade = cascade
|
||||
self._budget = budget_tracker
|
||||
self._tier_models: dict[TierLabel, str] = dict(_DEFAULT_TIER_MODELS)
|
||||
self._auto_escalate = auto_escalate
|
||||
|
||||
# Apply settings-level overrides (can still be overridden per-instance)
|
||||
if settings.tier_local_fast_model:
|
||||
self._tier_models[TierLabel.LOCAL_FAST] = settings.tier_local_fast_model
|
||||
if settings.tier_local_heavy_model:
|
||||
self._tier_models[TierLabel.LOCAL_HEAVY] = settings.tier_local_heavy_model
|
||||
if settings.tier_cloud_model:
|
||||
self._tier_models[TierLabel.CLOUD_API] = settings.tier_cloud_model
|
||||
|
||||
if tier_models:
|
||||
self._tier_models.update(tier_models)
|
||||
|
||||
# ── Lazy singletons ──────────────────────────────────────────────────────
|
||||
|
||||
def _get_cascade(self) -> Any:
|
||||
if self._cascade is None:
|
||||
from infrastructure.router.cascade import get_router
|
||||
self._cascade = get_router()
|
||||
return self._cascade
|
||||
|
||||
def _get_budget(self) -> Any:
|
||||
if self._budget is None:
|
||||
from infrastructure.models.budget import get_budget_tracker
|
||||
self._budget = get_budget_tracker()
|
||||
return self._budget
|
||||
|
||||
# ── Public interface ─────────────────────────────────────────────────────
|
||||
|
||||
def classify(self, task: str, context: dict | None = None) -> TierLabel:
|
||||
"""Classify a task without routing. Useful for telemetry."""
|
||||
return classify_tier(task, context)
|
||||
|
||||
async def route(
|
||||
self,
|
||||
task: str,
|
||||
context: dict | None = None,
|
||||
messages: list[dict] | None = None,
|
||||
temperature: float = 0.3,
|
||||
max_tokens: int | None = None,
|
||||
) -> dict:
|
||||
"""Route a task to the appropriate model tier.
|
||||
|
||||
Builds a minimal messages list if ``messages`` is not provided.
|
||||
The result always includes a ``tier`` key indicating which tier
|
||||
ultimately handled the request.
|
||||
|
||||
Args:
|
||||
task: Natural-language task description.
|
||||
context: Task context dict (see ``classify_tier()``).
|
||||
messages: Pre-built OpenAI-compatible messages list. If
|
||||
provided, ``task`` is only used for classification.
|
||||
temperature: Sampling temperature (default 0.3).
|
||||
max_tokens: Maximum tokens to generate.
|
||||
|
||||
Returns:
|
||||
Dict with at minimum: ``content``, ``provider``, ``model``,
|
||||
``tier``, ``latency_ms``. May include ``cost_usd`` when a
|
||||
cloud request is recorded.
|
||||
|
||||
Raises:
|
||||
RuntimeError: If all available tiers are exhausted.
|
||||
"""
|
||||
ctx = context or {}
|
||||
tier = self.classify(task, ctx)
|
||||
msgs = messages or [{"role": "user", "content": task}]
|
||||
|
||||
# ── Tier 1 attempt ───────────────────────────────────────────────────
|
||||
if tier == TierLabel.LOCAL_FAST:
|
||||
result = await self._complete_tier(
|
||||
TierLabel.LOCAL_FAST, msgs, temperature, max_tokens
|
||||
)
|
||||
if self._auto_escalate and _is_low_quality(result.get("content", ""), TierLabel.LOCAL_FAST):
|
||||
logger.info(
|
||||
"TieredModelRouter: Tier-1 response low quality, escalating to Tier-2 "
|
||||
"(task=%r content_len=%d)",
|
||||
task[:80],
|
||||
len(result.get("content", "")),
|
||||
)
|
||||
tier = TierLabel.LOCAL_HEAVY
|
||||
result = await self._complete_tier(
|
||||
TierLabel.LOCAL_HEAVY, msgs, temperature, max_tokens
|
||||
)
|
||||
return result
|
||||
|
||||
# ── Tier 2 attempt ───────────────────────────────────────────────────
|
||||
if tier == TierLabel.LOCAL_HEAVY:
|
||||
try:
|
||||
return await self._complete_tier(
|
||||
TierLabel.LOCAL_HEAVY, msgs, temperature, max_tokens
|
||||
)
|
||||
except Exception as exc:
|
||||
logger.warning(
|
||||
"TieredModelRouter: Tier-2 failed (%s) — escalating to cloud", exc
|
||||
)
|
||||
tier = TierLabel.CLOUD_API
|
||||
|
||||
# ── Tier 3 (Cloud) ───────────────────────────────────────────────────
|
||||
budget = self._get_budget()
|
||||
if not budget.cloud_allowed():
|
||||
raise RuntimeError(
|
||||
"Cloud API tier requested but budget limit reached — "
|
||||
"increase tier_cloud_daily_budget_usd or tier_cloud_monthly_budget_usd"
|
||||
)
|
||||
|
||||
result = await self._complete_tier(
|
||||
TierLabel.CLOUD_API, msgs, temperature, max_tokens
|
||||
)
|
||||
|
||||
# Record cloud spend if token info is available
|
||||
usage = result.get("usage", {})
|
||||
if usage:
|
||||
cost = budget.record_spend(
|
||||
provider=result.get("provider", "unknown"),
|
||||
model=result.get("model", self._tier_models[TierLabel.CLOUD_API]),
|
||||
tokens_in=usage.get("prompt_tokens", 0),
|
||||
tokens_out=usage.get("completion_tokens", 0),
|
||||
tier=TierLabel.CLOUD_API,
|
||||
)
|
||||
result["cost_usd"] = cost
|
||||
|
||||
return result
|
||||
|
||||
# ── Internal helpers ─────────────────────────────────────────────────────
|
||||
|
||||
async def _complete_tier(
|
||||
self,
|
||||
tier: TierLabel,
|
||||
messages: list[dict],
|
||||
temperature: float,
|
||||
max_tokens: int | None,
|
||||
) -> dict:
|
||||
"""Dispatch a single inference request for the given tier."""
|
||||
model = self._tier_models[tier]
|
||||
cascade = self._get_cascade()
|
||||
start = time.monotonic()
|
||||
|
||||
logger.info(
|
||||
"TieredModelRouter: tier=%s model=%s messages=%d",
|
||||
tier,
|
||||
model,
|
||||
len(messages),
|
||||
)
|
||||
|
||||
result = await cascade.complete(
|
||||
messages=messages,
|
||||
model=model,
|
||||
temperature=temperature,
|
||||
max_tokens=max_tokens,
|
||||
)
|
||||
|
||||
elapsed_ms = (time.monotonic() - start) * 1000
|
||||
result["tier"] = tier
|
||||
result.setdefault("latency_ms", elapsed_ms)
|
||||
|
||||
logger.info(
|
||||
"TieredModelRouter: done tier=%s model=%s latency_ms=%.0f",
|
||||
tier,
|
||||
result.get("model", model),
|
||||
elapsed_ms,
|
||||
)
|
||||
return result
|
||||
|
||||
|
||||
# ── Module-level singleton ────────────────────────────────────────────────────
|
||||
|
||||
_tiered_router: TieredModelRouter | None = None
|
||||
|
||||
|
||||
def get_tiered_router() -> TieredModelRouter:
|
||||
"""Get or create the module-level TieredModelRouter singleton."""
|
||||
global _tiered_router
|
||||
if _tiered_router is None:
|
||||
_tiered_router = TieredModelRouter()
|
||||
return _tiered_router
|
||||
@@ -21,8 +21,6 @@ logger = logging.getLogger(__name__)
|
||||
|
||||
@dataclass
|
||||
class Notification:
|
||||
"""A push notification with title, message, category, and read status."""
|
||||
|
||||
id: int
|
||||
title: str
|
||||
message: str
|
||||
|
||||
@@ -242,64 +242,6 @@ def produce_agent_state(agent_id: str, presence: dict) -> dict:
|
||||
}
|
||||
|
||||
|
||||
def _get_agents_online() -> int:
|
||||
"""Return the count of agents with a non-offline status."""
|
||||
try:
|
||||
from timmy.agents.loader import list_agents
|
||||
|
||||
agents = list_agents()
|
||||
return sum(1 for a in agents if a.get("status", "") not in ("offline", ""))
|
||||
except Exception as exc:
|
||||
logger.debug("Failed to count agents: %s", exc)
|
||||
return 0
|
||||
|
||||
|
||||
def _get_visitors() -> int:
|
||||
"""Return the count of active WebSocket visitor clients."""
|
||||
try:
|
||||
from dashboard.routes.world import _ws_clients
|
||||
|
||||
return len(_ws_clients)
|
||||
except Exception as exc:
|
||||
logger.debug("Failed to count visitors: %s", exc)
|
||||
return 0
|
||||
|
||||
|
||||
def _get_uptime_seconds() -> int:
|
||||
"""Return seconds elapsed since application start."""
|
||||
try:
|
||||
from config import APP_START_TIME
|
||||
|
||||
return int((datetime.now(UTC) - APP_START_TIME).total_seconds())
|
||||
except Exception as exc:
|
||||
logger.debug("Failed to calculate uptime: %s", exc)
|
||||
return 0
|
||||
|
||||
|
||||
def _get_thinking_active() -> bool:
|
||||
"""Return True if the thinking engine is enabled and running."""
|
||||
try:
|
||||
from config import settings
|
||||
from timmy.thinking import thinking_engine
|
||||
|
||||
return settings.thinking_enabled and thinking_engine is not None
|
||||
except Exception as exc:
|
||||
logger.debug("Failed to check thinking status: %s", exc)
|
||||
return False
|
||||
|
||||
|
||||
def _get_memory_count() -> int:
|
||||
"""Return total entries in the vector memory store."""
|
||||
try:
|
||||
from timmy.memory_system import get_memory_stats
|
||||
|
||||
stats = get_memory_stats()
|
||||
return stats.get("total_entries", 0)
|
||||
except Exception as exc:
|
||||
logger.debug("Failed to count memories: %s", exc)
|
||||
return 0
|
||||
|
||||
|
||||
def produce_system_status() -> dict:
|
||||
"""Generate a system_status message for the Matrix.
|
||||
|
||||
@@ -328,14 +270,64 @@ def produce_system_status() -> dict:
|
||||
"ts": 1742529600,
|
||||
}
|
||||
"""
|
||||
# Count agents with status != offline
|
||||
agents_online = 0
|
||||
try:
|
||||
from timmy.agents.loader import list_agents
|
||||
|
||||
agents = list_agents()
|
||||
agents_online = sum(1 for a in agents if a.get("status", "") not in ("offline", ""))
|
||||
except Exception as exc:
|
||||
logger.debug("Failed to count agents: %s", exc)
|
||||
|
||||
# Count visitors from WebSocket clients
|
||||
visitors = 0
|
||||
try:
|
||||
from dashboard.routes.world import _ws_clients
|
||||
|
||||
visitors = len(_ws_clients)
|
||||
except Exception as exc:
|
||||
logger.debug("Failed to count visitors: %s", exc)
|
||||
|
||||
# Calculate uptime
|
||||
uptime_seconds = 0
|
||||
try:
|
||||
from datetime import UTC
|
||||
|
||||
from config import APP_START_TIME
|
||||
|
||||
uptime_seconds = int((datetime.now(UTC) - APP_START_TIME).total_seconds())
|
||||
except Exception as exc:
|
||||
logger.debug("Failed to calculate uptime: %s", exc)
|
||||
|
||||
# Check thinking engine status
|
||||
thinking_active = False
|
||||
try:
|
||||
from config import settings
|
||||
from timmy.thinking import thinking_engine
|
||||
|
||||
thinking_active = settings.thinking_enabled and thinking_engine is not None
|
||||
except Exception as exc:
|
||||
logger.debug("Failed to check thinking status: %s", exc)
|
||||
|
||||
# Count memories in vector store
|
||||
memory_count = 0
|
||||
try:
|
||||
from timmy.memory_system import get_memory_stats
|
||||
|
||||
stats = get_memory_stats()
|
||||
memory_count = stats.get("total_entries", 0)
|
||||
except Exception as exc:
|
||||
logger.debug("Failed to count memories: %s", exc)
|
||||
|
||||
return {
|
||||
"type": "system_status",
|
||||
"data": {
|
||||
"agents_online": _get_agents_online(),
|
||||
"visitors": _get_visitors(),
|
||||
"uptime_seconds": _get_uptime_seconds(),
|
||||
"thinking_active": _get_thinking_active(),
|
||||
"memory_count": _get_memory_count(),
|
||||
"agents_online": agents_online,
|
||||
"visitors": visitors,
|
||||
"uptime_seconds": uptime_seconds,
|
||||
"thinking_active": thinking_active,
|
||||
"memory_count": memory_count,
|
||||
},
|
||||
"ts": int(time.time()),
|
||||
}
|
||||
|
||||
@@ -2,16 +2,7 @@
|
||||
|
||||
from .api import router
|
||||
from .cascade import CascadeRouter, Provider, ProviderStatus, get_router
|
||||
from .classifier import TaskComplexity, classify_task
|
||||
from .history import HealthHistoryStore, get_history_store
|
||||
from .metabolic import (
|
||||
DEFAULT_TIER_MODELS,
|
||||
MetabolicRouter,
|
||||
ModelTier,
|
||||
build_prompt,
|
||||
classify_complexity,
|
||||
get_metabolic_router,
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
"CascadeRouter",
|
||||
@@ -21,14 +12,4 @@ __all__ = [
|
||||
"router",
|
||||
"HealthHistoryStore",
|
||||
"get_history_store",
|
||||
# Metabolic router
|
||||
"MetabolicRouter",
|
||||
"ModelTier",
|
||||
"DEFAULT_TIER_MODELS",
|
||||
"classify_complexity",
|
||||
"build_prompt",
|
||||
"get_metabolic_router",
|
||||
# Classifier
|
||||
"TaskComplexity",
|
||||
"classify_task",
|
||||
]
|
||||
|
||||
@@ -16,10 +16,7 @@ from dataclasses import dataclass, field
|
||||
from datetime import UTC, datetime
|
||||
from enum import Enum
|
||||
from pathlib import Path
|
||||
from typing import TYPE_CHECKING, Any
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from infrastructure.router.classifier import TaskComplexity
|
||||
from typing import Any
|
||||
|
||||
from config import settings
|
||||
|
||||
@@ -531,99 +528,6 @@ class CascadeRouter:
|
||||
|
||||
return True
|
||||
|
||||
def _filter_providers(self, cascade_tier: str | None) -> list["Provider"]:
|
||||
"""Return the provider list filtered by tier.
|
||||
|
||||
Raises:
|
||||
RuntimeError: If a tier is specified but no matching providers exist.
|
||||
"""
|
||||
if cascade_tier == "frontier_required":
|
||||
providers = [p for p in self.providers if p.type == "anthropic"]
|
||||
if not providers:
|
||||
raise RuntimeError("No Anthropic provider configured for 'frontier_required' tier.")
|
||||
return providers
|
||||
if cascade_tier:
|
||||
providers = [p for p in self.providers if p.tier == cascade_tier]
|
||||
if not providers:
|
||||
raise RuntimeError(f"No providers found for tier: {cascade_tier}")
|
||||
return providers
|
||||
return self.providers
|
||||
|
||||
async def _try_single_provider(
|
||||
self,
|
||||
provider: "Provider",
|
||||
messages: list[dict],
|
||||
model: str | None,
|
||||
temperature: float,
|
||||
max_tokens: int | None,
|
||||
content_type: ContentType,
|
||||
errors: list[str],
|
||||
) -> dict | None:
|
||||
"""Attempt one provider, returning a result dict on success or None on failure.
|
||||
|
||||
On failure the error string is appended to *errors* and the provider's
|
||||
failure metrics are updated so the caller can move on to the next provider.
|
||||
"""
|
||||
if not self._is_provider_available(provider):
|
||||
return None
|
||||
|
||||
# Metabolic protocol: skip cloud providers when quota is low
|
||||
if provider.type in ("anthropic", "openai", "grok"):
|
||||
if not self._quota_allows_cloud(provider):
|
||||
logger.info(
|
||||
"Metabolic protocol: skipping cloud provider %s (quota too low)",
|
||||
provider.name,
|
||||
)
|
||||
return None
|
||||
|
||||
selected_model, is_fallback_model = self._select_model(provider, model, content_type)
|
||||
|
||||
try:
|
||||
result = await self._attempt_with_retry(
|
||||
provider, messages, selected_model, temperature, max_tokens, content_type
|
||||
)
|
||||
except RuntimeError as exc:
|
||||
errors.append(str(exc))
|
||||
self._record_failure(provider)
|
||||
return None
|
||||
|
||||
self._record_success(provider, result.get("latency_ms", 0))
|
||||
return {
|
||||
"content": result["content"],
|
||||
"provider": provider.name,
|
||||
"model": result.get("model", selected_model or provider.get_default_model()),
|
||||
"latency_ms": result.get("latency_ms", 0),
|
||||
"is_fallback_model": is_fallback_model,
|
||||
}
|
||||
|
||||
def _get_model_for_complexity(
|
||||
self, provider: Provider, complexity: "TaskComplexity"
|
||||
) -> str | None:
|
||||
"""Return the best model on *provider* for the given complexity tier.
|
||||
|
||||
Checks fallback chains first (routine / complex), then falls back to
|
||||
any model with the matching capability tag, then the provider default.
|
||||
"""
|
||||
from infrastructure.router.classifier import TaskComplexity
|
||||
|
||||
chain_key = "routine" if complexity == TaskComplexity.SIMPLE else "complex"
|
||||
|
||||
# Walk the capability fallback chain — first model present on this provider wins
|
||||
for model_name in self.config.fallback_chains.get(chain_key, []):
|
||||
if any(m["name"] == model_name for m in provider.models):
|
||||
return model_name
|
||||
|
||||
# Direct capability lookup — only return if a model explicitly has the tag
|
||||
# (do not use get_model_with_capability here as it falls back to the default)
|
||||
cap_model = next(
|
||||
(m["name"] for m in provider.models if chain_key in m.get("capabilities", [])),
|
||||
None,
|
||||
)
|
||||
if cap_model:
|
||||
return cap_model
|
||||
|
||||
return None # Caller will use provider default
|
||||
|
||||
async def complete(
|
||||
self,
|
||||
messages: list[dict],
|
||||
@@ -631,7 +535,6 @@ class CascadeRouter:
|
||||
temperature: float = 0.7,
|
||||
max_tokens: int | None = None,
|
||||
cascade_tier: str | None = None,
|
||||
complexity_hint: str | None = None,
|
||||
) -> dict:
|
||||
"""Complete a chat conversation with automatic failover.
|
||||
|
||||
@@ -640,50 +543,35 @@ class CascadeRouter:
|
||||
- Falls back to vision-capable models when needed
|
||||
- Supports image URLs, paths, and base64 encoding
|
||||
|
||||
Complexity-based routing (issue #1065):
|
||||
- ``complexity_hint="simple"`` → routes to Qwen3-8B (low-latency)
|
||||
- ``complexity_hint="complex"`` → routes to Qwen3-14B (quality)
|
||||
- ``complexity_hint=None`` (default) → auto-classifies from messages
|
||||
|
||||
Args:
|
||||
messages: List of message dicts with role and content
|
||||
model: Preferred model (tries this first; complexity routing is
|
||||
skipped when an explicit model is given)
|
||||
model: Preferred model (tries this first, then provider defaults)
|
||||
temperature: Sampling temperature
|
||||
max_tokens: Maximum tokens to generate
|
||||
cascade_tier: If specified, filters providers by this tier.
|
||||
- "frontier_required": Uses only Anthropic provider for top-tier models.
|
||||
complexity_hint: "simple", "complex", or None (auto-detect).
|
||||
|
||||
Returns:
|
||||
Dict with content, provider_used, model, latency_ms,
|
||||
is_fallback_model, and complexity fields.
|
||||
Dict with content, provider_used, and metrics
|
||||
|
||||
Raises:
|
||||
RuntimeError: If all providers fail
|
||||
"""
|
||||
from infrastructure.router.classifier import TaskComplexity, classify_task
|
||||
|
||||
content_type = self._detect_content_type(messages)
|
||||
if content_type != ContentType.TEXT:
|
||||
logger.debug("Detected %s content, selecting appropriate model", content_type.value)
|
||||
|
||||
# Resolve task complexity ─────────────────────────────────────────────
|
||||
# Skip complexity routing when caller explicitly specifies a model.
|
||||
complexity: TaskComplexity | None = None
|
||||
if model is None:
|
||||
if complexity_hint is not None:
|
||||
try:
|
||||
complexity = TaskComplexity(complexity_hint.lower())
|
||||
except ValueError:
|
||||
logger.warning("Unknown complexity_hint %r, auto-classifying", complexity_hint)
|
||||
complexity = classify_task(messages)
|
||||
else:
|
||||
complexity = classify_task(messages)
|
||||
logger.debug("Task complexity: %s", complexity.value)
|
||||
errors = []
|
||||
|
||||
errors: list[str] = []
|
||||
providers = self._filter_providers(cascade_tier)
|
||||
providers = self.providers
|
||||
if cascade_tier == "frontier_required":
|
||||
providers = [p for p in self.providers if p.type == "anthropic"]
|
||||
if not providers:
|
||||
raise RuntimeError("No Anthropic provider configured for 'frontier_required' tier.")
|
||||
elif cascade_tier:
|
||||
providers = [p for p in self.providers if p.tier == cascade_tier]
|
||||
if not providers:
|
||||
raise RuntimeError(f"No providers found for tier: {cascade_tier}")
|
||||
|
||||
for provider in providers:
|
||||
if not self._is_provider_available(provider):
|
||||
@@ -698,21 +586,7 @@ class CascadeRouter:
|
||||
)
|
||||
continue
|
||||
|
||||
# Complexity-based model selection (only when no explicit model) ──
|
||||
effective_model = model
|
||||
if effective_model is None and complexity is not None:
|
||||
effective_model = self._get_model_for_complexity(provider, complexity)
|
||||
if effective_model:
|
||||
logger.debug(
|
||||
"Complexity routing [%s]: %s → %s",
|
||||
complexity.value,
|
||||
provider.name,
|
||||
effective_model,
|
||||
)
|
||||
|
||||
selected_model, is_fallback_model = self._select_model(
|
||||
provider, effective_model, content_type
|
||||
)
|
||||
selected_model, is_fallback_model = self._select_model(provider, model, content_type)
|
||||
|
||||
try:
|
||||
result = await self._attempt_with_retry(
|
||||
@@ -735,7 +609,6 @@ class CascadeRouter:
|
||||
"model": result.get("model", selected_model or provider.get_default_model()),
|
||||
"latency_ms": result.get("latency_ms", 0),
|
||||
"is_fallback_model": is_fallback_model,
|
||||
"complexity": complexity.value if complexity is not None else None,
|
||||
}
|
||||
|
||||
raise RuntimeError(f"All providers failed: {'; '.join(errors)}")
|
||||
|
||||
@@ -1,169 +0,0 @@
|
||||
"""Task complexity classifier for Qwen3 dual-model routing.
|
||||
|
||||
Classifies incoming tasks as SIMPLE (route to Qwen3-8B for low-latency)
|
||||
or COMPLEX (route to Qwen3-14B for quality-sensitive work).
|
||||
|
||||
Classification is fully heuristic — no LLM inference required.
|
||||
"""
|
||||
|
||||
import re
|
||||
from enum import Enum
|
||||
|
||||
|
||||
class TaskComplexity(Enum):
|
||||
"""Task complexity tier for model routing."""
|
||||
|
||||
SIMPLE = "simple" # Qwen3-8B Q6_K: routine, latency-sensitive
|
||||
COMPLEX = "complex" # Qwen3-14B Q5_K_M: quality-sensitive, multi-step
|
||||
|
||||
|
||||
# Keywords strongly associated with complex tasks
|
||||
_COMPLEX_KEYWORDS: frozenset[str] = frozenset(
|
||||
[
|
||||
"plan",
|
||||
"review",
|
||||
"analyze",
|
||||
"analyse",
|
||||
"triage",
|
||||
"refactor",
|
||||
"design",
|
||||
"architecture",
|
||||
"implement",
|
||||
"compare",
|
||||
"debug",
|
||||
"explain",
|
||||
"prioritize",
|
||||
"prioritise",
|
||||
"strategy",
|
||||
"optimize",
|
||||
"optimise",
|
||||
"evaluate",
|
||||
"assess",
|
||||
"brainstorm",
|
||||
"outline",
|
||||
"summarize",
|
||||
"summarise",
|
||||
"generate code",
|
||||
"write a",
|
||||
"write the",
|
||||
"code review",
|
||||
"pull request",
|
||||
"multi-step",
|
||||
"multi step",
|
||||
"step by step",
|
||||
"backlog prioriti",
|
||||
"issue triage",
|
||||
"root cause",
|
||||
"how does",
|
||||
"why does",
|
||||
"what are the",
|
||||
]
|
||||
)
|
||||
|
||||
# Keywords strongly associated with simple/routine tasks
|
||||
_SIMPLE_KEYWORDS: frozenset[str] = frozenset(
|
||||
[
|
||||
"status",
|
||||
"list ",
|
||||
"show ",
|
||||
"what is",
|
||||
"how many",
|
||||
"ping",
|
||||
"run ",
|
||||
"execute ",
|
||||
"ls ",
|
||||
"cat ",
|
||||
"ps ",
|
||||
"fetch ",
|
||||
"count ",
|
||||
"tail ",
|
||||
"head ",
|
||||
"grep ",
|
||||
"find file",
|
||||
"read file",
|
||||
"get ",
|
||||
"query ",
|
||||
"check ",
|
||||
"yes",
|
||||
"no",
|
||||
"ok",
|
||||
"done",
|
||||
"thanks",
|
||||
]
|
||||
)
|
||||
|
||||
# Content longer than this is treated as complex regardless of keywords
|
||||
_COMPLEX_CHAR_THRESHOLD = 500
|
||||
|
||||
# Short content defaults to simple
|
||||
_SIMPLE_CHAR_THRESHOLD = 150
|
||||
|
||||
# More than this many messages suggests an ongoing complex conversation
|
||||
_COMPLEX_CONVERSATION_DEPTH = 6
|
||||
|
||||
|
||||
def classify_task(messages: list[dict]) -> TaskComplexity:
|
||||
"""Classify task complexity from a list of messages.
|
||||
|
||||
Uses heuristic rules — no LLM call required. Errs toward COMPLEX
|
||||
when uncertain so that quality is preserved.
|
||||
|
||||
Args:
|
||||
messages: List of message dicts with ``role`` and ``content`` keys.
|
||||
|
||||
Returns:
|
||||
TaskComplexity.SIMPLE or TaskComplexity.COMPLEX
|
||||
"""
|
||||
if not messages:
|
||||
return TaskComplexity.SIMPLE
|
||||
|
||||
# Concatenate all user-turn content for analysis
|
||||
user_content = (
|
||||
" ".join(
|
||||
msg.get("content", "")
|
||||
for msg in messages
|
||||
if msg.get("role") in ("user", "human") and isinstance(msg.get("content"), str)
|
||||
)
|
||||
.lower()
|
||||
.strip()
|
||||
)
|
||||
|
||||
if not user_content:
|
||||
return TaskComplexity.SIMPLE
|
||||
|
||||
# Complexity signals override everything -----------------------------------
|
||||
|
||||
# Explicit complex keywords
|
||||
for kw in _COMPLEX_KEYWORDS:
|
||||
if kw in user_content:
|
||||
return TaskComplexity.COMPLEX
|
||||
|
||||
# Numbered / multi-step instruction list: "1. do this 2. do that"
|
||||
if re.search(r"\b\d+\.\s+\w", user_content):
|
||||
return TaskComplexity.COMPLEX
|
||||
|
||||
# Code blocks embedded in messages
|
||||
if "```" in user_content:
|
||||
return TaskComplexity.COMPLEX
|
||||
|
||||
# Long content → complex reasoning likely required
|
||||
if len(user_content) > _COMPLEX_CHAR_THRESHOLD:
|
||||
return TaskComplexity.COMPLEX
|
||||
|
||||
# Deep conversation → complex ongoing task
|
||||
if len(messages) > _COMPLEX_CONVERSATION_DEPTH:
|
||||
return TaskComplexity.COMPLEX
|
||||
|
||||
# Simplicity signals -------------------------------------------------------
|
||||
|
||||
# Explicit simple keywords
|
||||
for kw in _SIMPLE_KEYWORDS:
|
||||
if kw in user_content:
|
||||
return TaskComplexity.SIMPLE
|
||||
|
||||
# Short single-sentence messages default to simple
|
||||
if len(user_content) <= _SIMPLE_CHAR_THRESHOLD:
|
||||
return TaskComplexity.SIMPLE
|
||||
|
||||
# When uncertain, prefer quality (complex model)
|
||||
return TaskComplexity.COMPLEX
|
||||
@@ -1,424 +0,0 @@
|
||||
"""Three-tier metabolic LLM router.
|
||||
|
||||
Routes queries to the cheapest-sufficient model tier using MLX for all
|
||||
inference on Apple Silicon GPU:
|
||||
|
||||
T1 — Routine (Qwen3-8B Q6_K, ~45-55 tok/s): Simple navigation, basic choices.
|
||||
T2 — Medium (Qwen3-14B Q5_K_M, ~20-28 tok/s): Dialogue, inventory management.
|
||||
T3 — Complex (Qwen3-32B Q4_K_M, ~8-12 tok/s): Quest planning, stuck recovery.
|
||||
|
||||
Memory budget:
|
||||
- T1+T2 always loaded (~8.5 GB combined)
|
||||
- T3 loaded on demand (+20 GB) — game pauses during inference
|
||||
|
||||
Design notes:
|
||||
- 70% of game ticks never reach the LLM (handled upstream by behavior trees)
|
||||
- T3 pauses the game world before inference and unpauses after (graceful if no world)
|
||||
- All inference via vllm-mlx / Ollama — local-first, no cloud for game ticks
|
||||
|
||||
References:
|
||||
- Issue #966 — Three-Tier Metabolic LLM Router
|
||||
- Issue #1063 — Best Local Uncensored Agent Model for M3 Max 36GB
|
||||
- Issue #1075 — Claude Quota Monitor + Metabolic Protocol
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
from enum import StrEnum
|
||||
from typing import Any
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class ModelTier(StrEnum):
|
||||
"""Three metabolic model tiers ordered by cost and capability.
|
||||
|
||||
Tier selection is driven by classify_complexity(). The cheapest
|
||||
sufficient tier is always chosen — T1 handles routine tasks, T2
|
||||
handles dialogue and management, T3 handles planning and recovery.
|
||||
"""
|
||||
|
||||
T1_ROUTINE = "t1_routine" # Fast, cheap — Qwen3-8B, always loaded
|
||||
T2_MEDIUM = "t2_medium" # Balanced — Qwen3-14B, always loaded
|
||||
T3_COMPLEX = "t3_complex" # Deep — Qwen3-32B, loaded on demand, pauses game
|
||||
|
||||
|
||||
# ── Classification vocabulary ────────────────────────────────────────────────
|
||||
|
||||
# T1: single-action navigation and binary-choice words
|
||||
_T1_KEYWORDS = frozenset(
|
||||
{
|
||||
"go",
|
||||
"move",
|
||||
"walk",
|
||||
"run",
|
||||
"north",
|
||||
"south",
|
||||
"east",
|
||||
"west",
|
||||
"up",
|
||||
"down",
|
||||
"left",
|
||||
"right",
|
||||
"yes",
|
||||
"no",
|
||||
"ok",
|
||||
"okay",
|
||||
"open",
|
||||
"close",
|
||||
"take",
|
||||
"drop",
|
||||
"look",
|
||||
"pick",
|
||||
"use",
|
||||
"wait",
|
||||
"rest",
|
||||
"save",
|
||||
"attack",
|
||||
"flee",
|
||||
"jump",
|
||||
"crouch",
|
||||
}
|
||||
)
|
||||
|
||||
# T3: planning, optimisation, or recovery signals
|
||||
_T3_KEYWORDS = frozenset(
|
||||
{
|
||||
"plan",
|
||||
"strategy",
|
||||
"optimize",
|
||||
"optimise",
|
||||
"quest",
|
||||
"stuck",
|
||||
"recover",
|
||||
"multi-step",
|
||||
"long-term",
|
||||
"negotiate",
|
||||
"persuade",
|
||||
"faction",
|
||||
"reputation",
|
||||
"best",
|
||||
"optimal",
|
||||
"recommend",
|
||||
"analyze",
|
||||
"analyse",
|
||||
"evaluate",
|
||||
"decide",
|
||||
"complex",
|
||||
"how do i",
|
||||
"what should i do",
|
||||
"help me figure",
|
||||
"what is the best",
|
||||
}
|
||||
)
|
||||
|
||||
|
||||
def classify_complexity(task: str, state: dict) -> ModelTier:
|
||||
"""Classify a task to the cheapest-sufficient model tier.
|
||||
|
||||
Classification priority (highest wins):
|
||||
1. T3 — any T3 keyword, stuck indicator, or ``state["require_t3"] = True``
|
||||
2. T1 — short task with only T1 keywords and no active context
|
||||
3. T2 — everything else (safe default)
|
||||
|
||||
Args:
|
||||
task: Natural-language task description or player input.
|
||||
state: Current game state dict. Recognised keys:
|
||||
``stuck`` (bool), ``require_t3`` (bool),
|
||||
``active_quests`` (list), ``dialogue_active`` (bool).
|
||||
|
||||
Returns:
|
||||
ModelTier appropriate for the task.
|
||||
"""
|
||||
task_lower = task.lower()
|
||||
words = set(task_lower.split())
|
||||
|
||||
# ── T3 signals ──────────────────────────────────────────────────────────
|
||||
t3_keyword_hit = bool(words & _T3_KEYWORDS)
|
||||
# Check multi-word T3 phrases
|
||||
t3_phrase_hit = any(phrase in task_lower for phrase in _T3_KEYWORDS if " " in phrase)
|
||||
is_stuck = bool(state.get("stuck", False))
|
||||
explicit_t3 = bool(state.get("require_t3", False))
|
||||
|
||||
if t3_keyword_hit or t3_phrase_hit or is_stuck or explicit_t3:
|
||||
logger.debug(
|
||||
"classify_complexity → T3 (keywords=%s stuck=%s explicit=%s)",
|
||||
t3_keyword_hit or t3_phrase_hit,
|
||||
is_stuck,
|
||||
explicit_t3,
|
||||
)
|
||||
return ModelTier.T3_COMPLEX
|
||||
|
||||
# ── T1 signals ──────────────────────────────────────────────────────────
|
||||
t1_keyword_hit = bool(words & _T1_KEYWORDS)
|
||||
task_short = len(task.split()) <= 6
|
||||
no_active_context = (
|
||||
not state.get("active_quests")
|
||||
and not state.get("dialogue_active")
|
||||
and not state.get("combat_active")
|
||||
)
|
||||
|
||||
if t1_keyword_hit and task_short and no_active_context:
|
||||
logger.debug("classify_complexity → T1 (keywords=%s short=%s)", t1_keyword_hit, task_short)
|
||||
return ModelTier.T1_ROUTINE
|
||||
|
||||
# ── Default: T2 ─────────────────────────────────────────────────────────
|
||||
logger.debug("classify_complexity → T2 (default)")
|
||||
return ModelTier.T2_MEDIUM
|
||||
|
||||
|
||||
def build_prompt(
|
||||
state: dict,
|
||||
ui_state: dict,
|
||||
text: str,
|
||||
visual_context: str | None = None,
|
||||
) -> list[dict]:
|
||||
"""Build an OpenAI-compatible messages list from game context.
|
||||
|
||||
Assembles a system message from structured game state and a user
|
||||
message from the player's text input. This format is accepted by
|
||||
CascadeRouter.complete() directly.
|
||||
|
||||
Args:
|
||||
state: Current game state dict. Common keys:
|
||||
``location`` (str), ``health`` (int/float),
|
||||
``inventory`` (list), ``active_quests`` (list),
|
||||
``stuck`` (bool).
|
||||
ui_state: Current UI state dict. Common keys:
|
||||
``dialogue_active`` (bool), ``dialogue_npc`` (str),
|
||||
``menu_open`` (str), ``combat_active`` (bool).
|
||||
text: Player text or task description (becomes user message).
|
||||
visual_context: Optional free-text description of the current screen
|
||||
or scene — from a vision model or rule-based extractor.
|
||||
|
||||
Returns:
|
||||
List of message dicts: [{"role": "system", ...}, {"role": "user", ...}]
|
||||
"""
|
||||
context_lines: list[str] = []
|
||||
|
||||
location = state.get("location", "unknown")
|
||||
context_lines.append(f"Location: {location}")
|
||||
|
||||
health = state.get("health")
|
||||
if health is not None:
|
||||
context_lines.append(f"Health: {health}")
|
||||
|
||||
inventory = state.get("inventory", [])
|
||||
if inventory:
|
||||
items = [i if isinstance(i, str) else i.get("name", str(i)) for i in inventory[:10]]
|
||||
context_lines.append(f"Inventory: {', '.join(items)}")
|
||||
|
||||
active_quests = state.get("active_quests", [])
|
||||
if active_quests:
|
||||
names = [q if isinstance(q, str) else q.get("name", str(q)) for q in active_quests[:5]]
|
||||
context_lines.append(f"Active quests: {', '.join(names)}")
|
||||
|
||||
if state.get("stuck"):
|
||||
context_lines.append("Status: STUCK — need recovery strategy")
|
||||
|
||||
if ui_state.get("dialogue_active"):
|
||||
npc = ui_state.get("dialogue_npc", "NPC")
|
||||
context_lines.append(f"In dialogue with: {npc}")
|
||||
|
||||
if ui_state.get("menu_open"):
|
||||
context_lines.append(f"Menu open: {ui_state['menu_open']}")
|
||||
|
||||
if ui_state.get("combat_active"):
|
||||
context_lines.append("Status: IN COMBAT")
|
||||
|
||||
if visual_context:
|
||||
context_lines.append(f"Scene: {visual_context}")
|
||||
|
||||
system_content = (
|
||||
"You are Timmy, an AI game agent. "
|
||||
"Respond with valid game commands only.\n\n" + "\n".join(context_lines)
|
||||
)
|
||||
|
||||
return [
|
||||
{"role": "system", "content": system_content},
|
||||
{"role": "user", "content": text},
|
||||
]
|
||||
|
||||
|
||||
# ── Default model assignments ────────────────────────────────────────────────
|
||||
# Overridable per deployment via MetabolicRouter(tier_models={...}).
|
||||
# Model benchmarks (M3 Max 36 GB, issue #1063):
|
||||
# Qwen3-8B Q6_K — 0.933 F1 tool calling, ~45-55 tok/s (~6 GB)
|
||||
# Qwen3-14B Q5_K_M — 0.971 F1 tool calling, ~20-28 tok/s (~9.5 GB)
|
||||
# Qwen3-32B Q4_K_M — highest quality, ~8-12 tok/s (~20 GB, on demand)
|
||||
DEFAULT_TIER_MODELS: dict[ModelTier, str] = {
|
||||
ModelTier.T1_ROUTINE: "qwen3:8b",
|
||||
ModelTier.T2_MEDIUM: "qwen3:14b",
|
||||
ModelTier.T3_COMPLEX: "qwen3:30b", # Closest Ollama tag to 32B Q4
|
||||
}
|
||||
|
||||
|
||||
class MetabolicRouter:
|
||||
"""Routes LLM requests to the cheapest-sufficient model tier.
|
||||
|
||||
Wraps CascadeRouter with:
|
||||
- Complexity classification via classify_complexity()
|
||||
- Prompt assembly via build_prompt()
|
||||
- T3 world-pause / world-unpause (graceful if no world adapter)
|
||||
|
||||
Usage::
|
||||
|
||||
router = MetabolicRouter()
|
||||
|
||||
# Simple route call — classification + prompt + inference in one step
|
||||
result = await router.route(
|
||||
task="Go north",
|
||||
state={"location": "Balmora"},
|
||||
ui_state={},
|
||||
)
|
||||
print(result["content"], result["tier"])
|
||||
|
||||
# Pre-classify if you need the tier for telemetry
|
||||
tier = router.classify("Plan the best path to Vivec", game_state)
|
||||
|
||||
# Wire in world adapter for T3 pause/unpause
|
||||
router.set_world(world_adapter)
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
cascade: Any | None = None,
|
||||
tier_models: dict[ModelTier, str] | None = None,
|
||||
) -> None:
|
||||
"""Initialise the metabolic router.
|
||||
|
||||
Args:
|
||||
cascade: CascadeRouter instance to use. If None, the
|
||||
singleton returned by get_router() is used lazily.
|
||||
tier_models: Override default model names per tier.
|
||||
"""
|
||||
self._cascade = cascade
|
||||
self._tier_models: dict[ModelTier, str] = dict(DEFAULT_TIER_MODELS)
|
||||
if tier_models:
|
||||
self._tier_models.update(tier_models)
|
||||
self._world: Any | None = None
|
||||
|
||||
def set_world(self, world: Any) -> None:
|
||||
"""Wire in a world adapter for T3 pause / unpause support.
|
||||
|
||||
The adapter only needs to implement ``act(CommandInput)`` — the full
|
||||
WorldInterface contract is not required. A missing or broken world
|
||||
adapter degrades gracefully (logs a warning, inference continues).
|
||||
|
||||
Args:
|
||||
world: Any object with an ``act(CommandInput)`` method.
|
||||
"""
|
||||
self._world = world
|
||||
|
||||
def _get_cascade(self) -> Any:
|
||||
"""Return the CascadeRouter, creating the singleton if needed."""
|
||||
if self._cascade is None:
|
||||
from infrastructure.router.cascade import get_router
|
||||
|
||||
self._cascade = get_router()
|
||||
return self._cascade
|
||||
|
||||
def classify(self, task: str, state: dict) -> ModelTier:
|
||||
"""Classify task complexity. Delegates to classify_complexity()."""
|
||||
return classify_complexity(task, state)
|
||||
|
||||
async def _pause_world(self) -> None:
|
||||
"""Pause the game world before T3 inference (graceful degradation)."""
|
||||
if self._world is None:
|
||||
return
|
||||
try:
|
||||
from infrastructure.world.types import CommandInput
|
||||
|
||||
await asyncio.to_thread(self._world.act, CommandInput(action="pause"))
|
||||
logger.debug("MetabolicRouter: world paused for T3 inference")
|
||||
except Exception as exc:
|
||||
logger.warning("world.pause() failed — continuing without pause: %s", exc)
|
||||
|
||||
async def _unpause_world(self) -> None:
|
||||
"""Unpause the game world after T3 inference (always called, even on error)."""
|
||||
if self._world is None:
|
||||
return
|
||||
try:
|
||||
from infrastructure.world.types import CommandInput
|
||||
|
||||
await asyncio.to_thread(self._world.act, CommandInput(action="unpause"))
|
||||
logger.debug("MetabolicRouter: world unpaused after T3 inference")
|
||||
except Exception as exc:
|
||||
logger.warning("world.unpause() failed — game may remain paused: %s", exc)
|
||||
|
||||
async def route(
|
||||
self,
|
||||
task: str,
|
||||
state: dict,
|
||||
ui_state: dict | None = None,
|
||||
visual_context: str | None = None,
|
||||
temperature: float = 0.3,
|
||||
max_tokens: int | None = None,
|
||||
) -> dict:
|
||||
"""Route a task to the appropriate model tier and return the LLM response.
|
||||
|
||||
Selects the tier via classify_complexity(), assembles the prompt via
|
||||
build_prompt(), and dispatches to CascadeRouter. For T3, the game
|
||||
world is paused before inference and unpaused after (in a finally block).
|
||||
|
||||
Args:
|
||||
task: Natural-language task description or player input.
|
||||
state: Current game state dict.
|
||||
ui_state: Current UI state dict (optional, defaults to {}).
|
||||
visual_context: Optional screen/scene description from vision model.
|
||||
temperature: Sampling temperature (default 0.3 for game commands).
|
||||
max_tokens: Maximum tokens to generate.
|
||||
|
||||
Returns:
|
||||
Dict with keys: ``content``, ``provider``, ``model``, ``tier``,
|
||||
``latency_ms``, plus any extra keys from CascadeRouter.
|
||||
|
||||
Raises:
|
||||
RuntimeError: If all providers fail (propagated from CascadeRouter).
|
||||
"""
|
||||
ui_state = ui_state or {}
|
||||
tier = self.classify(task, state)
|
||||
model = self._tier_models[tier]
|
||||
messages = build_prompt(state, ui_state, task, visual_context)
|
||||
cascade = self._get_cascade()
|
||||
|
||||
logger.info(
|
||||
"MetabolicRouter: tier=%s model=%s task=%r",
|
||||
tier,
|
||||
model,
|
||||
task[:80],
|
||||
)
|
||||
|
||||
if tier == ModelTier.T3_COMPLEX:
|
||||
await self._pause_world()
|
||||
try:
|
||||
result = await cascade.complete(
|
||||
messages=messages,
|
||||
model=model,
|
||||
temperature=temperature,
|
||||
max_tokens=max_tokens,
|
||||
)
|
||||
finally:
|
||||
await self._unpause_world()
|
||||
else:
|
||||
result = await cascade.complete(
|
||||
messages=messages,
|
||||
model=model,
|
||||
temperature=temperature,
|
||||
max_tokens=max_tokens,
|
||||
)
|
||||
|
||||
result["tier"] = tier
|
||||
return result
|
||||
|
||||
|
||||
# ── Module-level singleton ────────────────────────────────────────────────────
|
||||
_metabolic_router: MetabolicRouter | None = None
|
||||
|
||||
|
||||
def get_metabolic_router() -> MetabolicRouter:
|
||||
"""Get or create the MetabolicRouter singleton."""
|
||||
global _metabolic_router
|
||||
if _metabolic_router is None:
|
||||
_metabolic_router = MetabolicRouter()
|
||||
return _metabolic_router
|
||||
@@ -1,247 +0,0 @@
|
||||
"""Self-correction event logger.
|
||||
|
||||
Records instances where the agent detected its own errors and the steps
|
||||
it took to correct them. Used by the Self-Correction Dashboard to visualise
|
||||
these events and surface recurring failure patterns.
|
||||
|
||||
Usage::
|
||||
|
||||
from infrastructure.self_correction import log_self_correction, get_corrections, get_patterns
|
||||
|
||||
log_self_correction(
|
||||
source="agentic_loop",
|
||||
original_intent="Execute step 3: deploy service",
|
||||
detected_error="ConnectionRefusedError: port 8080 unavailable",
|
||||
correction_strategy="Retry on alternate port 8081",
|
||||
final_outcome="Success on retry",
|
||||
task_id="abc123",
|
||||
)
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import logging
|
||||
import sqlite3
|
||||
import uuid
|
||||
from collections.abc import Generator
|
||||
from contextlib import closing, contextmanager
|
||||
from datetime import UTC, datetime
|
||||
from pathlib import Path
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Database
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_DB_PATH: Path | None = None
|
||||
|
||||
|
||||
def _get_db_path() -> Path:
|
||||
global _DB_PATH
|
||||
if _DB_PATH is None:
|
||||
from config import settings
|
||||
|
||||
_DB_PATH = Path(settings.repo_root) / "data" / "self_correction.db"
|
||||
return _DB_PATH
|
||||
|
||||
|
||||
@contextmanager
|
||||
def _get_db() -> Generator[sqlite3.Connection, None, None]:
|
||||
db_path = _get_db_path()
|
||||
db_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
with closing(sqlite3.connect(str(db_path))) as conn:
|
||||
conn.row_factory = sqlite3.Row
|
||||
conn.execute("""
|
||||
CREATE TABLE IF NOT EXISTS self_correction_events (
|
||||
id TEXT PRIMARY KEY,
|
||||
source TEXT NOT NULL,
|
||||
task_id TEXT DEFAULT '',
|
||||
original_intent TEXT NOT NULL,
|
||||
detected_error TEXT NOT NULL,
|
||||
correction_strategy TEXT NOT NULL,
|
||||
final_outcome TEXT NOT NULL,
|
||||
outcome_status TEXT DEFAULT 'success',
|
||||
error_type TEXT DEFAULT '',
|
||||
created_at TEXT DEFAULT (datetime('now'))
|
||||
)
|
||||
""")
|
||||
conn.execute(
|
||||
"CREATE INDEX IF NOT EXISTS idx_sc_created ON self_correction_events(created_at)"
|
||||
)
|
||||
conn.execute(
|
||||
"CREATE INDEX IF NOT EXISTS idx_sc_error_type ON self_correction_events(error_type)"
|
||||
)
|
||||
conn.commit()
|
||||
yield conn
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Write
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def log_self_correction(
|
||||
*,
|
||||
source: str,
|
||||
original_intent: str,
|
||||
detected_error: str,
|
||||
correction_strategy: str,
|
||||
final_outcome: str,
|
||||
task_id: str = "",
|
||||
outcome_status: str = "success",
|
||||
error_type: str = "",
|
||||
) -> str:
|
||||
"""Record a self-correction event and return its ID.
|
||||
|
||||
Args:
|
||||
source: Module or component that triggered the correction.
|
||||
original_intent: What the agent was trying to do.
|
||||
detected_error: The error or problem that was detected.
|
||||
correction_strategy: How the agent attempted to correct the error.
|
||||
final_outcome: What the result of the correction attempt was.
|
||||
task_id: Optional task/session ID for correlation.
|
||||
outcome_status: 'success', 'partial', or 'failed'.
|
||||
error_type: Short category label for pattern analysis (e.g.
|
||||
'ConnectionError', 'TimeoutError').
|
||||
|
||||
Returns:
|
||||
The ID of the newly created record.
|
||||
"""
|
||||
event_id = str(uuid.uuid4())
|
||||
if not error_type:
|
||||
# Derive a simple type from the first word of the detected error
|
||||
error_type = detected_error.split(":")[0].strip()[:64]
|
||||
|
||||
try:
|
||||
with _get_db() as conn:
|
||||
conn.execute(
|
||||
"""
|
||||
INSERT INTO self_correction_events
|
||||
(id, source, task_id, original_intent, detected_error,
|
||||
correction_strategy, final_outcome, outcome_status, error_type)
|
||||
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
|
||||
""",
|
||||
(
|
||||
event_id,
|
||||
source,
|
||||
task_id,
|
||||
original_intent[:2000],
|
||||
detected_error[:2000],
|
||||
correction_strategy[:2000],
|
||||
final_outcome[:2000],
|
||||
outcome_status,
|
||||
error_type,
|
||||
),
|
||||
)
|
||||
conn.commit()
|
||||
logger.info(
|
||||
"Self-correction logged [%s] source=%s error_type=%s status=%s",
|
||||
event_id[:8],
|
||||
source,
|
||||
error_type,
|
||||
outcome_status,
|
||||
)
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to log self-correction event: %s", exc)
|
||||
|
||||
return event_id
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Read
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def get_corrections(limit: int = 50) -> list[dict]:
|
||||
"""Return the most recent self-correction events, newest first."""
|
||||
try:
|
||||
with _get_db() as conn:
|
||||
rows = conn.execute(
|
||||
"""
|
||||
SELECT * FROM self_correction_events
|
||||
ORDER BY created_at DESC
|
||||
LIMIT ?
|
||||
""",
|
||||
(limit,),
|
||||
).fetchall()
|
||||
return [dict(r) for r in rows]
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to fetch self-correction events: %s", exc)
|
||||
return []
|
||||
|
||||
|
||||
def get_patterns(top_n: int = 10) -> list[dict]:
|
||||
"""Return the most common recurring error types with counts.
|
||||
|
||||
Each entry has:
|
||||
- error_type: category label
|
||||
- count: total occurrences
|
||||
- success_count: corrected successfully
|
||||
- failed_count: correction also failed
|
||||
- last_seen: ISO timestamp of most recent occurrence
|
||||
"""
|
||||
try:
|
||||
with _get_db() as conn:
|
||||
rows = conn.execute(
|
||||
"""
|
||||
SELECT
|
||||
error_type,
|
||||
COUNT(*) AS count,
|
||||
SUM(CASE WHEN outcome_status = 'success' THEN 1 ELSE 0 END) AS success_count,
|
||||
SUM(CASE WHEN outcome_status = 'failed' THEN 1 ELSE 0 END) AS failed_count,
|
||||
MAX(created_at) AS last_seen
|
||||
FROM self_correction_events
|
||||
GROUP BY error_type
|
||||
ORDER BY count DESC
|
||||
LIMIT ?
|
||||
""",
|
||||
(top_n,),
|
||||
).fetchall()
|
||||
return [dict(r) for r in rows]
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to fetch self-correction patterns: %s", exc)
|
||||
return []
|
||||
|
||||
|
||||
def get_stats() -> dict:
|
||||
"""Return aggregate statistics for the summary panel."""
|
||||
try:
|
||||
with _get_db() as conn:
|
||||
row = conn.execute(
|
||||
"""
|
||||
SELECT
|
||||
COUNT(*) AS total,
|
||||
SUM(CASE WHEN outcome_status = 'success' THEN 1 ELSE 0 END) AS success_count,
|
||||
SUM(CASE WHEN outcome_status = 'partial' THEN 1 ELSE 0 END) AS partial_count,
|
||||
SUM(CASE WHEN outcome_status = 'failed' THEN 1 ELSE 0 END) AS failed_count,
|
||||
COUNT(DISTINCT error_type) AS unique_error_types,
|
||||
COUNT(DISTINCT source) AS sources
|
||||
FROM self_correction_events
|
||||
"""
|
||||
).fetchone()
|
||||
if row is None:
|
||||
return _empty_stats()
|
||||
d = dict(row)
|
||||
total = d.get("total") or 0
|
||||
if total:
|
||||
d["success_rate"] = round((d.get("success_count") or 0) / total * 100)
|
||||
else:
|
||||
d["success_rate"] = 0
|
||||
return d
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to fetch self-correction stats: %s", exc)
|
||||
return _empty_stats()
|
||||
|
||||
|
||||
def _empty_stats() -> dict:
|
||||
return {
|
||||
"total": 0,
|
||||
"success_count": 0,
|
||||
"partial_count": 0,
|
||||
"failed_count": 0,
|
||||
"unique_error_types": 0,
|
||||
"sources": 0,
|
||||
"success_rate": 0,
|
||||
}
|
||||
@@ -1 +0,0 @@
|
||||
"""Vendor-specific chat platform adapters (e.g. Discord) for the chat bridge."""
|
||||
|
||||
@@ -24,8 +24,6 @@ logger = logging.getLogger(__name__)
|
||||
|
||||
@dataclass
|
||||
class Intent:
|
||||
"""A classified user intent with confidence score and extracted entities."""
|
||||
|
||||
name: str
|
||||
confidence: float # 0.0 to 1.0
|
||||
entities: dict
|
||||
|
||||
@@ -17,15 +17,11 @@ logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class TxType(StrEnum):
|
||||
"""Lightning transaction direction type."""
|
||||
|
||||
incoming = "incoming"
|
||||
outgoing = "outgoing"
|
||||
|
||||
|
||||
class TxStatus(StrEnum):
|
||||
"""Lightning transaction settlement status."""
|
||||
|
||||
pending = "pending"
|
||||
settled = "settled"
|
||||
failed = "failed"
|
||||
|
||||
@@ -1,7 +0,0 @@
|
||||
"""Self-coding package — Timmy's self-modification capability.
|
||||
|
||||
Provides the branch→edit→test→commit/revert loop that allows Timmy
|
||||
to propose and apply code changes autonomously, gated by the test suite.
|
||||
|
||||
Main entry point: ``self_coding.self_modify.loop``
|
||||
"""
|
||||
@@ -1,129 +0,0 @@
|
||||
"""Gitea REST client — thin wrapper for PR creation and issue commenting.
|
||||
|
||||
Uses ``settings.gitea_url``, ``settings.gitea_token``, and
|
||||
``settings.gitea_repo`` (owner/repo) from config. Degrades gracefully
|
||||
when the token is absent or the server is unreachable.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
from dataclasses import dataclass
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
@dataclass
|
||||
class PullRequest:
|
||||
"""Minimal representation of a created pull request."""
|
||||
|
||||
number: int
|
||||
title: str
|
||||
html_url: str
|
||||
|
||||
|
||||
class GiteaClient:
|
||||
"""HTTP client for Gitea's REST API v1.
|
||||
|
||||
All methods return structured results and never raise — errors are
|
||||
logged at WARNING level and indicated via return value.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
base_url: str | None = None,
|
||||
token: str | None = None,
|
||||
repo: str | None = None,
|
||||
) -> None:
|
||||
from config import settings
|
||||
|
||||
self._base_url = (base_url or settings.gitea_url).rstrip("/")
|
||||
self._token = token or settings.gitea_token
|
||||
self._repo = repo or settings.gitea_repo
|
||||
|
||||
# ── internal ────────────────────────────────────────────────────────────
|
||||
|
||||
def _headers(self) -> dict[str, str]:
|
||||
return {
|
||||
"Authorization": f"token {self._token}",
|
||||
"Content-Type": "application/json",
|
||||
}
|
||||
|
||||
def _api(self, path: str) -> str:
|
||||
return f"{self._base_url}/api/v1/{path.lstrip('/')}"
|
||||
|
||||
# ── public API ───────────────────────────────────────────────────────────
|
||||
|
||||
def create_pull_request(
|
||||
self,
|
||||
title: str,
|
||||
body: str,
|
||||
head: str,
|
||||
base: str = "main",
|
||||
) -> PullRequest | None:
|
||||
"""Open a pull request.
|
||||
|
||||
Args:
|
||||
title: PR title (keep under 70 chars).
|
||||
body: PR body in markdown.
|
||||
head: Source branch (e.g. ``self-modify/issue-983``).
|
||||
base: Target branch (default ``main``).
|
||||
|
||||
Returns:
|
||||
A ``PullRequest`` dataclass on success, ``None`` on failure.
|
||||
"""
|
||||
if not self._token:
|
||||
logger.warning("Gitea token not configured — skipping PR creation")
|
||||
return None
|
||||
|
||||
try:
|
||||
import requests as _requests
|
||||
|
||||
resp = _requests.post(
|
||||
self._api(f"repos/{self._repo}/pulls"),
|
||||
headers=self._headers(),
|
||||
json={"title": title, "body": body, "head": head, "base": base},
|
||||
timeout=15,
|
||||
)
|
||||
resp.raise_for_status()
|
||||
data = resp.json()
|
||||
pr = PullRequest(
|
||||
number=data["number"],
|
||||
title=data["title"],
|
||||
html_url=data["html_url"],
|
||||
)
|
||||
logger.info("PR #%d created: %s", pr.number, pr.html_url)
|
||||
return pr
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to create PR: %s", exc)
|
||||
return None
|
||||
|
||||
def add_issue_comment(self, issue_number: int, body: str) -> bool:
|
||||
"""Post a comment on an issue or PR.
|
||||
|
||||
Returns:
|
||||
True on success, False on failure.
|
||||
"""
|
||||
if not self._token:
|
||||
logger.warning("Gitea token not configured — skipping issue comment")
|
||||
return False
|
||||
|
||||
try:
|
||||
import requests as _requests
|
||||
|
||||
resp = _requests.post(
|
||||
self._api(f"repos/{self._repo}/issues/{issue_number}/comments"),
|
||||
headers=self._headers(),
|
||||
json={"body": body},
|
||||
timeout=15,
|
||||
)
|
||||
resp.raise_for_status()
|
||||
logger.info("Comment posted on issue #%d", issue_number)
|
||||
return True
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to post comment on issue #%d: %s", issue_number, exc)
|
||||
return False
|
||||
|
||||
|
||||
# Module-level singleton
|
||||
gitea_client = GiteaClient()
|
||||
@@ -1 +0,0 @@
|
||||
"""Self-modification loop sub-package."""
|
||||
@@ -1,301 +0,0 @@
|
||||
"""Self-modification loop — branch → edit → test → commit/revert.
|
||||
|
||||
Timmy's self-coding capability, restored after deletion in
|
||||
Operation Darling Purge (commit 584eeb679e88).
|
||||
|
||||
## Cycle
|
||||
1. **Branch** — create ``self-modify/<slug>`` from ``main``
|
||||
2. **Edit** — apply the proposed change (patch string or callable)
|
||||
3. **Test** — run ``pytest tests/ -x -q``; never commit on failure
|
||||
4. **Commit** — stage and commit on green; revert branch on red
|
||||
5. **PR** — open a Gitea pull request (requires no direct push to main)
|
||||
|
||||
## Guards
|
||||
- Never push directly to ``main`` or ``master``
|
||||
- All changes land via PR (enforced by ``_guard_branch``)
|
||||
- Test gate is mandatory; ``skip_tests=True`` is for unit-test use only
|
||||
- Commits only happen when ``pytest tests/ -x -q`` exits 0
|
||||
|
||||
## Usage::
|
||||
|
||||
from self_coding.self_modify.loop import SelfModifyLoop
|
||||
|
||||
loop = SelfModifyLoop()
|
||||
result = await loop.run(
|
||||
slug="add-hello-tool",
|
||||
description="Add hello() convenience tool",
|
||||
edit_fn=my_edit_function, # callable(repo_root: str) -> None
|
||||
)
|
||||
if result.success:
|
||||
print(f"PR: {result.pr_url}")
|
||||
else:
|
||||
print(f"Failed: {result.error}")
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import subprocess
|
||||
import time
|
||||
from collections.abc import Callable
|
||||
from dataclasses import dataclass, field
|
||||
from pathlib import Path
|
||||
|
||||
from config import settings
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Branches that must never receive direct commits
|
||||
_PROTECTED_BRANCHES = frozenset({"main", "master", "develop"})
|
||||
|
||||
# Test command used as the commit gate
|
||||
_TEST_COMMAND = ["pytest", "tests/", "-x", "-q", "--tb=short"]
|
||||
|
||||
# Max time (seconds) to wait for the test suite
|
||||
_TEST_TIMEOUT = 300
|
||||
|
||||
|
||||
@dataclass
|
||||
class LoopResult:
|
||||
"""Result from one self-modification cycle."""
|
||||
|
||||
success: bool
|
||||
branch: str = ""
|
||||
commit_sha: str = ""
|
||||
pr_url: str = ""
|
||||
pr_number: int = 0
|
||||
test_output: str = ""
|
||||
error: str = ""
|
||||
elapsed_ms: float = 0.0
|
||||
metadata: dict = field(default_factory=dict)
|
||||
|
||||
|
||||
class SelfModifyLoop:
|
||||
"""Orchestrate branch → edit → test → commit/revert → PR.
|
||||
|
||||
Args:
|
||||
repo_root: Absolute path to the git repository (defaults to
|
||||
``settings.repo_root``).
|
||||
remote: Git remote name (default ``origin``).
|
||||
base_branch: Branch to fork from and target for the PR
|
||||
(default ``main``).
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
repo_root: str | None = None,
|
||||
remote: str = "origin",
|
||||
base_branch: str = "main",
|
||||
) -> None:
|
||||
self._repo_root = Path(repo_root or settings.repo_root)
|
||||
self._remote = remote
|
||||
self._base_branch = base_branch
|
||||
|
||||
# ── public ──────────────────────────────────────────────────────────────
|
||||
|
||||
async def run(
|
||||
self,
|
||||
slug: str,
|
||||
description: str,
|
||||
edit_fn: Callable[[str], None],
|
||||
issue_number: int | None = None,
|
||||
skip_tests: bool = False,
|
||||
) -> LoopResult:
|
||||
"""Execute one full self-modification cycle.
|
||||
|
||||
Args:
|
||||
slug: Short identifier used for the branch name
|
||||
(e.g. ``"add-hello-tool"``).
|
||||
description: Human-readable description for commit message
|
||||
and PR body.
|
||||
edit_fn: Callable that receives the repo root path (str)
|
||||
and applies the desired code changes in-place.
|
||||
issue_number: Optional Gitea issue number to reference in PR.
|
||||
skip_tests: If ``True``, skip the test gate (unit-test use
|
||||
only — never use in production).
|
||||
|
||||
Returns:
|
||||
:class:`LoopResult` describing the outcome.
|
||||
"""
|
||||
start = time.time()
|
||||
branch = f"self-modify/{slug}"
|
||||
|
||||
try:
|
||||
self._guard_branch(branch)
|
||||
self._checkout_base()
|
||||
self._create_branch(branch)
|
||||
|
||||
try:
|
||||
edit_fn(str(self._repo_root))
|
||||
except Exception as exc:
|
||||
self._revert_branch(branch)
|
||||
return LoopResult(
|
||||
success=False,
|
||||
branch=branch,
|
||||
error=f"edit_fn raised: {exc}",
|
||||
elapsed_ms=self._elapsed(start),
|
||||
)
|
||||
|
||||
if not skip_tests:
|
||||
test_output, passed = self._run_tests()
|
||||
if not passed:
|
||||
self._revert_branch(branch)
|
||||
return LoopResult(
|
||||
success=False,
|
||||
branch=branch,
|
||||
test_output=test_output,
|
||||
error="Tests failed — branch reverted",
|
||||
elapsed_ms=self._elapsed(start),
|
||||
)
|
||||
else:
|
||||
test_output = "(tests skipped)"
|
||||
|
||||
sha = self._commit_all(description)
|
||||
self._push_branch(branch)
|
||||
|
||||
pr = self._create_pr(
|
||||
branch=branch,
|
||||
description=description,
|
||||
test_output=test_output,
|
||||
issue_number=issue_number,
|
||||
)
|
||||
|
||||
return LoopResult(
|
||||
success=True,
|
||||
branch=branch,
|
||||
commit_sha=sha,
|
||||
pr_url=pr.html_url if pr else "",
|
||||
pr_number=pr.number if pr else 0,
|
||||
test_output=test_output,
|
||||
elapsed_ms=self._elapsed(start),
|
||||
)
|
||||
|
||||
except Exception as exc:
|
||||
logger.warning("Self-modify loop failed: %s", exc)
|
||||
return LoopResult(
|
||||
success=False,
|
||||
branch=branch,
|
||||
error=str(exc),
|
||||
elapsed_ms=self._elapsed(start),
|
||||
)
|
||||
|
||||
# ── private helpers ──────────────────────────────────────────────────────
|
||||
|
||||
@staticmethod
|
||||
def _elapsed(start: float) -> float:
|
||||
return (time.time() - start) * 1000
|
||||
|
||||
def _git(self, *args: str, check: bool = True) -> subprocess.CompletedProcess:
|
||||
"""Run a git command in the repo root."""
|
||||
cmd = ["git", *args]
|
||||
logger.debug("git %s", " ".join(args))
|
||||
return subprocess.run(
|
||||
cmd,
|
||||
cwd=str(self._repo_root),
|
||||
capture_output=True,
|
||||
text=True,
|
||||
check=check,
|
||||
)
|
||||
|
||||
def _guard_branch(self, branch: str) -> None:
|
||||
"""Raise if the target branch is a protected branch name."""
|
||||
if branch in _PROTECTED_BRANCHES:
|
||||
raise ValueError(
|
||||
f"Refusing to operate on protected branch '{branch}'. "
|
||||
"All self-modifications must go via PR."
|
||||
)
|
||||
|
||||
def _checkout_base(self) -> None:
|
||||
"""Checkout the base branch and pull latest."""
|
||||
self._git("checkout", self._base_branch)
|
||||
# Best-effort pull; ignore failures (e.g. no remote configured)
|
||||
self._git("pull", self._remote, self._base_branch, check=False)
|
||||
|
||||
def _create_branch(self, branch: str) -> None:
|
||||
"""Create and checkout a new branch, deleting an old one if needed."""
|
||||
# Delete local branch if it already exists (stale prior attempt)
|
||||
self._git("branch", "-D", branch, check=False)
|
||||
self._git("checkout", "-b", branch)
|
||||
logger.info("Created branch: %s", branch)
|
||||
|
||||
def _revert_branch(self, branch: str) -> None:
|
||||
"""Checkout base and delete the failed branch."""
|
||||
try:
|
||||
self._git("checkout", self._base_branch, check=False)
|
||||
self._git("branch", "-D", branch, check=False)
|
||||
logger.info("Reverted and deleted branch: %s", branch)
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to revert branch %s: %s", branch, exc)
|
||||
|
||||
def _run_tests(self) -> tuple[str, bool]:
|
||||
"""Run the test suite. Returns (output, passed)."""
|
||||
logger.info("Running test suite: %s", " ".join(_TEST_COMMAND))
|
||||
try:
|
||||
result = subprocess.run(
|
||||
_TEST_COMMAND,
|
||||
cwd=str(self._repo_root),
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=_TEST_TIMEOUT,
|
||||
)
|
||||
output = (result.stdout + "\n" + result.stderr).strip()
|
||||
passed = result.returncode == 0
|
||||
logger.info(
|
||||
"Test suite %s (exit %d)", "PASSED" if passed else "FAILED", result.returncode
|
||||
)
|
||||
return output, passed
|
||||
except subprocess.TimeoutExpired:
|
||||
msg = f"Test suite timed out after {_TEST_TIMEOUT}s"
|
||||
logger.warning(msg)
|
||||
return msg, False
|
||||
except FileNotFoundError:
|
||||
msg = "pytest not found on PATH"
|
||||
logger.warning(msg)
|
||||
return msg, False
|
||||
|
||||
def _commit_all(self, message: str) -> str:
|
||||
"""Stage all changes and create a commit. Returns the new SHA."""
|
||||
self._git("add", "-A")
|
||||
self._git("commit", "-m", message)
|
||||
result = self._git("rev-parse", "HEAD")
|
||||
sha = result.stdout.strip()
|
||||
logger.info("Committed: %s sha=%s", message[:60], sha[:12])
|
||||
return sha
|
||||
|
||||
def _push_branch(self, branch: str) -> None:
|
||||
"""Push the branch to the remote."""
|
||||
self._git("push", "-u", self._remote, branch)
|
||||
logger.info("Pushed branch: %s -> %s", branch, self._remote)
|
||||
|
||||
def _create_pr(
|
||||
self,
|
||||
branch: str,
|
||||
description: str,
|
||||
test_output: str,
|
||||
issue_number: int | None,
|
||||
):
|
||||
"""Open a Gitea PR. Returns PullRequest or None on failure."""
|
||||
from self_coding.gitea_client import GiteaClient
|
||||
|
||||
client = GiteaClient()
|
||||
|
||||
issue_ref = f"\n\nFixes #{issue_number}" if issue_number else ""
|
||||
test_section = (
|
||||
f"\n\n## Test results\n```\n{test_output[:2000]}\n```"
|
||||
if test_output and test_output != "(tests skipped)"
|
||||
else ""
|
||||
)
|
||||
|
||||
body = (
|
||||
f"## Summary\n{description}"
|
||||
f"{issue_ref}"
|
||||
f"{test_section}"
|
||||
"\n\n🤖 Generated by Timmy's self-modification loop"
|
||||
)
|
||||
|
||||
return client.create_pull_request(
|
||||
title=f"[self-modify] {description[:60]}",
|
||||
body=body,
|
||||
head=branch,
|
||||
base=self._base_branch,
|
||||
)
|
||||
@@ -301,26 +301,6 @@ def create_timmy(
|
||||
|
||||
return GrokBackend()
|
||||
|
||||
if resolved == "airllm":
|
||||
# AirLLM requires Apple Silicon. On any other platform (Intel Mac, Linux,
|
||||
# Windows) or when the package is not installed, degrade silently to Ollama.
|
||||
from timmy.backends import is_apple_silicon
|
||||
|
||||
if not is_apple_silicon():
|
||||
logger.warning(
|
||||
"TIMMY_MODEL_BACKEND=airllm requested but not running on Apple Silicon "
|
||||
"— falling back to Ollama"
|
||||
)
|
||||
else:
|
||||
try:
|
||||
import airllm # noqa: F401
|
||||
except ImportError:
|
||||
logger.warning(
|
||||
"AirLLM not installed — falling back to Ollama. "
|
||||
"Install with: pip install 'airllm[mlx]'"
|
||||
)
|
||||
# Fall through to Ollama in all cases (AirLLM integration is scaffolded)
|
||||
|
||||
# Default: Ollama via Agno.
|
||||
model_name, is_fallback = _resolve_model_with_fallback(
|
||||
requested_model=None,
|
||||
|
||||
@@ -312,13 +312,6 @@ async def _handle_step_failure(
|
||||
"adaptation": step.result[:200],
|
||||
},
|
||||
)
|
||||
_log_self_correction(
|
||||
task_id=task_id,
|
||||
step_desc=step_desc,
|
||||
exc=exc,
|
||||
outcome=step.result,
|
||||
outcome_status="success",
|
||||
)
|
||||
if on_progress:
|
||||
await on_progress(f"[Adapted] {step_desc}", step_num, total_steps)
|
||||
except Exception as adapt_exc: # broad catch intentional
|
||||
@@ -332,42 +325,9 @@ async def _handle_step_failure(
|
||||
duration_ms=int((time.monotonic() - step_start) * 1000),
|
||||
)
|
||||
)
|
||||
_log_self_correction(
|
||||
task_id=task_id,
|
||||
step_desc=step_desc,
|
||||
exc=exc,
|
||||
outcome=f"Adaptation also failed: {adapt_exc}",
|
||||
outcome_status="failed",
|
||||
)
|
||||
completed_results.append(f"Step {step_num}: FAILED")
|
||||
|
||||
|
||||
def _log_self_correction(
|
||||
*,
|
||||
task_id: str,
|
||||
step_desc: str,
|
||||
exc: Exception,
|
||||
outcome: str,
|
||||
outcome_status: str,
|
||||
) -> None:
|
||||
"""Best-effort: log a self-correction event (never raises)."""
|
||||
try:
|
||||
from infrastructure.self_correction import log_self_correction
|
||||
|
||||
log_self_correction(
|
||||
source="agentic_loop",
|
||||
original_intent=step_desc,
|
||||
detected_error=f"{type(exc).__name__}: {exc}",
|
||||
correction_strategy="Adaptive re-plan via LLM",
|
||||
final_outcome=outcome[:500],
|
||||
task_id=task_id,
|
||||
outcome_status=outcome_status,
|
||||
error_type=type(exc).__name__,
|
||||
)
|
||||
except Exception as log_exc:
|
||||
logger.debug("Self-correction log failed: %s", log_exc)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Core loop
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
@@ -36,8 +36,6 @@ _EXPIRY_DAYS = 7
|
||||
|
||||
@dataclass
|
||||
class ApprovalItem:
|
||||
"""A proposed autonomous action requiring owner approval."""
|
||||
|
||||
id: str
|
||||
title: str
|
||||
description: str
|
||||
|
||||
@@ -8,7 +8,7 @@ Flow:
|
||||
1. prepare_experiment — clone repo + run data prep
|
||||
2. run_experiment — execute train.py with wall-clock timeout
|
||||
3. evaluate_result — compare metric against baseline
|
||||
4. SystemExperiment — orchestrate the full cycle via class interface
|
||||
4. experiment_loop — orchestrate the full cycle
|
||||
|
||||
All subprocess calls are guarded with timeouts for graceful degradation.
|
||||
"""
|
||||
@@ -17,12 +17,9 @@ from __future__ import annotations
|
||||
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import platform
|
||||
import re
|
||||
import subprocess
|
||||
import time
|
||||
from collections.abc import Callable
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
@@ -32,61 +29,15 @@ DEFAULT_REPO = "https://github.com/karpathy/autoresearch.git"
|
||||
_METRIC_RE = re.compile(r"val_bpb[:\s]+([0-9]+\.?[0-9]*)")
|
||||
|
||||
|
||||
# ── Higher-is-better metric names ────────────────────────────────────────────
|
||||
_HIGHER_IS_BETTER = frozenset({"unit_pass_rate", "coverage"})
|
||||
|
||||
|
||||
def is_apple_silicon() -> bool:
|
||||
"""Return True when running on Apple Silicon (M-series chip)."""
|
||||
return platform.system() == "Darwin" and platform.machine() == "arm64"
|
||||
|
||||
|
||||
def _build_experiment_env(
|
||||
dataset: str = "tinystories",
|
||||
backend: str = "auto",
|
||||
) -> dict[str, str]:
|
||||
"""Build environment variables for an autoresearch subprocess.
|
||||
|
||||
Args:
|
||||
dataset: Dataset name forwarded as ``AUTORESEARCH_DATASET``.
|
||||
``"tinystories"`` is recommended for Apple Silicon (lower entropy,
|
||||
faster iteration).
|
||||
backend: Inference backend forwarded as ``AUTORESEARCH_BACKEND``.
|
||||
``"auto"`` enables MLX on Apple Silicon; ``"cpu"`` forces CPU.
|
||||
|
||||
Returns:
|
||||
Merged environment dict (inherits current process env).
|
||||
"""
|
||||
env = os.environ.copy()
|
||||
env["AUTORESEARCH_DATASET"] = dataset
|
||||
|
||||
if backend == "auto":
|
||||
env["AUTORESEARCH_BACKEND"] = "mlx" if is_apple_silicon() else "cuda"
|
||||
else:
|
||||
env["AUTORESEARCH_BACKEND"] = backend
|
||||
|
||||
return env
|
||||
|
||||
|
||||
def prepare_experiment(
|
||||
workspace: Path,
|
||||
repo_url: str = DEFAULT_REPO,
|
||||
dataset: str = "tinystories",
|
||||
backend: str = "auto",
|
||||
) -> str:
|
||||
"""Clone autoresearch repo and run data preparation.
|
||||
|
||||
On Apple Silicon the ``dataset`` defaults to ``"tinystories"`` (lower
|
||||
entropy, faster iteration) and ``backend`` to ``"auto"`` which resolves to
|
||||
MLX. Both values are forwarded as ``AUTORESEARCH_DATASET`` /
|
||||
``AUTORESEARCH_BACKEND`` environment variables so that ``prepare.py`` and
|
||||
``train.py`` can adapt their behaviour without CLI changes.
|
||||
|
||||
Args:
|
||||
workspace: Directory to set up the experiment in.
|
||||
repo_url: Git URL for the autoresearch repository.
|
||||
dataset: Dataset name; ``"tinystories"`` is recommended on Mac.
|
||||
backend: Inference backend; ``"auto"`` picks MLX on Apple Silicon.
|
||||
|
||||
Returns:
|
||||
Status message describing what was prepared.
|
||||
@@ -108,14 +59,6 @@ def prepare_experiment(
|
||||
else:
|
||||
logger.info("Autoresearch repo already present at %s", repo_dir)
|
||||
|
||||
env = _build_experiment_env(dataset=dataset, backend=backend)
|
||||
if is_apple_silicon():
|
||||
logger.info(
|
||||
"Apple Silicon detected — dataset=%s backend=%s",
|
||||
env["AUTORESEARCH_DATASET"],
|
||||
env["AUTORESEARCH_BACKEND"],
|
||||
)
|
||||
|
||||
# Run prepare.py (data download + tokeniser training)
|
||||
prepare_script = repo_dir / "prepare.py"
|
||||
if prepare_script.exists():
|
||||
@@ -126,7 +69,6 @@ def prepare_experiment(
|
||||
text=True,
|
||||
cwd=str(repo_dir),
|
||||
timeout=300,
|
||||
env=env,
|
||||
)
|
||||
if result.returncode != 0:
|
||||
return f"Preparation failed: {result.stderr.strip()[:500]}"
|
||||
@@ -139,8 +81,6 @@ def run_experiment(
|
||||
workspace: Path,
|
||||
timeout: int = 300,
|
||||
metric_name: str = "val_bpb",
|
||||
dataset: str = "tinystories",
|
||||
backend: str = "auto",
|
||||
) -> dict[str, Any]:
|
||||
"""Run a single training experiment with a wall-clock timeout.
|
||||
|
||||
@@ -148,9 +88,6 @@ def run_experiment(
|
||||
workspace: Experiment workspace (contains autoresearch/ subdir).
|
||||
timeout: Maximum wall-clock seconds for the run.
|
||||
metric_name: Name of the metric to extract from stdout.
|
||||
dataset: Dataset forwarded to the subprocess via env var.
|
||||
backend: Inference backend forwarded via env var (``"auto"`` → MLX on
|
||||
Apple Silicon, CUDA otherwise).
|
||||
|
||||
Returns:
|
||||
Dict with keys: metric (float|None), log (str), duration_s (int),
|
||||
@@ -168,7 +105,6 @@ def run_experiment(
|
||||
"error": f"train.py not found in {repo_dir}",
|
||||
}
|
||||
|
||||
env = _build_experiment_env(dataset=dataset, backend=backend)
|
||||
start = time.monotonic()
|
||||
try:
|
||||
result = subprocess.run(
|
||||
@@ -177,7 +113,6 @@ def run_experiment(
|
||||
text=True,
|
||||
cwd=str(repo_dir),
|
||||
timeout=timeout,
|
||||
env=env,
|
||||
)
|
||||
duration = int(time.monotonic() - start)
|
||||
output = result.stdout + result.stderr
|
||||
@@ -190,7 +125,7 @@ def run_experiment(
|
||||
"log": output[-2000:], # Keep last 2k chars
|
||||
"duration_s": duration,
|
||||
"success": result.returncode == 0,
|
||||
"error": (None if result.returncode == 0 else f"Exit code {result.returncode}"),
|
||||
"error": None if result.returncode == 0 else f"Exit code {result.returncode}",
|
||||
}
|
||||
except subprocess.TimeoutExpired:
|
||||
duration = int(time.monotonic() - start)
|
||||
@@ -277,369 +212,3 @@ def _append_result(workspace: Path, result: dict[str, Any]) -> None:
|
||||
results_file.parent.mkdir(parents=True, exist_ok=True)
|
||||
with results_file.open("a") as f:
|
||||
f.write(json.dumps(result) + "\n")
|
||||
|
||||
|
||||
def _extract_pass_rate(output: str) -> float | None:
|
||||
"""Extract pytest pass rate as a percentage from tox/pytest output."""
|
||||
passed_m = re.search(r"(\d+) passed", output)
|
||||
failed_m = re.search(r"(\d+) failed", output)
|
||||
if passed_m:
|
||||
passed = int(passed_m.group(1))
|
||||
failed = int(failed_m.group(1)) if failed_m else 0
|
||||
total = passed + failed
|
||||
return (passed / total * 100.0) if total > 0 else 100.0
|
||||
return None
|
||||
|
||||
|
||||
def _extract_coverage(output: str) -> float | None:
|
||||
"""Extract total coverage percentage from coverage output."""
|
||||
coverage_m = re.search(r"(?:TOTAL\s+\d+\s+\d+\s+|Total coverage:\s*)(\d+)%", output)
|
||||
if coverage_m:
|
||||
try:
|
||||
return float(coverage_m.group(1))
|
||||
except ValueError:
|
||||
pass
|
||||
return None
|
||||
|
||||
|
||||
class SystemExperiment:
|
||||
"""An autoresearch experiment targeting a specific module with a configurable metric.
|
||||
|
||||
Encapsulates the hypothesis → edit → tox → evaluate → commit/revert loop
|
||||
for a single target file or module.
|
||||
|
||||
Args:
|
||||
target: Path or module name to optimise (e.g. ``src/timmy/agent.py``).
|
||||
metric: Metric to extract from tox output. Built-in values:
|
||||
``unit_pass_rate`` (default), ``coverage``, ``val_bpb``.
|
||||
Any other value is forwarded to :func:`_extract_metric`.
|
||||
budget_minutes: Wall-clock budget per experiment (default 5 min).
|
||||
workspace: Working directory for subprocess calls. Defaults to ``cwd``.
|
||||
revert_on_failure: Whether to revert changes on failed experiments.
|
||||
hypothesis: Optional natural language hypothesis for the experiment.
|
||||
metric_fn: Optional callable for custom metric extraction.
|
||||
If provided, overrides built-in metric extraction.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
target: str,
|
||||
metric: str = "unit_pass_rate",
|
||||
budget_minutes: int = 5,
|
||||
workspace: Path | None = None,
|
||||
revert_on_failure: bool = True,
|
||||
hypothesis: str = "",
|
||||
metric_fn: Callable[[str], float | None] | None = None,
|
||||
) -> None:
|
||||
self.target = target
|
||||
self.metric = metric
|
||||
self.budget_seconds = budget_minutes * 60
|
||||
self.workspace = Path(workspace) if workspace else Path.cwd()
|
||||
self.revert_on_failure = revert_on_failure
|
||||
self.hypothesis = hypothesis
|
||||
self.metric_fn = metric_fn
|
||||
self.results: list[dict[str, Any]] = []
|
||||
self.baseline: float | None = None
|
||||
|
||||
# ── Hypothesis generation ─────────────────────────────────────────────────
|
||||
|
||||
def generate_hypothesis(self, program_content: str = "") -> str:
|
||||
"""Return a plain-English hypothesis for the next experiment.
|
||||
|
||||
Uses the first non-empty line of *program_content* when available;
|
||||
falls back to a generic description based on target and metric.
|
||||
"""
|
||||
first_line = ""
|
||||
for line in program_content.splitlines():
|
||||
stripped = line.strip()
|
||||
if stripped and not stripped.startswith("#"):
|
||||
first_line = stripped[:120]
|
||||
break
|
||||
if first_line:
|
||||
return f"[{self.target}] {first_line}"
|
||||
return f"Improve {self.metric} for {self.target}"
|
||||
|
||||
# ── Edit phase ────────────────────────────────────────────────────────────
|
||||
|
||||
def apply_edit(self, hypothesis: str, model: str = "qwen3:30b") -> str:
|
||||
"""Apply code edits to *target* via Aider.
|
||||
|
||||
Returns a status string. Degrades gracefully — never raises.
|
||||
"""
|
||||
prompt = f"Edit {self.target}: {hypothesis}"
|
||||
try:
|
||||
result = subprocess.run(
|
||||
["aider", "--no-git", "--model", f"ollama/{model}", "--quiet", prompt],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=self.budget_seconds,
|
||||
cwd=str(self.workspace),
|
||||
)
|
||||
if result.returncode == 0:
|
||||
return result.stdout or "Edit applied."
|
||||
return f"Aider error (exit {result.returncode}): {result.stderr[:500]}"
|
||||
except FileNotFoundError:
|
||||
logger.warning("Aider not installed — edit skipped")
|
||||
return "Aider not available — edit skipped"
|
||||
except subprocess.TimeoutExpired:
|
||||
logger.warning("Aider timed out after %ds", self.budget_seconds)
|
||||
return "Aider timed out"
|
||||
except (OSError, subprocess.SubprocessError) as exc:
|
||||
logger.warning("Aider failed: %s", exc)
|
||||
return f"Edit failed: {exc}"
|
||||
|
||||
# ── Evaluation phase ──────────────────────────────────────────────────────
|
||||
|
||||
def run_tox(self, tox_env: str = "unit") -> dict[str, Any]:
|
||||
"""Run *tox_env* and return a result dict.
|
||||
|
||||
Returns:
|
||||
Dict with keys: ``metric`` (float|None), ``log`` (str),
|
||||
``duration_s`` (int), ``success`` (bool), ``error`` (str|None).
|
||||
"""
|
||||
start = time.monotonic()
|
||||
try:
|
||||
result = subprocess.run(
|
||||
["tox", "-e", tox_env],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=self.budget_seconds,
|
||||
cwd=str(self.workspace),
|
||||
)
|
||||
duration = int(time.monotonic() - start)
|
||||
output = result.stdout + result.stderr
|
||||
metric_val = self._extract_tox_metric(output)
|
||||
return {
|
||||
"metric": metric_val,
|
||||
"log": output[-3000:],
|
||||
"duration_s": duration,
|
||||
"success": result.returncode == 0,
|
||||
"error": (None if result.returncode == 0 else f"Exit code {result.returncode}"),
|
||||
}
|
||||
except subprocess.TimeoutExpired:
|
||||
duration = int(time.monotonic() - start)
|
||||
return {
|
||||
"metric": None,
|
||||
"log": f"Budget exceeded after {self.budget_seconds}s",
|
||||
"duration_s": duration,
|
||||
"success": False,
|
||||
"error": f"Budget exceeded after {self.budget_seconds}s",
|
||||
}
|
||||
except OSError as exc:
|
||||
return {
|
||||
"metric": None,
|
||||
"log": "",
|
||||
"duration_s": 0,
|
||||
"success": False,
|
||||
"error": str(exc),
|
||||
}
|
||||
|
||||
def _extract_tox_metric(self, output: str) -> float | None:
|
||||
"""Dispatch to the correct metric extractor based on *self.metric*."""
|
||||
# Use custom metric function if provided
|
||||
if self.metric_fn is not None:
|
||||
try:
|
||||
return self.metric_fn(output)
|
||||
except Exception as exc:
|
||||
logger.warning("Custom metric_fn failed: %s", exc)
|
||||
return None
|
||||
|
||||
if self.metric == "unit_pass_rate":
|
||||
return _extract_pass_rate(output)
|
||||
if self.metric == "coverage":
|
||||
return _extract_coverage(output)
|
||||
return _extract_metric(output, self.metric)
|
||||
|
||||
def evaluate(self, current: float | None, baseline: float | None) -> str:
|
||||
"""Compare *current* metric against *baseline* and return an assessment."""
|
||||
if current is None:
|
||||
return "Indeterminate: metric not extracted from output"
|
||||
if baseline is None:
|
||||
unit = "%" if self.metric in _HIGHER_IS_BETTER else ""
|
||||
return f"Baseline: {self.metric} = {current:.2f}{unit}"
|
||||
|
||||
if self.metric in _HIGHER_IS_BETTER:
|
||||
delta = current - baseline
|
||||
pct = (delta / baseline * 100) if baseline != 0 else 0.0
|
||||
if delta > 0:
|
||||
return f"Improvement: {self.metric} {baseline:.2f}% → {current:.2f}% ({pct:+.2f}%)"
|
||||
if delta < 0:
|
||||
return f"Regression: {self.metric} {baseline:.2f}% → {current:.2f}% ({pct:+.2f}%)"
|
||||
return f"No change: {self.metric} = {current:.2f}%"
|
||||
|
||||
# lower-is-better (val_bpb, loss, etc.)
|
||||
return evaluate_result(current, baseline, self.metric)
|
||||
|
||||
def is_improvement(self, current: float, baseline: float) -> bool:
|
||||
"""Return True if *current* is better than *baseline* for this metric."""
|
||||
if self.metric in _HIGHER_IS_BETTER:
|
||||
return current > baseline
|
||||
return current < baseline # lower-is-better
|
||||
|
||||
# ── Git phase ─────────────────────────────────────────────────────────────
|
||||
|
||||
def create_branch(self, branch_name: str) -> bool:
|
||||
"""Create and checkout a new git branch. Returns True on success."""
|
||||
try:
|
||||
subprocess.run(
|
||||
["git", "checkout", "-b", branch_name],
|
||||
cwd=str(self.workspace),
|
||||
check=True,
|
||||
timeout=30,
|
||||
)
|
||||
return True
|
||||
except subprocess.CalledProcessError as exc:
|
||||
logger.warning("Git branch creation failed: %s", exc)
|
||||
return False
|
||||
|
||||
def commit_changes(self, message: str) -> bool:
|
||||
"""Stage and commit all changes. Returns True on success."""
|
||||
try:
|
||||
subprocess.run(["git", "add", "-A"], cwd=str(self.workspace), check=True, timeout=30)
|
||||
subprocess.run(
|
||||
["git", "commit", "-m", message],
|
||||
cwd=str(self.workspace),
|
||||
check=True,
|
||||
timeout=30,
|
||||
)
|
||||
return True
|
||||
except subprocess.CalledProcessError as exc:
|
||||
logger.warning("Git commit failed: %s", exc)
|
||||
return False
|
||||
|
||||
def revert_changes(self) -> bool:
|
||||
"""Revert all uncommitted changes. Returns True on success."""
|
||||
try:
|
||||
subprocess.run(
|
||||
["git", "checkout", "--", "."],
|
||||
cwd=str(self.workspace),
|
||||
check=True,
|
||||
timeout=30,
|
||||
)
|
||||
return True
|
||||
except subprocess.CalledProcessError as exc:
|
||||
logger.warning("Git revert failed: %s", exc)
|
||||
return False
|
||||
|
||||
# ── Full experiment loop ──────────────────────────────────────────────────
|
||||
|
||||
def run(
|
||||
self,
|
||||
tox_env: str = "unit",
|
||||
model: str = "qwen3:30b",
|
||||
program_content: str = "",
|
||||
max_iterations: int = 1,
|
||||
dry_run: bool = False,
|
||||
create_branch: bool = False,
|
||||
) -> dict[str, Any]:
|
||||
"""Run the full experiment loop: hypothesis → edit → tox → evaluate → commit/revert.
|
||||
|
||||
This method encapsulates the complete experiment cycle, running multiple
|
||||
iterations until an improvement is found or max_iterations is reached.
|
||||
|
||||
Args:
|
||||
tox_env: Tox environment to run (default "unit").
|
||||
model: Ollama model for Aider edits (default "qwen3:30b").
|
||||
program_content: Research direction for hypothesis generation.
|
||||
max_iterations: Maximum number of experiment iterations.
|
||||
dry_run: If True, only generate hypotheses without making changes.
|
||||
create_branch: If True, create a new git branch for the experiment.
|
||||
|
||||
Returns:
|
||||
Dict with keys: ``success`` (bool), ``final_metric`` (float|None),
|
||||
``baseline`` (float|None), ``iterations`` (int), ``results`` (list).
|
||||
"""
|
||||
if create_branch:
|
||||
branch_name = f"autoresearch/{self.target.replace('/', '-')}-{int(time.time())}"
|
||||
self.create_branch(branch_name)
|
||||
|
||||
baseline: float | None = self.baseline
|
||||
final_metric: float | None = None
|
||||
success = False
|
||||
|
||||
for iteration in range(1, max_iterations + 1):
|
||||
logger.info("Experiment iteration %d/%d", iteration, max_iterations)
|
||||
|
||||
# Generate hypothesis
|
||||
hypothesis = self.hypothesis or self.generate_hypothesis(program_content)
|
||||
logger.info("Hypothesis: %s", hypothesis)
|
||||
|
||||
# In dry-run mode, just record the hypothesis and continue
|
||||
if dry_run:
|
||||
result_record = {
|
||||
"iteration": iteration,
|
||||
"hypothesis": hypothesis,
|
||||
"metric": None,
|
||||
"baseline": baseline,
|
||||
"assessment": "Dry-run: no changes made",
|
||||
"success": True,
|
||||
"duration_s": 0,
|
||||
}
|
||||
self.results.append(result_record)
|
||||
continue
|
||||
|
||||
# Apply edit
|
||||
edit_result = self.apply_edit(hypothesis, model=model)
|
||||
edit_failed = "not available" in edit_result or edit_result.startswith("Aider error")
|
||||
if edit_failed:
|
||||
logger.warning("Edit phase failed: %s", edit_result)
|
||||
|
||||
# Run evaluation
|
||||
tox_result = self.run_tox(tox_env=tox_env)
|
||||
metric = tox_result["metric"]
|
||||
|
||||
# Evaluate result
|
||||
assessment = self.evaluate(metric, baseline)
|
||||
logger.info("Assessment: %s", assessment)
|
||||
|
||||
# Store result
|
||||
result_record = {
|
||||
"iteration": iteration,
|
||||
"hypothesis": hypothesis,
|
||||
"metric": metric,
|
||||
"baseline": baseline,
|
||||
"assessment": assessment,
|
||||
"success": tox_result["success"],
|
||||
"duration_s": tox_result["duration_s"],
|
||||
}
|
||||
self.results.append(result_record)
|
||||
|
||||
# Set baseline on first successful run
|
||||
if metric is not None and baseline is None:
|
||||
baseline = metric
|
||||
self.baseline = baseline
|
||||
final_metric = metric
|
||||
continue
|
||||
|
||||
# Determine if we should commit or revert
|
||||
should_commit = False
|
||||
if tox_result["success"] and metric is not None and baseline is not None:
|
||||
if self.is_improvement(metric, baseline):
|
||||
should_commit = True
|
||||
final_metric = metric
|
||||
baseline = metric
|
||||
self.baseline = baseline
|
||||
success = True
|
||||
|
||||
if should_commit:
|
||||
commit_msg = f"autoresearch: improve {self.metric} on {self.target}\n\n{hypothesis}"
|
||||
if self.commit_changes(commit_msg):
|
||||
logger.info("Changes committed")
|
||||
else:
|
||||
self.revert_changes()
|
||||
logger.warning("Commit failed, changes reverted")
|
||||
elif self.revert_on_failure:
|
||||
self.revert_changes()
|
||||
logger.info("Changes reverted (no improvement)")
|
||||
|
||||
# Early exit if we found an improvement
|
||||
if success:
|
||||
break
|
||||
|
||||
return {
|
||||
"success": success,
|
||||
"final_metric": final_metric,
|
||||
"baseline": self.baseline,
|
||||
"iterations": len(self.results),
|
||||
"results": self.results,
|
||||
}
|
||||
|
||||
@@ -46,8 +46,6 @@ class ApprovalItem:
|
||||
|
||||
@dataclass
|
||||
class Briefing:
|
||||
"""A generated morning briefing summarizing recent activity and pending approvals."""
|
||||
|
||||
generated_at: datetime
|
||||
summary: str # 150-300 words
|
||||
approval_items: list[ApprovalItem] = field(default_factory=list)
|
||||
|
||||
170
src/timmy/cli.py
170
src/timmy/cli.py
@@ -1,4 +1,3 @@
|
||||
"""Typer CLI entry point for the ``timmy`` command (chat, think, status)."""
|
||||
import asyncio
|
||||
import logging
|
||||
import subprocess
|
||||
@@ -348,10 +347,7 @@ def interview(
|
||||
# Force agent creation by calling chat once with a warm-up prompt
|
||||
try:
|
||||
loop.run_until_complete(
|
||||
chat(
|
||||
"Hello, Timmy. We're about to start your interview.",
|
||||
session_id="interview",
|
||||
)
|
||||
chat("Hello, Timmy. We're about to start your interview.", session_id="interview")
|
||||
)
|
||||
except Exception as exc:
|
||||
typer.echo(f"Warning: Initialization issue — {exc}", err=True)
|
||||
@@ -414,17 +410,11 @@ def down():
|
||||
@app.command()
|
||||
def voice(
|
||||
whisper_model: str = typer.Option(
|
||||
"base.en",
|
||||
"--whisper",
|
||||
"-w",
|
||||
help="Whisper model: tiny.en, base.en, small.en, medium.en",
|
||||
"base.en", "--whisper", "-w", help="Whisper model: tiny.en, base.en, small.en, medium.en"
|
||||
),
|
||||
use_say: bool = typer.Option(False, "--say", help="Use macOS `say` instead of Piper TTS"),
|
||||
threshold: float = typer.Option(
|
||||
0.015,
|
||||
"--threshold",
|
||||
"-t",
|
||||
help="Mic silence threshold (RMS). Lower = more sensitive.",
|
||||
0.015, "--threshold", "-t", help="Mic silence threshold (RMS). Lower = more sensitive."
|
||||
),
|
||||
silence: float = typer.Option(1.5, "--silence", help="Seconds of silence to end recording"),
|
||||
backend: str | None = _BACKEND_OPTION,
|
||||
@@ -467,8 +457,7 @@ def route(
|
||||
@app.command()
|
||||
def focus(
|
||||
topic: str | None = typer.Argument(
|
||||
None,
|
||||
help='Topic to focus on (e.g. "three-phase loop"). Omit to show current focus.',
|
||||
None, help='Topic to focus on (e.g. "three-phase loop"). Omit to show current focus.'
|
||||
),
|
||||
clear: bool = typer.Option(False, "--clear", "-c", help="Clear focus and return to broad mode"),
|
||||
):
|
||||
@@ -538,156 +527,5 @@ def healthcheck(
|
||||
raise typer.Exit(result.returncode)
|
||||
|
||||
|
||||
@app.command()
|
||||
def learn(
|
||||
target: str | None = typer.Option(
|
||||
None,
|
||||
"--target",
|
||||
"-t",
|
||||
help="Module or file to optimise (e.g. 'src/timmy/agent.py')",
|
||||
),
|
||||
metric: str = typer.Option(
|
||||
"unit_pass_rate",
|
||||
"--metric",
|
||||
"-m",
|
||||
help="Metric to track: unit_pass_rate | coverage | val_bpb | <custom>",
|
||||
),
|
||||
budget: int = typer.Option(
|
||||
5,
|
||||
"--budget",
|
||||
help="Time limit per experiment in minutes",
|
||||
),
|
||||
max_experiments: int = typer.Option(
|
||||
10,
|
||||
"--max-experiments",
|
||||
help="Cap on total experiments per run",
|
||||
),
|
||||
dry_run: bool = typer.Option(
|
||||
False,
|
||||
"--dry-run",
|
||||
help="Show hypothesis without executing experiments",
|
||||
),
|
||||
program_file: str | None = typer.Option(
|
||||
None,
|
||||
"--program",
|
||||
"-p",
|
||||
help="Path to research direction file (default: program.md in cwd)",
|
||||
),
|
||||
tox_env: str = typer.Option(
|
||||
"unit",
|
||||
"--tox-env",
|
||||
help="Tox environment to run for each evaluation",
|
||||
),
|
||||
model: str = typer.Option(
|
||||
"qwen3:30b",
|
||||
"--model",
|
||||
help="Ollama model forwarded to Aider for code edits",
|
||||
),
|
||||
):
|
||||
"""Start an autonomous improvement loop (autoresearch).
|
||||
|
||||
Reads program.md for research direction, then iterates:
|
||||
hypothesis → edit → tox → evaluate → commit/revert.
|
||||
|
||||
Experiments continue until --max-experiments is reached or the loop is
|
||||
interrupted with Ctrl+C. Use --dry-run to preview hypotheses without
|
||||
making any changes.
|
||||
|
||||
Example:
|
||||
timmy learn --target src/timmy/agent.py --metric unit_pass_rate
|
||||
"""
|
||||
from pathlib import Path
|
||||
|
||||
from timmy.autoresearch import SystemExperiment
|
||||
|
||||
repo_root = Path.cwd()
|
||||
program_path = Path(program_file) if program_file else repo_root / "program.md"
|
||||
|
||||
if program_path.exists():
|
||||
program_content = program_path.read_text()
|
||||
typer.echo(f"Research direction: {program_path}")
|
||||
else:
|
||||
program_content = ""
|
||||
typer.echo(
|
||||
f"Note: {program_path} not found — proceeding without research direction.",
|
||||
err=True,
|
||||
)
|
||||
|
||||
if target is None:
|
||||
typer.echo(
|
||||
"Error: --target is required. Specify the module or file to optimise.",
|
||||
err=True,
|
||||
)
|
||||
raise typer.Exit(1)
|
||||
|
||||
experiment = SystemExperiment(
|
||||
target=target,
|
||||
metric=metric,
|
||||
budget_minutes=budget,
|
||||
)
|
||||
|
||||
typer.echo()
|
||||
typer.echo(typer.style("Autoresearch", bold=True) + f" — {target}")
|
||||
typer.echo(f" metric={metric} budget={budget}min max={max_experiments} tox={tox_env}")
|
||||
if dry_run:
|
||||
typer.echo(" (dry-run — no changes will be made)")
|
||||
typer.echo()
|
||||
|
||||
def _progress_callback(iteration: int, max_iter: int, message: str) -> None:
|
||||
"""Print progress updates during experiment iterations."""
|
||||
if iteration > 0:
|
||||
prefix = typer.style(f"[{iteration}/{max_iter}]", bold=True)
|
||||
typer.echo(f"{prefix} {message}")
|
||||
|
||||
try:
|
||||
# Run the full experiment loop via the SystemExperiment class
|
||||
result = experiment.run(
|
||||
tox_env=tox_env,
|
||||
model=model,
|
||||
program_content=program_content,
|
||||
max_iterations=max_experiments,
|
||||
dry_run=dry_run,
|
||||
create_branch=False, # CLI mode: work on current branch
|
||||
)
|
||||
|
||||
# Display results for each iteration
|
||||
for i, record in enumerate(experiment.results, 1):
|
||||
_progress_callback(i, max_experiments, record["hypothesis"])
|
||||
|
||||
if dry_run:
|
||||
continue
|
||||
|
||||
# Edit phase result
|
||||
typer.echo(" → editing …", nl=False)
|
||||
if record.get("edit_failed"):
|
||||
typer.echo(f" skipped ({record.get('edit_result', 'unknown')})")
|
||||
else:
|
||||
typer.echo(" done")
|
||||
|
||||
# Evaluate phase result
|
||||
duration = record.get("duration_s", 0)
|
||||
typer.echo(f" → running tox … {duration}s")
|
||||
|
||||
# Assessment
|
||||
assessment = record.get("assessment", "No assessment")
|
||||
typer.echo(f" → {assessment}")
|
||||
|
||||
# Outcome
|
||||
if record.get("committed"):
|
||||
typer.echo(" → committed")
|
||||
elif record.get("reverted"):
|
||||
typer.echo(" → reverted (no improvement)")
|
||||
|
||||
typer.echo()
|
||||
|
||||
except KeyboardInterrupt:
|
||||
typer.echo("\nInterrupted.")
|
||||
raise typer.Exit(0) from None
|
||||
|
||||
typer.echo(typer.style("Autoresearch complete.", bold=True))
|
||||
if result.get("baseline") is not None:
|
||||
typer.echo(f"Final {metric}: {result['baseline']:.4f}")
|
||||
|
||||
|
||||
def main():
|
||||
app()
|
||||
|
||||
@@ -423,49 +423,6 @@ async def _poll_issue_completion(
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _format_assignment_comment(
|
||||
display_name: str,
|
||||
task_type: TaskType,
|
||||
description: str,
|
||||
acceptance_criteria: list[str],
|
||||
) -> str:
|
||||
"""Build the markdown comment body for a task assignment.
|
||||
|
||||
Args:
|
||||
display_name: Human-readable agent name.
|
||||
task_type: The inferred task type.
|
||||
description: Task description.
|
||||
acceptance_criteria: List of acceptance criteria strings.
|
||||
|
||||
Returns:
|
||||
Formatted markdown string for the comment.
|
||||
"""
|
||||
criteria_md = (
|
||||
"\n".join(f"- {c}" for c in acceptance_criteria)
|
||||
if acceptance_criteria
|
||||
else "_None specified_"
|
||||
)
|
||||
return (
|
||||
f"## Assigned to {display_name}\n\n"
|
||||
f"**Task type:** `{task_type.value}`\n\n"
|
||||
f"**Description:**\n{description}\n\n"
|
||||
f"**Acceptance criteria:**\n{criteria_md}\n\n"
|
||||
f"---\n*Dispatched by Timmy agent dispatcher.*"
|
||||
)
|
||||
|
||||
|
||||
def _select_label(agent: AgentType) -> str | None:
|
||||
"""Return the Gitea label for an agent based on its spec.
|
||||
|
||||
Args:
|
||||
agent: The target agent.
|
||||
|
||||
Returns:
|
||||
Label name or None if the agent has no label.
|
||||
"""
|
||||
return AGENT_REGISTRY[agent].gitea_label
|
||||
|
||||
|
||||
async def _dispatch_via_gitea(
|
||||
agent: AgentType,
|
||||
issue_number: int,
|
||||
@@ -520,27 +477,37 @@ async def _dispatch_via_gitea(
|
||||
|
||||
async with httpx.AsyncClient(timeout=15) as client:
|
||||
# 1. Apply agent label (if applicable)
|
||||
label = _select_label(agent)
|
||||
if label:
|
||||
ok = await _apply_gitea_label(client, base_url, repo, headers, issue_number, label)
|
||||
if spec.gitea_label:
|
||||
ok = await _apply_gitea_label(
|
||||
client, base_url, repo, headers, issue_number, spec.gitea_label
|
||||
)
|
||||
if ok:
|
||||
label_applied = label
|
||||
label_applied = spec.gitea_label
|
||||
logger.info(
|
||||
"Applied label %r to issue #%s for %s",
|
||||
label,
|
||||
spec.gitea_label,
|
||||
issue_number,
|
||||
spec.display_name,
|
||||
)
|
||||
else:
|
||||
logger.warning(
|
||||
"Could not apply label %r to issue #%s",
|
||||
label,
|
||||
spec.gitea_label,
|
||||
issue_number,
|
||||
)
|
||||
|
||||
# 2. Post assignment comment
|
||||
comment_body = _format_assignment_comment(
|
||||
spec.display_name, task_type, description, acceptance_criteria
|
||||
criteria_md = (
|
||||
"\n".join(f"- {c}" for c in acceptance_criteria)
|
||||
if acceptance_criteria
|
||||
else "_None specified_"
|
||||
)
|
||||
comment_body = (
|
||||
f"## Assigned to {spec.display_name}\n\n"
|
||||
f"**Task type:** `{task_type.value}`\n\n"
|
||||
f"**Description:**\n{description}\n\n"
|
||||
f"**Acceptance criteria:**\n{criteria_md}\n\n"
|
||||
f"---\n*Dispatched by Timmy agent dispatcher.*"
|
||||
)
|
||||
comment_id = await _post_gitea_comment(
|
||||
client, base_url, repo, headers, issue_number, comment_body
|
||||
@@ -686,80 +653,6 @@ async def _dispatch_local(
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _validate_task(
|
||||
title: str,
|
||||
task_type: TaskType | None,
|
||||
agent: AgentType | None,
|
||||
issue_number: int | None,
|
||||
) -> DispatchResult | None:
|
||||
"""Validate task preconditions.
|
||||
|
||||
Args:
|
||||
title: Task title to validate.
|
||||
task_type: Optional task type for result construction.
|
||||
agent: Optional agent for result construction.
|
||||
issue_number: Optional issue number for result construction.
|
||||
|
||||
Returns:
|
||||
A failed DispatchResult if validation fails, None otherwise.
|
||||
"""
|
||||
if not title.strip():
|
||||
return DispatchResult(
|
||||
task_type=task_type or TaskType.ROUTINE_CODING,
|
||||
agent=agent or AgentType.TIMMY,
|
||||
issue_number=issue_number,
|
||||
status=DispatchStatus.FAILED,
|
||||
error="`title` is required.",
|
||||
)
|
||||
return None
|
||||
|
||||
|
||||
def _select_dispatch_strategy(agent: AgentType, issue_number: int | None) -> str:
|
||||
"""Select the dispatch strategy based on agent interface and context.
|
||||
|
||||
Args:
|
||||
agent: The target agent.
|
||||
issue_number: Optional Gitea issue number.
|
||||
|
||||
Returns:
|
||||
Strategy name: "gitea", "api", or "local".
|
||||
"""
|
||||
spec = AGENT_REGISTRY[agent]
|
||||
if spec.interface == "gitea" and issue_number is not None:
|
||||
return "gitea"
|
||||
if spec.interface == "api":
|
||||
return "api"
|
||||
return "local"
|
||||
|
||||
|
||||
def _log_dispatch_result(
|
||||
title: str,
|
||||
result: DispatchResult,
|
||||
attempt: int,
|
||||
max_retries: int,
|
||||
) -> None:
|
||||
"""Log the outcome of a dispatch attempt.
|
||||
|
||||
Args:
|
||||
title: Task title for logging context.
|
||||
result: The dispatch result.
|
||||
attempt: Current attempt number (0-indexed).
|
||||
max_retries: Maximum retry attempts allowed.
|
||||
"""
|
||||
if result.success:
|
||||
return
|
||||
|
||||
if attempt > 0:
|
||||
logger.info("Retry %d/%d for task %r", attempt, max_retries, title[:60])
|
||||
|
||||
logger.warning(
|
||||
"Dispatch attempt %d failed for task %r: %s",
|
||||
attempt + 1,
|
||||
title[:60],
|
||||
result.error,
|
||||
)
|
||||
|
||||
|
||||
async def dispatch_task(
|
||||
title: str,
|
||||
description: str = "",
|
||||
@@ -800,13 +693,17 @@ async def dispatch_task(
|
||||
if result.success:
|
||||
print(f"Assigned to {result.agent.value}")
|
||||
"""
|
||||
# 1. Validate
|
||||
validation_error = _validate_task(title, task_type, agent, issue_number)
|
||||
if validation_error:
|
||||
return validation_error
|
||||
|
||||
# 2. Resolve task type and agent
|
||||
criteria = acceptance_criteria or []
|
||||
|
||||
if not title.strip():
|
||||
return DispatchResult(
|
||||
task_type=task_type or TaskType.ROUTINE_CODING,
|
||||
agent=agent or AgentType.TIMMY,
|
||||
issue_number=issue_number,
|
||||
status=DispatchStatus.FAILED,
|
||||
error="`title` is required.",
|
||||
)
|
||||
|
||||
resolved_type = task_type or infer_task_type(title, description)
|
||||
resolved_agent = agent or select_agent(resolved_type)
|
||||
|
||||
@@ -818,16 +715,18 @@ async def dispatch_task(
|
||||
issue_number,
|
||||
)
|
||||
|
||||
# 3. Select strategy and dispatch with retries
|
||||
strategy = _select_dispatch_strategy(resolved_agent, issue_number)
|
||||
last_result: DispatchResult | None = None
|
||||
spec = AGENT_REGISTRY[resolved_agent]
|
||||
|
||||
last_result: DispatchResult | None = None
|
||||
for attempt in range(max_retries + 1):
|
||||
if strategy == "gitea":
|
||||
if attempt > 0:
|
||||
logger.info("Retry %d/%d for task %r", attempt, max_retries, title[:60])
|
||||
|
||||
if spec.interface == "gitea" and issue_number is not None:
|
||||
result = await _dispatch_via_gitea(
|
||||
resolved_agent, issue_number, title, description, criteria
|
||||
)
|
||||
elif strategy == "api":
|
||||
elif spec.interface == "api":
|
||||
result = await _dispatch_via_api(
|
||||
resolved_agent, title, description, criteria, issue_number, api_endpoint
|
||||
)
|
||||
@@ -840,9 +739,14 @@ async def dispatch_task(
|
||||
if result.success:
|
||||
return result
|
||||
|
||||
_log_dispatch_result(title, result, attempt, max_retries)
|
||||
logger.warning(
|
||||
"Dispatch attempt %d failed for task %r: %s",
|
||||
attempt + 1,
|
||||
title[:60],
|
||||
result.error,
|
||||
)
|
||||
|
||||
# 4. All attempts exhausted — escalate
|
||||
# All attempts exhausted — escalate
|
||||
assert last_result is not None
|
||||
last_result.status = DispatchStatus.ESCALATED
|
||||
logger.error(
|
||||
|
||||
@@ -1,10 +1,7 @@
|
||||
"""Memory — Persistent conversation and knowledge memory.
|
||||
|
||||
Sub-modules:
|
||||
embeddings — text-to-vector embedding + similarity functions
|
||||
unified — unified memory schema and connection management
|
||||
chain — CRUD operations (store, search, delete, stats)
|
||||
semantic — SemanticMemory and MemorySearcher classes
|
||||
consolidation — HotMemory and VaultMemory classes
|
||||
vector_store — backward compatibility re-exports from memory_system
|
||||
embeddings — text-to-vector embedding + similarity functions
|
||||
unified — unified memory schema and connection management
|
||||
vector_store — backward compatibility re-exports from memory_system
|
||||
"""
|
||||
|
||||
@@ -1,387 +0,0 @@
|
||||
"""CRUD operations for Timmy's unified memory database.
|
||||
|
||||
Provides store, search, delete, and management functions for the
|
||||
`memories` table defined in timmy.memory.unified.
|
||||
"""
|
||||
|
||||
import json
|
||||
import logging
|
||||
import sqlite3
|
||||
import uuid
|
||||
from contextlib import contextmanager
|
||||
from datetime import UTC, datetime, timedelta
|
||||
from pathlib import Path
|
||||
|
||||
from config import settings
|
||||
from timmy.memory.embeddings import (
|
||||
_keyword_overlap,
|
||||
cosine_similarity,
|
||||
embed_text,
|
||||
)
|
||||
from timmy.memory.unified import (
|
||||
DB_PATH,
|
||||
MemoryEntry,
|
||||
_ensure_schema,
|
||||
get_connection,
|
||||
)
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def store_memory(
|
||||
content: str,
|
||||
source: str,
|
||||
context_type: str = "conversation",
|
||||
agent_id: str | None = None,
|
||||
task_id: str | None = None,
|
||||
session_id: str | None = None,
|
||||
metadata: dict | None = None,
|
||||
compute_embedding: bool = True,
|
||||
) -> MemoryEntry:
|
||||
"""Store a memory entry with optional embedding.
|
||||
|
||||
Args:
|
||||
content: The text content to store
|
||||
source: Source of the memory (agent name, user, system)
|
||||
context_type: Type of context (conversation, document, fact, vault_chunk)
|
||||
agent_id: Associated agent ID
|
||||
task_id: Associated task ID
|
||||
session_id: Session identifier
|
||||
metadata: Additional structured data
|
||||
compute_embedding: Whether to compute vector embedding
|
||||
|
||||
Returns:
|
||||
The stored MemoryEntry
|
||||
"""
|
||||
embedding = None
|
||||
if compute_embedding:
|
||||
embedding = embed_text(content)
|
||||
|
||||
entry = MemoryEntry(
|
||||
content=content,
|
||||
source=source,
|
||||
context_type=context_type,
|
||||
agent_id=agent_id,
|
||||
task_id=task_id,
|
||||
session_id=session_id,
|
||||
metadata=metadata,
|
||||
embedding=embedding,
|
||||
)
|
||||
|
||||
with get_connection() as conn:
|
||||
conn.execute(
|
||||
"""
|
||||
INSERT INTO memories
|
||||
(id, content, memory_type, source, agent_id, task_id, session_id,
|
||||
metadata, embedding, created_at)
|
||||
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
|
||||
""",
|
||||
(
|
||||
entry.id,
|
||||
entry.content,
|
||||
entry.context_type, # DB column is memory_type
|
||||
entry.source,
|
||||
entry.agent_id,
|
||||
entry.task_id,
|
||||
entry.session_id,
|
||||
json.dumps(metadata) if metadata else None,
|
||||
json.dumps(embedding) if embedding else None,
|
||||
entry.timestamp,
|
||||
),
|
||||
)
|
||||
conn.commit()
|
||||
|
||||
return entry
|
||||
|
||||
|
||||
def _build_search_filters(
|
||||
context_type: str | None,
|
||||
agent_id: str | None,
|
||||
session_id: str | None,
|
||||
) -> tuple[str, list]:
|
||||
"""Build SQL WHERE clause and params from search filters."""
|
||||
conditions: list[str] = []
|
||||
params: list = []
|
||||
|
||||
if context_type:
|
||||
conditions.append("memory_type = ?")
|
||||
params.append(context_type)
|
||||
if agent_id:
|
||||
conditions.append("agent_id = ?")
|
||||
params.append(agent_id)
|
||||
if session_id:
|
||||
conditions.append("session_id = ?")
|
||||
params.append(session_id)
|
||||
|
||||
where_clause = "WHERE " + " AND ".join(conditions) if conditions else ""
|
||||
return where_clause, params
|
||||
|
||||
|
||||
def _fetch_memory_candidates(
|
||||
where_clause: str, params: list, candidate_limit: int
|
||||
) -> list[sqlite3.Row]:
|
||||
"""Fetch candidate memory rows from the database."""
|
||||
query_sql = f"""
|
||||
SELECT * FROM memories
|
||||
{where_clause}
|
||||
ORDER BY created_at DESC
|
||||
LIMIT ?
|
||||
"""
|
||||
params.append(candidate_limit)
|
||||
|
||||
with get_connection() as conn:
|
||||
return conn.execute(query_sql, params).fetchall()
|
||||
|
||||
|
||||
def _row_to_entry(row: sqlite3.Row) -> MemoryEntry:
|
||||
"""Convert a database row to a MemoryEntry."""
|
||||
return MemoryEntry(
|
||||
id=row["id"],
|
||||
content=row["content"],
|
||||
source=row["source"],
|
||||
context_type=row["memory_type"], # DB column -> API field
|
||||
agent_id=row["agent_id"],
|
||||
task_id=row["task_id"],
|
||||
session_id=row["session_id"],
|
||||
metadata=json.loads(row["metadata"]) if row["metadata"] else None,
|
||||
embedding=json.loads(row["embedding"]) if row["embedding"] else None,
|
||||
timestamp=row["created_at"],
|
||||
)
|
||||
|
||||
|
||||
def _score_and_filter(
|
||||
rows: list[sqlite3.Row],
|
||||
query: str,
|
||||
query_embedding: list[float],
|
||||
min_relevance: float,
|
||||
) -> list[MemoryEntry]:
|
||||
"""Score candidate rows by similarity and filter by min_relevance."""
|
||||
results = []
|
||||
for row in rows:
|
||||
entry = _row_to_entry(row)
|
||||
|
||||
if entry.embedding:
|
||||
score = cosine_similarity(query_embedding, entry.embedding)
|
||||
else:
|
||||
score = _keyword_overlap(query, entry.content)
|
||||
|
||||
entry.relevance_score = score
|
||||
if score >= min_relevance:
|
||||
results.append(entry)
|
||||
|
||||
results.sort(key=lambda x: x.relevance_score or 0, reverse=True)
|
||||
return results
|
||||
|
||||
|
||||
def search_memories(
|
||||
query: str,
|
||||
limit: int = 10,
|
||||
context_type: str | None = None,
|
||||
agent_id: str | None = None,
|
||||
session_id: str | None = None,
|
||||
min_relevance: float = 0.0,
|
||||
) -> list[MemoryEntry]:
|
||||
"""Search for memories by semantic similarity.
|
||||
|
||||
Args:
|
||||
query: Search query text
|
||||
limit: Maximum results
|
||||
context_type: Filter by memory type (maps to DB memory_type column)
|
||||
agent_id: Filter by agent
|
||||
session_id: Filter by session
|
||||
min_relevance: Minimum similarity score (0-1)
|
||||
|
||||
Returns:
|
||||
List of MemoryEntry objects sorted by relevance
|
||||
"""
|
||||
query_embedding = embed_text(query)
|
||||
where_clause, params = _build_search_filters(context_type, agent_id, session_id)
|
||||
rows = _fetch_memory_candidates(where_clause, params, limit * 3)
|
||||
results = _score_and_filter(rows, query, query_embedding, min_relevance)
|
||||
return results[:limit]
|
||||
|
||||
|
||||
def delete_memory(memory_id: str) -> bool:
|
||||
"""Delete a memory entry by ID.
|
||||
|
||||
Returns:
|
||||
True if deleted, False if not found
|
||||
"""
|
||||
with get_connection() as conn:
|
||||
cursor = conn.execute(
|
||||
"DELETE FROM memories WHERE id = ?",
|
||||
(memory_id,),
|
||||
)
|
||||
conn.commit()
|
||||
return cursor.rowcount > 0
|
||||
|
||||
|
||||
def get_memory_stats() -> dict:
|
||||
"""Get statistics about the memory store.
|
||||
|
||||
Returns:
|
||||
Dict with counts by type, total entries, etc.
|
||||
"""
|
||||
from timmy.memory.embeddings import _get_embedding_model
|
||||
|
||||
with get_connection() as conn:
|
||||
total = conn.execute("SELECT COUNT(*) as count FROM memories").fetchone()["count"]
|
||||
|
||||
by_type = {}
|
||||
rows = conn.execute(
|
||||
"SELECT memory_type, COUNT(*) as count FROM memories GROUP BY memory_type"
|
||||
).fetchall()
|
||||
for row in rows:
|
||||
by_type[row["memory_type"]] = row["count"]
|
||||
|
||||
with_embeddings = conn.execute(
|
||||
"SELECT COUNT(*) as count FROM memories WHERE embedding IS NOT NULL"
|
||||
).fetchone()["count"]
|
||||
|
||||
return {
|
||||
"total_entries": total,
|
||||
"by_type": by_type,
|
||||
"with_embeddings": with_embeddings,
|
||||
"has_embedding_model": _get_embedding_model() is not False,
|
||||
}
|
||||
|
||||
|
||||
def prune_memories(older_than_days: int = 90, keep_facts: bool = True) -> int:
|
||||
"""Delete old memories to manage storage.
|
||||
|
||||
Args:
|
||||
older_than_days: Delete memories older than this
|
||||
keep_facts: Whether to preserve fact-type memories
|
||||
|
||||
Returns:
|
||||
Number of entries deleted
|
||||
"""
|
||||
cutoff = (datetime.now(UTC) - timedelta(days=older_than_days)).isoformat()
|
||||
|
||||
with get_connection() as conn:
|
||||
if keep_facts:
|
||||
cursor = conn.execute(
|
||||
"""
|
||||
DELETE FROM memories
|
||||
WHERE created_at < ? AND memory_type != 'fact'
|
||||
""",
|
||||
(cutoff,),
|
||||
)
|
||||
else:
|
||||
cursor = conn.execute(
|
||||
"DELETE FROM memories WHERE created_at < ?",
|
||||
(cutoff,),
|
||||
)
|
||||
|
||||
deleted = cursor.rowcount
|
||||
conn.commit()
|
||||
|
||||
return deleted
|
||||
|
||||
|
||||
def get_memory_context(query: str, max_tokens: int = 2000, **filters) -> str:
|
||||
"""Get relevant memory context as formatted text for LLM prompts.
|
||||
|
||||
Args:
|
||||
query: Search query
|
||||
max_tokens: Approximate maximum tokens to return
|
||||
**filters: Additional filters (agent_id, session_id, etc.)
|
||||
|
||||
Returns:
|
||||
Formatted context string for inclusion in prompts
|
||||
"""
|
||||
memories = search_memories(query, limit=20, **filters)
|
||||
|
||||
context_parts = []
|
||||
total_chars = 0
|
||||
max_chars = max_tokens * 4 # Rough approximation
|
||||
|
||||
for mem in memories:
|
||||
formatted = f"[{mem.source}]: {mem.content}"
|
||||
if total_chars + len(formatted) > max_chars:
|
||||
break
|
||||
context_parts.append(formatted)
|
||||
total_chars += len(formatted)
|
||||
|
||||
if not context_parts:
|
||||
return ""
|
||||
|
||||
return "Relevant context from memory:\n" + "\n\n".join(context_parts)
|
||||
|
||||
|
||||
def recall_personal_facts(agent_id: str | None = None) -> list[str]:
|
||||
"""Recall personal facts about the user or system.
|
||||
|
||||
Args:
|
||||
agent_id: Optional agent filter
|
||||
|
||||
Returns:
|
||||
List of fact strings
|
||||
"""
|
||||
with get_connection() as conn:
|
||||
if agent_id:
|
||||
rows = conn.execute(
|
||||
"""
|
||||
SELECT content FROM memories
|
||||
WHERE memory_type = 'fact' AND agent_id = ?
|
||||
ORDER BY created_at DESC
|
||||
LIMIT 100
|
||||
""",
|
||||
(agent_id,),
|
||||
).fetchall()
|
||||
else:
|
||||
rows = conn.execute(
|
||||
"""
|
||||
SELECT content FROM memories
|
||||
WHERE memory_type = 'fact'
|
||||
ORDER BY created_at DESC
|
||||
LIMIT 100
|
||||
""",
|
||||
).fetchall()
|
||||
|
||||
return [r["content"] for r in rows]
|
||||
|
||||
|
||||
def recall_personal_facts_with_ids(agent_id: str | None = None) -> list[dict]:
|
||||
"""Recall personal facts with their IDs for edit/delete operations."""
|
||||
with get_connection() as conn:
|
||||
if agent_id:
|
||||
rows = conn.execute(
|
||||
"SELECT id, content FROM memories WHERE memory_type = 'fact' AND agent_id = ? ORDER BY created_at DESC LIMIT 100",
|
||||
(agent_id,),
|
||||
).fetchall()
|
||||
else:
|
||||
rows = conn.execute(
|
||||
"SELECT id, content FROM memories WHERE memory_type = 'fact' ORDER BY created_at DESC LIMIT 100",
|
||||
).fetchall()
|
||||
return [{"id": r["id"], "content": r["content"]} for r in rows]
|
||||
|
||||
|
||||
def update_personal_fact(memory_id: str, new_content: str) -> bool:
|
||||
"""Update a personal fact's content."""
|
||||
with get_connection() as conn:
|
||||
cursor = conn.execute(
|
||||
"UPDATE memories SET content = ? WHERE id = ? AND memory_type = 'fact'",
|
||||
(new_content, memory_id),
|
||||
)
|
||||
conn.commit()
|
||||
return cursor.rowcount > 0
|
||||
|
||||
|
||||
def store_personal_fact(fact: str, agent_id: str | None = None) -> MemoryEntry:
|
||||
"""Store a personal fact about the user or system.
|
||||
|
||||
Args:
|
||||
fact: The fact to store
|
||||
agent_id: Associated agent
|
||||
|
||||
Returns:
|
||||
The stored MemoryEntry
|
||||
"""
|
||||
return store_memory(
|
||||
content=fact,
|
||||
source="system",
|
||||
context_type="fact",
|
||||
agent_id=agent_id,
|
||||
metadata={"auto_extracted": False},
|
||||
)
|
||||
@@ -1,310 +0,0 @@
|
||||
"""Hot and Vault memory classes for Timmy's memory consolidation tier.
|
||||
|
||||
HotMemory: Tier 1 — computed view of top facts from the database.
|
||||
VaultMemory: Tier 2 — structured vault (memory/ directory), append-only markdown.
|
||||
"""
|
||||
|
||||
import logging
|
||||
import re
|
||||
from datetime import UTC, datetime
|
||||
from pathlib import Path
|
||||
|
||||
from timmy.memory.unified import PROJECT_ROOT
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
VAULT_PATH = PROJECT_ROOT / "memory"
|
||||
|
||||
_DEFAULT_HOT_MEMORY_TEMPLATE = """\
|
||||
# Timmy Hot Memory
|
||||
|
||||
> Working RAM — always loaded, ~300 lines max, pruned monthly
|
||||
> Last updated: {date}
|
||||
|
||||
---
|
||||
|
||||
## Current Status
|
||||
|
||||
**Agent State:** Operational
|
||||
**Mode:** Development
|
||||
**Active Tasks:** 0
|
||||
**Pending Decisions:** None
|
||||
|
||||
---
|
||||
|
||||
## Standing Rules
|
||||
|
||||
1. **Sovereignty First** — No cloud dependencies
|
||||
2. **Local-Only Inference** — Ollama on localhost
|
||||
3. **Privacy by Design** — Telemetry disabled
|
||||
4. **Tool Minimalism** — Use tools only when necessary
|
||||
5. **Memory Discipline** — Write handoffs at session end
|
||||
|
||||
---
|
||||
|
||||
## Agent Roster
|
||||
|
||||
| Agent | Role | Status |
|
||||
|-------|------|--------|
|
||||
| Timmy | Core | Active |
|
||||
|
||||
---
|
||||
|
||||
## User Profile
|
||||
|
||||
**Name:** (not set)
|
||||
**Interests:** (to be learned)
|
||||
|
||||
---
|
||||
|
||||
## Key Decisions
|
||||
|
||||
(none yet)
|
||||
|
||||
---
|
||||
|
||||
## Pending Actions
|
||||
|
||||
- [ ] Learn user's name
|
||||
|
||||
---
|
||||
|
||||
*Prune date: {prune_date}*
|
||||
"""
|
||||
|
||||
|
||||
class HotMemory:
|
||||
"""Tier 1: Hot memory — computed view of top facts from DB."""
|
||||
|
||||
def __init__(self, path=None) -> None:
|
||||
if path is None:
|
||||
path = PROJECT_ROOT / "MEMORY.md"
|
||||
self.path = path
|
||||
self._content: str | None = None
|
||||
self._last_modified: float | None = None
|
||||
|
||||
def read(self, force_refresh: bool = False) -> str:
|
||||
"""Read hot memory — computed view of top facts + last reflection from DB."""
|
||||
from timmy.memory.chain import recall_personal_facts
|
||||
# Import recall_last_reflection lazily to support patching in memory_system
|
||||
try:
|
||||
# Use the version from memory_system so patches work correctly
|
||||
import timmy.memory_system as _ms
|
||||
recall_last_reflection = _ms.recall_last_reflection
|
||||
except Exception:
|
||||
from timmy.memory.chain import recall_personal_facts as _rpf # noqa: F811
|
||||
recall_last_reflection = None
|
||||
|
||||
try:
|
||||
facts = recall_personal_facts()
|
||||
lines = ["# Timmy Hot Memory\n"]
|
||||
|
||||
if facts:
|
||||
lines.append("## Known Facts\n")
|
||||
for f in facts[:15]:
|
||||
lines.append(f"- {f}")
|
||||
|
||||
# Include the last reflection if available
|
||||
if recall_last_reflection is not None:
|
||||
try:
|
||||
reflection = recall_last_reflection()
|
||||
if reflection:
|
||||
lines.append("\n## Last Reflection\n")
|
||||
lines.append(reflection)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
if len(lines) > 1:
|
||||
return "\n".join(lines)
|
||||
except Exception:
|
||||
logger.debug("DB context read failed, falling back to file")
|
||||
|
||||
# Fallback to file if DB unavailable
|
||||
if self.path.exists():
|
||||
return self.path.read_text()
|
||||
|
||||
return "# Timmy Hot Memory\n\nNo memories stored yet.\n"
|
||||
|
||||
def update_section(self, section: str, content: str) -> None:
|
||||
"""Update a specific section in MEMORY.md.
|
||||
|
||||
DEPRECATED: Hot memory is now computed from the database.
|
||||
This method is kept for backward compatibility during transition.
|
||||
Use memory_write() to store facts in the database.
|
||||
"""
|
||||
logger.warning(
|
||||
"HotMemory.update_section() is deprecated. "
|
||||
"Use memory_write() to store facts in the database."
|
||||
)
|
||||
|
||||
# Keep file-writing for backward compatibility during transition
|
||||
# Guard against empty or excessively large writes
|
||||
if not content or not content.strip():
|
||||
logger.warning("HotMemory: Refusing empty write to section '%s'", section)
|
||||
return
|
||||
if len(content) > 2000:
|
||||
logger.warning("HotMemory: Truncating oversized write to section '%s'", section)
|
||||
content = content[:2000] + "\n... [truncated]"
|
||||
|
||||
if not self.path.exists():
|
||||
self._create_default()
|
||||
|
||||
full_content = self.read()
|
||||
|
||||
# Find section
|
||||
pattern = rf"(## {re.escape(section)}.*?)(?=\n## |\Z)"
|
||||
match = re.search(pattern, full_content, re.DOTALL)
|
||||
|
||||
if match:
|
||||
# Replace section
|
||||
new_section = f"## {section}\n\n{content}\n\n"
|
||||
full_content = full_content[: match.start()] + new_section + full_content[match.end() :]
|
||||
else:
|
||||
# Append section — guard against missing prune marker
|
||||
insert_point = full_content.rfind("*Prune date:")
|
||||
new_section = f"## {section}\n\n{content}\n\n"
|
||||
if insert_point < 0:
|
||||
# No prune marker — just append at end
|
||||
full_content = full_content.rstrip() + "\n\n" + new_section
|
||||
else:
|
||||
full_content = (
|
||||
full_content[:insert_point] + new_section + "\n" + full_content[insert_point:]
|
||||
)
|
||||
|
||||
self.path.write_text(full_content)
|
||||
self._content = full_content
|
||||
self._last_modified = self.path.stat().st_mtime
|
||||
logger.info("HotMemory: Updated section '%s'", section)
|
||||
|
||||
def _create_default(self) -> None:
|
||||
"""Create default MEMORY.md if missing.
|
||||
|
||||
DEPRECATED: Hot memory is now computed from the database.
|
||||
This method is kept for backward compatibility during transition.
|
||||
"""
|
||||
logger.debug(
|
||||
"HotMemory._create_default() - creating default MEMORY.md for backward compatibility"
|
||||
)
|
||||
now = datetime.now(UTC)
|
||||
content = _DEFAULT_HOT_MEMORY_TEMPLATE.format(
|
||||
date=now.strftime("%Y-%m-%d"),
|
||||
prune_date=now.replace(day=25).strftime("%Y-%m-%d"),
|
||||
)
|
||||
self.path.write_text(content)
|
||||
logger.info("HotMemory: Created default MEMORY.md")
|
||||
|
||||
|
||||
class VaultMemory:
|
||||
"""Tier 2: Structured vault (memory/) — append-only markdown."""
|
||||
|
||||
def __init__(self) -> None:
|
||||
self.path = VAULT_PATH
|
||||
self._ensure_structure()
|
||||
|
||||
def _ensure_structure(self) -> None:
|
||||
"""Ensure vault directory structure exists."""
|
||||
(self.path / "self").mkdir(parents=True, exist_ok=True)
|
||||
(self.path / "notes").mkdir(parents=True, exist_ok=True)
|
||||
(self.path / "aar").mkdir(parents=True, exist_ok=True)
|
||||
|
||||
def write_note(self, name: str, content: str, namespace: str = "notes") -> Path:
|
||||
"""Write a note to the vault."""
|
||||
# Add timestamp to filename
|
||||
timestamp = datetime.now(UTC).strftime("%Y%m%d")
|
||||
filename = f"{timestamp}_{name}.md"
|
||||
filepath = self.path / namespace / filename
|
||||
|
||||
# Add header
|
||||
full_content = f"""# {name.replace("_", " ").title()}
|
||||
|
||||
> Created: {datetime.now(UTC).isoformat()}
|
||||
> Namespace: {namespace}
|
||||
|
||||
---
|
||||
|
||||
{content}
|
||||
|
||||
---
|
||||
|
||||
*Auto-generated by Timmy Memory System*
|
||||
"""
|
||||
|
||||
filepath.write_text(full_content)
|
||||
logger.info("VaultMemory: Wrote %s", filepath)
|
||||
return filepath
|
||||
|
||||
def read_file(self, filepath: Path) -> str:
|
||||
"""Read a file from the vault."""
|
||||
if not filepath.exists():
|
||||
return ""
|
||||
return filepath.read_text()
|
||||
|
||||
def update_user_profile(self, key: str, value: str) -> None:
|
||||
"""Update a field in user_profile.md.
|
||||
|
||||
DEPRECATED: User profile updates should now use memory_write() to store
|
||||
facts in the database. This method is kept for backward compatibility.
|
||||
"""
|
||||
logger.warning(
|
||||
"VaultMemory.update_user_profile() is deprecated. "
|
||||
"Use memory_write() to store user facts in the database."
|
||||
)
|
||||
# Still update the file for backward compatibility during transition
|
||||
profile_path = self.path / "self" / "user_profile.md"
|
||||
|
||||
if not profile_path.exists():
|
||||
self._create_default_profile()
|
||||
|
||||
content = profile_path.read_text()
|
||||
|
||||
pattern = rf"(\*\*{re.escape(key)}:\*\*).*"
|
||||
if re.search(pattern, content):
|
||||
safe_value = value.strip()
|
||||
content = re.sub(pattern, lambda m: f"{m.group(1)} {safe_value}", content)
|
||||
else:
|
||||
facts_section = "## Important Facts"
|
||||
if facts_section in content:
|
||||
insert_point = content.find(facts_section) + len(facts_section)
|
||||
content = content[:insert_point] + f"\n- {key}: {value}" + content[insert_point:]
|
||||
|
||||
content = re.sub(
|
||||
r"\*Last updated:.*\*",
|
||||
f"*Last updated: {datetime.now(UTC).strftime('%Y-%m-%d')}*",
|
||||
content,
|
||||
)
|
||||
|
||||
profile_path.write_text(content)
|
||||
logger.info("VaultMemory: Updated user profile: %s = %s", key, value)
|
||||
|
||||
def _create_default_profile(self) -> None:
|
||||
"""Create default user profile."""
|
||||
profile_path = self.path / "self" / "user_profile.md"
|
||||
default = """# User Profile
|
||||
|
||||
> Learned information about the user.
|
||||
|
||||
## Basic Information
|
||||
|
||||
**Name:** (unknown)
|
||||
**Location:** (unknown)
|
||||
**Occupation:** (unknown)
|
||||
|
||||
## Interests & Expertise
|
||||
|
||||
- (to be learned)
|
||||
|
||||
## Preferences
|
||||
|
||||
- Response style: concise, technical
|
||||
- Tool usage: minimal
|
||||
|
||||
## Important Facts
|
||||
|
||||
- (to be extracted)
|
||||
|
||||
---
|
||||
|
||||
*Last updated: {date}*
|
||||
""".format(date=datetime.now(UTC).strftime("%Y-%m-%d"))
|
||||
|
||||
profile_path.write_text(default)
|
||||
@@ -7,97 +7,37 @@ Also includes vector similarity utilities (cosine similarity, keyword overlap).
|
||||
"""
|
||||
|
||||
import hashlib
|
||||
import json
|
||||
import logging
|
||||
import math
|
||||
|
||||
import httpx # Import httpx for Ollama API calls
|
||||
|
||||
from config import settings
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Embedding model - small, fast, local
|
||||
EMBEDDING_MODEL = None
|
||||
EMBEDDING_DIM = 384 # MiniLM dimension, will be overridden if Ollama model has different dim
|
||||
|
||||
|
||||
class OllamaEmbedder:
|
||||
"""Mimics SentenceTransformer interface for Ollama."""
|
||||
|
||||
def __init__(self, model_name: str, ollama_url: str):
|
||||
self.model_name = model_name
|
||||
self.ollama_url = ollama_url
|
||||
self.dimension = 0 # Will be updated after first call
|
||||
|
||||
def encode(
|
||||
self,
|
||||
sentences: str | list[str],
|
||||
convert_to_numpy: bool = False,
|
||||
normalize_embeddings: bool = True,
|
||||
) -> list[list[float]] | list[float]:
|
||||
"""Generate embeddings using Ollama."""
|
||||
if isinstance(sentences, str):
|
||||
sentences = [sentences]
|
||||
|
||||
all_embeddings = []
|
||||
for sentence in sentences:
|
||||
try:
|
||||
response = httpx.post(
|
||||
f"{self.ollama_url}/api/embeddings",
|
||||
json={"model": self.model_name, "prompt": sentence},
|
||||
timeout=settings.mcp_bridge_timeout,
|
||||
)
|
||||
response.raise_for_status()
|
||||
embedding = response.json()["embedding"]
|
||||
if not self.dimension:
|
||||
self.dimension = len(embedding) # Set dimension on first successful call
|
||||
global EMBEDDING_DIM
|
||||
EMBEDDING_DIM = self.dimension # Update global EMBEDDING_DIM
|
||||
all_embeddings.append(embedding)
|
||||
except httpx.RequestError as exc:
|
||||
logger.error("Ollama embeddings request failed: %s", exc)
|
||||
# Fallback to simple hash embedding on Ollama error
|
||||
return _simple_hash_embedding(sentence)
|
||||
except json.JSONDecodeError as exc:
|
||||
logger.error("Failed to decode Ollama embeddings response: %s", exc)
|
||||
return _simple_hash_embedding(sentence)
|
||||
|
||||
if len(all_embeddings) == 1 and isinstance(sentences, str):
|
||||
return all_embeddings[0]
|
||||
return all_embeddings
|
||||
EMBEDDING_DIM = 384 # MiniLM dimension
|
||||
|
||||
|
||||
def _get_embedding_model():
|
||||
"""Lazy-load embedding model, preferring Ollama if configured."""
|
||||
"""Lazy-load embedding model."""
|
||||
global EMBEDDING_MODEL
|
||||
global EMBEDDING_DIM
|
||||
if EMBEDDING_MODEL is None:
|
||||
if settings.timmy_skip_embeddings:
|
||||
EMBEDDING_MODEL = False
|
||||
return EMBEDDING_MODEL
|
||||
try:
|
||||
from config import settings
|
||||
|
||||
if settings.timmy_embedding_backend == "ollama":
|
||||
logger.info(
|
||||
"MemorySystem: Using Ollama for embeddings with model %s",
|
||||
settings.ollama_embedding_model,
|
||||
)
|
||||
EMBEDDING_MODEL = OllamaEmbedder(
|
||||
settings.ollama_embedding_model, settings.normalized_ollama_url
|
||||
)
|
||||
# We don't know the dimension until after the first call, so keep it default for now.
|
||||
# It will be updated dynamically in OllamaEmbedder.encode
|
||||
return EMBEDDING_MODEL
|
||||
else:
|
||||
try:
|
||||
from sentence_transformers import SentenceTransformer
|
||||
if settings.timmy_skip_embeddings:
|
||||
EMBEDDING_MODEL = False
|
||||
return EMBEDDING_MODEL
|
||||
except ImportError:
|
||||
pass
|
||||
|
||||
EMBEDDING_MODEL = SentenceTransformer("all-MiniLM-L6-v2")
|
||||
EMBEDDING_DIM = 384 # Reset to MiniLM dimension
|
||||
logger.info("MemorySystem: Loaded local embedding model (all-MiniLM-L6-v2)")
|
||||
except ImportError:
|
||||
logger.warning("MemorySystem: sentence-transformers not installed, using fallback")
|
||||
EMBEDDING_MODEL = False # Use fallback
|
||||
try:
|
||||
from sentence_transformers import SentenceTransformer
|
||||
|
||||
EMBEDDING_MODEL = SentenceTransformer("all-MiniLM-L6-v2")
|
||||
logger.info("MemorySystem: Loaded embedding model")
|
||||
except ImportError:
|
||||
logger.warning("MemorySystem: sentence-transformers not installed, using fallback")
|
||||
EMBEDDING_MODEL = False # Use fallback
|
||||
return EMBEDDING_MODEL
|
||||
|
||||
|
||||
@@ -120,10 +60,7 @@ def embed_text(text: str) -> list[float]:
|
||||
model = _get_embedding_model()
|
||||
if model and model is not False:
|
||||
embedding = model.encode(text)
|
||||
# Ensure it's a list of floats, not numpy array
|
||||
if hasattr(embedding, "tolist"):
|
||||
return embedding.tolist()
|
||||
return embedding
|
||||
return embedding.tolist()
|
||||
return _simple_hash_embedding(text)
|
||||
|
||||
|
||||
|
||||
@@ -1,278 +0,0 @@
|
||||
"""Semantic memory and search classes for Timmy.
|
||||
|
||||
Provides SemanticMemory (vector search over vault content) and
|
||||
MemorySearcher (high-level multi-tier search interface).
|
||||
"""
|
||||
|
||||
import hashlib
|
||||
import json
|
||||
import logging
|
||||
import sqlite3
|
||||
from contextlib import closing, contextmanager
|
||||
from collections.abc import Generator
|
||||
from datetime import UTC, datetime
|
||||
from pathlib import Path
|
||||
|
||||
from config import settings
|
||||
from timmy.memory.embeddings import (
|
||||
EMBEDDING_DIM,
|
||||
_get_embedding_model,
|
||||
cosine_similarity,
|
||||
embed_text,
|
||||
)
|
||||
from timmy.memory.unified import (
|
||||
DB_PATH,
|
||||
PROJECT_ROOT,
|
||||
_ensure_schema,
|
||||
get_connection,
|
||||
)
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
VAULT_PATH = PROJECT_ROOT / "memory"
|
||||
|
||||
|
||||
class SemanticMemory:
|
||||
"""Vector-based semantic search over vault content."""
|
||||
|
||||
def __init__(self) -> None:
|
||||
self.db_path = DB_PATH
|
||||
self.vault_path = VAULT_PATH
|
||||
|
||||
@contextmanager
|
||||
def _get_conn(self) -> Generator[sqlite3.Connection, None, None]:
|
||||
"""Get connection to the instance's db_path (backward compatibility).
|
||||
|
||||
Uses self.db_path if set differently from global DB_PATH,
|
||||
otherwise uses the global get_connection().
|
||||
"""
|
||||
if self.db_path == DB_PATH:
|
||||
# Use global connection (normal production path)
|
||||
with get_connection() as conn:
|
||||
yield conn
|
||||
else:
|
||||
# Use instance-specific db_path (test path)
|
||||
self.db_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
with closing(sqlite3.connect(str(self.db_path))) as conn:
|
||||
conn.row_factory = sqlite3.Row
|
||||
conn.execute("PRAGMA journal_mode=WAL")
|
||||
conn.execute(f"PRAGMA busy_timeout={settings.db_busy_timeout_ms}")
|
||||
# Ensure schema exists
|
||||
_ensure_schema(conn)
|
||||
yield conn
|
||||
|
||||
def _init_db(self) -> None:
|
||||
"""Initialize database at self.db_path (backward compatibility).
|
||||
|
||||
This method is kept for backward compatibility with existing code and tests.
|
||||
Schema creation is handled by _get_conn.
|
||||
"""
|
||||
# Trigger schema creation via _get_conn
|
||||
with self._get_conn():
|
||||
pass
|
||||
|
||||
def index_file(self, filepath: Path) -> int:
|
||||
"""Index a single file into semantic memory."""
|
||||
if not filepath.exists():
|
||||
return 0
|
||||
|
||||
content = filepath.read_text()
|
||||
file_hash = hashlib.md5(content.encode()).hexdigest()
|
||||
|
||||
with self._get_conn() as conn:
|
||||
# Check if already indexed with same hash
|
||||
cursor = conn.execute(
|
||||
"SELECT metadata FROM memories WHERE source = ? AND memory_type = 'vault_chunk' LIMIT 1",
|
||||
(str(filepath),),
|
||||
)
|
||||
existing = cursor.fetchone()
|
||||
if existing and existing[0]:
|
||||
try:
|
||||
meta = json.loads(existing[0])
|
||||
if meta.get("source_hash") == file_hash:
|
||||
return 0 # Already indexed
|
||||
except json.JSONDecodeError:
|
||||
pass
|
||||
|
||||
# Delete old chunks for this file
|
||||
conn.execute(
|
||||
"DELETE FROM memories WHERE source = ? AND memory_type = 'vault_chunk'",
|
||||
(str(filepath),),
|
||||
)
|
||||
|
||||
# Split into chunks (paragraphs)
|
||||
chunks = self._split_into_chunks(content)
|
||||
|
||||
# Index each chunk
|
||||
now = datetime.now(UTC).isoformat()
|
||||
for i, chunk_text in enumerate(chunks):
|
||||
if len(chunk_text.strip()) < 20: # Skip tiny chunks
|
||||
continue
|
||||
|
||||
chunk_id = f"{filepath.stem}_{i}"
|
||||
chunk_embedding = embed_text(chunk_text)
|
||||
|
||||
conn.execute(
|
||||
"""INSERT INTO memories
|
||||
(id, content, memory_type, source, metadata, embedding, created_at)
|
||||
VALUES (?, ?, ?, ?, ?, ?, ?)""",
|
||||
(
|
||||
chunk_id,
|
||||
chunk_text,
|
||||
"vault_chunk",
|
||||
str(filepath),
|
||||
json.dumps({"source_hash": file_hash, "chunk_index": i}),
|
||||
json.dumps(chunk_embedding),
|
||||
now,
|
||||
),
|
||||
)
|
||||
|
||||
conn.commit()
|
||||
|
||||
logger.info("SemanticMemory: Indexed %s (%d chunks)", filepath.name, len(chunks))
|
||||
return len(chunks)
|
||||
|
||||
def _split_into_chunks(self, text: str, max_chunk_size: int = 500) -> list[str]:
|
||||
"""Split text into semantic chunks."""
|
||||
# Split by paragraphs first
|
||||
paragraphs = text.split("\n\n")
|
||||
chunks = []
|
||||
|
||||
for para in paragraphs:
|
||||
para = para.strip()
|
||||
if not para:
|
||||
continue
|
||||
|
||||
# If paragraph is small enough, keep as one chunk
|
||||
if len(para) <= max_chunk_size:
|
||||
chunks.append(para)
|
||||
else:
|
||||
# Split long paragraphs by sentences
|
||||
sentences = para.replace(". ", ".\n").split("\n")
|
||||
current_chunk = ""
|
||||
|
||||
for sent in sentences:
|
||||
if len(current_chunk) + len(sent) < max_chunk_size:
|
||||
current_chunk += " " + sent if current_chunk else sent
|
||||
else:
|
||||
if current_chunk:
|
||||
chunks.append(current_chunk.strip())
|
||||
current_chunk = sent
|
||||
|
||||
if current_chunk:
|
||||
chunks.append(current_chunk.strip())
|
||||
|
||||
return chunks
|
||||
|
||||
def index_vault(self) -> int:
|
||||
"""Index entire vault directory."""
|
||||
total_chunks = 0
|
||||
|
||||
for md_file in self.vault_path.rglob("*.md"):
|
||||
# Skip handoff file (handled separately)
|
||||
if "last-session-handoff" in md_file.name:
|
||||
continue
|
||||
total_chunks += self.index_file(md_file)
|
||||
|
||||
logger.info("SemanticMemory: Indexed vault (%d total chunks)", total_chunks)
|
||||
return total_chunks
|
||||
|
||||
def search(self, query: str, top_k: int = 5) -> list[tuple[str, float]]:
|
||||
"""Search for relevant memory chunks."""
|
||||
query_embedding = embed_text(query)
|
||||
|
||||
with self._get_conn() as conn:
|
||||
conn.row_factory = sqlite3.Row
|
||||
|
||||
# Get all vault chunks
|
||||
rows = conn.execute(
|
||||
"SELECT source, content, embedding FROM memories WHERE memory_type = 'vault_chunk'"
|
||||
).fetchall()
|
||||
|
||||
# Calculate similarities
|
||||
scored = []
|
||||
for row in rows:
|
||||
embedding = json.loads(row["embedding"])
|
||||
score = cosine_similarity(query_embedding, embedding)
|
||||
scored.append((row["source"], row["content"], score))
|
||||
|
||||
# Sort by score descending
|
||||
scored.sort(key=lambda x: x[2], reverse=True)
|
||||
|
||||
# Return top_k
|
||||
return [(content, score) for _, content, score in scored[:top_k]]
|
||||
|
||||
def get_relevant_context(self, query: str, max_chars: int = 2000) -> str:
|
||||
"""Get formatted context string for a query."""
|
||||
results = self.search(query, top_k=3)
|
||||
|
||||
if not results:
|
||||
return ""
|
||||
|
||||
parts = []
|
||||
total_chars = 0
|
||||
|
||||
for content, score in results:
|
||||
if score < 0.3: # Similarity threshold
|
||||
continue
|
||||
|
||||
chunk = f"[Relevant memory - score {score:.2f}]: {content[:400]}..."
|
||||
if total_chars + len(chunk) > max_chars:
|
||||
break
|
||||
|
||||
parts.append(chunk)
|
||||
total_chars += len(chunk)
|
||||
|
||||
return "\n\n".join(parts) if parts else ""
|
||||
|
||||
def stats(self) -> dict:
|
||||
"""Get indexing statistics."""
|
||||
with self._get_conn() as conn:
|
||||
cursor = conn.execute(
|
||||
"SELECT COUNT(*), COUNT(DISTINCT source) FROM memories WHERE memory_type = 'vault_chunk'"
|
||||
)
|
||||
total_chunks, total_files = cursor.fetchone()
|
||||
|
||||
return {
|
||||
"total_chunks": total_chunks,
|
||||
"total_files": total_files,
|
||||
"embedding_dim": EMBEDDING_DIM if _get_embedding_model() else 128,
|
||||
}
|
||||
|
||||
|
||||
class MemorySearcher:
|
||||
"""High-level interface for memory search."""
|
||||
|
||||
def __init__(self) -> None:
|
||||
self.semantic = SemanticMemory()
|
||||
|
||||
def search(self, query: str, tiers: list[str] = None) -> dict:
|
||||
"""Search across memory tiers.
|
||||
|
||||
Args:
|
||||
query: Search query
|
||||
tiers: List of tiers to search ["hot", "vault", "semantic"]
|
||||
|
||||
Returns:
|
||||
Dict with results from each tier
|
||||
"""
|
||||
tiers = tiers or ["semantic"] # Default to semantic only
|
||||
results = {}
|
||||
|
||||
if "semantic" in tiers:
|
||||
semantic_results = self.semantic.search(query, top_k=5)
|
||||
results["semantic"] = [
|
||||
{"content": content, "score": score} for content, score in semantic_results
|
||||
]
|
||||
|
||||
return results
|
||||
|
||||
def get_context_for_query(self, query: str) -> str:
|
||||
"""Get comprehensive context for a user query."""
|
||||
# Get semantic context
|
||||
semantic_context = self.semantic.get_relevant_context(query)
|
||||
|
||||
if semantic_context:
|
||||
return f"## Relevant Past Context\n\n{semantic_context}"
|
||||
|
||||
return ""
|
||||
File diff suppressed because it is too large
Load Diff
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user