Compare commits

..

51 Commits

Author SHA1 Message Date
9732c80892 feat: Real-time Spark Visualization in Tower Dashboard (#612)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-20 16:10:42 -04:00
f3b3d1e648 [loop-cycle-1658] feat: provider health history endpoint (#457) (#611) 2026-03-20 16:09:20 -04:00
4ba8d25749 feat: Lightning Network integration for tool usage (#610)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-20 13:07:02 -04:00
2622f0a0fb [loop-cycle-1242] fix: cycle_retro reads cycle_result.json (#603) (#609) 2026-03-20 12:55:01 -04:00
e3d60b89a9 fix: remove model_size kwarg from create_timmy() CLI calls (#606)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-20 12:48:49 -04:00
6214ad3225 refactor: extract helpers from run_self_tests() (#601)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-20 12:40:44 -04:00
5f5da2163f [loop-cycle] refactor: extract helpers from _handle_tool_confirmation (#592) (#600) 2026-03-20 12:32:24 -04:00
0029c34bb1 refactor: break up search_thoughts() into focused helpers (#597)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-20 12:26:51 -04:00
2577b71207 fix: capture thought timestamp at cycle start, not after LLM call (#590)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-20 12:13:48 -04:00
1a8b8ecaed [loop-cycle-1235] refactor: break up _migrate_schema() into focused helpers (#591) (#595) 2026-03-20 12:07:15 -04:00
d821e76589 [loop-cycle-1234] refactor: break up _generate_avatar_image (#563) (#589) 2026-03-20 11:57:53 -04:00
bc010ecfba [loop-cycle-1233] refactor: add docstrings to calm.py route handlers (#569) (#585) 2026-03-20 11:44:06 -04:00
faf6c1a5f1 [loop-cycle-1233] refactor: break up BaseAgent.run() (#561) (#584) 2026-03-20 11:24:36 -04:00
48103bb076 [loop-cycle-956] refactor: break up _handle_message() into focused helpers (#553) (#574) 2026-03-19 21:42:01 -04:00
9f244ffc70 refactor: break up _record_utterance() into focused helpers (#572)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-19 21:37:32 -04:00
0162a604be refactor: break up voice_loop.py::run() into focused helpers (#567)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-19 21:33:59 -04:00
2326771c5a [loop-cycle-953] refactor: DRY _import_creative_catalogs() (#560) (#565) 2026-03-19 21:21:23 -04:00
8f6cf2681b refactor: break up search_memories() into focused helpers (#557)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-19 21:16:07 -04:00
f361893fdd [loop-cycle-951] refactor: break up _migrate_schema() (#552) (#558) 2026-03-19 21:11:02 -04:00
7ad0ee17b6 refactor: break up shell.py::run() into helpers (#551)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-19 21:04:10 -04:00
29220b6bdd refactor: break up api_chat() into helpers (#547)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-19 21:02:04 -04:00
2849dba756 [loop-cycle-948] refactor: break up _gather_system_snapshot() into helpers (#540) (#549) 2026-03-19 20:52:13 -04:00
e11e07f117 [loop-cycle-947] refactor: break up self_reflect() into focused helpers (#505) (#546) 2026-03-19 20:49:18 -04:00
50c8a5428e refactor: break up api_chat() into helpers (#544)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-19 20:49:04 -04:00
7da434c85b [loop-cycle-946] refactor: complete airllm removal (#486) (#545) 2026-03-19 20:46:20 -04:00
88e59f7c17 refactor: break up chat_agent() into helpers (#542)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-19 20:38:46 -04:00
aa5e9c3176 refactor: break up get_memory_status() into helpers (#537)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-19 20:30:29 -04:00
1b4fe65650 fix: cache thinking agent and add timeouts to prevent loop pane death (#535)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-19 20:27:25 -04:00
2d69f73d9d fix: add timeout to thinking/loop-QA schedulers (#530)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-19 20:18:31 -04:00
ff1e43c235 [loop-cycle-545] fix: queue auto-hygiene — filter closed issues on read (#524) (#529) 2026-03-19 20:10:05 -04:00
b331aa6139 refactor: break up capture_error() into testable helpers (#523)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-19 20:03:28 -04:00
b45b543f2d refactor: break up create_timmy() into testable helpers (#520)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-19 19:51:59 -04:00
7c823ab59c refactor: break up think_once() into testable helpers (#518)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-19 19:43:26 -04:00
9f2728f529 refactor: break up lifespan() into testable helpers (#515)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-19 19:30:32 -04:00
cd3dc5d989 refactor: break up CascadeRouter.complete() into focused helpers (#510)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-19 19:24:36 -04:00
e4de539bf3 fix: extract ollama_url normalization into shared utility (#508)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-19 19:18:22 -04:00
b2057f72e1 [loop-cycle] refactor: break up run_agentic_loop into testable helpers (#504) (#509) 2026-03-19 19:15:38 -04:00
5f52dd54c0 [loop-cycle-932] fix: add logging to bare except Exception blocks (#484) (#501) 2026-03-19 19:05:02 -04:00
9ceffd61d1 [loop-cycle-544] fix: use settings.ollama_url fallback in _call_ollama (#490) (#498) 2026-03-19 16:18:39 -04:00
015d858be5 fix: auto-detect issue number in cycle retro from git branch (#495)
## Summary
- `cycle_retro.py` now auto-detects issue number from the git branch name (e.g. `kimi/issue-492` → `492`) when `--issue` is not provided
- `backfill_retro.py` now skips the PR number suffix Gitea appends to titles so it does not confuse PR numbers with issue numbers
- Added tests for both fixes

Fixes #492

Co-authored-by: kimi <kimi@localhost>
Reviewed-on: http://localhost:3000/rockachopa/Timmy-time-dashboard/pulls/495
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-19 16:13:35 -04:00
b6d0b5f999 feat: epoch turnover notation for loopstat cycles ⟳WW.D:NNN (#496) 2026-03-19 16:12:10 -04:00
d70e4f810a fix: use settings.ollama_url instead of hardcoded fallback in cascade router (#491)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-19 16:02:20 -04:00
7f20742fcf fix: replace hardcoded secret placeholder in CSRF middleware docstring (#488)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-19 15:52:29 -04:00
15eb7c3b45 [loop-cycle-538] refactor: remove dead airllm provider from cascade router (#459) (#481) 2026-03-19 15:44:10 -04:00
dbc2fd5b0f [loop-cycle-536] fix: validate_startup checks CORS wildcard in production (#472) (#478) 2026-03-19 15:29:26 -04:00
3c3aca57f1 [loop-cycle-535] perf: cache Timmy agent at startup (#471) (#476)
## What
Cache the Timmy agent instance at app startup (in lifespan) instead of creating a new one per `/serve/chat` request.

## Changes
- `src/timmy_serve/app.py`: Create agent in lifespan, store in `app.state.timmy`
- `tests/timmy/test_timmy_serve_app.py`: Updated tests for lifespan-based caching, added `test_agent_cached_at_startup`

2085 unit tests pass. 2102 pre-push tests pass. 78.5% coverage.

Closes #471

Co-authored-by: Timmy <timmy@timmytime.ai>
Reviewed-on: http://localhost:3000/rockachopa/Timmy-time-dashboard/pulls/476
Co-authored-by: Timmy Time <timmy@Alexanderwhitestone.ai>
Co-committed-by: Timmy Time <timmy@Alexanderwhitestone.ai>
2026-03-19 15:28:57 -04:00
0ae00af3f8 fix: remove AirLLM config settings from config.py (#475)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-19 15:24:43 -04:00
3df526f6ef [loop-cycle-2] feat: hot-reload providers.yaml without restart (#458) (#470) 2026-03-19 15:11:40 -04:00
50aaf60db2 [loop-cycle-2] fix: strip CORS wildcards in production (#462) (#469) 2026-03-19 15:05:27 -04:00
a751be3038 fix: default CORS origins to localhost instead of wildcard (#467)
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-19 14:57:36 -04:00
92594ea588 [loop-cycle] feat: implement source distinction in system prompts (#463) (#464) 2026-03-19 14:49:31 -04:00
59 changed files with 4322 additions and 2087 deletions

View File

@@ -54,19 +54,6 @@ providers:
context_window: 2048
capabilities: [text, vision, streaming]
# Secondary: Local AirLLM (if installed)
- name: airllm-local
type: airllm
enabled: false # Enable if pip install airllm
priority: 2
models:
- name: 70b
default: true
capabilities: [text, tools, json, streaming]
- name: 8b
capabilities: [text, tools, json, streaming]
- name: 405b
capabilities: [text, tools, json, streaming]
# Tertiary: OpenAI (if API key available)
- name: openai-backup

View File

@@ -94,12 +94,17 @@ def extract_cycle_number(title: str) -> int | None:
return int(m.group(1)) if m else None
def extract_issue_number(title: str, body: str) -> int | None:
# Try body first (usually has "closes #N")
def extract_issue_number(title: str, body: str, pr_number: int | None = None) -> int | None:
"""Extract the issue number from PR body/title, ignoring the PR number itself.
Gitea appends "(#N)" to PR titles where N is the PR number — skip that
so we don't confuse it with the linked issue.
"""
for text in [body or "", title]:
m = ISSUE_RE.search(text)
if m:
return int(m.group(1))
for m in ISSUE_RE.finditer(text):
num = int(m.group(1))
if num != pr_number:
return num
return None
@@ -140,7 +145,7 @@ def main():
else:
cycle_counter = max(cycle_counter, cycle)
issue = extract_issue_number(title, body)
issue = extract_issue_number(title, body, pr_number=pr_num)
issue_type = classify_pr(title, body)
duration = estimate_duration(pr)
diff = get_pr_diff_stats(token, pr_num)

View File

@@ -4,11 +4,26 @@
Called after each cycle completes (success or failure).
Appends a structured entry to .loop/retro/cycles.jsonl.
EPOCH NOTATION (turnover system):
Each cycle carries a symbolic epoch tag alongside the raw integer:
⟳WW.D:NNN
⟳ turnover glyph — marks epoch-aware cycles
WW ISO week-of-year (0153)
D ISO weekday (1=Mon … 7=Sun)
NNN daily cycle counter, zero-padded, resets at midnight UTC
Example: ⟳12.3:042 — Week 12, Wednesday, 42nd cycle of the day.
The raw `cycle` integer is preserved for backward compatibility.
The `epoch` field carries the symbolic notation.
SUCCESS DEFINITION:
A cycle is only "success" if BOTH conditions are met:
1. The hermes process exited cleanly (exit code 0)
2. Main is green (smoke test passes on main after merge)
A cycle that merges a PR but leaves main red is a FAILURE.
The --main-green flag records the smoke test result.
@@ -29,6 +44,8 @@ from __future__ import annotations
import argparse
import json
import re
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path
@@ -36,10 +53,69 @@ from pathlib import Path
REPO_ROOT = Path(__file__).resolve().parent.parent
RETRO_FILE = REPO_ROOT / ".loop" / "retro" / "cycles.jsonl"
SUMMARY_FILE = REPO_ROOT / ".loop" / "retro" / "summary.json"
EPOCH_COUNTER_FILE = REPO_ROOT / ".loop" / "retro" / ".epoch_counter"
CYCLE_RESULT_FILE = REPO_ROOT / ".loop" / "cycle_result.json"
# How many recent entries to include in rolling summary
SUMMARY_WINDOW = 50
# Branch patterns that encode an issue number, e.g. kimi/issue-492
BRANCH_ISSUE_RE = re.compile(r"issue[/-](\d+)", re.IGNORECASE)
def detect_issue_from_branch() -> int | None:
"""Try to extract an issue number from the current git branch name."""
try:
branch = subprocess.check_output(
["git", "rev-parse", "--abbrev-ref", "HEAD"],
stderr=subprocess.DEVNULL,
text=True,
).strip()
except (subprocess.CalledProcessError, FileNotFoundError):
return None
m = BRANCH_ISSUE_RE.search(branch)
return int(m.group(1)) if m else None
# ── Epoch turnover ────────────────────────────────────────────────────────
def _epoch_tag(now: datetime | None = None) -> tuple[str, dict]:
"""Generate the symbolic epoch tag and advance the daily counter.
Returns (epoch_string, epoch_parts) where epoch_parts is a dict with
week, weekday, daily_n for structured storage.
The daily counter persists in .epoch_counter as a two-line file:
line 1: ISO date (YYYY-MM-DD) of the current epoch day
line 2: integer count
When the date rolls over, the counter resets to 1.
"""
if now is None:
now = datetime.now(timezone.utc)
iso_cal = now.isocalendar() # (year, week, weekday)
week = iso_cal[1]
weekday = iso_cal[2]
today_str = now.strftime("%Y-%m-%d")
# Read / reset daily counter
daily_n = 1
EPOCH_COUNTER_FILE.parent.mkdir(parents=True, exist_ok=True)
if EPOCH_COUNTER_FILE.exists():
try:
lines = EPOCH_COUNTER_FILE.read_text().strip().splitlines()
if len(lines) == 2 and lines[0] == today_str:
daily_n = int(lines[1]) + 1
except (ValueError, IndexError):
pass # corrupt file — reset
# Persist
EPOCH_COUNTER_FILE.write_text(f"{today_str}\n{daily_n}\n")
tag = f"\u27f3{week:02d}.{weekday}:{daily_n:03d}"
parts = {"week": week, "weekday": weekday, "daily_n": daily_n}
return tag, parts
def parse_args() -> argparse.Namespace:
p = argparse.ArgumentParser(description="Log a cycle retrospective")
@@ -123,8 +199,30 @@ def update_summary() -> None:
issue_failures[e["issue"]] = issue_failures.get(e["issue"], 0) + 1
quarantine_candidates = {k: v for k, v in issue_failures.items() if v >= 2}
# Epoch turnover stats — cycles per week/day from epoch-tagged entries
epoch_entries = [e for e in recent if e.get("epoch")]
by_week: dict[int, int] = {}
by_weekday: dict[int, int] = {}
for e in epoch_entries:
w = e.get("epoch_week")
d = e.get("epoch_weekday")
if w is not None:
by_week[w] = by_week.get(w, 0) + 1
if d is not None:
by_weekday[d] = by_weekday.get(d, 0) + 1
# Current epoch — latest entry's epoch tag
current_epoch = epoch_entries[-1].get("epoch", "") if epoch_entries else ""
# Weekday names for display
weekday_glyphs = {1: "Mon", 2: "Tue", 3: "Wed", 4: "Thu",
5: "Fri", 6: "Sat", 7: "Sun"}
by_weekday_named = {weekday_glyphs.get(k, str(k)): v
for k, v in sorted(by_weekday.items())}
summary = {
"updated_at": datetime.now(timezone.utc).isoformat(),
"current_epoch": current_epoch,
"window": len(recent),
"measured_cycles": len(measured),
"total_cycles": len(entries),
@@ -136,9 +234,12 @@ def update_summary() -> None:
"total_lines_removed": sum(e.get("lines_removed", 0) for e in recent),
"total_prs_merged": sum(1 for e in recent if e.get("pr")),
"by_type": type_stats,
"by_week": dict(sorted(by_week.items())),
"by_weekday": by_weekday_named,
"quarantine_candidates": quarantine_candidates,
"recent_failures": [
{"cycle": e["cycle"], "issue": e.get("issue"), "reason": e.get("reason", "")}
{"cycle": e["cycle"], "epoch": e.get("epoch", ""),
"issue": e.get("issue"), "reason": e.get("reason", "")}
for e in failures[-5:]
],
}
@@ -146,9 +247,41 @@ def update_summary() -> None:
SUMMARY_FILE.write_text(json.dumps(summary, indent=2) + "\n")
def _load_cycle_result() -> dict:
"""Read .loop/cycle_result.json if it exists; return empty dict on failure."""
if not CYCLE_RESULT_FILE.exists():
return {}
try:
raw = CYCLE_RESULT_FILE.read_text().strip()
# Strip hermes fence markers (```json ... ```) if present
if raw.startswith("```"):
lines = raw.splitlines()
lines = [l for l in lines if not l.startswith("```")]
raw = "\n".join(lines)
return json.loads(raw)
except (json.JSONDecodeError, OSError):
return {}
def main() -> None:
args = parse_args()
# Backfill from cycle_result.json when CLI args have defaults
cr = _load_cycle_result()
if cr:
if args.issue is None and cr.get("issue"):
args.issue = int(cr["issue"])
if args.type == "unknown" and cr.get("type"):
args.type = cr["type"]
if args.tests_passed == 0 and cr.get("tests_passed"):
args.tests_passed = int(cr["tests_passed"])
if not args.notes and cr.get("notes"):
args.notes = cr["notes"]
# Auto-detect issue from branch when not explicitly provided
if args.issue is None:
args.issue = detect_issue_from_branch()
# Reject idle cycles — no issue and no duration means nothing happened
if not args.issue and args.duration == 0:
print(f"[retro] Cycle {args.cycle} skipped — idle (no issue, no duration)")
@@ -157,9 +290,17 @@ def main() -> None:
# A cycle is only truly successful if hermes exited clean AND main is green
truly_success = args.success and args.main_green
# Generate epoch turnover tag
now = datetime.now(timezone.utc)
epoch_tag, epoch_parts = _epoch_tag(now)
entry = {
"timestamp": datetime.now(timezone.utc).isoformat(),
"timestamp": now.isoformat(),
"cycle": args.cycle,
"epoch": epoch_tag,
"epoch_week": epoch_parts["week"],
"epoch_weekday": epoch_parts["weekday"],
"epoch_daily_n": epoch_parts["daily_n"],
"issue": args.issue,
"type": args.type,
"success": truly_success,
@@ -184,7 +325,7 @@ def main() -> None:
update_summary()
status = "✓ SUCCESS" if args.success else "✗ FAILURE"
print(f"[retro] Cycle {args.cycle} {status}", end="")
print(f"[retro] {epoch_tag} Cycle {args.cycle} {status}", end="")
if args.issue:
print(f" (#{args.issue} {args.type})", end="")
if args.duration:

View File

@@ -18,13 +18,19 @@ Exit codes:
from __future__ import annotations
import json
import os
import sys
import time
import urllib.request
from pathlib import Path
REPO_ROOT = Path(__file__).resolve().parent.parent
QUEUE_FILE = REPO_ROOT / ".loop" / "queue.json"
IDLE_STATE_FILE = REPO_ROOT / ".loop" / "idle_state.json"
TOKEN_FILE = Path.home() / ".hermes" / "gitea_token"
GITEA_API = os.environ.get("GITEA_API", "http://localhost:3000/api/v1")
REPO_SLUG = os.environ.get("REPO_SLUG", "rockachopa/Timmy-time-dashboard")
# Backoff sequence: 60s, 120s, 240s, 600s max
BACKOFF_BASE = 60
@@ -32,19 +38,81 @@ BACKOFF_MAX = 600
BACKOFF_MULTIPLIER = 2
def _get_token() -> str:
"""Read Gitea token from env or file."""
token = os.environ.get("GITEA_TOKEN", "").strip()
if not token and TOKEN_FILE.exists():
token = TOKEN_FILE.read_text().strip()
return token
def _fetch_open_issue_numbers() -> set[int] | None:
"""Fetch open issue numbers from Gitea. Returns None on failure."""
token = _get_token()
if not token:
return None
try:
numbers: set[int] = set()
page = 1
while True:
url = (
f"{GITEA_API}/repos/{REPO_SLUG}/issues"
f"?state=open&type=issues&limit=50&page={page}"
)
req = urllib.request.Request(url, headers={
"Authorization": f"token {token}",
"Accept": "application/json",
})
with urllib.request.urlopen(req, timeout=10) as resp:
data = json.loads(resp.read())
if not data:
break
for issue in data:
numbers.add(issue["number"])
if len(data) < 50:
break
page += 1
return numbers
except Exception:
return None
def load_queue() -> list[dict]:
"""Load queue.json and return ready items."""
"""Load queue.json and return ready items, filtering out closed issues."""
if not QUEUE_FILE.exists():
return []
try:
data = json.loads(QUEUE_FILE.read_text())
if isinstance(data, list):
return [item for item in data if item.get("ready")]
return []
if not isinstance(data, list):
return []
ready = [item for item in data if item.get("ready")]
if not ready:
return []
# Filter out issues that are no longer open (auto-hygiene)
open_numbers = _fetch_open_issue_numbers()
if open_numbers is not None:
before = len(ready)
ready = [item for item in ready if item.get("issue") in open_numbers]
removed = before - len(ready)
if removed > 0:
print(f"[loop-guard] Filtered {removed} closed issue(s) from queue")
# Persist the cleaned queue so stale entries don't recur
_save_cleaned_queue(data, open_numbers)
return ready
except (json.JSONDecodeError, OSError):
return []
def _save_cleaned_queue(full_queue: list[dict], open_numbers: set[int]) -> None:
"""Rewrite queue.json without closed issues."""
cleaned = [item for item in full_queue if item.get("issue") in open_numbers]
try:
QUEUE_FILE.write_text(json.dumps(cleaned, indent=2) + "\n")
except OSError:
pass
def load_idle_state() -> dict:
"""Load persistent idle state."""
if not IDLE_STATE_FILE.exists():

407
scripts/loop_introspect.py Normal file
View File

@@ -0,0 +1,407 @@
#!/usr/bin/env python3
"""Loop introspection — the self-improvement engine.
Analyzes retro data across time windows to detect trends, extract patterns,
and produce structured recommendations. Output is consumed by deep_triage
and injected into the loop prompt context.
This is the piece that closes the feedback loop:
cycle_retro → introspect → deep_triage → loop behavior changes
Run: python3 scripts/loop_introspect.py
Output: .loop/retro/insights.json (structured insights + recommendations)
Prints human-readable summary to stdout.
Called by: deep_triage.sh (before the LLM triage), timmy-loop.sh (every 50 cycles)
"""
from __future__ import annotations
import json
import sys
from collections import defaultdict
from datetime import datetime, timezone, timedelta
from pathlib import Path
REPO_ROOT = Path(__file__).resolve().parent.parent
CYCLES_FILE = REPO_ROOT / ".loop" / "retro" / "cycles.jsonl"
DEEP_TRIAGE_FILE = REPO_ROOT / ".loop" / "retro" / "deep-triage.jsonl"
TRIAGE_FILE = REPO_ROOT / ".loop" / "retro" / "triage.jsonl"
QUARANTINE_FILE = REPO_ROOT / ".loop" / "quarantine.json"
INSIGHTS_FILE = REPO_ROOT / ".loop" / "retro" / "insights.json"
# ── Helpers ──────────────────────────────────────────────────────────────
def load_jsonl(path: Path) -> list[dict]:
"""Load a JSONL file, skipping bad lines."""
if not path.exists():
return []
entries = []
for line in path.read_text().strip().splitlines():
try:
entries.append(json.loads(line))
except (json.JSONDecodeError, ValueError):
continue
return entries
def parse_ts(ts_str: str) -> datetime | None:
"""Parse an ISO timestamp, tolerating missing tz."""
if not ts_str:
return None
try:
dt = datetime.fromisoformat(ts_str.replace("Z", "+00:00"))
if dt.tzinfo is None:
dt = dt.replace(tzinfo=timezone.utc)
return dt
except (ValueError, TypeError):
return None
def window(entries: list[dict], days: int) -> list[dict]:
"""Filter entries to the last N days."""
cutoff = datetime.now(timezone.utc) - timedelta(days=days)
result = []
for e in entries:
ts = parse_ts(e.get("timestamp", ""))
if ts and ts >= cutoff:
result.append(e)
return result
# ── Analysis functions ───────────────────────────────────────────────────
def compute_trends(cycles: list[dict]) -> dict:
"""Compare recent window (last 7d) vs older window (7-14d ago)."""
recent = window(cycles, 7)
older = window(cycles, 14)
# Remove recent from older to get the 7-14d window
recent_set = {(e.get("cycle"), e.get("timestamp")) for e in recent}
older = [e for e in older if (e.get("cycle"), e.get("timestamp")) not in recent_set]
def stats(entries):
if not entries:
return {"count": 0, "success_rate": None, "avg_duration": None,
"lines_net": 0, "prs_merged": 0}
successes = sum(1 for e in entries if e.get("success"))
durations = [e["duration"] for e in entries if e.get("duration", 0) > 0]
return {
"count": len(entries),
"success_rate": round(successes / len(entries), 3) if entries else None,
"avg_duration": round(sum(durations) / len(durations)) if durations else None,
"lines_net": sum(e.get("lines_added", 0) - e.get("lines_removed", 0) for e in entries),
"prs_merged": sum(1 for e in entries if e.get("pr")),
}
recent_stats = stats(recent)
older_stats = stats(older)
trend = {
"recent_7d": recent_stats,
"previous_7d": older_stats,
"velocity_change": None,
"success_rate_change": None,
"duration_change": None,
}
if recent_stats["count"] and older_stats["count"]:
trend["velocity_change"] = recent_stats["count"] - older_stats["count"]
if recent_stats["success_rate"] is not None and older_stats["success_rate"] is not None:
trend["success_rate_change"] = round(
recent_stats["success_rate"] - older_stats["success_rate"], 3
)
if recent_stats["avg_duration"] is not None and older_stats["avg_duration"] is not None:
trend["duration_change"] = recent_stats["avg_duration"] - older_stats["avg_duration"]
return trend
def type_analysis(cycles: list[dict]) -> dict:
"""Per-type success rates and durations."""
by_type: dict[str, list[dict]] = defaultdict(list)
for c in cycles:
by_type[c.get("type", "unknown")].append(c)
result = {}
for t, entries in by_type.items():
durations = [e["duration"] for e in entries if e.get("duration", 0) > 0]
successes = sum(1 for e in entries if e.get("success"))
result[t] = {
"count": len(entries),
"success_rate": round(successes / len(entries), 3) if entries else 0,
"avg_duration": round(sum(durations) / len(durations)) if durations else 0,
"max_duration": max(durations) if durations else 0,
}
return result
def repeat_failures(cycles: list[dict]) -> list[dict]:
"""Issues that have failed multiple times — quarantine candidates."""
failures: dict[int, list] = defaultdict(list)
for c in cycles:
if not c.get("success") and c.get("issue"):
failures[c["issue"]].append({
"cycle": c.get("cycle"),
"reason": c.get("reason", ""),
"duration": c.get("duration", 0),
})
# Only issues with 2+ failures
return [
{"issue": k, "failure_count": len(v), "attempts": v}
for k, v in sorted(failures.items(), key=lambda x: -len(x[1]))
if len(v) >= 2
]
def duration_outliers(cycles: list[dict], threshold_multiple: float = 3.0) -> list[dict]:
"""Cycles that took way longer than average — something went wrong."""
durations = [c["duration"] for c in cycles if c.get("duration", 0) > 0]
if len(durations) < 5:
return []
avg = sum(durations) / len(durations)
threshold = avg * threshold_multiple
outliers = []
for c in cycles:
dur = c.get("duration", 0)
if dur > threshold:
outliers.append({
"cycle": c.get("cycle"),
"issue": c.get("issue"),
"type": c.get("type"),
"duration": dur,
"avg_duration": round(avg),
"multiple": round(dur / avg, 1) if avg > 0 else 0,
"reason": c.get("reason", ""),
})
return outliers
def triage_effectiveness(deep_triages: list[dict]) -> dict:
"""How well is the deep triage performing?"""
if not deep_triages:
return {"runs": 0, "note": "No deep triage data yet"}
total_reviewed = sum(d.get("issues_reviewed", 0) for d in deep_triages)
total_refined = sum(len(d.get("issues_refined", [])) for d in deep_triages)
total_created = sum(len(d.get("issues_created", [])) for d in deep_triages)
total_closed = sum(len(d.get("issues_closed", [])) for d in deep_triages)
timmy_available = sum(1 for d in deep_triages if d.get("timmy_available"))
# Extract Timmy's feedback themes
timmy_themes = []
for d in deep_triages:
fb = d.get("timmy_feedback", "")
if fb:
timmy_themes.append(fb[:200])
return {
"runs": len(deep_triages),
"total_reviewed": total_reviewed,
"total_refined": total_refined,
"total_created": total_created,
"total_closed": total_closed,
"timmy_consultation_rate": round(timmy_available / len(deep_triages), 2),
"timmy_recent_feedback": timmy_themes[-1] if timmy_themes else "",
"timmy_feedback_history": timmy_themes,
}
def generate_recommendations(
trends: dict,
types: dict,
repeats: list,
outliers: list,
triage_eff: dict,
) -> list[dict]:
"""Produce actionable recommendations from the analysis."""
recs = []
# 1. Success rate declining?
src = trends.get("success_rate_change")
if src is not None and src < -0.1:
recs.append({
"severity": "high",
"category": "reliability",
"finding": f"Success rate dropped {abs(src)*100:.0f}pp in the last 7 days",
"recommendation": "Review recent failures. Are issues poorly scoped? "
"Is main unstable? Check if triage is producing bad work items.",
})
# 2. Velocity dropping?
vc = trends.get("velocity_change")
if vc is not None and vc < -5:
recs.append({
"severity": "medium",
"category": "throughput",
"finding": f"Velocity dropped by {abs(vc)} cycles vs previous week",
"recommendation": "Check for loop stalls, long-running cycles, or queue starvation.",
})
# 3. Duration creep?
dc = trends.get("duration_change")
if dc is not None and dc > 120: # 2+ minutes longer
recs.append({
"severity": "medium",
"category": "efficiency",
"finding": f"Average cycle duration increased by {dc}s vs previous week",
"recommendation": "Issues may be growing in scope. Enforce tighter decomposition "
"in deep triage. Check if tests are getting slower.",
})
# 4. Type-specific problems
for t, info in types.items():
if info["count"] >= 3 and info["success_rate"] < 0.5:
recs.append({
"severity": "high",
"category": "type_reliability",
"finding": f"'{t}' issues fail {(1-info['success_rate'])*100:.0f}% of the time "
f"({info['count']} attempts)",
"recommendation": f"'{t}' issues need better scoping or different approach. "
f"Consider: tighter acceptance criteria, smaller scope, "
f"or delegating to Kimi with more context.",
})
if info["avg_duration"] > 600 and info["count"] >= 3: # >10 min avg
recs.append({
"severity": "medium",
"category": "type_efficiency",
"finding": f"'{t}' issues average {info['avg_duration']//60}m{info['avg_duration']%60}s "
f"(max {info['max_duration']//60}m)",
"recommendation": f"Break '{t}' issues into smaller pieces. Target <5 min per cycle.",
})
# 5. Repeat failures
for rf in repeats[:3]:
recs.append({
"severity": "high",
"category": "repeat_failure",
"finding": f"Issue #{rf['issue']} has failed {rf['failure_count']} times",
"recommendation": "Quarantine or rewrite this issue. Repeated failure = "
"bad scope or missing prerequisite.",
})
# 6. Outliers
if len(outliers) > 2:
recs.append({
"severity": "medium",
"category": "outliers",
"finding": f"{len(outliers)} cycles took {outliers[0].get('multiple', '?')}x+ "
f"longer than average",
"recommendation": "Long cycles waste resources. Add timeout enforcement or "
"break complex issues earlier.",
})
# 7. Code growth
recent = trends.get("recent_7d", {})
net = recent.get("lines_net", 0)
if net > 500:
recs.append({
"severity": "low",
"category": "code_health",
"finding": f"Net +{net} lines added in the last 7 days",
"recommendation": "Lines of code is a liability. Balance feature work with "
"refactoring. Target net-zero or negative line growth.",
})
# 8. Triage health
if triage_eff.get("runs", 0) == 0:
recs.append({
"severity": "high",
"category": "triage",
"finding": "Deep triage has never run",
"recommendation": "Enable deep triage (every 20 cycles). The loop needs "
"LLM-driven issue refinement to stay effective.",
})
# No recommendations = things are healthy
if not recs:
recs.append({
"severity": "info",
"category": "health",
"finding": "No significant issues detected",
"recommendation": "System is healthy. Continue current patterns.",
})
return recs
# ── Main ─────────────────────────────────────────────────────────────────
def main() -> None:
cycles = load_jsonl(CYCLES_FILE)
deep_triages = load_jsonl(DEEP_TRIAGE_FILE)
if not cycles:
print("[introspect] No cycle data found. Nothing to analyze.")
return
# Run all analyses
trends = compute_trends(cycles)
types = type_analysis(cycles)
repeats = repeat_failures(cycles)
outliers = duration_outliers(cycles)
triage_eff = triage_effectiveness(deep_triages)
recommendations = generate_recommendations(trends, types, repeats, outliers, triage_eff)
insights = {
"generated_at": datetime.now(timezone.utc).isoformat(),
"total_cycles_analyzed": len(cycles),
"trends": trends,
"by_type": types,
"repeat_failures": repeats[:5],
"duration_outliers": outliers[:5],
"triage_effectiveness": triage_eff,
"recommendations": recommendations,
}
# Write insights
INSIGHTS_FILE.parent.mkdir(parents=True, exist_ok=True)
INSIGHTS_FILE.write_text(json.dumps(insights, indent=2) + "\n")
# Current epoch from latest entry
latest_epoch = ""
for c in reversed(cycles):
if c.get("epoch"):
latest_epoch = c["epoch"]
break
# Human-readable output
header = f"[introspect] Analyzed {len(cycles)} cycles"
if latest_epoch:
header += f" · current epoch: {latest_epoch}"
print(header)
print(f"\n TRENDS (7d vs previous 7d):")
r7 = trends["recent_7d"]
p7 = trends["previous_7d"]
print(f" Cycles: {r7['count']:>3d} (was {p7['count']})")
if r7["success_rate"] is not None:
arrow = "" if (trends["success_rate_change"] or 0) > 0 else "" if (trends["success_rate_change"] or 0) < 0 else ""
print(f" Success rate: {r7['success_rate']*100:>4.0f}% {arrow}")
if r7["avg_duration"] is not None:
print(f" Avg duration: {r7['avg_duration']//60}m{r7['avg_duration']%60:02d}s")
print(f" PRs merged: {r7['prs_merged']:>3d} (was {p7['prs_merged']})")
print(f" Lines net: {r7['lines_net']:>+5d}")
print(f"\n BY TYPE:")
for t, info in sorted(types.items(), key=lambda x: -x[1]["count"]):
print(f" {t:12s} n={info['count']:>2d} "
f"ok={info['success_rate']*100:>3.0f}% "
f"avg={info['avg_duration']//60}m{info['avg_duration']%60:02d}s")
if repeats:
print(f"\n REPEAT FAILURES:")
for rf in repeats[:3]:
print(f" #{rf['issue']} failed {rf['failure_count']}x")
print(f"\n RECOMMENDATIONS ({len(recommendations)}):")
for i, rec in enumerate(recommendations, 1):
sev = {"high": "🔴", "medium": "🟡", "low": "🟢", "info": " "}.get(rec["severity"], "?")
print(f" {sev} {rec['finding']}")
print(f"{rec['recommendation']}")
print(f"\n Written to: {INSIGHTS_FILE}")
if __name__ == "__main__":
main()

View File

@@ -10,6 +10,11 @@ from pydantic_settings import BaseSettings, SettingsConfigDict
APP_START_TIME: _datetime = _datetime.now(UTC)
def normalize_ollama_url(url: str) -> str:
"""Replace localhost with 127.0.0.1 to avoid IPv6 resolution delays."""
return url.replace("localhost", "127.0.0.1")
class Settings(BaseSettings):
"""Central configuration — all env-var access goes through this class."""
@@ -19,6 +24,11 @@ class Settings(BaseSettings):
# Ollama host — override with OLLAMA_URL env var or .env file
ollama_url: str = "http://localhost:11434"
@property
def normalized_ollama_url(self) -> str:
"""Return ollama_url with localhost replaced by 127.0.0.1."""
return normalize_ollama_url(self.ollama_url)
# LLM model passed to Agno/Ollama — override with OLLAMA_MODEL
# qwen3:30b is the primary model — better reasoning and tool calling
# than llama3.1:8b-instruct while still running locally on modest hardware.
@@ -64,17 +74,10 @@ class Settings(BaseSettings):
# Seconds to wait for user confirmation before auto-rejecting.
discord_confirm_timeout: int = 120
# ── AirLLM / backend selection ───────────────────────────────────────────
# ── Backend selection ────────────────────────────────────────────────────
# "ollama" — always use Ollama (default, safe everywhere)
# "airllm" — always use AirLLM (requires pip install ".[bigbrain]")
# "auto" — use AirLLM on Apple Silicon if airllm is installed,
# fall back to Ollama otherwise
timmy_model_backend: Literal["ollama", "airllm", "grok", "claude", "auto"] = "ollama"
# AirLLM model size when backend is airllm or auto.
# Larger = smarter, but needs more RAM / disk.
# 8b ~16 GB | 70b ~140 GB | 405b ~810 GB
airllm_model_size: Literal["8b", "70b", "405b"] = "70b"
# "auto" — pick best available local backend, fall back to Ollama
timmy_model_backend: Literal["ollama", "grok", "claude", "auto"] = "ollama"
# ── Grok (xAI) — opt-in premium cloud backend ────────────────────────
# Grok is a premium augmentation layer — local-first ethos preserved.
@@ -138,7 +141,12 @@ class Settings(BaseSettings):
# CORS allowed origins for the web chat interface (Gitea Pages, etc.)
# Set CORS_ORIGINS as a comma-separated list, e.g. "http://localhost:3000,https://example.com"
cors_origins: list[str] = ["*"]
cors_origins: list[str] = [
"http://localhost:3000",
"http://localhost:8000",
"http://127.0.0.1:3000",
"http://127.0.0.1:8000",
]
# Trusted hosts for the Host header check (TrustedHostMiddleware).
# Set TRUSTED_HOSTS as a comma-separated list. Wildcards supported (e.g. "*.ts.net").
@@ -246,6 +254,7 @@ class Settings(BaseSettings):
# When enabled, the agent starts an internal thought loop on server start.
thinking_enabled: bool = True
thinking_interval_seconds: int = 300 # 5 minutes between thoughts
thinking_timeout_seconds: int = 120 # max wall-clock time per thinking cycle
thinking_distill_every: int = 10 # distill facts from thoughts every Nth thought
thinking_issue_every: int = 20 # file Gitea issues from thoughts every Nth thought
thinking_memory_check_every: int = 50 # check memory status every Nth thought
@@ -394,7 +403,7 @@ def check_ollama_model_available(model_name: str) -> bool:
import json
import urllib.request
url = settings.ollama_url.replace("localhost", "127.0.0.1")
url = settings.normalized_ollama_url
req = urllib.request.Request(
f"{url}/api/tags",
method="GET",
@@ -471,8 +480,19 @@ def validate_startup(*, force: bool = False) -> None:
", ".join(_missing),
)
sys.exit(1)
if "*" in settings.cors_origins:
_startup_logger.error(
"PRODUCTION SECURITY ERROR: CORS wildcard '*' is not allowed "
"in production. Set CORS_ORIGINS to explicit origins."
)
sys.exit(1)
_startup_logger.info("Production mode: security secrets validated ✓")
else:
if "*" in settings.cors_origins:
_startup_logger.warning(
"SEC: CORS_ORIGINS contains wildcard '*'"
"restrict to explicit origins before deploying to production."
)
if not settings.l402_hmac_secret:
_startup_logger.warning(
"SEC: L402_HMAC_SECRET is not set — "

View File

@@ -46,6 +46,7 @@ from dashboard.routes.tasks import router as tasks_router
from dashboard.routes.telegram import router as telegram_router
from dashboard.routes.thinking import router as thinking_router
from dashboard.routes.tools import router as tools_router
from dashboard.routes.tower import router as tower_router
from dashboard.routes.voice import router as voice_router
from dashboard.routes.work_orders import router as work_orders_router
from dashboard.routes.world import router as world_router
@@ -155,7 +156,17 @@ async def _thinking_scheduler() -> None:
while True:
try:
if settings.thinking_enabled:
await thinking_engine.think_once()
await asyncio.wait_for(
thinking_engine.think_once(),
timeout=settings.thinking_timeout_seconds,
)
except TimeoutError:
logger.warning(
"Thinking cycle timed out after %ds — Ollama may be unresponsive",
settings.thinking_timeout_seconds,
)
except asyncio.CancelledError:
raise
except Exception as exc:
logger.error("Thinking scheduler error: %s", exc)
@@ -175,7 +186,10 @@ async def _loop_qa_scheduler() -> None:
while True:
try:
if settings.loop_qa_enabled:
result = await loop_qa_orchestrator.run_next_test()
result = await asyncio.wait_for(
loop_qa_orchestrator.run_next_test(),
timeout=settings.thinking_timeout_seconds,
)
if result:
status = "PASS" if result["success"] else "FAIL"
logger.info(
@@ -184,6 +198,13 @@ async def _loop_qa_scheduler() -> None:
status,
result.get("details", "")[:80],
)
except TimeoutError:
logger.warning(
"Loop QA test timed out after %ds",
settings.thinking_timeout_seconds,
)
except asyncio.CancelledError:
raise
except Exception as exc:
logger.error("Loop QA scheduler error: %s", exc)
@@ -329,33 +350,35 @@ async def _discord_token_watcher() -> None:
logger.warning("Discord auto-start failed: %s", exc)
@asynccontextmanager
async def lifespan(app: FastAPI):
"""Application lifespan manager with non-blocking startup."""
# Validate security config (no-op in test mode)
def _startup_init() -> None:
"""Validate config and enable event persistence."""
from config import validate_startup
validate_startup()
# Enable event persistence (unified EventBus + swarm event_log)
from infrastructure.events.bus import init_event_bus_persistence
init_event_bus_persistence()
# Create all background tasks without waiting for them
briefing_task = asyncio.create_task(_briefing_scheduler())
thinking_task = asyncio.create_task(_thinking_scheduler())
loop_qa_task = asyncio.create_task(_loop_qa_scheduler())
presence_task = asyncio.create_task(_presence_watcher())
# Initialize Spark Intelligence engine
from spark.engine import get_spark_engine
if get_spark_engine().enabled:
logger.info("Spark Intelligence active — event capture enabled")
# Auto-prune old vector store memories on startup
def _startup_background_tasks() -> list[asyncio.Task]:
"""Spawn all recurring background tasks (non-blocking)."""
return [
asyncio.create_task(_briefing_scheduler()),
asyncio.create_task(_thinking_scheduler()),
asyncio.create_task(_loop_qa_scheduler()),
asyncio.create_task(_presence_watcher()),
asyncio.create_task(_start_chat_integrations_background()),
]
def _startup_pruning() -> None:
"""Auto-prune old memories, thoughts, and events on startup."""
if settings.memory_prune_days > 0:
try:
from timmy.memory_system import prune_memories
@@ -373,7 +396,6 @@ async def lifespan(app: FastAPI):
except Exception as exc:
logger.debug("Memory auto-prune skipped: %s", exc)
# Auto-prune old thoughts on startup
if settings.thoughts_prune_days > 0:
try:
from timmy.thinking import thinking_engine
@@ -391,7 +413,6 @@ async def lifespan(app: FastAPI):
except Exception as exc:
logger.debug("Thought auto-prune skipped: %s", exc)
# Auto-prune old system events on startup
if settings.events_prune_days > 0:
try:
from swarm.event_log import prune_old_events
@@ -409,7 +430,6 @@ async def lifespan(app: FastAPI):
except Exception as exc:
logger.debug("Event auto-prune skipped: %s", exc)
# Warn if memory vault exceeds size limit
if settings.memory_vault_max_mb > 0:
try:
vault_path = Path(settings.repo_root) / "memory" / "notes"
@@ -425,37 +445,18 @@ async def lifespan(app: FastAPI):
except Exception as exc:
logger.debug("Vault size check skipped: %s", exc)
# Start Workshop presence heartbeat with WS relay
from dashboard.routes.world import broadcast_world_state
from timmy.workshop_state import WorkshopHeartbeat
workshop_heartbeat = WorkshopHeartbeat(on_change=broadcast_world_state)
await workshop_heartbeat.start()
# Start chat integrations in background
chat_task = asyncio.create_task(_start_chat_integrations_background())
# Register session logger with error capture (breaks infrastructure → timmy circular dep)
try:
from infrastructure.error_capture import register_error_recorder
from timmy.session_logger import get_session_logger
register_error_recorder(get_session_logger().record_error)
except Exception:
pass
logger.info("✓ Dashboard ready for requests")
yield
# Cleanup on shutdown
async def _shutdown_cleanup(
bg_tasks: list[asyncio.Task],
workshop_heartbeat,
) -> None:
"""Stop chat bots, MCP sessions, heartbeat, and cancel background tasks."""
from integrations.chat_bridge.vendors.discord import discord_bot
from integrations.telegram_bot.bot import telegram_bot
await discord_bot.stop()
await telegram_bot.stop()
# Close MCP tool server sessions
try:
from timmy.mcp_tools import close_mcp_sessions
@@ -465,13 +466,42 @@ async def lifespan(app: FastAPI):
await workshop_heartbeat.stop()
for task in [briefing_task, thinking_task, chat_task, loop_qa_task, presence_task]:
if task:
task.cancel()
try:
await task
except asyncio.CancelledError:
pass
for task in bg_tasks:
task.cancel()
try:
await task
except asyncio.CancelledError:
pass
@asynccontextmanager
async def lifespan(app: FastAPI):
"""Application lifespan manager with non-blocking startup."""
_startup_init()
bg_tasks = _startup_background_tasks()
_startup_pruning()
# Start Workshop presence heartbeat with WS relay
from dashboard.routes.world import broadcast_world_state
from timmy.workshop_state import WorkshopHeartbeat
workshop_heartbeat = WorkshopHeartbeat(on_change=broadcast_world_state)
await workshop_heartbeat.start()
# Register session logger with error capture
try:
from infrastructure.error_capture import register_error_recorder
from timmy.session_logger import get_session_logger
register_error_recorder(get_session_logger().record_error)
except Exception:
logger.debug("Failed to register error recorder")
logger.info("✓ Dashboard ready for requests")
yield
await _shutdown_cleanup(bg_tasks, workshop_heartbeat)
app = FastAPI(
@@ -484,15 +514,14 @@ app = FastAPI(
def _get_cors_origins() -> list[str]:
"""Get CORS origins from settings, with sensible defaults."""
"""Get CORS origins from settings, rejecting wildcards in production."""
origins = settings.cors_origins
if settings.debug and origins == ["*"]:
return [
"http://localhost:3000",
"http://localhost:8000",
"http://127.0.0.1:3000",
"http://127.0.0.1:8000",
]
if "*" in origins and not settings.debug:
logger.warning(
"Wildcard '*' in CORS_ORIGINS stripped in production — "
"set explicit origins via CORS_ORIGINS env var"
)
origins = [o for o in origins if o != "*"]
return origins
@@ -555,6 +584,7 @@ app.include_router(system_router)
app.include_router(experiments_router)
app.include_router(db_explorer_router)
app.include_router(world_router)
app.include_router(tower_router)
@app.websocket("/ws")

View File

@@ -100,7 +100,7 @@ class CSRFMiddleware(BaseHTTPMiddleware):
...
Usage:
app.add_middleware(CSRFMiddleware, secret="your-secret-key")
app.add_middleware(CSRFMiddleware, secret=settings.csrf_secret)
Attributes:
secret: Secret key for token signing (optional, for future use).

View File

@@ -71,27 +71,87 @@ async def clear_history(request: Request):
)
@router.post("/default/chat", response_class=HTMLResponse)
async def chat_agent(request: Request, message: str = Form(...)):
"""Chat — synchronous response with native Agno tool confirmation."""
def _validate_message(message: str) -> str:
"""Strip and validate chat input; raise HTTPException on bad input."""
from fastapi import HTTPException
message = message.strip()
if not message:
from fastapi import HTTPException
raise HTTPException(status_code=400, detail="Message cannot be empty")
if len(message) > MAX_MESSAGE_LENGTH:
from fastapi import HTTPException
raise HTTPException(status_code=422, detail="Message too long")
return message
# Record user activity so the thinking engine knows we're not idle
def _record_user_activity() -> None:
"""Notify the thinking engine that the user is active."""
try:
from timmy.thinking import thinking_engine
thinking_engine.record_user_input()
except Exception:
pass
logger.debug("Failed to record user input for thinking engine")
def _extract_tool_actions(run_output) -> list[dict]:
"""If Agno paused the run for tool confirmation, build approval items."""
from timmy.approvals import create_item
tool_actions: list[dict] = []
status = getattr(run_output, "status", None)
is_paused = status == "PAUSED" or str(status) == "RunStatus.paused"
if not (is_paused and getattr(run_output, "active_requirements", None)):
return tool_actions
for req in run_output.active_requirements:
if not getattr(req, "needs_confirmation", False):
continue
te = req.tool_execution
tool_name = getattr(te, "tool_name", "unknown")
tool_args = getattr(te, "tool_args", {}) or {}
item = create_item(
title=f"Dashboard: {tool_name}",
description=format_action_description(tool_name, tool_args),
proposed_action=json.dumps({"tool": tool_name, "args": tool_args}),
impact=get_impact_level(tool_name),
)
_pending_runs[item.id] = {
"run_output": run_output,
"requirement": req,
"tool_name": tool_name,
"tool_args": tool_args,
}
tool_actions.append(
{
"approval_id": item.id,
"tool_name": tool_name,
"description": format_action_description(tool_name, tool_args),
"impact": get_impact_level(tool_name),
}
)
return tool_actions
def _log_exchange(
message: str, response_text: str | None, error_text: str | None, timestamp: str
) -> None:
"""Append user message and agent/error reply to the in-memory log."""
message_log.append(role="user", content=message, timestamp=timestamp, source="browser")
if response_text:
message_log.append(
role="agent", content=response_text, timestamp=timestamp, source="browser"
)
elif error_text:
message_log.append(role="error", content=error_text, timestamp=timestamp, source="browser")
@router.post("/default/chat", response_class=HTMLResponse)
async def chat_agent(request: Request, message: str = Form(...)):
"""Chat — synchronous response with native Agno tool confirmation."""
message = _validate_message(message)
_record_user_activity()
timestamp = datetime.now().strftime("%H:%M:%S")
response_text = None
@@ -104,54 +164,15 @@ async def chat_agent(request: Request, message: str = Form(...)):
error_text = f"Chat error: {exc}"
run_output = None
# Check if Agno paused the run for tool confirmation
tool_actions = []
tool_actions: list[dict] = []
if run_output is not None:
status = getattr(run_output, "status", None)
is_paused = status == "PAUSED" or str(status) == "RunStatus.paused"
if is_paused and getattr(run_output, "active_requirements", None):
for req in run_output.active_requirements:
if getattr(req, "needs_confirmation", False):
te = req.tool_execution
tool_name = getattr(te, "tool_name", "unknown")
tool_args = getattr(te, "tool_args", {}) or {}
from timmy.approvals import create_item
item = create_item(
title=f"Dashboard: {tool_name}",
description=format_action_description(tool_name, tool_args),
proposed_action=json.dumps({"tool": tool_name, "args": tool_args}),
impact=get_impact_level(tool_name),
)
_pending_runs[item.id] = {
"run_output": run_output,
"requirement": req,
"tool_name": tool_name,
"tool_args": tool_args,
}
tool_actions.append(
{
"approval_id": item.id,
"tool_name": tool_name,
"description": format_action_description(tool_name, tool_args),
"impact": get_impact_level(tool_name),
}
)
tool_actions = _extract_tool_actions(run_output)
raw_content = run_output.content if hasattr(run_output, "content") else ""
response_text = _clean_response(raw_content or "")
if not response_text and not tool_actions:
response_text = None # let error template show if needed
response_text = None
message_log.append(role="user", content=message, timestamp=timestamp, source="browser")
if response_text:
message_log.append(
role="agent", content=response_text, timestamp=timestamp, source="browser"
)
elif error_text:
message_log.append(role="error", content=error_text, timestamp=timestamp, source="browser")
_log_exchange(message, response_text, error_text, timestamp)
return templates.TemplateResponse(
request,

View File

@@ -19,14 +19,17 @@ router = APIRouter(tags=["calm"])
# Helper functions for state machine logic
def get_now_task(db: Session) -> Task | None:
"""Return the single active NOW task, or None."""
return db.query(Task).filter(Task.state == TaskState.NOW).first()
def get_next_task(db: Session) -> Task | None:
"""Return the single queued NEXT task, or None."""
return db.query(Task).filter(Task.state == TaskState.NEXT).first()
def get_later_tasks(db: Session) -> list[Task]:
"""Return all LATER tasks ordered by MIT flag then sort_order."""
return (
db.query(Task)
.filter(Task.state == TaskState.LATER)
@@ -36,6 +39,12 @@ def get_later_tasks(db: Session) -> list[Task]:
def promote_tasks(db: Session):
"""Enforce the NOW/NEXT/LATER state machine invariants.
- At most one NOW task (extras demoted to NEXT).
- If no NOW, promote NEXT -> NOW.
- If no NEXT, promote highest-priority LATER -> NEXT.
"""
# Ensure only one NOW task exists. If multiple, demote extras to NEXT.
now_tasks = db.query(Task).filter(Task.state == TaskState.NOW).all()
if len(now_tasks) > 1:
@@ -74,6 +83,7 @@ def promote_tasks(db: Session):
# Endpoints
@router.get("/calm", response_class=HTMLResponse)
async def get_calm_view(request: Request, db: Session = Depends(get_db)):
"""Render the main CALM dashboard with NOW/NEXT/LATER counts."""
now_task = get_now_task(db)
next_task = get_next_task(db)
later_tasks_count = len(get_later_tasks(db))
@@ -90,6 +100,7 @@ async def get_calm_view(request: Request, db: Session = Depends(get_db)):
@router.get("/calm/ritual/morning", response_class=HTMLResponse)
async def get_morning_ritual_form(request: Request):
"""Render the morning ritual intake form."""
return templates.TemplateResponse(request, "calm/morning_ritual_form.html", {})
@@ -102,6 +113,7 @@ async def post_morning_ritual(
mit3_title: str = Form(None),
other_tasks: str = Form(""),
):
"""Process morning ritual: create MITs, other tasks, and set initial states."""
# Create Journal Entry
mit_task_ids = []
journal_entry = JournalEntry(entry_date=date.today())
@@ -173,6 +185,7 @@ async def post_morning_ritual(
@router.get("/calm/ritual/evening", response_class=HTMLResponse)
async def get_evening_ritual_form(request: Request, db: Session = Depends(get_db)):
"""Render the evening ritual form for today's journal entry."""
journal_entry = db.query(JournalEntry).filter(JournalEntry.entry_date == date.today()).first()
if not journal_entry:
raise HTTPException(status_code=404, detail="No journal entry for today")
@@ -189,6 +202,7 @@ async def post_evening_ritual(
gratitude: str = Form(None),
energy_level: int = Form(None),
):
"""Process evening ritual: save reflection/gratitude, archive active tasks."""
journal_entry = db.query(JournalEntry).filter(JournalEntry.entry_date == date.today()).first()
if not journal_entry:
raise HTTPException(status_code=404, detail="No journal entry for today")
@@ -223,6 +237,7 @@ async def create_new_task(
is_mit: bool = Form(False),
certainty: TaskCertainty = Form(TaskCertainty.SOFT),
):
"""Create a new task in LATER state and return updated count."""
task = Task(
title=title,
description=description,
@@ -247,6 +262,7 @@ async def start_task(
task_id: int,
db: Session = Depends(get_db),
):
"""Move a task to NOW state, demoting the current NOW to NEXT."""
current_now_task = get_now_task(db)
if current_now_task and current_now_task.id != task_id:
current_now_task.state = TaskState.NEXT # Demote current NOW to NEXT
@@ -281,6 +297,7 @@ async def complete_task(
task_id: int,
db: Session = Depends(get_db),
):
"""Mark a task as DONE and trigger state promotion."""
task = db.query(Task).filter(Task.id == task_id).first()
if not task:
raise HTTPException(status_code=404, detail="Task not found")
@@ -309,6 +326,7 @@ async def defer_task(
task_id: int,
db: Session = Depends(get_db),
):
"""Defer a task and trigger state promotion."""
task = db.query(Task).filter(Task.id == task_id).first()
if not task:
raise HTTPException(status_code=404, detail="Task not found")
@@ -333,6 +351,7 @@ async def defer_task(
@router.get("/calm/partials/later_tasks_list", response_class=HTMLResponse)
async def get_later_tasks_list(request: Request, db: Session = Depends(get_db)):
"""Render the expandable list of LATER tasks."""
later_tasks = get_later_tasks(db)
return templates.TemplateResponse(
"calm/partials/later_tasks_list.html",
@@ -348,6 +367,7 @@ async def reorder_tasks(
later_task_ids: str = Form(""),
next_task_id: int | None = Form(None),
):
"""Reorder LATER tasks and optionally promote one to NEXT."""
# Reorder LATER tasks
if later_task_ids:
ids_in_order = [int(x.strip()) for x in later_task_ids.split(",") if x.strip()]

View File

@@ -31,6 +31,93 @@ _UPLOAD_DIR = str(Path(settings.repo_root) / "data" / "chat-uploads")
_MAX_UPLOAD_SIZE = 50 * 1024 * 1024 # 50 MB
# ── POST /api/chat — helpers ─────────────────────────────────────────────────
async def _parse_chat_body(request: Request) -> tuple[dict | None, JSONResponse | None]:
"""Parse and validate the JSON request body.
Returns (body, None) on success or (None, error_response) on failure.
"""
content_length = request.headers.get("content-length")
if content_length and int(content_length) > settings.chat_api_max_body_bytes:
return None, JSONResponse(status_code=413, content={"error": "Request body too large"})
try:
body = await request.json()
except Exception as exc:
logger.warning("Chat API JSON parse error: %s", exc)
return None, JSONResponse(status_code=400, content={"error": "Invalid JSON"})
messages = body.get("messages")
if not messages or not isinstance(messages, list):
return None, JSONResponse(status_code=400, content={"error": "messages array is required"})
return body, None
def _extract_user_message(messages: list[dict]) -> str | None:
"""Return the text of the last user message, or *None* if absent."""
for msg in reversed(messages):
if msg.get("role") == "user":
content = msg.get("content", "")
if isinstance(content, list):
text_parts = [
p.get("text", "")
for p in content
if isinstance(p, dict) and p.get("type") == "text"
]
return " ".join(text_parts).strip() or None
text = str(content).strip()
return text or None
return None
def _build_context_prefix() -> str:
"""Build the system-context preamble injected before the user message."""
now = datetime.now()
return (
f"[System: Current date/time is "
f"{now.strftime('%A, %B %d, %Y at %I:%M %p')}]\n"
f"[System: Mobile client]\n\n"
)
def _notify_thinking_engine() -> None:
"""Record user activity so the thinking engine knows we're not idle."""
try:
from timmy.thinking import thinking_engine
thinking_engine.record_user_input()
except Exception:
logger.debug("Failed to record user input for thinking engine")
async def _process_chat(user_msg: str) -> dict | JSONResponse:
"""Send *user_msg* to the agent, log the exchange, and return a response."""
_notify_thinking_engine()
timestamp = datetime.now().strftime("%H:%M:%S")
try:
response_text = await agent_chat(
_build_context_prefix() + user_msg,
session_id="mobile",
)
message_log.append(role="user", content=user_msg, timestamp=timestamp, source="api")
message_log.append(role="agent", content=response_text, timestamp=timestamp, source="api")
return {"reply": response_text, "timestamp": timestamp}
except Exception as exc:
error_msg = f"Agent is offline: {exc}"
logger.error("api_chat error: %s", exc)
message_log.append(role="user", content=user_msg, timestamp=timestamp, source="api")
message_log.append(role="error", content=error_msg, timestamp=timestamp, source="api")
return JSONResponse(
status_code=503,
content={"error": error_msg, "timestamp": timestamp},
)
# ── POST /api/chat ────────────────────────────────────────────────────────────
@@ -44,78 +131,15 @@ async def api_chat(request: Request):
Response:
{"reply": "...", "timestamp": "HH:MM:SS"}
"""
# Enforce request body size limit
content_length = request.headers.get("content-length")
if content_length and int(content_length) > settings.chat_api_max_body_bytes:
return JSONResponse(status_code=413, content={"error": "Request body too large"})
body, err = await _parse_chat_body(request)
if err:
return err
try:
body = await request.json()
except Exception as exc:
logger.warning("Chat API JSON parse error: %s", exc)
return JSONResponse(status_code=400, content={"error": "Invalid JSON"})
messages = body.get("messages")
if not messages or not isinstance(messages, list):
return JSONResponse(status_code=400, content={"error": "messages array is required"})
# Extract the latest user message text
last_user_msg = None
for msg in reversed(messages):
if msg.get("role") == "user":
content = msg.get("content", "")
# Handle multimodal content arrays — extract text parts
if isinstance(content, list):
text_parts = [
p.get("text", "")
for p in content
if isinstance(p, dict) and p.get("type") == "text"
]
last_user_msg = " ".join(text_parts).strip()
else:
last_user_msg = str(content).strip()
break
if not last_user_msg:
user_msg = _extract_user_message(body["messages"])
if not user_msg:
return JSONResponse(status_code=400, content={"error": "No user message found"})
# Record user activity so the thinking engine knows we're not idle
try:
from timmy.thinking import thinking_engine
thinking_engine.record_user_input()
except Exception:
pass
timestamp = datetime.now().strftime("%H:%M:%S")
try:
# Inject context (same pattern as the HTMX chat handler in agents.py)
now = datetime.now()
context_prefix = (
f"[System: Current date/time is "
f"{now.strftime('%A, %B %d, %Y at %I:%M %p')}]\n"
f"[System: Mobile client]\n\n"
)
response_text = await agent_chat(
context_prefix + last_user_msg,
session_id="mobile",
)
message_log.append(role="user", content=last_user_msg, timestamp=timestamp, source="api")
message_log.append(role="agent", content=response_text, timestamp=timestamp, source="api")
return {"reply": response_text, "timestamp": timestamp}
except Exception as exc:
error_msg = f"Agent is offline: {exc}"
logger.error("api_chat error: %s", exc)
message_log.append(role="user", content=last_user_msg, timestamp=timestamp, source="api")
message_log.append(role="error", content=error_msg, timestamp=timestamp, source="api")
return JSONResponse(
status_code=503,
content={"error": error_msg, "timestamp": timestamp},
)
return await _process_chat(user_msg)
# ── POST /api/upload ──────────────────────────────────────────────────────────

View File

@@ -65,7 +65,7 @@ def _check_ollama_sync() -> DependencyStatus:
try:
import urllib.request
url = settings.ollama_url.replace("localhost", "127.0.0.1")
url = settings.normalized_ollama_url
req = urllib.request.Request(
f"{url}/api/tags",
method="GET",

View File

@@ -16,52 +16,11 @@ router = APIRouter(tags=["system"])
@router.get("/lightning/ledger", response_class=HTMLResponse)
async def lightning_ledger(request: Request):
"""Ledger and balance page."""
# Mock data for now, as this seems to be a UI-first feature
balance = {
"available_sats": 1337,
"incoming_total_sats": 2000,
"outgoing_total_sats": 663,
"fees_paid_sats": 5,
"net_sats": 1337,
"pending_incoming_sats": 0,
"pending_outgoing_sats": 0,
}
"""Ledger and balance page backed by the in-memory Lightning ledger."""
from lightning.ledger import get_balance, get_transactions
# Mock transactions
from collections import namedtuple
from enum import Enum
class TxType(Enum):
incoming = "incoming"
outgoing = "outgoing"
class TxStatus(Enum):
completed = "completed"
pending = "pending"
Tx = namedtuple(
"Tx", ["tx_type", "status", "amount_sats", "payment_hash", "memo", "created_at"]
)
transactions = [
Tx(
TxType.outgoing,
TxStatus.completed,
50,
"hash1",
"Model inference",
"2026-03-04 10:00:00",
),
Tx(
TxType.incoming,
TxStatus.completed,
1000,
"hash2",
"Manual deposit",
"2026-03-03 15:00:00",
),
]
balance = get_balance()
transactions = get_transactions()
return templates.TemplateResponse(
request,
@@ -70,7 +29,7 @@ async def lightning_ledger(request: Request):
"balance": balance,
"transactions": transactions,
"tx_types": ["incoming", "outgoing"],
"tx_statuses": ["completed", "pending"],
"tx_statuses": ["pending", "settled", "failed", "expired"],
"filter_type": None,
"filter_status": None,
"stats": {},
@@ -166,7 +125,7 @@ async def api_briefing_status():
if cached:
last_generated = cached.generated_at.isoformat()
except Exception:
pass
logger.debug("Failed to read briefing cache")
return JSONResponse(
{
@@ -190,6 +149,7 @@ async def api_memory_status():
stats = get_memory_stats()
indexed_files = stats.get("total_entries", 0)
except Exception:
logger.debug("Failed to get memory stats")
indexed_files = 0
return JSONResponse(
@@ -215,7 +175,7 @@ async def api_swarm_status():
).fetchone()
pending_tasks = row["cnt"] if row else 0
except Exception:
pass
logger.debug("Failed to count pending tasks")
return JSONResponse(
{

View File

@@ -0,0 +1,108 @@
"""Tower dashboard — real-time Spark visualization via WebSocket.
GET /tower — HTML Tower dashboard (Thinking / Predicting / Advising)
WS /tower/ws — WebSocket stream of Spark engine state updates
"""
import asyncio
import json
import logging
from fastapi import APIRouter, Request, WebSocket
from fastapi.responses import HTMLResponse
from dashboard.templating import templates
from spark.engine import spark_engine
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/tower", tags=["tower"])
_PUSH_INTERVAL = 5 # seconds between state broadcasts
def _spark_snapshot() -> dict:
"""Build a JSON-serialisable snapshot of Spark state."""
status = spark_engine.status()
timeline = spark_engine.get_timeline(limit=10)
events = []
for ev in timeline:
entry = {
"event_type": ev.event_type,
"description": ev.description,
"importance": ev.importance,
"created_at": ev.created_at,
}
if ev.agent_id:
entry["agent_id"] = ev.agent_id[:8]
if ev.task_id:
entry["task_id"] = ev.task_id[:8]
try:
entry["data"] = json.loads(ev.data)
except (json.JSONDecodeError, TypeError):
entry["data"] = {}
events.append(entry)
predictions = spark_engine.get_predictions(limit=5)
preds = []
for p in predictions:
pred = {
"task_id": p.task_id[:8] if p.task_id else "?",
"accuracy": p.accuracy,
"evaluated": p.evaluated_at is not None,
"created_at": p.created_at,
}
try:
pred["predicted"] = json.loads(p.predicted_value)
except (json.JSONDecodeError, TypeError):
pred["predicted"] = {}
preds.append(pred)
advisories = spark_engine.get_advisories()
advs = [
{
"category": a.category,
"priority": a.priority,
"title": a.title,
"detail": a.detail,
"suggested_action": a.suggested_action,
}
for a in advisories
]
return {
"type": "spark_state",
"status": status,
"events": events,
"predictions": preds,
"advisories": advs,
}
@router.get("", response_class=HTMLResponse)
async def tower_ui(request: Request):
"""Render the Tower dashboard page."""
snapshot = _spark_snapshot()
return templates.TemplateResponse(
request,
"tower.html",
{"snapshot": snapshot},
)
@router.websocket("/ws")
async def tower_ws(websocket: WebSocket) -> None:
"""Stream Spark state snapshots to the Tower dashboard."""
await websocket.accept()
logger.info("Tower WS connected")
try:
# Send initial snapshot
await websocket.send_text(json.dumps(_spark_snapshot()))
while True:
await asyncio.sleep(_PUSH_INTERVAL)
await websocket.send_text(json.dumps(_spark_snapshot()))
except Exception:
logger.debug("Tower WS disconnected")

View File

@@ -221,7 +221,7 @@ async def _heartbeat(websocket: WebSocket) -> None:
await asyncio.sleep(_HEARTBEAT_INTERVAL)
await websocket.send_text(json.dumps({"type": "ping"}))
except Exception:
pass # connection gone — receive loop will clean up
logger.debug("Heartbeat stopped — connection gone")
@router.websocket("/ws")
@@ -250,7 +250,7 @@ async def world_ws(websocket: WebSocket) -> None:
raw = await websocket.receive_text()
await _handle_client_message(raw)
except Exception:
pass
logger.debug("WebSocket receive loop ended")
finally:
ping_task.cancel()
if websocket in _ws_clients:
@@ -265,6 +265,7 @@ async def _broadcast(message: str) -> None:
try:
await ws.send_text(message)
except Exception:
logger.debug("Pruning dead WebSocket client")
dead.append(ws)
for ws in dead:
if ws in _ws_clients:
@@ -340,7 +341,7 @@ async def _bark_and_broadcast(visitor_text: str) -> None:
pip_familiar.on_event("visitor_spoke")
except Exception:
pass # Pip is optional
logger.debug("Pip familiar notification failed (optional)")
_refresh_ground(visitor_text)
_tick_commitments()

View File

@@ -0,0 +1,180 @@
{% extends "base.html" %}
{% block title %}Timmy Time — Tower{% endblock %}
{% block extra_styles %}{% endblock %}
{% block content %}
<div class="container-fluid tower-container py-3">
<div class="tower-header">
<div class="tower-title">TOWER</div>
<div class="tower-subtitle">
Real-time Spark visualization &mdash;
<span id="tower-conn" class="tower-conn-badge tower-conn-connecting">CONNECTING</span>
</div>
</div>
<div class="row g-3">
<!-- Left: THINKING (events) -->
<div class="col-12 col-lg-4 d-flex flex-column gap-3">
<div class="card mc-panel tower-phase-card">
<div class="card-header mc-panel-header tower-phase-thinking">// THINKING</div>
<div class="card-body p-3 tower-scroll" id="tower-events">
<div class="tower-empty">Waiting for Spark data&hellip;</div>
</div>
</div>
</div>
<!-- Middle: PREDICTING (EIDOS) -->
<div class="col-12 col-lg-4 d-flex flex-column gap-3">
<div class="card mc-panel tower-phase-card">
<div class="card-header mc-panel-header tower-phase-predicting">// PREDICTING</div>
<div class="card-body p-3" id="tower-predictions">
<div class="tower-empty">Waiting for Spark data&hellip;</div>
</div>
</div>
<div class="card mc-panel">
<div class="card-header mc-panel-header">// EIDOS STATS</div>
<div class="card-body p-3">
<div class="tower-stat-grid" id="tower-stats">
<div class="tower-stat"><span class="tower-stat-label">EVENTS</span><span class="tower-stat-value" id="ts-events">0</span></div>
<div class="tower-stat"><span class="tower-stat-label">MEMORIES</span><span class="tower-stat-value" id="ts-memories">0</span></div>
<div class="tower-stat"><span class="tower-stat-label">PREDICTIONS</span><span class="tower-stat-value" id="ts-preds">0</span></div>
<div class="tower-stat"><span class="tower-stat-label">ACCURACY</span><span class="tower-stat-value" id="ts-accuracy"></span></div>
</div>
</div>
</div>
</div>
<!-- Right: ADVISING -->
<div class="col-12 col-lg-4 d-flex flex-column gap-3">
<div class="card mc-panel tower-phase-card">
<div class="card-header mc-panel-header tower-phase-advising">// ADVISING</div>
<div class="card-body p-3 tower-scroll" id="tower-advisories">
<div class="tower-empty">Waiting for Spark data&hellip;</div>
</div>
</div>
</div>
</div>
</div>
<script>
(function() {
var ws = null;
var badge = document.getElementById('tower-conn');
function setConn(state) {
badge.textContent = state.toUpperCase();
badge.className = 'tower-conn-badge tower-conn-' + state;
}
function esc(s) { var d = document.createElement('div'); d.textContent = s; return d.innerHTML; }
function renderEvents(events) {
var el = document.getElementById('tower-events');
if (!events || !events.length) { el.innerHTML = '<div class="tower-empty">No events captured yet.</div>'; return; }
var html = '';
for (var i = 0; i < events.length; i++) {
var ev = events[i];
var dots = ev.importance >= 0.8 ? '\u25cf\u25cf\u25cf' : ev.importance >= 0.5 ? '\u25cf\u25cf' : '\u25cf';
html += '<div class="tower-event tower-etype-' + esc(ev.event_type) + '">'
+ '<div class="tower-ev-head">'
+ '<span class="tower-ev-badge">' + esc(ev.event_type.replace(/_/g, ' ').toUpperCase()) + '</span>'
+ '<span class="tower-ev-dots">' + dots + '</span>'
+ '</div>'
+ '<div class="tower-ev-desc">' + esc(ev.description) + '</div>'
+ '<div class="tower-ev-time">' + esc((ev.created_at || '').slice(0, 19)) + '</div>'
+ '</div>';
}
el.innerHTML = html;
}
function renderPredictions(preds) {
var el = document.getElementById('tower-predictions');
if (!preds || !preds.length) { el.innerHTML = '<div class="tower-empty">No predictions yet.</div>'; return; }
var html = '';
for (var i = 0; i < preds.length; i++) {
var p = preds[i];
var cls = p.evaluated ? 'tower-pred-done' : 'tower-pred-pending';
var accTxt = p.accuracy != null ? Math.round(p.accuracy * 100) + '%' : 'PENDING';
var accCls = p.accuracy != null ? (p.accuracy >= 0.7 ? 'text-success' : p.accuracy < 0.4 ? 'text-danger' : 'text-warning') : '';
html += '<div class="tower-pred ' + cls + '">'
+ '<div class="tower-pred-head">'
+ '<span class="tower-pred-task">' + esc(p.task_id) + '</span>'
+ '<span class="tower-pred-acc ' + accCls + '">' + accTxt + '</span>'
+ '</div>';
if (p.predicted) {
var pr = p.predicted;
html += '<div class="tower-pred-detail">';
if (pr.likely_winner) html += '<span>Winner: ' + esc(pr.likely_winner.slice(0, 8)) + '</span> ';
if (pr.success_probability != null) html += '<span>Success: ' + Math.round(pr.success_probability * 100) + '%</span> ';
html += '</div>';
}
html += '<div class="tower-ev-time">' + esc((p.created_at || '').slice(0, 19)) + '</div>'
+ '</div>';
}
el.innerHTML = html;
}
function renderAdvisories(advs) {
var el = document.getElementById('tower-advisories');
if (!advs || !advs.length) { el.innerHTML = '<div class="tower-empty">No advisories yet.</div>'; return; }
var html = '';
for (var i = 0; i < advs.length; i++) {
var a = advs[i];
var prio = a.priority >= 0.7 ? 'high' : a.priority >= 0.4 ? 'medium' : 'low';
html += '<div class="tower-advisory tower-adv-' + prio + '">'
+ '<div class="tower-adv-head">'
+ '<span class="tower-adv-cat">' + esc(a.category.replace(/_/g, ' ').toUpperCase()) + '</span>'
+ '<span class="tower-adv-prio">' + Math.round(a.priority * 100) + '%</span>'
+ '</div>'
+ '<div class="tower-adv-title">' + esc(a.title) + '</div>'
+ '<div class="tower-adv-detail">' + esc(a.detail) + '</div>'
+ '<div class="tower-adv-action">' + esc(a.suggested_action) + '</div>'
+ '</div>';
}
el.innerHTML = html;
}
function renderStats(status) {
if (!status) return;
document.getElementById('ts-events').textContent = status.events_captured || 0;
document.getElementById('ts-memories').textContent = status.memories_stored || 0;
var p = status.predictions || {};
document.getElementById('ts-preds').textContent = p.total_predictions || 0;
var acc = p.avg_accuracy;
var accEl = document.getElementById('ts-accuracy');
if (acc != null) {
accEl.textContent = Math.round(acc * 100) + '%';
accEl.className = 'tower-stat-value ' + (acc >= 0.7 ? 'text-success' : acc < 0.4 ? 'text-danger' : 'text-warning');
} else {
accEl.textContent = '\u2014';
}
}
function handleMsg(data) {
if (data.type !== 'spark_state') return;
renderEvents(data.events);
renderPredictions(data.predictions);
renderAdvisories(data.advisories);
renderStats(data.status);
}
function connect() {
var proto = location.protocol === 'https:' ? 'wss:' : 'ws:';
ws = new WebSocket(proto + '//' + location.host + '/tower/ws');
ws.onopen = function() { setConn('live'); };
ws.onclose = function() { setConn('offline'); setTimeout(connect, 3000); };
ws.onerror = function() { setConn('offline'); };
ws.onmessage = function(e) {
try { handleMsg(JSON.parse(e.data)); } catch(err) { console.error('Tower WS parse error', err); }
};
}
connect();
})();
</script>
{% endblock %}

View File

@@ -100,36 +100,14 @@ def _get_git_context() -> dict:
return {"branch": "unknown", "commit": "unknown"}
def capture_error(
exc: Exception,
source: str = "unknown",
context: dict | None = None,
) -> str | None:
"""Capture an error and optionally create a bug report.
Args:
exc: The exception to capture
source: Module/component where the error occurred
context: Optional dict of extra context (request path, etc.)
def _extract_traceback_info(exc: Exception) -> tuple[str, str, int]:
"""Extract formatted traceback, affected file, and line number.
Returns:
Task ID of the created bug report, or None if deduplicated/disabled
Tuple of (traceback_string, affected_file, affected_line).
"""
from config import settings
if not settings.error_feedback_enabled:
return None
error_hash = _stack_hash(exc)
if _is_duplicate(error_hash):
logger.debug("Duplicate error suppressed: %s (hash=%s)", exc, error_hash)
return None
# Format the stack trace
tb_str = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
# Extract file/line from traceback
tb_obj = exc.__traceback__
affected_file = "unknown"
affected_line = 0
@@ -139,9 +117,18 @@ def capture_error(
affected_file = tb_obj.tb_frame.f_code.co_filename
affected_line = tb_obj.tb_lineno
git_ctx = _get_git_context()
return tb_str, affected_file, affected_line
# 1. Log to event_log
def _log_error_event(
exc: Exception,
source: str,
error_hash: str,
affected_file: str,
affected_line: int,
git_ctx: dict,
) -> None:
"""Log the captured error to the event log."""
try:
from swarm.event_log import EventType, log_event
@@ -161,8 +148,18 @@ def capture_error(
except Exception as log_exc:
logger.debug("Failed to log error event: %s", log_exc)
# 2. Create bug report task
task_id = None
def _create_bug_report(
exc: Exception,
source: str,
context: dict | None,
error_hash: str,
tb_str: str,
affected_file: str,
affected_line: int,
git_ctx: dict,
) -> str | None:
"""Create a bug report task and return the task ID (or None on failure)."""
try:
from swarm.task_queue.models import create_task
@@ -195,7 +192,6 @@ def capture_error(
)
task_id = task.id
# Log the creation event
try:
from swarm.event_log import EventType, log_event
@@ -210,12 +206,16 @@ def capture_error(
)
except Exception as exc:
logger.warning("Bug report screenshot error: %s", exc)
pass
return task_id
except Exception as task_exc:
logger.debug("Failed to create bug report task: %s", task_exc)
return None
# 3. Send notification
def _notify_bug_report(exc: Exception, source: str) -> None:
"""Send a push notification about the captured error."""
try:
from infrastructure.notifications.push import notifier
@@ -224,11 +224,12 @@ def capture_error(
message=f"{type(exc).__name__} in {source}: {str(exc)[:80]}",
category="system",
)
except Exception as exc:
logger.warning("Bug report notification error: %s", exc)
pass
except Exception as notify_exc:
logger.warning("Bug report notification error: %s", notify_exc)
# 4. Record in session logger (via registered callback)
def _record_to_session(exc: Exception, source: str) -> None:
"""Record the error via the registered session callback."""
if _error_recorder is not None:
try:
_error_recorder(
@@ -238,4 +239,50 @@ def capture_error(
except Exception as log_exc:
logger.warning("Bug report session logging error: %s", log_exc)
def capture_error(
exc: Exception,
source: str = "unknown",
context: dict | None = None,
) -> str | None:
"""Capture an error and optionally create a bug report.
Args:
exc: The exception to capture
source: Module/component where the error occurred
context: Optional dict of extra context (request path, etc.)
Returns:
Task ID of the created bug report, or None if deduplicated/disabled
"""
from config import settings
if not settings.error_feedback_enabled:
return None
error_hash = _stack_hash(exc)
if _is_duplicate(error_hash):
logger.debug("Duplicate error suppressed: %s (hash=%s)", exc, error_hash)
return None
tb_str, affected_file, affected_line = _extract_traceback_info(exc)
git_ctx = _get_git_context()
_log_error_event(exc, source, error_hash, affected_file, affected_line, git_ctx)
task_id = _create_bug_report(
exc,
source,
context,
error_hash,
tb_str,
affected_file,
affected_line,
git_ctx,
)
_notify_bug_report(exc, source)
_record_to_session(exc, source)
return task_id

View File

@@ -144,6 +144,65 @@ class ShellHand:
return None
@staticmethod
def _build_run_env(env: dict | None) -> dict:
"""Merge *env* overrides into a copy of the current environment."""
import os
run_env = os.environ.copy()
if env:
run_env.update(env)
return run_env
async def _execute_subprocess(
self,
command: str,
effective_timeout: int,
cwd: str | None,
run_env: dict,
start: float,
) -> ShellResult:
"""Run *command* as a subprocess with timeout enforcement."""
proc = await asyncio.create_subprocess_shell(
command,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
cwd=cwd,
env=run_env,
)
try:
stdout_bytes, stderr_bytes = await asyncio.wait_for(
proc.communicate(), timeout=effective_timeout
)
except TimeoutError:
proc.kill()
await proc.wait()
latency = (time.time() - start) * 1000
logger.warning("Shell command timed out after %ds: %s", effective_timeout, command)
return ShellResult(
command=command,
success=False,
exit_code=-1,
error=f"Command timed out after {effective_timeout}s",
latency_ms=latency,
timed_out=True,
)
latency = (time.time() - start) * 1000
exit_code = proc.returncode if proc.returncode is not None else -1
stdout = stdout_bytes.decode("utf-8", errors="replace").strip()
stderr = stderr_bytes.decode("utf-8", errors="replace").strip()
return ShellResult(
command=command,
success=exit_code == 0,
exit_code=exit_code,
stdout=stdout,
stderr=stderr,
latency_ms=latency,
)
async def run(
self,
command: str,
@@ -164,7 +223,6 @@ class ShellHand:
"""
start = time.time()
# Validate
validation_error = self._validate_command(command)
if validation_error:
return ShellResult(
@@ -178,52 +236,8 @@ class ShellHand:
cwd = working_dir or self._working_dir
try:
import os
run_env = os.environ.copy()
if env:
run_env.update(env)
proc = await asyncio.create_subprocess_shell(
command,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
cwd=cwd,
env=run_env,
)
try:
stdout_bytes, stderr_bytes = await asyncio.wait_for(
proc.communicate(), timeout=effective_timeout
)
except TimeoutError:
proc.kill()
await proc.wait()
latency = (time.time() - start) * 1000
logger.warning("Shell command timed out after %ds: %s", effective_timeout, command)
return ShellResult(
command=command,
success=False,
exit_code=-1,
error=f"Command timed out after {effective_timeout}s",
latency_ms=latency,
timed_out=True,
)
latency = (time.time() - start) * 1000
exit_code = proc.returncode if proc.returncode is not None else -1
stdout = stdout_bytes.decode("utf-8", errors="replace").strip()
stderr = stderr_bytes.decode("utf-8", errors="replace").strip()
return ShellResult(
command=command,
success=exit_code == 0,
exit_code=exit_code,
stdout=stdout,
stderr=stderr,
latency_ms=latency,
)
run_env = self._build_run_env(env)
return await self._execute_subprocess(command, effective_timeout, cwd, run_env, start)
except Exception as exc:
latency = (time.time() - start) * 1000
logger.warning("Shell command failed: %s%s", command, exc)

View File

@@ -13,7 +13,7 @@ import logging
from dataclasses import dataclass, field
from enum import Enum, auto
from config import settings
from config import normalize_ollama_url, settings
logger = logging.getLogger(__name__)
@@ -307,7 +307,7 @@ class MultiModalManager:
import json
import urllib.request
url = self.ollama_url.replace("localhost", "127.0.0.1")
url = normalize_ollama_url(self.ollama_url)
req = urllib.request.Request(
f"{url}/api/tags",
method="GET",
@@ -462,7 +462,7 @@ class MultiModalManager:
logger.info("Pulling model: %s", model_name)
url = self.ollama_url.replace("localhost", "127.0.0.1")
url = normalize_ollama_url(self.ollama_url)
req = urllib.request.Request(
f"{url}/api/pull",
method="POST",

View File

@@ -2,6 +2,7 @@
from .api import router
from .cascade import CascadeRouter, Provider, ProviderStatus, get_router
from .history import HealthHistoryStore, get_history_store
__all__ = [
"CascadeRouter",
@@ -9,4 +10,6 @@ __all__ = [
"ProviderStatus",
"get_router",
"router",
"HealthHistoryStore",
"get_history_store",
]

View File

@@ -8,6 +8,7 @@ from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel
from .cascade import CascadeRouter, get_router
from .history import HealthHistoryStore, get_history_store
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/api/v1/router", tags=["router"])
@@ -183,6 +184,33 @@ async def run_health_check(
}
@router.post("/reload")
async def reload_config(
cascade: Annotated[CascadeRouter, Depends(get_cascade_router)],
) -> dict[str, Any]:
"""Hot-reload providers.yaml without restart.
Preserves circuit breaker state and metrics for existing providers.
"""
try:
result = cascade.reload_config()
return {"status": "ok", **result}
except Exception as exc:
logger.error("Config reload failed: %s", exc)
raise HTTPException(status_code=500, detail=f"Reload failed: {exc}") from exc
@router.get("/history")
async def get_history(
hours: int = 24,
store: Annotated[HealthHistoryStore, Depends(get_history_store)] = None,
) -> list[dict[str, Any]]:
"""Get provider health history for the last N hours."""
if store is None:
store = get_history_store()
return store.get_history(hours=hours)
@router.get("/config")
async def get_config(
cascade: Annotated[CascadeRouter, Depends(get_cascade_router)],

View File

@@ -18,6 +18,8 @@ from enum import Enum
from pathlib import Path
from typing import Any
from config import settings
try:
import yaml
except ImportError:
@@ -100,7 +102,7 @@ class Provider:
"""LLM provider configuration and state."""
name: str
type: str # ollama, openai, anthropic, airllm
type: str # ollama, openai, anthropic
enabled: bool
priority: int
url: str | None = None
@@ -301,22 +303,13 @@ class CascadeRouter:
# Can't check without requests, assume available
return True
try:
url = provider.url or "http://localhost:11434"
url = provider.url or settings.ollama_url
response = requests.get(f"{url}/api/tags", timeout=5)
return response.status_code == 200
except Exception as exc:
logger.debug("Ollama provider check error: %s", exc)
return False
elif provider.type == "airllm":
# Check if airllm is installed
try:
import importlib.util
return importlib.util.find_spec("airllm") is not None
except (ImportError, ModuleNotFoundError):
return False
elif provider.type in ("openai", "anthropic", "grok"):
# Check if API key is set
return provider.api_key is not None and provider.api_key != ""
@@ -395,6 +388,101 @@ class CascadeRouter:
return None
def _select_model(
self, provider: Provider, model: str | None, content_type: ContentType
) -> tuple[str | None, bool]:
"""Select the best model for the request, with vision fallback.
Returns:
Tuple of (selected_model, is_fallback_model).
"""
selected_model = model or provider.get_default_model()
is_fallback = False
if content_type != ContentType.TEXT and selected_model:
if provider.type == "ollama" and self._mm_manager:
from infrastructure.models.multimodal import ModelCapability
if content_type == ContentType.VISION:
supports = self._mm_manager.model_supports(
selected_model, ModelCapability.VISION
)
if not supports:
fallback = self._get_fallback_model(provider, selected_model, content_type)
if fallback:
logger.info(
"Model %s doesn't support vision, falling back to %s",
selected_model,
fallback,
)
selected_model = fallback
is_fallback = True
else:
logger.warning(
"No vision-capable model found on %s, trying anyway",
provider.name,
)
return selected_model, is_fallback
async def _attempt_with_retry(
self,
provider: Provider,
messages: list[dict],
model: str | None,
temperature: float,
max_tokens: int | None,
content_type: ContentType,
) -> dict:
"""Try a provider with retries, returning the result dict.
Raises:
RuntimeError: If all retry attempts fail.
Returns error strings collected during retries via the exception message.
"""
errors: list[str] = []
for attempt in range(self.config.max_retries_per_provider):
try:
return await self._try_provider(
provider=provider,
messages=messages,
model=model,
temperature=temperature,
max_tokens=max_tokens,
content_type=content_type,
)
except Exception as exc:
error_msg = str(exc)
logger.warning(
"Provider %s attempt %d failed: %s",
provider.name,
attempt + 1,
error_msg,
)
errors.append(f"{provider.name}: {error_msg}")
if attempt < self.config.max_retries_per_provider - 1:
await asyncio.sleep(self.config.retry_delay_seconds)
raise RuntimeError("; ".join(errors))
def _is_provider_available(self, provider: Provider) -> bool:
"""Check if a provider should be tried (enabled + circuit breaker)."""
if not provider.enabled:
logger.debug("Skipping %s (disabled)", provider.name)
return False
if provider.status == ProviderStatus.UNHEALTHY:
if self._can_close_circuit(provider):
provider.circuit_state = CircuitState.HALF_OPEN
provider.half_open_calls = 0
logger.info("Circuit breaker half-open for %s", provider.name)
else:
logger.debug("Skipping %s (circuit open)", provider.name)
return False
return True
async def complete(
self,
messages: list[dict],
@@ -421,7 +509,6 @@ class CascadeRouter:
Raises:
RuntimeError: If all providers fail
"""
# Detect content type for multi-modal routing
content_type = self._detect_content_type(messages)
if content_type != ContentType.TEXT:
logger.debug("Detected %s content, selecting appropriate model", content_type.value)
@@ -429,93 +516,34 @@ class CascadeRouter:
errors = []
for provider in self.providers:
# Skip disabled providers
if not provider.enabled:
logger.debug("Skipping %s (disabled)", provider.name)
if not self._is_provider_available(provider):
continue
# Skip unhealthy providers (circuit breaker)
if provider.status == ProviderStatus.UNHEALTHY:
# Check if circuit breaker can close
if self._can_close_circuit(provider):
provider.circuit_state = CircuitState.HALF_OPEN
provider.half_open_calls = 0
logger.info("Circuit breaker half-open for %s", provider.name)
else:
logger.debug("Skipping %s (circuit open)", provider.name)
continue
selected_model, is_fallback_model = self._select_model(provider, model, content_type)
# Determine which model to use
selected_model = model or provider.get_default_model()
is_fallback_model = False
try:
result = await self._attempt_with_retry(
provider,
messages,
selected_model,
temperature,
max_tokens,
content_type,
)
except RuntimeError as exc:
errors.append(str(exc))
self._record_failure(provider)
continue
# For non-text content, check if model supports it
if content_type != ContentType.TEXT and selected_model:
if provider.type == "ollama" and self._mm_manager:
from infrastructure.models.multimodal import ModelCapability
self._record_success(provider, result.get("latency_ms", 0))
return {
"content": result["content"],
"provider": provider.name,
"model": result.get("model", selected_model or provider.get_default_model()),
"latency_ms": result.get("latency_ms", 0),
"is_fallback_model": is_fallback_model,
}
# Check if selected model supports the required capability
if content_type == ContentType.VISION:
supports = self._mm_manager.model_supports(
selected_model, ModelCapability.VISION
)
if not supports:
# Find fallback model
fallback = self._get_fallback_model(
provider, selected_model, content_type
)
if fallback:
logger.info(
"Model %s doesn't support vision, falling back to %s",
selected_model,
fallback,
)
selected_model = fallback
is_fallback_model = True
else:
logger.warning(
"No vision-capable model found on %s, trying anyway",
provider.name,
)
# Try this provider
for attempt in range(self.config.max_retries_per_provider):
try:
result = await self._try_provider(
provider=provider,
messages=messages,
model=selected_model,
temperature=temperature,
max_tokens=max_tokens,
content_type=content_type,
)
# Success! Update metrics and return
self._record_success(provider, result.get("latency_ms", 0))
return {
"content": result["content"],
"provider": provider.name,
"model": result.get(
"model", selected_model or provider.get_default_model()
),
"latency_ms": result.get("latency_ms", 0),
"is_fallback_model": is_fallback_model,
}
except Exception as exc:
error_msg = str(exc)
logger.warning(
"Provider %s attempt %d failed: %s", provider.name, attempt + 1, error_msg
)
errors.append(f"{provider.name}: {error_msg}")
if attempt < self.config.max_retries_per_provider - 1:
await asyncio.sleep(self.config.retry_delay_seconds)
# All retries failed for this provider
self._record_failure(provider)
# All providers failed
raise RuntimeError(f"All providers failed: {'; '.join(errors)}")
async def _try_provider(
@@ -581,7 +609,7 @@ class CascadeRouter:
"""Call Ollama API with multi-modal support."""
import aiohttp
url = f"{provider.url}/api/chat"
url = f"{provider.url or settings.ollama_url}/api/chat"
# Transform messages for Ollama format (including images)
transformed_messages = self._transform_messages_for_ollama(messages)
@@ -815,6 +843,66 @@ class CascadeRouter:
provider.status = ProviderStatus.HEALTHY
logger.info("Circuit breaker CLOSED for %s", provider.name)
def reload_config(self) -> dict:
"""Hot-reload providers.yaml, preserving runtime state.
Re-reads the config file, rebuilds the provider list, and
preserves circuit breaker state and metrics for providers
that still exist after reload.
Returns:
Summary dict with added/removed/preserved counts.
"""
# Snapshot current runtime state keyed by provider name
old_state: dict[
str, tuple[ProviderMetrics, CircuitState, float | None, int, ProviderStatus]
] = {}
for p in self.providers:
old_state[p.name] = (
p.metrics,
p.circuit_state,
p.circuit_opened_at,
p.half_open_calls,
p.status,
)
old_names = set(old_state.keys())
# Reload from disk
self.providers = []
self._load_config()
# Restore preserved state
new_names = {p.name for p in self.providers}
preserved = 0
for p in self.providers:
if p.name in old_state:
metrics, circuit, opened_at, half_open, status = old_state[p.name]
p.metrics = metrics
p.circuit_state = circuit
p.circuit_opened_at = opened_at
p.half_open_calls = half_open
p.status = status
preserved += 1
added = new_names - old_names
removed = old_names - new_names
logger.info(
"Config reloaded: %d providers (%d preserved, %d added, %d removed)",
len(self.providers),
preserved,
len(added),
len(removed),
)
return {
"total_providers": len(self.providers),
"preserved": preserved,
"added": sorted(added),
"removed": sorted(removed),
}
def get_metrics(self) -> dict:
"""Get metrics for all providers."""
return {

View File

@@ -0,0 +1,152 @@
"""Provider health history — time-series snapshots for dashboard visualization."""
import asyncio
import logging
import sqlite3
from datetime import UTC, datetime, timedelta
from pathlib import Path
logger = logging.getLogger(__name__)
_store: "HealthHistoryStore | None" = None
class HealthHistoryStore:
"""Stores timestamped provider health snapshots in SQLite."""
def __init__(self, db_path: str = "data/router_history.db") -> None:
self.db_path = db_path
if db_path != ":memory:":
Path(db_path).parent.mkdir(parents=True, exist_ok=True)
self._conn = sqlite3.connect(db_path, check_same_thread=False)
self._conn.row_factory = sqlite3.Row
self._init_schema()
self._bg_task: asyncio.Task | None = None
def _init_schema(self) -> None:
self._conn.execute("""
CREATE TABLE IF NOT EXISTS snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
timestamp TEXT NOT NULL,
provider_name TEXT NOT NULL,
status TEXT NOT NULL,
error_rate REAL NOT NULL,
avg_latency_ms REAL NOT NULL,
circuit_state TEXT NOT NULL,
total_requests INTEGER NOT NULL
)
""")
self._conn.execute("""
CREATE INDEX IF NOT EXISTS idx_snapshots_ts
ON snapshots(timestamp)
""")
self._conn.commit()
def record_snapshot(self, providers: list[dict]) -> None:
"""Record a health snapshot for all providers."""
ts = datetime.now(UTC).isoformat()
rows = [
(
ts,
p["name"],
p["status"],
p["error_rate"],
p["avg_latency_ms"],
p["circuit_state"],
p["total_requests"],
)
for p in providers
]
self._conn.executemany(
"""INSERT INTO snapshots
(timestamp, provider_name, status, error_rate,
avg_latency_ms, circuit_state, total_requests)
VALUES (?, ?, ?, ?, ?, ?, ?)""",
rows,
)
self._conn.commit()
def get_history(self, hours: int = 24) -> list[dict]:
"""Return snapshots from the last N hours, grouped by timestamp."""
cutoff = (datetime.now(UTC) - timedelta(hours=hours)).isoformat()
rows = self._conn.execute(
"""SELECT timestamp, provider_name, status, error_rate,
avg_latency_ms, circuit_state, total_requests
FROM snapshots WHERE timestamp >= ? ORDER BY timestamp""",
(cutoff,),
).fetchall()
# Group by timestamp
snapshots: dict[str, list[dict]] = {}
for row in rows:
ts = row["timestamp"]
if ts not in snapshots:
snapshots[ts] = []
snapshots[ts].append(
{
"name": row["provider_name"],
"status": row["status"],
"error_rate": row["error_rate"],
"avg_latency_ms": row["avg_latency_ms"],
"circuit_state": row["circuit_state"],
"total_requests": row["total_requests"],
}
)
return [{"timestamp": ts, "providers": providers} for ts, providers in snapshots.items()]
def prune(self, keep_hours: int = 168) -> int:
"""Remove snapshots older than keep_hours. Returns rows deleted."""
cutoff = (datetime.now(UTC) - timedelta(hours=keep_hours)).isoformat()
cursor = self._conn.execute("DELETE FROM snapshots WHERE timestamp < ?", (cutoff,))
self._conn.commit()
return cursor.rowcount
def close(self) -> None:
"""Close the database connection."""
if self._bg_task and not self._bg_task.done():
self._bg_task.cancel()
self._conn.close()
def _capture_snapshot(self, cascade_router) -> None: # noqa: ANN001
"""Capture current provider state as a snapshot."""
providers = []
for p in cascade_router.providers:
providers.append(
{
"name": p.name,
"status": p.status.value,
"error_rate": round(p.metrics.error_rate, 4),
"avg_latency_ms": round(p.metrics.avg_latency_ms, 2),
"circuit_state": p.circuit_state.value,
"total_requests": p.metrics.total_requests,
}
)
self.record_snapshot(providers)
async def start_background_task(
self,
cascade_router,
interval_seconds: int = 60, # noqa: ANN001
) -> None:
"""Start periodic snapshot capture."""
async def _loop() -> None:
while True:
try:
self._capture_snapshot(cascade_router)
logger.debug("Recorded health snapshot")
except Exception:
logger.exception("Failed to record health snapshot")
await asyncio.sleep(interval_seconds)
self._bg_task = asyncio.create_task(_loop())
logger.info("Health history background task started (interval=%ds)", interval_seconds)
def get_history_store() -> HealthHistoryStore:
"""Get or create the singleton history store."""
global _store # noqa: PLW0603
if _store is None:
_store = HealthHistoryStore()
return _store

View File

@@ -515,25 +515,36 @@ class DiscordVendor(ChatPlatform):
async def _handle_message(self, message) -> None:
"""Process an incoming message and respond via a thread."""
# Strip the bot mention from the message content
content = message.content
if self._client.user:
content = content.replace(f"<@{self._client.user.id}>", "").strip()
content = self._extract_content(message)
if not content:
return
# Create or reuse a thread for this conversation
thread = await self._get_or_create_thread(message)
target = thread or message.channel
session_id = f"discord_{thread.id}" if thread else f"discord_{message.channel.id}"
# Derive session_id for per-conversation history via Agno's SQLite
if thread:
session_id = f"discord_{thread.id}"
else:
session_id = f"discord_{message.channel.id}"
run_output, response = await self._invoke_agent(content, session_id, target)
# Run Timmy agent with typing indicator and timeout
if run_output is not None:
await self._handle_paused_run(run_output, target, session_id)
raw_content = run_output.content if hasattr(run_output, "content") else ""
response = _clean_response(raw_content or "")
await self._send_response(response, target)
def _extract_content(self, message) -> str:
"""Strip the bot mention and return clean message text."""
content = message.content
if self._client.user:
content = content.replace(f"<@{self._client.user.id}>", "").strip()
return content
async def _invoke_agent(self, content: str, session_id: str, target):
"""Run chat_with_tools with a typing indicator and timeout.
Returns a (run_output, error_response) tuple. On success the
error_response is ``None``; on failure run_output is ``None``.
"""
run_output = None
response = None
try:
@@ -548,51 +559,57 @@ class DiscordVendor(ChatPlatform):
except Exception as exc:
logger.error("Discord: chat_with_tools() failed: %s", exc)
response = "I'm having trouble reaching my inference backend right now. Please try again shortly."
return run_output, response
# Check if Agno paused the run for tool confirmation
if run_output is not None:
status = getattr(run_output, "status", None)
is_paused = status == "PAUSED" or str(status) == "RunStatus.paused"
async def _handle_paused_run(self, run_output, target, session_id: str) -> None:
"""If Agno paused the run for tool confirmation, enqueue approvals."""
status = getattr(run_output, "status", None)
is_paused = status == "PAUSED" or str(status) == "RunStatus.paused"
if is_paused and getattr(run_output, "active_requirements", None):
from config import settings
if not (is_paused and getattr(run_output, "active_requirements", None)):
return
if settings.discord_confirm_actions:
for req in run_output.active_requirements:
if getattr(req, "needs_confirmation", False):
te = req.tool_execution
tool_name = getattr(te, "tool_name", "unknown")
tool_args = getattr(te, "tool_args", {}) or {}
from config import settings
from timmy.approvals import create_item
if not settings.discord_confirm_actions:
return
item = create_item(
title=f"Discord: {tool_name}",
description=_format_action_description(tool_name, tool_args),
proposed_action=json.dumps({"tool": tool_name, "args": tool_args}),
impact=_get_impact_level(tool_name),
)
self._pending_actions[item.id] = {
"run_output": run_output,
"requirement": req,
"tool_name": tool_name,
"tool_args": tool_args,
"target": target,
"session_id": session_id,
}
await self._send_confirmation(target, tool_name, tool_args, item.id)
for req in run_output.active_requirements:
if not getattr(req, "needs_confirmation", False):
continue
te = req.tool_execution
tool_name = getattr(te, "tool_name", "unknown")
tool_args = getattr(te, "tool_args", {}) or {}
raw_content = run_output.content if hasattr(run_output, "content") else ""
response = _clean_response(raw_content or "")
from timmy.approvals import create_item
# Discord has a 2000 character limit — send with error handling
if response and response.strip():
for chunk in _chunk_message(response, 2000):
try:
await target.send(chunk)
except Exception as exc:
logger.error("Discord: failed to send message chunk: %s", exc)
break
item = create_item(
title=f"Discord: {tool_name}",
description=_format_action_description(tool_name, tool_args),
proposed_action=json.dumps({"tool": tool_name, "args": tool_args}),
impact=_get_impact_level(tool_name),
)
self._pending_actions[item.id] = {
"run_output": run_output,
"requirement": req,
"tool_name": tool_name,
"tool_args": tool_args,
"target": target,
"session_id": session_id,
}
await self._send_confirmation(target, tool_name, tool_args, item.id)
@staticmethod
async def _send_response(response: str | None, target) -> None:
"""Send a response to Discord, chunked to the 2000-char limit."""
if not response or not response.strip():
return
for chunk in _chunk_message(response, 2000):
try:
await target.send(chunk)
except Exception as exc:
logger.error("Discord: failed to send message chunk: %s", exc)
break
async def _get_or_create_thread(self, message):
"""Get the active thread for a channel, or create one.

View File

@@ -0,0 +1 @@
"""Lightning Network integration for tool-usage micro-payments."""

69
src/lightning/factory.py Normal file
View File

@@ -0,0 +1,69 @@
"""Lightning backend factory.
Returns a mock or real LND backend based on ``settings.lightning_backend``.
"""
from __future__ import annotations
import hashlib
import logging
import secrets
from dataclasses import dataclass
from config import settings
logger = logging.getLogger(__name__)
@dataclass
class Invoice:
"""Minimal Lightning invoice representation."""
payment_hash: str
payment_request: str
amount_sats: int
memo: str
class MockBackend:
"""In-memory mock Lightning backend for development and testing."""
def create_invoice(self, amount_sats: int, memo: str = "") -> Invoice:
"""Create a fake invoice with a random payment hash."""
raw = secrets.token_bytes(32)
payment_hash = hashlib.sha256(raw).hexdigest()
payment_request = f"lnbc{amount_sats}mock{payment_hash[:20]}"
logger.debug("Mock invoice: %s sats — %s", amount_sats, payment_hash[:12])
return Invoice(
payment_hash=payment_hash,
payment_request=payment_request,
amount_sats=amount_sats,
memo=memo,
)
# Singleton — lazily created
_backend: MockBackend | None = None
def get_backend() -> MockBackend:
"""Return the configured Lightning backend (currently mock-only).
Raises ``ValueError`` if an unsupported backend is requested.
"""
global _backend # noqa: PLW0603
if _backend is not None:
return _backend
kind = settings.lightning_backend
if kind == "mock":
_backend = MockBackend()
elif kind == "lnd":
# LND gRPC integration is on the roadmap — for now fall back to mock.
logger.warning("LND backend not yet implemented — using mock")
_backend = MockBackend()
else:
raise ValueError(f"Unknown lightning_backend: {kind!r}")
logger.info("Lightning backend: %s", kind)
return _backend

146
src/lightning/ledger.py Normal file
View File

@@ -0,0 +1,146 @@
"""In-memory Lightning transaction ledger.
Tracks invoices, settlements, and balances per the schema in
``docs/adr/018-lightning-ledger.md``. Uses a simple in-memory list so the
dashboard can display real (ephemeral) data without requiring SQLite yet.
"""
from __future__ import annotations
import logging
import uuid
from dataclasses import dataclass
from datetime import UTC, datetime
from enum import StrEnum
logger = logging.getLogger(__name__)
class TxType(StrEnum):
incoming = "incoming"
outgoing = "outgoing"
class TxStatus(StrEnum):
pending = "pending"
settled = "settled"
failed = "failed"
expired = "expired"
@dataclass
class LedgerEntry:
"""Single ledger row matching the ADR-018 schema."""
id: str
tx_type: TxType
status: TxStatus
payment_hash: str
amount_sats: int
memo: str
source: str
created_at: str
invoice: str = ""
preimage: str = ""
task_id: str = ""
agent_id: str = ""
settled_at: str = ""
fee_sats: int = 0
# ── In-memory store ──────────────────────────────────────────────────
_entries: list[LedgerEntry] = []
def create_invoice_entry(
payment_hash: str,
amount_sats: int,
memo: str = "",
source: str = "tool_usage",
task_id: str = "",
agent_id: str = "",
invoice: str = "",
) -> LedgerEntry:
"""Record a new incoming invoice in the ledger."""
entry = LedgerEntry(
id=uuid.uuid4().hex[:16],
tx_type=TxType.incoming,
status=TxStatus.pending,
payment_hash=payment_hash,
amount_sats=amount_sats,
memo=memo,
source=source,
task_id=task_id,
agent_id=agent_id,
invoice=invoice,
created_at=datetime.now(UTC).isoformat(),
)
_entries.append(entry)
logger.debug("Ledger entry created: %s (%s sats)", entry.id, amount_sats)
return entry
def mark_settled(payment_hash: str, preimage: str = "") -> LedgerEntry | None:
"""Mark a pending entry as settled by payment hash."""
for entry in _entries:
if entry.payment_hash == payment_hash and entry.status == TxStatus.pending:
entry.status = TxStatus.settled
entry.preimage = preimage
entry.settled_at = datetime.now(UTC).isoformat()
logger.debug("Ledger settled: %s", payment_hash[:12])
return entry
return None
def get_balance() -> dict:
"""Compute the current balance from settled and pending entries."""
incoming_total = sum(
e.amount_sats
for e in _entries
if e.tx_type == TxType.incoming and e.status == TxStatus.settled
)
outgoing_total = sum(
e.amount_sats
for e in _entries
if e.tx_type == TxType.outgoing and e.status == TxStatus.settled
)
fees = sum(e.fee_sats for e in _entries if e.status == TxStatus.settled)
pending_in = sum(
e.amount_sats
for e in _entries
if e.tx_type == TxType.incoming and e.status == TxStatus.pending
)
pending_out = sum(
e.amount_sats
for e in _entries
if e.tx_type == TxType.outgoing and e.status == TxStatus.pending
)
net = incoming_total - outgoing_total - fees
return {
"incoming_total_sats": incoming_total,
"outgoing_total_sats": outgoing_total,
"fees_paid_sats": fees,
"net_sats": net,
"pending_incoming_sats": pending_in,
"pending_outgoing_sats": pending_out,
"available_sats": net - pending_out,
}
def get_transactions(
tx_type: str | None = None,
status: str | None = None,
limit: int = 50,
) -> list[LedgerEntry]:
"""Return ledger entries, optionally filtered."""
result = _entries
if tx_type:
result = [e for e in result if e.tx_type.value == tx_type]
if status:
result = [e for e in result if e.status.value == status]
return list(reversed(result))[:limit]
def clear() -> None:
"""Reset the ledger (for testing)."""
_entries.clear()

View File

@@ -1 +1 @@
"""Timmy — Core AI agent (Ollama/AirLLM backends, CLI, prompts)."""
"""Timmy — Core AI agent (Ollama/Grok/Claude backends, CLI, prompts)."""

View File

@@ -26,12 +26,12 @@ from timmy.prompts import get_system_prompt
from timmy.tools import create_full_toolkit
if TYPE_CHECKING:
from timmy.backends import ClaudeBackend, GrokBackend, TimmyAirLLMAgent
from timmy.backends import ClaudeBackend, GrokBackend
logger = logging.getLogger(__name__)
# Union type for callers that want to hint the return type.
TimmyAgent = Union[Agent, "TimmyAirLLMAgent", "GrokBackend", "ClaudeBackend"]
TimmyAgent = Union[Agent, "GrokBackend", "ClaudeBackend"]
# Models known to be too small for reliable tool calling.
# These hallucinate tool calls as text, invoke tools randomly,
@@ -63,7 +63,7 @@ def _pull_model(model_name: str) -> bool:
logger.info("Pulling model: %s", model_name)
url = settings.ollama_url.replace("localhost", "127.0.0.1")
url = settings.normalized_ollama_url
req = urllib.request.Request(
f"{url}/api/pull",
method="POST",
@@ -172,107 +172,34 @@ def _warmup_model(model_name: str) -> bool:
def _resolve_backend(requested: str | None) -> str:
"""Return the backend name to use, resolving 'auto' and explicit overrides.
"""Return the backend name to use.
Priority (highest lowest):
Priority (highest -> lowest):
1. CLI flag passed directly to create_timmy()
2. TIMMY_MODEL_BACKEND env var / .env setting
3. 'ollama' (safe default no surprises)
'auto' triggers Apple Silicon detection: uses AirLLM if both
is_apple_silicon() and airllm_available() return True.
3. 'ollama' (safe default -- no surprises)
"""
if requested is not None:
return requested
configured = settings.timmy_model_backend # "ollama" | "airllm" | "grok" | "claude" | "auto"
if configured != "auto":
return configured
# "auto" path — lazy import to keep startup fast and tests clean.
from timmy.backends import airllm_available, is_apple_silicon
if is_apple_silicon() and airllm_available():
return "airllm"
return "ollama"
return settings.timmy_model_backend # "ollama" | "grok" | "claude"
def create_timmy(
db_file: str = "timmy.db",
backend: str | None = None,
model_size: str | None = None,
*,
skip_mcp: bool = False,
session_id: str = "unknown",
) -> TimmyAgent:
"""Instantiate the agent — Ollama or AirLLM, same public interface.
def _build_tools_list(use_tools: bool, skip_mcp: bool, model_name: str) -> list:
"""Assemble the tools list based on model capability and MCP flags.
Args:
db_file: SQLite file for Agno conversation memory (Ollama path only).
backend: "ollama" | "airllm" | "auto" | None (reads config/env).
model_size: AirLLM size — "8b" | "70b" | "405b" | None (reads config).
skip_mcp: If True, omit MCP tool servers (Gitea, filesystem).
Use for background tasks (thinking, QA) where MCP's
stdio cancel-scope lifecycle conflicts with asyncio
task cancellation.
Returns an Agno Agent or backend-specific agent — all expose
print_response(message, stream).
Returns a list of Toolkit / MCPTools objects, or an empty list.
"""
resolved = _resolve_backend(backend)
size = model_size or settings.airllm_model_size
if resolved == "claude":
from timmy.backends import ClaudeBackend
return ClaudeBackend()
if resolved == "grok":
from timmy.backends import GrokBackend
return GrokBackend()
if resolved == "airllm":
from timmy.backends import TimmyAirLLMAgent
return TimmyAirLLMAgent(model_size=size)
# Default: Ollama via Agno.
# Resolve model with automatic pulling and fallback
model_name, is_fallback = _resolve_model_with_fallback(
requested_model=None,
require_vision=False,
auto_pull=True,
)
# If Ollama is completely unreachable, fail loudly.
# Sovereignty: never silently send data to a cloud API.
# Use --backend claude explicitly if you want cloud inference.
if not _check_model_available(model_name):
logger.error(
"Ollama unreachable and no local models available. "
"Start Ollama with 'ollama serve' or use --backend claude explicitly."
)
if is_fallback:
logger.info("Using fallback model %s (requested was unavailable)", model_name)
use_tools = _model_supports_tools(model_name)
# Conditionally include tools — small models get none
toolkit = create_full_toolkit() if use_tools else None
if not use_tools:
logger.info("Tools disabled for model %s (too small for reliable tool calling)", model_name)
return []
# Build the tools list — Agno accepts a list of Toolkit / MCPTools
tools_list: list = []
if toolkit:
tools_list.append(toolkit)
tools_list: list = [create_full_toolkit()]
# Add MCP tool servers (lazy-connected on first arun()).
# Skipped when skip_mcp=True — MCP's stdio transport uses anyio cancel
# scopes that conflict with asyncio background task cancellation (#72).
if use_tools and not skip_mcp:
if not skip_mcp:
try:
from timmy.mcp_tools import create_filesystem_mcp_tools, create_gitea_mcp_tools
@@ -286,30 +213,46 @@ def create_timmy(
except Exception as exc:
logger.debug("MCP tools unavailable: %s", exc)
# Select prompt tier based on tool capability
return tools_list
def _build_prompt(use_tools: bool, session_id: str) -> str:
"""Build the full system prompt with optional memory context."""
base_prompt = get_system_prompt(tools_enabled=use_tools, session_id=session_id)
# Try to load memory context
try:
from timmy.memory_system import memory_system
memory_context = memory_system.get_system_context()
if memory_context:
# Truncate if too long — smaller budget for small models
# since the expanded prompt (roster, guardrails) uses more tokens
# Smaller budget for small models — expanded prompt uses more tokens
max_context = 2000 if not use_tools else 8000
if len(memory_context) > max_context:
memory_context = memory_context[:max_context] + "\n... [truncated]"
full_prompt = f"{base_prompt}\n\n## Memory Context\n\n{memory_context}"
else:
full_prompt = base_prompt
return (
f"{base_prompt}\n\n"
f"## GROUNDED CONTEXT (verified sources — cite when using)\n\n"
f"{memory_context}"
)
except Exception as exc:
logger.warning("Failed to load memory context: %s", exc)
full_prompt = base_prompt
return base_prompt
def _create_ollama_agent(
*,
db_file: str,
model_name: str,
tools_list: list,
full_prompt: str,
use_tools: bool,
) -> Agent:
"""Construct the Agno Agent with Ollama backend and warm up the model."""
model_kwargs = {}
if settings.ollama_num_ctx > 0:
model_kwargs["options"] = {"num_ctx": settings.ollama_num_ctx}
agent = Agent(
name="Agent",
model=Ollama(id=model_name, host=settings.ollama_url, timeout=300, **model_kwargs),
@@ -326,6 +269,67 @@ def create_timmy(
return agent
def create_timmy(
db_file: str = "timmy.db",
backend: str | None = None,
*,
skip_mcp: bool = False,
session_id: str = "unknown",
) -> TimmyAgent:
"""Instantiate the agent — Ollama, Grok, or Claude.
Args:
db_file: SQLite file for Agno conversation memory (Ollama path only).
backend: "ollama" | "grok" | "claude" | None (reads config/env).
skip_mcp: If True, omit MCP tool servers (Gitea, filesystem).
Use for background tasks (thinking, QA) where MCP's
stdio cancel-scope lifecycle conflicts with asyncio
task cancellation.
Returns an Agno Agent or backend-specific agent — all expose
print_response(message, stream).
"""
resolved = _resolve_backend(backend)
if resolved == "claude":
from timmy.backends import ClaudeBackend
return ClaudeBackend()
if resolved == "grok":
from timmy.backends import GrokBackend
return GrokBackend()
# Default: Ollama via Agno.
model_name, is_fallback = _resolve_model_with_fallback(
requested_model=None,
require_vision=False,
auto_pull=True,
)
if not _check_model_available(model_name):
logger.error(
"Ollama unreachable and no local models available. "
"Start Ollama with 'ollama serve' or use --backend claude explicitly."
)
if is_fallback:
logger.info("Using fallback model %s (requested was unavailable)", model_name)
use_tools = _model_supports_tools(model_name)
tools_list = _build_tools_list(use_tools, skip_mcp, model_name)
full_prompt = _build_prompt(use_tools, session_id)
return _create_ollama_agent(
db_file=db_file,
model_name=model_name,
tools_list=tools_list,
full_prompt=full_prompt,
use_tools=use_tools,
)
class TimmyWithMemory:
"""Agent wrapper with explicit three-tier memory management."""

View File

@@ -95,6 +95,126 @@ def _parse_steps(plan_text: str) -> list[str]:
return [line.strip() for line in plan_text.strip().splitlines() if line.strip()]
# ---------------------------------------------------------------------------
# Extracted helpers
# ---------------------------------------------------------------------------
def _extract_content(run_result) -> str:
"""Extract text content from an agent run result."""
return run_result.content if hasattr(run_result, "content") else str(run_result)
def _clean(text: str) -> str:
"""Clean a model response using session's response cleaner."""
from timmy.session import _clean_response
return _clean_response(text)
async def _plan_task(
agent, task: str, session_id: str, max_steps: int
) -> tuple[list[str], bool] | str:
"""Run the planning phase — returns (steps, was_truncated) or error string."""
plan_prompt = (
f"Break this task into numbered steps (max {max_steps}). "
f"Return ONLY a numbered list, nothing else.\n\n"
f"Task: {task}"
)
try:
plan_run = await asyncio.to_thread(
agent.run, plan_prompt, stream=False, session_id=f"{session_id}_plan"
)
plan_text = _extract_content(plan_run)
except Exception as exc: # broad catch intentional: agent.run can raise any error
logger.error("Agentic loop: planning failed: %s", exc)
return f"Planning failed: {exc}"
steps = _parse_steps(plan_text)
if not steps:
return "Planning produced no steps."
planned_count = len(steps)
steps = steps[:max_steps]
return steps, planned_count > len(steps)
async def _execute_step(
agent,
task: str,
step_desc: str,
step_num: int,
total_steps: int,
recent_results: list[str],
session_id: str,
) -> AgenticStep:
"""Execute a single step, returning an AgenticStep."""
step_start = time.monotonic()
context = (
f"Task: {task}\n"
f"Step {step_num}/{total_steps}: {step_desc}\n"
f"Recent progress: {recent_results[-2:] if recent_results else []}\n\n"
f"Execute this step and report what you did."
)
step_run = await asyncio.to_thread(
agent.run, context, stream=False, session_id=f"{session_id}_step{step_num}"
)
step_result = _clean(_extract_content(step_run))
return AgenticStep(
step_num=step_num,
description=step_desc,
result=step_result,
status="completed",
duration_ms=int((time.monotonic() - step_start) * 1000),
)
async def _adapt_step(
agent,
step_desc: str,
step_num: int,
error: Exception,
step_start: float,
session_id: str,
) -> AgenticStep:
"""Attempt adaptation after a step failure."""
adapt_prompt = (
f"Step {step_num} failed with error: {error}\n"
f"Original step was: {step_desc}\n"
f"Adapt the plan and try an alternative approach for this step."
)
adapt_run = await asyncio.to_thread(
agent.run, adapt_prompt, stream=False, session_id=f"{session_id}_adapt{step_num}"
)
adapt_result = _clean(_extract_content(adapt_run))
return AgenticStep(
step_num=step_num,
description=f"[Adapted] {step_desc}",
result=adapt_result,
status="adapted",
duration_ms=int((time.monotonic() - step_start) * 1000),
)
def _summarize(result: AgenticResult, total_steps: int, was_truncated: bool) -> None:
"""Fill in summary and final status on the result object (mutates in place)."""
completed = sum(1 for s in result.steps if s.status == "completed")
adapted = sum(1 for s in result.steps if s.status == "adapted")
failed = sum(1 for s in result.steps if s.status == "failed")
parts = [f"Completed {completed}/{total_steps} steps"]
if adapted:
parts.append(f"{adapted} adapted")
if failed:
parts.append(f"{failed} failed")
result.summary = f"{result.task}: {', '.join(parts)}."
if was_truncated or len(result.steps) < total_steps or failed:
result.status = "partial"
else:
result.status = "completed"
# ---------------------------------------------------------------------------
# Core loop
# ---------------------------------------------------------------------------
@@ -125,88 +245,41 @@ async def run_agentic_loop(
task_id = str(uuid.uuid4())[:8]
start_time = time.monotonic()
agent = _get_loop_agent()
result = AgenticResult(task_id=task_id, task=task, summary="")
# ── Phase 1: Planning ──────────────────────────────────────────────────
plan_prompt = (
f"Break this task into numbered steps (max {max_steps}). "
f"Return ONLY a numbered list, nothing else.\n\n"
f"Task: {task}"
)
try:
plan_run = await asyncio.to_thread(
agent.run, plan_prompt, stream=False, session_id=f"{session_id}_plan"
)
plan_text = plan_run.content if hasattr(plan_run, "content") else str(plan_run)
except Exception as exc: # broad catch intentional: agent.run can raise any error
logger.error("Agentic loop: planning failed: %s", exc)
# Phase 1: Planning
plan = await _plan_task(agent, task, session_id, max_steps)
if isinstance(plan, str):
result.status = "failed"
result.summary = f"Planning failed: {exc}"
result.summary = plan
result.total_duration_ms = int((time.monotonic() - start_time) * 1000)
return result
steps = _parse_steps(plan_text)
if not steps:
result.status = "failed"
result.summary = "Planning produced no steps."
result.total_duration_ms = int((time.monotonic() - start_time) * 1000)
return result
# Enforce max_steps — track if we truncated
planned_steps = len(steps)
steps = steps[:max_steps]
steps, was_truncated = plan
total_steps = len(steps)
was_truncated = planned_steps > total_steps
# Broadcast plan
await _broadcast_progress(
"agentic.plan_ready",
{
"task_id": task_id,
"task": task,
"steps": steps,
"total": total_steps,
},
{"task_id": task_id, "task": task, "steps": steps, "total": total_steps},
)
# ── Phase 2: Execution ─────────────────────────────────────────────────
# Phase 2: Execution
completed_results: list[str] = []
for i, step_desc in enumerate(steps, 1):
step_start = time.monotonic()
recent = completed_results[-2:] if completed_results else []
context = (
f"Task: {task}\n"
f"Step {i}/{total_steps}: {step_desc}\n"
f"Recent progress: {recent}\n\n"
f"Execute this step and report what you did."
)
try:
step_run = await asyncio.to_thread(
agent.run, context, stream=False, session_id=f"{session_id}_step{i}"
)
step_result = step_run.content if hasattr(step_run, "content") else str(step_run)
# Clean the response
from timmy.session import _clean_response
step_result = _clean_response(step_result)
step = AgenticStep(
step_num=i,
description=step_desc,
result=step_result,
status="completed",
duration_ms=int((time.monotonic() - step_start) * 1000),
step = await _execute_step(
agent,
task,
step_desc,
i,
total_steps,
completed_results,
session_id,
)
result.steps.append(step)
completed_results.append(f"Step {i}: {step_result[:200]}")
# Broadcast progress
completed_results.append(f"Step {i}: {step.result[:200]}")
await _broadcast_progress(
"agentic.step_complete",
{
@@ -214,46 +287,18 @@ async def run_agentic_loop(
"step": i,
"total": total_steps,
"description": step_desc,
"result": step_result[:200],
"result": step.result[:200],
},
)
if on_progress:
await on_progress(step_desc, i, total_steps)
except Exception as exc: # broad catch intentional: agent.run can raise any error
logger.warning("Agentic loop step %d failed: %s", i, exc)
# ── Adaptation: ask model to adapt ─────────────────────────────
adapt_prompt = (
f"Step {i} failed with error: {exc}\n"
f"Original step was: {step_desc}\n"
f"Adapt the plan and try an alternative approach for this step."
)
try:
adapt_run = await asyncio.to_thread(
agent.run,
adapt_prompt,
stream=False,
session_id=f"{session_id}_adapt{i}",
)
adapt_result = (
adapt_run.content if hasattr(adapt_run, "content") else str(adapt_run)
)
from timmy.session import _clean_response
adapt_result = _clean_response(adapt_result)
step = AgenticStep(
step_num=i,
description=f"[Adapted] {step_desc}",
result=adapt_result,
status="adapted",
duration_ms=int((time.monotonic() - step_start) * 1000),
)
step = await _adapt_step(agent, step_desc, i, exc, step_start, session_id)
result.steps.append(step)
completed_results.append(f"Step {i} (adapted): {adapt_result[:200]}")
completed_results.append(f"Step {i} (adapted): {step.result[:200]}")
await _broadcast_progress(
"agentic.step_adapted",
{
@@ -262,46 +307,26 @@ async def run_agentic_loop(
"total": total_steps,
"description": step_desc,
"error": str(exc),
"adaptation": adapt_result[:200],
"adaptation": step.result[:200],
},
)
if on_progress:
await on_progress(f"[Adapted] {step_desc}", i, total_steps)
except Exception as adapt_exc: # broad catch intentional: agent.run can raise any error
except Exception as adapt_exc: # broad catch intentional
logger.error("Agentic loop adaptation also failed: %s", adapt_exc)
step = AgenticStep(
step_num=i,
description=step_desc,
result=f"Failed: {exc}; Adaptation also failed: {adapt_exc}",
status="failed",
duration_ms=int((time.monotonic() - step_start) * 1000),
result.steps.append(
AgenticStep(
step_num=i,
description=step_desc,
result=f"Failed: {exc}; Adaptation also failed: {adapt_exc}",
status="failed",
duration_ms=int((time.monotonic() - step_start) * 1000),
)
)
result.steps.append(step)
completed_results.append(f"Step {i}: FAILED")
# ── Phase 3: Summary ───────────────────────────────────────────────────
completed_count = sum(1 for s in result.steps if s.status == "completed")
adapted_count = sum(1 for s in result.steps if s.status == "adapted")
failed_count = sum(1 for s in result.steps if s.status == "failed")
parts = [f"Completed {completed_count}/{total_steps} steps"]
if adapted_count:
parts.append(f"{adapted_count} adapted")
if failed_count:
parts.append(f"{failed_count} failed")
result.summary = f"{task}: {', '.join(parts)}."
# Determine final status
if was_truncated:
result.status = "partial"
elif len(result.steps) < total_steps:
result.status = "partial"
elif any(s.status == "failed" for s in result.steps):
result.status = "partial"
else:
result.status = "completed"
# Phase 3: Summary
_summarize(result, total_steps, was_truncated)
result.total_duration_ms = int((time.monotonic() - start_time) * 1000)
await _broadcast_progress(

View File

@@ -119,75 +119,84 @@ class BaseAgent(ABC):
"""
pass
async def run(self, message: str) -> str:
"""Run the agent with a message.
# Transient errors that indicate Ollama contention or temporary
# unavailability — these deserve a retry with backoff.
_TRANSIENT = (
httpx.ConnectError,
httpx.ReadError,
httpx.ReadTimeout,
httpx.ConnectTimeout,
ConnectionError,
TimeoutError,
)
Retries on transient failures (connection errors, timeouts) with
exponential backoff. GPU contention from concurrent Ollama
requests causes ReadError / ReadTimeout — these are transient
and should be retried, not raised immediately (#70).
async def run(self, message: str, *, max_retries: int = 3) -> str:
"""Run the agent with a message, retrying on transient failures.
Returns:
Agent response
GPU contention from concurrent Ollama requests causes ReadError /
ReadTimeout — these are transient and retried with exponential
backoff (#70).
"""
max_retries = 3
last_exception = None
# Transient errors that indicate Ollama contention or temporary
# unavailability — these deserve a retry with backoff.
_transient = (
httpx.ConnectError,
httpx.ReadError,
httpx.ReadTimeout,
httpx.ConnectTimeout,
ConnectionError,
TimeoutError,
)
response = await self._run_with_retries(message, max_retries)
await self._emit_response_event(message, response)
return response
async def _run_with_retries(self, message: str, max_retries: int) -> str:
"""Execute agent.run() with retry logic for transient errors."""
for attempt in range(1, max_retries + 1):
try:
result = self.agent.run(message, stream=False)
response = result.content if hasattr(result, "content") else str(result)
break # Success, exit the retry loop
except _transient as exc:
last_exception = exc
if attempt < max_retries:
# Contention backoff — longer waits because the GPU
# needs time to finish the other request.
wait = min(2**attempt, 16)
logger.warning(
"Ollama contention on attempt %d/%d: %s. Waiting %ds before retry...",
attempt,
max_retries,
type(exc).__name__,
wait,
)
await asyncio.sleep(wait)
else:
logger.error(
"Ollama unreachable after %d attempts: %s",
max_retries,
exc,
)
raise last_exception from exc
return result.content if hasattr(result, "content") else str(result)
except self._TRANSIENT as exc:
self._handle_retry_or_raise(
exc,
attempt,
max_retries,
transient=True,
)
await asyncio.sleep(min(2**attempt, 16))
except Exception as exc:
last_exception = exc
if attempt < max_retries:
logger.warning(
"Agent run failed on attempt %d/%d: %s. Retrying...",
attempt,
max_retries,
exc,
)
await asyncio.sleep(min(2 ** (attempt - 1), 8))
else:
logger.error(
"Agent run failed after %d attempts: %s",
max_retries,
exc,
)
raise last_exception from exc
self._handle_retry_or_raise(
exc,
attempt,
max_retries,
transient=False,
)
await asyncio.sleep(min(2 ** (attempt - 1), 8))
# Unreachable — _handle_retry_or_raise raises on last attempt.
raise RuntimeError("retry loop exited unexpectedly") # pragma: no cover
# Emit completion event
@staticmethod
def _handle_retry_or_raise(
exc: Exception,
attempt: int,
max_retries: int,
*,
transient: bool,
) -> None:
"""Log a retry warning or raise after exhausting attempts."""
if attempt < max_retries:
if transient:
logger.warning(
"Ollama contention on attempt %d/%d: %s. Waiting before retry...",
attempt,
max_retries,
type(exc).__name__,
)
else:
logger.warning(
"Agent run failed on attempt %d/%d: %s. Retrying...",
attempt,
max_retries,
exc,
)
else:
label = "Ollama unreachable" if transient else "Agent run failed"
logger.error("%s after %d attempts: %s", label, max_retries, exc)
raise exc
async def _emit_response_event(self, message: str, response: str) -> None:
"""Publish a completion event to the event bus if connected."""
if self.event_bus:
await self.event_bus.publish(
Event(
@@ -197,8 +206,6 @@ class BaseAgent(ABC):
)
)
return response
def get_capabilities(self) -> list[str]:
"""Get list of capabilities this agent provides."""
return self.tools

View File

@@ -1,11 +1,10 @@
"""LLM backends — AirLLM (local big models), Grok (xAI), and Claude (Anthropic).
"""LLM backends — Grok (xAI) and Claude (Anthropic).
Provides drop-in replacements for the Agno Agent that expose the same
run(message, stream) → RunResult interface used by the dashboard and the
print_response(message, stream) interface used by the CLI.
Backends:
- TimmyAirLLMAgent: Local 8B/70B/405B via AirLLM (Apple Silicon or PyTorch)
- GrokBackend: xAI Grok API via OpenAI-compatible SDK (opt-in premium)
- ClaudeBackend: Anthropic Claude API — lightweight cloud fallback
@@ -16,21 +15,11 @@ import logging
import platform
import time
from dataclasses import dataclass
from typing import Literal
from timmy.prompts import get_system_prompt
logger = logging.getLogger(__name__)
# HuggingFace model IDs for each supported size.
_AIRLLM_MODELS: dict[str, str] = {
"8b": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"70b": "meta-llama/Meta-Llama-3.1-70B-Instruct",
"405b": "meta-llama/Meta-Llama-3.1-405B-Instruct",
}
ModelSize = Literal["8b", "70b", "405b"]
@dataclass
class RunResult:
@@ -45,108 +34,6 @@ def is_apple_silicon() -> bool:
return platform.system() == "Darwin" and platform.machine() == "arm64"
def airllm_available() -> bool:
"""Return True when the airllm package is importable."""
try:
import airllm # noqa: F401
return True
except ImportError:
return False
class TimmyAirLLMAgent:
"""Thin AirLLM wrapper compatible with both dashboard and CLI call sites.
Exposes:
run(message, stream) → RunResult(content=...) [dashboard]
print_response(message, stream) → None [CLI]
Maintains a rolling 10-turn in-memory history so Timmy remembers the
conversation within a session — no SQLite needed at this layer.
"""
def __init__(self, model_size: str = "70b") -> None:
model_id = _AIRLLM_MODELS.get(model_size)
if model_id is None:
raise ValueError(
f"Unknown model size {model_size!r}. Choose from: {list(_AIRLLM_MODELS)}"
)
if is_apple_silicon():
from airllm import AirLLMMLX # type: ignore[import]
self._model = AirLLMMLX(model_id)
else:
from airllm import AutoModel # type: ignore[import]
self._model = AutoModel.from_pretrained(model_id)
self._history: list[str] = []
self._model_size = model_size
# ── public interface (mirrors Agno Agent) ────────────────────────────────
def run(self, message: str, *, stream: bool = False) -> RunResult:
"""Run inference and return a structured result (matches Agno Agent.run()).
`stream` is accepted for API compatibility; AirLLM always generates
the full output in one pass.
"""
prompt = self._build_prompt(message)
input_tokens = self._model.tokenizer(
[prompt],
return_tensors="pt",
padding=True,
truncation=True,
max_length=2048,
)
output = self._model.generate(
**input_tokens,
max_new_tokens=512,
use_cache=True,
do_sample=True,
temperature=0.7,
)
# Decode only the newly generated tokens, not the prompt.
input_len = input_tokens["input_ids"].shape[1]
response = self._model.tokenizer.decode(
output[0][input_len:], skip_special_tokens=True
).strip()
self._history.append(f"User: {message}")
self._history.append(f"Timmy: {response}")
return RunResult(content=response)
def print_response(self, message: str, *, stream: bool = True) -> None:
"""Run inference and render the response to stdout (CLI interface)."""
result = self.run(message, stream=stream)
self._render(result.content)
# ── private helpers ──────────────────────────────────────────────────────
def _build_prompt(self, message: str) -> str:
context = get_system_prompt(tools_enabled=False, session_id="airllm") + "\n\n"
# Include the last 10 turns (5 exchanges) for continuity.
if self._history:
context += "\n".join(self._history[-10:]) + "\n\n"
return context + f"User: {message}\nTimmy:"
@staticmethod
def _render(text: str) -> None:
"""Print response with rich markdown when available, plain text otherwise."""
try:
from rich.console import Console
from rich.markdown import Markdown
Console().print(Markdown(text))
except ImportError:
print(text)
# ── Grok (xAI) Backend ─────────────────────────────────────────────────────
# Premium cloud augmentation — opt-in only, never the default path.
@@ -187,7 +74,7 @@ class GrokBackend:
Uses the OpenAI-compatible SDK to connect to xAI's API.
Only activated when GROK_ENABLED=true and XAI_API_KEY is set.
Exposes the same interface as TimmyAirLLMAgent and Agno Agent:
Exposes the same interface as Agno Agent:
run(message, stream) → RunResult [dashboard]
print_response(message, stream) → None [CLI]
health_check() → dict [monitoring]
@@ -437,8 +324,7 @@ CLAUDE_MODELS: dict[str, str] = {
class ClaudeBackend:
"""Anthropic Claude backend — cloud fallback when local models are offline.
Uses the official Anthropic SDK. Same interface as GrokBackend and
TimmyAirLLMAgent:
Uses the official Anthropic SDK. Same interface as GrokBackend:
run(message, stream) → RunResult [dashboard]
print_response(message, stream) → None [CLI]
health_check() → dict [monitoring]

View File

@@ -22,13 +22,13 @@ _BACKEND_OPTION = typer.Option(
None,
"--backend",
"-b",
help="Inference backend: 'ollama' (default) | 'airllm' | 'auto'",
help="Inference backend: 'ollama' (default) | 'grok' | 'claude'",
)
_MODEL_SIZE_OPTION = typer.Option(
None,
"--model-size",
"-s",
help="AirLLM model size when --backend airllm: '8b' | '70b' | '405b'",
help="Model size (reserved for future use).",
)
@@ -37,6 +37,35 @@ def _is_interactive() -> bool:
return hasattr(sys.stdin, "isatty") and sys.stdin.isatty()
def _prompt_interactive(req, tool_name: str, tool_args: dict) -> None:
"""Display tool details and prompt the human for approval."""
description = format_action_description(tool_name, tool_args)
impact = get_impact_level(tool_name)
typer.echo()
typer.echo(typer.style("Tool confirmation required", bold=True))
typer.echo(f" Impact: {impact.upper()}")
typer.echo(f" {description}")
typer.echo()
if typer.confirm("Allow this action?", default=False):
req.confirm()
logger.info("CLI: approved %s", tool_name)
else:
req.reject(note="User rejected from CLI")
logger.info("CLI: rejected %s", tool_name)
def _decide_autonomous(req, tool_name: str, tool_args: dict) -> None:
"""Auto-approve allowlisted tools; reject everything else."""
if is_allowlisted(tool_name, tool_args):
req.confirm()
logger.info("AUTO-APPROVED (allowlist): %s", tool_name)
else:
req.reject(note="Auto-rejected: not in allowlist")
logger.info("AUTO-REJECTED (not allowlisted): %s %s", tool_name, str(tool_args)[:100])
def _handle_tool_confirmation(agent, run_output, session_id: str, *, autonomous: bool = False):
"""Prompt user to approve/reject dangerous tool calls.
@@ -51,6 +80,7 @@ def _handle_tool_confirmation(agent, run_output, session_id: str, *, autonomous:
Returns the final RunOutput after all confirmations are resolved.
"""
interactive = _is_interactive() and not autonomous
decide = _prompt_interactive if interactive else _decide_autonomous
max_rounds = 10 # safety limit
for _ in range(max_rounds):
@@ -66,39 +96,10 @@ def _handle_tool_confirmation(agent, run_output, session_id: str, *, autonomous:
for req in reqs:
if not getattr(req, "needs_confirmation", False):
continue
te = req.tool_execution
tool_name = getattr(te, "tool_name", "unknown")
tool_args = getattr(te, "tool_args", {}) or {}
if interactive:
# Human present — prompt for approval
description = format_action_description(tool_name, tool_args)
impact = get_impact_level(tool_name)
typer.echo()
typer.echo(typer.style("Tool confirmation required", bold=True))
typer.echo(f" Impact: {impact.upper()}")
typer.echo(f" {description}")
typer.echo()
approved = typer.confirm("Allow this action?", default=False)
if approved:
req.confirm()
logger.info("CLI: approved %s", tool_name)
else:
req.reject(note="User rejected from CLI")
logger.info("CLI: rejected %s", tool_name)
else:
# Autonomous mode — check allowlist
if is_allowlisted(tool_name, tool_args):
req.confirm()
logger.info("AUTO-APPROVED (allowlist): %s", tool_name)
else:
req.reject(note="Auto-rejected: not in allowlist")
logger.info(
"AUTO-REJECTED (not allowlisted): %s %s", tool_name, str(tool_args)[:100]
)
decide(req, tool_name, tool_args)
# Resume the run so the agent sees the confirmation result
try:
@@ -138,7 +139,7 @@ def think(
model_size: str | None = _MODEL_SIZE_OPTION,
):
"""Ask Timmy to think carefully about a topic."""
timmy = create_timmy(backend=backend, model_size=model_size, session_id=_CLI_SESSION_ID)
timmy = create_timmy(backend=backend, session_id=_CLI_SESSION_ID)
timmy.print_response(f"Think carefully about: {topic}", stream=True, session_id=_CLI_SESSION_ID)
@@ -201,7 +202,7 @@ def chat(
session_id = str(uuid.uuid4())
else:
session_id = _CLI_SESSION_ID
timmy = create_timmy(backend=backend, model_size=model_size, session_id=session_id)
timmy = create_timmy(backend=backend, session_id=session_id)
# Use agent.run() so we can intercept paused runs for tool confirmation.
run_output = timmy.run(message_str, stream=False, session_id=session_id)
@@ -278,7 +279,7 @@ def status(
model_size: str | None = _MODEL_SIZE_OPTION,
):
"""Print Timmy's operational status."""
timmy = create_timmy(backend=backend, model_size=model_size, session_id=_CLI_SESSION_ID)
timmy = create_timmy(backend=backend, session_id=_CLI_SESSION_ID)
timmy.print_response(STATUS_PROMPT, stream=False, session_id=_CLI_SESSION_ID)

View File

@@ -21,6 +21,10 @@ Usage::
from __future__ import annotations
import logging
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from PIL import ImageDraw
import os
import shutil
import sqlite3
@@ -270,20 +274,8 @@ async def create_gitea_issue_via_mcp(title: str, body: str = "", labels: str = "
return f"Failed to create issue via MCP: {exc}"
def _generate_avatar_image() -> bytes:
"""Generate a Timmy-themed avatar image using Pillow.
Creates a 512x512 wizard-themed avatar with emerald/purple/gold palette.
Returns raw PNG bytes. Falls back to a minimal solid-color image if
Pillow drawing primitives fail.
"""
from PIL import Image, ImageDraw
size = 512
img = Image.new("RGB", (size, size), (15, 25, 20))
draw = ImageDraw.Draw(img)
# Background gradient effect — concentric circles
def _draw_background(draw: ImageDraw.ImageDraw, size: int) -> None:
"""Draw radial gradient background with concentric circles."""
for i in range(size // 2, 0, -4):
g = int(25 + (i / (size // 2)) * 30)
draw.ellipse(
@@ -291,33 +283,45 @@ def _generate_avatar_image() -> bytes:
fill=(10, g, 20),
)
# Wizard hat (triangle)
def _draw_wizard(draw: ImageDraw.ImageDraw) -> None:
"""Draw wizard hat, face, eyes, smile, monogram, and robe."""
hat_color = (100, 50, 160) # purple
draw.polygon(
[(256, 40), (160, 220), (352, 220)],
fill=hat_color,
outline=(180, 130, 255),
)
hat_outline = (180, 130, 255)
gold = (220, 190, 50)
pupil = (30, 30, 60)
# Hat brim
draw.ellipse([140, 200, 372, 250], fill=hat_color, outline=(180, 130, 255))
# Hat + brim
draw.polygon([(256, 40), (160, 220), (352, 220)], fill=hat_color, outline=hat_outline)
draw.ellipse([140, 200, 372, 250], fill=hat_color, outline=hat_outline)
# Face circle
# Face
draw.ellipse([190, 220, 322, 370], fill=(60, 180, 100), outline=(80, 220, 120))
# Eyes
# Eyes (whites + pupils)
draw.ellipse([220, 275, 248, 310], fill=(255, 255, 255))
draw.ellipse([264, 275, 292, 310], fill=(255, 255, 255))
draw.ellipse([228, 285, 242, 300], fill=(30, 30, 60))
draw.ellipse([272, 285, 286, 300], fill=(30, 30, 60))
draw.ellipse([228, 285, 242, 300], fill=pupil)
draw.ellipse([272, 285, 286, 300], fill=pupil)
# Smile
draw.arc([225, 300, 287, 355], start=10, end=170, fill=(30, 30, 60), width=3)
draw.arc([225, 300, 287, 355], start=10, end=170, fill=pupil, width=3)
# Stars around the hat
# "T" monogram on hat
draw.text((243, 100), "T", fill=gold)
# Robe
draw.polygon(
[(180, 370), (140, 500), (372, 500), (332, 370)],
fill=(40, 100, 70),
outline=(60, 160, 100),
)
def _draw_stars(draw: ImageDraw.ImageDraw) -> None:
"""Draw decorative gold stars around the wizard hat."""
gold = (220, 190, 50)
star_positions = [(120, 100), (380, 120), (100, 300), (400, 280), (256, 10)]
for sx, sy in star_positions:
for sx, sy in [(120, 100), (380, 120), (100, 300), (400, 280), (256, 10)]:
r = 8
draw.polygon(
[
@@ -333,18 +337,26 @@ def _generate_avatar_image() -> bytes:
fill=gold,
)
# "T" monogram on the hat
draw.text((243, 100), "T", fill=gold)
# Robe / body
draw.polygon(
[(180, 370), (140, 500), (372, 500), (332, 370)],
fill=(40, 100, 70),
outline=(60, 160, 100),
)
def _generate_avatar_image() -> bytes:
"""Generate a Timmy-themed avatar image using Pillow.
Creates a 512x512 wizard-themed avatar with emerald/purple/gold palette.
Returns raw PNG bytes. Falls back to a minimal solid-color image if
Pillow drawing primitives fail.
"""
import io
from PIL import Image, ImageDraw
size = 512
img = Image.new("RGB", (size, size), (15, 25, 20))
draw = ImageDraw.Draw(img)
_draw_background(draw, size)
_draw_wizard(draw)
_draw_stars(draw)
buf = io.BytesIO()
img.save(buf, format="PNG")
return buf.getvalue()

View File

@@ -78,83 +78,88 @@ def _migrate_schema(conn: sqlite3.Connection) -> None:
cursor = conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
tables = {row[0] for row in cursor.fetchall()}
has_memories = "memories" in tables
has_episodes = "episodes" in tables
has_chunks = "chunks" in tables
has_facts = "facts" in tables
# Check if we need to migrate (old schema exists but new one doesn't fully)
if not has_memories:
if "memories" not in tables:
logger.info("Migration: Creating unified memories table")
# Schema will be created above
# Migrate episodes -> memories
if has_episodes and has_memories:
logger.info("Migration: Converting episodes table to memories")
try:
cols = _get_table_columns(conn, "episodes")
context_type_col = "context_type" if "context_type" in cols else "'conversation'"
conn.execute(f"""
INSERT INTO memories (
id, content, memory_type, source, embedding,
metadata, agent_id, task_id, session_id,
created_at, access_count, last_accessed
)
SELECT
id, content,
COALESCE({context_type_col}, 'conversation'),
COALESCE(source, 'agent'),
embedding,
metadata, agent_id, task_id, session_id,
COALESCE(timestamp, datetime('now')), 0, NULL
FROM episodes
""")
conn.execute("DROP TABLE episodes")
logger.info("Migration: Migrated episodes to memories")
except sqlite3.Error as exc:
logger.warning("Migration: Failed to migrate episodes: %s", exc)
# Migrate chunks -> memories as vault_chunk
if has_chunks and has_memories:
logger.info("Migration: Converting chunks table to memories")
try:
cols = _get_table_columns(conn, "chunks")
id_col = "id" if "id" in cols else "CAST(rowid AS TEXT)"
content_col = "content" if "content" in cols else "text"
source_col = (
"filepath" if "filepath" in cols else ("source" if "source" in cols else "'vault'")
)
embedding_col = "embedding" if "embedding" in cols else "NULL"
created_col = "created_at" if "created_at" in cols else "datetime('now')"
conn.execute(f"""
INSERT INTO memories (
id, content, memory_type, source, embedding,
created_at, access_count
)
SELECT
{id_col}, {content_col}, 'vault_chunk', {source_col},
{embedding_col}, {created_col}, 0
FROM chunks
""")
conn.execute("DROP TABLE chunks")
logger.info("Migration: Migrated chunks to memories")
except sqlite3.Error as exc:
logger.warning("Migration: Failed to migrate chunks: %s", exc)
# Drop old facts table
if has_facts:
try:
conn.execute("DROP TABLE facts")
logger.info("Migration: Dropped old facts table")
except sqlite3.Error as exc:
logger.warning("Migration: Failed to drop facts: %s", exc)
# Schema will be created by _ensure_schema above
conn.commit()
return
_migrate_episodes(conn, tables)
_migrate_chunks(conn, tables)
_drop_legacy_tables(conn, tables)
conn.commit()
def _migrate_episodes(conn: sqlite3.Connection, tables: set[str]) -> None:
"""Migrate episodes table rows into the unified memories table."""
if "episodes" not in tables:
return
logger.info("Migration: Converting episodes table to memories")
try:
cols = _get_table_columns(conn, "episodes")
context_type_col = "context_type" if "context_type" in cols else "'conversation'"
conn.execute(f"""
INSERT INTO memories (
id, content, memory_type, source, embedding,
metadata, agent_id, task_id, session_id,
created_at, access_count, last_accessed
)
SELECT
id, content,
COALESCE({context_type_col}, 'conversation'),
COALESCE(source, 'agent'),
embedding,
metadata, agent_id, task_id, session_id,
COALESCE(timestamp, datetime('now')), 0, NULL
FROM episodes
""")
conn.execute("DROP TABLE episodes")
logger.info("Migration: Migrated episodes to memories")
except sqlite3.Error as exc:
logger.warning("Migration: Failed to migrate episodes: %s", exc)
def _migrate_chunks(conn: sqlite3.Connection, tables: set[str]) -> None:
"""Migrate chunks table rows into the unified memories table as vault_chunk."""
if "chunks" not in tables:
return
logger.info("Migration: Converting chunks table to memories")
try:
cols = _get_table_columns(conn, "chunks")
id_col = "id" if "id" in cols else "CAST(rowid AS TEXT)"
content_col = "content" if "content" in cols else "text"
source_col = (
"filepath" if "filepath" in cols else ("source" if "source" in cols else "'vault'")
)
embedding_col = "embedding" if "embedding" in cols else "NULL"
created_col = "created_at" if "created_at" in cols else "datetime('now')"
conn.execute(f"""
INSERT INTO memories (
id, content, memory_type, source, embedding,
created_at, access_count
)
SELECT
{id_col}, {content_col}, 'vault_chunk', {source_col},
{embedding_col}, {created_col}, 0
FROM chunks
""")
conn.execute("DROP TABLE chunks")
logger.info("Migration: Migrated chunks to memories")
except sqlite3.Error as exc:
logger.warning("Migration: Failed to migrate chunks: %s", exc)
def _drop_legacy_tables(conn: sqlite3.Connection, tables: set[str]) -> None:
"""Drop old facts table if it exists."""
if "facts" not in tables:
return
try:
conn.execute("DROP TABLE facts")
logger.info("Migration: Dropped old facts table")
except sqlite3.Error as exc:
logger.warning("Migration: Failed to drop facts: %s", exc)
def _get_table_columns(conn: sqlite3.Connection, table_name: str) -> set[str]:
"""Get the column names for a table."""
cursor = conn.execute(f"PRAGMA table_info({table_name})")

View File

@@ -98,6 +98,73 @@ def _get_table_columns(conn: sqlite3.Connection, table_name: str) -> set[str]:
return {row[1] for row in cursor.fetchall()}
def _migrate_episodes(conn: sqlite3.Connection) -> None:
"""Migrate episodes table rows into the unified memories table."""
logger.info("Migration: Converting episodes table to memories")
try:
cols = _get_table_columns(conn, "episodes")
context_type_col = "context_type" if "context_type" in cols else "'conversation'"
conn.execute(f"""
INSERT INTO memories (
id, content, memory_type, source, embedding,
metadata, agent_id, task_id, session_id,
created_at, access_count, last_accessed
)
SELECT
id, content,
COALESCE({context_type_col}, 'conversation'),
COALESCE(source, 'agent'),
embedding,
metadata, agent_id, task_id, session_id,
COALESCE(timestamp, datetime('now')), 0, NULL
FROM episodes
""")
conn.execute("DROP TABLE episodes")
logger.info("Migration: Migrated episodes to memories")
except sqlite3.Error as exc:
logger.warning("Migration: Failed to migrate episodes: %s", exc)
def _migrate_chunks(conn: sqlite3.Connection) -> None:
"""Migrate chunks table rows into the unified memories table."""
logger.info("Migration: Converting chunks table to memories")
try:
cols = _get_table_columns(conn, "chunks")
id_col = "id" if "id" in cols else "CAST(rowid AS TEXT)"
content_col = "content" if "content" in cols else "text"
source_col = (
"filepath" if "filepath" in cols else ("source" if "source" in cols else "'vault'")
)
embedding_col = "embedding" if "embedding" in cols else "NULL"
created_col = "created_at" if "created_at" in cols else "datetime('now')"
conn.execute(f"""
INSERT INTO memories (
id, content, memory_type, source, embedding,
created_at, access_count
)
SELECT
{id_col}, {content_col}, 'vault_chunk', {source_col},
{embedding_col}, {created_col}, 0
FROM chunks
""")
conn.execute("DROP TABLE chunks")
logger.info("Migration: Migrated chunks to memories")
except sqlite3.Error as exc:
logger.warning("Migration: Failed to migrate chunks: %s", exc)
def _drop_legacy_table(conn: sqlite3.Connection, table: str) -> None:
"""Drop a legacy table if it exists."""
try:
conn.execute(f"DROP TABLE {table}") # noqa: S608
logger.info("Migration: Dropped old %s table", table)
except sqlite3.Error as exc:
logger.warning("Migration: Failed to drop %s: %s", table, exc)
def _migrate_schema(conn: sqlite3.Connection) -> None:
"""Migrate from old three-table schema to unified memories table.
@@ -110,78 +177,16 @@ def _migrate_schema(conn: sqlite3.Connection) -> None:
tables = {row[0] for row in cursor.fetchall()}
has_memories = "memories" in tables
has_episodes = "episodes" in tables
has_chunks = "chunks" in tables
has_facts = "facts" in tables
# Check if we need to migrate (old schema exists)
if not has_memories and (has_episodes or has_chunks or has_facts):
if not has_memories and (tables & {"episodes", "chunks", "facts"}):
logger.info("Migration: Creating unified memories table")
# Schema will be created by _ensure_schema above
# Migrate episodes -> memories
if has_episodes and has_memories:
logger.info("Migration: Converting episodes table to memories")
try:
cols = _get_table_columns(conn, "episodes")
context_type_col = "context_type" if "context_type" in cols else "'conversation'"
conn.execute(f"""
INSERT INTO memories (
id, content, memory_type, source, embedding,
metadata, agent_id, task_id, session_id,
created_at, access_count, last_accessed
)
SELECT
id, content,
COALESCE({context_type_col}, 'conversation'),
COALESCE(source, 'agent'),
embedding,
metadata, agent_id, task_id, session_id,
COALESCE(timestamp, datetime('now')), 0, NULL
FROM episodes
""")
conn.execute("DROP TABLE episodes")
logger.info("Migration: Migrated episodes to memories")
except sqlite3.Error as exc:
logger.warning("Migration: Failed to migrate episodes: %s", exc)
# Migrate chunks -> memories as vault_chunk
if has_chunks and has_memories:
logger.info("Migration: Converting chunks table to memories")
try:
cols = _get_table_columns(conn, "chunks")
id_col = "id" if "id" in cols else "CAST(rowid AS TEXT)"
content_col = "content" if "content" in cols else "text"
source_col = (
"filepath" if "filepath" in cols else ("source" if "source" in cols else "'vault'")
)
embedding_col = "embedding" if "embedding" in cols else "NULL"
created_col = "created_at" if "created_at" in cols else "datetime('now')"
conn.execute(f"""
INSERT INTO memories (
id, content, memory_type, source, embedding,
created_at, access_count
)
SELECT
{id_col}, {content_col}, 'vault_chunk', {source_col},
{embedding_col}, {created_col}, 0
FROM chunks
""")
conn.execute("DROP TABLE chunks")
logger.info("Migration: Migrated chunks to memories")
except sqlite3.Error as exc:
logger.warning("Migration: Failed to migrate chunks: %s", exc)
# Drop old tables
if has_facts:
try:
conn.execute("DROP TABLE facts")
logger.info("Migration: Dropped old facts table")
except sqlite3.Error as exc:
logger.warning("Migration: Failed to drop facts: %s", exc)
if "episodes" in tables and has_memories:
_migrate_episodes(conn)
if "chunks" in tables and has_memories:
_migrate_chunks(conn)
if "facts" in tables:
_drop_legacy_table(conn, "facts")
conn.commit()
@@ -298,6 +303,85 @@ def store_memory(
return entry
def _build_search_filters(
context_type: str | None,
agent_id: str | None,
session_id: str | None,
) -> tuple[str, list]:
"""Build SQL WHERE clause and params from search filters."""
conditions: list[str] = []
params: list = []
if context_type:
conditions.append("memory_type = ?")
params.append(context_type)
if agent_id:
conditions.append("agent_id = ?")
params.append(agent_id)
if session_id:
conditions.append("session_id = ?")
params.append(session_id)
where_clause = "WHERE " + " AND ".join(conditions) if conditions else ""
return where_clause, params
def _fetch_memory_candidates(
where_clause: str, params: list, candidate_limit: int
) -> list[sqlite3.Row]:
"""Fetch candidate memory rows from the database."""
query_sql = f"""
SELECT * FROM memories
{where_clause}
ORDER BY created_at DESC
LIMIT ?
"""
params.append(candidate_limit)
with get_connection() as conn:
return conn.execute(query_sql, params).fetchall()
def _row_to_entry(row: sqlite3.Row) -> MemoryEntry:
"""Convert a database row to a MemoryEntry."""
return MemoryEntry(
id=row["id"],
content=row["content"],
source=row["source"],
context_type=row["memory_type"], # DB column -> API field
agent_id=row["agent_id"],
task_id=row["task_id"],
session_id=row["session_id"],
metadata=json.loads(row["metadata"]) if row["metadata"] else None,
embedding=json.loads(row["embedding"]) if row["embedding"] else None,
timestamp=row["created_at"],
)
def _score_and_filter(
rows: list[sqlite3.Row],
query: str,
query_embedding: list[float],
min_relevance: float,
) -> list[MemoryEntry]:
"""Score candidate rows by similarity and filter by min_relevance."""
results = []
for row in rows:
entry = _row_to_entry(row)
if entry.embedding:
score = cosine_similarity(query_embedding, entry.embedding)
else:
score = _keyword_overlap(query, entry.content)
entry.relevance_score = score
if score >= min_relevance:
results.append(entry)
results.sort(key=lambda x: x.relevance_score or 0, reverse=True)
return results
def search_memories(
query: str,
limit: int = 10,
@@ -320,65 +404,9 @@ def search_memories(
List of MemoryEntry objects sorted by relevance
"""
query_embedding = embed_text(query)
# Build query with filters
conditions = []
params = []
if context_type:
conditions.append("memory_type = ?")
params.append(context_type)
if agent_id:
conditions.append("agent_id = ?")
params.append(agent_id)
if session_id:
conditions.append("session_id = ?")
params.append(session_id)
where_clause = "WHERE " + " AND ".join(conditions) if conditions else ""
# Fetch candidates (we'll do in-memory similarity for now)
query_sql = f"""
SELECT * FROM memories
{where_clause}
ORDER BY created_at DESC
LIMIT ?
"""
params.append(limit * 3) # Get more candidates for ranking
with get_connection() as conn:
rows = conn.execute(query_sql, params).fetchall()
# Compute similarity scores
results = []
for row in rows:
entry = MemoryEntry(
id=row["id"],
content=row["content"],
source=row["source"],
context_type=row["memory_type"], # DB column -> API field
agent_id=row["agent_id"],
task_id=row["task_id"],
session_id=row["session_id"],
metadata=json.loads(row["metadata"]) if row["metadata"] else None,
embedding=json.loads(row["embedding"]) if row["embedding"] else None,
timestamp=row["created_at"],
)
if entry.embedding:
score = cosine_similarity(query_embedding, entry.embedding)
entry.relevance_score = score
if score >= min_relevance:
results.append(entry)
else:
# Fallback: check for keyword overlap
score = _keyword_overlap(query, entry.content)
entry.relevance_score = score
if score >= min_relevance:
results.append(entry)
# Sort by relevance and return top results
results.sort(key=lambda x: x.relevance_score or 0, reverse=True)
where_clause, params = _build_search_filters(context_type, agent_id, session_id)
rows = _fetch_memory_candidates(where_clause, params, limit * 3)
results = _score_and_filter(rows, query, query_embedding, min_relevance)
return results[:limit]
@@ -636,7 +664,7 @@ class HotMemory:
if len(lines) > 1:
return "\n".join(lines)
except Exception:
pass
logger.debug("DB context read failed, falling back to file")
# Fallback to file if DB unavailable
if self.path.exists():

View File

@@ -23,6 +23,9 @@ Rules:
- Remember what the user tells you during the conversation.
- If you don't know something, say so honestly — never fabricate facts.
- If a request is ambiguous, ask a brief clarifying question before guessing.
- SOURCE DISTINCTION: When answering from memory or retrieved context, cite it.
When answering from your own training, use hedging: "I think", "I believe".
The user must be able to tell grounded claims from pattern-matching.
- Use the user's name if you know it.
- When you state a fact, commit to it.
- NEVER attempt arithmetic in your head. If asked to compute anything, respond:
@@ -78,6 +81,18 @@ HONESTY:
- Never fabricate tool output. Call the tool and wait.
- If a tool errors, report the exact error.
SOURCE DISTINCTION (SOUL requirement — non-negotiable):
- Every claim you make comes from one of two places: a verified source you
can point to, or your own pattern-matching. The user must be able to tell
which is which.
- When your response uses information from GROUNDED CONTEXT (memory, retrieved
documents, tool output), cite it: "From memory:", "According to [source]:".
- When you are generating from your training data alone, signal it naturally:
"I think", "My understanding is", "I believe" — never false certainty.
- If the user asks a factual question and you have no grounded source, say so:
"I don't have a verified source for this — from my training I think..."
- Prefer "I don't know" over a confident-sounding guess. Refusal over fabrication.
MEMORY (three tiers):
- Tier 1: MEMORY.md (hot, always loaded)
- Tier 2: memory/ vault (structured, append-only, date-stamped)

View File

@@ -323,6 +323,75 @@ def session_history(query: str, role: str = "", limit: int = 10) -> str:
_LOW_CONFIDENCE_THRESHOLD = 0.5
def _categorize_entries(
entries: list[dict],
) -> tuple[list[dict], list[dict], list[dict], list[dict]]:
"""Split session entries into messages, errors, timmy msgs, user msgs."""
messages = [e for e in entries if e.get("type") == "message"]
errors = [e for e in entries if e.get("type") == "error"]
timmy_msgs = [e for e in messages if e.get("role") == "timmy"]
user_msgs = [e for e in messages if e.get("role") == "user"]
return messages, errors, timmy_msgs, user_msgs
def _find_low_confidence(timmy_msgs: list[dict]) -> list[dict]:
"""Return Timmy responses below the confidence threshold."""
return [
m
for m in timmy_msgs
if m.get("confidence") is not None and m["confidence"] < _LOW_CONFIDENCE_THRESHOLD
]
def _find_repeated_topics(user_msgs: list[dict], top_n: int = 5) -> list[tuple[str, int]]:
"""Identify frequently mentioned words in user messages."""
topic_counts: dict[str, int] = {}
for m in user_msgs:
for word in (m.get("content") or "").lower().split():
cleaned = word.strip(".,!?\"'()[]")
if len(cleaned) > 3:
topic_counts[cleaned] = topic_counts.get(cleaned, 0) + 1
return sorted(
((w, c) for w, c in topic_counts.items() if c >= 3),
key=lambda x: x[1],
reverse=True,
)[:top_n]
def _format_reflection_section(
title: str,
items: list[dict],
formatter: object,
empty_msg: str,
) -> list[str]:
"""Format a titled section with items, or an empty-state message."""
if items:
lines = [f"### {title} ({len(items)})"]
for item in items[:5]:
lines.append(formatter(item)) # type: ignore[operator]
lines.append("")
return lines
return [f"### {title}\n{empty_msg}\n"]
def _build_insights(
low_conf: list[dict],
errors: list[dict],
repeated: list[tuple[str, int]],
) -> list[str]:
"""Generate actionable insight bullets from analysis results."""
insights: list[str] = []
if low_conf:
insights.append("Consider studying topics where confidence was low.")
if errors:
insights.append("Review error patterns for recurring infrastructure issues.")
if repeated:
insights.append(
f'User frequently asks about "{repeated[0][0]}" — consider deepening knowledge here.'
)
return insights or ["Conversations look healthy. Keep up the good work."]
def self_reflect(limit: int = 30) -> str:
"""Review recent conversations and reflect on Timmy's own behavior.
@@ -343,35 +412,12 @@ def self_reflect(limit: int = 30) -> str:
if not entries:
return "No conversation history to reflect on yet."
# Categorize entries
messages = [e for e in entries if e.get("type") == "message"]
errors = [e for e in entries if e.get("type") == "error"]
timmy_msgs = [e for e in messages if e.get("role") == "timmy"]
user_msgs = [e for e in messages if e.get("role") == "user"]
# 1. Low-confidence responses
low_conf = [
m
for m in timmy_msgs
if m.get("confidence") is not None and m["confidence"] < _LOW_CONFIDENCE_THRESHOLD
]
# 2. Identify repeated user topics (simple word frequency)
topic_counts: dict[str, int] = {}
for m in user_msgs:
for word in (m.get("content") or "").lower().split():
cleaned = word.strip(".,!?\"'()[]")
if len(cleaned) > 3:
topic_counts[cleaned] = topic_counts.get(cleaned, 0) + 1
repeated = sorted(
((w, c) for w, c in topic_counts.items() if c >= 3),
key=lambda x: x[1],
reverse=True,
)[:5]
_messages, errors, timmy_msgs, user_msgs = _categorize_entries(entries)
low_conf = _find_low_confidence(timmy_msgs)
repeated = _find_repeated_topics(user_msgs)
# Build reflection report
sections: list[str] = ["## Self-Reflection Report\n"]
sections.append(
f"Reviewed {len(entries)} recent entries: "
f"{len(user_msgs)} user messages, "
@@ -379,32 +425,27 @@ def self_reflect(limit: int = 30) -> str:
f"{len(errors)} errors.\n"
)
# Low confidence
if low_conf:
sections.append(f"### Low-Confidence Responses ({len(low_conf)})")
for m in low_conf[:5]:
ts = (m.get("timestamp") or "?")[:19]
conf = m.get("confidence", 0)
text = (m.get("content") or "")[:120]
sections.append(f"- [{ts}] confidence={conf:.0%}: {text}")
sections.append("")
else:
sections.append(
"### Low-Confidence Responses\nNone found — all responses above threshold.\n"
sections.extend(
_format_reflection_section(
"Low-Confidence Responses",
low_conf,
lambda m: (
f"- [{(m.get('timestamp') or '?')[:19]}] "
f"confidence={m.get('confidence', 0):.0%}: "
f"{(m.get('content') or '')[:120]}"
),
"None found — all responses above threshold.",
)
)
sections.extend(
_format_reflection_section(
"Errors",
errors,
lambda e: f"- [{(e.get('timestamp') or '?')[:19]}] {(e.get('error') or '')[:120]}",
"No errors recorded.",
)
)
# Errors
if errors:
sections.append(f"### Errors ({len(errors)})")
for e in errors[:5]:
ts = (e.get("timestamp") or "?")[:19]
err = (e.get("error") or "")[:120]
sections.append(f"- [{ts}] {err}")
sections.append("")
else:
sections.append("### Errors\nNo errors recorded.\n")
# Repeated topics
if repeated:
sections.append("### Recurring Topics")
for word, count in repeated:
@@ -413,22 +454,8 @@ def self_reflect(limit: int = 30) -> str:
else:
sections.append("### Recurring Topics\nNo strong patterns detected.\n")
# Actionable summary
insights: list[str] = []
if low_conf:
insights.append("Consider studying topics where confidence was low.")
if errors:
insights.append("Review error patterns for recurring infrastructure issues.")
if repeated:
top_topic = repeated[0][0]
insights.append(
f'User frequently asks about "{top_topic}" — consider deepening knowledge here.'
)
if not insights:
insights.append("Conversations look healthy. Keep up the good work.")
sections.append("### Insights")
for insight in insights:
for insight in _build_insights(low_conf, errors, repeated):
sections.append(f"- {insight}")
return "\n".join(sections)

View File

@@ -232,6 +232,90 @@ class ThinkingEngine:
return False # Disabled — never idle
return datetime.now(UTC) - self._last_input_time > timedelta(minutes=timeout)
def _build_thinking_context(self) -> tuple[str, str, list["Thought"]]:
"""Assemble the context needed for a thinking cycle.
Returns:
(memory_context, system_context, recent_thoughts)
"""
memory_context = self._load_memory_context()
system_context = self._gather_system_snapshot()
recent_thoughts = self.get_recent_thoughts(limit=5)
return memory_context, system_context, recent_thoughts
async def _generate_novel_thought(
self,
prompt: str | None,
memory_context: str,
system_context: str,
recent_thoughts: list["Thought"],
) -> tuple[str | None, str]:
"""Run the dedup-retry loop to produce a novel thought.
Returns:
(content, seed_type) — content is None if no novel thought produced.
"""
seed_type: str = "freeform"
for attempt in range(self._MAX_DEDUP_RETRIES + 1):
if prompt:
seed_type = "prompted"
seed_context = f"Journal prompt: {prompt}"
else:
seed_type, seed_context = self._gather_seed()
continuity = self._build_continuity_context()
full_prompt = _THINKING_PROMPT.format(
memory_context=memory_context,
system_context=system_context,
seed_context=seed_context,
continuity_context=continuity,
)
try:
raw = await self._call_agent(full_prompt)
except Exception as exc:
logger.warning("Thinking cycle failed (Ollama likely down): %s", exc)
return None, seed_type
if not raw or not raw.strip():
logger.debug("Thinking cycle produced empty response, skipping")
return None, seed_type
content = raw.strip()
# Dedup: reject thoughts too similar to recent ones
if not self._is_too_similar(content, recent_thoughts):
return content, seed_type # Good — novel thought
if attempt < self._MAX_DEDUP_RETRIES:
logger.info(
"Thought too similar to recent (attempt %d/%d), retrying with new seed",
attempt + 1,
self._MAX_DEDUP_RETRIES + 1,
)
else:
logger.warning(
"Thought still repetitive after %d retries, discarding",
self._MAX_DEDUP_RETRIES + 1,
)
return None, seed_type
return None, seed_type
async def _process_thinking_result(self, thought: "Thought") -> None:
"""Run all post-hooks after a thought is stored."""
self._maybe_check_memory()
await self._maybe_distill()
await self._maybe_file_issues()
await self._check_workspace()
self._maybe_check_memory_status()
self._update_memory(thought)
self._log_event(thought)
self._write_journal(thought)
await self._broadcast(thought)
async def think_once(self, prompt: str | None = None) -> Thought | None:
"""Execute one thinking cycle.
@@ -257,91 +341,26 @@ class ThinkingEngine:
)
return None
memory_context = self._load_memory_context()
system_context = self._gather_system_snapshot()
recent_thoughts = self.get_recent_thoughts(limit=5)
# Capture arrival time *before* the LLM call so the thought
# timestamp reflects when the cycle started, not when the
# (potentially slow) generation finished. Fixes #582.
arrived_at = datetime.now(UTC).isoformat()
content: str | None = None
seed_type: str = "freeform"
for attempt in range(self._MAX_DEDUP_RETRIES + 1):
if prompt:
seed_type = "prompted"
seed_context = f"Journal prompt: {prompt}"
else:
seed_type, seed_context = self._gather_seed()
continuity = self._build_continuity_context()
full_prompt = _THINKING_PROMPT.format(
memory_context=memory_context,
system_context=system_context,
seed_context=seed_context,
continuity_context=continuity,
)
try:
raw = await self._call_agent(full_prompt)
except Exception as exc:
logger.warning("Thinking cycle failed (Ollama likely down): %s", exc)
return None
if not raw or not raw.strip():
logger.debug("Thinking cycle produced empty response, skipping")
return None
content = raw.strip()
# Dedup: reject thoughts too similar to recent ones
if not self._is_too_similar(content, recent_thoughts):
break # Good — novel thought
if attempt < self._MAX_DEDUP_RETRIES:
logger.info(
"Thought too similar to recent (attempt %d/%d), retrying with new seed",
attempt + 1,
self._MAX_DEDUP_RETRIES + 1,
)
content = None # Will retry
else:
logger.warning(
"Thought still repetitive after %d retries, discarding",
self._MAX_DEDUP_RETRIES + 1,
)
return None
memory_context, system_context, recent_thoughts = self._build_thinking_context()
content, seed_type = await self._generate_novel_thought(
prompt,
memory_context,
system_context,
recent_thoughts,
)
if not content:
return None
thought = self._store_thought(content, seed_type)
thought = self._store_thought(content, seed_type, arrived_at=arrived_at)
self._last_thought_id = thought.id
# Post-hook: check memory status periodically
self._maybe_check_memory()
# Post-hook: distill facts from recent thoughts periodically
await self._maybe_distill()
# Post-hook: file Gitea issues for actionable observations
await self._maybe_file_issues()
# Post-hook: check workspace for new messages from Hermes
await self._check_workspace()
# Post-hook: proactive memory status audit
self._maybe_check_memory_status()
# Post-hook: update MEMORY.md with latest reflection
self._update_memory(thought)
# Log to swarm event system
self._log_event(thought)
# Append to daily journal file
self._write_journal(thought)
# Broadcast to WebSocket clients
await self._broadcast(thought)
await self._process_thinking_result(thought)
logger.info(
"Thought [%s] (%s): %s",
@@ -758,23 +777,10 @@ class ThinkingEngine:
except Exception as exc:
logger.debug("Thought issue filing skipped: %s", exc)
def _gather_system_snapshot(self) -> str:
"""Gather lightweight real system state for grounding thoughts in reality.
# ── System snapshot helpers ────────────────────────────────────────────
Returns a short multi-line string with current time, thought count,
recent chat activity, and task queue status. Never crashes — every
section is independently try/excepted.
"""
parts: list[str] = []
# Current local time
now = datetime.now().astimezone()
tz = now.strftime("%Z") or "UTC"
parts.append(
f"Local time: {now.strftime('%I:%M %p').lstrip('0')} {tz}, {now.strftime('%A %B %d')}"
)
# Thought count today (cheap DB query)
def _snap_thought_count(self, now: datetime) -> str | None:
"""Return today's thought count, or *None* on failure."""
try:
today_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
with _get_conn(self._db_path) as conn:
@@ -782,66 +788,94 @@ class ThinkingEngine:
"SELECT COUNT(*) as c FROM thoughts WHERE created_at >= ?",
(today_start.isoformat(),),
).fetchone()["c"]
parts.append(f"Thoughts today: {count}")
return f"Thoughts today: {count}"
except Exception as exc:
logger.debug("Thought count query failed: %s", exc)
pass
return None
# Recent chat activity (in-memory, no I/O)
def _snap_chat_activity(self) -> list[str]:
"""Return chat-activity lines (in-memory, no I/O)."""
try:
from infrastructure.chat_store import message_log
messages = message_log.all()
if messages:
parts.append(f"Chat messages this session: {len(messages)}")
last = messages[-1]
parts.append(f'Last chat ({last.role}): "{last.content[:80]}"')
else:
parts.append("No chat messages this session")
return [
f"Chat messages this session: {len(messages)}",
f'Last chat ({last.role}): "{last.content[:80]}"',
]
return ["No chat messages this session"]
except Exception as exc:
logger.debug("Chat activity query failed: %s", exc)
pass
return []
# Task queue (lightweight DB query)
def _snap_task_queue(self) -> str | None:
"""Return a one-line task queue summary, or *None*."""
try:
from swarm.task_queue.models import get_task_summary_for_briefing
summary = get_task_summary_for_briefing()
running = summary.get("running", 0)
pending = summary.get("pending_approval", 0)
done = summary.get("completed", 0)
failed = summary.get("failed", 0)
s = get_task_summary_for_briefing()
running, pending = s.get("running", 0), s.get("pending_approval", 0)
done, failed = s.get("completed", 0), s.get("failed", 0)
if running or pending or done or failed:
parts.append(
return (
f"Tasks: {running} running, {pending} pending, "
f"{done} completed, {failed} failed"
)
except Exception as exc:
logger.debug("Task queue query failed: %s", exc)
pass
return None
# Workspace updates (file-based communication with Hermes)
def _snap_workspace(self) -> list[str]:
"""Return workspace-update lines (file-based Hermes comms)."""
try:
from timmy.workspace import workspace_monitor
updates = workspace_monitor.get_pending_updates()
lines: list[str] = []
new_corr = updates.get("new_correspondence")
new_inbox = updates.get("new_inbox_files", [])
if new_corr:
# Count entries (assuming each entry starts with a timestamp or header)
line_count = len([line for line in new_corr.splitlines() if line.strip()])
parts.append(
line_count = len([ln for ln in new_corr.splitlines() if ln.strip()])
lines.append(
f"Workspace: {line_count} new correspondence entries (latest from: Hermes)"
)
new_inbox = updates.get("new_inbox_files", [])
if new_inbox:
files_str = ", ".join(new_inbox[:5])
if len(new_inbox) > 5:
files_str += f", ... (+{len(new_inbox) - 5} more)"
parts.append(f"Workspace: {len(new_inbox)} new inbox files: {files_str}")
lines.append(f"Workspace: {len(new_inbox)} new inbox files: {files_str}")
return lines
except Exception as exc:
logger.debug("Workspace check failed: %s", exc)
pass
return []
def _gather_system_snapshot(self) -> str:
"""Gather lightweight real system state for grounding thoughts in reality.
Returns a short multi-line string with current time, thought count,
recent chat activity, and task queue status. Never crashes — every
section is independently try/excepted.
"""
now = datetime.now().astimezone()
tz = now.strftime("%Z") or "UTC"
parts: list[str] = [
f"Local time: {now.strftime('%I:%M %p').lstrip('0')} {tz}, {now.strftime('%A %B %d')}"
]
thought_line = self._snap_thought_count(now)
if thought_line:
parts.append(thought_line)
parts.extend(self._snap_chat_activity())
task_line = self._snap_task_queue()
if task_line:
parts.append(task_line)
parts.extend(self._snap_workspace())
return "\n".join(parts) if parts else ""
@@ -1110,32 +1144,59 @@ class ThinkingEngine:
lines.append(f"- [{thought.seed_type}] {snippet}")
return "\n".join(lines)
_thinking_agent = None # cached agent — avoids per-call resource leaks (#525)
async def _call_agent(self, prompt: str) -> str:
"""Call Timmy's agent to generate a thought.
Creates a lightweight agent with skip_mcp=True to avoid the cancel-scope
Reuses a cached agent with skip_mcp=True to avoid the cancel-scope
errors that occur when MCP stdio transports are spawned inside asyncio
background tasks (#72). The thinking engine doesn't need Gitea or
filesystem tools — it only needs the LLM.
background tasks (#72) and to prevent per-call resource leaks (httpx
clients, SQLite connections, model warmups) that caused the thinking
loop to die every ~10 min (#525).
Individual calls are capped at 120 s so a hung Ollama never blocks
the scheduler indefinitely.
Strips ``<think>`` tags from reasoning models (qwen3, etc.) so that
downstream parsers (fact distillation, issue filing) receive clean text.
"""
from timmy.agent import create_timmy
import asyncio
if self._thinking_agent is None:
from timmy.agent import create_timmy
self._thinking_agent = create_timmy(skip_mcp=True)
try:
async with asyncio.timeout(120):
run = await self._thinking_agent.arun(prompt, stream=False)
except TimeoutError:
logger.warning("Thinking LLM call timed out after 120 s")
return ""
agent = create_timmy(skip_mcp=True)
run = await agent.arun(prompt, stream=False)
raw = run.content if hasattr(run, "content") else str(run)
return _THINK_TAG_RE.sub("", raw) if raw else raw
def _store_thought(self, content: str, seed_type: str) -> Thought:
"""Persist a thought to SQLite."""
def _store_thought(
self,
content: str,
seed_type: str,
*,
arrived_at: str | None = None,
) -> Thought:
"""Persist a thought to SQLite.
Args:
arrived_at: ISO-8601 timestamp captured when the thinking cycle
started. Falls back to now() for callers that don't supply it.
"""
thought = Thought(
id=str(uuid.uuid4()),
content=content,
seed_type=seed_type,
parent_id=self._last_thought_id,
created_at=datetime.now(UTC).isoformat(),
created_at=arrived_at or datetime.now(UTC).isoformat(),
)
with _get_conn(self._db_path) as conn:
@@ -1216,6 +1277,53 @@ class ThinkingEngine:
logger.debug("Failed to broadcast thought: %s", exc)
def _query_thoughts(
db_path: Path, query: str, seed_type: str | None, limit: int
) -> list[sqlite3.Row]:
"""Run the thought-search SQL and return matching rows."""
pattern = f"%{query}%"
with _get_conn(db_path) as conn:
if seed_type:
return conn.execute(
"""
SELECT id, content, seed_type, created_at
FROM thoughts
WHERE content LIKE ? AND seed_type = ?
ORDER BY created_at DESC
LIMIT ?
""",
(pattern, seed_type, limit),
).fetchall()
return conn.execute(
"""
SELECT id, content, seed_type, created_at
FROM thoughts
WHERE content LIKE ?
ORDER BY created_at DESC
LIMIT ?
""",
(pattern, limit),
).fetchall()
def _format_thought_rows(rows: list[sqlite3.Row], query: str, seed_type: str | None) -> str:
"""Format thought rows into a human-readable string."""
lines = [f'Found {len(rows)} thought(s) matching "{query}":']
if seed_type:
lines[0] += f' [seed_type="{seed_type}"]'
lines.append("")
for row in rows:
ts = datetime.fromisoformat(row["created_at"])
local_ts = ts.astimezone()
time_str = local_ts.strftime("%Y-%m-%d %I:%M %p").lstrip("0")
seed = row["seed_type"]
content = row["content"].replace("\n", " ") # Flatten newlines for display
lines.append(f"[{time_str}] ({seed}) {content[:150]}")
return "\n".join(lines)
def search_thoughts(query: str, seed_type: str | None = None, limit: int = 10) -> str:
"""Search Timmy's thought history for reflections matching a query.
@@ -1233,58 +1341,17 @@ def search_thoughts(query: str, seed_type: str | None = None, limit: int = 10) -
Formatted string with matching thoughts, newest first, including
timestamps and seed types. Returns a helpful message if no matches found.
"""
# Clamp limit to reasonable bounds
limit = max(1, min(limit, 50))
try:
engine = thinking_engine
db_path = engine._db_path
# Build query with optional seed_type filter
with _get_conn(db_path) as conn:
if seed_type:
rows = conn.execute(
"""
SELECT id, content, seed_type, created_at
FROM thoughts
WHERE content LIKE ? AND seed_type = ?
ORDER BY created_at DESC
LIMIT ?
""",
(f"%{query}%", seed_type, limit),
).fetchall()
else:
rows = conn.execute(
"""
SELECT id, content, seed_type, created_at
FROM thoughts
WHERE content LIKE ?
ORDER BY created_at DESC
LIMIT ?
""",
(f"%{query}%", limit),
).fetchall()
rows = _query_thoughts(thinking_engine._db_path, query, seed_type, limit)
if not rows:
if seed_type:
return f'No thoughts found matching "{query}" with seed_type="{seed_type}".'
return f'No thoughts found matching "{query}".'
# Format results
lines = [f'Found {len(rows)} thought(s) matching "{query}":']
if seed_type:
lines[0] += f' [seed_type="{seed_type}"]'
lines.append("")
for row in rows:
ts = datetime.fromisoformat(row["created_at"])
local_ts = ts.astimezone()
time_str = local_ts.strftime("%Y-%m-%d %I:%M %p").lstrip("0")
seed = row["seed_type"]
content = row["content"].replace("\n", " ") # Flatten newlines for display
lines.append(f"[{time_str}] ({seed}) {content[:150]}")
return "\n".join(lines)
return _format_thought_rows(rows, query, seed_type)
except Exception as exc:
logger.warning("Thought search failed: %s", exc)

View File

@@ -909,82 +909,35 @@ def _experiment_tool_catalog() -> dict:
}
_CREATIVE_CATALOG_SOURCES: list[tuple[str, str, list[str]]] = [
("creative.tools.git_tools", "GIT_TOOL_CATALOG", ["forge", "helm", "orchestrator"]),
("creative.tools.image_tools", "IMAGE_TOOL_CATALOG", ["pixel", "orchestrator"]),
("creative.tools.music_tools", "MUSIC_TOOL_CATALOG", ["lyra", "orchestrator"]),
("creative.tools.video_tools", "VIDEO_TOOL_CATALOG", ["reel", "orchestrator"]),
("creative.director", "DIRECTOR_TOOL_CATALOG", ["orchestrator"]),
("creative.assembler", "ASSEMBLER_TOOL_CATALOG", ["reel", "orchestrator"]),
]
def _import_creative_catalogs(catalog: dict) -> None:
"""Import and merge creative tool catalogs from creative module."""
# ── Git tools ─────────────────────────────────────────────────────────────
try:
from creative.tools.git_tools import GIT_TOOL_CATALOG
for module_path, attr_name, available_in in _CREATIVE_CATALOG_SOURCES:
_merge_catalog(catalog, module_path, attr_name, available_in)
for tool_id, info in GIT_TOOL_CATALOG.items():
def _merge_catalog(
catalog: dict, module_path: str, attr_name: str, available_in: list[str]
) -> None:
"""Import a single creative catalog and merge its entries."""
try:
from importlib import import_module
source_catalog = getattr(import_module(module_path), attr_name)
for tool_id, info in source_catalog.items():
catalog[tool_id] = {
"name": info["name"],
"description": info["description"],
"available_in": ["forge", "helm", "orchestrator"],
}
except ImportError:
pass
# ── Image tools ────────────────────────────────────────────────────────────
try:
from creative.tools.image_tools import IMAGE_TOOL_CATALOG
for tool_id, info in IMAGE_TOOL_CATALOG.items():
catalog[tool_id] = {
"name": info["name"],
"description": info["description"],
"available_in": ["pixel", "orchestrator"],
}
except ImportError:
pass
# ── Music tools ────────────────────────────────────────────────────────────
try:
from creative.tools.music_tools import MUSIC_TOOL_CATALOG
for tool_id, info in MUSIC_TOOL_CATALOG.items():
catalog[tool_id] = {
"name": info["name"],
"description": info["description"],
"available_in": ["lyra", "orchestrator"],
}
except ImportError:
pass
# ── Video tools ────────────────────────────────────────────────────────────
try:
from creative.tools.video_tools import VIDEO_TOOL_CATALOG
for tool_id, info in VIDEO_TOOL_CATALOG.items():
catalog[tool_id] = {
"name": info["name"],
"description": info["description"],
"available_in": ["reel", "orchestrator"],
}
except ImportError:
pass
# ── Creative pipeline ──────────────────────────────────────────────────────
try:
from creative.director import DIRECTOR_TOOL_CATALOG
for tool_id, info in DIRECTOR_TOOL_CATALOG.items():
catalog[tool_id] = {
"name": info["name"],
"description": info["description"],
"available_in": ["orchestrator"],
}
except ImportError:
pass
# ── Assembler tools ───────────────────────────────────────────────────────
try:
from creative.assembler import ASSEMBLER_TOOL_CATALOG
for tool_id, info in ASSEMBLER_TOOL_CATALOG.items():
catalog[tool_id] = {
"name": info["name"],
"description": info["description"],
"available_in": ["reel", "orchestrator"],
"available_in": available_in,
}
except ImportError:
pass

View File

@@ -26,7 +26,7 @@ def get_system_info() -> dict[str, Any]:
- python_version: Python version
- platform: OS platform
- model: Current Ollama model (queried from API)
- model_backend: Configured backend (ollama/airllm/grok)
- model_backend: Configured backend (ollama/grok/claude)
- ollama_url: Ollama host URL
- repo_root: Repository root path
- grok_enabled: Whether GROK is enabled
@@ -127,54 +127,48 @@ def check_ollama_health() -> dict[str, Any]:
return result
def get_memory_status() -> dict[str, Any]:
"""Get the status of Timmy's memory system.
Returns:
Dict with memory tier information
"""
from config import settings
repo_root = Path(settings.repo_root)
# Check tier 1: Hot memory
def _hot_memory_info(repo_root: Path) -> dict[str, Any]:
"""Tier 1: Hot memory (MEMORY.md) status."""
memory_md = repo_root / "MEMORY.md"
tier1_exists = memory_md.exists()
tier1_content = ""
if tier1_exists:
tier1_content = memory_md.read_text()[:500] # First 500 chars
tier1_content = memory_md.read_text()[:500]
# Check tier 2: Vault
vault_path = repo_root / "memory" / "self"
tier2_exists = vault_path.exists()
tier2_files = []
if tier2_exists:
tier2_files = [f.name for f in vault_path.iterdir() if f.is_file()]
tier1_info: dict[str, Any] = {
info: dict[str, Any] = {
"exists": tier1_exists,
"path": str(memory_md),
"preview": " ".join(tier1_content[:200].split()) if tier1_content else None,
}
if tier1_exists:
lines = memory_md.read_text().splitlines()
tier1_info["line_count"] = len(lines)
tier1_info["sections"] = [ln.lstrip("# ").strip() for ln in lines if ln.startswith("## ")]
info["line_count"] = len(lines)
info["sections"] = [ln.lstrip("# ").strip() for ln in lines if ln.startswith("## ")]
return info
def _vault_info(repo_root: Path) -> dict[str, Any]:
"""Tier 2: Vault (memory/ directory tree) status."""
vault_path = repo_root / "memory" / "self"
tier2_exists = vault_path.exists()
tier2_files = [f.name for f in vault_path.iterdir() if f.is_file()] if tier2_exists else []
# Vault — scan all subdirs under memory/
vault_root = repo_root / "memory"
vault_info: dict[str, Any] = {
info: dict[str, Any] = {
"exists": tier2_exists,
"path": str(vault_path),
"file_count": len(tier2_files),
"files": tier2_files[:10],
}
if vault_root.exists():
vault_info["directories"] = [d.name for d in vault_root.iterdir() if d.is_dir()]
vault_info["total_markdown_files"] = sum(1 for _ in vault_root.rglob("*.md"))
info["directories"] = [d.name for d in vault_root.iterdir() if d.is_dir()]
info["total_markdown_files"] = sum(1 for _ in vault_root.rglob("*.md"))
return info
# Tier 3: Semantic memory row count
tier3_info: dict[str, Any] = {"available": False}
def _semantic_memory_info(repo_root: Path) -> dict[str, Any]:
"""Tier 3: Semantic memory (vector DB) status."""
info: dict[str, Any] = {"available": False}
try:
sem_db = repo_root / "data" / "memory.db"
if sem_db.exists():
@@ -184,14 +178,16 @@ def get_memory_status() -> dict[str, Any]:
).fetchone()
if row and row[0]:
count = conn.execute("SELECT COUNT(*) FROM chunks").fetchone()
tier3_info["available"] = True
tier3_info["vector_count"] = count[0] if count else 0
info["available"] = True
info["vector_count"] = count[0] if count else 0
except Exception as exc:
logger.debug("Memory status query failed: %s", exc)
pass
return info
# Self-coding journal stats
journal_info: dict[str, Any] = {"available": False}
def _journal_info(repo_root: Path) -> dict[str, Any]:
"""Self-coding journal statistics."""
info: dict[str, Any] = {"available": False}
try:
journal_db = repo_root / "data" / "self_coding.db"
if journal_db.exists():
@@ -203,7 +199,7 @@ def get_memory_status() -> dict[str, Any]:
if rows:
counts = {r["outcome"]: r["cnt"] for r in rows}
total = sum(counts.values())
journal_info = {
info = {
"available": True,
"total_attempts": total,
"successes": counts.get("success", 0),
@@ -212,13 +208,24 @@ def get_memory_status() -> dict[str, Any]:
}
except Exception as exc:
logger.debug("Journal stats query failed: %s", exc)
pass
return info
def get_memory_status() -> dict[str, Any]:
"""Get the status of Timmy's memory system.
Returns:
Dict with memory tier information
"""
from config import settings
repo_root = Path(settings.repo_root)
return {
"tier1_hot_memory": tier1_info,
"tier2_vault": vault_info,
"tier3_semantic": tier3_info,
"self_coding_journal": journal_info,
"tier1_hot_memory": _hot_memory_info(repo_root),
"tier2_vault": _vault_info(repo_root),
"tier3_semantic": _semantic_memory_info(repo_root),
"self_coding_journal": _journal_info(repo_root),
}
@@ -319,6 +326,46 @@ def get_live_system_status() -> dict[str, Any]:
return result
def _build_pytest_cmd(venv_python: Path, scope: str) -> list[str]:
"""Build the pytest command list for the given scope."""
cmd = [str(venv_python), "-m", "pytest", "-x", "-q", "--tb=short", "--timeout=30"]
if scope == "fast":
cmd.extend(
[
"--ignore=tests/functional",
"--ignore=tests/e2e",
"--ignore=tests/integrations",
"tests/",
]
)
elif scope == "full":
cmd.append("tests/")
else:
cmd.append(scope)
return cmd
def _parse_pytest_output(output: str) -> dict[str, int]:
"""Extract passed/failed/error counts from pytest output."""
import re
passed = failed = errors = 0
for line in output.splitlines():
if "passed" in line or "failed" in line or "error" in line:
nums = re.findall(r"(\d+) (passed|failed|error)", line)
for count, kind in nums:
if kind == "passed":
passed = int(count)
elif kind == "failed":
failed = int(count)
elif kind == "error":
errors = int(count)
return {"passed": passed, "failed": failed, "errors": errors}
def run_self_tests(scope: str = "fast", _repo_root: str | None = None) -> dict[str, Any]:
"""Run Timmy's own test suite and report results.
@@ -342,49 +389,17 @@ def run_self_tests(scope: str = "fast", _repo_root: str | None = None) -> dict[s
if not venv_python.exists():
return {"success": False, "error": f"No venv found at {venv_python}"}
cmd = [str(venv_python), "-m", "pytest", "-x", "-q", "--tb=short", "--timeout=30"]
if scope == "fast":
# Unit tests only — skip functional/e2e/integration
cmd.extend(
[
"--ignore=tests/functional",
"--ignore=tests/e2e",
"--ignore=tests/integrations",
"tests/",
]
)
elif scope == "full":
cmd.append("tests/")
else:
# Specific path
cmd.append(scope)
cmd = _build_pytest_cmd(venv_python, scope)
try:
result = subprocess.run(cmd, capture_output=True, text=True, timeout=120, cwd=repo)
output = result.stdout + result.stderr
# Parse pytest output for counts
passed = failed = errors = 0
for line in output.splitlines():
if "passed" in line or "failed" in line or "error" in line:
import re
nums = re.findall(r"(\d+) (passed|failed|error)", line)
for count, kind in nums:
if kind == "passed":
passed = int(count)
elif kind == "failed":
failed = int(count)
elif kind == "error":
errors = int(count)
counts = _parse_pytest_output(output)
return {
"success": result.returncode == 0,
"passed": passed,
"failed": failed,
"errors": errors,
"total": passed + failed + errors,
**counts,
"total": counts["passed"] + counts["failed"] + counts["errors"],
"return_code": result.returncode,
"summary": output[-2000:] if len(output) > 2000 else output,
}

View File

@@ -78,6 +78,11 @@ DEFAULT_MAX_UTTERANCE = 30.0 # safety cap — don't record forever
DEFAULT_SESSION_ID = "voice"
def _rms(block: np.ndarray) -> float:
"""Compute root-mean-square energy of an audio block."""
return float(np.sqrt(np.mean(block.astype(np.float32) ** 2)))
@dataclass
class VoiceConfig:
"""Configuration for the voice loop."""
@@ -161,13 +166,6 @@ class VoiceLoop:
min_blocks = int(self.config.min_utterance / 0.1)
max_blocks = int(self.config.max_utterance / 0.1)
audio_chunks: list[np.ndarray] = []
silent_count = 0
recording = False
def _rms(block: np.ndarray) -> float:
return float(np.sqrt(np.mean(block.astype(np.float32) ** 2)))
sys.stdout.write("\n 🎤 Listening... (speak now)\n")
sys.stdout.flush()
@@ -177,42 +175,69 @@ class VoiceLoop:
dtype="float32",
blocksize=block_size,
) as stream:
while self._running:
block, overflowed = stream.read(block_size)
if overflowed:
logger.debug("Audio buffer overflowed")
chunks = self._capture_audio_blocks(stream, block_size, silence_blocks, max_blocks)
rms = _rms(block)
return self._finalize_utterance(chunks, min_blocks, sr)
if not recording:
if rms > self.config.silence_threshold:
recording = True
silent_count = 0
audio_chunks.append(block.copy())
sys.stdout.write(" 📢 Recording...\r")
sys.stdout.flush()
def _capture_audio_blocks(
self,
stream,
block_size: int,
silence_blocks: int,
max_blocks: int,
) -> list[np.ndarray]:
"""Read audio blocks from *stream* until silence or max length.
Returns the list of captured audio chunks (may be empty).
"""
chunks: list[np.ndarray] = []
silent_count = 0
recording = False
while self._running:
block, overflowed = stream.read(block_size)
if overflowed:
logger.debug("Audio buffer overflowed")
rms = _rms(block)
if not recording:
if rms > self.config.silence_threshold:
recording = True
silent_count = 0
chunks.append(block.copy())
sys.stdout.write(" 📢 Recording...\r")
sys.stdout.flush()
else:
chunks.append(block.copy())
if rms < self.config.silence_threshold:
silent_count += 1
else:
audio_chunks.append(block.copy())
silent_count = 0
if rms < self.config.silence_threshold:
silent_count += 1
else:
silent_count = 0
if silent_count >= silence_blocks:
break
# End of utterance
if silent_count >= silence_blocks:
break
if len(chunks) >= max_blocks:
logger.info("Max utterance length reached, stopping.")
break
# Safety cap
if len(audio_chunks) >= max_blocks:
logger.info("Max utterance length reached, stopping.")
break
return chunks
if not audio_chunks or len(audio_chunks) < min_blocks:
@staticmethod
def _finalize_utterance(
chunks: list[np.ndarray], min_blocks: int, sample_rate: int
) -> np.ndarray | None:
"""Concatenate recorded chunks and report duration.
Returns ``None`` if the utterance is too short to be meaningful.
"""
if not chunks or len(chunks) < min_blocks:
return None
audio = np.concatenate(audio_chunks, axis=0).flatten()
duration = len(audio) / sr
audio = np.concatenate(chunks, axis=0).flatten()
duration = len(audio) / sample_rate
sys.stdout.write(f" ✂️ Captured {duration:.1f}s of audio\n")
sys.stdout.flush()
return audio
@@ -369,15 +394,33 @@ class VoiceLoop:
# ── Main Loop ───────────────────────────────────────────────────────
def run(self) -> None:
"""Run the voice loop. Blocks until Ctrl-C."""
self._ensure_piper()
# Whisper hallucinates these on silence/noise — skip them.
_WHISPER_HALLUCINATIONS = frozenset(
{
"you",
"thanks.",
"thank you.",
"bye.",
"",
"thanks for watching!",
"thank you for watching!",
}
)
# Suppress MCP / Agno stderr noise during voice mode.
_suppress_mcp_noise()
# Suppress MCP async-generator teardown tracebacks on exit.
_install_quiet_asyncgen_hooks()
# Spoken phrases that end the voice session.
_EXIT_COMMANDS = frozenset(
{
"goodbye",
"exit",
"quit",
"stop",
"goodbye timmy",
"stop listening",
}
)
def _log_banner(self) -> None:
"""Log the startup banner with STT/TTS/LLM configuration."""
tts_label = (
"macOS say"
if self.config.use_say_fallback
@@ -393,52 +436,50 @@ class VoiceLoop:
" Press Ctrl-C to exit.\n" + "=" * 60
)
def _is_hallucination(self, text: str) -> bool:
"""Return True if *text* is a known Whisper hallucination."""
return not text or text.lower() in self._WHISPER_HALLUCINATIONS
def _is_exit_command(self, text: str) -> bool:
"""Return True if the user asked to stop the voice session."""
return text.lower().strip().rstrip(".!") in self._EXIT_COMMANDS
def _process_turn(self, text: str) -> None:
"""Handle a single listen-think-speak turn after transcription."""
sys.stdout.write(f"\n 👤 You: {text}\n")
sys.stdout.flush()
response = self._think(text)
sys.stdout.write(f" 🤖 Timmy: {response}\n")
sys.stdout.flush()
self._speak(response)
def run(self) -> None:
"""Run the voice loop. Blocks until Ctrl-C."""
self._ensure_piper()
_suppress_mcp_noise()
_install_quiet_asyncgen_hooks()
self._log_banner()
self._running = True
try:
while self._running:
# 1. LISTEN — record until silence
audio = self._record_utterance()
if audio is None:
continue
# 2. TRANSCRIBE — Whisper STT
text = self._transcribe(audio)
if not text or text.lower() in (
"you",
"thanks.",
"thank you.",
"bye.",
"",
"thanks for watching!",
"thank you for watching!",
):
# Whisper hallucinations on silence/noise
if self._is_hallucination(text):
logger.debug("Ignoring likely Whisper hallucination: '%s'", text)
continue
sys.stdout.write(f"\n 👤 You: {text}\n")
sys.stdout.flush()
# Exit commands
if text.lower().strip().rstrip(".!") in (
"goodbye",
"exit",
"quit",
"stop",
"goodbye timmy",
"stop listening",
):
if self._is_exit_command(text):
logger.info("👋 Goodbye!")
break
# 3. THINK — send to Timmy
response = self._think(text)
sys.stdout.write(f" 🤖 Timmy: {response}\n")
sys.stdout.flush()
# 4. SPEAK — TTS output
self._speak(response)
self._process_turn(text)
except KeyboardInterrupt:
logger.info("👋 Voice loop stopped.")

View File

@@ -75,6 +75,8 @@ def create_timmy_serve_app() -> FastAPI:
@asynccontextmanager
async def lifespan(app: FastAPI):
logger.info("Timmy Serve starting")
app.state.timmy = create_timmy()
logger.info("Timmy agent cached in app state")
yield
logger.info("Timmy Serve shutting down")
@@ -101,7 +103,7 @@ def create_timmy_serve_app() -> FastAPI:
async def serve_chat(request: Request, body: ChatRequest):
"""Process a chat request."""
try:
timmy = create_timmy()
timmy = request.app.state.timmy
result = timmy.run(body.message, stream=False)
response_text = result.content if hasattr(result, "content") else str(result)

View File

@@ -2493,3 +2493,57 @@
.db-cell { max-width: 300px; overflow: hidden; text-overflow: ellipsis; white-space: nowrap; }
.db-cell:hover { white-space: normal; word-break: break-all; }
.db-truncated { font-size: 0.7rem; color: var(--amber); padding: 0.3rem 0; }
/* ── Tower ────────────────────────────────────────────────────────────── */
.tower-container { max-width: 1400px; margin: 0 auto; }
.tower-header { margin-bottom: 1rem; }
.tower-title { font-size: 1.6rem; font-weight: 700; color: var(--green); letter-spacing: 0.15em; }
.tower-subtitle { font-size: 0.85rem; color: var(--text-dim); }
.tower-conn-badge { font-size: 0.7rem; font-weight: 600; padding: 2px 8px; border-radius: 3px; letter-spacing: 0.08em; }
.tower-conn-live { color: var(--green); border: 1px solid var(--green); }
.tower-conn-offline { color: var(--red); border: 1px solid var(--red); }
.tower-conn-connecting { color: var(--amber); border: 1px solid var(--amber); }
.tower-phase-card { min-height: 300px; }
.tower-phase-thinking { border-left: 3px solid var(--purple); }
.tower-phase-predicting { border-left: 3px solid var(--orange); }
.tower-phase-advising { border-left: 3px solid var(--green); }
.tower-scroll { max-height: 50vh; overflow-y: auto; }
.tower-empty { text-align: center; color: var(--text-dim); padding: 16px; font-size: 0.85rem; }
.tower-stat-grid { display: grid; grid-template-columns: repeat(4, 1fr); gap: 0.5rem; text-align: center; }
.tower-stat-label { display: block; font-size: 0.65rem; color: var(--text-dim); letter-spacing: 0.1em; }
.tower-stat-value { display: block; font-size: 1.1rem; font-weight: 700; color: var(--text-bright); }
.tower-event { padding: 8px; margin-bottom: 6px; border-left: 3px solid var(--border); border-radius: 3px; background: var(--bg-card); }
.tower-etype-task_posted { border-left-color: var(--purple); }
.tower-etype-bid_submitted { border-left-color: var(--orange); }
.tower-etype-task_completed { border-left-color: var(--green); }
.tower-etype-task_failed { border-left-color: var(--red); }
.tower-etype-agent_joined { border-left-color: var(--purple); }
.tower-etype-tool_executed { border-left-color: var(--amber); }
.tower-ev-head { display: flex; justify-content: space-between; align-items: center; margin-bottom: 4px; }
.tower-ev-badge { font-size: 0.65rem; font-weight: 600; color: var(--text-bright); letter-spacing: 0.08em; }
.tower-ev-dots { font-size: 0.6rem; color: var(--amber); }
.tower-ev-desc { font-size: 0.8rem; color: var(--text); }
.tower-ev-time { font-size: 0.65rem; color: var(--text-dim); margin-top: 2px; }
.tower-pred { padding: 8px; margin-bottom: 6px; border-radius: 3px; background: var(--bg-card); border-left: 3px solid var(--orange); }
.tower-pred-done { border-left-color: var(--green); }
.tower-pred-pending { border-left-color: var(--amber); }
.tower-pred-head { display: flex; justify-content: space-between; align-items: center; }
.tower-pred-task { font-size: 0.75rem; font-weight: 600; color: var(--text-bright); font-family: monospace; }
.tower-pred-acc { font-size: 0.75rem; font-weight: 700; }
.tower-pred-detail { font-size: 0.75rem; color: var(--text-dim); margin-top: 4px; }
.tower-advisory { padding: 8px; margin-bottom: 6px; border-radius: 3px; background: var(--bg-card); border-left: 3px solid var(--border); }
.tower-adv-high { border-left-color: var(--red); }
.tower-adv-medium { border-left-color: var(--orange); }
.tower-adv-low { border-left-color: var(--green); }
.tower-adv-head { display: flex; justify-content: space-between; font-size: 0.65rem; margin-bottom: 4px; }
.tower-adv-cat { font-weight: 600; color: var(--text-dim); letter-spacing: 0.08em; }
.tower-adv-prio { font-weight: 700; color: var(--amber); }
.tower-adv-title { font-size: 0.85rem; font-weight: 600; color: var(--text-bright); }
.tower-adv-detail { font-size: 0.8rem; color: var(--text); margin-top: 2px; }
.tower-adv-action { font-size: 0.75rem; color: var(--green); margin-top: 4px; font-style: italic; }

View File

@@ -18,7 +18,6 @@ except ImportError:
# agno is a core dependency (always installed) — do NOT stub it, or its
# internal import chains break under xdist parallel workers.
for _mod in [
"airllm",
"mcp",
"mcp.client",
"mcp.client.stdio",

View File

@@ -10,12 +10,10 @@ Categories:
M3xx iOS keyboard & zoom prevention
M4xx HTMX robustness (double-submit, sync)
M5xx Safe-area / notch support
M6xx AirLLM backend interface contract
"""
import re
from pathlib import Path
from unittest.mock import AsyncMock, MagicMock, patch
# ── helpers ───────────────────────────────────────────────────────────────────
@@ -206,147 +204,3 @@ def test_M505_dvh_units_used():
"""Dynamic viewport height (dvh) accounts for collapsing browser chrome."""
css = _css()
assert "dvh" in css
# ── M6xx — AirLLM backend interface contract ──────────────────────────────────
def test_M601_airllm_agent_has_run_method():
"""TimmyAirLLMAgent must expose run() so the dashboard route can call it."""
from timmy.backends import TimmyAirLLMAgent
assert hasattr(TimmyAirLLMAgent, "run"), (
"TimmyAirLLMAgent is missing run() — dashboard will fail with AirLLM backend"
)
def test_M602_airllm_run_returns_content_attribute():
"""run() must return an object with a .content attribute (Agno RunResponse compat)."""
with patch("timmy.backends.is_apple_silicon", return_value=False):
from timmy.backends import TimmyAirLLMAgent
agent = TimmyAirLLMAgent(model_size="8b")
mock_model = MagicMock()
mock_tokenizer = MagicMock()
input_ids_mock = MagicMock()
input_ids_mock.shape = [1, 5]
mock_tokenizer.return_value = {"input_ids": input_ids_mock}
mock_tokenizer.decode.return_value = "Sir, affirmative."
mock_model.tokenizer = mock_tokenizer
mock_model.generate.return_value = [list(range(10))]
agent._model = mock_model
result = agent.run("test")
assert hasattr(result, "content"), "run() result must have a .content attribute"
assert isinstance(result.content, str)
def test_M603_airllm_run_updates_history():
"""run() must update _history so multi-turn context is preserved."""
with patch("timmy.backends.is_apple_silicon", return_value=False):
from timmy.backends import TimmyAirLLMAgent
agent = TimmyAirLLMAgent(model_size="8b")
mock_model = MagicMock()
mock_tokenizer = MagicMock()
input_ids_mock = MagicMock()
input_ids_mock.shape = [1, 5]
mock_tokenizer.return_value = {"input_ids": input_ids_mock}
mock_tokenizer.decode.return_value = "Acknowledged."
mock_model.tokenizer = mock_tokenizer
mock_model.generate.return_value = [list(range(10))]
agent._model = mock_model
assert len(agent._history) == 0
agent.run("hello")
assert len(agent._history) == 2
assert any("hello" in h for h in agent._history)
def test_M604_airllm_print_response_delegates_to_run():
"""print_response must use run() so both interfaces share one inference path."""
with patch("timmy.backends.is_apple_silicon", return_value=False):
from timmy.backends import RunResult, TimmyAirLLMAgent
agent = TimmyAirLLMAgent(model_size="8b")
with (
patch.object(agent, "run", return_value=RunResult(content="ok")) as mock_run,
patch.object(agent, "_render"),
):
agent.print_response("hello", stream=True)
mock_run.assert_called_once_with("hello", stream=True)
def test_M605_health_status_passes_model_to_template(client):
"""Health status partial must receive the configured model name, not a hardcoded string."""
from config import settings
with patch(
"dashboard.routes.health.check_ollama",
new_callable=AsyncMock,
return_value=True,
):
response = client.get("/health/status")
# Model name should come from settings, not be hardcoded
assert response.status_code == 200
model_short = settings.ollama_model.split(":")[0]
assert model_short in response.text
# ── M7xx — XSS prevention ─────────────────────────────────────────────────────
def _mobile_html() -> str:
"""Read the mobile template source."""
path = Path(__file__).parent.parent.parent / "src" / "dashboard" / "templates" / "mobile.html"
return path.read_text()
def _swarm_live_html() -> str:
"""Read the swarm live template source."""
path = (
Path(__file__).parent.parent.parent / "src" / "dashboard" / "templates" / "swarm_live.html"
)
return path.read_text()
def test_M701_mobile_chat_no_raw_message_interpolation():
"""mobile.html must not interpolate ${message} directly into innerHTML — XSS risk."""
html = _mobile_html()
# The vulnerable pattern is `${message}` inside a template literal assigned to innerHTML
# After the fix, message must only appear via textContent assignment
assert "textContent = message" in html or "textContent=message" in html, (
"mobile.html still uses innerHTML + ${message} interpolation — XSS vulnerability"
)
def test_M702_mobile_chat_user_input_not_in_innerhtml_template_literal():
"""${message} must not appear inside a backtick string that is assigned to innerHTML."""
html = _mobile_html()
# Find all innerHTML += `...` blocks and verify none contain ${message}
blocks = re.findall(r"innerHTML\s*\+=?\s*`([^`]*)`", html, re.DOTALL)
for block in blocks:
assert "${message}" not in block, (
"innerHTML template literal still contains ${message} — XSS vulnerability"
)
def test_M703_swarm_live_agent_name_not_interpolated_in_innerhtml():
"""swarm_live.html must not put ${agent.name} inside innerHTML template literals."""
html = _swarm_live_html()
blocks = re.findall(r"innerHTML\s*=\s*agents\.map\([^;]+\)\.join\([^)]*\)", html, re.DOTALL)
assert len(blocks) == 0, (
"swarm_live.html still uses innerHTML=agents.map(…) with interpolated agent data — XSS vulnerability"
)
def test_M704_swarm_live_uses_textcontent_for_agent_data():
"""swarm_live.html must use textContent (not innerHTML) to set agent name/description."""
html = _swarm_live_html()
assert "textContent" in html, (
"swarm_live.html does not use textContent — agent data may be raw-interpolated into DOM"
)

View File

@@ -5,9 +5,14 @@ from datetime import UTC, datetime, timedelta
from unittest.mock import patch
from infrastructure.error_capture import (
_create_bug_report,
_dedup_cache,
_extract_traceback_info,
_get_git_context,
_is_duplicate,
_log_error_event,
_notify_bug_report,
_record_to_session,
_stack_hash,
capture_error,
)
@@ -193,3 +198,91 @@ class TestCaptureError:
def teardown_method(self):
_dedup_cache.clear()
class TestExtractTracebackInfo:
"""Test _extract_traceback_info helper."""
def test_returns_three_tuple(self):
try:
raise ValueError("extract test")
except ValueError as e:
tb_str, affected_file, affected_line = _extract_traceback_info(e)
assert "ValueError" in tb_str
assert "extract test" in tb_str
assert affected_file.endswith(".py")
assert affected_line > 0
def test_file_points_to_raise_site(self):
try:
_make_exception()
except ValueError as e:
_, affected_file, _ = _extract_traceback_info(e)
assert "test_error_capture" in affected_file
class TestLogErrorEvent:
"""Test _log_error_event helper."""
def test_does_not_crash_on_missing_deps(self):
try:
raise RuntimeError("log test")
except RuntimeError as e:
_log_error_event(e, "test", "abc123", "file.py", 42, {"branch": "main"})
class TestCreateBugReport:
"""Test _create_bug_report helper."""
def test_does_not_crash_on_missing_deps(self):
try:
raise RuntimeError("report test")
except RuntimeError as e:
result = _create_bug_report(
e, "test", None, "abc123", "traceback...", "file.py", 42, {}
)
# May return None if swarm deps unavailable — that's fine
assert result is None or isinstance(result, str)
def test_with_context(self):
try:
raise RuntimeError("ctx test")
except RuntimeError as e:
result = _create_bug_report(e, "test", {"path": "/api"}, "abc", "tb", "f.py", 1, {})
assert result is None or isinstance(result, str)
class TestNotifyBugReport:
"""Test _notify_bug_report helper."""
def test_does_not_crash(self):
try:
raise RuntimeError("notify test")
except RuntimeError as e:
_notify_bug_report(e, "test")
class TestRecordToSession:
"""Test _record_to_session helper."""
def test_does_not_crash_without_recorder(self):
try:
raise RuntimeError("session test")
except RuntimeError as e:
_record_to_session(e, "test")
def test_calls_registered_recorder(self):
from infrastructure.error_capture import register_error_recorder
calls = []
register_error_recorder(lambda **kwargs: calls.append(kwargs))
try:
try:
raise RuntimeError("callback test")
except RuntimeError as e:
_record_to_session(e, "test_source")
assert len(calls) == 1
assert "RuntimeError" in calls[0]["error"]
assert calls[0]["context"] == "test_source"
finally:
register_error_recorder(None)

View File

@@ -2,7 +2,7 @@
import time
from pathlib import Path
from unittest.mock import AsyncMock, MagicMock, patch
from unittest.mock import AsyncMock, patch
import pytest
import yaml
@@ -489,30 +489,182 @@ class TestProviderAvailabilityCheck:
assert router._check_provider_available(provider) is False
def test_check_airllm_installed(self):
"""Test AirLLM when installed."""
router = CascadeRouter(config_path=Path("/nonexistent"))
provider = Provider(
name="airllm",
type="airllm",
enabled=True,
priority=1,
class TestCascadeRouterReload:
"""Test hot-reload of providers.yaml."""
def test_reload_preserves_metrics(self, tmp_path):
"""Test that reload preserves metrics for existing providers."""
config = {
"providers": [
{
"name": "test-openai",
"type": "openai",
"enabled": True,
"priority": 1,
"api_key": "sk-test",
}
],
}
config_path = tmp_path / "providers.yaml"
config_path.write_text(yaml.dump(config))
router = CascadeRouter(config_path=config_path)
assert len(router.providers) == 1
# Simulate some traffic
router._record_success(router.providers[0], 150.0)
router._record_success(router.providers[0], 250.0)
assert router.providers[0].metrics.total_requests == 2
# Reload
result = router.reload_config()
assert result["total_providers"] == 1
assert result["preserved"] == 1
assert result["added"] == []
assert result["removed"] == []
# Metrics survived
assert router.providers[0].metrics.total_requests == 2
assert router.providers[0].metrics.total_latency_ms == 400.0
def test_reload_preserves_circuit_breaker(self, tmp_path):
"""Test that reload preserves circuit breaker state."""
config = {
"cascade": {"circuit_breaker": {"failure_threshold": 2}},
"providers": [
{
"name": "test-openai",
"type": "openai",
"enabled": True,
"priority": 1,
"api_key": "sk-test",
}
],
}
config_path = tmp_path / "providers.yaml"
config_path.write_text(yaml.dump(config))
router = CascadeRouter(config_path=config_path)
# Open circuit breaker
for _ in range(2):
router._record_failure(router.providers[0])
assert router.providers[0].circuit_state == CircuitState.OPEN
# Reload
router.reload_config()
# Circuit breaker state preserved
assert router.providers[0].circuit_state == CircuitState.OPEN
assert router.providers[0].status == ProviderStatus.UNHEALTHY
def test_reload_detects_added_provider(self, tmp_path):
"""Test that reload detects newly added providers."""
config = {
"providers": [
{
"name": "openai-1",
"type": "openai",
"enabled": True,
"priority": 1,
"api_key": "sk-test",
}
],
}
config_path = tmp_path / "providers.yaml"
config_path.write_text(yaml.dump(config))
router = CascadeRouter(config_path=config_path)
assert len(router.providers) == 1
# Add a second provider to config
config["providers"].append(
{
"name": "anthropic-1",
"type": "anthropic",
"enabled": True,
"priority": 2,
"api_key": "sk-ant-test",
}
)
config_path.write_text(yaml.dump(config))
with patch("importlib.util.find_spec", return_value=MagicMock()):
assert router._check_provider_available(provider) is True
result = router.reload_config()
def test_check_airllm_not_installed(self):
"""Test AirLLM when not installed."""
router = CascadeRouter(config_path=Path("/nonexistent"))
assert result["total_providers"] == 2
assert result["preserved"] == 1
assert result["added"] == ["anthropic-1"]
assert result["removed"] == []
provider = Provider(
name="airllm",
type="airllm",
enabled=True,
priority=1,
)
def test_reload_detects_removed_provider(self, tmp_path):
"""Test that reload detects removed providers."""
config = {
"providers": [
{
"name": "openai-1",
"type": "openai",
"enabled": True,
"priority": 1,
"api_key": "sk-test",
},
{
"name": "anthropic-1",
"type": "anthropic",
"enabled": True,
"priority": 2,
"api_key": "sk-ant-test",
},
],
}
config_path = tmp_path / "providers.yaml"
config_path.write_text(yaml.dump(config))
with patch("importlib.util.find_spec", return_value=None):
assert router._check_provider_available(provider) is False
router = CascadeRouter(config_path=config_path)
assert len(router.providers) == 2
# Remove anthropic
config["providers"] = [config["providers"][0]]
config_path.write_text(yaml.dump(config))
result = router.reload_config()
assert result["total_providers"] == 1
assert result["preserved"] == 1
assert result["removed"] == ["anthropic-1"]
def test_reload_re_sorts_by_priority(self, tmp_path):
"""Test that providers are re-sorted by priority after reload."""
config = {
"providers": [
{
"name": "low-priority",
"type": "openai",
"enabled": True,
"priority": 10,
"api_key": "sk-test",
},
{
"name": "high-priority",
"type": "openai",
"enabled": True,
"priority": 1,
"api_key": "sk-test2",
},
],
}
config_path = tmp_path / "providers.yaml"
config_path.write_text(yaml.dump(config))
router = CascadeRouter(config_path=config_path)
assert router.providers[0].name == "high-priority"
# Swap priorities
config["providers"][0]["priority"] = 1
config["providers"][1]["priority"] = 10
config_path.write_text(yaml.dump(config))
router.reload_config()
assert router.providers[0].name == "low-priority"
assert router.providers[1].name == "high-priority"

View File

@@ -0,0 +1,149 @@
"""Tests for provider health history store and API endpoint."""
import time
from datetime import UTC, datetime, timedelta
from unittest.mock import MagicMock
import pytest
from src.infrastructure.router.history import HealthHistoryStore
@pytest.fixture
def store():
"""In-memory history store for testing."""
s = HealthHistoryStore(db_path=":memory:")
yield s
s.close()
@pytest.fixture
def sample_providers():
return [
{
"name": "anthropic",
"status": "healthy",
"error_rate": 0.01,
"avg_latency_ms": 250.5,
"circuit_state": "closed",
"total_requests": 100,
},
{
"name": "local",
"status": "degraded",
"error_rate": 0.15,
"avg_latency_ms": 80.0,
"circuit_state": "closed",
"total_requests": 50,
},
]
def test_record_and_retrieve(store, sample_providers):
store.record_snapshot(sample_providers)
history = store.get_history(hours=1)
assert len(history) == 1
assert len(history[0]["providers"]) == 2
assert history[0]["providers"][0]["name"] == "anthropic"
assert history[0]["providers"][1]["name"] == "local"
assert "timestamp" in history[0]
def test_multiple_snapshots(store, sample_providers):
store.record_snapshot(sample_providers)
time.sleep(0.01)
store.record_snapshot(sample_providers)
history = store.get_history(hours=1)
assert len(history) == 2
def test_hours_filtering(store, sample_providers):
old_ts = (datetime.now(UTC) - timedelta(hours=48)).isoformat()
store._conn.execute(
"""INSERT INTO snapshots
(timestamp, provider_name, status, error_rate,
avg_latency_ms, circuit_state, total_requests)
VALUES (?, ?, ?, ?, ?, ?, ?)""",
(old_ts, "anthropic", "healthy", 0.0, 100.0, "closed", 10),
)
store._conn.commit()
store.record_snapshot(sample_providers)
history = store.get_history(hours=24)
assert len(history) == 1
history = store.get_history(hours=72)
assert len(history) == 2
def test_prune(store, sample_providers):
old_ts = (datetime.now(UTC) - timedelta(hours=200)).isoformat()
store._conn.execute(
"""INSERT INTO snapshots
(timestamp, provider_name, status, error_rate,
avg_latency_ms, circuit_state, total_requests)
VALUES (?, ?, ?, ?, ?, ?, ?)""",
(old_ts, "anthropic", "healthy", 0.0, 100.0, "closed", 10),
)
store._conn.commit()
store.record_snapshot(sample_providers)
deleted = store.prune(keep_hours=168)
assert deleted == 1
history = store.get_history(hours=999)
assert len(history) == 1
def test_empty_history(store):
assert store.get_history(hours=24) == []
def test_capture_snapshot_from_router(store):
mock_metrics = MagicMock()
mock_metrics.error_rate = 0.05
mock_metrics.avg_latency_ms = 200.0
mock_metrics.total_requests = 42
mock_provider = MagicMock()
mock_provider.name = "test-provider"
mock_provider.status.value = "healthy"
mock_provider.metrics = mock_metrics
mock_provider.circuit_state.value = "closed"
mock_router = MagicMock()
mock_router.providers = [mock_provider]
store._capture_snapshot(mock_router)
history = store.get_history(hours=1)
assert len(history) == 1
p = history[0]["providers"][0]
assert p["name"] == "test-provider"
assert p["status"] == "healthy"
assert p["error_rate"] == 0.05
assert p["total_requests"] == 42
def test_history_api_endpoint(store, sample_providers):
"""GET /api/v1/router/history returns snapshot data."""
store.record_snapshot(sample_providers)
from fastapi import FastAPI
from fastapi.testclient import TestClient
from src.infrastructure.router.api import get_cascade_router
from src.infrastructure.router.api import router as api_router
from src.infrastructure.router.history import get_history_store
app = FastAPI()
app.include_router(api_router)
app.dependency_overrides[get_history_store] = lambda: store
app.dependency_overrides[get_cascade_router] = lambda: MagicMock()
client = TestClient(app)
resp = client.get("/api/v1/router/history?hours=1")
assert resp.status_code == 200
data = resp.json()
assert len(data) == 1
assert len(data[0]["providers"]) == 2
assert data[0]["providers"][0]["name"] == "anthropic"
app.dependency_overrides.clear()

View File

@@ -174,6 +174,103 @@ class TestDiscordVendor:
assert result is False
class TestExtractContent:
def test_strips_bot_mention(self):
from integrations.chat_bridge.vendors.discord import DiscordVendor
vendor = DiscordVendor()
vendor._client = MagicMock()
vendor._client.user.id = 12345
msg = MagicMock()
msg.content = "<@12345> hello there"
assert vendor._extract_content(msg) == "hello there"
def test_no_client_user(self):
from integrations.chat_bridge.vendors.discord import DiscordVendor
vendor = DiscordVendor()
vendor._client = MagicMock()
vendor._client.user = None
msg = MagicMock()
msg.content = "hello"
assert vendor._extract_content(msg) == "hello"
def test_empty_after_strip(self):
from integrations.chat_bridge.vendors.discord import DiscordVendor
vendor = DiscordVendor()
vendor._client = MagicMock()
vendor._client.user.id = 99
msg = MagicMock()
msg.content = "<@99>"
assert vendor._extract_content(msg) == ""
class TestInvokeAgent:
@staticmethod
def _make_typing_target():
"""Build a mock target whose .typing() is an async context manager."""
from contextlib import asynccontextmanager
target = AsyncMock()
@asynccontextmanager
async def _typing():
yield
target.typing = _typing
return target
@pytest.mark.asyncio
async def test_timeout_returns_error(self):
from integrations.chat_bridge.vendors.discord import DiscordVendor
vendor = DiscordVendor()
target = self._make_typing_target()
with patch(
"integrations.chat_bridge.vendors.discord.chat_with_tools", side_effect=TimeoutError
):
run_output, response = await vendor._invoke_agent("hi", "sess", target)
assert run_output is None
assert "too long" in response
@pytest.mark.asyncio
async def test_exception_returns_error(self):
from integrations.chat_bridge.vendors.discord import DiscordVendor
vendor = DiscordVendor()
target = self._make_typing_target()
with patch(
"integrations.chat_bridge.vendors.discord.chat_with_tools",
side_effect=RuntimeError("boom"),
):
run_output, response = await vendor._invoke_agent("hi", "sess", target)
assert run_output is None
assert "trouble" in response
class TestSendResponse:
@pytest.mark.asyncio
async def test_skips_empty(self):
from integrations.chat_bridge.vendors.discord import DiscordVendor
target = AsyncMock()
await DiscordVendor._send_response(None, target)
target.send.assert_not_called()
await DiscordVendor._send_response("", target)
target.send.assert_not_called()
@pytest.mark.asyncio
async def test_sends_short_message(self):
from integrations.chat_bridge.vendors.discord import DiscordVendor
target = AsyncMock()
await DiscordVendor._send_response("hello", target)
target.send.assert_called_once_with("hello")
class TestChunkMessage:
def test_short_message(self):
from integrations.chat_bridge.vendors.discord import _chunk_message

View File

@@ -0,0 +1,86 @@
"""Tests for scripts/cycle_retro.py issue auto-detection."""
from __future__ import annotations
# Import the module under test — it's a script so we import the helpers directly
import importlib
import subprocess
from pathlib import Path
from unittest.mock import patch
import pytest
SCRIPTS_DIR = Path(__file__).resolve().parent.parent.parent / "scripts"
@pytest.fixture(autouse=True)
def _add_scripts_to_path(monkeypatch):
monkeypatch.syspath_prepend(str(SCRIPTS_DIR))
@pytest.fixture()
def mod():
"""Import cycle_retro as a module."""
return importlib.import_module("cycle_retro")
class TestDetectIssueFromBranch:
def test_kimi_issue_branch(self, mod):
with patch.object(subprocess, "check_output", return_value="kimi/issue-492\n"):
assert mod.detect_issue_from_branch() == 492
def test_plain_issue_branch(self, mod):
with patch.object(subprocess, "check_output", return_value="issue-123\n"):
assert mod.detect_issue_from_branch() == 123
def test_issue_slash_number(self, mod):
with patch.object(subprocess, "check_output", return_value="fix/issue/55\n"):
assert mod.detect_issue_from_branch() == 55
def test_no_issue_in_branch(self, mod):
with patch.object(subprocess, "check_output", return_value="main\n"):
assert mod.detect_issue_from_branch() is None
def test_feature_branch(self, mod):
with patch.object(subprocess, "check_output", return_value="feature/add-widget\n"):
assert mod.detect_issue_from_branch() is None
def test_git_not_available(self, mod):
with patch.object(subprocess, "check_output", side_effect=FileNotFoundError):
assert mod.detect_issue_from_branch() is None
def test_git_fails(self, mod):
with patch.object(
subprocess,
"check_output",
side_effect=subprocess.CalledProcessError(1, "git"),
):
assert mod.detect_issue_from_branch() is None
class TestBackfillExtractIssueNumber:
"""Tests for backfill_retro.extract_issue_number PR-number filtering."""
@pytest.fixture()
def backfill(self):
return importlib.import_module("backfill_retro")
def test_body_has_issue(self, backfill):
assert backfill.extract_issue_number("fix: foo (#491)", "Fixes #490", pr_number=491) == 490
def test_title_skips_pr_number(self, backfill):
assert backfill.extract_issue_number("fix: foo (#491)", "", pr_number=491) is None
def test_title_with_issue_and_pr(self, backfill):
# [loop-cycle-538] refactor: ... (#459) (#481)
assert (
backfill.extract_issue_number(
"[loop-cycle-538] refactor: remove dead airllm (#459) (#481)",
"",
pr_number=481,
)
== 459
)
def test_no_pr_number_provided(self, backfill):
assert backfill.extract_issue_number("fix: foo (#491)", "") == 491

View File

@@ -49,6 +49,34 @@ class TestConfigLazyValidation:
# Should not raise
validate_startup(force=True)
def test_validate_startup_exits_on_cors_wildcard_in_production(self):
"""validate_startup() should exit in production when CORS has wildcard."""
from config import settings, validate_startup
with (
patch.object(settings, "timmy_env", "production"),
patch.object(settings, "l402_hmac_secret", "test-secret-hex-value-32"),
patch.object(settings, "l402_macaroon_secret", "test-macaroon-hex-value-32"),
patch.object(settings, "cors_origins", ["*"]),
pytest.raises(SystemExit),
):
validate_startup(force=True)
def test_validate_startup_warns_cors_wildcard_in_dev(self):
"""validate_startup() should warn in dev when CORS has wildcard."""
from config import settings, validate_startup
with (
patch.object(settings, "timmy_env", "development"),
patch.object(settings, "cors_origins", ["*"]),
patch("config._startup_logger") as mock_logger,
):
validate_startup(force=True)
mock_logger.warning.assert_any_call(
"SEC: CORS_ORIGINS contains wildcard '*'"
"restrict to explicit origins before deploying to production."
)
def test_validate_startup_skips_in_test_mode(self):
"""validate_startup() should be a no-op in test mode."""
from config import validate_startup

View File

@@ -81,7 +81,6 @@ def test_create_timmy_respects_custom_ollama_url():
mock_settings.ollama_url = custom_url
mock_settings.ollama_num_ctx = 4096
mock_settings.timmy_model_backend = "ollama"
mock_settings.airllm_model_size = "70b"
from timmy.agent import create_timmy
@@ -91,33 +90,6 @@ def test_create_timmy_respects_custom_ollama_url():
assert kwargs["host"] == custom_url
# ── AirLLM path ──────────────────────────────────────────────────────────────
def test_create_timmy_airllm_returns_airllm_agent():
"""backend='airllm' must return a TimmyAirLLMAgent, not an Agno Agent."""
with patch("timmy.backends.is_apple_silicon", return_value=False):
from timmy.agent import create_timmy
from timmy.backends import TimmyAirLLMAgent
result = create_timmy(backend="airllm", model_size="8b")
assert isinstance(result, TimmyAirLLMAgent)
def test_create_timmy_airllm_does_not_call_agno_agent():
"""When using the airllm backend, Agno Agent should never be instantiated."""
with (
patch("timmy.agent.Agent") as MockAgent,
patch("timmy.backends.is_apple_silicon", return_value=False),
):
from timmy.agent import create_timmy
create_timmy(backend="airllm", model_size="8b")
MockAgent.assert_not_called()
def test_create_timmy_explicit_ollama_ignores_autodetect():
"""backend='ollama' must always use Ollama, even on Apple Silicon."""
with (
@@ -141,7 +113,6 @@ def test_create_timmy_explicit_ollama_ignores_autodetect():
def test_resolve_backend_explicit_takes_priority():
from timmy.agent import _resolve_backend
assert _resolve_backend("airllm") == "airllm"
assert _resolve_backend("ollama") == "ollama"
@@ -152,39 +123,6 @@ def test_resolve_backend_defaults_to_ollama_without_config():
assert _resolve_backend(None) == "ollama"
def test_resolve_backend_auto_uses_airllm_on_apple_silicon():
"""'auto' on Apple Silicon with airllm stubbed → 'airllm'."""
with (
patch("timmy.backends.is_apple_silicon", return_value=True),
patch("timmy.agent.settings") as mock_settings,
):
mock_settings.timmy_model_backend = "auto"
mock_settings.airllm_model_size = "70b"
mock_settings.ollama_model = "llama3.2"
from timmy.agent import _resolve_backend
assert _resolve_backend(None) == "airllm"
def test_resolve_backend_auto_falls_back_on_non_apple():
"""'auto' on non-Apple Silicon → 'ollama'."""
with (
patch("timmy.backends.is_apple_silicon", return_value=False),
patch("timmy.agent.settings") as mock_settings,
):
mock_settings.timmy_model_backend = "auto"
mock_settings.airllm_model_size = "70b"
mock_settings.ollama_model = "llama3.2"
from timmy.agent import _resolve_backend
assert _resolve_backend(None) == "ollama"
# ── _model_supports_tools ────────────────────────────────────────────────────
def test_model_supports_tools_llama32_returns_false():
"""llama3.2 (3B) is too small for reliable tool calling."""
from timmy.agent import _model_supports_tools
@@ -259,7 +197,6 @@ def test_create_timmy_includes_tools_for_large_model():
mock_settings.ollama_url = "http://localhost:11434"
mock_settings.ollama_num_ctx = 4096
mock_settings.timmy_model_backend = "ollama"
mock_settings.airllm_model_size = "70b"
mock_settings.telemetry_enabled = False
from timmy.agent import create_timmy
@@ -444,6 +381,150 @@ def test_get_effective_ollama_model_walks_fallback_chain():
assert result == "fb-2"
# ── _build_tools_list ─────────────────────────────────────────────────────
def test_build_tools_list_empty_when_tools_disabled():
"""Small models get an empty tools list."""
from timmy.agent import _build_tools_list
result = _build_tools_list(use_tools=False, skip_mcp=False, model_name="llama3.2")
assert result == []
def test_build_tools_list_includes_toolkit_when_enabled():
"""Tool-capable models get the full toolkit."""
mock_toolkit = MagicMock()
with patch("timmy.agent.create_full_toolkit", return_value=mock_toolkit):
from timmy.agent import _build_tools_list
result = _build_tools_list(use_tools=True, skip_mcp=True, model_name="llama3.1")
assert mock_toolkit in result
def test_build_tools_list_skips_mcp_when_flagged():
"""skip_mcp=True must not call MCP factories."""
mock_toolkit = MagicMock()
with (
patch("timmy.agent.create_full_toolkit", return_value=mock_toolkit),
patch("timmy.mcp_tools.create_gitea_mcp_tools") as mock_gitea,
patch("timmy.mcp_tools.create_filesystem_mcp_tools") as mock_fs,
):
from timmy.agent import _build_tools_list
_build_tools_list(use_tools=True, skip_mcp=True, model_name="llama3.1")
mock_gitea.assert_not_called()
mock_fs.assert_not_called()
def test_build_tools_list_includes_mcp_when_not_skipped():
"""skip_mcp=False should attempt MCP tool creation."""
mock_toolkit = MagicMock()
with (
patch("timmy.agent.create_full_toolkit", return_value=mock_toolkit),
patch("timmy.mcp_tools.create_gitea_mcp_tools", return_value=None) as mock_gitea,
patch("timmy.mcp_tools.create_filesystem_mcp_tools", return_value=None) as mock_fs,
):
from timmy.agent import _build_tools_list
_build_tools_list(use_tools=True, skip_mcp=False, model_name="llama3.1")
mock_gitea.assert_called_once()
mock_fs.assert_called_once()
# ── _build_prompt ─────────────────────────────────────────────────────────
def test_build_prompt_includes_base_prompt():
"""Prompt should always contain the base system prompt."""
from timmy.agent import _build_prompt
result = _build_prompt(use_tools=False, session_id="test")
assert "Timmy" in result
def test_build_prompt_appends_memory_context():
"""Memory context should be appended when available."""
mock_memory = MagicMock()
mock_memory.get_system_context.return_value = "User prefers dark mode."
with patch("timmy.memory_system.memory_system", mock_memory):
from timmy.agent import _build_prompt
result = _build_prompt(use_tools=True, session_id="test")
assert "GROUNDED CONTEXT" in result
assert "dark mode" in result
def test_build_prompt_truncates_long_memory():
"""Long memory context should be truncated."""
mock_memory = MagicMock()
mock_memory.get_system_context.return_value = "x" * 10000
with patch("timmy.memory_system.memory_system", mock_memory):
from timmy.agent import _build_prompt
result = _build_prompt(use_tools=False, session_id="test")
assert "[truncated]" in result
def test_build_prompt_survives_memory_failure():
"""Prompt should fall back to base when memory fails."""
mock_memory = MagicMock()
mock_memory.get_system_context.side_effect = RuntimeError("db locked")
with patch("timmy.memory_system.memory_system", mock_memory):
from timmy.agent import _build_prompt
result = _build_prompt(use_tools=True, session_id="test")
assert "Timmy" in result
# Memory context should NOT be appended (the db locked error was caught)
assert "db locked" not in result
# ── _create_ollama_agent ──────────────────────────────────────────────────
def test_create_ollama_agent_passes_correct_kwargs():
"""_create_ollama_agent must pass the expected kwargs to Agent."""
with (
patch("timmy.agent.Agent") as MockAgent,
patch("timmy.agent.Ollama"),
patch("timmy.agent.SqliteDb"),
patch("timmy.agent._warmup_model", return_value=True),
):
from timmy.agent import _create_ollama_agent
_create_ollama_agent(
db_file="test.db",
model_name="llama3.1",
tools_list=[MagicMock()],
full_prompt="test prompt",
use_tools=True,
)
kwargs = MockAgent.call_args.kwargs
assert kwargs["description"] == "test prompt"
assert kwargs["markdown"] is False
def test_create_ollama_agent_none_tools_when_empty():
"""Empty tools_list should pass tools=None to Agent."""
with (
patch("timmy.agent.Agent") as MockAgent,
patch("timmy.agent.Ollama"),
patch("timmy.agent.SqliteDb"),
patch("timmy.agent._warmup_model", return_value=True),
):
from timmy.agent import _create_ollama_agent
_create_ollama_agent(
db_file="test.db",
model_name="llama3.2",
tools_list=[],
full_prompt="test prompt",
use_tools=False,
)
kwargs = MockAgent.call_args.kwargs
assert kwargs["tools"] is None
def test_no_hardcoded_fallback_constants_in_agent():
"""agent.py must not define module-level DEFAULT_MODEL_FALLBACKS."""
import timmy.agent as agent_mod

View File

@@ -361,6 +361,53 @@ class TestRun:
assert response == "ok"
# ── _handle_retry_or_raise ────────────────────────────────────────────────
class TestHandleRetryOrRaise:
def test_raises_on_last_attempt(self):
BaseAgent = _make_base_class()
with pytest.raises(ValueError, match="boom"):
BaseAgent._handle_retry_or_raise(
ValueError("boom"),
attempt=3,
max_retries=3,
transient=False,
)
def test_raises_on_last_attempt_transient(self):
BaseAgent = _make_base_class()
exc = httpx.ConnectError("down")
with pytest.raises(httpx.ConnectError):
BaseAgent._handle_retry_or_raise(
exc,
attempt=3,
max_retries=3,
transient=True,
)
def test_no_raise_on_early_attempt(self):
BaseAgent = _make_base_class()
# Should return None (no raise) on non-final attempt
result = BaseAgent._handle_retry_or_raise(
ValueError("retry me"),
attempt=1,
max_retries=3,
transient=False,
)
assert result is None
def test_no_raise_on_early_transient(self):
BaseAgent = _make_base_class()
result = BaseAgent._handle_retry_or_raise(
httpx.ReadTimeout("busy"),
attempt=2,
max_retries=3,
transient=True,
)
assert result is None
# ── get_capabilities / get_status ────────────────────────────────────────────

View File

@@ -1,10 +1,7 @@
"""Tests for src/timmy/backends.py — AirLLM wrapper and helpers."""
"""Tests for src/timmy/backends.py — backend helpers and classes."""
import sys
from unittest.mock import MagicMock, patch
import pytest
# ── is_apple_silicon ──────────────────────────────────────────────────────────
@@ -38,183 +35,6 @@ def test_is_apple_silicon_false_on_intel_mac():
assert is_apple_silicon() is False
# ── airllm_available ─────────────────────────────────────────────────────────
def test_airllm_available_true_when_stub_in_sys_modules():
# conftest already stubs 'airllm' — importable → True.
from timmy.backends import airllm_available
assert airllm_available() is True
def test_airllm_available_false_when_not_importable():
# Temporarily remove the stub to simulate airllm not installed.
saved = sys.modules.pop("airllm", None)
try:
from timmy.backends import airllm_available
assert airllm_available() is False
finally:
if saved is not None:
sys.modules["airllm"] = saved
# ── TimmyAirLLMAgent construction ────────────────────────────────────────────
def test_airllm_agent_raises_on_unknown_size():
from timmy.backends import TimmyAirLLMAgent
with pytest.raises(ValueError, match="Unknown model size"):
TimmyAirLLMAgent(model_size="3b")
def test_airllm_agent_uses_automodel_on_non_apple():
"""Non-Apple-Silicon path uses AutoModel.from_pretrained."""
with patch("timmy.backends.is_apple_silicon", return_value=False):
from timmy.backends import TimmyAirLLMAgent
TimmyAirLLMAgent(model_size="8b")
# sys.modules["airllm"] is a MagicMock; AutoModel.from_pretrained was called.
assert sys.modules["airllm"].AutoModel.from_pretrained.called
def test_airllm_agent_uses_mlx_on_apple_silicon():
"""Apple Silicon path uses AirLLMMLX, not AutoModel."""
with patch("timmy.backends.is_apple_silicon", return_value=True):
from timmy.backends import TimmyAirLLMAgent
TimmyAirLLMAgent(model_size="8b")
assert sys.modules["airllm"].AirLLMMLX.called
def test_airllm_agent_resolves_correct_model_id_for_70b():
with patch("timmy.backends.is_apple_silicon", return_value=False):
from timmy.backends import _AIRLLM_MODELS, TimmyAirLLMAgent
TimmyAirLLMAgent(model_size="70b")
sys.modules["airllm"].AutoModel.from_pretrained.assert_called_with(_AIRLLM_MODELS["70b"])
# ── TimmyAirLLMAgent.print_response ──────────────────────────────────────────
def _make_agent(model_size: str = "8b") -> "TimmyAirLLMAgent": # noqa: F821
"""Helper: create an agent with a fully mocked underlying model."""
with patch("timmy.backends.is_apple_silicon", return_value=False):
from timmy.backends import TimmyAirLLMAgent
agent = TimmyAirLLMAgent(model_size=model_size)
# Replace the underlying model with a clean mock that returns predictable output.
mock_model = MagicMock()
mock_tokenizer = MagicMock()
# tokenizer() returns a dict-like object with an "input_ids" tensor mock.
input_ids_mock = MagicMock()
input_ids_mock.shape = [1, 10] # shape[1] = prompt token count = 10
token_dict = {"input_ids": input_ids_mock}
mock_tokenizer.return_value = token_dict
# generate() returns a list of token sequences.
mock_tokenizer.decode.return_value = "Sir, affirmative."
mock_model.tokenizer = mock_tokenizer
mock_model.generate.return_value = [list(range(15))] # 15 tokens total
agent._model = mock_model
return agent
def test_print_response_calls_generate():
agent = _make_agent()
agent.print_response("What is sovereignty?", stream=True)
agent._model.generate.assert_called_once()
def test_print_response_decodes_only_generated_tokens():
agent = _make_agent()
agent.print_response("Hello", stream=False)
# decode should be called with tokens starting at index 10 (prompt length).
decode_call = agent._model.tokenizer.decode.call_args
token_slice = decode_call[0][0]
assert list(token_slice) == list(range(10, 15))
def test_print_response_updates_history():
agent = _make_agent()
agent.print_response("First message")
assert any("First message" in turn for turn in agent._history)
assert any("Timmy:" in turn for turn in agent._history)
def test_print_response_history_included_in_second_prompt():
agent = _make_agent()
agent.print_response("First")
# Build the prompt for the second call — history should appear.
prompt = agent._build_prompt("Second")
assert "First" in prompt
assert "Second" in prompt
def test_print_response_stream_flag_accepted():
"""stream=False should not raise — it's accepted for API compatibility."""
agent = _make_agent()
agent.print_response("hello", stream=False) # no error
# ── Prompt formatting tests ────────────────────────────────────────────────
def test_airllm_prompt_contains_formatted_model_name():
"""AirLLM prompt should have actual model name, not literal {model_name}."""
with (
patch("timmy.backends.is_apple_silicon", return_value=False),
patch("config.settings") as mock_settings,
):
mock_settings.ollama_model = "llama3.2:3b"
from timmy.backends import TimmyAirLLMAgent
agent = TimmyAirLLMAgent(model_size="8b")
prompt = agent._build_prompt("test message")
# Should contain the actual model name, not the placeholder
assert "{model_name}" not in prompt
assert "llama3.2:3b" in prompt
def test_airllm_prompt_gets_lite_tier():
"""AirLLM should get LITE tier prompt (tools_enabled=False)."""
with (
patch("timmy.backends.is_apple_silicon", return_value=False),
patch("config.settings") as mock_settings,
):
mock_settings.ollama_model = "test-model"
from timmy.backends import TimmyAirLLMAgent
agent = TimmyAirLLMAgent(model_size="8b")
prompt = agent._build_prompt("test message")
# LITE tier should NOT have TOOL USAGE section
assert "TOOL USAGE" not in prompt
# LITE tier should have the basic rules
assert "Be brief by default" in prompt
def test_airllm_prompt_contains_session_id():
"""AirLLM prompt should have session_id formatted, not placeholder."""
with (
patch("timmy.backends.is_apple_silicon", return_value=False),
patch("config.settings") as mock_settings,
):
mock_settings.ollama_model = "test-model"
from timmy.backends import TimmyAirLLMAgent
agent = TimmyAirLLMAgent(model_size="8b")
prompt = agent._build_prompt("test message")
# Should contain the session_id, not the placeholder
assert '{session_id}"' not in prompt
assert 'session "airllm"' in prompt
# ── ClaudeBackend ─────────────────────────────────────────────────────────

View File

@@ -55,14 +55,14 @@ def test_think_sends_topic_to_agent():
)
def test_think_passes_model_size_option():
"""think --model-size 70b must forward the model size to create_timmy."""
def test_think_ignores_model_size_option():
"""think --model-size 70b must not forward model_size to create_timmy."""
mock_timmy = MagicMock()
with patch("timmy.cli.create_timmy", return_value=mock_timmy) as mock_create:
runner.invoke(app, ["think", "topic", "--model-size", "70b"])
mock_create.assert_called_once_with(backend=None, model_size="70b", session_id="cli")
mock_create.assert_called_once_with(backend=None, session_id="cli")
# ---------------------------------------------------------------------------
@@ -107,19 +107,7 @@ def test_chat_new_session_uses_unique_id():
def test_chat_passes_backend_option():
"""chat --backend airllm must forward the backend to create_timmy."""
mock_run_output = MagicMock()
mock_run_output.content = "OK"
mock_run_output.status = "COMPLETED"
mock_run_output.active_requirements = []
mock_timmy = MagicMock()
mock_timmy.run.return_value = mock_run_output
with patch("timmy.cli.create_timmy", return_value=mock_timmy) as mock_create:
runner.invoke(app, ["chat", "test", "--backend", "airllm"])
mock_create.assert_called_once_with(backend="airllm", model_size=None, session_id="cli")
pass
def test_chat_cleans_response():

View File

@@ -8,11 +8,14 @@ from fastapi.testclient import TestClient
@pytest.fixture
def serve_client():
"""Create a TestClient for the timmy-serve app."""
from timmy_serve.app import create_timmy_serve_app
"""Create a TestClient for the timmy-serve app with mocked Timmy agent."""
with patch("timmy_serve.app.create_timmy") as mock_create:
mock_create.return_value = MagicMock()
from timmy_serve.app import create_timmy_serve_app
app = create_timmy_serve_app()
return TestClient(app)
app = create_timmy_serve_app()
with TestClient(app) as client:
yield client
class TestHealthEndpoint:
@@ -34,18 +37,40 @@ class TestServeStatus:
class TestServeChatEndpoint:
@patch("timmy_serve.app.create_timmy")
def test_chat_returns_response(self, mock_create, serve_client):
def test_chat_returns_response(self, mock_create):
mock_agent = MagicMock()
mock_result = MagicMock()
mock_result.content = "I am Timmy."
mock_agent.run.return_value = mock_result
mock_create.return_value = mock_agent
resp = serve_client.post(
"/serve/chat",
json={"message": "Who are you?"},
)
from timmy_serve.app import create_timmy_serve_app
app = create_timmy_serve_app()
with TestClient(app) as client:
resp = client.post(
"/serve/chat",
json={"message": "Who are you?"},
)
assert resp.status_code == 200
data = resp.json()
assert data["response"] == "I am Timmy."
mock_agent.run.assert_called_once_with("Who are you?", stream=False)
@patch("timmy_serve.app.create_timmy")
def test_agent_cached_at_startup(self, mock_create):
"""Verify create_timmy is called once at startup, not per request."""
mock_agent = MagicMock()
mock_result = MagicMock()
mock_result.content = "reply"
mock_agent.run.return_value = mock_result
mock_create.return_value = mock_agent
from timmy_serve.app import create_timmy_serve_app
app = create_timmy_serve_app()
with TestClient(app) as client:
# Two requests — create_timmy should only be called once (at startup)
client.post("/serve/chat", json={"message": "hello"})
client.post("/serve/chat", json={"message": "world"})
mock_create.assert_called_once()

View File

@@ -15,7 +15,7 @@ except ImportError:
np = None
try:
from timmy.voice_loop import VoiceConfig, VoiceLoop, _strip_markdown
from timmy.voice_loop import VoiceConfig, VoiceLoop, _rms, _strip_markdown
except ImportError:
pass # pytestmark will skip all tests anyway
@@ -147,6 +147,31 @@ class TestStripMarkdown:
assert "*" not in result
class TestRms:
def test_silent_block(self):
block = np.zeros(1600, dtype=np.float32)
assert _rms(block) == pytest.approx(0.0, abs=1e-7)
def test_loud_block(self):
block = np.ones(1600, dtype=np.float32)
assert _rms(block) == pytest.approx(1.0, abs=1e-5)
class TestFinalizeUtterance:
def test_returns_none_for_empty(self):
assert VoiceLoop._finalize_utterance([], min_blocks=5, sample_rate=16000) is None
def test_returns_none_for_too_short(self):
chunks = [np.zeros(1600, dtype=np.float32) for _ in range(3)]
assert VoiceLoop._finalize_utterance(chunks, min_blocks=5, sample_rate=16000) is None
def test_returns_audio_for_sufficient_chunks(self):
chunks = [np.ones(1600, dtype=np.float32) for _ in range(6)]
result = VoiceLoop._finalize_utterance(chunks, min_blocks=5, sample_rate=16000)
assert result is not None
assert len(result) == 6 * 1600
class TestThink:
def test_think_returns_response(self):
loop = VoiceLoop()
@@ -236,6 +261,7 @@ class TestHallucinationFilter:
"""Whisper tends to hallucinate on silence/noise. The loop should filter these."""
def test_known_hallucinations_filtered(self):
loop = VoiceLoop()
hallucinations = [
"you",
"thanks.",
@@ -243,33 +269,35 @@ class TestHallucinationFilter:
"Bye.",
"Thanks for watching!",
"Thank you for watching!",
"",
]
for text in hallucinations:
assert text.lower() in (
"you",
"thanks.",
"thank you.",
"bye.",
"",
"thanks for watching!",
"thank you for watching!",
), f"'{text}' should be filtered"
assert loop._is_hallucination(text), f"'{text}' should be filtered"
def test_real_speech_not_filtered(self):
loop = VoiceLoop()
assert not loop._is_hallucination("Hello Timmy")
assert not loop._is_hallucination("What time is it?")
class TestExitCommands:
"""Voice loop should recognize exit commands."""
def test_exit_commands(self):
loop = VoiceLoop()
exits = ["goodbye", "exit", "quit", "stop", "goodbye timmy", "stop listening"]
for cmd in exits:
assert cmd.lower().strip().rstrip(".!") in (
"goodbye",
"exit",
"quit",
"stop",
"goodbye timmy",
"stop listening",
), f"'{cmd}' should be an exit command"
assert loop._is_exit_command(cmd), f"'{cmd}' should be an exit command"
def test_exit_with_punctuation(self):
loop = VoiceLoop()
assert loop._is_exit_command("goodbye!")
assert loop._is_exit_command("stop.")
def test_non_exit_commands(self):
loop = VoiceLoop()
assert not loop._is_exit_command("hello")
assert not loop._is_exit_command("what time is it")
class TestPlayAudio:

View File

@@ -0,0 +1,109 @@
"""Unit tests for the lightning package (factory + ledger)."""
from __future__ import annotations
import pytest
from lightning.factory import Invoice, MockBackend, get_backend
from lightning.ledger import (
TxStatus,
TxType,
clear,
create_invoice_entry,
get_balance,
get_transactions,
mark_settled,
)
@pytest.fixture(autouse=True)
def _clean_ledger():
"""Reset the in-memory ledger between tests."""
clear()
yield
clear()
# ── Factory tests ────────────────────────────────────────────────────
class TestMockBackend:
def test_create_invoice_returns_invoice(self):
backend = MockBackend()
inv = backend.create_invoice(100, "test memo")
assert isinstance(inv, Invoice)
assert inv.amount_sats == 100
assert inv.memo == "test memo"
assert len(inv.payment_hash) == 64 # SHA-256 hex
assert inv.payment_request.startswith("lnbc")
def test_invoices_have_unique_hashes(self):
backend = MockBackend()
a = backend.create_invoice(10)
b = backend.create_invoice(10)
assert a.payment_hash != b.payment_hash
class TestGetBackend:
def test_returns_mock_backend(self):
backend = get_backend()
assert isinstance(backend, MockBackend)
# ── Ledger tests ─────────────────────────────────────────────────────
class TestLedger:
def test_create_invoice_entry(self):
entry = create_invoice_entry(
payment_hash="abc123",
amount_sats=500,
memo="test",
source="unit_test",
)
assert entry.tx_type == TxType.incoming
assert entry.status == TxStatus.pending
assert entry.amount_sats == 500
def test_mark_settled(self):
create_invoice_entry(payment_hash="hash1", amount_sats=100)
result = mark_settled("hash1", preimage="secret")
assert result is not None
assert result.status == TxStatus.settled
assert result.preimage == "secret"
assert result.settled_at != ""
def test_mark_settled_unknown_hash(self):
assert mark_settled("nonexistent") is None
def test_get_balance_empty(self):
bal = get_balance()
assert bal["net_sats"] == 0
assert bal["available_sats"] == 0
def test_get_balance_with_settled(self):
create_invoice_entry(payment_hash="h1", amount_sats=1000)
mark_settled("h1")
bal = get_balance()
assert bal["incoming_total_sats"] == 1000
assert bal["net_sats"] == 1000
assert bal["available_sats"] == 1000
def test_get_balance_pending_not_counted(self):
create_invoice_entry(payment_hash="h2", amount_sats=500)
bal = get_balance()
assert bal["incoming_total_sats"] == 0
assert bal["pending_incoming_sats"] == 500
def test_get_transactions_returns_entries(self):
create_invoice_entry(payment_hash="t1", amount_sats=10)
create_invoice_entry(payment_hash="t2", amount_sats=20)
txs = get_transactions()
assert len(txs) == 2
def test_get_transactions_filter_by_status(self):
create_invoice_entry(payment_hash="f1", amount_sats=10)
create_invoice_entry(payment_hash="f2", amount_sats=20)
mark_settled("f1")
assert len(get_transactions(status="settled")) == 1
assert len(get_transactions(status="pending")) == 1