Compare commits

..

7 Commits

Author SHA1 Message Date
kimi
9acc0150f1 fix: auto-detect issue number from git branch in cycle_retro
All checks were successful
Tests / lint (pull_request) Successful in 3s
Tests / test (pull_request) Successful in 1m21s
When --issue is not provided, cycle_retro.py now parses the current
git branch name (e.g. kimi/issue-492) to extract the issue number.
This prevents all retro records from having issue=null when the
caller omits the flag.

Fixes #492

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-19 16:16:43 -04:00
b6d0b5f999 feat: epoch turnover notation for loopstat cycles ⟳WW.D:NNN (#496)
Some checks failed
Tests / lint (push) Successful in 3s
Tests / test (push) Has been cancelled
2026-03-19 16:12:10 -04:00
d70e4f810a fix: use settings.ollama_url instead of hardcoded fallback in cascade router (#491)
All checks were successful
Tests / lint (push) Successful in 3s
Tests / test (push) Successful in 1m20s
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-19 16:02:20 -04:00
7f20742fcf fix: replace hardcoded secret placeholder in CSRF middleware docstring (#488)
All checks were successful
Tests / lint (push) Successful in 4s
Tests / test (push) Successful in 1m11s
Co-authored-by: Kimi Agent <kimi@timmy.local>
Co-committed-by: Kimi Agent <kimi@timmy.local>
2026-03-19 15:52:29 -04:00
15eb7c3b45 [loop-cycle-538] refactor: remove dead airllm provider from cascade router (#459) (#481)
All checks were successful
Tests / lint (push) Successful in 3s
Tests / test (push) Successful in 1m28s
2026-03-19 15:44:10 -04:00
dbc2fd5b0f [loop-cycle-536] fix: validate_startup checks CORS wildcard in production (#472) (#478)
All checks were successful
Tests / lint (push) Successful in 2s
Tests / test (push) Successful in 1m21s
2026-03-19 15:29:26 -04:00
3c3aca57f1 [loop-cycle-535] perf: cache Timmy agent at startup (#471) (#476)
Some checks failed
Tests / lint (push) Successful in 2s
Tests / test (push) Has been cancelled
## What
Cache the Timmy agent instance at app startup (in lifespan) instead of creating a new one per `/serve/chat` request.

## Changes
- `src/timmy_serve/app.py`: Create agent in lifespan, store in `app.state.timmy`
- `tests/timmy/test_timmy_serve_app.py`: Updated tests for lifespan-based caching, added `test_agent_cached_at_startup`

2085 unit tests pass. 2102 pre-push tests pass. 78.5% coverage.

Closes #471

Co-authored-by: Timmy <timmy@timmytime.ai>
Reviewed-on: http://localhost:3000/rockachopa/Timmy-time-dashboard/pulls/476
Co-authored-by: Timmy Time <timmy@Alexanderwhitestone.ai>
Co-committed-by: Timmy Time <timmy@Alexanderwhitestone.ai>
2026-03-19 15:28:57 -04:00
10 changed files with 601 additions and 76 deletions

View File

@@ -54,19 +54,6 @@ providers:
context_window: 2048 context_window: 2048
capabilities: [text, vision, streaming] capabilities: [text, vision, streaming]
# Secondary: Local AirLLM (if installed)
- name: airllm-local
type: airllm
enabled: false # Enable if pip install airllm
priority: 2
models:
- name: 70b
default: true
capabilities: [text, tools, json, streaming]
- name: 8b
capabilities: [text, tools, json, streaming]
- name: 405b
capabilities: [text, tools, json, streaming]
# Tertiary: OpenAI (if API key available) # Tertiary: OpenAI (if API key available)
- name: openai-backup - name: openai-backup

View File

@@ -4,11 +4,26 @@
Called after each cycle completes (success or failure). Called after each cycle completes (success or failure).
Appends a structured entry to .loop/retro/cycles.jsonl. Appends a structured entry to .loop/retro/cycles.jsonl.
EPOCH NOTATION (turnover system):
Each cycle carries a symbolic epoch tag alongside the raw integer:
⟳WW.D:NNN
⟳ turnover glyph — marks epoch-aware cycles
WW ISO week-of-year (0153)
D ISO weekday (1=Mon … 7=Sun)
NNN daily cycle counter, zero-padded, resets at midnight UTC
Example: ⟳12.3:042 — Week 12, Wednesday, 42nd cycle of the day.
The raw `cycle` integer is preserved for backward compatibility.
The `epoch` field carries the symbolic notation.
SUCCESS DEFINITION: SUCCESS DEFINITION:
A cycle is only "success" if BOTH conditions are met: A cycle is only "success" if BOTH conditions are met:
1. The hermes process exited cleanly (exit code 0) 1. The hermes process exited cleanly (exit code 0)
2. Main is green (smoke test passes on main after merge) 2. Main is green (smoke test passes on main after merge)
A cycle that merges a PR but leaves main red is a FAILURE. A cycle that merges a PR but leaves main red is a FAILURE.
The --main-green flag records the smoke test result. The --main-green flag records the smoke test result.
@@ -29,6 +44,8 @@ from __future__ import annotations
import argparse import argparse
import json import json
import re
import subprocess
import sys import sys
from datetime import datetime, timezone from datetime import datetime, timezone
from pathlib import Path from pathlib import Path
@@ -36,11 +53,73 @@ from pathlib import Path
REPO_ROOT = Path(__file__).resolve().parent.parent REPO_ROOT = Path(__file__).resolve().parent.parent
RETRO_FILE = REPO_ROOT / ".loop" / "retro" / "cycles.jsonl" RETRO_FILE = REPO_ROOT / ".loop" / "retro" / "cycles.jsonl"
SUMMARY_FILE = REPO_ROOT / ".loop" / "retro" / "summary.json" SUMMARY_FILE = REPO_ROOT / ".loop" / "retro" / "summary.json"
EPOCH_COUNTER_FILE = REPO_ROOT / ".loop" / "retro" / ".epoch_counter"
# How many recent entries to include in rolling summary # How many recent entries to include in rolling summary
SUMMARY_WINDOW = 50 SUMMARY_WINDOW = 50
# ── Epoch turnover ────────────────────────────────────────────────────────
def _epoch_tag(now: datetime | None = None) -> tuple[str, dict]:
"""Generate the symbolic epoch tag and advance the daily counter.
Returns (epoch_string, epoch_parts) where epoch_parts is a dict with
week, weekday, daily_n for structured storage.
The daily counter persists in .epoch_counter as a two-line file:
line 1: ISO date (YYYY-MM-DD) of the current epoch day
line 2: integer count
When the date rolls over, the counter resets to 1.
"""
if now is None:
now = datetime.now(timezone.utc)
iso_cal = now.isocalendar() # (year, week, weekday)
week = iso_cal[1]
weekday = iso_cal[2]
today_str = now.strftime("%Y-%m-%d")
# Read / reset daily counter
daily_n = 1
EPOCH_COUNTER_FILE.parent.mkdir(parents=True, exist_ok=True)
if EPOCH_COUNTER_FILE.exists():
try:
lines = EPOCH_COUNTER_FILE.read_text().strip().splitlines()
if len(lines) == 2 and lines[0] == today_str:
daily_n = int(lines[1]) + 1
except (ValueError, IndexError):
pass # corrupt file — reset
# Persist
EPOCH_COUNTER_FILE.write_text(f"{today_str}\n{daily_n}\n")
tag = f"\u27f3{week:02d}.{weekday}:{daily_n:03d}"
parts = {"week": week, "weekday": weekday, "daily_n": daily_n}
return tag, parts
BRANCH_ISSUE_RE = re.compile(r"issue-(\d+)")
def _detect_issue_from_branch() -> int | None:
"""Try to extract an issue number from the current git branch name.
Matches branch patterns like ``kimi/issue-492`` or ``fix/issue-17``.
Returns ``None`` when not on a matching branch or git is unavailable.
"""
try:
branch = subprocess.check_output(
["git", "rev-parse", "--abbrev-ref", "HEAD"],
stderr=subprocess.DEVNULL,
text=True,
).strip()
except (subprocess.CalledProcessError, FileNotFoundError):
return None
m = BRANCH_ISSUE_RE.search(branch)
return int(m.group(1)) if m else None
def parse_args() -> argparse.Namespace: def parse_args() -> argparse.Namespace:
p = argparse.ArgumentParser(description="Log a cycle retrospective") p = argparse.ArgumentParser(description="Log a cycle retrospective")
p.add_argument("--cycle", type=int, required=True) p.add_argument("--cycle", type=int, required=True)
@@ -123,8 +202,30 @@ def update_summary() -> None:
issue_failures[e["issue"]] = issue_failures.get(e["issue"], 0) + 1 issue_failures[e["issue"]] = issue_failures.get(e["issue"], 0) + 1
quarantine_candidates = {k: v for k, v in issue_failures.items() if v >= 2} quarantine_candidates = {k: v for k, v in issue_failures.items() if v >= 2}
# Epoch turnover stats — cycles per week/day from epoch-tagged entries
epoch_entries = [e for e in recent if e.get("epoch")]
by_week: dict[int, int] = {}
by_weekday: dict[int, int] = {}
for e in epoch_entries:
w = e.get("epoch_week")
d = e.get("epoch_weekday")
if w is not None:
by_week[w] = by_week.get(w, 0) + 1
if d is not None:
by_weekday[d] = by_weekday.get(d, 0) + 1
# Current epoch — latest entry's epoch tag
current_epoch = epoch_entries[-1].get("epoch", "") if epoch_entries else ""
# Weekday names for display
weekday_glyphs = {1: "Mon", 2: "Tue", 3: "Wed", 4: "Thu",
5: "Fri", 6: "Sat", 7: "Sun"}
by_weekday_named = {weekday_glyphs.get(k, str(k)): v
for k, v in sorted(by_weekday.items())}
summary = { summary = {
"updated_at": datetime.now(timezone.utc).isoformat(), "updated_at": datetime.now(timezone.utc).isoformat(),
"current_epoch": current_epoch,
"window": len(recent), "window": len(recent),
"measured_cycles": len(measured), "measured_cycles": len(measured),
"total_cycles": len(entries), "total_cycles": len(entries),
@@ -136,9 +237,12 @@ def update_summary() -> None:
"total_lines_removed": sum(e.get("lines_removed", 0) for e in recent), "total_lines_removed": sum(e.get("lines_removed", 0) for e in recent),
"total_prs_merged": sum(1 for e in recent if e.get("pr")), "total_prs_merged": sum(1 for e in recent if e.get("pr")),
"by_type": type_stats, "by_type": type_stats,
"by_week": dict(sorted(by_week.items())),
"by_weekday": by_weekday_named,
"quarantine_candidates": quarantine_candidates, "quarantine_candidates": quarantine_candidates,
"recent_failures": [ "recent_failures": [
{"cycle": e["cycle"], "issue": e.get("issue"), "reason": e.get("reason", "")} {"cycle": e["cycle"], "epoch": e.get("epoch", ""),
"issue": e.get("issue"), "reason": e.get("reason", "")}
for e in failures[-5:] for e in failures[-5:]
], ],
} }
@@ -149,6 +253,13 @@ def update_summary() -> None:
def main() -> None: def main() -> None:
args = parse_args() args = parse_args()
# Auto-detect issue from branch name when not explicitly provided
if args.issue is None:
detected = _detect_issue_from_branch()
if detected is not None:
args.issue = detected
print(f"[retro] Auto-detected issue #{detected} from branch name")
# Reject idle cycles — no issue and no duration means nothing happened # Reject idle cycles — no issue and no duration means nothing happened
if not args.issue and args.duration == 0: if not args.issue and args.duration == 0:
print(f"[retro] Cycle {args.cycle} skipped — idle (no issue, no duration)") print(f"[retro] Cycle {args.cycle} skipped — idle (no issue, no duration)")
@@ -157,9 +268,17 @@ def main() -> None:
# A cycle is only truly successful if hermes exited clean AND main is green # A cycle is only truly successful if hermes exited clean AND main is green
truly_success = args.success and args.main_green truly_success = args.success and args.main_green
# Generate epoch turnover tag
now = datetime.now(timezone.utc)
epoch_tag, epoch_parts = _epoch_tag(now)
entry = { entry = {
"timestamp": datetime.now(timezone.utc).isoformat(), "timestamp": now.isoformat(),
"cycle": args.cycle, "cycle": args.cycle,
"epoch": epoch_tag,
"epoch_week": epoch_parts["week"],
"epoch_weekday": epoch_parts["weekday"],
"epoch_daily_n": epoch_parts["daily_n"],
"issue": args.issue, "issue": args.issue,
"type": args.type, "type": args.type,
"success": truly_success, "success": truly_success,
@@ -184,7 +303,7 @@ def main() -> None:
update_summary() update_summary()
status = "✓ SUCCESS" if args.success else "✗ FAILURE" status = "✓ SUCCESS" if args.success else "✗ FAILURE"
print(f"[retro] Cycle {args.cycle} {status}", end="") print(f"[retro] {epoch_tag} Cycle {args.cycle} {status}", end="")
if args.issue: if args.issue:
print(f" (#{args.issue} {args.type})", end="") print(f" (#{args.issue} {args.type})", end="")
if args.duration: if args.duration:

407
scripts/loop_introspect.py Normal file
View File

@@ -0,0 +1,407 @@
#!/usr/bin/env python3
"""Loop introspection — the self-improvement engine.
Analyzes retro data across time windows to detect trends, extract patterns,
and produce structured recommendations. Output is consumed by deep_triage
and injected into the loop prompt context.
This is the piece that closes the feedback loop:
cycle_retro → introspect → deep_triage → loop behavior changes
Run: python3 scripts/loop_introspect.py
Output: .loop/retro/insights.json (structured insights + recommendations)
Prints human-readable summary to stdout.
Called by: deep_triage.sh (before the LLM triage), timmy-loop.sh (every 50 cycles)
"""
from __future__ import annotations
import json
import sys
from collections import defaultdict
from datetime import datetime, timezone, timedelta
from pathlib import Path
REPO_ROOT = Path(__file__).resolve().parent.parent
CYCLES_FILE = REPO_ROOT / ".loop" / "retro" / "cycles.jsonl"
DEEP_TRIAGE_FILE = REPO_ROOT / ".loop" / "retro" / "deep-triage.jsonl"
TRIAGE_FILE = REPO_ROOT / ".loop" / "retro" / "triage.jsonl"
QUARANTINE_FILE = REPO_ROOT / ".loop" / "quarantine.json"
INSIGHTS_FILE = REPO_ROOT / ".loop" / "retro" / "insights.json"
# ── Helpers ──────────────────────────────────────────────────────────────
def load_jsonl(path: Path) -> list[dict]:
"""Load a JSONL file, skipping bad lines."""
if not path.exists():
return []
entries = []
for line in path.read_text().strip().splitlines():
try:
entries.append(json.loads(line))
except (json.JSONDecodeError, ValueError):
continue
return entries
def parse_ts(ts_str: str) -> datetime | None:
"""Parse an ISO timestamp, tolerating missing tz."""
if not ts_str:
return None
try:
dt = datetime.fromisoformat(ts_str.replace("Z", "+00:00"))
if dt.tzinfo is None:
dt = dt.replace(tzinfo=timezone.utc)
return dt
except (ValueError, TypeError):
return None
def window(entries: list[dict], days: int) -> list[dict]:
"""Filter entries to the last N days."""
cutoff = datetime.now(timezone.utc) - timedelta(days=days)
result = []
for e in entries:
ts = parse_ts(e.get("timestamp", ""))
if ts and ts >= cutoff:
result.append(e)
return result
# ── Analysis functions ───────────────────────────────────────────────────
def compute_trends(cycles: list[dict]) -> dict:
"""Compare recent window (last 7d) vs older window (7-14d ago)."""
recent = window(cycles, 7)
older = window(cycles, 14)
# Remove recent from older to get the 7-14d window
recent_set = {(e.get("cycle"), e.get("timestamp")) for e in recent}
older = [e for e in older if (e.get("cycle"), e.get("timestamp")) not in recent_set]
def stats(entries):
if not entries:
return {"count": 0, "success_rate": None, "avg_duration": None,
"lines_net": 0, "prs_merged": 0}
successes = sum(1 for e in entries if e.get("success"))
durations = [e["duration"] for e in entries if e.get("duration", 0) > 0]
return {
"count": len(entries),
"success_rate": round(successes / len(entries), 3) if entries else None,
"avg_duration": round(sum(durations) / len(durations)) if durations else None,
"lines_net": sum(e.get("lines_added", 0) - e.get("lines_removed", 0) for e in entries),
"prs_merged": sum(1 for e in entries if e.get("pr")),
}
recent_stats = stats(recent)
older_stats = stats(older)
trend = {
"recent_7d": recent_stats,
"previous_7d": older_stats,
"velocity_change": None,
"success_rate_change": None,
"duration_change": None,
}
if recent_stats["count"] and older_stats["count"]:
trend["velocity_change"] = recent_stats["count"] - older_stats["count"]
if recent_stats["success_rate"] is not None and older_stats["success_rate"] is not None:
trend["success_rate_change"] = round(
recent_stats["success_rate"] - older_stats["success_rate"], 3
)
if recent_stats["avg_duration"] is not None and older_stats["avg_duration"] is not None:
trend["duration_change"] = recent_stats["avg_duration"] - older_stats["avg_duration"]
return trend
def type_analysis(cycles: list[dict]) -> dict:
"""Per-type success rates and durations."""
by_type: dict[str, list[dict]] = defaultdict(list)
for c in cycles:
by_type[c.get("type", "unknown")].append(c)
result = {}
for t, entries in by_type.items():
durations = [e["duration"] for e in entries if e.get("duration", 0) > 0]
successes = sum(1 for e in entries if e.get("success"))
result[t] = {
"count": len(entries),
"success_rate": round(successes / len(entries), 3) if entries else 0,
"avg_duration": round(sum(durations) / len(durations)) if durations else 0,
"max_duration": max(durations) if durations else 0,
}
return result
def repeat_failures(cycles: list[dict]) -> list[dict]:
"""Issues that have failed multiple times — quarantine candidates."""
failures: dict[int, list] = defaultdict(list)
for c in cycles:
if not c.get("success") and c.get("issue"):
failures[c["issue"]].append({
"cycle": c.get("cycle"),
"reason": c.get("reason", ""),
"duration": c.get("duration", 0),
})
# Only issues with 2+ failures
return [
{"issue": k, "failure_count": len(v), "attempts": v}
for k, v in sorted(failures.items(), key=lambda x: -len(x[1]))
if len(v) >= 2
]
def duration_outliers(cycles: list[dict], threshold_multiple: float = 3.0) -> list[dict]:
"""Cycles that took way longer than average — something went wrong."""
durations = [c["duration"] for c in cycles if c.get("duration", 0) > 0]
if len(durations) < 5:
return []
avg = sum(durations) / len(durations)
threshold = avg * threshold_multiple
outliers = []
for c in cycles:
dur = c.get("duration", 0)
if dur > threshold:
outliers.append({
"cycle": c.get("cycle"),
"issue": c.get("issue"),
"type": c.get("type"),
"duration": dur,
"avg_duration": round(avg),
"multiple": round(dur / avg, 1) if avg > 0 else 0,
"reason": c.get("reason", ""),
})
return outliers
def triage_effectiveness(deep_triages: list[dict]) -> dict:
"""How well is the deep triage performing?"""
if not deep_triages:
return {"runs": 0, "note": "No deep triage data yet"}
total_reviewed = sum(d.get("issues_reviewed", 0) for d in deep_triages)
total_refined = sum(len(d.get("issues_refined", [])) for d in deep_triages)
total_created = sum(len(d.get("issues_created", [])) for d in deep_triages)
total_closed = sum(len(d.get("issues_closed", [])) for d in deep_triages)
timmy_available = sum(1 for d in deep_triages if d.get("timmy_available"))
# Extract Timmy's feedback themes
timmy_themes = []
for d in deep_triages:
fb = d.get("timmy_feedback", "")
if fb:
timmy_themes.append(fb[:200])
return {
"runs": len(deep_triages),
"total_reviewed": total_reviewed,
"total_refined": total_refined,
"total_created": total_created,
"total_closed": total_closed,
"timmy_consultation_rate": round(timmy_available / len(deep_triages), 2),
"timmy_recent_feedback": timmy_themes[-1] if timmy_themes else "",
"timmy_feedback_history": timmy_themes,
}
def generate_recommendations(
trends: dict,
types: dict,
repeats: list,
outliers: list,
triage_eff: dict,
) -> list[dict]:
"""Produce actionable recommendations from the analysis."""
recs = []
# 1. Success rate declining?
src = trends.get("success_rate_change")
if src is not None and src < -0.1:
recs.append({
"severity": "high",
"category": "reliability",
"finding": f"Success rate dropped {abs(src)*100:.0f}pp in the last 7 days",
"recommendation": "Review recent failures. Are issues poorly scoped? "
"Is main unstable? Check if triage is producing bad work items.",
})
# 2. Velocity dropping?
vc = trends.get("velocity_change")
if vc is not None and vc < -5:
recs.append({
"severity": "medium",
"category": "throughput",
"finding": f"Velocity dropped by {abs(vc)} cycles vs previous week",
"recommendation": "Check for loop stalls, long-running cycles, or queue starvation.",
})
# 3. Duration creep?
dc = trends.get("duration_change")
if dc is not None and dc > 120: # 2+ minutes longer
recs.append({
"severity": "medium",
"category": "efficiency",
"finding": f"Average cycle duration increased by {dc}s vs previous week",
"recommendation": "Issues may be growing in scope. Enforce tighter decomposition "
"in deep triage. Check if tests are getting slower.",
})
# 4. Type-specific problems
for t, info in types.items():
if info["count"] >= 3 and info["success_rate"] < 0.5:
recs.append({
"severity": "high",
"category": "type_reliability",
"finding": f"'{t}' issues fail {(1-info['success_rate'])*100:.0f}% of the time "
f"({info['count']} attempts)",
"recommendation": f"'{t}' issues need better scoping or different approach. "
f"Consider: tighter acceptance criteria, smaller scope, "
f"or delegating to Kimi with more context.",
})
if info["avg_duration"] > 600 and info["count"] >= 3: # >10 min avg
recs.append({
"severity": "medium",
"category": "type_efficiency",
"finding": f"'{t}' issues average {info['avg_duration']//60}m{info['avg_duration']%60}s "
f"(max {info['max_duration']//60}m)",
"recommendation": f"Break '{t}' issues into smaller pieces. Target <5 min per cycle.",
})
# 5. Repeat failures
for rf in repeats[:3]:
recs.append({
"severity": "high",
"category": "repeat_failure",
"finding": f"Issue #{rf['issue']} has failed {rf['failure_count']} times",
"recommendation": "Quarantine or rewrite this issue. Repeated failure = "
"bad scope or missing prerequisite.",
})
# 6. Outliers
if len(outliers) > 2:
recs.append({
"severity": "medium",
"category": "outliers",
"finding": f"{len(outliers)} cycles took {outliers[0].get('multiple', '?')}x+ "
f"longer than average",
"recommendation": "Long cycles waste resources. Add timeout enforcement or "
"break complex issues earlier.",
})
# 7. Code growth
recent = trends.get("recent_7d", {})
net = recent.get("lines_net", 0)
if net > 500:
recs.append({
"severity": "low",
"category": "code_health",
"finding": f"Net +{net} lines added in the last 7 days",
"recommendation": "Lines of code is a liability. Balance feature work with "
"refactoring. Target net-zero or negative line growth.",
})
# 8. Triage health
if triage_eff.get("runs", 0) == 0:
recs.append({
"severity": "high",
"category": "triage",
"finding": "Deep triage has never run",
"recommendation": "Enable deep triage (every 20 cycles). The loop needs "
"LLM-driven issue refinement to stay effective.",
})
# No recommendations = things are healthy
if not recs:
recs.append({
"severity": "info",
"category": "health",
"finding": "No significant issues detected",
"recommendation": "System is healthy. Continue current patterns.",
})
return recs
# ── Main ─────────────────────────────────────────────────────────────────
def main() -> None:
cycles = load_jsonl(CYCLES_FILE)
deep_triages = load_jsonl(DEEP_TRIAGE_FILE)
if not cycles:
print("[introspect] No cycle data found. Nothing to analyze.")
return
# Run all analyses
trends = compute_trends(cycles)
types = type_analysis(cycles)
repeats = repeat_failures(cycles)
outliers = duration_outliers(cycles)
triage_eff = triage_effectiveness(deep_triages)
recommendations = generate_recommendations(trends, types, repeats, outliers, triage_eff)
insights = {
"generated_at": datetime.now(timezone.utc).isoformat(),
"total_cycles_analyzed": len(cycles),
"trends": trends,
"by_type": types,
"repeat_failures": repeats[:5],
"duration_outliers": outliers[:5],
"triage_effectiveness": triage_eff,
"recommendations": recommendations,
}
# Write insights
INSIGHTS_FILE.parent.mkdir(parents=True, exist_ok=True)
INSIGHTS_FILE.write_text(json.dumps(insights, indent=2) + "\n")
# Current epoch from latest entry
latest_epoch = ""
for c in reversed(cycles):
if c.get("epoch"):
latest_epoch = c["epoch"]
break
# Human-readable output
header = f"[introspect] Analyzed {len(cycles)} cycles"
if latest_epoch:
header += f" · current epoch: {latest_epoch}"
print(header)
print(f"\n TRENDS (7d vs previous 7d):")
r7 = trends["recent_7d"]
p7 = trends["previous_7d"]
print(f" Cycles: {r7['count']:>3d} (was {p7['count']})")
if r7["success_rate"] is not None:
arrow = "" if (trends["success_rate_change"] or 0) > 0 else "" if (trends["success_rate_change"] or 0) < 0 else ""
print(f" Success rate: {r7['success_rate']*100:>4.0f}% {arrow}")
if r7["avg_duration"] is not None:
print(f" Avg duration: {r7['avg_duration']//60}m{r7['avg_duration']%60:02d}s")
print(f" PRs merged: {r7['prs_merged']:>3d} (was {p7['prs_merged']})")
print(f" Lines net: {r7['lines_net']:>+5d}")
print(f"\n BY TYPE:")
for t, info in sorted(types.items(), key=lambda x: -x[1]["count"]):
print(f" {t:12s} n={info['count']:>2d} "
f"ok={info['success_rate']*100:>3.0f}% "
f"avg={info['avg_duration']//60}m{info['avg_duration']%60:02d}s")
if repeats:
print(f"\n REPEAT FAILURES:")
for rf in repeats[:3]:
print(f" #{rf['issue']} failed {rf['failure_count']}x")
print(f"\n RECOMMENDATIONS ({len(recommendations)}):")
for i, rec in enumerate(recommendations, 1):
sev = {"high": "🔴", "medium": "🟡", "low": "🟢", "info": " "}.get(rec["severity"], "?")
print(f" {sev} {rec['finding']}")
print(f"{rec['recommendation']}")
print(f"\n Written to: {INSIGHTS_FILE}")
if __name__ == "__main__":
main()

View File

@@ -471,12 +471,17 @@ def validate_startup(*, force: bool = False) -> None:
sys.exit(1) sys.exit(1)
if "*" in settings.cors_origins: if "*" in settings.cors_origins:
_startup_logger.error( _startup_logger.error(
"PRODUCTION SECURITY ERROR: Wildcard '*' in CORS_ORIGINS is not " "PRODUCTION SECURITY ERROR: CORS wildcard '*' is not allowed "
"allowed in production — set explicit origins via CORS_ORIGINS env var." "in production. Set CORS_ORIGINS to explicit origins."
) )
sys.exit(1) sys.exit(1)
_startup_logger.info("Production mode: security secrets validated ✓") _startup_logger.info("Production mode: security secrets validated ✓")
else: else:
if "*" in settings.cors_origins:
_startup_logger.warning(
"SEC: CORS_ORIGINS contains wildcard '*'"
"restrict to explicit origins before deploying to production."
)
if not settings.l402_hmac_secret: if not settings.l402_hmac_secret:
_startup_logger.warning( _startup_logger.warning(
"SEC: L402_HMAC_SECRET is not set — " "SEC: L402_HMAC_SECRET is not set — "

View File

@@ -100,7 +100,7 @@ class CSRFMiddleware(BaseHTTPMiddleware):
... ...
Usage: Usage:
app.add_middleware(CSRFMiddleware, secret="your-secret-key") app.add_middleware(CSRFMiddleware, secret=settings.csrf_secret)
Attributes: Attributes:
secret: Secret key for token signing (optional, for future use). secret: Secret key for token signing (optional, for future use).

View File

@@ -18,6 +18,8 @@ from enum import Enum
from pathlib import Path from pathlib import Path
from typing import Any from typing import Any
from config import settings
try: try:
import yaml import yaml
except ImportError: except ImportError:
@@ -100,7 +102,7 @@ class Provider:
"""LLM provider configuration and state.""" """LLM provider configuration and state."""
name: str name: str
type: str # ollama, openai, anthropic, airllm type: str # ollama, openai, anthropic
enabled: bool enabled: bool
priority: int priority: int
url: str | None = None url: str | None = None
@@ -301,22 +303,13 @@ class CascadeRouter:
# Can't check without requests, assume available # Can't check without requests, assume available
return True return True
try: try:
url = provider.url or "http://localhost:11434" url = provider.url or settings.ollama_url
response = requests.get(f"{url}/api/tags", timeout=5) response = requests.get(f"{url}/api/tags", timeout=5)
return response.status_code == 200 return response.status_code == 200
except Exception as exc: except Exception as exc:
logger.debug("Ollama provider check error: %s", exc) logger.debug("Ollama provider check error: %s", exc)
return False return False
elif provider.type == "airllm":
# Check if airllm is installed
try:
import importlib.util
return importlib.util.find_spec("airllm") is not None
except (ImportError, ModuleNotFoundError):
return False
elif provider.type in ("openai", "anthropic", "grok"): elif provider.type in ("openai", "anthropic", "grok"):
# Check if API key is set # Check if API key is set
return provider.api_key is not None and provider.api_key != "" return provider.api_key is not None and provider.api_key != ""

View File

@@ -75,6 +75,8 @@ def create_timmy_serve_app() -> FastAPI:
@asynccontextmanager @asynccontextmanager
async def lifespan(app: FastAPI): async def lifespan(app: FastAPI):
logger.info("Timmy Serve starting") logger.info("Timmy Serve starting")
app.state.timmy = create_timmy()
logger.info("Timmy agent cached in app state")
yield yield
logger.info("Timmy Serve shutting down") logger.info("Timmy Serve shutting down")
@@ -101,7 +103,7 @@ def create_timmy_serve_app() -> FastAPI:
async def serve_chat(request: Request, body: ChatRequest): async def serve_chat(request: Request, body: ChatRequest):
"""Process a chat request.""" """Process a chat request."""
try: try:
timmy = create_timmy() timmy = request.app.state.timmy
result = timmy.run(body.message, stream=False) result = timmy.run(body.message, stream=False)
response_text = result.content if hasattr(result, "content") else str(result) response_text = result.content if hasattr(result, "content") else str(result)

View File

@@ -2,7 +2,7 @@
import time import time
from pathlib import Path from pathlib import Path
from unittest.mock import AsyncMock, MagicMock, patch from unittest.mock import AsyncMock, patch
import pytest import pytest
import yaml import yaml
@@ -489,34 +489,6 @@ class TestProviderAvailabilityCheck:
assert router._check_provider_available(provider) is False assert router._check_provider_available(provider) is False
def test_check_airllm_installed(self):
"""Test AirLLM when installed."""
router = CascadeRouter(config_path=Path("/nonexistent"))
provider = Provider(
name="airllm",
type="airllm",
enabled=True,
priority=1,
)
with patch("importlib.util.find_spec", return_value=MagicMock()):
assert router._check_provider_available(provider) is True
def test_check_airllm_not_installed(self):
"""Test AirLLM when not installed."""
router = CascadeRouter(config_path=Path("/nonexistent"))
provider = Provider(
name="airllm",
type="airllm",
enabled=True,
priority=1,
)
with patch("importlib.util.find_spec", return_value=None):
assert router._check_provider_available(provider) is False
class TestCascadeRouterReload: class TestCascadeRouterReload:
"""Test hot-reload of providers.yaml.""" """Test hot-reload of providers.yaml."""

View File

@@ -37,6 +37,18 @@ class TestConfigLazyValidation:
): ):
validate_startup(force=True) validate_startup(force=True)
def test_validate_startup_ok_with_secrets(self):
"""validate_startup() should not exit when secrets are set."""
from config import settings, validate_startup
with (
patch.object(settings, "timmy_env", "production"),
patch.object(settings, "l402_hmac_secret", "test-secret-hex-value-32"),
patch.object(settings, "l402_macaroon_secret", "test-macaroon-hex-value-32"),
):
# Should not raise
validate_startup(force=True)
def test_validate_startup_exits_on_cors_wildcard_in_production(self): def test_validate_startup_exits_on_cors_wildcard_in_production(self):
"""validate_startup() should exit in production when CORS has wildcard.""" """validate_startup() should exit in production when CORS has wildcard."""
from config import settings, validate_startup from config import settings, validate_startup
@@ -50,17 +62,20 @@ class TestConfigLazyValidation:
): ):
validate_startup(force=True) validate_startup(force=True)
def test_validate_startup_ok_with_secrets(self): def test_validate_startup_warns_cors_wildcard_in_dev(self):
"""validate_startup() should not exit when secrets are set.""" """validate_startup() should warn in dev when CORS has wildcard."""
from config import settings, validate_startup from config import settings, validate_startup
with ( with (
patch.object(settings, "timmy_env", "production"), patch.object(settings, "timmy_env", "development"),
patch.object(settings, "l402_hmac_secret", "test-secret-hex-value-32"), patch.object(settings, "cors_origins", ["*"]),
patch.object(settings, "l402_macaroon_secret", "test-macaroon-hex-value-32"), patch("config._startup_logger") as mock_logger,
): ):
# Should not raise
validate_startup(force=True) validate_startup(force=True)
mock_logger.warning.assert_any_call(
"SEC: CORS_ORIGINS contains wildcard '*'"
"restrict to explicit origins before deploying to production."
)
def test_validate_startup_skips_in_test_mode(self): def test_validate_startup_skips_in_test_mode(self):
"""validate_startup() should be a no-op in test mode.""" """validate_startup() should be a no-op in test mode."""

View File

@@ -8,11 +8,14 @@ from fastapi.testclient import TestClient
@pytest.fixture @pytest.fixture
def serve_client(): def serve_client():
"""Create a TestClient for the timmy-serve app.""" """Create a TestClient for the timmy-serve app with mocked Timmy agent."""
from timmy_serve.app import create_timmy_serve_app with patch("timmy_serve.app.create_timmy") as mock_create:
mock_create.return_value = MagicMock()
from timmy_serve.app import create_timmy_serve_app
app = create_timmy_serve_app() app = create_timmy_serve_app()
return TestClient(app) with TestClient(app) as client:
yield client
class TestHealthEndpoint: class TestHealthEndpoint:
@@ -34,18 +37,40 @@ class TestServeStatus:
class TestServeChatEndpoint: class TestServeChatEndpoint:
@patch("timmy_serve.app.create_timmy") @patch("timmy_serve.app.create_timmy")
def test_chat_returns_response(self, mock_create, serve_client): def test_chat_returns_response(self, mock_create):
mock_agent = MagicMock() mock_agent = MagicMock()
mock_result = MagicMock() mock_result = MagicMock()
mock_result.content = "I am Timmy." mock_result.content = "I am Timmy."
mock_agent.run.return_value = mock_result mock_agent.run.return_value = mock_result
mock_create.return_value = mock_agent mock_create.return_value = mock_agent
resp = serve_client.post( from timmy_serve.app import create_timmy_serve_app
"/serve/chat",
json={"message": "Who are you?"}, app = create_timmy_serve_app()
) with TestClient(app) as client:
resp = client.post(
"/serve/chat",
json={"message": "Who are you?"},
)
assert resp.status_code == 200 assert resp.status_code == 200
data = resp.json() data = resp.json()
assert data["response"] == "I am Timmy." assert data["response"] == "I am Timmy."
mock_agent.run.assert_called_once_with("Who are you?", stream=False) mock_agent.run.assert_called_once_with("Who are you?", stream=False)
@patch("timmy_serve.app.create_timmy")
def test_agent_cached_at_startup(self, mock_create):
"""Verify create_timmy is called once at startup, not per request."""
mock_agent = MagicMock()
mock_result = MagicMock()
mock_result.content = "reply"
mock_agent.run.return_value = mock_result
mock_create.return_value = mock_agent
from timmy_serve.app import create_timmy_serve_app
app = create_timmy_serve_app()
with TestClient(app) as client:
# Two requests — create_timmy should only be called once (at startup)
client.post("/serve/chat", json={"message": "hello"})
client.post("/serve/chat", json={"message": "world"})
mock_create.assert_called_once()