Compare commits

..

1 Commits

Author SHA1 Message Date
Alexander Whitestone
2ecc3800a4 docs: triage screenshot dump into 5 actionable issues
Some checks failed
Tests / lint (pull_request) Failing after 34s
Tests / test (pull_request) Has been skipped
Ingested and analyzed 5 screenshots from issue #1275:
- IMG_6187: AirLLM Apple Silicon requirements → #1284
- IMG_6125: vLLM backend for agentic throughput → #1281
- IMG_6124: DeerFlow research orchestration → #1283
- IMG_6123: Dev discipline / quality gates → #1285
- IMG_6410: SearXNG + Crawl4AI self-hosted search → #1282

Refs #1275

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 21:45:36 -04:00
15 changed files with 29 additions and 742 deletions

View File

@@ -18,17 +18,9 @@ jobs:
- name: Lint (ruff via tox)
run: tox -e lint
typecheck:
runs-on: ubuntu-latest
needs: lint
steps:
- uses: actions/checkout@v4
- name: Type-check (mypy via tox)
run: tox -e typecheck
test:
runs-on: ubuntu-latest
needs: typecheck
needs: lint
steps:
- uses: actions/checkout@v4
- name: Run tests (via tox)

View File

@@ -1,290 +0,0 @@
# Building Timmy: Technical Blueprint for Sovereign Creative AI
> **Source:** PDF attached to issue #891, "Building Timmy: a technical blueprint for sovereign
> creative AI" — generated by Kimi.ai, 16 pages, filed by Perplexity for Timmy's review.
> **Filed:** 2026-03-22 · **Reviewed:** 2026-03-23
---
## Executive Summary
The blueprint establishes that a sovereign creative AI capable of coding, composing music,
generating art, building worlds, publishing narratives, and managing its own economy is
**technically feasible today** — but only through orchestration of dozens of tools operating
at different maturity levels. The core insight: *the integration is the invention*. No single
component is new; the missing piece is a coherent identity operating across all domains
simultaneously with persistent memory, autonomous economics, and cross-domain creative
reactions.
Three non-negotiable architectural decisions:
1. **Human oversight for all public-facing content** — every successful creative AI has this;
every one that removed it failed.
2. **Legal entity before economic activity** — AI agents are not legal persons; establish
structure before wealth accumulates (Truth Terminal cautionary tale: $20M acquired before
a foundation was retroactively created).
3. **Hybrid memory: vector search + knowledge graph** — neither alone is sufficient for
multi-domain context breadth.
---
## Domain-by-Domain Assessment
### Software Development (immediately deployable)
| Component | Recommendation | Notes |
|-----------|----------------|-------|
| Primary agent | Claude Code (Opus 4.6, 77.2% SWE-bench) | Already in use |
| Self-hosted forge | Forgejo (MIT, 170200MB RAM) | Project uses Gitea/Forgejo now |
| CI/CD | GitHub Actions-compatible via `act_runner` | — |
| Tool-making | LATM pattern: frontier model creates tools, cheaper model applies them | New — see ADR opportunity |
| Open-source fallback | OpenHands (~65% SWE-bench, Docker sandboxed) | Backup to Claude Code |
| Self-improvement | Darwin Gödel Machine / SICA patterns | 36 month investment |
**Development estimate:** 23 weeks for Forgejo + Claude Code integration with automated
PR workflows; 12 months for self-improving tool-making pipeline.
**Cross-reference:** This project already runs Claude Code agents on Forgejo. The LATM
pattern (tool registry) and self-improvement loop are the actionable gaps.
---
### Music (14 weeks)
| Component | Recommendation | Notes |
|-----------|----------------|-------|
| Commercial vocals | Suno v5 API (~$0.03/song, $30/month Premier) | No official API; third-party: sunoapi.org, AIMLAPI, EvoLink |
| Local instrumental | MusicGen 1.5B (CC-BY-NC — monetization blocker) | On M2 Max: ~60s for 5s clip |
| Voice cloning | GPT-SoVITS v4 (MIT) | Works on Apple Silicon CPU, RTF 0.526 on M4 |
| Voice conversion | RVC (MIT, 510 min training audio) | — |
| Apple Silicon TTS | MLX-Audio: Kokoro 82M + Qwen3-TTS 0.6B | 45x faster via Metal |
| Publishing | Wavlake (90/10 split, Lightning micropayments) | Auto-syndicates to Fountain.fm |
| Nostr | NIP-94 (kind:1063) audio events → NIP-96 servers | — |
**Copyright reality:** US Copyright Office (Jan 2025) and US Court of Appeals (Mar 2025):
purely AI-generated music cannot be copyrighted and enters public domain. Wavlake's
Value4Value model works around this — fans pay for relationship, not exclusive rights.
**Avoid:** Udio (download disabled since Oct 2025, 2.4/5 Trustpilot).
---
### Visual Art (13 weeks)
| Component | Recommendation | Notes |
|-----------|----------------|-------|
| Local generation | ComfyUI API at `127.0.0.1:8188` (programmatic control via WebSocket) | MLX extension: 5070% faster |
| Speed | Draw Things (free, Mac App Store) | 3× faster than ComfyUI via Metal shaders |
| Quality frontier | Flux 2 (Nov 2025, 4MP, multi-reference) | SDXL needs 16GB+, Flux Dev 32GB+ |
| Character consistency | LoRA training (30 min, 1530 references) + Flux.1 Kontext | Solved problem |
| Face consistency | IP-Adapter + FaceID (ComfyUI-IP-Adapter-Plus) | Training-free |
| Comics | Jenova AI ($20/month, 200+ page consistency) or LlamaGen AI (free) | — |
| Publishing | Blossom protocol (SHA-256 addressed, kind:10063) + Nostr NIP-94 | — |
| Physical | Printful REST API (200+ products, automated fulfillment) | — |
---
### Writing / Narrative (14 weeks for pipeline; ongoing for quality)
| Component | Recommendation | Notes |
|-----------|----------------|-------|
| LLM | Claude Opus 4.5/4.6 (leads Mazur Writing Benchmark at 8.561) | Already in use |
| Context | 500K tokens (1M in beta) — entire novels fit | — |
| Architecture | Outline-first → RAG lore bible → chapter-by-chapter generation | Without outline: novels meander |
| Lore management | WorldAnvil Pro or custom LoreScribe (local RAG) | No tool achieves 100% consistency |
| Publishing (ebooks) | Pandoc → EPUB / KDP PDF | pandoc-novel template on GitHub |
| Publishing (print) | Lulu Press REST API (80% profit, global print network) | KDP: no official API, 3-book/day limit |
| Publishing (Nostr) | NIP-23 kind:30023 long-form events | Habla.news, YakiHonne, Stacker News |
| Podcasts | LLM script → TTS (ElevenLabs or local Kokoro/MLX-Audio) → feedgen RSS → Fountain.fm | Value4Value sats-per-minute |
**Key constraint:** AI-assisted (human directs, AI drafts) = 40% faster. Fully autonomous
without editing = "generic, soulless prose" and character drift by chapter 3 without explicit
memory.
---
### World Building / Games (2 weeks3 months depending on target)
| Component | Recommendation | Notes |
|-----------|----------------|-------|
| Algorithms | Wave Function Collapse, Perlin noise (FastNoiseLite in Godot 4), L-systems | All mature |
| Platform | Godot Engine + gd-agentic-skills (82+ skills, 26 genre blueprints) | Strong LLM/GDScript knowledge |
| Narrative design | Knowledge graph (world state) + LLM + quest template grammar | CHI 2023 validated |
| Quick win | Luanti/Minetest (Lua API, 2,800+ open mods for reference) | Immediately feasible |
| Medium effort | OpenMW content creation (omwaddon format engineering required) | 23 months |
| Future | Unity MCP (AI direct Unity Editor interaction) | Early-stage |
---
### Identity Architecture (2 months)
The blueprint formalizes the **SOUL.md standard** (GitHub: aaronjmars/soul.md):
| File | Purpose |
|------|---------|
| `SOUL.md` | Who you are — identity, worldview, opinions |
| `STYLE.md` | How you write — voice, syntax, patterns |
| `SKILL.md` | Operating modes |
| `MEMORY.md` | Session continuity |
**Critical decision — static vs self-modifying identity:**
- Static Core Truths (version-controlled, human-approved changes only) ✓
- Self-modifying Learned Preferences (logged with rollback, monitored by guardian) ✓
- **Warning:** OpenClaw's "Soul Evolution" creates a security attack surface — Zenity Labs
demonstrated a complete zero-click attack chain targeting SOUL.md files.
**Relevance to this repo:** Claude Code agents already use a `MEMORY.md` pattern in
this project. The SOUL.md stack is a natural extension.
---
### Memory Architecture (2 months)
Hybrid vector + knowledge graph is the recommendation:
| Component | Tool | Notes |
|-----------|------|-------|
| Vector + KG combined | Mem0 (mem0.ai) | 26% accuracy improvement over OpenAI memory, 91% lower p95 latency, 90% token savings |
| Vector store | Qdrant (Rust, open-source) | High-throughput with metadata filtering |
| Temporal KG | Neo4j + Graphiti (Zep AI) | P95 retrieval: 300ms, hybrid semantic + BM25 + graph |
| Backup/migration | AgentKeeper (95% critical fact recovery across model migrations) | — |
**Journal pattern (Stanford Generative Agents):** Agent writes about experiences, generates
high-level reflections 23x/day when importance scores exceed threshold. Ablation studies:
removing any component (observation, planning, reflection) significantly reduces behavioral
believability.
**Cross-reference:** The existing `brain/` package is the memory system. Qdrant and
Mem0 are the recommended upgrade targets.
---
### Multi-Agent Sub-System (36 months)
The blueprint describes a named sub-agent hierarchy:
| Agent | Role |
|-------|------|
| Oracle | Top-level planner / supervisor |
| Sentinel | Safety / moderation |
| Scout | Research / information gathering |
| Scribe | Writing / narrative |
| Ledger | Economic management |
| Weaver | Visual art generation |
| Composer | Music generation |
| Social | Platform publishing |
**Orchestration options:**
- **Agno** (already in use) — microsecond instantiation, 50× less memory than LangGraph
- **CrewAI Flows** — event-driven with fine-grained control
- **LangGraph** — DAG-based with stateful workflows and time-travel debugging
**Scheduling pattern (Stanford Generative Agents):** Top-down recursive daily → hourly →
5-minute planning. Event interrupts for reactive tasks. Re-planning triggers when accumulated
importance scores exceed threshold.
**Cross-reference:** The existing `spark/` package (event capture, advisory engine) aligns
with this architecture. `infrastructure/event_bus` is the choreography backbone.
---
### Economic Engine (14 weeks)
Lightning Labs released `lightning-agent-tools` (open-source) in February 2026:
- `lnget` — CLI HTTP client for L402 payments
- Remote signer architecture (private keys on separate machine from agent)
- Scoped macaroon credentials (pay-only, invoice-only, read-only roles)
- **Aperture** — converts any API to pay-per-use via L402 (HTTP 402)
| Option | Effort | Notes |
|--------|--------|-------|
| ln.bot | 1 week | "Bitcoin for AI Agents" — 3 commands create a wallet; CLI + MCP + REST |
| LND via gRPC | 23 weeks | Full programmatic node management for production |
| Coinbase Agentic Wallets | — | Fiat-adjacent; less aligned with sovereignty ethos |
**Revenue channels:** Wavlake (music, 90/10 Lightning), Nostr zaps (articles), Stacker News
(earn sats from engagement), Printful (physical goods), L402-gated API access (pay-per-use
services), Geyser.fund (Lightning crowdfunding, better initial runway than micropayments).
**Cross-reference:** The existing `lightning/` package in this repo is the foundation.
L402 paywall endpoints for Timmy's own services is the actionable gap.
---
## Pioneer Case Studies
| Agent | Active | Revenue | Key Lesson |
|-------|--------|---------|-----------|
| Botto | Since Oct 2021 | $5M+ (art auctions) | Community governance via DAO sustains engagement; "taste model" (humans guide, not direct) preserves autonomous authorship |
| Neuro-sama | Since Dec 2022 | $400K+/month (subscriptions) | 3+ years of iteration; errors became entertainment features; 24/7 capability is an insurmountable advantage |
| Truth Terminal | Since Jun 2024 | $20M accumulated | Memetic fitness > planned monetization; human gatekeeper approved tweets while selecting AI-intent responses; **establish legal entity first** |
| Holly+ | Since 2021 | Conceptual | DAO of stewards for voice governance; "identity play" as alternative to defensive IP |
| AI Sponge | 2023 | Banned | Unmoderated content → TOS violations + copyright |
| Nothing Forever | 2022present | 8 viewers | Unmoderated content → ban → audience collapse; novelty-only propositions fail |
**Universal pattern:** Human oversight + economic incentive alignment + multi-year personality
development + platform-native economics = success.
---
## Recommended Implementation Sequence
From the blueprint, mapped against Timmy's existing architecture:
### Phase 1: Immediate (weeks)
1. **Code sovereignty** — Forgejo + Claude Code automated PR workflows (already substantially done)
2. **Music pipeline** — Suno API → Wavlake/Nostr NIP-94 publishing
3. **Visual art pipeline** — ComfyUI API → Blossom/Nostr with LoRA character consistency
4. **Basic Lightning wallet** — ln.bot integration for receiving micropayments
5. **Long-form publishing** — Nostr NIP-23 + RSS feed generation
### Phase 2: Moderate effort (13 months)
6. **LATM tool registry** — frontier model creates Python utilities, caches them, lighter model applies
7. **Event-driven cross-domain reactions** — game event → blog + artwork + music (CrewAI/LangGraph)
8. **Podcast generation** — TTS + feedgen → Fountain.fm
9. **Self-improving pipeline** — agent creates, tests, caches own Python utilities
10. **Comic generation** — character-consistent panels with Jenova AI or local LoRA
### Phase 3: Significant investment (36 months)
11. **Full sub-agent hierarchy** — Oracle/Sentinel/Scout/Scribe/Ledger/Weaver with Agno
12. **SOUL.md identity system** — bounded evolution + guardian monitoring
13. **Hybrid memory upgrade** — Qdrant + Mem0/Graphiti replacing or extending `brain/`
14. **Procedural world generation** — Godot + AI-driven narrative (quests, NPCs, lore)
15. **Self-sustaining economic loop** — earned revenue covers compute costs
### Remains aspirational (12+ months)
- Fully autonomous novel-length fiction without editorial intervention
- YouTube monetization for AI-generated content (tightening platform policies)
- Copyright protection for AI-generated works (current US law denies this)
- True artistic identity evolution (genuine creative voice vs pattern remixing)
- Self-modifying architecture without regression or identity drift
---
## Gap Analysis: Blueprint vs Current Codebase
| Blueprint Capability | Current Status | Gap |
|---------------------|----------------|-----|
| Code sovereignty | Done (Claude Code + Forgejo) | LATM tool registry |
| Music generation | Not started | Suno API integration + Wavlake publishing |
| Visual art | Not started | ComfyUI API client + Blossom publishing |
| Writing/publishing | Not started | Nostr NIP-23 + Pandoc pipeline |
| World building | Bannerlord work (different scope) | Luanti mods as quick win |
| Identity (SOUL.md) | Partial (CLAUDE.md + MEMORY.md) | Full SOUL.md stack |
| Memory (hybrid) | `brain/` package (SQLite-based) | Qdrant + knowledge graph |
| Multi-agent | Agno in use | Named hierarchy + event choreography |
| Lightning payments | `lightning/` package | ln.bot wallet + L402 endpoints |
| Nostr identity | Referenced in roadmap, not built | NIP-05, NIP-89 capability cards |
| Legal entity | Unknown | **Must be resolved before economic activity** |
---
## ADR Candidates
Issues that warrant Architecture Decision Records based on this review:
1. **LATM tool registry pattern** — How Timmy creates, tests, and caches self-made tools
2. **Music generation strategy** — Suno (cloud, commercial quality) vs MusicGen (local, CC-BY-NC)
3. **Memory upgrade path** — When/how to migrate `brain/` from SQLite to Qdrant + KG
4. **SOUL.md adoption** — Extending existing CLAUDE.md/MEMORY.md to full SOUL.md stack
5. **Lightning L402 strategy** — Which services Timmy gates behind micropayments
6. **Sub-agent naming and contracts** — Formalizing Oracle/Sentinel/Scout/Scribe/Ledger/Weaver

View File

@@ -164,7 +164,3 @@ directory = "htmlcov"
[tool.coverage.xml]
output = "coverage.xml"
[tool.mypy]
ignore_missing_imports = true
no_error_summary = true

0
src/__init__.py Normal file
View File

View File

@@ -6,8 +6,6 @@ import sqlite3
from contextlib import closing
from pathlib import Path
from typing import Any
from fastapi import APIRouter, Request
from fastapi.responses import HTMLResponse, JSONResponse
@@ -38,9 +36,9 @@ def _discover_databases() -> list[dict]:
return dbs
def _query_database(db_path: str) -> dict[str, Any]:
def _query_database(db_path: str) -> dict:
"""Open a database read-only and return all tables with their rows."""
result: dict[str, Any] = {"tables": {}, "error": None}
result = {"tables": {}, "error": None}
try:
with closing(sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)) as conn:
conn.row_factory = sqlite3.Row

View File

@@ -186,24 +186,6 @@
<p class="chat-history-placeholder">Loading sovereignty metrics...</p>
{% endcall %}
<!-- Agent Scorecards -->
<div class="card mc-card-spaced" id="mc-scorecards-card">
<div class="card-header">
<h2 class="card-title">Agent Scorecards</h2>
<div class="d-flex align-items-center gap-2">
<select id="mc-scorecard-period" class="form-select form-select-sm" style="width: auto;"
onchange="loadMcScorecards()">
<option value="daily" selected>Daily</option>
<option value="weekly">Weekly</option>
</select>
<a href="/scorecards" class="btn btn-sm btn-outline-secondary">Full View</a>
</div>
</div>
<div id="mc-scorecards-content" class="p-2">
<p class="chat-history-placeholder">Loading scorecards...</p>
</div>
</div>
<!-- Chat History -->
<div class="card mc-card-spaced">
<div class="card-header">
@@ -520,20 +502,6 @@ async function loadSparkStatus() {
}
}
// Load agent scorecards
async function loadMcScorecards() {
var period = document.getElementById('mc-scorecard-period').value;
var container = document.getElementById('mc-scorecards-content');
container.innerHTML = '<p class="chat-history-placeholder">Loading scorecards...</p>';
try {
var response = await fetch('/scorecards/all/panels?period=' + period);
var html = await response.text();
container.innerHTML = html;
} catch (error) {
container.innerHTML = '<p class="chat-history-placeholder">Scorecards unavailable</p>';
}
}
// Initial load
loadSparkStatus();
loadSovereignty();
@@ -542,7 +510,6 @@ loadSwarmStats();
loadLightningStats();
loadGrokStats();
loadChatHistory();
loadMcScorecards();
// Periodic updates
setInterval(loadSovereignty, 30000);
@@ -551,6 +518,5 @@ setInterval(loadSwarmStats, 5000);
setInterval(updateHeartbeat, 5000);
setInterval(loadGrokStats, 10000);
setInterval(loadSparkStatus, 15000);
setInterval(loadMcScorecards, 300000);
</script>
{% endblock %}

View File

@@ -137,7 +137,7 @@ class HermesMonitor:
message=f"Check error: {r}",
)
)
elif isinstance(r, CheckResult):
else:
checks.append(r)
# Compute overall level

View File

@@ -203,7 +203,7 @@ async def reload_config(
@router.get("/history")
async def get_history(
hours: int = 24,
store: Annotated[HealthHistoryStore | None, Depends(get_history_store)] = None,
store: Annotated[HealthHistoryStore, Depends(get_history_store)] = None,
) -> list[dict[str, Any]]:
"""Get provider health history for the last N hours."""
if store is None:

View File

@@ -744,20 +744,19 @@ class CascadeRouter:
self,
provider: Provider,
messages: list[dict],
model: str | None,
model: str,
temperature: float,
max_tokens: int | None,
content_type: ContentType = ContentType.TEXT,
) -> dict:
"""Try a single provider request."""
start_time = time.time()
effective_model: str = model or provider.get_default_model() or ""
if provider.type == "ollama":
result = await self._call_ollama(
provider=provider,
messages=messages,
model=effective_model,
model=model or provider.get_default_model(),
temperature=temperature,
max_tokens=max_tokens,
content_type=content_type,
@@ -766,7 +765,7 @@ class CascadeRouter:
result = await self._call_openai(
provider=provider,
messages=messages,
model=effective_model,
model=model or provider.get_default_model(),
temperature=temperature,
max_tokens=max_tokens,
)
@@ -774,7 +773,7 @@ class CascadeRouter:
result = await self._call_anthropic(
provider=provider,
messages=messages,
model=effective_model,
model=model or provider.get_default_model(),
temperature=temperature,
max_tokens=max_tokens,
)
@@ -782,7 +781,7 @@ class CascadeRouter:
result = await self._call_grok(
provider=provider,
messages=messages,
model=effective_model,
model=model or provider.get_default_model(),
temperature=temperature,
max_tokens=max_tokens,
)
@@ -790,7 +789,7 @@ class CascadeRouter:
result = await self._call_vllm_mlx(
provider=provider,
messages=messages,
model=effective_model,
model=model or provider.get_default_model(),
temperature=temperature,
max_tokens=max_tokens,
)

View File

@@ -474,7 +474,7 @@ class DiscordVendor(ChatPlatform):
async def _run_client(self, token: str) -> None:
"""Run the discord.py client (blocking call in a task)."""
try:
await self._client.start(token) # type: ignore[union-attr]
await self._client.start(token)
except Exception as exc:
logger.error("Discord client error: %s", exc)
self._state = PlatformState.ERROR
@@ -482,32 +482,32 @@ class DiscordVendor(ChatPlatform):
def _register_handlers(self) -> None:
"""Register Discord event handlers on the client."""
@self._client.event # type: ignore[union-attr]
@self._client.event
async def on_ready():
self._guild_count = len(self._client.guilds) # type: ignore[union-attr]
self._guild_count = len(self._client.guilds)
self._state = PlatformState.CONNECTED
logger.info(
"Discord ready: %s in %d guild(s)",
self._client.user, # type: ignore[union-attr]
self._client.user,
self._guild_count,
)
@self._client.event # type: ignore[union-attr]
@self._client.event
async def on_message(message):
# Ignore our own messages
if message.author == self._client.user: # type: ignore[union-attr]
if message.author == self._client.user:
return
# Only respond to mentions or DMs
is_dm = not hasattr(message.channel, "guild") or message.channel.guild is None
is_mention = self._client.user in message.mentions # type: ignore[union-attr]
is_mention = self._client.user in message.mentions
if not is_dm and not is_mention:
return
await self._handle_message(message)
@self._client.event # type: ignore[union-attr]
@self._client.event
async def on_disconnect():
if self._state != PlatformState.DISCONNECTED:
self._state = PlatformState.CONNECTING
@@ -535,8 +535,8 @@ class DiscordVendor(ChatPlatform):
def _extract_content(self, message) -> str:
"""Strip the bot mention and return clean message text."""
content = message.content
if self._client.user: # type: ignore[union-attr]
content = content.replace(f"<@{self._client.user.id}>", "").strip() # type: ignore[union-attr]
if self._client.user:
content = content.replace(f"<@{self._client.user.id}>", "").strip()
return content
async def _invoke_agent(self, content: str, session_id: str, target):

View File

@@ -102,14 +102,14 @@ class TelegramBot:
self._token = tok
self._app = Application.builder().token(tok).build()
self._app.add_handler(CommandHandler("start", self._cmd_start)) # type: ignore[union-attr]
self._app.add_handler( # type: ignore[union-attr]
self._app.add_handler(CommandHandler("start", self._cmd_start))
self._app.add_handler(
MessageHandler(filters.TEXT & ~filters.COMMAND, self._handle_message)
)
await self._app.initialize() # type: ignore[union-attr]
await self._app.start() # type: ignore[union-attr]
await self._app.updater.start_polling(allowed_updates=Update.ALL_TYPES) # type: ignore[union-attr]
await self._app.initialize()
await self._app.start()
await self._app.updater.start_polling(allowed_updates=Update.ALL_TYPES)
self._running = True
logger.info("Telegram bot started.")

View File

@@ -245,7 +245,6 @@ class VoiceLoop:
def _transcribe(self, audio: np.ndarray) -> str:
"""Transcribe audio using local Whisper model."""
self._load_whisper()
assert self._whisper_model is not None, "Whisper model failed to load"
sys.stdout.write(" 🧠 Transcribing...\r")
sys.stdout.flush()

View File

@@ -1,270 +0,0 @@
"""Tests for Daily Run orchestrator — health snapshot integration.
Verifies that the orchestrator runs a pre-flight health snapshot before
any coding work begins, and aborts on red status unless --force is passed.
Refs: #923
"""
from __future__ import annotations
import argparse
import json
import sys
from pathlib import Path
from unittest.mock import MagicMock, patch
import pytest
# Add timmy_automations to path for imports
_TA_PATH = Path(__file__).resolve().parent.parent.parent / "timmy_automations" / "daily_run"
if str(_TA_PATH) not in sys.path:
sys.path.insert(0, str(_TA_PATH))
# Also add utils path
_TA_UTILS = Path(__file__).resolve().parent.parent.parent / "timmy_automations"
if str(_TA_UTILS) not in sys.path:
sys.path.insert(0, str(_TA_UTILS))
import health_snapshot as hs
import orchestrator as orch
def _make_snapshot(overall_status: str) -> hs.HealthSnapshot:
"""Build a minimal HealthSnapshot for testing."""
return hs.HealthSnapshot(
timestamp="2026-01-01T00:00:00+00:00",
overall_status=overall_status,
ci=hs.CISignal(status="pass", message="CI passing"),
issues=hs.IssueSignal(count=0, p0_count=0, p1_count=0),
flakiness=hs.FlakinessSignal(
status="healthy",
recent_failures=0,
recent_cycles=10,
failure_rate=0.0,
message="All good",
),
tokens=hs.TokenEconomySignal(status="balanced", message="Balanced"),
)
def _make_red_snapshot() -> hs.HealthSnapshot:
return hs.HealthSnapshot(
timestamp="2026-01-01T00:00:00+00:00",
overall_status="red",
ci=hs.CISignal(status="fail", message="CI failed"),
issues=hs.IssueSignal(count=1, p0_count=1, p1_count=0),
flakiness=hs.FlakinessSignal(
status="critical",
recent_failures=8,
recent_cycles=10,
failure_rate=0.8,
message="High flakiness",
),
tokens=hs.TokenEconomySignal(status="unknown", message="No data"),
)
def _default_args(**overrides) -> argparse.Namespace:
"""Build an argparse Namespace with defaults matching the orchestrator flags."""
defaults = {
"review": False,
"json": False,
"max_items": None,
"skip_health_check": False,
"force": False,
}
defaults.update(overrides)
return argparse.Namespace(**defaults)
class TestRunHealthSnapshot:
"""Test run_health_snapshot() — the pre-flight check called by main()."""
def test_green_returns_zero(self, capsys):
"""Green snapshot returns 0 (proceed)."""
args = _default_args()
with patch.object(orch, "_generate_health_snapshot", return_value=_make_snapshot("green")):
rc = orch.run_health_snapshot(args)
assert rc == 0
def test_yellow_returns_zero(self, capsys):
"""Yellow snapshot returns 0 (proceed with caution)."""
args = _default_args()
with patch.object(orch, "_generate_health_snapshot", return_value=_make_snapshot("yellow")):
rc = orch.run_health_snapshot(args)
assert rc == 0
def test_red_returns_one(self, capsys):
"""Red snapshot returns 1 (abort)."""
args = _default_args()
with patch.object(orch, "_generate_health_snapshot", return_value=_make_red_snapshot()):
rc = orch.run_health_snapshot(args)
assert rc == 1
def test_red_with_force_returns_zero(self, capsys):
"""Red snapshot with --force returns 0 (proceed anyway)."""
args = _default_args(force=True)
with patch.object(orch, "_generate_health_snapshot", return_value=_make_red_snapshot()):
rc = orch.run_health_snapshot(args)
assert rc == 0
def test_snapshot_exception_is_skipped(self, capsys):
"""If health snapshot raises, it degrades gracefully and returns 0."""
args = _default_args()
with patch.object(orch, "_generate_health_snapshot", side_effect=RuntimeError("boom")):
rc = orch.run_health_snapshot(args)
assert rc == 0
captured = capsys.readouterr()
assert "warning" in captured.err.lower() or "skipping" in captured.err.lower()
def test_snapshot_prints_summary(self, capsys):
"""Health snapshot prints a pre-flight summary block."""
args = _default_args()
with patch.object(orch, "_generate_health_snapshot", return_value=_make_snapshot("green")):
orch.run_health_snapshot(args)
captured = capsys.readouterr()
assert "PRE-FLIGHT HEALTH CHECK" in captured.out
assert "CI" in captured.out
def test_red_prints_abort_message(self, capsys):
"""Red snapshot prints an abort message to stderr."""
args = _default_args()
with patch.object(orch, "_generate_health_snapshot", return_value=_make_red_snapshot()):
orch.run_health_snapshot(args)
captured = capsys.readouterr()
assert "RED" in captured.err or "aborting" in captured.err.lower()
def test_p0_issues_shown_in_output(self, capsys):
"""P0 issue count is shown in the pre-flight output."""
args = _default_args()
snapshot = hs.HealthSnapshot(
timestamp="2026-01-01T00:00:00+00:00",
overall_status="red",
ci=hs.CISignal(status="pass", message="CI passing"),
issues=hs.IssueSignal(count=2, p0_count=2, p1_count=0),
flakiness=hs.FlakinessSignal(
status="healthy",
recent_failures=0,
recent_cycles=10,
failure_rate=0.0,
message="All good",
),
tokens=hs.TokenEconomySignal(status="balanced", message="Balanced"),
)
with patch.object(orch, "_generate_health_snapshot", return_value=snapshot):
orch.run_health_snapshot(args)
captured = capsys.readouterr()
assert "P0" in captured.out
class TestMainHealthCheckIntegration:
"""Test that main() runs health snapshot before any coding work."""
def _patch_gitea_unavailable(self):
return patch.object(orch.GiteaClient, "is_available", return_value=False)
def test_main_runs_health_check_before_gitea(self):
"""Health snapshot is called before Gitea client work."""
call_order = []
def fake_snapshot(*_a, **_kw):
call_order.append("health")
return _make_snapshot("green")
def fake_gitea_available(self):
call_order.append("gitea")
return False
args = _default_args()
with (
patch.object(orch, "_generate_health_snapshot", side_effect=fake_snapshot),
patch.object(orch.GiteaClient, "is_available", fake_gitea_available),
patch("sys.argv", ["orchestrator"]),
):
orch.main()
assert call_order.index("health") < call_order.index("gitea")
def test_main_aborts_on_red_before_gitea(self):
"""main() aborts with non-zero exit code when health is red."""
gitea_called = []
def fake_gitea_available(self):
gitea_called.append(True)
return True
with (
patch.object(orch, "_generate_health_snapshot", return_value=_make_red_snapshot()),
patch.object(orch.GiteaClient, "is_available", fake_gitea_available),
patch("sys.argv", ["orchestrator"]),
):
rc = orch.main()
assert rc != 0
assert not gitea_called, "Gitea should NOT be called when health is red"
def test_main_skips_health_check_with_flag(self):
"""--skip-health-check bypasses the pre-flight snapshot."""
health_called = []
def fake_snapshot(*_a, **_kw):
health_called.append(True)
return _make_snapshot("green")
with (
patch.object(orch, "_generate_health_snapshot", side_effect=fake_snapshot),
patch.object(orch.GiteaClient, "is_available", return_value=False),
patch("sys.argv", ["orchestrator", "--skip-health-check"]),
):
orch.main()
assert not health_called, "Health snapshot should be skipped"
def test_main_force_flag_continues_despite_red(self):
"""--force allows Daily Run to continue even when health is red."""
gitea_called = []
def fake_gitea_available(self):
gitea_called.append(True)
return False # Gitea unavailable → exits early but after health check
with (
patch.object(orch, "_generate_health_snapshot", return_value=_make_red_snapshot()),
patch.object(orch.GiteaClient, "is_available", fake_gitea_available),
patch("sys.argv", ["orchestrator", "--force"]),
):
orch.main()
# Gitea was reached despite red status because --force was passed
assert gitea_called
def test_main_json_output_on_red_includes_error(self, capsys):
"""JSON output includes error key when health is red."""
with (
patch.object(orch, "_generate_health_snapshot", return_value=_make_red_snapshot()),
patch.object(orch.GiteaClient, "is_available", return_value=True),
patch("sys.argv", ["orchestrator", "--json"]),
):
rc = orch.main()
assert rc != 0
captured = capsys.readouterr()
data = json.loads(captured.out)
assert "error" in data

View File

@@ -4,13 +4,10 @@
Connects to local Gitea, fetches candidate issues, and produces a concise agenda
plus a day summary (review mode).
The Daily Run begins with a Quick Health Snapshot (#710) to ensure mandatory
systems are green before burning cycles on work that cannot land.
Run: python3 timmy_automations/daily_run/orchestrator.py [--review]
Env: See timmy_automations/config/daily_run.json for configuration
Refs: #703, #923
Refs: #703
"""
from __future__ import annotations
@@ -33,11 +30,6 @@ sys.path.insert(
)
from utils.token_rules import TokenRules, compute_token_reward
# Health snapshot lives in the same package
from health_snapshot import generate_snapshot as _generate_health_snapshot
from health_snapshot import get_token as _hs_get_token
from health_snapshot import load_config as _hs_load_config
# ── Configuration ─────────────────────────────────────────────────────────
REPO_ROOT = Path(__file__).resolve().parent.parent.parent
@@ -503,16 +495,6 @@ def parse_args() -> argparse.Namespace:
default=None,
help="Override max agenda items",
)
p.add_argument(
"--skip-health-check",
action="store_true",
help="Skip the pre-flight health snapshot (not recommended)",
)
p.add_argument(
"--force",
action="store_true",
help="Continue even if health snapshot is red (overrides abort-on-red)",
)
return p.parse_args()
@@ -553,76 +535,6 @@ def compute_daily_run_tokens(success: bool = True) -> dict[str, Any]:
}
def run_health_snapshot(args: argparse.Namespace) -> int:
"""Run pre-flight health snapshot and return 0 (ok) or 1 (abort).
Prints a concise summary of CI, issues, flakiness, and token economy.
Returns 1 if the overall status is red AND --force was not passed.
Returns 0 for green/yellow or when --force is active.
On any import/runtime error the check is skipped with a warning.
"""
try:
hs_config = _hs_load_config()
hs_token = _hs_get_token(hs_config)
snapshot = _generate_health_snapshot(hs_config, hs_token)
except Exception as exc: # noqa: BLE001
print(f"[health] Warning: health snapshot failed ({exc}) — skipping", file=sys.stderr)
return 0
# Print concise pre-flight header
status_emoji = {"green": "🟢", "yellow": "🟡", "red": "🔴"}.get(
snapshot.overall_status, ""
)
print("" * 60)
print(f"PRE-FLIGHT HEALTH CHECK {status_emoji} {snapshot.overall_status.upper()}")
print("" * 60)
ci_emoji = {"pass": "", "fail": "", "unknown": "⚠️", "unavailable": ""}.get(
snapshot.ci.status, ""
)
print(f" {ci_emoji} CI: {snapshot.ci.message}")
if snapshot.issues.p0_count > 0:
issue_emoji = "🔴"
elif snapshot.issues.p1_count > 0:
issue_emoji = "🟡"
else:
issue_emoji = ""
critical_str = f"{snapshot.issues.count} critical"
if snapshot.issues.p0_count:
critical_str += f" (P0: {snapshot.issues.p0_count})"
if snapshot.issues.p1_count:
critical_str += f" (P1: {snapshot.issues.p1_count})"
print(f" {issue_emoji} Issues: {critical_str}")
flak_emoji = {"healthy": "", "degraded": "🟡", "critical": "🔴", "unknown": ""}.get(
snapshot.flakiness.status, ""
)
print(f" {flak_emoji} Flakiness: {snapshot.flakiness.message}")
token_emoji = {"balanced": "", "inflationary": "🟡", "deflationary": "🔵", "unknown": ""}.get(
snapshot.tokens.status, ""
)
print(f" {token_emoji} Tokens: {snapshot.tokens.message}")
print()
if snapshot.overall_status == "red" and not args.force:
print(
"🛑 Health status is RED — aborting Daily Run to avoid burning cycles.",
file=sys.stderr,
)
print(
" Fix the issues above or re-run with --force to override.",
file=sys.stderr,
)
return 1
if snapshot.overall_status == "red":
print("⚠️ Health is RED but --force passed — proceeding anyway.", file=sys.stderr)
return 0
def main() -> int:
args = parse_args()
config = load_config()
@@ -630,15 +542,6 @@ def main() -> int:
if args.max_items:
config["max_agenda_items"] = args.max_items
# ── Step 0: Pre-flight health snapshot ──────────────────────────────────
if not args.skip_health_check:
health_rc = run_health_snapshot(args)
if health_rc != 0:
tokens = compute_daily_run_tokens(success=False)
if args.json:
print(json.dumps({"error": "health_check_failed", "tokens": tokens}))
return health_rc
token = get_token(config)
client = GiteaClient(config, token)

10
tox.ini
View File

@@ -41,10 +41,8 @@ description = Static type checking with mypy
commands_pre =
deps =
mypy>=1.0.0
types-PyYAML
types-requests
commands =
mypy src
mypy src --ignore-missing-imports --no-error-summary
# ── Test Environments ────────────────────────────────────────────────────────
@@ -132,17 +130,13 @@ commands =
# ── Pre-push (mirrors CI exactly) ────────────────────────────────────────────
[testenv:pre-push]
description = Local gate — lint + typecheck + full CI suite (same as Gitea Actions)
description = Local gate — lint + full CI suite (same as Gitea Actions)
deps =
ruff>=0.8.0
mypy>=1.0.0
types-PyYAML
types-requests
commands =
ruff check src/ tests/
ruff format --check src/ tests/
bash -c 'files=$(grep -rl "<style" src/dashboard/templates/ --include="*.html" 2>/dev/null); if [ -n "$files" ]; then echo "ERROR: inline <style> blocks found — move CSS to static/css/mission-control.css:"; echo "$files"; exit 1; fi; echo "No inline CSS — OK"'
mypy src
mkdir -p reports
pytest tests/ \
--cov=src \