Add falsework docs + start-dashboard.sh (API-aware launcher)

- FALSEWORK.md: Full audit of API costs per component, migration plan for
  shifting load from Claude to local models incrementally
- start-dashboard.sh: Launches tmux layout with only zero-cost panes active
  (status, loopstat). Loop and chat panes held until manual start.
- tower-timmy.sh: No changes (already source-controlled)

Falsework principle: build on cheap/local scaffolding, upgrade to Claude only
where quality demands it.
2026-03-18 16:45:15 -04:00
parent 3fe6e22ccf
commit b71fa55946
3 changed files with 469 additions and 51 deletions
--- a/FALSEWORK.md
+++ b/FALSEWORK.md
@@ -0,0 +1,158 @@
+# Falsework Principle — API Cost Management
+# Created: 2026-03-18
+# Purpose: Document what runs on Claude (expensive), what runs local (free),
+#          and how to incrementally shift load from cloud to local.
+
+## The Metaphor
+
+Falsework = temporary scaffolding that holds the structure while it cures.
+When the permanent structure (local models) can bear the load, remove the
+scaffolding (cloud API calls). Don't wait for perfection — use what works
+NOW, upgrade incrementally.
+
+---
+
+## Current State (2026-03-18)
+
+### ZERO COST (running now)
+| Component           | What it does                    | API calls |
+|---------------------|---------------------------------|-----------|
+| timmy-status.sh     | Gitea + git dashboard (bash)    | 0         |
+| timmy-loopstat.sh   | Queue/perf stats from logs      | 0         |
+| timmy-strategy.sh   | Strategic view panel            | 0         |
+| timmy-watchdog.sh   | Restarts dead tmux panes        | 0         |
+| tower-watchdog.sh   | Restarts dead tower panes       | 0         |
+| hermes-startup.sh   | Boot orchestrator               | 0         |
+| start-dashboard.sh  | tmux layout creator             | 0         |
+| tower-timmy.sh      | Timmy's tower side              | 0 (local) |
+
+### MODERATE COST (running now)
+| Component           | What it does                    | API calls           |
+|---------------------|---------------------------------|---------------------|
+| tower-hermes.sh     | Hermes side of tower chat       | 1 Claude/turn       |
+|                     |                                 | Gated by Timmy's    |
+|                     |                                 | local response time |
+|                     |                                 | (~1 call/30-60sec)  |
+
+### HEAVY COST (NOT running — held)
+| Component           | What it does                    | API calls           |
+|---------------------|---------------------------------|---------------------|
+| timmy-loop.sh       | Continuous triage + delegation  | 1 Claude Opus/cycle |
+|                     | + timmy-loop-prompt.md          | Runs continuously   |
+|                     |                                 | BIGGEST COST CENTER |
+| kimi-loop.sh        | Per-issue coding agent          | 1 Claude Code/issue |
+|                     |                                 | Bursty, not cont.   |
+| hermes (pane 4)     | Interactive Hermes chat         | Per-interaction     |
+
+---
+
+## Falsework Migration Plan
+
+### Phase 1: DONE — Separate and hold (today)
+- Split the tmux layout so API-heavy panes don't auto-start
+- Tower-hermes is the only active Claude consumer
+- All monitoring is pure bash, zero API cost
+
+### Phase 2: Tower Hermes → Local (next)
+Tower conversation is LOW STAKES. It's two AIs chatting. This does NOT
+need Claude Opus.
+
+FALSEWORK APPROACH:
+- Create ~/.hermes-tower/ config with local-only backend
+- tower-hermes.sh: change `hermes chat` to `HERMES_HOME=~/.hermes-tower hermes chat`
+- Backend: hermes3:latest or qwen3:30b via Ollama
+- Result: tower becomes ZERO API COST
+- Quality: will be dumber but that's fine for conversation
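One way to make the Phase 2 switch reversible, in the spirit of keeping the Claude path available, is to pick `HERMES_HOME` from a single toggle. This is a minimal sketch, not the committed `tower-hermes.sh`: the `TOWER_BACKEND` variable and the `tower_hermes_home` function are hypothetical, and `~/.hermes` as the paid default config directory is an assumption based on the paths used elsewhere in this file.

```bash
#!/usr/bin/env bash
# Sketch only: TOWER_BACKEND and tower_hermes_home are hypothetical names,
# not part of tower-hermes.sh as committed.

# Print the HERMES_HOME that tower-hermes.sh should use.
#   local  -> ~/.hermes-tower (local-only backend, zero API cost)
#   claude -> ~/.hermes       (assumed default config, paid)
tower_hermes_home() {
    case "${TOWER_BACKEND:-local}" in
        local)  printf '%s\n' "$HOME/.hermes-tower" ;;
        claude) printf '%s\n' "$HOME/.hermes" ;;
        *)      echo "unknown TOWER_BACKEND: $TOWER_BACKEND" >&2; return 1 ;;
    esac
}

# Intended use inside tower-hermes.sh (commented out so the sketch has no
# side effects when sourced):
#   HERMES_HOME="$(tower_hermes_home)" hermes chat
```

With this shape, flipping back to Claude is `TOWER_BACKEND=claude` rather than an edit to the script.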
+
+### Phase 3: Loop Triage → Hybrid (requires work)
+The loop prompt (timmy-loop-prompt.md) does 6 phases. NOT all need Opus:
+
+PHASE-BY-PHASE (what can go local):
+- Phase 0 (check stop file) — already bash
+- Phase 1 (fix broken PRs) — needs code reasoning → KEEP CLAUDE
+- Phase 2 (fast triage) — read issues, score them → LOCAL POSSIBLE
+  A local model can read JSON and assign priorities
+- Phase 3 (execute top) — depends on task type
+- Phase 4 (retro) — summarize what happened → LOCAL POSSIBLE
+- Phase 5/6 (deep triage/cleanup) — periodic → LOCAL POSSIBLE
+
+FALSEWORK APPROACH:
+- Split the loop into "triage" (local) and "execute" (Claude)
+- Local model handles: reading issues, scoring, assigning labels
+- Claude handles: actual code review, complex delegation decisions
+- Gate: only call Claude when there's real work, not every cycle
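The gate in the last bullet could look roughly like the following. It is a sketch under stated assumptions: the function name `needs_claude`, the threshold of 7, and the input format (one `<issue-number> <score>` pair per line, as a local triage model might emit) are all hypothetical and not taken from `timmy-loop.sh`.

```bash
#!/usr/bin/env bash
# Sketch only: the function name and the "one scored issue per line" input
# format are assumptions, not the loop's actual interface.

# Succeed (exit 0) only if at least one issue scored >= threshold.
# Input on stdin: "<issue-number> <score>" per line.
# Empty input means no work, so Claude is not called.
needs_claude() {
    local threshold="${1:-7}"
    LC_ALL=C awk -v t="$threshold" \
        '$2 + 0 >= t { found = 1 } END { exit found ? 0 : 1 }'
}

# Example use in a loop cycle (commented out; the execute step is not shown):
#   if local_triage_output | needs_claude 7; then
#       ... run the Claude-backed execute phase ...
#   fi
```

A cycle with nothing above the threshold then costs zero API calls instead of one Opus call.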
+
+### Phase 4: Kimi → Local Coding Agent (requires model work)
+kimi-loop.sh currently runs `kimi`, which is Claude Code ($2/issue budget).
+
+FALSEWORK OPTIONS:
+a) Use qwen3:30b as coding agent (has tool use, just slower)
+b) Use Kimi API (Moonshot) — cheaper than Claude, decent at code
+c) Keep Claude Code but increase poll interval to reduce frequency
+d) Only assign Kimi issues that are scoped/small (1-3 files)
+
+RECOMMENDED: Option (c) for now — same agent, less frequent. Then migrate
+to (a) as local model quality improves.
+
+### Phase 5: Smart Routing (permanent structure)
+Once local models handle triage reliably:
+- Enable smart_model_routing in hermes config
+- Simple turns → hermes3:latest (local, free)
+- Complex turns → Claude Opus (cloud, paid)
+- Tower → always local
+- Loop triage → local, execution → Claude
+- PR review → always Claude (stakes too high)
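The routing rules above amount to a small lookup table. The sketch below spells that table out in bash for illustration only: `route_backend` and its task names are hypothetical, and the real switch is the `smart_model_routing` option in the hermes config, whose format is not shown here.

```bash
#!/usr/bin/env bash
# Sketch only: route_backend and its task names are hypothetical; they
# mirror the Phase 5 bullet list, not any real hermes interface.

# Print "local" or "claude" for a given kind of work.
route_backend() {
    case "$1" in
        tower|loop-triage|simple-turn)        echo "local"  ;;
        loop-execute|pr-review|complex-turn)  echo "claude" ;;
        *)                                    echo "claude" ;;  # unknown work: assume high stakes
    esac
}

route_backend tower        # prints: local
route_backend pr-review    # prints: claude
```

Defaulting unknown work to Claude is a design choice in this sketch: it errs toward quality over cost.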
+
+---
+
+## Cost Estimation (rough)
+
+| Scenario              | Claude calls/hour | Opus cost/hour* |
+|-----------------------|-------------------|-----------------|
+| Everything on Claude  | ~120              | ~$12-24         |
+| Current (tower only)  | ~60               | ~$6-12          |
+| Phase 2 (tower local) | ~0                | ~$0             |
+| Phase 3 (loop hybrid) | ~10-20            | ~$1-4           |
+| Phase 5 (smart route) | ~5-10             | ~$0.50-2        |
+
+*Very rough. Depends on prompt size, response length, Opus pricing.
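The cost column is consistent with a per-call cost of roughly $0.10 to $0.20 (for example, $12-24 over ~120 calls). That range is inferred from the table itself, not from a price list, so treat it as an assumption. A small sketch of the arithmetic, with a hypothetical `hourly_cost` helper:

```bash
#!/usr/bin/env bash
# Sketch only: the $0.10-$0.20 per-call range is inferred from the table
# (12/120 and 24/120), not taken from published pricing.

# Print the estimated hourly cost range in dollars for a given call rate.
# Usage: hourly_cost <calls-per-hour> [low-per-call] [high-per-call]
hourly_cost() {
    local calls_per_hour="$1" low="${2:-0.10}" high="${3:-0.20}"
    LC_ALL=C awk -v c="$calls_per_hour" -v lo="$low" -v hi="$high" \
        'BEGIN { printf "%.2f-%.2f\n", c * lo, c * hi }'
}

hourly_cost 120   # everything on Claude: prints 12.00-24.00
hourly_cost 60    # current (tower only): prints 6.00-12.00
```

Passing different per-call bounds as the second and third arguments re-runs the estimate when prompt size or pricing changes.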
+
+---
+
+## Rules for Falsework
+
+1. NEVER sacrifice quality gates for cost. If a local model can't do PR
+   review reliably, keep it on Claude.
+2. Start with the LOWEST STAKES component. Tower chat → loop triage →
+   PR review. Never the reverse.
+3. Test locally BEFORE removing the scaffolding. Run both paths, compare
+   results, then switch.
+4. Keep the Claude path AVAILABLE. Don't delete configs — comment them
+   out. If local breaks, flip back in 30 seconds.
+5. Monitor degradation. If local triage starts miscategorizing issues,
+   that's the signal to keep Claude for that phase.
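Rules 3 and 5 both come down to comparing what the local path produced against what Claude produced. A minimal sketch of such a comparison follows. The file format (`<issue-number> <label>` per line, same issue order in both files) and the `agreement_pct` name are assumptions made for illustration; nothing in the current scripts writes these files.

```bash
#!/usr/bin/env bash
# Sketch only: the file format ("<issue-number> <label>" per line, same
# issue order in both files) and the function name are assumptions.

# Print the percentage (truncated to a whole number) of issues where the
# local model's label matches Claude's label for the same issue.
agreement_pct() {
    local claude_file="$1" local_file="$2"
    paste -d' ' "$claude_file" "$local_file" | LC_ALL=C awk '
        NF == 4 && $1 == $3 { total++; if ($2 == $4) same++ }
        END {
            if (total == 0) { print "0"; exit }
            printf "%d\n", (same * 100) / total
        }'
}

# Example (commented out; the file names are hypothetical):
#   agreement_pct claude-labels.txt local-labels.txt
```

What agreement level is good enough to remove the scaffolding is a judgment call this sketch does not make.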
+
+---
+
+## Quick Reference: How to Start Each Component
+
+```bash
+# Zero cost — start freely
+~/.hermes/bin/start-dashboard.sh    # tmux layout + status panels
+~/.hermes/bin/tower-timmy.sh        # Timmy side (local)
+~/.hermes/bin/timmy-watchdog.sh     # cron: */8 * * * *
+~/.hermes/bin/tower-watchdog.sh     # cron: */5 * * * *
+
+# Moderate cost — start with awareness
+~/.hermes/bin/tower-hermes.sh       # ~1 Claude call per Timmy response
+
+# Heavy cost — start deliberately
+~/.hermes/bin/timmy-loop.sh         # Continuous Claude Opus calls
+~/.hermes/bin/kimi-loop.sh          # Claude Code per issue
+hermes                              # Interactive Hermes (per-interaction)
+
+# Stop everything
+touch ~/Timmy-Time-dashboard/.loop/STOP   # stops the loop
+tmux kill-session -t timmy-loop           # kills dashboard
+tmux kill-session -t tower                # kills tower
+```