Migrated from hermes/hermes-config (now archived). Contents:
- SOUL.md: Inscription 1, the immutable conscience
- config.yaml: live Hermes harness configuration
- memories/: persistent agent memory + user profile
- skins/: timmy.yaml + trismegistus.yaml personalities
- playbooks/: 6 specialist agent configs
- cron/: scheduled job definitions
- docs/design-log/: historical design decisions
- FALSEWORK.md: API cost management strategy
- channel_directory.json: platform channel mappings

Applied as a side-car to the Hermes harness. No hacking on the harness.
Falsework Principle — API Cost Management
Created: 2026-03-18
Purpose: Document what runs on Claude (expensive), what runs local (free),
and how to incrementally shift load from cloud to local.
The Metaphor
Falsework = temporary scaffolding that holds the structure while it cures. When the permanent structure (local models) can bear the load, remove the scaffolding (cloud API calls). Don't wait for perfection — use what works NOW, upgrade incrementally.
Current State (2026-03-18)
ZERO COST (running now)
| Component | What it does | API calls |
|---|---|---|
| timmy-status.sh | Gitea + git dashboard (bash) | 0 |
| timmy-loopstat.sh | Queue/perf stats from logs | 0 |
| timmy-strategy.sh | Strategic view panel | 0 |
| timmy-watchdog.sh | Restarts dead tmux panes | 0 |
| tower-watchdog.sh | Restarts dead tower panes | 0 |
| hermes-startup.sh | Boot orchestrator | 0 |
| start-dashboard.sh | tmux layout creator | 0 |
| tower-timmy.sh | Timmy's tower side | 0 (local) |
MODERATE COST (running now)
| Component | What it does | API calls |
|---|---|---|
| tower-hermes.sh | Hermes side of tower chat | 1 Claude/turn, gated by Timmy's local response time (~1 call/30-60 sec) |
HEAVY COST (NOT running — held)
| Component | What it does | API calls |
|---|---|---|
| timmy-loop.sh + timmy-loop-prompt.md | Continuous triage + delegation; runs continuously | 1 Claude Opus/cycle (BIGGEST COST CENTER) |
| kimi-loop.sh | Per-issue coding agent; bursty, not continuous | 1 Claude Code/issue |
| hermes (pane 4) | Interactive Hermes chat | Per-interaction |
Falsework Migration Plan
Phase 1: DONE — Separate and hold (today)
- Split the tmux layout so API-heavy panes don't auto-start
- Tower-hermes is the only active Claude consumer
- All monitoring is pure bash, zero API cost
Phase 2: Tower Hermes → Local (next)
Tower conversation is LOW STAKES. It's two AIs chatting. This does NOT need Claude Opus.
FALSEWORK APPROACH:
- Create ~/.hermes-tower/ config with local-only backend
- tower-hermes.sh: change `hermes chat` to `HERMES_HOME=~/.hermes-tower hermes chat`
- Backend: hermes3:latest or qwen3:30b via Ollama
- Result: tower becomes ZERO API COST
- Quality: will be dumber but that's fine for conversation
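The switch described above could be wrapped so the Claude path stays one flag away. This is a minimal sketch, not the actual tower-hermes.sh: the function name `tower_hermes_home`, the `TOWER_BACKEND` variable, and the assumption that the Claude-backed config lives in `~/.hermes` are all illustrative. Only `HERMES_HOME=~/.hermes-tower` and `hermes chat` come from this plan.

```shell
# Sketch only: tower_hermes_home and TOWER_BACKEND are hypothetical names.
# Prints the HERMES_HOME that tower-hermes.sh would use for a given backend.
tower_hermes_home() {
    case "${1:-local}" in
        local)  printf '%s\n' "$HOME/.hermes-tower" ;;  # local-only config (Ollama backend)
        claude) printf '%s\n' "$HOME/.hermes" ;;        # assumed location of the Claude-backed config
        *)      echo "unknown backend: $1" >&2; return 1 ;;
    esac
}

# Usage inside tower-hermes.sh (not executed here):
#   HERMES_HOME="$(tower_hermes_home "${TOWER_BACKEND:-local}")" hermes chat
```

Defaulting to `local` keeps the tower at zero API cost unless someone deliberately sets the flag to `claude`.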
Phase 3: Loop Triage → Hybrid (requires work)
The loop prompt (timmy-loop-prompt.md) does 6 phases. NOT all need Opus:
WHAT CAN GO LOCAL:
- Phase 0 (check stop file): already bash
- Phase 1 (fix broken PRs): needs code reasoning → KEEP CLAUDE
- Phase 2 (fast triage): read issues, score them → LOCAL POSSIBLE (a local model can read JSON and assign priorities)
- Phase 3 (execute top): depends on task type
- Phase 4 (retro): summarize what happened → LOCAL POSSIBLE
- Phase 5/6 (deep triage/cleanup): periodic → LOCAL POSSIBLE
FALSEWORK APPROACH:
- Split the loop into "triage" (local) and "execute" (Claude)
- Local model handles: reading issues, scoring, assigning labels
- Claude handles: actual code review, complex delegation decisions
- Gate: only call Claude when there's real work, not every cycle
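The gate in the last bullet can be sketched as a cheap local check that runs before any Claude call. The queue file and its format (one pending work item per line, `#` for comments) are assumptions made for illustration; the real loop may track work differently.

```shell
# Sketch only: the queue file format (one pending item per line) is an assumption.
# Returns 0 (true) when the queue holds real work, so Claude is worth calling.
has_real_work() {
    queue_file=$1
    # A missing or empty file means there is nothing to do this cycle.
    [ -s "$queue_file" ] || return 1
    # Ignore blank lines and comments; any remaining line counts as work.
    grep -Eqv '^[[:space:]]*(#|$)' "$queue_file"
}

# Usage inside the loop (run_claude_execute is a hypothetical helper):
#   if has_real_work "$QUEUE"; then run_claude_execute; else sleep "$POLL"; fi
```

An idle cycle then costs one `grep` instead of one Opus call.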
Phase 4: Kimi → Local Coding Agent (requires model work)
kimi-loop.sh currently runs kimi, which is Claude Code ($2/issue budget).
FALSEWORK OPTIONS:
- (a) Use qwen3:30b as coding agent (has tool use, just slower)
- (b) Use Kimi API (Moonshot): cheaper than Claude, decent at code
- (c) Keep Claude Code but increase poll interval to reduce frequency
- (d) Only assign Kimi issues that are scoped/small (1-3 files)
RECOMMENDED: Option (c) for now — same agent, less frequent. Then migrate to (a) as local model quality improves.
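One way to apply option (c) is an idle backoff: lengthen the poll interval while no issues arrive and drop back to the base interval once work shows up. The function name and the 300 s / 3600 s defaults below are hypothetical; kimi-loop.sh may not expose its interval this way.

```shell
# Sketch only: next_poll_interval and its default values are hypothetical.
# Prints the next poll interval in seconds: back off while idle, reset on work.
next_poll_interval() {
    current=$1          # interval used for the cycle that just ended
    found_work=$2       # number of issues picked up in that cycle
    base=${3:-300}      # shortest interval
    max=${4:-3600}      # longest interval
    if [ "$found_work" -gt 0 ]; then
        echo "$base"    # work found: return to the base interval
        return 0
    fi
    next=$((current * 2))   # idle: double the wait
    if [ "$next" -gt "$max" ]; then
        next=$max           # never wait longer than the cap
    fi
    echo "$next"
}
```

For example, `next_poll_interval 300 0` prints `600`, and `next_poll_interval 2400 0` is capped at `3600`.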
Phase 5: Smart Routing (permanent structure)
Once local models handle triage reliably:
- Enable smart_model_routing in hermes config
- Simple turns → hermes3:latest (local, free)
- Complex turns → Claude Opus (cloud, paid)
- Tower → always local
- Loop triage → local, execution → Claude
- PR review → always Claude (stakes too high)
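The routing table above can be written down as a single lookup. This is only an illustration of the policy, not the smart_model_routing implementation in Hermes; `route_model` and the task labels are hypothetical names.

```shell
# Sketch only: route_model and the task labels are hypothetical; this is not
# how Hermes implements smart_model_routing.
route_model() {
    case "$1" in
        tower|triage|simple)        echo "local" ;;   # e.g. hermes3:latest via Ollama, free
        pr-review|execute|complex)  echo "claude" ;;  # Claude Opus, paid
        *)                          echo "claude" ;;  # unknown task: keep the quality gate
    esac
}
```

Sending unknown task types to Claude is a design choice that favors quality over cost; flipping that default to `local` would be cheaper but riskier.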
Cost Estimation (rough)
| Scenario | Claude calls/hour | Opus cost/hour* |
|---|---|---|
| Everything on Claude | ~120 | ~$12-24 |
| Current (tower only) | ~60 | ~$6-12 |
| Phase 2 (tower local) | ~0 | ~$0 |
| Phase 3 (loop hybrid) | ~10-20 | ~$1-4 |
| Phase 5 (smart route) | ~5-10 | ~$0.50-2 |
*Very rough. Depends on prompt size, response length, Opus pricing.
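The table is consistent with a per-call cost of roughly $0.10 to $0.20 (~$12-24 for ~120 calls). That range is inferred from the table itself, not taken from any price list. Under that assumption the rows can be reproduced with integer arithmetic in cents:

```shell
# Sketch only: the 10 and 20 cent per-call bounds are inferred from the table
# above, not quoted from any pricing source.
# Args: calls/hour (low) and optionally calls/hour (high). Prints "low high" in cents.
hourly_cost_cents() {
    calls_low=$1
    calls_high=${2:-$1}
    echo "$((calls_low * 10)) $((calls_high * 20))"
}

hourly_cost_cents 120      # everything on Claude
hourly_cost_cents 60       # current, tower only
hourly_cost_cents 10 20    # Phase 3, loop hybrid
```

The three calls print `1200 2400`, `600 1200`, and `100 400`, which match the ~$12-24, ~$6-12, and ~$1-4 rows.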
Rules for Falsework
- NEVER sacrifice quality gates for cost. If local model can't do PR review reliably, keep it on Claude.
- Start with the LOWEST STAKES component. Tower chat → loop triage → PR review. Never the reverse.
- Test locally BEFORE removing the scaffolding. Run both paths, compare results, then switch.
- Keep the Claude path AVAILABLE. Don't delete configs — comment them out. If local breaks, flip back in 30 seconds.
- Monitor degradation. If local triage starts miscategorizing issues, that's the signal to keep Claude for that phase.
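The "run both paths, compare results" rule needs a number to watch. A minimal sketch, assuming each path writes one triage label per line in the same issue order (the file layout and the function name are hypothetical):

```shell
# Sketch only: assumes each file holds one label per line, in the same issue order.
# Prints the percentage of issues on which local and Claude triage agree.
agreement_pct() {
    # paste joins the two files line by line with a tab; awk counts matching pairs.
    paste "$1" "$2" | awk -F '\t' '
        $1 == $2 { same++ }
        END {
            if (NR == 0) { print 0; exit 1 }   # no data: report 0 and fail
            print int(same * 100 / NR)
        }'
}

# Usage (file names are placeholders):
#   agreement_pct local-labels.txt claude-labels.txt
```

A falling percentage over time is the degradation signal; the threshold for flipping back to Claude is a judgment call this sketch does not make.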
Quick Reference: How to Start Each Component
# Zero cost — start freely
~/.hermes/bin/start-dashboard.sh # tmux layout + status panels
~/.hermes/bin/tower-timmy.sh # Timmy side (local)
~/.hermes/bin/timmy-watchdog.sh # cron: */8 * * * *
~/.hermes/bin/tower-watchdog.sh # cron: */5 * * * *
# Moderate cost — start with awareness
~/.hermes/bin/tower-hermes.sh # ~1 Claude call per Timmy response
# Heavy cost — start deliberately
~/.hermes/bin/timmy-loop.sh # Continuous Claude Opus calls
~/.hermes/bin/kimi-loop.sh # Claude Code per issue
hermes # Interactive Hermes (per-interaction)
# Stop everything
touch ~/Timmy-Time-dashboard/.loop/STOP # stops the loop
tmux kill-session -t timmy-loop # kills dashboard
tmux kill-session -t tower # kills tower
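
The STOP file works because the loop checks for it at the start of each cycle (Phase 0). A minimal sketch of that check follows; `should_stop` is a hypothetical helper and the real timmy-loop.sh may do this differently.

```shell
# Sketch only: should_stop is a hypothetical helper; the default path mirrors
# the STOP file touched above.
should_stop() {
    [ -e "${1:-$HOME/Timmy-Time-dashboard/.loop/STOP}" ]
}

# Usage inside a loop script (run_one_cycle is a placeholder):
#   while ! should_stop; do run_one_cycle; sleep 60; done
```

Remove the STOP file before restarting the loop, otherwise it exits on the first check.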