fix(testkit): macOS compat + fix test 8c ordering (#24)

2026-03-18 21:01:13 -04:00
parent ca94c0a9e5
commit 83a2ec19e2
59 changed files with 4458 additions and 454 deletions
--- a/attached_assets/Pasted--WHAT-THIS-IS-A-tmux-based-autonomous-dev-loop-where-AI_1773874680640.txt
+++ b/attached_assets/Pasted--WHAT-THIS-IS-A-tmux-based-autonomous-dev-loop-where-AI_1773874680640.txt
@@ -0,0 +1,55 @@
+   WHAT THIS IS:
+   A tmux-based autonomous dev loop where AI agents collaborate:
+   - Hermes (Claude, cloud) = orchestrator. Reads code, writes Kimi prompts, reviews output, manages PRs.
+   - Kimi (Qwen3 30B, local) = coder. Gets dispatched to git worktrees, writes code, runs tests.
+   - Timmy (Claude Code, local) = sovereign AI being built. The PRODUCT, not a worker.
+   - Gitea = self-hosted git. PRs, branch protection, squash-only merge, auto-delete branches.
+   - tox = test/lint/format runner. Pre-commit hooks enforce quality gates.
+
+   KEY FILES TO REVIEW:
+     bin/timmy-loop-prompt.md   — the main orchestration prompt (the "brain")
+     bin/timmy-loop.sh          — shell driver that invokes hermes with the prompt
+     bin/tower-session.sh       — tmux session layout (Hermes + Timmy + watchdog)
+     tmux/tower-session.sh      — the tower tmux layout
+     bin/tower-watchdog.sh      — process health monitor
+     bin/tower-hermes.sh        — hermes entry point
+     bin/tower-timmy.sh         — timmy entry point
+     bin/hermes-claim            — issue claim/release system
+     bin/hermes-dispatch         — kimi dispatch helper
+     bin/hermes-enqueue          — queue management
+     bin/pr-automerge.sh         — PR auto-merge on CI pass
+     bin/timmy-loopstat.sh       — real-time status dashboard
+     bin/timmy-strategy.sh       — triage/strategy logic
+     bin/timmy-watchdog.sh       — older watchdog (may be superseded)
+
+   ARCHITECTURE CONSTRAINTS:
+   - Local-first. No cloud dependencies for inference. Ollama serves models.
+   - Sovereignty matters. Timmy runs on local hardware, not APIs.
+   - Quality gates are sacred. Never bypass hooks, tests, or linters.
+   - Squash-only, linear git history. Every commit on main = one squashed PR.
+   - Config over code. Prefer YAML-driven behavior changes.
+
+   KNOWN PAIN POINTS:
+   1. Kimi scans the full codebase if not given precise file paths — slow and wasteful.
+   2. Worktree cleanup sometimes fails, leaving orphaned /tmp/timmy-cycle-* dirs.
+   3. The loop prompt (timmy-loop-prompt.md) is 327 lines — may be too monolithic.
+   4. No structured retry logic when Kimi produces bad output (just re-dispatch).
+   5. The claim system (hermes-claim) is file-based JSON — race conditions possible.
+   6. Status dashboard (loopstat) polls files on disk — no event-driven updates.
+   7. Two watchdog scripts exist (tower-watchdog.sh, timmy-watchdog.sh) — unclear which is canonical.
+   8. No metrics/telemetry beyond the cycle JSONL logs.
+
+   WHAT I WANT FROM YOU:
+   1. AUDIT: Read every script. Map the data flow. Identify dead code, redundancy, and fragility.
+   2. ARCHITECTURE REVIEW: Is the tmux-pane model the right abstraction? What's better?
+   3. CONCRETE IMPROVEMENTS: File PRs against this repo with actual code changes. Not just suggestions — working code. Prioritize:
+      - Reliability (crash recovery, cleanup, idempotency)
+      - Observability (know what's happening without reading log files)
+      - Simplicity (fewer scripts, clearer contracts between components)
+      - Performance (faster cycles, less wasted inference)
+   4. PROPOSAL: If you think the whole thing should be restructured, write a design doc as a PR. Show me the target architecture.
+
+   Start by reading every file, then give me your assessment before writing any code.
+   ```
+
+   ---