fix(testkit): macOS compat + fix test 8c ordering (#24)

This commit was merged in pull request #24.
2026-03-18 21:01:13 -04:00
parent ca94c0a9e5
commit 83a2ec19e2
59 changed files with 4458 additions and 454 deletions


@@ -0,0 +1,55 @@
WHAT THIS IS:
A tmux-based autonomous dev loop where AI agents collaborate:
- Hermes (Claude, cloud) = orchestrator. Reads code, writes Kimi prompts, reviews output, manages PRs.
- Kimi (Qwen3 30B, local) = coder. Gets dispatched to git worktrees, writes code, runs tests.
- Timmy (Claude Code, local) = sovereign AI being built. The PRODUCT, not a worker.
- Gitea = self-hosted git. PRs, branch protection, squash-only merge, auto-delete branches.
- tox = test/lint/format runner. Pre-commit hooks enforce quality gates.
KEY FILES TO REVIEW:
bin/timmy-loop-prompt.md — the main orchestration prompt (the "brain")
bin/timmy-loop.sh — shell driver that invokes hermes with the prompt
bin/tower-session.sh — tmux session layout (Hermes + Timmy + watchdog)
tmux/tower-session.sh — the tower tmux layout
bin/tower-watchdog.sh — process health monitor
bin/tower-hermes.sh — hermes entry point
bin/tower-timmy.sh — timmy entry point
bin/hermes-claim — issue claim/release system
bin/hermes-dispatch — kimi dispatch helper
bin/hermes-enqueue — queue management
bin/pr-automerge.sh — PR auto-merge on CI pass
bin/timmy-loopstat.sh — real-time status dashboard
bin/timmy-strategy.sh — triage/strategy logic
bin/timmy-watchdog.sh — older watchdog (may be superseded)
ARCHITECTURE CONSTRAINTS:
- Local-first. No cloud dependencies for inference. Ollama serves models.
- Sovereignty matters. Timmy runs on local hardware, not APIs.
- Quality gates are sacred. Never bypass hooks, tests, or linters.
- Squash-only, linear git history. Every commit on main = one squashed PR.
- Config over code. Prefer YAML-driven behavior changes.
KNOWN PAIN POINTS:
1. Kimi scans the full codebase if not given precise file paths — slow and wasteful.
2. Worktree cleanup sometimes fails, leaving orphaned /tmp/timmy-cycle-* dirs.
3. The loop prompt (timmy-loop-prompt.md) is 327 lines — may be too monolithic.
4. No structured retry logic when Kimi produces bad output (just re-dispatch).
5. The claim system (hermes-claim) is file-based JSON — race conditions possible.
6. Status dashboard (loopstat) polls files on disk — no event-driven updates.
7. Two watchdog scripts exist (tower-watchdog.sh, timmy-watchdog.sh) — unclear which is canonical.
8. No metrics/telemetry beyond the cycle JSONL logs.
WHAT I WANT FROM YOU:
1. AUDIT: Read every script. Map the data flow. Identify dead code, redundancy, and fragility.
2. ARCHITECTURE REVIEW: Is the tmux-pane model the right abstraction? What's better?
3. CONCRETE IMPROVEMENTS: File PRs against this repo with actual code changes. Not just suggestions — working code. Prioritize:
- Reliability (crash recovery, cleanup, idempotency)
- Observability (know what's happening without reading log files)
- Simplicity (fewer scripts, clearer contracts between components)
- Performance (faster cycles, less wasted inference)
4. PROPOSAL: If you think the whole thing should be restructured, write a design doc as a PR. Show me the target architecture.
Start by reading every file, then give me your assessment before writing any code.
```
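Pain point 2 (orphaned `/tmp/timmy-cycle-*` directories) is the kind of problem an idempotent sweep can handle. The sketch below is illustrative only. It assumes each cycle directory was created with `git worktree add` from a single main checkout. `find_stale`, `cleanup`, the repository path, and the six-hour age threshold are hypothetical and do not come from the existing scripts. The sweep asks git to remove each stale worktree first, falls back to deleting the directory, and then runs `git worktree prune` so stale metadata does not pile up.

```python
import glob
import os
import shutil
import subprocess
import time
from typing import List, Optional

# Hypothetical sweep for orphaned cycle worktrees; names and defaults are illustrative.

def find_stale(pattern: str, max_age_s: float, now: Optional[float] = None) -> List[str]:
    """Return directories matching `pattern` whose mtime is older than `max_age_s` seconds."""
    now = time.time() if now is None else now
    return sorted(
        p for p in glob.glob(pattern)
        if os.path.isdir(p) and now - os.path.getmtime(p) > max_age_s
    )

def cleanup(repo: str, pattern: str = "/tmp/timmy-cycle-*",
            max_age_s: float = 6 * 3600) -> List[str]:
    """Remove stale cycle worktrees; safe to run repeatedly."""
    removed = []
    for path in find_stale(pattern, max_age_s):
        # Prefer git so the main repo's worktree metadata stays consistent.
        res = subprocess.run(
            ["git", "-C", repo, "worktree", "remove", "--force", path],
            capture_output=True, text=True,
        )
        if res.returncode != 0:
            # Not a registered worktree, or git refused: delete the directory directly.
            shutil.rmtree(path, ignore_errors=True)
        if not os.path.exists(path):
            removed.append(path)
    # Drop metadata for registered worktrees whose directories no longer exist.
    subprocess.run(["git", "-C", repo, "worktree", "prune"],
                   capture_output=True, check=False)
    return removed
```

The sweep could run at the start of each loop iteration, for example `cleanup("/path/to/repo")` with a placeholder path. That way the next cycle cleans up after a crashed one, instead of relying on the crashed process. A directory's mtime changes only when entries directly inside it are added or removed, so the threshold should comfortably exceed the longest expected cycle.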
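For pain point 4, one option is a bounded retry. It validates each Kimi result and feeds the specific failure back into the next prompt, rather than re-dispatching blindly. This is a minimal sketch: `dispatch` and `validate` are hypothetical callables. For example, `dispatch` might wrap `bin/hermes-dispatch` and `validate` might run tox in the worktree and return an error summary. The attempt cap and backoff values are assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Attempt:
    number: int
    output: str
    error: Optional[str]  # None means the output passed validation

def dispatch_with_retry(
    dispatch: Callable[[str], str],
    validate: Callable[[str], Optional[str]],
    prompt: str,
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
) -> Tuple[Optional[str], List[Attempt]]:
    """Dispatch `prompt`, validate the result, and retry with error feedback.

    Returns (accepted output or None, full attempt history).
    """
    history: List[Attempt] = []
    current = prompt
    for n in range(1, max_attempts + 1):
        output = dispatch(current)
        error = validate(output)
        history.append(Attempt(n, output, error))
        if error is None:
            return output, history
        # Re-prompt with the concrete failure instead of resending the same prompt.
        current = (f"{prompt}\n\nThe previous attempt failed validation:\n"
                   f"{error}\nFix that problem and keep other changes minimal.")
        if n < max_attempts:
            time.sleep(base_delay_s * 2 ** (n - 1))  # exponential backoff
    return None, history
```

The function returns the attempt history as well as the final output. That history could be written to the cycle JSONL logs, so repeated failures become visible without rereading the raw output.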
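Pain point 5 (races in the file-based claim system) can be narrowed by making the claim itself a single atomic filesystem operation. This sketch does not reflect the current `hermes-claim` format. It assumes a hypothetical layout of one JSON file per issue in a claims directory. It relies on `os.open` with `O_CREAT | O_EXCL`, which fails if the file already exists, so on a local filesystem only one concurrent claimant can win.

```python
import json
import os
import tempfile
import time

# Hypothetical layout: one file per claimed issue, e.g. <claims_dir>/issue-42.json.

def _claim_path(claims_dir: str, issue: int) -> str:
    return os.path.join(claims_dir, f"issue-{issue}.json")

def claim(claims_dir: str, issue: int, owner: str) -> bool:
    """Atomically claim `issue`; returns False if another owner already holds it."""
    os.makedirs(claims_dir, exist_ok=True)
    try:
        # O_CREAT | O_EXCL creates the file only if it is absent, in one step,
        # so two racing callers cannot both succeed (the loser gets FileExistsError).
        fd = os.open(_claim_path(claims_dir, issue),
                     os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    # Readers may briefly see an empty file until this write completes.
    with os.fdopen(fd, "w") as f:
        json.dump({"owner": owner, "claimed_at": time.time()}, f)
    return True

def release(claims_dir: str, issue: int, owner: str) -> bool:
    """Release `issue` only if `owner` is the current holder."""
    path = _claim_path(claims_dir, issue)
    try:
        with open(path) as f:
            holder = json.load(f).get("owner")
    except (FileNotFoundError, json.JSONDecodeError):
        return False
    if holder != owner:
        return False
    os.remove(path)
    return True

if __name__ == "__main__":
    d = tempfile.mkdtemp()
    print(claim(d, 42, "kimi-1"))    # True
    print(claim(d, 42, "kimi-2"))    # False: already claimed
    print(release(d, 42, "kimi-2"))  # False: not the holder
    print(release(d, 42, "kimi-1"))  # True
```

Claims left behind by crashed agents would still need expiry, for example by comparing `claimed_at` against a TTL. That part is omitted here to keep the atomic step clear.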
---