hermes-config/docs/design-log/2026-03-15-cycle1-retro.md
Alexander Whitestone 9e858ecf21 docs: cycle 1 retrospective — delegation loop proof of concept
Results: 1 PR (#160), 1 deferred (#148 too large for single Kimi task)
Key lessons: task size is critical, never use /tmp for worktrees,
split large tasks into 3-4 file chunks.
2026-03-15 11:15:44 -04:00

Delegation Cycle 1 — Retrospective

Date: 2026-03-15

Orchestrator: Hermes

Implementer: Kimi (kimi-k2.5, Moonshot servers)


Tasks Attempted

| Task | Issue | Branch | Status | Outcome |
| --- | --- | --- | --- | --- |
| Decompose _maybe_distill() | #151 | refactor/151-maybe-distill | PR #160 | Clean first pass, 1426 tests pass |
| SQLite context managers | #148 | fix/148-sqlite-context-managers | Deferred | Kimi timed out 3x, changes lost to /tmp cleanup |

What Worked

#151 was a perfect Kimi task:

  • Single file (thinking.py), clear decomposition target
  • Specific "done when" criteria (under 30 lines, tests pass)
  • Kimi produced clean code on first pass
  • Constants moved to module level, helpers well-named
  • 1426 tests pass, no behavior change
  • Total time: ~2 minutes for Kimi, ~5 minutes for review+commit+PR

The two-attempt rule:

  • Applied correctly: after Kimi's second timeout on #148, I took over instead of retrying again
  • This prevented an open-ended retry loop
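
The rule is simple enough to sketch in code. This is a minimal illustration, not the orchestrator's actual logic: `run_with_attempt_limit` and the `attempt` callback are hypothetical names, and the callback stands in for "dispatch to Kimi and check the result".

```python
from typing import Callable, Optional


def run_with_attempt_limit(
    attempt: Callable[[int], bool], max_attempts: int = 2
) -> Optional[int]:
    """Call attempt(n) at most max_attempts times.

    Returns the number of the first successful attempt, or None when
    every allowed attempt failed and the orchestrator should take over.
    """
    for n in range(1, max_attempts + 1):
        if attempt(n):
            return n
    return None


if __name__ == "__main__":
    # A task that keeps failing is abandoned after two tries.
    print(run_with_attempt_limit(lambda n: False))
```

A `None` result is the signal to restructure the task rather than dispatch it again.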

What Failed

#148 was too large for Kimi:

  • 22 call sites across 16 files
  • Kimi converted ~11 files on first pass but missed several
  • Timed out at 300s on second pass
  • Third attempt also timed out
  • Changes lost when /tmp worktrees were cleaned by macOS
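
For a sense of what each of the 22 call sites involves, here is one plausible shape of the conversion. This is an assumption, not the actual change (which was lost): the `notes` table and `count_rows` are made up for illustration. The detail that makes the conversion more than mechanical is that `with conn:` on a `sqlite3` connection only commits or rolls back; it does not close the connection, so `contextlib.closing` is added for that.

```python
import sqlite3
from contextlib import closing


def count_rows(db_path: str) -> int:
    """Insert one row and return the row count, closing the connection reliably."""
    # closing() guarantees conn.close() on exit, even if an exception is raised.
    with closing(sqlite3.connect(db_path)) as conn:
        # `with conn:` wraps the writes in a transaction (commit or rollback).
        with conn:
            conn.execute("CREATE TABLE IF NOT EXISTS notes (body TEXT)")
            conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))
        return conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]


if __name__ == "__main__":
    print(count_rows(":memory:"))
```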

Worktree location mistake:

  • Initially created worktrees in /tmp/
  • macOS cleaned them between terminal calls
  • Lost Kimi's partial work (17 files, -800/+705 lines)
  • Fixed: moved worktrees to ~/worktrees/ (persistent)
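
A sketch of how worktree creation under ~/worktrees/ could be scripted. The helper names and the branch-to-directory mapping (slashes replaced with dashes) are assumptions for illustration; only `git worktree add -b` and `git -C` are real git features.

```python
import subprocess
from pathlib import Path

# Persistent location, unlike /tmp (path taken from the fix above).
WORKTREE_ROOT = Path.home() / "worktrees"


def worktree_path(branch: str, root: Path = WORKTREE_ROOT) -> Path:
    """Map a branch such as 'fix/148-sqlite-context-managers' to a directory."""
    return root / branch.replace("/", "-")


def add_worktree(repo: Path, branch: str, root: Path = WORKTREE_ROOT) -> Path:
    """Create a new branch checked out in its own worktree under root."""
    target = worktree_path(branch, root)
    target.parent.mkdir(parents=True, exist_ok=True)
    # `git worktree add -b <branch> <path>` creates the branch and the worktree.
    subprocess.run(
        ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(target)],
        check=True,
    )
    return target
```

Keeping the path computation separate from the git call makes it easy to check where a worktree will land before anything is created.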

Kimi's timeout pattern:

  • Kimi spends significant time reading files and planning
  • On large tasks (16+ files), planning alone can exceed 300s
  • With the --print flag there is no progress output, so a stuck run is indistinguishable from a working one
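
One way to get some visibility is to wrap the call in a runner that enforces the deadline and records when the process last wrote output. This is a generic sketch: `run_with_deadline` is a hypothetical helper, `cmd` is whatever command line is used to invoke Kimi, and it only helps if the tool emits output incrementally, which I have not verified.

```python
import os
import subprocess
import sys
import tempfile
import time


def run_with_deadline(cmd, timeout_s, poll_s=0.1):
    """Run cmd, kill it after timeout_s, and report how long it had been silent."""
    with tempfile.TemporaryFile() as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        start = last_output = time.monotonic()
        seen_bytes = 0
        timed_out = False
        while proc.poll() is None:
            now = time.monotonic()
            # fstat reads the size without moving the shared file offset.
            size = os.fstat(log.fileno()).st_size
            if size > seen_bytes:
                seen_bytes, last_output = size, now
            if now - start > timeout_s:
                proc.kill()
                proc.wait()
                timed_out = True
                break
            time.sleep(poll_s)
        # Count output that arrived between the last poll and process exit.
        if os.fstat(log.fileno()).st_size > seen_bytes:
            last_output = time.monotonic()
        log.seek(0)
        output = log.read().decode(errors="replace")
    return {
        "timed_out": timed_out,
        "returncode": proc.returncode,
        "silent_for_s": time.monotonic() - last_output,
        "output": output,
    }


if __name__ == "__main__":
    result = run_with_deadline([sys.executable, "-c", "print('ok')"], timeout_s=10)
    print(result["timed_out"], result["output"].strip())
```

A run that times out with `silent_for_s` close to the full timeout produced nothing at all, while a small value means it was still writing when it was killed.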

Metrics

| Metric | Value |
| --- | --- |
| Tasks dispatched | 2 |
| Tasks completed (PR created) | 1 |
| Tasks deferred | 1 |
| First-pass acceptance rate | 50% (1/2) |
| Kimi invocations | 4 (1 for #151, 3 for #148) |
| Total Kimi time | ~12 min (2 + 5 + 5 + timeout) |
| Tests run | 3x (1426 pass each) |
| Lines changed (accepted) | +127 / -95 |

Lessons Learned

  1. Task size matters enormously. #151 (1 file, 109 lines) went through cleanly; #148 (16 files, 22 call sites) was too big. Both results are consistent with a sweet spot for Kimi of 1-3 files and under 150 lines of diff.

  2. Split big tasks. #148 should have been split into 4-5 sub-tasks of 3-4 files each. "Convert sqlite connections in timmy/" as one task, "dashboard/" as another, etc.

  3. Never use /tmp for worktrees. macOS cleans /tmp aggressively. Use ~/worktrees/ instead.

  4. Kimi doesn't commit reliably. Even when told to commit, it may not. Always verify with git log after Kimi runs. Commit on behalf of Kimi if the work is good.

  5. A 300s timeout is borderline. For tasks touching 3+ files, 300s may not be enough; consider 600s for medium tasks. The real fix, though, is smaller tasks, not longer timeouts.

  6. The two-attempt rule works. It prevented me from burning 3+ cycles on a task that needed restructuring, not retrying.
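
Lesson 2 can be made mechanical. The sketch below groups a list of affected files by top-level directory and cuts each group into chunks of at most four files. `split_by_directory` is a hypothetical helper, and the file names in the usage example are invented; only the timmy/ and dashboard/ directory names come from the notes above.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def split_by_directory(files, max_files=4):
    """Group paths by top-level directory, then chunk each group to <= max_files."""
    groups = defaultdict(list)
    for f in sorted(files):
        # A file at the repository root forms its own single-file group.
        groups[PurePosixPath(f).parts[0]].append(f)
    chunks = []
    for top in sorted(groups):
        paths = groups[top]
        for i in range(0, len(paths), max_files):
            chunks.append(paths[i:i + max_files])
    return chunks


if __name__ == "__main__":
    # Invented file names, for illustration only.
    files = [f"timmy/m{i}.py" for i in range(5)] + ["dashboard/a.py", "dashboard/b.py"]
    for chunk in split_by_directory(files):
        print(chunk)
```

Each resulting chunk is a candidate sub-task; a chunk never mixes directories, which keeps the task description short.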

Changes for Cycle 2

  • Split #148 into 4 sub-tasks by directory
  • Use ~/worktrees/ exclusively (never /tmp)
  • Set 180s timeout for small tasks, 300s for medium
  • Always verify Kimi committed before moving on
  • Commit Kimi's work immediately even if incomplete
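
The last two items could be automated along these lines. The helper names are hypothetical; the git commands themselves (`status --porcelain`, `add -A`, `commit -m`, `log -1 --format=%s`) are standard. This is a sketch of the check I intend to run after each Kimi invocation, not something that exists yet.

```python
import subprocess


def git(worktree, *args):
    """Run a git command inside worktree and return its stdout."""
    return subprocess.run(
        ["git", "-C", str(worktree), *args],
        check=True,
        capture_output=True,
        text=True,
    ).stdout


def has_uncommitted_changes(worktree):
    """True when the worktree has modified, staged, or untracked files."""
    return bool(git(worktree, "status", "--porcelain").strip())


def last_commit_subject(worktree):
    """Subject line of the most recent commit, as shown by git log."""
    return git(worktree, "log", "-1", "--format=%s").strip()


def commit_leftovers(worktree, message):
    """Commit whatever was left uncommitted; return True if a commit was made."""
    if not has_uncommitted_changes(worktree):
        return False
    git(worktree, "add", "-A")
    git(worktree, "commit", "-m", message)
    return True
```

Calling `commit_leftovers` right after each run means partial work survives even if the next attempt times out.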

Deliverables

  • PR #160: refactor: decompose _maybe_distill() into focused helpers (#151)
    • Branch: refactor/151-maybe-distill
    • Tests: 1426 pass, 76% coverage
    • Ready to merge