docs: inventory automation state and stale resurrection paths

2026-04-04 16:16:10 -04:00
parent 4489cee478
commit 6fa3f97284
3 changed files with 379 additions and 34 deletions
--- a/DEPRECATED.md
+++ b/DEPRECATED.md
@@ -1,23 +1,27 @@
-# DEPRECATED — Bash Loop Scripts Removed
+# DEPRECATED — policy, not proof of runtime absence

-**Date:** 2026-03-25
-**Reason:** Replaced by Hermes + timmy-config sidecar orchestration
+Original deprecation date: 2026-03-25

-## What was removed
-- claude-loop.sh, gemini-loop.sh, agent-loop.sh
-- timmy-orchestrator.sh, workforce-manager.py
-- nexus-merge-bot.sh, claudemax-watchdog.sh, timmy-loopstat.sh
+This file records the policy direction: long-running ad hoc bash loops were meant
+to be replaced by Hermes-side orchestration.

-## What replaces them
-**Harness:** Hermes
-**Overlay repo:** Timmy_Foundation/timmy-config
-**Entry points:** `orchestration.py`, `tasks.py`, `deploy.sh`
-**Features:** Huey + SQLite scheduling, local-model health checks, session export, DPO artifact staging
+But policy and world state diverged.
+Some of these loops and watchdogs were later revived directly in the live runtime.

-## Why
-The bash loops crash-looped, produced zero work after relaunch, had no crash
-recovery, no durable export path, and required too many ad hoc scripts. The
-Hermes sidecar keeps orchestration close to Timmy's actual config and training
-surfaces.
+Do NOT use this file as proof that something is gone.
+Use `docs/automation-inventory.md` as the current world-state document.

-Do NOT recreate bash loops. If orchestration is broken, fix the Hermes sidecar.
+## Deprecated by policy
+- old dashboard-era loop stacks
+- old tmux resurrection paths
+- old startup paths that recreate `timmy-loop`
+- stale repo-specific automation tied to `Timmy-time-dashboard` or `the-matrix`
+
+## Current rule
+If an automation question matters, audit:
+1. launchd loaded jobs
+2. live process table
+3. Hermes cron list
+4. the automation inventory doc
+
+Only then decide what is actually live.
--- a/README.md
+++ b/README.md
@@ -14,10 +14,9 @@ timmy-config/
 ├── DEPRECATED.md              ← What was removed and why
 ├── config.yaml                ← Hermes harness configuration
 ├── channel_directory.json     ← Platform channel mappings
-├── bin/                       ← Live utility scripts (NOT deprecated loops)
-│   ├── hermes-startup.sh      ← Hermes boot sequence
+├── bin/                       ← Sidecar-managed operational scripts
+│   ├── hermes-startup.sh      ← Dormant startup path (audit before enabling)
 │   ├── agent-dispatch.sh      ← Manual agent dispatch
-│   ├── deploy-allegro-house.sh← Bootstraps the remote Allegro wizard house
 │   ├── ops-panel.sh           ← Ops dashboard panel
 │   ├── ops-gitea.sh           ← Gitea ops helpers
 │   ├── pipeline-freshness.sh  ← Session/export drift check
@@ -26,7 +25,7 @@ timmy-config/
 ├── skins/                     ← UI skins (timmy skin)
 ├── playbooks/                 ← Agent playbooks (YAML)
 ├── cron/                      ← Cron job definitions
-├── wizards/                   ← Remote wizard-house templates + units
+├── docs/automation-inventory.md ← Live automation + stale-state inventory
 └── training/                  ← Transitional training recipes, not canonical lived data
 ```

@@ -42,9 +41,10 @@ If a file answers "who is Timmy?" or "how does Hermes host him?", it belongs
 here. If it answers "what has Timmy done or learned?" it belongs in
 `timmy-home`.

-The scripts in `bin/` are live operational helpers for the Hermes sidecar.
-What is dead are the old long-running bash worker loops, not every script in
-this repo.
+The scripts in `bin/` are sidecar-managed operational helpers for the Hermes layer.
+Do NOT assume older prose about removed loops is still true at runtime.
+Audit the live machine first, then read `docs/automation-inventory.md` for the
+current reality and stale-state risks.

 ## Orchestration: Huey

@@ -56,15 +56,6 @@ pip install huey
 huey_consumer.py tasks.huey -w 2 -k thread
 ```

-## Proof Standard
-
-This repo uses a hard proof rule for merges.
-
-- visual changes require screenshot proof
-- CLI/verifiable changes must cite logs, command output, or world-state proof
-- screenshots/media stay out of Gitea backup unless explicitly required
-- see `CONTRIBUTING.md` for the merge gate
-
 ## Deploy

 ```bash
--- a/docs/automation-inventory.md
+++ b/docs/automation-inventory.md
@@ -0,0 +1,350 @@
+# Automation Inventory
+
+Last audited: 2026-04-04 15:55 EDT
+Owner: Timmy sidecar / Timmy home split
+Purpose: document every known automation that can restart services, revive old worktrees, reuse stale session state, or re-enter old queue state.
+
+## Why this file exists
+
+The failure mode is not just "a process is running".
+The failure mode is:
+- launchd or a watchdog restarts something behind our backs
+- the restarted process reads old config, old labels, old worktrees, old session mappings, or old tmux assumptions
+- the machine appears haunted because old state comes back after we thought it was gone
+
+This file is the source of truth for what automations exist, what state they read, and how to stop or reset them safely.
+
+## Source-of-truth split
+
+Not all automations live in one repo.
+
+1. timmy-config
+   Path: ~/.timmy/timmy-config
+   Owns: sidecar deployment, ~/.hermes/config.yaml overlay, launch-facing helper scripts in timmy-config/bin/
+
+2. timmy-home
+   Path: ~/.timmy
+   Owns: Kimi heartbeat script at uniwizard/kimi-heartbeat.sh and other workspace-native automation
+
+3. live runtime
+   Path: ~/.hermes/bin
+   Reality: some scripts still exist only in the live ~/.hermes/bin and are NOT yet mirrored into timmy-config/bin/
+
+Rule:
+- Do not assume ~/.hermes/bin is canonical.
+- Do not assume timmy-config contains every currently running automation.
+- Audit runtime first, then reconcile to source control.
+
+## Current live automations
+
+### A. launchd-loaded automations
+
+These are loaded right now according to `launchctl list` after the 2026-04-04 phase-2 cleanup.
+The only Timmy-specific launchd jobs still loaded are the ones below.
+
+#### 1. ai.hermes.gateway
+- Plist: ~/Library/LaunchAgents/ai.hermes.gateway.plist
+- Command: `python -m hermes_cli.main gateway run --replace`
+- HERMES_HOME: `~/.hermes`
+- Logs:
+  - `~/.hermes/logs/gateway.log`
+  - `~/.hermes/logs/gateway.error.log`
+- KeepAlive: yes
+- RunAtLoad: yes
+- State it reuses:
+  - `~/.hermes/config.yaml`
+  - `~/.hermes/channel_directory.json`
+  - `~/.hermes/sessions/sessions.json`
+  - `~/.hermes/state.db`
+- Old-state risk:
+  - if config drifted, this gateway will faithfully revive the drift
+  - if Telegram/session mappings are stale, it will continue stale conversations
+
+Stop:
+```bash
+launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/ai.hermes.gateway.plist
+```
+Start:
+```bash
+launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/ai.hermes.gateway.plist
+```
+
+#### 2. ai.hermes.gateway-fenrir
+- Plist: ~/Library/LaunchAgents/ai.hermes.gateway-fenrir.plist
+- Command: same gateway binary
+- HERMES_HOME: `~/.hermes/profiles/fenrir`
+- Logs:
+  - `~/.hermes/profiles/fenrir/logs/gateway.log`
+  - `~/.hermes/profiles/fenrir/logs/gateway.error.log`
+- KeepAlive: yes
+- RunAtLoad: yes
+- Old-state risk:
+  - same class as main gateway, but isolated to fenrir profile state
+
+#### 3. ai.openclaw.gateway
+- Plist: ~/Library/LaunchAgents/ai.openclaw.gateway.plist
+- Command: `node .../openclaw/dist/index.js gateway --port 18789`
+- Logs:
+  - `~/.openclaw/logs/gateway.log`
+  - `~/.openclaw/logs/gateway.err.log`
+- KeepAlive: yes
+- RunAtLoad: yes
+- Old-state risk:
+  - long-lived gateway outlives the toolchain assumptions it was started under and keeps accepting work even if upstream routing has changed
+
+#### 4. ai.timmy.kimi-heartbeat
+- Plist: ~/Library/LaunchAgents/ai.timmy.kimi-heartbeat.plist
+- Command: `/bin/bash ~/.timmy/uniwizard/kimi-heartbeat.sh`
+- Interval: every 300s
+- Logs:
+  - `/tmp/kimi-heartbeat-launchd.log`
+  - `/tmp/kimi-heartbeat-launchd.err`
+  - script log: `/tmp/kimi-heartbeat.log`
+- State it reuses:
+  - `/tmp/kimi-heartbeat.lock`
+  - Gitea labels: `assigned-kimi`, `kimi-in-progress`, `kimi-done`
+  - repo issue bodies/comments as task memory
+- Current behavior as of this audit:
+  - stale `kimi-in-progress` tasks are now reclaimed after 1 hour of silence
+- Old-state risk:
+  - labels ARE the queue state; before the 1-hour reclaim above, a stale `kimi-in-progress` label could starve the heartbeat forever
+  - the heartbeat is source-controlled in timmy-home, not timmy-config
+
+Stop:
+```bash
+launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/ai.timmy.kimi-heartbeat.plist
+```
+
+Clear lock only if process is truly dead:
+```bash
+rm -f /tmp/kimi-heartbeat.lock
+```
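+
+One way to make "truly dead" checkable is to gate the removal on the process table. This is only a sketch: the `clear_lock_if_dead` helper is hypothetical (it is not part of the heartbeat script), and it assumes the heartbeat can be recognised by matching `kimi-heartbeat.sh` against full command lines with `pgrep -f`.
+
+```bash
+# Hypothetical helper, not part of kimi-heartbeat.sh.
+# Removes LOCK only when no process command line matches PATTERN.
+clear_lock_if_dead() {
+  local pattern=$1 lock=$2
+  # pgrep -f matches against the full command line; drop this shell's own
+  # PID so the check cannot match itself.
+  if pgrep -f "$pattern" | grep -qvx "$$"; then
+    echo "still running, keeping $lock" >&2
+    return 1
+  fi
+  rm -f "$lock"
+}
+
+# Assumed usage for the heartbeat lock:
+# clear_lock_if_dead 'kimi-heartbeat[.]sh' /tmp/kimi-heartbeat.lock
+```
+
+The function returns non-zero when it keeps the lock, so a caller can stop a reset script early instead of deleting a lock that a live heartbeat still owns.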
+
+#### 5. ai.timmy.claudemax-watchdog
+- Plist: ~/Library/LaunchAgents/ai.timmy.claudemax-watchdog.plist
+- Command: `/bin/bash ~/.hermes/bin/claudemax-watchdog.sh`
+- Interval: every 300s
+- Logs:
+  - `~/.hermes/logs/claudemax-watchdog.log`
+  - launchd wrapper: `~/.hermes/logs/claudemax-launchd.log`
+- State it reuses:
+  - live process table via `pgrep`
+  - recent Claude logs `~/.hermes/logs/claude-*.log`
+  - backlog count from Gitea
+- Current behavior as of this audit:
+  - will NOT restart claude-loop if recent Claude logs say `You've hit your limit`
+  - will log-and-skip missing helper scripts instead of failing loudly
+- Old-state risk:
+  - any watchdog can resurrect a loop you meant to leave dead
+  - this is the first place to check when a loop "comes back"
+
+### B. quarantined legacy launch agents
+
+These were moved out of `~/Library/LaunchAgents` on 2026-04-04 to:
+`~/Library/LaunchAgents.quarantine/timmy-legacy-20260404/`
+
+#### 6. com.timmy.dashboard-backend
+- Former plist: `com.timmy.dashboard-backend.plist`
+- Former command: uvicorn `dashboard.app:app`
+- Former working directory: `~/worktrees/kimi-repo`
+- Quarantine reason:
+  - served code from a specific stale worktree
+  - could revive old backend state by launchd KeepAlive alone
+
+#### 7. com.timmy.matrix-frontend
+- Former plist: `com.timmy.matrix-frontend.plist`
+- Former command: `npx vite --host`
+- Former working directory: `~/worktrees/the-matrix`
+- Quarantine reason:
+  - pointed at the old `the-matrix` lineage instead of current nexus truth
+  - could revive a stale frontend every login
+
+#### 8. ai.hermes.startup
+- Former plist: `ai.hermes.startup.plist`
+- Former command: `~/.hermes/bin/hermes-startup.sh`
+- Quarantine reason:
+  - startup path still expected missing `timmy-tmux.sh`
+  - could recreate old webhook/tmux assumptions at login
+
+#### 9. com.timmy.tick
+- Former plist: `com.timmy.tick.plist`
+- Former command: `/Users/apayne/Timmy-time-dashboard/deploy/timmy-tick-mac.sh`
+- Quarantine reason:
+  - pure dashboard-era legacy path
+
+### C. running now but NOT launchd-managed
+
+These are live processes, but not currently represented by a loaded launchd plist.
+They can still persist because they were started with `nohup` or by other parent scripts.
+
+#### 10. gemini-loop.sh
+- Live process: `~/.hermes/bin/gemini-loop.sh`
+- State files:
+  - `~/.hermes/logs/gemini-loop.log`
+  - `~/.hermes/logs/gemini-skip-list.json`
+  - `~/.hermes/logs/gemini-active.json`
+  - `~/.hermes/logs/gemini-locks/`
+  - `~/.hermes/logs/gemini-pids/`
+  - worktrees under `~/worktrees/gemini-w*`
+  - per-issue logs `~/.hermes/logs/gemini-*.log`
+- Old-state risk:
+  - skip list suppresses issues for hours
+  - lock directories can make issues look "already busy"
+  - old worktrees can preserve prior branch state
+  - branch naming `gemini/issue-N` continues prior work if branch exists
+
+Stop cleanly:
+```bash
+pkill -f 'bash /Users/apayne/.hermes/bin/gemini-loop.sh'
+pkill -f 'gemini .*--yolo'
+rm -rf ~/.hermes/logs/gemini-locks/*.lock ~/.hermes/logs/gemini-pids/*.pid
+printf '{}\n' > ~/.hermes/logs/gemini-active.json
+```
+
+#### 11. timmy-orchestrator.sh
+- Live process: `~/.hermes/bin/timmy-orchestrator.sh`
+- State files:
+  - `~/.hermes/logs/timmy-orchestrator.log`
+  - `~/.hermes/logs/timmy-orchestrator.pid`
+  - `~/.hermes/logs/timmy-reviews.log`
+  - `~/.hermes/logs/workforce-manager.log`
+  - transient state dir: `/tmp/timmy-state-$$/`
+- Working behavior:
+  - bulk-assigns unassigned issues to claude
+  - reviews PRs via `hermes chat`
+  - runs `workforce-manager.py`
+- Old-state risk:
+  - writes agent assignments back into Gitea
+  - can repopulate agent queues even after you thought they were cleared
+  - not represented in timmy-config/bin yet as of this audit
+
+### D. Hermes cron automations
+
+Current cron inventory from `cronjob(list, include_disabled=true)`:
+
+Enabled:
+- `a77a87392582` — Health Monitor — every 5m
+
+Paused:
+- `9e0624269ba7` — Triage Heartbeat
+- `e29eda4a8548` — PR Review Sweep
+- `5e9d952871bc` — Agent Status Check
+- `36fb2f630a17` — Hermes Philosophy Loop
+
+Old-state risk:
+- paused crons are not dead forever; they are resumable state
+- LLM-wrapped crons can revive old routing/model assumptions if resumed blindly
+
+### E. file exists but NOT currently loaded
+
+These are the ones most likely to surprise us later because they still exist and point at old realities.
+
+#### 12. com.tower.pr-automerge
+- Plist: `~/Library/LaunchAgents/com.tower.pr-automerge.plist`
+- Points to: `/Users/apayne/hermes-config/bin/pr-automerge.sh`
+- Not loaded at audit time
+- Separate Tower-era automation path; not part of current Timmy sidecar truth
+
+## State carriers that make the machine feel haunted
+
+These are the files and external states that most often "bring back old state":
+
+### Hermes runtime state
+- `~/.hermes/config.yaml`
+- `~/.hermes/channel_directory.json`
+- `~/.hermes/sessions/sessions.json`
+- `~/.hermes/state.db`
+
+### Loop state
+- `~/.hermes/logs/claude-skip-list.json`
+- `~/.hermes/logs/claude-active.json`
+- `~/.hermes/logs/claude-locks/`
+- `~/.hermes/logs/claude-pids/`
+- `~/.hermes/logs/gemini-skip-list.json`
+- `~/.hermes/logs/gemini-active.json`
+- `~/.hermes/logs/gemini-locks/`
+- `~/.hermes/logs/gemini-pids/`
+
+### Kimi queue state
+- Gitea labels, not local files, are the queue truth
+- `assigned-kimi`
+- `kimi-in-progress`
+- `kimi-done`
+
+### Worktree state
+- `~/worktrees/*`
+- especially old frontend/backend worktrees like:
+  - `~/worktrees/the-matrix`
+  - `~/worktrees/kimi-repo`
+
+### Launchd state
+- plist files in `~/Library/LaunchAgents`
+- anything with `RunAtLoad` or `KeepAlive` can resurrect automatically (at login or after exit, respectively)
+
+## Audit commands
+
+List loaded Timmy/Hermes automations:
+```bash
+launchctl list | egrep 'timmy|kimi|claude|max|dashboard|matrix|gateway|huey'
+```
+
+List Timmy/Hermes launch agent files:
+```bash
+find ~/Library/LaunchAgents -maxdepth 1 -name '*.plist' | egrep 'timmy|hermes|openclaw|tower'
+```
+
+List running loop scripts:
+```bash
+# [.] matches a literal dot and keeps egrep from matching its own command line
+ps -Ao pid,ppid,etime,command | egrep '/Users/apayne/[.]hermes/bin/|/Users/apayne/[.]timmy/uniwizard/'
+```
+
+List cron jobs:
+```bash
+hermes cron list --include-disabled
+```
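+
+The "file exists but NOT currently loaded" category can be found mechanically by diffing the plist files on disk against the loaded labels. The sketch below is illustrative only: `unloaded_agents` and the `/tmp/loaded-labels.txt` path are hypothetical, and it assumes each plist is named `<label>.plist`, which holds for every agent listed in this inventory but is a convention rather than a guarantee.
+
+```bash
+# Hypothetical helper: print labels that have a plist in DIR but do not
+# appear in LOADED_FILE (one loaded label per line).
+unloaded_agents() {
+  local dir=$1 loaded_file=$2 plist label
+  for plist in "$dir"/*.plist; do
+    # An unmatched glob stays literal; skip it.
+    [ -e "$plist" ] || continue
+    label=$(basename "$plist" .plist)
+    # -x: whole-line match, -F: treat the label as a literal string
+    grep -qxF -- "$label" "$loaded_file" || echo "$label"
+  done
+}
+
+# Assumed usage: the label is the third column of `launchctl list`.
+# launchctl list | awk 'NR > 1 { print $3 }' > /tmp/loaded-labels.txt
+# unloaded_agents ~/Library/LaunchAgents /tmp/loaded-labels.txt
+```
+
+Anything this prints is a candidate for section E or for quarantine; it still needs a human decision, since an unloaded plist is not necessarily stale.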
+
+## Safe reset order when old state keeps coming back
+
+1. Stop launchd jobs first
+```bash
+launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/ai.timmy.kimi-heartbeat.plist || true
+launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/ai.timmy.claudemax-watchdog.plist || true
+launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/ai.hermes.gateway.plist || true
+launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/ai.hermes.gateway-fenrir.plist || true
+launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/ai.openclaw.gateway.plist || true
+```
+
+2. Kill manual loops
+```bash
+pkill -f 'gemini-loop.sh' || true
+pkill -f 'timmy-orchestrator.sh' || true
+pkill -f 'claude-loop.sh' || true
+pkill -f 'claude .*--print' || true
+pkill -f 'gemini .*--yolo' || true
+```
+
+3. Clear local loop state
+```bash
+rm -rf ~/.hermes/logs/claude-locks/*.lock ~/.hermes/logs/claude-pids/*.pid
+rm -rf ~/.hermes/logs/gemini-locks/*.lock ~/.hermes/logs/gemini-pids/*.pid
+printf '{}\n' > ~/.hermes/logs/claude-active.json
+printf '{}\n' > ~/.hermes/logs/gemini-active.json
+rm -f /tmp/kimi-heartbeat.lock
+```
+
+4. If gateway/session drift is the problem, back up before clearing
+```bash
+cp ~/.hermes/config.yaml ~/.hermes/config.yaml.bak.$(date +%Y%m%d-%H%M%S)
+cp ~/.hermes/sessions/sessions.json ~/.hermes/sessions/sessions.json.bak.$(date +%Y%m%d-%H%M%S)
+```
+
+5. Relaunch only what you explicitly want
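+
+Step 4 backs up two of the four Hermes runtime state carriers listed earlier. A sketch that covers all four with one shared timestamp might look like the following; `backup_hermes_state` is a hypothetical helper, and it assumes the gateways were already stopped in step 1 so `state.db` is not being written while it is copied.
+
+```bash
+# Hypothetical helper: copy each Hermes state carrier to a timestamped .bak
+# file, using one shared timestamp so the backups can be matched up later.
+backup_hermes_state() {
+  local home=${1:-$HOME/.hermes} stamp f
+  stamp=$(date +%Y%m%d-%H%M%S)
+  for f in config.yaml channel_directory.json sessions/sessions.json state.db; do
+    # Skip carriers that do not exist on this machine instead of failing.
+    [ -f "$home/$f" ] || continue
+    cp -p "$home/$f" "$home/$f.bak.$stamp" || return 1
+  done
+}
+
+# Assumed usage: defaults to ~/.hermes; pass a profile dir for fenrir.
+# backup_hermes_state
+# backup_hermes_state ~/.hermes/profiles/fenrir
+```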
+
+## Current contradictions to fix later
+
+1. README and DEPRECATED were corrected on 2026-04-04, but older local clones may still have stale prose.
+2. The quarantined launch agents now live under `~/Library/LaunchAgents.quarantine/timmy-legacy-20260404/`; if someone moves them back, the old state can return.
+3. `gemini-loop.sh` and `timmy-orchestrator.sh` are live but not yet mirrored into timmy-config/bin/.
+4. The open docs PR must be kept clean: do not mix operational script recovery and documentation history on the same branch.
+
+Until those are reconciled, trust this inventory over older prose.