From ffea2964c4a90db6ac574872178e7cbff52cb563 Mon Sep 17 00:00:00 2001
From: Alexander Whitestone
Date: Sat, 4 Apr 2026 15:57:54 -0400
Subject: [PATCH] docs: inventory live automations and stale-state risks

---
 DEPRECATED.md                |  40 ++--
 README.md                    |  12 +-
 docs/automation-inventory.md | 358 +++++++++++++++++++++++++++++++++++
 3 files changed, 387 insertions(+), 23 deletions(-)
 create mode 100644 docs/automation-inventory.md

diff --git a/DEPRECATED.md b/DEPRECATED.md
index 63f31903..d03a5d3b 100644
--- a/DEPRECATED.md
+++ b/DEPRECATED.md
@@ -1,23 +1,27 @@
-# DEPRECATED — Bash Loop Scripts Removed
+# DEPRECATED — policy, not proof of runtime absence

-**Date:** 2026-03-25
-**Reason:** Replaced by Hermes + timmy-config sidecar orchestration
+Original deprecation date: 2026-03-25

-## What was removed
-- claude-loop.sh, gemini-loop.sh, agent-loop.sh
-- timmy-orchestrator.sh, workforce-manager.py
-- nexus-merge-bot.sh, claudemax-watchdog.sh, timmy-loopstat.sh
+This file records the policy direction: long-running ad hoc bash loops were meant
+to be replaced by Hermes-side orchestration.

-## What replaces them
-**Harness:** Hermes
-**Overlay repo:** Timmy_Foundation/timmy-config
-**Entry points:** `orchestration.py`, `tasks.py`, `deploy.sh`
-**Features:** Huey + SQLite scheduling, local-model health checks, session export, DPO artifact staging
+But policy and world state diverged.
+Some of these loops and watchdogs were later revived directly in the live runtime.

-## Why
-The bash loops crash-looped, produced zero work after relaunch, had no crash
-recovery, no durable export path, and required too many ad hoc scripts. The
-Hermes sidecar keeps orchestration close to Timmy's actual config and training
-surfaces.
+Do NOT use this file as proof that something is gone.
+Use `docs/automation-inventory.md` as the current world-state document.

-Do NOT recreate bash loops. If orchestration is broken, fix the Hermes sidecar.
+## Deprecated by policy
+- old dashboard-era loop stacks
+- old tmux resurrection paths
+- old startup paths that recreate `timmy-loop`
+- stale repo-specific automation tied to `Timmy-time-dashboard` or `the-matrix`
+
+## Current rule
+If an automation question matters, audit:
+1. launchd loaded jobs
+2. live process table
+3. Hermes cron list
+4. the automation inventory doc
+
+Only then decide what is actually live.
diff --git a/README.md b/README.md
index 4e3205a9..a82de6f7 100644
--- a/README.md
+++ b/README.md
@@ -14,8 +14,8 @@ timmy-config/
 ├── DEPRECATED.md ← What was removed and why
 ├── config.yaml ← Hermes harness configuration
 ├── channel_directory.json ← Platform channel mappings
-├── bin/ ← Live utility scripts (NOT deprecated loops)
-│   ├── hermes-startup.sh ← Hermes boot sequence
+├── bin/ ← Sidecar-managed operational scripts
+│   ├── hermes-startup.sh ← Dormant startup path (audit before enabling)
 │   ├── agent-dispatch.sh ← Manual agent dispatch
 │   ├── ops-panel.sh ← Ops dashboard panel
 │   ├── ops-gitea.sh ← Gitea ops helpers
@@ -25,6 +25,7 @@ timmy-config/
 ├── skins/ ← UI skins (timmy skin)
 ├── playbooks/ ← Agent playbooks (YAML)
 ├── cron/ ← Cron job definitions
+├── docs/automation-inventory.md ← Live automation + stale-state inventory
 └── training/ ← Transitional training recipes, not canonical lived data
 ```

@@ -40,9 +41,10 @@ If a file answers "who is Timmy?" or "how does Hermes host him?", it belongs
 here. If it answers "what has Timmy done or learned?" it belongs in
 `timmy-home`.

-The scripts in `bin/` are live operational helpers for the Hermes sidecar.
-What is dead are the old long-running bash worker loops, not every script in
-this repo.
+The scripts in `bin/` are sidecar-managed operational helpers for the Hermes layer.
+Do NOT assume older prose about removed loops is still true at runtime.
+Audit the live machine first, then read `docs/automation-inventory.md` for the
+current reality and stale-state risks.

 ## Orchestration: Huey

diff --git a/docs/automation-inventory.md b/docs/automation-inventory.md
new file mode 100644
index 00000000..d7c76763
--- /dev/null
+++ b/docs/automation-inventory.md
@@ -0,0 +1,358 @@
+# Automation Inventory
+
+Last audited: 2026-04-04 15:55 EDT
+Owner: Timmy sidecar / Timmy home split
+Purpose: document every known automation that can restart services, revive old worktrees, reuse stale session state, or re-enter old queue state.
+
+## Why this file exists
+
+The failure mode is not just "a process is running".
+The failure mode is:
+- launchd or a watchdog restarts something behind our backs
+- the restarted process reads old config, old labels, old worktrees, old session mappings, or old tmux assumptions
+- the machine appears haunted because old state comes back after we thought it was gone
+
+This file is the source of truth for what automations exist, what state they read, and how to stop or reset them safely.
+
+## Source-of-truth split
+
+Not all automations live in one repo.
+
+1. timmy-config
+   Path: ~/.timmy/timmy-config
+   Owns: sidecar deployment, ~/.hermes/config.yaml overlay, launch-facing helper scripts in timmy-config/bin/
+
+2. timmy-home
+   Path: ~/.timmy
+   Owns: Kimi heartbeat script at uniwizard/kimi-heartbeat.sh and other workspace-native automation
+
+3. live runtime
+   Path: ~/.hermes/bin
+   Reality: some scripts are still only present live in ~/.hermes/bin and are NOT yet mirrored into timmy-config/bin/
+
+Rule:
+- Do not assume ~/.hermes/bin is canonical.
+- Do not assume timmy-config contains every currently running automation.
+- Audit runtime first, then reconcile to source control.
+
+## Current live automations
+
+### A. launchd-loaded automations
+
+These are loaded right now according to `launchctl list`.
+
+#### 1. ai.hermes.gateway
+- Plist: ~/Library/LaunchAgents/ai.hermes.gateway.plist
+- Command: `python -m hermes_cli.main gateway run --replace`
+- HERMES_HOME: `~/.hermes`
+- Logs:
+  - `~/.hermes/logs/gateway.log`
+  - `~/.hermes/logs/gateway.error.log`
+- KeepAlive: yes
+- RunAtLoad: yes
+- State it reuses:
+  - `~/.hermes/config.yaml`
+  - `~/.hermes/channel_directory.json`
+  - `~/.hermes/sessions/sessions.json`
+  - `~/.hermes/state.db`
+- Old-state risk:
+  - if config drifted, this gateway will faithfully revive the drift
+  - if Telegram/session mappings are stale, it will continue stale conversations
+
+Stop:
+```bash
+launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/ai.hermes.gateway.plist
+```
+Start:
+```bash
+launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/ai.hermes.gateway.plist
+```
+
+#### 2. ai.hermes.gateway-fenrir
+- Plist: ~/Library/LaunchAgents/ai.hermes.gateway-fenrir.plist
+- Command: same gateway binary
+- HERMES_HOME: `~/.hermes/profiles/fenrir`
+- Logs:
+  - `~/.hermes/profiles/fenrir/logs/gateway.log`
+  - `~/.hermes/profiles/fenrir/logs/gateway.error.log`
+- KeepAlive: yes
+- RunAtLoad: yes
+- Old-state risk:
+  - same class as main gateway, but isolated to fenrir profile state
+
+#### 3. ai.openclaw.gateway
+- Plist: ~/Library/LaunchAgents/ai.openclaw.gateway.plist
+- Command: `node .../openclaw/dist/index.js gateway --port 18789`
+- Logs:
+  - `~/.openclaw/logs/gateway.log`
+  - `~/.openclaw/logs/gateway.err.log`
+- KeepAlive: yes
+- RunAtLoad: yes
+- Old-state risk:
+  - long-lived gateway survives toolchain assumptions and keeps accepting work even if upstream routing changed
+
+#### 4. ai.timmy.kimi-heartbeat
+- Plist: ~/Library/LaunchAgents/ai.timmy.kimi-heartbeat.plist
+- Command: `/bin/bash ~/.timmy/uniwizard/kimi-heartbeat.sh`
+- Interval: every 300s
+- Logs:
+  - `/tmp/kimi-heartbeat-launchd.log`
+  - `/tmp/kimi-heartbeat-launchd.err`
+  - script log: `/tmp/kimi-heartbeat.log`
+- State it reuses:
+  - `/tmp/kimi-heartbeat.lock`
+  - Gitea labels: `assigned-kimi`, `kimi-in-progress`, `kimi-done`
+  - repo issue bodies/comments as task memory
+- Current behavior as of this audit:
+  - stale `kimi-in-progress` tasks are now reclaimed after 1 hour of silence
+- Old-state risk:
+  - labels ARE the queue state; if labels are stale, the heartbeat used to starve forever
+  - the heartbeat is source-controlled in timmy-home, not timmy-config
+
+Stop:
+```bash
+launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/ai.timmy.kimi-heartbeat.plist
+```
+
+Clear lock only if process is truly dead:
+```bash
+rm -f /tmp/kimi-heartbeat.lock
+```
+
+#### 5. ai.timmy.claudemax-watchdog
+- Plist: ~/Library/LaunchAgents/ai.timmy.claudemax-watchdog.plist
+- Command: `/bin/bash ~/.hermes/bin/claudemax-watchdog.sh`
+- Interval: every 300s
+- Logs:
+  - `~/.hermes/logs/claudemax-watchdog.log`
+  - launchd wrapper: `~/.hermes/logs/claudemax-launchd.log`
+- State it reuses:
+  - live process table via `pgrep`
+  - recent Claude logs `~/.hermes/logs/claude-*.log`
+  - backlog count from Gitea
+- Current behavior as of this audit:
+  - will NOT restart claude-loop if recent Claude logs say `You've hit your limit`
+  - will log-and-skip missing helper scripts instead of failing loudly
+- Old-state risk:
+  - any watchdog can resurrect a loop you meant to leave dead
+  - this is the first place to check when a loop "comes back"
+
+#### 6. com.timmy.dashboard-backend
+- Plist: ~/Library/LaunchAgents/com.timmy.dashboard-backend.plist
+- Command: uvicorn `dashboard.app:app`
+- Working directory: `~/worktrees/kimi-repo`
+- Port: 8100
+- Logs: `~/.hermes/logs/dashboard-backend.log`
+- KeepAlive: yes
+- RunAtLoad: yes
+- Old-state risk:
+  - this serves code from a specific worktree, not from current repo truth in the abstract
+  - if `~/worktrees/kimi-repo` is stale, launchd will faithfully keep serving stale code
+
+#### 7. com.timmy.matrix-frontend
+- Plist: ~/Library/LaunchAgents/com.timmy.matrix-frontend.plist
+- Command: `npx vite --host`
+- Working directory: `~/worktrees/the-matrix`
+- Logs: `~/.hermes/logs/matrix-frontend.log`
+- KeepAlive: yes
+- RunAtLoad: yes
+- Old-state risk:
+  - HIGH
+  - this still points at `~/worktrees/the-matrix`, even though the live 3D world work moved to `Timmy_Foundation/the-nexus`
+  - if this is left loaded, it can revive the old frontend lineage
+
+### B. running now but NOT launchd-managed
+
+These are live processes, but not currently represented by a loaded launchd plist.
+They can still persist because they were started with `nohup` or by other parent scripts.
+
+#### 8. gemini-loop.sh
+- Live process: `~/.hermes/bin/gemini-loop.sh`
+- State files:
+  - `~/.hermes/logs/gemini-loop.log`
+  - `~/.hermes/logs/gemini-skip-list.json`
+  - `~/.hermes/logs/gemini-active.json`
+  - `~/.hermes/logs/gemini-locks/`
+  - `~/.hermes/logs/gemini-pids/`
+  - worktrees under `~/worktrees/gemini-w*`
+  - per-issue logs `~/.hermes/logs/gemini-*.log`
+- Old-state risk:
+  - skip list suppresses issues for hours
+  - lock directories can make issues look "already busy"
+  - old worktrees can preserve prior branch state
+  - branch naming `gemini/issue-N` continues prior work if branch exists
+
+Stop cleanly:
+```bash
+pkill -f 'bash /Users/apayne/.hermes/bin/gemini-loop.sh'
+pkill -f 'gemini .*--yolo'
+rm -rf ~/.hermes/logs/gemini-locks/*.lock ~/.hermes/logs/gemini-pids/*.pid
+printf '{}\n' > ~/.hermes/logs/gemini-active.json
+```
+
+#### 9. timmy-orchestrator.sh
+- Live process: `~/.hermes/bin/timmy-orchestrator.sh`
+- State files:
+  - `~/.hermes/logs/timmy-orchestrator.log`
+  - `~/.hermes/logs/timmy-orchestrator.pid`
+  - `~/.hermes/logs/timmy-reviews.log`
+  - `~/.hermes/logs/workforce-manager.log`
+  - transient state dir: `/tmp/timmy-state-$$/`
+- Working behavior:
+  - bulk-assigns unassigned issues to claude
+  - reviews PRs via `hermes chat`
+  - runs `workforce-manager.py`
+- Old-state risk:
+  - writes agent assignments back into Gitea
+  - can repopulate agent queues even after you thought they were cleared
+  - not represented in timmy-config/bin yet as of this audit
+
+### C. Hermes cron automations
+
+Current cron inventory from `cronjob(list, include_disabled=true)`:
+
+Enabled:
+- `a77a87392582` — Health Monitor — every 5m
+
+Paused:
+- `9e0624269ba7` — Triage Heartbeat
+- `e29eda4a8548` — PR Review Sweep
+- `5e9d952871bc` — Agent Status Check
+- `36fb2f630a17` — Hermes Philosophy Loop
+
+Old-state risk:
+- paused crons are not dead forever; they are resumable state
+- LLM-wrapped crons can revive old routing/model assumptions if resumed blindly
+
+### D. file exists but NOT currently loaded
+
+These are the ones most likely to surprise us later because they still exist and point at old realities.
+
+#### 10. ai.hermes.startup
+- Plist: `~/Library/LaunchAgents/ai.hermes.startup.plist`
+- Points to: `~/.hermes/bin/hermes-startup.sh`
+- Not loaded in launchctl at audit time
+- High-risk notes:
+  - startup script still expects `~/.hermes/bin/timmy-tmux.sh`
+  - that file is MISSING at audit time
+  - script also tries to start webhook listener and the old `timmy-loop` tmux world
+- This is a dormant old-state resurrection path
+
+#### 11. com.timmy.tick
+- Plist: `~/Library/LaunchAgents/com.timmy.tick.plist`
+- Points to: `/Users/apayne/Timmy-time-dashboard/deploy/timmy-tick-mac.sh`
+- Not loaded at audit time
+- Definitely legacy dashboard-era automation
+
+#### 12. com.tower.pr-automerge
+- Plist: `~/Library/LaunchAgents/com.tower.pr-automerge.plist`
+- Points to: `/Users/apayne/hermes-config/bin/pr-automerge.sh`
+- Not loaded at audit time
+- Separate Tower-era automation path; not part of current Timmy sidecar truth
+
+## State carriers that make the machine feel haunted
+
+These are the files and external states that most often "bring back old state":
+
+### Hermes runtime state
+- `~/.hermes/config.yaml`
+- `~/.hermes/channel_directory.json`
+- `~/.hermes/sessions/sessions.json`
+- `~/.hermes/state.db`
+
+### Loop state
+- `~/.hermes/logs/claude-skip-list.json`
+- `~/.hermes/logs/claude-active.json`
+- `~/.hermes/logs/claude-locks/`
+- `~/.hermes/logs/claude-pids/`
+- `~/.hermes/logs/gemini-skip-list.json`
+- `~/.hermes/logs/gemini-active.json`
+- `~/.hermes/logs/gemini-locks/`
+- `~/.hermes/logs/gemini-pids/`
+
+### Kimi queue state
+- Gitea labels, not local files, are the queue truth
+- `assigned-kimi`
+- `kimi-in-progress`
+- `kimi-done`
+
+### Worktree state
+- `~/worktrees/*`
+- especially old frontend/backend worktrees like:
+  - `~/worktrees/the-matrix`
+  - `~/worktrees/kimi-repo`
+
+### Launchd state
+- plist files in `~/Library/LaunchAgents`
+- anything with `RunAtLoad` and `KeepAlive` can resurrect automatically
+
+## Audit commands
+
+List loaded Timmy/Hermes automations:
+```bash
+launchctl list | egrep 'timmy|kimi|claude|max|dashboard|matrix|gateway|huey'
+```
+
+List Timmy/Hermes launch agent files:
+```bash
+find ~/Library/LaunchAgents -maxdepth 1 -name '*.plist' | egrep 'timmy|hermes|openclaw|tower'
+```
+
+List running loop scripts:
+```bash
+ps -Ao pid,ppid,etime,command | egrep '/Users/apayne/.hermes/bin/|/Users/apayne/.timmy/uniwizard/'
+```
+
+List cron jobs:
+```bash
+hermes cron list --include-disabled
+```
+
+## Safe reset order when old state keeps coming back
+
+1. Stop launchd jobs first
+```bash
+launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/ai.timmy.kimi-heartbeat.plist || true
+launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/ai.timmy.claudemax-watchdog.plist || true
+launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/ai.hermes.gateway.plist || true
+launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/ai.hermes.gateway-fenrir.plist || true
+launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/ai.openclaw.gateway.plist || true
+launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/com.timmy.dashboard-backend.plist || true
+launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/com.timmy.matrix-frontend.plist || true
+```
+
+2. Kill manual loops
+```bash
+pkill -f 'gemini-loop.sh' || true
+pkill -f 'timmy-orchestrator.sh' || true
+pkill -f 'claude-loop.sh' || true
+pkill -f 'claude .*--print' || true
+pkill -f 'gemini .*--yolo' || true
+```
+
+3. Clear local loop state
+```bash
+rm -rf ~/.hermes/logs/claude-locks/*.lock ~/.hermes/logs/claude-pids/*.pid
+rm -rf ~/.hermes/logs/gemini-locks/*.lock ~/.hermes/logs/gemini-pids/*.pid
+printf '{}\n' > ~/.hermes/logs/claude-active.json
+printf '{}\n' > ~/.hermes/logs/gemini-active.json
+rm -f /tmp/kimi-heartbeat.lock
+```
+
+4. If gateway/session drift is the problem, back up before clearing
+```bash
+cp ~/.hermes/config.yaml ~/.hermes/config.yaml.bak.$(date +%Y%m%d-%H%M%S)
+cp ~/.hermes/sessions/sessions.json ~/.hermes/sessions/sessions.json.bak.$(date +%Y%m%d-%H%M%S)
+```
+
+5. Relaunch only what you explicitly want
+
+## Current contradictions to fix later
+
+1. README still describes `bin/` as "NOT deprecated loops" but live runtime still contains revived loop scripts.
+2. `DEPRECATED.md` says claude-loop/gemini-loop/timmy-orchestrator/claudemax-watchdog were removed, but reality disagrees.
+3. `com.timmy.matrix-frontend` still points at `~/worktrees/the-matrix` rather than the nexus lineage.
+4. `ai.hermes.startup` still points at a startup path that expects missing `timmy-tmux.sh`.
+5. `gemini-loop.sh` and `timmy-orchestrator.sh` are live but not yet mirrored into timmy-config/bin/.
+
+Until those are reconciled, trust this inventory over older prose.
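
Review note, not part of the patch: section D of the inventory depends on comparing plist files on disk with what `launchctl list` reports. A minimal sketch of that comparison follows. It assumes each plist is named `<label>.plist`, which holds for every entry listed in the inventory but is not enforced by launchd. The function name `unloaded_plists` and the saved-listing file are hypothetical and not part of Hermes or timmy-config.

```shell
#!/usr/bin/env bash
# Sketch: print launch agent labels whose plist exists on disk but which do
# not appear in a saved `launchctl list` listing.
# Assumption: each plist is named <label>.plist (hypothetical convention check).

# unloaded_plists DIR LIST_FILE
#   DIR        directory holding *.plist files
#   LIST_FILE  saved output of `launchctl list` (PID, Status, Label columns)
unloaded_plists() {
  local dir=$1 list_file=$2 plist label
  for plist in "$dir"/*.plist; do
    [ -e "$plist" ] || continue            # glob matched nothing
    label=$(basename "$plist" .plist)
    # Exact match on the third column, so ai.hermes.gateway does not
    # accidentally cover ai.hermes.gateway-fenrir.
    if ! awk -v l="$label" '$3 == l { found = 1 } END { exit !found }' "$list_file"; then
      printf '%s\n' "$label"
    fi
  done
}
```

On the audited machine this would be run as `launchctl list > /tmp/launchctl-list.txt` followed by `unloaded_plists ~/Library/LaunchAgents /tmp/launchctl-list.txt`. Matching the whole label rather than a substring is deliberate: the `egrep` audit commands in the inventory are fine for browsing, but a substring match cannot tell a loaded gateway from an unloaded profile variant.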