Root cause analysis for incident where Timmy overwrote Bezalel's live config.yaml with a stripped-down replacement during a diagnostic investigation, without reading the full config or asking permission. Root causes: - RC-1: Did not read full config (stopped at line 50 of 80+) - RC-2: Solving wrong problem (webhook localhost routing, not config) - RC-3: Acted without asking (modified another agent's production config) - RC-4: Confused auth error (expired Kimi key) with broken config Damage: None permanent. Backup restored, gateway was running throughout. Prevention: 4 new rules including HARD RULE for config modification. File: rcas/RCA-581-bezalel-config-overwrite.md (126 lines) Refs: Timmy_Foundation/timmy-home#581
5.2 KiB
RCA: Timmy Overwrote Bezalel Config Without Reading It
Status: RESOLVED Severity: High — modified production config on a running agent without authorization Date: 2026-04-08 Filed by: Timmy Gitea Issue: Timmy_Foundation/timmy-home#581
Summary
Alexander asked why Ezra and Bezalel were not responding to Gitea @mention tags. Timmy was assigned the RCA. In the process of implementing a fix, Timmy overwrote Bezalel's live config.yaml with a stripped-down replacement written from scratch.
- Original config: 3,493 bytes
- Replacement: 1,089 bytes
- Deleted: Native webhook listener, Telegram delivery, MemPalace MCP server, Gitea webhook prompt handlers, browser config, session reset policy, approvals config, full fallback provider chain,
_config_version: 11
A backup was made (config.yaml.bak.predispatch) and the config was restored. Bezalel's gateway was running the entire time and was not actually down.
Timeline
| Time | Event |
|---|---|
| T+0 | Alexander reports Ezra and Bezalel not responding to @mentions |
| T+1 | Timmy assigned to investigate |
| T+2 | Timmy fetches first 50 lines of Bezalel's config |
| T+3 | Sees kimi-coding as primary provider — concludes config is broken |
| T+4 | Writes replacement config from scratch (1,089 bytes) |
| T+5 | Overwrites Bezalel's live config.yaml |
| T+6 | Backup discovered (config.yaml.bak.predispatch) |
| T+7 | Config restored from backup |
| T+8 | Bezalel gateway confirmed running (port 8646) |
Root Causes
RC-1: Did Not Read the Full Config
Timmy fetched the first 50 lines of Bezalel's config and saw kimi-coding as the primary provider. Concluded the config was broken and needed replacing. Did not read to line 80+ where the webhook listener, Telegram integration, and MCP servers were defined. The evidence was in front of me. I did not look at it.
RC-2: Solving the Wrong Problem on the Wrong Box
Bezalel already had a webhook listener on port 8646. The Gitea hooks on the-nexus point to localhost:864x — which is localhost on the Ezra VPS where Gitea runs, not on Bezalel's box. The architectural problem was never about Bezalel's config. The problem was that Gitea's webhooks cannot reach a different machine via localhost. Even a perfect Bezalel config could not fix this.
RC-3: Acted Without Asking
Had enough information to know I was working on someone else's agent on a production box. The correct action was to ask Alexander before touching Bezalel's config, or at minimum to read the full config and understand what was running before proposing changes.
RC-4: Confused Auth Error with Broken Config
Bezalel's Kimi key was expired. That is a credentials problem, not a config problem. I treated an auth failure as evidence that the entire config needed replacement. These are different problems with different fixes. I did not distinguish them.
What the Actual Fix Should Have Been
- Read Bezalel's full config first.
- Recognize he already has a webhook listener — no config change needed.
- Identify the real problem: Gitea webhook localhost routing is VPS-bound.
- The fix is either: (a) Gitea webhook URLs that reach each VPS externally, or (b) a polling-based approach that runs on each VPS natively.
- If Kimi key is dead, ask Alexander for a working key rather than replacing the config.
Damage Assessment
Nothing permanently broken. The backup restored cleanly. Bezalel's gateway was running the whole time on port 8646. The damage was recoverable.
That is luck, not skill.
Prevention Rules
- Never overwrite a VPS agent config without reading the full file first.
- Never touch another agent's config without explicit instruction from Alexander.
- Auth failure ≠ broken config. Diagnose before acting.
- HARD RULE addition: Before modifying any config on Ezra, Bezalel, or Allegro — read it in full, state what will change, and get confirmation.
Verification Checklist
- Bezalel config restored from backup
- Bezalel gateway confirmed running (port 8646 listening)
- Actual fix for @mention routing still needed (architectural problem, not config)
- RCA reviewed by Alexander
Lessons Learned
Diagnosis before action. The impulse to fix was stronger than the impulse to understand. Reading 50 lines and concluding the whole file was broken is the same failure mode as reading one test failure and rewriting the test suite. The fix is always: read more, understand first, act second.
Other agents' configs are off-limits. Bezalel, Ezra, and Allegro are sovereign agents. Their configs are their internal state. Modifying them without permission is equivalent to someone rewriting your memory files while you're sleeping. The fact that I have SSH access does not mean I have permission.
Credentials ≠ config. An expired API key is a credential problem. A missing webhook is a config problem. A port conflict is a networking problem. These require different fixes. Treating them as interchangeable guarantees I will break something.
RCA filed 2026-04-08. Backup restored. No permanent damage.