5cc7b9b5a7 docs: QA triage action plan for #691
Some checks failed
Smoke Test / smoke (pull_request) Failing after 21s
Structured action plan converting cross-repo QA findings into
executable steps with owners, priorities, and verification.

Key findings addressed:
- P0: Production surfaces down (DNS/nginx), playground broken
- P1: 166 open PRs across 5 repos, 58 issues with duplicates
- P2: the-door crisis features blocked, no branch protection
- P3: Burn dedup gate, nightly triage cron

Priority order:
1. Fix DNS/nginx (crisis intervention reachable)
2. Close duplicate PRs (clear noise)
3. Review the-door PRs (mission-critical)
4. Fix the-playground (user-facing)
5. Enable branch protection
6. Build dedup gate
7. Nightly triage cron

Closes #691.
2026-04-14 23:51:40 -04:00


@@ -0,0 +1,218 @@
# QA Triage Action Plan — Foundation-Wide (2026-04-14)
> **Source:** Issue #691 — Cross-Repo Deep QA Report
> **Generated:** 2026-04-14
> **Status:** Active triage — actionable steps for each finding
---
## Executive Summary
The QA sweep identified systemic issues across the Foundation. Current state (verified live):
| Metric | QA Report | Current | Trend |
|--------|-----------|---------|-------|
| Total open PRs | ~55+ | **166** | Worsening |
| Repos with dupes | 3 | **5 (all)** | Worsening |
| Issues with duplicate PRs | 7+ | **58** | Critical |
| Prod surfaces reachable | 0/4 | 0/4 | Unchanged |
**The core problem:** Burn sessions generate PRs faster than triage can absorb them. The backlog is growing, not shrinking.
---
## P0 — Critical
### 1. Production Surfaces Down (404 on all endpoints)
**Status:** Unchanged since QA report
**Impact:** Zero users can reach any Timmy surface. The Door (crisis intervention) is unreachable.
| Surface | URL | Status |
|---------|-----|--------|
| Root | http://143.198.27.163/ | nginx 404 |
| Nexus | http://143.198.27.163/nexus/ | 404 |
| Playground | http://143.198.27.163/playground/ | 404 |
| Tower | http://143.198.27.163/tower/ | 404 |
| Domain | https://alexanderwhitestone.com/ | DNS broken |
**Action:**
- [ ] Verify DNS records for alexanderwhitestone.com (check registrar)
- [ ] SSH to VPS, check nginx config: `nginx -T`
- [ ] Ensure server blocks exist for each location
- [ ] Restart nginx: `systemctl restart nginx`
- [ ] Tracked in the-nexus#1105
**Owner:** Infrastructure
**Priority:** Immediate — this is the mission
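The status table above can be re-checked from any machine before and after the nginx/DNS fix. The following is a minimal sketch using only the Python standard library; the function names (`http_status`, `check_surfaces`) are illustrative and not part of any existing Foundation script. It distinguishes an HTTP error such as nginx's 404 from a DNS or connection failure.

```python
import urllib.error
import urllib.request

# URLs copied from the surface table above.
SURFACES = {
    "Root": "http://143.198.27.163/",
    "Nexus": "http://143.198.27.163/nexus/",
    "Playground": "http://143.198.27.163/playground/",
    "Tower": "http://143.198.27.163/tower/",
    "Domain": "https://alexanderwhitestone.com/",
}


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if DNS/connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered, e.g. 404 from nginx
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, or timeout


def check_surfaces(surfaces, fetch=http_status):
    """Map each surface name to 'ok', 'http <code>', or 'unreachable'."""
    report = {}
    for name, url in surfaces.items():
        code = fetch(url)
        if code is None:
            report[name] = "unreachable"
        elif 200 <= code < 300:
            report[name] = "ok"
        else:
            report[name] = f"http {code}"
    return report
```

Calling `check_surfaces(SURFACES)` performs the live requests; the `fetch` parameter exists so the classification logic can be exercised without network access.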
### 2. the-playground index.html Broken
**Status:** Not re-verified since QA report
**Impact:** Playground app crashes on load — missing script tags
**Action:**
- [ ] Read the-playground/index.html
- [ ] Verify script tags for all JS modules
- [ ] Fix missing imports
- [ ] Tracked in the-playground#200
**Owner:** the-playground
**Priority:** High — blocks user-facing playground
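The "verify script tags" step can be made mechanical. The sketch below assumes the playground's JS modules are plain files referenced through `<script src="...">` with paths relative to `index.html`; that layout is an assumption, since the repository contents are not shown here. `ScriptCollector`, `script_sources`, and `missing_script_tags` are hypothetical helper names.

```python
import posixpath
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the src attribute of every <script> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:  # inline scripts have no src and are skipped
                self.sources.append(src)


def _norm(path):
    # Drop query strings and a leading "./" so "./src/a.js?v=1" equals "src/a.js".
    return posixpath.normpath(path.split("?", 1)[0])


def script_sources(html_text):
    """Return the src values of all <script> tags, in document order."""
    parser = ScriptCollector()
    parser.feed(html_text)
    return parser.sources


def missing_script_tags(html_text, module_paths):
    """Return module paths that no <script src=...> in the page references."""
    referenced = {_norm(src) for src in script_sources(html_text)}
    return sorted(p for p in module_paths if _norm(p) not in referenced)
```

In practice `module_paths` would come from listing the repo's `.js` files relative to `index.html`; any path the function returns is a candidate for a missing tag.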
---
## P1 — High (Duplicate PR Crisis)
### 3. Duplicate PR Storm Across All Repos
**Current state (verified live 2026-04-14):**
| Repo | Open PRs | Issues with Duplicates | Worst Case |
|------|----------|----------------------|------------|
| the-nexus | 44 | 16 | Issue #1509 → 4 PRs |
| the-playground | 31 | 10 | Issue #180 → 3 PRs |
| the-door | 27 | 6 | Issue #988 → 7 PRs |
| timmy-config | 50 | 20 | Issue #50 → 7 PRs |
| timmy-home | 14 | 6 | Issue #50 → 6 PRs |
| **Total** | **166** | **58 issues** | — |
**Root cause:** Burn sessions create branches without checking for existing PRs on the same issue. No deduplication gate in the burn pipeline.
**Immediate action — close duplicates per repo:**
For each issue with multiple PRs:
1. Keep the PR with the most commits or the largest diff (the most complete implementation)
2. Close all others with the comment: "Closing duplicate. See #PR for primary implementation." (replace `#PR` with the number of the PR being kept)
3. If no PR is clearly superior, keep the oldest (first mover)
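The three rules above can be sketched as a small selection function. The PR fields used here (`number`, `commits`, `changed_lines`, `created_at`) are assumed names for illustration, not a confirmed API shape, and "clearly superior" is interpreted as a strict maximum on (commits, changed lines). On a tie, the oldest of the tied PRs is kept.

```python
def pick_primary(prs):
    """Return (keeper, to_close) for PRs that all target the same issue."""

    def size(pr):
        # Rule 1: compare by commit count, then by diff size.
        return (pr["commits"], pr["changed_lines"])

    best = max(size(pr) for pr in prs)
    candidates = [pr for pr in prs if size(pr) == best]
    # Rule 3: on a tie, keep the oldest PR (first mover).
    # Assumes created_at values are ISO 8601 strings in one timezone,
    # so plain string comparison orders them chronologically.
    keeper = min(candidates, key=lambda pr: pr["created_at"])
    to_close = [pr for pr in prs if pr is not keeper]
    return keeper, to_close


def closing_comment(keeper):
    """Rule 2: the comment to post on each PR being closed."""
    return f"Closing duplicate. See #{keeper['number']} for primary implementation."
```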
**Script to identify duplicates:**
```bash
# For each repo, list issues with >1 open PR (--close-duplicates also closes the extras)
python3 scripts/duplicate-pr-detector.py --repo <repo> --close-duplicates
```
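The source of `scripts/duplicate-pr-detector.py` is not shown in this plan, so the following is not its implementation. It is a hedged sketch of how the detection step could work: group open PRs by the issue they reference in a closing keyword ("Closes #N", "Fixes #N", "Resolves #N") and report issues with more than one PR. The input is assumed to be a list of dicts with `number`, `title`, and `body` keys.

```python
import re
from collections import defaultdict

# Matches "Closes #12", "fixes #12", "Resolved #12", and similar.
ISSUE_REF = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)", re.IGNORECASE
)


def issues_with_duplicates(prs):
    """Map issue number -> sorted PR numbers, only for issues with >1 open PR."""
    by_issue = defaultdict(set)
    for pr in prs:
        # Body may be None when a PR has no description.
        text = f"{pr.get('title') or ''}\n{pr.get('body') or ''}"
        for match in ISSUE_REF.finditer(text):
            by_issue[int(match.group(1))].add(pr["number"])
    return {
        issue: sorted(numbers)
        for issue, numbers in by_issue.items()
        if len(numbers) > 1
    }
```

Matching on closing keywords rather than any bare `#N` avoids counting PRs that merely mention an issue in passing, at the cost of missing PRs that omit the keyword.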
**Long-term fix:**
- [ ] Add pre-flight check to burn loop: query open PRs before creating new branch
- [ ] Add Gitea label `burn-active` to track which issues have active burn PRs
- [ ] Add CI check that rejects PR if another open PR references the same issue
**Owner:** Fleet / Burn infrastructure
**Priority:** High — duplicates waste review time and create merge conflicts
### 4. Misfiled PR in Wrong Repo
**the-nexus PR #1521:** "timmy-home Backlog Triage Report" is filed in the-nexus but concerns timmy-home.
**Action:**
- [ ] Close PR #1521 in the-nexus with redirect comment
- [ ] File content as issue or PR in timmy-home if still relevant
---
## P2 — Medium
### 5. the-door Crisis Features Blocked
Mission-critical PRs sitting unreviewed:
| Issue | Title | Impact |
|-------|-------|--------|
| #91 | Safety plan improvements | User safety |
| #89 | Safety plan enhancements | User safety |
| #90 | Crisis overlay fixes | UX |
| #87 | Crisis overlay bugs | UX |
| 988 link | Crisis hotline link fix | **Life safety** |
**Action:**
- [ ] Prioritize the-door PR review over all other repos
- [ ] Assign a reviewer or run dedicated triage session for the-door only
- [ ] After review, merge in dependency order
**Owner:** Crisis team / Alexander
**Priority:** High — this is the mission
### 6. Branch Protection Missing Foundation-Wide
No repo has branch protection enabled. Any member can push directly to main.
**Action:**
- [ ] Enable branch protection on all repos with:
  - Require 1 approval before merge
  - Require CI to pass (where CI exists)
  - Dismiss stale approvals on new commits
- [ ] Covered in timmy-home PR #606 but not yet implemented
**Repos without CI (need smoke test first):**
- the-playground
- the-beacon
- timmy-home
**Owner:** Alexander / Infrastructure
**Priority:** Medium — prevents accidental breakage
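Applying the same rule to every repo is easier to script than to click through. The sketch below builds, but does not send, a request to Gitea's branch-protection endpoint. The field names follow Gitea's `CreateBranchProtectionOption` as best understood here and should be checked against the Swagger page of the running instance (older versions use `branch_name` instead of `rule_name`). The base URL, owner, and token are placeholders.

```python
import json
import urllib.request


def protection_payload(branch="main", status_checks=None):
    """Build a branch-protection rule matching the three requirements above."""
    checks = list(status_checks or [])
    return {
        "rule_name": branch,
        "required_approvals": 1,
        "dismiss_stale_approvals": True,
        # Only require CI where the repo actually has a check to report.
        "enable_status_check": bool(checks),
        "status_check_contexts": checks,
    }


def build_request(base_url, owner, repo, token, payload):
    """Prepare (but do not send) the POST that would create the rule."""
    return urllib.request.Request(
        f"{base_url}/api/v1/repos/{owner}/{repo}/branch_protections",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

For the repos listed as lacking CI, `protection_payload()` with no status checks leaves the CI requirement off until a smoke test exists; passing the returned request to `urllib.request.urlopen` would apply the rule.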
---
## P3 — Low (Process Improvements)
### 7. Burn Session Deduplication Gate
**Problem:** Burn loops don't check for existing PRs before creating new ones.
**Solution:** Pre-flight check in burn pipeline:
```python
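import re

def has_open_pr(owner, repo, issue_number):
    """Return True if any open PR body references the given issue number."""
    # Match "#12" but not "#123". `gitea` is assumed to be the pipeline's API client.
    pattern = re.compile(rf"#{issue_number}(?!\d)")
    prs = gitea.get_pulls(owner, repo, state="open")
    for pr in prs:
        if pattern.search(pr.get("body", "") or ""):
            return True
    return False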
```
**Action:**
- [ ] Add to hermes-agent burn loop
- [ ] Add to timmy-config burn scripts
- [ ] Test with dry-run before enabling
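For the dry-run step, one option is a gate that reports what it would block without blocking anything. This is a sketch under assumptions: `preflight` is a hypothetical name, and it takes the bodies of the currently open PRs as a plain list so it can be tested without a Gitea client.

```python
import re


def preflight(issue_number, open_pr_bodies, dry_run=True):
    """Decide whether a burn session may open a branch for issue_number.

    Returns True when the session may proceed.
    """
    # (?!\d) stops "#12" from matching inside "#123".
    pattern = re.compile(rf"#{issue_number}(?!\d)")
    duplicate = any(pattern.search(body or "") for body in open_pr_bodies)
    if duplicate and dry_run:
        print(f"[dry-run] would skip issue #{issue_number}: an open PR already references it")
        return True  # dry-run reports but never blocks
    return not duplicate
```

Running with `dry_run=True` for a few burn cycles shows how often the gate would fire before it is allowed to stop real work.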
### 8. Nightly Triage Cron
**Problem:** No automated triage. Duplicates accumulate until manual sweep.
**Solution:** Nightly cron that:
1. Scans all repos for duplicate PRs
2. Posts summary to a triage channel
3. Auto-closes duplicates older than 48h with lower diff count
**Action:**
- [ ] Design triage cron job spec
- [ ] Implement as hermes cron job
- [ ] Run nightly at 03:00 UTC
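Step 3 of the proposed cron is the only destructive one, so its selection rule is worth pinning down early. The sketch below assumes each PR in a duplicate group carries a `changed_lines` count and a timezone-aware `created_at` datetime (both field names are illustrative). A PR is a close candidate only if it is at least 48 hours old and its diff is strictly smaller than the largest diff in the group.

```python
from datetime import datetime, timedelta, timezone

MIN_AGE = timedelta(hours=48)


def auto_close_candidates(group, now):
    """Return the duplicates in one issue's PR group that the nightly job may close."""
    largest = max(pr["changed_lines"] for pr in group)
    cutoff = now - MIN_AGE
    return [
        pr
        for pr in group
        # Strictly smaller diff than the largest PR, and created before the cutoff.
        if pr["changed_lines"] < largest and pr["created_at"] <= cutoff
    ]
```

PRs tied for the largest diff are never auto-closed under this rule, which leaves ties for a human to resolve.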
---
## Priority Order (Execution Sequence)
1. **Fix DNS/nginx** — The Door must be reachable (crisis intervention = the mission)
2. **Close duplicate PRs** — 58 issues with dupes, clear the noise
3. **Review the-door PRs** — Mission-critical crisis features
4. **Fix the-playground** — User-facing app broken
5. **Enable branch protection** — Prevent future breakage
6. **Build dedup gate** — Prevent future duplicate storms
7. **Nightly triage cron** — Automated hygiene
---
## Verification Checklist
After completing actions above, verify:
- [ ] http://143.198.27.163/ returns a page (not 404)
- [ ] https://alexanderwhitestone.com/ resolves
- [ ] All repos have <5 duplicate PRs
- [ ] the-door has 0 unreviewed safety/crisis PRs
- [ ] Branch protection enabled on all repos
- [ ] Burn loop has pre-flight PR check
---
*This plan converts QA findings into executable actions. Each item has an owner, priority, and verification step.*