Compare commits

..

1 Commits

Author SHA1 Message Date
Alexander Whitestone
418e601f74 docs: add human confirmation firewall research report
All checks were successful
Lint / lint (pull_request) Successful in 9s
2026-04-22 11:22:24 -04:00
5 changed files with 515 additions and 620 deletions

View File

@@ -1,66 +0,0 @@
# Morning Review Packet Status — #949
Generated: 2026-04-22T14:57:44.332419+00:00
Epic: [EPIC: Morning review packet — Hermes harness features landed 2026-04-21](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/949)
## Summary
- Child QA issues tracked: 13
- Open child issues: 11
- Closed child issues: 2
- Open child issues already backed by PRs: 7
- Open child issues still unowned on forge: 4
## Child QA Matrix
| Issue | State | Open PRs | Title |
|------:|-------|----------|-------|
| #950 | open | — | [QA] Verify AI Gateway provider UX + attribution headers |
| #951 | open | — | [QA] Verify transport abstraction + AnthropicTransport wiring |
| #952 | open | — | [QA] Verify CLI voice beep toggle |
| #953 | open | [#1020](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/pulls/1020) | [QA] Verify bundled skill scripts run out of the box |
| #954 | open | [#1021](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/pulls/1021) | [QA] Verify maps skill guest_house / camp_site / bakery expansion |
| #955 | open | — | [QA] Verify KittenTTS local provider end-to-end |
| #956 | open | [#1018](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/pulls/1018) | [QA] Verify numbered keyboard shortcuts for approval + clarify prompts |
| #957 | open | [#1015](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/pulls/1015) | [QA] Verify optional adversarial-ux-test skill catalog flow |
| #958 | open | [#1016](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/pulls/1016) | [QA] Verify /usage account limits in CLI + gateway |
| #959 | open | [#1014](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/pulls/1014) | [QA] Verify OpenCode-Go curated catalog additions |
| #960 | open | [#1017](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/pulls/1017) | [QA] Verify patch 'did you mean?' suggestions |
| #961 | closed | — | [QA] Verify web dashboard update/restart action buttons |
| #962 | closed | — | [QA] Verify hardcoded-home path guard on burn/921 branch |
## Drift Signals
forge/main is still catching up to the upstream packet.
Active PR-backed child lanes:
- #953 -> #1020 ([QA] Verify bundled skill scripts run out of the box)
- #954 -> #1021 ([QA] Verify maps skill guest_house / camp_site / bakery expansion)
- #956 -> #1018 ([QA] Verify numbered keyboard shortcuts for approval + clarify prompts)
- #957 -> #1015 ([QA] Verify optional adversarial-ux-test skill catalog flow)
- #958 -> #1016 ([QA] Verify /usage account limits in CLI + gateway)
- #959 -> #1014 ([QA] Verify OpenCode-Go curated catalog additions)
- #960 -> #1017 ([QA] Verify patch 'did you mean?' suggestions)
## Unowned Open QA Issues
- #950 [QA] Verify AI Gateway provider UX + attribution headers
- #951 [QA] Verify transport abstraction + AnthropicTransport wiring
- #952 [QA] Verify CLI voice beep toggle
- #955 [QA] Verify KittenTTS local provider end-to-end
## Decomposition Follow-Ups
- #965 [open] [EPIC: Morning review packet — Hermes harness features landed 2026-04-21] Phase 1: Landscape Analysis & Scaffolding
- #966 [open] [EPIC: Morning review packet — Hermes harness features landed 2026-04-21] Phase 2: Core Logic Implementation
- #967 [closed] [EPIC: Morning review packet — Hermes harness features landed 2026-04-21] Phase 3: Poka-yoke Integration & Fleet Verification
## Conclusion
Refs #949 only. This epic remains open until every child QA issue has a truthful PASS/FAIL outcome, attached evidence, and any upstream/main versus forge/main drift is resolved or explicitly documented.
## Regeneration
```bash
python3 scripts/morning_review_packet_status.py --fetch-live --json-out docs/morning-review-packet-2026-04-21.snapshot.json --markdown-out docs/morning-review-packet-2026-04-21-status.md
```

View File

@@ -1,172 +0,0 @@
{
"generated_at": "2026-04-22T14:57:44.332419+00:00",
"repo": "Timmy_Foundation/hermes-agent",
"epic": {
"number": 949,
"title": "EPIC: Morning review packet \u2014 Hermes harness features landed 2026-04-21",
"state": "open",
"html_url": "https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/949"
},
"children": [
{
"number": 950,
"title": "[QA] Verify AI Gateway provider UX + attribution headers",
"state": "open",
"html_url": "https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/950",
"open_prs": []
},
{
"number": 951,
"title": "[QA] Verify transport abstraction + AnthropicTransport wiring",
"state": "open",
"html_url": "https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/951",
"open_prs": []
},
{
"number": 952,
"title": "[QA] Verify CLI voice beep toggle",
"state": "open",
"html_url": "https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/952",
"open_prs": []
},
{
"number": 953,
"title": "[QA] Verify bundled skill scripts run out of the box",
"state": "open",
"html_url": "https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/953",
"open_prs": [
{
"number": 1020,
"title": "fix: ship bundled skill scripts executable",
"head": "fix/953",
"url": "https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/pulls/1020"
}
]
},
{
"number": 954,
"title": "[QA] Verify maps skill guest_house / camp_site / bakery expansion",
"state": "open",
"html_url": "https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/954",
"open_prs": [
{
"number": 1021,
"title": "feat: sync maps skill and verify guest_house/camp_site/bakery (#954)",
"head": "fix/954",
"url": "https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/pulls/1021"
}
]
},
{
"number": 955,
"title": "[QA] Verify KittenTTS local provider end-to-end",
"state": "open",
"html_url": "https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/955",
"open_prs": []
},
{
"number": 956,
"title": "[QA] Verify numbered keyboard shortcuts for approval + clarify prompts",
"state": "open",
"html_url": "https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/956",
"open_prs": [
{
"number": 1018,
"title": "fix: add numbered approval and clarify shortcuts (#956)",
"head": "fix/956",
"url": "https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/pulls/1018"
}
]
},
{
"number": 957,
"title": "[QA] Verify optional adversarial-ux-test skill catalog flow",
"state": "open",
"html_url": "https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/957",
"open_prs": [
{
"number": 1015,
"title": "feat(skills): backport adversarial-ux-test optional skill",
"head": "fix/957",
"url": "https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/pulls/1015"
}
]
},
{
"number": 958,
"title": "[QA] Verify /usage account limits in CLI + gateway",
"state": "open",
"html_url": "https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/958",
"open_prs": [
{
"number": 1016,
"title": "fix: restore /usage account limits in CLI + gateway (#958)",
"head": "fix/958",
"url": "https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/pulls/1016"
}
]
},
{
"number": 959,
"title": "[QA] Verify OpenCode-Go curated catalog additions",
"state": "open",
"html_url": "https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/959",
"open_prs": [
{
"number": 1014,
"title": "fix(opencode-go): restore curated catalog additions",
"head": "fix/959",
"url": "https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/pulls/1014"
}
]
},
{
"number": 960,
"title": "[QA] Verify patch 'did you mean?' suggestions",
"state": "open",
"html_url": "https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/960",
"open_prs": [
{
"number": 1017,
"title": "fix(patch): port and verify did-you-mean suggestions (#960)",
"head": "fix/960",
"url": "https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/pulls/1017"
}
]
},
{
"number": 961,
"title": "[QA] Verify web dashboard update/restart action buttons",
"state": "closed",
"html_url": "https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/961",
"open_prs": []
},
{
"number": 962,
"title": "[QA] Verify hardcoded-home path guard on burn/921 branch",
"state": "closed",
"html_url": "https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/962",
"open_prs": []
}
],
"decomposition_issues": [
{
"number": 965,
"title": "[EPIC: Morning review packet \u2014 Hermes harness features landed 2026-04-21] Phase 1: Landscape Analysis & Scaffolding",
"state": "open",
"html_url": "https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/965"
},
{
"number": 966,
"title": "[EPIC: Morning review packet \u2014 Hermes harness features landed 2026-04-21] Phase 2: Core Logic Implementation",
"state": "open",
"html_url": "https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/966"
},
{
"number": 967,
"title": "[EPIC: Morning review packet \u2014 Hermes harness features landed 2026-04-21] Phase 3: Poka-yoke Integration & Fleet Verification",
"state": "closed",
"html_url": "https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/967"
}
]
}

View File

@@ -0,0 +1,515 @@
# Human Confirmation Firewall: Research Report
## Implementation Patterns for Hermes Agent
**Issue:** #878
**Parent:** #659
**Priority:** P0
**Scope:** Human-in-the-loop safety patterns for tool calls, crisis handling, and irreversible actions
---
## Executive Summary
Hermes already has a partial human confirmation firewall, but it is narrow.
Current repo state shows:
- a real **pre-execution gate** for dangerous terminal commands in `tools/approval.py`
- a partial **confidence-threshold path** via `_smart_approve()` in `tools/approval.py`
- gateway support for blocking approval resolution in `gateway/run.py`
What is still missing is the core recommendation from this research issue:
- **confidence scoring on all tool calls**, not just terminal commands that already matched a dangerous regex
- a **hard pre-execution human gate for crisis interventions**, especially any action that would auto-respond to suicidal content
- a consistent way to classify actions into:
1. pre-execution gate
2. post-execution review
3. confidence-threshold execution
Recommendation:
- use **Pattern 1: Pre-Execution Gate** for crisis interventions and irreversible/high-impact actions
- use **Pattern 3: Confidence Threshold** for normal operations
- reserve **Pattern 2: Post-Execution Review** only for low-risk and reversible actions
The next implementation step should be a **tool-call risk assessment layer** that runs before dispatch in `model_tools.handle_function_call()`, assigns a score and pattern to every tool call, and routes only the highest-risk calls into mandatory human confirmation.
---
## 1. The Three Proven Patterns
### Pattern 1: Pre-Execution Gate
Definition:
- halt before execution
- show the proposed action to the human
- require explicit approval or denial
Best for:
- destructive actions
- irreversible side effects
- crisis interventions
- actions that affect another human's safety, money, infrastructure, or private data
Strengths:
- strongest safety guarantee
- simplest audit story
- prevents the most catastrophic failure mode: acting first and apologizing later
Weaknesses:
- adds latency
- creates operator burden if overused
- should not be applied to every ordinary tool call
### Pattern 2: Post-Execution Review
Definition:
- execute first
- expose result to human
- allow rollback or follow-up correction
Best for:
- reversible operations
- low-risk actions with fast recovery
- tasks where human review matters but immediate execution is acceptable
Strengths:
- low friction
- fast iteration
- useful when rollback is practical
Weaknesses:
- unsafe for crisis or destructive actions
- only works when rollback actually exists
- a poor fit for external communication or life-safety contexts
### Pattern 3: Confidence Threshold
Definition:
- compute a risk/confidence score before execution
- auto-execute high-confidence safe actions
- request confirmation for lower-confidence or higher-risk actions
Best for:
- mixed-risk tool ecosystems
- day-to-day operations where always-confirm would be too expensive
- systems with a large volume of ordinary, safe reads and edits
Strengths:
- best balance of speed and safety
- scales across many tool types
- allows targeted human attention where it matters most
Weaknesses:
- depends on a good scoring model
- weak scoring creates false negatives or unnecessary prompts
- must remain inspectable and debuggable
---
## 2. What Hermes Already Has
## 2.1 Existing Pre-Execution Gate for Dangerous Terminal Commands
`tools/approval.py` already implements a real pre-execution confirmation path for dangerous shell commands.
Observed components:
- `DANGEROUS_PATTERNS`
- `detect_dangerous_command()`
- `prompt_dangerous_approval()`
- `check_dangerous_command()`
- gateway queueing and resolution support in the same module
This is already Pattern 1.
Current behavior:
- dangerous terminal commands are detected before execution
- the user can allow once / session / always / deny
- gateway sessions can block until approval resolves
This is a strong foundation, but it is limited to a subset of terminal commands.
## 2.2 Partial Confidence Threshold via Smart Approvals
Hermes also already has a partial Pattern 3.
Observed component:
- `_smart_approve()` in `tools/approval.py`
Current behavior:
- only runs **after** a command has already been flagged by dangerous-pattern detection
- uses the auxiliary LLM to decide:
- approve
- deny
- escalate
This means Hermes has a confidence-threshold mechanism, but only for **already-flagged dangerous terminal commands**.
What it does not yet do:
- score all tool calls
- classify non-terminal tools
- distinguish crisis interventions from normal ops
- produce a shared risk model across the tool surface
## 2.3 Blocking Approval UX in Gateway
`gateway/run.py` already routes `/approve` and `/deny` into the blocking approval path.
This means the infrastructure for a true human confirmation firewall already exists in messaging contexts.
That is important because the missing work is not "invent human approval from zero."
The missing work is:
- expand the scope from dangerous shell commands to **all tool calls that matter**
- make the routing policy explicit and inspectable
---
## 3. What Hermes Still Lacks
## 3.1 No Universal Tool-Call Risk Assessment
The current approval system is command-pattern-centric.
It is not yet a tool-call firewall.
Missing capability:
- before dispatch, every tool call should receive a structured assessment:
- tool name
- side-effect class
- reversibility
- human-impact potential
- crisis relevance
- confidence score
- recommended confirmation pattern
Natural insertion point:
- `model_tools.handle_function_call()`
That function already sits at the central dispatch boundary.
It is the right place to add a pre-dispatch classifier.
## 3.2 No Hard Crisis Gate for Outbound Intervention
Issue #878 explicitly recommends:
- Pattern 1 for crisis interventions
- never auto-respond to suicidal content
That recommendation is not yet codified as a global firewall rule.
Missing rule:
- if a tool call would directly intervene in a crisis context or send outward guidance in response to suicidal content, it must require explicit human confirmation before execution
Examples that should hard-gate:
- outbound `send_message` content aimed at a suicidal user
- any future tool that places calls, escalates emergencies, or contacts third parties about a crisis
- any autonomous action that claims a person should or should not take a life-safety step
## 3.3 No First-Class Post-Execution Review Policy
Hermes has approval and denial, but it does not yet have a formal policy for when Pattern 2 is acceptable.
Without a policy, post-execution review tends to get used implicitly rather than intentionally.
That is risky.
Hermes should define Pattern 2 narrowly:
- only for actions that are both low-risk and reversible
- only when the system can show the human exactly what happened
- never for crisis, finance, destructive config, or sensitive comms
---
## 4. Recommended Architecture for Hermes
## 4.1 Add a Tool-Call Assessment Layer
Add a pre-dispatch assessment object for every tool call.
Suggested shape:
```python
@dataclass
class ToolCallAssessment:
tool_name: str
risk_score: float # 0.0 to 1.0
confidence: float # confidence in the assessment itself
pattern: str # pre_execution_gate | post_execution_review | confidence_threshold
requires_human: bool
reasons: list[str]
reversible: bool
crisis_sensitive: bool
```
Suggested execution point:
- inside `model_tools.handle_function_call()` before `orchestrator.dispatch()`
Why here:
- one place covers all tools
- one place can emit traces
- one place can remain model-agnostic
- one place lets plugins observe or override the assessment
## 4.2 Classify Tool Calls by Side-Effect Class
Suggested first-pass taxonomy:
### A. Read-only
Examples:
- `read_file`
- `search_files`
- `browser_snapshot`
- `browser_console` read-only inspection
Pattern:
- confidence threshold
- almost always auto-execute
- human confirmation normally unnecessary
### B. Local reversible edits
Examples:
- `patch`
- `write_file`
- `todo`
Pattern:
- confidence threshold
- human confirmation only when risk score rises because of path sensitivity or scope breadth
### C. External side effects
Examples:
- `send_message`
- `cronjob`
- `delegate_task`
- smart-home actuation tools
Pattern:
- confidence threshold by default
- pre-execution gate when score exceeds threshold or when context is sensitive
### D. Critical / destructive / crisis-sensitive
Examples:
- dangerous `terminal`
- financial actions
- deletion / kill / restart / deployment in sensitive paths
- outbound crisis intervention
Pattern:
- pre-execution gate
- never auto-execute on confidence alone
## 4.3 Crisis Override Rule
Add a hard override:
```text
If tool call is crisis-sensitive AND outbound or irreversible:
requires_human = True
pattern = pre_execution_gate
```
This is the most important rule in the issue.
The model may draft the message.
The human must confirm before the system sends it.
## 4.4 Use Confidence Threshold for Normal Ops
For non-crisis operations, use Pattern 3.
Suggested logic:
- low risk + high assessment confidence -> auto-execute
- medium risk or medium confidence -> ask human
- high risk -> always ask human
Key point:
- confidence is not just "how sure the LLM is"
- confidence should combine:
- tool type certainty
- argument clarity
- path sensitivity
- external side effects
- crisis indicators
---
## 5. Recommended Initial Scoring Factors
A simple initial scorer is enough.
It does not need to be fancy.
Suggested factors:
### 5.1 Tool class risk
- read-only tools: very low base risk
- local mutation tools: moderate base risk
- external communication / automation tools: higher base risk
- shell execution: variable, often high
### 5.2 Target sensitivity
Examples:
- `/tmp` or local scratch paths -> lower
- repo files under git -> medium
- system config, credentials, secrets, gateway lifecycle -> high
- human-facing channels -> high if message content is sensitive
### 5.3 Reversibility
- reversible -> lower
- difficult but possible to undo -> medium
- practically irreversible -> high
### 5.4 Human-impact content
- no direct human impact -> low
- administrative impact -> medium
- crisis / safety / emotional intervention -> critical
### 5.5 Context certainty
- arguments are explicit and narrow -> higher confidence
- arguments are vague, inferred, or broad -> lower confidence
---
## 6. Implementation Plan
## Phase 1: Assessment Without Behavior Change
Goal:
- score all tool calls
- log assessment decisions
- emit traces for review
- do not yet block new tool categories
Files to touch:
- `tools/approval.py`
- `model_tools.py`
- tests for assessment coverage
Output:
- risk/confidence trace for every tool call
- pattern recommendation for every tool call
Why first:
- lets us calibrate before changing runtime behavior
- avoids breaking existing workflows blindly
## Phase 2: Hard-Gate Crisis-Sensitive Outbound Actions
Goal:
- enforce Pattern 1 for crisis interventions
Likely surfaces:
- `send_message`
- any future telephony / call / escalation tools
- other tools with direct human intervention side effects
Rule:
- never auto-send crisis intervention content without human confirmation
## Phase 3: General Confidence Threshold for Normal Ops
Goal:
- apply Pattern 3 to all tool calls
- auto-run clearly safe actions
- escalate ambiguous or medium-risk actions
Likely thresholds:
- score < 0.25 -> auto
- 0.25 to 0.60 -> confirm if confidence is weak
- > 0.60 -> confirm
- crisis-sensitive -> always confirm
## Phase 4: Optional Post-Execution Review Lane
Goal:
- allow Pattern 2 only for explicitly reversible operations
Examples:
- maybe low-risk messaging drafts saved locally
- maybe reversible UI actions in specific environments
Important:
- this phase is optional
- Hermes should not rely on Pattern 2 for safety-critical flows
---
## 7. Verification Criteria for the Future Implementation
The eventual implementation should prove all of the following:
1. every tool call receives a scored assessment before dispatch
2. crisis-sensitive outbound actions always require human confirmation
3. dangerous terminal commands still preserve their current pre-execution gate
4. clearly safe read-only tool calls are not slowed by unnecessary prompts
5. assessment traces can be inspected after a run
6. approval decisions remain session-safe across CLI and gateway contexts
---
## 8. Concrete Recommendations
### Recommendation 1
Do **not** replace the current dangerous-command approval path.
Generalize above it.
Why:
- existing terminal Pattern 1 already works
- this is the strongest piece of the current firewall
### Recommendation 2
Add a universal scorer in `model_tools.handle_function_call()`.
Why:
- that is the first point where Hermes knows the tool name and structured arguments
- it is the cleanest place to classify all tool calls uniformly
### Recommendation 3
Treat crisis-sensitive outbound intervention as a separate safety class.
Why:
- issue #878 explicitly calls for Pattern 1 here
- this matches Timmy's SOUL-level safety requirements
### Recommendation 4
Ship scoring traces before enforcement expansion.
Why:
- you cannot tune thresholds you cannot inspect
- false positives will otherwise frustrate normal usage
### Recommendation 5
Use Pattern 3 as the default policy for normal operations.
Why:
- full manual confirmation on every tool call is too expensive
- full autonomy is too risky
- Pattern 3 is the practical middle ground
---
## 9. Bottom Line
Hermes should implement a **two-track human confirmation firewall**:
1. **Pattern 1: Pre-Execution Gate**
- crisis interventions
- destructive terminal actions
- irreversible or safety-critical tool calls
2. **Pattern 3: Confidence Threshold**
- all ordinary tool calls
- driven by a universal tool-call assessment layer
- integrated at the central dispatch boundary
Pattern 2 should remain optional and narrow.
It is not the primary answer for Hermes.
The repo already contains the beginnings of this system.
The next step is not new theory.
It is to turn the existing approval path into a true **tool-call-wide human confirmation firewall**.
---
## References
- Issue #878 — Human Confirmation Firewall Implementation Patterns
- Issue #659 — Critical Research Tasks
- `tools/approval.py` — current dangerous-command approval flow and smart approvals
- `model_tools.py` — central tool dispatch boundary
- `gateway/run.py` — blocking approval handling for messaging sessions

View File

@@ -1,288 +0,0 @@
#!/usr/bin/env python3
"""Generate a grounded status report for hermes-agent morning review packet epic #949."""
from __future__ import annotations
import argparse
import base64
import json
import os
import re
import ssl
import urllib.request
from datetime import datetime, timezone
from pathlib import Path
from typing import Any
BASE_API = "https://forge.alexanderwhitestone.com/api/v1"
REPO = "Timmy_Foundation/hermes-agent"
TOKEN_PATH = Path("~/.config/gitea/token").expanduser()
DEFAULT_JSON_OUT = Path("docs/morning-review-packet-2026-04-21.snapshot.json")
DEFAULT_MARKDOWN_OUT = Path("docs/morning-review-packet-2026-04-21-status.md")
def extract_issue_numbers(text: str) -> list[int]:
seen: set[int] = set()
numbers: list[int] = []
for match in re.finditer(r"#(\d+)", text or ""):
num = int(match.group(1))
if num not in seen:
seen.add(num)
numbers.append(num)
return numbers
def _auth_headers(token: str) -> list[dict[str, str]]:
basic = base64.b64encode(f"{token}:".encode()).decode()
return [
{"Authorization": f"token {token}", "Accept": "application/json"},
{"Authorization": f"Basic {basic}", "Accept": "application/json"},
]
def api_get(path: str, *, headers_options: list[dict[str, str]] | None = None) -> Any:
token = TOKEN_PATH.read_text(encoding="utf-8").strip()
headers_options = headers_options or _auth_headers(token)
ctx = ssl.create_default_context()
url = f"{BASE_API}{path}"
last_error: Exception | None = None
for headers in headers_options:
try:
req = urllib.request.Request(url, headers=headers)
with urllib.request.urlopen(req, context=ctx, timeout=30) as resp:
return json.loads(resp.read().decode())
except Exception as exc: # pragma: no cover - exercised via live CLI use
last_error = exc
raise RuntimeError(f"GET {url} failed: {last_error}")
def issue_pr_matches(pr: dict[str, Any], issue_num: int) -> bool:
title = pr.get("title") or ""
body = pr.get("body") or ""
head = (pr.get("head") or {}).get("ref") or ""
exact_ref = re.compile(rf"(?<!\d)#{issue_num}(?!\d)")
body_ref = re.compile(rf"(?i)(closes|close|fixes|fix|resolves|resolve|refs|ref)\s+#?{issue_num}(?!\d)")
branch_variants = {
f"fix/{issue_num}",
f"issue-{issue_num}",
f"burn/{issue_num}",
f"fix/issue-{issue_num}",
}
return bool(
exact_ref.search(title)
or exact_ref.search(body)
or body_ref.search(body)
or head in branch_variants
)
def fetch_open_prs(*, headers_options: list[dict[str, str]]) -> list[dict[str, Any]]:
prs: list[dict[str, Any]] = []
page = 1
while True:
batch = api_get(
f"/repos/{REPO}/pulls?state=open&limit=100&page={page}",
headers_options=headers_options,
)
if not batch:
break
prs.extend(batch)
if len(batch) < 100:
break
page += 1
return prs
def fetch_live_snapshot(epic_issue_num: int = 949) -> dict[str, Any]:
token = TOKEN_PATH.read_text(encoding="utf-8").strip()
headers_options = _auth_headers(token)
epic = api_get(f"/repos/{REPO}/issues/{epic_issue_num}", headers_options=headers_options)
comments = api_get(f"/repos/{REPO}/issues/{epic_issue_num}/comments", headers_options=headers_options)
child_numbers = [n for n in extract_issue_numbers(epic.get("body") or "") if n != epic_issue_num]
decomposition_numbers = [
n
for comment in comments
for n in extract_issue_numbers(comment.get("body") or "")
if n not in child_numbers and n != epic_issue_num
]
open_prs = fetch_open_prs(headers_options=headers_options)
children = []
for number in child_numbers:
issue = api_get(f"/repos/{REPO}/issues/{number}", headers_options=headers_options)
matching_prs = [
{
"number": pr["number"],
"title": pr["title"],
"head": pr.get("head", {}).get("ref", ""),
"url": pr["html_url"],
}
for pr in open_prs
if issue_pr_matches(pr, number)
]
children.append(
{
"number": issue["number"],
"title": issue["title"],
"state": issue["state"],
"html_url": issue["html_url"],
"open_prs": matching_prs,
}
)
decomposition_issues = []
for number in decomposition_numbers:
issue = api_get(f"/repos/{REPO}/issues/{number}", headers_options=headers_options)
decomposition_issues.append(
{
"number": issue["number"],
"title": issue["title"],
"state": issue["state"],
"html_url": issue["html_url"],
}
)
return {
"generated_at": datetime.now(timezone.utc).isoformat(),
"repo": REPO,
"epic": {
"number": epic["number"],
"title": epic["title"],
"state": epic["state"],
"html_url": epic["html_url"],
},
"children": children,
"decomposition_issues": decomposition_issues,
}
def summarize_snapshot(snapshot: dict[str, Any]) -> dict[str, int]:
children = snapshot.get("children", [])
open_children = [issue for issue in children if issue.get("state") == "open"]
closed_children = [issue for issue in children if issue.get("state") == "closed"]
open_with_pr = [issue for issue in open_children if issue.get("open_prs")]
open_without_pr = [issue for issue in open_children if not issue.get("open_prs")]
return {
"total_children": len(children),
"open_children": len(open_children),
"closed_children": len(closed_children),
"open_with_pr": len(open_with_pr),
"open_without_pr": len(open_without_pr),
}
def render_markdown(snapshot: dict[str, Any]) -> str:
epic = snapshot["epic"]
children = snapshot.get("children", [])
summary = summarize_snapshot(snapshot)
open_with_pr = [issue for issue in children if issue.get("state") == "open" and issue.get("open_prs")]
open_without_pr = [issue for issue in children if issue.get("state") == "open" and not issue.get("open_prs")]
decomposition = snapshot.get("decomposition_issues", [])
lines = [
f"# Morning Review Packet Status — #{epic['number']}",
"",
f"Generated: {snapshot.get('generated_at', '')}",
f"Epic: [{epic['title']}]({epic.get('html_url', '')})",
"",
"## Summary",
"",
f"- Child QA issues tracked: {summary['total_children']}",
f"- Open child issues: {summary['open_children']}",
f"- Closed child issues: {summary['closed_children']}",
f"- Open child issues already backed by PRs: {summary['open_with_pr']}",
f"- Open child issues still unowned on forge: {summary['open_without_pr']}",
"",
"## Child QA Matrix",
"",
"| Issue | State | Open PRs | Title |",
"|------:|-------|----------|-------|",
]
for issue in children:
rendered_prs = []
for pr in issue.get("open_prs", []):
pr_num = pr.get("number", "?")
pr_url = pr.get("url") or pr.get("html_url") or ""
rendered_prs.append(f"[#{pr_num}]({pr_url})" if pr_url else f"#{pr_num}")
pr_text = ", ".join(rendered_prs) or ""
lines.append(
f"| #{issue['number']} | {issue['state']} | {pr_text} | {issue['title']} |"
)
lines.extend([
"",
"## Drift Signals",
"",
"forge/main is still catching up to the upstream packet.",
])
if open_with_pr:
lines.append("")
lines.append("Active PR-backed child lanes:")
for issue in open_with_pr:
pr_numbers = ", ".join(f"#{pr['number']}" for pr in issue.get("open_prs", []))
lines.append(f"- #{issue['number']} -> {pr_numbers} ({issue['title']})")
if open_without_pr:
lines.extend([
"",
"## Unowned Open QA Issues",
"",
])
for issue in open_without_pr:
lines.append(f"- #{issue['number']} {issue['title']}")
if decomposition:
lines.extend([
"",
"## Decomposition Follow-Ups",
"",
])
for issue in decomposition:
lines.append(f"- #{issue['number']} [{issue['state']}] {issue['title']}")
lines.extend([
"",
"## Conclusion",
"",
"Refs #949 only. This epic remains open until every child QA issue has a truthful PASS/FAIL outcome, attached evidence, and any upstream/main versus forge/main drift is resolved or explicitly documented.",
"",
"## Regeneration",
"",
"```bash",
"python3 scripts/morning_review_packet_status.py --fetch-live --json-out docs/morning-review-packet-2026-04-21.snapshot.json --markdown-out docs/morning-review-packet-2026-04-21-status.md",
"```",
])
return "\n".join(lines) + "\n"
def write_json(path: Path, data: dict[str, Any]) -> None:
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
def main() -> None:
parser = argparse.ArgumentParser(description="Generate grounded status docs for epic #949")
parser.add_argument("--fetch-live", action="store_true", help="Fetch the current packet state from Forge")
parser.add_argument("--snapshot", type=Path, help="Read a local JSON snapshot instead of hitting the API")
parser.add_argument("--json-out", type=Path, default=DEFAULT_JSON_OUT, help="Path to write JSON snapshot")
parser.add_argument("--markdown-out", type=Path, default=DEFAULT_MARKDOWN_OUT, help="Path to write markdown report")
args = parser.parse_args()
if args.fetch_live or not args.snapshot:
snapshot = fetch_live_snapshot()
else:
snapshot = json.loads(args.snapshot.read_text(encoding="utf-8"))
write_json(args.json_out, snapshot)
args.markdown_out.parent.mkdir(parents=True, exist_ok=True)
args.markdown_out.write_text(render_markdown(snapshot), encoding="utf-8")
print(args.markdown_out)
if __name__ == "__main__":
main()

View File

@@ -1,94 +0,0 @@
"""Tests for the morning review packet status report generator."""
from __future__ import annotations
import importlib.util
from pathlib import Path
SCRIPT_PATH = Path(__file__).resolve().parents[1] / "scripts" / "morning_review_packet_status.py"
DOC_PATH = Path(__file__).resolve().parents[1] / "docs" / "morning-review-packet-2026-04-21-status.md"
def load_module():
assert SCRIPT_PATH.exists(), f"missing status script: {SCRIPT_PATH}"
spec = importlib.util.spec_from_file_location("morning_review_packet_status_test", SCRIPT_PATH)
module = importlib.util.module_from_spec(spec)
assert spec.loader is not None
spec.loader.exec_module(module)
return module
def sample_snapshot():
return {
"epic": {"number": 949, "title": "Morning review packet", "state": "open"},
"children": [
{
"number": 950,
"title": "Verify AI Gateway provider UX + attribution headers",
"state": "open",
"open_prs": [],
},
{
"number": 954,
"title": "Verify maps skill guest_house / camp_site / bakery expansion",
"state": "open",
"open_prs": [
{"number": 1021, "head": "fix/954", "title": "feat: sync maps skill and verify guest_house/camp_site/bakery (#954)"}
],
},
{
"number": 961,
"title": "Verify web dashboard update/restart action buttons",
"state": "closed",
"open_prs": [],
},
],
"decomposition_issues": [
{"number": 965, "title": "Phase 1: Landscape Analysis & Scaffolding", "state": "open"},
{"number": 967, "title": "Phase 3: Poka-yoke Integration & Fleet Verification", "state": "closed"},
],
}
def test_extract_child_issue_numbers_from_epic_body():
module = load_module()
body = """
- [ ] #950 one
- [ ] #951 two
- [ ] #962 three
"""
assert module.extract_issue_numbers(body) == [950, 951, 962]
def test_summarize_snapshot_counts_open_closed_and_pr_backing():
module = load_module()
summary = module.summarize_snapshot(sample_snapshot())
assert summary["total_children"] == 3
assert summary["open_children"] == 2
assert summary["closed_children"] == 1
assert summary["open_with_pr"] == 1
assert summary["open_without_pr"] == 1
def test_render_markdown_includes_issue_matrix_and_drift_sections():
module = load_module()
md = module.render_markdown(sample_snapshot())
assert "# Morning Review Packet Status — #949" in md
assert "## Child QA Matrix" in md
assert "#950" in md
assert "#954" in md
assert "#1021" in md
assert "## Unowned Open QA Issues" in md
assert "## Drift Signals" in md
assert "forge/main is still catching up to the upstream packet" in md
def test_committed_status_doc_exists_and_mentions_live_examples():
assert DOC_PATH.exists(), f"missing generated status doc: {DOC_PATH}"
text = DOC_PATH.read_text(encoding="utf-8")
assert "# Morning Review Packet Status — #949" in text
assert "#954" in text
assert "#1021" in text
assert "#950" in text