docs: add human confirmation firewall research report

2026-04-22 11:22:24 -04:00
4 changed files with 515 additions and 850 deletions
--- a/docs/review_packets/hermes-harness-2026-04-21.md
+++ b/docs/review_packets/hermes-harness-2026-04-21.md
@@ -1,387 +0,0 @@
-# Morning Review Packet
-
-Source epic: [EPIC: Morning review packet — Hermes harness features landed 2026-04-21](https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/949)
-
-## Epic context
-
-EPIC: Morning review packet — Hermes harness features landed 2026-04-21
-
-Source: git log on upstream/main since 2026-04-21 00:00 EDT, plus the current local branch `burn/921-poka-yoke-hardcoded-paths` for the branch-only path-guard work.
-
-Important review note:
- Validate upstream-landed features on `upstream/main` or a synced branch.
- Validate the path-guard work on `burn/921-poka-yoke-hardcoded-paths`.
-
-This epic is a morning-review packet: one QA issue per feature cluster, each with concrete acceptance criteria and targeted tests or manual checks.
-
-## Success criteria
- [ ] Every issue has a clear PASS / FAIL outcome.
- [ ] Test output or manual evidence is attached to each issue.
- [ ] Any drift between upstream/main and forge/main is called out explicitly.
-
-## Sub-issues
-### Upstream/main features landed 2026-04-21
- [ ] #950 [QA] Verify AI Gateway provider UX + attribution headers
- [ ] #951 [QA] Verify transport abstraction + AnthropicTransport wiring
- [ ] #952 [QA] Verify CLI voice beep toggle
- [ ] #953 [QA] Verify bundled skill scripts run out of the box
- [ ] #954 [QA] Verify maps skill guest_house / camp_site / bakery expansion
- [ ] #955 [QA] Verify KittenTTS local provider end-to-end
- [ ] #956 [QA] Verify numbered keyboard shortcuts for approval + clarify prompts
- [ ] #957 [QA] Verify optional adversarial-ux-test skill catalog flow
- [ ] #958 [QA] Verify /usage account limits in CLI + gateway
- [ ] #959 [QA] Verify OpenCode-Go curated catalog additions
- [ ] #960 [QA] Verify patch 'did you mean?' suggestions
- [ ] #961 [QA] Verify web dashboard update/restart action buttons
-
-### Local branch-only work
- [ ] #962 [QA] Verify hardcoded-home path guard on burn/921 branch
-
-## Summary
-
-| Issue | State | Commits | Tests |
-| --- | --- | --- | --- |
-| #950 | open | 5 | 2 |
-| #951 | open | 2 | 2 |
-| #952 | open | 1 | 1 |
-| #953 | open | 1 | 2 |
-| #954 | open | 1 | 0 |
-| #955 | open | 2 | 1 |
-| #956 | open | 1 | 0 |
-| #957 | open | 1 | 0 |
-| #958 | open | 2 | 2 |
-| #959 | open | 1 | 1 |
-| #960 | open | 2 | 1 |
-| #961 | closed | 1 | 0 |
-| #962 | closed | 1 | 1 |
-
-## #950 — [QA] Verify AI Gateway provider UX + attribution headers
-
-State: open
-URL: https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/950
-
-### Branch / checkout
- Validate on `upstream/main` or an equivalent synced checkout.
-
-### Commits
- `b11753879` — attribution default_headers for ai-gateway provider
- `700437440` — curated picker with live pricing
- `ac26a460f` — promote ai-gateway in provider picker ordering
- `5bb2d11b0` — auto-promote free Moonshot models
- `29f57ec95` — Vercel deep-link for API key creation
-
-### Targeted tests
- `tests/hermes_cli/test_ai_gateway_models.py`
- `tests/run_agent/test_provider_attribution_headers.py`
-
-### Tasks
- [ ] Open `hermes model` and verify `ai-gateway` appears near the top.
- [ ] Verify live pricing appears in the picker.
- [ ] Verify free Moonshot models are promoted.
- [ ] Trigger API-key setup flow and verify the Vercel deep link.
- [ ] Send one ai-gateway request and verify attribution headers are attached.
-
-### Acceptance criteria
- [ ] UI ordering and pricing match the landed behavior.
- [ ] Attribution headers are present on ai-gateway requests.
- [ ] Targeted tests pass.
-
-## #951 — [QA] Verify transport abstraction + AnthropicTransport wiring
-
-State: open
-URL: https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/951
-
-### Branch / checkout
- Validate on `upstream/main` or an equivalent synced checkout.
-
-### Commits
- `7ab5eebd0` — transport types + Anthropic normalize migration
- `731f4fbae` — transport ABC + AnthropicTransport wired to all paths
-
-### Targeted tests
- `tests/agent/transports/test_types.py`
- `tests/agent/test_anthropic_normalize_v2.py`
-
-### Tasks
- [ ] Verify plain-text Anthropic responses normalize correctly.
- [ ] Verify tool-call responses preserve IDs, names, and arguments.
- [ ] Verify reasoning/thinking is preserved separately from visible content.
- [ ] Verify finish_reason mapping remains correct across paths.
-
-### Acceptance criteria
- [ ] Normalized response shape is stable.
- [ ] Tool-call and reasoning payloads survive normalization.
- [ ] Targeted tests pass.
-
-## #952 — [QA] Verify CLI voice beep toggle
-
-State: open
-URL: https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/952
-
-### Branch / checkout
- Validate on `upstream/main` or an equivalent synced checkout.
-
-### Commits
- `b48ea41d2` — voice: add CLI beep toggle
-
-### Targeted tests
- `tests/tools/test_voice_cli_integration.py`
-
-### Tasks
- [ ] Enable the beep option in config and confirm voice mode emits the beep.
- [ ] Disable the option and confirm the same path is silent.
- [ ] Verify voice mode still strips markdown before speech output.
- [ ] Verify voice mode does not pollute conversation history with TTS-only text.
-
-### Acceptance criteria
- [ ] Beep behavior is actually toggled by config.
- [ ] Existing voice/TTS integration behavior is not regressed.
- [ ] Targeted tests pass.
-
-## #953 — [QA] Verify bundled skill scripts run out of the box
-
-State: open
-URL: https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/953
-
-### Branch / checkout
- Validate on `upstream/main` or an equivalent synced checkout.
-
-### Commits
- `328223576` — make bundled skill scripts runnable out of the box
-
-### Targeted tests
- `tests/agent/test_skill_commands.py`
- `tests/tools/test_local_shell_init.py`
-
-### Tasks
- [ ] Pick a bundled skill that ships a script and run it without manual chmod/PATH surgery.
- [ ] Verify local terminal execution resolves the installed skill script correctly.
- [ ] Verify local shell init still behaves correctly.
-
-### Acceptance criteria
- [ ] Bundled skill scripts execute from the installed skill location with no manual prep.
- [ ] Local shell init remains healthy.
- [ ] Targeted tests pass.
-
-## #954 — [QA] Verify maps skill guest_house / camp_site / bakery expansion
-
-State: open
-URL: https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/954
-
-### Branch / checkout
- Validate on `upstream/main` or an equivalent synced checkout.
-
-### Commits
- `c5a814b23` — maps: add guest_house, camp_site, and dual-key bakery lookup
-
-### Tasks
- [ ] Use the maps skill to search for a guest house in a known populated area.
- [ ] Use the maps skill to search for a camp site in a known populated area.
- [ ] Use the maps skill to search for a bakery and verify both supported keys resolve correctly.
- [ ] Confirm results are sensible and non-empty.
-
-### Acceptance criteria
- [ ] All three place types resolve correctly.
- [ ] Bakery lookup works through both supported keys.
- [ ] Manual evidence is attached in the issue.
-
-## #955 — [QA] Verify KittenTTS local provider end-to-end
-
-State: open
-URL: https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/955
-
-### Branch / checkout
- Validate on `upstream/main` or an equivalent synced checkout.
-
-### Commits
- `1830ebfc5` — add KittenTTS provider
- `2d7ff9c5b` — complete KittenTTS integration across tools/setup/docs/tests
-
-### Targeted tests
- `tests/tools/test_tts_kittentts.py`
-
-### Tasks
- [ ] Configure TTS to use `kittentts`.
- [ ] Generate speech to `.wav` and verify playable output.
- [ ] Verify voice / speed / cleaned text are passed correctly.
- [ ] Generate repeated requests and verify model caching behavior.
- [ ] Generate a non-wav output and verify ffmpeg conversion path.
- [ ] Verify missing-package behavior returns a helpful error.
-
-### Acceptance criteria
- [ ] KittenTTS works end-to-end when installed.
- [ ] Failure mode is operator-friendly when not installed.
- [ ] Targeted tests pass.
-
-## #956 — [QA] Verify numbered keyboard shortcuts for approval + clarify prompts
-
-State: open
-URL: https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/956
-
-### Branch / checkout
- Validate on `upstream/main` or an equivalent synced checkout.
-
-### Commits
- `d1ed6f4fb` — CLI: add numbered keyboard shortcuts to approval and clarify prompts
-
-### Tasks
- [ ] Trigger an approval prompt and choose an option with number keys.
- [ ] Trigger a clarify prompt and choose an option with number keys.
- [ ] Verify the correct option is submitted both times.
- [ ] Verify normal keyboard navigation still works.
-
-### Acceptance criteria
- [ ] Number-key selection works for both prompt types.
- [ ] Legacy keyboard navigation is not broken.
- [ ] Manual evidence is attached in the issue.
-
-## #957 — [QA] Verify optional adversarial-ux-test skill catalog flow
-
-State: open
-URL: https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/957
-
-### Branch / checkout
- Validate on `upstream/main` or an equivalent synced checkout.
-
-### Commits
- `e50e7f11b` — skills: add adversarial-ux-test optional skill
-
-### Tasks
- [ ] Verify the optional skill appears in the optional skill catalog.
- [ ] Install or enable the skill.
- [ ] Load it successfully through Hermes.
- [ ] Disable or remove it and verify catalog state updates cleanly.
-
-### Acceptance criteria
- [ ] Catalog listing is correct.
- [ ] Install / load / disable lifecycle works cleanly.
- [ ] Manual evidence is attached in the issue.
-
-## #958 — [QA] Verify /usage account limits in CLI + gateway
-
-State: open
-URL: https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/958
-
-### Branch / checkout
- Validate on `upstream/main` or an equivalent synced checkout.
-
-### Commits
- `8a11b0a20` — per-provider account limits module
- `bcc5d7b67` — append account limits section in CLI and gateway
-
-### Targeted tests
- `tests/test_account_usage.py`
- `tests/gateway/test_usage_command.py`
-
-### Tasks
- [ ] Run `/usage` in CLI for a provider with account limits.
- [ ] Verify provider, remaining quota, total limit, and reset window render correctly.
- [ ] Run `/usage` through the gateway and verify the same section appears.
- [ ] Verify zero-value cache read/write sections stay hidden when appropriate.
-
-### Acceptance criteria
- [ ] CLI and gateway both show the landed account-limits section correctly.
- [ ] Targeted tests pass.
-
-## #959 — [QA] Verify OpenCode-Go curated catalog additions
-
-State: open
-URL: https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/959
-
-### Branch / checkout
- Validate on `upstream/main` or an equivalent synced checkout.
-
-### Commits
- `4fea1769d` — opencode-go: add Kimi K2.6 and Qwen3.5/3.6 Plus to curated catalog
-
-### Targeted tests
- `tests/hermes_cli/test_opencode_go_in_model_list.py`
-
-### Tasks
- [ ] With valid OpenCode-Go credentials, open `hermes model`.
- [ ] Verify Kimi K2.6 appears.
- [ ] Verify Qwen 3.5 Plus and 3.6 Plus appear.
- [ ] Unset credentials and verify the provider/catalog hides correctly.
-
-### Acceptance criteria
- [ ] New curated models are present when credentials exist.
- [ ] Catalog visibility still respects credential gating.
- [ ] Targeted tests pass.
-
-## #960 — [QA] Verify patch 'did you mean?' suggestions
-
-State: open
-URL: https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/960
-
-### Branch / checkout
- Validate on `upstream/main` or an equivalent synced checkout.
-
-### Commits
- `15abf4ed8` — add `did you mean?` feedback when patch fails to match
- `5e6427a42` — gate it to true no-match cases and extend to v4a / skill_manage
-
-### Targeted tests
- `tests/tools/test_fuzzy_match.py`
-
-### Tasks
- [ ] Intentionally run a replace/patch with a near-miss `old_string`.
- [ ] Verify the tool suggests a useful nearby line/context.
- [ ] Verify suggestions only appear on true no-match failures.
- [ ] Verify the behavior also works via file tools, v4a patching, and skill_manage.
-
-### Acceptance criteria
- [ ] Suggestion quality is helpful, not noisy.
- [ ] Suggestions are correctly gated to no-match cases.
- [ ] Targeted tests pass.
-
-## #961 — [QA] Verify web dashboard update/restart action buttons
-
-State: closed
-URL: https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/961
-
-### Branch / checkout
- Validate on `upstream/main` or an equivalent synced checkout.
-
-### Commits
- `fc21c1420` — add buttons to update Hermes and restart gateway
-
-### Files touched
- `web/src/pages/StatusPage.tsx`
- `web/src/lib/api.ts`
- `web/src/i18n/en.ts`
-
-### Tasks
- [ ] Open the Web UI status page and verify both buttons are present.
- [ ] Click Restart Gateway in a safe environment and verify running/output/success-or-failure states render.
- [ ] Click Update Hermes and verify the same action lifecycle.
- [ ] Verify the page remains responsive while actions are running.
-
-### Acceptance criteria
- [ ] Both action buttons are present and wired.
- [ ] Action status polling and result rendering work end-to-end.
- [ ] Manual evidence is attached in the issue.
-
-## #962 — [QA] Verify hardcoded-home path guard on burn/921 branch
-
-State: closed
-URL: https://forge.alexanderwhitestone.com/Timmy_Foundation/hermes-agent/issues/962
-
-### Branch / checkout
- Validate specifically on `burn/921-poka-yoke-hardcoded-paths` (not upstream/main).
-
-### Commits
- `5dcb90531` — Poka-yoke: prevent hardcoded home-directory paths
-
-### Targeted tests
- `tests/test_path_guard.py`
-
-### Tasks
- [ ] Verify hardcoded `/Users/...` paths are rejected.
- [ ] Verify hardcoded `~/.hermes/...` paths are rejected in guarded contexts.
- [ ] Verify valid relative paths still pass.
- [ ] Verify appropriate absolute paths still pass where intended.
- [ ] Verify linting catches violations in non-test files.
-
-### Acceptance criteria
- [ ] Guard blocks the dangerous patterns and preserves allowed ones.
- [ ] Targeted tests pass.
--- a/research_human_confirmation_firewall.md
+++ b/research_human_confirmation_firewall.md
@@ -0,0 +1,515 @@
+# Human Confirmation Firewall: Research Report
+## Implementation Patterns for Hermes Agent
+
+**Issue:** #878  
+**Parent:** #659  
+**Priority:** P0  
+**Scope:** Human-in-the-loop safety patterns for tool calls, crisis handling, and irreversible actions
+
+---
+
+## Executive Summary
+
+Hermes already has a partial human confirmation firewall, but it is narrow.
+
+Current repo state shows:
+- a real **pre-execution gate** for dangerous terminal commands in `tools/approval.py`
+- a partial **confidence-threshold path** via `_smart_approve()` in `tools/approval.py`
+- gateway support for blocking approval resolution in `gateway/run.py`
+
+What is still missing is the core recommendation from this research issue:
+- **confidence scoring on all tool calls**, not just terminal commands that already matched a dangerous regex
+- a **hard pre-execution human gate for crisis interventions**, especially any action that would auto-respond to suicidal content
+- a consistent way to classify actions into:
+  1. pre-execution gate
+  2. post-execution review
+  3. confidence-threshold execution
+
+Recommendation:
+- use **Pattern 1: Pre-Execution Gate** for crisis interventions and irreversible/high-impact actions
+- use **Pattern 3: Confidence Threshold** for normal operations
+- reserve **Pattern 2: Post-Execution Review** only for low-risk and reversible actions
+
+The next implementation step should be a **tool-call risk assessment layer** that runs before dispatch in `model_tools.handle_function_call()`, assigns a score and pattern to every tool call, and routes only the highest-risk calls into mandatory human confirmation.
+
+---
+
+## 1. The Three Proven Patterns
+
+### Pattern 1: Pre-Execution Gate
+
+Definition:
+- halt before execution
+- show the proposed action to the human
+- require explicit approval or denial
+
+Best for:
+- destructive actions
+- irreversible side effects
+- crisis interventions
+- actions that affect another human's safety, money, infrastructure, or private data
+
+Strengths:
+- strongest safety guarantee
+- simplest audit story
+- prevents the most catastrophic failure mode: acting first and apologizing later
+
+Weaknesses:
+- adds latency
+- creates operator burden if overused
+- should not be applied to every ordinary tool call
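+
+A minimal sketch of Pattern 1, assuming a hypothetical `ProposedAction` record and caller-supplied `execute` / `ask_human` callbacks (none of these names exist in Hermes today). What it shows is ordering: the human decision happens first, and a denial means the side effect never runs.
+
+```python
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from typing import Any, Callable
+
+
+@dataclass
+class ProposedAction:
+    """Hypothetical description of a tool call awaiting a human decision."""
+
+    tool_name: str
+    args: dict[str, Any] = field(default_factory=dict)
+
+
+def pre_execution_gate(
+    action: ProposedAction,
+    execute: Callable[[ProposedAction], Any],
+    ask_human: Callable[[ProposedAction], bool],
+) -> dict[str, Any]:
+    """Halt, show the proposed action to a human, and run it only on approval."""
+    # The side effect is unreachable until ask_human() returns True.
+    if not ask_human(action):
+        return {"status": "denied", "tool": action.tool_name}
+    return {"status": "executed", "tool": action.tool_name, "result": execute(action)}
+```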
+
+### Pattern 2: Post-Execution Review
+
+Definition:
+- execute first
+- expose result to human
+- allow rollback or follow-up correction
+
+Best for:
+- reversible operations
+- low-risk actions with fast recovery
+- tasks where human review matters but immediate execution is acceptable
+
+Strengths:
+- low friction
+- fast iteration
+- useful when rollback is practical
+
+Weaknesses:
+- unsafe for crisis or destructive actions
+- only works when rollback actually exists
+- a poor fit for external communication or life-safety contexts
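+
+A matching sketch of Pattern 2, again with hypothetical names. `execute` returns its result plus an optional undo callable; if the human rejects the outcome, the sketch rolls back, and when no rollback exists it raises instead of pretending the review was meaningful, which is exactly the weakness listed above.
+
+```python
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Any, Callable, Optional
+
+Undo = Optional[Callable[[], None]]  # None means the action cannot be rolled back
+
+
+@dataclass
+class ExecutedAction:
+    description: str
+    result: Any
+    undo: Undo
+    rolled_back: bool = False
+
+
+def post_execution_review(
+    description: str,
+    execute: Callable[[], tuple[Any, Undo]],
+    human_accepts: Callable[[ExecutedAction], bool],
+) -> ExecutedAction:
+    """Run first, show the result to a human, and roll back on rejection."""
+    result, undo = execute()
+    done = ExecutedAction(description, result, undo)
+    if human_accepts(done):
+        return done
+    if done.undo is None:
+        # Review after the fact is useless without a rollback path.
+        raise RuntimeError(f"rejected but irreversible: {description}")
+    done.undo()
+    done.rolled_back = True
+    return done
+```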
+
+### Pattern 3: Confidence Threshold
+
+Definition:
+- compute a risk/confidence score before execution
+- auto-execute high-confidence safe actions
+- request confirmation for lower-confidence or higher-risk actions
+
+Best for:
+- mixed-risk tool ecosystems
+- day-to-day operations where always-confirm would be too expensive
+- systems with a large volume of ordinary, safe reads and edits
+
+Strengths:
+- best balance of speed and safety
+- scales across many tool types
+- allows targeted human attention where it matters most
+
+Weaknesses:
+- depends on a good scoring model
+- weak scoring creates false negatives or unnecessary prompts
+- must remain inspectable and debuggable
+
+---
+
+## 2. What Hermes Already Has
+
+### 2.1 Existing Pre-Execution Gate for Dangerous Terminal Commands
+
+`tools/approval.py` already implements a real pre-execution confirmation path for dangerous shell commands.
+
+Observed components:
+- `DANGEROUS_PATTERNS`
+- `detect_dangerous_command()`
+- `prompt_dangerous_approval()`
+- `check_dangerous_command()`
+- gateway queueing and resolution support in the same module
+
+This is already Pattern 1.
+
+Current behavior:
+- dangerous terminal commands are detected before execution
+- the user can allow once / session / always / deny
+- gateway sessions can block until approval resolves
+
+This is a strong foundation, but it is limited to a subset of terminal commands.
+
+### 2.2 Partial Confidence Threshold via Smart Approvals
+
+Hermes also already has a partial Pattern 3.
+
+Observed component:
+- `_smart_approve()` in `tools/approval.py`
+
+Current behavior:
+- only runs **after** a command has already been flagged by dangerous-pattern detection
+- uses the auxiliary LLM to decide:
+  - approve
+  - deny
+  - escalate
+
+This means Hermes has a confidence-threshold mechanism, but only for **already-flagged dangerous terminal commands**.
+
+What it does not yet do:
+- score all tool calls
+- classify non-terminal tools
+- distinguish crisis interventions from normal ops
+- produce a shared risk model across the tool surface
+
+### 2.3 Blocking Approval UX in Gateway
+
+`gateway/run.py` already routes `/approve` and `/deny` into the blocking approval path.
+
+This means the infrastructure for a true human confirmation firewall already exists in messaging contexts.
+
+That matters because the missing work is not to invent human approval from scratch.
+It is to:
+- expand the scope from dangerous shell commands to **all tool calls that matter**
+- make the routing policy explicit and inspectable
+
+---
+
+## 3. What Hermes Still Lacks
+
+### 3.1 No Universal Tool-Call Risk Assessment
+
+The current approval system is command-pattern-centric.
+It is not yet a tool-call firewall.
+
+Missing capability:
+- before dispatch, every tool call should receive a structured assessment:
+  - tool name
+  - side-effect class
+  - reversibility
+  - human-impact potential
+  - crisis relevance
+  - confidence score
+  - recommended confirmation pattern
+
+Natural insertion point:
+- `model_tools.handle_function_call()`
+
+That function already sits at the central dispatch boundary.
+It is the right place to add a pre-dispatch classifier.
+
+### 3.2 No Hard Crisis Gate for Outbound Intervention
+
+Issue #878 explicitly recommends:
+- Pattern 1 for crisis interventions
+- never auto-respond to suicidal content
+
+That recommendation is not yet codified as a global firewall rule.
+
+Missing rule:
+- if a tool call would directly intervene in a crisis context or send outward guidance in response to suicidal content, it must require explicit human confirmation before execution
+
+Examples that should hard-gate:
+- outbound `send_message` content aimed at a suicidal user
+- any future tool that places calls, escalates emergencies, or contacts third parties about a crisis
+- any autonomous action that tells a person whether they should take a life-safety step
+
+### 3.3 No First-Class Post-Execution Review Policy
+
+Hermes has approval and denial, but it does not yet have a formal policy for when Pattern 2 is acceptable.
+
+Without a policy, post-execution review tends to get used implicitly rather than intentionally.
+
+That is risky.
+
+Hermes should define Pattern 2 narrowly:
+- only for actions that are both low-risk and reversible
+- only when the system can show the human exactly what happened
+- never for crisis, finance, destructive config, or sensitive comms
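+
+The narrow policy above fits in a single predicate. This sketch assumes the caller already knows whether an action is low-risk, reversible, and displayable; the labels in `NEVER_POST_REVIEW` are hypothetical names for the four exclusions listed above.
+
+```python
+from __future__ import annotations
+
+# Hypothetical labels for the four exclusions above; Hermes has no such constants today.
+NEVER_POST_REVIEW = frozenset({"crisis", "finance", "destructive_config", "sensitive_comms"})
+
+
+def post_review_allowed(
+    *,
+    low_risk: bool,
+    reversible: bool,
+    can_show_effect: bool,
+    categories: frozenset[str] | set[str] = frozenset(),
+) -> bool:
+    """Return True only when every Pattern 2 precondition holds."""
+    if NEVER_POST_REVIEW & set(categories):
+        # An excluded category vetoes Pattern 2 regardless of the other flags.
+        return False
+    return low_risk and reversible and can_show_effect
+```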
+
+---
+
+## 4. Recommended Architecture for Hermes
+
+### 4.1 Add a Tool-Call Assessment Layer
+
+Add a pre-dispatch assessment object for every tool call.
+
+Suggested shape:
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class ToolCallAssessment:
+    tool_name: str
+    risk_score: float          # 0.0 to 1.0
+    confidence: float          # confidence in the assessment itself
+    pattern: str               # pre_execution_gate | post_execution_review | confidence_threshold
+    requires_human: bool
+    reasons: list[str]
+    reversible: bool
+    crisis_sensitive: bool
+```
+
+Suggested execution point:
+- inside `model_tools.handle_function_call()` before `orchestrator.dispatch()`
+
+Why here:
+- one place covers all tools
+- one place can emit traces
+- one place can remain model-agnostic
+- one place lets plugins observe or override the assessment
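+
+A sketch of what a single dispatch boundary buys. It does not reproduce the real signature of `model_tools.handle_function_call()`; `assess`, `dispatch`, and `confirm` are hypothetical callables standing in for the scorer, `orchestrator.dispatch()`, and the existing CLI/gateway approval prompt. Any object with `requires_human` and `reasons` attributes, such as the `ToolCallAssessment` above, satisfies the `Assessment` protocol.
+
+```python
+from __future__ import annotations
+
+from typing import Any, Callable, Protocol
+
+
+class Assessment(Protocol):
+    requires_human: bool
+    reasons: list[str]
+
+
+def guarded_call(
+    name: str,
+    args: dict[str, Any],
+    assess: Callable[[str, dict[str, Any]], Assessment],
+    dispatch: Callable[[str, dict[str, Any]], Any],
+    confirm: Callable[[str, Assessment], bool],
+) -> Any:
+    """Assess every tool call at one boundary, then gate or dispatch it."""
+    assessment = assess(name, args)
+    if assessment.requires_human and not confirm(name, assessment):
+        # Return a structured refusal the model can read instead of raising.
+        return {"error": f"{name} was denied by a human reviewer", "reasons": assessment.reasons}
+    return dispatch(name, args)
+```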
+
+### 4.2 Classify Tool Calls by Side-Effect Class
+
+Suggested first-pass taxonomy:
+
+#### A. Read-only
+Examples:
+- `read_file`
+- `search_files`
+- `browser_snapshot`
+- `browser_console` read-only inspection
+
+Pattern:
+- confidence threshold
+- almost always auto-execute
+- human confirmation normally unnecessary
+
+#### B. Local reversible edits
+Examples:
+- `patch`
+- `write_file`
+- `todo`
+
+Pattern:
+- confidence threshold
+- human confirmation only when risk score rises because of path sensitivity or scope breadth
+
+#### C. External side effects
+Examples:
+- `send_message`
+- `cronjob`
+- `delegate_task`
+- smart-home actuation tools
+
+Pattern:
+- confidence threshold by default
+- pre-execution gate when score exceeds threshold or when context is sensitive
+
+#### D. Critical / destructive / crisis-sensitive
+Examples:
+- dangerous `terminal`
+- financial actions
+- deletion / kill / restart / deployment in sensitive paths
+- outbound crisis intervention
+
+Pattern:
+- pre-execution gate
+- never auto-execute on confidence alone
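+
+The taxonomy can start as a plain lookup table. This sketch uses the tool names listed above; the `dangerous` and `crisis` flags are assumed to come from existing detection (for example, `detect_dangerous_command()` in `tools/approval.py`), and treating unknown tools as class C is an assumption meant to keep new tools from defaulting to read-only.
+
+```python
+from __future__ import annotations
+
+from enum import Enum
+
+
+class SideEffect(Enum):
+    READ_ONLY = "A"
+    LOCAL_REVERSIBLE = "B"
+    EXTERNAL = "C"
+    CRITICAL = "D"
+
+
+TOOL_CLASSES: dict[str, SideEffect] = {
+    "read_file": SideEffect.READ_ONLY,
+    "search_files": SideEffect.READ_ONLY,
+    "browser_snapshot": SideEffect.READ_ONLY,
+    "patch": SideEffect.LOCAL_REVERSIBLE,
+    "write_file": SideEffect.LOCAL_REVERSIBLE,
+    "todo": SideEffect.LOCAL_REVERSIBLE,
+    "send_message": SideEffect.EXTERNAL,
+    "cronjob": SideEffect.EXTERNAL,
+    "delegate_task": SideEffect.EXTERNAL,
+}
+
+
+def classify(tool_name: str, *, dangerous: bool = False, crisis: bool = False) -> SideEffect:
+    """Look up the base class; dangerous or crisis-sensitive calls escalate to D."""
+    if dangerous or crisis:
+        return SideEffect.CRITICAL
+    # Unknown tools fall into C rather than A so they are never waved through.
+    return TOOL_CLASSES.get(tool_name, SideEffect.EXTERNAL)
+
+
+def default_pattern(side_effect: SideEffect) -> str:
+    """Class D always gates; A through C start on the confidence threshold."""
+    if side_effect is SideEffect.CRITICAL:
+        return "pre_execution_gate"
+    return "confidence_threshold"
+```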
+
+### 4.3 Crisis Override Rule
+
+Add a hard override:
+
+```text
+If tool call is crisis-sensitive AND outbound or irreversible:
+    requires_human = True
+    pattern = pre_execution_gate
+```
+
+This is the most important rule in the issue.
+
+The model may draft the message.
+The human must confirm before the system sends it.
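+
+The same rule in Python, as a sketch over a hypothetical frozen `Routing` record (a subset of the `ToolCallAssessment` fields plus an assumed `outbound` flag). Applying it as the last step, after any scoring, means no score can relax it.
+
+```python
+from __future__ import annotations
+
+from dataclasses import dataclass, replace
+
+
+@dataclass(frozen=True)
+class Routing:
+    pattern: str
+    requires_human: bool
+    crisis_sensitive: bool
+    outbound: bool
+    reversible: bool
+
+
+def apply_crisis_override(routing: Routing) -> Routing:
+    """Crisis-sensitive AND (outbound OR irreversible) -> mandatory pre-execution gate."""
+    if routing.crisis_sensitive and (routing.outbound or not routing.reversible):
+        return replace(routing, pattern="pre_execution_gate", requires_human=True)
+    return routing
+```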
+
+### 4.4 Use Confidence Threshold for Normal Ops
+
+For non-crisis operations, use Pattern 3.
+
+Suggested logic:
+- low risk + high assessment confidence -> auto-execute
+- medium risk or medium confidence -> ask human
+- high risk -> always ask human
+
+Key point:
+- confidence is not just "how sure the LLM is"
+- confidence should combine:
+  - tool type certainty
+  - argument clarity
+  - path sensitivity
+  - external side effects
+  - crisis indicators
+
+---
+
+## 5. Recommended Initial Scoring Factors
+
+A simple initial scorer is enough.
+It does not need to be fancy.
+
+Suggested factors:
+
+### 5.1 Tool class risk
+- read-only tools: very low base risk
+- local mutation tools: moderate base risk
+- external communication / automation tools: higher base risk
+- shell execution: variable, often high
+
+### 5.2 Target sensitivity
+Examples:
+- `/tmp` or local scratch paths -> lower
+- repo files under git -> medium
+- system config, credentials, secrets, gateway lifecycle -> high
+- human-facing channels -> high if message content is sensitive
+
+### 5.3 Reversibility
+- reversible -> lower
+- difficult but possible to undo -> medium
+- practically irreversible -> high
+
+### 5.4 Human-impact content
+- no direct human impact -> low
+- administrative impact -> medium
+- crisis / safety / emotional intervention -> critical
+
+### 5.5 Context certainty
+- arguments are explicit and narrow -> higher confidence
+- arguments are vague, inferred, or broad -> lower confidence
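+
+One way to turn the five factors into numbers, as a sketch only: the bucket names and numeric levels below are hypothetical placeholders for Phase 1 calibration, not values this report or Hermes defines. Risk takes the worst factor rather than an average, so a single critical dimension (such as crisis content) cannot be diluted by otherwise benign ones; confidence comes from 5.5.
+
+```python
+from __future__ import annotations
+
+# Hypothetical levels for the qualitative buckets in 5.1-5.4; tune them from traces.
+TOOL_CLASS_RISK = {"read_only": 0.05, "local_mutation": 0.35, "shell": 0.5, "external": 0.6}
+TARGET_RISK = {"scratch": 0.1, "repo": 0.4, "human_channel": 0.7, "system": 0.9}
+REVERSIBILITY_RISK = {"reversible": 0.1, "hard_to_undo": 0.5, "irreversible": 0.9}
+IMPACT_RISK = {"none": 0.0, "administrative": 0.4, "crisis": 1.0}
+
+
+def score_call(
+    tool_class: str,
+    target: str,
+    reversibility: str,
+    impact: str,
+    *,
+    args_explicit: bool,
+) -> tuple[float, float]:
+    """Return (risk_score, confidence), both in [0.0, 1.0]."""
+    risk = max(
+        TOOL_CLASS_RISK[tool_class],
+        TARGET_RISK[target],
+        REVERSIBILITY_RISK[reversibility],
+        IMPACT_RISK[impact],
+    )
+    # 5.5: explicit, narrow arguments mean the assessment is more trustworthy.
+    confidence = 0.9 if args_explicit else 0.5
+    return risk, confidence
+```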
+
+---
+
+## 6. Implementation Plan
+
+### Phase 1: Assessment Without Behavior Change
+
+Goal:
+- score all tool calls
+- log assessment decisions
+- emit traces for review
+- do not yet block new tool categories
+
+Files to touch:
+- `tools/approval.py`
+- `model_tools.py`
+- tests for assessment coverage
+
+Output:
+- risk/confidence trace for every tool call
+- pattern recommendation for every tool call
+
+Why first:
+- lets us calibrate before changing runtime behavior
+- avoids breaking existing workflows blindly
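+
+Phase 1 can be a thin shadow wrapper: assess, emit one JSON trace line, then dispatch unconditionally. This is a sketch with hypothetical `assess`, `dispatch`, and `emit` callables; it deliberately records only the tool name and the assessment, not the raw arguments, since arguments may carry secrets or message content.
+
+```python
+from __future__ import annotations
+
+import json
+import time
+from typing import Any, Callable
+
+
+def shadow_dispatch(
+    name: str,
+    args: dict[str, Any],
+    assess: Callable[[str, dict[str, Any]], dict[str, Any]],
+    dispatch: Callable[[str, dict[str, Any]], Any],
+    emit: Callable[[str], None],
+) -> Any:
+    """Phase 1: score and trace every call, but never block one."""
+    assessment = assess(name, args)
+    emit(json.dumps({"ts": time.time(), "tool": name, "assessment": assessment}))
+    # No enforcement yet: the recommendation is recorded, not acted on.
+    return dispatch(name, args)
+```
+
+Pointing `emit` at an append-only JSONL file would give the post-run inspection that Recommendation 4 asks for.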
+
+### Phase 2: Hard-Gate Crisis-Sensitive Outbound Actions
+
+Goal:
+- enforce Pattern 1 for crisis interventions
+
+Likely surfaces:
+- `send_message`
+- any future telephony / call / escalation tools
+- other tools with direct human intervention side effects
+
+Rule:
+- never auto-send crisis intervention content without human confirmation
+
+### Phase 3: General Confidence Threshold for Normal Ops
+
+Goal:
+- apply Pattern 3 to all tool calls
+- auto-run clearly safe actions
+- escalate ambiguous or medium-risk actions
+
+Likely thresholds:
+- `score < 0.25` -> auto
+- `0.25 <= score <= 0.60` -> confirm if confidence is weak
+- `score > 0.60` -> confirm
+- crisis-sensitive -> always confirm
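+
+The Phase 3 thresholds map onto a small routing function. The report does not define "weak" confidence, so `weak_below=0.7` is an assumed cutoff; a score of exactly 0.60 is treated as part of the middle band.
+
+```python
+from __future__ import annotations
+
+
+def route(
+    risk: float,
+    confidence: float,
+    *,
+    crisis_sensitive: bool,
+    weak_below: float = 0.7,
+) -> str:
+    """Return "auto" or "confirm" for one assessed tool call."""
+    if crisis_sensitive:
+        return "confirm"  # crisis-sensitive calls always get a human
+    if risk < 0.25:
+        return "auto"
+    if risk <= 0.60:
+        # Middle band: only ask when the assessment itself is shaky.
+        return "confirm" if confidence < weak_below else "auto"
+    return "confirm"
+```
+
+Section 4.4 also conditions low-risk auto-execution on high confidence; a stricter variant would check `confidence` in the `risk < 0.25` branch as well.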
+
+### Phase 4: Optional Post-Execution Review Lane
+
+Goal:
+- allow Pattern 2 only for explicitly reversible operations
+
+Candidate examples (to be validated):
+- low-risk message drafts saved locally, not sent
+- reversible UI actions in specific environments
+
+Important:
+- this phase is optional
+- Hermes should not rely on Pattern 2 for safety-critical flows
+
+---
+
+## 7. Verification Criteria for the Future Implementation
+
+The eventual implementation should prove all of the following:
+
+1. every tool call receives a scored assessment before dispatch
+2. crisis-sensitive outbound actions always require human confirmation
+3. dangerous terminal commands still preserve their current pre-execution gate
+4. clearly safe read-only tool calls are not slowed by unnecessary prompts
+5. assessment traces can be inspected after a run
+6. approval decisions remain session-safe across CLI and gateway contexts
+
+---
+
+## 8. Concrete Recommendations
+
+### Recommendation 1
+Do **not** replace the current dangerous-command approval path.
+Generalize above it.
+
+Why:
+- existing terminal Pattern 1 already works
+- this is the strongest piece of the current firewall
+
+### Recommendation 2
+Add a universal scorer in `model_tools.handle_function_call()`.
+
+Why:
+- that is the first point where Hermes knows the tool name and structured arguments
+- it is the cleanest place to classify all tool calls uniformly
+
+### Recommendation 3
+Treat crisis-sensitive outbound intervention as a separate safety class.
+
+Why:
+- issue #878 explicitly calls for Pattern 1 here
+- this matches Timmy's SOUL-level safety requirements
+
+### Recommendation 4
+Ship scoring traces before enforcement expansion.
+
+Why:
+- you cannot tune thresholds you cannot inspect
+- false positives will otherwise frustrate normal usage
+
+### Recommendation 5
+Use Pattern 3 as the default policy for normal operations.
+
+Why:
+- full manual confirmation on every tool call is too expensive
+- full autonomy is too risky
+- Pattern 3 is the practical middle ground
+
+---
+
+## 9. Bottom Line
+
+Hermes should implement a **two-track human confirmation firewall**:
+
+1. **Pattern 1: Pre-Execution Gate**
+   - crisis interventions
+   - destructive terminal actions
+   - irreversible or safety-critical tool calls
+
+2. **Pattern 3: Confidence Threshold**
+   - all ordinary tool calls
+   - driven by a universal tool-call assessment layer
+   - integrated at the central dispatch boundary
+
+Pattern 2 should remain optional and narrow.
+It is not the primary answer for Hermes.
+
+The repo already contains the beginnings of this system.
+The next step is not new theory.
+It is to turn the existing approval path into a true **tool-call-wide human confirmation firewall**.
+
+---
+
+## References
+
+- Issue #878 — Human Confirmation Firewall Implementation Patterns
+- Issue #659 — Critical Research Tasks
+- `tools/approval.py` — current dangerous-command approval flow and smart approvals
+- `model_tools.py` — central tool dispatch boundary
+- `gateway/run.py` — blocking approval handling for messaging sessions
--- a/scripts/morning_review_packet.py
+++ b/scripts/morning_review_packet.py
@@ -1,301 +0,0 @@
-#!/usr/bin/env python3
-"""Build a morning review packet from a Gitea epic and its child QA issues.
-
-This script fetches a parent epic plus its sub-issues, extracts the structured
-sections from each QA issue body, and renders a single markdown packet suitable
-for morning review.
-
-Usage:
-    python scripts/morning_review_packet.py --epic-number 949
-    python scripts/morning_review_packet.py --epic-number 949 --children 950-962
-    python scripts/morning_review_packet.py --epic-number 949 --output docs/review_packets/hermes-harness-2026-04-21.md
-"""
-
-from __future__ import annotations
-
-import argparse
-import json
-import os
-import re
-import urllib.request
-from dataclasses import dataclass, field
-from pathlib import Path
-from typing import Iterable
-
-DEFAULT_BASE_URL = "https://forge.alexanderwhitestone.com"
-DEFAULT_OWNER = "Timmy_Foundation"
-DEFAULT_REPO = "hermes-agent"
-DEFAULT_TOKEN_PATH = Path.home() / ".config" / "gitea" / "token"
-
-
-@dataclass(frozen=True)
-class CommitEvidence:
-    sha: str
-    summary: str
-
-
-@dataclass
-class ReviewIssue:
-    number: int
-    title: str
-    state: str
-    url: str
-    comments: int = 0
-    parent_issue: int | None = None
-    checkout_notes: list[str] = field(default_factory=list)
-    commits: list[CommitEvidence] = field(default_factory=list)
-    targeted_tests: list[str] = field(default_factory=list)
-    files_touched: list[str] = field(default_factory=list)
-    tasks: list[str] = field(default_factory=list)
-    acceptance_criteria: list[str] = field(default_factory=list)
-
-
-def parse_issue_number_spec(spec: str) -> list[int]:
-    """Parse a comma-separated issue list like ``950-952,955,962``."""
-    numbers: list[int] = []
-    seen: set[int] = set()
-    for chunk in (part.strip() for part in spec.split(",")):
-        if not chunk:
-            continue
-        if "-" in chunk:
-            start_str, end_str = (part.strip() for part in chunk.split("-", 1))
-            start = int(start_str)
-            end = int(end_str)
-            if end < start:
-                raise ValueError(f"Invalid descending issue range: {chunk}")
-            for number in range(start, end + 1):
-                if number not in seen:
-                    numbers.append(number)
-                    seen.add(number)
-        else:
-            number = int(chunk)
-            if number not in seen:
-                numbers.append(number)
-                seen.add(number)
-    return numbers
-
-
-def _parse_sections(body: str) -> dict[str, list[str]]:
-    sections: dict[str, list[str]] = {}
-    current: str | None = None
-    for raw_line in body.splitlines():
-        line = raw_line.rstrip()
-        if line.startswith("## "):
-            current = line[3:].strip()
-            sections[current] = []
-            continue
-        if current is not None:
-            sections[current].append(line)
-    return sections
-
-
-def _clean_bullet(line: str) -> str | None:
-    stripped = line.strip()
-    if not stripped:
-        return None
-    stripped = re.sub(r"^-\s*\[(?: |x|X)\]\s*", "", stripped)
-    stripped = re.sub(r"^-\s*", "", stripped)
-    return stripped.strip() or None
-
-
-def _extract_bullets(lines: Iterable[str]) -> list[str]:
-    items: list[str] = []
-    for line in lines:
-        cleaned = _clean_bullet(line)
-        if cleaned:
-            items.append(cleaned)
-    return items
-
-
-def _extract_parent_issue(body: str, sections: dict[str, list[str]]) -> int | None:
-    parent_lines = sections.get("Parent", [])
-    for line in parent_lines:
-        match = re.search(r"#(\d+)", line)
-        if match:
-            return int(match.group(1))
-    match = re.search(r"Linked to Epic\s+#(\d+)", body, flags=re.IGNORECASE)
-    if match:
-        return int(match.group(1))
-    return None
-
-
-def _extract_commits(lines: Iterable[str]) -> list[CommitEvidence]:
-    commits: list[CommitEvidence] = []
-    for item in _extract_bullets(lines):
-        match = re.match(r"`([^`]+)`\s*(.*)", item)
-        if match:
-            commits.append(CommitEvidence(sha=match.group(1).strip(), summary=match.group(2).strip()))
-        else:
-            commits.append(CommitEvidence(sha="", summary=item))
-    return commits
-
-
-def _strip_backticks(items: Iterable[str]) -> list[str]:
-    cleaned: list[str] = []
-    for item in items:
-        cleaned.append(item.replace("`", "").strip())
-    return cleaned
-
-
-def discover_child_issue_numbers(epic_body: str) -> list[int]:
-    """Discover sub-issue numbers from an epic body."""
-    sections = _parse_sections(epic_body)
-    sub_lines = sections.get("Sub-issues")
-    if not sub_lines:
-        return []
-    numbers: list[int] = []
-    seen: set[int] = set()
-    for line in sub_lines:
-        for match in re.finditer(r"#(\d+)", line):
-            number = int(match.group(1))
-            if number not in seen:
-                numbers.append(number)
-                seen.add(number)
-    return numbers
-
-
-def parse_child_issue(issue: dict) -> ReviewIssue:
-    body = issue.get("body") or ""
-    sections = _parse_sections(body)
-    commit_lines = sections.get("Commits landed today", []) or sections.get("Commit landed today", [])
-
-    return ReviewIssue(
-        number=int(issue["number"]),
-        title=issue.get("title") or "",
-        state=(issue.get("state") or "unknown").lower(),
-        url=issue.get("html_url") or issue.get("url") or "",
-        comments=int(issue.get("comments") or 0),
-        parent_issue=_extract_parent_issue(body, sections),
-        checkout_notes=_extract_bullets(sections.get("Branch / checkout", [])),
-        commits=_extract_commits(commit_lines),
-        targeted_tests=_strip_backticks(_extract_bullets(sections.get("Targeted tests", []))),
-        files_touched=_strip_backticks(_extract_bullets(sections.get("Files touched", []))),
-        tasks=_extract_bullets(sections.get("Tasks", [])),
-        acceptance_criteria=_extract_bullets(sections.get("Acceptance Criteria", [])),
-    )
-
-
-def build_packet_markdown(epic_issue: dict, child_issues: list[ReviewIssue]) -> str:
-    title = epic_issue.get("title") or f"Epic #{epic_issue.get('number')}"
-    url = epic_issue.get("html_url") or epic_issue.get("url") or ""
-    body = epic_issue.get("body") or ""
-    children = sorted(child_issues, key=lambda item: item.number)
-
-    lines: list[str] = []
-    lines.append("# Morning Review Packet")
-    lines.append("")
-    lines.append(f"Source epic: [{title}]({url})")
-    lines.append("")
-    lines.append("## Epic context")
-    lines.append("")
-    lines.append(title)
-    lines.append("")
-    for line in body.splitlines():
-        if line.strip():
-            lines.append(line)
-        else:
-            lines.append("")
-    lines.append("")
-    lines.append("## Summary")
-    lines.append("")
-    lines.append("| Issue | State | Commits | Tests |")
-    lines.append("| --- | --- | --- | --- |")
-    for child in children:
-        lines.append(
-            f"| #{child.number} | {child.state} | {len(child.commits)} | {len(child.targeted_tests)} |"
-        )
-    lines.append("")
-
-    for child in children:
-        lines.append(f"## #{child.number} — {child.title}")
-        lines.append("")
-        lines.append(f"State: {child.state}")
-        lines.append(f"URL: {child.url}")
-        lines.append("")
-        if child.checkout_notes:
-            lines.append("### Branch / checkout")
-            for note in child.checkout_notes:
-                lines.append(f"- {note}")
-            lines.append("")
-        if child.commits:
-            lines.append("### Commits")
-            for commit in child.commits:
-                if commit.sha:
-                    lines.append(f"- `{commit.sha}` — {commit.summary}")
-                else:
-                    lines.append(f"- {commit.summary}")
-            lines.append("")
-        if child.targeted_tests:
-            lines.append("### Targeted tests")
-            for test_path in child.targeted_tests:
-                lines.append(f"- `{test_path}`")
-            lines.append("")
-        if child.files_touched:
-            lines.append("### Files touched")
-            for file_path in child.files_touched:
-                lines.append(f"- `{file_path}`")
-            lines.append("")
-        if child.tasks:
-            lines.append("### Tasks")
-            for task in child.tasks:
-                lines.append(f"- [ ] {task}")
-            lines.append("")
-        if child.acceptance_criteria:
-            lines.append("### Acceptance criteria")
-            for item in child.acceptance_criteria:
-                lines.append(f"- [ ] {item}")
-            lines.append("")
-
-    return "\n".join(lines).rstrip() + "\n"
-
-
-def _resolve_token(explicit_token: str | None = None) -> str:
-    if explicit_token:
-        return explicit_token.strip()
-    env_token = os.getenv("GITEA_TOKEN")
-    if env_token:
-        return env_token.strip()
-    if DEFAULT_TOKEN_PATH.exists():
-        return DEFAULT_TOKEN_PATH.read_text().strip()
-    raise FileNotFoundError(f"No Gitea token found. Set GITEA_TOKEN or create {DEFAULT_TOKEN_PATH}")
-
-
-def fetch_issue(base_url: str, owner: str, repo: str, number: int, token: str) -> dict:
-    url = f"{base_url.rstrip('/')}/api/v1/repos/{owner}/{repo}/issues/{number}"
-    request = urllib.request.Request(url, headers={"Authorization": f"token {token}"})
-    with urllib.request.urlopen(request, timeout=30) as response:
-        return json.loads(response.read().decode())
-
-
-def collect_child_issues(base_url: str, owner: str, repo: str, epic_issue: dict, token: str, children_spec: str | None = None) -> list[dict]:
-    numbers = parse_issue_number_spec(children_spec) if children_spec else discover_child_issue_numbers(epic_issue.get("body") or "")
-    return [fetch_issue(base_url, owner, repo, number, token) for number in numbers]
-
-
-def main(argv: list[str] | None = None) -> int:
-    parser = argparse.ArgumentParser(description="Build a markdown morning review packet from a Gitea epic")
-    parser.add_argument("--base-url", default=DEFAULT_BASE_URL)
-    parser.add_argument("--owner", default=DEFAULT_OWNER)
-    parser.add_argument("--repo", default=DEFAULT_REPO)
-    parser.add_argument("--epic-number", type=int, required=True)
-    parser.add_argument("--children", help="Explicit issue list/ranges, e.g. 950-962")
-    parser.add_argument("--token", help="Gitea token (defaults to GITEA_TOKEN or ~/.config/gitea/token)")
-    parser.add_argument("--output", help="Write markdown packet to this path instead of stdout")
-    args = parser.parse_args(argv)
-
-    token = _resolve_token(args.token)
-    epic_issue = fetch_issue(args.base_url, args.owner, args.repo, args.epic_number, token)
-    child_issue_dicts = collect_child_issues(args.base_url, args.owner, args.repo, epic_issue, token, args.children)
-    packet = build_packet_markdown(epic_issue, [parse_child_issue(issue) for issue in child_issue_dicts])
-
-    if args.output:
-        output_path = Path(args.output)
-        output_path.parent.mkdir(parents=True, exist_ok=True)
-        output_path.write_text(packet)
-    else:
-        print(packet, end="")
-    return 0
-
-
-if __name__ == "__main__":
-    raise SystemExit(main())
--- a/tests/test_morning_review_packet.py
+++ b/tests/test_morning_review_packet.py
@@ -1,162 +0,0 @@
-from pathlib import Path
-import sys
-
-SCRIPT_DIR = Path(__file__).resolve().parents[1] / "scripts"
-sys.path.insert(0, str(SCRIPT_DIR))
-
-import morning_review_packet as mrp
-
-
-EPIC_BODY = """Source: git log on upstream/main since 2026-04-21 00:00 EDT.
-
-## Success criteria
-- [ ] Every issue has a clear PASS / FAIL outcome.
-
-## Sub-issues
-- [ ] #950 [QA] Verify AI Gateway provider UX + attribution headers
-- [ ] #951 [QA] Verify transport abstraction + AnthropicTransport wiring
-- [x] #962 [QA] Verify hardcoded-home path guard on burn/921 branch
-"""
-
-
-CHILD_BODY_PLURAL = """## Parent
-#949
-
-## Branch / checkout
-- Validate on `upstream/main` or an equivalent synced checkout.
-
-## Commits landed today
-- `b11753879` attribution default_headers for ai-gateway provider
-- `700437440` curated picker with live pricing
-
-## Targeted tests
-- `tests/hermes_cli/test_ai_gateway_models.py`
-- `tests/run_agent/test_provider_attribution_headers.py`
-
-## Tasks
-- [ ] Verify the picker ordering.
-- [ ] Verify attribution headers.
-
-## Acceptance Criteria
-- [ ] Picker shows AI Gateway prominently.
-- [ ] Headers appear on OpenRouter calls.
-"""
-
-
-CHILD_BODY_SINGULAR = """## Parent
-#949
-
-## Branch / checkout
-- Validate on `upstream/main` or an equivalent synced checkout.
-
-## Commit landed today
-- `fc21c1420` add buttons to update Hermes and restart gateway
-
-## Files touched
-- `web/src/pages/StatusPage.tsx`
-- `web/src/lib/api.ts`
-- `web/src/i18n/en.ts`
-
-## Tasks
-- [ ] Open the Web UI status page and verify both buttons are present.
-- [ ] Click Restart Gateway in a safe environment.
-"""
-
-
-def test_discover_child_issue_numbers_from_epic_body():
-    assert mrp.discover_child_issue_numbers(EPIC_BODY) == [950, 951, 962]
-
-
-def test_parse_issue_number_spec_supports_ranges_and_lists():
-    assert mrp.parse_issue_number_spec("950-952,955,962") == [950, 951, 952, 955, 962]
-
-
-def test_parse_child_issue_extracts_structured_sections():
-    issue = {
-        "number": 950,
-        "title": "[QA] Verify AI Gateway provider UX + attribution headers",
-        "state": "open",
-        "html_url": "https://forge.example/950",
-        "comments": 0,
-        "body": CHILD_BODY_PLURAL,
-    }
-
-    parsed = mrp.parse_child_issue(issue)
-
-    assert parsed.number == 950
-    assert parsed.parent_issue == 949
-    assert parsed.checkout_notes == ["Validate on `upstream/main` or an equivalent synced checkout."]
-    assert [c.sha for c in parsed.commits] == ["b11753879", "700437440"]
-    assert parsed.targeted_tests == [
-        "tests/hermes_cli/test_ai_gateway_models.py",
-        "tests/run_agent/test_provider_attribution_headers.py",
-    ]
-    assert parsed.tasks == [
-        "Verify the picker ordering.",
-        "Verify attribution headers.",
-    ]
-    assert parsed.acceptance_criteria == [
-        "Picker shows AI Gateway prominently.",
-        "Headers appear on OpenRouter calls.",
-    ]
-
-
-def test_parse_child_issue_handles_singular_commit_heading_and_files_touched():
-    issue = {
-        "number": 961,
-        "title": "[QA] Verify web dashboard update/restart action buttons",
-        "state": "closed",
-        "html_url": "https://forge.example/961",
-        "comments": 16,
-        "body": CHILD_BODY_SINGULAR,
-    }
-
-    parsed = mrp.parse_child_issue(issue)
-
-    assert [c.sha for c in parsed.commits] == ["fc21c1420"]
-    assert parsed.files_touched == [
-        "web/src/pages/StatusPage.tsx",
-        "web/src/lib/api.ts",
-        "web/src/i18n/en.ts",
-    ]
-    assert parsed.tasks == [
-        "Open the Web UI status page and verify both buttons are present.",
-        "Click Restart Gateway in a safe environment.",
-    ]
-
-
-def test_build_packet_markdown_renders_summary_and_details():
-    epic_issue = {
-        "number": 949,
-        "title": "EPIC: Morning review packet — Hermes harness features landed 2026-04-21",
-        "state": "open",
-        "html_url": "https://forge.example/949",
-        "body": EPIC_BODY,
-    }
-    child_a = mrp.parse_child_issue({
-        "number": 950,
-        "title": "[QA] Verify AI Gateway provider UX + attribution headers",
-        "state": "open",
-        "html_url": "https://forge.example/950",
-        "comments": 0,
-        "body": CHILD_BODY_PLURAL,
-    })
-    child_b = mrp.parse_child_issue({
-        "number": 961,
-        "title": "[QA] Verify web dashboard update/restart action buttons",
-        "state": "closed",
-        "html_url": "https://forge.example/961",
-        "comments": 16,
-        "body": CHILD_BODY_SINGULAR,
-    })
-
-    markdown = mrp.build_packet_markdown(epic_issue, [child_a, child_b])
-
-    assert "# Morning Review Packet" in markdown
-    assert "EPIC: Morning review packet — Hermes harness features landed 2026-04-21" in markdown
-    assert "| #950 | open | 2 | 2 |" in markdown
-    assert "| #961 | closed | 1 | 0 |" in markdown
-    assert "## #950 — [QA] Verify AI Gateway provider UX + attribution headers" in markdown
-    assert "## #961 — [QA] Verify web dashboard update/restart action buttons" in markdown
-    assert "`b11753879` — attribution default_headers for ai-gateway provider" in markdown
-    assert "`web/src/pages/StatusPage.tsx`" in markdown