Compare commits
4 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
e28d16b324 | ||
|
|
bc32047610 | ||
|
|
3a24420d7d | ||
|
|
d14c1c5a56 |
@@ -1,515 +0,0 @@
|
||||
# Human Confirmation Firewall: Research Report
|
||||
## Implementation Patterns for Hermes Agent
|
||||
|
||||
**Issue:** #878
|
||||
**Parent:** #659
|
||||
**Priority:** P0
|
||||
**Scope:** Human-in-the-loop safety patterns for tool calls, crisis handling, and irreversible actions
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Hermes already has a partial human confirmation firewall, but it is narrow.
|
||||
|
||||
Current repo state shows:
|
||||
- a real **pre-execution gate** for dangerous terminal commands in `tools/approval.py`
|
||||
- a partial **confidence-threshold path** via `_smart_approve()` in `tools/approval.py`
|
||||
- gateway support for blocking approval resolution in `gateway/run.py`
|
||||
|
||||
What is still missing is the core recommendation from this research issue:
|
||||
- **confidence scoring on all tool calls**, not just terminal commands that already matched a dangerous regex
|
||||
- a **hard pre-execution human gate for crisis interventions**, especially any action that would auto-respond to suicidal content
|
||||
- a consistent way to classify actions into:
|
||||
1. pre-execution gate
|
||||
2. post-execution review
|
||||
3. confidence-threshold execution
|
||||
|
||||
Recommendation:
|
||||
- use **Pattern 1: Pre-Execution Gate** for crisis interventions and irreversible/high-impact actions
|
||||
- use **Pattern 3: Confidence Threshold** for normal operations
|
||||
- reserve **Pattern 2: Post-Execution Review** only for low-risk and reversible actions
|
||||
|
||||
The next implementation step should be a **tool-call risk assessment layer** that runs before dispatch in `model_tools.handle_function_call()`, assigns a score and pattern to every tool call, and routes only the highest-risk calls into mandatory human confirmation.
|
||||
|
||||
---
|
||||
|
||||
## 1. The Three Proven Patterns
|
||||
|
||||
### Pattern 1: Pre-Execution Gate
|
||||
|
||||
Definition:
|
||||
- halt before execution
|
||||
- show the proposed action to the human
|
||||
- require explicit approval or denial
|
||||
|
||||
Best for:
|
||||
- destructive actions
|
||||
- irreversible side effects
|
||||
- crisis interventions
|
||||
- actions that affect another human's safety, money, infrastructure, or private data
|
||||
|
||||
Strengths:
|
||||
- strongest safety guarantee
|
||||
- simplest audit story
|
||||
- prevents the most catastrophic failure mode: acting first and apologizing later
|
||||
|
||||
Weaknesses:
|
||||
- adds latency
|
||||
- creates operator burden if overused
|
||||
- should not be applied to every ordinary tool call
|
||||
|
||||
### Pattern 2: Post-Execution Review
|
||||
|
||||
Definition:
|
||||
- execute first
|
||||
- expose result to human
|
||||
- allow rollback or follow-up correction
|
||||
|
||||
Best for:
|
||||
- reversible operations
|
||||
- low-risk actions with fast recovery
|
||||
- tasks where human review matters but immediate execution is acceptable
|
||||
|
||||
Strengths:
|
||||
- low friction
|
||||
- fast iteration
|
||||
- useful when rollback is practical
|
||||
|
||||
Weaknesses:
|
||||
- unsafe for crisis or destructive actions
|
||||
- only works when rollback actually exists
|
||||
- a poor fit for external communication or life-safety contexts
|
||||
|
||||
### Pattern 3: Confidence Threshold
|
||||
|
||||
Definition:
|
||||
- compute a risk/confidence score before execution
|
||||
- auto-execute high-confidence safe actions
|
||||
- request confirmation for lower-confidence or higher-risk actions
|
||||
|
||||
Best for:
|
||||
- mixed-risk tool ecosystems
|
||||
- day-to-day operations where always-confirm would be too expensive
|
||||
- systems with a large volume of ordinary, safe reads and edits
|
||||
|
||||
Strengths:
|
||||
- best balance of speed and safety
|
||||
- scales across many tool types
|
||||
- allows targeted human attention where it matters most
|
||||
|
||||
Weaknesses:
|
||||
- depends on a good scoring model
|
||||
- weak scoring creates false negatives or unnecessary prompts
|
||||
- must remain inspectable and debuggable
|
||||
|
||||
---
|
||||
|
||||
## 2. What Hermes Already Has
|
||||
|
||||
## 2.1 Existing Pre-Execution Gate for Dangerous Terminal Commands
|
||||
|
||||
`tools/approval.py` already implements a real pre-execution confirmation path for dangerous shell commands.
|
||||
|
||||
Observed components:
|
||||
- `DANGEROUS_PATTERNS`
|
||||
- `detect_dangerous_command()`
|
||||
- `prompt_dangerous_approval()`
|
||||
- `check_dangerous_command()`
|
||||
- gateway queueing and resolution support in the same module
|
||||
|
||||
This is already Pattern 1.
|
||||
|
||||
Current behavior:
|
||||
- dangerous terminal commands are detected before execution
|
||||
- the user can allow once / session / always / deny
|
||||
- gateway sessions can block until approval resolves
|
||||
|
||||
This is a strong foundation, but it is limited to a subset of terminal commands.
|
||||
|
||||
## 2.2 Partial Confidence Threshold via Smart Approvals
|
||||
|
||||
Hermes also already has a partial Pattern 3.
|
||||
|
||||
Observed component:
|
||||
- `_smart_approve()` in `tools/approval.py`
|
||||
|
||||
Current behavior:
|
||||
- only runs **after** a command has already been flagged by dangerous-pattern detection
|
||||
- uses the auxiliary LLM to decide:
|
||||
- approve
|
||||
- deny
|
||||
- escalate
|
||||
|
||||
This means Hermes has a confidence-threshold mechanism, but only for **already-flagged dangerous terminal commands**.
|
||||
|
||||
What it does not yet do:
|
||||
- score all tool calls
|
||||
- classify non-terminal tools
|
||||
- distinguish crisis interventions from normal ops
|
||||
- produce a shared risk model across the tool surface
|
||||
|
||||
## 2.3 Blocking Approval UX in Gateway
|
||||
|
||||
`gateway/run.py` already routes `/approve` and `/deny` into the blocking approval path.
|
||||
|
||||
This means the infrastructure for a true human confirmation firewall already exists in messaging contexts.
|
||||
|
||||
That is important because the missing work is not "invent human approval from zero."
|
||||
The missing work is:
|
||||
- expand the scope from dangerous shell commands to **all tool calls that matter**
|
||||
- make the routing policy explicit and inspectable
|
||||
|
||||
---
|
||||
|
||||
## 3. What Hermes Still Lacks
|
||||
|
||||
## 3.1 No Universal Tool-Call Risk Assessment
|
||||
|
||||
The current approval system is command-pattern-centric.
|
||||
It is not yet a tool-call firewall.
|
||||
|
||||
Missing capability:
|
||||
- before dispatch, every tool call should receive a structured assessment:
|
||||
- tool name
|
||||
- side-effect class
|
||||
- reversibility
|
||||
- human-impact potential
|
||||
- crisis relevance
|
||||
- confidence score
|
||||
- recommended confirmation pattern
|
||||
|
||||
Natural insertion point:
|
||||
- `model_tools.handle_function_call()`
|
||||
|
||||
That function already sits at the central dispatch boundary.
|
||||
It is the right place to add a pre-dispatch classifier.
|
||||
|
||||
## 3.2 No Hard Crisis Gate for Outbound Intervention
|
||||
|
||||
Issue #878 explicitly recommends:
|
||||
- Pattern 1 for crisis interventions
|
||||
- never auto-respond to suicidal content
|
||||
|
||||
That recommendation is not yet codified as a global firewall rule.
|
||||
|
||||
Missing rule:
|
||||
- if a tool call would directly intervene in a crisis context or send outward guidance in response to suicidal content, it must require explicit human confirmation before execution
|
||||
|
||||
Examples that should hard-gate:
|
||||
- outbound `send_message` content aimed at a suicidal user
|
||||
- any future tool that places calls, escalates emergencies, or contacts third parties about a crisis
|
||||
- any autonomous action that claims a person should or should not take a life-safety step
|
||||
|
||||
## 3.3 No First-Class Post-Execution Review Policy
|
||||
|
||||
Hermes has approval and denial, but it does not yet have a formal policy for when Pattern 2 is acceptable.
|
||||
|
||||
Without a policy, post-execution review tends to get used implicitly rather than intentionally.
|
||||
|
||||
That is risky.
|
||||
|
||||
Hermes should define Pattern 2 narrowly:
|
||||
- only for actions that are both low-risk and reversible
|
||||
- only when the system can show the human exactly what happened
|
||||
- never for crisis, finance, destructive config, or sensitive comms
|
||||
|
||||
---
|
||||
|
||||
## 4. Recommended Architecture for Hermes
|
||||
|
||||
## 4.1 Add a Tool-Call Assessment Layer
|
||||
|
||||
Add a pre-dispatch assessment object for every tool call.
|
||||
|
||||
Suggested shape:
|
||||
|
||||
```python
|
||||
@dataclass
|
||||
class ToolCallAssessment:
|
||||
tool_name: str
|
||||
risk_score: float # 0.0 to 1.0
|
||||
confidence: float # confidence in the assessment itself
|
||||
pattern: str # pre_execution_gate | post_execution_review | confidence_threshold
|
||||
requires_human: bool
|
||||
reasons: list[str]
|
||||
reversible: bool
|
||||
crisis_sensitive: bool
|
||||
```
|
||||
|
||||
Suggested execution point:
|
||||
- inside `model_tools.handle_function_call()` before `orchestrator.dispatch()`
|
||||
|
||||
Why here:
|
||||
- one place covers all tools
|
||||
- one place can emit traces
|
||||
- one place can remain model-agnostic
|
||||
- one place lets plugins observe or override the assessment
|
||||
|
||||
## 4.2 Classify Tool Calls by Side-Effect Class
|
||||
|
||||
Suggested first-pass taxonomy:
|
||||
|
||||
### A. Read-only
|
||||
Examples:
|
||||
- `read_file`
|
||||
- `search_files`
|
||||
- `browser_snapshot`
|
||||
- `browser_console` read-only inspection
|
||||
|
||||
Pattern:
|
||||
- confidence threshold
|
||||
- almost always auto-execute
|
||||
- human confirmation normally unnecessary
|
||||
|
||||
### B. Local reversible edits
|
||||
Examples:
|
||||
- `patch`
|
||||
- `write_file`
|
||||
- `todo`
|
||||
|
||||
Pattern:
|
||||
- confidence threshold
|
||||
- human confirmation only when risk score rises because of path sensitivity or scope breadth
|
||||
|
||||
### C. External side effects
|
||||
Examples:
|
||||
- `send_message`
|
||||
- `cronjob`
|
||||
- `delegate_task`
|
||||
- smart-home actuation tools
|
||||
|
||||
Pattern:
|
||||
- confidence threshold by default
|
||||
- pre-execution gate when score exceeds threshold or when context is sensitive
|
||||
|
||||
### D. Critical / destructive / crisis-sensitive
|
||||
Examples:
|
||||
- dangerous `terminal`
|
||||
- financial actions
|
||||
- deletion / kill / restart / deployment in sensitive paths
|
||||
- outbound crisis intervention
|
||||
|
||||
Pattern:
|
||||
- pre-execution gate
|
||||
- never auto-execute on confidence alone
|
||||
|
||||
## 4.3 Crisis Override Rule
|
||||
|
||||
Add a hard override:
|
||||
|
||||
```text
|
||||
If tool call is crisis-sensitive AND outbound or irreversible:
|
||||
requires_human = True
|
||||
pattern = pre_execution_gate
|
||||
```
|
||||
|
||||
This is the most important rule in the issue.
|
||||
|
||||
The model may draft the message.
|
||||
The human must confirm before the system sends it.
|
||||
|
||||
## 4.4 Use Confidence Threshold for Normal Ops
|
||||
|
||||
For non-crisis operations, use Pattern 3.
|
||||
|
||||
Suggested logic:
|
||||
- low risk + high assessment confidence -> auto-execute
|
||||
- medium risk or medium confidence -> ask human
|
||||
- high risk -> always ask human
|
||||
|
||||
Key point:
|
||||
- confidence is not just "how sure the LLM is"
|
||||
- confidence should combine:
|
||||
- tool type certainty
|
||||
- argument clarity
|
||||
- path sensitivity
|
||||
- external side effects
|
||||
- crisis indicators
|
||||
|
||||
---
|
||||
|
||||
## 5. Recommended Initial Scoring Factors
|
||||
|
||||
A simple initial scorer is enough.
|
||||
It does not need to be fancy.
|
||||
|
||||
Suggested factors:
|
||||
|
||||
### 5.1 Tool class risk
|
||||
- read-only tools: very low base risk
|
||||
- local mutation tools: moderate base risk
|
||||
- external communication / automation tools: higher base risk
|
||||
- shell execution: variable, often high
|
||||
|
||||
### 5.2 Target sensitivity
|
||||
Examples:
|
||||
- `/tmp` or local scratch paths -> lower
|
||||
- repo files under git -> medium
|
||||
- system config, credentials, secrets, gateway lifecycle -> high
|
||||
- human-facing channels -> high if message content is sensitive
|
||||
|
||||
### 5.3 Reversibility
|
||||
- reversible -> lower
|
||||
- difficult but possible to undo -> medium
|
||||
- practically irreversible -> high
|
||||
|
||||
### 5.4 Human-impact content
|
||||
- no direct human impact -> low
|
||||
- administrative impact -> medium
|
||||
- crisis / safety / emotional intervention -> critical
|
||||
|
||||
### 5.5 Context certainty
|
||||
- arguments are explicit and narrow -> higher confidence
|
||||
- arguments are vague, inferred, or broad -> lower confidence
|
||||
|
||||
---
|
||||
|
||||
## 6. Implementation Plan
|
||||
|
||||
## Phase 1: Assessment Without Behavior Change
|
||||
|
||||
Goal:
|
||||
- score all tool calls
|
||||
- log assessment decisions
|
||||
- emit traces for review
|
||||
- do not yet block new tool categories
|
||||
|
||||
Files to touch:
|
||||
- `tools/approval.py`
|
||||
- `model_tools.py`
|
||||
- tests for assessment coverage
|
||||
|
||||
Output:
|
||||
- risk/confidence trace for every tool call
|
||||
- pattern recommendation for every tool call
|
||||
|
||||
Why first:
|
||||
- lets us calibrate before changing runtime behavior
|
||||
- avoids breaking existing workflows blindly
|
||||
|
||||
## Phase 2: Hard-Gate Crisis-Sensitive Outbound Actions
|
||||
|
||||
Goal:
|
||||
- enforce Pattern 1 for crisis interventions
|
||||
|
||||
Likely surfaces:
|
||||
- `send_message`
|
||||
- any future telephony / call / escalation tools
|
||||
- other tools with direct human intervention side effects
|
||||
|
||||
Rule:
|
||||
- never auto-send crisis intervention content without human confirmation
|
||||
|
||||
## Phase 3: General Confidence Threshold for Normal Ops
|
||||
|
||||
Goal:
|
||||
- apply Pattern 3 to all tool calls
|
||||
- auto-run clearly safe actions
|
||||
- escalate ambiguous or medium-risk actions
|
||||
|
||||
Likely thresholds:
|
||||
- score < 0.25 -> auto
|
||||
- 0.25 to 0.60 -> confirm if confidence is weak
|
||||
- > 0.60 -> confirm
|
||||
- crisis-sensitive -> always confirm
|
||||
|
||||
## Phase 4: Optional Post-Execution Review Lane
|
||||
|
||||
Goal:
|
||||
- allow Pattern 2 only for explicitly reversible operations
|
||||
|
||||
Examples:
|
||||
- maybe low-risk messaging drafts saved locally
|
||||
- maybe reversible UI actions in specific environments
|
||||
|
||||
Important:
|
||||
- this phase is optional
|
||||
- Hermes should not rely on Pattern 2 for safety-critical flows
|
||||
|
||||
---
|
||||
|
||||
## 7. Verification Criteria for the Future Implementation
|
||||
|
||||
The eventual implementation should prove all of the following:
|
||||
|
||||
1. every tool call receives a scored assessment before dispatch
|
||||
2. crisis-sensitive outbound actions always require human confirmation
|
||||
3. dangerous terminal commands still preserve their current pre-execution gate
|
||||
4. clearly safe read-only tool calls are not slowed by unnecessary prompts
|
||||
5. assessment traces can be inspected after a run
|
||||
6. approval decisions remain session-safe across CLI and gateway contexts
|
||||
|
||||
---
|
||||
|
||||
## 8. Concrete Recommendations
|
||||
|
||||
### Recommendation 1
|
||||
Do **not** replace the current dangerous-command approval path.
|
||||
Generalize above it.
|
||||
|
||||
Why:
|
||||
- existing terminal Pattern 1 already works
|
||||
- this is the strongest piece of the current firewall
|
||||
|
||||
### Recommendation 2
|
||||
Add a universal scorer in `model_tools.handle_function_call()`.
|
||||
|
||||
Why:
|
||||
- that is the first point where Hermes knows the tool name and structured arguments
|
||||
- it is the cleanest place to classify all tool calls uniformly
|
||||
|
||||
### Recommendation 3
|
||||
Treat crisis-sensitive outbound intervention as a separate safety class.
|
||||
|
||||
Why:
|
||||
- issue #878 explicitly calls for Pattern 1 here
|
||||
- this matches Timmy's SOUL-level safety requirements
|
||||
|
||||
### Recommendation 4
|
||||
Ship scoring traces before enforcement expansion.
|
||||
|
||||
Why:
|
||||
- you cannot tune thresholds you cannot inspect
|
||||
- false positives will otherwise frustrate normal usage
|
||||
|
||||
### Recommendation 5
|
||||
Use Pattern 3 as the default policy for normal operations.
|
||||
|
||||
Why:
|
||||
- full manual confirmation on every tool call is too expensive
|
||||
- full autonomy is too risky
|
||||
- Pattern 3 is the practical middle ground
|
||||
|
||||
---
|
||||
|
||||
## 9. Bottom Line
|
||||
|
||||
Hermes should implement a **two-track human confirmation firewall**:
|
||||
|
||||
1. **Pattern 1: Pre-Execution Gate**
|
||||
- crisis interventions
|
||||
- destructive terminal actions
|
||||
- irreversible or safety-critical tool calls
|
||||
|
||||
2. **Pattern 3: Confidence Threshold**
|
||||
- all ordinary tool calls
|
||||
- driven by a universal tool-call assessment layer
|
||||
- integrated at the central dispatch boundary
|
||||
|
||||
Pattern 2 should remain optional and narrow.
|
||||
It is not the primary answer for Hermes.
|
||||
|
||||
The repo already contains the beginnings of this system.
|
||||
The next step is not new theory.
|
||||
It is to turn the existing approval path into a true **tool-call-wide human confirmation firewall**.
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- Issue #878 — Human Confirmation Firewall Implementation Patterns
|
||||
- Issue #659 — Critical Research Tasks
|
||||
- `tools/approval.py` — current dangerous-command approval flow and smart approvals
|
||||
- `model_tools.py` — central tool dispatch boundary
|
||||
- `gateway/run.py` — blocking approval handling for messaging sessions
|
||||
@@ -148,3 +148,184 @@ class TestStrategyNameSurfaced:
|
||||
assert count == 0
|
||||
assert strategy is None
|
||||
assert err is not None
|
||||
|
||||
|
||||
class TestEscapeDriftGuard:
|
||||
"""Tests for the escape-drift guard that catches bash/JSON serialization
|
||||
artifacts where an apostrophe gets prefixed with a spurious backslash
|
||||
in tool-call transport.
|
||||
"""
|
||||
|
||||
def test_drift_blocked_apostrophe(self):
|
||||
"""File has ', old_string and new_string both have \\' — classic
|
||||
tool-call drift. Guard must block with a helpful error instead of
|
||||
writing \\' literals into source code."""
|
||||
content = "x = \"hello there\"\n"
|
||||
# Simulate transport-corrupted old_string and new_string where an
|
||||
# apostrophe-like context got prefixed with a backslash. The content
|
||||
# itself has no apostrophe, but both strings do — matching via
|
||||
# whitespace/anchor strategies would otherwise succeed.
|
||||
old_string = "x = \"hello there\" # don\\'t edit\n"
|
||||
new_string = "x = \"hi there\" # don\\'t edit\n"
|
||||
# This particular pair won't match anything, so it exits via
|
||||
# no-match path. Build a case where a non-exact strategy DOES match.
|
||||
content = "line\n x = 1\nline"
|
||||
old_string = "line\n x = \\'a\\'\nline"
|
||||
new_string = "line\n x = \\'b\\'\nline"
|
||||
new, count, strategy, err = fuzzy_find_and_replace(content, old_string, new_string)
|
||||
assert count == 0
|
||||
assert err is not None and "Escape-drift" in err
|
||||
assert "backslash" in err.lower()
|
||||
assert new == content # file untouched
|
||||
|
||||
def test_drift_blocked_double_quote(self):
|
||||
"""Same idea but with \\" drift instead of \\'."""
|
||||
content = 'line\n x = 1\nline'
|
||||
old_string = 'line\n x = \\"a\\"\nline'
|
||||
new_string = 'line\n x = \\"b\\"\nline'
|
||||
new, count, strategy, err = fuzzy_find_and_replace(content, old_string, new_string)
|
||||
assert count == 0
|
||||
assert err is not None and "Escape-drift" in err
|
||||
|
||||
def test_drift_allowed_when_file_genuinely_has_backslash_escapes(self):
|
||||
"""If the file already contains \\' (e.g. inside an existing escaped
|
||||
string), the model is legitimately preserving it. Guard must NOT
|
||||
fire."""
|
||||
content = "line\n x = \\'a\\'\nline"
|
||||
old_string = "line\n x = \\'a\\'\nline"
|
||||
new_string = "line\n x = \\'b\\'\nline"
|
||||
new, count, strategy, err = fuzzy_find_and_replace(content, old_string, new_string)
|
||||
assert err is None
|
||||
assert count == 1
|
||||
assert "\\'b\\'" in new
|
||||
|
||||
def test_drift_allowed_on_exact_match(self):
|
||||
"""Exact matches bypass the drift guard entirely — if the file
|
||||
really contains the exact bytes old_string specified, it's not
|
||||
drift."""
|
||||
content = "hello \\'world\\'"
|
||||
new, count, strategy, err = fuzzy_find_and_replace(
|
||||
content, "hello \\'world\\'", "hello \\'there\\'"
|
||||
)
|
||||
assert err is None
|
||||
assert count == 1
|
||||
assert strategy == "exact"
|
||||
|
||||
def test_drift_allowed_when_adding_escaped_strings(self):
|
||||
"""Model is adding new content with \\' that wasn't in the original.
|
||||
old_string has no \\', so guard doesn't fire."""
|
||||
content = "line1\nline2\nline3"
|
||||
old_string = "line1\nline2\nline3"
|
||||
new_string = "line1\nprint(\\'added\\')\nline2\nline3"
|
||||
new, count, strategy, err = fuzzy_find_and_replace(content, old_string, new_string)
|
||||
assert err is None
|
||||
assert count == 1
|
||||
assert "\\'added\\'" in new
|
||||
|
||||
def test_no_drift_check_when_new_string_lacks_suspect_chars(self):
|
||||
"""Fast-path: if new_string has no \\' or \\", guard must not
|
||||
fire even on fuzzy match."""
|
||||
content = "def foo():\n pass" # extra space ignored by line_trimmed
|
||||
old_string = "def foo():\n pass"
|
||||
new_string = "def bar():\n return 1"
|
||||
new, count, strategy, err = fuzzy_find_and_replace(content, old_string, new_string)
|
||||
assert err is None
|
||||
assert count == 1
|
||||
|
||||
|
||||
class TestFindClosestLines:
|
||||
def setup_method(self):
|
||||
from tools.fuzzy_match import find_closest_lines
|
||||
self.find_closest_lines = find_closest_lines
|
||||
|
||||
def test_finds_similar_line(self):
|
||||
content = "def foo():\n pass\ndef bar():\n return 1\n"
|
||||
result = self.find_closest_lines("def baz():", content)
|
||||
assert "def foo" in result or "def bar" in result
|
||||
|
||||
def test_returns_empty_for_no_match(self):
|
||||
content = "completely different content here"
|
||||
result = self.find_closest_lines("xyzzy_no_match_possible_!!!", content)
|
||||
assert result == ""
|
||||
|
||||
def test_returns_empty_for_empty_inputs(self):
|
||||
assert self.find_closest_lines("", "some content") == ""
|
||||
assert self.find_closest_lines("old string", "") == ""
|
||||
|
||||
def test_includes_context_lines(self):
|
||||
content = "line1\nline2\ndef target():\n pass\nline5\n"
|
||||
result = self.find_closest_lines("def target():", content)
|
||||
assert "target" in result
|
||||
|
||||
def test_includes_line_numbers(self):
|
||||
content = "line1\nline2\ndef foo():\n pass\n"
|
||||
result = self.find_closest_lines("def foo():", content)
|
||||
# Should include line numbers in format "N| content"
|
||||
assert "|" in result
|
||||
|
||||
|
||||
class TestFormatNoMatchHint:
|
||||
"""Gating tests for format_no_match_hint — the shared helper that decides
|
||||
whether a 'Did you mean?' snippet should be appended to an error.
|
||||
"""
|
||||
|
||||
def setup_method(self):
|
||||
from tools.fuzzy_match import format_no_match_hint
|
||||
self.fmt = format_no_match_hint
|
||||
|
||||
def test_fires_on_could_not_find_with_match(self):
|
||||
"""Classic no-match: similar content exists → hint fires."""
|
||||
content = "def foo():\n pass\ndef bar():\n pass\n"
|
||||
result = self.fmt(
|
||||
"Could not find a match for old_string in the file",
|
||||
0, "def baz():", content,
|
||||
)
|
||||
assert "Did you mean" in result
|
||||
assert "foo" in result or "bar" in result
|
||||
|
||||
def test_silent_on_ambiguous_match_error(self):
|
||||
"""'Found N matches' is not a missing-match failure — no hint."""
|
||||
content = "aaa bbb aaa\n"
|
||||
result = self.fmt(
|
||||
"Found 2 matches for old_string. Provide more context to make it unique, or use replace_all=True.",
|
||||
0, "aaa", content,
|
||||
)
|
||||
assert result == ""
|
||||
|
||||
def test_silent_on_escape_drift_error(self):
|
||||
"""Escape-drift errors are intentional blocks — hint would mislead."""
|
||||
content = "x = 1\n"
|
||||
result = self.fmt(
|
||||
"Escape-drift detected: old_string and new_string contain the literal sequence '\\\\''...",
|
||||
0, "x = \\'1\\'", content,
|
||||
)
|
||||
assert result == ""
|
||||
|
||||
def test_silent_on_identical_strings(self):
|
||||
"""old_string == new_string — hint irrelevant."""
|
||||
result = self.fmt(
|
||||
"old_string and new_string are identical",
|
||||
0, "foo", "foo bar\n",
|
||||
)
|
||||
assert result == ""
|
||||
|
||||
def test_silent_when_match_count_nonzero(self):
|
||||
"""If match succeeded, we shouldn't be in the error path — defense in depth."""
|
||||
result = self.fmt(
|
||||
"Could not find a match for old_string in the file",
|
||||
1, "foo", "foo bar\n",
|
||||
)
|
||||
assert result == ""
|
||||
|
||||
def test_silent_on_none_error(self):
|
||||
"""No error at all — no hint."""
|
||||
result = self.fmt(None, 0, "foo", "bar\n")
|
||||
assert result == ""
|
||||
|
||||
def test_silent_when_no_similar_content(self):
|
||||
"""Even for a valid no-match error, skip hint when nothing similar exists."""
|
||||
result = self.fmt(
|
||||
"Could not find a match for old_string in the file",
|
||||
0, "totally_unique_xyzzy_qux", "abc\nxyz\n",
|
||||
)
|
||||
assert result == ""
|
||||
|
||||
114
tests/tools/test_patch_did_you_mean.py
Normal file
114
tests/tools/test_patch_did_you_mean.py
Normal file
@@ -0,0 +1,114 @@
|
||||
import json
|
||||
import os
|
||||
import textwrap
|
||||
from pathlib import Path
|
||||
|
||||
import tools.skill_manager_tool as skill_manager_tool
|
||||
from tools.file_tools import patch_tool
|
||||
from tools.skill_manager_tool import _create_skill, _patch_skill
|
||||
|
||||
|
||||
def _disable_patch_tool_guards(monkeypatch):
|
||||
monkeypatch.setattr("tools.file_tools._check_sensitive_path", lambda _path: None)
|
||||
monkeypatch.setattr("tools.file_tools._check_file_staleness", lambda _path, _task_id: None)
|
||||
monkeypatch.setattr("tools.file_tools._log_and_check_conflict", lambda _path, _task_id, _action: None)
|
||||
|
||||
|
||||
def test_patch_tool_replace_no_match_shows_rich_hint_without_legacy_hint(tmp_path, monkeypatch):
|
||||
_disable_patch_tool_guards(monkeypatch)
|
||||
sample = tmp_path / "sample.py"
|
||||
sample.write_text("def foo():\n return 1\n\ndef bar():\n return 2\n", encoding="utf-8")
|
||||
|
||||
raw = patch_tool(
|
||||
mode="replace",
|
||||
path=str(sample),
|
||||
old_string="def barycentric():",
|
||||
new_string="def barycentric_new():",
|
||||
task_id="qa960-replace-rich-hint",
|
||||
)
|
||||
|
||||
result = json.loads(raw)
|
||||
assert result["success"] is False
|
||||
assert "Could not find a match" in result["error"]
|
||||
assert "Did you mean one of these sections?" in result["error"]
|
||||
assert "def bar():" in result["error"] or "def foo():" in result["error"]
|
||||
assert "[Hint:" not in raw
|
||||
|
||||
|
||||
def test_patch_tool_replace_ambiguous_error_does_not_show_did_you_mean(tmp_path, monkeypatch):
|
||||
_disable_patch_tool_guards(monkeypatch)
|
||||
sample = tmp_path / "sample.py"
|
||||
sample.write_text("aaa\nbbb\naaa\n", encoding="utf-8")
|
||||
|
||||
raw = patch_tool(
|
||||
mode="replace",
|
||||
path=str(sample),
|
||||
old_string="aaa",
|
||||
new_string="ccc",
|
||||
task_id="qa960-replace-ambiguous",
|
||||
)
|
||||
|
||||
result = json.loads(raw)
|
||||
assert result["success"] is False
|
||||
assert "Found 2 matches" in result["error"]
|
||||
assert "Did you mean one of these sections?" not in result["error"]
|
||||
assert "[Hint:" not in raw
|
||||
|
||||
|
||||
def test_patch_tool_v4a_no_match_shows_rich_hint(tmp_path, monkeypatch):
|
||||
_disable_patch_tool_guards(monkeypatch)
|
||||
sample = tmp_path / "sample.py"
|
||||
sample.write_text("def foo():\n return 1\n", encoding="utf-8")
|
||||
|
||||
patch = textwrap.dedent(
|
||||
f"""\
|
||||
*** Begin Patch
|
||||
*** Update File: {sample}
|
||||
@@
|
||||
-def barycentric():
|
||||
+def barycentric_new():
|
||||
*** End Patch
|
||||
"""
|
||||
)
|
||||
|
||||
raw = patch_tool(mode="patch", patch=patch, task_id="qa960-v4a-rich-hint")
|
||||
result = json.loads(raw)
|
||||
assert result["success"] is False
|
||||
assert "Patch validation failed" in result["error"]
|
||||
assert "Did you mean one of these sections?" in result["error"]
|
||||
assert "def foo():" in result["error"]
|
||||
|
||||
|
||||
def test_skill_patch_no_match_shows_rich_hint(tmp_path, monkeypatch):
|
||||
monkeypatch.setenv("HERMES_HOME", str(tmp_path))
|
||||
skills_dir = tmp_path / "skills"
|
||||
skills_dir.mkdir(parents=True, exist_ok=True)
|
||||
monkeypatch.setattr(skill_manager_tool, "SKILLS_DIR", skills_dir)
|
||||
monkeypatch.setattr(skill_manager_tool, "_security_scan_skill", lambda _skill_dir: None)
|
||||
|
||||
_create_skill(
|
||||
"qa-skill",
|
||||
textwrap.dedent(
|
||||
"""\
|
||||
---
|
||||
name: qa-skill
|
||||
description: test
|
||||
---
|
||||
|
||||
Step 1: Do the thing.
|
||||
Step 2: Verify the thing.
|
||||
"""
|
||||
),
|
||||
)
|
||||
|
||||
result = _patch_skill(
|
||||
"qa-skill",
|
||||
"Step 1: Do the production rollout.",
|
||||
"Step 1: Updated.",
|
||||
)
|
||||
|
||||
assert result["success"] is False
|
||||
assert "Could not find a match" in result["error"]
|
||||
assert "Did you mean one of these sections?" in result["error"]
|
||||
assert "Step 1: Do the thing." in result["error"]
|
||||
assert "file_preview" in result
|
||||
@@ -757,12 +757,14 @@ class ShellFileOperations(FileOperations):
|
||||
content, old_string, new_string, replace_all
|
||||
)
|
||||
|
||||
if error:
|
||||
return PatchResult(error=error)
|
||||
|
||||
if match_count == 0:
|
||||
return PatchResult(error=f"Could not find match for old_string in {path}")
|
||||
|
||||
if error or match_count == 0:
|
||||
err_msg = error or f"Could not find match for old_string in {path}"
|
||||
try:
|
||||
from tools.fuzzy_match import format_no_match_hint
|
||||
err_msg += format_no_match_hint(err_msg, match_count, old_string, content)
|
||||
except Exception:
|
||||
pass
|
||||
return PatchResult(error=err_msg)
|
||||
# Write back
|
||||
write_result = self.write_file(path, new_content)
|
||||
if write_result.error:
|
||||
|
||||
@@ -8,6 +8,7 @@ import os
|
||||
import threading
|
||||
import time
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, Optional
|
||||
from tools.binary_extensions import has_binary_extension
|
||||
from tools.file_operations import ShellFileOperations
|
||||
from agent.redact import redact_sensitive_text
|
||||
@@ -690,8 +691,11 @@ def patch_tool(mode: str = "replace", path: str = None, old_string: str = None,
|
||||
result_json = json.dumps(result_dict, ensure_ascii=False)
|
||||
# Hint when old_string not found — saves iterations where the agent
|
||||
# retries with stale content instead of re-reading the file.
|
||||
# Suppressed when patch_replace already attached a rich "Did you mean?"
|
||||
# snippet (which is strictly more useful than the generic hint).
|
||||
if result_dict.get("error") and "Could not find" in str(result_dict["error"]):
|
||||
result_json += "\n\n[Hint: old_string not found. Use read_file to verify the current content, or search_files to locate the text.]"
|
||||
if "Did you mean one of these sections?" not in str(result_dict["error"]):
|
||||
result_json += "\n\n[Hint: old_string not found. Use read_file to verify the current content, or search_files to locate the text.]"
|
||||
return result_json
|
||||
except Exception as e:
|
||||
return tool_error(str(e))
|
||||
|
||||
@@ -93,6 +93,21 @@ def fuzzy_find_and_replace(content: str, old_string: str, new_string: str,
|
||||
f"Provide more context to make it unique, or use replace_all=True."
|
||||
)
|
||||
|
||||
# Escape-drift guard: when the matched strategy is NOT `exact`,
|
||||
# we matched via some form of normalization. If new_string
|
||||
# contains shell/JSON-style escape sequences (\\' or \\") that
|
||||
# would be written literally into the file but the matched
|
||||
# region of the file has no such sequences, this is almost
|
||||
# certainly tool-call serialization drift — the model typed
|
||||
# an apostrophe/quote and the transport added a stray
|
||||
# backslash. Writing new_string as-is would corrupt the file.
|
||||
# Block with a helpful error so the model re-reads and retries
|
||||
# instead of the caller silently persisting garbage (or not).
|
||||
if strategy_name != "exact":
|
||||
drift_err = _detect_escape_drift(content, matches, old_string, new_string)
|
||||
if drift_err:
|
||||
return content, 0, None, drift_err
|
||||
|
||||
# Perform replacement
|
||||
new_content = _apply_replacements(content, matches, new_string)
|
||||
return new_content, len(matches), strategy_name, None
|
||||
@@ -101,6 +116,46 @@ def fuzzy_find_and_replace(content: str, old_string: str, new_string: str,
|
||||
return content, 0, None, "Could not find a match for old_string in the file"
|
||||
|
||||
|
||||
def _detect_escape_drift(content: str, matches: List[Tuple[int, int]],
|
||||
old_string: str, new_string: str) -> Optional[str]:
|
||||
"""Detect tool-call escape-drift artifacts in new_string.
|
||||
|
||||
Looks for ``\\'`` or ``\\"`` sequences that are present in both
|
||||
old_string and new_string (i.e. the model copy-pasted them as "context"
|
||||
it intended to preserve) but don't exist in the matched region of the
|
||||
file. That pattern indicates the transport layer inserted spurious
|
||||
shell-style escapes around apostrophes or quotes — writing new_string
|
||||
verbatim would literally insert ``\\'`` into source code.
|
||||
|
||||
Returns an error string if drift is detected, None otherwise.
|
||||
"""
|
||||
# Cheap pre-check: bail out unless new_string actually contains a
|
||||
# suspect escape sequence. This keeps the guard free for all the
|
||||
# common, correct cases.
|
||||
if "\\'" not in new_string and '\\"' not in new_string:
|
||||
return None
|
||||
|
||||
# Aggregate matched regions of the file — that's what new_string will
|
||||
# replace. If the suspect escapes are present there already, the
|
||||
# model is genuinely preserving them (valid for some languages /
|
||||
# escaped strings); accept the patch.
|
||||
matched_regions = "".join(content[start:end] for start, end in matches)
|
||||
|
||||
for suspect in ("\\'", '\\"'):
|
||||
if suspect in new_string and suspect in old_string and suspect not in matched_regions:
|
||||
plain = suspect[1] # "'" or '"'
|
||||
return (
|
||||
f"Escape-drift detected: old_string and new_string contain "
|
||||
f"the literal sequence {suspect!r} but the matched region of "
|
||||
f"the file does not. This is almost always a tool-call "
|
||||
f"serialization artifact where an apostrophe or quote got "
|
||||
f"prefixed with a spurious backslash. Re-read the file with "
|
||||
f"read_file and pass old_string/new_string without "
|
||||
f"backslash-escaping {plain!r} characters."
|
||||
)
|
||||
return None
|
||||
|
||||
|
||||
def _apply_replacements(content: str, matches: List[Tuple[int, int]], new_string: str) -> str:
|
||||
"""
|
||||
Apply replacements at the given positions.
|
||||
@@ -564,3 +619,86 @@ def _map_normalized_positions(original: str, normalized: str,
|
||||
original_matches.append((orig_start, min(orig_end, len(original))))
|
||||
|
||||
return original_matches
|
||||
|
||||
|
||||
def find_closest_lines(old_string: str, content: str, context_lines: int = 2, max_results: int = 3) -> str:
|
||||
"""Find lines in content most similar to old_string for "did you mean?" feedback.
|
||||
|
||||
Returns a formatted string showing the closest matching lines with context,
|
||||
or empty string if no useful match is found.
|
||||
"""
|
||||
if not old_string or not content:
|
||||
return ""
|
||||
|
||||
old_lines = old_string.splitlines()
|
||||
content_lines = content.splitlines()
|
||||
|
||||
if not old_lines or not content_lines:
|
||||
return ""
|
||||
|
||||
# Use first line of old_string as anchor for search
|
||||
anchor = old_lines[0].strip()
|
||||
if not anchor:
|
||||
# Try second line if first is blank
|
||||
candidates = [l.strip() for l in old_lines if l.strip()]
|
||||
if not candidates:
|
||||
return ""
|
||||
anchor = candidates[0]
|
||||
|
||||
# Score each line in content by similarity to anchor
|
||||
scored = []
|
||||
for i, line in enumerate(content_lines):
|
||||
stripped = line.strip()
|
||||
if not stripped:
|
||||
continue
|
||||
ratio = SequenceMatcher(None, anchor, stripped).ratio()
|
||||
if ratio > 0.3:
|
||||
scored.append((ratio, i))
|
||||
|
||||
if not scored:
|
||||
return ""
|
||||
|
||||
# Take top matches
|
||||
scored.sort(key=lambda x: -x[0])
|
||||
top = scored[:max_results]
|
||||
|
||||
parts = []
|
||||
seen_ranges = set()
|
||||
for _, line_idx in top:
|
||||
start = max(0, line_idx - context_lines)
|
||||
end = min(len(content_lines), line_idx + len(old_lines) + context_lines)
|
||||
key = (start, end)
|
||||
if key in seen_ranges:
|
||||
continue
|
||||
seen_ranges.add(key)
|
||||
snippet = "\n".join(
|
||||
f"{start + j + 1:4d}| {content_lines[start + j]}"
|
||||
for j in range(end - start)
|
||||
)
|
||||
parts.append(snippet)
|
||||
|
||||
if not parts:
|
||||
return ""
|
||||
|
||||
return "\n---\n".join(parts)
|
||||
|
||||
|
||||
def format_no_match_hint(error: Optional[str], match_count: int,
|
||||
old_string: str, content: str) -> str:
|
||||
"""Return a '\\n\\nDid you mean...' snippet for plain no-match errors.
|
||||
|
||||
Gated so the hint only fires for actual "old_string not found" failures.
|
||||
Ambiguous-match ("Found N matches"), escape-drift, and identical-strings
|
||||
errors all have ``match_count == 0`` but a "did you mean?" snippet would
|
||||
be misleading — those failed for unrelated reasons.
|
||||
|
||||
Returns an empty string when there's nothing useful to append.
|
||||
"""
|
||||
if match_count != 0:
|
||||
return ""
|
||||
if not error or not error.startswith("Could not find"):
|
||||
return ""
|
||||
hint = find_closest_lines(old_string, content)
|
||||
if not hint:
|
||||
return ""
|
||||
return "\n\nDid you mean one of these sections?\n" + hint
|
||||
|
||||
@@ -290,10 +290,16 @@ def _validate_operations(
|
||||
)
|
||||
if count == 0:
|
||||
label = f"'{hunk.context_hint}'" if hunk.context_hint else "(no hint)"
|
||||
errors.append(
|
||||
msg = (
|
||||
f"{op.file_path}: hunk {label} not found"
|
||||
+ (f" — {match_error}" if match_error else "")
|
||||
)
|
||||
try:
|
||||
from tools.fuzzy_match import format_no_match_hint
|
||||
msg += format_no_match_hint(match_error, count, search_pattern, simulated)
|
||||
except Exception:
|
||||
pass
|
||||
errors.append(msg)
|
||||
else:
|
||||
# Advance simulation so subsequent hunks validate correctly.
|
||||
# Reuse the result from the call above — no second fuzzy run.
|
||||
@@ -537,7 +543,13 @@ def _apply_update(op: PatchOperation, file_ops: Any) -> Tuple[bool, str]:
|
||||
error = None
|
||||
|
||||
if error:
|
||||
return False, f"Could not apply hunk: {error}"
|
||||
err_msg = f"Could not apply hunk: {error}"
|
||||
try:
|
||||
from tools.fuzzy_match import format_no_match_hint
|
||||
err_msg += format_no_match_hint(error, 0, search_pattern, new_content)
|
||||
except Exception:
|
||||
pass
|
||||
return False, err_msg
|
||||
else:
|
||||
# Addition-only hunk (no context or removed lines).
|
||||
# Insert at the location indicated by the context hint, or at end of file.
|
||||
|
||||
@@ -575,9 +575,15 @@ def _patch_skill(
|
||||
if match_error:
|
||||
# Show a short preview of the file so the model can self-correct
|
||||
preview = content[:500] + ("..." if len(content) > 500 else "")
|
||||
err_msg = match_error
|
||||
try:
|
||||
from tools.fuzzy_match import format_no_match_hint
|
||||
err_msg += format_no_match_hint(match_error, match_count, old_string, content)
|
||||
except Exception:
|
||||
pass
|
||||
return {
|
||||
"success": False,
|
||||
"error": match_error,
|
||||
"error": err_msg,
|
||||
"file_preview": preview,
|
||||
}
|
||||
|
||||
|
||||
Reference in New Issue
Block a user