feat: smart approvals + /stop command (inspired by OpenAI Codex)

* feat: smart approvals — LLM-based risk assessment for dangerous commands

Adds a 'smart' approval mode that uses the auxiliary LLM to assess
whether a flagged command is genuinely dangerous or a false positive,
auto-approving low-risk commands without prompting the user.

Inspired by OpenAI Codex's Smart Approvals guardian subagent
(openai/codex#13860).

Config (config.yaml):
  approvals:
    mode: manual   # manual (default), smart, off

Modes:
- manual — current behavior, always prompt the user
- smart  — aux LLM evaluates risk: APPROVE (auto-allow), DENY (block),
           or ESCALATE (fall through to manual prompt)
- off    — skip all approval prompts (equivalent to --yolo)

When smart mode auto-approves, the pattern gets session-level approval
so subsequent uses of the same pattern don't trigger another LLM call.
When it denies, the command is blocked without user prompt. When
uncertain, it escalates to the normal manual approval flow.
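
The verdict-to-outcome lifecycle above can be sketched as follows. This is a minimal illustration only — the helper name `resolve_flagged` and the cache shape are hypothetical, not the actual Hermes internals:

```python
# Hedged sketch of the smart-approval flow; helper names and the
# session cache shape are illustrative, not the real Hermes API.
_session_approvals = set()  # (session_key, pattern_key) pairs

def resolve_flagged(session_key: str, pattern_key: str, verdict: str) -> str:
    """Map an LLM verdict onto an approval outcome."""
    if (session_key, pattern_key) in _session_approvals:
        return "approved"  # cached: no second LLM call for this pattern
    if verdict == "approve":
        _session_approvals.add((session_key, pattern_key))  # session grant
        return "approved"
    if verdict == "deny":
        return "blocked"   # blocked without prompting the user
    return "prompt"        # escalate -> normal manual approval flow
```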

The LLM prompt is carefully scoped: it sees only the command text and
the flagged reason, assesses actual risk vs false positive, and returns
a single-word verdict.

* feat: make smart approval model configurable via config.yaml

Adds auxiliary.approval section to config.yaml with the same
provider/model/base_url/api_key pattern as other aux tasks (vision,
web_extract, compression, etc.).

Config:
  auxiliary:
    approval:
      provider: auto
      model: ''        # fast/cheap model recommended
      base_url: ''
      api_key: ''

Bridged to env vars in both CLI and gateway paths so the aux client
picks them up automatically.
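
A minimal sketch of that bridging step, assuming illustrative `HERMES_APPROVAL_*` variable names (the real env var names may differ):

```python
import os

# Sketch only: map auxiliary.approval config keys onto environment
# variables so the aux client can pick them up automatically. The
# HERMES_APPROVAL_* names are assumptions, not the actual variables.
def bridge_approval_env(config: dict) -> None:
    approval = config.get("auxiliary", {}).get("approval", {})
    for key in ("provider", "model", "base_url", "api_key"):
        value = approval.get(key, "")
        if value and value != "auto":  # skip empty and default values
            os.environ.setdefault(f"HERMES_APPROVAL_{key.upper()}", value)
```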

* feat: add /stop command to kill all background processes

Adds a /stop slash command that kills all running background processes
at once. Previously, users had to call process(list) and then
process(kill) on each one individually.

Inspired by OpenAI Codex's separation of interrupt (Ctrl+C stops current
turn) from /stop (cleans up background processes). See openai/codex#14602.

Ctrl+C continues to only interrupt the active agent turn — background
dev servers, watchers, etc. are preserved. /stop is the explicit way
to clean them all up.
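
The one-pass cleanup can be sketched like this. The registry shape and the injectable `kill` hook are hypothetical conveniences for illustration, not the actual process-tool internals:

```python
import os
import signal

# Illustrative sketch of a /stop handler: terminate every tracked
# background process in one pass. Registry maps pid -> display name;
# the kill hook is injectable so the sketch is testable.
def stop_all(registry: dict, kill=os.kill) -> list:
    """Kill all tracked background processes; return their names."""
    stopped = []
    for pid, name in list(registry.items()):
        try:
            kill(pid, signal.SIGTERM)  # graceful termination
        except ProcessLookupError:
            pass  # process already exited; still drop it from the registry
        registry.pop(pid, None)
        stopped.append(name)
    return stopped
```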
Author: Teknium
Date: 2026-03-16 06:20:11 -07:00 (committed via GitHub)
Parent: 2633272ea9
Commit: 57be18c026
6 changed files with 142 additions and 4 deletions

@@ -4,6 +4,7 @@ This module is the single source of truth for the dangerous command system:
 - Pattern detection (DANGEROUS_PATTERNS, detect_dangerous_command)
 - Per-session approval state (thread-safe, keyed by session_key)
 - Approval prompting (CLI interactive + gateway async)
+- Smart approval via auxiliary LLM (auto-approve low-risk commands)
 - Permanent allowlist persistence (config.yaml)
 """
@@ -283,6 +284,68 @@ def prompt_dangerous_approval(command: str, description: str,
         sys.stdout.flush()
 
 
+def _get_approval_mode() -> str:
+    """Read the approval mode from config. Returns 'manual', 'smart', or 'off'."""
+    try:
+        from hermes_cli.config import load_config
+        config = load_config()
+        return config.get("approvals", {}).get("mode", "manual")
+    except Exception:
+        return "manual"
+
+
+def _smart_approve(command: str, description: str) -> str:
+    """Use the auxiliary LLM to assess risk and decide approval.
+
+    Returns 'approve' if the LLM determines the command is safe,
+    'deny' if genuinely dangerous, or 'escalate' if uncertain.
+
+    Inspired by OpenAI Codex's Smart Approvals guardian subagent
+    (openai/codex#13860).
+    """
+    try:
+        from agent.auxiliary_client import get_text_auxiliary_client, auxiliary_max_tokens_param
+
+        client, model = get_text_auxiliary_client(task="approval")
+        if not client or not model:
+            logger.debug("Smart approvals: no aux client available, escalating")
+            return "escalate"
+
+        prompt = f"""You are a security reviewer for an AI coding agent. A terminal command was flagged by pattern matching as potentially dangerous.
+
+Command: {command}
+Flagged reason: {description}
+
+Assess the ACTUAL risk of this command. Many flagged commands are false positives — for example, `python -c "print('hello')"` is flagged as "script execution via -c flag" but is completely harmless.
+
+Rules:
+- APPROVE if the command is clearly safe (benign script execution, safe file operations, development tools, package installs, git operations, etc.)
+- DENY if the command could genuinely damage the system (recursive delete of important paths, overwriting system files, fork bombs, wiping disks, dropping databases, etc.)
+- ESCALATE if you're uncertain
+
+Respond with exactly one word: APPROVE, DENY, or ESCALATE"""
+
+        response = client.chat.completions.create(
+            model=model,
+            messages=[{"role": "user", "content": prompt}],
+            **auxiliary_max_tokens_param(16),
+            temperature=0,
+        )
+        answer = (response.choices[0].message.content or "").strip().upper()
+        if "APPROVE" in answer:
+            return "approve"
+        elif "DENY" in answer:
+            return "deny"
+        else:
+            return "escalate"
+    except Exception as e:
+        logger.debug("Smart approvals: LLM call failed (%s), escalating", e)
+        return "escalate"
+
+
 def check_dangerous_command(command: str, env_type: str,
                             approval_callback=None) -> dict:
     """Check if a command is dangerous and handle approval.
@@ -372,8 +435,9 @@ def check_all_command_guards(command: str, env_type: str,
     if env_type in ("docker", "singularity", "modal", "daytona"):
         return {"approved": True, "message": None}
 
-    # --yolo: bypass all approval prompts and pre-exec guard checks
-    if os.getenv("HERMES_YOLO_MODE"):
+    # --yolo or approvals.mode=off: bypass all approval prompts
+    approval_mode = _get_approval_mode()
+    if os.getenv("HERMES_YOLO_MODE") or approval_mode == "off":
         return {"approved": True, "message": None}
 
     is_cli = os.getenv("HERMES_INTERACTIVE")
@@ -430,6 +494,31 @@ def check_all_command_guards(command: str, env_type: str,
     if not warnings:
         return {"approved": True, "message": None}
 
+    # --- Phase 2.5: Smart approval (auxiliary LLM risk assessment) ---
+    # When approvals.mode=smart, ask the aux LLM before prompting the user.
+    # Inspired by OpenAI Codex's Smart Approvals guardian subagent
+    # (openai/codex#13860).
+    if approval_mode == "smart":
+        combined_desc_for_llm = "; ".join(desc for _, desc, _ in warnings)
+        verdict = _smart_approve(command, combined_desc_for_llm)
+        if verdict == "approve":
+            # Auto-approve and grant session-level approval for these patterns
+            for key, _, _ in warnings:
+                approve_session(session_key, key)
+            logger.debug("Smart approval: auto-approved '%s' (%s)",
+                         command[:60], combined_desc_for_llm)
+            return {"approved": True, "message": None,
+                    "smart_approved": True}
+        elif verdict == "deny":
+            return {
+                "approved": False,
+                "message": f"BLOCKED by smart approval: {combined_desc_for_llm}. "
+                           "The command was assessed as genuinely dangerous. Do NOT retry.",
+                "smart_denied": True,
+            }
+        # verdict == "escalate" -> fall through to manual prompt
+
     # --- Phase 3: Approval ---
     # Combine descriptions for a single approval prompt