Deep Analysis: Agent Core (run_agent.py + agent/*.py)

Executive Summary

The AIAgent class (run_agent.py, ~8500 lines) is a sophisticated conversation orchestrator with multi-provider support, parallel tool execution, context compression, and robust error handling. This analysis covers its state machine, retry logic, context management, optimization opportunities, and potential issues.


1. State Machine Diagram of Conversation Flow

┌─────────────────────────────────────────────────────────────────────────────────┐
│                         AIAgent Conversation State Machine                       │
└─────────────────────────────────────────────────────────────────────────────────┘

┌─────────────┐     ┌─────────────┐     ┌─────────────────┐     ┌─────────────┐
│   START     │────▶│  INIT       │────▶│  BUILD_SYSTEM   │────▶│   USER      │
│             │     │  (config)   │     │  _PROMPT        │     │   INPUT     │
└─────────────┘     └─────────────┘     └─────────────────┘     └──────┬──────┘
                                                                       │
    ┌──────────────────────────────────────────────────────────────────┘
    │
    ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────────┐     ┌─────────────┐
│   API_CALL  │◄────│  PREPARE    │◄────│  HONCHO_PREFETCH│◄────│  COMPRESS?  │
│   (stream)  │     │  _MESSAGES  │     │  (context)      │     │  (threshold)│
└──────┬──────┘     └─────────────┘     └─────────────────┘     └─────────────┘
       │
       ▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│                              API Response Handler                                │
├─────────────────────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐      │
│  │   STOP      │    │  TOOL_CALLS │    │   LENGTH    │    │   ERROR     │      │
│  │  (finish)   │    │  (execute)  │    │ (truncate)  │    │  (retry)    │      │
│  └──────┬──────┘    └──────┬──────┘    └──────┬──────┘    └──────┬──────┘      │
│         │                  │                  │                  │             │
│         ▼                  ▼                  ▼                  ▼             │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐      │
│  │   RETURN    │    │  EXECUTE    │    │ CONTINUATION│    │  FALLBACK/  │      │
│  │  RESPONSE   │    │  TOOLS      │    │   REQUEST   │    │  COMPRESS   │      │
│  │             │    │  (parallel/ │    │             │    │             │      │
│  │             │    │ sequential) │    │             │    │             │      │
│  └─────────────┘    └──────┬──────┘    └─────────────┘    └─────────────┘      │
│                            │                                                   │
│                            └─────────────────────────────────┐                 │
│                                                              ▼                 │
│                                                   ┌─────────────────┐          │
│                                                   │  APPEND_RESULTS │──────────┘
│                                                   │  (loop back)    │
│                                                   └─────────────────┘
└─────────────────────────────────────────────────────────────────────────────────┘

Key States:
───────────
1. INIT: Agent initialization, client setup, tool loading
2. BUILD_SYSTEM_PROMPT: Cached system prompt assembly with skills/memory
3. USER_INPUT: Message injection with Honcho turn context
4. COMPRESS?: Context threshold check (50% default)
5. API_CALL: Streaming/non-streaming LLM request
6. TOOL_EXECUTION: Parallel (safe) or sequential (interactive) tool calls
7. FALLBACK: Provider failover on errors
8. RETURN: Final response with metadata

Transitions:
────────────
- INTERRUPT: Any state → immediate cleanup → RETURN
- MAX_ITERATIONS: API_CALL → RETURN (budget exhausted)
- 413/CONTEXT_ERROR: API_CALL → COMPRESS → retry
- 401/429: API_CALL → FALLBACK → retry

Sub-State: Tool Execution

┌─────────────────────────────────────────────────────────────┐
│                    Tool Execution Flow                       │
└─────────────────────────────────────────────────────────────┘

┌─────────────────┐
│  RECEIVE_BATCH  │
└────────┬────────┘
         │
    ┌────┴────┐
    │ Parallel?│
    └────┬────┘
   YES /  \ NO
      /    \
     ▼      ▼
┌─────────┐  ┌─────────┐
│CONCURRENT│  │SEQUENTIAL│
│(ThreadPool│  │(for loop)│
│  max=8)  │  │         │
└────┬────┘  └────┬────┘
     │            │
     ▼            ▼
┌─────────┐  ┌─────────┐
│ _invoke_│  │ _invoke_│
│ _tool() │  │ _tool() │ (per tool)
│ (workers)│  │         │
└────┬────┘  └────┬────┘
     │            │
     └────────────┘
            │
            ▼
    ┌───────────────┐
    │ CHECKPOINT?   │ (write_file/patch/terminal)
    └───────┬───────┘
            │
            ▼
    ┌───────────────┐
    │ BUDGET_WARNING│ (inject if >70% iterations)
    └───────┬───────┘
            │
            ▼
    ┌───────────────┐
    │ APPEND_TO_MSGS│
    └───────────────┘
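The parallel/sequential branch above can be sketched with a minimal dispatcher (hypothetical names; the real `_invoke_tool` path carries far more bookkeeping such as checkpointing and budget warnings):

```python
from concurrent.futures import ThreadPoolExecutor

MAX_TOOL_WORKERS = 8  # mirrors the fixed _MAX_TOOL_WORKERS limit described above

def run_batch(tool_calls, invoke, parallel=True):
    """Run a batch of tool calls, preserving result order by index."""
    results = [None] * len(tool_calls)
    if parallel:
        with ThreadPoolExecutor(max_workers=MAX_TOOL_WORKERS) as pool:
            # Map each future back to its slot so concurrent completion
            # order never scrambles the results list.
            futures = {pool.submit(invoke, call): i
                       for i, call in enumerate(tool_calls)}
            for fut, i in futures.items():
                results[i] = fut.result()
    else:
        for i, call in enumerate(tool_calls):
            results[i] = invoke(call)
    return results
```

Either branch yields results in submission order, which is what lets the agent append tool results back to the message list deterministically.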

2. All Retry/Fallback Logic Identified

2.1 API Call Retry Loop (lines 6420-7351)

# Primary retry configuration
max_retries = 3
retry_count = 0

# Retryable errors (with backoff):
- Timeout errors (httpx.ReadTimeout, ConnectTimeout, PoolTimeout)
- Connection errors (ConnectError, RemoteProtocolError, ConnectionError)
- SSE connection drops ("connection lost", "network error")
- Rate limits (429) - with Retry-After header respect

# Backoff strategy:
wait_time = min(2 ** retry_count, 60)  # 2s, 4s, 8s max 60s
# Rate limits: use Retry-After header (capped at 120s)
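A minimal sketch of this backoff policy (the function name is hypothetical; the constants mirror the values above):

```python
def backoff_delay(retry_count, retry_after=None):
    """Delay before the next attempt: exponential with a 60s cap for
    transient errors, or the server's Retry-After header capped at 120s
    for rate limits."""
    if retry_after is not None:
        return min(float(retry_after), 120.0)
    return min(2 ** retry_count, 60)
```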

2.2 Streaming Retry Logic (lines 4157-4268)

_max_stream_retries = int(os.getenv("HERMES_STREAM_RETRIES", 2))

# Streaming-specific fallbacks:
1. Streaming fails after partial delivery → NO retry (partial content shown)
2. Streaming fails BEFORE delivery → fallback to non-streaming
3. Stale stream detection (>180s, scaled to 300s for >100K tokens) → kill connection

2.3 Provider Fallback Chain (lines 4334-4443)

# Fallback chain from config (fallback_model / fallback_providers)
self._fallback_chain = [...]  # List of {provider, model} dicts
self._fallback_index = 0      # Current position in chain

# Trigger conditions:
- max_retries exhausted
- Rate limit (429) with fallback available
- Non-retryable 4xx error (401, 403, 404, 422)
- Empty/malformed response after retries

# Fallback activation:
_try_activate_fallback() → swaps client, model, base_url in-place

2.4 Context Length Error Handling (lines 6998-7164)

# 413 Payload Too Large:
max_compression_attempts = 3
# Compress context and retry

# Context length exceeded:
CONTEXT_PROBE_TIERS = [128_000, 64_000, 32_000, 16_000, 8_000]
# Step down through tiers on error
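The step-down can be sketched as follows (hypothetical helper name; tiers as listed above):

```python
CONTEXT_PROBE_TIERS = [128_000, 64_000, 32_000, 16_000, 8_000]

def next_probe_tier(current_limit):
    """Return the largest probe tier strictly below the limit that just
    failed with a context-length error, or None when tiers are exhausted."""
    for tier in CONTEXT_PROBE_TIERS:
        if tier < current_limit:
            return tier
    return None
```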

2.5 Authentication Refresh Retry (lines 6904-6950)

# Codex OAuth (401):
codex_auth_retry_attempted = False  # Once per request
_try_refresh_codex_client_credentials()

# Nous Portal (401):
nous_auth_retry_attempted = False
_try_refresh_nous_client_credentials()

# Anthropic (401):
anthropic_auth_retry_attempted = False
_try_refresh_anthropic_client_credentials()

2.6 Length Continuation Retry (lines 6639-6765)

# Response truncated (finish_reason='length'):
length_continue_retries = 0
max_continuation_retries = 3

# Request continuation with prompt:
"[System: Your previous response was truncated... Continue exactly where you left off]"

2.7 Tool Call Validation Retries (lines 7400-7500)

# Invalid tool name: 3 repair attempts
# 1. Lowercase
# 2. Normalize (hyphens/spaces to underscores)
# 3. Fuzzy match (difflib, cutoff=0.7)

# Invalid JSON arguments: 3 retries
# Empty content after think blocks: 3 retries
# Incomplete scratchpad: 3 retries
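The three name-repair passes can be sketched with difflib (a minimal illustration using the cutoff above; the helper name is hypothetical):

```python
import difflib

def repair_tool_name(name, valid_names):
    """Three repair passes: lowercase, normalize separators, fuzzy match.
    Returns a valid tool name or None if repair fails."""
    candidate = name.lower()
    if candidate in valid_names:
        return candidate
    normalized = candidate.replace("-", "_").replace(" ", "_")
    if normalized in valid_names:
        return normalized
    matches = difflib.get_close_matches(normalized, valid_names, n=1, cutoff=0.7)
    return matches[0] if matches else None
```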

3. Context Window Management Analysis

3.1 Multi-Layer Context System

┌────────────────────────────────────────────────────────────────────────┐
│                        Context Architecture                             │
├────────────────────────────────────────────────────────────────────────┤
│ Layer 1: System Prompt (cached per session)                            │
│   - SOUL.md or DEFAULT_AGENT_IDENTITY                                  │
│   - Memory blocks (MEMORY.md, USER.md)                                 │
│   - Skills index                                                       │
│   - Context files (AGENTS.md, .cursorrules)                            │
│   - Timestamp, platform hints                                          │
│   - ~2K-10K tokens typical                                            │
├────────────────────────────────────────────────────────────────────────┤
│ Layer 2: Conversation History                                          │
│   - User/assistant/tool messages                                       │
│   - Protected head (first 3 messages)                                  │
│   - Protected tail (last N messages by token budget)                   │
│   - Compressible middle section                                        │
├────────────────────────────────────────────────────────────────────────┤
│ Layer 3: Tool Definitions                                              │
│   - ~20-30K tokens with many tools                                     │
│   - Filtered by enabled/disabled toolsets                              │
├────────────────────────────────────────────────────────────────────────┤
│ Layer 4: Ephemeral Context (API call only)                             │
│   - Prefill messages                                                   │
│   - Honcho turn context                                                │
│   - Plugin context                                                     │
│   - Ephemeral system prompt                                            │
└────────────────────────────────────────────────────────────────────────┘

3.2 ContextCompressor Algorithm (agent/context_compressor.py)

# Configuration:
threshold_percent = 0.50        # Compress at 50% of context length
protect_first_n = 3             # Head protection
protect_last_n = 20             # Tail protection (message count fallback)
tail_token_budget = 20_000      # Tail protection (token budget)
summary_target_ratio = 0.20     # 20% of compressed content for summary

# Compression phases:
1. Prune old tool results (cheap pre-pass)
2. Determine boundaries (head + tail protection)
3. Generate structured summary via LLM
4. Sanitize tool_call/tool_result pairs
5. Assemble compressed message list

# Iterative summary updates:
_previous_summary = None  # Stored for next compression
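The boundary determination in phase 2 can be sketched as follows (a simplified model assuming a caller-supplied `count_tokens` function; the real compressor also has to respect tool_call/tool_result pairing):

```python
def compression_window(messages, protect_first_n=3, protect_last_n=20,
                       tail_token_budget=20_000, count_tokens=None):
    """Return (start, end) of the compressible middle slice of messages."""
    head_end = min(protect_first_n, len(messages))
    if count_tokens is None:
        # Fallback: protect the tail by message count.
        tail_start = max(head_end, len(messages) - protect_last_n)
    else:
        # Primary: protect the tail by token budget, walking backwards
        # until the budget is spent.
        spent, tail_start = 0, len(messages)
        while tail_start > head_end:
            cost = count_tokens(messages[tail_start - 1])
            if spent + cost > tail_token_budget:
                break
            spent += cost
            tail_start -= 1
    return head_end, tail_start  # messages[head_end:tail_start] is compressible
```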

3.3 Context Length Detection Hierarchy

# Detection priority (model_metadata.py):
1. Config override (config.yaml model.context_length)
2. Custom provider config (custom_providers[].models[].context_length)
3. models.dev registry lookup
4. OpenRouter API metadata
5. Endpoint /models probe (local servers)
6. Hardcoded DEFAULT_CONTEXT_LENGTHS
7. Context probing (trial-and-error tiers)
8. DEFAULT_FALLBACK_CONTEXT (128K)

3.4 Prompt Caching (Anthropic)

# System-and-3 strategy:
# - 4 cache_control breakpoints max
# - System prompt (stable)
# - Last 3 non-system messages (rolling window)
# - 5m or 1h TTL

# Activation conditions:
_is_openrouter_url() and "claude" in model.lower()
# OR native Anthropic endpoint
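A simplified sketch of the system-and-3 breakpoint placement (in the real Anthropic API, cache_control is attached to content blocks inside messages; this flat version only illustrates the 4-breakpoint budget):

```python
def apply_cache_control(system_blocks, messages, ttl="5m"):
    """Mark the system prompt plus the last 3 non-system messages with
    cache_control, staying within Anthropic's 4-breakpoint maximum."""
    marked = 0
    if system_blocks:
        system_blocks[-1]["cache_control"] = {"type": "ephemeral", "ttl": ttl}
        marked += 1
    for msg in reversed(messages):  # rolling window over the tail
        if marked >= 4:
            break
        if msg.get("role") != "system":
            msg["cache_control"] = {"type": "ephemeral", "ttl": ttl}
            marked += 1
    return marked
```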

3.5 Context Pressure Monitoring

# User-facing warnings (not injected to LLM):
_context_pressure_warned = False

# Thresholds:
_budget_caution_threshold = 0.7   # 70% - nudge to wrap up
_budget_warning_threshold = 0.9   # 90% - urgent

# Injection method:
# Added to last tool result JSON as _budget_warning field
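A sketch of the injection method (hypothetical helper; thresholds mirror the values above):

```python
import json

def inject_budget_warning(tool_result_json, used, budget):
    """Attach a _budget_warning field to a tool result's JSON once usage
    crosses the caution (70%) or warning (90%) thresholds."""
    ratio = used / budget
    if ratio < 0.7:
        return tool_result_json  # below caution: leave the result untouched
    level = "urgent: wrap up now" if ratio >= 0.9 else "caution: start wrapping up"
    payload = json.loads(tool_result_json)
    payload["_budget_warning"] = f"{ratio:.0%} of iteration budget used ({level})"
    return json.dumps(payload)
```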

4. Ten Performance Optimization Opportunities

4.1 Tool Call Deduplication (Missing)

Current: No general deduplication of identical tool calls within a batch
Impact: Redundant API calls, wasted tokens
Fix: Extend _deduplicate_tool_calls() (already implemented, but applied only to delegate_task) to all tool batches

4.2 Context Compression Frequency

Current: Compress only at threshold crossing
Impact: Sudden latency spike during compression
Fix: Background compression prediction + prefetch

4.3 Skills Prompt Cache Invalidation

Current: LRU cache keyed by (skills_dir, tools, toolsets)
Issue: External skill file changes may not invalidate cache
Fix: Add file watcher or mtime check before cache hit

4.4 Streaming Response Buffering

Current: Accumulates all deltas in memory
Impact: Memory bloat for long responses
Fix: Stream directly to output with minimal buffering

4.5 Tool Result Truncation Timing

Current: Truncates after tool execution completes
Impact: Wasted time on tools returning huge outputs
Fix: Streaming truncation during tool execution

4.6 Concurrent Tool Execution Limits

Current: Fixed _MAX_TOOL_WORKERS = 8
Issue: Not tuned to available CPU/memory
Fix: Dynamic worker count based on system resources

4.7 API Client Connection Pooling

Current: Creates new client per interruptible request
Issue: Connection overhead
Fix: Connection pool with proper cleanup

4.8 Model Metadata Cache TTL

Current: 1 hour fixed TTL for OpenRouter metadata
Issue: Stale pricing/context data
Fix: Adaptive TTL based on error rates

4.9 Honcho Context Prefetch

Current: Prefetch queued at turn end, consumed next turn
Issue: First turn has no prefetch
Fix: Pre-warm cache on session creation

4.10 Session DB Write Batching

Current: Per-message writes to SQLite
Impact: I/O overhead
Fix: Batch writes with periodic flush
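A sketch of the proposed batching fix, under the assumption of a simple messages(role, content) table (names are hypothetical, not the actual session DB schema):

```python
import sqlite3
import threading

class BatchedWriter:
    """Buffer per-message writes and flush them in a single transaction,
    either when the buffer fills or on an explicit flush() call."""

    def __init__(self, conn, flush_every=32):
        self.conn, self.flush_every = conn, flush_every
        self.buffer, self.lock = [], threading.Lock()

    def write(self, role, content):
        with self.lock:
            self.buffer.append((role, content))
            if len(self.buffer) >= self.flush_every:
                self._flush_locked()

    def flush(self):
        with self.lock:
            self._flush_locked()

    def _flush_locked(self):
        if not self.buffer:
            return
        with self.conn:  # one transaction for the whole batch
            self.conn.executemany(
                "INSERT INTO messages (role, content) VALUES (?, ?)",
                self.buffer)
        self.buffer.clear()
```

A periodic flush timer (or a flush on session close) would bound how much buffered history a crash could lose.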


5. Five Potential Race Conditions or Bugs

5.1 Interrupt Propagation Race (HIGH SEVERITY)

Location: run_agent.py lines 2253-2259

with self._active_children_lock:
    children_copy = list(self._active_children)
for child in children_copy:
    child.interrupt(message)  # Child may be gone

Issue: A child agent may be removed from _active_children between the copy and the iteration, so interrupt() can be called on an already-finished child
Fix: Re-check membership under the lock before each interrupt call
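A sketch of that fix, extracted into a testable function (hypothetical names; note that the re-check narrows the window rather than eliminating it, so interrupt() should still tolerate finished children):

```python
import threading

def interrupt_children(children, lock, interrupt):
    """Snapshot the active-children collection under the lock, then
    re-check membership before each interrupt so children removed after
    the snapshot are skipped."""
    with lock:
        snapshot = list(children)
    for child in snapshot:
        with lock:
            active = child in children
        if active:
            interrupt(child)
```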

5.2 Concurrent Tool Execution Order

Location: run_agent.py lines 5308-5478

# Results collected in order, but execution is concurrent
results = [None] * num_tools
def _run_tool(index, ...):
    results[index] = (function_name, ..., result, ...)

Issue: If tool A depends on tool B's side effects, concurrent execution may fail
Fix: Document that parallel tools must be independent; add dependency tracking

5.3 Session DB Concurrent Access

Location: run_agent.py lines 1716-1755

if not self._session_db:
    return
# ... multiple DB operations without transaction

Issue: Gateway creates multiple AIAgent instances; SQLite may lock
Fix: Add proper transaction wrapping and retry logic

5.4 Context Compressor State Mutation

Location: agent/context_compressor.py lines 545-677

messages, pruned_count = self._prune_old_tool_results(messages, ...)
# messages is modified copy, but original may be referenced elsewhere

Issue: Deep copy is shallow for nested structures; tool_calls may be shared
Fix: Ensure deep copy of entire message structure

5.5 Tool Call ID Collision

Location: run_agent.py lines 2910-2954

def _derive_responses_function_call_id(self, call_id, response_item_id):
    # Multiple derivations may collide
    return f"fc_{sanitized[:48]}"

Issue: Truncated IDs may collide in long conversations
Fix: Use full UUIDs or ensure uniqueness with a counter
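A sketch of the counter-based fix (hypothetical; hashing the full inputs plus a monotonic counter makes within-process collisions impossible):

```python
import hashlib
import itertools

_id_counter = itertools.count()

def derive_function_call_id(call_id, response_item_id):
    """Derive a stable-prefix ID: a hash over the full inputs (so
    truncation no longer discards distinguishing bytes) plus a
    monotonic counter for guaranteed uniqueness."""
    digest = hashlib.sha256(
        f"{call_id}:{response_item_id}".encode()).hexdigest()[:32]
    return f"fc_{digest}_{next(_id_counter)}"
```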


Appendix: Key Files and Responsibilities

| File | Lines | Responsibility |
|------|-------|----------------|
| run_agent.py | ~8500 | Main AIAgent class, conversation loop |
| agent/prompt_builder.py | ~816 | System prompt assembly, skills indexing |
| agent/context_compressor.py | ~676 | Context compression, summarization |
| agent/auxiliary_client.py | ~1822 | Side-task LLM client routing |
| agent/model_metadata.py | ~930 | Context length detection, pricing |
| agent/display.py | ~771 | CLI feedback, spinners |
| agent/prompt_caching.py | ~72 | Anthropic cache control |
| agent/trajectory.py | ~56 | Trajectory format conversion |
| agent/models_dev.py | ~172 | models.dev registry integration |

Summary Statistics

  • Total Core Code: ~13,000 lines
  • State Machine States: 8 primary, 4 sub-states
  • Retry Mechanisms: 7 distinct types
  • Context Layers: 4 layers with compression
  • Potential Issues: 5 identified (1 high severity)
  • Optimization Opportunities: 10 identified