Timmy_Foundation/timmy-home

Fork 0

Files

Alexander Whitestone 8d9e7cbf7e

Smoke Test / smoke (pull_request) Failing after 18s

Details

docs: record hermes-agent test finding (#668 )

2026-04-15 00:26:33 -04:00

18 KiB

Raw Blame History

GENOME.md — hermes-agent

Repository-wide facts in this document come from two grounded passes over /Users/apayne/hermes-agent on 2026-04-15:

python3 ~/.hermes/pipelines/codebase-genome.py --path /Users/apayne/hermes-agent --dry-run
targeted manual inspection of the core runtime, tooling, gateway, ACP, cron, and persistence modules

This is the Timmy Foundation fork of hermes-agent, not a generic upstream summary.

Project Overview

hermes-agent is a multi-surface AI agent runtime, not just a terminal chatbot. It combines:

a rich interactive CLI/TUI
a synchronous core agent loop
a large tool registry with terminal, file, web, browser, MCP, memory, cron, delegation, and code-execution tools
a multi-platform messaging gateway
ACP editor integration
an OpenAI-compatible API server
cron scheduling
persistent session/memory/state stores
batch and RL-adjacent research surfaces

The product promise in README.md is that Hermes is a self-improving agent:

it creates and updates skills
persists memory across sessions
searches past conversations
delegates to subagents
runs scheduled automations
can operate through multiple runtime backends and communication surfaces

Grounded quick facts from the analyzed checkout:

pipeline scan: 395 source files, 561 test files, 11 config files, 331,794 total lines
Python-only pass: 307 non-test .py modules and 561 test Python files
Python LOC split: 211,709 source LOC / 184,512 test LOC
current branch: main
current commit: 95d11dfd
last commit seen by pipeline: 95d11dfd docs: automation templates gallery + comparison post (#9821)
total commits reported by pipeline: 4140
largest Python modules observed:
- run_agent.py — 10,871 LOC
- cli.py — 10,017 LOC
- gateway/run.py — 9,289 LOC
- hermes_cli/main.py — 6,056 LOC

That size profile matters. Hermes is architecturally broad, but a few very large orchestration files still dominate the control plane.

Architecture Diagram

flowchart TD
    A[CLI / Gateway / ACP / API / Cron / Batch] --> B[AIAgent in run_agent.py]
    B --> C[agent/prompt_builder.py]
    B --> D[agent/memory_manager.py]
    B --> E[agent/context_compressor.py]
    B --> F[model_tools.py]

    F --> G[tools/registry.py]
    G --> H[tools/*.py built-in tools]
    G --> I[tools/mcp_tool.py imported MCP tools]
    G --> J[delegate / execute_code / cron / browser / terminal / file tools]

    B --> K[hermes_state.py SQLite SessionDB]
    B --> L[toolsets.py toolset selection]

    M[cli.py + hermes_cli/main.py] --> B
    N[gateway/run.py] --> B
    O[acp_adapter/server.py] --> B
    P[gateway/platforms/api_server.py] --> B
    Q[cron/scheduler.py + cron/jobs.py] --> B
    R[batch_runner.py] --> B

    N --> S[gateway/session.py]
    N --> T[gateway/platforms/* adapters]
    P --> U[Responses API store]
    O --> V[ACP session/event server]
    Q --> W[cron job persistence + delivery]

    K --> X[state.db / FTS5 search]
    S --> Y[sessions.json mapping]
    J --> Z[local shell, files, web, browser, subprocesses, remote MCP servers]

Entry Points and Data Flow

Primary entry points

hermes → hermes_cli.main:main
- canonical CLI entry point
- preloads profile context and builds the argparse/subcommand shell
- hands interactive chat to cli.py
hermes-agent → run_agent:main
- direct runner around the core agent loop
- closest entry point to the raw agent runtime
hermes-acp → acp_adapter.entry:main
- ACP server for VS Code / Zed / JetBrains style integrations
gateway/run.py
- async orchestration loop for Telegram, Discord, Slack, WhatsApp, Signal, Matrix, webhook, email, SMS, and other adapters
gateway/platforms/api_server.py
- OpenAI-compatible HTTP surface
- exposes /v1/chat/completions, /v1/responses, /v1/models, /v1/runs, and /health
cron/scheduler.py + cron/jobs.py
- scheduled job execution and delivery
batch_runner.py
- parallel batch trajectory and research workloads

Core data flow

An entry surface receives input:
- terminal prompt
- incoming platform message
- ACP editor request
- HTTP request
- scheduled cron job
- batch input
The surface resolves runtime state:
- profile/config
- platform identity
- model/provider settings
- toolset selection
- current session ID and conversation history
run_agent.py assembles the effective prompt:
- persona/system directives
- platform hints
- context files (AGENTS.md, SOUL.md, repo-local context)
- skill content
- memory blocks from agent/memory_manager.py
- compression summaries from agent/context_compressor.py
model_tools.py discovers and filters tools:
- imports tool modules so they self-register into tools/registry.py
- resolves enabled toolsets from toolsets.py
- returns tool schemas to the active model provider
The model responds with either:
- final assistant text
- tool calls
Tool calls are dispatched through:
- model_tools.py
- tools/registry.py
- the concrete tool handler
Tool outputs are appended back into the conversation and the loop continues until a final answer is produced.
State is persisted through:
- hermes_state.py for sessions/messages/search
- gateway/session.py for gateway session routing state
- dedicated stores for response APIs, background processes, and cron jobs

This is a layered architecture: many user-facing surfaces, one central agent runtime, one central tool registry, and several specialized persistence layers.

Key Abstractions

1. `AIAgent` (`run_agent.py`)

This is the heart of Hermes. It owns:

provider/model invocation
tool-loop orchestration
prompt assembly
memory integration
compression and token budgeting
final response construction

2. `IterationBudget` (`run_agent.py`)

A guardrail abstraction around how much work a turn may do. It matters because Hermes is not just text generation — it may launch tools, spawn subagents, or recurse through internal workflows.

3. `ToolRegistry` / tool self-registration (`tools/registry.py`)

Every major tool advertises itself into a central registry. That gives Hermes one place to manage:

schemas
handlers
availability checks
environment requirements
dispatch behavior

This is a defining architectural trait of the codebase.

4. Toolsets (`toolsets.py`)

Tool exposure is not hardcoded per surface. Instead, Hermes uses named toolsets and platform-specific aliases such as CLI, gateway, ACP, and API-server presets. This is how one agent runtime can safely shape different operating surfaces.

5. `MemoryManager` (`agent/memory_manager.py`)

Hermes supports both built-in memory and external memory providers. The abstraction here is not “a markdown note” but a memory multiplexor that decides what memory context gets injected and how memory tools behave.

6. `ContextCompressor` (`agent/context_compressor.py`)

Compression is a first-class subsystem. Hermes treats long-context management as part of the runtime architecture, not an afterthought.

7. `SessionDB` (`hermes_state.py`)

SQLite + FTS5 session persistence is core infrastructure. This is what makes cross-session recall, search, billing/accounting, and agent continuity practical.

8. `SessionStore` / `SessionContext` (`gateway/session.py`)

The gateway needs a routing abstraction different from raw message history. It tracks home channels, session keys, reset policy, and platform-specific mapping.

9. `HermesACPAgent` (`acp_adapter/server.py`)

ACP is not bolted on as a thin shim. It wraps Hermes as an editor-native agent with its own session/event lifecycle.

10. `ProcessRegistry` (`tools/process_registry.py`)

Long-running background commands are first-class managed resources. Hermes tracks them explicitly rather than treating subprocesses as disposable side effects.

API Surface

CLI and shell API

Important surfaces exposed by packaging and command routing:

hermes
hermes-agent
hermes-acp
subcommands in hermes_cli/main.py
slash commands defined centrally in hermes_cli/commands.py

The slash-command registry is a notable design choice because the same command metadata feeds:

CLI help
gateway help
Telegram bot command menus
Slack subcommand routing
autocomplete

HTTP API surface

From gateway/platforms/api_server.py, the major routes are:

POST /v1/chat/completions
POST /v1/responses
GET /v1/responses/{response_id}
DELETE /v1/responses/{response_id}
GET /v1/models
POST /v1/runs
GET /v1/runs/{run_id}/events
GET /health

This makes Hermes usable as an OpenAI-compatible backend for external clients and web UIs.

Messaging platform API surface

The gateway platform abstraction exposes Hermes across many adapters under gateway/platforms/. Observed adapters include:

Telegram
Discord
Slack
WhatsApp
Signal
Matrix
Home Assistant
webhook
email
SMS
Mattermost
QQBot
WeCom / Weixin
DingTalk
BlueBubbles

Tool API surface

The tool surface is broad and central to the product:

terminal execution
process management
file IO / search / patch
browser automation
web search/extract
cron jobs
memory and session search
subagent delegation
execute_code sandbox
MCP tool import
TTS / vision / image generation
smart-home integrations

MCP / ACP surface

Hermes participates on both sides:

as an MCP client via tools/mcp_tool.py
as an MCP server for messaging/session capabilities via mcp_serve.py
as an ACP server via acp_adapter/*

That makes Hermes an orchestration hub, not just a single runtime process.

Test Coverage Gaps

Current observed test posture

A live collection pass on the analyzed checkout produced:

11,470 tests collected
50 deselected
6 collection errors

The collection errors are all ACP-related:

tests/acp/test_entry.py
tests/acp/test_events.py
tests/acp/test_mcp_e2e.py
tests/acp/test_permissions.py
tests/acp/test_server.py
tests/acp/test_tools.py

Root cause from the live run:

ModuleNotFoundError: No module named 'acp'
equivalently: ModuleNotFoundError: No module named acp`` in the failing ACP collection lane
this lines up with pyproject.toml, where ACP support is optional and gated behind the acp extra (agent-client-protocol>=0.9.0,<1.0)

A secondary signal from collection:

tests/tools/test_file_sync_perf.py emits PytestUnknownMarkWarning: Unknown pytest.mark.ssh

This specific collection problem is now tracked in hermes-agent issue #779.

Where coverage looks strong

By file distribution, the codebase is heavily tested around:

gateway/
tools/
hermes_cli/
run_agent
cli
agent

That matches the product center of gravity: runtime orchestration, tool dispatch, and communication surfaces.

Highest-value remaining gaps

The biggest gaps are not in total test count. They are in critical-path complexity.

run_agent.py
- the most important file in the repo and also the largest
- likely has broad behavior coverage, but branch-level completeness is improbable at 10k+ LOC
cli.py
- extremely large UI/orchestration surface
- high risk of hidden regressions across streaming, voice, slash-command routing, and interaction state
gateway/run.py
- core async gateway brain
- many platform-specific edge cases converge here
hermes_cli/main.py
- main command shell is huge and mixes parsing, routing, setup, and environment behavior
ACP end-to-end coverage under optional dependency installation
- current collection failure proves this lane is environment-sensitive
- ACP deserves a reliable extras-aware CI lane so collection failures are surfaced intentionally, not accidentally
batch_runner.py and trajectory_compressor.py
- research/training surfaces appear lighter and deserve more explicit contract tests
cron lifecycle and delivery failure behavior
- cron/scheduler.py and cron/jobs.py are safety-critical for unattended automation
optional or integration-heavy backends
- platform adapters like Feishu / Discord / Telegram
- container/cloud terminal environments
- MCP server interop
- API server streaming edge cases

Missing tests for critical paths

The next high-leverage test work should target:

ACP extras-enabled collection and smoke execution
run_agent.py happy-path + interruption + compression + delegate + approval interaction boundaries
gateway/run.py cache/interrupt/restart/session-boundary behavior at integration level
cron/scheduler.py delivery error recovery, stale-job cleanup, and due-job fairness
batch_runner.py and trajectory_compressor.py contract tests
API-server Responses lifecycle and streaming segmentation behavior

Security Considerations

Hermes is security-sensitive because it can run commands, read files, talk to platforms, call browsers, and broker MCP tools. The codebase already contains several strong defensive layers.

1. Prompt-injection defense for context files

agent/prompt_builder.py scans context files such as AGENTS.md, SOUL.md, and similar instructions for:

prompt-override language
hidden comment/HTML tricks
invisible unicode
secret exfiltration patterns

That is an important architectural guardrail because Hermes explicitly ingests repository-local instruction files.

2. Dangerous-command approval system

tools/approval.py centralizes detection of destructive commands and risky shell behavior. The repo treats command approval as a core policy subsystem, not a UI nicety.

3. File-path and device protections

tools/file_tools.py blocks dangerous device paths and sensitive system writes. It also redacts sensitive content in read/search results and blocks reads from internal Hermes-sensitive locations.

4. Terminal/workdir sanitization

tools/terminal_tool.py constrains workdir handling and shell execution boundaries. This matters because terminal access is one of the highest-risk capabilities Hermes exposes.

5. MCP subprocess hygiene

tools/mcp_tool.py filters environment variables passed to MCP servers and strips credentials from surfaced errors. Given that MCP introduces third-party subprocesses into the tool graph, this is a critical boundary.

6. Gateway privacy and pairing controls

Gateway code includes pairing, session routing, and ID-redaction logic. That is important because Hermes operates across public and semi-public communication surfaces.

7. HTTP/API hardening

gateway/platforms/api_server.py includes auth, CORS handling, and response-store boundaries. This makes the API server a real production surface, not just a convenience wrapper.

8. Supply-chain awareness

pyproject.toml pins many dependencies to constrained ranges and includes security notes for selected packages. That indicates explicit supply-chain thinking in dependency management.

Performance Characteristics

1. prompt caching is a first-class optimization

Hermes preserves long-lived agent instances and supports provider-specific prompt caching for compatible providers. That is essential because repeated system prompts and tool schemas are expensive.

2. context compression is built into the runtime

Compression is not a manual rescue path only. Hermes estimates token budgets, prunes old tool noise, and can summarize prior context when needed.

3. parallel tool execution exists, but selectively

The runtime can batch safe tool calls in parallel rather than serializing every read-only action. This improves latency without giving up all control over side effects.

4. Async loop reuse reduces orchestration overhead

The runtime avoids constantly recreating event loops for async tools, which matters when many tool calls are issued inside otherwise synchronous agent flows.

5. SQLite is tuned for agent workloads

hermes_state.py uses WAL mode, short lock windows, and retry logic instead of pretending SQLite is magically contention-free. This is a sensible tradeoff for sovereign local persistence.

6. Background processes are explicitly managed

ProcessRegistry maintains output windows, state, and watcher behavior so long-running commands do not become invisible resource leaks.

7. Large control-plane files are a real performance and maintenance cost

The repo has broad feature coverage, but a few huge orchestration files dominate complexity:

run_agent.py
cli.py
gateway/run.py
hermes_cli/main.py

These files are not just maintainability debt; they also create higher reasoning and regression load for both humans and agents working in the codebase.

Critical Modules to Name Explicitly

The following files define the real control plane of Hermes and should always be named in any serious architecture summary:

run_agent.py
model_tools.py
tools/registry.py
toolsets.py
cli.py
hermes_cli/main.py
hermes_cli/commands.py
hermes_state.py
agent/prompt_builder.py
agent/context_compressor.py
agent/memory_manager.py
tools/terminal_tool.py
tools/file_tools.py
tools/mcp_tool.py
gateway/run.py
gateway/session.py
gateway/platforms/api_server.py
acp_adapter/server.py
cron/scheduler.py
cron/jobs.py
batch_runner.py
trajectory_compressor.py

Practical Takeaway

Hermes Agent is best understood as a sovereign agent operating system. The CLI, gateway, ACP server, API server, cron scheduler, and tool graph are all frontends onto one core runtime.

The strongest qualities of the codebase are:

broad feature coverage
a central tool-registry design
serious persistence/memory infrastructure
strong security thinking around prompts, tools, files, and approvals
a deep test surface across gateway/tools/CLI behavior

The most important risks are:

extremely large orchestration files
optional-surface fragility, especially ACP extras and integration-heavy adapters
under-tested research/batch lanes relative to the core runtime
growing complexity at the boundaries where multiple surfaces reuse the same agent loop

18 KiB Raw Blame History