docs: record hermes-agent test finding (#668)

test: validate hermes-agent genome (#668)
docs: add hermes-agent genome draft (#668)
2026-04-15 00:26:33 -04:00 · 2026-04-15 00:24:01 -04:00 · 2026-04-15 00:21:39 -04:00
6 changed files with 569 additions and 353 deletions
--- a/GENOME.md
+++ b/GENOME.md
@@ -0,0 +1,485 @@
+# GENOME.md — hermes-agent
+
+Repository-wide facts in this document come from two grounded passes over `/Users/apayne/hermes-agent` on 2026-04-15:
+- `python3 ~/.hermes/pipelines/codebase-genome.py --path /Users/apayne/hermes-agent --dry-run`
+- targeted manual inspection of the core runtime, tooling, gateway, ACP, cron, and persistence modules
+
+This is the Timmy Foundation fork of `hermes-agent`, not a generic upstream summary.
+
+## Project Overview
+
+`hermes-agent` is a multi-surface AI agent runtime, not just a terminal chatbot.
+It combines:
+- a rich interactive CLI/TUI
+- a synchronous core agent loop
+- a large tool registry with terminal, file, web, browser, MCP, memory, cron, delegation, and code-execution tools
+- a multi-platform messaging gateway
+- ACP editor integration
+- an OpenAI-compatible API server
+- cron scheduling
+- persistent session/memory/state stores
+- batch and RL-adjacent research surfaces
+
+The product promise in `README.md` is that Hermes is a self-improving agent:
+- it creates and updates skills
+- persists memory across sessions
+- searches past conversations
+- delegates to subagents
+- runs scheduled automations
+- can operate through multiple runtime backends and communication surfaces
+
+Grounded quick facts from the analyzed checkout:
+- pipeline scan: 395 source files, 561 test files, 11 config files, 331,794 total lines
+- Python-only pass: 307 non-test `.py` modules and 561 test Python files
+- Python LOC split: 211,709 source LOC / 184,512 test LOC
+- current branch: `main`
+- current commit: `95d11dfd`
+- last commit seen by pipeline: `95d11dfd docs: automation templates gallery + comparison post (#9821)`
+- total commits reported by pipeline: 4140
+- largest Python modules observed:
+  - `run_agent.py` — 10,871 LOC
+  - `cli.py` — 10,017 LOC
+  - `gateway/run.py` — 9,289 LOC
+  - `hermes_cli/main.py` — 6,056 LOC
+
+That size profile matters. Hermes is architecturally broad, but a few very large orchestration files still dominate the control plane.
+
+## Architecture Diagram
+
+```mermaid
+flowchart TD
+    A[CLI / Gateway / ACP / API / Cron / Batch] --> B[AIAgent in run_agent.py]
+    B --> C[agent/prompt_builder.py]
+    B --> D[agent/memory_manager.py]
+    B --> E[agent/context_compressor.py]
+    B --> F[model_tools.py]
+
+    F --> G[tools/registry.py]
+    G --> H[tools/*.py built-in tools]
+    G --> I[tools/mcp_tool.py imported MCP tools]
+    G --> J[delegate / execute_code / cron / browser / terminal / file tools]
+
+    B --> K[hermes_state.py SQLite SessionDB]
+    B --> L[toolsets.py toolset selection]
+
+    M[cli.py + hermes_cli/main.py] --> B
+    N[gateway/run.py] --> B
+    O[acp_adapter/server.py] --> B
+    P[gateway/platforms/api_server.py] --> B
+    Q[cron/scheduler.py + cron/jobs.py] --> B
+    R[batch_runner.py] --> B
+
+    N --> S[gateway/session.py]
+    N --> T[gateway/platforms/* adapters]
+    P --> U[Responses API store]
+    O --> V[ACP session/event server]
+    Q --> W[cron job persistence + delivery]
+
+    K --> X[state.db / FTS5 search]
+    S --> Y[sessions.json mapping]
+    J --> Z[local shell, files, web, browser, subprocesses, remote MCP servers]
+```
+
+## Entry Points and Data Flow
+
+### Primary entry points
+
+1. `hermes` → `hermes_cli.main:main`
+   - canonical CLI entry point
+   - preloads profile context and builds the argparse/subcommand shell
+   - hands interactive chat to `cli.py`
+
+2. `hermes-agent` → `run_agent:main`
+   - direct runner around the core agent loop
+   - closest entry point to the raw agent runtime
+
+3. `hermes-acp` → `acp_adapter.entry:main`
+   - ACP server for VS Code / Zed / JetBrains style integrations
+
+4. `gateway/run.py`
+   - async orchestration loop for Telegram, Discord, Slack, WhatsApp, Signal, Matrix, webhook, email, SMS, and other adapters
+
+5. `gateway/platforms/api_server.py`
+   - OpenAI-compatible HTTP surface
+   - exposes `/v1/chat/completions`, `/v1/responses`, `/v1/models`, `/v1/runs`, and `/health`
+
+6. `cron/scheduler.py` + `cron/jobs.py`
+   - scheduled job execution and delivery
+
+7. `batch_runner.py`
+   - parallel batch trajectory and research workloads
+
+### Core data flow
+
+1. An entry surface receives input:
+   - terminal prompt
+   - incoming platform message
+   - ACP editor request
+   - HTTP request
+   - scheduled cron job
+   - batch input
+
+2. The surface resolves runtime state:
+   - profile/config
+   - platform identity
+   - model/provider settings
+   - toolset selection
+   - current session ID and conversation history
+
+3. `run_agent.py` assembles the effective prompt:
+   - persona/system directives
+   - platform hints
+   - context files (`AGENTS.md`, `SOUL.md`, repo-local context)
+   - skill content
+   - memory blocks from `agent/memory_manager.py`
+   - compression summaries from `agent/context_compressor.py`
+
+4. `model_tools.py` discovers and filters tools:
+   - imports tool modules so they self-register into `tools/registry.py`
+   - resolves enabled toolsets from `toolsets.py`
+   - returns tool schemas to the active model provider
+
+5. The model responds with either:
+   - final assistant text
+   - tool calls
+
+6. Tool calls are dispatched through:
+   - `model_tools.py`
+   - `tools/registry.py`
+   - the concrete tool handler
+
+7. Tool outputs are appended back into the conversation and the loop continues until a final answer is produced.
+
+8. State is persisted through:
+   - `hermes_state.py` for sessions/messages/search
+   - `gateway/session.py` for gateway session routing state
+   - dedicated stores for response APIs, background processes, and cron jobs
+
+This is a layered architecture: many user-facing surfaces, one central agent runtime, one central tool registry, and several specialized persistence layers.
+
+## Key Abstractions
+
+### 1. `AIAgent` (`run_agent.py`)
+This is the heart of Hermes.
+It owns:
+- provider/model invocation
+- tool-loop orchestration
+- prompt assembly
+- memory integration
+- compression and token budgeting
+- final response construction
+
+### 2. `IterationBudget` (`run_agent.py`)
+A guardrail abstraction around how much work a turn may do.
+It matters because Hermes is not just text generation — it may launch tools, spawn subagents, or recurse through internal workflows.
+
+### 3. `ToolRegistry` / tool self-registration (`tools/registry.py`)
+Every major tool advertises itself into a central registry.
+That gives Hermes one place to manage:
+- schemas
+- handlers
+- availability checks
+- environment requirements
+- dispatch behavior
+
+This is a defining architectural trait of the codebase.
+
+### 4. Toolsets (`toolsets.py`)
+Tool exposure is not hardcoded per surface.
+Instead, Hermes uses named toolsets and platform-specific aliases such as CLI, gateway, ACP, and API-server presets.
+This is how one agent runtime can safely shape different operating surfaces.
+
+### 5. `MemoryManager` (`agent/memory_manager.py`)
+Hermes supports both built-in memory and external memory providers.
+The abstraction here is not “a markdown note” but a memory multiplexor that decides what memory context gets injected and how memory tools behave.
+
+### 6. `ContextCompressor` (`agent/context_compressor.py`)
+Compression is a first-class subsystem.
+Hermes treats long-context management as part of the runtime architecture, not an afterthought.
+
+### 7. `SessionDB` (`hermes_state.py`)
+SQLite + FTS5 session persistence is core infrastructure.
+This is what makes cross-session recall, search, billing/accounting, and agent continuity practical.
+
+### 8. `SessionStore` / `SessionContext` (`gateway/session.py`)
+The gateway needs a routing abstraction different from raw message history.
+It tracks home channels, session keys, reset policy, and platform-specific mapping.
+
+### 9. `HermesACPAgent` (`acp_adapter/server.py`)
+ACP is not bolted on as a thin shim.
+It wraps Hermes as an editor-native agent with its own session/event lifecycle.
+
+### 10. `ProcessRegistry` (`tools/process_registry.py`)
+Long-running background commands are first-class managed resources.
+Hermes tracks them explicitly rather than treating subprocesses as disposable side effects.
+
+## API Surface
+
+### CLI and shell API
+Important surfaces exposed by packaging and command routing:
+- `hermes`
+- `hermes-agent`
+- `hermes-acp`
+- subcommands in `hermes_cli/main.py`
+- slash commands defined centrally in `hermes_cli/commands.py`
+
+The slash-command registry is a notable design choice because the same command metadata feeds:
+- CLI help
+- gateway help
+- Telegram bot command menus
+- Slack subcommand routing
+- autocomplete
+
+### HTTP API surface
+From `gateway/platforms/api_server.py`, the major routes are:
+- `POST /v1/chat/completions`
+- `POST /v1/responses`
+- `GET /v1/responses/{response_id}`
+- `DELETE /v1/responses/{response_id}`
+- `GET /v1/models`
+- `POST /v1/runs`
+- `GET /v1/runs/{run_id}/events`
+- `GET /health`
+
+This makes Hermes usable as an OpenAI-compatible backend for external clients and web UIs.
+
+### Messaging platform API surface
+The gateway platform abstraction exposes Hermes across many adapters under `gateway/platforms/`.
+Observed adapters include:
+- Telegram
+- Discord
+- Slack
+- WhatsApp
+- Signal
+- Matrix
+- Home Assistant
+- webhook
+- email
+- SMS
+- Mattermost
+- QQBot
+- WeCom / Weixin
+- DingTalk
+- BlueBubbles
+
+### Tool API surface
+The tool surface is broad and central to the product:
+- terminal execution
+- process management
+- file IO / search / patch
+- browser automation
+- web search/extract
+- cron jobs
+- memory and session search
+- subagent delegation
+- execute_code sandbox
+- MCP tool import
+- TTS / vision / image generation
+- smart-home integrations
+
+### MCP / ACP surface
+Hermes participates on both sides:
+- as an MCP client via `tools/mcp_tool.py`
+- as an MCP server for messaging/session capabilities via `mcp_serve.py`
+- as an ACP server via `acp_adapter/*`
+
+That makes Hermes an orchestration hub, not just a single runtime process.
+
+## Test Coverage Gaps
+
+### Current observed test posture
+A live collection pass on the analyzed checkout produced:
+- 11,470 tests collected
+- 50 deselected
+- 6 collection errors
+
+The collection errors are all ACP-related:
+- `tests/acp/test_entry.py`
+- `tests/acp/test_events.py`
+- `tests/acp/test_mcp_e2e.py`
+- `tests/acp/test_permissions.py`
+- `tests/acp/test_server.py`
+- `tests/acp/test_tools.py`
+
+Root cause from the live run:
+- `ModuleNotFoundError: No module named 'acp'`
+- the same `ModuleNotFoundError` for `acp` is what fails the ACP collection lane
+- this lines up with `pyproject.toml`, where ACP support is optional and gated behind the `acp` extra (`agent-client-protocol>=0.9.0,<1.0`)
+
+A secondary signal from collection:
+- `tests/tools/test_file_sync_perf.py` emits `PytestUnknownMarkWarning: Unknown pytest.mark.ssh`
+
+This specific collection problem is now tracked in hermes-agent issue `#779`.
+
+### Where coverage looks strong
+By file distribution, the codebase is heavily tested around:
+- `gateway/`
+- `tools/`
+- `hermes_cli/`
+- `run_agent`
+- `cli`
+- `agent`
+
+That matches the product center of gravity: runtime orchestration, tool dispatch, and communication surfaces.
+
+### Highest-value remaining gaps
+The biggest gaps are not in total test count. They are in critical-path complexity.
+
+1. `run_agent.py`
+   - the most important file in the repo and also the largest
+   - likely has broad behavior coverage, but branch-level completeness is improbable at 10k+ LOC
+
+2. `cli.py`
+   - extremely large UI/orchestration surface
+   - high risk of hidden regressions across streaming, voice, slash-command routing, and interaction state
+
+3. `gateway/run.py`
+   - core async gateway brain
+   - many platform-specific edge cases converge here
+
+4. `hermes_cli/main.py`
+   - main command shell is huge and mixes parsing, routing, setup, and environment behavior
+
+5. ACP end-to-end coverage under optional dependency installation
+   - current collection failure proves this lane is environment-sensitive
+   - ACP deserves a reliable extras-aware CI lane so collection failures are surfaced intentionally, not accidentally
+
+6. `batch_runner.py` and `trajectory_compressor.py`
+   - test coverage for these research/training surfaces appears lighter; they deserve more explicit contract tests
+
+7. cron lifecycle and delivery failure behavior
+   - `cron/scheduler.py` and `cron/jobs.py` are safety-critical for unattended automation
+
+8. optional or integration-heavy backends
+   - platform adapters like Feishu / Discord / Telegram
+   - container/cloud terminal environments
+   - MCP server interop
+   - API server streaming edge cases
+
+### Missing tests for critical paths
+The next high-leverage test work should target:
+- ACP extras-enabled collection and smoke execution
+- `run_agent.py` happy-path + interruption + compression + delegate + approval interaction boundaries
+- `gateway/run.py` cache/interrupt/restart/session-boundary behavior at integration level
+- `cron/scheduler.py` delivery error recovery, stale-job cleanup, and due-job fairness
+- `batch_runner.py` and `trajectory_compressor.py` contract tests
+- API-server Responses lifecycle and streaming segmentation behavior
+
+## Security Considerations
+
+Hermes is security-sensitive because it can run commands, read files, talk to platforms, call browsers, and broker MCP tools.
+The codebase already contains several strong defensive layers.
+
+### 1. Prompt-injection defense for context files
+`agent/prompt_builder.py` scans context files such as `AGENTS.md`, `SOUL.md`, and similar instructions for:
+- prompt-override language
+- hidden comment/HTML tricks
+- invisible unicode
+- secret exfiltration patterns
+
+That is an important architectural guardrail because Hermes explicitly ingests repository-local instruction files.
+
+### 2. Dangerous-command approval system
+`tools/approval.py` centralizes detection of destructive commands and risky shell behavior.
+The repo treats command approval as a core policy subsystem, not a UI nicety.
+
+### 3. File-path and device protections
+`tools/file_tools.py` blocks dangerous device paths and sensitive system writes.
+It also redacts sensitive content in read/search results and blocks reads from internal Hermes-sensitive locations.
+
+### 4. Terminal/workdir sanitization
+`tools/terminal_tool.py` constrains workdir handling and shell execution boundaries.
+This matters because terminal access is one of the highest-risk capabilities Hermes exposes.
+
+### 5. MCP subprocess hygiene
+`tools/mcp_tool.py` filters environment variables passed to MCP servers and strips credentials from surfaced errors.
+Given that MCP introduces third-party subprocesses into the tool graph, this is a critical boundary.
+
+### 6. Gateway privacy and pairing controls
+Gateway code includes pairing, session routing, and ID-redaction logic.
+That is important because Hermes operates across public and semi-public communication surfaces.
+
+### 7. HTTP/API hardening
+`gateway/platforms/api_server.py` includes auth, CORS handling, and response-store boundaries.
+This makes the API server a real production surface, not just a convenience wrapper.
+
+### 8. Supply-chain awareness
+`pyproject.toml` pins many dependencies to constrained ranges and includes security notes for selected packages.
+That indicates explicit supply-chain thinking in dependency management.
+
+## Performance Characteristics
+
+### 1. Prompt caching is a first-class optimization
+Hermes preserves long-lived agent instances and supports provider-specific prompt caching for compatible providers.
+That is essential because repeated system prompts and tool schemas are expensive.
+
+### 2. Context compression is built into the runtime
+Compression is not only a manual rescue path.
+Hermes estimates token budgets, prunes old tool noise, and can summarize prior context when needed.
+
+### 3. Parallel tool execution exists, but selectively
+The runtime can batch safe tool calls in parallel rather than serializing every read-only action.
+This improves latency without giving up all control over side effects.
+
+### 4. Async loop reuse reduces orchestration overhead
+The runtime avoids constantly recreating event loops for async tools, which matters when many tool calls are issued inside otherwise synchronous agent flows.
+
+### 5. SQLite is tuned for agent workloads
+`hermes_state.py` uses WAL mode, short lock windows, and retry logic instead of pretending SQLite is magically contention-free.
+This is a sensible tradeoff for sovereign local persistence.
+
+### 6. Background processes are explicitly managed
+`ProcessRegistry` maintains output windows, state, and watcher behavior so long-running commands do not become invisible resource leaks.
+
+### 7. Large control-plane files are a real performance and maintenance cost
+The repo has broad feature coverage, but a few huge orchestration files dominate complexity:
+- `run_agent.py`
+- `cli.py`
+- `gateway/run.py`
+- `hermes_cli/main.py`
+
+These files are not just maintainability debt; they also create higher reasoning and regression load for both humans and agents working in the codebase.
+
+## Critical Modules to Name Explicitly
+
+The following files define the real control plane of Hermes and should always be named in any serious architecture summary:
+- `run_agent.py`
+- `model_tools.py`
+- `tools/registry.py`
+- `toolsets.py`
+- `cli.py`
+- `hermes_cli/main.py`
+- `hermes_cli/commands.py`
+- `hermes_state.py`
+- `agent/prompt_builder.py`
+- `agent/context_compressor.py`
+- `agent/memory_manager.py`
+- `tools/terminal_tool.py`
+- `tools/file_tools.py`
+- `tools/mcp_tool.py`
+- `gateway/run.py`
+- `gateway/session.py`
+- `gateway/platforms/api_server.py`
+- `acp_adapter/server.py`
+- `cron/scheduler.py`
+- `cron/jobs.py`
+- `batch_runner.py`
+- `trajectory_compressor.py`
+
+## Practical Takeaway
+
+Hermes Agent is best understood as a sovereign agent operating system.
+The CLI, gateway, ACP server, API server, cron scheduler, and tool graph are all frontends onto one core runtime.
+
+The strongest qualities of the codebase are:
+- broad feature coverage
+- a central tool-registry design
+- serious persistence/memory infrastructure
+- strong security thinking around prompts, tools, files, and approvals
+- a deep test surface across gateway/tools/CLI behavior
+
+The most important risks are:
+- extremely large orchestration files
+- optional-surface fragility, especially ACP extras and integration-heavy adapters
+- under-tested research/batch lanes relative to the core runtime
+- growing complexity at the boundaries where multiple surfaces reuse the same agent loop
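Note on the `GENOME.md` hunk above: the tool self-registration pattern it describes under Key Abstractions (one registry holding schemas, handlers, availability checks, and dispatch) can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the contents of `tools/registry.py`: `ToolEntry`, `ToolRegistry`, its methods, and the `echo` tool are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolEntry:
    """One registered tool: a schema for the model and a handler for dispatch."""
    name: str
    schema: dict[str, Any]
    handler: Callable[..., str]
    is_available: Callable[[], bool] = lambda: True


@dataclass
class ToolRegistry:
    """Hypothetical central registry; not the real tools/registry.py API."""
    _tools: dict[str, ToolEntry] = field(default_factory=dict)

    def register(self, entry: ToolEntry) -> None:
        self._tools[entry.name] = entry

    def schemas(self, enabled: set[str]) -> list[dict[str, Any]]:
        # Expose only tools that the active toolset enables and that are available.
        return [
            tool.schema
            for tool in self._tools.values()
            if tool.name in enabled and tool.is_available()
        ]

    def dispatch(self, name: str, arguments: dict[str, Any]) -> str:
        entry = self._tools.get(name)
        if entry is None:
            return f"error: unknown tool {name!r}"
        return entry.handler(**arguments)


registry = ToolRegistry()


def echo(text: str) -> str:
    # Hypothetical tool handler used only for this illustration.
    return text.upper()


# In a self-registering design, a call like this would run when the tool module is imported.
registry.register(
    ToolEntry(
        name="echo",
        schema={"name": "echo", "parameters": {"text": "string"}},
        handler=echo,
    )
)
```

Under these assumptions, `registry.schemas({"echo"})` yields the schema list handed to the model, and `registry.dispatch("echo", {"text": "hi"})` routes a tool call to its handler. Filtering by an `enabled` set is one way the same registry could serve differently shaped surfaces.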
--- a/docs/FLEET_PHASE_1_SURVIVAL.md
+++ b/docs/FLEET_PHASE_1_SURVIVAL.md
@@ -1,61 +0,0 @@
-# [PHASE-1] Survival - Keep the Lights On
-
-Phase 1 is the manual-clicker stage of the fleet. The machines exist. The services exist. The human is still the automation loop.
-
-## Phase Definition
-
-- Current state: fleet exists, agents run, everything important still depends on human vigilance.
-- Resources tracked here: Capacity, Uptime.
-- Next phase: [PHASE-2] Automation - Self-Healing Infrastructure
-
-## Current Buildings
-
-- VPS hosts: Ezra, Allegro, Bezalel
-- Agents: Timmy harness, Code Claw heartbeat, Gemini AI Studio worker
-- Gitea forge
-- Evennia worlds
-
-## Current Resource Snapshot
-
-- Fleet operational: yes
-- Uptime baseline: 0.0%
-- Days at or above 95% uptime: 0
-- Capacity utilization: 0.0%
-
-## Next Phase Trigger
-
-To unlock [PHASE-2] Automation - Self-Healing Infrastructure, the fleet must hold both of these conditions at once:
-- Uptime >= 95% for 30 consecutive days
-- Capacity utilization > 60%
-- Current trigger state: NOT READY
-
-## Missing Requirements
-
-- Uptime 0.0% / 95.0%
-- Days at or above 95% uptime: 0/30
-- Capacity utilization 0.0% / >60.0%
-
-## Manual Clicker Interpretation
-
-Paperclips analogy: Phase 1 = Manual clicker. You ARE the automation.
-Every restart, every SSH, every check is a manual click.
-
-## Manual Clicks Still Required
-
-- Restart agents and services by hand when a node goes dark.
-- SSH into machines to verify health, disk, and memory.
-- Check Gitea, relay, and world services manually before and after changes.
-- Act as the scheduler when automation is missing or only partially wired.
-
-## Repo Signals Already Present
-
-- `scripts/fleet_health_probe.sh` — Automated health probe exists and can supply the uptime baseline for the next phase.
-- `scripts/fleet_milestones.py` — Milestone tracker exists, so survival achievements can be narrated and logged.
-- `scripts/auto_restart_agent.sh` — Auto-restart tooling already exists as phase-2 groundwork.
-- `scripts/backup_pipeline.sh` — Backup pipeline scaffold exists for post-survival automation work.
-- `infrastructure/timmy-bridge/reports/generate_report.py` — Bridge reporting exists and can summarize heartbeat-driven uptime.
-
-## Notes
-
-- The fleet is alive, but the human is still the control loop.
-- Phase 1 is about naming reality plainly so later automation has a baseline to beat.
--- a/docs/RUNBOOK_INDEX.md
+++ b/docs/RUNBOOK_INDEX.md
@@ -12,7 +12,6 @@ Quick-reference index for common operational tasks across the Timmy Foundation i
 | Check fleet health | fleet-ops | `python3 scripts/fleet_readiness.py` |
 | Agent scorecard | fleet-ops | `python3 scripts/agent_scorecard.py` |
 | View fleet manifest | fleet-ops | `cat manifest.yaml` |
-| Render Phase-1 survival report | timmy-home | `python3 scripts/fleet_phase_status.py --output docs/FLEET_PHASE_1_SURVIVAL.md` |

 ## the-nexus (Frontend + Brain)

--- a/scripts/fleet_phase_status.py
+++ b/scripts/fleet_phase_status.py
@@ -1,224 +0,0 @@
-#!/usr/bin/env python3
-"""Render the current fleet survival phase as a durable report."""
-
-from __future__ import annotations
-
-import argparse
-import json
-from copy import deepcopy
-from pathlib import Path
-from typing import Any
-
-
-PHASE_NAME = "[PHASE-1] Survival - Keep the Lights On"
-NEXT_PHASE_NAME = "[PHASE-2] Automation - Self-Healing Infrastructure"
-TARGET_UPTIME_PERCENT = 95.0
-TARGET_UPTIME_DAYS = 30
-TARGET_CAPACITY_PERCENT = 60.0
-
-DEFAULT_BUILDINGS = [
-    "VPS hosts: Ezra, Allegro, Bezalel",
-    "Agents: Timmy harness, Code Claw heartbeat, Gemini AI Studio worker",
-    "Gitea forge",
-    "Evennia worlds",
-]
-
-DEFAULT_MANUAL_CLICKS = [
-    "Restart agents and services by hand when a node goes dark.",
-    "SSH into machines to verify health, disk, and memory.",
-    "Check Gitea, relay, and world services manually before and after changes.",
-    "Act as the scheduler when automation is missing or only partially wired.",
-]
-
-REPO_SIGNAL_FILES = {
-    "scripts/fleet_health_probe.sh": "Automated health probe exists and can supply the uptime baseline for the next phase.",
-    "scripts/fleet_milestones.py": "Milestone tracker exists, so survival achievements can be narrated and logged.",
-    "scripts/auto_restart_agent.sh": "Auto-restart tooling already exists as phase-2 groundwork.",
-    "scripts/backup_pipeline.sh": "Backup pipeline scaffold exists for post-survival automation work.",
-    "infrastructure/timmy-bridge/reports/generate_report.py": "Bridge reporting exists and can summarize heartbeat-driven uptime.",
-}
-
-DEFAULT_SNAPSHOT = {
-    "fleet_operational": True,
-    "resources": {
-        "uptime_percent": 0.0,
-        "days_at_or_above_95_percent": 0,
-        "capacity_utilization_percent": 0.0,
-    },
-    "current_buildings": DEFAULT_BUILDINGS,
-    "manual_clicks": DEFAULT_MANUAL_CLICKS,
-    "notes": [
-        "The fleet is alive, but the human is still the control loop.",
-        "Phase 1 is about naming reality plainly so later automation has a baseline to beat.",
-    ],
-}
-
-
-def default_snapshot() -> dict[str, Any]:
-    return deepcopy(DEFAULT_SNAPSHOT)
-
-
-def _deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
-    result = deepcopy(base)
-    for key, value in override.items():
-        if isinstance(value, dict) and isinstance(result.get(key), dict):
-            result[key] = _deep_merge(result[key], value)
-        else:
-            result[key] = value
-    return result
-
-
-def load_snapshot(snapshot_path: Path | None = None) -> dict[str, Any]:
-    snapshot = default_snapshot()
-    if snapshot_path is None:
-        return snapshot
-    override = json.loads(snapshot_path.read_text(encoding="utf-8"))
-    return _deep_merge(snapshot, override)
-
-
-def collect_repo_signals(repo_root: Path) -> list[str]:
-    signals: list[str] = []
-    for rel_path, description in REPO_SIGNAL_FILES.items():
-        if (repo_root / rel_path).exists():
-            signals.append(f"`{rel_path}` — {description}")
-    return signals
-
-
-def compute_phase_status(snapshot: dict[str, Any], repo_root: Path | None = None) -> dict[str, Any]:
-    repo_root = repo_root or Path(__file__).resolve().parents[1]
-    resources = snapshot.get("resources", {})
-    uptime_percent = float(resources.get("uptime_percent", 0.0))
-    uptime_days = int(resources.get("days_at_or_above_95_percent", 0))
-    capacity_percent = float(resources.get("capacity_utilization_percent", 0.0))
-    fleet_operational = bool(snapshot.get("fleet_operational", False))
-
-    missing: list[str] = []
-    if not fleet_operational:
-        missing.append("Fleet operational flag is false.")
-    if uptime_percent < TARGET_UPTIME_PERCENT:
-        missing.append(f"Uptime {uptime_percent:.1f}% / {TARGET_UPTIME_PERCENT:.1f}%")
-    if uptime_days < TARGET_UPTIME_DAYS:
-        missing.append(f"Days at or above 95% uptime: {uptime_days}/{TARGET_UPTIME_DAYS}")
-    if capacity_percent <= TARGET_CAPACITY_PERCENT:
-        missing.append(f"Capacity utilization {capacity_percent:.1f}% / >{TARGET_CAPACITY_PERCENT:.1f}%")
-
-    return {
-        "title": PHASE_NAME,
-        "current_phase": "PHASE-1 Survival",
-        "fleet_operational": fleet_operational,
-        "resources": {
-            "uptime_percent": uptime_percent,
-            "days_at_or_above_95_percent": uptime_days,
-            "capacity_utilization_percent": capacity_percent,
-        },
-        "current_buildings": list(snapshot.get("current_buildings", DEFAULT_BUILDINGS)),
-        "manual_clicks": list(snapshot.get("manual_clicks", DEFAULT_MANUAL_CLICKS)),
-        "notes": list(snapshot.get("notes", [])),
-        "repo_signals": collect_repo_signals(repo_root),
-        "next_phase": NEXT_PHASE_NAME,
-        "next_phase_ready": fleet_operational and not missing,
-        "missing_requirements": missing,
-    }
-
-
-def render_markdown(status: dict[str, Any]) -> str:
-    resources = status["resources"]
-    missing = status["missing_requirements"]
-    ready_line = "READY" if status["next_phase_ready"] else "NOT READY"
-
-    lines = [
-        f"# {status['title']}",
-        "",
-        "Phase 1 is the manual-clicker stage of the fleet. The machines exist. The services exist. The human is still the automation loop.",
-        "",
-        "## Phase Definition",
-        "",
-        "- Current state: fleet exists, agents run, everything important still depends on human vigilance.",
-        "- Resources tracked here: Capacity, Uptime.",
-        f"- Next phase: {status['next_phase']}",
-        "",
-        "## Current Buildings",
-        "",
-    ]
-    lines.extend(f"- {item}" for item in status["current_buildings"])
-
-    lines.extend([
-        "",
-        "## Current Resource Snapshot",
-        "",
-        f"- Fleet operational: {'yes' if status['fleet_operational'] else 'no'}",
-        f"- Uptime baseline: {resources['uptime_percent']:.1f}%",
-        f"- Days at or above 95% uptime: {resources['days_at_or_above_95_percent']}",
-        f"- Capacity utilization: {resources['capacity_utilization_percent']:.1f}%",
-        "",
-        "## Next Phase Trigger",
-        "",
-        f"To unlock {status['next_phase']}, the fleet must hold both of these conditions at once:",
-        f"- Uptime >= {TARGET_UPTIME_PERCENT:.0f}% for {TARGET_UPTIME_DAYS} consecutive days",
-        f"- Capacity utilization > {TARGET_CAPACITY_PERCENT:.0f}%",
-        f"- Current trigger state: {ready_line}",
-        "",
-        "## Missing Requirements",
-        "",
-    ])
-    if missing:
-        lines.extend(f"- {item}" for item in missing)
-    else:
-        lines.append("- None. Phase 2 can unlock now.")
-
-    lines.extend([
-        "",
-        "## Manual Clicker Interpretation",
-        "",
-        "Paperclips analogy: Phase 1 = Manual clicker. You ARE the automation.",
-        "Every restart, every SSH, every check is a manual click.",
-        "",
-        "## Manual Clicks Still Required",
-        "",
-    ])
-    lines.extend(f"- {item}" for item in status["manual_clicks"])
-
-    lines.extend([
-        "",
-        "## Repo Signals Already Present",
-        "",
-    ])
-    if status["repo_signals"]:
-        lines.extend(f"- {item}" for item in status["repo_signals"])
-    else:
-        lines.append("- No survival-adjacent repo signals detected.")
-
-    if status["notes"]:
-        lines.extend(["", "## Notes", ""])
-        lines.extend(f"- {item}" for item in status["notes"])
-
-    return "\n".join(lines).rstrip() + "\n"
-
-
-def main() -> None:
-    parser = argparse.ArgumentParser(description="Render the fleet phase-1 survival report")
-    parser.add_argument("--snapshot", help="Optional JSON snapshot overriding the default phase-1 baseline")
-    parser.add_argument("--output", help="Write markdown report to this path")
-    parser.add_argument("--json", action="store_true", help="Print computed status as JSON instead of markdown")
-    args = parser.parse_args()
-
-    snapshot = load_snapshot(Path(args.snapshot).expanduser() if args.snapshot else None)
-    repo_root = Path(__file__).resolve().parents[1]
-    status = compute_phase_status(snapshot, repo_root=repo_root)
-
-    if args.json:
-        rendered = json.dumps(status, indent=2)
-    else:
-        rendered = render_markdown(status)
-
-    if args.output:
-        output_path = Path(args.output).expanduser()
-        output_path.parent.mkdir(parents=True, exist_ok=True)
-        output_path.write_text(rendered, encoding="utf-8")
-        print(f"Phase status written to {output_path}")
-    else:
-        print(rendered)
-
-
-if __name__ == "__main__":
-    main()
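For reference, the snapshot-override behavior of the removed script can be reproduced in isolation. The `_deep_merge` body below follows the deleted file, with one comment added; the `defaults` and `override` values are illustrative assumptions, not real fleet data.

```python
from copy import deepcopy
from typing import Any


def _deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    # Nested dicts are merged key by key; any other value replaces the base value.
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = _deep_merge(result[key], value)
        else:
            result[key] = value
    return result


defaults = {
    "fleet_operational": True,
    "resources": {"uptime_percent": 0.0, "capacity_utilization_percent": 0.0},
}
override = {"resources": {"uptime_percent": 96.2}}  # illustrative value only

merged = _deep_merge(defaults, override)
```

A partial snapshot therefore only needs to name the keys it changes: sibling keys under `resources` keep their defaults, and `defaults` itself is left unmodified because the merge works on a `deepcopy`.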
--- a/tests/test_fleet_phase_status.py
+++ b/tests/test_fleet_phase_status.py
@@ -1,67 +0,0 @@
-from __future__ import annotations
-
-import importlib.util
-from pathlib import Path
-
-
-ROOT = Path(__file__).resolve().parents[1]
-SCRIPT_PATH = ROOT / "scripts" / "fleet_phase_status.py"
-DOC_PATH = ROOT / "docs" / "FLEET_PHASE_1_SURVIVAL.md"
-
-
-def _load_module(path: Path, name: str):
-    assert path.exists(), f"missing {path.relative_to(ROOT)}"
-    spec = importlib.util.spec_from_file_location(name, path)
-    assert spec and spec.loader
-    module = importlib.util.module_from_spec(spec)
-    spec.loader.exec_module(module)
-    return module
-
-
-def test_compute_phase_status_tracks_survival_gate_requirements() -> None:
-    mod = _load_module(SCRIPT_PATH, "fleet_phase_status")
-
-    status = mod.compute_phase_status(
-        {
-            "fleet_operational": True,
-            "resources": {
-                "uptime_percent": 94.5,
-                "days_at_or_above_95_percent": 12,
-                "capacity_utilization_percent": 45.0,
-            },
-        }
-    )
-
-    assert status["current_phase"] == "PHASE-1 Survival"
-    assert status["next_phase_ready"] is False
-    assert any("94.5% / 95.0%" in item for item in status["missing_requirements"])
-    assert any("12/30" in item for item in status["missing_requirements"])
-    assert any("45.0% / >60.0%" in item for item in status["missing_requirements"])
-
-
-def test_render_markdown_preserves_phase_buildings_and_manual_clicker_language() -> None:
-    mod = _load_module(SCRIPT_PATH, "fleet_phase_status")
-    status = mod.compute_phase_status(mod.default_snapshot())
-    report = mod.render_markdown(status)
-
-    for snippet in (
-        "# [PHASE-1] Survival - Keep the Lights On",
-        "VPS hosts: Ezra, Allegro, Bezalel",
-        "Timmy harness",
-        "Gitea forge",
-        "Evennia worlds",
-        "Every restart, every SSH, every check is a manual click.",
-    ):
-        assert snippet in report
-
-
-def test_repo_contains_generated_phase_1_doc() -> None:
-    assert DOC_PATH.exists(), "missing committed phase-1 survival doc"
-    text = DOC_PATH.read_text(encoding="utf-8")
-    for snippet in (
-        "# [PHASE-1] Survival - Keep the Lights On",
-        "## Current Buildings",
-        "## Next Phase Trigger",
-        "## Manual Clicker Interpretation",
-    ):
-        assert snippet in text
--- a/tests/test_hermes_agent_genome.py
+++ b/tests/test_hermes_agent_genome.py
@@ -0,0 +1,84 @@
+from pathlib import Path
+
+GENOME = Path('GENOME.md')
+
+
+def read_genome() -> str:
+    assert GENOME.exists(), 'GENOME.md must exist at repo root'
+    return GENOME.read_text(encoding='utf-8')
+
+
+def test_genome_exists():
+    assert GENOME.exists(), 'GENOME.md must exist at repo root'
+
+
+def test_genome_has_required_sections():
+    text = read_genome()
+    for heading in [
+        '# GENOME.md — hermes-agent',
+        '## Project Overview',
+        '## Architecture Diagram',
+        '## Entry Points and Data Flow',
+        '## Key Abstractions',
+        '## API Surface',
+        '## Test Coverage Gaps',
+        '## Security Considerations',
+        '## Performance Characteristics',
+        '## Critical Modules to Name Explicitly',
+    ]:
+        assert heading in text
+
+
+def test_genome_contains_mermaid_diagram():
+    text = read_genome()
+    assert '```mermaid' in text
+    assert 'flowchart TD' in text
+
+
+def test_genome_mentions_control_plane_modules():
+    text = read_genome()
+    for token in [
+        'run_agent.py',
+        'model_tools.py',
+        'tools/registry.py',
+        'toolsets.py',
+        'cli.py',
+        'hermes_cli/main.py',
+        'hermes_state.py',
+        'gateway/run.py',
+        'acp_adapter/server.py',
+        'cron/scheduler.py',
+    ]:
+        assert token in text
+
+
+def test_genome_mentions_test_gap_and_collection_findings():
+    text = read_genome()
+    for token in [
+        '11,470 tests collected',
+        '6 collection errors',
+        'ModuleNotFoundError: No module named `acp`',
+        'trajectory_compressor.py',
+        'batch_runner.py',
+    ]:
+        assert token in text
+
+
+def test_genome_mentions_security_and_performance_layers():
+    text = read_genome()
+    for token in [
+        'prompt_builder.py',
+        'approval.py',
+        'file_tools.py',
+        'mcp_tool.py',
+        'WAL mode',
+        'prompt caching',
+        'context compression',
+        'parallel tool execution',
+    ]:
+        assert token in text
+
+
+def test_genome_is_substantial():
+    text = read_genome()
+    assert len(text) >= 10000
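Review note: the added test resolves `GENOME.md` through `Path('GENOME.md')`, which depends on the directory pytest is started from. The removed `test_fleet_phase_status.py` anchored its paths to `Path(__file__).resolve().parents[1]` instead. Below is a minimal sketch of a lookup that does not depend on the working directory. The helper name `find_repo_file` is hypothetical and is not part of this change; it only assumes the file lives at or above the directory it is given.

```python
import tempfile
from pathlib import Path


def find_repo_file(start: Path, name: str = "GENOME.md") -> Path:
    """Walk upward from `start` and return the first `name` found (hypothetical helper)."""
    start = start.resolve()
    # Check the starting directory first, then each ancestor up to the filesystem root.
    for directory in (start, *start.parents):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"{name} not found at or above {start}")


# Demonstration against a throwaway layout: <tmp>/GENOME.md and <tmp>/tests/
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "tests").mkdir()
    (root / "GENOME.md").write_text("# genome placeholder\n", encoding="utf-8")
    found = find_repo_file(root / "tests")
    print(found.name)  # GENOME.md
```

In a test module this would be called as `find_repo_file(Path(__file__).parent)`, so the assertions behave the same whether pytest runs from the repo root or from `tests/`.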
Author	SHA1	Message	Date
Alexander Whitestone	8d9e7cbf7e	docs: record hermes-agent test finding (#668 )	2026-04-15 00:26:33 -04:00
Alexander Whitestone	85bc612100	test: validate hermes-agent genome (#668 )	2026-04-15 00:24:01 -04:00
Alexander Whitestone	9e120888c0	docs: add hermes-agent genome draft (#668 )	2026-04-15 00:21:39 -04:00