Compare commits
3 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
8d9e7cbf7e | ||
|
|
85bc612100 | ||
|
|
9e120888c0 |
485
GENOME.md
Normal file
485
GENOME.md
Normal file
@@ -0,0 +1,485 @@
|
||||
# GENOME.md — hermes-agent
|
||||
|
||||
Repository-wide facts in this document come from two grounded passes over `/Users/apayne/hermes-agent` on 2026-04-15:
|
||||
- `python3 ~/.hermes/pipelines/codebase-genome.py --path /Users/apayne/hermes-agent --dry-run`
|
||||
- targeted manual inspection of the core runtime, tooling, gateway, ACP, cron, and persistence modules
|
||||
|
||||
This is the Timmy Foundation fork of `hermes-agent`, not a generic upstream summary.
|
||||
|
||||
## Project Overview
|
||||
|
||||
`hermes-agent` is a multi-surface AI agent runtime, not just a terminal chatbot.
|
||||
It combines:
|
||||
- a rich interactive CLI/TUI
|
||||
- a synchronous core agent loop
|
||||
- a large tool registry with terminal, file, web, browser, MCP, memory, cron, delegation, and code-execution tools
|
||||
- a multi-platform messaging gateway
|
||||
- ACP editor integration
|
||||
- an OpenAI-compatible API server
|
||||
- cron scheduling
|
||||
- persistent session/memory/state stores
|
||||
- batch and RL-adjacent research surfaces
|
||||
|
||||
The product promise in `README.md` is that Hermes is a self-improving agent:
|
||||
- it creates and updates skills
|
||||
- persists memory across sessions
|
||||
- searches past conversations
|
||||
- delegates to subagents
|
||||
- runs scheduled automations
|
||||
- can operate through multiple runtime backends and communication surfaces
|
||||
|
||||
Grounded quick facts from the analyzed checkout:
|
||||
- pipeline scan: 395 source files, 561 test files, 11 config files, 331,794 total lines
|
||||
- Python-only pass: 307 non-test `.py` modules and 561 test Python files
|
||||
- Python LOC split: 211,709 source LOC / 184,512 test LOC
|
||||
- current branch: `main`
|
||||
- current commit: `95d11dfd`
|
||||
- last commit seen by pipeline: `95d11dfd docs: automation templates gallery + comparison post (#9821)`
|
||||
- total commits reported by pipeline: 4140
|
||||
- largest Python modules observed:
|
||||
- `run_agent.py` — 10,871 LOC
|
||||
- `cli.py` — 10,017 LOC
|
||||
- `gateway/run.py` — 9,289 LOC
|
||||
- `hermes_cli/main.py` — 6,056 LOC
|
||||
|
||||
That size profile matters. Hermes is architecturally broad, but a few very large orchestration files still dominate the control plane.
|
||||
|
||||
## Architecture Diagram
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
A[CLI / Gateway / ACP / API / Cron / Batch] --> B[AIAgent in run_agent.py]
|
||||
B --> C[agent/prompt_builder.py]
|
||||
B --> D[agent/memory_manager.py]
|
||||
B --> E[agent/context_compressor.py]
|
||||
B --> F[model_tools.py]
|
||||
|
||||
F --> G[tools/registry.py]
|
||||
G --> H[tools/*.py built-in tools]
|
||||
G --> I[tools/mcp_tool.py imported MCP tools]
|
||||
G --> J[delegate / execute_code / cron / browser / terminal / file tools]
|
||||
|
||||
B --> K[hermes_state.py SQLite SessionDB]
|
||||
B --> L[toolsets.py toolset selection]
|
||||
|
||||
M[cli.py + hermes_cli/main.py] --> B
|
||||
N[gateway/run.py] --> B
|
||||
O[acp_adapter/server.py] --> B
|
||||
P[gateway/platforms/api_server.py] --> B
|
||||
Q[cron/scheduler.py + cron/jobs.py] --> B
|
||||
R[batch_runner.py] --> B
|
||||
|
||||
N --> S[gateway/session.py]
|
||||
N --> T[gateway/platforms/* adapters]
|
||||
P --> U[Responses API store]
|
||||
O --> V[ACP session/event server]
|
||||
Q --> W[cron job persistence + delivery]
|
||||
|
||||
K --> X[state.db / FTS5 search]
|
||||
S --> Y[sessions.json mapping]
|
||||
J --> Z[local shell, files, web, browser, subprocesses, remote MCP servers]
|
||||
```
|
||||
|
||||
## Entry Points and Data Flow
|
||||
|
||||
### Primary entry points
|
||||
|
||||
1. `hermes` → `hermes_cli.main:main`
|
||||
- canonical CLI entry point
|
||||
- preloads profile context and builds the argparse/subcommand shell
|
||||
- hands interactive chat to `cli.py`
|
||||
|
||||
2. `hermes-agent` → `run_agent:main`
|
||||
- direct runner around the core agent loop
|
||||
- closest entry point to the raw agent runtime
|
||||
|
||||
3. `hermes-acp` → `acp_adapter.entry:main`
|
||||
- ACP server for VS Code / Zed / JetBrains style integrations
|
||||
|
||||
4. `gateway/run.py`
|
||||
- async orchestration loop for Telegram, Discord, Slack, WhatsApp, Signal, Matrix, webhook, email, SMS, and other adapters
|
||||
|
||||
5. `gateway/platforms/api_server.py`
|
||||
- OpenAI-compatible HTTP surface
|
||||
- exposes `/v1/chat/completions`, `/v1/responses`, `/v1/models`, `/v1/runs`, and `/health`
|
||||
|
||||
6. `cron/scheduler.py` + `cron/jobs.py`
|
||||
- scheduled job execution and delivery
|
||||
|
||||
7. `batch_runner.py`
|
||||
- parallel batch trajectory and research workloads
|
||||
|
||||
### Core data flow
|
||||
|
||||
1. An entry surface receives input:
|
||||
- terminal prompt
|
||||
- incoming platform message
|
||||
- ACP editor request
|
||||
- HTTP request
|
||||
- scheduled cron job
|
||||
- batch input
|
||||
|
||||
2. The surface resolves runtime state:
|
||||
- profile/config
|
||||
- platform identity
|
||||
- model/provider settings
|
||||
- toolset selection
|
||||
- current session ID and conversation history
|
||||
|
||||
3. `run_agent.py` assembles the effective prompt:
|
||||
- persona/system directives
|
||||
- platform hints
|
||||
- context files (`AGENTS.md`, `SOUL.md`, repo-local context)
|
||||
- skill content
|
||||
- memory blocks from `agent/memory_manager.py`
|
||||
- compression summaries from `agent/context_compressor.py`
|
||||
|
||||
4. `model_tools.py` discovers and filters tools:
|
||||
- imports tool modules so they self-register into `tools/registry.py`
|
||||
- resolves enabled toolsets from `toolsets.py`
|
||||
- returns tool schemas to the active model provider
|
||||
|
||||
5. The model responds with either:
|
||||
- final assistant text
|
||||
- tool calls
|
||||
|
||||
6. Tool calls are dispatched through:
|
||||
- `model_tools.py`
|
||||
- `tools/registry.py`
|
||||
- the concrete tool handler
|
||||
|
||||
7. Tool outputs are appended back into the conversation and the loop continues until a final answer is produced.
|
||||
|
||||
8. State is persisted through:
|
||||
- `hermes_state.py` for sessions/messages/search
|
||||
- `gateway/session.py` for gateway session routing state
|
||||
- dedicated stores for response APIs, background processes, and cron jobs
|
||||
|
||||
This is a layered architecture: many user-facing surfaces, one central agent runtime, one central tool registry, and several specialized persistence layers.
|
||||
|
||||
## Key Abstractions
|
||||
|
||||
### 1. `AIAgent` (`run_agent.py`)
|
||||
This is the heart of Hermes.
|
||||
It owns:
|
||||
- provider/model invocation
|
||||
- tool-loop orchestration
|
||||
- prompt assembly
|
||||
- memory integration
|
||||
- compression and token budgeting
|
||||
- final response construction
|
||||
|
||||
### 2. `IterationBudget` (`run_agent.py`)
|
||||
A guardrail abstraction around how much work a turn may do.
|
||||
It matters because Hermes is not just text generation — it may launch tools, spawn subagents, or recurse through internal workflows.
|
||||
|
||||
### 3. `ToolRegistry` / tool self-registration (`tools/registry.py`)
|
||||
Every major tool advertises itself into a central registry.
|
||||
That gives Hermes one place to manage:
|
||||
- schemas
|
||||
- handlers
|
||||
- availability checks
|
||||
- environment requirements
|
||||
- dispatch behavior
|
||||
|
||||
This is a defining architectural trait of the codebase.
|
||||
|
||||
### 4. Toolsets (`toolsets.py`)
|
||||
Tool exposure is not hardcoded per surface.
|
||||
Instead, Hermes uses named toolsets and platform-specific aliases such as CLI, gateway, ACP, and API-server presets.
|
||||
This is how one agent runtime can safely shape different operating surfaces.
|
||||
|
||||
### 5. `MemoryManager` (`agent/memory_manager.py`)
|
||||
Hermes supports both built-in memory and external memory providers.
|
||||
The abstraction here is not “a markdown note” but a memory multiplexor that decides what memory context gets injected and how memory tools behave.
|
||||
|
||||
### 6. `ContextCompressor` (`agent/context_compressor.py`)
|
||||
Compression is a first-class subsystem.
|
||||
Hermes treats long-context management as part of the runtime architecture, not an afterthought.
|
||||
|
||||
### 7. `SessionDB` (`hermes_state.py`)
|
||||
SQLite + FTS5 session persistence is core infrastructure.
|
||||
This is what makes cross-session recall, search, billing/accounting, and agent continuity practical.
|
||||
|
||||
### 8. `SessionStore` / `SessionContext` (`gateway/session.py`)
|
||||
The gateway needs a routing abstraction different from raw message history.
|
||||
It tracks home channels, session keys, reset policy, and platform-specific mapping.
|
||||
|
||||
### 9. `HermesACPAgent` (`acp_adapter/server.py`)
|
||||
ACP is not bolted on as a thin shim.
|
||||
It wraps Hermes as an editor-native agent with its own session/event lifecycle.
|
||||
|
||||
### 10. `ProcessRegistry` (`tools/process_registry.py`)
|
||||
Long-running background commands are first-class managed resources.
|
||||
Hermes tracks them explicitly rather than treating subprocesses as disposable side effects.
|
||||
|
||||
## API Surface
|
||||
|
||||
### CLI and shell API
|
||||
Important surfaces exposed by packaging and command routing:
|
||||
- `hermes`
|
||||
- `hermes-agent`
|
||||
- `hermes-acp`
|
||||
- subcommands in `hermes_cli/main.py`
|
||||
- slash commands defined centrally in `hermes_cli/commands.py`
|
||||
|
||||
The slash-command registry is a notable design choice because the same command metadata feeds:
|
||||
- CLI help
|
||||
- gateway help
|
||||
- Telegram bot command menus
|
||||
- Slack subcommand routing
|
||||
- autocomplete
|
||||
|
||||
### HTTP API surface
|
||||
From `gateway/platforms/api_server.py`, the major routes are:
|
||||
- `POST /v1/chat/completions`
|
||||
- `POST /v1/responses`
|
||||
- `GET /v1/responses/{response_id}`
|
||||
- `DELETE /v1/responses/{response_id}`
|
||||
- `GET /v1/models`
|
||||
- `POST /v1/runs`
|
||||
- `GET /v1/runs/{run_id}/events`
|
||||
- `GET /health`
|
||||
|
||||
This makes Hermes usable as an OpenAI-compatible backend for external clients and web UIs.
|
||||
|
||||
### Messaging platform API surface
|
||||
The gateway platform abstraction exposes Hermes across many adapters under `gateway/platforms/`.
|
||||
Observed adapters include:
|
||||
- Telegram
|
||||
- Discord
|
||||
- Slack
|
||||
- WhatsApp
|
||||
- Signal
|
||||
- Matrix
|
||||
- Home Assistant
|
||||
- webhook
|
||||
- email
|
||||
- SMS
|
||||
- Mattermost
|
||||
- QQBot
|
||||
- WeCom / Weixin
|
||||
- DingTalk
|
||||
- BlueBubbles
|
||||
|
||||
### Tool API surface
|
||||
The tool surface is broad and central to the product:
|
||||
- terminal execution
|
||||
- process management
|
||||
- file IO / search / patch
|
||||
- browser automation
|
||||
- web search/extract
|
||||
- cron jobs
|
||||
- memory and session search
|
||||
- subagent delegation
|
||||
- execute_code sandbox
|
||||
- MCP tool import
|
||||
- TTS / vision / image generation
|
||||
- smart-home integrations
|
||||
|
||||
### MCP / ACP surface
|
||||
Hermes participates on both sides:
|
||||
- as an MCP client via `tools/mcp_tool.py`
|
||||
- as an MCP server for messaging/session capabilities via `mcp_serve.py`
|
||||
- as an ACP server via `acp_adapter/*`
|
||||
|
||||
That makes Hermes an orchestration hub, not just a single runtime process.
|
||||
|
||||
## Test Coverage Gaps
|
||||
|
||||
### Current observed test posture
|
||||
A live collection pass on the analyzed checkout produced:
|
||||
- 11,470 tests collected
|
||||
- 50 deselected
|
||||
- 6 collection errors
|
||||
|
||||
The collection errors are all ACP-related:
|
||||
- `tests/acp/test_entry.py`
|
||||
- `tests/acp/test_events.py`
|
||||
- `tests/acp/test_mcp_e2e.py`
|
||||
- `tests/acp/test_permissions.py`
|
||||
- `tests/acp/test_server.py`
|
||||
- `tests/acp/test_tools.py`
|
||||
|
||||
Root cause from the live run:
|
||||
- `ModuleNotFoundError: No module named 'acp'`
|
||||
- equivalently: `ModuleNotFoundError: No module named `acp`` in the failing ACP collection lane
|
||||
- this lines up with `pyproject.toml`, where ACP support is optional and gated behind the `acp` extra (`agent-client-protocol>=0.9.0,<1.0`)
|
||||
|
||||
A secondary signal from collection:
|
||||
- `tests/tools/test_file_sync_perf.py` emits `PytestUnknownMarkWarning: Unknown pytest.mark.ssh`
|
||||
|
||||
This specific collection problem is now tracked in hermes-agent issue `#779`.
|
||||
|
||||
### Where coverage looks strong
|
||||
By file distribution, the codebase is heavily tested around:
|
||||
- `gateway/`
|
||||
- `tools/`
|
||||
- `hermes_cli/`
|
||||
- `run_agent`
|
||||
- `cli`
|
||||
- `agent`
|
||||
|
||||
That matches the product center of gravity: runtime orchestration, tool dispatch, and communication surfaces.
|
||||
|
||||
### Highest-value remaining gaps
|
||||
The biggest gaps are not in total test count. They are in critical-path complexity.
|
||||
|
||||
1. `run_agent.py`
|
||||
- the most important file in the repo and also the largest
|
||||
- likely has broad behavior coverage, but branch-level completeness is improbable at 10k+ LOC
|
||||
|
||||
2. `cli.py`
|
||||
- extremely large UI/orchestration surface
|
||||
- high risk of hidden regressions across streaming, voice, slash-command routing, and interaction state
|
||||
|
||||
3. `gateway/run.py`
|
||||
- core async gateway brain
|
||||
- many platform-specific edge cases converge here
|
||||
|
||||
4. `hermes_cli/main.py`
|
||||
- main command shell is huge and mixes parsing, routing, setup, and environment behavior
|
||||
|
||||
5. ACP end-to-end coverage under optional dependency installation
|
||||
- current collection failure proves this lane is environment-sensitive
|
||||
- ACP deserves a reliable extras-aware CI lane so collection failures are surfaced intentionally, not accidentally
|
||||
|
||||
6. `batch_runner.py` and `trajectory_compressor.py`
|
||||
- research/training surfaces appear lighter and deserve more explicit contract tests
|
||||
|
||||
7. cron lifecycle and delivery failure behavior
|
||||
- `cron/scheduler.py` and `cron/jobs.py` are safety-critical for unattended automation
|
||||
|
||||
8. optional or integration-heavy backends
|
||||
- platform adapters like Feishu / Discord / Telegram
|
||||
- container/cloud terminal environments
|
||||
- MCP server interop
|
||||
- API server streaming edge cases
|
||||
|
||||
### Missing tests for critical paths
|
||||
The next high-leverage test work should target:
|
||||
- ACP extras-enabled collection and smoke execution
|
||||
- `run_agent.py` happy-path + interruption + compression + delegate + approval interaction boundaries
|
||||
- `gateway/run.py` cache/interrupt/restart/session-boundary behavior at integration level
|
||||
- `cron/scheduler.py` delivery error recovery, stale-job cleanup, and due-job fairness
|
||||
- `batch_runner.py` and `trajectory_compressor.py` contract tests
|
||||
- API-server Responses lifecycle and streaming segmentation behavior
|
||||
|
||||
## Security Considerations
|
||||
|
||||
Hermes is security-sensitive because it can run commands, read files, talk to platforms, call browsers, and broker MCP tools.
|
||||
The codebase already contains several strong defensive layers.
|
||||
|
||||
### 1. Prompt-injection defense for context files
|
||||
`agent/prompt_builder.py` scans context files such as `AGENTS.md`, `SOUL.md`, and similar instructions for:
|
||||
- prompt-override language
|
||||
- hidden comment/HTML tricks
|
||||
- invisible unicode
|
||||
- secret exfiltration patterns
|
||||
|
||||
That is an important architectural guardrail because Hermes explicitly ingests repository-local instruction files.
|
||||
|
||||
### 2. Dangerous-command approval system
|
||||
`tools/approval.py` centralizes detection of destructive commands and risky shell behavior.
|
||||
The repo treats command approval as a core policy subsystem, not a UI nicety.
|
||||
|
||||
### 3. File-path and device protections
|
||||
`tools/file_tools.py` blocks dangerous device paths and sensitive system writes.
|
||||
It also redacts sensitive content in read/search results and blocks reads from internal Hermes-sensitive locations.
|
||||
|
||||
### 4. Terminal/workdir sanitization
|
||||
`tools/terminal_tool.py` constrains workdir handling and shell execution boundaries.
|
||||
This matters because terminal access is one of the highest-risk capabilities Hermes exposes.
|
||||
|
||||
### 5. MCP subprocess hygiene
|
||||
`tools/mcp_tool.py` filters environment variables passed to MCP servers and strips credentials from surfaced errors.
|
||||
Given that MCP introduces third-party subprocesses into the tool graph, this is a critical boundary.
|
||||
|
||||
### 6. Gateway privacy and pairing controls
|
||||
Gateway code includes pairing, session routing, and ID-redaction logic.
|
||||
That is important because Hermes operates across public and semi-public communication surfaces.
|
||||
|
||||
### 7. HTTP/API hardening
|
||||
`gateway/platforms/api_server.py` includes auth, CORS handling, and response-store boundaries.
|
||||
This makes the API server a real production surface, not just a convenience wrapper.
|
||||
|
||||
### 8. Supply-chain awareness
|
||||
`pyproject.toml` pins many dependencies to constrained ranges and includes security notes for selected packages.
|
||||
That indicates explicit supply-chain thinking in dependency management.
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
### 1. prompt caching is a first-class optimization
|
||||
Hermes preserves long-lived agent instances and supports provider-specific prompt caching for compatible providers.
|
||||
That is essential because repeated system prompts and tool schemas are expensive.
|
||||
|
||||
### 2. context compression is built into the runtime
|
||||
Compression is not a manual rescue path only.
|
||||
Hermes estimates token budgets, prunes old tool noise, and can summarize prior context when needed.
|
||||
|
||||
### 3. parallel tool execution exists, but selectively
|
||||
The runtime can batch safe tool calls in parallel rather than serializing every read-only action.
|
||||
This improves latency without giving up all control over side effects.
|
||||
|
||||
### 4. Async loop reuse reduces orchestration overhead
|
||||
The runtime avoids constantly recreating event loops for async tools, which matters when many tool calls are issued inside otherwise synchronous agent flows.
|
||||
|
||||
### 5. SQLite is tuned for agent workloads
|
||||
`hermes_state.py` uses WAL mode, short lock windows, and retry logic instead of pretending SQLite is magically contention-free.
|
||||
This is a sensible tradeoff for sovereign local persistence.
|
||||
|
||||
### 6. Background processes are explicitly managed
|
||||
`ProcessRegistry` maintains output windows, state, and watcher behavior so long-running commands do not become invisible resource leaks.
|
||||
|
||||
### 7. Large control-plane files are a real performance and maintenance cost
|
||||
The repo has broad feature coverage, but a few huge orchestration files dominate complexity:
|
||||
- `run_agent.py`
|
||||
- `cli.py`
|
||||
- `gateway/run.py`
|
||||
- `hermes_cli/main.py`
|
||||
|
||||
These files are not just maintainability debt; they also create higher reasoning and regression load for both humans and agents working in the codebase.
|
||||
|
||||
## Critical Modules to Name Explicitly
|
||||
|
||||
The following files define the real control plane of Hermes and should always be named in any serious architecture summary:
|
||||
- `run_agent.py`
|
||||
- `model_tools.py`
|
||||
- `tools/registry.py`
|
||||
- `toolsets.py`
|
||||
- `cli.py`
|
||||
- `hermes_cli/main.py`
|
||||
- `hermes_cli/commands.py`
|
||||
- `hermes_state.py`
|
||||
- `agent/prompt_builder.py`
|
||||
- `agent/context_compressor.py`
|
||||
- `agent/memory_manager.py`
|
||||
- `tools/terminal_tool.py`
|
||||
- `tools/file_tools.py`
|
||||
- `tools/mcp_tool.py`
|
||||
- `gateway/run.py`
|
||||
- `gateway/session.py`
|
||||
- `gateway/platforms/api_server.py`
|
||||
- `acp_adapter/server.py`
|
||||
- `cron/scheduler.py`
|
||||
- `cron/jobs.py`
|
||||
- `batch_runner.py`
|
||||
- `trajectory_compressor.py`
|
||||
|
||||
## Practical Takeaway
|
||||
|
||||
Hermes Agent is best understood as a sovereign agent operating system.
|
||||
The CLI, gateway, ACP server, API server, cron scheduler, and tool graph are all frontends onto one core runtime.
|
||||
|
||||
The strongest qualities of the codebase are:
|
||||
- broad feature coverage
|
||||
- a central tool-registry design
|
||||
- serious persistence/memory infrastructure
|
||||
- strong security thinking around prompts, tools, files, and approvals
|
||||
- a deep test surface across gateway/tools/CLI behavior
|
||||
|
||||
The most important risks are:
|
||||
- extremely large orchestration files
|
||||
- optional-surface fragility, especially ACP extras and integration-heavy adapters
|
||||
- under-tested research/batch lanes relative to the core runtime
|
||||
- growing complexity at the boundaries where multiple surfaces reuse the same agent loop
|
||||
@@ -1,61 +0,0 @@
|
||||
# [PHASE-1] Survival - Keep the Lights On
|
||||
|
||||
Phase 1 is the manual-clicker stage of the fleet. The machines exist. The services exist. The human is still the automation loop.
|
||||
|
||||
## Phase Definition
|
||||
|
||||
- Current state: fleet exists, agents run, everything important still depends on human vigilance.
|
||||
- Resources tracked here: Capacity, Uptime.
|
||||
- Next phase: [PHASE-2] Automation - Self-Healing Infrastructure
|
||||
|
||||
## Current Buildings
|
||||
|
||||
- VPS hosts: Ezra, Allegro, Bezalel
|
||||
- Agents: Timmy harness, Code Claw heartbeat, Gemini AI Studio worker
|
||||
- Gitea forge
|
||||
- Evennia worlds
|
||||
|
||||
## Current Resource Snapshot
|
||||
|
||||
- Fleet operational: yes
|
||||
- Uptime baseline: 0.0%
|
||||
- Days at or above 95% uptime: 0
|
||||
- Capacity utilization: 0.0%
|
||||
|
||||
## Next Phase Trigger
|
||||
|
||||
To unlock [PHASE-2] Automation - Self-Healing Infrastructure, the fleet must hold both of these conditions at once:
|
||||
- Uptime >= 95% for 30 consecutive days
|
||||
- Capacity utilization > 60%
|
||||
- Current trigger state: NOT READY
|
||||
|
||||
## Missing Requirements
|
||||
|
||||
- Uptime 0.0% / 95.0%
|
||||
- Days at or above 95% uptime: 0/30
|
||||
- Capacity utilization 0.0% / >60.0%
|
||||
|
||||
## Manual Clicker Interpretation
|
||||
|
||||
Paperclips analogy: Phase 1 = Manual clicker. You ARE the automation.
|
||||
Every restart, every SSH, every check is a manual click.
|
||||
|
||||
## Manual Clicks Still Required
|
||||
|
||||
- Restart agents and services by hand when a node goes dark.
|
||||
- SSH into machines to verify health, disk, and memory.
|
||||
- Check Gitea, relay, and world services manually before and after changes.
|
||||
- Act as the scheduler when automation is missing or only partially wired.
|
||||
|
||||
## Repo Signals Already Present
|
||||
|
||||
- `scripts/fleet_health_probe.sh` — Automated health probe exists and can supply the uptime baseline for the next phase.
|
||||
- `scripts/fleet_milestones.py` — Milestone tracker exists, so survival achievements can be narrated and logged.
|
||||
- `scripts/auto_restart_agent.sh` — Auto-restart tooling already exists as phase-2 groundwork.
|
||||
- `scripts/backup_pipeline.sh` — Backup pipeline scaffold exists for post-survival automation work.
|
||||
- `infrastructure/timmy-bridge/reports/generate_report.py` — Bridge reporting exists and can summarize heartbeat-driven uptime.
|
||||
|
||||
## Notes
|
||||
|
||||
- The fleet is alive, but the human is still the control loop.
|
||||
- Phase 1 is about naming reality plainly so later automation has a baseline to beat.
|
||||
@@ -12,7 +12,6 @@ Quick-reference index for common operational tasks across the Timmy Foundation i
|
||||
| Check fleet health | fleet-ops | `python3 scripts/fleet_readiness.py` |
|
||||
| Agent scorecard | fleet-ops | `python3 scripts/agent_scorecard.py` |
|
||||
| View fleet manifest | fleet-ops | `cat manifest.yaml` |
|
||||
| Render Phase-1 survival report | timmy-home | `python3 scripts/fleet_phase_status.py --output docs/FLEET_PHASE_1_SURVIVAL.md` |
|
||||
|
||||
## the-nexus (Frontend + Brain)
|
||||
|
||||
|
||||
@@ -1,224 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Render the current fleet survival phase as a durable report."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import json
|
||||
from copy import deepcopy
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
|
||||
PHASE_NAME = "[PHASE-1] Survival - Keep the Lights On"
|
||||
NEXT_PHASE_NAME = "[PHASE-2] Automation - Self-Healing Infrastructure"
|
||||
TARGET_UPTIME_PERCENT = 95.0
|
||||
TARGET_UPTIME_DAYS = 30
|
||||
TARGET_CAPACITY_PERCENT = 60.0
|
||||
|
||||
DEFAULT_BUILDINGS = [
|
||||
"VPS hosts: Ezra, Allegro, Bezalel",
|
||||
"Agents: Timmy harness, Code Claw heartbeat, Gemini AI Studio worker",
|
||||
"Gitea forge",
|
||||
"Evennia worlds",
|
||||
]
|
||||
|
||||
DEFAULT_MANUAL_CLICKS = [
|
||||
"Restart agents and services by hand when a node goes dark.",
|
||||
"SSH into machines to verify health, disk, and memory.",
|
||||
"Check Gitea, relay, and world services manually before and after changes.",
|
||||
"Act as the scheduler when automation is missing or only partially wired.",
|
||||
]
|
||||
|
||||
REPO_SIGNAL_FILES = {
|
||||
"scripts/fleet_health_probe.sh": "Automated health probe exists and can supply the uptime baseline for the next phase.",
|
||||
"scripts/fleet_milestones.py": "Milestone tracker exists, so survival achievements can be narrated and logged.",
|
||||
"scripts/auto_restart_agent.sh": "Auto-restart tooling already exists as phase-2 groundwork.",
|
||||
"scripts/backup_pipeline.sh": "Backup pipeline scaffold exists for post-survival automation work.",
|
||||
"infrastructure/timmy-bridge/reports/generate_report.py": "Bridge reporting exists and can summarize heartbeat-driven uptime.",
|
||||
}
|
||||
|
||||
DEFAULT_SNAPSHOT = {
|
||||
"fleet_operational": True,
|
||||
"resources": {
|
||||
"uptime_percent": 0.0,
|
||||
"days_at_or_above_95_percent": 0,
|
||||
"capacity_utilization_percent": 0.0,
|
||||
},
|
||||
"current_buildings": DEFAULT_BUILDINGS,
|
||||
"manual_clicks": DEFAULT_MANUAL_CLICKS,
|
||||
"notes": [
|
||||
"The fleet is alive, but the human is still the control loop.",
|
||||
"Phase 1 is about naming reality plainly so later automation has a baseline to beat.",
|
||||
],
|
||||
}
|
||||
|
||||
|
||||
def default_snapshot() -> dict[str, Any]:
|
||||
return deepcopy(DEFAULT_SNAPSHOT)
|
||||
|
||||
|
||||
def _deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
|
||||
result = deepcopy(base)
|
||||
for key, value in override.items():
|
||||
if isinstance(value, dict) and isinstance(result.get(key), dict):
|
||||
result[key] = _deep_merge(result[key], value)
|
||||
else:
|
||||
result[key] = value
|
||||
return result
|
||||
|
||||
|
||||
def load_snapshot(snapshot_path: Path | None = None) -> dict[str, Any]:
|
||||
snapshot = default_snapshot()
|
||||
if snapshot_path is None:
|
||||
return snapshot
|
||||
override = json.loads(snapshot_path.read_text(encoding="utf-8"))
|
||||
return _deep_merge(snapshot, override)
|
||||
|
||||
|
||||
def collect_repo_signals(repo_root: Path) -> list[str]:
|
||||
signals: list[str] = []
|
||||
for rel_path, description in REPO_SIGNAL_FILES.items():
|
||||
if (repo_root / rel_path).exists():
|
||||
signals.append(f"`{rel_path}` — {description}")
|
||||
return signals
|
||||
|
||||
|
||||
def compute_phase_status(snapshot: dict[str, Any], repo_root: Path | None = None) -> dict[str, Any]:
|
||||
repo_root = repo_root or Path(__file__).resolve().parents[1]
|
||||
resources = snapshot.get("resources", {})
|
||||
uptime_percent = float(resources.get("uptime_percent", 0.0))
|
||||
uptime_days = int(resources.get("days_at_or_above_95_percent", 0))
|
||||
capacity_percent = float(resources.get("capacity_utilization_percent", 0.0))
|
||||
fleet_operational = bool(snapshot.get("fleet_operational", False))
|
||||
|
||||
missing: list[str] = []
|
||||
if not fleet_operational:
|
||||
missing.append("Fleet operational flag is false.")
|
||||
if uptime_percent < TARGET_UPTIME_PERCENT:
|
||||
missing.append(f"Uptime {uptime_percent:.1f}% / {TARGET_UPTIME_PERCENT:.1f}%")
|
||||
if uptime_days < TARGET_UPTIME_DAYS:
|
||||
missing.append(f"Days at or above 95% uptime: {uptime_days}/{TARGET_UPTIME_DAYS}")
|
||||
if capacity_percent <= TARGET_CAPACITY_PERCENT:
|
||||
missing.append(f"Capacity utilization {capacity_percent:.1f}% / >{TARGET_CAPACITY_PERCENT:.1f}%")
|
||||
|
||||
return {
|
||||
"title": PHASE_NAME,
|
||||
"current_phase": "PHASE-1 Survival",
|
||||
"fleet_operational": fleet_operational,
|
||||
"resources": {
|
||||
"uptime_percent": uptime_percent,
|
||||
"days_at_or_above_95_percent": uptime_days,
|
||||
"capacity_utilization_percent": capacity_percent,
|
||||
},
|
||||
"current_buildings": list(snapshot.get("current_buildings", DEFAULT_BUILDINGS)),
|
||||
"manual_clicks": list(snapshot.get("manual_clicks", DEFAULT_MANUAL_CLICKS)),
|
||||
"notes": list(snapshot.get("notes", [])),
|
||||
"repo_signals": collect_repo_signals(repo_root),
|
||||
"next_phase": NEXT_PHASE_NAME,
|
||||
"next_phase_ready": fleet_operational and not missing,
|
||||
"missing_requirements": missing,
|
||||
}
|
||||
|
||||
|
||||
def render_markdown(status: dict[str, Any]) -> str:
|
||||
resources = status["resources"]
|
||||
missing = status["missing_requirements"]
|
||||
ready_line = "READY" if status["next_phase_ready"] else "NOT READY"
|
||||
|
||||
lines = [
|
||||
f"# {status['title']}",
|
||||
"",
|
||||
"Phase 1 is the manual-clicker stage of the fleet. The machines exist. The services exist. The human is still the automation loop.",
|
||||
"",
|
||||
"## Phase Definition",
|
||||
"",
|
||||
"- Current state: fleet exists, agents run, everything important still depends on human vigilance.",
|
||||
"- Resources tracked here: Capacity, Uptime.",
|
||||
f"- Next phase: {status['next_phase']}",
|
||||
"",
|
||||
"## Current Buildings",
|
||||
"",
|
||||
]
|
||||
lines.extend(f"- {item}" for item in status["current_buildings"])
|
||||
|
||||
lines.extend([
|
||||
"",
|
||||
"## Current Resource Snapshot",
|
||||
"",
|
||||
f"- Fleet operational: {'yes' if status['fleet_operational'] else 'no'}",
|
||||
f"- Uptime baseline: {resources['uptime_percent']:.1f}%",
|
||||
f"- Days at or above 95% uptime: {resources['days_at_or_above_95_percent']}",
|
||||
f"- Capacity utilization: {resources['capacity_utilization_percent']:.1f}%",
|
||||
"",
|
||||
"## Next Phase Trigger",
|
||||
"",
|
||||
f"To unlock {status['next_phase']}, the fleet must hold both of these conditions at once:",
|
||||
f"- Uptime >= {TARGET_UPTIME_PERCENT:.0f}% for {TARGET_UPTIME_DAYS} consecutive days",
|
||||
f"- Capacity utilization > {TARGET_CAPACITY_PERCENT:.0f}%",
|
||||
f"- Current trigger state: {ready_line}",
|
||||
"",
|
||||
"## Missing Requirements",
|
||||
"",
|
||||
])
|
||||
if missing:
|
||||
lines.extend(f"- {item}" for item in missing)
|
||||
else:
|
||||
lines.append("- None. Phase 2 can unlock now.")
|
||||
|
||||
lines.extend([
|
||||
"",
|
||||
"## Manual Clicker Interpretation",
|
||||
"",
|
||||
"Paperclips analogy: Phase 1 = Manual clicker. You ARE the automation.",
|
||||
"Every restart, every SSH, every check is a manual click.",
|
||||
"",
|
||||
"## Manual Clicks Still Required",
|
||||
"",
|
||||
])
|
||||
lines.extend(f"- {item}" for item in status["manual_clicks"])
|
||||
|
||||
lines.extend([
|
||||
"",
|
||||
"## Repo Signals Already Present",
|
||||
"",
|
||||
])
|
||||
if status["repo_signals"]:
|
||||
lines.extend(f"- {item}" for item in status["repo_signals"])
|
||||
else:
|
||||
lines.append("- No survival-adjacent repo signals detected.")
|
||||
|
||||
if status["notes"]:
|
||||
lines.extend(["", "## Notes", ""])
|
||||
lines.extend(f"- {item}" for item in status["notes"])
|
||||
|
||||
return "\n".join(lines).rstrip() + "\n"
|
||||
|
||||
|
||||
def main() -> None:
|
||||
parser = argparse.ArgumentParser(description="Render the fleet phase-1 survival report")
|
||||
parser.add_argument("--snapshot", help="Optional JSON snapshot overriding the default phase-1 baseline")
|
||||
parser.add_argument("--output", help="Write markdown report to this path")
|
||||
parser.add_argument("--json", action="store_true", help="Print computed status as JSON instead of markdown")
|
||||
args = parser.parse_args()
|
||||
|
||||
snapshot = load_snapshot(Path(args.snapshot).expanduser() if args.snapshot else None)
|
||||
repo_root = Path(__file__).resolve().parents[1]
|
||||
status = compute_phase_status(snapshot, repo_root=repo_root)
|
||||
|
||||
if args.json:
|
||||
rendered = json.dumps(status, indent=2)
|
||||
else:
|
||||
rendered = render_markdown(status)
|
||||
|
||||
if args.output:
|
||||
output_path = Path(args.output).expanduser()
|
||||
output_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
output_path.write_text(rendered, encoding="utf-8")
|
||||
print(f"Phase status written to {output_path}")
|
||||
else:
|
||||
print(rendered)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
@@ -1,67 +0,0 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import importlib.util
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
ROOT = Path(__file__).resolve().parents[1]
|
||||
SCRIPT_PATH = ROOT / "scripts" / "fleet_phase_status.py"
|
||||
DOC_PATH = ROOT / "docs" / "FLEET_PHASE_1_SURVIVAL.md"
|
||||
|
||||
|
||||
def _load_module(path: Path, name: str):
|
||||
assert path.exists(), f"missing {path.relative_to(ROOT)}"
|
||||
spec = importlib.util.spec_from_file_location(name, path)
|
||||
assert spec and spec.loader
|
||||
module = importlib.util.module_from_spec(spec)
|
||||
spec.loader.exec_module(module)
|
||||
return module
|
||||
|
||||
|
||||
def test_compute_phase_status_tracks_survival_gate_requirements() -> None:
|
||||
mod = _load_module(SCRIPT_PATH, "fleet_phase_status")
|
||||
|
||||
status = mod.compute_phase_status(
|
||||
{
|
||||
"fleet_operational": True,
|
||||
"resources": {
|
||||
"uptime_percent": 94.5,
|
||||
"days_at_or_above_95_percent": 12,
|
||||
"capacity_utilization_percent": 45.0,
|
||||
},
|
||||
}
|
||||
)
|
||||
|
||||
assert status["current_phase"] == "PHASE-1 Survival"
|
||||
assert status["next_phase_ready"] is False
|
||||
assert any("94.5% / 95.0%" in item for item in status["missing_requirements"])
|
||||
assert any("12/30" in item for item in status["missing_requirements"])
|
||||
assert any("45.0% / >60.0%" in item for item in status["missing_requirements"])
|
||||
|
||||
|
||||
def test_render_markdown_preserves_phase_buildings_and_manual_clicker_language() -> None:
|
||||
mod = _load_module(SCRIPT_PATH, "fleet_phase_status")
|
||||
status = mod.compute_phase_status(mod.default_snapshot())
|
||||
report = mod.render_markdown(status)
|
||||
|
||||
for snippet in (
|
||||
"# [PHASE-1] Survival - Keep the Lights On",
|
||||
"VPS hosts: Ezra, Allegro, Bezalel",
|
||||
"Timmy harness",
|
||||
"Gitea forge",
|
||||
"Evennia worlds",
|
||||
"Every restart, every SSH, every check is a manual click.",
|
||||
):
|
||||
assert snippet in report
|
||||
|
||||
|
||||
def test_repo_contains_generated_phase_1_doc() -> None:
|
||||
assert DOC_PATH.exists(), "missing committed phase-1 survival doc"
|
||||
text = DOC_PATH.read_text(encoding="utf-8")
|
||||
for snippet in (
|
||||
"# [PHASE-1] Survival - Keep the Lights On",
|
||||
"## Current Buildings",
|
||||
"## Next Phase Trigger",
|
||||
"## Manual Clicker Interpretation",
|
||||
):
|
||||
assert snippet in text
|
||||
84
tests/test_hermes_agent_genome.py
Normal file
84
tests/test_hermes_agent_genome.py
Normal file
@@ -0,0 +1,84 @@
|
||||
from pathlib import Path
|
||||
|
||||
GENOME = Path('GENOME.md')
|
||||
|
||||
|
||||
def read_genome() -> str:
|
||||
assert GENOME.exists(), 'GENOME.md must exist at repo root'
|
||||
return GENOME.read_text(encoding='utf-8')
|
||||
|
||||
|
||||
def test_genome_exists():
|
||||
assert GENOME.exists(), 'GENOME.md must exist at repo root'
|
||||
|
||||
|
||||
def test_genome_has_required_sections():
|
||||
text = read_genome()
|
||||
for heading in [
|
||||
'# GENOME.md — hermes-agent',
|
||||
'## Project Overview',
|
||||
'## Architecture Diagram',
|
||||
'## Entry Points and Data Flow',
|
||||
'## Key Abstractions',
|
||||
'## API Surface',
|
||||
'## Test Coverage Gaps',
|
||||
'## Security Considerations',
|
||||
'## Performance Characteristics',
|
||||
'## Critical Modules to Name Explicitly',
|
||||
]:
|
||||
assert heading in text
|
||||
|
||||
|
||||
def test_genome_contains_mermaid_diagram():
|
||||
text = read_genome()
|
||||
assert '```mermaid' in text
|
||||
assert 'flowchart TD' in text
|
||||
|
||||
|
||||
def test_genome_mentions_control_plane_modules():
|
||||
text = read_genome()
|
||||
for token in [
|
||||
'run_agent.py',
|
||||
'model_tools.py',
|
||||
'tools/registry.py',
|
||||
'toolsets.py',
|
||||
'cli.py',
|
||||
'hermes_cli/main.py',
|
||||
'hermes_state.py',
|
||||
'gateway/run.py',
|
||||
'acp_adapter/server.py',
|
||||
'cron/scheduler.py',
|
||||
]:
|
||||
assert token in text
|
||||
|
||||
|
||||
def test_genome_mentions_test_gap_and_collection_findings():
|
||||
text = read_genome()
|
||||
for token in [
|
||||
'11,470 tests collected',
|
||||
'6 collection errors',
|
||||
'ModuleNotFoundError: No module named `acp`',
|
||||
'trajectory_compressor.py',
|
||||
'batch_runner.py',
|
||||
]:
|
||||
assert token in text
|
||||
|
||||
|
||||
def test_genome_mentions_security_and_performance_layers():
|
||||
text = read_genome()
|
||||
for token in [
|
||||
'prompt_builder.py',
|
||||
'approval.py',
|
||||
'file_tools.py',
|
||||
'mcp_tool.py',
|
||||
'WAL mode',
|
||||
'prompt caching',
|
||||
'context compression',
|
||||
'parallel tool execution',
|
||||
]:
|
||||
assert token in text
|
||||
|
||||
|
||||
def test_genome_is_substantial():
|
||||
text = read_genome()
|
||||
assert len(text) >= 10000
|
||||
Reference in New Issue
Block a user