Compare commits

...

3 Commits

Author SHA1 Message Date
Ezra
f9839ad278 Merge branch 'main' into epic-999-phase-i
Some checks failed
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Docs Site Checks / docs-site-checks (pull_request) Failing after 4s
Nix / nix (ubuntu-latest) (pull_request) Failing after 1s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Failing after 1s
Tests / test (pull_request) Failing after 3s
Tests / e2e (pull_request) Failing after 3s
Nix / nix (macos-latest) (pull_request) Has been cancelled
2026-04-06 14:02:37 +00:00
Ezra
c266661bff [EPIC-999] Phase I — add call graph, test stubs, and AIAgent decomposition plan
Some checks failed
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Docs Site Checks / docs-site-checks (pull_request) Failing after 3s
Nix / nix (ubuntu-latest) (pull_request) Failing after 1s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Failing after 1s
Tests / test (pull_request) Failing after 2s
Tests / e2e (pull_request) Failing after 2s
Nix / nix (macos-latest) (pull_request) Has been cancelled
- call_graph.json: 128 calls inside AIAgent.run_conversation identified
- test_invariants_stubs.py: property-based contract tests for loop/registry/state/compressor
- AIAgent_DECOMPOSITION.md: 5-class refactor plan for Phase II competing rewrites

Authored-by: Ezra
2026-04-05 23:30:28 +00:00
Ezra
5f1cdfc9e4 [EPIC-999] Phase I — The Mirror: formal spec extraction artifacts
Some checks failed
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Docs Site Checks / docs-site-checks (pull_request) Failing after 1m47s
Nix / nix (ubuntu-latest) (pull_request) Failing after 26s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Failing after 0s
Tests / test (pull_request) Failing after 47s
Tests / e2e (pull_request) Failing after 3s
Nix / nix (macos-latest) (pull_request) Has been cancelled
- module_inventory.json: 679 Python files, 298k lines, 232k SLOC
- core_analysis.json: deep AST parse of 9 core modules
- SPEC.md: high-level architecture, module specs, coupling risks, Phase II prep

Authored-by: Ezra <ezra@hermes.vps>
2026-04-05 23:27:29 +00:00
7 changed files with 52159 additions and 0 deletions

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,74 @@
# AIAgent Decomposition Plan (EPIC-999 Phase II Prep)
## Current State
`run_agent.py` contains `AIAgent` — a ~7,000-SLOC class that is the highest-blast-radius module in Hermes.
## Goal
Decompose `AIAgent` into 5 focused classes with strict interfaces, enabling:
- Parallel rewrites by competing sub-agents (Phase II)
- Independent testing of loop semantics vs. model I/O vs. memory
- Future runtime replacement (Hermes Ω) without touching tool infrastructure
## Proposed Decomposition
### 1. `ConversationLoop`
**Responsibility:** Own the `while` loop invariant, iteration budget, and termination conditions.
**Interface:**
```python
class ConversationLoop:
def run(self, messages: list, tools: list, client) -> dict:
...
```
**Invariant:** Must terminate before `max_iterations` and `iteration_budget.remaining <= 0`.
### 2. `ModelDispatcher`
**Responsibility:** All interaction with `client.chat.completions.create`, including streaming, fallback activation, and response normalization.
**Interface:**
```python
class ModelDispatcher:
def call(self, model: str, messages: list, tools: list, **kwargs) -> ModelResponse:
...
```
**Invariant:** Must always return a normalized object with `.content`, `.tool_calls`, `.reasoning`.
### 3. `ToolExecutor`
**Responsibility:** Execute tool calls (sequential or concurrent), handle errors, and format results.
**Interface:**
```python
class ToolExecutor:
def execute(self, tool_calls: list, task_id: str = None) -> list[ToolResult]:
...
```
**Invariant:** Every tool_call produces exactly one ToolResult, and errors are JSON-serializable.
### 4. `MemoryInterceptor`
**Responsibility:** Intercept `memory` and `todo` tool calls before they reach the registry, plus flush memories on session end.
**Interface:**
```python
class MemoryInterceptor:
def intercept(self, tool_name: str, args: dict, task_id: str = None) -> str | None:
... # returns result if intercepted, None if pass-through
```
**Invariant:** Must not mutate agent state except through explicit `flush()` calls.
### 5. `PromptBuilder`
**Responsibility:** Assemble system prompt, inject skills, apply context compression, and manage prompt caching markers.
**Interface:**
```python
class PromptBuilder:
def build(self, user_message: str, conversation_history: list) -> list:
...
```
**Invariant:** Output list must start with a system message (or equivalent provider parameter).
## Migration Path
1. Create the 5 classes as thin facades that delegate back to `AIAgent` methods.
2. Move logic incrementally from `AIAgent` into the new classes.
3. Once `AIAgent` is a pure coordinator (~500 SLOC), freeze the interface.
4. Phase II competing agents rewrite one class at a time.
## Acceptance Criteria
- [ ] `AIAgent` reduced to < 1,000 SLOC
- [ ] Each new class has > 80% test coverage
- [ ] Full existing test suite still passes
- [ ] No behavioral regressions in shadow mode

View File

@@ -0,0 +1,263 @@
# Hermes Ω Specification Draft (Ouroboros Phase I)
> Auto-generated by Ezra as part of EPIC-999. This document is a living artifact.
## Scope
This specification covers the core runtime of Hermes agent v0.7.x as found in the `hermes-agent` codebase.
## High-Level Architecture
```
User Message
Gateway (gateway/run.py) — platform adapter (Telegram, Discord, CLI, etc.)
HermesCLI (cli.py) or AIAgent.chat() (run_agent.py)
ModelTools (model_tools.py) — tool discovery, schema assembly, dispatch
Tool Registry (tools/registry.py) — handler lookup, availability checks
Individual Tool Implementations (tools/*.py)
Results returned up the stack
```
## Module Specifications
### `run_agent.py`
**Lines of Code:** 8948
**Classes:**
- `_SafeWriter`
- *Transparent stdio wrapper that catches OSError/ValueError from broken pipes.*
- `__init__(self, inner)`
- `write(self, data)`
- `flush(self)`
- `fileno(self)`
- `isatty(self)`
- ... and 1 more methods
- `IterationBudget`
- *Thread-safe iteration counter for an agent.*
- `__init__(self, max_total)`
- `consume(self)`
- `refund(self)`
- `used(self)`
- `remaining(self)`
- `AIAgent`
- *AI Agent with tool calling capabilities.*
- `base_url(self)`
- `base_url(self, value)`
- `__init__(self, base_url, api_key, provider, api_mode, acp_command, acp_args, command, args, model, max_iterations, tool_delay, enabled_toolsets, disabled_toolsets, save_trajectories, verbose_logging, quiet_mode, ephemeral_system_prompt, log_prefix_chars, log_prefix, providers_allowed, providers_ignored, providers_order, provider_sort, provider_require_parameters, provider_data_collection, session_id, tool_progress_callback, tool_start_callback, tool_complete_callback, thinking_callback, reasoning_callback, clarify_callback, step_callback, stream_delta_callback, tool_gen_callback, status_callback, max_tokens, reasoning_config, prefill_messages, platform, skip_context_files, skip_memory, session_db, iteration_budget, fallback_model, credential_pool, checkpoints_enabled, checkpoint_max_snapshots, pass_session_id, persist_session)`
- `reset_session_state(self)`
- `_safe_print(self)`
- ... and 100 more methods
**Top-Level Functions:**
- `_install_safe_stdio()`
- `_is_destructive_command(cmd)`
- `_should_parallelize_tool_batch(tool_calls)`
- `_extract_parallel_scope_path(tool_name, function_args)`
- `_paths_overlap(left, right)`
- `_sanitize_surrogates(text)`
- `_sanitize_messages_surrogates(messages)`
- `_strip_budget_warnings_from_history(messages)`
- `main(query, model, api_key, base_url, max_turns, enabled_toolsets, disabled_toolsets, list_tools, save_trajectories, save_sample, verbose, log_prefix_chars)`
**Inferred Side Effects & Invariants:**
- Persists state to SQLite database.
- Performs file I/O.
- Makes HTTP network calls.
- Uses global mutable state (risk factor).
### `model_tools.py`
**Lines of Code:** 466
**Top-Level Functions:**
- `_get_tool_loop()`
- `_get_worker_loop()`
- `_run_async(coro)`
- `_discover_tools()`
- `get_tool_definitions(enabled_toolsets, disabled_toolsets, quiet_mode)`
- `handle_function_call(function_name, function_args, task_id, user_task, enabled_tools)`
- `get_all_tool_names()`
- `get_toolset_for_tool(tool_name)`
- `get_available_toolsets()`
- `check_toolset_requirements()`
- ... and 1 more functions
**Inferred Side Effects & Invariants:**
- Uses global mutable state (risk factor).
- Primarily pure Python logic / orchestration.
### `cli.py`
**Lines of Code:** 8280
**Classes:**
- `ChatConsole`
- *Rich Console adapter for prompt_toolkit's patch_stdout context.*
- `__init__(self)`
- `print(self)`
- `HermesCLI`
- *Interactive CLI for the Hermes Agent.*
- `__init__(self, model, toolsets, provider, api_key, base_url, max_turns, verbose, compact, resume, checkpoints, pass_session_id)`
- `_invalidate(self, min_interval)`
- `_status_bar_context_style(self, percent_used)`
- `_build_context_bar(self, percent_used, width)`
- `_get_status_bar_snapshot(self)`
- ... and 106 more methods
**Top-Level Functions:**
- `_load_prefill_messages(file_path)`
- `_parse_reasoning_config(effort)`
- `load_cli_config()`
- `_run_cleanup()`
- `_git_repo_root()`
- `_path_is_within_root(path, root)`
- `_setup_worktree(repo_root)`
- `_cleanup_worktree(info)`
- `_prune_stale_worktrees(repo_root, max_age_hours)`
- `_accent_hex()`
- ... and 9 more functions
**Inferred Side Effects & Invariants:**
- Persists state to SQLite database.
- Performs file I/O.
- Spawns subprocesses / shell commands.
- Uses global mutable state (risk factor).
### `tools/registry.py`
**Lines of Code:** 275
**Classes:**
- `ToolEntry`
- *Metadata for a single registered tool.*
- `__init__(self, name, toolset, schema, handler, check_fn, requires_env, is_async, description, emoji)`
- `ToolRegistry`
- *Singleton registry that collects tool schemas + handlers from tool files.*
- `__init__(self)`
- `register(self, name, toolset, schema, handler, check_fn, requires_env, is_async, description, emoji)`
- `deregister(self, name)`
- `get_definitions(self, tool_names, quiet)`
- `dispatch(self, name, args)`
- ... and 10 more methods
**Inferred Side Effects & Invariants:**
- Primarily pure Python logic / orchestration.
### `gateway/run.py`
**Lines of Code:** 6657
**Classes:**
- `GatewayRunner`
- *Main gateway controller.*
- `__init__(self, config)`
- `_has_setup_skill(self)`
- `_load_voice_modes(self)`
- `_save_voice_modes(self)`
- `_set_adapter_auto_tts_disabled(self, adapter, chat_id, disabled)`
- ... and 78 more methods
**Top-Level Functions:**
- `_ensure_ssl_certs()`
- `_normalize_whatsapp_identifier(value)`
- `_expand_whatsapp_auth_aliases(identifier)`
- `_resolve_runtime_agent_kwargs()`
- `_build_media_placeholder(event)`
- `_dequeue_pending_text(adapter, session_key)`
- `_check_unavailable_skill(command_name)`
- `_platform_config_key(platform)`
- `_load_gateway_config()`
- `_resolve_gateway_model(config)`
- ... and 4 more functions
**Inferred Side Effects & Invariants:**
- Persists state to SQLite database.
- Performs file I/O.
- Spawns subprocesses / shell commands.
- Contains async code paths.
- Uses global mutable state (risk factor).
### `hermes_state.py`
**Lines of Code:** 1270
**Classes:**
- `SessionDB`
- *SQLite-backed session storage with FTS5 search.*
- `__init__(self, db_path)`
- `_execute_write(self, fn)`
- `_try_wal_checkpoint(self)`
- `close(self)`
- `_init_schema(self)`
- ... and 29 more methods
**Inferred Side Effects & Invariants:**
- Persists state to SQLite database.
### `agent/context_compressor.py`
**Lines of Code:** 676
**Classes:**
- `ContextCompressor`
- *Compresses conversation context when approaching the model's context limit.*
- `__init__(self, model, threshold_percent, protect_first_n, protect_last_n, summary_target_ratio, quiet_mode, summary_model_override, base_url, api_key, config_context_length, provider)`
- `update_from_response(self, usage)`
- `should_compress(self, prompt_tokens)`
- `should_compress_preflight(self, messages)`
- `get_status(self)`
- ... and 11 more methods
**Inferred Side Effects & Invariants:**
- Primarily pure Python logic / orchestration.
### `agent/prompt_caching.py`
**Lines of Code:** 72
**Top-Level Functions:**
- `_apply_cache_marker(msg, cache_marker, native_anthropic)`
- `apply_anthropic_cache_control(api_messages, cache_ttl, native_anthropic)`
**Inferred Side Effects & Invariants:**
- Primarily pure Python logic / orchestration.
### `agent/skill_commands.py`
**Lines of Code:** 297
**Top-Level Functions:**
- `build_plan_path(user_instruction)`
- `_load_skill_payload(skill_identifier, task_id)`
- `_build_skill_message(loaded_skill, skill_dir, activation_note, user_instruction, runtime_note)`
- `scan_skill_commands()`
- `get_skill_commands()`
- `build_skill_invocation_message(cmd_key, user_instruction, task_id, runtime_note)`
- `build_preloaded_skills_prompt(skill_identifiers, task_id)`
**Inferred Side Effects & Invariants:**
- Uses global mutable state (risk factor).
- Primarily pure Python logic / orchestration.
## Cross-Module Dependencies
Key data flow:
1. `run_agent.py` defines `AIAgent` — the canonical conversation loop.
2. `model_tools.py` assembles tool schemas and dispatches function calls.
3. `tools/registry.py` maintains the central registry; all tool files import it.
4. `gateway/run.py` adapts platform events into `AIAgent.run_conversation()` calls.
5. `cli.py` (`HermesCLI`) provides the interactive shell and slash-command routing.
## Known Coupling Risks
- `run_agent.py` is ~7k SLOC and contains the core loop, todo/memory interception, context compression, and trajectory saving. High blast radius.
- `cli.py` is ~6.5k SLOC and combines UI (Rich/prompt_toolkit), config loading, and command dispatch. Tightly coupled to display state.
- `model_tools.py` holds a process-global `_last_resolved_tool_names`. Subagent execution saves/restores this global.
- `tools/registry.py` is imported by ALL tool files; schema generation happens at import time.
## Next Actions (Phase II Prep)
1. Decompose `AIAgent` into: `ConversationLoop`, `ContextManager`, `ToolDispatcher`, `MemoryInterceptor`.
2. Extract CLI display logic from command dispatch.
3. Define strict interfaces between gateway → agent → tools.
4. Write property-based tests for the conversation loop invariant: *given the same message history and tool results, the agent must produce deterministic tool_call ordering*.
---
Generated: 2026-04-05 by Ezra (Phase I)

View File

@@ -0,0 +1,137 @@
"""
Property-based test stubs for Hermes core invariants.
Part of EPIC-999 Phase I — The Mirror.
These tests define behavioral contracts that ANY rewrite of the runtime
must satisfy, including the Hermes Ω target.
"""
import pytest
from unittest.mock import Mock, patch
# -----------------------------------------------------------------------------
# Conversation Loop Invariants
# -----------------------------------------------------------------------------
class TestConversationLoopInvariants:
"""
Invariants for AIAgent.run_conversation and its successors.
"""
def test_deterministic_tool_ordering(self):
"""
Given the same message history and available tools,
the agent must produce the same tool_call ordering.
(If non-determinism is introduced by temperature > 0,
this becomes a statistical test.)
"""
pytest.skip("TODO: implement with seeded mock model responses")
def test_tool_result_always_appended_to_history(self):
"""
After any tool_call is executed, its result MUST appear
in the conversation history before the next assistant turn.
"""
pytest.skip("TODO: mock model with forced tool_call and verify history")
def test_iteration_budget_never_exceeded(self):
"""
The loop must terminate before api_call_count >= max_iterations
AND before iteration_budget.remaining <= 0.
"""
pytest.skip("TODO: mock model to always return tool_calls; verify termination")
def test_system_prompt_presence(self):
"""
Every API call must include a system message as the first message
(or system parameter for providers that support it).
"""
pytest.skip("TODO: intercept all client.chat.completions.create calls")
def test_compression_preserves_last_n_messages(self):
"""
After context compression, the final N messages (configurable,
default ~4) must remain uncompressed to preserve local context.
"""
pytest.skip("TODO: create history > threshold, compress, verify tail")
# -----------------------------------------------------------------------------
# Tool Registry Invariants
# -----------------------------------------------------------------------------
class TestToolRegistryInvariants:
"""
Invariants for tools.registry.Registry.
"""
def test_register_then_list_contains_tool(self):
"""
After register() is called with a valid schema and handler,
list_tools() must include the registered name.
"""
pytest.skip("TODO: instantiate fresh Registry, register, assert membership")
def test_dispatch_unknown_tool_returns_error_json(self):
"""
Calling dispatch() with an unregistered tool name must return
a JSON string containing an error key, never raise raw.
"""
pytest.skip("TODO: call dispatch with 'nonexistent_tool', parse result")
def test_handler_receives_task_id_kwarg(self):
"""
Registered handlers that accept **kwargs must receive task_id
when dispatch is called with one.
"""
pytest.skip("TODO: register mock handler, dispatch with task_id, verify")
# -----------------------------------------------------------------------------
# State Persistence Invariants
# -----------------------------------------------------------------------------
class TestStatePersistenceInvariants:
"""
Invariants for hermes_state.SessionDB.
"""
def test_saved_message_is_retrievable_by_session_id(self):
"""
After save_message(session_id, ...), get_messages(session_id)
must return the message.
"""
pytest.skip("TODO: use temp SQLite DB, save, query, assert")
def test_fts_search_returns_relevant_messages(self):
"""
After indexing messages, FTS search for a unique keyword
must return the message containing it.
"""
pytest.skip("TODO: seed DB with messages, search unique token")
# -----------------------------------------------------------------------------
# Context Compressor Invariants
# -----------------------------------------------------------------------------
class TestContextCompressorInvariants:
"""
Invariants for agent.context_compressor.ContextCompressor.
"""
def test_compression_reduces_token_count(self):
"""
compress_messages(output) must have fewer tokens than
the uncompressed input (for any input > threshold).
"""
pytest.skip("TODO: mock tokenizer, provide long history, assert reduction")
def test_compression_never_drops_system_message(self):
"""
The system message must survive compression and remain
at index 0 of the returned message list.
"""
pytest.skip("TODO: compress history with system msg, verify position")