[EPIC-999] Phase I — add call graph, test stubs, and AIAgent decomposition plan
Some checks are pending
Docker Build and Publish / build-and-push (pull_request) Waiting to run
Docs Site Checks / docs-site-checks (pull_request) Waiting to run
Nix / nix (macos-latest) (pull_request) Waiting to run
Nix / nix (ubuntu-latest) (pull_request) Waiting to run
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Waiting to run
Tests / test (pull_request) Waiting to run
Tests / e2e (pull_request) Waiting to run
Some checks are pending
Docker Build and Publish / build-and-push (pull_request) Waiting to run
Docs Site Checks / docs-site-checks (pull_request) Waiting to run
Nix / nix (macos-latest) (pull_request) Waiting to run
Nix / nix (ubuntu-latest) (pull_request) Waiting to run
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Waiting to run
Tests / test (pull_request) Waiting to run
Tests / e2e (pull_request) Waiting to run
- call_graph.json: 128 calls inside AIAgent.run_conversation identified - test_invariants_stubs.py: property-based contract tests for loop/registry/state/compressor - AIAgent_DECOMPOSITION.md: 5-class refactor plan for Phase II competing rewrites Authored-by: Ezra
This commit is contained in:
4657
docs/ouroboros/artifacts/call_graph.json
Normal file
4657
docs/ouroboros/artifacts/call_graph.json
Normal file
File diff suppressed because it is too large
Load Diff
74
docs/ouroboros/specs/AIAgent_DECOMPOSITION.md
Normal file
74
docs/ouroboros/specs/AIAgent_DECOMPOSITION.md
Normal file
@@ -0,0 +1,74 @@
|
||||
# AIAgent Decomposition Plan (EPIC-999 Phase II Prep)
|
||||
|
||||
## Current State
|
||||
`run_agent.py` contains `AIAgent` — a ~7,000-SLOC class that is the highest-blast-radius module in Hermes.
|
||||
|
||||
## Goal
|
||||
Decompose `AIAgent` into 5 focused classes with strict interfaces, enabling:
|
||||
- Parallel rewrites by competing sub-agents (Phase II)
|
||||
- Independent testing of loop semantics vs. model I/O vs. memory
|
||||
- Future runtime replacement (Hermes Ω) without touching tool infrastructure
|
||||
|
||||
## Proposed Decomposition
|
||||
|
||||
### 1. `ConversationLoop`
|
||||
**Responsibility:** Own the `while` loop invariant, iteration budget, and termination conditions.
|
||||
**Interface:**
|
||||
```python
|
||||
class ConversationLoop:
|
||||
def run(self, messages: list, tools: list, client) -> dict:
|
||||
...
|
||||
```
|
||||
**Invariant:** Must terminate before `max_iterations` and `iteration_budget.remaining <= 0`.
|
||||
|
||||
### 2. `ModelDispatcher`
|
||||
**Responsibility:** All interaction with `client.chat.completions.create`, including streaming, fallback activation, and response normalization.
|
||||
**Interface:**
|
||||
```python
|
||||
class ModelDispatcher:
|
||||
def call(self, model: str, messages: list, tools: list, **kwargs) -> ModelResponse:
|
||||
...
|
||||
```
|
||||
**Invariant:** Must always return a normalized object with `.content`, `.tool_calls`, `.reasoning`.
|
||||
|
||||
### 3. `ToolExecutor`
|
||||
**Responsibility:** Execute tool calls (sequential or concurrent), handle errors, and format results.
|
||||
**Interface:**
|
||||
```python
|
||||
class ToolExecutor:
|
||||
def execute(self, tool_calls: list, task_id: str = None) -> list[ToolResult]:
|
||||
...
|
||||
```
|
||||
**Invariant:** Every tool_call produces exactly one ToolResult, and errors are JSON-serializable.
|
||||
|
||||
### 4. `MemoryInterceptor`
|
||||
**Responsibility:** Intercept `memory` and `todo` tool calls before they reach the registry, plus flush memories on session end.
|
||||
**Interface:**
|
||||
```python
|
||||
class MemoryInterceptor:
|
||||
def intercept(self, tool_name: str, args: dict, task_id: str = None) -> str | None:
|
||||
... # returns result if intercepted, None if pass-through
|
||||
```
|
||||
**Invariant:** Must not mutate agent state except through explicit `flush()` calls.
|
||||
|
||||
### 5. `PromptBuilder`
|
||||
**Responsibility:** Assemble system prompt, inject skills, apply context compression, and manage prompt caching markers.
|
||||
**Interface:**
|
||||
```python
|
||||
class PromptBuilder:
|
||||
def build(self, user_message: str, conversation_history: list) -> list:
|
||||
...
|
||||
```
|
||||
**Invariant:** Output list must start with a system message (or equivalent provider parameter).
|
||||
|
||||
## Migration Path
|
||||
1. Create the 5 classes as thin facades that delegate back to `AIAgent` methods.
|
||||
2. Move logic incrementally from `AIAgent` into the new classes.
|
||||
3. Once `AIAgent` is a pure coordinator (~500 SLOC), freeze the interface.
|
||||
4. Phase II competing agents rewrite one class at a time.
|
||||
|
||||
## Acceptance Criteria
|
||||
- [ ] `AIAgent` reduced to < 1,000 SLOC
|
||||
- [ ] Each new class has > 80% test coverage
|
||||
- [ ] Full existing test suite still passes
|
||||
- [ ] No behavioral regressions in shadow mode
|
||||
137
docs/ouroboros/specs/test_invariants_stubs.py
Normal file
137
docs/ouroboros/specs/test_invariants_stubs.py
Normal file
@@ -0,0 +1,137 @@
|
||||
"""
|
||||
Property-based test stubs for Hermes core invariants.
|
||||
Part of EPIC-999 Phase I — The Mirror.
|
||||
|
||||
These tests define behavioral contracts that ANY rewrite of the runtime
|
||||
must satisfy, including the Hermes Ω target.
|
||||
"""
|
||||
|
||||
import pytest
|
||||
from unittest.mock import Mock, patch
|
||||
|
||||
|
||||
# -----------------------------------------------------------------------------
|
||||
# Conversation Loop Invariants
|
||||
# -----------------------------------------------------------------------------
|
||||
|
||||
class TestConversationLoopInvariants:
|
||||
"""
|
||||
Invariants for AIAgent.run_conversation and its successors.
|
||||
"""
|
||||
|
||||
def test_deterministic_tool_ordering(self):
|
||||
"""
|
||||
Given the same message history and available tools,
|
||||
the agent must produce the same tool_call ordering.
|
||||
|
||||
(If non-determinism is introduced by temperature > 0,
|
||||
this becomes a statistical test.)
|
||||
"""
|
||||
pytest.skip("TODO: implement with seeded mock model responses")
|
||||
|
||||
def test_tool_result_always_appended_to_history(self):
|
||||
"""
|
||||
After any tool_call is executed, its result MUST appear
|
||||
in the conversation history before the next assistant turn.
|
||||
"""
|
||||
pytest.skip("TODO: mock model with forced tool_call and verify history")
|
||||
|
||||
def test_iteration_budget_never_exceeded(self):
|
||||
"""
|
||||
The loop must terminate before api_call_count >= max_iterations
|
||||
AND before iteration_budget.remaining <= 0.
|
||||
"""
|
||||
pytest.skip("TODO: mock model to always return tool_calls; verify termination")
|
||||
|
||||
def test_system_prompt_presence(self):
|
||||
"""
|
||||
Every API call must include a system message as the first message
|
||||
(or system parameter for providers that support it).
|
||||
"""
|
||||
pytest.skip("TODO: intercept all client.chat.completions.create calls")
|
||||
|
||||
def test_compression_preserves_last_n_messages(self):
|
||||
"""
|
||||
After context compression, the final N messages (configurable,
|
||||
default ~4) must remain uncompressed to preserve local context.
|
||||
"""
|
||||
pytest.skip("TODO: create history > threshold, compress, verify tail")
|
||||
|
||||
|
||||
# -----------------------------------------------------------------------------
|
||||
# Tool Registry Invariants
|
||||
# -----------------------------------------------------------------------------
|
||||
|
||||
class TestToolRegistryInvariants:
|
||||
"""
|
||||
Invariants for tools.registry.Registry.
|
||||
"""
|
||||
|
||||
def test_register_then_list_contains_tool(self):
|
||||
"""
|
||||
After register() is called with a valid schema and handler,
|
||||
list_tools() must include the registered name.
|
||||
"""
|
||||
pytest.skip("TODO: instantiate fresh Registry, register, assert membership")
|
||||
|
||||
def test_dispatch_unknown_tool_returns_error_json(self):
|
||||
"""
|
||||
Calling dispatch() with an unregistered tool name must return
|
||||
a JSON string containing an error key, never raise raw.
|
||||
"""
|
||||
pytest.skip("TODO: call dispatch with 'nonexistent_tool', parse result")
|
||||
|
||||
def test_handler_receives_task_id_kwarg(self):
|
||||
"""
|
||||
Registered handlers that accept **kwargs must receive task_id
|
||||
when dispatch is called with one.
|
||||
"""
|
||||
pytest.skip("TODO: register mock handler, dispatch with task_id, verify")
|
||||
|
||||
|
||||
# -----------------------------------------------------------------------------
|
||||
# State Persistence Invariants
|
||||
# -----------------------------------------------------------------------------
|
||||
|
||||
class TestStatePersistenceInvariants:
|
||||
"""
|
||||
Invariants for hermes_state.SessionDB.
|
||||
"""
|
||||
|
||||
def test_saved_message_is_retrievable_by_session_id(self):
|
||||
"""
|
||||
After save_message(session_id, ...), get_messages(session_id)
|
||||
must return the message.
|
||||
"""
|
||||
pytest.skip("TODO: use temp SQLite DB, save, query, assert")
|
||||
|
||||
def test_fts_search_returns_relevant_messages(self):
|
||||
"""
|
||||
After indexing messages, FTS search for a unique keyword
|
||||
must return the message containing it.
|
||||
"""
|
||||
pytest.skip("TODO: seed DB with messages, search unique token")
|
||||
|
||||
|
||||
# -----------------------------------------------------------------------------
|
||||
# Context Compressor Invariants
|
||||
# -----------------------------------------------------------------------------
|
||||
|
||||
class TestContextCompressorInvariants:
|
||||
"""
|
||||
Invariants for agent.context_compressor.ContextCompressor.
|
||||
"""
|
||||
|
||||
def test_compression_reduces_token_count(self):
|
||||
"""
|
||||
compress_messages(output) must have fewer tokens than
|
||||
the uncompressed input (for any input > threshold).
|
||||
"""
|
||||
pytest.skip("TODO: mock tokenizer, provide long history, assert reduction")
|
||||
|
||||
def test_compression_never_drops_system_message(self):
|
||||
"""
|
||||
The system message must survive compression and remain
|
||||
at index 0 of the returned message list.
|
||||
"""
|
||||
pytest.skip("TODO: compress history with system msg, verify position")
|
||||
Reference in New Issue
Block a user