Compare commits
1 Commits
epic-999-p
...
timmy/issu
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
3915c5e32b |
File diff suppressed because it is too large
Load Diff
File diff suppressed because it is too large
Load Diff
File diff suppressed because it is too large
Load Diff
File diff suppressed because it is too large
Load Diff
@@ -1,74 +0,0 @@
|
||||
# AIAgent Decomposition Plan (EPIC-999 Phase II Prep)
|
||||
|
||||
## Current State
|
||||
`run_agent.py` contains `AIAgent` — a ~7,000-SLOC class that is the highest-blast-radius module in Hermes.
|
||||
|
||||
## Goal
|
||||
Decompose `AIAgent` into 5 focused classes with strict interfaces, enabling:
|
||||
- Parallel rewrites by competing sub-agents (Phase II)
|
||||
- Independent testing of loop semantics vs. model I/O vs. memory
|
||||
- Future runtime replacement (Hermes Ω) without touching tool infrastructure
|
||||
|
||||
## Proposed Decomposition
|
||||
|
||||
### 1. `ConversationLoop`
|
||||
**Responsibility:** Own the `while` loop invariant, iteration budget, and termination conditions.
|
||||
**Interface:**
|
||||
```python
|
||||
class ConversationLoop:
|
||||
def run(self, messages: list, tools: list, client) -> dict:
|
||||
...
|
||||
```
|
||||
**Invariant:** Must terminate before `max_iterations` and `iteration_budget.remaining <= 0`.
|
||||
|
||||
### 2. `ModelDispatcher`
|
||||
**Responsibility:** All interaction with `client.chat.completions.create`, including streaming, fallback activation, and response normalization.
|
||||
**Interface:**
|
||||
```python
|
||||
class ModelDispatcher:
|
||||
def call(self, model: str, messages: list, tools: list, **kwargs) -> ModelResponse:
|
||||
...
|
||||
```
|
||||
**Invariant:** Must always return a normalized object with `.content`, `.tool_calls`, `.reasoning`.
|
||||
|
||||
### 3. `ToolExecutor`
|
||||
**Responsibility:** Execute tool calls (sequential or concurrent), handle errors, and format results.
|
||||
**Interface:**
|
||||
```python
|
||||
class ToolExecutor:
|
||||
def execute(self, tool_calls: list, task_id: str = None) -> list[ToolResult]:
|
||||
...
|
||||
```
|
||||
**Invariant:** Every tool_call produces exactly one ToolResult, and errors are JSON-serializable.
|
||||
|
||||
### 4. `MemoryInterceptor`
|
||||
**Responsibility:** Intercept `memory` and `todo` tool calls before they reach the registry, plus flush memories on session end.
|
||||
**Interface:**
|
||||
```python
|
||||
class MemoryInterceptor:
|
||||
def intercept(self, tool_name: str, args: dict, task_id: str = None) -> str | None:
|
||||
... # returns result if intercepted, None if pass-through
|
||||
```
|
||||
**Invariant:** Must not mutate agent state except through explicit `flush()` calls.
|
||||
|
||||
### 5. `PromptBuilder`
|
||||
**Responsibility:** Assemble system prompt, inject skills, apply context compression, and manage prompt caching markers.
|
||||
**Interface:**
|
||||
```python
|
||||
class PromptBuilder:
|
||||
def build(self, user_message: str, conversation_history: list) -> list:
|
||||
...
|
||||
```
|
||||
**Invariant:** Output list must start with a system message (or equivalent provider parameter).
|
||||
|
||||
## Migration Path
|
||||
1. Create the 5 classes as thin facades that delegate back to `AIAgent` methods.
|
||||
2. Move logic incrementally from `AIAgent` into the new classes.
|
||||
3. Once `AIAgent` is a pure coordinator (~500 SLOC), freeze the interface.
|
||||
4. Phase II competing agents rewrite one class at a time.
|
||||
|
||||
## Acceptance Criteria
|
||||
- [ ] `AIAgent` reduced to < 1,000 SLOC
|
||||
- [ ] Each new class has > 80% test coverage
|
||||
- [ ] Full existing test suite still passes
|
||||
- [ ] No behavioral regressions in shadow mode
|
||||
@@ -1,263 +0,0 @@
|
||||
# Hermes Ω Specification Draft (Ouroboros Phase I)
|
||||
|
||||
> Auto-generated by Ezra as part of EPIC-999. This document is a living artifact.
|
||||
|
||||
## Scope
|
||||
This specification covers the core runtime of Hermes agent v0.7.x as found in the `hermes-agent` codebase.
|
||||
|
||||
## High-Level Architecture
|
||||
|
||||
```
|
||||
User Message
|
||||
↓
|
||||
Gateway (gateway/run.py) — platform adapter (Telegram, Discord, CLI, etc.)
|
||||
↓
|
||||
HermesCLI (cli.py) or AIAgent.chat() (run_agent.py)
|
||||
↓
|
||||
ModelTools (model_tools.py) — tool discovery, schema assembly, dispatch
|
||||
↓
|
||||
Tool Registry (tools/registry.py) — handler lookup, availability checks
|
||||
↓
|
||||
Individual Tool Implementations (tools/*.py)
|
||||
↓
|
||||
Results returned up the stack
|
||||
```
|
||||
|
||||
## Module Specifications
|
||||
|
||||
### `run_agent.py`
|
||||
**Lines of Code:** 8948
|
||||
|
||||
**Classes:**
|
||||
- `_SafeWriter`
|
||||
- *Transparent stdio wrapper that catches OSError/ValueError from broken pipes.*
|
||||
- `__init__(self, inner)`
|
||||
- `write(self, data)`
|
||||
- `flush(self)`
|
||||
- `fileno(self)`
|
||||
- `isatty(self)`
|
||||
- ... and 1 more methods
|
||||
- `IterationBudget`
|
||||
- *Thread-safe iteration counter for an agent.*
|
||||
- `__init__(self, max_total)`
|
||||
- `consume(self)`
|
||||
- `refund(self)`
|
||||
- `used(self)`
|
||||
- `remaining(self)`
|
||||
- `AIAgent`
|
||||
- *AI Agent with tool calling capabilities.*
|
||||
- `base_url(self)`
|
||||
- `base_url(self, value)`
|
||||
- `__init__(self, base_url, api_key, provider, api_mode, acp_command, acp_args, command, args, model, max_iterations, tool_delay, enabled_toolsets, disabled_toolsets, save_trajectories, verbose_logging, quiet_mode, ephemeral_system_prompt, log_prefix_chars, log_prefix, providers_allowed, providers_ignored, providers_order, provider_sort, provider_require_parameters, provider_data_collection, session_id, tool_progress_callback, tool_start_callback, tool_complete_callback, thinking_callback, reasoning_callback, clarify_callback, step_callback, stream_delta_callback, tool_gen_callback, status_callback, max_tokens, reasoning_config, prefill_messages, platform, skip_context_files, skip_memory, session_db, iteration_budget, fallback_model, credential_pool, checkpoints_enabled, checkpoint_max_snapshots, pass_session_id, persist_session)`
|
||||
- `reset_session_state(self)`
|
||||
- `_safe_print(self)`
|
||||
- ... and 100 more methods
|
||||
|
||||
**Top-Level Functions:**
|
||||
- `_install_safe_stdio()`
|
||||
- `_is_destructive_command(cmd)`
|
||||
- `_should_parallelize_tool_batch(tool_calls)`
|
||||
- `_extract_parallel_scope_path(tool_name, function_args)`
|
||||
- `_paths_overlap(left, right)`
|
||||
- `_sanitize_surrogates(text)`
|
||||
- `_sanitize_messages_surrogates(messages)`
|
||||
- `_strip_budget_warnings_from_history(messages)`
|
||||
- `main(query, model, api_key, base_url, max_turns, enabled_toolsets, disabled_toolsets, list_tools, save_trajectories, save_sample, verbose, log_prefix_chars)`
|
||||
|
||||
**Inferred Side Effects & Invariants:**
|
||||
- Persists state to SQLite database.
|
||||
- Performs file I/O.
|
||||
- Makes HTTP network calls.
|
||||
- Uses global mutable state (risk factor).
|
||||
|
||||
### `model_tools.py`
|
||||
**Lines of Code:** 466
|
||||
|
||||
**Top-Level Functions:**
|
||||
- `_get_tool_loop()`
|
||||
- `_get_worker_loop()`
|
||||
- `_run_async(coro)`
|
||||
- `_discover_tools()`
|
||||
- `get_tool_definitions(enabled_toolsets, disabled_toolsets, quiet_mode)`
|
||||
- `handle_function_call(function_name, function_args, task_id, user_task, enabled_tools)`
|
||||
- `get_all_tool_names()`
|
||||
- `get_toolset_for_tool(tool_name)`
|
||||
- `get_available_toolsets()`
|
||||
- `check_toolset_requirements()`
|
||||
- ... and 1 more functions
|
||||
|
||||
**Inferred Side Effects & Invariants:**
|
||||
- Uses global mutable state (risk factor).
|
||||
- Primarily pure Python logic / orchestration.
|
||||
|
||||
### `cli.py`
|
||||
**Lines of Code:** 8280
|
||||
|
||||
**Classes:**
|
||||
- `ChatConsole`
|
||||
- *Rich Console adapter for prompt_toolkit's patch_stdout context.*
|
||||
- `__init__(self)`
|
||||
- `print(self)`
|
||||
- `HermesCLI`
|
||||
- *Interactive CLI for the Hermes Agent.*
|
||||
- `__init__(self, model, toolsets, provider, api_key, base_url, max_turns, verbose, compact, resume, checkpoints, pass_session_id)`
|
||||
- `_invalidate(self, min_interval)`
|
||||
- `_status_bar_context_style(self, percent_used)`
|
||||
- `_build_context_bar(self, percent_used, width)`
|
||||
- `_get_status_bar_snapshot(self)`
|
||||
- ... and 106 more methods
|
||||
|
||||
**Top-Level Functions:**
|
||||
- `_load_prefill_messages(file_path)`
|
||||
- `_parse_reasoning_config(effort)`
|
||||
- `load_cli_config()`
|
||||
- `_run_cleanup()`
|
||||
- `_git_repo_root()`
|
||||
- `_path_is_within_root(path, root)`
|
||||
- `_setup_worktree(repo_root)`
|
||||
- `_cleanup_worktree(info)`
|
||||
- `_prune_stale_worktrees(repo_root, max_age_hours)`
|
||||
- `_accent_hex()`
|
||||
- ... and 9 more functions
|
||||
|
||||
**Inferred Side Effects & Invariants:**
|
||||
- Persists state to SQLite database.
|
||||
- Performs file I/O.
|
||||
- Spawns subprocesses / shell commands.
|
||||
- Uses global mutable state (risk factor).
|
||||
|
||||
### `tools/registry.py`
|
||||
**Lines of Code:** 275
|
||||
|
||||
**Classes:**
|
||||
- `ToolEntry`
|
||||
- *Metadata for a single registered tool.*
|
||||
- `__init__(self, name, toolset, schema, handler, check_fn, requires_env, is_async, description, emoji)`
|
||||
- `ToolRegistry`
|
||||
- *Singleton registry that collects tool schemas + handlers from tool files.*
|
||||
- `__init__(self)`
|
||||
- `register(self, name, toolset, schema, handler, check_fn, requires_env, is_async, description, emoji)`
|
||||
- `deregister(self, name)`
|
||||
- `get_definitions(self, tool_names, quiet)`
|
||||
- `dispatch(self, name, args)`
|
||||
- ... and 10 more methods
|
||||
|
||||
**Inferred Side Effects & Invariants:**
|
||||
- Primarily pure Python logic / orchestration.
|
||||
|
||||
### `gateway/run.py`
|
||||
**Lines of Code:** 6657
|
||||
|
||||
**Classes:**
|
||||
- `GatewayRunner`
|
||||
- *Main gateway controller.*
|
||||
- `__init__(self, config)`
|
||||
- `_has_setup_skill(self)`
|
||||
- `_load_voice_modes(self)`
|
||||
- `_save_voice_modes(self)`
|
||||
- `_set_adapter_auto_tts_disabled(self, adapter, chat_id, disabled)`
|
||||
- ... and 78 more methods
|
||||
|
||||
**Top-Level Functions:**
|
||||
- `_ensure_ssl_certs()`
|
||||
- `_normalize_whatsapp_identifier(value)`
|
||||
- `_expand_whatsapp_auth_aliases(identifier)`
|
||||
- `_resolve_runtime_agent_kwargs()`
|
||||
- `_build_media_placeholder(event)`
|
||||
- `_dequeue_pending_text(adapter, session_key)`
|
||||
- `_check_unavailable_skill(command_name)`
|
||||
- `_platform_config_key(platform)`
|
||||
- `_load_gateway_config()`
|
||||
- `_resolve_gateway_model(config)`
|
||||
- ... and 4 more functions
|
||||
|
||||
**Inferred Side Effects & Invariants:**
|
||||
- Persists state to SQLite database.
|
||||
- Performs file I/O.
|
||||
- Spawns subprocesses / shell commands.
|
||||
- Contains async code paths.
|
||||
- Uses global mutable state (risk factor).
|
||||
|
||||
### `hermes_state.py`
|
||||
**Lines of Code:** 1270
|
||||
|
||||
**Classes:**
|
||||
- `SessionDB`
|
||||
- *SQLite-backed session storage with FTS5 search.*
|
||||
- `__init__(self, db_path)`
|
||||
- `_execute_write(self, fn)`
|
||||
- `_try_wal_checkpoint(self)`
|
||||
- `close(self)`
|
||||
- `_init_schema(self)`
|
||||
- ... and 29 more methods
|
||||
|
||||
**Inferred Side Effects & Invariants:**
|
||||
- Persists state to SQLite database.
|
||||
|
||||
### `agent/context_compressor.py`
|
||||
**Lines of Code:** 676
|
||||
|
||||
**Classes:**
|
||||
- `ContextCompressor`
|
||||
- *Compresses conversation context when approaching the model's context limit.*
|
||||
- `__init__(self, model, threshold_percent, protect_first_n, protect_last_n, summary_target_ratio, quiet_mode, summary_model_override, base_url, api_key, config_context_length, provider)`
|
||||
- `update_from_response(self, usage)`
|
||||
- `should_compress(self, prompt_tokens)`
|
||||
- `should_compress_preflight(self, messages)`
|
||||
- `get_status(self)`
|
||||
- ... and 11 more methods
|
||||
|
||||
**Inferred Side Effects & Invariants:**
|
||||
- Primarily pure Python logic / orchestration.
|
||||
|
||||
### `agent/prompt_caching.py`
|
||||
**Lines of Code:** 72
|
||||
|
||||
**Top-Level Functions:**
|
||||
- `_apply_cache_marker(msg, cache_marker, native_anthropic)`
|
||||
- `apply_anthropic_cache_control(api_messages, cache_ttl, native_anthropic)`
|
||||
|
||||
**Inferred Side Effects & Invariants:**
|
||||
- Primarily pure Python logic / orchestration.
|
||||
|
||||
### `agent/skill_commands.py`
|
||||
**Lines of Code:** 297
|
||||
|
||||
**Top-Level Functions:**
|
||||
- `build_plan_path(user_instruction)`
|
||||
- `_load_skill_payload(skill_identifier, task_id)`
|
||||
- `_build_skill_message(loaded_skill, skill_dir, activation_note, user_instruction, runtime_note)`
|
||||
- `scan_skill_commands()`
|
||||
- `get_skill_commands()`
|
||||
- `build_skill_invocation_message(cmd_key, user_instruction, task_id, runtime_note)`
|
||||
- `build_preloaded_skills_prompt(skill_identifiers, task_id)`
|
||||
|
||||
**Inferred Side Effects & Invariants:**
|
||||
- Uses global mutable state (risk factor).
|
||||
- Primarily pure Python logic / orchestration.
|
||||
|
||||
## Cross-Module Dependencies
|
||||
|
||||
Key data flow:
|
||||
1. `run_agent.py` defines `AIAgent` — the canonical conversation loop.
|
||||
2. `model_tools.py` assembles tool schemas and dispatches function calls.
|
||||
3. `tools/registry.py` maintains the central registry; all tool files import it.
|
||||
4. `gateway/run.py` adapts platform events into `AIAgent.run_conversation()` calls.
|
||||
5. `cli.py` (`HermesCLI`) provides the interactive shell and slash-command routing.
|
||||
|
||||
## Known Coupling Risks
|
||||
|
||||
- `run_agent.py` is ~7k SLOC and contains the core loop, todo/memory interception, context compression, and trajectory saving. High blast radius.
|
||||
- `cli.py` is ~6.5k SLOC and combines UI (Rich/prompt_toolkit), config loading, and command dispatch. Tightly coupled to display state.
|
||||
- `model_tools.py` holds a process-global `_last_resolved_tool_names`. Subagent execution saves/restores this global.
|
||||
- `tools/registry.py` is imported by ALL tool files; schema generation happens at import time.
|
||||
|
||||
## Next Actions (Phase II Prep)
|
||||
|
||||
1. Decompose `AIAgent` into: `ConversationLoop`, `ContextManager`, `ToolDispatcher`, `MemoryInterceptor`.
|
||||
2. Extract CLI display logic from command dispatch.
|
||||
3. Define strict interfaces between gateway → agent → tools.
|
||||
4. Write property-based tests for the conversation loop invariant: *given the same message history and tool results, the agent must produce deterministic tool_call ordering*.
|
||||
|
||||
---
|
||||
Generated: 2026-04-05 by Ezra (Phase I)
|
||||
@@ -1,137 +0,0 @@
|
||||
"""
|
||||
Property-based test stubs for Hermes core invariants.
|
||||
Part of EPIC-999 Phase I — The Mirror.
|
||||
|
||||
These tests define behavioral contracts that ANY rewrite of the runtime
|
||||
must satisfy, including the Hermes Ω target.
|
||||
"""
|
||||
|
||||
import pytest
|
||||
from unittest.mock import Mock, patch
|
||||
|
||||
|
||||
# -----------------------------------------------------------------------------
|
||||
# Conversation Loop Invariants
|
||||
# -----------------------------------------------------------------------------
|
||||
|
||||
class TestConversationLoopInvariants:
|
||||
"""
|
||||
Invariants for AIAgent.run_conversation and its successors.
|
||||
"""
|
||||
|
||||
def test_deterministic_tool_ordering(self):
|
||||
"""
|
||||
Given the same message history and available tools,
|
||||
the agent must produce the same tool_call ordering.
|
||||
|
||||
(If non-determinism is introduced by temperature > 0,
|
||||
this becomes a statistical test.)
|
||||
"""
|
||||
pytest.skip("TODO: implement with seeded mock model responses")
|
||||
|
||||
def test_tool_result_always_appended_to_history(self):
|
||||
"""
|
||||
After any tool_call is executed, its result MUST appear
|
||||
in the conversation history before the next assistant turn.
|
||||
"""
|
||||
pytest.skip("TODO: mock model with forced tool_call and verify history")
|
||||
|
||||
def test_iteration_budget_never_exceeded(self):
|
||||
"""
|
||||
The loop must terminate before api_call_count >= max_iterations
|
||||
AND before iteration_budget.remaining <= 0.
|
||||
"""
|
||||
pytest.skip("TODO: mock model to always return tool_calls; verify termination")
|
||||
|
||||
def test_system_prompt_presence(self):
|
||||
"""
|
||||
Every API call must include a system message as the first message
|
||||
(or system parameter for providers that support it).
|
||||
"""
|
||||
pytest.skip("TODO: intercept all client.chat.completions.create calls")
|
||||
|
||||
def test_compression_preserves_last_n_messages(self):
|
||||
"""
|
||||
After context compression, the final N messages (configurable,
|
||||
default ~4) must remain uncompressed to preserve local context.
|
||||
"""
|
||||
pytest.skip("TODO: create history > threshold, compress, verify tail")
|
||||
|
||||
|
||||
# -----------------------------------------------------------------------------
|
||||
# Tool Registry Invariants
|
||||
# -----------------------------------------------------------------------------
|
||||
|
||||
class TestToolRegistryInvariants:
|
||||
"""
|
||||
Invariants for tools.registry.Registry.
|
||||
"""
|
||||
|
||||
def test_register_then_list_contains_tool(self):
|
||||
"""
|
||||
After register() is called with a valid schema and handler,
|
||||
list_tools() must include the registered name.
|
||||
"""
|
||||
pytest.skip("TODO: instantiate fresh Registry, register, assert membership")
|
||||
|
||||
def test_dispatch_unknown_tool_returns_error_json(self):
|
||||
"""
|
||||
Calling dispatch() with an unregistered tool name must return
|
||||
a JSON string containing an error key, never raise raw.
|
||||
"""
|
||||
pytest.skip("TODO: call dispatch with 'nonexistent_tool', parse result")
|
||||
|
||||
def test_handler_receives_task_id_kwarg(self):
|
||||
"""
|
||||
Registered handlers that accept **kwargs must receive task_id
|
||||
when dispatch is called with one.
|
||||
"""
|
||||
pytest.skip("TODO: register mock handler, dispatch with task_id, verify")
|
||||
|
||||
|
||||
# -----------------------------------------------------------------------------
|
||||
# State Persistence Invariants
|
||||
# -----------------------------------------------------------------------------
|
||||
|
||||
class TestStatePersistenceInvariants:
|
||||
"""
|
||||
Invariants for hermes_state.SessionDB.
|
||||
"""
|
||||
|
||||
def test_saved_message_is_retrievable_by_session_id(self):
|
||||
"""
|
||||
After save_message(session_id, ...), get_messages(session_id)
|
||||
must return the message.
|
||||
"""
|
||||
pytest.skip("TODO: use temp SQLite DB, save, query, assert")
|
||||
|
||||
def test_fts_search_returns_relevant_messages(self):
|
||||
"""
|
||||
After indexing messages, FTS search for a unique keyword
|
||||
must return the message containing it.
|
||||
"""
|
||||
pytest.skip("TODO: seed DB with messages, search unique token")
|
||||
|
||||
|
||||
# -----------------------------------------------------------------------------
|
||||
# Context Compressor Invariants
|
||||
# -----------------------------------------------------------------------------
|
||||
|
||||
class TestContextCompressorInvariants:
|
||||
"""
|
||||
Invariants for agent.context_compressor.ContextCompressor.
|
||||
"""
|
||||
|
||||
def test_compression_reduces_token_count(self):
|
||||
"""
|
||||
compress_messages(output) must have fewer tokens than
|
||||
the uncompressed input (for any input > threshold).
|
||||
"""
|
||||
pytest.skip("TODO: mock tokenizer, provide long history, assert reduction")
|
||||
|
||||
def test_compression_never_drops_system_message(self):
|
||||
"""
|
||||
The system message must survive compression and remain
|
||||
at index 0 of the returned message list.
|
||||
"""
|
||||
pytest.skip("TODO: compress history with system msg, verify position")
|
||||
541
scripts/test_config_validation.py
Normal file
541
scripts/test_config_validation.py
Normal file
@@ -0,0 +1,541 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Comprehensive config structure validation test script for Issue #116.
|
||||
|
||||
Tests the validate_config_structure() function from hermes_cli.config
|
||||
across four scenarios:
|
||||
1. Valid config passes without issues
|
||||
2. YAML syntax errors are caught
|
||||
3. Type mismatches are detected
|
||||
4. Completely broken YAML is handled gracefully
|
||||
|
||||
Usage:
|
||||
python scripts/test_config_validation.py
|
||||
python -m pytest scripts/test_config_validation.py -v
|
||||
"""
|
||||
|
||||
import os
|
||||
import subprocess
|
||||
import sys
|
||||
import tempfile
|
||||
from pathlib import Path
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
PASS = "\033[32mPASS\033[0m"
|
||||
FAIL = "\033[31mFAIL\033[0m"
|
||||
|
||||
|
||||
def _hermes_agent_root() -> Path:
|
||||
"""Return the hermes-agent project root."""
|
||||
return Path(__file__).resolve().parent.parent
|
||||
|
||||
|
||||
def _run_in_project(cmd: list[str], extra_env: dict[str, str] | None = None, **kwargs) -> subprocess.CompletedProcess:
|
||||
"""Run a command with the project root on sys.path."""
|
||||
env = os.environ.copy()
|
||||
root = str(_hermes_agent_root())
|
||||
env["PYTHONPATH"] = root
|
||||
if extra_env:
|
||||
env.update(extra_env)
|
||||
return subprocess.run(cmd, capture_output=True, text=True, env=env, **kwargs)
|
||||
|
||||
|
||||
def _write_and_load_yaml(yaml_content: str):
|
||||
"""Write yaml_content to a temp file, set HERMES_HOME to point at it,
|
||||
then run validate_config_structure() in a subprocess and return (rc, stdout, stderr).
|
||||
"""
|
||||
home = tempfile.mkdtemp(prefix="hermes_test_")
|
||||
cfg_path = Path(home) / "config.yaml"
|
||||
cfg_path.write_text(yaml_content, encoding="utf-8")
|
||||
|
||||
# We use a small inline Python script that loads the validator and
|
||||
# exercises it with the given HERMES_HOME.
|
||||
py_code = """
|
||||
import os, sys, json
|
||||
root = sys.argv[1]
|
||||
sys.path.insert(0, root)
|
||||
|
||||
from hermes_cli.config import validate_config_structure, ConfigIssue
|
||||
|
||||
try:
|
||||
issues = validate_config_structure()
|
||||
out = [{"severity": i.severity, "message": i.message, "hint": i.hint} for i in issues]
|
||||
print(json.dumps({"status": "ok", "issues": out}))
|
||||
except yaml.YAMLError as e:
|
||||
print(json.dumps({"status": "yaml_error", "detail": str(e)}))
|
||||
except Exception as e:
|
||||
print(json.dumps({"status": "error", "detail": str(e)}))
|
||||
""".strip()
|
||||
|
||||
result = _run_in_project(
|
||||
[sys.executable, "-c", py_code, str(_hermes_agent_root())],
|
||||
extra_env={"HERMES_HOME": home},
|
||||
)
|
||||
return result
|
||||
|
||||
|
||||
def _call_validate(config: dict):
|
||||
"""Call validate_config_structure(config) directly in a subprocess and
|
||||
return a dict: {"status": "ok", "issues": [...]}.
|
||||
"""
|
||||
import json
|
||||
|
||||
py_code = """
|
||||
import os, sys, json
|
||||
root = sys.argv[1]
|
||||
config_str = sys.argv[2]
|
||||
sys.path.insert(0, root)
|
||||
|
||||
from hermes_cli.config import validate_config_structure
|
||||
|
||||
config = json.loads(config_str)
|
||||
issues = validate_config_structure(config)
|
||||
out = [{"severity": i.severity, "message": i.message, "hint": i.hint} for i in issues]
|
||||
print(json.dumps({"status": "ok", "issues": out}))
|
||||
""".strip()
|
||||
|
||||
result = _run_in_project(
|
||||
[sys.executable, "-c", py_code, str(_hermes_agent_root()), json.dumps(config)],
|
||||
)
|
||||
assert result.returncode == 0, f"Subprocess failed:\nstdout={result.stdout}\nstderr={result.stderr}"
|
||||
return json.loads(result.stdout.strip().splitlines()[-1])
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Test harness
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class TestResult:
|
||||
def __init__(self):
|
||||
self.passed = 0
|
||||
self.failed = 0
|
||||
self.results: list[tuple[str, bool, str]] = []
|
||||
|
||||
def record(self, name: str, ok: bool, detail: str = "") -> None:
|
||||
if ok:
|
||||
self.passed += 1
|
||||
self.results.append((name, True, detail))
|
||||
else:
|
||||
self.failed += 1
|
||||
self.results.append((name, False, detail))
|
||||
marker = PASS if ok else FAIL
|
||||
print(f" [{marker}] {name}" + (f" — {detail}" if detail and not ok else ""))
|
||||
|
||||
def summary(self) -> bool:
|
||||
total = self.passed + self.failed
|
||||
print(f"\n{'='*60}")
|
||||
print(f" Results: {self.passed}/{total} passed, {self.failed} failed")
|
||||
print(f"{'='*60}")
|
||||
if self.failed:
|
||||
print("\n Failed tests:")
|
||||
for name, ok, detail in self.results:
|
||||
if not ok:
|
||||
print(f" - {name}: {detail}")
|
||||
return self.failed == 0
|
||||
|
||||
|
||||
t = TestResult()
|
||||
|
||||
|
||||
# ===================================================================
|
||||
# 1. Valid config passes
|
||||
# ===================================================================
|
||||
|
||||
def test_valid_empty_dict():
|
||||
issues = _call_validate({})
|
||||
# Empty dict — no custom_providers, no fallback_model, so no issues expected
|
||||
t.record("valid: empty config dict", len(issues["issues"]) == 0)
|
||||
|
||||
|
||||
def test_valid_custom_providers_list():
|
||||
issues = _call_validate({
|
||||
"custom_providers": [
|
||||
{"name": "my-provider", "base_url": "https://api.example.com/v1"},
|
||||
],
|
||||
"model": {"provider": "custom", "default": "test"},
|
||||
})
|
||||
t.record("valid: custom_providers as proper list", len(issues["issues"]) == 0)
|
||||
|
||||
|
||||
def test_valid_fallback_model():
|
||||
issues = _call_validate({
|
||||
"fallback_model": {
|
||||
"provider": "openrouter",
|
||||
"model": "anthropic/claude-sonnet-4",
|
||||
},
|
||||
})
|
||||
fb_relevant = [i for i in issues["issues"] if "fallback" in i["message"].lower()]
|
||||
t.record("valid: fallback_model with provider+model", len(fb_relevant) == 0)
|
||||
|
||||
|
||||
def test_valid_empty_fallback():
|
||||
issues = _call_validate({"fallback_model": {}})
|
||||
fb_relevant = [i for i in issues["issues"] if "fallback" in i["message"].lower()]
|
||||
t.record("valid: empty fallback_model is fine", len(fb_relevant) == 0)
|
||||
|
||||
|
||||
def test_valid_fullish_config():
|
||||
issues = _call_validate({
|
||||
"model": {"provider": "openrouter", "default": "anthropic/claude-sonnet-4"},
|
||||
"providers": {},
|
||||
"fallback_providers": [],
|
||||
"toolsets": ["hermes-cli"],
|
||||
"custom_providers": [
|
||||
{"name": "gemini", "base_url": "https://generativelanguage.googleapis.com/v1beta/openai"},
|
||||
],
|
||||
})
|
||||
t.record("valid: full config with all sections", len(issues["issues"]) == 0)
|
||||
|
||||
|
||||
# ===================================================================
|
||||
# 2. YAML syntax errors caught
|
||||
# ===================================================================
|
||||
|
||||
def test_yaml_syntax_bad_indent():
|
||||
"""YAML with content that pyyaml cannot parse (mismatched indentation with
|
||||
an unexpected block mapping context)."""
|
||||
# Use a clearly broken structure: unquoted colon in a flow context
|
||||
broken = "model:\n provider: openrouter\n- list_item: at_wrong_level\n"
|
||||
result = _write_and_load_yaml(broken)
|
||||
out = result.stdout.strip().splitlines()[-1] if result.stdout.strip() else ""
|
||||
import json
|
||||
try:
|
||||
data = json.loads(out)
|
||||
# Should handle gracefully — either yaml_error or ok (pyyaml may accept
|
||||
# some "broken-looking" YAML by merging). The key is no crash.
|
||||
ok = data.get("status") in ("ok", "yaml_error", "error")
|
||||
t.record("yaml syntax: bad indentation handled gracefully", ok,
|
||||
f"got status={data.get('status')}")
|
||||
except json.JSONDecodeError:
|
||||
t.record("yaml syntax: bad indentation handled gracefully", False, "could not parse output")
|
||||
|
||||
|
||||
def test_yaml_syntax_duplicate_key():
|
||||
"""YAML with duplicate keys that confuse the parser."""
|
||||
result = _write_and_load_yaml("model: openrouter\nmodel: anthropic\n")
|
||||
# yaml.safe_load accepts duplicate keys silently (last wins), so
|
||||
# validate_config_structure should still process it without crash.
|
||||
out = result.stdout.strip().splitlines()[-1] if result.stdout.strip() else ""
|
||||
import json
|
||||
try:
|
||||
data = json.loads(out)
|
||||
# Should complete without crashing
|
||||
ok = data.get("status") == "ok"
|
||||
t.record("yaml syntax: duplicate keys handled", ok,
|
||||
f"unexpected status: {data.get('status')}")
|
||||
except json.JSONDecodeError:
|
||||
t.record("yaml syntax: duplicate keys handled", False, "could not parse output")
|
||||
|
||||
|
||||
def test_yaml_syntax_trailing_colon():
|
||||
"""YAML with a trailing colon that creates an unexpected mapping."""
|
||||
bad_yaml = """
|
||||
custom_providers:
|
||||
name: test
|
||||
base_url: https://example.com
|
||||
invalid_key:: some_value
|
||||
"""
|
||||
result = _write_and_load_yaml(bad_yaml)
|
||||
out = result.stdout.strip().splitlines()[-1] if result.stdout.strip() else ""
|
||||
import json
|
||||
try:
|
||||
data = json.loads(out)
|
||||
# Either yaml_error for parse failure, or ok with detection
|
||||
ok = data.get("status") in ("ok", "yaml_error")
|
||||
t.record("yaml syntax: trailing colon handled gracefully", ok,
|
||||
f"got status={data.get('status')}")
|
||||
except json.JSONDecodeError:
|
||||
t.record("yaml syntax: trailing colon handled gracefully", False, "could not parse output")
|
||||
|
||||
|
||||
# ===================================================================
|
||||
# 3. Type mismatches detected
|
||||
# ===================================================================
|
||||
|
||||
def test_custom_providers_dict_instead_of_list():
|
||||
"""The classic Discord-user error: custom_providers as flat dict."""
|
||||
issues = _call_validate({
|
||||
"custom_providers": {
|
||||
"name": "Generativelanguage.googleapis.com",
|
||||
"base_url": "https://generativelanguage.googleapis.com/v1beta/openai",
|
||||
"api_key": "***",
|
||||
},
|
||||
})
|
||||
errors = [i for i in issues["issues"] if i["severity"] == "error"]
|
||||
ok = any("dict" in i["message"].lower() and "list" in i["message"].lower() for i in errors)
|
||||
t.record("type mismatch: custom_providers as dict instead of list", ok)
|
||||
|
||||
|
||||
def test_custom_providers_string_instead_of_list():
|
||||
issues = _call_validate({
|
||||
"custom_providers": "just a string",
|
||||
})
|
||||
# A string is not a dict or list, so no custom_providers-specific
|
||||
# errors fire, but the fact that we don't crash is the test.
|
||||
ok = True # Should complete without crash
|
||||
t.record("type mismatch: custom_providers as string (no crash)", ok)
|
||||
|
||||
|
||||
def test_custom_providers_list_of_strings():
|
||||
issues = _call_validate({
|
||||
"custom_providers": ["not-a-dict", "also-not-a-dict"],
|
||||
"model": {"provider": "custom"},
|
||||
})
|
||||
warnings = [i for i in issues["issues"] if i["severity"] == "warning"]
|
||||
ok = any("not a dict" in i["message"] for i in warnings)
|
||||
t.record("type mismatch: custom_providers list of strings detected", ok)
|
||||
|
||||
|
||||
def test_fallback_model_string_instead_of_dict():
|
||||
issues = _call_validate({
|
||||
"fallback_model": "openrouter:anthropic/claude-sonnet-4",
|
||||
})
|
||||
errors = [i for i in issues["issues"] if i["severity"] == "error"]
|
||||
ok = any("should be a dict" in i["message"] for i in errors)
|
||||
t.record("type mismatch: fallback_model as string instead of dict", ok)
|
||||
|
||||
|
||||
def test_fallback_model_list_instead_of_dict():
|
||||
issues = _call_validate({
|
||||
"fallback_model": ["openrouter", "claude-sonnet-4"],
|
||||
})
|
||||
errors = [i for i in issues["issues"] if i["severity"] == "error"]
|
||||
ok = any("should be a dict" in i["message"] for i in errors)
|
||||
t.record("type mismatch: fallback_model as list instead of dict", ok)
|
||||
|
||||
|
||||
def test_fallback_model_number_instead_of_dict():
|
||||
issues = _call_validate({"fallback_model": 42})
|
||||
errors = [i for i in issues["issues"] if i["severity"] == "error"]
|
||||
ok = any("should be a dict" in i["message"] for i in errors)
|
||||
t.record("type mismatch: fallback_model as int instead of dict", ok)
|
||||
|
||||
|
||||
def test_custom_providers_missing_name():
|
||||
issues = _call_validate({
|
||||
"custom_providers": [{"base_url": "https://example.com/v1"}],
|
||||
"model": {"provider": "custom"},
|
||||
})
|
||||
ok = any("missing 'name'" in i["message"] for i in issues["issues"])
|
||||
t.record("type mismatch: custom_providers entry missing 'name'", ok)
|
||||
|
||||
|
||||
def test_custom_providers_missing_base_url():
|
||||
issues = _call_validate({
|
||||
"custom_providers": [{"name": "test"}],
|
||||
"model": {"provider": "custom"},
|
||||
})
|
||||
ok = any("missing 'base_url'" in i["message"] for i in issues["issues"])
|
||||
t.record("type mismatch: custom_providers entry missing 'base_url'", ok)
|
||||
|
||||
|
||||
def test_custom_providers_missing_model_section():
|
||||
issues = _call_validate({
|
||||
"custom_providers": [{"name": "test", "base_url": "https://example.com/v1"}],
|
||||
})
|
||||
ok = any("no 'model' section" in i["message"] for i in issues["issues"])
|
||||
t.record("type mismatch: custom_providers without model section", ok)
|
||||
|
||||
|
||||
def test_nested_fallback_inside_custom_providers():
|
||||
issues = _call_validate({
|
||||
"custom_providers": {
|
||||
"name": "test",
|
||||
"fallback_model": {"provider": "openrouter", "model": "test"},
|
||||
},
|
||||
})
|
||||
errors = [i for i in issues["issues"] if i["severity"] == "error"]
|
||||
ok = any("fallback_model" in i["message"] and "inside" in i["message"] for i in errors)
|
||||
t.record("type mismatch: fallback_model nested inside custom_providers dict", ok)
|
||||
|
||||
|
||||
# ===================================================================
|
||||
# 4. Completely broken YAML handled gracefully
|
||||
# ===================================================================
|
||||
|
||||
def test_completely_broken_yaml_binary_content():
|
||||
"""Binary-ish content that YAML cannot parse."""
|
||||
broken = "key: \x00\x01\x02\x03 invalid binary stuff: \xff\xfe"
|
||||
result = _write_and_load_yaml(broken)
|
||||
out = result.stdout.strip().splitlines()[-1] if result.stdout.strip() else ""
|
||||
import json
|
||||
try:
|
||||
data = json.loads(out)
|
||||
# Any status including yaml_error / error is acceptable — no traceback
|
||||
ok = True
|
||||
t.record("broken yaml: binary content handled gracefully", ok)
|
||||
except json.JSONDecodeError:
|
||||
t.record("broken yaml: binary content handled gracefully", False,
|
||||
"subprocess returned non-JSON output (possible crash)")
|
||||
|
||||
|
||||
def test_completely_broken_yaml_random_chars():
|
||||
"""Random garbage that is definitely not valid YAML."""
|
||||
broken = "{{{{{}}}}} {{{{not_yaml: [}}}}\n!invalid-tag!!! @@###$$$\n"
|
||||
result = _write_and_load_yaml(broken)
|
||||
out = result.stdout.strip().splitlines()[-1] if result.stdout.strip() else ""
|
||||
import json
|
||||
try:
|
||||
data = json.loads(out)
|
||||
# Should either be yaml_error status, or ok with zero/many issues
|
||||
ok = True # The fact we got back JSON means we didn't crash
|
||||
t.record("broken yaml: random garbage handled gracefully", ok)
|
||||
except json.JSONDecodeError:
|
||||
t.record("broken yaml: random garbage handled gracefully", False,
|
||||
"subprocess returned non-JSON output (possible crash)")
|
||||
|
||||
|
||||
def test_completely_broken_yaml_nested_braces():
|
||||
"""Deeply-nested braces that break YAML parsing."""
|
||||
broken = "a: {{{{{}}}}}\n b: {{{{{}}}}}\n c: {{{{{}}}}}\n"
|
||||
result = _write_and_load_yaml(broken)
|
||||
out = result.stdout.strip().splitlines()[-1] if result.stdout.strip() else ""
|
||||
import json
|
||||
try:
|
||||
data = json.loads(out)
|
||||
t.record("broken yaml: nested braces handled gracefully", True)
|
||||
except json.JSONDecodeError:
|
||||
t.record("broken yaml: nested braces handled gracefully", False,
|
||||
"subprocess returned non-JSON output")
|
||||
|
||||
|
||||
def test_empty_yaml_file():
|
||||
"""Empty config file — should load and produce no issues."""
|
||||
result = _write_and_load_yaml("")
|
||||
out = result.stdout.strip().splitlines()[-1] if result.stdout.strip() else ""
|
||||
import json
|
||||
try:
|
||||
data = json.loads(out)
|
||||
ok = data.get("status") == "ok" and len(data.get("issues", [])) == 0
|
||||
t.record("broken yaml: empty file handled gracefully (no issues)", ok,
|
||||
f"got status={data.get('status')}")
|
||||
except json.JSONDecodeError:
|
||||
t.record("broken yaml: empty file handled gracefully", False,
|
||||
"subprocess returned non-JSON output")
|
||||
|
||||
|
||||
def test_yaml_with_only_null():
|
||||
"""YAML file containing only '~' or 'null' should produce empty dict."""
|
||||
result = _write_and_load_yaml("~\n")
|
||||
out = result.stdout.strip().splitlines()[-1] if result.stdout.strip() else ""
|
||||
import json
|
||||
try:
|
||||
data = json.loads(out)
|
||||
ok = data.get("status") == "ok"
|
||||
t.record("broken yaml: null-only YAML handled gracefully", ok,
|
||||
f"got status={data.get('status')}")
|
||||
except json.JSONDecodeError:
|
||||
t.record("broken yaml: null-only YAML handled gracefully", False,
|
||||
"subprocess returned non-JSON output")
|
||||
|
||||
|
||||
# ===================================================================
|
||||
# Print config warnings test
|
||||
# ===================================================================
|
||||
|
||||
def test_print_config_warnings_output():
|
||||
"""Ensure print_config_warnings prints warnings when issues exist."""
|
||||
import json
|
||||
|
||||
py_code = """
|
||||
import os, sys, json
|
||||
root = sys.argv[1]
|
||||
sys.path.insert(0, root)
|
||||
|
||||
from hermes_cli.config import print_config_warnings
|
||||
|
||||
# This config should produce warnings
|
||||
config = {
|
||||
"custom_providers": {
|
||||
"name": "test",
|
||||
"base_url": "https://example.com",
|
||||
},
|
||||
}
|
||||
print_config_warnings(config)
|
||||
""".strip()
|
||||
|
||||
result = _run_in_project(
|
||||
[sys.executable, "-c", py_code, str(_hermes_agent_root())],
|
||||
)
|
||||
ok = "config" in result.stderr.lower() or returncode_ok(result.returncode)
|
||||
t.record("print_config_warnings: outputs warnings to stderr for bad config", ok,
|
||||
f"stderr={result.stderr[:200]}")
|
||||
|
||||
|
||||
# ===================================================================
|
||||
# Root-level misplaced keys test
|
||||
# ===================================================================
|
||||
|
||||
def test_misplaced_root_level_key():
|
||||
"""A root-level "base_url" that should be inside model/custom_providers."""
|
||||
issues = _call_validate({
|
||||
"base_url": "https://api.example.com/v1",
|
||||
"model": {"provider": "openrouter"},
|
||||
})
|
||||
warnings = [i for i in issues["issues"] if i["severity"] == "warning"]
|
||||
ok = any("misplaced" in i["message"].lower() for i in warnings)
|
||||
t.record("misplaced root key: base_url flagged", ok)
|
||||
|
||||
|
||||
def test_returncode_ok(code: int) -> bool:
|
||||
return code == 0
|
||||
|
||||
|
||||
# ===================================================================
|
||||
# Main
|
||||
# ===================================================================
|
||||
|
||||
if __name__ == "__main__":
|
||||
# Ensure project root is on sys.path for import in the _call_validate/
|
||||
# _write_and_load_yaml subprocesses
|
||||
sys.path.insert(0, str(_hermes_agent_root()))
|
||||
|
||||
print(f"\n{'='*60}")
|
||||
print(" Config Structure Validation Tests (Issue #116)")
|
||||
print(f"{'='*60}\n")
|
||||
|
||||
# 1. Valid config passes
|
||||
print("--- 1. Valid config passes ---")
|
||||
test_valid_empty_dict()
|
||||
test_valid_custom_providers_list()
|
||||
test_valid_fallback_model()
|
||||
test_valid_empty_fallback()
|
||||
test_valid_fullish_config()
|
||||
|
||||
# 2. YAML syntax errors caught
|
||||
print("\n--- 2. YAML syntax errors caught ---")
|
||||
test_yaml_syntax_bad_indent()
|
||||
test_yaml_syntax_duplicate_key()
|
||||
test_yaml_syntax_trailing_colon()
|
||||
|
||||
# 3. Type mismatches detected
|
||||
print("\n--- 3. Type mismatches detected ---")
|
||||
test_custom_providers_dict_instead_of_list()
|
||||
test_custom_providers_string_instead_of_list()
|
||||
test_custom_providers_list_of_strings()
|
||||
test_fallback_model_string_instead_of_dict()
|
||||
test_fallback_model_list_instead_of_dict()
|
||||
test_fallback_model_number_instead_of_dict()
|
||||
test_custom_providers_missing_name()
|
||||
test_custom_providers_missing_base_url()
|
||||
test_custom_providers_missing_model_section()
|
||||
test_nested_fallback_inside_custom_providers()
|
||||
test_misplaced_root_level_key()
|
||||
|
||||
# 4. Completely broken YAML handled gracefully
|
||||
print("\n--- 4. Completely broken YAML handled gracefully ---")
|
||||
test_completely_broken_yaml_binary_content()
|
||||
test_completely_broken_yaml_random_chars()
|
||||
test_completely_broken_yaml_nested_braces()
|
||||
test_empty_yaml_file()
|
||||
test_yaml_with_only_null()
|
||||
|
||||
# 5. Print config warnings
|
||||
print("\n--- 5. Print config warnings ---")
|
||||
test_print_config_warnings_output()
|
||||
|
||||
ok = t.summary()
|
||||
sys.exit(0 if ok else 1)
|
||||
Reference in New Issue
Block a user