Compare commits

..

1 Commits

Author SHA1 Message Date
Alexander Whitestone
3915c5e32b test: add comprehensive config structure validation tests (#116)
Some checks failed
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Docs Site Checks / docs-site-checks (pull_request) Failing after 3s
Nix / nix (ubuntu-latest) (pull_request) Failing after 1s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Failing after 2s
Tests / test (pull_request) Failing after 2s
Tests / e2e (pull_request) Failing after 3s
Nix / nix (macos-latest) (pull_request) Has been cancelled
Verify config structure validation at startup covers:
- Valid config passes without issues
- YAML syntax errors caught gracefully
- Type mismatches detected (dict vs list, string vs dict, etc.)
- Completely broken YAML handled without crashes

25 tests covering validate_config_structure(), print_config_warnings(),
and edge cases for malformed configs.
2026-04-06 10:23:57 -04:00
8 changed files with 541 additions and 52159 deletions

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -1,74 +0,0 @@
# AIAgent Decomposition Plan (EPIC-999 Phase II Prep)
## Current State
`run_agent.py` contains `AIAgent` — a ~7,000-SLOC class that is the highest-blast-radius module in Hermes.
## Goal
Decompose `AIAgent` into 5 focused classes with strict interfaces, enabling:
- Parallel rewrites by competing sub-agents (Phase II)
- Independent testing of loop semantics vs. model I/O vs. memory
- Future runtime replacement (Hermes Ω) without touching tool infrastructure
## Proposed Decomposition
### 1. `ConversationLoop`
**Responsibility:** Own the `while` loop invariant, iteration budget, and termination conditions.
**Interface:**
```python
class ConversationLoop:
def run(self, messages: list, tools: list, client) -> dict:
...
```
**Invariant:** Must terminate before `max_iterations` and `iteration_budget.remaining <= 0`.
### 2. `ModelDispatcher`
**Responsibility:** All interaction with `client.chat.completions.create`, including streaming, fallback activation, and response normalization.
**Interface:**
```python
class ModelDispatcher:
def call(self, model: str, messages: list, tools: list, **kwargs) -> ModelResponse:
...
```
**Invariant:** Must always return a normalized object with `.content`, `.tool_calls`, `.reasoning`.
### 3. `ToolExecutor`
**Responsibility:** Execute tool calls (sequential or concurrent), handle errors, and format results.
**Interface:**
```python
class ToolExecutor:
def execute(self, tool_calls: list, task_id: str = None) -> list[ToolResult]:
...
```
**Invariant:** Every tool_call produces exactly one ToolResult, and errors are JSON-serializable.
### 4. `MemoryInterceptor`
**Responsibility:** Intercept `memory` and `todo` tool calls before they reach the registry, plus flush memories on session end.
**Interface:**
```python
class MemoryInterceptor:
def intercept(self, tool_name: str, args: dict, task_id: str = None) -> str | None:
... # returns result if intercepted, None if pass-through
```
**Invariant:** Must not mutate agent state except through explicit `flush()` calls.
### 5. `PromptBuilder`
**Responsibility:** Assemble system prompt, inject skills, apply context compression, and manage prompt caching markers.
**Interface:**
```python
class PromptBuilder:
def build(self, user_message: str, conversation_history: list) -> list:
...
```
**Invariant:** Output list must start with a system message (or equivalent provider parameter).
## Migration Path
1. Create the 5 classes as thin facades that delegate back to `AIAgent` methods.
2. Move logic incrementally from `AIAgent` into the new classes.
3. Once `AIAgent` is a pure coordinator (~500 SLOC), freeze the interface.
4. Phase II competing agents rewrite one class at a time.
## Acceptance Criteria
- [ ] `AIAgent` reduced to < 1,000 SLOC
- [ ] Each new class has > 80% test coverage
- [ ] Full existing test suite still passes
- [ ] No behavioral regressions in shadow mode

View File

@@ -1,263 +0,0 @@
# Hermes Ω Specification Draft (Ouroboros Phase I)
> Auto-generated by Ezra as part of EPIC-999. This document is a living artifact.
## Scope
This specification covers the core runtime of Hermes agent v0.7.x as found in the `hermes-agent` codebase.
## High-Level Architecture
```
User Message
Gateway (gateway/run.py) — platform adapter (Telegram, Discord, CLI, etc.)
HermesCLI (cli.py) or AIAgent.chat() (run_agent.py)
ModelTools (model_tools.py) — tool discovery, schema assembly, dispatch
Tool Registry (tools/registry.py) — handler lookup, availability checks
Individual Tool Implementations (tools/*.py)
Results returned up the stack
```
## Module Specifications
### `run_agent.py`
**Lines of Code:** 8948
**Classes:**
- `_SafeWriter`
- *Transparent stdio wrapper that catches OSError/ValueError from broken pipes.*
- `__init__(self, inner)`
- `write(self, data)`
- `flush(self)`
- `fileno(self)`
- `isatty(self)`
- ... and 1 more methods
- `IterationBudget`
- *Thread-safe iteration counter for an agent.*
- `__init__(self, max_total)`
- `consume(self)`
- `refund(self)`
- `used(self)`
- `remaining(self)`
- `AIAgent`
- *AI Agent with tool calling capabilities.*
- `base_url(self)`
- `base_url(self, value)`
- `__init__(self, base_url, api_key, provider, api_mode, acp_command, acp_args, command, args, model, max_iterations, tool_delay, enabled_toolsets, disabled_toolsets, save_trajectories, verbose_logging, quiet_mode, ephemeral_system_prompt, log_prefix_chars, log_prefix, providers_allowed, providers_ignored, providers_order, provider_sort, provider_require_parameters, provider_data_collection, session_id, tool_progress_callback, tool_start_callback, tool_complete_callback, thinking_callback, reasoning_callback, clarify_callback, step_callback, stream_delta_callback, tool_gen_callback, status_callback, max_tokens, reasoning_config, prefill_messages, platform, skip_context_files, skip_memory, session_db, iteration_budget, fallback_model, credential_pool, checkpoints_enabled, checkpoint_max_snapshots, pass_session_id, persist_session)`
- `reset_session_state(self)`
- `_safe_print(self)`
- ... and 100 more methods
**Top-Level Functions:**
- `_install_safe_stdio()`
- `_is_destructive_command(cmd)`
- `_should_parallelize_tool_batch(tool_calls)`
- `_extract_parallel_scope_path(tool_name, function_args)`
- `_paths_overlap(left, right)`
- `_sanitize_surrogates(text)`
- `_sanitize_messages_surrogates(messages)`
- `_strip_budget_warnings_from_history(messages)`
- `main(query, model, api_key, base_url, max_turns, enabled_toolsets, disabled_toolsets, list_tools, save_trajectories, save_sample, verbose, log_prefix_chars)`
**Inferred Side Effects & Invariants:**
- Persists state to SQLite database.
- Performs file I/O.
- Makes HTTP network calls.
- Uses global mutable state (risk factor).
### `model_tools.py`
**Lines of Code:** 466
**Top-Level Functions:**
- `_get_tool_loop()`
- `_get_worker_loop()`
- `_run_async(coro)`
- `_discover_tools()`
- `get_tool_definitions(enabled_toolsets, disabled_toolsets, quiet_mode)`
- `handle_function_call(function_name, function_args, task_id, user_task, enabled_tools)`
- `get_all_tool_names()`
- `get_toolset_for_tool(tool_name)`
- `get_available_toolsets()`
- `check_toolset_requirements()`
- ... and 1 more functions
**Inferred Side Effects & Invariants:**
- Uses global mutable state (risk factor).
- Primarily pure Python logic / orchestration.
### `cli.py`
**Lines of Code:** 8280
**Classes:**
- `ChatConsole`
- *Rich Console adapter for prompt_toolkit's patch_stdout context.*
- `__init__(self)`
- `print(self)`
- `HermesCLI`
- *Interactive CLI for the Hermes Agent.*
- `__init__(self, model, toolsets, provider, api_key, base_url, max_turns, verbose, compact, resume, checkpoints, pass_session_id)`
- `_invalidate(self, min_interval)`
- `_status_bar_context_style(self, percent_used)`
- `_build_context_bar(self, percent_used, width)`
- `_get_status_bar_snapshot(self)`
- ... and 106 more methods
**Top-Level Functions:**
- `_load_prefill_messages(file_path)`
- `_parse_reasoning_config(effort)`
- `load_cli_config()`
- `_run_cleanup()`
- `_git_repo_root()`
- `_path_is_within_root(path, root)`
- `_setup_worktree(repo_root)`
- `_cleanup_worktree(info)`
- `_prune_stale_worktrees(repo_root, max_age_hours)`
- `_accent_hex()`
- ... and 9 more functions
**Inferred Side Effects & Invariants:**
- Persists state to SQLite database.
- Performs file I/O.
- Spawns subprocesses / shell commands.
- Uses global mutable state (risk factor).
### `tools/registry.py`
**Lines of Code:** 275
**Classes:**
- `ToolEntry`
- *Metadata for a single registered tool.*
- `__init__(self, name, toolset, schema, handler, check_fn, requires_env, is_async, description, emoji)`
- `ToolRegistry`
- *Singleton registry that collects tool schemas + handlers from tool files.*
- `__init__(self)`
- `register(self, name, toolset, schema, handler, check_fn, requires_env, is_async, description, emoji)`
- `deregister(self, name)`
- `get_definitions(self, tool_names, quiet)`
- `dispatch(self, name, args)`
- ... and 10 more methods
**Inferred Side Effects & Invariants:**
- Primarily pure Python logic / orchestration.
### `gateway/run.py`
**Lines of Code:** 6657
**Classes:**
- `GatewayRunner`
- *Main gateway controller.*
- `__init__(self, config)`
- `_has_setup_skill(self)`
- `_load_voice_modes(self)`
- `_save_voice_modes(self)`
- `_set_adapter_auto_tts_disabled(self, adapter, chat_id, disabled)`
- ... and 78 more methods
**Top-Level Functions:**
- `_ensure_ssl_certs()`
- `_normalize_whatsapp_identifier(value)`
- `_expand_whatsapp_auth_aliases(identifier)`
- `_resolve_runtime_agent_kwargs()`
- `_build_media_placeholder(event)`
- `_dequeue_pending_text(adapter, session_key)`
- `_check_unavailable_skill(command_name)`
- `_platform_config_key(platform)`
- `_load_gateway_config()`
- `_resolve_gateway_model(config)`
- ... and 4 more functions
**Inferred Side Effects & Invariants:**
- Persists state to SQLite database.
- Performs file I/O.
- Spawns subprocesses / shell commands.
- Contains async code paths.
- Uses global mutable state (risk factor).
### `hermes_state.py`
**Lines of Code:** 1270
**Classes:**
- `SessionDB`
- *SQLite-backed session storage with FTS5 search.*
- `__init__(self, db_path)`
- `_execute_write(self, fn)`
- `_try_wal_checkpoint(self)`
- `close(self)`
- `_init_schema(self)`
- ... and 29 more methods
**Inferred Side Effects & Invariants:**
- Persists state to SQLite database.
### `agent/context_compressor.py`
**Lines of Code:** 676
**Classes:**
- `ContextCompressor`
- *Compresses conversation context when approaching the model's context limit.*
- `__init__(self, model, threshold_percent, protect_first_n, protect_last_n, summary_target_ratio, quiet_mode, summary_model_override, base_url, api_key, config_context_length, provider)`
- `update_from_response(self, usage)`
- `should_compress(self, prompt_tokens)`
- `should_compress_preflight(self, messages)`
- `get_status(self)`
- ... and 11 more methods
**Inferred Side Effects & Invariants:**
- Primarily pure Python logic / orchestration.
### `agent/prompt_caching.py`
**Lines of Code:** 72
**Top-Level Functions:**
- `_apply_cache_marker(msg, cache_marker, native_anthropic)`
- `apply_anthropic_cache_control(api_messages, cache_ttl, native_anthropic)`
**Inferred Side Effects & Invariants:**
- Primarily pure Python logic / orchestration.
### `agent/skill_commands.py`
**Lines of Code:** 297
**Top-Level Functions:**
- `build_plan_path(user_instruction)`
- `_load_skill_payload(skill_identifier, task_id)`
- `_build_skill_message(loaded_skill, skill_dir, activation_note, user_instruction, runtime_note)`
- `scan_skill_commands()`
- `get_skill_commands()`
- `build_skill_invocation_message(cmd_key, user_instruction, task_id, runtime_note)`
- `build_preloaded_skills_prompt(skill_identifiers, task_id)`
**Inferred Side Effects & Invariants:**
- Uses global mutable state (risk factor).
- Primarily pure Python logic / orchestration.
## Cross-Module Dependencies
Key data flow:
1. `run_agent.py` defines `AIAgent` — the canonical conversation loop.
2. `model_tools.py` assembles tool schemas and dispatches function calls.
3. `tools/registry.py` maintains the central registry; all tool files import it.
4. `gateway/run.py` adapts platform events into `AIAgent.run_conversation()` calls.
5. `cli.py` (`HermesCLI`) provides the interactive shell and slash-command routing.
## Known Coupling Risks
- `run_agent.py` is ~7k SLOC and contains the core loop, todo/memory interception, context compression, and trajectory saving. High blast radius.
- `cli.py` is ~6.5k SLOC and combines UI (Rich/prompt_toolkit), config loading, and command dispatch. Tightly coupled to display state.
- `model_tools.py` holds a process-global `_last_resolved_tool_names`. Subagent execution saves/restores this global.
- `tools/registry.py` is imported by ALL tool files; schema generation happens at import time.
## Next Actions (Phase II Prep)
1. Decompose `AIAgent` into: `ConversationLoop`, `ContextManager`, `ToolDispatcher`, `MemoryInterceptor`.
2. Extract CLI display logic from command dispatch.
3. Define strict interfaces between gateway → agent → tools.
4. Write property-based tests for the conversation loop invariant: *given the same message history and tool results, the agent must produce deterministic tool_call ordering*.
---
Generated: 2026-04-05 by Ezra (Phase I)

View File

@@ -1,137 +0,0 @@
"""
Property-based test stubs for Hermes core invariants.
Part of EPIC-999 Phase I — The Mirror.
These tests define behavioral contracts that ANY rewrite of the runtime
must satisfy, including the Hermes Ω target.
"""
import pytest
from unittest.mock import Mock, patch
# -----------------------------------------------------------------------------
# Conversation Loop Invariants
# -----------------------------------------------------------------------------
class TestConversationLoopInvariants:
"""
Invariants for AIAgent.run_conversation and its successors.
"""
def test_deterministic_tool_ordering(self):
"""
Given the same message history and available tools,
the agent must produce the same tool_call ordering.
(If non-determinism is introduced by temperature > 0,
this becomes a statistical test.)
"""
pytest.skip("TODO: implement with seeded mock model responses")
def test_tool_result_always_appended_to_history(self):
"""
After any tool_call is executed, its result MUST appear
in the conversation history before the next assistant turn.
"""
pytest.skip("TODO: mock model with forced tool_call and verify history")
def test_iteration_budget_never_exceeded(self):
"""
The loop must terminate before api_call_count >= max_iterations
AND before iteration_budget.remaining <= 0.
"""
pytest.skip("TODO: mock model to always return tool_calls; verify termination")
def test_system_prompt_presence(self):
"""
Every API call must include a system message as the first message
(or system parameter for providers that support it).
"""
pytest.skip("TODO: intercept all client.chat.completions.create calls")
def test_compression_preserves_last_n_messages(self):
"""
After context compression, the final N messages (configurable,
default ~4) must remain uncompressed to preserve local context.
"""
pytest.skip("TODO: create history > threshold, compress, verify tail")
# -----------------------------------------------------------------------------
# Tool Registry Invariants
# -----------------------------------------------------------------------------
class TestToolRegistryInvariants:
"""
Invariants for tools.registry.Registry.
"""
def test_register_then_list_contains_tool(self):
"""
After register() is called with a valid schema and handler,
list_tools() must include the registered name.
"""
pytest.skip("TODO: instantiate fresh Registry, register, assert membership")
def test_dispatch_unknown_tool_returns_error_json(self):
"""
Calling dispatch() with an unregistered tool name must return
a JSON string containing an error key, never raise raw.
"""
pytest.skip("TODO: call dispatch with 'nonexistent_tool', parse result")
def test_handler_receives_task_id_kwarg(self):
"""
Registered handlers that accept **kwargs must receive task_id
when dispatch is called with one.
"""
pytest.skip("TODO: register mock handler, dispatch with task_id, verify")
# -----------------------------------------------------------------------------
# State Persistence Invariants
# -----------------------------------------------------------------------------
class TestStatePersistenceInvariants:
"""
Invariants for hermes_state.SessionDB.
"""
def test_saved_message_is_retrievable_by_session_id(self):
"""
After save_message(session_id, ...), get_messages(session_id)
must return the message.
"""
pytest.skip("TODO: use temp SQLite DB, save, query, assert")
def test_fts_search_returns_relevant_messages(self):
"""
After indexing messages, FTS search for a unique keyword
must return the message containing it.
"""
pytest.skip("TODO: seed DB with messages, search unique token")
# -----------------------------------------------------------------------------
# Context Compressor Invariants
# -----------------------------------------------------------------------------
class TestContextCompressorInvariants:
"""
Invariants for agent.context_compressor.ContextCompressor.
"""
def test_compression_reduces_token_count(self):
"""
compress_messages(output) must have fewer tokens than
the uncompressed input (for any input > threshold).
"""
pytest.skip("TODO: mock tokenizer, provide long history, assert reduction")
def test_compression_never_drops_system_message(self):
"""
The system message must survive compression and remain
at index 0 of the returned message list.
"""
pytest.skip("TODO: compress history with system msg, verify position")

View File

@@ -0,0 +1,541 @@
#!/usr/bin/env python3
"""
Comprehensive config structure validation test script for Issue #116.
Tests the validate_config_structure() function from hermes_cli.config
across four scenarios:
1. Valid config passes without issues
2. YAML syntax errors are caught
3. Type mismatches are detected
4. Completely broken YAML is handled gracefully
Usage:
python scripts/test_config_validation.py
python -m pytest scripts/test_config_validation.py -v
"""
import os
import subprocess
import sys
import tempfile
from pathlib import Path
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
PASS = "\033[32mPASS\033[0m"
FAIL = "\033[31mFAIL\033[0m"
def _hermes_agent_root() -> Path:
"""Return the hermes-agent project root."""
return Path(__file__).resolve().parent.parent
def _run_in_project(cmd: list[str], extra_env: dict[str, str] | None = None, **kwargs) -> subprocess.CompletedProcess:
"""Run a command with the project root on sys.path."""
env = os.environ.copy()
root = str(_hermes_agent_root())
env["PYTHONPATH"] = root
if extra_env:
env.update(extra_env)
return subprocess.run(cmd, capture_output=True, text=True, env=env, **kwargs)
def _write_and_load_yaml(yaml_content: str):
"""Write yaml_content to a temp file, set HERMES_HOME to point at it,
then run validate_config_structure() in a subprocess and return (rc, stdout, stderr).
"""
home = tempfile.mkdtemp(prefix="hermes_test_")
cfg_path = Path(home) / "config.yaml"
cfg_path.write_text(yaml_content, encoding="utf-8")
# We use a small inline Python script that loads the validator and
# exercises it with the given HERMES_HOME.
py_code = """
import os, sys, json
root = sys.argv[1]
sys.path.insert(0, root)
from hermes_cli.config import validate_config_structure, ConfigIssue
try:
issues = validate_config_structure()
out = [{"severity": i.severity, "message": i.message, "hint": i.hint} for i in issues]
print(json.dumps({"status": "ok", "issues": out}))
except yaml.YAMLError as e:
print(json.dumps({"status": "yaml_error", "detail": str(e)}))
except Exception as e:
print(json.dumps({"status": "error", "detail": str(e)}))
""".strip()
result = _run_in_project(
[sys.executable, "-c", py_code, str(_hermes_agent_root())],
extra_env={"HERMES_HOME": home},
)
return result
def _call_validate(config: dict):
"""Call validate_config_structure(config) directly in a subprocess and
return a dict: {"status": "ok", "issues": [...]}.
"""
import json
py_code = """
import os, sys, json
root = sys.argv[1]
config_str = sys.argv[2]
sys.path.insert(0, root)
from hermes_cli.config import validate_config_structure
config = json.loads(config_str)
issues = validate_config_structure(config)
out = [{"severity": i.severity, "message": i.message, "hint": i.hint} for i in issues]
print(json.dumps({"status": "ok", "issues": out}))
""".strip()
result = _run_in_project(
[sys.executable, "-c", py_code, str(_hermes_agent_root()), json.dumps(config)],
)
assert result.returncode == 0, f"Subprocess failed:\nstdout={result.stdout}\nstderr={result.stderr}"
return json.loads(result.stdout.strip().splitlines()[-1])
# ---------------------------------------------------------------------------
# Test harness
# ---------------------------------------------------------------------------
class TestResult:
def __init__(self):
self.passed = 0
self.failed = 0
self.results: list[tuple[str, bool, str]] = []
def record(self, name: str, ok: bool, detail: str = "") -> None:
if ok:
self.passed += 1
self.results.append((name, True, detail))
else:
self.failed += 1
self.results.append((name, False, detail))
marker = PASS if ok else FAIL
print(f" [{marker}] {name}" + (f"{detail}" if detail and not ok else ""))
def summary(self) -> bool:
total = self.passed + self.failed
print(f"\n{'='*60}")
print(f" Results: {self.passed}/{total} passed, {self.failed} failed")
print(f"{'='*60}")
if self.failed:
print("\n Failed tests:")
for name, ok, detail in self.results:
if not ok:
print(f" - {name}: {detail}")
return self.failed == 0
t = TestResult()
# ===================================================================
# 1. Valid config passes
# ===================================================================
def test_valid_empty_dict():
issues = _call_validate({})
# Empty dict — no custom_providers, no fallback_model, so no issues expected
t.record("valid: empty config dict", len(issues["issues"]) == 0)
def test_valid_custom_providers_list():
issues = _call_validate({
"custom_providers": [
{"name": "my-provider", "base_url": "https://api.example.com/v1"},
],
"model": {"provider": "custom", "default": "test"},
})
t.record("valid: custom_providers as proper list", len(issues["issues"]) == 0)
def test_valid_fallback_model():
issues = _call_validate({
"fallback_model": {
"provider": "openrouter",
"model": "anthropic/claude-sonnet-4",
},
})
fb_relevant = [i for i in issues["issues"] if "fallback" in i["message"].lower()]
t.record("valid: fallback_model with provider+model", len(fb_relevant) == 0)
def test_valid_empty_fallback():
issues = _call_validate({"fallback_model": {}})
fb_relevant = [i for i in issues["issues"] if "fallback" in i["message"].lower()]
t.record("valid: empty fallback_model is fine", len(fb_relevant) == 0)
def test_valid_fullish_config():
issues = _call_validate({
"model": {"provider": "openrouter", "default": "anthropic/claude-sonnet-4"},
"providers": {},
"fallback_providers": [],
"toolsets": ["hermes-cli"],
"custom_providers": [
{"name": "gemini", "base_url": "https://generativelanguage.googleapis.com/v1beta/openai"},
],
})
t.record("valid: full config with all sections", len(issues["issues"]) == 0)
# ===================================================================
# 2. YAML syntax errors caught
# ===================================================================
def test_yaml_syntax_bad_indent():
"""YAML with content that pyyaml cannot parse (mismatched indentation with
an unexpected block mapping context)."""
# Use a clearly broken structure: unquoted colon in a flow context
broken = "model:\n provider: openrouter\n- list_item: at_wrong_level\n"
result = _write_and_load_yaml(broken)
out = result.stdout.strip().splitlines()[-1] if result.stdout.strip() else ""
import json
try:
data = json.loads(out)
# Should handle gracefully — either yaml_error or ok (pyyaml may accept
# some "broken-looking" YAML by merging). The key is no crash.
ok = data.get("status") in ("ok", "yaml_error", "error")
t.record("yaml syntax: bad indentation handled gracefully", ok,
f"got status={data.get('status')}")
except json.JSONDecodeError:
t.record("yaml syntax: bad indentation handled gracefully", False, "could not parse output")
def test_yaml_syntax_duplicate_key():
"""YAML with duplicate keys that confuse the parser."""
result = _write_and_load_yaml("model: openrouter\nmodel: anthropic\n")
# yaml.safe_load accepts duplicate keys silently (last wins), so
# validate_config_structure should still process it without crash.
out = result.stdout.strip().splitlines()[-1] if result.stdout.strip() else ""
import json
try:
data = json.loads(out)
# Should complete without crashing
ok = data.get("status") == "ok"
t.record("yaml syntax: duplicate keys handled", ok,
f"unexpected status: {data.get('status')}")
except json.JSONDecodeError:
t.record("yaml syntax: duplicate keys handled", False, "could not parse output")
def test_yaml_syntax_trailing_colon():
"""YAML with a trailing colon that creates an unexpected mapping."""
bad_yaml = """
custom_providers:
name: test
base_url: https://example.com
invalid_key:: some_value
"""
result = _write_and_load_yaml(bad_yaml)
out = result.stdout.strip().splitlines()[-1] if result.stdout.strip() else ""
import json
try:
data = json.loads(out)
# Either yaml_error for parse failure, or ok with detection
ok = data.get("status") in ("ok", "yaml_error")
t.record("yaml syntax: trailing colon handled gracefully", ok,
f"got status={data.get('status')}")
except json.JSONDecodeError:
t.record("yaml syntax: trailing colon handled gracefully", False, "could not parse output")
# ===================================================================
# 3. Type mismatches detected
# ===================================================================
def test_custom_providers_dict_instead_of_list():
"""The classic Discord-user error: custom_providers as flat dict."""
issues = _call_validate({
"custom_providers": {
"name": "Generativelanguage.googleapis.com",
"base_url": "https://generativelanguage.googleapis.com/v1beta/openai",
"api_key": "***",
},
})
errors = [i for i in issues["issues"] if i["severity"] == "error"]
ok = any("dict" in i["message"].lower() and "list" in i["message"].lower() for i in errors)
t.record("type mismatch: custom_providers as dict instead of list", ok)
def test_custom_providers_string_instead_of_list():
issues = _call_validate({
"custom_providers": "just a string",
})
# A string is not a dict or list, so no custom_providers-specific
# errors fire, but the fact that we don't crash is the test.
ok = True # Should complete without crash
t.record("type mismatch: custom_providers as string (no crash)", ok)
def test_custom_providers_list_of_strings():
issues = _call_validate({
"custom_providers": ["not-a-dict", "also-not-a-dict"],
"model": {"provider": "custom"},
})
warnings = [i for i in issues["issues"] if i["severity"] == "warning"]
ok = any("not a dict" in i["message"] for i in warnings)
t.record("type mismatch: custom_providers list of strings detected", ok)
def test_fallback_model_string_instead_of_dict():
issues = _call_validate({
"fallback_model": "openrouter:anthropic/claude-sonnet-4",
})
errors = [i for i in issues["issues"] if i["severity"] == "error"]
ok = any("should be a dict" in i["message"] for i in errors)
t.record("type mismatch: fallback_model as string instead of dict", ok)
def test_fallback_model_list_instead_of_dict():
issues = _call_validate({
"fallback_model": ["openrouter", "claude-sonnet-4"],
})
errors = [i for i in issues["issues"] if i["severity"] == "error"]
ok = any("should be a dict" in i["message"] for i in errors)
t.record("type mismatch: fallback_model as list instead of dict", ok)
def test_fallback_model_number_instead_of_dict():
issues = _call_validate({"fallback_model": 42})
errors = [i for i in issues["issues"] if i["severity"] == "error"]
ok = any("should be a dict" in i["message"] for i in errors)
t.record("type mismatch: fallback_model as int instead of dict", ok)
def test_custom_providers_missing_name():
issues = _call_validate({
"custom_providers": [{"base_url": "https://example.com/v1"}],
"model": {"provider": "custom"},
})
ok = any("missing 'name'" in i["message"] for i in issues["issues"])
t.record("type mismatch: custom_providers entry missing 'name'", ok)
def test_custom_providers_missing_base_url():
issues = _call_validate({
"custom_providers": [{"name": "test"}],
"model": {"provider": "custom"},
})
ok = any("missing 'base_url'" in i["message"] for i in issues["issues"])
t.record("type mismatch: custom_providers entry missing 'base_url'", ok)
def test_custom_providers_missing_model_section():
issues = _call_validate({
"custom_providers": [{"name": "test", "base_url": "https://example.com/v1"}],
})
ok = any("no 'model' section" in i["message"] for i in issues["issues"])
t.record("type mismatch: custom_providers without model section", ok)
def test_nested_fallback_inside_custom_providers():
issues = _call_validate({
"custom_providers": {
"name": "test",
"fallback_model": {"provider": "openrouter", "model": "test"},
},
})
errors = [i for i in issues["issues"] if i["severity"] == "error"]
ok = any("fallback_model" in i["message"] and "inside" in i["message"] for i in errors)
t.record("type mismatch: fallback_model nested inside custom_providers dict", ok)
# ===================================================================
# 4. Completely broken YAML handled gracefully
# ===================================================================
def test_completely_broken_yaml_binary_content():
"""Binary-ish content that YAML cannot parse."""
broken = "key: \x00\x01\x02\x03 invalid binary stuff: \xff\xfe"
result = _write_and_load_yaml(broken)
out = result.stdout.strip().splitlines()[-1] if result.stdout.strip() else ""
import json
try:
data = json.loads(out)
# Any status including yaml_error / error is acceptable — no traceback
ok = True
t.record("broken yaml: binary content handled gracefully", ok)
except json.JSONDecodeError:
t.record("broken yaml: binary content handled gracefully", False,
"subprocess returned non-JSON output (possible crash)")
def test_completely_broken_yaml_random_chars():
"""Random garbage that is definitely not valid YAML."""
broken = "{{{{{}}}}} {{{{not_yaml: [}}}}\n!invalid-tag!!! @@###$$$\n"
result = _write_and_load_yaml(broken)
out = result.stdout.strip().splitlines()[-1] if result.stdout.strip() else ""
import json
try:
data = json.loads(out)
# Should either be yaml_error status, or ok with zero/many issues
ok = True # The fact we got back JSON means we didn't crash
t.record("broken yaml: random garbage handled gracefully", ok)
except json.JSONDecodeError:
t.record("broken yaml: random garbage handled gracefully", False,
"subprocess returned non-JSON output (possible crash)")
def test_completely_broken_yaml_nested_braces():
"""Deeply-nested braces that break YAML parsing."""
broken = "a: {{{{{}}}}}\n b: {{{{{}}}}}\n c: {{{{{}}}}}\n"
result = _write_and_load_yaml(broken)
out = result.stdout.strip().splitlines()[-1] if result.stdout.strip() else ""
import json
try:
data = json.loads(out)
t.record("broken yaml: nested braces handled gracefully", True)
except json.JSONDecodeError:
t.record("broken yaml: nested braces handled gracefully", False,
"subprocess returned non-JSON output")
def test_empty_yaml_file():
"""Empty config file — should load and produce no issues."""
result = _write_and_load_yaml("")
out = result.stdout.strip().splitlines()[-1] if result.stdout.strip() else ""
import json
try:
data = json.loads(out)
ok = data.get("status") == "ok" and len(data.get("issues", [])) == 0
t.record("broken yaml: empty file handled gracefully (no issues)", ok,
f"got status={data.get('status')}")
except json.JSONDecodeError:
t.record("broken yaml: empty file handled gracefully", False,
"subprocess returned non-JSON output")
def test_yaml_with_only_null():
"""YAML file containing only '~' or 'null' should produce empty dict."""
result = _write_and_load_yaml("~\n")
out = result.stdout.strip().splitlines()[-1] if result.stdout.strip() else ""
import json
try:
data = json.loads(out)
ok = data.get("status") == "ok"
t.record("broken yaml: null-only YAML handled gracefully", ok,
f"got status={data.get('status')}")
except json.JSONDecodeError:
t.record("broken yaml: null-only YAML handled gracefully", False,
"subprocess returned non-JSON output")
# ===================================================================
# Print config warnings test
# ===================================================================
def test_print_config_warnings_output():
"""Ensure print_config_warnings prints warnings when issues exist."""
import json
py_code = """
import os, sys, json
root = sys.argv[1]
sys.path.insert(0, root)
from hermes_cli.config import print_config_warnings
# This config should produce warnings
config = {
"custom_providers": {
"name": "test",
"base_url": "https://example.com",
},
}
print_config_warnings(config)
""".strip()
result = _run_in_project(
[sys.executable, "-c", py_code, str(_hermes_agent_root())],
)
ok = "config" in result.stderr.lower() or returncode_ok(result.returncode)
t.record("print_config_warnings: outputs warnings to stderr for bad config", ok,
f"stderr={result.stderr[:200]}")
# ===================================================================
# Root-level misplaced keys test
# ===================================================================
def test_misplaced_root_level_key():
"""A root-level "base_url" that should be inside model/custom_providers."""
issues = _call_validate({
"base_url": "https://api.example.com/v1",
"model": {"provider": "openrouter"},
})
warnings = [i for i in issues["issues"] if i["severity"] == "warning"]
ok = any("misplaced" in i["message"].lower() for i in warnings)
t.record("misplaced root key: base_url flagged", ok)
def test_returncode_ok(code: int) -> bool:
return code == 0
# ===================================================================
# Main
# ===================================================================
if __name__ == "__main__":
# Ensure project root is on sys.path for import in the _call_validate/
# _write_and_load_yaml subprocesses
sys.path.insert(0, str(_hermes_agent_root()))
print(f"\n{'='*60}")
print(" Config Structure Validation Tests (Issue #116)")
print(f"{'='*60}\n")
# 1. Valid config passes
print("--- 1. Valid config passes ---")
test_valid_empty_dict()
test_valid_custom_providers_list()
test_valid_fallback_model()
test_valid_empty_fallback()
test_valid_fullish_config()
# 2. YAML syntax errors caught
print("\n--- 2. YAML syntax errors caught ---")
test_yaml_syntax_bad_indent()
test_yaml_syntax_duplicate_key()
test_yaml_syntax_trailing_colon()
# 3. Type mismatches detected
print("\n--- 3. Type mismatches detected ---")
test_custom_providers_dict_instead_of_list()
test_custom_providers_string_instead_of_list()
test_custom_providers_list_of_strings()
test_fallback_model_string_instead_of_dict()
test_fallback_model_list_instead_of_dict()
test_fallback_model_number_instead_of_dict()
test_custom_providers_missing_name()
test_custom_providers_missing_base_url()
test_custom_providers_missing_model_section()
test_nested_fallback_inside_custom_providers()
test_misplaced_root_level_key()
# 4. Completely broken YAML handled gracefully
print("\n--- 4. Completely broken YAML handled gracefully ---")
test_completely_broken_yaml_binary_content()
test_completely_broken_yaml_random_chars()
test_completely_broken_yaml_nested_braces()
test_empty_yaml_file()
test_yaml_with_only_null()
# 5. Print config warnings
print("\n--- 5. Print config warnings ---")
test_print_config_warnings_output()
ok = t.summary()
sys.exit(0 if ok else 1)