Test Optimization Guide for Hermes Agent

Current Test Execution Analysis

Test Suite Statistics

  • Total Test Files: 373
  • Estimated Test Functions: ~4,311
  • Async Tests: ~679 (15.8%)
  • Integration Tests: 7 files (excluded from CI)
  • Average Tests per File: ~11.6

Current CI Configuration

# .github/workflows/tests.yml
- name: Run tests
  run: |
    source .venv/bin/activate
    python -m pytest tests/ -q --ignore=tests/integration --tb=short -n auto

Current Flags:

  • -q: Quiet mode
  • --ignore=tests/integration: Skip integration tests
  • --tb=short: Short traceback format
  • -n auto: Auto-detect parallel workers

Optimization Recommendations

1. Add Test Duration Reporting

Current: No duration tracking.

Recommended:

run: |
  # --durations=20: show the 20 slowest tests
  # --durations-min=1.0: only report tests taking longer than 1s
  python -m pytest tests/ \
    --ignore=tests/integration \
    -n auto \
    --durations=20 \
    --durations-min=1.0

This will help identify slow tests that need optimization.
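To act on the report automatically, the saved output can be post-processed. The sketch below assumes pytest's default durations line format (`"1.23s call   path::test"`); verify against your pytest version before relying on it.

```python
# A stdlib sketch that scans a saved `--durations` report and flags tests
# whose call phase exceeds a time budget. The line format is an assumption
# based on pytest's default durations output.
import re

_DURATION_LINE = re.compile(r"^\s*(\d+\.\d+)s\s+(setup|call|teardown)\s+(\S+)")

def over_budget(report_lines, budget_s=5.0):
    """Return (test_id, seconds) pairs whose call phase exceeds budget_s."""
    slow = []
    for line in report_lines:
        m = _DURATION_LINE.match(line)
        if m and m.group(2) == "call" and float(m.group(1)) > budget_s:
            slow.append((m.group(3), float(m.group(1))))
    return slow
```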

2. Implement Test Categories

Add markers to pyproject.toml:

[tool.pytest.ini_options]
testpaths = ["tests"]
markers = [
    "integration: marks tests requiring external services",
    "slow: marks tests that take >5 seconds",
    "unit: marks fast unit tests",
    "security: marks security-focused tests",
    "flaky: marks tests that may be unstable",
]
addopts = "-m 'not integration and not slow' -n auto"

Usage:

# Run only fast unit tests
pytest -m unit

# Run all tests including slow ones
pytest -m "not integration"

# Run only security tests
pytest -m security
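Markers can also be applied automatically from file location, so tests under `tests/integration/` stay excluded by the default `addopts` even when a decorator is forgotten. A `conftest.py` sketch (the path convention is an assumption from the layout above):

```python
# conftest.py sketch: auto-apply the "integration" marker by file location.
import pytest

def is_integration_path(path: str) -> bool:
    """Pure helper so the classification rule is easy to unit-test."""
    return "tests/integration" in path.replace("\\", "/")

def pytest_collection_modifyitems(config, items):
    # Runs once after collection; tags matching items with the marker.
    for item in items:
        if is_integration_path(str(item.fspath)):
            item.add_marker(pytest.mark.integration)
```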

3. Optimize Slow Test Candidates

Based on file sizes, these tests likely need optimization:

| File                  | Lines | Optimization Strategy                |
|-----------------------|-------|--------------------------------------|
| test_run_agent.py     | 3,329 | Split into multiple files by feature |
| test_mcp_tool.py      | 2,902 | Split by MCP functionality           |
| test_voice_command.py | 2,632 | Review for redundant tests           |
| test_feishu.py        | 2,580 | Mock external API calls              |
| test_api_server.py    | 1,503 | Parallelize independent tests        |
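The table can be re-checked locally with a quick stdlib helper (the `tests` root is an assumption about the repo layout):

```python
# List the largest test files by line count to find split candidates.
from pathlib import Path

def largest_test_files(root="tests", top=5):
    """Return (path, line_count) pairs for the `top` longest test files."""
    counted = [
        (str(p), sum(1 for _ in p.open(encoding="utf-8")))
        for p in Path(root).rglob("test_*.py")
    ]
    return sorted(counted, key=lambda pair: pair[1], reverse=True)[:top]
```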

4. Add Coverage Reporting to CI

Updated workflow:

- name: Run tests with coverage
  run: |
    source .venv/bin/activate
    python -m pytest tests/ \
      --ignore=tests/integration \
      -n auto \
      --cov=agent --cov=tools --cov=gateway --cov=hermes_cli \
      --cov-report=xml \
      --cov-report=html \
      --cov-fail-under=70

- name: Upload coverage to Codecov
  uses: codecov/codecov-action@v3
  with:
    files: ./coverage.xml
    fail_ci_if_error: true
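The coverage flags can also live in `pyproject.toml` so local runs match CI. A sketch using standard coverage.py options; the source list mirrors the `--cov` targets above:

```toml
[tool.coverage.run]
source = ["agent", "tools", "gateway", "hermes_cli"]
branch = true

[tool.coverage.report]
fail_under = 70
show_missing = true
skip_covered = true
```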

5. Implement Flaky Test Handling

Add pytest-rerunfailures:

dev = [
    "pytest>=9.0.2,<10",
    "pytest-asyncio>=1.3.0,<2",
    "pytest-xdist>=3.0,<4",
    "pytest-cov>=5.0,<6",
    "pytest-rerunfailures>=14.0,<15",  # Add this
]

Usage:

# Mark known flaky tests (pytest-rerunfailures provides the "flaky" marker)
@pytest.mark.flaky(reruns=3, reruns_delay=1)
async def test_network_dependent_feature():
    # Test that sometimes fails due to network
    pass

6. Optimize Fixture Scopes

Review conftest.py fixtures:

# Current: Function scope (runs for every test)
@pytest.fixture()
def mock_config():
    return {...}

# Optimized: Session scope (runs once per session)
@pytest.fixture(scope="session")
def mock_config():
    return {...}

# Optimized: Module scope (runs once per module)
@pytest.fixture(scope="module")
def expensive_setup():
    # Setup that can be reused across module
    pass
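One caveat when widening fixture scope: a session-scoped fixture hands every test the same object, so a test that mutates the shared config dict silently breaks later tests. A stdlib read-only view keeps the speed win without that hazard (a sketch, not the project's current fixture):

```python
# Wrap a config dict in a read-only mapping before sharing it session-wide;
# any test that tries to mutate it fails loudly with TypeError.
from types import MappingProxyType

def frozen_config(base):
    """Return a read-only view over a copy of the given config dict."""
    return MappingProxyType(dict(base))
```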

7. Parallel Execution Tuning

Current: -n auto (uses all CPUs)

Issues:

  • May cause resource contention
  • Some tests may not be thread-safe

Recommendations:

# Limit workers to prevent resource exhaustion
pytest -n 4  # Use 4 workers regardless of CPU count

# Use load-based scheduling for uneven test durations
pytest -n auto --dist=load

# Group tests by module to reduce setup overhead
pytest -n auto --dist=loadscope
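The fixed `-n 4` style cap can be computed instead of hardcoded. A sketch of a worker-count heuristic (the cap of 8 is an assumption about typical CI runner sizes):

```python
# Pick an xdist worker count that leaves headroom for the OS instead of
# saturating every core.
import os

def worker_count(reserve=1, cap=8):
    """CPU count minus a reserve, clamped to [1, cap]."""
    cpus = os.cpu_count() or 1
    return max(1, min(cpus - reserve, cap))
```

A small wrapper script could feed this value to `pytest -n` in CI.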

8. Test Data Management

Current Issue: Tests may create files in /tmp without cleanup

Solution - Factory Pattern:

# tests/factories.py
import tempfile
import shutil
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def temp_workspace():
    """Create an isolated temp directory for tests."""
    path = tempfile.mkdtemp(prefix="hermes_test_")
    try:
        yield Path(path)
    finally:
        shutil.rmtree(path, ignore_errors=True)

# Usage in tests
def test_file_operations():
    with temp_workspace() as tmp:
        # All file operations in isolated directory
        file_path = tmp / "test.txt"
        file_path.write_text("content")
        assert file_path.exists()
    # Automatically cleaned up
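Worth noting: pytest's built-in `tmp_path` fixture already provides a fresh, isolated `pathlib.Path` per test with automatic cleanup, so the custom factory is mainly useful outside fixtures. The same test using the built-in:

```python
# pytest injects tmp_path (a unique pathlib.Path) into each test and
# manages cleanup of old runs itself; no custom factory needed.
def test_file_operations(tmp_path):
    file_path = tmp_path / "test.txt"
    file_path.write_text("content")
    assert file_path.exists()
```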

9. Database/State Isolation

Current: Uses monkeypatch for env vars.

Enhancement: Database mocking

import pytest
from unittest.mock import MagicMock, patch

@pytest.fixture
def mock_honcho():
    """Mock Honcho client for tests."""
    with patch("honcho_integration.client.HonchoClient") as mock:
        mock_instance = MagicMock()
        mock_instance.get_session.return_value = {"id": "test-session"}
        mock.return_value = mock_instance
        yield mock

# Usage
async def test_memory_storage(mock_honcho):
    # Fast, isolated test
    pass

10. CI Pipeline Optimization

Current Pipeline:

  1. Checkout
  2. Install uv
  3. Install Python
  4. Install deps
  5. Run tests

Optimized Pipeline (with caching):

jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          version: "0.5.x"
      
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          # No pip cache: dependencies are installed with uv and cached below
      
      - name: Cache uv packages
        uses: actions/cache@v4
        with:
          path: ~/.cache/uv
          key: ${{ runner.os }}-uv-${{ hashFiles('**/pyproject.toml') }}
      
      - name: Install dependencies
        run: |
          uv venv .venv
          uv pip install -e ".[all,dev]"
      
      - name: Run fast tests
        run: |
          source .venv/bin/activate
          pytest -m "not integration and not slow" -n auto --tb=short
      
      - name: Run slow tests
        if: github.event_name == 'pull_request'
        run: |
          source .venv/bin/activate
          pytest -m "slow" -n 2 --tb=short

Quick Wins (Implement First)

1. Add Duration Reporting (5 minutes)

--durations=10

2. Mark Slow Tests (30 minutes)

Add @pytest.mark.slow to tests taking >5s.

3. Split Largest Test File (2 hours)

Split test_run_agent.py into:

  • test_run_agent_core.py
  • test_run_agent_tools.py
  • test_run_agent_memory.py
  • test_run_agent_messaging.py

4. Add Coverage Baseline (1 hour)

pytest --cov=agent --cov=tools --cov=gateway tests/ --cov-report=html

5. Optimize Fixture Scopes (1 hour)

Review and optimize 5 most-used fixtures.


Long-term Improvements

Test Data Generation

# Implement hypothesis-based testing
from hypothesis import given, strategies as st

@given(st.lists(st.text(), min_size=1))
def test_message_batching(messages):
    # Property-based testing
    pass

Performance Regression Testing

@pytest.mark.benchmark
def test_message_processing_speed(benchmark):
    # benchmark() returns the benchmarked function's own return value;
    # timing statistics are collected on the benchmark fixture
    result = benchmark(process_messages, sample_data)
    assert result is not None
    assert benchmark.stats.stats.mean < 0.001  # mean seconds per call

Contract Testing

# Verify API contracts between components
@pytest.mark.contract
def test_agent_tool_contract():
    """Verify agent sends correct format to tools."""
    pass
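A minimal sketch of what such a contract test could assert; the payload field names below are assumptions for illustration, not the real Hermes agent/tool schema:

```python
# Verify a tool-call payload carries every key the tool layer expects.
# REQUIRED_TOOL_CALL_KEYS is a hypothetical contract, not the real schema.
REQUIRED_TOOL_CALL_KEYS = {"tool_name", "arguments", "call_id"}

def missing_contract_keys(payload):
    """Return required keys absent from a tool-call payload (empty == OK)."""
    return REQUIRED_TOOL_CALL_KEYS - payload.keys()
```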

Measurement Checklist

After implementing optimizations, verify:

  • Test suite execution time < 5 minutes
  • No individual test > 10 seconds (except integration)
  • Code coverage > 70%
  • All flaky tests marked and retried
  • CI passes consistently (>95% success rate)
  • Memory usage stable (no leaks in test suite)

Tools to Add

[project.optional-dependencies]
dev = [
    "pytest>=9.0.2,<10",
    "pytest-asyncio>=1.3.0,<2",
    "pytest-xdist>=3.0,<4",
    "pytest-cov>=5.0,<6",
    "pytest-rerunfailures>=14.0,<15",
    "pytest-benchmark>=4.0,<5",       # Performance testing
    "pytest-mock>=3.12,<4",            # Enhanced mocking
    "hypothesis>=6.100,<7",            # Property-based testing
    "factory-boy>=3.3,<4",             # Test data factories
]