Test Optimization Guide for Hermes Agent

Current Test Execution Analysis

Test Suite Statistics

  • Total Test Files: 373
  • Estimated Test Functions: ~4,311
  • Async Tests: ~679 (15.8%)
  • Integration Tests: 7 files (excluded from CI)
  • Average Tests per File: ~11.6

Current CI Configuration

# .github/workflows/tests.yml
- name: Run tests
  run: |
    source .venv/bin/activate
    python -m pytest tests/ -q --ignore=tests/integration --tb=short -n auto

Current Flags:

  • -q: Quiet mode
  • --ignore=tests/integration: Skip integration tests
  • --tb=short: Short traceback format
  • -n auto: Auto-detect parallel workers

Optimization Recommendations

1. Add Test Duration Reporting

Current: No duration tracking.

Recommended:

run: |
  # --durations=20: show the 20 slowest tests
  # --durations-min=1.0: only report tests taking longer than 1s
  python -m pytest tests/ \
    --ignore=tests/integration \
    -n auto \
    --durations=20 \
    --durations-min=1.0

This will help identify slow tests that need optimization.
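To act on the report automatically, the saved output can be post-processed. The sketch below assumes pytest's default durations line format (`"1.23s call   path::test"`); verify against your pytest version before relying on it.

```python
# A stdlib sketch that scans a saved `--durations` report and flags tests
# whose call phase exceeds a time budget. The line format is an assumption
# based on pytest's default durations output.
import re

_DURATION_LINE = re.compile(r"^\s*(\d+\.\d+)s\s+(setup|call|teardown)\s+(\S+)")

def over_budget(report_lines, budget_s=5.0):
    """Return (test_id, seconds) pairs whose call phase exceeds budget_s."""
    slow = []
    for line in report_lines:
        m = _DURATION_LINE.match(line)
        if m and m.group(2) == "call" and float(m.group(1)) > budget_s:
            slow.append((m.group(3), float(m.group(1))))
    return slow
```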

2. Implement Test Categories

Add markers to pyproject.toml:

[tool.pytest.ini_options]
testpaths = ["tests"]
markers = [
    "integration: marks tests requiring external services",
    "slow: marks tests that take >5 seconds",
    "unit: marks fast unit tests",
    "security: marks security-focused tests",
    "flaky: marks tests that may be unstable",
]
addopts = "-m 'not integration and not slow' -n auto"

Usage:

# Run only fast unit tests
pytest -m unit

# Run all tests including slow ones
pytest -m "not integration"

# Run only security tests
pytest -m security
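Markers can also be applied automatically from file location, so tests under `tests/integration/` stay excluded by the default `addopts` even when a decorator is forgotten. A `conftest.py` sketch (the path convention is an assumption from the layout above):

```python
# conftest.py sketch: auto-apply the "integration" marker by file location.
import pytest

def is_integration_path(path: str) -> bool:
    """Pure helper so the classification rule is easy to unit-test."""
    return "tests/integration" in path.replace("\\", "/")

def pytest_collection_modifyitems(config, items):
    # Runs once after collection; tags matching items with the marker.
    for item in items:
        if is_integration_path(str(item.fspath)):
            item.add_marker(pytest.mark.integration)
```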

3. Optimize Slow Test Candidates

Based on file sizes, these tests likely need optimization:

| File                  | Lines | Optimization Strategy                |
|-----------------------|-------|--------------------------------------|
| test_run_agent.py     | 3,329 | Split into multiple files by feature |
| test_mcp_tool.py      | 2,902 | Split by MCP functionality           |
| test_voice_command.py | 2,632 | Review for redundant tests           |
| test_feishu.py        | 2,580 | Mock external API calls              |
| test_api_server.py    | 1,503 | Parallelize independent tests        |
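The table can be re-checked locally with a quick stdlib helper (the `tests` root is an assumption about the repo layout):

```python
# List the largest test files by line count to find split candidates.
from pathlib import Path

def largest_test_files(root="tests", top=5):
    """Return (path, line_count) pairs for the `top` longest test files."""
    counted = [
        (str(p), sum(1 for _ in p.open(encoding="utf-8")))
        for p in Path(root).rglob("test_*.py")
    ]
    return sorted(counted, key=lambda pair: pair[1], reverse=True)[:top]
```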

4. Add Coverage Reporting to CI

Updated workflow:

- name: Run tests with coverage
  run: |
    source .venv/bin/activate
    python -m pytest tests/ \
      --ignore=tests/integration \
      -n auto \
      --cov=agent --cov=tools --cov=gateway --cov=hermes_cli \
      --cov-report=xml \
      --cov-report=html \
      --cov-fail-under=70

- name: Upload coverage to Codecov
  uses: codecov/codecov-action@v3
  with:
    files: ./coverage.xml
    fail_ci_if_error: true
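The coverage flags can also live in `pyproject.toml` so local runs match CI. A sketch using standard coverage.py options; the source list mirrors the `--cov` targets above:

```toml
[tool.coverage.run]
source = ["agent", "tools", "gateway", "hermes_cli"]
branch = true

[tool.coverage.report]
fail_under = 70
show_missing = true
skip_covered = true
```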

5. Implement Flaky Test Handling

Add pytest-rerunfailures:

dev = [
    "pytest>=9.0.2,<10",
    "pytest-asyncio>=1.3.0,<2",
    "pytest-xdist>=3.0,<4",
    "pytest-cov>=5.0,<6",
    "pytest-rerunfailures>=14.0,<15",  # Add this
]

Usage:

# Mark known flaky tests (pytest-rerunfailures provides the "flaky" marker)
@pytest.mark.flaky(reruns=3, reruns_delay=1)
async def test_network_dependent_feature():
    # Test that sometimes fails due to network
    pass

6. Optimize Fixture Scopes

Review conftest.py fixtures:

# Current: Function scope (runs for every test)
@pytest.fixture()
def mock_config():
    return {...}

# Optimized: Session scope (runs once per session)
@pytest.fixture(scope="session")
def mock_config():
    return {...}

# Optimized: Module scope (runs once per module)
@pytest.fixture(scope="module")
def expensive_setup():
    # Setup that can be reused across module
    pass
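One caveat when widening fixture scope: a session-scoped fixture hands every test the same object, so a test that mutates the shared config dict silently breaks later tests. A stdlib read-only view keeps the speed win without that hazard (a sketch, not the project's current fixture):

```python
# Wrap a config dict in a read-only mapping before sharing it session-wide;
# any test that tries to mutate it fails loudly with TypeError.
from types import MappingProxyType

def frozen_config(base):
    """Return a read-only view over a copy of the given config dict."""
    return MappingProxyType(dict(base))
```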

7. Parallel Execution Tuning

Current: -n auto (uses all CPUs)

Issues:

  • May cause resource contention
  • Some tests may not be thread-safe

Recommendations:

# Limit workers to prevent resource exhaustion
pytest -n 4  # Use 4 workers regardless of CPU count

# Use load-based scheduling for uneven test durations
pytest -n auto --dist=load

# Group tests by module to reduce setup overhead
pytest -n auto --dist=loadscope
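The fixed `-n 4` style cap can be computed instead of hardcoded. A sketch of a worker-count heuristic (the cap of 8 is an assumption about typical CI runner sizes):

```python
# Pick an xdist worker count that leaves headroom for the OS instead of
# saturating every core.
import os

def worker_count(reserve=1, cap=8):
    """CPU count minus a reserve, clamped to [1, cap]."""
    cpus = os.cpu_count() or 1
    return max(1, min(cpus - reserve, cap))
```

A small wrapper script could feed this value to `pytest -n` in CI.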

8. Test Data Management

Current Issue: Tests may create files in /tmp without cleanup

Solution - Factory Pattern:

# tests/factories.py
import tempfile
import shutil
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def temp_workspace():
    """Create an isolated temp directory for tests."""
    path = tempfile.mkdtemp(prefix="hermes_test_")
    try:
        yield Path(path)
    finally:
        shutil.rmtree(path, ignore_errors=True)

# Usage in tests
def test_file_operations():
    with temp_workspace() as tmp:
        # All file operations in isolated directory
        file_path = tmp / "test.txt"
        file_path.write_text("content")
        assert file_path.exists()
    # Automatically cleaned up
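Worth noting: pytest's built-in `tmp_path` fixture already provides a fresh, isolated `pathlib.Path` per test with automatic cleanup, so the custom factory is mainly useful outside fixtures. The same test using the built-in:

```python
# pytest injects tmp_path (a unique pathlib.Path) into each test and
# manages cleanup of old runs itself; no custom factory needed.
def test_file_operations(tmp_path):
    file_path = tmp_path / "test.txt"
    file_path.write_text("content")
    assert file_path.exists()
```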

9. Database/State Isolation

Current: Uses monkeypatch for env vars.

Enhancement: Database mocking

import pytest
from unittest.mock import MagicMock, patch

@pytest.fixture
def mock_honcho():
    """Mock Honcho client for tests."""
    with patch("honcho_integration.client.HonchoClient") as mock:
        mock_instance = MagicMock()
        mock_instance.get_session.return_value = {"id": "test-session"}
        mock.return_value = mock_instance
        yield mock

# Usage
async def test_memory_storage(mock_honcho):
    # Fast, isolated test
    pass

10. CI Pipeline Optimization

Current Pipeline:

  1. Checkout
  2. Install uv
  3. Install Python
  4. Install deps
  5. Run tests

Optimized Pipeline (with caching):

jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          version: "0.5.x"
      
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          # No pip cache: dependencies are installed with uv and cached below
      
      - name: Cache uv packages
        uses: actions/cache@v4
        with:
          path: ~/.cache/uv
          key: ${{ runner.os }}-uv-${{ hashFiles('**/pyproject.toml') }}
      
      - name: Install dependencies
        run: |
          uv venv .venv
          uv pip install -e ".[all,dev]"
      
      - name: Run fast tests
        run: |
          source .venv/bin/activate
          pytest -m "not integration and not slow" -n auto --tb=short
      
      - name: Run slow tests
        if: github.event_name == 'pull_request'
        run: |
          source .venv/bin/activate
          pytest -m "slow" -n 2 --tb=short

Quick Wins (Implement First)

1. Add Duration Reporting (5 minutes)

--durations=10

2. Mark Slow Tests (30 minutes)

Add @pytest.mark.slow to tests taking >5s.

3. Split Largest Test File (2 hours)

Split test_run_agent.py into:

  • test_run_agent_core.py
  • test_run_agent_tools.py
  • test_run_agent_memory.py
  • test_run_agent_messaging.py

4. Add Coverage Baseline (1 hour)

pytest --cov=agent --cov=tools --cov=gateway tests/ --cov-report=html

5. Optimize Fixture Scopes (1 hour)

Review and optimize 5 most-used fixtures.


Long-term Improvements

Test Data Generation

# Implement hypothesis-based testing
from hypothesis import given, strategies as st

@given(st.lists(st.text(), min_size=1))
def test_message_batching(messages):
    # Property-based testing
    pass

Performance Regression Testing

@pytest.mark.benchmark
def test_message_processing_speed(benchmark):
    # benchmark() returns the benchmarked function's own return value;
    # timing statistics are collected on the benchmark fixture
    result = benchmark(process_messages, sample_data)
    assert result is not None
    assert benchmark.stats.stats.mean < 0.001  # mean seconds per call

Contract Testing

# Verify API contracts between components
@pytest.mark.contract
def test_agent_tool_contract():
    """Verify agent sends correct format to tools."""
    pass
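A minimal sketch of what such a contract test could assert; the payload field names below are assumptions for illustration, not the real Hermes agent/tool schema:

```python
# Verify a tool-call payload carries every key the tool layer expects.
# REQUIRED_TOOL_CALL_KEYS is a hypothetical contract, not the real schema.
REQUIRED_TOOL_CALL_KEYS = {"tool_name", "arguments", "call_id"}

def missing_contract_keys(payload):
    """Return required keys absent from a tool-call payload (empty == OK)."""
    return REQUIRED_TOOL_CALL_KEYS - payload.keys()
```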

Measurement Checklist

After implementing optimizations, verify:

  • Test suite execution time < 5 minutes
  • No individual test > 10 seconds (except integration)
  • Code coverage > 70%
  • All flaky tests marked and retried
  • CI passes consistently (>95% success rate)
  • Memory usage stable (no leaks in test suite)

Tools to Add

[project.optional-dependencies]
dev = [
    "pytest>=9.0.2,<10",
    "pytest-asyncio>=1.3.0,<2",
    "pytest-xdist>=3.0,<4",
    "pytest-cov>=5.0,<6",
    "pytest-rerunfailures>=14.0,<15",
    "pytest-benchmark>=4.0,<5",       # Performance testing
    "pytest-mock>=3.12,<4",            # Enhanced mocking
    "hypothesis>=6.100,<7",            # Property-based testing
    "factory-boy>=3.3,<4",             # Test data factories
]