# Test Optimization Guide for Hermes Agent

## Current Test Execution Analysis

### Test Suite Statistics
- Total Test Files: 373
- Estimated Test Functions: ~4,311
- Async Tests: ~679 (15.8%)
- Integration Tests: 7 files (excluded from CI)
- Average Tests per File: ~11.6
### Current CI Configuration

```yaml
# .github/workflows/tests.yml
- name: Run tests
  run: |
    source .venv/bin/activate
    python -m pytest tests/ -q --ignore=tests/integration --tb=short -n auto
```

Current flags:

- `-q`: quiet mode
- `--ignore=tests/integration`: skip integration tests
- `--tb=short`: short traceback format
- `-n auto`: auto-detect parallel workers
## Optimization Recommendations

### 1. Add Test Duration Reporting

Current: No duration tracking

Recommended:

```yaml
run: |
  # --durations=20 shows the 20 slowest tests;
  # --durations-min=1.0 only reports tests slower than 1s
  python -m pytest tests/ \
    --ignore=tests/integration \
    -n auto \
    --durations=20 \
    --durations-min=1.0
```

This will help identify slow tests that need optimization.
### 2. Implement Test Categories

Add markers to pyproject.toml:

```toml
[tool.pytest.ini_options]
testpaths = ["tests"]
markers = [
    "integration: marks tests requiring external services",
    "slow: marks tests that take >5 seconds",
    "unit: marks fast unit tests",
    "security: marks security-focused tests",
    "flaky: marks tests that may be unstable",
]
addopts = "-m 'not integration and not slow' -n auto"
```
Usage:

```shell
# Run only fast unit tests
pytest -m unit

# Run all tests including slow ones
pytest -m "not integration"

# Run only security tests
pytest -m security
```
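The markers can then be applied directly in test code. A minimal sketch (test names and bodies are illustrative, not existing tests):

```python
import pytest

# Illustrative tests showing how the declared markers are applied
@pytest.mark.unit
def test_parse_config():
    assert True

# Markers can be stacked on a single test
@pytest.mark.integration
@pytest.mark.slow
def test_full_pipeline():
    ...
```

Stacked markers compose with `-m` expressions, e.g. `pytest -m "unit and not slow"`.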
### 3. Optimize Slow Test Candidates

Based on file sizes, these tests likely need optimization:

| File | Lines | Optimization Strategy |
|---|---|---|
| test_run_agent.py | 3,329 | Split into multiple files by feature |
| test_mcp_tool.py | 2,902 | Split by MCP functionality |
| test_voice_command.py | 2,632 | Review for redundant tests |
| test_feishu.py | 2,580 | Mock external API calls |
| test_api_server.py | 1,503 | Parallelize independent tests |
### 4. Add Coverage Reporting to CI

Updated workflow:

```yaml
- name: Run tests with coverage
  run: |
    source .venv/bin/activate
    python -m pytest tests/ \
      --ignore=tests/integration \
      -n auto \
      --cov=agent --cov=tools --cov=gateway --cov=hermes_cli \
      --cov-report=xml \
      --cov-report=html \
      --cov-fail-under=70

- name: Upload coverage to Codecov
  uses: codecov/codecov-action@v3
  with:
    files: ./coverage.xml
    fail_ci_if_error: true
```
### 5. Implement Flaky Test Handling

Add pytest-rerunfailures:

```toml
dev = [
    "pytest>=9.0.2,<10",
    "pytest-asyncio>=1.3.0,<2",
    "pytest-xdist>=3.0,<4",
    "pytest-cov>=5.0,<6",
    "pytest-rerunfailures>=14.0,<15",  # Add this
]
```

Usage:

```python
# Mark known flaky tests (pytest-rerunfailures provides @pytest.mark.flaky)
@pytest.mark.flaky(reruns=3, reruns_delay=1)
async def test_network_dependent_feature():
    # Test that sometimes fails due to network
    pass
```
### 6. Optimize Fixture Scopes

Review conftest.py fixtures:

```python
# Current: Function scope (runs for every test)
@pytest.fixture()
def mock_config():
    return {...}

# Optimized: Session scope (runs once per session)
@pytest.fixture(scope="session")
def mock_config():
    return {...}

# Optimized: Module scope (runs once per module)
@pytest.fixture(scope="module")
def expensive_setup():
    # Setup that can be reused across the module
    pass
```
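One caveat when widening scopes: a session-scoped fixture is shared by every test, so returning mutable state lets one test's modifications leak into the next. A hedged sketch of one way to keep the speed benefit while preserving isolation (fixture names and config keys are illustrative):

```python
import copy

import pytest

# Illustrative baseline config; built once, treated as read-only
BASE_CONFIG = {"model": "default", "timeout": 30}

def fresh_config(base: dict) -> dict:
    """Return an independent deep copy so per-test mutation is safe."""
    return copy.deepcopy(base)

@pytest.fixture(scope="session")
def base_config():
    # Expensive construction happens once per session
    return BASE_CONFIG

@pytest.fixture()
def mock_config(base_config):
    # Cheap per-test copy: tests may mutate freely
    return fresh_config(base_config)
```

This splits the cost: the expensive build runs once, while the per-test copy is cheap and keeps tests independent.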
### 7. Parallel Execution Tuning

Current: `-n auto` (uses all CPUs)

Issues:

- May cause resource contention
- Some tests may not be thread-safe

Recommendations:

```shell
# Limit workers to prevent resource exhaustion
pytest -n 4  # Use 4 workers regardless of CPU count

# Use load-based scheduling for uneven test durations
pytest -n auto --dist=load

# Group tests by module to reduce setup overhead
pytest -n auto --dist=loadscope
```
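For tests that are unsafe under parallelism because they share a port, file, or database, pytest-xdist's built-in `worker_id` fixture can partition resources per worker. A sketch (the port scheme and fixture wiring are illustrative assumptions):

```python
import pytest

def port_for_worker(worker_id: str, base: int = 8000) -> int:
    """Map an xdist worker id ("gw0", "gw1", ... or "master") to a unique port."""
    if worker_id == "master":  # not running under xdist
        return base
    return base + 1 + int(worker_id.lstrip("gw"))

# pytest-xdist injects `worker_id` automatically when tests run distributed
@pytest.fixture()
def server_port(worker_id):
    return port_for_worker(worker_id)
```

Each worker then binds its own port, so tests in different workers never collide.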
### 8. Test Data Management

Current Issue: Tests may create files in /tmp without cleanup

Solution - Factory Pattern:

```python
# tests/factories.py
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def temp_workspace():
    """Create an isolated temp directory for tests."""
    path = tempfile.mkdtemp(prefix="hermes_test_")
    try:
        yield Path(path)
    finally:
        shutil.rmtree(path, ignore_errors=True)

# Usage in tests
def test_file_operations():
    with temp_workspace() as tmp:
        # All file operations happen in the isolated directory
        file_path = tmp / "test.txt"
        file_path.write_text("content")
        assert file_path.exists()
    # Automatically cleaned up
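The same isolation can also be exposed as a fixture so tests don't need the `with` block; a sketch of one possible wrapper (fixture name is an assumption):

```python
import shutil
import tempfile
from pathlib import Path

import pytest

@pytest.fixture()
def workspace():
    """Yield an isolated temp directory, removed after the test."""
    path = Path(tempfile.mkdtemp(prefix="hermes_test_"))
    try:
        yield path
    finally:
        shutil.rmtree(path, ignore_errors=True)

def test_file_operations(workspace):
    (workspace / "test.txt").write_text("content")
    assert (workspace / "test.txt").exists()
```

Note that pytest's built-in `tmp_path` fixture provides similar behavior out of the box; a custom fixture is mainly useful when a specific prefix or shared directory layout is needed.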
### 9. Database/State Isolation

Current: Uses monkeypatch for env vars

Enhancement: Database mocking

```python
from unittest.mock import MagicMock, patch

import pytest

@pytest.fixture
def mock_honcho():
    """Mock the Honcho client for tests."""
    with patch("honcho_integration.client.HonchoClient") as mock:
        mock_instance = MagicMock()
        mock_instance.get_session.return_value = {"id": "test-session"}
        mock.return_value = mock_instance
        yield mock

# Usage
async def test_memory_storage(mock_honcho):
    # Fast, isolated test
    pass
```
### 10. CI Pipeline Optimization

Current Pipeline:

- Checkout
- Install uv
- Install Python
- Install deps
- Run tests

Optimized Pipeline (with caching):

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          version: "0.5.x"

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: 'pip'  # Cache pip dependencies

      - name: Cache uv packages
        uses: actions/cache@v4
        with:
          path: ~/.cache/uv
          key: ${{ runner.os }}-uv-${{ hashFiles('**/pyproject.toml') }}

      - name: Install dependencies
        run: |
          uv venv .venv
          uv pip install -e ".[all,dev]"

      - name: Run fast tests
        run: |
          source .venv/bin/activate
          pytest -m "not integration and not slow" -n auto --tb=short

      - name: Run slow tests
        if: github.event_name == 'pull_request'
        run: |
          source .venv/bin/activate
          pytest -m "slow" -n 2 --tb=short
```
## Quick Wins (Implement First)

### 1. Add Duration Reporting (5 minutes)

```shell
pytest --durations=10
```

### 2. Mark Slow Tests (30 minutes)

Add `@pytest.mark.slow` to tests taking >5s.
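Marking can also be partially automated with a `pytest_collection_modifyitems` hook in conftest.py. A hedged sketch (the directory/naming convention here is an assumption, not the repo's existing layout):

```python
import pytest

def is_slow_test(nodeid: str) -> bool:
    """Convention: tests under tests/slow/ or in *_slow.py files count as slow."""
    path = nodeid.split("::", 1)[0]
    return "/slow/" in path or path.endswith("_slow.py")

# conftest.py: apply the marker automatically at collection time
def pytest_collection_modifyitems(config, items):
    for item in items:
        if is_slow_test(item.nodeid):
            item.add_marker(pytest.mark.slow)
```

Tests moved into the slow location then get the marker without per-test decoration, so `-m "not slow"` keeps working as files are reorganized.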
### 3. Split Largest Test File (2 hours)

Split test_run_agent.py into:

- test_run_agent_core.py
- test_run_agent_tools.py
- test_run_agent_memory.py
- test_run_agent_messaging.py

### 4. Add Coverage Baseline (1 hour)

```shell
pytest --cov=agent --cov=tools --cov=gateway tests/ --cov-report=html
```

### 5. Optimize Fixture Scopes (1 hour)

Review and optimize the 5 most-used fixtures.
## Long-term Improvements

### Test Data Generation

```python
# Implement hypothesis-based testing
from hypothesis import given, strategies as st

@given(st.lists(st.text(), min_size=1))
def test_message_batching(messages):
    # Property-based testing
    pass
```
### Performance Regression Testing

```python
@pytest.mark.benchmark
def test_message_processing_speed(benchmark):
    # benchmark() runs the callable repeatedly and returns its result
    result = benchmark(process_messages, sample_data)
    # Assumes process_messages returns an object exposing throughput
    assert result.throughput > 1000  # msgs/sec
```
### Contract Testing

```python
# Verify API contracts between components
@pytest.mark.contract
def test_agent_tool_contract():
    """Verify the agent sends the correct format to tools."""
    pass
```
## Measurement Checklist

After implementing optimizations, verify:
- Test suite execution time < 5 minutes
- No individual test > 10 seconds (except integration)
- Code coverage > 70%
- All flaky tests marked and retried
- CI passes consistently (>95% success rate)
- Memory usage stable (no leaks in test suite)
## Tools to Add

```toml
[project.optional-dependencies]
dev = [
    "pytest>=9.0.2,<10",
    "pytest-asyncio>=1.3.0,<2",
    "pytest-xdist>=3.0,<4",
    "pytest-cov>=5.0,<6",
    "pytest-rerunfailures>=14.0,<15",
    "pytest-benchmark>=4.0,<5",  # Performance testing
    "pytest-mock>=3.12,<4",      # Enhanced mocking
    "hypothesis>=6.100,<7",      # Property-based testing
    "factory-boy>=3.3,<4",       # Test data factories
]
```