# Test Optimization Guide for Hermes Agent

## Current Test Execution Analysis

### Test Suite Statistics

- **Total Test Files:** 373
- **Estimated Test Functions:** ~4,311
- **Async Tests:** ~679 (15.8%)
- **Integration Tests:** 7 files (excluded from CI)
- **Average Tests per File:** ~11.6
### Current CI Configuration

```yaml
# .github/workflows/tests.yml
- name: Run tests
  run: |
    source .venv/bin/activate
    python -m pytest tests/ -q --ignore=tests/integration --tb=short -n auto
```

**Current Flags:**

- `-q`: Quiet mode
- `--ignore=tests/integration`: Skip integration tests
- `--tb=short`: Short traceback format
- `-n auto`: Auto-detect parallel workers
---

## Optimization Recommendations

### 1. Add Test Duration Reporting

**Current:** No duration tracking

**Recommended:**

```yaml
run: |
  # --durations=20 shows the 20 slowest tests;
  # --durations-min=1.0 only reports tests slower than 1s
  python -m pytest tests/ \
    --ignore=tests/integration \
    -n auto \
    --durations=20 \
    --durations-min=1.0
```

This will help identify slow tests that need optimization.
### 2. Implement Test Categories

Add markers to `pyproject.toml`:

```toml
[tool.pytest.ini_options]
testpaths = ["tests"]
markers = [
    "integration: marks tests requiring external services",
    "slow: marks tests that take >5 seconds",
    "unit: marks fast unit tests",
    "security: marks security-focused tests",
    "flaky: marks tests that may be unstable",
]
addopts = "-m 'not integration and not slow' -n auto"
```
**Usage:**

```bash
# Run only fast unit tests
pytest -m unit

# Run all tests including slow ones
pytest -m "not integration"

# Run only security tests
pytest -m security
```
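Markers are applied as decorators on individual tests; a minimal sketch (the test names below are hypothetical, the marker names match those registered above):

```python
import pytest

# Hypothetical test names; marker names come from pyproject.toml.
@pytest.mark.unit
def test_parse_config():
    assert 1 + 1 == 2

# Markers can be stacked to put a test in several categories.
@pytest.mark.slow
@pytest.mark.security
def test_full_credential_scan():
    assert True
```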
### 3. Optimize Slow Test Candidates

Based on file sizes, these tests likely need optimization:

| File | Lines | Optimization Strategy |
|------|-------|-----------------------|
| `test_run_agent.py` | 3,329 | Split into multiple files by feature |
| `test_mcp_tool.py` | 2,902 | Split by MCP functionality |
| `test_voice_command.py` | 2,632 | Review for redundant tests |
| `test_feishu.py` | 2,580 | Mock external API calls |
| `test_api_server.py` | 1,503 | Parallelize independent tests |
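For the "mock external API calls" strategy, a minimal sketch with `unittest.mock` (the `send_alert` helper and endpoint are hypothetical stand-ins, not the actual Feishu client API):

```python
from unittest.mock import MagicMock

# Hypothetical helper standing in for code that calls the Feishu API.
def send_alert(client, text):
    response = client.post("/messages", json={"text": text})
    return response.status_code

def test_send_alert_without_network():
    # Replace the HTTP client with a mock so the test never hits the network.
    client = MagicMock()
    client.post.return_value = MagicMock(status_code=200)

    assert send_alert(client, "deploy finished") == 200
    client.post.assert_called_once_with("/messages", json={"text": "deploy finished"})
```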
### 4. Add Coverage Reporting to CI

**Updated workflow:**

```yaml
- name: Run tests with coverage
  run: |
    source .venv/bin/activate
    python -m pytest tests/ \
      --ignore=tests/integration \
      -n auto \
      --cov=agent --cov=tools --cov=gateway --cov=hermes_cli \
      --cov-report=xml \
      --cov-report=html \
      --cov-fail-under=70

- name: Upload coverage to Codecov
  uses: codecov/codecov-action@v3
  with:
    files: ./coverage.xml
    fail_ci_if_error: true
```
### 5. Implement Flaky Test Handling

Add `pytest-rerunfailures`:

```toml
dev = [
    "pytest>=9.0.2,<10",
    "pytest-asyncio>=1.3.0,<2",
    "pytest-xdist>=3.0,<4",
    "pytest-cov>=5.0,<6",
    "pytest-rerunfailures>=14.0,<15",  # Add this
]
```

**Usage:**

```python
# Mark known flaky tests with the plugin's `flaky` marker
@pytest.mark.flaky(reruns=3, reruns_delay=1)
async def test_network_dependent_feature():
    # Test that sometimes fails due to network
    pass
```
### 6. Optimize Fixture Scopes

Review `conftest.py` fixtures:

```python
# Current: Function scope (runs for every test)
@pytest.fixture()
def mock_config():
    return {...}

# Optimized: Session scope (runs once per session)
@pytest.fixture(scope="session")
def mock_config():
    return {...}

# Optimized: Module scope (runs once per module)
@pytest.fixture(scope="module")
def expensive_setup():
    # Setup that can be reused across the module
    pass
```
### 7. Parallel Execution Tuning

**Current:** `-n auto` (uses all CPUs)

**Issues:**

- May cause resource contention
- Some tests may not be safe to run concurrently

**Recommendations:**

```bash
# Limit workers to prevent resource exhaustion
pytest -n 4  # Use 4 workers regardless of CPU count

# Use load-based scheduling for uneven test durations
pytest -n auto --dist=load

# Group tests by module to reduce setup overhead
pytest -n auto --dist=loadscope
```
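Tests that contend for one shared resource can also be pinned to a single worker with pytest-xdist's `xdist_group` marker and `--dist=loadgroup`; a sketch (the test names and the "db" group are hypothetical):

```python
import pytest

# With `pytest -n auto --dist=loadgroup`, all tests in the "db" group
# run on the same worker, serializing access to the shared database.
@pytest.mark.xdist_group(name="db")
def test_writes_to_shared_db():
    assert True

@pytest.mark.xdist_group(name="db")
def test_reads_from_shared_db():
    assert True
```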
### 8. Test Data Management

**Current Issue:** Tests may create files in `/tmp` without cleanup

**Solution - Factory Pattern:**

```python
# tests/factories.py
import tempfile
import shutil
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def temp_workspace():
    """Create an isolated temp directory for tests."""
    path = tempfile.mkdtemp(prefix="hermes_test_")
    try:
        yield Path(path)
    finally:
        shutil.rmtree(path, ignore_errors=True)

# Usage in tests
def test_file_operations():
    with temp_workspace() as tmp:
        # All file operations happen in the isolated directory
        file_path = tmp / "test.txt"
        file_path.write_text("content")
        assert file_path.exists()
    # Automatically cleaned up
```
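pytest's built-in `tmp_path` fixture provides the same isolation with no custom code; a sketch of the equivalent test:

```python
# tmp_path is a pathlib.Path to a fresh, per-test temporary directory
# that pytest creates and cleans up automatically.
def test_file_operations_tmp_path(tmp_path):
    file_path = tmp_path / "test.txt"
    file_path.write_text("content")
    assert file_path.exists()
```

The custom factory remains useful outside pytest or when a non-default prefix matters.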
### 9. Database/State Isolation

**Current:** Uses `monkeypatch` for env vars
**Enhancement:** Database mocking

```python
from unittest.mock import MagicMock, patch

import pytest

@pytest.fixture
def mock_honcho():
    """Mock the Honcho client for tests."""
    with patch("honcho_integration.client.HonchoClient") as mock:
        mock_instance = MagicMock()
        mock_instance.get_session.return_value = {"id": "test-session"}
        mock.return_value = mock_instance
        yield mock

# Usage
async def test_memory_storage(mock_honcho):
    # Fast, isolated test
    pass
```
### 10. CI Pipeline Optimization

**Current Pipeline:**

1. Checkout
2. Install uv
3. Install Python
4. Install deps
5. Run tests

**Optimized Pipeline (with caching):**

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10

    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          version: "0.5.x"

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: 'pip'  # Cache pip dependencies

      - name: Cache uv packages
        uses: actions/cache@v4
        with:
          path: ~/.cache/uv
          key: ${{ runner.os }}-uv-${{ hashFiles('**/pyproject.toml') }}

      - name: Install dependencies
        run: |
          uv venv .venv
          uv pip install -e ".[all,dev]"

      - name: Run fast tests
        run: |
          source .venv/bin/activate
          pytest -m "not integration and not slow" -n auto --tb=short

      - name: Run slow tests
        if: github.event_name == 'pull_request'
        run: |
          source .venv/bin/activate
          pytest -m "slow" -n 2 --tb=short
```
---

## Quick Wins (Implement First)

### 1. Add Duration Reporting (5 minutes)

```yaml
--durations=10
```

### 2. Mark Slow Tests (30 minutes)

Add `@pytest.mark.slow` to tests taking >5s.

### 3. Split Largest Test File (2 hours)

Split `test_run_agent.py` into:

- `test_run_agent_core.py`
- `test_run_agent_tools.py`
- `test_run_agent_memory.py`
- `test_run_agent_messaging.py`

### 4. Add Coverage Baseline (1 hour)

```bash
pytest --cov=agent --cov=tools --cov=gateway tests/ --cov-report=html
```

### 5. Optimize Fixture Scopes (1 hour)

Review and optimize the 5 most-used fixtures.
---

## Long-term Improvements

### Test Data Generation

```python
# Implement hypothesis-based testing
from hypothesis import given, strategies as st

@given(st.lists(st.text(), min_size=1))
def test_message_batching(messages):
    # Property-based testing: must hold for any non-empty list of strings
    pass
```

### Performance Regression Testing

```python
@pytest.mark.benchmark
def test_message_processing_speed(benchmark):
    # Assumes process_messages returns an object exposing its throughput
    result = benchmark(process_messages, sample_data)
    assert result.throughput > 1000  # msgs/sec
```

### Contract Testing

```python
# Verify API contracts between components
@pytest.mark.contract
def test_agent_tool_contract():
    """Verify the agent sends the correct format to tools."""
    pass
```
---

## Measurement Checklist

After implementing optimizations, verify:

- [ ] Test suite execution time < 5 minutes
- [ ] No individual test > 10 seconds (except integration)
- [ ] Code coverage > 70%
- [ ] All flaky tests marked and retried
- [ ] CI passes consistently (>95% success rate)
- [ ] Memory usage stable (no leaks in test suite)

---
## Tools to Add

```toml
[project.optional-dependencies]
dev = [
    "pytest>=9.0.2,<10",
    "pytest-asyncio>=1.3.0,<2",
    "pytest-xdist>=3.0,<4",
    "pytest-cov>=5.0,<6",
    "pytest-rerunfailures>=14.0,<15",
    "pytest-benchmark>=4.0,<5",   # Performance testing
    "pytest-mock>=3.12,<4",       # Enhanced mocking
    "hypothesis>=6.100,<7",       # Property-based testing
    "factory-boy>=3.3,<4",        # Test data factories
]
```