hermes-agent/TEST_OPTIMIZATION_GUIDE.md

# Test Optimization Guide for Hermes Agent

## Current Test Execution Analysis

### Test Suite Statistics
- **Total Test Files:** 373
- **Estimated Test Functions:** ~4,311
- **Async Tests:** ~679 (15.8%)
- **Integration Tests:** 7 files (excluded from CI)
- **Average Tests per File:** ~11.6

### Current CI Configuration
```yaml
# .github/workflows/tests.yml
- name: Run tests
  run: |
    source .venv/bin/activate
    python -m pytest tests/ -q --ignore=tests/integration --tb=short -n auto
```

**Current Flags:**
- `-q`: Quiet mode
- `--ignore=tests/integration`: Skip integration tests
- `--tb=short`: Short traceback format
- `-n auto`: Auto-detect parallel workers

---

## Optimization Recommendations

### 1. Add Test Duration Reporting

**Current:** No duration tracking
**Recommended:**
```yaml
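run: |
  # --durations=20 lists the 20 slowest tests;
  # --durations-min=1.0 hides anything faster than 1 second.
  # Comments cannot follow a trailing backslash, so they sit up here.
  python -m pytest tests/ \
    --ignore=tests/integration \
    -n auto \
    --durations=20 \
    --durations-min=1.0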
```

This will help identify slow tests that need optimization.

### 2. Implement Test Categories

Add markers to `pyproject.toml`:
```toml
[tool.pytest.ini_options]
testpaths = ["tests"]
markers = [
    "integration: marks tests requiring external services",
    "slow: marks tests that take >5 seconds",
    "unit: marks fast unit tests",
    "security: marks security-focused tests",
    "flaky: marks tests that may be unstable",
]
addopts = "-m 'not integration and not slow' -n auto"
```

**Usage:**
```bash
# Run only fast unit tests (a -m on the command line replaces the one from addopts)
pytest -m unit

# Run all tests including slow ones
pytest -m "not integration"

# Run only security tests
pytest -m security
```
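
Markers can also be applied by location instead of by hand. The sketch below is one way to do it in `conftest.py`, assuming integration tests live under a directory named `integration` (as `tests/integration` does here) and pytest 7 or later, where collected items expose `item.path`. With such a hook in place, the `-m 'not integration'` filter could take over from `--ignore=tests/integration`.

```python
# conftest.py (sketch): derive the `integration` marker from a test's location
from pathlib import Path

INTEGRATION_DIR = "integration"  # assumption: matches tests/integration in this repo


def pytest_collection_modifyitems(config, items):
    """Mark every test collected from an integration directory."""
    for item in items:
        # item.path is a pathlib.Path on pytest 7 and later
        if INTEGRATION_DIR in Path(item.path).parts:
            item.add_marker("integration")
```

The hook only adds markers; deselection still happens through `-m`.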

### 3. Optimize Slow Test Candidates

File length is only a proxy for run time, so confirm with `--durations` output before refactoring. Going by size, these files are the likeliest candidates:

| File | Lines | Optimization Strategy |
|------|-------|----------------------|
| `test_run_agent.py` | 3,329 | Split into multiple files by feature |
| `test_mcp_tool.py` | 2,902 | Split by MCP functionality |
| `test_voice_command.py` | 2,632 | Review for redundant tests |
| `test_feishu.py` | 2,580 | Mock external API calls |
| `test_api_server.py` | 1,503 | Parallelize independent tests |

### 4. Add Coverage Reporting to CI

**Updated workflow:**
```yaml
- name: Run tests with coverage
  run: |
    source .venv/bin/activate
    python -m pytest tests/ \
      --ignore=tests/integration \
      -n auto \
      --cov=agent --cov=tools --cov=gateway --cov=hermes_cli \
      --cov-report=xml \
      --cov-report=html \
      --cov-fail-under=70

- name: Upload coverage to Codecov
  uses: codecov/codecov-action@v3
  with:
    files: ./coverage.xml
    fail_ci_if_error: true
```

### 5. Implement Flaky Test Handling

Add `pytest-rerunfailures`:
```toml
dev = [
    "pytest>=9.0.2,<10",
    "pytest-asyncio>=1.3.0,<2",
    "pytest-xdist>=3.0,<4",
    "pytest-cov>=5.0,<6",
    "pytest-rerunfailures>=14.0,<15",  # Add this
]
```

**Usage:**
```python
import pytest

# Mark known flaky tests; pytest-rerunfailures provides the `flaky` marker
@pytest.mark.flaky(reruns=3, reruns_delay=1)
async def test_network_dependent_feature():
    # Test that sometimes fails due to network
    pass
```

### 6. Optimize Fixture Scopes

Review `conftest.py` fixtures:

```python
import pytest

# Current: function scope (runs for every test)
@pytest.fixture()
def mock_config():
    return {...}

# Optimized: session scope (runs once per session, or once per worker under xdist).
# Only safe if no test mutates the returned object.
@pytest.fixture(scope="session")
def mock_config():
    return {...}

# Optimized: module scope (runs once per module)
@pytest.fixture(scope="module")
def expensive_setup():
    # Setup that can be reused across the module
    pass
```
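
Widening a fixture's scope means every test shares one object, so a single test that mutates it can break others. One way to guard against that, sketched here under the assumption that the config is a flat dictionary, is to hand out a read-only view with `types.MappingProxyType`. `build_mock_config` and its keys are placeholders.

```python
from types import MappingProxyType


def build_mock_config():
    """Build the config once and expose it as a read-only mapping."""
    config = {"model": "test-model", "max_turns": 3}  # placeholder values
    return MappingProxyType(config)


# In conftest.py the builder would back a session-scoped fixture:
#
#     @pytest.fixture(scope="session")
#     def mock_config():
#         return build_mock_config()


def test_cannot_mutate_shared_config():
    config = build_mock_config()
    try:
        config["model"] = "other"
    except TypeError:
        pass  # mappingproxy rejects item assignment
    else:
        raise AssertionError("shared config should be read-only")
```

The proxy is shallow: nested dictionaries or lists inside the config would still be mutable, so deeply nested fixtures may be better left at function scope.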

### 7. Parallel Execution Tuning

**Current:** `-n auto` (uses all CPUs)
**Issues:**
- May cause resource contention
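- Some tests share state (files, ports, environment variables) and may not be safe to run in parallel; xdist workers are separate processes, so thread safety is not the concern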

**Recommendations:**
```bash
# Limit workers to prevent resource exhaustion
pytest -n 4  # Use 4 workers regardless of CPU count

# Send each pending test to whichever worker is free (already the default with -n)
pytest -n auto --dist=load

# Group tests by module to reduce setup overhead
pytest -n auto --dist=loadscope
```
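
If a fixed `-n 4` is too rigid, pytest-xdist also lets `conftest.py` decide what `-n auto` means through its `pytest_xdist_auto_num_workers` hook. This is a minimal sketch, assuming a cap of four workers suits the CI runner; `MAX_WORKERS` is an illustrative name.

```python
# conftest.py (sketch): cap the worker count chosen by `-n auto`
import os

MAX_WORKERS = 4  # assumption: tune to the CI runner's CPU and memory


def pytest_xdist_auto_num_workers(config):
    """Return the worker count that `-n auto` should use."""
    available = os.cpu_count() or 1  # cpu_count() can return None
    return max(1, min(available, MAX_WORKERS))
```

This keeps `-n auto` in the CI command while preventing large runners from spawning more workers than the tests can tolerate.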

### 8. Test Data Management

**Current Issue:** Tests may create files in `/tmp` without cleanup

**Solution - Factory Pattern:**
```python
# tests/factories.py
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def temp_workspace():
    """Create isolated temp directory for tests."""
    path = tempfile.mkdtemp(prefix="hermes_test_")
    try:
        yield Path(path)
    finally:
        shutil.rmtree(path, ignore_errors=True)

# Usage in tests
def test_file_operations():
    with temp_workspace() as tmp:
        # All file operations in isolated directory
        file_path = tmp / "test.txt"
        file_path.write_text("content")
        assert file_path.exists()
    # Automatically cleaned up
```
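
For tests that only need a scratch directory, pytest's built-in `tmp_path` fixture may be enough on its own. It hands each test a unique `pathlib.Path`, and pytest prunes old base directories itself (by default it keeps those from the last three runs). A minimal sketch of the same test written against it:

```python
def test_file_operations(tmp_path):
    # tmp_path is a pathlib.Path to a directory unique to this test
    file_path = tmp_path / "test.txt"
    file_path.write_text("content")
    assert file_path.read_text() == "content"
```

A custom context manager remains useful where a temporary directory is needed outside a test function, for example inside a helper script.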

### 9. Database/State Isolation

**Current:** Uses `monkeypatch` for env vars
**Enhancement:** Database mocking

```python
from unittest.mock import MagicMock, patch

import pytest

@pytest.fixture
def mock_honcho():
    """Mock Honcho client for tests."""
    with patch("honcho_integration.client.HonchoClient") as mock:
        mock_instance = MagicMock()
        mock_instance.get_session.return_value = {"id": "test-session"}
        mock.return_value = mock_instance
        yield mock

# Usage
async def test_memory_storage(mock_honcho):
    # Fast, isolated test
    pass
```
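
The test body above is left empty. As an illustration of what such a test might assert, the sketch below uses a hypothetical `MemoryStore` class and an `add_message` client method; neither name comes from Hermes or Honcho, and the client is injected directly so the example runs without `patch`.

```python
from unittest.mock import MagicMock


class MemoryStore:
    """Hypothetical wrapper that writes messages through a client object."""

    def __init__(self, client):
        self.client = client

    def save(self, text):
        session = self.client.get_session()
        self.client.add_message(session["id"], text)  # hypothetical client method
        return session["id"]


def test_memory_storage():
    client = MagicMock()
    client.get_session.return_value = {"id": "test-session"}

    store = MemoryStore(client)

    # The store should report the session it wrote to...
    assert store.save("hello") == "test-session"
    # ...and forward the message to the client exactly once
    client.add_message.assert_called_once_with("test-session", "hello")
```

Asserting on the calls made to the mock, not only on return values, is what makes the test catch regressions in how the client is used.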

### 10. CI Pipeline Optimization

**Current Pipeline:**
1. Checkout
2. Install uv
3. Install Python
4. Install deps
5. Run tests

**Optimized Pipeline (with caching):**
```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10

    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          version: "0.5.x"

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: 'pip'  # Caches pip downloads only; uv packages are cached by the next step

      - name: Cache uv packages
        uses: actions/cache@v4
        with:
          path: ~/.cache/uv
          key: ${{ runner.os }}-uv-${{ hashFiles('**/pyproject.toml') }}

      - name: Install dependencies
        run: |
          uv venv .venv
          uv pip install -e ".[all,dev]"

      - name: Run fast tests
        run: |
          source .venv/bin/activate
          pytest -m "not integration and not slow" -n auto --tb=short

      - name: Run slow tests
        if: github.event_name == 'pull_request'
        run: |
          source .venv/bin/activate
          pytest -m "slow" -n 2 --tb=short
```

---

## Quick Wins (Implement First)

### 1. Add Duration Reporting (5 minutes)
```bash
pytest tests/ --durations=10
```

### 2. Mark Slow Tests (30 minutes)
Add `@pytest.mark.slow` to tests taking >5s.
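
Finding the tests to mark can be partly automated. The sketch below is one possible `conftest.py` addition that lists tests whose call phase exceeded 5 seconds but that lack the `slow` marker; the module-level `unmarked_slow` list and the message wording are illustrative choices.

```python
# conftest.py (sketch): report tests that look slow but are not marked `slow`
SLOW_THRESHOLD_SECONDS = 5.0
unmarked_slow = []  # (nodeid, duration) pairs collected during the run


def pytest_runtest_logreport(report):
    """Record call-phase reports that exceed the threshold without the marker."""
    if (
        report.when == "call"
        and report.duration > SLOW_THRESHOLD_SECONDS
        and "slow" not in report.keywords
    ):
        unmarked_slow.append((report.nodeid, report.duration))


def pytest_terminal_summary(terminalreporter):
    """Print the offenders, slowest first, at the end of the run."""
    for nodeid, duration in sorted(unmarked_slow, key=lambda pair: -pair[1]):
        terminalreporter.write_line(f"unmarked slow test: {nodeid} ({duration:.1f}s)")
```

Only the call phase is timed here, so slow fixtures would need a separate check on the setup phase.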

### 3. Split Largest Test File (2 hours)
Split `test_run_agent.py` into:
- `test_run_agent_core.py`
- `test_run_agent_tools.py`
- `test_run_agent_memory.py`
- `test_run_agent_messaging.py`

### 4. Add Coverage Baseline (1 hour)
```bash
pytest tests/ --cov=agent --cov=tools --cov=gateway --cov=hermes_cli --cov-report=html
```

### 5. Optimize Fixture Scopes (1 hour)
Review the five most-used fixtures and widen their scope wherever tests do not mutate what they return.

---

## Long-term Improvements

### Test Data Generation
```python
# Implement hypothesis-based testing
from hypothesis import given, strategies as st

@given(st.lists(st.text(), min_size=1))
def test_message_batching(messages):
    # Property-based testing
    pass
```

### Performance Regression Testing
```python
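# process_messages and sample_data stand in for the code under test
@pytest.mark.benchmark
def test_message_processing_speed(benchmark):
    # benchmark() returns the function's own return value, not timing data
    benchmark(process_messages, sample_data)
    # Mean seconds per call; timing is disabled under xdist, so run with -n 0
    mean_seconds = benchmark.stats.stats.mean
    assert len(sample_data) / mean_seconds > 1000  # msgs/sec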
```

### Contract Testing
```python
# Verify API contracts between components
@pytest.mark.contract
def test_agent_tool_contract():
    """Verify agent sends correct format to tools."""
    pass
```
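
As a sketch of what a contract test could check, the example below validates a tool-call payload against a table of field names and types. The payload shape, `TOOL_CALL_CONTRACT`, `build_tool_call`, and `contract_violations` are all hypothetical stand-ins; the real agent-to-tool format should be substituted.

```python
import json

# Hypothetical contract: every tool call carries these fields with these types
TOOL_CALL_CONTRACT = {"name": str, "arguments": dict, "call_id": str}


def build_tool_call(name, arguments, call_id):
    """Stand-in for the agent code that produces a tool-call payload."""
    return {"name": name, "arguments": arguments, "call_id": call_id}


def contract_violations(payload, contract=TOOL_CALL_CONTRACT):
    """Return human-readable problems; an empty list means the payload conforms."""
    problems = []
    for field, expected in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    return problems


def test_agent_tool_contract():
    payload = build_tool_call("read_file", {"path": "notes.txt"}, "call-1")
    assert contract_violations(payload) == []
    json.dumps(payload)  # the payload must also be JSON-serializable
```

Keeping the contract in one table lets the producing and consuming sides import the same definition, so a format change fails in a single, obvious place.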

---

## Measurement Checklist

After implementing optimizations, verify:

- [ ] Test suite execution time < 5 minutes
- [ ] No individual test > 10 seconds (except integration)
- [ ] Code coverage > 70%
- [ ] All flaky tests marked and retried
- [ ] CI passes consistently (>95% success rate)
- [ ] Memory usage stable (no leaks in test suite)

---

## Tools to Add

```toml
[project.optional-dependencies]
dev = [
    "pytest>=9.0.2,<10",
    "pytest-asyncio>=1.3.0,<2",
    "pytest-xdist>=3.0,<4",
    "pytest-cov>=5.0,<6",
    "pytest-rerunfailures>=14.0,<15",
    "pytest-benchmark>=4.0,<5",       # Performance testing
    "pytest-mock>=3.12,<4",            # Enhanced mocking
    "hypothesis>=6.100,<7",            # Property-based testing
    "factory-boy>=3.3,<4",             # Test data factories
]
```