skills/software-development/test-driven-development/SKILL.md

---
name: test-driven-development
description: Use when implementing any feature or bugfix, before writing implementation code. Enforces RED-GREEN-REFACTOR cycle with test-first approach.
version: 1.0.0
author: Hermes Agent (adapted from Superpowers)
license: MIT
metadata:
  hermes:
    tags: [testing, tdd, development, quality, red-green-refactor]
    related_skills: [systematic-debugging, writing-plans, subagent-driven-development]
---

# Test-Driven Development (TDD)

## Overview

Write the test first. Watch it fail. Write minimal code to pass.

**Core principle:** If you didn't watch the test fail, you don't know if it tests the right thing.

**Violating the letter of the rules is violating the spirit of the rules.**

## When to Use

**Always:**
- New features
- Bug fixes
- Refactoring
- Behavior changes

**Exceptions (ask your human partner):**
- Throwaway prototypes
- Generated code
- Configuration files

Thinking "skip TDD just this once"? Stop. That's rationalization.

## The Iron Law

```
NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
```

Write code before the test? Delete it. Start over.

**No exceptions:**
- Don't keep it as "reference"
- Don't "adapt" it while writing tests
- Don't look at it
- Delete means delete

Implement fresh from tests. Period.

## Red-Green-Refactor Cycle

### RED - Write Failing Test

Write one minimal test showing what should happen.

**Good Example:**
```python
def test_retries_failed_operations_3_times():
    attempts = 0
    def operation():
        nonlocal attempts
        attempts += 1
        if attempts < 3:
            raise Exception('fail')
        return 'success'
    
    result = retry_operation(operation)
    
    assert result == 'success'
    assert attempts == 3
```
- Clear name, tests real behavior, one thing

**Bad Example:**
```python
def test_retry_works():
    mock = MagicMock()
    mock.side_effect = [Exception(), Exception(), 'success']
    
    result = retry_operation(mock)
    
    assert result == 'success'  # What about retry count? Timing?
```
- Vague name, mocks behavior not reality, unclear what it tests

### Verify RED

Run the test. It MUST fail.

**If it passes:**
- Test is wrong (not testing what you think)
- Code already exists (delete it, start over)
- Wrong test file running

**What to check:**
- Error message makes sense
- Fails for expected reason
- Stack trace points to right place

### GREEN - Minimal Code

Write just enough code to pass. Nothing more.

**Good:**
```python
def add(a, b):
    return a + b  # Nothing extra
```

**Bad:**
```python
def add(a, b):
    result = a + b
    logging.info(f"Adding {a} + {b} = {result}")  # Extra!
    return result
```

Cheating is OK in GREEN:
- Hardcode return values
- Copy-paste
- Duplicate code
- Skip edge cases

**We'll fix it in refactor.**

### Verify GREEN

Run tests. All must pass.

**If fails:**
- Fix minimal code
- Don't expand scope
- Stay in GREEN

### REFACTOR - Clean Up

Now improve the code while keeping tests green.

**Safe refactorings:**
- Rename variables/functions
- Extract helper functions
- Remove duplication
- Simplify expressions
- Improve readability

**Golden rule:** Tests stay green throughout.

**If tests fail during refactor:**
- Undo immediately
- Smaller refactoring steps
- Check you didn't change behavior

## Implementation Workflow

### 1. Create Test File

```bash
# Python
touch tests/test_feature.py

# JavaScript
touch tests/feature.test.js

# Rust
touch tests/feature_tests.rs
```

### 2. Write First Failing Test

```python
# tests/test_calculator.py
def test_adds_two_numbers():
    calc = Calculator()
    result = calc.add(2, 3)
    assert result == 5
```

### 3. Run and Verify Failure

```bash
pytest tests/test_calculator.py -v
# Expected: FAIL - Calculator not defined
```

### 4. Write Minimal Implementation

```python
# src/calculator.py
class Calculator:
    def add(self, a, b):
        return a + b  # Minimal!
```

### 5. Run and Verify Pass

```bash
pytest tests/test_calculator.py -v
# Expected: PASS
```

### 6. Commit

```bash
git add tests/test_calculator.py src/calculator.py
git commit -m "feat: add calculator with add method"
```

### 7. Next Test

```python
def test_adds_negative_numbers():
    calc = Calculator()
    result = calc.add(-2, -3)
    assert result == -5
```

Repeat cycle.

## Testing Anti-Patterns

### Mocking What You Don't Own

**Bad:** Mock database, HTTP client, file system
**Good:** Abstract behind interface, test interface

### Testing Implementation Details

**Bad:** Test that function was called
**Good:** Test the result/behavior

### Happy Path Only

**Bad:** Only test expected inputs
**Good:** Test edge cases, errors, boundaries

### Brittle Tests

**Bad:** Test breaks when refactoring
**Good:** Tests verify behavior, not structure

## Common Pitfalls

### "I'll Write Tests After"

No, you won't. Write them first.

### "This is Too Simple to Test"

Simple bugs cause complex problems. Test everything.

### "I Need to See It Work First"

Temporary code becomes permanent. Test first.

### "Tests Take Too Long"

Untested code takes longer. Invest in tests.

## Language-Specific Commands

### Python (pytest)

```bash
# Run all tests
pytest

# Run specific test
pytest tests/test_feature.py::test_name -v

# Run with coverage
pytest --cov=src --cov-report=term-missing

# Watch mode
pytest-watch
```

### JavaScript (Jest)

```bash
# Run all tests
npm test

# Run specific test
npm test -- test_name

# Watch mode
npm test -- --watch

# Coverage
npm test -- --coverage
```

### TypeScript (Jest with ts-jest)

```bash
# Run tests
npx jest

# Run specific file
npx jest tests/feature.test.ts
```

### Go

```bash
# Run all tests
go test ./...

# Run specific test
go test -run TestName

# Verbose
go test -v

# Coverage
go test -cover
```

### Rust

```bash
# Run tests
cargo test

# Run specific test
cargo test test_name

# Show output
cargo test -- --nocapture
```

## Integration with Other Skills

### With writing-plans

Every plan task should specify:
- What test to write
- Expected test failure
- Minimal implementation
- Refactoring opportunities

### With systematic-debugging

When fixing bugs:
1. Write test that reproduces bug
2. Verify test fails (RED)
3. Fix bug (GREEN)
4. Refactor if needed

### With subagent-driven-development

Subagent implements one test at a time:
1. Write failing test
2. Minimal code to pass
3. Commit
4. Next test

## Success Indicators

**You're doing TDD right when:**
- Tests fail before code exists
- You write <10 lines between test runs
- Refactoring feels safe
- Bugs are caught immediately
- Code is simpler than expected

**Red flags:**
- Writing 50+ lines without running tests
- Tests always pass
- Fear of refactoring
- "I'll test later"

## Remember

1. **RED** - Write failing test
2. **GREEN** - Minimal code to pass  
3. **REFACTOR** - Clean up safely
4. **REPEAT** - Next behavior

**If you didn't see it fail, it doesn't test what you think.**
feat: Add Superpowers software development skills Add 5 new skills for professional software development workflows, adapted from the Superpowers project ( obra/superpowers ): - test-driven-development: RED-GREEN-REFACTOR cycle enforcement - systematic-debugging: 4-phase root cause investigation - subagent-driven-development: Structured delegation with two-stage review - writing-plans: Comprehensive implementation planning - requesting-code-review: Systematic code review process These skills provide structured development workflows that transform Hermes from a general assistant into a professional software engineer with defined processes for quality assurance. Skills are organized under software-development category and follow Hermes skill format with proper frontmatter, examples, and integration guidance with existing skills. 2026-02-27 15:32:58 +01:00			`---`
			`name: test-driven-development`
			`description: Use when implementing any feature or bugfix, before writing implementation code. Enforces RED-GREEN-REFACTOR cycle with test-first approach.`
			`version: 1.0.0`
			`author: Hermes Agent (adapted from Superpowers)`
			`license: MIT`
			`metadata:`
			`hermes:`
			`tags: [testing, tdd, development, quality, red-green-refactor]`
			`related_skills: [systematic-debugging, writing-plans, subagent-driven-development]`
			`---`

			`# Test-Driven Development (TDD)`

			`## Overview`

			`Write the test first. Watch it fail. Write minimal code to pass.`

			`Core principle: If you didn't watch the test fail, you don't know if it tests the right thing.`

			`Violating the letter of the rules is violating the spirit of the rules.`

			`## When to Use`

			`Always:`
			`- New features`
			`- Bug fixes`
			`- Refactoring`
			`- Behavior changes`

			`Exceptions (ask your human partner):`
			`- Throwaway prototypes`
			`- Generated code`
			`- Configuration files`

			`Thinking "skip TDD just this once"? Stop. That's rationalization.`

			`## The Iron Law`

			```
			`NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST`
			```

			`Write code before the test? Delete it. Start over.`

			`No exceptions:`
			`- Don't keep it as "reference"`
			`- Don't "adapt" it while writing tests`
			`- Don't look at it`
			`- Delete means delete`

			`Implement fresh from tests. Period.`

			`## Red-Green-Refactor Cycle`

			`### RED - Write Failing Test`

			`Write one minimal test showing what should happen.`

			`Good Example:`
			```python
			`def test_retries_failed_operations_3_times():`
			`attempts = 0`
			`def operation():`
			`nonlocal attempts`
			`attempts += 1`
			`if attempts < 3:`
			`raise Exception('fail')`
			`return 'success'`

			`result = retry_operation(operation)`

			`assert result == 'success'`
			`assert attempts == 3`
			```
			`- Clear name, tests real behavior, one thing`

			`Bad Example:`
			```python
			`def test_retry_works():`
			`mock = MagicMock()`
			`mock.side_effect = [Exception(), Exception(), 'success']`

			`result = retry_operation(mock)`

			`assert result == 'success' # What about retry count? Timing?`
			```
			`- Vague name, mocks behavior not reality, unclear what it tests`

			`### Verify RED`

			`Run the test. It MUST fail.`

			`If it passes:`
			`- Test is wrong (not testing what you think)`
			`- Code already exists (delete it, start over)`
			`- Wrong test file running`

			`What to check:`
			`- Error message makes sense`
			`- Fails for expected reason`
			`- Stack trace points to right place`

			`### GREEN - Minimal Code`

			`Write just enough code to pass. Nothing more.`

			`Good:`
			```python
			`def add(a, b):`
			`return a + b # Nothing extra`
			```

			`Bad:`
			```python
			`def add(a, b):`
			`result = a + b`
			`logging.info(f"Adding {a} + {b} = {result}") # Extra!`
			`return result`
			```

			`Cheating is OK in GREEN:`
			`- Hardcode return values`
			`- Copy-paste`
			`- Duplicate code`
			`- Skip edge cases`

			`We'll fix it in refactor.`

			`### Verify GREEN`

			`Run tests. All must pass.`

			`If fails:`
			`- Fix minimal code`
			`- Don't expand scope`
			`- Stay in GREEN`

			`### REFACTOR - Clean Up`

			`Now improve the code while keeping tests green.`

			`Safe refactorings:`
			`- Rename variables/functions`
			`- Extract helper functions`
			`- Remove duplication`
			`- Simplify expressions`
			`- Improve readability`

			`Golden rule: Tests stay green throughout.`

			`If tests fail during refactor:`
			`- Undo immediately`
			`- Smaller refactoring steps`
			`- Check you didn't change behavior`

			`## Implementation Workflow`

			`### 1. Create Test File`

			```bash
			`# Python`
			`touch tests/test_feature.py`

			`# JavaScript`
			`touch tests/feature.test.js`

			`# Rust`
			`touch tests/feature_tests.rs`
			```

			`### 2. Write First Failing Test`

			```python
			`# tests/test_calculator.py`
			`def test_adds_two_numbers():`
			`calc = Calculator()`
			`result = calc.add(2, 3)`
			`assert result == 5`
			```

			`### 3. Run and Verify Failure`

			```bash
			`pytest tests/test_calculator.py -v`
			`# Expected: FAIL - Calculator not defined`
			```

			`### 4. Write Minimal Implementation`

			```python
			`# src/calculator.py`
			`class Calculator:`
			`def add(self, a, b):`
			`return a + b # Minimal!`
			```

			`### 5. Run and Verify Pass`

			```bash
			`pytest tests/test_calculator.py -v`
			`# Expected: PASS`
			```

			`### 6. Commit`

			```bash
			`git add tests/test_calculator.py src/calculator.py`
			`git commit -m "feat: add calculator with add method"`
			```

			`### 7. Next Test`

			```python
			`def test_adds_negative_numbers():`
			`calc = Calculator()`
			`result = calc.add(-2, -3)`
			`assert result == -5`
			```

			`Repeat cycle.`

			`## Testing Anti-Patterns`

			`### Mocking What You Don't Own`

			`Bad: Mock database, HTTP client, file system`
			`Good: Abstract behind interface, test interface`

			`### Testing Implementation Details`

			`Bad: Test that function was called`
			`Good: Test the result/behavior`

			`### Happy Path Only`

			`Bad: Only test expected inputs`
			`Good: Test edge cases, errors, boundaries`

			`### Brittle Tests`

			`Bad: Test breaks when refactoring`
			`Good: Tests verify behavior, not structure`

			`## Common Pitfalls`

			`### "I'll Write Tests After"`

			`No, you won't. Write them first.`

			`### "This is Too Simple to Test"`

			`Simple bugs cause complex problems. Test everything.`

			`### "I Need to See It Work First"`

			`Temporary code becomes permanent. Test first.`

			`### "Tests Take Too Long"`

			`Untested code takes longer. Invest in tests.`

			`## Language-Specific Commands`

			`### Python (pytest)`

			```bash
			`# Run all tests`
			`pytest`

			`# Run specific test`
			`pytest tests/test_feature.py::test_name -v`

			`# Run with coverage`
			`pytest --cov=src --cov-report=term-missing`

			`# Watch mode`
			`pytest-watch`
			```

			`### JavaScript (Jest)`

			```bash
			`# Run all tests`
			`npm test`

			`# Run specific test`
			`npm test -- test_name`

			`# Watch mode`
			`npm test -- --watch`

			`# Coverage`
			`npm test -- --coverage`
			```

			`### TypeScript (Jest with ts-jest)`

			```bash
			`# Run tests`
			`npx jest`

			`# Run specific file`
			`npx jest tests/feature.test.ts`
			```

			`### Go`

			```bash
			`# Run all tests`
			`go test ./...`

			`# Run specific test`
			`go test -run TestName`

			`# Verbose`
			`go test -v`

			`# Coverage`
			`go test -cover`
			```

			`### Rust`

			```bash
			`# Run tests`
			`cargo test`

			`# Run specific test`
			`cargo test test_name`

			`# Show output`
			`cargo test -- --nocapture`
			```

			`## Integration with Other Skills`

			`### With writing-plans`

			`Every plan task should specify:`
			`- What test to write`
			`- Expected test failure`
			`- Minimal implementation`
			`- Refactoring opportunities`

			`### With systematic-debugging`

			`When fixing bugs:`
			`1. Write test that reproduces bug`
			`2. Verify test fails (RED)`
			`3. Fix bug (GREEN)`
			`4. Refactor if needed`

			`### With subagent-driven-development`

			`Subagent implements one test at a time:`
			`1. Write failing test`
			`2. Minimal code to pass`
			`3. Commit`
			`4. Next test`

			`## Success Indicators`

			`You're doing TDD right when:`
			`- Tests fail before code exists`
			`- You write <10 lines between test runs`
			`- Refactoring feels safe`
			`- Bugs are caught immediately`
			`- Code is simpler than expected`

			`Red flags:`
			`- Writing 50+ lines without running tests`
			`- Tests always pass`
			`- Fear of refactoring`
			`- "I'll test later"`

			`## Remember`

			`1. RED - Write failing test`
			`2. GREEN - Minimal code to pass`
			`3. REFACTOR - Clean up safely`
			`4. REPEAT - Next behavior`

			`If you didn't see it fail, it doesn't test what you think.`