hermes-agent/skills/software-development/test-driven-development/SKILL.md
kaos35 2595d81733 feat: Add Superpowers software development skills
Add 5 new skills for professional software development workflows,
adapted from the Superpowers project (obra/superpowers):

- test-driven-development: RED-GREEN-REFACTOR cycle enforcement
- systematic-debugging: 4-phase root cause investigation
- subagent-driven-development: Structured delegation with two-stage review
- writing-plans: Comprehensive implementation planning
- requesting-code-review: Systematic code review process

These skills provide structured development workflows that transform
Hermes from a general assistant into a professional software engineer
with defined processes for quality assurance.

Skills are organized under software-development category and follow
Hermes skill format with proper frontmatter, examples, and integration
guidance with existing skills.
2026-02-27 15:32:58 +01:00


---
name: test-driven-development
description: Use when implementing any feature or bugfix, before writing implementation code. Enforces RED-GREEN-REFACTOR cycle with test-first approach.
version: 1.0.0
author: Hermes Agent (adapted from Superpowers)
license: MIT
metadata:
  hermes:
    tags:
      - testing
      - tdd
      - development
      - quality
      - red-green-refactor
    related_skills:
      - systematic-debugging
      - writing-plans
      - subagent-driven-development
---

Test-Driven Development (TDD)

Overview

Write the test first. Watch it fail. Write minimal code to pass.

Core principle: If you didn't watch the test fail, you don't know if it tests the right thing.

Violating the letter of the rules is violating the spirit of the rules.

When to Use

Always:

  • New features
  • Bug fixes
  • Refactoring
  • Behavior changes

Exceptions (ask your human partner):

  • Throwaway prototypes
  • Generated code
  • Configuration files

Thinking "skip TDD just this once"? Stop. That's rationalization.

The Iron Law

NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST

Write code before the test? Delete it. Start over.

No exceptions:

  • Don't keep it as "reference"
  • Don't "adapt" it while writing tests
  • Don't look at it
  • Delete means delete

Implement fresh from tests. Period.

Red-Green-Refactor Cycle

RED - Write Failing Test

Write one minimal test showing what should happen.

Good Example:

def test_retries_failed_operations_3_times():
    attempts = 0
    def operation():
        nonlocal attempts
        attempts += 1
        if attempts < 3:
            raise Exception('fail')
        return 'success'
    
    result = retry_operation(operation)
    
    assert result == 'success'
    assert attempts == 3

  • Clear name, tests real behavior, one thing

Bad Example:

def test_retry_works():
    mock = MagicMock()
    mock.side_effect = [Exception(), Exception(), 'success']
    
    result = retry_operation(mock)
    
    assert result == 'success'  # What about retry count? Timing?

  • Vague name, mocks behavior not reality, unclear what it tests

Verify RED

Run the test. It MUST fail.

If it passes:

  • Test is wrong (not testing what you think)
  • Code already exists (delete it, start over)
  • Wrong test file running

What to check:

  • Error message makes sense
  • Fails for expected reason
  • Stack trace points to right place

GREEN - Minimal Code

Write just enough code to pass. Nothing more.

Good:

def add(a, b):
    return a + b  # Nothing extra

Bad:

import logging

def add(a, b):
    result = a + b
    logging.info(f"Adding {a} + {b} = {result}")  # Extra!
    return result

Cheating is OK in GREEN:

  • Hardcode return values
  • Copy-paste
  • Duplicate code
  • Skip edge cases

We'll fix it in refactor.
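To make the hardcoding cheat concrete, here is a minimal sketch with a hypothetical fib() function (not from this skill): the one failing test is satisfied by a hardcoded return, and only the next test would force a general implementation.

```python
# RED: one failing test for a hypothetical fib().
def test_fib_of_5_is_5():
    assert fib(5) == 5

# GREEN: hardcoding the answer is a legitimate cheat at this stage.
def fib(n):
    return 5  # Wrong in general; the next test forces a real implementation.
```

The cheat is honest because it is temporary: a second test such as `fib(7) == 13` makes the hardcoded value fail, driving the real code.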

Verify GREEN

Run tests. All must pass.

If fails:

  • Fix minimal code
  • Don't expand scope
  • Stay in GREEN

REFACTOR - Clean Up

Now improve the code while keeping tests green.

Safe refactorings:

  • Rename variables/functions
  • Extract helper functions
  • Remove duplication
  • Simplify expressions
  • Improve readability

Golden rule: Tests stay green throughout.

If tests fail during refactor:

  • Undo immediately
  • Smaller refactoring steps
  • Check you didn't change behavior
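As an illustration of a safe extract-helper refactoring (hypothetical report()/order_total() names), behavior is unchanged and the same assertion passes before and after:

```python
# Before: the total computation lives inside the formatting function.
def report(orders):
    total = sum(o["price"] * o["qty"] for o in orders)
    return f"Total: {total:.2f}"

# After: the computation is extracted into a helper; behavior is identical.
def order_total(orders):
    return sum(o["price"] * o["qty"] for o in orders)

def report_refactored(orders):
    return f"Total: {order_total(orders):.2f}"

def test_report_formats_total():
    orders = [{"price": 2.0, "qty": 3}, {"price": 1.5, "qty": 2}]
    assert report(orders) == "Total: 9.00"
    assert report_refactored(orders) == "Total: 9.00"
```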

Implementation Workflow

1. Create Test File

# Python
touch tests/test_feature.py

# JavaScript
touch tests/feature.test.js

# Rust
touch tests/feature_tests.rs

2. Write First Failing Test

# tests/test_calculator.py
def test_adds_two_numbers():
    calc = Calculator()
    result = calc.add(2, 3)
    assert result == 5

3. Run and Verify Failure

pytest tests/test_calculator.py -v
# Expected: FAIL - Calculator not defined

4. Write Minimal Implementation

# src/calculator.py
class Calculator:
    def add(self, a, b):
        return a + b  # Minimal!

5. Run and Verify Pass

pytest tests/test_calculator.py -v
# Expected: PASS

6. Commit

git add tests/test_calculator.py src/calculator.py
git commit -m "feat: add calculator with add method"

7. Next Test

def test_adds_negative_numbers():
    calc = Calculator()
    result = calc.add(-2, -3)
    assert result == -5

Repeat cycle.

Testing Anti-Patterns

Mocking What You Don't Own

Bad: Mocking the database, HTTP client, or file system directly.
Good: Abstract them behind an interface you own, then test against that interface.
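A minimal sketch of "abstract behind interface", using hypothetical Storage/save_greeting names: the file system is hidden behind an interface you own, and the test exercises an in-memory implementation of that interface instead of mocking file APIs.

```python
# An interface you own; a real implementation would wrap the file system.
class Storage:
    def read(self, key): ...
    def write(self, key, value): ...

# Test double: a fully functional in-memory implementation, not a mock.
class InMemoryStorage(Storage):
    def __init__(self):
        self._data = {}
    def read(self, key):
        return self._data[key]
    def write(self, key, value):
        self._data[key] = value

# Production code depends only on the interface.
def save_greeting(storage, name):
    storage.write("greeting", f"Hello, {name}!")

def test_saves_greeting():
    storage = InMemoryStorage()
    save_greeting(storage, "Ada")
    assert storage.read("greeting") == "Hello, Ada!"
```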

Testing Implementation Details

Bad: Testing that a function was called.
Good: Testing the result/behavior.
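A sketch of the difference, with a hypothetical normalize() function: the good test asserts the observable result, so it survives any internal rewrite.

```python
def normalize(email):
    return email.strip().lower()

def test_normalizes_email():
    # Good: assert the observable behavior.
    assert normalize("  Ada@Example.COM ") == "ada@example.com"
    # Bad (avoid): mock_strip.assert_called_once()
    # -- that couples the test to how normalize() happens to be written.
```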

Happy Path Only

Bad: Testing only expected inputs.
Good: Testing edge cases, errors, and boundaries.
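For example (hypothetical divide() function), the error path gets its own test alongside the happy path. Plain try/except is used here to keep the sketch dependency-free; with pytest you would normally write `with pytest.raises(ValueError):`.

```python
def divide(a, b):
    # Raise a clear error instead of letting ZeroDivisionError propagate.
    if b == 0:
        raise ValueError("division by zero")
    return a / b

def test_divides_happy_path():
    assert divide(10, 2) == 5

def test_handles_negative_numbers():
    assert divide(-10, 2) == -5

def test_raises_on_zero_divisor():
    try:
        divide(1, 0)
        assert False, "expected ValueError"
    except ValueError:
        pass
```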

Brittle Tests

Bad: Tests that break when you refactor.
Good: Tests that verify behavior, not structure.

Common Pitfalls

"I'll Write Tests After"

No, you won't. Write them first.

"This is Too Simple to Test"

Simple bugs cause complex problems. Test everything.

"I Need to See It Work First"

Temporary code becomes permanent. Test first.

"Tests Take Too Long"

Untested code takes longer. Invest in tests.

Language-Specific Commands

Python (pytest)

# Run all tests
pytest

# Run specific test
pytest tests/test_feature.py::test_name -v

# Run with coverage
pytest --cov=src --cov-report=term-missing

# Watch mode (requires the pytest-watch package)
ptw

JavaScript (Jest)

# Run all tests
npm test

# Run tests matching a name
npm test -- -t "test name"

# Watch mode
npm test -- --watch

# Coverage
npm test -- --coverage

TypeScript (Jest with ts-jest)

# Run tests
npx jest

# Run specific file
npx jest tests/feature.test.ts

Go

# Run all tests
go test ./...

# Run specific test
go test -run TestName

# Verbose
go test -v

# Coverage
go test -cover

Rust

# Run tests
cargo test

# Run specific test
cargo test test_name

# Show output
cargo test -- --nocapture

Integration with Other Skills

With writing-plans

Every plan task should specify:

  • What test to write
  • Expected test failure
  • Minimal implementation
  • Refactoring opportunities

With systematic-debugging

When fixing bugs:

  1. Write test that reproduces bug
  2. Verify test fails (RED)
  3. Fix bug (GREEN)
  4. Refactor if needed
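The bug-fix cycle above, sketched with a hypothetical slugify() bug: the regression test is written first and verified to fail against the buggy version, then the fix makes it pass.

```python
# Hypothetical bug: slugify("") used to return "" instead of a fallback.
def slugify(title):
    # The `or "untitled"` clause is the fix.
    return "-".join(title.lower().split()) or "untitled"

# Written first (RED): fails against the buggy version, passes after the fix.
def test_slugify_falls_back_on_empty_title():
    assert slugify("") == "untitled"

# Existing behavior stays covered.
def test_slugify_joins_lowercased_words():
    assert slugify("Hello World") == "hello-world"
```

The regression test stays in the suite, so the bug can never silently return.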

With subagent-driven-development

Subagent implements one test at a time:

  1. Write failing test
  2. Minimal code to pass
  3. Commit
  4. Next test

Success Indicators

You're doing TDD right when:

  • Tests fail before code exists
  • You write <10 lines between test runs
  • Refactoring feels safe
  • Bugs are caught immediately
  • Code is simpler than expected

Red flags:

  • Writing 50+ lines without running tests
  • Tests always pass
  • Fear of refactoring
  • "I'll test later"

Remember

  1. RED - Write failing test
  2. GREEN - Minimal code to pass
  3. REFACTOR - Clean up safely
  4. REPEAT - Next behavior

If you didn't see it fail, it doesn't test what you think.