refactor: enhance software-development skills with Hermes integration

Improvements to all 5 skills adapted from obra/superpowers:

- Restored anti-rationalization tables and red flags from originals
  (key behavioral guardrails that prevent LLMs from taking shortcuts)
- Restored 'Rule of Three' for debugging (3+ failed fixes = question
  architecture, not keep fixing)
- Restored Pattern Analysis and Hypothesis Testing phases in debugging
- Restored 'Why Order Matters' rebuttals and verification checklist in TDD
- Added proper Hermes delegate_task integration with real parameter examples
  and toolset specifications throughout
- Added Hermes tool usage (search_files, read_file, terminal) for
  investigation and verification steps
- Removed references to non-existent skills (brainstorming,
  finishing-a-development-branch, executing-plans, using-git-worktrees)
- Removed generic language-specific sections (Go, Rust, Jest) that
  added bulk without agent value
- Tightened prose — cut ~430 lines while adding more actionable content
- Added execution handoff section to writing-plans
- Consistent cross-references between the 5 skills
Author: teknium1
Date: 2026-03-03 04:08:56 -08:00
parent 0e1723ef74
commit de0af4df66
5 changed files with 879 additions and 1312 deletions


@@ -1,8 +1,8 @@
---
name: requesting-code-review
description: Use when completing tasks, implementing major features, or before merging. Validates work meets requirements through systematic review process.
version: 1.0.0
author: Hermes Agent (adapted from Superpowers)
version: 1.1.0
author: Hermes Agent (adapted from obra/superpowers)
license: MIT
metadata:
hermes:
@@ -14,108 +14,102 @@ metadata:
## Overview
Systematic code review catches issues before they cascade. Review early, review often.
Dispatch a reviewer subagent to catch issues before they cascade. Review early, review often.
**Core principle:** Fresh perspective finds issues you'll miss.
## When to Request Review
**Mandatory Reviews:**
**Mandatory:**
- After each task in subagent-driven development
- After completing major features
- After completing a major feature
- Before merge to main
- After bug fixes
**Optional but Valuable:**
**Optional but valuable:**
- When stuck (fresh perspective)
- Before refactoring (baseline check)
- After complex logic implementation
- When touching critical code (auth, payments, data)
**Don't skip because:**
- "It's simple" (simple bugs compound)
- "I'm in a hurry" (reviews save time)
- "I tested it" (you have blind spots)
**Never skip because:**
- "It's simple" — simple bugs compound
- "I'm in a hurry" — reviews save time
- "I tested it" — you have blind spots
## Review Process
### Step 1: Prepare Context
### Step 1: Self-Review First
Gather:
- What was implemented
- Original requirements/plan
- Files changed
- Test results
Before dispatching a reviewer, check yourself:
```bash
# Get changed files
git diff --name-only HEAD~1
# Get diff summary
git diff --stat HEAD~1
# Get commit messages
git log --oneline HEAD~5
```
### Step 2: Self-Review First
Before requesting external review:
**Checklist:**
- [ ] Code follows project conventions
- [ ] All tests pass
- [ ] No debug print statements
- [ ] No hardcoded secrets
- [ ] No debug print statements left
- [ ] No hardcoded secrets or credentials
- [ ] Error handling in place
- [ ] Documentation updated
- [ ] Commit messages are clear
```bash
# Run full test suite
pytest
pytest tests/ -q
# Check for debug code
grep -r "print(" src/ --include="*.py"
grep -r "debugger" src/ --include="*.js"
search_files("print(", path="src/", file_glob="*.py")
search_files("console.log", path="src/", file_glob="*.js")
# Check for TODOs
grep -r "TODO" src/ --include="*.py"
search_files("TODO|FIXME|HACK", path="src/")
```
### Step 3: Request Review
### Step 2: Gather Context
Use `delegate_task` to dispatch a reviewer subagent:
```bash
# Changed files
git diff --name-only HEAD~1
# Diff summary
git diff --stat HEAD~1
# Recent commits
git log --oneline -5
```
### Step 3: Dispatch Reviewer Subagent
Use `delegate_task` to dispatch a focused reviewer:
```python
delegate_task(
goal="Review implementation for quality and correctness",
goal="Review implementation for correctness and quality",
context="""
WHAT WAS IMPLEMENTED: [Brief description]
ORIGINAL REQUIREMENTS: [From plan or issue]
WHAT WAS IMPLEMENTED:
[Brief description of the feature/fix]
ORIGINAL REQUIREMENTS:
[From plan, issue, or user request]
FILES CHANGED:
- src/feature.py (added X function)
- tests/test_feature.py (added tests)
COMMIT RANGE: [SHA range or branch]
CHECK FOR:
- Correctness (does it do what it should?)
- Edge cases handled?
- Error handling adequate?
- Code quality and style
- Test coverage
- Security issues
- Performance concerns
- src/models/user.py (added User class)
- src/auth/login.py (added login endpoint)
- tests/test_auth.py (added 8 tests)
REVIEW CHECKLIST:
- [ ] Correctness: Does it do what it should?
- [ ] Edge cases: Are they handled?
- [ ] Error handling: Is it adequate?
- [ ] Code quality: Clear names, good structure?
- [ ] Test coverage: Are tests meaningful?
- [ ] Security: Any vulnerabilities?
- [ ] Performance: Any obvious issues?
OUTPUT FORMAT:
- Summary: [brief assessment]
- Critical Issues: [must fix]
- Important Issues: [should fix]
- Critical Issues: [must fix — blocks merge]
- Important Issues: [should fix before merge]
- Minor Issues: [nice to have]
- Verdict: [APPROVE / REQUEST_CHANGES / NEEDS_WORK]
- Strengths: [what was done well]
- Verdict: APPROVE / REQUEST_CHANGES
""",
toolsets=['file']
)
@@ -123,110 +117,72 @@ delegate_task(
### Step 4: Act on Feedback
**Critical Issues (Block merge):**
**Critical Issues (block merge):**
- Security vulnerabilities
- Broken functionality
- Data loss risk
- Test failures
- **Action:** Fix immediately before proceeding
**Action:** Fix immediately before proceeding
**Important Issues (Should fix):**
**Important Issues (should fix):**
- Missing edge case handling
- Poor error messages
- Unclear code
- Missing tests
- **Action:** Fix before merge if possible
**Action:** Fix before merge if possible
**Minor Issues (Nice to have):**
**Minor Issues (nice to have):**
- Style preferences
- Refactoring suggestions
- Documentation improvements
- **Action:** Note for later or quick fix
**Action:** Note for later or quick fix
**If reviewer is wrong:**
- Push back with technical reasoning
- Show code/tests that prove it works
- Request clarification
## Review Dimensions
### Correctness
**Questions:**
- Does it implement the requirements?
- Are there logic errors?
- Do edge cases work?
- Are there race conditions?
**Check:**
- Read implementation against requirements
- Trace through edge cases
- Check boundary conditions
### Code Quality
**Questions:**
- Is code readable?
- Are names clear?
- Is it too complex?
- Are names clear and descriptive?
- Is it too complex? (Functions >20 lines = smell)
- Is there duplication?
**Check:**
- Function length (aim <20 lines)
- Cyclomatic complexity
- DRY violations
- Naming clarity
### Testing
**Questions:**
- Are there tests?
- Are there meaningful tests?
- Do they cover edge cases?
- Are they meaningful?
- Do they pass?
**Check:**
- Test coverage
- Edge case coverage
- Test clarity
- Assertion quality
- Do they test behavior, not implementation?
- Do all tests pass?
### Security
**Questions:**
- Any injection vulnerabilities?
- Proper input validation?
- Secrets handled correctly?
- Access control in place?
**Check:**
- Input sanitization
- Authentication/authorization
- Secret management
- SQL/query safety
### Performance
**Questions:**
- Any N+1 queries?
- Unnecessary computation?
- Unnecessary computation in loops?
- Memory leaks?
- Scalability concerns?
**Check:**
- Database queries
- Algorithm complexity
- Resource usage
- Caching opportunities
- Missing caching opportunities?
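
The function-length smell called out above can be checked mechanically before dispatching a reviewer. A minimal sketch (assumes Python sources; the 20-line threshold mirrors the guideline in the Code Quality dimension):

```python
import ast

def long_functions(source: str, max_lines: int = 20) -> list[str]:
    """Return names of functions whose total span exceeds max_lines lines."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available on Python 3.8+
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                offenders.append(node.name)
    return offenders

sample = "def short():\n    return 1\n\ndef long_one():\n" + "    x = 1\n" * 25
print(long_functions(sample))  # → ['long_one']
```

A heuristic like this only flags candidates; the reviewer still decides whether a long function is actually too complex.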
## Review Output Format
Standard review format:
Standard format for reviewer subagent output:
```markdown
## Review Summary
**Assessment:** [Brief overall assessment]
**Verdict:** [APPROVE / REQUEST_CHANGES / NEEDS_WORK]
**Verdict:** APPROVE / REQUEST_CHANGES
---
@@ -237,8 +193,6 @@ Standard review format:
- Problem: [Description]
- Suggestion: [How to fix]
---
## Important Issues (Should Fix)
1. **[Issue title]**
@@ -246,15 +200,11 @@ Standard review format:
- Problem: [Description]
- Suggestion: [How to fix]
---
## Minor Issues (Optional)
1. **[Issue title]**
- Suggestion: [Improvement idea]
---
## Strengths
- [What was done well]
@@ -264,111 +214,41 @@ Standard review format:
### With subagent-driven-development
Review after EACH task:
1. Subagent implements task
2. Request code review
3. Fix issues
4. Proceed to next task
Review after EACH task — this is the two-stage review:
1. Spec compliance review (does it match the plan?)
2. Code quality review (is it well-built?)
3. Fix issues from either review
4. Proceed to next task only when both approve
### With test-driven-development
Review checks:
- Tests written first?
- Tests are meaningful?
Review verifies:
- Tests were written first (RED-GREEN-REFACTOR followed?)
- Tests are meaningful (not just asserting True)?
- Edge cases covered?
- All tests pass?
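
The "tests are meaningful (not just asserting True)" check can be partially automated. A rough sketch (a heuristic only, not a substitute for reading the tests; the patterns are illustrative):

```python
import re

# Patterns that usually indicate a test asserting nothing meaningful.
TRIVIAL = re.compile(r"assert\s+(True|1)\b|assertTrue\(True\)")

def trivial_assertions(test_source: str) -> list[int]:
    """Return 1-based line numbers that contain trivial assertions."""
    return [i for i, line in enumerate(test_source.splitlines(), start=1)
            if TRIVIAL.search(line)]

src = "def test_ok():\n    assert add(1, 2) == 3\n\ndef test_stub():\n    assert True\n"
print(trivial_assertions(src))  # → [5]
```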
### With writing-plans
Review validates:
- Implementation matches plan?
- Implementation matches the plan?
- All tasks completed?
- Quality standards met?
## Common Review Patterns
## Red Flags
### Pre-Merge Review
Before merging feature branch:
```bash
# Create review checkpoint
git log --oneline main..feature-branch
# Get summary of changes
git diff --stat main..feature-branch
# Request review
delegate_task(
goal="Pre-merge review of feature branch",
context="[changes, requirements, test results]"
)
# Address feedback
# Merge when approved
```
### Continuous Review
During subagent-driven development:
```python
# After each task
if task_complete:
review_result = request_review(task)
if review_result.has_critical_issues():
fix_issues(review_result.critical)
re_review()
proceed_to_next_task()
```
### Emergency Review
When fixing production bugs:
1. Fix with tests
2. Self-review
3. Quick peer review
4. Deploy
5. Full review post-deploy
## Review Best Practices
### As Review Requester
**Do:**
- Provide complete context
- Highlight areas of concern
- Ask specific questions
- Be responsive to feedback
- Fix issues promptly
**Don't:**
- Rush the reviewer
- Argue without evidence
- Ignore feedback
- Take criticism personally
### As Reviewer (via subagent)
**Do:**
- Be specific about issues
- Suggest improvements
- Acknowledge what works
- Prioritize issues
**Don't:**
- Nitpick style (unless project requires)
- Make vague comments
- Block without explanation
- Be overly critical
**Never:**
- Skip review because "it's simple"
- Ignore Critical issues
- Proceed with unfixed Important issues
- Argue with valid technical feedback without evidence
## Quality Gates
**Must pass before merge:**
- [ ] No critical issues
- [ ] All tests pass
- [ ] Review approved
- [ ] Review verdict: APPROVE
- [ ] Requirements met
**Should pass before merge:**
@@ -376,10 +256,6 @@ When fixing production bugs:
- [ ] Documentation updated
- [ ] Performance acceptable
**Nice to have:**
- [ ] No minor issues
- [ ] Extra polish
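
The must-pass gates above compose into a single boolean decision. A sketch (the function and parameter names are illustrative, not part of any Hermes API):

```python
def can_merge(critical_issues: int, all_tests_pass: bool,
              verdict: str, requirements_met: bool) -> bool:
    """Encode the must-pass gates: any single failure blocks the merge."""
    return (critical_issues == 0
            and all_tests_pass
            and verdict == "APPROVE"
            and requirements_met)

print(can_merge(0, True, "APPROVE", True))           # → True
print(can_merge(0, True, "REQUEST_CHANGES", True))   # → False
```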
## Remember
```


@@ -1,8 +1,8 @@
---
name: subagent-driven-development
description: Use when executing implementation plans with independent tasks. Dispatches fresh delegate_task per task with two-stage review (spec compliance then code quality).
version: 1.0.0
author: Hermes Agent (adapted from Superpowers)
version: 1.1.0
author: Hermes Agent (adapted from obra/superpowers)
license: MIT
metadata:
hermes:
@@ -16,33 +16,41 @@ metadata:
Execute implementation plans by dispatching fresh subagents per task with systematic two-stage review.
**Core principle:** Fresh subagent per task + two-stage review (spec then quality) = high quality, fast iteration
**Core principle:** Fresh subagent per task + two-stage review (spec then quality) = high quality, fast iteration.
## When to Use
Use this skill when:
- You have an implementation plan (from writing-plans skill)
- You have an implementation plan (from writing-plans skill or user requirements)
- Tasks are mostly independent
- You want to stay in the current session
- Quality and spec compliance are important
- You want automated review between tasks
**vs. Manual execution:**
- Parallel task execution possible
- Automated review process
- Consistent quality checks
- Better for complex multi-step plans
**vs. manual execution:**
- Fresh context per task (no confusion from accumulated state)
- Automated review process catches issues early
- Consistent quality checks across all tasks
- Subagents can ask questions before starting work
## The Process
### 1. Read and Parse Plan
```markdown
[Read plan file once: docs/plans/feature-plan.md]
[Extract all tasks with full text and context]
[Create todo list with all tasks]
Read the plan file. Extract ALL tasks with their full text and context upfront. Create a todo list:
```python
# Read the plan
read_file("docs/plans/feature-plan.md")
# Create todo list with all tasks
todo([
{"id": "task-1", "content": "Create User model with email field", "status": "pending"},
{"id": "task-2", "content": "Add password hashing utility", "status": "pending"},
{"id": "task-3", "content": "Create login endpoint", "status": "pending"},
])
```
**Action:** Read plan, extract tasks, create todo list.
**Key:** Read the plan ONCE. Extract everything. Don't make subagents read the plan file — provide the full task text directly in context.
### 2. Per-Task Workflow
@@ -50,124 +58,137 @@ For EACH task in the plan:
#### Step 1: Dispatch Implementer Subagent
Use `delegate_task` with:
- **goal:** Implement [specific task from plan]
- **context:** Full task description from plan, project structure, relevant files
- **toolsets:** ['terminal', 'file', 'web'] (or as needed)
Use `delegate_task` with complete context:
**Example:**
```python
# Task: Add user authentication middleware
delegate_task(
goal="Implement JWT authentication middleware as specified in Task 3 of the plan",
goal="Implement Task 1: Create User model with email and password_hash fields",
context="""
Task from plan:
- Create: src/middleware/auth.py
- Validate JWT tokens from Authorization header
- Return 401 for invalid tokens
- Attach user info to request object
Project structure:
- Flask app in src/app.py
- Uses PyJWT library
- Existing middleware pattern in src/middleware/
TASK FROM PLAN:
- Create: src/models/user.py
- Add User class with email (str) and password_hash (str) fields
- Use bcrypt for password hashing
- Include __repr__ for debugging
FOLLOW TDD:
1. Write failing test in tests/models/test_user.py
2. Run: pytest tests/models/test_user.py -v (verify FAIL)
3. Write minimal implementation
4. Run: pytest tests/models/test_user.py -v (verify PASS)
5. Run: pytest tests/ -q (verify no regressions)
6. Commit: git add -A && git commit -m "feat: add User model with password hashing"
PROJECT CONTEXT:
- Python 3.11, Flask app in src/app.py
- Existing models in src/models/
- Tests use pytest, run from project root
- bcrypt already in requirements.txt
""",
toolsets=['terminal', 'file']
)
```
#### Step 2: Implementer Subagent Works
#### Step 2: Dispatch Spec Compliance Reviewer
The subagent will:
1. Ask questions if needed (you answer)
2. Implement the task following TDD
3. Write tests
4. Run tests to verify
5. Self-review
6. Report completion
**Your role:** Answer questions, provide context.
#### Step 3: Spec Compliance Review
Dispatch reviewer subagent:
After the implementer completes, verify against the original spec:
```python
delegate_task(
goal="Review if implementation matches spec from plan",
goal="Review if implementation matches the spec from the plan",
context="""
Original task spec: [copy from plan]
Implementation: [file paths and key code]
Check:
- All requirements from spec implemented?
- File paths match spec?
- Behavior matches spec?
- Nothing extra added?
ORIGINAL TASK SPEC:
- Create src/models/user.py with User class
- Fields: email (str), password_hash (str)
- Use bcrypt for password hashing
- Include __repr__
CHECK:
- [ ] All requirements from spec implemented?
- [ ] File paths match spec?
- [ ] Function signatures match spec?
- [ ] Behavior matches expected?
- [ ] Nothing extra added (no scope creep)?
OUTPUT: PASS or list of specific spec gaps to fix.
""",
toolsets=['file']
)
```
**If spec issues found:**
- Subagent fixes gaps
- Re-run spec review
- Continue only when spec-compliant
**If spec issues found:** Fix gaps, then re-run spec review. Continue only when spec-compliant.
#### Step 4: Code Quality Review
#### Step 3: Dispatch Code Quality Reviewer
Dispatch quality reviewer:
After spec compliance passes:
```python
delegate_task(
goal="Review code quality and best practices",
goal="Review code quality for Task 1 implementation",
context="""
Code to review: [file paths]
Check:
- Follows project style?
- Proper error handling?
- Good naming?
- Test coverage adequate?
- No obvious bugs?
FILES TO REVIEW:
- src/models/user.py
- tests/models/test_user.py
CHECK:
- [ ] Follows project conventions and style?
- [ ] Proper error handling?
- [ ] Clear variable/function names?
- [ ] Adequate test coverage?
- [ ] No obvious bugs or missed edge cases?
- [ ] No security issues?
OUTPUT FORMAT:
- Critical Issues: [must fix before proceeding]
- Important Issues: [should fix]
- Minor Issues: [optional]
- Verdict: APPROVED or REQUEST_CHANGES
""",
toolsets=['file']
)
```
**If quality issues found:**
- Subagent fixes issues
- Re-run quality review
- Continue only when approved
**If quality issues found:** Fix issues, re-review. Continue only when approved.
#### Step 5: Mark Complete
#### Step 4: Mark Complete
Update todo list, mark task complete.
```python
todo([{"id": "task-1", "content": "Create User model with email field", "status": "completed"}], merge=True)
```
### 3. Final Review
After ALL tasks complete:
After ALL tasks are complete, dispatch a final integration reviewer:
```python
delegate_task(
goal="Review entire implementation for consistency",
context="All tasks completed, review for integration issues",
toolsets=['file']
goal="Review the entire implementation for consistency and integration issues",
context="""
All tasks from the plan are complete. Review the full implementation:
- Do all components work together?
- Any inconsistencies between tasks?
- All tests passing?
- Ready for merge?
""",
toolsets=['terminal', 'file']
)
```
### 4. Branch Cleanup
### 4. Verify and Commit
Use `finishing-a-development-branch` skill:
- Verify all tests pass
- Present merge options
- Clean up worktree
```bash
# Run full test suite
pytest tests/ -q
# Review all changes
git diff --stat
# Final commit if needed
git add -A && git commit -m "feat: complete [feature name] implementation"
```
## Task Granularity
**Good task size:** 2-5 minutes of focused work
**Examples:**
**Each task = 2-5 minutes of focused work.**
**Too big:**
- "Implement user authentication system"
@@ -177,188 +198,134 @@ Use `finishing-a-development-branch` skill:
- "Add password hashing function"
- "Create login endpoint"
- "Add JWT token generation"
- "Create registration endpoint"
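
A quick way to sanity-check granularity is to write the split out as the same todo payload shown earlier, before dispatching anything. A sketch of the decomposition above (task text is illustrative):

```python
# "Implement user authentication system" decomposed into 2-5 minute tasks,
# in the same shape the todo() examples above use.
tasks = [
    {"id": f"task-{i}", "content": content, "status": "pending"}
    for i, content in enumerate([
        "Create User model with email field",
        "Add password hashing function",
        "Create login endpoint",
        "Add JWT token generation",
        "Create registration endpoint",
    ], start=1)
]
print(len(tasks), tasks[0]["id"])  # → 5 task-1
```

If any entry still reads like a feature rather than one focused change, split it again.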
## Communication Pattern
## Red Flags — Never Do These
### You to Subagent
**Provide:**
- Clear task description
- Exact file paths
- Expected behavior
- Success criteria
- Relevant context
**Example:**
```
Task: Add email validation
Files: Create src/validators/email.py
Expected: Function returns True for valid emails, False for invalid
Success: Tests pass for 10 test cases including edge cases
Context: Used in user registration flow
```
### Subagent to You
**Expect:**
- Questions for clarification
- Progress updates
- Completion report
- Self-review summary
**Respond to:**
- Answer questions promptly
- Provide missing context
- Approve approach decisions
## Two-Stage Review Details
### Stage 1: Spec Compliance
**Checks:**
- [ ] All requirements from plan implemented
- [ ] File paths match specification
- [ ] Function signatures match spec
- [ ] Behavior matches expected
- [ ] No scope creep (nothing extra)
**Output:** PASS or list of spec gaps
### Stage 2: Code Quality
**Checks:**
- [ ] Follows language conventions
- [ ] Consistent with project style
- [ ] Clear variable/function names
- [ ] Proper error handling
- [ ] Adequate test coverage
- [ ] No obvious bugs/edge cases missed
- [ ] Documentation if needed
**Output:** APPROVED or list of issues (critical/important/minor)
- Start implementation without a plan
- Skip reviews (spec compliance OR code quality)
- Proceed with unfixed critical/important issues
- Dispatch multiple implementation subagents for tasks that touch the same files
- Make subagent read the plan file (provide full text in context instead)
- Skip scene-setting context (subagent needs to understand where the task fits)
- Ignore subagent questions (answer before letting them proceed)
- Accept "close enough" on spec compliance
- Skip review loops (reviewer found issues → implementer fixes → review again)
- Let implementer self-review replace actual review (both are needed)
- **Start code quality review before spec compliance is PASS** (wrong order)
- Move to next task while either review has open issues
## Handling Issues
### Critical Issues
### If Subagent Asks Questions
**Examples:** Security vulnerability, broken functionality, data loss risk
- Answer clearly and completely
- Provide additional context if needed
- Don't rush them into implementation
**Action:** Must fix before proceeding
### If Reviewer Finds Issues
### Important Issues
- Implementer subagent (or a new one) fixes them
- Reviewer reviews again
- Repeat until approved
- Don't skip the re-review
**Examples:** Missing tests, poor error handling, unclear code
### If Subagent Fails a Task
**Action:** Should fix before proceeding
- Dispatch a new fix subagent with specific instructions about what went wrong
- Don't try to fix manually in the controller session (context pollution)
### Minor Issues
## Efficiency Notes
**Examples:** Style inconsistency, minor refactoring opportunity
**Why fresh subagent per task:**
- Prevents context pollution from accumulated state
- Each subagent gets clean, focused context
- No confusion from prior tasks' code or reasoning
**Action:** Note for later, optional fix
**Why two-stage review:**
- Spec review catches under/over-building early
- Quality review ensures the implementation is well-built
- Catches issues before they compound across tasks
**Cost trade-off:**
- More subagent invocations (implementer + 2 reviewers per task)
- But catches issues early (cheaper than debugging compounded problems later)
## Integration with Other Skills
### With writing-plans
This skill EXECUTES plans created by the writing-plans skill:
1. User requirements → writing-plans → implementation plan
2. Implementation plan → subagent-driven-development → working code
### With test-driven-development
Subagent should:
Implementer subagents should follow TDD:
1. Write failing test first
2. Implement minimal code
3. Verify test passes
4. Commit
### With systematic-debugging
If subagent encounters bugs:
1. Pause implementation
2. Debug systematically
3. Fix root cause
4. Resume
### With writing-plans
This skill EXECUTES plans created by writing-plans skill.
**Sequence:**
1. brainstorming → writing-plans → subagent-driven-development
Include TDD instructions in every implementer context.
### With requesting-code-review
After subagent completes task, use requesting-code-review skill for final validation.
The two-stage review process IS the code review. For final integration review, use the requesting-code-review skill's review dimensions.
## Common Patterns
### With systematic-debugging
### Pattern: Fresh Subagent Per Task
**Why:** Prevents context pollution
**How:** New delegate_task for each task
**Result:** Each subagent has clean context
### Pattern: Two-Stage Review
**Why:** Catch issues early, ensure quality
**How:** Spec review → Quality review
**Result:** High-quality, spec-compliant code
### Pattern: Frequent Checkpoints
**Why:** Catch issues before they compound
**How:** Review after each task
**Result:** Issues don't cascade
## Best Practices
1. **Clear Task Boundaries**
- One task = one focused change
- Independent where possible
- Clear success criteria
2. **Complete Context**
- Provide all needed files
- Explain project conventions
- Share relevant examples
3. **Review Discipline**
- Don't skip spec review
- Address critical issues immediately
- Keep quality bar consistent
4. **Communication**
- Answer subagent questions quickly
- Clarify when needed
- Provide feedback on reviews
If a subagent encounters bugs during implementation:
1. Follow systematic-debugging process
2. Find root cause before fixing
3. Write regression test
4. Resume implementation
## Example Workflow
```markdown
User: Implement user authentication
```
[Read plan: docs/plans/auth-feature.md]
[Create todo list with 5 tasks]
You: I'll use subagent-driven development. Let me create a plan first.
[Uses writing-plans skill]
--- Task 1: Create User model ---
[Dispatch implementer subagent]
Implementer: "Should email be unique?"
You: "Yes, email must be unique"
Implementer: Implemented, 3/3 tests passing, committed.
Plan created with 5 tasks:
1. Create User model
2. Add password hashing
3. Implement login endpoint
4. Add JWT middleware
5. Create registration endpoint
[Dispatch spec reviewer]
Spec reviewer: ✅ PASS — all requirements met
--- Task 1 ---
[Dispatch implementer subagent for User model]
[Subagent asks: "Should email be unique?"]
You: Yes, email must be unique
[Subagent implements]
[Dispatch spec reviewer - PASS]
[Dispatch quality reviewer - APPROVED]
Task 1 complete
[Dispatch quality reviewer]
Quality reviewer: ✅ APPROVED — clean code, good tests
--- Task 2 ---
[Dispatch implementer for password hashing]
...
[Mark Task 1 complete]
[After all tasks]
[Final review]
[Merge branch]
--- Task 2: Password hashing ---
[Dispatch implementer subagent]
Implementer: No questions, implemented, 5/5 tests passing.
[Dispatch spec reviewer]
Spec reviewer: ❌ Missing: password strength validation (spec says "min 8 chars")
[Implementer fixes]
Implementer: Added validation, 7/7 tests passing.
[Dispatch spec reviewer again]
Spec reviewer: ✅ PASS
[Dispatch quality reviewer]
Quality reviewer: Important: Magic number 8, extract to constant
Implementer: Extracted MIN_PASSWORD_LENGTH constant
Quality reviewer: ✅ APPROVED
[Mark Task 2 complete]
... (continue for all tasks)
[After all tasks: dispatch final integration reviewer]
[Run full test suite: all passing]
[Done!]
```
## Remember
@@ -366,8 +333,8 @@ Task 1 complete
```
Fresh subagent per task
Two-stage review every time
Spec compliance first
Code quality second
Spec compliance FIRST
Code quality SECOND
Never skip reviews
Catch issues early
```


@@ -1,8 +1,8 @@
---
name: systematic-debugging
description: Use when encountering any bug, test failure, or unexpected behavior. 4-phase root cause investigation process - NO fixes without understanding the problem first.
version: 1.0.0
author: Hermes Agent (adapted from Superpowers)
description: Use when encountering any bug, test failure, or unexpected behavior. 4-phase root cause investigation — NO fixes without understanding the problem first.
version: 1.1.0
author: Hermes Agent (adapted from obra/superpowers)
license: MIT
metadata:
hermes:
@@ -48,7 +48,7 @@ Use for ANY technical issue:
**Don't skip when:**
- Issue seems simple (simple bugs have root causes too)
- You're in a hurry (rushing guarantees rework)
- Manager wants it fixed NOW (systematic is faster than thrashing)
- Someone wants it fixed NOW (systematic is faster than thrashing)
## The Four Phases
@@ -67,7 +67,7 @@ You MUST complete each phase before proceeding to the next.
- Read stack traces completely
- Note line numbers, file paths, error codes
**Action:** Copy full error message to your notes.
**Action:** Use `read_file` on the relevant source files. Use `search_files` to find the error string in the codebase.
### 2. Reproduce Consistently
@@ -76,16 +76,24 @@ You MUST complete each phase before proceeding to the next.
- Does it happen every time?
- If not reproducible → gather more data, don't guess
**Action:** Write down exact reproduction steps.
**Action:** Use the `terminal` tool to run the failing test or trigger the bug:
```bash
# Run specific failing test
pytest tests/test_module.py::test_name -v
# Run with verbose output
pytest tests/test_module.py -v --tb=long
```
### 3. Check Recent Changes
- What changed that could cause this?
- Git diff, recent commits
- New dependencies, config changes
- Environmental differences
**Commands:**
**Action:**
```bash
# Recent commits
git log --oneline -10
@@ -94,59 +102,50 @@ git log --oneline -10
git diff
# Changes in specific file
git log -p --follow src/problematic_file.py
git log -p --follow src/problematic_file.py | head -100
```
### 4. Gather Evidence in Multi-Component Systems
**WHEN system has multiple components (CI pipeline, API service, database layer):**
**WHEN system has multiple components (API → service → database, CI → build → deploy):**
**BEFORE proposing fixes, add diagnostic instrumentation:**
For EACH component boundary:
- Log what data enters component
- Log what data exits component
- Log what data enters the component
- Log what data exits the component
- Verify environment/config propagation
- Check state at each layer
Run once to gather evidence showing WHERE it breaks
THEN analyze evidence to identify failing component
THEN investigate that specific component
Run once to gather evidence showing WHERE it breaks.
THEN analyze evidence to identify the failing component.
THEN investigate that specific component.
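
One lightweight way to add the boundary instrumentation is a decorator rather than hand-placed prints. A sketch (assumes the component boundaries are plain Python functions; names are illustrative):

```python
import functools
import logging

logging.basicConfig(level=logging.DEBUG, format="%(message)s")

def trace_boundary(func):
    """Log what enters and exits a component boundary."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logging.debug("ENTER %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        logging.debug("EXIT  %s -> %r", func.__name__, result)
        return result
    return wrapper

@trace_boundary
def parse_config(raw: str) -> dict:
    # Example component: turns "key=value,key=value" into a dict.
    return dict(item.split("=") for item in raw.split(","))

parse_config("host=localhost,port=8080")
```

Decorate each boundary once, run the failing path once, and the log shows exactly where the data goes wrong.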
### 5. Trace Data Flow
**WHEN error is deep in the call stack:**
- Where does the bad value originate?
- What called this function with the bad value?
- Keep tracing upstream until you find the source
- Fix at the source, not at the symptom
**Action:** Use `search_files` to trace references:
**Example (multi-layer system):**
```python
# Layer 1: Entry point
def entry_point(input_data):
print(f"DEBUG: Input received: {input_data}")
result = process_layer1(input_data)
print(f"DEBUG: Layer 1 output: {result}")
return result
# Find where the function is called
search_files("function_name(", path="src/", file_glob="*.py")
# Layer 2: Processing
def process_layer1(data):
print(f"DEBUG: Layer 1 received: {data}")
# ... processing ...
print(f"DEBUG: Layer 1 returning: {result}")
return result
# Find where the variable is set
search_files("variable_name\\s*=", path="src/", file_glob="*.py")
```
**Action:** Add logging, run once, analyze output.
### 5. Isolate the Problem
- Comment out code until problem disappears
- Binary search through recent changes
- Create minimal reproduction case
- Test with fresh environment
**Action:** Create minimal reproduction case.
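
Binary search through recent changes is what `git bisect` automates, and the underlying logic is easy to sketch. A minimal version (assumes changes are ordered oldest-first and that you supply an `is_broken` predicate):

```python
def first_bad(changes, is_broken):
    """Binary-search an ordered history for the first change that breaks things.

    Assumes everything before the first bad change is good and everything
    from it onward is bad (the same invariant git bisect relies on).
    """
    lo, hi = 0, len(changes) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_broken(changes[mid]):
            hi = mid          # bad: first bad change is at mid or earlier
        else:
            lo = mid + 1      # good: first bad change is after mid
    return changes[lo]

history = ["c1", "c2", "c3", "c4", "c5"]
print(first_bad(history, lambda c: c >= "c4"))  # → c4
```

With N recent changes this takes log2(N) checks instead of N, which is why bisection beats rereading every commit.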
### Phase 1 Completion Checklist
- [ ] Error messages fully read and understood
- [ ] Issue reproduced consistently
- [ ] Recent changes identified and reviewed
- [ ] Evidence gathered (logs, state)
- [ ] Evidence gathered (logs, state, data flow)
- [ ] Problem isolated to specific component/code
- [ ] Root cause hypothesis formed
@@ -154,290 +153,214 @@ def process_layer1(data):
---
## Phase 2: Solution Design

**Given the root cause, design the fix:**

### 1. Understand the Fix Area

- Read relevant code thoroughly
- Understand data flow
- Identify affected components
- Check for similar issues elsewhere

**Action:** Read all relevant code files.

### 2. Design Minimal Fix

- Smallest change that fixes root cause
- Avoid scope creep
- Don't refactor while fixing
- Fix one issue at a time

**Action:** Write down the exact fix before coding.

### 3. Consider Side Effects

- What else could this change affect?
- Are there dependencies?
- Will this break other functionality?

**Action:** Identify potential side effects.

### Phase 2 Completion Checklist

- [ ] Fix area code fully understood
- [ ] Minimal fix designed
- [ ] Side effects identified
- [ ] Fix approach documented

## Phase 2: Pattern Analysis

**Find the pattern before fixing:**

### 1. Find Working Examples

- Locate similar working code in the same codebase
- What works that's similar to what's broken?

**Action:** Use `search_files` to find comparable patterns:

```python
search_files("similar_pattern", path="src/", file_glob="*.py")
```

### 2. Compare Against References

- If implementing a pattern, read the reference implementation COMPLETELY
- Don't skim — read every line
- Understand the pattern fully before applying

### 3. Identify Differences

- What's different between working and broken?
- List every difference, however small
- Don't assume "that can't matter"

### 4. Understand Dependencies

- What other components does this need?
- What settings, config, environment?
- What assumptions does it make?
---
## Phase 3: Implementation

**Make the fix:**

### 1. Write Test First (if possible)

```python
def test_should_handle_empty_input():
    """Regression test for bug #123"""
    result = process_data("")
    assert result == expected_empty_result
```

### 2. Implement Fix

```python
# Before (buggy)
def process_data(data):
    return data.split(",")[0]

# After (fixed)
def process_data(data):
    if not data:
        return ""
    return data.split(",")[0]
```

### 3. Verify Fix

```bash
# Run the specific test
pytest tests/test_data.py::test_should_handle_empty_input -v

# Run all tests to check for regressions
pytest
```

### Phase 3 Completion Checklist

- [ ] Test written that reproduces the bug
- [ ] Minimal fix implemented
- [ ] Test passes
- [ ] No regressions introduced

## Phase 3: Hypothesis and Testing

**Scientific method:**

### 1. Form a Single Hypothesis

- State clearly: "I think X is the root cause because Y"
- Write it down
- Be specific, not vague

### 2. Test Minimally

- Make the SMALLEST possible change to test the hypothesis
- One variable at a time
- Don't fix multiple things at once

### 3. Verify Before Continuing

- Did it work? → Phase 4
- Didn't work? → Form NEW hypothesis
- DON'T add more fixes on top

### 4. When You Don't Know

- Say "I don't understand X"
- Don't pretend to know
- Ask the user for help
- Research more

---

## Phase 4: Implementation

**Fix the root cause, not the symptom:**

### 1. Create Failing Test Case

- Simplest possible reproduction
- Automated test if possible
- MUST have before fixing
- Use the `test-driven-development` skill

### 2. Implement Single Fix

- Address the root cause identified
- ONE change at a time
- No "while I'm here" improvements
- No bundled refactoring

### 3. Verify Fix

```bash
# Run the specific regression test
pytest tests/test_module.py::test_regression -v

# Run full suite — no regressions
pytest tests/ -q
```

### 4. If Fix Doesn't Work — The Rule of Three

- **STOP.**
- Count: How many fixes have you tried?
- If < 3: Return to Phase 1, re-analyze with new information
- **If ≥ 3: STOP and question the architecture (step 5 below)**
- DON'T attempt Fix #4 without architectural discussion
### 5. If 3+ Fixes Failed: Question Architecture
**Pattern indicating an architectural problem:**
- Each fix reveals new shared state/coupling in a different place
- Fixes require "massive refactoring" to implement
- Each fix creates new symptoms elsewhere
**STOP and question fundamentals:**
- Is this pattern fundamentally sound?
- Are we "sticking with it through sheer inertia"?
- Should we refactor the architecture vs. continue fixing symptoms?
**Discuss with the user before attempting more fixes.**
This is NOT a failed hypothesis — this is a wrong architecture.
---
## Phase 4: Verification

**Confirm it's actually fixed:**

### 1. Reproduce Original Issue

- Follow original reproduction steps
- Verify issue is resolved
- Test edge cases

### 2. Regression Testing

```bash
# Full test suite
pytest

# Integration tests
pytest tests/integration/

# Check related areas
pytest -k "related_feature"
```

## Red Flags — STOP and Follow Process

If you catch yourself thinking:
- "Quick fix for now, investigate later"
- "Just try changing X and see if it works"
- "Add multiple changes, run tests"
- "Skip the test, I'll manually verify"
- "It's probably X, let me fix that"
- "I don't fully understand but this might work"
- "Pattern says X but I'll adapt it differently"
- "Here are the main problems: [lists fixes without investigation]"
- Proposing solutions before tracing data flow
- **"One more fix attempt" (when already tried 2+)**
- **Each fix reveals a new problem in a different place**

**ALL of these mean: STOP. Return to Phase 1.**

**If 3+ fixes failed:** Question the architecture (Phase 4 step 5).

## Common Rationalizations

| Excuse | Reality |
|--------|---------|
| "Issue is simple, don't need process" | Simple issues have root causes too. Process is fast for simple bugs. |
| "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. |
| "Just try this first, then investigate" | First fix sets the pattern. Do it right from the start. |
| "I'll write test after confirming fix works" | Untested fixes don't stick. Test first proves it. |
| "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
| "Reference too long, I'll adapt the pattern" | Partial understanding guarantees bugs. Read it completely. |
| "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. |
| "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Question the pattern, don't fix again. |

## Quick Reference
| Phase | Key Activities | Success Criteria |
|-------|---------------|------------------|
| **1. Root Cause** | Read errors, reproduce, check changes, gather evidence, trace data flow | Understand WHAT and WHY |
| **2. Pattern** | Find working examples, compare, identify differences | Know what's different |
| **3. Hypothesis** | Form theory, test minimally, one variable at a time | Confirmed or new hypothesis |
| **4. Implementation** | Create regression test, fix root cause, verify | Bug resolved, all tests pass |
## Hermes Agent Integration

### Investigation Tools

Use these Hermes tools during Phase 1:

- **`search_files`** — Find error strings, trace function calls, locate patterns
- **`read_file`** — Read source code with line numbers for precise analysis
- **`terminal`** — Run tests, check git history, reproduce bugs
- **`web_search`/`web_extract`** — Research error messages, library docs

### With delegate_task

For complex multi-component debugging, dispatch investigation subagents:

```python
delegate_task(
    goal="Investigate why [specific test/behavior] fails",
    context="""
Follow systematic-debugging skill:
1. Read the error message carefully
2. Reproduce the issue
3. Trace the data flow to find root cause
4. Report findings — do NOT fix yet

Error: [paste full error]
File: [path to failing code]
Test command: [exact command]
""",
    toolsets=['terminal', 'file']
)
```

### 3. Monitor After Deploy

- Watch logs for related errors
- Check metrics
- Verify fix in production

### Phase 4 Completion Checklist

- [ ] Original issue cannot be reproduced
- [ ] All tests pass
- [ ] No new warnings/errors
- [ ] Fix documented (commit message, comments)

---

## Debugging Techniques

### Root Cause Tracing

Ask "why" 5 times:
1. Why did it fail? → Null pointer
2. Why was it null? → Function returned null
3. Why did the function return null? → Missing validation
4. Why was validation missing? → Assumed input always valid
5. Why was that assumption wrong? → API changed

**Root cause:** API change not accounted for

### Defense in Depth

Don't fix just the symptom:

**Bad:** Add null check at crash site

**Good:**
1. Add validation at API boundary
2. Add null check at crash site
3. Add test for both
4. Document API behavior

### Condition-Based Waiting

For timing/race conditions:

```python
# Bad - arbitrary sleep
import time
time.sleep(5)  # "Should be enough"

# Good - wait for condition
import requests
from tenacity import retry, wait_exponential, stop_after_attempt

@retry(wait=wait_exponential(multiplier=1, min=4, max=10),
       stop=stop_after_attempt(5))
def wait_for_service():
    response = requests.get(health_url)
    assert response.status_code == 200
```
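The "Defense in Depth" idea from this section (validate at the boundary, guard at the use site, test both) can be sketched as follows; `fetch_user` and `greet` are hypothetical names for illustration:

```python
def fetch_user(api_payload):
    # Layer 1: validate at the API boundary, fail loudly on bad input
    if not isinstance(api_payload, dict) or "user" not in api_payload:
        raise ValueError(f"malformed payload: {api_payload!r}")
    return api_payload["user"]

def greet(user):
    # Layer 2: guard at the use site (the original crash site)
    if user is None:
        return "Hello, guest"
    return f"Hello, {user['name']}"

# Layer 3: tests covering both layers
assert greet(None) == "Hello, guest"
assert greet(fetch_user({"user": {"name": "Ada"}})) == "Hello, Ada"
```

Either layer alone would hide the other's failure mode; together they localize bad data to its source.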
---
## Common Debugging Pitfalls
### Fix Without Understanding
**Symptom:** "Just add a try/catch"
**Problem:** Masks the real issue
**Solution:** Complete Phase 1 before any fix
### Shotgun Debugging
**Symptom:** Change 5 things at once
**Problem:** Don't know what fixed it
**Solution:** One change at a time, verify each
### Premature Optimization
**Symptom:** Rewrite while debugging
**Problem:** Introduces new bugs
**Solution:** Fix first, refactor later
### Assuming Environment
**Symptom:** "Works on my machine"
**Problem:** Environment differences
**Solution:** Check environment variables, versions, configs
---
## Language-Specific Tools
### Python
```python
# Add debugger
import pdb; pdb.set_trace()
# Or use ipdb for better experience
import ipdb; ipdb.set_trace()
# Log state
import logging
logging.debug(f"Variable state: {variable}")
# Stack trace
import traceback
traceback.print_exc()
```
### JavaScript/TypeScript
```javascript
// Debugger
debugger;
// Console with context
console.log("State:", { var1, var2, var3 });
// Stack trace
console.trace("Here");
// Error with context
throw new Error(`Failed with input: ${JSON.stringify(input)}`);
```
### Go
```go
// Print state
fmt.Printf("Debug: variable=%+v\n", variable)
// Stack trace
import "runtime/debug"
debug.PrintStack()
// Panic with context
if err != nil {
panic(fmt.Sprintf("unexpected error: %v", err))
}
```
---
## Integration with Other Skills

### With test-driven-development

When fixing bugs:
1. Write a test that reproduces the bug (RED)
2. Debug systematically to find root cause
3. Fix the root cause (GREEN)
4. The test proves the fix and prevents regression

### With writing-plans

Include debugging tasks in plans:
- "Add diagnostic logging"
- "Create reproduction test"
- "Verify fix resolves issue"

### With subagent-driven-development

If subagent gets stuck:
1. Switch to systematic debugging
2. Analyze root cause
3. Provide findings to subagent
4. Resume implementation

---

## Remember

```
PHASE 1: Investigate → Understand WHY
PHASE 2: Design → Plan the fix
PHASE 3: Implement → Make the fix
PHASE 4: Verify → Confirm it's fixed
```

## Real-World Impact

From debugging sessions:
- Systematic approach: 15-30 minutes to fix
- Random fixes approach: 2-3 hours of thrashing
- First-time fix rate: 95% vs 40%
- New bugs introduced: Near zero vs common

**No shortcuts. No guessing. Systematic always wins.**

View File

@@ -1,8 +1,8 @@
---
name: test-driven-development
description: Use when implementing any feature or bugfix, before writing implementation code. Enforces RED-GREEN-REFACTOR cycle with test-first approach.
version: 1.0.0
author: Hermes Agent (adapted from Superpowers)
version: 1.1.0
author: Hermes Agent (adapted from obra/superpowers)
license: MIT
metadata:
hermes:
@@ -28,7 +28,7 @@ Write the test first. Watch it fail. Write minimal code to pass.
- Refactoring
- Behavior changes
**Exceptions (ask your human partner):**
**Exceptions (ask the user first):**
- Throwaway prototypes
- Generated code
- Configuration files
@@ -53,11 +53,11 @@ Implement fresh from tests. Period.
## Red-Green-Refactor Cycle
### RED - Write Failing Test
### RED — Write Failing Test
Write one minimal test showing what should happen.
**Good Example:**
**Good test:**
```python
def test_retries_failed_operations_3_times():
attempts = 0
@@ -67,43 +67,51 @@ def test_retries_failed_operations_3_times():
if attempts < 3:
raise Exception('fail')
return 'success'
result = retry_operation(operation)
assert result == 'success'
assert attempts == 3
```
- Clear name, tests real behavior, one thing
Clear name, tests real behavior, one thing.
**Bad Example:**
**Bad test:**
```python
def test_retry_works():
mock = MagicMock()
mock.side_effect = [Exception(), Exception(), 'success']
result = retry_operation(mock)
assert result == 'success' # What about retry count? Timing?
```
- Vague name, mocks behavior not reality, unclear what it tests
Vague name, tests mock not real code.
**Requirements:**
- One behavior per test
- Clear descriptive name ("and" in name? Split it)
- Real code, not mocks (unless truly unavoidable)
- Name describes behavior, not implementation

### Verify RED — Watch It Fail

**MANDATORY. Never skip.**

```bash
# Use terminal tool to run the specific test
pytest tests/test_feature.py::test_specific_behavior -v
```

Confirm:
- Test fails (not errors from typos)
- Failure message is expected
- Fails because the feature is missing

**Test passes immediately?** You're testing existing behavior. Fix the test.
**Test errors?** Fix the error, re-run until it fails correctly.

### Verify RED

Run the test. It MUST fail.

**If it passes:**
- Test is wrong (not testing what you think)
- Code already exists (delete it, start over)
- Wrong test file running

**What to check:**
- Error message makes sense
- Fails for expected reason
- Stack trace points to right place

### GREEN — Minimal Code

Write the simplest code to pass the test. Nothing more.
**Good:**
```python
@@ -119,266 +127,216 @@ def add(a, b):
return result
```
Don't add features, refactor other code, or "improve" beyond the test.

**Cheating is OK in GREEN:**
- Hardcode return values
- Copy-paste
- Duplicate code
- Skip edge cases

We'll fix it in REFACTOR.
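For instance, a hardcoded return is a legitimate first GREEN step; `total_price` here is a hypothetical example. The test passes now, and the next test (multiple items) will force the real implementation:

```python
# Test drives the behavior
def test_total_price_of_single_item():
    assert total_price([("book", 12)]) == 12

# GREEN: hardcoded return is fine for now; the next test
# will make this fake implementation impossible to keep
def total_price(items):
    return 12

test_total_price_of_single_item()
```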
### Verify GREEN — Watch It Pass

**MANDATORY.**

```bash
# Run the specific test
pytest tests/test_feature.py::test_specific_behavior -v

# Then run ALL tests to check for regressions
pytest tests/ -q
```

Confirm:
- Test passes
- Other tests still pass
- Output pristine (no errors, warnings)

**Test fails?** Fix the code, not the test.
**Other tests fail?** Fix regressions now.

### REFACTOR — Clean Up

After green only:
- Remove duplication
- Improve names
- Extract helpers
- Simplify expressions

Keep tests green throughout. Don't add behavior.

**If tests fail during refactor:** Undo immediately. Take smaller steps.

### Verify GREEN

Run tests. All must pass.

**If fails:**
- Fix minimal code
- Don't expand scope
- Stay in GREEN

### REFACTOR - Clean Up

Now improve the code while keeping tests green.

**Safe refactorings:**
- Rename variables/functions
- Extract helper functions
- Improve readability

**Golden rule:** Tests stay green throughout.

**If tests fail during refactor:**
- Undo immediately
- Smaller refactoring steps
- Check you didn't change behavior
## Implementation Workflow

### 1. Create Test File

```bash
# Python
touch tests/test_feature.py

# JavaScript
touch tests/feature.test.js

# Rust
touch tests/feature_tests.rs
```

### 2. Write First Failing Test

### Repeat

Next failing test for next behavior. One cycle at a time.

## Why Order Matters

**"I'll write tests after to verify it works"**

Tests written after code pass immediately. Passing immediately proves nothing:
- Might test the wrong thing
- Might test implementation, not behavior
- Might miss edge cases you forgot
- You never saw it catch the bug

Test-first forces you to see the test fail, proving it actually tests something.
**"I already manually tested all the edge cases"**
Manual testing is ad-hoc. You think you tested everything but:
- No record of what you tested
- Can't re-run when code changes
- Easy to forget cases under pressure
- "It worked when I tried it" ≠ comprehensive
Automated tests are systematic. They run the same way every time.
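For example, the manual check "I tried it with an empty string and it looked fine" becomes a recorded, re-runnable test; `slugify` is a hypothetical function standing in for your code:

```python
def slugify(title):
    return "-".join(title.lower().split())

# The manual checks, recorded as tests that re-run on every change
def test_slugify_basic():
    assert slugify("Hello World") == "hello-world"

def test_slugify_empty_string():
    assert slugify("") == ""

test_slugify_basic()
test_slugify_empty_string()
```

The next time `slugify` changes, both cases re-run automatically; no one has to remember the empty-string check.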
**"Deleting X hours of work is wasteful"**
Sunk cost fallacy. The time is already gone. Your choice now:
- Delete and rewrite with TDD (high confidence)
- Keep it and add tests after (low confidence, likely bugs)
The "waste" is keeping code you can't trust.
**"TDD is dogmatic, being pragmatic means adapting"**
TDD IS pragmatic:
- Finds bugs before commit (faster than debugging after)
- Prevents regressions (tests catch breaks immediately)
- Documents behavior (tests show how to use code)
- Enables refactoring (change freely, tests catch breaks)
"Pragmatic" shortcuts = debugging in production = slower.
**"Tests after achieve the same goals — it's spirit not ritual"**
No. Tests-after answer "What does this do?" Tests-first answer "What should this do?"
Tests-after are biased by your implementation. You test what you built, not what's required. Tests-first force edge case discovery before implementing.
## Common Rationalizations
| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
| "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
| "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
| "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
| "Need to explore first" | Fine. Throw away exploration, start with TDD. |
| "Test hard = design unclear" | Listen to the test. Hard to test = hard to use. |
| "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. |
| "Manual test faster" | Manual doesn't prove edge cases. You'll re-test every change. |
| "Existing code has no tests" | You're improving it. Add tests for the code you touch. |
## Red Flags — STOP and Start Over
If you catch yourself doing any of these, delete the code and restart with TDD:
- Code before test
- Test after implementation
- Test passes immediately on first run
- Can't explain why test failed
- Tests added "later"
- Rationalizing "just this once"
- "I already manually tested it"
- "Tests after achieve the same purpose"
- "Keep as reference" or "adapt existing code"
- "Already spent X hours, deleting is wasteful"
- "TDD is dogmatic, I'm being pragmatic"
- "This is different because..."
**All of these mean: Delete code. Start over with TDD.**
## Verification Checklist
Before marking work complete:
- [ ] Every new function/method has a test
- [ ] Watched each test fail before implementing
- [ ] Each test failed for expected reason (feature missing, not typo)
- [ ] Wrote minimal code to pass each test
- [ ] All tests pass
- [ ] Output pristine (no errors, warnings)
- [ ] Tests use real code (mocks only if unavoidable)
- [ ] Edge cases and errors covered
Can't check all boxes? You skipped TDD. Start over.
## When Stuck
| Problem | Solution |
|---------|----------|
| Don't know how to test | Write the wished-for API. Write the assertion first. Ask the user. |
| Test too complicated | Design too complicated. Simplify the interface. |
| Must mock everything | Code too coupled. Use dependency injection. |
| Test setup huge | Extract helpers. Still complex? Simplify the design. |
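The "must mock everything" row deserves a sketch: inject the collaborator instead of patching it, and the test supplies a fake. Names here are hypothetical:

```python
# Coupled version (hard to test without patching):
#   def send_welcome(user): http_post("https://mail.example/send", user)

# Decoupled: the sender is injected, so tests pass a fake
def send_welcome(user, send):
    send(f"Welcome, {user}!")

# Test with a plain list as the fake sender
sent = []
send_welcome("Ada", sent.append)
assert sent == ["Welcome, Ada!"]
```

Production code passes the real sender; the test never touches the network.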
## Hermes Agent Integration
### Running Tests
Use the `terminal` tool to run tests at each step:

```python
# RED — verify failure
terminal("pytest tests/test_feature.py::test_name -v")

# GREEN — verify pass
terminal("pytest tests/test_feature.py::test_name -v")

# Full suite — verify no regressions
terminal("pytest tests/ -q")
```

### With delegate_task

When dispatching subagents for implementation, enforce TDD in the goal:

```python
delegate_task(
    goal="Implement [feature] using strict TDD",
    context="""
Follow test-driven-development skill:
1. Write failing test FIRST
2. Run test to verify it fails
3. Write minimal code to pass
4. Run test to verify it passes
5. Refactor if needed
6. Commit

Project test command: pytest tests/ -q
Project structure: [describe relevant files]
""",
    toolsets=['terminal', 'file']
)
```

```python
# tests/test_calculator.py
def test_adds_two_numbers():
    calc = Calculator()
    result = calc.add(2, 3)
    assert result == 5
```

### 3. Run and Verify Failure

```bash
pytest tests/test_calculator.py -v
# Expected: FAIL - Calculator not defined
```

### 4. Write Minimal Implementation

```python
# src/calculator.py
class Calculator:
    def add(self, a, b):
        return a + b  # Minimal!
```

### 5. Run and Verify Pass

```bash
pytest tests/test_calculator.py -v
# Expected: PASS
```

### 6. Commit

```bash
git add tests/test_calculator.py src/calculator.py
git commit -m "feat: add calculator with add method"
```

### 7. Next Test

```python
def test_adds_negative_numbers():
    calc = Calculator()
    result = calc.add(-2, -3)
    assert result == -5
```

Repeat cycle.
## Testing Anti-Patterns
### Mocking What You Don't Own
**Bad:** Mock database, HTTP client, file system
**Good:** Abstract behind interface, test interface
### Testing Implementation Details
**Bad:** Test that function was called
**Good:** Test the result/behavior
### Happy Path Only
**Bad:** Only test expected inputs
**Good:** Test edge cases, errors, boundaries
### Brittle Tests
**Bad:** Test breaks when refactoring
**Good:** Tests verify behavior, not structure
## Common Pitfalls
### "I'll Write Tests After"
No, you won't. Write them first.
### "This is Too Simple to Test"
Simple bugs cause complex problems. Test everything.
### "I Need to See It Work First"
Temporary code becomes permanent. Test first.
### "Tests Take Too Long"
Untested code takes longer. Invest in tests.
## Language-Specific Commands
### Python (pytest)
```bash
# Run all tests
pytest
# Run specific test
pytest tests/test_feature.py::test_name -v
# Run with coverage
pytest --cov=src --cov-report=term-missing
# Watch mode
pytest-watch
```
### JavaScript (Jest)
```bash
# Run all tests
npm test
# Run specific test
npm test -- test_name
# Watch mode
npm test -- --watch
# Coverage
npm test -- --coverage
```
### TypeScript (Jest with ts-jest)
```bash
# Run tests
npx jest
# Run specific file
npx jest tests/feature.test.ts
```
### Go
```bash
# Run all tests
go test ./...
# Run specific test
go test -run TestName
# Verbose
go test -v
# Coverage
go test -cover
```
### Rust
```bash
# Run tests
cargo test
# Run specific test
cargo test test_name
# Show output
cargo test -- --nocapture
```
## Integration with Other Skills

### With writing-plans

Every plan task should specify:
- What test to write
- Expected test failure
- Minimal implementation
- Refactoring opportunities

### With systematic-debugging

When fixing bugs:
1. Write test that reproduces bug
2. Verify test fails (RED)
3. Fix bug (GREEN)
4. Refactor if needed

Bug found? Write failing test reproducing it. Follow TDD cycle. The test proves the fix and prevents regression. Never fix bugs without a test.

### With subagent-driven-development

Subagent implements one test at a time:
1. Write failing test
2. Minimal code to pass
3. Commit
4. Next test

## Testing Anti-Patterns

- **Testing mock behavior instead of real behavior** — mocks should verify interactions, not replace the system under test
- **Testing implementation details** — test behavior/results, not internal method calls
- **Happy path only** — always test edge cases, errors, and boundaries
- **Brittle tests** — tests should verify behavior, not structure; refactoring shouldn't break them

## Success Indicators

**You're doing TDD right when:**
- Tests fail before code exists
- You write <10 lines between test runs
- Refactoring feels safe
- Bugs are caught immediately
- Code is simpler than expected

**Red flags:**
- Writing 50+ lines without running tests
- Tests always pass
- Fear of refactoring
- "I'll test later"

## Remember

1. **RED** - Write failing test
2. **GREEN** - Minimal code to pass
3. **REFACTOR** - Clean up safely
4. **REPEAT** - Next behavior

## Final Rule

```
Production code → test exists and failed first
Otherwise → not TDD
```

**If you didn't see it fail, it doesn't test what you think.**

No exceptions without the user's explicit permission.

View File

@@ -1,8 +1,8 @@
---
name: writing-plans
description: Use when you have a spec or requirements for a multi-step task. Creates comprehensive implementation plans with bite-sized tasks, exact file paths, and complete code examples.
version: 1.0.0
author: Hermes Agent (adapted from Superpowers)
version: 1.1.0
author: Hermes Agent (adapted from obra/superpowers)
license: MIT
metadata:
hermes:
@@ -14,112 +14,34 @@ metadata:
## Overview
Transform specifications into actionable implementation plans. Write comprehensive plans that any developer can follow - even with zero context about your codebase.
Write comprehensive implementation plans assuming the implementer has zero context for the codebase and questionable taste. Document everything they need: which files to touch, complete code, testing commands, docs to check, how to verify. Give them bite-sized tasks. DRY. YAGNI. TDD. Frequent commits.
**Core principle:** Document everything: exact file paths, complete code, test commands, verification steps.
Assume the implementer is a skilled developer but knows almost nothing about the toolset or problem domain. Assume they don't know good test design very well.
**Assume the implementer:**
- Is a skilled developer
- Knows almost nothing about your codebase
- Has questionable taste in code style
- Needs explicit guidance
**Core principle:** A good plan makes implementation obvious. If someone has to guess, the plan is incomplete.
## When to Use
**Always use this skill:**
- Before implementing multi-step features
- After design approval (from brainstorming)
- When breaking down complex requirements
- Before delegating to subagents
**Always use before:**
- Implementing multi-step features
- Breaking down complex requirements
- Delegating to subagents via subagent-driven-development
**Don't skip when:**
- Feature seems simple (assumptions cause bugs)
- You plan to implement yourself (future you needs guidance)
- You plan to implement it yourself (future you needs guidance)
- Working alone (documentation matters)
## Bite-Sized Task Granularity

**Each task = 2-5 minutes of focused work.**

## Plan Document Structure

### Header (Required)

Every plan MUST start with:
```markdown
# [Feature Name] Implementation Plan
> **For Hermes:** Use subagent-driven-development or executing-plans skill to implement this plan.
**Goal:** One sentence describing what this builds
**Architecture:** 2-3 sentences about approach
**Tech Stack:** Key technologies/libraries
---
```
### Task Structure
Each task follows this format:
```markdown
### Task N: [Descriptive Name]
**Objective:** What this task accomplishes (one sentence)
**Files:**
- Create: `exact/path/to/new_file.py`
- Modify: `exact/path/to/existing.py:45-67` (line numbers if known)
- Test: `tests/path/to/test_file.py`
**Implementation Steps:**
**Step 1: [Action description]**
```python
# Complete code to write
class NewClass:
def __init__(self):
self.value = None
def process(self, input):
return input.upper()
```
**Step 2: [Action description]**
```bash
# Command to run
pytest tests/test_new.py -v
```
Expected: Tests pass with 3 green dots
**Step 3: [Action description]**
```python
# More code if needed
```
**Verification:**
- [ ] Test passes
- [ ] Function returns expected output
- [ ] No syntax errors
**Commit:**
```bash
git add src/new_file.py tests/test_new.py
git commit -m "feat: add new feature component"
```
```
## Task Granularity

**Each task = 2-5 minutes of work**

**Break down:**
- "Write failing test" - one task
- "Run test to verify it fails" - one task
- "Implement minimal code" - one task
- "Run test to verify pass" - one task
- "Commit" - one task

Every step is one action:
- "Write the failing test" — step
- "Run it to make sure it fails" — step
- "Implement the minimal code to make the test pass" — step
- "Run the tests and make sure they pass" — step
- "Commit" — step

**Examples:**

**Too big:**
```markdown
@@ -139,6 +61,146 @@ git commit -m "feat: add new feature component"
[15 lines, 1 file]
```
## Plan Document Structure
### Header (Required)
Every plan MUST start with:
```markdown
# [Feature Name] Implementation Plan
> **For Hermes:** Use subagent-driven-development skill to implement this plan task-by-task.
**Goal:** [One sentence describing what this builds]
**Architecture:** [2-3 sentences about approach]
**Tech Stack:** [Key technologies/libraries]
---
```
### Task Structure
Each task follows this format:
````markdown
### Task N: [Descriptive Name]
**Objective:** What this task accomplishes (one sentence)
**Files:**
- Create: `exact/path/to/new_file.py`
- Modify: `exact/path/to/existing.py:45-67` (line numbers if known)
- Test: `tests/path/to/test_file.py`
**Step 1: Write failing test**
```python
def test_specific_behavior():
result = function(input)
assert result == expected
```
**Step 2: Run test to verify failure**
Run: `pytest tests/path/test.py::test_specific_behavior -v`
Expected: FAIL — "function not defined"
**Step 3: Write minimal implementation**
```python
def function(input):
return expected
```
**Step 4: Run test to verify pass**
Run: `pytest tests/path/test.py::test_specific_behavior -v`
Expected: PASS
**Step 5: Commit**
```bash
git add tests/path/test.py src/path/file.py
git commit -m "feat: add specific feature"
```
````
## Writing Process
### Step 1: Understand Requirements
Read and understand:
- Feature requirements
- Design documents or user description
- Acceptance criteria
- Constraints
### Step 2: Explore the Codebase
Use Hermes tools to understand the project:
```python
# Understand project structure
search_files("*.py", target="files", path="src/")
# Look at similar features
search_files("similar_pattern", path="src/", file_glob="*.py")
# Check existing tests
search_files("*.py", target="files", path="tests/")
# Read key files
read_file("src/app.py")
```
### Step 3: Design Approach
Decide:
- Architecture pattern
- File organization
- Dependencies needed
- Testing strategy
### Step 4: Write Tasks
Create tasks in order:
1. Setup/infrastructure
2. Core functionality (TDD for each)
3. Edge cases
4. Integration
5. Cleanup/documentation
### Step 5: Add Complete Details
For each task, include:
- **Exact file paths** (not "the config file" but `src/config/settings.py`)
- **Complete code examples** (not "add validation" but the actual code)
- **Exact commands** with expected output
- **Verification steps** that prove the task works
### Step 6: Review the Plan
Check:
- [ ] Tasks are sequential and logical
- [ ] Each task is bite-sized (2-5 min)
- [ ] File paths are exact
- [ ] Code examples are complete (copy-pasteable)
- [ ] Commands are exact with expected output
- [ ] No missing context
- [ ] DRY, YAGNI, TDD principles applied
### Step 7: Save the Plan
```bash
mkdir -p docs/plans
# Save plan to docs/plans/YYYY-MM-DD-feature-name.md
git add docs/plans/
git commit -m "docs: add implementation plan for [feature]"
```
## Principles
### DRY (Don't Repeat Yourself)
@@ -146,35 +208,21 @@ git commit -m "feat: add new feature component"
**Bad:** Copy-paste validation in 3 places
**Good:** Extract validation function, use everywhere
```python
# Good - DRY
def validate_email(email):
if not re.match(r'^[^@]+@[^@]+$', email):
raise ValueError("Invalid email")
return email
# Use everywhere
validate_email(user_input)
validate_email(config_email)
validate_email(imported_data)
```
### YAGNI (You Aren't Gonna Need It)
**Bad:** Add "flexibility" for future requirements
**Good:** Implement only what's needed now
```python
# Bad - YAGNI violation
# Bad — YAGNI violation
class User:
def __init__(self, name, email):
self.name = name
self.email = email
self.preferences = {} # Not needed yet!
self.metadata = {} # Not needed yet!
self.settings = {} # Not needed yet!
# Good - YAGNI
# Good — YAGNI
class User:
def __init__(self, name, email):
self.name = name
@@ -183,7 +231,7 @@ class User:
### TDD (Test-Driven Development)
Every task that produces code should include:
Every task that produces code should include the full TDD cycle:
1. Write failing test
2. Run to verify failure
3. Write minimal code
4. Run to verify pass
5. Commit:

```bash
git add [files]
git commit -m "type: description"
```
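In miniature, using a hypothetical `slugify` helper (not taken from any plan in this document), the cycle reads:

```python
# Steps 1-2: the test exists before the code; running it now fails with NameError.
def test_slugify():
    assert slugify("Hello World") == "hello-world"

# Steps 3-4: the minimal implementation; re-running the test now passes.
def slugify(text):
    return text.strip().lower().replace(" ", "-")
```

Nothing beyond what the test demands is written, which is YAGNI enforced by the cycle itself.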
## Example Plan
````markdown
# User Authentication Implementation Plan

> **For Hermes:** Use subagent-driven-development to implement this plan.

**Goal:** Add JWT-based user authentication to the Flask API

**Architecture:** Use PyJWT for tokens, bcrypt for hashing. Middleware validates tokens on protected routes.

**Tech Stack:** Python, Flask, PyJWT, bcrypt

---

### Task 1: Create User model

**Objective:** Define User model with email and hashed password

**Files:**
- Create: `src/models/user.py`
- Test: `tests/models/test_user.py`

**Step 1: Write failing test**

```python
def test_user_creation():
    user = User(email="test@example.com", password="secret123")
    assert user.email == "test@example.com"
    assert user.password_hash is not None
    assert user.password_hash != "secret123"  # Should be hashed
```

**Step 2: Run to verify failure**

```bash
pytest tests/models/test_user.py -v
```

Expected: FAIL - User class not defined

**Step 3: Implement User model**

```python
import bcrypt

class User:
    def __init__(self, email, password):
        self.email = email
        self.password_hash = bcrypt.hashpw(
            password.encode(),
            bcrypt.gensalt()
        )
```

**Step 4: Run to verify pass**

```bash
pytest tests/models/test_user.py -v
```

Expected: PASS

**Commit:**

```bash
git add src/models/user.py tests/models/test_user.py
git commit -m "feat: add User model with password hashing"
```

### Task 2: Create login endpoint

**Objective:** Add POST /login endpoint that returns JWT

**Files:**
- Modify: `src/app.py`
- Test: `tests/test_login.py`

[Continue...]
````
## Common Mistakes
### Vague Tasks
**Bad:** "Add authentication"
**Good:** "Create User model with email and password_hash fields"
### Incomplete Code
**Bad:** "Step 1: Add validation function"
**Good:** "Step 1: Add validation function" followed by the complete function code
### Missing Verification
**Bad:** "Step 3: Test it works"
**Good:** "Step 3: Run `pytest tests/test_auth.py -v`, expected: 3 passed"
### Missing File Paths
**Bad:** "Create the model file"
**Good:** "Create: `src/models/user.py`"
## Execution Handoff
After saving the plan, offer the execution approach:
**"Plan complete and saved. Ready to execute using subagent-driven-development — I'll dispatch a fresh subagent per task with two-stage review (spec compliance then code quality). Shall I proceed?"**
## Success Checklist
Before considering a plan complete:
- [ ] Header with goal, architecture, tech stack
- [ ] All tasks are bite-sized (2-5 min each)
- [ ] Exact file paths for every file
- [ ] Complete code examples (not partial)
- [ ] Exact commands with expected output
- [ ] Verification steps for each task
- [ ] Commit commands included
- [ ] DRY, YAGNI, TDD principles applied
- [ ] Tasks are sequential and logical
- [ ] Plan saved to docs/plans/
When executing, use the `subagent-driven-development` skill:
- Fresh `delegate_task` per task with full context
- Spec compliance review after each task
- Code quality review after spec passes
- Proceed only when both reviews approve
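As a sketch of the shape that per-task dispatch might take: the field names below are purely illustrative, not the actual `delegate_task` schema; consult the Hermes tool reference for the real parameters and toolset names.

```python
# Illustrative only: the real delegate_task parameters may differ.
task_dispatch = {
    "task": "Task 1: Create User model",
    "context": "Verbatim task text from docs/plans/, including file paths and code",
    "toolset": ["read_file", "write_file", "terminal"],
    "done_when": "pytest tests/models/test_user.py -v reports all tests passing",
}
```

The point is that each dispatch carries the full task text from the plan, so the subagent never needs missing context.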
## Remember
```
Bite-sized tasks (2-5 min each)
Exact file paths
Complete code (copy-pasteable)
Exact commands with expected output
Verification steps
DRY, YAGNI, TDD
Frequent commits
```
**A good plan makes implementation obvious.**