refactor: enhance software-development skills with Hermes integration

Improvements to all 5 skills adapted from obra/superpowers:

- Restored anti-rationalization tables and red flags from originals
  (key behavioral guardrails that prevent LLMs from taking shortcuts)
- Restored 'Rule of Three' for debugging (after 3+ failed fixes, question
  the architecture instead of continuing to patch)
- Restored Pattern Analysis and Hypothesis Testing phases in debugging
- Restored 'Why Order Matters' rebuttals and verification checklist in TDD
- Added proper Hermes delegate_task integration with real parameter examples
  and toolset specifications throughout
- Added Hermes tool usage (search_files, read_file, terminal) for
  investigation and verification steps
- Removed references to non-existent skills (brainstorming,
  finishing-a-development-branch, executing-plans, using-git-worktrees)
- Removed generic language-specific sections (Go, Rust, Jest) that
  added bulk without agent value
- Tightened prose — cut ~430 lines while adding more actionable content
- Added execution handoff section to writing-plans
- Consistent cross-references between the 5 skills
Author: teknium1
Date: 2026-03-03 04:08:56 -08:00
Parent: 0e1723ef74
Commit: de0af4df66

5 changed files with 879 additions and 1312 deletions
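The `delegate_task` integration mentioned in the commit message follows one call shape throughout the diffs below: a goal, a multi-line context block, and an explicit toolset list. A minimal runnable sketch with the tool stubbed out; the stub, its validation, and the toolset names are illustrative assumptions, not the real Hermes API:

```python
# Illustrative stub only: the real Hermes `delegate_task` dispatches a
# subagent. This stand-in merely validates the argument shape the skills use.
def delegate_task(goal: str, context: str, toolsets: list) -> dict:
    if not goal or not context:
        raise ValueError("goal and context are required")
    known = {"terminal", "file", "web"}  # assumed toolset names
    unknown = set(toolsets) - known
    if unknown:
        raise ValueError(f"unknown toolsets: {unknown}")
    return {"goal": goal, "toolsets": toolsets, "status": "dispatched"}

result = delegate_task(
    goal="Review implementation for correctness and quality",
    context="""
WHAT WAS IMPLEMENTED:
[Brief description of the feature/fix]
""",
    toolsets=["file"],
)
print(result["status"])  # dispatched
```

The point of the shape is that context is self-contained: the subagent should never need to go read a plan file to understand its task.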


@@ -1,8 +1,8 @@
 ---
 name: requesting-code-review
 description: Use when completing tasks, implementing major features, or before merging. Validates work meets requirements through systematic review process.
-version: 1.0.0
+version: 1.1.0
-author: Hermes Agent (adapted from Superpowers)
+author: Hermes Agent (adapted from obra/superpowers)
 license: MIT
 metadata:
 hermes:
@@ -14,108 +14,102 @@ metadata:
 ## Overview
-Systematic code review catches issues before they cascade. Review early, review often.
+Dispatch a reviewer subagent to catch issues before they cascade. Review early, review often.
 **Core principle:** Fresh perspective finds issues you'll miss.
 ## When to Request Review
-**Mandatory Reviews:**
+**Mandatory:**
 - After each task in subagent-driven development
-- After completing major features
+- After completing a major feature
 - Before merge to main
 - After bug fixes
-**Optional but Valuable:**
+**Optional but valuable:**
 - When stuck (fresh perspective)
 - Before refactoring (baseline check)
 - After complex logic implementation
 - When touching critical code (auth, payments, data)
-**Don't skip because:**
+**Never skip because:**
 - "It's simple" (simple bugs compound)
 - "I'm in a hurry" (reviews save time)
 - "I tested it" (you have blind spots)
 ## Review Process
-### Step 1: Prepare Context
+### Step 1: Self-Review First
-Gather:
+Before dispatching a reviewer, check yourself:
-- What was implemented
-- Original requirements/plan
-- Files changed
-- Test results
-```bash
-# Get changed files
-git diff --name-only HEAD~1
-# Get diff summary
-git diff --stat HEAD~1
-# Get commit messages
-git log --oneline HEAD~5
-```
-### Step 2: Self-Review First
-Before requesting external review:
-**Checklist:**
 - [ ] Code follows project conventions
 - [ ] All tests pass
-- [ ] No debug print statements
+- [ ] No debug print statements left
-- [ ] No hardcoded secrets
+- [ ] No hardcoded secrets or credentials
 - [ ] Error handling in place
-- [ ] Documentation updated
 - [ ] Commit messages are clear
 ```bash
 # Run full test suite
-pytest
+pytest tests/ -q
 # Check for debug code
-grep -r "print(" src/ --include="*.py"
+search_files("print(", path="src/", file_glob="*.py")
-grep -r "debugger" src/ --include="*.js"
+search_files("console.log", path="src/", file_glob="*.js")
 # Check for TODOs
-grep -r "TODO" src/ --include="*.py"
+search_files("TODO|FIXME|HACK", path="src/")
 ```
-### Step 3: Request Review
+### Step 2: Gather Context
-Use `delegate_task` to dispatch a reviewer subagent:
+```bash
+# Changed files
+git diff --name-only HEAD~1
+# Diff summary
+git diff --stat HEAD~1
+# Recent commits
+git log --oneline -5
+```
+### Step 3: Dispatch Reviewer Subagent
+Use `delegate_task` to dispatch a focused reviewer:
 ```python
 delegate_task(
-goal="Review implementation for quality and correctness",
+goal="Review implementation for correctness and quality",
 context="""
-WHAT WAS IMPLEMENTED: [Brief description]
+WHAT WAS IMPLEMENTED:
+[Brief description of the feature/fix]
-ORIGINAL REQUIREMENTS: [From plan or issue]
+ORIGINAL REQUIREMENTS:
+[From plan, issue, or user request]
 FILES CHANGED:
-- src/feature.py (added X function)
+- src/models/user.py (added User class)
-- tests/test_feature.py (added tests)
+- src/auth/login.py (added login endpoint)
+- tests/test_auth.py (added 8 tests)
-COMMIT RANGE: [SHA range or branch]
+REVIEW CHECKLIST:
-CHECK FOR:
+- [ ] Correctness: Does it do what it should?
-- Correctness (does it do what it should?)
+- [ ] Edge cases: Are they handled?
-- Edge cases handled?
+- [ ] Error handling: Is it adequate?
-- Error handling adequate?
+- [ ] Code quality: Clear names, good structure?
-- Code quality and style
+- [ ] Test coverage: Are tests meaningful?
-- Test coverage
+- [ ] Security: Any vulnerabilities?
-- Security issues
+- [ ] Performance: Any obvious issues?
-- Performance concerns
 OUTPUT FORMAT:
 - Summary: [brief assessment]
-- Critical Issues: [must fix]
+- Critical Issues: [must fix — blocks merge]
-- Important Issues: [should fix]
+- Important Issues: [should fix before merge]
 - Minor Issues: [nice to have]
-- Verdict: [APPROVE / REQUEST_CHANGES / NEEDS_WORK]
+- Strengths: [what was done well]
+- Verdict: APPROVE / REQUEST_CHANGES
 """,
 toolsets=['file']
 )
@@ -123,110 +117,72 @@ delegate_task(
 ### Step 4: Act on Feedback
-**Critical Issues (Block merge):**
+**Critical Issues (block merge):**
 - Security vulnerabilities
 - Broken functionality
 - Data loss risk
 - Test failures
+- **Action:** Fix immediately before proceeding
-**Action:** Fix immediately before proceeding
+**Important Issues (should fix):**
-**Important Issues (Should fix):**
 - Missing edge case handling
 - Poor error messages
 - Unclear code
 - Missing tests
+- **Action:** Fix before merge if possible
-**Action:** Fix before merge if possible
+**Minor Issues (nice to have):**
-**Minor Issues (Nice to have):**
 - Style preferences
 - Refactoring suggestions
 - Documentation improvements
+- **Action:** Note for later or quick fix
-**Action:** Note for later or quick fix
+**If reviewer is wrong:**
+- Push back with technical reasoning
+- Show code/tests that prove it works
+- Request clarification
 ## Review Dimensions
 ### Correctness
-**Questions:**
 - Does it implement the requirements?
 - Are there logic errors?
 - Do edge cases work?
 - Are there race conditions?
-**Check:**
-- Read implementation against requirements
-- Trace through edge cases
-- Check boundary conditions
 ### Code Quality
-**Questions:**
 - Is code readable?
-- Are names clear?
+- Are names clear and descriptive?
-- Is it too complex?
+- Is it too complex? (Functions >20 lines = smell)
 - Is there duplication?
-**Check:**
-- Function length (aim <20 lines)
-- Cyclomatic complexity
-- DRY violations
-- Naming clarity
 ### Testing
+- Are there meaningful tests?
-**Questions:**
-- Are there tests?
 - Do they cover edge cases?
-- Are they meaningful?
+- Do they test behavior, not implementation?
-- Do they pass?
+- Do all tests pass?
-**Check:**
-- Test coverage
-- Edge case coverage
-- Test clarity
-- Assertion quality
 ### Security
-**Questions:**
 - Any injection vulnerabilities?
 - Proper input validation?
 - Secrets handled correctly?
 - Access control in place?
-**Check:**
-- Input sanitization
-- Authentication/authorization
-- Secret management
-- SQL/query safety
 ### Performance
-**Questions:**
 - Any N+1 queries?
-- Unnecessary computation?
+- Unnecessary computation in loops?
 - Memory leaks?
-- Scalability concerns?
+- Missing caching opportunities?
-**Check:**
-- Database queries
-- Algorithm complexity
-- Resource usage
-- Caching opportunities
 ## Review Output Format
-Standard review format:
+Standard format for reviewer subagent output:
 ```markdown
 ## Review Summary
 **Assessment:** [Brief overall assessment]
+**Verdict:** APPROVE / REQUEST_CHANGES
-**Verdict:** [APPROVE / REQUEST_CHANGES / NEEDS_WORK]
 ---
@@ -237,8 +193,6 @@ Standard review format:
 - Problem: [Description]
 - Suggestion: [How to fix]
----
 ## Important Issues (Should Fix)
 1. **[Issue title]**
@@ -246,15 +200,11 @@ Standard review format:
 - Problem: [Description]
 - Suggestion: [How to fix]
----
 ## Minor Issues (Optional)
 1. **[Issue title]**
 - Suggestion: [Improvement idea]
----
 ## Strengths
 - [What was done well]
@@ -264,111 +214,41 @@ Standard review format:
 ### With subagent-driven-development
-Review after EACH task:
+Review after EACH task — this is the two-stage review:
-1. Subagent implements task
+1. Spec compliance review (does it match the plan?)
-2. Request code review
+2. Code quality review (is it well-built?)
-3. Fix issues
+3. Fix issues from either review
-4. Proceed to next task
+4. Proceed to next task only when both approve
 ### With test-driven-development
-Review checks:
+Review verifies:
-- Tests written first?
+- Tests were written first (RED-GREEN-REFACTOR followed?)
-- Tests are meaningful?
+- Tests are meaningful (not just asserting True)?
 - Edge cases covered?
 - All tests pass?
 ### With writing-plans
 Review validates:
-- Implementation matches plan?
+- Implementation matches the plan?
 - All tasks completed?
 - Quality standards met?
-## Common Review Patterns
+## Red Flags
-### Pre-Merge Review
+**Never:**
+- Skip review because "it's simple"
-Before merging feature branch:
+- Ignore Critical issues
+- Proceed with unfixed Important issues
-```bash
+- Argue with valid technical feedback without evidence
-# Create review checkpoint
-git log --oneline main..feature-branch
-# Get summary of changes
-git diff --stat main..feature-branch
-# Request review
-delegate_task(
-goal="Pre-merge review of feature branch",
-context="[changes, requirements, test results]"
-)
-# Address feedback
-# Merge when approved
-```
-### Continuous Review
-During subagent-driven development:
-```python
-# After each task
-if task_complete:
-    review_result = request_review(task)
-    if review_result.has_critical_issues():
-        fix_issues(review_result.critical)
-        re_review()
-    proceed_to_next_task()
-```
-### Emergency Review
-When fixing production bugs:
-1. Fix with tests
-2. Self-review
-3. Quick peer review
-4. Deploy
-5. Full review post-deploy
-## Review Best Practices
-### As Review Requester
-**Do:**
-- Provide complete context
-- Highlight areas of concern
-- Ask specific questions
-- Be responsive to feedback
-- Fix issues promptly
-**Don't:**
-- Rush the reviewer
-- Argue without evidence
-- Ignore feedback
-- Take criticism personally
-### As Reviewer (via subagent)
-**Do:**
-- Be specific about issues
-- Suggest improvements
-- Acknowledge what works
-- Prioritize issues
-**Don't:**
-- Nitpick style (unless project requires)
-- Make vague comments
-- Block without explanation
-- Be overly critical
 ## Quality Gates
 **Must pass before merge:**
 - [ ] No critical issues
 - [ ] All tests pass
-- [ ] Review approved
+- [ ] Review verdict: APPROVE
 - [ ] Requirements met
 **Should pass before merge:**
@@ -376,10 +256,6 @@ When fixing production bugs:
 - [ ] Documentation updated
 - [ ] Performance acceptable
-**Nice to have:**
-- [ ] No minor issues
-- [ ] Extra polish
 ## Remember
 ```
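The `search_files` calls that replace the grep commands in this skill amount to a regex scan over a file tree. A minimal Python equivalent, sketched under the assumption that `search_files(pattern, path, file_glob)` has regex-and-glob semantics; the return shape here is illustrative, not the real tool's:

```python
import re
from pathlib import Path

def search_files(pattern: str, path: str, file_glob: str = "*"):
    """Return (file, line_number, line) tuples for lines matching `pattern`.

    Illustrative equivalent of the tool call used in the skill; the real
    Hermes tool may differ in signature and output format.
    """
    rx = re.compile(pattern)
    hits = []
    for p in sorted(Path(path).rglob(file_glob)):
        if not p.is_file():
            continue
        for n, line in enumerate(p.read_text(errors="ignore").splitlines(), 1):
            if rx.search(line):
                hits.append((str(p), n, line.strip()))
    return hits
```

Used as in the self-review step, `search_files("TODO|FIXME|HACK", path="src/")` flags leftover markers before a reviewer ever sees the diff.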


@@ -1,8 +1,8 @@
 ---
 name: subagent-driven-development
 description: Use when executing implementation plans with independent tasks. Dispatches fresh delegate_task per task with two-stage review (spec compliance then code quality).
-version: 1.0.0
+version: 1.1.0
-author: Hermes Agent (adapted from Superpowers)
+author: Hermes Agent (adapted from obra/superpowers)
 license: MIT
 metadata:
 hermes:
@@ -16,33 +16,41 @@ metadata:
 Execute implementation plans by dispatching fresh subagents per task with systematic two-stage review.
-**Core principle:** Fresh subagent per task + two-stage review (spec then quality) = high quality, fast iteration
+**Core principle:** Fresh subagent per task + two-stage review (spec then quality) = high quality, fast iteration.
 ## When to Use
 Use this skill when:
-- You have an implementation plan (from writing-plans skill)
+- You have an implementation plan (from writing-plans skill or user requirements)
 - Tasks are mostly independent
+- You want to stay in the current session
 - Quality and spec compliance are important
+- You want automated review between tasks
-**vs. Manual execution:**
+**vs. manual execution:**
-- Parallel task execution possible
+- Fresh context per task (no confusion from accumulated state)
-- Automated review process
+- Automated review process catches issues early
-- Consistent quality checks
+- Consistent quality checks across all tasks
-- Better for complex multi-step plans
+- Subagents can ask questions before starting work
 ## The Process
 ### 1. Read and Parse Plan
-```markdown
+Read the plan file. Extract ALL tasks with their full text and context upfront. Create a todo list:
-[Read plan file once: docs/plans/feature-plan.md]
-[Extract all tasks with full text and context]
+```python
-[Create todo list with all tasks]
+# Read the plan
+read_file("docs/plans/feature-plan.md")
+# Create todo list with all tasks
+todo([
+{"id": "task-1", "content": "Create User model with email field", "status": "pending"},
+{"id": "task-2", "content": "Add password hashing utility", "status": "pending"},
+{"id": "task-3", "content": "Create login endpoint", "status": "pending"},
+])
 ```
-**Action:** Read plan, extract tasks, create todo list.
+**Key:** Read the plan ONCE. Extract everything. Don't make subagents read the plan file — provide the full task text directly in context.
 ### 2. Per-Task Workflow
@@ -50,124 +58,137 @@ For EACH task in the plan:
 #### Step 1: Dispatch Implementer Subagent
-Use `delegate_task` with:
+Use `delegate_task` with complete context:
-- **goal:** Implement [specific task from plan]
-- **context:** Full task description from plan, project structure, relevant files
-- **toolsets:** ['terminal', 'file', 'web'] (or as needed)
-**Example:**
 ```python
-# Task: Add user authentication middleware
 delegate_task(
-goal="Implement JWT authentication middleware as specified in Task 3 of the plan",
+goal="Implement Task 1: Create User model with email and password_hash fields",
 context="""
-Task from plan:
+TASK FROM PLAN:
-- Create: src/middleware/auth.py
+- Create: src/models/user.py
-- Validate JWT tokens from Authorization header
+- Add User class with email (str) and password_hash (str) fields
-- Return 401 for invalid tokens
+- Use bcrypt for password hashing
-- Attach user info to request object
+- Include __repr__ for debugging
-Project structure:
+FOLLOW TDD:
-- Flask app in src/app.py
+1. Write failing test in tests/models/test_user.py
-- Uses PyJWT library
+2. Run: pytest tests/models/test_user.py -v (verify FAIL)
-- Existing middleware pattern in src/middleware/
+3. Write minimal implementation
+4. Run: pytest tests/models/test_user.py -v (verify PASS)
+5. Run: pytest tests/ -q (verify no regressions)
+6. Commit: git add -A && git commit -m "feat: add User model with password hashing"
+PROJECT CONTEXT:
+- Python 3.11, Flask app in src/app.py
+- Existing models in src/models/
+- Tests use pytest, run from project root
+- bcrypt already in requirements.txt
 """,
 toolsets=['terminal', 'file']
 )
 ```
-#### Step 2: Implementer Subagent Works
+#### Step 2: Dispatch Spec Compliance Reviewer
-The subagent will:
+After the implementer completes, verify against the original spec:
-1. Ask questions if needed (you answer)
-2. Implement the task following TDD
-3. Write tests
-4. Run tests to verify
-5. Self-review
-6. Report completion
-**Your role:** Answer questions, provide context.
-#### Step 3: Spec Compliance Review
-Dispatch reviewer subagent:
 ```python
 delegate_task(
-goal="Review if implementation matches spec from plan",
+goal="Review if implementation matches the spec from the plan",
 context="""
-Original task spec: [copy from plan]
+ORIGINAL TASK SPEC:
-Implementation: [file paths and key code]
+- Create src/models/user.py with User class
+- Fields: email (str), password_hash (str)
-Check:
+- Use bcrypt for password hashing
-- All requirements from spec implemented?
+- Include __repr__
-- File paths match spec?
-- Behavior matches spec?
+CHECK:
-- Nothing extra added?
+- [ ] All requirements from spec implemented?
+- [ ] File paths match spec?
+- [ ] Function signatures match spec?
+- [ ] Behavior matches expected?
+- [ ] Nothing extra added (no scope creep)?
+OUTPUT: PASS or list of specific spec gaps to fix.
 """,
 toolsets=['file']
 )
 ```
-**If spec issues found:**
+**If spec issues found:** Fix gaps, then re-run spec review. Continue only when spec-compliant.
-- Subagent fixes gaps
-- Re-run spec review
-- Continue only when spec-compliant
-#### Step 4: Code Quality Review
+#### Step 3: Dispatch Code Quality Reviewer
-Dispatch quality reviewer:
+After spec compliance passes:
 ```python
 delegate_task(
-goal="Review code quality and best practices",
+goal="Review code quality for Task 1 implementation",
 context="""
-Code to review: [file paths]
+FILES TO REVIEW:
+- src/models/user.py
-Check:
+- tests/models/test_user.py
-- Follows project style?
-- Proper error handling?
+CHECK:
-- Good naming?
+- [ ] Follows project conventions and style?
-- Test coverage adequate?
+- [ ] Proper error handling?
-- No obvious bugs?
+- [ ] Clear variable/function names?
+- [ ] Adequate test coverage?
+- [ ] No obvious bugs or missed edge cases?
+- [ ] No security issues?
+OUTPUT FORMAT:
+- Critical Issues: [must fix before proceeding]
+- Important Issues: [should fix]
+- Minor Issues: [optional]
+- Verdict: APPROVED or REQUEST_CHANGES
 """,
 toolsets=['file']
 )
 ```
-**If quality issues found:**
+**If quality issues found:** Fix issues, re-review. Continue only when approved.
-- Subagent fixes issues
-- Re-run quality review
-- Continue only when approved
-#### Step 5: Mark Complete
+#### Step 4: Mark Complete
-Update todo list, mark task complete.
+```python
+todo([{"id": "task-1", "content": "Create User model with email field", "status": "completed"}], merge=True)
+```
 ### 3. Final Review
-After ALL tasks complete:
+After ALL tasks are complete, dispatch a final integration reviewer:
 ```python
 delegate_task(
-goal="Review entire implementation for consistency",
+goal="Review the entire implementation for consistency and integration issues",
-context="All tasks completed, review for integration issues",
+context="""
-toolsets=['file']
+All tasks from the plan are complete. Review the full implementation:
+- Do all components work together?
+- Any inconsistencies between tasks?
+- All tests passing?
+- Ready for merge?
+""",
+toolsets=['terminal', 'file']
 )
 ```
-### 4. Branch Cleanup
+### 4. Verify and Commit
-Use `finishing-a-development-branch` skill:
+```bash
-- Verify all tests pass
+# Run full test suite
-- Present merge options
+pytest tests/ -q
-- Clean up worktree
+# Review all changes
+git diff --stat
+# Final commit if needed
+git add -A && git commit -m "feat: complete [feature name] implementation"
+```
 ## Task Granularity
-**Good task size:** 2-5 minutes of focused work
+**Each task = 2-5 minutes of focused work.**
-**Examples:**
 **Too big:**
 - "Implement user authentication system"
@@ -177,188 +198,134 @@ Use `finishing-a-development-branch` skill:
 - "Add password hashing function"
 - "Create login endpoint"
 - "Add JWT token generation"
-- "Create registration endpoint"
-## Communication Pattern
+## Red Flags — Never Do These
-### You to Subagent
+- Start implementation without a plan
+- Skip reviews (spec compliance OR code quality)
-**Provide:**
+- Proceed with unfixed critical/important issues
-- Clear task description
+- Dispatch multiple implementation subagents for tasks that touch the same files
-- Exact file paths
+- Make subagent read the plan file (provide full text in context instead)
-- Expected behavior
+- Skip scene-setting context (subagent needs to understand where the task fits)
-- Success criteria
+- Ignore subagent questions (answer before letting them proceed)
-- Relevant context
+- Accept "close enough" on spec compliance
+- Skip review loops (reviewer found issues → implementer fixes → review again)
-**Example:**
+- Let implementer self-review replace actual review (both are needed)
-```
+- **Start code quality review before spec compliance is PASS** (wrong order)
-Task: Add email validation
+- Move to next task while either review has open issues
-Files: Create src/validators/email.py
-Expected: Function returns True for valid emails, False for invalid
-Success: Tests pass for 10 test cases including edge cases
-Context: Used in user registration flow
-```
-### Subagent to You
-**Expect:**
-- Questions for clarification
-- Progress updates
-- Completion report
-- Self-review summary
-**Respond to:**
-- Answer questions promptly
-- Provide missing context
-- Approve approach decisions
-## Two-Stage Review Details
-### Stage 1: Spec Compliance
-**Checks:**
-- [ ] All requirements from plan implemented
-- [ ] File paths match specification
-- [ ] Function signatures match spec
-- [ ] Behavior matches expected
-- [ ] No scope creep (nothing extra)
-**Output:** PASS or list of spec gaps
-### Stage 2: Code Quality
-**Checks:**
-- [ ] Follows language conventions
-- [ ] Consistent with project style
-- [ ] Clear variable/function names
-- [ ] Proper error handling
-- [ ] Adequate test coverage
-- [ ] No obvious bugs/edge cases missed
-- [ ] Documentation if needed
-**Output:** APPROVED or list of issues (critical/important/minor)
 ## Handling Issues
-### Critical Issues
+### If Subagent Asks Questions
-**Examples:** Security vulnerability, broken functionality, data loss risk
+- Answer clearly and completely
+- Provide additional context if needed
+- Don't rush them into implementation
-**Action:** Must fix before proceeding
+### If Reviewer Finds Issues
-### Important Issues
+- Implementer subagent (or a new one) fixes them
+- Reviewer reviews again
+- Repeat until approved
+- Don't skip the re-review
-**Examples:** Missing tests, poor error handling, unclear code
+### If Subagent Fails a Task
-**Action:** Should fix before proceeding
+- Dispatch a new fix subagent with specific instructions about what went wrong
+- Don't try to fix manually in the controller session (context pollution)
-### Minor Issues
+## Efficiency Notes
-**Examples:** Style inconsistency, minor refactoring opportunity
+**Why fresh subagent per task:**
+- Prevents context pollution from accumulated state
+- Each subagent gets clean, focused context
+- No confusion from prior tasks' code or reasoning
-**Action:** Note for later, optional fix
+**Why two-stage review:**
+- Spec review catches under/over-building early
+- Quality review ensures the implementation is well-built
+- Catches issues before they compound across tasks
+**Cost trade-off:**
+- More subagent invocations (implementer + 2 reviewers per task)
+- But catches issues early (cheaper than debugging compounded problems later)
 ## Integration with Other Skills
+### With writing-plans
+This skill EXECUTES plans created by the writing-plans skill:
+1. User requirements → writing-plans → implementation plan
+2. Implementation plan → subagent-driven-development → working code
 ### With test-driven-development
-Subagent should:
+Implementer subagents should follow TDD:
 1. Write failing test first
 2. Implement minimal code
 3. Verify test passes
 4. Commit
-### With systematic-debugging
+Include TDD instructions in every implementer context.
-If subagent encounters bugs:
-1. Pause implementation
-2. Debug systematically
-3. Fix root cause
-4. Resume
-### With writing-plans
-This skill EXECUTES plans created by writing-plans skill.
-**Sequence:**
-1. brainstorming → writing-plans → subagent-driven-development
 ### With requesting-code-review
-After subagent completes task, use requesting-code-review skill for final validation.
+The two-stage review process IS the code review. For final integration review, use the requesting-code-review skill's review dimensions.
-## Common Patterns
+### With systematic-debugging
-### Pattern: Fresh Subagent Per Task
+If a subagent encounters bugs during implementation:
+1. Follow systematic-debugging process
-**Why:** Prevents context pollution
+2. Find root cause before fixing
-**How:** New delegate_task for each task
+3. Write regression test
-**Result:** Each subagent has clean context
+4. Resume implementation
-### Pattern: Two-Stage Review
-**Why:** Catch issues early, ensure quality
-**How:** Spec review → Quality review
-**Result:** High-quality, spec-compliant code
-### Pattern: Frequent Checkpoints
-**Why:** Catch issues before they compound
-**How:** Review after each task
-**Result:** Issues don't cascade
-## Best Practices
-1. **Clear Task Boundaries**
-   - One task = one focused change
-   - Independent where possible
-   - Clear success criteria
-2. **Complete Context**
-   - Provide all needed files
-   - Explain project conventions
-   - Share relevant examples
-3. **Review Discipline**
-   - Don't skip spec review
-   - Address critical issues immediately
-   - Keep quality bar consistent
-4. **Communication**
-   - Answer subagent questions quickly
-   - Clarify when needed
-   - Provide feedback on reviews
 ## Example Workflow
-```markdown
+```
-User: Implement user authentication
+[Read plan: docs/plans/auth-feature.md]
+[Create todo list with 5 tasks]
-You: I'll use subagent-driven development. Let me create a plan first.
+--- Task 1: Create User model ---
-[Uses writing-plans skill]
+[Dispatch implementer subagent]
+Implementer: "Should email be unique?"
+You: "Yes, email must be unique"
+Implementer: Implemented, 3/3 tests passing, committed.
-Plan created with 5 tasks:
+[Dispatch spec reviewer]
-1. Create User model
+Spec reviewer: ✅ PASS — all requirements met
-2. Add password hashing
-3. Implement login endpoint
-4. Add JWT middleware
-5. Create registration endpoint
---- Task 1 ---
+[Dispatch quality reviewer]
-[Dispatch implementer subagent for User model]
+Quality reviewer: ✅ APPROVED — clean code, good tests
-[Subagent asks: "Should email be unique?"]
-You: Yes, email must be unique
-[Subagent implements]
-[Dispatch spec reviewer - PASS]
-[Dispatch quality reviewer - APPROVED]
-Task 1 complete
---- Task 2 ---
+[Mark Task 1 complete]
-[Dispatch implementer for password hashing]
-...
-[After all tasks]
+--- Task 2: Password hashing ---
-[Final review]
+[Dispatch implementer subagent]
-[Merge branch]
+Implementer: No questions, implemented, 5/5 tests passing.
+[Dispatch spec reviewer]
+Spec reviewer: ❌ Missing: password strength validation (spec says "min 8 chars")
+[Implementer fixes]
+Implementer: Added validation, 7/7 tests passing.
+[Dispatch spec reviewer again]
+Spec reviewer: ✅ PASS
+[Dispatch quality reviewer]
+Quality reviewer: Important: Magic number 8, extract to constant
+Implementer: Extracted MIN_PASSWORD_LENGTH constant
+Quality reviewer: ✅ APPROVED
+[Mark Task 2 complete]
+... (continue for all tasks)
+[After all tasks: dispatch final integration reviewer]
+[Run full test suite: all passing]
+[Done!]
 ```
 ## Remember
@@ -366,8 +333,8 @@ Task 1 complete
 ```
 Fresh subagent per task
 Two-stage review every time
-Spec compliance first
+Spec compliance FIRST
-Code quality second
+Code quality SECOND
 Never skip reviews
 Catch issues early
 ```
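The per-task workflow in this diff (implement, spec review until PASS, then quality review until APPROVED, then mark complete) is a small control loop. A sketch with `delegate_task` replaced by a scripted stand-in, purely to show the ordering; every name below is illustrative, not a real Hermes API:

```python
def run_task(task: str, dispatch) -> str:
    """Drive one task through the two-stage review loop.

    `dispatch` stands in for delegate_task: it takes a role and a task
    and returns that subagent's report or verdict.
    """
    dispatch("implementer", task)
    # Stage 1: spec compliance FIRST; loop until the spec reviewer passes it.
    while dispatch("spec_reviewer", task) != "PASS":
        dispatch("implementer_fix", task)
    # Stage 2: code quality SECOND; loop until approved.
    while dispatch("quality_reviewer", task) != "APPROVED":
        dispatch("implementer_fix", task)
    return "complete"

# Scripted verdicts mirroring Task 2 in the example workflow:
# spec review fails once, quality review requests one change.
verdicts = iter(["done", "FAIL", "fixed", "PASS",
                 "REQUEST_CHANGES", "fixed", "APPROVED"])
calls = []
def fake_dispatch(role, task):
    calls.append(role)
    return next(verdicts)

status = run_task("password hashing", fake_dispatch)
print(status)  # complete
```

The structure makes the "wrong order" red flag concrete: the quality loop is unreachable until the spec loop has exited with PASS.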


@@ -1,8 +1,8 @@
 ---
 name: systematic-debugging
-description: Use when encountering any bug, test failure, or unexpected behavior. 4-phase root cause investigation process - NO fixes without understanding the problem first.
+description: Use when encountering any bug, test failure, or unexpected behavior. 4-phase root cause investigation — NO fixes without understanding the problem first.
-version: 1.0.0
+version: 1.1.0
-author: Hermes Agent (adapted from Superpowers)
+author: Hermes Agent (adapted from obra/superpowers)
 license: MIT
 metadata:
 hermes:
@@ -48,7 +48,7 @@ Use for ANY technical issue:
 **Don't skip when:**
 - Issue seems simple (simple bugs have root causes too)
 - You're in a hurry (rushing guarantees rework)
-- Manager wants it fixed NOW (systematic is faster than thrashing)
+- Someone wants it fixed NOW (systematic is faster than thrashing)
 ## The Four Phases
@@ -67,7 +67,7 @@ You MUST complete each phase before proceeding to the next.
- Read stack traces completely - Read stack traces completely
- Note line numbers, file paths, error codes - Note line numbers, file paths, error codes
**Action:** Copy full error message to your notes. **Action:** Use `read_file` on the relevant source files. Use `search_files` to find the error string in the codebase.
### 2. Reproduce Consistently ### 2. Reproduce Consistently
@@ -76,16 +76,24 @@ You MUST complete each phase before proceeding to the next.
- Does it happen every time? - Does it happen every time?
- If not reproducible → gather more data, don't guess - If not reproducible → gather more data, don't guess
**Action:** Write down exact reproduction steps. **Action:** Use the `terminal` tool to run the failing test or trigger the bug:
```bash
# Run specific failing test
pytest tests/test_module.py::test_name -v
# Run with verbose output
pytest tests/test_module.py -v --tb=long
```
### 3. Check Recent Changes ### 3. Check Recent Changes
- What changed that could cause this? - What changed that could cause this?
- Git diff, recent commits - Git diff, recent commits
- New dependencies, config changes - New dependencies, config changes
- Environmental differences
**Commands:** **Action:**
```bash ```bash
# Recent commits # Recent commits
git log --oneline -10 git log --oneline -10
@@ -94,59 +102,50 @@ git log --oneline -10
git diff git diff
# Changes in specific file # Changes in specific file
git log -p --follow src/problematic_file.py git log -p --follow src/problematic_file.py | head -100
``` ```
### 4. Gather Evidence in Multi-Component Systems ### 4. Gather Evidence in Multi-Component Systems
**WHEN system has multiple components (CI pipeline, API service, database layer):** **WHEN system has multiple components (API service → database, CI → build → deploy):**
**BEFORE proposing fixes, add diagnostic instrumentation:** **BEFORE proposing fixes, add diagnostic instrumentation:**
For EACH component boundary: For EACH component boundary:
- Log what data enters component - Log what data enters the component
- Log what data exits component - Log what data exits the component
- Verify environment/config propagation - Verify environment/config propagation
- Check state at each layer - Check state at each layer
Run once to gather evidence showing WHERE it breaks Run once to gather evidence showing WHERE it breaks.
THEN analyze evidence to identify failing component THEN analyze evidence to identify the failing component.
THEN investigate that specific component THEN investigate that specific component.
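The boundary checks listed above can be sketched as throwaway instrumentation (a minimal sketch; the decorated component function is hypothetical):

```python
import functools
import logging

logging.basicConfig(level=logging.DEBUG)

def trace_boundary(func):
    # Temporary diagnostic: log what enters and exits a component.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logging.debug("%s <- args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        logging.debug("%s -> %r", func.__name__, result)
        return result
    return wrapper

@trace_boundary
def parse_config(raw):  # hypothetical boundary between two components
    return {"retries": int(raw)}
```

Decorate each boundary, run the failing scenario once, and read the log to find the first component that emits bad data. Remove the instrumentation once Phase 1 is done.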
### 5. Trace Data Flow
**WHEN error is deep in the call stack:**
- Where does the bad value originate?
- What called this function with the bad value?
- Keep tracing upstream until you find the source
- Fix at the source, not at the symptom
**Action:** Use `search_files` to trace references:
**Example (multi-layer system):**
```python ```python
# Layer 1: Entry point # Find where the function is called
def entry_point(input_data): search_files("function_name(", path="src/", file_glob="*.py")
print(f"DEBUG: Input received: {input_data}")
result = process_layer1(input_data)
print(f"DEBUG: Layer 1 output: {result}")
return result
# Layer 2: Processing # Find where the variable is set
def process_layer1(data): search_files("variable_name\\s*=", path="src/", file_glob="*.py")
print(f"DEBUG: Layer 1 received: {data}")
# ... processing ...
print(f"DEBUG: Layer 1 returning: {result}")
return result
``` ```
**Action:** Add logging, run once, analyze output.
### 5. Isolate the Problem
- Comment out code until problem disappears
- Binary search through recent changes
- Create minimal reproduction case
- Test with fresh environment
**Action:** Create minimal reproduction case.
### Phase 1 Completion Checklist ### Phase 1 Completion Checklist
- [ ] Error messages fully read and understood - [ ] Error messages fully read and understood
- [ ] Issue reproduced consistently - [ ] Issue reproduced consistently
- [ ] Recent changes identified and reviewed - [ ] Recent changes identified and reviewed
- [ ] Evidence gathered (logs, state) - [ ] Evidence gathered (logs, state, data flow)
- [ ] Problem isolated to specific component/code - [ ] Problem isolated to specific component/code
- [ ] Root cause hypothesis formed - [ ] Root cause hypothesis formed
@@ -154,290 +153,214 @@ def process_layer1(data):
--- ---
## Phase 2: Solution Design ## Phase 2: Pattern Analysis
**Given the root cause, design the fix:** **Find the pattern before fixing:**
### 1. Understand the Fix Area ### 1. Find Working Examples
- Read relevant code thoroughly - Locate similar working code in the same codebase
- Understand data flow - What works that's similar to what's broken?
- Identify affected components
- Check for similar issues elsewhere
**Action:** Read all relevant code files. **Action:** Use `search_files` to find comparable patterns:
### 2. Design Minimal Fix ```python
search_files("similar_pattern", path="src/", file_glob="*.py")
```
- Smallest change that fixes root cause ### 2. Compare Against References
- Avoid scope creep
- Don't refactor while fixing
- Fix one issue at a time
**Action:** Write down the exact fix before coding. - If implementing a pattern, read the reference implementation COMPLETELY
- Don't skim — read every line
- Understand the pattern fully before applying
### 3. Consider Side Effects ### 3. Identify Differences
- What else could this change affect? - What's different between working and broken?
- Are there dependencies? - List every difference, however small
- Will this break other functionality? - Don't assume "that can't matter"
**Action:** Identify potential side effects. ### 4. Understand Dependencies
### Phase 2 Completion Checklist - What other components does this need?
- What settings, config, environment?
- [ ] Fix area code fully understood - What assumptions does it make?
- [ ] Minimal fix designed
- [ ] Side effects identified
- [ ] Fix approach documented
--- ---
## Phase 3: Implementation ## Phase 3: Hypothesis and Testing
**Make the fix:** **Scientific method:**
### 1. Write Test First (if possible) ### 1. Form a Single Hypothesis
```python - State clearly: "I think X is the root cause because Y"
def test_should_handle_empty_input(): - Write it down
"""Regression test for bug #123""" - Be specific, not vague
result = process_data("")
assert result == expected_empty_result
```
### 2. Implement Fix ### 2. Test Minimally
```python - Make the SMALLEST possible change to test the hypothesis
# Before (buggy) - One variable at a time
def process_data(data): - Don't fix multiple things at once
return data.split(",")[0]
# After (fixed) ### 3. Verify Before Continuing
def process_data(data):
if not data: - Did it work? → Phase 4
return "" - Didn't work? → Form NEW hypothesis
return data.split(",")[0] - DON'T add more fixes on top
```
### 4. When You Don't Know
- Say "I don't understand X"
- Don't pretend to know
- Ask the user for help
- Research more
---
## Phase 4: Implementation
**Fix the root cause, not the symptom:**
### 1. Create Failing Test Case
- Simplest possible reproduction
- Automated test if possible
- MUST have before fixing
- Use the `test-driven-development` skill
### 2. Implement Single Fix
- Address the root cause identified
- ONE change at a time
- No "while I'm here" improvements
- No bundled refactoring
### 3. Verify Fix ### 3. Verify Fix
```bash ```bash
# Run the specific test # Run the specific regression test
pytest tests/test_data.py::test_should_handle_empty_input -v pytest tests/test_module.py::test_regression -v
# Run all tests to check for regressions # Run full suite — no regressions
pytest pytest tests/ -q
``` ```
### Phase 3 Completion Checklist ### 4. If Fix Doesn't Work — The Rule of Three
- [ ] Test written that reproduces the bug - **STOP.**
- [ ] Minimal fix implemented - Count: How many fixes have you tried?
- [ ] Test passes - If < 3: Return to Phase 1, re-analyze with new information
- [ ] No regressions introduced - **If ≥ 3: STOP and question the architecture (step 5 below)**
- DON'T attempt Fix #4 without architectural discussion
### 5. If 3+ Fixes Failed: Question Architecture
**Pattern indicating an architectural problem:**
- Each fix reveals new shared state/coupling in a different place
- Fixes require "massive refactoring" to implement
- Each fix creates new symptoms elsewhere
**STOP and question fundamentals:**
- Is this pattern fundamentally sound?
- Are we "sticking with it through sheer inertia"?
- Should we refactor the architecture vs. continue fixing symptoms?
**Discuss with the user before attempting more fixes.**
This is NOT a failed hypothesis — this is a wrong architecture.
--- ---
## Phase 4: Verification ## Red Flags — STOP and Follow Process
**Confirm it's actually fixed:** If you catch yourself thinking:
- "Quick fix for now, investigate later"
- "Just try changing X and see if it works"
- "Add multiple changes, run tests"
- "Skip the test, I'll manually verify"
- "It's probably X, let me fix that"
- "I don't fully understand but this might work"
- "Pattern says X but I'll adapt it differently"
- "Here are the main problems: [lists fixes without investigation]"
- Proposing solutions before tracing data flow
- **"One more fix attempt" (when already tried 2+)**
- **Each fix reveals a new problem in a different place**
### 1. Reproduce Original Issue **ALL of these mean: STOP. Return to Phase 1.**
- Follow original reproduction steps **If 3+ fixes failed:** Question the architecture (Phase 4 step 5).
- Verify issue is resolved
- Test edge cases
### 2. Regression Testing ## Common Rationalizations
```bash | Excuse | Reality |
# Full test suite |--------|---------|
pytest | "Issue is simple, don't need process" | Simple issues have root causes too. Process is fast for simple bugs. |
| "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. |
| "Just try this first, then investigate" | First fix sets the pattern. Do it right from the start. |
| "I'll write test after confirming fix works" | Untested fixes don't stick. Test first proves it. |
| "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
| "Reference too long, I'll adapt the pattern" | Partial understanding guarantees bugs. Read it completely. |
| "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. |
| "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Question the pattern, don't fix again. |
# Integration tests ## Quick Reference
pytest tests/integration/
# Check related areas | Phase | Key Activities | Success Criteria |
pytest -k "related_feature" |-------|---------------|------------------|
``` | **1. Root Cause** | Read errors, reproduce, check changes, gather evidence, trace data flow | Understand WHAT and WHY |
| **2. Pattern** | Find working examples, compare, identify differences | Know what's different |
| **3. Hypothesis** | Form theory, test minimally, one variable at a time | Confirmed or new hypothesis |
| **4. Implementation** | Create regression test, fix root cause, verify | Bug resolved, all tests pass |
### 3. Monitor After Deploy ## Hermes Agent Integration
- Watch logs for related errors ### Investigation Tools
- Check metrics
- Verify fix in production
### Phase 4 Completion Checklist Use these Hermes tools during Phase 1:
- [ ] Original issue cannot be reproduced - **`search_files`** — Find error strings, trace function calls, locate patterns
- [ ] All tests pass - **`read_file`** — Read source code with line numbers for precise analysis
- [ ] No new warnings/errors - **`terminal`** — Run tests, check git history, reproduce bugs
- [ ] Fix documented (commit message, comments) - **`web_search`/`web_extract`** — Research error messages, library docs
--- ### With delegate_task
## Debugging Techniques For complex multi-component debugging, dispatch investigation subagents:
### Root Cause Tracing
Ask "why" 5 times:
1. Why did it fail? → Null pointer
2. Why was it null? → Function returned null
3. Why did function return null? → Missing validation
4. Why was validation missing? → Assumed input always valid
5. Why was that assumption wrong? → API changed
**Root cause:** API change not accounted for
### Defense in Depth
Don't fix just the symptom:
**Bad:** Add null check at crash site
**Good:**
1. Add validation at API boundary
2. Add null check at crash site
3. Add test for both
4. Document API behavior
### Condition-Based Waiting
For timing/race conditions:
```python ```python
# Bad - arbitrary sleep delegate_task(
import time goal="Investigate why [specific test/behavior] fails",
time.sleep(5) # "Should be enough" context="""
Follow systematic-debugging skill:
1. Read the error message carefully
2. Reproduce the issue
3. Trace the data flow to find root cause
4. Report findings — do NOT fix yet
# Good - wait for condition Error: [paste full error]
from tenacity import retry, wait_exponential, stop_after_attempt File: [path to failing code]
Test command: [exact command]
@retry(wait=wait_exponential(multiplier=1, min=4, max=10), """,
stop=stop_after_attempt(5)) toolsets=['terminal', 'file']
def wait_for_service(): )
response = requests.get(health_url)
assert response.status_code == 200
``` ```
---
## Common Debugging Pitfalls
### Fix Without Understanding
**Symptom:** "Just add a try/catch"
**Problem:** Masks the real issue
**Solution:** Complete Phase 1 before any fix
### Shotgun Debugging
**Symptom:** Change 5 things at once
**Problem:** Don't know what fixed it
**Solution:** One change at a time, verify each
### Premature Optimization
**Symptom:** Rewrite while debugging
**Problem:** Introduces new bugs
**Solution:** Fix first, refactor later
### Assuming Environment
**Symptom:** "Works on my machine"
**Problem:** Environment differences
**Solution:** Check environment variables, versions, configs
---
## Language-Specific Tools
### Python
```python
# Add debugger
import pdb; pdb.set_trace()
# Or use ipdb for better experience
import ipdb; ipdb.set_trace()
# Log state
import logging
logging.debug(f"Variable state: {variable}")
# Stack trace
import traceback
traceback.print_exc()
```
### JavaScript/TypeScript
```javascript
// Debugger
debugger;
// Console with context
console.log("State:", { var1, var2, var3 });
// Stack trace
console.trace("Here");
// Error with context
throw new Error(`Failed with input: ${JSON.stringify(input)}`);
```
### Go
```go
// Print state
fmt.Printf("Debug: variable=%+v\n", variable)
// Stack trace
import "runtime/debug"
debug.PrintStack()
// Panic with context
if err != nil {
panic(fmt.Sprintf("unexpected error: %v", err))
}
```
---
## Integration with Other Skills
### With test-driven-development ### With test-driven-development
When debugging: When fixing bugs:
1. Write test that reproduces bug 1. Write a test that reproduces the bug (RED)
2. Debug systematically 2. Debug systematically to find root cause
3. Fix root cause 3. Fix the root cause (GREEN)
4. Test passes 4. The test proves the fix and prevents regression
### With writing-plans ## Real-World Impact
Include debugging tasks in plans: From debugging sessions:
- "Add diagnostic logging" - Systematic approach: 15-30 minutes to fix
- "Create reproduction test" - Random fixes approach: 2-3 hours of thrashing
- "Verify fix resolves issue" - First-time fix rate: 95% vs 40%
- New bugs introduced: Near zero vs common
### With subagent-driven-development
If subagent gets stuck:
1. Switch to systematic debugging
2. Analyze root cause
3. Provide findings to subagent
4. Resume implementation
---
## Remember
```
PHASE 1: Investigate → Understand WHY
PHASE 2: Design → Plan the fix
PHASE 3: Implement → Make the fix
PHASE 4: Verify → Confirm it's fixed
```
**No shortcuts. No guessing. Systematic always wins.** **No shortcuts. No guessing. Systematic always wins.**

View File

@@ -1,8 +1,8 @@
--- ---
name: test-driven-development name: test-driven-development
description: Use when implementing any feature or bugfix, before writing implementation code. Enforces RED-GREEN-REFACTOR cycle with test-first approach. description: Use when implementing any feature or bugfix, before writing implementation code. Enforces RED-GREEN-REFACTOR cycle with test-first approach.
version: 1.0.0 version: 1.1.0
author: Hermes Agent (adapted from Superpowers) author: Hermes Agent (adapted from obra/superpowers)
license: MIT license: MIT
metadata: metadata:
hermes: hermes:
@@ -28,7 +28,7 @@ Write the test first. Watch it fail. Write minimal code to pass.
- Refactoring - Refactoring
- Behavior changes - Behavior changes
**Exceptions (ask your human partner):** **Exceptions (ask the user first):**
- Throwaway prototypes - Throwaway prototypes
- Generated code - Generated code
- Configuration files - Configuration files
@@ -53,11 +53,11 @@ Implement fresh from tests. Period.
## Red-Green-Refactor Cycle ## Red-Green-Refactor Cycle
### RED - Write Failing Test ### RED — Write Failing Test
Write one minimal test showing what should happen. Write one minimal test showing what should happen.
**Good Example:** **Good test:**
```python ```python
def test_retries_failed_operations_3_times(): def test_retries_failed_operations_3_times():
attempts = 0 attempts = 0
@@ -67,43 +67,51 @@ def test_retries_failed_operations_3_times():
if attempts < 3: if attempts < 3:
raise Exception('fail') raise Exception('fail')
return 'success' return 'success'
result = retry_operation(operation) result = retry_operation(operation)
assert result == 'success' assert result == 'success'
assert attempts == 3 assert attempts == 3
``` ```
- Clear name, tests real behavior, one thing Clear name, tests real behavior, one thing.
**Bad Example:** **Bad test:**
```python ```python
def test_retry_works(): def test_retry_works():
mock = MagicMock() mock = MagicMock()
mock.side_effect = [Exception(), Exception(), 'success'] mock.side_effect = [Exception(), Exception(), 'success']
result = retry_operation(mock) result = retry_operation(mock)
assert result == 'success' # What about retry count? Timing? assert result == 'success' # What about retry count? Timing?
``` ```
- Vague name, mocks behavior not reality, unclear what it tests Vague name, tests mock not real code.
### Verify RED **Requirements:**
- One behavior per test
- Clear descriptive name ("and" in name? Split it)
- Real code, not mocks (unless truly unavoidable)
- Name describes behavior, not implementation
Run the test. It MUST fail. ### Verify RED — Watch It Fail
**If it passes:** **MANDATORY. Never skip.**
- Test is wrong (not testing what you think)
- Code already exists (delete it, start over)
- Wrong test file running
**What to check:** ```bash
- Error message makes sense # Use terminal tool to run the specific test
- Fails for expected reason pytest tests/test_feature.py::test_specific_behavior -v
- Stack trace points to right place ```
### GREEN - Minimal Code Confirm:
- Test fails (not errors from typos)
- Failure message is expected
- Fails because the feature is missing
Write just enough code to pass. Nothing more. **Test passes immediately?** You're testing existing behavior. Fix the test.
**Test errors?** Fix the error, re-run until it fails correctly.
### GREEN — Minimal Code
Write the simplest code to pass the test. Nothing more.
**Good:** **Good:**
```python ```python
@@ -119,266 +127,216 @@ def add(a, b):
return result return result
``` ```
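For the retry test shown under RED, a minimal GREEN implementation might look like this (a sketch; the signature is inferred from the test, not from any real library):

```python
def retry_operation(operation, max_attempts=3):
    # Just enough to pass the test: no backoff, no logging, no config.
    last_error = None
    for _ in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            last_error = exc
    raise last_error
```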
Cheating is OK in GREEN: Don't add features, refactor other code, or "improve" beyond the test.
**Cheating is OK in GREEN:**
- Hardcode return values - Hardcode return values
- Copy-paste - Copy-paste
- Duplicate code - Duplicate code
- Skip edge cases - Skip edge cases
**We'll fix it in refactor.** We'll fix it in REFACTOR.
### Verify GREEN ### Verify GREEN — Watch It Pass
Run tests. All must pass. **MANDATORY.**
**If fails:** ```bash
- Fix minimal code # Run the specific test
- Don't expand scope pytest tests/test_feature.py::test_specific_behavior -v
- Stay in GREEN
### REFACTOR - Clean Up # Then run ALL tests to check for regressions
pytest tests/ -q
```
Now improve the code while keeping tests green. Confirm:
- Test passes
- Other tests still pass
- Output pristine (no errors, warnings)
**Safe refactorings:** **Test fails?** Fix the code, not the test.
- Rename variables/functions
- Extract helper functions **Other tests fail?** Fix regressions now.
### REFACTOR — Clean Up
After green only:
- Remove duplication - Remove duplication
- Improve names
- Extract helpers
- Simplify expressions - Simplify expressions
- Improve readability
**Golden rule:** Tests stay green throughout. Keep tests green throughout. Don't add behavior.
**If tests fail during refactor:** **If tests fail during refactor:** Undo immediately. Take smaller steps.
- Undo immediately
- Smaller refactoring steps
- Check you didn't change behavior
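As a sketch, a safe refactor extracts duplication while the existing tests stay green (all names hypothetical):

```python
def _validate_name(name):
    # Extracted helper: this rule was previously duplicated in both
    # create_user and rename_user; behavior is unchanged.
    if not name or len(name) > 64:
        raise ValueError("invalid name")
    return name

def create_user(name):
    return {"name": _validate_name(name)}

def rename_user(user, name):
    user["name"] = _validate_name(name)
    return user
```

Run the suite after each small step; if anything goes red, undo and take a smaller step.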
## Implementation Workflow ### Repeat
### 1. Create Test File Next failing test for next behavior. One cycle at a time.
```bash ## Why Order Matters
# Python
touch tests/test_feature.py
# JavaScript **"I'll write tests after to verify it works"**
touch tests/feature.test.js
# Rust Tests written after code pass immediately. Passing immediately proves nothing:
touch tests/feature_tests.rs - Might test the wrong thing
``` - Might test implementation, not behavior
- Might miss edge cases you forgot
- You never saw it catch the bug
### 2. Write First Failing Test Test-first forces you to see the test fail, proving it actually tests something.
**"I already manually tested all the edge cases"**
Manual testing is ad-hoc. You think you tested everything but:
- No record of what you tested
- Can't re-run when code changes
- Easy to forget cases under pressure
- "It worked when I tried it" ≠ comprehensive
Automated tests are systematic. They run the same way every time.
**"Deleting X hours of work is wasteful"**
Sunk cost fallacy. The time is already gone. Your choice now:
- Delete and rewrite with TDD (high confidence)
- Keep it and add tests after (low confidence, likely bugs)
The "waste" is keeping code you can't trust.
**"TDD is dogmatic, being pragmatic means adapting"**
TDD IS pragmatic:
- Finds bugs before commit (faster than debugging after)
- Prevents regressions (tests catch breaks immediately)
- Documents behavior (tests show how to use code)
- Enables refactoring (change freely, tests catch breaks)
"Pragmatic" shortcuts = debugging in production = slower.
**"Tests after achieve the same goals — it's spirit not ritual"**
No. Tests-after answer "What does this do?" Tests-first answer "What should this do?"
Tests-after are biased by your implementation. You test what you built, not what's required. Tests-first force edge case discovery before implementing.
## Common Rationalizations
| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
| "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
| "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
| "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
| "Need to explore first" | Fine. Throw away exploration, start with TDD. |
| "Test hard = design unclear" | Listen to the test. Hard to test = hard to use. |
| "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. |
| "Manual test faster" | Manual doesn't prove edge cases. You'll re-test every change. |
| "Existing code has no tests" | You're improving it. Add tests for the code you touch. |
## Red Flags — STOP and Start Over
If you catch yourself doing any of these, delete the code and restart with TDD:
- Code before test
- Test after implementation
- Test passes immediately on first run
- Can't explain why test failed
- Tests added "later"
- Rationalizing "just this once"
- "I already manually tested it"
- "Tests after achieve the same purpose"
- "Keep as reference" or "adapt existing code"
- "Already spent X hours, deleting is wasteful"
- "TDD is dogmatic, I'm being pragmatic"
- "This is different because..."
**All of these mean: Delete code. Start over with TDD.**
## Verification Checklist
Before marking work complete:
- [ ] Every new function/method has a test
- [ ] Watched each test fail before implementing
- [ ] Each test failed for expected reason (feature missing, not typo)
- [ ] Wrote minimal code to pass each test
- [ ] All tests pass
- [ ] Output pristine (no errors, warnings)
- [ ] Tests use real code (mocks only if unavoidable)
- [ ] Edge cases and errors covered
Can't check all boxes? You skipped TDD. Start over.
## When Stuck
| Problem | Solution |
|---------|----------|
| Don't know how to test | Write the wished-for API. Write the assertion first. Ask the user. |
| Test too complicated | Design too complicated. Simplify the interface. |
| Must mock everything | Code too coupled. Use dependency injection. |
| Test setup huge | Extract helpers. Still complex? Simplify the design. |
## Hermes Agent Integration
### Running Tests
Use the `terminal` tool to run tests at each step:
```python ```python
# tests/test_calculator.py # RED — verify failure
def test_adds_two_numbers(): terminal("pytest tests/test_feature.py::test_name -v")
calc = Calculator()
result = calc.add(2, 3) # GREEN — verify pass
assert result == 5 terminal("pytest tests/test_feature.py::test_name -v")
# Full suite — verify no regressions
terminal("pytest tests/ -q")
``` ```
### 3. Run and Verify Failure ### With delegate_task
```bash When dispatching subagents for implementation, enforce TDD in the goal:
pytest tests/test_calculator.py -v
# Expected: FAIL - Calculator not defined
```
### 4. Write Minimal Implementation
```python ```python
# src/calculator.py delegate_task(
class Calculator: goal="Implement [feature] using strict TDD",
def add(self, a, b): context="""
return a + b # Minimal! Follow test-driven-development skill:
1. Write failing test FIRST
2. Run test to verify it fails
3. Write minimal code to pass
4. Run test to verify it passes
5. Refactor if needed
6. Commit
Project test command: pytest tests/ -q
Project structure: [describe relevant files]
""",
toolsets=['terminal', 'file']
)
``` ```
### 5. Run and Verify Pass
```bash
pytest tests/test_calculator.py -v
# Expected: PASS
```
### 6. Commit
```bash
git add tests/test_calculator.py src/calculator.py
git commit -m "feat: add calculator with add method"
```
### 7. Next Test
```python
def test_adds_negative_numbers():
calc = Calculator()
result = calc.add(-2, -3)
assert result == -5
```
Repeat cycle.
## Testing Anti-Patterns
### Mocking What You Don't Own
**Bad:** Mock database, HTTP client, file system
**Good:** Abstract behind interface, test interface
### Testing Implementation Details
**Bad:** Test that function was called
**Good:** Test the result/behavior
### Happy Path Only
**Bad:** Only test expected inputs
**Good:** Test edge cases, errors, boundaries
### Brittle Tests
**Bad:** Test breaks when refactoring
**Good:** Tests verify behavior, not structure
## Common Pitfalls
### "I'll Write Tests After"
No, you won't. Write them first.
### "This is Too Simple to Test"
Simple bugs cause complex problems. Test everything.
### "I Need to See It Work First"
Temporary code becomes permanent. Test first.
### "Tests Take Too Long"
Untested code takes longer. Invest in tests.
## Language-Specific Commands
### Python (pytest)
```bash
# Run all tests
pytest
# Run specific test
pytest tests/test_feature.py::test_name -v
# Run with coverage
pytest --cov=src --cov-report=term-missing
# Watch mode
pytest-watch
```
### JavaScript (Jest)
```bash
# Run all tests
npm test
# Run specific test
npm test -- test_name
# Watch mode
npm test -- --watch
# Coverage
npm test -- --coverage
```
### TypeScript (Jest with ts-jest)
```bash
# Run tests
npx jest
# Run specific file
npx jest tests/feature.test.ts
```
### Go
```bash
# Run all tests
go test ./...
# Run specific test
go test -run TestName
# Verbose
go test -v
# Coverage
go test -cover
```
### Rust
```bash
# Run tests
cargo test
# Run specific test
cargo test test_name
# Show output
cargo test -- --nocapture
```
## Integration with Other Skills
### With writing-plans
Every plan task should specify:
- What test to write
- Expected test failure
- Minimal implementation
- Refactoring opportunities
### With systematic-debugging ### With systematic-debugging
When fixing bugs: Bug found? Write failing test reproducing it. Follow TDD cycle. The test proves the fix and prevents regression.
1. Write test that reproduces bug
2. Verify test fails (RED)
3. Fix bug (GREEN)
4. Refactor if needed
### With subagent-driven-development Never fix bugs without a test.
Subagent implements one test at a time: ## Testing Anti-Patterns
1. Write failing test
2. Minimal code to pass
3. Commit
4. Next test
## Success Indicators - **Testing mock behavior instead of real behavior** — mocks should verify interactions, not replace the system under test
- **Testing implementation details** — test behavior/results, not internal method calls
- **Happy path only** — always test edge cases, errors, and boundaries
- **Brittle tests** — tests should verify behavior, not structure; refactoring shouldn't break them
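The behavior-vs-structure distinction, as a sketch (`Cart` and its methods are hypothetical):

```python
class Cart:
    def __init__(self):
        self._items = []
        self.total = 0

    def add(self, price):
        self._items.append(price)
        self._recalculate()

    def _recalculate(self):
        # Internal detail: a brittle test would assert this was called.
        self.total = sum(self._items)

def test_total_reflects_added_items():
    # Robust: asserts the observable result, so renaming or inlining
    # _recalculate cannot break this test.
    cart = Cart()
    cart.add(3)
    cart.add(4)
    assert cart.total == 7
```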
**You're doing TDD right when:** ## Final Rule
- Tests fail before code exists
- You write <10 lines between test runs
- Refactoring feels safe
- Bugs are caught immediately
- Code is simpler than expected
**Red flags:** ```
- Writing 50+ lines without running tests Production code → test exists and failed first
- Tests always pass Otherwise → not TDD
- Fear of refactoring ```
- "I'll test later"
## Remember No exceptions without the user's explicit permission.
1. **RED** - Write failing test
2. **GREEN** - Minimal code to pass
3. **REFACTOR** - Clean up safely
4. **REPEAT** - Next behavior
**If you didn't see it fail, it doesn't test what you think.**

View File

@@ -1,8 +1,8 @@
--- ---
name: writing-plans name: writing-plans
description: Use when you have a spec or requirements for a multi-step task. Creates comprehensive implementation plans with bite-sized tasks, exact file paths, and complete code examples. description: Use when you have a spec or requirements for a multi-step task. Creates comprehensive implementation plans with bite-sized tasks, exact file paths, and complete code examples.
version: 1.0.0 version: 1.1.0
author: Hermes Agent (adapted from Superpowers) author: Hermes Agent (adapted from obra/superpowers)
license: MIT license: MIT
metadata: metadata:
hermes: hermes:
@@ -14,112 +14,34 @@ metadata:
## Overview ## Overview
Transform specifications into actionable implementation plans. Write comprehensive plans that any developer can follow - even with zero context about your codebase. Write comprehensive implementation plans assuming the implementer has zero context for the codebase and questionable taste. Document everything they need: which files to touch, complete code, testing commands, docs to check, how to verify. Give them bite-sized tasks. DRY. YAGNI. TDD. Frequent commits.
**Core principle:** Document everything: exact file paths, complete code, test commands, verification steps. Assume the implementer is a skilled developer but knows almost nothing about the toolset or problem domain. Assume they don't know good test design very well.
**Assume the implementer:** **Core principle:** A good plan makes implementation obvious. If someone has to guess, the plan is incomplete.
- Is a skilled developer
- Knows almost nothing about your codebase
- Has questionable taste in code style
- Needs explicit guidance
## When to Use

**Always use before:**

- Implementing multi-step features
- Breaking down complex requirements
- Delegating to subagents via subagent-driven-development

**Don't skip when:**

- Feature seems simple (assumptions cause bugs)
- You plan to implement it yourself (future you needs guidance)
- Working alone (documentation matters)
## Bite-Sized Task Granularity

**Each task = 2-5 minutes of focused work.**

Every step is one action:

- "Write the failing test" — step
- "Run it to make sure it fails" — step
- "Implement the minimal code to make the test pass" — step
- "Run the tests and make sure they pass" — step
- "Commit" — step
**Too big:**

```markdown
### Task 1: Add authentication
```

**Just right:**

```markdown
### Task 1: Create User model with email and password_hash fields
[15 lines, 1 file]
```
## Plan Document Structure
### Header (Required)
Every plan MUST start with:
```markdown
# [Feature Name] Implementation Plan
> **For Hermes:** Use subagent-driven-development skill to implement this plan task-by-task.
**Goal:** [One sentence describing what this builds]
**Architecture:** [2-3 sentences about approach]
**Tech Stack:** [Key technologies/libraries]
---
```
### Task Structure
Each task follows this format:
````markdown
### Task N: [Descriptive Name]
**Objective:** What this task accomplishes (one sentence)
**Files:**
- Create: `exact/path/to/new_file.py`
- Modify: `exact/path/to/existing.py:45-67` (line numbers if known)
- Test: `tests/path/to/test_file.py`
**Step 1: Write failing test**
```python
def test_specific_behavior():
    result = function(input)
    assert result == expected
```
**Step 2: Run test to verify failure**
Run: `pytest tests/path/test.py::test_specific_behavior -v`
Expected: FAIL — "function not defined"
**Step 3: Write minimal implementation**
```python
def function(input):
    return expected
```
**Step 4: Run test to verify pass**
Run: `pytest tests/path/test.py::test_specific_behavior -v`
Expected: PASS
**Step 5: Commit**
```bash
git add tests/path/test.py src/path/file.py
git commit -m "feat: add specific feature"
```
````
## Writing Process
### Step 1: Understand Requirements
Read and understand:
- Feature requirements
- Design documents or user description
- Acceptance criteria
- Constraints
### Step 2: Explore the Codebase
Use Hermes tools to understand the project:
```python
# Understand project structure
search_files("*.py", target="files", path="src/")
# Look at similar features
search_files("similar_pattern", path="src/", file_glob="*.py")
# Check existing tests
search_files("*.py", target="files", path="tests/")
# Read key files
read_file("src/app.py")
```
### Step 3: Design Approach
Decide:
- Architecture pattern
- File organization
- Dependencies needed
- Testing strategy
### Step 4: Write Tasks
Create tasks in order:
1. Setup/infrastructure
2. Core functionality (TDD for each)
3. Edge cases
4. Integration
5. Cleanup/documentation
### Step 5: Add Complete Details
For each task, include:
- **Exact file paths** (not "the config file" but `src/config/settings.py`)
- **Complete code examples** (not "add validation" but the actual code)
- **Exact commands** with expected output
- **Verification steps** that prove the task works
### Step 6: Review the Plan
Check:
- [ ] Tasks are sequential and logical
- [ ] Each task is bite-sized (2-5 min)
- [ ] File paths are exact
- [ ] Code examples are complete (copy-pasteable)
- [ ] Commands are exact with expected output
- [ ] No missing context
- [ ] DRY, YAGNI, TDD principles applied
### Step 7: Save the Plan
```bash
mkdir -p docs/plans
# Save plan to docs/plans/YYYY-MM-DD-feature-name.md
git add docs/plans/
git commit -m "docs: add implementation plan for [feature]"
```
## Principles

### DRY (Don't Repeat Yourself)

**Bad:** Copy-paste validation in 3 places
**Good:** Extract validation function, use everywhere
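A runnable sketch of the extracted validator (the loose regex is illustrative):

```python
import re

def validate_email(email):
    """Single shared validator instead of three copy-pasted checks."""
    if not re.match(r'^[^@]+@[^@]+$', email):
        raise ValueError(f"Invalid email: {email}")
    return email

# Same function for user input, config values, and imported data
validate_email("user@example.com")
validate_email("admin@example.org")
```

If a future task needs stricter rules, they change in exactly one place.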
### YAGNI (You Aren't Gonna Need It)

**Bad:** Add "flexibility" for future requirements
**Good:** Implement only what's needed now

```python
# Bad - YAGNI violation
class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email
        self.preferences = {}  # Not needed yet!
        self.metadata = {}     # Not needed yet!

# Good - YAGNI
class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email
```
### TDD (Test-Driven Development)

Every task that produces code should include the full TDD cycle:

1. Write failing test
2. Run to verify failure
3. Write minimal code
4. Run to verify pass
5. Commit

```bash
git add [files]
git commit -m "type: description"
```
## Example Plan

````markdown
# User Authentication Implementation Plan

> **For Hermes:** Use subagent-driven-development to implement this plan.

**Goal:** Add JWT-based user authentication to the Flask API

**Architecture:** Use PyJWT for tokens, bcrypt for hashing. Middleware validates tokens on protected routes.

**Tech Stack:** Python, Flask, PyJWT, bcrypt

---

### Task 1: Create User model

**Objective:** Define User model with email and hashed password

**Files:**
- Create: `src/models/user.py`
- Test: `tests/models/test_user.py`

**Step 1: Write failing test**

```python
def test_user_creation():
    user = User(email="test@example.com", password="secret123")
    assert user.email == "test@example.com"
    assert user.password_hash is not None
    assert user.password_hash != "secret123"  # Should be hashed
```

**Step 2: Run to verify failure**

```bash
pytest tests/models/test_user.py -v
```

Expected: FAIL - User class not defined

**Step 3: Implement User model**

```python
import bcrypt

class User:
    def __init__(self, email, password):
        self.email = email
        self.password_hash = bcrypt.hashpw(
            password.encode(),
            bcrypt.gensalt()
        )
```

**Step 4: Run to verify pass**

```bash
pytest tests/models/test_user.py -v
```

Expected: PASS

**Commit:**

```bash
git add src/models/user.py tests/models/test_user.py
git commit -m "feat: add User model with password hashing"
```

### Task 2: Create login endpoint

**Objective:** Add POST /login endpoint that returns JWT

**Files:**
- Modify: `src/app.py`
- Test: `tests/test_login.py`

[Continue...]
````
## Common Mistakes

### Vague Tasks
**Bad:** "Add authentication"
**Good:** "Create User model with email and password_hash fields"
### Incomplete Code
**Bad:** "Step 1: Add validation function"
**Good:** "Step 1: Add validation function" followed by the complete function code
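What "complete function code" means in practice is a copy-pasteable body the implementer can drop in unchanged (the exact regex is illustrative):

```python
import re

def validate_email(email):
    """Validate email format; raise ValueError on mismatch."""
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    if not re.match(pattern, email):
        raise ValueError(f"Invalid email: {email}")
    return email
```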
### Missing Verification

**Bad:** "Step 3: Test it works"
**Good:** "Step 3: Run `pytest tests/test_auth.py -v`, expected: 3 passed"
### Missing File Paths

**Bad:** "Create the model file"
**Good:** "Create: `src/models/user.py`"
## Execution Handoff

After saving the plan, offer the execution approach:

**"Plan complete and saved. Ready to execute using subagent-driven-development — I'll dispatch a fresh subagent per task with two-stage review (spec compliance then code quality). Shall I proceed?"**

When executing, use the `subagent-driven-development` skill:

- Fresh `delegate_task` per task with full context
- Spec compliance review after each task
- Code quality review after spec passes
- Proceed only when both reviews approve
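The per-task dispatch can be sketched as follows; the exact `delegate_task` signature and toolset names depend on your Hermes version, so treat every parameter here as illustrative:

```python
# Hypothetical parameters for dispatching one plan task to a subagent.
task_prompt = """\
Implement Task 3 of docs/plans/2026-03-03-user-auth.md.
Files: create src/models/user.py and tests/models/test_user.py.
Follow the TDD steps in the task exactly; commit once the tests pass.
"""

delegate_params = {
    "task": task_prompt,
    "toolsets": ["terminal", "read_file", "search_files"],  # assumed names
}
```

Each subagent gets a fresh context containing only its task, so the prompt must carry the plan path and file list explicitly.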
## Remember

```
Bite-sized tasks (2-5 min each)
Exact file paths
Complete code (copy-pasteable)
Exact commands with expected output
Verification steps
DRY, YAGNI, TDD
Frequent commits
```

**A good plan makes implementation obvious.**