refactor: enhance software-development skills with Hermes integration
Improvements to all 5 skills adapted from obra/superpowers:

- Restored anti-rationalization tables and red flags from the originals (key behavioral guardrails that prevent LLMs from taking shortcuts)
- Restored 'Rule of Three' for debugging (3+ failed fixes = question architecture, not keep fixing)
- Restored Pattern Analysis and Hypothesis Testing phases in debugging
- Restored 'Why Order Matters' rebuttals and verification checklist in TDD
- Added proper Hermes delegate_task integration with real parameter examples and toolset specifications throughout
- Added Hermes tool usage (search_files, read_file, terminal) for investigation and verification steps
- Removed references to non-existent skills (brainstorming, finishing-a-development-branch, executing-plans, using-git-worktrees)
- Removed generic language-specific sections (Go, Rust, Jest) that added bulk without agent value
- Tightened prose: cut ~430 lines while adding more actionable content
- Added execution handoff section to writing-plans
- Consistent cross-references between the 5 skills

---
name: requesting-code-review
description: Use when completing tasks, implementing major features, or before merging. Validates work meets requirements through systematic review process.
version: 1.1.0
author: Hermes Agent (adapted from obra/superpowers)
license: MIT
metadata:
  hermes:
## Overview

Dispatch a reviewer subagent to catch issues before they cascade. Review early, review often.

**Core principle:** Fresh perspective finds issues you'll miss.
## When to Request Review

**Mandatory:**
- After each task in subagent-driven development
- After completing a major feature
- Before merge to main
- After bug fixes

**Optional but valuable:**
- When stuck (fresh perspective)
- Before refactoring (baseline check)
- After complex logic implementation
- When touching critical code (auth, payments, data)

**Never skip because:**
- "It's simple" — simple bugs compound
- "I'm in a hurry" — reviews save time
- "I tested it" — you have blind spots
## Review Process

### Step 1: Self-Review First

Before dispatching a reviewer, check yourself:
**Checklist:**
- [ ] Code follows project conventions
- [ ] All tests pass
- [ ] No debug print statements left
- [ ] No hardcoded secrets or credentials
- [ ] Error handling in place
- [ ] Documentation updated
- [ ] Commit messages are clear
```bash
# Run full test suite
pytest tests/ -q

# Check for debug code
search_files("print(", path="src/", file_glob="*.py")
search_files("console.log", path="src/", file_glob="*.js")

# Check for TODO/FIXME/HACK markers
search_files("TODO|FIXME|HACK", path="src/")
```
### Step 2: Gather Context

```bash
# Changed files
git diff --name-only HEAD~1

# Diff summary
git diff --stat HEAD~1

# Recent commits
git log --oneline -5
```

### Step 3: Dispatch Reviewer Subagent

Use `delegate_task` to dispatch a focused reviewer:
```python
delegate_task(
    goal="Review implementation for correctness and quality",
    context="""
WHAT WAS IMPLEMENTED:
[Brief description of the feature/fix]

ORIGINAL REQUIREMENTS:
[From plan, issue, or user request]

FILES CHANGED:
- src/models/user.py (added User class)
- src/auth/login.py (added login endpoint)
- tests/test_auth.py (added 8 tests)

REVIEW CHECKLIST:
- [ ] Correctness: Does it do what it should?
- [ ] Edge cases: Are they handled?
- [ ] Error handling: Is it adequate?
- [ ] Code quality: Clear names, good structure?
- [ ] Test coverage: Are tests meaningful?
- [ ] Security: Any vulnerabilities?
- [ ] Performance: Any obvious issues?

OUTPUT FORMAT:
- Summary: [brief assessment]
- Critical Issues: [must fix — blocks merge]
- Important Issues: [should fix before merge]
- Minor Issues: [nice to have]
- Strengths: [what was done well]
- Verdict: APPROVE / REQUEST_CHANGES
""",
    toolsets=['file']
)
```
### Step 4: Act on Feedback

**Critical Issues (block merge):**
- Security vulnerabilities
- Broken functionality
- Data loss risk
- Test failures

**Action:** Fix immediately before proceeding

**Important Issues (should fix):**
- Missing edge case handling
- Poor error messages
- Unclear code
- Missing tests

**Action:** Fix before merge if possible

**Minor Issues (nice to have):**
- Style preferences
- Refactoring suggestions
- Documentation improvements

**Action:** Note for later or quick fix

**If reviewer is wrong:**
- Push back with technical reasoning
- Show code/tests that prove it works
- Request clarification
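The triage above can be sketched as a small decision helper. This is an illustrative assumption, not a real Hermes API: the `ReviewResult` shape and the action labels are made up for the example.

```python
# Illustrative sketch of review-feedback triage. The ReviewResult shape
# and action labels are assumptions, not a real Hermes API.
from dataclasses import dataclass, field

@dataclass
class ReviewResult:
    verdict: str                              # "APPROVE" or "REQUEST_CHANGES"
    critical: list = field(default_factory=list)
    important: list = field(default_factory=list)
    minor: list = field(default_factory=list)

def next_action(review: ReviewResult) -> str:
    if review.critical:
        return "fix-now"              # critical issues always block
    if review.important:
        return "fix-before-merge"     # should fix before merging
    if review.verdict == "APPROVE":
        return "merge"                # note any minor issues for later
    return "clarify-with-reviewer"    # no blocking issues, yet no approval
```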
## Review Dimensions

### Correctness

**Questions:**
- Does it implement the requirements?
- Are there logic errors?
- Do edge cases work?
- Are there race conditions?

**Check:**
- Read implementation against requirements
- Trace through edge cases
- Check boundary conditions
### Code Quality

**Questions:**
- Is code readable?
- Are names clear and descriptive?
- Is it too complex? (Functions >20 lines = smell)
- Is there duplication?

**Check:**
- Function length (aim <20 lines)
- Cyclomatic complexity
- DRY violations
- Naming clarity
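The function-length check is easy to automate. A minimal sketch using Python's standard `ast` module, with the 20-line threshold from the guideline above:

```python
# Sketch of an automated "function length" check using the stdlib ast
# module; the 20-line default mirrors the guideline above.
import ast

def long_functions(source: str, max_lines: int = 20):
    """Return (name, length) for each function longer than max_lines."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                flagged.append((node.name, length))
    return flagged
```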
### Testing

**Questions:**
- Are there meaningful tests?
- Do they cover edge cases?
- Do they test behavior, not implementation?
- Do all tests pass?
### Security

**Questions:**
- Any injection vulnerabilities?
- Proper input validation?
- Secrets handled correctly?
- Access control in place?

**Check:**
- Input sanitization
- Authentication/authorization
- Secret management
- SQL/query safety
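To make the SQL/query safety check concrete, here is a toy `sqlite3` example (table and data are made up): the parameterized form is safe, while the commented-out interpolated form is exactly what a reviewer should flag.

```python
# Toy sqlite3 example of the "SQL/query safety" check. Table and data
# are invented; the point is parameter binding vs. string interpolation.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

def find_user(name: str):
    # SAFE: the driver binds the value, so input cannot alter the query.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()

# UNSAFE pattern a reviewer should flag:
#   conn.execute(f"SELECT name FROM users WHERE name = '{name}'")
```

An injection attempt like `"' OR '1'='1"` is treated as a literal string by the bound parameter and matches nothing.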
### Performance

**Questions:**
- Any N+1 queries?
- Unnecessary computation in loops?
- Memory leaks?
- Scalability concerns?

**Check:**
- Database queries
- Algorithm complexity
- Resource usage
- Missing caching opportunities?
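The N+1 pattern is easiest to see with a toy counter standing in for a database (the dict-backed "queries" are purely illustrative):

```python
# Toy illustration of the N+1 query pattern. The dict-backed "database"
# and the query counter are purely illustrative.
orders = {1: ["book"], 2: ["pen"], 3: ["mug"]}
query_count = 0

def fetch_order(order_id):
    global query_count
    query_count += 1              # one query per call
    return orders[order_id]

def fetch_orders(ids):
    global query_count
    query_count += 1              # one batched query for all ids
    return [orders[i] for i in ids]

n_plus_one = [fetch_order(i) for i in [1, 2, 3]]   # 3 queries
batched = fetch_orders([1, 2, 3])                  # 1 query
```

Same results, four queries total instead of two: the per-item loop is what a reviewer should catch before the list grows to thousands of items.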
## Review Output Format

Standard format for reviewer subagent output:

```markdown
## Review Summary

**Assessment:** [Brief overall assessment]

**Verdict:** APPROVE / REQUEST_CHANGES

---

## Critical Issues (Must Fix)

1. **[Issue title]**
   - Problem: [Description]
   - Suggestion: [How to fix]

---

## Important Issues (Should Fix)

1. **[Issue title]**
   - Problem: [Description]
   - Suggestion: [How to fix]

---

## Minor Issues (Optional)

1. **[Issue title]**
   - Suggestion: [Improvement idea]

---

## Strengths

- [What was done well]
```

## Integration with Other Skills
### With subagent-driven-development

Review after EACH task — this is the two-stage review:
1. Spec compliance review (does it match the plan?)
2. Code quality review (is it well-built?)
3. Fix issues from either review
4. Proceed to next task only when both approve

### With test-driven-development

Review verifies:
- Tests were written first (RED-GREEN-REFACTOR followed?)
- Tests are meaningful (not just asserting True)?
- Edge cases covered?
- All tests pass?

### With writing-plans

Review validates:
- Implementation matches the plan?
- All tasks completed?
- Quality standards met?
## Red Flags

**Never:**
- Skip review because "it's simple"
- Ignore Critical issues
- Proceed with unfixed Important issues
- Argue with valid technical feedback without evidence
## Quality Gates

**Must pass before merge:**
- [ ] No critical issues
- [ ] All tests pass
- [ ] Review verdict: APPROVE
- [ ] Requirements met

**Should pass before merge:**
- [ ] Documentation updated
- [ ] Performance acceptable

**Nice to have:**
- [ ] No minor issues
- [ ] Extra polish
## Remember
---
name: subagent-driven-development
description: Use when executing implementation plans with independent tasks. Dispatches fresh delegate_task per task with two-stage review (spec compliance then code quality).
version: 1.1.0
author: Hermes Agent (adapted from obra/superpowers)
license: MIT
metadata:
  hermes:
Execute implementation plans by dispatching fresh subagents per task with systematic two-stage review.

**Core principle:** Fresh subagent per task + two-stage review (spec then quality) = high quality, fast iteration.
## When to Use

Use this skill when:
- You have an implementation plan (from writing-plans skill or user requirements)
- Tasks are mostly independent
- You want to stay in the current session
- Quality and spec compliance are important
- You want automated review between tasks

**vs. manual execution:**
- Fresh context per task (no confusion from accumulated state)
- Automated review process catches issues early
- Consistent quality checks across all tasks
- Subagents can ask questions before starting work
## The Process

### 1. Read and Parse Plan

Read the plan file. Extract ALL tasks with their full text and context upfront. Create a todo list:

```python
# Read the plan
read_file("docs/plans/feature-plan.md")

# Create todo list with all tasks
todo([
    {"id": "task-1", "content": "Create User model with email field", "status": "pending"},
    {"id": "task-2", "content": "Add password hashing utility", "status": "pending"},
    {"id": "task-3", "content": "Create login endpoint", "status": "pending"},
])
```

**Key:** Read the plan ONCE. Extract everything. Don't make subagents read the plan file — provide the full task text directly in context.
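A hedged sketch of the extraction step, assuming tasks appear as `### Task N: <title>` headings; adjust the pattern to whatever format your plans actually use.

```python
# Hedged sketch of extracting tasks from a plan into todo entries.
# Assumes "### Task N: <title>" headings; the plan format is an assumption.
import re

def extract_tasks(plan_text: str):
    pattern = re.compile(r"^### Task (\d+): (.+)$", re.MULTILINE)
    return [
        {"id": f"task-{num}", "content": title.strip(), "status": "pending"}
        for num, title in pattern.findall(plan_text)
    ]

plan = """\
### Task 1: Create User model
Details...

### Task 2: Add password hashing
"""
```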
### 2. Per-Task Workflow

For EACH task in the plan:

#### Step 1: Dispatch Implementer Subagent

Use `delegate_task` with complete context:

**Example:**
```python
delegate_task(
    goal="Implement Task 1: Create User model with email and password_hash fields",
    context="""
TASK FROM PLAN:
- Create: src/models/user.py
- Add User class with email (str) and password_hash (str) fields
- Use bcrypt for password hashing
- Include __repr__ for debugging

FOLLOW TDD:
1. Write failing test in tests/models/test_user.py
2. Run: pytest tests/models/test_user.py -v (verify FAIL)
3. Write minimal implementation
4. Run: pytest tests/models/test_user.py -v (verify PASS)
5. Run: pytest tests/ -q (verify no regressions)
6. Commit: git add -A && git commit -m "feat: add User model with password hashing"

PROJECT CONTEXT:
- Python 3.11, Flask app in src/app.py
- Existing models in src/models/
- Tests use pytest, run from project root
- bcrypt already in requirements.txt
""",
    toolsets=['terminal', 'file']
)
```
#### Step 2: Dispatch Spec Compliance Reviewer

After the implementer completes, verify against the original spec:
```python
delegate_task(
    goal="Review if implementation matches the spec from the plan",
    context="""
ORIGINAL TASK SPEC:
- Create src/models/user.py with User class
- Fields: email (str), password_hash (str)
- Use bcrypt for password hashing
- Include __repr__

CHECK:
- [ ] All requirements from spec implemented?
- [ ] File paths match spec?
- [ ] Function signatures match spec?
- [ ] Behavior matches expected?
- [ ] Nothing extra added (no scope creep)?

OUTPUT: PASS or list of specific spec gaps to fix.
""",
    toolsets=['file']
)
```
**If spec issues found:** Fix gaps, then re-run spec review. Continue only when spec-compliant.

#### Step 3: Dispatch Code Quality Reviewer

After spec compliance passes:
```python
delegate_task(
    goal="Review code quality for Task 1 implementation",
    context="""
FILES TO REVIEW:
- src/models/user.py
- tests/models/test_user.py

CHECK:
- [ ] Follows project conventions and style?
- [ ] Proper error handling?
- [ ] Clear variable/function names?
- [ ] Adequate test coverage?
- [ ] No obvious bugs or missed edge cases?
- [ ] No security issues?

OUTPUT FORMAT:
- Critical Issues: [must fix before proceeding]
- Important Issues: [should fix]
- Minor Issues: [optional]
- Verdict: APPROVED or REQUEST_CHANGES
""",
    toolsets=['file']
)
```
**If quality issues found:** Fix issues, re-review. Continue only when approved.
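The fix-and-re-review loop can be sketched generically. The dispatch callables stand in for `delegate_task` calls and the verdict strings mirror the output format above; both are illustrative assumptions.

```python
# Generic sketch of the fix-and-re-review loop. The dispatch callables
# stand in for delegate_task calls; verdict strings are illustrative.
def review_loop(dispatch_review, dispatch_fix, max_rounds: int = 5):
    """Alternate review and fix until approved; never skip the re-review."""
    for _ in range(max_rounds):
        result = dispatch_review()
        if result["verdict"] == "APPROVED":
            return result
        dispatch_fix(result["issues"])    # implementer addresses the issues
    raise RuntimeError("Review not converging; question the approach")
```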
#### Step 4: Mark Complete

```python
todo([{"id": "task-1", "content": "Create User model with email field", "status": "completed"}], merge=True)
```
### 3. Final Review

After ALL tasks are complete, dispatch a final integration reviewer:

```python
delegate_task(
    goal="Review the entire implementation for consistency and integration issues",
    context="""
All tasks from the plan are complete. Review the full implementation:
- Do all components work together?
- Any inconsistencies between tasks?
- All tests passing?
- Ready for merge?
""",
    toolsets=['terminal', 'file']
)
```
### 4. Verify and Commit

```bash
# Run full test suite
pytest tests/ -q

# Review all changes
git diff --stat

# Final commit if needed
git add -A && git commit -m "feat: complete [feature name] implementation"
```
## Task Granularity

**Each task = 2-5 minutes of focused work.**

**Too big:**
- "Implement user authentication system"

**Good size:**
- "Add password hashing function"
- "Create login endpoint"
- "Add JWT token generation"
- "Create registration endpoint"
## Red Flags — Never Do These

- Start implementation without a plan
- Skip reviews (spec compliance OR code quality)
- Proceed with unfixed critical/important issues
- Dispatch multiple implementation subagents for tasks that touch the same files
- Make subagents read the plan file (provide full text in context instead)
- Skip scene-setting context (subagent needs to understand where the task fits)
- Ignore subagent questions (answer before letting them proceed)
- Accept "close enough" on spec compliance
- Skip review loops (reviewer found issues → implementer fixes → review again)
- Let implementer self-review replace actual review (both are needed)
- **Start code quality review before spec compliance is PASS** (wrong order)
- Move to next task while either review has open issues
## Handling Issues

### If Subagent Asks Questions

- Answer clearly and completely
- Provide additional context if needed
- Don't rush them into implementation

### If Reviewer Finds Issues

- Implementer subagent (or a new one) fixes them
- Reviewer reviews again
- Repeat until approved
- Don't skip the re-review

### If Subagent Fails a Task

- Dispatch a new fix subagent with specific instructions about what went wrong
- Don't try to fix manually in the controller session (context pollution)

## Efficiency Notes

**Why fresh subagent per task:**
- Prevents context pollution from accumulated state
- Each subagent gets clean, focused context
- No confusion from prior tasks' code or reasoning

**Why two-stage review:**
- Spec review catches under/over-building early
- Quality review ensures the implementation is well-built
- Catches issues before they compound across tasks

**Cost trade-off:**
- More subagent invocations (implementer + 2 reviewers per task)
- But catches issues early (cheaper than debugging compounded problems later)
## Integration with Other Skills

### With writing-plans

This skill EXECUTES plans created by the writing-plans skill:
1. User requirements → writing-plans → implementation plan
2. Implementation plan → subagent-driven-development → working code

### With test-driven-development

Implementer subagents should follow TDD:
1. Write failing test first
2. Implement minimal code
3. Verify test passes
4. Commit

Include TDD instructions in every implementer context.

### With requesting-code-review

The two-stage review process IS the code review. For final integration review, use the requesting-code-review skill's review dimensions.

### With systematic-debugging

If a subagent encounters bugs during implementation:
1. Follow systematic-debugging process
2. Find root cause before fixing
3. Write regression test
4. Resume implementation
## Example Workflow

```markdown
User: Implement user authentication

[Read plan: docs/plans/auth-feature.md]
[Create todo list with 5 tasks]

--- Task 1: Create User model ---
[Dispatch implementer subagent]
Implementer: "Should email be unique?"
You: "Yes, email must be unique"
Implementer: Implemented, 3/3 tests passing, committed.

[Dispatch spec reviewer]
Spec reviewer: ✅ PASS — all requirements met

[Dispatch quality reviewer]
Quality reviewer: ✅ APPROVED — clean code, good tests

[Mark Task 1 complete]

--- Task 2: Password hashing ---
[Dispatch implementer subagent]
Implementer: No questions, implemented, 5/5 tests passing.

[Dispatch spec reviewer]
Spec reviewer: ❌ Missing: password strength validation (spec says "min 8 chars")

[Implementer fixes]
Implementer: Added validation, 7/7 tests passing.

[Dispatch spec reviewer again]
Spec reviewer: ✅ PASS

[Dispatch quality reviewer]
Quality reviewer: Important: Magic number 8, extract to constant
Implementer: Extracted MIN_PASSWORD_LENGTH constant
Quality reviewer: ✅ APPROVED

[Mark Task 2 complete]

... (continue for all tasks)

[After all tasks: dispatch final integration reviewer]
[Run full test suite: all passing]
[Done!]
```
## Remember

@@ -366,8 +333,8 @@ Task 1 complete

```
Fresh subagent per task
Two-stage review every time
Spec compliance FIRST
Code quality SECOND
Never skip reviews
Catch issues early
```
@@ -1,8 +1,8 @@
---
name: systematic-debugging
description: Use when encountering any bug, test failure, or unexpected behavior. 4-phase root cause investigation — NO fixes without understanding the problem first.
version: 1.1.0
author: Hermes Agent (adapted from obra/superpowers)
license: MIT
metadata:
  hermes:

@@ -48,7 +48,7 @@ Use for ANY technical issue:

**Don't skip when:**
- Issue seems simple (simple bugs have root causes too)
- You're in a hurry (rushing guarantees rework)
- Someone wants it fixed NOW (systematic is faster than thrashing)
## The Four Phases

@@ -67,7 +67,7 @@ You MUST complete each phase before proceeding to the next.

- Read stack traces completely
- Note line numbers, file paths, error codes

**Action:** Use `read_file` on the relevant source files. Use `search_files` to find the error string in the codebase.

### 2. Reproduce Consistently

@@ -76,16 +76,24 @@ You MUST complete each phase before proceeding to the next.

- Does it happen every time?
- If not reproducible → gather more data, don't guess

**Action:** Use the `terminal` tool to run the failing test or trigger the bug:

```bash
# Run the specific failing test
pytest tests/test_module.py::test_name -v

# Run with verbose output
pytest tests/test_module.py -v --tb=long
```
### 3. Check Recent Changes

- What changed that could cause this?
- Git diff, recent commits
- New dependencies, config changes
- Environmental differences

**Action:**

```bash
# Recent commits
git log --oneline -10

@@ -94,59 +102,50 @@ git log --oneline -10

git diff

# Changes in a specific file
git log -p --follow src/problematic_file.py | head -100
```
### 4. Gather Evidence in Multi-Component Systems

**WHEN the system has multiple components (API → service → database, CI → build → deploy):**

**BEFORE proposing fixes, add diagnostic instrumentation:**

For EACH component boundary:
- Log what data enters the component
- Log what data exits the component
- Verify environment/config propagation
- Check state at each layer

Run once to gather evidence showing WHERE it breaks.
THEN analyze the evidence to identify the failing component.
THEN investigate that specific component.
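A minimal sketch of boundary instrumentation in Python (the component names and pipeline here are hypothetical; the enter/exit logging pattern is the point, not any specific API):

```python
import functools

def log_boundary(name):
    """Log what enters and exits a component boundary."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            print(f"DEBUG [{name}] in:  args={args} kwargs={kwargs}")
            result = fn(*args, **kwargs)
            print(f"DEBUG [{name}] out: {result!r}")
            return result
        return wrapper
    return decorator

# Hypothetical two-layer pipeline: instrument each layer, run once,
# and the log shows exactly WHERE the data goes bad.
@log_boundary("parse")
def parse(raw):
    return raw.strip().split(",")

@log_boundary("transform")
def transform(fields):
    return [f.upper() for f in fields]

result = transform(parse(" a,b "))
```

Running once with real input produces a trace per boundary; the first boundary whose output looks wrong is the component to investigate.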
### 5. Trace Data Flow

**WHEN the error is deep in the call stack:**

- Where does the bad value originate?
- What called this function with the bad value?
- Keep tracing upstream until you find the source
- Fix at the source, not at the symptom

**Action:** Use `search_files` to trace references:

```python
# Find where the function is called
search_files("function_name(", path="src/", file_glob="*.py")

# Find where the variable is set
search_files("variable_name\\s*=", path="src/", file_glob="*.py")
```

**Action:** Add logging, run once, analyze the output.

### 6. Isolate the Problem

- Comment out code until the problem disappears
- Binary search through recent changes
- Create a minimal reproduction case
- Test with a fresh environment

**Action:** Create a minimal reproduction case.
### Phase 1 Completion Checklist

- [ ] Error messages fully read and understood
- [ ] Issue reproduced consistently
- [ ] Recent changes identified and reviewed
- [ ] Evidence gathered (logs, state, data flow)
- [ ] Problem isolated to specific component/code
- [ ] Root cause hypothesis formed

@@ -154,290 +153,214 @@ def process_layer1(data):

---
## Phase 2: Pattern Analysis

**Find the pattern before fixing:**

### 1. Find Working Examples

- Locate similar working code in the same codebase
- What works that's similar to what's broken?

**Action:** Use `search_files` to find comparable patterns:

```python
search_files("similar_pattern", path="src/", file_glob="*.py")
```

### 2. Compare Against References

- If implementing a pattern, read the reference implementation COMPLETELY
- Don't skim — read every line
- Understand the pattern fully before applying

### 3. Identify Differences

- What's different between working and broken?
- List every difference, however small
- Don't assume "that can't matter"

### 4. Understand Dependencies

- What other components does this need?
- What settings, config, environment?
- What assumptions does it make?

---
## Phase 3: Hypothesis and Testing

**Scientific method:**

### 1. Form a Single Hypothesis

- State clearly: "I think X is the root cause because Y"
- Write it down
- Be specific, not vague

### 2. Test Minimally

- Make the SMALLEST possible change to test the hypothesis
- One variable at a time
- Don't fix multiple things at once

### 3. Verify Before Continuing

- Did it work? → Phase 4
- Didn't work? → Form a NEW hypothesis
- DON'T add more fixes on top

### 4. When You Don't Know

- Say "I don't understand X"
- Don't pretend to know
- Ask the user for help
- Research more

---
## Phase 4: Implementation

**Fix the root cause, not the symptom:**

### 1. Create Failing Test Case

- Simplest possible reproduction
- Automated test if possible
- MUST have it before fixing
- Use the `test-driven-development` skill

### 2. Implement Single Fix

- Address the root cause identified
- ONE change at a time
- No "while I'm here" improvements
- No bundled refactoring

### 3. Verify Fix

```bash
# Run the specific regression test
pytest tests/test_module.py::test_regression -v

# Run the full suite — no regressions
pytest tests/ -q
```
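A regression test pinned to the root cause might look like this sketch (the `process_data` helper and the empty-input bug are hypothetical illustrations, not code from this repository):

```python
def process_data(data):
    # Fix at the root cause: guard the empty-input case explicitly
    if not data:
        return ""
    return data.split(",")[0]

def test_should_handle_empty_input():
    """Regression test for the empty-input bug."""
    assert process_data("") == ""

def test_normal_input_still_works():
    assert process_data("a,b,c") == "a"

test_should_handle_empty_input()
test_normal_input_still_works()
```

The regression test must have failed before the fix and passed after it; that pair of observations is what proves the fix addressed the cause.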
### 4. If Fix Doesn't Work — The Rule of Three

- **STOP.**
- Count: how many fixes have you tried?
- If < 3: return to Phase 1 and re-analyze with the new information
- **If ≥ 3: STOP and question the architecture (step 5 below)**
- DON'T attempt Fix #4 without architectural discussion

### 5. If 3+ Fixes Failed: Question Architecture

**Pattern indicating an architectural problem:**
- Each fix reveals new shared state/coupling in a different place
- Fixes require "massive refactoring" to implement
- Each fix creates new symptoms elsewhere

**STOP and question fundamentals:**
- Is this pattern fundamentally sound?
- Are we "sticking with it through sheer inertia"?
- Should we refactor the architecture vs. continue fixing symptoms?

**Discuss with the user before attempting more fixes.**

This is NOT a failed hypothesis — this is a wrong architecture.

---
## Red Flags — STOP and Follow Process

If you catch yourself thinking:

- "Quick fix for now, investigate later"
- "Just try changing X and see if it works"
- "Add multiple changes, run tests"
- "Skip the test, I'll manually verify"
- "It's probably X, let me fix that"
- "I don't fully understand but this might work"
- "Pattern says X but I'll adapt it differently"
- "Here are the main problems: [lists fixes without investigation]"
- Proposing solutions before tracing data flow
- **"One more fix attempt" (when already tried 2+)**
- **Each fix reveals a new problem in a different place**

**ALL of these mean: STOP. Return to Phase 1.**

**If 3+ fixes failed:** Question the architecture (Phase 4, step 5).
## Common Rationalizations

| Excuse | Reality |
|--------|---------|
| "Issue is simple, don't need process" | Simple issues have root causes too. Process is fast for simple bugs. |
| "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. |
| "Just try this first, then investigate" | First fix sets the pattern. Do it right from the start. |
| "I'll write test after confirming fix works" | Untested fixes don't stick. Test first proves it. |
| "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
| "Reference too long, I'll adapt the pattern" | Partial understanding guarantees bugs. Read it completely. |
| "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. |
| "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Question the pattern, don't fix again. |

## Quick Reference

| Phase | Key Activities | Success Criteria |
|-------|---------------|------------------|
| **1. Root Cause** | Read errors, reproduce, check changes, gather evidence, trace data flow | Understand WHAT and WHY |
| **2. Pattern** | Find working examples, compare, identify differences | Know what's different |
| **3. Hypothesis** | Form theory, test minimally, one variable at a time | Confirmed or new hypothesis |
| **4. Implementation** | Create regression test, fix root cause, verify | Bug resolved, all tests pass |
## Hermes Agent Integration

### Investigation Tools

Use these Hermes tools during Phase 1:

- **`search_files`** — Find error strings, trace function calls, locate patterns
- **`read_file`** — Read source code with line numbers for precise analysis
- **`terminal`** — Run tests, check git history, reproduce bugs
- **`web_search`/`web_extract`** — Research error messages, library docs
### With delegate_task

For complex multi-component debugging, dispatch investigation subagents:

```python
delegate_task(
    goal="Investigate why [specific test/behavior] fails",
    context="""
    Follow systematic-debugging skill:
    1. Read the error message carefully
    2. Reproduce the issue
    3. Trace the data flow to find root cause
    4. Report findings — do NOT fix yet

    Error: [paste full error]
    File: [path to failing code]
    Test command: [exact command]
    """,
    toolsets=['terminal', 'file']
)
```

## Debugging Techniques

### Root Cause Tracing

Ask "why" 5 times:
1. Why did it fail? → Null pointer
2. Why was it null? → Function returned null
3. Why did the function return null? → Missing validation
4. Why was validation missing? → Assumed input always valid
5. Why was that assumption wrong? → API changed

**Root cause:** API change not accounted for

### Defense in Depth

Don't fix just the symptom:

**Bad:** Add null check at crash site
**Good:**
1. Add validation at API boundary
2. Add null check at crash site
3. Add test for both
4. Document API behavior
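The "Defense in Depth" steps can be sketched in Python (the profile-fetching functions and response shape here are hypothetical examples, not an actual API):

```python
def fetch_profile(api_response: dict) -> dict:
    # Layer 1: validate at the API boundary
    # (assumed response shape: {"user": {...}})
    user = api_response.get("user")
    if user is None:
        raise ValueError("API response missing 'user' field")
    return user

def render_name(user: dict) -> str:
    # Layer 2: defensive check at the original crash site
    name = user.get("name")
    return name if name is not None else "(unknown)"

# Layer 3: tests for both layers
assert render_name({}) == "(unknown)"
assert fetch_profile({"user": {"name": "Ada"}}) == {"name": "Ada"}
```

The boundary check fails loudly at the source; the crash-site check degrades gracefully if bad data slips through anyway.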
---
## Common Debugging Pitfalls

### Fix Without Understanding

**Symptom:** "Just add a try/catch"
**Problem:** Masks the real issue
**Solution:** Complete Phase 1 before any fix

### Shotgun Debugging

**Symptom:** Change 5 things at once
**Problem:** Don't know what fixed it
**Solution:** One change at a time, verify each

### Premature Optimization

**Symptom:** Rewrite while debugging
**Problem:** Introduces new bugs
**Solution:** Fix first, refactor later

### Assuming Environment

**Symptom:** "Works on my machine"
**Problem:** Environment differences
**Solution:** Check environment variables, versions, configs

---
## Language-Specific Tools

### Python

```python
# Drop into the debugger
import pdb; pdb.set_trace()

# Or use ipdb for a better experience
import ipdb; ipdb.set_trace()

# Log state
import logging
logging.debug(f"Variable state: {variable}")

# Print the current stack trace
import traceback
traceback.print_exc()
```

### JavaScript/TypeScript

```javascript
// Debugger
debugger;

// Console with context
console.log("State:", { var1, var2, var3 });

// Stack trace
console.trace("Here");

// Error with context
throw new Error(`Failed with input: ${JSON.stringify(input)}`);
```

---
## Integration with Other Skills

### With test-driven-development

When fixing bugs:
1. Write a test that reproduces the bug (RED)
2. Debug systematically to find root cause
3. Fix the root cause (GREEN)
4. The test proves the fix and prevents regression
## Real-World Impact

From debugging sessions:
- Systematic approach: 15-30 minutes to fix
- Random-fixes approach: 2-3 hours of thrashing
- First-time fix rate: 95% vs 40%
- New bugs introduced: near zero vs common

**No shortcuts. No guessing. Systematic always wins.**
@@ -1,8 +1,8 @@
---
name: test-driven-development
description: Use when implementing any feature or bugfix, before writing implementation code. Enforces RED-GREEN-REFACTOR cycle with test-first approach.
version: 1.1.0
author: Hermes Agent (adapted from obra/superpowers)
license: MIT
metadata:
  hermes:

@@ -28,7 +28,7 @@ Write the test first. Watch it fail. Write minimal code to pass.

- Refactoring
- Behavior changes

**Exceptions (ask the user first):**
- Throwaway prototypes
- Generated code
- Configuration files
@@ -53,11 +53,11 @@ Implement fresh from tests. Period.

## Red-Green-Refactor Cycle

### RED — Write Failing Test

Write one minimal test showing what should happen.

**Good test:**
```python
def test_retries_failed_operations_3_times():
    attempts = 0

@@ -67,43 +67,51 @@ def test_retries_failed_operations_3_times():
        if attempts < 3:
            raise Exception('fail')
        return 'success'

    result = retry_operation(operation)

    assert result == 'success'
    assert attempts == 3
```
Clear name, tests real behavior, one thing.

**Bad test:**
```python
def test_retry_works():
    mock = MagicMock()
    mock.side_effect = [Exception(), Exception(), 'success']

    result = retry_operation(mock)

    assert result == 'success'  # What about retry count? Timing?
```
Vague name, tests mock not real code.
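For reference, one implementation the good test above would drive out. This is a sketch: `retry_operation`'s real signature is not shown in this excerpt, and the counter-via-dict is just a way to keep the example self-contained.

```python
def retry_operation(operation, max_attempts=3):
    """Retry until success; re-raise the last error when attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise

calls = {"attempts": 0}

def operation():
    # Fails twice, then succeeds on the third attempt
    calls["attempts"] += 1
    if calls["attempts"] < 3:
        raise Exception("fail")
    return "success"

assert retry_operation(operation) == "success"
assert calls["attempts"] == 3
```

Note the test exercises this real function directly, with a real (deliberately flaky) operation; no mocking framework is needed.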
**Requirements:**
- One behavior per test
- Clear descriptive name ("and" in name? Split it)
- Real code, not mocks (unless truly unavoidable)
- Name describes behavior, not implementation

### Verify RED — Watch It Fail

**MANDATORY. Never skip.**

```bash
# Use the terminal tool to run the specific test
pytest tests/test_feature.py::test_specific_behavior -v
```

Confirm:
- Test fails (not errors from typos)
- Failure message is expected
- Fails because the feature is missing

**Test passes immediately?** You're testing existing behavior. Fix the test.

**Test errors?** Fix the error, re-run until it fails correctly.
### GREEN — Minimal Code
|
||||
|
||||
Write the simplest code to pass the test. Nothing more.
|
||||
|
||||
**Good:**
|
||||
```python
|
||||
@@ -119,266 +127,216 @@ def add(a, b):
|
||||
return result
|
||||
```
|
||||
|
||||
Cheating is OK in GREEN:
|
||||
Don't add features, refactor other code, or "improve" beyond the test.
|
||||
|
||||
**Cheating is OK in GREEN:**
|
||||
- Hardcode return values
|
||||
- Copy-paste
|
||||
- Duplicate code
|
||||
- Skip edge cases
|
||||
|
||||
**We'll fix it in refactor.**
|
||||
We'll fix it in REFACTOR.
|
||||
|
||||
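A sketch of legitimate GREEN-phase cheating: hardcode first, and let the next RED test force the general logic. The leap-year feature is a hypothetical illustration.

```python
# First GREEN: hardcoded. It passes the only test written so far,
# and nothing else demands more yet.
def is_leap_year(year):
    return True

# Two RED tests later (1900, 2000), the implementation is forced
# to generalize:
def is_leap_year_v2(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

assert is_leap_year(2024) is True
assert is_leap_year_v2(2024) is True
assert is_leap_year_v2(1900) is False
assert is_leap_year_v2(2000) is True
```

Each new failing test removes a shortcut; the implementation only grows as far as the tests demand.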
### Verify GREEN — Watch It Pass

**MANDATORY.**

```bash
# Run the specific test
pytest tests/test_feature.py::test_specific_behavior -v

# Then run ALL tests to check for regressions
pytest tests/ -q
```

Confirm:
- Test passes
- Other tests still pass
- Output pristine (no errors, warnings)

**Test fails?** Fix the code, not the test.

**Other tests fail?** Fix regressions now.

### REFACTOR — Clean Up

After green only:
- Remove duplication
- Improve names
- Extract helpers
- Simplify expressions
- Improve readability

Keep tests green throughout. Don't add behavior.

**If tests fail during refactor:** Undo immediately. Take smaller steps.
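A sketch of a REFACTOR step that changes structure but not behavior (the name-formatting helper is hypothetical):

```python
# GREEN version: repeated cleanup logic, but passing
def full_name(first, last):
    return first.strip() + " " + last.strip()

# REFACTOR: extract the repeated cleanup into a helper. Same behavior,
# so every test that was green stays green.
def _clean(part):
    return part.strip()

def full_name_refactored(first, last):
    return f"{_clean(first)} {_clean(last)}"

assert full_name("  Ada ", " Lovelace ") == "Ada Lovelace"
assert full_name_refactored("  Ada ", " Lovelace ") == full_name("  Ada ", " Lovelace ")
```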
### Repeat

Next failing test for the next behavior. One cycle at a time.
## Why Order Matters

**"I'll write tests after to verify it works"**

Tests written after code pass immediately. Passing immediately proves nothing:
- Might test the wrong thing
- Might test implementation, not behavior
- Might miss edge cases you forgot
- You never saw it catch the bug

Test-first forces you to see the test fail, proving it actually tests something.

**"I already manually tested all the edge cases"**

Manual testing is ad-hoc. You think you tested everything but:
- No record of what you tested
- Can't re-run when code changes
- Easy to forget cases under pressure
- "It worked when I tried it" ≠ comprehensive

Automated tests are systematic. They run the same way every time.

**"Deleting X hours of work is wasteful"**

Sunk cost fallacy. The time is already gone. Your choice now:
- Delete and rewrite with TDD (high confidence)
- Keep it and add tests after (low confidence, likely bugs)

The "waste" is keeping code you can't trust.

**"TDD is dogmatic, being pragmatic means adapting"**

TDD IS pragmatic:
- Finds bugs before commit (faster than debugging after)
- Prevents regressions (tests catch breaks immediately)
- Documents behavior (tests show how to use code)
- Enables refactoring (change freely, tests catch breaks)

"Pragmatic" shortcuts = debugging in production = slower.

**"Tests after achieve the same goals — it's spirit not ritual"**

No. Tests-after answer "What does this do?" Tests-first answer "What should this do?"

Tests-after are biased by your implementation. You test what you built, not what's required. Tests-first force edge case discovery before implementing.
## Common Rationalizations

| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
| "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
| "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
| "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
| "Need to explore first" | Fine. Throw away exploration, start with TDD. |
| "Test hard = design unclear" | Listen to the test. Hard to test = hard to use. |
| "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. |
| "Manual test faster" | Manual doesn't prove edge cases. You'll re-test every change. |
| "Existing code has no tests" | You're improving it. Add tests for the code you touch. |
## Red Flags — STOP and Start Over

If you catch yourself doing any of these, delete the code and restart with TDD:

- Code before test
- Test after implementation
- Test passes immediately on first run
- Can't explain why the test failed
- Tests added "later"
- Rationalizing "just this once"
- "I already manually tested it"
- "Tests after achieve the same purpose"
- "Keep as reference" or "adapt existing code"
- "Already spent X hours, deleting is wasteful"
- "TDD is dogmatic, I'm being pragmatic"
- "This is different because..."

**All of these mean: Delete the code. Start over with TDD.**
## Verification Checklist

Before marking work complete:

- [ ] Every new function/method has a test
- [ ] Watched each test fail before implementing
- [ ] Each test failed for the expected reason (feature missing, not a typo)
- [ ] Wrote minimal code to pass each test
- [ ] All tests pass
- [ ] Output pristine (no errors, warnings)
- [ ] Tests use real code (mocks only if unavoidable)
- [ ] Edge cases and errors covered

Can't check all the boxes? You skipped TDD. Start over.
## When Stuck

| Problem | Solution |
|---------|----------|
| Don't know how to test | Write the wished-for API. Write the assertion first. Ask the user. |
| Test too complicated | Design too complicated. Simplify the interface. |
| Must mock everything | Code too coupled. Use dependency injection. |
| Test setup huge | Extract helpers. Still complex? Simplify the design. |
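The dependency-injection fix can be made concrete. Below is a minimal sketch (all names here are hypothetical, invented for illustration): instead of mocking the clock, the collaborator is injected, so tests pass a fake.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict
import time

@dataclass
class SessionTracker:
    # Inject the time source instead of calling time.time() directly,
    # so tests control it without any mocking framework.
    now: Callable[[], float] = time.time
    started: Dict[str, float] = field(default_factory=dict)

    def start(self, user):
        self.started[user] = self.now()

    def elapsed(self, user):
        return self.now() - self.started[user]

# In a test, inject a fake clock: first call returns 100.0, second 160.0
ticks = iter([100.0, 160.0])
tracker = SessionTracker(now=lambda: next(ticks))
tracker.start("alice")
assert tracker.elapsed("alice") == 60.0
```

The design choice: the production default (`time.time`) costs nothing, while the test never touches a patching library.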
## Hermes Agent Integration

### Running Tests

Use the `terminal` tool to run tests at each step. Start from a failing test:

```python
# tests/test_calculator.py
def test_adds_two_numbers():
    calc = Calculator()
    result = calc.add(2, 3)
    assert result == 5
```

Then run the relevant test at each phase of the cycle:

```python
# RED — verify failure
terminal("pytest tests/test_feature.py::test_name -v")

# GREEN — verify pass
terminal("pytest tests/test_feature.py::test_name -v")

# Full suite — verify no regressions
terminal("pytest tests/ -q")
```
### Run and Verify Failure

```bash
pytest tests/test_calculator.py -v
# Expected: FAIL - Calculator not defined
```

### Write Minimal Implementation

```python
# src/calculator.py
class Calculator:
    def add(self, a, b):
        return a + b  # Minimal!
```

### Run and Verify Pass

```bash
pytest tests/test_calculator.py -v
# Expected: PASS
```

### Commit

```bash
git add tests/test_calculator.py src/calculator.py
git commit -m "feat: add calculator with add method"
```

### Next Test

```python
def test_adds_negative_numbers():
    calc = Calculator()
    result = calc.add(-2, -3)
    assert result == -5
```

Repeat the cycle.

### With delegate_task

When dispatching subagents for implementation, enforce TDD in the goal:

```python
delegate_task(
    goal="Implement [feature] using strict TDD",
    context="""
Follow test-driven-development skill:
1. Write failing test FIRST
2. Run test to verify it fails
3. Write minimal code to pass
4. Run test to verify it passes
5. Refactor if needed
6. Commit

Project test command: pytest tests/ -q
Project structure: [describe relevant files]
""",
    toolsets=['terminal', 'file']
)
```
## Testing Anti-Patterns

### Mocking What You Don't Own

**Bad:** Mock database, HTTP client, file system
**Good:** Abstract behind interface, test the interface

### Testing Implementation Details

**Bad:** Test that a function was called
**Good:** Test the result/behavior

### Happy Path Only

**Bad:** Only test expected inputs
**Good:** Test edge cases, errors, boundaries

### Brittle Tests

**Bad:** Test breaks when refactoring
**Good:** Tests verify behavior, not structure
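To make the "implementation details" trap concrete, here is a side-by-side sketch (the function and policy names are invented for this example, not part of the skill):

```python
from unittest.mock import Mock

def apply_discount(cart, policy):
    """Total the cart, then subtract whatever the policy grants."""
    total = sum(cart)
    return total - policy.discount_for(total)

class TenPercentOver100:
    def discount_for(self, total):
        return total * 0.10 if total >= 100 else 0.0

# Bad: asserts an internal call happened; says nothing about the result
def test_calls_policy():
    policy = Mock()
    policy.discount_for.return_value = 0.0
    apply_discount([50.0], policy)
    policy.discount_for.assert_called_once()

# Good: asserts observable behavior using the real collaborator
def test_discount_applied_over_threshold():
    assert apply_discount([60.0, 60.0], TenPercentOver100()) == 108.0

test_calls_policy()
test_discount_applied_over_threshold()
```

The first test survives a broken discount calculation; the second catches it, and keeps passing through refactors that preserve behavior.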
## Common Pitfalls

### "I'll Write Tests After"

No, you won't. Write them first.

### "This Is Too Simple to Test"

Simple bugs cause complex problems. Test everything.

### "I Need to See It Work First"

Temporary code becomes permanent. Test first.

### "Tests Take Too Long"

Untested code takes longer. Invest in tests.
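As an illustration of the "too simple to test" pitfall (the helper below is invented for this example): a one-line pagination function with a classic off-by-one waiting to happen.

```python
def paginate(items, page, per_page=10):
    """Return one page of items; pages are 1-indexed."""
    # Writing `page * per_page` here would silently skip the first page,
    # exactly the kind of bug "too simple to test" code hides.
    start = (page - 1) * per_page
    return items[start:start + per_page]

def test_first_page_starts_at_item_zero():
    assert paginate(list(range(25)), page=1)[0] == 0

def test_last_page_is_partial():
    assert len(paginate(list(range(25)), page=3)) == 5

test_first_page_starts_at_item_zero()
test_last_page_is_partial()
```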
## Language-Specific Commands

### Python (pytest)

```bash
# Run all tests
pytest

# Run specific test
pytest tests/test_feature.py::test_name -v

# Run with coverage
pytest --cov=src --cov-report=term-missing

# Watch mode
pytest-watch
```

### JavaScript (Jest)

```bash
# Run all tests
npm test

# Run specific test
npm test -- test_name

# Watch mode
npm test -- --watch

# Coverage
npm test -- --coverage
```

### TypeScript (Jest with ts-jest)

```bash
# Run tests
npx jest

# Run specific file
npx jest tests/feature.test.ts
```

### Go

```bash
# Run all tests
go test ./...

# Run specific test
go test -run TestName

# Verbose
go test -v

# Coverage
go test -cover
```

### Rust

```bash
# Run tests
cargo test

# Run specific test
cargo test test_name

# Show output
cargo test -- --nocapture
```
## Integration with Other Skills

### With writing-plans

Every plan task should specify:
- What test to write
- Expected test failure
- Minimal implementation
- Refactoring opportunities

### With systematic-debugging

Bug found? Write a failing test reproducing it, then follow the TDD cycle:
1. Write a test that reproduces the bug
2. Verify the test fails (RED)
3. Fix the bug (GREEN)
4. Refactor if needed

The test proves the fix and prevents regression. Never fix bugs without a test.

### With subagent-driven-development

Subagent implements one test at a time:
1. Write failing test
2. Minimal code to pass
3. Commit
4. Next test
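The bug-fix flow can be sketched as a regression test (the `slugify` bug below is a made-up example, not from this skill): reproduce the report as a failing test first, fix, then keep the test forever.

```python
def slugify(title):
    """Lowercase a title and join its words with hyphens."""
    # Bug report: consecutive spaces produced "a--b". str.split() with no
    # separator collapses runs of whitespace, which is the fix.
    return "-".join(title.lower().split())

# RED first (this failed against the buggy version), now a permanent
# regression guard (GREEN).
def test_regression_consecutive_spaces_collapse():
    assert slugify("Hello  World") == "hello-world"

test_regression_consecutive_spaces_collapse()
```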
## Success Indicators

**You're doing TDD right when:**

- Tests fail before code exists
- You write <10 lines between test runs
- Refactoring feels safe
- Bugs are caught immediately
- Code is simpler than expected

**Warning signs:**

- **Testing mock behavior instead of real behavior** — mocks should verify interactions, not replace the system under test
- **Testing implementation details** — test behavior/results, not internal method calls
- **Happy path only** — always test edge cases, errors, and boundaries
- **Brittle tests** — tests should verify behavior, not structure; refactoring shouldn't break them
## Final Rule

**Red flags:**

- Writing 50+ lines without running tests
- Tests always pass
- Fear of refactoring
- "I'll test later"

```
Production code → test exists and failed first
Otherwise → not TDD
```
## Remember

1. **RED** - Write failing test
2. **GREEN** - Minimal code to pass
3. **REFACTOR** - Clean up safely
4. **REPEAT** - Next behavior

**If you didn't see it fail, it doesn't test what you think.**

No exceptions without the user's explicit permission.
---
name: writing-plans
description: Use when you have a spec or requirements for a multi-step task. Creates comprehensive implementation plans with bite-sized tasks, exact file paths, and complete code examples.
version: 1.1.0
author: Hermes Agent (adapted from obra/superpowers)
license: MIT
metadata:
  hermes:
## Overview

Write comprehensive implementation plans assuming the implementer has zero context for the codebase and questionable taste. Document everything they need: which files to touch, complete code, testing commands, docs to check, how to verify. Give them bite-sized tasks. DRY. YAGNI. TDD. Frequent commits.

**Assume the implementer:**
- Is a skilled developer
- Knows almost nothing about your codebase or problem domain
- Has questionable taste in code style
- Doesn't know good test design very well
- Needs explicit guidance

**Core principle:** A good plan makes implementation obvious. If someone has to guess, the plan is incomplete.
## When to Use

**Always use before:**
- Implementing multi-step features
- Breaking down complex requirements
- Delegating to subagents via subagent-driven-development

**Don't skip when:**
- Feature seems simple (assumptions cause bugs)
- You plan to implement it yourself (future you needs guidance)
- Working alone (documentation matters)
## Bite-Sized Task Granularity

**Each task = 2-5 minutes of focused work.**

Every step is one action:
- "Write the failing test" — step
- "Run it to make sure it fails" — step
- "Implement the minimal code to make the test pass" — step
- "Run the tests and make sure they pass" — step
- "Commit" — step
## Plan Document Structure

### Header (Required)

Every plan MUST start with:

```markdown
# [Feature Name] Implementation Plan

> **For Hermes:** Use subagent-driven-development skill to implement this plan task-by-task.

**Goal:** [One sentence describing what this builds]

**Architecture:** [2-3 sentences about approach]

**Tech Stack:** [Key technologies/libraries]

---
```

### Task Structure

Each task follows this format:

````markdown
### Task N: [Descriptive Name]

**Objective:** What this task accomplishes (one sentence)

**Files:**
- Create: `exact/path/to/new_file.py`
- Modify: `exact/path/to/existing.py:45-67` (line numbers if known)
- Test: `tests/path/to/test_file.py`

**Step 1: Write failing test**

```python
def test_specific_behavior():
    result = function(input)
    assert result == expected
```

**Step 2: Run test to verify failure**

Run: `pytest tests/path/test.py::test_specific_behavior -v`
Expected: FAIL — "function not defined"

**Step 3: Write minimal implementation**

```python
def function(input):
    return expected
```

**Step 4: Run test to verify pass**

Run: `pytest tests/path/test.py::test_specific_behavior -v`
Expected: PASS

**Step 5: Commit**

```bash
git add tests/path/test.py src/path/file.py
git commit -m "feat: add specific feature"
```
````
## Writing Process

### Step 1: Understand Requirements

Read and understand:
- Feature requirements
- Design documents or user description
- Acceptance criteria
- Constraints

### Step 2: Explore the Codebase

Use Hermes tools to understand the project:

```python
# Understand project structure
search_files("*.py", target="files", path="src/")

# Look at similar features
search_files("similar_pattern", path="src/", file_glob="*.py")

# Check existing tests
search_files("*.py", target="files", path="tests/")

# Read key files
read_file("src/app.py")
```

### Step 3: Design Approach

Decide:
- Architecture pattern
- File organization
- Dependencies needed
- Testing strategy

### Step 4: Write Tasks

Create tasks in order:
1. Setup/infrastructure
2. Core functionality (TDD for each)
3. Edge cases
4. Integration
5. Cleanup/documentation

### Step 5: Add Complete Details

For each task, include:
- **Exact file paths** (not "the config file" but `src/config/settings.py`)
- **Complete code examples** (not "add validation" but the actual code)
- **Exact commands** with expected output
- **Verification steps** that prove the task works

### Step 6: Review the Plan

Check:
- [ ] Tasks are sequential and logical
- [ ] Each task is bite-sized (2-5 min)
- [ ] File paths are exact
- [ ] Code examples are complete (copy-pasteable)
- [ ] Commands are exact with expected output
- [ ] No missing context
- [ ] DRY, YAGNI, TDD principles applied

### Step 7: Save the Plan

```bash
mkdir -p docs/plans
# Save plan to docs/plans/YYYY-MM-DD-feature-name.md
git add docs/plans/
git commit -m "docs: add implementation plan for [feature]"
```
## Principles

### DRY (Don't Repeat Yourself)

**Bad:** Copy-paste validation in 3 places
**Good:** Extract validation function, use everywhere

```python
# Good - DRY
import re

def validate_email(email):
    if not re.match(r'^[^@]+@[^@]+$', email):
        raise ValueError("Invalid email")
    return email

# Use everywhere
validate_email(user_input)
validate_email(config_email)
validate_email(imported_data)
```

### YAGNI (You Aren't Gonna Need It)

**Bad:** Add "flexibility" for future requirements
**Good:** Implement only what's needed now

```python
# Bad — YAGNI violation
class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email
        self.preferences = {}  # Not needed yet!
        self.metadata = {}     # Not needed yet!
        self.settings = {}     # Not needed yet!

# Good — YAGNI
class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email
```

### TDD (Test-Driven Development)

Every task that produces code should include the full TDD cycle:
1. Write failing test
2. Run to verify failure
3. Write minimal code
4. Run to verify pass
5. Commit:

```bash
git add [files]
git commit -m "type: description"
```
## Example Plan

````markdown
# User Authentication Implementation Plan

> **For Hermes:** Use subagent-driven-development to implement this plan.

**Goal:** Add JWT-based user authentication to the Flask API

**Architecture:** Use PyJWT for tokens, bcrypt for hashing. Middleware validates tokens on protected routes.

**Tech Stack:** Python, Flask, PyJWT, bcrypt

---

### Task 1: Create User model

**Objective:** Define User model with email and hashed password

**Files:**
- Create: `src/models/user.py`
- Test: `tests/models/test_user.py`

**Step 1: Write failing test**
```python
def test_user_creation():
    user = User(email="test@example.com", password="secret123")
    assert user.email == "test@example.com"
    assert user.password_hash is not None
    assert user.password_hash != "secret123"  # Should be hashed
```

**Step 2: Run to verify failure**
```bash
pytest tests/models/test_user.py -v
```
Expected: FAIL - User class not defined

**Step 3: Implement User model**
```python
import bcrypt

class User:
    def __init__(self, email, password):
        self.email = email
        self.password_hash = bcrypt.hashpw(
            password.encode(),
            bcrypt.gensalt()
        )
```

**Step 4: Run to verify pass**
```bash
pytest tests/models/test_user.py -v
```
Expected: PASS

**Commit:**
```bash
git add src/models/user.py tests/models/test_user.py
git commit -m "feat: add User model with password hashing"
```

### Task 2: Create login endpoint

**Objective:** Add POST /login endpoint that returns JWT

**Files:**
- Modify: `src/app.py`
- Test: `tests/test_login.py`

[Continue...]
````
## Common Mistakes

### Vague Tasks

**Bad:** "Add authentication"
**Good:** "Create User model with email and password_hash fields"

### Incomplete Code

**Bad:** "Step 1: Add validation function"
**Good:** "Step 1: Add validation function" followed by the complete function code:

```python
def validate_email(email):
    """Validate email format."""
    import re
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    if not re.match(pattern, email):
        raise ValueError(f"Invalid email: {email}")
    return email
```

### Missing Verification

**Bad:** "Step 3: Test it works"
**Good:** "Step 3: Run `pytest tests/test_auth.py -v`, expected: 3 passed", or for end-to-end checks:

```bash
curl -X POST http://localhost:5000/login \
  -H "Content-Type: application/json" \
  -d '{"email":"test@example.com","password":"secret123"}'
# Expected: 200 with JWT token in response
```

### Missing File Paths

**Bad:** "Create the model file"
**Good:** "Create: `src/models/user.py`"
## Execution Handoff

After saving the plan, offer the execution approach:

**"Plan complete and saved. Ready to execute using subagent-driven-development — I'll dispatch a fresh subagent per task with two-stage review (spec compliance then code quality). Shall I proceed?"**

When executing, use the `subagent-driven-development` skill:
- Fresh `delegate_task` per task with full context
- Spec compliance review after each task
- Code quality review after spec passes
- Proceed only when both reviews approve

## Success Checklist

Before considering a plan complete:

- [ ] Header with goal, architecture, tech stack
- [ ] All tasks are bite-sized (2-5 min each)
- [ ] Exact file paths for every file
- [ ] Complete code examples (not partial)
- [ ] Exact commands with expected output
- [ ] Verification steps for each task
- [ ] Commit commands included
- [ ] DRY, YAGNI, TDD principles applied
- [ ] Tasks are sequential and logical
- [ ] Plan saved to docs/plans/
## Remember

```
Bite-sized tasks (2-5 min each)
Exact file paths
Complete code (copy-pasteable)
Exact commands with expected output
Verification steps
DRY, YAGNI, TDD
Frequent commits
```

**A good plan makes implementation obvious.**