diff --git a/skills/software-development/requesting-code-review/SKILL.md b/skills/software-development/requesting-code-review/SKILL.md index f2374b65..fb942ec2 100644 --- a/skills/software-development/requesting-code-review/SKILL.md +++ b/skills/software-development/requesting-code-review/SKILL.md @@ -1,8 +1,8 @@ --- name: requesting-code-review description: Use when completing tasks, implementing major features, or before merging. Validates work meets requirements through systematic review process. -version: 1.0.0 -author: Hermes Agent (adapted from Superpowers) +version: 1.1.0 +author: Hermes Agent (adapted from obra/superpowers) license: MIT metadata: hermes: @@ -14,108 +14,102 @@ metadata: ## Overview -Systematic code review catches issues before they cascade. Review early, review often. +Dispatch a reviewer subagent to catch issues before they cascade. Review early, review often. **Core principle:** Fresh perspective finds issues you'll miss. ## When to Request Review -**Mandatory Reviews:** +**Mandatory:** - After each task in subagent-driven development -- After completing major features +- After completing a major feature - Before merge to main - After bug fixes -**Optional but Valuable:** +**Optional but valuable:** - When stuck (fresh perspective) - Before refactoring (baseline check) - After complex logic implementation - When touching critical code (auth, payments, data) -**Don't skip because:** -- "It's simple" (simple bugs compound) -- "I'm in a hurry" (reviews save time) -- "I tested it" (you have blind spots) +**Never skip because:** +- "It's simple" — simple bugs compound +- "I'm in a hurry" — reviews save time +- "I tested it" — you have blind spots ## Review Process -### Step 1: Prepare Context +### Step 1: Self-Review First -Gather: -- What was implemented -- Original requirements/plan -- Files changed -- Test results +Before dispatching a reviewer, check yourself: -```bash -# Get changed files -git diff --name-only HEAD~1 - -# Get diff summary 
-git diff --stat HEAD~1
-
-# Get commit messages
-git log --oneline HEAD~5
-```
-
-### Step 2: Self-Review First
-
-Before requesting external review:
-
-**Checklist:**
 - [ ] Code follows project conventions
 - [ ] All tests pass
-- [ ] No debug print statements
-- [ ] No hardcoded secrets
+- [ ] No debug print statements left
+- [ ] No hardcoded secrets or credentials
 - [ ] Error handling in place
-- [ ] Documentation updated
 - [ ] Commit messages are clear
 
 ```bash
 # Run full test suite
-pytest
+pytest tests/ -q
 
 # Check for debug code
-grep -r "print(" src/ --include="*.py"
-grep -r "debugger" src/ --include="*.js"
+search_files("print\\(", path="src/", file_glob="*.py")
+search_files("console\\.log", path="src/", file_glob="*.js")
 
 # Check for TODOs
-grep -r "TODO" src/ --include="*.py"
+search_files("TODO|FIXME|HACK", path="src/")
 ```
 
-### Step 3: Request Review
+### Step 2: Gather Context
 
-Use `delegate_task` to dispatch a reviewer subagent:
+```bash
+# Changed files
+git diff --name-only HEAD~1
+
+# Diff summary
+git diff --stat HEAD~1
+
+# Recent commits
+git log --oneline -5
+```
+
+### Step 3: Dispatch Reviewer Subagent
+
+Use `delegate_task` to dispatch a focused reviewer:
 
 ```python
 delegate_task(
-    goal="Review implementation for quality and correctness",
+    goal="Review implementation for correctness and quality",
     context="""
-    WHAT WAS IMPLEMENTED: [Brief description]
-
-    ORIGINAL REQUIREMENTS: [From plan or issue]
-
+    WHAT WAS IMPLEMENTED:
+    [Brief description of the feature/fix]
+
+    ORIGINAL REQUIREMENTS:
+    [From plan, issue, or user request]
+
     FILES CHANGED:
-    - src/feature.py (added X function)
-    - tests/test_feature.py (added tests)
-
-    COMMIT RANGE: [SHA range or branch]
-
-    CHECK FOR:
-    - Correctness (does it do what it should?)
-    - Edge cases handled?
-    - Error handling adequate?
- - Code quality and style - - Test coverage - - Security issues - - Performance concerns - + - src/models/user.py (added User class) + - src/auth/login.py (added login endpoint) + - tests/test_auth.py (added 8 tests) + + REVIEW CHECKLIST: + - [ ] Correctness: Does it do what it should? + - [ ] Edge cases: Are they handled? + - [ ] Error handling: Is it adequate? + - [ ] Code quality: Clear names, good structure? + - [ ] Test coverage: Are tests meaningful? + - [ ] Security: Any vulnerabilities? + - [ ] Performance: Any obvious issues? + OUTPUT FORMAT: - Summary: [brief assessment] - - Critical Issues: [must fix] - - Important Issues: [should fix] + - Critical Issues: [must fix — blocks merge] + - Important Issues: [should fix before merge] - Minor Issues: [nice to have] - - Verdict: [APPROVE / REQUEST_CHANGES / NEEDS_WORK] + - Strengths: [what was done well] + - Verdict: APPROVE / REQUEST_CHANGES """, toolsets=['file'] ) @@ -123,110 +117,72 @@ delegate_task( ### Step 4: Act on Feedback -**Critical Issues (Block merge):** +**Critical Issues (block merge):** - Security vulnerabilities - Broken functionality - Data loss risk - Test failures +- **Action:** Fix immediately before proceeding -**Action:** Fix immediately before proceeding - -**Important Issues (Should fix):** +**Important Issues (should fix):** - Missing edge case handling - Poor error messages - Unclear code - Missing tests +- **Action:** Fix before merge if possible -**Action:** Fix before merge if possible - -**Minor Issues (Nice to have):** +**Minor Issues (nice to have):** - Style preferences - Refactoring suggestions - Documentation improvements +- **Action:** Note for later or quick fix -**Action:** Note for later or quick fix +**If reviewer is wrong:** +- Push back with technical reasoning +- Show code/tests that prove it works +- Request clarification ## Review Dimensions ### Correctness - -**Questions:** - Does it implement the requirements? - Are there logic errors? - Do edge cases work? 
- Are there race conditions? -**Check:** -- Read implementation against requirements -- Trace through edge cases -- Check boundary conditions - ### Code Quality - -**Questions:** - Is code readable? -- Are names clear? -- Is it too complex? +- Are names clear and descriptive? +- Is it too complex? (Functions >20 lines = smell) - Is there duplication? -**Check:** -- Function length (aim <20 lines) -- Cyclomatic complexity -- DRY violations -- Naming clarity - ### Testing - -**Questions:** -- Are there tests? +- Are there meaningful tests? - Do they cover edge cases? -- Are they meaningful? -- Do they pass? - -**Check:** -- Test coverage -- Edge case coverage -- Test clarity -- Assertion quality +- Do they test behavior, not implementation? +- Do all tests pass? ### Security - -**Questions:** - Any injection vulnerabilities? - Proper input validation? - Secrets handled correctly? - Access control in place? -**Check:** -- Input sanitization -- Authentication/authorization -- Secret management -- SQL/query safety - ### Performance - -**Questions:** - Any N+1 queries? -- Unnecessary computation? +- Unnecessary computation in loops? - Memory leaks? -- Scalability concerns? - -**Check:** -- Database queries -- Algorithm complexity -- Resource usage -- Caching opportunities +- Missing caching opportunities? ## Review Output Format -Standard review format: +Standard format for reviewer subagent output: ```markdown ## Review Summary **Assessment:** [Brief overall assessment] - -**Verdict:** [APPROVE / REQUEST_CHANGES / NEEDS_WORK] +**Verdict:** APPROVE / REQUEST_CHANGES --- @@ -237,8 +193,6 @@ Standard review format: - Problem: [Description] - Suggestion: [How to fix] ---- - ## Important Issues (Should Fix) 1. **[Issue title]** @@ -246,15 +200,11 @@ Standard review format: - Problem: [Description] - Suggestion: [How to fix] ---- - ## Minor Issues (Optional) 1. 
**[Issue title]** - Suggestion: [Improvement idea] ---- - ## Strengths - [What was done well] @@ -264,111 +214,41 @@ Standard review format: ### With subagent-driven-development -Review after EACH task: -1. Subagent implements task -2. Request code review -3. Fix issues -4. Proceed to next task +Review after EACH task — this is the two-stage review: +1. Spec compliance review (does it match the plan?) +2. Code quality review (is it well-built?) +3. Fix issues from either review +4. Proceed to next task only when both approve ### With test-driven-development -Review checks: -- Tests written first? -- Tests are meaningful? +Review verifies: +- Tests were written first (RED-GREEN-REFACTOR followed?) +- Tests are meaningful (not just asserting True)? - Edge cases covered? - All tests pass? ### With writing-plans Review validates: -- Implementation matches plan? +- Implementation matches the plan? - All tasks completed? - Quality standards met? -## Common Review Patterns +## Red Flags -### Pre-Merge Review - -Before merging feature branch: - -```bash -# Create review checkpoint -git log --oneline main..feature-branch - -# Get summary of changes -git diff --stat main..feature-branch - -# Request review -delegate_task( - goal="Pre-merge review of feature branch", - context="[changes, requirements, test results]" -) - -# Address feedback -# Merge when approved -``` - -### Continuous Review - -During subagent-driven development: - -```python -# After each task -if task_complete: - review_result = request_review(task) - if review_result.has_critical_issues(): - fix_issues(review_result.critical) - re_review() - proceed_to_next_task() -``` - -### Emergency Review - -When fixing production bugs: - -1. Fix with tests -2. Self-review -3. Quick peer review -4. Deploy -5. 
Full review post-deploy - -## Review Best Practices - -### As Review Requester - -**Do:** -- Provide complete context -- Highlight areas of concern -- Ask specific questions -- Be responsive to feedback -- Fix issues promptly - -**Don't:** -- Rush the reviewer -- Argue without evidence -- Ignore feedback -- Take criticism personally - -### As Reviewer (via subagent) - -**Do:** -- Be specific about issues -- Suggest improvements -- Acknowledge what works -- Prioritize issues - -**Don't:** -- Nitpick style (unless project requires) -- Make vague comments -- Block without explanation -- Be overly critical +**Never:** +- Skip review because "it's simple" +- Ignore Critical issues +- Proceed with unfixed Important issues +- Argue with valid technical feedback without evidence ## Quality Gates **Must pass before merge:** - [ ] No critical issues - [ ] All tests pass -- [ ] Review approved +- [ ] Review verdict: APPROVE - [ ] Requirements met **Should pass before merge:** @@ -376,10 +256,6 @@ When fixing production bugs: - [ ] Documentation updated - [ ] Performance acceptable -**Nice to have:** -- [ ] No minor issues -- [ ] Extra polish - ## Remember ``` diff --git a/skills/software-development/subagent-driven-development/SKILL.md b/skills/software-development/subagent-driven-development/SKILL.md index 7b94ece0..a47e4415 100644 --- a/skills/software-development/subagent-driven-development/SKILL.md +++ b/skills/software-development/subagent-driven-development/SKILL.md @@ -1,8 +1,8 @@ --- name: subagent-driven-development description: Use when executing implementation plans with independent tasks. Dispatches fresh delegate_task per task with two-stage review (spec compliance then code quality). 
-version: 1.0.0 -author: Hermes Agent (adapted from Superpowers) +version: 1.1.0 +author: Hermes Agent (adapted from obra/superpowers) license: MIT metadata: hermes: @@ -16,33 +16,41 @@ metadata: Execute implementation plans by dispatching fresh subagents per task with systematic two-stage review. -**Core principle:** Fresh subagent per task + two-stage review (spec then quality) = high quality, fast iteration +**Core principle:** Fresh subagent per task + two-stage review (spec then quality) = high quality, fast iteration. ## When to Use Use this skill when: -- You have an implementation plan (from writing-plans skill) +- You have an implementation plan (from writing-plans skill or user requirements) - Tasks are mostly independent -- You want to stay in the current session - Quality and spec compliance are important +- You want automated review between tasks -**vs. Manual execution:** -- Parallel task execution possible -- Automated review process -- Consistent quality checks -- Better for complex multi-step plans +**vs. manual execution:** +- Fresh context per task (no confusion from accumulated state) +- Automated review process catches issues early +- Consistent quality checks across all tasks +- Subagents can ask questions before starting work ## The Process ### 1. Read and Parse Plan -```markdown -[Read plan file once: docs/plans/feature-plan.md] -[Extract all tasks with full text and context] -[Create todo list with all tasks] +Read the plan file. Extract ALL tasks with their full text and context upfront. Create a todo list: + +```python +# Read the plan +read_file("docs/plans/feature-plan.md") + +# Create todo list with all tasks +todo([ + {"id": "task-1", "content": "Create User model with email field", "status": "pending"}, + {"id": "task-2", "content": "Add password hashing utility", "status": "pending"}, + {"id": "task-3", "content": "Create login endpoint", "status": "pending"}, +]) ``` -**Action:** Read plan, extract tasks, create todo list. 
+**Key:** Read the plan ONCE. Extract everything. Don't make subagents read the plan file — provide the full task text directly in context. ### 2. Per-Task Workflow @@ -50,124 +58,137 @@ For EACH task in the plan: #### Step 1: Dispatch Implementer Subagent -Use `delegate_task` with: -- **goal:** Implement [specific task from plan] -- **context:** Full task description from plan, project structure, relevant files -- **toolsets:** ['terminal', 'file', 'web'] (or as needed) +Use `delegate_task` with complete context: -**Example:** ```python -# Task: Add user authentication middleware delegate_task( - goal="Implement JWT authentication middleware as specified in Task 3 of the plan", + goal="Implement Task 1: Create User model with email and password_hash fields", context=""" - Task from plan: - - Create: src/middleware/auth.py - - Validate JWT tokens from Authorization header - - Return 401 for invalid tokens - - Attach user info to request object - - Project structure: - - Flask app in src/app.py - - Uses PyJWT library - - Existing middleware pattern in src/middleware/ + TASK FROM PLAN: + - Create: src/models/user.py + - Add User class with email (str) and password_hash (str) fields + - Use bcrypt for password hashing + - Include __repr__ for debugging + + FOLLOW TDD: + 1. Write failing test in tests/models/test_user.py + 2. Run: pytest tests/models/test_user.py -v (verify FAIL) + 3. Write minimal implementation + 4. Run: pytest tests/models/test_user.py -v (verify PASS) + 5. Run: pytest tests/ -q (verify no regressions) + 6. Commit: git add -A && git commit -m "feat: add User model with password hashing" + + PROJECT CONTEXT: + - Python 3.11, Flask app in src/app.py + - Existing models in src/models/ + - Tests use pytest, run from project root + - bcrypt already in requirements.txt """, toolsets=['terminal', 'file'] ) ``` -#### Step 2: Implementer Subagent Works +#### Step 2: Dispatch Spec Compliance Reviewer -The subagent will: -1. 
Ask questions if needed (you answer) -2. Implement the task following TDD -3. Write tests -4. Run tests to verify -5. Self-review -6. Report completion - -**Your role:** Answer questions, provide context. - -#### Step 3: Spec Compliance Review - -Dispatch reviewer subagent: +After the implementer completes, verify against the original spec: ```python delegate_task( - goal="Review if implementation matches spec from plan", + goal="Review if implementation matches the spec from the plan", context=""" - Original task spec: [copy from plan] - Implementation: [file paths and key code] - - Check: - - All requirements from spec implemented? - - File paths match spec? - - Behavior matches spec? - - Nothing extra added? + ORIGINAL TASK SPEC: + - Create src/models/user.py with User class + - Fields: email (str), password_hash (str) + - Use bcrypt for password hashing + - Include __repr__ + + CHECK: + - [ ] All requirements from spec implemented? + - [ ] File paths match spec? + - [ ] Function signatures match spec? + - [ ] Behavior matches expected? + - [ ] Nothing extra added (no scope creep)? + + OUTPUT: PASS or list of specific spec gaps to fix. """, toolsets=['file'] ) ``` -**If spec issues found:** -- Subagent fixes gaps -- Re-run spec review -- Continue only when spec-compliant +**If spec issues found:** Fix gaps, then re-run spec review. Continue only when spec-compliant. -#### Step 4: Code Quality Review +#### Step 3: Dispatch Code Quality Reviewer -Dispatch quality reviewer: +After spec compliance passes: ```python delegate_task( - goal="Review code quality and best practices", + goal="Review code quality for Task 1 implementation", context=""" - Code to review: [file paths] - - Check: - - Follows project style? - - Proper error handling? - - Good naming? - - Test coverage adequate? - - No obvious bugs? + FILES TO REVIEW: + - src/models/user.py + - tests/models/test_user.py + + CHECK: + - [ ] Follows project conventions and style? + - [ ] Proper error handling? 
+ - [ ] Clear variable/function names? + - [ ] Adequate test coverage? + - [ ] No obvious bugs or missed edge cases? + - [ ] No security issues? + + OUTPUT FORMAT: + - Critical Issues: [must fix before proceeding] + - Important Issues: [should fix] + - Minor Issues: [optional] + - Verdict: APPROVED or REQUEST_CHANGES """, toolsets=['file'] ) ``` -**If quality issues found:** -- Subagent fixes issues -- Re-run quality review -- Continue only when approved +**If quality issues found:** Fix issues, re-review. Continue only when approved. -#### Step 5: Mark Complete +#### Step 4: Mark Complete -Update todo list, mark task complete. +```python +todo([{"id": "task-1", "content": "Create User model with email field", "status": "completed"}], merge=True) +``` ### 3. Final Review -After ALL tasks complete: +After ALL tasks are complete, dispatch a final integration reviewer: ```python delegate_task( - goal="Review entire implementation for consistency", - context="All tasks completed, review for integration issues", - toolsets=['file'] + goal="Review the entire implementation for consistency and integration issues", + context=""" + All tasks from the plan are complete. Review the full implementation: + - Do all components work together? + - Any inconsistencies between tasks? + - All tests passing? + - Ready for merge? + """, + toolsets=['terminal', 'file'] ) ``` -### 4. Branch Cleanup +### 4. 
Verify and Commit -Use `finishing-a-development-branch` skill: -- Verify all tests pass -- Present merge options -- Clean up worktree +```bash +# Run full test suite +pytest tests/ -q + +# Review all changes +git diff --stat + +# Final commit if needed +git add -A && git commit -m "feat: complete [feature name] implementation" +``` ## Task Granularity -**Good task size:** 2-5 minutes of focused work - -**Examples:** +**Each task = 2-5 minutes of focused work.** **Too big:** - "Implement user authentication system" @@ -177,188 +198,134 @@ Use `finishing-a-development-branch` skill: - "Add password hashing function" - "Create login endpoint" - "Add JWT token generation" +- "Create registration endpoint" -## Communication Pattern +## Red Flags — Never Do These -### You to Subagent - -**Provide:** -- Clear task description -- Exact file paths -- Expected behavior -- Success criteria -- Relevant context - -**Example:** -``` -Task: Add email validation -Files: Create src/validators/email.py -Expected: Function returns True for valid emails, False for invalid -Success: Tests pass for 10 test cases including edge cases -Context: Used in user registration flow -``` - -### Subagent to You - -**Expect:** -- Questions for clarification -- Progress updates -- Completion report -- Self-review summary - -**Respond to:** -- Answer questions promptly -- Provide missing context -- Approve approach decisions - -## Two-Stage Review Details - -### Stage 1: Spec Compliance - -**Checks:** -- [ ] All requirements from plan implemented -- [ ] File paths match specification -- [ ] Function signatures match spec -- [ ] Behavior matches expected -- [ ] No scope creep (nothing extra) - -**Output:** PASS or list of spec gaps - -### Stage 2: Code Quality - -**Checks:** -- [ ] Follows language conventions -- [ ] Consistent with project style -- [ ] Clear variable/function names -- [ ] Proper error handling -- [ ] Adequate test coverage -- [ ] No obvious bugs/edge cases missed -- [ ] Documentation 
if needed - -**Output:** APPROVED or list of issues (critical/important/minor) +- Start implementation without a plan +- Skip reviews (spec compliance OR code quality) +- Proceed with unfixed critical/important issues +- Dispatch multiple implementation subagents for tasks that touch the same files +- Make subagent read the plan file (provide full text in context instead) +- Skip scene-setting context (subagent needs to understand where the task fits) +- Ignore subagent questions (answer before letting them proceed) +- Accept "close enough" on spec compliance +- Skip review loops (reviewer found issues → implementer fixes → review again) +- Let implementer self-review replace actual review (both are needed) +- **Start code quality review before spec compliance is PASS** (wrong order) +- Move to next task while either review has open issues ## Handling Issues -### Critical Issues +### If Subagent Asks Questions -**Examples:** Security vulnerability, broken functionality, data loss risk +- Answer clearly and completely +- Provide additional context if needed +- Don't rush them into implementation -**Action:** Must fix before proceeding +### If Reviewer Finds Issues -### Important Issues +- Implementer subagent (or a new one) fixes them +- Reviewer reviews again +- Repeat until approved +- Don't skip the re-review -**Examples:** Missing tests, poor error handling, unclear code +### If Subagent Fails a Task -**Action:** Should fix before proceeding +- Dispatch a new fix subagent with specific instructions about what went wrong +- Don't try to fix manually in the controller session (context pollution) -### Minor Issues +## Efficiency Notes -**Examples:** Style inconsistency, minor refactoring opportunity +**Why fresh subagent per task:** +- Prevents context pollution from accumulated state +- Each subagent gets clean, focused context +- No confusion from prior tasks' code or reasoning -**Action:** Note for later, optional fix +**Why two-stage review:** +- Spec review 
catches under/over-building early +- Quality review ensures the implementation is well-built +- Catches issues before they compound across tasks + +**Cost trade-off:** +- More subagent invocations (implementer + 2 reviewers per task) +- But catches issues early (cheaper than debugging compounded problems later) ## Integration with Other Skills +### With writing-plans + +This skill EXECUTES plans created by the writing-plans skill: +1. User requirements → writing-plans → implementation plan +2. Implementation plan → subagent-driven-development → working code + ### With test-driven-development -Subagent should: +Implementer subagents should follow TDD: 1. Write failing test first 2. Implement minimal code 3. Verify test passes 4. Commit -### With systematic-debugging - -If subagent encounters bugs: -1. Pause implementation -2. Debug systematically -3. Fix root cause -4. Resume - -### With writing-plans - -This skill EXECUTES plans created by writing-plans skill. - -**Sequence:** -1. brainstorming → writing-plans → subagent-driven-development +Include TDD instructions in every implementer context. ### With requesting-code-review -After subagent completes task, use requesting-code-review skill for final validation. +The two-stage review process IS the code review. For final integration review, use the requesting-code-review skill's review dimensions. -## Common Patterns +### With systematic-debugging -### Pattern: Fresh Subagent Per Task - -**Why:** Prevents context pollution -**How:** New delegate_task for each task -**Result:** Each subagent has clean context - -### Pattern: Two-Stage Review - -**Why:** Catch issues early, ensure quality -**How:** Spec review → Quality review -**Result:** High-quality, spec-compliant code - -### Pattern: Frequent Checkpoints - -**Why:** Catch issues before they compound -**How:** Review after each task -**Result:** Issues don't cascade - -## Best Practices - -1. 
**Clear Task Boundaries** - - One task = one focused change - - Independent where possible - - Clear success criteria - -2. **Complete Context** - - Provide all needed files - - Explain project conventions - - Share relevant examples - -3. **Review Discipline** - - Don't skip spec review - - Address critical issues immediately - - Keep quality bar consistent - -4. **Communication** - - Answer subagent questions quickly - - Clarify when needed - - Provide feedback on reviews +If a subagent encounters bugs during implementation: +1. Follow systematic-debugging process +2. Find root cause before fixing +3. Write regression test +4. Resume implementation ## Example Workflow -```markdown -User: Implement user authentication +``` +[Read plan: docs/plans/auth-feature.md] +[Create todo list with 5 tasks] -You: I'll use subagent-driven development. Let me create a plan first. -[Uses writing-plans skill] +--- Task 1: Create User model --- +[Dispatch implementer subagent] + Implementer: "Should email be unique?" + You: "Yes, email must be unique" + Implementer: Implemented, 3/3 tests passing, committed. -Plan created with 5 tasks: -1. Create User model -2. Add password hashing -3. Implement login endpoint -4. Add JWT middleware -5. Create registration endpoint +[Dispatch spec reviewer] + Spec reviewer: ✅ PASS — all requirements met ---- Task 1 --- -[Dispatch implementer subagent for User model] -[Subagent asks: "Should email be unique?"] -You: Yes, email must be unique -[Subagent implements] -[Dispatch spec reviewer - PASS] -[Dispatch quality reviewer - APPROVED] -Task 1 complete +[Dispatch quality reviewer] + Quality reviewer: ✅ APPROVED — clean code, good tests ---- Task 2 --- -[Dispatch implementer for password hashing] -... +[Mark Task 1 complete] -[After all tasks] -[Final review] -[Merge branch] +--- Task 2: Password hashing --- +[Dispatch implementer subagent] + Implementer: No questions, implemented, 5/5 tests passing. 
+ +[Dispatch spec reviewer] + Spec reviewer: ❌ Missing: password strength validation (spec says "min 8 chars") + +[Implementer fixes] + Implementer: Added validation, 7/7 tests passing. + +[Dispatch spec reviewer again] + Spec reviewer: ✅ PASS + +[Dispatch quality reviewer] + Quality reviewer: Important: Magic number 8, extract to constant + Implementer: Extracted MIN_PASSWORD_LENGTH constant + Quality reviewer: ✅ APPROVED + +[Mark Task 2 complete] + +... (continue for all tasks) + +[After all tasks: dispatch final integration reviewer] +[Run full test suite: all passing] +[Done!] ``` ## Remember @@ -366,8 +333,8 @@ Task 1 complete ``` Fresh subagent per task Two-stage review every time -Spec compliance first -Code quality second +Spec compliance FIRST +Code quality SECOND Never skip reviews Catch issues early ``` diff --git a/skills/software-development/systematic-debugging/SKILL.md b/skills/software-development/systematic-debugging/SKILL.md index 40261fa4..70a68d58 100644 --- a/skills/software-development/systematic-debugging/SKILL.md +++ b/skills/software-development/systematic-debugging/SKILL.md @@ -1,8 +1,8 @@ --- name: systematic-debugging -description: Use when encountering any bug, test failure, or unexpected behavior. 4-phase root cause investigation process - NO fixes without understanding the problem first. -version: 1.0.0 -author: Hermes Agent (adapted from Superpowers) +description: Use when encountering any bug, test failure, or unexpected behavior. 4-phase root cause investigation — NO fixes without understanding the problem first. 
+version: 1.1.0 +author: Hermes Agent (adapted from obra/superpowers) license: MIT metadata: hermes: @@ -48,7 +48,7 @@ Use for ANY technical issue: **Don't skip when:** - Issue seems simple (simple bugs have root causes too) - You're in a hurry (rushing guarantees rework) -- Manager wants it fixed NOW (systematic is faster than thrashing) +- Someone wants it fixed NOW (systematic is faster than thrashing) ## The Four Phases @@ -67,7 +67,7 @@ You MUST complete each phase before proceeding to the next. - Read stack traces completely - Note line numbers, file paths, error codes -**Action:** Copy full error message to your notes. +**Action:** Use `read_file` on the relevant source files. Use `search_files` to find the error string in the codebase. ### 2. Reproduce Consistently @@ -76,16 +76,24 @@ You MUST complete each phase before proceeding to the next. - Does it happen every time? - If not reproducible → gather more data, don't guess -**Action:** Write down exact reproduction steps. +**Action:** Use the `terminal` tool to run the failing test or trigger the bug: + +```bash +# Run specific failing test +pytest tests/test_module.py::test_name -v + +# Run with verbose output +pytest tests/test_module.py -v --tb=long +``` ### 3. Check Recent Changes - What changed that could cause this? - Git diff, recent commits - New dependencies, config changes -- Environmental differences -**Commands:** +**Action:** + ```bash # Recent commits git log --oneline -10 @@ -94,59 +102,50 @@ git log --oneline -10 git diff # Changes in specific file -git log -p --follow src/problematic_file.py +git log -p --follow src/problematic_file.py | head -100 ``` ### 4. 
Gather Evidence in Multi-Component Systems
 
-**WHEN system has multiple components (CI pipeline, API service, database layer):**
+**WHEN system has multiple components (API → service → database, CI → build → deploy):**
 
 **BEFORE proposing fixes, add diagnostic instrumentation:**
 
 For EACH component boundary:
-- Log what data enters component
-- Log what data exits component
+- Log what data enters the component
+- Log what data exits the component
 - Verify environment/config propagation
 - Check state at each layer
 
-Run once to gather evidence showing WHERE it breaks
-THEN analyze evidence to identify failing component
-THEN investigate that specific component
+Run once to gather evidence showing WHERE it breaks.
+THEN analyze evidence to identify the failing component.
+THEN investigate that specific component.
+
+### 5. Trace Data Flow
+
+**WHEN error is deep in the call stack:**
+
+- Where does the bad value originate?
+- What called this function with the bad value?
+- Keep tracing upstream until you find the source
+- Fix at the source, not at the symptom
+
+**Action:** Use `search_files` to trace references:
 
-**Example (multi-layer system):**
 ```python
-# Layer 1: Entry point
-def entry_point(input_data):
-    print(f"DEBUG: Input received: {input_data}")
-    result = process_layer1(input_data)
-    print(f"DEBUG: Layer 1 output: {result}")
-    return result
+# Find where the function is called
+search_files("function_name\\(", path="src/", file_glob="*.py")
 
-# Layer 2: Processing
-def process_layer1(data):
-    print(f"DEBUG: Layer 1 received: {data}")
-    # ... processing ...
-    print(f"DEBUG: Layer 1 returning: {result}")
-    return result
+# Find where the variable is set
+search_files("variable_name\\s*=", path="src/", file_glob="*.py")
 ```
 
-**Action:** Add logging, run once, analyze output.
-
-### 5. 
Isolate the Problem - -- Comment out code until problem disappears -- Binary search through recent changes -- Create minimal reproduction case -- Test with fresh environment - -**Action:** Create minimal reproduction case. - ### Phase 1 Completion Checklist - [ ] Error messages fully read and understood - [ ] Issue reproduced consistently - [ ] Recent changes identified and reviewed -- [ ] Evidence gathered (logs, state) +- [ ] Evidence gathered (logs, state, data flow) - [ ] Problem isolated to specific component/code - [ ] Root cause hypothesis formed @@ -154,290 +153,214 @@ def process_layer1(data): --- -## Phase 2: Solution Design +## Phase 2: Pattern Analysis -**Given the root cause, design the fix:** +**Find the pattern before fixing:** -### 1. Understand the Fix Area +### 1. Find Working Examples -- Read relevant code thoroughly -- Understand data flow -- Identify affected components -- Check for similar issues elsewhere +- Locate similar working code in the same codebase +- What works that's similar to what's broken? -**Action:** Read all relevant code files. +**Action:** Use `search_files` to find comparable patterns: -### 2. Design Minimal Fix +```python +search_files("similar_pattern", path="src/", file_glob="*.py") +``` -- Smallest change that fixes root cause -- Avoid scope creep -- Don't refactor while fixing -- Fix one issue at a time +### 2. Compare Against References -**Action:** Write down the exact fix before coding. +- If implementing a pattern, read the reference implementation COMPLETELY +- Don't skim — read every line +- Understand the pattern fully before applying -### 3. Consider Side Effects +### 3. Identify Differences -- What else could this change affect? -- Are there dependencies? -- Will this break other functionality? +- What's different between working and broken? +- List every difference, however small +- Don't assume "that can't matter" -**Action:** Identify potential side effects. +### 4. 
Understand Dependencies -### Phase 2 Completion Checklist - -- [ ] Fix area code fully understood -- [ ] Minimal fix designed -- [ ] Side effects identified -- [ ] Fix approach documented +- What other components does this need? +- What settings, config, environment? +- What assumptions does it make? --- -## Phase 3: Implementation +## Phase 3: Hypothesis and Testing -**Make the fix:** +**Scientific method:** -### 1. Write Test First (if possible) +### 1. Form a Single Hypothesis -```python -def test_should_handle_empty_input(): - """Regression test for bug #123""" - result = process_data("") - assert result == expected_empty_result -``` +- State clearly: "I think X is the root cause because Y" +- Write it down +- Be specific, not vague -### 2. Implement Fix +### 2. Test Minimally -```python -# Before (buggy) -def process_data(data): - return data.split(",")[0] +- Make the SMALLEST possible change to test the hypothesis +- One variable at a time +- Don't fix multiple things at once -# After (fixed) -def process_data(data): - if not data: - return "" - return data.split(",")[0] -``` +### 3. Verify Before Continuing + +- Did it work? → Phase 4 +- Didn't work? → Form NEW hypothesis +- DON'T add more fixes on top + +### 4. When You Don't Know + +- Say "I don't understand X" +- Don't pretend to know +- Ask the user for help +- Research more + +--- + +## Phase 4: Implementation + +**Fix the root cause, not the symptom:** + +### 1. Create Failing Test Case + +- Simplest possible reproduction +- Automated test if possible +- MUST have before fixing +- Use the `test-driven-development` skill + +### 2. Implement Single Fix + +- Address the root cause identified +- ONE change at a time +- No "while I'm here" improvements +- No bundled refactoring ### 3. 
Verify Fix ```bash -# Run the specific test -pytest tests/test_data.py::test_should_handle_empty_input -v +# Run the specific regression test +pytest tests/test_module.py::test_regression -v -# Run all tests to check for regressions -pytest +# Run full suite — no regressions +pytest tests/ -q ``` -### Phase 3 Completion Checklist +### 4. If Fix Doesn't Work — The Rule of Three -- [ ] Test written that reproduces the bug -- [ ] Minimal fix implemented -- [ ] Test passes -- [ ] No regressions introduced +- **STOP.** +- Count: How many fixes have you tried? +- If < 3: Return to Phase 1, re-analyze with new information +- **If ≥ 3: STOP and question the architecture (step 5 below)** +- DON'T attempt Fix #4 without architectural discussion + +### 5. If 3+ Fixes Failed: Question Architecture + +**Pattern indicating an architectural problem:** +- Each fix reveals new shared state/coupling in a different place +- Fixes require "massive refactoring" to implement +- Each fix creates new symptoms elsewhere + +**STOP and question fundamentals:** +- Is this pattern fundamentally sound? +- Are we "sticking with it through sheer inertia"? +- Should we refactor the architecture vs. continue fixing symptoms? + +**Discuss with the user before attempting more fixes.** + +This is NOT a failed hypothesis — this is a wrong architecture. 
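As a concrete sketch of Phase 4, here is the test-first flow applied to a hypothetical bug. `parse_csv_row`, the empty-input bug, and the test names are all illustrative assumptions, not code from any real project:

```python
# Hypothetical sketch of Phase 4: regression test first, then one fix.

def parse_csv_row(line):
    # Fix at the root cause: empty input was never handled, so the old
    # code crashed with an IndexError at a downstream call site.
    if not line:
        return []
    return line.split(",")

def test_empty_line_regression():
    # Step 1: simplest possible reproduction. This failed before the
    # fix above and now guards against regression.
    assert parse_csv_row("") == []

def test_normal_line_unchanged():
    # Step 3: the rest of the suite still passes, so no regressions.
    assert parse_csv_row("a,b,c") == ["a", "b", "c"]
```

One fix, one regression test, nothing bundled in "while I'm here".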
--- -## Phase 4: Verification +## Red Flags — STOP and Follow Process -**Confirm it's actually fixed:** +If you catch yourself thinking: +- "Quick fix for now, investigate later" +- "Just try changing X and see if it works" +- "Add multiple changes, run tests" +- "Skip the test, I'll manually verify" +- "It's probably X, let me fix that" +- "I don't fully understand but this might work" +- "Pattern says X but I'll adapt it differently" +- "Here are the main problems: [lists fixes without investigation]" +- Proposing solutions before tracing data flow +- **"One more fix attempt" (when already tried 2+)** +- **Each fix reveals a new problem in a different place** -### 1. Reproduce Original Issue +**ALL of these mean: STOP. Return to Phase 1.** -- Follow original reproduction steps -- Verify issue is resolved -- Test edge cases +**If 3+ fixes failed:** Question the architecture (Phase 4 step 5). -### 2. Regression Testing +## Common Rationalizations -```bash -# Full test suite -pytest +| Excuse | Reality | +|--------|---------| +| "Issue is simple, don't need process" | Simple issues have root causes too. Process is fast for simple bugs. | +| "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. | +| "Just try this first, then investigate" | First fix sets the pattern. Do it right from the start. | +| "I'll write test after confirming fix works" | Untested fixes don't stick. Test first proves it. | +| "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. | +| "Reference too long, I'll adapt the pattern" | Partial understanding guarantees bugs. Read it completely. | +| "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. | +| "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Question the pattern, don't fix again. 
| -# Integration tests -pytest tests/integration/ +## Quick Reference -# Check related areas -pytest -k "related_feature" -``` +| Phase | Key Activities | Success Criteria | +|-------|---------------|------------------| +| **1. Root Cause** | Read errors, reproduce, check changes, gather evidence, trace data flow | Understand WHAT and WHY | +| **2. Pattern** | Find working examples, compare, identify differences | Know what's different | +| **3. Hypothesis** | Form theory, test minimally, one variable at a time | Confirmed or new hypothesis | +| **4. Implementation** | Create regression test, fix root cause, verify | Bug resolved, all tests pass | -### 3. Monitor After Deploy +## Hermes Agent Integration -- Watch logs for related errors -- Check metrics -- Verify fix in production +### Investigation Tools -### Phase 4 Completion Checklist +Use these Hermes tools during Phase 1: -- [ ] Original issue cannot be reproduced -- [ ] All tests pass -- [ ] No new warnings/errors -- [ ] Fix documented (commit message, comments) +- **`search_files`** — Find error strings, trace function calls, locate patterns +- **`read_file`** — Read source code with line numbers for precise analysis +- **`terminal`** — Run tests, check git history, reproduce bugs +- **`web_search`/`web_extract`** — Research error messages, library docs ---- +### With delegate_task -## Debugging Techniques - -### Root Cause Tracing - -Ask "why" 5 times: -1. Why did it fail? → Null pointer -2. Why was it null? → Function returned null -3. Why did function return null? → Missing validation -4. Why was validation missing? → Assumed input always valid -5. Why was that assumption wrong? → API changed - -**Root cause:** API change not accounted for - -### Defense in Depth - -Don't fix just the symptom: - -**Bad:** Add null check at crash site -**Good:** -1. Add validation at API boundary -2. Add null check at crash site -3. Add test for both -4. 
Document API behavior - -### Condition-Based Waiting - -For timing/race conditions: +For complex multi-component debugging, dispatch investigation subagents: ```python -# Bad - arbitrary sleep -import time -time.sleep(5) # "Should be enough" +delegate_task( + goal="Investigate why [specific test/behavior] fails", + context=""" + Follow systematic-debugging skill: + 1. Read the error message carefully + 2. Reproduce the issue + 3. Trace the data flow to find root cause + 4. Report findings — do NOT fix yet -# Good - wait for condition -from tenacity import retry, wait_exponential, stop_after_attempt - -@retry(wait=wait_exponential(multiplier=1, min=4, max=10), - stop=stop_after_attempt(5)) -def wait_for_service(): - response = requests.get(health_url) - assert response.status_code == 200 + Error: [paste full error] + File: [path to failing code] + Test command: [exact command] + """, + toolsets=['terminal', 'file'] +) ``` ---- - -## Common Debugging Pitfalls - -### Fix Without Understanding - -**Symptom:** "Just add a try/catch" -**Problem:** Masks the real issue -**Solution:** Complete Phase 1 before any fix - -### Shotgun Debugging - -**Symptom:** Change 5 things at once -**Problem:** Don't know what fixed it -**Solution:** One change at a time, verify each - -### Premature Optimization - -**Symptom:** Rewrite while debugging -**Problem:** Introduces new bugs -**Solution:** Fix first, refactor later - -### Assuming Environment - -**Symptom:** "Works on my machine" -**Problem:** Environment differences -**Solution:** Check environment variables, versions, configs - ---- - -## Language-Specific Tools - -### Python - -```python -# Add debugger -import pdb; pdb.set_trace() - -# Or use ipdb for better experience -import ipdb; ipdb.set_trace() - -# Log state -import logging -logging.debug(f"Variable state: {variable}") - -# Stack trace -import traceback -traceback.print_exc() -``` - -### JavaScript/TypeScript - -```javascript -// Debugger -debugger; - -// Console with 
context -console.log("State:", { var1, var2, var3 }); - -// Stack trace -console.trace("Here"); - -// Error with context -throw new Error(`Failed with input: ${JSON.stringify(input)}`); -``` - -### Go - -```go -// Print state -fmt.Printf("Debug: variable=%+v\n", variable) - -// Stack trace -import "runtime/debug" -debug.PrintStack() - -// Panic with context -if err != nil { - panic(fmt.Sprintf("unexpected error: %v", err)) -} -``` - ---- - -## Integration with Other Skills - ### With test-driven-development -When debugging: -1. Write test that reproduces bug -2. Debug systematically -3. Fix root cause -4. Test passes +When fixing bugs: +1. Write a test that reproduces the bug (RED) +2. Debug systematically to find root cause +3. Fix the root cause (GREEN) +4. The test proves the fix and prevents regression -### With writing-plans +## Real-World Impact -Include debugging tasks in plans: -- "Add diagnostic logging" -- "Create reproduction test" -- "Verify fix resolves issue" - -### With subagent-driven-development - -If subagent gets stuck: -1. Switch to systematic debugging -2. Analyze root cause -3. Provide findings to subagent -4. Resume implementation - ---- - -## Remember - -``` -PHASE 1: Investigate → Understand WHY -PHASE 2: Design → Plan the fix -PHASE 3: Implement → Make the fix -PHASE 4: Verify → Confirm it's fixed -``` +From debugging sessions: +- Systematic approach: 15-30 minutes to fix +- Random fixes approach: 2-3 hours of thrashing +- First-time fix rate: 95% vs 40% +- New bugs introduced: Near zero vs common **No shortcuts. No guessing. 
Systematic always wins.** diff --git a/skills/software-development/test-driven-development/SKILL.md b/skills/software-development/test-driven-development/SKILL.md index e4429331..4be2d532 100644 --- a/skills/software-development/test-driven-development/SKILL.md +++ b/skills/software-development/test-driven-development/SKILL.md @@ -1,8 +1,8 @@ --- name: test-driven-development description: Use when implementing any feature or bugfix, before writing implementation code. Enforces RED-GREEN-REFACTOR cycle with test-first approach. -version: 1.0.0 -author: Hermes Agent (adapted from Superpowers) +version: 1.1.0 +author: Hermes Agent (adapted from obra/superpowers) license: MIT metadata: hermes: @@ -28,7 +28,7 @@ Write the test first. Watch it fail. Write minimal code to pass. - Refactoring - Behavior changes -**Exceptions (ask your human partner):** +**Exceptions (ask the user first):** - Throwaway prototypes - Generated code - Configuration files @@ -53,11 +53,11 @@ Implement fresh from tests. Period. ## Red-Green-Refactor Cycle -### RED - Write Failing Test +### RED — Write Failing Test Write one minimal test showing what should happen. -**Good Example:** +**Good test:** ```python def test_retries_failed_operations_3_times(): attempts = 0 @@ -67,43 +67,51 @@ def test_retries_failed_operations_3_times(): if attempts < 3: raise Exception('fail') return 'success' - + result = retry_operation(operation) - + assert result == 'success' assert attempts == 3 ``` -- Clear name, tests real behavior, one thing +Clear name, tests real behavior, one thing. -**Bad Example:** +**Bad test:** ```python def test_retry_works(): mock = MagicMock() mock.side_effect = [Exception(), Exception(), 'success'] - result = retry_operation(mock) - assert result == 'success' # What about retry count? Timing? ``` -- Vague name, mocks behavior not reality, unclear what it tests +Vague name, tests mock not real code. 
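For reference, a minimal `retry_operation` that the good test above could drive out in the GREEN step. The signature and the `max_attempts` parameter are assumptions, since the skill shows only the test:

```python
def retry_operation(operation, max_attempts=3):
    # Minimal GREEN-phase code: just enough to satisfy the test.
    # No backoff, no logging, until a test demands them.
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            # Out of attempts: re-raise the last failure.
            if attempt == max_attempts:
                raise
```

Note what is deliberately absent: nothing beyond what the test requires exists yet.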
-### Verify RED +**Requirements:** +- One behavior per test +- Clear descriptive name ("and" in name? Split it) +- Real code, not mocks (unless truly unavoidable) +- Name describes behavior, not implementation -Run the test. It MUST fail. +### Verify RED — Watch It Fail -**If it passes:** -- Test is wrong (not testing what you think) -- Code already exists (delete it, start over) -- Wrong test file running +**MANDATORY. Never skip.** -**What to check:** -- Error message makes sense -- Fails for expected reason -- Stack trace points to right place +```bash +# Use terminal tool to run the specific test +pytest tests/test_feature.py::test_specific_behavior -v +``` -### GREEN - Minimal Code +Confirm: +- Test fails (not errors from typos) +- Failure message is expected +- Fails because the feature is missing -Write just enough code to pass. Nothing more. +**Test passes immediately?** You're testing existing behavior. Fix the test. + +**Test errors?** Fix the error, re-run until it fails correctly. + +### GREEN — Minimal Code + +Write the simplest code to pass the test. Nothing more. **Good:** ```python @@ -119,266 +127,216 @@ def add(a, b): return result ``` -Cheating is OK in GREEN: +Don't add features, refactor other code, or "improve" beyond the test. + +**Cheating is OK in GREEN:** - Hardcode return values - Copy-paste - Duplicate code - Skip edge cases -**We'll fix it in refactor.** +We'll fix it in REFACTOR. -### Verify GREEN +### Verify GREEN — Watch It Pass -Run tests. All must pass. +**MANDATORY.** -**If fails:** -- Fix minimal code -- Don't expand scope -- Stay in GREEN +```bash +# Run the specific test +pytest tests/test_feature.py::test_specific_behavior -v -### REFACTOR - Clean Up +# Then run ALL tests to check for regressions +pytest tests/ -q +``` -Now improve the code while keeping tests green. 
+Confirm: +- Test passes +- Other tests still pass +- Output pristine (no errors, warnings) -**Safe refactorings:** -- Rename variables/functions -- Extract helper functions +**Test fails?** Fix the code, not the test. + +**Other tests fail?** Fix regressions now. + +### REFACTOR — Clean Up + +After green only: - Remove duplication +- Improve names +- Extract helpers - Simplify expressions -- Improve readability -**Golden rule:** Tests stay green throughout. +Keep tests green throughout. Don't add behavior. -**If tests fail during refactor:** -- Undo immediately -- Smaller refactoring steps -- Check you didn't change behavior +**If tests fail during refactor:** Undo immediately. Take smaller steps. -## Implementation Workflow +### Repeat -### 1. Create Test File +Next failing test for next behavior. One cycle at a time. -```bash -# Python -touch tests/test_feature.py +## Why Order Matters -# JavaScript -touch tests/feature.test.js +**"I'll write tests after to verify it works"** -# Rust -touch tests/feature_tests.rs -``` +Tests written after code pass immediately. Passing immediately proves nothing: +- Might test the wrong thing +- Might test implementation, not behavior +- Might miss edge cases you forgot +- You never saw it catch the bug -### 2. Write First Failing Test +Test-first forces you to see the test fail, proving it actually tests something. + +**"I already manually tested all the edge cases"** + +Manual testing is ad-hoc. You think you tested everything but: +- No record of what you tested +- Can't re-run when code changes +- Easy to forget cases under pressure +- "It worked when I tried it" ≠ comprehensive + +Automated tests are systematic. They run the same way every time. + +**"Deleting X hours of work is wasteful"** + +Sunk cost fallacy. The time is already gone. Your choice now: +- Delete and rewrite with TDD (high confidence) +- Keep it and add tests after (low confidence, likely bugs) + +The "waste" is keeping code you can't trust. 
+ +**"TDD is dogmatic, being pragmatic means adapting"** + +TDD IS pragmatic: +- Finds bugs before commit (faster than debugging after) +- Prevents regressions (tests catch breaks immediately) +- Documents behavior (tests show how to use code) +- Enables refactoring (change freely, tests catch breaks) + +"Pragmatic" shortcuts = debugging in production = slower. + +**"Tests after achieve the same goals — it's spirit not ritual"** + +No. Tests-after answer "What does this do?" Tests-first answer "What should this do?" + +Tests-after are biased by your implementation. You test what you built, not what's required. Tests-first force edge case discovery before implementing. + +## Common Rationalizations + +| Excuse | Reality | +|--------|---------| +| "Too simple to test" | Simple code breaks. Test takes 30 seconds. | +| "I'll test after" | Tests passing immediately prove nothing. | +| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" | +| "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. | +| "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. | +| "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. | +| "Need to explore first" | Fine. Throw away exploration, start with TDD. | +| "Test hard = design unclear" | Listen to the test. Hard to test = hard to use. | +| "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. | +| "Manual test faster" | Manual doesn't prove edge cases. You'll re-test every change. | +| "Existing code has no tests" | You're improving it. Add tests for the code you touch. 
| + +## Red Flags — STOP and Start Over + +If you catch yourself doing any of these, delete the code and restart with TDD: + +- Code before test +- Test after implementation +- Test passes immediately on first run +- Can't explain why test failed +- Tests added "later" +- Rationalizing "just this once" +- "I already manually tested it" +- "Tests after achieve the same purpose" +- "Keep as reference" or "adapt existing code" +- "Already spent X hours, deleting is wasteful" +- "TDD is dogmatic, I'm being pragmatic" +- "This is different because..." + +**All of these mean: Delete code. Start over with TDD.** + +## Verification Checklist + +Before marking work complete: + +- [ ] Every new function/method has a test +- [ ] Watched each test fail before implementing +- [ ] Each test failed for expected reason (feature missing, not typo) +- [ ] Wrote minimal code to pass each test +- [ ] All tests pass +- [ ] Output pristine (no errors, warnings) +- [ ] Tests use real code (mocks only if unavoidable) +- [ ] Edge cases and errors covered + +Can't check all boxes? You skipped TDD. Start over. + +## When Stuck + +| Problem | Solution | +|---------|----------| +| Don't know how to test | Write the wished-for API. Write the assertion first. Ask the user. | +| Test too complicated | Design too complicated. Simplify the interface. | +| Must mock everything | Code too coupled. Use dependency injection. | +| Test setup huge | Extract helpers. Still complex? Simplify the design. | + +## Hermes Agent Integration + +### Running Tests + +Use the `terminal` tool to run tests at each step: ```python -# tests/test_calculator.py -def test_adds_two_numbers(): - calc = Calculator() - result = calc.add(2, 3) - assert result == 5 +# RED — verify failure +terminal("pytest tests/test_feature.py::test_name -v") + +# GREEN — verify pass +terminal("pytest tests/test_feature.py::test_name -v") + +# Full suite — verify no regressions +terminal("pytest tests/ -q") ``` -### 3. 
Run and Verify Failure +### With delegate_task -```bash -pytest tests/test_calculator.py -v -# Expected: FAIL - Calculator not defined -``` - -### 4. Write Minimal Implementation +When dispatching subagents for implementation, enforce TDD in the goal: ```python -# src/calculator.py -class Calculator: - def add(self, a, b): - return a + b # Minimal! +delegate_task( + goal="Implement [feature] using strict TDD", + context=""" + Follow test-driven-development skill: + 1. Write failing test FIRST + 2. Run test to verify it fails + 3. Write minimal code to pass + 4. Run test to verify it passes + 5. Refactor if needed + 6. Commit + + Project test command: pytest tests/ -q + Project structure: [describe relevant files] + """, + toolsets=['terminal', 'file'] +) ``` -### 5. Run and Verify Pass - -```bash -pytest tests/test_calculator.py -v -# Expected: PASS -``` - -### 6. Commit - -```bash -git add tests/test_calculator.py src/calculator.py -git commit -m "feat: add calculator with add method" -``` - -### 7. Next Test - -```python -def test_adds_negative_numbers(): - calc = Calculator() - result = calc.add(-2, -3) - assert result == -5 -``` - -Repeat cycle. - -## Testing Anti-Patterns - -### Mocking What You Don't Own - -**Bad:** Mock database, HTTP client, file system -**Good:** Abstract behind interface, test interface - -### Testing Implementation Details - -**Bad:** Test that function was called -**Good:** Test the result/behavior - -### Happy Path Only - -**Bad:** Only test expected inputs -**Good:** Test edge cases, errors, boundaries - -### Brittle Tests - -**Bad:** Test breaks when refactoring -**Good:** Tests verify behavior, not structure - -## Common Pitfalls - -### "I'll Write Tests After" - -No, you won't. Write them first. - -### "This is Too Simple to Test" - -Simple bugs cause complex problems. Test everything. - -### "I Need to See It Work First" - -Temporary code becomes permanent. Test first. - -### "Tests Take Too Long" - -Untested code takes longer. 
Invest in tests. - -## Language-Specific Commands - -### Python (pytest) - -```bash -# Run all tests -pytest - -# Run specific test -pytest tests/test_feature.py::test_name -v - -# Run with coverage -pytest --cov=src --cov-report=term-missing - -# Watch mode -pytest-watch -``` - -### JavaScript (Jest) - -```bash -# Run all tests -npm test - -# Run specific test -npm test -- test_name - -# Watch mode -npm test -- --watch - -# Coverage -npm test -- --coverage -``` - -### TypeScript (Jest with ts-jest) - -```bash -# Run tests -npx jest - -# Run specific file -npx jest tests/feature.test.ts -``` - -### Go - -```bash -# Run all tests -go test ./... - -# Run specific test -go test -run TestName - -# Verbose -go test -v - -# Coverage -go test -cover -``` - -### Rust - -```bash -# Run tests -cargo test - -# Run specific test -cargo test test_name - -# Show output -cargo test -- --nocapture -``` - -## Integration with Other Skills - -### With writing-plans - -Every plan task should specify: -- What test to write -- Expected test failure -- Minimal implementation -- Refactoring opportunities - ### With systematic-debugging -When fixing bugs: -1. Write test that reproduces bug -2. Verify test fails (RED) -3. Fix bug (GREEN) -4. Refactor if needed +Bug found? Write failing test reproducing it. Follow TDD cycle. The test proves the fix and prevents regression. -### With subagent-driven-development +Never fix bugs without a test. -Subagent implements one test at a time: -1. Write failing test -2. Minimal code to pass -3. Commit -4. 
Next test +## Testing Anti-Patterns -## Success Indicators +- **Testing mock behavior instead of real behavior** — mocks should verify interactions, not replace the system under test +- **Testing implementation details** — test behavior/results, not internal method calls +- **Happy path only** — always test edge cases, errors, and boundaries +- **Brittle tests** — tests should verify behavior, not structure; refactoring shouldn't break them -**You're doing TDD right when:** -- Tests fail before code exists -- You write <10 lines between test runs -- Refactoring feels safe -- Bugs are caught immediately -- Code is simpler than expected +## Final Rule -**Red flags:** -- Writing 50+ lines without running tests -- Tests always pass -- Fear of refactoring -- "I'll test later" +``` +Production code → test exists and failed first +Otherwise → not TDD +``` -## Remember - -1. **RED** - Write failing test -2. **GREEN** - Minimal code to pass -3. **REFACTOR** - Clean up safely -4. **REPEAT** - Next behavior - -**If you didn't see it fail, it doesn't test what you think.** +No exceptions without the user's explicit permission. diff --git a/skills/software-development/writing-plans/SKILL.md b/skills/software-development/writing-plans/SKILL.md index c232af7a..92a8d017 100644 --- a/skills/software-development/writing-plans/SKILL.md +++ b/skills/software-development/writing-plans/SKILL.md @@ -1,8 +1,8 @@ --- name: writing-plans description: Use when you have a spec or requirements for a multi-step task. Creates comprehensive implementation plans with bite-sized tasks, exact file paths, and complete code examples. -version: 1.0.0 -author: Hermes Agent (adapted from Superpowers) +version: 1.1.0 +author: Hermes Agent (adapted from obra/superpowers) license: MIT metadata: hermes: @@ -14,112 +14,34 @@ metadata: ## Overview -Transform specifications into actionable implementation plans. Write comprehensive plans that any developer can follow - even with zero context about your codebase. 
+Write comprehensive implementation plans assuming the implementer has zero context for the codebase and questionable taste. Document everything they need: which files to touch, complete code, testing commands, docs to check, how to verify. Give them bite-sized tasks. DRY. YAGNI. TDD. Frequent commits. -**Core principle:** Document everything: exact file paths, complete code, test commands, verification steps. +Assume the implementer is a skilled developer but knows almost nothing about the toolset or problem domain. Assume they don't know good test design very well. -**Assume the implementer:** -- Is a skilled developer -- Knows almost nothing about your codebase -- Has questionable taste in code style -- Needs explicit guidance +**Core principle:** A good plan makes implementation obvious. If someone has to guess, the plan is incomplete. ## When to Use -**Always use this skill:** -- Before implementing multi-step features -- After design approval (from brainstorming) -- When breaking down complex requirements -- Before delegating to subagents +**Always use before:** +- Implementing multi-step features +- Breaking down complex requirements +- Delegating to subagents via subagent-driven-development **Don't skip when:** - Feature seems simple (assumptions cause bugs) -- You plan to implement yourself (future you needs guidance) +- You plan to implement it yourself (future you needs guidance) - Working alone (documentation matters) -## Plan Document Structure +## Bite-Sized Task Granularity -### Header (Required) +**Each task = 2-5 minutes of focused work.** -Every plan MUST start with: - -```markdown -# [Feature Name] Implementation Plan - -> **For Hermes:** Use subagent-driven-development or executing-plans skill to implement this plan. 
- -**Goal:** One sentence describing what this builds - -**Architecture:** 2-3 sentences about approach - -**Tech Stack:** Key technologies/libraries - ---- -``` - -### Task Structure - -Each task follows this format: - -```markdown -### Task N: [Descriptive Name] - -**Objective:** What this task accomplishes (one sentence) - -**Files:** -- Create: `exact/path/to/new_file.py` -- Modify: `exact/path/to/existing.py:45-67` (line numbers if known) -- Test: `tests/path/to/test_file.py` - -**Implementation Steps:** - -**Step 1: [Action description]** -```python -# Complete code to write -class NewClass: - def __init__(self): - self.value = None - - def process(self, input): - return input.upper() -``` - -**Step 2: [Action description]** -```bash -# Command to run -pytest tests/test_new.py -v -``` -Expected: Tests pass with 3 green dots - -**Step 3: [Action description]** -```python -# More code if needed -``` - -**Verification:** -- [ ] Test passes -- [ ] Function returns expected output -- [ ] No syntax errors - -**Commit:** -```bash -git add src/new_file.py tests/test_new.py -git commit -m "feat: add new feature component" -``` -``` - -## Task Granularity - -**Each task = 2-5 minutes of work** - -**Break down:** -- "Write failing test" - one task -- "Run test to verify it fails" - one task -- "Implement minimal code" - one task -- "Run test to verify pass" - one task -- "Commit" - one task - -**Examples:** +Every step is one action: +- "Write the failing test" — step +- "Run it to make sure it fails" — step +- "Implement the minimal code to make the test pass" — step +- "Run the tests and make sure they pass" — step +- "Commit" — step **Too big:** ```markdown @@ -139,6 +61,146 @@ git commit -m "feat: add new feature component" [15 lines, 1 file] ``` +## Plan Document Structure + +### Header (Required) + +Every plan MUST start with: + +```markdown +# [Feature Name] Implementation Plan + +> **For Hermes:** Use subagent-driven-development skill to implement this plan 
task-by-task. + +**Goal:** [One sentence describing what this builds] + +**Architecture:** [2-3 sentences about approach] + +**Tech Stack:** [Key technologies/libraries] + +--- +``` + +### Task Structure + +Each task follows this format: + +````markdown +### Task N: [Descriptive Name] + +**Objective:** What this task accomplishes (one sentence) + +**Files:** +- Create: `exact/path/to/new_file.py` +- Modify: `exact/path/to/existing.py:45-67` (line numbers if known) +- Test: `tests/path/to/test_file.py` + +**Step 1: Write failing test** + +```python +def test_specific_behavior(): + result = function(input) + assert result == expected +``` + +**Step 2: Run test to verify failure** + +Run: `pytest tests/path/test.py::test_specific_behavior -v` +Expected: FAIL — "function not defined" + +**Step 3: Write minimal implementation** + +```python +def function(input): + return expected +``` + +**Step 4: Run test to verify pass** + +Run: `pytest tests/path/test.py::test_specific_behavior -v` +Expected: PASS + +**Step 5: Commit** + +```bash +git add tests/path/test.py src/path/file.py +git commit -m "feat: add specific feature" +``` +```` + +## Writing Process + +### Step 1: Understand Requirements + +Read and understand: +- Feature requirements +- Design documents or user description +- Acceptance criteria +- Constraints + +### Step 2: Explore the Codebase + +Use Hermes tools to understand the project: + +```python +# Understand project structure +search_files("*.py", target="files", path="src/") + +# Look at similar features +search_files("similar_pattern", path="src/", file_glob="*.py") + +# Check existing tests +search_files("*.py", target="files", path="tests/") + +# Read key files +read_file("src/app.py") +``` + +### Step 3: Design Approach + +Decide: +- Architecture pattern +- File organization +- Dependencies needed +- Testing strategy + +### Step 4: Write Tasks + +Create tasks in order: +1. Setup/infrastructure +2. Core functionality (TDD for each) +3. Edge cases +4. 
Integration +5. Cleanup/documentation + +### Step 5: Add Complete Details + +For each task, include: +- **Exact file paths** (not "the config file" but `src/config/settings.py`) +- **Complete code examples** (not "add validation" but the actual code) +- **Exact commands** with expected output +- **Verification steps** that prove the task works + +### Step 6: Review the Plan + +Check: +- [ ] Tasks are sequential and logical +- [ ] Each task is bite-sized (2-5 min) +- [ ] File paths are exact +- [ ] Code examples are complete (copy-pasteable) +- [ ] Commands are exact with expected output +- [ ] No missing context +- [ ] DRY, YAGNI, TDD principles applied + +### Step 7: Save the Plan + +```bash +mkdir -p docs/plans +# Save plan to docs/plans/YYYY-MM-DD-feature-name.md +git add docs/plans/ +git commit -m "docs: add implementation plan for [feature]" +``` + ## Principles ### DRY (Don't Repeat Yourself) @@ -146,35 +208,21 @@ git commit -m "feat: add new feature component" **Bad:** Copy-paste validation in 3 places **Good:** Extract validation function, use everywhere -```python -# Good - DRY -def validate_email(email): - if not re.match(r'^[^@]+@[^@]+$', email): - raise ValueError("Invalid email") - return email - -# Use everywhere -validate_email(user_input) -validate_email(config_email) -validate_email(imported_data) -``` - ### YAGNI (You Aren't Gonna Need It) **Bad:** Add "flexibility" for future requirements **Good:** Implement only what's needed now ```python -# Bad - YAGNI violation +# Bad — YAGNI violation class User: def __init__(self, name, email): self.name = name self.email = email self.preferences = {} # Not needed yet! self.metadata = {} # Not needed yet! - self.settings = {} # Not needed yet! 
-# Good - YAGNI +# Good — YAGNI class User: def __init__(self, name, email): self.name = name @@ -183,7 +231,7 @@ class User: ### TDD (Test-Driven Development) -Every task that produces code should include: +Every task that produces code should include the full TDD cycle: 1. Write failing test 2. Run to verify failure 3. Write minimal code @@ -199,255 +247,50 @@ git add [files] git commit -m "type: description" ``` -## Writing Process - -### Step 1: Understand Requirements - -Read and understand: -- Feature requirements -- Design documents -- Acceptance criteria -- Constraints - -### Step 2: Explore Codebase - -```bash -# Understand project structure -find src -type f -name "*.py" | head -20 - -# Look at similar features -grep -r "similar_pattern" src/ - -# Check existing tests -ls tests/ -``` - -### Step 3: Design Approach - -Decide: -- Architecture pattern -- File organization -- Dependencies needed -- Testing strategy - -### Step 4: Write Tasks - -Create tasks in order: -1. Setup/infrastructure -2. Core functionality -3. Edge cases -4. Integration -5. Cleanup - -### Step 5: Add Details - -For each task, add: -- Exact file paths -- Complete code examples -- Exact commands -- Expected outputs -- Verification steps - -### Step 6: Review Plan - -Check: -- [ ] Tasks are sequential and logical -- [ ] Each task is bite-sized (2-5 min) -- [ ] File paths are exact -- [ ] Code examples are complete -- [ ] Commands are exact with expected output -- [ ] No missing context - -### Step 7: Save Plan - -```bash -# Create plans directory -mkdir -p docs/plans - -# Save plan -cat > docs/plans/YYYY-MM-DD-feature-name.md << 'EOF' -[plan content] -EOF - -# Commit plan -git add docs/plans/YYYY-MM-DD-feature-name.md -git commit -m "docs: add implementation plan for feature" -``` - -## Example Plan - -```markdown -# User Authentication Implementation Plan - -> **For Hermes:** Use subagent-driven-development to implement this plan. 
- -**Goal:** Add JWT-based user authentication to the Flask API - -**Architecture:** Use PyJWT for tokens, bcrypt for hashing. Middleware validates tokens on protected routes. - -**Tech Stack:** Python, Flask, PyJWT, bcrypt - ---- - -### Task 1: Create User model - -**Objective:** Define User model with email and hashed password - -**Files:** -- Create: `src/models/user.py` -- Test: `tests/models/test_user.py` - -**Step 1: Write failing test** -```python -def test_user_creation(): - user = User(email="test@example.com", password="secret123") - assert user.email == "test@example.com" - assert user.password_hash is not None - assert user.password_hash != "secret123" # Should be hashed -``` - -**Step 2: Run to verify failure** -```bash -pytest tests/models/test_user.py -v -``` -Expected: FAIL - User class not defined - -**Step 3: Implement User model** -```python -import bcrypt - -class User: - def __init__(self, email, password): - self.email = email - self.password_hash = bcrypt.hashpw( - password.encode(), - bcrypt.gensalt() - ) -``` - -**Step 4: Run to verify pass** -```bash -pytest tests/models/test_user.py -v -``` -Expected: PASS - -**Commit:** -```bash -git add src/models/user.py tests/models/test_user.py -git commit -m "feat: add User model with password hashing" -``` - -### Task 2: Create login endpoint - -**Objective:** Add POST /login endpoint that returns JWT - -**Files:** -- Modify: `src/app.py` -- Test: `tests/test_login.py` - -[Continue...] 
-``` - ## Common Mistakes ### Vague Tasks -**Bad:** -```markdown -### Task 1: Add authentication -``` - -**Good:** -```markdown -### Task 1: Create User model with email and password_hash fields -``` +**Bad:** "Add authentication" +**Good:** "Create User model with email and password_hash fields" ### Incomplete Code -**Bad:** -```markdown -Step 1: Add validation function -``` - -**Good:** -```markdown -Step 1: Add validation function -```python -def validate_email(email): - """Validate email format.""" - import re - pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$' - if not re.match(pattern, email): - raise ValueError(f"Invalid email: {email}") - return email -``` +**Bad:** "Step 1: Add validation function" +**Good:** "Step 1: Add validation function" followed by the complete function code ### Missing Verification -**Bad:** -```markdown -Step 3: Test it works -``` +**Bad:** "Step 3: Test it works" +**Good:** "Step 3: Run `pytest tests/test_auth.py -v`, expected: 3 passed" -**Good:** -```markdown -Step 3: Verify authentication -```bash -curl -X POST http://localhost:5000/login \ - -H "Content-Type: application/json" \ - -d '{"email":"test@example.com","password":"secret123"}' -``` -Expected: Returns 200 with JWT token in response -``` +### Missing File Paths -## Integration with Other Skills +**Bad:** "Create the model file" +**Good:** "Create: `src/models/user.py`" -### With brainstorming +## Execution Handoff -**Sequence:** -1. brainstorming → Explore and refine design -2. writing-plans → Create implementation plan -3. subagent-driven-development → Execute plan +After saving the plan, offer the execution approach: -### With subagent-driven-development +**"Plan complete and saved. Ready to execute using subagent-driven-development — I'll dispatch a fresh subagent per task with two-stage review (spec compliance then code quality). 
Shall I proceed?"** -Plans feed into subagent-driven-development: -- Subagents implement each task -- Two-stage review ensures quality -- Plan provides context and requirements - -### With test-driven-development - -Every code-producing task should follow TDD: -1. Write failing test -2. Verify failure -3. Write minimal code -4. Verify pass - -## Success Checklist - -Before considering a plan complete: - -- [ ] Header with goal, architecture, tech stack -- [ ] All tasks are bite-sized (2-5 min each) -- [ ] Exact file paths for every file -- [ ] Complete code examples (not partial) -- [ ] Exact commands with expected output -- [ ] Verification steps for each task -- [ ] Commit commands included -- [ ] DRY, YAGNI, TDD principles applied -- [ ] Tasks are sequential and logical -- [ ] Plan saved to docs/plans/ +When executing, use the `subagent-driven-development` skill: +- Fresh `delegate_task` per task with full context +- Spec compliance review after each task +- Code quality review after spec passes +- Proceed only when both reviews approve ## Remember ``` -Bite-sized tasks +Bite-sized tasks (2-5 min each) Exact file paths -Complete code -Exact commands +Complete code (copy-pasteable) +Exact commands with expected output Verification steps DRY, YAGNI, TDD +Frequent commits ``` **A good plan makes implementation obvious.**
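
As a sketch, the execution handoff above can be expressed as a simple loop. Note that `delegate_task`, `review_spec_compliance`, and `review_code_quality` here are hypothetical stubs standing in for the real Hermes tooling, whose actual signatures may differ:

```python
# Sketch of the subagent-driven execution loop from "Execution Handoff".
# delegate_task, review_spec_compliance, and review_code_quality are
# hypothetical stubs -- the real Hermes tools may behave differently.

def delegate_task(task, context):
    """Stand-in for dispatching a fresh subagent with full plan context."""
    return {"task": task["name"], "status": "done"}

def review_spec_compliance(result):
    return True  # stub: assume the work matches the task spec

def review_code_quality(result):
    return True  # stub: assume the code passes quality review

def execute_plan(tasks, context):
    completed = []
    for task in tasks:
        result = delegate_task(task, context)
        # Two-stage review: spec compliance first, then code quality.
        # Proceed to the next task only when both reviews approve.
        while not (review_spec_compliance(result) and review_code_quality(result)):
            result = delegate_task(task, context)
        completed.append(result)
    return completed

plan = [
    {"name": "Task 1: Create User model"},
    {"name": "Task 2: Create login endpoint"},
]
print([r["task"] for r in execute_plan(plan, context="full plan text")])
```

The point of the loop is the gate: a task is only marked complete after both review stages pass, which is what keeps plan drift from compounding across tasks.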