refactor: enhance software-development skills with Hermes integration

Improvements to all 5 skills adapted from obra/superpowers:

- Restored anti-rationalization tables and red flags from originals
  (key behavioral guardrails that prevent LLMs from taking shortcuts)
- Restored 'Rule of Three' for debugging (after 3+ failed fixes, question
  the architecture instead of continuing to patch)
- Restored Pattern Analysis and Hypothesis Testing phases in debugging
- Restored 'Why Order Matters' rebuttals and verification checklist in TDD
- Added proper Hermes delegate_task integration with real parameter examples
  and toolset specifications throughout
- Added Hermes tool usage (search_files, read_file, terminal) for
  investigation and verification steps
- Removed references to non-existent skills (brainstorming,
  finishing-a-development-branch, executing-plans, using-git-worktrees)
- Removed generic language-specific sections (Go, Rust, Jest) that
  added bulk without agent value
- Tightened prose — cut ~430 lines while adding more actionable content
- Added execution handoff section to writing-plans
- Consistent cross-references between the 5 skills
Author: teknium1
Date: 2026-03-03 04:08:56 -08:00
Parent: 0e1723ef74
Commit: de0af4df66

5 changed files with 879 additions and 1312 deletions
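The `delegate_task` integration mentioned in the commit message follows one call shape throughout the diffs below: a goal, a multi-line context block, and an explicit toolset list. A minimal runnable sketch with the tool stubbed out; the stub, its validation, and the toolset names are illustrative assumptions, not the real Hermes API:

```python
# Illustrative stub only: the real Hermes `delegate_task` dispatches a
# subagent. This stand-in merely validates the argument shape the skills use.
def delegate_task(goal: str, context: str, toolsets: list) -> dict:
    if not goal or not context:
        raise ValueError("goal and context are required")
    known = {"terminal", "file", "web"}  # assumed toolset names
    unknown = set(toolsets) - known
    if unknown:
        raise ValueError(f"unknown toolsets: {unknown}")
    return {"goal": goal, "toolsets": toolsets, "status": "dispatched"}

result = delegate_task(
    goal="Review implementation for correctness and quality",
    context="""
WHAT WAS IMPLEMENTED:
[Brief description of the feature/fix]
""",
    toolsets=["file"],
)
print(result["status"])  # dispatched
```

The point of the shape is that context is self-contained: the subagent should never need to go read a plan file to understand its task.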


@@ -1,8 +1,8 @@
 ---
 name: requesting-code-review
 description: Use when completing tasks, implementing major features, or before merging. Validates work meets requirements through systematic review process.
-version: 1.0.0
+version: 1.1.0
-author: Hermes Agent (adapted from Superpowers)
+author: Hermes Agent (adapted from obra/superpowers)
 license: MIT
 metadata:
 hermes:
@@ -14,108 +14,102 @@ metadata:
 ## Overview
-Systematic code review catches issues before they cascade. Review early, review often.
+Dispatch a reviewer subagent to catch issues before they cascade. Review early, review often.
 **Core principle:** Fresh perspective finds issues you'll miss.
 ## When to Request Review
-**Mandatory Reviews:**
+**Mandatory:**
 - After each task in subagent-driven development
-- After completing major features
+- After completing a major feature
 - Before merge to main
 - After bug fixes
-**Optional but Valuable:**
+**Optional but valuable:**
 - When stuck (fresh perspective)
 - Before refactoring (baseline check)
 - After complex logic implementation
 - When touching critical code (auth, payments, data)
-**Don't skip because:**
+**Never skip because:**
 - "It's simple" (simple bugs compound)
 - "I'm in a hurry" (reviews save time)
 - "I tested it" (you have blind spots)
 ## Review Process
-### Step 1: Prepare Context
+### Step 1: Self-Review First
-Gather:
+Before dispatching a reviewer, check yourself:
-- What was implemented
-- Original requirements/plan
-- Files changed
-- Test results
-```bash
-# Get changed files
-git diff --name-only HEAD~1
-# Get diff summary
-git diff --stat HEAD~1
-# Get commit messages
-git log --oneline HEAD~5
-```
-### Step 2: Self-Review First
-Before requesting external review:
-**Checklist:**
 - [ ] Code follows project conventions
 - [ ] All tests pass
-- [ ] No debug print statements
+- [ ] No debug print statements left
-- [ ] No hardcoded secrets
+- [ ] No hardcoded secrets or credentials
 - [ ] Error handling in place
-- [ ] Documentation updated
 - [ ] Commit messages are clear
 ```bash
 # Run full test suite
-pytest
+pytest tests/ -q
 # Check for debug code
-grep -r "print(" src/ --include="*.py"
+search_files("print(", path="src/", file_glob="*.py")
-grep -r "debugger" src/ --include="*.js"
+search_files("console.log", path="src/", file_glob="*.js")
 # Check for TODOs
-grep -r "TODO" src/ --include="*.py"
+search_files("TODO|FIXME|HACK", path="src/")
 ```
-### Step 3: Request Review
+### Step 2: Gather Context
-Use `delegate_task` to dispatch a reviewer subagent:
+```bash
+# Changed files
+git diff --name-only HEAD~1
+# Diff summary
+git diff --stat HEAD~1
+# Recent commits
+git log --oneline -5
+```
+### Step 3: Dispatch Reviewer Subagent
+Use `delegate_task` to dispatch a focused reviewer:
 ```python
 delegate_task(
-goal="Review implementation for quality and correctness",
+goal="Review implementation for correctness and quality",
 context="""
-WHAT WAS IMPLEMENTED: [Brief description]
+WHAT WAS IMPLEMENTED:
+[Brief description of the feature/fix]
-ORIGINAL REQUIREMENTS: [From plan or issue]
+ORIGINAL REQUIREMENTS:
+[From plan, issue, or user request]
 FILES CHANGED:
-- src/feature.py (added X function)
+- src/models/user.py (added User class)
-- tests/test_feature.py (added tests)
+- src/auth/login.py (added login endpoint)
+- tests/test_auth.py (added 8 tests)
-COMMIT RANGE: [SHA range or branch]
+REVIEW CHECKLIST:
-CHECK FOR:
+- [ ] Correctness: Does it do what it should?
-- Correctness (does it do what it should?)
+- [ ] Edge cases: Are they handled?
-- Edge cases handled?
+- [ ] Error handling: Is it adequate?
-- Error handling adequate?
+- [ ] Code quality: Clear names, good structure?
-- Code quality and style
+- [ ] Test coverage: Are tests meaningful?
-- Test coverage
+- [ ] Security: Any vulnerabilities?
-- Security issues
+- [ ] Performance: Any obvious issues?
-- Performance concerns
 OUTPUT FORMAT:
 - Summary: [brief assessment]
-- Critical Issues: [must fix]
+- Critical Issues: [must fix — blocks merge]
-- Important Issues: [should fix]
+- Important Issues: [should fix before merge]
 - Minor Issues: [nice to have]
-- Verdict: [APPROVE / REQUEST_CHANGES / NEEDS_WORK]
+- Strengths: [what was done well]
+- Verdict: APPROVE / REQUEST_CHANGES
 """,
 toolsets=['file']
 )
@@ -123,110 +117,72 @@ delegate_task(
 ### Step 4: Act on Feedback
-**Critical Issues (Block merge):**
+**Critical Issues (block merge):**
 - Security vulnerabilities
 - Broken functionality
 - Data loss risk
 - Test failures
+- **Action:** Fix immediately before proceeding
-**Action:** Fix immediately before proceeding
+**Important Issues (should fix):**
-**Important Issues (Should fix):**
 - Missing edge case handling
 - Poor error messages
 - Unclear code
 - Missing tests
+- **Action:** Fix before merge if possible
-**Action:** Fix before merge if possible
+**Minor Issues (nice to have):**
-**Minor Issues (Nice to have):**
 - Style preferences
 - Refactoring suggestions
 - Documentation improvements
+- **Action:** Note for later or quick fix
-**Action:** Note for later or quick fix
+**If reviewer is wrong:**
+- Push back with technical reasoning
+- Show code/tests that prove it works
+- Request clarification
 ## Review Dimensions
 ### Correctness
-**Questions:**
 - Does it implement the requirements?
 - Are there logic errors?
 - Do edge cases work?
 - Are there race conditions?
-**Check:**
-- Read implementation against requirements
-- Trace through edge cases
-- Check boundary conditions
 ### Code Quality
-**Questions:**
 - Is code readable?
-- Are names clear?
+- Are names clear and descriptive?
-- Is it too complex?
+- Is it too complex? (Functions >20 lines = smell)
 - Is there duplication?
-**Check:**
-- Function length (aim <20 lines)
-- Cyclomatic complexity
-- DRY violations
-- Naming clarity
 ### Testing
+- Are there meaningful tests?
-**Questions:**
-- Are there tests?
 - Do they cover edge cases?
-- Are they meaningful?
+- Do they test behavior, not implementation?
-- Do they pass?
+- Do all tests pass?
-**Check:**
-- Test coverage
-- Edge case coverage
-- Test clarity
-- Assertion quality
 ### Security
-**Questions:**
 - Any injection vulnerabilities?
 - Proper input validation?
 - Secrets handled correctly?
 - Access control in place?
-**Check:**
-- Input sanitization
-- Authentication/authorization
-- Secret management
-- SQL/query safety
 ### Performance
-**Questions:**
 - Any N+1 queries?
-- Unnecessary computation?
+- Unnecessary computation in loops?
 - Memory leaks?
-- Scalability concerns?
+- Missing caching opportunities?
-**Check:**
-- Database queries
-- Algorithm complexity
-- Resource usage
-- Caching opportunities
 ## Review Output Format
-Standard review format:
+Standard format for reviewer subagent output:
 ```markdown
 ## Review Summary
 **Assessment:** [Brief overall assessment]
+**Verdict:** APPROVE / REQUEST_CHANGES
-**Verdict:** [APPROVE / REQUEST_CHANGES / NEEDS_WORK]
 ---
@@ -237,8 +193,6 @@ Standard review format:
 - Problem: [Description]
 - Suggestion: [How to fix]
----
 ## Important Issues (Should Fix)
 1. **[Issue title]**
@@ -246,15 +200,11 @@ Standard review format:
 - Problem: [Description]
 - Suggestion: [How to fix]
----
 ## Minor Issues (Optional)
 1. **[Issue title]**
 - Suggestion: [Improvement idea]
----
 ## Strengths
 - [What was done well]
@@ -264,111 +214,41 @@ Standard review format:
 ### With subagent-driven-development
-Review after EACH task:
+Review after EACH task — this is the two-stage review:
-1. Subagent implements task
+1. Spec compliance review (does it match the plan?)
-2. Request code review
+2. Code quality review (is it well-built?)
-3. Fix issues
+3. Fix issues from either review
-4. Proceed to next task
+4. Proceed to next task only when both approve
 ### With test-driven-development
-Review checks:
+Review verifies:
-- Tests written first?
+- Tests were written first (RED-GREEN-REFACTOR followed?)
-- Tests are meaningful?
+- Tests are meaningful (not just asserting True)?
 - Edge cases covered?
 - All tests pass?
 ### With writing-plans
 Review validates:
-- Implementation matches plan?
+- Implementation matches the plan?
 - All tasks completed?
 - Quality standards met?
-## Common Review Patterns
+## Red Flags
-### Pre-Merge Review
+**Never:**
+- Skip review because "it's simple"
-Before merging feature branch:
+- Ignore Critical issues
+- Proceed with unfixed Important issues
-```bash
+- Argue with valid technical feedback without evidence
-# Create review checkpoint
-git log --oneline main..feature-branch
-# Get summary of changes
-git diff --stat main..feature-branch
-# Request review
-delegate_task(
-goal="Pre-merge review of feature branch",
-context="[changes, requirements, test results]"
-)
-# Address feedback
-# Merge when approved
-```
-### Continuous Review
-During subagent-driven development:
-```python
-# After each task
-if task_complete:
-    review_result = request_review(task)
-    if review_result.has_critical_issues():
-        fix_issues(review_result.critical)
-        re_review()
-    proceed_to_next_task()
-```
-### Emergency Review
-When fixing production bugs:
-1. Fix with tests
-2. Self-review
-3. Quick peer review
-4. Deploy
-5. Full review post-deploy
-## Review Best Practices
-### As Review Requester
-**Do:**
-- Provide complete context
-- Highlight areas of concern
-- Ask specific questions
-- Be responsive to feedback
-- Fix issues promptly
-**Don't:**
-- Rush the reviewer
-- Argue without evidence
-- Ignore feedback
-- Take criticism personally
-### As Reviewer (via subagent)
-**Do:**
-- Be specific about issues
-- Suggest improvements
-- Acknowledge what works
-- Prioritize issues
-**Don't:**
-- Nitpick style (unless project requires)
-- Make vague comments
-- Block without explanation
-- Be overly critical
 ## Quality Gates
 **Must pass before merge:**
 - [ ] No critical issues
 - [ ] All tests pass
-- [ ] Review approved
+- [ ] Review verdict: APPROVE
 - [ ] Requirements met
 **Should pass before merge:**
@@ -376,10 +256,6 @@ When fixing production bugs:
 - [ ] Documentation updated
 - [ ] Performance acceptable
-**Nice to have:**
-- [ ] No minor issues
-- [ ] Extra polish
 ## Remember
 ```
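The `search_files` calls that replace the grep commands in this skill amount to a regex scan over a file tree. A minimal Python equivalent, sketched under the assumption that `search_files(pattern, path, file_glob)` has regex-and-glob semantics; the return shape here is illustrative, not the real tool's:

```python
import re
from pathlib import Path

def search_files(pattern: str, path: str, file_glob: str = "*"):
    """Return (file, line_number, line) tuples for lines matching `pattern`.

    Illustrative equivalent of the tool call used in the skill; the real
    Hermes tool may differ in signature and output format.
    """
    rx = re.compile(pattern)
    hits = []
    for p in sorted(Path(path).rglob(file_glob)):
        if not p.is_file():
            continue
        for n, line in enumerate(p.read_text(errors="ignore").splitlines(), 1):
            if rx.search(line):
                hits.append((str(p), n, line.strip()))
    return hits
```

Used as in the self-review step, `search_files("TODO|FIXME|HACK", path="src/")` flags leftover markers before a reviewer ever sees the diff.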


@@ -1,8 +1,8 @@
 ---
 name: subagent-driven-development
 description: Use when executing implementation plans with independent tasks. Dispatches fresh delegate_task per task with two-stage review (spec compliance then code quality).
-version: 1.0.0
+version: 1.1.0
-author: Hermes Agent (adapted from Superpowers)
+author: Hermes Agent (adapted from obra/superpowers)
 license: MIT
 metadata:
 hermes:
@@ -16,33 +16,41 @@ metadata:
 Execute implementation plans by dispatching fresh subagents per task with systematic two-stage review.
-**Core principle:** Fresh subagent per task + two-stage review (spec then quality) = high quality, fast iteration
+**Core principle:** Fresh subagent per task + two-stage review (spec then quality) = high quality, fast iteration.
 ## When to Use
 Use this skill when:
-- You have an implementation plan (from writing-plans skill)
+- You have an implementation plan (from writing-plans skill or user requirements)
 - Tasks are mostly independent
+- You want to stay in the current session
 - Quality and spec compliance are important
+- You want automated review between tasks
-**vs. Manual execution:**
+**vs. manual execution:**
-- Parallel task execution possible
+- Fresh context per task (no confusion from accumulated state)
-- Automated review process
+- Automated review process catches issues early
-- Consistent quality checks
+- Consistent quality checks across all tasks
-- Better for complex multi-step plans
+- Subagents can ask questions before starting work
 ## The Process
 ### 1. Read and Parse Plan
-```markdown
+Read the plan file. Extract ALL tasks with their full text and context upfront. Create a todo list:
-[Read plan file once: docs/plans/feature-plan.md]
-[Extract all tasks with full text and context]
+```python
-[Create todo list with all tasks]
+# Read the plan
+read_file("docs/plans/feature-plan.md")
+# Create todo list with all tasks
+todo([
+{"id": "task-1", "content": "Create User model with email field", "status": "pending"},
+{"id": "task-2", "content": "Add password hashing utility", "status": "pending"},
+{"id": "task-3", "content": "Create login endpoint", "status": "pending"},
+])
 ```
-**Action:** Read plan, extract tasks, create todo list.
+**Key:** Read the plan ONCE. Extract everything. Don't make subagents read the plan file — provide the full task text directly in context.
 ### 2. Per-Task Workflow
@@ -50,124 +58,137 @@ For EACH task in the plan:
 #### Step 1: Dispatch Implementer Subagent
-Use `delegate_task` with:
+Use `delegate_task` with complete context:
-- **goal:** Implement [specific task from plan]
-- **context:** Full task description from plan, project structure, relevant files
-- **toolsets:** ['terminal', 'file', 'web'] (or as needed)
-**Example:**
 ```python
-# Task: Add user authentication middleware
 delegate_task(
-goal="Implement JWT authentication middleware as specified in Task 3 of the plan",
+goal="Implement Task 1: Create User model with email and password_hash fields",
 context="""
-Task from plan:
+TASK FROM PLAN:
-- Create: src/middleware/auth.py
+- Create: src/models/user.py
-- Validate JWT tokens from Authorization header
+- Add User class with email (str) and password_hash (str) fields
-- Return 401 for invalid tokens
+- Use bcrypt for password hashing
-- Attach user info to request object
+- Include __repr__ for debugging
-Project structure:
+FOLLOW TDD:
-- Flask app in src/app.py
+1. Write failing test in tests/models/test_user.py
-- Uses PyJWT library
+2. Run: pytest tests/models/test_user.py -v (verify FAIL)
-- Existing middleware pattern in src/middleware/
+3. Write minimal implementation
+4. Run: pytest tests/models/test_user.py -v (verify PASS)
+5. Run: pytest tests/ -q (verify no regressions)
+6. Commit: git add -A && git commit -m "feat: add User model with password hashing"
+PROJECT CONTEXT:
+- Python 3.11, Flask app in src/app.py
+- Existing models in src/models/
+- Tests use pytest, run from project root
+- bcrypt already in requirements.txt
 """,
 toolsets=['terminal', 'file']
 )
 ```
-#### Step 2: Implementer Subagent Works
+#### Step 2: Dispatch Spec Compliance Reviewer
-The subagent will:
+After the implementer completes, verify against the original spec:
-1. Ask questions if needed (you answer)
-2. Implement the task following TDD
-3. Write tests
-4. Run tests to verify
-5. Self-review
-6. Report completion
-**Your role:** Answer questions, provide context.
-#### Step 3: Spec Compliance Review
-Dispatch reviewer subagent:
 ```python
 delegate_task(
-goal="Review if implementation matches spec from plan",
+goal="Review if implementation matches the spec from the plan",
 context="""
-Original task spec: [copy from plan]
+ORIGINAL TASK SPEC:
-Implementation: [file paths and key code]
+- Create src/models/user.py with User class
+- Fields: email (str), password_hash (str)
-Check:
+- Use bcrypt for password hashing
-- All requirements from spec implemented?
+- Include __repr__
-- File paths match spec?
-- Behavior matches spec?
+CHECK:
-- Nothing extra added?
+- [ ] All requirements from spec implemented?
+- [ ] File paths match spec?
+- [ ] Function signatures match spec?
+- [ ] Behavior matches expected?
+- [ ] Nothing extra added (no scope creep)?
+OUTPUT: PASS or list of specific spec gaps to fix.
 """,
 toolsets=['file']
 )
 ```
-**If spec issues found:**
+**If spec issues found:** Fix gaps, then re-run spec review. Continue only when spec-compliant.
-- Subagent fixes gaps
-- Re-run spec review
-- Continue only when spec-compliant
-#### Step 4: Code Quality Review
+#### Step 3: Dispatch Code Quality Reviewer
-Dispatch quality reviewer:
+After spec compliance passes:
 ```python
 delegate_task(
-goal="Review code quality and best practices",
+goal="Review code quality for Task 1 implementation",
 context="""
-Code to review: [file paths]
+FILES TO REVIEW:
+- src/models/user.py
-Check:
+- tests/models/test_user.py
-- Follows project style?
-- Proper error handling?
+CHECK:
-- Good naming?
+- [ ] Follows project conventions and style?
-- Test coverage adequate?
+- [ ] Proper error handling?
-- No obvious bugs?
+- [ ] Clear variable/function names?
+- [ ] Adequate test coverage?
+- [ ] No obvious bugs or missed edge cases?
+- [ ] No security issues?
+OUTPUT FORMAT:
+- Critical Issues: [must fix before proceeding]
+- Important Issues: [should fix]
+- Minor Issues: [optional]
+- Verdict: APPROVED or REQUEST_CHANGES
 """,
 toolsets=['file']
 )
 ```
-**If quality issues found:**
+**If quality issues found:** Fix issues, re-review. Continue only when approved.
-- Subagent fixes issues
-- Re-run quality review
-- Continue only when approved
-#### Step 5: Mark Complete
+#### Step 4: Mark Complete
-Update todo list, mark task complete.
+```python
+todo([{"id": "task-1", "content": "Create User model with email field", "status": "completed"}], merge=True)
+```
 ### 3. Final Review
-After ALL tasks complete:
+After ALL tasks are complete, dispatch a final integration reviewer:
 ```python
 delegate_task(
-goal="Review entire implementation for consistency",
+goal="Review the entire implementation for consistency and integration issues",
-context="All tasks completed, review for integration issues",
+context="""
-toolsets=['file']
+All tasks from the plan are complete. Review the full implementation:
+- Do all components work together?
+- Any inconsistencies between tasks?
+- All tests passing?
+- Ready for merge?
+""",
+toolsets=['terminal', 'file']
 )
 ```
-### 4. Branch Cleanup
+### 4. Verify and Commit
-Use `finishing-a-development-branch` skill:
+```bash
-- Verify all tests pass
+# Run full test suite
-- Present merge options
+pytest tests/ -q
-- Clean up worktree
+# Review all changes
+git diff --stat
+# Final commit if needed
+git add -A && git commit -m "feat: complete [feature name] implementation"
+```
 ## Task Granularity
-**Good task size:** 2-5 minutes of focused work
+**Each task = 2-5 minutes of focused work.**
-**Examples:**
 **Too big:**
 - "Implement user authentication system"
@@ -177,188 +198,134 @@ Use `finishing-a-development-branch` skill:
 - "Add password hashing function"
 - "Create login endpoint"
 - "Add JWT token generation"
-- "Create registration endpoint"
-## Communication Pattern
+## Red Flags — Never Do These
-### You to Subagent
+- Start implementation without a plan
+- Skip reviews (spec compliance OR code quality)
-**Provide:**
+- Proceed with unfixed critical/important issues
-- Clear task description
+- Dispatch multiple implementation subagents for tasks that touch the same files
-- Exact file paths
+- Make subagent read the plan file (provide full text in context instead)
-- Expected behavior
+- Skip scene-setting context (subagent needs to understand where the task fits)
-- Success criteria
+- Ignore subagent questions (answer before letting them proceed)
-- Relevant context
+- Accept "close enough" on spec compliance
+- Skip review loops (reviewer found issues → implementer fixes → review again)
-**Example:**
+- Let implementer self-review replace actual review (both are needed)
-```
+- **Start code quality review before spec compliance is PASS** (wrong order)
-Task: Add email validation
+- Move to next task while either review has open issues
-Files: Create src/validators/email.py
-Expected: Function returns True for valid emails, False for invalid
-Success: Tests pass for 10 test cases including edge cases
-Context: Used in user registration flow
-```
-### Subagent to You
-**Expect:**
-- Questions for clarification
-- Progress updates
-- Completion report
-- Self-review summary
-**Respond to:**
-- Answer questions promptly
-- Provide missing context
-- Approve approach decisions
-## Two-Stage Review Details
-### Stage 1: Spec Compliance
-**Checks:**
-- [ ] All requirements from plan implemented
-- [ ] File paths match specification
-- [ ] Function signatures match spec
-- [ ] Behavior matches expected
-- [ ] No scope creep (nothing extra)
-**Output:** PASS or list of spec gaps
-### Stage 2: Code Quality
-**Checks:**
-- [ ] Follows language conventions
-- [ ] Consistent with project style
-- [ ] Clear variable/function names
-- [ ] Proper error handling
-- [ ] Adequate test coverage
-- [ ] No obvious bugs/edge cases missed
-- [ ] Documentation if needed
-**Output:** APPROVED or list of issues (critical/important/minor)
 ## Handling Issues
-### Critical Issues
+### If Subagent Asks Questions
-**Examples:** Security vulnerability, broken functionality, data loss risk
+- Answer clearly and completely
+- Provide additional context if needed
+- Don't rush them into implementation
-**Action:** Must fix before proceeding
+### If Reviewer Finds Issues
-### Important Issues
+- Implementer subagent (or a new one) fixes them
+- Reviewer reviews again
+- Repeat until approved
+- Don't skip the re-review
-**Examples:** Missing tests, poor error handling, unclear code
+### If Subagent Fails a Task
-**Action:** Should fix before proceeding
+- Dispatch a new fix subagent with specific instructions about what went wrong
+- Don't try to fix manually in the controller session (context pollution)
-### Minor Issues
+## Efficiency Notes
-**Examples:** Style inconsistency, minor refactoring opportunity
+**Why fresh subagent per task:**
+- Prevents context pollution from accumulated state
+- Each subagent gets clean, focused context
+- No confusion from prior tasks' code or reasoning
-**Action:** Note for later, optional fix
+**Why two-stage review:**
+- Spec review catches under/over-building early
+- Quality review ensures the implementation is well-built
+- Catches issues before they compound across tasks
+**Cost trade-off:**
+- More subagent invocations (implementer + 2 reviewers per task)
+- But catches issues early (cheaper than debugging compounded problems later)
 ## Integration with Other Skills
+### With writing-plans
+This skill EXECUTES plans created by the writing-plans skill:
+1. User requirements → writing-plans → implementation plan
+2. Implementation plan → subagent-driven-development → working code
 ### With test-driven-development
-Subagent should:
+Implementer subagents should follow TDD:
 1. Write failing test first
 2. Implement minimal code
 3. Verify test passes
 4. Commit
-### With systematic-debugging
+Include TDD instructions in every implementer context.
-If subagent encounters bugs:
-1. Pause implementation
-2. Debug systematically
-3. Fix root cause
-4. Resume
-### With writing-plans
-This skill EXECUTES plans created by writing-plans skill.
-**Sequence:**
-1. brainstorming → writing-plans → subagent-driven-development
 ### With requesting-code-review
-After subagent completes task, use requesting-code-review skill for final validation.
+The two-stage review process IS the code review. For final integration review, use the requesting-code-review skill's review dimensions.
-## Common Patterns
+### With systematic-debugging
-### Pattern: Fresh Subagent Per Task
+If a subagent encounters bugs during implementation:
+1. Follow systematic-debugging process
-**Why:** Prevents context pollution
+2. Find root cause before fixing
-**How:** New delegate_task for each task
+3. Write regression test
-**Result:** Each subagent has clean context
+4. Resume implementation
-### Pattern: Two-Stage Review
-**Why:** Catch issues early, ensure quality
-**How:** Spec review → Quality review
-**Result:** High-quality, spec-compliant code
-### Pattern: Frequent Checkpoints
-**Why:** Catch issues before they compound
-**How:** Review after each task
-**Result:** Issues don't cascade
-## Best Practices
-1. **Clear Task Boundaries**
-   - One task = one focused change
-   - Independent where possible
-   - Clear success criteria
-2. **Complete Context**
-   - Provide all needed files
-   - Explain project conventions
-   - Share relevant examples
-3. **Review Discipline**
-   - Don't skip spec review
-   - Address critical issues immediately
-   - Keep quality bar consistent
-4. **Communication**
-   - Answer subagent questions quickly
-   - Clarify when needed
-   - Provide feedback on reviews
 ## Example Workflow
-```markdown
+```
-User: Implement user authentication
+[Read plan: docs/plans/auth-feature.md]
+[Create todo list with 5 tasks]
-You: I'll use subagent-driven development. Let me create a plan first.
+--- Task 1: Create User model ---
-[Uses writing-plans skill]
+[Dispatch implementer subagent]
+Implementer: "Should email be unique?"
+You: "Yes, email must be unique"
+Implementer: Implemented, 3/3 tests passing, committed.
-Plan created with 5 tasks:
+[Dispatch spec reviewer]
-1. Create User model
+Spec reviewer: ✅ PASS — all requirements met
-2. Add password hashing
-3. Implement login endpoint
-4. Add JWT middleware
-5. Create registration endpoint
---- Task 1 ---
+[Dispatch quality reviewer]
-[Dispatch implementer subagent for User model]
+Quality reviewer: ✅ APPROVED — clean code, good tests
-[Subagent asks: "Should email be unique?"]
-You: Yes, email must be unique
-[Subagent implements]
-[Dispatch spec reviewer - PASS]
-[Dispatch quality reviewer - APPROVED]
-Task 1 complete
---- Task 2 ---
+[Mark Task 1 complete]
-[Dispatch implementer for password hashing]
-...
-[After all tasks]
+--- Task 2: Password hashing ---
-[Final review]
+[Dispatch implementer subagent]
-[Merge branch]
+Implementer: No questions, implemented, 5/5 tests passing.
+[Dispatch spec reviewer]
+Spec reviewer: ❌ Missing: password strength validation (spec says "min 8 chars")
+[Implementer fixes]
+Implementer: Added validation, 7/7 tests passing.
+[Dispatch spec reviewer again]
+Spec reviewer: ✅ PASS
+[Dispatch quality reviewer]
+Quality reviewer: Important: Magic number 8, extract to constant
+Implementer: Extracted MIN_PASSWORD_LENGTH constant
+Quality reviewer: ✅ APPROVED
+[Mark Task 2 complete]
+... (continue for all tasks)
+[After all tasks: dispatch final integration reviewer]
+[Run full test suite: all passing]
+[Done!]
 ```
 ## Remember
@@ -366,8 +333,8 @@ Task 1 complete
 ```
 Fresh subagent per task
 Two-stage review every time
-Spec compliance first
+Spec compliance FIRST
-Code quality second
+Code quality SECOND
 Never skip reviews
 Catch issues early
 ```
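The per-task workflow in this diff (implement, spec review until PASS, then quality review until APPROVED, then mark complete) is a small control loop. A sketch with `delegate_task` replaced by a scripted stand-in, purely to show the ordering; every name below is illustrative, not a real Hermes API:

```python
def run_task(task: str, dispatch) -> str:
    """Drive one task through the two-stage review loop.

    `dispatch` stands in for delegate_task: it takes a role and a task
    and returns that subagent's report or verdict.
    """
    dispatch("implementer", task)
    # Stage 1: spec compliance FIRST; loop until the spec reviewer passes it.
    while dispatch("spec_reviewer", task) != "PASS":
        dispatch("implementer_fix", task)
    # Stage 2: code quality SECOND; loop until approved.
    while dispatch("quality_reviewer", task) != "APPROVED":
        dispatch("implementer_fix", task)
    return "complete"

# Scripted verdicts mirroring Task 2 in the example workflow:
# spec review fails once, quality review requests one change.
verdicts = iter(["done", "FAIL", "fixed", "PASS",
                 "REQUEST_CHANGES", "fixed", "APPROVED"])
calls = []
def fake_dispatch(role, task):
    calls.append(role)
    return next(verdicts)

status = run_task("password hashing", fake_dispatch)
print(status)  # complete
```

The structure makes the "wrong order" red flag concrete: the quality loop is unreachable until the spec loop has exited with PASS.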


@@ -1,8 +1,8 @@
 ---
 name: systematic-debugging
-description: Use when encountering any bug, test failure, or unexpected behavior. 4-phase root cause investigation process - NO fixes without understanding the problem first.
+description: Use when encountering any bug, test failure, or unexpected behavior. 4-phase root cause investigation — NO fixes without understanding the problem first.
-version: 1.0.0
+version: 1.1.0
-author: Hermes Agent (adapted from Superpowers)
+author: Hermes Agent (adapted from obra/superpowers)
 license: MIT
 metadata:
 hermes:
@@ -48,7 +48,7 @@ Use for ANY technical issue:
 **Don't skip when:**
 - Issue seems simple (simple bugs have root causes too)
 - You're in a hurry (rushing guarantees rework)
-- Manager wants it fixed NOW (systematic is faster than thrashing)
+- Someone wants it fixed NOW (systematic is faster than thrashing)
 ## The Four Phases
@@ -67,7 +67,7 @@ You MUST complete each phase before proceeding to the next.
- Read stack traces completely - Read stack traces completely
- Note line numbers, file paths, error codes - Note line numbers, file paths, error codes
**Action:** Copy full error message to your notes. **Action:** Use `read_file` on the relevant source files. Use `search_files` to find the error string in the codebase.
### 2. Reproduce Consistently ### 2. Reproduce Consistently
@@ -76,16 +76,24 @@ You MUST complete each phase before proceeding to the next.
- Does it happen every time? - Does it happen every time?
- If not reproducible → gather more data, don't guess - If not reproducible → gather more data, don't guess
**Action:** Write down exact reproduction steps. **Action:** Use the `terminal` tool to run the failing test or trigger the bug:
```bash
# Run specific failing test
pytest tests/test_module.py::test_name -v
# Run with verbose output
pytest tests/test_module.py -v --tb=long
```
### 3. Check Recent Changes ### 3. Check Recent Changes
- What changed that could cause this? - What changed that could cause this?
- Git diff, recent commits - Git diff, recent commits
- New dependencies, config changes - New dependencies, config changes
- Environmental differences
**Commands:** **Action:**
```bash ```bash
# Recent commits # Recent commits
git log --oneline -10 git log --oneline -10
@@ -94,59 +102,50 @@ git log --oneline -10
git diff git diff
# Changes in specific file # Changes in specific file
git log -p --follow src/problematic_file.py git log -p --follow src/problematic_file.py | head -100
``` ```
### 4. Gather Evidence in Multi-Component Systems ### 4. Gather Evidence in Multi-Component Systems
**WHEN system has multiple components (CI pipeline, API service, database layer):** **WHEN system has multiple components (API service → database, CI → build → deploy):**
**BEFORE proposing fixes, add diagnostic instrumentation:** **BEFORE proposing fixes, add diagnostic instrumentation:**
For EACH component boundary: For EACH component boundary:
- Log what data enters component - Log what data enters the component
- Log what data exits component - Log what data exits the component
- Verify environment/config propagation - Verify environment/config propagation
- Check state at each layer - Check state at each layer
Run once to gather evidence showing WHERE it breaks Run once to gather evidence showing WHERE it breaks.
THEN analyze evidence to identify failing component THEN analyze evidence to identify the failing component.
THEN investigate that specific component THEN investigate that specific component.
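The boundary checks listed above can be sketched as throwaway instrumentation (a minimal sketch; the decorated component function is hypothetical):

```python
import functools
import logging

logging.basicConfig(level=logging.DEBUG)

def trace_boundary(func):
    # Temporary diagnostic: log what enters and exits a component.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logging.debug("%s <- args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        logging.debug("%s -> %r", func.__name__, result)
        return result
    return wrapper

@trace_boundary
def parse_config(raw):  # hypothetical boundary between two components
    return {"retries": int(raw)}
```

Decorate each boundary, run the failing scenario once, and read the log to find the first component that emits bad data. Remove the instrumentation once Phase 1 is done.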
### 5. Trace Data Flow
**WHEN error is deep in the call stack:**
- Where does the bad value originate?
- What called this function with the bad value?
- Keep tracing upstream until you find the source
- Fix at the source, not at the symptom
**Action:** Use `search_files` to trace references:
**Example (multi-layer system):**
```python ```python
# Layer 1: Entry point # Find where the function is called
def entry_point(input_data): search_files("function_name(", path="src/", file_glob="*.py")
print(f"DEBUG: Input received: {input_data}")
result = process_layer1(input_data)
print(f"DEBUG: Layer 1 output: {result}")
return result
# Layer 2: Processing # Find where the variable is set
def process_layer1(data): search_files("variable_name\\s*=", path="src/", file_glob="*.py")
print(f"DEBUG: Layer 1 received: {data}")
# ... processing ...
print(f"DEBUG: Layer 1 returning: {result}")
return result
``` ```
**Action:** Add logging, run once, analyze output.
### 5. Isolate the Problem
- Comment out code until problem disappears
- Binary search through recent changes
- Create minimal reproduction case
- Test with fresh environment
**Action:** Create minimal reproduction case.
### Phase 1 Completion Checklist ### Phase 1 Completion Checklist
- [ ] Error messages fully read and understood - [ ] Error messages fully read and understood
- [ ] Issue reproduced consistently - [ ] Issue reproduced consistently
- [ ] Recent changes identified and reviewed - [ ] Recent changes identified and reviewed
- [ ] Evidence gathered (logs, state) - [ ] Evidence gathered (logs, state, data flow)
- [ ] Problem isolated to specific component/code - [ ] Problem isolated to specific component/code
- [ ] Root cause hypothesis formed - [ ] Root cause hypothesis formed
@@ -154,290 +153,214 @@ def process_layer1(data):
--- ---
## Phase 2: Solution Design ## Phase 2: Pattern Analysis
**Given the root cause, design the fix:** **Find the pattern before fixing:**
### 1. Understand the Fix Area ### 1. Find Working Examples
- Read relevant code thoroughly - Locate similar working code in the same codebase
- Understand data flow - What works that's similar to what's broken?
- Identify affected components
- Check for similar issues elsewhere
**Action:** Read all relevant code files. **Action:** Use `search_files` to find comparable patterns:
### 2. Design Minimal Fix ```python
search_files("similar_pattern", path="src/", file_glob="*.py")
```
- Smallest change that fixes root cause ### 2. Compare Against References
- Avoid scope creep
- Don't refactor while fixing
- Fix one issue at a time
**Action:** Write down the exact fix before coding. - If implementing a pattern, read the reference implementation COMPLETELY
- Don't skim — read every line
- Understand the pattern fully before applying
### 3. Consider Side Effects ### 3. Identify Differences
- What else could this change affect? - What's different between working and broken?
- Are there dependencies? - List every difference, however small
- Will this break other functionality? - Don't assume "that can't matter"
**Action:** Identify potential side effects. ### 4. Understand Dependencies
### Phase 2 Completion Checklist - What other components does this need?
- What settings, config, environment?
- [ ] Fix area code fully understood - What assumptions does it make?
- [ ] Minimal fix designed
- [ ] Side effects identified
- [ ] Fix approach documented
--- ---
## Phase 3: Implementation ## Phase 3: Hypothesis and Testing
**Make the fix:** **Scientific method:**
### 1. Write Test First (if possible) ### 1. Form a Single Hypothesis
```python - State clearly: "I think X is the root cause because Y"
def test_should_handle_empty_input(): - Write it down
"""Regression test for bug #123""" - Be specific, not vague
result = process_data("")
assert result == expected_empty_result
```
### 2. Implement Fix ### 2. Test Minimally
```python - Make the SMALLEST possible change to test the hypothesis
# Before (buggy) - One variable at a time
def process_data(data): - Don't fix multiple things at once
return data.split(",")[0]
# After (fixed) ### 3. Verify Before Continuing
def process_data(data):
if not data: - Did it work? → Phase 4
return "" - Didn't work? → Form NEW hypothesis
return data.split(",")[0] - DON'T add more fixes on top
```
### 4. When You Don't Know
- Say "I don't understand X"
- Don't pretend to know
- Ask the user for help
- Research more
---
## Phase 4: Implementation
**Fix the root cause, not the symptom:**
### 1. Create Failing Test Case
- Simplest possible reproduction
- Automated test if possible
- MUST have before fixing
- Use the `test-driven-development` skill
### 2. Implement Single Fix
- Address the root cause identified
- ONE change at a time
- No "while I'm here" improvements
- No bundled refactoring
### 3. Verify Fix ### 3. Verify Fix
```bash ```bash
# Run the specific test # Run the specific regression test
pytest tests/test_data.py::test_should_handle_empty_input -v pytest tests/test_module.py::test_regression -v
# Run all tests to check for regressions # Run full suite — no regressions
pytest pytest tests/ -q
``` ```
### Phase 3 Completion Checklist ### 4. If Fix Doesn't Work — The Rule of Three
- [ ] Test written that reproduces the bug - **STOP.**
- [ ] Minimal fix implemented - Count: How many fixes have you tried?
- [ ] Test passes - If < 3: Return to Phase 1, re-analyze with new information
- [ ] No regressions introduced - **If ≥ 3: STOP and question the architecture (step 5 below)**
- DON'T attempt Fix #4 without architectural discussion
### 5. If 3+ Fixes Failed: Question Architecture
**Pattern indicating an architectural problem:**
- Each fix reveals new shared state/coupling in a different place
- Fixes require "massive refactoring" to implement
- Each fix creates new symptoms elsewhere
**STOP and question fundamentals:**
- Is this pattern fundamentally sound?
- Are we "sticking with it through sheer inertia"?
- Should we refactor the architecture vs. continue fixing symptoms?
**Discuss with the user before attempting more fixes.**
This is NOT a failed hypothesis — this is a wrong architecture.
--- ---
## Phase 4: Verification ## Red Flags — STOP and Follow Process
**Confirm it's actually fixed:** If you catch yourself thinking:
- "Quick fix for now, investigate later"
- "Just try changing X and see if it works"
- "Add multiple changes, run tests"
- "Skip the test, I'll manually verify"
- "It's probably X, let me fix that"
- "I don't fully understand but this might work"
- "Pattern says X but I'll adapt it differently"
- "Here are the main problems: [lists fixes without investigation]"
- Proposing solutions before tracing data flow
- **"One more fix attempt" (when already tried 2+)**
- **Each fix reveals a new problem in a different place**
### 1. Reproduce Original Issue **ALL of these mean: STOP. Return to Phase 1.**
- Follow original reproduction steps **If 3+ fixes failed:** Question the architecture (Phase 4 step 5).
- Verify issue is resolved
- Test edge cases
### 2. Regression Testing ## Common Rationalizations
```bash | Excuse | Reality |
# Full test suite |--------|---------|
pytest | "Issue is simple, don't need process" | Simple issues have root causes too. Process is fast for simple bugs. |
| "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. |
| "Just try this first, then investigate" | First fix sets the pattern. Do it right from the start. |
| "I'll write test after confirming fix works" | Untested fixes don't stick. Test first proves it. |
| "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
| "Reference too long, I'll adapt the pattern" | Partial understanding guarantees bugs. Read it completely. |
| "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. |
| "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Question the pattern, don't fix again. |
# Integration tests ## Quick Reference
pytest tests/integration/
# Check related areas | Phase | Key Activities | Success Criteria |
pytest -k "related_feature" |-------|---------------|------------------|
``` | **1. Root Cause** | Read errors, reproduce, check changes, gather evidence, trace data flow | Understand WHAT and WHY |
| **2. Pattern** | Find working examples, compare, identify differences | Know what's different |
| **3. Hypothesis** | Form theory, test minimally, one variable at a time | Confirmed or new hypothesis |
| **4. Implementation** | Create regression test, fix root cause, verify | Bug resolved, all tests pass |
### 3. Monitor After Deploy ## Hermes Agent Integration
- Watch logs for related errors ### Investigation Tools
- Check metrics
- Verify fix in production
### Phase 4 Completion Checklist Use these Hermes tools during Phase 1:
- [ ] Original issue cannot be reproduced - **`search_files`** — Find error strings, trace function calls, locate patterns
- [ ] All tests pass - **`read_file`** — Read source code with line numbers for precise analysis
- [ ] No new warnings/errors - **`terminal`** — Run tests, check git history, reproduce bugs
- [ ] Fix documented (commit message, comments) - **`web_search`/`web_extract`** — Research error messages, library docs
--- ### With delegate_task
## Debugging Techniques For complex multi-component debugging, dispatch investigation subagents:
### Root Cause Tracing
Ask "why" 5 times:
1. Why did it fail? → Null pointer
2. Why was it null? → Function returned null
3. Why did function return null? → Missing validation
4. Why was validation missing? → Assumed input always valid
5. Why was that assumption wrong? → API changed
**Root cause:** API change not accounted for
### Defense in Depth
Don't fix just the symptom:
**Bad:** Add null check at crash site
**Good:**
1. Add validation at API boundary
2. Add null check at crash site
3. Add test for both
4. Document API behavior
### Condition-Based Waiting
For timing/race conditions:
```python ```python
# Bad - arbitrary sleep delegate_task(
import time goal="Investigate why [specific test/behavior] fails",
time.sleep(5) # "Should be enough" context="""
Follow systematic-debugging skill:
1. Read the error message carefully
2. Reproduce the issue
3. Trace the data flow to find root cause
4. Report findings — do NOT fix yet
# Good - wait for condition Error: [paste full error]
from tenacity import retry, wait_exponential, stop_after_attempt File: [path to failing code]
Test command: [exact command]
@retry(wait=wait_exponential(multiplier=1, min=4, max=10), """,
stop=stop_after_attempt(5)) toolsets=['terminal', 'file']
def wait_for_service(): )
response = requests.get(health_url)
assert response.status_code == 200
``` ```
---
## Common Debugging Pitfalls
### Fix Without Understanding
**Symptom:** "Just add a try/catch"
**Problem:** Masks the real issue
**Solution:** Complete Phase 1 before any fix
### Shotgun Debugging
**Symptom:** Change 5 things at once
**Problem:** Don't know what fixed it
**Solution:** One change at a time, verify each
### Premature Optimization
**Symptom:** Rewrite while debugging
**Problem:** Introduces new bugs
**Solution:** Fix first, refactor later
### Assuming Environment
**Symptom:** "Works on my machine"
**Problem:** Environment differences
**Solution:** Check environment variables, versions, configs
---
## Language-Specific Tools
### Python
```python
# Add debugger
import pdb; pdb.set_trace()
# Or use ipdb for better experience
import ipdb; ipdb.set_trace()
# Log state
import logging
logging.debug(f"Variable state: {variable}")
# Stack trace
import traceback
traceback.print_exc()
```
### JavaScript/TypeScript
```javascript
// Debugger
debugger;
// Console with context
console.log("State:", { var1, var2, var3 });
// Stack trace
console.trace("Here");
// Error with context
throw new Error(`Failed with input: ${JSON.stringify(input)}`);
```
### Go
```go
// Print state
fmt.Printf("Debug: variable=%+v\n", variable)
// Stack trace
import "runtime/debug"
debug.PrintStack()
// Panic with context
if err != nil {
panic(fmt.Sprintf("unexpected error: %v", err))
}
```
---
## Integration with Other Skills
### With test-driven-development ### With test-driven-development
When debugging: When fixing bugs:
1. Write test that reproduces bug 1. Write a test that reproduces the bug (RED)
2. Debug systematically 2. Debug systematically to find root cause
3. Fix root cause 3. Fix the root cause (GREEN)
4. Test passes 4. The test proves the fix and prevents regression
### With writing-plans ## Real-World Impact
Include debugging tasks in plans: From debugging sessions:
- "Add diagnostic logging" - Systematic approach: 15-30 minutes to fix
- "Create reproduction test" - Random fixes approach: 2-3 hours of thrashing
- "Verify fix resolves issue" - First-time fix rate: 95% vs 40%
- New bugs introduced: Near zero vs common
### With subagent-driven-development
If subagent gets stuck:
1. Switch to systematic debugging
2. Analyze root cause
3. Provide findings to subagent
4. Resume implementation
---
## Remember
```
PHASE 1: Investigate → Understand WHY
PHASE 2: Design → Plan the fix
PHASE 3: Implement → Make the fix
PHASE 4: Verify → Confirm it's fixed
```
**No shortcuts. No guessing. Systematic always wins.** **No shortcuts. No guessing. Systematic always wins.**

View File

@@ -1,8 +1,8 @@
--- ---
name: test-driven-development name: test-driven-development
description: Use when implementing any feature or bugfix, before writing implementation code. Enforces RED-GREEN-REFACTOR cycle with test-first approach. description: Use when implementing any feature or bugfix, before writing implementation code. Enforces RED-GREEN-REFACTOR cycle with test-first approach.
version: 1.0.0 version: 1.1.0
author: Hermes Agent (adapted from Superpowers) author: Hermes Agent (adapted from obra/superpowers)
license: MIT license: MIT
metadata: metadata:
hermes: hermes:
@@ -28,7 +28,7 @@ Write the test first. Watch it fail. Write minimal code to pass.
- Refactoring - Refactoring
- Behavior changes - Behavior changes
**Exceptions (ask your human partner):** **Exceptions (ask the user first):**
- Throwaway prototypes - Throwaway prototypes
- Generated code - Generated code
- Configuration files - Configuration files
@@ -53,11 +53,11 @@ Implement fresh from tests. Period.
## Red-Green-Refactor Cycle ## Red-Green-Refactor Cycle
### RED - Write Failing Test ### RED — Write Failing Test
Write one minimal test showing what should happen. Write one minimal test showing what should happen.
**Good Example:** **Good test:**
```python ```python
def test_retries_failed_operations_3_times(): def test_retries_failed_operations_3_times():
attempts = 0 attempts = 0
@@ -67,43 +67,51 @@ def test_retries_failed_operations_3_times():
if attempts < 3: if attempts < 3:
raise Exception('fail') raise Exception('fail')
return 'success' return 'success'
result = retry_operation(operation) result = retry_operation(operation)
assert result == 'success' assert result == 'success'
assert attempts == 3 assert attempts == 3
``` ```
- Clear name, tests real behavior, one thing Clear name, tests real behavior, one thing.
**Bad Example:** **Bad test:**
```python ```python
def test_retry_works(): def test_retry_works():
mock = MagicMock() mock = MagicMock()
mock.side_effect = [Exception(), Exception(), 'success'] mock.side_effect = [Exception(), Exception(), 'success']
result = retry_operation(mock) result = retry_operation(mock)
assert result == 'success' # What about retry count? Timing? assert result == 'success' # What about retry count? Timing?
``` ```
- Vague name, mocks behavior not reality, unclear what it tests Vague name, tests mock not real code.
### Verify RED **Requirements:**
- One behavior per test
- Clear descriptive name ("and" in name? Split it)
- Real code, not mocks (unless truly unavoidable)
- Name describes behavior, not implementation
Run the test. It MUST fail. ### Verify RED — Watch It Fail
**If it passes:** **MANDATORY. Never skip.**
- Test is wrong (not testing what you think)
- Code already exists (delete it, start over)
- Wrong test file running
**What to check:** ```bash
- Error message makes sense # Use terminal tool to run the specific test
- Fails for expected reason pytest tests/test_feature.py::test_specific_behavior -v
- Stack trace points to right place ```
### GREEN - Minimal Code Confirm:
- Test fails (not errors from typos)
- Failure message is expected
- Fails because the feature is missing
Write just enough code to pass. Nothing more. **Test passes immediately?** You're testing existing behavior. Fix the test.
**Test errors?** Fix the error, re-run until it fails correctly.
### GREEN — Minimal Code
Write the simplest code to pass the test. Nothing more.
**Good:** **Good:**
```python ```python
@@ -119,266 +127,216 @@ def add(a, b):
return result return result
``` ```
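For the retry test shown under RED, a minimal GREEN implementation might look like this (a sketch; the signature is inferred from the test, not from any real library):

```python
def retry_operation(operation, max_attempts=3):
    # Just enough to pass the test: no backoff, no logging, no config.
    last_error = None
    for _ in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            last_error = exc
    raise last_error
```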
Cheating is OK in GREEN: Don't add features, refactor other code, or "improve" beyond the test.
**Cheating is OK in GREEN:**
- Hardcode return values - Hardcode return values
- Copy-paste - Copy-paste
- Duplicate code - Duplicate code
- Skip edge cases - Skip edge cases
**We'll fix it in refactor.** We'll fix it in REFACTOR.
### Verify GREEN ### Verify GREEN — Watch It Pass
Run tests. All must pass. **MANDATORY.**
**If fails:** ```bash
- Fix minimal code # Run the specific test
- Don't expand scope pytest tests/test_feature.py::test_specific_behavior -v
- Stay in GREEN
### REFACTOR - Clean Up # Then run ALL tests to check for regressions
pytest tests/ -q
```
Now improve the code while keeping tests green. Confirm:
- Test passes
- Other tests still pass
- Output pristine (no errors, warnings)
**Safe refactorings:** **Test fails?** Fix the code, not the test.
- Rename variables/functions
- Extract helper functions **Other tests fail?** Fix regressions now.
### REFACTOR — Clean Up
After green only:
- Remove duplication - Remove duplication
- Improve names
- Extract helpers
- Simplify expressions - Simplify expressions
- Improve readability
**Golden rule:** Tests stay green throughout. Keep tests green throughout. Don't add behavior.
**If tests fail during refactor:** **If tests fail during refactor:** Undo immediately. Take smaller steps.
- Undo immediately
- Smaller refactoring steps
- Check you didn't change behavior
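As a sketch, a safe refactor extracts duplication while the existing tests stay green (all names hypothetical):

```python
def _validate_name(name):
    # Extracted helper: this rule was previously duplicated in both
    # create_user and rename_user; behavior is unchanged.
    if not name or len(name) > 64:
        raise ValueError("invalid name")
    return name

def create_user(name):
    return {"name": _validate_name(name)}

def rename_user(user, name):
    user["name"] = _validate_name(name)
    return user
```

Run the suite after each small step; if anything goes red, undo and take a smaller step.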
## Implementation Workflow ### Repeat
### 1. Create Test File Next failing test for next behavior. One cycle at a time.
```bash ## Why Order Matters
# Python
touch tests/test_feature.py
# JavaScript **"I'll write tests after to verify it works"**
touch tests/feature.test.js
# Rust Tests written after code pass immediately. Passing immediately proves nothing:
touch tests/feature_tests.rs - Might test the wrong thing
``` - Might test implementation, not behavior
- Might miss edge cases you forgot
- You never saw it catch the bug
### 2. Write First Failing Test Test-first forces you to see the test fail, proving it actually tests something.
**"I already manually tested all the edge cases"**
Manual testing is ad-hoc. You think you tested everything but:
- No record of what you tested
- Can't re-run when code changes
- Easy to forget cases under pressure
- "It worked when I tried it" ≠ comprehensive
Automated tests are systematic. They run the same way every time.
**"Deleting X hours of work is wasteful"**
Sunk cost fallacy. The time is already gone. Your choice now:
- Delete and rewrite with TDD (high confidence)
- Keep it and add tests after (low confidence, likely bugs)
The "waste" is keeping code you can't trust.
**"TDD is dogmatic, being pragmatic means adapting"**
TDD IS pragmatic:
- Finds bugs before commit (faster than debugging after)
- Prevents regressions (tests catch breaks immediately)
- Documents behavior (tests show how to use code)
- Enables refactoring (change freely, tests catch breaks)
"Pragmatic" shortcuts = debugging in production = slower.
**"Tests after achieve the same goals — it's spirit not ritual"**
No. Tests-after answer "What does this do?" Tests-first answer "What should this do?"
Tests-after are biased by your implementation. You test what you built, not what's required. Tests-first force edge case discovery before implementing.
## Common Rationalizations
| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
| "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
| "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
| "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
| "Need to explore first" | Fine. Throw away exploration, start with TDD. |
| "Test hard = design unclear" | Listen to the test. Hard to test = hard to use. |
| "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. |
| "Manual test faster" | Manual doesn't prove edge cases. You'll re-test every change. |
| "Existing code has no tests" | You're improving it. Add tests for the code you touch. |
## Red Flags — STOP and Start Over
If you catch yourself doing any of these, delete the code and restart with TDD:
- Code before test
- Test after implementation
- Test passes immediately on first run
- Can't explain why test failed
- Tests added "later"
- Rationalizing "just this once"
- "I already manually tested it"
- "Tests after achieve the same purpose"
- "Keep as reference" or "adapt existing code"
- "Already spent X hours, deleting is wasteful"
- "TDD is dogmatic, I'm being pragmatic"
- "This is different because..."
**All of these mean: Delete code. Start over with TDD.**
## Verification Checklist
Before marking work complete:
- [ ] Every new function/method has a test
- [ ] Watched each test fail before implementing
- [ ] Each test failed for expected reason (feature missing, not typo)
- [ ] Wrote minimal code to pass each test
- [ ] All tests pass
- [ ] Output pristine (no errors, warnings)
- [ ] Tests use real code (mocks only if unavoidable)
- [ ] Edge cases and errors covered
Can't check all boxes? You skipped TDD. Start over.
## When Stuck
| Problem | Solution |
|---------|----------|
| Don't know how to test | Write the wished-for API. Write the assertion first. Ask the user. |
| Test too complicated | Design too complicated. Simplify the interface. |
| Must mock everything | Code too coupled. Use dependency injection. |
| Test setup huge | Extract helpers. Still complex? Simplify the design. |
## Hermes Agent Integration
### Running Tests
Use the `terminal` tool to run tests at each step:
```python ```python
# tests/test_calculator.py # RED — verify failure
def test_adds_two_numbers(): terminal("pytest tests/test_feature.py::test_name -v")
calc = Calculator()
result = calc.add(2, 3) # GREEN — verify pass
assert result == 5 terminal("pytest tests/test_feature.py::test_name -v")
# Full suite — verify no regressions
terminal("pytest tests/ -q")
``` ```
### 3. Run and Verify Failure ### With delegate_task
```bash When dispatching subagents for implementation, enforce TDD in the goal:
pytest tests/test_calculator.py -v
# Expected: FAIL - Calculator not defined
```
### 4. Write Minimal Implementation
```python ```python
# src/calculator.py delegate_task(
class Calculator: goal="Implement [feature] using strict TDD",
def add(self, a, b): context="""
return a + b # Minimal! Follow test-driven-development skill:
1. Write failing test FIRST
2. Run test to verify it fails
3. Write minimal code to pass
4. Run test to verify it passes
5. Refactor if needed
6. Commit
Project test command: pytest tests/ -q
Project structure: [describe relevant files]
""",
toolsets=['terminal', 'file']
)
``` ```
### 5. Run and Verify Pass
```bash
pytest tests/test_calculator.py -v
# Expected: PASS
```
### 6. Commit
```bash
git add tests/test_calculator.py src/calculator.py
git commit -m "feat: add calculator with add method"
```
### 7. Next Test
```python
def test_adds_negative_numbers():
calc = Calculator()
result = calc.add(-2, -3)
assert result == -5
```
Repeat cycle.
## Testing Anti-Patterns
### Mocking What You Don't Own
**Bad:** Mock database, HTTP client, file system
**Good:** Abstract behind interface, test interface
### Testing Implementation Details
**Bad:** Test that function was called
**Good:** Test the result/behavior
### Happy Path Only
**Bad:** Only test expected inputs
**Good:** Test edge cases, errors, boundaries
### Brittle Tests
**Bad:** Test breaks when refactoring
**Good:** Tests verify behavior, not structure
## Common Pitfalls
### "I'll Write Tests After"
No, you won't. Write them first.
### "This is Too Simple to Test"
Simple bugs cause complex problems. Test everything.
### "I Need to See It Work First"
Temporary code becomes permanent. Test first.
### "Tests Take Too Long"
Untested code takes longer. Invest in tests.
## Language-Specific Commands
### Python (pytest)
```bash
# Run all tests
pytest
# Run specific test
pytest tests/test_feature.py::test_name -v
# Run with coverage
pytest --cov=src --cov-report=term-missing
# Watch mode
pytest-watch
```
### JavaScript (Jest)
```bash
# Run all tests
npm test
# Run specific test
npm test -- test_name
# Watch mode
npm test -- --watch
# Coverage
npm test -- --coverage
```
### TypeScript (Jest with ts-jest)
```bash
# Run tests
npx jest
# Run specific file
npx jest tests/feature.test.ts
```
### Go
```bash
# Run all tests
go test ./...
# Run specific test
go test -run TestName
# Verbose
go test -v
# Coverage
go test -cover
```
### Rust
```bash
# Run tests
cargo test
# Run specific test
cargo test test_name
# Show output
cargo test -- --nocapture
```
## Integration with Other Skills
### With writing-plans
Every plan task should specify:
- What test to write
- Expected test failure
- Minimal implementation
- Refactoring opportunities
### With systematic-debugging ### With systematic-debugging
When fixing bugs: Bug found? Write failing test reproducing it. Follow TDD cycle. The test proves the fix and prevents regression.
1. Write test that reproduces bug
2. Verify test fails (RED)
3. Fix bug (GREEN)
4. Refactor if needed
### With subagent-driven-development Never fix bugs without a test.
Subagent implements one test at a time: ## Testing Anti-Patterns
1. Write failing test
2. Minimal code to pass
3. Commit
4. Next test
## Success Indicators - **Testing mock behavior instead of real behavior** — mocks should verify interactions, not replace the system under test
- **Testing implementation details** — test behavior/results, not internal method calls
- **Happy path only** — always test edge cases, errors, and boundaries
- **Brittle tests** — tests should verify behavior, not structure; refactoring shouldn't break them
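The behavior-vs-structure distinction, as a sketch (`Cart` and its methods are hypothetical):

```python
class Cart:
    def __init__(self):
        self._items = []
        self.total = 0

    def add(self, price):
        self._items.append(price)
        self._recalculate()

    def _recalculate(self):
        # Internal detail: a brittle test would assert this was called.
        self.total = sum(self._items)

def test_total_reflects_added_items():
    # Robust: asserts the observable result, so renaming or inlining
    # _recalculate cannot break this test.
    cart = Cart()
    cart.add(3)
    cart.add(4)
    assert cart.total == 7
```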
**You're doing TDD right when:** ## Final Rule
- Tests fail before code exists
- You write <10 lines between test runs
- Refactoring feels safe
- Bugs are caught immediately
- Code is simpler than expected
**Red flags:** ```
- Writing 50+ lines without running tests Production code → test exists and failed first
- Tests always pass Otherwise → not TDD
- Fear of refactoring ```
- "I'll test later"
## Remember No exceptions without the user's explicit permission.
1. **RED** - Write failing test
2. **GREEN** - Minimal code to pass
3. **REFACTOR** - Clean up safely
4. **REPEAT** - Next behavior
**If you didn't see it fail, it doesn't test what you think.**

View File

@@ -1,8 +1,8 @@
--- ---
name: writing-plans name: writing-plans
description: Use when you have a spec or requirements for a multi-step task. Creates comprehensive implementation plans with bite-sized tasks, exact file paths, and complete code examples. description: Use when you have a spec or requirements for a multi-step task. Creates comprehensive implementation plans with bite-sized tasks, exact file paths, and complete code examples.
version: 1.0.0 version: 1.1.0
author: Hermes Agent (adapted from Superpowers) author: Hermes Agent (adapted from obra/superpowers)
license: MIT license: MIT
metadata: metadata:
hermes: hermes:
@@ -14,112 +14,34 @@ metadata:
## Overview ## Overview
Transform specifications into actionable implementation plans. Write comprehensive plans that any developer can follow - even with zero context about your codebase. Write comprehensive implementation plans assuming the implementer has zero context for the codebase and questionable taste. Document everything they need: which files to touch, complete code, testing commands, docs to check, how to verify. Give them bite-sized tasks. DRY. YAGNI. TDD. Frequent commits.
**Core principle:** Document everything: exact file paths, complete code, test commands, verification steps. Assume the implementer is a skilled developer but knows almost nothing about the toolset or problem domain. Assume they don't know good test design very well.
**Assume the implementer:** **Core principle:** A good plan makes implementation obvious. If someone has to guess, the plan is incomplete.
- Is a skilled developer
- Knows almost nothing about your codebase
- Has questionable taste in code style
- Needs explicit guidance
## When to Use

**Always use before:**

- Implementing multi-step features
- Breaking down complex requirements
- Delegating to subagents via subagent-driven-development

**Don't skip when:**

- Feature seems simple (assumptions cause bugs)
- You plan to implement it yourself (future you needs guidance)
- Working alone (documentation matters)
## Bite-Sized Task Granularity

**Each task = 2-5 minutes of focused work.**

Every step is one action:

- "Write the failing test" — step
- "Run it to make sure it fails" — step
- "Implement the minimal code to make the test pass" — step
- "Run the tests and make sure they pass" — step
- "Commit" — step
**Too big:**

```markdown
### Task 1: Add authentication
```

**Just right:**

```markdown
### Task 1: Create User model with email and password_hash fields
[15 lines, 1 file]
```
## Plan Document Structure
### Header (Required)
Every plan MUST start with:
```markdown
# [Feature Name] Implementation Plan
> **For Hermes:** Use subagent-driven-development skill to implement this plan task-by-task.
**Goal:** [One sentence describing what this builds]
**Architecture:** [2-3 sentences about approach]
**Tech Stack:** [Key technologies/libraries]
---
```
### Task Structure
Each task follows this format:
````markdown
### Task N: [Descriptive Name]
**Objective:** What this task accomplishes (one sentence)
**Files:**
- Create: `exact/path/to/new_file.py`
- Modify: `exact/path/to/existing.py:45-67` (line numbers if known)
- Test: `tests/path/to/test_file.py`
**Step 1: Write failing test**
```python
def test_specific_behavior():
    result = function(input)
    assert result == expected
```
**Step 2: Run test to verify failure**
Run: `pytest tests/path/test.py::test_specific_behavior -v`
Expected: FAIL — "function not defined"
**Step 3: Write minimal implementation**
```python
def function(input):
    return expected
```
**Step 4: Run test to verify pass**
Run: `pytest tests/path/test.py::test_specific_behavior -v`
Expected: PASS
**Step 5: Commit**
```bash
git add tests/path/test.py src/path/file.py
git commit -m "feat: add specific feature"
```
````
## Writing Process
### Step 1: Understand Requirements
Read and understand:
- Feature requirements
- Design documents or user description
- Acceptance criteria
- Constraints
### Step 2: Explore the Codebase
Use Hermes tools to understand the project:
```python
# Understand project structure
search_files("*.py", target="files", path="src/")
# Look at similar features
search_files("similar_pattern", path="src/", file_glob="*.py")
# Check existing tests
search_files("*.py", target="files", path="tests/")
# Read key files
read_file("src/app.py")
```
### Step 3: Design Approach
Decide:
- Architecture pattern
- File organization
- Dependencies needed
- Testing strategy
### Step 4: Write Tasks
Create tasks in order:
1. Setup/infrastructure
2. Core functionality (TDD for each)
3. Edge cases
4. Integration
5. Cleanup/documentation
### Step 5: Add Complete Details
For each task, include:
- **Exact file paths** (not "the config file" but `src/config/settings.py`)
- **Complete code examples** (not "add validation" but the actual code)
- **Exact commands** with expected output
- **Verification steps** that prove the task works
### Step 6: Review the Plan
Check:
- [ ] Tasks are sequential and logical
- [ ] Each task is bite-sized (2-5 min)
- [ ] File paths are exact
- [ ] Code examples are complete (copy-pasteable)
- [ ] Commands are exact with expected output
- [ ] No missing context
- [ ] DRY, YAGNI, TDD principles applied
### Step 7: Save the Plan
```bash
mkdir -p docs/plans
# Save plan to docs/plans/YYYY-MM-DD-feature-name.md
git add docs/plans/
git commit -m "docs: add implementation plan for [feature]"
```
## Principles

### DRY (Don't Repeat Yourself)

**Bad:** Copy-paste validation in 3 places
**Good:** Extract validation function, use everywhere
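A runnable sketch of the extracted validator (the loose regex is illustrative):

```python
import re

def validate_email(email):
    """Single shared validator instead of three copy-pasted checks."""
    if not re.match(r'^[^@]+@[^@]+$', email):
        raise ValueError(f"Invalid email: {email}")
    return email

# Same function for user input, config values, and imported data
validate_email("user@example.com")
validate_email("admin@example.org")
```

If a future task needs stricter rules, they change in exactly one place.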
### YAGNI (You Aren't Gonna Need It)

**Bad:** Add "flexibility" for future requirements
**Good:** Implement only what's needed now

```python
# Bad - YAGNI violation
class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email
        self.preferences = {}  # Not needed yet!
        self.metadata = {}     # Not needed yet!

# Good - YAGNI
class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email
```
### TDD (Test-Driven Development)

Every task that produces code should include the full TDD cycle:

1. Write failing test
2. Run to verify failure
3. Write minimal code
4. Run to verify pass
5. Commit

```bash
git add [files]
git commit -m "type: description"
```
## Example Plan

````markdown
# User Authentication Implementation Plan

> **For Hermes:** Use subagent-driven-development to implement this plan.

**Goal:** Add JWT-based user authentication to the Flask API

**Architecture:** Use PyJWT for tokens, bcrypt for hashing. Middleware validates tokens on protected routes.

**Tech Stack:** Python, Flask, PyJWT, bcrypt

---

### Task 1: Create User model

**Objective:** Define User model with email and hashed password

**Files:**
- Create: `src/models/user.py`
- Test: `tests/models/test_user.py`

**Step 1: Write failing test**

```python
def test_user_creation():
    user = User(email="test@example.com", password="secret123")
    assert user.email == "test@example.com"
    assert user.password_hash is not None
    assert user.password_hash != "secret123"  # Should be hashed
```

**Step 2: Run to verify failure**

```bash
pytest tests/models/test_user.py -v
```

Expected: FAIL - User class not defined

**Step 3: Implement User model**

```python
import bcrypt

class User:
    def __init__(self, email, password):
        self.email = email
        self.password_hash = bcrypt.hashpw(
            password.encode(),
            bcrypt.gensalt()
        )
```

**Step 4: Run to verify pass**

```bash
pytest tests/models/test_user.py -v
```

Expected: PASS

**Commit:**

```bash
git add src/models/user.py tests/models/test_user.py
git commit -m "feat: add User model with password hashing"
```

### Task 2: Create login endpoint

**Objective:** Add POST /login endpoint that returns JWT

**Files:**
- Modify: `src/app.py`
- Test: `tests/test_login.py`

[Continue...]
````
## Common Mistakes

### Vague Tasks
**Bad:** "Add authentication"
**Good:** "Create User model with email and password_hash fields"
### Incomplete Code
**Bad:** "Step 1: Add validation function"
**Good:** "Step 1: Add validation function" followed by the complete function code
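What "complete function code" means in practice is a copy-pasteable body the implementer can drop in unchanged (the exact regex is illustrative):

```python
import re

def validate_email(email):
    """Validate email format; raise ValueError on mismatch."""
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    if not re.match(pattern, email):
        raise ValueError(f"Invalid email: {email}")
    return email
```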
### Missing Verification

**Bad:** "Step 3: Test it works"
**Good:** "Step 3: Run `pytest tests/test_auth.py -v`, expected: 3 passed"
### Missing File Paths

**Bad:** "Create the model file"
**Good:** "Create: `src/models/user.py`"
## Execution Handoff

After saving the plan, offer the execution approach:

**"Plan complete and saved. Ready to execute using subagent-driven-development — I'll dispatch a fresh subagent per task with two-stage review (spec compliance then code quality). Shall I proceed?"**

When executing, use the `subagent-driven-development` skill:

- Fresh `delegate_task` per task with full context
- Spec compliance review after each task
- Code quality review after spec passes
- Proceed only when both reviews approve
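The per-task dispatch can be sketched as follows; the exact `delegate_task` signature and toolset names depend on your Hermes version, so treat every parameter here as illustrative:

```python
# Hypothetical parameters for dispatching one plan task to a subagent.
task_prompt = """\
Implement Task 3 of docs/plans/2026-03-03-user-auth.md.
Files: create src/models/user.py and tests/models/test_user.py.
Follow the TDD steps in the task exactly; commit once the tests pass.
"""

delegate_params = {
    "task": task_prompt,
    "toolsets": ["terminal", "read_file", "search_files"],  # assumed names
}
```

Each subagent gets a fresh context containing only its task, so the prompt must carry the plan path and file list explicitly.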
## Remember

```
Bite-sized tasks (2-5 min each)
Exact file paths
Complete code (copy-pasteable)
Exact commands with expected output
Verification steps
DRY, YAGNI, TDD
Frequent commits
```

**A good plan makes implementation obvious.**