refactor: enhance software-development skills with Hermes integration

Improvements to all 5 skills adapted from obra/superpowers:

- Restored anti-rationalization tables and red flags from originals
  (key behavioral guardrails that prevent LLMs from taking shortcuts)
- Restored 'Rule of Three' for debugging (3+ failed fixes = question
  architecture, not keep fixing)
- Restored Pattern Analysis and Hypothesis Testing phases in debugging
- Restored 'Why Order Matters' rebuttals and verification checklist in TDD
- Added proper Hermes delegate_task integration with real parameter examples
  and toolset specifications throughout
- Added Hermes tool usage (search_files, read_file, terminal) for
  investigation and verification steps
- Removed references to non-existent skills (brainstorming,
  finishing-a-development-branch, executing-plans, using-git-worktrees)
- Removed generic language-specific sections (Go, Rust, Jest) that
  added bulk without agent value
- Tightened prose — cut ~430 lines while adding more actionable content
- Added execution handoff section to writing-plans
- Consistent cross-references between the 5 skills
Author: teknium1
Date: 2026-03-03 04:08:56 -08:00
parent 0e1723ef74
commit de0af4df66
5 changed files with 879 additions and 1312 deletions


@@ -1,8 +1,8 @@
---
name: requesting-code-review
description: Use when completing tasks, implementing major features, or before merging. Validates work meets requirements through systematic review process.
version: 1.0.0
author: Hermes Agent (adapted from Superpowers)
version: 1.1.0
author: Hermes Agent (adapted from obra/superpowers)
license: MIT
metadata:
hermes:
@@ -14,108 +14,102 @@ metadata:
## Overview
Systematic code review catches issues before they cascade. Review early, review often.
Dispatch a reviewer subagent to catch issues before they cascade. Review early, review often.
**Core principle:** Fresh perspective finds issues you'll miss.
## When to Request Review
**Mandatory Reviews:**
**Mandatory:**
- After each task in subagent-driven development
- After completing major features
- After completing a major feature
- Before merge to main
- After bug fixes
**Optional but Valuable:**
**Optional but valuable:**
- When stuck (fresh perspective)
- Before refactoring (baseline check)
- After complex logic implementation
- When touching critical code (auth, payments, data)
**Don't skip because:**
- "It's simple" (simple bugs compound)
- "I'm in a hurry" (reviews save time)
- "I tested it" (you have blind spots)
**Never skip because:**
- "It's simple" — simple bugs compound
- "I'm in a hurry" — reviews save time
- "I tested it" — you have blind spots
## Review Process
### Step 1: Prepare Context
### Step 1: Self-Review First
Gather:
- What was implemented
- Original requirements/plan
- Files changed
- Test results
Before dispatching a reviewer, check yourself:
```bash
# Get changed files
git diff --name-only HEAD~1
# Get diff summary
git diff --stat HEAD~1
# Get commit messages
git log --oneline HEAD~5
```
### Step 2: Self-Review First
Before requesting external review:
**Checklist:**
- [ ] Code follows project conventions
- [ ] All tests pass
- [ ] No debug print statements
- [ ] No hardcoded secrets
- [ ] No debug print statements left
- [ ] No hardcoded secrets or credentials
- [ ] Error handling in place
- [ ] Documentation updated
- [ ] Commit messages are clear
```bash
# Run full test suite
pytest
pytest tests/ -q
# Check for debug code
grep -r "print(" src/ --include="*.py"
grep -r "debugger" src/ --include="*.js"
search_files("print(", path="src/", file_glob="*.py")
search_files("console.log", path="src/", file_glob="*.js")
# Check for TODOs
grep -r "TODO" src/ --include="*.py"
search_files("TODO|FIXME|HACK", path="src/")
```
### Step 3: Request Review
### Step 2: Gather Context
Use `delegate_task` to dispatch a reviewer subagent:
```bash
# Changed files
git diff --name-only HEAD~1
# Diff summary
git diff --stat HEAD~1
# Recent commits
git log --oneline -5
```
### Step 3: Dispatch Reviewer Subagent
Use `delegate_task` to dispatch a focused reviewer:
```python
delegate_task(
goal="Review implementation for quality and correctness",
goal="Review implementation for correctness and quality",
context="""
WHAT WAS IMPLEMENTED: [Brief description]
ORIGINAL REQUIREMENTS: [From plan or issue]
WHAT WAS IMPLEMENTED:
[Brief description of the feature/fix]
ORIGINAL REQUIREMENTS:
[From plan, issue, or user request]
FILES CHANGED:
- src/feature.py (added X function)
- tests/test_feature.py (added tests)
COMMIT RANGE: [SHA range or branch]
CHECK FOR:
- Correctness (does it do what it should?)
- Edge cases handled?
- Error handling adequate?
- Code quality and style
- Test coverage
- Security issues
- Performance concerns
- src/models/user.py (added User class)
- src/auth/login.py (added login endpoint)
- tests/test_auth.py (added 8 tests)
REVIEW CHECKLIST:
- [ ] Correctness: Does it do what it should?
- [ ] Edge cases: Are they handled?
- [ ] Error handling: Is it adequate?
- [ ] Code quality: Clear names, good structure?
- [ ] Test coverage: Are tests meaningful?
- [ ] Security: Any vulnerabilities?
- [ ] Performance: Any obvious issues?
OUTPUT FORMAT:
- Summary: [brief assessment]
- Critical Issues: [must fix]
- Important Issues: [should fix]
- Critical Issues: [must fix — blocks merge]
- Important Issues: [should fix before merge]
- Minor Issues: [nice to have]
- Verdict: [APPROVE / REQUEST_CHANGES / NEEDS_WORK]
- Strengths: [what was done well]
- Verdict: APPROVE / REQUEST_CHANGES
""",
toolsets=['file']
)
@@ -123,110 +117,72 @@ delegate_task(
### Step 4: Act on Feedback
**Critical Issues (Block merge):**
**Critical Issues (block merge):**
- Security vulnerabilities
- Broken functionality
- Data loss risk
- Test failures
- **Action:** Fix immediately before proceeding
**Action:** Fix immediately before proceeding
**Important Issues (Should fix):**
**Important Issues (should fix):**
- Missing edge case handling
- Poor error messages
- Unclear code
- Missing tests
- **Action:** Fix before merge if possible
**Action:** Fix before merge if possible
**Minor Issues (Nice to have):**
**Minor Issues (nice to have):**
- Style preferences
- Refactoring suggestions
- Documentation improvements
- **Action:** Note for later or quick fix
**Action:** Note for later or quick fix
**If reviewer is wrong:**
- Push back with technical reasoning
- Show code/tests that prove it works
- Request clarification
## Review Dimensions
### Correctness
**Questions:**
- Does it implement the requirements?
- Are there logic errors?
- Do edge cases work?
- Are there race conditions?
**Check:**
- Read implementation against requirements
- Trace through edge cases
- Check boundary conditions
### Code Quality
**Questions:**
- Is code readable?
- Are names clear?
- Is it too complex?
- Are names clear and descriptive?
- Is it too complex? (Functions >20 lines = smell)
- Is there duplication?
**Check:**
- Function length (aim <20 lines)
- Cyclomatic complexity
- DRY violations
- Naming clarity
### Testing
**Questions:**
- Are there tests?
- Are there meaningful tests?
- Do they cover edge cases?
- Are they meaningful?
- Do they pass?
**Check:**
- Test coverage
- Edge case coverage
- Test clarity
- Assertion quality
- Do they test behavior, not implementation?
- Do all tests pass?
### Security
**Questions:**
- Any injection vulnerabilities?
- Proper input validation?
- Secrets handled correctly?
- Access control in place?
**Check:**
- Input sanitization
- Authentication/authorization
- Secret management
- SQL/query safety
### Performance
**Questions:**
- Any N+1 queries?
- Unnecessary computation?
- Unnecessary computation in loops?
- Memory leaks?
- Scalability concerns?
**Check:**
- Database queries
- Algorithm complexity
- Resource usage
- Caching opportunities
- Missing caching opportunities?
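
The function-length smell called out above can be checked mechanically before dispatching a reviewer. A minimal sketch (assumes Python sources; the 20-line threshold mirrors the guideline in the Code Quality dimension):

```python
import ast

def long_functions(source: str, max_lines: int = 20) -> list[str]:
    """Return names of functions whose total span exceeds max_lines lines."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available on Python 3.8+
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                offenders.append(node.name)
    return offenders

sample = "def short():\n    return 1\n\ndef long_one():\n" + "    x = 1\n" * 25
print(long_functions(sample))  # → ['long_one']
```

A heuristic like this only flags candidates; the reviewer still decides whether a long function is actually too complex.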
## Review Output Format
Standard review format:
Standard format for reviewer subagent output:
```markdown
## Review Summary
**Assessment:** [Brief overall assessment]
**Verdict:** [APPROVE / REQUEST_CHANGES / NEEDS_WORK]
**Verdict:** APPROVE / REQUEST_CHANGES
---
@@ -237,8 +193,6 @@ Standard review format:
- Problem: [Description]
- Suggestion: [How to fix]
---
## Important Issues (Should Fix)
1. **[Issue title]**
@@ -246,15 +200,11 @@ Standard review format:
- Problem: [Description]
- Suggestion: [How to fix]
---
## Minor Issues (Optional)
1. **[Issue title]**
- Suggestion: [Improvement idea]
---
## Strengths
- [What was done well]
@@ -264,111 +214,41 @@ Standard review format:
### With subagent-driven-development
Review after EACH task:
1. Subagent implements task
2. Request code review
3. Fix issues
4. Proceed to next task
Review after EACH task — this is the two-stage review:
1. Spec compliance review (does it match the plan?)
2. Code quality review (is it well-built?)
3. Fix issues from either review
4. Proceed to next task only when both approve
### With test-driven-development
Review checks:
- Tests written first?
- Tests are meaningful?
Review verifies:
- Tests were written first (RED-GREEN-REFACTOR followed?)
- Tests are meaningful (not just asserting True)?
- Edge cases covered?
- All tests pass?
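
The "tests are meaningful (not just asserting True)" check can be partially automated. A rough sketch (a heuristic only, not a substitute for reading the tests; the patterns are illustrative):

```python
import re

# Patterns that usually indicate a test asserting nothing meaningful.
TRIVIAL = re.compile(r"assert\s+(True|1)\b|assertTrue\(True\)")

def trivial_assertions(test_source: str) -> list[int]:
    """Return 1-based line numbers that contain trivial assertions."""
    return [i for i, line in enumerate(test_source.splitlines(), start=1)
            if TRIVIAL.search(line)]

src = "def test_ok():\n    assert add(1, 2) == 3\n\ndef test_stub():\n    assert True\n"
print(trivial_assertions(src))  # → [5]
```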
### With writing-plans
Review validates:
- Implementation matches plan?
- Implementation matches the plan?
- All tasks completed?
- Quality standards met?
## Common Review Patterns
## Red Flags
### Pre-Merge Review
Before merging feature branch:
```bash
# Create review checkpoint
git log --oneline main..feature-branch
# Get summary of changes
git diff --stat main..feature-branch
# Request review
delegate_task(
goal="Pre-merge review of feature branch",
context="[changes, requirements, test results]"
)
# Address feedback
# Merge when approved
```
### Continuous Review
During subagent-driven development:
```python
# After each task
if task_complete:
review_result = request_review(task)
if review_result.has_critical_issues():
fix_issues(review_result.critical)
re_review()
proceed_to_next_task()
```
### Emergency Review
When fixing production bugs:
1. Fix with tests
2. Self-review
3. Quick peer review
4. Deploy
5. Full review post-deploy
## Review Best Practices
### As Review Requester
**Do:**
- Provide complete context
- Highlight areas of concern
- Ask specific questions
- Be responsive to feedback
- Fix issues promptly
**Don't:**
- Rush the reviewer
- Argue without evidence
- Ignore feedback
- Take criticism personally
### As Reviewer (via subagent)
**Do:**
- Be specific about issues
- Suggest improvements
- Acknowledge what works
- Prioritize issues
**Don't:**
- Nitpick style (unless project requires)
- Make vague comments
- Block without explanation
- Be overly critical
**Never:**
- Skip review because "it's simple"
- Ignore Critical issues
- Proceed with unfixed Important issues
- Argue with valid technical feedback without evidence
## Quality Gates
**Must pass before merge:**
- [ ] No critical issues
- [ ] All tests pass
- [ ] Review approved
- [ ] Review verdict: APPROVE
- [ ] Requirements met
**Should pass before merge:**
@@ -376,10 +256,6 @@ When fixing production bugs:
- [ ] Documentation updated
- [ ] Performance acceptable
**Nice to have:**
- [ ] No minor issues
- [ ] Extra polish
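
The must-pass gates above compose into a single boolean decision. A sketch (the function and parameter names are illustrative, not part of any Hermes API):

```python
def can_merge(critical_issues: int, all_tests_pass: bool,
              verdict: str, requirements_met: bool) -> bool:
    """Encode the must-pass gates: any single failure blocks the merge."""
    return (critical_issues == 0
            and all_tests_pass
            and verdict == "APPROVE"
            and requirements_met)

print(can_merge(0, True, "APPROVE", True))           # → True
print(can_merge(0, True, "REQUEST_CHANGES", True))   # → False
```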
## Remember
```


@@ -1,8 +1,8 @@
---
name: subagent-driven-development
description: Use when executing implementation plans with independent tasks. Dispatches fresh delegate_task per task with two-stage review (spec compliance then code quality).
version: 1.0.0
author: Hermes Agent (adapted from Superpowers)
version: 1.1.0
author: Hermes Agent (adapted from obra/superpowers)
license: MIT
metadata:
hermes:
@@ -16,33 +16,41 @@ metadata:
Execute implementation plans by dispatching fresh subagents per task with systematic two-stage review.
**Core principle:** Fresh subagent per task + two-stage review (spec then quality) = high quality, fast iteration
**Core principle:** Fresh subagent per task + two-stage review (spec then quality) = high quality, fast iteration.
## When to Use
Use this skill when:
- You have an implementation plan (from writing-plans skill)
- You have an implementation plan (from writing-plans skill or user requirements)
- Tasks are mostly independent
- You want to stay in the current session
- Quality and spec compliance are important
- You want automated review between tasks
**vs. Manual execution:**
- Parallel task execution possible
- Automated review process
- Consistent quality checks
- Better for complex multi-step plans
**vs. manual execution:**
- Fresh context per task (no confusion from accumulated state)
- Automated review process catches issues early
- Consistent quality checks across all tasks
- Subagents can ask questions before starting work
## The Process
### 1. Read and Parse Plan
```markdown
[Read plan file once: docs/plans/feature-plan.md]
[Extract all tasks with full text and context]
[Create todo list with all tasks]
Read the plan file. Extract ALL tasks with their full text and context upfront. Create a todo list:
```python
# Read the plan
read_file("docs/plans/feature-plan.md")
# Create todo list with all tasks
todo([
{"id": "task-1", "content": "Create User model with email field", "status": "pending"},
{"id": "task-2", "content": "Add password hashing utility", "status": "pending"},
{"id": "task-3", "content": "Create login endpoint", "status": "pending"},
])
```
**Action:** Read plan, extract tasks, create todo list.
**Key:** Read the plan ONCE. Extract everything. Don't make subagents read the plan file — provide the full task text directly in context.
### 2. Per-Task Workflow
@@ -50,124 +58,137 @@ For EACH task in the plan:
#### Step 1: Dispatch Implementer Subagent
Use `delegate_task` with:
- **goal:** Implement [specific task from plan]
- **context:** Full task description from plan, project structure, relevant files
- **toolsets:** ['terminal', 'file', 'web'] (or as needed)
Use `delegate_task` with complete context:
**Example:**
```python
# Task: Add user authentication middleware
delegate_task(
goal="Implement JWT authentication middleware as specified in Task 3 of the plan",
goal="Implement Task 1: Create User model with email and password_hash fields",
context="""
Task from plan:
- Create: src/middleware/auth.py
- Validate JWT tokens from Authorization header
- Return 401 for invalid tokens
- Attach user info to request object
Project structure:
- Flask app in src/app.py
- Uses PyJWT library
- Existing middleware pattern in src/middleware/
TASK FROM PLAN:
- Create: src/models/user.py
- Add User class with email (str) and password_hash (str) fields
- Use bcrypt for password hashing
- Include __repr__ for debugging
FOLLOW TDD:
1. Write failing test in tests/models/test_user.py
2. Run: pytest tests/models/test_user.py -v (verify FAIL)
3. Write minimal implementation
4. Run: pytest tests/models/test_user.py -v (verify PASS)
5. Run: pytest tests/ -q (verify no regressions)
6. Commit: git add -A && git commit -m "feat: add User model with password hashing"
PROJECT CONTEXT:
- Python 3.11, Flask app in src/app.py
- Existing models in src/models/
- Tests use pytest, run from project root
- bcrypt already in requirements.txt
""",
toolsets=['terminal', 'file']
)
```
#### Step 2: Implementer Subagent Works
#### Step 2: Dispatch Spec Compliance Reviewer
The subagent will:
1. Ask questions if needed (you answer)
2. Implement the task following TDD
3. Write tests
4. Run tests to verify
5. Self-review
6. Report completion
**Your role:** Answer questions, provide context.
#### Step 3: Spec Compliance Review
Dispatch reviewer subagent:
After the implementer completes, verify against the original spec:
```python
delegate_task(
goal="Review if implementation matches spec from plan",
goal="Review if implementation matches the spec from the plan",
context="""
Original task spec: [copy from plan]
Implementation: [file paths and key code]
Check:
- All requirements from spec implemented?
- File paths match spec?
- Behavior matches spec?
- Nothing extra added?
ORIGINAL TASK SPEC:
- Create src/models/user.py with User class
- Fields: email (str), password_hash (str)
- Use bcrypt for password hashing
- Include __repr__
CHECK:
- [ ] All requirements from spec implemented?
- [ ] File paths match spec?
- [ ] Function signatures match spec?
- [ ] Behavior matches expected?
- [ ] Nothing extra added (no scope creep)?
OUTPUT: PASS or list of specific spec gaps to fix.
""",
toolsets=['file']
)
```
**If spec issues found:**
- Subagent fixes gaps
- Re-run spec review
- Continue only when spec-compliant
**If spec issues found:** Fix gaps, then re-run spec review. Continue only when spec-compliant.
#### Step 4: Code Quality Review
#### Step 3: Dispatch Code Quality Reviewer
Dispatch quality reviewer:
After spec compliance passes:
```python
delegate_task(
goal="Review code quality and best practices",
goal="Review code quality for Task 1 implementation",
context="""
Code to review: [file paths]
Check:
- Follows project style?
- Proper error handling?
- Good naming?
- Test coverage adequate?
- No obvious bugs?
FILES TO REVIEW:
- src/models/user.py
- tests/models/test_user.py
CHECK:
- [ ] Follows project conventions and style?
- [ ] Proper error handling?
- [ ] Clear variable/function names?
- [ ] Adequate test coverage?
- [ ] No obvious bugs or missed edge cases?
- [ ] No security issues?
OUTPUT FORMAT:
- Critical Issues: [must fix before proceeding]
- Important Issues: [should fix]
- Minor Issues: [optional]
- Verdict: APPROVED or REQUEST_CHANGES
""",
toolsets=['file']
)
```
**If quality issues found:**
- Subagent fixes issues
- Re-run quality review
- Continue only when approved
**If quality issues found:** Fix issues, re-review. Continue only when approved.
#### Step 5: Mark Complete
#### Step 4: Mark Complete
Update todo list, mark task complete.
```python
todo([{"id": "task-1", "content": "Create User model with email field", "status": "completed"}], merge=True)
```
### 3. Final Review
After ALL tasks complete:
After ALL tasks are complete, dispatch a final integration reviewer:
```python
delegate_task(
goal="Review entire implementation for consistency",
context="All tasks completed, review for integration issues",
toolsets=['file']
goal="Review the entire implementation for consistency and integration issues",
context="""
All tasks from the plan are complete. Review the full implementation:
- Do all components work together?
- Any inconsistencies between tasks?
- All tests passing?
- Ready for merge?
""",
toolsets=['terminal', 'file']
)
```
### 4. Branch Cleanup
### 4. Verify and Commit
Use `finishing-a-development-branch` skill:
- Verify all tests pass
- Present merge options
- Clean up worktree
```bash
# Run full test suite
pytest tests/ -q
# Review all changes
git diff --stat
# Final commit if needed
git add -A && git commit -m "feat: complete [feature name] implementation"
```
## Task Granularity
**Good task size:** 2-5 minutes of focused work
**Examples:**
**Each task = 2-5 minutes of focused work.**
**Too big:**
- "Implement user authentication system"
@@ -177,188 +198,134 @@ Use `finishing-a-development-branch` skill:
- "Add password hashing function"
- "Create login endpoint"
- "Add JWT token generation"
- "Create registration endpoint"
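
A quick way to sanity-check granularity is to write the split out as the same todo payload shown earlier, before dispatching anything. A sketch of the decomposition above (task text is illustrative):

```python
# "Implement user authentication system" decomposed into 2-5 minute tasks,
# in the same shape the todo() examples above use.
tasks = [
    {"id": f"task-{i}", "content": content, "status": "pending"}
    for i, content in enumerate([
        "Create User model with email field",
        "Add password hashing function",
        "Create login endpoint",
        "Add JWT token generation",
        "Create registration endpoint",
    ], start=1)
]
print(len(tasks), tasks[0]["id"])  # → 5 task-1
```

If any entry still reads like a feature rather than one focused change, split it again.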
## Communication Pattern
## Red Flags — Never Do These
### You to Subagent
**Provide:**
- Clear task description
- Exact file paths
- Expected behavior
- Success criteria
- Relevant context
**Example:**
```
Task: Add email validation
Files: Create src/validators/email.py
Expected: Function returns True for valid emails, False for invalid
Success: Tests pass for 10 test cases including edge cases
Context: Used in user registration flow
```
### Subagent to You
**Expect:**
- Questions for clarification
- Progress updates
- Completion report
- Self-review summary
**Respond to:**
- Answer questions promptly
- Provide missing context
- Approve approach decisions
## Two-Stage Review Details
### Stage 1: Spec Compliance
**Checks:**
- [ ] All requirements from plan implemented
- [ ] File paths match specification
- [ ] Function signatures match spec
- [ ] Behavior matches expected
- [ ] No scope creep (nothing extra)
**Output:** PASS or list of spec gaps
### Stage 2: Code Quality
**Checks:**
- [ ] Follows language conventions
- [ ] Consistent with project style
- [ ] Clear variable/function names
- [ ] Proper error handling
- [ ] Adequate test coverage
- [ ] No obvious bugs/edge cases missed
- [ ] Documentation if needed
**Output:** APPROVED or list of issues (critical/important/minor)
- Start implementation without a plan
- Skip reviews (spec compliance OR code quality)
- Proceed with unfixed critical/important issues
- Dispatch multiple implementation subagents for tasks that touch the same files
- Make subagent read the plan file (provide full text in context instead)
- Skip scene-setting context (subagent needs to understand where the task fits)
- Ignore subagent questions (answer before letting them proceed)
- Accept "close enough" on spec compliance
- Skip review loops (reviewer found issues → implementer fixes → review again)
- Let implementer self-review replace actual review (both are needed)
- **Start code quality review before spec compliance is PASS** (wrong order)
- Move to next task while either review has open issues
## Handling Issues
### Critical Issues
### If Subagent Asks Questions
**Examples:** Security vulnerability, broken functionality, data loss risk
- Answer clearly and completely
- Provide additional context if needed
- Don't rush them into implementation
**Action:** Must fix before proceeding
### If Reviewer Finds Issues
### Important Issues
- Implementer subagent (or a new one) fixes them
- Reviewer reviews again
- Repeat until approved
- Don't skip the re-review
**Examples:** Missing tests, poor error handling, unclear code
### If Subagent Fails a Task
**Action:** Should fix before proceeding
- Dispatch a new fix subagent with specific instructions about what went wrong
- Don't try to fix manually in the controller session (context pollution)
### Minor Issues
## Efficiency Notes
**Examples:** Style inconsistency, minor refactoring opportunity
**Why fresh subagent per task:**
- Prevents context pollution from accumulated state
- Each subagent gets clean, focused context
- No confusion from prior tasks' code or reasoning
**Action:** Note for later, optional fix
**Why two-stage review:**
- Spec review catches under/over-building early
- Quality review ensures the implementation is well-built
- Catches issues before they compound across tasks
**Cost trade-off:**
- More subagent invocations (implementer + 2 reviewers per task)
- But catches issues early (cheaper than debugging compounded problems later)
## Integration with Other Skills
### With writing-plans
This skill EXECUTES plans created by the writing-plans skill:
1. User requirements → writing-plans → implementation plan
2. Implementation plan → subagent-driven-development → working code
### With test-driven-development
Subagent should:
Implementer subagents should follow TDD:
1. Write failing test first
2. Implement minimal code
3. Verify test passes
4. Commit
### With systematic-debugging
If subagent encounters bugs:
1. Pause implementation
2. Debug systematically
3. Fix root cause
4. Resume
### With writing-plans
This skill EXECUTES plans created by writing-plans skill.
**Sequence:**
1. brainstorming → writing-plans → subagent-driven-development
Include TDD instructions in every implementer context.
### With requesting-code-review
After subagent completes task, use requesting-code-review skill for final validation.
The two-stage review process IS the code review. For final integration review, use the requesting-code-review skill's review dimensions.
## Common Patterns
### With systematic-debugging
### Pattern: Fresh Subagent Per Task
**Why:** Prevents context pollution
**How:** New delegate_task for each task
**Result:** Each subagent has clean context
### Pattern: Two-Stage Review
**Why:** Catch issues early, ensure quality
**How:** Spec review → Quality review
**Result:** High-quality, spec-compliant code
### Pattern: Frequent Checkpoints
**Why:** Catch issues before they compound
**How:** Review after each task
**Result:** Issues don't cascade
## Best Practices
1. **Clear Task Boundaries**
- One task = one focused change
- Independent where possible
- Clear success criteria
2. **Complete Context**
- Provide all needed files
- Explain project conventions
- Share relevant examples
3. **Review Discipline**
- Don't skip spec review
- Address critical issues immediately
- Keep quality bar consistent
4. **Communication**
- Answer subagent questions quickly
- Clarify when needed
- Provide feedback on reviews
If a subagent encounters bugs during implementation:
1. Follow systematic-debugging process
2. Find root cause before fixing
3. Write regression test
4. Resume implementation
## Example Workflow
```markdown
User: Implement user authentication
```
[Read plan: docs/plans/auth-feature.md]
[Create todo list with 5 tasks]
You: I'll use subagent-driven development. Let me create a plan first.
[Uses writing-plans skill]
--- Task 1: Create User model ---
[Dispatch implementer subagent]
Implementer: "Should email be unique?"
You: "Yes, email must be unique"
Implementer: Implemented, 3/3 tests passing, committed.
Plan created with 5 tasks:
1. Create User model
2. Add password hashing
3. Implement login endpoint
4. Add JWT middleware
5. Create registration endpoint
[Dispatch spec reviewer]
Spec reviewer: ✅ PASS — all requirements met
--- Task 1 ---
[Dispatch implementer subagent for User model]
[Subagent asks: "Should email be unique?"]
You: Yes, email must be unique
[Subagent implements]
[Dispatch spec reviewer - PASS]
[Dispatch quality reviewer - APPROVED]
Task 1 complete
[Dispatch quality reviewer]
Quality reviewer: ✅ APPROVED — clean code, good tests
--- Task 2 ---
[Dispatch implementer for password hashing]
...
[Mark Task 1 complete]
[After all tasks]
[Final review]
[Merge branch]
--- Task 2: Password hashing ---
[Dispatch implementer subagent]
Implementer: No questions, implemented, 5/5 tests passing.
[Dispatch spec reviewer]
Spec reviewer: ❌ Missing: password strength validation (spec says "min 8 chars")
[Implementer fixes]
Implementer: Added validation, 7/7 tests passing.
[Dispatch spec reviewer again]
Spec reviewer: ✅ PASS
[Dispatch quality reviewer]
Quality reviewer: Important: Magic number 8, extract to constant
Implementer: Extracted MIN_PASSWORD_LENGTH constant
Quality reviewer: ✅ APPROVED
[Mark Task 2 complete]
... (continue for all tasks)
[After all tasks: dispatch final integration reviewer]
[Run full test suite: all passing]
[Done!]
```
## Remember
@@ -366,8 +333,8 @@ Task 1 complete
```
Fresh subagent per task
Two-stage review every time
Spec compliance first
Code quality second
Spec compliance FIRST
Code quality SECOND
Never skip reviews
Catch issues early
```


@@ -1,8 +1,8 @@
---
name: systematic-debugging
description: Use when encountering any bug, test failure, or unexpected behavior. 4-phase root cause investigation process - NO fixes without understanding the problem first.
version: 1.0.0
author: Hermes Agent (adapted from Superpowers)
description: Use when encountering any bug, test failure, or unexpected behavior. 4-phase root cause investigation — NO fixes without understanding the problem first.
version: 1.1.0
author: Hermes Agent (adapted from obra/superpowers)
license: MIT
metadata:
hermes:
@@ -48,7 +48,7 @@ Use for ANY technical issue:
**Don't skip when:**
- Issue seems simple (simple bugs have root causes too)
- You're in a hurry (rushing guarantees rework)
- Manager wants it fixed NOW (systematic is faster than thrashing)
- Someone wants it fixed NOW (systematic is faster than thrashing)
## The Four Phases
@@ -67,7 +67,7 @@ You MUST complete each phase before proceeding to the next.
- Read stack traces completely
- Note line numbers, file paths, error codes
**Action:** Copy full error message to your notes.
**Action:** Use `read_file` on the relevant source files. Use `search_files` to find the error string in the codebase.
### 2. Reproduce Consistently
@@ -76,16 +76,24 @@ You MUST complete each phase before proceeding to the next.
- Does it happen every time?
- If not reproducible → gather more data, don't guess
**Action:** Write down exact reproduction steps.
**Action:** Use the `terminal` tool to run the failing test or trigger the bug:
```bash
# Run specific failing test
pytest tests/test_module.py::test_name -v
# Run with verbose output
pytest tests/test_module.py -v --tb=long
```
### 3. Check Recent Changes
- What changed that could cause this?
- Git diff, recent commits
- New dependencies, config changes
- Environmental differences
**Commands:**
**Action:**
```bash
# Recent commits
git log --oneline -10
@@ -94,59 +102,50 @@ git log --oneline -10
git diff
# Changes in specific file
git log -p --follow src/problematic_file.py
git log -p --follow src/problematic_file.py | head -100
```
### 4. Gather Evidence in Multi-Component Systems
**WHEN system has multiple components (CI pipeline, API service, database layer):**
**WHEN system has multiple components (API → service → database, CI → build → deploy):**
**BEFORE proposing fixes, add diagnostic instrumentation:**
For EACH component boundary:
- Log what data enters component
- Log what data exits component
- Log what data enters the component
- Log what data exits the component
- Verify environment/config propagation
- Check state at each layer
Run once to gather evidence showing WHERE it breaks
THEN analyze evidence to identify failing component
THEN investigate that specific component
Run once to gather evidence showing WHERE it breaks.
THEN analyze evidence to identify the failing component.
THEN investigate that specific component.
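
One lightweight way to add the boundary instrumentation is a decorator rather than hand-placed prints. A sketch (assumes the component boundaries are plain Python functions; names are illustrative):

```python
import functools
import logging

logging.basicConfig(level=logging.DEBUG, format="%(message)s")

def trace_boundary(func):
    """Log what enters and exits a component boundary."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logging.debug("ENTER %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        logging.debug("EXIT  %s -> %r", func.__name__, result)
        return result
    return wrapper

@trace_boundary
def parse_config(raw: str) -> dict:
    # Example component: turns "key=value,key=value" into a dict.
    return dict(item.split("=") for item in raw.split(","))

parse_config("host=localhost,port=8080")
```

Decorate each boundary once, run the failing path once, and the log shows exactly where the data goes wrong.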
### 5. Trace Data Flow
**WHEN error is deep in the call stack:**
- Where does the bad value originate?
- What called this function with the bad value?
- Keep tracing upstream until you find the source
- Fix at the source, not at the symptom
**Action:** Use `search_files` to trace references:
**Example (multi-layer system):**
```python
# Layer 1: Entry point
def entry_point(input_data):
print(f"DEBUG: Input received: {input_data}")
result = process_layer1(input_data)
print(f"DEBUG: Layer 1 output: {result}")
return result
# Find where the function is called
search_files("function_name(", path="src/", file_glob="*.py")
# Layer 2: Processing
def process_layer1(data):
print(f"DEBUG: Layer 1 received: {data}")
# ... processing ...
print(f"DEBUG: Layer 1 returning: {result}")
return result
# Find where the variable is set
search_files("variable_name\\s*=", path="src/", file_glob="*.py")
```
**Action:** Add logging, run once, analyze output.
### 5. Isolate the Problem
- Comment out code until problem disappears
- Binary search through recent changes
- Create minimal reproduction case
- Test with fresh environment
**Action:** Create minimal reproduction case.
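
Binary search through recent changes is what `git bisect` automates, and the underlying logic is easy to sketch. A minimal version (assumes changes are ordered oldest-first and that you supply an `is_broken` predicate):

```python
def first_bad(changes, is_broken):
    """Binary-search an ordered history for the first change that breaks things.

    Assumes everything before the first bad change is good and everything
    from it onward is bad (the same invariant git bisect relies on).
    """
    lo, hi = 0, len(changes) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_broken(changes[mid]):
            hi = mid          # bad: first bad change is at mid or earlier
        else:
            lo = mid + 1      # good: first bad change is after mid
    return changes[lo]

history = ["c1", "c2", "c3", "c4", "c5"]
print(first_bad(history, lambda c: c >= "c4"))  # → c4
```

With N recent changes this takes log2(N) checks instead of N, which is why bisection beats rereading every commit.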
### Phase 1 Completion Checklist
- [ ] Error messages fully read and understood
- [ ] Issue reproduced consistently
- [ ] Recent changes identified and reviewed
- [ ] Evidence gathered (logs, state)
- [ ] Evidence gathered (logs, state, data flow)
- [ ] Problem isolated to specific component/code
- [ ] Root cause hypothesis formed
@@ -154,290 +153,214 @@ def process_layer1(data):
---
## Phase 2: Solution Design

**Given the root cause, design the fix:**

### 1. Understand the Fix Area

- Read relevant code thoroughly
- Understand data flow
- Identify affected components
- Check for similar issues elsewhere

**Action:** Read all relevant code files.

### 2. Design Minimal Fix

- Smallest change that fixes root cause
- Avoid scope creep
- Don't refactor while fixing
- Fix one issue at a time

**Action:** Write down the exact fix before coding.

### 3. Consider Side Effects

- What else could this change affect?
- Are there dependencies?
- Will this break other functionality?

**Action:** Identify potential side effects.

### Phase 2 Completion Checklist

- [ ] Fix area code fully understood
- [ ] Minimal fix designed
- [ ] Side effects identified
- [ ] Fix approach documented

## Phase 2: Pattern Analysis

**Find the pattern before fixing:**

### 1. Find Working Examples

- Locate similar working code in the same codebase
- What works that's similar to what's broken?

**Action:** Use `search_files` to find comparable patterns:

```python
search_files("similar_pattern", path="src/", file_glob="*.py")
```

### 2. Compare Against References

- If implementing a pattern, read the reference implementation COMPLETELY
- Don't skim — read every line
- Understand the pattern fully before applying

### 3. Identify Differences

- What's different between working and broken?
- List every difference, however small
- Don't assume "that can't matter"

### 4. Understand Dependencies

- What other components does this need?
- What settings, config, environment?
- What assumptions does it make?
---
## Phase 3: Implementation

**Make the fix:**

### 1. Write Test First (if possible)

```python
def test_should_handle_empty_input():
    """Regression test for bug #123"""
    result = process_data("")
    assert result == expected_empty_result
```

### 2. Implement Fix

```python
# Before (buggy)
def process_data(data):
    return data.split(",")[0]

# After (fixed)
def process_data(data):
    if not data:
        return ""
    return data.split(",")[0]
```

### 3. Verify Fix

```bash
# Run the specific test
pytest tests/test_data.py::test_should_handle_empty_input -v

# Run all tests to check for regressions
pytest
```

### Phase 3 Completion Checklist

- [ ] Test written that reproduces the bug
- [ ] Minimal fix implemented
- [ ] Test passes
- [ ] No regressions introduced

## Phase 3: Hypothesis and Testing

**Scientific method:**

### 1. Form a Single Hypothesis

- State clearly: "I think X is the root cause because Y"
- Write it down
- Be specific, not vague

### 2. Test Minimally

- Make the SMALLEST possible change to test the hypothesis
- One variable at a time
- Don't fix multiple things at once

### 3. Verify Before Continuing

- Did it work? → Phase 4
- Didn't work? → Form NEW hypothesis
- DON'T add more fixes on top

### 4. When You Don't Know

- Say "I don't understand X"
- Don't pretend to know
- Ask the user for help
- Research more

---

## Phase 4: Implementation

**Fix the root cause, not the symptom:**

### 1. Create Failing Test Case

- Simplest possible reproduction
- Automated test if possible
- MUST have before fixing
- Use the `test-driven-development` skill

### 2. Implement Single Fix

- Address the root cause identified
- ONE change at a time
- No "while I'm here" improvements
- No bundled refactoring

### 3. Verify Fix

```bash
# Run the specific regression test
pytest tests/test_module.py::test_regression -v

# Run full suite — no regressions
pytest tests/ -q
```

### 4. If Fix Doesn't Work — The Rule of Three

- **STOP.**
- Count: How many fixes have you tried?
- If < 3: Return to Phase 1, re-analyze with new information
- **If ≥ 3: STOP and question the architecture (step 5 below)**
- DON'T attempt Fix #4 without architectural discussion
### 5. If 3+ Fixes Failed: Question Architecture
**Pattern indicating an architectural problem:**
- Each fix reveals new shared state/coupling in a different place
- Fixes require "massive refactoring" to implement
- Each fix creates new symptoms elsewhere
**STOP and question fundamentals:**
- Is this pattern fundamentally sound?
- Are we "sticking with it through sheer inertia"?
- Should we refactor the architecture vs. continue fixing symptoms?
**Discuss with the user before attempting more fixes.**
This is NOT a failed hypothesis — this is a wrong architecture.
---
## Phase 4: Verification

**Confirm it's actually fixed:**

### 1. Reproduce Original Issue

- Follow original reproduction steps
- Verify issue is resolved
- Test edge cases

### 2. Regression Testing

```bash
# Full test suite
pytest

# Integration tests
pytest tests/integration/

# Check related areas
pytest -k "related_feature"
```

## Red Flags — STOP and Follow Process

If you catch yourself thinking:
- "Quick fix for now, investigate later"
- "Just try changing X and see if it works"
- "Add multiple changes, run tests"
- "Skip the test, I'll manually verify"
- "It's probably X, let me fix that"
- "I don't fully understand but this might work"
- "Pattern says X but I'll adapt it differently"
- "Here are the main problems: [lists fixes without investigation]"
- Proposing solutions before tracing data flow
- **"One more fix attempt" (when already tried 2+)**
- **Each fix reveals a new problem in a different place**

**ALL of these mean: STOP. Return to Phase 1.**

**If 3+ fixes failed:** Question the architecture (Phase 4 step 5).

## Common Rationalizations

| Excuse | Reality |
|--------|---------|
| "Issue is simple, don't need process" | Simple issues have root causes too. Process is fast for simple bugs. |
| "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. |
| "Just try this first, then investigate" | First fix sets the pattern. Do it right from the start. |
| "I'll write test after confirming fix works" | Untested fixes don't stick. Test first proves it. |
| "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
| "Reference too long, I'll adapt the pattern" | Partial understanding guarantees bugs. Read it completely. |
| "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. |
| "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Question the pattern, don't fix again. |

## Quick Reference
| Phase | Key Activities | Success Criteria |
|-------|---------------|------------------|
| **1. Root Cause** | Read errors, reproduce, check changes, gather evidence, trace data flow | Understand WHAT and WHY |
| **2. Pattern** | Find working examples, compare, identify differences | Know what's different |
| **3. Hypothesis** | Form theory, test minimally, one variable at a time | Confirmed or new hypothesis |
| **4. Implementation** | Create regression test, fix root cause, verify | Bug resolved, all tests pass |
## Hermes Agent Integration

### Investigation Tools

Use these Hermes tools during Phase 1:

- **`search_files`** — Find error strings, trace function calls, locate patterns
- **`read_file`** — Read source code with line numbers for precise analysis
- **`terminal`** — Run tests, check git history, reproduce bugs
- **`web_search`/`web_extract`** — Research error messages, library docs

### With delegate_task

For complex multi-component debugging, dispatch investigation subagents:

```python
delegate_task(
    goal="Investigate why [specific test/behavior] fails",
    context="""
Follow systematic-debugging skill:
1. Read the error message carefully
2. Reproduce the issue
3. Trace the data flow to find root cause
4. Report findings — do NOT fix yet

Error: [paste full error]
File: [path to failing code]
Test command: [exact command]
""",
    toolsets=['terminal', 'file']
)
```

### 3. Monitor After Deploy

- Watch logs for related errors
- Check metrics
- Verify fix in production

### Phase 4 Completion Checklist

- [ ] Original issue cannot be reproduced
- [ ] All tests pass
- [ ] No new warnings/errors
- [ ] Fix documented (commit message, comments)

---

## Debugging Techniques

### Root Cause Tracing

Ask "why" 5 times:
1. Why did it fail? → Null pointer
2. Why was it null? → Function returned null
3. Why did the function return null? → Missing validation
4. Why was validation missing? → Assumed input always valid
5. Why was that assumption wrong? → API changed

**Root cause:** API change not accounted for

### Defense in Depth

Don't fix just the symptom:

**Bad:** Add null check at crash site

**Good:**
1. Add validation at API boundary
2. Add null check at crash site
3. Add test for both
4. Document API behavior

### Condition-Based Waiting

For timing/race conditions:

```python
# Bad - arbitrary sleep
import time
time.sleep(5)  # "Should be enough"

# Good - wait for condition
import requests
from tenacity import retry, wait_exponential, stop_after_attempt

@retry(wait=wait_exponential(multiplier=1, min=4, max=10),
       stop=stop_after_attempt(5))
def wait_for_service():
    response = requests.get(health_url)
    assert response.status_code == 200
```
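The "Defense in Depth" idea from this section (validate at the boundary, guard at the use site, test both) can be sketched as follows; `fetch_user` and `greet` are hypothetical names for illustration:

```python
def fetch_user(api_payload):
    # Layer 1: validate at the API boundary, fail loudly on bad input
    if not isinstance(api_payload, dict) or "user" not in api_payload:
        raise ValueError(f"malformed payload: {api_payload!r}")
    return api_payload["user"]

def greet(user):
    # Layer 2: guard at the use site (the original crash site)
    if user is None:
        return "Hello, guest"
    return f"Hello, {user['name']}"

# Layer 3: tests covering both layers
assert greet(None) == "Hello, guest"
assert greet(fetch_user({"user": {"name": "Ada"}})) == "Hello, Ada"
```

Either layer alone would hide the other's failure mode; together they localize bad data to its source.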
---
## Common Debugging Pitfalls
### Fix Without Understanding
**Symptom:** "Just add a try/catch"
**Problem:** Masks the real issue
**Solution:** Complete Phase 1 before any fix
### Shotgun Debugging
**Symptom:** Change 5 things at once
**Problem:** Don't know what fixed it
**Solution:** One change at a time, verify each
### Premature Optimization
**Symptom:** Rewrite while debugging
**Problem:** Introduces new bugs
**Solution:** Fix first, refactor later
### Assuming Environment
**Symptom:** "Works on my machine"
**Problem:** Environment differences
**Solution:** Check environment variables, versions, configs
---
## Language-Specific Tools
### Python
```python
# Add debugger
import pdb; pdb.set_trace()
# Or use ipdb for better experience
import ipdb; ipdb.set_trace()
# Log state
import logging
logging.debug(f"Variable state: {variable}")
# Stack trace
import traceback
traceback.print_exc()
```
### JavaScript/TypeScript
```javascript
// Debugger
debugger;
// Console with context
console.log("State:", { var1, var2, var3 });
// Stack trace
console.trace("Here");
// Error with context
throw new Error(`Failed with input: ${JSON.stringify(input)}`);
```
### Go
```go
// Print state
fmt.Printf("Debug: variable=%+v\n", variable)
// Stack trace
import "runtime/debug"
debug.PrintStack()
// Panic with context
if err != nil {
panic(fmt.Sprintf("unexpected error: %v", err))
}
```
---
## Integration with Other Skills

### With test-driven-development

When fixing bugs:
1. Write a test that reproduces the bug (RED)
2. Debug systematically to find root cause
3. Fix the root cause (GREEN)
4. The test proves the fix and prevents regression

### With writing-plans

Include debugging tasks in plans:
- "Add diagnostic logging"
- "Create reproduction test"
- "Verify fix resolves issue"

### With subagent-driven-development

If subagent gets stuck:
1. Switch to systematic debugging
2. Analyze root cause
3. Provide findings to subagent
4. Resume implementation

---

## Remember

```
PHASE 1: Investigate → Understand WHY
PHASE 2: Design → Plan the fix
PHASE 3: Implement → Make the fix
PHASE 4: Verify → Confirm it's fixed
```

## Real-World Impact

From debugging sessions:
- Systematic approach: 15-30 minutes to fix
- Random fixes approach: 2-3 hours of thrashing
- First-time fix rate: 95% vs 40%
- New bugs introduced: Near zero vs common

**No shortcuts. No guessing. Systematic always wins.**

View File

@@ -1,8 +1,8 @@
---
name: test-driven-development
description: Use when implementing any feature or bugfix, before writing implementation code. Enforces RED-GREEN-REFACTOR cycle with test-first approach.
version: 1.0.0
author: Hermes Agent (adapted from Superpowers)
version: 1.1.0
author: Hermes Agent (adapted from obra/superpowers)
license: MIT
metadata:
hermes:
@@ -28,7 +28,7 @@ Write the test first. Watch it fail. Write minimal code to pass.
- Refactoring
- Behavior changes
**Exceptions (ask your human partner):**
**Exceptions (ask the user first):**
- Throwaway prototypes
- Generated code
- Configuration files
@@ -53,11 +53,11 @@ Implement fresh from tests. Period.
## Red-Green-Refactor Cycle
### RED - Write Failing Test
### RED — Write Failing Test
Write one minimal test showing what should happen.
**Good Example:**
**Good test:**
```python
def test_retries_failed_operations_3_times():
attempts = 0
@@ -67,43 +67,51 @@ def test_retries_failed_operations_3_times():
if attempts < 3:
raise Exception('fail')
return 'success'
result = retry_operation(operation)
assert result == 'success'
assert attempts == 3
```
- Clear name, tests real behavior, one thing
Clear name, tests real behavior, one thing.
**Bad Example:**
**Bad test:**
```python
def test_retry_works():
mock = MagicMock()
mock.side_effect = [Exception(), Exception(), 'success']
result = retry_operation(mock)
assert result == 'success' # What about retry count? Timing?
```
- Vague name, mocks behavior not reality, unclear what it tests
Vague name, tests mock not real code.
**Requirements:**
- One behavior per test
- Clear descriptive name ("and" in name? Split it)
- Real code, not mocks (unless truly unavoidable)
- Name describes behavior, not implementation

### Verify RED — Watch It Fail

**MANDATORY. Never skip.**

```bash
# Use terminal tool to run the specific test
pytest tests/test_feature.py::test_specific_behavior -v
```

Confirm:
- Test fails (not errors from typos)
- Failure message is expected
- Fails because the feature is missing

**Test passes immediately?** You're testing existing behavior. Fix the test.
**Test errors?** Fix the error, re-run until it fails correctly.

### Verify RED

Run the test. It MUST fail.

**If it passes:**
- Test is wrong (not testing what you think)
- Code already exists (delete it, start over)
- Wrong test file running

**What to check:**
- Error message makes sense
- Fails for expected reason
- Stack trace points to right place

### GREEN — Minimal Code

Write the simplest code to pass the test. Nothing more.
**Good:**
```python
@@ -119,266 +127,216 @@ def add(a, b):
return result
```
Don't add features, refactor other code, or "improve" beyond the test.

**Cheating is OK in GREEN:**
- Hardcode return values
- Copy-paste
- Duplicate code
- Skip edge cases

We'll fix it in REFACTOR.
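For instance, a hardcoded return is a legitimate first GREEN step; `total_price` here is a hypothetical example. The test passes now, and the next test (multiple items) will force the real implementation:

```python
# Test drives the behavior
def test_total_price_of_single_item():
    assert total_price([("book", 12)]) == 12

# GREEN: hardcoded return is fine for now; the next test
# will make this fake implementation impossible to keep
def total_price(items):
    return 12

test_total_price_of_single_item()
```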
### Verify GREEN — Watch It Pass

**MANDATORY.**

```bash
# Run the specific test
pytest tests/test_feature.py::test_specific_behavior -v

# Then run ALL tests to check for regressions
pytest tests/ -q
```

Confirm:
- Test passes
- Other tests still pass
- Output pristine (no errors, warnings)

**Test fails?** Fix the code, not the test.
**Other tests fail?** Fix regressions now.

### REFACTOR — Clean Up

After green only:
- Remove duplication
- Improve names
- Extract helpers
- Simplify expressions

Keep tests green throughout. Don't add behavior.

**If tests fail during refactor:** Undo immediately. Take smaller steps.

### Verify GREEN

Run tests. All must pass.

**If fails:**
- Fix minimal code
- Don't expand scope
- Stay in GREEN

### REFACTOR - Clean Up

Now improve the code while keeping tests green.

**Safe refactorings:**
- Rename variables/functions
- Extract helper functions
- Improve readability

**Golden rule:** Tests stay green throughout.

**If tests fail during refactor:**
- Undo immediately
- Smaller refactoring steps
- Check you didn't change behavior
## Implementation Workflow

### 1. Create Test File

```bash
# Python
touch tests/test_feature.py

# JavaScript
touch tests/feature.test.js

# Rust
touch tests/feature_tests.rs
```

### 2. Write First Failing Test

### Repeat

Next failing test for next behavior. One cycle at a time.

## Why Order Matters

**"I'll write tests after to verify it works"**

Tests written after code pass immediately. Passing immediately proves nothing:
- Might test the wrong thing
- Might test implementation, not behavior
- Might miss edge cases you forgot
- You never saw it catch the bug

Test-first forces you to see the test fail, proving it actually tests something.
**"I already manually tested all the edge cases"**
Manual testing is ad-hoc. You think you tested everything but:
- No record of what you tested
- Can't re-run when code changes
- Easy to forget cases under pressure
- "It worked when I tried it" ≠ comprehensive
Automated tests are systematic. They run the same way every time.
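For example, the manual check "I tried it with an empty string and it looked fine" becomes a recorded, re-runnable test; `slugify` is a hypothetical function standing in for your code:

```python
def slugify(title):
    return "-".join(title.lower().split())

# The manual checks, recorded as tests that re-run on every change
def test_slugify_basic():
    assert slugify("Hello World") == "hello-world"

def test_slugify_empty_string():
    assert slugify("") == ""

test_slugify_basic()
test_slugify_empty_string()
```

The next time `slugify` changes, both cases re-run automatically; no one has to remember the empty-string check.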
**"Deleting X hours of work is wasteful"**
Sunk cost fallacy. The time is already gone. Your choice now:
- Delete and rewrite with TDD (high confidence)
- Keep it and add tests after (low confidence, likely bugs)
The "waste" is keeping code you can't trust.
**"TDD is dogmatic, being pragmatic means adapting"**
TDD IS pragmatic:
- Finds bugs before commit (faster than debugging after)
- Prevents regressions (tests catch breaks immediately)
- Documents behavior (tests show how to use code)
- Enables refactoring (change freely, tests catch breaks)
"Pragmatic" shortcuts = debugging in production = slower.
**"Tests after achieve the same goals — it's spirit not ritual"**
No. Tests-after answer "What does this do?" Tests-first answer "What should this do?"
Tests-after are biased by your implementation. You test what you built, not what's required. Tests-first force edge case discovery before implementing.
## Common Rationalizations
| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
| "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
| "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
| "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
| "Need to explore first" | Fine. Throw away exploration, start with TDD. |
| "Test hard = design unclear" | Listen to the test. Hard to test = hard to use. |
| "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. |
| "Manual test faster" | Manual doesn't prove edge cases. You'll re-test every change. |
| "Existing code has no tests" | You're improving it. Add tests for the code you touch. |
## Red Flags — STOP and Start Over
If you catch yourself doing any of these, delete the code and restart with TDD:
- Code before test
- Test after implementation
- Test passes immediately on first run
- Can't explain why test failed
- Tests added "later"
- Rationalizing "just this once"
- "I already manually tested it"
- "Tests after achieve the same purpose"
- "Keep as reference" or "adapt existing code"
- "Already spent X hours, deleting is wasteful"
- "TDD is dogmatic, I'm being pragmatic"
- "This is different because..."
**All of these mean: Delete code. Start over with TDD.**
## Verification Checklist
Before marking work complete:
- [ ] Every new function/method has a test
- [ ] Watched each test fail before implementing
- [ ] Each test failed for expected reason (feature missing, not typo)
- [ ] Wrote minimal code to pass each test
- [ ] All tests pass
- [ ] Output pristine (no errors, warnings)
- [ ] Tests use real code (mocks only if unavoidable)
- [ ] Edge cases and errors covered
Can't check all boxes? You skipped TDD. Start over.
## When Stuck
| Problem | Solution |
|---------|----------|
| Don't know how to test | Write the wished-for API. Write the assertion first. Ask the user. |
| Test too complicated | Design too complicated. Simplify the interface. |
| Must mock everything | Code too coupled. Use dependency injection. |
| Test setup huge | Extract helpers. Still complex? Simplify the design. |
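The "must mock everything" row deserves a sketch: inject the collaborator instead of patching it, and the test supplies a fake. Names here are hypothetical:

```python
# Coupled version (hard to test without patching):
#   def send_welcome(user): http_post("https://mail.example/send", user)

# Decoupled: the sender is injected, so tests pass a fake
def send_welcome(user, send):
    send(f"Welcome, {user}!")

# Test with a plain list as the fake sender
sent = []
send_welcome("Ada", sent.append)
assert sent == ["Welcome, Ada!"]
```

Production code passes the real sender; the test never touches the network.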
## Hermes Agent Integration
### Running Tests
Use the `terminal` tool to run tests at each step:

```python
# RED — verify failure
terminal("pytest tests/test_feature.py::test_name -v")

# GREEN — verify pass
terminal("pytest tests/test_feature.py::test_name -v")

# Full suite — verify no regressions
terminal("pytest tests/ -q")
```

### With delegate_task

When dispatching subagents for implementation, enforce TDD in the goal:

```python
delegate_task(
    goal="Implement [feature] using strict TDD",
    context="""
Follow test-driven-development skill:
1. Write failing test FIRST
2. Run test to verify it fails
3. Write minimal code to pass
4. Run test to verify it passes
5. Refactor if needed
6. Commit

Project test command: pytest tests/ -q
Project structure: [describe relevant files]
""",
    toolsets=['terminal', 'file']
)
```

```python
# tests/test_calculator.py
def test_adds_two_numbers():
    calc = Calculator()
    result = calc.add(2, 3)
    assert result == 5
```

### 3. Run and Verify Failure

```bash
pytest tests/test_calculator.py -v
# Expected: FAIL - Calculator not defined
```

### 4. Write Minimal Implementation

```python
# src/calculator.py
class Calculator:
    def add(self, a, b):
        return a + b  # Minimal!
```

### 5. Run and Verify Pass

```bash
pytest tests/test_calculator.py -v
# Expected: PASS
```

### 6. Commit

```bash
git add tests/test_calculator.py src/calculator.py
git commit -m "feat: add calculator with add method"
```

### 7. Next Test

```python
def test_adds_negative_numbers():
    calc = Calculator()
    result = calc.add(-2, -3)
    assert result == -5
```

Repeat cycle.
## Testing Anti-Patterns
### Mocking What You Don't Own
**Bad:** Mock database, HTTP client, file system
**Good:** Abstract behind interface, test interface
### Testing Implementation Details
**Bad:** Test that function was called
**Good:** Test the result/behavior
### Happy Path Only
**Bad:** Only test expected inputs
**Good:** Test edge cases, errors, boundaries
### Brittle Tests
**Bad:** Test breaks when refactoring
**Good:** Tests verify behavior, not structure
## Common Pitfalls
### "I'll Write Tests After"
No, you won't. Write them first.
### "This is Too Simple to Test"
Simple bugs cause complex problems. Test everything.
### "I Need to See It Work First"
Temporary code becomes permanent. Test first.
### "Tests Take Too Long"
Untested code takes longer. Invest in tests.
## Language-Specific Commands
### Python (pytest)
```bash
# Run all tests
pytest
# Run specific test
pytest tests/test_feature.py::test_name -v
# Run with coverage
pytest --cov=src --cov-report=term-missing
# Watch mode
pytest-watch
```
### JavaScript (Jest)
```bash
# Run all tests
npm test
# Run specific test
npm test -- test_name
# Watch mode
npm test -- --watch
# Coverage
npm test -- --coverage
```
### TypeScript (Jest with ts-jest)
```bash
# Run tests
npx jest
# Run specific file
npx jest tests/feature.test.ts
```
### Go
```bash
# Run all tests
go test ./...
# Run specific test
go test -run TestName
# Verbose
go test -v
# Coverage
go test -cover
```
### Rust
```bash
# Run tests
cargo test
# Run specific test
cargo test test_name
# Show output
cargo test -- --nocapture
```
## Integration with Other Skills

### With writing-plans

Every plan task should specify:
- What test to write
- Expected test failure
- Minimal implementation
- Refactoring opportunities

### With systematic-debugging

When fixing bugs:
1. Write test that reproduces bug
2. Verify test fails (RED)
3. Fix bug (GREEN)
4. Refactor if needed

Bug found? Write failing test reproducing it. Follow TDD cycle. The test proves the fix and prevents regression. Never fix bugs without a test.

### With subagent-driven-development

Subagent implements one test at a time:
1. Write failing test
2. Minimal code to pass
3. Commit
4. Next test

## Testing Anti-Patterns

- **Testing mock behavior instead of real behavior** — mocks should verify interactions, not replace the system under test
- **Testing implementation details** — test behavior/results, not internal method calls
- **Happy path only** — always test edge cases, errors, and boundaries
- **Brittle tests** — tests should verify behavior, not structure; refactoring shouldn't break them

## Success Indicators

**You're doing TDD right when:**
- Tests fail before code exists
- You write <10 lines between test runs
- Refactoring feels safe
- Bugs are caught immediately
- Code is simpler than expected

**Red flags:**
- Writing 50+ lines without running tests
- Tests always pass
- Fear of refactoring
- "I'll test later"

## Remember

1. **RED** - Write failing test
2. **GREEN** - Minimal code to pass
3. **REFACTOR** - Clean up safely
4. **REPEAT** - Next behavior

## Final Rule

```
Production code → test exists and failed first
Otherwise → not TDD
```

**If you didn't see it fail, it doesn't test what you think.**

No exceptions without the user's explicit permission.

View File

@@ -1,8 +1,8 @@
---
name: writing-plans
description: Use when you have a spec or requirements for a multi-step task. Creates comprehensive implementation plans with bite-sized tasks, exact file paths, and complete code examples.
version: 1.0.0
author: Hermes Agent (adapted from Superpowers)
version: 1.1.0
author: Hermes Agent (adapted from obra/superpowers)
license: MIT
metadata:
hermes:
@@ -14,112 +14,34 @@ metadata:
## Overview
Transform specifications into actionable implementation plans. Write comprehensive plans that any developer can follow - even with zero context about your codebase.
Write comprehensive implementation plans assuming the implementer has zero context for the codebase and questionable taste. Document everything they need: which files to touch, complete code, testing commands, docs to check, how to verify. Give them bite-sized tasks. DRY. YAGNI. TDD. Frequent commits.
**Core principle:** Document everything: exact file paths, complete code, test commands, verification steps.
Assume the implementer is a skilled developer but knows almost nothing about the toolset or problem domain. Assume they don't know good test design very well.
**Assume the implementer:**
- Is a skilled developer
- Knows almost nothing about your codebase
- Has questionable taste in code style
- Needs explicit guidance
**Core principle:** A good plan makes implementation obvious. If someone has to guess, the plan is incomplete.
## When to Use
**Always use this skill:**
- Before implementing multi-step features
- After design approval (from brainstorming)
- When breaking down complex requirements
- Before delegating to subagents
**Always use before:**
- Implementing multi-step features
- Breaking down complex requirements
- Delegating to subagents via subagent-driven-development
**Don't skip when:**
- Feature seems simple (assumptions cause bugs)
- You plan to implement yourself (future you needs guidance)
- You plan to implement it yourself (future you needs guidance)
- Working alone (documentation matters)
## Bite-Sized Task Granularity

**Each task = 2-5 minutes of focused work.**

## Plan Document Structure

### Header (Required)

Every plan MUST start with:
```markdown
# [Feature Name] Implementation Plan
> **For Hermes:** Use subagent-driven-development or executing-plans skill to implement this plan.
**Goal:** One sentence describing what this builds
**Architecture:** 2-3 sentences about approach
**Tech Stack:** Key technologies/libraries
---
```
### Task Structure
Each task follows this format:
```markdown
### Task N: [Descriptive Name]
**Objective:** What this task accomplishes (one sentence)
**Files:**
- Create: `exact/path/to/new_file.py`
- Modify: `exact/path/to/existing.py:45-67` (line numbers if known)
- Test: `tests/path/to/test_file.py`
**Implementation Steps:**
**Step 1: [Action description]**
```python
# Complete code to write
class NewClass:
def __init__(self):
self.value = None
def process(self, input):
return input.upper()
```
**Step 2: [Action description]**
```bash
# Command to run
pytest tests/test_new.py -v
```
Expected: Tests pass with 3 green dots
**Step 3: [Action description]**
```python
# More code if needed
```
**Verification:**
- [ ] Test passes
- [ ] Function returns expected output
- [ ] No syntax errors
**Commit:**
```bash
git add src/new_file.py tests/test_new.py
git commit -m "feat: add new feature component"
```
```
## Task Granularity

**Each task = 2-5 minutes of work**

**Break down:**
- "Write failing test" - one task
- "Run test to verify it fails" - one task
- "Implement minimal code" - one task
- "Run test to verify pass" - one task
- "Commit" - one task

Every step is one action:
- "Write the failing test" — step
- "Run it to make sure it fails" — step
- "Implement the minimal code to make the test pass" — step
- "Run the tests and make sure they pass" — step
- "Commit" — step

**Examples:**

**Too big:**
```markdown
@@ -139,6 +61,146 @@ git commit -m "feat: add new feature component"
[15 lines, 1 file]
```
## Plan Document Structure
### Header (Required)
Every plan MUST start with:
```markdown
# [Feature Name] Implementation Plan
> **For Hermes:** Use subagent-driven-development skill to implement this plan task-by-task.
**Goal:** [One sentence describing what this builds]
**Architecture:** [2-3 sentences about approach]
**Tech Stack:** [Key technologies/libraries]
---
```
### Task Structure
Each task follows this format:
````markdown
### Task N: [Descriptive Name]
**Objective:** What this task accomplishes (one sentence)
**Files:**
- Create: `exact/path/to/new_file.py`
- Modify: `exact/path/to/existing.py:45-67` (line numbers if known)
- Test: `tests/path/to/test_file.py`
**Step 1: Write failing test**
```python
def test_specific_behavior():
result = function(input)
assert result == expected
```
**Step 2: Run test to verify failure**
Run: `pytest tests/path/test.py::test_specific_behavior -v`
Expected: FAIL — "function not defined"
**Step 3: Write minimal implementation**
```python
def function(input):
return expected
```
**Step 4: Run test to verify pass**
Run: `pytest tests/path/test.py::test_specific_behavior -v`
Expected: PASS
**Step 5: Commit**
```bash
git add tests/path/test.py src/path/file.py
git commit -m "feat: add specific feature"
```
````
## Writing Process
### Step 1: Understand Requirements
Read and understand:
- Feature requirements
- Design documents or user description
- Acceptance criteria
- Constraints
### Step 2: Explore the Codebase
Use Hermes tools to understand the project:
```python
# Understand project structure
search_files("*.py", target="files", path="src/")
# Look at similar features
search_files("similar_pattern", path="src/", file_glob="*.py")
# Check existing tests
search_files("*.py", target="files", path="tests/")
# Read key files
read_file("src/app.py")
```
### Step 3: Design Approach
Decide:
- Architecture pattern
- File organization
- Dependencies needed
- Testing strategy
### Step 4: Write Tasks
Create tasks in order:
1. Setup/infrastructure
2. Core functionality (TDD for each)
3. Edge cases
4. Integration
5. Cleanup/documentation
### Step 5: Add Complete Details
For each task, include:
- **Exact file paths** (not "the config file" but `src/config/settings.py`)
- **Complete code examples** (not "add validation" but the actual code)
- **Exact commands** with expected output
- **Verification steps** that prove the task works
### Step 6: Review the Plan
Check:
- [ ] Tasks are sequential and logical
- [ ] Each task is bite-sized (2-5 min)
- [ ] File paths are exact
- [ ] Code examples are complete (copy-pasteable)
- [ ] Commands are exact with expected output
- [ ] No missing context
- [ ] DRY, YAGNI, TDD principles applied
### Step 7: Save the Plan
```bash
mkdir -p docs/plans
# Save plan to docs/plans/YYYY-MM-DD-feature-name.md
git add docs/plans/
git commit -m "docs: add implementation plan for [feature]"
```
## Principles
### DRY (Don't Repeat Yourself)
@@ -146,35 +208,21 @@ git commit -m "feat: add new feature component"
**Bad:** Copy-paste validation in 3 places
**Good:** Extract validation function, use everywhere
```python
# Good - DRY
def validate_email(email):
if not re.match(r'^[^@]+@[^@]+$', email):
raise ValueError("Invalid email")
return email
# Use everywhere
validate_email(user_input)
validate_email(config_email)
validate_email(imported_data)
```
### YAGNI (You Aren't Gonna Need It)
**Bad:** Add "flexibility" for future requirements
**Good:** Implement only what's needed now
```python
# Bad - YAGNI violation
# Bad — YAGNI violation
class User:
def __init__(self, name, email):
self.name = name
self.email = email
self.preferences = {} # Not needed yet!
self.metadata = {} # Not needed yet!
self.settings = {} # Not needed yet!
# Good - YAGNI
# Good — YAGNI
class User:
def __init__(self, name, email):
self.name = name
@@ -183,7 +231,7 @@ class User:
### TDD (Test-Driven Development)
Every task that produces code should include:
Every task that produces code should include the full TDD cycle:
1. Write failing test
2. Run to verify failure
3. Write minimal code
4. Run to verify pass
5. Commit:

```bash
git add [files]
git commit -m "type: description"
```
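In miniature, using a hypothetical `slugify` helper (not taken from any plan in this document), the cycle reads:

```python
# Steps 1-2: the test exists before the code; running it now fails with NameError.
def test_slugify():
    assert slugify("Hello World") == "hello-world"

# Steps 3-4: the minimal implementation; re-running the test now passes.
def slugify(text):
    return text.strip().lower().replace(" ", "-")
```

Nothing beyond what the test demands is written, which is YAGNI enforced by the cycle itself.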
## Example Plan
````markdown
# User Authentication Implementation Plan

> **For Hermes:** Use subagent-driven-development to implement this plan.

**Goal:** Add JWT-based user authentication to the Flask API

**Architecture:** Use PyJWT for tokens, bcrypt for hashing. Middleware validates tokens on protected routes.

**Tech Stack:** Python, Flask, PyJWT, bcrypt

---

### Task 1: Create User model

**Objective:** Define User model with email and hashed password

**Files:**
- Create: `src/models/user.py`
- Test: `tests/models/test_user.py`

**Step 1: Write failing test**

```python
def test_user_creation():
    user = User(email="test@example.com", password="secret123")
    assert user.email == "test@example.com"
    assert user.password_hash is not None
    assert user.password_hash != "secret123"  # Should be hashed
```

**Step 2: Run to verify failure**

```bash
pytest tests/models/test_user.py -v
```

Expected: FAIL - User class not defined

**Step 3: Implement User model**

```python
import bcrypt

class User:
    def __init__(self, email, password):
        self.email = email
        self.password_hash = bcrypt.hashpw(
            password.encode(),
            bcrypt.gensalt()
        )
```

**Step 4: Run to verify pass**

```bash
pytest tests/models/test_user.py -v
```

Expected: PASS

**Commit:**

```bash
git add src/models/user.py tests/models/test_user.py
git commit -m "feat: add User model with password hashing"
```

### Task 2: Create login endpoint

**Objective:** Add POST /login endpoint that returns JWT

**Files:**
- Modify: `src/app.py`
- Test: `tests/test_login.py`

[Continue...]
````
## Common Mistakes
### Vague Tasks
**Bad:** "Add authentication"
**Good:** "Create User model with email and password_hash fields"
### Incomplete Code
**Bad:** "Step 1: Add validation function"
**Good:** "Step 1: Add validation function" followed by the complete function code
### Missing Verification
**Bad:** "Step 3: Test it works"
**Good:** "Step 3: Run `pytest tests/test_auth.py -v`, expected: 3 passed"
### Missing File Paths
**Bad:** "Create the model file"
**Good:** "Create: `src/models/user.py`"
## Execution Handoff
After saving the plan, offer the execution approach:
**"Plan complete and saved. Ready to execute using subagent-driven-development — I'll dispatch a fresh subagent per task with two-stage review (spec compliance then code quality). Shall I proceed?"**
## Success Checklist
Before considering a plan complete:
- [ ] Header with goal, architecture, tech stack
- [ ] All tasks are bite-sized (2-5 min each)
- [ ] Exact file paths for every file
- [ ] Complete code examples (not partial)
- [ ] Exact commands with expected output
- [ ] Verification steps for each task
- [ ] Commit commands included
- [ ] DRY, YAGNI, TDD principles applied
- [ ] Tasks are sequential and logical
- [ ] Plan saved to docs/plans/
When executing, use the `subagent-driven-development` skill:
- Fresh `delegate_task` per task with full context
- Spec compliance review after each task
- Code quality review after spec passes
- Proceed only when both reviews approve
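As a sketch of the shape that per-task dispatch might take: the field names below are purely illustrative, not the actual `delegate_task` schema; consult the Hermes tool reference for the real parameters and toolset names.

```python
# Illustrative only: the real delegate_task parameters may differ.
task_dispatch = {
    "task": "Task 1: Create User model",
    "context": "Verbatim task text from docs/plans/, including file paths and code",
    "toolset": ["read_file", "write_file", "terminal"],
    "done_when": "pytest tests/models/test_user.py -v reports all tests passing",
}
```

The point is that each dispatch carries the full task text from the plan, so the subagent never needs missing context.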
## Remember
```
Bite-sized tasks (2-5 min each)
Exact file paths
Complete code (copy-pasteable)
Exact commands with expected output
Verification steps
DRY, YAGNI, TDD
Frequent commits
```
**A good plan makes implementation obvious.**