Add 5 new skills for professional software development workflows, adapted from the Superpowers project ( obra/superpowers ): - test-driven-development: RED-GREEN-REFACTOR cycle enforcement - systematic-debugging: 4-phase root cause investigation - subagent-driven-development: Structured delegation with two-stage review - writing-plans: Comprehensive implementation planning - requesting-code-review: Systematic code review process These skills provide structured development workflows that transform Hermes from a general assistant into a professional software engineer with defined processes for quality assurance. Skills are organized under software-development category and follow Hermes skill format with proper frontmatter, examples, and integration guidance with existing skills.
9.2 KiB
name, description, version, author, license, metadata
| name | description | version | author | license | metadata | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| systematic-debugging | Use when encountering any bug, test failure, or unexpected behavior. 4-phase root cause investigation process - NO fixes without understanding the problem first. | 1.0.0 | Hermes Agent (adapted from Superpowers) | MIT |
|
Systematic Debugging
Overview
Random fixes waste time and create new bugs. Quick patches mask underlying issues.
Core principle: ALWAYS find root cause before attempting fixes. Symptom fixes are failure.
Violating the letter of this process is violating the spirit of debugging.
The Iron Law
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
If you haven't completed Phase 1, you cannot propose fixes.
When to Use
Use for ANY technical issue:
- Test failures
- Bugs in production
- Unexpected behavior
- Performance problems
- Build failures
- Integration issues
Use this ESPECIALLY when:
- Under time pressure (emergencies make guessing tempting)
- "Just one quick fix" seems obvious
- You've already tried multiple fixes
- Previous fix didn't work
- You don't fully understand the issue
Don't skip when:
- Issue seems simple (simple bugs have root causes too)
- You're in a hurry (rushing guarantees rework)
- Manager wants it fixed NOW (systematic is faster than thrashing)
The Four Phases
You MUST complete each phase before proceeding to the next.
Phase 1: Root Cause Investigation
BEFORE attempting ANY fix:
1. Read Error Messages Carefully
- Don't skip past errors or warnings
- They often contain the exact solution
- Read stack traces completely
- Note line numbers, file paths, error codes
Action: Copy full error message to your notes.
2. Reproduce Consistently
- Can you trigger it reliably?
- What are the exact steps?
- Does it happen every time?
- If not reproducible → gather more data, don't guess
Action: Write down exact reproduction steps.
3. Check Recent Changes
- What changed that could cause this?
- Git diff, recent commits
- New dependencies, config changes
- Environmental differences
Commands:
# Recent commits
git log --oneline -10
# Uncommitted changes
git diff
# Changes in specific file
git log -p --follow src/problematic_file.py
4. Gather Evidence in Multi-Component Systems
WHEN system has multiple components (CI pipeline, API service, database layer):
BEFORE proposing fixes, add diagnostic instrumentation:
For EACH component boundary:
- Log what data enters component
- Log what data exits component
- Verify environment/config propagation
- Check state at each layer
Run once to gather evidence showing WHERE it breaks THEN analyze evidence to identify failing component THEN investigate that specific component
Example (multi-layer system):
# Layer 1: Entry point
def entry_point(input_data):
print(f"DEBUG: Input received: {input_data}")
result = process_layer1(input_data)
print(f"DEBUG: Layer 1 output: {result}")
return result
# Layer 2: Processing
def process_layer1(data):
print(f"DEBUG: Layer 1 received: {data}")
# ... processing ...
print(f"DEBUG: Layer 1 returning: {result}")
return result
Action: Add logging, run once, analyze output.
5. Isolate the Problem
- Comment out code until problem disappears
- Binary search through recent changes
- Create minimal reproduction case
- Test with fresh environment
Action: Create minimal reproduction case.
Phase 1 Completion Checklist
- Error messages fully read and understood
- Issue reproduced consistently
- Recent changes identified and reviewed
- Evidence gathered (logs, state)
- Problem isolated to specific component/code
- Root cause hypothesis formed
STOP: Do not proceed to Phase 2 until you understand WHY it's happening.
Phase 2: Solution Design
Given the root cause, design the fix:
1. Understand the Fix Area
- Read relevant code thoroughly
- Understand data flow
- Identify affected components
- Check for similar issues elsewhere
Action: Read all relevant code files.
2. Design Minimal Fix
- Smallest change that fixes root cause
- Avoid scope creep
- Don't refactor while fixing
- Fix one issue at a time
Action: Write down the exact fix before coding.
3. Consider Side Effects
- What else could this change affect?
- Are there dependencies?
- Will this break other functionality?
Action: Identify potential side effects.
Phase 2 Completion Checklist
- Fix area code fully understood
- Minimal fix designed
- Side effects identified
- Fix approach documented
Phase 3: Implementation
Make the fix:
1. Write Test First (if possible)
def test_should_handle_empty_input():
"""Regression test for bug #123"""
result = process_data("")
assert result == expected_empty_result
2. Implement Fix
# Before (buggy)
def process_data(data):
return data.split(",")[0]
# After (fixed)
def process_data(data):
if not data:
return ""
return data.split(",")[0]
3. Verify Fix
# Run the specific test
pytest tests/test_data.py::test_should_handle_empty_input -v
# Run all tests to check for regressions
pytest
Phase 3 Completion Checklist
- Test written that reproduces the bug
- Minimal fix implemented
- Test passes
- No regressions introduced
Phase 4: Verification
Confirm it's actually fixed:
1. Reproduce Original Issue
- Follow original reproduction steps
- Verify issue is resolved
- Test edge cases
2. Regression Testing
# Full test suite
pytest
# Integration tests
pytest tests/integration/
# Check related areas
pytest -k "related_feature"
3. Monitor After Deploy
- Watch logs for related errors
- Check metrics
- Verify fix in production
Phase 4 Completion Checklist
- Original issue cannot be reproduced
- All tests pass
- No new warnings/errors
- Fix documented (commit message, comments)
Debugging Techniques
Root Cause Tracing
Ask "why" 5 times:
- Why did it fail? → Null pointer
- Why was it null? → Function returned null
- Why did function return null? → Missing validation
- Why was validation missing? → Assumed input always valid
- Why was that assumption wrong? → API changed
Root cause: API change not accounted for
Defense in Depth
Don't fix just the symptom:
Bad: Add null check at crash site Good:
- Add validation at API boundary
- Add null check at crash site
- Add test for both
- Document API behavior
Condition-Based Waiting
For timing/race conditions:
# Bad - arbitrary sleep
import time
time.sleep(5) # "Should be enough"
# Good - wait for condition
from tenacity import retry, wait_exponential, stop_after_attempt
@retry(wait=wait_exponential(multiplier=1, min=4, max=10),
stop=stop_after_attempt(5))
def wait_for_service():
response = requests.get(health_url)
assert response.status_code == 200
Common Debugging Pitfalls
Fix Without Understanding
Symptom: "Just add a try/catch" Problem: Masks the real issue Solution: Complete Phase 1 before any fix
Shotgun Debugging
Symptom: Change 5 things at once Problem: Don't know what fixed it Solution: One change at a time, verify each
Premature Optimization
Symptom: Rewrite while debugging Problem: Introduces new bugs Solution: Fix first, refactor later
Assuming Environment
Symptom: "Works on my machine" Problem: Environment differences Solution: Check environment variables, versions, configs
Language-Specific Tools
Python
# Add debugger
import pdb; pdb.set_trace()
# Or use ipdb for better experience
import ipdb; ipdb.set_trace()
# Log state
import logging
logging.debug(f"Variable state: {variable}")
# Stack trace
import traceback
traceback.print_exc()
JavaScript/TypeScript
// Debugger
debugger;
// Console with context
console.log("State:", { var1, var2, var3 });
// Stack trace
console.trace("Here");
// Error with context
throw new Error(`Failed with input: ${JSON.stringify(input)}`);
Go
// Print state
fmt.Printf("Debug: variable=%+v\n", variable)
// Stack trace
import "runtime/debug"
debug.PrintStack()
// Panic with context
if err != nil {
panic(fmt.Sprintf("unexpected error: %v", err))
}
Integration with Other Skills
With test-driven-development
When debugging:
- Write test that reproduces bug
- Debug systematically
- Fix root cause
- Test passes
With writing-plans
Include debugging tasks in plans:
- "Add diagnostic logging"
- "Create reproduction test"
- "Verify fix resolves issue"
With subagent-driven-development
If subagent gets stuck:
- Switch to systematic debugging
- Analyze root cause
- Provide findings to subagent
- Resume implementation
Remember
PHASE 1: Investigate → Understand WHY
PHASE 2: Design → Plan the fix
PHASE 3: Implement → Make the fix
PHASE 4: Verify → Confirm it's fixed
No shortcuts. No guessing. Systematic always wins.