systematic-debugging Skill
Four-phase debugging framework that ensures root cause investigation before attempting fixes. Never jump to solutions.
Core Principle
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
Random fixes waste time and create new bugs. This skill enforces systematic investigation.
When to Use
Use systematic-debugging for ANY technical issue:
- Test failures
- Bugs in production
- Unexpected behavior
- Performance problems
- Build failures
- Integration issues
ESPECIALLY when:
- Under time pressure
- “Quick fix” seems obvious
- You’ve tried multiple fixes
- Previous fix didn’t work
- You don’t fully understand the issue
Quick Start
Invoke the Skill
"Use systematic-debugging to investigate why tests are failing"
What You Get
The skill guides you through:
- Root cause investigation
- Pattern analysis
- Hypothesis testing
- Fix implementation
The Four Phases
Phase 1: Root Cause Investigation
BEFORE attempting ANY fix:
1. Read Error Messages Carefully
   - Don’t skip errors or warnings
   - They often contain the exact solution
   - Read stack traces completely
2. Reproduce Consistently
   - Can you trigger it reliably?
   - What are the exact steps?
   - If not reproducible → gather more data
3. Check Recent Changes
   - What changed that could cause this?
   - Git diff, recent commits
   - Dependencies, config changes
4. Gather Evidence
   - Add diagnostic logging
   - Check component boundaries
   - Verify data flow
   - Trace through system layers
5. Trace Data Flow
   - Where does the bad value originate?
   - What called this with the bad value?
   - Fix at the source, not the symptom
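The evidence-gathering and data-flow steps above can be sketched as a small logging decorator placed on each component boundary. This is a minimal illustration, not part of the skill itself: the three pipeline functions (`parse_quantity`, `apply_discount`, `format_total`) are hypothetical, and only the standard `logging` and `functools` modules are used.

```python
import functools
import logging

logger = logging.getLogger("debug.trace")


def trace_boundary(func):
    """Log arguments and return value each time a component boundary is crossed."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("-> %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        logger.debug("<- %s returned %r", func.__name__, result)
        return result

    return wrapper


# Hypothetical three-layer pipeline, used only for illustration.
@trace_boundary
def parse_quantity(raw):
    # The bad value originates here: a blank string silently becomes None.
    return int(raw) if raw.strip() else None


@trace_boundary
def apply_discount(quantity):
    # Passes None through untouched, so the problem stays hidden.
    return quantity if quantity is None else max(quantity - 1, 0)


@trace_boundary
def format_total(quantity):
    # The symptom surfaces here, two layers away from the source.
    return f"{quantity} items"


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG, format="%(message)s")
    print(format_total(apply_discount(parse_quantity("  "))))
```

Reading the log from the top shows the first boundary that returns `None`, which is where the fix belongs, rather than in `format_total` where the symptom appears.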
Phase 2: Pattern Analysis
1. Find Working Examples
   - Locate similar working code
   - What works that’s similar?
2. Compare Against References
   - Read reference implementation completely
   - Don’t skim - understand fully
3. Identify Differences
   - List every difference
   - Don’t assume “that can’t matter”
4. Understand Dependencies
   - What components does this need?
   - What settings, config, environment?
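For settings and environment, "list every difference" can be done mechanically. The sketch below assumes both the working and the broken setup can be expressed as flat dictionaries; the function name `diff_settings` and the sample keys are hypothetical.

```python
def diff_settings(working, broken):
    """Return every key whose value differs, including keys present on one side only."""
    missing = object()  # sentinel, so a stored None is not confused with "absent"
    differences = {}
    for key in sorted(set(working) | set(broken)):
        w = working.get(key, missing)
        b = broken.get(key, missing)
        if w != b:
            differences[key] = (
                "<absent>" if w is missing else w,
                "<absent>" if b is missing else b,
            )
    return differences


if __name__ == "__main__":
    # Made-up sample settings for illustration.
    working = {"NODE_ENV": "test", "TIMEOUT_MS": 5000, "RETRIES": 3}
    broken = {"NODE_ENV": "test", "TIMEOUT_MS": 500, "CACHE": "on"}
    for key, (w, b) in diff_settings(working, broken).items():
        print(f"{key}: working={w!r} broken={b!r}")
```

The function reports every difference without ranking them, which matches the rule of not deciding in advance that a difference "can’t matter".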
Phase 3: Hypothesis Testing
1. Form Single Hypothesis
   - State clearly: “I think X because Y”
   - Be specific, not vague
2. Test Minimally
   - Smallest possible change
   - One variable at a time
   - Don’t fix multiple things
3. Verify Before Continuing
   - Did it work? → Phase 4
   - Didn’t work? → New hypothesis
   - Don’t add more fixes
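One way to enforce "one variable at a time" is to apply each candidate change to a fresh copy of the baseline, so changes never stack. This is a sketch under assumptions: `try_changes_in_isolation` and the `still_fails` callback are hypothetical names, and the callback stands in for whatever reproduction step you have.

```python
def try_changes_in_isolation(baseline, candidate_changes, still_fails):
    """Apply each candidate change alone to a copy of the baseline.

    Returns the names of changes that make the failure disappear by themselves.
    """
    fixes = []
    for name, (key, value) in candidate_changes.items():
        trial = dict(baseline)  # fresh copy, so earlier changes never carry over
        trial[key] = value      # exactly one variable differs from the baseline
        if not still_fails(trial):
            fixes.append(name)
    return fixes


if __name__ == "__main__":
    # Made-up scenario: the failure reproduces whenever the timeout is too low.
    baseline = {"timeout_ms": 500, "retries": 1, "cache": False}
    candidates = {
        "raise timeout": ("timeout_ms", 5000),
        "more retries": ("retries", 5),
        "enable cache": ("cache", True),
    }
    print(try_changes_in_isolation(baseline, candidates, lambda cfg: cfg["timeout_ms"] < 1000))
```

Because every trial starts from the same baseline, a passing trial points at a single change, which is what the hypothesis needs in order to be confirmed or rejected.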
Phase 4: Implementation
1. Create Failing Test
   - Simplest reproduction
   - Automated if possible
   - MUST have before fixing
2. Implement Single Fix
   - Address root cause
   - ONE change at a time
   - No bundled refactoring
3. Verify Fix
   - Test passes?
   - Other tests OK?
   - Issue resolved?
4. If Fix Doesn’t Work
   - Count: How many fixes tried?
   - If < 3: Return to Phase 1
   - If ≥ 3: Question architecture
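A sketch of what the list above looks like in practice, using the standard `unittest` module. The `average` function and its empty-list bug are invented for illustration; `next_step` simply encodes the escalation rule with the threshold of 3 taken from the text.

```python
import unittest


def average(values):
    """Hypothetical function under repair.

    Before the fix it divided by len(values) unconditionally and raised
    ZeroDivisionError on an empty list.
    """
    if not values:
        return 0.0
    return sum(values) / len(values)


class AverageRegressionTest(unittest.TestCase):
    def test_empty_input_returns_zero(self):
        # Written first, and seen to fail, before average() was changed.
        self.assertEqual(average([]), 0.0)

    def test_existing_behaviour_unchanged(self):
        # Guards against the fix breaking the case that already worked.
        self.assertEqual(average([2, 4]), 3.0)


def next_step(failed_fix_attempts):
    """Escalation rule from the list above."""
    if failed_fix_attempts < 3:
        return "return to Phase 1"
    return "question the architecture"


if __name__ == "__main__":
    unittest.main()
```

The first test is the simplest reproduction of the bug; the second covers "Other tests OK?" for the behaviour that worked before the fix.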
Red Flags - STOP
If you think:
- “Quick fix for now”
- “Just try changing X”
- “Add multiple changes”
- “Skip the test”
- “It’s probably X”
- “I don’t fully understand”
- “One more fix attempt” (after 2+)
ALL mean: STOP. Return to Phase 1.
Common Use Cases
Test Failure
"Use systematic-debugging to investigate test failure:
1. Read exact error message
2. Reproduce locally
3. Check recent code changes
4. Trace data flow
5. Find root cause
6. Create minimal failing test
7. Implement fix"
Production Bug
"Use systematic-debugging for production bug:
- Gather error logs
- Reproduce in staging
- Check recent deployments
- Analyze error patterns
- Form hypothesis
- Test fix in staging
- Deploy with monitoring"
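The "analyze error patterns" step can be approximated by grouping log lines by exception type and a normalized message. The log format assumed here (`<timestamp> <LEVEL> <ExceptionName>: <message>`) and the sample lines are made up for illustration; adapt the regular expression to your real logs.

```python
import re
from collections import Counter

# Assumed log shape: "<timestamp> <LEVEL> <ExceptionName>: <message>"
ERROR_LINE = re.compile(r"^\S+ ERROR (?P<kind>\w+): (?P<message>.*)$")


def error_signature(message):
    """Replace digit runs with '#' so messages differing only by IDs group together."""
    return re.sub(r"\d+", "#", message)


def count_error_patterns(lines):
    """Return (exception kind, signature) pairs with counts, most frequent first."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.match(line)
        if match:
            counts[(match["kind"], error_signature(match["message"]))] += 1
    return counts.most_common()


if __name__ == "__main__":
    sample = [
        "2024-01-01T10:00:00Z ERROR TimeoutError: order 1041 timed out after 30s",
        "2024-01-01T10:00:05Z INFO healthcheck ok",
        "2024-01-01T10:00:09Z ERROR TimeoutError: order 1077 timed out after 30s",
        "2024-01-01T10:01:12Z ERROR KeyError: missing field 'sku'",
    ]
    for (kind, signature), count in count_error_patterns(sample):
        print(count, kind, signature)
```

A signature that dominates the counts, or one that first appears after a given deployment, is a reasonable starting point for the hypothesis step.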
Performance Issue
"Use systematic-debugging to investigate slow queries:
- Identify slow operations
- Check query patterns
- Analyze indexes
- Review recent changes
- Test optimization
- Verify improvement"
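"Verify improvement" means measuring before and after with the same method. The sketch below uses the standard `timeit` module; the two lookup functions are hypothetical stand-ins for an unindexed and an indexed query, not real database calls.

```python
import timeit


def best_time(func, repeat=5, number=100):
    """Best-of-N time per call in seconds; the minimum is least affected by noise."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


# Hypothetical stand-in for a slow query: membership in a list versus a set.
ROWS = list(range(20_000))
INDEXED = set(ROWS)


def lookup_without_index():
    return 19_999 in ROWS      # linear scan


def lookup_with_index():
    return 19_999 in INDEXED   # hash lookup


if __name__ == "__main__":
    before = best_time(lookup_without_index)
    after = best_time(lookup_with_index)
    print(f"before: {before:.2e} s/call, after: {after:.2e} s/call")
```

Recording the baseline first keeps the optimization honest: if the "after" number is not clearly lower, the hypothesis about the bottleneck was wrong.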
Integration Failure
"Use systematic-debugging for API integration issue:
- Check both sides of integration
- Log request/response
- Verify data format
- Compare working examples
- Fix at source"
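For the "verify data format" step, a small checker that compares a received body against the fields one side expects can show which side of the integration is wrong. The contract in `EXPECTED_FIELDS` is an assumption for illustration, as are the field names.

```python
import json

# Assumed contract for illustration; field names and types are hypothetical.
EXPECTED_FIELDS = {"order_id": int, "amount": float, "currency": str}


def find_format_problems(raw_body, expected=EXPECTED_FIELDS):
    """Return human-readable mismatches between a JSON body and the expected fields."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        return [f"body is not valid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["body is not a JSON object"]

    problems = []
    for field, expected_type in expected.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            actual = type(payload[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    for field in sorted(payload.keys() - expected.keys()):
        problems.append(f"unexpected field: {field}")
    return problems


if __name__ == "__main__":
    print(find_format_problems('{"order_id": "1041", "amount": 19.99}'))
```

Running the same check on the logged request from the sender and on a known working example makes the difference between them explicit.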
Red Flags
Watch for these patterns:
- “Quick fix first, investigate later” → Sets bad precedent, wastes time
- “Just try changing X” → Random guessing, not systematic
- “Add multiple changes at once” → Can’t isolate what worked
- “Skip test, manually verify” → Untested fixes don’t stick
- “One more fix” (after 2+) → Architecture problem, not implementation
Example Session
Wrong Approach:
User: Tests failing
Dev: Let me try fixing the import
Dev: That didn't work, trying different config
Dev: Still failing, maybe it's the version
Dev: [2 hours later, still broken]
Systematic Approach:
User: Tests failing
Dev: Use systematic-debugging
Phase 1: Read error - "Module not found: ./utils"
Phase 1: Check recent changes - Moved utils folder
Phase 1: Root cause - Import path outdated
Phase 2: Find working imports in other files
Phase 3: Hypothesis - Update import path
Phase 4: Create test, fix import, verify
[15 minutes, fixed]
Integration with Other Skills
Works with:
- root-cause-tracing - Trace through call stack
- verification-before-completion - Verify fix worked
- defense-in-depth - Add validation after fix
Benefits
Using systematic-debugging:
- 15-30 minutes to a fix, versus 2-3 hours of thrashing
- 95% first-time fix rate, versus 40%
- Near-zero new bugs introduced, versus new bugs being common
- Lower stress - clear process
Quick Examples
Simple Bug:
"Use systematic-debugging to investigate button not working"
Complex Issue:
"Use systematic-debugging for multi-service failure:
- Check each service layer
- Add diagnostic logging
- Trace request flow
- Identify failing component
- Fix root cause"
Performance:
"Use systematic-debugging to find why page loads slowly:
- Measure load time
- Check network requests
- Analyze bundle size
- Profile JavaScript
- Optimize bottleneck"
Key Takeaways
- Always investigate before fixing
- One change at a time
- Test the fix
- If 3+ fixes fail, question architecture
- Systematic is faster than random
Bottom Line: systematic-debugging prevents random fixes. Find root cause first, fix once, move on.