Warm Session Provisioning: Revised Hypothesis
Research Document v2.0
Issue: #327
Date: April 2026
Status: Revised Based on Empirical Data
Executive Summary
Initial hypothesis: Marathon sessions (100+ messages) have lower error rates, suggesting agents improve with experience. This was partially incorrect.
Actual finding: Error rates INCREASE within marathon sessions (avg. first-half: 26.8%; second-half: 32.7%). Sessions do not improve over time; they degrade.
Corrected Understanding
What the Data Actually Shows
- Error rates increase over time within sessions
- Marathon sessions appear more reliable in aggregate because:
  - Only well-guided sessions survive to 100+ messages
  - Users who correct errors keep sessions alive
  - Selection bias: failed sessions end early
- User guidance drives success, not agent adaptation
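The survivorship effect above can be made concrete with a small sketch: compare a session's first-half and second-half error rates directly, rather than comparing marathon sessions to short ones in aggregate. The session encoding (a list of 0/1 error flags per message) is an assumption for illustration, not the audit's actual data format.

```python
def half_error_rates(session):
    """Split a session (list of 0/1 error flags, one per message)
    into halves and return (first_half_rate, second_half_rate).

    A rising second-half rate is the within-session degradation
    signal; aggregate comparisons across sessions hide it because
    failed sessions end before reaching marathon length."""
    mid = len(session) // 2
    first, second = session[:mid], session[mid:]
    rate = lambda flags: sum(flags) / len(flags)
    return rate(first), rate(second)
```

Running this per session, then averaging the two halves separately, reproduces the first-half vs. second-half comparison (26.8% vs. 32.7%) without the selection bias of aggregate rates.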
Revised Hypothesis
The "proficiency" observed in marathon sessions comes from:
- User expertise: Users who know how to guide the agent
- Established context: Shared reference points reduce ambiguity
- Error correction patterns: Users develop strategies to fix agent mistakes
- Session survivorship: Only well-managed sessions reach marathon length
New Research Direction
1. User Guidance Patterns
Instead of agent proficiency, study user strategies:
- How do expert users phrase requests?
- What correction patterns work best?
- How do users establish context?
2. Context Window Management
Long sessions may suffer from context degradation:
- Attention dilution over many messages
- Lost context from early messages
- Compression artifacts
3. Warm Session v2: User-Guided Templates
Instead of pre-seeding agent patterns, pre-seed user guidance:
- Effective prompt templates
- Error correction strategies
- Context establishment patterns
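A minimal sketch of what a user-guidance template store could look like. The template names, placeholder fields, and `render` helper are all illustrative assumptions, not an existing API; the point is that the pre-seeded artifact is user-side phrasing, not agent-side behavior.

```python
# Hypothetical user-facing guidance templates (names and fields are
# assumptions for illustration). These pre-seed the user's side of
# the session: context establishment and error correction phrasing.
GUIDANCE_TEMPLATES = {
    "establish_context": (
        "Before we start: the repo is {repo}, the target branch is "
        "{branch}, and code style follows {style_guide}."
    ),
    "correct_error": (
        "That output is wrong because {reason}. Please redo only "
        "{scope} and keep everything else unchanged."
    ),
}

def render(template_name, **fields):
    """Fill a guidance template with the user's session-specific details."""
    return GUIDANCE_TEMPLATES[template_name].format(**fields)
```

For example, `render("correct_error", reason="it edits the wrong file", scope="the test module")` yields a scoped correction prompt of the kind Phase 1 aims to extract from successful sessions.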
Implementation Plan
Phase 1: User Pattern Analysis
- Analyze successful user strategies
- Extract effective prompt patterns
- Identify error correction techniques
Phase 2: Guidance Templates
- Create user-facing templates
- Document effective patterns
- Provide prompt engineering guidance
Phase 3: Context Management
- Optimize context window usage
- Implement smart context refresh
- Prevent attention degradation
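One simple form the "smart context refresh" step could take is head-plus-tail retention: keep the earliest messages (where shared context was established) and a recent window, dropping the middle. This is a sketch under assumed parameters, not a measured optimum.

```python
def refresh_context(messages, keep_head=5, keep_tail=20):
    """Retain the first keep_head and last keep_tail messages,
    replacing the dropped middle with a single marker entry.

    keep_head preserves context establishment; keep_tail preserves
    recency. Both values are illustrative assumptions."""
    if len(messages) <= keep_head + keep_tail:
        return list(messages)
    marker = ["[...earlier messages summarized or dropped...]"]
    return messages[:keep_head] + marker + messages[-keep_tail:]
```

A summarization pass over the dropped middle (instead of a bare marker) would trade tokens for retention; measuring that trade-off is part of Phase 3.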
Phase 4: A/B Testing
- Test guided vs unguided sessions
- Measure error reduction from user guidance
- Statistical validation
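For the statistical validation step, one standard option is a two-proportion z-test on error rates in the guided vs. unguided arms. The sketch below uses only the standard library; the sample counts in the usage note are placeholders, not experiment results.

```python
from math import sqrt, erf

def two_proportion_z(errors_a, n_a, errors_b, n_b):
    """Two-sided two-proportion z-test.

    Returns (z, p_value) for H0: the two arms have equal error rates.
    Uses the pooled-proportion standard error and the normal CDF
    expressed via erf."""
    p_a, p_b = errors_a / n_a, errors_b / n_b
    pooled = (errors_a + errors_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # two-sided p-value: 2 * (1 - Phi(|z|)), with Phi via erf
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value
```

With placeholder counts matching the observed rates (e.g. 268/1000 vs. 327/1000), the difference is significant at conventional thresholds; the A/B test will determine whether guidance templates reproduce a gap of that size.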
Key Metrics
Error Rate by Position
- First 10 messages: baseline
- Messages 10-50: degradation rate
- Messages 50+: long-session behavior
User Intervention Rate
- How often users correct errors
- Success rate of corrections
- Patterns in effective corrections
Context Window Utilization
- Token usage over time
- Information retention rate
- Compression effectiveness
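The error-rate-by-position metric can be sketched as a binned average over message indices. The bin boundaries mirror the buckets listed above; the session encoding (0/1 error flags) is an assumption for illustration.

```python
def error_rate_by_position(sessions, bins=((0, 10), (10, 50), (50, None))):
    """Average error rate per position bin across sessions.

    sessions: list of sessions, each a list of 0/1 error flags.
    bins: (lo, hi) index ranges; hi=None means open-ended (50+).
    Bins with no messages are omitted from the result."""
    totals = {b: [0, 0] for b in bins}  # bin -> [errors, messages]
    for session in sessions:
        for i, err in enumerate(session):
            for lo, hi in bins:
                if i >= lo and (hi is None or i < hi):
                    totals[(lo, hi)][0] += err
                    totals[(lo, hi)][1] += 1
    return {b: e / n for b, (e, n) in totals.items() if n}
```

Comparing the first bin (baseline) to later bins gives the per-position degradation curve that the first-half/second-half split only approximates.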
Paper Contributions (Revised)
- Counterintuitive finding: Longer sessions have HIGHER error rates
- Selection bias: Marathon sessions represent survivorship bias
- User expertise matters more than agent adaptation
- Context degradation over long sessions
Next Steps
- ✅ Correct initial hypothesis
- ⏳ Analyze user guidance patterns
- ⏳ Extract effective prompt strategies
- ⏳ Create user-facing guidance templates
- ⏳ Optimize context window management
- ⏳ Run A/B tests on guided sessions
- ⏳ Write paper with corrected findings
References
- Empirical Audit 2026-04-12, Finding 4
- Follow-up Analysis: Comment on #327 (2026-04-13)
- Issue #327 (original hypothesis)