hermes-agent/docs/research/warm-session-provisioning-v2.md
Alexander Whitestone 021249cb15 docs(research): Update warm session research with corrected hypothesis
Incorporates finding that error rates INCREASE in marathon sessions (first-half: 26.8%, second-half: 32.7%). Initial hypothesis was partially wrong. Revised to focus on user guidance patterns instead of agent proficiency.
2026-04-14 01:09:54 +00:00


Warm Session Provisioning: Revised Hypothesis

Research Document v2.0
Issue: #327
Date: April 2026
Status: Revised Based on Empirical Data

Executive Summary

Initial hypothesis: Marathon sessions (100+ messages) have lower error rates, suggesting agents improve with experience. This was partially incorrect.

Actual finding: Error rates INCREASE within marathon sessions (average first-half: 26.8%; second-half: 32.7%). Sessions do not improve; they degrade.
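The split-half comparison behind this finding can be sketched as follows. The session representation (a list of per-message error flags) is a hypothetical stand-in for the audit data, not the actual pipeline:

```python
def split_half_error_rates(sessions):
    """For each session (a list of booleans, True = message produced an
    error), compute the error rate in each half, then average the halves
    across sessions."""
    firsts, seconds = [], []
    for errors in sessions:
        mid = len(errors) // 2
        first, second = errors[:mid], errors[mid:]
        if first and second:  # skip sessions too short to split
            firsts.append(sum(first) / len(first))
            seconds.append(sum(second) / len(second))
    n = len(firsts)
    return sum(firsts) / n, sum(seconds) / n

# Toy data: error flags per message in two sessions, errors clustering late.
sessions = [
    [False] * 8 + [True] * 2,
    [True] + [False] * 6 + [True] * 3,
]
first, second = split_half_error_rates(sessions)  # second half is worse
```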

Corrected Understanding

What the Data Actually Shows

  1. Error rates increase over time within sessions
  2. Marathon sessions appear more reliable in aggregate because:
    • Only well-guided sessions survive to 100+ messages
    • Users who correct errors keep sessions alive
    • Selection bias: failed sessions end early
  3. User guidance drives success, not agent adaptation
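The selection-bias mechanism in point 2 can be illustrated with a toy simulation (every parameter below is invented for illustration, not fit to the audit data). Poorly guided sessions die early, so the sessions that reach marathon length show a lower aggregate error rate even though nothing about the agent improved:

```python
import random

def simulate(n_sessions=2000, seed=0):
    """Toy survivorship model: each session has a user-guidance quality q.
    A message errors with probability (1 - q); an uncorrected error ends
    the session, so only well-guided sessions reach marathon length."""
    rng = random.Random(seed)
    marathon_rates, all_rates = [], []
    for _ in range(n_sessions):
        q = rng.random()              # guidance quality in [0, 1)
        errors = messages = 0
        while messages < 120:
            messages += 1
            if rng.random() > q:      # message produced an error
                errors += 1
                if rng.random() > q:  # user failed to correct: session dies
                    break
        rate = errors / messages
        all_rates.append(rate)
        if messages >= 100:           # "marathon" threshold from the study
            marathon_rates.append(rate)
    avg = lambda xs: sum(xs) / len(xs)
    return avg(marathon_rates), avg(all_rates)

marathon, overall = simulate()  # marathon sessions look better in aggregate
```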

Revised Hypothesis

The "proficiency" observed in marathon sessions comes from:

  • User expertise: Users who know how to guide the agent
  • Established context: Shared reference points reduce ambiguity
  • Error correction patterns: Users develop strategies to fix agent mistakes
  • Session survivorship: Only well-managed sessions reach marathon length

New Research Direction

1. User Guidance Patterns

Instead of studying agent proficiency, study user strategies:

  • How do expert users phrase requests?
  • What correction patterns work best?
  • How do users establish context?

2. Context Window Management

Long sessions may suffer from context degradation:

  • Attention dilution over many messages
  • Lost context from early messages
  • Compression artifacts
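One common mitigation for the degradation listed above is periodic summarization: once usage nears the window budget, collapse older turns into a pinned summary and keep only recent messages verbatim. A minimal sketch, assuming a hypothetical `summarize` helper (e.g. an LLM call) and illustrative thresholds:

```python
def needs_refresh(token_count, budget, threshold=0.8):
    """Trigger a context refresh once usage crosses a fraction of the
    window budget. The 0.8 threshold is illustrative."""
    return token_count >= threshold * budget

def refresh(messages, summarize, keep_recent=10):
    """Collapse all but the most recent turns into a single summary
    message, so early context is condensed rather than silently lost."""
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {"role": "system", "content": summarize(old)}
    return [summary] + recent
```

Usage: with 25 messages and `keep_recent=10`, `refresh` returns 11 messages (one summary of the 15 oldest, plus the 10 most recent).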

3. Warm Session v2: User-Guided Templates

Instead of pre-seeding agent patterns, pre-seed user guidance:

  • Effective prompt templates
  • Error correction strategies
  • Context establishment patterns
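A guidance template of this kind might look like the following sketch. The field names and example values are hypothetical, not a finalized schema:

```python
# Hypothetical user-facing prompt template; fields are illustrative.
TASK_TEMPLATE = (
    "Goal: {goal}\n"
    "Constraints: {constraints}\n"
    "Relevant files: {files}\n"
    "Definition of done: {done}"
)

prompt = TASK_TEMPLATE.format(
    goal="rename the config loader",
    constraints="no behavior changes; keep the public API stable",
    files="src/config.py, tests/test_config.py",
    done="all existing tests pass",
)
```

The idea is that an explicit goal, constraints, and a definition of done establish context up front, which is one of the patterns expert users were observed to apply.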

Implementation Plan

Phase 1: User Pattern Analysis

  • Analyze successful user strategies
  • Extract effective prompt patterns
  • Identify error correction techniques

Phase 2: Guidance Templates

  • Create user-facing templates
  • Document effective patterns
  • Provide prompt engineering guidance

Phase 3: Context Management

  • Optimize context window usage
  • Implement smart context refresh
  • Prevent attention degradation

Phase 4: A/B Testing

  • Test guided vs unguided sessions
  • Measure error reduction from user guidance
  • Statistical validation
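The statistical validation step could use a standard two-proportion z-test on per-arm error counts. A sketch with invented counts, not the finalized analysis plan:

```python
from math import erf, sqrt

def two_proportion_z(err_a, n_a, err_b, n_b):
    """Two-proportion z-test comparing error rates in guided (A) vs.
    unguided (B) sessions; returns the z statistic and a two-sided
    p-value under the normal approximation."""
    p_a, p_b = err_a / n_a, err_b / n_b
    p = (err_a + err_b) / (n_a + n_b)                 # pooled proportion
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))      # pooled std. error
    z = (p_a - p_b) / se
    # Two-sided p-value via the normal CDF: Phi(x) = 0.5*(1 + erf(x/sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Illustrative counts: 120/500 errors guided vs. 180/500 unguided.
z, p = two_proportion_z(120, 500, 180, 500)
```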

Key Metrics

  1. Error Rate by Position

    • First 10 messages: baseline
    • Messages 10-50: degradation rate
    • Messages 50+: long-session behavior
  2. User Intervention Rate

    • How often users correct errors
    • Success rate of corrections
    • Patterns in effective corrections
  3. Context Window Utilization

    • Token usage over time
    • Information retention rate
    • Compression effectiveness
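Metric 1 (error rate by position) can be computed with the bins above; the session representation (per-message error flags) is again a hypothetical stand-in for the audit data:

```python
def error_rate_by_position(sessions, bins=((0, 10), (10, 50), (50, None))):
    """Error rate per message-position bin, pooled across sessions.
    Bins mirror the baseline / 10-50 / 50+ split above; `None` means
    an open-ended upper bound."""
    totals = {b: [0, 0] for b in bins}  # bin -> [errors, messages]
    for errors in sessions:
        for i, is_error in enumerate(errors):
            for lo, hi in bins:
                if i >= lo and (hi is None or i < hi):
                    totals[(lo, hi)][0] += int(is_error)
                    totals[(lo, hi)][1] += 1
    return {b: (e / n if n else None) for b, (e, n) in totals.items()}

# One toy 60-message session: clean start, errors increasing with position.
session = [False] * 10 + ([True] + [False] * 3) * 10 + [True] * 5 + [False] * 5
rates = error_rate_by_position([session])
```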

Paper Contributions (Revised)

  1. Counterintuitive finding: Longer sessions have HIGHER error rates
  2. Selection bias: marathon-session reliability reflects survivorship, not improvement
  3. User expertise matters more than agent adaptation
  4. Context degradation over long sessions

Next Steps

  1. Correct initial hypothesis
  2. Analyze user guidance patterns
  3. Extract effective prompt strategies
  4. Create user-facing guidance templates
  5. Optimize context window management
  6. Run A/B tests on guided sessions
  7. Write paper with corrected findings

References

  • Empirical Audit 2026-04-12, Finding 4
  • Follow-up Analysis: Comment on #327 (2026-04-13)
  • Issue #327 (original hypothesis)