[architecture] Apply Attention Residuals patterns to agent loop design #273

Closed
opened 2026-03-16 13:15:07 +00:00 by hermes · 0 comments
Collaborator

Apply Attention Residuals Patterns to Timmy's Architecture

Research archived: `~/.timmy/research/papers/attention-residuals-moonshot-2025.md`
Actionable patterns: `~/.timmy/research/architectural-patterns-from-attnres.md`


Context

Moonshot AI's Attention Residuals paper shows that selective aggregation > uniform accumulation for information flow across depth (model layers). The same principle applies to information flow across agent loop phases.

Key insight: Standard residuals treat all layer outputs uniformly (dilution problem). AttnRes uses softmax attention to selectively retrieve from past layers. This is directly applicable to how Timmy integrates context across phases, tools, and memory systems.


Immediate Applications (No Model Retraining Required)

1. Three-Phase Loop (#271)

Each phase should query previous phases selectively, not just receive accumulated state.

```yaml
phases:
  counsel:
    needs: [user_query, session_goal, relevant_history]
    produces: counsel_summary

  judgment:
    queries: [user_query, counsel_summary]
    selection: competitive  # softmax over sources
    produces: judgment_summary

  command:
    queries: [judgment_summary, execution_context]
    fallback: counsel_summary  # skip connection
    produces: output
```

Config location: Should be declarative in Timmy's config, not hard-coded.
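The fallback wiring above can be sketched in a few lines. This is a minimal illustration, not an existing Timmy API; `run_command_phase` and the state keys are hypothetical names.

```python
def run_command_phase(state: dict) -> str:
    """Sketch of the 'command' phase: query the declared sources and,
    if none are present, fall back to counsel_summary (skip connection).
    All function and key names here are hypothetical."""
    queries = ["judgment_summary", "execution_context"]
    context = [state[k] for k in queries if k in state]
    if not context:  # skip connection: degrade to the counsel phase output
        context = [state.get("counsel_summary", "")]
    return " | ".join(context)
```

The fallback means a broken judgment phase degrades the loop gracefully instead of emptying the command phase's context.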

2. Workshop Presence Layer (#265)

The canonical state schema should use a block-level compression + cross-block attention pattern:

  • Each module produces a summary representation
  • Other modules query ALL module summaries (not just previous module)
  • Implements selective information routing
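The "query ALL summaries" routing above can be shown in miniature. This is a hypothetical sketch; module names are placeholders.

```python
def cross_module_views(summaries: dict) -> dict:
    """Give each module a view of every OTHER module's summary,
    not just its predecessor's (selective information routing)."""
    return {name: {k: v for k, v in summaries.items() if k != name}
            for name in summaries}
```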

3. Briefing System Redesign

The current briefing is sequential accumulation. It should instead use weighted selection:

```yaml
context_selection:
  recency_weight: 0.5      # diagonal dominance (locality)
  intent_weight: 0.3       # persistent embedding weight (session goal)
  similarity_weight: 0.2   # skip connections (semantic search)
```
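One way these weights could drive selection: blend the three component scores per candidate item, then greedily fill a token budget in score order. A minimal sketch; the item fields and `select_context` are assumptions, not an existing briefing API.

```python
WEIGHTS = {"recency_weight": 0.5, "intent_weight": 0.3, "similarity_weight": 0.2}

def blended_score(item: dict, weights: dict = WEIGHTS) -> float:
    # Component scores are assumed pre-normalized to [0, 1].
    return (weights["recency_weight"] * item["recency"]
            + weights["intent_weight"] * item["intent"]
            + weights["similarity_weight"] * item["similarity"])

def select_context(items: list, budget: int) -> list:
    # Greedy fill of a token budget in descending blended-score order.
    chosen, used = [], 0
    for item in sorted(items, key=blended_score, reverse=True):
        if used + item["tokens"] <= budget:
            chosen.append(item)
            used += item["tokens"]
    return chosen
```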

4. Tool Context Profiles

Each tool declares what it needs and receives tailored context:

```yaml
tools:
  thought_search:
    context_needs: [query, current_reasoning]
    max_tokens: 500

  session_history:
    context_needs: [query, time_range]
    max_tokens: 1000
```
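Applying a profile is just filtering the available context down to the declared fields. A hedged sketch; `TOOL_PROFILES` and `tailor_context` are illustrative names, and a real version would also enforce `max_tokens`.

```python
TOOL_PROFILES = {
    "thought_search": {"context_needs": ["query", "current_reasoning"], "max_tokens": 500},
    "session_history": {"context_needs": ["query", "time_range"], "max_tokens": 1000},
}

def tailor_context(tool: str, available: dict) -> dict:
    # Pass only the fields the tool declared; drop the full history.
    needs = TOOL_PROFILES[tool]["context_needs"]
    return {k: available[k] for k in needs if k in available}
```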

5. Memory Integration

Hot memory should query cold memory selectively (not scan sequentially):

```python
# Pseudocode: score, softmax, weighted_merge, and the memory handles
# (hot, cold) are illustrative, not existing APIs.
relevance_scores = {
    "hot_recent": score(query, hot.recent()),
    "hot_intent": score(query, hot.intent()),
    "cold_semantic": score(query, cold.search(query)),
}
weights = softmax(relevance_scores)
context = weighted_merge(sources, weights)
```
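The two helpers the pseudocode assumes can be made concrete in a few lines. This is a minimal sketch: `weighted_merge` slices characters as a stand-in for proper token accounting, and the numerically-stable shift in `softmax` is standard practice.

```python
import math

def softmax(scores: dict) -> dict:
    # Shift by the max for numerical stability before exponentiating.
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}

def weighted_merge(sources: dict, weights: dict, budget: int = 900) -> dict:
    # Allocate the context budget proportionally to each source's weight.
    return {name: text[: int(budget * weights[name])]
            for name, text in sources.items()}
```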

Design Principles (from Paper)

  1. Selective > Uniform: Learned/computed selection beats fixed accumulation
  2. Competitive Weighting: Softmax (forces prioritization) > sigmoid (independent gates)
  3. Preserve Locality: Most weight to recent, but never zero out distant (session goal always included)
  4. Operation Specialization: Routing (counsel/judgment) gets broad context; execution (command) gets narrow focus
  5. Block Compression: Group related operations, attend across summaries
  6. Zero Init / Graceful Degradation: Default to uniform if selection fails
  7. Depth Preference: With selective aggregation, deeper loops are CHEAPER (each phase can query all prior)
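Principles 2 and 6 can be contrasted directly: softmax weights compete and sum to 1, sigmoid gates admit each source independently, and selection should degrade to uniform weights on failure. A hedged sketch under those assumptions; none of these function names exist in Timmy today.

```python
import math

def competitive(scores: dict) -> dict:
    """Softmax: weights compete and sum to 1, forcing prioritization."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}

def independent(scores: dict) -> dict:
    """Sigmoid gates: each source is admitted on its own; no competition."""
    return {k: 1 / (1 + math.exp(-v)) for k, v in scores.items()}

def select_weights(scores: dict) -> dict:
    """Principle 6: fall back to uniform weights if selection fails."""
    try:
        return competitive(scores)
    except (ValueError, OverflowError):
        n = len(scores) or 1
        return {k: 1 / n for k in scores}
```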

Implementation Phases

Phase 1: Config Schema (Timmy)

  • Add context_selection schema to config.yaml
  • Add tool_profiles with context needs
  • Add phase_queries for three-phase loop
  • Update briefing to use weighted selection
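One possible shape for the Phase 1 schema, combining the fragments above into a single config.yaml sketch. All keys are proposals to be validated against Timmy's actual config layout, not an existing schema.

```yaml
context_selection:
  recency_weight: 0.5
  intent_weight: 0.3
  similarity_weight: 0.2

tool_profiles:
  thought_search:
    context_needs: [query, current_reasoning]
    max_tokens: 500

phase_queries:
  judgment:
    queries: [user_query, counsel_summary]
    selection: competitive
```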

Phase 2: Core Logic (Timmy)

  • Implement competitive context builder (softmax over sources)
  • Update briefing system to query (not accumulate)
  • Add cross-phase query mechanism
  • Tool invocation passes tailored context

Phase 3: Workshop Integration (Timmy + Hermes)

  • Define module summary representations
  • Implement cross-module queries
  • Workshop state = block representations, not full history

Phase 4: Memory Selective Retrieval (Timmy)

  • Hot memory competitive weighting
  • Cold memory queried via semantic search (not scanned)
  • Intent preservation (session goal always weighted)

Success Metrics

  • Briefing token usage drops (selective context > full dump)
  • Multi-step reasoning improves (better info routing)
  • Session goal doesn't get buried by recency
  • Tool calls get relevant context, not full history
  • Deeper loops become viable (5-7 phases instead of 3)

References

  • Paper: Attention Residuals (Moonshot AI, 2025) — archived in ~/.timmy/research/
  • Related issues: #271 (Three-Phase Loop), #265 (Workshop), #270 (Architecture as Identity)
  • Pattern doc: ~/.timmy/research/architectural-patterns-from-attnres.md

Notes for Timmy

This is YOUR architecture work. Hermes archived the research and extracted patterns, but you own the implementation. The config-driven approach aligns with your sovereignty principle — update YAML to add capabilities, not Python runtimes.

Start with Phase 1 (config schema). Test each pattern in isolation. Don't try to rewrite everything at once. Competitive weighting in briefing is a good first target — measurable token savings.

claude was assigned by Rockachopa 2026-03-22 23:36:46 +00:00
claude added the harness, heartbeat, p0-critical labels 2026-03-23 13:52:44 +00:00
claude was unassigned by Timmy 2026-03-24 19:34:37 +00:00
Timmy closed this issue 2026-03-24 21:55:28 +00:00

Reference: Rockachopa/Timmy-time-dashboard#273