Build task-to-backend classifier #96

Open
opened 2026-03-30 15:40:27 +00:00 by Timmy · 2 comments

Objective

Build the classifier that examines an incoming task and decides which backend should handle it. It starts rule-based and evolves toward learned routing.

Parent Epic

#94 — Grand Timmy: The Uniwizard

V1: Rule-Based Classification

```python
# Assumes agent/signals.py (see Deliverables) exposes extract_signals().
from agent.signals import extract_signals

# Metadata task types cheap enough to always run locally.
SIMPLE_OPS = {"read", "write", "search", "status"}


def classify(task_text: str, task_metadata: dict) -> list[str]:
    """Return backend names ranked by preference, best first."""
    # Always try local for simple ops
    if task_metadata.get("type") in SIMPLE_OPS:
        return ["local"]

    # Check task signals; the first matching signal wins, so order encodes priority
    signals = extract_signals(task_text)

    if signals.needs_reasoning:
        return ["claude", "local"]
    if signals.needs_long_context:
        return ["kimi", "claude"]
    if signals.needs_code_generation:
        return ["kimi", "claude", "local"]
    if signals.needs_multimodal:
        return ["gemini"]
    if signals.needs_speed:
        return ["grok", "local"]
    if signals.needs_broad_knowledge:
        return ["gpt", "claude"]

    # Default: local first, Claude as fallback
    return ["local", "claude"]
```
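Returning a ranked list only pays off if the router walks it. A minimal sketch of that fallback loop follows, assuming a hypothetical call_backend(name, text) adapter that raises a hypothetical BackendError on failure; neither name comes from this ticket, and the real router integration may look different.

```python
from typing import Callable


class BackendError(Exception):
    """Hypothetical error a backend adapter raises when a call fails."""


def dispatch(ranked: list[str], task_text: str,
             call_backend: Callable[[str, str], str]) -> tuple[str, str]:
    """Try backends in ranked order and return (backend, response) from the first success."""
    failures: dict[str, str] = {}
    for backend in ranked:
        try:
            return backend, call_backend(backend, task_text)
        except BackendError as exc:
            failures[backend] = str(exc)  # record the reason, then fall through to the next
    raise RuntimeError(f"every ranked backend failed: {failures}")


def fake_call(backend: str, task_text: str) -> str:
    """Stand-in adapter for the example: pretend Claude is rate-limited."""
    if backend == "claude":
        raise BackendError("rate limited")
    return f"{backend}: ok"


print(dispatch(["claude", "local"], "Why does the build fail?", fake_call))
# ('local', 'local: ok')
```

Only BackendError triggers fallback here; unexpected exceptions still propagate, so real bugs are not silently masked by trying the next backend.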

V2: Learned Routing (future)

After self-grading is in place (#89), use historical grade data to:

  • Learn which backends score highest on which task types
  • Adjust routing weights based on actual performance
  • A/B test: route the same task to two backends and compare their grades
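One simple way to turn grade history into routing is to rank backends per task type by their mean grade. The sketch below assumes a hypothetical record shape of (task_type, backend, grade) with grades in [0, 1]; the actual output of #89 may differ. Backends with fewer than min_samples grades are skipped, and an empty ranking means "fall back to the V1 rules".

```python
from collections import defaultdict
from statistics import mean

# Hypothetical grade record: (task_type, backend, grade), grade in [0, 1].
GradeRecord = tuple[str, str, float]


def learn_rankings(history: list[GradeRecord], min_samples: int = 3) -> dict[str, list[str]]:
    """Rank backends per task type by mean historical grade, highest first."""
    grades: dict[str, dict[str, list[float]]] = defaultdict(lambda: defaultdict(list))
    for task_type, backend, grade in history:
        grades[task_type][backend].append(grade)

    rankings: dict[str, list[str]] = {}
    for task_type, by_backend in grades.items():
        # Ignore backends with too few grades to trust the average.
        scored = {b: mean(g) for b, g in by_backend.items() if len(g) >= min_samples}
        rankings[task_type] = sorted(scored, key=scored.get, reverse=True)
    return rankings


history = [("code", "kimi", 0.9)] * 3 + [("code", "claude", 0.7)] * 3 + [("code", "local", 0.95)]
print(learn_rankings(history))  # {'code': ['kimi', 'claude']}
```

Here local has the best single grade but only one sample, so it is left out until more data arrives. A/B-tested pairs would simply contribute two records per task to the same history.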

Signal Extraction

Parse the task text for indicators:

  • Code blocks, file paths → code task
  • "Why", "explain", "analyze" → reasoning
  • Large file references, "summarize this doc" → long context
  • Image URLs, "look at this" → multimodal
  • "Quick", "fast", "just tell me" → speed
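The indicators above map naturally onto keyword and regex checks. A minimal sketch of extract_signals follows, returning an object with the attributes the V1 classifier reads. The regexes, the list of code file extensions, and the LONG_TEXT_CHARS threshold (a stand-in for "large file references", which cannot be sized from text alone) are all assumptions. needs_broad_knowledge has no listed indicator, so it stays False here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    needs_reasoning: bool = False
    needs_long_context: bool = False
    needs_code_generation: bool = False
    needs_multimodal: bool = False
    needs_speed: bool = False
    needs_broad_knowledge: bool = False  # no indicator defined yet; always False


# Code fences or paths ending in a common source-file extension (assumed list).
_CODE = re.compile(r"`{3}|(?:^|\s)[\w./-]+\.(?:py|js|ts|go|rs|java|c|cpp|h|json|ya?ml|toml)\b", re.M)
_REASONING = re.compile(r"\b(?:why|explain\w*|analy[sz]\w*)\b", re.I)
_LONG = re.compile(r"\bsummari[sz]e (?:this|the) (?:doc|document|file)\b", re.I)
_MULTIMODAL = re.compile(r"https?://\S+\.(?:png|jpe?g|gif|webp)\b|\blook at this\b", re.I)
_SPEED = re.compile(r"\b(?:quick(?:ly)?|fast|just tell me)\b", re.I)
LONG_TEXT_CHARS = 20_000  # hypothetical proxy for "large file references"


def extract_signals(task_text: str) -> Signals:
    """Flag each routing signal whose indicators appear in the task text."""
    return Signals(
        needs_reasoning=bool(_REASONING.search(task_text)),
        needs_long_context=bool(_LONG.search(task_text)) or len(task_text) > LONG_TEXT_CHARS,
        needs_code_generation=bool(_CODE.search(task_text)),
        needs_multimodal=bool(_MULTIMODAL.search(task_text)),
        needs_speed=bool(_SPEED.search(task_text)),
    )
```

Several signals can fire at once ("Why does agent/classifier.py crash?" sets both reasoning and code); the V1 if-chain then resolves the conflict by its check order.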

Deliverables

  • agent/classifier.py — task classification
  • agent/signals.py — signal extraction
  • Test suite with 30+ classified example tasks
  • Integration with router
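The test suite mostly needs a scoring harness. The sketch below assumes each labelled example is a (task_text, task_metadata, expected_top_backend) tuple and counts a task as correct when the classifier's top pick matches the label; whether "correct" should instead accept any position in the ranking is a choice the ticket leaves open. toy_classify and the three examples exist only so the block runs on its own.

```python
from typing import Callable

# Hypothetical labelled example: (task_text, task_metadata, expected top backend).
Example = tuple[str, dict, str]


def accuracy(examples: list[Example],
             classify: Callable[[str, dict], list[str]]) -> tuple[int, int]:
    """Return (correct, total), scoring a task correct when its top pick matches the label."""
    correct = sum(1 for text, meta, expected in examples
                  if classify(text, meta)[:1] == [expected])
    return correct, len(examples)


def toy_classify(text: str, meta: dict) -> list[str]:
    """Stand-in classifier so the harness runs on its own."""
    return ["local"] if meta.get("type") == "status" else ["claude", "local"]


examples: list[Example] = [
    ("show agent status", {"type": "status"}, "local"),
    ("Why is the build red?", {}, "claude"),
    ("Quick, what time is it?", {}, "grok"),
]
print(accuracy(examples, toy_classify))  # (2, 3)

# Against the real suite (EXAMPLES is hypothetical), the acceptance gate could be:
#   correct, total = accuracy(EXAMPLES, classify)
#   assert total >= 30 and correct / total >= 25 / 30
```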

Acceptance Criteria

  • Correctly classifies at least 25 of the 30 test tasks
  • Local is the default when there is no strong signal
  • Ranking (not just a top pick) allows fallback
  • Extensible: adding a new backend requires only a registry entry
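In V1, adding a backend means editing the if-chain in classify. One way to meet the "registry entry only" criterion is to move the signal-to-backend preferences into data. In the sketch below, Backend, REGISTRY, the strength weights, and rank_backends are all hypothetical; the weights were picked so that each single signal reproduces the V1 ranking exactly. With several active signals, scores add up instead of first-match-wins, which is a behavior change worth deciding on deliberately. The simple-ops metadata shortcut would stay as a pre-check.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    name: str
    # Hypothetical weights: how strongly this backend suits each signal.
    strengths: dict[str, float] = field(default_factory=dict)


# Weights chosen so each single signal reproduces the V1 ranking.
REGISTRY: list[Backend] = [
    Backend("local", {"needs_reasoning": 0.2, "needs_code_generation": 0.3, "needs_speed": 0.5}),
    Backend("claude", {"needs_reasoning": 1.0, "needs_long_context": 0.6,
                       "needs_code_generation": 0.6, "needs_broad_knowledge": 0.5}),
    Backend("kimi", {"needs_long_context": 1.0, "needs_code_generation": 1.0}),
    Backend("gemini", {"needs_multimodal": 1.0}),
    Backend("grok", {"needs_speed": 1.0}),
    Backend("gpt", {"needs_broad_knowledge": 1.0}),
]
DEFAULT_RANKING = ["local", "claude"]


def rank_backends(active: set[str], registry: list[Backend] = REGISTRY) -> list[str]:
    """Rank backends by summed strength over the active signals, best first."""
    scores = {b.name: sum(b.strengths.get(s, 0.0) for s in active) for b in registry}
    ranked = sorted((n for n, sc in scores.items() if sc > 0),
                    key=lambda n: scores[n], reverse=True)
    # No strong signal: local first, as the acceptance criteria require.
    return ranked or list(DEFAULT_RANKING)
```

Adding a backend is then one line, for example REGISTRY.append(Backend("mistral", {"needs_speed": 0.8})). The active set can be derived from a Signals object with {name for name, on in vars(signals).items() if on}.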
ezra was assigned by Timmy 2026-03-30 15:40:27 +00:00

Research input (#101): RouteLLM trains strong/weak routing classifiers on Chatbot Arena preference data (4 approaches: matrix factorization (MF), BERT classifier, similarity-weighted ranking, causal LLM classifier). Martian achieves SOTA per-prompt quality prediction. Hermes smart_model_routing.py (currently disabled) is the extension point. Start rule-based, evolve to learned routing.
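RouteLLM frames routing as a binary strong/weak choice: a trained router estimates how likely the strong model is to give the better answer, and a cost threshold decides. A sketch of how that idea could fit this ticket's ranked-list contract is below. It is not RouteLLM's API: p_strong_wins is a hypothetical callable standing in for any trained router, and mapping strong=claude, weak=local is an assumption. Raising the threshold sends fewer tasks to the strong (more expensive) backend.

```python
from typing import Callable


def threshold_route(prompt: str,
                    p_strong_wins: Callable[[str], float],
                    threshold: float = 0.5,
                    strong: str = "claude",
                    weak: str = "local") -> list[str]:
    """Binary strong/weak routing: prefer the strong model only when it is predicted to win."""
    p = p_strong_wins(prompt)
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"expected a probability, got {p}")
    # Keep both backends in the ranking so the router can still fall back.
    return [strong, weak] if p >= threshold else [weak, strong]


def stub_scorer(prompt: str) -> float:
    """Hypothetical stand-in for a trained router."""
    return 0.8 if "why" in prompt.lower() else 0.2


print(threshold_route("Why is the build slow?", stub_scorer))  # ['claude', 'local']
print(threshold_route("list open issues", stub_scorer))        # ['local', 'claude']
```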


Role Transition

Timmy now owns execution — building, coding, implementing.
Ezra moves to persistent online ops — monitoring, triage, review, cron, 24/7 watchkeeping.

Timmy: this is yours. Read the ticket, build it, PR it. Ezra reviews.

Timmy — build the task classifier. Rule-based first: simple=local, reasoning=claude, code=kimi, multimodal=gemini. See #101 for the landscape analysis.

ezra was unassigned by Timmy 2026-03-30 16:03:23 +00:00
Timmy self-assigned this 2026-03-30 16:03:23 +00:00
2 Participants
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: Timmy_Foundation/timmy-home#96