Build task-to-backend classifier #96


Timmy · 2026-03-30T15:40:27Z

Timmy commented

2026-03-30 15:40:27 +00:00

Objective

Build the classifier that examines an incoming task and decides which backend should handle it. It starts rule-based and evolves to learned routing.

Parent Epic

#94 — Grand Timmy: The Uniwizard

V1: Rule-Based Classification

from agent.signals import extract_signals  # assumed import path, per Deliverables

# Task types that are always handled locally.
SIMPLE_TASK_TYPES = {'read', 'write', 'search', 'status'}


def classify(task_text: str, task_metadata: dict) -> list[str]:
    """Return a ranked list of backend preferences, best first."""

    # Always try local for simple ops
    if task_metadata.get('type') in SIMPLE_TASK_TYPES:
        return ['local']

    # Check task signals. Rules are evaluated in order; the first match wins.
    signals = extract_signals(task_text)

    if signals.needs_reasoning:
        return ['claude', 'local']
    if signals.needs_long_context:
        return ['kimi', 'claude']
    if signals.needs_code_generation:
        return ['kimi', 'claude', 'local']
    if signals.needs_multimodal:
        return ['gemini']
    if signals.needs_speed:
        return ['grok', 'local']
    if signals.needs_broad_knowledge:
        return ['gpt', 'claude']

    # Default: local first
    return ['local', 'claude']

V2: Learned Routing (future)

After self-grading is in place (#89), use historical grade data to:

Learn which backends score highest on which task types
Adjust routing weights based on actual performance
A/B test: route the same task to two backends and compare grades
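
One way the first two points could look in practice: average the historical grades per (task type, backend) pair and rank backends by that mean. This is only a sketch. The record shape, the `learn_rankings` name, and the `min_samples` cutoff are assumptions, since the grade format from #89 is not specified here.

```python
from collections import defaultdict

# Hypothetical grade record: (task_type, backend, grade between 0 and 1).
GradeRecord = tuple[str, str, float]


def learn_rankings(history: list[GradeRecord], min_samples: int = 2) -> dict[str, list[str]]:
    """Rank backends per task type by mean historical grade, best first."""
    totals = defaultdict(lambda: [0.0, 0])  # (task_type, backend) -> [grade sum, count]
    for task_type, backend, grade in history:
        entry = totals[(task_type, backend)]
        entry[0] += grade
        entry[1] += 1

    scored = defaultdict(list)  # task_type -> [(mean grade, backend), ...]
    for (task_type, backend), (total, count) in totals.items():
        if count >= min_samples:  # skip pairs with too little evidence
            scored[task_type].append((total / count, backend))

    # Sort by descending mean; break ties by backend name for stable output.
    return {
        task_type: [backend for _, backend in sorted(pairs, key=lambda p: (-p[0], p[1]))]
        for task_type, pairs in scored.items()
    }
```

Backends with fewer than `min_samples` grades are left out of the learned ranking, so the rule-based V1 ordering would still be needed as a fallback for them.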

Signal Extraction

Parse the task text for indicators:

Code blocks, file paths → code task
"Why", "explain", "analyze" → reasoning
Large file references, "summarize this doc" → long context
Image URLs, "look at this" → multimodal
"Quick", "fast", "just tell me" → speed

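The indicators above could be sketched as keyword and regex checks. Everything here is an assumption except the `extract_signals` name and the `needs_*` attributes used in V1: the `Signals` dataclass, the patterns, and the file-extension list are illustrative. "Large file references" would probably need file sizes from task metadata, so only the "summarize this doc" phrase is covered, and `needs_broad_knowledge` has no indicator listed in the ticket and stays False.

```python
import re
from dataclasses import dataclass

FENCE = "`" * 3  # markdown code-fence marker

# Illustrative patterns only; tune against real tasks.
_PATH = re.compile(r"(?:^|\s)(?:[\w.-]+/)+[\w.-]+|\b[\w-]+\.(?:py|js|ts|go|rs|java|json|yaml|toml)\b")
_REASONING = re.compile(r"\b(?:why|explain|analyze)\b", re.IGNORECASE)
_LONG_CONTEXT = re.compile(r"summarize this doc", re.IGNORECASE)
_IMAGE = re.compile(r"https?://\S+\.(?:png|jpe?g|gif|webp)\b|look at this", re.IGNORECASE)
_SPEED = re.compile(r"\b(?:quick|fast)\b|just tell me", re.IGNORECASE)


@dataclass
class Signals:
    needs_reasoning: bool = False
    needs_long_context: bool = False
    needs_code_generation: bool = False
    needs_multimodal: bool = False
    needs_speed: bool = False
    needs_broad_knowledge: bool = False  # no indicator defined yet


def extract_signals(task_text: str) -> Signals:
    """Flag every signal whose indicator appears in the task text."""
    return Signals(
        needs_reasoning=bool(_REASONING.search(task_text)),
        needs_long_context=bool(_LONG_CONTEXT.search(task_text)),
        needs_code_generation=FENCE in task_text or bool(_PATH.search(task_text)),
        needs_multimodal=bool(_IMAGE.search(task_text)),
        needs_speed=bool(_SPEED.search(task_text)),
    )
```

Several signals can fire for one task; the rule order in `classify` decides which one wins.
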
Deliverables

agent/classifier.py — task classification
agent/signals.py — signal extraction
Test suite with 30+ classified example tasks
Integration with router

Acceptance Criteria

Correctly classifies at least 25 of 30 test tasks
Local is the default when there is no strong signal
Ranking (not just top pick) allows fallback
Extensible: adding a new backend requires only a registry entry
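
The extensibility criterion suggests moving the rankings out of the `if` chain and into data. A minimal sketch, assuming a hypothetical registry schema where each backend lists the signals it serves with a priority (lower is tried first). The priorities below mirror the V1 rankings; the schema and the `rank_backends` name are not from the ticket.

```python
# Hypothetical registry: backend name -> {signal name: priority}.
Registry = dict[str, dict[str, int]]

REGISTRY: Registry = {
    "local": {"reasoning": 2, "code": 3, "speed": 2, "default": 1},
    "claude": {"reasoning": 1, "long_context": 2, "code": 2, "knowledge": 2, "default": 2},
    "kimi": {"long_context": 1, "code": 1},
    "gemini": {"multimodal": 1},
    "grok": {"speed": 1},
    "gpt": {"knowledge": 1},
}


def rank_backends(signal: str, registry: Registry) -> list[str]:
    """Return the backends registered for a signal, lowest priority number first."""
    candidates = [(prio[signal], name) for name, prio in registry.items() if signal in prio]
    if not candidates and signal != "default":
        # No backend claims this signal: fall back to the default ranking.
        return rank_backends("default", registry)
    return [name for _, name in sorted(candidates)]


# Adding a backend is one new entry; rank_backends itself does not change.
extended = {**REGISTRY, "mistral": {"code": 0}}  # "mistral" is a made-up example
print(rank_backends("code", extended))
```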


ezra was assigned by Timmy

2026-03-30 15:40:27 +00:00

Rockachopa referenced this issue

2026-03-30 15:48:45 +00:00

[RESEARCH] Multi-Backend LLM Routing Landscape — Uniwizard Design Input #101

Rockachopa referenced this issue

2026-03-30 15:49:19 +00:00

[EPIC] Grand Timmy — The Uniwizard #94

Rockachopa commented

2026-03-30 15:49:28 +00:00

Research input: #101 — RouteLLM trains strong/weak routing classifiers using Chatbot Arena preference data (4 approaches: MF, BERT, ranking, causal LLM). Martian achieves SOTA per-prompt quality prediction. Hermes smart_model_routing.py (currently disabled) is the extension point. Start rule-based, evolve to learned routing.

Rockachopa referenced this issue

2026-03-30 15:50:21 +00:00

[EPIC] Grand Timmy — The Uniwizard #94

Timmy referenced this issue

2026-03-30 15:58:50 +00:00

[EPIC] Grand Timmy — The Uniwizard #94

Timmy commented

2026-03-30 16:03:23 +00:00

Role Transition

Timmy now owns execution — building, coding, implementing.
Ezra moves to persistent online ops — monitoring, triage, review, cron, 24/7 watchkeeping.

Timmy: this is yours. Read the ticket, build it, PR it. Ezra reviews.

Timmy — build the task classifier. Rule-based first: simple=local, reasoning=claude, code=kimi, multimodal=gemini. See #101 for the landscape analysis.


ezra was unassigned by Timmy

2026-03-30 16:03:23 +00:00

Timmy self-assigned this 2026-03-30 16:03:23 +00:00

Timmy referenced this issue

2026-03-30 16:49:46 +00:00

[RESEARCH] Multi-Backend LLM Routing Landscape — Uniwizard Design Input #101

Rockachopa referenced this issue

2026-03-30 17:01:38 +00:00

[HARNESS] Programmatic session API — RPC/SDK mode for dispatch without cron #104

ezra referenced this issue

2026-03-31 16:30:05 +00:00

[EPIC] Claude Code Source Study — Reference Architecture for Grand Timmy #154