[P0] Build ResearchOrchestrator pipeline (src/timmy/research.py) #975

Closed
opened 2026-03-22 19:08:53 +00:00 by perplexity · 1 comment
Collaborator

Parent

  • #972 — [GOVERNING] Replacing Claude — Autonomous Research Pipeline Spec

Objective

Implement the main research pipeline that chains Scope→Query→Search→Fetch→Synthesize→Deliver into an autonomous workflow. This is the core of research sovereignty.

Scope

Implement ResearchOrchestrator class with:

class ResearchOrchestrator:
    def __init__(self, cascade, memory, tools): ...
    async def run(self, topic, template, context) -> "ResearchResult": ...

Pipeline Steps:

  1. CHECK LOCAL KNOWLEDGE FIRST — memory.search(topic, limit=10). If confidence > 0.85, return cached result. This is the critical line.
  2. GENERATE QUERIES — Fill template, ask cascade to generate 8-12 search queries
  3. SEARCH — Execute queries via web_search, collect top 5 results per query
  4. FETCH — Rank snippets by relevance, fetch top 10 full pages via web_fetch (3000 tokens each)
  5. SYNTHESIZE — Pass filled template + fetched pages to cascade.generate (max_tokens=4000)
  6. CRYSTALLIZE — Store result in semantic memory (memory.store(topic, report, type="research"))
  7. WRITE ARTIFACT — Commit to repo, extract action items, create Gitea issues
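A minimal Python sketch of how the seven steps could compose. It rests on assumptions that are not in the issue: `memory.search` is synchronous and returns dicts with `confidence` and `content` keys, `cascade.generate` is async and takes an optional `max_tokens`, and `tools` is a dict of async callables named `web_search`, `web_fetch`, and `write_artifact`. All of these names are hypothetical, and the real `cascade.py`, memory, and tool interfaces may differ. The `ResearchResult` fields are also invented here. Step 4 uses naive term overlap as a stand-in for whatever relevance ranking the implementation actually uses.

```python
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable

CACHE_CONFIDENCE = 0.85  # step 1 threshold from the issue


@dataclass
class ResearchResult:
    # Hypothetical fields; the issue does not define ResearchResult.
    topic: str
    report: str
    cached: bool
    sources: list[str] = field(default_factory=list)


class ResearchOrchestrator:
    def __init__(self, cascade: Any, memory: Any,
                 tools: dict[str, Callable[..., Awaitable[Any]]]) -> None:
        self.cascade = cascade  # assumed: async generate(prompt, max_tokens=None) -> str
        self.memory = memory    # assumed: sync search(topic, limit) / store(topic, report, type)
        self.tools = tools      # assumed keys: web_search, web_fetch, write_artifact

    async def run(self, topic: str, template: str,
                  context: dict[str, str]) -> ResearchResult:
        # 1. Local knowledge first: a confident hit skips every model and network call.
        hits = self.memory.search(topic, limit=10)
        best = max(hits, key=lambda h: h["confidence"], default=None)
        if best is not None and best["confidence"] > CACHE_CONFIDENCE:
            return ResearchResult(topic, best["content"], cached=True)

        # 2. Fill the template; ask the cascade for queries, one per line, capped at 12.
        prompt = template.format(topic=topic, **context)
        raw = await self.cascade.generate(f"Write 8-12 search queries for:\n{prompt}")
        queries = [q.strip() for q in raw.splitlines() if q.strip()][:12]

        # 3. Search: keep the top 5 results per query.
        results: list[dict[str, str]] = []
        for query in queries:
            results.extend((await self.tools["web_search"](query))[:5])

        # 4. Rank by naive term overlap with the topic (stand-in ranker),
        #    dedupe URLs in ranked order, and fetch the top 10 pages.
        terms = set(topic.lower().split())
        results.sort(key=lambda r: len(terms & set(r["snippet"].lower().split())),
                     reverse=True)
        urls = list(dict.fromkeys(r["url"] for r in results))[:10]
        pages = [await self.tools["web_fetch"](url, max_tokens=3000) for url in urls]

        # 5. Synthesize from the filled template plus the fetched pages.
        report = await self.cascade.generate(
            prompt + "\n\n" + "\n\n".join(pages), max_tokens=4000)

        # 6. Crystallize so the next run on this topic can hit the cache.
        self.memory.store(topic, report, type="research")

        # 7. Commit the artifact and file Gitea issues via an injected hook.
        await self.tools["write_artifact"](topic, report)
        return ResearchResult(topic, report, cached=False, sources=urls)
```

Because the cache check returns before any `cascade` or tool call, a confident hit incurs no model or network traffic. That is what the "cache hit returns instantly" acceptance criterion asks for.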

Dependencies

  • web_fetch tool (sibling P0 issue)
  • Research templates (sibling P0 issue)
  • cascade.py (exists — LLM router)
  • Semantic memory (exists or P1 enhancement)

Key Design Notes

  • Runs as a Paperclip task via DistributedWorker (P1 integration)
  • The cascade router passes the template's cascade_tier hint for model selection
  • Every research output gets embedded → compound interest of crystallized knowledge
  • Record metrics: research_cache_hit, research_api_call
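One way to record the two sovereignty metrics is an in-process counter. This sketch assumes that `research_api_call` is incremented once per pipeline run that misses the cache; the issue does not say whether it counts runs or individual API calls. `SovereigntyMetrics` is a hypothetical name, not an existing class in the repo. The orchestrator would call `record("research_cache_hit")` on the step 1 early return and `record("research_api_call")` otherwise.

```python
from collections import Counter


class SovereigntyMetrics:
    """Hypothetical in-process counter for the metrics named in the issue."""

    def __init__(self) -> None:
        self.counts = Counter()

    def record(self, name: str) -> None:
        # Increment a named metric, e.g. "research_cache_hit".
        self.counts[name] += 1

    def cache_hit_rate(self) -> float:
        # Assumes one research_api_call per run that missed the cache,
        # so hits + api calls equals the total number of runs.
        hits = self.counts["research_cache_hit"]
        total = hits + self.counts["research_api_call"]
        return hits / total if total else 0.0
```

A rising hit rate over time is the "compound interest" effect described above: more topics are answered from crystallized memory instead of fresh web and model calls.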

Effort Estimate

1 day

Acceptance Criteria

  • Pipeline runs end-to-end: topic in → structured report out
  • Local knowledge check works (cache hit returns instantly)
  • Results are stored in semantic memory for future queries
  • Gitea issues are created from extracted action items
  • Metrics are recorded for sovereignty tracking
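The step 7 issue creation could look roughly like this sketch. It assumes the synthesized report lists action items as unchecked markdown checkboxes (`- [ ] ...`), which is a hypothetical convention and not something the issue specifies. The request targets Gitea's `POST /api/v1/repos/{owner}/{repo}/issues` endpoint with a `title`/`body` JSON payload and `token` authorization. `build_issue_request` only constructs the request; sending it with `urllib.request.urlopen` is left to the caller.

```python
import json
import re
import urllib.request

# Hypothetical convention: action items are unchecked markdown checkboxes.
ACTION_RE = re.compile(r"^[ \t]*[-*][ \t]+\[ \][ \t]+(.+)$", re.MULTILINE)


def extract_action_items(report: str) -> list[str]:
    # Checked items ("- [x]") are skipped; only open tasks become issues.
    return [item.strip() for item in ACTION_RE.findall(report)]


def build_issue_request(base_url: str, owner: str, repo: str, token: str,
                        title: str, body: str) -> urllib.request.Request:
    # Gitea's create-issue endpoint; the request is built but not sent.
    url = f"{base_url.rstrip('/')}/api/v1/repos/{owner}/{repo}/issues"
    data = json.dumps({"title": title, "body": body}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": f"token {token}"},
    )
```

Keeping extraction and request building separate makes the "Gitea issues are created from extracted action items" criterion testable without a live Gitea instance.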
claude was assigned by Rockachopa 2026-03-22 21:44:41 +00:00
Owner

PR #1000 created.

Implemented ResearchOrchestrator in src/timmy/research.py with full 7-step pipeline: local knowledge check → query generation → web search → fetch → synthesize → crystallize → write artifact. Includes 25 unit tests, graceful degradation at every step, and Gitea issue creation from extracted action items.

Reference: Rockachopa/Timmy-time-dashboard#975