Commit `81ee0557d6` by Alexander Whitestone
feat: implement autonomous research pipeline (#972)
Closes three P0 items from the governing research sovereignty spec:

- `src/timmy/research.py` — ResearchOrchestrator (6-step pipeline):
  Step 0 semantic cache check (SQLite, instant, $0 cost)
  Step 1 research template loading from skills/research/
  Step 2 query formulation via Ollama slot-fill
  Step 3 web search via SerpAPI (graceful fallback when key absent)
  Step 4 full-page fetch via trafilatura (web_fetch)
  Step 5 synthesis via cascade (Ollama → Claude API fallback)
  Step 6 store to semantic memory + optional disk persist

- `tests/timmy/test_research.py` — 24 unit tests, all passing

- `SOVEREIGNTY.md` — machine-readable research independence manifest
  with pipeline status, cascade tiers, templates, and metrics targets

Refs #972 (governing spec), #973 (web_fetch), #974 (templates), #975 (orchestrator)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 21:39:58 -04:00


# SOVEREIGNTY.md — Research Sovereignty Manifest

> "If this spec is implemented correctly, it is the last research document Alexander should need to request from a corporate AI." — Issue #972, March 22 2026


## What This Is

A machine-readable declaration of Timmy's research independence: where we are, where we're going, and how to measure progress.


## The Problem We're Solving

On March 22, 2026, a single Claude session produced six deep research reports. It consumed ~3 hours of human time and substantial corporate AI inference. Every report was valuable — but the workflow was linear. It would cost exactly the same to reproduce tomorrow.

This file tracks the pipeline that crystallizes that workflow into something Timmy can run autonomously.


## The Six-Step Pipeline

| Step | What Happens | Status |
|------|--------------|--------|
| 1. Scope | Human describes knowledge gap → Gitea issue with template | Done (`skills/research/`) |
| 2. Query | LLM slot-fills template → 5–15 targeted queries | Done (`research.py`) |
| 3. Search | Execute queries → top result URLs | Done (`research_tools.py`) |
| 4. Fetch | Download + extract full pages (trafilatura) | Done (`tools/system_tools.py`) |
| 5. Synthesize | Compress findings → structured report | Done (`research.py` cascade) |
| 6. Deliver | Store to semantic memory + optional disk persist | Done (`research.py`) |
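The steps above can be sketched as one async function. This is a minimal illustration, not the actual `research.py` API: every helper name here (`cache_lookup`, `formulate_queries`, `synthesize`, and friends) is a hypothetical stand-in, stubbed so the flow is visible end to end.

```python
import asyncio
from dataclasses import dataclass

# --- hypothetical stand-ins for the real pipeline stages ---
_CACHE: dict[str, str] = {}

def cache_lookup(topic):                       # Step 0: semantic cache check
    return _CACHE.get(topic)

def load_template(name):                       # Step 1: template loading
    return f"Survey {{topic}} ({name})"

def formulate_queries(template, topic):        # Step 2: LLM slot-fill
    return [template.replace("{topic}", topic)]

def search(query):                             # Step 3: web search
    return [f"https://example.org/{abs(hash(query)) % 100}"]

async def fetch(url):                          # Step 4: full-page fetch
    return f"page text from {url}"

def synthesize(pages, topic):                  # Step 5: cascade synthesis
    return f"Report on {topic} from {len(pages)} page(s)"

@dataclass
class ResearchResult:
    report: str
    synthesis_backend: str
    cached: bool

async def run_pipeline(topic: str) -> ResearchResult:
    if (hit := cache_lookup(topic)) is not None:        # instant, $0
        return ResearchResult(hit, "cache", True)
    template = load_template("state_of_art")
    queries = formulate_queries(template, topic)
    urls = [u for q in queries for u in search(q)]
    pages = await asyncio.gather(*(fetch(u) for u in urls))
    report = synthesize(pages, topic)
    _CACHE[topic] = report                              # Step 6: store for reuse
    return ResearchResult(report, "ollama", False)

first = asyncio.run(run_pipeline("local embeddings"))
second = asyncio.run(run_pipeline("local embeddings"))
```

On the second call for the same topic the cache hit short-circuits everything after Step 0, which is where the "$0 / instant" tier comes from.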

## Cascade Tiers (Synthesis Quality vs. Cost)

| Tier | Model | Cost | Quality | Status |
|------|-------|------|---------|--------|
| 4 | SQLite semantic cache | $0.00 / instant | reuses prior | Active |
| 3 | Ollama qwen3:14b | $0.00 / local | ★★★ | Active |
| 2 | Claude API (haiku) | ~$0.01/report | ★★★★ | Active (opt-in) |
| 1 | Groq llama-3.3-70b | $0.00 / rate-limited | ★★★★ | 🔲 Planned (#980) |

Set `ANTHROPIC_API_KEY` to enable the Tier 2 fallback.
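The cheapest-first walk through the tiers might look like this sketch. `ollama_synthesize` and `claude_synthesize` are hypothetical stand-ins for the real backends; the only load-bearing idea is that the paid tier is attempted only when the opt-in key is set.

```python
import os

def ollama_synthesize(text: str) -> str:
    # Tier 3: local model, $0. May raise if the Ollama server is down.
    return f"[ollama] {text[:60]}"

def claude_synthesize(text: str) -> str:
    # Tier 2: paid API, ~$0.01/report. Only reached on local failure.
    return f"[claude] {text[:60]}"

def cascade_synthesize(text: str) -> tuple[str, str]:
    """Return (report, backend), walking tiers cheapest-first."""
    try:
        return ollama_synthesize(text), "ollama"
    except Exception:
        if os.environ.get("ANTHROPIC_API_KEY"):
            return claude_synthesize(text), "claude-haiku"
        raise RuntimeError("no synthesis backend available")

report, backend = cascade_synthesize("raw findings from fetched pages")
```

The semantic cache (Tier 4) sits in front of this function entirely, so the cascade only runs on cache misses.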


## Research Templates

Six prompt templates live in `skills/research/`:

| Template | Use Case |
|----------|----------|
| `tool_evaluation.md` | Find all shipping tools for `{domain}` |
| `architecture_spike.md` | How to connect `{system_a}` to `{system_b}` |
| `game_analysis.md` | Evaluate `{game}` for AI agent play |
| `integration_guide.md` | Wire `{tool}` into `{stack}` with code |
| `state_of_art.md` | What exists in `{field}` as of `{date}` |
| `competitive_scan.md` | How does `{project}` compare to `{alternatives}` |
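Slot-filling is plain string substitution over the `{braced}` names in the table above. A minimal sketch, with the template reduced to an inline string for illustration (the real templates are full markdown files):

```python
# Hypothetical inline stand-in for skills/research/tool_evaluation.md
TOOL_EVALUATION = (
    "Find all shipping tools for {domain}. "
    "Use case: {use_case}. Rank candidates by {focus_criteria}."
)

def fill_slots(template: str, slots: dict[str, str]) -> str:
    # format_map raises KeyError on a missing slot, which surfaces
    # incomplete issue templates before any search spend.
    return template.format_map(slots)

query = fill_slots(TOOL_EVALUATION, {
    "domain": "PDF parsing",
    "use_case": "RAG pipeline",
    "focus_criteria": "accuracy",
})
```

In the real pipeline the LLM proposes the slot values (Step 2); the substitution itself needs no model at all.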

## Sovereignty Metrics

| Metric | Target (Week 1) | Target (Month 1) | Target (Month 3) | Graduation |
|--------|-----------------|------------------|------------------|------------|
| Queries answered locally | 10% | 40% | 80% | >90% |
| API cost per report | <$1.50 | <$0.50 | <$0.10 | <$0.01 |
| Time from question to report | <3 hours | <30 min | <5 min | <1 min |
| Human involvement | 100% (review) | Review only | Approve only | None |
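One way the first three metrics could be computed from per-report logs. The `ReportLog` shape and backend names are assumptions for illustration, not the schema of the planned dashboard (#981):

```python
from dataclasses import dataclass

@dataclass
class ReportLog:
    backend: str      # e.g. "cache", "ollama", "claude-haiku"
    api_cost: float   # USD spent on corporate API calls
    minutes: float    # question-to-report wall time

LOCAL_BACKENDS = {"cache", "ollama"}  # $0, no corporate inference

def sovereignty_metrics(logs: list[ReportLog]) -> dict[str, float]:
    local = sum(log.backend in LOCAL_BACKENDS for log in logs)
    return {
        "local_rate": local / len(logs),
        "cost_per_report": sum(l.api_cost for l in logs) / len(logs),
        "avg_minutes": sum(l.minutes for l in logs) / len(logs),
    }

logs = [
    ReportLog("ollama", 0.00, 4.0),
    ReportLog("claude-haiku", 0.01, 3.0),
    ReportLog("cache", 0.00, 0.1),
    ReportLog("ollama", 0.00, 6.0),
]
m = sovereignty_metrics(logs)  # local_rate 0.75: past the Week-1 target
```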

## How to Use the Pipeline

```python
from timmy.research import run_research

# run_research is a coroutine: call it from within an async
# context (e.g. via asyncio.run()).

# Quick research (no template)
result = await run_research("best local embedding models for 36GB RAM")

# With a template and slot values
result = await run_research(
    topic="PDF text extraction libraries for Python",
    template="tool_evaluation",
    slots={"domain": "PDF parsing", "use_case": "RAG pipeline", "focus_criteria": "accuracy"},
    save_to_disk=True,
)

print(result.report)
print(f"Backend: {result.synthesis_backend}, Cached: {result.cached}")
```

## Implementation Status

| Component | Issue | Status |
|-----------|-------|--------|
| web_fetch tool (trafilatura) | #973 | Done |
| Research template library (6 templates) | #974 | Done |
| ResearchOrchestrator (`research.py`) | #975 | Done |
| Semantic index for outputs | #976 | 🔲 Planned |
| Auto-create Gitea issues from findings | #977 | 🔲 Planned |
| Paperclip task runner integration | #978 | 🔲 Planned |
| Kimi delegation via labels | #979 | 🔲 Planned |
| Groq free-tier cascade tier | #980 | 🔲 Planned |
| Sovereignty metrics dashboard | #981 | 🔲 Planned |

## Governing Spec

See issue #972 for the full spec and rationale.

Research artifacts are committed to `docs/research/`.