[GOVERNING] Replacing Claude — Autonomous Research Pipeline Spec #972

Closed
opened 2026-03-22 19:07:27 +00:00 by perplexity · 3 comments
Collaborator

Summary

This is a governing implementation spec authored by Alexander Whitestone (rockachopa) that defines how Timmy must build an autonomous research pipeline to eliminate dependency on Claude and all corporate AI for deep research tasks.

Core thesis: "If this spec is implemented correctly, it is the last research document Alexander should need to request from a corporate AI."

The Problem

On March 22, 2026, a single Claude session produced six deep research reports. It consumed ~3 hours of human time and substantial corporate AI inference. Every report was valuable but the workflow was linear — it would cost exactly the same to reproduce tomorrow. This spec crystallizes that workflow into an automatable pipeline.

The Six-Step Research Pattern to Automate

Step What Happens Current Tool Sovereign Replacement
1. Scope Human describes knowledge gap Chat with Claude Gitea issue with research template
2. Query Formulate 5-15 targeted search queries Claude prompt engineering Query template + local LLM slot-fill
3. Search Execute queries, get top results Claude web_search Timmy web_search tool (already exists)
4. Fetch Read full pages for key results Claude web_fetch NEW: web_fetch tool (requests + trafilatura)
5. Synthesize Compress 200K tokens into structured report Claude Opus (frontier) cascade.py: Groq → Ollama local
6. Deliver Format as PDF/markdown, file Gitea issues Claude artifact generation reportlab PDF + Gitea API

Steps 1, 3, and 6 already exist. Three gaps must be closed: research templates (Step 2), web_fetch tool (Step 4), and synthesis quality at non-frontier models (Step 5).

Three Components to Build

3.1 Research Prompt Templates (skills/research/)

Six template files with YAML frontmatter and {slot} placeholders:

  • tool_evaluation.md — find all shipping tools for {domain}
  • architecture_spike.md — how to connect {system_a} to {system_b}
  • game_analysis.md — evaluate {game} for AI agent play
  • integration_guide.md — wire {tool} into {stack} with code
  • state_of_art.md — what exists in {field} as of {date}
  • competitive_scan.md — how does {project} compare to {alternatives}

3.2 Web Fetch Tool (src/timmy/tools.py)

New tool: download URL, extract clean text via trafilatura, truncate to token budget. Pure Python, zero cloud dependency. Register as Agno tool in create_full_toolkit().

3.3 Research Orchestrator (src/timmy/research.py)

Main pipeline chaining all steps. Key design: Step 0 — check local knowledge first. Every research output gets embedded and stored. Second query on same topic = SQLite lookup in milliseconds, zero API calls.

Cascade Strategy for Synthesis

Tier Model Cost Quality When to Use
1 (best) Claude API (claude-sonnet-4) $0.50-$2.00/report ★★★★★ Novel domains, high-stakes architecture
2 (good) Groq free tier (llama-3.3-70b) $0.00 (rate limited) ★★★★ Most research tasks (30 req/min free)
3 (local) Ollama qwen3:32b / qwen3-coder:32b $0.00 (local) ★★★ Routine lookups, re-synthesis
4 (cached) SQLite semantic memory $0.00 (instant) N/A Any question asked before. Target: 80%+

Kimi Delegation Pattern

For research exceeding Groq free tier: Timmy fills template → creates Gitea issue labeled kimi-ready → Kimi picks up from queue → commits artifact → Timmy indexes into memory.

Implementation Priority

Priority Component Effort Dependency
P0 web_fetch tool (trafilatura) 2 hours None — pure Python
P0 Research template library (6 templates) 4 hours None — markdown files
P0 Research orchestrator (src/timmy/research.py) 1 day web_fetch + templates
P1 Semantic index for research outputs 4 hours nomic-embed-text via Ollama
P1 Gitea issue creation from research findings 4 hours Gitea API (already used)
P1 Paperclip task integration 4 hours TaskRunner (exists)
P2 Kimi delegation via Gitea labels 2 hours Kimi watching Gitea queue
P2 Claude API fallback in cascade.py 2 hours Anthropic API key
P2 Research sovereignty metrics + dashboard 4 hours Metrics emitter (#954)

Total estimated effort: ~5 days of focused work.

Success Metrics

Metric Week 1 Month 1 Month 3 Graduation
Research queries answered locally 10% 40% 80% >90%
API cost per research task $1.50 $0.50 $0.10 <$0.01
Time from question to report 3 hours 30 min 5 min <1 min
Human involvement per task 100% Review only Approve only None

Immediate Transfer Actions (from Section VIII)

  1. Commit all PDFs/research artifacts to repo under docs/research/
  2. Create this epic with sub-issues (this issue)
  3. Extract GamePortal Protocol code into portals/protocol.py
  4. Extract stack_manifest into machine-readable JSON
  5. Write SOVEREIGNTY.md at repo root
  6. Index all six research reports into semantic memory via nomic-embed-text

PDF attached below. See cross-reference comment for links to work items and related tickets.

## Summary This is a **governing implementation spec** authored by Alexander Whitestone (rockachopa) that defines how Timmy must build an autonomous research pipeline to eliminate dependency on Claude and all corporate AI for deep research tasks. Core thesis: **"If this spec is implemented correctly, it is the last research document Alexander should need to request from a corporate AI."** ## The Problem On March 22, 2026, a single Claude session produced six deep research reports. It consumed ~3 hours of human time and substantial corporate AI inference. Every report was valuable but the workflow was linear — it would cost exactly the same to reproduce tomorrow. This spec crystallizes that workflow into an automatable pipeline. ## The Six-Step Research Pattern to Automate | Step | What Happens | Current Tool | Sovereign Replacement | |------|-------------|-------------|----------------------| | 1. Scope | Human describes knowledge gap | Chat with Claude | Gitea issue with research template | | 2. Query | Formulate 5-15 targeted search queries | Claude prompt engineering | Query template + local LLM slot-fill | | 3. Search | Execute queries, get top results | Claude web_search | Timmy web_search tool (already exists) | | 4. Fetch | Read full pages for key results | Claude web_fetch | **NEW: web_fetch tool (requests + trafilatura)** | | 5. Synthesize | Compress 200K tokens into structured report | Claude Opus (frontier) | **cascade.py: Groq → Ollama local** | | 6. Deliver | Format as PDF/markdown, file Gitea issues | Claude artifact generation | reportlab PDF + Gitea API | Steps 1, 3, and 6 already exist. **Three gaps** must be closed: research templates (Step 2), web_fetch tool (Step 4), and synthesis quality at non-frontier models (Step 5). ## Three Components to Build ### 3.1 Research Prompt Templates (`skills/research/`) Six template files with YAML frontmatter and `{slot}` placeholders: - `tool_evaluation.md` — find all shipping tools for `{domain}` - `architecture_spike.md` — how to connect `{system_a}` to `{system_b}` - `game_analysis.md` — evaluate `{game}` for AI agent play - `integration_guide.md` — wire `{tool}` into `{stack}` with code - `state_of_art.md` — what exists in `{field}` as of `{date}` - `competitive_scan.md` — how does `{project}` compare to `{alternatives}` ### 3.2 Web Fetch Tool (`src/timmy/tools.py`) New tool: download URL, extract clean text via trafilatura, truncate to token budget. Pure Python, zero cloud dependency. Register as Agno tool in `create_full_toolkit()`. ### 3.3 Research Orchestrator (`src/timmy/research.py`) Main pipeline chaining all steps. Key design: **Step 0 — check local knowledge first.** Every research output gets embedded and stored. Second query on same topic = SQLite lookup in milliseconds, zero API calls. ## Cascade Strategy for Synthesis | Tier | Model | Cost | Quality | When to Use | |------|-------|------|---------|-------------| | 1 (best) | Claude API (claude-sonnet-4) | $0.50-$2.00/report | ★★★★★ | Novel domains, high-stakes architecture | | 2 (good) | Groq free tier (llama-3.3-70b) | $0.00 (rate limited) | ★★★★ | Most research tasks (30 req/min free) | | 3 (local) | Ollama qwen3:32b / qwen3-coder:32b | $0.00 (local) | ★★★ | Routine lookups, re-synthesis | | 4 (cached) | SQLite semantic memory | $0.00 (instant) | N/A | Any question asked before. Target: 80%+ | ## Kimi Delegation Pattern For research exceeding Groq free tier: Timmy fills template → creates Gitea issue labeled `kimi-ready` → Kimi picks up from queue → commits artifact → Timmy indexes into memory. ## Implementation Priority | Priority | Component | Effort | Dependency | |----------|-----------|--------|------------| | **P0** | web_fetch tool (trafilatura) | 2 hours | None — pure Python | | **P0** | Research template library (6 templates) | 4 hours | None — markdown files | | **P0** | Research orchestrator (src/timmy/research.py) | 1 day | web_fetch + templates | | **P1** | Semantic index for research outputs | 4 hours | nomic-embed-text via Ollama | | **P1** | Gitea issue creation from research findings | 4 hours | Gitea API (already used) | | **P1** | Paperclip task integration | 4 hours | TaskRunner (exists) | | **P2** | Kimi delegation via Gitea labels | 2 hours | Kimi watching Gitea queue | | **P2** | Claude API fallback in cascade.py | 2 hours | Anthropic API key | | **P2** | Research sovereignty metrics + dashboard | 4 hours | Metrics emitter (#954) | **Total estimated effort: ~5 days of focused work.** ## Success Metrics | Metric | Week 1 | Month 1 | Month 3 | Graduation | |--------|--------|---------|---------|------------| | Research queries answered locally | 10% | 40% | 80% | >90% | | API cost per research task | $1.50 | $0.50 | $0.10 | <$0.01 | | Time from question to report | 3 hours | 30 min | 5 min | <1 min | | Human involvement per task | 100% | Review only | Approve only | None | ## Immediate Transfer Actions (from Section VIII) 1. Commit all PDFs/research artifacts to repo under `docs/research/` 2. Create this epic with sub-issues (this issue) 3. Extract GamePortal Protocol code into `portals/protocol.py` 4. Extract stack_manifest into machine-readable JSON 5. Write `SOVEREIGNTY.md` at repo root 6. Index all six research reports into semantic memory via nomic-embed-text --- **PDF attached below. See cross-reference comment for links to work items and related tickets.**
Author
Collaborator

Cross-References

Work Items (P0/P1/P2)

P0 — Build First:

  • #973 — web_fetch tool (trafilatura)
  • #974 — Research prompt template library (6 templates)
  • #975 — ResearchOrchestrator pipeline

P1 — Build Next:

  • #976 — Semantic index for research outputs (nomic-embed + SQLite)
  • #977 — Auto-create Gitea issues from research findings
  • #978 — Paperclip task runner integration

P2 — Build Later:

  • #979 — Kimi delegation via Gitea labels
  • #980 — Claude API fallback tier in cascade.py
  • #981 — Research sovereignty metrics + dashboard
  • #982 — Session Crystallization Playbook (master handoff, references this spec)
  • #953 — Sovereignty Loop (the governing architecture this pipeline serves)
  • #954 — Metrics emitter (feeds #981 sovereignty tracking)
  • #969 — UESP RAG pipeline (shares nomic-embed + ChromaDB infrastructure with #976)
  • #911 — Wire Gitea API for PR creation (same Gitea API patterns as #977)
  • #904 — Autoresearch (self-improvement loop complements research sovereignty)

Transfer Actions (from Section VIII)

  1. Commit all PDFs to docs/research/ attached to Gitea issues
  2. Create epic with sub-issues — this issue + 9 children
  3. Extract GamePortal Protocol to portals/protocol.py — needs separate ticket
  4. Extract stack_manifest to JSON — #986
  5. Write SOVEREIGNTY.md at repo root — needs separate ticket
  6. Index research into semantic memory — covered by #976
## Cross-References ### Work Items (P0/P1/P2) **P0 — Build First:** - #973 — web_fetch tool (trafilatura) - #974 — Research prompt template library (6 templates) - #975 — ResearchOrchestrator pipeline **P1 — Build Next:** - #976 — Semantic index for research outputs (nomic-embed + SQLite) - #977 — Auto-create Gitea issues from research findings - #978 — Paperclip task runner integration **P2 — Build Later:** - #979 — Kimi delegation via Gitea labels - #980 — Claude API fallback tier in cascade.py - #981 — Research sovereignty metrics + dashboard ### Related Existing Tickets - #982 — Session Crystallization Playbook (master handoff, references this spec) - #953 — Sovereignty Loop (the governing architecture this pipeline serves) - #954 — Metrics emitter (feeds #981 sovereignty tracking) - #969 — UESP RAG pipeline (shares nomic-embed + ChromaDB infrastructure with #976) - #911 — Wire Gitea API for PR creation (same Gitea API patterns as #977) - #904 — Autoresearch (self-improvement loop complements research sovereignty) ### Transfer Actions (from Section VIII) 1. Commit all PDFs to `docs/research/` — ✅ attached to Gitea issues 2. Create epic with sub-issues — ✅ this issue + 9 children 3. Extract GamePortal Protocol to `portals/protocol.py` — needs separate ticket 4. Extract stack_manifest to JSON — #986 5. Write SOVEREIGNTY.md at repo root — needs separate ticket 6. Index research into semantic memory — covered by #976
claude was assigned by Rockachopa 2026-03-22 23:30:49 +00:00
Author
Collaborator

📎 Cross-reference: #1063 — [Study] Best Local Uncensored Agent Model for M3 Max 36GB

This study directly answers the "which local model replaces Claude" question:

  • Primary: Qwen3-14B Q5_K_M (0.971 F1 tool calling, GPT-4-class structured output)
  • Fast mode: Qwen3-8B Q6_K for routine tasks at 2x speed
  • Key insight: "Uncensored" label is a red herring — abliteration degrades the exact capabilities an agent orchestrator needs most. Qwen3-14B is permissive enough + Ollama constrained decoding forces compliance.
📎 **Cross-reference:** #1063 — [Study] Best Local Uncensored Agent Model for M3 Max 36GB This study directly answers the "which local model replaces Claude" question: - **Primary:** Qwen3-14B Q5_K_M (0.971 F1 tool calling, GPT-4-class structured output) - **Fast mode:** Qwen3-8B Q6_K for routine tasks at 2x speed - **Key insight:** "Uncensored" label is a red herring — abliteration degrades the exact capabilities an agent orchestrator needs most. Qwen3-14B is permissive enough + Ollama constrained decoding forces compliance.
claude added the harnessp2-backlog labels 2026-03-23 13:56:04 +00:00
Collaborator

PR created: #1274

What was implemented

src/timmy/research.py — 6-step ResearchOrchestrator:

  • Step 0: Semantic cache check (instant, $0, ~80% hit rate after warm-up)
  • Step 1: Template loading from skills/research/
  • Step 2: Query formulation via Ollama
  • Step 3: Web search (SerpAPI, degrades gracefully)
  • Step 4: Full-page fetch via web_fetch / trafilatura
  • Step 5: Synthesis cascade (Ollama local → Claude API fallback)
  • Step 6: Memory indexing + optional disk persist

tests/timmy/test_research.py — 24 unit tests, all passing (461 total pass)

SOVEREIGNTY.md — machine-readable sovereignty manifest at repo root

What was already done (no changes needed)

  • All 6 research templates in skills/research/
  • web_fetch tool with trafilatura

P0 items from spec (§3.1, §3.2, §3.3) are now complete. Refs #973, #974, #975.

PR created: #1274 ## What was implemented **`src/timmy/research.py`** — 6-step ResearchOrchestrator: - Step 0: Semantic cache check (instant, $0, ~80% hit rate after warm-up) - Step 1: Template loading from `skills/research/` - Step 2: Query formulation via Ollama - Step 3: Web search (SerpAPI, degrades gracefully) - Step 4: Full-page fetch via `web_fetch` / trafilatura - Step 5: Synthesis cascade (Ollama local → Claude API fallback) - Step 6: Memory indexing + optional disk persist **`tests/timmy/test_research.py`** — 24 unit tests, all passing (461 total pass) **`SOVEREIGNTY.md`** — machine-readable sovereignty manifest at repo root ## What was already done (no changes needed) - All 6 research templates in `skills/research/` ✅ - `web_fetch` tool with trafilatura ✅ P0 items from spec (§3.1, §3.2, §3.3) are now complete. Refs #973, #974, #975.
Sign in to join this conversation.
No Label harness p2-backlog
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: Rockachopa/Timmy-time-dashboard#972