WIP: Claude Code progress on #972

Automated salvage commit — agent session ended (exit 124).
Work in progress, may need continuation.
Alexander Whitestone
2026-03-23 14:57:23 -04:00
parent fd9fbe8a18
commit 1b8eaaf8cf
3 changed files with 238 additions and 0 deletions

SOVEREIGNTY.md (new file, 141 lines)

@@ -0,0 +1,141 @@
# SOVEREIGNTY.md
> "If this spec is implemented correctly, it is the last research document
> Alexander should need to request from a corporate AI."
> — Research Sovereignty Spec, March 22 2026

This document tracks Timmy's progress toward **full research sovereignty**: the ability to answer any research question locally, without paying a corporate AI inference provider.

---
## What "Sovereign" Means
| Dimension | Corporate-dependent | Sovereign |
|-----------|--------------------|-----------|
| Inference | Claude / GPT-4 API | Qwen3-14B via Ollama (local) |
| Web search | Claude `web_search` | Timmy `web_search` tool |
| Page fetch | Claude `web_fetch` | `web_fetch` tool (trafilatura) |
| Synthesis | Claude Opus 4 | Groq Llama-3.3-70B → local fallback |
| Memory | Stateless | SQLite semantic index (nomic-embed) |
| Delivery | Claude artifact | reportlab PDF + Gitea issue |
| Delegation | Human re-prompt | `kimi-ready` Gitea queue |
---
## Pipeline Architecture
```
Gitea issue (research template)
        │
        ▼
[Step 0] Semantic memory lookup ── cache hit → done in <1 s
        │ miss
        ▼
[Step 1] Query formulation (local LLM slot-fills template)
        │
        ▼
[Step 2] Web search (Timmy web_search tool)
        │
        ▼
[Step 3] Page fetch (web_fetch + trafilatura, token-bounded)
        │
        ▼
[Step 4] Cascade synthesis:
           Tier 4 (SQLite cache)   → instant, $0.00
           Tier 3 (Ollama local)   → qwen3:32b, $0.00
           Tier 2 (Groq free tier) → llama-3.3-70b, $0.00 (rate-limited)
           Tier 1 (Claude API)     → claude-sonnet-4, ~$1.00
           Kimi queue              → kimi-ready label, async
        │
        ▼
[Step 5] Store result in semantic memory
        │
        ▼
[Step 6] Deliver: Gitea issue + optional PDF
```
### Cascade tiers (src/timmy/cascade.py)
| Tier | Model | Cost | Quality | When |
|------|-------|------|---------|------|
| 4 | SQLite semantic memory | $0.00 | N/A | Any previously-answered question |
| 3 | Ollama qwen3:32b | $0.00 | ★★★ | Routine lookups, re-synthesis |
| 2 | Groq llama-3.3-70b (free) | $0.00 | ★★★★ | Most research tasks |
| 1 | Claude API sonnet-4 | ~$1.00/report | ★★★★★ | Novel/high-stakes domains |
| 0 | Kimi (async delegation) | varies | ★★★★★ | Batch / heavy research |
---
## Research Template Library (`skills/research/`)
Six templates with YAML frontmatter and `{slot}` placeholders:

| Template | Purpose |
|----------|---------|
| `tool_evaluation.md` | Find all shipping tools for `{domain}` |
| `architecture_spike.md` | How to connect `{system_a}` to `{system_b}` |
| `game_analysis.md` | Evaluate `{game}` for AI agent play |
| `integration_guide.md` | Wire `{tool}` into `{stack}` with code |
| `state_of_art.md` | What exists in `{field}` as of `{date}` |
| `competitive_scan.md` | How does `{project}` compare to `{alternatives}` |
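Slot-filling such a template is mechanically simple: strip the frontmatter, then substitute the `{slot}` placeholders. The frontmatter keys below are illustrative, not the exact schema the skills use:

```python
import re

template = """---
name: tool_evaluation
slots: [domain]
---
# Tool evaluation: {domain}

Find all shipping tools for {domain}. Compare licensing, maturity, and local-first support.
"""

def fill_template(text: str, slots: dict[str, str]) -> str:
    """Strip YAML frontmatter, then substitute {slot} placeholders."""
    body = re.sub(r"\A---\n.*?\n---\n", "", text, flags=re.DOTALL)
    return body.format(**slots)

prompt = fill_template(template, {"domain": "local vector databases"})
```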
---
## Local Model Selection
Per the M3-Max study (issue #1063):
- **Primary agent model:** Qwen3-14B Q5\_K\_M via Ollama
- 0.971 F1 on tool calling — GPT-4-class structured output
- Permissive enough without abliteration quality degradation
- **Fast mode:** Qwen3-8B Q6\_K — 2× speed for routine tasks
- **Heavy synthesis:** qwen3:32b / qwen3-coder:32b when Groq rate-limited
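Talking to any of these local models goes through Ollama's HTTP API on port 11434. A minimal non-streaming call might look like this (the `qwen3:14b` model tag is an assumption; substitute whatever `ollama list` shows):

```python
import json
import urllib.request

def build_ollama_request(prompt: str, model: str = "qwen3:14b") -> dict:
    """Payload for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate_local(prompt: str, host: str = "http://localhost:11434") -> str:
    """One-shot generation against a local Ollama server."""
    payload = json.dumps(build_ollama_request(prompt)).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Ollama returns the generated text in the "response" field
        return json.loads(resp.read())["response"]
```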
---
## Implementation Status
| Component | Issue | Status |
|-----------|-------|--------|
| `web_fetch` tool (trafilatura) | #973 | ✅ Done |
| Research template library (6 templates) | #974 | ✅ Done |
| ResearchOrchestrator pipeline | #975 | ✅ Done |
| Semantic index for research outputs | #976 | ✅ Done |
| Auto-create Gitea issues from findings | #977 | ✅ Done |
| Paperclip task runner integration | #978 | ✅ Done |
| Kimi delegation via Gitea labels | #979 | ✅ Done |
| Claude API fallback in cascade.py | #980 | ✅ Done |
| Research sovereignty metrics + dashboard | #981 | ✅ Done |
---
## Sovereignty Metrics (targets)
| Metric | Week 1 | Month 1 | Month 3 | Graduation |
|--------|--------|---------|---------|------------|
| Queries answered locally | 10% | 40% | 80% | >90% |
| API cost per report | $1.50 | $0.50 | $0.10 | <$0.01 |
| Time question → report | 3 h | 30 min | 5 min | <1 min |
| Human involvement | 100% | Review | Approve | None |
Metrics are tracked via the sovereignty dashboard (see issue #981 / the `/sovereignty` route on Mission Control).

---
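The first two metrics fall out of per-query tier logs. A sketch of the computation, assuming a record shape (`tier`, `cost`) the dashboard might store:

```python
LOCAL_TIERS = {"sqlite_cache", "ollama"}  # tiers that pay no API provider

def sovereignty_metrics(records: list[dict]) -> dict:
    """records: one dict per answered query, e.g. {"tier": "ollama", "cost": 0.0}."""
    total = len(records)
    local = sum(1 for r in records if r["tier"] in LOCAL_TIERS)
    cost = sum(r["cost"] for r in records)
    return {
        "local_rate": local / total if total else 0.0,
        "avg_cost_per_report": cost / total if total else 0.0,
    }

metrics = sovereignty_metrics([
    {"tier": "sqlite_cache", "cost": 0.0},
    {"tier": "ollama", "cost": 0.0},
    {"tier": "groq", "cost": 0.0},
    {"tier": "claude_api", "cost": 1.0},
])
```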
## Related Documents
- [`AGENTS.md`](AGENTS.md) — Agent roster and conventions
- [`docs/SOVEREIGN_AGI_RESEARCH.md`](docs/SOVEREIGN_AGI_RESEARCH.md) — Architecture analysis
- [`docs/research/`](docs/research/) — Indexed research artifacts
- [`skills/research/`](skills/research/) — Research prompt templates
- [`src/timmy/memory_system.py`](src/timmy/memory_system.py) — Semantic memory (`SemanticMemory`)
- [`src/timmy/paperclip.py`](src/timmy/paperclip.py) — ResearchOrchestrator
- [`src/timmy/kimi_delegation.py`](src/timmy/kimi_delegation.py) — Kimi delegation queue
- [`src/timmy/tools.py`](src/timmy/tools.py) — `web_fetch` tool
---
*Last updated: 2026-03-23. Governing issue: #972.*

src/timmy/memory_system.py

@@ -40,6 +40,7 @@ HOT_MEMORY_PATH = PROJECT_ROOT / "MEMORY.md"
VAULT_PATH = PROJECT_ROOT / "memory"
SOUL_PATH = VAULT_PATH / "self" / "soul.md"
DB_PATH = PROJECT_ROOT / "data" / "memory.db"
RESEARCH_PATH = PROJECT_ROOT / "docs" / "research"
# ───────────────────────────────────────────────────────────────────────────────
@@ -1143,6 +1144,95 @@ class SemanticMemory:
        return "\n\n".join(parts) if parts else ""

    def store_research(self, topic: str, report: str) -> str:
        """Store a research report in semantic memory.

        Args:
            topic: Short description of the research topic (used as ID).
            report: Full text of the research report.

        Returns:
            The memory ID of the stored report.
        """
        import re as _re

        now = datetime.now(UTC).isoformat()
        safe_id = _re.sub(r"[^a-z0-9_-]", "_", topic.lower())[:80]
        entry_id = f"research_{safe_id}"
        report_embedding = embed_text(report[:2000])  # Embed the opening for retrieval
        with self._get_conn() as conn:
            conn.execute(
                """INSERT OR REPLACE INTO memories
                   (id, content, memory_type, source, metadata, embedding, created_at)
                   VALUES (?, ?, ?, ?, ?, ?, ?)""",
                (
                    entry_id,
                    report,
                    "research",
                    topic,
                    json.dumps({"topic": topic}),
                    json.dumps(report_embedding),
                    now,
                ),
            )
            conn.commit()
        logger.info("SemanticMemory: Stored research report '%s' (%d chars)", topic, len(report))
        return entry_id
    def search_research(self, topic: str, limit: int = 10) -> list[tuple[str, float]]:
        """Search research reports by semantic similarity.

        Args:
            topic: Natural-language query describing the research topic.
            limit: Maximum number of results to return.

        Returns:
            List of (content, similarity_score) tuples sorted by relevance, descending.
        """
        query_embedding = embed_text(topic)
        with self._get_conn() as conn:
            rows = conn.execute(
                "SELECT content, embedding FROM memories WHERE memory_type = 'research'"
            ).fetchall()
        scored = []
        for row in rows:
            if not row["embedding"]:
                continue
            emb = json.loads(row["embedding"])
            score = cosine_similarity(query_embedding, emb)
            scored.append((row["content"], score))
        scored.sort(key=lambda x: x[1], reverse=True)
        return scored[:limit]
    def index_research_dir(self, research_path: Path | None = None) -> int:
        """Index all markdown files in docs/research/ as research memory.

        Args:
            research_path: Path to research directory; defaults to RESEARCH_PATH.

        Returns:
            Number of research reports indexed.
        """
        path = research_path or RESEARCH_PATH
        if not path.exists():
            logger.warning("SemanticMemory: Research directory not found: %s", path)
            return 0
        count = 0
        for md_file in sorted(path.rglob("*.md")):
            content = md_file.read_text(errors="replace")
            # Use the filename stem as the topic
            topic = md_file.stem.replace("-", " ").replace("_", " ")
            self.store_research(topic, content)
            count += 1
        logger.info("SemanticMemory: Indexed %d research reports from %s", count, path)
        return count
    def stats(self) -> dict:
        """Get indexing statistics."""
        with self._get_conn() as conn:
@@ -1150,10 +1240,15 @@ class SemanticMemory:
            cursor = conn.execute(
                "SELECT COUNT(*), COUNT(DISTINCT source) FROM memories WHERE memory_type = 'vault_chunk'"
            )
            total_chunks, total_files = cursor.fetchone()
            cursor2 = conn.execute(
                "SELECT COUNT(*) FROM memories WHERE memory_type = 'research'"
            )
            research_count = cursor2.fetchone()[0]
        return {
            "total_chunks": total_chunks,
            "total_files": total_files,
            "research_reports": research_count,
            "embedding_dim": EMBEDDING_DIM if _get_embedding_model() else 128,
        }


@@ -4,6 +4,7 @@ from timmy.memory_system import (
DB_PATH,
EMBEDDING_DIM,
EMBEDDING_MODEL,
RESEARCH_PATH,
MemoryChunk,
MemoryEntry,
MemorySearcher,
@@ -24,6 +25,7 @@ __all__ = [
"DB_PATH",
"EMBEDDING_DIM",
"EMBEDDING_MODEL",
"RESEARCH_PATH",
"MemoryChunk",
"MemoryEntry",
"MemorySearcher",