diff --git a/docs/DEEPSDIVE_ARCHITECTURE.md b/docs/DEEPSDIVE_ARCHITECTURE.md
new file mode 100644
index 0000000..c80f8b0
--- /dev/null
+++ b/docs/DEEPSDIVE_ARCHITECTURE.md
@@ -0,0 +1,88 @@
+# Deep Dive — Sovereign NotebookLM Architecture
+
+> Parent: [#830](http://143.198.27.163:3000/Timmy_Foundation/the-nexus/issues/830)
+> Status: Architecture committed, awaiting infrastructure decisions
+> Owner: @ezra
+> Created: 2026-04-05
+
+## Vision
+
+**Deep Dive** is a fully automated daily intelligence briefing system that eliminates the 20+ minute manual research overhead. It produces a personalized AI-generated podcast (or text briefing) with **zero manual input**.
+
+Unlike NotebookLM which requires manual source curation, Deep Dive operates autonomously.
+
+## Architecture Overview
+
+```
+┌──────────────────────────────────────────────────────────────────────────────┐
+│                     D E E P   D I V E   P I P E L I N E                      │
+├──────────────────────────────────────────────────────────────────────────────┤
+│  ┌───────────┐   ┌───────────┐   ┌───────────┐   ┌───────────┐   ┌────────┐  │
+│  │ AGGREGATE │──▶│  FILTER   │──▶│ SYNTHESIZE│──▶│   AUDIO   │──▶│DELIVER │  │
+│  │ arXiv RSS │   │ Keywords  │   │ LLM brief │   │ TTS voice │   │Telegram│  │
+│  └───────────┘   └───────────┘   └───────────┘   └───────────┘   └────────┘  │
+└──────────────────────────────────────────────────────────────────────────────┘
+```
+
+## Phase Specifications
+
+### Phase 1: Aggregate
+Fetches from arXiv RSS (cs.AI, cs.CL, cs.LG), lab blogs, newsletters.
+
+**Output**: `List[RawItem]`
+**Implementation**: `bin/deepdive_aggregator.py`
+
+### Phase 2: Filter
+Ranks items by keyword relevance to Hermes/Timmy work.
+
+**Scoring Algorithm (MVP)**:
+```python
+keywords = ["agent", "llm", "tool use", "rlhf", "alignment"]
+score = sum(1 for kw in keywords if kw in content)
+```
+
+### Phase 3: Synthesize
+LLM generates structured briefing: HEADLINES, DEEP DIVES, BOTTOM LINE.
+
+### Phase 4: Audio
+TTS converts briefing to MP3 (10-15 min).
+
+**Decision needed**: Local (Piper/Coqui) vs API (ElevenLabs/OpenAI)
+
+### Phase 5: Deliver
+Telegram voice message delivered at the scheduled time (default 6 AM).
+
+## Implementation Path
+
+### MVP (2 hours, Phases 1+2+5)
+arXiv RSS → keyword filter → text briefing → Telegram text at 6 AM
+
+### V1 (1 week, Phases 1-3+5)
+Add LLM synthesis, more sources
+
+### V2 (2 weeks, Full)
+Add TTS audio, embedding-based filtering
+
+## Integration Points
+
+| System | Point | Status |
+|--------|-------|--------|
+| Hermes | `/deepdive` command | Pending |
+| timmy-config | `cron/jobs.json` entry | Ready |
+| Telegram | Voice delivery | Existing |
+| TTS Service | Local vs API | **NEEDS DECISION** |
+
+## Files
+
+- `docs/DEEPSDIVE_ARCHITECTURE.md` — This document
+- `bin/deepdive_aggregator.py` — Phase 1 source adapters
+- `bin/deepdive_orchestrator.py` — Pipeline controller
+
+## Blockers
+
+| # | Item | Status |
+|---|------|--------|
+| 1 | TTS Service decision | **NEEDS DECISION** |
+| 2 | `/deepdive` command registration | Pending |
+
+**Ezra, Architect** — 2026-04-05
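
Review note on Phase 2: the scoring snippet in the document uses `content` without defining it and would miss capitalized matches such as "LLM" or "RLHF". Below is a minimal runnable sketch of the same keyword count, under stated assumptions. The `RawItem` fields (`title`, `summary`) and the `score` and `rank` helpers are hypothetical names used for illustration only. The diff does not show the real `RawItem` definition or the contents of `bin/deepdive_aggregator.py`.

```python
from dataclasses import dataclass

# Keyword list copied from the Phase 2 snippet in the document.
KEYWORDS = ["agent", "llm", "tool use", "rlhf", "alignment"]


@dataclass
class RawItem:
    """Hypothetical item shape; the real fields are not shown in the diff."""
    title: str
    summary: str


def score(item, keywords=KEYWORDS):
    """Count how many keywords occur as substrings of the lowercased text."""
    content = f"{item.title} {item.summary}".lower()
    return sum(1 for kw in keywords if kw in content)


def rank(items, keywords=KEYWORDS):
    """Return items with a nonzero score, highest score first.

    sorted() is stable, so items with equal scores keep their input order.
    """
    scored = [(score(it, keywords), it) for it in items]
    ordered = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [it for s, it in ordered if s > 0]
```

Because this is plain substring matching, "agent" also matches "agents" and "reagent". That may be acceptable for an MVP, and the embedding-based filtering planned for V2 would replace it.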