# Deep Dive — Sovereign NotebookLM Architecture

> Parent: [#830](http://143.198.27.163:3000/Timmy_Foundation/the-nexus/issues/830)  
> Status: Architecture committed, awaiting infrastructure decisions  
> Owner: @ezra  
> Created: 2026-04-05

## Vision

**Deep Dive** is a fully automated daily intelligence briefing system that eliminates 20+ minutes of manual research each day. It produces a personalized, AI-generated podcast (or text briefing) with **zero manual input**.

Unlike NotebookLM, which requires manual source curation, Deep Dive operates autonomously.

## Architecture Overview

```
┌──────────────────────────────────────────────────────────────────────────────┐
│                    D E E P   D I V E   P I P E L I N E                       │
├──────────────────────────────────────────────────────────────────────────────┤
│  ┌───────────┐   ┌───────────┐   ┌───────────┐   ┌───────────┐   ┌────────┐ │
│  │ AGGREGATE │──▶│  FILTER   │──▶│ SYNTHESIZE│──▶│   AUDIO   │──▶│DELIVER │ │
│  │ arXiv RSS │   │ Keywords  │   │ LLM brief │   │ TTS voice │   │Telegram│ │
│  └───────────┘   └───────────┘   └───────────┘   └───────────┘   └────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
```

## Phase Specifications

### Phase 1: Aggregate
Fetches items from arXiv RSS feeds (cs.AI, cs.CL, cs.LG), lab blogs, and newsletters.

**Output**: `List[RawItem]`  
**Implementation**: `bin/deepdive_aggregator.py`
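
A minimal sketch of the Phase 1 output type and an RSS parser, using only the standard library. The `RawItem` fields and the `parse_rss` helper are hypothetical (the real adapters in `bin/deepdive_aggregator.py` may differ), and the parser assumes RSS 2.0 feeds with un-namespaced `<item>` elements; RSS 1.0/RDF feeds would need namespace-qualified lookups. Fetching is left out so the parser can be exercised on a string.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class RawItem:
    # Hypothetical fields for illustration; the real RawItem may differ.
    source: str
    title: str
    link: str
    summary: str


def parse_rss(xml_text: str, source: str) -> List[RawItem]:
    """Parse an RSS 2.0 document into RawItem records (no network access)."""
    root = ET.fromstring(xml_text)
    items: List[RawItem] = []
    # RSS 2.0 places entries at <rss><channel><item>; iter() finds them at any depth.
    for node in root.iter("item"):
        items.append(
            RawItem(
                source=source,
                # findtext() returns None for a missing child, so fall back to "".
                title=(node.findtext("title") or "").strip(),
                link=(node.findtext("link") or "").strip(),
                summary=(node.findtext("description") or "").strip(),
            )
        )
    return items
```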

### Phase 2: Filter
Ranks items by keyword relevance to Hermes/Timmy work.

**Scoring Algorithm (MVP)**:
```python
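keywords = ["agent", "llm", "tool use", "rlhf", "alignment"]
# Count distinct keywords present in the item text (case-insensitive substring match)
def score(content: str) -> int:
    return sum(1 for kw in keywords if kw in content.lower())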
```
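
The MVP scoring rule can be turned into a ranking step along these lines. `keyword_score`, `rank`, the `top_n` cutoff, and dropping zero-score items are illustrative assumptions rather than settled behavior.

```python
from typing import List, Sequence, Tuple

KEYWORDS = ["agent", "llm", "tool use", "rlhf", "alignment"]


def keyword_score(text: str, keywords: Sequence[str] = KEYWORDS) -> int:
    # Count how many distinct keywords appear (case-insensitive substring match).
    lowered = text.lower()
    return sum(1 for kw in keywords if kw in lowered)


def rank(texts: Sequence[str], top_n: int = 10) -> List[Tuple[int, str]]:
    """Return up to top_n (score, text) pairs, best first, dropping zero scores."""
    scored = [(keyword_score(t), t) for t in texts]
    scored = [pair for pair in scored if pair[0] > 0]
    # sorted() is stable even with reverse=True, so ties keep their feed order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_n]
```

Substring matching is deliberately crude: `"agent"` also matches `"reagent"`, a limitation that embedding-based filtering (planned for V2) would avoid.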

### Phase 3: Synthesize
An LLM generates a structured briefing with three sections: HEADLINES, DEEP DIVES, and BOTTOM LINE.
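
One way to get the three-section structure reliably is to name the sections in the prompt and check for them in the reply. In this sketch, `build_briefing_prompt`, `has_all_sections`, the prompt wording, and the `## ` heading markers are all hypothetical; the LLM call itself is omitted because this document does not specify a model or client.

```python
from typing import Sequence

SECTIONS = ("HEADLINES", "DEEP DIVES", "BOTTOM LINE")


def build_briefing_prompt(items: Sequence[str], date: str) -> str:
    """Assemble a prompt asking the LLM for the three-section briefing."""
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(items, start=1))
    section_spec = "\n".join(f"## {name}" for name in SECTIONS)
    return (
        f"You are writing the Deep Dive briefing for {date}.\n"
        "Summarize the ranked items below using exactly these sections, in order:\n"
        f"{section_spec}\n\n"
        f"Items:\n{numbered}\n"
    )


def has_all_sections(briefing: str) -> bool:
    # Cheap sanity check on the model's reply before it moves on to Audio/Deliver.
    return all(f"## {name}" in briefing for name in SECTIONS)
```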

### Phase 4: Audio
A TTS engine converts the briefing into a 10-15 minute MP3.
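
The 10-15 minute target translates into a word budget for the Synthesize phase. Assuming an average narration rate of about 150 words per minute (an estimate; the real rate depends on the voice chosen), that is roughly 1,500-2,250 words. The helpers below are hypothetical:

```python
from typing import Tuple

ASSUMED_WPM = 150  # assumed average narration rate; measure the chosen voice to confirm


def word_budget(min_minutes: float, max_minutes: float, wpm: int = ASSUMED_WPM) -> Tuple[int, int]:
    """Word-count range the Synthesize phase should target for a given audio length."""
    return round(min_minutes * wpm), round(max_minutes * wpm)


def estimated_minutes(text: str, wpm: int = ASSUMED_WPM) -> float:
    """Rough spoken length of a briefing, from its whitespace-delimited word count."""
    return len(text.split()) / wpm
```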

**Decision needed**: Local (Piper/Coqui) vs. API (ElevenLabs/OpenAI)
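
Whichever option is chosen, keeping the engine behind a small interface lets the decision change later without touching the rest of the pipeline. This sketch assumes a hypothetical `TTSBackend` protocol that a Piper/Coqui wrapper or an API client would implement, plus a hypothetical `chunk_text` helper that splits on sentence boundaries, since TTS APIs commonly cap input length per request. The 1,000-character default is an arbitrary assumption, not a documented limit of any listed service.

```python
import re
from typing import List, Protocol


class TTSBackend(Protocol):
    # Hypothetical interface; implementations wrap a local engine or a remote API.
    def synthesize(self, text: str) -> bytes:
        """Return encoded audio (e.g. MP3 bytes) for one chunk of text."""
        ...


def chunk_text(text: str, max_chars: int = 1000) -> List[str]:
    """Split text on sentence boundaries into chunks of at most max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that alone exceeds the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # current is non-empty here; flush it and start a new chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def render_briefing(backend: TTSBackend, text: str, max_chars: int = 1000) -> bytes:
    # Concatenating raw bytes is a simplification; joining encoded audio
    # cleanly may call for a tool such as ffmpeg.
    return b"".join(backend.synthesize(chunk) for chunk in chunk_text(text, max_chars))
```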

### Phase 5: Deliver
Sends the audio as a Telegram voice message at a scheduled time (default 6 AM).
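
Telegram Bot API methods accept a JSON POST to `https://api.telegram.org/bot<token>/<method>`. A minimal sketch that builds (but does not send) such a request with the standard library; `build_telegram_request` is a hypothetical helper. For `sendVoice`, the `voice` field may be a `file_id` or an HTTP URL string; uploading a local MP3 instead requires a `multipart/form-data` body, which this sketch omits.

```python
import json
import urllib.request

API_BASE = "https://api.telegram.org"


def build_telegram_request(token: str, method: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to a Telegram Bot API method."""
    url = f"{API_BASE}/bot{token}/{method}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
```

Sending is then `urllib.request.urlopen(build_telegram_request(token, "sendVoice", {"chat_id": chat_id, "voice": audio_url}))`, where `token`, `chat_id`, and `audio_url` come from configuration.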

## Implementation Path

### MVP (2 hours, Phases 1-2+5)
arXiv RSS → keyword filter → text briefing → Telegram text at 6 AM
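
For the MVP, the text briefing can be a numbered list of the top-ranked items. This sketch uses a hypothetical `format_text_briefing` helper that takes `(score, title, link)` tuples and keeps the result within Telegram's 4,096-character limit for a single `sendMessage` text by dropping the lowest-ranked entries.

```python
from typing import List, Tuple

TELEGRAM_TEXT_LIMIT = 4096  # Bot API maximum length for a sendMessage text


def format_text_briefing(date: str, ranked: List[Tuple[int, str, str]]) -> str:
    """Render (score, title, link) tuples as a plain-text MVP briefing."""
    lines = [f"Deep Dive, {date}", ""]
    for i, (score, title, link) in enumerate(ranked, start=1):
        lines.append(f"{i}. {title} (score {score})")
        lines.append(f"   {link}")
    text = "\n".join(lines)
    # Keep the MVP to one message: drop whole entries (two lines each) from the end.
    while len(text) > TELEGRAM_TEXT_LIMIT and len(lines) > 2:
        del lines[-2:]
        text = "\n".join(lines)
    return text
```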

### V1 (1 week, Phases 1-3+5)
Add LLM synthesis and more sources.

### V2 (2 weeks, Full)
Add TTS audio and embedding-based filtering.
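
Embedding-based filtering would replace keyword counts with similarity to an interest profile. A minimal sketch, assuming item and profile vectors come from some embedding model (none is specified here) and that a cosine-similarity cutoff of 0.3 is a starting point to tune (an arbitrary value); `cosine` and `rank_by_embedding` are hypothetical helpers.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_embedding(
    interest: Sequence[float],
    items: Dict[str, Sequence[float]],
    threshold: float = 0.3,
) -> List[Tuple[float, str]]:
    """Rank item ids by cosine similarity to an interest-profile embedding."""
    scored = [(cosine(interest, vec), item_id) for item_id, vec in items.items()]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)
```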

## Integration Points

| System | Point | Status |
|--------|-------|--------|
| Hermes | `/deepdive` command | Pending |
| timmy-config | `cron/jobs.json` entry | Ready |
| Telegram | Voice delivery | Existing |
| TTS Service | Local vs API | **NEEDS DECISION** |

## Files

- `docs/DEEPSDIVE_ARCHITECTURE.md` — This document
- `bin/deepdive_aggregator.py` — Phase 1 source adapters
- `bin/deepdive_orchestrator.py` — Pipeline controller
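
The pipeline controller's core job is to pass each phase's output to the next. A minimal sketch, assuming each phase is a plain callable and that phases not yet built (Synthesize and Audio in the MVP) are configured as `None` and skipped; `run_pipeline` is a hypothetical name, not the contents of `bin/deepdive_orchestrator.py`.

```python
from typing import Any, Callable, List, Optional, Tuple

Phase = Callable[[Any], Any]


def run_pipeline(
    seed: Any,
    phases: List[Tuple[str, Optional[Phase]]],
    log: Callable[[str], None] = print,
) -> Any:
    """Feed each phase's output into the next; a None phase is skipped."""
    data = seed
    for name, phase in phases:
        if phase is None:
            log(f"skip {name}")
            continue
        log(f"run {name}")
        data = phase(data)
    return data
```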

## Blockers

| # | Item | Status |
|---|------|--------|
| 1 | TTS Service decision | **NEEDS DECISION** |
| 2 | `/deepdive` command registration | Pending |

**Ezra, Architect** — 2026-04-05