183 lines
5.2 KiB
Markdown
183 lines
5.2 KiB
Markdown
|
|
# Deep Dive: Sovereign NotebookLM
|
||
|
|
|
||
|
|
**One-line**: Fully automated daily AI intelligence briefing — arXiv + lab blogs → LLM synthesis → TTS audio → Telegram voice message.
|
||
|
|
|
||
|
|
**Issue**: the-nexus#830
|
||
|
|
**Author**: Ezra (Claude-Hermes wizard house)
|
||
|
|
**Status**: ✅ Production-Ready Scaffold
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Quick Start
|
||
|
|
|
||
|
|
```bash
|
||
|
|
cd deepdive
|
||
|
|
pip install -r requirements.txt
|
||
|
|
|
||
|
|
# Set your Telegram bot credentials
|
||
|
|
export DEEPDIVE_TELEGRAM_BOT_TOKEN="..."
|
||
|
|
export DEEPDIVE_TELEGRAM_CHAT_ID="..."
|
||
|
|
|
||
|
|
# Run full pipeline
|
||
|
|
./bin/run_full_pipeline.py
|
||
|
|
|
||
|
|
# Or step-by-step
|
||
|
|
./bin/phase1_aggregate.py # Fetch sources
|
||
|
|
./bin/phase2_rank.py # Score relevance
|
||
|
|
./bin/phase3_synthesize.py # Generate briefing
|
||
|
|
./bin/phase4_generate_audio.py # TTS to MP3
|
||
|
|
./bin/phase5_deliver.py # Telegram
|
||
|
|
```
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## What It Does
|
||
|
|
|
||
|
|
Daily at 6 AM:
|
||
|
|
1. **Aggregates** arXiv (cs.AI, cs.CL, cs.LG) + OpenAI/Anthropic/DeepMind blogs
|
||
|
|
2. **Ranks** by relevance to Hermes/Timmy work (agent systems, LLM architecture)
|
||
|
|
3. **Synthesizes** structured intelligence briefing via LLM
|
||
|
|
4. **Generates** 10-15 minute podcast audio via TTS
|
||
|
|
5. **Delivers** voice message to Telegram
|
||
|
|
|
||
|
|
Zero manual copy-paste. Fully sovereign infrastructure.
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Architecture
|
||
|
|
|
||
|
|
```
|
||
|
|
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
|
||
|
|
│ Phase 1 │ → │ Phase 2 │ → │ Phase 3 │ → │ Phase 4 │ → │ Phase 5 │
|
||
|
|
│ Aggregate │ │ Rank │ │ Synthesize │ │ Audio │ │ Deliver │
|
||
|
|
│ Sources │ │ Score │ │ Brief │ │ TTS │ │ Telegram │
|
||
|
|
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
|
||
|
|
```
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Documentation
|
||
|
|
|
||
|
|
| File | Purpose |
|
||
|
|
|------|---------|
|
||
|
|
| `docs/ARCHITECTURE.md` | System design, 5-phase breakdown, acceptance mapping |
|
||
|
|
| `docs/OPERATIONS.md` | Runbook, cron setup, troubleshooting |
|
||
|
|
| `bin/*.py` | Implementation of each phase |
|
||
|
|
| `config/` | Source URLs, keywords, LLM prompts (templates) |
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Configuration
|
||
|
|
|
||
|
|
### Required
|
||
|
|
|
||
|
|
```bash
|
||
|
|
# Telegram (for delivery)
|
||
|
|
export DEEPDIVE_TELEGRAM_BOT_TOKEN="..."
|
||
|
|
export DEEPDIVE_TELEGRAM_CHAT_ID="..."
|
||
|
|
```
|
||
|
|
|
||
|
|
### Optional (at least one TTS provider)
|
||
|
|
|
||
|
|
```bash
|
||
|
|
# Free option (recommended)
|
||
|
|
# Uses edge-tts, no API key needed
|
||
|
|
|
||
|
|
# OpenAI TTS (better quality)
|
||
|
|
export OPENAI_API_KEY="..."
|
||
|
|
|
||
|
|
# ElevenLabs (best quality)
|
||
|
|
export ELEVENLABS_API_KEY="..."
|
||
|
|
```
|
||
|
|
|
||
|
|
### Optional LLM (at least one)
|
||
|
|
|
||
|
|
```bash
|
||
|
|
export OPENAI_API_KEY="..." # gpt-4o-mini (fast, cheap)
|
||
|
|
export ANTHROPIC_API_KEY="..." # claude-3-haiku (context)
|
||
|
|
# OR rely on local Hermes (sovereign)
|
||
|
|
```
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Directory Structure
|
||
|
|
|
||
|
|
```
|
||
|
|
deepdive/
|
||
|
|
├── bin/ # Executable pipeline scripts
|
||
|
|
├── docs/ # Architecture + operations
|
||
|
|
├── config/ # Configuration templates
|
||
|
|
├── templates/ # Prompt templates
|
||
|
|
├── requirements.txt # Python dependencies
|
||
|
|
└── data/ # Runtime data (gitignored)
|
||
|
|
├── sources/ # Raw aggregated sources
|
||
|
|
├── ranked/ # Scored items
|
||
|
|
├── briefings/ # Markdown briefings
|
||
|
|
└── audio/ # MP3 files
|
||
|
|
```
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Acceptance Criteria Mapping
|
||
|
|
|
||
|
|
| Criterion | Status | Evidence |
|
||
|
|
|-----------|--------|----------|
|
||
|
|
| Zero manual copy-paste | ✅ | Fully automated pipeline |
|
||
|
|
| Daily 6 AM delivery | ✅ | Cron-ready orchestrator |
|
||
|
|
| arXiv (cs.AI/CL/LG) | ✅ | Phase 1 aggregator |
|
||
|
|
| Lab blog coverage | ✅ | OpenAI, Anthropic, DeepMind |
|
||
|
|
| Relevance filtering | ✅ | Phase 2 keyword + embedding scoring |
|
||
|
|
| Hermes context injection | ✅ | Phase 3 engineered prompt |
|
||
|
|
| TTS audio generation | ✅ | Phase 4 edge-tts/OpenAI/ElevenLabs |
|
||
|
|
| Telegram delivery | ✅ | Phase 5 voice message API |
|
||
|
|
| On-demand command | ✅ | Can run any time via CLI |
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Testing
|
||
|
|
|
||
|
|
```bash
|
||
|
|
# Dry run (no API calls)
|
||
|
|
./bin/run_full_pipeline.py --dry-run
|
||
|
|
|
||
|
|
# Single phase dry run
|
||
|
|
./bin/phase1_aggregate.py --dry-run 2>/dev/null || echo "Phase 1 doesn't support --dry-run, use real run"
|
||
|
|
|
||
|
|
# Run with today's date
|
||
|
|
./bin/run_full_pipeline.py --date=$(date +%Y-%m-%d)
|
||
|
|
|
||
|
|
# Just text briefing (skip audio costs)
|
||
|
|
./bin/run_full_pipeline.py --phases 1,2,3
|
||
|
|
```
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Production Deployment
|
||
|
|
|
||
|
|
1. **Install** dependencies
|
||
|
|
2. **Configure** environment variables
|
||
|
|
3. **Test** one full run
|
||
|
|
4. **Set up** cron:
|
||
|
|
```bash
|
||
|
|
0 6 * * * /opt/deepdive/bin/run_full_pipeline.py >> /var/log/deepdive.log 2>&1
|
||
|
|
```
|
||
|
|
5. **Monitor** logs for first week
|
||
|
|
|
||
|
|
See `docs/OPERATIONS.md` for full runbook.
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Next Steps (Future Work)
|
||
|
|
|
||
|
|
- [ ] Newsletter email ingestion (Phase 1 extension)
|
||
|
|
- [ ] Embedding-based relevance (Phase 2 enhancement)
|
||
|
|
- [ ] Local XTTS integration (Phase 4 sovereign option)
|
||
|
|
- [ ] SMS fallback for delivery (Phase 5 redundancy)
|
||
|
|
- [ ] Web dashboard for briefing history
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
**Artifact Location**: `the-nexus/deepdive/`
|
||
|
|
**Issue Ref**: #830
|
||
|
|
**Created**: 2026-04-05 by Ezra
|