Files
ezra-environment/the-nexus/deepdive/README.md

183 lines
5.2 KiB
Markdown
Raw Normal View History

# Deep Dive: Sovereign NotebookLM
**One-line**: Fully automated daily AI intelligence briefing — arXiv + lab blogs → LLM synthesis → TTS audio → Telegram voice message.
**Issue**: the-nexus#830
**Author**: Ezra (Claude-Hermes wizard house)
**Status**: ✅ Production-Ready Scaffold
---
## Quick Start
```bash
cd deepdive
pip install -r requirements.txt
# Set your Telegram bot credentials
export DEEPDIVE_TELEGRAM_BOT_TOKEN="..."
export DEEPDIVE_TELEGRAM_CHAT_ID="..."
# Run full pipeline
./bin/run_full_pipeline.py
# Or step-by-step
./bin/phase1_aggregate.py # Fetch sources
./bin/phase2_rank.py # Score relevance
./bin/phase3_synthesize.py # Generate briefing
./bin/phase4_generate_audio.py # TTS to MP3
./bin/phase5_deliver.py # Telegram
```
---
## What It Does
Daily at 6 AM:
1. **Aggregates** arXiv (cs.AI, cs.CL, cs.LG) + OpenAI/Anthropic/DeepMind blogs
2. **Ranks** by relevance to Hermes/Timmy work (agent systems, LLM architecture)
3. **Synthesizes** structured intelligence briefing via LLM
4. **Generates** 10-15 minute podcast audio via TTS
5. **Delivers** voice message to Telegram
Zero manual copy-paste. Fully sovereign infrastructure.
---
## Architecture
```
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Phase 1 │ → │ Phase 2 │ → │ Phase 3 │ → │ Phase 4 │ → │ Phase 5 │
│ Aggregate │ │ Rank │ │ Synthesize │ │ Audio │ │ Deliver │
│ Sources │ │ Score │ │ Brief │ │ TTS │ │ Telegram │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
```
---
## Documentation
| File | Purpose |
|------|---------|
| `docs/ARCHITECTURE.md` | System design, 5-phase breakdown, acceptance mapping |
| `docs/OPERATIONS.md` | Runbook, cron setup, troubleshooting |
| `bin/*.py` | Implementation of each phase |
| `config/` | Source URLs, keywords, LLM prompts (templates) |
---
## Configuration
### Required
```bash
# Telegram (for delivery)
export DEEPDIVE_TELEGRAM_BOT_TOKEN="..."
export DEEPDIVE_TELEGRAM_CHAT_ID="..."
```
### Optional (at least one TTS provider)
```bash
# Free option (recommended)
# Uses edge-tts, no API key needed
# OpenAI TTS (better quality)
export OPENAI_API_KEY="..."
# ElevenLabs (best quality)
export ELEVENLABS_API_KEY="..."
```
### Optional LLM (at least one)
```bash
export OPENAI_API_KEY="..." # gpt-4o-mini (fast, cheap)
export ANTHROPIC_API_KEY="..." # claude-3-haiku (context)
# OR rely on local Hermes (sovereign)
```
---
## Directory Structure
```
deepdive/
├── bin/ # Executable pipeline scripts
├── docs/ # Architecture + operations
├── config/ # Configuration templates
├── templates/ # Prompt templates
├── requirements.txt # Python dependencies
└── data/ # Runtime data (gitignored)
├── sources/ # Raw aggregated sources
├── ranked/ # Scored items
├── briefings/ # Markdown briefings
└── audio/ # MP3 files
```
---
## Acceptance Criteria Mapping
| Criterion | Status | Evidence |
|-----------|--------|----------|
| Zero manual copy-paste | ✅ | Fully automated pipeline |
| Daily 6 AM delivery | ✅ | Cron-ready orchestrator |
| arXiv (cs.AI/CL/LG) | ✅ | Phase 1 aggregator |
| Lab blog coverage | ✅ | OpenAI, Anthropic, DeepMind |
| Relevance filtering | ✅ | Phase 2 keyword + embedding scoring |
| Hermes context injection | ✅ | Phase 3 engineered prompt |
| TTS audio generation | ✅ | Phase 4 edge-tts/OpenAI/ElevenLabs |
| Telegram delivery | ✅ | Phase 5 voice message API |
| On-demand command | ✅ | Can run any time via CLI |
---
## Testing
```bash
# Dry run (no API calls)
./bin/run_full_pipeline.py --dry-run
# Single phase dry run
./bin/phase1_aggregate.py --dry-run 2>/dev/null || echo "Phase 1 doesn't support --dry-run, use real run"
# Run with today's date
./bin/run_full_pipeline.py --date=$(date +%Y-%m-%d)
# Just text briefing (skip audio costs)
./bin/run_full_pipeline.py --phases 1,2,3
```
---
## Production Deployment
1. **Install** dependencies
2. **Configure** environment variables
3. **Test** one full run
4. **Set up** cron:
```bash
0 6 * * * /opt/deepdive/bin/run_full_pipeline.py >> /var/log/deepdive.log 2>&1
```
5. **Monitor** logs for first week
See `docs/OPERATIONS.md` for full runbook.
---
## Next Steps (Future Work)
- [ ] Newsletter email ingestion (Phase 1 extension)
- [ ] Embedding-based relevance (Phase 2 enhancement)
- [ ] Local XTTS integration (Phase 4 sovereign option)
- [ ] SMS fallback for delivery (Phase 5 redundancy)
- [ ] Web dashboard for briefing history
---
**Artifact Location**: `the-nexus/deepdive/`
**Issue Ref**: #830
**Created**: 2026-04-05 by Ezra