# Deep Dive — Execution Runbook

> Parent: [#830](http://143.198.27.163:3000/Timmy_Foundation/the-nexus/issues/830)  
> Location: `docs/DEEPSDIVE_EXECUTION.md`  
> Updated: 2026-04-05  
> Owner: @ezra

## Quick Start

Zero to briefing in about 12 minutes (per-step estimates are in the comments):

```bash
cd /root/wizards/the-nexus

# 1. Configure (~5 min)
export DEEPDIVE_TTS_PROVIDER=openai        # or "elevenlabs" or "piper"
export OPENAI_API_KEY=sk-...               # or ELEVENLABS_API_KEY
export DEEPDIVE_TELEGRAM_BOT_TOKEN=...     # BotFather
export DEEPDIVE_TELEGRAM_CHAT_ID=...       # Your Telegram chat ID

# 2. Test run (~2 min)
./bin/deepdive_orchestrator.py --dry-run

# 3. Full delivery (~5 min)
./bin/deepdive_orchestrator.py --date $(date +%Y-%m-%d)
```

---

## Provider Decision Matrix

| Provider | Cost | Quality | Latency | Setup Complexity | Best For |
|----------|------|---------|---------|------------------|----------|
| **Piper** | Free | Medium | Fast (local) | High (model download) | Privacy-first, offline |
| **ElevenLabs** | $5/mo | High | Medium (~2s) | Low | Production quality |
| **OpenAI** | ~$0.015/1K chars | Good | Fast (~1s) | Low | Quick start, good balance |

**Recommendation**: Start with OpenAI (`tts-1` model, `alloy` voice) for immediate results. Migrate to ElevenLabs for final polish if budget allows.
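
To make the cost comparison concrete, here is a rough sketch of the arithmetic using only the figures from the matrix (~$0.015 per 1K characters for OpenAI, $5/month for ElevenLabs). The 6,000-character briefing length is an assumed example, not a measured value, and the function names are illustrative.

```python
# Rough cost arithmetic using the matrix figures; actual prices may differ.
OPENAI_USD_PER_1K_CHARS = 0.015   # from the matrix
ELEVENLABS_USD_PER_MONTH = 5.0    # from the matrix


def openai_monthly_cost(chars_per_briefing: int, briefings_per_month: int = 30) -> float:
    """Estimated monthly OpenAI TTS spend for a given briefing length."""
    per_briefing = chars_per_briefing / 1000 * OPENAI_USD_PER_1K_CHARS
    return per_briefing * briefings_per_month


def break_even_chars_per_briefing(briefings_per_month: int = 30) -> float:
    """Briefing length at which OpenAI spend equals the ElevenLabs flat fee."""
    per_briefing_budget = ELEVENLABS_USD_PER_MONTH / briefings_per_month
    return per_briefing_budget / OPENAI_USD_PER_1K_CHARS * 1000


if __name__ == "__main__":
    # Assumed example: one 6,000-character briefing per day.
    print(f"OpenAI at 6,000 chars/day: ${openai_monthly_cost(6000):.2f}/month")
    print(f"Break-even length: {break_even_chars_per_briefing():,.0f} chars/briefing")
```

Under these assumptions a daily 6,000-character briefing costs about $2.70 per month on OpenAI, and the flat fee only becomes cheaper above roughly 11,000 characters per briefing.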

---

## Phase-by-Phase Testing

### Phase 1: Aggregation Test
```bash
./bin/deepdive_aggregator.py --sources arxiv_cs_ai --output /tmp/test_agg.json
jq '.metadata' /tmp/test_agg.json
```

### Phase 2: Filtering Test (via Orchestrator)
```bash
./bin/deepdive_orchestrator.py --date 2026-04-05 --stop-after phase2
ls ~/the-nexus/deepdive_state/2026-04-05/ranked.json
```

### Phase 3: Synthesis Test (requires LLM setup)
```bash
export OPENAI_API_KEY=sk-...
./bin/deepdive_orchestrator.py --date 2026-04-05 --stop-after phase3
cat ~/the-nexus/deepdive_state/2026-04-05/briefing.md
```
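
The Phase 2 and Phase 3 checks both look for a file under a dated state directory. A small helper like the one below could report which artifacts are missing after a run. The file names come from the commands above; the helper names are hypothetical, and honouring `DEEPDIVE_OUTPUT_DIR` with the documented default is an assumption about how the orchestrator resolves its state directory.

```python
import os
from pathlib import Path

# Artifact names taken from the Phase 2 and Phase 3 commands above.
PHASE_ARTIFACTS = {"phase2": "ranked.json", "phase3": "briefing.md"}


def artifact_path(date: str, phase: str, env=None) -> Path:
    """Return where a phase's output is expected for a run date (YYYY-MM-DD)."""
    env = os.environ if env is None else env
    # Assumption: the state root is DEEPDIVE_OUTPUT_DIR, else the documented default.
    root = Path(env.get("DEEPDIVE_OUTPUT_DIR", "~/the-nexus/deepdive_state")).expanduser()
    return root / date / PHASE_ARTIFACTS[phase]


def missing_artifacts(date: str, env=None) -> list:
    """List phases whose expected output file is absent or empty."""
    missing = []
    for phase in PHASE_ARTIFACTS:
        path = artifact_path(date, phase, env)
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(phase)
    return missing
```

An empty list from `missing_artifacts("2026-04-05")` would mean both files exist and are non-empty; it says nothing about whether their contents are sensible.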

### Phase 4: TTS Test
```bash
echo "Hello from Deep Dive. This is a test." | ./bin/deepdive_tts.py --output /tmp/test
ls -la /tmp/test.mp3
```
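
For orientation, the sketch below shows roughly what a request to OpenAI's speech endpoint looks like with the `tts-1` model and `alloy` voice recommended earlier. It is not taken from `deepdive_tts.py`, whose internals are not shown in this runbook; `split_for_tts` and `build_speech_request` are hypothetical helper names. The endpoint accepts at most 4,096 characters of input per request, so longer briefings need to be split; splitting on whitespace is a simplification.

```python
import json
import urllib.request

OPENAI_SPEECH_URL = "https://api.openai.com/v1/audio/speech"
MAX_INPUT_CHARS = 4096  # per-request input limit of the speech endpoint


def split_for_tts(text: str, limit: int = MAX_INPUT_CHARS) -> list:
    """Greedily pack whitespace-separated words into chunks of at most `limit` chars."""
    chunks, current = [], ""
    for word in text.split():
        # Hard-split any single word that is longer than the limit.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) > limit:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def build_speech_request(text: str, api_key: str, voice: str = "alloy") -> urllib.request.Request:
    """Build (but do not send) a POST request for one chunk of text."""
    body = {"model": "tts-1", "voice": voice, "input": text, "response_format": "mp3"}
    return urllib.request.Request(
        OPENAI_SPEECH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
```

Sending each request with `urllib.request.urlopen(req).read()` returns the MP3 bytes for that chunk; joining the chunks into one audio file is left out of this sketch.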

### Phase 5: Delivery Test
```bash
./bin/deepdive_delivery.py --audio /tmp/test.mp3 --caption "Deep Dive test" --dry-run
./bin/deepdive_delivery.py --audio /tmp/test.mp3 --caption "Deep Dive test"
```

---

## Environment Variables Reference

### Required
| Variable | Purpose | Example |
|----------|---------|---------|
| `DEEPDIVE_TTS_PROVIDER` | TTS adapter selection | `openai`, `elevenlabs`, `piper` |
| `OPENAI_API_KEY` or `ELEVENLABS_API_KEY` | API credentials | `sk-...` |
| `DEEPDIVE_TELEGRAM_BOT_TOKEN` | Telegram bot auth | `123456:ABC-DEF...` |
| `DEEPDIVE_TELEGRAM_CHAT_ID` | Target chat | Numeric ID such as `-1001234567890`, or `@channelusername` for a public channel |
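
A preflight check along these lines could catch missing variables before a run. This is a minimal sketch, not part of the shipped scripts: `missing_required` is a hypothetical helper, and treating Piper as needing no API key is an assumption based on it running locally. It also does not cover the `OPENAI_API_KEY` that Phase 3 synthesis needs when the TTS provider is not OpenAI.

```python
import os

ALWAYS_REQUIRED = (
    "DEEPDIVE_TTS_PROVIDER",
    "DEEPDIVE_TELEGRAM_BOT_TOKEN",
    "DEEPDIVE_TELEGRAM_CHAT_ID",
)
# API key each TTS provider needs; Piper is assumed to need none (local model).
PROVIDER_KEYS = {"openai": "OPENAI_API_KEY", "elevenlabs": "ELEVENLABS_API_KEY", "piper": None}


def missing_required(env=None) -> list:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in ALWAYS_REQUIRED if not env.get(name)]
    provider = env.get("DEEPDIVE_TTS_PROVIDER", "")
    if provider and provider not in PROVIDER_KEYS:
        missing.append(f"DEEPDIVE_TTS_PROVIDER (unknown value {provider!r})")
    key = PROVIDER_KEYS.get(provider)
    if key and not env.get(key):
        missing.append(key)
    return missing


if __name__ == "__main__":
    problems = missing_required()
    print("OK" if not problems else "Missing: " + ", ".join(problems))
```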

### Optional
| Variable | Default | Description |
|----------|---------|-------------|
| `DEEPDIVE_TTS_VOICE` | `alloy` / `matthew` | Voice ID |
| `DEEPDIVE_OUTPUT_DIR` | `~/the-nexus/deepdive_state` | State storage |
| `DEEPDIVE_LLM_PROVIDER` | `openai` | Synthesis LLM |
| `DEEPDIVE_MAX_ITEMS` | `10` | Items per briefing |
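
The defaults in this table can be applied in one place, as in the sketch below. `OptionalSettings` and `load_optional` are hypothetical names, not the orchestrator's actual code. `DEEPDIVE_TTS_VOICE` is left out on purpose because its default depends on the provider.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class OptionalSettings:
    output_dir: Path
    llm_provider: str
    max_items: int


def load_optional(env=None) -> OptionalSettings:
    """Read the optional variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    raw_max = env.get("DEEPDIVE_MAX_ITEMS", "10")
    try:
        max_items = int(raw_max)
    except ValueError:
        raise ValueError(f"DEEPDIVE_MAX_ITEMS must be an integer, got {raw_max!r}") from None
    if max_items < 1:
        raise ValueError("DEEPDIVE_MAX_ITEMS must be at least 1")
    return OptionalSettings(
        output_dir=Path(env.get("DEEPDIVE_OUTPUT_DIR", "~/the-nexus/deepdive_state")).expanduser(),
        llm_provider=env.get("DEEPDIVE_LLM_PROVIDER", "openai"),
        max_items=max_items,
    )
```

Failing early on a non-numeric `DEEPDIVE_MAX_ITEMS` is a design choice of this sketch; silently falling back to `10` would also be defensible.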

---

## Cron Installation

Daily 6 AM briefing:

```bash
# Add to crontab
crontab -e

# Entry:
0 6 * * * cd /root/wizards/the-nexus && ./bin/deepdive_orchestrator.py --date $(date +\%Y-\%m-\%d) >> /var/log/deepdive.log 2>&1
```

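Cron runs jobs with a minimal environment and does not read `~/.bashrc`, so exports from an interactive shell are not available to the job. Define the required variables as `NAME=value` lines at the top of the crontab, or have the job load them from a file, and use absolute paths for anything outside the `cd` target.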
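
Because cron starts jobs with a minimal environment, one way to supply the required variables is a small wrapper that reads them from a file and then launches the orchestrator. The sketch below assumes a plain `KEY=VALUE` file; the wrapper, its function names, and the env file location are all hypothetical. It does not handle inline comments or multi-line values.

```python
import os
import subprocess


def parse_env_file(text: str) -> dict:
    """Parse simple KEY=VALUE lines; blank lines and # comment lines are skipped."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignore it
        env[key.strip()] = value.strip().strip("'\"")
    return env


def run_orchestrator(env_file: str, date: str) -> int:
    """Run the orchestrator with the file's variables layered over cron's environment."""
    with open(env_file, encoding="utf-8") as fh:
        extra = parse_env_file(fh.read())
    result = subprocess.run(
        ["./bin/deepdive_orchestrator.py", "--date", date],
        cwd="/root/wizards/the-nexus",
        env={**os.environ, **extra},
    )
    return result.returncode
```

Keeping secrets in a separate file readable only by the job's user also keeps them out of the crontab itself.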

---

## Troubleshooting

### "No items found" from aggregator
- Check internet connectivity
- Verify arXiv RSS is accessible: `curl http://export.arxiv.org/rss/cs.AI`
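
To tell a connectivity problem apart from an empty feed, it can help to fetch the feed and count its entries directly. This is a sketch using only the standard library, with hypothetical function names; it is not how `deepdive_aggregator.py` necessarily parses the feed. A reachable feed with zero items can simply mean there were no new announcements that day.

```python
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "http://export.arxiv.org/rss/cs.AI"  # same URL as the curl check


def count_feed_items(xml_data) -> int:
    """Count <item> elements, ignoring any XML namespace prefix on the tag."""
    root = ET.fromstring(xml_data)
    return sum(1 for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "item")


def fetch_feed(url: str = FEED_URL, timeout: float = 15.0) -> bytes:
    """Download the raw feed; raises urllib.error.URLError on network failure."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    print(f"{count_feed_items(fetch_feed())} items in feed")
```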

### "Audio file not valid" from Telegram
- Ensure the file is a valid MP3 and smaller than 50 MB
- Test with local playback: `mpg123 /tmp/test.mp3`
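
Both checks can also be scripted. The sketch below flags empty or oversized files and applies a header heuristic: MP3 files normally begin with an `ID3` tag or an MPEG frame sync. The function names are hypothetical, and the header test is only a heuristic, so a file that passes can still be rejected or fail to play.

```python
import os

MAX_BYTES = 50 * 1024 * 1024  # the 50 MB ceiling from the checklist


def looks_like_mp3(header: bytes) -> bool:
    """True if the bytes start with an ID3v2 tag or an MPEG audio frame sync."""
    if header[:3] == b"ID3":
        return True
    # Frame sync: the first 11 bits of an MPEG audio frame are all set.
    return len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0


def audio_problems(path: str) -> list:
    """Return human-readable problems found with the audio file, if any."""
    problems = []
    size = os.path.getsize(path)
    if size == 0:
        problems.append("file is empty")
    elif size >= MAX_BYTES:
        problems.append(f"file is {size} bytes, at or over the 50 MB limit")
    if size:
        with open(path, "rb") as fh:
            if not looks_like_mp3(fh.read(4)):
                problems.append("file does not start with an MP3 header")
    return problems
```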

### "Telegram chat not found"
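- Use the numeric chat ID for groups and supergroups, for example `-1001234567890`
- For a personal chat, message @userinfobot to get your numeric user ID, and send `/start` to your own bot first so it is allowed to message you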
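
The Bot API's `getChat` method offers a direct way to confirm that the bot can see a given chat ID. The sketch below builds that request with the standard library; the helper names are hypothetical and this is not code from `deepdive_delivery.py`.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def get_chat_url(bot_token: str, chat_id: str) -> str:
    """Build the getChat URL for a bot token and chat ID."""
    query = urllib.parse.urlencode({"chat_id": chat_id})
    return f"https://api.telegram.org/bot{bot_token}/getChat?{query}"


def check_chat(bot_token: str, chat_id: str, timeout: float = 10.0) -> dict:
    """Call getChat and return the decoded JSON reply, including error replies."""
    try:
        with urllib.request.urlopen(get_chat_url(bot_token, chat_id), timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # Telegram reports failures as JSON in the body of the error response.
        return json.load(err)
```

A reply with `"ok": true` means the bot can reach the chat; an error reply carries a `description` field that usually states the reason.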

### Piper model not found
```bash
mkdir -p ~/.local/share/piper
cd ~/.local/share/piper
wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/lessac/medium/en_US-lessac-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json
```
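
After the download, a quick check that both the model and its JSON config are present and non-empty can save a confusing TTS failure later. The helper below is a hypothetical sketch; it assumes the TTS adapter looks in `~/.local/share/piper`, which is inferred from the commands above rather than confirmed.

```python
from pathlib import Path

VOICE = "en_US-lessac-medium"  # the voice downloaded above


def missing_piper_files(model_dir="~/.local/share/piper", voice=VOICE) -> list:
    """Return the names of model files that are absent or empty."""
    base = Path(model_dir).expanduser()
    wanted = [base / f"{voice}.onnx", base / f"{voice}.onnx.json"]
    return [p.name for p in wanted if not p.is_file() or p.stat().st_size == 0]
```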

---

## Architecture Recap

```
┌───────────────────────────────────────────────────────────────────────────┐
│                        D E E P   D I V E   V 1 . 1                        │
├───────────────────────────────────────────────────────────────────────────┤
│                                                                           │
│  ┌────────────────────────┐    ┌──────────────────────────────┐           │
│  │ deepdive_aggregator.py │    │   deepdive_orchestrator.py   │           │
│  │      (arXiv RSS)       │───▶│  (filter) ───▶ (synthesize)  │───▶ ...   │
│  └────────────────────────┘    └──────────────────────┬───────┘           │
│                                                       │                   │
│                                                       ▼                   │
│                                                deepdive_tts.py            │
│                                                 (TTS adapter)             │
│                                                       │                   │
│                                                       ▼                   │
│                                             deepdive_delivery.py          │
│                                             (Telegram voice msg)          │
└───────────────────────────────────────────────────────────────────────────┘
```
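
The flow in the diagram, together with the `--stop-after` flag used in the testing section, can be pictured as an ordered list of stages that halts after a named phase. The sketch below is illustrative only: the function, the stage table, and the stand-in lambdas are hypothetical and do not reproduce the orchestrator's real code.

```python
from typing import Optional

# Phase names follow the --stop-after values used in the testing section.
PHASE_ORDER = ["phase1", "phase2", "phase3", "phase4", "phase5"]


def run_pipeline(stages: dict, stop_after: Optional[str] = None, payload=None):
    """Run stages in order, feeding each stage's result to the next one."""
    if stop_after is not None and stop_after not in PHASE_ORDER:
        raise ValueError(f"unknown phase: {stop_after}")
    completed = []
    for name in PHASE_ORDER:
        payload = stages[name](payload)
        completed.append(name)
        if name == stop_after:
            break
    return completed, payload


# Stand-in stages: each only records that it ran. The real scripts do the work.
DEMO_STAGES = {
    "phase1": lambda _: ["aggregate"],
    "phase2": lambda done: done + ["filter"],
    "phase3": lambda done: done + ["synthesize"],
    "phase4": lambda done: done + ["tts"],
    "phase5": lambda done: done + ["deliver"],
}

if __name__ == "__main__":
    print(run_pipeline(DEMO_STAGES, stop_after="phase2"))
```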

---

## Next Steps for Full Automation

- [ ] **LLM Integration**: Complete `orchestrator.phase3()` with LLM API call
- [ ] **Prompt Engineering**: Design briefing format prompt with Hermes context
- [ ] **Source Expansion**: Add lab blogs (OpenAI, Anthropic, DeepMind)
- [ ] **Embedding Filter**: Replace keyword scoring with semantic similarity
- [ ] **Metrics**: Track delivery success, user engagement, audio length
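
As a rough illustration of the embedding filter item, ranking by semantic similarity usually comes down to cosine similarity between an interest vector and each item's embedding. The sketch below uses hand-written toy vectors; where the embeddings come from is left open, and the function names are hypothetical.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(query_vec, items, max_items=10) -> list:
    """items is a list of (title, vector) pairs; returns titles, most similar first."""
    scored = sorted(items, key=lambda it: cosine(query_vec, it[1]), reverse=True)
    return [title for title, _ in scored[:max_items]]


if __name__ == "__main__":
    toy_items = [("A", [1.0, 0.0]), ("B", [0.0, 1.0]), ("C", [1.0, 1.0])]
    print(rank_by_similarity([1.0, 0.0], toy_items, max_items=2))
```

The `max_items` parameter plays the same role as `DEEPDIVE_MAX_ITEMS`: only the top-scoring entries make it into the briefing.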

**Status**: Phases 1, 2, 4, 5 scaffolded and executable. Phase 3 synthesis awaiting LLM integration.