Complete production-ready scaffold for automated daily AI intelligence briefings:

- Phase 1: Source aggregation (arXiv + lab blogs)
- Phase 2: Relevance ranking (keyword + source authority scoring)
- Phase 3: LLM synthesis (Hermes-context briefing generation)
- Phase 4: TTS audio (edge-tts/OpenAI/ElevenLabs)
- Phase 5: Telegram delivery (voice message)

Deliverables:

- docs/ARCHITECTURE.md (9000+ lines) - system design
- docs/OPERATIONS.md - runbook and troubleshooting
- 5 executable phase scripts (bin/)
- Full pipeline orchestrator (run_full_pipeline.py)
- requirements.txt, README.md

Addresses all 9 acceptance criteria from #830. Ready for host selection, credential config, and cron activation.

Author: Ezra | Burn mode | 2026-04-05

# Deep Dive Operations Runbook

**Issue**: the-nexus#830
**Maintainer**: Operations team post-deployment

---

## Quick Start

```bash
# 1. Install dependencies
cd deepdive && pip install -r requirements.txt

# 2. Configure environment
cp config/.env.example config/.env
# Edit config/.env with your API keys

# 3. Test full pipeline
./bin/run_full_pipeline.py --date=$(date +%Y-%m-%d) --dry-run

# 4. Run for real
./bin/run_full_pipeline.py
```

---

## Daily Operations

### Manual Run (On-Demand)

```bash
# Run full pipeline for today
./bin/run_full_pipeline.py

# Run specific phases
./bin/run_full_pipeline.py --phases 1,2    # Just aggregate and rank
./bin/run_full_pipeline.py --phase3-only   # Regenerate briefing
```

### Cron Setup (Scheduled)

```bash
# Edit crontab
crontab -e

# Add daily 6 AM run (cron uses the server's local time, so the server time zone should be EST)
0 6 * * * /opt/deepdive/bin/run_full_pipeline.py >> /var/log/deepdive.log 2>&1
```

Systemd timer alternative:

```bash
sudo cp config/deepdive.service /etc/systemd/system/
sudo cp config/deepdive.timer /etc/systemd/system/
sudo systemctl enable deepdive.timer
sudo systemctl start deepdive.timer
```

---

## Monitoring

### Check Today's Run

```bash
# View logs
tail -f /var/log/deepdive.log

# Check data directories
ls -la data/sources/$(date +%Y-%m-%d)/
ls -la data/briefings/
ls -la data/audio/

# Check Telegram connectivity: show the bot's most recent incoming update
# (getUpdates lists updates the bot received, not messages it sent)
curl -s "https://api.telegram.org/bot${DEEPDIVE_TELEGRAM_BOT_TOKEN}/getUpdates" | jq '.result[-1]'
```
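
The directory checks above can also be scripted. The following is a minimal sketch, assuming the `data/sources/<date>/`, `data/briefings/`, and `data/audio/` layout shown in the `ls` commands; the function name `check_run_outputs` is hypothetical and not part of the pipeline. It only reports whether each location contains at least one file, so it does not prove that the briefing or audio belongs to today's run.

```python
from datetime import date
from pathlib import Path


def check_run_outputs(data_dir: Path, run_date: str) -> dict:
    """Report which expected output locations contain at least one file."""
    # Layout assumed from the `ls` commands in this runbook.
    locations = {
        "sources": data_dir / "sources" / run_date,
        "briefings": data_dir / "briefings",
        "audio": data_dir / "audio",
    }
    return {
        name: path.is_dir() and any(p.is_file() for p in path.iterdir())
        for name, path in locations.items()
    }


if __name__ == "__main__":
    today = date.today().isoformat()  # same format as `date +%Y-%m-%d`
    for name, ok in check_run_outputs(Path("data"), today).items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```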

### Common Issues

| Issue | Cause | Fix |
|-------|-------|-----|
| No sources aggregated | arXiv API down | Wait and retry; check http://status.arxiv.org |
| Empty briefing | No relevant sources | Lower `min_score` in `config/relevance.yaml` |
| TTS fails | No API credits | Switch to `edge-tts` (free) |
| Telegram not delivering | Bot token invalid | Regenerate bot token via @BotFather |
| Audio too long | Briefing too verbose | Reduce max_chars in phase4 |
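
For the "Audio too long" row, the sketch below illustrates one way a character cap such as `max_chars` could be applied before text is sent to TTS. It is an assumption for illustration only: the function `truncate_for_tts` is hypothetical, and the actual phase 4 script may cap text differently.

```python
def truncate_for_tts(text: str, max_chars: int) -> str:
    """Cut text to at most max_chars, preferring the last sentence end."""
    if len(text) <= max_chars:
        return text
    clipped = text[:max_chars]
    # Find the last sentence-ending punctuation followed by a space.
    cut = max(clipped.rfind(". "), clipped.rfind("! "), clipped.rfind("? "))
    if cut == -1:
        # No sentence boundary found: fall back to a hard cut.
        return clipped.rstrip()
    return clipped[: cut + 1]
```

Cutting at a sentence boundary avoids audio that stops mid-sentence, at the cost of sometimes using fewer characters than the cap allows.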

---

## Configuration

### Source Management

Edit `config/sources.yaml`:

```yaml
sources:
  arxiv:
    categories:
      - cs.AI
      - cs.CL
      - cs.LG
    max_items: 50

  blogs:
    openai: https://openai.com/blog/rss.xml
    anthropic: https://www.anthropic.com/news.atom
    deepmind: https://deepmind.google/blog/rss.xml
    max_items_per_source: 10

  newsletters:
    - name: "Import AI"
      email_filter: "importai@jack-clark.net"
```

### Relevance Tuning

Edit `config/relevance.yaml`:

```yaml
keywords:
  hermes: 3.0    # Boost Hermes mentions
  agent: 1.5
  mcp: 2.0

thresholds:
  min_score: 2.0    # Drop items below this
  max_items: 20     # Top N to keep
```
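
To show how these settings interact, here is a minimal sketch of keyword scoring with the weights and thresholds above. It assumes a simple case-insensitive substring match and leaves out the source authority component of Phase 2; the functions `score` and `rank` are hypothetical and may not match the real ranking script.

```python
KEYWORDS = {"hermes": 3.0, "agent": 1.5, "mcp": 2.0}
MIN_SCORE = 2.0   # thresholds.min_score
MAX_ITEMS = 20    # thresholds.max_items


def score(text: str, keywords: dict) -> float:
    """Sum the weight of every keyword that appears in the text."""
    lowered = text.lower()
    return sum(weight for kw, weight in keywords.items() if kw in lowered)


def rank(items: list) -> list:
    """Score items, drop those below MIN_SCORE, keep the top MAX_ITEMS."""
    scored = [(score(text, KEYWORDS), text) for text in items]
    kept = [pair for pair in scored if pair[0] >= MIN_SCORE]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:MAX_ITEMS]
```

Under these assumptions, an item that mentions only `agent` scores 1.5 and is dropped, which is why lowering `min_score` is the first thing to try when a briefing comes out empty.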

### LLM Selection

Environment variable:

```bash
export DEEPDIVE_LLM_MODEL="openai/gpt-4o-mini"
# or
export DEEPDIVE_LLM_MODEL="anthropic/claude-3-haiku"
# or
export DEEPDIVE_LLM_MODEL="hermes/local"
```
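
The values above follow a `provider/model` pattern. As a sketch of how such a value could be validated before a run, assuming the pipeline splits on the first `/` (not confirmed by this runbook), a hypothetical helper `parse_model` might look like this; the default used in the example is also an assumption.

```python
import os


def parse_model(spec: str) -> tuple:
    """Split 'provider/model' into (provider, model)."""
    provider, sep, model = spec.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {spec!r}")
    return provider, model


if __name__ == "__main__":
    # The fallback value here is illustrative, not a documented default.
    spec = os.environ.get("DEEPDIVE_LLM_MODEL", "openai/gpt-4o-mini")
    print(parse_model(spec))
```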

### TTS Selection

Environment variable:

```bash
export DEEPDIVE_TTS_PROVIDER="edge-tts"    # Free, recommended
# or
export DEEPDIVE_TTS_PROVIDER="openai"      # Requires OPENAI_API_KEY
# or
export DEEPDIVE_TTS_PROVIDER="elevenlabs"  # Best quality
```

---

## Telegram Bot Setup

1. **Create Bot**: Message @BotFather, create a new bot, and copy the token it returns.
2. **Get Chat ID**: Send any message to the bot, then read `message.chat.id` from the response of:
```bash
curl "https://api.telegram.org/bot<TOKEN>/getUpdates"
```
3. **Configure**:
```bash
export DEEPDIVE_TELEGRAM_BOT_TOKEN="<token>"
export DEEPDIVE_TELEGRAM_CHAT_ID="<chat_id>"
```
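
Step 2 returns JSON, and the chat ID sits at `result[].message.chat.id`. The following sketch extracts it from a saved response body; the helper name `chat_ids` is hypothetical, and it only looks at `message` and `channel_post` updates.

```python
import json


def chat_ids(get_updates_body: str) -> list:
    """Return the distinct chat IDs found in a getUpdates response body."""
    payload = json.loads(get_updates_body)
    if not payload.get("ok"):
        raise ValueError("Telegram API reported an error")
    seen = []
    for update in payload.get("result", []):
        # Direct chats and groups use "message"; channels use "channel_post".
        message = update.get("message") or update.get("channel_post") or {}
        chat_id = message.get("chat", {}).get("id")
        if chat_id is not None and chat_id not in seen:
            seen.append(chat_id)
    return seen
```

An empty list usually means no one has messaged the bot yet, so send it a message and call `getUpdates` again.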

---

## Maintenance

### Weekly

- [ ] Check disk space in `data/` directory
- [ ] Review log for errors: `grep ERROR /var/log/deepdive.log`
- [ ] Verify the schedule is active: `systemctl status deepdive.timer` (or `crontab -l` for cron)

### Monthly

- [ ] Archive old audio: `find data/audio -type f -name '*.mp3' -mtime +30 -exec gzip {} \;`
- [ ] Review source quality: are rankings accurate?
- [ ] Update API keys if approaching limits
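
Before archiving, it can help to preview which files the monthly cleanup would touch. This is a minimal sketch under the assumption that age is judged by modification time, as with `find -mtime`; the helper `files_older_than` is hypothetical, and its cutoff is an exact number of seconds rather than `find`'s whole-day rounding.

```python
import time
from pathlib import Path
from typing import Optional


def files_older_than(directory: Path, days: int, now: Optional[float] = None) -> list:
    """List files under directory whose modification time is older than `days`."""
    current = time.time() if now is None else now
    cutoff = current - days * 86400  # 86400 seconds per day
    return sorted(
        p for p in directory.rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


if __name__ == "__main__":
    audio_dir = Path("data/audio")
    if audio_dir.is_dir():
        for path in files_older_than(audio_dir, 30):
            print(path)
```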

---

## Troubleshooting

### Debug Mode

Run phases individually with verbose output:

```bash
# Phase 1 with verbose output
python -c "
import asyncio
import logging
from bin.phase1_aggregate import SourceAggregator
from pathlib import Path
# Assumption: the phase scripts log through the standard logging module
logging.basicConfig(level=logging.DEBUG)
agg = SourceAggregator(Path('data'), '2026-04-05')
asyncio.run(agg.run())
"
```

### Reset State

Delete the generated artifacts, then rerun the pipeline to regenerate them:

```bash
rm -rf data/sources/2026-04-*
rm -rf data/ranked/*.json
rm -rf data/briefings/*.md
rm -rf data/audio/*.mp3
```

### Test Telegram

```bash
# Replace <TOKEN> and <CHAT_ID>; the quotes keep the shell from treating < and > as redirects
curl -X POST \
  "https://api.telegram.org/bot<TOKEN>/sendMessage" \
  -d chat_id="<CHAT_ID>" \
  -d text="Deep Dive test message"
```
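
The same test can be run from Python using only the standard library. The sketch below builds the `sendMessage` request without sending it, so the URL and form body can be inspected first; the helper name `build_send_message` is hypothetical.

```python
import urllib.parse
import urllib.request


def build_send_message(token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the sendMessage method."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    # Form-encode the fields, matching what `curl -d` sends.
    body = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    return urllib.request.Request(url, data=body, method="POST")


# To actually send the message:
#   urllib.request.urlopen(build_send_message(token, chat_id, "Deep Dive test message"), timeout=10)
```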

---

## Security

- API keys stored in `config/.env` (gitignored)
- `.env` file permissions: `chmod 600 config/.env`
- Telegram bot token: regenerate if compromised
- LLM API usage: monitor for unexpected spend
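
The permission rule above can be checked automatically, for example before each run. This is a minimal sketch for POSIX systems; the helper `is_owner_only` is hypothetical and simply compares the file mode against `600`.

```python
import stat
from pathlib import Path


def is_owner_only(path: Path) -> bool:
    """True if the file mode is exactly 0o600 (owner read/write only)."""
    return stat.S_IMODE(path.stat().st_mode) == 0o600


if __name__ == "__main__":
    env_file = Path("config/.env")
    if env_file.exists() and not is_owner_only(env_file):
        print("config/.env is too permissive; run: chmod 600 config/.env")
```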

---

**Issue Ref**: #830
**Last Updated**: 2026-04-05 by Ezra