# Deep Dive Operations Runbook

**Issue:** the-nexus#830
**Maintainer:** Operations team (post-deployment)
## Quick Start

```bash
# 1. Install dependencies
cd deepdive && pip install -r requirements.txt

# 2. Configure environment
cp config/.env.example config/.env
# Edit config/.env with your API keys

# 3. Test the full pipeline
./bin/run_full_pipeline.py --date=$(date +%Y-%m-%d) --dry-run

# 4. Run for real
./bin/run_full_pipeline.py
```
## Daily Operations

### Manual Run (On-Demand)

```bash
# Run the full pipeline for today
./bin/run_full_pipeline.py

# Run specific phases
./bin/run_full_pipeline.py --phases 1,2    # Just aggregate and rank
./bin/run_full_pipeline.py --phase3-only   # Regenerate the briefing
```
### Cron Setup (Scheduled)

```bash
# Edit crontab
crontab -e

# Add daily 6 AM run (server time should be EST)
0 6 * * * /opt/deepdive/bin/run_full_pipeline.py >> /var/log/deepdive.log 2>&1
```

Systemd timer alternative:

```bash
sudo cp config/deepdive.service /etc/systemd/system/
sudo cp config/deepdive.timer /etc/systemd/system/
sudo systemctl enable deepdive.timer
sudo systemctl start deepdive.timer
```
## Monitoring

### Check Today's Run

```bash
# Follow the logs
tail -f /var/log/deepdive.log

# Check data directories
ls -la data/sources/$(date +%Y-%m-%d)/
ls -la data/briefings/
ls -la data/audio/

# Verify Telegram delivery
curl -s "https://api.telegram.org/bot${TOKEN}/getUpdates" | jq '.result[-1]'
```
## Common Issues

| Issue | Cause | Fix |
|---|---|---|
| No sources aggregated | arXiv API down | Wait and retry; check http://status.arxiv.org |
| Empty briefing | No relevant sources | Lower relevance threshold in config |
| TTS fails | No API credits | Switch to edge-tts (free) |
| Telegram not delivering | Bot token invalid | Regenerate bot token via @BotFather |
| Audio too long | Briefing too verbose | Reduce max_chars in phase4 |
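For the transient failures in the table (arXiv API down, TTS hiccups), "wait and retry" can be generalized into an exponential-backoff wrapper. A sketch, not the pipeline's actual retry logic:

```python
# Sketch: exponential backoff around a flaky fetch. `fetch` is any
# zero-argument callable that raises on failure and returns on success.
import time

def retry(fetch, attempts: int = 4, base_delay: float = 2.0):
    """Call `fetch` up to `attempts` times, doubling the delay each retry."""
    for i in range(attempts):
        try:
            return fetch()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * (2 ** i))  # 2s, 4s, 8s, ...
```

The last failure re-raises, so a permanently-down source still shows up in `/var/log/deepdive.log` rather than being silently skipped.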
## Configuration

### Source Management

Edit `config/sources.yaml`:
```yaml
sources:
  arxiv:
    categories:
      - cs.AI
      - cs.CL
      - cs.LG
    max_items: 50
  blogs:
    openai: https://openai.com/blog/rss.xml
    anthropic: https://www.anthropic.com/news.atom
    deepmind: https://deepmind.google/blog/rss.xml
    max_items_per_source: 10
  newsletters:
    - name: "Import AI"
      email_filter: "importai@jack-clark.net"
```
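A malformed `sources.yaml` is easiest to catch before a run rather than mid-pipeline. A minimal validation sketch over the already-parsed config dict (PyYAML's `yaml.safe_load` would produce it); the check names are illustrative, not part of the scaffold:

```python
# Sketch: sanity-check a parsed sources.yaml before a run.
def validate_sources(cfg: dict) -> list[str]:
    """Return a list of problems; an empty list means the config looks usable."""
    problems = []
    src = cfg.get("sources", {})
    if not src.get("arxiv", {}).get("categories"):
        problems.append("arxiv.categories is empty")
    for name, url in src.get("blogs", {}).items():
        if name == "max_items_per_source":  # scalar option, not a feed
            continue
        if not str(url).startswith("http"):
            problems.append(f"blog {name}: not a URL")
    return problems
```

Running this at the top of phase 1 turns a cryptic fetch failure into an actionable config error.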
### Relevance Tuning

Edit `config/relevance.yaml`:
```yaml
keywords:
  hermes: 3.0     # Boost Hermes mentions
  agent: 1.5
  mcp: 2.0
thresholds:
  min_score: 2.0  # Drop items below this
  max_items: 20   # Top N to keep
```
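The keyword half of the phase-2 scoring can be sketched from this config: each keyword hit adds its weight, items below `min_score` are dropped, and the top `max_items` are kept. This is an illustration of the keyword part only; the actual ranker also factors in source authority, which is omitted here:

```python
# Sketch: keyword-weight scoring per relevance.yaml. Item fields
# ("title", "abstract") are assumed names for the aggregated records.
def rank(items: list[dict], keywords: dict[str, float],
         min_score: float, max_items: int) -> list[dict]:
    scored = []
    for item in items:
        text = (item.get("title", "") + " " + item.get("abstract", "")).lower()
        score = sum(w for kw, w in keywords.items() if kw in text)
        if score >= min_score:
            scored.append({**item, "score": score})
    scored.sort(key=lambda x: x["score"], reverse=True)  # best first
    return scored[:max_items]
```

With the weights above, an item mentioning both "hermes" and "agent" scores 4.5, comfortably above the 2.0 threshold, while an item matching nothing is dropped.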
### LLM Selection

Set via environment variable:

```bash
export DEEPDIVE_LLM_MODEL="openai/gpt-4o-mini"
# or
export DEEPDIVE_LLM_MODEL="anthropic/claude-3-haiku"
# or
export DEEPDIVE_LLM_MODEL="hermes/local"
```
### TTS Selection

Set via environment variable:

```bash
export DEEPDIVE_TTS_PROVIDER="edge-tts"     # Free, recommended
# or
export DEEPDIVE_TTS_PROVIDER="openai"       # Requires OPENAI_API_KEY
# or
export DEEPDIVE_TTS_PROVIDER="elevenlabs"   # Best quality
```
## Telegram Bot Setup

- **Create bot:** message @BotFather, create a new bot, and save the token.
- **Get chat ID:** send the bot a message, then:

  ```bash
  curl https://api.telegram.org/bot<TOKEN>/getUpdates
  ```

- **Configure:**

  ```bash
  export DEEPDIVE_TELEGRAM_BOT_TOKEN="<token>"
  export DEEPDIVE_TELEGRAM_CHAT_ID="<chat_id>"
  ```
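Once token and chat ID are configured, a text message can be sent from Python with only the standard library; `sendMessage` is the documented Bot API method (the actual delivery phase sends voice messages, which need a multipart `sendVoice` upload and is out of scope for this sketch):

```python
# Sketch: send a plain text message via the Telegram Bot API.
import json
import urllib.parse
import urllib.request

API = "https://api.telegram.org"

def build_request(token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Construct the POST request for the sendMessage method."""
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    return urllib.request.Request(f"{API}/bot{token}/sendMessage", data=data)

def send(token: str, chat_id: str, text: str) -> dict:
    """Send the message and return the API's JSON response."""
    with urllib.request.urlopen(build_request(token, chat_id, text)) as resp:
        return json.load(resp)
```

Keeping request construction separate from the network call makes the URL and payload easy to unit-test without a live bot.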
## Maintenance

### Weekly

- Check disk space in the `data/` directory
- Review logs for errors: `grep ERROR /var/log/deepdive.log`
- Verify the cron job/timer is running: `systemctl status deepdive.timer`
### Monthly

- Archive old audio: `find data/audio -mtime +30 -exec gzip {} \;`
- Review source quality: are the rankings accurate?
- Update API keys if approaching usage limits
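If the host lacks `find`/`gzip` (or the archival should live inside the pipeline), the monthly audio step can be done in Python. A sketch equivalent to the `find ... -exec gzip` command above, with the function name being an assumption:

```python
# Sketch: compress .mp3 files older than `days` and remove the originals,
# mirroring what `gzip` does (each a.mp3 becomes a.mp3.gz).
import gzip
import shutil
import time
from pathlib import Path

def archive_old_audio(audio_dir: Path, days: int = 30) -> list[Path]:
    """Gzip stale audio files in place; return the archives created."""
    cutoff = time.time() - days * 86400
    archived = []
    for mp3 in audio_dir.glob("*.mp3"):
        if mp3.stat().st_mtime < cutoff:
            gz = mp3.parent / (mp3.name + ".gz")
            with mp3.open("rb") as src, gzip.open(gz, "wb") as dst:
                shutil.copyfileobj(src, dst)
            mp3.unlink()  # drop the uncompressed original, like gzip
            archived.append(gz)
    return archived
```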
## Troubleshooting

### Debug Mode

Run phases individually with verbose output:

```bash
# Phase 1 with verbose output
python -c "
import asyncio
from pathlib import Path

from bin.phase1_aggregate import SourceAggregator

agg = SourceAggregator(Path('data'), '2026-04-05')
asyncio.run(agg.run())
"
```
### Reset State

Delete and regenerate:

```bash
rm -rf data/sources/2026-04-*
rm -rf data/ranked/*.json
rm -rf data/briefings/*.md
rm -rf data/audio/*.mp3
```
### Test Telegram

```bash
curl -X POST \
  https://api.telegram.org/bot<TOKEN>/sendMessage \
  -d chat_id=<CHAT_ID> \
  -d text="Deep Dive test message"
```
## Security

- API keys are stored in `config/.env` (gitignored)
- `.env` file permissions: `chmod 600 config/.env`
- Telegram bot token: regenerate if compromised
- LLM API usage: monitor for unexpected spend
---

**Issue Ref:** #830
**Last Updated:** 2026-04-05 by Ezra