# Deep Dive Operations Runbook

**Issue**: the-nexus#830

**Maintainer**: Operations team post-deployment

---

## Quick Start

```bash
# 1. Install dependencies
cd deepdive && pip install -r requirements.txt

# 2. Configure environment
cp config/.env.example config/.env
# Edit config/.env with your API keys

# 3. Test full pipeline
./bin/run_full_pipeline.py --date=$(date +%Y-%m-%d) --dry-run

# 4. Run for real
./bin/run_full_pipeline.py
```

---

## Daily Operations

### Manual Run (On-Demand)

```bash
# Run full pipeline for today
./bin/run_full_pipeline.py

# Run specific phases
./bin/run_full_pipeline.py --phases 1,2    # Just aggregate and rank
./bin/run_full_pipeline.py --phase3-only   # Regenerate briefing
```

### Cron Setup (Scheduled)

```bash
# Edit crontab
crontab -e

# Add daily 6 AM run (cron uses the server's local time, which should be EST)
0 6 * * * /opt/deepdive/bin/run_full_pipeline.py >> /var/log/deepdive.log 2>&1
```

Systemd timer alternative:

```bash
sudo cp config/deepdive.service /etc/systemd/system/
sudo cp config/deepdive.timer /etc/systemd/system/
# Make systemd pick up the newly copied unit files
sudo systemctl daemon-reload
sudo systemctl enable deepdive.timer
sudo systemctl start deepdive.timer
```

---

## Monitoring

### Check Today's Run

```bash
# View logs
tail -f /var/log/deepdive.log

# Check data directories
ls -la data/sources/$(date +%Y-%m-%d)/
ls -la data/briefings/
ls -la data/audio/

# Check the bot token and show the latest update the bot received
# (getUpdates lists incoming messages, not briefings the bot has sent)
curl -s "https://api.telegram.org/bot${DEEPDIVE_TELEGRAM_BOT_TOKEN}/getUpdates" | jq '.result[-1]'
```

### Common Issues

| Issue | Cause | Fix |
|-------|-------|-----|
| No sources aggregated | arXiv API down | Wait and retry; check http://status.arxiv.org |
| Empty briefing | No relevant sources | Lower `min_score` in `config/relevance.yaml` |
| TTS fails | No API credits | Switch to `edge-tts` (free) |
| Telegram not delivering | Bot token invalid | Regenerate bot token via @BotFather |
| Audio too long | Briefing too verbose | Reduce `max_chars` in phase4 |

---

## Configuration

### Source Management

Edit `config/sources.yaml`:

```yaml
sources:
  arxiv:
    categories:
      - cs.AI
      - cs.CL
      - cs.LG
    max_items: 50

  blogs:
    openai: https://openai.com/blog/rss.xml
    anthropic: https://www.anthropic.com/news.atom
    deepmind: https://deepmind.google/blog/rss.xml
    max_items_per_source: 10

  newsletters:
    - name: "Import AI"
      email_filter: "importai@jack-clark.net"
```

### Relevance Tuning

Edit `config/relevance.yaml`:

```yaml
keywords:
  hermes: 3.0   # Boost Hermes mentions
  agent: 1.5
  mcp: 2.0

thresholds:
  min_score: 2.0   # Drop items below this
  max_items: 20    # Top N to keep
```

### LLM Selection

Set one of the following environment variables:

```bash
export DEEPDIVE_LLM_MODEL="openai/gpt-4o-mini"
# or
export DEEPDIVE_LLM_MODEL="anthropic/claude-3-haiku"
# or
export DEEPDIVE_LLM_MODEL="hermes/local"
```

### TTS Selection

Set one of the following environment variables:

```bash
export DEEPDIVE_TTS_PROVIDER="edge-tts"     # Free, recommended
# or
export DEEPDIVE_TTS_PROVIDER="openai"       # Requires OPENAI_API_KEY
# or
export DEEPDIVE_TTS_PROVIDER="elevenlabs"   # Best quality
```

---

## Telegram Bot Setup

1. **Create Bot**: Message @BotFather, create a new bot, and copy the token it returns.
2. **Get Chat ID**: Send any message to the bot, then fetch its updates. The chat ID is the `message.chat.id` field of the returned update:
   ```bash
   BOT_TOKEN="your-bot-token"   # placeholder: token from step 1
   curl -s "https://api.telegram.org/bot${BOT_TOKEN}/getUpdates"
   ```
3. **Configure**:
   ```bash
   # Placeholders: substitute the values from steps 1 and 2
   export DEEPDIVE_TELEGRAM_BOT_TOKEN="your-bot-token"
   export DEEPDIVE_TELEGRAM_CHAT_ID="your-chat-id"
   ```

---

## Maintenance

### Weekly

- [ ] Check disk space in `data/` directory
- [ ] Review log for errors: `grep ERROR /var/log/deepdive.log`
- [ ] Verify cron/timer is running: `systemctl status deepdive.timer`

### Monthly

- [ ] Archive old audio: `find data/audio -type f -name '*.mp3' -mtime +30 -exec gzip {} \;`
- [ ] Review source quality: are rankings accurate?
- [ ] Update API keys if approaching limits

---

## Troubleshooting

### Debug Mode

Run phases individually with verbose output (from the `deepdive/` project root, so that `bin` is importable):

```bash
# Phase 1 with verbose output
python -c "
import asyncio
import logging
from pathlib import Path

from bin.phase1_aggregate import SourceAggregator

# Assumption: the phase modules log through the standard logging module
logging.basicConfig(level=logging.DEBUG)

agg = SourceAggregator(Path('data'), '2026-04-05')
asyncio.run(agg.run())
"
```

### Reset State

Delete the generated data, then rerun the pipeline to regenerate it:

```bash
# Adjust the date glob to the period you want to reset
rm -rf data/sources/2026-04-*
rm -rf data/ranked/*.json
rm -rf data/briefings/*.md
rm -rf data/audio/*.mp3
```

### Test Telegram

```bash
curl -X POST \
  "https://api.telegram.org/bot${DEEPDIVE_TELEGRAM_BOT_TOKEN}/sendMessage" \
  -d chat_id="${DEEPDIVE_TELEGRAM_CHAT_ID}" \
  -d text="Deep Dive test message"
```

---

## Security

- API keys are stored in `config/.env` (gitignored)
- Restrict `.env` file permissions: `chmod 600 config/.env`
- Telegram bot token: regenerate via @BotFather if compromised
- LLM API usage: monitor for unexpected spend

---

**Issue Ref**: #830

**Last Updated**: 2026-04-05 by Ezra
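
---

## Appendix: Relevance Scoring Sketch

The `keywords` and `thresholds` settings in `config/relevance.yaml` can be read as a weighted keyword filter. The sketch below is one plausible interpretation, not the pipeline's actual ranking code. It assumes that an item's score is the sum of the weights of the keywords found in its text (case-insensitive substring match), that items scoring below `min_score` are dropped, and that only the top `max_items` survivors are kept. The names `score_item` and `rank_items` are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Tuple

# Values copied from the config/relevance.yaml example in this runbook.
KEYWORDS: Dict[str, float] = {"hermes": 3.0, "agent": 1.5, "mcp": 2.0}
MIN_SCORE = 2.0
MAX_ITEMS = 20


def score_item(text: str, keywords: Dict[str, float]) -> float:
    """Sum the weights of all keywords present in the text.

    Assumed scoring rule: each keyword counts once, matched as a
    case-insensitive substring.
    """
    lowered = text.lower()
    return sum(weight for word, weight in keywords.items() if word in lowered)


def rank_items(
    texts: List[str],
    keywords: Dict[str, float] = KEYWORDS,
    min_score: float = MIN_SCORE,
    max_items: int = MAX_ITEMS,
) -> List[Tuple[float, str]]:
    """Drop items below min_score, then return the top max_items by score."""
    scored = [(score_item(text, keywords), text) for text in texts]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:max_items]


if __name__ == "__main__":
    sample = [
        "New agent framework",
        "Hermes 3 released",
        "MCP servers for agents",
    ]
    for score, title in rank_items(sample):
        print(f"{score:.1f}  {title}")
```

Under these assumptions, an item that mentions only `agent` scores 1.5 and is filtered out by `min_score: 2.0`, which is one way an empty briefing can arise when the threshold is set too high. Substring matching is a deliberate simplification: `agent` also matches `agents`, and short keywords such as `mcp` may match inside unrelated words.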