diff --git a/docs/deep-dive/ARCHITECTURE.md b/docs/deep-dive/ARCHITECTURE.md new file mode 100644 index 0000000..9f57048 --- /dev/null +++ b/docs/deep-dive/ARCHITECTURE.md @@ -0,0 +1,80 @@ +# Deep Dive Architecture + +Technical specification for the automated daily intelligence briefing system. + +## System Overview + +``` +┌─────────────┬─────────────┬─────────────┬─────────────┬─────────────┐ +│ Phase 1 │ Phase 2 │ Phase 3 │ Phase 4 │ Phase 5 │ +│ Aggregate │ Filter │ Synthesize │ TTS │ Deliver │ +├─────────────┼─────────────┼─────────────┼─────────────┼─────────────┤ +│ arXiv RSS │ Chroma DB │ Claude/GPT │ Piper │ Telegram │ +│ Lab Blogs │ Embeddings │ Prompt │ (local) │ Voice │ +└─────────────┴─────────────┴─────────────┴─────────────┴─────────────┘ +``` + +## Data Flow + +1. **Aggregation**: Fetch from arXiv + lab blogs +2. **Relevance**: Score against Hermes context via embeddings +3. **Synthesis**: LLM generates structured briefing +4. **TTS**: Piper converts to audio (Opus) +5. **Delivery**: Telegram voice message + +## Source Coverage + +| Source | Method | Frequency | +|--------|--------|-----------| +| arXiv cs.AI | RSS | Daily | +| arXiv cs.CL | RSS | Daily | +| arXiv cs.LG | RSS | Daily | +| OpenAI Blog | RSS | Weekly | +| Anthropic | RSS | Weekly | +| DeepMind | Scraper | Weekly | + +## Relevance Scoring + +**Keyword Layer**: Match against 20+ Hermes keywords +**Embedding Layer**: `all-MiniLM-L6-v2` + Chroma DB +**Composite**: `0.3 * keyword_score + 0.7 * embedding_score` + +## TTS Pipeline + +- **Engine**: Piper (`en_US-lessac-medium`) +- **Speed**: ~1.5x realtime on CPU +- **Format**: WAV → FFmpeg → Opus (24kbps) +- **Sovereign**: Fully local, zero API cost + +## Cron Integration + +```yaml +job: + name: deep-dive-daily + schedule: "0 6 * * *" + command: python3 orchestrator.py --cron +``` + +## On-Demand + +```bash +python3 orchestrator.py # Full run +python3 orchestrator.py --dry-run # No delivery +python3 orchestrator.py --skip-tts # Text only +``` + +## Acceptance Criteria + +| Criterion | Status | +|-----------|--------| +| Zero manual copy-paste | ✅ Automated | +| Daily 6 AM delivery | ✅ Cron ready | +| arXiv + labs coverage | ✅ RSS + scraper | +| Hermes relevance filter | ✅ Embeddings | +| Written briefing | ✅ LLM synthesis | +| Audio via TTS | ✅ Piper pipeline | +| Telegram delivery | ✅ Voice API | +| On-demand command | ✅ CLI flags | + +--- +**Epic**: #830 | **Status**: Architecture Complete