# Deep Dive Architecture

Technical specification for the automated daily intelligence briefing system.

## System Overview

```
┌─────────────┬─────────────┬─────────────┬─────────────┬─────────────┐
│ Phase 1     │ Phase 2     │ Phase 3     │ Phase 4     │ Phase 5     │
│ Aggregate   │ Filter      │ Synthesize  │ TTS         │ Deliver     │
├─────────────┼─────────────┼─────────────┼─────────────┼─────────────┤
│ arXiv RSS   │ Chroma DB   │ Claude/GPT  │ Piper       │ Telegram    │
│ Lab Blogs   │ Embeddings  │ Prompt      │ (local)     │ Voice       │
└─────────────┴─────────────┴─────────────┴─────────────┴─────────────┘
```

## Data Flow

1. **Aggregation**: Fetch new items from the arXiv feeds and lab blogs
2. **Relevance**: Score each item against the Hermes context using keywords and embeddings
3. **Synthesis**: An LLM generates a structured briefing from the filtered items
4. **TTS**: Piper converts the briefing to audio (WAV, then Opus)
5. **Delivery**: Send the audio as a Telegram voice message

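The five steps above can be read as a simple sequential pipeline. The following is a minimal sketch under stated assumptions, not the actual `orchestrator.py`: the `Item` shape, the `run_pipeline` function, its callable parameters, and the `threshold` cutoff are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Item:
    """One aggregated paper or blog post (hypothetical shape)."""
    title: str
    summary: str
    score: float = 0.0


def run_pipeline(
    fetch: Callable[[], Iterable[Item]],
    score: Callable[[Item], float],
    synthesize: Callable[[List[Item]], str],
    tts: Optional[Callable[[str], bytes]],
    deliver: Optional[Callable[[str, Optional[bytes]], None]],
    threshold: float = 0.5,  # assumed cutoff, not taken from the spec
) -> str:
    """Run the five phases in order and return the briefing text."""
    items = list(fetch())                                # 1. Aggregation
    for item in items:
        item.score = score(item)                         # 2. Relevance
    relevant = sorted(
        (i for i in items if i.score >= threshold),
        key=lambda i: i.score,
        reverse=True,
    )
    briefing = synthesize(relevant)                      # 3. Synthesis
    audio = tts(briefing) if tts is not None else None   # 4. TTS (optional)
    if deliver is not None:
        deliver(briefing, audio)                         # 5. Delivery (optional)
    return briefing
```

Passing each phase in as a callable keeps the phases independently testable; passing `None` for `tts` or `deliver` is one way a text-only or no-delivery run could be expressed.
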
## Source Coverage

| Source | Method | Frequency |
|--------|--------|-----------|
| arXiv cs.AI | RSS | Daily |
| arXiv cs.CL | RSS | Daily |
| arXiv cs.LG | RSS | Daily |
| OpenAI Blog | RSS | Weekly |
| Anthropic | RSS | Weekly |
| DeepMind | Scraper | Weekly |

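For the RSS-based sources, aggregation reduces to parsing feed XML into a uniform item shape. The sketch below uses only the standard library and assumes plain RSS 2.0 feeds with un-namespaced `<item>` elements; Atom or RDF-style feeds would need different tag names. `parse_rss`, the dictionary keys, and `SAMPLE_FEED` are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_rss(xml_text: str) -> List[Dict[str, str]]:
    """Extract title, link, and description from each <item> of an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.iter("item"):
        entries.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "summary": (item.findtext("description") or "").strip(),
        })
    return entries


# Made-up feed content, used only to exercise the parser.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title> A paper title </title>
      <link>https://example.org/abs/0000.00000</link>
      <description>Abstract text.</description>
    </item>
  </channel>
</rss>"""

print(parse_rss(SAMPLE_FEED))
```

In a real run the XML text would come from an HTTP fetch (for example `urllib.request.urlopen`) rather than a string literal; the scraper-based source needs separate handling that this sketch does not cover.
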
## Relevance Scoring

- **Keyword Layer**: Match against 20+ Hermes keywords
- **Embedding Layer**: `all-MiniLM-L6-v2` + Chroma DB
- **Composite**: `0.3 * keyword_score + 0.7 * embedding_score`

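The composite formula above can be sketched in plain Python. Only the `0.3` / `0.7` weights come from the spec. How each layer is normalized is an assumption here: the keyword score is taken as the fraction of keywords found in the text, and the embedding score as the cosine similarity between two vectors. In the real system those vectors would come from `all-MiniLM-L6-v2`; the sketch accepts plain lists of floats so that it runs without any model.

```python
import math
from typing import Iterable, Sequence


def keyword_score(text: str, keywords: Iterable[str]) -> float:
    """Fraction of keywords that occur in the text (assumed normalization)."""
    kws = [k.lower() for k in keywords]
    if not kws:
        return 0.0
    lowered = text.lower()
    return sum(1 for k in kws if k in lowered) / len(kws)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def composite_score(kw: float, emb: float) -> float:
    """Weighted blend from the spec: 0.3 * keyword_score + 0.7 * embedding_score."""
    return 0.3 * kw + 0.7 * emb
```

Cosine similarity ranges over [-1, 1], so a composite computed this way is not guaranteed to stay in [0, 1]; whether the real pipeline clamps or rescales it is not stated in the spec.
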
## TTS Pipeline

- **Engine**: Piper (`en_US-lessac-medium`)
- **Speed**: ~1.5x realtime on CPU
- **Format**: WAV → FFmpeg → Opus (24 kbps)
- **Sovereign**: Fully local, zero API cost

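The WAV → FFmpeg → Opus chain above could be driven from Python roughly as follows. This is a sketch under assumptions: it expects `piper` and `ffmpeg` executables on `PATH`, a Piper voice model file passed in by the caller, and an `.ogg` container for the Opus output. The function names and file names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def piper_command(model: str, wav_path: Path) -> List[str]:
    """Piper reads the text to speak from stdin and writes a WAV file."""
    return ["piper", "--model", model, "--output_file", str(wav_path)]


def ffmpeg_opus_command(wav_path: Path, opus_path: Path, bitrate: str = "24k") -> List[str]:
    """Re-encode a WAV file to Opus at the given audio bitrate."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", str(wav_path),
        "-c:a", "libopus",
        "-b:a", bitrate,
        str(opus_path),
    ]


def synthesize(text: str, model: str, out_dir: Path) -> Path:
    """Run both stages and return the path of the Opus file."""
    wav_path = out_dir / "briefing.wav"    # hypothetical file names
    opus_path = out_dir / "briefing.ogg"
    subprocess.run(piper_command(model, wav_path), input=text.encode("utf-8"), check=True)
    subprocess.run(ffmpeg_opus_command(wav_path, opus_path), check=True)
    return opus_path
```

Building the argument lists in separate functions makes the command lines easy to inspect or log without running either tool.
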
## Cron Integration

```yaml
job:
  name: deep-dive-daily
  schedule: "0 6 * * *"
  command: python3 orchestrator.py --cron
```

## On-Demand

```bash
python3 orchestrator.py             # Full run
python3 orchestrator.py --dry-run   # No delivery
python3 orchestrator.py --skip-tts  # Text only
```

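One way the flags above might be parsed is with `argparse`. This is a sketch, not the actual `orchestrator.py`: `build_parser` and `plan` are hypothetical names, and because the spec does not say what `--cron` changes, the sketch only records that the flag was set.

```python
import argparse
from typing import Dict, List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="orchestrator.py",
        description="Run the daily briefing pipeline.",
    )
    parser.add_argument("--cron", action="store_true",
                        help="scheduled run (exact behavior is an assumption)")
    parser.add_argument("--dry-run", action="store_true",
                        help="run every phase except delivery")
    parser.add_argument("--skip-tts", action="store_true",
                        help="produce the text briefing only, no audio")
    return parser


def plan(argv: Optional[List[str]] = None) -> Dict[str, bool]:
    """Translate CLI flags into which optional phases should run."""
    args = build_parser().parse_args(argv)
    return {
        "tts": not args.skip_tts,
        "deliver": not args.dry_run,
        "cron": args.cron,
    }
```

With no flags, `plan([])` enables both TTS and delivery, which matches the "Full run" line above.
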
## Acceptance Criteria

| Criterion | Status |
|-----------|--------|
| Zero manual copy-paste | ✅ Automated |
| Daily 6 AM delivery | ✅ Cron ready |
| arXiv + labs coverage | ✅ RSS + scraper |
| Hermes relevance filter | ✅ Embeddings |
| Written briefing | ✅ LLM synthesis |
| Audio via TTS | ✅ Piper pipeline |
| Telegram delivery | ✅ Voice API |
| On-demand command | ✅ CLI flags |

---
**Epic**: #830 | **Status**: Architecture Complete