scaffold/deepdive/
├── phase1/          # Source aggregation (ZERO blockers, can start now)
│   ├── arxiv_aggregator.py   ← Run this today
│   ├── blog_scraper.py       (stub)
│   └── config.yaml
├── phase2/          # Relevance engine (needs Phase 1)
│   ├── relevance_engine.py   (stub)
│   └── embeddings.py         (stub)
├── phase3/          # Synthesis (needs Phase 2)
│   ├── synthesis.py            (stub)
│   └── briefing_template.md
├── phase4/          # TTS pipeline (needs Phase 3)
│   ├── tts_pipeline.py         (stub)
│   └── piper_config.json
└── phase5/          # Delivery (needs Phase 4)
    ├── telegram_delivery.py    (stub)
    └── deepdive_command.py     (stub)
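
The "needs Phase N" comments in the tree describe a linear dependency chain. A minimal sketch of that chain as data follows, assuming each phase depends only on the one before it. The names PHASE_DEPENDS_ON and startable are hypothetical and are not part of the scaffold.

```python
from typing import Dict, List, Optional, Set

# Dependency chain read off the tree comments above (an assumption:
# each phase is blocked only by its immediate predecessor).
PHASE_DEPENDS_ON: Dict[str, Optional[str]] = {
    "phase1": None,       # zero blockers
    "phase2": "phase1",
    "phase3": "phase2",
    "phase4": "phase3",
    "phase5": "phase4",
}


def startable(done: Set[str]) -> List[str]:
    """Return the phases that are not finished and whose blocker is finished."""
    return [
        phase
        for phase, blocker in PHASE_DEPENDS_ON.items()
        if phase not in done and (blocker is None or blocker in done)
    ]
```

With nothing finished, startable(set()) returns only phase1, which matches the "can start now" note.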

Quick Start

Phase 1 (Today)

cd the-nexus/scaffold/deepdive/phase1
python3 arxiv_aggregator.py

Requirements: Python 3.8+, internet connection, no API keys.

Output: data/deepdive/raw/arxiv-YYYY-MM-DD.jsonl
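
To inspect a day's output, a short sketch like the one below may help. It assumes only what the path pattern implies: one file per date, with one JSON object per line. The helper names raw_arxiv_path and load_jsonl are hypothetical, and the record fields are not specified here, so the code does not rely on any of them.

```python
import json
from datetime import date
from pathlib import Path
from typing import List


def raw_arxiv_path(day: date, root: Path = Path("data/deepdive/raw")) -> Path:
    """Build the dated path, following the arxiv-YYYY-MM-DD.jsonl pattern."""
    return root / f"arxiv-{day.isoformat()}.jsonl"


def load_jsonl(path: Path) -> List[dict]:
    """Parse one JSON object per non-empty line of the file."""
    records = []
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                records.append(json.loads(line))
    return records
```

Typical use would be load_jsonl(raw_arxiv_path(date.today())) after a run, followed by len() on the result to confirm the fetch produced records.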

Sovereignty Preservation

Component    Local Option                      Cloud Fallback
Embeddings   nomic-embed-text via llama.cpp    OpenAI
LLM          Gemma 4 via Hermes                Kimi K2.5
TTS          Piper                             ElevenLabs

Rule: Implement the local option first. Add a cloud fallback only if the local output quality is unacceptable.
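
One way the rule could be expressed in code is a small wrapper that always tries the local backend and calls the cloud backend only when a quality check rejects the local result. This is a sketch under assumptions: local_first, the backend callables, and the is_acceptable check are all hypothetical, and how "unacceptable" is measured is left to each phase.

```python
from typing import Callable, Optional

Backend = Callable[[str], str]


def local_first(
    local: Backend,
    is_acceptable: Callable[[str], bool],
    cloud: Optional[Backend] = None,
) -> Backend:
    """Wrap a local backend so the cloud one runs only when quality fails."""

    def run(payload: str) -> str:
        result = local(payload)
        # Keep the local result if it passes, or if no fallback is configured.
        if is_acceptable(result) or cloud is None:
            return result
        return cloud(payload)

    return run
```

Leaving cloud as None keeps a component fully local, so the fallback stays an explicit opt-in for each row of the table.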

Next Steps

✅ Phase 1: Run arxiv_aggregator.py to validate fetch pipeline
⏳ Phase 2: Implement relevance_engine.py with embeddings
⏳ Phase 3: Draft synthesis.py with prompt templates
⏳ Phase 4: Test tts_pipeline.py with Piper
⏳ Phase 5: Integrate telegram_delivery.py with Hermes gateway

See docs/deep-dive-architecture.md for full technical specification.