Gemini Handoff — Deep Dive Sovereign NotebookLM (#830)

Issue: #830
Assignee: @gemini (reassigned from Fenrir, 2026-04-05)
Previous Work: Ezra (scaffold, implementation, tests, fleet context)
Created: Ezra | 2026-04-05
Purpose: Give Gemini a complete map of the Deep Dive codebase, current state, and the exact path to production.


1. Assignment Context

You (Gemini) are now the owner of the Deep Dive epic. The scaffold and core implementation are complete and tested. Your job is to take the pipeline from "tests pass in a clean venv" to "daily 6 AM production delivery to Alexander's Telegram."

This is not a greenfield project. It is a production-hardening and operational-integration task.


2. Codebase Map

| File | Lines | Purpose | State |
| --- | --- | --- | --- |
| pipeline.py | ~750 | 5-phase orchestrator (aggregate → filter → synthesize → TTS → deliver) | Production-ready |
| fleet_context.py | ~200 | Phase 0: Gitea fleet snapshot injection | Complete, tested |
| tts_engine.py | ~230 | Piper (local) + ElevenLabs (cloud) adapters | Complete, tested |
| telegram_command.py | ~130 | /deepdive on-demand handler for Hermes Telegram gateway | Complete |
| config.yaml | ~110 | Central configuration (sources, LLM, TTS, delivery) | Complete |
| Makefile | ~70 | Install, test, e2e, systemd targets | Complete |
| architecture.md | ~280 | Original architecture spec | Reference only |
| README.md | ~70 | Project overview | Complete |
| QUICKSTART.md | ~80 | Fast path to first run | Complete |

Tests (all passing)

| Test File | Coverage |
| --- | --- |
| tests/test_aggregator.py | ArXiv RSS fetch, deduplication |
| tests/test_relevance.py | Keyword + embedding scoring |
| tests/test_fleet_context.py | Gitea client, markdown formatting |
| tests/test_e2e.py | Full dry-run pipeline |

Last verified: 2026-04-05 — 9 passed, 8 warnings in 21.32s


3. Current Implementation State

What Works Today

  • ArXiv RSS aggregation (cs.AI, cs.CL, cs.LG)
  • Lab blog scraping (OpenAI, Anthropic, DeepMind)
  • Keyword + sentence-transformer relevance scoring
  • LLM synthesis with fleet context injection
  • TTS generation (Piper local, ElevenLabs fallback)
  • Telegram text/voice delivery
  • On-demand CLI execution (--dry-run, --since)
  • systemd timer scaffolding (make install-systemd)
  • Fleet context grounding (live Gitea issues, commits, PRs)
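The keyword half of the relevance scoring above can be sketched in a few lines. The function names and keyword set here are illustrative assumptions, not the pipeline's actual API, and the real scorer also blends in sentence-transformer cosine similarity:

```python
# Hypothetical sketch of keyword-based relevance scoring.
# FLEET_KEYWORDS and the threshold are placeholder values.
FLEET_KEYWORDS = {"agent", "tts", "telegram", "retrieval", "fine-tuning"}

def keyword_score(title: str, abstract: str) -> float:
    """Fraction of fleet keywords appearing in the title or abstract."""
    text = f"{title} {abstract}".lower()
    hits = sum(1 for kw in FLEET_KEYWORDS if kw in text)
    return hits / len(FLEET_KEYWORDS)

def is_relevant(title: str, abstract: str, threshold: float = 0.2) -> bool:
    """Keep an item if enough fleet keywords match."""
    return keyword_score(title, abstract) >= threshold
```

In the real pipeline this keyword signal would be combined with an embedding-similarity score before thresholding.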

What's Configured but Not Secrets-Injected

  • 🔶 config.yaml references TELEGRAM_BOT_TOKEN — must be in env
  • 🔶 config.yaml references LLM endpoint http://localhost:4000/v1 — must be live
  • 🔶 ElevenLabs adapter needs ELEVENLABS_API_KEY — optional (Piper is sovereign default)

4. Operational Secrets Inventory

| Secret | Env Var | Required? | Where to Get |
| --- | --- | --- | --- |
| Telegram Bot Token | TELEGRAM_BOT_TOKEN | Yes | @BotFather |
| Telegram Channel ID | CHANNEL_ID (or in config.yaml) | Yes | Forward a message to @userinfobot |
| Gitea Token | GITEA_TOKEN | Yes (fleet context) | Ezra's .env or generate new |
| ElevenLabs API Key | ELEVENLABS_API_KEY | No (fallback) | ElevenLabs dashboard |
| OpenRouter API Key | OPENROUTER_API_KEY | No (local LLM default) | If using cloud LLM fallback |

Create /root/wizards/the-nexus/intelligence/deepdive/.env:

```bash
TELEGRAM_BOT_TOKEN=your_token_here
CHANNEL_ID=-1001234567890
GITEA_TOKEN=your_token_here
ELEVENLABS_API_KEY=optional_fallback_here
```

Load it in the systemd service or a cron job by adding:

```bash
set -a; source /path/to/.env; set +a
```
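For the systemd path, an `EnvironmentFile=` directive avoids the `source` trick entirely. This is a sketch only: the unit name, paths, and interpreter are assumptions to be adapted to whatever `make install-systemd` actually generates:

```ini
# ~/.config/systemd/user/deepdive.service (illustrative; verify against the
# unit that make install-systemd installs)
[Unit]
Description=Deep Dive daily briefing

[Service]
Type=oneshot
WorkingDirectory=/root/wizards/the-nexus/intelligence/deepdive
EnvironmentFile=/root/wizards/the-nexus/intelligence/deepdive/.env
ExecStart=/usr/bin/python pipeline.py --config config.yaml --since 24
```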

5. Production Readiness Checklist

Step 1: Inject Secrets (15 min)

  • .env file created with real tokens
  • config.yaml points to correct LLM endpoint
  • Telegram bot added to target channel with send permissions

Step 2: Local Live Run (30 min)

  • make install in clean environment
  • python pipeline.py --config config.yaml --since 24 executes without error
  • Telegram receives a test briefing (text or voice)
  • Audio length is in the 10-15 minute range

Step 3: Voice Quality Gate (30 min)

  • Piper output evaluated: is it "premium" enough for daily listening?
  • If Piper is too robotic, switch primary TTS to ElevenLabs
  • Document the chosen voice ID in config.yaml

Alexander's directive: "Voice quality matters. This should sound premium, not like a throwaway TTS demo."
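The Piper-primary, ElevenLabs-fallback behavior behind this gate can be sketched as a simple try/fallback wrapper. The adapter callables and the `TTSError` exception are hypothetical stand-ins, not tts_engine.py's actual interface:

```python
# Illustrative primary/fallback TTS selection; adapter names and the
# exception type are assumptions, not the real tts_engine.py API.
class TTSError(Exception):
    """Raised by an adapter when synthesis fails."""

def synthesize(text: str, piper, elevenlabs) -> bytes:
    """Try the sovereign local engine first; fall back to the cloud engine."""
    try:
        return piper(text)
    except TTSError:
        # Piper failed (or was disabled for quality reasons); use fallback.
        return elevenlabs(text)
```

Flipping the primary to ElevenLabs is then a one-line argument swap rather than a code change.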

Step 4: Content Quality Gate (30 min)

  • Briefing references live fleet context (repos, issues, commits)
  • External news is tied back to Hermes/OpenClaw/Nexus/Timmy implications
  • Not generic AI news — it must be a context-rich daily deep dive for Alexander

Step 5: Automation Hardening (30 min)

  • make install-systemd executed and timer active
  • systemctl --user list-timers shows deepdive.timer with its next trigger at 06:00
  • Logs are written to persistent location (~/.local/share/deepdive/logs/)
  • Failure alerts route to #fleet-alerts or equivalent

Step 6: Hermes Integration (30 min)

  • /deepdive command registered in Hermes Telegram gateway
  • On-demand trigger works from Telegram chat
  • Command accepts --since override (e.g., /deepdive 48)
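The `--since` override for the on-demand command can be parsed with a few lines. This parser is a hypothetical sketch; the real handler lives in telegram_command.py and its argument format may differ:

```python
# Hypothetical /deepdive argument parsing, e.g. "/deepdive 48" -> 48 hours.
DEFAULT_SINCE_HOURS = 24

def parse_deepdive(message: str) -> int:
    """Return the lookback window in hours; fall back to the default."""
    parts = message.strip().split()
    if len(parts) > 1 and parts[1].isdigit():
        return int(parts[1])
    return DEFAULT_SINCE_HOURS
```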

6. Architecture Decisions Already Made (Do Not Re-Litigate)

  1. Piper primary, ElevenLabs fallback — preserves sovereignty, allows quality escape hatch.
  2. Local LLM endpoint default (localhost:4000) — keeps inference sovereign; cloud fallback is optional.
  3. SQLite/JSON caching, no Postgres — reduces operational surface area.
  4. Fleet context is mandatory: fleet_context.py runs before every synthesis.
  5. Telegram voice delivery — MP3 output, sent as voice message for mobile consumption.
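Decision 3's file-based caching can be illustrated with a minimal TTL'd JSON cache (this is also the shape of the "cache for 1 hour" mitigation for Gitea rate limits). The file layout and function names are assumptions, not the pipeline's actual code:

```python
# Minimal JSON file cache with a TTL; illustrative, not pipeline.py's code.
import json
import time
from pathlib import Path

def cache_get(path: Path, ttl_seconds: int):
    """Return the cached payload if the file exists and is fresh, else None."""
    if not path.exists():
        return None
    entry = json.loads(path.read_text())
    if time.time() - entry["saved_at"] > ttl_seconds:
        return None
    return entry["payload"]

def cache_put(path: Path, payload) -> None:
    """Write the payload with a timestamp for later freshness checks."""
    path.write_text(json.dumps({"saved_at": time.time(), "payload": payload}))
```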

7. Known Issues / Watches

| Issue | Risk | Mitigation |
| --- | --- | --- |
| ArXiv RSS throttling | Medium | `--since` window is configurable; add exponential backoff if needed |
| Piper voice quality | Medium | Primary reason for ElevenLabs fallback |
| LLM endpoint downtime | Low | Hermes local stack is 24/7; add health check if concerned |
| Gitea API rate limits | Low | Fleet context is lightweight; cache for 1 hour if needed |
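The exponential-backoff mitigation for the ArXiv fetch can be sketched as a small retry wrapper. `fetch_fn`, the retry count, and the base delay are illustrative assumptions, not pipeline.py's actual code:

```python
# Sketch of exponential backoff around a flaky fetch (e.g. ArXiv RSS).
import time

def fetch_with_backoff(fetch_fn, retries: int = 4, base_delay: float = 1.0):
    """Call fetch_fn, retrying on failure with delays of 1s, 2s, 4s, ..."""
    for attempt in range(retries):
        try:
            return fetch_fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * (2 ** attempt))
```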

8. Recommended Next Steps

  1. Read this handoff (you are here)
  2. Inject secrets and run one live delivery
  3. Evaluate voice quality — decide Piper vs ElevenLabs primary
  4. Tune synthesis prompt in pipeline.py to match Alexander's taste
  5. Enable systemd timer and verify first automated run
  6. Register /deepdive in Hermes Telegram gateway
  7. Post SITREP on #830 documenting production state

9. Quick Commands

```bash
# Navigate to the project
cd /root/wizards/the-nexus/intelligence/deepdive

# Install & test
make install
make test
make test-e2e

# Live run (requires secrets)
python pipeline.py --config config.yaml --since 24

# Systemd automation
make install-systemd
systemctl --user status deepdive.timer

# Test Telegram command locally
python telegram_command.py --since 24
```

Ezra Sign-off: The hard engineering is done. What remains is operational integration and quality tuning. Gemini is the right owner for this final mile.

— Ezra, Archivist
2026-04-05