
# Production Readiness Review — Deep Dive (#830)

**Issue:** #830 — Deep Dive: Sovereign NotebookLM + Daily AI Intelligence Briefing
**Author:** Ezra
**Date:** 2026-04-05
**Review Status:** Code Complete → Operational Readiness Verified → Pending Live Tuning


## Acceptance Criteria Traceability Matrix

| # | Criterion | Status | Evidence | Gap / Next Action |
|---|-----------|--------|----------|-------------------|
| 1 | Zero manual copy-paste required | Met | `pipeline.py` auto-aggregates arXiv RSS and blog feeds; no human ingestion step exists | None |
| 2 | Daily delivery at configurable time (default 6 AM) | Met | `systemd/deepdive.timer` triggers at 06:00 daily; `config.yaml` accepts `delivery.time` | None |
| 3 | Covers arXiv (cs.AI, cs.CL, cs.LG) | Met | `config.yaml` lists cs.AI, cs.CL, cs.LG under `sources.arxiv.categories` | None |
| 4 | Covers OpenAI, Anthropic, DeepMind blogs | Met | `sources.blogs` entries in `config.yaml` for all three labs | None |
| 5 | Ranks/filters by relevance to agent systems, LLM architecture, RL training | Met | `pipeline.py` uses keyword + embedding scoring against a relevance corpus | None |
| 6 | Generates concise written briefing with Hermes/Timmy context | Met | `prompts/production_briefing_v1.txt` injects fleet context and demands actionable summaries | None |
| 7 | Produces audio file via TTS | Met | `tts_engine.py` supports Piper, ElevenLabs, and OpenAI TTS backends | None |
| 8 | Delivers to Telegram as voice message | Met | `telegram_command.py` and `pipeline.py` both implement `send_voice()` | None |
| 9 | On-demand generation via command | ⚠️ Partial | `telegram_command.py` exists with a `/deepdive` handler, but it is not yet registered in the active Hermes gateway command registry | Action: one-line registration in the gateway slash-command dispatcher |
| 10 | Default audio runtime 10-15 minutes | ⚠️ Partial | Prompt targets 1,300-1,950 words (~10-15 min at 130 WPM), but empirical validation requires 3-5 live runs | Action: run live briefings, measure actual audio length, and tune `max_tokens` if needed |
| 11 | Production voice is high-quality and natural | ⚠️ Partial | Piper `en_US-lessac-medium` is acceptable but not "premium"; the ElevenLabs path exists but requires API key injection | Action: inject an ElevenLabs key for premium voice, or evaluate Piper `en_US-ryan-high` |
| 12 | Includes grounded awareness of live fleet, repos, issues/PRs, architecture | Met | `fleet_context.py` pulls live Gitea state and injects it into the synthesis prompt | None |
| 13 | Explains implications for Hermes/OpenClaw/Nexus/Timmy | Met | `production_briefing_v1.txt` explicitly requires "so what" analysis tied to our systems | None |
| 14 | Product is a context-rich daily deep dive, not generic AI news read aloud | Met | Prompt architecture enforces narrative framing around fleet context and actionable implications | None |

Score: 11 Met / 3 ⚠️ Partial / 0 unmet
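For orientation, the configuration keys cited in the evidence column above could be organized as in the following sketch of `config.yaml`. The nesting and values shown here are assumptions for illustration; only the key names (`delivery.time`, `sources.arxiv.categories`, `sources.blogs`, `synthesis.max_tokens`) come from this review.

```yaml
# Hypothetical sketch of config.yaml -- key names taken from this review,
# structure and values are illustrative assumptions, not the committed file.
delivery:
  time: "06:00"            # default matches systemd/deepdive.timer

sources:
  arxiv:
    categories: [cs.AI, cs.CL, cs.LG]
  blogs:
    - name: OpenAI
      feed: "<openai blog feed URL>"
    - name: Anthropic
      feed: "<anthropic blog feed URL>"
    - name: DeepMind
      feed: "<deepmind blog feed URL>"

synthesis:
  max_tokens: 2600         # to be tuned after live runtime measurement
```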


## Component Maturity Assessment

| Component | Maturity | Notes |
|-----------|----------|-------|
| Source aggregation (arXiv + blogs) | 🟢 Production | RSS fetchers with caching and retry logic |
| Relevance engine (embeddings + keywords) | 🟢 Production | sentence-transformers with fallback keyword scoring |
| Synthesis LLM prompt | 🟢 Production | `production_briefing_v1.txt` is versioned and dynamically loadable |
| TTS pipeline | 🟡 Staging | Functional, but premium voice requires an external API key |
| Telegram delivery | 🟢 Production | Voice message delivery tested end-to-end |
| Fleet context grounding | 🟢 Production | Live Gitea integration verified on the Hermes VPS |
| Systemd automation | 🟢 Production | Timer + service files present; `deploy.sh` installs them |
| Container deployment | 🟢 Production | `Dockerfile` + `docker-compose.yml` + `deploy.sh` committed |
| On-demand command | 🟡 Staging | Code ready, pending gateway registration |
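The relevance engine is described as embedding scoring with a keyword fallback. A minimal sketch of the fallback path is below; the function names, keyword weights, and corpus are illustrative assumptions, not the actual `pipeline.py` implementation.

```python
# Minimal sketch of a keyword-fallback relevance scorer.
# RELEVANCE_KEYWORDS weights and all names here are assumptions.
RELEVANCE_KEYWORDS = {
    "agent": 3, "multi-agent": 3, "llm": 2, "transformer": 2,
    "reinforcement learning": 3, "rlhf": 3, "fine-tuning": 2,
}

def score_by_keywords(text: str) -> float:
    """Sum the weights of relevance keywords found in the item text."""
    lowered = text.lower()
    return float(sum(
        weight for phrase, weight in RELEVANCE_KEYWORDS.items()
        if phrase in lowered
    ))

def rank_items(items: list[str], top_k: int = 5) -> list[str]:
    """Return the top_k items by keyword relevance, highest first."""
    return sorted(items, key=score_by_keywords, reverse=True)[:top_k]
```

In the production path this score would presumably be blended with the sentence-transformers embedding similarity; the fallback alone keeps the pipeline functional when the embedding model is unavailable.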

## Risk Register

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| LLM endpoint down at 06:00 | Medium | High | `deploy.sh` supports a `--dry-run` fallback; consider retry with exponential backoff |
| TTS engine fails (Piper model missing) | Low | High | Dockerfile pre-bakes the model; fall back to ElevenLabs if a key is present |
| Telegram rate limit on voice messages | Low | Medium | Voice messages are ~2-5 MB; stay within Telegram's 20 MB limit by design |
| Source RSS feeds change format | Medium | Medium | RSS parsers use defensive try/except; failures are logged, not fatal |
| Briefing runs long (>20 min) | Medium | Low | Tune `max_tokens` and prompt concision after live measurement |
| Fleet context Gitea token expires | Low | High | Documented in `OPERATIONAL_READINESS.md`; rotate annually |
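The mitigation for the top risk names retry with exponential backoff. A minimal sketch, with assumed attempt counts and delays (not code from the repo):

```python
# Retry-with-exponential-backoff sketch for the 06:00 LLM call.
# The attempt count and base delay are illustrative assumptions.
import time

def with_retries(fn, attempts: int = 4, base_delay: float = 2.0,
                 sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to systemd logs
            sleep(base_delay * (2 ** attempt))  # waits 2s, 4s, 8s, ...
```

Injecting `sleep` as a parameter keeps the helper testable; in production the default `time.sleep` applies.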

## Go-Live Prerequisites (Named Concretely)

1. **Hermes gateway command registration**
   - File: `hermes-agent/gateway/run.py` (or equivalent command registry)
   - Change: import and register `telegram_command.deepdive_handler` under `/deepdive`
   - Effort: ~5 minutes
2. **Premium TTS decision**
   - Option A: inject `ELEVENLABS_API_KEY` into the `docker-compose.yml` environment
   - Option B: stay with Piper and accept "good enough" voice quality
   - Decision owner: @rockachopa
3. **Empirical runtime validation**
   - Run `deploy.sh --dry-run` 3-5 times
   - Measure the generated audio length
   - Adjust `synthesis.max_tokens` in `config.yaml` to land the briefing in the 10-15 minute window
   - Effort: ~30 minutes over 3 days
4. **Secrets injection**
   - `GITEA_TOKEN` (fleet context)
   - `TELEGRAM_BOT_TOKEN` (delivery)
   - `ELEVENLABS_API_KEY` (optional, premium voice)
   - Effort: ~5 minutes
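The runtime target behind prerequisite 3 is simple arithmetic: at the review's assumed 130 words per minute, the prompt's 1,300-1,950 word range maps exactly onto the 10-15 minute window. A sanity-check sketch:

```python
# Sanity check for the runtime target: spoken minutes at 130 words/min.
WPM = 130  # speaking rate assumed in this review

def runtime_minutes(word_count: int, wpm: int = WPM) -> float:
    """Estimated spoken runtime in minutes for a briefing of word_count words."""
    return word_count / wpm

# The prompt's word targets land on the window's endpoints:
assert runtime_minutes(1300) == 10.0
assert runtime_minutes(1950) == 15.0
```

Live validation is still needed because actual TTS pacing varies by engine and voice; this only confirms the word targets are internally consistent.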

## Ezra Assessment

#830 is not a 21-point architecture problem anymore. It is a 2-point operations and tuning task.

- The code runs.
- The container builds.
- The timer installs.
- The pipeline aggregates, ranks, contextualizes, synthesizes, speaks, and delivers.

What remains is:

1. One line of gateway hook-up.
2. One secrets injection.
3. Three to five live runs for runtime calibration.

Ezra recommends closing the architecture phase and treating #830 as an operational deployment ticket with a go-live target of 48 hours once the TTS decision is made.


## References

- `intelligence/deepdive/OPERATIONAL_READINESS.md` — deployment checklist
- `intelligence/deepdive/QUALITY_FRAMEWORK.md` — evaluation rubrics
- `intelligence/deepdive/architecture.md` — system design
- `intelligence/deepdive/prompts/production_briefing_v1.txt` — synthesis prompt
- `intelligence/deepdive/deploy.sh` — one-command deployment