perplexity 55d53c513c
feat: Phase 3.5 — DPO training pair generation from Deep Dive pipeline
Wire arXiv relevance filter output directly into DPO pair generation,
closing the loop between research synthesis and overnight training data.

New module: dpo_generator.py
- DPOPairGenerator class with 3 pair strategies:
  * summarize: paper → fleet-grounded analysis (chosen) vs generic (rejected)
  * relevance: 'what matters to Hermes?' → scored context vs vague
  * implication: 'what should we do?' → actionable insight vs platitude
- Extracts synthesis excerpts matched to each ranked item
- Outputs to ~/.timmy/training-data/dpo-pairs/deepdive_{timestamp}.jsonl
- Format: {prompt, chosen, rejected, task_type, evidence_ids,
  source_session, safety_flags, metadata}
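
The record format above can be sketched as follows. This is a minimal illustration, not the code in dpo_generator.py: the prompt templates, the `build_pairs` and `write_jsonl` helpers, and the placeholder chosen/rejected texts are all assumptions made for the example. Only the field names and the three pair types come from the description above.

```python
import json
from pathlib import Path

# The three pair strategies named in the commit message.
PAIR_TYPES = ("summarize", "relevance", "implication")

# Hypothetical prompt templates; the real wording is not shown in the source.
PROMPTS = {
    "summarize": "Summarize this paper for the fleet: {title}",
    "relevance": "What matters to Hermes in: {title}?",
    "implication": "What should we do about: {title}?",
}


def build_pairs(item, excerpt, session_id):
    """Build one record per pair type for a single ranked item."""
    records = []
    for task_type in PAIR_TYPES:
        records.append({
            "prompt": PROMPTS[task_type].format(title=item["title"]),
            # chosen: grounded in the synthesis excerpt (placeholder text).
            "chosen": f"{excerpt} (relevance score {item['score']:.2f})",
            # rejected: deliberately generic filler (placeholder text).
            "rejected": "This paper is interesting.",
            "task_type": task_type,
            "evidence_ids": [item["id"]],
            "source_session": session_id,
            "safety_flags": [],
            "metadata": {"score": item["score"]},
        })
    return records


def write_jsonl(records, path):
    """Write one JSON object per line, creating parent directories."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path
```

With two ranked items this yields six records, one per item and pair type, which matches the shape of the integration test described in this commit.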

Pipeline changes (pipeline.py):
- Import DPOPairGenerator with graceful degradation
- Initialize from config deepdive.training.dpo section
- Execute as Phase 3.5 between synthesis and audio
- DPO results included in pipeline return dict
- Wrapped in try/except — DPO failure never blocks delivery
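
The degradation behaviour described above might look roughly like this. It is a sketch under assumptions: `run_dpo_phase`, the `run` method on the generator, and the shape of the returned status dict are hypothetical names chosen for illustration. Only the module name, the class name, and the "never blocks delivery" rule come from the commit message.

```python
import logging

logger = logging.getLogger("deepdive")

try:
    # Module and class names come from the commit message; the import is
    # optional so the pipeline still starts when the module is absent.
    from dpo_generator import DPOPairGenerator
except ImportError:
    DPOPairGenerator = None


def run_dpo_phase(generator, ranked_items, briefing):
    """Run Phase 3.5 and return a status dict; this function never raises."""
    if generator is None:
        return {"status": "skipped", "pairs": 0}
    try:
        # `run` is a hypothetical method name used only in this sketch.
        result = generator.run(ranked_items, briefing)
        return {"status": "ok", **result}
    except Exception as exc:  # deliberately broad: DPO must not block delivery
        logger.warning("DPO generation failed: %s", exc)
        return {"status": "failed", "error": str(exc)}
```

Returning a status dict instead of raising lets the orchestrator merge the DPO outcome into its return value and continue to the audio phase in every case.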

Config changes (config.yaml):
- New deepdive.training.dpo section with:
  enabled, output_dir, min_score, max_pairs_per_run, pair_types
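
One way to read this section from an already-parsed config dictionary is sketched below. The key names are taken from the list above. The default values, the `DPOConfig` dataclass, and the `load_dpo_config` helper are assumptions for illustration, and the actual defaults in config.yaml may differ.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class DPOConfig:
    # Field names follow the commit message; every default is an assumption.
    enabled: bool = False
    output_dir: str = "~/.timmy/training-data/dpo-pairs"
    min_score: float = 0.5
    max_pairs_per_run: int = 30
    pair_types: list = field(
        default_factory=lambda: ["summarize", "relevance", "implication"]
    )

    @property
    def output_path(self) -> Path:
        """Output directory with the leading ~ expanded."""
        return Path(self.output_dir).expanduser()


def load_dpo_config(config: dict) -> DPOConfig:
    """Read deepdive.training.dpo from a parsed config dict, ignoring unknown keys."""
    section = config.get("deepdive", {}).get("training", {}).get("dpo", {}) or {}
    known = {k: v for k, v in section.items() if k in DPOConfig.__dataclass_fields__}
    return DPOConfig(**known)
```

Missing keys fall back to the dataclass defaults, so a config file without the new section simply leaves DPO generation disabled in this sketch.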

Integration tested: 2 mock items × 3 pair types = 6 valid JSONL pairs.
Chosen responses consistently richer than rejected (assert-verified).
2026-04-13 02:24:04 +00:00
2026-04-12 19:26:43 -04:00

Deep Dive: Automated Intelligence Briefing System

Sovereign, automated daily intelligence pipeline for the Timmy Foundation fleet.

Vision

A daily AI-generated podcast briefing, produced with no manual input, covering:

  • arXiv (cs.AI, cs.CL, cs.LG)
  • OpenAI, Anthropic, DeepMind research blogs
  • AI newsletters (Import AI, TLDR AI)

Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Phase 1        │───▶│  Phase 2        │───▶│  Phase 3        │
│  Aggregation    │    │  Relevance      │    │  Synthesis      │
│  (RSS/Feeds)    │    │  (Embeddings)   │    │  (LLM Briefing) │
└─────────────────┘    └─────────────────┘    └────────┬────────┘
                                                       │
                              ┌────────────────────────┘
                              ▼
                    ┌─────────────────┐    ┌─────────────────┐
                    │  Phase 4        │───▶│  Phase 5        │
                    │  Audio (TTS)    │    │  Delivery       │
                    │  (Piper)        │    │  (Telegram)     │
                    └─────────────────┘    └─────────────────┘
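
The five-phase flow in the diagram can be sketched as a chain of coroutines. This is an illustrative skeleton only: every function name here is hypothetical and each phase body is a stand-in, not the logic in pipeline.py.

```python
import asyncio


async def aggregate():
    # Phase 1 stand-in: a real run would poll RSS feeds.
    return [{"title": "Paper B"}, {"title": "Paper A"}]


async def rank(items):
    # Phase 2 stand-in: a real run would score items with embeddings.
    return sorted(items, key=lambda item: item["title"])


async def synthesize(ranked):
    # Phase 3 stand-in: a real run would call the local LLM.
    return "Briefing: " + ", ".join(item["title"] for item in ranked)


async def render_audio(briefing):
    # Phase 4 stand-in: a real run would invoke a TTS engine.
    return f"audio({len(briefing)} chars)"


async def deliver(briefing, audio):
    # Phase 5 stand-in: a real run would send text and voice to Telegram.
    return {"text": briefing, "voice": audio}


async def run_pipeline():
    """Run the five phases in order, passing each output to the next."""
    items = await aggregate()
    ranked = await rank(items)
    briefing = await synthesize(ranked)
    audio = await render_audio(briefing)
    return await deliver(briefing, audio)


if __name__ == "__main__":
    print(asyncio.run(run_pipeline()))
```

Keeping each phase as its own coroutine means a phase can be swapped or stubbed for a dry run without touching the others.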

Status: IMPLEMENTATION COMPLETE

This is no longer a reference scaffold — it is a production-ready executable pipeline.

| Component            | Status   | File                                          |
|----------------------|----------|-----------------------------------------------|
| Phase 1: Aggregation | Complete | pipeline.py — RSS fetcher with caching        |
| Phase 2: Relevance   | Complete | pipeline.py — sentence-transformers ranking   |
| Phase 3: Synthesis   | Complete | pipeline.py — LLM briefing generation         |
| Phase 4: Audio       | Complete | tts_engine.py — Piper + ElevenLabs hybrid     |
| Phase 5: Delivery    | Complete | pipeline.py — Telegram text + voice           |
| Orchestrator         | Complete | pipeline.py — asyncio CLI + Python API        |
| Tests                | Complete | tests/test_e2e.py — dry-run validation        |
| Systemd Timer        | Complete | systemd/deepdive.timer — 06:00 daily          |
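
As a rough illustration of the Phase 2 ranking step, the sketch below orders items by cosine similarity to a query vector. It uses hand-written toy vectors in place of real sentence-transformers embeddings, and the `cosine` and `rank_items` helpers, the `embedding` key, and the `min_score` cutoff are assumptions for the example, not the code in pipeline.py.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_items(query_vec, items, min_score=0.0):
    """Return (score, item) pairs at or above min_score, highest score first."""
    scored = [(cosine(query_vec, item["embedding"]), item) for item in items]
    kept = [(score, item) for score, item in scored if score >= min_score]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for model embeddings.
    items = [
        {"title": "Unrelated", "embedding": [0.0, 1.0]},
        {"title": "On topic", "embedding": [1.0, 0.0]},
        {"title": "Partly related", "embedding": [1.0, 1.0]},
    ]
    for score, item in rank_items([1.0, 0.0], items, min_score=0.5):
        print(f"{score:.2f}  {item['title']}")
```

In a real run the vectors would come from a locally loaded embedding model, which keeps this step free of cloud APIs.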

Quick Start

See QUICKSTART.md for exact commands to run the pipeline.

Sovereignty Compliance

| Component   | Implementation                  | Non-Negotiable          |
|-------------|---------------------------------|-------------------------|
| Aggregation | Local RSS polling               | No third-party APIs     |
| Relevance   | sentence-transformers local     | No cloud embeddings     |
| Synthesis   | Gemma 4 via Hermes llama-server | No OpenAI/Anthropic API |
| TTS         | Piper TTS local                 | No ElevenLabs           |
| Delivery    | Hermes Telegram gateway         | Existing infra          |

Files

  • pipeline.py — Main orchestrator (production implementation)
  • tts_engine.py — Phase 4 TTS engine (Piper + ElevenLabs fallback)
  • config.yaml — Configuration template
  • Makefile — Build automation (make test-e2e, make install-systemd)
  • tests/ — pytest suite including end-to-end dry-run test
  • systemd/ — Daily timer for 06:00 execution
  • QUICKSTART.md — Step-by-step execution guide
  • architecture.md — Full technical specification
  • telegram_command.py — Hermes /deepdive command handler

Issue

#830 — Deep Dive: Sovereign NotebookLM + Daily AI Intelligence Briefing