feat: Phase 3.5 — DPO training pair generation from Deep Dive pipeline

Wire arXiv relevance filter output directly into DPO pair generation,
closing the loop between research synthesis and overnight training data.

New module: dpo_generator.py
- DPOPairGenerator class with 3 pair strategies:
  * summarize: paper → fleet-grounded analysis (chosen) vs generic (rejected)
  * relevance: 'what matters to Hermes?' → scored context vs vague
  * implication: 'what should we do?' → actionable insight vs platitude
- Extracts synthesis excerpts matched to each ranked item
- Outputs to ~/.timmy/training-data/dpo-pairs/deepdive_{timestamp}.jsonl
- Format: {prompt, chosen, rejected, task_type, evidence_ids,
  source_session, safety_flags, metadata}
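
The record format above can be sketched as follows. This is a minimal illustration, not the code in dpo_generator.py: the field names come from this commit message, while the helper names (make_pair, write_pairs), the timestamp format, and the types chosen for safety_flags and metadata are assumptions.

```python
import json
import time
from pathlib import Path


def make_pair(prompt, chosen, rejected, task_type, evidence_ids, source_session):
    """Build one DPO record with the fields listed in the commit message."""
    return {
        "prompt": prompt,
        "chosen": chosen,
        "rejected": rejected,
        "task_type": task_type,            # e.g. "summarize", "relevance", "implication"
        "evidence_ids": list(evidence_ids),
        "source_session": source_session,
        "safety_flags": [],                # assumed to be a list
        "metadata": {},                    # assumed to be a free-form dict
    }


def write_pairs(pairs, output_dir):
    """Write one JSON object per line to deepdive_{timestamp}.jsonl."""
    out_dir = Path(output_dir).expanduser()
    out_dir.mkdir(parents=True, exist_ok=True)
    # Assumed timestamp format; the commit only specifies "{timestamp}".
    timestamp = time.strftime("%Y%m%d_%H%M%S", time.gmtime())
    path = out_dir / f"deepdive_{timestamp}.jsonl"
    with path.open("w", encoding="utf-8") as fh:
        for pair in pairs:
            fh.write(json.dumps(pair, ensure_ascii=False) + "\n")
    return path
```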

Pipeline changes (pipeline.py):
- Import DPOPairGenerator with graceful degradation
- Initialize from config deepdive.training.dpo section
- Execute as Phase 3.5 between synthesis and audio
- DPO results included in pipeline return dict
- Wrapped in try/except — DPO failure never blocks delivery
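
The graceful-degradation pattern described above might look roughly like this. It is a sketch under assumptions: run_dpo_phase, the generator's run() method, and the shape of the returned dict are hypothetical and are not taken from pipeline.py.

```python
import logging

logger = logging.getLogger("deepdive")

try:
    # Module and class names are from this commit.
    from dpo_generator import DPOPairGenerator
except ImportError:
    # Missing module: the pipeline keeps working without DPO support.
    DPOPairGenerator = None


def run_dpo_phase(generator, ranked_items, synthesis):
    """Hypothetical Phase 3.5 wrapper that never raises to the caller."""
    if generator is None:
        return {"status": "skipped"}
    try:
        # run() is an assumed method name, used here for illustration only.
        return {"status": "ok", "result": generator.run(ranked_items, synthesis)}
    except Exception as exc:
        # Log and continue so that delivery is not blocked.
        logger.warning("DPO generation failed: %s", exc)
        return {"status": "error", "error": str(exc)}
```

Returning a status dict instead of raising lets the pipeline include the DPO outcome in its return value whether or not generation succeeded.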

Config changes (config.yaml):
- New deepdive.training.dpo section with:
  enabled, output_dir, min_score, max_pairs_per_run, pair_types
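
One way the new config keys could be consumed is sketched below. The key names and default values mirror the config.yaml hunk in this commit; DPOConfig, load_dpo_config, plan_pairs, the item dict shape, and treating min_score as an inclusive threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DPOConfig:
    # Defaults mirror the values shown in the config.yaml diff.
    enabled: bool = True
    output_dir: str = "~/.timmy/training-data/dpo-pairs"
    min_score: float = 0.5
    max_pairs_per_run: int = 30
    pair_types: list = field(
        default_factory=lambda: ["summarize", "relevance", "implication"]
    )


def load_dpo_config(config):
    """Read deepdive.training.dpo from a parsed config dict, ignoring unknown keys."""
    deepdive = config.get("deepdive") or {}
    training = deepdive.get("training") or {}
    section = training.get("dpo") or {}
    known = {f.name for f in fields(DPOConfig)}
    return DPOConfig(**{k: v for k, v in section.items() if k in known})


def plan_pairs(items, cfg):
    """Return (item_id, pair_type) tuples honoring min_score and the per-run cap."""
    if not cfg.enabled:
        return []
    plan = []
    # Highest-scoring items first, so the cap drops the least relevant pairs.
    for item in sorted(items, key=lambda i: i["score"], reverse=True):
        if item["score"] < cfg.min_score:  # inclusive threshold is an assumption
            continue
        for pair_type in cfg.pair_types:
            if len(plan) >= cfg.max_pairs_per_run:
                return plan
            plan.append((item["id"], pair_type))
    return plan
```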

Integration tested: 2 mock items × 3 pair types = 6 valid JSONL pairs.
Chosen responses consistently richer than rejected (assert-verified).
perplexity
2026-04-13 02:24:04 +00:00
committed by Alexander Whitestone
parent 61c24c390b
commit c3b455bd9c
3 changed files with 426 additions and 2 deletions

@@ -88,6 +88,18 @@ deepdive:
    speed: 1.0
    output_format: "mp3" # piper outputs WAV, convert for Telegram
  # Phase 3.5: DPO Training Pair Generation
  training:
    dpo:
      enabled: true
      output_dir: "~/.timmy/training-data/dpo-pairs"
      min_score: 0.5 # Only generate pairs from items above this relevance score
      max_pairs_per_run: 30 # Cap pairs per pipeline execution
      pair_types: # Which pair strategies to use
        - "summarize" # Paper summary → fleet-grounded analysis
        - "relevance" # Relevance analysis → scored fleet context
        - "implication" # Implications → actionable insight
  # Phase 0: Fleet Context Grounding
  fleet_context:
    enabled: true