feat: Phase 3.5 — DPO training pair generation from Deep Dive pipeline #1347
Summary
Wires the arXiv relevance filter in the Deep Dive pipeline to output DPO training pairs directly, closing the loop between research synthesis and overnight training data.
Changes
New:
- `intelligence/deepdive/dpo_generator.py`: new `DPOPairGenerator` class with 3 pair strategies:
  - summarize: paper → fleet-grounded analysis (chosen) vs generic (rejected)
  - relevance: "what matters to Hermes?" → scored context vs vague
  - implication: "what should we do?" → actionable insight vs platitude
- Extracts synthesis excerpts matched to each ranked item
- Outputs to `~/.timmy/training-data/dpo-pairs/deepdive_{timestamp}.jsonl`
- Pair format: `{prompt, chosen, rejected, task_type, evidence_ids, source_session, safety_flags, metadata}`

Modified:

- `intelligence/deepdive/pipeline.py`: imports `DPOPairGenerator` with graceful degradation (`HAS_DPO_GENERATOR` flag), initializes it from the `deepdive.training.dpo` config section, runs it as Phase 3.5 between synthesis and audio, and includes its results in the pipeline return dict under `result["dpo"]`; the phase is wrapped in try/except so a DPO failure never blocks delivery
- `intelligence/deepdive/config.yaml`: new `deepdive.training.dpo` section with `enabled`, `output_dir`, `min_score`, `max_pairs_per_run`, `pair_types`

Testing
Integration tested with mock data: 2 ranked items × 3 pair types = 6 valid JSONL pairs. Chosen responses were consistently richer than rejected ones (assert-verified).
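The mock-data check can be sketched roughly as follows. Note that `make_pairs`, the mock items, and the pair contents are hypothetical stand-ins; the real `DPOPairGenerator` is not reproduced here, only the 2 × 3 = 6 JSONL shape and the chosen-richer-than-rejected assertion:

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

PAIR_TYPES = ["summarize", "relevance", "implication"]

def make_pairs(item: dict) -> list[dict]:
    """Hypothetical stand-in for DPOPairGenerator: one pair per strategy."""
    pairs = []
    for task_type in PAIR_TYPES:
        pairs.append({
            "prompt": f"[{task_type}] {item['title']}",
            "chosen": f"Fleet-grounded analysis of {item['title']}: {item['excerpt']}",
            "rejected": "This paper discusses some interesting ideas.",
            "task_type": task_type,
            "evidence_ids": item["evidence_ids"],
            "source_session": "mock-session",
            "safety_flags": [],
            "metadata": {"score": item["score"]},
        })
    return pairs

mock_items = [
    {"title": "Paper A", "excerpt": "relevant to routing", "evidence_ids": ["a1"], "score": 0.9},
    {"title": "Paper B", "excerpt": "relevant to scheduling", "evidence_ids": ["b1"], "score": 0.8},
]

with TemporaryDirectory() as tmp:
    out = Path(tmp) / "deepdive_mock.jsonl"
    with out.open("w") as f:
        for item in mock_items:
            for pair in make_pairs(item):
                f.write(json.dumps(pair) + "\n")
    lines = out.read_text().splitlines()
    assert len(lines) == 6  # 2 ranked items x 3 pair types
    assert all(len(json.loads(l)["chosen"]) > len(json.loads(l)["rejected"]) for l in lines)
```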
Pipeline Flow
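Roughly, the new phase slots in like this. This is a sketch, not the actual `pipeline.py`; the stub `run_pipeline` and the config values are illustrative assumptions, while the config keys mirror the new `deepdive.training.dpo` section:

```python
# Config keys mirror deepdive.training.dpo in config.yaml (values illustrative).
DPO_CONFIG = {
    "enabled": True,
    "output_dir": "~/.timmy/training-data/dpo-pairs/",
    "min_score": 0.5,
    "max_pairs_per_run": 50,
    "pair_types": ["summarize", "relevance", "implication"],
}

try:
    from intelligence.deepdive.dpo_generator import DPOPairGenerator
    HAS_DPO_GENERATOR = True
except ImportError:  # graceful degradation: pipeline still runs without DPO
    HAS_DPO_GENERATOR = False

def run_pipeline(ranked_items, synthesis):
    result = {"synthesis": synthesis, "dpo": None}
    # Phase 3.5: DPO pair generation, between synthesis and audio.
    if HAS_DPO_GENERATOR and DPO_CONFIG["enabled"]:
        try:
            gen = DPOPairGenerator(DPO_CONFIG)
            result["dpo"] = gen.generate(ranked_items, synthesis)
        except Exception as exc:
            # A DPO failure is recorded but never blocks delivery.
            result["dpo"] = {"error": str(exc)}
    result["audio"] = "rendered"  # Phase 4 (audio) always runs
    return result
```

The point of the try/except and the `HAS_DPO_GENERATOR` flag is that the deep dive is delivered (audio included) whether or not pair generation succeeds.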
Overnight Training Integration
The DPO pairs land in `~/.timmy/training-data/dpo-pairs/`, where the overnight R&D task (timmy-config #503) picks them up for the tightening loop.
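The pickup side could look something like this. A sketch only: `load_dpo_pairs` is a hypothetical helper, not code from timmy-config #503; it assumes the generator's filename pattern and the `safety_flags` field described above:

```python
import json
from pathlib import Path

def load_dpo_pairs(pairs_dir: str) -> list[dict]:
    """Collect Deep Dive DPO pairs for an overnight training run."""
    pairs = []
    # Matches the deepdive_{timestamp}.jsonl naming used by the generator.
    for path in sorted(Path(pairs_dir).expanduser().glob("deepdive_*.jsonl")):
        with path.open() as f:
            for line in f:
                record = json.loads(line)
                # Skip anything the generator flagged at creation time.
                if record.get("safety_flags"):
                    continue
                pairs.append(record)
    return pairs
```

Because `glob` on a missing directory yields nothing, the overnight task degrades to an empty pair list rather than failing when no deep dive ran that day.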