Commit Graph

36 Commits

Author SHA1 Message Date
perplexity
4b15cf8283 feat: full-history persistent dedup index for DPO training pairs
Some checks failed
CI / test (pull_request) Failing after 16s
CI / validate (pull_request) Failing after 14s
Review Approval Gate / verify-review (pull_request) Failing after 3s
Replace the 5-file sliding window cross-run dedup with a persistent
hash index that covers ALL historical training data. Overfitting risk
compounds across the full dataset — a 5-file window lets old duplicates
leak back into training after enough overnight runs.

New module: dedup_index.py (DedupIndex)
- Persistent JSON index (.dpo_dedup_index.json) alongside JSONL files
- Append-on-export: new prompt hashes registered after each successful
  export — no full rescan needed for normal operations
- Incremental sync: on load, detects JSONL files not yet indexed and
  ingests them automatically (handles files from other tools)
- Full rebuild: rebuild() scans ALL deepdive_*.jsonl + pairs_*.jsonl
  to reconstruct from scratch (first run, corruption recovery)
- Atomic writes (write-to-tmp + rename) to prevent index corruption
- Standalone CLI: python3 dedup_index.py <dir> --rebuild --stats
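A minimal sketch of how a persistent prompt-hash index with append-on-export, incremental sync, full rebuild, and atomic writes can fit together. Only the index filename and the two glob patterns come from this commit; DedupIndexSketch, prompt_hash(), the hashing scheme (SHA-256 of the normalized prompt), and the JSON layout are hypothetical, not the contents of dedup_index.py.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path

INDEX_NAME = ".dpo_dedup_index.json"
PATTERNS = ("deepdive_*.jsonl", "pairs_*.jsonl")


def prompt_hash(prompt):
    # Assumption: SHA-256 of the lowercased, whitespace-collapsed prompt.
    norm = " ".join(prompt.lower().split())
    return hashlib.sha256(norm.encode("utf-8")).hexdigest()


class DedupIndexSketch:
    """Hypothetical stand-in for DedupIndex: known prompt hashes plus indexed file names."""

    def __init__(self, data_dir):
        self.dir = Path(data_dir)
        self.path = self.dir / INDEX_NAME
        self.hashes, self.files = set(), set()
        if self.path.exists():
            try:
                raw = json.loads(self.path.read_text(encoding="utf-8"))
                self.hashes = set(raw["hashes"])
                self.files = set(raw["files"])
            except (ValueError, KeyError, TypeError, OSError):
                self.rebuild()  # corrupt or unreadable index: reconstruct from scratch
                return
        self.sync()

    def _jsonl_files(self):
        for pattern in PATTERNS:
            yield from sorted(self.dir.glob(pattern))

    def _ingest(self, path):
        # Hash the "prompt" field of every well-formed JSONL record in one file.
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                try:
                    rec = json.loads(line)
                except ValueError:
                    continue  # skip blank or malformed lines
                if isinstance(rec, dict) and rec.get("prompt"):
                    self.hashes.add(prompt_hash(rec["prompt"]))
        self.files.add(path.name)

    def sync(self):
        # Incremental sync: ingest only JSONL files not yet recorded in the index.
        new = [p for p in self._jsonl_files() if p.name not in self.files]
        for p in new:
            self._ingest(p)
        if new or not self.path.exists():
            self.save()
        return len(new)

    def rebuild(self):
        # Full rebuild: rescan every matching file.
        self.hashes, self.files = set(), set()
        for p in self._jsonl_files():
            self._ingest(p)
        self.save()

    def register(self, prompts, filename):
        # Append-on-export: add new hashes and mark the exported file as indexed.
        self.hashes.update(prompt_hash(p) for p in prompts)
        self.files.add(filename)
        self.save()

    def contains(self, prompt):
        return prompt_hash(prompt) in self.hashes

    def save(self):
        # Atomic write: dump to a temp file in the same directory, then os.replace().
        payload = {"hashes": sorted(self.hashes), "files": sorted(self.files)}
        fd, tmp = tempfile.mkstemp(dir=self.dir, prefix=".dedup_", suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as fh:
                json.dump(payload, fh)
            os.replace(tmp, self.path)
        except BaseException:
            os.unlink(tmp)
            raise
```

Writing to a temp file in the same directory matters because os.replace() is only atomic within one filesystem; a crash mid-write leaves the previous index intact.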

Modified: dpo_quality.py
- Imports DedupIndex with graceful degradation
- Replaces _load_history_hashes() with persistent index lookup
- Fallback: if index unavailable, scans ALL files in-memory (not just 5)
- New register_exported_hashes() method called after export
- Config key: dedup_full_history (replaces dedup_history_files)

Modified: dpo_generator.py
- Calls validator.register_exported_hashes() after successful export
  to keep the persistent index current without rescanning

Modified: config.yaml
- Replaced dedup_history_files: 5 with dedup_full_history: true

Tested — 7 integration tests:
  ✓ Fresh index build from empty directory
  ✓ Build from 3 existing JSONL files (15 unique hashes)
  ✓ Incremental sync when new file appears between runs
  ✓ Append after export + persistence across reloads
  ✓ Rebuild from scratch (recovers from corruption)
  ✓ Validator catches day-1 dupe from 20-day history (one the old 5-file window would miss)
  ✓ Full pipeline: generate → validate → export → register → re-run detects
2026-04-13 03:11:10 +00:00
perplexity
bb4922adeb feat: DPO pair quality validator — gate before overnight training
Some checks failed
CI / test (pull_request) Failing after 20s
CI / validate (pull_request) Failing after 16s
Review Approval Gate / verify-review (pull_request) Failing after 2s
Add DPOQualityValidator that catches bad training pairs before they
enter the tightening loop. Wired into DPOPairGenerator between
generate() and export() as an automatic quality gate.

New module: dpo_quality.py
- 5 single-pair quality checks:
  1. Field length minimums (prompt ≥40, chosen ≥80, rejected ≥30 chars)
  2. Chosen/rejected length ratio (chosen must be at least 1.3x the length of rejected)
  3. Chosen≈rejected similarity (Jaccard ≤0.70; catches low-contrast pairs)
  4. Vocabulary diversity in chosen (unique word ratio ≥0.30)
  5. Substance markers in chosen (≥2 fleet/training/action terms)
- 2 cross-pair quality checks:
  6. Near-duplicate prompts within batch (Jaccard ≤0.85)
  7. Cross-run dedup against recent JSONL history files
- Two modes: 'drop' (filter out bad pairs) or 'flag' (export with warning)
- BatchReport with per-pair diagnostics, pass rates, and warnings
- Standalone CLI: python3 dpo_quality.py <file.jsonl> [--strict] [--json]
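A rough sketch of the checks listed above, using the thresholds stated in this commit. The tokenizer regex, the SUBSTANCE_MARKERS term list, the function names, and the return shape are hypothetical; the cross-run check is modeled as an injected is_seen callable (for example, bound to a dedup index lookup).

```python
import re

# Hypothetical marker list; the real fleet/training/action terms are not shown.
SUBSTANCE_MARKERS = {"fleet", "training", "deploy", "benchmark", "hermes", "should", "implement"}


def words(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def jaccard(a, b):
    sa, sb = set(words(a)), set(words(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def check_pair(pair):
    """Single-pair checks 1-5; returns failure reasons (empty list means pass)."""
    p, c, r = pair["prompt"], pair["chosen"], pair["rejected"]
    issues = []
    if len(p) < 40 or len(c) < 80 or len(r) < 30:
        issues.append("too_short")
    if len(c) < 1.3 * len(r):
        issues.append("weak_length_ratio")
    if jaccard(c, r) > 0.70:
        issues.append("low_contrast")
    cw = words(c)
    if cw and len(set(cw)) / len(cw) < 0.30:
        issues.append("low_diversity")
    if len(set(cw) & SUBSTANCE_MARKERS) < 2:
        issues.append("no_substance")
    return issues


def validate_batch(pairs, mode="drop", is_seen=None):
    """Adds cross-pair checks 6-7, then drops or flags failing pairs."""
    kept, dropped, earlier = [], [], []
    for pair in pairs:
        issues = check_pair(pair)
        if any(jaccard(pair["prompt"], q) > 0.85 for q in earlier):
            issues.append("near_duplicate")
        if is_seen is not None and is_seen(pair["prompt"]):
            issues.append("seen_in_history")
        earlier.append(pair["prompt"])
        if not issues:
            kept.append(pair)
        elif mode == "flag":
            kept.append({**pair, "warnings": issues})  # export with warning attached
        else:
            dropped.append((pair, issues))
    return kept, dropped
```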

Modified: dpo_generator.py
- Imports DPOQualityValidator with graceful degradation
- Initializes from config validation section (enabled by default)
- Validates between generate() and export() in run()
- Quality report included in pipeline result dict
- Validator failure never blocks — falls back to unvalidated export
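The integration described above follows a common pattern: an optional import plus a try/except around the gate, so that a missing or crashing validator degrades to an unvalidated export. The module and class names come from this commit; run_pipeline() and the assumed validate(pairs) -> (kept_pairs, report) signature are hypothetical.

```python
import logging

log = logging.getLogger("dpo_generator_sketch")

try:
    from dpo_quality import DPOQualityValidator  # module/class named in this commit
except ImportError:
    DPOQualityValidator = None  # degrade gracefully: export proceeds unvalidated


def run_pipeline(generate, export, validate=None):
    """Hypothetical run(): generate -> optional validation gate -> export."""
    pairs = generate()
    report = None
    if validate is not None:
        try:
            pairs, report = validate(pairs)
        except Exception as exc:  # the gate must never block export
            log.warning("validator failed, exporting unvalidated pairs: %s", exc)
            report = {"error": str(exc)}
    path = export(pairs)
    return {"exported": len(pairs), "path": path, "quality_report": report}
```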

Modified: config.yaml
- New deepdive.training.dpo.validation section with all tunable knobs:
  enabled, flagged_pair_action, similarity thresholds, length minimums,
  dedup_history_files

Integration tested — 6 test cases covering:
  ✓ Good pairs pass (3/3 accepted)
  ✓ Bad pairs caught: too-short, high-similarity, inverted signal (0/3 accepted)
  ✓ Near-duplicate prompt detection (1/2 deduped)
  ✓ Flag mode preserves pairs with warnings (3/3 flagged)
  ✓ Cross-run deduplication against history (1 dupe caught)
  ✓ Full generator→validator→export pipeline (6/6 validated)
2026-04-13 02:46:50 +00:00
perplexity
55d53c513c feat: Phase 3.5 — DPO training pair generation from Deep Dive pipeline
Some checks failed
CI / test (pull_request) Failing after 22s
CI / validate (pull_request) Failing after 15s
Review Approval Gate / verify-review (pull_request) Failing after 2s
Wire arXiv relevance filter output directly into DPO pair generation,
closing the loop between research synthesis and overnight training data.

New module: dpo_generator.py
- DPOPairGenerator class with 3 pair strategies:
  * summarize: paper → fleet-grounded analysis (chosen) vs generic (rejected)
  * relevance: 'what matters to Hermes?' → scored context vs vague
  * implication: 'what should we do?' → actionable insight vs platitude
- Extracts synthesis excerpts matched to each ranked item
- Outputs to ~/.timmy/training-data/dpo-pairs/deepdive_{timestamp}.jsonl
- Format: {prompt, chosen, rejected, task_type, evidence_ids,
  source_session, safety_flags, metadata}

Pipeline changes (pipeline.py):
- Import DPOPairGenerator with graceful degradation
- Initialize from config deepdive.training.dpo section
- Execute as Phase 3.5 between synthesis and audio
- DPO results included in pipeline return dict
- Wrapped in try/except — DPO failure never blocks delivery

Config changes (config.yaml):
- New deepdive.training.dpo section with:
  enabled, output_dir, min_score, max_pairs_per_run, pair_types

Integration tested: 2 mock items × 3 pair types = 6 valid JSONL pairs.
Chosen responses consistently richer than rejected (assert-verified).
2026-04-13 02:24:04 +00:00
f737577faf purge: remove Anthropic from the-nexus fleet + deepdive (#1346)
Some checks failed
Deploy Nexus / deploy (push) Failing after 3s
Staging Verification Gate / verify-staging (push) Failing after 3s
2026-04-13 02:02:12 +00:00
Alexander Whitestone
adec58f980 fix: closes #830
Some checks failed
CI / test (pull_request) Failing after 9s
CI / validate (pull_request) Failing after 15s
Review Approval Gate / verify-review (pull_request) Failing after 3s
2026-04-12 19:26:43 -04:00
Alexander Whitestone
ef74536e33 feat: add edge-tts as zero-cost voice output provider
Some checks failed
CI / test (pull_request) Failing after 33s
CI / validate (pull_request) Failing after 26s
Review Approval Gate / verify-review (pull_request) Failing after 5s
- Add EdgeTTSAdapter to bin/deepdive_tts.py (provider key: "edge-tts")
  default voice: en-US-GuyNeural, no API key required
- Add EdgeTTS class to intelligence/deepdive/tts_engine.py
- Update HybridTTS to try edge-tts as fallback between piper and elevenlabs
- Add --voice-memo flag to bin/night_watch.py for spoken nightly reports
- Add edge-tts>=6.1.9 to requirements.txt
- Create docs/voice-output.md documenting all providers and fallback chain
- Add tests/test_edge_tts.py with 17 unit tests (all mocked, no network)
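The fallback chain (piper, then edge-tts, then elevenlabs) can be pictured as an ordered list of providers tried until one succeeds. This is a hypothetical sketch: the real PiperTTS/EdgeTTS/ElevenLabsTTS classes are not shown, so providers are modeled as (name, callable) pairs, and nothing here calls the edge-tts library itself.

```python
import logging
from typing import Callable, Sequence, Tuple

log = logging.getLogger("hybrid_tts_sketch")

# Hypothetical provider shape: synth(text, out_path) -> path of the written audio.
Provider = Tuple[str, Callable[[str, str], str]]


def synthesize_with_fallback(text, out_path, providers: Sequence[Provider]):
    """Try each provider in order; return (provider_name, path) from the first success."""
    errors = []
    for name, synth in providers:
        try:
            return name, synth(text, out_path)
        except Exception as exc:
            log.warning("TTS provider %s failed: %s", name, exc)
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all TTS providers failed: " + "; ".join(errors))
```

Keeping the zero-cost providers ahead of the paid one means ElevenLabs is only used when both local piper and edge-tts fail.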

Fixes #1126

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 06:29:26 -04:00
34862cf5e5 feat(fleet): promote Ollama to first-class provider, assign Gemma 4 across fleet
Some checks failed
Deploy Nexus / deploy (push) Failing after 3s
Staging Verification Gate / verify-staging (push) Failing after 3s
- lazarus-registry.yaml: replace big_brain/RunPod with local ollama/gemma4:12b
- fleet-routing.json: assign ollama:gemma4:12b to carnice, bilbobagginshire, substratum
- intelligence/deepdive/config.yaml: local model -> gemma4:12b
2026-04-07 15:55:52 +00:00
Ezra
ce2cd85adc [ezra] Production Readiness Review for Deep Dive (#830)
2026-04-05 21:00:26 +00:00
Ezra (Archivist)
d2f103654f intelligence(deepdive): Docker deployment scaffold for #830
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
- Add Dockerfile for production containerized pipeline
- Add docker-compose.yml for full stack deployment
- Add .dockerignore for clean builds
- Add deploy.sh: one-command build, test, and systemd timer install

This provides a sovereign, reproducible deployment path for the
Deep Dive daily briefing pipeline.
2026-04-05 20:40:58 +00:00
Ezra (Archivist)
4b1873d76e feat(deepdive): production briefing prompt + prompt engineering KT
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
- production_briefing_v1.txt: podcast-script prompt engineered for
  10-15 min premium audio, grounded fleet context, and actionable tone.
- PROMPT_ENGINEERING_KT.md: A/B testing protocol, failure modes,
  and maintenance checklist.
- pipeline.py: load external prompt_file from config.yaml.
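One way to load an external prompt file named in config, sketched against an already-parsed config dict so no YAML library is needed. The key location (cfg["synthesis"]["prompt_file"]), resolving relative paths against the config file's directory, and the DEFAULT_PROMPT fallback are all assumptions, not pipeline.py's actual behavior.

```python
import logging
from pathlib import Path

log = logging.getLogger("prompt_loader_sketch")

DEFAULT_PROMPT = "You are a research briefing assistant."  # hypothetical fallback


def load_system_prompt(cfg, config_path):
    # Assumption: key nesting; the real config.yaml layout is not shown here.
    rel = cfg.get("synthesis", {}).get("prompt_file")
    if not rel:
        return DEFAULT_PROMPT
    path = Path(rel).expanduser()
    if not path.is_absolute():
        # Resolve relative paths against the directory holding config.yaml.
        path = Path(config_path).resolve().parent / path
    try:
        return path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        log.warning("prompt_file %s not found, using built-in prompt", path)
        return DEFAULT_PROMPT
```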

Refs #830
2026-04-05 20:19:20 +00:00
Ezra (Archivist)
9ad2132482 [ezra] #830: Operational readiness checklist + fix Gitea URL to forge
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
2026-04-05 19:54:47 +00:00
Ezra
3df184e1e6 feat(deepdive): quality evaluation framework
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
- Add quality_eval.py: automated briefing quality scorer with drift detection
- Add QUALITY_FRAMEWORK.md: rubric, usage guide, and production integration spec
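A simple form of drift detection compares the mean score of the most recent briefings against a trailing baseline window. This is a hypothetical sketch: the window sizes, the 0.10 threshold, and the assumption that scores lie in [0, 1] are illustrative, not taken from quality_eval.py.

```python
from statistics import mean


def detect_drift(scores, baseline_n=7, recent_n=3, max_drop=0.10):
    """Flag drift when recent scores fall more than max_drop below the baseline mean.

    scores: chronological per-briefing quality scores, assumed to be in [0, 1].
    """
    if len(scores) < baseline_n + recent_n:
        return {"drift": False, "reason": "insufficient history"}
    baseline = mean(scores[-(baseline_n + recent_n):-recent_n])
    recent = mean(scores[-recent_n:])
    drop = baseline - recent
    return {"drift": drop > max_drop, "baseline": baseline, "recent": recent, "drop": drop}
```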

Refs #830
2026-04-05 19:03:05 +00:00
Ezra (Archivist)
00600a7e67 [BURN] Deep Dive proof-of-life, fleet context fix, dry-run repair
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
- Fix fleet_context.py env-var substitution for 0c16baadaebaaabc2c8390f35ef5e9aa2f4db671
- Remove non-existent wizard-checkpoints from config.yaml
- Fix bin/deepdive_orchestrator.py dry-run mock items
- Add PROOF_OF_LIFE.md with live execution output including fleet context
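For reference, env-var substitution in a parsed config usually means recursively replacing ${VAR} placeholders in string values. A minimal sketch, assuming ${VAR} syntax and fail-loud behavior on unset variables; the config keys in the usage below are hypothetical, and this is not the fleet_context.py fix itself.

```python
import os
import re

_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value, env=None):
    """Recursively replace ${VAR} placeholders in dicts, lists, and strings."""
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {k: substitute_env(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute_env(v, env) for v in value]
    if isinstance(value, str):
        def repl(match):
            name = match.group(1)
            if name not in env:
                raise KeyError(f"config references unset environment variable {name}")
            return env[name]
        return _VAR.sub(repl, value)
    return value  # numbers, booleans, None pass through unchanged
```

Usage: substitute_env({"fleet": {"gitea_url": "${FORGE_URL}/api"}}, {"FORGE_URL": "https://forge.example"}) yields {"fleet": {"gitea_url": "https://forge.example/api"}}.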

Progresses #830
2026-04-05 18:42:18 +00:00
Ezra (Archivist)
014bb3b71e [ezra] Gemini handoff for Deep Dive (#830)
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
- Add GEMINI_HANDOFF.md with codebase map, secrets inventory,
  production checklist, and recommended next steps
- Continuity from Ezra scaffold to Gemini production-hardening
2026-04-05 18:20:53 +00:00
b6a473d808 test(deepdive): add fleet context unit tests (#830)
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
2026-04-05 17:32:25 +00:00
5f4cc8cae2 config(deepdive): enable fleet context grounding (#830)
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
2026-04-05 17:32:24 +00:00
ca1a11f66b feat(deepdive): integrate Phase 0 fleet context into synthesis (#830)
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
2026-04-05 17:32:23 +00:00
7189565d4d feat(deepdive): add Phase 0 fleet context grounding module (#830)
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
2026-04-05 17:32:22 +00:00
b3bec469b1 [ezra] #830: Pipeline proof-of-execution document
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
2026-04-05 12:46:03 +00:00
16bd546fc9 [ezra] #830: Fix config wrapper, add arXiv API fallback, implement voice delivery, fix datetime
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
2026-04-05 12:45:07 +00:00
76c973c0c2 Update README to reflect production implementation status (#830)
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
2026-04-05 12:18:18 +00:00
fc237e67d7 Add Telegram /deepdive command handler for on-demand briefings (#830)
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
Hermes-compatible command handler that parses /deepdive args,
runs the pipeline, and returns status + audio to Telegram.
2026-04-05 12:17:17 +00:00
25a45467ac Add QUICKSTART.md for Deep Dive pipeline (#830)
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
Step-by-step guide for installation, dry-run testing, live
delivery, systemd timer enablement, and Telegram command setup.
2026-04-05 12:17:16 +00:00
92f1164be9 Add TTS engine implementation for Deep Dive (#830)
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
Executable Phase 4 component: PiperTTS, ElevenLabsTTS, HybridTTS
classes with chunking, concatenation, error handling.
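Chunking and concatenation can be sketched with the standard library alone: split the script at sentence boundaries under a character limit, synthesize each chunk, then join the WAV outputs with the wave module. The 1000-character default and the hard mid-word split for overlong sentences are simplifying assumptions, not the engine's actual limits.

```python
import re
import wave


def chunk_text(text, max_chars=1000):
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    chunks, current = [], ""
    for s in re.split(r"(?<=[.!?])\s+", text.strip()):
        # Hard-split any single sentence longer than the limit (crude, mid-word).
        while len(s) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(s[:max_chars])
            s = s[max_chars:]
        if not s:
            continue
        candidate = f"{current} {s}" if current else s
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = s
    if current:
        chunks.append(current)
    return chunks


def concat_wavs(paths, out_path):
    """Concatenate WAV files that share channel count, sample width, and frame rate."""
    if not paths:
        raise ValueError("no WAV files to concatenate")
    params = None
    with wave.open(out_path, "wb") as out:
        for p in paths:
            with wave.open(p, "rb") as w:
                fmt = (w.getnchannels(), w.getsampwidth(), w.getframerate())
                if params is None:
                    params = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != params:
                    raise ValueError(f"{p} has format {fmt}, expected {params}")
                out.writeframes(w.readframes(w.getnframes()))
```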

Ready for integration with Phase 3 synthesizer.

Burn mode artifact by Ezra.
2026-04-05 08:31:34 +00:00
6c5ac52374 [BURN] #830: End-to-end pipeline test (dry-run validation)
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
2026-04-05 08:08:11 +00:00
b131a12592 [BURN] #830: Phase 2 tests (relevance scoring)
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
2026-04-05 08:08:10 +00:00
ffae1b6285 [BURN] #830: Phase 1 tests (arXiv RSS aggregation)
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
2026-04-05 08:08:08 +00:00
f8634c0105 [BURN] #830: Systemd timer for daily 06:00 execution
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
2026-04-05 08:08:07 +00:00
c488bb7e94 [BURN] #830: Systemd service unit
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
2026-04-05 08:08:07 +00:00
66f632bd99 [BURN] #830: Build automation (Makefile)
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
2026-04-05 08:06:12 +00:00
44302bbdf9 [BURN] #830: Working pipeline.py implementation (645 lines, executable)
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
2026-04-05 08:06:11 +00:00
88af4870d3 [scaffold] Deep Dive intelligence pipeline: intelligence/deepdive/requirements.txt
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
2026-04-05 06:19:51 +00:00
cca5909cf9 [scaffold] Deep Dive intelligence pipeline: intelligence/deepdive/config.yaml
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
2026-04-05 06:19:50 +00:00
a8b4f7a8c0 [scaffold] Deep Dive intelligence pipeline: intelligence/deepdive/pipeline.py
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
2026-04-05 06:19:49 +00:00
949becff22 [scaffold] Deep Dive intelligence pipeline: intelligence/deepdive/architecture.md
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
2026-04-05 06:19:48 +00:00
fc11ea8a28 [scaffold] Deep Dive intelligence pipeline: intelligence/deepdive/README.md
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
2026-04-05 06:19:47 +00:00