the-nexus

Author	SHA1	Message	Date
perplexity	9f90392a93	feat: full-history persistent dedup index for DPO training pairs Replace the 5-file sliding window cross-run dedup with a persistent hash index that covers ALL historical training data. Overfitting risk compounds across the full dataset — a 5-file window lets old duplicates leak back into training after enough overnight runs. New module: dedup_index.py (DedupIndex) - Persistent JSON index (.dpo_dedup_index.json) alongside JSONL files - Append-on-export: new prompt hashes registered after each successful export — no full rescan needed for normal operations - Incremental sync: on load, detects JSONL files not yet indexed and ingests them automatically (handles files from other tools) - Full rebuild: rebuild() scans ALL deepdive_.jsonl + pairs_.jsonl to reconstruct from scratch (first run, corruption recovery) - Atomic writes (write-to-tmp + rename) to prevent index corruption - Standalone CLI: python3 dedup_index.py <dir> --rebuild --stats Modified: dpo_quality.py - Imports DedupIndex with graceful degradation - Replaces _load_history_hashes() with persistent index lookup - Fallback: if index unavailable, scans ALL files in-memory (not just 5) - New register_exported_hashes() method called after export - Config key: dedup_full_history (replaces dedup_history_files) Modified: dpo_generator.py - Calls validator.register_exported_hashes() after successful export to keep the persistent index current without rescanning Modified: config.yaml - Replaced dedup_history_files: 5 with dedup_full_history: true Tested — 7 integration tests: ✓ Fresh index build from empty directory ✓ Build from 3 existing JSONL files (15 unique hashes) ✓ Incremental sync when new file appears between runs ✓ Append after export + persistence across reloads ✓ Rebuild from scratch (recovers from corruption) ✓ Validator catches day-1 dupe from 20-day history (5-file window miss) ✓ Full pipeline: generate → validate → export → register → re-run detects	2026-04-15 21:24:01 -04:00
perplexity	d15a82ff1e	feat: DPO pair quality validator — gate before overnight training Add DPOQualityValidator that catches bad training pairs before they enter the tightening loop. Wired into DPOPairGenerator between generate() and export() as an automatic quality gate. New module: dpo_quality.py - 5 single-pair quality checks: 1. Field length minimums (prompt ≥40, chosen ≥80, rejected ≥30 chars) 2. Chosen/rejected length ratio (chosen must be ≥1.3x longer) 3. Chosen≈rejected similarity (Jaccard ≤0.70 — catches low-contrast) 4. Vocabulary diversity in chosen (unique word ratio ≥0.30) 5. Substance markers in chosen (≥2 fleet/training/action terms) - 2 cross-pair quality checks: 6. Near-duplicate prompts within batch (Jaccard ≤0.85) 7. Cross-run dedup against recent JSONL history files - Two modes: 'drop' (filter out bad pairs) or 'flag' (export with warning) - BatchReport with per-pair diagnostics, pass rates, and warnings - Standalone CLI: python3 dpo_quality.py <file.jsonl> [--strict] [--json] Modified: dpo_generator.py - Imports DPOQualityValidator with graceful degradation - Initializes from config validation section (enabled by default) - Validates between generate() and export() in run() - Quality report included in pipeline result dict - Validator failure never blocks — falls back to unvalidated export Modified: config.yaml - New deepdive.training.dpo.validation section with all tunable knobs: enabled, flagged_pair_action, similarity thresholds, length minimums, dedup_history_files Integration tested — 6 test cases covering: ✓ Good pairs pass (3/3 accepted) ✓ Bad pairs caught: too-short, high-similarity, inverted signal (0/3) ✓ Near-duplicate prompt detection (1/2 deduped) ✓ Flag mode preserves pairs with warnings (3/3 flagged) ✓ Cross-run deduplication against history (1 dupe caught) ✓ Full generator→validator→export pipeline (6/6 validated)	2026-04-15 21:24:01 -04:00
perplexity	c3b455bd9c	feat: Phase 3.5 — DPO training pair generation from Deep Dive pipeline Wire arXiv relevance filter output directly into DPO pair generation, closing the loop between research synthesis and overnight training data. New module: dpo_generator.py - DPOPairGenerator class with 3 pair strategies: * summarize: paper → fleet-grounded analysis (chosen) vs generic (rejected) * relevance: 'what matters to Hermes?' → scored context vs vague * implication: 'what should we do?' → actionable insight vs platitude - Extracts synthesis excerpts matched to each ranked item - Outputs to ~/.timmy/training-data/dpo-pairs/deepdive_{timestamp}.jsonl - Format: {prompt, chosen, rejected, task_type, evidence_ids, source_session, safety_flags, metadata} Pipeline changes (pipeline.py): - Import DPOPairGenerator with graceful degradation - Initialize from config deepdive.training.dpo section - Execute as Phase 3.5 between synthesis and audio - DPO results included in pipeline return dict - Wrapped in try/except — DPO failure never blocks delivery Config changes (config.yaml): - New deepdive.training.dpo section with: enabled, output_dir, min_score, max_pairs_per_run, pair_types Integration tested: 2 mock items × 3 pair types = 6 valid JSONL pairs. Chosen responses consistently richer than rejected (assert-verified).	2026-04-15 21:24:01 -04:00
Perplexity Computer	61c24c390b	purge: remove Anthropic from the-nexus fleet + deepdive (#1346)	2026-04-15 21:24:01 -04:00
Alexander Whitestone	557713501c	fix: closes #830	2026-04-15 21:24:01 -04:00
Alexander Whitestone	ef74536e33	feat: add edge-tts as zero-cost voice output provider - Add EdgeTTSAdapter to bin/deepdive_tts.py (provider key: "edge-tts") default voice: en-US-GuyNeural, no API key required - Add EdgeTTS class to intelligence/deepdive/tts_engine.py - Update HybridTTS to try edge-tts as fallback between piper and elevenlabs - Add --voice-memo flag to bin/night_watch.py for spoken nightly reports - Add edge-tts>=6.1.9 to requirements.txt - Create docs/voice-output.md documenting all providers and fallback chain - Add tests/test_edge_tts.py with 17 unit tests (all mocked, no network) Fixes #1126 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 06:29:26 -04:00
Bezalel	34862cf5e5	feat(fleet): promote Ollama to first-class provider, assign Gemma 4 across fleet - lazarus-registry.yaml: replace big_brain/RunPod with local ollama/gemma4:12b - fleet-routing.json: assign ollama:gemma4:12b to carnice, bilbobagginshire, substratum - intelligence/deepdive/config.yaml: local model -> gemma4:12b	2026-04-07 15:55:52 +00:00
Ezra	ce2cd85adc	[ezra] Production Readiness Review for Deep Dive (#830)	2026-04-05 21:00:26 +00:00
Ezra (Archivist)	d2f103654f	intelligence(deepdive): Docker deployment scaffold for #830 - Add Dockerfile for production containerized pipeline - Add docker-compose.yml for full stack deployment - Add .dockerignore for clean builds - Add deploy.sh: one-command build, test, and systemd timer install This provides a sovereign, reproducible deployment path for the Deep Dive daily briefing pipeline.	2026-04-05 20:40:58 +00:00
Ezra (Archivist)	4b1873d76e	feat(deepdive): production briefing prompt + prompt engineering KT - production_briefing_v1.txt: podcast-script prompt engineered for 10-15 min premium audio, grounded fleet context, and actionable tone. - PROMPT_ENGINEERING_KT.md: A/B testing protocol, failure modes, and maintenance checklist. - pipeline.py: load external prompt_file from config.yaml. Refs #830	2026-04-05 20:19:20 +00:00
Ezra (Archivist)	9ad2132482	[ezra] #830: Operational readiness checklist + fix Gitea URL to forge	2026-04-05 19:54:47 +00:00
Ezra	3df184e1e6	feat(deepdive): quality evaluation framework - Add quality_eval.py: automated briefing quality scorer with drift detection - Add QUALITY_FRAMEWORK.md: rubric, usage guide, and production integration spec Refs #830	2026-04-05 19:03:05 +00:00
Ezra (Archivist)	00600a7e67	[BURN] Deep Dive proof-of-life, fleet context fix, dry-run repair - Fix fleet_context.py env-var substitution for 0c16baadaebaaabc2c8390f35ef5e9aa2f4db671 - Remove non-existent wizard-checkpoints from config.yaml - Fix bin/deepdive_orchestrator.py dry-run mock items - Add PROOF_OF_LIFE.md with live execution output including fleet context Progresses #830	2026-04-05 18:42:18 +00:00
Ezra (Archivist)	014bb3b71e	[ezra] Gemini handoff for Deep Dive (#830) - Add GEMINI_HANDOFF.md with codebase map, secrets inventory, production checklist, and recommended next steps - Continuity from Ezra scaffold to Gemini production-hardening	2026-04-05 18:20:53 +00:00
Ezra	b6a473d808	test(deepdive): add fleet context unit tests (#830)	2026-04-05 17:32:25 +00:00
Ezra	5f4cc8cae2	config(deepdive): enable fleet context grounding (#830)	2026-04-05 17:32:24 +00:00
Ezra	ca1a11f66b	feat(deepdive): integrate Phase 0 fleet context into synthesis (#830)	2026-04-05 17:32:23 +00:00
Ezra	7189565d4d	feat(deepdive): add Phase 0 fleet context grounding module (#830)	2026-04-05 17:32:22 +00:00
Ezra	b3bec469b1	[ezra] #830: Pipeline proof-of-execution document	2026-04-05 12:46:03 +00:00
Ezra	16bd546fc9	[ezra] #830: Fix config wrapper, add arXiv API fallback, implement voice delivery, fix datetime	2026-04-05 12:45:07 +00:00
Ezra	76c973c0c2	Update README to reflect production implementation status (#830)	2026-04-05 12:18:18 +00:00
Ezra	fc237e67d7	Add Telegram /deepdive command handler for on-demand briefings (#830) Hermes-compatible command handler that parses /deepdive args, runs the pipeline, and returns status + audio to Telegram.	2026-04-05 12:17:17 +00:00
Ezra	25a45467ac	Add QUICKSTART.md for Deep Dive pipeline (#830) Step-by-step guide for installation, dry-run testing, live delivery, systemd timer enablement, and Telegram command setup.	2026-04-05 12:17:16 +00:00
Ezra	92f1164be9	Add TTS engine implementation for Deep Dive (#830) Executable Phase 4 component: PiperTTS, ElevenLabsTTS, HybridTTS classes with chunking, concatenation, error handling. Ready for integration with Phase 3 synthesizer. Burn mode artifact by Ezra.	2026-04-05 08:31:34 +00:00
Ezra	6c5ac52374	[BURN] #830: End-to-end pipeline test (dry-run validation)	2026-04-05 08:08:11 +00:00
Ezra	b131a12592	[BURN] #830: Phase 2 tests (relevance scoring)	2026-04-05 08:08:10 +00:00
Ezra	ffae1b6285	[BURN] #830: Phase 1 tests (arXiv RSS aggregation)	2026-04-05 08:08:08 +00:00
Ezra	f8634c0105	[BURN] #830: Systemd timer for daily 06:00 execution	2026-04-05 08:08:07 +00:00
Ezra	c488bb7e94	[BURN] #830: Systemd service unit	2026-04-05 08:08:07 +00:00
Ezra	66f632bd99	[BURN] #830: Build automation (Makefile)	2026-04-05 08:06:12 +00:00
Ezra	44302bbdf9	[BURN] #830: Working pipeline.py implementation (645 lines, executable)	2026-04-05 08:06:11 +00:00
Ezra	88af4870d3	[scaffold] Deep Dive intelligence pipeline: intelligence/deepdive/requirements.txt	2026-04-05 06:19:51 +00:00
Ezra	cca5909cf9	[scaffold] Deep Dive intelligence pipeline: intelligence/deepdive/config.yaml	2026-04-05 06:19:50 +00:00
Ezra	a8b4f7a8c0	[scaffold] Deep Dive intelligence pipeline: intelligence/deepdive/pipeline.py	2026-04-05 06:19:49 +00:00
Ezra	949becff22	[scaffold] Deep Dive intelligence pipeline: intelligence/deepdive/architecture.md	2026-04-05 06:19:48 +00:00
Ezra	fc11ea8a28	[scaffold] Deep Dive intelligence pipeline: intelligence/deepdive/README.md	2026-04-05 06:19:47 +00:00
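
Commit d15a82ff1e lists single-pair quality checks with concrete thresholds (length minimums, a 1.3x chosen/rejected length ratio, Jaccard similarity of at most 0.70, unique-word ratio of at least 0.30). The following is a minimal sketch of how such checks might look; it is not the code in dpo_quality.py. The names `jaccard` and `check_pair`, the issue labels, and word-level tokenization are assumptions made for illustration, and the substance-marker check is omitted because the commit does not list its terms.

```python
import re


def _tokens(text):
    """Lowercased word tokens (tokenization scheme is an assumption)."""
    return re.findall(r"\w+", text.lower())


def jaccard(a, b):
    """Jaccard similarity between the word sets of two strings."""
    sa, sb = set(_tokens(a)), set(_tokens(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def check_pair(pair, min_prompt=40, min_chosen=80, min_rejected=30,
               min_ratio=1.3, max_similarity=0.70, min_diversity=0.30):
    """Return a list of issue labels; an empty list means the pair passes.

    Default thresholds mirror the numbers quoted in the commit message.
    """
    prompt, chosen, rejected = pair["prompt"], pair["chosen"], pair["rejected"]
    issues = []
    # 1. Field length minimums, measured in characters.
    if len(prompt) < min_prompt:
        issues.append("prompt_too_short")
    if len(chosen) < min_chosen:
        issues.append("chosen_too_short")
    if len(rejected) < min_rejected:
        issues.append("rejected_too_short")
    # 2. Chosen must be at least min_ratio times longer than rejected.
    if len(chosen) < min_ratio * len(rejected):
        issues.append("low_length_ratio")
    # 3. Chosen and rejected must not be near-identical.
    if jaccard(chosen, rejected) > max_similarity:
        issues.append("low_contrast")
    # 4. Vocabulary diversity: unique words divided by total words in chosen.
    words = _tokens(chosen)
    if words and len(set(words)) / len(words) < min_diversity:
        issues.append("low_diversity")
    return issues
```

Under this sketch, a 'drop' mode would filter out pairs whose issue list is non-empty, while a 'flag' mode would export them with the labels attached as warnings.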

36 Commits
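
Commit 9f90392a93 describes a persistent hash index with append-on-export, incremental sync of not-yet-indexed JSONL files, and atomic writes via write-to-tmp plus rename. The sketch below illustrates that pattern under stated assumptions; it is not the repository's DedupIndex. The class name `TinyDedupIndex`, its methods, the on-disk layout, SHA-256 over a whitespace-normalized lowercase prompt, and the `*.jsonl` glob are all hypothetical. Only the index filename `.dpo_dedup_index.json` comes from the commit message.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path


class TinyDedupIndex:
    """Illustrative persistent prompt-hash index (hypothetical, not the real module)."""

    def __init__(self, data_dir, index_name=".dpo_dedup_index.json"):
        self.data_dir = Path(data_dir)
        self.path = self.data_dir / index_name
        self.hashes = set()   # prompt hashes seen so far
        self.files = set()    # names of JSONL files already ingested
        if self.path.exists():
            state = json.loads(self.path.read_text(encoding="utf-8"))
            self.hashes = set(state.get("hashes", []))
            self.files = set(state.get("files", []))
        self.sync()

    @staticmethod
    def prompt_hash(prompt):
        # Normalization is an assumption: lowercase and collapse whitespace.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def sync(self):
        """Incremental sync: ingest only JSONL files not yet recorded."""
        new_files = [p for p in sorted(self.data_dir.glob("*.jsonl"))
                     if p.name not in self.files]
        for p in new_files:
            with p.open(encoding="utf-8") as fh:
                for line in fh:
                    line = line.strip()
                    if line:
                        record = json.loads(line)
                        self.hashes.add(self.prompt_hash(record["prompt"]))
            self.files.add(p.name)
        if new_files:
            self._save()

    def contains(self, prompt):
        return self.prompt_hash(prompt) in self.hashes

    def register(self, prompts, source_file=None):
        """Append-on-export: record new prompts without rescanning any file."""
        self.hashes.update(self.prompt_hash(p) for p in prompts)
        if source_file is not None:
            self.files.add(Path(source_file).name)
        self._save()

    def _save(self):
        """Atomic write: dump to a temp file in the same directory, then rename."""
        state = {"hashes": sorted(self.hashes), "files": sorted(self.files)}
        fd, tmp = tempfile.mkstemp(dir=self.data_dir, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh)
        os.replace(tmp, self.path)
```

Creating the temporary file in the same directory as the index keeps the final `os.replace` on a single filesystem, which is what makes the rename atomic; a reader then sees either the old index or the new one, never a partial write.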