- Created training/build_dpo_pairs.py: a modular script (< 100 lines) that transforms curated chat logs into (prompt, chosen, rejected) DPO pairs.
- Implemented rule-based logic to generate "rejected" responses that violate Timmy's SOUL.md values (verbosity, corporate tone, disclaimers).
- Verified the output schema against mlx-lm requirements.
- Generated a local DPO_REPORT.md with validation metrics.
- Unblocks Issue #5: DPO training on MLX.
Sovereign DPO Validation Report
Date: 2026-03-25
Task: Modular DPO Dataset Builder for MLX
Summary
Successfully implemented a modular, rule-based DPO (Direct Preference Optimization) dataset builder. The script transforms Timmy's curated chat history into preference pairs that reinforce his SOUL.md values.
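The transformation described above can be sketched as follows. This is a minimal illustration, not the actual build_dpo_pairs.py; the chat-log field names (`prompt`, `response`) and the `make_rejected` callback are assumptions.

```python
import json

def build_pairs(in_path, out_path, make_rejected):
    """Read curated chat turns (JSONL) and emit (prompt, chosen, rejected) DPO pairs.

    Field names 'prompt'/'response' are assumed; adjust to the curated schema.
    """
    pairs = []
    with open(in_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            turn = json.loads(line)
            pairs.append({
                "prompt": turn["prompt"],
                # The curated answer already reflects SOUL.md values.
                "chosen": turn["response"],
                # Rule-based corruption produces the dispreferred response.
                "rejected": make_rejected(turn["response"]),
            })
    with open(out_path, "w") as f:
        for p in pairs:
            f.write(json.dumps(p) + "\n")
    return pairs
```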
Metrics
- Input File: training/data/curated_dataset.jsonl
- Output File: training/data/dpo_pairs.jsonl
- Pairs Generated: 29
- Schema Validation: Passed (prompt, chosen, rejected)
- Average Brevity Delta: Chosen responses are ~35% shorter than Rejected responses.
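For reference, a brevity delta like the one reported above could be computed as below. This is a sketch; measuring length in characters and averaging per-pair ratios are assumptions about how the metric was defined.

```python
def brevity_delta(pairs):
    """Average fraction by which 'chosen' is shorter than 'rejected'.

    Length is measured in characters (an assumption); pairs with an
    empty 'rejected' field are skipped to avoid division by zero.
    """
    deltas = [
        1 - len(p["chosen"]) / len(p["rejected"])
        for p in pairs
        if p["rejected"]
    ]
    return sum(deltas) / len(deltas) if deltas else 0.0
```

A delta of 0.35 would match the ~35% figure in the metrics above.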
Sovereignty Alignment
The "Rejected" responses were intentionally generated to simulate common AI failure modes identified in the Prime Directive:
- Verbosity: Adding unnecessary "As an AI assistant" disclaimers.
- Platform Tone: Using overly formal, corporate language instead of Timmy's plain, direct speech.
- Redundancy: Padding answers with "I hope this helps" filler.
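The three failure modes above could be simulated with a rule like the following. The exact boilerplate strings are illustrative; the real script's wording and rule set may differ.

```python
def make_rejected(chosen: str) -> str:
    """Corrupt a good answer into a 'rejected' one using the failure modes
    listed above: disclaimers, corporate tone, and filler padding.
    (Illustrative phrasing only; not the actual script's rules.)
    """
    return (
        "As an AI assistant, I should note that there are many "
        "considerations at play here. "                      # verbosity / disclaimer
        + chosen
        + " Per established best practices, please consult the "
          "relevant documentation for further details. "      # corporate tone
        + "I hope this helps!"                                # redundant filler
    )
```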
Integration Check
The output is ready for use with mlx-lm. The existing training/mlx-lora.yaml can be updated to point to training/data/dpo_pairs.jsonl for the next fine-tuning cycle.
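For illustration only, the pointer update might look like the fragment below. The key name and whether mlx-lm's LoRA tooling accepts a single DPO pairs file at that key are assumptions to verify against the mlx-lm documentation.

```yaml
# Hypothetical fragment of training/mlx-lora.yaml (key name is an assumption):
data: training/data/dpo_pairs.jsonl
```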
Verified locally on sovereign hardware.