Implement DPO training on MLX — it's just a loss function #5
The Problem
mlx-lm ships with SFT training only. There's no --method dpo flag. This is a tooling gap, not a hardware or framework limitation: MLX can compute gradients, and DPO is just a different loss function. We've already trained three LoRA adapters on MLX (timmy-v0, v0.1, v0.2). This should work.

The Math
DPO loss is one line:

    L_DPO = -log σ( β · [ (log π_θ(y_w|x) − log π_ref(y_w|x)) − (log π_θ(y_l|x) − log π_ref(y_l|x)) ] )
Two forward passes per training pair. That's it.
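Those two forward passes produce four sequence log-probabilities, and the loss is pure arithmetic on them. A minimal, framework-agnostic sketch (function names and the beta default are illustrative, not from the repo):

```python
import math

def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def dpo_loss(policy_chosen_lp: float, policy_rejected_lp: float,
             ref_chosen_lp: float, ref_rejected_lp: float,
             beta: float = 0.1) -> float:
    # Each argument is a summed sequence log-probability for the
    # chosen (y_w) or rejected (y_l) completion under the trainable
    # policy or the frozen reference model.
    chosen_logratio = policy_chosen_lp - ref_chosen_lp
    rejected_logratio = policy_rejected_lp - ref_rejected_lp
    return -log_sigmoid(beta * (chosen_logratio - rejected_logratio))
```

When the policy hasn't moved off the reference, both log-ratios are zero and the loss sits at log 2; it falls as the policy pushes the chosen completion above the rejected one.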
Implementation (~40-60 lines of Python)
Training Loop
ref_model = deepcopy(model); freeze(ref_model)

The training loop is almost identical to the existing mlx_lm.lora trainer — same optimizer, same LoRA config, same checkpoint saving. Swap the loss function.

Data Format
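The only new plumbing the loss needs is a per-sequence log-probability from each forward pass. A toy sketch in pure Python for clarity (real code would take log-softmax over the model's logits, e.g. with mx.logsumexp in MLX — an assumption about the exact call, not verified against the trainer):

```python
import math

def sequence_logprob(step_logits: list, tokens: list) -> float:
    # Sum of log P(token) over a completion, given one logit vector
    # per target position. Each of the two forward passes (policy and
    # frozen reference) must produce this for the chosen and rejected
    # completions before the loss is applied.
    total = 0.0
    for logits, tok in zip(step_logits, tokens):
        m = max(logits)  # subtract max for a stable softmax
        logz = m + math.log(sum(math.exp(s - m) for s in logits))
        total += logits[tok] - logz  # log-softmax at the target token
    return total
```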
The session_export Huey task (already running) extracts user→assistant pairs from Hermes sessions. For DPO we need (prompt, chosen, rejected) triples.
We can generate rejected responses by running the same prompt through the base model without the LoRA — the unaligned response is the rejection.
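That turns pair generation into a small transform over the existing export. A sketch under stated assumptions — the record keys and the base_generate hook are hypothetical, not the actual session_export schema:

```python
import json

def build_dpo_pairs(sft_pairs, base_generate):
    # sft_pairs: iterable of {"prompt": ..., "response": ...} dicts
    #   (stand-ins for the user->assistant pairs session_export emits).
    # base_generate: callable that runs a prompt through the base model
    #   WITHOUT the LoRA adapter -- a hypothetical hook.
    for pair in sft_pairs:
        yield {
            "prompt": pair["prompt"],
            "chosen": pair["response"],                 # aligned answer
            "rejected": base_generate(pair["prompt"]),  # unaligned answer
        }

def write_jsonl(pairs, path):
    # One JSON triple per line, the usual DPO dataset layout.
    with open(path, "w") as f:
        for rec in pairs:
            f.write(json.dumps(rec) + "\n")
```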
SimPO Alternative (Even Simpler)
If reference model memory is an issue on 36GB:
No reference model at all. One forward pass per response. SimPO outperforms DPO on AlpacaEval by 6.4 points. Paper: https://arxiv.org/abs/2405.14734
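The SimPO objective drops the reference log-ratios in favor of length-normalized average log-probs plus a target margin. A minimal sketch (beta and gamma defaults are illustrative values in the range the paper reports, not tuned for this setup):

```python
import math

def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def simpo_loss(chosen_lp: float, chosen_len: int,
               rejected_lp: float, rejected_len: int,
               beta: float = 2.0, gamma: float = 0.5) -> float:
    # Length-normalized average log-probs replace the reference-model
    # log-ratios; gamma is the reward margin the pair must clear.
    margin = (beta * chosen_lp / chosen_len
              - beta * rejected_lp / rejected_len
              - gamma)
    return -log_sigmoid(margin)
```

Because nothing here touches a second model, the 36GB box only ever holds one set of weights in memory.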
Acceptance Criteria
dpo_trainer.py in timmy-config/training/ that runs DPO on MLX with LoRA

Why This Matters
SFT teaches Timmy what to say. DPO teaches Timmy what Alexander prefers. That's the difference between a chatbot and an apprentice.
Do the SimPO too! Eval them! Write a report! Do it all!
PR submitted: http://143.198.27.163:3000/Timmy_Foundation/timmy-config/pulls/2
training/build_dpo_pairs.py to automate sovereign data preparation.
curated_dataset.jsonl (29 pairs generated).
training/DPO_REPORT.md with validation metrics.

⚡ Dispatched to claude. Huey task queued.
⚡ Dispatched to gemini. Huey task queued.
⚡ Dispatched to kimi. Huey task queued.
⚡ Dispatched to grok. Huey task queued.
⚡ Dispatched to perplexity. Huey task queued.

Closing during the 2026-03-28 backlog burn-down.
Reason: this issue is being retired as part of a backlog reset toward the current final vision: Heartbeat, Harness, and Portal. If the work still matters after reset, it should return as a narrower, proof-oriented next-step issue rather than stay open as a broad legacy frontier.