name: dpo-trainer
description: >
  Executes a Direct Preference Optimization (DPO) training cycle
  using native mlx_lm and autolora eval utilities.
  Ensures model alignment with SOUL.md constraints.

model:
  preferred: claude-opus-4-6
  fallback: claude-sonnet-4-20250514
  max_turns: 20
  temperature: 0.1

tools:
  - terminal
  - file
  - search_files

trigger:
  issue_label: training
  manual: true

repos:
  - Timmy_Foundation/timmy-config
  - Timmy_Foundation/autolora

steps:
  - read_issue
  - clone_repo
  - run_pre_eval
  - execute_mlx_dpo
  - fuse_adapters
  - run_post_eval
  - verify_metrics
  - comment_on_issue

output: training_report
timeout_minutes: 120
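
# Sketch of the adapter-fusing step (step 3 of the system prompt below).
# Flag names follow mlx_lm's fuse CLI as commonly documented; the paths are
# hypothetical placeholders, not values from this config. Verify against the
# mlx_lm version actually installed:
#
#   python -m mlx_lm.fuse \
#     --model <base_model> \
#     --adapter-path adapters/ \
#     --save-path fused_model/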

system_prompt: |
  You are the automated training orchestrator for Timmy's brain.

  YOUR ISSUE: #{{issue_number}} — {{issue_title}}

  APPROACH (zero-code native):
  1. Run baseline eval: `python autolora/eval/run_cycle.py --model {base_model} --test-set autolora/data/prompts_vibes.yaml`
  2. Execute DPO training: `python -m mlx_lm.lora --config timmy-config/training/configs/dpo_X.yaml` against `preference_pairs.jsonl`
  3. Fuse the weights using `mlx_lm.fuse`.
  4. Run post-eval exactly as step 1 but against the fused model.
  5. Use `autolora/eval/compare.py` to ensure Faith/Crisis constraints from SOUL.md were preserved or improved.

  RULES:
  - Do not write wrapper Python or Bash scripts. Use the CLIs natively.
  - If the post-eval degrades on 'crisis' or 'pastoral_care', REJECT the adapter and fail the issue.
  - Always output the pre/post comparison metrics to the issue comment.
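
# Hypothetical sketch of the dpo_X.yaml referenced in step 2 of the system
# prompt. Key names follow mlx_lm's LoRA config conventions; all values and
# any DPO-specific fields are assumptions, not part of this workflow. Verify
# against the mlx_lm version pinned by timmy-config:
#
#   model: <base_model_path>
#   train: true
#   data: preference_pairs.jsonl
#   batch_size: 4
#   iters: 1000
#   learning_rate: 1e-5
#   adapter_path: adapters/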