timmy-config/training/evals_archive/v0-baseline/8b/eval_run.log
perplexity 6507cffc15 feat: migrate autolora pipeline into training/
Per the direction shift (the-nexus#542).

Replaces the autolora repo (1,500 lines of custom pipeline code)
with config files for existing tools:

- axolotl.yaml: replaces train_modal.py (239 lines)
- mlx-lora.yaml: replaces MLX training scripts
- eval-tasks.yaml: replaces run_eval.py (300 lines)
- Makefile: replaces run_vibes.py, compare.py, convert_to_mlx.py

Data migrated as-is:
- curated_dataset.jsonl (26 gold-standard conversations)
- preference_pairs.jsonl (DPO pairs)
- prompts_vibes.yaml, prompts_nexus_vibes.yaml
- v0-baseline eval results (historical record)

Thin glue kept:
- build_curated.py (data authoring, not infrastructure)
- ingest_trajectories.py (domain-specific quality filter)

Dependencies: pip install axolotl mlx-lm lm-evaluation-harness
2026-03-25 23:05:50 +00:00


Evaluating 19 sessions against hermes3:8b
Skipping sessions with >20 human turns
------------------------------------------------------------
[1/19] 20260323_154745_7c9379 (1 human turns)... composite=0.522 (19.0s)
[2/19] 20260323_155153_bbc661 (1 human turns)... composite=0.681 (20.3s)
[3/19] 20260323_155458_a7563d (1 human turns)... composite=0.523 (15.5s)
[4/19] 20260323_160415_267a50 (1 human turns)... composite=0.000 (0.0s)
[5/19] 20260323_160533_edc7cf (4 human turns)... composite=0.684 (35.0s)
[6/19] 20260323_160633_6fab92 (1 human turns)... composite=0.000 (0.0s)
[7/19] 20260323_160736_754836 (1 human turns)... composite=0.628 (8.1s)
[8/19] 20260323_160920_c90867 (2 human turns)... composite=0.592 (36.7s)
[9/19] 20260323_161348_d553f0 (1 human turns)... composite=0.628 (13.2s)
[10/19] 20260323_162017_83fe11 (1 human turns)... composite=0.623 (13.2s)
[11/19] 20260323_162338_0267eb (2 human turns)... composite=0.592 (28.4s)
[12/19] 20260323_162704_86f56f (1 human turns)... composite=0.623 (8.1s)
[13/19] 20260323_163322_e45fa8 (1 human turns)... composite=0.624 (8.0s)
[14/19] 20260323_163943_9ac5e0 (1 human turns)... composite=0.481 (75.1s)
[15/19] 20260323_164605_a4aedf (1 human turns)... composite=0.726 (8.4s)
[16/19] 20260323_164606_c000d8 (2 human turns)... composite=0.592 (25.9s)
[17/19] 20260323_165414_a92bf6 (1 human turns)... composite=0.674 (10.5s)
[18/19] 20260323_165533_940c49 (1 human turns)... composite=0.521 (19.5s)
[19/19] 20260323_170039_2a191f (1 human turns)... composite=0.754 (8.2s)
============================================================
RESULTS — hermes3:8b
============================================================
tool_selection : 0.895
length_ratio : 0.271
format_compliance : 0.892
text_similarity : 0.029
brevity : 0.668
composite : 0.551
Saved to /Users/apayne/autolora/evals/v0-baseline/8b/scores.json
============================================================
STEP 1: Evaluating hermes3:8b
============================================================
Done. Results in /Users/apayne/autolora/evals/v0-baseline/8b/
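A note on the summary arithmetic: the aggregate `composite` of 0.551 appears to be the unweighted mean of the 19 per-session composites above, with the two failed sessions counted as 0.000. This is an inference from the numbers in this log, not something confirmed by the eval tool's source; a minimal sketch reproducing it:

```python
# Sketch only: reproduce the aggregate composite as the unweighted mean of
# the per-session composites printed in the log above. The averaging rule
# (failures counted as 0.000, no weighting) is inferred, not confirmed.
session_composites = [
    0.522, 0.681, 0.523, 0.000, 0.684, 0.000, 0.628, 0.592, 0.628, 0.623,
    0.592, 0.623, 0.624, 0.481, 0.726, 0.592, 0.674, 0.521, 0.754,
]

aggregate = sum(session_composites) / len(session_composites)
print(round(aggregate, 3))  # 0.551, matching the composite line in the summary
```

If this reading is right, the two 0.000 failures drag the headline score down noticeably; the mean over the 17 completed sessions alone would be about 0.616.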