Per direction shift (the-nexus#542). Replaces the autolora repo (1,500 lines of custom pipeline code) with config files for existing tools:

- axolotl.yaml: replaces train_modal.py (239 lines)
- mlx-lora.yaml: replaces MLX training scripts
- eval-tasks.yaml: replaces run_eval.py (300 lines)
- Makefile: replaces run_vibes.py, compare.py, convert_to_mlx.py

Data migrated as-is:

- curated_dataset.jsonl (26 gold-standard conversations)
- preference_pairs.jsonl (DPO pairs)
- prompts_vibes.yaml, prompts_nexus_vibes.yaml
- v0-baseline eval results (historical record)

Thin glue kept:

- build_curated.py (data authoring, not infrastructure)
- ingest_trajectories.py (domain-specific quality filter)

Dependencies: pip install axolotl mlx-lm lm-evaluation-harness
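For a sense of the shape axolotl.yaml takes, here is a minimal LoRA config sketch. This is illustrative only: the base model, dataset type, and every hyperparameter below are placeholders, not the values in the committed config.

```yaml
# Sketch of an axolotl LoRA config -- all values are placeholder
# assumptions, not the committed axolotl.yaml.
base_model: NousResearch/Hermes-3-Llama-3.1-8B

datasets:
  - path: curated_dataset.jsonl
    type: chat_template

adapter: lora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

sequence_len: 4096
micro_batch_size: 1
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 2.0e-4
optimizer: adamw_torch
lr_scheduler: cosine

output_dir: ./outputs/lora
```

Training then reduces to `axolotl train axolotl.yaml`, which is the point of the swap: hyperparameters live in a reviewable config rather than in pipeline code.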
Evaluating 19 sessions against hermes3:8b
Skipping sessions with >20 human turns
------------------------------------------------------------
[1/19] 20260323_154745_7c9379 (1 human turns)... composite=0.563 (17.3s)
[2/19] 20260323_155153_bbc661 (1 human turns)... composite=0.681 (36.7s)
[3/19] 20260323_155458_a7563d (1 human turns)... composite=0.521 (26.7s)
[4/19] 20260323_160415_267a50 (1 human turns)... composite=0.000 (0.0s)
[5/19] 20260323_160533_edc7cf (4 human turns)... composite=0.692 (35.1s)
[6/19] 20260323_160633_6fab92 (1 human turns)... composite=0.000 (0.0s)
[7/19] 20260323_160736_754836 (1 human turns)... composite=0.628 (7.4s)
[8/19] 20260323_160920_c90867 (2 human turns)... composite=0.572 (34.5s)
[9/19] 20260323_161348_d553f0 (1 human turns)... composite=0.634 (120.1s)
[10/19] 20260323_162017_83fe11 (1 human turns)... composite=0.562 (59.7s)
[11/19] 20260323_162338_0267eb (2 human turns)... composite=0.612 (40.2s)
[12/19] 20260323_162704_86f56f (1 human turns)... composite=0.623 (8.2s)
[13/19] 20260323_163322_e45fa8 (1 human turns)... composite=0.624 (8.2s)
[14/19] 20260323_163943_9ac5e0 (1 human turns)... composite=0.646 (120.0s)
[15/19] 20260323_164605_a4aedf (1 human turns)... composite=0.644 (120.0s)
[16/19] 20260323_164606_c000d8 (2 human turns)... composite=0.602 (26.8s)
[17/19] 20260323_165414_a92bf6 (1 human turns)... composite=0.632 (18.3s)
[18/19] 20260323_165533_940c49 (1 human turns)... composite=0.521 (17.9s)
[19/19] 20260323_170039_2a191f (1 human turns)... composite=0.754 (8.2s)

============================================================
RESULTS — hermes3:8b
============================================================
tool_selection    : 0.895
length_ratio      : 0.221
format_compliance : 0.892
text_similarity   : 0.039
brevity           : 0.719
composite         : 0.553

Saved to ../evals/v0-baseline/8b-full/scores.json
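As a sanity check on the summary block: the reported composite is consistent with an unweighted mean of the five subscores ((0.895 + 0.221 + 0.892 + 0.039 + 0.719) / 5 ≈ 0.553). A minimal sketch of that aggregation, assuming equal weights (the actual scoring in eval-tasks.yaml may weight metrics differently):

```python
# Sketch: recompute the composite as the unweighted mean of the
# per-metric scores reported above. Equal weighting is an
# assumption; the real harness may apply per-metric weights.
scores = {
    "tool_selection": 0.895,
    "length_ratio": 0.221,
    "format_compliance": 0.892,
    "text_similarity": 0.039,
    "brevity": 0.719,
}

composite = sum(scores.values()) / len(scores)
print(round(composite, 3))  # 0.553, matching the reported value
```

The low text_similarity (0.039) and length_ratio (0.221) drag the composite well below the strong tool_selection and format_compliance scores, which equal weighting makes easy to see.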