feat: migrate autolora pipeline into training/
Per direction shift (the-nexus#542). Replaces the autolora repo
(1,500 lines of custom pipeline code) with config files for existing tools:

- axolotl.yaml: replaces train_modal.py (239 lines)
- mlx-lora.yaml: replaces MLX training scripts
- eval-tasks.yaml: replaces run_eval.py (300 lines)
- Makefile: replaces run_vibes.py, compare.py, convert_to_mlx.py

Data migrated as-is:

- curated_dataset.jsonl (26 gold-standard conversations)
- preference_pairs.jsonl (DPO pairs)
- prompts_vibes.yaml, prompts_nexus_vibes.yaml
- v0-baseline eval results (historical record)

Thin glue kept:

- build_curated.py (data authoring, not infrastructure)
- ingest_trajectories.py (domain-specific quality filter)

Dependencies: pip install axolotl mlx-lm lm-evaluation-harness
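The commit message says axolotl.yaml now carries what train_modal.py did in 239 lines of code. The actual axolotl.yaml is not shown in this diff; as a rough illustration of the declarative style being adopted, a minimal QLoRA config using axolotl's schema might resemble the sketch below. The base-model id and every hyperparameter are assumptions for illustration; only the dataset filename comes from the commit message.

```yaml
# Illustrative sketch ONLY -- not the committed training/axolotl.yaml.
# Keys follow axolotl's config schema; values here are assumptions.
base_model: NousResearch/Hermes-3-Llama-3.1-8B  # assumed HF id for hermes3
load_in_4bit: true
adapter: qlora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05

datasets:
  - path: training/curated_dataset.jsonl  # migrated as-is (see above)
    type: chat_template

sequence_len: 4096
micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 0.0002
output_dir: training/outputs/timmy-lora
```

With recent axolotl releases, training then reduces to `axolotl train training/axolotl.yaml`, which is the point of the migration: configuration instead of pipeline code.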
training/eval-tasks.yaml (new file, 26 lines)
@@ -0,0 +1,26 @@
# Eval Config — lm-evaluation-harness
# Replaces: autolora/eval/run_eval.py (300 lines)
#
# Usage:
#   lm_eval --model local-completions \
#     --model_args model=timmy:v0.1-q4,base_url=http://localhost:11434/v1 \
#     --tasks hellaswag,truthfulqa_mc2,arc_challenge \
#     --output_path training/evals_archive/
#
# For custom Timmy-specific evals, use the vibes check (see Makefile).
# The vibes check is manual by design — you read the output and judge.

# Standard benchmarks to run against each model version
benchmarks:
  - hellaswag       # Common sense reasoning
  - truthfulqa_mc2  # Honesty / factuality
  - arc_challenge   # Science reasoning
  - winogrande      # Coreference resolution

# Models to compare
models:
  baseline: hermes3:latest
  candidate: timmy:v0.1-q4

# Ollama endpoint
endpoint: http://localhost:11434/v1
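The config above is consumed by hand-typed lm_eval invocations (see the Usage comment), one per model. If you wanted to see how the YAML fields map onto the CLI flags, the expansion can be sketched in a few lines of Python. This is a hypothetical illustration, not part of the commit — the commit deliberately replaces such glue scripts with direct CLI usage; the config dict below mirrors eval-tasks.yaml inline so the sketch needs no YAML dependency.

```python
# Hypothetical sketch: expand eval-tasks.yaml into lm_eval command lines.
# Not part of the commit -- shown only to make the YAML -> CLI mapping explicit.

# Inline mirror of training/eval-tasks.yaml (already-parsed form).
CONFIG = {
    "benchmarks": ["hellaswag", "truthfulqa_mc2", "arc_challenge", "winogrande"],
    "models": {"baseline": "hermes3:latest", "candidate": "timmy:v0.1-q4"},
    "endpoint": "http://localhost:11434/v1",
}


def lm_eval_command(model: str, cfg: dict = CONFIG) -> str:
    """Build the lm_eval invocation for one model from the config."""
    tasks = ",".join(cfg["benchmarks"])  # lm_eval takes a comma-separated task list
    return (
        "lm_eval --model local-completions "
        f"--model_args model={model},base_url={cfg['endpoint']} "
        f"--tasks {tasks} "
        "--output_path training/evals_archive/"
    )


if __name__ == "__main__":
    # One invocation per model in the comparison (baseline vs candidate).
    for role, model in CONFIG["models"].items():
        print(f"# {role}")
        print(lm_eval_command(model))
```

Running the two printed commands against the same `--output_path` leaves both result sets side by side in training/evals_archive/ for comparison.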