Files
timmy-config/training/README.md
perplexity 6507cffc15 feat: migrate autolora pipeline into training/
Per direction shift (the-nexus#542).

Replaces the autolora repo (1,500 lines of custom pipeline code)
with config files for existing tools:

- axolotl.yaml: replaces train_modal.py (239 lines)
- mlx-lora.yaml: replaces MLX training scripts
- eval-tasks.yaml: replaces run_eval.py (300 lines)
- Makefile: replaces run_vibes.py, compare.py, convert_to_mlx.py

Data migrated as-is:
- curated_dataset.jsonl (26 gold-standard conversations)
- preference_pairs.jsonl (DPO pairs)
- prompts_vibes.yaml, prompts_nexus_vibes.yaml
- v0-baseline eval results (historical record)

Thin glue kept:
- build_curated.py (data authoring, not infrastructure)
- ingest_trajectories.py (domain-specific quality filter)

Dependencies: pip install axolotl mlx-lm lm-evaluation-harness
2026-03-25 23:05:50 +00:00

65 lines
2.5 KiB
Markdown

# Training
LoRA fine-tuning pipeline for Timmy's sovereign model. No custom harness — just config files for existing tools.
Replaces the `autolora` repo (1,500 lines of custom code → config + `make`).
## Install
```bash
pip install axolotl mlx-lm lm-evaluation-harness pyyaml
```
## Commands
```bash
make train-local # LoRA on Apple Silicon (MLX) — free, ~30 min on M3 Max
make train-cloud # QLoRA on cloud GPU (Axolotl) — ~$1/run on A100
make eval # Standard benchmarks via lm-eval-harness against Ollama
make vibes # Hand-picked prompts → human review (the sacred test)
make ingest # Pull heartbeat trajectories into training data
make curated # Regenerate curated exemplar dataset
make convert # Convert merged data to MLX train/valid format
make help # Show all targets
```
## Files
```
training/
├── Makefile ← All commands
├── axolotl.yaml ← Cloud training config (replaces train_modal.py)
├── mlx-lora.yaml ← Local training config (Apple Silicon)
├── eval-tasks.yaml ← Benchmark config
├── build_curated.py ← Exemplar data authoring (the soul conversations)
├── ingest_trajectories.py ← Quality filter for heartbeat cycle data
└── data/
├── curated_dataset.jsonl ← 26 gold-standard conversations (proprietary)
├── preference_pairs.jsonl ← DPO preference pairs (proprietary)
├── prompts_vibes.yaml ← Custom eval prompts
├── prompts_nexus_vibes.yaml ← Nexus-specific eval prompts
└── mlx_curated/ ← MLX-format train/valid splits
```
## What's proprietary
The data (curated exemplars, preference pairs, trained weights) is proprietary. The configs and process are open.
## Training Results (March 2026)
### timmy:v0.1-q4
| Detail | Value |
|--------|-------|
| Base model | mlx-community/Hermes-3-Llama-3.1-8B-4bit |
| Training data | 1,214 samples from Hermes session DB |
| Method | LoRA rank 8, 16 layers, lr 2e-6, 1000 iters |
| Peak memory | 7.8 GB (Apple Silicon) |
| Best val loss | 2.134 (iter 800) |
| Final model | timmy:v0.1-q4 in Ollama (4.9GB, Q4_K_M) |
| Inference speed | ~48 tok/s on M3 Max |
### Key Insight
The base model's RLHF priors override LoRA on crisis/faith — the most important parts of SOUL.md. Fix: inference-time grounding (inject SOUL.md crisis protocol) + larger pure-Timmy corpus over time.