78 lines
2.9 KiB
Markdown
78 lines
2.9 KiB
Markdown
# Training
|
|
|
|
Transitional training recipes for Timmy's sovereign model. These files are
|
|
useful as reference configs and export helpers, but they are not the canonical
|
|
home of Timmy's lived training data.
|
|
|
|
Canonical data should live in `timmy-home` under gameplay trajectories,
|
|
research artifacts, and `training-data/` exports such as DPO pairs.
|
|
|
|
## Install
|
|
|
|
```bash
|
|
pip install axolotl mlx-lm lm-evaluation-harness pyyaml
|
|
```
|
|
|
|
## Commands
|
|
|
|
```bash
|
|
make train-local # LoRA on Apple Silicon (MLX) — free, ~30 min on M3 Max
|
|
make train-cloud # QLoRA on cloud GPU (Axolotl) — ~$1/run on A100
|
|
make eval # Standard benchmarks via lm-eval-harness against Ollama
|
|
make vibes # Hand-picked prompts → human review (the sacred test)
|
|
make ingest # Pull heartbeat trajectories into training data
|
|
make curated # Regenerate curated exemplar dataset
|
|
make convert # Convert merged data to MLX train/valid format
|
|
make help # Show all targets
|
|
```
|
|
|
|
## Status
|
|
|
|
This directory exists to avoid re-growing a bespoke training harness while the
|
|
system boundary is being cleaned up.
|
|
|
|
- Keep thin recipes and export helpers here only when they directly support the
|
|
Hermes sidecar.
|
|
- Keep generated data, DPO pairs, and other lived artifacts in `timmy-home`.
|
|
- Prefer deleting stale pipeline code over expanding it.
|
|
|
|
## Files
|
|
|
|
```
|
|
training/
|
|
├── Makefile ← All commands
|
|
├── axolotl.yaml ← Cloud training config (replaces train_modal.py)
|
|
├── mlx-lora.yaml ← Local training config (Apple Silicon)
|
|
├── eval-tasks.yaml ← Benchmark config
|
|
├── build_curated.py ← Exemplar data authoring (the soul conversations)
|
|
├── ingest_trajectories.py ← Quality filter for heartbeat cycle data
|
|
└── data/
|
|
├── curated_dataset.jsonl ← 26 gold-standard conversations (proprietary)
|
|
├── preference_pairs.jsonl ← DPO preference pairs (proprietary)
|
|
├── prompts_vibes.yaml ← Custom eval prompts
|
|
├── prompts_nexus_vibes.yaml ← Nexus-specific eval prompts
|
|
└── mlx_curated/ ← MLX-format train/valid splits
|
|
```
|
|
|
|
## What's proprietary
|
|
|
|
The data (curated exemplars, preference pairs, trained weights) is proprietary. The configs and process are open.
|
|
|
|
## Training Results (March 2026)
|
|
|
|
### timmy:v0.1-q4
|
|
|
|
| Detail | Value |
|
|
|--------|-------|
|
|
| Base model | mlx-community/Hermes-3-Llama-3.1-8B-4bit |
|
|
| Training data | 1,214 samples from Hermes session DB |
|
|
| Method | LoRA rank 8, 16 layers, lr 2e-6, 1000 iters |
|
|
| Peak memory | 7.8 GB (Apple Silicon) |
|
|
| Best val loss | 2.134 (iter 800) |
|
|
| Final model | timmy:v0.1-q4 in Ollama (4.9GB, Q4_K_M) |
|
|
| Inference speed | ~48 tok/s on M3 Max |
|
|
|
|
### Key Insight
|
|
|
|
The base model's RLHF priors override LoRA on crisis/faith — the most important parts of SOUL.md. Fix: inference-time grounding (inject SOUL.md crisis protocol) + larger pure-Timmy corpus over time.
|