training/README.md

# Training

Transitional training recipes for Timmy's sovereign model. These files are
useful as reference configs and export helpers, but they are not the canonical
home of Timmy's lived training data.

Canonical data should live in `timmy-home` under gameplay trajectories,
research artifacts, and `training-data/` exports such as DPO pairs.

## Install

```bash
pip install axolotl mlx-lm lm-evaluation-harness pyyaml
```

## Commands

```bash
make train-local    # LoRA on Apple Silicon (MLX) — free, ~30 min on M3 Max
make train-cloud    # QLoRA on cloud GPU (Axolotl) — ~$1/run on A100
make eval           # Standard benchmarks via lm-eval-harness against Ollama
make vibes          # Hand-picked prompts → human review (the sacred test)
make ingest         # Pull heartbeat trajectories into training data
make curated        # Regenerate curated exemplar dataset
make convert        # Convert merged data to MLX train/valid format
make help           # Show all targets
```

## Status

This directory exists to avoid re-growing a bespoke training harness while the
system boundary is being cleaned up.

- Keep thin recipes and export helpers here only when they directly support the
  Hermes sidecar.
- Keep generated data, DPO pairs, and other lived artifacts in `timmy-home`.
- Prefer deleting stale pipeline code over expanding it.

## Files

```
training/
├── Makefile                 ← All commands
├── axolotl.yaml             ← Cloud training config (replaces train_modal.py)
├── mlx-lora.yaml            ← Local training config (Apple Silicon)
├── eval-tasks.yaml          ← Benchmark config
├── build_curated.py         ← Exemplar data authoring (the soul conversations)
├── ingest_trajectories.py   ← Quality filter for heartbeat cycle data
└── data/
    ├── curated_dataset.jsonl       ← 26 gold-standard conversations (proprietary)
    ├── preference_pairs.jsonl      ← DPO preference pairs (proprietary)
    ├── prompts_vibes.yaml          ← Custom eval prompts
    ├── prompts_nexus_vibes.yaml    ← Nexus-specific eval prompts
    └── mlx_curated/                ← MLX-format train/valid splits
```

## What's proprietary

The data (curated exemplars, preference pairs, trained weights) is proprietary. The configs and process are open.

## Training Results (March 2026)

### timmy:v0.1-q4

| Detail | Value |
|--------|-------|
| Base model | mlx-community/Hermes-3-Llama-3.1-8B-4bit |
| Training data | 1,214 samples from Hermes session DB |
| Method | LoRA rank 8, 16 layers, lr 2e-6, 1000 iters |
| Peak memory | 7.8 GB (Apple Silicon) |
| Best val loss | 2.134 (iter 800) |
| Final model | timmy:v0.1-q4 in Ollama (4.9GB, Q4_K_M) |
| Inference speed | ~48 tok/s on M3 Max |

### Key Insight

The base model's RLHF priors override LoRA on crisis/faith — the most important parts of SOUL.md. Fix: inference-time grounding (inject SOUL.md crisis protocol) + larger pure-Timmy corpus over time.
feat: migrate autolora pipeline into training/ Per direction shift (the-nexus#542). Replaces the autolora repo (1,500 lines of custom pipeline code) with config files for existing tools: - axolotl.yaml: replaces train_modal.py (239 lines) - mlx-lora.yaml: replaces MLX training scripts - eval-tasks.yaml: replaces run_eval.py (300 lines) - Makefile: replaces run_vibes.py, compare.py, convert_to_mlx.py Data migrated as-is: - curated_dataset.jsonl (26 gold-standard conversations) - preference_pairs.jsonl (DPO pairs) - prompts_vibes.yaml, prompts_nexus_vibes.yaml - v0-baseline eval results (historical record) Thin glue kept: - build_curated.py (data authoring, not infrastructure) - ingest_trajectories.py (domain-specific quality filter) Dependencies: pip install axolotl mlx-lm lm-evaluation-harness 2026-03-25 23:05:45 +00:00			`# Training`

docs: clarify sidecar boundary and training status 2026-03-27 17:15:57 -04:00			`Transitional training recipes for Timmy's sovereign model. These files are`
			`useful as reference configs and export helpers, but they are not the canonical`
			`home of Timmy's lived training data.`
feat: migrate autolora pipeline into training/ Per direction shift (the-nexus#542). Replaces the autolora repo (1,500 lines of custom pipeline code) with config files for existing tools: - axolotl.yaml: replaces train_modal.py (239 lines) - mlx-lora.yaml: replaces MLX training scripts - eval-tasks.yaml: replaces run_eval.py (300 lines) - Makefile: replaces run_vibes.py, compare.py, convert_to_mlx.py Data migrated as-is: - curated_dataset.jsonl (26 gold-standard conversations) - preference_pairs.jsonl (DPO pairs) - prompts_vibes.yaml, prompts_nexus_vibes.yaml - v0-baseline eval results (historical record) Thin glue kept: - build_curated.py (data authoring, not infrastructure) - ingest_trajectories.py (domain-specific quality filter) Dependencies: pip install axolotl mlx-lm lm-evaluation-harness 2026-03-25 23:05:45 +00:00
docs: clarify sidecar boundary and training status 2026-03-27 17:15:57 -04:00			Canonical data should live in `timmy-home` under gameplay trajectories,
			research artifacts, and `training-data/` exports such as DPO pairs.
feat: migrate autolora pipeline into training/ Per direction shift (the-nexus#542). Replaces the autolora repo (1,500 lines of custom pipeline code) with config files for existing tools: - axolotl.yaml: replaces train_modal.py (239 lines) - mlx-lora.yaml: replaces MLX training scripts - eval-tasks.yaml: replaces run_eval.py (300 lines) - Makefile: replaces run_vibes.py, compare.py, convert_to_mlx.py Data migrated as-is: - curated_dataset.jsonl (26 gold-standard conversations) - preference_pairs.jsonl (DPO pairs) - prompts_vibes.yaml, prompts_nexus_vibes.yaml - v0-baseline eval results (historical record) Thin glue kept: - build_curated.py (data authoring, not infrastructure) - ingest_trajectories.py (domain-specific quality filter) Dependencies: pip install axolotl mlx-lm lm-evaluation-harness 2026-03-25 23:05:45 +00:00
			`## Install`

			```bash
			`pip install axolotl mlx-lm lm-evaluation-harness pyyaml`
			```

			`## Commands`

			```bash
			`make train-local # LoRA on Apple Silicon (MLX) — free, ~30 min on M3 Max`
			`make train-cloud # QLoRA on cloud GPU (Axolotl) — ~$1/run on A100`
			`make eval # Standard benchmarks via lm-eval-harness against Ollama`
			`make vibes # Hand-picked prompts → human review (the sacred test)`
			`make ingest # Pull heartbeat trajectories into training data`
			`make curated # Regenerate curated exemplar dataset`
			`make convert # Convert merged data to MLX train/valid format`
			`make help # Show all targets`
			```

docs: clarify sidecar boundary and training status 2026-03-27 17:15:57 -04:00			`## Status`

			`This directory exists to avoid re-growing a bespoke training harness while the`
			`system boundary is being cleaned up.`

			`- Keep thin recipes and export helpers here only when they directly support the`
			`Hermes sidecar.`
			- Keep generated data, DPO pairs, and other lived artifacts in `timmy-home`.
			`- Prefer deleting stale pipeline code over expanding it.`

feat: migrate autolora pipeline into training/ Per direction shift (the-nexus#542). Replaces the autolora repo (1,500 lines of custom pipeline code) with config files for existing tools: - axolotl.yaml: replaces train_modal.py (239 lines) - mlx-lora.yaml: replaces MLX training scripts - eval-tasks.yaml: replaces run_eval.py (300 lines) - Makefile: replaces run_vibes.py, compare.py, convert_to_mlx.py Data migrated as-is: - curated_dataset.jsonl (26 gold-standard conversations) - preference_pairs.jsonl (DPO pairs) - prompts_vibes.yaml, prompts_nexus_vibes.yaml - v0-baseline eval results (historical record) Thin glue kept: - build_curated.py (data authoring, not infrastructure) - ingest_trajectories.py (domain-specific quality filter) Dependencies: pip install axolotl mlx-lm lm-evaluation-harness 2026-03-25 23:05:45 +00:00			`## Files`

			```
			`training/`
			`├── Makefile ← All commands`
			`├── axolotl.yaml ← Cloud training config (replaces train_modal.py)`
			`├── mlx-lora.yaml ← Local training config (Apple Silicon)`
			`├── eval-tasks.yaml ← Benchmark config`
			`├── build_curated.py ← Exemplar data authoring (the soul conversations)`
			`├── ingest_trajectories.py ← Quality filter for heartbeat cycle data`
			`└── data/`
			`├── curated_dataset.jsonl ← 26 gold-standard conversations (proprietary)`
			`├── preference_pairs.jsonl ← DPO preference pairs (proprietary)`
			`├── prompts_vibes.yaml ← Custom eval prompts`
			`├── prompts_nexus_vibes.yaml ← Nexus-specific eval prompts`
			`└── mlx_curated/ ← MLX-format train/valid splits`
			```

			`## What's proprietary`

			`The data (curated exemplars, preference pairs, trained weights) is proprietary. The configs and process are open.`

			`## Training Results (March 2026)`

			`### timmy:v0.1-q4`

			`\| Detail \| Value \|`
			`\|--------\|-------\|`
			`\| Base model \| mlx-community/Hermes-3-Llama-3.1-8B-4bit \|`
			`\| Training data \| 1,214 samples from Hermes session DB \|`
			`\| Method \| LoRA rank 8, 16 layers, lr 2e-6, 1000 iters \|`
			`\| Peak memory \| 7.8 GB (Apple Silicon) \|`
			`\| Best val loss \| 2.134 (iter 800) \|`
			`\| Final model \| timmy:v0.1-q4 in Ollama (4.9GB, Q4_K_M) \|`
			`\| Inference speed \| ~48 tok/s on M3 Max \|`

			`### Key Insight`

			`The base model's RLHF priors override LoRA on crisis/faith — the most important parts of SOUL.md. Fix: inference-time grounding (inject SOUL.md crisis protocol) + larger pure-Timmy corpus over time.`