timmy-config/training/README.md
Alexander Whitestone f05707254e
docs: Add training pair provenance tracking documentation
Documents the new provenance tracking feature.
2026-04-15 16:03:48 +00:00


# Training
Transitional training recipes for Timmy's sovereign model. These files are
useful as reference configs and export helpers, but they are not the canonical
home of Timmy's lived training data.
Canonical data should live in `timmy-home` under gameplay trajectories,
research artifacts, and `training-data/` exports such as DPO pairs.
## Install
```bash
pip install axolotl mlx-lm lm-evaluation-harness pyyaml
```
## Commands
```bash
make train-local # LoRA on Apple Silicon (MLX) — free, ~30 min on M3 Max
make train-cloud # QLoRA on cloud GPU (Axolotl) — ~$1/run on A100
make eval # Standard benchmarks via lm-eval-harness against Ollama
make vibes # Hand-picked prompts → human review (the sacred test)
make ingest # Pull heartbeat trajectories into training data
make curated # Regenerate curated exemplar dataset
make convert # Convert merged data to MLX train/valid format
make help # Show all targets
```
## Status
This directory exists to avoid re-growing a bespoke training harness while the
system boundary is being cleaned up.
- Keep thin recipes and export helpers here only when they directly support the
Hermes sidecar.
- Keep generated data, DPO pairs, and other lived artifacts in `timmy-home`.
- Prefer deleting stale pipeline code over expanding it.
## Files
```
training/
├── Makefile ← All commands
├── axolotl.yaml ← Cloud training config (replaces train_modal.py)
├── mlx-lora.yaml ← Local training config (Apple Silicon)
├── eval-tasks.yaml ← Benchmark config
├── build_curated.py ← Exemplar data authoring (the soul conversations)
├── ingest_trajectories.py ← Quality filter for heartbeat cycle data
└── data/
├── curated_dataset.jsonl ← 26 gold-standard conversations (proprietary)
├── preference_pairs.jsonl ← DPO preference pairs (proprietary)
├── prompts_vibes.yaml ← Custom eval prompts
├── prompts_nexus_vibes.yaml ← Nexus-specific eval prompts
└── mlx_curated/ ← MLX-format train/valid splits
```
## What's proprietary
The data (curated exemplars, preference pairs, trained weights) is proprietary. The configs and process are open.
## Training Results (March 2026)
### timmy:v0.1-q4
| Detail | Value |
|--------|-------|
| Base model | mlx-community/Hermes-3-Llama-3.1-8B-4bit |
| Training data | 1,214 samples from Hermes session DB |
| Method | LoRA rank 8, 16 layers, lr 2e-6, 1000 iters |
| Peak memory | 7.8 GB (Apple Silicon) |
| Best val loss | 2.134 (iter 800) |
| Final model | timmy:v0.1-q4 in Ollama (4.9GB, Q4_K_M) |
| Inference speed | ~48 tok/s on M3 Max |
### Key Insight
The base model's RLHF priors override the LoRA on crisis/faith behavior — the most important parts of SOUL.md. The fix: inference-time grounding (injecting the SOUL.md crisis protocol into the prompt) plus growing a larger pure-Timmy corpus over time.
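Inference-time grounding can be sketched as prepending the crisis protocol to the system prompt before each call to the fine-tuned model. This is a minimal illustration, not the repo's actual wiring; the function name and the assumption that SOUL.md sits at the repo root are hypothetical.

```python
# Hypothetical sketch of inference-time grounding: inject the SOUL.md
# crisis protocol as a system message so RLHF priors don't override it.
# The file path and message shape are assumptions, not the repo's code.

def build_grounded_messages(user_message, soul_path="SOUL.md"):
    """Prepend the crisis protocol to the conversation as a system message."""
    with open(soul_path) as f:
        crisis_protocol = f.read()
    return [
        {
            "role": "system",
            "content": f"Follow this crisis protocol exactly:\n{crisis_protocol}",
        },
        {"role": "user", "content": user_message},
    ]
```

The grounded message list can then be passed to whatever chat endpoint serves the model (e.g. Ollama's chat API).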
## Training Pair Provenance Tracking
The provenance tracker records where each training pair came from, enabling quality filtering and reporting over the dataset.
### Features
- **Metadata tracking**: Each pair gets provenance metadata:
- `source_session_id`: Which session generated the pair
- `model`: Which model generated it
- `timestamp`: When it was generated
- `source`: Source type (curated, trajectory, etc.)
- `content_hash`: For deduplication
- **Filtering**: Filter pairs by provenance criteria:
- Exclude specific models (e.g., Anthropic models)
- Exclude specific sources
- Filter by timestamp range
- **Reporting**: Generate reports showing:
- Pair count by source model
- Pair count by source type
- Exclusion statistics
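The metadata fields above can be sketched as a small annotation step. This is an illustrative implementation, not the actual schema of `training_pair_provenance.py`: the nested `provenance` key and the SHA-256-over-canonical-JSON hash are assumptions.

```python
import hashlib
import json

def content_hash(pair):
    """Stable hash of the pair's content for deduplication.

    Canonical JSON (sorted keys, no whitespace) makes the hash
    independent of key order in the source file.
    """
    canonical = json.dumps(pair, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def with_provenance(pair, session_id, model, source, timestamp):
    """Attach the provenance fields listed above (schema is an assumption)."""
    return {
        **pair,
        "provenance": {
            "source_session_id": session_id,
            "model": model,
            "timestamp": timestamp,
            "source": source,
            "content_hash": content_hash(pair),
        },
    }
```

Hashing only the pair's content (not its metadata) means two pairs with identical text but different sessions still collide, which is exactly what deduplication wants.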
### Usage
```bash
# Add provenance to existing dataset
python3 training_pair_provenance.py --input data/curated_dataset.jsonl --output data/curated_with_provenance.jsonl
# Filter out Anthropic-sourced pairs
python3 training_pair_provenance.py --input data/curated_dataset.jsonl --filter exclude_anthropic
# Generate provenance report
python3 training_pair_provenance.py --input data/curated_dataset.jsonl --report
# JSON report
python3 training_pair_provenance.py --input data/curated_dataset.jsonl --report --json
```
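The flags used above could be wired roughly as follows. This is a hypothetical sketch of the argument parsing, not the real `training_pair_provenance.py`; the set of named filters is an assumption based on the examples.

```python
# Hypothetical sketch of the CLI surface shown above; the real script
# may define different choices, defaults, and help text.
import argparse

def build_parser():
    p = argparse.ArgumentParser(description="Training pair provenance tool")
    p.add_argument("--input", required=True, help="Input JSONL dataset")
    p.add_argument("--output", help="Write annotated pairs to this JSONL file")
    p.add_argument("--filter", choices=["exclude_anthropic"],
                   help="Apply a named provenance filter")
    p.add_argument("--report", action="store_true",
                   help="Print a provenance report")
    p.add_argument("--json", action="store_true",
                   help="Emit the report as JSON instead of text")
    return p
```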
### Integration
The provenance tracker can be integrated into existing pipelines:
```python
from training_pair_provenance import ProvenanceTracker

tracker = ProvenanceTracker()

# Attach provenance metadata to each pair
processed_pairs = [tracker.process_pair(pair) for pair in pairs]

# Filter out pairs generated by excluded models
filtered = tracker.filter_by_provenance(
    processed_pairs,
    exclude_models=["anthropic/claude-3-opus"],
)

# Print a provenance summary report
print(tracker.generate_report())
```
### Testing
```bash
python3 -m pytest training/test_training_pair_provenance.py -v
```