Compare commits: step35/594 ... step35/486 (1 commit, f3dd9f3831)

docs/local-fine-tuning-guide.md (new file, 119 lines)
@@ -0,0 +1,119 @@
# Local Model Fine-tuning Guide

This document ties together the local fine-tuning infrastructure (Ollama + llama.cpp) for Timmy Foundation.

## Overview

Local fine-tuning lets us improve Hermes 4 and other models **on our own data** without sending anything to the cloud. The stack:

| Layer | Tool | Purpose |
|-------|------|---------|
| **Serving** | `ollama` | Local OpenAI-compatible inference API |
| **Backend** | `llama.cpp` | GGUF model execution (CPU/GPU) |
| **Training** | `mlx-lm` (Apple Silicon) / `axolotl` (cloud GPU) | LoRA/QLoRA fine-tuning |
| **Data** | `training/data/` | JSONL DPO and SFT datasets |
| **Configs** | `training/mlx-lora.yaml`, `training/axolotl.yaml` | Training hyperparameters |
## Quick Start — Fine-tune on PR Data

### 1. Install prerequisites

```bash
# Local (MLX) — free, Apple Silicon only
pip install mlx-lm pyyaml

# Cloud (Axolotl) — ~$1 per run, any GPU
pip install axolotl
```
### 2. Extract PR data

```bash
# Generate DPO pairs from merged PR history (up to 50)
python3 scripts/generate_pr_dpo_pairs.py --limit 50

# Include full diffs (larger files, slower)
python3 scripts/generate_pr_dpo_pairs.py --include-diff --limit 20
```

Output: `training/data/pr_dpo_pairs.jsonl`
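Each line of that file is one JSON object. A quick way to sanity-check the output from Python, using only the field names promised by the script's docstring (the sample record below is illustrative, not real PR data):

```python
import json

REQUIRED_KEYS = {"prompt", "chosen", "rejected", "meta"}

def validate_pair(line: str) -> dict:
    """Parse one JSONL line and check it has the expected DPO fields."""
    record = json.loads(line)
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        raise ValueError(f"record missing keys: {missing}")
    return record

# Illustrative record shaped like one line of pr_dpo_pairs.jsonl.
sample = json.dumps({
    "prompt": "Fix flaky retry logic\n\nRetries should back off exponentially.",
    "chosen": "diff --git a/retry.py b/retry.py ...",
    "rejected": None,
    "meta": {"pr_number": 123, "user": "timmy"},
})
record = validate_pair(sample)  # raises if a field is missing
```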
### 3. Train locally (M3 Mac)

```bash
cd training
make train-local   # LoRA via MLX — ~30 min on M3 Max
```

Or on a cloud GPU:

```bash
make train-cloud   # QLoRA via Axolotl — ~$1 on A100
```
### 4. Evaluate

```bash
# Standard benchmarks
make eval

# Hand-picked vibes check
make vibes
```
### 5. Deploy to Ollama

Once training finishes, create a Modelfile:

```Dockerfile
FROM hermes4:14b
ADAPTER ./output/hermes4-14b-timmy
```

```bash
ollama create timmy-v1 -f Modelfile
ollama run timmy-v1
```
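Since Ollama serves a local OpenAI-compatible API (see the stack table above), the deployed model can then be queried from Python. A minimal sketch, assuming a default Ollama install listening on port 11434 and the `timmy-v1` model created above:

```python
import json
from urllib.request import Request, urlopen

OLLAMA_URL = "http://localhost:11434"  # Ollama's default port (assumes a default install)

def build_chat_request(model: str, prompt: str) -> Request:
    """Build a POST against Ollama's OpenAI-compatible chat endpoint."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return Request(
        f"{OLLAMA_URL}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(model: str, prompt: str) -> str:
    """Send one chat turn and return the model's reply text."""
    with urlopen(build_chat_request(model, prompt), timeout=120) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# Example (requires a running Ollama daemon):
# print(ask("timmy-v1", "Summarize the fine-tuning stack in one sentence."))
```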
## Extending the Pipeline

### Merged PR dataset → DPO (issue #480)

The `scripts/generate_pr_dpo_pairs.py` script extracts positive examples from merged PRs.
Future work: add negative examples from closed/reverted PRs to build full DPO pairs.
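That future step could be sketched as a pairing pass over the existing records; `attach_rejected` below is a hypothetical helper (how to match a reverted diff to the right prompt is the open problem):

```python
from typing import Optional

def attach_rejected(pair: dict, reverted_diff: Optional[str]) -> dict:
    """Return a copy of a DPO pair with its rejected slot filled from a
    closed/reverted PR that attempted the same task (hypothetical pairing)."""
    if not reverted_diff:
        return pair
    return {**pair, "rejected": reverted_diff}

positive = {"prompt": "Fix retry logic", "chosen": "diff --git a/retry.py ...",
            "rejected": None, "meta": {}}
full = attach_rejected(positive, "diff --git a/retry.py ...  (reverted attempt)")
```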
### Benchmark inference latency (issue #486 item 2)

```bash
# Quick timing with hyperfine (a Rust CLI; install via Homebrew or cargo, not pip)
brew install hyperfine
hyperfine "ollama run hermes4:14b 'Summarize this code...'"

# Or use scripts/model_eval.py for multi-model comparison
python3 scripts/model_eval.py --models hermes4:14b,qwen2.5-coder:7b --tasks code,reasoning
```
### Quantization (issue #486 item 3)

GGUF quantization options (via llama.cpp); sizes are approximate for a 14B model:

| Quant | Quality | Speed | Size |
|-------|---------|-------|------|
| q4_k_m | Good | Fast | ~8 GB |
| q5_k_s | Better | Medium | ~9 GB |
| q8_0 | Best | Slow | ~14 GB |

Convert via `llama.cpp`'s `quantize` tool or use pre-quantized models from [HuggingFace](https://huggingface.co/TheBloke).

Update the model name in `training/mlx-lora.yaml` and `training/axolotl.yaml` to point to your quantized variant.
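The size column is consistent with simple bits-per-weight arithmetic for a 14B-parameter model; the bits-per-weight figures below are rough averages for each scheme, not exact values:

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate on-disk size: parameter count x bits per weight, in GB."""
    return n_params * bits_per_weight / 8 / 1e9

# Rough average bits/weight per GGUF scheme (approximation, not exact).
SCHEMES = {"q4_k_m": 4.8, "q5_k_s": 5.5, "q8_0": 8.5}

for name, bpw in SCHEMES.items():
    print(f"{name}: ~{quantized_size_gb(14e9, bpw):.1f} GB")
```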
## Architecture Boundaries

- **Configs** (`training/axolotl.yaml`, `training/mlx-lora.yaml`) — sidecar, not forked
- **Scripts** (`scripts/generate_pr_dpo_pairs.py`) — sidecar-managed
- **Data** (`training/data/`) — canonical training data lives in `timmy-home` once mature
- **Models** — downloaded GGUF files live outside the repo, tracked via `fleet/model_pipeline.py`

---

*Last updated: 2026-04-26 (STEP35 burn for #486)*
scripts/generate_pr_dpo_pairs.py (new executable file, 213 lines)
@@ -0,0 +1,213 @@
#!/usr/bin/env python3
"""
generate_pr_dpo_pairs.py — Extract merged PR data as DPO training pairs.

This script addresses issue #480 (sibling of #486) by fetching merged PRs
from Gitea and converting them into preference-pair training data for local
model fine-tuning via Ollama + llama.cpp.

Usage:
    python3 scripts/generate_pr_dpo_pairs.py \
        [--owner Timmy_Foundation] [--repo timmy-config] \
        [--state merged] [--limit 50] \
        [--output training/data/pr_dpo_pairs.jsonl]

Output format (DPO JSONL):
    {"prompt": "...", "chosen": "...", "rejected": null, "meta": {...}}
"""

import argparse
import json
import os
import sys
from pathlib import Path
from typing import Optional
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError


GITEA_URL = "https://forge.alexanderwhitestone.com"
def get_token() -> str:
    """Read Gitea token from ~/.config/gitea/token or $GITEA_TOKEN."""
    token_path = Path.home() / ".config" / "gitea" / "token"
    if token_path.exists():
        return token_path.read_text().strip()
    env_token = os.environ.get("GITEA_TOKEN", "")
    if env_token:
        return env_token
    raise FileNotFoundError(
        "Gitea token not found. Create ~/.config/gitea/token or set $GITEA_TOKEN."
    )

def api(method: str, path: str, token: str, data: Optional[dict] = None) -> dict:
    """Call the Gitea API and return parsed JSON."""
    url = f"{GITEA_URL}/api/v1{path}"
    body = json.dumps(data).encode() if data else None
    req = Request(
        url,
        data=body,
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
        method=method,
    )
    try:
        with urlopen(req, timeout=30) as resp:
            return json.loads(resp.read())
    except HTTPError as e:
        print(f"HTTP {e.code} on {path}: {e.read().decode()[:200]}", file=sys.stderr)
        raise
    except URLError as e:
        print(f"Network error on {path}: {e}", file=sys.stderr)
        raise

def fetch_prs(owner: str, repo: str, state: str, token: str, limit: int) -> list[dict]:
    """Fetch PRs from Gitea with pagination.

    state: "merged" fetches closed PRs and keeps only those with a
    merged_at timestamp; "closed" keeps all closed PRs.
    """
    all_prs = []
    page = 1
    per_page = 50

    while len(all_prs) < limit:
        # Gitea API: /repos/{owner}/{repo}/pulls?state=closed&page=N
        path = f"/repos/{owner}/{repo}/pulls?state=closed&page={page}&per_page={per_page}"
        try:
            prs = api("GET", path, token)
        except Exception as e:
            print(f"Error fetching page {page}: {e}", file=sys.stderr)
            break
        if not prs:
            break
        # Remember the raw page size before filtering, so a short page
        # (end of results) is detected even when the merged filter shrinks it.
        raw_count = len(prs)
        # Filter: merged = has a merged_at timestamp
        if state == "merged":
            prs = [pr for pr in prs if pr.get("merged_at")]
        all_prs.extend(prs)
        if raw_count < per_page:
            break
        page += 1

    return all_prs[:limit]

def fetch_pr_diff(owner: str, repo: str, pr_number: int, token: str) -> Optional[str]:
    """Fetch the unified diff for a PR via Gitea's .diff endpoint.

    The endpoint returns raw diff text, not JSON, so it bypasses api().
    """
    url = f"{GITEA_URL}/api/v1/repos/{owner}/{repo}/pulls/{pr_number}.diff"
    req = Request(url, headers={"Authorization": f"token {token}"})
    try:
        with urlopen(req, timeout=30) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (HTTPError, URLError) as e:
        print(f"  Could not fetch diff for PR#{pr_number}: {e}", file=sys.stderr)
        return None

def make_dpo_pair(pr: dict, diff: Optional[str] = None) -> Optional[dict]:
    """
    Convert a Gitea PR to a DPO training pair.

    prompt   = PR title + body (task description)
    chosen   = merged code diff (the solution)
    rejected = None (future: negative examples)
    """
    title = pr.get("title", "").strip()
    body = (pr.get("body") or "").strip()

    if not title:
        return None

    prompt = title
    if body:
        prompt += f"\n\n{body}"

    chosen = diff or body or "[No code diff available]"
    rejected = None

    pair = {
        "prompt": prompt,
        "chosen": chosen,
        "rejected": rejected,
        "meta": {
            "pr_number": pr["number"],
            "user": pr.get("user", {}).get("login", "unknown"),
            "created_at": pr.get("created_at", ""),
            "state": pr.get("state", ""),
            "merged_at": pr.get("merged_at", ""),
            "labels": [label["name"] for label in pr.get("labels", [])],
        },
    }
    return pair

def main():
    parser = argparse.ArgumentParser(description="Generate DPO pairs from merged PR history")
    parser.add_argument("--owner", default="Timmy_Foundation", help="Repo owner/org")
    parser.add_argument("--repo", default="timmy-config", help="Repo name")
    parser.add_argument("--state", default="merged", choices=["merged", "closed"],
                        help="PR state to fetch — 'merged' for positive pairs")
    parser.add_argument("--limit", type=int, default=50,
                        help="Max PRs to process (default: 50)")
    parser.add_argument("--output",
                        default="training/data/pr_dpo_pairs.jsonl",
                        help="Output JSONL path")
    parser.add_argument("--include-diff", action="store_true",
                        help="Fetch full PR diffs — WARNING: very large output")
    args = parser.parse_args()

    token = get_token()
    output_path = Path(args.output)
    output_path.parent.mkdir(parents=True, exist_ok=True)

    print(f"Fetching up to {args.limit} {args.state} PRs from {args.owner}/{args.repo}…")
    prs = fetch_prs(args.owner, args.repo, args.state, token, args.limit)
    print(f"Found {len(prs)} {args.state} PRs.")

    pairs = []
    skipped = 0

    for i, pr in enumerate(prs, 1):
        pr_num = pr["number"]
        title = pr.get("title", "").strip()

        if not title:
            skipped += 1
            continue

        diff = None
        if args.include_diff:
            diff = fetch_pr_diff(args.owner, args.repo, pr_num, token)
            if diff is None:
                diff = "[Diff unavailable]"

        pair = make_dpo_pair(pr, diff)
        if pair:
            pairs.append(pair)
        else:
            skipped += 1

        if i % 10 == 0 or i == len(prs):
            print(f"  [{i}/{len(prs)}] PR#{pr_num}: {title[:60]:60s} "
                  f"({args.state} since {pr.get('created_at', '?')[:10]}, "
                  f"labels: {[label['name'] for label in pr.get('labels', [])]})")

    with output_path.open("w") as f:
        for pair in pairs:
            f.write(json.dumps(pair, ensure_ascii=False) + "\n")

    print(f"\n✓ Wrote {len(pairs)} DPO pairs to {output_path}")
    if skipped:
        print(f"  ({skipped} PRs skipped — empty title or error)")

    print("\nTo fine-tune with this data:")
    print(f"  1. Verify output quality: head -1 {output_path} | python3 -m json.tool")
    print("  2. For cloud training, add the file to the datasets section of training/axolotl.yaml")
    print("  3. For local training: make train-local (if the pipeline is configured to consume this data)")


if __name__ == "__main__":
    main()
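Until negative examples land, `rejected` is always null, so the same file can double as SFT data. A sketch of converting it to chat-style records; the `messages` shape is an assumption here, so check `training/mlx-lora.yaml` for the format the pipeline actually expects:

```python
import json
from pathlib import Path

def to_sft_record(pair: dict) -> dict:
    """Turn one DPO pair (rejected still null) into a chat-style SFT record."""
    return {"messages": [
        {"role": "user", "content": pair["prompt"]},
        {"role": "assistant", "content": pair["chosen"]},
    ]}

def convert_jsonl(src: Path, dst: Path) -> int:
    """Rewrite a DPO JSONL file as SFT JSONL; returns the record count."""
    count = 0
    with src.open() as fin, dst.open("w") as fout:
        for line in fin:
            fout.write(json.dumps(to_sft_record(json.loads(line)), ensure_ascii=False) + "\n")
            count += 1
    return count
```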
training/data/pr_dpo_pairs.jsonl (new empty file)