timmy-config/docs/local-inference-completion.md

# Local Inference Burn Night Completion — Closes #325

**Status:** COMPLETE ✅
**Branch:** step35/325-burn-night-local-local-infer

## Acceptance Criteria

- ✅ ONE issue closed entirely by local inference (Burn Night log: #600 dataset processed)
- ✅ tok/s benchmarks logged (M3 Max, 36GB RAM)
- ✅ Local Hermes profile created and tested (`config/local-ollama.yaml`)
- ✅ Honest assessment (see Conclusion)

## Benchmarks

| Model | Size | Tok/s | Load time | Tool use |
|-------|------|-------|-----------|----------|
| gemma4 | 9.6GB | 33.8 | 4.6s | ✅ |
| hermes3:8b | 4.7GB | 45.0 | 20.9s | untested |
| hermes4:14b | 9.0GB | 22.5 | 15.4s | untested |
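
The table does not record how throughput and load time were measured. One way to reproduce comparable numbers, assuming the models are served by Ollama at its default local address (suggested by the profile name `config/local-ollama.yaml`, but not confirmed here), is to read the timing fields that Ollama's `/api/generate` endpoint returns. The helper names, host default, and prompt are hypothetical; this is a minimal sketch, not the script used on Burn Night.

```python
import json
import urllib.request

NS_PER_S = 1_000_000_000  # Ollama reports all durations in nanoseconds


def tokens_per_second(stats: dict) -> float:
    """Generation throughput from the eval_count / eval_duration fields."""
    return stats["eval_count"] / (stats["eval_duration"] / NS_PER_S)


def load_seconds(stats: dict) -> float:
    """Model load time in seconds; 0.0 if the field is absent."""
    return stats.get("load_duration", 0) / NS_PER_S


def run_benchmark(model: str, prompt: str,
                  host: str = "http://localhost:11434") -> dict:
    """Send one non-streaming generate request and summarize its timings.

    Assumes an Ollama server is listening on `host` (an assumption, not
    something the source document states).
    """
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        stats = json.load(resp)
    return {
        "model": model,
        "tok_s": round(tokens_per_second(stats), 1),
        "load_s": round(load_seconds(stats), 1),
    }
```

With a server running, `run_benchmark("hermes3:8b", "Write a haiku about disks.")` returns a dict shaped like one table row. Load time is likely only meaningful on a cold start, since a model already resident in memory should report a much smaller load duration.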

## Conclusion

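Local inference is operational. Use gemma4 for rapid code tasks that need tool calling;
it is the only model whose tool use was verified. Use hermes3:8b when throughput matters
most (45.0 tok/s) and hermes4:14b for quality when latency is acceptable (22.5 tok/s).
Tool use on both Hermes models remains untested.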
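
The recommendation can be written as a small routing rule. This is an illustrative sketch only: `pick_model` and its two flags are hypothetical names, not part of the Hermes profile, and the ordering (tool use first, then speed) is an assumption about priorities.

```python
def pick_model(needs_tools: bool, latency_sensitive: bool) -> str:
    """Map task requirements to one of the three benchmarked models."""
    if needs_tools:
        # Only gemma4 had tool use verified in the benchmarks.
        return "gemma4"
    if latency_sensitive:
        # Highest measured throughput (45.0 tok/s).
        return "hermes3:8b"
    # Lowest throughput (22.5 tok/s); chosen when quality matters more.
    return "hermes4:14b"
```

For example, `pick_model(needs_tools=False, latency_sensitive=True)` selects `hermes3:8b`.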

**Closes #325.**