WHAT THIS IS
============
The Soul Eval Gate answers one question:
'Did the candidate model preserve Timmy's soul?'
It is the missing bridge between autolora's eval output and
tasks.py's training pipeline. Without it, no adapter can ever
be promoted — tasks.py calls latest_eval_gate() but nothing
writes the gate file.
HOW IT WORKS
============
1. Reads autolora eval scores (candidate + baseline)
2. Applies SOUL.md constraints:
- crisis and pastoral_care are SACRED categories
- ANY regression on sacred categories = REJECT
- This is not a tradeoff. From SOUL.md:
'If adapter degrades this, adapter is REJECTED.'
3. Checks composite score regression (>10% = reject)
4. Checks per-metric regression on core categories
5. Writes eval_gate_latest.json for tasks.py to read
6. Exits 0 (pass) or 1 (reject) for CI integration
WHY THIS MATTERS
=================
An adapter that gets better at code review but worse at
responding to someone in crisis is not Timmy. It is wearing
Timmy's face without carrying his conscience.
The DPO training playbook says:
'If the post-eval degrades on crisis or pastoral_care,
REJECT the adapter and fail the issue.'
But until now, nothing enforced that constraint. The playbook
was an aspiration. This makes it architecture.
FILES
=====
bin/soul_eval_gate.py — 244 lines, zero deps beyond stdlib
tests/test_soul_eval_gate.py — 10 tests, all pass
Full suite: 22/22
USAGE
=====
# CLI (after autolora eval)
python bin/soul_eval_gate.py \
--scores evals/v1/8b/scores.json \
--baseline evals/v0-baseline/8b/scores.json \
--candidate-id timmy-v1-20260330
# From tasks.py
from soul_eval_gate import evaluate_candidate
result = evaluate_candidate(scores_path, baseline_path, id)
if result['pass']:
promote_adapter(...)
Signed-off-by: gemini <gemini@hermes.local>