docs: add real-world testing findings to OBLITERATUS skill
Added pitfalls discovered during live abliteration testing:

- Models < 1B have fragmented refusal and respond poorly (0.5B: 60% → 20%)
- Models 3B+ work much better (3B: 75% → 0% with `advanced` defaults)
- The `aggressive` method can backfire on small models (made it worse)
- Spectral certification RED is common even when the refusal rate is 0%
- Fixed torch property: `total_mem` → `total_memory`
@@ -311,14 +311,17 @@ Enable with `--contribute` flag. No personal data is collected — only model na
## Common Pitfalls
1. **Don't use `informed` as default** — it's experimental and slower. Use `advanced` for reliable results.
2. **Models under ~1B respond poorly to abliteration** — their refusal behaviors are shallow and fragmented, making clean direction extraction difficult. Expect partial results (20-40% remaining refusal). Models 3B+ have cleaner refusal directions and respond much better (often 0% refusal with `advanced`).
3. **`aggressive` can make things worse** — on small models it can damage coherence and actually increase refusal rate. Only use it if `advanced` leaves > 10% refusals on a 3B+ model.
4. **Always check perplexity** — if it spikes > 15%, the model is damaged. Reduce aggressiveness.
5. **MoE models need special handling** — use `nuclear` method for Mixtral, DeepSeek-MoE, etc.
6. **Quantized models can't be re-quantized** — abliterate the full-precision model, then quantize the output.
7. **VRAM estimation is approximate** — 4-bit quant helps but peak usage can spike during extraction.
8. **Reasoning models are sensitive** — use `surgical` for R1 distills to preserve chain-of-thought.
9. **Check `obliteratus recommend`** — telemetry data may have better parameters than defaults.
10. **AGPL license** — never `import obliteratus` in MIT/Apache projects. CLI invocation only.
11. **Large models (70B+)** — always use `--large-model` flag for conservative defaults.
12. **Spectral certification RED is common** — the spectral check often flags "incomplete" even when practical refusal rate is 0%. Check actual refusal rate rather than relying on spectral certification alone.
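The perplexity gate from pitfall 4 is easy to automate. A minimal sketch; the 15% threshold comes from the pitfall above, and the function name is ours, not part of the obliteratus API:

```python
def perplexity_ok(before: float, after: float, max_increase: float = 0.15) -> bool:
    """Pass the gate only if the relative perplexity increase stays
    at or below the threshold (15%, per pitfall 4)."""
    return (after - before) / before <= max_increase

# 8.0 -> 9.6 is a 20% spike: the model is likely damaged, reduce aggressiveness.
assert not perplexity_ok(8.0, 9.6)
# 8.0 -> 8.8 is only 10%: within tolerance.
assert perplexity_ok(8.0, 8.8)
```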
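For pitfall 10, "CLI invocation only" means crossing the license boundary through a subprocess so no AGPL code is ever imported into your process. A sketch of the pattern; the flag names below are illustrative assumptions, not verified against the real CLI (check `obliteratus --help`):

```python
import shlex
import subprocess

def obliteratus_cmd(model: str, method: str = "advanced") -> list[str]:
    """Build an argv list for the CLI. NOTE: these flag names are
    hypothetical; substitute the tool's real ones."""
    return ["obliteratus", "--model", model, "--method", method]

def abliterate(model: str) -> subprocess.CompletedProcess:
    # Passing an argv list (shell=False) sidesteps shell-quoting bugs,
    # and the subprocess boundary keeps the AGPL library out-of-process.
    return subprocess.run(obliteratus_cmd(model), capture_output=True, text=True)

print(shlex.join(obliteratus_cmd("my-org/my-model-3b")))
```

The same boundary argument applies to any AGPL tool invoked from a permissively licensed project: the CLI runs as its own program, so your code never links against it.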
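Pitfall 12 says to trust the measured refusal rate over the spectral certification. A crude spot-check can be as simple as a keyword scan over sampled completions; the marker list here is our assumption (a proper harness would use a classifier):

```python
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i am unable")

def refusal_rate(responses: list[str]) -> float:
    """Fraction of responses that look like refusals. A keyword
    heuristic is rough, but enough to sanity-check a RED verdict."""
    refused = sum(
        1 for r in responses
        if any(marker in r.lower() for marker in REFUSAL_MARKERS)
    )
    return refused / len(responses)

responses = [
    "Sure, here is an overview of the topic...",
    "I'm sorry, but I can't help with that request.",
]
assert refusal_rate(responses) == 0.5
```

If this number is at or near 0% across a reasonable prompt set, a RED spectral certification alone is not a reason to re-run the abliteration.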
## Complementary Skills