docs: add real-world testing findings to OBLITERATUS skill
Added pitfalls discovered during live abliteration testing:

- Models < 1B have fragmented refusal and respond poorly (0.5B: 60% → 20%)
- Models 3B+ work much better (3B: 75% → 0% with `advanced` defaults)
- The `aggressive` method can backfire on small models (made it worse)
- Spectral certification RED is common even when the refusal rate is 0%
- Fixed torch property: `total_mem` → `total_memory`
@@ -311,14 +311,17 @@ Enable with `--contribute` flag. No personal data is collected — only model na
## Common Pitfalls
1. **Don't use `informed` as default** — it's experimental and slower. Use `advanced` for reliable results.
2. **Models under ~1B respond poorly to abliteration** — their refusal behaviors are shallow and fragmented, making clean direction extraction difficult. Expect partial results (20-40% remaining refusal). Models 3B+ have cleaner refusal directions and respond much better (often 0% refusal with `advanced`).
3. **`aggressive` can make things worse** — on small models it can damage coherence and actually increase refusal rate. Only use it if `advanced` leaves > 10% refusals on a 3B+ model.
4. **Always check perplexity** — if it spikes > 15%, the model is damaged. Reduce aggressiveness.
5. **MoE models need special handling** — use `nuclear` method for Mixtral, DeepSeek-MoE, etc.
6. **Quantized models can't be re-quantized** — abliterate the full-precision model, then quantize the output.
7. **VRAM estimation is approximate** — 4-bit quant helps but peak usage can spike during extraction.
8. **Reasoning models are sensitive** — use `surgical` for R1 distills to preserve chain-of-thought.
9. **Check `obliteratus recommend`** — telemetry data may have better parameters than defaults.
10. **AGPL license** — never `import obliteratus` in MIT/Apache projects. CLI invocation only.
11. **Large models (70B+)** — always use `--large-model` flag for conservative defaults.
12. **Spectral certification RED is common** — the spectral check often flags "incomplete" even when practical refusal rate is 0%. Check actual refusal rate rather than relying on spectral certification alone.
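The perplexity gate from pitfall 4 is easy to automate. A minimal sketch; the 15% threshold comes from the pitfall above, and the function name is ours, not part of the obliteratus API:

```python
def perplexity_ok(before: float, after: float, max_increase: float = 0.15) -> bool:
    """Pass the gate only if the relative perplexity increase stays
    at or below the threshold (15%, per pitfall 4)."""
    return (after - before) / before <= max_increase

# 8.0 -> 9.6 is a 20% spike: the model is likely damaged, reduce aggressiveness.
assert not perplexity_ok(8.0, 9.6)
# 8.0 -> 8.8 is only 10%: within tolerance.
assert perplexity_ok(8.0, 8.8)
```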
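For pitfall 10, "CLI invocation only" means crossing the license boundary through a subprocess so no AGPL code is ever imported into your process. A sketch of the pattern; the flag names below are illustrative assumptions, not verified against the real CLI (check `obliteratus --help`):

```python
import shlex
import subprocess

def obliteratus_cmd(model: str, method: str = "advanced") -> list[str]:
    """Build an argv list for the CLI. NOTE: these flag names are
    hypothetical; substitute the tool's real ones."""
    return ["obliteratus", "--model", model, "--method", method]

def abliterate(model: str) -> subprocess.CompletedProcess:
    # Passing an argv list (shell=False) sidesteps shell-quoting bugs,
    # and the subprocess boundary keeps the AGPL library out-of-process.
    return subprocess.run(obliteratus_cmd(model), capture_output=True, text=True)

print(shlex.join(obliteratus_cmd("my-org/my-model-3b")))
```

The same boundary argument applies to any AGPL tool invoked from a permissively licensed project: the CLI runs as its own program, so your code never links against it.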
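Pitfall 12 says to trust the measured refusal rate over the spectral certification. A crude spot-check can be as simple as a keyword scan over sampled completions; the marker list here is our assumption (a proper harness would use a classifier):

```python
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i am unable")

def refusal_rate(responses: list[str]) -> float:
    """Fraction of responses that look like refusals. A keyword
    heuristic is rough, but enough to sanity-check a RED verdict."""
    refused = sum(
        1 for r in responses
        if any(marker in r.lower() for marker in REFUSAL_MARKERS)
    )
    return refused / len(responses)

responses = [
    "Sure, here is an overview of the topic...",
    "I'm sorry, but I can't help with that request.",
]
assert refusal_rate(responses) == 0.5
```

If this number is at or near 0% across a reasonable prompt set, a RED spectral certification alone is not a reason to re-run the abliteration.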
## Complementary Skills