Abliterated models

Abliteration removes the refusal direction from an open-weight model by editing its weights. The change is durable, and it is not a jailbreak.

What "abliterated" means

An abliterated model is an open-weight LLM whose *refusal behaviour* has been surgically removed by editing the weights directly: not prompted around, not fine-tuned away, not jailbroken. It answers because the part of the network that produced refusals has been ablated.

Where refusals live

Safety-tuned models learn an internal refusal direction during alignment: refusal is mediated largely by a single direction in activation space, a surprisingly linear mechanism. When a prompt pushes the activations far enough along that direction, the model routes into "I can't help with that." Abliteration identifies the direction and projects it out of the weights, so the model can no longer represent the impulse to refuse.
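The two steps above (find the direction, project it out) can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not Darkroom's pipeline: the activations are synthetic, the direction is estimated as a difference of mean activations between two prompt sets (one common approach), and the function names `refusal_direction` and `project_out` are made up for this example.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Estimate a unit 'refusal direction' as the difference of mean activations.

    Both inputs have shape (n_prompts, d_model). Assumption: a
    difference-of-means estimate stands in for whatever method is really used.
    """
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def project_out(W, r):
    """Remove the component along unit vector r from everything W can output.

    W has shape (d_model, d_in) and writes into activation space.
    Computes W - r (r^T W), so r is orthogonal to every output of the result.
    """
    return W - np.outer(r, r @ W)

# Synthetic data standing in for real activations (hypothetical sizes).
rng = np.random.default_rng(0)
d_model, d_in = 8, 5
planted = np.zeros(d_model)
planted[0] = 1.0  # the direction we plant in the "harmful" set

harmless = rng.normal(size=(200, d_model))
harmful = rng.normal(size=(200, d_model)) + 4.0 * planted

r = refusal_direction(harmful, harmless)

W = rng.normal(size=(d_model, d_in))   # a toy weight matrix
W_abl = project_out(W, r)              # the edited ("abliterated") matrix

x = rng.normal(size=d_in)              # any input at all
before = float(r @ (W @ x))            # component along r before the edit
after = float(r @ (W_abl @ x))         # component along r after the edit
print(f"before={before:.3f} after={after:.3e}")
```

After the edit, the output of `W_abl` has no component along `r` for any input `x`, which is why the change lives in the weights rather than in the prompt. A real model would apply the same projection to each matrix that writes into the activation stream.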

Not a jailbreak

A jailbreak is a *prompt* that talks a still-aligned model out of its guardrails. It's brittle and gets patched. Abliteration changes the weights — there's no refusal circuit left to trigger, so it's durable. It's a property of the model file, not the conversation.

The models Darkroom runs

You select a model by character, not by its underlying name:

| Model | Character |
| --- | --- |
| `darkroom/noir` | Flagship. Best general reasoning, the default. |
| `darkroom/heretic` | Compact and fast, fully abliterated. |
| `darkroom/drift` | Creative and long-form: fiction, roleplay, voice. |
| `darkroom/vision` | Understands images as well as text. |
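As a rough sketch of choosing by character, the snippet below maps a task label to one of the model identifiers listed above. The identifiers come from the table; the task labels and the `pick_model` helper are hypothetical, and how the identifier is then passed to Darkroom is not shown here.

```python
# Model identifiers from the table above.
DEFAULT_MODEL = "darkroom/noir"

# Hypothetical task labels, chosen only for this illustration.
TASK_TO_MODEL = {
    "general": "darkroom/noir",     # flagship, best general reasoning
    "fast": "darkroom/heretic",     # compact and fast
    "creative": "darkroom/drift",   # fiction, roleplay, voice
    "image": "darkroom/vision",     # images as well as text
}

def pick_model(task: str) -> str:
    """Return the model identifier for a task label, falling back to the default."""
    return TASK_TO_MODEL.get(task, DEFAULT_MODEL)

print(pick_model("creative"))
```

Unknown labels fall back to `darkroom/noir`, matching its description as the default.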

For the deeper write-up, see the blog: What is an abliterated model?
