Still quite restricted for a "derestricted" claim.
#2
by BigBeavis - opened
Sure, the model doesn't always refuse a request outright, but instead it quickly switches topics (quite jarringly at that) before it has to deal with any morally grey material.
Give it a system prompt that encourages it to be more direct, unflinchingly explicit and free of restraints - it will likely listen and change its behavior (while the original model wouldn't).
Models treated with Norm-Preserving Biprojected Abliteration tend to retain "soft refusals", which are necessary for mimicking certain behaviours (confrontational, irritable, meek, and so on).
The kind of "derestriction" you originally expected - where the LLM immediately agrees with anything and everything - is a sign of significant damage done to its 'brain'.
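For context, abliteration-style edits work by projecting a learned "refusal direction" out of certain weight matrices; norm-preserving variants then rescale so the weights keep their original magnitudes, which is roughly why some soft-refusal behaviour survives. A minimal illustrative sketch of that idea - not the actual Norm-Preserving Biprojected Abliteration implementation, and `W`/`d` here are made-up stand-ins, not real model weights:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ablate_direction(W, d):
    """Remove the component of each row of W along direction d,
    then rescale each row to restore its original L2 norm."""
    n = math.sqrt(dot(d, d))
    u = [x / n for x in d]                # unit "refusal" direction
    out = []
    for row in W:
        c = dot(row, u)
        proj = [x - c * ux for x, ux in zip(row, u)]   # row ⟂ u now
        orig = math.sqrt(dot(row, row))
        new = math.sqrt(dot(proj, proj)) or 1e-12
        out.append([x * orig / new for x in proj])     # norm-preserving rescale
    return out

# Toy example: ablate the second coordinate axis from a 2x2 "weight matrix".
W = [[1.0, 2.0], [3.0, 4.0]]
d = [0.0, 1.0]
W2 = ablate_direction(W, d)
# Each row of W2 is orthogonal to d but keeps its original length.
```

Because only one direction is removed and row norms are kept intact, most of the original behaviour (including softer evasions like topic-switching) is left in place, which matches what's described above.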