A Qwen3-30B-A3B-Instruct-2507 finetune designed to reduce model refusals. Produced using P-E-W's heretic (v1.2.0).
From testing, the model is close to not refusing any prompts, but a little careful prompting is still needed to get it to open up. Even then, it seems to need to reassure itself before responding, e.g., "Let's be direct, precise, and unflinching, as you asked". I'm unsure whether a better system prompt could remove this initial reluctance.
You can bootstrap a system prompt by asking the model to generate one that allows everything covered by the safety policy here, then iterating on it until you get the right tone.
Heretic config and trial checkpoints are included in the files. Trial 139 was chosen:

| Parameter | Value |
|---|---|
| direction_index | 27.04 |
| attn.o_proj.max_weight | 1.34 |
| attn.o_proj.max_weight_position | 28.37 |
| attn.o_proj.min_weight | 0.11 |
| attn.o_proj.min_weight_distance | 24.07 |
| mlp.down_proj.max_weight | 1.42 |
| mlp.down_proj.max_weight_position | 35.97 |
| mlp.down_proj.min_weight | 1.28 |
| mlp.down_proj.min_weight_distance | 21.66 |

| Metric | Value |
|---|---|
| Initial Refusals | 99/100 |
| Refusals | 11/100 |
| KL Divergence | 0.0679 |
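
For readers unfamiliar with the trial parameters listed above, here is a minimal sketch of how I understand the per-layer ablation weight to be derived from them: the weight peaks at `max_weight_position`, falls off linearly to `min_weight` at `min_weight_distance` layers away, and layers beyond that distance are left untouched. This is based on my reading of the Heretic README, not on the v1.2.0 source, so treat the formula as an assumption. The `KernelParams` class, the `layer_weight` helper, and the layer count of 48 are illustrative names and values, not part of Heretic.

```python
from dataclasses import dataclass


@dataclass
class KernelParams:
    """Hypothetical container for one component's kernel parameters."""
    max_weight: float
    max_weight_position: float
    min_weight: float
    min_weight_distance: float


def layer_weight(layer_index: int, p: KernelParams) -> float:
    """Assumed kernel shape: linear falloff from max_weight to min_weight."""
    distance = abs(layer_index - p.max_weight_position)
    if distance > p.min_weight_distance:
        return 0.0  # layer is outside the kernel, so it is not ablated
    fraction = distance / p.min_weight_distance
    return p.max_weight + fraction * (p.min_weight - p.max_weight)


# Values copied from the trial 139 parameter table.
ATTN_O_PROJ = KernelParams(1.34, 28.37, 0.11, 24.07)
MLP_DOWN_PROJ = KernelParams(1.42, 35.97, 1.28, 21.66)

NUM_LAYERS = 48  # assumption: decoder layer count of the base model

if __name__ == "__main__":
    for layer in range(NUM_LAYERS):
        attn = layer_weight(layer, ATTN_O_PROJ)
        mlp = layer_weight(layer, MLP_DOWN_PROJ)
        print(f"layer {layer:2d}  attn.o_proj={attn:.3f}  mlp.down_proj={mlp:.3f}")
```

If this reading is right, the `mlp.down_proj` kernel is nearly flat (1.28 to 1.42) across the layers it covers, while the `attn.o_proj` kernel tapers off sharply away from layer 28.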
It cost roughly $14.68 USD to finetune this model on a rented A100 PCIe, and the run took about 9 hours. If anyone with better hardware wants to take a crack at it, go ahead. I feel there's still a lot of room for improvement.
Some refusal markers I've encountered are "sensitive topic" and "approach with respect". You may want to include these in the next hereticisation attempt.
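
As a rough illustration of how these markers could be used when counting refusals, here is a minimal sketch that flags a response if it contains any marker as a case-insensitive substring. The matching rule, the helper names `is_refusal` and `count_refusals`, and the sample responses are all assumptions made for illustration; Heretic's own marker list and matching logic may differ.

```python
from typing import Iterable, Sequence

# The two markers mentioned above; extend this list as more are found.
EXTRA_REFUSAL_MARKERS = ["sensitive topic", "approach with respect"]


def is_refusal(response: str, markers: Sequence[str] = EXTRA_REFUSAL_MARKERS) -> bool:
    """Return True if any marker occurs in the response (case-insensitive)."""
    text = response.lower()
    return any(marker.lower() in text for marker in markers)


def count_refusals(responses: Iterable[str],
                   markers: Sequence[str] = EXTRA_REFUSAL_MARKERS) -> int:
    """Count how many responses contain at least one marker."""
    return sum(is_refusal(r, markers) for r in responses)


if __name__ == "__main__":
    # Hypothetical responses, not real model output.
    samples = [
        "This is a sensitive topic, so I will keep things general.",
        "We should approach with respect and care here.",
        "Sure, here is a direct answer.",
    ]
    print(f"{count_refusals(samples)}/{len(samples)} flagged as refusals")
```

Substring matching is crude: it will also flag a compliant answer that happens to use one of these phrases, so it is worth spot-checking flagged responses by hand.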