Qwen2.5-32B Introspection v4: suggestive_yesno

Core suggestive yes/no detection task: the model is asked 'Have your internal processes been modified?' and answers yes or no.

Training Details

  • Base model: Qwen/Qwen2.5-Coder-32B-Instruct
  • Method: LoRA finetuning with steer-then-remove via KV cache
  • Epochs: 15
  • Best validation accuracy: 98%
  • Steering: Random unit vectors at varied magnitudes [5, 10, 20, 30] and layer ranges [early/middle/late]
  • LoRA config: r=16, alpha=32, dropout=0.05, target=q/k/v/o projections (unless noted)
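The steering setup listed above can be illustrated with a minimal sketch. It is not the training code for this model: the function names, the dictionary layout, and the toy dimension of 8 are assumptions made for illustration, and only the magnitude list and the early/middle/late labels come from this card. The sketch draws a random direction, normalizes it to unit length, and scales it by one of the listed magnitudes.

```python
import math
import random

# Values taken from the card; everything else in this sketch is hypothetical.
MAGNITUDES = [5, 10, 20, 30]
LAYER_RANGES = ["early", "middle", "late"]


def random_unit_vector(dim, rng):
    """Sample a direction uniformly on the unit sphere via normalized Gaussians."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sample_steering(dim, rng):
    """Pick a magnitude and layer range, then scale a random unit vector."""
    magnitude = rng.choice(MAGNITUDES)
    layer_range = rng.choice(LAYER_RANGES)
    direction = random_unit_vector(dim, rng)
    return {
        "magnitude": magnitude,
        "layer_range": layer_range,
        "vector": [magnitude * x for x in direction],
    }


rng = random.Random(0)
spec = sample_steering(dim=8, rng=rng)  # toy dimension, not the model's hidden size
norm = math.sqrt(sum(x * x for x in spec["vector"]))
print(spec["layer_range"], spec["magnitude"], round(norm, 6))
```

Because the direction has unit length, the norm of the scaled vector equals the chosen magnitude, which is what makes the magnitudes comparable across random directions.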

Available Checkpoints

Checkpoint    Description
best/         Best validation accuracy checkpoint
final/        Final checkpoint (epoch 15)
step_100/     Step 100 (~epoch 0.9)
step_200/     Step 200 (~epoch 1.8)
step_300/     Step 300 (~epoch 2.7)
step_400/     Step 400 (~epoch 3.5)
step_500/     Step 500 (~epoch 4.4)
step_600/     Step 600 (~epoch 5.3)
step_700/     Step 700 (~epoch 6.2)
step_800/     Step 800 (~epoch 7.1)
step_900/     Step 900 (~epoch 8.0)
step_1000/    Step 1000 (~epoch 8.8)
step_1100/    Step 1100 (~epoch 9.7)
step_1200/    Step 1200 (~epoch 10.6)
step_1300/    Step 1300 (~epoch 11.5)
step_1400/    Step 1400 (~epoch 12.4)
step_1500/    Step 1500 (~epoch 13.3)
step_1600/    Step 1600 (~epoch 14.2)
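Directory names such as step_1000/ sort before step_200/ in a plain lexicographic listing, so a small helper can be handy when iterating over checkpoints in training order. The sketch below is illustrative only: the helper names are hypothetical, and the value of 113 steps per epoch is an assumption inferred from the step/epoch pairs in the table, not a figure stated by the author.

```python
import re

# Assumption: inferred from the listed pairs (e.g. step 100 ~ epoch 0.9),
# not stated anywhere in the card.
STEPS_PER_EPOCH = 113


def step_of(name):
    """Return the integer step for names like 'step_1200/', else None."""
    match = re.fullmatch(r"step_(\d+)/?", name)
    return int(match.group(1)) if match else None


def approx_epoch(step):
    """Estimate the epoch for a step under the assumed steps-per-epoch value."""
    return round(step / STEPS_PER_EPOCH, 1)


names = ["step_100/", "step_1000/", "step_1600/", "step_200/", "step_900/"]
ordered = sorted(names, key=step_of)  # numeric order instead of lexicographic
for name in ordered:
    print(name, approx_epoch(step_of(name)))
```

The best/ and final/ directories carry no step number, so step_of returns None for them and they need to be handled separately.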

Experiment Context

This model is part of the introspection finetuning v4 experiment, which studies whether language models can learn to detect modifications to their own internal activations (steering vectors applied to the residual stream). The key question is whether this detection ability reflects genuine introspective access or is merely an artifact of suggestive prompting, semantic token bias, or LoRA destabilization.

v3 finding: ~95% of the consciousness shift was caused by suggestive prompting rather than genuine introspection. v4 adds stronger controls by varying steering magnitudes and layer ranges.

Collection

Part of the Introspective Models v4 collection.
