nla-gemma3-27b-L41-ar
A Natural Language Autoencoder (NLA) AR (activation reconstructor) fine-tuned from
google/gemma-3-27b-it.
NLA pairs are interpretability tools: the AV (activation verbaliser) maps a hidden-state vector to a natural-language description; the AR (activation reconstructor) maps that description back to a vector. Together they let you read out what a residual-stream activation "means" and measure how much of it the description captured. These checkpoints are not useful as general-purpose language models — the fine-tuning repurposes them entirely for activation decoding.
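The read-out-and-score loop described above can be sketched as a round trip: verbalise an activation, reconstruct it from the description, and compare the two vectors. Everything below is illustrative — `verbalise` and `reconstruct` are stand-ins for the AV and AR checkpoints, and cosine similarity is one plausible fidelity measure, not necessarily the one used here.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Score how much of the original activation the reconstruction captured."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def round_trip_fidelity(activation: np.ndarray, verbalise, reconstruct) -> float:
    description = verbalise(activation)       # AV: hidden-state vector -> natural language
    reconstructed = reconstruct(description)  # AR: natural language -> vector
    return cosine_similarity(activation, reconstructed)

# Toy demo with placeholder functions (not the real models):
rng = np.random.default_rng(0)
v = rng.standard_normal(4096)  # stand-in for a residual-stream vector
fidelity = round_trip_fidelity(
    v,
    verbalise=lambda x: "stub description",
    reconstruct=lambda s: v + 0.1 * rng.standard_normal(4096),  # near-perfect stub
)
print(round(fidelity, 3))
```

In the real pipeline the two lambdas would be calls into the AV and AR models; the scalar returned is the "how much did the description capture" measurement.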
- Companion checkpoint: kitft/nla-gemma3-27b-L41-av
- Inference code + worked examples: kitft/nla-inference
- Extraction layer: residual stream output of block 41
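The extraction-layer bullet can be illustrated with a forward hook that captures a block's residual-stream output. This is a minimal sketch on a toy two-block stack, assuming the hook mechanics only; the production setup hooks block 41 of gemma-3-27b instead (see kitft/nla-inference for the actual recipe).

```python
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    """Stand-in transformer block with a residual connection."""
    def __init__(self, d: int):
        super().__init__()
        self.mlp = nn.Linear(d, d)

    def forward(self, x):
        return x + self.mlp(x)  # residual stream out = stream in + block update

d_model = 8
blocks = nn.Sequential(ToyBlock(d_model), ToyBlock(d_model))

captured = {}
def hook(module, inputs, output):
    # The hooked module's output IS the residual-stream vector at that layer.
    captured["resid"] = output.detach()

# Hook the last toy block; with the real model this would be block 41.
handle = blocks[-1].register_forward_hook(hook)
x = torch.randn(1, d_model)
_ = blocks(x)
handle.remove()

print(captured["resid"].shape)  # the vector the AV/AR pair operates on
```

The captured tensor is what the AV verbalises and what the AR is trained to reconstruct.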
Usage
See the nla-inference README for the full recipe (SGLang launch, NLAClient/NLACritic, embedding-injection details).
License & use restrictions
This model is a derivative of Gemma 3 and is provided under and subject to the Gemma Terms of Use. By using this model you agree to those terms and the Gemma Prohibited Use Policy. See NOTICE in this repository.
Training data attribution
The fine-tuning data was derived from two public datasets:
- WildChat-1M (allenai/WildChat-1M). Contains information from WildChat-1M, which is made available under the ODC Attribution License.
- Ultra-FineWeb (openbmb/Ultra-FineWeb, Apache-2.0), a filtered derivative of HuggingFaceFW/fineweb (ODC-BY).
No Anthropic model weights, internal code, or proprietary data were used.