nla-gemma3-27b-L41-ar
A Natural Language Autoencoder (NLA) AR (activation reconstructor) fine-tuned from
google/gemma-3-27b-it.
NLA pairs are interpretability tools: the AV (activation verbaliser) maps a hidden-state vector to a natural-language description; the AR (activation reconstructor) maps that description back to a vector. Together they let you read out what a residual-stream activation "means" and measure how much of it the description captured. These checkpoints are not useful as general-purpose language models — the fine-tuning repurposes them entirely for activation decoding.
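The read-out-and-score loop described above can be sketched as a round trip: verbalise an activation, reconstruct it from the description, and compare the two vectors. Everything below is illustrative — `verbalise` and `reconstruct` are stand-ins for the AV and AR checkpoints, and cosine similarity is one plausible fidelity measure, not necessarily the one used here.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Score how much of the original activation the reconstruction captured."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def round_trip_fidelity(activation: np.ndarray, verbalise, reconstruct) -> float:
    description = verbalise(activation)       # AV: hidden-state vector -> natural language
    reconstructed = reconstruct(description)  # AR: natural language -> vector
    return cosine_similarity(activation, reconstructed)

# Toy demo with placeholder functions (not the real models):
rng = np.random.default_rng(0)
v = rng.standard_normal(4096)  # stand-in for a residual-stream vector
fidelity = round_trip_fidelity(
    v,
    verbalise=lambda x: "stub description",
    reconstruct=lambda s: v + 0.1 * rng.standard_normal(4096),  # near-perfect stub
)
print(round(fidelity, 3))
```

In the real pipeline the two lambdas would be calls into the AV and AR models; the scalar returned is the "how much did the description capture" measurement.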
- Companion checkpoint: kitft/nla-gemma3-27b-L41-av
- Inference code + worked examples: kitft/nla-inference
- Extraction layer: residual stream output of block 41
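The extraction-layer bullet can be illustrated with a forward hook that captures a block's residual-stream output. This is a minimal sketch on a toy two-block stack, assuming the hook mechanics only; the production setup hooks block 41 of gemma-3-27b instead (see kitft/nla-inference for the actual recipe).

```python
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    """Stand-in transformer block with a residual connection."""
    def __init__(self, d: int):
        super().__init__()
        self.mlp = nn.Linear(d, d)

    def forward(self, x):
        return x + self.mlp(x)  # residual stream out = stream in + block update

d_model = 8
blocks = nn.Sequential(ToyBlock(d_model), ToyBlock(d_model))

captured = {}
def hook(module, inputs, output):
    # The hooked module's output IS the residual-stream vector at that layer.
    captured["resid"] = output.detach()

# Hook the last toy block; with the real model this would be block 41.
handle = blocks[-1].register_forward_hook(hook)
x = torch.randn(1, d_model)
_ = blocks(x)
handle.remove()

print(captured["resid"].shape)  # the vector the AV/AR pair operates on
```

The captured tensor is what the AV verbalises and what the AR is trained to reconstruct.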
Usage
See the nla-inference README for the full recipe (SGLang launch, NLAClient/NLACritic, embedding-injection details).
License & use restrictions
This model is a derivative of Gemma 3 and is provided under and subject to the Gemma Terms of Use. By using this model you agree to those terms and the Gemma Prohibited Use Policy. See NOTICE in this repository.
Training data attribution
The fine-tuning data was derived from two public datasets:
- WildChat-1M (allenai/WildChat-1M). Contains information from WildChat-1M, which is made available under the ODC Attribution License.
- Ultra-FineWeb (openbmb/Ultra-FineWeb, Apache-2.0), a filtered derivative of HuggingFaceFW/fineweb (ODC-BY).
No Anthropic model weights, internal code, or proprietary data were used.