# nla-gemma3-27b-L41-ar

An activation reconstructor (AR) from a Natural Language Autoencoder (NLA) pair, fine-tuned from google/gemma-3-27b-it.

NLA pairs are interpretability tools: the AV (activation verbaliser) maps a hidden-state vector to a natural-language description; the AR (activation reconstructor) maps that description back to a vector. Together they let you read out what a residual-stream activation "means" and measure how much of it the description captured. These checkpoints are not useful as general-purpose language models — the fine-tuning repurposes them entirely for activation decoding.
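One common way to quantify how much of an activation the description captured is to compare the original vector with the AR's reconstruction, e.g. via cosine similarity. The card does not specify the exact metric NLA uses, so the sketch below is only an illustration of the roundtrip measurement idea, with toy stand-in vectors:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two activation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy stand-ins for a residual-stream activation and the AR's output.
original = [0.5, -1.2, 3.0, 0.1]
reconstructed = [0.4, -1.0, 2.8, 0.3]

score = cosine_similarity(original, reconstructed)
print(f"reconstruction score: {score:.3f}")
```

A score near 1.0 indicates the verbalise-then-reconstruct roundtrip preserved most of the activation's direction; scores near 0 indicate the description captured little of it.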

## Usage

See the nla-inference README for the full recipe (SGLang launch, NLAClient/NLACritic, embedding-injection details).
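Conceptually, the AR consumes a raw activation vector rather than text alone: the vector is spliced into the input embedding sequence at a reserved placeholder position before the forward pass. The following is a minimal, purely illustrative sketch of that splicing step; the function name and shapes are hypothetical, and the actual recipe lives in the nla-inference README:

```python
def inject_activation(embeddings, activation, placeholder_index):
    """Return a copy of the embedding sequence with the activation
    vector spliced in at the placeholder position (hypothetical helper;
    see the nla-inference README for the real embedding-injection code)."""
    assert len(embeddings[placeholder_index]) == len(activation)
    injected = [list(row) for row in embeddings]
    injected[placeholder_index] = list(activation)
    return injected

# Toy embedding sequence: 3 "tokens", 4-dim embeddings.
emb = [[0.0] * 4, [1.0] * 4, [2.0] * 4]
act = [9.0, 8.0, 7.0, 6.0]
out = inject_activation(emb, act, placeholder_index=1)
print(out[1])  # the activation now occupies position 1
```

The copy keeps the original embedding sequence untouched, so the same prompt template can be reused across many activations.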

## License & use restrictions

This model is a derivative of Gemma 3 and is provided under and subject to the Gemma Terms of Use. By using this model you agree to those terms and the Gemma Prohibited Use Policy. See NOTICE in this repository.

## Training data attribution

The fine-tuning data was derived from two public datasets:

No Anthropic model weights, internal code, or proprietary data were used.

## Model details

- Format: Safetensors
- Model size: 19B params
- Tensor type: BF16

Full repository name: kitft/nla-gemma3-27b-L41-ar.