🌍 Gemma‑3‑12B‑QAT‑Abliterated — Sikaworld FP4 Editions

Blackwell‑optimized FP4 text encoders for LTX‑2 and LTX‑2.3, based on mlabonne's improved Abliteration technique.

🌐 Overview

The NVIDIA Blackwell architecture update introduced first‑class support for FP4/NVFP4 inference, enabling extremely fast and memory‑efficient text encoders. At the same time, the LTX‑2 development team officially recommends Gemma‑QAT‑based encoders for video generation due to their stable activation distributions, strong semantic gradients, and robust temporal behavior.

This repository provides two custom FP4 variants of the uncensored Gemma‑3‑12B‑QAT model created by mlabonne using his improved Abliteration v2 method.

Both models are fully uncensored, explicitly optimized for both LTX‑2 and LTX‑2.3, and designed to deliver strong motion vectors while maintaining spatial coherence.


📦 The Two FP4 Editions

🛡️ FP4 High‑Fidelity Edition (Protected Layers) [Recommended]

This version uses a surgical mixed‑precision stabilizer to preserve facial symmetry and spatial coherence.

  • Layers 0–1 (Input embeddings) kept in BF16.
  • Layers 44–47 (Final output projections) kept in BF16.
  • All LayerNorms and Biases kept in BF16.
  • All mid-transformer layers quantized to FP4.

Best for: Maximum stability, minimal facial drift, consistent anatomy, and strong but mathematically controlled motion vectors. Highly recommended for complex I2V/T2V tasks.

🚀 FP4 Pure Edition (No Protected Layers)

This version is a uniform, flat FP4/NVFP4 quantization of the Abliterated QAT model.

  • All transformer layers (0-47) quantized to FP4.
  • Only LayerNorms and Biases remain in BF16.

Best for: Maximum performance, the absolute lowest VRAM footprint, and the fastest inference on Blackwell GPUs. It trades a tiny amount of spatial stability for raw speed and more intense, aggressive motion vectors.
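The difference between the two editions comes down to a per-layer precision policy. A minimal sketch of that policy (a hypothetical helper for illustration; the actual precision assignment happens at model-conversion time, not at load time):

```python
# Sketch of the layer-precision policies described above.
# Layers 0-1 (input embeddings) and 44-47 (final output projections) are the
# "protected" layers of the High-Fidelity edition; norms and biases stay BF16
# in both editions regardless.

def layer_dtype(layer_idx: int, edition: str = "high_fidelity") -> str:
    """Return the storage dtype for transformer layer 0-47."""
    if edition == "high_fidelity" and (layer_idx <= 1 or layer_idx >= 44):
        return "bf16"  # protected input/output layers
    return "fp4"       # all mid-transformer layers
```

In the Pure edition the policy degenerates to `fp4` for every layer, which is what buys the extra speed and the lower VRAM footprint.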


🧰 Usage in ComfyUI

  1. Download your preferred .safetensors file.
  2. Place the file inside your ComfyUI models folder: ComfyUI/models/text_encoders/
  3. Load the model via the standard DualCLIPLoader or LTX‑2 Text Encoder Loader.
  4. Recommended dtype: fp8_e4m3fn (the BF16‑protected layers are automatically detected and kept in BF16 by ComfyUI's loader).
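Steps 1–2 above can be done from a terminal (paths follow the default ComfyUI layout; the filename in the comment is illustrative, not the actual file name in this repository):

```shell
# Create the ComfyUI text-encoder folder and place the downloaded file there.
# (The filename below is illustrative; use the actual .safetensors file
#  you downloaded from this repository.)
mkdir -p ComfyUI/models/text_encoders
# mv ~/Downloads/gemma-3-12b-qat-fp4-ltx2.safetensors ComfyUI/models/text_encoders/
ls ComfyUI/models
```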

💡 Prompting Tip: Start your prompts with direct action verbs (e.g., "running", "falling", "embracing", "exploding"). FP4 models respond extremely well to dynamic, upfront phrasing.


🔬 Technical Background

Why Gemma‑QAT for LTX‑2?

The LTX‑2 base model architecture is highly sensitive to the text encoder's conditioning. The LTX team recommends QAT (Quantization‑Aware Training) encoders because they provide:

  • Stable activation distributions
  • Smooth residual streams
  • Strong temporal gradients
  • Robust spatial alignment
  • Heavily reduced “frozen video” (motion collapse) behavior

The Abliteration V2 Magic

These models are derived from mlabonne/gemma-3-12b-it-qat-abliterated. Abliteration is a multi‑step orthogonalization process, not a simple weight deletion: it compares residual streams from harmful vs. harmless samples, computes a "refusal direction", and subtracts this direction directly from the hidden states of the target modules. The result is a fully uncensored, high‑fidelity instruction model with strong, uninhibited semantic gradients, which acts as an effective cure for static/frozen LTX‑2 generations.
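The core operation can be sketched in plain NumPy (an illustrative reduction of the method; the real pipeline estimates the refusal direction from many harmful/harmless activation pairs across layers):

```python
import numpy as np

# Illustrative sketch of the orthogonalization at the heart of abliteration:
# estimate a "refusal direction" from residual-stream activations, then
# project that component out of the hidden states.

def refusal_direction(harmful: np.ndarray, harmless: np.ndarray) -> np.ndarray:
    """Difference of mean residual-stream activations, normalized to unit length."""
    d = harmful.mean(axis=0) - harmless.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate(hidden: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of each hidden state along the refusal direction."""
    return hidden - np.outer(hidden @ direction, direction)
```

After ablation, every hidden state is orthogonal to the refusal direction, so the model can no longer express that direction in its residual stream.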

Why FP4 for Blackwell GPUs?

NVIDIA's latest Blackwell Tensor Cores are explicitly optimized for FP4/NVFP4 mathematical operations. This format offers:

  • Significantly higher throughput than FP8
  • Extremely low VRAM footprint
  • Faster long‑prompt (prefill) inference
  • Decreased pressure on memory bandwidth

These FP4 editions feature a pure FP4 tensor layout (with appropriate micro-block and global scales) fully compatible with NVFP4 hardware acceleration on RTX 50‑series and data center hardware.
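A toy sketch of the block-scaled 4-bit idea (assumed block size 16 and the standard FP4 E2M1 value grid; real NVFP4 additionally stores the block scales in FP8 plus a global scale, which this sketch omits):

```python
import numpy as np

# Representable magnitudes of the FP4 (E2M1) format used by NVFP4.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4_block(x: np.ndarray, block: int = 16):
    """Toy block-scaled FP4 quantizer: one scale per block of `block` values.
    Returns (quantized values on the FP4 grid, per-block scales)."""
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scale[scale == 0] = 1.0
    scaled = x / scale
    # snap each scaled value to the nearest FP4 grid point, preserving sign
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    q = np.sign(scaled) * FP4_GRID[idx]
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q * scale
```

Because each block carries its own scale, the 4-bit grid adapts to the local dynamic range of the weights, which is what keeps the quality loss small despite the tiny format.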


📊 Technical Summary

| Component | 🛡️ High‑Fidelity Edition | 🚀 Pure Edition |
|---|---|---|
| Base Model | mlabonne/gemma‑3‑12b‑it‑qat‑abliterated | mlabonne/gemma‑3‑12b‑it‑qat‑abliterated |
| Quantization | FP4 + BF16 stabilizer | Pure FP4 |
| Protected Layers | 0–1, 44–47 | None |
| Norms & Biases | BF16 | BF16 |
| Inference Speed | Fast | Fastest |
| Stability | Highest | Moderate |
| VRAM Usage | Low | Lowest |

---

🏷️ Credits & Acknowledgments

  • Base Model & Abliteration v2: mlabonne
  • QAT Architecture & Gemma Weights: Google
  • FP4 Optimization, Hybrid Architecture & Stabilization: Sikaworld
  • LTX‑2 & QAT Recommendation: Lightricks / LTX‑Team