🌍 Gemma‑3‑12B‑QAT‑Abliterated — Sikaworld FP4 Editions

Blackwell‑optimized FP4 text encoders for LTX‑2 and LTX‑2.3, based on mlabonne's improved Abliteration technique.

🌐 Overview

The NVIDIA Blackwell architecture update introduced first‑class support for FP4/NVFP4 inference, enabling extremely fast and memory‑efficient text encoders. At the same time, the LTX‑2 development team officially recommends Gemma‑QAT‑based encoders for video generation due to their stable activation distributions, strong semantic gradients, and robust temporal behavior.

This repository provides two custom FP4 variants of the uncensored Gemma‑3‑12B‑QAT model created by mlabonne using his improved Abliteration v2 method.

Both models are fully uncensored, explicitly optimized for both LTX‑2 and LTX‑2.3, and designed to deliver strong motion vectors while maintaining spatial coherence.


📦 The Two FP4 Editions

🛡️ FP4 High‑Fidelity Edition (Protected Layers) [Recommended]

This version uses a surgical mixed‑precision stabilizer to preserve facial symmetry and spatial coherence.

  • Layers 0–1 (Input embeddings) kept in BF16.
  • Layers 44–47 (Final output projections) kept in BF16.
  • All LayerNorms and Biases kept in BF16.
  • All mid-transformer layers quantized to FP4.

Best for: Maximum stability, minimal facial drift, consistent anatomy, and strong but mathematically controlled motion vectors. Highly recommended for complex I2V/T2V tasks.

🚀 FP4 Pure Edition (No Protected Layers)

This version is a uniform, flat FP4/NVFP4 quantization of the Abliterated QAT model.

  • All transformer layers (0-47) quantized to FP4.
  • Only LayerNorms and Biases remain in BF16.

Best for: Maximum performance, the absolute lowest VRAM footprint, and the fastest inference on Blackwell GPUs. It trades a tiny amount of spatial stability for raw speed and more intense, aggressive motion vectors.
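The difference between the two editions comes down to a per-layer precision policy. A minimal sketch of that policy (a hypothetical helper for illustration; the actual precision assignment happens at model-conversion time, not at load time):

```python
# Sketch of the layer-precision policies described above.
# Layers 0-1 (input embeddings) and 44-47 (final output projections) are the
# "protected" layers of the High-Fidelity edition; norms and biases stay BF16
# in both editions regardless.

def layer_dtype(layer_idx: int, edition: str = "high_fidelity") -> str:
    """Return the storage dtype for transformer layer 0-47."""
    if edition == "high_fidelity" and (layer_idx <= 1 or layer_idx >= 44):
        return "bf16"  # protected input/output layers
    return "fp4"       # all mid-transformer layers
```

In the Pure edition the policy degenerates to `fp4` for every layer, which is what buys the extra speed and the lower VRAM footprint.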


🧰 Usage in ComfyUI

  1. Download your preferred .safetensors file.
  2. Place the file inside your ComfyUI models folder: ComfyUI/models/text_encoders/
  3. Load the model via the standard DualCLIPLoader or LTX‑2 Text Encoder Loader.
  4. Recommended dtype: fp8_e4m3fn (the BF16‑protected layers are automatically detected and kept in BF16 by ComfyUI's loader).
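Steps 1–2 above can be done from a terminal (paths follow the default ComfyUI layout; the filename in the comment is illustrative, not the actual file name in this repository):

```shell
# Create the ComfyUI text-encoder folder and place the downloaded file there.
# (The filename below is illustrative; use the actual .safetensors file
#  you downloaded from this repository.)
mkdir -p ComfyUI/models/text_encoders
# mv ~/Downloads/gemma-3-12b-qat-fp4-ltx2.safetensors ComfyUI/models/text_encoders/
ls ComfyUI/models
```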

💡 Prompting Tip: Start your prompts with direct action verbs (e.g., "running", "falling", "embracing", "exploding"). FP4 models respond extremely well to dynamic, upfront phrasing.


🔬 Technical Background

Why Gemma‑QAT for LTX‑2?

The LTX‑2 base model architecture is highly sensitive to the text encoder's conditioning. The LTX team recommends QAT (Quantization‑Aware Training) encoders because they provide:

  • Stable activation distributions
  • Smooth residual streams
  • Strong temporal gradients
  • Robust spatial alignment
  • Heavily reduced “frozen video” (motion collapse) behavior

The Abliteration V2 Magic

These models are derived from mlabonne/gemma-3-12b-it-qat-abliterated. Abliteration is a multi‑step orthogonalization process, not a simple weight deletion: it compares residual streams from harmful vs. harmless samples, computes a "refusal direction", and subtracts this direction directly from the hidden states of the target modules. The result is a fully uncensored, high‑fidelity instruction model with strong, uninhibited semantic gradients, which acts as an effective cure for static/frozen LTX‑2 generations.
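The core operation can be sketched in plain NumPy (an illustrative reduction of the method; the real pipeline estimates the refusal direction from many harmful/harmless activation pairs across layers):

```python
import numpy as np

# Illustrative sketch of the orthogonalization at the heart of abliteration:
# estimate a "refusal direction" from residual-stream activations, then
# project that component out of the hidden states.

def refusal_direction(harmful: np.ndarray, harmless: np.ndarray) -> np.ndarray:
    """Difference of mean residual-stream activations, normalized to unit length."""
    d = harmful.mean(axis=0) - harmless.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate(hidden: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of each hidden state along the refusal direction."""
    return hidden - np.outer(hidden @ direction, direction)
```

After ablation, every hidden state is orthogonal to the refusal direction, so the model can no longer express that direction in its residual stream.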

Why FP4 for Blackwell GPUs?

NVIDIA's latest Blackwell Tensor Cores are explicitly optimized for FP4/NVFP4 mathematical operations. This format offers:

  • Significantly higher throughput than FP8
  • Extremely low VRAM footprint
  • Faster long‑prompt (prefill) inference
  • Decreased pressure on memory bandwidth

These FP4 editions feature a pure FP4 tensor layout (with appropriate micro-block and global scales) fully compatible with NVFP4 hardware acceleration on RTX 50‑series and data center hardware.
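A toy sketch of the block-scaled 4-bit idea (assumed block size 16 and the standard FP4 E2M1 value grid; real NVFP4 additionally stores the block scales in FP8 plus a global scale, which this sketch omits):

```python
import numpy as np

# Representable magnitudes of the FP4 (E2M1) format used by NVFP4.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4_block(x: np.ndarray, block: int = 16):
    """Toy block-scaled FP4 quantizer: one scale per block of `block` values.
    Returns (quantized values on the FP4 grid, per-block scales)."""
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scale[scale == 0] = 1.0
    scaled = x / scale
    # snap each scaled value to the nearest FP4 grid point, preserving sign
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    q = np.sign(scaled) * FP4_GRID[idx]
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q * scale
```

Because each block carries its own scale, the 4-bit grid adapts to the local dynamic range of the weights, which is what keeps the quality loss small despite the tiny format.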


📊 Technical Summary

| Component | 🛡️ High‑Fidelity Edition | 🚀 Pure Edition |
|---|---|---|
| Base Model | mlabonne/gemma‑3‑12b‑it‑qat‑abliterated | mlabonne/gemma‑3‑12b‑it‑qat‑abliterated |
| Quantization | FP4 + BF16 stabilizer | Pure FP4 |
| Protected Layers | 0–1, 44–47 | None |
| Norms & Biases | BF16 | BF16 |
| Inference Speed | Fast | Fastest |
| Stability | Highest | Moderate |
| VRAM Usage | Low | Lowest |

---

🏷️ Credits & Acknowledgments

  • Base Model & Abliteration v2: mlabonne
  • QAT Architecture & Gemma Weights: Google
  • FP4 Optimization, Hybrid Architecture & Stabilization: Sikaworld
  • LTX‑2 & QAT Recommendation: Lightricks / LTX‑Team