🌍 Gemma‑3‑12B‑QAT‑Abliterated — Sikaworld FP4 Editions

Blackwell‑optimized FP4 text encoders for LTX‑2 and 2.3, based on mlabonne’s improved Abliteration technique.

Important General Note

Using the official LTX-2.3 dev NVFP4 model in combination with any of these FP4 text encoders results in a noticeable quality degradation compared to pairing the same text encoders with FP8 or BF16 versions of the LTX-2.3 model. Different workflow variants and their respective quality/speed trade-offs are demonstrated in the embedded showcase videos.

I have also modified the official ComfyUI template workflow by adjusting the audio and video parameters for better action dynamics and clearer speech output — the same optimizations I already apply in my standard workflow when using Transformers-only checkpoints from KJ.

🌐 Overview

NVIDIA's Blackwell architecture introduced first‑class hardware support for FP4/NVFP4 inference, which makes very fast and memory‑efficient text encoders practical. At the same time, the LTX‑2 development team officially recommends Gemma‑QAT‑based encoders for video generation because of their stable activation distributions, strong semantic gradients, and robust temporal behavior.

This repository provides two custom FP4 variants of the uncensored Gemma‑3‑12B‑QAT model created by mlabonne using his improved Abliteration v2 method.

Both models are fully uncensored, explicitly optimized for LTX‑2 and LTX‑2.3, and designed to deliver strong motion vectors while maintaining spatial coherence.

📦 The Two FP4 Editions

🛡️ FP4 High‑Fidelity Edition (Protected Layers) [Recommended]

This version uses a surgical mixed‑precision stabilizer to preserve facial symmetry and spatial coherence.

Layers 0–1 (Input embeddings) kept in BF16.
Layers 44–47 (Final output projections) kept in BF16.
All LayerNorms and Biases kept in BF16.
All mid-transformer layers quantized to FP4.
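
The layer rules above can be sketched as a small lookup that decides, per tensor, whether it stays in BF16 or goes to FP4. This is only an illustration: the `layers.<N>.` naming pattern, the function name `target_precision`, and the choice to leave tensors outside the numbered blocks in BF16 are assumptions, not details taken from the actual conversion script.

```python
import re

# Assumed tensor naming: "...layers.<index>.<submodule>...". Real checkpoints may differ.
LAYER_RE = re.compile(r"\blayers\.(\d+)\.")

# Protected block indices as listed above (0-1 and 44-47).
PROTECTED_LAYERS = frozenset({0, 1, 44, 45, 46, 47})


def target_precision(tensor_name, protected=PROTECTED_LAYERS):
    """Return "bf16" or "fp4" for one tensor, following the rules listed above."""
    lowered = tensor_name.lower()
    # Norm weights and biases always stay in BF16.
    if "norm" in lowered or lowered.endswith(".bias"):
        return "bf16"
    match = LAYER_RE.search(tensor_name)
    if match is None:
        # Assumption: tensors outside the numbered blocks are left in BF16.
        return "bf16"
    layer_index = int(match.group(1))
    return "bf16" if layer_index in protected else "fp4"


for name in (
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.20.mlp.up_proj.weight",
    "model.layers.20.input_layernorm.weight",
    "model.layers.47.mlp.down_proj.weight",
):
    print(f"{name:45s} -> {target_precision(name)}")
```

Passing an empty `protected` set gives a flat scheme in which only norms and biases stay in BF16.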

Best for: Maximum stability, minimal facial drift, consistent anatomy, and strong but mathematically controlled motion vectors. Highly recommended for complex I2V/T2V tasks.

🚀 FP4 Pure Edition (No Protected Layers)

This version is a relentless, flat FP4/NVFP4 quantization of the Abliterated QAT model.

All transformer layers (0-47) quantized to FP4.
Only LayerNorms and Biases remain in BF16.

Best for: Maximum performance, the absolute lowest VRAM footprint, and the fastest inference on Blackwell GPUs. It trades a tiny amount of spatial stability for raw speed and more intense, aggressive motion vectors.

🧰 Usage in ComfyUI

Download your preferred .safetensors file.
Place the file inside your ComfyUI models folder: ComfyUI/models/text_encoders/
Load the model via the standard DualCLIPLoader or LTX‑2 Text Encoder Loader.
Recommended dtype: fp8_e4m3fn (Note: The BF16‑protected layers will automatically be respected and kept in BF16 by ComfyUI's loader).
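
Before loading, it can be useful to check which storage dtypes a downloaded file actually contains. The sketch below reads only the JSON header of a `.safetensors` file (an 8-byte little-endian length followed by the header itself) using the Python standard library. The helper names and the example path are hypothetical, and how FP4 tensors are labeled in the header depends on how the checkpoint was packed, so the output should be read as a rough inventory only.

```python
import json
import struct
from collections import Counter


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading any weights."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian unsigned 64-bit
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header


def dtype_summary(path):
    """Count tensors per stored dtype string found in the header."""
    header = read_safetensors_header(path)
    return Counter(entry["dtype"] for entry in header.values())


# Example call (hypothetical file name, adjust to the file you downloaded):
# print(dtype_summary("ComfyUI/models/text_encoders/your_encoder.safetensors"))
```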

💡 Prompting Tip: Start your prompts with direct action verbs (e.g., "running", "falling", "embracing", "exploding"). FP4 models respond extremely well to dynamic, upfront phrasing.

🔬 Technical Background

Why Gemma‑QAT for LTX‑2?

The LTX‑2 base model is very sensitive to the text encoder's conditioning. The LTX team recommends QAT (Quantization‑Aware Training) encoders because they provide:

Stable activation distributions
Smooth residual streams
Strong temporal gradients
Robust spatial alignment
Heavily reduced “frozen video” (motion collapse) behavior

The Abliteration v2 Magic

These models are derived from mlabonne/gemma-3-12b-it-qat-abliterated. Abliteration is a multi‑step orthogonalization process, not a simple deletion. It compares residual‑stream activations from harmful and harmless samples, computes a "refusal direction" from the difference, and subtracts this direction from the hidden states of the target modules. The result is a fully uncensored, high‑fidelity instruction model with loud and uninhibited semantic gradients, which makes it a strong remedy for static/frozen LTX‑2 generations.
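
The core idea of the refusal-direction step can be illustrated with a few lines of plain Python. This is a toy sketch on 3-dimensional vectors, not mlabonne's implementation: the function names are hypothetical, the direction is taken as the normalized difference of mean activations, and the direction is removed from a single hidden state by projection.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector pointing from the harmless mean to the harmful mean."""
    diff = [a - b for a, b in zip(mean_vector(harmful_acts), mean_vector(harmless_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("harmful and harmless means are identical")
    return [x / norm for x in diff]


def ablate(hidden, direction):
    """Remove the component of `hidden` that lies along the unit vector `direction`."""
    projection = sum(h * d for h, d in zip(hidden, direction))
    return [h - projection * d for h, d in zip(hidden, direction)]


# Toy activations: the two sets differ only along the first axis.
harmful = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
harmless = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

direction = refusal_direction(harmful, harmless)
cleaned = ablate([5.0, 2.0, -1.0], direction)
print(direction, cleaned)
```

After ablation the hidden state has no component left along the computed direction, while the other components are unchanged.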

Why FP4 for Blackwell GPUs?

NVIDIA's latest Blackwell Tensor Cores are explicitly optimized for FP4/NVFP4 mathematical operations. This format offers:

Significantly higher throughput than FP8
Extremely low VRAM footprint
Faster long‑prompt (prefill) inference
Decreased pressure on memory bandwidth
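
A back-of-the-envelope calculation shows where the footprint difference comes from. The sketch assumes a nominal 12 billion weights (the "12B" in the model name), counts weights only, and uses the NVFP4 layout of one 8-bit scale per 16-element block, which works out to 4.5 bits per weight. It ignores activations, the per-tensor global scale, and any tensors kept in BF16.

```python
def weight_gib(num_params, bits_per_weight):
    """Weight storage in GiB for a given parameter count and bit width."""
    return num_params * bits_per_weight / 8 / 2**30


NOMINAL_PARAMS = 12e9  # assumption: nominal size implied by "12B"

# NVFP4: 4-bit value plus one 8-bit block scale shared by 16 values.
NVFP4_BITS = 4 + 8 / 16

for name, bits in (("BF16", 16), ("FP8", 8), ("NVFP4", NVFP4_BITS)):
    print(f"{name:>6}: {weight_gib(NOMINAL_PARAMS, bits):5.1f} GiB")
```

Under these assumptions the weights come to roughly 22 GiB in BF16, 11 GiB in FP8, and a little over 6 GiB in NVFP4.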

These FP4 editions feature a pure FP4 tensor layout (with appropriate micro-block and global scales) fully compatible with NVFP4 hardware acceleration on RTX 50‑series and data center hardware.
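
To make the micro-block scaling concrete, here is a simplified pure-Python simulation of block-wise FP4 quantization. It uses the eight magnitudes representable in FP4 E2M1 and a block size of 16, as in NVFP4. It is a sketch, not the conversion code used for these files: the block scale is kept as an ordinary Python float instead of being rounded to FP8, the per-tensor global scale is omitted, and all function names are hypothetical.

```python
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)  # magnitudes representable in FP4 E2M1
BLOCK = 16  # NVFP4 micro-block size


def quantize_block(values):
    """Quantize one block; return (scale, codes) with codes on the signed E2M1 grid."""
    amax = max(abs(v) for v in values)
    scale = amax / 6.0 if amax > 0 else 1.0  # map the largest magnitude onto 6.0
    codes = []
    for v in values:
        # Pick the nearest representable magnitude, then restore the sign.
        magnitude = min(E2M1_GRID, key=lambda g: abs(abs(v) / scale - g))
        codes.append(-magnitude if v < 0 else magnitude)
    return scale, codes


def dequantize_block(scale, codes):
    """Reconstruct approximate values from one quantized block."""
    return [c * scale for c in codes]


def quantize_tensor(values):
    """Split a flat list into blocks of 16 and quantize each block separately."""
    blocks = [values[i:i + BLOCK] for i in range(0, len(values), BLOCK)]
    return [quantize_block(block) for block in blocks]


weights = [0.1 * i - 0.8 for i in range(32)]
for scale, codes in quantize_tensor(weights):
    print(round(scale, 4), codes)
```

Because each block gets its own scale, a block of small values is not forced onto the coarse grid spacing needed by a block of large values.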

📊 Technical Summary

Component	🛡️ High‑Fidelity Edition	🚀 Pure Edition
Base Model	mlabonne/gemma‑3‑12b‑it‑qat‑abliterated	mlabonne/gemma‑3‑12b‑it‑qat‑abliterated
Quantization	FP4 + BF16 stabilizer	Pure FP4
Protected Layers	0–1, 44–47	None
Norms & Biases	BF16	BF16
Inference Speed	Fast	Fastest
Stability	Highest	Moderate
VRAM Usage	Low	Lowest

🏷️ Credits & Acknowledgments

Base Model & Abliteration v2: mlabonne
QAT Architecture & Gemma Weights: Google
FP4 Optimization, Hybrid Architecture & Stabilization: Sikaworld
LTX‑2 & QAT Recommendation: Lightricks / LTX‑Team

Model Type	Checkpoint
Base Model	LTXV 2.3
Published	2026-03-21

gemma-3-12b-qat-abliterated-sikaworld-fp4-ltx2
