gemma-3-12b-qat-abliterated-sikaworld-fp4-ltx2
Model description
Gemma-3-12B-QAT-Abliterated: Sikaworld FP4 Editions
Blackwell-optimized FP4 text encoders for LTX-2 and LTX-2.3, based on mlabonne's improved Abliteration technique.
Important general note: Using the official LTX-2.3 dev NVFP4 model in combination with any of these FP4 text encoders results in a noticeable quality degradation compared to pairing the same text encoders with the FP8 or BF16 versions of the LTX-2.3 model. Different workflow variants and their respective quality/speed trade-offs are demonstrated in the embedded showcase videos.
I have also modified the official ComfyUI template workflow by adjusting the audio and video parameters for better action dynamics and clearer speech output. These are the same optimizations I already apply in my standard workflow when using Transformers-only checkpoints from KJ.
Overview
The NVIDIA Blackwell architecture introduced first-class support for FP4/NVFP4 inference, enabling extremely fast and memory-efficient text encoders. At the same time, the LTX-2 development team officially recommends Gemma-QAT-based encoders for video generation due to their stable activation distributions, strong semantic gradients, and robust temporal behavior.
This repository provides two custom FP4 variants of the uncensored Gemma-3-12B-QAT model created by mlabonne using his improved Abliteration v2 method.
Both models are fully uncensored, explicitly optimized for LTX-2 and LTX-2.3, and designed to deliver strong motion vectors while maintaining spatial coherence.
The Two FP4 Editions
FP4 High-Fidelity Edition (Protected Layers) [Recommended]
This version uses a surgical mixed-precision stabilizer to preserve facial symmetry and spatial coherence.
Layers 0-1 (Input embeddings) kept in BF16.
Layers 44-47 (Final output projections) kept in BF16.
All LayerNorms and Biases kept in BF16.
All mid-transformer layers quantized to FP4.
Best for: Maximum stability, minimal facial drift, consistent anatomy, and strong but mathematically controlled motion vectors. Highly recommended for complex I2V/T2V tasks.
FP4 Pure Edition (No Protected Layers)
This version is a uniform, flat FP4/NVFP4 quantization of the Abliterated QAT model, with no protected transformer layers.
All transformer layers (0-47) quantized to FP4.
Only LayerNorms and Biases remain in BF16.
Best for: Maximum performance, the absolute lowest VRAM footprint, and the fastest inference on Blackwell GPUs. It trades a tiny amount of spatial stability for raw speed and more intense, aggressive motion vectors.
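The layer rules of the two editions can be written down as a small precision policy. The following is only an illustrative sketch, not the script used to build these checkpoints: the tensor naming scheme (`model.layers.<index>...`), the edition labels, and the function name are assumptions made for the example.

```python
import re

# Layer indices kept in BF16 in the High-Fidelity Edition (from the lists above).
PROTECTED_LAYERS = {0, 1, 44, 45, 46, 47}


def precision_for(tensor_name: str, edition: str) -> str:
    """Return 'bf16', 'fp4' or 'unspecified' for one tensor.

    Names like 'model.layers.12.mlp.up_proj.weight' are a hypothetical
    naming scheme; real checkpoints may use different keys.
    """
    if edition not in ("high_fidelity", "pure"):
        raise ValueError(f"unknown edition: {edition}")
    # Norms and biases stay in BF16 in both editions.
    if "norm" in tensor_name.lower() or tensor_name.endswith(".bias"):
        return "bf16"
    match = re.search(r"\.layers\.(\d+)\.", tensor_name)
    if match is None:
        # Tensors outside the numbered layers are not covered by the description.
        return "unspecified"
    layer = int(match.group(1))
    if edition == "high_fidelity" and layer in PROTECTED_LAYERS:
        return "bf16"
    return "fp4"


for name in ("model.layers.0.self_attn.q_proj.weight",
             "model.layers.20.mlp.up_proj.weight",
             "model.layers.20.input_layernorm.weight"):
    print(name, precision_for(name, "high_fidelity"), precision_for(name, "pure"))
```

The only difference between the editions in this sketch is the protected-layer check, which matches the description: both keep norms and biases in BF16.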
Usage in ComfyUI
Download your preferred .safetensors file.
Place the file inside your ComfyUI models folder: ComfyUI/models/text_encoders/
Load the model via the standard DualCLIPLoader or LTX-2 Text Encoder Loader.
Recommended dtype: fp8_e4m3fn (Note: The BF16-protected layers will automatically be respected and kept in BF16 by ComfyUI's loader).
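The file placement step can also be scripted. This is a minimal sketch using only the Python standard library; the helper name `install_text_encoder` and the example paths in the comment are hypothetical, and only the `models/text_encoders` folder comes from the instructions above.

```python
import shutil
from pathlib import Path


def install_text_encoder(checkpoint: Path, comfy_root: Path) -> Path:
    """Copy a downloaded .safetensors file into ComfyUI/models/text_encoders/."""
    if checkpoint.suffix != ".safetensors":
        raise ValueError(f"expected a .safetensors file, got: {checkpoint.name}")
    target_dir = comfy_root / "models" / "text_encoders"
    # Create the folder if this ComfyUI install does not have it yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / checkpoint.name
    shutil.copy2(checkpoint, target)
    return target


# Example call (paths are placeholders, adjust them to your setup):
# install_text_encoder(Path("Downloads/encoder.safetensors"), Path("ComfyUI"))
```

After copying, the file should appear in the loader node's model list once ComfyUI is restarted or its model list is refreshed.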
Prompting Tip: Start your prompts with direct action verbs (e.g., "running", "falling", "embracing", "exploding"). FP4 models respond extremely well to dynamic, upfront phrasing.
Technical Background
Why Gemma-QAT for LTX-2?
The LTX-2 base model architecture reacts very sensitively to the text encoder's conditioning. The LTX team recommends QAT (Quantization-Aware Training) encoders because they provide:
Stable activation distributions
Smooth residual streams
Strong temporal gradients
Robust spatial alignment
Heavily reduced "frozen video" (motion collapse) behavior
The Abliteration V2 Magic
These models are derived from mlabonne/gemma-3-12b-it-qat-abliterated. Abliteration is a multi-step orthogonalization process, not just a simple deletion. It compares residual streams from harmful vs. harmless samples, computes a "refusal direction", and subtracts this direction natively from the hidden states of target modules. The result is a fully uncensored, high-fidelity instruction model with loud and uninhibited semantic gradients, acting as the perfect cure for static/frozen LTX-2 generations.
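The core idea of the refusal direction can be illustrated with a few lines of plain Python. This is a heavily simplified sketch (difference of mean activations, then projection removal) and not mlabonne's actual pipeline, which involves more steps; the toy vectors and function names are assumptions for illustration only.

```python
import math


def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def refusal_direction(harmful, harmless):
    """Unit vector pointing from the harmless mean to the harmful mean."""
    diff = [a - b for a, b in zip(mean_vector(harmful), mean_vector(harmless))]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("harmful and harmless means coincide")
    return [x / norm for x in diff]


def ablate(hidden, direction):
    """Remove the component of `hidden` that lies along `direction`."""
    dot = sum(h * d for h, d in zip(hidden, direction))
    return [h - dot * d for h, d in zip(hidden, direction)]


# Toy 2-dimensional "residual stream" samples (made-up numbers).
harmful = [[2.0, 1.0], [4.0, 1.0]]
harmless = [[0.0, 1.0], [0.0, 1.0]]
direction = refusal_direction(harmful, harmless)
print(ablate([5.0, 2.0], direction))
```

After ablation, the hidden state has no component left along the computed direction, while everything orthogonal to it is unchanged.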
Why FP4 for Blackwell GPUs?
NVIDIA's latest Blackwell Tensor Cores are explicitly optimized for FP4/NVFP4 mathematical operations. This format offers:
Significantly higher throughput than FP8
Extremely low VRAM footprint
Faster long-prompt (prefill) inference
Decreased pressure on memory bandwidth
These FP4 editions feature a pure FP4 tensor layout (with appropriate micro-block and global scales) fully compatible with NVFP4 hardware acceleration on RTX 50-series and data center hardware.
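To make the "micro-block and global scales" idea concrete, here is a simplified sketch of block-scaled 4-bit quantization. It assumes the E2M1 value grid and a block size of 16, which is how NVFP4 is commonly described; the scales are kept as ordinary Python floats instead of being rounded to a hardware scale format, so this is an illustration and not the exact NVFP4 encoding or the procedure used for these files.

```python
# Magnitudes representable by a 4-bit E2M1 float (sign handled separately).
FP4_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # assumed micro-block size


def nearest_fp4(x):
    """Round x to the closest representable FP4 value."""
    mag = min(FP4_MAGNITUDES, key=lambda m: abs(abs(x) - m))
    return -mag if x < 0 else mag


def quantize_blocks(values, block_size=BLOCK_SIZE):
    """Return (global_scale, [(block_scale, codes), ...])."""
    # Global scale: normalizes the whole tensor into [-1, 1].
    global_scale = max((abs(v) for v in values), default=0.0) or 1.0
    blocks = []
    for start in range(0, len(values), block_size):
        block = [v / global_scale for v in values[start:start + block_size]]
        # Block scale: maps the block's largest magnitude onto 6.0, the FP4 maximum.
        block_scale = max(abs(v) for v in block) / 6.0 or 1.0
        codes = [nearest_fp4(v / block_scale) for v in block]
        blocks.append((block_scale, codes))
    return global_scale, blocks


def dequantize_blocks(global_scale, blocks):
    """Reconstruct approximate values from codes and both scale levels."""
    return [c * s * global_scale for s, codes in blocks for c in codes]


weights = [0.02 * i - 0.3 for i in range(32)]
g, blocks = quantize_blocks(weights)
restored = dequantize_blocks(g, blocks)
print(max(abs(a - b) for a, b in zip(weights, restored)))
```

The per-block scale is what lets a block of small weights use the full FP4 grid even when other blocks in the same tensor hold much larger values.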
Technical Summary

| Component | High-Fidelity Edition | Pure Edition |
| --- | --- | --- |
| Base Model | mlabonne/gemma-3-12b-it-qat-abliterated | mlabonne/gemma-3-12b-it-qat-abliterated |
| Quantization | FP4 + BF16 stabilizer | Pure FP4 |
| Protected Layers | 0-1, 44-47 | None |
| Norms & Biases | BF16 | BF16 |
| Inference Speed | Fast | Fastest |
| Stability | Highest | Moderate |
| VRAM Usage | Low | Lowest |
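The VRAM difference between the editions can be approximated with simple arithmetic on bits per weight. The sketch below rests on several assumptions that are not stated in this model card: all 48 transformer layers are treated as equally large, each block of 16 FP4 weights carries one 8-bit scale, and embeddings, norms, and biases are ignored. Treat the result as a rough estimate, not a measurement.

```python
FP4_BITS = 4.0
BF16_BITS = 16.0
SCALE_BITS = 8.0   # assumed size of one per-block scale
BLOCK_SIZE = 16    # assumed number of weights sharing one scale


def average_bits_per_weight(protected_layers: int, total_layers: int = 48) -> float:
    """Average storage cost per weight across the transformer layers."""
    if not 0 <= protected_layers <= total_layers:
        raise ValueError("protected_layers must be between 0 and total_layers")
    # Each FP4 weight also pays for its share of the block scale.
    fp4_bits = FP4_BITS + SCALE_BITS / BLOCK_SIZE
    quantized_layers = total_layers - protected_layers
    total_bits = quantized_layers * fp4_bits + protected_layers * BF16_BITS
    return total_bits / total_layers


print("Pure Edition:", average_bits_per_weight(0))
print("High-Fidelity Edition:", average_bits_per_weight(6))
```

Under these assumptions, the six protected layers (0-1 and 44-47) are the only source of the size gap between the two editions.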

