Gemma3-12B-Abliterated-fp8
Details
Download Files
About this version
Model description
V1.0a Experimental!!
Important, please read!
Gemma-3-12B-Heretic-X (Sikaworld High-Fidelity Edition)
This is the ultra-dynamic, fully uncensored text encoder for LTX-2, based on the experimental Heretic-X fine-tune by LastRef.
While the standard abliterated version removes the "refusal" mechanism, Heretic-X was actively steered with a custom dataset to be proactively descriptive and uninhibited. In LTX-2 video generation, this translates to significantly stronger motion vectors, helping to "unfreeze" static videos and generate more intense dynamics in complex scenes.
This edition applies the Sikaworld High-Fidelity Quantization method to tame the aggressive nature of Heretic-X, ensuring that the increased dynamics do not come at the cost of facial symmetry or anatomical coherence.
🚀 Key Features
Aggressive Uncensoring (Heretic-X): Unlike standard abliteration (which just deletes the refusal direction), this model uses modified weights (attn.o_proj, mlp.down_proj) derived from x-rated dataset training. It delivers a "louder" and more confident signal to the video transformer, which is often the cure for "frozen" I2V generations.
High-Fidelity Layer Protection (The Stabilizer): Aggressive fine-tunes often lead to "melting" faces in video. This version uses a Mixed Precision Strategy: the critical input layers (0-1) and the final output layers (44-47), as well as all LayerNorms and biases, are kept in BF16. This acts as a safety rail, keeping facial features symmetric while allowing the body and background to move dynamically.
True Standalone (.safetensors): Includes the embedded spiece_model tensor. It works as a single-file plug-and-play solution in ComfyUI (LTX-2) without requiring external tokenizer.model files or complex folder structures.
Surgical Extraction: Stripped of the 20GB+ Vision-Tower weights (which LTX-2 does not use) to save VRAM and loading time, while retaining the full 48-layer text intelligence of the 24GB BF16 source.
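The mixed-precision split described above can be sketched as a per-tensor dtype router. This is a minimal illustration, not the actual conversion script; it assumes the usual Gemma-3 key layout (`model.layers.N....`), which may differ in this file:

```python
import re

# Layers kept in BF16 per the strategy above: input (0-1) and output (44-47)
PROTECTED_LAYERS = {0, 1, 44, 45, 46, 47}

def target_dtype(name: str) -> str:
    """Decide per-tensor precision for a hybrid FP8/BF16 conversion."""
    # All LayerNorms and biases stay in high precision
    if "norm" in name or name.endswith(".bias"):
        return "bf16"
    # Protect the "navigation" layers at the ends of the 48-layer stack
    m = re.search(r"\.layers\.(\d+)\.", name)
    if m and int(m.group(1)) in PROTECTED_LAYERS:
        return "bf16"
    return "fp8_e4m3fn"
```

For example, `model.layers.10.mlp.down_proj.weight` would be quantized to FP8, while `model.layers.47.mlp.down_proj.weight` and any `input_layernorm` tensor stay BF16.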
🛠️ Usage in ComfyUI
Place the .safetensors file in your ComfyUI/models/text_encoders/ folder.
In your LTX-2 workflow (DualCLIPLoader), select this model.
Recommended Dtype: Set weight_dtype to fp8_e4m3fn (the critical layers remain BF16 automatically).
Prompting Tip: This model reacts very well to "action verbs" at the very beginning of the prompt. It requires less CFG scale than standard models to produce motion.
📊 Technical Background
Why Heretic-X for Video?
LTX-2 (especially the Dev version) often suffers from "motion collapse" (frozen video) when the text embedding is too neutral. Heretic-X provides a higher variance in its embeddings.
Why this Quantization?
Standard FP8 conversions of Heretic models often result in "weird" artifacts because the aggressive weights clip during quantization. By protecting the last 4 layers (44-47) in BF16, we ensure that the final instructions sent to the Video Transformer retain their high-precision spatial alignment, preventing the "uncanny valley" effect often seen in dynamic clips.
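To see why aggressive weights clip: fp8_e4m3fn cannot represent magnitudes above 448, so without per-tensor rescaling, outlier weights saturate at the format maximum. A minimal illustration (ignoring rounding to the fp8 value grid):

```python
FP8_E4M3FN_MAX = 448.0  # largest finite magnitude representable in fp8_e4m3fn

def saturate(w: float) -> float:
    """Naive cast without rescaling: out-of-range values clip to the format max."""
    return max(-FP8_E4M3FN_MAX, min(FP8_E4M3FN_MAX, w))

# An aggressive fine-tuned outlier weight loses most of its signal:
outlier = 600.0
lost = outlier - saturate(outlier)  # 152.0 of magnitude silently discarded
```

Keeping the last four layers (44-47) in BF16 sidesteps this clipping exactly where the embeddings are handed to the video transformer.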
Credits
Base Model: Google Gemma 3
Heretic Fine-tune: LastRef
Optimization & Architecture Fixes: Sikaworld
v1.0
Gemma-3-12B-it-Abliterated (Sikaworld High-Fidelity Edition)
This is a specialized, fully uncensored (abliterated) text encoder for the LTX-2 audiovisual model.
While standard FP8 conversions often lead to "frozen" videos, facial drifting, or anatomical asymmetry in Image-to-Video (I2V) workflows, this version was surgically optimized to preserve the intelligence and stability of the original model.
🚀 Key Features
Uncensored Freedom: Based on the abliteration technique by Maxime Labonne. This model follows complex or "sensitive" prompts without refusals, ensuring a strong vector signal for high-motion video generation.
High-Fidelity Layer Protection: Unlike radical FP8 quants, this version uses a Mixed Precision Strategy. Critical input layers (0-1) and final output layers (44-47), as well as all LayerNorms and Biases, are kept in BF16. This specifically fixes the "face shifting" and "asymmetry" issues common in LTX-2.
True Standalone (.safetensors): Includes the embedded spiece_model tensor. It works as a single-file plug-and-play solution in ComfyUI without requiring external tokenizer.model files.
FP32 Sourced: Converted directly from the original 47GB FP32 shards to ensure maximum rounding precision during the FP8/BF16 hybrid conversion.
🛠️ Usage in ComfyUI
Place the .safetensors file in your ComfyUI/models/text_encoders/ folder.
In your LTX-2 workflow, use the DualCLIPLoader or the specific LTXV Text Encoder Loader.
Tip: For best motion results, leave the negative prompt empty and focus your positive prompt on actions and dynamics.
📊 Technical Background
Standard 8-bit quantization often "muffles" the subtle signals needed for temporal consistency in video models. By protecting the "navigation" layers (the beginning and end of the 48-layer stack) in BF16, this encoder provides a much "louder" and more stable movement command to the LTX-2 Transformer.
Credits
Abliteration: mlabonne
Optimization & Quantization: Sikaworld
