zer0int's Long CLIP_L-Registers-Gated_MLP-ViT-L-14
Model description
100% of the credit goes to zer0int on Huggingface. I'm only putting this up here so I can mark it as a resource in image generation. Huggingface link: https://huggingface.co/zer0int/LongCLIP-Registers-Gated_MLP-ViT-L-14. I will remove this if requested by zer0int.
NOTE: This is not recommended for SDXL. From a quick read of the issues, it appears a compatible Long CLIP_G has not been released yet, so it may not work very well for SDXL. If you need CLIP_G, it's probably best not to use this.
The main difference between "Long-CLIP_L" and "CLIP_L" is token length.
CLIP_L = 77-token limit on the prompt.
Long-CLIP_L = 248-token limit on the prompt.
Since I do pretty much only Flux generations and sometimes use LLMs to write prompts, the larger token length helps. I mean, why limit yourself to 77 tokens when you can have 248?
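If you're curious what that limit actually does in practice, here's a minimal sketch using the standard transformers CLIPTokenizer (the openai/clip-vit-large-patch14 tokenizer is just an illustrative stand-in; the Long-CLIP release may ship its own tokenizer config): the same prompt gets cut off at 77 tokens for regular CLIP_L but can keep up to 248 for Long-CLIP_L.

```python
from transformers import CLIPTokenizer

# Standard CLIP-L tokenizer, used here only to illustrate truncation behavior.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

prompt = "a cinematic photo of " + "a very detailed scene, " * 30

# Standard CLIP_L: everything past 77 tokens is cut off.
short = tokenizer(prompt, truncation=True, max_length=77)
# Long-CLIP_L: the same prompt survives up to 248 tokens.
longer = tokenizer(prompt, truncation=True, max_length=248)

print(len(short["input_ids"]), len(longer["input_ids"]))  # 77 vs up to 248
```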
TE Only = Text encoder only; this is really all you need most of the time.
Full Model = The whole thing (text encoder plus vision encoder), if you want to do more than just text-to-image.
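If you want to use it outside of a UI, here's a rough sketch of swapping it in as the CLIP text encoder for Flux with diffusers. This is only a sketch under assumptions: whether the Huggingface repo loads directly with CLIPTextModel, the exact repo layout, and the matching 248-token tokenizer are all assumptions, so check zer0int's Huggingface page for the actual usage.

```python
import torch
from transformers import CLIPTextModel
from diffusers import FluxPipeline

# Assumption: the repo can be loaded directly as a transformers text encoder
# (i.e. it ships a config with max_position_embeddings=248).
long_clip = CLIPTextModel.from_pretrained(
    "zer0int/LongCLIP-Registers-Gated_MLP-ViT-L-14",
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder=long_clip,  # swap in Long-CLIP_L; T5 (text_encoder_2) stays stock
    torch_dtype=torch.bfloat16,
).to("cuda")

# Note: you would also need a tokenizer configured for 248 tokens (assumption),
# otherwise prompts are still truncated at the default 77.
image = pipe("your long, detailed prompt goes here", num_inference_steps=28).images[0]
image.save("out.png")
```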
This particular Long-CLIP_L is the Registers-Gated_MLP version, which is a fine-tune by zer0int. zer0int has provided a nice little chart showing the difference this fine-tune makes. If I'm not mistaken, it's basically saying that the text-to-text and image-to-text results line up more closely with each other and have far fewer instances of erroneous data. TL;DR - the shorter and wider, the better!


