Qwen 360 Diffusion


Model Description


General

Qwen 360 Diffusion is a rank-128 LoRA built on top of a 20B-parameter MMDiT (Multimodal Diffusion Transformer) model, designed to generate 360-degree equirectangular projection images from text descriptions.

The LoRA was trained from the Qwen Image base model on a highly diverse dataset of tens of thousands of equirectangular images depicting landscapes, interiors, humans, animals, and objects. All images were resized to 2048×1024 before training.

The model was also trained on a diverse dataset of ordinary (non-panoramic) photos for regularization, which makes it behave as a realism finetune when prompted accordingly.

In extensive testing, the model substantially outperformed all other currently available text-to-image 360 generation models, so with the right prompt it should be capable of producing almost anything you want.

The model is also designed to produce equirectangular images that can be used for non-VR purposes such as general imagery, photography, artwork, architecture, portraiture, and many other concepts.

Training Details

The training dataset consists of 32k unique 360 degree equirectangular images. Each image was randomly rotated horizontally 3 times for data augmentation (original + 3 rotations), providing a total of 128k training images. All 32k original 360 images were manually checked by humans for seams, polar artifacts, incorrect distortions, and other problems before their inclusion in the dataset.
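Because a horizontal (yaw) rotation of an equirectangular image is just a circular shift along the width axis, the augmentation described above can be reproduced with a simple roll. A minimal PyTorch sketch (an illustration, not the actual training code):

```python
import torch


def random_yaw_rotations(equi: torch.Tensor, n: int = 3) -> list:
    """Return n randomly yaw-rotated copies of an equirectangular image.

    A horizontal (yaw) camera rotation corresponds to a circular shift of
    the equirectangular image along its width axis, so no resampling is
    needed and the wrap-around seam stays continuous.

    equi: image tensor of shape (C, H, W), e.g. (3, 1024, 2048).
    """
    _, _, width = equi.shape
    copies = []
    for _ in range(n):
        shift = int(torch.randint(1, width, (1,)))  # random yaw offset in pixels
        copies.append(torch.roll(equi, shifts=shift, dims=-1))
    return copies


# Original + 3 rotations, as in the augmentation described above.
image = torch.rand(3, 1024, 2048)
augmented = [image] + random_yaw_rotations(image, n=3)
```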

For regularization, 64k images were randomly selected from the pexels-568k-internvl2 dataset and added to the training set.

Training timeline: 3 months and 23 days

Training was first performed using nf4 quantization for 32 epochs (8 epochs counting the original + augmentations as a single epoch):

  • qwen-360-diffusion-int4-bf16-v1.safetensors was trained for 28 epochs (1,344,000 steps)

  • qwen-360-diffusion-int4-bf16-v1-b.safetensors was trained for 32 epochs (1,536,000 steps)

Training then continued at int8 quantization for another 16 epochs (4 epochs counting the original + augmentations as a single epoch):

  • qwen-360-diffusion-int8-bf16-v1.safetensors was trained for a total of 48 epochs (2,304,000 steps)

Usage

To activate panoramic generation, include one of the following trigger phrases, or a variation built from one or more of these trigger words, in your prompt:

"equirectangular", "360 image", "360 panorama", or "360 degree panorama with equirectangular projection"

Note that even viewing a panorama in a 360 viewer on a flat 2D screen can create the feeling of actually being inside the scene, known in psychology as a sense of 'presence'.

Recommended Settings

  • Aspect ratio: For best results, use the 2:1 resolution of 2048×1024. Lower 2:1 resolutions such as 1024×512 and 1536×768 may cause the model to struggle to generate proper horizons in text-to-image generation.

  • Prompt tips: Include desired medium or style, such as photograph, oil painting, illustration, or digital art.

  • 360-specific considerations: Remember that 360 images wrap around with no borders: the left edge connects to the right edge, while the top and bottom edges each converge to a single point at one of the sphere's poles. A quick seam-continuity check is sketched after this list.

  • Human subject considerations: For full body shots, specify the head/face and footwear (e.g., "wearing boots") or lack thereof to avoid incomplete or incorrectly distorted outputs.

  • Equirectangular distortion: Outputs show increasing horizontal stretching as you move vertically away from the horizon line toward the poles. These distortions are not visible when the image is viewed in a 360 viewer.
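Because a valid equirectangular image must wrap seamlessly, a quick sanity check is to compare the leftmost and rightmost pixel columns, which are adjacent on the sphere. A minimal sketch (the file name is a placeholder, and this is a rough heuristic rather than a strict test):

```python
import numpy as np
from PIL import Image


def edge_seam_error(path: str) -> float:
    """Mean absolute difference (0-255 scale) between the leftmost and
    rightmost pixel columns; values near zero mean the panorama wraps
    cleanly with no visible vertical seam."""
    img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32)
    return float(np.abs(img[:, 0, :] - img[:, -1, :]).mean())


print(edge_seam_error("my_panorama_2048x1024.png"))  # placeholder file name
```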

Once generated, you can upscale your panoramas for use as photographs, artwork, skyboxes, virtual environments, VR experiences, VR therapy, or 3D scene backgrounds, or as part of a text-to-video-to-3D-world pipeline. As noted above, the model is also designed to produce equirectangular images for non-VR use.


Notes

FP8 inference

When running quantized inference, for maximum visual fidelity it is strongly recommended to use the GGUF Q8 or int8 quantized versions of the Qwen Image transformer models rather than fp8 quantization.
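As a hedged sketch of that setup with diffusers: it assumes your diffusers version supports GGUF single-file loading for the Qwen Image transformer class (as it does for other DiT models), and the .gguf path is a placeholder for whichever Q8 quant you download:

```python
import torch
from diffusers import DiffusionPipeline, GGUFQuantizationConfig, QwenImageTransformer2DModel

# Placeholder path: point this at a Q8_0 GGUF quant of the Qwen Image transformer.
transformer = QwenImageTransformer2DModel.from_single_file(
    "qwen-image-Q8_0.gguf",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights(
    "ProGamerGov/qwen-360-diffusion",
    weight_name="qwen-360-diffusion-int8-bf16-v1.safetensors",
)
```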

If you are using transformer models in fp8_e4m3fn or fp8_e5m2 precision, or low-precision models trained with "accuracy-fixing" methods (e.g., ostris/ai-toolkit), they may cause patch or grid artifacts when combined with the int8-trained LoRA. Some have found this issue to be caused by downcasting directly from fp16 to fp8 without proper scaling and calibration. To avoid this, use the LoRA versions trained at lower base precision:
qwen-360-diffusion-int4-bf16-v1.safetensors or qwen-360-diffusion-int4-bf16-v1-b.safetensors.

  • Low-Precision Artifact Mitigation
    If artifacts still appear when using the int4-trained LoRA on an fp8_e4m3fn or fp8_e5m2 transformer quant, they can often be reduced by:

    • Adjusting the LoRA weight, and/or refining both positive and negative prompts.

Additional Tools

HTML 360 Viewer

To make viewing and sharing 360 images and videos easier, I built a browser-based HTML 360 viewer that runs locally on your device. It works in desktop and mobile browsers and has optional VR headset support.

Recommended ComfyUI Nodes

If you use ComfyUI, these node packs can be useful for working with 360 images and videos.

If you are working with diffusers or other libraries, the pytorch360convert library can help when handling 360 media.
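As a rough sketch of the kind of conversions it provides; the function and argument names below follow the py360convert-style API that pytorch360convert mirrors and should be treated as assumptions, so check the library's documentation for the exact signatures:

```python
import torch

# Function names assumed from pytorch360convert's py360convert-style API;
# check the library's documentation for exact signatures.
from pytorch360convert import c2e, e2c

equi = torch.rand(3, 1024, 2048)  # (C, H, W) equirectangular image

# Equirectangular -> six cube faces, e.g. for use as a skybox (argument names are assumptions).
faces = e2c(equi, face_w=1024, cube_format="list")

# Cube faces -> back to a 2:1 equirectangular panorama.
round_trip = c2e(faces, h=1024, w=2048, cube_format="list")
```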


Limitations

A large portion of the training data places the camera level with the horizon (at 90 degrees to the direction of gravity), so outputs may need to be rotated on the sphere to achieve different vertical viewing angles.
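Re-aiming an equirectangular image vertically requires a full rotation on the sphere rather than a simple pixel shift (which only changes yaw). The self-contained PyTorch sketch below illustrates such a pitch rotation; it is an illustration rather than part of the model's tooling, and dedicated 360 libraries typically provide equivalent utilities:

```python
import math

import torch
import torch.nn.functional as F


def pitch_rotate_equirect(equi: torch.Tensor, pitch_deg: float) -> torch.Tensor:
    """Rotate an equirectangular image about the horizontal axis.

    equi: (C, H, W) image tensor; pitch_deg > 0 tilts the view upward.
    Returns a (C, H, W) tensor resampled with bilinear interpolation.
    """
    _, h, w = equi.shape
    device = equi.device

    # Longitude/latitude of every output pixel centre.
    j = torch.arange(w, device=device, dtype=torch.float32)
    i = torch.arange(h, device=device, dtype=torch.float32)
    lon = (j + 0.5) / w * 2 * math.pi - math.pi   # (-pi, pi), left to right
    lat = math.pi / 2 - (i + 0.5) / h * math.pi   # (pi/2, -pi/2), top to bottom
    lat, lon = torch.meshgrid(lat, lon, indexing="ij")

    # Unit viewing directions (y up, z forward).
    x = torch.cos(lat) * torch.sin(lon)
    y = torch.sin(lat)
    z = torch.cos(lat) * torch.cos(lon)

    # Rotate the directions about the x axis by -pitch to find the source
    # direction that each output pixel should sample from.
    t = math.radians(-pitch_deg)
    y_src = y * math.cos(t) - z * math.sin(t)
    z_src = y * math.sin(t) + z * math.cos(t)

    src_lat = torch.asin(y_src.clamp(-1.0, 1.0))
    src_lon = torch.atan2(x, z_src)

    # Normalised sampling grid (align_corners=False convention).
    grid_u = src_lon / math.pi
    grid_v = -src_lat / (math.pi / 2)
    grid = torch.stack((grid_u, grid_v), dim=-1).unsqueeze(0).to(equi.dtype)

    # Border padding means a one-pixel-wide strip at the longitude seam is
    # only approximate; good enough for previews and compositing.
    out = F.grid_sample(
        equi.unsqueeze(0), grid, mode="bilinear",
        padding_mode="border", align_corners=False,
    )
    return out.squeeze(0)


# Usage: tilt a generated panorama upward by 30 degrees.
pano = torch.rand(3, 1024, 2048)
tilted = pitch_rotate_equirect(pano, pitch_deg=30.0)
```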


Contributors

Citation Information

BibTeX

@software{Egan_Qwen_360_Diffusion_2025,
  author = {Egan, Ben and {XWAVE} and {Jimmy Carter}},
  license = {MIT},
  month = dec,
  title = {{Qwen 360 Diffusion}},
  url = {https://huggingface.co/ProGamerGov/qwen-360-diffusion},
  year = {2025}
}

APA

Egan, B., XWAVE, & Jimmy Carter. (2025). Qwen 360 Diffusion [Computer software]. https://huggingface.co/ProGamerGov/qwen-360-diffusion

Please refer to the CITATION.cff file for more information on how to cite this model.


This model can also be found on HuggingFace: https://huggingface.co/ProGamerGov/qwen-360-diffusion
