# PixArt-Σ

## Overview

[PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation](https://huggingface.co/papers/2403.04692) is Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Yue Wu, Zhongdao Wang, James Kwok, Ping Luo, Huchuan Lu, and Zhenguo Li.

Some notes about this pipeline:

* It uses a Transformer backbone (instead of a UNet) for denoising. As such it has a similar architecture as [DiT](https://hf.co/docs/transformers/model_doc/dit).
* It was trained using text conditions computed from T5. This aspect makes the pipeline better at following complex text prompts with intricate details.
* It is good at producing high-resolution images at different aspect ratios. To get the best results, the authors recommend some size brackets which can be found [here](https://github.com/PixArt-alpha/PixArt-sigma/blob/master/diffusion/data/datasets/utils.py).
* It rivals the quality of state-of-the-art text-to-image generation systems (as of this writing) such as PixArt-α, Stable Diffusion XL, Playground V2.0 and DALL-E 3, while being more efficient than them.
* It shows the ability of generating super high resolution images, such as 2048px or even 4K.
* It shows that text-to-image models can grow from a weak model to a stronger one through several improvements (VAEs, datasets, and so on.)

🤗 `Optimum` extends `Diffusers` to support inference on the second generation of Neuron devices(powering Trainium and Inferentia 2). It aims at inheriting the ease of Diffusers on Neuron.

## Export to Neuron

To deploy models in the PixArt-Σ pipeline, you will need to compile them to TorchScript optimized for AWS Neuron. There are four components which need to be exported to the `.neuron` format to boost the performance:

* Text encoder
* Transformer
* VAE encoder
* VAE decoder

You can either compile and export a PixArt-Σ Checkpoint via CLI or `NeuronPixArtSigmaPipeline` class.

### Option 1: CLI

```bash
optimum-cli export neuron --model Jingya/pixart_sigma_pipe_xl_2_512_ms --batch_size 1 --height 512 --width 512 --num_images_per_prompt 1 --torch_dtype bfloat16 --sequence_length 120 pixart_sigma_neuron_512/
```

> [!TIP]
> We recommend using a `inf2.8xlarge` or a larger instance for the model compilation. You will also be able to compile the model with the Optimum CLI on a CPU-only instance (needs ~35 GB memory), and then run the pre-compiled model on `inf2.xlarge` to reduce the expenses. In this case, don't forget to disable validation of inference by adding the `--disable-validation` argument.

### Option 2: Python API

```python
import torch
from optimum.neuron import NeuronPixArtSigmaPipeline

# Compile
compiler_args = {"auto_cast": "none"}
input_shapes = {"batch_size": 1, "height": 512, "width": 512, "sequence_length": 120}

neuron_model = NeuronPixArtSigmaPipeline.from_pretrained("Jingya/pixart_sigma_pipe_xl_2_512_ms", torch_dtype=torch.bfloat16, export=True, disable_neuron_cache=True, **compiler_args, **input_shapes)

# Save locally
neuron_model.save_pretrained("pixart_sigma_neuron_512/")

# Upload to the HuggingFace Hub
neuron_model.push_to_hub(
    "pixart_sigma_neuron_512/", repository_id="optimum/pixart_sigma_pipe_xl_2_512_ms_neuronx"  # Replace with your HF Hub repo id
)
```

## Text-to-Image

`NeuronPixArtSigmaPipeline` class allows you to generate images from a text prompt on neuron devices similar to the experience with `Diffusers`.

With pre-compiled PixArt-Σ models, now generate an image with a prompt on Neuron:

```python
from optimum.neuron import NeuronPixArtSigmaPipeline

neuron_model = NeuronPixArtSigmaPipeline.from_pretrained("pixart_sigma_neuron_512/")
prompt = "Oppenheimer sits on the beach on a chair, watching a nuclear exposition with a huge mushroom cloud, 120mm."
image = neuron_model(prompt=prompt).images[0]
```

## NeuronPixArtSigmaPipeline[[optimum.neuron.NeuronPixArtSigmaPipeline]]

Pipeline for text-to-image generation using PixArt-Σ.

#### optimum.neuron.NeuronPixArtSigmaPipeline[[optimum.neuron.NeuronPixArtSigmaPipeline]]

[Source](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling_diffusion.py#L1577)

__call__optimum.neuron.NeuronPixArtSigmaPipeline.__call__https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling_diffusion.py#L1094[{"name": "*args", "val": ""}, {"name": "**kwargs", "val": ""}]

Are there any other diffusion features that you want us to support in 🤗`Optimum-neuron`? Please file an issue to [`Optimum-neuron` Github repo](https://github.com/huggingface/optimum-neuron) or discuss with us on [HuggingFace’s community forum](/static-proxy?url=https%3A%2F%2Fdiscuss.huggingface.co%2Fc%2Foptimum%2F)%2C cheers 🤗 !