Flux.1 [schnell] is too slow

Hello! When I use Flux.1 with the diffusers library, it gives me this warning:

You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers

and also runs extremely slowly. What’s the problem? I tried using both bf16 and fp16, but it has no effect whatsoever.
Here’s my code:

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

prompt = "Draw a 2D red ball with no background, without shadows"
image = pipe(
    prompt,
    height=512,
    width=512,
    guidance_scale=3.5,
    num_inference_steps=40,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]
image.save("flux.png")

Try disabling CPU offloading and keep everything on the GPU if you have enough GPU memory available.

#pipe.enable_model_cpu_offload()

Now it’s even slower. My CPU is at 60% usage, but the GPU is at 0%. What’s going on?

I think that the GPU version of torch may not have been installed correctly, and that the CPU version may have been installed instead.

I’ve installed torch with this command:

pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126

How do I check if I have the right version installed?

You can check it like this:

import torch
print(torch.cuda.is_available())
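
If you want a bit more detail than a single True/False, here is a minimal sketch that also reports which CUDA version the installed wheel was built against and which GPU torch can see. The helper name `describe_torch_install` is made up for this example; the torch attributes it reads (`torch.__version__`, `torch.version.cuda`, `torch.cuda.get_device_properties`) are standard ones.

```python
import importlib.util


def describe_torch_install():
    """Summarise the installed PyTorch build, or return None if torch is missing.

    `describe_torch_install` is a hypothetical helper written for this thread.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    info = {
        "torch_version": torch.__version__,
        # None here means a CPU-only wheel was installed.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "gpu_name": None,
        "gpu_memory_gb": None,
    }
    if info["cuda_available"]:
        props = torch.cuda.get_device_properties(0)
        info["gpu_name"] = props.name
        info["gpu_memory_gb"] = round(props.total_memory / 1024**3, 1)
    return info


if __name__ == "__main__":
    print(describe_torch_install())
```

If `built_with_cuda` comes back as None, the CPU-only build is installed, no matter what the driver supports.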

Rather than the nightly build, I recommend installing the stable one:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

  1. It prints “True”.
  2. I have CUDA 12.6.

Try this.

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16).to("cuda")
#pipe.enable_model_cpu_offload()

And if you haven’t installed it yet:

pip install -U accelerate
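
One more thing that is independent of where the model runs: schnell is a distilled model meant for very few steps, and the example on its model card uses guidance_scale=0.0, num_inference_steps=4 and max_sequence_length=256 rather than the 40 steps in the code above. Here is a small sketch of how those settings could be passed; `SCHNELL_KWARGS` and `build_call_kwargs` are hypothetical names, not part of diffusers.

```python
# Values from the FLUX.1-schnell model card example. They are an assumption
# relative to this thread, which used 40 steps and guidance_scale=3.5.
SCHNELL_KWARGS = {
    "guidance_scale": 0.0,
    "num_inference_steps": 4,
    "max_sequence_length": 256,
}


def build_call_kwargs(height=512, width=512, overrides=None):
    """Combine the image size with the schnell settings; `overrides` wins on conflicts."""
    kwargs = {"height": height, "width": width, **SCHNELL_KWARGS}
    kwargs.update(overrides or {})
    return kwargs


# Usage with the `pipe` and `prompt` from the code above (not executed here):
# image = pipe(prompt, **build_call_kwargs()).images[0]
print(build_call_kwargs())
```

This only cuts the number of steps. It does not make each step faster, so it helps but won't fix a GPU that is too small.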

Is 226.04s/it ok? I have a GTX 1650.

Even an RTX 4090 with 24GB of VRAM isn’t enough to run FLUX comfortably, and a GTX 1650 has only 4GB…
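
To put 226.04s/it in perspective, here is the arithmetic as a quick sketch. The 40 steps come from the code earlier in the thread. The 4-step figure assumes the per-step time stays the same when schnell is run with the few steps it was distilled for. The helper names are just for illustration.

```python
def total_seconds(seconds_per_step, num_steps):
    """Total generation time if every step takes the same amount of time."""
    return seconds_per_step * num_steps


def format_duration(seconds):
    """Format a number of seconds as 'Hh MMm SSs', rounded to whole seconds."""
    hours, rem = divmod(int(round(seconds)), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


SECONDS_PER_STEP = 226.04  # measured speed reported in this thread

for steps in (40, 4):
    print(f"{steps} steps: {format_duration(total_seconds(SECONDS_PER_STEP, steps))}")
# 40 steps: 2h 30m 42s
# 4 steps: 0h 15m 04s
```

So one 512x512 image would take about two and a half hours at 40 steps, and still around a quarter of an hour at 4 steps.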

Oh, ok. Sorry for being stupid :sweat_smile: I’m very new to AI

Are there lightweight models that can generate simple 2D art quickly?

FLUX is the latest model, so it’s very large and requires an unusually large amount of VRAM. SD1.5 is about 2GB, SDXL is about 6.5GB, and FLUX is around 32GB. You can reduce memory consumption using a technique called quantization, but at best that only shrinks the model to about a quarter of its original size.
Well, that’s why many people try to use cloud services rather than their own PCs.
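
As a rough back-of-the-envelope check of those numbers, here is a sketch that scales the sizes quoted above and compares them with the VRAM of the two cards mentioned in this thread. It assumes the quoted sizes are 16-bit weights and counts only the weights, so real usage (activations, text encoders, VAE decoding) will be higher. The function names are made up for this example.

```python
# Approximate sizes quoted in this thread, in GB (assumed to be 16-bit weights).
MODEL_SIZE_GB = {"SD1.5": 2.0, "SDXL": 6.5, "FLUX": 32.0}

# VRAM of the cards mentioned in this thread, in GB.
GPU_VRAM_GB = {"GTX 1650": 4, "RTX 4090": 24}


def quantized_size_gb(size_gb_16bit, bits):
    """Scale a 16-bit weight size to `bits` bits per weight (overhead ignored)."""
    return size_gb_16bit * bits / 16


def fits_in_vram(size_gb, vram_gb):
    """True if the weights alone fit; this is a lower bound on what is really needed."""
    return size_gb <= vram_gb


for model, size in MODEL_SIZE_GB.items():
    for bits in (16, 8, 4):
        q_size = quantized_size_gb(size, bits)
        fitting = [gpu for gpu, vram in GPU_VRAM_GB.items() if fits_in_vram(q_size, vram)]
        print(f"{model} at {bits}-bit: {q_size:.1f}GB, weights fit on: {fitting or 'neither card'}")
```

Even at 4-bit, FLUX comes out at about 8GB of weights, which is twice what a GTX 1650 has.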

As for lightweight models, try ones based on SD1.5 or SDXL. For anime, I recommend SDXL.

One of the SDXL models

Thanks! I should’ve read that before installing such a large model… :smiling_face_with_tear:

A lot of people are surprised by the size of FLUX. I’m GPU-poor too, so it won’t run locally for me either! :sob:

Spaces is free to use unless you get scammed, so you should try out different things and find a compromise.
If you’re okay with SDXL, then as long as you have a video card with 12GB VRAM, it should work. Even with 8GB, you can manage if you try hard.