latent size

#28

by CemalSahin - opened Nov 28, 2025

Discussion

CemalSahin

Nov 28, 2025

Great work! 👏
What are the optimal latent size / resolution buckets for different aspect ratios with z-image-turbo?

elguachiiii

Nov 29, 2025

•

edited Nov 29, 2025

These are the resolutions they provide on their own demo page, so I guess they are the optimal ones:

-- 1024 --
1024x1024 ( 1:1 )
1152x896 ( 9:7 )
896x1152 ( 7:9 )
1152x864 ( 4:3 )
864x1152 ( 3:4 )
1248x832 ( 3:2 )
832x1248 ( 2:3 )
1280x720 ( 16:9 )
720x1280 ( 9:16 )
1344x576 ( 21:9 )
576x1344 ( 9:21 )

-- 1280 --
1280x1280 ( 1:1 )
1440x1120 ( 9:7 )
1120x1440 ( 7:9 )
1472x1104 ( 4:3 )
1104x1472 ( 3:4 )
1536x1024 ( 3:2 )
1024x1536 ( 2:3 )
1600x896 ( 16:9 )
896x1600 ( 9:16 )
1680x720 ( 21:9 )
720x1680 ( 9:21 )
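
If it helps, here is a minimal sketch of picking the bucket closest to an arbitrary input size. The resolution values are copied from the two lists above; the names `BUCKETS` and `nearest_bucket` and the log-ratio distance are my own assumptions, not anything taken from the demo code.

```python
import math

# Resolution buckets copied from the lists above, as (width, height).
BUCKETS = {
    1024: [(1024, 1024), (1152, 896), (896, 1152), (1152, 864), (864, 1152),
           (1248, 832), (832, 1248), (1280, 720), (720, 1280),
           (1344, 576), (576, 1344)],
    1280: [(1280, 1280), (1440, 1120), (1120, 1440), (1472, 1104), (1104, 1472),
           (1536, 1024), (1024, 1536), (1600, 896), (896, 1600),
           (1680, 720), (720, 1680)],
}


def nearest_bucket(width, height, base=1024):
    """Hypothetical helper: return the (width, height) bucket whose aspect
    ratio is closest to width/height, compared in log space so that
    landscape and portrait deviations are treated symmetrically."""
    target = math.log(width / height)
    return min(
        BUCKETS[base],
        key=lambda wh: abs(math.log(wh[0] / wh[1]) - target),
    )


print(nearest_bucket(1920, 1080))             # a 16:9 source image
print(nearest_bucket(600, 800, base=1280))    # a 3:4 source image
```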

QJerry

Tongyi-MAI org Nov 29, 2025

•

edited Nov 29, 2025

Thanks for supporting our work! And by the way, that's correct! Just as @elguachiiii mentioned, we recommend using these!

You can use other resolutions as long as both width and height are divisible by 16 (8x VAE downsampling times 2x patch size) and each side stays within roughly 256 pixels of the 1024 resolution grid (something like 768 ~ 1280). However, it is better to stick with the resolution choices listed in https://huggingface.co/spaces/Tongyi-MAI/Z-Image-Turbo/blob/main/app.py#L37. Not only do they cover the aspect ratios used in more than approximately 95% of regular inference cases, they are also in-domain for our last training stage.
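
As a rough illustration of the divisibility rule in this reply, the sketch below checks a resolution and derives the latent grid and patch-token count from it. The factors 8 and 2 come from the "8xvae + 2xpatch" remark; the function names are hypothetical, and the latent channel count is deliberately left out because the thread does not state it.

```python
VAE_FACTOR = 8                        # "8xvae": spatial downsampling of the VAE
PATCH_SIZE = 2                        # "2xpatch": patch size applied to the latent
MULTIPLE = VAE_FACTOR * PATCH_SIZE    # width and height must be divisible by 16


def latent_shape(width, height):
    """Hypothetical helper: return (latent_height, latent_width) for a pixel
    resolution, or raise ValueError if a side is not a multiple of 16."""
    if width % MULTIPLE or height % MULTIPLE:
        raise ValueError(
            f"{width}x{height}: both sides must be divisible by {MULTIPLE}"
        )
    return height // VAE_FACTOR, width // VAE_FACTOR


def token_count(width, height):
    """Hypothetical helper: number of patch tokens for a pixel resolution,
    assuming non-overlapping PATCH_SIZE x PATCH_SIZE patches on the latent."""
    latent_h, latent_w = latent_shape(width, height)
    return (latent_h // PATCH_SIZE) * (latent_w // PATCH_SIZE)


print(latent_shape(1024, 1024), token_count(1024, 1024))
print(latent_shape(1280, 720), token_count(1280, 720))
```

This only covers the divisibility part of the rule; whether a given size is in-domain for the model is a separate question that the recommended list answers.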

0xbeycan

20 days ago

•

edited 20 days ago

I'm starting my LoRA training tomorrow. Claude found me this resource, but I'll be using it for training, not production. So I wanted to ask: should I only use black images in LoRA training, or would more varied aspect ratios enrich the dataset and yield better results in production?
