latent size
Great work!
What are the optimal latent size / resolution buckets for different aspect ratios with z-image-turbo?
Those are the resolutions they provide on their own demo page, so I guess they are the optimal ones:
-- 1024 --
1024x1024 ( 1:1 )
1152x896 ( 9:7 )
896x1152 ( 7:9 )
1152x864 ( 4:3 )
864x1152 ( 3:4 )
1248x832 ( 3:2 )
832x1248 ( 2:3 )
1280x720 ( 16:9 )
720x1280 ( 9:16 )
1344x576 ( 21:9 )
576x1344 ( 9:21 )
-- 1280 --
1280x1280 ( 1:1 )
1440x1120 ( 9:7 )
1120x1440 ( 7:9 )
1472x1104 ( 4:3 )
1104x1472 ( 3:4 )
1536x1024 ( 3:2 )
1024x1536 ( 2:3 )
1600x896 ( 16:9 )
896x1600 ( 9:16 )
1680x720 ( 21:9 )
720x1680 ( 9:21 )
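For anyone who wants to map an arbitrary image onto the list above, here is a minimal sketch that picks the bucket with the closest aspect ratio. The `nearest_bucket` helper and the `BUCKETS_1024` name are hypothetical and not part of the Z-Image code; the values are simply copied from the 1024 group, and comparing aspect ratios in log space is just one reasonable choice.

```python
import math

# Copied from the 1024 group listed above (width, height).
BUCKETS_1024 = [
    (1024, 1024), (1152, 896), (896, 1152), (1152, 864), (864, 1152),
    (1248, 832), (832, 1248), (1280, 720), (720, 1280), (1344, 576),
    (576, 1344),
]


def nearest_bucket(width, height, buckets=BUCKETS_1024):
    """Return the bucket whose aspect ratio is closest to width:height.

    Hypothetical helper: distances are measured on log(width / height) so
    that landscape and portrait ratios are treated symmetrically.
    """
    target = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - target))


print(nearest_bucket(1920, 1080))  # a 16:9 source image
print(nearest_bucket(3000, 2000))  # a 3:2 source image
```

The same function works for the 1280 group if you pass that list as `buckets`.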
Thanks for supporting our work! And btw it's correct! Just like @elguachiiii mentioned, we recommend using these!!
You can use other resolutions as long as both width and height are divisible by 16 (8x VAE downsampling times 2x patch size) and stay within about 256 pixels of the 1024 resolution grid (roughly 768 ~ 1280). That said, it is better to stick to the resolution choices listed in https://huggingface.co/spaces/Tongyi-MAI/Z-Image-Turbo/blob/main/app.py#L37. They cover the regular inference aspect ratios for more than approximately 95% of cases, and they are also in-domain for our last training stage.
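To make the divisible-by-16 rule concrete, here is a small sketch of the arithmetic. It assumes only what is stated above (an 8x VAE factor and a 2x patch size); the `latent_shape` function is a hypothetical helper for illustration, not something from the Z-Image repository.

```python
def latent_shape(width, height, vae_factor=8, patch_size=2):
    """Return (latent_width, latent_height, num_tokens) for a pixel resolution.

    Hypothetical helper: the defaults follow the "8x VAE + 2x patch" rule
    quoted above, so both sides must be multiples of 8 * 2 = 16.
    """
    multiple = vae_factor * patch_size
    if width % multiple or height % multiple:
        raise ValueError(
            f"{width}x{height} is not divisible by {multiple} on both sides"
        )
    latent_w = width // vae_factor
    latent_h = height // vae_factor
    # Each token covers a patch_size x patch_size block of the latent.
    num_tokens = (latent_w // patch_size) * (latent_h // patch_size)
    return latent_w, latent_h, num_tokens


print(latent_shape(1024, 1024))  # (128, 128, 4096)
print(latent_shape(1280, 720))   # (160, 90, 3600)
```

A size such as 1000x1000 raises `ValueError` here, since 1000 is not a multiple of 16.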
I'm starting my LoRA training tomorrow. Claude found me this resource, but I'll be using it for training, not production. So I wanted to ask: should I only use black images in LoRA training, or would more varied aspect ratios enrich the dataset and yield better results in production?