Phenaki: Variable Length Video Generation From Open Domain Textual Description
Paper: arXiv 2210.02399
MaskGIT is trained to reconstruct masked tokens z, which are predicted by a frozen C-ViViT encoder, and is conditioned on T5X tokens of a given prompt p0.
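The masked-token objective described above can be illustrated with a minimal sketch. It is not the authors' implementation: the sentinel `MASK_ID`, the helpers `mask_tokens` and `masked_cross_entropy`, the 50% mask ratio, and the toy vocabulary size are all assumptions made for illustration. The token list `z` stands in for the output of the frozen C-ViViT encoder, and uniform logits stand in for a transformer that would also receive the text conditioning.

```python
import math
import random

MASK_ID = -1  # hypothetical sentinel marking a masked position


def mask_tokens(tokens, mask_ratio, rng):
    """Replace a random subset of tokens with MASK_ID.

    Returns the masked sequence and the sorted list of masked positions.
    """
    n_mask = max(1, round(mask_ratio * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for i in positions:
        masked[i] = MASK_ID
    return masked, positions


def masked_cross_entropy(logits, targets, positions):
    """Mean cross-entropy computed over the masked positions only."""
    total = 0.0
    for i in positions:
        row = logits[i]
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[targets[i]]
    return total / len(positions)


rng = random.Random(0)
vocab_size = 8  # toy value, assumed for illustration

# Stand-in for the discrete tokens z from the frozen encoder.
z = [rng.randrange(vocab_size) for _ in range(12)]
masked, positions = mask_tokens(z, mask_ratio=0.5, rng=rng)

# Stand-in for the model output: uniform logits at every position.
# A real model would take `masked` plus the prompt tokens as input.
uniform_logits = [[0.0] * vocab_size for _ in z]
loss = masked_cross_entropy(uniform_logits, z, positions)
```

With uniform logits the loss equals the log of the vocabulary size, which is the baseline an untrained predictor would sit at. Only the masked positions contribute to the loss, so the unmasked tokens act purely as context.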