Phenaki: Variable Length Video Generation From Open Domain Textual Description
Paper: arXiv:2210.02399
The embeddings of image and video patches from raw frames x are processed by a spatial transformer and then a causal transformer (auto-regressive in time) to generate video tokens.
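The two-stage ordering described above (spatial attention within each frame, then causal attention across frames) can be illustrated with a small sketch. This is not the paper's implementation: the function names `attend` and `encode_video` are hypothetical, attention uses identity projections instead of learned weights, and the quantization step that would turn the outputs into discrete tokens is omitted. It only shows how the temporal stage keeps each frame from seeing later frames.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(tokens, causal=False):
    """Single-head self-attention with identity projections (assumption:
    no learned weights, for illustration only).

    tokens: list of equal-length vectors (lists of floats).
    With causal=True, position i only attends to positions 0..i.
    """
    dim = len(tokens[0])
    out = []
    for i, query in enumerate(tokens):
        visible = tokens[: i + 1] if causal else tokens
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            for key in visible
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * v[d] for w, v in zip(weights, visible)) for d in range(dim)]
        )
    return out


def encode_video(patches):
    """patches[t][p] is the embedding of patch p in frame t."""
    # Spatial stage: all-to-all attention among the patches of each frame.
    spatial = [attend(frame) for frame in patches]

    # Temporal stage: for each patch position, causal attention over frames,
    # so frame t only depends on frames 0..t.
    num_frames, num_patches = len(spatial), len(spatial[0])
    columns = [
        attend([spatial[t][p] for t in range(num_frames)], causal=True)
        for p in range(num_patches)
    ]

    # Rearrange back to [frame][patch] order.
    return [[columns[p][t] for p in range(num_patches)] for t in range(num_frames)]
```

Because the temporal stage is causal, the output for the first frame depends only on that frame, so a single image can be passed in as a one-frame video. Editing a later frame leaves the outputs of all earlier frames unchanged.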