KVAE 2.0: Video tokenizers
KVAE 2.0 and its predecessor KVAE 1.0 are families of video and image tokenizers with spatial compression ratios of 8 and 16; the video models also have a temporal compression ratio of 4.
KVAE-3D-2.0-t4s8
The KVAE-3D-2.0-t4s8 model has a temporal compression ratio of 4 and a spatial compression ratio of 8x8.
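As a rough illustration of what these ratios mean for tensor sizes, the sketch below computes the latent grid for a clip. It assumes the causal convention used by Wan 2.1 and HunyuanVideo, where the first frame is encoded on its own and the remaining frames in groups of four, so valid clip lengths are 1 + 4k. This card does not state whether KVAE-3D-2.0-t4s8 follows that convention, and `latent_grid` is a hypothetical helper, not part of the kvae repository. The latent channel count is not given here, so only the spatio-temporal grid is computed.

```python
def latent_grid(frames: int, height: int, width: int,
                t_ratio: int = 4, s_ratio: int = 8,
                causal_first_frame: bool = True) -> tuple:
    """Return (latent_frames, latent_height, latent_width) for a video clip.

    Hypothetical helper; the causal first-frame convention is an assumption.
    """
    if height % s_ratio or width % s_ratio:
        raise ValueError(f"height and width must be multiples of {s_ratio}")
    if causal_first_frame:
        # Assumed: first frame encoded alone, the rest in groups of t_ratio.
        if frames < 1 or (frames - 1) % t_ratio:
            raise ValueError(f"frames must equal 1 + {t_ratio} * k")
        latent_t = 1 + (frames - 1) // t_ratio
    else:
        # Plain (non-causal) temporal downsampling.
        if frames % t_ratio:
            raise ValueError(f"frames must be a multiple of {t_ratio}")
        latent_t = frames // t_ratio
    return latent_t, height // s_ratio, width // s_ratio


if __name__ == "__main__":
    # A 1280x720 clip with 1 + 4 * 24 = 97 frames.
    print(latent_grid(97, 720, 1280))  # (25, 90, 160)
```

For this 97-frame 1280x720 example the grid is 25 x 90 x 160, about 248 times fewer positions than the pixel grid; the factor stays slightly below 4 x 8 x 8 = 256 because the first frame is only compressed spatially.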
Evaluation of reconstruction
The evaluation used the open datasets MCL-JCV (1280x720 videos) and BVI-DVC. Wan-2.1 and HunyuanVideo-1.0, which use the same 4x8x8 compression format, served as baselines. The results below compare reconstructions using PSNR, SSIM, and LPIPS (computed with AlexNet features).
Reconstruction comparison of KVAE 2.0, Hunyuan 1.0 and Wan 2.1
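For reference, PSNR is 10 log10(MAX^2 / MSE) in dB, where MSE is the mean squared error between original and reconstructed pixels. The card does not specify how the scores were aggregated, so the sketch below assumes 8-bit RGB frames and averages per-frame PSNR over the clip. `video_psnr` is an illustrative helper, not the evaluation code used by the kvae scripts; SSIM and LPIPS are not covered.

```python
import numpy as np


def video_psnr(reference: np.ndarray, reconstruction: np.ndarray,
               max_value: float = 255.0) -> float:
    """Mean per-frame PSNR in dB for videos shaped (frames, height, width, channels).

    Illustrative only; per-frame averaging is an assumed aggregation.
    """
    if reference.shape != reconstruction.shape:
        raise ValueError("reference and reconstruction must have the same shape")
    # Work in float64 so uint8 subtraction cannot wrap around.
    diff = reference.astype(np.float64) - reconstruction.astype(np.float64)
    # Mean squared error of each frame over all pixels and channels.
    mse = np.mean(diff ** 2, axis=tuple(range(1, diff.ndim)))
    # A perfect frame has MSE 0, which yields an infinite PSNR.
    with np.errstate(divide="ignore"):
        psnr = 10.0 * np.log10(max_value ** 2 / mse)
    return float(np.mean(psnr))
```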
Inference instructions
Installation
Clone the repo:
git clone https://github.com/kandinskylab/kvae.git
cd kvae
Create an environment with torch==2.8.0 and CUDA 12.8:
conda create -n kvae_inference python=3.11
conda activate kvae_inference
pip install -r requirements.txt
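After installation, one way to confirm that the environment matches the versions above is to compare `torch.__version__` and `torch.version.cuda`. The sketch ignores local build tags such as `+cu128` (an assumption about how the installed wheel is labeled); `versions_match` is a hypothetical helper.

```python
import importlib.util
from typing import Optional


def versions_match(torch_version: str, cuda_version: Optional[str],
                   expected_torch: str = "2.8.0",
                   expected_cuda: str = "12.8") -> bool:
    """Compare versions, ignoring local build tags such as '+cu128' (assumed format)."""
    base_torch = torch_version.split("+", 1)[0]
    return base_torch == expected_torch and cuda_version == expected_cuda


if __name__ == "__main__":
    if importlib.util.find_spec("torch") is None:
        print("torch is not installed in this environment")
    else:
        import torch
        # torch.version.cuda is None for CPU-only builds.
        print(torch.__version__, torch.version.cuda, torch.cuda.is_available())
        print("matches README:", versions_match(torch.__version__, torch.version.cuda))
```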
KVAE inference
To run an image model on a dataset and compute metrics, use the script:
PYTHONPATH=. python scripts/inference_2d_kvae.py --dataset_folder ./assets/images/ --model KVAE_1.0
To run video models:
PYTHONPATH=. python scripts/inference_3d_kvae.py --dataset_folder ./assets/test1/ --model KVAE_2.0-t4s8
To save the reconstructions, pass --saving_folder ./your_path/. Note that saving increases the running time, especially for the video model, even though it runs asynchronously with the rest of the pipeline.
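To evaluate several dataset folders in one go, the commands above can be driven from Python. The sketch uses only the script paths and the `--dataset_folder`, `--model` and `--saving_folder` flags shown in this README; `build_command` and `run` are hypothetical helpers, and setting `PYTHONPATH` to the repository root mirrors the `PYTHONPATH=.` prefix.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import List, Optional


def build_command(dataset_folder: str, model: str, video: bool,
                  saving_folder: Optional[str] = None) -> List[str]:
    """Assemble an inference command using only the documented flags (hypothetical helper)."""
    script = "scripts/inference_3d_kvae.py" if video else "scripts/inference_2d_kvae.py"
    cmd = [sys.executable, script, "--dataset_folder", dataset_folder, "--model", model]
    if saving_folder is not None:
        # Saving reconstructions slows the run down, especially for video.
        cmd += ["--saving_folder", saving_folder]
    return cmd


def run(cmd: List[str], repo_root: str = ".") -> None:
    """Run a command from the repository root, like `PYTHONPATH=. python ...`."""
    root = str(Path(repo_root).resolve())
    env = dict(os.environ, PYTHONPATH=root)
    subprocess.run(cmd, cwd=root, env=env, check=True)
```

For example, `run(build_command("./assets/test1/", "KVAE_2.0-t4s8", video=True))`, called from the repository root, reproduces the video command above; `check=True` makes a failing run raise instead of passing silently.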
A more detailed example of working with the models is provided in inference_examples.ipynb.
To use the mediapy library, install ffmpeg:
conda install -c conda-forge ffmpeg
pip install -q mediapy
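With ffmpeg and mediapy installed, a clip can be loaded with `mediapy.read_video`, which returns the frames as a (frames, height, width, 3) uint8 array. The trimming step in the sketch is an assumption: it center-crops height and width to multiples of 8 and keeps 1 + 4k frames, on the guess that the t4s8 model expects causal-style clip lengths. inference_examples.ipynb remains the reference for the actual preprocessing; `load_clip` and `fit_to_ratios` are hypothetical helpers.

```python
import numpy as np


def load_clip(path: str) -> np.ndarray:
    """Read a video as a uint8 array shaped (frames, height, width, 3)."""
    import mediapy  # needs ffmpeg, installed with the commands above
    return np.asarray(mediapy.read_video(path))


def fit_to_ratios(video: np.ndarray, t_ratio: int = 4, s_ratio: int = 8) -> np.ndarray:
    """Center-crop H and W to multiples of s_ratio and keep 1 + t_ratio * k frames.

    The 1 + t_ratio * k frame rule is an assumed input convention.
    """
    frames, height, width = video.shape[:3]
    if frames < 1:
        raise ValueError("video has no frames")
    new_h, new_w = height - height % s_ratio, width - width % s_ratio
    top, left = (height - new_h) // 2, (width - new_w) // 2
    # Largest length of the form 1 + t_ratio * k that fits in the clip.
    keep = 1 + (frames - 1) // t_ratio * t_ratio
    return video[:keep, top:top + new_h, left:left + new_w]
```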
Model Zoo
The KVAE 1.0 collection includes two models for video and image tokenization, both with a spatial compression ratio of 8. The KVAE 2.0 collection includes two models, both for video tokenization, with spatial compression ratios of 8 and 16, respectively. Links to all KVAE models are listed below.