
TwiFF (Think With Future Frames): A Large-Scale Dataset for Dynamic Visual Reasoning

TwiFF is a unified model fine-tuned on a high-quality dynamic visual Chain-of-Thought (VCoT) dataset comprising 2.7 million samples. In dynamic multimodal question-answering tasks covering instructional, predictive, and camera-motion scenarios, TwiFF iteratively generates future event frames alongside textual reasoning, producing temporally coherent visual reasoning trajectories.
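The interleaved generation described above can be pictured as a bounded loop. The following is only an illustrative sketch, not the TwiFF implementation: `Trajectory`, `run_dynamic_vcot`, `step_fn`, and `toy_step` are hypothetical names, and the default of 8 rounds merely mirrors the `--max_round 8` value used in the inference command (how the script actually interprets that flag is an assumption).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Trajectory:
    """Interleaved reasoning trace: (text, frame) steps plus an optional answer."""
    steps: List[Tuple[str, object]] = field(default_factory=list)
    answer: Optional[str] = None


def run_dynamic_vcot(question, step_fn, max_round=8):
    """Call step_fn until it returns an answer or max_round rounds have run.

    step_fn(question, steps) is a hypothetical callable that returns
    (text, frame, answer); answer stays None while reasoning continues.
    """
    traj = Trajectory()
    for _ in range(max_round):
        text, frame, answer = step_fn(question, traj.steps)
        traj.steps.append((text, frame))
        if answer is not None:
            traj.answer = answer
            break
    return traj


def toy_step(question, steps):
    """Stand-in for a real model: emits placeholder frames, answers in round 3."""
    round_idx = len(steps) + 1
    text = f"round {round_idx}: predict what happens next"
    frame = f"<frame {round_idx}>"  # placeholder for a generated image
    answer = "done" if round_idx == 3 else None
    return text, frame, answer
```

Calling `run_dynamic_vcot("What happens next?", toy_step)` yields a trajectory with three (text, frame) steps and a final answer; a real model would replace `toy_step`.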

🧠 Method

Experimental results demonstrate that, on dynamic scenario reasoning benchmarks, our dynamic VCoT approach outperforms both static VCoT methods based on tool-calling paradigms and purely textual chain-of-thought baselines.

🚀 Quick Start

To use TwiFF, follow the instructions below, which are derived from the official repository.

1. Set up environment

git clone https://github.com/LiuJunhua02/TwiFF.git
cd TwiFF
conda create -n TwiFF python=3.10 -y
conda activate TwiFF
pip install -r requirements.txt
pip install flash_attn==2.5.8 --no-build-isolation

2. Download checkpoint

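from huggingface_hub import snapshot_download

save_dir = "models/TwiFF-7B"       # local directory for the checkpoint
repo_id = "Liu-Junhua/TwiFF-7B"    # Hugging Face Hub repository
cache_dir = save_dir + "/cache"    # download cache kept next to the checkpoint

# Fetch only the configuration, weight, code, and documentation files.
snapshot_download(
    repo_id=repo_id,
    local_dir=save_dir,
    cache_dir=cache_dir,
    # The next two options are deprecated in recent huggingface_hub releases
    # (downloads resume automatically); they are kept for older versions.
    local_dir_use_symlinks=False,
    resume_download=True,
    allow_patterns=["*.json", "*.safetensors", "*.bin", "*.py", "*.md", "*.txt"],
)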
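After the download finishes, a quick sanity check of the target directory can catch an incomplete snapshot before inference. The sketch below is an assumption-laden helper, not part of the TwiFF repository: `check_checkpoint` is a hypothetical name, and the expected weight file name `model.safetensors` is taken from the `--checkpoint_file` argument of the inference command.

```python
from pathlib import Path


def check_checkpoint(model_dir, checkpoint_file="model.safetensors"):
    """Return a list of problems found in the download directory (empty if none)."""
    root = Path(model_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]

    problems = []
    weights = root / checkpoint_file
    if not weights.is_file():
        problems.append(f"missing {checkpoint_file}")
    elif weights.stat().st_size == 0:
        problems.append(f"{checkpoint_file} is empty")

    # Assumption: at least one top-level JSON config file should be present,
    # since "*.json" is among the downloaded patterns.
    if not any(root.glob("*.json")):
        problems.append("no *.json config files found")
    return problems
```

For example, `check_checkpoint("models/TwiFF-7B")` returns an empty list when the weight file and at least one JSON config are in place. It does not verify file integrity or that the JSON files parse.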

3. Start inference

Store your test cases in output/demo.jsonl (see the GitHub README for the specific JSON format) and run:

python scripts/inference.py \
  --max_round 8 \
  --model_dir models/TwiFF-7B \
  --checkpoint_file model.safetensors \
  --checkpoint_dir models/TwiFF-7B \
  --QA_file output/demo.jsonl \
  --seed 42
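Because the test cases are read from a JSONL file, it may be worth confirming that output/demo.jsonl is well-formed before launching the script. The sketch below checks only generic JSON Lines structure (one JSON value per line) and assumes each record is a JSON object; it does not check the TwiFF-specific fields, which are documented in the GitHub README. `validate_jsonl` is a hypothetical helper name.

```python
import json


def validate_jsonl(path):
    """Return (line_number, message) pairs for lines that are not JSON objects."""
    errors = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                # Blank lines are skipped here; the inference script may be stricter.
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                errors.append((lineno, f"invalid JSON: {exc.msg}"))
                continue
            # Assumption: every test case is a JSON object (a dict in Python).
            if not isinstance(record, dict):
                errors.append((lineno, "expected a JSON object"))
    return errors
```

An empty return value from `validate_jsonl("output/demo.jsonl")` means every non-blank line parsed as a JSON object.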

✍️ Citation

@article{liu2026twiff,
  title={TwiFF (Think With Future Frames): A Large-Scale Dataset for Dynamic Visual Reasoning},
  author={Liu, Junhua and Wang, Zhangcheng and Han, Zhike and Wang, Ningli and Liang, Guotao and Kuang, Kun},
  journal={arXiv preprint arXiv:2602.10675},
  year={2026},
}