---
license: apache-2.0
language:
- en
base_model: Qwen/Qwen3-VL-8B-Thinking
tags:
- vision
- image-text-to-text
- visual-question-answering
- agent
- function-calling
- thinking
- chain-of-thought
datasets:
- hxssgaa/xlam-interleave-thinking-40k
---

# Qwen3-VL-8B-Interleave-Thinking (v0.1)

**Qwen3-VL-8B-Interleave-Thinking** is a specialized agentic model fine-tuned on top of [Qwen/Qwen3-VL-8B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-8B-Thinking). It is designed to provide an experience similar to the OpenAI Agent SDK, featuring **interleaved thinking**: the model generates an internal thought process before executing each function call.

## Model Details

- **Base Model:** [Qwen/Qwen3-VL-8B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-8B-Thinking)
- **Fine-tuning Dataset:** [hxssgaa/xlam-interleave-thinking-40k](https://huggingface.co/datasets/hxssgaa/xlam-interleave-thinking-40k)
- **Methodology:** Distilled from MiniMax M2.1, specifically targeting agentic behaviors and reasoning chains.
- **Version:** v0.1 (SFT only). Future versions will incorporate large-scale Reinforcement Learning (RL) to further enhance agentic capabilities.

## Key Features

- **Interleaved Thinking:** The model is trained to "think" before acting. It generates a reasoning trace (thought chain) before emitting a function call, allowing for better error correction and planning.
- **Long-Horizon Function Calling:** Capable of handling complex, multi-step tasks by maintaining a coherent thought process throughout the interaction.
- **Agentic Focus:** Optimized for tool use and complex scenarios where the model needs to decide *why* and *how* to use a tool effectively.

## Usage

Please use vLLM to serve the model. You may need to write a custom agent framework that checks whether the agent's output contains `xxx`, to judge whether the model has finished function calling.
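To make that loop concrete, here is a minimal client-side sketch. It is illustrative only: the delimiter string assigned to `MARKER`, the JSON payload shape after the delimiter, and the `tool` message role are assumptions, not the model's documented output format; substitute the model's actual conventions.

```python
# Hypothetical agent loop: query the model, execute each function call it
# emits, feed the result back, and stop once a reply contains no marker.
import json
from typing import Callable, Dict, List

MARKER = "xxx"  # placeholder: replace with the model's real function-call delimiter


def run_agent(user_query: str,
              call_model: Callable[[List[dict]], str],
              tools: Dict[str, Callable],
              max_turns: int = 8) -> str:
    """Drive the model/tool loop until the model stops requesting calls."""
    messages = [{"role": "user", "content": user_query}]
    for _ in range(max_turns):
        reply = call_model(messages)
        messages.append({"role": "assistant", "content": reply})
        if MARKER not in reply:
            return reply  # no function call left: this is the final answer
        # One user query can trigger several calls, so keep looping.
        # Assumed payload shape: MARKER followed by {"name": ..., "arguments": {...}}.
        call = json.loads(reply.split(MARKER, 1)[1].strip())
        result = tools[call["name"]](**call.get("arguments", {}))
        messages.append({"role": "tool", "content": json.dumps(result)})
    return messages[-1]["content"]


# Offline demonstration with a stubbed model (no server needed); in practice,
# call_model would POST `messages` to the vLLM endpoint instead.
def _stub_model(messages: List[dict]) -> str:
    if any(m["role"] == "tool" for m in messages):
        return "It is sunny in Paris."
    return MARKER + ' {"name": "get_weather", "arguments": {"city": "Paris"}}'


answer = run_agent("What is the weather in Paris?", _stub_model,
                   {"get_weather": lambda city: {"city": city, "forecast": "sunny"}})
```

In a real deployment, `_stub_model` would be replaced by a function that sends `messages` to the server's OpenAI-compatible `/v1/chat/completions` endpoint started by the `vllm serve` command below.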
Note that one user query can result in multiple `xxx` occurrences.

```
vllm serve hxssgaa/Qwen3-VL-8B-Interleave-Thinking \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 8000
```

## Dataset & Training

The model was fine-tuned on **xlam-interleave-thinking-40k**, a dataset of 40,000 high-quality examples of interleaved thinking and tool usage distilled from the MiniMax M2.1 model. This dataset ensures the model adopts a rigorous thinking pattern suitable for autonomous agents.

## Future Work

This v0.1 release represents the initial Supervised Fine-Tuning (SFT) phase. Subsequent releases will focus on:

- Large-scale Reinforcement Learning (RL) to refine policy optimization.
- Enhanced robustness in edge-case handling.

## Citation

If you use this model, please cite the original Qwen3-VL work and the xlam-interleave-thinking dataset.