---
license: apache-2.0
language:
- en
base_model: Qwen/Qwen3-VL-8B-Thinking
tags:
- vision
- image-text-to-text
- visual-question-answering
- agent
- function-calling
- thinking
- chain-of-thought
datasets:
- hxssgaa/xlam-interleave-thinking-40k
---

# Qwen3-VL-8B-Interleave-Thinking (v0.1)

**Qwen3-VL-8B-Interleave-Thinking** is a specialized agentic model fine-tuned on top of [Qwen/Qwen3-VL-8B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-8B-Thinking). It is designed to provide an experience similar to the OpenAI Agent SDK, featuring **interleaved thinking**: the model generates an internal thought process before executing each function call.

## Model Details

- **Base Model:** [Qwen/Qwen3-VL-8B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-8B-Thinking)
- **Fine-tuning Dataset:** [hxssgaa/xlam-interleave-thinking-40k](https://huggingface.co/datasets/hxssgaa/xlam-interleave-thinking-40k)
- **Methodology:** Distilled from MiniMax M2.1, specifically targeting agentic behaviors and reasoning chains.
- **Version:** v0.1 (SFT only). Future versions will incorporate large-scale Reinforcement Learning (RL) to further enhance agentic capabilities.

## Key Features

- **Interleaved Thinking:** The model is trained to "think" before acting. It generates a reasoning trace (thought chain) before emitting a function call, allowing for better error correction and planning.
- **Long-Horizon Function Calling:** Capable of handling complex, multi-step tasks by maintaining a coherent thought process throughout the interaction.
- **Agentic Focus:** Optimized for tool use and complex scenarios where the model needs to decide *why* and *how* to use a tool effectively.

## Usage

Please use vLLM to serve the model. You may need to write a custom agent framework that checks whether the agent's output contains `xxx`, to judge whether the model has finished function calling.
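To make that loop concrete, here is a minimal client-side sketch. It is illustrative only: the delimiter string assigned to `MARKER`, the JSON payload shape after the delimiter, and the `tool` message role are assumptions, not the model's documented output format; substitute the model's actual conventions.

```python
# Hypothetical agent loop: query the model, execute each function call it
# emits, feed the result back, and stop once a reply contains no marker.
import json
from typing import Callable, Dict, List

MARKER = "xxx"  # placeholder: replace with the model's real function-call delimiter


def run_agent(user_query: str,
              call_model: Callable[[List[dict]], str],
              tools: Dict[str, Callable],
              max_turns: int = 8) -> str:
    """Drive the model/tool loop until the model stops requesting calls."""
    messages = [{"role": "user", "content": user_query}]
    for _ in range(max_turns):
        reply = call_model(messages)
        messages.append({"role": "assistant", "content": reply})
        if MARKER not in reply:
            return reply  # no function call left: this is the final answer
        # One user query can trigger several calls, so keep looping.
        # Assumed payload shape: MARKER followed by {"name": ..., "arguments": {...}}.
        call = json.loads(reply.split(MARKER, 1)[1].strip())
        result = tools[call["name"]](**call.get("arguments", {}))
        messages.append({"role": "tool", "content": json.dumps(result)})
    return messages[-1]["content"]


# Offline demonstration with a stubbed model (no server needed); in practice,
# call_model would POST `messages` to the vLLM endpoint instead.
def _stub_model(messages: List[dict]) -> str:
    if any(m["role"] == "tool" for m in messages):
        return "It is sunny in Paris."
    return MARKER + ' {"name": "get_weather", "arguments": {"city": "Paris"}}'


answer = run_agent("What is the weather in Paris?", _stub_model,
                   {"get_weather": lambda city: {"city": city, "forecast": "sunny"}})
```

In a real deployment, `_stub_model` would be replaced by a function that sends `messages` to the server's OpenAI-compatible `/v1/chat/completions` endpoint started by the `vllm serve` command below.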
Note that one user query can result in multiple `xxx` occurrences.

```
vllm serve hxssgaa/Qwen3-VL-8B-Interleave-Thinking \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 8000
```

## Dataset & Training

The model was fine-tuned on **xlam-interleave-thinking-40k**, a dataset of 40,000 high-quality examples of interleaved thinking and tool usage distilled from the MiniMax M2.1 model. This dataset ensures the model adopts a rigorous thinking pattern suitable for autonomous agents.

## Future Work

This v0.1 release represents the initial Supervised Fine-Tuning (SFT) phase. Subsequent releases will focus on:

- Large-scale Reinforcement Learning (RL) to refine policy optimization.
- Enhanced robustness in edge-case handling.

## Citation

If you use this model, please cite the original Qwen3-VL work and the xlam-interleave-thinking dataset.