Paper: From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models (arXiv:2602.22859)
Qwen2.5-VL-7B-Instruct_DPE_v1 is the first-iteration model evolved from Qwen2.5-VL-7B-Instruct using the DPE (Diagnostic-driven Progressive Evolution) framework.
DPE is a self-evolving training framework that prioritizes the diagnosis of capability gaps to steer targeted data generation. This version represents the first successful cycle of the DPE pipeline for the 7B model.
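
The idea of letting diagnosed capability gaps steer data generation can be illustrated with a minimal sketch. This is not the DPE implementation: the function names `diagnose` and `allocate_budget`, the category labels, and the proportional-to-error allocation rule are all assumptions made for illustration only.

```python
from typing import Dict, List


def diagnose(results: Dict[str, List[bool]]) -> Dict[str, float]:
    """Hypothetical diagnosis step: per-category error rate.

    `results` maps a capability category to a list of per-example
    outcomes (True = the model answered correctly).
    """
    return {
        category: 1.0 - sum(outcomes) / len(outcomes)
        for category, outcomes in results.items()
    }


def allocate_budget(error_rates: Dict[str, float], total_samples: int) -> Dict[str, int]:
    """Hypothetical targeting step: split a data-generation budget
    across categories in proportion to their error rates, so weaker
    capabilities receive more new training samples.
    """
    total_error = sum(error_rates.values())
    if total_error == 0:
        # No measured weakness: fall back to an even split.
        share = total_samples // len(error_rates)
        return {category: share for category in error_rates}
    return {
        category: round(total_samples * error / total_error)
        for category, error in error_rates.items()
    }


# Illustrative evaluation outcomes (made up, not from the paper).
eval_results = {
    "visual_math": [True, False, False, False],
    "ocr": [True, True, True, False],
}
gaps = diagnose(eval_results)
budget = allocate_budget(gaps, total_samples=1000)
```

In an iterative setup, the model trained on the newly generated data would be evaluated again and the same two steps repeated for the next cycle.
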
| Category | Benchmark | Base Model | DPE_v1 (Ours) | Improvement |
|---|---|---|---|---|
| STEM | MMMU | 53.11 | 54.44 | +1.33 |
| | RealWorldQA | 68.63 | 69.41 | +0.78 |
| Visual Math | MathVista | 65.50 | 67.50 | +2.00 |
| Specialized | HallusionBench | 64.98 | 69.09 | +4.11 |
| Overall | Average | 57.29 | 58.47 | +1.18 |
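
The Improvement column is the DPE_v1 score minus the base score, which can be checked with a short script. The numbers are copied from the table above; the helper names `check_deltas` and `mean_base` are hypothetical. The script also computes the mean base score over the four benchmarks listed here, which does not match the Average row, so that row presumably aggregates a larger benchmark set than is shown.

```python
# (base score, DPE_v1 score, reported improvement), copied from the table.
ROWS = {
    "MMMU": (53.11, 54.44, 1.33),
    "RealWorldQA": (68.63, 69.41, 0.78),
    "MathVista": (65.50, 67.50, 2.00),
    "HallusionBench": (64.98, 69.09, 4.11),
    "Average": (57.29, 58.47, 1.18),
}


def check_deltas(rows, tol=0.005):
    """Return, per row, whether DPE_v1 - base matches the reported delta."""
    return {
        name: abs((new - base) - delta) < tol
        for name, (base, new, delta) in rows.items()
    }


def mean_base(rows, names):
    """Mean base-model score over the named benchmarks."""
    return sum(rows[name][0] for name in names) / len(names)


consistent = check_deltas(ROWS)
listed = ["MMMU", "RealWorldQA", "MathVista", "HallusionBench"]
listed_mean = mean_base(ROWS, listed)
```
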
@misc{jia2026blindspotsgainsdiagnosticdriven,
title={From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models},
author={Hongrui Jia and Chaoya Jiang and Shikun Zhang and Wei Ye},
year={2026},
eprint={2602.22859},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2602.22859},
}