InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models
Jun 23, 2025
Nianchen Deng*
Lixin Gu*
Shenglong Ye*
Yinan He*
Zhe Chen
Songze Li
Haomin Wang
Xingguang Wei
Tianshuo Yang
Min Dou
Tong He
Wenqi Shao
Kaipeng Zhang
Yi Wang
Botian Shi
Yanting Zhang
Jifeng Dai
Yu Qiao
Hongjie Zhang
Wenhai Wang
Abstract
Recent benchmarks and datasets have been proposed to improve spatial reasoning in vision-language models (VLMs), yet existing open resources remain limited in scale, visual diversity, and instruction expressiveness. In this work, we introduce InternSpatial, the largest open-source dataset for spatial reasoning in VLMs, along with InternSpatial-Bench, a corresponding evaluation benchmark designed to assess spatial understanding under diverse instruction formats. InternSpatial comprises 12 million QA pairs spanning both single-view and multi-view settings, drawn from diverse visual environments and supporting 19 instruction formats that reflect varied query styles. For evaluation, we propose InternSpatial-Bench for single-view tasks and expand multi-view reasoning by introducing a novel rotation angle prediction task that has not been explored in prior work. Experimental results show that models trained on InternSpatial achieve a 12.1% improvement on InternSpatial-Bench and a 10.7% improvement on VSI-Bench, while maintaining strong performance on general-purpose benchmarks. We hope these resources will support the development of spatially capable VLMs in practical applications such as robotics and embodied AI.
Citation
If you find this project useful in your research, please consider citing:
@article{deng2025internspatial,
  title={InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models},
  author={Deng, Nianchen and Gu, Lixin and Ye, Shenglong and He, Yinan and Chen, Zhe and Li, Songze and Wang, Haomin and Wei, Xingguang and Yang, Tianshuo and Dou, Min and others},
  journal={arXiv preprint arXiv:2506.18385},
  year={2025}
}