Hi! My name is Haomin Wang (王昊旻). I am currently a Ph.D. student in the School of Artificial Intelligence at Shanghai Jiao Tong University (SJTU), focusing on Multimodal Large Language Models and World Models.

My recent research primarily focuses on the application of MLLMs to vector graphics, including SVG understanding, generation, and editing. More recently, I have also been exploring world models in the context of embodied intelligence.

I am currently looking for collaborations and new opportunities. Feel free to contact me via email: kiyotakawang@sjtu.edu.cn.

Here's my CV.

🔥 News

  • 2026.06: 🎉 CTRL-S has been accepted to ECCV 2026!
  • 2026.03: 🎉 We released CTRL-S. Give it a try!
  • 2026.01: 🎉 InternSVG and InternSpatial have been accepted to ICLR 2026!
  • 2025.10: 🎉 We released InternSVG. Give it a try!
  • 2025.09: 🎉 VecFormer and ArchCAD-400K have been accepted to NeurIPS 2025!
  • 2025.08: 🎉 Our team released InternVL 3.5. Give it a try!

📝 Selected Publications

ECCV 2026

Reliable Reasoning in SVG-LLMs via Multi-Task Multi-Reward Reinforcement Learning

Haomin Wang, Qi Wei, Qianli Ma, Shengyuan Ding, Jinhui Yin, Kai Chen, Hongjie Zhang

PDF | Code | Dataset

CTRL-S is a reinforcement learning framework for reliable SVG generation with explicit chain-of-thought reasoning. It introduces SVG-Sophia, a high-quality dataset covering SVG code refinement, text-to-SVG, and image-to-SVG tasks. By combining structured SVG reasoning, group-level code generation, and multi-reward optimization over visual fidelity, text-image alignment, format validity, and code efficiency, CTRL-S improves both SVG quality and generalization across diverse scenarios.

ICLR 2026

InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models

Haomin Wang*, Jinhui Yin*, Qi Wei*, Wenguang Zeng, Lixin Gu, Shenglong Ye, Zhangwei Gao, Yaohui Wang, Yanting Zhang, Yuanqi Li, Yanwen Guo, Wenhai Wang, Kai Chen, Yu Qiao, Hongjie Zhang

PDF | Code | Model | Dataset | Benchmark | Homepage

InternSVG is a unified data–benchmark–model suite for SVG understanding, editing, and generation. It introduces SAgoge, a large-scale multi-domain SVG dataset, and SArena, a comprehensive benchmark for evaluating multimodal SVG capabilities. Built on VLMs with SVG-specific tokenization and two-stage training, the InternSVG-8B model achieves strong performance across diverse SVG tasks.

NeurIPS 2025

Point or Line? Using Line-based Representation for Panoptic Symbol Spotting in CAD Drawings

Xingguang Wei*, Haomin Wang*, Shenglong Ye, Ruifeng Luo, Yanting Zhang, Lixin Gu, Jifeng Dai, Yu Qiao, Wenhai Wang, Hongjie Zhang

PDF | Homepage

VecFormer uses a line-based representation of primitives for panoptic symbol spotting in floor plan CAD drawings, achieving a new state of the art on the FloorPlanCAD dataset.

arXiv

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Intern-S1-Pro Team, Shanghai AI Laboratory

PDF | Model

Intern-S1-Pro is a trillion-parameter scientific multimodal foundation model that strengthens both general reasoning and domain-specific scientific intelligence. With advanced agent capabilities and support for over 100 specialized tasks across fields such as chemistry, materials, life sciences, and earth sciences, it serves as a specializable generalist for scientific discovery and multimodal understanding.

arXiv

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

InternVL Team, Shanghai AI Laboratory

PDF | Code | Models

InternVL 3.5 is a new open-source MLLM family that improves multimodal reasoning, versatility, and inference efficiency. It introduces Cascade RL for stronger coarse-to-fine reasoning, ViR for dynamic visual-token resolution, and DvD for efficient vision-language deployment, achieving major gains in reasoning performance and speed. The model also extends to GUI interaction and embodied agent capabilities, reaching SOTA performance among open-source MLLMs.

🎖 Honors and Awards

  • Outstanding Graduate, Nanjing University.
  • Outstanding Student, Nanjing University.
  • China Merchants Bank Scholarship, Nanjing University.
  • Ruli Scholarship, Nanjing University.

📖 Education

  • 2025.09 - present, Ph.D. student, School of Artificial Intelligence, Shanghai Jiao Tong University.
  • 2021.09 - 2025.06, B.Eng. in Software Engineering, Nanjing University. (GPA 4.59/5.00, Top 3%)

💻 Internships

  • 2024.07 - 2026.06, Research Intern, Shanghai AI Laboratory.