Reliable Reasoning in SVG-LLMs via Multi-Task Multi-Reward Reinforcement Learning

Mar 18, 2026

Haomin Wang*

Qi Wei*

Qianli Ma

Shengyuan Ding

Jinhui Yin

Kai Chen

Hongjie Zhang

Abstract

With the rapid advancement of vision-language models, an increasing number of studies have explored their potential for SVG generation tasks. Although existing approaches improve performance by constructing large-scale SVG datasets and introducing SVG-specific tokens, they still suffer from limited generalization, redundant paths in code outputs, and a lack of explicit reasoning. In this work, we present CTRL-S (Chain-of-Thought Reinforcement Learning for SVG), a unified framework that introduces a chain-of-thought mechanism to explicitly expose the model’s reasoning process during SVG generation. To support this structured reasoning, we construct SVG-Sophia, a high-quality dataset containing 145K samples across SVG code refinement, Text-to-SVG, and Image-to-SVG tasks. By training the model to generate group-level structured SVG code, CTRL-S significantly improves structural coherence and visual fidelity. Furthermore, we adopt the GRPO algorithm and design a multi-reward optimization framework, incorporating DINO, image-text similarity, format, and code efficiency rewards. Through joint multi-reward optimization and multi-task training, our approach systematically enhances overall generation capabilities. Extensive experiments show that CTRL-S outperforms existing methods, achieving higher task success rates, superior SVG code quality, and exceptional visual fidelity.
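As a rough illustration of how several reward signals can feed a GRPO-style update, the sketch below combines four per-sample scores into one scalar and then normalizes the scalars within a group of sampled outputs. It is not the CTRL-S implementation. The weighted-sum combination, the weights, the well-formedness rule in `format_reward`, and the length-based `efficiency_reward` are all assumptions made for illustration, and the visual (DINO) and image-text similarity scores are taken as precomputed floats rather than computed by a model.

```python
import statistics
import xml.etree.ElementTree as ET


def format_reward(svg_code: str) -> float:
    """Assumed rule: 1.0 if the string parses as XML with an <svg> root, else 0.0."""
    try:
        root = ET.fromstring(svg_code)
    except ET.ParseError:
        return 0.0
    # A namespaced tag looks like "{http://www.w3.org/2000/svg}svg".
    return 1.0 if root.tag.split("}")[-1] == "svg" else 0.0


def efficiency_reward(svg_code: str, budget: int = 2000) -> float:
    """Hypothetical length-based score: 1.0 for empty code, 0.0 at or past `budget` characters."""
    return max(0.0, 1.0 - len(svg_code) / budget)


def combined_reward(svg_code, visual_sim, text_sim, weights=(1.0, 1.0, 0.5, 0.5)):
    """Weighted sum of the four reward terms (weights are illustrative, not from the paper)."""
    w_vis, w_txt, w_fmt, w_eff = weights
    return (
        w_vis * visual_sim                      # e.g. a DINO feature similarity, precomputed
        + w_txt * text_sim                      # e.g. an image-text similarity, precomputed
        + w_fmt * format_reward(svg_code)
        + w_eff * efficiency_reward(svg_code)
    )


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: each reward standardized against its own sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

In this sketch, a group of candidate SVGs sampled for the same prompt would each be scored with `combined_reward`, and the resulting list passed to `group_relative_advantages`, so that candidates scoring above the group mean receive a positive advantage and those below it a negative one.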

Type

Preprint

Citation

If you find this project useful in your research, please consider citing:

@article{wang2026reliable,
  title={Reliable Reasoning in SVG-LLMs via Multi-Task Multi-Reward Reinforcement Learning},
  author={Wang, Haomin and Wei, Qi and Ma, Qianli and Ding, Shengyuan and Yin, Jinhui and Chen, Kai and Zhang, Hongjie},
  journal={arXiv preprint arXiv:2603.16189},
  year={2026}
}

Last updated on Mar 18, 2026