毛松

Profile

Hi there 👋 I'm Mao Song (毛松)

I'm a Researcher at Shanghai Artificial Intelligence Laboratory (Shanghai AI Lab). My current focus is Multimodal Large Language Models (MLLMs) and unified understanding & generation models as a path towards AGI.

My academic journey includes a Master's degree from ShanghaiTech University (supervised by Professor Wang Hao) and a Bachelor's degree from Beijing Institute of Technology (BIT).

My Recent Work & Interests:

  • Multimodal Large Language Models (MLLMs): I developed DocParser, a tool that processes academic papers from arXiv using their LaTeX source files. Leveraging DocParser, we released DocGenome, a rich academic dataset with annotations covering layouts, OCR, and entity relationships, intended to enhance MLLM understanding of text-rich images.

  • Unified Understanding & Generation Models: While I haven't initiated a specific project in this area yet, I believe it represents a crucial step towards achieving true Artificial General Intelligence.

As a newcomer to this domain, I am actively learning from foundational works such as Qwen-VL and InternVL, and I aim to contribute meaningfully to this emerging field.

Explore More:

Publications

https://arxiv.org/abs/2406.11633

Renqiu Xia, Song Mao, Xiangchao Yan, Hongbin Zhou, Bo Zhang, Haoyang Peng, Jiahao Pi, Daocheng Fu, Wenjie Wu, Hancheng Ye, Shiyang Feng, Bin Wang, Chao Xu, Conghui He, Pinlong Cai, Min Dou, Botian Shi, Sheng Zhou, Yongwei Wang, Bin Wang, Junchi Yan, Fei Wu, Yu Qiao