Xintao Wang

Contact Me

🧭 I am currently the Tech Lead for Multimodal Generation and Post-Training at the Kling Team.

Kling Focus

I lead a team working on four main directions: 1) multimodal generation (Omni), 2) 4K video generation, 3) RL post-training, and 4) next-generation unified models.

1. Multimodal Generation

Focus on unified generative models for reference-based generation, video editing, and other multimodal video-generation tasks. In Dec. 2025, we introduced Kling-O1 (X post). In Feb. 2026, we further released Kling 3.0-Omni (X post, X post). Before Kling-O1, this line of work also covered early Kling controllable-generation capabilities, including image-to-video (X post), motion brush (X post), camera control (X post), multi-reference generation (X post), and multimodal editing (X post).

2. 4K Video Generation

Develop Kling's 4K video-generation capability, including the Kling 4K release.

3. RL Post-Training

Focus on reinforcement learning and human-feedback alignment for video foundation models; this work has been applied to recent Kling releases.

4. Next-Generation Unified Models

I am now leading next-generation video pre-training for native unified multimodal foundation models.

💼 Previously, I was a Senior Staff Researcher at Tencent ARC Lab and Tencent AI Lab, where I worked on generative AI for image, video, 3D, and restoration.

Tencent Focus

1. Restoration and Enhancement

Developed practical restoration and enhancement methods, including Real-ESRGAN (over 36K ⭐ GitHub stars) and GFPGAN (over 37.5K ⭐ GitHub stars), which have been widely adopted in academia, industry, and consumer-facing applications. Also led the AI algorithms for anime video super-resolution and enhancement, which were deployed in Tencent Video.

2. Controllable and Multimodal Generation

Developed controllable generation methods for image and video, including T2I-Adapter, an influential controllable image-generation method proposed around the same time as ControlNet. This line of work also includes generative-AI projects across image, video, and 3D, such as PhotoMaker (over 10.1K ⭐ GitHub stars), ToonCrafter, and InstantMesh.

3. Open-Source Video Generation

Led the VideoCrafter series, one of the leading open-source video-generation efforts before Sora.

🎓 I received my Ph.D. from the Multimedia Lab (MMLab) at the Chinese University of Hong Kong, advised by Prof. Xiaoou Tang and Prof. Chen Change Loy. I obtained my bachelor's degree from Zhejiang University.

🏅 Ranked among the Top 2% Scientists Worldwide by Stanford University since 2022.

We are Hiring!

We are actively looking for research interns and full-time researchers to work on unified multimodal models, next-state-prediction-style generative video pre-training, multimodal video generation, and RL. Research interns interested in unified multimodal models and generative video pre-training are especially welcome. If you're interested in exploring these opportunities, please reach out to me at xintao.alpha@gmail.com.

News

Selected Publications and Preprints [Full List]

(* equal contribution, # corresponding author)