Yuanhao Cai

I am currently a third-year PhD student in the Department of Computer Science at Johns Hopkins University. I am a member of CCVL, advised by Bloomberg Distinguished Professor Dr. Alan Yuille. Previously, I received my BSE and MSE degrees from Tsinghua University in 2020 and 2023, respectively. My master's advisor was Prof. Haoqian Wang. During my studies at Tsinghua University, I had a great time working with Prof. Yulun Zhang, Prof. Xin Yuan, Prof. Radu Timofte, and Prof. Luc Van Gool. I interned at Adobe Research (2024 - 2025) and Meta Superintelligence Labs (2025 - 2026).

GitHub / Google Scholar / LinkedIn / ZhiHu

I am open to collaboration. Our lab also has openings for 2025 summer interns. Check this document for an introduction. Feel free to contact me at ycai51@jh.edu if you are interested.

Honors and Awards

Stanford/Elsevier World’s Top 2% Scientists, 2025

Best Paper Award of AI4CC Workshop at CVPR, 2025

Runner-up Award of NTIRE Low Light Enhancement Challenge at CVPR, 2024

Outstanding Reviewer Award at CVPR, 2024

Outstanding Reviewer Award at ICLR, 2024

Excellent Graduate of Tsinghua University, 2023

Excellent Master Thesis Award, Tsinghua University, 2023

Outstanding Reviewer Award at CVPR, 2023

Top-10 Talented Graduate Student Finalist, Tsinghua University, 2022

National Scholarship, Tsinghua University, 2022

Winner of NTIRE Spectral Reconstruction Challenge at CVPR, 2022

National Scholarship, Tsinghua University, 2021

Winner of COCO Keypoint Detection Challenge at ECCV, 2020

Winner of COCO Keypoint Detection Challenge at ICCV, 2019

Best Paper Award of Joint COCO and Mapillary Recognition Challenge Workshop at ICCV, 2019

2nd Place of AdultSize Drop-In Challenge at RoboCup World Final, 2019

2nd Place of AdultSize Technical Challenge at RoboCup World Final, 2019

3rd Place of AdultSize Soccer Competition at RoboCup World Final, 2019

Learning Progress Scholarship, Tsinghua University, 2019

Science and Technology Innovation Excellence Scholarship, Tsinghua University, 2019

Publications

( * = Equal Contribution, † = Corresponding Author)

	PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation Yuanhao Cai, Kunpeng Li, Menglin Jia, Jialiang Wang, Junzhe Sun, Weifeng Chen, Felix Juefei Xu, Chu Wang, Ali Thabet, Xiaoliang Dai, Xuan Ju, Alan Yuille, Ji Hou arxiv, 2025 paper / project A data construction pipeline and a new direct preference optimization framework for physically consistent video generation
	OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions Yuanhao Cai, He Zhang, Xi Chen, Jinbo Xing, Yiwei Hu, Yuqian Zhou, Kai Zhang, Zhifei Zhang, Soo Ye Kim, Tianyu Wang, Yulun Zhang, Xiaokang Yang, Zhe Lin, Alan Yuille Advances in Neural Information Processing Systems (NeurIPS), 2025 paper / project A data construction pipeline and a unified diffusion Transformer for subject-driven video customization under different control conditions; 4D generation
	EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning Xuan Ju, Tianyu Wang, Yuqian Zhou, He Zhang, Qing Liu, Nanxuan Zhao, Zhifei Zhang, Yijun Li, Yuanhao Cai, Shaoteng Liu, Daniil Pakhomov, Zhe Lin, Soo Ye Kim, Qiang Xu arxiv, 2025 paper / project Unifying a diverse range of generation and editing tasks for both images and videos within a single and powerful model
	LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS Wanhua Li, Yujie Zhao, Minghan Qin, Yang Liu, Yuanhao Cai, Chuang Gan, Hanspeter Pfister Advances in Neural Information Processing Systems (NeurIPS), 2025 paper / project A super fast framework for real-time 3D open-vocabulary querying and high-dimensional feature splatting
	Are Pixel-Wise Metrics Reliable for Sparse-View Computed Tomography Reconstruction? Tianyu Lin, Xinran Li, Chuntung Zhuang, Qi Chen, Yuanhao Cai, Kai Ding, Alan Yuille, Zongwei Zhou Advances in Neural Information Processing Systems (NeurIPS), 2025 paper / project New metrics and diffusion-based anatomy-aware enhancement for sparse-view CT reconstruction
	X-LRM: X-ray Large Reconstruction Model for Extremely Sparse-View Computed Tomography Recovery in One Second Guofeng Zhang, Ruyi Zha, Hao He, Yixun Liang, Alan Yuille, Hongdong Li, Yuanhao Cai arxiv, 2025 paper / project A feedforward method and a large-scale dataset for instant CT reconstruction
	Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation and Reconstruction Yuanhao Cai, He Zhang, Kai Zhang, Yixun Liang, Mengwei Ren, Fujun Luan, Qing Liu, Soo Ye Kim, Jianming Zhang, Zhifei Zhang, Yuqian Zhou, Yulun Zhang, Xiaokang Yang, Zhe Lin, Alan Yuille International Conference on Computer Vision (ICCV), 2025 paper / project / media (MrNeRF) A 3DGS diffusion model that generates objects and reconstructs scenes from a single view in 6 seconds
	4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction Weihao Yu, Yuanhao Cai, Ruyi Zha, Zhiwen Fan, Chenxin Li, Yixuan Yuan International Conference on Computer Vision (ICCV), 2025 paper / project / media (MrNeRF) A 4D Gaussian Splatting method for dynamic CT reconstruction
	Motion-X++: A Large-Scale Multimodal 3D Whole-body Human Motion Dataset Yuhong Zhang, Jing Lin, Ailing Zeng, Guanlin Wu, Shunlin Lu, Yurong Fu, Yuanhao Cai, Ruimao Zhang, Haoqian Wang, Lei Zhang arxiv, 2024 paper / project A Large-Scale Dataset for Multimodal 3D Whole-body Human Motion Generation
	VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment Wenyan Cong, Kevin Wang, Jiahui Lei, Colton Stearns, Yuanhao Cai, Dilin Wang, Rakesh Ranjan, Matt Feiszli, Leonidas Guibas, Zhangyang Wang, Weiyao Wang, Zhiwen Fan arxiv, 2024 paper / project A 3DGS-based reconstruction method for videos without SfM initialization
	LucidFusion: Generating 3D Gaussians with Arbitrary Unposed Images Hao He, Yixun Liang, Luozhou Wang, Yuanhao Cai, Xinli Xu, Hao-Xiang Guo, Xiang Wen, Ying-Cong Chen arxiv, 2024 paper / project A pose-free multi-view 3D reconstruction method based on Gaussian Splatting
	HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting Yuanhao Cai, Zihao Xiao, Yixun Liang, Minghan Qin, Yulun Zhang, Xiaokang Yang, Yaoyao Liu, Alan Yuille Advances in Neural Information Processing Systems (NeurIPS), 2024 paper / project / video / zhihu / leaderboard / media (AK) / media (MrNeRF) / bibtex The first 3D Gaussian splatting-based method for high dynamic range imaging
	R2-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction Ruyi Zha, Tao Jun Lin, Yuanhao Cai†, Jiwen Cao, Hongdong Li Advances in Neural Information Processing Systems (NeurIPS), 2024 paper / project / media (MrNeRF) The first 3D Gaussian splatting-based method for CT reconstruction
	Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis Yuanhao Cai, Yixun Liang, Jiahao Wang, Angtian Wang, Yulun Zhang, Xiaokang Yang, Zongwei Zhou, Alan Yuille European Conference on Computer Vision (ECCV), 2024 paper / project / video / zhihu / media (AK) / media (MrNeRF) / bibtex The first 3D Gaussian splatting-based method for X-ray 3D reconstruction
	Unveiling Advanced Frequency Disentanglement Paradigm for Low-Light Image Enhancement Kun Zhou, Xinyu Lin, Wenbo Li, Xiaogang Xu, Yuanhao Cai, Zhonghang Liu, Xiaoguang Han, Jiangbo Lu European Conference on Computer Vision (ECCV), 2024 paper / project A light-weight network for low-light image enhancement
	Structure-Aware Sparse-View X-ray 3D Reconstruction Yuanhao Cai, Jiahao Wang, Zongwei Zhou, Angtian Wang, Alan Yuille Conference on Computer Vision and Pattern Recognition (CVPR), 2024 paper / project / video / zhihu / leaderboard / bibtex A NeRF algorithm capturing structures for large-scale X-ray 3D reconstruction
	Binarized Spectral Compressive Imaging Yuanhao Cai, Yuxin Zheng, Jing Lin, Xin Yuan, Yulun Zhang, Haoqian Wang Advances in Neural Information Processing Systems (NeurIPS), 2023 paper / project / zhihu / bibtex A binarized neural network for spectral compressive imaging
	Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset Jing Lin, Ailing Zeng, Shunlin Lu, Yuanhao Cai*, Ruimao Zhang, Haoqian Wang, Lei Zhang Advances in Neural Information Processing Systems (NeurIPS), 2023 paper / project / bibtex A large-scale human motion benchmark with text descriptions
	Retinexformer: One-stage Retinex-based Transformer for Low-light Image Enhancement Yuanhao Cai, Hao Bian, Jing Lin, Haoqian Wang, Radu Timofte, Yulun Zhang International Conference on Computer Vision (ICCV), 2023 paper / project / zhihu / bibtex An efficient Retinex-based method for low-light image enhancement
	Degradation-Aware Unfolding Half-Shuffle Transformer for Spectral Compressive Imaging Yuanhao Cai, Jing Lin, Haoqian Wang, Xin Yuan, Henghui Ding, Yulun Zhang, Radu Timofte, Luc Van Gool Advances in Neural Information Processing Systems (NeurIPS), 2022 paper / project / zhihu / bibtex The first Transformer-based deep unfolding method for spectral compressive imaging
	Coarse-to-Fine Sparse Transformer for Hyperspectral Image Reconstruction Yuanhao Cai, Jing Lin, Xiaowan Hu, Haoqian Wang, Xin Yuan, Yulun Zhang, Radu Timofte, Luc Van Gool European Conference on Computer Vision (ECCV), 2022 paper / project / zhihu / bibtex A novel SOTA Transformer-based method for hyperspectral image reconstruction
	Flow-Guided Sparse Transformer for Video Deblurring Jing Lin, Yuanhao Cai, Xiaowan Hu, Haoqian Wang, Youliang Yan, Xueyi Zou, Henghui Ding, Yulun Zhang, Radu Timofte, Luc Van Gool International Conference on Machine Learning (ICML), 2022 paper / project / bibtex The first Transformer-based method for video deblurring
	Unsupervised Flow-Aligned Sequence-to-Sequence Learning for Video Restoration Jing Lin, Xiaowan Hu, Yuanhao Cai, Haoqian Wang, Youliang Yan, Xueyi Zou, Yulun Zhang, Luc Van Gool International Conference on Machine Learning (ICML), 2022 paper / project / bibtex The first Sequence-to-Sequence model for video restoration
	MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction Yuanhao Cai, Jing Lin, Zudi Lin, Haoqian Wang, Yulun Zhang, Hanspeter Pfister, Radu Timofte, Luc Van Gool Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), 2022 paper / project / zhihu / bibtex Winner of NTIRE 2022 Challenge on Spectral Reconstruction from RGB. The first Transformer-based method for spectral reconstruction. A baseline and toolbox.
	Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction Yuanhao Cai, Jing Lin, Xiaowan Hu, Haoqian Wang, Xin Yuan, Yulun Zhang, Radu Timofte, Luc Van Gool Conference on Computer Vision and Pattern Recognition (CVPR), 2022 paper / project / zhihu / bibtex The first Transformer-based method for hyperspectral image reconstruction
	HDNet: High-resolution Dual-domain Learning for Spectral Compressive Imaging Xiaowan Hu, Yuanhao Cai, Jing Lin, Haoqian Wang, Xin Yuan, Yulun Zhang, Radu Timofte, Luc Van Gool Conference on Computer Vision and Pattern Recognition (CVPR), 2022 paper / project / bibtex Dual-domain learning for hyperspectral image reconstruction
	RFormer: Transformer-based Generative Adversarial Network for Real Fundus Image Restoration on A New Clinical Benchmark Zhuo Deng, Yuanhao Cai, Lu Chen, Zheng Gong, Qiqi Bao, Xue Yao, Dong Fang, Shaochong Zhang, Lan Ma IEEE Journal of Biomedical and Health Informatics (J-BHI), 2022 paper / project / bibtex The first clinical benchmark and Transformer-based method for fundus image restoration
	Learning to Generate Realistic Noisy Images via Pixel-level Noise-aware Adversarial Training Yuanhao Cai, Xiaowan Hu, Haoqian Wang, Yulun Zhang, Hanspeter Pfister, Donglai Wei Advances in Neural Information Processing Systems (NeurIPS), 2021 paper / project / leaderboard / bibtex A GAN for real noisy image generation
	Multi-Scale Selective Feedback Network with Dual Loss for Real Image Denoising Xiaowan Hu, Yuanhao Cai, Zhihong Liu, Haoqian Wang, Yulun Zhang International Joint Conference on Artificial Intelligence (IJCAI), Oral, 2021 paper / project / bibtex A semi-supervised method for real image denoising
	Pseudo 3D Auto-Correlation Network for Real Image Denoising Xiaowan Hu, Ruijun Ma, Zhihong Liu, Yuanhao Cai, Xiaole Zhao, Yulun Zhang, Haoqian Wang Conference on Computer Vision and Pattern Recognition (CVPR), 2021 paper / project / bibtex An efficient architecture for image denoising
	Efficient Human Pose Estimation by Learning Deeply Aggregated Representations Zhengxiong Luo, Zhicheng Wang, Yuanhao Cai, Guan'an Wang, Liang Wang, Yan Huang, ErJin Zhou, Tieniu Tan, Jian Sun International Conference on Multimedia and Expo (ICME), Oral, 2021 paper / project / bibtex A DenseNet-like backbone for efficient human pose estimation
	Pyramid Orthogonal Attention Network based on Dual Self-Similarity for Accurate MR Image Super-Resolution Xiaowan Hu, Haoqian Wang, Yuanhao Cai, Xiaole Zhao, Yulun Zhang International Conference on Multimedia and Expo (ICME), 2021 paper / project / bibtex A novel self-attention mechanism for MR image super-resolution
	Learning Delicate Local Representations for Multi-Person Pose Estimation Yuanhao Cai, Zhicheng Wang, Zhengxiong Luo, Binyi Yin, Ang'ang Du, Haoqian Wang, Xiangyu Zhang, Xinyu Zhou, ErJin Zhou, Jian Sun European Conference on Computer Vision (ECCV), Spotlight, 2020 paper / project / zhihu / leaderboard / bibtex A multi-stage structure and an attention mechanism for accurate human pose estimation
	UDP++ Junjie Huang, Zengguang Shan, Yuanhao Cai, Feng Guo, Yun Ye, Xinze Chen, Zheng Zhu, Guan Huang, Jiwen Lu, Dalong Du European Conference on Computer Vision Workshop (ECCVW), Oral, 2020 paper / project / zhihu / leaderboard / bibtex Winner of COCO Keypoint Detection Challenge, 2020
	EG^2N: Enhanced Gradient Guiding Network for Single MR Image Super-Resolution Xiaowan Hu, Yuanhao Cai, Haoqian Wang, Yanbin Peng, Yulun Zhang Optoelectronic Imaging and Multimedia Technology VII 11550, 115500I, 2020 paper / project / bibtex A novel gradient guiding mechanism for MR image super-resolution
	Res-Steps-Net for Multi-Person Pose Estimation Yuanhao Cai, Zhicheng Wang, Binyi Yin, Ruihao Yin, Angang Du, Zhengxiong Luo, Zeming Li, Xinyu Zhou, Gang Yu, ErJin Zhou, Xiangyu Zhang, Yichen Wei, Jian Sun International Conference on Computer Vision Workshop (ICCVW), Best Paper Award, 2019 paper / project / zhihu / leaderboard / bibtex Winner of COCO Keypoint Detection Challenge, 2019

Peer Reviewer

Conference: CVPR, ECCV, ICCV, NeurIPS, ICML, ICLR, AAAI, IJCAI, ACM MM, etc.

Journal: TPAMI, IJCV, TIP, TNNLS, Pattern Recognition, etc.
