Sucheng Ren

Hi, I am Sucheng Ren (任苏成), a Computer Science Ph.D. student at Johns Hopkins University, where I am fortunate to be advised by Prof. Alan Yuille and Prof. Cihang Xie. I received my B.S. and M.S. degrees in Computer Science from South China University of Technology, where I was advised by Prof. Shengfeng He. Currently, I am a research intern at Apple. Previously, I had a great time at ByteDance Seed, Microsoft Research Asia (MSRA), Tsinghua University, and the National University of Singapore.

My research focuses on diffusion- and autoregressive-based generative models and multimodal learning.

Email | CV | Scholar | GitHub

News

[Feb. 2026] FreqFlow got accepted by CVPR2026, M-VAR got accepted by CVPR2026 Findings!🎉
[Jun. 2025] xAR got accepted by ICCV2025!🎉
[May. 2025] FlowAR got accepted by ICML2025!🎉
[Jan. 2025] ARM got accepted by ICLR2025!🎉
[Jan. 2025] ARVideo got accepted by TMLR!🎉
[May. 2024] D-iGPT got accepted by ICML2024 as an Oral presentation!🎉
[Aug. 2023] Joined Johns Hopkins University as a Ph.D. student!
[Jul. 2023] SG-Former got accepted by ICCV2023!🎉
[Feb. 2023] TinyMIM got accepted by CVPR2023!🎉

Selected Publications

M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation

Sucheng Ren, Yaodong Yu, Nataniel Ruiz, Feng Wang, Alan Yuille, Cihang Xie
IEEE Conference on Computer Vision and Pattern Recognition (CVPR Findings), 2026
[paper] [code] [bibtex]

We decouple scale-wise attention, which allows VAR to be rebuilt in a more computationally efficient manner.

Frequency-Aware Flow Matching for High-Quality Image Generation

Sucheng Ren, Qihang Yu, Ju He, Xiaohui Shen, Alan Yuille, Liang-Chieh Chen
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2026
[Coming Soon] [bibtex]

We explicitly incorporate frequency-aware conditioning into flow matching.

Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation

Sucheng Ren, Qihang Yu, Ju He, Xiaohui Shen, Alan Yuille, Liang-Chieh Chen
International Conference on Computer Vision (ICCV), 2025
[paper] [code] [bibtex]

We generalize next-token prediction to next-X prediction.

FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching

Sucheng Ren, Qihang Yu, Ju He, Xiaohui Shen, Alan Yuille, Liang-Chieh Chen
International Conference on Machine Learning (ICML), 2025
[paper] [code] [bibtex]

We generalize next-scale prediction to the simplest scale design and make it compatible with any VAE.

Autoregressive Pretraining with Mamba in Vision

Sucheng Ren, Xianhang Li, Haoqin Tu, Feng Wang, Fangxun Shu, Lei Zhang, Jieru Mei, Linjie Yang, Peng Wang, Heng Wang, Alan Yuille, Cihang Xie
International Conference on Learning Representations (ICLR), 2025
[paper] [code] [bibtex]

We are the first to pretrain Mamba in vision with cluster-based autoregressive modeling.

ARVideo: Autoregressive Pretraining for Self-Supervised Video Representation Learning

Sucheng Ren, Hongru Zhu, Chen Wei, Yijiang Li, Alan Yuille, Cihang Xie
Transactions on Machine Learning Research (TMLR), 2025
[paper] [code] [bibtex]

We use autoregressive pretraining for self-supervised video representation learning.

Rejuvenating image-GPT as Strong Visual Representation Learners

Sucheng Ren, Zeyu Wang, Hongru Zhu, Junfei Xiao, Alan Yuille, Cihang Xie
International Conference on Machine Learning (ICML), (Oral), 2024
[paper] [code] [bibtex]

We enhance image-GPT with semantic-rich supervision.

TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models

Sucheng Ren, Fangyun Wei, Zheng Zhang, Han Hu
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
[paper] [code] [bibtex]

We explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones.

Shunted Self-Attention via Multi-Scale Token Aggregation

Sucheng Ren, Daquan Zhou, Shengfeng He, Jiashi Feng, Xinchao Wang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (Oral), 2022
[paper] [code] [bibtex]

Integrating the capability of capturing multiscale objects in each attention layer by adaptively merging tokens.

SG-Former: Self-guided Transformer with Evolving Token Reallocation

Sucheng Ren, Xingyi Yang, Songhua Liu, Xinchao Wang
International Conference on Computer Vision (ICCV), 2023
[paper] [code] [bibtex]

Integrating the capability of capturing multiscale objects in each attention layer by adaptively merging tokens.

Co-advise: Cross Inductive Bias Distillation

Sucheng Ren, Zhengqi Gao, Tianyu Hua, Zihui Xue, Yonglong Tian, Shengfeng He, Hang Zhao
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022
[paper] [code] [bibtex]

The first work to delve into the influence of models' inductive biases on knowledge distillation.

A Simple Data Mixing Prior for Improving Self-Supervised Learning

Sucheng Ren, Huiyu Wang, Zhengqi Gao, Shengfeng He, Alan Yuille, Yuyin Zhou, Cihang Xie
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022
[paper] [bibtex]

A generic data-mixing training strategy that can improve the self-supervised representation learning of both CNNs and ViTs.

Learning from the Master: Distilling Cross-modal Advanced Knowledge for Lip Reading

Sucheng Ren, Yong Du, Jianming Lv, Guoqiang Han, and Shengfeng He
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021
[paper] [bibtex]

Training a master to learn how to teach a better student.

Reciprocal Transformations for Unsupervised Video Object Segmentation

Sucheng Ren, Wenxi Liu, Yongtuo Liu, Haoxin Chen, Guoqiang Han and Shengfeng He
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021
[paper] [bibtex] [code]

Jointly learning salient objects, moving objects, and recurring objects for unsupervised video object segmentation.

TENet: Triple Excitation Network for Video Salient Object Detection

Sucheng Ren, Chu Han, Xin Yang, Guoqiang Han and Shengfeng He
European Conference on Computer Vision (ECCV), 2020
(Spotlight, Acceptance Rate 5.0%)
[paper] [bibtex]