Hi! I'm a research scientist at Meta FAIR,
where I work on omni models that can understand and generate content across many modalities (text, image, video, and more).
I currently focus on post-training these models with advanced reward models and reinforcement learning.
I received my PhD from the University of Washington,
where I was advised by Prof. Mari Ostendorf and Prof. Noah A. Smith
and collaborated closely with Prof. Ranjay Krishna.
During my PhD, I was supported by the Qualcomm Innovation Fellowship and by Apple.
I have also interned at the Allen Institute for AI (AI2) and Google Research.
* indicates equal contribution
Yushi Hu*, Reyhane Askari-Hemmat*, Melissa Hall, Emily Dinan, Luke Zettlemoyer, Marjan Ghazvininejad
Preprint 2025
[paper]
[code & data]
[Huggingface dataset]
TLDR: A benchmark for the reward models that drive state-of-the-art omni models (e.g., Nano Banana).
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
Yushi Hu*, Weijia Shi*, Xingyu Fu, Dan Roth, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Ranjay Krishna
NeurIPS 2024
[paper]
[code]
[project page]
TLDR: Proposes "thinking with images": multimodal LLMs generate images as intermediate reasoning steps, improving performance on math, vision, and spatial reasoning tasks.
BLINK: Multimodal Large Language Models Can See but Not Perceive
Xingyu Fu*, Yushi Hu*, Bangzheng Li, Yu Feng, Haoyu Wang, Xudong Lin, Dan Roth, Noah A. Smith, Wei-Chiu Ma, Ranjay Krishna
ECCV 2024
[paper]
[project page]
[code]
[HF data]
TLDR: A benchmark revealing that multimodal LLMs struggle with core visual perception tasks that humans find trivial.
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
Yushi Hu, Otilia Stretcu, Chun-Ta Lu, Krishnamurthy Viswanathan, Kenji Hata, Enming Luo, Ranjay Krishna, Ariel Fuxman
CVPR 2024 (Oral)
[paper]
[project page]
TLDR: Distills tool usage and programmatic reasoning from visual programs into end-to-end vision-language models.
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Zeqiu Wu*, Yushi Hu*, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hannaneh Hajishirzi
NeurIPS 2023 (Spotlight)
[paper]
[project page]
[code & data]
TLDR: Fine-grained feedback at the sub-sentence level enables better reward models and more effective RLHF training.
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
Yushi Hu, Benlin Liu, Jungo Kasai, Yizhong Wang, Mari Ostendorf, Ranjay Krishna, Noah A. Smith
ICCV 2023
[paper]
[project page]
[code & data]
[poster]
TLDR: The first paper to propose evaluating image generation with multimodal LLMs.