I am currently a PhD student at the University of Virginia, advised by Dr. Sheng Li. I work on multimodal intelligence. Specifically, I am interested in:
- Visual understanding and reasoning with multimodal large language models (MLLMs), including post-training and test-time scaling
- Vision–language alignment (VLMs, MLLMs)
- Multimodal LLM agents