Zhanqiu (Jack) Guo
M.S. Student in Machine Learning
Carnegie Mellon University
About Me

I am an M.S. student in the Machine Learning Department at Carnegie Mellon University, expected to graduate in December 2026. My research focuses on multimodal agents that interact with complex environments, including Vision-Language-Action models, web agents, and clinical multimodal AI systems. I am particularly interested in building models that go beyond recognizing static inputs and instead ground perception, language, memory, and feedback in settings where actions have real consequences.

I am currently advised by Prof. Chenyan Xiong in the CX Research Group. Before CMU, I received my B.S. in Computer Science and Mathematics from New York University, where I was fortunate to be advised by Prof. Lerrel Pinto in the General-purpose Robotics and AI Lab (GRAIL) and by Prof. Yiqiu (Artie) Shen.

Education
  • Carnegie Mellon University
    Aug. 2025 - Dec. 2026
    Machine Learning Department
    M.S. in Machine Learning
  • New York University
    Aug. 2021 - May 2025
    Tandon School of Engineering
    B.S. in Computer Science & Mathematics
Experience
  • TikTok
    May 2026 - Aug. 2026
    TikTok Shop Ads Ranking
    Machine Learning Engineer Intern
  • DTCC
    Jun. 2025 - Aug. 2025
    Technology Research & Innovation
    IT Intern
  • Teragonia
    Feb. 2025 - May 2025
    Data Science & Artificial Intelligence
    AI/ML Engineer Intern
  • Coleman Research
    Jun. 2024 - Aug. 2024
    Machine Learning & Generative AI
    IT Intern
Honors & Awards
  • Best Paper Award
    CoRL Workshop on Lifelong Learning for Home Robots
    2024
  • Honorable Mention
    Mathematical Contest in Modeling
    2022
Selected Publications
EmbodiedMidtrain: Bridging the Gap between Vision-Language Models and Vision-Language-Action Models via Mid-training

Yiyang Du, Zhanqiu Guo, Xin Ye, Liu Ren, Chenyan Xiong

arXiv preprint 2026

A mid-training framework that selects VLA-aligned data from broader VLM corpora to improve downstream vision-language-action policy learning.

Modeling Distinct Human Interaction in Web Agents

Faria Huq, Zora Zhiruo Wang, Zhanqiu Guo, Venu Arvind Arangarajan, Tianyue Ou, Frank Xu, Shuyan Zhou, Graham Neubig, Jeffrey P. Bigham

arXiv preprint 2026

Introduces CowCorpus, a dataset of real-user web navigation trajectories with interleaved human and agent actions, and trains intervention-aware web agents.

DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation

Peiqi Liu, Zhanqiu Guo, Mohit Warke, Soumith Chintala, Chris Paxton, Nur Muhammad Mahi Shafiullah, Lerrel Pinto

IEEE International Conference on Robotics and Automation (ICRA) 2025, pp. 13346-13355

An online dynamic spatio-semantic memory system for open-world mobile manipulation, enabling robots to search for, localize, and recover objects in changing environments.

ContextWIN: Whittle Index Based Mixture-of-Experts Neural Model For Restless Bandits Via Deep RL

Zhanqiu Guo, Wayne Wang

arXiv preprint 2024

A context-aware mixture-of-experts extension of Neural Whittle Index Networks for restless multi-armed bandits, with convergence analysis and applications to dynamic decision making.
