Homepage - Zhanqiu Guo

Zhanqiu (Jack) Guo

M.S. Student in Machine Learning
Carnegie Mellon University

zhanqiug(at)cs.cmu.edu

About Me

I am an M.S. student in the Machine Learning Department at Carnegie Mellon University, expected to graduate in December 2026. My research focuses on multimodal agents that interact with complex environments, including Vision-Language-Action models, web agents, and clinical multimodal AI systems. I am particularly interested in building models that go beyond recognizing static inputs by grounding perception, language, memory, and feedback in settings where actions have real consequences.

I am currently advised by Prof. Chenyan Xiong in the CX Research Group. Before CMU, I received my B.S. in Computer Science and Mathematics from New York University, where I was fortunate to be advised by Prof. Lerrel Pinto in the General-purpose Robotics and AI Lab (GRAIL) and by Prof. Yiqiu (Artie) Shen.

Education

Carnegie Mellon University

Aug. 2025 - Dec. 2026

Machine Learning Department

M.S. in Machine Learning

New York University

Aug. 2021 - May 2025

Tandon School of Engineering

B.S. in Computer Science & Mathematics

Experience

TikTok

May 2026 - Aug. 2026

TikTok Shop Ads Ranking

Machine Learning Engineer Intern

DTCC

Jun. 2025 - Aug. 2025

Technology Research & Innovation

IT Intern

Teragonia

Feb. 2025 - May 2025

Data Science & Artificial Intelligence

AI/ML Engineer Intern

Coleman Research

Jun. 2024 - Aug. 2024

Machine Learning & Generative AI

IT Intern

Honors & Awards

Best Paper Award

CoRL Workshop on Lifelong Learning for Home Robots

2024

Honorable Mention

Mathematical Contest in Modeling

2022

Selected Publications

EmbodiedMidtrain: Bridging the Gap between Vision-Language Models and Vision-Language-Action Models via Mid-training

Yiyang Du, Zhanqiu Guo, Xin Ye, Liu Ren, Chenyan Xiong

arXiv preprint 2026

A mid-training framework that selects VLA-aligned data from broader VLM corpora to improve downstream vision-language-action policy learning.

[arXiv] [PDF]

Modeling Distinct Human Interaction in Web Agents

Faria Huq, Zora Zhiruo Wang, Zhanqiu Guo, Venu Arvind Arangarajan, Tianyue Ou, Frank Xu, Shuyan Zhou, Graham Neubig, Jeffrey P. Bigham

arXiv preprint 2026

Introduces CowCorpus, a dataset of real-user web navigation trajectories with interleaved human and agent actions, and trains intervention-aware web agents.

[arXiv] [PDF]

DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation

Peiqi Liu, Zhanqiu Guo, Mohit Warke, Soumith Chintala, Chris Paxton, Nur Muhammad Mahi Shafiullah, Lerrel Pinto

IEEE International Conference on Robotics and Automation (ICRA) 2025 pp. 13346-13355

An online dynamic spatio-semantic memory system for open-world mobile manipulation, enabling robots to search for, localize, and recover objects in changing environments.

[arXiv] [PDF] [Project]

ContextWIN: Whittle Index Based Mixture-of-Experts Neural Model For Restless Bandits Via Deep RL

Zhanqiu Guo, Wayne Wang

arXiv preprint 2024

A context-aware mixture-of-experts extension of Neural Whittle Index Networks for restless multi-armed bandits, with convergence analysis and applications to dynamic decision making.

[arXiv] [PDF]