Hi, I'm a final-year Ph.D. candidate at
Northeastern University in Boston, where I work closely with
Professor Lawson L.S. Wong and
Professor Christopher Amato.
My research interests span machine learning, computer vision,
and natural language processing, particularly at their intersection with robotics. My long-term goal is to
equip robots with human-level multimodal intelligence, including advanced visual perception for richer
scene understanding and robust natural language grounding for improved language-driven robot control.
Before joining Northeastern, I earned my Master's and Bachelor's degrees from Nankai University in China.
We propose a vision-based navigation system that localizes an embodied agent on coarse top-down 2-D maps (e.g., hand-drawn maps) using photo-realistic panoramic RGB images.
We propose a navigation framework that achieves zero-shot vision-and-language navigation in real-world scenarios using a Large Language Model (LLM) and a Large Vision-Language Model (VLM).
We propose a hierarchical navigation framework that facilitates high-level long-horizon planning and low-level goal-conditioned policy learning. To bridge the two levels, we use a conditional generative model to generate RGB image goals from binary occupancy grids.
We design a co-pilot driving system that uses a vision-based controller as a vehicle assistant.
Feel free to steal this website's source code. Do not scrape the HTML from this page itself, as it includes analytics tags that you do not want on your own website; use the GitHub source instead. Also, consider using Leonid Keselman's Jekyll fork of this page.