Vision and Language Navigation in the Real World via Online Visual Language Mapping.
This work tackles vision-and-language navigation in the real world using large language models (LLMs) and visual language models (VLMs).