Vision and Language Navigation in the Real World via Online Visual Language Mapping.

This work tackles the vision-and-language navigation task in the real world, using large language models (LLMs) and visual language models (VLMs).