Abstract: Vision-and-Language Navigation in Continuous Environments (VLN-CE) requires agents to follow natural language instructions by executing low-level actions in 3D environments. Most existing ...