Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation
- Qiuyuan Huang, Microsoft; Jianfeng Gao, Microsoft
Vision-Language Navigation is the task of navigating an embodied agent to carry out natural language instructions inside real 3D environments. We propose a novel Reinforced Cross-Modal Matching (RCM) approach that enforces cross-modal grounding both locally and globally via reinforcement learning (RL) and further introduce a Self-Supervised Imitation Learning (SIL) method to explore unseen environments by imitating its own past, good decisions.
-
-
Qiuyuan Huang
Principal Researcher
-
Jianfeng Gao
Technical Fellow & Corporate Vice President
-
-
Watch Next
-
-
-
-
-
-
-
-
Microsoft Transforms its Cloud Supply Chain with Optimization and Generative AI
- Peter Lee,
- Konstantina Mellou,
- Kayla Kummerlowe
-
-