Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation

June 17, 2019
Qiuyuan Huang, Microsoft; Jianfeng Gao, Microsoft

Vision-language navigation (VLN) is the task of navigating an embodied agent to carry out natural language instructions inside real 3D environments. We propose a novel Reinforced Cross-Modal Matching (RCM) approach that enforces cross-modal grounding both locally and globally via reinforcement learning (RL). We further introduce a Self-Supervised Imitation Learning (SIL) method, in which the agent explores unseen environments by imitating its own past good decisions.

- Qiuyuan Huang, Principal Researcher
- Jianfeng Gao, Technical Fellow & Corporate Vice President
Research Area
- Artificial intelligence
Publication
- Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation
Blog & Podcasts
- See what we mean – Visually grounded natural language navigation is going places
