NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance

Learning navigation in dynamic open-world environments is an important yet challenging skill for robots. Most previous methods rely on precise localization and mapping or learn from expensive real-world demonstrations. In this paper, we propose the Navigation Diffusion Policy (NavDP), an end-to-end framework that is trained solely in simulation and transfers zero-shot to different embodiments in diverse real-world environments. The key ingredient of NavDP's network is the combination of diffusion-based trajectory generation and a critic function for trajectory selection, both conditioned only on local observation tokens encoded by a shared policy transformer. Given privileged information about the global environment in simulation, we scale up high-quality demonstrations to train the diffusion policy and formulate the critic value function targets with contrastive negative samples. Our demonstration generation approach produces about 2,500 trajectories per GPU per day, 20x more efficient than real-world data collection, and results in a large-scale navigation dataset with 363.2 km of trajectories across 1,244 scenes. Trained with this simulation dataset, NavDP achieves state-of-the-art performance and consistently strong generalization on quadruped, wheeled, and humanoid robots in diverse indoor and outdoor environments. In addition, we present a preliminary attempt at using Gaussian Splatting for in-domain real-to-sim fine-tuning to further bridge the sim-to-real gap. Experiments show that adding such real-to-sim data can improve the success rate by 30% without hurting generalization.

(a) NavDP is trained on simulation data with two types of supervision:

A Diffusion Policy head models the distribution of expert demonstrations recorded in the dataset.
A Critic Function head evaluates the safety of any navigation trajectory. It is learned from the Euclidean signed distance field (ESDF) map available in simulation.
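As a rough illustration of how an ESDF map could supervise a safety critic, the sketch below scores a trajectory by its minimum obstacle clearance. This is an assumption-laden toy, not the paper's actual target formulation: the grid layout, the function names `min_clearance` and `safety_target`, and the `safe_distance` threshold are all hypothetical.

```python
from typing import Sequence, Tuple

Waypoint = Tuple[float, float]  # (x, y) in metres, hypothetical convention


def min_clearance(esdf: Sequence[Sequence[float]],
                  trajectory: Sequence[Waypoint],
                  resolution: float) -> float:
    """Smallest ESDF value (metres) found under the waypoints of a trajectory.

    esdf[row][col] is the distance to the nearest obstacle for the cell that
    covers x in [col*resolution, (col+1)*resolution) and y likewise by row.
    Waypoints outside the grid are treated as collisions (clearance 0).
    """
    rows, cols = len(esdf), len(esdf[0])
    clearance = float("inf")
    for x, y in trajectory:
        col, row = int(x // resolution), int(y // resolution)
        if 0 <= row < rows and 0 <= col < cols:
            clearance = min(clearance, esdf[row][col])
        else:
            clearance = 0.0
    return clearance


def safety_target(esdf: Sequence[Sequence[float]],
                  trajectory: Sequence[Waypoint],
                  resolution: float,
                  safe_distance: float = 0.5) -> float:
    """Map clearance to [0, 1]: 0 at contact, 1 at or beyond safe_distance.

    safe_distance is an assumed threshold chosen for illustration only.
    """
    clearance = min_clearance(esdf, trajectory, resolution)
    return max(0.0, min(1.0, clearance / safe_distance))
```

Because the ESDF is only available in simulation, a target of this kind can be computed for arbitrary trajectories at training time, while the critic itself sees only local observations at deployment.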

(b) Two-stage Inference for Safety. Given a supported navigation goal, the diffusion head first generates a batch of candidate trajectories, then the critic head assigns a score to each trajectory. We select the highest-scoring trajectory to execute. This procedure balances generation diversity and deployment safety.
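The sample-then-select procedure can be sketched in a few lines. In this minimal example the diffusion head and critic head are replaced by hypothetical stand-ins (`make_toy_sampler` and `make_toy_critic`), and the batch size of 16 is an arbitrary assumption; only the selection logic in `select_trajectory` mirrors the two stages described above.

```python
import math
import random
from typing import Callable, List, Tuple

Waypoint = Tuple[float, float]
Trajectory = List[Waypoint]


def select_trajectory(sample_fn: Callable[[], Trajectory],
                      critic_fn: Callable[[Trajectory], float],
                      batch_size: int = 16) -> Tuple[Trajectory, float]:
    """Stage 1: draw batch_size candidates. Stage 2: score them, keep the best."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    candidates = [sample_fn() for _ in range(batch_size)]
    scores = [critic_fn(traj) for traj in candidates]
    best = max(range(batch_size), key=lambda i: scores[i])
    return candidates[best], scores[best]


def make_toy_sampler(rng: random.Random, steps: int = 8,
                     step_len: float = 0.25) -> Callable[[], Trajectory]:
    """Hypothetical stand-in for the diffusion head: straight paths, random heading."""
    def sample() -> Trajectory:
        heading = rng.uniform(-math.pi / 3, math.pi / 3)
        return [(k * step_len * math.cos(heading), k * step_len * math.sin(heading))
                for k in range(1, steps + 1)]
    return sample


def make_toy_critic(obstacle: Waypoint) -> Callable[[Trajectory], float]:
    """Hypothetical stand-in for the critic head: clearance from one point obstacle."""
    def critic(traj: Trajectory) -> float:
        return min(math.dist(p, obstacle) for p in traj)
    return critic


if __name__ == "__main__":
    rng = random.Random(0)
    traj, score = select_trajectory(make_toy_sampler(rng), make_toy_critic((1.0, 0.0)))
    print(f"selected end point {traj[-1]}, critic score {score:.3f}")
```

Sampling many candidates keeps the diversity of the generative model, while taking the maximum under the critic filters out the unsafe ones before anything is executed.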

(c) Real-to-Sim Reconstruction Bridges the Sim-to-Real Gap. To mitigate the sim-to-real gap for navigation, we use 3D Gaussian Splatting (3DGS) to reconstruct a target scene and then follow the same data generation pipeline to produce an in-domain navigation dataset.

BibTeX

@article{cai2025navdp,
  title = {NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance},
  author = {Wenzhe Cai and Jiaqi Peng and Yuqiang Yang and Yujian Zhang and Meng Wei and Hanqing Wang and Yilun Chen and Tai Wang and Jiangmiao Pang},
  journal = {arXiv},
  year = {2025},
}
