Research
My research interests include Embodied AI, Visual Navigation and Deep Reinforcement Learning.
|
|
ImagineNav: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination
Xinxin Zhao*, Wenzhe Cai*, Likun Tang, Teng Wang
Under Review
website
/
paper
/
github
We propose a novel navigation decision framework that first uses imagination to generate candidate future images and then lets a VLM select among them. This reduces the ObjectGoal navigation problem to a PointGoal navigation problem.
|
|
Boosting Efficient Reinforcement Learning for Vision-and-Language Navigation with Open-Sourced LLM
Jiawei Wang, Teng Wang, Wenzhe Cai, Lele Xu, Changyin Sun
IEEE Robotics and Automation Letters (RA-L), 2024
website
/
paper
/
github
We propose a hierarchical reinforcement learning method for vision-and-language navigation, which uses an efficient open-source LLM as a high-level planner and an RL-based policy for sub-instruction completion.
|
|
InstructNav: Zero-shot System for Generic Instruction Navigation in Unexplored Environment
Yuxing Long*, Wenzhe Cai*, Hongcheng Wang, Guanqi Zhan, Hao Dong
Conference on Robot Learning (CoRL), 2024
website
/
paper
/
github
We propose a zero-shot navigation system, InstructNav, which makes the first attempt to handle multi-task navigation problems without any navigation training or pre-built maps.
|
|
MO-DDN: A Coarse-to-Fine Attribute-based Exploration Agent for Multi-object Demand-driven Navigation
Hongcheng Wang, Peiqi Liu, Wenzhe Cai, Mingdong Wu, Zhengyu Qian, Hao Dong
Conference on Neural Information Processing Systems (NeurIPS), 2024
website
/
paper
/
github
We propose the Multi-Object Demand-Driven Navigation (MO-DDN) benchmark, which evaluates an agent's navigation performance on multi-object exploration aligned with personalized demands.
|
|
Bridging Zero-Shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill
Wenzhe Cai, Siyuan Huang, Guangran Cheng, Yuxing Long, Peng Gao, Changyin Sun, Hao Dong
IEEE Conference on Robotics and Automation (ICRA), 2024
website
/
paper
/
github
We propose a pure RGB-based navigation skill, PixNav, which takes an assigned pixel as the goal specification and can be used to navigate towards any object.
|
|
Discuss Before Moving: Visual Language Navigation via Multi-expert Discussions
Yuxing Long, Xiaoqi Li, Wenzhe Cai, Hao Dong
IEEE Conference on Robotics and Automation (ICRA), 2024
website
/
paper
/
github
The DiscussNav agent actively discusses with multiple domain experts before moving. With multi-expert discussion, our method achieves zero-shot visual language navigation without any training.
|
|
Multi-Task Reinforcement Learning With Attention-Based Mixture of Experts
Guangran Cheng, Lu Dong, Wenzhe Cai, Changyin Sun
IEEE Robotics and Automation Letters (RA-L), 2023
paper
/
github
We propose a soft mixture-of-experts (MoE) reinforcement learning method to tackle multi-task robotic control problems, which effectively captures the latent relationships among different tasks.
|
|
XuanCE: A Comprehensive and Unified Deep Reinforcement Learning Library
Wenzhang Liu, Wenzhe Cai, Kun Jiang, Guangran Cheng, Yuanda Wang, Jiawei Wang, Jingyu Cao, Lele Xu, Chaoxu Mu, Changyin Sun
Journal of Machine Learning Research (JMLR), 2023 (under review)
github
/
paper
XuanCE is an open-source ensemble of Deep Reinforcement Learning (DRL) algorithm implementations, which supports both single-agent and multi-agent RL algorithms.
|
|
Robust Navigation with Cross-Modal Fusion and Knowledge Transfer
Wenzhe Cai*, Guangran Cheng*, Lingyue Kong, Lu Dong, Changyin Sun
IEEE Conference on Robotics and Automation (ICRA), 2023
website
/
paper
/
github
We propose an efficient distillation architecture to tackle the sim-to-real gap of an RL-based navigation policy. Our experiments show that our architecture outperforms domain randomization techniques.
|
|
DGMem: Learning Visual Navigation Policy without Any Labels by Dynamic Graph Memory
Wenzhe Cai, Teng Wang, Guangran Cheng, Lele Xu, Changyin Sun
Applied Intelligence, 2024
paper
We discuss the self-supervised navigation problem and present Dynamic Graph Memory (DGMem), which enables training with only on-board observations.
|
|
Learning a World Model with Multi-Timescale Memory Augmentation
Wenzhe Cai, Teng Wang, Jiawei Wang, Changyin Sun
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
paper
We propose a novel action-conditioned video prediction method which introduces optical flow prediction to model the influence of actions and incorporates optical-flow-based image prediction to improve long-term prediction quality.
|
Services
Reviewer: RA-L, TNNLS, TAI, ICRA, ICLR.
|
Honors & Awards
SEU Doctoral Entrance Scholarship (¥20000)
SEU Scholarship from Shanghai Zhang Jiang Hi-Tech Park (¥10000)
SEU Scholarship from Jiangsu Zhongnan Construction Group (¥10000)
SEU Merit Student
First Prize in RoboCup Rescue Simulation Competition, China (1st)
First Prize in Electronic Design Competition, Jiangsu