Towards Fully Autonomous Robot Navigation in Complex Outdoors: A Multi-Modal Perception and Learning-Based Approach
Abstract
Autonomous mobile robots are increasingly deployed in diverse outdoor applications such as surveillance, search and rescue, planetary exploration, delivery, and agriculture. However, navigating complex and unstructured outdoor environments remains a formidable challenge due to factors including uneven terrain, heterogeneous surface conditions, dense vegetation with varying physical properties, and the need for contextual scene understanding. This dissertation addresses these challenges through a multi-modal perception and learning-based framework, introducing novel algorithms for robust and adaptive outdoor navigation.
The first part of the dissertation focuses on the development of deployable deep reinforcement learning (DRL) policies. For wheeled robots, an online RL framework is proposed that leverages elevation-aware perception and a hybrid attention-based planner to enable stable traversal of highly uneven terrain and effective sim-to-real transfer. For legged robots operating in dense vegetation, an offline RL approach is introduced that integrates proprioceptive and exteroceptive sensing for stability-aware and vegetation-compliant planning. Additionally, a novel policy gradient algorithm with heavy-tailed parameterization is proposed to tackle sparse reward challenges, enabling sample-efficient training and reliable real-world deployment.
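As a rough illustration of the heavy-tailed policy-gradient idea, the sketch below swaps the usual Gaussian action head for a Cauchy distribution inside a REINFORCE-style update, so that large exploratory actions retain non-negligible probability under sparse rewards. The class and function names are illustrative only; the dissertation's actual algorithm is merely summarized above.

```python
# Minimal sketch (assumption): a heavy-tailed (Cauchy) policy head used in a
# REINFORCE-style policy-gradient step. Names are illustrative, not the
# dissertation's implementation.
import torch
import torch.nn as nn

class HeavyTailedPolicy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.loc = nn.Linear(hidden, act_dim)                 # location of the Cauchy
        self.log_scale = nn.Parameter(torch.zeros(act_dim))   # learned scale

    def dist(self, obs: torch.Tensor) -> torch.distributions.Cauchy:
        h = self.body(obs)
        return torch.distributions.Cauchy(self.loc(h), self.log_scale.exp())

def reinforce_update(policy, optimizer, obs, actions, returns):
    """One policy-gradient step on a batch of (obs, action, return) samples.
    Heavy tails keep rare, large actions plausible, which is one way to
    preserve learning signal when rewards are sparse."""
    d = policy.dist(obs)
    log_prob = d.log_prob(actions).sum(-1)
    loss = -(log_prob * returns).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random data, just to show the shapes.
policy = HeavyTailedPolicy(obs_dim=8, act_dim=2)
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
obs = torch.randn(32, 8)
acts = policy.dist(obs).sample()
rets = torch.randn(32)
reinforce_update(policy, opt, obs, acts, rets)
```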
The second part explores robust perception under degraded sensing conditions. A trajectory traversability estimation framework is developed by fusing RGB images, 3D LiDAR, and odometry, with candidate paths modeled as graphs and processed using an attention-based Graph Neural Network (GNN) trained to handle partial sensor failures. Furthermore, we introduce a novel 3D object representation, the Multi-Layer Intensity Map (MIM), which leverages stacked LiDAR intensity grids to estimate obstacle height, solidity, and opacity. MIMs enable reliable differentiation between passable and impassable vegetation and structures, while an adaptive inflation scheme further enhances navigation performance in narrow or cluttered environments. The MIM formulation is also extended to support real-time transparent object detection, addressing a common failure case in LiDAR-based systems by enabling safe and reactive collision avoidance. Finally, we present AdVENTR, a general-purpose system for autonomous navigation in unstructured outdoor environments marked by uneven terrain and dense vegetation. AdVENTR integrates data from RGB cameras, 3D LiDAR, IMU, odometry, and pose estimation, processed through efficient, edge-deployable learning-based perception and planning modules.
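The abstract does not give the MIM formulation itself, but the following sketch conveys the stacked-intensity-grid idea: LiDAR returns are binned into a 2D grid at several height layers, and simple per-cell statistics stand in for the height, solidity, and opacity cues. The grid resolution, layer bounds, and the statistics are assumptions made for illustration, not the dissertation's definitions.

```python
# Minimal sketch (assumption): stacked LiDAR-intensity grids in the spirit of
# the Multi-Layer Intensity Map (MIM), with crude per-cell proxies for
# obstacle height, solidity, and opacity.
import numpy as np

def build_mim(points, grid=(100, 100), cell=0.2, z_layers=(0.0, 0.5, 1.0, 1.5)):
    """points: (N, 4) array of x, y, z, LiDAR intensity in the robot frame."""
    H, W = grid
    n_layers = len(z_layers) - 1
    intensity = np.zeros((n_layers, H, W))
    counts = np.zeros((n_layers, H, W))

    # Bin each return into a grid cell (robot at the grid center).
    ix = np.clip((points[:, 0] / cell + W / 2).astype(int), 0, W - 1)
    iy = np.clip((points[:, 1] / cell + H / 2).astype(int), 0, H - 1)
    for k in range(n_layers):
        in_layer = (points[:, 2] >= z_layers[k]) & (points[:, 2] < z_layers[k + 1])
        np.add.at(intensity[k], (iy[in_layer], ix[in_layer]), points[in_layer, 3])
        np.add.at(counts[k], (iy[in_layer], ix[in_layer]), 1)

    occupied = counts > 0
    layers_hit = occupied.sum(axis=0)
    # Highest occupied layer per cell (counted from the bottom), 0 if empty.
    top = np.where(layers_hit > 0, n_layers - np.argmax(occupied[::-1], axis=0), 0)
    height_proxy = np.asarray(z_layers)[top]        # approximate obstacle height
    solidity = layers_hit / np.maximum(top, 1)      # filled fraction below the top hit
    opacity_proxy = intensity.sum(axis=0) / np.maximum(counts.sum(axis=0), 1)
    return height_proxy, solidity, opacity_proxy

# Toy usage: random returns with x, y in [-10, 10] m, z in [0, 2] m, intensity in [0, 1].
pts = np.random.rand(5000, 4) * [20, 20, 2, 1] - [10, 10, 0, 0]
height, solidity, opacity = build_mim(pts)
```

Under this reading, tall cells with low solidity and low return intensity would correspond to thin, passable vegetation, while tall, solid, high-intensity cells would be treated as impassable structures.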
The third part tackles context-aware navigation in long-tail and open-world scenarios using compact vision-language models (VLMs). A lightweight framework is proposed that leverages zero-shot scene understanding from VLMs to interpret user-defined behavioral instructions. Detected objects are associated with regulatory actions to construct a behavioral cost map, encoding language-driven rules into a spatial format. This enables adaptive, behavior-aware navigation that responds dynamically to scene changes. To balance motion efficiency with compliance, the framework integrates the cost map with an unconstrained Model Predictive Control (MPC) planner. The result is a flexible and interpretable navigation system capable of executing high-level objectives in complex outdoor environments.
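To make the behavioral cost map concrete, the sketch below associates hypothetical language-derived rules with detected object locations and scores candidate trajectories against the resulting grid. The rule table, the Gaussian penalty blobs, and the sampling-based trajectory scorer are illustrative stand-ins for the dissertation's VLM pipeline and MPC planner, not its actual implementation.

```python
# Minimal sketch (assumption): language-derived rules ("avoid grass", "slow
# near crosswalks") stamped into a behavioral cost map, then used to score
# candidate trajectories. All parameters and names are illustrative.
import numpy as np

RULE_COSTS = {"grass": 5.0, "crosswalk": 2.0, "sidewalk": 0.0}  # hypothetical rules

def behavioral_cost_map(detections, grid=(200, 200), cell=0.1, radius=1.0):
    """detections: list of (label, x, y) in the map frame -> 2D cost grid."""
    H, W = grid
    cost = np.zeros((H, W))
    ys, xs = np.mgrid[0:H, 0:W]
    for label, x, y in detections:
        w = RULE_COSTS.get(label, 0.0)
        if w == 0.0:
            continue
        d2 = ((xs - x / cell) ** 2 + (ys - y / cell) ** 2) * cell ** 2
        cost += w * np.exp(-d2 / (2 * radius ** 2))   # soft penalty blob around the object
    return cost

def score_trajectory(traj, goal, cost_map, cell=0.1, w_goal=1.0, w_rule=3.0):
    """traj: (T, 2) xy waypoints. Lower score = preferred by the planner."""
    H, W = cost_map.shape
    ix = np.clip((traj[:, 0] / cell).astype(int), 0, W - 1)
    iy = np.clip((traj[:, 1] / cell).astype(int), 0, H - 1)
    rule_cost = cost_map[iy, ix].sum()                # accumulated behavioral penalty
    goal_cost = np.linalg.norm(traj[-1] - goal)       # terminal distance to goal
    return w_goal * goal_cost + w_rule * rule_cost

# Toy usage: pick the cheaper of two straight-line candidates around a grass patch.
cmap = behavioral_cost_map([("grass", 5.0, 5.0)])
goal = np.array([10.0, 10.0])
cands = [np.linspace([0, 0], [10, 10], 20), np.linspace([0, 0], [10, 2], 20)]
best = min(cands, key=lambda t: score_trajectory(t, goal, cmap))
```

In this toy setup, the weighted sum of goal progress and accumulated map cost plays the role of the planner objective: raising the rule weight trades motion efficiency for stricter compliance with the instructed behavior.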