Thesis: Meta Reinforcement Learning Applied on Quadrupedal Robots for Blind Locomotion and Fast Adaptation on Unknown Terrains
Student: Pedro Leon Fontes Cardoso Bazan
Advisors: Marco A. Meggiolaro, Vivian Medeiros, and Wouter Caarls
Area of Concentration: Applied Mechanics
Date: 10/10/2025
Abstract:
Blind locomotion refers to the challenge of navigating varied terrains without prior knowledge or exteroceptive data. Although quadruped robots often use external sensors, these can be unreliable in low-light or resource-constrained settings and cannot anticipate disturbances such as slippage. In such scenarios, quadrupeds must rely exclusively on proprioceptive feedback, using internal measurements — joint positions, velocities, and contact forces — to adapt their locomotion strategies. While slip detection and terrain-estimation methods exist, leveraging proprioceptive information offers advantages across many applications. This work explores Meta-Reinforcement Learning (Meta-RL) to enhance policy robustness and rapid adaptation for quadruped robots during blind locomotion on challenging terrain, with the goal of achieving zero-shot generalization — i.e., enabling the agent to perform effectively in unseen environments without additional training. It builds on the RL² algorithm, integrating recurrent neural networks into Proximal Policy Optimization (PPO) to implicitly encode task-specific information from experience. Two novel RL²-based architectures are proposed and evaluated in simulation with the ANYmal C quadruped robot across diverse terrain conditions, focusing on flat surfaces with stochastic slip and highly unstructured terrains. Results show that recurrent policies significantly outperform standard PPO, improving both adaptability and robustness under unpredictable ground dynamics and thereby advancing the state of blind quadrupedal locomotion in challenging simulated environments, with implications for real-world deployment.
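The core RL² mechanism mentioned in the abstract can be illustrated with a small sketch: a recurrent policy receives, at each timestep, the current observation concatenated with the previous action, previous reward, and a done flag, so that its hidden state implicitly accumulates task information across timesteps without any weight updates. The snippet below is a minimal NumPy illustration of this idea, not the thesis implementation; all dimensions, names, and the GRU-with-linear-head design are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RL2Policy:
    """Minimal RL^2-style recurrent policy (illustrative only).

    The GRU hidden state carries task information across timesteps; in
    RL^2 it is also carried across episode boundaries within a trial,
    which is what enables adaptation without gradient updates.
    """

    def __init__(self, obs_dim, act_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # RL^2 input = observation + previous action + previous reward + done flag
        in_dim = obs_dim + act_dim + 2
        scale = 1.0 / np.sqrt(in_dim + hidden_dim)
        # GRU parameters for update gate, reset gate, and candidate state
        self.W = rng.normal(0.0, scale, (3, hidden_dim, in_dim + hidden_dim))
        self.b = np.zeros((3, hidden_dim))
        # Linear head mapping hidden state to (bounded) action means
        self.W_out = rng.normal(0.0, scale, (act_dim, hidden_dim))
        self.hidden_dim = hidden_dim

    def initial_state(self):
        return np.zeros(self.hidden_dim)

    def step(self, h, obs, prev_action, prev_reward, done):
        x = np.concatenate([obs, prev_action, [prev_reward, float(done)]])
        xh = np.concatenate([x, h])
        z = sigmoid(self.W[0] @ xh + self.b[0])           # update gate
        r = sigmoid(self.W[1] @ xh + self.b[1])           # reset gate
        xh_r = np.concatenate([x, r * h])
        h_cand = np.tanh(self.W[2] @ xh_r + self.b[2])    # candidate state
        h_new = (1.0 - z) * h + z * h_cand
        action = np.tanh(self.W_out @ h_new)              # action means in [-1, 1]
        return action, h_new

# Short rollout: note the hidden state h is threaded through every step,
# so earlier (action, reward) pairs shape later actions.
policy = RL2Policy(obs_dim=4, act_dim=2, hidden_dim=8)
h = policy.initial_state()
prev_a, prev_r = np.zeros(2), 0.0
for t in range(5):
    obs = 0.1 * np.ones(4)                 # placeholder proprioceptive input
    a, h = policy.step(h, obs, prev_a, prev_r, done=False)
    prev_a, prev_r = a, 0.0
```

In the actual RL² setup, such a recurrent network is trained end to end with PPO over distributions of tasks (here, terrain and slip conditions), so that fast adaptation emerges from the hidden-state dynamics rather than from fine-tuning.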
Defense link:
https://puc-rio.zoom.us/j/91899345128?pwd=kG4WTUnpW5fVkDuQ3hvcSQb4hxEjHu.1
