Policy Gradient Bayesian Robust Optimization for Imitation Learning
Zaynah Javed*, Daniel Brown*, Satvik Sharma, Jerry Zhu, Ashwin Balakrishna, Marek Petrik, Anca D. Dragan, Ken Goldberg
International Conference on Machine Learning (ICML), 2021
Website / PDF

A scalable and robust RL algorithm that optimizes a combination of expected performance and tail risk under a distribution over learned reward functions.

Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones
Brijen Thananjeyan*, Ashwin Balakrishna*, Suraj Nair, Michael Luo, Krishnan Srinivasan, Minho Hwang, Joseph E. Gonzalez, Julian Ibarz, Chelsea Finn, Ken Goldberg
Robotics and Automation Letters (RA-L) Journal and International Conference on Robotics and Automation (ICRA), 2021 - Mentioned in Google AI Year in Review
Website / PDF

An algorithm for safe reinforcement learning that uses a set of offline data to learn about constraints before policy learning, and a pair of policies that separate the often conflicting objectives of task-directed exploration and constraint satisfaction, to learn contact-rich and visuomotor control tasks.

ABC-LMPC: Safe Sample-Based Learning MPC for Stochastic Nonlinear Dynamical Systems with Adjustable Boundary Conditions
Brijen Thananjeyan*, Ashwin Balakrishna*, Ugo Rosolia, Joseph E. Gonzalez, Aaron Ames, Ken Goldberg
Algorithmic Foundations of Robotics (WAFR), 2020 - Invited to IJRR Special Issue
Website / PDF

An MPC-based algorithm for robotic control (ABC-LMPC) with (1) performance and safety guarantees for stochastic nonlinear systems and (2) the ability to continuously explore the environment and expand the controller domain.

Safety Augmented Value Estimation from Demonstrations (SAVED): Safe Deep Model-Based RL for Sparse Cost Robotic Tasks
Brijen Thananjeyan*, Ashwin Balakrishna*, Ugo Rosolia, Felix Li, Rowan McAllister, Joseph E. Gonzalez, Sergey Levine, Francesco Borrelli, Ken Goldberg
Robotics and Automation Letters (RA-L) Journal and International Conference on Robotics and Automation (ICRA), 2020
Website / PDF

A new algorithm for safe and efficient reinforcement learning (SAVED) which leverages a small set of suboptimal demonstrations and prior task successes to structure exploration. SAVED also provides a mechanism for handling state-space constraints by leveraging probabilistic estimates of system dynamics.

On-Policy Robot Imitation Learning from a Converging Supervisor
Ashwin Balakrishna*, Brijen Thananjeyan*, Jonathan Lee, Felix Li, Arsh Zahed, Joseph E. Gonzalez, Ken Goldberg
Conference on Robot Learning (CoRL), 2019 - Oral Presentation
PDF

A new formulation of imitation learning from a non-stationary supervisor, with associated theoretical analysis and a practical algorithm that applies this formulation to develop an RL algorithm combining the sample efficiency of model-based RL with the fast policy evaluation enabled by model-free policies.
