
Progressive Policy Learning: A Hierarchical Framework for Dexterous Bimanual Manipulation

Kang-Won Lee, Jung-Woo Lee, Seongyong Kim, Soo-Chul Lim*
Dongguk University

*Corresponding Author

Abstract

Dexterous bimanual manipulation remains a challenging task in reinforcement learning (RL) due to the vast state–action space and the complex interdependence between the hands. Conventional end-to-end learning struggles to handle this complexity, and multi-agent RL often faces limitations in stably acquiring cooperative movements. To address these issues, this study proposes a hierarchical progressive policy learning framework for dexterous bimanual manipulation. In the proposed method, one hand’s policy is first trained to stably grasp the object, and, while maintaining this grasp, the other hand’s manipulation policy is progressively learned. This hierarchical decomposition reduces the search space for each policy and enhances both the connectivity and the stability of learning by training the subsequent policy on the stable states generated by the preceding policy. Simulation results show that the proposed framework outperforms conventional end-to-end and multi-agent RL approaches. The proposed method was demonstrated via sim-to-real transfer on a physical dual-arm platform and empirically validated on a bimanual cube manipulation task.

Hierarchical Progressive Policy Learning

In this study, we propose a Hierarchical Progressive Policy Learning (HPPL) framework for dexterous bimanual manipulation. The proposed method decomposes the task into sequential sub-policies: first, a policy for one hand is trained to achieve a stable grasp, and then, while this grasp is maintained, a policy for the other hand is progressively trained to manipulate the object. This hierarchical structure effectively reduces the exploration space for each policy and enhances learning stability. In particular, by ensuring that the state space of the subsequent policy includes the state of the preceding policy, the proposed approach improves the connectivity between the two policies and enables stable learning.
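
The two-stage structure can be summarized with a short sketch. The code below is a toy illustration only: it assumes a differentiable surrogate reward in place of the actual IsaacLab task and the RL algorithm used in this work, and the network sizes, reward terms, and variable names are illustrative assumptions rather than details taken from the paper.

```python
# Toy sketch of progressive (two-stage) policy learning.
# All dimensions, rewards, and names are hypothetical stand-ins.
import torch
import torch.nn as nn

HOLD_OBS, ROT_EXTRA, ACT = 64, 64, 22   # assumed sizes; each arm-hand unit has 22 actuated DoF

def mlp(in_dim: int, out_dim: int) -> nn.Sequential:
    return nn.Sequential(nn.Linear(in_dim, 128), nn.ELU(), nn.Linear(128, out_dim))

holding = mlp(HOLD_OBS, ACT)
# The rotating policy's input includes the holding hand's state: Stage 2 is
# trained on the stable states produced by the Stage 1 policy.
rotating = mlp(ROT_EXTRA + HOLD_OBS, ACT)

def grasp_reward(obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
    """Toy differentiable stand-in for a grasp-stability reward."""
    return -(act - 0.1 * obs[..., :ACT]).pow(2).mean()

def rotate_reward(obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
    """Toy differentiable stand-in for a layer-rotation reward."""
    return -(act + 0.1 * obs[..., :ACT]).pow(2).mean()

# Stage 1: train the Holding policy from random initial states.
opt1 = torch.optim.Adam(holding.parameters(), lr=1e-3)
for _ in range(200):
    hold_obs = torch.randn(256, HOLD_OBS)
    loss = -grasp_reward(hold_obs, holding(hold_obs))
    opt1.zero_grad(); loss.backward(); opt1.step()

# Stage 2: freeze the Holding policy and train the Rotating policy while the
# frozen policy keeps maintaining the grasp.
for p in holding.parameters():
    p.requires_grad_(False)
opt2 = torch.optim.Adam(rotating.parameters(), lr=1e-3)
for _ in range(200):
    hold_obs = torch.randn(256, HOLD_OBS)        # random holding-state initialization
    rot_obs = torch.cat([torch.randn(256, ROT_EXTRA), hold_obs], dim=-1)
    _grasp = holding(hold_obs)                   # frozen grasp maintenance (not optimized)
    loss = -rotate_reward(rot_obs, rotating(rot_obs))
    opt2.zero_grad(); loss.backward(); opt2.step()
```

The essential point the sketch illustrates is the coupling: the rotating policy's observation concatenates the holding hand's state, so Stage 2 learns on top of the stable grasps produced in Stage 1 rather than exploring the full joint state-action space of both hands.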



Two-stage training and zero-shot sim-to-real transfer

The system is a bimanual robot in which each arm provides 22 DoF, combining a UR5e arm (6 DoF; Universal Robots, Odense, Denmark) with a LEAP-Hand (16 DoF), for a total of 44 DoF. To improve contact stability and grasp quality, the default rigid fingertips of the LEAP-Hand were replaced with elastomer fingertips of identical geometry. Reinforcement learning is conducted in the IsaacLab simulator: we instantiate a digital twin of the real bimanual system and model the manipulated object as a cube with a rotatable layer, matching the real setup. Stage 1 trains the Holding policy from random initializations. Stage 2 initializes from random holding states and trains the Rotating policy while executing the frozen Holding policy. During transfer, both policies run concurrently, and the controller maps their actions to low-level commands that drive the real robot.
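
On the deployment side, the following sketch shows how the two frozen policies could run concurrently in a single control loop. The interfaces read_robot_state and send_joint_targets, the checkpoint names, the observation sizes, and the 30 Hz control rate are hypothetical stand-ins, since the actual controller interface and rates are not specified here.

```python
# Sketch of concurrent execution of the two frozen policies after transfer.
# Interfaces, paths, sizes, and rates are assumptions, not the paper's values.
import time
import torch
import torch.nn as nn

HOLD_OBS_DIM, HOLD_ACT_DIM = 64, 22               # assumed observation/action sizes
ROT_OBS_DIM, ROT_ACT_DIM = 64 + HOLD_OBS_DIM, 22
CONTROL_HZ = 30                                    # assumed control rate

def load_policy(path: str, in_dim: int, out_dim: int) -> nn.Module:
    """Stand-in loader; a real deployment would restore the trained weights here."""
    net = nn.Sequential(nn.Linear(in_dim, 256), nn.ELU(), nn.Linear(256, out_dim))
    # net.load_state_dict(torch.load(path))        # omitted: checkpoint format is unknown
    return net.eval()

def read_robot_state() -> tuple[torch.Tensor, torch.Tensor]:
    """Stand-in for reading proprioception and object pose from the real robot."""
    return torch.randn(HOLD_OBS_DIM), torch.randn(ROT_OBS_DIM - HOLD_OBS_DIM)

def send_joint_targets(hold_act: torch.Tensor, rot_act: torch.Tensor) -> None:
    """Stand-in for the low-level controller driving the UR5e arms and LEAP-Hands."""
    pass

def deploy(steps: int = 300) -> None:
    holding = load_policy("holding.pt", HOLD_OBS_DIM, HOLD_ACT_DIM)
    rotating = load_policy("rotating.pt", ROT_OBS_DIM, ROT_ACT_DIM)
    for _ in range(steps):
        hold_obs, rot_extra = read_robot_state()
        with torch.no_grad():
            hold_act = holding(hold_obs)                          # maintain the grasp
            rot_act = rotating(torch.cat([rot_extra, hold_obs]))  # manipulate the layer
        send_joint_targets(hold_act, rot_act)                     # both policies run concurrently
        time.sleep(1.0 / CONTROL_HZ)
```

In the real system, the controller would convert these policy outputs into low-level joint commands for both arm-hand units at its native rate; the loop above only illustrates the concurrent, frozen execution of the two policies.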

