Abstract
We investigate the active manipulation of objects into target shapes using model-free, long-horizon deep reinforcement learning (DRL). Our proposed approach uses visual observations consisting of segmented images to mitigate the sim-to-real gap. We address a long-horizon manipulation task that requires a sequence of accurate actions to achieve the target shapes, using a robot arm with an RGB-D camera in an eye-in-hand configuration and an elongated, volumetric, elastoplastic object. Similar objects are found in the food, marine, and manufacturing domains. The aim is to actively manipulate the object into an arbitrary target shape using image observations. We trained a DRL agent with Proximal Policy Optimization (PPO) by running 768 parallel actors in simulation, for a total of 1.2M environment interactions, and tested it on 200 unseen target deformations. Within three attempts, 82% of the trials achieved more than 90% overlap with the 200 target shapes. By relying on segmentation images as the visual observation space, we successfully transferred the agent to the real world without supplementary training. Our approach requires neither real-world manipulation examples nor fine-tuning in the real world. The robustness of our approach was demonstrated in simulation and experimentally validated in the real world on specific manipulation tasks, achieving a 94.2% mean zero-shot overlap success rate on previously unseen target shapes.
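The following is a minimal sketch, not the paper's implementation, of the training setup the abstract describes: PPO with many parallel simulated actors and image (segmentation) observations. The environment id, policy choice, and hyperparameters are illustrative assumptions; only the actor count and interaction budget come from the abstract.

```python
# Hedged sketch using Stable-Baselines3; hyperparameters are assumptions.
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

N_ACTORS = 768  # parallel actors in simulation, as stated in the abstract

# "DeformableShaping-v0" is a hypothetical environment id; the real simulator
# would emit segmentation-image observations of the object and target shape.
env = make_vec_env(
    "DeformableShaping-v0",
    n_envs=N_ACTORS,
    vec_env_cls=SubprocVecEnv,
)

# CNN policy over image observations; batch sizes chosen only for illustration.
model = PPO("CnnPolicy", env, n_steps=128, batch_size=4096, verbose=1)
model.learn(total_timesteps=1_200_000)  # roughly the 1.2M interactions cited
model.save("shape_manipulation_ppo")
```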