Sparse Latent Space Policy Search
Kevin Sebastian Luck, Joni Pajarinen, Erik Berger, Ville Kyrki and Heni Ben Amor
Arizona State University, Technical University Bergakademie Freiberg, Aalto University
Abstract
Computational agents often need to learn policies that involve many control variables, e.g., a robot needs to control several joints simultaneously. Learning a policy with a high number of parameters, however, usually requires a large number of training samples. We introduce a reinforcement learning method for sample-efficient policy search that exploits correlations between control variables. Such correlations are particularly frequent in motor skill learning tasks. The introduced method uses Variational Inference to estimate policy parameters, while at the same time uncovering a low-dimensional latent space of controls. Prior knowledge about the task and the structure of the learning agent can be provided by specifying groups of potentially correlated parameters. This information is then used to impose sparsity constraints on the mapping between the high-dimensional space of controls and a lower-dimensional latent space. In experiments with a simulated bi-manual manipulator, the new approach effectively identifies synergies between joints, performs efficient low-dimensional policy search, and outperforms state-of-the-art policy search methods.
Setting up the V-REP Simulation
The V-REP simulator by Coppelia Robotics provides several robot models and includes a physics engine. It is free for academic and educational use and can be downloaded from Coppelia Robotics. The following steps describe how to set up V-REP to run the NAO experiment provided in the zip file:
- Download and extract V-REP
- Add the path V-REP/programming/remoteApiBindings/matlab/matlab to your MATLAB search path
- Add the path V-REP/programming/remoteApi to your MATLAB search path
- Add the path V-REP/programming/remoteApiBindings/lib/lib/64Bit (or 32Bit, depending on your system) to your MATLAB search path
- Change the port number in the file V-REP/remoteApiConnections.txt to “portIndex1_port = 19999”
- Start V-REP (under Linux with “sh V-REP/vrep.sh”)
- Open the scene-file NAOExperimentsStandZero.1.ttt provided in the GrouPS-file
- Execute the MATLAB script evaluateGFoneVREP.m
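Before running the experiment script, the path and port steps above can be checked with a short connection test. The following is only a minimal sketch, not part of the provided code: it assumes the `remApi` class and the `simxStart`/`simxFinish` calls from the MATLAB remote API bindings bundled with V-REP, and `vrepRoot` is a hypothetical variable that should point to wherever you extracted V-REP.

```matlab
% Sketch only: vrepRoot is a hypothetical location, adjust it to your
% own V-REP extraction directory.
vrepRoot = fullfile(pwd, 'V-REP');

% Put the remote API bindings and the shared library on the search path.
addpath(fullfile(vrepRoot, 'programming', 'remoteApiBindings', 'matlab', 'matlab'));
addpath(fullfile(vrepRoot, 'programming', 'remoteApi'));
addpath(fullfile(vrepRoot, 'programming', 'remoteApiBindings', 'lib', 'lib', '64Bit')); % or '32Bit'

% Load the remote API library shipped with V-REP.
vrep = remApi('remoteApi');
vrep.simxFinish(-1); % close any connections left over from earlier runs

% Try to reach the simulator on the port set in remoteApiConnections.txt.
clientID = vrep.simxStart('127.0.0.1', 19999, true, true, 5000, 5);

if clientID > -1
    disp('Connected to V-REP on port 19999.');
    vrep.simxFinish(clientID); % close this test connection again
else
    disp('Could not connect: check that V-REP is running and the port is 19999.');
end

vrep.delete(); % unload the remote API library
```

If the connection fails, the most likely causes are that V-REP was started before the port number was changed, or that the library folder (64Bit or 32Bit) does not match your MATLAB installation.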
Please note that the code linked above is an outdated version. We plan to publish a new, reviewed version, probably in January 2017. If you are interested, feel free to email Kevin Luck.
BibTeX
@inproceedings{luck2016sparse,
  title={Sparse Latent Space Policy Search},
  author={Luck, Kevin Sebastian and Pajarinen, Joni and Berger, Erik and Kyrki, Ville and Amor, Heni Ben},
  booktitle={Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence},
  year={2016},
  organization={AAAI Press}
}