The lack of data efficiency and stability is one of the main challenges in end-to-end model free reinforcement learning (RL) methods. Recent researches solve the problem resort to supervised learning methods by utilizing human expert demonstrations, e.g. imitation learning. In this paper we present a novel framework which builds a self-improving process upon a policy improvement operator, which is used as a black box such that it has multiple implementation options for various applications. An agent is trained to iteratively imitate behaviors that are generated by the operator. Hence the agent can learn by itself without domain knowledge from human. We employ generative adversarial networks (GAN) to implement the imitation module in the new framework. We evaluate the framework performance over multiple application domains and provide comparison results in support.
|ACM International Conference on Autonomous Agents and Multiagent Systems. Proceedings
|Association for Computing Machinery
|The 18th International Conference on Autonomous Agents and MultiAgent Systems
|13/05/19 → …