Models for Autonomously Motivated Exploration in Reinforcement Learning

Peter Auer; Shiau Hong Lim; Chris Watkins

doi:10.1007/978-3-642-24477-3_4

Source

Lecture Notes in Computer Science, Discovery Science, p. 29

Abstract

One of the striking differences between current reinforcement learning algorithms and early human learning is that animals and infants appear to explore their environments with autonomous purpose, in a manner appropriate to their current level of skills. An important intuition for autonomously motivated exploration was proposed by Schmidhuber [1,2]: an agent should be interested in making observations that reduce its uncertainty about future observations.

However, there is not yet a theoretical analysis of the usefulness of autonomous exploration with respect to the overall performance of a learning agent. We discuss models for a learning agent’s autonomous exploration and present some recent results. In particular, we investigate the exploration time for navigating effectively in a Markov Decision Process (MDP) without rewards, and we consider extensions to MDPs with infinite state spaces.