A New Natural Policy Gradient by Stationary Distribution Metric

Tetsuro Morimura; Eiji Uchibe; Junichiro Yoshimoto; Kenji Doya

doi:10.1007/978-3-540-87481-2_6

Source

Lecture Notes in Computer Science > Machine Learning and Knowledge Discovery in Databases > Regular Papers > 82-97

Abstract

The parameter space of a statistical learning machine has a Riemannian metric structure in terms of its objective function. Amari [1] proposed the concept of the “natural gradient”, which takes the Riemannian metric of the parameter space into account. Kakade [2] applied it to policy gradient reinforcement learning, yielding the natural policy gradient (NPG). Although NPGs evidently depend on the underlying Riemannian metric, previous studies paid little attention to alternative choices of the metric. In this paper, we propose a Riemannian metric for the joint state-action distribution, which is directly linked with the average reward, and derive a new NPG named the “Natural State-action Gradient” (NSG). We then prove that the NSG can be computed by fitting a certain linear model to the immediate reward function. In numerical experiments, we verify that NSG learning can handle MDPs with a large number of states, for which the performance of existing (N)PG methods degrades.
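
As a rough illustration of the natural gradient idea mentioned in the abstract, the following minimal sketch computes an ordinary and a natural policy gradient for a hypothetical single-state, two-action problem with a one-parameter sigmoid policy. It is not the authors' algorithm: the function name, the reward values, and the choice of the Fisher information as the metric are assumptions made for this example. With only one state the stationary distribution is trivial, so the sketch cannot show how the NSG differs from earlier NPGs; it only shows how dividing by the metric removes the dependence on policy saturation, and that the result equals the least-squares slope obtained by regressing the immediate reward on the score function.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def natural_gradient_bandit(theta, r0, r1):
    """Ordinary gradient, Fisher information, and natural gradient.

    Hypothetical setting (an assumption for illustration): one state,
    two actions, policy pi(a=1) = sigmoid(theta), rewards r0 and r1.
    """
    p = sigmoid(theta)
    probs = [1.0 - p, p]          # pi(a=0), pi(a=1)
    rewards = [r0, r1]
    scores = [-p, 1.0 - p]        # d/dtheta log pi(a) for a = 0, 1

    # Ordinary policy gradient of the expected reward: E[score * reward].
    grad = sum(pa * psi * r for pa, psi, r in zip(probs, scores, rewards))

    # Fisher information of the policy (the metric assumed here): E[score^2].
    fisher = sum(pa * psi * psi for pa, psi in zip(probs, scores))

    # Natural gradient: inverse metric times the ordinary gradient.
    # Because E[score] = 0, this is also the least-squares slope of a
    # linear model that fits the reward with the score as its feature.
    return grad, fisher, grad / fisher


grad, fisher, nat = natural_gradient_bandit(theta=3.0, r0=0.0, r1=1.0)
print(grad, fisher, nat)
```

In this setting the ordinary gradient shrinks as the policy saturates (large |theta|), while the natural gradient stays equal to the reward difference r1 - r0.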

Identifiers

series ISSN:	0302-9743
series e-ISSN:	1611-3349
book ISBN:	978-3-540-87480-5
book e-ISBN:	978-3-540-87481-2
DOI:	10.1007/978-3-540-87481-2_6

Authors

Tetsuro Morimura

Initial Research Project, Okinawa Institute of Science and Technology
IBM Research, Tokyo Research Laboratory

Eiji Uchibe

Initial Research Project, Okinawa Institute of Science and Technology

Junichiro Yoshimoto

Initial Research Project, Okinawa Institute of Science and Technology
Graduate School of Information Science, Nara Institute of Science and Technology

Kenji Doya

Initial Research Project, Okinawa Institute of Science and Technology
Graduate School of Information Science, Nara Institute of Science and Technology
ATR Computational Neuroscience Laboratories

Keywords

policy gradient reinforcement learning; natural gradient; Riemannian metric matrix; Markov decision process

Additional information

Data set: Springer

Publisher

Springer Berlin Heidelberg

chapter
