In this paper, we analyze a class of actor-critic algorithms in partially observable Markov decision process (POMDP) environments. Specifically, we focus on a two-time-scale framework in which the critic performs temporal-difference learning with a neural network (NN) as a nonlinear function approximator, and the actor is updated greedily via a stochastic gradient approach. Instead of the common construction of a hidden-state estimator, we develop the idea originating with Singh, Jaakkola, and Jordan (1994) into an online, action-dependent actor-critic paradigm. This framework explores the ability of the adaptive dynamic programming (ADP) approach to operate in POMDP environments without implementing extra architectures such as state estimators. Both the theoretical analysis and the simulation studies validate that the framework performs effectively under the assumptions stated in this paper.
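To make the framework concrete, the following Python/NumPy sketch illustrates one possible instantiation of the two-time-scale loop described above; it is not the paper's implementation. The critic runs a TD(0) update on a small NN over raw observations, attaching values to observation-action pairs in the spirit of Singh, Jaakkola, and Jordan (1994), while the actor takes a slower stochastic-gradient step on a softmax policy. The environment `step_env`, step sizes, and network sizes are all hypothetical placeholders.

```python
# Minimal two-time-scale actor-critic sketch over observations (no state
# estimator). All names and constants below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_obs, n_act, n_hid = 4, 2, 16          # observation dim, actions, hidden units
gamma = 0.95                            # discount factor
alpha_c, alpha_a = 1e-2, 1e-3           # critic step > actor step (two time scales)

# Critic: one-hidden-layer NN giving action-dependent values Q(o, a).
W1 = rng.normal(0, 0.1, (n_hid, n_obs))
W2 = rng.normal(0, 0.1, (n_act, n_hid))
theta = np.zeros((n_act, n_obs))        # actor: linear softmax policy parameters

def critic(o):
    h = np.tanh(W1 @ o)                 # hidden features of the raw observation
    return W2 @ h, h                    # Q-values for every action, hidden layer

def policy(o):
    logits = theta @ o
    p = np.exp(logits - logits.max())
    return p / p.sum()

def step_env(o, a):                     # hypothetical POMDP step; swap in a real env
    o2 = np.clip(o + rng.normal(0, 0.1, n_obs), -1, 1)
    r = float(o[a])                     # toy observation-dependent reward
    return o2, r

o = rng.uniform(-1, 1, n_obs)
for t in range(10_000):
    p = policy(o)
    a = rng.choice(n_act, p=p)
    o2, r = step_env(o, a)

    # Critic: TD(0) update on the fast time scale (expected next value).
    q, h = critic(o)
    q2, _ = critic(o2)
    delta = r + gamma * (policy(o2) @ q2) - q[a]
    gW1 = np.outer(W2[a] * (1 - h**2), o)          # gradient of q[a] w.r.t. W1
    W2[a] += alpha_c * delta * h
    W1 += alpha_c * delta * gW1

    # Actor: policy-gradient step on the slow time scale, scored by the critic.
    grad_log = -np.outer(p, o)                     # d log pi(a|o) / d theta
    grad_log[a] += o
    theta += alpha_a * q[a] * grad_log
    o = o2
```

The separation of step sizes (alpha_c much larger than alpha_a) is what makes the scheme two-time-scale: the critic tracks the value of the current policy quickly, while the actor drifts slowly enough to treat the critic's estimate as approximately converged.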