Search results for: Shalabh Bhatnagar

Items 1 to 2 of 2 results

Article

Multiscale Q-learning with linear function approximation

Shalabh Bhatnagar, K. Lakshmanan

Discrete Event Dynamic Systems > 2016 > 26 > 3 > 477-509

In this article, we present a two-timescale variant of Q-learning with linear function approximation. Both Q-values and policies are assumed to be parameterized, with the policy parameter updated on a faster timescale than the Q-value parameter. This timescale separation is seen to result in significantly improved numerical performance of the proposed algorithm over Q-learning. We show that...
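The timescale separation described above can be illustrated with a generic two-timescale stochastic approximation loop. This is a toy sketch under stated assumptions, not the authors' algorithm. The function `two_timescale_demo`, the linear update targets and the step-size exponents are all hypothetical choices.

- The fast iterate `y` loosely plays the role of the policy parameter.
- The slow iterate `x` plays the role of the Q-value parameter.
- The slow step size `a_n` shrinks faster than the fast step size `b_n`, so `a_n / b_n` tends to 0.

```python
import random


def two_timescale_demo(num_iters: int = 100_000, noise: float = 0.1, seed: int = 0):
    """Toy two-timescale stochastic approximation (hypothetical example).

    Fast iterate y tracks the target 2*x; slow iterate x is driven by 1 - y.
    Substituting the fast equilibrium y = 2x gives the slow ODE dx/dt = 1 - 2x,
    so (x, y) should approach (0.5, 1.0).
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    for n in range(1, num_iters + 1):
        a_n = 1.0 / (n + 10)          # slow step size (stand-in for the Q-value parameter)
        b_n = 1.0 / (n + 10) ** 0.6   # fast step size (stand-in for the policy parameter)
        # Both sequences sum to infinity with square-summable terms, and a_n / b_n -> 0.
        # Both updates use the current (x, y) plus independent Gaussian noise.
        x_new = x + a_n * (1.0 - y + rng.gauss(0.0, noise))
        y_new = y + b_n * (2.0 * x - y + rng.gauss(0.0, noise))
        x, y = x_new, y_new
    return x, y


if __name__ == "__main__":
    print(two_timescale_demo())
```

Running `two_timescale_demo()` should leave `x` close to 0.5 and `y` close to 1.0. Because the fast iterate effectively sees `x` as frozen, the slow iterate only has to follow the averaged dynamics.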

Chapter

A novel Q-learning algorithm with function approximation for constrained Markov decision processes

K. Lakshmanan, Shalabh Bhatnagar

2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton) > 400-405

We present a novel multi-timescale Q-learning algorithm for average cost control in a Markov decision process subject to multiple inequality constraints. We formulate a relaxed version of this problem through the Lagrange multiplier method. Our algorithm differs from Q-learning in that it updates two parameters: a Q-value parameter and a policy parameter. The Q-value parameter is updated on...
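The Lagrangian relaxation mentioned in this abstract can be written out as below. This is the standard textbook formulation for an average-cost constrained MDP, not necessarily the exact one used in the chapter. The symbols c, g_k, α_k, λ_k and N are our own notation, and the long-run averages are assumed to exist.

```latex
J(\pi) = \lim_{T\to\infty} \frac{1}{T}\,\mathbb{E}_\pi\Big[\sum_{t=0}^{T-1} c(X_t, A_t)\Big],
\qquad
G_k(\pi) = \lim_{T\to\infty} \frac{1}{T}\,\mathbb{E}_\pi\Big[\sum_{t=0}^{T-1} g_k(X_t, A_t)\Big] \le \alpha_k,
\quad k = 1,\dots,N

\mathcal{L}(\pi, \lambda) = J(\pi) + \sum_{k=1}^{N} \lambda_k \big(G_k(\pi) - \alpha_k\big),
\qquad \lambda_k \ge 0

\text{relaxed problem:}\quad \max_{\lambda \ge 0}\; \min_{\pi}\; \mathcal{L}(\pi, \lambda)
```

For fixed λ, minimizing the Lagrangian is an unconstrained average-cost problem. Its single-stage cost is c(x, a) + Σ_k λ_k g_k(x, a), and the constant Σ_k λ_k α_k does not affect the minimizing policy.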

Filter options

Keywords: MULTI-STAGE STOCHASTIC SHORTEST PATH PROBLEM

Publication type

article (1)
book (1)

Dataset

ieee (1)
Springer (1)

INFONA - scientific communication portal
