Learning Automata have been shown to be an excellent tool for creating learning multi-agent systems. Most algorithms used in current automata research expect the environment to end in an explicit end-stage, in which the rewards are given to the learning automata (i.e. Monte Carlo updating). This is, however, infeasible in sequential decision problems with infinite horizon where no such end-stage...
A number of experimental studies have investigated whether cooperative behavior may emerge in multi-agent Q-learning. In some studies cooperative behavior did emerge, in others it did not. This paper provides a theoretical analysis of this issue. The analysis focuses on multi-agent Q-learning in iterated prisoner's dilemmas. It is shown that under certain assumptions cooperative behavior may emerge...
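The setting this abstract analyzes can be illustrated with a minimal sketch of two independent, stateless tabular Q-learners repeatedly playing the prisoner's dilemma; the payoff matrix and hyperparameters below are standard illustrative choices, not taken from the paper:

```python
import random

# Standard prisoner's dilemma payoffs (row, column); 0 = cooperate, 1 = defect.
PAYOFF = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}

def epsilon_greedy(q, epsilon):
    if random.random() < epsilon:
        return random.randrange(len(q))
    return max(range(len(q)), key=lambda a: q[a])

def play(episodes=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Two independent, stateless Q-learners repeatedly play the game."""
    random.seed(seed)
    q1, q2 = [0.0, 0.0], [0.0, 0.0]
    for _ in range(episodes):
        a1, a2 = epsilon_greedy(q1, epsilon), epsilon_greedy(q2, epsilon)
        r1, r2 = PAYOFF[(a1, a2)]
        q1[a1] += alpha * (r1 - q1[a1])   # immediate-reward Q-update
        q2[a2] += alpha * (r2 - q2[a2])
    return q1, q2
```

Because each learner here ignores the opponent and the game history, defection is the dominant strategy and both Q-tables come to favour it; richer state or opponent models are among the assumptions under which cooperative behavior can instead emerge.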
We consider a multi-agent system where the overall performance is affected by the joint actions or policies of agents. However, each agent observes only a partial view of the global state. This model is known as a decentralized partially observable Markov decision process (DEC-POMDP), which is more applicable to real-world settings such as communication networks. It is...
The OQ(λ) algorithm benefits from an extension of eligibility traces introduced as the opposition trace. This new technique combines the idea of opposition with eligibility traces to deal with large state-space problems in reinforcement learning applications. In our previous work, a comparison between OQ(λ) and conventional Watkins' Q(λ) showed a remarkable increase in performance...
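For reference, the conventional Watkins' Q(λ) baseline that OQ(λ) is compared against can be sketched as follows; the toy chain MDP, helper names, and hyperparameters are assumptions for illustration, not the paper's experimental setup:

```python
import random
from collections import defaultdict

def greedy_or_explore(Q, s, epsilon):
    """Epsilon-greedy action selection over a tabular Q."""
    if random.random() < epsilon:
        return random.randrange(len(Q[s]))
    return max(range(len(Q[s])), key=lambda a: Q[s][a])

def step(s, a, n_states=5):
    """Toy chain MDP (illustrative): action 1 moves right, action 0 moves
    left; reward 1 on reaching the rightmost state, which is terminal."""
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == n_states - 1 else 0.0), s2 == n_states - 1

def watkins_q_lambda(n_states=5, n_actions=2, episodes=300,
                     alpha=0.1, gamma=0.95, lam=0.8, epsilon=0.2, seed=0):
    random.seed(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        e = defaultdict(float)            # eligibility traces
        s, done = 0, False
        a = greedy_or_explore(Q, s, epsilon)
        while not done:
            s2, r, done = step(s, a, n_states)
            a2 = greedy_or_explore(Q, s2, epsilon)
            a_star = max(range(n_actions), key=lambda x: Q[s2][x])
            delta = r + (0.0 if done else gamma * Q[s2][a_star]) - Q[s][a]
            e[(s, a)] += 1.0              # accumulating trace
            for (si, ai), ei in list(e.items()):
                Q[si][ai] += alpha * delta * ei
                # Watkins' variant: traces decay only while the next
                # action is greedy; an exploratory action cuts them.
                e[(si, ai)] = gamma * lam * ei if a2 == a_star else 0.0
            s, a = s2, a2
    return Q
```

The traces are what let a single reward update many recently visited state-action pairs at once; opposition traces, as the abstract describes, build on this mechanism.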
Considerable research has been done on Reinforcement Learning in continuous environments, but research on problems where the actions can also be chosen from a continuous space is much more limited. We present a new class of algorithms named Continuous Actor Critic Learning Automaton (CACLA) that can handle continuous states and actions. The resulting algorithm is straightforward to implement. An...
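As commonly described, CACLA's signature step is to move the actor toward an explored action only when the temporal-difference error is positive. A minimal single-state sketch, with an assumed quadratic reward and illustrative hyperparameters (none of these values are from the paper):

```python
import random

def cacla_single_state(steps=3000, alpha=0.05, beta=0.05, sigma=0.3, seed=0):
    """Single-state CACLA sketch: the actor outputs one continuous
    action, the critic V estimates expected reward, and the reward
    r(a) = -(a - 0.5)**2 peaks at the (assumed) optimal action 0.5."""
    random.seed(seed)
    actor, V = 0.0, 0.0
    for _ in range(steps):
        a = actor + random.gauss(0.0, sigma)   # Gaussian exploration
        r = -(a - 0.5) ** 2
        delta = r - V                          # TD error (one-step episode)
        V += beta * delta                      # critic update
        if delta > 0:
            # Move the actor toward the explored action only when it
            # performed better than expected.
            actor += alpha * (a - actor)
    return actor
```

Updating toward actions rather than along a value gradient is what makes the scheme straightforward to implement for continuous action spaces.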
Approximate Dynamic Programming has been formulated and applied mainly to discrete-time systems. Expressing the ADP concept for continuous-time systems raises difficult issues related to sampling time and system-model knowledge requirements. This paper presents a novel online adaptive critic (AC) scheme, based on approximate dynamic programming (ADP), to solve the infinite-horizon optimal control...
Many robot control problems of practical importance, including task or operational space control, can be reformulated as immediate reward reinforcement learning problems. However, few of the known optimization or reinforcement learning algorithms can be used in online learning control for robots, as they are either prohibitively slow, do not scale to interesting domains of complex robots, or require...
This paper proposes an approximate dynamic programming strategy for responsive traffic signal control. It is the first attempt to optimize the signal control objective dynamically through adaptive approximation of the value function. The proposed value function approximation is separable and independent of exogenous factors. The algorithm updates the approximated value function progressively during operation,...
In this paper, finite-state, finite-action, discounted infinite-horizon-cost Markov decision processes (MDPs) with uncertain stationary transition matrices are discussed in the deterministic policy space. Uncertain stationary parametric transition matrices are classified into independent and correlated cases. It is pointed out in this paper that the optimality criterion of uniform minimization...
Using domain knowledge to decompose difficult control problems is a widely used technique in robotics. Previous work has automated the process of identifying some qualitative behaviors of a system, finding a decomposition of the system based on that behavior, and constructing a control policy based on that decomposition. We introduce a novel method for automatically finding decompositions of a task...
We present a method for reducing the effort required to compute policies for tasks based on solutions to previously solved tasks. The key idea is to use a learned intermediate policy based on local features to create an initial policy for the new task. In order to further improve this initial policy, we developed a form of generalized policy iteration. We achieve a substantial reduction in computation...
We propose a provably optimal approximate dynamic programming algorithm for a class of multistage stochastic problems, taking into account that the probability distribution of the underlying stochastic process is not known and the state space is too large to be explored entirely. The algorithm and its proof of convergence rely on the fact that the optimal value functions of the problems within the...
We investigate the dual approach to dynamic programming and reinforcement learning, based on maintaining an explicit representation of stationary distributions as opposed to value functions. A significant advantage of the dual approach is that it allows one to exploit well developed techniques for representing, approximating and estimating probability distributions, without running the risks associated...
In this paper, we suggest and analyze the use of approximate reinforcement learning techniques for a new category of challenging benchmark problems from the field of Operations Research. We demonstrate that interpreting and solving the task of job-shop scheduling as a multi-agent learning problem is beneficial for obtaining near-optimal solutions and can very well compete with alternative solution...
A theoretical analysis of Model-Based Temporal Difference Learning for Control is given, leading to a proof of convergence. This work differs from earlier work on the convergence of Temporal Difference Learning by proving convergence to the optimal value function. This means that rather than finding the values of the current policy, the policy is updated in such a manner that ultimately the...
It was shown recently that SVMs are particularly well suited to defining action policies that keep a dynamical system inside a given constraint set (in the framework of viability theory). However, the training set of the SVMs faces the curse of dimensionality, because it is based on a regular grid over the state space. In this paper, we propose an active learning approach, aiming to dramatically decrease the...
This paper presents an original technique for computing the optimal policy of a Markov Decision Problem with a continuous state space and discrete decision variables. We propose an extension of the Q-learning algorithm introduced in 1989 by Watkins for discrete Markov Decision Problems. Our algorithm relies on stochastic approximation and functional estimation, and uses kernels to locally...
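The general idea of kernel-based local estimation of Q-values over a continuous state space can be sketched as below; the Gaussian kernel, the fixed centre grid, and the update rule are illustrative assumptions, not the paper's estimator:

```python
import math

class KernelQ:
    """Kernel-smoothed Q-values over a continuous 1-D state with
    discrete actions: Q(s, a) is a normalized Gaussian-kernel average
    over a fixed grid of centres, and updates spread a correction to
    nearby centres in proportion to their kernel weight."""

    def __init__(self, centres, n_actions, bandwidth=0.1):
        self.centres = centres
        self.h = bandwidth
        self.q = [[0.0] * n_actions for _ in centres]

    def weights(self, s):
        w = [math.exp(-((s - c) / self.h) ** 2) for c in self.centres]
        z = sum(w)
        return [x / z for x in w]

    def value(self, s, a):
        return sum(w * self.q[i][a] for i, w in enumerate(self.weights(s)))

    def update(self, s, a, target, alpha=0.2):
        # Move the local estimate toward a Q-learning target such as
        # r + gamma * max_a' Q(s', a').
        delta = target - self.value(s, a)
        for i, w in enumerate(self.weights(s)):
            self.q[i][a] += alpha * w * delta
```

Repeated updates at one state raise the Q-estimate there while leaving distant states almost untouched, which is the locality a kernel provides.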
This paper describes a novel algorithm called CON-MODP for computing Pareto optimal policies for deterministic multi-objective sequential decision problems. CON-MODP is a value iteration based multi-objective dynamic programming algorithm that only computes stationary policies. We observe that for guaranteeing convergence to the unique Pareto optimal set of deterministic stationary policies, the algorithm...
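A core ingredient of any such multi-objective backup is Pareto-dominance filtering of value vectors; a generic sketch of that step (this is not the full CON-MODP algorithm, whose consistency handling the abstract describes):

```python
def dominates(u, v):
    """u Pareto-dominates v: no worse on every objective and strictly
    better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(u, v)) and any(x > y for x, y in zip(u, v))

def pareto_front(vectors):
    """Filter multi-objective value vectors down to the non-dominated
    ones, as a multi-objective DP backup must do at each iteration."""
    return [v for v in vectors if not any(dominates(u, v) for u in vectors)]
```

For example, among the vectors (1, 2), (2, 1), (0, 0), and (1, 1), only the first two survive: each of the others is weakly worse than some survivor on every objective.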
This paper describes backpropagation through an LSTM recurrent neural network model/critic for reinforcement learning tasks in partially observable domains. This combines LSTM's strength at learning long-term temporal dependencies, which it uses to infer states in partially observable tasks, with the ability to learn high-dimensional and/or continuous actions via backpropagation's...