This paper introduces an algorithm based on maximum likelihood estimation (MLE) to learn the structure and parameters of a continuous-density HMM (CDHMM). One of the most cumbersome problems encountered in applications that incorporate an HMM as a model is guessing the required number of states and the overall structure, especially when the source of information is continuous and variable (e.g. speech). In our algorithm, induction steps...
Reinforcement learning suffers from scalability problems due to the state-space explosion and the temporal credit assignment problem. Knowledge-based approaches have received significant attention in this area. Reward shaping is one particular approach to incorporating domain knowledge into reinforcement learning. The theoretical and empirical analysis in this paper reveals important properties of this principle,...
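The snippet above concerns reward shaping in general; the paper's own formulation is not shown here. As a point of reference, the standard potential-based form of shaping (Ng, Harada & Russell, 1999) adds F(s, s') = γΦ(s') − Φ(s) to the environment reward, which provably preserves optimal policies. A minimal Python sketch, with a hypothetical grid-world potential:

```python
def shaped_reward(r, s, s_next, phi, gamma=0.99):
    """Potential-based shaping: add F(s, s') = gamma * phi(s') - phi(s)
    to the environment reward; optimal policies are preserved."""
    return r + gamma * phi(s_next) - phi(s)

# Hypothetical potential for a 10x10 grid world: negative Manhattan
# distance to the goal cell (9, 9).
phi = lambda s: -(abs(s[0] - 9) + abs(s[1] - 9))
r_shaped = shaped_reward(0.0, (0, 0), (0, 1), phi)  # positive bonus for moving toward the goal
```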
A reinforcement learning (RL) agent that is to perform successfully in a complex and dynamic environment has to continuously learn and adapt to perform new tasks. This requires the agent not only to extract control and representation knowledge from the tasks it has learned, but also to reuse that extracted knowledge to learn new tasks. This paper presents a new method to extract this control and representational...
A macro-action is a typical series of useful actions that yields high expected rewards for an agent. Murata et al. proposed an actor-critic model that can generate macro-actions automatically based on information about state values and state-visiting frequencies. However, their model does not assume that the generated macro-actions will be used for learning different tasks. In this paper, we extend...
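Murata et al.'s actor-critic mechanism is not detailed in the snippet; for concreteness, a macro-action can be represented minimally as a fixed sequence of primitive actions executed open-loop, with its reward accumulated SMDP-style. A hedged sketch, assuming a hypothetical env.step(state, action) -> (next_state, reward, done) interface:

```python
def run_macro_action(env, state, macro, gamma=0.99):
    """Execute a macro-action (a fixed sequence of primitive actions)
    and accumulate its discounted reward, SMDP-style."""
    total, discount = 0.0, 1.0
    for action in macro:
        state, reward, done = env.step(state, action)  # hypothetical interface
        total += discount * reward
        discount *= gamma
        if done:
            break
    return state, total, discount  # discount = gamma**k after a k-step macro
```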
Hidden Markov models (HMMs) are widely applied to the analysis of time-dependent data sequences in areas such as nonlinear signal processing, natural language processing, and bioinformatics. Training data for HMMs come in two possible formats: a large set of time-dependent sequential data, or a single infinitely long sequence. The learning process is one of the main concerns in machine learning. For a large set of...
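The learning procedures themselves are not shown in the snippet; both data formats, however, rest on the same forward recursion used to evaluate sequence likelihood (and inside Baum-Welch training). A minimal NumPy sketch with illustrative parameter names:

```python
import numpy as np

def forward_likelihood(A, B, pi, obs):
    """Forward algorithm: P(obs | HMM) by dynamic programming.

    A   -- (N, N) transition matrix, A[i, j] = P(state j | state i)
    B   -- (N, M) emission matrix,  B[i, k] = P(symbol k | state i)
    pi  -- (N,)   initial state distribution
    obs -- sequence of observation symbol indices
    """
    alpha = pi * B[:, obs[0]]          # initialize with the first emission
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate, then re-weight by emission
    return alpha.sum()
```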
In the HRL field there are several main methods, such as HAMs, options, and MAXQ. These methods all rely on the theory of SMDPs. However, SMDPs do not specify how the overall task can be decomposed into a collection of subtasks. This paper introduces the concept of "policy-coupled" SMDPs into HAMs. It defines the concept of HAM-decomposability and establishes the relations among the HAM machine, HAM-decomposable,...
A basic problem for intelligent systems is choosing adaptive actions to perform in a non-stationary environment. Due to the combinatorial complexity of actions, an agent cannot possibly consider every option available to it at every instant in time. It needs to find good policies that dictate the optimal action to perform in each situation. This paper proposes an algorithm, called UQ-learning, to better solve...
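UQ-learning's specifics are beyond this snippet; it is described against the background of standard tabular Q-learning, whose one-step update is sketched below (illustrative, not the paper's algorithm):

```python
from collections import defaultdict

Q = defaultdict(float)  # (state, action) -> estimated value

def q_update(s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step: move Q(s, a) toward the bootstrapped
    target r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```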
Previous work describing the generalization ability of learning algorithms is based on independent and identically distributed (i.i.d.) samples. In this paper we go beyond this classical framework by studying the learning performance of the empirical risk minimization (ERM) algorithm with Markov chain samples. We obtain a bound on the rate of uniform convergence of the ERM algorithm with...
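In the standard notation the snippet relies on, the ERM hypothesis and the uniform-convergence quantity being bounded are:

```latex
\[
  f_{\mathrm{ERM}} \;=\; \arg\min_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(f(x_i), y_i\bigr),
  \qquad
  \sup_{f \in \mathcal{F}} \bigl| R(f) - R_n(f) \bigr|,
\]
```

where R(f) is the expected risk, R_n(f) the empirical risk, and the samples (x_i, y_i) are drawn along a Markov chain rather than i.i.d.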
In this paper, nondeterministic indirect reinforcement learning (RL) techniques for controlling the transmission times and power of wireless network nodes are presented. Indirect RL facilitates planning as well as learning, which ultimately leads to convergence to optimal actions in fewer episodes or time steps than direct RL. Three Dyna-architecture-based algorithms for nondeterministic environments...
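The paper's three algorithms are not spelled out in the snippet; they build on the Dyna architecture, whose core loop interleaves direct updates with planning from a learned model. A minimal Dyna-Q sketch (assuming Q is a defaultdict(float); storing sampled outcomes is one simple way to accommodate nondeterminism):

```python
import random

def dyna_q_step(Q, model, s, a, r, s_next, actions,
                n_plan=10, alpha=0.1, gamma=0.95):
    """One Dyna-Q iteration: direct RL update, model update, then
    n_plan simulated (planning) updates drawn from the model."""
    def backup(s1, a1, r1, s2):
        best = max(Q[(s2, b)] for b in actions)
        Q[(s1, a1)] += alpha * (r1 + gamma * best - Q[(s1, a1)])

    backup(s, a, r, s_next)                            # learn from the real step
    model.setdefault((s, a), []).append((r, s_next))   # sampled outcomes: nondeterminism
    for _ in range(n_plan):                            # planning from remembered experience
        (ps, pa), outcomes = random.choice(list(model.items()))
        pr, ps2 = random.choice(outcomes)
        backup(ps, pa, pr, ps2)
```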
We address the problem of label assignment in computer vision: given a novel 3D or 2D scene, we wish to assign a unique label to every site (voxel, pixel, superpixel, etc.). To this end, the Markov Random Field framework has proven to be a model of choice as it uses contextual information to yield improved classification results over locally independent classifiers. In this work we adapt a functional...
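The labeling objective behind this framework is conventionally written as an energy over unary and pairwise terms (standard notation, not specific to this paper):

```latex
\[
  E(\mathbf{x}) \;=\; \sum_{i \in \mathcal{V}} \theta_i(x_i)
                \;+\; \sum_{(i,j) \in \mathcal{E}} \theta_{ij}(x_i, x_j),
\]
```

where the unary terms theta_i come from local classifiers, the pairwise terms theta_ij encode contextual agreement between neighboring sites, and the assigned labeling minimizes E.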
Markov random field (MRF, CRF) models are popular in computer vision. However, in order to remain computationally tractable they are limited to incorporating only local interactions, and cannot model global properties such as connectedness, a potentially useful high-level prior for object segmentation. In this work, we overcome this limitation by deriving a potential function that enforces the...
This paper presents a modified R-learning method based on the traditional average-reward reinforcement learning algorithm. Reinforcement learning problems constitute an important class of learning and control problems faced by artificial intelligence systems. The general framework of reinforcement learning can be divided into two forms: discounted-reward reinforcement learning and average-reward reinforcement...
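For reference, the classical R-learning updates (Schwartz, 1993) that such a modification would start from maintain an action-value table Q and an average-reward estimate rho:

```latex
\begin{align*}
  Q(s,a) &\leftarrow Q(s,a) + \alpha\bigl[\, r - \rho + \max_{a'} Q(s',a') - Q(s,a) \,\bigr],\\
  \rho   &\leftarrow \rho + \beta\bigl[\, r + \max_{a'} Q(s',a') - \max_{a} Q(s,a) - \rho \,\bigr],
\end{align*}
```

with the rho update applied only on steps where a greedy action was taken.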
In this paper, we propose a novel framework to extract temporally extended concepts in a grid-world environment using a predictive data structure called a temporal-difference network. First, a reinforcement learning agent tries to learn its environment on the task of wall following. After that, we train a newly introduced temporal-difference network (TDN) in the agent's brain in order to gain a predictive...
In this paper we propose a novel strategy for converging the dynamic policies generated by adaptive agents, which receive and accumulate rewards for their actions. The goal of the proposed strategy is to speed up such agents' convergence to a good policy in dynamic environments. Since it is difficult to maintain a good value estimate for a state amid continuous changes in the environment, previous policies...
Markov decision processes are one of the most popular frameworks for reinforcement learning. The entropy of probability density functions of Markov decision processes is referred to as the stochastic complexity. The stochastic complexity is helpful for tuning the parameters of an action-selection strategy to alleviate the exploration-exploitation dilemma. In this paper, we improve an action-selection...
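A common entropy-controlled action-selection rule, in the spirit of what the snippet describes, is Boltzmann (softmax) exploration: the temperature parameter directly tunes the entropy of the action distribution. An illustrative sketch (not necessarily the paper's strategy):

```python
import numpy as np

def boltzmann_policy(q_values, temperature=1.0):
    """Softmax action selection: higher temperature -> higher-entropy
    (more exploratory) action distribution."""
    prefs = np.asarray(q_values, dtype=float) / temperature
    prefs -= prefs.max()                  # for numerical stability
    probs = np.exp(prefs)
    return probs / probs.sum()

def policy_entropy(probs):
    """Shannon entropy H(pi) = -sum_a pi(a) * log pi(a)."""
    probs = np.asarray(probs)
    return -np.sum(probs * np.log(probs + 1e-12))
```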
Gaussian Markov random fields are applied in many statistical inference problems. Probabilistic models for such inference are constructed within the framework of Bayesian statistics and have network structures. In the present paper, we analyze the statistical performance of inference in Gaussian Markov random fields on complex networks, including scale-free networks. We discuss...
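For orientation, a zero-mean Gaussian Markov random field has the standard density

```latex
\[
  p(\mathbf{x}) \;\propto\; \exp\!\Bigl(-\tfrac{1}{2}\, \mathbf{x}^{\top} Q\, \mathbf{x}\Bigr),
  \qquad
  Q_{ij} \neq 0 \;\iff\; (i,j) \in \mathcal{E} \ \text{or}\ i = j,
\]
```

where the sparsity pattern of the precision matrix Q matches the network's edge set, so the graph topology (e.g., scale-free) directly shapes the inference.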
Stochastic decision processes in reinforcement learning are usually formulated as Markov decision processes, which are stationary and ergodic. In practice, however, some stochastic decision processes are not necessarily Markov, stationary, and/or ergodic. In this paper, using an information-theoretic property, we show a class of stochastic decision processes in reinforcement learning in which return...
This research integrates rigorous methods from reinforcement learning (RL) and control engineering with a behavioral (ethological) approach to agent technology. The main outcome is a hybrid architecture for intelligent autonomous agents targeted at Artificial-Life-like environments. The architecture adopts several concepts from biology and shows that they can provide robust solutions in some areas....
We describe a reinforcement-learning-based scheme to estimate the stationary distribution of subsets of states of large Markov chains. 'Split sampling' ensures that the algorithm only needs to encode the state transitions and does not need to know any other property of the Markov chain. (An earlier scheme required knowledge of the column sums of the transition probability matrix.) This algorithm...
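The split-sampling scheme itself is not given in the snippet. As a baseline that likewise needs only observed transitions, stationary probabilities of a state subset can be estimated from empirical visit frequencies along one long simulated trajectory; a hedged sketch with a hypothetical step(s) sampler:

```python
from collections import Counter

def empirical_stationary(step, s0, subset, n_steps=100_000):
    """Estimate pi(s) for s in `subset` from visit frequencies along a
    single trajectory; `step(s)` samples the next state (model-free)."""
    counts, s = Counter(), s0
    for _ in range(n_steps):
        s = step(s)
        if s in subset:
            counts[s] += 1
    return {s: counts[s] / n_steps for s in subset}
```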
Reinforcement learning is applied to the automatic parking problem for a four-wheeled automobile. The automobile, controlled by reinforcement learning, learns the appropriate steering angle with respect to the outer environment using distance-measuring sensors. The Rational Policy Making (RPM) method is introduced in order to cope with random start positions. The present method has the advantage of easy implementation...