In this work, we present a network architecture to solve a supervised learning problem, the classification of a handwritten dataset, and a reinforcement learning problem, a complex first-person shooter 3D game environment. We used a deep neural network model to solve both problems. For classification, we used softmax regression and a cross-entropy loss to train the network. To play the game, we used...
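As background for the classification setup the abstract above describes, here is a minimal NumPy sketch of softmax outputs combined with a cross-entropy loss; the logits and labels are illustrative values, not the paper's data or code:

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability before exponentiating.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    # Mean negative log-likelihood of the true class.
    probs = softmax(logits)
    n = logits.shape[0]
    return -np.log(probs[np.arange(n), labels]).mean()

# Two example samples with three classes each (made-up numbers).
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([0, 1])
loss = cross_entropy(logits, labels)
```

Minimizing this loss by gradient descent is the standard training recipe for the classification head of such a network.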
Urban traffic flow is dynamic and uncertain. In this paper, we combine deep learning and reinforcement learning to design an intersection signal controller based on Q-learning and a convolutional neural network. We redefine the state space and the reward function. The training and simulation of the controller are carried out in the traffic micro-simulator SUMO. Compared with timing control, the...
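The abstract above pairs Q-learning with a convolutional network over a redefined state space. As an illustration of the underlying update rule only, here is a tabular Q-learning sketch; the discretized state (queue-length bins), the phase actions, and the waiting-time reward are hypothetical placeholders, not the paper's definitions:

```python
import numpy as np

# Hypothetical discretization: states index queue-length bins at the
# intersection, actions are the signal phases to activate next.
N_STATES, N_ACTIONS = 16, 2
ALPHA, GAMMA = 0.1, 0.95   # learning rate, discount factor

Q = np.zeros((N_STATES, N_ACTIONS))

def q_update(s, a, r, s_next):
    # Standard Q-learning target: r + gamma * max_a' Q(s', a').
    td_target = r + GAMMA * Q[s_next].max()
    Q[s, a] += ALPHA * (td_target - Q[s, a])

# Example transition; the reward here is a stand-in such as the negative
# change in total waiting time (the paper redefines its own reward).
q_update(s=3, a=1, r=-2.0, s_next=5)
```

In the deep variant, the table `Q` is replaced by a convolutional network that maps the raw state representation to action values.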
With the development of modern training, advanced requirements for the training process, such as creativity, economy, realism, and safety, have been proposed. It is hard for traditional training methods to satisfy the training requirements of new technology and equipment against the background of modern industry. Traditional training methods are facing severe challenges. In order to solve the complex technical training...
In this paper, a reinforcement learning approach is proposed to detect unexpected faults, where the noise-to-signal ratio of the data series is minimized to achieve robustness. The model parameter is taken as a special action of the reinforcement learning, and policy evaluation and policy improvement are used to find the parameters that make the estimated model consistent with the real-time...
Recently, a state-of-the-art algorithm called deep deterministic policy gradient (DDPG) has achieved good performance in many continuous control tasks in the MuJoCo simulator. To further improve the efficiency of the experience replay mechanism in DDPG, and thus speed up the training process, in this paper a prioritized experience replay method is proposed for the DDPG algorithm, where prioritized...
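Prioritized experience replay replaces the uniform sampling of DDPG's replay buffer with sampling proportional to each transition's TD error, so that surprising transitions are replayed more often. A minimal proportional-priority buffer sketch (a simplification for illustration, not the paper's implementation):

```python
import random

class PrioritizedReplay:
    """Minimal proportional prioritized replay buffer (illustrative sketch)."""

    def __init__(self, capacity, alpha=0.6):
        # alpha controls how strongly priorities skew the sampling.
        self.capacity, self.alpha = capacity, alpha
        self.buffer, self.priorities = [], []

    def add(self, transition, td_error):
        # Priority from TD error; the small constant keeps it nonzero.
        p = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(p)

    def sample(self, batch_size):
        # Sample transitions with probability proportional to priority.
        total = sum(self.priorities)
        weights = [p / total for p in self.priorities]
        idx = random.choices(range(len(self.buffer)), weights=weights,
                             k=batch_size)
        return [self.buffer[i] for i in idx]
```

Production implementations typically use a sum-tree instead of a list so that sampling and priority updates stay logarithmic in buffer size.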
In this paper we report on our study of the performance of Deep Reinforcement Learning (DRL) agents in performing tasks that are illustrative for human Sensor Operators (SOs) in Remotely Piloted Aircraft Systems (RPASs). Our hypothesis is that the descriptive and predictive qualities of the agent's learning process potentially allow us to identify human task requirements, training needs, selection...
Reinforcement learning is one of the best methods to train autonomous robots. Using this method, a robot can learn to make optimal decisions without detailed programming and hard-coded instructions, which makes it useful for learning complex robotic behaviors. For example, in RoboCup competitions this method can be very useful for learning different behaviors. We propose a method for training...
We formulate tracking as an online decision-making process, where a tracking agent must follow an object despite ambiguous image frames and a limited computational budget. Crucially, the agent must decide where to look in the upcoming frames, when to reinitialize because it believes the target has been lost, and when to update its appearance model for the tracked object. Such decisions are typically...
Training of artificial neural networks (ANNs) using reinforcement learning (RL) techniques is being widely discussed in the robot learning literature. The high model complexity of ANNs along with the model-free nature of RL algorithms provides a desirable combination for many robotics applications. There is a huge need for algorithms that generalize using raw sensory inputs, such as vision, without...
This paper investigates the potential of combining deep learning and neuroevolution to create a bot for a simple first person shooter (FPS) game capable of aiming and shooting based on high-dimensional raw pixel input. The deep learning component is responsible for visual recognition and translating raw pixels to compact feature representations, while the evolving network takes those features as inputs...
In this paper, we study the model of human trust where an operator controls a robotic swarm remotely for a search mission. Existing trust models in human-in-the-loop systems are based on task performance of robots. However, we find that humans tend to make their decisions based on physical characteristics of the swarm rather than its performance since task performance of swarms is not clearly perceivable...
Robot navigation is a central problem in extraterrestrial environments and a suitable navigation algorithm that allows the robot to quickly but precisely avoid initially unknown obstacles is important for efficient navigation. In this paper, we consider a well-known machine learning-based framework called reinforcement learning for robot navigation and investigate a technique for adaptively adjusting...
Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing it in natural language. Recent advances in deep neural networks have substantially improved the performance of this task. Most state-of-the-art approaches follow an encoder-decoder framework, which generates captions using a sequential recurrent prediction model. However,...
Recently it has been shown that policy-gradient methods for reinforcement learning can be utilized to train deep end-to-end systems directly on non-differentiable metrics for the task at hand. In this paper we consider the problem of optimizing image captioning systems using reinforcement learning, and show that by carefully optimizing our systems using the test metrics of the MSCOCO task, significant...
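The key point of such policy-gradient training is that the evaluation metric (e.g. CIDEr on MSCOCO) only needs to score sampled captions, not be differentiated. Here is a minimal REINFORCE-style gradient estimate for one decoding step, with a baseline for variance reduction; the numbers and the single-step setting are illustrative, not the authors' system:

```python
import numpy as np

def reinforce_grad(logits, sampled, reward, baseline):
    """Gradient of the REINFORCE loss w.r.t. the logits for one step.

    The reward can come from any non-differentiable metric, since only
    sampling and scoring are required, never the metric's gradient.
    """
    z = logits - logits.max()
    probs = np.exp(z) / np.exp(z).sum()
    one_hot = np.zeros_like(probs)
    one_hot[sampled] = 1.0
    # Gradient of -log p(sampled) w.r.t. logits is (probs - one_hot);
    # scaling by the advantage (reward - baseline) gives the estimator.
    return (reward - baseline) * (probs - one_hot)
```

Using the score of a greedily decoded caption as the baseline gives the self-critical variant that is common in this line of work.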
This paper proposes a novel tracker which is controlled by sequentially pursuing actions learned by deep reinforcement learning. In contrast to the existing trackers using deep networks, the proposed tracker is designed to achieve a light computation as well as satisfactory tracking accuracy in both location and scale. The deep network to control actions is pre-trained using various training sequences...
The increasing popularity of deep convolutional neural networks in many different areas has led to their increased use in reinforcement learning. Training a huge deep neural network structure by simple gradient descent can take quite a long time, so additional learning techniques should be used to address this problem. One of these techniques is the use of momentum, which...
In this work, a novel approach to transfer learning for use in deep reinforcement learning is introduced. The agent is realized as an actor-critic framework, namely the Deep Deterministic Policy Gradient algorithm. The Q-function and the policy are represented as deep feed-forward networks that are trained by minimizing the mean squared Bellman error and by maximizing the expected reward, respectively...
The aim of this paper is to solve the control problem of trajectory tracking for Autonomous Underwater Vehicles (AUVs) by using and improving deep reinforcement learning (DRL). The DRL-based underwater motion control system is composed of two neural networks: one network selects the action and the other evaluates whether the selected action is accurate, and they modify themselves...
In many real-world scenarios, rewards extrinsic to the agent are extremely sparse, or absent altogether. In such cases, curiosity can serve as an intrinsic reward signal to enable the agent to explore its environment and learn skills that might be useful later in its life. We formulate curiosity as the error in an agent's ability to predict the consequence of its own actions in a visual feature space...
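A curiosity signal of this kind can be computed as the forward model's prediction error in the learned visual feature space: the agent is rewarded where its own dynamics model is surprised. A minimal sketch with made-up feature vectors standing in for the learned features:

```python
import numpy as np

def intrinsic_reward(phi_next_pred, phi_next, eta=0.5):
    # Curiosity bonus: scaled squared error between the forward model's
    # predicted next-state features and the observed next-state features.
    return eta * np.sum((phi_next_pred - phi_next) ** 2)

# Hypothetical 3-dimensional feature embeddings of the next observation.
phi_true = np.array([0.2, -0.1, 0.4])
phi_pred = np.array([0.0,  0.0, 0.5])
r_int = intrinsic_reward(phi_pred, phi_true)
```

This intrinsic term is added to (or substituted for) the sparse extrinsic reward, so exploration continues even when the environment itself gives no feedback.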
We address the problem of autonomous race car driving. Using a recent rally game (WRC6) with realistic physics and graphics, we train an Asynchronous Advantage Actor-Critic (A3C) agent in an end-to-end fashion and propose an improved reward function to learn faster. The network is trained simultaneously on three very different tracks (snow, mountain, and coast) with various road structures, graphics and physics. Despite...