Affordance learning, in general, aims to identify the purpose, use, and ways to interact with an object, based on information gained from observing the object. Most existing affordance learning approaches assume the target object has been cropped individually from images. However, the object often cannot be easily separated from others due to occlusion or noise. Actually, two or more neighboring...
Classifiers trained on given databases perform poorly when tested on data acquired in different settings. This is explained in domain adaptation through a shift among distributions of the source and target domains. Attempts to align them have traditionally resulted in works reducing the domain shift by introducing appropriate loss terms, measuring the discrepancies between source and target distributions,...
The success of deep learning in vision can be attributed to: (a) models with high capacity; (b) increased computational power; and (c) availability of large-scale labeled data. Since 2012, there have been significant advances in representation capabilities of the models and computational capabilities of GPUs. But the size of the biggest dataset has surprisingly remained constant. What will happen...
In this paper, a system to aid the visually impaired by providing contextual information about the surroundings using a 360° view camera combined with deep learning is proposed. The system uses a 360° view camera with a mobile device to capture surrounding scene information and provide contextual information to the user in the form of audio. The scene information from the spherical camera feed is classified...
In this paper we report on our study of the performance of Deep Reinforcement Learning (DRL) agents in performing tasks that are illustrative for human Sensor Operators (SOs) in Remotely Piloted Aircraft Systems (RPASs). Our hypothesis is that the descriptive and predictive qualities of the agent's learning process potentially allow us to identify human task requirements, training needs, selection...
The contribution of this paper is to bridge the gap on understanding the mathematical structure and the computational implementation of a convolutional neural network using a minimal model. The proposed minimal convolutional neural network is presented using a layering approach. This approach provides a clear understanding of the main mathematical operations in a convolutional neural network. Hence,...
Deep learning has brought a series of breakthroughs in image processing. Specifically, there are significant improvements in the application of food image classification using deep learning techniques. However, very little work has addressed the classification of food ingredients. Therefore, this paper proposes a new framework, called DeepFood, which not only extracts rich and effective features...
We introduce the first goal-driven training for visual question answering and dialog agents. Specifically, we pose a cooperative ‘image guessing’ game between two agents – Q-BOT and A-BOT – who communicate in natural language dialog so that Q-BOT can select an unseen image from a lineup of images. We use deep reinforcement learning (RL) to learn the policies of these agents end-to-end – from pixels...
The contribution of this paper is to bridge the gap on understanding the mathematical structure and the computational implementation of a convolutional neural network (CNN) using a minimal model (Minimal CNN). The proposed minimal CNN is presented using a layering approach. This approach provides a concise and accessible understanding of the main mathematical operations of a CNN. Hence, it benefits...
Textual-visual matching aims at measuring similarities between sentence descriptions and images. Most existing methods tackle this problem without effectively utilizing identity-level annotations. In this paper, we propose an identity-aware two-stage framework for the textual-visual matching problem. Our stage-1 CNN-LSTM network learns to embed cross-modal features with a novel Cross-Modal Cross-Entropy...
Although shadows in images have a constructive role providing a natural view of features of the scene, they also have a destructive role in image processing by hiding significant information. Improving the quality of 3D textured models for serious games and augmented reality applications via shadow detection and removal remains challenging due to the complexity of an image scene. This paper proposes...
The development of a deep (stacked) convolutional auto-encoder in the Caffe deep learning framework is presented in this paper. We describe simple principles which we used to create this model in Caffe. The proposed model of convolutional auto-encoder does not have pooling/unpooling layers yet. The results of our experimental research show comparable accuracy of dimensionality reduction in comparison...
In this article, we develop two visual impression models, a recognition model and a generalization model, to simulate the cognition process of human visual systems. We show how the visual impression learned with a deep neural network can be efficiently transferred to other visual recognition tasks. By reusing the hidden layers trained in an unsupervised way, we show that we can largely reduce the number...
In this paper, we present a novel approach for training a topological deep neural network with visual impression. We show that by combining a denoising auto-encoder model and a contractive auto-encoder with Hessian regularization model, we can achieve a deterministic auto-encoder aiming for robustness to small variations of the input. We exploit the tangent propagation algorithm to show how our algorithm...
Modeling the activity of an ensemble of neurons can provide critical insights into the workings of the brain. In this work we examine whether learning-based signal modeling can contribute to high-quality modeling of neuronal signal data. To that end, we employ sparse coding and dictionary learning schemes for capturing the behavior of neuronal responses into a small number of representative prototypical...
This paper investigates the potential of combining deep learning and neuroevolution to create a bot for a simple first person shooter (FPS) game capable of aiming and shooting based on high-dimensional raw pixel input. The deep learning component is responsible for visual recognition and translating raw pixels to compact feature representations, while the evolving network takes those features as inputs...
Large-scale datasets have driven the rapid development of deep neural networks for visual recognition. However, annotating a massive dataset is expensive and time-consuming. Web images and their labels are, in comparison, much easier to obtain, but direct training on such automatically harvested images can lead to unsatisfactory performance, because the noisy labels of Web images adversely affect the...
Zero-shot learning for visual recognition has received much interest in recent years. However, the semantic gap between visual features and their underlying semantics is still the biggest obstacle in zero-shot learning. To overcome this hurdle, we propose an effective Low-rank Embedded Semantic Dictionary learning (LESD) through an ensemble strategy. Specifically, we formulate a novel framework...
We present a principled approach to uncover the structure of visual data by solving a novel deep learning task coined visual permutation learning. The goal of this task is to find the permutation that recovers the structure of data from shuffled versions of it. In the case of natural images, this task boils down to recovering the original image from patches shuffled by an unknown permutation matrix...
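The task setup described in the abstract above can be illustrated with a minimal sketch. It only shows how patches are shuffled by a permutation matrix and how the transpose of that matrix restores the original order. It does not show the paper's learning method, which has to predict the permutation when it is unknown. The function names, the toy 4×4 "image", and the 2×2 patch grid are all assumptions made for illustration.

```python
import random


def split_into_patches(image, grid):
    """Split a square 2D list into grid x grid patches, in row-major order."""
    size = len(image) // grid
    patches = []
    for pr in range(grid):
        for pc in range(grid):
            rows = image[pr * size:(pr + 1) * size]
            patches.append([row[pc * size:(pc + 1) * size] for row in rows])
    return patches


def permutation_matrix(perm):
    """Build P where row i has a single 1 in column perm[i]."""
    n = len(perm)
    return [[1 if perm[i] == j else 0 for j in range(n)] for i in range(n)]


def apply_permutation(matrix, items):
    """Reorder items: output position i takes the item selected by row i."""
    return [items[row.index(1)] for row in matrix]


def transpose(matrix):
    """Transpose a square matrix; for a permutation matrix this is its inverse."""
    return [list(col) for col in zip(*matrix)]


# Toy 4x4 "image" whose pixel values encode their position (hypothetical data).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(image, grid=2)

# Shuffle the patches with a seeded random permutation.
rng = random.Random(0)
perm = list(range(len(patches)))
rng.shuffle(perm)
P = permutation_matrix(perm)
shuffled = apply_permutation(P, patches)

# With P known, its transpose undoes the shuffle.
recovered = apply_permutation(transpose(P), shuffled)
```

In the learning task the matrix `P` is not given; a model would have to estimate it from the shuffled patches alone, which is the part this sketch leaves out.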
Deep neural networks require a large amount of labeled training data during supervised learning. However, collecting and labeling so much data might be infeasible in many cases. In this paper, we introduce a deep transfer learning scheme, called selective joint fine-tuning, for improving the performance of deep learning tasks with insufficient training data. In this scheme, a target learning task...