While strong progress has been made in image captioning recently, machine and human captions are still quite distinct. This is primarily due to the deficiencies in the generated word distribution, vocabulary size, and strong bias in the generators towards frequent captions. Furthermore, humans – rightfully so – generate multiple, diverse captions, due to the inherent ambiguity in the captioning task...
The success of deep learning in vision can be attributed to: (a) models with high capacity; (b) increased computational power; and (c) availability of large-scale labeled data. Since 2012, there have been significant advances in representation capabilities of the models and computational capabilities of GPUs. But the size of the biggest dataset has surprisingly remained constant. What will happen...
For large-scale visual search, highly compressed yet meaningful representations of images are essential. Structured vector quantizers based on product quantization and its variants are usually employed to achieve such compression while minimizing the loss of accuracy. Yet, unlike binary hashing schemes, these unsupervised methods have not yet benefited from the supervision, end-to-end learning and...
We propose ‘Hide-and-Seek’, a weakly-supervised framework that aims to improve object localization in images and action localization in videos. Most existing weakly-supervised methods localize only the most discriminative parts of an object rather than all relevant parts, which leads to suboptimal performance. Our key idea is to hide patches in a training image randomly, forcing the network to seek...
In this paper, we address the problem of spatio-temporal person retrieval from videos using a natural language query, in which we output a tube (i.e., a sequence of bounding boxes) which encloses the person described by the query. For this problem, we introduce a novel dataset consisting of videos containing people annotated with bounding boxes for each second and with five natural language descriptions...
Understanding the visual relationship between two objects involves identifying the subject, the object, and a predicate relating them. We leverage the strong correlations between the predicate and the ⟨subj, obj⟩ pair (both semantically and spatially) to predict predicates conditioned on the subjects and the objects. Modeling the three entities jointly more accurately reflects their relationships...
This paper presents a new method for the reconstruction of images from samples located at non-integer mesh positions. This is a common scenario for many image processing applications such as multi-image super-resolution, frame-rate up-conversion, or virtual view synthesis in multi-camera systems. The proposed method consists of an iterative procedure that employs adaptive denoising in order to reduce...
In this paper, we present a tracking system for training that estimates the position of a surgical instrument used in minimally invasive spine surgery. The purpose of our system is to capture information about surgeons' movements and skills during training. The system uses four infrared markers embedded on a commonly used surgical instrument. At least two Wii Remote controllers are needed for calculating...
With the benefit of the convolution operation, convolutional neural networks (CNNs) have been successfully applied to classification, regression, and time series modeling. For neural modeling of dynamic systems, CNNs should also have many advantages over other neural models, such as avoiding local minima and the effects of noise and outliers. In this paper, dynamic system identification is addressed...
Artificial intelligence is widely used in image processing. Neural networks (NNs) have been successfully used to solve complicated problems thanks to their capacity for generalization and learning from examples. In this paper, some aspects of image compression using artificial neural networks are discussed. The network is used in the feedback loop of the visual servoing system, which aims to control a wheeled...
When looking at an image, humans shift their attention towards interesting regions, making sequences of eye fixations. When describing an image, they also come up with simple sentences that highlight the key elements in the scene. What is the correlation between where people look and what they describe in an image? To investigate this problem, we look into eye fixations and image captions, two types...
Online image search has become increasingly essential. In this paper, an extension of an existing system for image re-ranking is explained. The existing system is divided into offline and online parts. In the offline part, various semantic spaces are automatically learned for different query keywords. Semantic signatures of the images are generated by mapping the image features, i.e. visual features, into its...
The paper describes a human-interactive robot that supports gait training based on autonomous evaluation and navigation of human body movements. Robotic intervention in gait training is a promising method for prospective rehabilitation. In the literature, gait training platforms such as power-assisting limbs and body-supporting mobile platforms have been well studied. These types of platforms, however, mainly...
In this paper, a system to aid the visually impaired by providing contextual information about the surroundings using a 360° view camera combined with deep learning is proposed. The system uses a 360° view camera with a mobile device to capture surrounding scene information and provide contextual information to the user in the form of audio. The scene information from the spherical camera feed is classified...
The present work proposes a neurofeedback training system for the induction of an attention state aided by audiovisual stimuli on an experimental group of nine junior high school individuals between twelve and fifteen years old. A control group of 10 individuals with the same characteristics as the experimental group is defined as well to validate the training's efficiency. The auditory stimulation...
In this paper we report on our study of the performance of Deep Reinforcement Learning (DRL) agents in performing tasks that are illustrative for human Sensor Operators (SOs) in Remotely Piloted Aircraft Systems (RPASs). Our hypothesis is that the descriptive and predictive qualities of the agent's learning process potentially allow us to identify human task requirements, training needs, selection...
The contribution of this paper is to bridge the gap on understanding the mathematical structure and the computational implementation of a convolutional neural network using a minimal model. The proposed minimal convolutional neural network is presented using a layering approach. This approach provides a clear understanding of the main mathematical operations in a convolutional neural network. Hence,...
Task outcome feedback is often used to sustain human work motivation and ensure overall productivity. Feedback can also be useful for reducing learning periods for new tasks. Two types of feedback, including knowledge of results (KR) and knowledge of performance (KP), are commonly delivered as part of motor training by using various sensory modalities (e.g., visual, auditory or kinesthetic). Unfortunately,...
We report an experimental study that involves understanding how display (conventional or ecological) and system mode (profiting, neutral or losing) affect financial trading performance and risk preference. Twenty-four undergraduate and graduate student participants interacted with a financial trading simulator in the playback of a real market. Each participant completed a conventional display scenario...
The high-level feature representation of deep convolutional neural networks (ConvNets) has proven to be superior to hand-crafted low-level features. Thus, this study investigates the effect of fusing such high-level features from multi-deep ConvNets in an application to visual object/scene categorization. Three pre-trained ConvNets are exploited as feature extractors, a single hidden...