Language models based on recurrent neural networks have dominated recent image caption generation tasks. In this paper, we introduce a language CNN model which is suitable for statistical language modeling tasks and shows competitive performance in image captioning. In contrast to previous models, which predict the next word based on one previous word and a hidden state, our language CNN is fed with all...
This paper proposes a novel hybrid model that integrates the synergy of two superior classifiers for functional magnetic resonance imaging (fMRI) recognition, namely, convolutional neural networks (CNNs) and support vector machines (SVMs), both of which have proven results in the field of image recognition. In the proposed model, the CNN functions as a trainable feature extractor and the SVM functions...
Human sketches are unique in being able to capture both the spatial topology of a visual object and its subtle appearance details. Fine-grained sketch-based image retrieval (FG-SBIR) leverages such fine-grained characteristics of sketches to conduct instance-level retrieval of photos. Nevertheless, human sketches are often highly abstract and iconic, resulting in severe misalignments...
The majority of existing solutions to the Multi-Target Tracking (MTT) problem do not combine cues over a long period of time in a coherent fashion. In this paper, we present an online method that encodes long-term temporal dependencies across multiple cues. One key challenge of tracking methods is to accurately track occluded targets or those which share similar appearance properties with surrounding...
Recently, CNN-based models have achieved remarkable success in image-based salient object detection (SOD). In these models, a key issue is to find a proper network architecture that best fits for the task of SOD. Toward this end, this paper proposes two-stream fixation-semantic CNNs, whose architecture is inspired by the fact that salient objects in complex images can be unambiguously annotated by...
Person Re-identification (re-id) aims to match people across non-overlapping camera views in a public space. It is a challenging problem because many people captured in surveillance videos wear similar clothes. Consequently, the differences in their appearance are often subtle and only detectable at the right location and scales. Existing re-id models, particularly the recently proposed deep learning...
Mobile device features such as cameras and other sensors are evolving rapidly. Supported by reliable communications networks, they enable new methods of information retrieval. A mobile device can capture an image with its camera and pass it to a retrieval system to obtain the information needed. This system, called Mobile Content-Based Image Retrieval (MCBIR), generally consists of two parts:...
Machine learning techniques, namely convolutional neural networks (CNNs) and regression forests, have recently shown great promise in performing 6-DoF localization of monocular images. However, in most cases image sequences, rather than only single images, are readily available. To this extent, none of the proposed learning-based approaches exploit the valuable constraint of temporal smoothness, often leading...
This paper proposes a deep learning architecture based on Residual Network that dynamically adjusts the number of executed layers for the regions of the image. This architecture is end-to-end trainable, deterministic and problem-agnostic. It is therefore applicable without any modifications to a wide range of computer vision problems such as image classification, object detection and image segmentation...
Most of the recent successful methods for accurate object detection and localization have used some variant of R-CNN-style two-stage Convolutional Neural Networks (CNNs), in which plausible regions are proposed in the first stage and then followed by a second stage for decision refinement. Despite the simplicity of training and the efficiency in deployment, the single-stage detection methods have not been as competitive...
In this paper we propose a deep learning architecture to make the best use of global and local information for pixel-wise semantic segmentation. The three-skips CNN architecture is built with the convolutional layers of the VGG16 network and its mirrored convolutional layers. Our architecture targets road scene understanding. In order to save memory and computational time, we use unpooling layers to map...
Predicting the interestingness of media content remains an important but challenging research subject. The difficulty comes first from the fact that, besides being a high-level semantic concept, interestingness is highly subjective and no global definition has yet been agreed upon. This paper presents the use of up-to-date deep learning techniques for solving the task. We perform experiments with both...
In this paper, we introduce a joint model that learns to directly localize the temporal bounds of actions in untrimmed videos as well as precisely classify what actions occur. Most existing approaches tend to scan the whole video to generate action instances, which is highly inefficient. Instead, inspired by human perception, our model is formulated based on a recurrent neural network to observe...
Face detection in unconstrained environments is a challenging problem due to partial occlusions with pose variations. Existing partially occluded face detection methods require training several models, computing hand-crafted features, or both. In this paper, our contributions are two-fold. First, we propose Large-Scale Deep Learning (LSDL), a method that requires a single Convolutional Neural Network...
In this article, we propose a new optimized embedded architecture based on soft-core processors, oriented toward visual-attention-based object recognition applications. Our recognition approach relies mainly on two specific modules for online processing of acquired images in real time: a novel saliency-based feature detector/descriptor module and an object classifier module. To deal with such parallel/pipeline...
Connecting different text attributes associated with the same entity (conflation) is important in business data analytics since it could help merge two different tables in a database to provide a more comprehensive profile of an entity. However, the conflation task is challenging because two text strings that describe the same entity could be quite different from each other for reasons such as misspelling...
Most affect based systems analyse facial expressions for emotion detection, and utilize face detection and recognition methods in order to do effective affect analysis. Recent work has demonstrated the efficacy of deep architectures for face recognition by training as classifiers on voluminous datasets. Some architectures are trained as classifiers, and some directly learn an embedding via a triplet...
Deep Learning (DL), especially Convolutional Neural Networks (CNNs), has become the state of the art for a variety of pattern recognition problems. Technological developments have allowed the use of high-end General Purpose Graphic Processor Units (GPGPU) for accelerating numerical problem solving. They not only lower computational time, but also allow much larger networks to be considered. Hence,...
A new type of end-to-end system for text-dependent speaker verification is presented in this paper. Previously, using a phonetically discriminative/speaker-discriminative DNN as a feature extractor for speaker verification has shown promising results. The extracted frame-level (bottleneck, posterior or d-vector) features are equally weighted and aggregated to compute an utterance-level speaker representation...
Convolutional Neural Networks (CNN) are useful methods for identification of previously unknown embedded patterns in images. Several object and facial recognition along with image segmentation tasks have benefited from the non-linear abstraction of hybrid features using CNN. This work presents a novel CNN model parametrization work-flow developed on the cloud-computing platform of Microsoft Azure...