We consider the problem of object figure-ground segmentation when the object categories are not available during training (i.e. zero-shot). During training, we learn standard segmentation models for a handful of object categories (called “source objects”) using existing semantic segmentation datasets. During testing, we are given images of objects (called “target objects”) that are unseen during training...
This paper presents a phonetically-aware joint density Gaussian mixture model (JD-GMM) framework for voice conversion that no longer requires parallel data from the source speaker at the training stage. Considering that the phonetic-level features contain text information which should be preserved in the conversion task, we propose a method that only concatenates phonetic discriminant features and spectral...
This paper addresses the problem of object counting, which is to estimate the number of objects of interest from an input observation. We formalize the problem as a posterior inference of the count by introducing a particular type of Gaussian mixture for the input observation, whose mixture indexes correspond to the count. Unlike existing approaches in image analysis, which typically perform explicit...
Convolutional Neural Networks (CNNs) have been applied successfully in computer vision, speech recognition, and natural language processing. For object recognition, CNNs might be limited by their strict label requirement and an implicit assumption that images are supposed to be target-object-dominated for optimal solutions. However, the labeling procedure, necessitating laying out the...
A+, also known as Adjusted Anchored Neighborhood Regression, is a state-of-the-art method for exemplar-based single image super-resolution with low time complexity at both training and test time. By robustly training a clustered regression model over a low-resolution dictionary, its performance keeps improving with the dictionary size, even when using tens of thousands of regressors. However, this can pose a...
We present a novel approach towards web video classification and recounting that uses video segments to model an event. This approach overcomes the limitations faced by the classical video-level models such as modeling semantics, identifying informative segments in a video and background segment suppression. We posit that segment-based models are able to identify both the frequently-occurring and...
In this paper, we focus on training a classifier from large-scale data with incompletely assigned labels. In other words, we treat samples with the following properties: 1. assigned labels are definitely positive, 2. absent labels are not necessarily negative, and 3. samples are allowed to take more than one label. These properties are frequently found in various kinds of computer vision tasks, including...
CNN-based semantic segmentation methods generally assume that pixel-wise annotation is available, which is costly to obtain. Image-level annotations, on the other hand, are much easier to obtain than pixel-level annotations. In this work, we therefore focus on weakly-supervised semantic segmentation, the task of training with only image-level annotations. In this paper, we...
Appearance-based action recognition can be considered as a natural extension of appearance-based object detection from the spatial to the spatio-temporal domain. Although this step seems natural, most action recognition approaches are evaluated in isolation. To this end, the contribution of this paper is twofold. First, a view-independent approach to action recognition is proposed, and second, the...
Recognition of dominant planes is an important task in areas such as robot navigation, augmented reality, and 3D reconstruction. There are several approaches for recognizing planar structures; however, most of these approaches are based on processing two or more images captured from different camera views or on processing 3D data in the form of point clouds associated with the camera...
Video-based activity and behavior analysis for mice has garnered wide attention in biomedical research. Animal facilities hold large numbers of mice housed in ‘home-cages’ densely stored within ventilated racks. Automated analysis of mouse activity in their home-cages can provide a new set of sensitive measures for detecting abnormalities and time-resolved deviation from baseline behavior. Large scale...
We present an approach for the detection of buildings in multispectral satellite images. Unlike 3-channel RGB images, satellite imagery contains additional channels corresponding to different wavelengths. Approaches that do not use all channels are unable to fully exploit these images for optimal performance. Furthermore, care must be taken due to the large bias in classes, e.g., most of the Earth...
Retrieving a small set of relevant and interesting objects from a large background class is challenging because classifiers can easily be overwhelmed by the large class. Classifiers have been developed that are more sensitive to the small class, and typically they optimize a ranking, or precision at the top. These measures can be costly because they often look at pairwise rankings. The classical approach...
Micro-expression recognition is a challenging task in the computer vision field due to the repressed facial appearance and short duration. Previous work on micro-expression recognition has used hand-crafted features like LBP-TOP, Gabor filters and optical flow. This paper is the first work to explore the possible use of deep learning for the micro-expression recognition task. Due to the lack of data for...
Background subtraction (BS) is one of the key steps for detecting moving objects in video surveillance applications. In the last few years, many BS methods have been developed to handle the different challenges met in video surveillance, but the role and the relevance of the visual features used have been less investigated. In this paper, we present an Online Weighted Ensemble of One-Class SVMs (Support...
A new efficient measure for predicting estimation accuracy is proposed and successfully applied to multistream-based unsupervised adaptation of ASR systems to address data uncertainty when the ground-truth is unknown. The proposed measure is an extension of the M-measure, which predicts confidence in the output of a probability estimator by measuring the divergences of probability estimates spaced...
Developing reliable and robust face verification systems has been a tough challenge in computer vision for several decades. Variation in illumination and head pose may seriously reduce the accuracy of two-dimensional face recognition. With the invention of a depth map sensor, more three-dimensional volume data can be processed to mitigate the problem associated with face verification. This paper...
In this paper, we propose a novel regularized sparse coding approach for template-based unconstrained face verification. Unlike traditional verification tasks, which require evaluation on image-to-image or video-to-video pairs, template-based face verification/recognition methods can exploit training and/or gallery data containing a mixture of both images and videos from the person of interest...
Domain adaptation (DA) algorithms utilize a label-rich old dataset (domain) to build a machine learning model (classification, detection, etc.) in a label-scarce new dataset with a different data distribution. Recent approaches transform cross-domain data into a shared subspace by minimizing the shift between their marginal distributions. In this paper, we propose a novel iterative method to learn a...
In real applications of one-class classification, new features may be added for practical or technical reasons. When representative samples for the new features are lacking, the multi-task learning idea could be used to bring in some information from the former learning model. Based on the above assumption, a new multi-task learning approach is proposed to deal with the training of the updated system...