The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
A novel statistical textural distinctiveness approach for robustly detecting salient regions in natural images is proposed. Rotational-invariant neighborhood-based textural representations are extracted and used to learn a set of representative texture atoms for defining a sparse texture model for the image. Based on the learnt sparse texture model, a weighted graphical model is constructed to characterize...
We study in this paper the problem of learning classifiers from ambiguously labeled images. For instance, in the collection of new images, each image contains some samples of interest (\emph{e.g.,} human faces), and its associated caption has labels with the true ones included, while the sample-label association is unknown. The task is to learn classifiers from these ambiguously labeled images and...
We present a new descriptor for activity recognition from videos acquired by a depth sensor. Previous descriptors mostly compute shape and motion features independently, thus, they often fail to capture the complex joint shape-motion cues at pixel-level. In contrast, we describe the depth sequence using a histogram capturing the distribution of the surface normal orientation in the 4D space of time,...
An approach to learn a structured low-rank representation for image classification is presented. We use a supervised learning method to construct a discriminative and reconstructive dictionary. By introducing an ideal regularization term, we perform low-rank matrix recovery for contaminated training data from all categories simultaneously without losing structural information. A discriminative low-rank...
Active learning provides useful tools to reduce annotation costs without compromising classifier performance. However it traditionally views the supervisor simply as a labeling machine. Recently a new interactive learning paradigm was introduced that allows the supervisor to additionally convey useful domain knowledge using attributes. The learner first conveys its belief about an actively chosen...
We propose a new model for recognizing human attributes (e.g. wearing a suit, sitting, short hair) and actions (e.g. running, riding a horse) in still images. The proposed model relies on a collection of part templates which are learnt discriminatively to explain specific scale-space locations in the images (in human centric coordinates). It avoids the limitations of highly structured models, which...
Recognizing the location of a query image by matching it to a database is an important problem in computer vision, and one for which the representation of the database is a key issue. We explore new ways for exploiting the structure of a database by representing it as a graph, and show how the rich information embedded in a graph can improve a bag-of-words-based location recognition method. In particular,...
Complex real-world signals, such as images, contain discriminative structures that differ in many aspects including scale, invariance, and data channel. While progress in deep learning shows the importance of learning features through multiple layers, it is equally important to learn features through multiple paths. We propose Multipath Hierarchical Matching Pursuit (M-HMP), a novel feature learning...
Domain adaptation addresses the problem where data instances of a source domain have different distributions from that of a target domain, which occurs frequently in many real life scenarios. This work focuses on unsupervised domain adaptation, where labeled data are only available in the source domain. We propose to interpolate subspaces through dictionary learning to link the source and target domains...
This paper presents a method for quasi-rigid objects modeling from a sequence of depth scans captured at different time instances. As quasi-rigid objects, such as human bodies, usually have shape motions during the capture procedure, it is difficult to reconstruct their geometries. We represent the shape motion by a deformation graph, and propose a model-to-part method to gradually integrate sampled...
This paper presents an end-to-end video face recognition system, addressing the difficult problem of identifying a video face track using a large dictionary of still face images of a few hundred people, while rejecting unknown individuals. A straightforward application of the popular l1-minimization for face recognition on a frame-by-frame basis is prohibitively expensive, so we propose a novel algorithm...
In many real-world face recognition scenarios, face images can hardly be aligned accurately due to complex appearance variations or low-quality images. To address this issue, we propose a new approach to extract robust face region descriptors. Specifically, we divide each image (resp. video) into several spatial blocks (resp. spatial-temporal volumes) and then represent each block (resp. volume) by...
Challenge, but also an opportunity to eliminate spurious similarities. Luckily, a major source of confusion in visual similarity of faces is the 3D head orientation, for which image analysis tools provide an accurate estimation. The method we propose belongs to a family of classifier-based similarity scores. We present an effective way to discount pose induced similarities within such a framework,...
Automatic facial action unit (AFA) detection from video is a long-standing problem in facial expression analysis. Most approaches emphasize choices of features and classifiers. They neglect individual differences in target persons. People vary markedly in facial morphology (e.g., heavy versus delicate brows, smooth versus deeply etched wrinkles) and behavior. Individual differences can dramatically...
How many labeled examples are needed to estimate a classifier's performance on a new dataset? We study the case where data is plentiful, but labels are expensive. We show that by making a few reasonable assumptions on the structure of the data, it is possible to estimate performance curves, with confidence bounds, using a small number of ground truth labels. Our approach, which we call Semi supervised...
Users often have very specific visual content in mind that they are searching for. The most natural way to communicate this content to an image search engine is to use key-words that specify various properties or attributes of the content. A naive way of dealing with such multi-attribute queries is the following: train a classifier for each attribute independently, and then combine their scores on...
Metric learning methods, for person re-identification, estimate a scaling for distances in a vector space that is optimized for picking out observations of the same individual. This paper presents a novel approach to the pedestrian re-identification problem that uses metric learning to improve the state-of-the-art performance on standard public datasets. Very high dimensional features are extracted...
Recent trends in semantic image segmentation have pushed for holistic scene understanding models that jointly reason about various tasks such as object detection, scene recognition, shape analysis, contextual reasoning. In this work, we are interested in understanding the roles of these different tasks in aiding semantic segmentation. Towards this goal, we "plug-in" human subjects for each...
In this paper, we propose a weakly supervised method for simultaneously learning scene parts and attributes from a collection of images associated with attributes in text, where the precise localization of the each attribute left unknown. Our method includes three aspects. (i) Compositional scene configuration. We learn the spatial layouts of the scene by Hierarchical Space Tiling (HST) representation,...
In this paper, rather than modeling activities in videos individually, we propose a hierarchical framework that jointly models and recognizes related activities using motion and various context features. This is motivated from the observations that the activities related in space and time rarely occur independently and can serve as the context for each other. Given a video, action segments are automatically...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.