The success of fine-grained visual categorization (FGVC) relies heavily on modeling the appearance and interactions of various semantic parts. This makes FGVC very challenging because: (i) part annotation and detection require expert guidance and are very expensive; (ii) parts are of different sizes; and (iii) the part interactions are complex and of higher order. To address these issues, we...
Most recent CNN architectures use average pooling as a final feature encoding step. In the field of fine-grained recognition, however, recent global representations like bilinear pooling offer improved performance. In this paper, we generalize average and bilinear pooling to “α-pooling”, allowing for learning the pooling strategy during training. In addition, we present a novel way to visualize decisions...
This paper aims to develop an effective flower classification approach based on feature extraction. First, a fused descriptor based on the Pyramid Histogram of Visual Words (PHOW) is used to extract the color, texture and contour information of flower images. Second, Dictionary Learning and Locality-constrained Linear Coding (LLC) are applied to the PHOW features and then images...
Collaborative representation-based classification, a recently proposed approach, was developed for face recognition and has since been used in image classification tasks owing to its simplicity and effectiveness. The major drawback of this method is that it neglects the spatial structure among the image representations. Inspired by the success of this technique and motivated by the power of spatial information...
Action recognition has been one of the most popular fields of computer vision. This paper presents a novel approach to the action recognition problem that uses a dimension reduction method, local Fisher discriminant analysis, to reduce the dimension of feature descriptors as a preprocessing step after feature extraction. We propose to use a sparse matrix and a randomized kd-tree to modify and accelerate the...
As the integration of depth sensing into mobile devices is likely forthcoming, we investigate merging appearance and shape information for mobile visual search. Accordingly, we propose an RGB-D search engine architecture that can attain high recognition rates with notably moderate bandwidth requirements. Our experiments include a comparison to the CDVS (Compact Descriptors for Visual Search) pipeline,...
Local features have played an important role in visual recognition. Methods based on local features, e.g., the bag-of-words (BoW) model and sparse coding, have shown their effectiveness in image and object recognition in the past decades. Recently, many new techniques, including improvements of BoW and sparse coding as well as the non-parametric naive Bayes nearest neighbor (NBNN) classifier,...
Visual matching algorithms can be described in terms of visual content representation and similarity measure. With local feature based representations, visual matching can be restated as: 1) how to obtain visual similarity from the local kernel matrix, and 2) how to calculate the local kernel matrix effectively and efficiently. Existing methods mostly focus on the former, and use Euclidean distance...
The bag of visual words (BoW) model is one of the most successful models in image classification. However, the major problem of the BoW model lies in the determination of visual words, which consists of codebook training and feature encoding phases. The traditional K-means and hard-assignment methods completely ignore the structure of the local feature space, leading to high loss of information...
The bag-of-visual-words model is now a very popular approach to Visual Object Classification (VOC). Two key phases of VOC are the vocabulary building step, i.e. the construction of a ‘visual dictionary’ including common codewords in the image corpus, and the assignment step, i.e. the encoding of the images by means of these codewords. Hard assignment of image descriptors to visual...
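To illustrate the assignment step that the abstract above describes, here is a minimal sketch of hard assignment. It is not the paper's method: the function name `hard_assign_histogram`, the toy codebook, and the toy descriptors are all assumptions made for illustration, and the codebook is taken as given rather than learned.

```python
import numpy as np

def hard_assign_histogram(descriptors, codebook):
    """Encode an image as a normalized histogram of nearest codewords.

    descriptors: (n, d) array of local descriptors (hypothetical data).
    codebook:    (k, d) array of codewords, assumed to be already built.
    """
    # Squared Euclidean distance from every descriptor to every codeword.
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    d2 = np.sum(diffs ** 2, axis=2)
    # Hard assignment: each descriptor votes only for its nearest codeword.
    nearest = np.argmin(d2, axis=1)
    hist = np.bincount(nearest, minlength=codebook.shape[0]).astype(float)
    return hist / hist.sum()

# Toy example with two codewords and four 2-D descriptors.
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[1.0, 0.0], [0.0, 1.0], [9.0, 9.0], [4.9, 4.9]])
hist = hard_assign_histogram(descriptors, codebook)
```

Note that the descriptor at (4.9, 4.9) lies almost midway between the two codewords, yet its whole vote goes to the first one. This is the kind of information loss that motivates softer encodings.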
Event recognition has been an important topic in computer vision research due to its many applications. However, most of the work has focused on videos taken from a fixed camera, known environments and basic events. Here, we focus on classification of unconstrained web videos into much higher-level activities. We follow the approach of constructing fixed length feature vectors from local feature...
This paper addresses the challenging problem of scene classification in street-view georeferenced images of urban environments. More precisely, the goal of this task is semantic image classification, which consists of predicting the presence or absence of a pre-defined class (e.g. shops, vegetation, etc.) in a given image. The approach is based on the BOSSA representation, which enriches the Bag of Words...
In this work, we introduce a hierarchical matching framework with so-called side information for image classification based on bag-of-words representation. Each image is expressed as a bag of orderless pairs, each of which includes a local feature vector encoded over a visual dictionary, and its corresponding side information from priors or contexts. The side information is used for hierarchical clustering...
In this paper, we propose a compact image signature based on VLAT. Our method integrates spatial information while significantly reducing the size of the original VLAT by using two projection steps. We carry out experiments showing that our approach is competitive with state-of-the-art signatures.
In this paper, we address an interesting application of computer vision techniques, namely classification of Indian Classical Dance (ICD). To the best of our knowledge, the problem has not been addressed so far in the computer vision domain. To deal with this problem, we use a sparse-representation-based dictionary learning technique. First, we represent each frame of a dance video by a pose descriptor...
Spatial relationships between local features are thought to play a vital role in representing object categories. However, learning a compact set of higher-order spatial features based on visual words, e.g., doublets and triplets, remains a challenging problem as possible combinations of visual words grow exponentially. While the local pairwise codebook achieves a compact codebook of pairs of spatially...
Spatial Pyramid Match lies at the heart of modern object category recognition systems. Once image descriptors are expressed as histograms of visual words, they are further deployed across a spatial pyramid with coarse-to-fine spatial location grids. However, such a representation results in extremely large histogram vectors of 200K or more elements, increasing computational and memory requirements. This paper investigates...
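The growth in vector size mentioned in the abstract above can be seen in a small sketch of an unweighted spatial pyramid histogram. This is an illustrative assumption, not the paper's pipeline: the function name, the normalized coordinates in [0, 1], the grid sizes of 1x1, 2x2 and 4x4, and the toy data are all hypothetical, and the per-level weighting used by pyramid matching is omitted.

```python
import numpy as np

def spatial_pyramid_histogram(xy, words, vocab_size, levels=3):
    """Concatenate per-cell visual word counts over coarse-to-fine grids.

    xy:    (n, 2) feature positions normalized to [0, 1] (hypothetical data).
    words: (n,) integer visual word index of each feature.
    Level l uses a 2**l x 2**l grid, so the output length is
    vocab_size * sum(4**l for l in range(levels)).
    """
    blocks = []
    for level in range(levels):
        cells = 2 ** level
        # Grid cell of each feature; clamp so that a coordinate of 1.0 stays in range.
        col = np.minimum((xy[:, 0] * cells).astype(int), cells - 1)
        row = np.minimum((xy[:, 1] * cells).astype(int), cells - 1)
        cell_index = row * cells + col
        # One histogram bin per (cell, word) pair.
        flat = cell_index * vocab_size + words
        blocks.append(np.bincount(flat, minlength=cells * cells * vocab_size))
    return np.concatenate(blocks)

xy = np.array([[0.1, 0.1], [0.9, 0.1], [0.6, 0.7], [0.2, 0.8]])
words = np.array([0, 2, 1, 2])
pyramid = spatial_pyramid_histogram(xy, words, vocab_size=3, levels=3)
```

With three levels the vector is 21 times the vocabulary size (1 + 4 + 16 cells), so a vocabulary of several thousand words already yields a vector with far more than 100K elements.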
Visual Word Uncertainty, also referred to as Soft Assignment, is a well-established technique for representing images as histograms by flexible assignment of image descriptors to a visual vocabulary. Recently, the attention of the object category recognition community has been drawn to Linear Coordinate Coding methods. In this work, we focus on Soft Assignment as it yields good results...
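As a rough illustration of the flexible assignment described in the abstract above, the sketch below spreads each descriptor's vote over all codewords using Gaussian kernel weights. The kernel choice, the bandwidth `sigma`, the function name, and the toy data are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def soft_assign_histogram(descriptors, codebook, sigma=1.0):
    """Encode an image as a histogram of soft votes over codewords.

    Each descriptor distributes one unit of vote across all codewords,
    weighted by a Gaussian kernel on the squared Euclidean distance.
    """
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    d2 = np.sum(diffs ** 2, axis=2)
    # Subtract the row minimum before exponentiating for numerical stability;
    # this leaves the normalized weights unchanged.
    d2 = d2 - d2.min(axis=1, keepdims=True)
    weights = np.exp(-d2 / (2.0 * sigma ** 2))
    weights /= weights.sum(axis=1, keepdims=True)
    hist = weights.sum(axis=0)
    return hist / hist.sum()

codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
# A single descriptor exactly midway between the two codewords.
soft_hist = soft_assign_histogram(np.array([[5.0, 5.0]]), codebook, sigma=2.0)
```

A descriptor equidistant from both codewords splits its vote evenly, whereas hard assignment would give the whole vote to one of them.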
The problem of large-scale image search has been traditionally addressed with the bag-of-visual-words (BOV). In this article, we propose to use as an alternative the Fisher kernel framework. We first show why the Fisher representation is well-suited to the retrieval problem: it describes an image by what makes it different from other images. One drawback of the Fisher vector is that it is high-dimensional...
Human action recognition can be performed using multiscale salient features which encode the local events in the video. Existing feature extraction methods use non-causal spatio-temporal filtering, and hence, they are not biologically plausible. To address this inconsistency, new features extracted from a biologically plausible perception model are introduced. In this model, the opponent-based motion...