Content-based indexing is critical to effective access to multimedia data. To this end, visual data is often annotated with textual content to bridge the semantic gap. In this paper, we present a method to generate frame-level, fine-grained annotations for a given video clip. Access to frame-level, fine-grained annotations leads to rich, dense and meaningful semantic associations between...
Road detection from images is a challenging task in computer vision. Previous methods are not robust, because their features and classifiers cannot adapt to different circumstances. To overcome this problem, we propose to apply unsupervised feature learning for road detection. Specifically, we develop an improved encoding function and add a feature selection process to obtain robust and discriminative...
We propose to learn semantic spatio-temporal embeddings for videos to support high-level video analysis. The first step of the proposed embedding employs a deep architecture consisting of two channels of convolutional neural networks (capturing appearance and local motion) followed by their corresponding Gated Recurrent Unit encoders for capturing longer-term temporal structure of the CNN features...
In this paper, we propose a new local descriptor for action recognition in depth images. The proposed descriptor relies on surface normals in 4D space of depth, time, spatial coordinates and higher-order partial derivatives of depth values along spatial coordinates. In order to classify actions, we follow the traditional Bag-of-words (BoW) approach, and propose two encoding methods termed Multi-Scale...
We propose mutually incoherent pose bases for action recognition in static images, each of which implicitly represents the co-occurrence of poselets. First, action-specific poselets are trained. To suppress detection ambiguity, we cluster poselet activations by the overlap of the predicted torso bounds of each poselet. The pose feature of a person performing an action can then be extracted, which is a vector composed...
We present Deep Sparse-coded Network (DSN), a deep architecture based on multilayer sparse coding. It has been considered difficult to learn a useful feature hierarchy by stacking sparse coding layers in a straightforward manner. The primary reason is the modeling assumption for sparse coding that takes in a dense input and yields a sparse output vector. Applying a sparse coding layer on the output...
We present a regularization technique based on the minimum description length (MDL) principle for linear manifold clustering. We suggest an inexact minimum description length method based on describing the data structure as linear manifold clusters. We examine the behavior of the proposed method and compare its performance against simulated clustering results of various dimensionality and structure...
Error Correcting Output Coding (ECOC) is a multi-class classification technique in which multiple binary classifiers are trained according to a preset code matrix such that each one learns a separate dichotomy of the classes. While ECOC is one of the best solutions for multi-class problems, one issue which makes it suboptimal is that the training of the base classifiers is done independently of the...
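As a rough illustration of the ECOC idea described in the abstract above (not the method proposed in that paper), the sketch below shows how each column of a preset code matrix defines one binary dichotomy of the classes, and how the outputs of the binary classifiers are decoded back to a class by Hamming distance. The 4-class, 6-bit `CODE_MATRIX` and the helper names `dichotomy_labels`, `hamming` and `decode` are hypothetical, chosen only for this example; the base classifiers themselves are omitted.

```python
# Hypothetical code matrix: one row (codeword) per class, one column per
# binary classifier. Any two rows differ in 4 positions, so a single
# wrong classifier output can still be corrected.
CODE_MATRIX = [
    [0, 0, 1, 1, 0, 1],  # class 0
    [0, 1, 0, 1, 1, 0],  # class 1
    [1, 0, 0, 0, 1, 1],  # class 2
    [1, 1, 1, 0, 0, 0],  # class 3
]


def dichotomy_labels(code_matrix, column, class_labels):
    """Relabel multi-class targets as 0/1 targets for one binary classifier."""
    return [code_matrix[c][column] for c in class_labels]


def hamming(a, b):
    """Number of positions in which two bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(code_matrix, predicted_bits):
    """Return the class whose codeword is closest to the predicted bits."""
    distances = [hamming(row, predicted_bits) for row in code_matrix]
    return min(range(len(distances)), key=distances.__getitem__)


# Binary targets that the first classifier would be trained on.
first_targets = dichotomy_labels(CODE_MATRIX, 0, [0, 1, 2, 3])

# Class 1's codeword with its last bit flipped by one faulty classifier.
noisy_prediction = [0, 1, 0, 1, 1, 1]
decoded_class = decode(CODE_MATRIX, noisy_prediction)
```

In this sketch every binary classifier would be trained on its own column independently of the others, which is the independence the abstract identifies as a source of suboptimality.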
Biometric systems can be attacked in several ways, the most common being spoofing of the input sensor. Anti-spoofing is therefore one of the most essential prerequisites for defending biometric systems against attacks. Face recognition is even more vulnerable because image capture is non-contact based. Several anti-spoofing methods have been proposed in the literature for both contact and non-contact based...
Active one-shot scanning techniques have been widely used for various applications. Stereo-based active one-shot scanning embeds positional information regarding the image plane of a projector onto a projected pattern to retrieve correspondences entirely from a captured image. Many combinations of patterns and decoding algorithms for active one-shot scanning have been proposed. If the capturing...
We present an approach for unsupervised computation of local shape descriptors, which relies on the use of linear autoencoders for characterizing local regions of complex shapes. The proposed approach responds to the need for a robust scheme to index binary images using local descriptors, which arises when only a few examples of the complete images are available for training, thus making inaccurate...
Recently, Approximate Nearest Neighbor (ANN) Search has become a very popular approach for similarity search on large-scale datasets. In this paper, we propose a novel vector quantization method for ANN, which introduces a joint multi-layer K-Means clustering solution for determination of the codebooks. The performance of the proposed method is improved further by a joint encoding scheme. Experimental...
This paper presents a novel deep architecture for saliency prediction. Current state-of-the-art models for saliency prediction employ fully convolutional networks that perform a non-linear combination of features extracted from the last convolutional layer to predict saliency maps. We propose an architecture which, instead, combines features extracted at different levels of a Convolutional Neural...
The goal of semi-supervised learning is to improve supervised classifiers by using additional unlabeled training examples. In this work we study a simple self-learning approach to semi-supervised learning applied to the least squares classifier. We show that a soft-label and a hard-label variant of self-learning can be derived by applying block coordinate descent to two related but slightly different...
Dimensionality reduction methods have been shown to be effective for handwritten Chinese character recognition. In this paper, we propose discriminative projection based on locality-sensitive sparse representation (DPLSR) for in-air handwritten Chinese character recognition. DPLSR is based on the locality-sensitive sparse representation based classifier (LSRC), which can provide closed-form solutions...
Orientation Field (OF) is one of the most significant characteristics for distinguishing fingerprint images from non-fingerprint images. An effective definition of the fingerprint OF pattern will not only benefit fingerprint enhancement, but also contribute to latent fingerprint detection and segmentation. The existing fingerprint OF models either require prior knowledge of singular points, or cannot be generalized...
Recently, sparse representation (SR) over a redundant dictionary has become a popular way of representing the data. It has been verified as an efficient and useful tool to promote the discrimination between signals. This work develops a joint learning approach to find the low dimensional discriminative features for high dimensional data. To avoid the high computational cost of direct sparse coding...
A+, also known as Adjusted Anchored Neighborhood Regression, is a state-of-the-art method for exemplar-based single image super-resolution with low time complexity at both train and test time. By robustly training a clustered regression model over a low-resolution dictionary, its performance keeps improving with the dictionary size, even when using tens of thousands of regressors. However, this can pose a...
The encoding method is an important factor in an action recognition pipeline. One of the key points of the encoding method is the assignment step. A very widely used super-vector encoding method is the vector of locally aggregated descriptors (VLAD), which achieves very competitive results in many tasks. However, it considers only hard assignment, and the criterion for the assignment is computed only from...
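For readers unfamiliar with the hard assignment step mentioned in the abstract above, here is a minimal sketch of standard VLAD encoding (the baseline the abstract refers to, not the paper's proposed extension). Each local descriptor is assigned to its single nearest codeword, the residuals are summed per codeword, and the concatenated result is L2-normalized. The function name `vlad_encode` and the toy codebook and descriptors are assumptions made for this illustration; in practice the codebook would typically come from k-means.

```python
import math


def vlad_encode(descriptors, codebook):
    """Encode local descriptors as an L2-normalized VLAD super-vector."""
    k, d = len(codebook), len(codebook[0])
    residuals = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Hard assignment: pick the one codeword with the smallest
        # squared Euclidean distance to the descriptor.
        nearest = min(
            range(k),
            key=lambda j: sum((xi - ci) ** 2 for xi, ci in zip(x, codebook[j])),
        )
        # Accumulate the residual (descriptor minus its codeword).
        for i in range(d):
            residuals[nearest][i] += x[i] - codebook[nearest][i]
    # Concatenate the per-codeword residual sums and L2-normalize.
    flat = [v for row in residuals for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Toy data (hypothetical): two 2-D codewords and three descriptors.
toy_codebook = [[0.0, 0.0], [10.0, 10.0]]
toy_descriptors = [[1.0, 0.0], [0.0, 1.0], [10.0, 12.0]]
toy_vlad = vlad_encode(toy_descriptors, toy_codebook)
```

Because each descriptor contributes to exactly one codeword, a descriptor lying almost midway between two codewords is treated the same as one lying on top of its codeword, which is the limitation of hard assignment that the abstract points to.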
In this paper, we propose a novel regularized sparse coding approach for template-based unconstrained face verification. Unlike traditional verification tasks, which require evaluation on image-to-image or video-to-video pairs, template-based face verification/recognition methods can exploit training and/or gallery data containing a mixture of images and videos from the person of interest...