We propose to learn semantic spatio-temporal embeddings for videos to support high-level video analysis. The first step of the proposed embedding employs a deep architecture consisting of two channels of convolutional neural networks (capturing appearance and local motion) followed by their corresponding Gated Recurrent Unit encoders for capturing longer-term temporal structure of the CNN features...
In this paper, we propose a new local descriptor for action recognition in depth images. The proposed descriptor relies on surface normals in 4D space of depth, time, spatial coordinates and higher-order partial derivatives of depth values along spatial coordinates. In order to classify actions, we follow the traditional Bag-of-words (BoW) approach, and propose two encoding methods termed Multi-Scale...
Biometric systems can be attacked in several ways, the most common being spoofing of the input sensor. Therefore, anti-spoofing is one of the most essential prerequisites for protecting biometric systems against attacks. Face recognition is even more vulnerable because the image capture is non-contact based. Several anti-spoofing methods have been proposed in the literature for both contact and non-contact based...
This paper presents a novel deep architecture for saliency prediction. Current state of the art models for saliency prediction employ Fully Convolutional networks that perform a non-linear combination of features extracted from the last convolutional layer to predict saliency maps. We propose an architecture which, instead, combines features extracted at different levels of a Convolutional Neural...
In this paper, we introduce a novel local feature-based hierarchical framework to produce invariant sparse codes for object recognition. In order to enforce the invariant property for each sample patch (local feature descriptor) in the image, its sparse code is recovered with a dedicated dictionary whose atoms are adaptively chosen from several bags of candidate atoms. The single-layer invariant sparse...
The encoding method is an important factor in an action recognition pipeline, and one of its key points is the assignment step. A very widely used super-vector encoding method is the vector of locally aggregated descriptors (VLAD), which gives very competitive results in many tasks. However, it considers only hard assignment, and the assignment is performed only from...
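As a rough illustration of the hard assignment step mentioned in the abstract above, here is a minimal pure-Python sketch of plain VLAD encoding. It is not the method proposed in that paper: the function name `vlad_encode`, the toy data, and the final L2 normalization are assumptions made for illustration, and the codebook (`centroids`) is assumed to have been learned beforehand, for example with k-means.

```python
from math import sqrt


def vlad_encode(descriptors, centroids):
    """Plain VLAD with hard assignment (illustrative sketch, hypothetical helper).

    descriptors: list of local descriptors, each a list of floats.
    centroids:   list of codebook vectors of the same dimension.
    Returns the concatenated, L2-normalized residual sums.
    """
    k = len(centroids)
    d = len(centroids[0])
    residuals = [[0.0] * d for _ in range(k)]

    for x in descriptors:
        # Hard assignment: each descriptor votes for its single nearest centroid.
        nearest = min(
            range(k),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centroids[j])),
        )
        # Accumulate the residual (descriptor minus centroid) for that centroid only.
        for i in range(d):
            residuals[nearest][i] += x[i] - centroids[nearest][i]

    # Concatenate the per-centroid residual sums into one super-vector.
    flat = [v for row in residuals for v in row]

    # L2 normalization (a common choice, assumed here).
    norm = sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Toy example: two centroids in 2D and three descriptors.
centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
encoding = vlad_encode(descriptors, centroids)
```

Because every descriptor contributes to exactly one centroid, a descriptor lying almost midway between two centroids is treated the same as one sitting right next to its centroid, which is the limitation of hard assignment that the abstract points to.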
Activity recognition in videos is a challenging task, especially when only a small number of samples is available for modelling the problem. The task becomes even harder when using generative models such as mixture models or Hidden Markov Models (HMMs), as they demand a lot of samples to determine their parameters. Additionally, these models rely on the appropriate selection of some parameters, for instance...
One of the main problems of recognizing faces in videos is to achieve accurate algorithms which can be used in real-time applications. Recently, Fisher Vector representation of local descriptors (e.g., SIFT) has gained widespread popularity, achieving good recognition rates. In this work, we propose to use Fisher Vector encoding of binary features for video face recognition, in order to speed up the...
Articulatory features are used as a universal set of speech attributes shared across many different languages. Some multilingual and cross-language speech recognition systems using articulatory features have been shown to improve performance. The existing articulatory features are defined by phoneticians as a set of articulatory descriptions of phones, which represent some semantic information...
In this paper, we propose a novel two-stream framework based on combinational deep neural networks. The framework is mainly composed of two components: one is a parallel two-stream encoding component which learns video encoding from multiple sources using 3D convolutional neural networks and the other is a long-short-term-memory (LSTM)-based decoding language model which transfers the input encoded...
Recently, hashing algorithms have attracted considerable attention in the field of machine learning. Most existing hashing methods directly use a vector, formed by stacking the columns of the image matrix, as a unit and adopt feature extraction functions to project the original data into generally shorter fixed-length values or characters. Then each of these projected real values is quantized or hashed into zero-one...
Lines are the most essential and discriminative features of palmprint images, which has motivated researchers to propose various line direction based methods for palmprint recognition. Conventional methods usually capture only the single most dominant direction of palmprint images. However, a number of points in palmprint images have two or even more dominant directions because of a plenty...
Word embedding models are capable of capturing the semantic content of words. The process of extracting a set of word embedding vectors from a text document is similar to the feature extraction step of the Bag-of-Features pipeline, which is usually used in computer vision tasks. This gives rise to the Bag-of-Embedded Words (BoEW) model. In this paper a novel learning technique that...
This paper presents a novel image feature representation method for image retrieval, called the multi-channel micro-structure difference descriptor (MCMSDD). Using local feature extraction from a micro-structure and a MAX operator, MCMSDD integrates the advantages of multi-channel local binary encoding and the color difference histogram, which are the fusion of color, texture and spatial distribution information...
We present a method for combining the Vector of Locally Aggregated Descriptors (VLAD) feature encoding with Deep Convolutional Neural Network (DCNN) features for unconstrained face verification. One of the key features of our method, called the VLAD-encoded DCNN (VLAD-DCNN) features, is that spatial and appearance information are simultaneously processed to learn an improved discriminative representation...
With the advent of the Internet and wide-spread popularity of online technology-enhanced learning platforms, many pedagogical activities today involve learners in online discussions such as synchronous chat. In this study, we describe a text mining method used for analyzing teamwork from such chat dialogue of students. The steps in the text mining method such as pre-processing and classification are...
The presented work proposes a simple feature extraction technique designed for robust detection of event-related potentials (ERPs). The technique was tested on detecting the N400, an ERP generally associated with recall. The chief advantages of the proposed technique are that it is robust to different ocular artifacts and yet sensitive to event-related potentials. Further, each signal...
Convolutive non-negative matrix factorization (CNMF) is a promising method for extracting features from sequential multivariate data. Conventional algorithms for CNMF require that the structure, or the number of bases for expressing the data, be specified in advance. We are concerned with the issue of how we can select the best structure of CNMF from given data. We first introduce a framework of probabilistic...
Authorship identification is the task of identifying the author of a given text from a set of suspects. The main concern of this task is to define an appropriate characterization of texts that captures the writing style of authors. Although deep learning was recently used in different natural language processing tasks, it has not been used in author identification (to the best of our knowledge). In...
High accuracy fault diagnosis systems are extremely important for effective condition based maintenance (CBM) of rotating machines. In this work, we develop a fault diagnosis system using time and frequency domain statistical features as input to a backend support vector machine (SVM) classifier. We evaluate the performance of the baseline system for speed dependent and speed independent performance...