This paper compares the use of signal-to-noise ratio (SNR)-dependent and SNR-independent mixtures of probabilistic linear discriminant analysis (PLDA) versus conventional PLDA, under multi-noise and multi-SNR conditions for a small-set speaker verification system. Results indicate that conventional PLDA is more robust under multi-SNR conditions. The effect of the testing speech length is also examined...
In this paper, we address the problem of automated pose classification and segmentation of the left ventricle (LV) in 2D echocardiographic images. For this purpose, we compare two complementary approaches. The first one is based on engineering ad-hoc features according to the traditional machine learning paradigm. Namely, we extract phase features to build an unsupervised LV pose estimator, as well...
The detection of cells and nuclei is a crucial step for the automatic analysis of digital pathology slides and as such for the quantification of the phenotypic information contained in tissue sections. This task is however challenging because of high variability in size, shape and textural appearance of the objects to be detected and of the high variability of tissue appearance. In this work, we propose...
We address the problem of latent truth discovery, LTD for short, where the goal is to discover the underlying true values of entity attributes in the presence of noisy, conflicting or incomplete information. Despite a multitude of algorithms addressing the LTD problem, little is known about their overall performance with respect to effectiveness, efficiency and robustness. The LTD model proposed...
In rank minimization, the non-negative sparse low-rank representation classification (NSLRRC) regularizes each singular value of the nuclear norm equally, but this limits its flexibility and its ability to solve many practical problems, where singular values with clear physical meanings ought to be treated differently. In this paper, a weighted non-negative sparse low-rank representation...
In this work, we propose a method that combines the local similarity and local community paradigms, with a tunable parameter to balance the contribution of the energy from these two sources. We show that local similarity (e.g., common neighbors) and the local community paradigm (e.g., local community links) both play significant roles in network evolution; therefore, one cannot ignore or penalize any one of...
Sparse representation based classification has gained popularity with geospatial image analysis in general and hyperspectral image analysis in particular. A central idea with such classification approaches is that a test pixel (spectral reflectance vector) can be sparsely represented in a training dictionary of pixels from all classes - in particular, only training pixels in the dictionary that bear...
In this paper, we propose to extract a robust video descriptor by training a deep neural network to automatically capture the intrinsic visual characteristics of digital video. More specifically, we first train a conditional generative model to capture the spatio-temporal correlations among visual contents and represent them as an intermediate descriptor. A nonlinear encoder, with the functions of dimension...
Spoken dialogue systems must be able to recover gracefully from unexpected user inputs. In many cases, these unexpected utterances may be within the scope of the system, but include previously unseen phrases that the system cannot interpret. In this work, we augment a spoken dialogue system with the ability to learn about new concepts by conversing with the user in natural language. We present a novel...
This paper investigates the framework of encoder-decoder with attention for sequence labelling based spoken language understanding. We introduce Bidirectional Long Short Term Memory - Long Short Term Memory networks (BLSTM-LSTM) as the encoder-decoder model to fully utilize the power of deep learning. In the sequence labelling task, the input and output sequences are aligned word by word, while the...
Linear discriminant analysis (LDA) is typically carried out using Fisher's method, which relies heavily on the estimation of sample mean vectors and covariance matrices. However, Fisher LDA is vulnerable to outliers, as are other multivariate statistical methods. In this paper, we analyzed the optimal discriminant design based on the criterion of minimizing total misclassification rate, assuming...
When emotion recognition systems are used in new domains, the classification performance usually drops due to mismatches between training and testing conditions. Annotating new data in the new domain is expensive and time-consuming. Therefore, it is important to design strategies that efficiently use a limited amount of new data to improve the robustness of the classification system. The use of...
This paper proposes a psychologically inspired convolutional neural network (PI-CNN) to achieve automatic facial beauty prediction. Unlike previous methods, the PI-CNN is a hierarchical model that facilitates both facial beauty representation learning and predictor training. Inspired by recent psychological studies, significant appearance features of facial detail, lighting and...
Acoustic beamforming has played a key role in robust automatic speech recognition (ASR) applications. Accurate estimates of the speech and noise spatial covariance matrices (SCMs) are crucial for successfully applying minimum variance distortionless response (MVDR) beamforming. Reliable estimation of time-frequency (TF) masks can improve the estimation of the SCMs and significantly improve...
Multi-view representations exist widely in practical applications, but the quality of the latent representation learned from multi-view observations often suffers from noise and outliers in the original data. In this work, we propose an autoencoder-based deep multi-view robust representation learning (DMRRL) algorithm, which can learn a shared representation from multi-view observations and the algorithm...
Subspace methods are used for deep neural network (DNN)-based acoustic model adaptation. These methods first construct a subspace and then perform the speaker adaptation as a point in the subspace. This paper aims to investigate the effectiveness of subspace methods for robust unsupervised adaptation. For the analysis, we compare two state-of-the-art subspace methods, namely, the singular value decomposition...
Depth motion maps (DMMs) have shown effectiveness in human action recognition; however, they lose temporal information and suffer from intra-class variations caused by variations in action speed. To address these challenges, we propose a novel method for human action recognition. Firstly, Adaptive Hierarchical Depth Motion Maps (AH-DMMs) are calculated over temporal hierarchical windows of video...
Occlusion handling is one of the most challenging issues in pedestrian detection, and no satisfactory solution has been found yet. Using human body parts has been considered a reasonable way to overcome this issue. In this paper, we propose a new approach based on the fusion of Mid-level body part mining and Convolutional Neural Network (CNN) to solve this problem, named...
In this paper, we develop a new approach called DeepText for text region proposal generation and text detection in natural images via a fully convolutional neural network (CNN). First, we propose the novel inception region proposal network (Inception-RPN), which slides an inception network with multi-scale windows over the top of convolutional feature maps and associates a set of text characteristic...
In this paper, we propose a new texture descriptor, scale selective extended local binary pattern (SSELBP), to characterize texture images with scale variations. We first utilize multi-scale extended local binary patterns (ELBP) with rotation-invariant and uniform mappings to capture robust local micro- and macro-features. Then, we build a scale space using Gaussian filters and calculate the histogram...