The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
A solution to long distance outdoor face recognition is presented in this work. The proposed method, called the Two-Stage Alignment/Enhancement Filtering (TAEF) system, consists of three main components: a cross-distance face alignment technique, a cross-environment face enhancement technique, and a two-stage filtering system. Given a probe image, the procedure of face alignment, enhancement and matching...
We evaluate different architectures to recognize multilingual speech for real-time mobile applications. In particular, we show that combining the results of several recognizers greatly outperforms other solutions such as training a single large multilingual system or using an explicit language identification system to select the appropriate recognizer. Experiments are conducted on a trilingual English-French-Mandarin...
Pitch mismatch between training and testing is one of the important factors causing the performance degradation of the speaker recognition system. In this paper, we adopted the missing feature theory and specified the Unreliable Region (UR) as the parts of the utterance with high emotion induced pitch variation. To model these regions, a virtual HD (High Different from neutral, with large pitch offset)...
This paper addresses the problem of discriminative training of language models that does not require any transcribed acoustic data. We propose to minimize the conditional entropy of word sequences given phone sequences, and present two settings in which this criterion can be applied. In an inductive learning setting, the phonetic/acoustic confusability information is given by a general phone error...
While a sound spoken is described by a handful of frame-level spectral vectors, not all frames have equal contribution for either human perception or machine classification. In this paper, we introduce a novel framework to automatically emphasize important speech frames relevant to phonetic information. We jointly learn the importance of speech frames by a distance metric across the phone classes,...
Speech with various emotions aggravates the performance of speaker recognition systems. In this paper, a novel score normalization approach called pitch envelope based frame level score reweighted (PFLSR) algorithm is introduced to compensate the influence of the affective speech on speaker recognition. The approach assumes that the maximum likelihood model is not easily changed with the expressive...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.