Although several sites have reported the effectiveness of convolutional neural networks (CNNs) on some tasks, there is no deep analysis of why CNNs perform well or in which cases they offer an advantage. In light of this, this paper aims to provide a detailed analysis of CNNs. By visualizing the localized filters learned in the convolutional layer, we show that edge...
We evaluate different architectures to recognize multilingual speech for real-time mobile applications. In particular, we show that combining the results of several recognizers greatly outperforms other solutions such as training a single large multilingual system or using an explicit language identification system to select the appropriate recognizer. Experiments are conducted on a trilingual English-French-Mandarin...
Children with communication disorders, which involve a wide variety of problems in speech, language, and hearing, face learning and emotional difficulties. This paper aims to develop a Chinese PCS Editing Processor with Picture Communication Symbols (PCS), a Chinese Text-to-Speech Engine, and a recording engine to improve social interactivity and the learning environment for the children...
Pitch mismatch between training and testing is an important cause of performance degradation in speaker recognition systems. In this paper, we adopt the missing feature theory and specify the Unreliable Region (UR) as the parts of the utterance with high emotion-induced pitch variation. To model these regions, a virtual HD (High Different from neutral, with large pitch offset)...
This paper addresses the problem of discriminative training of language models that does not require any transcribed acoustic data. We propose to minimize the conditional entropy of word sequences given phone sequences, and present two settings in which this criterion can be applied. In an inductive learning setting, the phonetic/acoustic confusability information is given by a general phone error...
While a spoken sound is described by a handful of frame-level spectral vectors, not all frames contribute equally to either human perception or machine classification. In this paper, we introduce a novel framework to automatically emphasize important speech frames relevant to phonetic information. We jointly learn the importance of speech frames by a distance metric across the phone classes,...
Speech with various emotions degrades the performance of speaker recognition systems. In this paper, a novel score normalization approach called the pitch envelope based frame level score reweighted (PFLSR) algorithm is introduced to compensate for the influence of affective speech on speaker recognition. The approach assumes that the maximum likelihood model is not easily changed with the expressive...