The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
This paper compares different approaches for using deep neural networks (DNNs) trained to predict senone posteriors for the task of spoken language recognition (SLR). These approaches have recently been found to outperform various baseline systems on different datasets, but they have not yet been compared to each other or to a common baseline. Two of these approaches use the DNNs to generate feature...
The recent application of deep neural networks (DNN) to speaker identification (SID) has resulted in significant improvements over current state-of-the-art on telephone speech. In this work, we report a similar achievement in DNN-based SID performance on microphone speech. We consider two approaches to DNN-based SID: one that uses the DNN to extract features, and another that uses the DNN during feature...
Prosody is the part of speech where rhythm, stress, and intonation are reflected. In language identification tasks, these characteristics are assumed to be language dependent, and thus the language can be identified from them. In this paper, an automatic language recognition system that extracts prosody information from speech and makes decisions about the language with a generative classifier based...
This work addresses the problem of speaker verification where additive noise is present in the enrollment and testing utterances. We show how the current state-of-the-art framework can be effectively used to mitigate this effect. We first look at the degradation a standard speaker verification system is subjected to when presented with noisy speech waveforms. We designed and generated a corpus with...
The goal of this work was to explore the optimization of the feature extraction module (front-end) parameters to improve bird species recognition. We explored optimizing the spectral and temporal parameters of a Mel cepstrum feature-based front-end, starting from common parameter values used in speech processing experiments. These features were modeled using a Gaussian mixture model (GMM) system....
Prosodic information has been successfully used for speaker recognition for more than a decade. The best-performing prosodic system to date has been one based on features extracted over syllables obtained automatically from speech recognition output. The features are then transformed using a Fisher kernel, and speaker models are trained using support vector machines (SVMs). Recently, a simpler version...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.