The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Probabilistic linear discriminant analysis (PLDA) is widely described as an effective model for text-independent speaker verification in the i-vector space. The PLDA scoring function is typically formulated as the likelihood ratio between the speaker-adapted and the universal PLDAs. In this case, the adaptation of PLDA was performed through the speaker factors. In this paper, we show that the channel...
I-vector adaptation of DNN-HMM acoustic models has shown clear performance improvement for speech recognition. In this paper, we study this technique on Babel task. we use Swahili as target language (training data of 50 hours) and another 6 languages as multilingual resources to train i-vector extractors respectively. Our study shows that i-vector extractors trained with more multilingual data only...
In this paper, we propose a feature adaptation method that combines speech features from multiple microphone channels for robust automatic speech recognition (ASR). The proposed method first transforms the features in all channels using channel-dependent linear transforms, and then sum the channels into one channel for acoustic modeling. The transform parameters are estimated by maximizing the likelihood...
Spectral information represents short-term speech information within a frame of a few tens of milliseconds, while temporal information captures the evolution of speech statistics over consecutive frames. Motivated by the findings that human speech comprehension relies on the integrity of both the spectral content and temporal envelope of speech signal, we study a spectro-temporal transform framework...
Probabilistic linear discriminant analysis (PLDA) has shown to be effective for modeling channel variability in the i-vector space for text-independent speaker verification. Speaker verification is a binary hypothesis testing. Given a test segment, the verification score could be computed as the log-likelihood ratio between a speaker-adapted PLDA and the universal PLDA model. This work proposes to...
In phonotactic spoken language recognition systems, acoustic model adaptation prior to phone lattice decoding has been adopted to deal with the mismatch between training and test conditions. Moreover, combining diversified phonotactic features is commonly used. These motivate us to have an in-depth investigation of combining diversified phonotactic features from diversely adapted acoustic models....
In this paper, we investigate a feature conditioning method for the VTS-based model compensation. The VTS is a technique that predicts noisy acoustic model from clean acoustic model and noise model. It is noted that most of the previous studies use a single Gaussian noise model, which is unable to model noise statistics well, especially in non-stationary noisy environments. In this paper, we propose...
In this paper, we propose a novel acoustic model adaptation method for noise robust speech recognition. Model combination is a common way to adapt acoustic models to a target test environment. For example, the mean supervectors of the adapted model are obtained as a linear combination of mean supervectors of many pre-trained environment-dependent acoustic models. Usually, the combination weights are...
Voice conversion - the methodology of automatically converting one's utterances to sound as if spoken by another speaker - presents a threat for applications relying on speaker verification. We study vulnerability of text-independent speaker verification systems against voice conversion attacks using telephone speech. We implemented a voice conversion systems with two types of features and nonparallel...
In this paper, we propose a novel feature space adaptation technique to improve the robustness of speech recognition in noisy environments. Histogram equalization (HEQ) is an effective technique for improving robustness by reducing the difference between clean and noisy features. A weakness of HEQ is that it does not take into account acoustic model, resulting in possible mismatch between HEQ-processed...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.