One important class of state emission densities of the hidden Markov model (HMM) is the Gaussian mixture densities. The classical Baum-Welch algorithm often fails to reliably learn the Gaussian mixture densities when there is insufficient training data, due to the large number of free parameters present in the model. In this paper, we propose a novel strategy for robustly and accurately learning the...
While a spoken sound is described by a handful of frame-level spectral vectors, not all frames contribute equally to either human perception or machine classification. In this paper, we introduce a novel framework to automatically emphasize important speech frames relevant to phonetic information. We jointly learn the importance of speech frames by a distance metric across the phone classes,...
Gaussian mixture models (GMMs) and the minimum error rate classifier (i.e. Bayesian optimal classifier) are popular and effective tools for speech emotion recognition. Typically, GMMs are used to model the class-conditional distributions of acoustic features and their parameters are estimated by the expectation maximization (EM) algorithm based on a training data set. Then, classification is performed...
Recent studies in patch-based Gaussian Mixture Model (GMM) approaches for face age estimation present promising results. We propose using a hidden Markov model (HMM) supervector to represent face image patches, improving on the previous GMM supervector approach by capturing the spatial structure of human faces and loosening the assumption of identical face patch distribution within a face image...
Speech perceptual features, such as Mel-frequency Cepstral Coefficients (MFCC), have been widely used in acoustic event detection. However, the different spectral structures between speech and acoustic events degrade the performance of the speech feature sets. We propose quantifying the discriminative capability of each feature component according to the approximated Bayesian accuracy and deriving...
We report on investigations, conducted at the 2006 Johns Hopkins Workshop, into the use of articulatory features (AFs) for observation and pronunciation models in speech recognition. In the area of observation modeling, we use the outputs of AF classifiers both directly, in an extension of hybrid HMM/neural network models, and as part of the observation vector, an extension of the "tandem"...
This paper studies the speech of three talkers with spastic dysarthria caused by cerebral palsy. All three subjects share the symptom of low intelligibility, but the causes differ. First, all subjects tend to reduce or delete word-initial consonants; one subject deletes all consonants. Second, one subject exhibits a painstaking stutter. Two algorithms were used to develop automatic isolated digit recognition...