In this study, we introduce a new factor analysis of Laplacian approach to speaker recognition under the support vector machine (SVM) framework. The Laplacian-projected supervector produced by our proposed approach, which finds an embedding that preserves local information via locality preserving projections (LPP), is believed to contain speaker-dependent information. The proposed method was compared...
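As a rough illustration of the LPP step mentioned above, the NumPy sketch below follows the standard locality preserving projections recipe: heat-kernel weights on a k-nearest-neighbour graph, then the generalized eigenproblem X^T L X a = λ X^T D X a, keeping the eigenvectors with the smallest eigenvalues. It is not the authors' factor-analysis method. The function name `lpp`, the parameters `k`, `t` and `reg`, and treating each row as one speaker supervector are all assumptions.

```python
import numpy as np

def lpp(X, n_components=2, k=5, t=1.0, reg=1e-6):
    """Locality preserving projections (sketch).

    X: (N, D) array, one sample per row (e.g. a supervector, by assumption).
    Returns a (D, n_components) projection matrix.
    """
    N, D = X.shape
    if not 0 < k < N:
        raise ValueError("k must satisfy 0 < k < N")
    # Pairwise squared Euclidean distances, clipped at 0 for round-off.
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # k-nearest-neighbour graph with heat-kernel weights (self excluded).
    d = dist2.copy()
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    W = np.zeros((N, N))
    rows = np.arange(N)[:, None]
    W[rows, nn] = np.exp(-dist2[rows, nn] / t)
    W = np.maximum(W, W.T)                  # symmetrize the graph
    Dg = np.diag(W.sum(axis=1))             # degree matrix
    Lap = Dg - W                            # graph Laplacian
    # Generalized eigenproblem A a = lambda B a, reduced via Cholesky of B.
    A = X.T @ Lap @ X
    B = X.T @ Dg @ X + reg * np.eye(D)      # small ridge keeps B positive definite
    Linv = np.linalg.inv(np.linalg.cholesky(B))
    C = Linv @ A @ Linv.T
    C = 0.5 * (C + C.T)                     # remove round-off asymmetry
    _, V = np.linalg.eigh(C)                # eigenvalues in ascending order
    # Smallest eigenvalues correspond to the most locality-preserving directions.
    return Linv.T @ V[:, :n_components]

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
P = lpp(X, n_components=2, k=5)
Y = X @ P   # Laplacian-projected samples, shape (40, 2)
```

The Cholesky reduction is used only to stay within NumPy; a dedicated generalized symmetric eigensolver would serve equally well.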
In this paper, we propose an automatic, data-driven technique for selecting a proper background dataset. Impostor confidence (IC) is proposed as a metric, and a more discriminative background dataset is automatically chosen according to IC to train a more discriminative model. Experimental results on the NIST 2008 SRE corpus with a GMM-SVM speaker verification system show that the proposed...
This paper proposes an SVM-based method for detecting audio events (cheering and applause) through audio analysis. In our framework, a sliding window is first moved from the start to the end of the audio stream to pre-segment it into short segments. Second, various audio features are extracted from each segment to represent different sounds. Third, SVM (support vector machine)...
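The pre-segmentation step described above can be sketched in plain Python. The window length and hop size are hypothetical, since the abstract does not state them, and a trailing partial window is simply dropped; per-segment feature extraction and SVM classification are not shown.

```python
def sliding_window_segments(num_samples, win_len, hop):
    """Return (start, end) sample indices of fixed-length windows moved
    from the start of the stream to its end; a trailing partial window is dropped."""
    if win_len <= 0 or hop <= 0:
        raise ValueError("win_len and hop must be positive")
    segments = []
    start = 0
    while start + win_len <= num_samples:
        segments.append((start, start + win_len))
        start += hop
    return segments

# 1 s of 16 kHz audio, 0.5 s windows, 0.25 s hop (hypothetical values).
print(sliding_window_segments(16000, 8000, 4000))
# [(0, 8000), (4000, 12000), (8000, 16000)]
```

Overlapping windows (hop smaller than the window) reduce the chance that a short event is split across two segments.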
The paper presents support vector machine (SVM)-based part-of-speech (POS) tagging on a Chinese database, which is part of our speech synthesis system. The model is an SVM classifier that uses many sequential features to predict the POS tag. The text database, downloaded from the Internet, contains 1,280,000 words and 33 parts of speech. The overall accuracy in our experiments is 99.31%.
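The abstract does not list its sequential features, so the sketch below shows one plausible, hypothetical feature template of the kind often fed to a multi-class SVM tagger: the neighbouring words plus the two previously predicted tags. The function name, feature names, the `<PAD>` symbol and the example tag `r` are assumptions.

```python
def pos_features(words, i, prev_tags):
    """Return a set of string features for tagging words[i].

    prev_tags holds the tags already predicted for words[0..i-1]
    (left-to-right decoding is assumed).
    """
    def word_at(j):
        return words[j] if 0 <= j < len(words) else "<PAD>"

    def tag_at(j):
        return prev_tags[j] if 0 <= j < len(prev_tags) else "<PAD>"

    return {
        f"w-1={word_at(i - 1)}",   # previous word
        f"w0={word_at(i)}",        # current word
        f"w+1={word_at(i + 1)}",   # next word
        f"t-1={tag_at(i - 1)}",    # previously predicted tag
        f"t-2={tag_at(i - 2)}",    # tag two positions back
    }

feats = pos_features(["我", "喜欢", "音乐"], 1, ["r"])
print(sorted(feats))
```

Each feature string would then be mapped to its own binary dimension before training the SVM.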
Recently, using maximum likelihood linear regression (MLLR) transforms as features for SVM-based speaker recognition has been proposed, achieving performance comparable to that of state-of-the-art approaches. In this paper, we focus on calculating the transforms based on a GMM universal background model (UBM). Rather than estimating the transforms using the maximum likelihood criterion,...
In this paper, we propose a speech emotion recognition system using both spectral and prosodic features. Most traditional systems have focused on either spectral or prosodic features. Since both contain emotion information, it is believed that combining them will improve the performance of the emotion recognition system...
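The abstract is cut off before it says how the two feature types are combined. The sketch below assumes simple feature-level fusion, z-normalizing each stream per dimension and concatenating them per utterance, which is a common choice but not necessarily the paper's. The function name and the example feature dimensions are hypothetical.

```python
import numpy as np

def fuse_features(spectral, prosodic, eps=1e-8):
    """Feature-level fusion of utterance-level feature matrices.

    spectral: (N, Ds), prosodic: (N, Dp), one row per utterance.
    Each stream is z-normalized per dimension so that neither dominates
    by scale, then the two are concatenated into an (N, Ds + Dp) matrix.
    """
    if spectral.shape[0] != prosodic.shape[0]:
        raise ValueError("both streams need one row per utterance")

    def znorm(F):
        return (F - F.mean(axis=0)) / (F.std(axis=0) + eps)

    return np.hstack([znorm(spectral), znorm(prosodic)])

rng = np.random.default_rng(0)
spec = rng.normal(size=(8, 13))          # e.g. utterance-level MFCC means (assumed)
pros = rng.normal(200.0, 30.0, (8, 4))   # e.g. pitch/energy statistics (assumed)
print(fuse_features(spec, pros).shape)   # (8, 17)
```

In a real system the normalization statistics would be estimated on training data and reused at test time; computing them on the same matrix here only keeps the sketch short.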
Gaussian mixture models with a universal background model (UBM) have been the standard method for speaker recognition. Typically, maximum a posteriori (MAP) adaptation or maximum likelihood linear regression (MLLR) is used to adapt the means of the UBM. Together with the SVM modeling technique, these approaches can achieve excellent performance. MLLR is quite efficient when the amount of adaptation data is...
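For reference, the classical relevance-MAP update of UBM means mentioned in this abstract can be sketched in NumPy: compute per-frame component posteriors, accumulate zeroth- and first-order statistics, and interpolate between the data mean and the UBM mean with weight n_c / (n_c + r). Diagonal covariances, the relevance factor r = 16 and the function names are assumptions; stacking the adapted means gives the supervector commonly used as SVM input.

```python
import numpy as np

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Relevance-MAP adaptation of the means of a diagonal-covariance GMM-UBM.

    X: (T, D) feature frames; weights: (C,); means, variances: (C, D).
    Returns the adapted (C, D) means; weights and variances stay unchanged.
    """
    # Per-frame, per-component Gaussian log-likelihoods, shape (T, C).
    diff = X[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances[None, :, :], axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :])
    # Posterior responsibilities gamma_c(t), with a max shift for stability.
    log_post = np.log(weights)[None, :] + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    n = post.sum(axis=0)                                # zeroth-order stats, (C,)
    E = (post.T @ X) / np.maximum(n, 1e-10)[:, None]    # posterior-weighted data mean, (C, D)
    alpha = (n / (n + relevance))[:, None]              # data-dependent adaptation weight
    return alpha * E + (1.0 - alpha) * means

def mean_supervector(means):
    """Stack the adapted component means into one (C * D,) supervector."""
    return means.reshape(-1)

rng = np.random.default_rng(0)
C, D = 4, 3
ubm_w = np.full(C, 1.0 / C)
ubm_mu = rng.normal(size=(C, D))
ubm_var = np.ones((C, D))
frames = rng.normal(size=(200, D))
sv = mean_supervector(map_adapt_means(frames, ubm_w, ubm_mu, ubm_var))
print(sv.shape)  # (12,)
```

Components that see little data (small n_c) stay close to the UBM, which is why MAP needs a fair amount of adaptation data to move many components.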
Token-based approaches have proven quite effective for spoken language identification (LID). Traditionally, speech utterances are first decoded into token sequences, and LID is then performed on these sequences by either n-gram language models or support vector machines. In this paper, we propose a hierarchical system design, which utilizes a group of Bayesian logistic regression models...
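The step of turning a decoded token sequence into input for an SVM can be sketched as a vector of relative n-gram frequencies. The token symbols, the bigram order and the function names are hypothetical, and the hierarchical Bayesian logistic regression design proposed in the paper is not shown.

```python
from collections import Counter

def ngram_features(tokens, n=2):
    """Relative frequencies of the n-grams in one decoded token sequence."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not grams:
        return {}
    counts = Counter(grams)
    return {g: c / len(grams) for g, c in counts.items()}

def to_vector(freqs, vocabulary):
    """Map an n-gram frequency dict onto a fixed, ordered n-gram vocabulary."""
    return [freqs.get(g, 0.0) for g in vocabulary]

utt = ["a", "b", "a", "b", "c"]   # hypothetical phone tokens
feats = ngram_features(utt, n=2)
vocab = sorted(feats)             # in practice, built from training data
print(to_vector(feats, vocab))    # [0.5, 0.25, 0.25]
```

Normalizing by the number of n-grams keeps utterances of different lengths on a comparable scale.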
This paper proposes a novel feature set for robust speaker recognition based on the harmonic structure of speech signals. Channel modulation effects are expected to be weakened in the harmonic structure features, so the influence of channel variability can be diminished to a certain degree. Though experimental results show that the raw performance of the harmonic...
This paper presents our Mandarin pronunciation quality assessment system for the Putonghua Shuiping Kaoshi (PSK) examination and investigates measures to improve assessment accuracy. A selective speaker adaptation method is studied: in the adaptation module, we select well-pronounced speech as the adaptation data and adopt Maximum Likelihood Linear Regression (MLLR) to...
Modern lifestyles have increased the risk of pathological voice problems, so the therapy of people with voice disorders is attracting more attention. Meanwhile, acoustic features have been widely used in the therapy of people with voice disorders, and classifying normal versus pathological speakers is an auxiliary step in that therapy. MFCC has been proven to be a useful feature with traditional classifier...
Modern lifestyles have increased the risk of suffering from some kind of voice disorder. It is estimated that nearly 19% of the population have suffered from dysphonic voicing, so detecting pathological voices automatically is very important. Many classification methods have been used to detect pathological voices automatically, with good results. In this paper, we focus on the automatic detection...
Maximum likelihood linear regression (MLLR) is a widely used technique for speaker adaptation in large-vocabulary speech recognition systems. Recently, using MLLR transforms as features for SVM-based speaker recognition has been proposed, achieving performance comparable to that obtained with cepstral features. In this paper, we focus on calculating the transforms based on a GMM universal background...
Eigenvoice speaker adaptation has been shown to be effective in recent years. In this paper, we propose to use eigenvoice coefficients as features for speaker recognition. We use a simplified version of probabilistic subspace adaptation (PSA) to estimate the eigenvoice coefficients, which are then concatenated to construct supervectors for support vector machines. This approach significantly...
In this paper, we present a new modeling approach for speaker recognition, which uses a novel kind of phonotactic information as the feature for SVM modeling. Gaussian mixture models (GMMs) have proven extremely successful for text-independent speaker recognition. The GMM universal background model (UBM) is a speaker-independent model, each component of which can be considered to be modeling...
This paper describes a study of subjective criteria for evaluating the singing voice quality of untrained singers, focusing on the perceptual aspects that have relatively strong acoustic implications. The correlation among the individual perceptual criteria is also investigated. An SVM regression method is applied to determine the importance of each evaluation criterion. Experiments on a 200 singing...
This paper demonstrates a design approach for classifying the backend features of the PPRLM (Parallel Phone Recognition and Language Modeling) system. A variety of features and their combinations, extracted by language-dependent recognizers, were evaluated on the National Institute of Standards and Technology (NIST) Language Recognition Evaluation (LRE) 2003 corpus. Three well-known classifiers:...