In this paper, a text-independent speaker recognition system based on Gaussian mixture models (GMM) is developed, with a specific focus on the use of a voice activity detector (VAD) algorithm in training and testing. At the training level, a modified expectation-maximization (EM) algorithm is used. It is less prone to getting trapped around a local maximum, and so it has a better chance of converging...
We analyze the theoretical vulnerability of maximum a posteriori (MAP) speaker adaptation, which is widely used in practical speaker recognition systems. First, we prove that there exists a set of feature vectors, called wolves, that can impersonate almost all the registered speakers with probability asymptotically close to 1 in at most two trials. Second, our experiment shows that the...
For text-independent short-utterance speaker recognition (SUSR), the performance often degrades dramatically. This paper presents a combination approach to the SUSR tasks with two phonetic-aware systems: one is the DNN-based i-vector system and the other is our recently proposed subregion-based GMM-UBM system. The former employs phone posteriors to construct an i-vector model in which the shared statistics...
The popular i-vector model represents speakers as low-dimensional continuous vectors (i-vectors), and hence it is a way of continuous speaker embedding. In this paper, we investigate binary speaker embedding, which transforms i-vectors to binary vectors (codes) by a hash function. We start from locality sensitive hashing (LSH), a simple binarization approach where binary codes are derived from a set...
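The binarization idea in this abstract can be illustrated with a minimal sketch. It assumes the common random-hyperplane form of LSH, in which each bit is the sign of a projection onto a random Gaussian direction; the truncated abstract does not confirm that this is the exact variant used. The names `make_projections`, `binarize`, and `hamming` are hypothetical, and the short list stands in for a real i-vector.

```python
import random


def make_projections(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian directions of length dim (assumed LSH variant)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def binarize(vector, projections):
    """Map a continuous vector to a binary code: one sign bit per projection."""
    code = []
    for row in projections:
        dot = sum(w * x for w, x in zip(row, vector))
        code.append(1 if dot >= 0.0 else 0)
    return code


def hamming(code_a, code_b):
    """Count the positions at which two binary codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


# Toy stand-in for an i-vector; real i-vectors have hundreds of dimensions.
projections = make_projections(dim=4, n_bits=16)
vec = [0.3, -1.2, 0.8, 0.1]
code = binarize(vec, projections)
scaled_code = binarize([2.0 * x for x in vec], projections)
print(hamming(code, scaled_code))
```

Because each bit depends only on the sign of a projection, the code is unchanged by positive rescaling of the input, so the Hamming distance between codes reflects the angle between vectors rather than their length.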
In this paper, a pathological voice dataset (PVD) is introduced. The dataset contains recordings of 14 speakers (9 female and 5 male) in two health states: normal and unhealthy. Each speaker pronounces fixed words and prompted digits, reads sentences, and speaks freely. These materials cover all the phonemes in Chinese. The dataset also takes channel variability into account and is recorded through...
Probabilistic linear discriminant analysis (PLDA) is a popular normalization approach for the i-vector model, and has delivered state-of-the-art performance in speaker recognition. A potential problem of the PLDA model, however, is that it essentially assumes Gaussian distributions over speaker vectors, which is not always true in practice. Additionally, the objective function is not directly related...
Person identification is a very important task for intelligent devices when communicating or interacting with humans. A potential problem in real applications is that the amount of enrollment data is insufficient. When multiple modalities are available, it is possible to re-train the system online by exploiting the conditional independence between the modalities and thus improving classification accuracy...
The paper presents speaker verification results for six basic emotional states. A database of emotional speech (six acted states: anger, sadness, happiness, fear, disgust, surprise, plus the neutral state) was examined with a typical speaker verification system based on MFCC features and GMM classifiers. The obtained results were compared with the subjective and objective emotion recognition scores...
Little research has been conducted on Uyghur speaker recognition. Among the limited works, researchers usually collect small speech databases and publish results based on their own private data. This ‘closed-door evaluation’ makes most of the publications questionable. This paper publishes an open and free speech database, THUYG-20 SRE, and a benchmark for Uyghur speaker recognition. The database is based...
Recently we proposed a new State-GMM-supervector extractor for solving the problem of text-dependent speaker recognition. We demonstrated that segmenting the passphrase into word states for supervector extraction makes it possible to create more accurate statistical models of speech signals and to achieve a reduction in EER compared to the best state-of-the-art systems of text-dependent verification...
In this paper, we investigate the influence of language on text-independent speaker recognition. For this purpose, we have used several automatic text-independent speaker recognition methods (Multivariable Auto-Regression, Vector Quantization and Histogram Classifiers). To measure the effect of the language, we have applied these methods to the POLY-COST 250 multi-language database. Among...
In this paper, we propose a frame selection procedure for text-independent speaker identification. Instead of averaging the frame likelihoods along the whole test utterance, some of these are rejected (pruning) and the final score is computed with a limited number of frames. This pruning stage requires a prior frame-level likelihood normalization in order to make comparisons between frames meaningful...
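The pruning idea described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the normalization used here (subtracting the per-frame mean log-likelihood across speakers) and the rule of keeping the highest-scoring fraction of frames are both assumptions, since the truncated abstract does not specify either. The names `normalize_frames`, `pruned_score`, `identify`, and `keep_fraction` are hypothetical.

```python
import math


def normalize_frames(loglikes_by_speaker):
    """Subtract the per-frame mean across speakers (assumed normalization)."""
    speakers = list(loglikes_by_speaker)
    n_frames = len(loglikes_by_speaker[speakers[0]])
    normalized = {s: [] for s in speakers}
    for t in range(n_frames):
        frame_mean = sum(loglikes_by_speaker[s][t] for s in speakers) / len(speakers)
        for s in speakers:
            normalized[s].append(loglikes_by_speaker[s][t] - frame_mean)
    return normalized


def pruned_score(frame_scores, keep_fraction=0.8):
    """Average only the best-scoring fraction of frames; the rest are pruned."""
    if not frame_scores or not 0.0 < keep_fraction <= 1.0:
        raise ValueError("need frames and a keep_fraction in (0, 1]")
    n_keep = max(1, math.ceil(keep_fraction * len(frame_scores)))
    kept = sorted(frame_scores, reverse=True)[:n_keep]
    return sum(kept) / n_keep


def identify(loglikes_by_speaker, keep_fraction=0.8):
    """Return the speaker whose normalized, pruned score is highest."""
    normalized = normalize_frames(loglikes_by_speaker)
    return max(normalized, key=lambda s: pruned_score(normalized[s], keep_fraction))


# Toy per-frame log-likelihoods; speaker "a" has one badly matching frame.
scores = {"a": [-1.0, -1.0, -1.0, -9.0], "b": [-3.0, -3.0, -3.0, -3.0]}
print(identify(scores, keep_fraction=0.75))
```

In this toy example the plain averages of the two speakers are equal, so the decision depends entirely on whether the single outlier frame is discarded.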
This paper presents a new approach and a study of the GMM-SVM system for text-dependent speaker recognition in the fixed pass-phrase scenario. The uniform-split content-based GMM-SVM system is proposed and applied to text-dependent speaker evaluation. We conducted a detailed study of the proposed method compared to the baseline GMM-SVM system on the RSR2015 database, which has been designed and collected...
Statistical pattern recognition has been considered one of the most successful approaches in the recent advancement of speech and speaker recognition. Among these approaches, Hidden Markov Models, Gaussian mixture models and Vector Quantization have been considered among the most successful techniques with regard to the performance of speaker recognition systems. However, the performance...
The popular i-vector approach to speaker recognition represents a speech segment as an i-vector in a low-dimensional space. It is well known that i-vectors involve both speaker and session variances, and therefore additional discriminative approaches are required to extract speaker information from the ‘total variance’ space. Among various methods, the probabilistic linear discriminant analysis (PLDA)...
In this work we focus on Emarati speaker identification systems in neutral talking environments based on each of Vector Quantization (VQ), Gaussian Mixture Models (GMMs), and Hidden Markov Models (HMMs) as classifiers. These systems have been tested on our collected Emarati speech database which is composed of 25 male and 25 female Emarati speakers using Mel-Frequency Cepstral Coefficients (MFCCs)...
In this work we propose, implement, and evaluate novel models called Third-Order Hidden Markov Models (HMM3s) to improve the low performance of text-independent speaker identification in shouted talking environments. The proposed models have been tested on our collected speech database using Mel-Frequency Cepstral Coefficients (MFCCs). Our results demonstrate that HMM3s significantly improve speaker identification...
Automatic speaker recognition (ASR) is a well-investigated area among speech processing researchers. Many of the factors that affect the recognition rate of an ASR system are addressed in the literature. These factors include clean and noisy environments, cross channel, session variability, and the age and health of a speaker. One of the factors that still needs investigation is the effect of type of...
The I-vector approach to speaker recognition has become the prevalent paradigm over the past 2 years, showing top performance in NIST evaluations. This success is due mainly to the capability of the I-vector to capture and compress the speaker characteristics in a low dimension and to the subsequent channel compensation techniques that minimize channel variability. The Linear Discriminant Analysis (LDA)...
Short speech duration remains a critical factor of performance degradation when deploying a speaker verification system. To overcome this difficulty, a large number of commercial applications impose the use of fixed pass-phrases. In this context, we show that the performance of the popular i-vector approach can be greatly improved by taking advantage of the phonetic information that they convey. Moreover,...