Recently, a hybrid deep neural network (DNN)/i-vector framework has proven effective for speaker verification; in it, a DNN trained to predict tied-triphone states (senones) produces the frame alignments used for sufficient-statistics extraction. In this work, to better understand the impact of phonetic precision on speaker verification tasks, three levels of phonetic granularity...
The i-vector approach has become the state of the art for text-independent recognition. Most related work treats i-vector extraction as a black box, relying on open-source software (e.g. Kaldi, Alize) and focusing on vector-based back-end algorithms such as length normalization, WCCN, or PLDA. In this paper, we study the variational method and present a concise derivation for the i-vector...
This paper presents a statistical modeling framework, termed PRISM, for text-independent speaker verification. We decompose the verification task into three subtasks: PRobability density estimation, Information metric and Subspace/Manifold learning (PRISM). Subsequently, we take advantage of variational maximum likelihood estimation, Fisher information metric and discriminant locality preserving...
i-Vector modeling has been shown to be effective for text-independent speaker verification. It represents each utterance as a low-dimensional vector using factor analysis with a GMM supervector. In order to capture more complex speaker statistics, this paper proposes a new neural-network-based feature representation, as an alternative to i-vectors, for speaker verification. In this work, stacked bottleneck features...
A novel method based on a statistical manifold is presented for text-independent speaker recognition. After feature extraction, speaker recognition becomes a sequence classification problem. Once time information is discarded, the core task is the comparison of multiple sample sets. Each set is assumed to be governed by a probability density function (PDF). We estimate the PDFs and place the estimated...
Channel variability is the major cause of performance degradation in text-independent speaker verification. Compensation techniques in the feature, model, or score domain have been widely applied to baseline systems to mitigate mismatch. The newly proposed Gaussian mixture model supervector-support vector machine (GMM-SVM or GSV-SVM) baseline system has proven successful through integrating advantages of...
This paper reports on a novel feature, the auditory cepstrum coefficient (ACC) with vocal tract length normalization (VTLN), for language identification (LID). The ACC feature is based on the auditory characteristics of the human ear, and VTLN compensates for speaker variability. The detailed implementation of the ACC feature with VTLN in the frequency domain is given. Experimental results show that...