In this paper, we apply Locality Sensitive Discriminant Analysis (LSDA) to a speaker verification system for intersession variability compensation. Unlike LDA, which fails to discover the local geometrical structure of the data manifold, LSDA finds a projection that maximizes the margin between i-vectors from different speakers in each local area. Since the number of samples varies in a wide...
The current approaches for spoken language recognition (LR) are predominantly based on the GMM mean supervector as the representation of utterances. It is assumed that the language information lies in a linear manifold of low-dimensional spaces. Exploiting this, low-dimensional projections of the GMM mean supervectors, known as i-vectors, are derived using a total variability matrix. The i-vector...
In speech processing, the speech signal is usually processed frame by frame because speech is non-stationary. In this paper, a frame smoothing method based on frequency-domain averaging is proposed. Besides the conventional frame shift, we introduce a short time shift to create several frames around the current frame. Then we take the average of the power spectra of these frames. The average...
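The smoothing step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the parameters `short_shift` and `n_side`, the symmetric placement of the extra frames, and the absence of a window function are all assumptions, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum |X[k]|^2 for k = 0..N/2 (pure Python, O(N^2))."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spec.append(abs(acc) ** 2)
    return spec


def smoothed_power_spectra(signal, frame_len, frame_shift, short_shift, n_side):
    """For each conventional frame, average the power spectra of up to
    2 * n_side + 1 frames offset by multiples of short_shift (assumed layout)."""
    out = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        spectra = []
        for k in range(-n_side, n_side + 1):
            s = start + k * short_shift
            # Skip offsets that would fall outside the signal; k = 0 always fits.
            if s >= 0 and s + frame_len <= len(signal):
                spectra.append(power_spectrum(signal[s:s + frame_len]))
        # Average bin by bin across the neighbouring frames.
        out.append([sum(col) / len(spectra) for col in zip(*spectra)])
    return out
```

For a stationary sinusoid the shifted frames share the same power spectrum, so averaging leaves it unchanged; for noisy speech the averaging reduces the variance of the spectral estimate.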
Language identification systems combining i-vectors estimated from different acoustic feature spaces have recently been shown to be superior to i-vector systems based on a single acoustic feature space. Specifically, i-vectors estimated using MFCC and PLP front-ends were concatenated prior to using LDA to obtain a combined i-vector. In this work, we investigate the scalability of this i-vector concatenation...
The present work investigates the importance of excitation source features for language identification (LID). The linear prediction residual (LPR) represents the excitation source signal. By processing the LPR at the sub-segmental, segmental and supra-segmental levels, we can get the language specific information present within a glottal cycle, within a sequence of a few glottal cycles and at the prosody...
We recently proposed the use of coefficients extracted from the 2D discrete cosine transform (DCT) of log Mel filter bank energies to improve speaker recognition over the traditional Mel frequency cepstral coefficients (MFCC) with appended deltas and double deltas (MFCC/deltas). Selection of relevant coefficients was shown to be crucial, resulting in the proposal of a zig-zag parsing strategy. While...
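As a rough illustration of zig-zag parsing, the sketch below orders the entries of a 2D coefficient matrix along anti-diagonals in the JPEG style and keeps the first few. The exact traversal used in the paper is not given in this excerpt, so the JPEG-style order, the function names, and the parameter `n_keep` are assumptions.

```python
def zigzag_indices(rows, cols):
    """Return (row, col) pairs in JPEG-style zig-zag order (assumed traversal)."""
    order = []
    for d in range(rows + cols - 1):
        # All cells on anti-diagonal d, listed with the row index increasing.
        diag = [(r, d - r) for r in range(rows) if 0 <= d - r < cols]
        # Even diagonals are walked with the row index decreasing.
        if d % 2 == 0:
            diag.reverse()
        order.extend(diag)
    return order


def select_zigzag(coeffs, n_keep):
    """Keep the first n_keep entries of a 2D coefficient matrix in zig-zag order."""
    idx = zigzag_indices(len(coeffs), len(coeffs[0]))[:n_keep]
    return [coeffs[r][c] for r, c in idx]
```

Walking the matrix this way visits low-order coefficients in both dimensions first, which is why a truncated zig-zag scan is a common way to select a compact subset of 2D DCT coefficients.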
Limited data speaker verification has shown its significance in practical system oriented applications. The paper shows the importance of different aspects of voice source features for the limited test data scenario. A baseline speaker verification system using conventional mel frequency cepstral coefficient (MFCC) features is developed, and its performance under the limited test data condition (≤10 s) is evaluated...
This paper evaluates the performance of the twelve primary systems submitted to the evaluation on speaker verification in the context of a mobile environment using the MOBIO database. The mobile environment provides a challenging and realistic test-bed for current state-of-the-art speaker verification techniques. Results in terms of equal error rate (EER), half total error rate (HTER) and detection...
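The two scalar metrics named in this abstract can be computed from lists of genuine and impostor scores as sketched below. The sketch assumes that a trial is accepted when its score is at least the threshold and that the EER is approximated by sweeping the observed scores as candidate thresholds; the function names and the toy scores are hypothetical, and how the evaluation itself selects the HTER threshold is not stated in this excerpt.

```python
def far_frr(genuine, impostor, threshold):
    """False acceptance and false rejection rates (accept if score >= threshold)."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def hter(genuine, impostor, threshold):
    """Half total error rate: the mean of FAR and FRR at a fixed threshold."""
    far, frr = far_frr(genuine, impostor, threshold)
    return (far + frr) / 2


def eer(genuine, impostor):
    """Approximate equal error rate: the threshold where |FAR - FRR| is smallest."""
    def gap(t):
        far, frr = far_frr(genuine, impostor, t)
        return abs(far - frr)

    best = min(sorted(set(genuine) | set(impostor)), key=gap)
    far, frr = far_frr(genuine, impostor, best)
    return (far + frr) / 2, best


# Toy scores, purely illustrative.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]
rate, threshold = eer(genuine_scores, impostor_scores)
```

A common usage pattern is to pick the threshold on a development set, for example at its EER point, and then report HTER on a separate test set with that threshold held fixed.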
In this work, we have investigated the performance of 2D Gabor features (known as spectro-temporal features) for speaker recognition. Gabor features have been used mainly for automatic speech recognition (ASR), where they have yielded improvements. We explored different Gabor feature implementations, along with different speaker recognition approaches, on the ROSSI [1] and NIST SRE08 databases. Using...
The work presented in this paper is an extension of our two previous works [1, 2]. In the first paper [1], we proposed a low dimensional feature (i-vectors) extractor which is suitable for both telephone and microphone data of the NIST speaker recognition evaluation dataset. The second paper [2] introduces the use of Probabilistic Linear Discriminant Analysis (PLDA) framework with a heavy tailed distribution...
Sparse representations of signals have received a great deal of attention in recent years, and the sparse representation classifier has very recently appeared in a speaker recognition system. This approach represents the (sparse) GMM mean supervector of an unknown speaker as a linear combination of an over-complete dictionary of GMM supervectors of many speaker models, and ℓ1-norm minimization results...
Acoustic feature extraction from speech is a fundamental part of both automatic speech recognition and automatic speaker recognition. Mel-frequency cepstral coefficients (MFCCs) are widely used in both research directions. A new feature extraction technique named perceptual MVDR-based cepstral coefficients (PMCCs) has been demonstrated to perform better in automatic speech recognition...
Speaker diarization for meeting data has recently been converging towards multistream systems. The most common complementary features used in combination with MFCC are Time Delay of Arrival (TDOA) features. Other features have also been proposed, although no improvements on top of MFCC+TDOA systems have been reported. In this work we investigate the combination of other feature sets along with MFCC+TDOA. We discuss...
The objective of this work is to demonstrate the significant speaker information present in the subband energies of the Linear Prediction (LP) residual. The LP residual mostly contains the excitation source information. The subband energies extracted using the mel filterbank followed by cepstral analysis provide a compact representation. The resulting cepstral values are termed Residual-mel Frequency...
The popular mel-frequency cepstral coefficients (MFCCs) capture a mixture of speaker-related, phonemic and channel information. Speaker-related information could be further broken down according to articulatory criteria. How these underlying components are exactly mixed in the features is not well understood. To this end, in this paper we aim at separating the spectra of glottal source and vocal tract...
The following article shows how a state-of-the-art speaker diarization system can be improved by combining traditional short-term features (MFCCs) with prosodic and other long-term features. First, we present a framework to study the speaker discriminability of 70 different long-term features. Then, we show how the top-ranked long-term features can be combined with short-term features to increase...
In this paper, the fusion of two speaker recognition subsystems, one based on frequency modulation (FM) and another on MFCC features, is reported. The motivation for their fusion was to improve the recognition accuracy across different types of channel variations, since the two features are believed to contain complementary information. It was found that the MFCC-based subsystem outperformed the FM-based...
In this paper we use acoustic and prosodic features jointly in a long-temporal lexical context for automatic speaker recognition from speech. The contours of pitch, energy and cepstral coefficients are continuously modeled over the time span of a syllable to capture the speaking style at the phonetic level. As these features are affected by session variability, established channel compensation techniques...
This paper reports on a novel feature, the auditory cepstrum coefficient (ACC) with vocal tract length normalization (VTLN), for language identification (LID). The ACC feature is based on the auditory characteristics of the human ear, and the VTLN technology compensates for speaker variability. The detailed implementation of the ACC feature with VTLN in the frequency domain is given. Experimental results show that...