The tradeoff between noise reduction and speech distortion is a key concern in designing noise reduction algorithms. We have proposed a regularization framework for noise reduction that explicitly accounts for this tradeoff. We regard speech estimation as a functional approximation problem in a reproducing kernel Hilbert space (RKHS). In the estimation, the objective function is formulated to...
Although automatic speech recognition (ASR) has been successfully used in several applications, it still lacks robustness and precision, especially in harsh environments where the input speech is of low quality. Robust error correction for ASR outputs thus becomes important in addition to improving recognition performance. In recent approaches to error correction, linguistic or domain information is...
This paper presents an automatic speaker verification system based on a hybrid GMM-SVM model operating in a real environment. An important step in speaker verification is extracting the features that best characterize the speaker. Mel-Frequency Cepstral Coefficients (MFCC) and their first and second derivatives are commonly used as acoustic features for speaker verification. To reduce the high dimensionality...
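The first and second derivatives mentioned in this abstract are usually computed with a regression formula over a short window of frames. The sketch below is a generic illustration of that standard delta/delta-delta computation, not the paper's implementation; the window half-width `N=2` and the 13-coefficient MFCC shape are assumptions for the example.

```python
import numpy as np

def delta(features, N=2):
    """First-derivative (delta) coefficients over time.

    features: (T, D) array of frame-level coefficients (e.g. MFCCs).
    Standard regression formula:
        d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)
    Frames past the edges are replicated (edge padding).
    """
    T = features.shape[0]
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    deltas = np.zeros_like(features, dtype=float)
    for t in range(T):
        for n in range(1, N + 1):
            # padded index t + N corresponds to original frame t
            deltas[t] += n * (padded[t + N + n] - padded[t + N - n])
    return deltas / denom

# Second derivatives (delta-deltas) are the delta of the deltas.
mfcc = np.random.default_rng(0).normal(size=(100, 13))  # toy MFCC matrix
d1 = delta(mfcc)
d2 = delta(d1)
combined = np.hstack([mfcc, d1, d2])  # 39-dimensional feature vectors
```

Stacking the statics with both derivative orders is what yields the familiar 39-dimensional acoustic feature vector.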
In this article the text-independent speaker verification problem is considered. In the presented system each conversation side is represented as a vector lying on the unit hypersphere. These vectors are compared by an inner product which produces similarity scores. In this article classical score normalization methods (z-norm and t-norm) are analyzed and compared with the support vector machines...
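The comparison described above, inner products of vectors on the unit hypersphere, is cosine scoring, and z-norm rescales a raw score by impostor statistics. A minimal generic sketch of both ideas (not the article's system; the example vectors are made up):

```python
import numpy as np

def unit(v):
    """Project a vector onto the unit hypersphere."""
    return v / np.linalg.norm(v)

def score(enroll, test):
    """Inner product of length-normalized vectors = cosine similarity."""
    return float(np.dot(unit(enroll), unit(test)))

def z_norm(raw, impostor_scores):
    """Z-normalization: center and scale a raw score by the mean and
    standard deviation of impostor scores gathered for the enrolled
    model (t-norm does the analogous thing per test utterance,
    scoring it against a cohort of impostor models)."""
    mu, sigma = np.mean(impostor_scores), np.std(impostor_scores)
    return (raw - mu) / sigma

spk = np.array([0.2, -1.1, 0.5])
same = score(spk, 2.0 * spk)  # scale-invariant: close to 1.0
```

Length normalization makes the inner product insensitive to overall vector magnitude, which is exactly why the representation lives on the unit hypersphere.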
In this article a text-independent speaker verification problem is considered. After feature extraction, each conversation side is represented as a vector in a fixed-dimensional space. To reduce the influence of utterance length and of channel properties, various vector normalization techniques have been selected from the literature, modified, and tested. Additionally,...
In text-independent speaker recognition, Support Vector Machines (SVMs) equipped with sequence kernels have been widely used. In this paper, a generic structure for sequence kernels is formulated, and within this structure we make an analytical comparison between two widely used sequence kernel systems, the GMM Supervector Kernel (GSK) and the Generalized Linear Discriminant Sequence (GLDS) kernel, showing...
The standard support vector machine (SVM) is a common machine learning method, and its parameter selection directly affects learning performance. At present there is no uniform approach to choosing SVM parameters. To avoid this difficult selection problem, this paper uses a variant of the SVM, namely ν-SVM, and selects the parameters of ν-SVM by particle...
Combining multiple low-level visual features is a proven and effective strategy for a range of computer vision tasks. However, limited attention has been paid to combining such features with information from other modalities, such as audio and videotext, for large scale analysis of web videos. In our work, we rigorously analyze and combine a large set of low-level features that capture appearance,...
Four multiclass Support Vector Machine (SVM) methods were designed for the task of speaker-independent phoneme recognition: All-at-once, One-against-all, One-against-one, and the Directed Acyclic Graph SVM (DAGSVM). Power percentages of eight Discrete Wavelet Transform (DWT) frequency bands are used for feature extraction. All tests were carried out on the TIMIT database. Comparable recognition...
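Of the multiclass strategies listed above, One-against-one trains a binary classifier per class pair and combines their decisions by majority vote. The sketch below illustrates only that voting scheme with toy nearest-center "classifiers" standing in for trained SVMs; the three classes and their centers are invented for the example.

```python
def one_vs_one_predict(x, pairwise_classifiers, classes):
    """Majority vote over pairwise binary decisions (One-against-one).

    pairwise_classifiers: dict mapping a class pair (a, b) to a callable
    that returns either a or b for input x. Ties go to the first class
    encountered (a real system would break ties by decision values).
    """
    votes = {c: 0 for c in classes}
    for clf in pairwise_classifiers.values():
        votes[clf(x)] += 1
    return max(votes, key=votes.get)

# Toy stand-ins for trained binary SVMs: pick the nearer class center.
centers = {0: 0.0, 1: 5.0, 2: 10.0}

def make_pair(a, b):
    return lambda x: a if abs(x - centers[a]) < abs(x - centers[b]) else b

clfs = {(a, b): make_pair(a, b)
        for a in centers for b in centers if a < b}

print(one_vs_one_predict(4.2, clfs, list(centers)))  # prints 1
```

For K classes this scheme needs K(K-1)/2 binary classifiers, versus K for One-against-all; DAGSVM uses the same pairwise classifiers but evaluates only K-1 of them per test point.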
This paper proposes a novel method to assess Thai speech based on fractal analysis. The fractal algorithm selected, Higuchi's method, was used to evaluate the fractal dimension (FD) of segmented speech signals. To capture how the FD of the waveform changes over time, the time-dependent FD (TDFD) was proposed. The probability distribution of TDFDs, obtained by kernel density estimation, was used as an additional parameter...
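Higuchi's method, named in the abstract above, estimates a signal's fractal dimension from how average curve length shrinks as the signal is subsampled. A generic sketch of the estimator follows (not the paper's code; the maximum delay `kmax=8` is an assumed choice):

```python
import numpy as np

def higuchi_fd(x, kmax=8):
    """Higuchi's fractal dimension estimate for a 1-D signal.

    For each delay k, build k decimated sub-series, compute their
    normalized curve lengths, and average to get L(k). Since
    L(k) ~ k^(-FD), FD is the slope of log L(k) versus log(1/k).
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    lengths = []
    for k in range(1, kmax + 1):
        Lk = []
        for m in range(k):
            idx = np.arange(m, N, k)          # sub-series x[m], x[m+k], ...
            if len(idx) < 2:
                continue
            diff = np.abs(np.diff(x[idx])).sum()
            # normalization factor from Higuchi's definition
            Lk.append(diff * (N - 1) / ((len(idx) - 1) * k))
        lengths.append(np.mean(Lk) / k)
    k = np.arange(1, kmax + 1)
    slope, _ = np.polyfit(np.log(1.0 / k), np.log(lengths), 1)
    return slope
```

A straight line gives FD close to 1, while white noise gives FD close to 2; speech frames fall in between, which is what makes the time-dependent FD a usable descriptor.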
In this paper, we analyze the effect of a channel compensation technique on support vector machine (SVM) based speaker verification performance and compare it with another well-known speaker modeling approach, Gaussian Mixture Models with a universal background model (GMM-UBM). Experiments conducted on the NIST 2002 SRE show that channel compensation considerably improves speaker verification accuracy.
Speech feature extraction has been a key focus in robust speech recognition research, since it significantly affects recognition performance. In this paper, we first study a set of different feature extraction methods such as linear predictive coding (LPC), mel frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP), together with several feature normalization techniques including RASTA...
User authentication is critical to ensuring that only authorized users can access restricted resources. A voiceprint can be used as a unique password to prove a user's identity. In this paper, we propose a text-dependent speaker verification system for the Arabic language. The paper advocates the use of discrete representation of speech signals in terms of Mel-frequency cepstral coefficients...
This paper proposes a novel direction-of-arrival estimation method in a general 3-dimensional array configuration for multiple speech signals uttered simultaneously. The method is based on sparseness in the time-frequency representation of speech signal and is applicable to an underdetermined case where the sources outnumber sensors. At first, we introduce a parameterized closed surface to which we...
In recent years, there have been significant advances in the field of speaker recognition that have resulted in very robust recognition systems. The primary focus of many recent developments has shifted to the problem of recognizing speakers in adverse conditions, e.g., in the presence of noise or reverberation. In this paper, we present the UMD-JHU speaker recognition system applied on the NIST 2010 SRE...
Variable bit-rate coding introduced for effective utilization of limited communication bandwidth requires accurate classification of input signals. This paper investigates implementation of a support vector machine (SVM)-based speech/music classifier in the selectable mode vocoder (SMV) framework, which is a standard codec adopted by the Third-Generation Partnership Project 2 (3GPP2). A support vector...
In this paper we present a fast unsupervised spoken term detection system based on lower-bound Dynamic Time Warping (DTW) search on Graphical Processing Units (GPUs). The lower-bound estimate and the K nearest neighbor DTW search are carefully designed to fit the GPU parallel computing architecture. In a spoken term detection task on the TIMIT corpus, a 55x speed-up is achieved compared to our previous...
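The lower-bound DTW search mentioned above works because a cheap bound lets most candidates be discarded without running full DTW. The sketch below illustrates the widely used LB_Keogh bound together with band-constrained DTW on the CPU; it is a generic illustration of the pruning idea, not the paper's GPU implementation, and it assumes equal-length 1-D sequences and an assumed band half-width `r`.

```python
import numpy as np

def lb_keogh(query, candidate, r):
    """LB_Keogh lower bound on band-constrained DTW distance.

    Sums squared excursions of the candidate outside the query's
    upper/lower envelope (built over a window of half-width r).
    Cheap (O(n*r)) and never exceeds the true banded DTW distance,
    so candidates with lb > best-so-far can be pruned safely.
    Assumes len(query) == len(candidate).
    """
    lb = 0.0
    for i, c in enumerate(candidate):
        window = query[max(0, i - r): i + r + 1]
        lo, hi = window.min(), window.max()
        if c > hi:
            lb += (c - hi) ** 2
        elif c < lo:
            lb += (lo - c) ** 2
    return lb

def dtw(a, b, r):
    """DTW with a Sakoe-Chiba band of half-width r (squared distance)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(m, i + r) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

In a search loop, `lb_keogh` is evaluated first and the expensive `dtw` call is made only when the bound is below the current best distance; the paper's contribution is mapping exactly this kind of pipeline onto GPU parallelism.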
We propose a novel method of efficiently searching very large populations of speakers, tens of thousands or more, using an utterance comparison model proposed in a previous work. The model allows much more efficient comparison of utterances compared to the traditional Gaussian Mixture Model (GMM)-based approach because of its computational simplicity while maintaining high accuracy. Furthermore, efficiency...
This paper presents a multiple kernel learning (MKL) approach to speech/music discrimination (SMD). The time-frequency representation (spectrogram) of an audio segment, computed by the short-time Fourier transform (STFT), is decomposed by the wavelet packet transform into different subband levels. The subbands, which contain rich texture information, are used as features for this discrimination problem. MKL...
This paper shows that pattern classification based on machine learning is a powerful tool to analyze human brain activity data obtained by magnetoencephalography (MEG). We propose a new weighting method using a multiple kernel learning (MKL) algorithm to localize the brain area contributing to the accurate vowel discrimination. Our MKL simultaneously estimates both the classification boundary and...