Combining multiple low-level visual features is a proven and effective strategy for a range of computer vision tasks. However, limited attention has been paid to combining such features with information from other modalities, such as audio and videotext, for large-scale analysis of web videos. In our work, we rigorously analyze and combine a large set of low-level features that capture appearance,...
Four multiclass Support Vector Machine (SVM) methods were designed for the task of speaker-independent phoneme recognition: All-at-once, One-against-all, One-against-one, and the Directed Acyclic Graph SVM (DAGSVM). Power percentages from eight Discrete Wavelet Transform (DWT) frequency bands are used for feature extraction. All tests were carried out on the TIMIT database. Comparable recognition...
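The multiclass schemes named above differ mainly in how binary SVM decisions are combined into one class label. A minimal sketch of the two most common combination rules, assuming the binary decision values come from already-trained classifiers (the score list and pairwise-vote dictionary below are illustrative placeholders, not the paper's models):

```python
def one_against_all(scores):
    """One-against-all: scores[c] is the decision value of the binary
    'class c vs rest' SVM; pick the most confident classifier."""
    return max(range(len(scores)), key=lambda c: scores[c])

def one_against_one(pairwise):
    """One-against-one: pairwise[(i, j)] is +1 if the (i vs j) SVM votes
    for class i, else -1; the label with the most votes wins
    (ties broken toward the lower class index)."""
    classes = {c for pair in pairwise for c in pair}
    votes = {c: 0 for c in classes}
    for (i, j), sign in pairwise.items():
        votes[i if sign > 0 else j] += 1
    return max(votes, key=lambda c: (votes[c], -c))
```

The DAGSVM variant uses the same pairwise classifiers as one-against-one but evaluates them along a fixed decision graph, needing only K-1 evaluations for K classes instead of K(K-1)/2.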
This paper proposes a novel method for assessing Thai speech based on fractal analysis. A fractal algorithm, Higuchi's method, was selected to evaluate the fractal dimension (FD) of segmented speech signals. To capture how the FD of the waveform changes over time, the time-dependent FD (TDFD) was proposed. The probability distribution of TDFDs, obtained with kernel density estimation, was used as an additional parameter...
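Higuchi's method mentioned above has a compact recipe: subsample the signal at delays k = 1..kmax, average the normalized curve lengths over the k possible starting offsets, and take the FD as the slope of log L(k) against log(1/k). A minimal pure-Python sketch (the choice of kmax and the plain least-squares fit are generic defaults, not the paper's settings):

```python
import math

def higuchi_fd(x, kmax=8):
    """Estimate the fractal dimension of a 1-D signal with Higuchi's method."""
    n = len(x)
    log_k, log_l = [], []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):  # one curve per starting offset
            n_steps = (n - 1 - m) // k
            if n_steps < 1:
                continue
            dist = sum(abs(x[m + i * k] - x[m + (i - 1) * k])
                       for i in range(1, n_steps + 1))
            # Normalization maps the subsampled curve length back to
            # the scale of the full series.
            lengths.append(dist * (n - 1) / (n_steps * k * k))
        log_k.append(math.log(1.0 / k))
        log_l.append(math.log(sum(lengths) / len(lengths)))
    # Least-squares slope of log L(k) vs log(1/k) is the FD estimate.
    mk = sum(log_k) / len(log_k)
    ml = sum(log_l) / len(log_l)
    num = sum((a - mk) * (b - ml) for a, b in zip(log_k, log_l))
    den = sum((a - mk) ** 2 for a in log_k)
    return num / den
```

A straight line yields FD = 1 exactly, and rougher signals push the estimate toward 2; a TDFD curve is obtained by applying this estimator to a sliding window over the signal.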
In this paper, we analyze the effect of a channel compensation technique on support vector machine (SVM)-based speaker verification performance and compare it with another well-known speaker modeling approach, Gaussian Mixture Models with a universal background model (GMM-UBM). Experiments conducted on the NIST 2002 SRE show that channel compensation considerably improves speaker verification accuracy.
Speech feature extraction has been a key focus of robust speech recognition research, as it significantly affects recognition performance. In this paper, we first study a set of feature extraction methods, such as linear predictive coding (LPC), mel-frequency cepstral coefficients (MFCC), and perceptual linear prediction (PLP), together with several feature normalization techniques including RASTA...
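Of the features listed, LPC has the most self-contained derivation: for each frame, the predictor coefficients solve the autocorrelation normal equations, which the Levinson-Durbin recursion handles in O(order²). A sketch of that core step under the usual autocorrelation-method assumptions (no windowing, pre-emphasis, or normalization from the paper is reproduced):

```python
def lpc_coefficients(frame, order):
    """LPC via the autocorrelation method and Levinson-Durbin recursion.

    Returns (a, e): prediction coefficients a[1..order] such that
    x[n] is predicted by sum_j a[j] * x[n - j], and the residual energy e.
    """
    n = len(frame)
    # Autocorrelation lags r[0..order].
    r = [sum(frame[i] * frame[i + k] for i in range(n - k))
         for k in range(order + 1)]
    a = [0.0] * (order + 1)
    e = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this recursion step.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / e
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        e *= (1.0 - k * k)  # residual energy shrinks monotonically
    return a[1:], e
```

On a pure first-order autoregressive signal, an order-1 fit recovers the generating coefficient almost exactly, which is a convenient sanity check.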
User authentication is critical to ensuring that only authorized users are able to access restricted resources. A voiceprint can be used as a unique password to prove a user's identity. In this paper, we propose a text-dependent speaker verification system for the Arabic language. The paper advocates the use of a discrete representation of speech signals in terms of Mel-frequency cepstral coefficients...
This paper proposes a novel direction-of-arrival estimation method, for a general 3-dimensional array configuration, for multiple speech signals uttered simultaneously. The method is based on the sparseness of the time-frequency representation of speech signals and is applicable to the underdetermined case where the sources outnumber the sensors. First, we introduce a parameterized closed surface to which we...
In recent years, there have been significant advances in the field of speaker recognition that have resulted in very robust recognition systems. The primary focus of many recent developments has shifted to the problem of recognizing speakers in adverse conditions, e.g., in the presence of noise or reverberation. In this paper, we present the UMD-JHU speaker recognition system applied to the NIST 2010 SRE...
Variable bit-rate coding, introduced for effective utilization of limited communication bandwidth, requires accurate classification of input signals. This paper investigates the implementation of a support vector machine (SVM)-based speech/music classifier in the selectable mode vocoder (SMV) framework, a standard codec adopted by the Third Generation Partnership Project 2 (3GPP2). A support vector...
In this paper we present a fast unsupervised spoken term detection system based on lower-bound Dynamic Time Warping (DTW) search on Graphics Processing Units (GPUs). The lower-bound estimate and the K-nearest-neighbor DTW search are carefully designed to fit the GPU parallel computing architecture. In a spoken term detection task on the TIMIT corpus, a 55x speed-up is achieved compared to our previous...
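The lower-bound idea is admissible pruning: a cheap bound such as LB_Keogh is evaluated first, and the expensive DTW is run only when the bound cannot rule the candidate out. A CPU-side sketch with a Sakoe-Chiba band and equal-length sequences (the paper's GPU kernel design and its exact lower bound are not reproduced here):

```python
def dtw_distance(a, b, r=3):
    """DTW distance with a Sakoe-Chiba band of radius r (equal-length inputs)."""
    inf = float("inf")
    n = len(a)
    d = [[inf] * (n + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(n, i + r) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][n]

def lb_keogh(query, cand, r=3):
    """LB_Keogh: distance from cand to the band-r envelope of query.
    A valid lower bound on band-constrained DTW with the same radius."""
    lb = 0.0
    for j, c in enumerate(cand):
        window = query[max(0, j - r):j + r + 1]
        lo, hi = min(window), max(window)
        if c > hi:
            lb += c - hi
        elif c < lo:
            lb += lo - c
    return lb

def nearest(query, candidates):
    """1-NN search: skip full DTW whenever the lower bound already
    exceeds the best distance found so far."""
    best, best_i = float("inf"), -1
    for i, cand in enumerate(candidates):
        if lb_keogh(query, cand) >= best:
            continue  # pruned without running DTW
        d = dtw_distance(query, cand)
        if d < best:
            best, best_i = d, i
    return best_i, best
```

The pruning is exact, not approximate: because the bound never exceeds the true banded DTW distance, the nearest neighbor returned is identical to brute-force search.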
We propose a novel method for efficiently searching very large populations of speakers, tens of thousands or more, using an utterance comparison model proposed in a previous work. The model allows much more efficient comparison of utterances than the traditional Gaussian Mixture Model (GMM)-based approach because of its computational simplicity, while maintaining high accuracy. Furthermore, efficiency...
This paper presents a multiple kernel learning (MKL) approach to speech/music discrimination (SMD). The time-frequency representation (spectrogram) of an audio segment, computed by the short-time Fourier transform (STFT), is decomposed by the wavelet packet transform into different subband levels. The subbands, which contain rich texture information, are used as features for this discrimination problem. MKL...
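A wavelet packet transform differs from the ordinary wavelet transform in that detail bands are split further as well, giving 2^levels equal-width subbands whose energies can serve as texture-like features. A sketch using an unnormalized Haar split (the paper's actual wavelet, depth, and subband selection are not specified in this excerpt):

```python
def haar_step(x):
    """One unnormalized Haar analysis step: (approximation, detail),
    each at half length; a trailing odd sample is dropped."""
    a = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    return a, d

def wavelet_packet(x, levels):
    """Full wavelet packet tree: every node is split, not just the
    approximation branch, yielding 2**levels leaf subbands."""
    nodes = [x]
    for _ in range(levels):
        nxt = []
        for node in nodes:
            a, d = haar_step(node)
            nxt.extend([a, d])
        nodes = nxt
    return nodes

def subband_energies(x, levels=2):
    """Energy of each leaf subband, a simple feature vector per segment."""
    return [sum(v * v for v in band) for band in wavelet_packet(x, levels)]
```

For a constant signal all energy lands in the lowest (approximation) subband, which is a quick way to verify the decomposition.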
This paper shows that pattern classification based on machine learning is a powerful tool for analyzing human brain activity data obtained by magnetoencephalography (MEG). We propose a new weighting method using a multiple kernel learning (MKL) algorithm to localize the brain area contributing to accurate vowel discrimination. Our MKL simultaneously estimates both the classification boundary and...
An important task in Music Information Retrieval is content-based similarity retrieval, in which, given a query music track, a set of tracks that are similar in musical content is retrieved. A variety of audio features that attempt to model different aspects of the music have been proposed. In most cases, the resulting audio feature vector used to represent each music track is high-dimensional...
In this paper, we present a new method to de-noise speech in the complex spectral domain. The method is derived from kernel principal component analysis (kPCA). Instead of applying PCA in a high-dimensional feature space and then going back to the original input space by using a solution to the pre-image problem, only the pre-image step is applied for de-noising. We show that the de-noised audio sample...
This paper describes a voice quality control method for statistical esophageal speech enhancement. Esophageal speech is produced by one of the alternative speaking methods available to laryngectomees. Its naturalness and intelligibility are much lower than those of natural voices, and its voice quality sounds similar even when uttered by different laryngectomees. These issues are alleviated by a statistical voice...
In this work, we explore the use of sparse representation of GMM mean-shifted supervectors over a learned dictionary for the speaker verification (SV) task. In this method, the dictionaries are learned using the KSVD algorithm, unlike recently proposed SV methods that employ sparse representation classification (SRC) over exemplar dictionaries. The proposed approach with a learned dictionary results...
Emotion recognition from speech has been a very active research topic in pattern recognition. In this paper, we investigate the use of the kernel reduced-rank regression (KRRR) model to address the problem of emotion recognition from speech. KRRR is a nonlinear extension of the linear reduced-rank regression (RRR) model via the kernel trick, in which a kernel mapping is used for the multivariable of RRR...
A classification system that accurately categorizes caller behavior within Interactive Voice Response systems would assist in developing good automated self-service applications. This paper details the implementation of such a classification system for a pay-beneficiary application. Adaptive Neuro-Fuzzy Inference System (ANFIS), feedforward Artificial Neural Network (ANN), and Support Vector Machine...
In this paper, we investigate the enhancement of speech by applying a kernel adaptive filter. Noise removal is very important in many applications, such as telephone conversation and speech recognition. Kernel methods have shown good results in other applications, such as handwriting recognition and inverse distance weighting. To improve speech quality and intelligibility, we can process the signals...
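The excerpt does not name a specific kernel adaptive filter, so as an illustration here is kernel least-mean-squares (KLMS), one of the simplest members of the family: each incoming sample becomes a kernel center, and the prediction is a growing kernel expansion whose coefficients are scaled instantaneous errors. The step size and kernel width below are arbitrary demo values, not settings from the paper:

```python
import math

def gaussian_kernel(u, v, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

class KLMS:
    """Kernel least-mean-squares adaptive filter (minimal sketch,
    without the sparsification used in practical implementations)."""

    def __init__(self, eta=0.5, gamma=1.0):
        self.eta, self.gamma = eta, gamma
        self.centers, self.alphas = [], []

    def predict(self, x):
        # Kernel expansion over all stored centers (0.0 when empty).
        return sum(a * gaussian_kernel(c, x, self.gamma)
                   for c, a in zip(self.centers, self.alphas))

    def update(self, x, d):
        """One online step: observe input x and desired output d."""
        e = d - self.predict(x)           # instantaneous error
        self.centers.append(x)            # new kernel center
        self.alphas.append(self.eta * e)  # its coefficient
        return e
```

Repeatedly presenting the same input-output pair makes the error decay geometrically by a factor of (1 - eta), which is an easy way to check the update rule.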