In enclosed environments where robots are deployed, the observed speech signal is smeared by reverberation. This degrades the performance of automatic speech recognition (ASR). Thus, hands-free speech recognition for human-machine communication is a difficult task. Most speech enhancement techniques used to address this problem enhance the contaminated waveform independently of that of the...
Active Learning (AL) is designed to aid the labor-intensive process of training acoustic models for speech recognition. In AL, only the most informative training samples are selected for manual annotation. Thus, how to evaluate the unlabeled samples is worth researching. In this paper, we propose a unified framework to generate confusion networks of multiple levels including character, syllable and...
This paper addresses the design and implementation of automatic speaker verification (ASV) systems. There is great interest in developing ASV applications and increasing their performance, given the advantages they offer compared to other biometric methods. State-of-the-art speaker recognizers are based on statistical models such as GMM, HMM, SVM, ANN or hybrid models. This work...
This paper compares three different approaches currently used in recognizing contact calls made by the North Atlantic Right Whale (NRW), Eubalaena glacialis. We present two new approaches consisting of machine learning algorithms based on artificial neural networks (NET) and the classification and regression tree classifiers (CART), and compare their performance with earlier work that employs multi-Stage...
Autonomous signal detection of the North Atlantic right whale (NRW), Eubalaena glacialis, is becoming an important factor in monitoring and conservation for this highly endangered species. Both online and offline systems exist to help study and protect animals within this population. In both cases auto-detection of species-specific calls plays a vital role in localizing individual animals by searching...
This paper investigates Bayesian Ying-Yang (BYY) learning for speech recognition via hidden Markov models (HMMs) based on Gaussian mixture models (GMMs). A two-level procedure is proposed in which the hidden Markov level is still trained under the maximum likelihood principle by the Baum-Welch algorithm, while the GMM level is trained under the BYY best harmony. We propose a new batch-way EM-like Ying-Yang...
We propose an acoustic segment model (ASM) approach to incorporating temporal information into speaker modeling in text-independent speaker recognition. In training, the proposed framework first estimates a collection of ASM-based universal background models (UBMs). Multiple sets of speaker-specific ASMs are then obtained by adapting the ASM-based UBMs with speaker-specific enrollment data. A novel...
This paper describes the Arabic broadcast transcription system fielded by IBM in the GALE Phase 3.5 machine translation evaluation. Key advances compared to our Phase 2.5 system include improved discriminative training, the use of Subspace Gaussian Mixture Models (SGMM), neural network acoustic features, variable frame rate decoding, training data partitioning experiments, unpruned n-gram language...
This paper addresses the problem of discriminative training of language models that does not require any transcribed acoustic data. We propose to minimize the conditional entropy of word sequences given phone sequences, and present two settings in which this criterion can be applied. In an inductive learning setting, the phonetic/acoustic confusability information is given by a general phone error...
Although research has previously been done on multilingual speech recognition, it has been found to be very difficult to improve over separately trained systems. The usual approach has been to use some kind of “universal phone set” that covers multiple languages. We report experiments on a different approach to multilingual speech recognition, in which the phone sets are entirely distinct but the...
This paper addresses the training of classification trees for weakly labelled data. We call “weakly labelled data” a training set in which the prior labelling information is a vector indicating the probabilities for instances to belong to each class. Classification trees typically deal with hard-labelled data; in this paper a new procedure is suggested in order to train a tree from...
This paper focuses on the design of the acoustic module (AM) in order to improve the performance of a Mandarin TTS system. The AM is composed of the prosody generator, the spectrum generator, and the speech synthesizer. The HMM, recurrent neural network (RNN), and PSOLA algorithms are employed to build the AM. Finally, the performance analyses including the speech quality, memory requirement,...
This paper presents a sound source (talker) localization method using only a single microphone, where an HMM (Hidden Markov Model) of clean speech is introduced to estimate the acoustic transfer function from a user's position. The new method is able to carry out this estimation without measuring impulse responses. The frame sequence of the acoustic transfer function is estimated by maximizing the...
People usually consider recognition and retrieval as two cascaded independent modules for spoken term detection. Retrieval techniques were assumed to be applied on top of some ASR output, with performance depending on ASR accuracy. In this paper, we propose a new framework: to integrate the two parts into a single task. This can be achieved by adjusting the acoustic model parameters, borrowing the...
Choosing the kernel and error penalty parameters for a support vector machine (SVM) is very important for classifier performance. An improved grid-search algorithm is proposed to choose the optimal parameters of the SVM. The battlefield multi-target SVM classifier is designed using this algorithm. Also three classifiers including k-nearest neighbor classifier, improved BP neural network classifier...
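The abstract above does not say how its improved grid search works, so the following is only a minimal sketch of one common refinement, a coarse-to-fine search over log2(C) and log2(gamma). The function names, the exponent ranges, the step sizes, and the `score` callback (in practice, for example, cross-validated accuracy of an RBF-kernel SVM) are all assumptions for illustration, not the authors' algorithm.

```python
from itertools import product
from typing import Callable, List, Tuple


def frange(start: float, stop: float, step: float) -> List[float]:
    """Inclusive list of floats from start to stop in increments of step."""
    n = int(round((stop - start) / step))
    return [start + i * step for i in range(n + 1)]


def grid_search(score: Callable[[float, float], float],
                c_exps: List[float],
                g_exps: List[float]) -> Tuple[float, float, float]:
    """Evaluate score(C, gamma) on the grid; return (best_score, log2_C, log2_gamma)."""
    return max((score(2.0 ** c, 2.0 ** g), c, g) for c, g in product(c_exps, g_exps))


def coarse_to_fine_search(score: Callable[[float, float], float],
                          c_range: Tuple[float, float] = (-5.0, 15.0),   # assumed range
                          g_range: Tuple[float, float] = (-15.0, 3.0),   # assumed range
                          coarse_step: float = 2.0,
                          fine_step: float = 0.25) -> Tuple[float, float, float]:
    """Run a coarse pass, then a fine pass around the coarse optimum.

    Returns (C, gamma, best_score).
    """
    # Coarse pass over the whole exponent range.
    _, c0, g0 = grid_search(score,
                            frange(c_range[0], c_range[1], coarse_step),
                            frange(g_range[0], g_range[1], coarse_step))
    # Fine pass restricted to one coarse step on either side of the coarse optimum.
    best, c1, g1 = grid_search(score,
                               frange(c0 - coarse_step, c0 + coarse_step, fine_step),
                               frange(g0 - coarse_step, g0 + coarse_step, fine_step))
    return 2.0 ** c1, 2.0 ** g1, best
```

The two-pass design evaluates far fewer parameter pairs than a single fine grid over the full range, at the risk of missing a narrow optimum that the coarse grid steps over.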
Real-time classification using the acoustic and seismic signals generated by vehicles on a road ramp has important applications. Eight test points were placed on both sides of a road ramp, and acoustic and seismic sensors, among other devices, were installed at each point. After acquisition of the acoustic and seismic signals, short-time Fourier transform (STFT) was...
In this paper, we investigate some specific acoustic problems of the computer assisted language learning (CALL) system by modifying the acoustic model and features under the speech recognition framework. First, in order to alleviate channel and speaker distortion, speaker-dependent Cepstrum Mean Normalization (Speaker CMN) is adopted, by which the average correlation coefficient (ACC) between...
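To illustrate the speaker-dependent cepstral mean normalization mentioned above, here is a minimal sketch that subtracts each speaker's mean cepstral vector from all of that speaker's frames. The function name `speaker_cmn` and the input layout (a dict mapping a speaker ID to a list of utterances, each a list of cepstral frames) are assumptions for illustration and not the authors' implementation.

```python
from typing import Dict, List

Frame = List[float]        # one cepstral feature vector
Utterance = List[Frame]    # a sequence of frames


def speaker_cmn(data: Dict[str, List[Utterance]]) -> Dict[str, List[Utterance]]:
    """Subtract each speaker's mean cepstral vector from all of that speaker's frames."""
    normalized: Dict[str, List[Utterance]] = {}
    for speaker, utterances in data.items():
        # Pool every frame of this speaker across all utterances.
        frames = [frame for utt in utterances for frame in utt]
        if not frames:
            normalized[speaker] = [[] for _ in utterances]
            continue
        dim = len(frames[0])
        # Per-dimension mean over the pooled frames.
        mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
        # Remove the speaker mean from each frame, keeping the utterance structure.
        normalized[speaker] = [
            [[x - m for x, m in zip(frame, mean)] for frame in utt]
            for utt in utterances
        ]
    return normalized
```

Pooling frames per speaker rather than per utterance gives a more stable mean estimate for short utterances, at the cost of requiring speaker labels.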
Vehicle classification is an important task for various traffic monitoring applications. This paper investigates the capabilities of acoustic feature generation for vehicle classification. Six temporal and spectral features are extracted from the audio recordings. Six different classification algorithms are compared using the extracted features. We focus on a single sensor setting to keep the computational...
Accurately modeling the acoustic variability caused by coarticulation is important in continuous speech recognition. Recent research indicates that syllable units model intra-syllable coarticulation effects better than sub-syllable units. However, most continuous Mandarin speech recognition systems use context dependent phones or initial/finals (IFs) as the basic acoustic unit because it...
To meet the requirement of class incremental learning in acoustic fault identification research, a network model using a novel self-organizing map, the negative self-organizing map (NSOM), and a probabilistic neural network (PNN) is proposed. An experiment on acoustic fault identification for an underwater vehicle shows that the proposed network has better class incremental learning capability than traditional...