This paper proposes a new noise-robust speech recognition method. Several noise reduction methods have been developed and applied under various noise conditions. However, for utterances with similar pronunciations, it is still difficult to achieve high recognition accuracy. In this paper, a new processing algorithm applied to the speech modulation spectrum is proposed...
Mobile communications are greatly affected by environmental noise, which can cause a significant deterioration in automatic speech recognition (ASR) system performance. In this paper, we present a new framework integrating a noise-robust front-end into distributed speech recognition (DSR) systems. Using the Aurora-2 speech database, the authors evaluate the development of an additional feature set...
This paper presents the use of lip-reading and Thai speech to control electronic devices in a vehicle. The Viola-Jones algorithm detects the face of the driver and the constrained local model detects their mouth area before three lips features are extracted. Hidden Markov models are utilized to recognize speech and lip movement, with the lip movement recognizer offering better accuracy than the speech...
This paper presents an approach for improving the perceptual quality of speech separated from background noise at low signal-to-noise ratios. Our approach uses two stages of deep neural networks, where the first stage estimates the ideal ratio mask that separates speech from noise, and the second stage maps the ratio-masked speech to the clean speech activation matrices that are used for nonnegative...
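The ideal ratio mask mentioned in the abstract above assigns each time-frequency unit a gain between 0 and 1. A minimal sketch, assuming oracle per-unit speech and noise powers are available (the paper's first-stage DNN estimates this mask rather than computing it from oracle powers):

```python
import math

def ideal_ratio_mask(speech_power, noise_power):
    # Fraction of each time-frequency unit's energy attributed to speech:
    # IRM = sqrt(S / (S + N)) for per-unit powers S and N.
    return math.sqrt(speech_power / (speech_power + noise_power))

# toy per-unit powers (hypothetical values)
clean_unit = ideal_ratio_mask(9.0, 0.0)  # pure speech -> mask 1.0
noisy_unit = ideal_ratio_mask(1.0, 3.0)  # mostly noise -> mask 0.5
```

Applying the mask multiplies each noisy time-frequency magnitude by this gain, attenuating noise-dominated units.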
Previously, we applied distribution equalization to our HIerarchical Spectro-Temporal (HIST) features, using distributions estimated from the histogram of one or several utterances. Although a performance increase was observed in both cases, the improvement was small when the distribution was estimated from a single utterance. The aim here is to determine a parametric distribution from...
Unvoiced-voiced portions of cochannel speech contain considerable amounts of both voiced and unvoiced speech and play a significant role in separation. Motivated by recent developments in separation of speech from nonspeech noise, we propose a classification-based approach for unvoiced-voiced speech separation. A new feature set consisting of pitch-based features and gammatone frequency cepstral coefficients...
Gaussian mixture models (GMMs) have proven effective in modeling speech and other acoustic signals. In this study, we have used GMMs to model different noise sources, viz. subway, babble, car and exhibition. The expectation-maximization algorithm has been implemented to fit the models. Further, we present the ‘threshold’ method, which uses the energy coefficient of the Mel-frequency cepstral coefficients...
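As a rough illustration of the expectation-maximization fit mentioned above, here is a one-dimensional, two-component toy GMM trainer in plain Python (not the paper's noise models; the data and component count are arbitrary):

```python
import math

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, k=2, iters=50):
    # Initialize means spread over the data range, unit variances, uniform weights.
    lo, hi = min(data), max(data)
    mus = [lo + (hi - lo) * (i + 1) / (k + 1) for i in range(k)]
    vars_ = [1.0] * k
    ws = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [ws[j] * gaussian_pdf(x, mus[j], vars_[j]) for j in range(k)]
            s = sum(p)
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            ws[j] = nj / len(data)
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            vars_[j] = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            vars_[j] = max(vars_[j], 1e-6)  # floor to avoid variance collapse
    return ws, mus, vars_

# two well-separated toy clusters near 0 and 5
weights, means, variances = em_gmm_1d([0.0, 0.1, -0.1, 5.0, 5.1, 4.9])
```

In practice one would fit multivariate GMMs on MFCC vectors per noise source; the structure of the E- and M-steps is the same.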
Some principles of cyclic feature-based signal detection and classification are described. α-profiles of the spectral coherence and the spectral correlation density (SCD) are considered for this purpose. The theoretical SCD α-profile of an OFDM/QAM signal with a cyclostationary signature is shown.
Research on a noisy Tibetan speech recognition algorithm based on a wavelet neural network (WNN) combined with auditory features was carried out in this paper. A recognition classifier based on the WNN was designed, and the Mel-frequency cepstral coefficient (MFCC) feature was given. Simulations of the given algorithm were then run under different signal-to-noise ratios (SNRs), and the results illustrated...
For the task of detecting shouted speech in a noisy environment, this paper introduces a system based on mel frequency cepstral coefficient (MFCC) feature extraction, unsupervised frame dropping and Gaussian mixture model (GMM) classification. The evaluation material consists of phonemically identical speech and shouting as well as environmental noise of varying levels. The performance of the shout...
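The MFCC front-ends used in the abstracts above start from the mel frequency warp. A minimal sketch of the HTK-style conversion and mel-spaced filter placement (the band count below is an arbitrary choice for illustration):

```python
import math

def hz_to_mel(f_hz):
    # HTK-style mel scale: mel = 2595 * log10(1 + f / 700).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# centre/edge frequencies of a mel-spaced filterbank between 0 Hz and 8 kHz
n_filters = 10  # hypothetical band count
edges = [mel_to_hz(hz_to_mel(8000.0) * i / (n_filters + 1))
         for i in range(n_filters + 2)]
```

Spacing the filter edges uniformly on the mel axis makes them dense at low frequencies and sparse at high frequencies, mimicking auditory resolution.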
The method which is called the “tandem approach” in speech recognition has been shown to increase performance by using classifier posterior probabilities as observations in a hidden Markov model. We study the effect of using visual tandem features in audio-visual speech recognition using a novel setup which uses multiple classifiers to obtain multiple visual tandem features. We adopt the approach...
The efficiency of speech recognition systems in noise-free environments is impressive, but in the presence of environmental noise it deteriorates drastically. Environmental noise also affects human-to-human and human-to-machine communication, degrading both speech quality and intelligibility. Here, a speech recognition system is proposed in the presence...
When discrete Hidden-Markov-Model (HMM)-based recognition is performed, vector quantization (VQ) is used to transform continuous observations into sequences of discrete symbols. After VQ, the quantization error is not spread equally among the features. This impairs feature significance, which is important when features are selected, e.g. by applying Sequential Forward Selection (SFS). In...
We recently proposed a new algorithm to perform acoustic model adaptation to noisy environments, called Linear Spline Interpolation (LSI). In this method, the nonlinear relationship between clean and noisy speech features is modeled using linear spline regression. Linear spline parameters that minimize the error between the predicted noisy features and the actual noisy features are learned from...
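A linear spline is simply a piecewise-linear function through a set of knots. A toy evaluator, assuming sorted knot abscissae (the knot values here are made up; the paper learns the spline parameters by minimizing prediction error, which is not shown):

```python
import bisect

def linear_spline(x, knots_x, knots_y):
    # Evaluate the piecewise-linear function through (knots_x[i], knots_y[i]).
    # knots_x must be sorted; inputs outside the range are clamped to the ends.
    if x <= knots_x[0]:
        return knots_y[0]
    if x >= knots_x[-1]:
        return knots_y[-1]
    i = bisect.bisect_right(knots_x, x) - 1
    t = (x - knots_x[i]) / (knots_x[i + 1] - knots_x[i])
    return knots_y[i] + t * (knots_y[i + 1] - knots_y[i])

# toy clean-to-noisy feature mapping with three knots
y = linear_spline(0.5, [0.0, 1.0, 2.0], [0.0, 2.0, 3.0])  # -> 1.0
```

Each segment between adjacent knots contributes one linear piece, which is what lets the spline approximate the nonlinear clean-to-noisy mapping segment by segment.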
One of the weaknesses of speech recognition systems is their lack of robustness to background noise compared with human listeners under similar conditions. This paper proposes a 2D psychoacoustic modeling algorithm integrated with a feature extraction front-end for hidden Markov models (HMMs). The proposed algorithm incorporates properties of the human auditory system and applies them to the...
Phoneme recognition is an essential component of any robust speech decoder and has been tackled by many researchers. Speech feature extraction constitutes the front end module of any speech decoder: it plays an essential role and has a strong impact on the recognition performance. The research community is aggressively searching for more powerful solutions which combine the existing feature extraction...
In this paper, we motivate and introduce a novel vector quantization (VQ) scheme for distributing the quantization error among the quantized features of a continuous feature vector in a predefined manner. This is done by defining ratios between the individual quantization errors of the features and shaping the Voronoi cells accordingly. In a series of experiments we show that the novel approach is...
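The per-feature error imbalance discussed in the two VQ abstracts above is easy to see in a plain nearest-neighbour quantizer. A toy 2-D sketch with a made-up codebook (not the paper's error-shaping scheme, which deliberately reshapes the Voronoi cells):

```python
def quantize(vec, codebook):
    # Map a feature vector to its nearest codeword (squared Euclidean distance).
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(codebook, key=lambda c: sqdist(vec, c))

codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # toy 2-D codebook
v = (0.2, 0.9)
q = quantize(v, codebook)
per_feature_err = [abs(x - y) for x, y in zip(v, q)]  # unequal across features
```

Here the first feature absorbs twice the quantization error of the second; the proposed scheme instead fixes the ratios between these per-feature errors in advance.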
Transcription of music is the process of generating a symbolic representation such as a score sheet or a MIDI file from an audio recording of a piece of music. A statistical machine learning approach for detecting note onsets in polyphonic piano music is presented. An area from the spectrogram of the sound is concatenated into one feature vector. A cascade of boosted classifiers is used for dimensionality...
We consider the problem of word boundary detection in spontaneous speech utterances. Acoustic features have been well explored in the literature in the context of word boundary detection; however, in spontaneous speech of Switchboard-I corpus, we found that the accuracy of word boundary detection using acoustic features is poor (F-score ~ 0.63). We propose a new feature - that captures lexical cues...
A robust speech feature extraction method based on the power law of hearing and a non-uniform spectral compression technique is proposed, and the corresponding model compensation algorithm is given. The mismatch functions, reflecting the effects of additive noise and spectral compression, and the model compensation formulae are derived. The experimental results show that a significant improvement...