The robustness of speech recognizers to noise can be increased by normalizing the statistical moments of the Mel-frequency cepstral coefficients (MFCCs), e.g. by using cepstral mean normalization (CMN) or cepstral mean and variance normalization (CMVN). The necessary statistics are estimated over a long time window; often, a complete utterance is chosen. Consequently, changes in the background...
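The per-utterance normalization mentioned in this abstract can be illustrated with a minimal sketch. It assumes the MFCCs are already available as a NumPy array of shape (frames, coefficients) and that the statistics are taken over the complete utterance. The function names `cmn` and `cmvn` and the `eps` guard are illustrative choices, not taken from the paper.

```python
import numpy as np


def cmn(features):
    """Cepstral mean normalization: subtract the per-coefficient mean
    estimated over the whole utterance (axis 0 = time frames)."""
    features = np.asarray(features, dtype=float)
    return features - features.mean(axis=0, keepdims=True)


def cmvn(features, eps=1e-8):
    """Cepstral mean and variance normalization: zero mean and unit
    variance per coefficient over the whole utterance.

    eps is an assumed small constant that avoids division by zero for
    coefficients that are constant across the utterance.
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)
```

Because the statistics here come from the full utterance, this sketch cannot follow changes in the background within that utterance, which is the limitation the abstract goes on to raise.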
The speech of cleft palate (CP) patients has typical characteristics, primarily hypernasality and low speech intelligibility. In this work, an algorithm for the automatic evaluation of different levels of hypernasality and speech intelligibility in CP speech is proposed, in order to provide an objective tool for speech therapists. To identify different levels of hypernasality,...
Previous work has shown that spectro-temporal features reduce the word error rate for automatic speech recognition under noisy conditions. These systems, however, required significant hand-tuning in order to determine which spectral and temporal modulations should be included in a particular stream. In this work, streams are split into one spectral and temporal modulation each and their posterior...
Hidden factors such as gender characteristics play an important role in the performance of Bangla (widely known as Bengali) automatic speech recognition (ASR). If there is a suppression process that represses the decrease of differences in acoustic likelihood among categories resulting from gender factors, a robust ASR system can be realized. In our previous paper, we proposed a technique of gender effects...
Speaker-specific characteristics play an important role in the performance of Bangla (widely known as Bengali) automatic speech recognition (ASR). It is difficult to recognize speech affected by gender factors, especially when an ASR system contains only a single acoustic model. If there exists any suppression process that represses the decrease of differences in acoustic likelihood among categories...
This paper presents a neural network-based Bangla phoneme recognition method for Automatic Speech Recognition (ASR). The method consists of three stages: in the first stage, a multilayer neural network (MLN) converts acoustic features, mel frequency cepstral coefficients (MFCCs), into phoneme probabilities; the second stage computes velocity (Δ) coefficients from the phoneme probabilities by using...
In this paper, we extend the work done on integrating multilayer perceptron (MLP) networks with HMM systems via the Tandem approach. In particular, we explore whether the use of Deep Belief Networks (DBN) adds any substantial gain over MLPs on the Aurora2 speech recognition task under mismatched noise conditions. Our findings suggest that DBNs outperform single layer MLPs under the clean condition,...
Aspiration is an important phonemic feature in several Indian languages. Unlike English, languages such as Marathi have lexicons in which words with different meanings differ only in the aspiration feature of the initial voiced or unvoiced stop. Thus the reliable discrimination of aspirated stops from their unaspirated counterparts is important in automatic speech recognition for such languages. The...
This paper describes several sound-packet segmentation techniques that facilitate Automatic Speech Recognition (ASR) for Bangla speech signals. The approximate duration of a sound-packet has been determined, and an envelope-detection method has been presented to determine the end-points of sound-packets. The 1st difference method, based on moving average of 1st difference of the signal, is then...
This paper describes a Bangla phoneme recognition method for Automatic Speech Recognition (ASR). The method consists of two stages: i) a multilayer neural network (MLN), which converts acoustic features, mel frequency cepstral coefficients (MFCCs), into phoneme probabilities and ii) the phoneme probabilities obtained from the first stage and corresponding Δ and ΔΔ are inserted into another MLN to...
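As an illustration of how Δ and ΔΔ streams can be derived from a sequence of per-frame phoneme probabilities, here is a minimal sketch. The abstracts do not say how the coefficients are computed, so the standard regression formula over a window of ±2 frames is an assumption, as are the function name `delta` and the edge-padding choice.

```python
import numpy as np


def delta(features, window=2):
    """Regression-based delta coefficients (assumed formula):

        d[t] = sum_{n=1..W} n * (x[t+n] - x[t-n]) / (2 * sum_{n=1..W} n^2)

    features: array of shape (frames, dims), e.g. phoneme probabilities.
    Frames beyond the sequence ends are filled by repeating the edge frame.
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    padded = np.pad(features, ((window, window), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    out = np.zeros_like(features)
    for n in range(1, window + 1):
        ahead = padded[window + n : window + n + num_frames]
        behind = padded[window - n : window - n + num_frames]
        out += n * (ahead - behind)
    return out / denom


def stack_with_deltas(features, window=2):
    """Concatenate the static stream with its Δ and ΔΔ along the feature axis."""
    d = delta(features, window)
    dd = delta(d, window)
    return np.hstack([np.asarray(features, dtype=float), d, dd])
```

In the two-stage method described above, a stacked matrix like the one returned by the hypothetical `stack_with_deltas` helper would be the input to the second MLN.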
Auditory based front-ends for speech recognition have been compared before, but this paper focuses on two of the most promising algorithms for noise robustness in automatic speech recognition (ASR). The feature sets are Zero-Crossings with Peak Amplitudes (ZCPA) and the recently introduced Power-Law Nonlinearity and Power-Bias Subtraction (PNCC). Standard Mel-Frequency Cepstral Coefficients (MFCC)...
This paper describes a way of designing a modulation filter by data-driven analysis, which improves the performance of automatic speech recognition systems that operate in real environments. The filter for each nonlinear channel output is obtained by a constrained optimization process which jointly minimizes the environmental distortion as well as the distortion caused by the filter itself. Recognition...