Multi-Layer Perceptron (MLP) features extracted from different types of critical band energies (CRBE), derived from MFCC, GT, and PLP pipelines, are compared on a French broadcast news and conversational speech recognition task. Though the MLP structure is kept fixed, ROVER combination of the different CRBE-based systems leads to a 4% relative improvement. Furthermore, aiming at the combination of state-of-the-art...
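ROVER combines the outputs of several recognizers by aligning their hypotheses into a word transition network and voting slot by slot. A minimal sketch of the voting stage only, assuming the word sequences are already aligned and of equal length (real ROVER performs the dynamic-programming alignment first; the hypothesis lists below are invented):

```python
from collections import Counter

def majority_vote(hypotheses):
    """Per-slot majority vote over already-aligned, equal-length word
    sequences: a toy stand-in for ROVER's voting stage."""
    assert len({len(h) for h in hypotheses}) == 1, "sequences must be aligned"
    combined = []
    for slot in zip(*hypotheses):
        word, _count = Counter(slot).most_common(1)[0]
        combined.append(word)
    return combined

# Three system outputs for the same utterance (hypothetical):
h1 = ["the", "cat", "sat"]
h2 = ["the", "hat", "sat"]
h3 = ["the", "cat", "mat"]
print(majority_vote([h1, h2, h3]))  # -> ['the', 'cat', 'sat']
```

In practice the vote can also be weighted by each system's word confidence scores rather than by plain counts.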
Animals cannot communicate the different states of their being, such as a normal, hunger, or heat state, through semantics. However, they do produce vocalizations in different states. In this paper, we start from the hypothesis that the specific state of an animal can be identified by analyzing its vocal signals. We use a variety of spectral features for the purpose of identifying the type...
This paper describes an improved speaker diarization system for multiple distant microphone (MDM) meeting conversations. First, the new system includes a modified speech activity detector (SAD). Second, it adopts new spectral features based on the equivalent rectangular bandwidth (ERB) or Bark scale, which are compared with traditional Mel Frequency Cepstral Coefficient (MFCC) features. Third,...
This contribution reports experiments with different speech feature extraction methods and strategies, with the goal of improving the recognition rate of the speech recognizer in an automatic audio transcription system. The extraction of speech features is based on MFCC (Mel Frequency Cepstral Coefficients) and PLP (Perceptual Linear Prediction), which...
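The standard MFCC pipeline is pre-emphasis, framing and windowing, power spectrum, triangular mel filterbank, log, and a DCT. A self-contained NumPy sketch of that pipeline (typical but assumed parameter values: 16 kHz audio, 25 ms frames, 10 ms hop, 26 mel filters, 13 cepstra; this is an illustration, not any cited system's exact front-end):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, frame_len=400, hop=160,
         n_mels=26, n_ceps=13):
    # Pre-emphasis boosts high frequencies.
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Frame the signal and apply a Hamming window.
    n_frames = 1 + max(0, (len(sig) - frame_len) // hop)
    frames = np.stack([sig[i * hop: i * hop + frame_len]
                       for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    # Per-frame power spectrum.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank, equally spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)
    log_energy = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log filterbank energies.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps),
                                  (2 * n + 1) / (2 * n_mels)))
    return log_energy @ dct.T

# 0.1 s of synthetic audio (hypothetical test signal):
t = np.arange(1600) / 16000.0
feats = mfcc(np.sin(2 * np.pi * 440 * t))
print(feats.shape)  # -> (8, 13): frames x cepstral coefficients
```

PLP follows the same framing front half but replaces the mel filterbank and DCT with Bark-scale integration, equal-loudness pre-emphasis, and linear-prediction cepstra.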
Hidden factors such as gender characteristics play an important role in the performance of Bangla (also known as Bengali) automatic speech recognition (ASR). If there is a suppression process that represses the decrease of differences in acoustic likelihood among categories resulting from gender factors, a robust ASR system can be realized. In our previous paper, we proposed a technique of gender effects...
In this paper, Wavelet-Based Mel Frequency Cepstral Coefficient (WMFCC) features are proposed for speaker verification. The performance of WMFCC features is evaluated and compared with that of Mel Frequency Cepstral Coefficient (MFCC) features. A database of ten Hindi digits spoken by sixteen speakers is used in the simulations. Gaussian Mixture Models (GMMs) are used for maximum log...
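In GMM-based speaker verification, a claimant's feature frames are typically scored by their average per-frame log-likelihood under a speaker model (and compared against a background model). A minimal sketch of diagonal-covariance GMM scoring with log-sum-exp for stability; all model parameters below are invented for illustration:

```python
import numpy as np

def diag_gmm_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood of frames X (frames x dims)
    under a diagonal-covariance GMM."""
    X = np.atleast_2d(X)
    d = X.shape[1]
    # log N(x | mu_k, diag(var_k)) for every frame/component pair.
    diff2 = (X[:, None, :] - means[None, :, :]) ** 2 / variances[None, :, :]
    log_gauss = -0.5 * (diff2.sum(-1)
                        + np.log(variances).sum(-1)[None, :]
                        + d * np.log(2 * np.pi))
    # log sum_k w_k N(...) via log-sum-exp for numerical stability.
    a = np.log(weights)[None, :] + log_gauss
    m = a.max(axis=1, keepdims=True)
    frame_ll = (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()
    return frame_ll.mean()

# Toy 1-D model with two components (hypothetical parameters):
w = np.array([0.5, 0.5])
mu = np.array([[0.0], [5.0]])
var = np.array([[1.0], [1.0]])
score = diag_gmm_loglik(np.array([[0.1], [4.9]]), w, mu, var)
```

Verification then accepts the claimed identity when the speaker-model score exceeds the background-model score by a tuned threshold.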
This paper presents a hybrid Multilayer Neural Network-based Bangla phoneme recognition method for Automatic Speech Recognition (ASR) incorporating dynamic parameters. The method consists of four stages: at the first stage, a multilayer neural network (MLN) converts acoustic features, mel frequency cepstral coefficients (MFCCs), into phoneme probabilities. Phoneme probabilities from the first...
This paper presents an alternative to the Mel Frequency Cepstral Coefficient (MFCC)-based method of feature extraction for robust text-independent speaker identification. This work focuses on increasing the identification accuracy without increasing the size and complexity of the filter bank. The drive for this new feature extraction technique comes from a transformation based on the Nyquist...
Speaker-specific characteristics play an important role in the performance of Bangla (also known as Bengali) automatic speech recognition (ASR). It is difficult to recognize speech affected by gender factors, especially when an ASR system contains only a single acoustic model. If there exists any suppression process that represses the decrease of differences in acoustic likelihood among categories...
This paper presents a Neural Network-based Bangla phoneme recognition method for Automatic Speech Recognition (ASR). The method consists of three stages: at the first stage, a multilayer neural network (MLN) converts acoustic features, mel frequency cepstral coefficients (MFCCs), into phoneme probabilities, and the second stage computes velocity (Δ) coefficients from the phoneme probabilities by using...
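Velocity (delta) coefficients are conventionally computed with a regression over a few neighboring frames, Δ_t = Σ_{n=1..N} n·(c_{t+n} − c_{t−n}) / (2 Σ_{n=1..N} n²). A short sketch of that standard formula (with edge frames padded by repetition; the window size N=2 is a common but assumed choice):

```python
import numpy as np

def delta(features, N=2):
    """Velocity (delta) coefficients by the standard regression formula,
    applied along the time axis of a (frames x dims) array."""
    denom = 2 * sum(n * n for n in range(1, N + 1))
    # Repeat the first and last frames so edges get a full window.
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    out = np.zeros_like(features, dtype=float)
    for n in range(1, N + 1):
        out += n * (padded[N + n: N + n + len(features)]
                    - padded[N - n: N - n + len(features)])
    return out / denom

# A linearly increasing trajectory has constant velocity 1 away from edges:
traj = np.arange(8, dtype=float).reshape(-1, 1)
d = delta(traj)
```

Acceleration (ΔΔ) coefficients are obtained by applying the same operator to the delta stream.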
The selection of speech features for speech recognition has been investigated for languages other than Arabic. The Arabic language has its own characteristics; hence, some speech features may be better suited for Arabic speech recognition than others. In this paper, some feature extraction techniques are explored to find the features that give the highest speech recognition rate. Our investigation...
This paper investigates a noisy Tibetan speech recognition algorithm based on a wavelet neural network (WNN) combined with auditory features. A recognition classifier based on the WNN was designed, and the Mel Frequency Cepstral Coefficient (MFCC) feature was used. The algorithm was then simulated under different signal-to-noise ratios (SNRs), and the results illustrated...
This paper presents the performance of deaf speech recognition using hidden Markov models. Even persons with perfect nasal and oral cavities cannot produce sounds if they are deaf, since they cannot hear anything. If deafness is detected early, a speech therapist can help them reproduce sounds to the greatest extent possible. Depending on their degree of hearing loss, they are classified as deaf, profoundly deaf...
For the task of detecting shouted speech in a noisy environment, this paper introduces a system based on mel frequency cepstral coefficient (MFCC) feature extraction, unsupervised frame dropping and Gaussian mixture model (GMM) classification. The evaluation material consists of phonemically identical speech and shouting as well as environmental noise of varying levels. The performance of the shout...
In this paper, we propose a novel parts-based, binary-valued feature for ASR. This feature is extracted using boosted ensembles of simple threshold-based classifiers, each of which looks at a specific pair of time-frequency bins on the spectro-temporal plane. These features, termed Boosted Binary Features (BBF), are integrated into a standard HMM-based system by using a multilayer perceptron...
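Each base classifier in such an ensemble is essentially a decision stump over one pair of spectro-temporal bins. A toy sketch of that idea, assuming a fire/not-fire rule on the energy difference of the two bins; the bin indices, thresholds, and spectrogram values below are illustrative, not learned by boosting:

```python
import numpy as np

def bin_pair_stump(spec, t1, f1, t2, f2, threshold):
    """One threshold classifier over a pair of time-frequency bins:
    +1 when the energy difference exceeds the threshold, else -1."""
    return 1 if spec[t1, f1] - spec[t2, f2] > threshold else -1

def bbf_vector(spec, stump_params):
    """Binary feature vector: one bit per stump in the ensemble."""
    return np.array([bin_pair_stump(spec, *p) for p in stump_params])

# Toy log-spectrogram (time x frequency), hypothetical values:
spec = np.array([[1.0, 3.0],
                 [0.5, 2.0]])
params = [(0, 1, 1, 0, 1.0),   # 3.0 - 0.5 = 2.5 > 1.0 -> +1
          (1, 0, 0, 1, 0.0)]   # 0.5 - 3.0 = -2.5, not > 0 -> -1
print(bbf_vector(spec, params))  # -> [ 1 -1]
```

In the actual method, boosting selects which bin pairs and thresholds to use, and the resulting binary vector feeds the MLP.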
In this paper, a voice recognition algorithm based on HMMs (Hidden Markov Models) is analyzed in detail. The HMM voice recognition algorithm is explained, and the importance of the voice information DB for improving the voice recognition rate is shown. The feature vector of each voice characteristic parameter is chosen by means of MFCC (Mel Frequency Cepstral Coefficients). The extracting...
This paper proposes a robust and automated applause detection algorithm for meeting speech. The features used in the proposed algorithm are short-time autocorrelation features, such as the autocorrelation energy decay factor, the amplitude and lag of the first local minimum, and the zero-crossing points, extracted from the autocorrelation sequence of a windowed audio signal. We apply decision thresholds for...
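The named quantities can all be read off one frame's normalized autocorrelation sequence. A sketch of plausible definitions; these are one reasonable reading of the feature names in the abstract, not the authors' exact formulas, and the test frame is synthetic:

```python
import numpy as np

def autocorr_features(frame):
    """Short-time autocorrelation features of one windowed frame:
    (energy decay factor, lag and amplitude of the first local
    minimum, zero-crossing count of the autocorrelation sequence)."""
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac = ac / (ac[0] + 1e-12)          # normalize so ac[0] == 1
    # Decay factor: ratio of late to early autocorrelation mass.
    half = len(ac) // 2
    decay = ac[half:].sum() / (ac[:half].sum() + 1e-12)
    # First local minimum of the sequence.
    min_lag, min_amp = 0, ac[0]
    for k in range(1, len(ac) - 1):
        if ac[k] < ac[k - 1] and ac[k] <= ac[k + 1]:
            min_lag, min_amp = k, ac[k]
            break
    # Zero crossings of the autocorrelation sequence.
    zc = int(np.sum(np.signbit(ac[:-1]) != np.signbit(ac[1:])))
    return decay, min_lag, float(min_amp), zc

# Windowed sinusoid frame, period 32 samples (hypothetical input):
t = np.arange(256)
feat = autocorr_features(np.hamming(256) * np.sin(2 * np.pi * t / 32))
```

For a periodic signal the first local minimum falls near half the pitch period; noisy applause yields a fast-decaying, weakly periodic sequence, which is what the thresholds exploit.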
The efficiency of a speech recognition system in a noise-free environment is impressive, but in the presence of environmental noise it deteriorates drastically. Environmental noise also affects human-to-human and human-to-machine communication, degrading both speech quality and intelligibility. Here, a speech recognition system is proposed in the presence...
Recently, several multi-layer perceptron (MLP)-based front-ends have been developed and used for Mandarin speech recognition, often showing significant complementary properties to conventional spectral features. Although these front-ends are widely used in multiple Mandarin systems, no systematic comparison of the different approaches or of their scalability has been presented. The novelty of this correspondence...
In automatic speech recognition systems, diagonal-covariance GMM-based CDHMM modeling is commonly used, so a suitable feature transformation is needed to decorrelate the input feature vectors and satisfy the diagonal-GMM assumption. In this paper, we introduce the use of several supervised linear feature transformations in speech recognition tasks. Specifically, each of these methods has particular...
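Fisher LDA is one common supervised linear transformation of this kind: it fits a projection from labeled frames and then applies it to every feature vector. A minimal NumPy sketch, shown as one representative choice (HLDA or MLLT would follow the same fit-then-project shape); the two-class toy data below is synthetic:

```python
import numpy as np

def lda_transform(X, y, n_out):
    """Fisher LDA: project onto the leading eigenvectors of
    inv(Sw) @ Sb, where Sw/Sb are within/between-class scatter."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mu)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Solve the generalized eigenproblem (Sw lightly regularized).
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw + 1e-6 * np.eye(d), Sb))
    order = np.argsort(-vals.real)
    W = vecs[:, order[:n_out]].real
    return X @ W, W

# Two well-separated synthetic classes in 3-D (hypothetical data):
rng = np.random.default_rng(0)
Xa = rng.normal([0.0, 0.0, 0.0], 0.5, size=(50, 3))
Xb = rng.normal([3.0, 0.0, 0.0], 0.5, size=(50, 3))
X = np.vstack([Xa, Xb])
y = np.array([0] * 50 + [1] * 50)
Z, W = lda_transform(X, y, 1)
```

In an ASR front-end, the classes would typically be HMM states or phonemes, and the projected features would replace (or augment) the original spliced vectors before diagonal-GMM training.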