This paper is concerned with emotion recognition from speech signals. The Linear Prediction (LP) residual mainly contains source-specific emotional information; it is derived by inverse filtering of the speech signal. To characterize the basic emotions, the LP residual is explored at the sub-segmental, segmental and supra-segmental levels. Gaussian mixture models (GMMs)...
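The inverse-filtering step that yields the LP residual can be sketched as follows. This is a minimal illustration that assumes the autocorrelation method with the Levinson-Durbin recursion and applies no windowing or pre-emphasis. The function names and the default order of 10 are hypothetical choices and are not taken from the paper.

```python
def autocorrelation(s, max_lag):
    """r[k] = sum_n s[n] * s[n-k] for k = 0..max_lag (autocorrelation method)."""
    n = len(s)
    return [sum(s[i] * s[i - k] for i in range(k, n)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return [1, a1, ..., ap] for A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    a = [1.0] + [0.0] * order
    err = r[0]                       # prediction-error energy (must be > 0)
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err               # reflection coefficient for stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)         # error energy left after stage i
    return a

def lp_residual(s, order=10):
    """Inverse-filter s with A(z): e[n] = sum_j a[j] * s[n-j], zero initial state."""
    a = levinson_durbin(autocorrelation(s, order), order)
    return [sum(a[j] * s[n - j] for j in range(order + 1) if n - j >= 0)
            for n in range(len(s))]
```

`lp_residual(frame, order)` returns a sequence of the same length as the frame. The recursion divides by the prediction-error energy, so an all-zero (silent) frame has to be skipped before calling it.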
In this paper, an isolated-word speech recognition system is implemented. A discrete hidden Markov model is used to recognize words. The feature vector consists of cepstral and delta-cepstral coefficients extracted from frames of the speech signal. Because a discrete Markov model is used, each feature vector is mapped to a discrete symbol by a vector quantizer. One of the problems we face in training...
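Mapping continuous cepstral vectors to discrete HMM observation symbols amounts to a nearest-codeword lookup. The sketch below uses plain k-means as a stand-in for the paper's codebook design, which the excerpt does not name (LBG is a common choice). All function names are hypothetical.

```python
def sq_dist(x, c):
    """Squared Euclidean distance between a feature vector and a codeword."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def nearest_codeword(x, codebook):
    """Index of the codeword closest to x."""
    return min(range(len(codebook)), key=lambda i: sq_dist(x, codebook[i]))

def train_codebook(vectors, k, iters=20):
    """Plain k-means, initialised with the first k training vectors."""
    codebook = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest_codeword(v, codebook)].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old codeword if its cluster is empty
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

def quantize(frames, codebook):
    """Map each feature vector to a discrete observation symbol for the HMM."""
    return [nearest_codeword(f, codebook) for f in frames]
```

The symbol sequence returned by `quantize` is what a discrete HMM would be trained and evaluated on. The codebook size trades quantization error against the amount of data needed to estimate each state's symbol probabilities.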
Voice activity detection (VAD) is an outstanding problem for speech transmission, enhancement and recognition. The variety and the varying nature of speech and background noise make it especially challenging. In recent years, many features that emphasize the differences between speech and noise have been proposed for their robustness. However, an important problem in many areas of speech processing...
Phoneme spotting in continuous speech has various applications in speech recognition, smart audio filtering, multimedia synchronization and other fields. Many studies on phoneme spotting have been conducted using different approaches. We present two algorithms for spotting fricatives (such as /s/, /sh/, /f/) and affricates (/ts/, /ch/): one based on a cepstrogram-matching approach, and the other...
Automatic speech recognition (ASR) has moved from science-fiction fantasy to daily reality for citizens of technological societies. Some people seek it out, preferring dictating to typing, or benefiting from voice control of aids such as wheelchairs. Others find it embedded in their hi-tech gadgetry, in mobile phones and car navigation systems, or cropping up in what would have until recently been...
This paper presents a lip-reading technique that classifies discrete utterances without evaluating the acoustic signals. The reported technique analyzes video data of lip motions by computing the optical flow (OF). The statistical properties of the vertical OF component were used to form the feature vectors for training a support vector machine (SVM) classifier. The impact of the variation...
In this paper, we propose an approach for recognizing online Persian isolated characters using a Local Linear Neuro-Fuzzy (LLNF) model. The LLNF model is a powerful approach for classification tasks. It uses a divide-and-conquer strategy to partition the problem space into sub-problems and constructs Local Linear Models (LLMs). To classify the characters, we first extract some generic features...
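A common way to write an LLNF model's output is as a sum of local linear models, each weighted by a normalized Gaussian validity function, so every LLM dominates its own region of the input space. The forward pass below is a sketch that assumes this formulation. The centers, widths and weights would come from a training procedure that the excerpt does not describe, and all names are hypothetical.

```python
import math

def llnf_predict(x, centers, sigmas, weights):
    """Output of a local linear neuro-fuzzy model at input x.

    centers[i], sigmas[i]: center and per-dimension width of validity function i.
    weights[i] = [w0, w1, ..., wd]: offset and slopes of local linear model i.
    """
    # Unnormalized Gaussian memberships of x in each local region.
    mu = []
    for c, s in zip(centers, sigmas):
        d2 = sum(((xi - ci) / si) ** 2 for xi, ci, si in zip(x, c, s))
        mu.append(math.exp(-0.5 * d2))
    total = sum(mu)  # assumed > 0, i.e. x is not extremely far from every center
    phi = [m / total for m in mu]  # normalized validity functions, sum to 1
    # Validity-weighted sum of the local linear models.
    y = 0.0
    for p, w in zip(phi, weights):
        y += p * (w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
    return y
```

For classification, one such output per class (or a threshold on a single output) could be used. That choice is also an assumption here.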
Speech recognition enables simple human-computer interaction and voice control. It is widely used in industrial control, consumer electronics and many other fields. Building on characteristics of human physiology, the paper presents a higher-performance speech recognition system for specific speakers and isolated words. It is implemented on a DSP (Digital Signal Processor) system using the LPMCC...
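LPMCC features start from LPC coefficients. The first step converts them to linear-prediction cepstral coefficients (LPCC) with a well-known recursion, sketched below under the convention A(z) = 1 + a1 z^-1 + ... + ap z^-p. The mel-frequency warping that turns LPCC into LPMCC is not shown, and the function name is hypothetical.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c_1..c_n of the all-pole model 1/A(z), with a = [1, a1, ..., ap].

    Uses c_n = -a_n - sum_{k=1}^{n-1} (k/n) c_k a_{n-k}, where a_m = 0 for m > p.
    The gain term c_0 is not computed.
    """
    p = len(a) - 1
    coef = lambda m: a[m] if m <= p else 0.0  # a_m, zero beyond the model order
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = sum((k / n) * c[k] * coef(n - k) for k in range(1, n))
        c[n] = -coef(n) - acc
    return c[1:]
```

For a first-order model A(z) = 1 - 0.5 z^-1, the recursion reproduces the series expansion of -log A(z): c_n = 0.5^n / n.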
When only a few seconds of training and testing data are available, too few feature vectors are obtained to model and discriminate speakers well. A new method for speaker recognition with short utterances is presented. Through a non-linear mapping, it uses sectional-set fuzzy Vector Quantization with the Lp norm to form the speaker's model in the high-dimensional feature...
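In fuzzy vector quantization, each feature vector belongs to every codeword with a graded membership instead of a single hard assignment. The sketch below computes the standard fuzzy c-means membership formula with an Lp-norm distance. It does not reproduce the paper's "sectional set" scheme or its non-linear mapping. The fuzzifier m = 2 and all names are assumptions.

```python
def lp_distance(x, c, p):
    """Minkowski (Lp) distance between vector x and codeword c."""
    return sum(abs(xi - ci) ** p for xi, ci in zip(x, c)) ** (1.0 / p)

def fuzzy_memberships(x, codebook, p=2.0, m=2.0):
    """Fuzzy c-means memberships u_i = 1 / sum_j (d_i / d_j)^(2/(m-1)); they sum to 1."""
    d = [lp_distance(x, c, p) for c in codebook]
    # If x coincides with a codeword, give that codeword full membership.
    for i, di in enumerate(d):
        if di == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(d))]
    expo = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** expo for dj in d) for di in d]
```

Soft memberships let every short-utterance frame contribute to several codewords. This is one reason fuzzy VQ is attractive when only a few feature vectors are available.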
This paper aims to design and implement an English digit speech recognition system using a MATLAB GUI. The work is based on the Hidden Markov Model (HMM), which provides a highly reliable way of recognizing speech. The system recognizes speech by translating the speech waveform into a set of feature vectors using the Mel Frequency Cepstral Coefficients (MFCC) technique. This paper...
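The paper's system is built in MATLAB. The MFCC pipeline itself (framing, Hamming window, power spectrum, mel filter bank, log, DCT) can be sketched in plain Python as below. The frame sizes, the 20 filters and the 12 coefficients are illustrative assumptions, not values from the paper. The DFT is a naive O(N^2) loop kept for self-containment, and c0 is returned as the first coefficient.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters equally spaced on the mel scale, as weights over DFT bins."""
    n_bins = n_fft // 2 + 1
    lo, hi = hz_to_mel(0.0), hz_to_mel(sr / 2.0)
    mels = [lo + (hi - lo) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sr)) for m in mels]
    fb = [[0.0] * n_bins for _ in range(n_filters)]
    for j in range(n_filters):
        left, center, right = bins[j], bins[j + 1], bins[j + 2]
        for k in range(left, center):
            fb[j][k] = (k - left) / (center - left)      # rising edge
        for k in range(center, right):
            fb[j][k] = (right - k) / (right - center)    # falling edge
    return fb

def power_spectrum(frame, n_fft):
    """|DFT|^2 / n_fft for bins 0..n_fft/2 (naive DFT, frame zero-padded)."""
    x = frame + [0.0] * (n_fft - len(frame))
    out = []
    for k in range(n_fft // 2 + 1):
        re = sum(x[n] * math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft))
        im = -sum(x[n] * math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft))
        out.append((re * re + im * im) / n_fft)
    return out

def mfcc(signal, sr, frame_len=200, hop=80, n_fft=256, n_filters=20, n_ceps=12):
    """One list of n_ceps cepstral coefficients per frame (frame_len <= n_fft)."""
    fb = mel_filterbank(n_filters, n_fft, sr)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        ps = power_spectrum(frame, n_fft)
        log_e = [math.log(sum(f * p for f, p in zip(filt, ps)) + 1e-10) for filt in fb]
        # DCT-II of the log filter-bank energies decorrelates them into cepstra.
        feats.append([sum(e * math.cos(math.pi * q * (m + 0.5) / n_filters)
                          for m, e in enumerate(log_e)) for q in range(n_ceps)])
    return feats
```

At 8 kHz, 200-sample frames with an 80-sample hop correspond to 25 ms windows every 10 ms. In practice an FFT routine would replace the naive DFT.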
In this paper, we continue our previous work on nonlinear feature compensation of distortions in clean and telephone speech recognition systems. We have shown that a Bidirectional Neural Network (Bidi-NN) can compensate for nonlinearly distorted components of feature vectors. In this study, we present a new effort to improve recognition accuracy on clean and telephone speech data by employing a two-stage...
This paper proposes a method for determining a variable-length overlap between two consecutive frames for speech recognition. Compared with the conventional fixed-length frame overlapping method, the proposed method can improve the performance of the front-end speech signal processing in speech recognition. By varying the length of frame overlaps using the...
Using spectral and spectro-temporal auditory models, we develop a computationally simple feature vector based on the design architecture of existing mel frequency cepstral coefficients (MFCCs). Along with the use of an optimized static function to compress a set of filter bank energies, we propose a memory-based adaptive compression function to incorporate the behavior of the human auditory response...
An efficient noise-robust feature is presented for tracking speech activity in noisy environments. Speech is modeled by one class of 16 phone-like Gaussian mixtures, while noise is modeled by 15 classes of 6 mixtures each. The feature vector is a concatenation of carefully selected coefficients from MFCC, LPCC, and their first and second derivatives. A finite state machine and energy validation...
An infant's cry is a multimodal behavior that carries a great deal of information about the infant, particularly about the infant's health. In this paper, a new feature for infant cry analysis is presented for distinguishing two groups, infants with hearing disorders and normal infants, by extracting the Mel frequency multi-band entropy cepstrum from the infant's cry. A signal processing stage is included...
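The entropy part of a multi-band entropy feature can be illustrated by normalizing the power spectrum within each band into a probability distribution and taking its Shannon entropy. A flat (noise-like) band scores high and a peaky (harmonic) band scores low. The sketch below uses equal-width linear bands for brevity. The paper's Mel-scale band layout and its final cepstral step are not reproduced, and the function name is hypothetical.

```python
import math

def band_entropies(power_spectrum, n_bands):
    """Shannon entropy (nats) of the normalized spectrum inside each equal-width band."""
    n = len(power_spectrum)
    edges = [round(i * n / n_bands) for i in range(n_bands + 1)]
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = power_spectrum[lo:hi]
        total = sum(band)
        if total <= 0.0:          # silent band: define its entropy as 0
            out.append(0.0)
            continue
        probs = [p / total for p in band if p > 0.0]
        out.append(-sum(p * math.log(p) for p in probs))
    return out
```

A cepstrum-style feature could then be formed by applying a DCT across the per-band values, as is done for log energies in MFCC. How the paper combines them is not stated in the excerpt.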
Feature extraction is an essential first step in speaker verification applications. In addition to static features extracted from each frame of speech data, it is beneficial to use dynamic features that use information from neighboring frames. In this paper a new feature estimation method based on maximum likelihood discriminant analysis is presented. We compare it to traditional MFCC features in...
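A common way to obtain dynamic features from neighboring frames is the regression ("delta") formula d_t = sum_n n (c_{t+n} - c_{t-n}) / (2 sum_n n^2), computed over a window of ±N frames. The sketch below implements that conventional formula with edge frames replicated. It is not the maximum-likelihood discriminant method proposed in the paper, and the function name and N = 2 are assumptions.

```python
def deltas(frames, N=2):
    """Regression deltas over +/-N neighboring frames; edge frames are replicated."""
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        d = []
        for j in range(len(frames[t])):
            num = 0.0
            for n in range(1, N + 1):
                nxt = frames[min(t + n, T - 1)][j]   # clamp at the last frame
                prv = frames[max(t - n, 0)][j]       # clamp at the first frame
                num += n * (nxt - prv)
            d.append(num / denom)
        out.append(d)
    return out
```

Applying `deltas` to its own output gives the second-order (acceleration) coefficients, which are typically appended to the static features.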
To improve the noise robustness of speech recognition in adverse noise environments, this paper investigates a noise-robust speech recognition method that combines the discrete wavelet transform (DWT), wavelet packet decomposition (WPD) and Lin-log RASTA. After a one-scale DWT is applied to the noisy speech, the method applies a three-scale DWT and a three-scale WPD to the low frequency...
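The structural difference between a multi-scale DWT and a WPD is which branches get split again. The DWT re-splits only the low-frequency (approximation) branch, while the WPD splits every node. The sketch below shows both with the orthonormal Haar wavelet, which is an assumption because the excerpt does not name the wavelet family. The Lin-log RASTA stage is not shown, and all names are hypothetical.

```python
import math

def haar_step(x):
    """One level of the orthonormal Haar DWT: (approximation, detail). len(x) must be even."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def dwt(x, levels):
    """Multi-scale DWT: only the approximation branch is decomposed further."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details

def wpd(x, levels):
    """Wavelet packet decomposition: every node is split, giving 2**levels sub-bands."""
    nodes = [list(x)]
    for _ in range(levels):
        nodes = [half for node in nodes for half in haar_step(node)]
    return nodes
```

Both functions assume the input length is divisible by 2**levels. Because the Haar transform is orthonormal, the total energy of the sub-bands equals that of the input.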
This paper proposes a new approach for emotion recognition based on a hybrid of hidden Markov models (HMMs) and an artificial neural network (ANN), using both utterance-level and segment-level information from speech. To combine the dynamic time-warping capability of HMMs with the pattern recognition capability of ANNs, the utterance is viewed as a series of voiced segments, and feature vectors extracted from...
Due to temporal and spectral differences between speech and acceleration signals, conventional end point detection (EPD) in automatic speech recognition cannot be directly applied to acceleration signals, and the threshold-based algorithms found in the literature are too heuristic to be accepted for automatic EPD. In this regard, for motion detection by acceleration, supervised learning in pattern recognition is...
This paper proposes to perform latent semantic analysis (LSA) on character/syllable n-gram sequences of automatic speech recognition (ASR) transcripts, namely subword LSA, as an extension of our previous work on subword text tiling for automatic story segmentation of Chinese broadcast news. LSA represents the 'meaning' of a lexical term by a feature vector conveying the term's relations with other...
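Subword LSA can be sketched in three steps. First, count overlapping character n-grams per transcript. Second, factor the n-gram-by-document matrix A = U S V^T. Third, compare documents through the rows of V_k S_k. The sketch below obtains the top k singular values and right singular vectors by power iteration with deflation on A^T A, so that it needs no linear-algebra library. It uses raw counts with no term weighting. The example strings in the test are hypothetical, and story segmentation itself is not shown.

```python
import math
from collections import Counter

def char_ngrams(text, n=2):
    """Overlapping character n-grams of a transcript string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def top_eigenpairs(G, k, iters=500):
    """Leading eigenpairs of a symmetric PSD matrix by power iteration with deflation."""
    size = len(G)
    G = [row[:] for row in G]
    pairs = []
    for _ in range(k):
        v, lam = [1.0] * size, 0.0
        for _ in range(iters):
            w = [sum(G[i][j] * v[j] for j in range(size)) for i in range(size)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v, lam = [x / norm for x in w], norm
        pairs.append((lam, v))
        # Remove the found component so the next pass finds the next eigenpair.
        G = [[G[i][j] - lam * v[i] * v[j] for j in range(size)] for i in range(size)]
    return pairs

def lsa_doc_vectors(docs, k=2, n=2):
    """Rows of V_k S_k for the n-gram-by-document count matrix A = U S V^T."""
    counts = [char_ngrams(d, n) for d in docs]
    # G = A^T A (document-by-document); its eigenvalues are the squared singular values.
    G = [[float(sum(ci[t] * cj[t] for t in ci)) for cj in counts] for ci in counts]
    pairs = top_eigenpairs(G, k)
    return [[math.sqrt(lam) * v[d] for lam, v in pairs] for d in range(len(docs))]

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))
```

Working at the character level means that ASR errors which corrupt a whole word can still leave some correct character n-grams to match on. This is the usual motivation for subword units in this setting.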