Speech processing is today one of the most demanding application areas. This article highlights two important aspects of speech processing, namely which feature representations should be employed and by what criteria they should be selected. Different application areas require different sets of features and different techniques to extract them. At the same time it is necessary to choose...
The present era is full of speech-recognition-based services and products. Machine learning paradigms are at the centre stage of speech recognition methodology. Automatic speech recognition (ASR) technology has evolved rapidly in recent years, with emerging applications in mobile computing, natural user interfaces, and man-machine assistive technology. In this paper, for the first time, we present...
The current work presents a multilingual speech-to-text conversion system. Conversion is based on information in the speech signal. Speech is the natural and most important form of communication for human beings. A Speech-To-Text (STT) system takes a human speech utterance as input and produces a string of words as output. The objective of this system is to extract, characterize and recognize the information...
With the advent of technology, speech recognition is no longer a capability exclusive to humans. Voice-based interfaces become most effective for human-computer interaction when computers respond according to their users' emotional state. Emotion recognition from speech is a challenging problem, as the system has to handle diverse user utterances. This paper presents an age-driven speech emotion...
Speech emotion recognition is one aspect of equipping robots with human capabilities. The need to trade off computational load against recognition accuracy is the main challenge of real-time processing. The application domain of this paper is robotics; therefore both factors are important. Selecting distinguishing features with low dimensionality and high resolution is the optimal solution for...
Emotions in speech are key to fluent human communication, and their investigation has been reported in many different studies. The scope of this article is therefore dedicated to emotion recognition from the speech signal. To determine the best recognition performance of the system used, different cepstral coefficients were extracted from the emotional recordings of two female and one...
This paper proposes a technique based on MFCC analysis of audio signals, with a speech classification application. The proposed work uses multi-resolution (wavelet) analysis and spectral-analysis-based features for feature extraction. The approach combines a number of features, such as Mel Frequency Cepstral Coefficients (MFCC) and FFT coefficients, with wavelet-based features. In addition, accuracy...
Automatic speech recognition of the different languages spoken in different regions of a country is one of the major research areas in the field of signal processing. This paper presents an improved MFCC algorithm for Bundelkhandi digit speech recognition. Here, spoken-digit features are extracted using a modified Mel Frequency Cepstral Coefficient (MFCC) algorithm. In this modified MFCC algorithm, one...
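The standard MFCC pipeline that the abstracts above build on (framing, windowing, power spectrum, mel filterbank, log compression, DCT) can be sketched in plain numpy. This is a minimal illustration, not any of the cited papers' implementations; the frame length, hop, filter count and coefficient count below are common but arbitrary assumptions.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    # 1. Frame the signal and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 2. Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3. Mel filterbank energies, then log compression.
    log_e = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # 4. DCT-II to decorrelate -> cepstral coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_filters)))
    return log_e @ basis.T

# Example: one second of a 440 Hz tone at 16 kHz.
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(sig)
print(feats.shape)  # (98, 13)
```

Modified-MFCC variants such as the one above typically change one of these stages (e.g. the filterbank shape or spacing) while keeping the rest of the pipeline intact.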
In this paper, we focus mainly on the automated analysis of infant cries. For this implementation we use LFCC for feature extraction and a VQ codebook, built with the LBG algorithm, for matching samples. The newborn crying samples were collected from various crying babies aged 0–6 months. There are 27 babies' recordings as training data, comprising 7 hungry infant cries, 4 sleepy infant cries, 10 in...
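The VQ codebook training mentioned above follows the classic LBG (Linde-Buzo-Gray) splitting scheme: start from the global centroid, split every codeword into a perturbed pair, then refine with nearest-neighbour reassignment. A minimal numpy sketch, with illustrative sizes rather than the paper's actual data:

```python
import numpy as np

def lbg_codebook(vectors, size, eps=0.01, n_iter=20):
    """Grow a VQ codebook by repeated splitting (LBG algorithm)."""
    codebook = vectors.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        # Split every codeword into a slightly perturbed pair.
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(n_iter):
            # Assign each vector to its nearest codeword (Euclidean distance).
            d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
            nearest = d.argmin(axis=1)
            # Move each codeword to the centroid of its cell.
            for j in range(len(codebook)):
                members = vectors[nearest == j]
                if len(members):
                    codebook[j] = members.mean(axis=0)
    return codebook

rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 12))  # stand-in for 12-dim LFCC frames
cb = lbg_codebook(frames, size=8)
print(cb.shape)  # (8, 12)
```

Classification then amounts to quantizing a test utterance against each class codebook and picking the one with the smallest total distortion.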
Speech processing is one of the most interesting and challenging topics in man-machine communication. Emotion detection is the process of determining the psychological state of the speaker. Pitch, formant frequencies, duration, timbre, MFCCs and energy are some of the efficient parameters through which a wealth of information can be retrieved from the speech signal. These parameters have provided good accuracy...
Speech processing has emerged as one of the important application areas of digital signal processing. Power Normalized Cepstral Coefficients (PNCC) and Mel Frequency Cepstral Coefficients (MFCC) are mainly used for feature extraction from speech signals. Real-time speaker segmentation is a hard problem in speech processing, in which no prior knowledge about the number of speakers and their identities...
This paper deals with a new automatic stressed-speech recognition system based on kernel classification. We extract advanced acoustic features from the stressed signals and employ multi-class Support Vector Machines with different kernels to recognize speech utterances under stress. Gammatone Frequency Cepstral Coefficients are also employed. The implemented system is tested using isolated words...
An issue that deserves consideration in emotional speaker recognition is the context in which the speech databases used to develop and evaluate the system were produced. We therefore propose and assess an emotional speaker recognition system based on different feature extraction methods, focusing on the differences between simulated and natural emotional speech databases (BERLIN and IEMOCAP)...
In this paper, a new combination of features and normalization methods is investigated for robust biometric speaker identification. Mel Frequency Cepstral Coefficients (MFCC) are efficient for speaker identification in clean speech, while Power Normalized Cepstral Coefficient (PNCC) features are robust in noisy environments. Therefore, combining both feature sets is better than using each one...
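A common way to combine complementary cepstral feature streams like MFCC and PNCC is frame-level concatenation after per-stream mean/variance normalization, so neither stream dominates by scale. A small numpy sketch under that assumption (the actual fusion and normalization choices in the paper may differ; the arrays below are placeholders):

```python
import numpy as np

def mvn(feats):
    # Per-coefficient mean/variance normalization over the utterance.
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-10)

rng = np.random.default_rng(1)
mfcc_feats = rng.normal(size=(300, 13))  # placeholder: 300 frames of 13 MFCCs
pncc_feats = rng.normal(size=(300, 13))  # placeholder: 300 frames of 13 PNCCs

# Normalize each stream separately, then concatenate frame by frame.
combined = np.hstack([mvn(mfcc_feats), mvn(pncc_feats)])
print(combined.shape)  # (300, 26)
```

The combined 26-dimensional vectors can then be fed to any back-end classifier in place of a single feature stream.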
Speaker voice characteristics are an important aspect of forensic phonetics. Previous studies have suggested that not all features present in the speech signal are equally important for speaker discrimination, and it is well known that some subsets of phonemes are more informative than others. However, most of these studies have considered a whole group of speakers, without taking into account the...
This paper presents a comparison of three feature extraction techniques for ASR systems. Compared with the most commonly used technique, MFCC (Mel Frequency Cepstral Coefficients), PNCC (Power Normalized Cepstral Coefficients) achieves an impressive improvement in noisy speech recognition due to its suppression of high-frequency spectral components of the human voice. The techniques differ in that MFCC uses traditional...
The task of developing an automatic speaker verification (ASV) system for security applications is of considerable importance. This paper aims at developing a fusion strategy which combines both magnitude and phase information of the speech signal, yielding better performance than conventional individual features. The paper employs Mel frequency cepstral coefficients (MFCC) and modified...
Gaussian mixture models (GMM) are an efficient model broadly used in most speaker recognition applications. This study introduces a novel method for the speaker verification task. We propose a reduced feature vector employing new information extracted from the speaker's voice for text-independent speaker verification using GMM. We use the power spectral density...
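In GMM-based verification, a claimed speaker is accepted when the average per-frame log-likelihood of the test features under the speaker's model (usually relative to a background model) exceeds a threshold. A minimal diagonal-covariance scoring sketch in numpy, with toy parameters rather than trained models:

```python
import numpy as np

def gmm_loglik(feats, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    feats: (T, D) frames; weights: (K,); means, variances: (K, D).
    """
    diff = feats[:, None, :] - means[None, :, :]                 # (T, K, D)
    exponent = -0.5 * (diff ** 2 / variances).sum(axis=2)        # (T, K)
    log_norm = -0.5 * np.log(2 * np.pi * variances).sum(axis=1)  # (K,)
    log_comp = np.log(weights) + log_norm + exponent             # (T, K)
    # Numerically stable log-sum-exp over the mixture components.
    m = log_comp.max(axis=1, keepdims=True)
    return float((m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))).mean())

rng = np.random.default_rng(2)
feats = rng.normal(size=(100, 13))            # toy 13-dim cepstral frames
weights = np.array([0.5, 0.5])                # 2-component toy model
means = np.zeros((2, 13)); means[1] += 2.0
variances = np.ones((2, 13))
score = gmm_loglik(feats, weights, means, variances)
print(score < 0)  # log-likelihoods of continuous densities are negative here
```

A verification decision then compares `score` for the claimed-speaker GMM against the same score under a universal background model.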
Emotion detection is currently found to be an important and interesting part of speech analysis. The analysis can be performed by selecting one effective parameter or by combining a number of parameters to reach a higher accuracy level. Selecting a number of parameters together will certainly provide a more reliable solution, with a higher level of accuracy than that of a single parameter...
Speech signals carry valuable information about the speaker, including age, gender, and emotional state. Gender information can act as a vital preprocessing ingredient for enhancing speech analysis applications such as adaptive human-machine interfaces, multi-modal security applications, and sophisticated forensic systems based on intent and context analysis. In uncontrolled environments like telephone speech...