Most present-day Speaker Identification (SID) systems focus on the speech features used for modeling the speakers without any concern for the speech being input to the system. Knowing how reliable the input speech is can be very important and useful. The idea of SID-usable speech is to identify and extract those portions of corrupted input speech that are more reliable for SID systems,...
Traditional speech recognition systems use Gaussian mixture models to obtain the likelihoods of individual phonemes, which are then used as state emission probabilities in hidden Markov models representing the words. In hybrid systems, the Gaussian mixtures are replaced by more discriminant classifiers, leading to improved performance. Most of the time, the classifiers used in such systems are neural...
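The GMM-as-emission-model idea described in this abstract can be sketched in a few lines: each phoneme has its own Gaussian mixture, and the mixture likelihood of the current frame's feature serves as the HMM state's emission probability. The phoneme labels, mixture parameters, and feature value below are invented toy numbers, not values from the paper.

```python
import math

def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gmm_likelihood(x, mixture):
    """Likelihood of a scalar feature under a mixture of (weight, mean, var)."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in mixture)

# Toy acoustic models: one 2-component GMM per phoneme (hypothetical values).
phoneme_gmms = {
    "aa": [(0.6, 1.0, 0.5), (0.4, 1.5, 0.3)],
    "iy": [(0.5, -1.0, 0.4), (0.5, -0.5, 0.6)],
}

frame_feature = 1.2  # e.g. one cepstral coefficient for the current frame

# These likelihoods would serve as the HMM state emission probabilities.
emissions = {ph: gmm_likelihood(frame_feature, gmm) for ph, gmm in phoneme_gmms.items()}
best = max(emissions, key=emissions.get)
print(best)  # → "aa": the phoneme whose GMM assigns the highest likelihood
```

In a hybrid system, `gmm_likelihood` would be replaced by a neural network's (scaled) posterior for each state, with the rest of the HMM machinery unchanged.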
This paper investigates the use of MultiDimensional Voice Program (MDVP) parameters to automatically detect voice pathology in the Arabic voice pathology database (AVPD). MDVP parameters are very popular among physicians and clinicians for detecting voice pathology; however, MDVP is commercial software. The AVPD is a newly developed speech database designed to suit a wide range of experiments in the field of...
Automatic Speech Emotion Recognition (SER) is a current research topic in the field of Human-Computer Interaction (HCI) with a wide range of applications. Speech features such as Mel Frequency Cepstral Coefficients (MFCC) and Mel Energy Spectrum Dynamic Coefficients (MEDC) are extracted from the speech utterance. LIBSVM is used as the classifier to identify different emotional states such as anger,...
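The pipeline this abstract outlines — frame-level feature extraction followed by a trained classifier — can be sketched as below. This is a toy stand-in, not the paper's method: log-energy and zero-crossing rate replace MFCC/MEDC, and a nearest-centroid rule replaces LIBSVM; all centroid values are invented.

```python
import math

def frame_features(frame):
    """Log energy and zero-crossing rate of one frame of samples
    (stand-ins here for MFCC/MEDC features)."""
    energy = sum(s * s for s in frame) / len(frame)
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / (len(frame) - 1)
    return (math.log(energy + 1e-10), zcr)

def classify(features, centroids):
    """Nearest-centroid decision (stand-in for an SVM trained on labelled data)."""
    def dist(c):
        return sum((f - v) ** 2 for f, v in zip(features, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Hypothetical class centroids "learned" from labelled training utterances.
centroids = {"anger": (2.0, 0.45), "neutral": (-1.0, 0.10)}

# A loud, rapidly alternating toy frame that lies near the "anger" centroid.
frame = [(-1) ** i * 2.0 for i in range(160)]
feats = frame_features(frame)
print(classify(feats, centroids))  # → anger
```

A real SER system would extract full cepstral feature vectors per frame, pool them over the utterance, and train the SVM on a labelled emotional speech corpus.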
A biometric system makes a pattern recognition decision based on biometric features extracted from a human being. This paper presents a text-independent speaker verification system using support vector machines (SVMs), which identifies the speaker from their voice. Speaker verification thus determines whether a test utterance is spoken by a target speaker and...
Over time, human-computer interaction has extended its branches into many other fields, such as engineering, cognition, and medicine. Speech analysis has also become an important area of concern. People use this mode of interaction with machines to bridge the gap between the physical and digital worlds. Speech emotion recognition has become an integral subfield in the domain...
Research on speech/music classification of digital audio has been popular in academia and is increasingly utilized in industry. Most of the usual methods use carefully hand-crafted features with Gaussian Mixture Models. To get the best performance, some of the features necessitate a long latency due to look-ahead, and/or a long onset error. This paper takes a different approach to the problem...
There is an increasing use of sensor networks capable of sensing multimedia data, including audio data. Unfortunately, public use of these data is not allowed because they contain sensitive private information such as person and location names. Person name extraction (PNE), a widely investigated research topic, is an effective technique for resolving this problem. However, there is an important difference...
Detecting a user's emotion can be used for business development and psychological analysis. The motivation of this paper is to build a Tamil emotional corpus, to serve as a basis for emotion analysis based on the acoustic variations present, and to make the corpus available in the public domain. Tamil Play will be used as the main resource for building the emotional corpus. Basically, emotions...
Speaker age recognition is an essential technique in automatic speech recognition, based on the speech waveform parameters of a speaker's voice. However, there are several challenges in speaker age recognition, such as innate differences between speakers' voices, fuzziness in subjective classification, etc. Speaker age recognition based on isolated words is addressed in this paper, including support vector machine...
Recently, Voice Activity Detection (VAD) algorithms based on machine learning techniques have shown impressive results in the area of speech recognition. In this paper, we present a case study and discuss the performance of VAD based on Support Vector Machines (SVM) for a Distributed Speech Recognition (DSR) system. In this case study, the speech and non-speech frames are detected from the...
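The frame-level speech/non-speech decision described here can be sketched as follows. At test time a trained linear SVM reduces to a linear decision function over the frame features; the weights, bias, and feature choices (log energy and zero-crossing rate) below are illustrative assumptions, not trained values from the paper.

```python
import math

def vad_decision(frame, w=(1.0, -0.5), b=-0.5):
    """Label one frame as speech (True) or non-speech (False) via w . x + b > 0,
    the form a trained linear SVM takes at test time. w and b are toy values."""
    energy = math.log(sum(s * s for s in frame) / len(frame) + 1e-10)
    zcr = sum(1 for a, c in zip(frame, frame[1:]) if a * c < 0) / (len(frame) - 1)
    score = w[0] * energy + w[1] * zcr + b
    return score > 0

speech_like = [math.sin(0.2 * i) * 3.0 for i in range(200)]   # loud tone-like frame
silence_like = [0.001 * ((-1) ** i) for i in range(200)]      # near-zero noise frame
print(vad_decision(speech_like), vad_decision(silence_like))  # → True False
```

In a DSR setting, such a decision would run on the client so that only frames labelled as speech are encoded and transmitted to the recognition server.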
This paper presents an application of deep learning to the extraction of emotions from Chinese speech using a deep belief network (DBN). Eight features, such as pitch and Mel Frequency Cepstral Coefficients (MFCC), are extracted from Mandarin speech and used as network inputs, and a DBN classifier is used instead of traditional shallow learning methods to recognize emotions. Experiment...
The main goal of this paper is to establish the relevance of nonlinear parameters (Lyapunov exponents) in the automatic classification of emotions for the Romanian language. The Largest Lyapunov Exponent (LLE) was computed for the MFCC (mel frequency cepstral coefficients) and the LPCC (linear prediction cepstral coefficients). The Support Vector Machine (SVM) classifier provides better results than...
Automatic sentence segmentation of speech is the process of identifying the end of a sentence. It is used to improve the output after speech recognition and helps make the recognition output more readable. It is generally a two-class problem involving the identification of a boundary between the sentence part and the non-sentence part. An Automatic Speech Recognition (ASR) system...
In the past decade, a lot of research has gone into Automatic Speech Emotion Recognition (SER). The primary objective of SER is to improve the man-machine interface. It can also be used to monitor the psychophysiological state of a person in lie detectors. More recently, speech emotion recognition has also found applications in medicine and forensics. In this paper, seven emotions are recognized using pitch...
This work presents a low-cost, fast-trainable automatic speaker-speech recognition (ASSR) system based on a proposed binary halved clustering (BHC) method, for human-machine interfaces (HMI) on an embedded platform, since low cost is essential for making ASSR affordable in real-world applications. In addition, fast training enables fast response times. The reduction of waiting...
This paper presents a modification of a speech emotion recognition system for a social robot. Using speaker-dependent classifiers with a prior speaker identification step is proposed. Emotion recognition is performed using global acoustic features of the speech. Six speech signal parameters are computed with specialised software. The feature extraction is based on calculating global statistics of those...
Emotion recognition from speech plays an important role in developing affective and intelligent Human-Computer Interaction. The goal of this work is to build an Automatic Emotion Variation Detection (AEVD) system to determine each emotionally salient segment in continuous speech. We focus on emotion detection in angry-neutral speech, which is common in recent studies of AEVD. This study proposes a novel...
Recognition of human emotion from speech has become one of the most challenging and attractive fields of research in the speech processing area. The present study aimed to detect the valence of emotions using Non-Linear Dynamic features (NLDs). NLDs are extracted from the Discrete Cosine Transform (DCT) of descriptor contours computed from the Phase Space Reconstruction (PSR) of speech. These features are...
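The Phase Space Reconstruction step mentioned in this abstract is standardly done by delay embedding: a scalar signal s is mapped to vectors x_i = (s_i, s_{i+tau}, ..., s_{i+(m-1)tau}). A minimal sketch, assuming toy values of the delay tau and embedding dimension m (real systems choose them, e.g., via mutual information and false nearest neighbours):

```python
def delay_embed(signal, m=3, tau=2):
    """Return the m-dimensional delay vectors of a scalar signal:
    the i-th vector is (s_i, s_{i+tau}, ..., s_{i+(m-1)*tau})."""
    n = len(signal) - (m - 1) * tau
    return [tuple(signal[i + k * tau] for k in range(m)) for i in range(n)]

signal = [0, 1, 2, 3, 4, 5, 6, 7]
vectors = delay_embed(signal)
print(vectors[0], len(vectors))  # → (0, 2, 4) 4
```

The descriptor contours (and then their DCT) would be computed over these reconstructed trajectories rather than over the raw waveform.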
Emotional speech recognition is an interesting application that is able to recognize different emotional states from the speech signal. In Human-Robot Interaction (HRI), emotion recognition is being applied to intelligent robots so that they can understand the emotional states of users and interact in a more human-like manner. However, it is not easy to apply emotion recognition algorithms in real applications...