In this paper, we focus on the classification of neutral and stressed speech. The parameters representing airflow patterns in the physiological system are obtained using a physical model. Speech features were modeled using Gaussian Mixture Models (GMM) and Support Vector Machines (SVM). Different classifiers are compared to determine their performance in stressed speech classification. Results...
Prosodic cues are an important part of human communication. One of these cues is word prominence, which is used, for example, to highlight important information. Since individual speakers express prominence in different ways, it is not easily extracted and incorporated into a dialog system. As a consequence, prominence has to date played only a marginal role in human-machine communication. In this paper...
This paper presents an overview of the studies that have been conducted with the purpose of understanding the use of brain signals as input to a speech recogniser. The studies have been categorised by the type of technology used, with a summary of the methodologies employed and the results achieved. In addition, the paper gives an insight into some studies that examined the effect of the chosen stimuli...
Automatic emotion recognition from the human speech signal has many important practical applications. For this reason, a number of studies have been performed on the basis of the English, German, Mandarin, Persian, and Danish languages. This work intends to develop an automatic emotion recognition system based on the speech signal in the Indonesian language. The study is limited to four emotional states, namely,...
Lying is among the most common human misdeeds and merits spending time thinking about. Lie detection still poses a problem for recent research, which aims to develop non-contact applications for estimating physiological changes. In this paper, we propose a preliminary investigation into which relevant acoustic parameters can be useful for classifying lie or truth from speech...
In this paper, we propose an efficient approach to identify the opinion leader in a group discussion. This approach is able to recognize the opinion leader without analyzing semantic and syntactic features, which would cost considerably more computing effort. We first propose algorithms to evaluate the degree of participation and the emotional expression in each member's speech during the group discussion...
The identification of emotional hints from speech has a large number of applications. Machine learning researchers have analyzed sets of acoustic parameters as potential cues for the identification of discrete emotional categories or, alternatively, of the dimensions of emotions. Experiments have been carried out on recordings of simulated or induced emotions, although recently more research...
In this article we apply Support Vector Machines to the acoustic model of a speech recognition system based on MFCC and LPC features for an Azerbaijani dataset. This dataset has previously been used for speech recognition with a multilayer artificial neural network, and some results were achieved. The main goal of this work is to apply SVM techniques to the Azerbaijani speech recognition system. The variety of results of SVM...
The article presents an analysis of the possibility of recognizing a speaker's emotions from the speech signal in the Polish language. In order to perform the experiments, a database containing speech recordings with emotional content was created. On its basis, features were extracted from the speech signals. The most important step was to determine which of the previously extracted features were the...
Language is a complex, distinctly human ability used in real-world applications. There are approximately 6,700 languages, and different regions of the world speak different languages. When one person meets another who speaks a different language, it is difficult to identify what the other person is saying, or even which language is being spoken. Hence, the main focus is recognition of the language that is spoken...
In human-computer interaction, speech emotion recognition plays a pivotal part in the field of research. Human emotional states include anger, happiness, sadness, disgust, and neutrality. In this paper, features are extracted using a hybrid of pitch, formants, zero-crossing rate, MFCCs, and their statistical parameters. Pitch detection is performed by the cepstral algorithm, after comparing it with the autocorrelation and AMDF...
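The cepstral approach mentioned in this abstract can be illustrated with a minimal numpy sketch (the function name, sampling rate, and pitch search band below are illustrative assumptions, not taken from the paper): the real cepstrum of a voiced frame peaks at the quefrency sr/F0, so pitch is read off as the sample rate divided by the peak lag.

```python
import numpy as np

def cepstral_pitch(frame, sr, fmin=60.0, fmax=400.0):
    """Estimate F0 of a voiced frame via the real cepstrum.

    The cepstrum is the inverse FFT of the log-magnitude spectrum;
    a periodic source appears as a peak at quefrency sr/F0 samples.
    """
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    cepstrum = np.fft.irfft(np.log(spectrum + 1e-10))
    # restrict the peak search to quefrencies of plausible pitch
    qmin, qmax = int(sr / fmax), int(sr / fmin)
    peak = qmin + np.argmax(cepstrum[qmin:qmax])
    return sr / peak

# synthetic voiced frame: 10 harmonics of a 200 Hz fundamental
sr = 16000
t = np.arange(2048) / sr
frame = sum(np.sin(2 * np.pi * 200 * k * t) for k in range(1, 11))
print(cepstral_pitch(frame, sr))  # close to 200 Hz
```

The autocorrelation and AMDF methods mentioned alongside it work in the time domain instead, picking the lag that maximizes self-similarity (or minimizes the average magnitude difference).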
Classification of long-duration speech, represented as varying-length sets of feature vectors, using a support vector machine (SVM) requires a suitable kernel. In this paper we propose a novel segment-level pyramid match kernel (SLPMK) for the classification of varying-length patterns of long-duration speech represented as sets of feature vectors. This kernel is designed by partitioning the speech signal...
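A simplified version of the segment-level idea can be sketched as follows. This is a hedged illustration of the general pyramid-match recipe, not the authors' exact kernel; the codebook, level weights, and histogram-intersection matching are assumptions. Each utterance is split into 2^l equal segments at level l, corresponding segments are compared by histogram intersection over a VQ codebook, and coarser levels are down-weighted.

```python
import numpy as np

def codeword_histogram(frames, codebook):
    """Hard-assign each frame to its nearest codeword and count."""
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return np.bincount(d.argmin(axis=1), minlength=len(codebook)).astype(float)

def slpmk(X, Y, codebook, levels=2):
    """Simplified segment-level pyramid match between two utterances.

    At level l each utterance is split into 2**l equal segments;
    corresponding segments are compared by histogram intersection
    and coarser levels are down-weighted.
    """
    k = 0.0
    for l in range(levels + 1):
        w = 1.0 / 2 ** (levels - l)  # finer levels weigh more
        for xs, ys in zip(np.array_split(X, 2 ** l),
                          np.array_split(Y, 2 ** l)):
            hx = codeword_histogram(xs, codebook)
            hy = codeword_histogram(ys, codebook)
            k += w * np.minimum(hx, hy).sum()
    return k

rng = np.random.default_rng(1)
codebook = rng.standard_normal((8, 13))  # 8 codewords over 13-dim MFCCs
X = rng.standard_normal((60, 13))        # two utterances with
Y = rng.standard_normal((45, 13))        # different lengths
print(slpmk(X, X, codebook) > slpmk(X, Y, codebook))  # True: self-match is largest
```

Because matching happens per segment, the kernel handles utterances of different lengths while still retaining coarse temporal structure, which a single bag-of-frames histogram would discard.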
Speech therapy is essential to help children with speech sound disorders. While some computer tools for speech therapy have been proposed, most focus on articulation disorders. Another important aspect of speech therapy is voice quality, but little research has been devoted to this issue. As a contribution to filling this gap, we propose a robust scoring model for voice exercises often used in speech...
In this paper, we propose to use a kernel sparse representation based classifier (KSRC) for the task of speech emotion recognition. Further, the recognition performance of the KSRC is improved by imposing a group sparsity constraint. Speech utterances with the same emotion may have different durations, but frame-sequence information does not play a crucial role in this task. Hence, in this work,...
Recent times have been marked by an increasing demand for more intelligent human-computer interfaces. By adding emotion recognition abilities, voice-based interfaces can be made more human-centric. As natural languages do not share the same acoustic-phonetic features and vary in the production of speech sounds, emotion recognition accuracy is affected by the user's language. This work...
In this paper, we propose to use a deep neural network (DNN) as an effective tool for audio feature extraction. The DNN-derived features can be used effectively in a subsequent classifier (e.g., an SVM in this study) for audio classification. Specifically, we learn bottleneck features from a multi-layer perceptron (MLP), in which the Mel filter bank feature is used as the network input and one of the hidden...
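The bottleneck idea can be sketched in a few lines of numpy. The class name and layer sizes below are illustrative assumptions; in the paper the MLP would first be trained as a classifier, whereas here the weights are random, only to show where the features are read out: frames of the Mel filter bank feature are pushed through the network and the activations of the narrow hidden layer are taken as features for the SVM.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

class BottleneckMLP:
    """Toy MLP with a narrow (bottleneck) hidden layer."""

    def __init__(self, dims=(40, 512, 42, 512, 10)):
        # He-style random init; a trained network would go here
        self.W = [rng.standard_normal((i, o)) * np.sqrt(2.0 / i)
                  for i, o in zip(dims[:-1], dims[1:])]
        # the bottleneck is the narrowest hidden layer
        self.bottleneck = int(np.argmin(dims[1:-1])) + 1

    def bottleneck_features(self, x):
        """Forward-propagate and stop at the bottleneck layer."""
        h = x
        for layer, W in enumerate(self.W, start=1):
            h = relu(h @ W)
            if layer == self.bottleneck:
                break
        return h

mel = rng.standard_normal((100, 40))  # 100 frames of 40-dim filter bank
feats = BottleneckMLP().bottleneck_features(mel)
print(feats.shape)  # (100, 42)
```

The low-dimensional bottleneck forces the network to compress the input, and the resulting activations form a compact, fixed-size frame representation suitable for a downstream SVM.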
Speech recognition technology is one of the hot spots in the field of audio technology. For the recognition of lyrics sung with accompaniment, there are two commonly used methods: one applies automatic speech recognition technology to singing recognition; the other uses sound classification, extracting audio features and then using a pattern-matching classifier for classification...
Emotions play an important role in our thinking and behavior and hence contribute to shaping our personality. Much theoretical and experimental research has been conducted to recognize emotions from verbal or non-verbal behavior. It is well known that electroencephalogram (EEG) signals contain rich information about the activities of the brain and can reliably enable us to estimate...
In this work, a new feature, residual sinusoidal peak amplitude (RSPA), is proposed for emotion classification. The RSPA feature is computed from the LP residual of the speech signal using a sinusoidal model. The residual signal is a major source of the excitation, and it is expected that emotional information is well manifested in the residual signal. The effectiveness of the proposed feature is explored...
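As a rough sketch of where such a feature comes from (function names and the peak-picking stand-in below are assumptions; the paper's RSPA is defined via a full sinusoidal model of the residual), the LP residual can be obtained by solving the Yule-Walker equations and inverse-filtering the frame, after which spectral peak amplitudes of the residual are read out:

```python
import numpy as np

def lp_residual(frame, order=10):
    """Inverse-filter `frame` with its LP coefficients (autocorrelation
    method) to obtain the residual, an estimate of the excitation."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][:order + 1]
    # Yule-Walker normal equations: R a = r, with R Toeplitz in r[0..order-1]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    # prediction s_hat[n] = sum_k a_k * s[n-k]; residual = s - s_hat
    pred = np.convolve(frame, np.concatenate(([0.0], a)))[:len(frame)]
    return frame - pred

def residual_peak_amplitudes(residual, k=5):
    """Crude stand-in for RSPA: the k largest local maxima of the
    residual magnitude spectrum (the paper fits a sinusoidal model)."""
    s = np.abs(np.fft.rfft(residual))
    local_max = (s[1:-1] > s[:-2]) & (s[1:-1] > s[2:])
    return np.sort(s[1:-1][local_max])[::-1][:k]

# AR(2) toy signal: LP inverse filtering should largely whiten it
rng = np.random.default_rng(0)
e = rng.standard_normal(4000)
x = np.zeros(4000)
for n in range(2, 4000):
    x[n] = 1.3 * x[n - 1] - 0.6 * x[n - 2] + e[n]
res = lp_residual(x)
print(np.var(res) < np.var(x))  # True: the residual carries less variance
```

Because LP analysis removes the vocal-tract envelope, whatever structure remains in the residual is dominated by the excitation, which is why it is a plausible carrier of emotional cues.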
In this paper, the performance of a Convolutional Neural Network (CNN) on image recognition and on emotion recognition from speech is compared and presented. Feature extraction and selection in pattern recognition is an important issue and has been frequently discussed. Moreover, two-dimensional signals such as images and voice are hard to model well with traditional models like SVM. The ability of...