In this paper, the efficiency of Support Vector Machine (SVM) and Binary Support Vector Machine (BSVM) techniques in utterance-based emotion recognition is compared. Acoustic features including energy, Mel-frequency cepstral coefficients (MFCC), perceptual linear prediction (PLP), filter bank (FBANK), pitch, and their first and second derivatives are used as frame-based features. Four basic emotions...
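A minimal sketch of the kind of pipeline this abstract describes, assuming librosa and scikit-learn: frame-based MFCCs with first and second derivatives are pooled into an utterance-level vector and classified with an SVM. The waveforms, labels, and pooling statistics are illustrative stand-ins, and the BSVM variant is not shown.

```python
import numpy as np
import librosa
from sklearn.svm import SVC

def utterance_features(y, sr=16000):
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # frame-based MFCCs
    d1 = librosa.feature.delta(mfcc)                    # first derivatives
    d2 = librosa.feature.delta(mfcc, order=2)           # second derivatives
    frames = np.vstack([mfcc, d1, d2])                  # (39, n_frames)
    # Pool frames into one utterance-level vector (mean + std statistics).
    return np.concatenate([frames.mean(axis=1), frames.std(axis=1)])

# Stand-in waveforms; real use would load labelled emotional utterances.
utts = [np.random.randn(16000).astype(np.float32) for _ in range(8)]
labels = ["anger", "joy", "sadness", "neutral"] * 2     # four basic emotions
X = np.stack([utterance_features(u) for u in utts])
clf = SVC(kernel="rbf").fit(X, labels)
```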
Emotion representations are psychological constructs for modelling, analysing, and recognising emotion, one essential element of affect. Owing to the complexity of emotion, the boundaries between different emotion concepts are often fuzzy, which is also reflected in the diversity of emotion databases and their inconsistent target labels. When facing data scarcity, an ever-present issue for acoustic...
Studies have shown that ranking emotional attributes through preference learning methods has significant advantages over conventional emotional classification/regression frameworks. Preference learning is particularly appealing for retrieval tasks, where the goal is to identify speech conveying target emotional behaviors (e.g., positive samples with low arousal). With recent advances in deep neural...
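The ranking approach can be sketched as a pairwise preference learner (RankNet-style): a scorer is trained so that the sample preferred on a target emotional attribute (e.g. higher arousal) receives the higher score. This is a hedged illustration in PyTorch; the network size and the random data are placeholders, not the paper's model.

```python
import torch
import torch.nn as nn

# Scores a 39-dim acoustic feature vector; dimensions are illustrative.
scorer = nn.Sequential(nn.Linear(39, 64), nn.ReLU(), nn.Linear(64, 1))

def pairwise_loss(x_preferred, x_other):
    # P(preferred > other) is modelled as the sigmoid of the score difference,
    # so maximising it pushes preferred samples above non-preferred ones.
    diff = scorer(x_preferred) - scorer(x_other)
    return nn.functional.binary_cross_entropy_with_logits(
        diff, torch.ones_like(diff))

loss = pairwise_loss(torch.randn(16, 39), torch.randn(16, 39))
loss.backward()  # gradients flow into the scorer's parameters
```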
Automatic emotion recognition from speech is a challenging task that relies heavily on the effectiveness of the speech features used for classification. In this work, we study the use of deep learning to automatically discover emotionally relevant features from speech. We show that, using a deep recurrent neural network, we can learn both the short-time frame-level acoustic features that are emotionally...
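A minimal sketch of an utterance-level emotion classifier built on a recurrent network over frame-level features, assuming PyTorch; the layer sizes and input dimensions are illustrative and do not reproduce the paper's architecture.

```python
import torch
import torch.nn as nn

class EmotionRNN(nn.Module):
    def __init__(self, n_features=39, hidden=128, n_emotions=4):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, n_emotions)

    def forward(self, frames):            # frames: (batch, n_frames, n_features)
        _, (h, _) = self.lstm(frames)     # h: (num_layers, batch, hidden)
        return self.out(h[-1])            # logits from the last layer's state

logits = EmotionRNN()(torch.randn(8, 200, 39))  # 8 utterances, 200 frames each
```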
Affective computing, particularly emotion and personality trait recognition, is of increasing interest in many research disciplines. The interplay of emotion and personality shows itself in the first impression left on other people. Moreover, ambient information, e.g. the environment and objects surrounding the subject, also affects these impressions. In this work, we employ pre-trained Deep Convolutional...
As emotion recognition from speech has matured to a degree where it becomes suitable for real-life applications, it is time to develop techniques for matching different types of emotional data with multi-dimensional and category-based annotations. The categorical approach is usually applied to acted ‘full-blown’ emotions, and multi-dimensional annotation is often preferred for spontaneous, real...
Video affective content analysis is an active research area in computer vision. Live-streaming video has become one of the dominant modes of communication in the past decade, so video affective content analysis plays a vital role. Existing work on video affective content analysis focuses on predicting the current state of the user using either visual or acoustic features. In this paper,...
This study compared the perception of Chinese sentences conveying the attitudinal contrast of praising and blaming by five groups of subjects (Chinese natives, Japanese L2 learners of Mandarin, French L2 learners of Mandarin, and Japanese and French subjects without any Mandarin ability). Context-elicited target sentences conveying a praising, blaming, or neutral attitude were used as stimuli in the listening...
Data selection is an important component of cross-corpus training and semi-supervised/active learning. However, its effect on acoustic emotion recognition is still not well understood. In this work, we perform an in-depth exploration of various data selection strategies for emotion classification from speech using classifier agreement as the selection metric. Our methods span both the traditional...
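A hedged sketch of agreement-based selection as the abstract describes it: two differently-biased classifiers are trained on a labelled seed set, and candidate samples are kept only where their predictions agree. The models and synthetic data here are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

def select_by_agreement(X_seed, y_seed, X_cand):
    a = SVC().fit(X_seed, y_seed)
    b = RandomForestClassifier().fit(X_seed, y_seed)
    pa, pb = a.predict(X_cand), b.predict(X_cand)
    agree = pa == pb                  # selection metric: classifier agreement
    return X_cand[agree], pa[agree]   # pseudo-labelled samples to add

rng = np.random.default_rng(0)        # synthetic stand-in for real corpora
X_seed, y_seed = rng.normal(size=(40, 10)), rng.integers(0, 2, 40)
X_new, y_new = select_by_agreement(X_seed, y_seed, rng.normal(size=(100, 10)))
```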
Boosted by a wide spectrum of potential applications, emotional speech recognition, i.e., the automatic computer-aided identification of human emotional states from speech signals, is currently a popular field of research. However, many studies, especially those concentrating on the recognition of negative emotions, have neglected the specific requirements of real-world scenarios, for example,...
Recent years have witnessed a growing interest in recognizing emotions and events based on speech. One application of such systems is automatically detecting when a situation gets out of hand and human intervention is needed. Most studies have focused on increasing recognition accuracy using parts of the same dataset for training and testing. However, this says little about how such a trained...
As the recognition of emotion from speech has matured to a degree where it becomes applicable in real-life settings, it is time for a realistic view of obtainable performance. Most studies tend to overestimate performance in this respect: acted data is often used rather than spontaneous data, results are reported on pre-selected prototypical data, and truly speaker-disjoint partitioning is still less common...
There are two main emotion annotation techniques: multi-dimensional and category-based. In order to conduct experiments on emotional data annotated with different techniques, two-class emotion mapping strategies (e.g. high- vs. low-arousal) are commonly used. The “affective computing” community has not specified the location of the emotionally neutral area in multi-dimensional emotion space (e.g. valence-arousal-dominance...
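A minimal sketch of such a two-class mapping, assuming a 1-5 arousal scale with a hypothetical neutral midpoint and margin; samples falling in the ambiguous band around the (unspecified) neutral area are discarded rather than forced into a class.

```python
def arousal_class(arousal, neutral=3.0, margin=0.5):
    """Map a 1-5 arousal rating to 'high', 'low', or None (near-neutral)."""
    if arousal >= neutral + margin:
        return "high"
    if arousal <= neutral - margin:
        return "low"
    return None  # ambiguous samples near the assumed neutral area are dropped

labels = [arousal_class(a) for a in (1.5, 2.8, 3.1, 4.6)]
# -> ['low', None, None, 'high']
```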
Detection of affective states in speech could improve the way users interact with electronic devices. However, analysis of speech at the acoustic level alone may not be enough to determine the emotion of a user speaking in a realistic scenario. In this paper we analysed the spontaneous speech recordings of the FAU Aibo Corpus at the acoustic and linguistic levels to extract two sets of acoustic and...
This paper describes three categorical classification approaches to spontaneous children's emotion recognition based on acoustic features from speech. We also present a fourth approach that combines the two best classifiers by stacked generalisation. We used the FAU Aibo Corpus to work under real-life conditions, dealing with spontaneous speech and with low emotional expressiveness, unbalanced data,...
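A hedged sketch of stacked generalisation with scikit-learn's StackingClassifier; the base learners and meta-learner shown are placeholders, not the classifiers reported for the FAU Aibo experiments.

```python
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

stack = StackingClassifier(
    estimators=[("svm", SVC(probability=True)),
                ("rf", RandomForestClassifier())],
    final_estimator=LogisticRegression(),  # meta-learner over base predictions
    cv=5,  # out-of-fold predictions avoid leaking training labels upward
)
# stack.fit(X_train, y_train); stack.predict(X_test)
```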
Articulation training with many kinds of stimuli, such as visual, voice, and articulatory information, can teach users to pronounce correctly and improve their articulatory ability. In this paper, an articulation training system with an intelligent interface and multimodal feedback is proposed to improve the performance of articulation training. Clinical knowledge of speech evaluation is...
This paper describes a system that deploys acoustic and linguistic information from speech in order to decide whether an utterance carries negative or non-negative meaning. An earlier version of this system was submitted to the Interspeech-2009 Emotion Challenge evaluation. The speech data consist of short utterances of children's speech, and the proposed system is designed to detect anger in...
Increasing effort has recently been devoted to research on emotional speech. Although we may sometimes be able to make a definite perceptual decision on an emotional state, emotion is actually a kind of cline in a large vector space: different emotions can be thought of as zones along an emotional vector. To resolve the ambiguity of emotion perception, the authors conducted an array of perception experiments...
Recognition of emotion in speech usually uses acoustic models that ignore the spoken content; likewise, one general model per emotion is trained independently of the phonetic structure. Given sufficient data, this approach seemingly works well enough. Yet, this paper tries to answer the question of whether acoustic emotion recognition strongly depends on phonetic content, and whether models tailored for the...
Facial expression recognition can be divided into three steps: face detection, expression feature extraction, and expression categorization. Feature extraction and categorization are the key issues. To address them, we propose a method combining local binary patterns (LBP) and an embedded hidden Markov model (EHMM), which is the key contribution of this paper. This paper first...
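The LBP feature-extraction step can be sketched with scikit-image: per-block uniform-LBP histograms over a face crop yield a spatially ordered feature vector. The grid size and parameters are illustrative, and the EHMM categorization stage, which has no standard library implementation, is not shown.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_block_histograms(face, grid=(4, 4), P=8, R=1):
    codes = local_binary_pattern(face, P, R, method="uniform")  # per-pixel codes
    h, w = face.shape
    bh, bw = h // grid[0], w // grid[1]
    hists = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            block = codes[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            # Uniform LBP with P neighbours produces P + 2 distinct codes.
            hist, _ = np.histogram(block, bins=P + 2, range=(0, P + 2))
            hists.append(hist / max(hist.sum(), 1))  # normalised block histogram
    return np.concatenate(hists)  # one spatially-ordered feature vector

feats = lbp_block_histograms(np.random.rand(64, 64))  # stand-in for a face crop
```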