Automatic voiceprint recognition, based on the human speech signal, serves many practical applications. A number of studies have been undertaken on the basis of normal speech. This research aims to develop an automatic voiceprint recognition system based on emotional speech signals in the Indonesian language. The study is limited to four different people with speeches of four distinctive emotional...
In this paper we present our approach to creating a voice-based interface for one of the leading Lithuanian bus route search systems, www.autobusubilietai.lt. We designed a hybrid speech recognition system based on one Lithuanian speech recognizer (LIEPA) and two foreign-language recognizers (German and Spanish). We experimented with different methods that may be used for combining outputs...
In this paper, a system based on support vector machines is proposed for content-based dialect classification and retrieval. This work is part of an ongoing effort to address the needs of under-resourced languages. The recognition system is intended to serve the interests of the Pashto-speaking community and to help keep the language's dialects alive. Voice samples are collected...
With the increase in human-machine interaction, speech analysis has become integral to bridging the gap between the physical and digital worlds. An important subfield within this domain is the recognition of emotion in speech signals, which was traditionally studied in linguistics and psychology. Speech emotion recognition has diverse applications. The prime objective of this paper...
To address the problem of low speech recognition rates, an improved method combining a Deep Belief Network (DBN) with a support vector machine (SVM) for analyzing small-sample speech signals is proposed. The speech signal data collected as training samples are used to train the DBN and obtain optimal parameter values. The trained DBN is then used for feature extraction, and these speech sample data signals...
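The DBN-as-feature-extractor-into-SVM pattern described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's actual configuration: a single `BernoulliRBM` layer from scikit-learn stands in for a full DBN, and the data, labels, and hyperparameters are synthetic placeholders.

```python
# Sketch of the "DBN features fed to an SVM" pattern. A single RBM layer
# substitutes for a full DBN; all data and parameters are illustrative.
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Tiny synthetic "small-sample" dataset: 40 frames of 20-dim features in [0, 1].
X = rng.random((40, 20))
y = (X[:, 0] > 0.5).astype(int)  # toy labels

model = Pipeline([
    # Unsupervised feature learning on the raw frames.
    ("rbm", BernoulliRBM(n_components=10, learning_rate=0.05,
                         n_iter=20, random_state=0)),
    # SVM classifies in the learned feature space.
    ("svm", SVC(kernel="rbf")),
])
model.fit(X, y)
```

A real DBN would stack several RBM layers and fine-tune them before handing the activations to the SVM; the pipeline structure, however, stays the same.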
Traditional speech-related identity recognition commonly attends to a single aspect of the speech signal, but in reality speech signals comprise semantics, speaker-dependent features, and more. This paper therefore presents a new study that simultaneously recognizes multiple dimensions of speaker information. In order to extract sufficient relational features, both high-level and low-level features...
In this paper, we propose to classify pathological voices, in particular to discriminate among three organic pathologies (polyp, edema, and nodule) using new features. The principal contribution of this work is a new parameter that is more efficient than the classic MFCC: the MFCCs are calculated not from the speech signal itself but from the speech multiscale...
Several speech processing and audio data-mining applications rely on a description of the acoustic environment as a feature vector for classification. The discriminative properties of the feature domain play a crucial role in the effectiveness of these methods. In this work, we consider three environment identification tasks and the task of acoustic model selection for speech recognition. A set of...
Bidirectional long short-term memory (BLSTM) recurrent neural networks (RNNs) have achieved state-of-the-art performance in many sequence processing problems thanks to their capability of capturing contextual information. However, for languages with a limited amount of training data, it is still difficult to obtain a high-quality BLSTM model for emphasis detection, whose aim is to recognize the emphasized...
This paper addresses the problem of speech emotion recognition from movie audio tracks. The recently collected Acted Facial Expression in the Wild 5.0 database is used. The aim is to discriminate among angry, happy, and neutral. We extract a relatively small number of features, a subset of which is not commonly used for the emotion recognition task. Those features are fed as input to an ensemble classifier...
We propose a neural-network training algorithm that is robust to data imbalance in classification. In our proposed algorithm, weights are introduced for training examples, effectively modifying the trajectory traversed in parameter space during learning. Furthermore, the proposed algorithm reduces to normal stochastic gradient descent learning when the data are balanced. On the...
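One common instance of this idea, shown below as an illustrative sketch rather than the paper's exact algorithm, weights each example inversely to its class frequency and folds those weights into the SGD update. When the classes are balanced, every weight equals 1 and the update is plain stochastic gradient descent, matching the reduction the abstract describes.

```python
# Illustrative sketch: inverse-class-frequency example weights in SGD for
# logistic regression. Not the paper's algorithm; all names are our own.
import numpy as np

def weighted_sgd_logreg(X, y, epochs=200, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    counts = np.bincount(y, minlength=2)
    w_ex = (n / (2.0 * counts))[y]  # weight = n / (2 * class count); 1 if balanced
    theta = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            p = 1.0 / (1.0 + np.exp(-X[i] @ theta))      # predicted probability
            theta -= lr * w_ex[i] * (p - y[i]) * X[i]    # weighted gradient step
    return theta

rng = np.random.default_rng(1)
# Imbalanced toy data: 90 negatives around (-1, -1), 10 positives around (1, 1).
X = np.vstack([rng.normal(-1, 0.5, (90, 2)), rng.normal(1, 0.5, (10, 2))])
y = np.array([0] * 90 + [1] * 10)
theta = weighted_sgd_logreg(X, y)
preds = (X @ theta > 0).astype(int)
```

The minority-class weight (here n/(2·10) = 5) keeps the rare positive examples from being swamped by the 90 negatives during training.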
Automatic spoken digit recognition is an important area of speech recognition, and recognition of digits spoken in local languages is the next stage in this technological advancement. This paper presents a new approach to Pashto digit recognition using spectral- and prosodic-based feature extraction. Very little work has been done on Pashto spoken digit recognition, which is why no standard...
Emotions exhibited by a speaker can be detected by analyzing his or her speech, facial expressions, and gestures, or by combining these properties. This paper concentrates on determining the emotional state from speech signals. Various acoustic features, such as energy, zero crossing rate (ZCR), fundamental frequency, Mel Frequency Cepstral Coefficients (MFCCs), etc., are extracted for short-term, overlapping...
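The short-term, overlapping-frame analysis this abstract refers to can be sketched with NumPy for two of the listed features, energy and ZCR. The frame and hop sizes below (25 ms and 10 ms at 16 kHz) are conventional choices, not the paper's; MFCC extraction would additionally need a mel filterbank and is omitted.

```python
# Hedged sketch: frame-wise short-term energy and zero crossing rate (ZCR)
# over overlapping windows. Frame/hop sizes are illustrative conventions.
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split signal x into overlapping frames of length frame_len."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])

def short_term_features(x, frame_len=400, hop=160):
    frames = frame_signal(x, frame_len, hop)
    energy = np.sum(frames ** 2, axis=1)  # short-term energy per frame
    # Each sign change contributes 1 crossing; normalize by frame length.
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) / 2, axis=1)
    return energy, zcr

# 1 s of a 440 Hz tone at 16 kHz: ZCR should sit near 2 * 440 / 16000 = 0.055.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)
energy, zcr = short_term_features(x)
```

A pure tone crosses zero twice per period, which is why the expected ZCR is twice the frequency divided by the sample rate.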
Many pattern recognition problems involve characterizing samples with continuous labels instead of discrete categories. While regression models are suitable for these learning tasks, these labels are often discretized into binary classes to formulate the problem as a conventional classification task (e.g., classes with low versus high values). This methodology brings intrinsic limitations on the classification...
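The discretization step this abstract critiques can be shown in two lines; the median split below is purely illustrative, as the thresholding rule varies by task.

```python
# Toy illustration of turning continuous labels into binary classes.
# Splitting at the median is one common (and lossy) convention.
import numpy as np

scores = np.array([0.1, 0.4, 0.45, 0.55, 0.6, 0.9])  # continuous labels
classes = (scores > np.median(scores)).astype(int)   # 0 = "low", 1 = "high"
```

The lossiness is visible here: 0.45 and 0.55 land in opposite classes despite being nearly identical, which is exactly the kind of intrinsic limitation the abstract mentions.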
This paper presents an approach that aims to recognize stressed speech utterances. Our work consists of extracting features using Mel Frequency Cepstral Coefficients (MFCC) and Gammatone Frequency Cepstral Coefficients (GFCC). These features are then classified with a One-Class Support Vector Machine (OC-SVM). The results of the proposed method are obtained on speech samples of four stressed...
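The one-class classification step can be illustrated with scikit-learn's `OneClassSVM`. In this sketch the MFCC/GFCC extraction is skipped and replaced by synthetic 13-dimensional vectors, and the `nu`/`gamma` values are defaults rather than the paper's settings.

```python
# Illustrative OC-SVM usage: fit on one condition's feature vectors, then
# flag vectors from a different condition. Features here are synthetic
# stand-ins for MFCC/GFCC frames.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
train = rng.normal(0, 1, (100, 13))     # vectors of the modeled (neutral) class
inliers = rng.normal(0, 1, (20, 13))    # held-out vectors of the same class
outliers = rng.normal(6, 1, (20, 13))   # clearly different "stressed" condition

ocsvm = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(train)
# predict() returns +1 for the modeled class, -1 for everything else.
flags = ocsvm.predict(outliers)
```

Because only one class is modeled, no stressed-speech examples are needed at training time, which is the practical appeal of the OC-SVM formulation.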
Human facial expressions change with different states of health; therefore, a facial-expression recognition system can be beneficial to a healthcare framework. In this paper, a facial-expression recognition system is proposed to improve the service of the healthcare in a smart city. The proposed system applies a bandlet transform to a face image to extract sub-bands. Then, a weighted, center-symmetric...
The use of non-verbal vocal input (NVVI) as a hands-free trigger approach has proven valuable in previous work [7]. Nevertheless, BlowClick's original detection method is vulnerable to false positives and is thus limited in its potential uses, e.g., together with acoustic feedback for the trigger. Therefore, we extend the existing approach with common machine learning methods. We found...
In the field of Human-Computer Interaction (HCI), human emotion recognition from speech signals has emerged as an active research area. Speech is the most common means of communication among human beings. Speech consists of sentences, which can be segregated into words; words consist of phonemes, which are considered the primary elements of voice construction. This paper presents a classification...
This paper presents an approach that aims to recognize stress in speech. The proposed system is based on wavelet packet prosody features, extracted from speech according to the Mel, Bark, and ERB scales. Multiclass Support Vector Machines are used as the base classifiers to classify the stress states. The speech utterances used in this study are taken from Speech...
The field of emotion recognition (ER) is a part of human-computer interaction and has evolved very rapidly over the last decade. Considerable work has been done on emotion recognition using audio and video separately; more recently, research has turned to fusing the different modalities. The aim of this paper is to fuse the results of emotion detection obtained using audio and visual...