Post-traumatic stress disorder (PTSD) is a trauma- and stressor-related disorder that develops after exposure to a traumatic or adverse environmental event that caused serious harm or injury. Structured interviews are the only widely accepted clinical practice for PTSD diagnosis, but they suffer from several limitations, including the stigma associated with the disease. Diagnosis of PTSD patients by analyzing speech...
The aim of our proposed research is to identify the language of a spoken utterance using visual speech recognition and to include the Marathi language in a language identification (LID) system. In this paper we focus on identifying the first three digits in Marathi. First, the lips are extracted from video frames of face images, and then landmark points on the lips are detected. Then...
Depression is a highly prevalent mental disorder with negative effects on individuals, their families, society and the economy. In recent years, automatic detection of depression from the speech signal has attracted growing interest. In this paper, a new multiple classifier system for depression recognition is developed and tested. The novel aspect of this methodology is the combination...
Accurate speech endpoint detection is important for speaker recognition, speech recognition, coding, transmission, and so on. In this paper, a fusion feature is proposed for speech endpoint detection that uses the zero-crossing rate, Lempel-Ziv complexity (LZC), C0 complexity and fluctuation complexity to represent the speech signal. In order to classify speech signal and background signal,...
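Two of the four fused features can be computed per frame in a few lines. The following is a minimal sketch, assuming each frame is a non-empty list of float samples and that the frame is binarized around its median before computing LZC (a common convention, not necessarily the paper's). It counts Lempel-Ziv (1976) phrases with the Kaspar-Schuster procedure and computes a per-frame zero-crossing rate; C0 complexity and fluctuation complexity are omitted, and all function names are hypothetical.

```python
import math
import statistics


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs (0 is treated as positive)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def lz76_complexity(bits):
    """Number of phrases in the LZ76 parsing of a binary sequence (Kaspar-Schuster)."""
    n = len(bits)
    if n < 2:
        return n
    # i: candidate match start in the prefix, k: current match length,
    # l: start of the phrase being parsed, c: phrases counted so far.
    i, k, l = 0, 1, 1
    c, k_max = 1, 1
    while True:
        if bits[i + k - 1] == bits[l + k - 1]:
            k += 1
            if l + k > n:  # sequence ended while still copying a match
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:  # no earlier start reproduces the phrase: close it
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def normalized_lzc(frame):
    """LZC of the median-binarized frame, normalized by n / log2(n)."""
    n = len(frame)
    if n < 2:
        return 0.0
    med = statistics.median(frame)
    bits = [1 if v > med else 0 for v in frame]
    return lz76_complexity(bits) * math.log2(n) / n


def fusion_features(frame):
    """Hypothetical two-element slice of the paper's fused feature vector."""
    return [zero_crossing_rate(frame), normalized_lzc(frame)]
```

Noise-like background frames tend to score high on both features, while voiced speech is more regular; how the paper weights and classifies the fused vector is not stated in this snippet.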
Deep Neural Networks (DNN) are currently the dominant technique in English and Chinese speech recognition. However, research on Tibetan speech recognition started late and mainly uses Hidden Markov Models (HMM). In this paper, we show a better method: replacing Gaussian Mixture Models (GMM) with DNN in a Tibetan Lhasa dialect speech recognition system. The system contains seven layers of features...
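In the usual hybrid DNN-HMM setup, the DNN outputs state posteriors P(s|x) where the GMM used to supply likelihoods p(x|s), so the posteriors are divided by state priors P(s) (typically estimated from state counts in the training alignments) before HMM decoding. The sketch below shows only this conversion in log space; it is the standard hybrid recipe, assumed rather than confirmed to be the paper's exact method, and the function name and `floor` parameter are hypothetical.

```python
import math


def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """Convert DNN state posteriors P(s|x) into scaled log-likelihoods.

    By Bayes' rule, log p(x|s) - log p(x) = log P(s|x) - log P(s); the
    p(x) term is the same for every state, so it does not affect decoding.
    Values are floored to avoid log(0).
    """
    if len(posteriors) != len(priors):
        raise ValueError("posteriors and priors must have the same length")
    return [math.log(max(p, floor)) - math.log(max(q, floor))
            for p, q in zip(posteriors, priors)]
```

These scores replace the GMM emission log-likelihoods in the Viterbi search; the HMM topology and transition probabilities are unchanged.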
To improve the precision of speech emotion recognition, this paper proposes a novel speech emotion recognition approach based on a Gaussian kernel nonlinear Proximal Support Vector Machine (PSVM) to recognize four basic human emotions (anger, joy, sadness, surprise). First, the speech signal is preprocessed, including sampling, quantization, pre-emphasis, framing, windowing and endpoint...
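The software side of that preprocessing chain (pre-emphasis, framing, windowing) can be sketched as below; sampling and quantization happen at the analog-to-digital converter and endpoint detection is not detailed here. This is a minimal sketch, assuming a first-order pre-emphasis filter y[n] = x[n] - αx[n-1] with α = 0.97, 400-sample frames with a 160-sample hop (25 ms / 10 ms at 16 kHz) and a Hamming window; these values are common defaults, not parameters reported by the paper, and the function names are hypothetical.

```python
import math


def pre_emphasis(x, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1] (y[0] = x[0])."""
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames; a trailing partial frame is dropped."""
    return [x[start:start + frame_len]
            for start in range(0, len(x) - frame_len + 1, hop)]


def hamming(n_points):
    """Hamming window w[n] = 0.54 - 0.46 cos(2*pi*n / (N - 1))."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def preprocess(x, frame_len=400, hop=160, alpha=0.97):
    """Pre-emphasize, frame and window a signal; returns a list of windowed frames."""
    window = hamming(frame_len)
    return [[s * w for s, w in zip(frame, window)]
            for frame in frame_signal(pre_emphasis(x, alpha), frame_len, hop)]
```

For one second of 16 kHz audio this yields 98 frames of 400 samples each; tapering each frame with the window reduces spectral leakage in any later FFT-based feature extraction.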
Speech emotion recognition has been widely used in human-computer interaction and its applications. This paper classifies emotion into two classes: happy and angry. All speech signals are preprocessed from a Malay spoken speech database. Emotional information is obtained by applying two well-established acoustic features: Mel Frequency Cepstral Coefficients (MFCC) and Short Time Energy (STE)...
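Short Time Energy is the sum of squared samples in each analysis frame, E_m = Σ x[n]² over the samples of frame m. A minimal sketch, assuming rectangular frames whose length and hop are free parameters chosen here for illustration (the snippet does not give the paper's values); MFCC extraction, which additionally needs an FFT and a mel filterbank, is left to a signal-processing library.

```python
def short_time_energy(x, frame_len, hop):
    """Return the energy (sum of squared samples) of each full frame of x."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [sum(v * v for v in x[start:start + frame_len])
            for start in range(0, len(x) - frame_len + 1, hop)]
```

With `hop < frame_len` the frames overlap, which gives a smoother energy contour at the cost of more frames.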
The accuracy of object recognition has greatly improved owing to the rapid development of deep learning, but deep learning generally requires a lot of training data, and the training process is slow and complex. We propose an incremental object recognition system based on deep learning techniques and speech recognition technology, with high learning speed and wide applicability. The system...
Flow pattern is one of the most important parameters of gas-liquid two-phase flow. In this work, a new flow pattern identification method based on a Convolutional Neural Network (CNN) is presented. A 7-layer CNN structure is chosen, and the parameters of this network are determined from a training set. To verify feasibility, experiments were carried out in a horizontal pipe with an inner diameter...
Keyword extraction is widely used for information indexing, compression, summarization, etc. Existing keyword extraction techniques apply various text-based algorithms and metrics to locate keywords. At the same time, some types of audio and audiovisual content, e.g. lectures, talks, interviews and other speech-oriented material, allow keyword search by prosodic accents made by a...
The results of implementing an external accent recognition system and integrating it into the massive open online course platform Moodle are reported. Accent recognition is important in foreign language learning for giving students feedback on the presence of a particular unwanted accent in their foreign-language pronunciation. Implementation of several accent recognition methods and their...
The aim of this study is to propose an algorithm that combines two speech recognition systems. These systems differ in the methods used in the feature extraction stage but share the same classifier, a Hidden Markov Model (HMM). The first system uses Mel-Frequency Cepstral Coefficients (MFCC), the second uses Linear Prediction Cepstral Coefficients (LPCC), and the third uses Perceptual...
A speaker-dependent speech recognition system must not only recognize speech but also recognize the speaker of each segment. In this paper, two indicators, the short-time average zero-crossing rate and dual-threshold endpoint detection, are selected to detect signal endpoints based on a study of speaker-dependent isolated-word speech characteristics, and MFCC parameters are taken...
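Classic dual-threshold endpoint detection uses short-time energy with a high and a low threshold, then the zero-crossing rate to recover weak unvoiced onsets and offsets. The sketch below is a minimal single-word version of that idea, assuming one isolated word per recording; the thresholds `high`, `low` and `zcr_thresh` and the function names are hypothetical and would be tuned on background-noise frames, not values taken from the paper.

```python
def frame_features(x, frame_len, hop):
    """Per-frame short-time energy and short-time average zero-crossing rate."""
    energies, zcrs = [], []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        energies.append(sum(v * v for v in frame))
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        zcrs.append(crossings / max(frame_len - 1, 1))
    return energies, zcrs


def dual_threshold_endpoints(energies, zcrs, high, low, zcr_thresh):
    """Return (first_frame, last_frame) of the word, or None if no frame exceeds `high`."""
    # Stage 1: frames whose energy clearly indicates speech.
    loud = [i for i, e in enumerate(energies) if e > high]
    if not loud:
        return None
    start, end = loud[0], loud[-1]
    # Stage 2: extend outward while energy stays above the low threshold.
    while start > 0 and energies[start - 1] > low:
        start -= 1
    while end < len(energies) - 1 and energies[end + 1] > low:
        end += 1
    # Stage 3: extend further while the ZCR is high (weak fricatives such as /s/).
    while start > 0 and zcrs[start - 1] > zcr_thresh:
        start -= 1
    while end < len(zcrs) - 1 and zcrs[end + 1] > zcr_thresh:
        end += 1
    return start, end
```

Because everything between the first and last high-energy frame is treated as one word, this suits isolated-word input; continuous speech would need segment-by-segment handling.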
Recently we proposed a novel multichannel end-to-end speech recognition architecture that integrates multichannel speech enhancement and speech recognition into a single neural-network-based architecture, and we demonstrated its fundamental utility for automatic speech recognition (ASR). However, the behavior of the proposed integrated system remains insufficiently understood. An open...
Effective presentation skills can help one succeed in business, career and academia. This paper presents the design of a speech assessment during oral presentations and an algorithm for speech evaluation based on criteria of optimal intonation. As the pace of speech and its optimal intonation vary from language to language, developing automatic language identification during the presentation...
Speech recognition systems have applications in many areas, such as customer call centers, and serve as an aid for people with learning disabilities. There are three main stages in speech recognition: signal analysis, feature extraction and modeling. Feature extraction plays an important role in a speech recognition system, and a good speech feature extraction technique will allow the system to...
Speech classification is an important part of speech signal processing. Classifying speech accurately and quickly is significant for speech coding and speech synthesis. Because of the diversity and uncertainty of speech signals, traditional classification methods are slow and not very accurate in large-scale, real-world speech classification. In order to improve the accuracy and...
The goal of this work is to validate the impact of natural elicitation of emotions by speakers during the development of speech emotion databases for the Malayalam language. The work also proposes a Gaussian Mixture Model-Deep Belief Network (GMM-DBN) based speech emotion recognition system. To test the effect of emotion elicitation by the speakers, two independent datasets with emotionally biased...
This paper presents a study of speech recognition based on electromyographic biosignals captured from the articulatory muscles of the face using surface electrodes. It compares speech recognition of spoken English and Malay words by a group of native Malay speakers. Feature extraction was done in both the temporal and time-frequency domains. The temporal features used are integrated EMG...
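Integrated EMG (IEMG) is the sum of the absolute values of the EMG samples over an analysis window, a simple measure of muscle activation. A minimal sketch, assuming the signal is already band-pass filtered and that windows are fixed-length with a hop; the window length, hop and function names are hypothetical, and the other temporal features the snippet elides are not reconstructed here.

```python
def integrated_emg(window):
    """IEMG: sum of absolute sample values over one analysis window."""
    return sum(abs(v) for v in window)


def windowed_iemg(x, win_len, hop):
    """IEMG of each full window of x, giving one temporal feature per window."""
    if win_len <= 0 or hop <= 0:
        raise ValueError("win_len and hop must be positive")
    return [integrated_emg(x[start:start + win_len])
            for start in range(0, len(x) - win_len + 1, hop)]
```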
The paper deals with affective computing to improve the performance of human-machine interaction. The focus of this work is to detect the affective state of a human using speech processing techniques, primarily for call centre applications. Little work has been reported to date on affect detection using phase-derived features. A unique combination of Group delay (GD), Phase delay (PD), One Sided...
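Group delay is τ_g(ω) = -dφ(ω)/dω and phase delay is τ_p(ω) = -φ(ω)/ω, where φ is the phase of the frame's spectrum X(ω). Group delay can be computed without phase unwrapping through the identity τ_g = Re{Y X*} / |X|², where Y is the spectrum of n·x[n]. The sketch below assumes a single frame and uses a direct O(N²) DFT from the standard library for clarity; the function names and the `eps` guard are hypothetical, and how the paper turns these quantities into features is not shown in this snippet.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform (O(N^2), adequate for short frames)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def group_delay(x, eps=1e-12):
    """Group delay in samples at each DFT bin via Re{Y X*} / |X|^2, with Y = DFT(n*x[n])."""
    spec_x = dft(x)
    spec_y = dft([n * v for n, v in enumerate(x)])
    return [(yk * xk.conjugate()).real / max(abs(xk) ** 2, eps)
            for xk, yk in zip(spec_x, spec_y)]


def phase_delay(x):
    """(omega, -phase/omega) for bins 1..N//2; omega = 0 is excluded (undefined)."""
    spec_x = dft(x)
    n_pts = len(x)
    out = []
    for k in range(1, n_pts // 2 + 1):
        omega = 2 * math.pi * k / n_pts
        out.append((omega, -cmath.phase(spec_x[k]) / omega))
    return out
```

For a unit impulse delayed by two samples, every bin has a group delay of 2. The phase delay is also 2 at low frequencies but becomes unreliable once the true phase leaves (-π, π], because `cmath.phase` returns the wrapped value.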