This work proposes a method for classifying shouted multi-speaker speech versus normal single-speaker speech, the most frequently occurring scenario in news debates. In this work, the multi-speaker shouted and single-speaker normal speech classes are referred to as shouted and normal speech, respectively. Spectral features and source features are explored for the classification task. The...
Post-traumatic stress disorder (PTSD) is a trauma- and stressor-related disorder that develops after exposure to a traumatic or adverse environmental event that caused serious harm or injury. The structured interview is the only widely accepted clinical practice for PTSD diagnosis, but it suffers from several limitations, including the stigma associated with the disease. Diagnosis of PTSD patients by analyzing speech...
Depression is a mental disorder of high prevalence, leading to a negative effect on individuals, their families, society and the economy. In recent years, the problem of automatic detection of depression from the speech signal has gained more interest. In this paper, a new multiple classifier system for depression recognition was developed and tested. The novel aspect of this methodology is the combination...
This paper focuses on open-set text-independent speaker identification, which is one of the most challenging subclasses of speaker recognition. The initial stage is similar to closed-set speaker identification, where the distortion of each test voice against all training voices is determined. The normalized distortions are used as the decision criterion, which eases the process of thresholding. The threshold...
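The open-set decision described above can be sketched as follows. This is a minimal illustration, not the paper's method: the abstract does not state the normalization or the threshold value, so dividing by the mean distortion and the default `threshold=0.5` are assumptions, and `identify_open_set` is a hypothetical name.

```python
from typing import List, Optional, Sequence


def normalize_by_mean(distortions: Sequence[float]) -> List[float]:
    """Scale distortions by their mean (an assumed normalization)."""
    if not distortions:
        raise ValueError("need at least one enrolled speaker")
    mean = sum(distortions) / len(distortions)
    if mean == 0:
        return [0.0 for _ in distortions]
    return [d / mean for d in distortions]


def identify_open_set(distortions: Sequence[float],
                      threshold: float = 0.5) -> Optional[int]:
    """Return the index of the best-matching enrolled speaker,
    or None when the test voice is judged to be unknown.

    distortions[i] is the distortion of the test voice against
    enrolled speaker i (lower means a closer match).
    """
    normalized = normalize_by_mean(distortions)
    best = min(range(len(normalized)), key=normalized.__getitem__)
    # Accept only if the best match stands out from the average distortion.
    return best if normalized[best] <= threshold else None
```

A closed-set system would always return `best`; the threshold test is what lets the open-set system reject speakers who were never enrolled.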
With increasing stress at work and in study, mental health has become a major problem in current social research. Generally, researchers can analyze psychological health states by using social perception behavior. The speech signal is an important research direction in this domain. It objectively assesses the mental health of social groups through the extraction and fusion of speech features...
Speech emotion recognition has been widely used in human-computer interaction and applications. This paper classifies emotion into two classes: happy and angry. All speech signals, taken from a Malay spoken speech database, are preprocessed. Emotional information is obtained by applying two well-established acoustic features, Mel Frequency Cepstral Coefficients (MFCC) and Short Time Energy (STE)...
The results of the implementation of an external accent recognition system and its integration into the massive open online course platform Moodle are reported. Accent recognition is important in foreign language learning, where it gives a student feedback on the presence of a certain unwanted accent in foreign language pronunciation. Implementation of several accent recognition methods and their...
The aim of this study is to suggest an algorithm that combines two speech recognition systems. These systems differ in the methods used in the feature extraction stage, but they share the same classifier, the Hidden Markov Model (HMM). The first system uses Mel-Frequency Cepstrum Coefficients (MFCC), the second one uses Linear Prediction Cepstrum Coefficients (LPCC), and the third system uses Perceptual...
A speaker-dependent speech recognition system must recognize not only the speech but also the speaker of the segment. In this paper, two indicators, the short-time average zero-crossing rate and the dual-threshold endpoint, are selected to test the signal endpoint through the study of speaker-dependent isolated-word speech characteristics, and MFCC parameters are taken...
The goal of this work is to validate the impact of natural elicitation of emotions by the speakers during the development of speech emotion databases for the Malayalam language. The work also proposes a Gaussian Mixture Model-Deep Belief Networks (GMM-DBN) based speech emotion recognition system. To test the effect of emotion elicitation by the speakers, two independent datasets with emotionally biased...
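The two indicators named above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: the frame length, hop, and threshold values are left to the caller, and the dual-threshold search here runs on short-time energy only, whereas practical endpoint detectors often combine energy with the zero-crossing rate. All function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def split_frames(signal: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Cut the signal into overlapping frames, dropping any short tail."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame: Sequence[float]) -> float:
    """Sum of squared samples in the frame."""
    return sum(x * x for x in frame)


def dual_threshold_endpoints(energies: Sequence[float],
                             low: float, high: float) -> Optional[Tuple[int, int]]:
    """Find (start, end) frame indices of the active segment.

    Frames above `high` anchor the segment; it is then extended
    outward while neighbouring frames stay above `low`.
    """
    anchors = [i for i, e in enumerate(energies) if e > high]
    if not anchors:
        return None
    start, end = anchors[0], anchors[-1]
    while start > 0 and energies[start - 1] > low:
        start -= 1
    while end < len(energies) - 1 and energies[end + 1] > low:
        end += 1
    return start, end
```

The high threshold guards against triggering on noise, while the low threshold recovers the weaker onset and offset of the word.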
The robustness of speaker verification systems is often degraded in real forensic applications, which contain environmental noise and reverberation. Reverberation results in mismatched conditions between enrolment and test speech signals. In this work, we investigate the effectiveness of combining features of discrete wavelet transform (DWT) and feature-warped mel frequency cepstral coefficients (MFCCs)...
Speaker verification based on a phonetic-acoustic approach and a text-dependent framework has been applied for forensic purposes in Indonesian courts since 2008. In order to accelerate the speaker verification process, an automatic text-independent system is developed. This automatic system employs MFCC features and GMM speaker modeling, a standard and simple approach used in automatic speaker recognition...
This paper presents a study of how speech features have comparable parameters amongst blood relations. Mel Frequency Cepstral Coefficients (MFCC) are used to extract the features of the input speech signal, and vector quantization through the modified k-means LBG (Linde, Buzo, and Gray) algorithm is implemented to analyse and estimate the similarity for related studies. The study is...
Speech Synthesis (SS) and Voice Conversion (VC) present a genuine risk of attacks on Automatic Speaker Verification (ASV) technology. In this paper, we evaluate a front-end anti-spoofing technique to protect ASV systems against SS and VC attacks using a standard benchmarking database. In particular, we propose a novel feature set, namely, Energy Separation Algorithm-based Instantaneous Frequency Cosine...
The vulnerability of automatic speaker verification (ASV) systems to spoofing attacks is an important security concern for the reliability of ASV technology. Recently, various countermeasures have been developed for spoofing detection. In this paper, we propose to use features derived from the linear prediction (LP) residual signal for spoofing detection using a simple Gaussian mixture model (GMM)...
In this paper, automatic speaker verification using normal and whispered speech is explored. Typically, for speaker verification systems, varying vocal effort inputs during the testing stage significantly degrade system performance. Solutions such as feature mapping or the addition of multi-style data during the training and enrollment stages have been proposed but do not show similar advantages for the...
The Glottal Mixture Model (GLOMM) extracts speaker-dependent voice source information from speech data. It has previously been shown to provide speaker identification performance on clean speech comparable to the universal background model (UBM), a state-of-the-art method based on MFCC. When combined with UBM, the error rate was reduced by a factor of three, showing that the voice source information...
Background noise reduction has been studied for many years. However, suppression of unwanted human speech noise is not well discussed due to the sparsity of the speech signal. Traditional blind source separation (BSS) methods such as independent component analysis (ICA) assume prior knowledge of the number of sources and require that the number of sources equal the number of sensors. The above limitations...
Speaker recognition has been developed over many years and comes with many different methods. MFCC is one of the more successful methods because it is generally modeled on the human auditory system. It achieves a high recognition rate and strong robustness against noise in the lower frequency regions. However, in the higher frequency regions, it captures speaker characteristics information...
The growth in human-computer interaction has necessitated accurate recognition of emotion from speech data. This paper presents a novel feature called TEO (Teager Energy Operator) Slope for emotion recognition. The feature is obtained by applying a least-squares fit instead of the DCT in the TEO feature. The feature was tested on the publicly available Berlin Emotion Database...
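The two building blocks named above can be sketched as follows. The discrete Teager Energy Operator, psi[n] = x[n]^2 - x[n-1] * x[n+1], is standard; what the paper fits the line to is not stated in the truncated abstract, so `least_squares_slope` is shown only as a generic helper over any sequence of values, and both function names are hypothetical.

```python
from typing import List, Sequence


def teager_energy(x: Sequence[float]) -> List[float]:
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The output is two samples shorter than the input because the
    first and last samples lack a neighbour.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def least_squares_slope(values: Sequence[float]) -> float:
    """Slope of the least-squares line through (0, v0), (1, v1), ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den
```

A single slope summarizes the trend of a sequence in one number, whereas a DCT would yield a vector of coefficients.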