Dialect can be defined as a variety of a language that is distinguished from other varieties of the same language by pronunciation, grammar and vocabulary. The process of recognizing such dialects is called Dialect Identification. Kamrupi, although a dialect of the Assamese language, is spoken both in Assam (Kamrup district) and North Bengal. In this paper, we describe a method to identify not just...
This paper aims to bring to light non-intrusive speech quality assessment using Teager-Kaiser energy computation. Based on this energy computation technique, features in the form of cepstral coefficients are calculated and then compared with the classical Mel-frequency cepstral coefficients. The energy computation technique is widely used in the automatic speech recognition area. The...
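The discrete Teager-Kaiser energy operator at the heart of this approach is compact enough to sketch directly; the following minimal Python version (plain lists, no audio I/O, illustrative function name) applies the operator ψ[x(n)] = x(n)² − x(n−1)·x(n+1) at each interior sample:

```python
def teager_kaiser_energy(x):
    """Discrete Teager-Kaiser energy operator:
    psi[x(n)] = x(n)^2 - x(n-1) * x(n+1).
    Returns the operator applied at the interior samples of x."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]
```

For a pure tone A·cos(ωn) the operator evaluates exactly to A²·sin²(ω), so it couples amplitude and frequency into a single instantaneous energy value — which is what makes it attractive as a front end for cepstral features.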
Automatic spoken digit recognition is one of the important areas in speech recognition. Recognition of digits spoken in local languages is the next stage in this technological advancement. This paper presents a new approach to Pashto digit recognition using spectral and prosodic feature extraction. Very little or almost no work has been done on Pashto spoken digit recognition. That is why no standard...
Emotions exhibited by a speaker can be detected by analyzing his/her speech, facial expressions and gestures, or by combining these properties. This paper concentrates on determining the emotional state from speech signals. Various acoustic features such as energy, zero crossing rate (ZCR), fundamental frequency, Mel Frequency Cepstral Coefficients (MFCCs), etc., are extracted for short-term, overlapping...
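Two of the features named above, short-term energy and zero crossing rate, can be sketched over overlapping frames as follows (plain Python; the frame length and hop size are illustrative defaults, not values from the paper):

```python
def short_term_features(signal, frame_len=256, hop=128):
    """Slide an overlapping window over the signal and compute, per frame,
    the average energy and the zero crossing rate."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # mean squared amplitude of the frame
        energy = sum(s * s for s in frame) / frame_len
        # fraction of adjacent sample pairs that change sign
        zcr = sum(
            1 for i in range(1, frame_len) if frame[i - 1] * frame[i] < 0
        ) / frame_len
        feats.append((energy, zcr))
    return feats
```

The overlap (hop smaller than frame length) is what makes the feature track smooth enough to follow prosodic changes between frames.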
Detecting emotion by analysis of speech is important for identifying the emotional state of a person. This can be done using linear predictive coding (LPC) techniques, which yield parameters such as pitch, vocal tract spectrum, formant frequencies, duration, and MFCCs, used for extracting features from speech. TEO-CB-Auto-Env is a non-linear method of feature extraction...
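LPC analysis itself can be sketched with the autocorrelation method and the Levinson-Durbin recursion; the version below is a minimal pure-Python illustration (no windowing or pre-emphasis, which a real front end would add):

```python
def lpc(signal, order):
    """Autocorrelation-method LPC via Levinson-Durbin.
    Returns (a, e): the prediction polynomial [1, a1, ..., a_order]
    and the final prediction error energy."""
    n = len(signal)
    # autocorrelation r[0..order]
    r = [sum(signal[i] * signal[i + k] for i in range(n - k))
         for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    e = r[0]
    for m in range(1, order + 1):
        # reflection coefficient for this recursion step
        k = -sum(a[i] * r[m - i] for i in range(m)) / e
        # update polynomial coefficients, keeping a[0] == 1
        a = [a[i] + k * a[m - i] for i in range(m + 1)] + a[m + 1:]
        e *= 1.0 - k * k
    return a, e
```

The resulting polynomial models the vocal tract as an all-pole filter; formant frequencies can then be read off its roots, which is how LPC connects to the parameters listed above.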
This paper presents an approach that aims to recognize stressed speech utterances. Our work consists of extracting features using Mel Frequency Cepstral Coefficients (MFCC) and Gammatone Frequency Cepstral Coefficients (GFCC). These features are then classified with One-Class Support Vector Machines (OC-SVM). The results of the proposed method are obtained on speech samples of four stressed...
Speech recognition is the process by which a computer perceives human speech and produces string output in written form. A model is learned from a set of audio recordings and their corresponding transcripts, by taking recordings of speech as audio along with their text transcriptions and using software to create a statistical representation of the sounds that make up each word. Speech based applications...
The term gender identification refers to finding out the gender of a person from his or her voice. Gender identification has been implemented in several Automatic Speaker Recognition (ASR) systems and has proved to be of great significance. The use of gender identification in today's technology makes user authentication and identification in high-security systems easier. In this paper, we...
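One simple acoustic cue for gender identification is fundamental frequency (F0). The sketch below pairs a basic autocorrelation-based F0 estimate with a crude threshold rule; the search band and the 165 Hz decision threshold are illustrative assumptions, not the method of the paper above:

```python
import math

def estimate_f0(frame, sample_rate, fmin=70.0, fmax=300.0):
    """Pick the lag with maximum autocorrelation inside the plausible
    pitch-period range and convert it to a frequency in Hz."""
    best_lag, best_corr = 0, 0.0
    for lag in range(int(sample_rate / fmax), int(sample_rate / fmin) + 1):
        corr = sum(frame[i] * frame[i + lag]
                   for i in range(len(frame) - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag if best_lag else 0.0

def guess_gender(f0, threshold_hz=165.0):
    """Crude threshold rule: typical adult male F0 sits below ~165 Hz."""
    return "female" if f0 > threshold_hz else "male"
```

Real systems combine F0 with spectral features and a trained classifier, since pitch ranges overlap between speakers; this is only the intuition in runnable form.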
The amount of audio data on public networks such as the Internet is increasing in huge volumes daily, so to access these media we need to index and annotate them efficiently. Due to the non-stationary nature of and discontinuities in the audio signal, segmentation and classification of audio signals has become a challenging task. Automatic music classification and annotation is also one of the challenging...
Recently, studies have been performed on spectral features such as Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictor Cepstral Coefficients (LPCC) for speech emotion recognition. It was found in our study that the Fourier transform of MFCC time trajectories also plays an important role in speech emotion recognition. In addition, a new hierarchical classification method was proposed based on...
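The Fourier transform of an MFCC time trajectory is simply a DFT taken along the frame axis rather than the frequency axis, yielding a modulation spectrum per coefficient. A minimal sketch (the input list stands in for one cepstral coefficient tracked over consecutive frames):

```python
import cmath

def trajectory_spectrum(trajectory):
    """Magnitude DFT of one feature trajectory across frames,
    i.e. its modulation spectrum."""
    N = len(trajectory)
    return [
        abs(sum(trajectory[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N)))
        for k in range(N)
    ]
```

A slowly varying trajectory concentrates its energy in the low modulation bins, which is the kind of temporal-dynamics information a frame-level MFCC alone does not capture.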
This article proposes a gender and geographical origin recognition system for Arabic speakers based on dialect and accent characteristics. We demonstrate that the speaker's gender and nationality can be determined from colloquial Arabic speech and suggest that this system can be integrated into more complex biometric applications. The acoustic features of our proposed dataset used to identify the...
Automatic prediction of continuous level emotional state requires selection of suitable affective features to develop a regression system based on supervised machine learning. This paper investigates the performance of low-level dynamic features for predicting two common dimensions of emotional state, namely, valence and arousal instantaneously. Low-complexity features are extracted from audio and...
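Supervised regression from an affective feature to a continuous label such as valence or arousal can be illustrated, in the simplest one-dimensional case, by ordinary least squares; this is a toy stand-in for whatever regressor the paper actually trains, with made-up feature and label values:

```python
def fit_linear(xs, ys):
    """Ordinary least squares fit y = w*x + b for one feature dimension."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # slope from centered cross- and auto-covariance
    w = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - w * mx
    return w, b
```

Instantaneous (per-frame) prediction then just applies `w * x + b` to each frame's feature value; real systems extend this to many feature dimensions and to regressors that model temporal context.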
The i-vector space feature has recently proved to be very efficient in the speaker recognition field. In this paper, we assess the use of the i-vector approach for emotional speaker recognition, in order to boost performance that is otherwise deteriorated by emotions. The key idea of the i-vector algorithm is to represent each speaker by a fixed-length, low-dimensional feature vector. The concatenation of these...
Speech is the natural, vocalized, and primary means of communication. It is easy, hands-free, fast, and does not require any technical knowledge. Communicating with a computer using speech is a simple and comfortable way for human beings, and speech recognition systems have made this possible. The acoustic and language models for such systems are available, but mostly for the English language. In India there are so many people...
This paper presents an alternate representation of phase information in speech signals using the Hartley transform. The Hartley Group Delay Function (HGDF) is computed along similar lines to the Fourier group delay function. Cepstral smoothing is applied so as to reduce the spiky nature of the group delay functions. The smoothed HGDF (SHGDF) is reported to have better resolution in the group delay spectrum. A speaker...
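The discrete Hartley transform underlying the HGDF replaces the complex Fourier kernel with the real-valued cas function, cas θ = cos θ + sin θ. A minimal direct-form sketch (O(N²), for illustration only; practical implementations use fast algorithms):

```python
import math

def dht(x):
    """Discrete Hartley transform: H[k] = sum_n x[n] * cas(2*pi*k*n/N),
    where cas(t) = cos(t) + sin(t). Real input gives real output, and it
    relates to the DFT by H[k] = Re(X[k]) - Im(X[k])."""
    N = len(x)
    return [
        sum(x[n] * (math.cos(2 * math.pi * k * n / N)
                    + math.sin(2 * math.pi * k * n / N))
            for n in range(N))
        for k in range(N)
    ]
```

Because the transform is real-valued, phase-derived quantities like a group delay function can be computed without complex arithmetic, which is part of its appeal here.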
Speech is the simplest modality to consider for a unimodal biometric system. The accuracy, however, depends on the quality of the signal, which tends to be compromised by the conditions under which it is spoken or recorded. To implement a robust system, the challenge lies in enhancing the quality and intelligibility of the noisy speech signal. Various speech enhancement techniques can be applied...
Articulatory features are used as a universal set of speech attributes shared across many different languages. Some multilingual and cross-language speech recognition systems using articulatory features have been shown to improve performance. The existing articulatory features are defined by phoneticians as a set of articulatory descriptions of phones, which represent some semantic information...
Peer-Led Team Learning (PLTL) is a structured learning model in which a team leader is appointed to facilitate collaborative problem solving among students in Science, Technology, Engineering and Mathematics (STEM) courses. This paper presents an informed HMM-based speaker diarization system. The minimum duration of short conversational turns and the number of participating students were fed as side information...
Speech is one of the most popular modalities for emotion recognition. This work uses Mel- and Bark-scale-dependent perceptual auditory features for recognizing seven emotions from the Berlin speech corpus. A combination of Mel Frequency Cepstral Coefficients (MFCCs), Perceptual Linear Predictive Cepstrum (PLPC), Mel Frequency Perceptual Linear Predictive Cepstrum (MFPLPC) and linear predictive coefficients...
In this paper, we investigate the effect of the G.723.1 codec (6.3 kbps) on speaker recognition systems. To improve robustness to codec mismatch, we use Power Normalized Cepstral Coefficients (PNCC), a new robust acoustic feature, to improve the performance of the speaker verification system. A modified SCF speech feature is also proposed to improve robustness under codec mismatch...