This paper presents experiments on feature selection for emotional speech classification. A total of 152 features are used in these experiments. Minimum redundancy maximum relevance (mRMR) is applied for feature selection. The experiments are built on two corpora: Interactive Emotional Dyadic Motion Capture (IEMOCAP) and Emotional Tagged Corpus on Lakorn (EMOLA), which...
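The abstract does not say which mRMR criterion is used. As a minimal sketch, assuming the features have already been discretized, the greedy difference form (relevance minus mean redundancy, both measured by mutual information) can be written in plain Python as follows; the function names and the toy data are hypothetical:

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr(features, labels, k):
    """Greedy mRMR, difference form: at each step choose the feature that maximizes
    I(f; labels) minus the mean I(f; s) over the already-selected features s.

    features: dict mapping a feature name to a list of discretized values.
    """
    relevance = {name: mutual_information(vals, labels) for name, vals in features.items()}
    selected, remaining = [], set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
        return relevance[name] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)  # sorted() makes tie-breaking deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: "a" and "b" each carry one independent bit of a 4-class label;
# "a_copy" duplicates "a" and is therefore redundant once "a" is chosen.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "a_copy": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 0, 1, 1, 0, 0, 1, 1],
}
print(mrmr(features, labels, 2))  # ['a', 'b']
```

All three toy features are equally relevant, but once "a" is selected, "a_copy" adds no new information, so the redundancy term steers the second pick to "b".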
This paper presents a system for automatic bird identification from audio input. The experiments have been conducted on three groups of birds. From start to finish, including classification, the system is fully automated. The main problem in automatic bird recognition (ABR) is the choice of proper features and classifiers. Identification has been made using two classifiers: kNN (k Nearest...
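For readers unfamiliar with kNN, a minimal sketch of the classifier in Python follows. It assumes each recording has already been reduced to a fixed-length feature vector and uses Euclidean distance with a majority vote; the value of k, the distance measure, `knn_predict`, and the species labels are illustrative choices, not taken from the paper:

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest to `query` (Euclidean distance)."""
    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors for two bird species.
train_x = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
train_y = ["species_A"] * 3 + ["species_B"] * 3
print(knn_predict(train_x, train_y, (0.5, 0.5)))  # species_A
```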
Percussion instruments play a significant role in Carnatic music concerts. The percussion artist enjoys a great degree of freedom in improvising within the defined tāla structure of a composition. The objective of this paper is to transcribe the improvisations, treating the percussion strokes as syllables or aksharas.
This paper presents a speaker-based Language Independent Isolated Speech Recognition System (LIISRS). Mel Frequency Cepstral Coefficients (MFCC), the most popular feature extraction technique, are used for training the system. Representative specific features are identified using the K-Means algorithm. The distortion measure is calculated using the Euclidean distance function. Pitch contour characteristics are used...
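The K-Means and Euclidean distortion steps described above can be sketched as vector quantization in plain Python. This sketch assumes one codebook per class (for example per word), initializes the centroids from the first k training vectors, and chooses the class with the lowest average distortion; all names and data are hypothetical, and the pitch-contour stage is not modeled:

```python
from math import dist


def kmeans(vectors, k, iterations=20):
    """Lloyd's k-means; initial centroids are simply the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda j: dist(v, centroids[j]))
            clusters[nearest].append(v)
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids


def average_distortion(vectors, codebook):
    """Mean Euclidean distance from each vector to its nearest codeword."""
    return sum(min(dist(v, c) for c in codebook) for v in vectors) / len(vectors)


def classify_by_distortion(test_vectors, codebooks):
    """Return the class whose codebook yields the lowest average distortion."""
    return min(codebooks, key=lambda name: average_distortion(test_vectors, codebooks[name]))


# Hypothetical 2-D "feature vectors" for two classes.
class_a = [(0, 0), (5, 5), (0.1, 0), (5, 5.1)]
class_b = [(0, 5), (5, 0), (0, 5.1), (5.1, 0)]
codebooks = {"A": kmeans(class_a, 2), "B": kmeans(class_b, 2)}
print(classify_by_distortion([(0.05, 0.02), (4.9, 5.0)], codebooks))  # A
```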
This paper motivates the use of a combination of mel frequency cepstral coefficients (MFCC) and their delta derivatives (DMFCC and DDMFCC), calculated using mel-spaced Gaussian filter banks, for text-independent speaker recognition. MFCC, modeled on the human auditory system, shows robustness against noise and session changes and has hence become synonymous with speaker recognition. Our main aim is to test...
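Delta and delta-delta coefficients are commonly computed with a linear regression over ±N neighbouring frames. A minimal sketch, assuming N = 2 and repeating the first/last frame at the boundaries (the paper's exact window is not given, and the Gaussian filter bank itself is not shown); `deltas` and `stack_features` are hypothetical helpers:

```python
def deltas(frames, N=2):
    """Regression deltas: d_t = sum_{n=1..N} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..N} n^2).

    frames: list of per-frame coefficient lists (e.g. MFCC vectors).
    Frames beyond either end are replaced by the first/last frame.
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        row = []
        for i in range(len(frames[t])):
            acc = sum(n * (frames[min(t + n, T - 1)][i] - frames[max(t - n, 0)][i])
                      for n in range(1, N + 1))
            row.append(acc / denom)
        out.append(row)
    return out


def stack_features(mfcc, N=2):
    """Concatenate MFCC, delta (DMFCC) and delta-delta (DDMFCC) for every frame."""
    d = deltas(mfcc, N)
    dd = deltas(d, N)
    return [c + x + y for c, x, y in zip(mfcc, d, dd)]
```

For a coefficient that grows linearly by one unit per frame, interior frames get a delta of exactly 1.0, which is the slope the regression is designed to recover.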
In speech recognition systems, Mel Frequency Cepstrum Coefficient (MFCC) feature extraction is an important process. It has also been widely used in many applications. In this paper, we present the conventional MFCC feature extraction method and propose two novel versions of it that combine the PCA technique with conventional MFCC feature extraction. Finally, these three...
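The abstract does not say how PCA is combined with MFCC. As an illustration only, the following sketch finds the leading principal component of a set of feature vectors by power iteration on their covariance matrix and projects the vectors onto it. It assumes non-constant data and a starting vector that is not orthogonal to the leading eigenvector; further components (via deflation) are not shown, and the function names are hypothetical:

```python
def pca_first_component(vectors, iterations=200):
    """Mean and leading principal component (unit vector) via power iteration."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(col) / n for col in zip(*vectors)]
    centred = [[x - m for x, m in zip(v, mean)] for v in vectors]
    # Sample covariance matrix (dim x dim).
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    w = [1.0] * dim  # starting vector; assumed not orthogonal to the top eigenvector
    for _ in range(iterations):
        w = [sum(cov[i][j] * w[j] for j in range(dim)) for i in range(dim)]
        norm = sum(x * x for x in w) ** 0.5  # assumes non-constant data (norm > 0)
        w = [x / norm for x in w]
    return mean, w


def project(vectors, mean, component):
    """Scalar projection of each mean-centred vector onto `component`."""
    return [sum((x - m) * c for x, m, c in zip(v, mean, component)) for v in vectors]


# Points on the line y = 2x: the leading component should be (1, 2) / sqrt(5).
data = [(t, 2 * t) for t in range(5)]
mean, w = pca_first_component(data)
```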
Marathi, written in the Devanagari script, is an Indo-Aryan language with speakers all around the world. Marathi has gained acceptance in media and communication and therefore deserves a place in the growing field of automatic speech recognition. This manuscript describes an automatic speech recognition system that recognizes Marathi phonemes using Continuous Density Hidden Markov...
Analysis and recognition of auditory scenes play an important role in content-based multimedia processing and context-aware applications. In this paper, we propose an auditory scene recognition scheme that integrates analysis of a scene's audio data with an LDA topic model to discover latent structures (i.e., contextual correlations) of audio words, and generation of intermediate contextual descriptions...
This paper presents a dialogue emotion recognition system using Hidden Markov Models (HMM). We compared the accuracy of Mel-frequency cepstral coefficients (MFCC), energy, and wavelet sub-band energies, their first derivatives, and all possible combinations. In our experiments, MFCC performed better than the other studied features. We have also evaluated the impact of gender...
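The wavelet family and decomposition depth are not stated in the abstract. As a sketch, assuming an orthonormal Haar wavelet, three levels, and even-length frames, per-frame energy and wavelet sub-band energies can be computed as follows (the functions are hypothetical helpers):

```python
def haar_step(signal):
    """One level of the orthonormal Haar DWT, returning (approximation, detail)."""
    s = 2 ** -0.5
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def subband_energies(frame, levels=3):
    """Energies of detail bands d1..dL followed by the final approximation band."""
    energies = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(x * x for x in detail))
    energies.append(sum(x * x for x in approx))
    return energies


def frame_energy(frame):
    """Plain short-time energy of one frame."""
    return sum(x * x for x in frame)
```

Because the Haar transform used here is orthonormal, the sub-band energies of a frame whose length is divisible by 2^levels sum to the frame energy, so the two feature types describe the same energy split in different ways.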
In neuroscience, the extracellular action potentials of neurons, called spikes, are the most important signals. However, a single extracellular electrode can capture spikes from more than one neuron. Spike sorting is an important task for diagnosing various neural activities: the better we understand neurons, the more neural diseases we can cure. The process of sorting these spikes is...
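The abstract is cut off before it describes the sorting method. A common first step in spike sorting is amplitude-threshold detection; the sketch below uses a robust noise estimate, sigma_n = median(|x|) / 0.6745, a threshold of 4 sigma_n, and a fixed refractory skip. These parameters and the name `detect_spikes` are assumptions, not the paper's, and the subsequent clustering of detected waveforms is not shown:

```python
from statistics import median


def detect_spikes(signal, k=4.0, refractory=10):
    """Indices where |x| exceeds k * sigma_n, with sigma_n = median(|x|) / 0.6745.

    After each detection, `refractory` samples are skipped so that one spike
    is not reported several times.
    """
    sigma_n = median(abs(x) for x in signal) / 0.6745
    threshold = k * sigma_n
    spikes, t = [], 0
    while t < len(signal):
        if abs(signal[t]) > threshold:
            spikes.append(t)
            t += refractory
        else:
            t += 1
    return spikes
```

The median-based estimate is used instead of the standard deviation so that the spikes themselves barely inflate the noise level.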
Besides video surveillance systems, acoustic event detection systems can also be used to monitor large urban areas. An acoustic detection system monitors potentially dangerous sounds and raises an alarm when one is detected. We developed our own approach to acoustic event detection, with a modified Viterbi decoder operating over HMMs (Hidden Markov Models) especially adapted...
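For reference, a standard log-domain Viterbi decoder for a discrete-observation HMM is sketched below. The paper's modifications are not described in the abstract and are not reproduced here; the two-state "quiet"/"event" model and its probabilities are purely hypothetical:

```python
from math import log


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most likely state sequence and its log probability (log domain)."""
    V = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Best predecessor state for s at time t.
            prev = max(states, key=lambda p: V[t - 1][p] + log_trans[p][s])
            V[t][s] = V[t - 1][prev] + log_trans[prev][s] + log_emit[s][obs[t]]
            back[t][s] = prev
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1], V[-1][last]


# Hypothetical model: background ("quiet") vs. a dangerous sound ("event").
states = ["quiet", "event"]
log_start = {"quiet": log(0.9), "event": log(0.1)}
log_trans = {"quiet": {"quiet": log(0.9), "event": log(0.1)},
             "event": {"quiet": log(0.2), "event": log(0.8)}}
log_emit = {"quiet": {"low": log(0.9), "high": log(0.1)},
            "event": {"low": log(0.2), "high": log(0.8)}}
path, score = viterbi(["low", "low", "high", "high", "low"],
                      states, log_start, log_trans, log_emit)
```

With this toy model the decoded path is `['quiet', 'quiet', 'event', 'event', 'quiet']`, marking the two "high" frames as an event; working in the log domain avoids underflow on long recordings.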
Hidden factors such as gender characteristics play an important role in the performance of Bangla (widely known as Bengali) automatic speech recognition (ASR). If there is a suppression process that represses the decrease of differences in acoustic likelihood among categories resulting from gender factors, a robust ASR system can be realized. In our previous paper, we proposed a technique of gender effects...
This paper presents a hybrid multilayer neural network-based Bangla phoneme recognition method for Automatic Speech Recognition (ASR) that incorporates dynamic parameters. The method consists of four stages: in the first stage, a multilayer neural network (MLN) converts acoustic features, mel frequency cepstral coefficients (MFCCs), into phoneme probabilities. Phoneme probabilities from the first...
Speaker-specific characteristics play an important role in the performance of Bangla (widely known as Bengali) automatic speech recognition (ASR). It is difficult to recognize speech affected by gender factors, especially when an ASR system contains only a single acoustic model. If there exists any suppression process that represses the decrease of differences in acoustic likelihood among categories...
This paper presents a neural network-based Bangla phoneme recognition method for Automatic Speech Recognition (ASR). The method consists of three stages: in the first stage, a multilayer neural network (MLN) converts acoustic features, mel frequency cepstral coefficients (MFCCs), into phoneme probabilities, and the second stage computes velocity (Δ) coefficients from the phoneme probabilities by using...
Aspiration is an important phonemic feature in several Indian languages. Unlike English, languages such as Marathi have lexicons in which words with different meanings differ only in the aspiration feature of the initial voiced or unvoiced stop. Thus the reliable discrimination of aspirated stops from their unaspirated counterparts is important in automatic speech recognition for such languages. The...
This paper describes an automatic system for detecting some common pronunciation mistakes in Quran recitation. It addresses the application of Arabic pronunciation rules. The system is a basic step towards a complete automatic system for teaching the recitation rules of the Holy Quran. The focus of this study is to detect improper pronunciation of a chosen set of emphatic...
In this paper, we propose a multi-layered feature combination approach with a support vector machine (SVM) for Chinese accent identification. The multi-layered features include both segmental and suprasegmental information, such as MFCC and pitch contour, to capture the diverse variations in Chinese accented speech. The pitch contour is estimated using a cubic polynomial method to...
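One plausible reading of the "cubic polynomial method" is a least-squares cubic fit to an F0 track, whose four coefficients can then serve as compact suprasegmental features. A sketch under that assumption, solving the 4×4 normal equations by Gaussian elimination with partial pivoting; time should be normalized (for example to [0, 1]) to keep the system well conditioned, and the F0 values are made up:

```python
def fit_cubic(times, values):
    """Least-squares coefficients [a0, a1, a2, a3] of a0 + a1*t + a2*t^2 + a3*t^3."""
    n = 4
    # Normal equations: A[i][j] = sum t^(i+j), b[i] = sum v * t^i.
    A = [[sum(t ** (i + j) for t in times) for j in range(n)] for i in range(n)]
    b = [sum(v * t ** i for t, v in zip(times, values)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        coeffs[r] = (b[r] - sum(A[r][c] * coeffs[c] for c in range(r + 1, n))) / A[r][r]
    return coeffs


# Hypothetical F0 contour (Hz) sampled at normalized times 0.0 .. 1.0.
times = [i / 10 for i in range(11)]
f0 = [200 + 30 * t - 60 * t ** 2 + 20 * t ** 3 for t in times]
coeffs = fit_cubic(times, f0)  # approximately [200, 30, -60, 20]
```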
Auditory-based front-ends for speech recognition have been compared before, but this paper focuses on two of the most promising algorithms for noise robustness in automatic speech recognition (ASR). The feature sets are Zero-Crossings with Peak Amplitudes (ZCPA) and the recently introduced Power-Law Nonlinearity and Power-Bias Subtraction (PNCC). Standard Mel-Frequency Cepstral Coefficients (MFCC)...
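To make the ZCPA idea concrete, here is a simplified single-channel sketch. It assumes the signal has already passed through one band-pass (cochlear) filter, estimates frequency from the interval between successive upward zero crossings, and adds log(1 + peak amplitude) to the matching frequency bin; a full front end sums such histograms over many channels. The bin edges, the compression function, and `zcpa_histogram` are assumptions for illustration:

```python
import math


def zcpa_histogram(channel, sample_rate, bin_edges):
    """Simplified single-channel ZCPA histogram.

    For each interval between successive upward zero crossings, frequency is
    estimated as sample_rate / interval_length, and log(1 + peak amplitude in
    the interval) is added to the bin [bin_edges[b], bin_edges[b + 1]) holding it.
    """
    ups = [i for i in range(1, len(channel)) if channel[i - 1] < 0 <= channel[i]]
    hist = [0.0] * (len(bin_edges) - 1)
    for start, end in zip(ups, ups[1:]):
        freq = sample_rate / (end - start)
        peak = max(channel[start:end])
        for b in range(len(hist)):
            if bin_edges[b] <= freq < bin_edges[b + 1]:
                hist[b] += math.log1p(peak)
                break
    return hist


# Example: a 500 Hz tone at 8 kHz (small phase offset avoids crossings exactly at zero).
tone = [math.sin(2 * math.pi * 500 * n / 8000 + 0.1) for n in range(160)]
hist = zcpa_histogram(tone, 8000, [0, 400, 600, 4000])
```

All of the tone's energy lands in the 400 to 600 Hz bin, since every crossing interval is 16 samples long.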
In this paper, we make full use of MPEG-7 audio features for environment recognition from audio in different multimedia applications. Environment recognition from audio files is a growing area of interest; however, compared to other branches of multimedia, it is less researched. To recognize the environment, we use a total of 17 temporal and spectral MPEG-7 audio low-level descriptors as...
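As an example of a spectral low-level descriptor, the sketch below computes a power-weighted spectral centroid with a naive DFT. It is a generic centroid on a linear frequency axis, not the normative MPEG-7 AudioSpectrumCentroid, whose definition differs in detail; the function names are hypothetical:

```python
import cmath
import math


def power_spectrum(frame):
    """|DFT|^2 for bins 0..N/2 (naive O(N^2) DFT, adequate for a short sketch)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]


def spectral_centroid(frame, sample_rate):
    """Power-weighted mean frequency of the frame, in Hz (0.0 for a silent frame)."""
    power = power_spectrum(frame)
    n = len(frame)
    freqs = [k * sample_rate / n for k in range(len(power))]
    total = sum(power)
    return sum(f * p for f, p in zip(freqs, power)) / total if total > 0 else 0.0


# Example: a pure 800 Hz cosine sampled at 6400 Hz falls exactly on DFT bin 8 of 64.
tone = [math.cos(2 * math.pi * 800 * t / 6400) for t in range(64)]
centroid = spectral_centroid(tone, 6400)
```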