The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
This paper suggested a technique based on MFCC analysis for audio signals with speech classification application. The proposed work used multi-resolution (wavelet) analysis and spectral analysis based features for feature extraction. The proposed approach uses a no. of features like Mel Frequency Cepstral Coefficient (MFCC), and FFT Coefficients combined with wavelet based features. In addition, accuracy...
In addition to the traditional video surveillance, various audio processing techniques can also be added to the existing CCTV cameras. They can be used as additional features to help in analyzing the scene better and autonomously detecting violence or any unwanted activity in the scene. For this purpose, a deep learning based scream sound detection approach is proposed in this paper. MFCC features...
The main goal of this paper is to explore the recognition of particular guitar models from single instrument audio recordings. This is different than existing work in music instrument recognition that deals with identifying different instrument types. Through a set of experiments we evaluate different sets of audio features and classifiers for this purpose. To improve accuracy a composite classifier...
This paper presents the experiments on feature selection for emotional speech classification. There are 152 features used in this experiment. The minimum redundancy maximum relevance (mRMR) feature selection is applied as the features selection. The experiments are constructed from two corpora; Interactive Emotional Dyadic Motion Capture (IEMOCAP) and Emotional Tagged Corpus on Lakorn (EMOLA) which...
Speech recognition is widely applied to speech to text, speech to emotion, in order to make gadget and computer easier to use, or to help people with hearing disability. Feature extraction is one of significant step in the performance of speech recognition. Therefore, the proper selection is really needed. In this paper, we analyze feature extraction that can have good performance for Indonesian speech...
In this work, an effort has been made to identify vocal and non-vocal regions from a given song using signal processing techniques and machine learning algorithm. Initially spectral features like mel-frequency cepstral coefficients (MFCCs) are used to develop the baseline system. Statistical values of pitch, jitter and shimmer are considered to improve performance of the system. Artificial neural...
This paper presents a system for automatic bird identification, which uses audio input. The experiments have been conducted on three groups of birds, which were created basing finishing on classification, the system is fully automated. The main problem in automatic bird recognition (ABR) is the choice of proper features and classifiers. Identification has been made using two classifiers-kNN (k Nearest...
The most successful approach to speech and speaker recognition is to treat the speech signal as a stochastic pattern and to use a statistical pattern recognition technique for matching utterances. This paper attempts to study the performance of Text dependent speaker verification system using Delta-Delta Mel Frequency Cepstral Coefficients (MFCC-Δ-Δ) feature vector and Fuzzy C means (FCM) speaker...
The current benchmark speech-based depression detection techniques rely on acoustic speech parameters collected from large sets of representative speech recordings. This study for the first time investigates depression detection based on the higher order influence model (HOIM) coefficients and emotional transition parameters derived from a relatively small set of conversational speech recordings representing...
For many years, filterbanks have been widely used as one step of frontend feature extraction for Automatic Speech Recognition (ASR). In this paper, we propose a unified framework for ASR frontends, by first moving the nonlinear amplitude scaling, and then combining the filterbank weights with the cosine basis vectors. As part of this framework, we also show that the delta terms used to encode feature...
This paper studies the effectiveness of speech contents for detecting clinical depression in adolescents. We also evaluated the performances of acoustic features such as Mel frequency cepstral coefficients (MFCC), short time energy (Energy), zero crossing rate (ZCR) and Teager energy operator (TEO) using Gaussian mixture models for depression detection. A clinical data set of speech from 139 adolescents,...
Percussion instruments play a significant role in Carnatic music concerts. The percussion artist enjoys a great degree of freedom in improvising within the defined tāla structure of a composition. The objective of this paper is to transcribe the improvisations, treating the percussion strokes as syllables or aksharas.
This paper proposes a method to identify the arohana-avarohana of carnatic raga. Carnatic raga is broadly classified as melakarta (parent) and janya (child) raga. Arohana-avarohana of 10 different ragas is collected from 16 different singers. 16 audio data are collected for each raga. 11 among the 16 are used in the training phase and the remaining 5 are used for testing. The acoustic feature, MFCC...
This paper presents a speaker based Language Independent Isolated Speech Recognition System (LIISRS). The most popular feature extraction technique Mel Frequency Cepstral Coefficients (MFCC) is used for training the system. Representative specific features are identified using K-Means algorithm. Distortion measure is calculated using Euclidian distance function. Pitch contour characteristics are used...
This paper motivates the use of combination of mel frequency cepstral coefficients (MFCC) and its delta derivatives (DMFCC and DDMFCC) calculated using mel spaced Gaussian filter banks for text independent speaker recognition. MFCC modeled on the human auditory system shows robustness against noise and session changes and hence has become synonymous with speaker recognition. Our main aim is to test...
Human computer interaction with the time has extended its branches to many different other fields like engineering, cognition, medical etc. Speech analysis has also become an important area of concern. People involved are using this mode for the interaction with the machines to bridge the gap between physical and digital world. Speech emotion recognition has become an integral subfield in the domain...
In this paper, we propose a novel biométrie authentication system using motion vectors of lips. We have already proposed a biométrie authentication system using multimodal features of utterance. However, since both the edges and texture of lips can be easily extracted from a still image, an imposter may be recognized as a registrant by using a still image of the registrant. Therefore, the robustness...
Pulmonary acoustic signal analysis provides essential information on the present state of the Lungs. In this paper, we intend to distinguish between normal, airway obstruction pathology and interstitial lung disease using pulmonary acoustic signal recordings. The proposed method extracts Mel frequency cepstral coefficients (MFCC) and AR Coefficients as features from pulmonary acoustic signals. The...
This paper presents the process of Quranic Accent Automatic Identification. Recent feature extraction technique that is used for Quranic verse rule identification/Tajweed include Mel Frequency Cepstral Coefficients (MFCC) which prone to additive noise and may reduce the classification result. Therefore, to improve the performance of MFCC with addition of Spectral Centroid features and is proposed...
This paper presents the work of acoustic analysis related to Modern Standard Arabic (MSA). The problem of classifying the consonant counterparts in MSA is tackled here. The study considers four phonemes: /dˤ, ðˤ/ and their non-emphatic counterparts /d, ð/ respectively. An accurate automatic classification for those phonemes is to be achieved. Artificial neural networks (ANNs) are used for that purpose...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.