Audio Event Detection (AED) aims to recognize sounds within audio and video recordings. AED employs machine learning algorithms commonly trained and tested on annotated datasets. However, available datasets are limited in number of samples and hence it is difficult to model acoustic diversity. Therefore, we propose combining labeled audio from a dataset and unlabeled audio from the web to improve...
Feature extraction plays a very important role in the speech classification process because better features improve the classification rate. This paper presents a speech feature extraction method using the Discrete Wavelet Transform (DWT) at the 7th level of decomposition with the Daubechies 2 mother wavelet, Renyi Entropy (RE), Autoregressive Power Spectral Density (AR-PSD), Statistical,...
Speech recognition systems are ubiquitous and find application in automated voice control, voice dialling and automated directory assistance. This paper aims at implementing a neural network based isolated spoken word recognition system on an embedded board, the Raspberry Pi, using the open source software Octave. Mel-Frequency Cepstral Coefficient (MFCC) features are extracted from speech signal...
This study describes personal identification from features of the face and voice. The face area is detected in an image containing both the face and a complicated background by using the Microsoft Kinect sensor. The voice is also recorded with the Kinect microphone array and used for personal identification. The features of the face are calculated from...
Lyrics are an important part of songs. Lyrics recognition is the basis of retrieving songs and recognizing the content of songs, which is of great value. At present, research on speech recognition has made great progress, but there are still difficulties in recognizing lyrics in songs with accompaniment. Related research is generally lacking, especially for Chinese lyrics in songs with accompaniment,...
For cardiologists, the detection of cardiac abnormalities is a very delicate and crucial task for the treatment of a patient's condition. This task requires electronic medical assistance systems that are more precise, faster and more reliable to help cardiologists analyze and make the right decisions. These medical assistance systems tend to model the human expertise and perception using signal...
In this article we applied Support Vector Machines (SVM) to the acoustic model of a speech recognition system based on MFCC and LPC features for an Azerbaijani dataset. This dataset has been used for speech recognition with a Multilayer Artificial Neural Network, with some results achieved. The main goal of this work is applying SVM techniques to the Azerbaijani speech recognition system. The variety of results of SVM...
An artificial neural network is one of the most important models for training features in a voice conversion task. Typically, Neural Networks (NNs) are not effective in processing low-dimensional F0 features, so the performance of neural-network-based methods for training Mel Cepstral Coefficients (MCC) is not outstanding. However, F0 can robustly represent various prosody...
Speech processing is one of the interesting and challenging concepts in man-machine communication. Emotion detection is the process of determining the psychological state of the speaker. Pitch, formant frequencies, duration, timbre, MFCCs, and energy are some of the efficient parameters from which a large amount of information can be retrieved from a speech signal. These parameters have provided good accuracy...
Acoustic analysis can provide great results in the identification of voice disorders as a complementary tool to other medical techniques. This paper scrutinizes the Mel Frequency Cepstral Coefficients (MFCC) and their first and second derivatives. A full comparative study is established in order to demonstrate that short-term cepstral parameters could be useful to conclude an efficient system for...
Spoken keyword recognition has been under the spotlight for the past several decades, but has gained significant attention in recent years due to the rapid increase in front-end technology applications for mobile and wearable computing. This work presents the trade-off in performance between Artificial Neural Networks (ANN) and Dynamic Time Warping (DTW) methodologies, implemented for this task under...
Detection of lung abnormalities by characterizing lung sounds has been a primary step in clinical examination by a pulmonologist. This work focuses on the use of cepstral features for lung sound analysis and classification. The proposed method incorporates statistical properties of cepstral features along with artificial neural network (ANN) based classification. Experimental results indicate...
This paper presents the development of a speech-controlled human-computer interface (SR-HCI) as a subsystem of the audio-visual breast self-examination guidance system. The interface aims to give better control of the system during computer-guided breast self-examination (BSE) performance and allows users to indicate possible tumor locations by dictating them to the system through the speech recognition feature...
In this work, an effort has been made to identify vocal and non-vocal regions in a given song using signal processing techniques and machine learning algorithms. Initially, spectral features like mel-frequency cepstral coefficients (MFCCs) are used to develop the baseline system. Statistical values of pitch, jitter and shimmer are considered to improve the performance of the system. Artificial neural...
Computational methods for speech-based detection of depression are still relatively new, and have focused on either a standard set of features or on specific additional approaches. We systematically study the effects of feature type, machine learning approach, and speaking style (read versus spontaneous) on depression prediction in the AVEC-2014 evaluation corpus, using features related to speech...
Speech is an important means of communication. Gender is the most significant characteristic of speech. Pitch is a commonly used feature for gender classification as it differs between male and female voices. But this method is not applicable in cases where the pitch of male and female voices is almost the same. In this paper the above limitation is addressed by extracting other features like Mel Frequency...
Pulmonary acoustic signal analysis provides essential information on the present state of the lungs. In this paper, we intend to distinguish between normal, airway obstruction pathology and interstitial lung disease using pulmonary acoustic signal recordings. The proposed method extracts Mel frequency cepstral coefficients (MFCC) and AR coefficients as features from pulmonary acoustic signals. The...
This paper presents the work of acoustic analysis related to Modern Standard Arabic (MSA). The problem of classifying the consonant counterparts in MSA is tackled here. The study considers four phonemes: /dˤ, ðˤ/ and their non-emphatic counterparts /d, ð/ respectively. An accurate automatic classification for those phonemes is to be achieved. Artificial neural networks (ANNs) are used for that purpose...
Mood content in spoken word recognition is an important element in the formulation of a decision support system (DSS). It often becomes an integral component of human computer interaction (HCI) systems based on speech recognition with language orientation. In this paper, we propose a mood verification system for speakers of the Assamese language with dialectal components. Five features, namely Mel Frequency...
This paper is about the creation of an artificial neural network (ANN) in MATLAB to analyze the features extracted by calculating the mel-frequency cepstral coefficients (MFCC) of the raw audio data. The paper explains basic concepts about the ANN, as well as the MFCC and other relevant theories. Regarding the design of the ANN, it uses multiple infant crying sounds, as well as non-crying sounds,...