This paper presents recent results from two monitoring applications based on acoustic signal classification. The first case study addresses context awareness based on acoustic analysis for a service robot. The second addresses acoustic classification for wildlife intruder detection. Previous results are briefly recalled and new experimental results are also provided.
This paper aims to compare the Linear Predictive Cepstral Coefficients (LPCC) method, the Mel-frequency Cepstral Coefficient (MFCC) method, their concatenation (LPCC-MFCC), and a newly proposed feature fusion approach that combines this concatenation with normalization by the respective averages, termed Linear predictive and Mel-frequency Cepstral Coefficients (LMACC), by applying a multi-layer...
Dialect can be defined as a variety of a language that is distinguished from other varieties of the same language by pronunciation, grammar and vocabulary. The process of recognizing such dialects is called Dialect Identification. Kamrupi, although a dialect of the Assamese language, is spoken both in Assam (Kamrup district) and North Bengal. In this paper, we describe a method to identify not just...
Speech is a natural, vocalized, and primary means of communication. It is easy, hands-free, and fast, and it does not require any technical knowledge. Communicating with a computer through speech is simple and comfortable for human beings, and speech recognition systems make this possible. Acoustic and language models for such systems are available, but mostly for the English language. In India there are so many people...
Identification of musical instruments from the acoustic signal using speech signal processing methods is a challenging problem. Further, whether this identification can be carried out from a single musical note, as humans are able to do, is an interesting research issue with several potential applications in the music industry. Attempts have been made earlier using the spectral and temporal features...
Rhythm information plays an important role in music features, but research on it still has a long way to go. Most current research in this field is based on a single feature, which is unstable. In this paper, we propose a novel method that addresses this by fusing a rhythm feature with the gammatone frequency cepstral coefficients (GFCC) feature. After the pre-processing including detecting the beginning of songs, removing...
This work highlights the importance of a phonetic match between training and test sessions for a text-independent framework under limited test data conditions. The robustness of text-independent speaker verification (SV) tends to degrade as the amount of speech involved is reduced. From the point of view of a deployable, application-oriented system, the amount of speech involved is expected to be less to...
The voice recognition process starts with voice feature extraction using Mel Frequency Cepstrum Coefficients (MFCC). The purpose of the MFCC method is to obtain signal features that correlate with the human voice. The MFCC method requires the signal to be converted from analog to digital. The digital signal is in the time domain, which makes the analysis harder. So, the time domain is converted to the frequency domain...
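The time-to-frequency conversion mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a naive discrete Fourier transform and the common 2595 * log10(1 + f/700) mel-scale formula; the function names `magnitude_spectrum` and `hz_to_mel` are hypothetical, and none of this is taken from the paper itself.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Return |DFT| of a real-valued frame for bins 0 .. len(frame)//2.

    Naive O(n^2) DFT, used here only to show the time-to-frequency step;
    a practical system would use an FFT routine instead.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc))
    return spectrum


def hz_to_mel(freq_hz):
    """Map a frequency in Hz to the mel scale (one common formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


if __name__ == "__main__":
    # Toy frame: 32 samples of a sinusoid completing 4 cycles.
    n = 32
    frame = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
    spec = magnitude_spectrum(frame)
    peak_bin = max(range(len(spec)), key=lambda k: spec[k])
    print("peak bin:", peak_bin)
    print("1000 Hz in mel:", round(hz_to_mel(1000.0), 1))
```

A full MFCC pipeline would go on to apply a mel filterbank, a logarithm, and a discrete cosine transform to this spectrum; those steps are omitted here.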
In this paper, we focus mainly on the automated processing of infant cries. For this implementation we use LFCC for feature extraction and a VQ codebook, built with the LBG algorithm, for matching samples. The newborn cry samples were collected from various crying babies aged 0–6 months. There are 27 baby sounds used as training data, comprising 7 hungry infant cries, 4 sleepy infant cries, 10 in...
The task of developing an automatic speaker verification (ASV) system for security applications is of considerable importance. This paper aims at developing a fusion strategy that combines both magnitude and phase information of the speech signal and yields better performance than conventional individual features. This paper employs Mel frequency cepstral coefficients (MFCC) and modified...
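As a rough illustration of the codebook training named in the abstract above, the following is a minimal sketch of LBG-style vector quantization. It assumes squared Euclidean distance, an additive splitting perturbation, and a power-of-two codebook size; the names `lbg_codebook` and `quantize` are hypothetical, and the toy 2-D points stand in for real LFCC vectors. It is not the authors' implementation.

```python
def _dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def quantize(vector, codebook):
    """Return (index, squared distance) of the codeword nearest to vector."""
    d, i = min((_dist2(vector, cw), i) for i, cw in enumerate(codebook))
    return i, d


def lbg_codebook(vectors, size, eps=0.01, tol=1e-6, max_iter=100):
    """Grow a codebook by repeated splitting; size is assumed a power of two."""
    codebook = [_mean(vectors)]
    while len(codebook) < size:
        # Split every codeword into two slightly perturbed copies.
        codebook = [[c + s * eps for c in cw]
                    for cw in codebook for s in (1, -1)]
        prev = float("inf")
        for _ in range(max_iter):
            clusters = [[] for _ in codebook]
            total = 0.0
            for v in vectors:
                i, d = quantize(v, codebook)
                clusters[i].append(v)
                total += d
            # Move each codeword to its cluster centroid (keep it if empty).
            codebook = [_mean(cl) if cl else cw
                        for cl, cw in zip(clusters, codebook)]
            if prev - total <= tol * max(total, 1e-12):
                break  # distortion has stopped improving
            prev = total
    return codebook


if __name__ == "__main__":
    points = [[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]]
    print(sorted(lbg_codebook(points, 2)))
```

In a recognizer of this kind, one codebook would typically be trained per class, and a test sample assigned to the class whose codebook gives the lowest total distortion.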
A method to detect an abnormal situation inside a public transport bus using audio signals is presented. Mel Frequency Cepstral Coefficients (MFCC) were used as a feature vector and a multilayer backpropagation neural network as a classifier. Audio samples were taken inside the bus running along Epifanio Delos Santos Avenue (EDSA), Metro Manila, Philippines. The audio samples depict sounds under normal...
This work presents an implementation of a speaker-dependent speech recognition system used to control a gripper. The application was made using MATLAB and the gripper was assembled using the Lego Mindstorms NXT robotic kit. Four commands are implemented for controlling the gripper: open, close, rotate left and rotate right. The development was divided into two stages. In the training stage, we use Mel...
In this paper, hierarchical video genres are classified and indexed using Support Vector Machines (SVMs) based on audio features only. Segmentation parameters are extracted at the block level, which has the major benefit of capturing local temporal information. The main contribution of our study is a powerful combination of the two audio descriptors employed: Mel Frequency Cepstral...
Speaker recognition has made great progress in laboratory environments, but in real life the performance of a speaker recognition system is affected by various factors, including environmental noise. This paper studies the performance of a speaker recognition system in a noisy environment and presents a speaker recognition system using a modified Mel-Frequency Cepstral Coefficients (MFCC) technique based...
Speaker recognition has made great progress in laboratory environments, but in real life the performance of a speaker recognition system is affected by various factors, including environmental noise. This paper studies the performance of a speaker recognition system in a noisy environment and presents a speaker recognition system using the Mel-Frequency Cepstral Coefficients (MFCC) technique based on different...
Obstruents are very important acoustical events (i.e., abrupt-consonantal landmarks) in the speech signal. This paper presents the use of a novel Spectral Transition Measure (STM) to locate obstruents in the continuous speech signal. The problem of obstruent detection involves detecting the phonetic boundaries associated with obstruent sounds. In this paper, we propose the use of STM information derived...
Increasingly frequent human interaction with technology demands more natural methods of interacting with machines. Sound is the form of communication humans use most frequently, which makes it one of the natural methods of interaction. Developing a system that can recognize human speech as an action on the machine is therefore one option for addressing this need. Voice command is a speech...
The use of joint acoustic features at the feature level leads to vectors of large dimension and high computational complexity. In this paper we propose a method for dynamic acoustic feature stream selection based on a mutual information criterion for speaker verification. The method is based on the intuition that different acoustic features are better suited for recognising different speakers. An optimal...
This paper proposes a method to identify the arohana-avarohana of a Carnatic raga. Carnatic ragas are broadly classified as melakarta (parent) and janya (child) ragas. The arohana-avarohana of 10 different ragas is collected from 16 different singers, giving 16 audio recordings for each raga. Of these 16, 11 are used in the training phase and the remaining 5 are used for testing. The acoustic feature, MFCC...
This work addresses text-independent speaker verification under the constraint of limited data (<15 seconds). The existing techniques for speaker verification work well with sufficient data (>1 minute). Developing techniques for verifying speakers under limited data conditions is a challenging issue, since the amount of data available from speakers is very small nowadays. This is because people are reluctant to give...