In general, for any speech processing task, speech signals are pre-processed to extract features at the front end, and estimation is performed at the back end. The Hidden Markov Model is widely used for modeling time-varying vector sequences due to its simplicity. It also provides high accuracy in non-stationary environments. In this paper, the HTK (Hidden Markov Model Toolkit) is used for compiling...
Cry segmentation is an essential preprocessing step in any infant crying diagnosis system. Besides crying sounds consisting of expiration phases followed by short inspiration episodes, each recording of newborn cries also includes silence sections as well as other sounds such as caregivers' speech, noise, and sounds of medical equipment. This paper is devoted to a newly developed Empirical...
Missing data theory has recently been used as a solution to the noise robustness issue in Automatic Speech Recognition (ASR). Missing components of the spectrogram can either be reconstructed, as in Spectral Imputation, or simply ignored, as in classifier modification. Most of the research has focused on imputation because of the problems associated with classifier modification approaches...
This research constructs a phonetic feature (PF) table for all the phonemes pronounced in the Bangla (widely known as Bengali) language. The study is divided into two parts. In the first part, a PF table is constructed, while the second part deals with Bangla automatic speech recognition (ASR) using PFs. For the Bangla language, fifty-three phonemes including both vowels and consonants are...
This paper presents the development of a speech recognition system for automatically recognizing fluently spoken digit strings in Northern Sotho. The digit strings can be isolated or connected/continuous with known or unknown length. The digit recognition system has been trained with the aim of satisfying its potential end-users. Our main research focus was to enhance the robustness of a connected-digits...
This paper presents a novel endpoint detection method based on Cepstral Mean Subtraction (CMS) for robust and accurate speech recognition in noisy environments. The improved CMS-based method applies a Hidden Markov Model (HMM) to perform two-step classification for better performance, using the optimal spectral feature subset extracted according to the rule of minimum conditional entropy. In addition,...
This paper presents the use of a fuzzy min-max neural network for text-independent speaker identification. The fuzzy min-max neural network utilizes fuzzy sets as pattern classes. It is a three-layer feedforward network that grows adaptively to meet the demands of the problem. A database containing speech utterances recorded from fifty speakers in the Marathi language is used for experimentation....
We present our experiments in context-free recognition of non-lexical responses. Non-lexical verbal responses such as mmm-hmm or uh-huh are used by listeners to signal confirmation, uncertainty in understanding, agreement or disagreement in speech-based interaction between humans. Correct recognition of these utterances by speech interfaces can lead to a more natural interaction paradigm with computers...
The efficiency of a speech recognition system in a noise-free environment is impressive, but in the presence of environmental noise it deteriorates drastically. Environmental noise also affects human-to-human and human-to-machine communication and degrades speech quality as well as intelligibility. Here a speech recognition system is proposed in presence...
One of the issues in using audio books for building a synthetic voice is the segmentation of large speech files. Using the Viterbi algorithm to obtain phone boundaries on large audio files fails primarily because of its huge memory requirements. Earlier works have attempted to resolve this problem by using a large-vocabulary speech recognition system employing a restricted dictionary and language model...
This paper proposes a system obtained through decision-level fusion of two well-known biometric sensors, namely a fingerprint sensor and a voice sensor, to identify a person. More than one sensor is needed for critical or highly secured areas. This paper proposes a multiple-sensor data fusion methodology using a Fuzzy Logic (FL) approach. The fingerprint recognition system uses orientation of the input...
The area of speaker recognition is concerned with extracting the identity of the person speaking. Speaker recognition can be classified into speaker identification and speaker verification. Speaker identification can be text-independent or text-dependent. In this paper we lay emphasis on a text-independent speaker identification system where we adopted Mel-Frequency Cepstral Coefficients (MFCC) as the...
In Islam, mistakes in the recitation of the holy Quran (the sacred book of Muslims) are forbidden. Mistakes can be missing words or verses, or misread Harakat (pronunciations, punctuations, and accents). Thus, a hafiz/reciter who memorizes the holy Quran needs another hafiz/tutor who listens to the recitation and points out oral mistakes. Due to the serious commitment, the availability and expertise of...
We propose a statistical framework for high-level feature extraction that uses SIFT Gaussian mixture models (GMMs) and audio models. SIFT features were extracted from all the image frames and modeled by a GMM. In addition, we used mel-frequency cepstral coefficients and ergodic hidden Markov models to detect high-level features in audio streams. The best result obtained by using SIFT GMMs in terms...
Auditory based front-ends for speech recognition have been compared before, but this paper focuses on two of the most promising algorithms for noise robustness in automatic speech recognition (ASR). The feature sets are Zero-Crossings with Peak Amplitudes (ZCPA) and the recently introduced Power-Law Nonlinearity and Power-Bias Subtraction (PNCC). Standard Mel-Frequency Cepstral Coefficients (MFCC)...
Smartphones with diverse sensing capabilities are becoming widely available and pervasive in use. With the phone becoming a mobile personal computer, integrated applications can use multi-sensory data to derive information about the user's actions and the context in which these actions occur. This paper develops a novel method to assess daily living patterns using a smartphone equipped with microphones...
This paper reports the design, implementation, and evaluation of a research effort to develop a high-performance, natural, speaker-independent Arabic continuous speech recognition system. It aims to explore the usefulness and success of a newly developed speech corpus, which is phonetically rich and balanced, presenting a competitive approach towards the development of an Arabic ASR system as compared...
To help general technicians recognize insects conveniently in pest management, this paper proposes a viable scheme to identify insect sounds automatically using Sub-band based cepstral (SBC) features and a Hidden Markov Model (HMM). The acoustic signal is preprocessed and segmented into a series of sound samples. SBC is extracted from the sound sample as the feature, and HMMs are trained with given...
We describe a method to select features for speech recognition that is based on a quantitative model of the human auditory periphery. The method maximizes the similarity of the geometry of the space spanned by the subset of features and the geometry of the space spanned by the auditory model output. The selection method uses a spectro-temporal auditory model that captures both frequency- and time-domain...