This paper describes an implementation of speech recognition that recognizes and suppresses ten (10) defined profane and vulgar Filipino words. The adapted speech recognition architecture was that of the Oregon Graduate Institute's (OGI) Center for Spoken Language Understanding (CSLU). It utilizes a hybrid Hidden Markov Model/Artificial Neural Network (HMM/ANN) keyword spotting framework. The feature...
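The CSLU-based spotter itself is too large to reproduce from an abstract, but the suppression step it implies can be sketched. This is a minimal illustration with assumed names (`suppress_keywords`, frame-aligned posteriors), not the paper's actual implementation: once an HMM/ANN spotter emits a per-frame posterior for a flagged word, the corresponding audio frames are muted.

```python
import numpy as np

def suppress_keywords(samples, posteriors, frame_len, threshold=0.5):
    """Mute audio wherever the spotter's per-frame keyword posterior
    exceeds `threshold`. posteriors[i] covers samples
    [i*frame_len, (i+1)*frame_len)."""
    out = samples.copy()
    for i, p in enumerate(posteriors):
        if p > threshold:
            out[i * frame_len:(i + 1) * frame_len] = 0.0
    return out

# Toy example: 4 frames of 3 samples; frames 1 and 2 flagged as profane.
audio = np.ones(12)
post = np.array([0.1, 0.9, 0.8, 0.2])
clean = suppress_keywords(audio, post, frame_len=3)
```

In practice the muted span would be padded or replaced by a tone rather than hard zeros, but the frame-to-sample bookkeeping is the same.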
Automatic speech recognition (ASR) can be very helpful for speakers who suffer from dysarthria, a neurological disability that damages the control of motor speech articulators. Although a few attempts have been made to apply ASR technologies to sufferers of dysarthria, previous studies show that such ASR systems have not attained an adequate level of performance. In this study, a dysarthric multi-networks...
Emotion recognition from speech has evolved into one of the most significant research areas in the field of affective computing. In this paper, two emotional speech datasets have been analyzed, based on gender distinction (male and female speech). This paper introduces a new approach to speech-emotion recognition based on the AdaBoost classification algorithm. An artificial neural network has been...
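The abstract does not specify the AdaBoost configuration; as a hedged illustration of the general technique only, a minimal AdaBoost with one-level decision stumps (all names and the toy features below are assumptions, not from the paper) looks like this:

```python
import numpy as np

def train_adaboost(X, y, n_rounds=10):
    """Minimal AdaBoost with one-level decision stumps; y in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                     # sample weights
    stumps = []
    for _ in range(n_rounds):
        best = None
        for j in range(d):                      # feature index
            for thr in np.unique(X[:, j]):      # candidate threshold
                for pol in (1, -1):             # polarity of the split
                    pred = pol * np.where(X[:, j] > thr, 1, -1)
                    err = w[pred != y].sum()    # weighted error
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol)
        err, j, thr, pol = best
        err = max(err, 1e-10)                   # avoid log(0)
        alpha = 0.5 * np.log((1.0 - err) / err)
        pred = pol * np.where(X[:, j] > thr, 1, -1)
        w *= np.exp(-alpha * y * pred)          # up-weight mistakes
        w /= w.sum()
        stumps.append((alpha, j, thr, pol))
    return stumps

def adaboost_predict(stumps, X):
    score = sum(a * p * np.where(X[:, j] > t, 1, -1)
                for a, j, t, p in stumps)
    return np.sign(score)
```

In an emotion-recognition setting, the rows of `X` would be per-utterance acoustic features and `y` a binarized emotion label.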
In this paper, we present a model for Turkish speech recognition. The model is syllable-based, where the recognition is performed through syllables as speech recognition units. The main goal of the model is to recognize as much as possible of a given continuous speech by identifying only a small set of syllables in the language. For that purpose, only the syllable types with a higher frequency are...
To overcome problems inherent in conventional speech recognition (e.g., noise interference and private-data loss), many researchers have investigated alternative approaches. Electromyography (EMG) signals from the muscles producing speech have been used to replace the voiced signal. Similarly, we aim to develop EMG-based speech recognition for the Thai language. Tone is an important characteristic of this...
Speech signals are one of the most important means of communication among human beings. In this paper, a comparative study of two feature extraction techniques is carried out for recognizing speaker-independent spoken isolated words. The first is a hybrid approach with Linear Predictive Coding (LPC) and Artificial Neural Networks (ANN), and the second method uses a combination of Wavelet Packet...
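As background for the first technique, LPC coefficients are conventionally obtained by the autocorrelation method with the Levinson-Durbin recursion. A minimal sketch (the function name and framing are assumptions, not the paper's code):

```python
import numpy as np

def lpc(signal, order):
    """LPC coefficients via the autocorrelation method and the
    Levinson-Durbin recursion. Returns a with a[0] = 1, so the model
    predicts s[n] ~ -sum_{k=1..order} a[k] * s[n-k]."""
    n = len(signal)
    # Autocorrelation for lags 0..order.
    r = np.array([np.dot(signal[:n - k], signal[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]                                   # prediction-error energy
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                           # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]      # update previous coeffs
        a[i] = k
        err *= (1.0 - k * k)
    return a

# A geometric sequence s[n] = 0.5**n is (almost exactly) an AR(1)
# process, so a first-order LPC fit recovers a ~ [1, -0.5].
s = 0.5 ** np.arange(20)
a = lpc(s, 1)
```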
A classification system that accurately categorizes caller behavior within Interactive Voice Response systems would assist in developing good automated self-service applications. This paper details the implementation of such a classification system for a pay-beneficiary application. Adaptive Neuro-Fuzzy Inference System (ANFIS), feed-forward Artificial Neural Network (ANN) and Support Vector Machine...
The aim of the spoken term detection task is to find occurrences of user-entered keywords in an archive of audio recordings. The techniques typically used are vocabulary-independent, relying only on the acoustic information available. In this scenario, however, we rely exclusively on the acoustic model, which is a drawback when it is unreliable; for example, when the input is noisy. In...
The context-independent deep belief network (DBN) hidden Markov model (HMM) hybrid architecture has recently achieved promising results for phone recognition. In this work, we propose a context-dependent DBN-HMM system that dramatically outperforms strong Gaussian mixture model (GMM)-HMM baselines on a challenging, large vocabulary, spontaneous speech recognition dataset from the Bing mobile voice...
This paper discusses and evaluates the effect of Voice Activity Detection (VAD) in an isolated Yoruba word recognition system (IYWRS). The word database used in this paper was collected from 22 speakers, each repeating the numbers 1 to 9 three times. A hybrid configuration of Mel-Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC) has been used to extract the features of the...
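The abstract evaluates VAD as a front end; a common baseline (not necessarily the detector used in the paper) is short-time-energy thresholding, sketched here with assumed parameter names:

```python
import numpy as np

def energy_vad(samples, frame_len=160, threshold_ratio=0.1):
    """Label each frame as speech (True) or silence (False) by
    comparing its short-time energy against a fraction of the
    maximum frame energy in the utterance."""
    n_frames = len(samples) // frame_len
    frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sum(frames ** 2, axis=1)
    return energy > threshold_ratio * energy.max()

# Toy utterance: one silent frame followed by one active frame
# (frame_len=160 corresponds to 20 ms at an 8 kHz sampling rate).
x = np.concatenate([np.zeros(160), np.ones(160)])
flags = energy_vad(x, frame_len=160)
```

Frames flagged as silence are dropped before MFCC/LPC extraction, which is how VAD affects the recognizer's input.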
This paper describes an application of the Orthogonal Least Squares (OLS) algorithm for feature selection of spoken letters. Traditionally used for system identification purposes, the OLS method was used to select important Mel-Frequency Cepstrum Coefficients (MFCC) for classification of two spoken letters - `A' and `S' using Multi-Layer Perceptron (MLP) neural network. We evaluated several network...
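A minimal sketch of greedy OLS forward selection, ranking each candidate column by the share of the target's energy its orthogonalized component explains (the function name and the simplified Gram-Schmidt bookkeeping are assumptions; the paper's exact formulation may differ):

```python
import numpy as np

def ols_select(X, y, n_select):
    """Greedy Orthogonal Least Squares feature selection: at each step
    pick the column whose component orthogonal to the already-selected
    columns best explains the target y."""
    selected = []
    residual_cols = X.astype(float).copy()
    for _ in range(n_select):
        norms = np.sum(residual_cols ** 2, axis=0)
        norms[norms < 1e-12] = np.inf          # skip degenerate columns
        err = (residual_cols.T @ y) ** 2 / norms   # error-reduction score
        err[selected] = -np.inf                # never re-pick a column
        best = int(np.argmax(err))
        selected.append(best)
        # Orthogonalize remaining columns against the chosen one.
        w = residual_cols[:, best]
        proj = (residual_cols.T @ w) / (w @ w)
        residual_cols = residual_cols - np.outer(w, proj)
    return selected

# Toy case with orthonormal columns: y loads on columns 0 and 2,
# so those are selected in order of their contribution.
X = np.eye(4)[:, :3]
y = np.array([1.0, 0.0, 0.5, 0.0])
picked = ols_select(X, y, 2)
```

In the paper's setting, the columns of `X` would be candidate MFCC dimensions and `y` a class indicator, with the selected subset then fed to the MLP.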
The aim of the spoken term detection task is to find the occurrence of user-entered keywords in an archive of audio recordings. In this area, besides the accuracy of hits returned, the speed of search is also very important, for which an intermediate representation of recordings is normally used. In this paper we evaluate a spoken term detection method which represents the speech signals by their...
This paper describes a Bangla phoneme recognition method for Automatic Speech Recognition (ASR). The method consists of two stages: i) a multilayer neural network (MLN), which converts acoustic features, mel frequency cepstral coefficients (MFCCs), into phoneme probabilities and ii) the phoneme probabilities obtained from the first stage and corresponding Δ and ΔΔ are inserted into another MLN to...
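The Δ and ΔΔ features mentioned above are conventionally computed with a regression window over neighboring frames; a sketch under that assumption (the paper may use a different window width):

```python
import numpy as np

def deltas(feats, width=2):
    """Standard regression-based delta features:
    d[t] = sum_{k=1..W} k * (f[t+k] - f[t-k]) / (2 * sum_{k=1..W} k^2),
    with edge frames clamped to the first/last frame."""
    T = len(feats)
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = np.zeros_like(feats, dtype=float)
    for t in range(T):
        for k in range(1, width + 1):
            plus = feats[min(t + k, T - 1)]    # clamp at the right edge
            minus = feats[max(t - k, 0)]       # clamp at the left edge
            out[t] += k * (plus - minus)
    return out / denom

# A linearly increasing trajectory has slope 1, so interior deltas are 1.
f = np.arange(10.0)
d = deltas(f)
```

ΔΔ is then simply `deltas(deltas(f))`, applied here to the phoneme-probability trajectories rather than to MFCCs.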
This paper presents a Bangla phoneme recognition method for Automatic Speech Recognition (ASR). The method consists of three stages: i) a multilayer neural network (MLN), which converts acoustic features, mel frequency cepstral coefficients (MFCCs), into phoneme probabilities, ii) the phoneme probabilities obtained from the first stage and corresponding Δ and ΔΔ are inserted into another MLN to improve...
Several problems remain to be resolved in speech emotion recognition; for example, the dimensionality of feature sets is usually too high, and the redundancy among various features is relatively strong. Considering these problems, a speech emotion recognition method based on factor analysis and majority voting is proposed. How to extract emotional factors from global statistical features and GMM supervectors was...
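The majority-voting stage over several classifiers' outputs is straightforward; a minimal stdlib sketch (names assumed, not from the paper):

```python
from collections import Counter

def majority_vote(labels):
    """Return the label predicted by the most classifiers; ties are
    broken by first occurrence in the input order."""
    counts = Counter(labels)
    best = max(counts.values())
    for lab in labels:                 # preserves input order on ties
        if counts[lab] == best:
            return lab
```

Each element of `labels` would be one classifier's predicted emotion for the same utterance.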
This paper describes an isolated word recognition method based on distinctive phonetic features (DPFs). The method comprises two multilayer neural networks (MLNs). The first MLN, MLNLF-DPF, maps local features (LFs) of an input speech signal into discrete DPFs and the second MLN, MLNDyn, restricts dynamics of outputted DPFs by the MLNLF-DPF. In the experiments on Tohokudai Isolated Spoken-Word Database...
In RBF neural network design, the number of hidden neurons and their parameters influence the performance of the network. This paper discusses the influence of pruning hidden neurons, using different criteria and parameters, on the speech recognition rate of a modified RBF neural network. First, we introduce three hidden-neuron pruning criteria; then we propose a modified RBF neural network; finally, recognition results before and...
This paper describes a methodology to recognize Thai speech words by integrating two approaches, i.e., double filter banks and Euclidean distance, in the feature extraction and recognition processes, respectively. Firstly, the speech signals are transformed into a 3-dimensional representation, the spectrogram, which displays energy information along both time and frequency axes. Secondly, the...
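For the recognition stage, Euclidean-distance matching against labelled reference templates can be sketched as follows (a nearest-template toy with assumed names; in the paper the vectors would be spectrogram-derived features, and real systems usually add time alignment such as DTW):

```python
import numpy as np

def nearest_template(features, templates):
    """Classify a feature vector by Euclidean distance to labelled
    reference templates, given as {label: template_vector}."""
    best_label, best_dist = None, float('inf')
    for label, tmpl in templates.items():
        dist = np.linalg.norm(features - tmpl)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Toy 2-D feature space with two word templates.
templates = {'ka': np.array([0.0, 0.0]), 'kha': np.array([1.0, 1.0])}
word = nearest_template(np.array([0.1, 0.0]), templates)
```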
Most research works in Information Extraction focus only on written language processing; only a few are devoted to the study of Spoken Language Information Extraction. This paper discusses a novel technique for recognition of isolated question words from a Malayalam (one of the south Indian languages) speech query. We have created and analyzed a database consisting of 250 isolated question...
This paper introduces Partially Connected Locally Recurrent Probabilistic Neural Networks (PC-LRPNN) as an extension of the well-known Probabilistic Neural Networks (PNN) and Locally Recurrent Probabilistic Neural Networks (LRPNN). Besides the definition of the PC-LRPNN architecture a fast four-step training method is proposed. The first two steps are identical to the training of traditional PNNs,...