Recently we proposed a novel multichannel end-to-end speech recognition architecture that integrates the components of multichannel speech enhancement and speech recognition into a single neural-network-based architecture and demonstrated its fundamental utility for automatic speech recognition (ASR). However, the behavior of the proposed integrated system remains insufficiently clarified. An open...
This paper proposes a new noise-robust speech recognition method. For noisy conditions, several noise reduction methods have been developed and are applied under various noise conditions. However, it is still not easy to achieve high recognition accuracy, for example, for speech with similar pronunciations. In this paper, a new processing algorithm for the speech modulation spectrum is proposed...
This paper proposes a system that not only identifies the gender of a speaker in a text-independent manner but also works efficiently in noisy environmental conditions in real time. The noisy environmental conditions are places where noise signals are generated at different SNRs (Signal-to-Noise Ratios), such as a train station, restaurant, exhibition hall, airport, and so on. The algorithms...
Recent research shows that the i-vector framework for speaker recognition can significantly benefit from phonetic information. A common approach is to use a deep neural network (DNN) trained for automatic speech recognition to generate a universal background model (UBM). Studies in this area have been done in relatively clean conditions. However, strong background noise is known to severely reduce...
The Automatic Captioned Relay Service is crucial for people who are deaf or hard of hearing to communicate with others in daily life. This service uses Automatic Speech Recognition (ASR) to transcribe speech into captions. If the waiting time caused by non-streaming speech recognition can be reduced, the relay service will support more users. In this paper, we propose a method for improving a voice activity...
In speech interfaces, it is often necessary to understand the overall auditory environment, not only recognizing what is being said, but also being aware of the location or actions surrounding the utterance. However, automatic speech recognition (ASR) becomes difficult when recognizing speech with environmental sounds. Standard solutions treat environmental sounds as noise, and remove them to improve...
We propose a sudden-noise suppression method for speech recognition using a phase linearity feature for noise detection. Our investigation of sound data recorded in actual retail stores shows that short, sudden noises are dominant in such environments. We also confirm the negative effect of such noises on speech recognition performance. Our method addresses this problem by focusing on sudden noises...
In this paper, a two-layer Gaussian Mixture Model (GMM) structure for Vector Taylor Series (VTS) feature compensation is proposed for robust speech recognition. Since a GMM with numerous mixture components is used for VTS, the computational complexity of VTS is extremely high. To deal with this issue, we propose a two-layer GMM structure for VTS. In detail, a GMM with fewer mixture components is utilized...
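The coarse-to-fine idea behind such a two-layer structure can be sketched as follows; the diagonal-covariance Gaussians, the coarse-to-fine component mapping, and the top-N selection below are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def log_gauss(x, means, variances):
    # Diagonal-covariance Gaussian log-densities for one frame x
    # against every component row: returns shape (n_components,)
    d = x - means
    return -0.5 * np.sum(np.log(2 * np.pi * variances) + d * d / variances, axis=1)

def two_layer_select(x, small_means, small_vars, mapping,
                     big_means, big_vars, top=2):
    """First layer: score a coarse GMM on the frame.
    Second layer: evaluate only the fine components mapped to the
    best coarse components, instead of the whole large GMM."""
    coarse = log_gauss(x, small_means, small_vars)
    best = np.argsort(coarse)[-top:]               # most likely coarse components
    fine_idx = np.concatenate([mapping[int(b)] for b in best])
    fine = log_gauss(x, big_means[fine_idx], big_vars[fine_idx])
    return fine_idx, fine
```

The saving comes from evaluating only a small subset of the large GMM's components per frame, at the cost of an extra coarse-GMM pass.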
Sub-band speech processing is well known in robust speech recognition. In recent years, on the other hand, deep neural networks (DNNs) have been widely used in speech recognition for acoustic modeling as well as feature extraction and transformation. In this paper, we propose to use a deep belief network (DBN) as a post-processing method for de-noising at the Mel sub-band level, where we enhance the logarithm...
Detection of whispered speech in the presence of high levels of background noise has applications in fraudulent behaviour recognition. For instance, it can serve as an indicator of possible insider trading. We propose a deep neural network (DNN)-based whispering detection system, which operates on both magnitude and phase features, including the group delay feature from all-pole models (APGD). We...
The recognition rate of speech recognition systems declines in noisy environments. In the signal space, a speech enhancement algorithm that combines the a priori Signal-to-Noise Ratio (SNR) with the Auditory Masking Effect can effectively remove noise from the speech signal. In the feature space, an improved non-uniform spectral perceptual compression feature extraction algorithm can effectively compress...
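One widely used way to estimate the a priori SNR in enhancement schemes of this kind is the decision-directed rule combined with a Wiener gain; the sketch below shows that combination only (the auditory-masking-effect component from the abstract is omitted, and the constants are illustrative):

```python
import numpy as np

def wiener_enhance(mag_frames, noise_power, alpha=0.98, snr_floor=1e-3):
    """Decision-directed a priori SNR estimate plus Wiener gain, per frame.
    mag_frames: (n_frames, n_bins) magnitude spectra.
    noise_power: (n_bins,) noise power estimate (assumed known)."""
    prev_clean_power = np.zeros(mag_frames.shape[1])
    out = np.empty_like(mag_frames)
    for t, mag in enumerate(mag_frames):
        post_snr = (mag ** 2) / noise_power              # a posteriori SNR
        prio_snr = (alpha * prev_clean_power / noise_power
                    + (1 - alpha) * np.maximum(post_snr - 1.0, 0.0))
        prio_snr = np.maximum(prio_snr, snr_floor)       # avoid over-attenuation
        gain = prio_snr / (1.0 + prio_snr)               # Wiener filter gain
        out[t] = gain * mag
        prev_clean_power = out[t] ** 2                   # feedback for next frame
    return out
```

The recursion smooths the SNR estimate over time, which is what reduces musical noise compared with using the instantaneous a posteriori SNR alone.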
Generally, speech recognition systems are specific to speech/spoken word recognition or to speaker identification/verification. In this paper, an attempt has been made to find a better combination of speech feature extraction and artificial neural network model for speaker identification combined with spoken word recognition in a generally noisy background (i.e., a home/office environment). Different speech...
Automatic speech recognition is one of the challenging areas in the field of speech signal processing. Automatic speech recognition technology converts a speech signal into text. This paper presents the implementation of an isolated Kannada word recognizer using Vector Quantization (VQ) and Fuzzy C-Means (FCM) techniques. The paper compares and contrasts the recognition accuracies of the FCM and k-means techniques...
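For reference, a minimal FCM implementation in NumPy; the fuzzifier m = 2, the iteration count, and the random initialization are illustrative choices, not values taken from the paper:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=50, seed=0):
    """Fuzzy C-Means clustering.
    X: (n, d) data; c: number of clusters.
    Returns soft memberships U of shape (n, c) and centroids (c, d)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)             # memberships sum to 1 per point
    for _ in range(iters):
        W = U ** m
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centroids[None], axis=2) + 1e-12
        # Standard FCM membership update: u_ik ∝ d_ik^(-2/(m-1))
        U = 1.0 / (dist ** (2.0 / (m - 1.0)))
        U /= U.sum(axis=1, keepdims=True)
    return U, centroids
```

Unlike k-means (hard VQ), each point keeps a graded membership in every cluster, which is what lets FCM model frames that fall between codebook entries.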
Speech processing and its application to intelligent systems is a state-of-the-art research area. Systems are getting smarter day by day with the introduction of speech signals to control machines. The basic model of the human speech system is used to design an algorithm that is able to detect words as well as alphabetic letters to generate commands and control a machine. The speech enhancement...
In emotion recognition from speech, several well-established corpora are used to date for the development of classification engines. The data is annotated differently, and the community in the field uses a variety of feature extraction schemes. The aim of this paper is to investigate promising features for individual corpora and then compare the results for proposing optimal features across data sets,...
Spoken language identification is a technique to model and classify the language spoken by an unknown person. The language identification task is more challenging in real environmental conditions due to the addition of different types of noise. The presence of noise in the speech signal causes several problems. This paper covers several aspects of language identification in noisy environments. Experiments have been carried...
This paper presents the use of lip-reading and Thai speech to control electronic devices in a vehicle. The Viola-Jones algorithm detects the face of the driver and the constrained local model detects their mouth area before three lips features are extracted. Hidden Markov models are utilized to recognize speech and lip movement, with the lip movement recognizer offering better accuracy than the speech...
The speaker verification (SV) task has been an active area of research for the last thirty years. One recent research topic is improving the robustness of SV systems in challenging environments. This paper examines the robustness of a current state-of-the-art SV system against background noise corruption. Specifically, we consider the scenario where the SV system is trained on noise-free...
This paper addresses the problem of robust text-independent speaker verification when some of the features for the target signal are heavily masked by noise. In the framework of Gaussian mixture models (GMMs), a new approach based on the spectral subtraction technique and the statistical missing feature compensation is presented. The identity of spectral features missing due to noise masking is provided...
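The two front-end steps described here can be sketched as follows, assuming power spectra and a separately obtained noise estimate; the oversubtraction floor and the SNR threshold are illustrative values, not the paper's:

```python
import numpy as np

def subtract_and_mask(noisy_power, noise_power, beta=0.01, snr_db_thresh=0.0):
    """Spectral subtraction plus a binary missing-feature mask.
    noisy_power, noise_power: (n_frames, n_bins) power spectra.
    Returns the subtracted spectrum and a reliability mask
    (True = reliable, False = missing / noise-masked)."""
    # Subtract the noise estimate, with a spectral floor to avoid negatives
    clean_est = np.maximum(noisy_power - noise_power, beta * noisy_power)
    # Bins whose estimated local SNR falls below the threshold are "missing"
    local_snr_db = 10.0 * np.log10(clean_est / np.maximum(noise_power, 1e-12))
    reliable = local_snr_db > snr_db_thresh
    return clean_est, reliable
```

The mask is then handed to a missing-feature back end (e.g., GMM marginalization over the unreliable bins) rather than being used to zero out features directly.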
We propose a spatial diffuseness feature for deep neural network (DNN)-based automatic speech recognition to improve recognition accuracy in reverberant and noisy environments. The feature is computed in real-time from multiple microphone signals without requiring knowledge or estimation of the direction of arrival, and represents the relative amount of diffuse noise in each time and frequency bin...
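One common way to obtain such a feature is from the magnitude-squared coherence (MSC) between two microphones per time-frequency bin: a directional source yields high coherence, diffuse noise low coherence. The sketch below uses recursively smoothed spectra and maps diffuseness as 1 − MSC; both the smoothing constant and this mapping are assumptions, not necessarily the authors' estimator:

```python
import numpy as np

def diffuseness(X1, X2, alpha=0.8):
    """X1, X2: complex STFTs (n_frames, n_bins) of two channels.
    Returns a per-bin diffuseness feature in [0, 1], computed from
    recursively smoothed auto- and cross-spectra."""
    S11 = np.full(X1.shape[1], 1e-12)
    S22 = np.full(X1.shape[1], 1e-12)
    S12 = np.zeros(X1.shape[1], dtype=complex)
    out = np.empty(X1.shape, dtype=float)
    for t in range(X1.shape[0]):
        S11 = alpha * S11 + (1 - alpha) * np.abs(X1[t]) ** 2
        S22 = alpha * S22 + (1 - alpha) * np.abs(X2[t]) ** 2
        S12 = alpha * S12 + (1 - alpha) * X1[t] * np.conj(X2[t])
        msc = np.abs(S12) ** 2 / (S11 * S22)     # magnitude-squared coherence
        out[t] = 1.0 - np.clip(msc, 0.0, 1.0)
    return out
```

Because only second-order statistics between channels are needed, the feature is cheap to compute in real time and requires no direction-of-arrival estimate, matching the property the abstract highlights.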