Dropout and DropConnect can be viewed as regularization methods for deep neural network (DNN) training. In DNN acoustic modeling, the huge number of speech samples makes it expensive to sample the neuron mask (Dropout) or the weight mask (DropConnect) repetitively from a high dimensional distribution. In this paper we investigate the effect of Gaussian stochastic neurons on DNN acoustic modeling....
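As a rough illustration of why repeated mask sampling gets expensive, the sketch below contrasts the two mask shapes for a single linear layer: Dropout draws one Bernoulli value per output neuron, while DropConnect draws one per weight. The function names, the `keep_prob` parameter, and the toy layer are assumptions made for this example. Nonlinearities and rescaling are omitted, and nothing here is taken from the paper's own implementation.

```python
import random


def linear(x, W):
    """Plain matrix-vector product: one output per row of W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dropout_forward(x, W, keep_prob, rng):
    """Dropout sketch: one Bernoulli draw per output neuron (len(W) draws)."""
    y = linear(x, W)
    mask = [1.0 if rng.random() < keep_prob else 0.0 for _ in y]
    return [m * v for m, v in zip(mask, y)], mask


def dropconnect_forward(x, W, keep_prob, rng):
    """DropConnect sketch: one Bernoulli draw per weight (rows * cols draws)."""
    mask = [[1.0 if rng.random() < keep_prob else 0.0 for _ in row] for row in W]
    masked_W = [[m * w for m, w in zip(m_row, w_row)]
                for m_row, w_row in zip(mask, W)]
    return linear(x, masked_W), mask


# Hypothetical toy layer: 3 inputs, 2 output neurons.
x = [1.0, 2.0, 3.0]
W = [[0.5, 0.0, 1.0],
     [1.0, 1.0, 0.0]]
rng = random.Random(0)

y_drop, neuron_mask = dropout_forward(x, W, 0.5, rng)
y_conn, weight_mask = dropconnect_forward(x, W, 0.5, rng)
n_neuron_draws = len(neuron_mask)                        # 2 draws
n_weight_draws = sum(len(row) for row in weight_mask)    # 6 draws
```

In this toy layer the weight mask needs three times as many random draws as the neuron mask, and the gap grows with the layer's input dimension. Both masks must be resampled for every training sample.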
The problem of blind estimation of the room acoustic clarity index C50 from single-channel reverberant speech signals is presented in this paper. We analyze the performance of several machine learning methods for a regression task using 309 features derived from the speech signal and modeled with a Deep Belief Network (DBN), Classification And Regression Tree (CART) and Linear Regression (LR). These...
This paper addresses the problem of speech segregation by estimating the ideal binary mask (IBM) from noisy speech. Two methods are compared. The first is a supervised learning approach that incorporates a priori knowledge about the feature distribution observed during training. The second method relies solely on a frame-based speech presence probability (SPP) estimation, and therefore does not depend...
Speech recognition technology was applied to the collection of agricultural price information, with acoustic models trained for the price-collection environment so as to minimize environmental influence. First, we constructed a speech corpus by collecting speech in the operating scene, and then selected triphone modeling as the decoding unit to train hidden Markov...
This paper addresses the problem of automatic learning of statistical models of clicks for odontocete species classifications, particularly focusing on improving accuracy of the classifier by iteratively identifying click-like sounds that are likely to be noise and removing these from the model training set. The algorithm is weakly supervised in that no hand-labeled click regions are available, but...
This paper presents field results for a pollution estimation system based on ultrasound noise and Statistical AutoAssociative Artificial Neural Networks (SA³N²). The system extracts spectral information from the ultrasonic noise emitted by the corona discharges that occur near electrical insulation, then correlates this information with a previously known pollution intensity situation. The entire acquisition...
In this paper, we propose a novel acoustic model adaptation method for noise robust speech recognition. Model combination is a common way to adapt acoustic models to a target test environment. For example, the mean supervectors of the adapted model are obtained as a linear combination of mean supervectors of many pre-trained environment-dependent acoustic models. Usually, the combination weights are...
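The linear combination of mean supervectors mentioned in this abstract can be sketched as follows. This is a minimal illustration only. The function name, the three toy "environments", and the weights are hypothetical, and the abstract is truncated before it explains how the combination weights are actually estimated.

```python
def combine_supervectors(supervectors, weights):
    """Return sum_k weights[k] * supervectors[k], element by element."""
    if len(supervectors) != len(weights):
        raise ValueError("need exactly one weight per supervector")
    dim = len(supervectors[0])
    if any(len(sv) != dim for sv in supervectors):
        raise ValueError("all supervectors must have the same dimension")
    return [sum(w * sv[i] for w, sv in zip(weights, supervectors))
            for i in range(dim)]


# Hypothetical mean supervectors of three pre-trained
# environment-dependent models (toy dimension 3).
clean = [0.0, 1.0, 2.0]
car = [1.0, 1.0, 0.0]
babble = [2.0, 3.0, 4.0]

# Hypothetical combination weights for the target environment.
adapted = combine_supervectors([clean, car, babble], [0.5, 0.25, 0.25])
print(adapted)  # [0.75, 1.5, 2.0]
```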
In most real-world audio recordings, we encounter several types of audio events. In this paper, we develop a technique for detecting signature audio events that is based on identifying patterns of occurrences of automatically learned atomic units of sound, which we call Acoustic Unit Descriptors or AUDs. Experiments show that the methodology works as well for detection of individual events and their...
This paper presents new approaches to improve the detection of two key audio events in a sport game (tennis) using contextual information. When analysing a tennis match using only audio information, the sound of the ball being struck and the occurrence of a line judge's shout can be obscured by players' grunts or shouts. Furthermore, if models of these two important events are trained from labelled...
In regions of the world where tuberculosis (TB) poses the greatest disease burden, the lack of access to skilled laboratories is a significant problem. A lab-free method for assessing patient recovery during treatment would be of great benefit, particularly for identifying patients who may have drug-resistant tuberculosis. We hypothesize that cough analysis may provide such a test. In this paper we...
Obstructive Sleep Apnea Syndrome (OSAS) is defined as a sleep related breathing disorder that causes the body to stop breathing for about 10 seconds and mostly ends with a loud sound due to the opening of the airway. OSAS is traditionally diagnosed using polysomnography, which requires a whole night stay at the sleep laboratory of a hospital, with multiple electrodes attached to the patient's body...
In this paper, we propose a robust classification strategy for distinguishing between a healthy subject and a patient with pulmonary emphysema on the basis of lung sounds. A symptom of pulmonary emphysema is that almost all lung sounds include some abnormal (i.e., adventitious) sounds. However, the great variety of possible adventitious sounds and noises at auscultation makes high-accuracy detection...
In this paper, we propose a novel multi-task multi-variate (MTMV) sparse representation method for multi-sensor classification, which takes into account correlations between sensors simultaneously while considering joint sparsity within each sensor's observations. This approach can be seen as the generalized model of multi-task and multivariate Lasso, where all the multi-sensor data are jointly represented...
In enclosed environments where robots are deployed, the observed speech signal is smeared due to reverberation. This degrades the performance of the automatic speech recognition (ASR). Thus, hands-free speech recognition for human-machine communication is a difficult task. Most speech enhancement techniques used to address this problem enhance the contaminated waveform independent from that of the...
This paper compares three different approaches currently used in recognizing contact calls made by the North Atlantic Right Whale (NRW), Eubalaena glacialis. We present two new approaches consisting of machine learning algorithms based on artificial neural networks (NET) and the classification and regression tree classifiers (CART), and compare their performance with earlier work that employs multi-Stage...
Using discriminative classifiers, such as Support Vector Machines (SVMs), in combination with, or as an alternative to, Hidden Markov Models (HMMs) has a number of advantages for difficult speech recognition tasks. For example, the models can make use of more dependencies in the observation sequences than HMMs, provided the appropriate form of kernel is used. However, standard SVMs are binary classifiers,...
Flow regeneration noise is a main factor affecting the attenuation performance of mufflers, yet at present no mature software or tool is available to predict it effectively. Flow regeneration noise from a simple expansion chamber muffler element is predicted using a BP neural network, and the prediction is compared with experiment. Results...
This paper presents an enhanced stochastic mapping technique in the discriminative feature (fMPE) space that exploits stereo data for noise robust LVCSR. Both MMSE and MAP estimates of the mapping are given and the performance of the two is investigated. Due to the iterative nature of the MAP estimate, we show that combining MMSE and MAP estimates is possible and yields better performance than each...
The presence of noise degrades the recognition rate of automatic speech recognition systems. Robustness to noise can be improved by changing the acoustic units used during the recognition process. In this paper, we concentrate on automatic Arabic speech recognition under different noise conditions using different acoustic units. Arabic speech is described in terms of its constituent monophones,...
An approach is proposed for classifying multiple simultaneous low-altitude targets on the battlefield. Based on Independent Component Analysis (ICA), the mixed signal is separated into several single, pure signals, and the noise is removed from the acoustic signal. Mel-frequency cepstral coefficients (MFCC), which reflect the characteristics of the sound more strongly, are extracted as characteristic...