Pitch is an important characteristic of speech and is useful for many applications. However, it is still challenging to estimate pitch in strong noise. In this paper, we propose a joint training approach to determine pitch. First, a Bidirectional Long Short-Term Memory Recurrent Neural Network (BLSTMRNN) is trained to map noisy speech features to clean speech features. Second, the pitch estimation is also...
The aim of this work is the estimation of respiratory flow from lung sound recordings, i.e. acoustic airflow estimation. With a 16-channel lung sound recording device, we simultaneously record the respiratory flow and the lung sounds on the posterior chest from six lung-healthy subjects in supine position. For the recordings of four selected sensor positions, we extract linear frequency cepstral coefficient...
Acoustic beamforming has played a key role in robust automatic speech recognition (ASR) applications. Accurate estimates of the speech and noise spatial covariance matrices (SCMs) are crucial for successfully applying minimum variance distortionless response (MVDR) beamforming. Reliable estimation of time-frequency (TF) masks can improve the estimation of the SCMs and significantly improve...
A Gaussian mixture model (GMM) is used in state-of-the-art i-Vector based speaker recognition systems for acoustic space division and prediction. The main purpose of such acoustic space clustering is to constrain the acoustic comparison in small regions where between-speaker differences are the main source of variability. In this study, we investigate two unsupervised discriminative approaches as...
Based on the relationship between porosity (or lithological facies) and other petrophysical properties, artificial neural networks (ANNs) are trained for porosity estimation and lithological facies classification, respectively, using core porosity (CPOR) data and core lithological facies interpretation results from part of the core interval together with some well logs (petrophysical properties). After the...
In this paper, the methods for estimating and tracking sparse doubly spread channels in single-carrier coherent communications are investigated. The sparse doubly spread channel is parameterized by a few paths with different delays, Doppler scales, and gains. Based on the model, a low-complexity channel estimation algorithm is proposed. The channel estimation is divided into two stages, the first...
Spoken language translation (SLT) combines automatic speech recognition (ASR) and machine translation (MT). During the decoding stage, the best hypothesis produced by the ASR system may not be the best input candidate to the MT system, but making use of multiple sub-optimal ASR results in SLT has been shown to be too complex computationally. This paper presents a method to rescore the k-best ASR output...
In this paper we present a confidence estimation system using recurrent neural networks (RNNs) and compare it to a traditional multilayer perceptron (MLP) based system. The ability of RNNs to capture sequence information and improve decisions using processed history was the main motivation to explore RNNs for confidence estimation. In this paper we also explore two subtle variations of confidence estimator:...
The problem of blind estimation of the room acoustic clarity index C50 from single-channel reverberant speech signals is presented in this paper. We analyze the performance of several machine learning methods for a regression task using 309 features derived from the speech signal and modeled with a Deep Belief Network (DBN), Classification And Regression Tree (CART) and Linear Regression (LR). These...
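For orientation, the quantity being estimated can be stated concretely. The sketch below computes C50 directly from a known room impulse response using the standard definition (ratio of the energy in the first 50 ms to the energy after 50 ms, in dB). It is not the blind, speech-based estimator described in the abstract; the function name `clarity_index_c50` and the assumption that the response starts at the direct-path arrival are illustrative choices made here.

```python
import math


def clarity_index_c50(rir, sample_rate):
    """Return C50 in dB for a room impulse response.

    Assumption (not from the abstract): rir[0] is the direct-path arrival,
    so the early window is simply the first 50 ms of samples.
    """
    split = int(round(0.050 * sample_rate))  # number of samples in 50 ms
    early = sum(x * x for x in rir[:split])  # early energy, 0 to 50 ms
    late = sum(x * x for x in rir[split:])   # late energy, 50 ms onward
    if early <= 0.0 or late <= 0.0:
        raise ValueError("C50 is undefined when early or late energy is zero")
    return 10.0 * math.log10(early / late)
```

A blind method has no access to the impulse response itself, so a reference value computed this way would typically serve only as the regression target.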
This paper addresses the problem of speech segregation by estimating the ideal binary mask (IBM) from noisy speech. Two methods are compared. The first is a supervised learning approach that incorporates a priori knowledge about the feature distribution observed during training. The second relies solely on a frame-based speech presence probability (SPP) estimation and therefore does not depend...
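As a point of reference, the ideal binary mask itself is usually defined per time-frequency unit: a unit is kept (1) when its local SNR exceeds a local criterion and discarded (0) otherwise. The sketch below computes that oracle mask from separately known speech and noise power; it is not either of the estimators compared in the abstract. The function name, the list-of-lists layout, and the 0 dB default criterion are assumptions made for illustration.

```python
def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Return the oracle binary mask for matching time-frequency power grids.

    speech_power, noise_power: lists of frames, each a list of per-band power
    values (hypothetical layout). lc_db is the local SNR criterion in dB.
    """
    threshold = 10.0 ** (lc_db / 10.0)  # criterion as a linear power ratio
    mask = []
    for s_frame, n_frame in zip(speech_power, noise_power):
        # Comparing s > threshold * n avoids dividing by zero noise power.
        mask.append([1 if s > threshold * n else 0
                     for s, n in zip(s_frame, n_frame)])
    return mask
```

In practice the clean speech and noise are not available separately, which is why the mask has to be estimated from the noisy mixture.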
The effectiveness of unsupervised speaker adaptation is typically limited by errors in the estimated transcription of the adaptation data. Previous work has mitigated this negative effect by using only those sections of the adaptation data which are transcribed with relatively high confidence. In this work, phoneme correctness predictions are integrated into a discriminative unsupervised speaker adaptation...
In many criminal cases, evidence might be in the form of telephone conversations or tape recordings. Therefore, law enforcement agencies have been concerned about accurate methods to profile different characteristics of a speaker from recorded voice patterns, which facilitate the identification of a criminal. This paper proposes a new approach for speaker gender detection and age estimation, based...
In this study, we propose increasing the discriminative power of maximum a posteriori (MAP)-based mapping function estimation for acoustic model adaptation. Based on the effective and stable learning advantages of MAP-based estimation, we incorporate a discriminative term and derive a new objective function. By applying the new function for online mapping function estimation, we developed discriminative...
In this study, we evaluate our proposed methods for enhancing alaryngeal speech based on statistical voice conversion techniques. Voice conversion based on a Gaussian mixture model has been applied to the conversion of alaryngeal speech into normal speech (AL-to-Speech). Moreover, one-to-many eigenvoice conversion (EVC) has also been applied to AL-to-Speech to enable the recovery of the original voice...
This paper presents a sound source (talker) localization method using only a single microphone, where an HMM (Hidden Markov Model) of clean speech is introduced to estimate the acoustic transfer function from a user's position. The new method is able to carry out this estimation without measuring impulse responses. The frame sequence of the acoustic transfer function is estimated by maximizing the...
This paper investigates an eigen feature space maximum likelihood linear regression (fMLLR) scheme to improve the performance of online speaker adaptation in automatic speech recognition systems. In this stochastic-approximation-like framework, the traditional incremental fMLLR estimation is considered as a slowly changing mean of the eigen fMLLR. It helps the adaptation when only a limited amount...
Since speech is highly variable, even if we have a fairly large-scale database, we cannot avoid the data sparseness problem in constructing automatic speech recognition (ASR) systems. How to train and adapt statistical models using limited amounts of data is one of the most important research issues in ASR. This paper summarizes major techniques that have been proposed to solve the generalization...
The room impulse response (RIR) can be used to calculate many room acoustical parameters, such as the reverberation time (RT). However, estimating the room volume, another important room parameter, from the RIR is typically a more difficult task requiring extraction of other features from the RIR. Most of the existing fully-blind methods for estimating the room volume from the RIR do not combine features...
This paper presents a development of a sensory system for analysis of badminton smashes. During a badminton game, the ability to execute a powerful smash is fundamental for a player to be competitive. In most games, the winning factor for the game is often attributed to a high shuttle speed during the execution of a smash. It was envisioned that the shuttle speed can be correlated from the speed of...
This paper suggests an alternative solution for the task of spoken document retrieval (SDR). The proposed system runs retrieval on multi-level transcriptions (word and phone) produced by word and phone recognizers respectively, and their outputs are combined. We propose to use a latent Dirichlet allocation (LDA) model for capturing the semantic information in the word transcription. The LDA model is employed...