Significant multipath propagation and heavy clutter in indoor environments render through-the-wall radar imaging a difficult and complex proposition. It is highly desirable to properly interpret the radar images and determine the contents of the indoor scene with a high level of confidence. Data collected from multiple positions around a structure can be used to improve imaging visibility into the...
Repair or error-recovery strategies are an important design issue in spoken dialogue systems (SDSs) - how to conduct the dialogue when there is no progress (e.g. due to repeated ASR errors). Nearly all current SDSs use hand-crafted repair rules, but a more robust approach is to use reinforcement learning (RL) for data-driven dialogue strategy learning. However, as well as usually being tested only...
The universal background model (UBM) is an effective framework widely used in speaker recognition, but so far it has received little attention from the speech recognition field. In this work, we make a first attempt to apply the UBM to acoustic modeling in ASR. We propose a tree-based parameter estimation technique for UBMs, and describe a set of smoothing and pruning methods to facilitate learning...
Part-of-speech tagging is a necessary pre-processing step for many natural language tasks. Recent statistical approaches, such as conditional random fields, rely on well chosen feature functions to ensure that important characteristics of the empirical training distribution are reflected in the trained model. In practice, however, it is not always clear how to best select these feature functions in...
In this paper, we develop a dual-microphone speech dereverberation algorithm for noisy environments, which is aimed at suppressing late reverberation and background noise. The spectral variance of the late reverberation is obtained with adaptively-estimated direct path compensation. A Markov-switching generalized autoregressive conditional heteroscedasticity (GARCH) model is used to estimate the spectral...
Outlined in this paper is a novel approach to speech dereverberation when an estimate of the source-receiver transfer function is known. It is a two-stage algorithm based on the minimum phase/allpass decomposition of a mixed phase room impulse response (RIR). The reverberant speech is first filtered with the inverse minimum phase component of the RIR. Then a non-negative matrix factorization (NMF)...
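The minimum phase/allpass decomposition mentioned above can be illustrated with a minimal sketch. It assumes the homomorphic (cepstrum-folding) method, which is one common way to obtain the split; the abstract does not say how the authors compute it. The three-tap impulse response and the function name `minphase_allpass_split` are hypothetical and chosen only so the result can be checked by hand.

```python
import numpy as np

def minphase_allpass_split(h, nfft=512, eps=1e-12):
    """Split the response of h into minimum-phase and allpass factors.

    Assumes nfft is even and much longer than the effective length of h.
    """
    H = np.fft.fft(h, nfft)
    # Real cepstrum: inverse FFT of the log-magnitude spectrum.
    log_mag = np.log(np.maximum(np.abs(H), eps))
    c = np.fft.ifft(log_mag).real
    # Fold the cepstrum onto non-negative quefrencies; this keeps the
    # magnitude response and forces a minimum-phase spectrum.
    fold = np.zeros(nfft)
    fold[0] = 1.0
    fold[1:nfft // 2] = 2.0
    fold[nfft // 2] = 1.0
    H_min = np.exp(np.fft.fft(c * fold))
    # Whatever remains after dividing out the minimum-phase part is allpass.
    H_ap = H / H_min
    return H, H_min, H_ap

# Hypothetical toy "RIR": zeros at z = 2 and z = 0.5, so it is mixed phase.
h = np.array([1.0, -2.5, 1.0])
H, H_min, H_ap = minphase_allpass_split(h)
h_min = np.fft.ifft(H_min).real
```

For this toy response the minimum-phase factor should be close to `[2, -2, 0.5]` (the zero at z = 2 reflected to z = 0.5), and `H_ap` should have unit magnitude at every frequency bin. A measured RIR is far longer, so `nfft` would need to be correspondingly larger.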
This paper reports comparative evaluations of our previously proposed method for estimating fundamental frequency (F0), based on complex cepstrum analysis, against nine typical methods over large speech-sound datasets in both artificial and realistic reverberant environments (room acoustics). The comparison methods include several classic algorithms (Cepstrum, AMDF, TPC, and modified autocorrelation) and a few modern...
Speech babble represents the most challenging noise interference in all speech systems, yet no research has been performed at a systematic level to model the underlying structure. For the first time, this study establishes a working foundation for the analysis and modeling of babble speech. We first address the underlying model for multiple speaker babble speech - considering the number of conversations...
Differences in the physiological properties of the glottis and the vocal tract are partly due to age and/or gender. Since these differences are reflected in the speech signal, acoustic measures related to those properties can be helpful for automatic age and gender classification. In this paper, the focus is on the role of acoustic measures related to the voice source in automatic gender classification,...
Current approaches to automatic spoken language identification (LID) assume the availability of a large corpus of manually language-labeled speech samples for training statistical classifiers. We investigate two methods of active learning to significantly reduce the amount of labeled speech needed for training LID systems. Starting with a small training set, an automated method is used to select samples...
This paper presents a new strategy for designing the parallel phone recognizers for spoken language recognition. Given a collection of parallel phone recognizers, we select a subset of phones from each phone recognizer for each target language to construct a target-oriented phone tokenizer (TOPT). As a result, the collection of target-oriented phone tokenizers is more effective than the original parallel...
In this paper, we present a new approach to HMM adaptation that jointly compensates for additive and convolutive acoustic distortion in environment-robust speech recognition. The hallmark of our new approach is the use of a nonlinear, phase-sensitive model of acoustic distortion that captures phase asynchrony between clean speech and the mixing noise. In the first step of the developed algorithm,...
In this paper, we cast discriminative training problems into standard linear programming (LP) optimization. Besides being convex and having globally optimal solution(s), LP programs are well-studied with well-established solutions, and efficient LP solvers are freely available. In practice, however, one may not have complete knowledge of the feasible region since it is constructed from a limited number...
In this paper a new adaptive leakage factor variable tap-length learning algorithm is proposed. Through analysis the converged difference between the segmented mean square error (MSE) of a filter formed from a number of the initial coefficients of an adaptive filter, and the MSE of the full adaptive filter, is confirmed as a function of the tap-length of the adaptive filter to be monotonically non-increasing...
This paper presents a procedure for implementing fully adaptive interpolated FIR filters with removed border effect. The proposed approach allows reducing the steady-state mean-square error by eliminating the main sources of performance degradation from the adaptive interpolated FIR filters. In addition, the computational effort needed for implementing such a procedure is very small. Simulation results...
In this paper, we present an analysis to predict the power spectral density (PSD) at the output of a nonlinear power amplifier (PA). We focus on the offset quadrature phase shift keying (OQPSK) waveform band-limited by a square-root raised cosine (SRRC) filter. This is one of the waveforms used in the wideband code division multiple access (W-CDMA) wireless standard. We show that the PA output...
We propose in this paper a novel modification of the popular Adaptive Notch Filter (ANF) to improve the tracking of time-varying frequencies. Unlike previous algorithms, our new method incorporates a modeling of frequency variation directly into the cost minimization procedure. Our results show a notable improvement in the frequency estimation performance over earlier methods, and comparisons over...
A novel design for a two-channel IIR quadrature-mirror filter (QMF) bank with near-perfect reconstruction (NPR) is presented. The analysis filter-bank is given by an efficient polyphase network (PPN) implementation based on allpass filters. The arising phase distortions are almost compensated by stable allpass filters, designed via analytical closed-form expressions. In a first design, the remaining...
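As a rough illustration of an allpass-based polyphase analysis bank, the sketch below evaluates the classical two-branch structure H0(z) = ½[A0(z²) + z⁻¹A1(z²)] and H1(z) = ½[A0(z²) − z⁻¹A1(z²)]. This structure, the first-order allpass sections, and the coefficients 0.15 and 0.6 are assumptions made for illustration; they are not taken from the paper, and the phase-compensating synthesis stage is not modelled.

```python
import numpy as np

def allpass1(a, z_inv):
    """First-order real allpass A(z) = (a + z^-1) / (1 + a z^-1), |a| < 1."""
    return (a + z_inv) / (1.0 + a * z_inv)

def qmf_analysis_responses(a0, a1, n_points=256):
    """Frequency responses of the lowpass/highpass analysis pair on [0, pi]."""
    w = np.linspace(0.0, np.pi, n_points)
    z_inv = np.exp(-1j * w)
    z2_inv = z_inv ** 2
    upper = allpass1(a0, z2_inv)           # A0(z^2)
    lower = z_inv * allpass1(a1, z2_inv)   # z^-1 * A1(z^2)
    H0 = 0.5 * (upper + lower)             # lowpass branch sum
    H1 = 0.5 * (upper - lower)             # highpass branch difference
    return w, H0, H1

# Hypothetical allpass coefficients, chosen only for the demonstration.
w, H0, H1 = qmf_analysis_responses(a0=0.15, a1=0.6)
power_sum = np.abs(H0) ** 2 + np.abs(H1) ** 2
```

Because both branches are allpass, the pair is power complementary (`power_sum` equals 1 at every frequency) regardless of the coefficient values. The magnitude response is therefore well behaved, and what remains to be handled in such a bank is phase distortion.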
Recently, Candan introduced higher order DFT-commuting matrices whose eigenvectors are accurate approximations to the continuous Hermite-Gaussian functions (HGFs). However, the highest order $2k$ of the $O(h^{2k})$ $N \times N$ DFT-commuting matrices proposed by Candan is restricted by $2k+1 \le N$. In this paper, we remove that upper bound on the order by developing a coefficient truncation technique to...
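To make the notion of a DFT-commuting matrix concrete, the sketch below builds the classical second-order matrix (a circulant second-difference term plus a cosine diagonal, usually attributed to Dickinson and Steiglitz) and checks numerically that it commutes with the unitary DFT matrix. This is only the lowest-order case; it is neither Candan's higher order construction nor the truncation technique of this paper, and the size N = 16 is an arbitrary choice.

```python
import numpy as np

def dft_matrix(n):
    """Unitary n x n DFT matrix."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)

def second_order_commuting_matrix(n):
    """Classical second-order DFT-commuting matrix (requires n >= 3)."""
    k = np.arange(n)
    # Diagonal term: 2*cos(2*pi*k/n).
    S = np.diag(2.0 * np.cos(2.0 * np.pi * k / n))
    # Circulant term: ones on the cyclic super- and sub-diagonals.
    S[k, (k + 1) % n] += 1.0
    S[k, (k - 1) % n] += 1.0
    return S

N = 16
F = dft_matrix(N)
S = second_order_commuting_matrix(N)
commutator_norm = np.linalg.norm(F @ S - S @ F)
```

The commutator vanishes up to rounding error because the DFT maps the circulant term to the cosine diagonal and vice versa, so their sum is left unchanged. Since `S` is real and symmetric, `np.linalg.eigh(S)` returns an orthonormal eigenbasis, which is the kind of basis the abstract compares with sampled Hermite-Gaussian functions.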
This paper presents a simple method to improve the frame-bounds-ratio of perfect reconstruction (PR) oversampled filter banks (FBs) by adjusting the gain of each subband filter. For a given analysis PRFB, a finite convex optimization algorithm is presented to redesign the subband gains such that the frame-bounds-ratio of the FB is minimized. The algorithm also provides an effective way to compute...