This paper discusses a spoken language acquisition system for a command-and-control interface. The proposed system learns a set of words through coupled commands and demonstrations. The user can teach the system a new command by demonstrating the uttered command through an alternative interface. With these coupled commands and demonstrations, the system can learn the acoustic representations of the...
Language model adaptation based on Machine Translation (MT) is a recently proposed approach to improve the Automatic Speech Recognition (ASR) of spoken translations that does not suffer from a common problem in approaches based on rescoring, namely that errors made during recognition cannot be recovered by the MT system. In previous work we presented an efficient implementation for MT-based language model...
Exemplar-based techniques, where the noisy speech is decomposed as a linear combination of the speech and noise exemplars stored in a dictionary, have been successfully used for speech enhancement in noisy environments. This paper extends this technique to achieve speech dereverberation in noisy environments by means of a nonnegative approximation of the noisy reverberant speech in the frequency domain...
Exemplar-based acoustic modeling is based on labeled training segments that are compared with the unseen test utterances with respect to a dissimilarity measure. Using a larger number of accurately labeled exemplars provides better generalization and thus improved recognition performance, at the cost of increased computation and memory requirements. We have recently developed a noise robust exemplar...
This work examines the use of a Wireless Acoustic Sensor Network (WASN) for the classification of clinically relevant activities of daily living (ADL) of elderly people. The aim of this research is to automatically compile a summary report about the performed ADLs which can be easily interpreted by caregivers. In this work the classification performance of the WASN will be evaluated in both clean...
This work examines the use of a Wireless Acoustic Sensor Network (WASN) for the classification of clinically relevant activities of daily living (ADL) of elderly people. The aim of this research is to automatically compile a summary report about the performed ADLs which can be easily interpreted by caregivers. In this work, the classification performance of the WASN will be evaluated in both clean...
This paper investigates an adaptive noise dictionary design approach to achieve an effective and computationally feasible noise modeling for the noise robust exemplar matching (N-REM) framework. N-REM approximates noisy speech segments as a linear combination of multiple length exemplars in a sparse representation (SR) formulation. Compared to the previous SR techniques with a single overcomplete...
Exemplar-based feature enhancement successfully exploits a wide temporal signal context. We extend this technique with hybrid input spaces that are chosen for a more effective separation of speech from background noise. This work investigates the use of two different hybrid input spaces which are formed by incorporating the full-resolution and modulation envelope spectral representations with the...
In this paper, we propose a single-channel speech enhancement system based on the noise robust exemplar matching (N-REM) framework using coupled dictionaries. N-REM approximates noisy speech segments as a sparse linear combination of speech and noise exemplars that are stored in multiple dictionaries based on their length and associated speech unit. The dictionaries providing the best approximation...
This paper proposes a novel approach for automatic speaker height estimation based on the i-vector framework. In this method, each utterance is modeled by its corresponding i-vector. Then artificial neural networks (ANNs) and least-squares support vector regression (LSSVR) are employed to estimate the height of a speaker from a given utterance. The proposed method is trained and tested on the telephone...
We propose a method to transform the on-line speech signal so as to comply with the specifications of an HMM-based automatic speech recognizer. The spectrum of the input signal undergoes a vocal tract length (VTL) normalization based on differences of the average third formant F3. The high-frequency gap which is generated after scaling is estimated by means of an extrapolation scheme. Mel scale cepstral...
This paper studies the asymptotic properties (strong consistency, convergence rate, asymptotic normality) of a generalized weighted nonlinear least squares estimator under weak noise assumptions. Both deterministic and stochastic weighting are handled and the presence of model errors is considered.
Compounding is one of the most productive word formation processes in many languages and is therefore a main source of data sparsity in language modeling. Many solutions have been suggested to model compound words, most of which break the compound into its constituents and train a new model with them. In earlier work, we argued that this approach is suboptimal and we presented a novel technique that...
Deep neural network (DNN) based acoustic modelling has been successfully used for a variety of automatic speech recognition (ASR) tasks, thanks to its ability to learn higher-level information using multiple hidden layers. This paper investigates the recently proposed exemplar-based speech enhancement technique using coupled dictionaries as a pre-processing stage for DNN-based systems. In this setting,...
In this paper, a novel approach for automatic speaker weight estimation from spontaneous telephone speech signals is proposed. In this method, each utterance is modeled using the i-vector framework which is based on the factor analysis on Gaussian Mixture Model (GMM) mean supervectors, and the Non-negative Factor Analysis (NFA) framework which is based on a constrained factor analysis on GMM weights...
Over the past decade, several speech-based electronic assistive technologies (EATs) have been developed that target users with dysarthric speech. These EATs include vocal command & control systems, but also voice-input voice-output communication aids (VIVOCAs). In these systems, the vocal interfaces are based on automatic speech recognition (ASR) systems, but this approach requires much training...
We propose a novel exemplar-based feature enhancement method for automatic speech recognition which uses coupled dictionaries: an input dictionary containing atoms sampled in the modulation (envelope) spectrogram domain and an output dictionary with atoms in the Mel or full-resolution frequency domain. The input modulation representation is chosen for its separation properties of speech and noise...
This paper addresses issues in the design of a vocal interface for a robot that can learn to understand spoken utterances through demonstration. Weakly supervised non-negative matrix factorization (NMF) is used as a machine learning algorithm where acoustic data are augmented with semantic labels representing the meaning of the command. Many parameters that the robot needs in order to execute the commands have...
This paper proposes a novel approach for automatic estimation of four important traits of speakers, namely age, height, weight and smoking habit, from speech signals. In this method, each utterance is modeled using the i-vector framework which is based on the factor analysis on Gaussian Mixture Model (GMM) mean supervectors, and the Non-negative Factor Analysis (NFA) framework which is based on a...
In this paper, weakly supervised HMM learning is applied to modeling word acquisition towards human-computer interaction with little manual effort. The only imposed supervisory information is initializing the learning algorithms by two labeled data samples per pattern. Experiments on TIDIGITS show that our recently proposed algorithm, Baum-Welch learning regularized by non-negative Tucker decomposition,...