Current model-based speech analysis tends to be incomplete: only some of the parameters of interest (e.g. only the pitch or the vocal tract) are modeled, while the rest, which may be equally important, are disregarded. The drawback is that, without joint modeling of correlated parameters, the analysis of speech parameters may be inaccurate or even incorrect. With this motivation, we have proposed...
The HMM-based speech synthesis system (HTS) often generates buzzy and muffled speech. Such degradation of voice quality makes synthetic speech sound robotic rather than natural. We therefore suppose that synthetic speech lies in a speaker space different from that of the original speech. We propose to use a voice conversion method to transform synthetic speech toward the original so as to improve its...
In this paper, a simple method for pitch-scale modification of speech is presented, based on a recently suggested model for AM-FM decomposition of speech signals. This model is referred to as the adaptive Harmonic Model (aHM). The aHM models speech as a sum of harmonically related sinusoids that can adapt to the local characteristics of the signal. It was shown that this model provides high quality...
The paper presents a solution for singing voice processing that is used in a karaoke application with automated voice correction. The application is intended to automatically bring the user's performance closer to that of a professional singer by applying voice effects such as pitch correction, artificial polyphony, and time stretching, among others. The proposed framework incorporates...
We consider in this paper multiuser downlink beamforming with interference cancellation (BFIC). In our BFIC problem, the total transmitted power of the base station (BS) is minimized under signal-to-interference-plus-noise ratio (SINR) requirements of the mobile stations (MSs) and single-stage interference cancellation (SSIC) is adopted at the MSs. The challenge of the problem lies in its combinatorial...
We present a system for detecting lexical stress in English words spoken by English learners. The system uses both spectral and segmental features to detect three levels of stress for each syllable in a word. The segmental features are computed on the vowels and include normalized energy, pitch, spectral tilt and duration measurements. The spectral features are computed at the frame level and are...
This paper presents a system that allows users to customize an audio signal of polyphonic music (input), without using musical scores, by replacing the frequency characteristics of harmonic sounds and the timbres of drum sounds with those of another audio signal of polyphonic music (reference). To develop the system, we first use a method that can separate the amplitude spectra of the input and reference...
The kernel least mean squares (KLMS) algorithm is a computationally efficient nonlinear adaptive filtering method that “kernelizes” the celebrated (linear) least mean squares algorithm. We demonstrate that the least mean squares algorithm is closely related to Kalman filtering, and thus the KLMS can be interpreted as an approximate Bayesian filtering method. This allows us to systematically develop...
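As a rough illustration of what "kernelizing" the least mean squares update means, the sketch below implements a textbook-style KLMS with a Gaussian kernel. It is not taken from the paper: the function names `gaussian_kernel` and `klms`, the step size, and the kernel width are all assumptions chosen for the example, and the Bayesian interpretation discussed in the abstract is not covered.

```python
import numpy as np

def gaussian_kernel(x, centers, width):
    """Gaussian kernel between one input x and every stored center."""
    d2 = np.sum((centers - x) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * width ** 2))

def klms(inputs, targets, step_size=0.5, width=0.3):
    """Textbook-style KLMS sketch (hypothetical helper, not the paper's code).

    Every input becomes a kernel center; its coefficient is the step size
    times the prediction error made just before the input was added.
    """
    centers, coeffs = [], []
    errors = np.empty(len(inputs))
    for n, (x, d) in enumerate(zip(inputs, targets)):
        if centers:
            k = gaussian_kernel(x, np.asarray(centers), width)
            y = float(np.dot(coeffs, k))  # current nonlinear prediction
        else:
            y = 0.0  # no centers yet, so the filter predicts zero
        e = d - y
        errors[n] = e
        centers.append(x)               # grow the kernel expansion
        coeffs.append(step_size * e)    # LMS-style coefficient update
    return np.asarray(centers), np.asarray(coeffs), errors

# Usage: learn d = sin(3x) from noise-free samples.
rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, size=(400, 1))
d_train = np.sin(3.0 * x_train[:, 0])
centers, coeffs, errors = klms(x_train, d_train)
```

The expansion grows by one center per sample, which is why practical variants usually add some form of sparsification; that step is left out here to keep the sketch short.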
We propose a novel recovery algorithm for signals with complex, irregular structure that is commonly represented by graphs. Our approach is a generalization of the signal inpainting technique from classical signal processing. We formulate corresponding minimization problems and demonstrate that in many cases they have closed-form solutions. We discuss a relation of the proposed approach to regression,...
The fixed-point implementation of IIR digital filters usually leads to the appearance of zero-input limit cycles, which degrade the performance of the system. In this paper, we develop an efficient Monte Carlo algorithm to detect and characterize limit cycles in fixed-point IIR digital filters. The proposed approach considers filters formulated in the state space and is valid for any fixed-point representation...
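To make the notion of a zero-input limit cycle concrete, the following sketch runs a naive Monte Carlo search on a state-space filter whose state is rounded to integers after every update. It is only an illustration under simplifying assumptions (integer quantization by rounding, a hypothetical function name `find_limit_cycles`, arbitrary trial counts) and should not be read as the algorithm proposed in the paper.

```python
import numpy as np

def find_limit_cycles(A, num_trials=200, max_steps=1000, amplitude=50, seed=0):
    """Naive Monte Carlo search for zero-input limit cycles (illustrative only).

    The quantized recursion is x[n+1] = round(A @ x[n]) with integer states.
    Random initial states are iterated until a state repeats; the repeating
    part of the trajectory is kept if it is not the all-zero state.
    """
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    cycles = set()
    for _ in range(num_trials):
        start = rng.integers(-amplitude, amplitude + 1, size=A.shape[0])
        state = tuple(int(v) for v in start)
        seen, trajectory = {}, []
        for step in range(max_steps):
            if state in seen:
                cycle = trajectory[seen[state]:]
                if any(any(s) for s in cycle):       # skip the zero state
                    first = cycle.index(min(cycle))  # canonical rotation
                    cycles.add(tuple(cycle[first:] + cycle[:first]))
                break
            seen[state] = step
            trajectory.append(state)
            # Quantize the ideal (infinite-precision) update to integers.
            state = tuple(int(v) for v in np.rint(A @ np.array(state)))
    return cycles

# A pole at 0.9 leaves the rounded state stuck at small non-zero values,
# while a pole at 0.4 always decays to zero.
cycles_slow = find_limit_cycles([[0.9]])
cycles_fast = find_limit_cycles([[0.4]])
```

Each returned cycle is a tuple of state tuples, rotated so that equivalent cycles found from different starting points are counted once.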
This paper deals with the problem of estimating expectations of sums of additive functionals under the joint smoothing distribution in general hidden Markov models. Computing such expectations is a key ingredient in any kind of expectation-maximization-based parameter inference in models of this sort. The paper presents a computationally efficient algorithm for online estimation of these expectations...
We consider the problem of sparse modeling of a signal consisting of an unknown number of exponentially decaying sinusoids. Since such signals are not sparse in an oversampled Fourier matrix, earlier approaches typically exploit large dictionary matrices that include not only a finely spaced frequency grid but also a grid over the considered damping factors. The resulting dictionary is often very...
We present practical, experimental results for a system, driven by a particle filter, that dynamically steers a space surveillance sensor to track and search for resident space objects. In contrast to traditional Kalman-filter-based trackers, this system can exploit scheduled observations where the target is not found within the field of view. Furthermore, real-time observation-evaluation enables...
In this paper we study parameter estimation for α-stable distributions. The proposed approach uses a Poisson series representation (PSR) for skewed α-stable random variables, which provides a conditionally Gaussian framework. Therefore, a straightforward implementation of Bayesian parameter estimation using Markov chain Monte Carlo (MCMC) methods is feasible. To extend the series representation...
In this paper, we extend the multiple model track-before-detect method to track all possible target combinations at low signal-to-noise ratios. Given a maximum number of targets, the method estimates the posterior probability density function of the multitarget state vector, the corresponding target existence probabilities, and the probabilities of all possible target combinations. As the particle...
Recently, Markopoulos et al. [1], [2] presented an optimal algorithm that computes the L1 maximum-projection principal component of any set of N real-valued data vectors of dimension D with complexity polynomial in N, O(N^D). Still, moderate to high values of the data dimension D and/or data record size N may render the optimal algorithm unsuitable for practical implementation due to its exponential...
In this paper we introduce a new multiple particle filtering approach for problems where the state space of the system is of high dimension. We propose to break the space into subspaces and to perform separate particle filtering in each of them. The two critical operations of particle filtering, the particle propagation and the weight computation of each particle filter, are performed wherever necessary...
In this paper we propose an autoencoder-based method for the unsupervised identification of subword units. We experiment with different types and architectures of autoencoders to assess which autoencoder properties are most important for this task. We first show that the encoded representation of speech produced by standard autoencoders is more effective than Gaussian posteriorgrams in a spoken query...
We consider a distributed detection system under communication constraints, where several peripheral nodes observe a common phenomenon and send their observations to a fusion center via error-free but rate-constrained channels. Using the minimum expected error probability as a design criterion, we propose a cyclic procedure for the design of the peripheral nodes using the person-by-person methodology...
We explore the use of maxout neurons in various aspects of acoustic modelling for large vocabulary speech recognition systems, including low-resource scenarios and multilingual knowledge transfer. Through experiments on voice search and short message dictation datasets, we found that maxout networks are around three times faster to train and offer lower or comparable word error rates on several...
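For readers unfamiliar with the maxout neuron mentioned above, a minimal sketch of the standard definition follows: each unit outputs the maximum over a small group of affine projections of its input. The function name `maxout` and the array layout are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def maxout(x, weights, biases):
    """Forward pass of a maxout layer (illustrative sketch).

    x:       input vector, shape (input_dim,)
    weights: shape (num_units, pieces, input_dim)
    biases:  shape (num_units, pieces)
    Each unit returns the largest of its `pieces` affine responses.
    """
    z = np.einsum("upd,d->up", weights, x) + biases
    return z.max(axis=1)

# One unit with two pieces, w = +1 and w = -1, reproduces |x|.
W = np.array([[[1.0], [-1.0]]])
b = np.zeros((1, 2))
out_neg = maxout(np.array([-2.0]), W, b)
out_pos = maxout(np.array([3.0]), W, b)
```

Because the unit is a maximum of linear pieces, it can represent convex piecewise-linear activations such as the absolute value shown in the usage lines.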