Pitch information is an important cue for speech separation. However, pitch estimation in noisy conditions is as challenging a task as speech separation itself. In this paper, we propose a supervised learning architecture which combines these two problems concisely. The proposed algorithm is based on a deep stacking network (DSN), which provides a method of stacking simple processing modules in building...
Model-based single-channel source separation (SCSS) is an ill-posed problem requiring source-specific prior knowledge. In this paper, we use representation learning and compare general stochastic networks (GSNs), Gauss Bernoulli restricted Boltzmann machines (GBRBMs), conditional Gauss Bernoulli restricted Boltzmann machines (CGBRBMs), and higher order contractive autoencoders (HCAEs) for modeling...
Despite recent advancements in digital signal processing technology for cochlear implant (CI) devices, there still remains a significant gap between speech identification performance of CI users in reverberation compared to that in anechoic quiet conditions. Alternatively, automatic speech recognition (ASR) systems have seen significant improvements in recent years resulting in robust speech recognition...
In this work, we consider enhancing target speech from a single-channel noisy observation corrupted by non-stationary noises at low signal-to-noise ratios (SNRs). We take a classification-based approach, where the objective is to estimate an Ideal Binary Mask (IBM) that classifies each time-frequency (T-F) unit of the noisy observation into one of the two categories: speech-dominant unit or noise-dominant...
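The binary labelling described in this abstract can be illustrated with a minimal sketch. It assumes the clean speech and noise powers are known for each T-F unit (which is what makes the mask "ideal") and uses a 0 dB local SNR threshold. The function name `ideal_binary_mask` and the parameter `lc_db` are hypothetical and not taken from the paper, whose own contribution is estimating this mask from the noisy observation alone.

```python
import math


def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Label each T-F unit 1 (speech-dominant) or 0 (noise-dominant).

    speech_power, noise_power: lists of rows holding per-unit power values.
    lc_db: assumed local SNR threshold in dB (illustrative default).
    """
    eps = 1e-12  # guards against log(0) and division by zero
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            local_snr_db = 10.0 * math.log10((s + eps) / (n + eps))
            row.append(1 if local_snr_db > lc_db else 0)
        mask.append(row)
    return mask


# Toy 2x2 example: rows are frequency bins, columns are time frames.
speech = [[4.0, 0.1], [1.0, 9.0]]
noise = [[1.0, 1.0], [2.0, 3.0]]
mask = ideal_binary_mask(speech, noise)
```

A classifier-based system would be trained to predict such labels, since the clean speech and noise powers are not available at test time.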
This paper presents an investigation into the detection and classification of drum sounds in polyphonic music and drum loops using non-negative matrix deconvolution (NMD) and the Itakura Saito divergence. The Itakura Saito divergence has recently been proposed as especially appropriate for decomposing audio spectra due to the fact that it is scale invariant, but it has not yet been widely adopted...
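The scale invariance mentioned in this abstract can be checked numerically. The sketch below uses the standard element-wise definition of the Itakura Saito divergence, d(x|y) = x/y - log(x/y) - 1, summed over spectral bins. The function name `itakura_saito` and the toy spectra are illustrative assumptions, not material from the paper.

```python
import math


def itakura_saito(x, y):
    """Sum of element-wise Itakura Saito divergences between two
    strictly positive spectra x and y of equal length."""
    total = 0.0
    for xi, yi in zip(x, y):
        ratio = xi / yi
        total += ratio - math.log(ratio) - 1.0
    return total


spectrum = [1.0, 2.0, 4.0]
model = [2.0, 2.0, 1.0]

d_original = itakura_saito(spectrum, model)
# Multiply both arguments by the same gain: the divergence depends only
# on the ratios x/y, so it should be unchanged.
gain = 1000.0
d_scaled = itakura_saito([gain * v for v in spectrum], [gain * v for v in model])
```

Because only the ratio x/y enters the formula, low-energy and high-energy spectral components contribute on an equal footing, which is the property the abstract refers to.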
We introduce an unsupervised optimization method for optimal fusion of multiple classifiers in retrieval problems. The method is based on a ranking loss called the “clarity” index, which does not depend on the label of the test instances. The technique optimizes the weights with which individual classifier scores must be combined to maximize this clarity. Our method is instance-specific; the weights...
In this paper, we present a birdsong-phrase segmentation and verification algorithm that is robust to limited training data, class variability, and noise. The algorithm comprises a noise-robust, Dynamic-Time-Warping (DTW)-based segmentation and a discriminative classifier for outlier rejection. The algorithm utilizes DTW and prominent (high energy) time-frequency regions of training spectrograms to...
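For readers unfamiliar with Dynamic Time Warping, the following is a minimal sketch of the textbook DTW recursion on one-dimensional sequences with an absolute-difference local cost. It is not the paper's noise-robust variant, which operates on spectrogram regions; the function name `dtw_distance` and the toy sequences are assumptions for illustration.

```python
def dtw_distance(a, b):
    """Classic DTW alignment cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


template = [1.0, 2.0, 3.0]
stretched = [1.0, 2.0, 2.0, 3.0]  # same shape, one sample repeated
shifted = [2.0, 3.0, 4.0]         # different values

d_stretched = dtw_distance(template, stretched)
d_shifted = dtw_distance(template, shifted)
```

The stretched copy aligns at zero cost because DTW absorbs timing differences, while the value-shifted sequence does not.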
The presence of the Lombard effect in speech has been shown to severely degrade the performance of speech systems, especially speaker recognition. Varying kinds of Lombard speech are produced by speakers under the influence of varying noise types [1]. This study proposes a high-accuracy classifier using deep neural networks for detecting various kinds of Lombard speech against neutral speech, independent...
Dropout and DropConnect can be viewed as regularization methods for deep neural network (DNN) training. In DNN acoustic modeling, the huge number of speech samples makes it expensive to sample the neuron mask (Dropout) or the weight mask (DropConnect) repetitively from a high dimensional distribution. In this paper we investigate the effect of Gaussian stochastic neurons on DNN acoustic modeling....
Non-negative matrix factorization (NMF) has been widely used for challenging single-channel audio source separation tasks. However, inference in NMF-based models relies on iterative inference methods, typically formulated as multiplicative updates. We propose “deep NMF”, a novel non-negative deep network architecture which results from unfolding the NMF iterations and untying its parameters. This...
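The idea of unfolding multiplicative updates into layers with untied parameters can be sketched as follows. This sketch assumes the Euclidean-cost multiplicative update for the activations of a single spectral frame; the paper may use a different divergence. All names (`nmf_layer`, `unfolded_inference`, `squared_error`) are hypothetical, and the per-layer bases are simply copies of one matrix here, whereas a trained deep NMF would learn them separately.

```python
def nmf_layer(v, w, h, eps=1e-12):
    """One multiplicative update of activations h for a frame v,
    given a non-negative basis w (list of rows, shape F x R)."""
    n_bins, n_bases = len(v), len(h)
    wh = [sum(w[f][r] * h[r] for r in range(n_bases)) for f in range(n_bins)]
    numer = [sum(w[f][r] * v[f] for f in range(n_bins)) for r in range(n_bases)]
    denom = [sum(w[f][r] * wh[f] for f in range(n_bins)) for r in range(n_bases)]
    return [h[r] * numer[r] / (denom[r] + eps) for r in range(n_bases)]


def unfolded_inference(v, layer_bases, h0):
    """Run one update per layer; each layer may hold its own basis."""
    h = list(h0)
    for w in layer_bases:
        h = nmf_layer(v, w, h)
    return h


def squared_error(v, w, h):
    """Euclidean reconstruction error between v and w applied to h."""
    return sum(
        (v[f] - sum(w[f][r] * h[r] for r in range(len(h)))) ** 2
        for f in range(len(v))
    )


basis = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frame = [2.0, 3.0, 5.0]       # equals basis applied to activations [2, 3]
h_init = [1.0, 1.0]
layers = [basis] * 20         # tied here; deep NMF would untie these

h_final = unfolded_inference(frame, layers, h_init)
err_before = squared_error(frame, basis, h_init)
err_after = squared_error(frame, basis, h_final)
```

With tied bases this reduces to ordinary iterative NMF inference; untying lets each layer's basis be trained discriminatively, which is the step the abstract describes.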
Stochastic optimization finds wide application in signal processing, online learning, and network problems, especially problems processing large-scale data. We propose an Incremental Constraint Averaging Projection Method (ICAPM) that is tailored to optimization problems involving a large number of constraints. The ICAPM makes fast updates by taking sample gradients and averaging over random constraint...
We provide a non-iterative channel impulse response (CIR) estimation algorithm for communication systems which utilize a periodically transmitted training sequence within a continuous stream of information symbols. The non-iterative channel estimate is an approximation to the Best Linear Unbiased Estimate (BLUE) of the CIR, achieving almost similar performance, with much lower complexity. We first...
Recently, a discrepancy in results has appeared in the literature concerning score fusion methods, which are classified into “combination methods” and “classification methods” [1]. Some works suggest that a simple Arithmetic Mean Rule (AMR) can outperform some training-based methods on multimodal data [2], while others favour, among other trained classifiers, a Support Vector Machine [3]. This paper makes a comparative...
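As an illustration of a training-free combination method, the sketch below averages per-modality scores after min-max normalization. The normalization step and the names `min_max_normalize` and `arithmetic_mean_rule` are assumptions made for this example; the abstract does not state how scores are normalized before the Arithmetic Mean Rule is applied.

```python
def min_max_normalize(scores):
    """Rescale one modality's scores to [0, 1] (assumed normalization)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)  # constant scores carry no ranking information
    return [(s - lo) / (hi - lo) for s in scores]


def arithmetic_mean_rule(score_lists):
    """Fuse scores by averaging the normalized scores of each candidate.

    score_lists: one list per modality, all scoring the same candidates
    in the same order.
    """
    normalized = [min_max_normalize(scores) for scores in score_lists]
    n_modalities = len(normalized)
    return [sum(column) / n_modalities for column in zip(*normalized)]


# Two hypothetical modalities scoring the same three candidates.
modality_a = [0.0, 5.0, 10.0]
modality_b = [1.0, 3.0, 2.0]
fused = arithmetic_mean_rule([modality_a, modality_b])
```

No parameters are learned here, which is what distinguishes this rule from trained fusion classifiers such as a Support Vector Machine.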
In this paper we describe a method for estimation of noise power spectral densities from a noisy speech signal. The method is used in conjunction with a time-frequency domain speech presence detection method that provides connected time-frequency regions of each decision type. In speech absence regions hidden Markov models are trained on-line and in speech presence regions the trained models are used...
In the distributed multisensory information fusion system, each local sensor independently forms local tracks, and multisensory track fusion refers to fusing multiple local tracks that represent the same target into one global track. By studying the theory of multisensory track fusion and signal sparse representation, a sparse representation based multisensory track fusion algorithm is proposed. This...
Model-based FDI systems are considered here. The problem of constructing the diagnosed system model as well as the automatic search for the best rule base of the residual analyzer is reduced to a set of global optimization tasks. Various optimization problems are considered depending on the chosen technology of the non-analytical model construction as well as that of the residual evaluation. Most...
This paper presents a method to estimate a Decision Feedback Equalizer (DFE) directly from training data, which is robust w.r.t. time-variations in the communication channel. It is based on the indirect method proposed in [15], where the time variations in the channel are modeled as a probabilistic uncertainty. The robust DFE optimizes the performance by minimizing the mean squared error averaged...
A technique is developed for utilizing models of the Human Visual System to improve the design of filters for the enhancement of color images. The technique uses an image fidelity measure based on models of the human visual system — such as the Visible Differences Predictor (VDP) — in a nested loop training algorithm. In the inner loop of the algorithm, a stack filter is trained under a Weighted Mean...
Anatomical structure labeling in echocardiogram images will assist cardiac disease diagnosis by providing a framework for doing geometrical statistics. General labeling algorithms often focus on stationary body structures and do not perform well in echocardiography due to cardiac motion, low signal to noise ratio, and structural deformation caused by diseases. In this paper, we propose a new method...
In this letter, we describe highly effective known-plaintext attacks against physical layer security schemes. We substantially reduce the amount of required known-plaintext symbols and lower the symbol error rate (SER) for the attacker. In particular, we analyze the security of orthogonal blinding schemes that disturb an eavesdropper's signal reception using artificial noise transmission. We improve...