In this paper, we present a birdsong-phrase segmentation and verification algorithm that is robust to limited training data, class variability, and noise. The algorithm comprises a noise-robust, Dynamic-Time-Warping (DTW)-based segmentation and a discriminative classifier for outlier rejection. The algorithm utilizes DTW and prominent (high energy) time-frequency regions of training spectrograms to...
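The abstract above builds on Dynamic Time Warping. As a rough illustration only, the textbook DTW recursion on two 1-D sequences might look like the sketch below. It is not the paper's noise-robust variant, which works on time-frequency regions of spectrograms; the function name `dtw_distance` and the absolute-difference local cost are assumptions made for this example.

```python
def dtw_distance(a, b):
    """Textbook DTW cost between two 1-D sequences.

    Assumption: the local cost is the absolute difference between samples.
    This is a generic sketch, not the segmentation algorithm from the paper.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its sample
                cost[i][j - 1],      # b advances, a repeats its sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


if __name__ == "__main__":
    template = [1, 2, 3]
    stretched = [1, 2, 2, 3]  # same shape, slower in the middle
    print(dtw_distance(template, stretched))
```

Because the warping path may repeat samples, a time-stretched copy of a template aligns at zero cost, which is the property that makes DTW attractive for matching phrases of varying duration.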
The presence of the Lombard Effect in speech has been shown to severely degrade the performance of speech systems, especially speaker recognition. Varying kinds of Lombard speech are produced by speakers under the influence of varying noise types [1]. This study proposes a high-accuracy classifier using deep neural networks for detecting various kinds of Lombard speech against neutral speech, independent...
Dropout and DropConnect can be viewed as regularization methods for deep neural network (DNN) training. In DNN acoustic modeling, the huge number of speech samples makes it expensive to sample the neuron mask (Dropout) or the weight mask (DropConnect) repetitively from a high dimensional distribution. In this paper we investigate the effect of Gaussian stochastic neurons on DNN acoustic modeling....
Non-negative matrix factorization (NMF) has been widely used for challenging single-channel audio source separation tasks. However, inference in NMF-based models relies on iterative inference methods, typically formulated as multiplicative updates. We propose “deep NMF”, a novel non-negative deep network architecture which results from unfolding the NMF iterations and untying its parameters. This...
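For readers unfamiliar with the multiplicative updates mentioned in the abstract above, the following is a minimal pure-Python sketch of the classic updates for the Euclidean NMF objective. It is not the paper's "deep NMF"; the helper names (`matmul`, `transpose`, `nmf_multiplicative`, `frobenius_error`), the random initialisation, and the small constant `eps` are all assumptions made for this illustration.

```python
import random


def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf_multiplicative(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative V (n x m) as W (n x rank) times H (rank x m).

    Uses the classic multiplicative updates for the squared Euclidean error.
    eps (an assumption of this sketch) guards against division by zero.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return W, H


def frobenius_error(V, W, H):
    WH = matmul(W, H)
    return sum((v - a) ** 2 for rv, ra in zip(V, WH) for v, a in zip(rv, ra)) ** 0.5


if __name__ == "__main__":
    V = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # exactly rank 1
    W, H = nmf_multiplicative(V, rank=1)
    print(frobenius_error(V, W, H))
```

Each pass through the loop is one fixed update step. Unfolding, as the abstract describes it, treats a finite number of such steps as network layers with untied parameters; the sketch keeps the parameters tied and does not attempt that.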
Stochastic optimization finds wide application in signal processing, online learning, and network problems, especially problems processing large-scale data. We propose an Incremental Constraint Averaging Projection Method (ICAPM) that is tailored to optimization problems involving a large number of constraints. The ICAPM makes fast updates by taking sample gradients and averaging over random constraint...
We provide a non-iterative channel impulse response (CIR) estimation algorithm for communication systems which utilize a periodically transmitted training sequence within a continuous stream of information symbols. The non-iterative channel estimate is an approximation to the Best Linear Unbiased Estimate (BLUE) of the CIR, achieving nearly the same performance at much lower complexity. We first...
Recently, a discrepancy in results has appeared in the literature concerning score fusion methods, which are classified into “combination methods” and “classification methods” [1]. Some works suggest that a simple Arithmetic Mean Rule (AMR) can outperform some training-based methods on multimodal data [2], while others favour, among other trained classifiers, a Support Vector Machine [3]. This paper makes a comparative...
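To make the Arithmetic Mean Rule concrete, here is a minimal sketch of score-level fusion by averaging. It assumes that the per-modality scores have already been normalised to a common range; the function names `arithmetic_mean_fusion` and `accept` and the threshold value are hypothetical and are not taken from the paper.

```python
def arithmetic_mean_fusion(scores_per_modality):
    """Fuse match scores by averaging across modalities.

    scores_per_modality: list of equal-length lists, one list per modality,
    where entry i of each list is that modality's score for sample i.
    Assumption: scores are already normalised to a common range.
    """
    n_modalities = len(scores_per_modality)
    n_samples = len(scores_per_modality[0])
    if any(len(s) != n_samples for s in scores_per_modality):
        raise ValueError("every modality must score the same samples")
    return [
        sum(modality[i] for modality in scores_per_modality) / n_modalities
        for i in range(n_samples)
    ]


def accept(fused_scores, threshold=0.5):
    """Hypothetical decision step: accept when the fused score reaches the threshold."""
    return [score >= threshold for score in fused_scores]


if __name__ == "__main__":
    face = [0.2, 0.8]   # illustrative scores, not data from the paper
    voice = [0.4, 0.6]
    fused = arithmetic_mean_fusion([face, voice])
    print(fused, accept(fused))
```

The rule needs no training data, which is what distinguishes it from the trained classifiers, such as a Support Vector Machine, that the abstract compares it against.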
In this paper we describe a method for estimation of noise power spectral densities from a noisy speech signal. The method is used in conjunction with a time-frequency domain speech presence detection method that provides connected time-frequency regions of each decision type. In speech absence regions hidden Markov models are trained on-line and in speech presence regions the trained models are used...
In the distributed multisensory information fusion system, each local sensor independently forms local tracks, and multisensory track fusion refers to fusing multiple local tracks that represent the same target into one global track. By studying the theory of multisensory track fusion and signal sparse representation, a sparse representation based multisensory track fusion algorithm is proposed. This...
Model-based FDI systems are considered here. The problem of constructing the diagnosed system model as well as the automatic search for the best rule base of the residual analyzer is reduced to a set of global optimization tasks. Various optimization problems are considered depending on the chosen technology of the non-analytical model construction as well as that of the residual evaluation. Most...
This paper presents a method to estimate a Decision Feedback Equalizer (DFE) directly from training data, which is robust w.r.t. time-variations in the communication channel. It is based on the indirect method proposed in [15], where the time variations in the channel are modeled as a probabilistic uncertainty. The robust DFE optimizes the performance by minimizing the mean squared error averaged...
A technique is developed for utilizing models of the Human Visual System to improve the design of filters for the enhancement of color images. The technique uses an image fidelity measure based on models of the human visual system — such as the Visible Differences Predictor (VDP) — in a nested loop training algorithm. In the inner loop of the algorithm, a stack filter is trained under a Weighted Mean...
Anatomical structure labeling in echocardiogram images will assist cardiac disease diagnosis by providing a framework for doing geometrical statistics. General labeling algorithms often focus on stationary body structures and do not perform well in echocardiography due to cardiac motion, low signal to noise ratio, and structural deformation caused by diseases. In this paper, we propose a new method...
The estimation of the power spectrum of discrete-time signals is one of the most fundamental and useful tools in signal processing. However, there are practical situations where one needs to look beyond the power spectrum, especially to extract information regarding the phase relations and deviations from Gaussianity. This has created considerable interest in the use of higher order spectra such as...
We propose a novel method for improved object detection in a video. Our approach adapts a generic offline trained detector (OTD) to a specific test video by collecting online samples in an unsupervised manner. Most of the existing adaptation methods focus on collecting confident online samples and do not address how to deal with ambiguous and noisy online samples. We address the importance of collecting...
In this paper, we propose a feature-based approach to address the challenging task of recognising overlapping sound events from single channel audio. Our approach is based on our previous work on Local Spectrogram Features (LSFs), where we combined a local spectral representation of the spectrogram with the Generalised Hough Transform (GHT) voting system for recognition. Here we propose to take the...
In this paper, we consider a two-way relay system where two terminals exchange their information via an amplify-and-forward relay in a bi-directional manner. Due to the two-way relay protocol, signals from both terminals travel through different cascaded channels, and this makes synchronization and channel estimation much more complicated than those in conventional one-way relay systems. To cope with...
In this study, we investigate the effect of blind spatial subtraction arrays (BSSA) on speech recognition systems by comparing the performance of a method using Mel-Frequency Cepstral Coefficients (MFCCs) with a method using Deep Bottleneck Features (DBNF) based on Deep Neural Networks (DNN). Performance is evaluated under various conditions, including noisy, in-vehicle conditions. Although performance...
The extreme learning machine (ELM), an emerging technology, has shown good performance in regression applications as well as in large dataset classification applications. It has been broadly embedded in many applications due to its fast computation and accuracy. How to make good use of machine learning techniques in Indoor Positioning Systems (IPS) has been a hot research topic in recent years...
This paper presents a signal processing technique for segmenting short speech utterances into unvoiced and voiced sections and identifying points where the spectrum becomes steady. The segmentation process is part of a system for deriving musculoskeletal articulation data from disordered utterances, in order to provide training feedback for people with speech articulation problems. The approach implement...