Mixed images cannot be avoided in visual tracking since the transmitted scene may be captured with specular reflections. Since few previous methods tackle this important problem, this paper proposes a novel visual tracking method using Blind Source Separation (BSS) for mixed images. Based on the framework of particle filter with compensated motion model at the prediction stage for mobile cameras,...
This paper introduces a highly integrated system providing very accurate object detection with an RGB-D sensor. To address the shortage of training sets for real-world object detection, we present an online learning architecture that learns templates and detects objects in real time. The proposed novel concept skips the training phase required in previous recognition works,...
In this paper, we propose a parametric multichannel noise reduction algorithm utilizing temporal correlations in a noisy and reverberant environment. Under the reverberant condition, the received acoustic signal becomes highly correlated in the time domain and it makes successful noise reduction quite difficult. The proposed parametric noise reduction method takes account of interdependencies between...
In this work, we derive a distributed power control algorithm for energy-efficient uplink transmissions in interference-limited cellular networks, equipped with either multiple or shared relays. The proposed solution is derived by modeling the mobile terminals as utility-driven rational agents that engage in a noncooperative game, under minimum-rate constraints. The theoretical analysis of the game...
Voice-conversion (VC) techniques aim to transform utterances from a source speaker to sound as if a target speaker had produced them. For this reason, VC is generally ill-suited for accent-conversion (AC) purposes, where the goal is to capture the regional accent of the source while preserving the voice quality of the target. In this paper, we propose a modification of the conventional training process...
The point process model (PPM) for keyword search is a phonetic event-driven approach that provides a whole-word focused alternative to fast lattice matching techniques. Recent efforts in PPMs have been focused on improved model estimation techniques and efficient search algorithms, but past evaluations have been limited to searching relatively easy scripted corpora for simple unigram queries, preventing...
Among many speaker adaptation embodiments, Speaker Adaptive Training (SAT) has been successfully applied to a standard Hidden-Markov-Model (HMM) speech recognizer, whose state is associated with Gaussian Mixture Models (GMMs). On the other hand, recent studies on Speaker-Independent (SI) recognizer development have reported that a new type of HMM speech recognizer, which replaces GMMs with Deep Neural...
A method for speaker normalization in deep neural network (DNN) based discriminative feature estimation for automatic speech recognition (ASR) is presented. This method is applied in the context of a DNN configured for auto-encoder based low dimensional bottleneck (AE-BN) feature extraction where the derived features are used as input to a continuous Gaussian density hidden Markov model (HMM/GMM)...
The large number of parameters in deep neural networks (DNN) for automatic speech recognition (ASR) makes speaker adaptation very challenging. It also limits the use of speaker personalization due to the huge storage cost in large-scale deployments. In this paper we address DNN adaptation and personalization issues by presenting two methods based on the singular value decomposition (SVD). The first...
The use of a graph embedding framework is investigated as a regularization technique in the expectation-maximization (EM) algorithm applied to automatic speech recognition (ASR). The technique is motivated by the fact that graph embeddings of feature vectors have been shown to provide useful characterizations of the underlying manifolds on which these features lie. Incorporating intrinsic graphs...
The application of compressed sensing (CS) to MRI has the potential to significantly reduce scan time. However, the quality of reconstructed images will be degraded when the MR images have strong phase variations. In the present paper, we propose a new CS method that is easy to implement and robust to phase variations on MR images. When the signal trajectory in k-space is symmetrical with respect...
A two-stage speaker adaptation approach is proposed for the subspace Gaussian mixture model (SGMM) [1] in large vocabulary automatic speech recognition (ASR). The SGMM differs from the more well known continuous density hidden Markov model (CDHMM) in that a large portion of the SGMM parameters are dedicated to shared full covariance Gaussian subspace parameters and a relatively small number of parameters...
Recently an effective fast speaker adaptation method using discriminative speaker code (SC) has been proposed for the hybrid DNN-HMM models in speech recognition [1]. This adaptation method depends on a joint learning of a large generic adaptation neural network for all speakers as well as multiple small speaker codes using the standard back-propagation algorithm. In this paper, we propose an alternative...
State-of-the-art speaker recognition systems are based on the i-vector representation of speech segments. In this paper we show how this representation can be used to perform blind speaker adaptation of a hybrid DNN-HMM speech recognition system and we report excellent results on a French language audio transcription task. The implementation is very simple. An audio file is first diarized and each speaker...
Adaptation to speaker variations is an essential component of speech recognition systems. One common approach to adapting deep neural network (DNN) acoustic models is to perform global constrained maximum likelihood linear regression (CMLLR) at some point of the systems. Using CMLLR (or more generally, generative approaches) is advantageous especially in unsupervised adaptation scenarios with high...
Power system state estimation (PSSE) constitutes a crucial prerequisite for reliable operation of the power grid. A key challenge for accurate PSSE is the inherent nonlinearity of SCADA measurements in the system states. Recent proposals for static PSSE tackle this issue by exploiting hidden convexity structure and solving a semidefinite programming (SDP) relaxation. In this work, an online PSSE algorithm...
In this paper, a framework for dynamic high-dimensional hypothesis testing in wireless sensor networks is presented. The sensor nodes (SNs) collect and transmit to a fusion center (FC), in a distributed fashion, compressed measurements of a time-correlated hypothesis vector. The FC, based on the measurements collected, tracks the hypothesis vector, and feeds back minimal information about the uncertainty...
In this paper, we consider the 1-bit compressive sensing reconstruction problem in a scenario where the sparsity level of the signal is unknown and time-variant, and the binary measurements are contaminated with noise. We introduce a new reconstruction algorithm, which we refer to as Noise-Adaptive Restricted Step Shrinkage (NARSS). NARSS is superior in terms of performance, complexity and speed...
We consider physical-layer security of a wireless LAN where multiple receivers collude to eavesdrop on the information sent from the base station to the intended receiver. To enhance the physical-layer security, we design interference signals to combat the eavesdropping. Our design problems are solved using semidefinite relaxations, which can be numerically solved efficiently by the existing...
This paper addresses the problem of mitigating non-stationary diffusely scattered multipath interference or “hot clutter” by space-time adaptive processing (STAP) in radar systems that use a multi-channel receive antenna array. A computationally efficient time-varying (TV) fast-time STAP algorithm that can effectively cancel hot clutter during the coherent processing interval (CPI) while simultaneously...