This paper introduces an improved reranking method for Bag-of-Words (BoW) based image search. Building on [1], we propose a directed image graph that is robust to outlier distraction. In our approach, the relevance among images is encoded in the image graph, based on which the initial rank list is refined. Moreover, we show that rank-level feature fusion can be adopted in this reranking method as well...
Inspired by the recently proposed Magnetic Resonance Fingerprinting (MRF) technique, we develop a principled compressed sensing framework for quantitative MRI. The three key components are: a random pulse excitation sequence following the MRF technique; a random EPI subsampling strategy; and an iterative projection algorithm that imposes consistency with the Bloch equations. We show that, as long as the...
The mismatch between training and test speech corpora hinders the practical use of machine-learning-based voice activity detection (VAD). In this paper, we try to address this problem with unsupervised domain adaptation techniques, which try to find a shared feature subspace between the mismatched corpora. A denoising deep neural network is used as the learning machine. Three...
While bag-of-features (BOF) models have been widely applied to image retrieval problems, the resulting performance is typically limited because they disregard the spatial information of local image descriptors (and the associated visual words). In this paper, we present a novel spatial pooling scheme, called extended bag-of-features (EBOF), for solving the above task. Besides improving image...
We propose an object matching approach aimed at smartphone cameras that exploits the well-known concept of local sets of features for object representation. We also enable the temporal alignment of cameras by exploiting the frames of detected objects to group objects that appeared in the same time interval for the assignment within each camera. The proposed approach does not need training, thus making it...
In this paper, we analyze the convergence rate of the bi-alternating direction method of multipliers (BiADMM). Unlike ADMM, which optimizes an augmented Lagrangian function, BiADMM optimizes an augmented primal-dual Lagrangian function. The new function involves both the objective functions and their conjugates, thus incorporating more information about the objective functions than the augmented...
Statistical parametric speech synthesis (SPSS) using deep neural networks (DNNs) has shown its potential to produce natural-sounding synthesized speech. However, there are limitations in the current implementation of DNN-based acoustic modeling for speech synthesis, such as the unimodal nature of its objective function and its inability to predict variances. To address these limitations, this...
This paper presents a formal definition of stability for node centrality measures in networks and shows that the well-known betweenness centrality is not stable with respect to that metric. An alternative definition that preserves the same centrality notion while satisfying this stability criterion is then introduced. The practical implications of stability are explored by studying the behavior of...
This paper presents a study on complex cepstrum-based speech factorization for acoustic modeling in statistical parametric synthesizers. The factorization is conducted assuming that both vocal tract resonance and glottal flow effect are fully represented by the complex cepstrum. We investigated four different forms to represent the complex cepstrum in the acoustic models and compared their performances...
This paper examines two issues in a statistical speech synthesis approach based on Gaussian process (GP) regression. Although GP-based speech synthesis can give higher performance in generating spectral parameters than the HMM-based one, a number of issues still remain. In this paper, we incorporate a global variance (GV) feature into the parameter generation to overcome the over-smoothing problem. Furthermore,...
In this paper, we consider learning dictionary models over a network of agents, where each agent is only in charge of a portion of the dictionary elements. This formulation is relevant in big data scenarios where multiple large dictionary models may be spread over different spatial locations and it is not feasible to aggregate all dictionaries in one location due to communication and privacy considerations...
This article describes research on source detection in multichannel (3DTV) audio streams. The problem is extremely complex because multiple layers can be present in scenes (background music, ambience, commentator). In this work, a new algorithm is developed that exploits the information from the different audio channels to detect, and possibly localize and separate independent...
Fingerprint-based audio recognition systems must address concurrent objectives. Indeed, fingerprints must be both robust to distortions and discriminative, while their dimension must remain small to allow fast comparison. This paper proposes to restate these objectives as a penalized sparse representation problem. On top of this dictionary-based approach, we propose a structured sparsity model in the form...
Classification of environmental sounds is a fundamental procedure for a wide range of real-world applications. In this paper, we propose a novel acoustic feature extraction method for classifying environmental sounds. The proposed method is motivated by the local binary pattern (LBP) technique from image processing, and works on a spectrogram, which forms two-dimensional (time-frequency) data like...
Many current speech models used in recognition involve thousands of parameters, whereas the mechanisms of speech production are conceptually very simple. We present and evaluate a new continuous state probabilistic model (CS-HMM) for recovering dwell-transition and phoneme sequences from dynamic speech production features. We show that with very few parameters, these features can be tracked, and phoneme...
Transmission power variance constrained power allocation in single carrier multiuser (MU) single-input multiple-output (SIMO) systems with iterative frequency domain (FD) soft cancelation (SC) minimum mean squared error (MMSE) equalization is considered in this paper. It is known in the literature that peak to average power ratio (PAPR) at the transmitter can be decreased by reducing the variance...
In compressive sensing, wavelet space is widely used to generate sparse signal (image signal in particular) representations. In this work, we propose a novel approach of statistical context modeling to increase the level of sparsity of wavelet image representations. It is shown, contrary to a widely held assumption, that high-frequency wavelet coefficients have non-zero mean distributions if conditioned...
This paper presents a systematic approach to block processing with iterative correction filters for time-interleaved analog-to-digital converters (TI-ADCs). TI-ADCs consist of several channels and can significantly increase the achievable sampling rate, but suffer from mismatches among the channels. Iterative digital correction filters are a general approach to mitigate the impact of mismatches in...
Phase unwrapping is the problem of reconstructing a continuous phase function from its finite wrapped samples. Two-dimensional phase unwrapping in particular is key to estimating many kinds of crucial physical information, e.g., the surface topography measured by interferometric synthetic aperture radar. However, almost all two-dimensional phase unwrapping algorithms suffer from either...
We are motivated by applications, such as problems that arise in online marketing, where the observations are governed by non-homogeneous Poisson models. We analyze the performance of a Maximum Likelihood (ML) decoder. We prove consistency and show an exponential rate of convergence for sparse recovery in the high-dimensional Poisson setting. After verifying the efficiency of the ML estimator...