Traditional speech-based identity recognition commonly attends to a single aspect of the speech signal, but in reality speech signals carry semantics, speaker-dependent features, and more. This paper therefore presents a new study that simultaneously recognizes multidimensional speaker information. In order to extract sufficient relational features, both high-level and low-level features...
In this paper, a matrix form of Sym wavelet synthesis is derived that keeps the length of the coefficients no greater than the length of the original speech signal. We then propose an Adaptive Multiscale Compressed Sensing (AMCS) method, which adaptively designs the sensing matrix and the number of wavelet decomposition levels according to the sparsity of the wavelet coefficients at each level of the speech signals...
This paper presents a simple and effective steganography approach applied to 5.3 Kbps G.723.1 low-rate coded speech, based on analyzing the redundancy of coded parameters. An augmented identity matrix is used to reduce the modification to the cover speech and correspondingly enhance the imperceptibility of the mixed speech. The scheme has good transparency and low computational complexity, which is easy...
The adaptive multirate wideband (AMR-WB) speech codec and the variable-rate multimode wideband (VMR-WB) speech codec are two coding standards based on the CELP model for processing wideband input speech. When communication occurs between them, transcoding must be performed to translate the encoding format from one standard to the other. In this paper, an effective transcoding scheme is presented which makes...
Compressed sensing (CS) is an emerging signal acquisition theory that provides a universal approach for characterizing signals which are sparse or compressible on some basis at sub-Nyquist sampling rates. This paper focuses on the realization of CS on natural speech signals. We construct an over-complete data-driven dictionary as the sparse basis specialized for speech signals. Based on this, CS sampling...
This paper proposes a novel scheme for secure speech communication based on information hiding and Compressed Sensing (CS). The scheme first uses CS technology to compress the secret speech and reduce the information bit rate to be embedded, which is significantly different from state-of-the-art secret speech processing methods. The secret bit stream is then embedded into the cover speech based on SCS (Scalar...
In this paper, we present a new voice conversion method based on the state-space model (SSM). A modified version of the conventional SSM is first proposed to describe the relationship between the source speech and the target speech in the spectral domain. Then the expectation-maximization (EM) and variational Bayesian (VB) algorithms are individually employed to estimate the SSM parameters, resulting...
In this paper, we propose a new voice activity detection (VAD) algorithm to improve the robustness of speech detection in nonstationary noisy environments. At the front end, Wiener-filter speech enhancement is adopted to suppress noise in noisy speech. Then, at the back end, a voice activity detector based on mel filter-bank spectral entropy is presented to distinguish speech from noise. We have evaluated...
In this paper, we present an effective feature normalization algorithm to improve the robustness of automatic speech recognition systems. At the front end, minimum mean square error log-spectral amplitude estimation speech enhancement is adopted to suppress noise in noisy speech. Then, at the back end, histogram equalization feature normalization is used to deal with the residual mismatch between enhanced...
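As a rough illustration of the histogram equalization step named in the abstract above, the following minimal sketch maps one feature dimension onto a standard normal reference distribution using its empirical rank. The function name `histogram_equalize`, the rank-based CDF estimate, and the choice of a standard normal target are assumptions made for illustration only; the paper's actual procedure is not given in the abstract.

```python
from statistics import NormalDist


def histogram_equalize(values):
    """Map a list of feature values onto a standard normal distribution.

    Assumption: the empirical CDF is estimated from ranks, (rank + 0.5) / n,
    and the reference distribution is N(0, 1). Ties are broken by position.
    """
    n = len(values)
    # Indices of the values in ascending order.
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    equalized = [0.0] * n
    for rank, idx in enumerate(order):
        # Empirical CDF value strictly inside (0, 1), then inverse normal CDF.
        equalized[idx] = std_normal.inv_cdf((rank + 0.5) / n)
    return equalized


if __name__ == "__main__":
    clean = [0.2, 1.5, -0.7, 3.1, 0.9]
    # A hypothetical mismatch: the same features after a gain and an offset.
    distorted = [3.0 * x + 5.0 for x in clean]
    print(histogram_equalize(clean))
    print(histogram_equalize(distorted))
```

Because the mapping depends only on the rank order of the values, any monotonic distortion of the features (such as the gain and offset in the usage example) yields the same equalized output, which is the property that makes this kind of normalization useful against residual mismatch.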
This paper proposes a novel voice activity detection (VAD) algorithm to improve the robustness of speech detection in noisy environments. In the proposed algorithm, a two-stage mel-warped Wiener filter is introduced to improve the performance of the voice activity detector based on spectral entropy. Then an improved decision rule based on spectral entropy is derived. We have evaluated system performance under...
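The spectral entropy feature mentioned in the VAD abstracts above can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a naive DFT over plain frequency bins rather than a mel filter bank, omits the Wiener-filter front end, and the function names `power_spectrum` and `spectral_entropy` are hypothetical. It is not the decision rule proposed in either paper.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0 .. N/2) of a real-valued frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_entropy(frame, eps=1e-12):
    """Shannon entropy of the normalized power spectrum, divided by log2(bins).

    Values near 0 indicate energy concentrated in few bins (structured signal);
    values near 1 indicate a flat, noise-like spectrum.
    """
    power = power_spectrum(frame)
    total = sum(power) + eps  # eps guards against an all-zero frame
    probs = [p / total for p in power]
    entropy = -sum(p * math.log2(p + eps) for p in probs)
    return entropy / math.log2(len(probs))


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(64)]
    print("tone entropy: ", spectral_entropy(tone))
    print("noise entropy:", spectral_entropy(noise))
```

A detector built on this feature would typically compare the per-frame entropy against a threshold, treating low-entropy frames as speech-like; the threshold and any smoothing across frames are design choices not specified in the abstracts.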
One of the most recent models for voice conversion is the classical LPC analysis-synthesis model combined with GMM, which aims to separate the excitation and vocal-tract information and to learn the transformation rules with statistical methods. However, it does not work as well as expected, owing to the inaccuracy of the extracted feature information as well as the overly-smoothed spectral...
This paper presents an improved phase-space voicing-state classification method based on pitch detection to simultaneously determine the voicing state of two speakers present in a segment of co-channel speech. Three possible voicing states are considered: Unvoiced/Unvoiced (U/U), Voiced/Unvoiced (V/U), and Voiced/Voiced (V/V). First, the method employs a phase-space voicing-state classification algorithm...
Single-channel speaker separation attempts to extract the speech signal uttered by the speaker of interest from a one-channel signal containing a mixture of acoustic signals. Most current techniques fail to eliminate the interfering signal completely. In this paper, we present a new approach to solve this problem. It is an iterative separation approach based on sub-spectrum GMM...
This paper presents a novel voice morphing system which reproduces high-quality speech while maintaining the majority of the target characteristics. The name Bi-GMM reflects the use of the GMM technique both to estimate the mapping functions and to generate a codebook. Compared with the traditional GMM technique, a maximum likelihood estimation framework combined with a codebook compensation technique is...
A novel algorithm for voice conversion is proposed in this paper. The mapping function between the spectral vectors of the source and target speakers is calculated by canonical correlation analysis (CCA) estimation based on Gaussian mixture models. Since the spectral envelope feature retains most of the second-order statistical information contained in speech after linear prediction (LPC) analysis, the...
This paper proposes a scheme for a real-time secure communication system based on information hiding and speech recognition. The algorithm uses speech recognition to greatly reduce the bit rate of the secret speech. Then we design an information hiding algorithm by adaptively choosing embedding locations and adopting the multi-nary modulation technique. Experimental results show that this algorithm has good...