Most speaker recognition systems use system features, which are mostly spectral in nature. Recently, there has been significant work on using source features for speaker recognition, viz., prosodies and pitch dynamics, glottal flow derivative, Linear Prediction (LP) residual and its phase, wavelet-domain representation of the LP residual, etc. In this paper, a new source-like...
This paper attempts to utilize the pitch-synchronous property of pseudo-periodic signals to increase the efficiency of compression, to minimize losses and thus to enhance the quality of the reconstruction. Results show a higher signal-to-noise ratio, a higher compression ratio and lower percentage distortion with the new method of 2-D compression as compared to 1-D compression. A new method is used for...
The Lines Of Maximum Amplitude (LOMA) of the wavelet transform are used for glottal closure instant detection. Following Kadambe et al. (1992), the wavelet transform modulus maxima can be used for singularity detection. The LOMA method extends this idea. All the lines chaining maxima of a wavelet transform across scales are built. Then a back-tracking procedure allows for selection of the optimal...
Speech enhancement is concerned with the processing of corrupted or noisy speech signals in order to improve the quality or intelligibility of the signal. Our goal is to enhance a speech signal corrupted by noise to obtain a clean signal with higher quality. However, the presence of noise in speech signals will contribute to a high degree of inaccuracy in a system that requires speech processing. This...
Low-frequency modulations of sound carry essential information for speech and music. They must be preserved for compression. The complex modulation spectrum has already been used for audio compression and is commonly obtained by spectral analysis of the sole temporal envelopes of the subbands out of a time/frequency analysis (modified discrete cosine transform combined with a modified discrete sine...
This paper considers some questions concerning the analysis of methods for the preliminary segmentation of speech signals, and of their features, for recognition tasks.
This paper proposes a robust and accurate multi-pitch estimation method for multiple voices. This method is based on the spectral analysis of the multi-scale product of the mixture sound. The multi-scale product (PM) is the product of wavelet transform coefficients. The wavelet used is the quadratic spline function. Simulation results showed that the proposed method can robustly estimate...
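The multi-scale product described in the abstract above can be illustrated with a minimal sketch. It assumes a crude smoothed-difference filter as a stand-in for the quadratic spline wavelet, and the function names and the scales (1, 2, 4) are hypothetical choices for illustration, not the authors' implementation.

```python
def smoothed_difference(signal, scale):
    """Wavelet-like detail: difference of two adjacent moving averages of width `scale`.

    This is an illustrative stand-in, not the quadratic spline wavelet.
    """
    n = len(signal)
    out = [0.0] * n
    for i in range(scale, n - scale + 1):
        right = sum(signal[i:i + scale]) / scale
        left = sum(signal[i - scale:i]) / scale
        out[i] = right - left
    return out


def multiscale_product(signal, scales=(1, 2, 4)):
    """Pointwise product of the detail coefficients computed at each scale."""
    product = [1.0] * len(signal)
    for scale in scales:
        detail = smoothed_difference(signal, scale)
        product = [p * d for p, d in zip(product, detail)]
    return product


# A step edge at index 8: the discontinuity persists across scales,
# so the product is concentrated at the edge.
step = [0.0] * 8 + [1.0] * 8
msp = multiscale_product(step)
peak_index = max(range(len(msp)), key=lambda i: abs(msp[i]))
```

Multiplying across scales reinforces features that appear at every scale (such as sharp transitions) and suppresses responses that appear at only one scale.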
Human hearing has non-linear characteristics, while the wavelet packet transform (WPT) offers flexible time-frequency analysis, which makes it well suited to simulating the human auditory model. In this paper, the human auditory model is analyzed, after which a new algorithm for speech enhancement using a node-threshold wavelet packet transform based on Bark-scaled decomposition is established,...
Most research on speech recognition concentrates on large-vocabulary corpora. In real-time robot control by speech commands, speech recognition usually does not need a very large vocabulary, but fast implementation and noise robustness are prerequisites. This study proposes a novel, fast, noise-robust wavelet-based Vietnamese speech recognition applied for robot control...
Low-frequency modulations of sound carry important information for speech and music. The modulation spectrum is commonly obtained by spectral analysis of the sole temporal envelopes of the sub-bands out of a time-frequency analysis. Processing in this domain usually creates undesirable distortions because only the magnitudes are taken into account and the phase data is often neglected. We remedy this...
Most current pitch detection algorithms do not work well in high-noise environments. For this reason, a pitch detection algorithm for noisy speech signals based on pre-filtering and weighted wavelet coefficients is proposed. Firstly, the noisy speech signals are pre-filtered. Secondly, the pre-filtered speech is decomposed by the quadratic spline wavelet. Thirdly, the wavelet coefficients...
In order to solve the problem of endpoint detection in the presence of multiple noises, this paper presents a robust algorithm for Chinese speech. Without any a priori information on noise statistics, this approach employs the autocorrelation of low-frequency coefficients in the wavelet transform to detect the endpoint of the voiced signal, and combines it with the power spectral density of noisy speech to determine the commence...
A phoneme recognition system using an anti-symmetric multi-stage filter bank structure is presented. In a filter bank the input signal is convolved with digital filters having different cut-off frequencies so that the signal is analysed at different frequencies with different resolutions. The percentage energy content in each signal decomposition level is calculated and used as input to the artificial...
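The percentage-energy feature mentioned above can be sketched as follows. This is a minimal illustration that assumes an orthonormal Haar decomposition in place of the anti-symmetric multi-stage filter bank, and a signal whose length is divisible by 2 raised to the number of levels; the function names are hypothetical.

```python
import math


def haar_step(x):
    """One level of an orthonormal Haar decomposition (stand-in filter pair)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def percentage_energy_per_level(signal, levels):
    """Percentage of total energy in each detail level, then in the final approximation.

    Assumes a non-zero signal whose length is divisible by 2**levels.
    """
    energies = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    total = sum(energies)
    return [100.0 * e / total for e in energies]


# A slow sine plus a weaker fast alternation, decomposed over three levels.
demo = [math.sin(2 * math.pi * i / 16) + 0.3 * (-1) ** i for i in range(32)]
features = percentage_energy_per_level(demo, levels=3)
```

The resulting list has one entry per detail level plus one for the final approximation, which is the kind of fixed-length vector that could be fed to a classifier.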
Voice conversion is a method used to transform one speaker's voice into another speaker's voice. A new modification approach for voice conversion is proposed in this paper. We take Mel-frequency Discrete Wavelet coefficients (MFDWC) as the basic feature. This feature copes well with small training sets of high dimension, which is a problem often encountered in voice conversion. The proposed...
One of the most important methods in digital signal processing is the speaker identification method (SIM). Because of the difficult nature of speech signals and their fast variation with time, the wavelet transform is used to reduce the complexity of such signals. In this paper two identification methods are presented, based on the Continuous Wavelet Transform (CWT). The first method...
In tasks related to the analysis and recognition of pathological speech it is often more important to provide the respective person (e.g. a physician) with guidelines for assessing the degree of deformation of the speech signal than to achieve very accurate automated recognition. By ear it is easy to judge whether the speech is regular or deformed, but any attempt at evaluating the degree of deformation is not...
Based on the dynamic characteristics of the speech signal, we propose a new method of spoken number recognition using the wavelet packet transform and K-L expansion. Firstly, the speech signals undergo a series of preprocessing steps, including pre-filtering, quantization, pre-emphasis and endpoint detection. Secondly, the wavelet packet transform is used to extract the relative energies in 32 sub-bands and the...
A method based on an auditory perception wavelet transform is proposed to model the speech process and extract features for cochlear implants. First, the original speech signal is decomposed by using an auditory perception wavelet transform. Second, a linear predictive coding method is used to extract the fundamental frequency and formant frequency in the perception channel. Experimental...
In this paper, we design an enhanced human-computer speech interface by wavelet transform. By using a new thresholding algorithm and shrink function, we improve the efficiency of the speech interface. This shrink function tries to decrease sharp time-frequency spectrogram discontinuities by attenuating the wavelet coefficients instead of setting them to zero. This attenuation will be done regarding...
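The difference between zeroing small wavelet coefficients and attenuating them can be sketched as below. The abstract does not give the shrink function, so `attenuating_shrink` and its `power` parameter are hypothetical choices made only to illustrate the idea; they are not the authors' function.

```python
def hard_threshold(coeff, threshold):
    """Classic hard thresholding: coefficients at or below the threshold become zero."""
    return coeff if abs(coeff) > threshold else 0.0


def attenuating_shrink(coeff, threshold, power=2):
    """Illustrative shrink: small coefficients are scaled down smoothly, not zeroed.

    Assumes threshold > 0. The factor (|c| / threshold) ** power reaches 1 at the
    threshold, so the mapping has no jump there.
    """
    if abs(coeff) >= threshold:
        return coeff
    return coeff * (abs(coeff) / threshold) ** power


coeffs = [2.0, 0.5, -0.5, 0.0]
hard = [hard_threshold(c, 1.0) for c in coeffs]
soft = [attenuating_shrink(c, 1.0) for c in coeffs]
```

With hard thresholding the output jumps from zero to the full coefficient value at the threshold; the attenuating variant avoids that jump, which is one way to reduce sharp discontinuities in the time-frequency representation.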
Segmenting speech signals on the basis of time-frequency analysis is the most natural approach. Boundaries are located in places where the energy of some frequency subband changes rapidly. A speech segmentation method based on the discrete wavelet transform, the resulting power spectrum and its derivatives is presented. This information makes it possible to locate the boundaries of phonemes. A statistical classification...
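The idea of placing boundaries where the energy of a band changes rapidly can be sketched as follows. This is a simplified illustration that works on the frame energies of a single band and uses a fixed threshold on the frame-to-frame difference; the function names, frame length and threshold are assumptions, and the wavelet decomposition and statistical classification of the abstract are not reproduced.

```python
def frame_energies(signal, frame_len):
    """Energy of consecutive non-overlapping frames of one (sub)band signal."""
    return [
        sum(x * x for x in signal[i:i + frame_len])
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def boundary_frames(energies, threshold):
    """Indices of frames whose energy differs from the previous frame by more than threshold."""
    return [
        i for i in range(1, len(energies))
        if abs(energies[i] - energies[i - 1]) > threshold
    ]


# Three synthetic segments with different amplitudes, 40 samples each.
band = [0.0] * 40 + [1.0] * 40 + [0.1] * 40
energies = frame_energies(band, frame_len=10)
boundaries = boundary_frames(energies, threshold=1.0)
```

Here the detected boundaries fall at the first frame of the second and third segments; a real system would combine several subbands and a more careful decision rule.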