This paper proposes a novel approach for automatic estimation of four important traits of speakers, namely age, height, weight and smoking habit, from speech signals. In this method, each utterance is modeled using the i-vector framework, which is based on factor analysis of Gaussian Mixture Model (GMM) mean supervectors, and the Non-negative Factor Analysis (NFA) framework, which is based on a...
Blind source separation (BSS) of underdetermined mixtures has attracted considerable attention in the signal processing community, even though separating the underlying sources is very difficult. The difficulty arises because a large number of source signals mix in time and frequency and propagate through the air to one or more sensors. The objective in BSS is to identify...
Voice disorders are non-trivial when it comes to their early detection. Symptoms range from slight hoarseness to complete loss of voice, and may seriously impact personal and professional life. To date, we are still largely missing reliable data to help us better understand and screen voice pathologies. In this paper, we present an ambulatory voice monitoring system using surface electromyography...
Speech recognition is an important problem in the pattern recognition research field. In this paper, the kernel ridge regression method is applied to the MFCC feature vectors of the speech dataset available from the IC Design lab at the Faculty of Electricals-Electronics Engineering, University of Technology, Ho Chi Minh City. Experimental results show that the kernel ridge regression method outperforms...
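As a rough illustration of the technique named in the abstract above, kernel ridge regression can be used as a classifier by regressing one-hot class targets and taking the largest output. The sketch below is an assumption-laden toy, not the paper's system: the RBF kernel, the values of `gamma` and `lam`, the function names, and the synthetic 13-dimensional vectors standing in for MFCC features are all hypothetical choices made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def krr_fit(x, labels, num_classes, gamma, lam):
    """Solve (K + lam * I) alpha = Y, where Y holds one-hot class targets."""
    y = np.eye(num_classes)[labels]
    k = rbf_kernel(x, x, gamma)
    return np.linalg.solve(k + lam * np.eye(len(x)), y)

def krr_predict(x_train, alpha, x_new, gamma):
    """Predict the class whose regression output is largest."""
    return (rbf_kernel(x_new, x_train, gamma) @ alpha).argmax(axis=1)

# Synthetic stand-in for MFCC vectors (assumption: 3 classes, 13 dimensions).
rng = np.random.default_rng(0)
centers = 4.0 * np.eye(3, 13)
labels_train = np.repeat(np.arange(3), 30)
labels_test = np.repeat(np.arange(3), 10)
x_train = centers[labels_train] + 0.3 * rng.standard_normal((90, 13))
x_test = centers[labels_test] + 0.3 * rng.standard_normal((30, 13))

alpha = krr_fit(x_train, labels_train, num_classes=3, gamma=0.05, lam=1e-2)
predicted = krr_predict(x_train, alpha, x_test, gamma=0.05)
accuracy = float((predicted == labels_test).mean())
print(f"toy accuracy: {accuracy:.2f}")
```

Solving the regularized linear system directly keeps the example short; for real MFCC data the kernel and its parameters would need to be chosen by validation.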
In this paper, we propose a simple and fast method for evaluating pathological (esophageal) voice by applying continuous speech recognition in speaker-dependent mode to our own database of pathological voice, which we call FPSD (French Pathological Speech Database). The recognition system is implemented on the HTK platform and is based on HMM/GMM monophone models. The acoustic vectors are...
Emotional speech can be synthesized by converting the prosodic and spectral features of neutral speech. This paper proposes a multi-level prosody conversion method that converts three prosodic features (F0, short-time energy and speaking rate) sequentially at the syllable, prosodic word and sentence levels. F0 and speaking rate are modeled by Gaussians, and energy is modeled by a Gamma distribution...
This paper describes a novel approach to constructing a mapping function between a given speaker pair using matrix-variate probability density functions (PDFs). In voice conversion studies, two important functions should be realized: 1) precise modeling of both the source and target feature spaces, and 2) construction of a proper transform function between these spaces. Voice conversion based on Gaussian...
Deep neural networks (DNNs), which can model hierarchical and complex relationships between the input and output layers, have recently been applied to speech synthesis. However, it remains unclear why DNNs outperform traditional HMM-based synthesis. This paper describes several implementation details of a DNN-based speech synthesis system and compares different influencing factors, e.g., F0 modeling method...
A scheme is proposed for the feature-level fusion of two behavioral biometrics, speech and signature, using the weighted-sum fusion method. Feature reduction is performed using a modified feature selection algorithm based on pollination-based optimization, which has not previously been applied to this problem. The modified algorithm is applied to the fusion method to search the feature space for optimal and...
In this work we focus on Emarati speaker identification systems in neutral talking environments based on each of Vector Quantization (VQ), Gaussian Mixture Models (GMMs), and Hidden Markov Models (HMMs) as classifiers. These systems have been tested on our collected Emarati speech database which is composed of 25 male and 25 female Emarati speakers using Mel-Frequency Cepstral Coefficients (MFCCs)...
Ad hoc wireless acoustic sensor networks (WASNs) hold great potential for improved performance in speech processing applications, thanks to better coverage and higher diversity of the received signals. We consider a multiple-speaker scenario where each of the WASN nodes, an autonomous system with sensing, processing and communication capabilities, is positioned in the near-field of one of...
This contribution describes a step-wise source counting algorithm to determine the number of speakers in an offline scenario. Each speaker is identified by a variational expectation maximization (VEM) algorithm for complex Watson mixture models and therefore directly yields beamforming vectors for a subsequent speech separation process. An observation selection criterion is proposed which improves...
The widely linear model has recently been used in signal processing applications because it can achieve better performance than conventional linear filtering for non-circular complex random variables (CRVs) and improper quaternion random variables (QRVs). In this paper, we study the time-domain widely linear quaternion model based minimum variance distortionless response beamformer (WL-QMVDR)...
Spherical arrays facilitate processing and analysis of sound fields with the potential for high resolution in three dimensions in the spherical harmonic domain. Using the captured sound field, robust source localisation systems are required for speech acquisition, speaker tracking and environment mapping. Source localisation becomes a challenging problem in reverberant environments and under noisy...
Reverberation of speech signals in acoustic environments causes problems such as reduced speech intelligibility, difficulty in distinguishing speakers and locating sources, and degraded quality in hands-free telephony, hearing aids, etc. Adaptive filters can be applied to reduce the reverberation effects, that is, to dereverberate the speech signals received at the microphone. In this paper a dereverberation method is proposed by applying...
Speaker identification attempts to determine the best possible match for a given input speech signal from a group of known speakers. A text-independent speaker identification system identifies the person speaking regardless of what is said. The first step in speaker identification is the extraction of features. In this proposed method, the Bessel features are used as an alternative...
A voice conversion system modifies the speaker-specific features of the source speaker so that the speech sounds like that of a target speaker. The voice individuality of the speech signal is characterized at various levels, such as the shape of the glottal excitation, the shape of the vocal tract and the long-term prosodic features. In this work, Line Spectral Frequencies (LSF) are used to represent the shape of...
An acoustic echo canceler (AEC) is often employed to remove the acoustic echoes generated in hands-free communication systems. The AEC cancels the acoustic echoes by approximating the echo-path with the use of an adaptive filter and subtracting the pseudo echoes generated by the filter from the observed signal. The conventional adaptive algorithm for updating the filter, however, fails in estimation...
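The principle described in the abstract above (approximate the echo path with an adaptive filter, then subtract the resulting pseudo echo from the observed signal) can be sketched with a normalized LMS (NLMS) update. NLMS is assumed here purely as a common textbook choice; the abstract does not say which algorithm the paper uses or proposes. The function name, filter length, step size, and the synthetic four-tap echo path are all hypothetical.

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, num_taps=32, step=0.5, eps=1e-8):
    """Return the echo-reduced signal and the final echo-path estimate."""
    w = np.zeros(num_taps)        # adaptive filter: estimated echo path
    x_buf = np.zeros(num_taps)    # recent far-end samples, newest first
    err = np.zeros(len(mic))
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        pseudo_echo = w @ x_buf               # filter output
        err[n] = mic[n] - pseudo_echo         # observed signal minus pseudo echo
        # Normalized LMS update; eps guards against division by zero.
        w += step * err[n] * x_buf / (x_buf @ x_buf + eps)
    return err, w

# Synthetic, noise-free single-talk test (assumption: white far-end signal).
rng = np.random.default_rng(0)
far_end = rng.standard_normal(4000)
echo_path = np.array([0.6, -0.3, 0.1, 0.05])   # hypothetical room response
mic = np.convolve(far_end, echo_path)[: len(far_end)]

residual, w_hat = nlms_echo_canceller(far_end, mic)
print("estimated leading taps:", np.round(w_hat[:4], 3))
```

This toy case has no near-end speech and no noise, so the filter converges cleanly; double-talk and noisy conditions are exactly where such a basic update is known to struggle.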
In this paper, a new structure for acoustic echo cancellation is presented. The role of an acoustic echo canceller (AEC) is to remove undesirable acoustic echoes in communication systems. However, in the double-talk case the performance of the AEC is degraded, so a double-talk detector (DTD) must be used to control the AEC. A new structure for AEC using an auxiliary adaptive filter is proposed in...
In telephony applications, artificial bandwidth extension (ABE) can be applied to narrowband (NB) calls to enhance speech quality and intelligibility. However, high-band extension is challenging due to insufficient mutual information between the lower and upper frequency bands in speech. The consequence is estimation errors, particularly for the fricatives /s, z/, leading to annoying artifacts, such...