The linearly constrained minimum variance (LCMV)-beamformer (BF) is a viable solution for desired source extraction from a mixture of speakers in a noisy environment. The performance in terms of speech distortion, interference cancellation and noise reduction depends on the estimation of a set of parameters. This paper presents a new mechanism to update the parameters of the LCMV-BF. A new speech...
In this paper, we comparatively study alternative dictionary designs for recently proposed meeting diarization and adaptive beamforming based on a probabilistic spatial dictionary. This dictionary models the feature distribution for each possible direction of arrival (DOA) of speech signals and the feature distribution for background noise. The dictionary enables online DOA detection, which in turn...
A distortionless speech extraction in a reverberant environment can be achieved by an application of a beamforming algorithm, provided that the relative transfer functions (RTFs) of the sources and the covariance matrix of the noise are known. In this contribution, we consider the RTF identification challenge in a multi-source scenario. We propose a successive RTF identification (SRI), based on a...
A common approach to multiple Direction-of-Arrival (DOA) estimation of speech sources is to identify Time-Frequency (TF) bins with dominant Single Source (SS) and apply DOA estimation such as Multiple Signal Classification (MUSIC) only on those TF bins. In the state-of-the-art Direct Path Dominance (DPD)-MUSIC, the covariance matrix, used as the input to MUSIC, is calculated using only the TF bins...
In this paper, we address the blind source separation (BSS) problem and analyze the optimal window length in the short-time Fourier transform (STFT) for independent low-rank matrix analysis (ILRMA). ILRMA is a state-of-the-art BSS technique that utilizes the statistical independence between low-rank matrix spectrogram models, which are estimated by nonnegative matrix factorization. In conventional...
The modeling of speech can be used for speech synthesis and speech recognition. We present a speech analysis method based on pole-zero modeling of speech with mixed block sparse and Gaussian excitation. By using a pole-zero model, instead of the all-pole model, a better spectral fitting can be expected. Moreover, motivated by the block sparse glottal flow excitation during voiced speech and the white...
In this paper, the problem of age estimation is addressed based on two modalities: speech utterances and speakers' face images. The proposed age estimation framework employs the Shifted Covariates REgression Analysis for Multi-way data (SCREAM) model, which combines Parallel Factor Analysis 2 and Principal Covariates Regression. SCREAM is able to extract a few latent variables from multi-way data...
We present a novel approach for epoch estimation from the simple observation of the speech spectrum. The fundamental frequency (F0) of the speech signal and the local variations around F0 are characteristics of the glottal excitation source. Extraction of this information from the speech spectrum can be used to estimate epochs (since higher harmonics interact with the vocal tract characteristics, they no...
The spectral envelope of a speech signal encodes information about the characteristics of the speech source. As a result, spectral envelope modeling is a central task in speech applications, where tracking temporal transitions in diphones and triphones is essential for efficient speech synthesis and recognition algorithms. Temporal changes in the envelope structure are often derived from estimated...
Tracheoesophageal (TE) speech is generated by patients who have undergone a total laryngectomy, in which the larynx (voice box) is removed and replaced by a tracheoesophageal puncture. This work presents a novel low-complexity algorithm to estimate the degree of severity of disordered TE speech. The proposed algorithm uses features which are computed from 32-ms voiced frames of the speech signal. A 21-st...
Mobile devices are widely used today for speech communication. The environments in which these devices are used are widely varied and often the level of background noise in the speaker's environment can be significant. The purpose of speech enhancement is to reduce the level of background noise, ideally to such a level that it is not noticed by the listener. While speech enhancement algorithms can...
We present a method for estimating the body orientation of seated people in a smart room by fusing low-resolution range information collected from downward pointed time-of-flight (ToF) sensors with synchronized speaker identification information from microphone recordings. The ToF sensors preserve the privacy of the occupants in that they only return the range to a small set of hit points. We propose...
The following paper presents our work on audio phylogeny with a focus on two application scenarios: audiovisual (A/V) archives and tampering detection. Starting from a set of near-duplicate audio files, our goal is to determine the processing history for the set, and detect the transformations that have been applied on each linked pair of nodes. Our approach targets AAC and MP3 encoding operations...
Patients affected by Amyotrophic Lateral Sclerosis (ALS) show specific dysarthric clues in speech. These marks could be used to detect early symptoms and monitor the evolution of the disease over time. Classically, articulation marks have been based mainly on static premises. Articulation kinematics derived from acoustic correlates may help in producing measurements based on the dynamic behavior of speech. Specifically,...
This study summarizes research on analyzing and modifying an existing approach to estimating voice information security, in particular by changing the conditions of the immediate appreciation test, taking the speech forcing effect into account, adjusting the width of the frequency range and the method of its division, analyzing the amplitude constitution of speech, and qualifying the test signal level. Another point...
In the presence of environmental noise, speaker verification systems inevitably see a decrease in performance. This paper proposes the (1) use of two parallel classifiers, (2) feature enhancement based on blind signal-to-noise ratio (SNR) estimation and (3) fusion, to improve the performance of speaker verification systems. The two classifiers are based on Gaussian mixture models and the partial least-squares...
Developing automatic speech recognition (ASR) systems that are robust to late reverberation is an urgent task. It is well known that a late reverberation reduction algorithm used as an ASR pre-processor requires prior estimation of the reverberation time. Blind reverberation time measurements are less accurate than direct measurements from a known room impulse response (RIR). As a result, it is naturally expect...
In this paper, a modified Wiener filtering speech enhancement algorithm with phase spectrum compensation is proposed, which aims to improve the performance of the typical Wiener filtering speech enhancement algorithm at low signal-to-noise ratio. Since typical speech enhancement algorithms always use the observed noisy speech phase spectrum unchanged as the enhanced speech phase spectrum, and estimated...
Mask estimation has shown a lot of promise in speech enhancement for its simplicity and large speech intelligibility improvement. In this paper, gammachirp filter banks are applied to the contaminated speech signal to obtain the auditory time-frequency representation. Robust principal component analysis with a non-negative constraint is employed to decompose the auditory time-frequency representation...
This paper proposes a novel non-intrusive auditory perception-based approach for disordered speech quality estimation. An adaptive time-frequency algorithm, viz. the Matching Pursuit (MP) algorithm, is used to generate a reference signal from the disordered speech signal. Both the generated reference signal and the original degraded signal are given to the International Telecommunication Union (ITU)-standardized...