In this work, a robust recursive procedure for identification of a non-stationary AR model of the speech production system is proposed, based on the weighted recursive least-squares (WRLS) algorithm with a variable forgetting factor (VFF) and a quadratic classifier with a sliding training data set. Experimental analysis is carried out on speech signals containing voiced and mixed-excitation segments. The presented experimental results justify that two...
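The abstract is truncated, but the core mechanism, recursive least squares whose forgetting factor shrinks when the prediction error grows so the filter adapts faster to non-stationarity, can be sketched as follows. This is a minimal illustration, not the paper's exact WRLS/VFF formulation; the error-driven forgetting rule and all parameter values here are assumptions.

```python
import numpy as np

def vff_rls_ar(signal, order=4, lam_min=0.90, lam_max=0.999, rho=0.99):
    """Track time-varying AR coefficients with RLS; the forgetting
    factor drops when the smoothed prediction-error power rises."""
    n = len(signal)
    w = np.zeros(order)               # current AR coefficient estimate
    P = 1e3 * np.eye(order)           # inverse correlation matrix
    err_pow = 0.0                     # smoothed squared prediction error
    coeffs = np.zeros((n, order))
    for t in range(order, n):
        x = signal[t - order:t][::-1]     # regressor [s[t-1], ..., s[t-order]]
        e = signal[t] - w @ x             # a priori prediction error
        err_pow = rho * err_pow + (1 - rho) * e * e
        # larger error -> smaller forgetting factor -> faster adaptation
        lam = np.clip(lam_max - err_pow, lam_min, lam_max)
        k = P @ x / (lam + x @ P @ x)     # gain vector
        w = w + k * e
        P = (P - np.outer(k, x @ P)) / lam
        coeffs[t] = w
    return coeffs
```

On a stationary AR(2) test signal the trajectory settles near the true coefficients; on non-stationary speech the VFF lets the estimate track segment changes.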
In this paper several preprocessing techniques used to improve speech recognition performance are compared over both PSN and GSM networks. Recognition experiments are conducted on a digit database in a speaker-independent isolated-word mode in order to evaluate the performances under within- and cross-network (PSN and GSM) conditions. Two classes of preprocessing techniques are distinguished depending...
In this paper, a new approach to robust speech recognition using Fuzzy Matrix Quantisation, Hidden Markov Models, and Neural Networks is presented and tested on speech corrupted by car noise. Two new robust isolated-word speech recognition (IWSR) systems, called FMQ/HMM and FMQ/MLP, are thus proposed and designed for optimal operation under a variety of input SNR conditions. The schemes and associated...
In recent years, extensive research has been conducted on hiding data in digital audio signals, exploiting the psychoacoustic masking phenomenon of the human auditory system (HAS). This paper presents a novel audio steganography method that integrates optimal steganographic embedding with two-level cryptography. Improved imperceptibility of the hidden data and an increased security level...
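The general pipeline, encrypt first, then embed below the perceptual floor, can be illustrated with a toy example. The XOR cipher and plain least-significant-bit embedding below stand in for the paper's (unspecified) optimal steganography and two-level cryptography; they are placeholders, not the proposed method.

```python
import numpy as np

def embed(samples, message, key=0x5A):
    """Toy pipeline: XOR-'encrypt' the message, then hide its bits in
    the least significant bits of 16-bit PCM samples."""
    cipher = bytes(b ^ key for b in message)
    bits = np.unpackbits(np.frombuffer(cipher, dtype=np.uint8))
    out = samples.copy()
    out[:bits.size] = (out[:bits.size] & ~1) | bits   # overwrite LSBs
    return out

def extract(samples, n_bytes, key=0x5A):
    """Read the LSBs back and undo the XOR stage."""
    bits = (samples[:8 * n_bytes] & 1).astype(np.uint8)
    return bytes(b ^ key for b in np.packbits(bits))
```

Each sample changes by at most one quantization step, which is what makes LSB embedding perceptually transparent at 16-bit resolution.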
In this paper, a method of adaptive noise suppression combining spatially robust fixed beamforming and the TRINICON blind source separation algorithm is presented. A multichannel sensor array is first processed using complementary fixed beamformers into maximum and minimum SINR channels. The channels form the inputs to a single 2×2 second-order statistics TRINICON-BSS system which adaptively compensates...
Speaker identification (SID) in cochannel speech, where two speakers are talking simultaneously over a single recording channel, is a challenging problem. Previous studies address this problem in the anechoic environment under the Gaussian mixture model (GMM) framework. On the other hand, cochannel SID in reverberant conditions has not been addressed. This paper studies cochannel SID in both anechoic...
In this paper, we consider the robust covariance estimation problem in the non-Gaussian set-up. In particular, Tyler's M-estimator is adopted for samples drawn from a heavy-tailed elliptical distribution. For some applications, the covariance matrix naturally possesses certain structure. Therefore, incorporating the prior structure information in the estimation procedure is beneficial to improving...
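Tyler's M-estimator itself has a simple fixed-point form; a plain, structure-free version for zero-mean samples is sketched below (the structured variants the paper studies add projection or shrinkage steps not shown here).

```python
import numpy as np

def tyler_m_estimator(X, n_iter=100, tol=1e-8):
    """Tyler fixed-point iteration for zero-mean samples X of shape
    (n, p); the scatter matrix is trace-normalized to p."""
    n, p = X.shape
    S = np.eye(p)
    for _ in range(n_iter):
        Sinv = np.linalg.inv(S)
        # weights 1 / (x_i^T S^{-1} x_i) down-weight heavy-tailed samples
        w = 1.0 / np.einsum('ij,jk,ik->i', X, Sinv, X)
        S_new = (p / n) * (X * w[:, None]).T @ X
        S_new *= p / np.trace(S_new)          # fix the scale ambiguity
        if np.linalg.norm(S_new - S, 'fro') < tol:
            return S_new
        S = S_new
    return S
```

Because the weights cancel the radial part of an elliptical distribution, the iteration recovers the covariance *shape* regardless of how heavy the tails are; only the overall scale is fixed by the trace normalization.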
Research on detecting depression from speech has advanced in recent years, but most work has focused on the analysis of one corpus at a time. Given that clinical corpora are typically small, it is important to explore approaches that generalize across corpora and that could ultimately be adapted to new data. We study a new corpus of patient-clinician interactions recorded when patients are admitted...
This paper addresses the problem of localising multiple competing speakers in the presence of room reverberation, where sound sources can be positioned at any azimuth on the horizontal plane. To reduce the number of front-back confusions which can occur due to the similarity of interaural time differences (ITDs) and interaural level differences (ILDs) in the front and rear hemifields, a machine hearing...
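ITD, one of the two binaural cues named above, can be estimated from the peak of a cross-correlation between the two ear signals. The sketch below uses plain cross-correlation restricted to physically plausible lags; it is a minimal illustration of the cue, not the paper's trained machine-hearing system.

```python
import numpy as np

def estimate_itd(left, right, fs, max_itd=0.0009):
    """Return the delay of `right` relative to `left` in seconds,
    searched over |lag| <= max_itd (~0.9 ms for a human head)."""
    n = len(left)
    corr = np.correlate(left, right, mode='full')
    lags = np.arange(-n + 1, n)              # lag axis for 'full' mode
    mask = np.abs(lags) <= int(max_itd * fs)
    peak_lag = lags[mask][np.argmax(corr[mask])]
    return -peak_lag / fs
```

Restricting the search range both speeds up the peak pick and discards spurious correlation maxima caused by reverberant reflections at implausible lags.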
Traditional sound event recognition methods based on informative front end features such as MFCC, with back end sequencing methods such as HMM, tend to perform poorly in the presence of interfering acoustic noise. Since noise corruption may be unavoidable in practical situations, it is important to develop more robust features and classifiers. Recent advances in this field use powerful machine learning...
In this paper we investigate the use of noise-robust features characterizing the speech excitation signal as complementary features to the usually considered vocal tract based features for Automatic Speech Recognition (ASR). The proposed Excitation-based Features (EBF) are tested in a state-of-the-art Deep Neural Network (DNN) based hybrid acoustic model for speech recognition. The suggested excitation...
In this paper, a broadband region-based near-field beamforming algorithm is proposed and demonstrated for acoustic applications. We use an eigenfilter structure with a minimum-energy cost function based on desired and undesired near-field regions. Robustness is thus achieved by focusing on signals generated from desired zones in space while rejecting signals from undesired zones. This construction...
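The eigenfilter idea, choosing weights that maximize energy captured from a desired near-field region relative to an undesired region, reduces to a generalized eigenvalue problem. Below is a minimal narrowband sketch; the microphone geometry, region sampling, and diagonal loading are illustrative assumptions, and the paper's broadband cost function is richer than this single-frequency version.

```python
import numpy as np

def steering(mics, src, freq, c=343.0):
    """Near-field (spherical-wave) steering vector for a point source."""
    d = np.linalg.norm(mics - src, axis=1)
    return np.exp(-2j * np.pi * freq * d / c) / d

def region_cov(mics, points, freq):
    """Average rank-one covariance over sample points of a region."""
    R = np.zeros((len(mics), len(mics)), dtype=complex)
    for p in points:
        a = steering(mics, p, freq)
        R += np.outer(a, a.conj())
    return R / len(points)

def region_eigenfilter(mics, desired, undesired, freq, load=1e-6):
    """Maximize w^H Rd w / w^H Ru w via a whitened eigendecomposition."""
    Rd = region_cov(mics, desired, freq)
    Ru = region_cov(mics, undesired, freq) + load * np.eye(len(mics))
    L = np.linalg.cholesky(Ru)                  # Ru = L L^H
    Linv = np.linalg.inv(L)
    vals, vecs = np.linalg.eigh(Linv @ Rd @ Linv.conj().T)
    w = Linv.conj().T @ vecs[:, -1]             # principal gen. eigenvector
    return w / np.linalg.norm(w)
```

The resulting weights pass energy from the desired zone while placing nulls toward the undesired zone, which is exactly the robustness mechanism the abstract describes.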
Speech applications in noisy and degraded channel conditions continue to be a challenging problem especially when there is a mismatch between the training and test conditions. In this paper, a robust speech feature extraction scheme is developed based on autoregressive moving average (ARMA) modeling that emphasizes high energy regions of the signal with a data driven modulation filter. The peak preserving...
Measures of sparsity are useful in many aspects of audio signal processing including speech enhancement, audio coding and singing voice enhancement, and a well-known method for these applications is non-negative matrix factorization (NMF), which decomposes a non-negative data matrix into two non-negative matrices. Although previous studies on NMF have focused on the sparsity of the two matrices,...
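The decomposition referred to above can be computed with the classic Lee-Seung multiplicative updates; the Euclidean-cost version is shown here as a baseline (the paper's sparsity analysis builds on top of such a factorization, and is not reproduced).

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-9):
    """Euclidean-distance NMF via multiplicative updates:
    V (F x T, non-negative) ~= W (F x rank) @ H (rank x T)."""
    rng = np.random.default_rng(0)
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    for _ in range(n_iter):
        # element-wise updates keep W and H non-negative by construction
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

For a magnitude spectrogram V, the columns of W act as spectral templates and the rows of H as their activations over time, which is where sparsity measures are typically applied.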
While many recently proposed audio declipping algorithms are highly effective in their ability to restore clipped speech, the algorithms' computational complexities inhibit their use in many practical situations. Real-time or nearly real-time performance is impossible using a typical laptop computer, with some algorithms taking as long as 400 times the actual duration of the input to complete restoration...
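As a point of reference for what the expensive algorithms improve upon, a trivially cheap baseline simply detects saturated runs and linearly interpolates across them. This is a naive sketch for context, not one of the surveyed declipping algorithms.

```python
import numpy as np

def declip_linear(x, clip_level=0.99):
    """Naive declipping baseline: mark samples at or above the clip
    level and replace them by linear interpolation from the rest."""
    y = np.asarray(x, dtype=float).copy()
    clipped = np.abs(y) >= clip_level
    idx = np.arange(len(y))
    if clipped.any() and not clipped.all():
        y[clipped] = np.interp(idx[clipped], idx[~clipped], y[~clipped])
    return y
```

This runs in linear time but cannot recover the true peak shape; the gap between such a baseline and sparsity-based reconstruction is what motivates the more complex algorithms, at the computational cost the abstract describes.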
This paper presents the latest improvements on our Spectro system that detects transformed duplicate audio content. We propose a new binary image feature derived from a spectrogram matrix by using a threshold based on the average of the spectral values. We quantize this binary image by applying a tile of fixed size and computing the sum of each small square in the tile. Fingerprints of each binary...
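The binarize-then-tile step described above can be reproduced in a few lines; this is an illustrative reimplementation of the feature, not the Spectro code itself.

```python
import numpy as np

def spectro_fingerprint(spec, tile=8):
    """Binarize a spectrogram at its mean value, then sum the 1-bits
    inside each tile x tile square to form a compact feature map."""
    binary = (spec > spec.mean()).astype(np.uint8)
    F, T = binary.shape
    Fc, Tc = F - F % tile, T - T % tile       # crop to whole tiles
    blocks = binary[:Fc, :Tc].reshape(Fc // tile, tile, Tc // tile, tile)
    return blocks.sum(axis=(1, 3))
```

Thresholding at the average makes the feature invariant to overall gain changes, one of the transformations a duplicate detector must tolerate.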
Arousal is essential in understanding human behavior and decision-making. In this work, we present a multimodal arousal rating framework that incorporates a minimal set of vocal and non-verbal behavior descriptors. The rating framework and fusion techniques are unsupervised in nature, ensuring that the approach is readily applicable and interpretable. Our proposed multimodal framework improves correlation...
A reduced frequency range in vowel production is a well-documented speech characteristic of individuals with psychological and neurological disorders. Depression is known to influence motor control and, in particular, speech production. The assessment and documentation of reduced vowel space and the associated perceived hypoarticulation and reduced expressivity often rely on subjective assessments. Within...
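Vowel space is commonly quantified as the area of the polygon spanned by the corner vowels in the F1-F2 plane, which gives an objective counterpart to the subjective ratings mentioned above. A minimal shoelace-formula sketch follows; the formant values in the test are illustrative, not measurements from the study.

```python
import numpy as np

def vowel_space_area(formants):
    """Shoelace (polygon) area of corner-vowel points, given as a
    sequence of (F2, F1) pairs in Hz; result is in Hz^2."""
    pts = np.asarray(formants, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
```

A smaller area indicates vowel centralization, i.e. the reduced articulatory range associated with hypoarticulation.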
Most automatic speech recognition (ASR) systems incorporate a single source of information about their input, namely, features and transformations derived from the speech signal. However, in many applications, e.g., vehicle-based speech recognition, sensor data and environmental information are often available to complement audio information. In this paper, we show how these data can be used to improve...
We investigate sequence-discriminative training of long short-term memory recurrent neural networks using the maximum mutual information criterion. We show that although recurrent neural networks already make use of the whole observation sequence and are able to incorporate more contextual information than feed-forward networks, their performance can be improved with sequence-discriminative training...