This paper presents the construction of Binary Support Vector Machines and their significance for efficient Speech Emotion Recognition (SER). The German Emotional Speech Corpus EmoDB has been used in this study. Seven Binary Support Vector Machines (SVMs), one for each of the seven emotions in EmoDB, namely Anger-Not Anger, Boredom-Not Boredom, Disgust-Not Disgust, Fear-Not Fear, Happy-Not Happy,...
Cough is an important symptom in many diseases and at times is the only major symptom available for diagnosing particular ailments. Cough is a powerful mechanism of the human body for clearing the central airways. By analyzing the cough type, its intensity and its sound, medical experts can estimate enough details about the ailment and an appropriate cure. Hence, it should be possible to estimate the cough type and the...
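The "Emotion-Not Emotion" pairing described above is a one-vs-rest decomposition. The following is a minimal sketch of how such binary targets could be derived from multi-class labels, not the authors' implementation. The function names `one_vs_rest_targets` and `pick_emotion`, the toy label list, and the rule of choosing the highest-scoring binary classifier are all assumptions made for illustration; only the five emotions named in the abstract are used.

```python
def one_vs_rest_targets(labels):
    """Build one binary target vector per class: 1 = that emotion, 0 = not that emotion."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def pick_emotion(scores):
    """Hypothetical combination rule: choose the emotion whose binary classifier scores highest."""
    return max(scores, key=scores.get)


# Toy labels (illustrative only, not taken from EmoDB).
labels = ["anger", "boredom", "fear", "anger", "happy", "disgust"]
targets = one_vs_rest_targets(labels)

# Each entry would be the training target for one binary SVM, e.g. Anger vs. Not Anger.
for emotion, target in targets.items():
    print(emotion, target)
```

Each target vector can then be paired with the same feature matrix to train one binary classifier per emotion.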
As more and more audio-visual content such as talks, lectures and presentations is made available online, it becomes increasingly difficult for prospective viewers of such content to assess which videos they might find interesting or engaging. Automatic classification of content as engaging versus non-engaging might help viewers cope with this situation, and presenters gauge their presentation skills...
Pitch, or fundamental frequency, estimation is an important problem in speech processing. Research on pitch extraction is several years old, and numerous algorithms have been developed to improve its accuracy. The problem becomes more difficult in the presence of additive noise and reverberation because noise corrupts the periodicity information which is vital for estimating the pitch. In this...
This paper presents a computationally efficient Direction-Of-Arrival (DOA) estimation method for a Uniform Rectangular Array (URA), which is effective for both correlated and uncorrelated sources. The proposed method is an extension of our previous study for a Uniform Linear Array (ULA). It is based on the relation between the elements of the array covariance matrix and does not need iteration, angular peak-search...
We investigate the correlation between similarity in speaker characteristics and information transmission quality using a map task dialogue corpus. Similarities between the prosodic features and lexical styles of different speakers are analyzed, and most of these similarity measurements are shown to have significant correlations with information transmission quality as measured by a direction following...
This paper presents an automatic non-native accent assessment approach using phonetic level posterior and duration features. In this method, instead of using conventional MFCC trained Gaussian Mixture Models (GMM), we use phonetic phoneme states as tokens to calculate the posterior probability and zero-order Baum-Welch statistics. Phoneme recognizers from five languages are employed to extract phonetic...
Most existing objective intelligibility prediction methods predict monaural intelligibility using monaural signals. These methods do not consider that a human can easily distinguish sounds arriving from different directions using the sound heard in both ears. Therefore, intelligibility prediction using binaural signals that take this into account is necessary. Accordingly, speech samples with various source...
Blind source separation plays an important role in extracting the source components from one or more mixture(s) of the sources received by a sensor or receiver. It is blind since no other information besides the observed mixture signals is available. In the presence of only one observed mixture, it is known as single channel blind source separation (SCBSS). This paper proposes a method of SCBSS based...
This paper proposes frame-by-frame speech recognition as a hardware decoder on Field Programmable Gate Arrays (FPGAs). As a first step for FPGA implementation, Voice Activity Detection (VAD) using second order autocorrelation and a speech recognition decoder using formant frequency distances were evaluated. The hardware decoding was then implemented on an FPGA emulator. The VAD and decoder were demonstrated...
In this article, impersonation experiments were conducted utilizing natural morphed speeches between (/a/-/b/-/a/) and (/a/-/g/-/a/) as stimuli. The sound stimuli are produced utilizing the natural glottal source and morphed linear predictive coding (LPC) filtering coefficients, which represent the vocal tract states. An algorithm has been proposed for determination of impersonation quality based...
In this paper a new approach is presented to extend subspace-based speech enhancement to non-stationary noise cases. The new method updates the noise correlation matrix segment-by-segment, assuming that only the eigenvalues of the matrix vary with time. In other words, only the varying loudness of the noise signal is considered, as it is observed in the modulated white noise...
In human-human dialogue, especially in attentive listening such as counseling, backchannels play an important role. Appropriately coordinated backchannels not only make communication smooth but also help establish rapport. By collecting counseling dialogue, we investigate whether and how synchrony is expressed by prosodic and linguistic features of backchannels with respect to the preceding speaker's...
The acoustic speech signal carries properties that can be used to detect the gender of a speaker. This task is well known as Gender Detection (GD). In this paper, we propose a pitch-based gender detection algorithm. Pitch is the fundamental frequency of the speech signal. Gender detection using pitch can be performed in the time domain, the frequency domain, or both. In this paper, we propose an efficient time domain based...
Named entity ambiguity refers to a single named entity that corresponds to multiple entity concepts. We use the contextual information of the text and external repositories to cope with this ambiguity, so that we can determine what a named entity truly refers to. Our system can improve the performance of the online recommendation system, the ability to extract information and other practical applications...
For better understanding of the identification difficulties in Japanese geminate/singleton consonants for second language (L2) learners, a perceptual factor is newly introduced to compensate for the shortcomings of conventional explanations that rely solely on acoustic duration differences. To systematically explain serious speech-rate related errors of geminate/singleton identification in fast/slow speech, loudness...
In this paper we discuss the role of the fundamental frequency f0 and the formants F1, F2 and F3 of the speech signal in unsupervised source separation of real recorded convolutive speech mixtures. In unsupervised source separation there is no prior knowledge of the underlying sources and mixing conditions. We observed that supervised source separation using both f0 and Formants gives most accurate separation...
Speaker profiling is indispensable for solving cases such as kidnapping, robbery, blackmail calls, hoaxes, bomb threat calls and false alarms, where the evidence is in the form of telephone conversations, tape recordings, and digital recordings of speech. Ranking speakers according to objective criteria such as gender, age, height and weight will be really useful. In this area many different methods...
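As a rough illustration of time-domain pitch estimation followed by a gender decision, the sketch below picks the autocorrelation peak within a plausible pitch range and compares the result to a threshold. It is not the algorithm proposed in the paper, whose details are truncated here. The function names, the 50-400 Hz search range, the 165 Hz threshold, and the synthetic sine-wave inputs are all assumptions chosen for the example.

```python
import math


def estimate_pitch_autocorr(samples, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate pitch in Hz from the lag that maximizes the (unnormalized) autocorrelation."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself delayed by `lag` samples.
        value = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


def guess_gender(pitch_hz, threshold_hz=165.0):
    """Hypothetical decision rule: the threshold is an assumed value, not taken from the paper."""
    return "female" if pitch_hz >= threshold_hz else "male"


def sine(freq_hz, sample_rate=8000, duration_s=0.1):
    """Synthetic stand-in for a voiced speech frame."""
    n = round(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


low = estimate_pitch_autocorr(sine(100.0), 8000)
high = estimate_pitch_autocorr(sine(200.0), 8000)
print(guess_gender(low), guess_gender(high))
```

Real speech would need framing, voicing detection, and some robustness to octave errors, none of which this sketch attempts.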
In this paper, we propose two novel Dynamic Active Learning (DAL) methods with the aim of ultimately reducing the costly human labelling work for subjective tasks such as speech emotion recognition. Compared to conventional Active Learning (AL) algorithms, the proposed DAL approaches employ a highly efficient adaptive query strategy that minimises the number of annotations through three advancements...
This paper presents an approach to hierarchical modeling of temporal course in emotional expression for speech emotion recognition. In the proposed approach, a segmentation algorithm is employed to hierarchically chunk an input utterance into three-level temporal units, including low-level descriptors (LLDs)-based sub-utterance level, emotion profile (EP)-based sub-utterance level and utterance level...