Search results

chapter

Efficient speech emotion recognition using binary support vector machines & multiclass SVM

N. Ratna Kanth, S. Saraswathi

2015 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC) > 1 - 6

This paper presents the construction of Binary Support Vector Machines and its significance for efficient Speech Emotion Recognition (SER). German Emotional Speech Corpus EmoDB has been used in this study. Seven Binary Support Vector Machines (SVMs) corresponding to each of the seven emotions in the EmoDB, namely Anger-Not Anger, Boredom-Not Boredom, Disgust-Not Disgust, Fear-Not Fear, Happy-Not Happy,...

chapter

Preliminary analysis of cough sounds

Vishwanath Pratap Singh, J.M.S Rohith, Vinay Kumar Mittal

2015 Annual IEEE India Conference (INDICON) > 1 - 6

Cough is an important symptom in many diseases and at times is the only major symptom available to diagnose particular ailments. Cough is a powerful mechanism of the human body for clearing the central airways. By analyzing the cough type, its intensity and its sound, medical experts can estimate enough details about the ailment and the appropriate cure. Hence, it should be possible to estimate the cough type and the...

chapter

High level visual and paralinguistic features extraction and their correlation with user engagement

Fasih Haider, Fahim A. Salim, Saturnino Luz, Owen Conlan, more

2015 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT) > 326 - 331

As more and more audio-visual content such as talks, lectures and presentations is made available online, it becomes increasingly difficult for prospective viewers of such content to assess which videos they might find interesting or engaging. Automatic classification of content as engaging versus non-engaging might help viewers cope with this situation, and presenters gauge their presentation skills...

chapter

Pitch tracking in reverberant environments

Mohammed Kamal Khwaja, Sunil Sivadas, P. Arulmozhivarman

2015 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT) > 192 - 196

Pitch, or fundamental frequency, estimation is an important problem in speech processing. Research on pitch extraction is several years old and numerous algorithms have been developed over the years to improve its accuracy. It becomes more difficult in the presence of additive noise and reverberation because noise corrupts the periodicity information which is vital for estimating the pitch. In this...

chapter

An efficient Direction-Of-Arrival estimation method for Uniform Rectangular Array based on array covariance matrix element properties

Koichi Ichige, Yu Iwabuchi

2015 IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP) > 345 - 348

This paper presents a computationally efficient Direction-Of-Arrival (DOA) estimation method for Uniform Rectangular Array (URA), which is effective for both correlated and uncorrelated sources. The proposed method extends our previous study for Uniform Linear Array (ULA); it is based on the relation between the elements of the array covariance matrix and does not need iteration, angular peak-search...

chapter

Relationship between speaker/listener similarity and information transmission quality in speech communication

Bohan Chen, Norihide Kitaoka, Kazuya Takeda

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1190 - 1193

We investigate the correlation between similarity in speaker characteristics and information transmission quality using a map task dialogue corpus. Similarities between the prosodic features and lexical styles of different speakers are analyzed, and most of these similarity measurements are shown to have significant correlations with information transmission quality as measured by a direction following...

chapter

Automatic assessment of non-native accent degrees using phonetic level posterior and duration features from multiple languages

Shushan Chen, Yiming Zhou, Ming Li

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 156 - 159

This paper presents an automatic non-native accent assessment approach using phonetic level posterior and duration features. In this method, instead of using conventional MFCC trained Gaussian Mixture Models (GMM), we use phonetic phoneme states as tokens to calculate the posterior probability and zero-order Baum-Welch statistics. Phoneme recognizers from five languages are employed to extract phonetic...

chapter

Estimation of binaural intelligibility using the frequency-weighted segmental SNR of stereo channel signals

Kazuya Taira, Kazuhiro Kondo

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 101 - 104

Most existing objective intelligibility prediction methods predict monaural intelligibility using monaural signals. These methods do not consider that a human can easily distinguish sounds arriving from different directions by sound heard in both ears. Therefore, intelligibility prediction using binaural signals that take this into account is necessary. Accordingly, speech samples with various source...

chapter

Single channel blind source separation based on variational mode decomposition and PCA

Priyanka Dey, Udit Satija, Barathram Ramkumar

2015 Annual IEEE India Conference (INDICON) > 1 - 5

Blind source separation plays an important role in extracting the source components from one or more mixture(s) of the sources received by a sensor or receiver. It is blind since no other information besides the observed mixture signals is available. In presence of only one observed mixture, it is known as single channel blind source separation (SCBSS). This paper proposes a method of SCBSS based...

chapter

Frame-by-frame speech recognition as hardware decoding on FPGA devices

Masashi Nakayama, Naoki Shigekawa, Takashi Yokouchi, Shunsuke Ishimitsu

2015 9th International Conference on Sensing Technology (ICST) > 785 - 788

This paper proposes frame-by-frame speech recognition as a hardware decoder on Field Programmable Gate Arrays (FPGAs). As a first step for FPGA implementation, Voice Activity Detection (VAD) using second order autocorrelation and a speech recognition decoder using formant frequency distances were evaluated. The hardware decoding was then implemented on an FPGA emulator. The VAD and decoder were demonstrated...

chapter

Canonical correlation based impersonation quality determination algorithm for natural morphed speech

Md. Mahbub Hasan, Sathi Rani Mitra, Kenbu Teramoto

2015 IEEE International Conference on Telecommunications and Photonics (ICTP) > 1 - 4

In this article, impersonation experiments were conducted utilizing natural morphed speeches between (/a/-/b/-/a/) and (/a/-/g/-/a/) as stimuli. The sound stimuli are produced utilizing the natural glottal source and morphed linear predictive coding (LPC) filtering coefficients, which represent the vocal tract states. An algorithm has been proposed for determination of impersonation quality based...

chapter

Improved subspace-based speech enhancement using a novel updating approach for noise correlation matrix

Neda Faraji, Seyed Mohammad Ahadi

2015 Signal Processing and Intelligent Systems Conference (SPIS) > 88 - 92

In this paper, a new approach is presented to extend subspace-based speech enhancement to non-stationary noise cases. The new method updates the noise correlation matrix segment-by-segment, assuming that only the eigenvalues of the matrix vary with time. In other words, only the varying loudness of noise signals is considered, as it is observed in the modulated white noise...

chapter

Synchrony in prosodic and linguistic features between backchannels and preceding utterances in attentive listening

Tatsuya Kawahara, Takashi Yamaguchi, Miki Uesato, Koichiro Yoshino, more

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 392 - 395

In human-human dialogue, especially in attentive listening such as counseling, backchannels play an important role. Appropriately coordinated backchannels not only make communication smooth but also help establish rapport. By collecting counseling dialogue, we investigate whether and how synchrony is expressed by prosodic and linguistic features of backchannels with respect to the preceding speaker's...

chapter

An efficient algorithm for Gender Detection using voice samples

Mamta Kumari, Israj Ali

2015 Communication, Control and Intelligent Systems (CCIS) > 221 - 226

Speech, an acoustic signal, has properties that allow the gender of a speaker to be detected. This is known as Gender Detection (GD). In this paper, we propose a pitch-based gender detection algorithm. Pitch is the fundamental frequency of a speech signal. Gender detection using pitch can be performed in the time domain, the frequency domain, or both. In this paper, we propose an efficient time domain based...

chapter

LSA-Based Chinese-Slavic Mongolian NER Disambiguation

Jiang Yupeng, Hou Hongxu, Yang Ping

2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing > 703 - 708

2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing (CIT/IUCC/DASC/PICOM)

The ambiguity of named entity refers to one named entity with multiple entity concepts. We use the text contextual information and other external repository to cope with the ambiguity of named entity. Then we can make sure the truly allegations of a named entity. Our system can improve the performance of the online recommendation system, the ability to extract information and other practical applications...

chapter

Analysis on L2 learners' perception errors between geminate and singleton of Japanese consonants using loudness related parameters

Yanlong Zhang, Mee Sonu, Hiroaki Kato, Yoshinori Sagisaka

2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE) > 186 - 189

For better understanding of the identification difficulties in Japanese geminate/singleton consonants for second language (L2) learners, a perceptual factor is newly introduced to supply the insufficiencies of conventional explanations solely using acoustic duration differences. To systematically explain speech-rate related serious errors of geminate/singleton identification in fast/slow speech, loudness...

chapter

Role of f0 and formant frequencies in unsupervised separation of convolutive speech mixtures

M K Prasanna Kumar, R Kumaraswamy

2015 International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT) > 316 - 320

In this paper, we discuss the role of the fundamental frequency f0 and the formants F1, F2 and F3 of the speech signal in unsupervised source separation of real recorded convolutive speech mixtures. In unsupervised source separation there is no prior knowledge of the underlying sources and mixing conditions. We observed that supervised source separation using both f0 and formants gives most accurate separation...

chapter

Speaker profiling by extracting paralinguistic parameters using mel frequency cepstral coefficients

Sudeep Galgali, S Selva Priyanka, B. R. Shashank, Annapurna P Patil

2015 International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT) > 486 - 489

Speaker profiling is indispensable for solving cases such as kidnapping, robbery, blackmail calls, hoax and bomb threat calls, and false alarms, where the evidence is in the form of telephone conversations, tape recordings, and digital recordings of speech. Ranking them according to objective criteria such as gender, age, height and weight will be really useful. In this area many different methods...

chapter

On rater reliability and agreement based dynamic active learning

Yue Zhang, Eduardo Coutinho, Bjorn Schuller, Zixing Zhang, more

2015 International Conference on Affective Computing and Intelligent Interaction (ACII) > 70 - 76

In this paper, we propose two novel Dynamic Active Learning (DAL) methods with the aim of ultimately reducing the costly human labelling work for subjective tasks such as speech emotion recognition. Compared to conventional Active Learning (AL) algorithms, the proposed DAL approaches employ a highly efficient adaptive query strategy that minimises the number of annotations through three advancements...

chapter

Hierarchical modeling of temporal course in emotional expression for speech emotion recognition

Chung-Hsien Wu, Wei-Bin Liang, Kuan-Chun Cheng, Jen-Chun Lin

2015 International Conference on Affective Computing and Intelligent Interaction (ACII) > 810 - 814

This paper presents an approach to hierarchical modeling of temporal course in emotional expression for speech emotion recognition. In the proposed approach, a segmentation algorithm is employed to hierarchically chunk an input utterance into three-level temporal units, including low-level descriptors (LLDs)-based sub-utterance level, emotion profile (EP)-based sub-utterance level and utterance level...

INFONA - scientific communication portal
