Search results

chapter

Local training in speaker verification for PLDA

Hunny Pahuja, Priya Ranjan, Amit Ujlayan

2017 International Conference on Computing, Communication and Automation (ICCCA) > 1466 - 1469

For the i-vector model, probabilistic linear discriminant analysis (PLDA) is the normalization approach and performs well for speaker verification. However, it requires a large amount of development data, which is costly in many cases. Unsupervised adaptation is a possible approach, which uses unlabeled data to adapt the PLDA scattering matrices to the target domain. In this paper, ‘local training’ approach...

chapter

Exploiting sequence information for text-dependent Speaker Verification

Subhadeep Dey, Petr Motlicek, Srikanth Madikeri, Marc Ferras

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5370 - 5374

Model-based approaches to Speaker Verification (SV), such as Joint Factor Analysis (JFA), i-vector and relevance Maximum-a-Posteriori (MAP), have been shown to provide state-of-the-art performance for text-dependent systems with fixed phrases. The performance of i-vector and JFA models has been further enhanced by estimating posteriors from a Deep Neural Network (DNN) instead of a Gaussian Mixture Model (GMM)...

chapter

System combination for short utterance speaker recognition

Lantian Li, Dong Wang, Xiaodong Zhang, Thomas Fang Zheng, et al.

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 5

For text-independent short-utterance speaker recognition (SUSR), the performance often degrades dramatically. This paper presents a combination approach to the SUSR tasks with two phonetic-aware systems: one is the DNN-based i-vector system and the other is our recently proposed subregion-based GMM-UBM system. The former employs phone posteriors to construct an i-vector model in which the shared statistics...

chapter

Emirati speaker verification based on HMM1s, HMM2s, and HMM3s

Shahin Ismail

2016 IEEE 13th International Conference on Signal Processing (ICSP) > 562 - 567

This work focuses on Emirati speaker verification systems in neutral talking environments based on each of First-Order Hidden Markov Models (HMM1s), Second-Order Hidden Markov Models (HMM2s), and Third-Order Hidden Markov Models (HMM3s) as classifiers. These systems have been evaluated on our collected Emirati speech database which is comprised of 25 male and 25 female Emirati speakers using Mel-Frequency...

chapter

Binary speaker embedding

Lantian Li, Chao Xing, Dong Wang, Kaimin Yu, et al.

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP) > 1 - 4

The popular i-vector model represents speakers as low-dimensional continuous vectors (i-vectors), and hence it is a way of continuous speaker embedding. In this paper, we investigate binary speaker embedding, which transforms i-vectors to binary vectors (codes) by a hash function. We start from locality sensitive hashing (LSH), a simple binarization approach where binary codes are derived from a set...

chapter

Analysis of glottal source parameters in Parkinsonian speech

Jane Hanratty, Catherine Deegan, Mary Walsh, Barry Kirkpatrick

2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) > 3666 - 3669

Diagnosis and monitoring of Parkinson's disease pose a number of challenges, as there is no definitive biomarker despite the broad range of symptoms. Research is ongoing to produce objective measures that can either diagnose Parkinson's or act as an objective decision support tool. Recent research on speech-based measures has demonstrated promising results. This study aims to investigate the characteristics...

chapter

Obstructive sleep apnea severity estimation: Fusion of speech-based systems

D. Ben Or, E. Dafna, A. Tarasiuk, Y. Zigel

2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) > 3207 - 3210

Obstructive sleep apnea (OSA) is a common sleep-related breathing disorder. Previous studies associated OSA with anatomical abnormalities of the upper respiratory tract that may be reflected in the acoustic characteristics of speech. We tested the hypothesis that the speech signal carries essential information that can assist in early assessment of OSA severity by estimating apnea-hypopnea index (AHI)...

chapter

A new speech corpus in Spanish for speaker verification

N. Garcia, T. Arias-Vergara, J. R. Orozco-Arroyave, J. F. Vargas-Bonilla

2016 XXI Symposium on Signal Processing, Images and Artificial Vision (STSIVA) > 1 - 7

In this paper we present a new database with speech recordings in Spanish. The database contains recordings of 54 native Spanish speakers. It is suitable for developing and testing better Speaker Verification systems. The recording procedure, equipment and speech tasks are detailed. Experiments using the GMM-UBM speaker verification methodology were performed. The methodology...

chapter

A database for emotional interactions of the elderly

Kunxia Wang, ZongBao Zhu, Shidong Wang, Xiao Sun, et al.

2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS) > 1 - 6

Emotional interaction plays an important role in human-computer interaction domains. One of the major limitations in the study of emotion interaction is the lack of databases. This paper describes a database for emotion interactions of the elderly. The database was collected with audio and video from sixteen actors (8 female and 8 male) in daily conversations of TV series, which covers seven type...

chapter

Analysis of JestKOD database using affective state annotations

Sinan Kececi, Engin Erzin, Yucel Yemez

2016 24th Signal Processing and Communication Application Conference (SIU) > 1033 - 1036

Gesticulation, together with the speech, is an important part of natural and affective human-human interaction. Analysis of gesticulation and speech is expected to help designing more natural human-computer interaction (HCI) systems. We build the JestKOD database, which consists of speech and motion capture recordings of dyadic interactions. In this paper we describe our annotation efforts and present...

chapter

Cohort selection for text-dependent speaker verification score normalization

Houssemeddine Khemiri, Dijana Petrovska-Delacretaz

2016 2nd International Conference on Advanced Technologies for Signal and Image Processing (ATSIP) > 689 - 692

In this paper a speaker-dependent cohort selection for T-norm score normalization is proposed in the context of text-dependent speaker verification. The goal of the proposed technique is to find a set of cohort speakers who are close to the target speaker. In order to properly select the subset of speakers for the normalization, a distance between each target speaker model and the available normalization...

chapter

Speaker specific features and phonemes in speech: a proposal for evaluating a possible interaction

Nivedita Yadav, Solange Rossato, Juliette Kahn, Jean Francois Bonastre

2016 4th International Conference on Biometrics and Forensics (IWBF) > 1 - 6

Speaker voice characteristics are an important aspect of forensic phonetics. Previous studies have suggested that not all the features present in speech signals are equally important for speaker discrimination, and it is well known that some subsets of phonemes are more informative than others. However, most of these studies have concerned a whole group of speakers, without taking into account the...

chapter

Isolated speech recognition using Fuzzy C Means technique

Vani H.Y, M.A. Anusuya

2015 International Conference on Emerging Research in Electronics, Computer Science and Technology (ICERECT) > 352 - 357

Automatic speech recognition is one of the challenging areas in the field of speech signal processing. Automatic speech recognition technology converts a speech signal into text. This paper presents the implementation of an isolated Kannada word recognizer using Vector Quantization (VQ) and Fuzzy C-Means (FCM) techniques. The paper compares and contrasts the recognition accuracies of FCM and k-means techniques...

chapter

Pause duration model for Malayalam TTS

Jesin James, Deepa P. Gopinath

2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI) > 2206 - 2210

In this paper a CART-based pause duration prediction model has been developed for the Malayalam language. Prosodic features such as pause durations and syllable prolongations play an important role in making the speech output of a Text To Speech (TTS) system more intelligible. An analysis of the various factors that affect pause duration in Malayalam has not been conducted to date. Here,...

chapter

Quality evaluation of computational models for movie summarization

A. Zlatintsi, P. Koutras, N. Efthymiou, P. Maragos, et al.

2015 Seventh International Workshop on Quality of Multimedia Experience (QoMEX) > 1 - 6

In this paper we present a movie summarization system and we investigate what composes high quality movie summaries in terms of user experience evaluation. We propose state-of-the-art audio, visual and text techniques for the detection of perceptually salient events from movies. The evaluation of such computational models is usually based on the comparison of the similarity between the system-detected...

chapter

A novel model for phoneme recognition using phonetically derived features

Naomi Harte, Saeed Vaseghi, Paul McCourt

9th European Signal Processing Conference (EUSIPCO 1998) > 1 - 4

This paper presents work on the use of segmental modelling and phonetic features for phoneme based speech recognition. The motivation for the work is to lessen the effects of the IID assumption in HMM based recognition. The use of phonetic features which are derived across the duration of a phonetic segment is discussed. In conjunction with the use of these features, a hybrid phoneme model is introduced...

chapter

MaxMBROLA: A Max/MSP MBROLA-based tool for real-time voice synthesis

Nicolas D'Alessandro, Raphael Sebbe, Baris Bozkurt, Thierry Dutoit

2005 13th European Signal Processing Conference > 1 - 4

In this paper, we present the first step of a project that is able to perform both speech and singing synthesis controlled in real-time. Our aim is to develop a flexible application allowing performers to produce complex and versatile singing - as well as speech - articulations. Thus, we have adapted an existing speech synthesizer, the MBROLA software, to real-time singing constraints. The work presented...

chapter

Eigenresiduals for improved parametric speech synthesis

Thomas Drugman, Geoffrey Wilfart, Thierry Dutoit

2009 17th European Signal Processing Conference > 2176 - 2180

Statistical parametric speech synthesizers have recently shown their ability to produce natural-sounding and flexible voices. Unfortunately the delivered quality suffers from a typical buzziness due to the fact that speech is vocoded. This paper proposes a new excitation model in order to reduce this undesirable effect. This model is based on the decomposition of pitch-synchronous residual frames...

chapter

Automatic speaker verification using nearest neighbor normalization (3N) on an iPad tablet

Houssemeddine Khemiri, Alexander Usoltsev, Marie-Christine Legout, Dijana Petrovska-Delacretaz, et al.

2014 International Conference of the Biometrics Special Interest Group (BIOSIG) > 1 - 8

This paper describes the development, implementation and validation of an automatic speaker recognition system on an iPad tablet. A score normalization approach, referred to as Nearest Neighbor Normalization (3N), is applied in order to improve the baseline speaker verification system. The system is evaluated on the MOBIO corpus and results show an absolute improvement of the HTER by more than 4% when...

chapter

Towards improving the performance of language identification system for Indian languages

Abitha Anto, K. T. Sreekumar, C. Santhosh Kumar, P. C. Reghu Raj

2014 First International Conference on Computational Systems and Communications (ICCSC) > 42 - 46

In this paper, we present the details of a phonotactic language identification (LID) system developed for five Indian languages, English (Indian), Hindi, Malayalam, Tamil and Kannada. Since there are no publicly available speech databases for English, Malayalam and Kannada, we developed the database for each of the target languages by downloading the audio files from YouTube videos and removing the...

INFONA - science communication portal
