Search results

chapter

Two-stage speech enhancement with manipulation of the cepstral excitation

Samy Elshamy, Nilesh Madhu, Wouter Tirry, Tim Fingscheidt

2017 Hands-free Speech Communications and Microphone Arrays (HSCMA) > 106 - 110

The development of new speech enhancement techniques is a continuous process to combat the impairment of speech signals by various acoustic environmental influences. In this contribution we propose a new two-stage speech enhancement algorithm that exploits the source-filter model to decompose a denoised target signal; specifically, we manipulate the excitation signal in the cepstral domain. The...

chapter

Online meeting recognition in noisy environments with time-frequency mask based MVDR beamforming

Shoko Araki, Nobutaka Ito, Marc Delcroix, Atsunori Ogawa, et al.

2017 Hands-free Speech Communications and Microphone Arrays (HSCMA) > 16 - 20

This paper addresses our new online meeting recognition prototype, which works even in noisy environments. For speech enhancement, we employ a mask-based minimum variance distortionless response (MVDR) beamformer, which has recently been shown to be a successful front-end for a state-of-the-art deep neural network (DNN)-based automatic speech recognition (ASR) system. To ensure more accurate and computationally...

chapter

Speaker tracking in reverberant environments using multiple directions of arrival

Christine Evers, Boaz Rafaely, Patrick A. Naylor

2017 Hands-free Speech Communications and Microphone Arrays (HSCMA) > 91 - 95

Accurate estimation of the Direction of Arrival (DOA) of a sound source is an important prerequisite for a wide range of acoustic signal processing applications. However, in enclosed environments, early reflections and late reverberation often lead to localization errors. Recent work demonstrated that improved robustness against reverberation can be achieved by clustering only the DOAs from direct-path...

chapter

Variable Step-Size Nonholonomic Natural Gradient Algorithm Based on Optimal Selective Function

Ji Ce, Zhang Jun, Shan Changfang, Li Shuangshuang

2017 9th International Conference on Measuring Technology and Mechatronics Automation (ICMTMA) > 256 - 259

By introducing nonholonomic constraints, the nonholonomic natural gradient algorithm overcomes the shortcomings of the traditional natural gradient algorithm: it still works well when the source signal amplitude changes rapidly over time or is equal to zero for a certain period of time. In addition, selecting a different estimate function in each stage can get the...

chapter

Research on denoising method based on improved short-time spectrum estimation

Jian Kang, Hongbo Wang

2016 5th International Conference on Computer Science and Network Technology (ICCSNT) > 771 - 775

With the popularity of mobile terminal equipment, voice communication is becoming more and more frequent, and speech recognition is being applied in more and more scenarios. This places higher requirements on the accuracy of speech recognition, so enhancing speech as effectively as possible is becoming more and more important. At present, a lot of research has been done on the preprocessing...

chapter

Pitch and formant estimation of Bangla speech signal using autocorrelation, cepstrum and LPC algorithm

Muhammad Navid Anjum Aadit, Sharadindu Gopal Kirtania, Mehnaz Tabassum Mahin

2016 19th International Conference on Computer and Information Technology (ICCIT) > 371 - 376

In this paper, we present a comparative study of digital speech processing on Bangla speech signals. We represent oral characteristics of the Bangla alphabet in terms of pitch and formant. We worked with both vowels and consonants to show their difference in practical use. We take oral speech signals as voice recordings and extract phonemes to analyze in both time and frequency domains. Both male and female...

chapter

A new efficient measure for accuracy prediction and its application to multistream-based unsupervised adaptation

Tetsuji Ogawa, Sri Harish Mallidi, Emmanuel Dupoux, Jordan Cohen, et al.

2016 23rd International Conference on Pattern Recognition (ICPR) > 2222 - 2227

A new efficient measure for predicting estimation accuracy is proposed and successfully applied to multistream-based unsupervised adaptation of ASR systems to address data uncertainty when the ground-truth is unknown. The proposed measure is an extension of the M-measure, which predicts confidence in the output of a probability estimator by measuring the divergences of probability estimates spaced...

chapter

A landmark-based approach to automatic voice onset time estimation in stop-vowel sequences

Stephan R. Kuberski, Stephen J. Tobin, Adamantios I. Gafos

2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP) > 60 - 64

In the field of phonetics, voice onset time (VOT) is a major parameter of human speech defining linguistic contrasts in voicing. In this article, a landmark-based method of automatic VOT estimation in acoustic signals is presented. The proposed technique is based on a combination of two landmark detection procedures for release burst onset and glottal activity detection. Robust release burst detection...

chapter

An unsupervised vocabulary selection technique for Chinese automatic speech recognition

Yike Zhang, Pengyuan Zhang, Ta Li, Yonghong Yan

2016 IEEE Spoken Language Technology Workshop (SLT) > 420 - 425

The vocabulary is a vital component of automatic speech recognition (ASR) systems. For a specific Chinese speech recognition task, using a large general vocabulary not only leads to a much longer time to decode, but also hurts the recognition accuracy. In this paper, we proposed an unsupervised algorithm to select task-specific words from a large general vocabulary. The out-of-vocabulary (OOV) rate...

chapter

Weakly supervised user intent detection for multi-domain dialogues

Ming Sun, Aasish Pappu, Yun-Nung Chen, Alexander I. Rudnicky

2016 IEEE Spoken Language Technology Workshop (SLT) > 91 - 97

Users interact with mobile apps with certain intents such as finding a restaurant. Some intents and their corresponding activities are complex and may involve multiple apps; for example, a restaurant app, a messenger app and a calendar app may be needed to plan a dinner with friends. However, activities may be quite personal and third-party developers would not be building apps to specifically handle...

chapter

Fujisaki model parameter estimation: Solution by ‘direct-search’

Akshay Khatwani, D. N. Krishna, Komala Pawar, A. Sricharan, et al.

2016 IEEE Annual India Conference (INDICON) > 1 - 5

We address the problem of estimation of the Fujisaki model parameters for F₀ synthesis. For this, we propose the use of a very efficient search and optimization method termed the ‘direct-search’ (Hooke and Jeeves, 1961) which belongs to the class of derivative-free unconstrained optimization methods, in the sense that it is applicable for non-linear optimization problems which are not amenable for...

chapter

Phonetic content impact on Forensic Voice Comparison

Ajili Moez, Bonastre Jean-Francois, Ben Kheder Waad, Rossato Solange, et al.

2016 IEEE Spoken Language Technology Workshop (SLT) > 210 - 217

Forensic Voice Comparison (FVC) increasingly uses the likelihood ratio (LR) to indicate whether the evidence supports the prosecution (same-speaker) or defense (different-speakers) hypothesis. In addition to supporting one hypothesis, the LR provides a theoretically founded estimate of the relative strength of its support. Despite this nice theoretical aspect, the LR accepts some practical...

chapter

Improved prediction of the accent gap between speakers of English for individual-based clustering of World Englishes

Fumiya Shiozawa, Daisuke Saito, Nobuaki Minematsu

2016 IEEE Spoken Language Technology Workshop (SLT) > 129 - 135

The term “World Englishes” describes the current state of English, and one of their main characteristics is a large diversity of pronunciation, called accents. In our previous studies, we developed several techniques to realize effective clustering and visualization of this diversity. To this end, the accent gap between two speakers has to be quantified independently of extra-linguistic factors...

chapter

SMT-based lexicon expansion for broadcast transcription

Manon Ichiki, Aiko Hagiwara, Hitoshi Ito, Kazuo Onoe, et al.

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 4

We describe a method of lexicon expansion to tackle variations of spontaneous speech. Such variations are found widely in programs such as conversational talk shows and are typically observed as unintelligible utterances with a high speech rate. Unlike read speech in news programs, these variations often severely degrade automatic speech recognition (ASR) performance. Then, these variations...

chapter

A noise masking method with adaptive thresholds based on CASA

Feng Bao, Waleed H. Abdulla

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 5

In this paper, we propose a novel noise masking method based on Computational Auditory Scene Analysis (CASA) that uses an adaptive factor. Although CASA has succeeded in the field of speech separation and speech enhancement to some extent, the fixed thresholds used for segregation and labeling heavily affect the processing performance. Focusing on this issue, the proposed method utilizes the Normalized...

chapter

Speech enhancement method with geometric phase estimation by incorporating MIXMAX model

Xianyun Wang, Changchun Bao

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 4

In this paper, we propose a frequency-domain speech enhancement algorithm with phase estimation, in which speech is modeled by a Gaussian mixture model (GMM) in the log-spectral domain and two closed-form log-spectral amplitude estimators for speech and noise are derived directly by using a Mixture-Maximum (MIXMAX) model. Because the accurate estimation of speech phase could help to reduce...

chapter

Improved ETSI advanced front-end for ASR based on robust complex speech analysis

Keita Higa, Keiichi Funaki

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 4

Automatic speech recognition (ASR) is commonly used these days. Current ASR systems perform well in ideal environments; however, they do not perform well in realistic noisy environments. For robust ASR, ETSI has standardized the Advanced Front-End (AFE), which adopts a two-stage iterative Wiener filter (IWF) to realize speech enhancement as the front-end of ASR. In the ETSI AFE, FFT is used to estimate...

chapter

Voice-pathology analysis based on AR-HMM

Akira Sasou

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 4

Voice-pathology detection from a subject's voice is a promising technology for pre-diagnosis of larynx diseases. Glottal source estimation in particular plays a very important role in voice-pathology analysis. For more accurate estimation of the spectral envelope and glottal source of the pathological voice, we propose a method that can automatically generate the topology of the glottal source Hidden...

chapter

Incremental approach to NMF basis estimation for audio source separation

Kisoo Kwon, Jong Won Shin, Inkyu Choi, Hyung Yong Kim, et al.

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 5

Nonnegative matrix factorization (NMF) is a matrix factorization technique that can find meaningful latent nonnegative components. Since, however, the objective function is non-convex, the source separation performance can degrade when the iterative update of the basis matrix gets stuck in a poor local minimum. Most existing approaches update the basis iteratively to minimize a certain objective function with...

chapter

Voice conversion to emotional speech based on three-layered model in dimensional approach and parameterization of dynamic features in prosody

Yawen Xue, Yasuhiro Hamada, Masato Akagi

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 6

This paper proposes a system to convert neutral speech to emotional speech with controlled intensity of emotions. Most previous research on the synthesis of emotional voices used statistical or concatenative methods that can synthesize emotions in categorical emotional states such as joy, anger, and sadness. While humans sometimes enhance or relieve emotional states and intensity during daily life,...

INFONA - science communication portal
