Search results

chapter

A robust spoken Q&A system with scarce in-domain resources

Luis Fernando D'Haro, Seokhwan Kim, Rafael E. Banchs

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 47 - 53

Nowadays there is an increasing interest in deploying spoken conversational agents that provide ubiquitous question-answering information to customers about corporate services and commercial products and that support different user devices such as desktop PCs or mobile phones. Unfortunately, creating an accurate system requires a lot of manual work, where developers must consider several factors such...

chapter

A spoken dialog system with redundant response to prevent user misunderstanding

Masaki Yamaoka, Sunao Hara, Masanobu Abe

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 223 - 226

We propose a spoken dialog strategy for car navigation systems to facilitate safe driving. To drive safely, drivers need to concentrate on their driving; however, their concentration may be disrupted due to disagreement with their spoken dialog system. Therefore, we need to solve the problems of user misunderstandings as well as misunderstanding of spoken dialog systems. For this purpose, we introduced...

chapter

Robust formant features for speaker verification in the Lombard effect

Ileun Kwak, Hong-Goo Kang

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 114 - 118

This paper presents a voice-controlled speaker verification system for hand-held devices in noisy environments. In noisy environments, users unintentionally increase their voice intensity because of the ear-mouth feedback mechanism, i.e., the Lombard effect; thus, the characteristics of the input signal differ considerably from those in a quiet environment. To enhance the accuracy of a speaker verification...

chapter

A waveform representation framework for high-quality statistical parametric speech synthesis

Bo Fan, Siu Wa Lee, Xiaohai Tian, Lei Xie, more

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 530 - 536

State-of-the-art statistical parametric speech synthesis (SPSS) generally uses a vocoder to represent speech signals and parameterize them into features for subsequent modeling. The magnitude spectrum has been the dominant feature over the years. Although perceptual studies have shown that the phase spectrum is essential to the quality of synthesized speech, it is often ignored by using a minimum phase filter...

chapter

A preliminary study of a hybrid user interface for augmented reality applications

Federico Manuri, Giovanni Piumatti

2015 7th International Conference on Intelligent Technologies for Interactive Entertainment (INTETAIN) > 37 - 41

Augmented Reality (AR) applications are nowadays widely used in many fields, especially entertainment, and the market for AR applications on mobile devices is growing ever faster. Moreover, new and innovative hardware for human-computer interaction has been deployed, such as the Leap Motion Controller. This paper presents some preliminary results in the design and development of a...

chapter

Compensating changes in speaker position for improved voice-based human-robot communication

Randy Gomez, Keisuke Nakamura, Takeshi Mizumoto, Kazuhiro Nakadai

2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids) > 977 - 982

Acoustic perturbation due to reverberation and changes in speaker position is detrimental to seamless human-robot speech-based communication. These cause a mismatch between the speech features at runtime and the acoustic model (training condition). Degradation of Automatic Speech Recognition (ASR) and Spoken Language Understanding (SLU) performance is then inevitable. As...

chapter

Speech and EGG polarity detection using Hilbert Envelope

K. T. Deepak, K. Ramesh, Nagaraj Adiga, S. R. M. Prasanna

TENCON 2015 - 2015 IEEE Region 10 Conference > 1 - 6

This work proposes two different methods for polarity detection in speech and Electroglottograph (EGG) signals using the Hilbert Envelope (HE). The HE is defined as the magnitude of a complex time function and is hence a unipolar signal. The zero frequency filtering (ZFF) obtained from the HE of the LP residual has the same phase for both polarities. Alternatively, the ZFF of speech and EGG, integrated linear prediction...

chapter

Speech pattern classification using Large Geometric Margin Minimum Classification Error training

Mikiyo Kitaoka, Tetsuya Hashimoto, Tsubasa Ochiai, Shigeru Katagiri, more

TENCON 2015 - 2015 IEEE Region 10 Conference > 1 - 6

As one of the recent popular discriminative training methods, Minimum Classification Error (MCE) training aims at efficiently developing high-performance classifiers through the minimization of smooth (differentiable in classifier parameters) classification error count loss. However, MCE training, sometimes referred to as Functional Margin (FM) MCE training, does not necessarily guarantee training...

chapter

Modified energy based method for word endpoints detection of continuous speech signal in real world environment

Provat Kumar Pal, Santanu Phadikar

2015 IEEE International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN) > 381 - 385

Accurately identifying word endpoints is an important step in the speech recognition process. This paper proposes a robust word endpoint detection algorithm for continuous speech signals collected from real-world environments. In this process, the energy feature is used along with the zero-crossing rate feature to locate the endpoints of words in the speech signal. A set of 100 different sentences has been recorded...

chapter

Speech enhancement based on robust NMF solved by alternating direction method of multipliers

Yinan Li, Xiongwei Zhang, Meng Sun, Jingfeng Pan

2015 IEEE 17th International Workshop on Multimedia Signal Processing (MMSP) > 1 - 5

A robust version of non-negative matrix factorization (RNMF) with generalized Kullback-Leibler divergence, designed for the task of unsupervised monaural speech enhancement, is proposed. RNMF tackles the unsupervised speech enhancement problem by factorizing the magnitude spectrum of the mixture into the sum of a non-negative sparse matrix and a non-negative low-rank matrix. The parameters of nonnegative...

chapter

Separation of singing voice from music accompaniment using matrix factorization method

Harshada Burute, P. B. Mane

2015 International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT) > 166 - 171

Songs play an important role in entertainment. An audio signal separation system should be able to identify different audio signals such as speech, music and background noise. In a song the singing voice provides useful information. An automatic singing voice separation system is used for attenuating or removing the music accompaniment. The singing voice becomes a main attractive focus of attention...

chapter

New features for emotional speech recognition

Hemanta Kumar Palo, Mihir Narayan Mohanty, Mahesh Chandra

2015 IEEE Power, Communication and Information Technology Conference (PCITC) > 424 - 429

Biomedical research increasingly extends to the human voice and auditory systems, and it also helps with security issues. Emotion analysis and recognition for such purposes is a challenging task. This work attempts to analyze and recognize emotions. Initially, sub-band spectral features have been extracted to characterize high arousal angry, happy, fear, surprise and...

chapter

On multimodality in the perception of emotions from materials of the HuComTech corpus

Laszlo Hunyadi

2015 6th IEEE International Conference on Cognitive Infocommunications (CogInfoCom) > 489 - 492

Emotions are important constituents of human behavior. The production and perception of cues of emotions is a complex task involving both verbal and nonverbal aspects of behavior. This complexity is further enhanced by the fact that emotions are subject to interpretation; a resulting emotion cannot be compositionally derived from its constituent building blocks. Even though we commonly associate an...

chapter

From simulated speech to natural speech, what are the robust features for emotion recognition?

Ya Li, Linlin Chao, Yazhu Liu, Wei Bao, more

2015 International Conference on Affective Computing and Intelligent Interaction (ACII) > 368 - 373

The earliest research on emotion recognition started with simulated/acted stereotypical emotional corpora and then extended to elicited corpora. Recently, the demand for real applications has forced research to shift to natural and spontaneous corpora. Previous research shows that emotion recognition accuracy declines gradually from simulated speech to elicited and totally natural speech. This paper...

chapter

Robot audition based Acoustic Event Identification using a Bayesian model considering spectral and temporal uncertainties

Keisuke Nakamura, Kazuhiro Nakadai

2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) > 4840 - 4845

To analyze the auditory scenes of a robot's surrounding environment, not only speech but also non-speech sounds are important; these are spatially distributed and have different spectral and temporal characteristics. Thus, this paper investigates Acoustic Event Identification (AEI), which includes the problems of localization, detection, and identification of sound sources. To achieve AEI by a robot in a...

chapter

Robustness analysis of automatic speech signal recognition system against factors degrading speech signal

Jaroslaw Oska, Jaroslaw Wojtun, Krzysztof Wodecki, Zbigniew Piotrowski

2015 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA) > 71 - 75

This article presents the results of research on the influence of the lossy compression used in the G.711, G.723.1 and iLBC codecs on the efficiency of isolated speech phrase recognition. In the research, the degree of robustness against degrading factors in the LPCC and MFCC audio signal parameterisation methods (Linear Prediction Cepstral Coefficients, Mel Frequency Cepstral Coefficients)...

chapter

EMD based clear recursive thresholding (EMD-CRT) for speech enhancement

Nageswara Rao Saggurti, Jaya Shankar

2015 International Conference on Signal Processing, Computing and Control (ISPCC) > 149 - 154

In this paper, a novel speech enhancement approach is proposed to improve the quality of speech contaminated with various types of non-stationary noise. The approach, EMD based clear recursive thresholding (EMD-CRT), is inspired by wavelet thresholding. It performs the thresholding operation on the noisy speech recursively, such that the non-stationary noises...

chapter

Improving Robustness of Speaker Recognition in Noisy and Reverberant Conditions via Training

Ahmed H. Al-Noori, Khamis A. Al-Karawi, Francis F. Li

2015 European Intelligence and Security Informatics Conference > 180

2015 European Intelligence and Security Informatics Conference (EISIC)

Speaker recognition can be used as a security means to authenticate the speaker or as a forensic tool to determine who is likely to be the talker. For such critical applications, robustness or reliability of the system is crucial. In spite of the development and advancement in the field of speaker recognition, there are still many limitations and challenges. Amongst these, environment factors, in...

chapter

An Automatic Watermarking in CELP Speech Codec Based on Formant Tuning

Erick Christian Garcia Alvarez, Shengbei Wang, Masashi Unoki

2015 International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP) > 160 - 163

This paper proposes the unification of the code-excited linear prediction (CELP) codec process with watermarking based on formant tuning. The serial problem in watermarking and then encoding with the CELP codec was thereby reduced by using the proposed method, which also increased the bit detection rate. We took advantage of two key properties: I) humans do not perceive alterations applied to formants...

chapter

Privacy-enhanced perceptual hashing of audio data

Heiko Knospe

2013 International Conference on Security and Cryptography (SECRYPT) > 1 - 6

Audio hashes are compact and robust representations of audio data and allow the efficient identification of specific recordings and their transformations. Audio hashing for music identification is well established and similar algorithms can also be used for speech data. A possible application is the identification of replayed telephone spam. This contribution investigates the security and privacy...

INFONA - science communication portal
