2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

chapter

SMT-based lexicon expansion for broadcast transcription

Manon Ichiki, Aiko Hagiwara, Hitoshi Ito, Kazuo Onoe, more

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 4

We describe a method of lexicon expansion to tackle variations in spontaneous speech. Such variations are found widely in programs such as conversational talk shows and are typically observed as unintelligible utterances with a high speech rate. Unlike read speech in news programs, these variations often severely degrade automatic speech recognition (ASR) performance. Then, these variations...

chapter

Improved keyword spotting based on keyword/garbage models

Qiyu Chen, Weibin Zhang, Xiangmin Xu, Xiaofen Xing

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 4

We propose two simple methods to improve the performance of a keyword spotting system. In our application, users are allowed to change the keywords at any time. Thus we focused on phone-based GMM-HMM models, since they do not require keyword-specific training data. However, GMM-HMM based models usually have a very high false alarm rate, i.e., a keyword is not present but the system gives...

chapter

Evaluation of singing enthusiasm for songs with multiple phrases

Pei Pei Chen, Shyh-Kang Jeng, Jyh-Shing Roger Jang, Nobutaka Ono

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 4

A system for automatically evaluating singing enthusiasm is proposed in this study. Singing enthusiasm is defined as how much enthusiasm is perceived in the song being evaluated. The system evaluates singing enthusiasm on the basis of pitch accuracy, vibrato, diminuendo, roughness, and the correlation between pitch and loudness. A support vector regression (SVR) machine is used for the evaluation...

chapter

Colorimetric background estimation for color blending reduction of OST-HMD

Je-Ho Ryu, Jae-Woo Kim, Kang-Kyu Lee, Jong-Ok Kim

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 4

In an augmented reality scenario, the image perceived through an optical see-through head-mounted display (OST-HMD) contains color distortion due to background color blending. To reduce color blending, accurate estimation of the background color is necessary. In this paper, we perform colorimetric estimation of the background from camera images via local linear regression. The virtual image is then compensated using the estimated background color. Experimental...

chapter

Classification of footstep attributes using a vibration sensor

Futoshi Asano, Miyuki Fukushima

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 5

The evacuation of children and the elderly from disaster areas is sometimes difficult. This study aims to use a vibration sensor to estimate situations involving people who remain in a devastated building. This paper proposes a method to estimate the attributes of the people, such as their age or sex, based on the vibration data produced by their footsteps. The vibration data obtained through sensors...

chapter

Deep networks with stochastic depth for acoustic modelling

Duisheng Chen, Weibin Zhang, Xiangmin Xu, Xiaofeng Xing

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 4

Training very deep neural networks is difficult because of gradient degradation. However, the expressiveness of many deep layers is highly desirable at testing time and usually leads to better performance. Recently, training techniques such as residual networks, which enable us to train very deep networks, have proved to be a great success. In this paper, we studied the application...

chapter

DNN based detection of pronunciation erroneous tendency in data sparse condition

Yingming Gao, Yanlu Xie, Ju Lin, Jinsong Zhang

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 5

Detecting pronunciation erroneous tendency (PET) can provide second-language learners with detailed, instructive feedback in computer-aided pronunciation training (CAPT) systems. Due to data sparseness, DNN-HMM achieved limited improvement over GMM-HMM in our previous work. Instead of directly employing DNN-HMM to detect PETs, this paper investigated how to further improve the performance...

chapter

A study on target feature activation and normalization and their impacts on the performance of DNN based speech dereverberation systems

Bo Wu, Kehuang Li, Minglei Yang, Chin-Hui Lee

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 4

We adopt a linear activation function at the output layer and globally normalize the target features into zero mean and unit variance to learn the complicated mapping from reverberant to anechoic speech with a regression model based on deep neural networks (DNNs). The proposed feature activation and normalization framework was found to retain clearly observable harmonics and improve the speech quality...

chapter

Enhancing a glossectomy patient's speech via GMM-based voice conversion

Kei Tanaka, Sunao Hara, Masanobu Abe, Shogo Minagi

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 4

In this paper, we describe the use of a voice conversion algorithm for improving the intelligibility of speech by patients with articulation disorders caused by a wide glossectomy and/or segmental mandibulectomy. As a first trial, to demonstrate the difficulty of the task at hand, we implemented a conventional Gaussian mixture model (GMM)-based algorithm using a frame-by-frame approach. We compared...

chapter

Optimal automatic speech recognition system selection for noisy environments

Yuuki Tachioka, Tomohiro Narita

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 8

To improve the performance of noisy automatic speech recognition (ASR), it is effective to prepare multiple ASR systems that can address the large varieties of noise. However, the optimal ASR system is different for each environment and mismatches between training and testing degrade ASR performance. In this situation, the overall system combination of multiple systems is effective; however, the computational...

