We propose a sudden-noise suppression method for speech recognition using a phase linearity feature for noise detection. Our investigation of sound data recorded in actual retail stores shows that short, sudden noises are dominant in such environments. We also confirm the negative effect of such noises on speech recognition performance. Our method addresses this problem by focusing on sudden noises...
We describe a method of lexicon expansion to tackle the variations of spontaneous speech. Such variations are widely found in programs such as conversational talk shows and are typically observed as unintelligible utterances with a high speech rate. Unlike read speech in news programs, these variations often severely degrade automatic speech recognition (ASR) performance. These variations...
For effective pronunciation error detection for second language learners, we develop articulatory models based on deep neural networks (DNNs). Articulatory attributes are defined for the manner and place of articulation. In order to train these models efficiently without using non-native speech data, which are difficult to collect on a large scale, we propose a multi-lingual learning method, in which...
We present the AP16-OL7 database, which was released as the training and test data for the oriental language recognition (OLR) challenge at APSIPA 2016. Based on the database, a baseline system was constructed on the basis of the i-vector model. We report the baseline results under the various metrics defined by the AP16-OLR evaluation plan and demonstrate that AP16-OL7 is a reasonable data resource...
This paper reports on the construction of a multi-modal Mandarin-Tibetan speech database collected from native speakers of WeiZang dialect. The Mandarin-Tibetan corpus contains 41 Tibetan sentences, 27 Chinese sentences, 30 Tibetan consonants, 4 Tibetan vowels, and 25 Tibetan monosyllables. A multi-modal data collection system was established, which comprises an ultrasound scanner, high-speed camera,...
In this paper, we investigate a DNN tone-based extended recognition network (ERN) approach to Mandarin tone recognition and tone mispronunciation detection. Given a toneless syllable sequence, a tone-based ERN is constructed by assigning five different tones to each toneless syllable, obtaining a fully expanded tonal syllable network. Next, Viterbi decoding is carried out on the tone-based ERN to...
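The network expansion described in this abstract can be illustrated with a minimal sketch. The syllables below are hypothetical examples, and the tone labels simply enumerate Mandarin's four lexical tones plus the neutral tone; the actual ERN construction in the paper may differ in detail:

```python
TONES = [1, 2, 3, 4, 5]  # four lexical tones plus the neutral tone (labelled 5)

def expand_tonal_network(toneless_syllables):
    """Fully expand a toneless syllable sequence: each position in the
    network lists all five tonal candidates for that syllable. A decoder
    such as Viterbi would then score one candidate per position against
    the acoustics and return the best tonal path."""
    return [[f"{syl}{tone}" for tone in TONES] for syl in toneless_syllables]

network = expand_tonal_network(["ma", "hao"])
# 2 positions x 5 candidates each -> 5**2 = 25 possible tonal paths
```

Because every position carries all five candidates, the network size grows linearly with the utterance length while the number of paths grows exponentially, which is why a dynamic-programming search such as Viterbi decoding is used to pick the best path.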
This paper presents an overview of the studies that have been conducted with the purpose of understanding the use of brain signals as input to a speech recogniser. The studies have been categorised based on the type of technology used, with a summary of the methodologies and the results achieved. In addition, the paper gives an insight into some studies that examined the effect of the chosen stimuli...
Speech emotion recognition is still a challenging problem despite having been investigated over the last couple of decades. Conventional speech emotion recognition performance is low, but it may be improved by considering new features and a new annotation method. In this paper, we first use glottal features for speech emotion recognition to improve its performance, because the emotions are related...
Deep neural network (DNN) acoustic models have advanced significantly in recent years, outperforming the traditional Gaussian mixture hidden Markov model (GMM-HMM) in large-vocabulary continuous speech recognition tasks. We aim to develop a practical Lhasa Tibetan ASR system. For higher speech recognition accuracy, in this paper we investigate the performance of Tibetan acoustic...
In this study, we propose a regression approach via deep neural network (DNN) for unsupervised speech separation in a single-channel setting. We rely on a key assumption that two speakers could be well segregated if they are not too similar to each other. A dissimilarity measure between two speakers is then proposed to characterize the separation ability between competing speakers. We demonstrate...
Speech emotion recognition is a challenging and significant task. On the one hand, the emotion features need to be robust enough to capture the emotion information; on the other, the machine learning algorithms need to be insensitive when modeling the utterance. In this paper, we present a novel framework for speech emotion recognition that addresses these two challenges. Relative Entropy...
This paper proposes a method for automatic pronunciation assessment of Korean spoken by L2 learners by selecting the best feature set from a collection of the most well-known features in the literature. The L2 Korean Speech Corpus is used for assessment modeling, where the native languages of the L2 learners are English, Chinese, Japanese, Russian, and Mongolian. In our system, learners' speech is...
With rapid developments in the design of deep architecture models and learning algorithms, methods referred to as deep learning have come to be widely used in a variety of research areas such as pattern recognition, classification, and signal processing. Deep learning methods are being applied in various recognition tasks such as image, speech, and music recognition. Convolutional Neural Networks...
I-vector adaptation of DNN-HMM acoustic models has shown clear performance improvements for speech recognition. In this paper, we study this technique on the Babel task. We use Swahili as the target language (50 hours of training data) and another six languages as multilingual resources to train i-vector extractors. Our study shows that i-vector extractors trained with more multilingual data only...
We propose a flexible framework for spectral conversion (SC) that facilitates training with unaligned corpora. Many SC frameworks require parallel corpora, phonetic alignments, or explicit frame-wise correspondence for learning conversion functions or for synthesizing a target spectrum with the aid of alignments. However, these requirements gravely limit the scope of practical applications of SC due...
Although highly correlated, speech and speaker recognition have been regarded as two independent tasks and studied by two communities. This is certainly not the way that people behave: we decipher both speech content and speaker traits at the same time. This paper presents a unified model that performs speech and speaker recognition simultaneously. The model is based on a unified neural...
This paper describes our scheme for translating spoken English lectures into Japanese, consisting of an English automatic speech recognition (ASR) system that utilizes a deep neural network (DNN) and an English-to-Japanese phrase-based statistical machine translation (SMT) system. We focused on domain adaptation of the acoustic and translation models. For domain adaptation of the translation model, frequently...
For text-independent short-utterance speaker recognition (SUSR), performance often degrades dramatically. This paper presents a combination approach to the SUSR task with two phonetic-aware systems: one is a DNN-based i-vector system and the other is our recently proposed subregion-based GMM-UBM system. The former employs phone posteriors to construct an i-vector model in which the shared statistics...
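The paper's specific combination scheme is truncated in this abstract. Purely as a generic illustration, a combination of two scoring systems is often realised as score-level fusion: each system's trial scores are normalised to a common scale and then mixed with a weight. The sketch below shows that pattern under these assumptions and is not the authors' method:

```python
import statistics

def znorm(scores):
    """Zero-mean, unit-variance normalisation so that scores from two
    differently-scaled systems become comparable before mixing."""
    mu = statistics.mean(scores)
    sd = statistics.pstdev(scores) or 1.0  # guard against constant scores
    return [(s - mu) / sd for s in scores]

def fuse(scores_a, scores_b, w=0.5):
    """Score-level fusion: weighted sum of per-system normalised scores.
    w is the weight given to system A; 1 - w goes to system B."""
    a, b = znorm(scores_a), znorm(scores_b)
    return [w * x + (1.0 - w) * y for x, y in zip(a, b)]
```

In practice the weight w would be tuned on a held-out set, and the fused score compared against a decision threshold as in a single-system setup.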
While dysarthric speech recognition can be a convenient interface for dysarthric speakers, it is hard to collect enough speech data to overcome the underestimation problem of acoustic models. In addition, there are many pronunciation variations in the collected database due to the paralysis of the articulators of dysarthric speakers. Thus, a discriminative training method is proposed for improving...
Research on multilingual speech recognition remains attractive yet challenging. Recent studies focus on learning shared structures under the multi-task paradigm, in particular a feature-sharing structure. This approach has been found effective in improving performance on each individual language. However, it is only useful when the deployed system supports just one language. In a true multilingual...