Search results

Items from 1 to 13 out of 13 results

chapter

Exploring recurrent neural network based acoustic and linguistic modeling for children's speech recognition

Sreeram Ganji, Rohit Sinha

TENCON 2017 - 2017 IEEE Region 10 Conference > 2880 - 2884

TENCON 2017 - 2017 IEEE Region 10 Conference

The conventional automatic speech recognition (ASR) systems employ the GMM-HMM for acoustic modeling and the n-gram for language modeling. Over the last decade, the deep feed-forward neural network (DFNN) has almost replaced the GMM in acoustic modeling. The current ASR systems are predominantly based on the DFNN-HMM acoustic model and the n-gram language model (LM). Owing to better long-term context...

chapter

Research on unified phone set for Mandarin-Tibetan Bilingual ASR

Guanyu Li, Hongzhi Yu, Shipeng Xu

2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC) > 478 - 482

2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC)

Mandarin and Tibetan Lhasa dialect are chosen to be the research objects. Phones sets and corresponding Latin Transformation scheme of Mandarin and Tibetan Lhasa dialect are established respectively. KL distance between two GMMs are studied. GMM-HMM models for phones of two languages are trained on the basis of corpus and pronunciation dictionaries. Phones of Mandarin and Tibetan Lhasa dialect are...

chapter

Efficient deep neural networks for speech synthesis using bottleneck features

Young-Sun Joo, Won-Suk Jun, Hong-Goo Kang

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 4

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

This paper proposes a cascading deep neural network (DNN) structure for speech synthesis system that consists of text-to-bottleneck (TTB) and bottleneck-to-speech (BTS) models. Unlike conventional single structure that requires a large database to find complicated mapping rules between linguistic and acoustic features, the proposed structure is very effective even if the available training database...

chapter

On the training of DNN-based average voice model for speech synthesis

Shan Yang, Zhizheng Wu, Lei Xie

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 6

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

Adaptability and controllability are the major advantages of statistical parametric speech synthesis (SPSS) over unit-selection synthesis. Recently, deep neural networks (DNNs) have significantly improved the performance of SPSS. However, current studies are mainly focusing on the training of speaker-dependent DNNs, which generally requires a significant amount of data from a single speaker. In this...

chapter

DNN-based unit selection using frame-sized speech segments

Zhi-Ping Zhou, Zhen-Hua Ling

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP) > 1 - 5

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP)

This paper presents a deep neural network (DNN)-based unit selection method for waveform concatenation speech synthesis using frame-sized speech segments. In this method, three DNNs are adopted to calculate target costs and concatenation costs respectively for selecting frame-sized candidate units. The first DNN is built in the same way as the DNN-based statistical parametric speech synthesis, which...

chapter

Cross-linguistic perception of Chinese attitudes praising and blaming

Ping Tang, Lei Liu, Shanpeng Li, Wentao Gu

2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE) > 113 - 117

2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)

This study compared the perceptions of Chinese sentences conveying the attitudinal contrast of praising and blaming by five groups of subjects (Chinese natives, Japanese L2 learners of Mandarin, French L2 learners of Mandarin, Japanese and French subjects without any Mandarin ability). Context-elicited target sentences conveying praising, blaming or neutral attitude were used as stimuli in the listening...

chapter

Multi-speaker modeling and speaker adaptation for DNN-based TTS synthesis

Yuchen Fan, Yao Qian, Frank K. Soong, Lei He

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4475 - 4479

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In DNN-based TTS synthesis, DNNs hidden layers can be viewed as deep transformation for linguistic features and the output layers as representation of acoustic space to regress the transformed linguistic features to acoustic parameters. The deep-layered architectures of DNN can not only represent highly-complex transformation compactly, but also take advantage of huge amount of training data. In this...

chapter

Multi-view features in a DNN-CRF model for improved sentence unit detection on English broadcast news

Guangpu Huang, Chenglin Xu, Xiong Xiao, Lei Xie, more

Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific > 1 - 9

2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

This paper presents a deep neural network-conditional random field (DNN-CRF) system with multi-view features for sentence unit detection on English broadcast news. We proposed a set of multi-view features extracted from the acoustic, articulatory, and linguistic domains, and used them together in the DNN-CRF model to predict the sentence boundaries. We tested the accuracy of the multi-view features...

chapter

Comparison between decision-level and feature-level fusion of acoustic and linguistic features for spontaneous emotion recognition

Santiago Planet, Ignasi Iriondo

7th Iberian Conference on Information Systems and Technologies (CISTI 2012) > 1 - 6

2012 7th Iberian Conference on Information Systems and Technologies (CISTI)

Detection of affective states in speech could improve the way users interact with electronic devices. However the analysis of speech at the acoustic level could be not enough to determine the emotion of a user speaking in a realistic scenario. In this paper we analysed the spontaneous speech recordings of the FAU Aibo Corpus at the acoustic and linguistic levels to extract two sets of acoustic and...

chapter

Morpheme concatenation approach in language modeling for large-vocabulary Uyghur speech recognition

Mijit Ablimit, Askar Hamdulla, Tatsuya Kawahara

2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) > 112 - 115

2011 Oriental COCOSDA 2011 - International Conference on Speech Database and Assessments

For large-vocabulary continuous speech recognition (LVCSR) of highly-inflected languages, selection of an appropriate recognition unit is the first important step. The morpheme-based approach is often adopted because of its high coverage and linguistic properties. But morpheme units are short, often consisting of one or two phonemes, thus they are more likely to be confused in ASR than word units...

chapter

Integrating frame-based and segment-based dynamic time warping for unsupervised spoken term detection with spoken queries

Chun-an Chan, Lin-shan Lee

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5652 - 5655

ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Rapidly increasing quantities of multimedia and spoken content today demand fast and accurate retrieval approaches for convenient browsing. The spoken documents with wide variety of different acoustic and linguistic conditions make supervised training of well-matched acoustic/language models very difficult. Unsupervised methods using frame-based dynamic time warping (DTW) require no acoustic/language...

chapter

Transcription-based video genre classification

Stanislas Oger, Mickael Rouvier, Georges Linares

2010 IEEE International Conference on Acoustics, Speech and Signal Processing > 5114 - 5117

2010 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2010

In this paper, we present a new method for video genre identification based on the linguistic content analysis. This approach relies on the analysis of the most frequent words in the video transcriptions provided by an automatic speech recognition system. Experiments are conducted on a corpus composed of cartoons, movies, news, commercials, documentary, sport and music. On this 7-genre identification...

chapter

The research and implementation of acoustic module based Mandarin TTS

Cheng-Yu Yeh, Kuan-Lin Chen

2010 4th International Symposium on Communications, Control and Signal Processing (ISCCSP) > 1 - 4

4th International Symposium on Communications, Control and Signal Processing (ISCCSP 2010)

The primary study of this paper is focused on the acoustic module (AM) design in order to improve the performance of Mandarin TTS system. The AM is composed of the prosody generator, the spectrum generator, and the speech synthesizer. The HMM, recurrent neural network (RNN), and PSOLA algorithms are employed to build the AM. Finally, the performance analyses including the speech quality, memory requirement,...

Filter options

Keywords:
TRAINING
ACOUSTICS
PRAGMATICS

Publication date

Set your own date range

Keywords

SPEECH (10)
HIDDEN MARKOV MODELS (7)
FEATURE EXTRACTION (4)
SPEECH SYNTHESIS (3)
ADAPTATION MODELS (2)
CONTEXT (2)
DATABASES (2)
EMOTION RECOGNITION (2)
MANDARIN (2)
SPEECH RECOGNITION (2)
TRAINING DATA (2)
ACOUSTIC FEATURES (1)
ACOUSTIC MODELING (1)
ACOUSTIC MODULE DESIGN (1)
ACOUSTIC SIGNAL PROCESSING (1)
ANALYSIS OF VARIANCE (1)
ARTIFICIAL NEURAL NETWORKS (1)
AUDIO-BASED VIDEO PROCESSING (1)
AUTOMATIC SPEECH RECOGNITION (1)
AUTOMATIC SPEECH RECOGNITION SYSTEM (1)
BILINGUAL ASR (1)
BLAMING ATTITUDE (1)
CLASSIFICATION ALGORITHMS (1)
CLUSTERING ALGORITHMS (1)
COMPUTATIONAL COMPLEXITY (1)
COMPUTATIONAL MODELING (1)
CONTEXT MODELING (1)
CORRELATION (1)
COVARIANCE MATRICES (1)
CROSS-LINGUISTIC PERCEPTION (1)
DATA MODELS (1)
DECISION-LEVEL FUSION (1)
DEEP NEURAL NETWORK (1)
DEEP NEURAL NETWORKS (1)
DYNAMIC TIME WARPING (1)
FEATURE-LEVEL FUSION (1)
FIXED-POINT DSP CHIP (1)
FRENCH (1)
GENERATORS (1)
GMM-HMM (1)
HMM (1)
IP NETWORKS (1)
JAPANESE (1)
L2 LEARNERS (1)
LANGUAGE MODEL (1)
LANGUAGE MODELING (1)
LINGUISTIC CONTENT ANALYSIS (1)
LINGUISTIC FEATURE EXTRACTION (1)
LINGUISTIC FEATURES (1)
LINGUISTIC-LEVEL FEATURES (1)
LOGISTICS (1)
LOW-LEVEL ACOUSTIC FEATURES (1)
MANDARIN TTS SYSTEM (1)
MORPHEME (1)
MULTI-TASK LEARNING (1)
MULTIMEDIA COMMUNICATION (1)
PITCH-SYNCHRONOUS OVERLAP-ADD APPROACH (1)
PRAISING ATTITUDE (1)
PROBABILITY DENSITY FUNCTION (1)
PROSODY GENERATOR (1)
PSOLA ALGORITHMS (1)
RECURRENT NEURAL NETS (1)
RECURRENT NEURAL NETWORK (1)
RECURRENT NEURAL NETWORKS (1)
SPECTRUM GENERATOR (1)
SPEECH QUALITY (1)
SPEECH SYNTHESIZER (1)
SPOKEN TERM DETECTION (1)
SPONTANEOUS SPEECH (1)
STATISTICAL PARAMETRIC SPEECH SYNTHESIS (1)
SUPPORT VECTOR MACHINES (1)
TIBETAN (1)
TONGUE (1)
TRANSCRIPTION-BASED VIDEO GENRE CLASSIFICATION (1)
TRANSFER LEARNING (1)
UNIFIED PHONE SET (1)
UNIT SELECTION (1)
UYGHUR (1)
VIDEO GENRE CLASSIFICATION (1)
VIDEO RETRIEVAL (1)
VOCABULARY (1)
more

INFONA - science communication portal

Search results

Add recipient

Sending message cancelled

Are you sure you want to cancel sending this message?

Send message

Filter options

Publication date

Date range setting

Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.

Keywords

Reporting an error / abuse

Sending the report failed

Accessibility options