Items from 101 to 120 out of 937 results

chapter

Extracting structural spectral features using what-where auto-encoders for statistical parametric speech synthesis

Ya-Jun Hu, Zhen-Hua Ling, Li-Rong Dai

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4915 - 4919

This paper presents a method to extract structural spectral features from spectral envelopes using what-where autoencoders (WWAE) for statistical parametric speech synthesis (SPSS). A WWAE is constructed by concatenating a convolutional net for input encoding and a deconvolutional net for reconstruction. The output values of the max-pooling layer in the encoder and the positions of the max-pooling...

chapter

Minimum Bayes risk training of CTC acoustic models in maximum a posteriori based decoding framework

Naoyuki Kanda, Xugang Lu, Hisashi Kawai

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4855 - 4859

When using connectionist temporal classification (CTC) based acoustic models (AMs) for large vocabulary continuous speech recognition (LVCSR), most previous studies have used a naive interpolation of the CTC-AM score and an additional language model score, although there is no theoretical justification for such an approach. On the other hand, we recently proposed a theoretically more sound decoding...

chapter

Very deep convolutional networks for end-to-end speech recognition

Yu Zhang, William Chan, Navdeep Jaitly

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4845 - 4849

Sequence-to-sequence models have shown success in end-to-end speech recognition. However, these models have only used shallow acoustic encoder networks. In our work, we successively train very deep convolutional networks to add more expressive power and better generalization for end-to-end ASR models. We apply network-in-network principles, batch normalization, residual connections and convolutional...

chapter

Knowledge distillation for small-footprint highway networks

Liang Lu, Michelle Guo, Steve Renals

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4820 - 4824

Deep learning has significantly advanced the state of the art in speech recognition in the past few years. However, compared to conventional Gaussian mixture acoustic models, neural network models are usually much larger, and are therefore difficult to deploy on embedded devices. Previously, we investigated a compact highway deep neural network (HDNN) for acoustic modelling, which is a type of depth-gated...

chapter

Noisy objective functions based on the f-divergence

Markus Nussbaum-Thom, Ralf Schluter, Vaibhava Goel, Hermann Ney

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2327 - 2331

Dropout, the random dropping out of activations according to a specified rate, is a very simple but effective method to avoid over-fitting of deep neural networks to the training data.

chapter

Training variance and performance evaluation of neural networks in speech

Ewout van den Berg, Bhuvana Ramabhadran, Michael Picheny

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2287 - 2291

In this work we study variance in the results of neural network training on a wide variety of configurations in automatic speech recognition. Although this variance itself is well known, this is, to the best of our knowledge, the first paper that performs an extensive empirical study on its effects in speech recognition. We view training as sampling from a distribution and show that these distributions...

chapter

Combining unidirectional long short-term memory with convolutional output layer for high-performance speech synthesis

Wenfu Wang, Bo Xu

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5500 - 5504

In this paper, we aim to improve the accuracy of acoustic modelling for statistical parametric speech synthesis (SPSS) and introduce the convolutional neural network (CNN) because of its strong capacity for modelling locality. A novel model architecture combining unidirectional long short-term memory (LSTM) and a time-domain convolutional output layer (COL) is proposed and employed for acoustic modelling...

chapter

Fast tagging of natural sounds using marginal co-regularization

Qiang Huang, Yong Xu, Philip J. B. Jackson, Wenwu Wang, et al.

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2991 - 2995

Automatic and fast tagging of natural sounds in audio collections is a very challenging task due to wide acoustic variations, the large number of possible tags, and the incomplete and ambiguous tags provided by different labellers. To handle these problems, we use a co-regularization approach to learn a pair of classifiers on sound and text. The first classifier maps low-level audio features to a true...

chapter

Sequence-to-sequence models for punctuated transcription combining lexical and acoustic features

Ondrej Klejch, Peter Bell, Steve Renals

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5700 - 5704

In this paper we present an extension of our previously described neural machine translation based system for punctuated transcription. This extension allows the system to map from per-frame acoustic features to word-level representations by replacing the traditional encoder in the encoder-decoder architecture with a hierarchical encoder. Furthermore, we show that a system combining lexical and acoustic...

chapter

Pairwise learning using multi-lingual bottleneck features for low-resource query-by-example spoken term detection

Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, et al.

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5645 - 5649

We propose to use a feature representation obtained by pairwise learning in a low-resource language for query-by-example spoken term detection (QbE-STD). We assume that word pairs identified by humans are available in the low-resource target language. The word pairs are parameterized by a multi-lingual bottleneck feature (BNF) extractor that is trained using transcribed data in high-resource languages...

chapter

An LSTM-CTC based verification system for proxy-word based OOV keyword search

Zhiqiang Lv, Jian Kang, Wei-Qiang Zhang, Jia Liu

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5655 - 5659

Proxy-word based out-of-vocabulary (OOV) keyword search has proven quite effective. In proxy-word based OOV keyword search, each OOV keyword is assigned several proxies, and detections of the proxies are regarded as detections of the OOV keywords. However, the confidence scores of these detections are still those of the proxies from lattices. To obtain a better confidence...

chapter

Exemplar selection methods in voice conversion

Guanlong Zhao, Ricardo Gutierrez-Osuna

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5525 - 5529

Exemplar-based methods for voice conversion often use a large number of randomly selected exemplars to ensure good coverage. As a result, the factorization step can be costly. This paper presents two algorithms that can be used to construct compact sets of exemplars. The first algorithm uses a forward selection procedure to build the exemplar set sequentially, selecting exemplar pairs that minimize...

chapter

Voice-transformation-based data augmentation for prosodic classification

Raul Fernandez, Andrew Rosenberg, Alexander Sorin, Bhuvana Ramabhadran, et al.

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5530 - 5534

In this work we explore data-augmentation techniques for improving the performance of a supervised recurrent-neural-network classifier that predicts prosodic-boundary and pitch-accent labels. The technique is based on applying voice transformations to the training data that modify the pitch baseline and range, as well as the vocal-tract and vocal-source characteristics of the...

chapter

Quality assessment of voice converted speech using articulatory features

Avni Rajpal, Nirmesh J. Shah, Mohammadi Zaki, Hemant A. Patil

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5515 - 5519

We propose a novel application of acoustic-to-articulatory inversion (AAI) to the quality assessment of voice-converted speech. The ability of humans to speak effortlessly requires the coordinated movements of various articulators, muscles, etc. This effortless movement contributes to naturalness, intelligibility and speaker identity (which is partially present in voice converted...

chapter

Active learning for low-resource speech recognition: Impact of selection size and language modeling data

Ali Raza Syed, Andrew Rosenberg, Michael Mandel

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5315 - 5319

Active learning aims to reduce the time and cost of developing speech recognition systems by selecting for transcription highly informative subsets from large pools of audio data. Previous evaluations at OpenKWS and IARPA BABEL have investigated data selection for low-resource languages in very constrained scenarios with 2-hour data selections given a 1-hour seed set. We expand on this to investigate...

chapter

An empirical evaluation of zero resource acoustic unit discovery

Chunxi Liu, Jinyi Yang, Ming Sun, Santosh Kesiraju, et al.

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5305 - 5309

Acoustic unit discovery (AUD) is a process of automatically identifying a categorical acoustic unit inventory from speech and producing corresponding acoustic unit tokenizations. AUD provides an important avenue for unsupervised acoustic model training in a zero resource setting where expert-provided linguistic knowledge and transcribed speech are unavailable. Therefore, to further facilitate zero-resource...

chapter

Alternative networks for monolingual bottleneck features

William Hartmann, Roger Hsiao, Stavros Tsakalidis

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5290 - 5294

While recent advances in deep neural networks have led to significant improvements in speech recognition, they have been applied mainly to acoustic and language modeling. We instead apply the models to bottleneck feature extraction. Several DNN, CNN, and BLSTM-based bottleneck feature networks are compared using both DNN and BLSTM acoustic models. Multiple variations in network architecture and feature...

chapter

Semi-supervised ensemble DNN acoustic model training

Sheng Li, Xugang Lu, Shinsuke Sakai, Masato Mimura, et al.

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5270 - 5274

Exploiting abundant unlabeled speech is very important for improving acoustic model training in automatic speech recognition (ASR). Semi-supervised training methods incorporate unlabeled data in addition to labeled data to enhance model training, but they suffer from error-prone labels. The ensemble training scheme trains a set of models and combines them to make the model more...

chapter

End-to-end speech recognition and keyword search on low-resource languages

Andrew Rosenberg, Kartik Audhkhasi, Abhinav Sethy, Bhuvana Ramabhadran, et al.

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5280 - 5284

In recent years, so-called “end-to-end” speech recognition systems have emerged as viable alternatives to traditional ASR frameworks. Keyword search, localizing an orthographic query in a speech corpus, is typically performed by using automatic speech recognition (ASR) to generate an index. Previous work has evaluated the use of end-to-end systems for ASR on well-known corpora (WSJ, Switchboard,...

chapter

Student-teacher network learning with enhanced features

Shinji Watanabe, Takaaki Hori, Jonathan Le Roux, John R. Hershey

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5275 - 5279

Recent advances in distant-talking ASR research have confirmed that speech enhancement is an essential technique for improving the ASR performance, especially in the multichannel scenario. However, speech enhancement inevitably distorts speech signals, which can cause significant degradation when enhanced signals are used as training data. Thus, distant-talking ASR systems often resort to using the...

Keywords:
TRAINING
ACOUSTICS

Content availability

Available (936)
None (1)

Publication type

book (816)
article (121)

Keywords

SPEECH (566)
HIDDEN MARKOV MODELS (487)
SPEECH RECOGNITION (429)
FEATURE EXTRACTION (218)
DATA MODELS (151)
ADAPTATION MODELS (103)
SPEECH PROCESSING (98)
TRAINING DATA (95)
NEURAL NETWORKS (93)
ACCURACY (90)
COMPUTATIONAL MODELING (85)
AUTOMATIC SPEECH RECOGNITION (75)
SUPPORT VECTOR MACHINES (74)
ARTIFICIAL NEURAL NETWORKS (71)
DATABASES (65)
VECTORS (58)
TESTING (56)
DECODING (54)
ADAPTATION MODEL (49)
NATURAL LANGUAGE PROCESSING (49)
MATHEMATICAL MODEL (47)
SPEAKER RECOGNITION (47)
ACOUSTIC SIGNAL PROCESSING (46)
NOISE (44)
ACOUSTIC MODELING (43)
CONTEXT (43)
HIDDEN MARKOV MODEL (42)
SPEECH SYNTHESIS (41)
ESTIMATION (40)
ROBUSTNESS (40)
DATA MINING (39)
SIGNAL PROCESSING (39)
DEEP NEURAL NETWORKS (38)
DEEP NEURAL NETWORK (36)
DISCRIMINATIVE TRAINING (36)
MAXIMUM LIKELIHOOD ESTIMATION (35)
LATTICES (34)
LEARNING (ARTIFICIAL INTELLIGENCE) (34)
TRANSFORMS (34)
ERROR ANALYSIS (32)
CLASSIFICATION ALGORITHMS (31)
VOCABULARY (31)
SIGNAL TO NOISE RATIO (29)
VISUALIZATION (28)
ACOUSTIC MODEL (27)
CONTEXT MODELING (27)
MACHINE LEARNING (26)
EMOTION RECOGNITION (25)
KERNEL (25)
DICTIONARIES (24)
NOISE MEASUREMENT (24)
STANDARDS (24)
OPTIMIZATION (23)
PATTERN RECOGNITION (23)
EQUATIONS (22)
GAUSSIAN PROCESSES (22)
INDEXES (22)
ALGORITHM DESIGN AND ANALYSIS (21)
EDUCATIONAL INSTITUTIONS (21)
MICROPHONES (21)
PROBABILITY (21)
SIGNAL PROCESSING ALGORITHMS (21)
CLUSTERING ALGORITHMS (20)
CONFERENCES (20)
RECURRENT NEURAL NETWORKS (20)
SPEAKER ADAPTATION (20)
COMPUTERS (19)
CORRELATION (19)
HMM (19)
SUPPORT VECTOR MACHINE CLASSIFICATION (18)
COMPUTER ARCHITECTURE (17)
GAUSSIAN MIXTURE MODEL (17)
UNSUPERVISED LEARNING (17)
COMPLEXITY THEORY (16)
DETECTORS (16)
LANGUAGE MODEL (16)
NEURAL NETS (16)
PRAGMATICS (16)
CONVOLUTION (15)
MEASUREMENT (15)
NIST (15)
PATTERN CLASSIFICATION (15)
ACOUSTIC MEASUREMENTS (14)
EVENT DETECTION (14)
KEYWORD SEARCH (14)
NEURONS (14)
PREDICTIVE MODELS (14)
SPEECH ENHANCEMENT (14)
VOICE CONVERSION (14)
JOINTS (13)
LVCSR (13)
MEL FREQUENCY CEPSTRAL COEFFICIENT (13)
SPEECH CODING (13)
SUPPORT VECTOR MACHINE (13)
TRAJECTORY (13)
APPROXIMATION METHODS (12)
DEEP LEARNING (12)
DNN (12)

INFONA - science communication portal
