This paper presents a method to extract structural spectral features from spectral envelopes using what-where autoencoders (WWAE) for statistical parametric speech synthesis (SPSS). A WWAE is constructed by concatenating a convolutional net for input encoding and a deconvolutional net for reconstruction. The output values of the max-pooling layer in the encoder and the positions of the max-pooling...
When using connectionist temporal classification (CTC) based acoustic models (AMs) for large vocabulary continuous speech recognition (LVCSR), most previous studies have used a naive interpolation of the CTC-AM score and an additional language model score, although there is no theoretical justification for such an approach. On the other hand, we recently proposed a theoretically more sound decoding...
Sequence-to-sequence models have shown success in end-to-end speech recognition. However, these models have used only shallow acoustic encoder networks. In our work, we successively train very deep convolutional networks to add more expressive power and better generalization for end-to-end ASR models. We apply network-in-network principles, batch normalization, residual connections and convolutional...
Deep learning has significantly advanced the state of the art in speech recognition in the past few years. However, compared to conventional Gaussian mixture acoustic models, neural network models are usually much larger, and are therefore difficult to deploy in embedded devices. Previously, we investigated a compact highway deep neural network (HDNN) for acoustic modelling, which is a type of depth-gated...
This paper proposes a long short-term memory recurrent neural network (LSTM-RNN) for extracting melody and simultaneously detecting regions of melody from polyphonic audio using the proposed harmonic sum loss. The previous state-of-the-art algorithms have not been based on machine learning techniques and certainly not on deep architectures. The harmonics structure in melody is incorporated in the...
Dropout, the random dropping out of activations according to a specified rate, is a very simple but effective method to avoid over-fitting of deep neural networks to the training data.
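The mechanism described above can be sketched in a few lines. This is a minimal NumPy illustration of "inverted" dropout, not the implementation from any of the listed papers; the function name and the rescaling convention are assumptions for illustration.

```python
import numpy as np

def dropout(activations, rate, rng=None):
    """Randomly zero each activation with probability `rate`.

    Surviving activations are rescaled by 1/(1 - rate) so that the
    expected value of each unit is unchanged; at inference time the
    layer can then simply be skipped. Applied only during training.
    """
    rng = rng or np.random.default_rng()
    mask = rng.random(activations.shape) >= rate  # True = keep
    return activations * mask / (1.0 - rate)

x = np.ones((4, 3))
y = dropout(x, rate=0.5)
# each entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```

Because a fresh random mask is drawn on every forward pass, the network cannot rely on any single activation, which is what discourages over-fitting.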
In this paper we present an extension of our previously described neural machine translation based system for punctuated transcription. This extension allows the system to map from per frame acoustic features to word level representations by replacing the traditional encoder in the encoder-decoder architecture with a hierarchical encoder. Furthermore, we show that a system combining lexical and acoustic...
Natural language understanding and dialogue policy learning are both essential in conversational systems that predict the next system actions in response to a current user utterance. Conventional approaches aggregate separate models of natural language understanding (NLU) and system action prediction (SAP) as a pipeline that is sensitive to noisy outputs of error-prone NLU. To address the issues,...
Exemplar-based methods for voice conversion often use a large number of randomly-selected exemplars to ensure good coverage. As a result, the factorization step can be costly. This paper presents two algorithms that can be used to construct compact sets of exemplars. The first algorithm uses a forward selection procedure to build the exemplar set sequentially, selecting exemplar pairs that minimize...
Acoustic unit discovery (AUD) is a process of automatically identifying a categorical acoustic unit inventory from speech and producing corresponding acoustic unit tokenizations. AUD provides an important avenue for unsupervised acoustic model training in a zero resource setting where expert-provided linguistic knowledge and transcribed speech are unavailable. Therefore, to further facilitate zero-resource...
It is very important to exploit abundant unlabeled speech to improve acoustic model training in automatic speech recognition (ASR). Semi-supervised training methods incorporate unlabeled data in addition to labeled data to enhance model training, but they suffer from the problem of error-prone labels. The ensemble training scheme trains a set of models and combines them to make the model more...
In recent years, so-called “end-to-end” speech recognition systems have emerged as viable alternatives to traditional ASR frameworks. Keyword search, localizing an orthographic query in a speech corpus, is typically performed by using automatic speech recognition (ASR) to generate an index. Previous work has evaluated the use of end-to-end systems for ASR on well known corpora (WSJ, Switchboard,...
This paper proposes a novel training algorithm for high-quality Deep Neural Network (DNN)-based speech synthesis. The parameters of synthetic speech tend to be over-smoothed, and this causes significant quality degradation in synthetic speech. The proposed algorithm takes into account an Anti-Spoofing Verification (ASV) as an additional constraint in the acoustic model training. The ASV is a discriminator...
Batch normalization is a standard technique for training deep neural networks. In batch normalization, the input of each hidden layer is first mean-variance normalized and then linearly transformed before applying non-linear activation functions. We propose a novel unsupervised speaker adaptation technique for batch normalized acoustic models. The key idea is to adjust the linear transformations previously...
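The batch normalization step described in this abstract, normalize each hidden-layer input over the batch and then apply a learned linear transform, can be illustrated as follows. This is a minimal NumPy sketch of the standard training-time computation, not the adaptation method of the paper; the names `gamma` and `beta` for the linear transform are the conventional ones and are assumed here.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Mean-variance normalize each feature over the batch, then apply
    the learned per-feature linear transform (gamma, beta).

    x: (batch, features). The result is typically passed to a
    non-linear activation. It is this (gamma, beta) transform that a
    speaker-adaptation scheme could adjust per speaker.
    """
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta
```

With `gamma = 1` and `beta = 0` the output of each feature has (approximately) zero mean and unit variance over the batch; adapting these two per-feature parameters changes the scale and shift seen by the following nonlinearity without touching the main weight matrices.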
Subspace methods are used for deep neural network (DNN)-based acoustic model adaptation. These methods first construct a subspace and then perform the speaker adaptation as a point in the subspace. This paper aims to investigate the effectiveness of subspace methods for robust unsupervised adaptation. For the analysis, we compare two state-of-the-art subspace methods, namely, the singular value decomposition...
DNNs have shown remarkable performance in multilingual scenarios; however, these models are often so large that adaptation to a target language with a relatively small amount of data cannot be accomplished well. In our previous work, we utilized Low-Rank Factorization (LRF) using singular value decomposition for multilingual DNNs to learn compact models which can be adapted more successfully...
To advance the performance of continuous emotion recognition from speech, we introduce a reconstruction-error-based (RE-based) learning framework with memory-enhanced Recurrent Neural Networks (RNN). In the framework, two successive RNN models are adopted, where the first model is used as an autoencoder for reconstructing the original features, and the second is employed to perform emotion prediction...
In this paper we aim to enhance keyword search for conversational telephone speech under low-resourced conditions. Two techniques to improve the detection of out-of-vocabulary keywords are assessed in this study: using extra text resources to augment the lexicon and language model, and via subword units for keyword search. Two approaches for data augmentation are explored to extend the limited amount...
In statistical parametric speech synthesis (SPSS), a few studies have investigated the Lombard effect, specifically by using hidden Markov model (HMM)-based systems. Recently, artificial neural networks have demonstrated promising results in SPSS, specifically by using long short-term memory recurrent neural networks (LSTMs). The Lombard effect, however, has not been studied in the LSTM-based speech...
Articulatory information can effectively model variability in speech and can improve speech recognition performance under varying acoustic conditions. Learning speaker-independent articulatory models has always been challenging, as speaker-specific information in the articulatory and acoustic spaces increases the complexity of the speech-to-articulatory space inverse modeling, which is already an...