Search results

chapter

End-to-end ASR-free keyword search from speech

Kartik Audhkhasi, Andrew Rosenberg, Abhinav Sethy, Bhuvana Ramabhadran, et al.

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4840 - 4844

End-to-end (E2E) systems have achieved competitive results compared to conventional hybrid hidden Markov model (HMM)-deep neural network based automatic speech recognition (ASR) systems. Such E2E systems are attractive due to the lack of dependence on alignments between input acoustic and output grapheme or HMM state sequence during training. This paper explores the design of an ASR-free end-to-end...

chapter

Investigations on byte-level convolutional neural networks for language modeling in low resource speech recognition

Kazuki Irie, Pavel Golik, Ralf Schlüter, Hermann Ney

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5740 - 5744

In this paper, we present an investigation on technical details of the byte-level convolutional layer which replaces the conventional linear word projection layer in the neural language model. In particular, we discuss and compare the effective filter configurations, pooling types and the use of bytes instead of characters. We carry out experiments on language packs released by the IARPA Babel project...

chapter

An LSTM-CTC based verification system for proxy-word based OOV keyword search

Zhiqiang Lv, Jian Kang, Wei-Qiang Zhang, Jia Liu

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5655 - 5659

Proxy-word based out-of-vocabulary (OOV) keyword search has been proven to be quite effective in keyword search. In proxy-word based OOV keyword search, each OOV keyword is assigned several proxies, and detections of the proxies are regarded as detections of the OOV keywords. However, the confidence scores of these detections are still those of the proxies from lattices. To obtain a better confidence...

chapter

End-to-end speech recognition and keyword search on low-resource languages

Andrew Rosenberg, Kartik Audhkhasi, Abhinav Sethy, Bhuvana Ramabhadran, et al.

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5280 - 5284

In recent years, so-called “end-to-end” speech recognition systems have emerged as viable alternatives to traditional ASR frameworks. Keyword search, localizing an orthographic query in a speech corpus, is typically performed by using automatic speech recognition (ASR) to generate an index. Previous work has evaluated the use of end-to-end systems for ASR on well-known corpora (WSJ, Switchboard,...

chapter

Stimulated training for automatic speech recognition and keyword search in limited resource conditions

A. Ragni, C. Wu, M. J. F. Gales, J. Vasilakes, et al.

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4830 - 4834

Training neural network acoustic models on limited quantities of data is a challenging task. A number of techniques have been proposed to improve generalisation. This paper investigates one such technique called stimulated training. It enables standard criteria such as cross-entropy to enforce spatial constraints on activations originating from different units. Having different regions being active...

chapter

Network architectures for multilingual speech representation learning

Tom Sercu, George Saon, Jia Cui, Xiaodong Cui, et al.

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5295 - 5299

Multilingual (ML) representations play a key role in building speech recognition systems for low-resource languages. The IARPA-sponsored BABEL program focuses on building automatic speech recognition (ASR) and keyword search (KWS) systems in over 24 languages with limited training data. The most common mechanism to derive ML representations in the BABEL program has been the use of a two-stage network,...

chapter

On the study of very low-resource language keyword search

Van Tung Pham, Haihua Xu, Van Hai Do, Tze Yuang Chong, et al.

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 358 - 364

In this paper we report our approaches to accomplishing the very limited resource keyword search (KWS) task in the NIST Open Keyword Search 2015 (OpenKWS15) Evaluation. We devised methods, first, to attain better acoustic modeling, multilingual and semi-supervised acoustic model training as well as exemplar-based acoustic model training; second, to address the overwhelming out-of-vocabulary...

article

Data Augmentation for Deep Neural Network Acoustic Modeling

Xiaodong Cui, Vaibhava Goel, Brian Kingsbury

IEEE/ACM Transactions on Audio, Speech, and Language Processing > 2015 > 23 > 9 > 1469 - 1477

This paper investigates data augmentation for deep neural network acoustic modeling based on label-preserving transformations to deal with data sparsity. Two data augmentation approaches, vocal tract length perturbation (VTLP) and stochastic feature mapping (SFM), are investigated for both deep neural networks (DNNs) and convolutional neural networks (CNNs). The approaches are focused on increasing...

chapter

The THUEE system for the OpenKWS14 keyword search evaluation

Meng Cai, Zhiqiang Lv, Beili Song, Yongzhe Shi, et al.

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4734 - 4738

The OpenKWS14 keyword search evaluation is one of the most challenging and influential evaluations in the field of speech recognition. Its goal is to build a high-performance keyword search system for a minority language with limited training data in a short period of time. We present the system of the Department of Electronic Engineering, Tsinghua University (THUEE team) for the OpenKWS14 keyword...

chapter

Semi-supervised training in low-resource ASR and KWS

Florian Metze, Ankur Gandhe, Yajie Miao, Zaid Sheikh, et al.

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4699 - 4703

In particular for “low resource” Keyword Search (KWS) and Speech-to-Text (STT) tasks, more untranscribed test data may be available than training data. Several approaches have been proposed to make this data useful during system development, even when initial systems have Word Error Rates (WER) above 70%. In this paper, we present a set of experiments on low-resource languages in telephony speech...

chapter

Low-resource keyword search strategies for Tamil

Nancy F. Chen, Chongjia Ni, I-Fan Chen, Sunil Sivadas, et al.

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5366 - 5370

We propose strategies for a state-of-the-art keyword search (KWS) system developed by the SINGA team in the context of the 2014 NIST Open Keyword Search Evaluation (OpenKWS14) using conversational Tamil provided by the IARPA Babel program. To tackle low-resource challenges and the rich morphological nature of Tamil, we present highlights of our current KWS system, including: (1) Submodular optimization...

chapter

Unsupervised data selection and word-morph mixed language model for Tamil low-resource keyword search

Chongjia Ni, Cheung-Chi Leung, Lei Wang, Nancy F. Chen, et al.

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4714 - 4718

This paper considers an unsupervised data selection problem for the training data of an acoustic model and the vocabulary coverage of a keyword search system in low-resource settings. We propose to use Gaussian component index based n-grams as acoustic features in a submodular function for unsupervised data selection. The submodular function provides a near-optimal solution in terms of the objective...

chapter

Improvements on transducing syllable lattice to word lattice for keyword search

Hang Su, Van Tung Pham, Yanzhang He, James Hieronymus

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4729 - 4733

This paper investigates a weighted finite state transducer (WFST) based syllable decoding and transduction method for keyword search (KWS), and compares it with sub-word search and phone confusion methods in detail. Acoustic context dependent phone models are trained from word forced alignments and then used for syllable decoding and lattice generation. Out-of-vocabulary (OOV) keyword pronunciations...

chapter

Towards better keyword search performance on Malay broadcast news data

Haihua Xu, Pham Van Tung, Eng-Siong Chng, Haizhou Li

2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 5

In this paper we describe approaches to building our recent Malay broadcast news audio retrieval system. This system contains speech-to-text and keyword search subsystems. The speech-to-text system is built with two aims: hybrid vocabulary recognition to tackle the out-of-vocabulary keyword search issue, and diversified acoustic modeling for effective system combination in the subsequent keyword search...

INFONA - science communication portal
