There are several challenges in building Automatic Speech Recognition (ASR) systems for low-resource languages such as Indic languages. One is access to the large amounts of training data required to build Acoustic Models (AMs) from scratch. In the context of Indian English, another challenge is code-mixing, as many Indian speakers are multilingual and exhibit code-mixing in their...
Recurrent neural network language models (RNNLMs) have become increasingly popular in many applications such as automatic speech recognition (ASR). Significant performance improvements over standard n-gram LMs, in both perplexity and word error rate, have been widely reported on ASR tasks. In contrast, published research on using RNNLMs for keyword search systems has been relatively limited. In this...
We consider the extraction of information from broadcast radio speech in Uganda for the purposes of informing relief and development programmes by the United Nations. Although internet penetration in Uganda is low, mobile phones are ubiquitous and have made radio a vibrant medium for interactive public discussion. Vulnerable groups make use of radio to discuss issues related to, for example, agriculture,...
With the completion of the IARPA Babel program, it is possible to systematically analyze the performance of speech recognition systems across a wide variety of languages. We select 16 languages from the dataset and compare performance using a deep neural network-based acoustic model. The focus is on keyword spotting using the actual term-weighted value (ATWV) metric. We demonstrate that ATWV is keyword...
We examine the effect of the Group Lasso (gLasso) regularizer in selecting the salient nodes of Deep Neural Network (DNN) hidden layers by applying a DNN-HMM hybrid speech recognizer to TED Talks speech data. We test two types of gLasso regularization, one for outgoing weight vectors and another for incoming weight vectors, as well as two sizes of DNNs: 2048 hidden layer nodes and 4096 nodes. Furthermore,...
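The abstract above describes selecting salient DNN hidden nodes with a Group Lasso penalty over each node's outgoing (or incoming) weight vector. A minimal numpy sketch of that idea, assuming a toy weight matrix, an illustrative threshold, and hypothetical function names (not from the paper):

```python
import numpy as np

# Toy hidden-to-output weight matrix: one row per hidden node.
W = np.array([[1.0,  0.0, 0.0, 0.0],
              [0.0,  2.0, 0.0, 0.0],
              [0.01, 0.0, 0.0, 0.0],   # node driven toward zero by gLasso
              [0.0,  0.0, 3.0, 0.0]])

def outgoing_group_norms(W):
    # L2 norm of each hidden node's outgoing weight vector (one group per row)
    return np.linalg.norm(W, axis=1)

def glasso_penalty(W):
    # Group Lasso term added to the training loss: the sum of group norms
    return outgoing_group_norms(W).sum()

# Nodes whose group norm survives above a threshold are kept as salient;
# the threshold 0.1 here is purely illustrative.
salient = np.where(outgoing_group_norms(W) > 0.1)[0]
```

Grouping by columns instead of rows (`axis=0`) would correspond to the incoming-weight-vector variant the abstract mentions.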
Neural Network (NN) based acoustic frontends, such as denoising autoencoders, are actively being investigated to improve the robustness of NN based acoustic models to various noise conditions. In recent work the joint training of such frontends with backend NNs has been shown to significantly improve speech recognition performance. In this paper, we propose an effective algorithm to jointly train...
In this paper, we continue our work on linear least squares based adaptation (LLS) for deep neural networks. We show that our previously proposed algorithm is a special case of an optimization algorithm called Alternating Direction Method of Multipliers (ADMM). We demonstrate that the adaptation algorithm can improve the performance on various deep neural networks including the bidirectional long...
It is well known that recognizers personalized to each user are much more effective than user-independent recognizers. With the popularity of smartphones today, it is not difficult to collect a large set of audio data for each user, but it is difficult to transcribe it. However, it is now possible to automatically discover acoustic tokens from unlabeled personal data in an unsupervised way. We...
This paper advances the design of CTC-based all-neural (or end-to-end) speech recognizers. We propose a novel symbol inventory, and a novel iterated-CTC method in which a second system is used to transform a noisy initial output into a cleaner version. We present a number of stabilization and initialization methods we have found useful in training these networks. We evaluate our system on the commonly...
Automatic speech recognition now plays an important role in volume control and adjustment on modern smart speakers. Based on recognition results obtained with advanced deep neural network technology, this paper proposes an efficient processing system for automatic volume control (AVC) and limiting. Theoretical analyses and subjective and objective testing results show that the proposed...
Learning acoustic models directly from the raw waveform data with minimal processing is challenging. Current waveform-based models have generally used very few (∼2) convolutional layers, which might be insufficient for building high-level discriminative features. In this work, we propose very deep convolutional neural networks (CNNs) that directly use time-domain waveforms as inputs. Our CNNs, with...
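The abstract above motivates stacking many small convolutional layers directly on time-domain waveforms. A minimal numpy sketch of that structural point, assuming a toy sine "waveform", a single illustrative size-3 kernel, and stride 1 (real models would learn many filters per layer):

```python
import numpy as np

def conv1d(x, w, stride=1):
    # Valid 1D convolution (cross-correlation) of signal x with kernel w
    k = len(w)
    n = (len(x) - k) // stride + 1
    return np.array([np.dot(x[i * stride:i * stride + k], w) for i in range(n)])

def relu(x):
    return np.maximum(x, 0.0)

x = np.sin(np.arange(160) / 5.0)   # toy 10 ms "waveform" at 16 kHz
w = np.array([0.25, 0.5, 0.25])    # illustrative size-3 kernel
h = x
depth = 5                          # "very deep" in miniature
for _ in range(depth):
    h = relu(conv1d(h, w))

# Each size-3 layer adds 2 samples of temporal context, so depth
# grows the receptive field linearly while keeping kernels small.
receptive_field = 2 * depth + 1
```

With only ~2 convolutional layers, as the abstract notes of prior waveform models, the receptive field over raw samples stays very small; depth is what buys high-level context.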
One of the difficulties in sung speech recognition is the small distance in acoustic space between phonemes in sung speech. We therefore considered clustering the speech based on pitch (fundamental frequency, F0), thereby creating a larger distance between phonemes. In addition, we considered a two-stage training method for the DNN-HMM: the first stage is trained using conventional acoustic features...
This paper introduces the development of ShefCE: a Cantonese-English bilingual speech corpus from L2 English speakers in Hong Kong. Bilingual parallel recording materials were chosen from TED online lectures. Script selection was carried out according to bilingual consistency (evaluated using a machine translation system) and the distributional balance of phonemes. 31 undergraduate to postgraduate...
Audio-visual speech recognition is a promising approach to tackling the problem of reduced recognition rates under adverse acoustic conditions. However, finding an optimal mechanism for combining multi-modal information remains a challenging task. Various methods are applicable for integrating acoustic and visual information in Gaussian-mixture-model-based speech recognition, e.g., via dynamic stream...
Sequence-to-sequence models with soft attention have had significant success in machine translation, speech recognition, and question answering. Though capable and easy to use, they require that the entire input sequence be available at the beginning of inference, an assumption that does not hold for instantaneous translation and speech recognition. To address this problem, we present a new method...
Recurrent neural networks (RNNs) have shown clear superiority in sequence modeling, particularly the ones with gated units, such as long short-term memory (LSTM) and gated recurrent unit (GRU). However, the dynamic properties behind the remarkable performance remain unclear in many applications, e.g., automatic speech recognition (ASR). This paper employs visualization techniques to study the behavior...
Adding context information to recurrent neural network language models (RNNLMs) has been investigated recently to improve the effectiveness of RNNLM learning. Conventionally, a fast approximate topic representation for a block of words was proposed using the corpus-based topic distribution of words, incorporating a latent Dirichlet allocation (LDA) model. It is then updated for each subsequent word...
In this paper, we present an investigation on technical details of the byte-level convolutional layer which replaces the conventional linear word projection layer in the neural language model. In particular, we discuss and compare the effective filter configurations, pooling types and the use of bytes instead of characters. We carry out experiments on language packs released by the IARPA Babel project...
A method is presented which applies Long Short-Term Memory Recurrent Neural Networks on real market-research voice recordings in order to automatically predict emotional arousal from speech. While most previous work has dealt with evaluations of algorithms within the same speech corpus, the novelty of this paper lies in an extensive evaluation across corpora and languages. The approach is evaluated...
Accurately recognizing speaker emotion and age/gender from speech can provide better user experience for many spoken dialogue systems. In this study, we propose to use deep neural networks (DNNs) to encode each utterance into a fixed-length vector by pooling the activations of the last hidden layer over time. The feature encoding process is designed to be jointly trained with the utterance-level classifier...
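The abstract above describes pooling the last hidden layer's activations over time to encode a variable-length utterance as a fixed-length vector. A minimal numpy sketch of that pooling step, assuming hypothetical frame-level activation matrices (T frames x D units) rather than a real DNN:

```python
import numpy as np

def utterance_embedding(hidden_acts, mode="mean"):
    # Pool frame-level last-hidden-layer activations (T x D)
    # into a single fixed-length utterance vector (D,).
    if mode == "mean":
        return hidden_acts.mean(axis=0)
    if mode == "max":
        return hidden_acts.max(axis=0)
    raise ValueError(f"unknown pooling mode: {mode}")

# Utterances of different lengths map to the same dimensionality,
# so one utterance-level classifier can consume them all.
a = utterance_embedding(np.ones((50, 8)))
b = utterance_embedding(np.zeros((120, 8)))
```

In the joint-training setup the abstract describes, gradients from the utterance-level classifier would flow back through this pooling into the frame-level network.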