Convolutional neural networks have proved very successful in image recognition, thanks to their tolerance to small translations. They have recently been applied to speech recognition as well, using a spectral representation as input. However, in this case the translations along the two axes — time and frequency — should be handled quite differently. So far, most authors have focused on convolution...
Recently, the context-dependent Deep Neural Network (CD-DNN) has been found to significantly outperform the Gaussian Mixture Model (GMM) on various large vocabulary continuous speech recognition tasks. Unlike the GMM approach, there is no meaningful interpretation of the DNN parameters, which makes it difficult to devise effective adaptation methods for DNNs. Furthermore, DNN parameter estimation is based...
Context-dependent deep neural networks (CD-DNNs) have been successfully used in large vocabulary continuous speech recognition (LVCSR). However, the immense computational cost of mini-batch-based back-propagation (BP) training has become a major obstacle to exploiting massive speech data for DNN training. Previous work on accelerating BP training has mainly focused on parallelization across multiple GPUs. In this...
We propose a novel deep learning vector quantization (DLVQ) algorithm based on deep neural networks (DNNs). Utilizing the strong representation power of this deep learning framework, with any vector quantization (VQ) method as an initializer, the proposed DLVQ technique is capable of learning a code-constrained codebook and thus improves over conventional VQ for use in classification problems. Tested...
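The abstract above builds on a conventional VQ codebook as the initializer for DLVQ. As a point of reference, here is a minimal sketch of that conventional-VQ baseline: a plain k-means codebook over feature vectors. The DNN-based refinement itself is not reproduced; all names and shapes here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def kmeans_codebook(X, k, iters=20, seed=0):
    """Toy k-means VQ: learn k codewords over the rows of X and
    return the codebook plus each vector's nearest-codeword index."""
    rng = np.random.default_rng(seed)
    codebook = X[rng.choice(len(X), k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign each vector to its nearest codeword (squared distance)
        d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # move each codeword to the mean of its assigned vectors
        for j in range(k):
            if np.any(labels == j):
                codebook[j] = X[labels == j].mean(0)
    return codebook, labels

# Two well-separated synthetic clusters stand in for speech features.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
codebook, labels = kmeans_codebook(X, 2)
```

In the DLVQ setting described by the abstract, such a codebook would only be the starting point that the DNN then refines.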
We propose a novel framework for speaker recognition in which extraction of sufficient statistics for the state-of-the-art i-vector model is driven by a deep neural network (DNN) trained for automatic speech recognition (ASR). Specifically, the DNN replaces the standard Gaussian mixture model (GMM) to produce frame alignments. The use of an ASR-DNN system in the speaker recognition pipeline is attractive...
Denoising autoencoders (DAs) have shown success in generating robust features for images, but there has been limited work in applying DAs for speech. In this paper we present a deep denoising autoencoder (DDA) framework that can produce robust speech features for noisy reverberant speech recognition. The DDA is first pre-trained as restricted Boltzmann machines (RBMs) in an unsupervised fashion. Then...
In our previous work, we extended traditional stereo-based stochastic mapping by relaxing the constraint of stereo data, which is impractical in real applications, using HMM-based speech synthesis to construct the “clean” channel data for noisy speech recognition. In this paper, we propose to use deep neural networks (DNNs) for stereo mapping instead of the joint Gaussian mixture model (GMM)...
We explore time-frequency masking to improve noise robust automatic speech recognition. Apart from its use as a frontend, we use it for providing smooth estimates of speech and noise which are then passed as additional features to a deep neural network (DNN) based acoustic model. Such a system improves performance on the Aurora-4 dataset by 10.5% (relative) compared to the previous best published...
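The masking idea in the abstract above can be illustrated with a minimal sketch: a soft time-frequency mask giving, per bin, the fraction of energy attributed to speech, applied to a noisy spectrogram. This uses oracle speech and noise power spectra purely to show the mechanics; in the actual system the mask is estimated, and all shapes and names here are assumptions.

```python
import numpy as np

def ideal_ratio_mask(speech_pow, noise_pow, eps=1e-10):
    """Soft mask in [0, 1]: per time-frequency bin, the fraction of
    the bin's power attributed to speech."""
    return speech_pow / (speech_pow + noise_pow + eps)

# Random stand-ins for power spectrograms (freq bins x frames).
rng = np.random.default_rng(1)
S = rng.random((257, 100))   # "speech" power spectrogram
N = rng.random((257, 100))   # "noise" power spectrogram

mask = ideal_ratio_mask(S, N)
enhanced = mask * (S + N)    # masked noisy spectrogram
```

Per the abstract, such smooth speech/noise estimates can either enhance the front-end directly or be appended as extra features to the DNN acoustic model.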
The data sparsity problem of context-dependent (CD) acoustic modelling of deep neural networks (DNNs) in speech recognition is addressed by using the decision tree state clusters as the training targets. The CD states within a cluster cannot be distinguished during decoding. This problem, referred to as the clustering problem, is not explicitly addressed in the current literature. In our previous...
In this paper we investigate a Deep Neural Network (DNN) based approach to acoustic modeling of tonal language and assess its speech recognition performance with different features and modeling techniques. Mandarin Chinese, the most widely spoken tonal language, is chosen for testing the tone related ASR performance. Furthermore, the DNN-trained, tone-sensitive model is evaluated in automatic detection...
In this paper, a novel approach for single channel source separation (SCSS) using a deep neural network (DNN) architecture is introduced. Unlike previous studies in which DNN and other classifiers were used for classifying time-frequency bins to obtain hard masks for each source, we use the DNN to classify estimated source spectra to check for their validity during separation. In the training stage,...
Robustness of speaker recognition systems is crucial for real-world applications, which typically contain both additive noise and room reverberation. However, the combined effects of additive noise and convolutive reverberation have rarely been studied in speaker identification (SID). This paper addresses this issue in two phases. We first remove background noise through binary masking using a deep...
Our application requires a keyword spotting system with a small memory footprint, low computational cost, and high precision. To meet these requirements, we propose a simple approach based on deep neural networks. A deep neural network is trained to directly predict the keyword(s) or subword units of the keyword(s) followed by a posterior handling method producing a final confidence score. Keyword...
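The "posterior handling" step in the abstract above can be sketched in a few lines: smooth the DNN's per-frame keyword posteriors over a window, then take the maximum smoothed value as the utterance-level confidence score. The window length and the max-scoring rule are assumptions for illustration, not the paper's exact recipe.

```python
import numpy as np

def keyword_confidence(posteriors, win=30):
    """posteriors: 1-D array of per-frame keyword posteriors in [0, 1].
    Returns a single confidence score for the utterance."""
    # moving-average smoothing over a fixed window of frames
    kernel = np.ones(win) / win
    smoothed = np.convolve(posteriors, kernel, mode="same")
    # score the utterance by its best smoothed posterior
    return float(smoothed.max())

# Synthetic posterior track: the keyword "fires" for frames 80-119.
frames = np.zeros(200)
frames[80:120] = 0.9
score = keyword_confidence(frames)
```

Smoothing before taking the maximum suppresses isolated spurious frames, which is what lets a simple rule like this reach high precision at a small footprint.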
Over the past few decades, a range of front-end techniques have been proposed to improve the robustness of automatic speech recognition systems against environmental distortion. While these techniques are effective for small tasks consisting of carefully designed data sets, especially when used with a classical acoustic model, there has been limited evidence that they are useful for a state-of-the-art...
In this paper, we propose a novel method to adapt a context-dependent deep neural network hidden Markov model (CD-DNN-HMM) with only a limited number of parameters by taking into account the underlying factors that contribute to the distorted speech signal. We derive this factorized adaptation method from the perspectives of joint factor analysis and vector Taylor series expansion, respectively. Evaluated...
We present novel bounds on the classification error which are based on the f-divergence and, at the same time, can be used as practical training criteria. Virtually no studies investigate the link between the f-divergence, the classification error, and practical training criteria. So far, only the Kullback-Leibler divergence has been examined in this context to formulate a bound...
Adaptation to speaker variations is an essential component of speech recognition systems. One common approach to adapting deep neural network (DNN) acoustic models is to perform global constrained maximum likelihood linear regression (CMLLR) at some point of the systems. Using CMLLR (or more generally, generative approaches) is advantageous especially in unsupervised adaptation scenarios with high...
Among the many speaker adaptation techniques, Speaker Adaptive Training (SAT) has been successfully applied to the standard Hidden Markov Model (HMM) speech recognizer, whose states are associated with Gaussian Mixture Models (GMMs). On the other hand, recent studies on Speaker-Independent (SI) recognizer development have reported that a new type of HMM speech recognizer, which replaces GMMs with Deep Neural...
The large number of parameters in deep neural networks (DNN) for automatic speech recognition (ASR) makes speaker adaptation very challenging. It also limits the use of speaker personalization due to the huge storage cost in large-scale deployments. In this paper we address DNN adaptation and personalization issues by presenting two methods based on the singular value decomposition (SVD). The first...
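The SVD-based restructuring the abstract above alludes to can be sketched as a low-rank bottleneck: factor an m x n weight matrix into two thin factors of rank k, cutting the parameter count from m*n to (m + n)*k. This is a generic sketch of the idea, not the paper's exact formulation; matrix sizes and the chosen rank are illustrative assumptions.

```python
import numpy as np

def svd_bottleneck(W, k):
    """Replace an m x n weight matrix W with two factors A (m x k)
    and B (k x n) from its truncated SVD, so that W ~= A @ B."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * s[:k]   # absorb singular values into the left factor
    B = Vt[:k, :]
    return A, B

# A dense layer stands in for one DNN weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))
A, B = svd_bottleneck(W, 64)

params_before = W.size
params_after = A.size + B.size
# Relative reconstruction error of the rank-64 approximation.
approx_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
```

For adaptation, only the small bottleneck factors need to be stored per speaker, which is what makes large-scale personalization affordable.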
Deep neural network acoustic models have shown large improvements in performance over Gaussian mixture models (GMMs) in recent studies. Typically, deep neural networks are trained on the cross-entropy criterion using stochastic gradient descent (SGD). However, plain SGD requires many passes over the whole training set before reaching the asymptotic region, making it difficult to scale to...
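The cross-entropy SGD baseline the abstract above starts from can be sketched with a single softmax output layer standing in for the full DNN: compute posteriors, form the cross-entropy gradient, and take one step. Shapes, the learning rate, and the one-layer simplification are assumptions for illustration only.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sgd_step(W, x, y, lr=0.1):
    """One mini-batch cross-entropy SGD step for a softmax layer.
    x: batch x dim features, y: batch of integer class labels."""
    p = softmax(x @ W)                 # batch x classes posteriors
    p[np.arange(len(y)), y] -= 1.0     # dCE/dlogits = p - onehot(y)
    grad = x.T @ p / len(y)
    return W - lr * grad

rng = np.random.default_rng(2)
W = np.zeros((5, 3))
x = rng.standard_normal((8, 5))
y = rng.integers(0, 3, size=8)
W2 = sgd_step(W, x, y)
```

Every such step touches one mini-batch; the sequential dependence between steps is precisely what makes plain SGD hard to scale, motivating the parallel and batch-efficient training schemes these abstracts discuss.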