Deep neural networks (DNNs) are capable of modeling large acoustic variations. However, their performance on noisy data still falls short of human expectations. In this work, we present an ideal hidden-activation masking (IHM) approach to improve their noise robustness. IHM is inspired by existing spectral masking techniques. Instead of masking away the noise-dominant components in the spectral...
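The idea of applying a mask to hidden activations rather than to spectral components can be sketched as follows. This is an illustrative toy, not the paper's method: the selection criterion (keep units whose noisy activation stays close to the clean-speech activation, within a hypothetical `threshold`) is an assumption for demonstration.

```python
import numpy as np

def ideal_hidden_mask(clean_act, noisy_act, threshold=0.5):
    """Binary mask keeping hidden units whose noisy activation stays
    close to the clean-speech activation (illustrative criterion)."""
    return (np.abs(noisy_act - clean_act) < threshold).astype(np.float32)

def apply_mask(noisy_act, mask):
    # Zero out the noise-dominant hidden units, keep the rest.
    return noisy_act * mask
```

The same masking pattern used in spectral-domain methods is simply moved to the network's internal representation.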
This paper considers the transcription of the widely observed yet less investigated bilingual code-switched speech: the words or phrases of the guest language are inserted within the utterances of the host language, so the languages are switched back and forth within an utterance, and much less data are available for the guest language. Two approaches utilizing the deep neural network (DNN) were tested...
We propose providing additional utterance-level features as inputs to a deep neural network (DNN) to facilitate speaker, channel and background normalization. Modifications of the basic algorithm are developed which result in significant reductions in word error rates (WERs). The algorithms are shown to combine well with speaker adaptation by backpropagation, resulting in a 9% relative WER reduction...
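The core input construction — appending an utterance-level summary vector to every frame so the network can normalize for speaker, channel, and background — can be sketched minimally. The summary used here (the per-utterance mean feature vector) is a stand-in assumption; the actual features in the paper may differ.

```python
import numpy as np

def add_utterance_features(frames):
    """Append an utterance-level summary (here, the mean feature
    vector, a proxy for speaker/channel statistics) to each frame.

    frames: (T, d) array of frame-level features.
    Returns a (T, 2d) array fed to the DNN input layer.
    """
    summary = frames.mean(axis=0)                   # (d,)
    tiled = np.tile(summary, (frames.shape[0], 1))  # (T, d)
    return np.concatenate([frames, tiled], axis=1)  # (T, 2d)
```

Because the appended vector is constant within an utterance, the network can learn to subtract out utterance-level nuisance factors.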
This paper proposes an algorithm to design a tied-state inventory for a context dependent, neural network-based acoustic model for speech recognition. Rather than relying on a GMM/HMM system that operates on a different feature space and is of a different model family, the proposed algorithm optimizes state tying on the activation vectors of the neural network directly. Experiments show the viability...
Although deep neural networks (DNNs) have achieved significant accuracy improvements in speech recognition, deploying large-scale DNNs in decoding is computationally expensive due to the huge number of parameters. Weight truncation and decomposition methods have been proposed to speed up decoding by exploiting the sparseness of DNNs. This paper summarizes different approaches to restructuring DNNs and...
Determination of pitch in noise is challenging because of corrupted harmonic structure. In this paper, we extract pitch using supervised learning, where probabilistic pitch states are directly learned from noisy speech. We investigate two alternative neural networks modeling the pitch states given observations. The first one is the feedforward deep neural network (DNN), which is trained on static...
Manual transcription of audio databases for automatic speech recognition (ASR) training is a costly and time-consuming process. State-of-the-art hybrid ASR systems that are based on deep neural networks (DNN) can exploit un-transcribed foreign data during unsupervised DNN pre-training or semi-supervised DNN training. We investigate the relevance of foreign data characteristics, in particular domain...
This paper presents a deep neural network (DNN) to extract articulatory information from the speech signal and explores different ways to use such information in a continuous speech recognition task. The DNN was trained to estimate articulatory trajectories from input speech, where the training data is a corpus of synthetic English words generated by the Haskins Laboratories' task-dynamic model of...
Statistical parametric speech synthesis (SPSS) using deep neural networks (DNNs) has shown its potential to produce natural-sounding synthesized speech. However, there are limitations in the current implementation of DNN-based acoustic modeling for speech synthesis, such as the unimodal nature of its objective function and its lack of ability to predict variances. To address these limitations, this...
In this paper we investigate the use of deep neural networks (DNNs) for a small footprint text-dependent speaker verification task. At the development stage, a DNN is trained to classify speakers at the frame level. During speaker enrollment, the trained DNN is used to extract speaker-specific features from the last hidden layer. The average of these speaker features, or d-vector, is taken as the speaker...
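The d-vector construction described above — average the last-hidden-layer activations over the enrollment frames — can be sketched as below. The length normalization and cosine scoring shown here are conventional choices in speaker verification, included as assumptions to make the sketch usable.

```python
import numpy as np

def d_vector(hidden_activations):
    """Average last-hidden-layer activations over all frames of an
    utterance, then length-normalize: the speaker's d-vector."""
    v = np.mean(hidden_activations, axis=0)
    return v / np.linalg.norm(v)

def cosine_score(enroll, test):
    # Verification score between two unit-norm d-vectors.
    return float(np.dot(enroll, test))
```

At test time, the score between the enrollment d-vector and the test utterance's d-vector is compared against a threshold to accept or reject the claimed identity.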
We propose a maximal figure-of-merit (MFoM) learning framework to directly maximize mean average precision (MAP), which is a key performance metric in many multi-class classification tasks. Conventional classifiers based on support vector machines cannot be easily adapted to optimize the MAP metric. On the other hand, classifiers based on deep neural networks (DNNs) have recently been shown to deliver...
Reverberation distorts human speech and usually has negative effects on speech intelligibility, especially for hearing-impaired listeners. It also causes performance degradation in automatic speech recognition and speaker identification systems. Therefore, the dereverberation problem must be dealt with in daily listening environments. We propose to use deep neural networks (DNNs) to learn a spectral...
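A typical input construction for DNN-based spectral mapping is to stack a context window of reverberant log-magnitude frames and regress to the clean log spectrum of the center frame. The sketch below shows only that feature preparation; the window sizes and edge padding are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def context_window(log_spec, left=5, right=5):
    """Stack neighboring frames of a reverberant log-magnitude
    spectrogram as DNN input; the training target would be the
    clean log spectrum of each center frame.

    log_spec: (T, F) log-magnitude spectrogram.
    Returns a (T, (left + right + 1) * F) input matrix.
    """
    T, F = log_spec.shape
    padded = np.pad(log_spec, ((left, right), (0, 0)), mode="edge")
    return np.stack([padded[t:t + left + right + 1].ravel()
                     for t in range(T)])
```

The temporal context lets the network see reverberant energy smeared across neighboring frames, which a single-frame mapping could not exploit.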
A deep neural network (DNN) based classifier achieved 27.38% frame error rate (FER) and 15.62% segment error rate (SER) in recognizing five tonal categories in Mandarin Chinese broadcast news, based on 40 mel-frequency cepstral coefficients (MFCCs). The same architecture performed substantially worse when trained and tested with F0 and amplitude parameters alone: 40.05% FER and 22.66% SER. These results...
This paper presents an investigation of far field speech recognition using beamforming and channel concatenation in the context of Deep Neural Network (DNN) based feature extraction. While speech enhancement with beamforming is attractive, the algorithms are typically signal-based with no information about the special properties of speech. A simple alternative to beamforming is concatenating multiple...
Data augmentation using label preserving transformations has been shown to be effective for neural network training to make invariant predictions. In this paper we focus on data augmentation approaches to acoustic modeling using deep neural networks (DNNs) for automatic speech recognition (ASR). We first investigate a modified version of a previously studied approach using vocal tract length perturbation...
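Vocal tract length perturbation generates label-preserving training variants by warping the frequency axis of each spectrum. The sketch below uses a simple linear warp via interpolation; the published VTLP method uses a piecewise-linear mapping with a breakpoint, so this is a simplified stand-in and `alpha` is an illustrative warp factor.

```python
import numpy as np

def vtlp_warp(spectrum, alpha=1.1):
    """Simplified frequency warp of one magnitude-spectrum frame:
    resample the spectrum at bins scaled by 1/alpha (alpha > 1
    stretches the spectrum toward higher frequencies)."""
    n = len(spectrum)
    src = np.clip(np.arange(n) / alpha, 0, n - 1)
    return np.interp(src, np.arange(n), spectrum)
```

Applying a randomly drawn `alpha` per utterance yields new training examples with the same transcript, which is what makes the transformation label-preserving.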
It is well known in machine learning that multitask learning (MTL) can help improve the generalization performance of individually learned tasks if the tasks being trained in parallel are related, especially when the amount of training data is relatively small. In this paper, we investigate the estimation of triphone acoustic models in parallel with the estimation of trigrapheme acoustic models under the...
While deep neural networks (DNNs) have become the dominant acoustic model (AM) for speech recognition systems, they are still dependent on Gaussian mixture models (GMMs) for alignments both for supervised training and for context dependent (CD) tree building. Here we explore bootstrapping DNN AM training without GMM AMs and show that CD trees can be built with DNN alignments which are better matched...
In this work, we propose a deep bottleneck feature (DBNF) architecture that is able to leverage data from multiple languages. We also show that tonal features are helpful for non-tonal languages. Evaluations are performed on a low-resource conversational telephone speech transcription task in Bengali, while additional data for DBNF training is provided in Assamese, Pashto, Tagalog, Turkish, and Vietnamese...
Supervised learning based speech separation has shown considerable success recently. In its simplest form, a discriminative model is trained as a time-frequency masking function, where the training target is an ideal mask. Ideal masks, such as the ideal binary mask, are structured spectro-temporal patterns. However, previous formulations do not model prominent output structure. In this paper, we...
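The ideal binary mask used as a training target can be computed directly when the premixed speech and noise are available: a time-frequency unit is kept when its local SNR exceeds a criterion. The sketch below follows that standard definition; the `lc_db` criterion value is a configurable assumption.

```python
import numpy as np

def ideal_binary_mask(speech_mag, noise_mag, lc_db=0.0):
    """IBM: 1 where the local SNR (in dB) exceeds the local
    criterion lc_db, else 0, per time-frequency unit."""
    snr_db = 20.0 * np.log10(speech_mag / np.maximum(noise_mag, 1e-12))
    return (snr_db > lc_db).astype(np.float32)
```

The structured patterns the abstract refers to are visible in such masks: speech-dominant units cluster along harmonics and formant tracks rather than scattering independently.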
State-of-the-art speaker recognition systems are based on the i-vector representation of speech segments. In this paper we show how this representation can be used to perform blind speaker adaptation of a hybrid DNN-HMM speech recognition system, and we report excellent results on a French language audio transcription task. The implementation is very simple. An audio file is first diarized and each speaker...