2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

chapter

Deep Scattering Spectrum with deep neural networks

Vijayaditya Peddinti, Tara N. Sainath, Shay Maymon, Bhuvana Ramabhadran, et al.

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 210 - 214

ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

State-of-the-art convolutional neural networks (CNNs) typically use a log-mel spectral representation of the speech signal. However, this representation is limited by the spectro-temporal resolution afforded by log-mel filter-banks. A novel technique known as Deep Scattering Spectrum (DSS) addresses this limitation and preserves higher resolution information, while ensuring time warp stability, through...

chapter

Multi-resolution linear prediction based features for audio onset detection with bidirectional LSTM neural networks

Erik Marchi, Giacomo Ferroni, Florian Eyben, Leonardo Gabrielli, et al.

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2164 - 2168

A plethora of different onset detection methods have been proposed in recent years. However, few attempts have been made at widely applicable approaches that achieve superior performance across different types of music with considerable temporal precision. In this paper, we present a multi-resolution approach based on discrete wavelet transform and linear prediction filtering...

chapter

Automatic discovery of a phonetic inventory for unwritten languages for statistical speech synthesis

Prasanna Kumar Muthukumar, Alan W Black

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2594 - 2598

Speech synthesis systems are typically built with speech data and transcriptions. In this paper, we try to build synthesis systems when no transcriptions or knowledge about the language are available. It is usually necessary to at least possess phonetic knowledge about the language. In this paper, we propose an automated way of obtaining phones and phonetic knowledge about the corpus at hand by making...

chapter

RASR/NN: The RWTH neural network toolkit for speech recognition

Simon Wiesler, Alexander Richard, Pavel Golik, Ralf Schlüter, et al.

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 3281 - 3285

This paper describes the new release of RASR — the open source version of the well-proven speech recognition toolkit developed and used at RWTH Aachen University. The focus is put on the implementation of the NN module for training neural network acoustic models. We describe code design, configuration, and features of the NN module. The key feature is a high flexibility regarding the network topology,...

chapter

Advanced algorithms for surgical gesture classification

Giovanni Luca Santosuosso, Giovanni Saggio, Fabio Sora, Laura Sbernini, et al.

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 3596 - 3600

A novel binary gesture classification procedure is presented to determine surgical ability. To this end, a sensory glove was employed to track surgical hand movements, and the sensor data were recorded for processing by a dedicated algorithm. The classifier was able to discriminate a gesture made by an expert surgeon from one made by a novice, thanks to a two-step classification strategy....

chapter

Optimization of Neural Network Language Models for keyword search

Ankur Gandhe, Florian Metze, Alex Waibel, Ian Lane

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4888 - 4892

Recent works have shown Neural Network based Language Models (NNLMs) to be an effective modeling technique for Automatic Speech Recognition. Prior works have shown that these models obtain lower perplexity and word error rate (WER) compared to both standard n-gram language models (LMs) and more advanced language models including maximum entropy and random forest LMs. While these results are compelling,...

chapter

Deep learning of split temporal context for automatic speech recognition

Moez Baccouche, Benoit Besset, Patrice Collen, Olivier Le Blouch

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5422 - 5426

This paper follows the recent advances in speech recognition which recommend replacing the standard hybrid GMM/HMM approach with deep neural architectures. These models were shown to drastically improve recognition performance, due to their ability to capture the underlying structure of data. However, they remain particularly complex since the entire temporal context of a given phoneme is learned with...

chapter

Joint training of convolutional and non-convolutional neural networks

Hagen Soltau, George Saon, Tara N. Sainath

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5572 - 5576

We describe a simple modification of neural networks which consists in extending the commonly used linear layer structure to an arbitrary graph structure. This allows us to combine the benefits of convolutional neural networks with the benefits of regular networks. The joint model has only a small increase in parameter size and training and decoding time are virtually unaffected. We report significant...

chapter

Asynchronous stochastic optimization for sequence training of deep neural networks

Georg Heigold, Erik McDermott, Vincent Vanhoucke, Andrew Senior, et al.

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5587 - 5591

This paper explores asynchronous stochastic optimization for sequence training of deep neural networks. Sequence training requires more computation than frame-level training using pre-computed frame data. This leads to several complications for stochastic optimization, arising from significant asynchrony in model updates under massive parallelization, and limited data shuffling due to utterance-chunked...

chapter

Lattice based optimization of bottleneck feature extractor with linear transformation

Diyuan Liu, Si Wei, Wu Guo, Yebo Bao, et al.

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5617 - 5621

This paper proposes a lattice-based sequential discriminative training method to extract more discriminative bottleneck features. In our method, the bottleneck neural network is first trained with cross entropy criteria, and then only the weights of bottleneck layer are retrained with sequential criteria. If the outputs of the layer before bottleneck are treated as the raw features, the new method...

chapter

Deep neural network trained with speaker representation for speaker normalization

Yun Tang, Aanchan Mohan, Richard C. Rose, Chengyuan Ma

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 6329 - 6333

A method for speaker normalization in deep neural network (DNN) based discriminative feature estimation for automatic speech recognition (ASR) is presented. This method is applied in the context of a DNN configured for auto-encoder based low dimensional bottleneck (AE-BN) feature extraction where the derived features are used as input to a continuous Gaussian density hidden Markov model (HMM/GMM)...

chapter

Improved music feature learning with deep neural networks

Siddharth Sigtia, Simon Dixon

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 6959 - 6963

Recent advances in neural network training provide a way to efficiently learn representations from raw data. Good representations are an important requirement for Music Information Retrieval (MIR) tasks to be performed successfully. However, a major problem with neural networks is that training time becomes prohibitive for very large datasets and the learning algorithm can get stuck in local minima...
