Advanced search

Advanced search in people

From:

To:

Items from 81 to 100 out of 937 results

chapter

A novel pitch extraction based on jointly trained deep BLSTM Recurrent Neural Networks with bottleneck features

Bin Liu, Jianhua Tao, Dawei Zhang, Yibin Zheng

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 336 - 340

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pitch is an important characteristic of speech and is useful for many applications. However, it is still challenging to estimate pitch in strong noise. In this paper, we propose a joint training approach to determinate pitch. First, a Bidirectional Long Short-Term Memory Recurrent Neural Networks (BLSTMRNN) is trained to map the noisy to clean speech features. Second, the pitch estimation is also...

chapter

Improving audio-visual speech recognition using deep neural networks with dynamic stream reliability estimates

Hendrik Meutzner, Ning Ma, Robert Nickel, Christopher Schymura, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5320 - 5324

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Audio-visual speech recognition is a promising approach to tackling the problem of reduced recognition rates under adverse acoustic conditions. However, finding an optimal mechanism for combining multi-modal information remains a challenging task. Various methods are applicable for integrating acoustic and visual information in Gaussian-mixture-model-based speech recognition, e.g., via dynamic stream...

chapter

Low-rank and sparse soft targets to learn better DNN acoustic models

Pranay Dighe, Afsaneh Asaei, Herve Bourlard

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5265 - 5269

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Conventional deep neural networks (DNN) for speech acoustic modeling rely on Gaussian mixture models (GMM) and hidden Markov model (HMM) to obtain binary class labels as the targets for DNN training. Subword classes in speech recognition systems correspond to context-dependent tied states or senones. The present work addresses some limitations of GMM-HMM senone alignments for DNN training. We hypothesize...

chapter

Faster sequence training

Albert Zeyer, Ilia Kulikov, Ralf Schluter, Hermann Ney

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5285 - 5289

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

It has been shown that sequence-discriminative training can improve the performance for large vocabulary continuous speech recognition. Our main contribution is a novel method for reducing the computation time of any sort of sequence training while only slightly decreasing the overall performance. The method allows to parallelize the forward propagation through the network, the loss and loss gradient...

chapter

A study on data augmentation of reverberant speech for robust speech recognition

Tom Ko, Vijayaditya Peddinti, Daniel Povey, Michael L. Seltzer, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5220 - 5224

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

The environmental robustness of DNN-based acoustic models can be significantly improved by using multi-condition training data. However, as data collection is a costly proposition, simulation of the desired conditions is a frequently adopted strategy. In this paper we detail a data augmentation approach for far-field ASR. We examine the impact of using simulated room impulse responses (RIRs), as real...

chapter

Non-negative matrix factorization of signals with overlapping events for event detection applications

Shiqiang Wang, Jorge Ortiz

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5960 - 5964

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In many event detection applications, training data may contain tags with multiple, simultaneous events. This is particularly likely when the definition of “event” is broad and includes events that can persist for an extended period of time. Decomposing a mixed signal into signals corresponding to individual events is non-trivial. In this paper, we propose a non-negative matrix factorization (NMF)...

chapter

Speaker diarization using deep neural network embeddings

Daniel Garcia-Romero, David Snyder, Gregory Sell, Daniel Povey, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4930 - 4934

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Speaker diarization is an important front-end for many speech technologies in the presence of multiple speakers, but current methods that employ i-vector clustering for short segments of speech are potentially too cumbersome and costly for the front-end role. In this work, we propose an alternative approach for learning representations via deep neural networks to remove the i-vector extraction process...

chapter

Expressive visual text to speech and expression adaptation using deep neural networks

Jonathan Parker, Ranniery Maia, Yannis Stylianou, Roberto Cipolla

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4920 - 4924

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In this paper, we present an expressive visual text to speech system (VTTS) based on a deep neural network (DNN). Given an input text sentence and a set of expression tags, the VTTS is able to produce not only the audio speech, but also the accompanying facial movements. The expressions can either be one of the expressions in the training corpus or a blend of expressions from the training corpus....

chapter

Multi-task deep neural network with shared hidden layers: Breaking down the wall between emotion representations

Yue Zhang, Yifan Liu, Felix Weninger, Bjorn Schuller

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4990 - 4994

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Emotion representations are psychological constructs for modelling, analysing, and recognising emotion, being one essential element of affect. Due to its complexity, the boundaries between different emotion concepts are often fuzzy, which is also reflected in the diversification of emotion databases, and their inconsistent target labels. When facing data scarcity as an ever present issue for acoustic...

chapter

A first attempt at polyphonic sound event detection using connectionist temporal classification

Yun Wang, Florian Metze

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2986 - 2990

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Sound event detection is the task of detecting the type, starting time, and ending time of sound events in audio streams. Recently, recurrent neural networks (RNNs) have become the mainstream solution for sound event detection. Because RNNs make a prediction at every frame, it is necessary to provide exact starting and ending times of the sound events in the training data, making data annotation an...

chapter

End-to-end ASR-free keyword search from speech

Kartik Audhkhasi, Andrew Rosenberg, Abhinav Sethy, Bhuvana Ramabhadran, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4840 - 4844

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

End-to-end (E2E) systems have achieved competitive results compared to conventional hybrid hidden Markov model (HMM)-deep neural network based automatic speech recognition (ASR) systems. Such E2E systems are attractive due to the lack of dependence on alignments between input acoustic and output grapheme or HMM state sequence during training. This paper explores the design of an ASR-free end-to-end...

chapter

Active learning for sound event classification by clustering unlabeled data

Zhao Shuyang, Toni Heittola, Tuomas Virtanen

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 751 - 755

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper proposes a novel active learning method to save annotation effort when preparing material to train sound event classifiers. K-medoids clustering is performed on unlabeled sound segments, and medoids of clusters are presented to annotators for labeling. The annotated label for a medoid is used to derive predicted labels for other cluster members. The obtained labels are used to build a classifier...

chapter

Investigations on byte-level convolutional neural networks for language modeling in low resource speech recognition

Kazuki Irie, Pavel Golik, Ralf Schluter, Hermann Ney

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5740 - 5744

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In this paper, we present an investigation on technical details of the byte-level convolutional layer which replaces the conventional linear word projection layer in the neural language model. In particular, we discuss and compare the effective filter configurations, pooling types and the use of bytes instead of characters. We carry out experiments on language packs released by the IARPA Babel project...

chapter

Hearing in a shoe-box: Binaural source position and wall absorption estimation using virtually supervised learning

Saurabh Kataria, Clement Gaultier, Antoine Deleforge

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 226 - 230

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper introduces a new framework for supervised sound source localization referred to as virtually-supervised learning. An acoustic shoe-box room simulator is used to generate a large number of binaural single-source audio scenes. These scenes are used to build a dataset of spatial binaural features annotated with acoustic properties such as the 3D source position and the walls' absorption coefficients...

chapter

Respiratory airflow estimation from lung sounds based on regression

Elmar Messner, Martin Hagmuller, Paul Swatek, Freyja-Maria Smolle-Juttner, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 1123 - 1127

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

The aim of this work is the estimation of respiratory flow from lung sound recordings, i.e. acoustic airflow estimation. With a 16-channel lung sound recording device, we simultaneously record the respiratory flow and the lung sounds on the posterior chest from six lung-healthy subjects in supine position. For the recordings of four selected sensor positions, we extract linear frequency cepstral coefficient...

chapter

A network of deep neural networks for Distant Speech Recognition

Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4880 - 4884

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Despite the remarkable progress recently made in distant speech recognition, state-of-the-art technology still suffers from a lack of robustness, especially when adverse acoustic conditions characterized by non-stationary noises and reverberation are met.

chapter

Anuran call classification with deep learning

Julia Strout, Bryce Rogan, S.M. Mahdi Seyednezhad, Katrina Smart, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2662 - 2665

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Ecologists can assess the health of flooded habitats or wetlands by studying the variations in the populations of bioindicators such as anurans (i.e., frogs and toads). To monitor anuran populations, ecologists manually identify anuran species from audio recordings. This identification task can be significantly streamlined by the availability of an automated method for anuran identification. Previous...

chapter

A comprehensive study of deep bidirectional LSTM RNNS for acoustic modeling in speech recognition

Albert Zeyer, Patrick Doetsch, Paul Voigtlaender, Ralf Schluter, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2462 - 2466

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Recent experiments show that deep bidirectional long short-term memory (BLSTM) recurrent neural network acoustic models outperform feedforward neural networks for automatic speech recognition (ASR). However, their training requires a lot of tuning and experience. In this work, we provide a comprehensive overview over various BLSTM training aspects and their interplay within ASR, which has been missing...

chapter

On time-frequency mask estimation for MVDR beamforming with application in robust speech recognition

Xiong Xiao, Shengkui Zhao, Douglas L. Jones, Eng Siong Chng, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 3246 - 3250

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Acoustic beamforming has played a key role in the robust automatic speech recognition (ASR) applications. Accurate estimates of the speech and noise spatial covariance matrices (SCM) are crucial for successfully applying the minimum variance distortionless response (MVDR) beamforming. Reliable estimation of time-frequency (TF) masks can improve the estimation of the SCMs and significantly improve...

chapter

A deep neural network integrated with filterbank learning for speech recognition

Hiroshi Seki, Kazumasa Yamamoto, Seiichi Nakagawa

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5480 - 5484

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Deep neural networks (DNN) have achieved significant success in the field of speech recognition. One of the main advantages of the DNN is automatic feature extraction without human intervention. Therefore, we incorporate a pseudo-filterbank layer to the bottom of DNN and train the whole filterbank layer and the following networks jointly, while most systems take pre-defined mel-scale filterbanks as...

Keywords:
TRAINING
ACOUSTICS

Publication date

Set your own date range

Content availability

Available (936)
None (1)

Publication type

book (816)
article (121)

Keywords

SPEECH (566)
HIDDEN MARKOV MODELS (487)
SPEECH RECOGNITION (429)
FEATURE EXTRACTION (218)
DATA MODELS (151)
ADAPTATION MODELS (103)
SPEECH PROCESSING (98)
TRAINING DATA (95)
NEURAL NETWORKS (93)
ACCURACY (90)
COMPUTATIONAL MODELING (85)
AUTOMATIC SPEECH RECOGNITION (75)
SUPPORT VECTOR MACHINES (74)
ARTIFICIAL NEURAL NETWORKS (71)
DATABASES (65)
VECTORS (58)
TESTING (56)
DECODING (54)
ADAPTATION MODEL (49)
NATURAL LANGUAGE PROCESSING (49)
MATHEMATICAL MODEL (47)
SPEAKER RECOGNITION (47)
ACOUSTIC SIGNAL PROCESSING (46)
NOISE (44)
ACOUSTIC MODELING (43)
CONTEXT (43)
HIDDEN MARKOV MODEL (42)
SPEECH SYNTHESIS (41)
ESTIMATION (40)
ROBUSTNESS (40)
DATA MINING (39)
SIGNAL PROCESSING (39)
DEEP NEURAL NETWORKS (38)
DEEP NEURAL NETWORK (36)
DISCRIMINATIVE TRAINING (36)
MAXIMUM LIKELIHOOD ESTIMATION (35)
LATTICES (34)
LEARNING (ARTIFICIAL INTELLIGENCE) (34)
TRANSFORMS (34)
ERROR ANALYSIS (32)
CLASSIFICATION ALGORITHMS (31)
VOCABULARY (31)
SIGNAL TO NOISE RATIO (29)
VISUALIZATION (28)
ACOUSTIC MODEL (27)
CONTEXT MODELING (27)
MACHINE LEARNING (26)
EMOTION RECOGNITION (25)
KERNEL (25)
DICTIONARIES (24)
NOISE MEASUREMENT (24)
STANDARDS (24)
OPTIMIZATION (23)
PATTERN RECOGNITION (23)
EQUATIONS (22)
GAUSSIAN PROCESSES (22)
INDEXES (22)
ALGORITHM DESIGN AND ANALYSIS (21)
EDUCATIONAL INSTITUTIONS (21)
MICROPHONES (21)
PROBABILITY (21)
SIGNAL PROCESSING ALGORITHMS (21)
CLUSTERING ALGORITHMS (20)
CONFERENCES (20)
RECURRENT NEURAL NETWORKS (20)
SPEAKER ADAPTATION (20)
COMPUTERS (19)
CORRELATION (19)
HMM (19)
SUPPORT VECTOR MACHINE CLASSIFICATION (18)
COMPUTER ARCHITECTURE (17)
GAUSSIAN MIXTURE MODEL (17)
UNSUPERVISED LEARNING (17)
COMPLEXITY THEORY (16)
DETECTORS (16)
LANGUAGE MODEL (16)
NEURAL NETS (16)
PRAGMATICS (16)
CONVOLUTION (15)
MEASUREMENT (15)
NIST (15)
PATTERN CLASSIFICATION (15)
ACOUSTIC MEASUREMENTS (14)
EVENT DETECTION (14)
KEYWORD SEARCH (14)
NEURONS (14)
PREDICTIVE MODELS (14)
SPEECH ENHANCEMENT (14)
VOICE CONVERSION (14)
JOINTS (13)
LVCSR (13)
MEL FREQUENCY CEPSTRAL COEFFICIENT (13)
SPEECH CODING (13)
SUPPORT VECTOR MACHINE (13)
TRAJECTORY (13)
APPROXIMATION METHODS (12)
DEEP LEARNING (12)
DNN (12)
more

INFONA - science communication portal

Advanced search

Advanced search in people

A novel pitch extraction based on jointly trained deep BLSTM Recurrent Neural Networks with bottleneck features

Improving audio-visual speech recognition using deep neural networks with dynamic stream reliability estimates

Low-rank and sparse soft targets to learn better DNN acoustic models

Faster sequence training

A study on data augmentation of reverberant speech for robust speech recognition

Non-negative matrix factorization of signals with overlapping events for event detection applications

Speaker diarization using deep neural network embeddings

Expressive visual text to speech and expression adaptation using deep neural networks

Multi-task deep neural network with shared hidden layers: Breaking down the wall between emotion representations

A first attempt at polyphonic sound event detection using connectionist temporal classification

End-to-end ASR-free keyword search from speech

Active learning for sound event classification by clustering unlabeled data

Investigations on byte-level convolutional neural networks for language modeling in low resource speech recognition

Hearing in a shoe-box: Binaural source position and wall absorption estimation using virtually supervised learning

Respiratory airflow estimation from lung sounds based on regression

A network of deep neural networks for Distant Speech Recognition

Anuran call classification with deep learning

A comprehensive study of deep bidirectional LSTM RNNS for acoustic modeling in speech recognition

On time-frequency mask estimation for MVDR beamforming with application in robust speech recognition

A deep neural network integrated with filterbank learning for speech recognition

Filter options

Publication date

Content availability

Publication type

Keywords

INFONA - science communication portal

Advanced search

Advanced search in people

Add recipient

Sending message cancelled

Are you sure you want to cancel sending this message?

Send message

Filter options

Publication date

Date range setting

Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.

Content availability

Publication type

Keywords

Reporting an error / abuse

Sending the report failed

Accessibility options