Recently, bottleneck features have been successfully used as effective representations in Speaker Recognition (SR) and Language Recognition (LR), but little work has focused on bottleneck features for Bird Species Verification (BSV). In SR, LR and BSV tasks, short-time spectral features may be insufficient, so more abstract and discriminative representations are needed as a complement to...
This paper advances the design of CTC-based all-neural (or end-to-end) speech recognizers. We propose a novel symbol inventory, and a novel iterated-CTC method in which a second system is used to transform a noisy initial output into a cleaner version. We present a number of stabilization and initialization methods we have found useful in training these networks. We evaluate our system on the commonly...
Despite the remarkable progress recently made in distant speech recognition, state-of-the-art technology still suffers from a lack of robustness, especially when adverse acoustic conditions characterized by non-stationary noises and reverberation are met.
Recent experiments show that deep bidirectional long short-term memory (BLSTM) recurrent neural network acoustic models outperform feedforward neural networks for automatic speech recognition (ASR). However, their training requires a lot of tuning and experience. In this work, we provide a comprehensive overview of various BLSTM training aspects and their interplay within ASR, which has been missing...
In this work we study variance in the results of neural network training on a wide variety of configurations in automatic speech recognition. Although this variance itself is well known, this is, to the best of our knowledge, the first paper that performs an extensive empirical study on its effects in speech recognition. We view training as sampling from a distribution and show that these distributions...
Training neural network acoustic models on limited quantities of data is a challenging task. A number of techniques have been proposed to improve generalisation. This paper investigates one such technique called stimulated training. It enables standard criteria such as cross-entropy to enforce spatial constraints on activations originating from different units. Having different regions being active...
Methods for adapting and controlling the characteristics of output speech are important topics in speech synthesis. In this work, we investigated the performance of DNN-based text-to-speech systems that in parallel to conventional text input also take speaker, gender, and age codes as inputs, in order to 1) perform multi-speaker synthesis, 2) perform speaker adaptation using small amounts of target-speaker...
The use of deep neural networks (DNNs) for feature extraction and Gaussian mixture models (GMMs) for acoustic modelling is often termed a tandem system configuration and can be viewed as a Gaussian mixture density neural network (MDNN). Compared to the direct use of DNN output probabilities in the acoustic model, the tandem approach suffers from a major weakness in that the feature extraction stage...
Bilinear-model-based feature-space Maximum Likelihood Linear Regression (FMLLR) speaker adaptation has shown good performance for GMM-HMMs, especially when the amount of adaptation data is limited. In this paper, we propose using bilinear model features as inputs to deep neural networks (DNNs) for rapid speaker adaptation of acoustic modeling, to facilitate utterance-level normalization. The effectiveness...
In this paper we propose a framework for building a full-fledged acoustic unit recognizer in a zero resource setting, i.e., without any provided labels. For that, we combine an iterative Dirichlet process Gaussian mixture model (DPGMM) clustering framework with a standard pipeline for supervised GMM-HMM acoustic model (AM) and n-gram language model (LM) training, enhanced by a scheme for iterative...
I-Vectors have been successfully applied in the speaker identification community to characterize the speaker and their acoustic environment. Recently, i-vectors have also shown their usefulness in automatic speech recognition when concatenated to standard acoustic features. Instead of directly feeding the acoustic model with i-vectors, we here investigate a Multi-Task Learning approach, where...
This paper presents a two-pass framework of mispronunciation detection and diagnosis (MD&D) — detection followed by diagnosis, without the need for explicit error pattern modeling, so that the main efforts can be devoted to improving acoustic modeling by discriminative training (or by applying alternative models such as neural networks). The framework instantiates a set of anti-phones and a filler model...
Previous accent classification research focused mainly on detecting accents with purely acoustic information, without recognizing accented speech. This work combines phonetic knowledge, such as vowels, with acoustic information to build a Gaussian Mixture Model (GMM) classifier with Perceptual Linear Predictive (PLP) features, optimized by Heteroscedastic Linear Discriminant Analysis (HLDA). With input about...
In this paper, we briefly describe REMAP, an approach for the training and estimation of posterior probabilities, and report its application to speech recognition. REMAP is a recursive algorithm that is reminiscent of the Expectation Maximization (EM) [5] algorithm for the estimation of data likelihoods. Although very general, the method is developed in the context of a statistical model for transition-based...
This paper introduces a method to produce high-quality transcriptions of speech data from only two crowd-sourced transcriptions. These transcriptions, produced cheaply by people on the Internet, for example through Amazon Mechanical Turk, are often of low quality. Often, multiple crowd-sourced transcriptions are combined to form one transcription of higher quality. However, the state of the art is...
This paper compares two automatic segmentation algorithms against standard HMM (Hidden Markov Model) forced alignment from two points of view: the probabilities of insertion and omission, and the accuracy. The first algorithm, hereafter named the refined HMM algorithm, aims at refining the segmentation performed by the standard HMM via a GMM (Gaussian Mixture Model) of each boundary...
In this paper, we propose a new method for classifying patients with pulmonary emphysema and healthy subjects using lung sounds. With conventional classification methods, every boundary between inspiratory and expiratory phases in successive respiratory sounds is detected manually prior to automatic classification. However, manual segmentation must be performed accurately and has therefore created...
Recently, the automatic analysis of likability of a voice has become popular. This work follows up on our original work in this field and provides an in-depth discussion of the matter and an analysis of the acoustic parameters. We investigate the automatic analysis of voice likability in a continuous label space with neural networks as regressors and discuss the relevance of acoustic features. We...
This paper deals with an optimization of state tying for triphone-based HMMs in the case of training data deficiency. The main goal is to analyse the importance of the stopping threshold for the criterion function in tree-based clustering. The log-likelihood measure was used as the criterion function, and a varying threshold was evaluated with different sizes of the training set. Tied-state triphone HMMs with...
Acoustic pattern-matching algorithms have recently become prominent again for automatically processing speech utterances where no prior knowledge of the spoken language is required. Applications of such technology include, but are not limited to, query-by-example search, spoken term detection and automatic word discovery. Obtaining content-aware acoustic features as independent as possible from speaker...