This paper constructs speech features based on a generative model using a deep latent Gaussian model (DLGM), which is trained with the stochastic gradient variational Bayes (SGVB) algorithm and performs efficient approximate inference and learning with a directed probabilistic graphical model. The trained DLGM then generates latent variables from a Gaussian distribution, which are used as new features...
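The latent Gaussian features described above are drawn with the reparameterization trick that underlies SGVB. A minimal sketch, assuming an encoder that has already produced per-frame means and log-variances (all array shapes here are illustrative, not the paper's configuration):

```python
import numpy as np

def sample_latent_features(mu, log_var, rng=None):
    """Draw latent features z ~ N(mu, diag(exp(log_var))) via the
    reparameterization trick at the heart of SGVB: z = mu + sigma * eps."""
    rng = np.random.default_rng(0) if rng is None else rng
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# Toy encoder outputs for 3 speech frames with a 4-dimensional latent space.
mu = np.zeros((3, 4))
log_var = np.zeros((3, 4))   # log-variance 0 -> unit variance
z = sample_latent_features(mu, log_var)
print(z.shape)               # (3, 4)
```

Because the noise enters only through `eps`, gradients can flow through `mu` and `log_var` during training, which is what makes the stochastic objective differentiable.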
Novelty detection is the task of recognising events that differ from a model of normality. This paper proposes an acoustic novelty detector based on neural networks trained with an adversarial training strategy. The proposed approach is composed of a feature extraction stage that calculates Log-Mel spectral features from the input signal. Then, an autoencoder network, trained on a corpus of “normal”...
With the completion of the IARPA Babel program, it is possible to systematically analyze the performance of speech recognition systems across a wide variety of languages. We select 16 languages from the dataset and compare performance using a deep neural network-based acoustic model. The focus is on keyword spotting using the actual term-weighted value (ATWV) metric. We demonstrate that ATWV is keyword...
This paper advances the design of CTC-based all-neural (or end-to-end) speech recognizers. We propose a novel symbol inventory, and a novel iterated-CTC method in which a second system is used to transform a noisy initial output into a cleaner version. We present a number of stabilization and initialization methods we have found useful in training these networks. We evaluate our system on the commonly...
When using connectionist temporal classification (CTC) based acoustic models (AMs) for large vocabulary continuous speech recognition (LVCSR), most previous studies have used a naive interpolation of the CTC-AM score and an additional language model score, although there is no theoretical justification for such an approach. On the other hand, we recently proposed a theoretically more sound decoding...
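The naive interpolation the abstract refers to simply adds a weighted language-model log-probability to the CTC acoustic score when ranking hypotheses. A sketch with hypothetical scores and weight:

```python
def interpolated_score(ctc_logprob, lm_logprob, lm_weight=0.7):
    """Naive log-linear interpolation of the CTC acoustic-model score and
    an external language-model score (the weight here is hypothetical)."""
    return ctc_logprob + lm_weight * lm_logprob

# Rank two toy hypotheses: (ctc_logprob, lm_logprob) per hypothesis.
hyps = {"hello world": (-12.0, -3.0), "hollow world": (-11.5, -7.0)}
best = max(hyps, key=lambda h: interpolated_score(*hyps[h]))
print(best)  # hello world
```

The LM term here has no probabilistic derivation from the CTC model, which is exactly the lack of theoretical justification the abstract contrasts with its proposed decoding.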
In this paper we present an extension of our previously described neural machine translation based system for punctuated transcription. This extension allows the system to map from per frame acoustic features to word level representations by replacing the traditional encoder in the encoder-decoder architecture with a hierarchical encoder. Furthermore, we show that a system combining lexical and acoustic...
In recent years, so-called “end-to-end” speech recognition systems have emerged as viable alternatives to traditional ASR frameworks. Keyword search, localizing an orthographic query in a speech corpus, is typically performed by using automatic speech recognition (ASR) to generate an index. Previous work has evaluated the use of end-to-end systems for ASR on well known corpora (WSJ, Switchboard,...
Adapting acoustic models to speakers has been shown to greatly improve performance for many tasks. Among the adaptation approaches, exploiting auxiliary features characterizing speakers or environments has received great attention because they allow rapid adaptation, i.e. adaptation with a limited amount of speech data such as a single utterance. However, the auxiliary features are usually computed in batch...
Recently, there has been an increasing interest in end-to-end speech recognition that directly transcribes speech to text without any predefined alignments. One approach is the attention-based encoder-decoder framework that learns a mapping between variable-length input and output sequences in one step using a purely data-driven method. The attention model has often been shown to improve the performance...
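The mapping between variable-length sequences hinges on the attention mechanism: at each output step the decoder scores every encoder frame, normalizes the scores, and forms a weighted context. A dot-product sketch (illustrative only; the scoring function in attention models varies):

```python
import numpy as np

def attention_context(decoder_state, encoder_states):
    """Dot-product attention sketch: score each encoder frame against the
    decoder state, softmax-normalize, and return the weighted context."""
    scores = encoder_states @ decoder_state           # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # softmax over frames
    context = weights @ encoder_states                # (D,)
    return context, weights

enc = np.eye(3)                  # 3 encoder frames with 3-dim states
dec = np.array([1.0, 0.0, 0.0])  # decoder state closest to frame 0
ctx, w = attention_context(dec, enc)
print(w.argmax())                # 0
```

Because the weights are recomputed at every decoding step, the model learns a soft, data-driven alignment rather than relying on a predefined one.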
Bidirectional long short-term memory (BLSTM) recurrent neural networks are powerful acoustic models in terms of recognition accuracy. When BLSTM acoustic models are used in decoding, the speech decoder must wait until the end of the whole sentence is reached before forward propagation in the backward direction can be performed. This property of BLSTM acoustic models makes them inappropriate...
Automatic speech recognition (ASR) in noisy environments remains a challenging goal. Recently, the idea of estimating the uncertainty about the features obtained after speech enhancement and propagating it to dynamically adapt deep neural network (DNN) based acoustic models has raised some interest. However, the results in the literature were reported on simulated noisy datasets for a limited variety...
In this paper, we introduce a multimodal speech recognition scenario, in which an image provides contextual information for a spoken caption to be decoded. We investigate a lattice rescoring algorithm that integrates information from the image at two different points: the image is used to augment the language model with the most likely words, and to rescore the top hypotheses using a word-level RNN...
Large-scale monitoring of a child's language environment, by measuring the amount of speech directed to the child by other children and adults during vocal communication, is an important task. Using the audio extracted from a recording unit worn by a child within a childcare center, at each point in time our proposed diarization system can determine the content of the child's language environment,...
In this paper we propose a framework for building a full-fledged acoustic unit recognizer in a zero resource setting, i.e., without any provided labels. For that, we combine an iterative Dirichlet process Gaussian mixture model (DPGMM) clustering framework with a standard pipeline for supervised GMM-HMM acoustic model (AM) and n-gram language model (LM) training, enhanced by a scheme for iterative...
In this paper, we investigate a DNN tone-based extended recognition network (ERN) approach to Mandarin tone recognition and tone mispronunciation detection. Given a toneless syllable sequence, a tone-based ERN is constructed by assigning five different tones to each toneless syllable, obtaining a fully expanded tonal syllable network. Next, Viterbi decoding is carried out on the tone-based ERN to...
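The network expansion step can be sketched directly: each toneless syllable becomes a column of five tonal alternatives, and with position-independent arc scores the Viterbi best path reduces to a per-position argmax. All identifiers below are illustrative, not the paper's implementation:

```python
def expand_tonal_network(syllables, num_tones=5):
    """Expand each toneless syllable into its five tonal variants,
    yielding the fully expanded tonal-syllable network
    (one column of alternatives per position)."""
    return [[f"{s}{t}" for t in range(1, num_tones + 1)] for s in syllables]

def best_tone_path(network, score):
    """With position-independent arc scores, the Viterbi best path
    reduces to picking the top-scoring variant at each position."""
    return [max(alts, key=score) for alts in network]

net = expand_tonal_network(["ma", "ni"])
print(net[0])  # ['ma1', 'ma2', 'ma3', 'ma4', 'ma5']
# A toy score preferring tone 3 everywhere:
print(best_tone_path(net, lambda s: 1.0 if s.endswith("3") else 0.0))
```

In the actual system the per-arc scores would come from the DNN's tone posteriors, so the decoded path identifies the recognized (and possibly mispronounced) tone for each syllable.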
Over the last years, many advances have been made in the field of Automatic Speech Recognition (ASR). However, the persistent presence of ASR errors is limiting the widespread adoption of speech technology in real-life applications. This motivates the attempts to find alternative techniques to automatically detect and correct ASR errors, which can be very effective, especially when the user does...
In this paper, we propose a cluster-based senone selection method to speed up the computation of deep neural networks (DNNs) at decoding time in speech recognition. In DNN-based acoustic models, the large number of senones at the output layer is one of the main causes of the high computational complexity of DNNs. Inspired by the mixture selection method designed for the Gaussian mixture...
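One way such a selection scheme can work, sketched under assumptions not stated in the abstract (cluster scoring via centroid weight vectors; all names hypothetical): score each senone cluster cheaply, then evaluate the expensive output layer only for senones in the top clusters.

```python
import numpy as np

def select_senones(hidden, senone_weights, cluster_ids, top_k=1):
    """Cluster-based senone selection sketch: score each cluster by the
    centroid of its output-layer weight rows, then compute logits only
    for senones in the top-k clusters; the rest stay at -inf."""
    n_clusters = int(cluster_ids.max()) + 1
    centroids = np.stack([senone_weights[cluster_ids == c].mean(axis=0)
                          for c in range(n_clusters)])
    cluster_scores = centroids @ hidden
    keep = np.argsort(cluster_scores)[-top_k:]       # top-k cluster indices
    mask = np.isin(cluster_ids, keep)
    logits = np.full(len(senone_weights), -np.inf)
    logits[mask] = senone_weights[mask] @ hidden     # computed only where needed
    return logits, mask

# 4 senones in 2 clusters; the hidden vector matches cluster 1's weights.
W = np.array([[1.0, 0.0], [0.9, 0.0], [0.0, 1.0], [0.0, 0.9]])
ids = np.array([0, 0, 1, 1])
logits, mask = select_senones(np.array([0.0, 1.0]), W, ids)
print(mask.tolist())   # [False, False, True, True]
```

With thousands of senones but only a few dozen clusters, the cluster-scoring pass is far cheaper than the full output layer, which is where the decoding speedup would come from.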
Conditional random fields (CRF) can generate high-quality confidence measure scores (CMS) for speech recognition systems. However, as in many other real-world machine learning tasks, annotated training data are limited, since annotation requires substantial human effort and expertise, while unlabeled data are abundant. To address this issue, we use a scheme of CRF training for ASR confidence...
While acoustic modeling has rapidly progressed in recent years thanks to the impressive gains in performance obtained using deep neural networks (DNNs), language modeling remains a bottleneck for high-performance large vocabulary continuous speech recognition (LVCSR) systems. In this paper an algorithm for automatic word extraction from a stream of phones is proposed for use in...
We investigate techniques based on deep neural networks (DNNs) for attacking the single-channel multi-talker speech recognition problem. Our proposed approach contains five key ingredients: a multi-style training strategy on artificially mixed speech data, a separate DNN to estimate senone posterior probabilities of the louder and softer speakers at each frame, a weighted finite-state transducer (WFST)-based...