Many machine learning tasks have been shown to be solvable with impressive levels of success given large amounts of training data and computational power. For problems that lack sufficient data to achieve high performance, transfer learning methods can be applied. These refer to performing the new task with prior knowledge of the nature of the data, gained by first performing a different...
Deep neural networks have been widely applied in the field of environmental sound classification. However, due to the scarcity of carefully labeled data, their training process suffers from over-fitting. Data augmentation is a technique that alleviates this issue. It augments the training set with synthetic data that are created by modifying some parameters of the real data. However, not all kinds...
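The abstract above describes augmenting a training set by modifying parameters of the real data. As a minimal illustrative sketch (the function name, SNR parameterisation, and use of white Gaussian noise are assumptions, not taken from the paper), noise injection at a chosen signal-to-noise ratio could look like:

```python
import random

def add_noise(samples, snr_db=20.0, seed=0):
    """Augment a waveform (list of float samples) with white Gaussian
    noise at a target signal-to-noise ratio given in dB."""
    rng = random.Random(seed)
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = noise_power ** 0.5
    return [s + rng.gauss(0.0, sigma) for s in samples]

clean = [0.1, -0.2, 0.3, -0.1, 0.05, 0.2]
noisy = add_noise(clean, snr_db=10.0)  # same length, perturbed values
```

Each augmented copy is a new training example with the same label, which is exactly what makes some augmentations helpful and others harmful for a given task.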
We propose a method for optimizing an acoustic feature extractor for anomalous sound detection (ASD). Most ASD systems adopt outlier-detection techniques because it is difficult to collect a massive amount of anomalous sound data. To improve the performance of such outlier-detection-based ASD, it is essential to extract a set of efficient acoustic features that is suitable for identifying anomalous...
This paper investigates the use of a Dirichlet process hidden Markov model (DPHMM) tokenizer for the template-matching-based query-by-example spoken term detection (QbE-STD) task. A DPHMM can be obtained through an unsupervised iterative procedure without any training transcriptions. The STD performance of the DPHMM tokenizer is evaluated on the TIMIT corpus. We construct three kinds of DPHMM-based QbE-STD...
There are several challenges in building Automatic Speech Recognition (ASR) systems for low-resource languages such as Indic languages. One problem is access to the large amounts of training data required to build Acoustic Models (AM) from scratch. In the context of Indian English, another challenge is code-mixing, as many Indian speakers are multilingual and exhibit code-mixing in their...
With the completion of the IARPA Babel program, it is possible to systematically analyze the performance of speech recognition systems across a wide variety of languages. We select 16 languages from the dataset and compare performance using a deep neural network-based acoustic model. The focus is on keyword spotting using the actual term-weighted value (ATWV) metric. We demonstrate that ATWV is keyword...
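The ATWV metric referenced above follows the NIST spoken-term-detection definition: one minus the average, over keywords, of the miss probability plus a weighted false-alarm probability (the standard weight is beta = 999.9). A sketch of that computation (the counting convention per keyword is assumed, not from this paper):

```python
def atwv(per_kw_counts, speech_seconds, beta=999.9):
    """Actual term-weighted value over a keyword list.
    per_kw_counts: list of (n_true, n_miss, n_fa) tuples, one per
    keyword; speech_seconds: total duration of the evaluated audio."""
    losses = []
    for n_true, n_miss, n_fa in per_kw_counts:
        if n_true == 0:
            continue  # keywords absent from the reference are skipped
        p_miss = n_miss / n_true
        # false-alarm rate is per non-target trial (one trial per second)
        p_fa = n_fa / (speech_seconds - n_true)
        losses.append(p_miss + beta * p_fa)
    return 1.0 - sum(losses) / len(losses)
```

A perfect system scores 1.0; missing everything while producing no false alarms scores 0.0, and false alarms are penalised heavily through beta.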
In this paper we describe the 2016 BBN conversational telephone speech keyword spotting system; the culmination of four years of research and development under the IARPA Babel program. The system was constructed in response to the NIST Open Keyword Search (OpenKWS) evaluation of 2016. We present our technological breakthroughs in building top-performing keyword spotting processing systems for new...
Conventional deep neural networks (DNN) for speech acoustic modeling rely on Gaussian mixture models (GMM) and hidden Markov model (HMM) to obtain binary class labels as the targets for DNN training. Subword classes in speech recognition systems correspond to context-dependent tied states or senones. The present work addresses some limitations of GMM-HMM senone alignments for DNN training. We hypothesize...
The environmental robustness of DNN-based acoustic models can be significantly improved by using multi-condition training data. However, as data collection is a costly proposition, simulation of the desired conditions is a frequently adopted strategy. In this paper we detail a data augmentation approach for far-field ASR. We examine the impact of using simulated room impulse responses (RIRs), as real...
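Simulating far-field conditions as described above amounts to convolving clean speech with a room impulse response (RIR). A minimal direct-convolution sketch (real pipelines would use FFT-based convolution and measured or image-method RIRs):

```python
def convolve(signal, rir):
    """Full direct convolution of a dry signal with a room impulse
    response, producing a reverberated copy for training."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out

# An impulse through an RIR reproduces the RIR itself (plus padding).
reverberated = convolve([1.0, 0.0], [0.5, 0.25])
```

Varying the simulated room geometry and source-microphone distance yields a spread of RIRs, which is what makes the resulting multi-condition set useful.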
In many event detection applications, training data may contain tags with multiple, simultaneous events. This is particularly likely when the definition of “event” is broad and includes events that can persist for an extended period of time. Decomposing a mixed signal into signals corresponding to individual events is non-trivial. In this paper, we propose a non-negative matrix factorization (NMF)...
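NMF, as mentioned above, decomposes a nonnegative matrix V (e.g. a spectrogram) into nonnegative factors W and H with V ≈ W·H, so columns of W can act as per-event templates. A pure-Python sketch of the standard Lee-Seung multiplicative updates (not necessarily the paper's exact variant):

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def nmf(V, rank, iters=500, seed=0):
    """Factor V (list of lists, nonnegative) into W (m x rank) and
    H (rank x n) minimising the Frobenius reconstruction error."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    eps = 1e-9
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = [list(r) for r in zip(*W)]
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)]
             for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = [list(r) for r in zip(*H)]
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(m)]
    return W, H
```

The multiplicative form keeps every entry nonnegative by construction, which is why the learned parts remain interpretable as additive event components.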
When using connectionist temporal classification (CTC) based acoustic models (AMs) for large vocabulary continuous speech recognition (LVCSR), most previous studies have used a naive interpolation of the CTC-AM score and an additional language model score, although there is no theoretical justification for such an approach. On the other hand, we recently proposed a theoretically more sound decoding...
In this work we explore data-augmentation techniques for the task of improving the performance of a supervised recurrent-neural-network classifier tasked with predicting prosodic-boundary and pitch-accent labels. The technique is based on applying voice transformations to the training data that modify the pitch baseline and range, as well as the vocal-tract and vocal-source characteristics of the...
DNN based acoustic models require a large amount of training data. Parametric data augmentation techniques such as adding noise, reverberation, or changing the speech rate, are often employed to boost the dataset size and the ASR performance. The choice of augmentation techniques and the associated parameters has been handled heuristically so far. In this work we propose an algorithm to automatically...
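One of the parametric augmentations named above is changing the speech rate. A toy sketch using linear-interpolation resampling (production systems resample with proper anti-aliasing filters; the function name and interface are illustrative):

```python
def change_speed(samples, factor):
    """Resample a waveform so playback is `factor` times faster;
    factor > 1 shortens the signal, factor < 1 lengthens it."""
    n_out = int(len(samples) / factor)
    out = []
    for k in range(n_out):
        pos = k * factor
        i = int(pos)
        frac = pos - i
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        out.append((1 - frac) * samples[i] + frac * nxt)
    return out
```

The augmentation parameter here is `factor`; searching over such parameters automatically, rather than fixing them heuristically, is precisely what the abstract proposes.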
Batch normalization is a standard technique for training deep neural networks. In batch normalization, the input of each hidden layer is first mean-variance normalized and then linearly transformed before applying non-linear activation functions. We propose a novel unsupervised speaker adaptation technique for batch normalized acoustic models. The key idea is to adjust the linear transformations previously...
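The batch-normalization transform described above, for a single feature dimension, is a mean-variance normalisation followed by a learned affine map. A sketch (the adaptation idea in the abstract would re-estimate only `gamma` and `beta` per speaker, keeping the rest of the network fixed):

```python
def batch_norm(x, mean, var, gamma, beta, eps=1e-5):
    """Normalise values of one feature dimension with the stored
    mean/variance, then apply the learned linear transform."""
    return [gamma * (v - mean) / (var + eps) ** 0.5 + beta for v in x]

# Standardised inputs with gamma=1, beta=0 are left (approximately)
# zero-mean, unit-scale.
normalised = batch_norm([1.0, 3.0], mean=2.0, var=1.0,
                        gamma=1.0, beta=0.0)
```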
Multilingual (ML) representations play a key role in building speech recognition systems for low resource languages. The IARPA sponsored BABEL program focuses on building speech recognition (ASR) and keyword search (KWS) systems in over 24 languages with limited training data. The most common mechanism to derive ML representations in the BABEL program has been with the use of a two-stage network,...
This paper describes a new technique to automatically obtain large high-quality training speech corpora for acoustic modeling. Traditional approaches select utterances based on confidence thresholds and other heuristics. We propose instead to use an ensemble approach: we transcribe each utterance using several recognizers, and only keep those on which they agree. The recognizers we use are trained...
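The ensemble filtering step above — transcribe each utterance with several recognizers and keep only the ones they agree on — can be sketched in a few lines (exact-match agreement is assumed here; the paper's agreement criterion may be looser):

```python
def filter_by_agreement(hypotheses):
    """Keep only utterances on which every recognizer produced the
    same transcript. `hypotheses` maps utterance id -> list of
    transcripts, one per recognizer."""
    return {utt: hyps[0] for utt, hyps in hypotheses.items()
            if len(set(hyps)) == 1}

hyps = {"utt1": ["hello world", "hello world"],
        "utt2": ["good morning", "good mourning"]}
kept = filter_by_agreement(hyps)  # only utt1 survives
```

Agreement across independently trained recognizers acts as a much stronger quality signal than any single model's confidence score.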
Deep neural network (DNN) acoustic models can be adapted to under-resourced languages by transferring the hidden layers. An analogous transfer problem is known in few-shot learning, where scarcely seen objects are recognised from their meaningful attributes. In a similar way, this paper proposes a principled way to represent the hidden layers of a DNN in terms of attributes shared across languages. The diverse...
Acoustic model performance typically decreases when evaluated on a dialectal variation of the same language that was not used during training. Similarly, models simultaneously trained on a group of dialects tend to underperform dialect-specific models. In this paper, we report on our efforts towards building a unified acoustic model that can serve a multi-dialectal language. Two techniques are presented:...
Training very deep neural networks is difficult because of gradient degradation. However, the expressiveness of many deep layers is highly desirable at test time and usually leads to better performance. Recently, training techniques such as residual networks, which make it possible to train very deep networks, have proved a great success. In this paper, we study the application...
Mismatched crowdsourcing is a technique to derive speech transcriptions using crowd-workers unfamiliar with the language being spoken. This technique is especially useful for under-resourced languages since it is hard to hire native transcribers. In this paper, we demonstrate that using mismatched transcription for adaptation improves performance of speech recognition under limited matched training...