Deep neural networks have been widely applied in the field of environmental sound classification. However, due to the scarcity of carefully labeled data, their training process suffers from over-fitting. Data augmentation is a technique that alleviates this issue. It augments the training set with synthetic data that are created by modifying some parameters of the real data. However, not all kinds...
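The augmentation idea described above — enlarging the training set with synthetic examples made by perturbing parameters of real recordings — can be sketched as follows. This is a minimal NumPy illustration on a toy signal, not the paper's actual pipeline; the helper names `add_noise` and `time_shift` are hypothetical.

```python
import numpy as np

def add_noise(x, snr_db, rng):
    """Mix white Gaussian noise into x at a target signal-to-noise ratio (dB)."""
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=x.shape)
    return x + noise

def time_shift(x, max_shift, rng):
    """Circularly shift the waveform by a random number of samples."""
    return np.roll(x, int(rng.integers(-max_shift, max_shift + 1)))

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s, 440 Hz tone at 16 kHz
# Four synthetic variants of one real example.
augmented = [time_shift(add_noise(clean, snr_db=20, rng=rng), max_shift=800, rng=rng)
             for _ in range(4)]
```

Each variant preserves the original label while changing the waveform, which is what lets the classifier see more diverse training material.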
Sound event detection (SED) in environmental recordings is a key topic of research in machine listening, with applications in noise monitoring for smart cities, self-driving cars, surveillance, bioacoustic monitoring, and indexing of large multimedia collections. Developing new solutions for SED often relies on the availability of strongly labeled audio recordings, where the annotation includes the...
In order to train neural networks (NN) for text-to-speech synthesis (TTS), phonetic segmentation must be performed. The most accurate segmentation is performed manually, but the process of creating manual alignments is costly and time-consuming, so automatic procedures are preferable. In this paper, a simple alignment method based on models trained during hidden Markov Model (HMM) based TTS system...
This paper presents an automatic system for detection of bird species in field recordings. A sinusoidal detection algorithm is employed to segment the acoustic scene into isolated spectro-temporal segments. Each segment is represented as a temporal sequence of frequencies of the detected sinusoid, referred to as frequency track. Each bird species is represented by a set of hidden Markov models (HMMs),...
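The frequency-track representation mentioned above — a per-frame sequence of detected sinusoid frequencies — can be sketched with a simple peak-picking detector. This is an illustrative stand-in for the paper's sinusoidal detection algorithm, assuming a single dominant sinusoid per frame and a magnitude threshold for silence.

```python
import numpy as np

def frequency_track(x, sr, frame_len=512, hop=256, threshold=0.1):
    """Per-frame dominant frequency (Hz) of the strongest sinusoid.

    Frames whose peak magnitude falls below `threshold` times the global
    maximum are treated as silence and excluded from the track.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    spectra = np.array([np.abs(np.fft.rfft(window * x[i * hop:i * hop + frame_len]))
                        for i in range(n_frames)])
    peak = spectra.max()
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    return np.array([freqs[frame.argmax()] for frame in spectra
                     if frame.max() >= threshold * peak])

sr = 16000
t = np.arange(sr) / sr
chirp = np.sin(2 * np.pi * (1000 + 2000 * t) * t)  # upward sweep, roughly 1-5 kHz
track = frequency_track(chirp, sr)
```

The resulting track rises over time for the sweep, and such sequences could then be scored against per-species HMMs as the abstract describes.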
In the paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpus of annotated Polish speech data. We propose an MPI-based modification of the training...
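The parameter-averaging scheme referred to above has a very simple core: each worker trains its own copy of the model on a shard of data, and the copies are periodically merged by element-wise averaging. A minimal sketch of the merge step (toy parameter dictionaries, not Kaldi's actual data structures):

```python
import numpy as np

def average_parameters(worker_models):
    """Element-wise average of per-worker parameter dictionaries."""
    keys = worker_models[0].keys()
    return {k: np.mean([m[k] for m in worker_models], axis=0) for k in keys}

# Two workers whose parameters diverged during local training steps.
w0 = {"W": np.array([[1.0, 2.0]]), "b": np.array([0.0])}
w1 = {"W": np.array([[3.0, 4.0]]), "b": np.array([2.0])}
merged = average_parameters([w0, w1])
# merged["W"] → [[2.0, 3.0]], merged["b"] → [1.0]
```

In a real MPI setting the averaging would be implemented with a collective reduction (e.g. an all-reduce) rather than gathering full dictionaries on one node.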
Cross-lingual speaker adaptation for speech synthesis has many applications, such as use in speech-to-speech translation systems. Here, we focus on cross-lingual adaptation for statistical speech synthesis systems using limited adaptation data. To that end, we propose two eigenvoice adaptation approaches exploiting a bilingual Turkish–English speech database that we collected. In one approach, eigenvoice...
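The general eigenvoice idea behind approaches like the one above is to build a low-dimensional speaker subspace from training-speaker parameter vectors ("supervectors") and then fit a new speaker with only a few weights. A toy PCA-based sketch under that assumption (random vectors standing in for real supervectors; not the paper's bilingual method):

```python
import numpy as np

def train_eigenvoices(supervectors, k):
    """Mean voice plus top-k eigenvoices via PCA (SVD of centered data)."""
    mean = supervectors.mean(axis=0)
    _, _, vt = np.linalg.svd(supervectors - mean, full_matrices=False)
    return mean, vt[:k]  # eigenvoices are orthonormal rows of vt

def adapt(mean, eigenvoices, target_supervector):
    """Project a new speaker onto the eigenvoice subspace (least squares)."""
    weights = eigenvoices @ (target_supervector - mean)
    return mean + weights @ eigenvoices

rng = np.random.default_rng(2)
speakers = rng.standard_normal((20, 6))  # 20 training speakers, toy 6-dim supervectors
mean, ev = train_eigenvoices(speakers, k=3)
new_speaker = rng.standard_normal(6)
adapted = adapt(mean, ev, new_speaker)
```

Because only the k weights are estimated from the target speaker, very little adaptation data is needed, which is the appeal in the limited-data setting the abstract describes.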
This paper investigates the use of Dirichlet process hidden Markov model (DPHMM) tokenizer for the template matching based query-by-example spoken term detection (QbE-STD) task. DPHMM can be obtained following an unsupervised iterative procedure without any training transcriptions. The STD performance of the DPHMM tokenizer is evaluated on TIMIT Corpus. We construct three kinds of DPHMM based QbE-STD...
This paper presents our work on developing acoustic models using deep neural networks (DNN) for low-resource languages. This is considered one of the challenging problems in automatic speech recognition (ASR), as DNNs need a large amount of data for building efficient models. The techniques explored in this approach use the common idea of transferring knowledge from models of a high-resource language to...
In this paper, we investigate various training methods for building deep neural network (DNN) based acoustic models for dysarthric speech data. Methods like multitask learning, knowledge distillation and model adaptation, which overcome data sparsity and model over-fitting problems, are employed to study the merits of each method. In the knowledge distillation framework, some privileged information in addition...
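The knowledge-distillation objective mentioned above is, in its standard form, a weighted sum of the usual cross-entropy with the hard label and a cross-entropy with the teacher's temperature-softened posteriors. A minimal NumPy sketch of that standard loss (toy logits; not the paper's specific setup with privileged information):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; T > 1 softens the distribution."""
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_label, T=2.0, alpha=0.5):
    """alpha * hard cross-entropy + (1 - alpha) * T^2 * soft cross-entropy."""
    hard_ce = -np.log(softmax(student_logits)[hard_label] + 1e-12)
    p_teacher_T = softmax(teacher_logits, T)
    p_student_T = softmax(student_logits, T)
    soft_ce = -np.sum(p_teacher_T * np.log(p_student_T + 1e-12))
    return alpha * hard_ce + (1 - alpha) * (T ** 2) * soft_ce

loss = distillation_loss(np.array([2.0, 0.5, -1.0]),
                         np.array([1.5, 1.0, -0.5]),
                         hard_label=0)
```

The T² factor keeps the gradient magnitudes of the soft term comparable across temperatures, which is the usual convention.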
There are several challenges in building Automatic Speech Recognition (ASR) systems for low-resource languages such as Indic languages. One problem is access to the large amounts of training data required to build Acoustic Models (AM) from scratch. In the context of Indian English, another challenge encountered is code-mixing, as many Indian speakers are multilingual and exhibit code-mixing in their...
It is well known that recognizers personalized to each user are much more effective than user-independent recognizers. With the popularity of smartphones today, collecting a large set of audio data for each user is not difficult, but transcribing it is. However, it is now possible to automatically discover acoustic tokens from unlabeled personal data in an unsupervised way. We...
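A common baseline for the unsupervised token discovery mentioned above is to cluster acoustic frames so that each cluster index acts as a pseudo-token. A minimal k-means sketch under that assumption (toy feature vectors in place of real acoustic features; not the paper's actual discovery procedure):

```python
import numpy as np

def discover_tokens(frames, n_tokens, iters=10):
    """Cluster frames into pseudo-token IDs with k-means
    (farthest-point initialisation, then Lloyd iterations)."""
    centers = [frames[0]]
    for _ in range(n_tokens - 1):
        d = np.min([np.linalg.norm(frames - c, axis=1) for c in centers], axis=0)
        centers.append(frames[d.argmax()])  # next center: farthest point so far
    centers = np.array(centers)
    for _ in range(iters):
        d = np.linalg.norm(frames[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        for k in range(n_tokens):
            if np.any(labels == k):
                centers[k] = frames[labels == k].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(3)
# Toy "frames": two well-separated phone-like clusters in feature space.
frames = np.concatenate([rng.normal(0, 0.1, (50, 4)), rng.normal(5, 0.1, (50, 4))])
labels, centers = discover_tokens(frames, n_tokens=2)
```

The label sequence over time plays the role of an unsupervised transcription that downstream training can consume.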
This paper investigates the application of unsupervised acoustic unit discovery for topic identification (topic ID) of spoken audio documents. The acoustic unit discovery method is based on a non-parametric Bayesian phone-loop model that segments a speech utterance into phone-like categories. The discovered phone-like (acoustic) units are further fed into the conventional topic ID framework. Using...
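Once discovered units replace words, a conventional topic ID framework of the kind referred to above can operate on bag-of-units statistics. A toy sketch assuming a nearest-centroid classifier over normalised unit histograms (the topic names and unit IDs here are invented for illustration):

```python
from collections import Counter

def unit_histogram(units, vocab_size):
    """Normalised bag-of-units vector for one spoken document."""
    counts = Counter(units)
    return [counts.get(u, 0) / len(units) for u in range(vocab_size)]

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return num / den if den else 0.0

def classify(doc_units, topic_centroids, vocab_size):
    """Assign the topic whose centroid histogram is closest in cosine similarity."""
    h = unit_histogram(doc_units, vocab_size)
    return max(topic_centroids, key=lambda t: cosine(h, topic_centroids[t]))

# Toy topics: "weather" documents favour units 0-2, "sports" favour units 3-5.
centroids = {
    "weather": unit_histogram([0, 0, 1, 2, 1], 6),
    "sports": unit_histogram([3, 4, 5, 4, 3], 6),
}
topic = classify([0, 1, 1, 2, 4], centroids, vocab_size=6)
# → "weather"
```

Real systems typically add TF-IDF weighting and a stronger classifier, but the pipeline shape — unit sequence in, topic label out — is the same.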
The environmental robustness of DNN-based acoustic models can be significantly improved by using multi-condition training data. However, as data collection is a costly proposition, simulation of the desired conditions is a frequently adopted strategy. In this paper we detail a data augmentation approach for far-field ASR. We examine the impact of using simulated room impulse responses (RIRs), as real...
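The core operation in RIR-based simulation as described above is convolving clean speech with an impulse response and optionally mixing in noise at a chosen SNR. A minimal sketch with a toy exponentially decaying RIR standing in for a measured or simulated one:

```python
import numpy as np

def simulate_far_field(clean, rir, noise=None, snr_db=None):
    """Convolve clean speech with a room impulse response and
    optionally add noise at a target SNR (dB)."""
    reverberant = np.convolve(clean, rir)[:len(clean)]
    if noise is not None and snr_db is not None:
        sig_p = np.mean(reverberant ** 2)
        noise_p = np.mean(noise ** 2)
        scale = np.sqrt(sig_p / (noise_p * 10 ** (snr_db / 10)))
        reverberant = reverberant + scale * noise[:len(reverberant)]
    return reverberant

rng = np.random.default_rng(1)
clean = rng.standard_normal(16000)  # stand-in for 1 s of clean speech
rir = np.exp(-np.arange(800) / 100.0) * rng.standard_normal(800)  # toy decaying RIR
far = simulate_far_field(clean, rir, noise=rng.standard_normal(16000), snr_db=10)
```

The question the abstract raises — simulated versus real RIRs — concerns where `rir` comes from; the mixing arithmetic is the same either way.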
In this paper we present an extension of our previously described neural machine translation based system for punctuated transcription. This extension allows the system to map from per frame acoustic features to word level representations by replacing the traditional encoder in the encoder-decoder architecture with a hierarchical encoder. Furthermore, we show that a system combining lexical and acoustic...
In this work we explore data-augmentation techniques for the task of improving the performance of a supervised recurrent-neural-network classifier tasked with predicting prosodic-boundary and pitch-accent labels. The technique is based on applying voice transformations to the training data that modify the pitch baseline and range, as well as the vocal-tract and vocal-source characteristics of the...
Active learning aims to reduce the time and cost of developing speech recognition systems by selecting for transcription highly informative subsets from large pools of audio data. Previous evaluations at OpenKWS and IARPA BABEL have investigated data selection for low-resource languages in very constrained scenarios with 2-hour data selections given a 1-hour seed set. We expand on this to investigate...
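A common instantiation of the selection step described above is uncertainty sampling under a time budget: rank the unlabeled pool by recognizer confidence and transcribe the least confident utterances first. A toy sketch of that policy (the utterance records and confidence scores are invented; real systems use richer informativeness criteria):

```python
def select_for_transcription(pool, budget_seconds):
    """Greedily pick the least-confident utterances until the
    transcription budget (in seconds) is exhausted."""
    ranked = sorted(pool, key=lambda u: u["confidence"])
    chosen, used = [], 0.0
    for utt in ranked:
        if used + utt["duration"] <= budget_seconds:
            chosen.append(utt["id"])
            used += utt["duration"]
    return chosen

pool = [
    {"id": "u1", "confidence": 0.95, "duration": 4.0},
    {"id": "u2", "confidence": 0.40, "duration": 6.0},
    {"id": "u3", "confidence": 0.55, "duration": 5.0},
    {"id": "u4", "confidence": 0.70, "duration": 3.0},
]
picked = select_for_transcription(pool, budget_seconds=10.0)
# → ["u2", "u4"]: u2 (6 s) is least confident; u3 (5 s) no longer fits, so u4 (3 s) is taken.
```

The constrained OpenKWS/BABEL scenarios mentioned in the abstract correspond to fixing `budget_seconds` at, say, two hours on top of a one-hour seed set.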
It is very important to exploit abundant unlabeled speech for improving acoustic model training in automatic speech recognition (ASR). Semi-supervised training methods incorporate unlabeled data in addition to labeled data to enhance model training, but they encounter the problem of error-prone labels. The ensemble training scheme trains a set of models and combines them to make the model more...
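One standard way an ensemble mitigates the error-prone-label problem mentioned above is to average the posteriors of several models over the unlabeled data and keep a pseudo-label only when the ensemble is confident. A minimal sketch of that filtering step (toy posteriors, hypothetical threshold):

```python
import numpy as np

def ensemble_pseudo_labels(posteriors, threshold=0.8):
    """Average per-model posteriors; keep a pseudo-label only when the
    ensemble's peak posterior clears the confidence threshold."""
    avg = np.mean(posteriors, axis=0)        # shape: (n_utts, n_classes)
    labels = avg.argmax(axis=1)
    confident = avg.max(axis=1) >= threshold
    return [(int(l) if keep else None) for l, keep in zip(labels, confident)]

# Three models scoring two unlabeled utterances over three classes.
posteriors = np.array([
    [[0.90, 0.05, 0.05], [0.4, 0.3, 0.3]],
    [[0.80, 0.10, 0.10], [0.2, 0.5, 0.3]],
    [[0.85, 0.10, 0.05], [0.3, 0.4, 0.3]],
])
labels = ensemble_pseudo_labels(posteriors)
# → [0, None]: the first utterance is kept; the second is too uncertain to trust.
```

Utterances mapped to `None` are simply left out of the semi-supervised training set rather than risking a wrong label.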
Recent advances in distant-talking ASR research have confirmed that speech enhancement is an essential technique for improving the ASR performance, especially in the multichannel scenario. However, speech enhancement inevitably distorts speech signals, which can cause significant degradation when enhanced signals are used as training data. Thus, distant-talking ASR systems often resort to using the...
DNN based acoustic models require a large amount of training data. Parametric data augmentation techniques such as adding noise, reverberation, or changing the speech rate, are often employed to boost the dataset size and the ASR performance. The choice of augmentation techniques and the associated parameters has been handled heuristically so far. In this work we propose an algorithm to automatically...
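The automatic selection problem posed above can be framed as a search over augmentation configurations scored on a development set. A toy random-search sketch under that framing — the candidate grid, the `toy_wer` scorer, and the function name are all invented; a real system would train and decode to obtain dev-set WER for each configuration:

```python
import random

def search_augmentation_policy(candidates, evaluate, trials=20, seed=0):
    """Random search over augmentation configurations; return the
    configuration with the lowest error under `evaluate`."""
    rng = random.Random(seed)
    best_cfg, best_err = None, float("inf")
    for _ in range(trials):
        cfg = {name: rng.choice(values) for name, values in candidates.items()}
        err = evaluate(cfg)
        if err < best_err:
            best_cfg, best_err = cfg, err
    return best_cfg, best_err

candidates = {
    "speed": [0.9, 1.0, 1.1],
    "snr_db": [5, 10, 20],
    "reverb": [False, True],
}

# Stand-in scorer: pretends the best policy is speed 1.1, SNR 10 dB, with reverb.
def toy_wer(cfg):
    return (abs(cfg["speed"] - 1.1)
            + abs(cfg["snr_db"] - 10) / 10
            + (0.0 if cfg["reverb"] else 0.2))

best, err = search_augmentation_policy(candidates, toy_wer)
```

Replacing the heuristic choice with such a search is exactly the kind of automation the abstract proposes, though the paper's actual algorithm may differ from plain random search.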
Constructing deep neural network (DNN) acoustic models from limited training data is an important issue for the development of automatic speech recognition (ASR) applications that will be used in various application-specific acoustic environments. To this end, domain adaptation techniques that train a domain-matched model without overfitting by leveraging pre-constructed source models are widely...
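A simple form of the source-model leverage described above is to penalise how far the adapted parameters drift from the pre-constructed source model, i.e. minimise the task loss plus λ‖θ − θ_src‖². A toy gradient-descent sketch on a quadratic stand-in for the in-domain loss (not any specific paper's adaptation method):

```python
import numpy as np

def adapt_step(theta, theta_src, grad_task, lam=0.1, lr=0.5):
    """One gradient step on: task_loss(theta) + lam * ||theta - theta_src||^2.
    The L2 term keeps the adapted model close to the source model."""
    grad = grad_task + 2 * lam * (theta - theta_src)
    return theta - lr * grad

theta_src = np.array([1.0, -1.0])   # pre-constructed source model
theta = theta_src.copy()
# Toy in-domain loss 0.5 * ||theta - target||^2, gradient theta - target.
target = np.array([3.0, 0.0])
for _ in range(100):
    theta = adapt_step(theta, theta_src, grad_task=(theta - target))
```

The converged solution sits between `target` and `theta_src` (specifically at `(target + 2*lam*theta_src) / (1 + 2*lam)` for this quadratic), which is how the penalty prevents overfitting to scarce in-domain data.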