We introduce a novel approach to addressing inter-dataset variability in the context of speaker recognition under mismatched conditions within the JHU-2013 domain adaptation challenge (DAC) framework. Previously, we took a subspace-removal approach to inter-dataset variability compensation (IDVC) of within-speaker variability. In this work we replace subspace removal with incorporation of the variability...
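The subspace-removal step that this abstract builds on can be sketched as follows: estimate the directions spanned by per-dataset means and project them out of every i-vector. This is a minimal illustration, not the paper's implementation; the function and argument names are hypothetical.

```python
import numpy as np

def idvc_remove(ivectors, dataset_means, k):
    """Inter-dataset variability compensation by subspace removal (sketch).

    The top-k directions spanned by the centered per-dataset means are
    estimated with an SVD and projected out of every i-vector.
    """
    M = np.asarray(dataset_means)
    M = M - M.mean(axis=0)                      # center the dataset means
    U, _, _ = np.linalg.svd(M.T, full_matrices=False)
    U = U[:, :k]                                # basis of the removed subspace
    return ivectors - (ivectors @ U) @ U.T      # orthogonal-complement projection
```

Because the operation is an orthogonal projection, applying it a second time with the same subspace leaves the vectors unchanged.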
The performance of speech emotion classifiers degrades greatly when the training conditions do not match the testing conditions. This problem is observed in cross-corpora evaluations, even when the corpora are similar. The lack of generalization is particularly problematic when the emotion classifiers are used in real applications. This study addresses this problem by combining active learning (AL)...
In this paper, we continue our work on linear least squares based adaptation (LLS) for deep neural networks. We show that our previously proposed algorithm is a special case of an optimization algorithm called Alternating Direction Method of Multipliers (ADMM). We demonstrate that the adaptation algorithm can improve the performance on various deep neural networks including the bidirectional long...
It is well known that recognizers personalized to each user are much more effective than user-independent recognizers. With the popularity of smartphones today, it is not difficult to collect a large set of audio data for each user, but it is difficult to transcribe it. However, it is now possible to automatically discover acoustic tokens from unlabeled personal data in an unsupervised way. We...
We analyze the theoretical vulnerability of maximum a posteriori (MAP) speaker adaptation, which is widely used in practical speaker recognition systems. First, we prove that there exists a set of feature vectors, called wolves, which can impersonate almost all registered speakers with probability asymptotically close to 1 in at most two trials. Second, our experiment shows that the...
In this paper we describe the 2016 BBN conversational telephone speech keyword spotting system, the culmination of four years of research and development under the IARPA Babel program. The system was constructed in response to the NIST Open Keyword Search (OpenKWS) evaluation of 2016. We present our technological breakthroughs in building top-performing keyword spotting systems for new...
This paper investigates practical strategies for distributing payload across images with content-adaptive steganography and for pooling outputs of a single-image detector for steganalysis. Adopting a statistical model for the detector's output, the steganographer minimizes the power of the most powerful detector of an omniscient Warden, while the Warden, informed by the payload spreading strategy,...
Multi-session training conditions are becoming increasingly common in recent benchmark datasets for both text-independent and text-dependent speaker verification. In the state-of-the-art i-vector framework for speaker verification, such conditions are addressed by simple techniques such as averaging the individual i-vectors, averaging scores, or modifying the Probabilistic Linear Discriminant Analysis...
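The two simplest multi-session strategies mentioned above, averaging the individual i-vectors versus averaging the per-session scores, can be contrasted in a few lines. This is an illustrative sketch assuming cosine scoring; the function names are hypothetical, not from the paper.

```python
import numpy as np

def cosine_score(a, b):
    # Cosine similarity between two i-vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def score_by_ivector_averaging(enroll_sessions, test_iv):
    # Strategy 1: average the enrollment i-vectors, then score once.
    mean_iv = np.mean(enroll_sessions, axis=0)
    return cosine_score(mean_iv, test_iv)

def score_by_score_averaging(enroll_sessions, test_iv):
    # Strategy 2: score each session i-vector, then average the scores.
    return float(np.mean([cosine_score(iv, test_iv) for iv in enroll_sessions]))
```

The two strategies are not equivalent: averaging i-vectors before length normalization weighs sessions by their geometry, while averaging scores treats every session equally.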
Systems based on i-vectors represent the current state-of-the-art in text-independent speaker recognition. In this work we introduce a new compact representation of a speech segment, similar to the speaker factors of Joint Factor Analysis (JFA) and to i-vectors, that we call “e-vector”. The e-vectors derive their name from the eigenvoice space of the JFA speaker modeling approach. Our working hypothesis...
Grapheme-to-phoneme (G2P) conversion is an important problem for many speech and language processing applications. G2P models are particularly useful for low-resource languages that do not have well-developed pronunciation lexicons. Prominent G2P paradigms are based on initial alignments between grapheme and phoneme sequences. In this work, we devise new alignment strategies that work effectively...
Deep neural networks (DNN) have achieved significant success in the field of speech recognition. One of the main advantages of the DNN is automatic feature extraction without human intervention. We therefore incorporate a pseudo-filterbank layer at the bottom of the DNN and train the filterbank layer and the following networks jointly, whereas most systems take pre-defined mel-scale filterbanks as...
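The pseudo-filterbank idea amounts to replacing the fixed mel matrix with a trainable weight matrix applied to the power spectrum. A minimal forward-pass sketch (shapes, names, and the random initialization are illustrative assumptions; in the joint-training setting this matrix would receive gradients like any other layer):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pseudo-filterbank layer: a trainable nonnegative weight matrix W
# (n_filters x n_bins) applied to the power spectrum, followed by log
# compression. In practice W could be initialized from mel triangles.
n_bins, n_filters = 257, 40
W = np.abs(rng.standard_normal((n_filters, n_bins))) * 0.01

def pseudo_fbank_forward(power_spectrum, W, eps=1e-8):
    # power_spectrum: (frames, n_bins) -> log filterbank energies (frames, n_filters)
    return np.log(power_spectrum @ W.T + eps)

frames = np.abs(rng.standard_normal((100, n_bins))) ** 2
features = pseudo_fbank_forward(frames, W)
print(features.shape)  # (100, 40)
```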
Model-based approaches to Speaker Verification (SV), such as Joint Factor Analysis (JFA), i-vectors and relevance Maximum-a-Posteriori (MAP), have been shown to provide state-of-the-art performance for text-dependent systems with fixed phrases. The performance of i-vector and JFA models has been further enhanced by estimating posteriors from a Deep Neural Network (DNN) instead of a Gaussian Mixture Model (GMM)...
Deep learning has significantly advanced the state of the art in speech recognition over the past few years. However, compared to conventional Gaussian mixture acoustic models, neural network models are usually much larger and therefore harder to deploy on embedded devices. Previously, we investigated a compact highway deep neural network (HDNN) for acoustic modelling, which is a type of depth-gated...
Probabilistic linear discriminant analysis (PLDA) is widely regarded as an effective model for text-independent speaker verification in the i-vector space. The PLDA scoring function is typically formulated as the likelihood ratio between the speaker-adapted and the universal PLDAs. In this case, the adaptation of PLDA is performed through the speaker factors. In this paper, we show that the channel...
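The likelihood-ratio scoring that PLDA abstracts like this one refer to can be written out for the two-covariance formulation: the numerator assumes both i-vectors share one latent speaker variable, the denominator assumes independent speakers. A NumPy sketch under that standard formulation (not this paper's specific adaptation scheme):

```python
import numpy as np

def gauss_logpdf(x, mean, cov):
    # Log density of a multivariate Gaussian.
    d = x.size
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))

def plda_llr(x1, x2, mu, B, W):
    # Two-covariance PLDA: between-speaker covariance B, within-speaker W.
    # Numerator: x1 and x2 generated from the same latent speaker.
    joint_mean = np.concatenate([mu, mu])
    same_cov = np.block([[B + W, B], [B, B + W]])
    num = gauss_logpdf(np.concatenate([x1, x2]), joint_mean, same_cov)
    # Denominator: x1 and x2 from independent speakers.
    den = gauss_logpdf(x1, mu, B + W) + gauss_logpdf(x2, mu, B + W)
    return num - den
```

A matched pair should score higher than a mismatched one under any valid B and W.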
It is crucial for language models to model long-term dependencies in word sequences, which can be achieved to a good extent by recurrent neural network (RNN) based language models with long short-term memory (LSTM) units. To accurately model the sophisticated long-term information in human languages, large memory in language models is necessary. However, the size of RNN-based language models cannot...
While recent advances in deep neural networks have led to significant improvements in speech recognition, they have been applied mainly to acoustic and language modeling. We instead apply the models to bottleneck feature extraction. Several DNN, CNN, and BLSTM-based bottleneck feature networks are compared using both DNN and BLSTM acoustic models. Multiple variations in network architecture and feature...
It is very important to exploit abundant unlabeled speech for improving acoustic model training in automatic speech recognition (ASR). Semi-supervised training methods incorporate unlabeled data in addition to labeled data to enhance model training, but they suffer from the problem of error-prone labels. The ensemble training scheme trains a set of models and combines them to make the model more...
Batch normalization is a standard technique for training deep neural networks. In batch normalization, the input of each hidden layer is first mean-variance normalized and then linearly transformed before applying non-linear activation functions. We propose a novel unsupervised speaker adaptation technique for batch normalized acoustic models. The key idea is to adjust the linear transformations previously...
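The adaptation idea described above, keeping all other weights frozen and re-estimating only the per-layer linear transform (gamma, beta) of batch normalization, can be sketched with a toy gradient step. The squared-error proxy loss and all names here are illustrative assumptions, not the paper's actual adaptation objective:

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, mean, var, eps=1e-5):
    # Inference-mode batch norm: normalize with stored statistics, then
    # apply the learned linear transform (gamma, beta).
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def adapt_gamma_beta(x, target, gamma, beta, mean, var, lr=0.1, eps=1e-5):
    # One gradient step on a squared-error proxy loss, updating only
    # (gamma, beta) while everything else stays frozen.
    x_hat = (x - mean) / np.sqrt(var + eps)
    err = gamma * x_hat + beta - target
    grad_gamma = np.mean(err * x_hat, axis=0)
    grad_beta = np.mean(err, axis=0)
    return gamma - lr * grad_gamma, beta - lr * grad_beta
```

Because x_hat is approximately zero-mean and unit-variance, the two parameters decouple and plain gradient descent converges quickly.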
Subspace methods are used for deep neural network (DNN)-based acoustic model adaptation. These methods first construct a subspace and then perform the speaker adaptation as a point in the subspace. This paper aims to investigate the effectiveness of subspace methods for robust unsupervised adaptation. For the analysis, we compare two state-of-the-art subspace methods, namely, the singular value decomposition...
DNNs have shown remarkable performance in multilingual scenarios; however, these models are often so large that adaptation to a target language with a relatively small amount of data cannot be accomplished well. In our previous work, we utilized Low-Rank Factorization (LRF) using singular value decomposition for multilingual DNNs to learn compact models that can be adapted more successfully...
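The LRF step mentioned above is, at its core, a truncated SVD of a weight matrix: W (m x n) is replaced by two smaller factors, cutting parameters from m*n to k*(m+n). A minimal sketch of that generic operation (not the paper's full multilingual training recipe):

```python
import numpy as np

def low_rank_factorize(W, k):
    # Low-Rank Factorization via truncated SVD: W ~= A @ B with
    # A (m x k) and B (k x n), the best rank-k approximation of W
    # in the Frobenius-norm sense.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * s[:k]   # (m, k): left singular vectors scaled by singular values
    B = Vt[:k, :]          # (k, n): top-k right singular vectors
    return A, B
```

If W already has rank at most k, the factorization reconstructs it exactly; otherwise A @ B is the optimal rank-k approximation.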