Zoltan Tuske

chapter

Towards Automatic Transcription of Large Spoken Archives in Agglutinating Languages – Hungarian ASR for the MALACH Project

Péter Mihajlik, Tibor Fegyó, Bottyán Németh, Zoltán Tüske, et al.

Lecture Notes in Computer Science > Text, Speech and Dialogue > Speech > 342-349

The paper describes automatic speech recognition experiments and results on the spontaneous Hungarian MALACH speech corpus. A novel morph-based lexical modeling approach is compared, in terms of word and letter error rates, to the traditional word-based approach and to another morph-based approach that previously performed best. The applied language and acoustic modeling techniques are also detailed. Using unsupervised...
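As an illustration of the two metrics named in the abstract, the following is a minimal sketch of word and letter error rates computed as Levenshtein distance divided by reference length. The function names, the choice to ignore spaces when counting letters, and the example sentence are assumptions made here for illustration; the paper's exact scoring conventions are not given in this excerpt.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference token
                            curr[j - 1] + 1,      # insertion of a hypothesis token
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(ref, hyp):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def letter_error_rate(ref, hyp):
    """Letter-level edit distance divided by the number of reference letters.

    Assumption: spaces are removed before scoring.
    """
    ref_letters = list(ref.replace(" ", ""))
    hyp_letters = list(hyp.replace(" ", ""))
    return edit_distance(ref_letters, hyp_letters) / len(ref_letters)


# Hypothetical example: one missing suffix letter makes a whole word wrong.
reference = "a házakban lakunk"
hypothesis = "a házakba lakunk"
print(word_error_rate(reference, hypothesis))
print(letter_error_rate(reference, hypothesis))
```

In this made-up example a single dropped letter counts as one full word error out of three words but only one letter error out of fifteen letters, which suggests why a letter-level metric can be informative alongside the word-level one for languages with long, heavily suffixed words.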

chapter

Investigation on log-linear interpolation of multi-domain neural network language model

Zoltan Tuske, Kazuki Irie, Ralf Schluter, Hermann Ney

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 6005 - 6009

ICASSP 2016 - 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Inspired by the success of multi-task training in acoustic modeling, this paper investigates a new architecture for a multi-domain neural network based language model (NNLM). The proposed model has several shared hidden layers and domain-specific output layers. As will be shown, the log-linear interpolation of the multi-domain outputs and the optimization of interpolation weights fit naturally in...
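To make the interpolation step concrete, here is a minimal sketch of log-linear interpolation of several domain-specific word distributions: the interpolated probability is proportional to the product of the domain probabilities, each raised to its weight, and is then renormalized over the vocabulary. The function name, the toy distributions, and the weights are assumptions for illustration only; the excerpt does not show how the paper implements or optimizes the weights.

```python
import math


def log_linear_interpolate(domain_probs, weights):
    """Return p(w) proportional to prod_d p_d(w) ** lambda_d, renormalized.

    domain_probs: list of distributions (one per domain) over the same vocabulary,
                  with strictly positive entries, as softmax outputs would have.
    weights:      one interpolation weight lambda_d per domain.
    """
    vocab_size = len(domain_probs[0])
    log_scores = [
        sum(lam * math.log(p[w]) for lam, p in zip(weights, domain_probs))
        for w in range(vocab_size)
    ]
    max_score = max(log_scores)  # subtract the maximum for numerical stability
    exp_scores = [math.exp(s - max_score) for s in log_scores]
    total = sum(exp_scores)
    return [e / total for e in exp_scores]


# Toy example: two domain output layers over a three-word vocabulary.
p_news = [0.7, 0.2, 0.1]
p_talk = [0.2, 0.5, 0.3]
print(log_linear_interpolate([p_news, p_talk], [0.6, 0.4]))
```

Because each domain distribution is a softmax, its log-probabilities are the logits minus a constant, so a weighted sum of log-probabilities followed by renormalization gives the same result as a softmax over the weighted sum of the domain logits.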

chapter

Multilingual representations for low resource speech recognition and keyword search

Jia Cui, Brian Kingsbury, Bhuvana Ramabhadran, Abhinav Sethy, et al.

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) > 259 - 266

This paper examines the impact of multilingual (ML) acoustic representations on Automatic Speech Recognition (ASR) and keyword search (KWS) for low resource languages in the context of the OpenKWS15 evaluation of the IARPA Babel program. The task is to develop Swahili ASR and KWS systems within two weeks using as little as 3 hours of transcribed data. Multilingual acoustic representations proved to...

chapter

Speaker adaptive joint training of Gaussian mixture models and bottleneck features

Zoltan Tuske, Pavel Golik, Ralf Schluter, Hermann Ney

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) > 596 - 603

In the tandem approach, the output of a neural network (NN) serves as input features to a Gaussian mixture model (GMM), with the aim of improving the emission probability estimates. As shown in our previous work, a GMM with a pooled covariance matrix can be integrated into a neural network framework as a softmax layer with hidden variables, which allows for joint estimation of both neural network and...

chapter

Integrating Gaussian mixtures into deep neural networks: Softmax layer with hidden variables

Zoltan Tuske, Muhammad Ali Tahir, Ralf Schluter, Hermann Ney

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4285 - 4289

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In the hybrid approach, the neural network output directly serves as hidden Markov model (HMM) state posterior probability estimates. In contrast, in the tandem approach the neural network output is used as input features to improve classic Gaussian mixture model (GMM) based emission probability estimates. This paper shows that a GMM can be easily integrated into the deep neural network framework. By...
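The connection between a GMM and a softmax layer can be checked numerically. When all Gaussian components share one covariance matrix, the quadratic term in the input cancels in the class posterior, so each component contributes a score that is linear in the input, and the class posterior is a softmax over (class, component) pairs summed over the components of each class. The sketch below verifies this standard identity for a diagonal pooled covariance. All names and numbers are assumptions for illustration, and it is not claimed to reproduce the paper's own derivation, which is cut off in this excerpt.

```python
import math


def gmm_posteriors(x, means, log_weights, var):
    """Class posteriors from per-class GMMs sharing one diagonal covariance `var`.

    means[s][i] is the mean of component i of class s;
    log_weights[s][i] = log(class prior * mixture weight) for that component.
    """
    joint = []
    for s_means, s_logw in zip(means, log_weights):
        total = 0.0
        for mu, lw in zip(s_means, s_logw):
            log_gauss = sum(
                -0.5 * math.log(2 * math.pi * v) - 0.5 * (xi - m) ** 2 / v
                for xi, m, v in zip(x, mu, var)
            )
            total += math.exp(lw + log_gauss)
        joint.append(total)
    norm = sum(joint)
    return [j / norm for j in joint]


def softmax_hidden_posteriors(x, means, log_weights, var):
    """Same posteriors from linear scores: softmax over (class, component) pairs."""
    scores = []
    for s_means, s_logw in zip(means, log_weights):
        row = []
        for mu, lw in zip(s_means, s_logw):
            w = [m / v for m, v in zip(mu, var)]  # Sigma^-1 mu
            b = lw - 0.5 * sum(m * m / v for m, v in zip(mu, var))  # bias term
            row.append(sum(wi * xi for wi, xi in zip(w, x)) + b)
        scores.append(row)
    top = max(max(row) for row in scores)  # stabilize the exponentials
    exp_rows = [[math.exp(a - top) for a in row] for row in scores]
    norm = sum(sum(row) for row in exp_rows)
    # Summing over the hidden component index gives the class posterior.
    return [sum(row) / norm for row in exp_rows]


# Toy setup: two classes, two components each, two-dimensional input.
x = [0.3, -1.2]
means = [[[0.0, 0.0], [1.0, 1.0]], [[-1.0, 0.5], [2.0, -2.0]]]
log_weights = [[math.log(0.3), math.log(0.2)], [math.log(0.1), math.log(0.4)]]
var = [0.5, 2.0]
print(gmm_posteriors(x, means, log_weights, var))
print(softmax_hidden_posteriors(x, means, log_weights, var))
```

Both functions should print the same posterior vector up to floating-point rounding, since the terms dropped in the second function are identical for every component.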

chapter

Multilingual MRASTA features for low-resource keyword search and speech recognition systems

Zoltan Tuske, David Nolden, Ralf Schluter, Hermann Ney

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 7854 - 7858

ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper investigates the application of hierarchical MRASTA bottleneck (BN) features for under-resourced languages within the IARPA Babel project. Through multilingual training of Multilayer Perceptron (MLP) BN features on five languages (Cantonese, Pashto, Tagalog, Turkish, and Vietnamese), we obtain a single feature stream that is more beneficial to all languages than the unilingual...

chapter

The RWTH English lecture recognition system

Simon Wiesler, Kazuki Irie, Zoltan Tuske, Ralf Schluter, et al.

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 3286 - 3290

ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In this paper, we describe the RWTH speech recognition system for English lectures developed within the Translectures project. A difficulty in developing an English lecture recognition system is the high proportion of non-native speakers. We address this problem by using very effective deep bottleneck features trained on multilingual data. The acoustic model is trained on large amounts of data...

chapter

Deep hierarchical bottleneck MRASTA features for LVCSR

Zoltan Tuske, Ralf Schluter, Hermann Ney

2013 IEEE International Conference on Acoustics, Speech and Signal Processing > 6970 - 6974

ICASSP 2013 - 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Hierarchical Multi Layer Perceptron (MLP) based long-term feature extraction is optimized for a TANDEM connectionist large vocabulary continuous speech recognition (LVCSR) system within the QUAERO project. By training the bottleneck MLP on multi-resolution RASTA filtered critical band energies, a relative word error rate (WER) reduction of more than 20% over a standard MFCC system is observed after optimizing...

chapter

Investigation on cross- and multilingual MLP features under matched and mismatched acoustical conditions

Zoltan Tuske, Joel Pinto, Daniel Willett, Ralf Schluter

2013 IEEE International Conference on Acoustics, Speech and Signal Processing > 7349 - 7353

ICASSP 2013 - 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In this paper, Multi Layer Perceptron (MLP) based multilingual bottleneck features are investigated for acoustic modeling in three languages: German, French, and US English. We use a modified training algorithm to handle the multilingual training scenario without having to explicitly map the phonemes to a common phoneme set. Furthermore, the cross-lingual portability of bottleneck features between...

chapter

Phase difference of filter-stable part-tones as acoustic feature

Zoltan Tuske, Friedhelm R. Drepper, Ralf Schluter

2012 IEEE Statistical Signal Processing Workshop (SSP) > 365 - 368

A part-tone decomposition of voiced sections of speech is introduced, which is adapted with high accuracy to the frequency of the glottal oscillator of the speaker. The iterative replacement of the center filter frequency contours (chosen locally as linear chirp) of the non-stationary bandpass filters converges extremely fast and leads to the extraction of filter-stable part-tones with uncorrupted...

chapter

Comparison and combination of different CRBE based MLP features for LVCSR

Zoltan Tuske, Ralf Schluter, Hermann Ney

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4081 - 4084

ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing

Multi Layer Perceptron (MLP) features extracted from different types of critical band energies (CRBE), derived from the MFCC, GT, and PLP pipelines, are compared on French broadcast news and conversational speech recognition tasks. Although the MLP structure is kept fixed, ROVER combination of the different CRBE based systems leads to a 4% relative improvement. Furthermore, aiming at the combination of state-of-the-art...

chapter

Non-stationary feature extraction for automatic speech recognition

Zoltan Tuske, Pavel Golik, Ralf Schluter, Friedhelm R. Drepper

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5204 - 5207

ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Current speech recognition systems mainly use Short-Time Fourier Transform based features such as MFCC. Dropping the short-time stationarity assumption for voiced speech, this paper introduces non-stationary signal analysis into the ASR framework. We present new acoustic features extracted by a pitch-adaptive Gammatone filter bank. The noise robustness was proved on AURORA 2 and 4 tasks,...

INFONA - science communication portal

Search results for: Zoltan Tuske
