Search results for: M. J. F. Gales

Items from 1 to 20 out of 41 results

chapter

Morph-to-word transduction for accurate and efficient automatic speech recognition and keyword search

A. Ragni, D. Saunders, P. Zahemszky, J. Vasilakes, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5770 - 5774

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Word units are a popular choice in statistical language modelling. For inflective and agglutinative languages this choice may result in a high out of vocabulary rate. Subword units, such as morphs, provide an interesting alternative to words. These units can be derived in an unsupervised fashion and empirically show lower out of vocabulary rates. This paper proposes a morph-to-word transduction to...

chapter

Stimulated training for automatic speech recognition and keyword search in limited resource conditions

A. Ragni, C. Wu, M. J. F. Gales, J. Vasilakes, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4830 - 4834

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Training neural network acoustic models on limited quantities of data is a challenging task. A number of techniques have been proposed to improve generalisation. This paper investigates one such technique called stimulated training. It enables standard criteria such as cross-entropy to enforce spatial constraints on activations originating from different units. Having different regions being active...

chapter

Improved DNN-based segmentation for multi-genre broadcast audio

L. Wang, C. Zhang, P. C. Woodland, M. J. F. Gales, more

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5700 - 5704

ICASSP 2016 - 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Automatic segmentation is a crucial initial processing step for processing multi-genre broadcast (MGB) audio. It is very challenging since the data exhibits a wide range of both speech types and background conditions with many types of non-speech audio. This paper describes a segmentation system for multi-genre broadcast audio with deep neural network (DNN) based speech/non-speech detection. A further...

chapter

CUED-RNNLM — An open-source toolkit for efficient training and evaluation of recurrent neural network language models

X. Chen, X. Liu, Y. Qian, M. J. F. Gales, more

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 6000 - 6004

ICASSP 2016 - 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In recent years, recurrent neural network language models (RNNLMs) have become increasingly popular for a range of applications including speech recognition. However, the training of RNNLMs is computationally expensive, which limits the quantity of data, and size of network, that can be used. In order to fully exploit the power of RNNLMs, efficient training implementations are required. This paper...

chapter

System combination with log-linear models

J. Yang, C. Zhang, A. Ragni, M. J. F. Gales, more

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5675 - 5679

ICASSP 2016 - 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Improved speech recognition performance can often be obtained by combining multiple systems together. Joint decoding, where scores from multiple systems are combined during decoding rather than combining hypotheses, is one efficient approach for system combination. In standard joint decoding the frame log-likelihoods from each system are used as the scores. These scores are then weighted and summed...

chapter

Model-Based Approaches to Handling Uncertainty

M. J. F. Gales

Robust Speech Recognition of Uncertain or Missing Data > Applications: Noise Robustness > 101-125

A powerful approach for handling uncertainty in observations is to modify the statistical model of the data to appropriately reflect this uncertainty. For the task of noise-robust speech recognition, this requires modifying an underlying “clean” acoustic model to be representative of speech in a particular target acoustic environment. This chapter describes the underlying concepts of model-based noise...

chapter

The MGB challenge: Evaluating multi-genre broadcast media recognition

P Bell, M J F Gales, T Hain, J Kilgour, more

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) > 687 - 693

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

This paper describes the Multi-Genre Broadcast (MGB) Challenge at ASRU 2015, an evaluation focused on speech recognition, speaker diarization, and "lightly supervised" alignment of BBC TV recordings. The challenge training data covered the whole range of seven weeks BBC TV output across four channels, resulting in about 1,600 hours of broadcast audio. In addition several hundred million...

chapter

The development of the cambridge university alignment systems for the multi-genre broadcast challenge

P. Lanchantin, M. J. F. Gales, P. Karanasou, X. Liu, more

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) > 647 - 653

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

We describe the alignment systems developed both for the preparation of data for the Multi-Genre Broadcast (MGB) challenge and for our participation in the transcription and alignment tasks. Captions of varying quality are aligned with the audio of TV shows that range from few minutes long to more than six hours. Lightly supervised decoding is performed on the audio and the output text is aligned...

chapter

Speaker diarisation and longitudinal linking in multi-genre broadcast data

P. Karanasou, M. J. F. Gales, P. Lanchantin, X. Liu, more

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) > 660 - 666

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

This paper presents a multi-stage speaker diarisation system with longitudinal Linking developed on BBC multi-genre data for the 2015 Multi-Genre Broadcast (MGB) challenge. The basic speaker diarisation system draws on techniques from the Cambridge March 2005 system with a new deep neural network (DNN)-based speech/non speech segmenter. A newly developed linking stage is next added to the basic diarisation...

chapter

Cambridge university transcription systems for the multi-genre broadcast challenge

P. C. Woodland, X. Liu, Y. Qian, C. Zhang, more

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) > 639 - 646

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

We describe the development of our speech-to-text transcription systems for the 2015 Multi-Genre Broadcast (MGB) challenge. Key features of the systems are: a segmentation system based on deep neural networks (DNNs); the use of HTK 3.5 for building DNN-based hybrid and tandem acoustic models and the use of these models in a joint decoding framework; techniques for adaptation of DNN based acoustic...

chapter

Structured discriminative models using deep neural-network features

R. C. van Dalen, J. Yang, H. Wang, A. Ragni, more

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) > 160 - 166

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

State-of-the-art speech recognisers employ neural networks in various configurations. A standard (hybrid) speech recogniser computes the likelihood for one time frame and state, using only one out of thousands of possible neural-network outputs. However, the whole output vector carries information. In this paper, features from state-of-the-art speech recognisers are collected per phone given a particular...

chapter

Investigation of back-off based interpolation between recurrent neural network and n-gram language models

X. Chen, X. Liu, M. J. F. Gales, P. C. Woodland

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) > 181 - 186

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

Recurrent neural network language models (RNNLMs) have become an increasingly popular choice for speech and language processing tasks including automatic speech recognition (ASR). As the generalization patterns of RNNLMs and n-gram LMs are inherently different, RNNLMs are usually combined with n-gram LMs via a fixed weighting based linear interpolation in state-of-the-art ASR systems. However, previous...

chapter

Paraphrastic recurrent neural network language models

X. Liu, X. Chen, M. J. F. Gales, P. C. Woodland

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5406 - 5410

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Recurrent neural network language models (RNNLM) have become an increasingly popular choice for state-of-the-art speech recognition systems. Linguistic factors in??uencing the realization of surface word sequences, for example, expressive richness, are only implicitly learned by RNNLMs. Observed sentences and their associated alternative paraphrases representing the same meaning are not explicitly...

chapter

Recurrent neural network language model training with noise contrastive estimation for speech recognition

X. Chen, X. Liu, M. J. F. Gales, P. C. Woodland

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5411 - 5415

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In recent years recurrent neural network language models (RNNLMs) have been successfully applied to a range of tasks including speech recognition. However, an important issue that limits the quantity of data used, and their possible application areas, is the computational cost in training. A signi??cant part of this cost is associated with the softmax function at the output layer, as this requires...

chapter

Improving multiple-crowd-sourced transcriptions using a speech recogniser

R. C. van Dalen, K. M. Knill, P. Tsiakoulis, M. J. F. Gales

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4709 - 4713

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper introduces a method to produce high-quality transcriptions of speech data from only two crowd-sourced transcriptions. These transcriptions, produced cheaply by people on the Internet, for example through Amazon Mechanical Turk, are often of low quality. Often, multiple crowd-sourced transcriptions are combined to form one transcription of higher quality. However, the state of the art is...

chapter

A language space representation for speech recognition

A. Ragni, M. J. F. Gales, K. M. Knill

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4634 - 4638

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

The number of languages for which speech recognition systems have become available is growing each year. This paper proposes to view languages as points in some rich space, termed language space, where bases are eigen-languages and a particular selection of the projection determines points. Such an approach could not only reduce development costs for each new language but also provide automatic means...

chapter

Efficient lattice rescoring using recurrent neural network language models

X. Liu, Y. Wang, X. Chen, M. J. F. Gales, more

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4908 - 4912

ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Recurrent neural network language models (RNNLM) have become an increasingly popular choice for state-of-the-art speech recognition systems due to their inherently strong generalization performance. As these models use a vector representation of complete history contexts, RNNLMs are normally used to rescore N-best lists. Motivated by their intrinsic characteristics, two novel lattice rescoring methods...

chapter

Infinite structured support vector machines for speech recognition

J. Yang, R. C. van Dalen, S.-X. Zhang, M. J. F. Gales

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 3320 - 3324

ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Discriminative models, like support vector machines (SVMs), have been successfully applied to speech recognition and improved performance. A Bayesian non-parametric version of the SVM, the infinite SVM, improves on the SVM by allowing more flexible decision boundaries. However, like SVMs, infinite SVMs model each class separately, which restricts them to classifying one word at a time. A generalisation...

chapter

Paraphrastic neural network language models

X. Liu, M. J. F. Gales, P. C. Woodland

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4903 - 4907

ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Expressive richness in natural languages presents a significant challenge for statistical language models (LM). As multiple word sequences can represent the same underlying meaning, only modelling the observed surface word sequence can lead to poor context coverage. To handle this issue, paraphrastic LMs were previously proposed to improve the generalization of back-off n-gram LMs. Paraphrastic neural...

chapter

Investigation of multilingual deep neural networks for spoken term detection

K. M. Knill, M. J. F. Gales, S. P. Rath, P. C. Woodland, more

2013 IEEE Workshop on Automatic Speech Recognition and Understanding > 138 - 143

2013 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)

The development of high-performance speech processing systems for low-resource languages is a challenging area. One approach to address the lack of resources is to make use of data from multiple languages. A popular direction in recent years is to use bottleneck features, or hybrid systems, trained on multilingual data for speech-to-text (STT) systems. This paper presents an investigation into the...

Publication date

Set your own date range

INFONA - science communication portal

Search results for: M. J. F. Gales

Morph-to-word transduction for accurate and efficient automatic speech recognition and keyword search

Stimulated training for automatic speech recognition and keyword search in limited resource conditions

Improved DNN-based segmentation for multi-genre broadcast audio

CUED-RNNLM — An open-source toolkit for efficient training and evaluation of recurrent neural network language models

System combination with log-linear models

Model-Based Approaches to Handling Uncertainty

The MGB challenge: Evaluating multi-genre broadcast media recognition

The development of the cambridge university alignment systems for the multi-genre broadcast challenge

Speaker diarisation and longitudinal linking in multi-genre broadcast data

Cambridge university transcription systems for the multi-genre broadcast challenge

Structured discriminative models using deep neural-network features

Investigation of back-off based interpolation between recurrent neural network and n-gram language models

Paraphrastic recurrent neural network language models

Recurrent neural network language model training with noise contrastive estimation for speech recognition

Improving multiple-crowd-sourced transcriptions using a speech recogniser

A language space representation for speech recognition

Efficient lattice rescoring using recurrent neural network language models

Infinite structured support vector machines for speech recognition

Paraphrastic neural network language models

Investigation of multilingual deep neural networks for spoken term detection

Filter options

Publication date

Publication type

Keywords

Data set

Journal

INFONA - science communication portal

Search results for: M. J. F. Gales

Add recipient

Sending message cancelled

Are you sure you want to cancel sending this message?

Send message

Filter options

Publication date

Date range setting

Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.

Publication type

Keywords

Data set

Journal

Reporting an error / abuse

Sending the report failed

Accessibility options