Search results for: P. C. Woodland

Items from 1 to 20 out of 28 results

chapter

Joint optimisation of tandem systems using Gaussian mixture density neural network discriminative sequence training

C. Zhang, P. C. Woodland

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5015 - 5019

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

The use of deep neural networks (DNNs) for feature extraction and Gaussian mixture models (GMMs) for acoustic modelling is often termed a tandem system configuration and can be viewed as a Gaussian mixture density neural network (MDNN). Compared to the direct use of DNN output probabilities in the acoustic model, the tandem approach suffers from a major weakness in that the feature extraction stage...

chapter

Improved DNN-based segmentation for multi-genre broadcast audio

L. Wang, C. Zhang, P. C. Woodland, M. J. F. Gales, more

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5700 - 5704

ICASSP 2016 - 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Automatic segmentation is a crucial initial processing step for processing multi-genre broadcast (MGB) audio. It is very challenging since the data exhibits a wide range of both speech types and background conditions with many types of non-speech audio. This paper describes a segmentation system for multi-genre broadcast audio with deep neural network (DNN) based speech/non-speech detection. A further...

chapter

CUED-RNNLM — An open-source toolkit for efficient training and evaluation of recurrent neural network language models

X. Chen, X. Liu, Y. Qian, M. J. F. Gales, more

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 6000 - 6004

ICASSP 2016 - 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In recent years, recurrent neural network language models (RNNLMs) have become increasingly popular for a range of applications including speech recognition. However, the training of RNNLMs is computationally expensive, which limits the quantity of data, and size of network, that can be used. In order to fully exploit the power of RNNLMs, efficient training implementations are required. This paper...

chapter

System combination with log-linear models

J. Yang, C. Zhang, A. Ragni, M. J. F. Gales, more

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5675 - 5679

ICASSP 2016 - 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Improved speech recognition performance can often be obtained by combining multiple systems together. Joint decoding, where scores from multiple systems are combined during decoding rather than combining hypotheses, is one efficient approach for system combination. In standard joint decoding the frame log-likelihoods from each system are used as the scores. These scores are then weighted and summed...

chapter

DNN speaker adaptation using parameterised sigmoid and ReLU hidden activation functions

C. Zhang, P. C. Woodland

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5300 - 5304

ICASSP 2016 - 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper investigates the use of parameterised sigmoid and rectified linear unit (ReLU) hidden activation functions in deep neural network (DNN) speaker adaptation. The sigmoid and ReLU parameterisation schemes from a previous study for speaker independent (SI) training are used. An adaptive linear factor associated with each sigmoid or ReLU hidden unit is used to scale the unit output value and...

chapter

The MGB challenge: Evaluating multi-genre broadcast media recognition

P Bell, M J F Gales, T Hain, J Kilgour, more

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) > 687 - 693

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

This paper describes the Multi-Genre Broadcast (MGB) Challenge at ASRU 2015, an evaluation focused on speech recognition, speaker diarization, and "lightly supervised" alignment of BBC TV recordings. The challenge training data covered the whole range of seven weeks BBC TV output across four channels, resulting in about 1,600 hours of broadcast audio. In addition several hundred million...

chapter

Speaker diarisation and longitudinal linking in multi-genre broadcast data

P. Karanasou, M. J. F. Gales, P. Lanchantin, X. Liu, more

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) > 660 - 666

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

This paper presents a multi-stage speaker diarisation system with longitudinal Linking developed on BBC multi-genre data for the 2015 Multi-Genre Broadcast (MGB) challenge. The basic speaker diarisation system draws on techniques from the Cambridge March 2005 system with a new deep neural network (DNN)-based speech/non speech segmenter. A newly developed linking stage is next added to the basic diarisation...

chapter

Cambridge university transcription systems for the multi-genre broadcast challenge

P. C. Woodland, X. Liu, Y. Qian, C. Zhang, more

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) > 639 - 646

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

We describe the development of our speech-to-text transcription systems for the 2015 Multi-Genre Broadcast (MGB) challenge. Key features of the systems are: a segmentation system based on deep neural networks (DNNs); the use of HTK 3.5 for building DNN-based hybrid and tandem acoustic models and the use of these models in a joint decoding framework; techniques for adaptation of DNN based acoustic...

chapter

Investigation of back-off based interpolation between recurrent neural network and n-gram language models

X. Chen, X. Liu, M. J. F. Gales, P. C. Woodland

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) > 181 - 186

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

Recurrent neural network language models (RNNLMs) have become an increasingly popular choice for speech and language processing tasks including automatic speech recognition (ASR). As the generalization patterns of RNNLMs and n-gram LMs are inherently different, RNNLMs are usually combined with n-gram LMs via a fixed weighting based linear interpolation in state-of-the-art ASR systems. However, previous...

chapter

Paraphrastic recurrent neural network language models

X. Liu, X. Chen, M. J. F. Gales, P. C. Woodland

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5406 - 5410

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Recurrent neural network language models (RNNLM) have become an increasingly popular choice for state-of-the-art speech recognition systems. Linguistic factors in??uencing the realization of surface word sequences, for example, expressive richness, are only implicitly learned by RNNLMs. Observed sentences and their associated alternative paraphrases representing the same meaning are not explicitly...

chapter

Improving the training and evaluation efficiency of recurrent neural network language models

X. Chen, X. Liu, M.J.F. Gales, P. C. Woodland

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5401 - 5405

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Recurrent neural network language models (RNNLMs) are becoming increasingly popular for speech recognition. Previously, we have shown that RNNLMs with a full (non-classed) output layer (F-RNNLMs) can be trained efficiently using a GPU giving a large reduction in training time over conventional class-based models (C-RNNLMs) on a standard CPU. However, since test-time RNNLM evaluation is often performed...

chapter

Recurrent neural network language model training with noise contrastive estimation for speech recognition

X. Chen, X. Liu, M. J. F. Gales, P. C. Woodland

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5411 - 5415

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In recent years recurrent neural network language models (RNNLMs) have been successfully applied to a range of tasks including speech recognition. However, an important issue that limits the quantity of data used, and their possible application areas, is the computational cost in training. A signi??cant part of this cost is associated with the softmax function at the output layer, as this requires...

chapter

Standalone training of context-dependent deep neural network acoustic models

C. Zhang, P. C. Woodland

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5597 - 5601

ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Recently, context-dependent (CD) deep neural network (DNN) hidden Markov models (HMMs) have been widely used as acoustic models for speech recognition. However, the standard method to build such models requires target training labels from a system using HMMs with Gaussian mixture model output distributions (GMM-HMMs). In this paper, we introduce a method for training state-of-the-art CD-DNN-HMMs without...

chapter

Efficient lattice rescoring using recurrent neural network language models

X. Liu, Y. Wang, X. Chen, M. J. F. Gales, more

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4908 - 4912

ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Recurrent neural network language models (RNNLM) have become an increasingly popular choice for state-of-the-art speech recognition systems due to their inherently strong generalization performance. As these models use a vector representation of complete history contexts, RNNLMs are normally used to rescore N-best lists. Motivated by their intrinsic characteristics, two novel lattice rescoring methods...

chapter

Paraphrastic neural network language models

X. Liu, M. J. F. Gales, P. C. Woodland

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4903 - 4907

ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Expressive richness in natural languages presents a significant challenge for statistical language models (LM). As multiple word sequences can represent the same underlying meaning, only modelling the observed surface word sequence can lead to poor context coverage. To handle this issue, paraphrastic LMs were previously proposed to improve the generalization of back-off n-gram LMs. Paraphrastic neural...

chapter

Investigation of multilingual deep neural networks for spoken term detection

K. M. Knill, M. J. F. Gales, S. P. Rath, P. C. Woodland, more

2013 IEEE Workshop on Automatic Speech Recognition and Understanding > 138 - 143

2013 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)

The development of high-performance speech processing systems for low-resource languages is a challenging area. One approach to address the lack of resources is to make use of data from multiple languages. A popular direction in recent years is to use bottleneck features, or hybrid systems, trained on multilingual data for speech-to-text (STT) systems. This paper presents an investigation into the...

chapter

Paraphrastic language models and combination with neural network language models

X. Liu, M. J. F. Gales, P. C. Woodland

2013 IEEE International Conference on Acoustics, Speech and Signal Processing > 8421 - 8425

ICASSP 2013 - 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In natural languages multiple word sequences can represent the same underlying meaning. Only modelling the observed surface word sequence can result in poor context coverage, for example, when using n-gram language models (LM). To handle this issue, paraphrastic LMs were proposed in previous research and successfully applied to a US English conversational telephone speech transcription task. In order...

chapter

Transcription of multi-genre media archives using out-of-domain data

P. J. Bell, M. J. F. Gales, P. Lanchantin, X. Liu, more

2012 IEEE Spoken Language Technology Workshop (SLT) > 324 - 329

2012 IEEE Spoken Language Technology Workshop (SLT 2012)

We describe our work on developing a speech recognition system for multi-genre media archives. The high diversity of the data makes this a challenging recognition task, which may benefit from systems trained on a combination of in-domain and out-of-domain data. Working with tandem HMMs, we present Multi-level Adaptive Networks (MLAN), a novel technique for incorporating information from out-of-domain...

chapter

Language model combination and adaptation usingweighted finite state transducers

X Liu, M J F Gales, J L Hieronymus, P C Woodland

2010 IEEE International Conference on Acoustics, Speech and Signal Processing > 5390 - 5393

2010 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2010

In speech recognition systems language model (LMs) are often constructed by training and combining multiple n-gram models. They can be either used to represent different genres or tasks found in diverse text sources, or capture stochastic properties of different linguistic symbol sequences, for example, syllables and words. Unsupervised LM adaptation may also be used to further improve robustness...

chapter

Recent improvements to the Cambridge Arabic Speech-to-Text systems

M Tomalin, F Diehl, M J F Gales, J Park, more

2010 IEEE International Conference on Acoustics, Speech and Signal Processing > 4382 - 4385

2010 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2010

This paper describes recent improvements to the Cambridge Arabic Large Vocabulary Continuous Speech Recognition (LVSCR) Speech-to-Text (STT) system. It is shown that Multi-Layer Perceptron (MLP) features trained on phonetic targets can improve the performance of both phonemic and graphemic systems. Also, a morphological decomposition scheme is extended from the graphemic domain to the phonetic domain,...

Publication date

Set your own date range

INFONA - science communication portal

Search results for: P. C. Woodland

Joint optimisation of tandem systems using Gaussian mixture density neural network discriminative sequence training

Improved DNN-based segmentation for multi-genre broadcast audio

CUED-RNNLM — An open-source toolkit for efficient training and evaluation of recurrent neural network language models

System combination with log-linear models

DNN speaker adaptation using parameterised sigmoid and ReLU hidden activation functions

The MGB challenge: Evaluating multi-genre broadcast media recognition

Speaker diarisation and longitudinal linking in multi-genre broadcast data

Cambridge university transcription systems for the multi-genre broadcast challenge

Investigation of back-off based interpolation between recurrent neural network and n-gram language models

Paraphrastic recurrent neural network language models

Improving the training and evaluation efficiency of recurrent neural network language models

Recurrent neural network language model training with noise contrastive estimation for speech recognition

Standalone training of context-dependent deep neural network acoustic models

Efficient lattice rescoring using recurrent neural network language models

Paraphrastic neural network language models

Investigation of multilingual deep neural networks for spoken term detection

Paraphrastic language models and combination with neural network language models

Transcription of multi-genre media archives using out-of-domain data

Language model combination and adaptation usingweighted finite state transducers

Recent improvements to the Cambridge Arabic Speech-to-Text systems

Filter options

Publication date

Publication type

Keywords

Data set

Journal

INFONA - science communication portal

Search results for: P. C. Woodland

Add recipient

Sending message cancelled

Are you sure you want to cancel sending this message?

Send message

Filter options

Publication date

Date range setting

Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.

Publication type

Keywords

Data set

Journal

Reporting an error / abuse

Sending the report failed

Accessibility options