Mortaza Doulaty

article

Lightly supervised alignment of subtitles on multi-genre broadcasts

Oscar Saz, Salil Deena, Mortaza Doulaty, Madina Hasan, et al.

Multimedia Tools and Applications > 2018 > 77 > 23 > 30533 - 30550

This paper describes a system for performing alignment of subtitles to audio on multigenre broadcasts using a lightly supervised approach. Accurate alignment of subtitles plays a substantial role in the daily work of media companies and currently still requires large human effort. Here, a comprehensive approach to performing this task in an automated way using lightly supervised alignment is proposed...

chapter

Automatic optimization of data perturbation distributions for multi-style training in speech recognition

Mortaza Doulaty, Richard Rose, Olivier Siohan

2016 IEEE Spoken Language Technology Workshop (SLT) > 21 - 27

Speech recognition performance using deep neural network based acoustic models is known to degrade when the acoustic environment and the speaker population in the target utterances are significantly different from the conditions represented in the training data. To address these mismatched scenarios, multi-style training (MTR) has been used to perturb utterances in an existing uncorrupted and potentially...

chapter

The 2015 Sheffield system for longitudinal diarisation of broadcast media

Rosanna Milner, Oscar Saz, Salil Deena, Mortaza Doulaty, et al.

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) > 632 - 638

Speaker diarisation is the task of answering "who spoke when" within a multi-speaker audio recording. Diarisation of broadcast media typically operates on individual television shows, and is a particularly difficult task, due to a high number of speakers and challenging background conditions. Using prior knowledge, such as that from previous shows in a series, can improve performance. Longitudinal...

chapter

The 2015 Sheffield system for transcription of Multi-Genre Broadcast media

Oscar Saz, Mortaza Doulaty, Salil Deena, Rosanna Milner, et al.

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) > 624 - 631

We describe the University of Sheffield system for participation in the 2015 Multi-Genre Broadcast (MGB) challenge task of transcribing multi-genre broadcast shows. Transcription was one of four tasks proposed in the MGB challenge, with the aim of advancing the state of the art of automatic speech recognition, speaker diarisation and automatic alignment of subtitles for broadcast media. Four topics...

chapter

Latent Dirichlet Allocation based organisation of broadcast media archives for deep neural network adaptation

Mortaza Doulaty, Oscar Saz, Raymond W. M. Ng, Thomas Hain

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) > 130 - 136

This paper presents a new method for the discovery of latent domains in diverse speech data, for the use of adaptation of Deep Neural Networks (DNNs) for Automatic Speech Recognition. Our work focuses on transcription of multi-genre broadcast media, which is often only categorised broadly in terms of high level genres such as sports, news, documentary, etc. However, in terms of acoustic modelling...

chapter

Background-tracking acoustic features for genre identification of broadcast shows

Oscar Saz, Mortaza Doulaty, Thomas Hain

2014 IEEE Spoken Language Technology Workshop (SLT) > 118 - 123

This paper presents a novel method for extracting acoustic features that characterise the background environment in audio recordings. These features are based on the output of an alignment that fits multiple parallel background-based Constrained Maximum Likelihood Linear Regression transformations asynchronously to the input audio signal. With this setup, the resulting features can track changes in...

INFONA - science communication portal

Search results for: Mortaza Doulaty
