Effect of Neural Network based phonetic feature segmentation in ASR

Mohammed Rokibul Alam Kotwal; Foyzul Hassan; Mohammad Nurul Huda

doi:10.1109/ICCITechn.2014.6997306

Source

16th Int'l Conf. Computer and Information Technology, pp. 389-395

Abstract

This paper describes a system for phone segmentation using phonetic features, in a setting where context information influences the performance of Automatic Speech Recognition (ASR). Current Hidden Markov Model (HMM) based ASR systems have solved this problem by using context-sensitive triphone models. However, these models need a large number of speech parameters and a large speech corpus. In this paper, we propose a technique that models the dynamic process of co-articulation and embeds it in ASR systems. A Recurrent Neural Network (RNN) is expected to realize this dynamic process, but its main drawback is slow training when the network is large. We introduce Distinctive Phonetic Feature (DPF) based feature extraction using a two-stage system consisting of a Multi-Layer Neural Network (MLN) in the first stage and another MLN with a longer context window in the second stage. The first MLN is expected to reduce the dynamics of the acoustic feature pattern, and the second MLN to suppress the fluctuation caused by DPF context. The experiments are carried out using Japanese triphthong and Japanese Newspaper Article Sentences (JNAS) data. The proposed DPF based feature extractor provides better segmentation performance with a reduced mixture-set of HMMs. A better context effect is achieved with less computation by using an MLN instead of an RNN.
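The two-stage arrangement described in the abstract can be pictured with a small data-flow sketch. This is only an illustration under stated assumptions, not the authors' implementation: the feature dimensions (25 acoustic values, 15 DPF values), the hidden layer size, the context widths, and all function names are hypothetical, and the weights are random rather than trained. It shows only how a second network with a longer context window can be stacked on the frame-wise output of a first network.

```python
import numpy as np


def stack_context(frames, left, right):
    """Concatenate each frame with `left` past and `right` future frames.

    Edges are padded by repeating the first and last frame.
    """
    n_frames = frames.shape[0]
    padded = np.pad(frames, ((left, right), (0, 0)), mode="edge")
    windows = [padded[i:i + n_frames] for i in range(left + right + 1)]
    return np.concatenate(windows, axis=1)


def init_mln(rng, n_in, n_hidden, n_out):
    """Random (untrained) weights for a one-hidden-layer network."""
    return {
        "w1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "w2": rng.normal(0.0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }


def mln_forward(params, x):
    """Forward pass: tanh hidden layer, sigmoid outputs in (0, 1)."""
    hidden = np.tanh(x @ params["w1"] + params["b1"])
    logits = hidden @ params["w2"] + params["b2"]
    return 1.0 / (1.0 + np.exp(-logits))


def two_stage_dpf(frames, mln1, mln2, ctx1, ctx2):
    """Stage 1: acoustic frames -> DPF; stage 2: DPF over a longer window -> DPF."""
    dpf_raw = mln_forward(mln1, stack_context(frames, ctx1, ctx1))
    return mln_forward(mln2, stack_context(dpf_raw, ctx2, ctx2))


# Hypothetical sizes, chosen only for illustration.
N_ACOUSTIC, N_DPF, N_HIDDEN = 25, 15, 32
CTX1, CTX2 = 1, 3  # stage 2 sees a longer context window than stage 1

rng = np.random.default_rng(0)
mln1 = init_mln(rng, N_ACOUSTIC * (2 * CTX1 + 1), N_HIDDEN, N_DPF)
mln2 = init_mln(rng, N_DPF * (2 * CTX2 + 1), N_HIDDEN, N_DPF)

frames = rng.normal(size=(100, N_ACOUSTIC))  # stand-in for real acoustic features
dpf = two_stage_dpf(frames, mln1, mln2, CTX1, CTX2)
print(dpf.shape)  # (100, 15)
```

Both stages here are feed-forward, so every frame is processed independently once its context window has been stacked; this is the property the abstract contrasts with the slower training of a large RNN.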

Identifiers

Book e-ISBN:	978-1-4799-3497-3, 978-1-4799-3496-6
DOI	10.1109/ICCITechn.2014.6997306

Keywords

speech recognition; feature extraction; hidden Markov models; multilayer perceptrons; natural language processing; recurrent neural nets; speech processing; mixture-set; neural network based phonetic feature segmentation; phone segmentation; automatic speech recognition; hidden Markov model; HMM based ASR system; context-sensitive triphone model; speech parameter; speech corpus; recurrent neural network; RNN; distinctive phonetic feature; DPF based feature extraction; multilayer neural network; MLN; context window; acoustic feature pattern; fluctuation suppression; DPF context; Japanese triphthong; Japanese newspaper article sentences; JNAS data; DPF based feature extractor; segmentation performance; Context Vectors; Mel frequency cepstral coefficient; Speech local features; multi-layer neural networ

Additional information

Dataset: ieee

Publisher

IEEE
