Kazumasa Yamamoto

chapter

Robust lecture speech translation for speech misrecognition and its rescoring effect from multiple candidates

Koya Sahashi, Norioki Goto, Hiroshi Seki, Kazumasa Yamamoto, et al.

2017 International Conference on Advanced Informatics, Concepts, Theory, and Applications (ICAICTA) > 1 - 6

2017 International Conference on Advanced Informatics, Concepts, Theory, and Applications (ICAICTA)

We describe a scheme for translating spoken English lectures into Japanese, consisting of a deep neural network based English automatic speech recognition (ASR) system and an English-to-Japanese phrase-based statistical machine translation (SMT) system. We focus on the adverse effect of speech misrecognition on the translation model. To cope with this effect, we utilized...

chapter

Lyric recognition in monophonic singing using pitch-dependent DNN

Dairoku Kawai, Kazumasa Yamamoto, Seiichi Nakagawa

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 326 - 330

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

One of the difficulties in sung speech recognition is the small distance between phonemes in the acoustic space of sung speech. Therefore, we considered clustering the speech by pitch (fundamental frequency, F0) to create a larger distance between the phonemes. In addition, we considered a two-stage training method for DNN-HMM: the first stage is trained by using conventional acoustic features...

chapter

A deep neural network integrated with filterbank learning for speech recognition

Hiroshi Seki, Kazumasa Yamamoto, Seiichi Nakagawa

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5480 - 5484

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Deep neural networks (DNNs) have achieved significant success in the field of speech recognition. One of the main advantages of the DNN is automatic feature extraction without human intervention. Therefore, we incorporate a pseudo-filterbank layer at the bottom of the DNN and jointly train the filterbank layer and the subsequent layers, while most systems take pre-defined mel-scale filterbanks as...

chapter

Investigation of glottal features and annotation procedures for speech emotion recognition

Masaaki Takebe, Kazumasa Yamamoto, Seiichi Nakagawa

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 4

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

Speech emotion recognition is still a challenging problem despite having been investigated over the last couple of decades. Conventional speech emotion recognition performance is low, but it may be improved by considering new features and an annotation method. In this paper, we first use glottal features for speech emotion recognition to improve its performance because the emotions are related...

chapter

Domain adaptation of a speech translation system for lectures by utilizing frequently appearing parallel phrases in-domain

Norioki Goto, Kazumasa Yamamoto, Seiichi Nakagawa

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 4

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

This paper describes our scheme for translating spoken English lectures into Japanese, consisting of an English automatic speech recognition (ASR) system that utilizes a deep neural network (DNN) and an English-to-Japanese phrase-based statistical machine translation (SMT) system. We focused on domain adaptation of the acoustic and translation models. For domain adaptation of the translation model, frequently...

chapter

Deep neural network based acoustic model using speaker-class information for short time utterance

Hiroshi Seki, Kazumasa Yamamoto, Seiichi Nakagawa

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1222 - 1225

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

In speech recognition, it is preferable not to assume details of a target user, e.g., specific age and gender. However, speaker independence is one of the factors that degrade ASR performance. In this work, we propose a speaker adaptation method for recognizing a short-time utterance. There have been several studies on speaker-independent DNN-HMM in which an i-vector is computed, and the additional...

chapter

Fast NMF based approach and improved VQ based approach for speech recognition from mixed sound

Shoichi Nakano, Kazumasa Yamamoto, Seiichi Nakagawa

Proceedings of The 2012 Asia Pacific Signal and Information Processing Association Annual Summit and Conference > 1 - 4

2012 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

We have considered a speech recognition method for mixed sound, consisting of speech and music, that removes only the music based on vector quantization (VQ) and non-negative matrix factorization (NMF). This paper describes a fast calculation technique for music removal based on NMF and an improvement using a VQ method. For isolated word recognition using the clean speech model, an improvement of 46% word...
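
The abstract above names NMF as the basis for music removal. As a minimal sketch of the generic technique only, the following Python code factors a non-negative matrix V (for example, a magnitude spectrogram) into W and H with the standard multiplicative updates for the squared-error cost. It is not the paper's fast calculation technique, which the truncated abstract does not describe; the function names (`nmf`, `frobenius_error`) and all parameters are hypothetical choices made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative V (rows x cols) into W (rows x rank) and H (rank x cols)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    # Random positive initialization (an assumption; other schemes exist).
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_rows)]
    return w, h


if __name__ == "__main__":
    # Toy non-negative matrix standing in for a spectrogram (made-up values).
    V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 1.0, 1.0]]
    W, H = nmf(V, rank=2)
    print("reconstruction error:", frobenius_error(V, W, H))
```

In a music-removal setting, one would presumably keep some columns of W fixed to bases learned from music and discard their contribution when reconstructing the speech, but that step is an assumption and is not shown here.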

chapter

Automatic speech recognition using Hidden Conditional Neural Fields

Yasuhisa Fujii, Kazumasa Yamamoto, Seiichi Nakagawa

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5036 - 5039

ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Hidden Conditional Random Fields (HCRF) are a very promising approach to modeling speech. However, because HCRF computes the score of a hypothesis by summing linearly weighted features, it cannot capture non-linearity among features, which will be crucial for speech recognition. In this paper, we extend HCRF by incorporating the gate function used in neural networks and propose a new model called Hidden...
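
To illustrate the contrast the abstract draws, the sketch below compares a score formed as a linearly weighted sum of features with one that first passes weighted sums through sigmoid gates. The gated form is an assumed illustration of "a gate function used in neural networks", not the model's actual formulation, which the truncated abstract does not give; `linear_score`, `gated_score`, and the weight layout are hypothetical.

```python
import math


def linear_score(features, weights):
    """Score as a linearly weighted sum of features."""
    return sum(w * f for w, f in zip(weights, features))


def sigmoid(x):
    """Logistic gate function."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_score(features, gate_weights, output_weights):
    """Assumed illustration: sigmoid gates over weighted sums, then a weighted sum of gates."""
    gates = [sigmoid(linear_score(features, row)) for row in gate_weights]
    return linear_score(gates, output_weights)


if __name__ == "__main__":
    f = [1.0, 2.0]
    print(linear_score(f, [0.5, -1.0]))
    print(gated_score(f, [[1.0, -1.0], [0.5, 0.5]], [1.0, 3.0]))
```

Because each gate saturates, doubling a feature value no longer doubles its contribution to the score, which is the kind of non-linearity a purely linear sum cannot express.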

chapter

Speaker identification by combining MFCC and phase information in noisy environments

Longbiao Wang, Kazue Minami, Kazumasa Yamamoto, Seiichi Nakagawa

2010 IEEE International Conference on Acoustics, Speech and Signal Processing > 4502 - 4505

2010 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2010

In conventional speaker recognition methods based on MFCC, phase information has been ignored. Recently, we proposed a speaker recognition method that integrates MFCC with phase information. Using the phase information, the speaker identification error rate was reduced by 78% for clean speech. In this paper, we describe the effectiveness of phase information in noisy environments...

INFONA - science communication portal

Search results for: Kazumasa Yamamoto
