Kazumasa Yamamoto

chapter

Detection of overlapping acoustic events based on NMF with shared basis vectors

Kazumasa Yamamoto, Chikara Ishikawa, Koya Sahashi, Seiichi Nakagawa

2017 IEEE 6th Global Conference on Consumer Electronics (GCCE) > 1 - 5

2017 IEEE 6th Global Conference on Consumer Electronics (GCCE)

Acoustic Event Detection plays an important role for computational acoustic scene analysis. Although we would face with a sound overlapping problem in a real situation, conventional methods do not consider the problem enough. In this paper, we propose a new overlapped acoustic event detection technique combined a source separation technique of Non-negative Matrix Factorization with shared basis vectors...

chapter

Robust lecture speech translation for speech misrecognition and its rescoring effect from multiple candidates

Koya Sahashi, Norioki Goto, Hiroshi Seki, Kazumasa Yamamoto, more

2017 International Conference on Advanced Informatics, Concepts, Theory, and Applications (ICAICTA) > 1 - 6

2017 International Conference on Advanced Informatics, Concepts, Theory, and Applications (ICAICTA)

We describe a scheme to translate spoken English lectures into Japanese consisting of a deep neural network based English automatic speech recognition system (ASR) and an English to Japanese phrase-based statistical machine translation system (SMT). The bad influence of speech misrecognition for the translation model is focused. For coping with bad influence caused by speech misrecognition, we utilized...

chapter

Lyric recognition in monophonic singing using pitch-dependent DNN

Dairoku Kawai, Kazumasa Yamamoto, Seiichi Nakagawa

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 326 - 330

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

One of the difficulties in sung speech recognition is the small distance in an acoustic space between phonemes in sung speech. Therefore we considered clustering the speech based on a pitch (fundamental frequency F0) and creating a larger distance between the phonemes. In addition, we considered a two-stage training method of DNN-HMM: the first stage is trained by using conventional acoustic features...

chapter

A deep neural network integrated with filterbank learning for speech recognition

Hiroshi Seki, Kazumasa Yamamoto, Seiichi Nakagawa

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5480 - 5484

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Deep neural networks (DNN) have achieved significant success in the field of speech recognition. One of the main advantages of the DNN is automatic feature extraction without human intervention. Therefore, we incorporate a pseudo-filterbank layer to the bottom of DNN and train the whole filterbank layer and the following networks jointly, while most systems take pre-defined mel-scale filterbanks as...

chapter

Investigation of glottal features and annotation procedures for speech emotion recognition

Masaaki Takebe, Kazumasa Yamamoto, Seiichi Nakagawa

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 4

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

Speech emotion recognition is a still challenging problem despite having been investigated over the last couple of decades. Conventional speech emotion recognition performance is low, but this may be improved by considering new features and an annotation method. In this paper, firstly we use glottal features for speech emotion recognition to improve its performance because the emotions are related...

chapter

Domain adaptation of a speech translation system for lectures by utilizing frequently appearing parallel phrases in-domain

Norioki Goto, Kazumasa Yamamoto, Seiichi Nakagawa

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 4

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

This paper describes our scheme to translate spoken English lectures into Japanese consisting of an English automatic speech recognition system (ASR) that utilizes a deep neural network (DNN) and an English to Japanese phrase-based statistical machine translation system (SMT). We focused on domain adaptation of the acoustic and translation models. For domain adaptation of the translation model, frequently...

chapter

Prediction of Driving Actions from Driving Signals

Toshihiko Itoh, Shinya Yamada, Kazumasa Yamamoto, Kenji Araki

In-Vehicle Corpus and Signal Processing for Driver Behavior > 197-210

A spoken dialogue system for car-navigation systems may be able to provide more natural and smoother communications but it must also cause safety problems. One of these problems is distraction whereby machine operation and voice conversations influence the driver. Even the use of a simple speech interface may affect the driving operation. We consider that a spoken dialogue system which can understand...

chapter

Effect of sympathetic relation and unsympathetic relation in multi-agent spoken dialogue system

Yuma Shibahara, Kazumasa Yamamoto, Seiichi Nakagawa

2016 International Conference On Advanced Informatics: Concepts, Theory And Application (ICAICTA) > 1 - 6

2016 International Conference On Advanced Informatics: Concepts, Theory And Application (ICAICTA)

Recently, spoken dialog systems using speech recognition technology have been becoming popular. Such as chat-like dialog systems, these systems which do not have any specific purpose are called “Non-Task-oriented spoken dialog system”. In this study, we focused on non-task-oriented spoken dialog systems. We have developed a multi-party chat-like dialog system with different preferences between two...

chapter

Speech analysis of sung-speech and lyric recognition in monophonic singing

Dairoku Kawai, Kazumasa Yamamoto, Seiichi Nakagawa

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 271 - 275

ICASSP 2016 - 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Lyric recognition in singing is challenging because of a number of problems, including a lack of singing databases, superposed musical instruments and different spectral variations. First of all, we investigated the difference of spectral variations among read speech, spontaneous speech and sung speech and we found that sung speech recognition was the most difficult. Next, we consider Japanese lyric...

chapter

Deep neural network based acoustic model using speaker-class information for short time utterance

Hiroshi Seki, Kazumasa Yamamoto, Seiichi Nakagawa

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1222 - 1225

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

In speech recognition, it is preferable not to hypothesize the details, e.g., specific age and gender, of a target user. However, speaker independence is one of the things that degrades ASR performance. In this work, we propose a speaker adaptation method to recognize a short time utterance. There have been several studies on speaker-independent DNN-HMM in which i-vector is computed, and the additional...

chapter

Combination of syllable based N-gram search and word search for spoken term detection through spoken queries and IV/OOV classification

Nagisa Sakamoto, Kazumasa Yamamoto, Seiichi Nakagawa

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) > 200 - 206

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

This paper presents a Japanese spoken term detection method for spoken queries using a combination of word-based search and syllable-based N-gram search with in-vocabulary/out-of-vocabulary (IV/OOV) term classification. The N-gram index in a recognized syllable-based lattice for OOV terms, which assumes recognition errors such as substitution, insertion and deletion errors, incorporates a distance...

chapter

Speech recognition for mixed speech and music by NMF using various cost functions and noise adaptive training methods

Naoaki Hashimoto, Kazumasa Yamamoto, Seiichi Nakagawa

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 27 - 30

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

We investigated speech recognition methods for mixed speech and music that only remove music based on non-negative matrix factorization (NMF). In this paper, we introduced the Euclidean distance of logarithm spectrum DLOG as a distance measure for source separation, which may correspond to the distance measure for speech recognition, and compared it with such traditional distance measures as the Kullback-Leibler...

chapter

English to Japanese spoken lecture translation system by using DNN-HMM and phrase-based SMT

Norioki Goto, Kazumasa Yamamoto, Seiichi Nakagawa

2015 2nd International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA) > 1 - 6

2015 2nd International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA)

This paper presents our scheme to translate spoken English lectures into Japanese that consists of an English automatic speech recognition system (ASR) that utilizes a deep neural network (DNN) and an English to Japanese phrase-based statistical machine translation system (SMT). We utilized an existing Wall Street Journal corpus for our acoustic model and adapted it with MIT OpenCourseWare lectures...

chapter

Comparison of syllable-based and phoneme-based DNN-HMM in Japanese speech recognition

Hiroshi Seki, Kazumasa Yamamoto, Seiichi Nakagawa

2014 International Conference of Advanced Informatics: Concept, Theory and Application (ICAICTA) > 249 - 254

2014 International Conference of Advanced Informatics: Concept, Theory and Application (ICAICTA)

Japanese is syllabic language. Additionally we have studied syllable-based GMM-HMM for Japanese speech recognition. In this paper, we investigate the differences of recognition accuracy using phoneme/syllable-based GMM-HMM and DNN (Deep Neural Network)-HMM. First, we present a comparison of syllable-based and phoneme-based DNN-HMM. Second, we train the tied state left-context dependent syllable DNN-HMM,...

chapter

Fast NMF based approach and VQ based approach using MFCC distance measure for speech recognition from mixed sound

Shoichi Nakano, Kazumasa Yamamoto, Seiichi Nakagawa

2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference > 1 - 4

2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

We have considered a speech recognition method for mixed sound, consisting of speech and music, that removes only the music based on vector quantization (VQ) and non-negative matrix factorization (NMF). Instead of conventional amplitude spectrum distance measure, MFCC distance measure which is not affected by the pitch is introduced. For isolated word recognition using the clean speech model, an improvement...

chapter

Single channel dereverberation method in log-melspectral domain using limited stereo data for distant speaker identification

Aditya Arie Nugraha, Kazumasa Yamamoto, Seiichi Nakagawa

2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference > 1 - 4

2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

In this paper, we present a feature enhancement method that uses neural networks (NNs) to map the reverberant feature in a log-melspectral domain to its corresponding anechoic feature. The mapping is done by cascade NNs trained using Cascade 2 algorithm with an implementation of segment-based normalization. We assumed that the dimensions of feature were independent from each other and experimented...

chapter

Speaker tracking with spherical microphone arrays

John McDonough, Kenichi Kumatani, Takayuki Arakawa, Kazumasa Yamamoto, more

2013 IEEE International Conference on Acoustics, Speech and Signal Processing > 3981 - 3985

ICASSP 2013 - 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In prior work, we investigated the application of a spherical microphone array to a distant speech recognition task. In that work, the relative positions of a fixed loud speaker and the spherical array required for beamforming were measured with an optical tracking device. In the present work, we investigate how these relative positions can be determined automatically for real, human speakers based...

chapter

A low power many-core SoC with two 32-core clusters connected by tree based NoC for multimedia applications

Hui Xu, Jun Tanabe, Hiroyuki Usui, Soichiro Hosoda, more

2012 Symposium on VLSI Circuits (VLSIC) > 150 - 151

2012 IEEE Symposium on VLSI Circuits

A low-power many-core SoC for multimedia applications is implemented in 40nm CMOS technology. Within a 210mm² die, two 32-core clusters are integrated with dynamically reconfigurable processors, hardware accelerators, 2-channel DDR3 I/Fs, and other peripherals. Processor cores in the cluster share a 2MB L2 cache connected through a tree-based Network-on-Chip (NoC). The high scalability and low power...

chapter

Microphone array processing for distant speech recognition: Towards real-world deployment

Kenichi Kumatani, Takayuki Arakawa, Kazumasa Yamamoto, John McDonough, more

Proceedings of The 2012 Asia Pacific Signal and Information Processing Association Annual Summit and Conference > 1 - 10

2012 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

Distant speech recognition (DSR) holds out the promise of providing a natural human computer interface in that it enables verbal interactions with computers without the necessity of donning intrusive body- or head-mounted devices. Recognizing distant speech robustly, however, remains a challenge. This paper provides a overview of DSR systems based on microphone arrays. In particular, we present recent...

chapter

Fast NMF based approach and improved VQ based approach for speech recognition from mixed sound

Shoichi Nakano, Kazumasa Yamamoto, Seiichi Nakagawa

Proceedings of The 2012 Asia Pacific Signal and Information Processing Association Annual Summit and Conference > 1 - 4

2012 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

We have considered a speech recognition method for mixed sound, consisting of speech and music, that removes only the music based on vector quantization (VQ) and non-negative matrix factorization (NMF). This paper describe fast calculation technique of music removal based on NMF and improvement using a VQ method. For isolated word recognition using the clean speech model, an improvement of 46% word...

INFONA - science communication portal

Search results for: Kazumasa Yamamoto

Detection of overlapping acoustic events based on NMF with shared basis vectors

Robust lecture speech translation for speech misrecognition and its rescoring effect from multiple candidates

Lyric recognition in monophonic singing using pitch-dependent DNN

A deep neural network integrated with filterbank learning for speech recognition

Investigation of glottal features and annotation procedures for speech emotion recognition

Domain adaptation of a speech translation system for lectures by utilizing frequently appearing parallel phrases in-domain

Prediction of Driving Actions from Driving Signals

Effect of sympathetic relation and unsympathetic relation in multi-agent spoken dialogue system

Speech analysis of sung-speech and lyric recognition in monophonic singing

Deep neural network based acoustic model using speaker-class information for short time utterance

Combination of syllable based N-gram search and word search for spoken term detection through spoken queries and IV/OOV classification

Speech recognition for mixed speech and music by NMF using various cost functions and noise adaptive training methods

English to Japanese spoken lecture translation system by using DNN-HMM and phrase-based SMT

Comparison of syllable-based and phoneme-based DNN-HMM in Japanese speech recognition

Fast NMF based approach and VQ based approach using MFCC distance measure for speech recognition from mixed sound

Single channel dereverberation method in log-melspectral domain using limited stereo data for distant speaker identification

Speaker tracking with spherical microphone arrays

A low power many-core SoC with two 32-core clusters connected by tree based NoC for multimedia applications

Microphone array processing for distant speech recognition: Towards real-world deployment

Fast NMF based approach and improved VQ based approach for speech recognition from mixed sound

Filter options

Publication date

Keywords

Data set

INFONA - science communication portal

Search results for: Kazumasa Yamamoto

Add recipient

Sending message cancelled

Are you sure you want to cancel sending this message?

Send message

Filter options

Publication date

Date range setting

Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.

Keywords

Data set

Reporting an error / abuse

Sending the report failed

Accessibility options