Acoustic Event Detection plays an important role in computational acoustic scene analysis. Although overlapping sounds are common in real situations, conventional methods do not address this problem sufficiently. In this paper, we propose a new overlapped acoustic event detection technique that combines a source separation technique, Non-negative Matrix Factorization with shared basis vectors...
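As a rough illustration of how shared basis vectors can enter an NMF-based front end (the abstract is truncated, so the factorization below is an assumed minimal variant and all names are hypothetical), per-event bases and a shared basis can be fixed while only the activations are estimated:

```python
import numpy as np

def nmf_activations(V, W, n_iter=100, eps=1e-9):
    """Estimate activations H >= 0 for a fixed basis W so that V ~= W @ H.

    Multiplicative updates for the Euclidean cost (Lee & Seung).
    V: (freq, time) magnitude spectrogram; W: (freq, n_bases).
    """
    H = np.random.rand(W.shape[1], V.shape[1])
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

def separate_events(V, W_events, W_shared):
    """Reconstruct each event's contribution to the mixture V.

    W_events: list of (freq, r_k) bases, one per event, learned from
    isolated examples; W_shared: (freq, r_s) basis shared across events.
    """
    W = np.hstack(W_events + [W_shared])
    H = nmf_activations(V, W)
    parts, start = [], 0
    for Wk in W_events:
        r = Wk.shape[1]
        parts.append(Wk @ H[start:start + r])
        start += r
    return parts  # per-event magnitude estimates
```

Components explained by the shared basis are absorbed there instead of being forced into one event's reconstruction, which is the intuition behind sharing bases across overlapping events.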
Recent studies have shown that phase information contains speaker-dependent characteristics and is effective for speaker recognition. In this paper, we summarize a robust phase feature extracted from the Fourier spectrum (including pitch-non-synchronized phase information and pseudo-pitch-synchronized phase information) and its application to speaker recognition for speech with different speaking rates and...
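Raw STFT phase depends on where each analysis frame happens to cut the waveform, so phase features of this family are usually normalized first. A commonly used normalization (this specific form is an assumption here, since the truncated abstract does not spell it out) keeps the phase at a chosen base frequency $\omega_b$ equal to a constant $\theta$ and shifts every other frequency proportionally:

$$\tilde{\phi}(\omega) = \phi(\omega) + \frac{\omega}{\omega_b}\bigl(\theta - \phi(\omega_b)\bigr),$$

with the features then taken as $\cos\tilde{\phi}$ and $\sin\tilde{\phi}$ to avoid $2\pi$ wrapping issues.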
We describe a scheme to translate spoken English lectures into Japanese, consisting of a deep neural network based English automatic speech recognition (ASR) system and an English-to-Japanese phrase-based statistical machine translation (SMT) system. We focus on the adverse influence of speech misrecognition on the translation model. To cope with this influence, we utilized...
Phase information is ignored in almost all voice activity detection (VAD) methods. To exploit the full information in the original signal, this paper proposes a deep neural network (DNN) that uses both magnitude and phase information (that is, a phase-aware DNN) to achieve better VAD performance. Mel-frequency cepstral coefficients (MFCC), power-normalized cepstral coefficients (PNCC), the instantaneous frequency derivative...
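Of the inputs listed, the instantaneous frequency derivative is the least standard. One common way to obtain instantaneous frequency from STFT phase (an assumed formulation, since the abstract is cut off before defining the feature) is sketched below; a derivative of this quantity along time or frequency would give the feature the abstract names.

```python
import numpy as np

def instantaneous_frequency_deviation(stft, hop, n_fft):
    """Per-bin instantaneous frequency from STFT phase.

    stft: complex array (n_fft // 2 + 1, frames); hop: hop size in samples.
    Returns each bin's frame-to-frame phase increment with the expected
    advance of the bin's center frequency removed, wrapped to [-pi, pi).
    A further np.diff along either axis yields a derivative feature.
    """
    phase = np.angle(stft)
    dphase = np.diff(phase, axis=1)              # phase increment per frame
    bins = np.arange(stft.shape[0])[:, None]
    expected = 2 * np.pi * hop * bins / n_fft    # expected advance per bin
    dev = dphase - expected
    return np.mod(dev + np.pi, 2 * np.pi) - np.pi
```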
The detection of human versus spoofed (synthetic or converted) speech has started to receive an increasing amount of attention. In this paper, modified relative phase (MRP) information extracted from the Fourier spectrum is proposed for spoofing speech detection. Because original phase information is almost entirely lost in spoofed speech produced by current synthesis or conversion techniques, some phase information...
One of the difficulties in sung speech recognition is the small distance in acoustic space between phonemes in sung speech. We therefore considered clustering the speech based on pitch (fundamental frequency, F0) to create a larger distance between the phonemes. In addition, we considered a two-stage training method for the DNN-HMM: the first stage is trained using conventional acoustic features...
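The truncated abstract does not give the clustering criterion; a minimal sketch, assuming the simplest variant of bucketing frames by F0 range and training one model per bucket, looks like this (the bucket edges are hypothetical):

```python
import numpy as np

def f0_cluster_ids(f0_per_frame, edges=(130.0, 260.0)):
    """Assign each frame to an F0 bucket (hypothetical 3-way split).

    f0_per_frame: F0 in Hz per frame (0 for unvoiced frames).
    Returns 0/1/2 for low/mid/high F0, and -1 for unvoiced frames.
    """
    f0 = np.asarray(f0_per_frame, dtype=float)
    ids = np.digitize(f0, edges)
    ids[f0 <= 0] = -1
    return ids
```

One acoustic model would then be trained per bucket, with the first-stage model plausibly initializing the second stage in the two-stage training the abstract mentions.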
Deep neural networks (DNNs) have achieved significant success in the field of speech recognition. One of the main advantages of a DNN is automatic feature extraction without human intervention. Therefore, we incorporate a pseudo-filterbank layer at the bottom of the DNN and train the filterbank layer and the following networks jointly, whereas most systems take pre-defined mel-scale filterbanks as...
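A minimal PyTorch sketch of this idea, assuming the pseudo-filterbank layer is a non-negative linear map from the power spectrum trained jointly with the layers above it (the paper's exact architecture and initialization are not given in the truncated abstract):

```python
import torch
import torch.nn as nn

class FilterbankLayer(nn.Module):
    """Learnable pseudo-filterbank: power spectrum -> log filterbank outputs.

    Weights are clamped non-negative so each row remains a valid filter
    shape; a faithful reproduction would initialize them from a mel
    filterbank, while random initialization keeps this sketch self-contained.
    """
    def __init__(self, n_bins=257, n_filters=40):
        super().__init__()
        self.weight = nn.Parameter(torch.rand(n_filters, n_bins) * 0.1)

    def forward(self, power_spec):               # (batch, n_bins)
        w = torch.clamp(self.weight, min=0.0)
        return torch.log(power_spec @ w.t() + 1e-6)

# Joint training: the filterbank layer sits at the bottom of the DNN and
# receives gradients from the acoustic-model layers above it.
model = nn.Sequential(
    FilterbankLayer(),
    nn.Linear(40, 512), nn.ReLU(),
    nn.Linear(512, 2000),                        # e.g., tied-state targets
)
```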
Speech emotion recognition is still a challenging problem despite having been investigated over the last couple of decades. The performance of conventional speech emotion recognition is low, but it may be improved by considering new features and a new annotation method. In this paper, we first use glottal features for speech emotion recognition to improve its performance, because emotions are related...
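The truncated abstract does not say how the glottal features are obtained; a common starting point, sketched here only as an assumed baseline, is to estimate the glottal excitation by LPC inverse filtering and derive features from the residual:

```python
import numpy as np
import librosa
from scipy.signal import lfilter

def glottal_residual(y, sr, order=None):
    """Rough glottal-source estimate via LPC inverse filtering.

    Fits an all-pole vocal-tract model to the speech and applies its
    inverse filter; the prediction residual approximates the glottal
    excitation, from which emotion-related features could be derived.
    """
    order = order or int(sr / 1000) + 2      # rule-of-thumb LPC order
    a = librosa.lpc(y, order=order)          # a[0] == 1
    return lfilter(a, [1.0], y)              # inverse-filter residual
```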
This paper describes our scheme to translate spoken English lectures into Japanese, consisting of an English automatic speech recognition (ASR) system that utilizes a deep neural network (DNN) and an English-to-Japanese phrase-based statistical machine translation (SMT) system. We focused on domain adaptation of the acoustic and translation models. For domain adaptation of the translation model, frequently...
Recently, spoken dialog systems using speech recognition technology have become popular. Systems that do not have any specific purpose, such as chat-like dialog systems, are called “non-task-oriented spoken dialog systems”. In this study, we focused on non-task-oriented spoken dialog systems. We have developed a multi-party chat-like dialog system with different preferences between two...
Lyric recognition in singing is challenging because of a number of problems, including a lack of singing databases, superimposed musical instruments, and differing spectral variations. First, we investigated the differences in spectral variation among read speech, spontaneous speech, and sung speech, and found that sung speech was the most difficult to recognize. Next, we considered Japanese lyric...
In speech recognition, it is preferable not to make assumptions about the details of a target user, e.g., specific age and gender. However, speaker independence is one of the factors that degrade ASR performance. In this work, we propose a speaker adaptation method for recognizing short utterances. There have been several studies on speaker-independent DNN-HMMs in which an i-vector is computed, and the additional...
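The truncated sentence points to the standard recipe of feeding an utterance-level i-vector to the network as auxiliary input; a minimal sketch of that recipe (the paper's exact variant is an assumption here) simply appends the i-vector to every frame's feature vector:

```python
import numpy as np

def append_ivector(frames, ivector):
    """Append one utterance-level i-vector to every frame.

    frames: (n_frames, feat_dim), e.g., spliced MFCCs;
    ivector: (ivec_dim,). Returns a (n_frames, feat_dim + ivec_dim)
    array to be used as DNN input.
    """
    tiled = np.tile(ivector, (frames.shape[0], 1))
    return np.hstack([frames, tiled])
```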
This paper presents a Japanese spoken term detection method for spoken queries using a combination of word-based search and syllable-based N-gram search with in-vocabulary/out-of-vocabulary (IV/OOV) term classification. The N-gram index built from a recognized syllable lattice for OOV terms, which allows for recognition errors such as substitutions, insertions, and deletions, incorporates a distance...
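As a simplified sketch of such an index (using 1-best syllable sequences where the paper indexes a lattice, and with hypothetical names throughout), each syllable N-gram maps to its occurrences, and an OOV query converted to syllables is matched by shared N-grams:

```python
from collections import defaultdict

def build_ngram_index(utterances, n=3):
    """Map each syllable n-gram to its (utterance id, position) occurrences.

    utterances: dict of id -> recognized syllable sequence (1-best here;
    the paper's lattice indexing is not attempted in this sketch).
    """
    index = defaultdict(list)
    for uid, syls in utterances.items():
        for i in range(len(syls) - n + 1):
            index[tuple(syls[i:i + n])].append((uid, i))
    return index

def search(query_syls, index, n=3):
    """Rank utterances by the number of query n-grams they share; a real
    system would also score substitutions, insertions, and deletions."""
    hits = defaultdict(int)
    for i in range(len(query_syls) - n + 1):
        for uid, _ in index.get(tuple(query_syls[i:i + n]), []):
            hits[uid] += 1
    return sorted(hits.items(), key=lambda kv: -kv[1])
```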
We investigated speech recognition methods for mixed speech and music that remove only the music, based on non-negative matrix factorization (NMF). In this paper, we introduce the Euclidean distance of the logarithmic spectrum, D_LOG, as a distance measure for source separation, which may correspond to the distance measure used for speech recognition, and compare it with such traditional distance measures as the Kullback-Leibler...
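Under the abstract's own definition, the separation cost presumably has the form

$$D_{\mathrm{LOG}}(X, WH) = \sum_{f,t} \bigl( \log X_{f,t} - \log [WH]_{f,t} \bigr)^2,$$

to be contrasted with the generalized Kullback-Leibler divergence commonly used in NMF,

$$D_{\mathrm{KL}}(X, WH) = \sum_{f,t} \Bigl( X_{f,t} \log \frac{X_{f,t}}{[WH]_{f,t}} - X_{f,t} + [WH]_{f,t} \Bigr),$$

where $X$ is the mixture spectrogram and $WH$ its NMF approximation; the exact form of $D_{\mathrm{LOG}}$ used in the paper is an inference from its name, since the abstract is truncated.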
This paper presents our scheme to translate spoken English lectures into Japanese, which consists of an English automatic speech recognition (ASR) system that utilizes a deep neural network (DNN) and an English-to-Japanese phrase-based statistical machine translation (SMT) system. We utilized an existing Wall Street Journal corpus for our acoustic model and adapted it with MIT OpenCourseWare lectures...
Sensor networks capable of sensing multimedia data, including audio, are increasingly being used. Unfortunately, public use of such data is not allowed because it contains crucial private information such as person and location names. Person name extraction (PNE), a widely investigated research topic, is an effective technique for resolving this problem. However, there is an important difference...
Japanese is a syllabic language, and we have previously studied syllable-based GMM-HMMs for Japanese speech recognition. In this paper, we investigate the differences in recognition accuracy between phoneme- and syllable-based GMM-HMMs and deep neural network (DNN)-HMMs. First, we present a comparison of syllable-based and phoneme-based DNN-HMMs. Second, we train a tied-state, left-context-dependent syllable DNN-HMM,...
This paper presents our attempt to create an English automatic speech recognition (ASR) system and an English-to-Japanese statistical machine translation (SMT) system. We used MIT OpenCourseWare lectures as our test lecture corpus. A Wall Street Journal (WSJ) corpus adapted with MIT OpenCourseWare lectures was used for our acoustic model. MIT OpenCourseWare lecture transcriptions were utilized to create our language...
We have considered a speech recognition method for mixed sound, consisting of speech and music, that removes only the music, based on vector quantization (VQ) and non-negative matrix factorization (NMF). Instead of the conventional amplitude spectrum distance measure, an MFCC distance measure, which is not affected by pitch, is introduced. For isolated word recognition using a clean speech model, an improvement...
In conventional speaker identification methods based on mel-frequency cepstral coefficients (MFCCs), phase information is ignored. Recent studies have shown that phase information contains speaker-dependent characteristics and that pitch-synchronous phase information is more suitable for speaker identification. In this paper, we verify the effectiveness of pitch-synchronous phase information for speaker...
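For concreteness, the relative-phase normalization given earlier can be applied per frame as below; fixed framing stands in for the pitch-synchronous windowing this abstract describes, so the sketch is simplified and its parameter choices are hypothetical:

```python
import numpy as np

def relative_phase_features(frame, n_fft=512, base_bin=8):
    """Phase features normalized at a base frequency bin (cf. the
    relative-phase formula above, with theta = 0). Pitch-synchronous
    extraction would instead place windows at glottal epochs spaced by
    the pitch period; fixed framing is used here for simplicity.
    """
    spec = np.fft.rfft(frame * np.hanning(len(frame)), n_fft)
    phi = np.angle(spec)
    bins = np.arange(len(phi))
    # Keep the phase at base_bin fixed at 0 and shift every other bin
    # proportionally to its frequency.
    phi_norm = phi - (bins / base_bin) * phi[base_bin]
    return np.cos(phi_norm), np.sin(phi_norm)   # wrap-free representation
```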