Search results for: N. Minematsu

Items from 1 to 20 out of 34 results

chapter

Improved and robust prediction of pronunciation distance for individual-basis clustering of World Englishes pronunciation

S. Kasahara, S. Kitahara, N. Minematsu, H.-P. Shen, et al.

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 3216 - 3220

ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

English is the only language available for global communication and is used by approximately 1.5 billion speakers. It is also known to have a large diversity of pronunciation due to the influence of speakers' mother tongue, called accents. Our project aims at creating a global and individual-basis map of English pronunciations to be used in teaching and learning World Englishes (WE) as well as...

chapter

Automatic pronunciation clustering using a World English archive and pronunciation structure analysis

H.-P. Shen, N. Minematsu, T. Makino, S. H. Weinberger, et al.

2013 IEEE Workshop on Automatic Speech Recognition and Understanding > 222 - 227

2013 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)

English is the only language available for global communication. Due to the influence of speakers' mother tongue, however, those from different regions inevitably have different accents in their pronunciation of English. The ultimate goal of our project is creating a global pronunciation map of World Englishes on an individual basis, for speakers to use to locate similar English pronunciations. If...

chapter

Human speech model based on information separation and its application to speech processing

N. Minematsu

2010 7th International Symposium on Chinese Spoken Language Processing > 477 - 482

7th International Symposium on Chinese Spoken Language Processing (ISCSLP 2010)

This paper points out that no existing technically-implemented speech model is adequate enough to describe one of the most fundamental and unique capacities of human speech processing. Language acquisition of infants is based on vocal imitation but they don't impersonate their parents and imitate only the linguistic and para-linguistic aspects of the parents' utterances. The vocal imitation is found...

chapter

Improved Mandarin segmental duration prediction with automatically extracted syntax features

Miaomiao Wen, Miaomiao Wang, K. Hirose, N. Minematsu

IEEE 10th International Conference on Signal Processing Proceedings > 621 - 624

2010 10th International Conference on Signal Processing (ICSP 2010)

Previous research has indicated the relevance between segmental duration and syntax information, but the usefulness of syntax features has not been thoroughly studied for predicting segmental duration. In this paper, we design two sets of syntax features to improve Mandarin phone and pause duration prediction respectively. Instead of using manually extracted syntax information as previous research...

chapter

Control of prosodic focus in corpus-based generation of fundamental frequency contours based on the generation process model

K. Hirose, K. Ochi, N. Minematsu

IEEE 10th International Conference on Signal Processing Proceedings > 629 - 632

2010 10th International Conference on Signal Processing (ICSP 2010)

HMM-based speech synthesis is known to be a possible solution for realizing "flexibility" in speech synthesis. However, its frame-by-frame process of acoustic features is not appropriate for prosodic features. Prosodic features cover a wider time span as compared to segmental features, and should be handled differently. From this point of view, a method has been developed for generating...

chapter

Control of fundamental frequency contours using the generation process model in HMM-based speech synthesis

T. Matsuda, K. Hirose, N. Minematsu

IEEE 10th International Conference on Signal Processing Proceedings > 617 - 620

2010 10th International Conference on Signal Processing (ICSP 2010)

A method was proposed to increase the naturalness of prosody generated with speech synthesis based on hidden Markov models (HMMs). This method adds a constraint to the fundamental frequency contours (F₀ contours) during the HMM-based speech synthesis. The constraint adopted is the generation process model of F₀ contours (F₀ model). The method first extracts the F₀ model parameters from the original...

chapter

HMM-based sequence-to-frame mapping for voice conversion

Yu Qiao, Daisuke Saito, N. Minematsu

2010 IEEE International Conference on Acoustics, Speech and Signal Processing > 4830 - 4833

2010 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2010

Voice conversion can be reduced to the problem of finding a transformation function between the corresponding speech sequences of two speakers. Perhaps most voice conversion methods are GMM-based statistical mapping methods. However, the classical GMM-based mapping is frame-to-frame, and cannot take account of the contextual information existing over a speech sequence. It is well known that HMM yields...

chapter

Sub-structure-based estimation of pronunciation proficiency and classification of learners

M. Suzuki, N. Minematsu, Dean Luo, K. Hirose

2009 IEEE Workshop on Automatic Speech Recognition & Understanding > 574 - 579

2009 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU 2009)

Automatic estimation of pronunciation proficiency has its specific difficulty. Adequacy in controlling the vocal organs can be estimated from spectral envelopes of input utterances but the envelope patterns are also affected easily by different speakers. To develop a pedagogically sound method for automatic estimation, the envelope changes caused by linguistic factors and those by extra-linguistic...

chapter

A study on Hidden Structural Model and its application to labeling sequences

Yu Qiao, M. Suzuki, N. Minematsu

2009 IEEE Workshop on Automatic Speech Recognition & Understanding > 118 - 123

2009 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU 2009)

This paper proposes a hidden structure model (HSM) for statistical modeling of sequence data. The HSM generalizes our previous proposal on structural representation by introducing hidden states and probabilistic models. Compared with the previous structural representation, HSM not only can solve the problem of misalignment of events, but also can conduct structure-based decoding, which allows us to...

chapter

Mixture of Probabilistic Linear Regressions: A unified view of GMM-based mapping techniques

Yu Qiao, N. Minematsu

2009 IEEE International Conference on Acoustics, Speech and Signal Processing > 3913 - 3916

ICASSP 2009 - 2009 IEEE International Conference on Acoustics, Speech and Signal Processing

This paper introduces a model of mixture of probabilistic linear regressions (MPLR) to learn a mapping function between two feature spaces. The MPLR consists of weighted combination of several probabilistic linear regressions, whose parameters are estimated by using matrix calculation. The mixture nature of MPLR allows it to model nonlinear transformation. The formulation of MPLR is general and independent...

chapter

Control of prosodic focus in corpus-based generation of fundamental frequency contours of Japanese based on the generation process model

K. Ochi, K. Hirose, N. Minematsu

2009 IEEE International Conference on Acoustics, Speech and Signal Processing > 4257 - 4260

ICASSP 2009 - 2009 IEEE International Conference on Acoustics, Speech and Signal Processing

A total corpus-based process of generating prosodic features from text is developed. The process first predicts pauses and phone durations, and then generates F₀ contours. Since F₀ contour generation is based on the generation process model, it is rather easy to manipulate the generated F₀ contours in command level. A method was developed for generating sentence F₀ contours, when a focus is placed...

chapter

Affine invariant features and their application to speech recognition

Yu Qiao, M. Suzuki, N. Minematsu

2009 IEEE International Conference on Acoustics, Speech and Signal Processing > 4629 - 4632

ICASSP 2009 - 2009 IEEE International Conference on Acoustics, Speech and Signal Processing

This paper proposes a set of affine invariant features (AIFs) for sequence data. The proposed AIFs can be calculated directly from the sequence data, and their invariance to affine transformation is proved mathematically through algebraic calculation. We apply the AIFs to speech recognition. Since the vocal tract length (VTL) difference causes frequency warping, which can be approximated well by...

article

A Theory of Phase Singularities for Image Representation and its Applications to Object Tracking and Image Matching

Yu Qiao, Wei Wang, N. Minematsu, Jianzhuang Liu, et al.

IEEE Transactions on Image Processing > 2009 > 18 > 10 > 2153 - 2166

This paper studies phase singularities (PSs) for image representation. We show that PSs calculated with Laguerre-Gauss filters contain important information and provide a useful tool for image analysis. PSs are invariant to image translation and rotation. We introduce several invariant features to characterize the core structures around PSs and analyze the stability of PSs to noise addition and scale...

chapter

Automatic Assessment of Language Proficiency through Shadowing

D. Luo, N. Minematsu, Y. Yamauchi, K. Hirose

2008 6th International Symposium on Chinese Spoken Language Processing > 1 - 4

2008 6th International Symposium on Chinese Spoken Language Processing

Shadowing is a practice that requires learners to shadow a presented native utterance as closely and quickly as possible. Learners' pronunciation in shadowing, especially in the case of beginners, often becomes inarticulate and corrupt. These features of shadowing make it very difficult to assess shadowing productions. In this paper, we investigate the automatic pronunciation scoring methods for shadowing...

chapter

Corpus-based synthesis of Mandarin speech with F₀ contours generated by superposing tone components on rule-generated phrase components

K. Hirose, Q. Sun, N. Minematsu

2008 IEEE Spoken Language Technology Workshop > 33 - 36

2008 IEEE Workshop on Spoken Language Technology. SLT 2008

Mandarin speech synthesis was conducted by generating prosodic features by the proposed method and segmental features by HMM-based method. The proposed method generates sentence fundamental frequency (F₀) contours by representing them as a superposition of tone components on phrase components. The tone components are realized by concatenating their fragments at tone nuclei predicted by a corpus-based...

chapter

Corpus-based generation of F₀ contours of Japanese based on the generation process model and its control for prosodic focus

K. Hirose, K. Ochi, N. Minematsu

2008 9th International Conference on Signal Processing > 647 - 650

2008 9th International Conference on Signal Processing (ICSP 2008)

A total corpus-based process of generating prosodic features from text is developed. The process first predicts pauses and phone durations, and then generates F₀ contours. Since F₀ contour generation is based on the generation process model, it is rather easy to manipulate the generated F₀ contours in command level. A method was developed for generating sentence F₀ contours, when a focus is placed...

chapter

Experimental study of structure to speech conversion

N. Minematsu, D. Saito, K. Hirose

2008 9th International Conference on Signal Processing > 651 - 654

2008 9th International Conference on Signal Processing (ICSP 2008)

Most speech synthesizers have been developed as text (phoneme sequence) to speech converters and, in this framework, text input is a precondition for speech production. However, we can say that no child acquires spoken language by reading a given text out. Children are said to acquire spoken language by imitating the utterances of their parents but they never imitate the voices of their...

chapter

ICA’s suitability assisted by Voice Activity Detection

A. Rebordao, M.K. Islam Molla, K. Hirose, N. Minematsu

2008 International Conference on Audio, Language and Image Processing > 665 - 669

2008 International Conference on Audio, Language and Image Processing

This research presents an innovative system for adaptive speech denoising using Independent Component Analysis (ICA) and Voice Activity Detection (VAD). Designed for instantaneous mixtures (two sources and two microphones), the proposed system identifies the noise contained in each noisy mixture. For that type of noise, it applies the most suitable ICA method among three methods (FastICA, Kernel ICA and...

chapter

Phase singularities for image representation and matching

Yu Qiao, Wei Wang, N. Minematsu, Jianzhuang Liu, et al.

2008 IEEE International Conference on Acoustics, Speech and Signal Processing > 885 - 888

ICASSP 2008. IEEE International Conference on Acoustics, Speech and Signal Processing

Phase features are widely used in image processing and representation due to their stability to deformation and noise. However, phase singularities, where the signals vanish, are generally regarded as harmful and unreliable facts. In this paper, on the contrary, we will show that phase singularities calculated by Laguerre-Gauss filter contain important information of input image and can provide a reliable...

chapter

Unsupervised optimal phoneme segmentation: Objectives, algorithm and comparisons

Yu Qiao, N. Shimomura, N. Minematsu

2008 IEEE International Conference on Acoustics, Speech and Signal Processing > 3989 - 3992

ICASSP 2008. IEEE International Conference on Acoustics, Speech and Signal Processing

Phoneme segmentation is a fundamental problem in many speech recognition and synthesis studies. Unsupervised phoneme segmentation assumes no knowledge on linguistic contents and acoustic models, and thus poses a challenging problem. The essential question here is what is the optimal segmentation. This paper formulates the optimal segmentation problem into a probabilistic framework. Using statistics...


INFONA - science communication portal
