Automatic phonetic reconstruction of medical dictations from non-literal and automatically recognized speech transcripts leads to closer-to-literal transcripts for training. In this paper, we introduce an extended alignment method assessing multiple levels of text segmentation and show how open issues such as wrong segmentation in the recognized transcript can be resolved. Furthermore, the effect of...
This paper proposes a speech segment selection method based on machine learning for concatenative speech synthesis systems. The proposed method has two novel features. One is its use of a support vector machine (SVM) to estimate the subjective correctness of the pitch accent with respect to each accent phrase of possible candidate speech segments. The other is its use of a determination function to identify...
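The abstract above is truncated, but the core idea it names — scoring candidate segments' pitch-accent correctness with an SVM — can be sketched generically. The feature set (here, two illustrative F0 statistics per accent phrase), the synthetic labels, and the ranking step are assumptions for illustration, not the paper's actual design:

```python
import numpy as np
from sklearn.svm import SVC

# Toy training data: one row per accent phrase, features = [mean F0 (Hz),
# F0 range (Hz)]; labels: 1 = pitch accent judged correct, 0 = incorrect.
# Features and labels are synthetic, purely for illustration.
X_train = np.array([[120.0, 40.0], [125.0, 45.0], [130.0, 50.0],
                    [120.0,  5.0], [125.0,  8.0], [130.0,  6.0]])
y_train = np.array([1, 1, 1, 0, 0, 0])

svm = SVC(kernel="rbf").fit(X_train, y_train)

def rank_candidates(candidates):
    """Return candidate indices ordered by the SVM margin for 'correct'."""
    scores = svm.decision_function(candidates)
    return np.argsort(-scores)

# Candidate index 1 has the larger F0 range, closer to the 'correct' class.
order = rank_candidates(np.array([[122.0, 7.0], [124.0, 44.0]]))
```

Using the decision margin rather than a hard label lets the synthesizer rank all candidates within an accent phrase instead of merely filtering them.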
This paper describes a novel approach to the context clustering process in speaker-independent HMM-based Thai speech synthesis, aimed at improving the tone intelligibility of both the average voice and the speaker-adapted voice. A couple of phrase intonation features from a generative model, including a baseline value of the fundamental frequency and a phrase command amplitude, are extracted and thereafter...
In the literature, many intonation models are trained using parameters extracted sentence by sentence on contours interpolated in the unvoiced segments. This may bias the final parameters and reduce the generalization of the model due to their increased dispersion. Recently, we proposed JEMA, a joint extraction and prediction approach for intonation modeling that avoids...
One of the biggest challenges in emotional speech resynthesis is the selection of modification parameters that will make humans perceive a targeted emotion. The best selection method is to use human raters; however, for large evaluation sets this process can be very costly. In this paper, we describe a recognition-for-synthesis (RFS) system to automatically select a set of possible parameter values...
Corpus-based concatenative speech synthesis is very popular these days due to its highly natural speech quality. The amount of computation required at run time, however, is often quite large, and various approaches have been proposed to reduce it. In this paper, we propose early stopping schemes for the Viterbi beam search in unit selection, with which we can stop early...
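The abstract is cut off before it describes the paper's actual stopping schemes, but the general setting — a beam-pruned Viterbi search over candidate units where hypothesis evaluation can be aborted early — can be sketched. The cost functions, the pruning bound, and the non-negative join-cost assumption below are illustrative, not the paper's method:

```python
import math

def viterbi_beam(candidates, target_cost, join_cost, beam_width=3):
    """Beam-pruned Viterbi search over per-position candidate unit lists.

    Keeps at most `beam_width` partial paths per position and aborts the
    evaluation of a hypothesis as soon as its running cost can no longer
    beat the worst path already kept (join costs assumed non-negative).
    """
    beam = sorted((target_cost(0, u), [u]) for u in candidates[0])[:beam_width]
    for t in range(1, len(candidates)):
        hyps = []
        bound = math.inf  # cost of the worst hypothesis currently kept
        for u in candidates[t]:
            tc = target_cost(t, u)
            for cost, path in beam:
                partial = cost + tc
                if partial >= bound:
                    continue  # early stop: adding join cost cannot help
                total = partial + join_cost(path[-1], u)
                if total < bound or len(hyps) < beam_width:
                    hyps.append((total, path + [u]))
                    hyps.sort()
                    del hyps[beam_width:]
                    if len(hyps) == beam_width:
                        bound = hyps[-1][0]
        beam = hyps
    return min(beam)

# Toy lattice: two positions, two candidate units each; the search should
# pick units [1, 2], which match the targets exactly with a small join cost.
targets = [1, 2]
cost, path = viterbi_beam(
    [[1, 3], [2, 4]],
    target_cost=lambda t, u: abs(u - targets[t]),
    join_cost=lambda a, b: 0.1 * abs(a - b),
)
```

The `partial >= bound` check is the early stop: since join costs only add, a hypothesis already over the bound is discarded without computing them at all.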
Due to the inconsistency between maximum likelihood (ML) based training and the synthesis application in HMM-based speech synthesis, a minimum generation error (MGE) criterion has been proposed for HMM training. This paper continues by applying the MGE criterion to model adaptation for HMM-based speech synthesis. We propose an MGE linear regression (MGELR) based model adaptation algorithm, where the...
Voice conversion has become more and more important in speech technology, but most current approaches have to use parallel utterances of both the source and target speaker as the training corpus, which limits the application of the technology. In this paper, we propose a new method of text-independent voice conversion which uses a non-parallel corpus for training. The hidden Markov model (HMM) is used...
This paper presents a new method for reducing an existing speech database so that it can be used for domain-independent embedded unit selection text-to-speech synthesis. The method relies on statistical data produced by the unit selection process on a large text corpus. It utilizes the selection frequency of each unit as well as its actual score. Both objective and subjective evaluation of the...
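The pruning idea the abstract describes — keep only the units that unit selection actually chooses, weighted by how well they score — can be sketched generically. The thresholds and the way frequency and score are combined below are illustrative assumptions, not the paper's actual rule:

```python
from collections import defaultdict

def prune_units(selection_log, min_count=2, min_avg_score=0.5):
    """Keep units selected often enough and with a high enough mean score.

    selection_log: iterable of (unit_id, score) pairs logged while running
    unit selection over a large text corpus.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for unit, score in selection_log:
        counts[unit] += 1
        totals[unit] += score
    return {u for u in counts
            if counts[u] >= min_count and totals[u] / counts[u] >= min_avg_score}

# "a" is selected often with good scores; "b" is rare; "c" scores poorly.
kept = prune_units([("a", 0.9), ("a", 0.8), ("b", 0.9), ("c", 0.2), ("c", 0.3)])
```

Units that are never (or rarely and poorly) selected on a representative corpus are the natural candidates for removal in an embedded, storage-constrained system.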
This paper investigates the use of sequential clustering for speaker diarization. Conventional diarization systems are based on parametric models and agglomerative clustering. In our previous work we proposed a non-parametric method based on the agglomerative information bottleneck for very fast diarization. Here we consider the combination of sequential and agglomerative clustering for avoiding...
This paper presents a method for modeling the envelope of spectral amplitude parameters of speech signals in "two dimensions" (2D). It consists of two cascaded modeling stages: the first, along the frequency axis, is the usual cepstrum technique, which models the log-scaled spectral envelope with a discrete cosine model (DCM). The second, along the time axis, consists of modeling...
In this work we consider the problem of spectral envelope estimation using spectra with a perceptually warped frequency axis. The goal of this work is the reduction of the order of the spectral envelope model, which will facilitate the use of these envelopes for training voice conversion systems. We adapt the true-envelope estimator to Mel-frequency representations and adapt a recently proposed cepstral...
With the development of voice transformation and speech synthesis technologies, speaker identification systems are likely to face attacks from impostors who use voice-transformed or synthesized speech to mimic a particular speaker. In this paper, we therefore investigate how speaker identification systems perform on voice-transformed speech. We conducted experiments with two different approaches,...
In this paper we argue that context information can be used in early stages, i.e., during the definition of the mapping of words into grapheme sequences. We show that early tagged contextual graphemes play a significant role in improving the performance of grapheme-based speech synthesis and speech recognition systems.
The work presented here shows a comparison between a voice conversion system based on converting only the vocal tract representation of the source speaker and an augmented system that adds an algorithm for estimating the target excitation signal. The estimation algorithm uses a stochastic model for relating the excitation signal to the vocal tract features. The two systems were subjected to objective...
In current voice conversion systems, obtaining a high similarity between converted and target voices requires a high degree of signal manipulation, which implies significant quality degradation, to the point that in some cases the quality scores are unacceptable for real-life applications. Indeed, a tradeoff can be observed between the similarity scores and the quality scores achieved by a given...
The goal of this study was to evaluate the synthesis of visible speech based on 3-D motion data using second-order isomorphism. To do this, word stimuli were generated for perceptual discrimination and identification tasks. Discrimination trials were based on word pairs predicted to lie at four levels of perceptual dissimilarity. Results from the discrimination tasks indicated...
This paper presents a minimum unit selection error (MUSE) training method for an HMM-based unit selection speech synthesis system, which selects the optimal phone-sized unit sequence from the speech database by maximizing the combined likelihood of a group of trained HMMs. Under the MUSE criterion, the weights and distribution parameters of these HMMs are estimated to minimize the number of different units...
A new statistical confidence measure, the template constrained posterior (TCP), is proposed for verifying phone transcriptions of speech databases. Unlike the generalized posterior probability (GPP), TCP is computed by considering string hypotheses that bear a focused unit, e.g., a phone with partially matched left and right contexts. Parameters used for TCP include the context window length, partial matching...