ICASSP 2008. IEEE International Conference on Acoustics, Speech and Signal Processing

Items from 1 to 20 out of 44 results

chapter

A novel approach to mixed phase room impulse response inversion for speech dereverberation

N. Cahill, R. Lawlor

2008 IEEE International Conference on Acoustics, Speech and Signal Processing > 4593 - 4596

Outlined in this paper is a novel approach to speech dereverberation when an estimate of the source-receiver transfer function is known. It is a two-stage algorithm based on the minimum phase/allpass decomposition of a mixed phase room impulse response (RIR). The reverberant speech is first filtered with the inverse minimum phase component of the RIR. Then a non-negative matrix factorization (NMF)...

chapter

Discriminative training for improving letter-to-sound conversion performance

Yi-Ning Chen, Peng Liu, Jia-Li You, F.K. Soong

2008 IEEE International Conference on Acoustics, Speech and Signal Processing > 4649 - 4652

In this paper, we propose to use discriminative training (DT) for improving letter-to-sound (LTS) conversion performance. LTS is a critical component in both ASR and TTS for predicting the correct pronunciation of a word not included in the lexicon. For TTS applications, predicting the proper pronunciation of an out-of-vocabulary person/place name, especially a name with foreign origin can be challenging...

chapter

Analysis-by-synthesis features for speech recognition

Z. Al Bawab, R. Bhiksha, R.M. Stern

2008 IEEE International Conference on Acoustics, Speech and Signal Processing > 4185 - 4188

We present a framework for speech recognition that accounts for hidden articulatory information. We model the articulatory space using a codebook of articulatory configurations geometrically derived from EMA measurements available in the MOCHA database. The articulatory parameter set we derive is in the form of Maeda parameters. In turn, these parameters are used in a physiologically-motivated articulatory...

chapter

Stylization of pitch with syllable-based linear segments

S. Ravuri, D.P.W. Ellis

2008 IEEE International Conference on Acoustics, Speech and Signal Processing > 3985 - 3988

Fundamental frequency contours for speech, as obtained by common pitch tracking algorithms, contain a great deal of fine detail that is unlikely to hold much perceptual significance for listeners. In our experiments, a radically reduced pitch contour consisting of a single linear segment for each syllable was found to be judged as equally natural as the original pitch track by listeners, based on high-quality...
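
As a rough illustration of the idea in this abstract, the sketch below replaces the pitch track inside each syllable with one straight line. The excerpt does not state how the segments are fitted, so the least-squares fit, the function names `fit_line` and `stylize`, and the representation of syllables as index ranges are all assumptions made for the example.

```python
def fit_line(times, f0):
    """Least-squares line f0 ~ slope * t + intercept for one syllable (assumed criterion)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_f = sum(f0) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0.0:
        # A single frame (or identical time stamps): fall back to a flat segment.
        return 0.0, mean_f
    cov = sum((t - mean_t) * (f - mean_f) for t, f in zip(times, f0))
    slope = cov / var_t
    return slope, mean_f - slope * mean_t


def stylize(times, f0, syllables):
    """Replace the contour in each [start, end) frame range by its fitted line.

    `syllables` is a hypothetical list of (start, end) frame indices.
    Frames outside every syllable are left unchanged.
    """
    out = list(f0)
    for start, end in syllables:
        slope, intercept = fit_line(times[start:end], f0[start:end])
        for i in range(start, end):
            out[i] = slope * times[i] + intercept
    return out
```

Each syllable is then described by two numbers (slope and intercept) instead of one value per frame, which is the kind of reduction the abstract refers to.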

chapter

Exploration of high-level prosodic patterns for continuous Mandarin speech

Chen-Yu Chiang, Hsiu-Min Yu, Yih-Ru Wang, Sin-Horng Chen

2008 IEEE International Conference on Acoustics, Speech and Signal Processing > 3977 - 3980

In this paper, the high-level prosodic patterns of prosodic word (PW), prosodic phrase (PPh) and breath group/prosodic phrase group (BG/PG) for syllable pitch-level and duration are explored using an automatic joint prosody labeling and modeling method. Experimental results on a treebank speech corpus showed that the explored high-level prosodic patterns not only matched well with our a priori knowledge...

chapter

Further analysis of LSM-based unit pruning for unit selection TTS

J.R. Bellegarda

2008 IEEE International Conference on Acoustics, Speech and Signal Processing > 3961 - 3964

The level of quality that can be achieved in concatenative text-to-speech synthesis is primarily governed by the inventory of units used in unit selection. This has led to the collection of ever larger corpora in the quest for ever more natural synthetic speech. As operational considerations limit the size of the unit inventory, however, pruning is critical to removing any instances that prove either...

chapter

Unsupervised optimal phoneme segmentation: Objectives, algorithm and comparisons

Yu Qiao, N. Shimomura, N. Minematsu

2008 IEEE International Conference on Acoustics, Speech and Signal Processing > 3989 - 3992

Phoneme segmentation is a fundamental problem in many speech recognition and synthesis studies. Unsupervised phoneme segmentation assumes no knowledge on linguistic contents and acoustic models, and thus poses a challenging problem. The essential question here is what is the optimal segmentation. This paper formulates the optimal segmentation problem into a probabilistic framework. Using statistics...

chapter

Unit database pruning based on the cost degradation criterion for concatenative speech synthesis

N. Nishizawa, H. Kawai

2008 IEEE International Conference on Acoustics, Speech and Signal Processing > 3969 - 3972

A novel method of unit database pruning for concatenative speech synthesis is proposed. The proposed method uses sums of the unit preference criterion, which are calculated from cost degradation from the optimal sequence, instead of the appearance frequencies of units, which is used in the conventional method. Therefore, the proposed method is an extension of the conventional method. Since not only...

chapter

Estimation of the voicing cut-off frequency contour of natural speech based on harmonic and aperiodic energies

K. Hermus, L. Girin, H. Van hamme, S. Irhimeh

2008 IEEE International Conference on Acoustics, Speech and Signal Processing > 4473 - 4476

We present a new algorithm for the automatic estimation of the voicing cut-off frequency (VCO), i.e., the frequency that separates the periodic low-frequency part from the aperiodic high-frequency part in voiced segments of natural speech. Starting from the power spectrum of a two pitch period speech frame, we define the VCO to be located at the frequency for which the sum of the periodic and aperiodic...

chapter

Robust phone set mapping using decision tree clustering for cross-lingual phone recognition

Khe Chai Sim, Haizhou Li

2008 IEEE International Conference on Acoustics, Speech and Signal Processing > 4309 - 4312

Recently, research related to multi-lingual and cross-lingual speech has gained increasing popularity. One of the major problems when dealing with multi-lingual speech data is the mapping of the phone sets between different languages. Phone mapping is useful for cross-lingual speech recognition, cross-lingual pronunciation modelling and mixed language speech synthesis, to name a few. In this paper,...

chapter

Speaker and style adaptation using average voice model for style control in HMM-based speech synthesis

M. Tachibana, S. Izawa, T. Nose, T. Kobayashi

2008 IEEE International Conference on Acoustics, Speech and Signal Processing > 4633 - 4636

We propose a technique for synthesizing speech with desired style expressivity of an arbitrary target speaker's voice. In an MLLR-based speaker adaptation technique for multiple regression hidden semi-Markov model (MRHSMM), the quality of synthesized speech crucially depends on the initial MRHSMM trained from a certain source speaker's data and it is not always possible to synthesize natural sounding...

chapter

Improving the modeling of the noise part in the harmonic plus noise model of speech

Y. Pantazis, Y. Stylianou

2008 IEEE International Conference on Acoustics, Speech and Signal Processing > 4609 - 4612

Harmonic + noise model (HNM) is a hybrid model of speech with a harmonic component and a noise component. While the harmonic part describes efficiently the periodicities in speech signals (voiced parts), modeling of the noise part introduces artifacts primarily because of the specific time-domain characteristics of noise in voiced speech. In this paper, we concentrated on the modeling of noise in...

chapter

Performance evaluation of the speaker-independent HMM-based speech synthesis system “HTS 2007” for the Blizzard Challenge 2007

J. Yamagishi, T. Nose, H. Zen, T. Toda, et al.

2008 IEEE International Conference on Acoustics, Speech and Signal Processing > 3957 - 3960

This paper describes a speaker-independent/adaptive HMM-based speech synthesis system developed for the Blizzard Challenge 2007. The new system, named "HTS-2007", employs speaker adaptation (CSMAPLR+MAP), feature-space adaptive training, mixed-gender modeling, and full-covariance modeling using CSMAPLR transforms, in addition to several other techniques that have proved effective in our...

chapter

Time-varying linear prediction for speech analysis and synthesis

K. Schnell, A. Lacroix

2008 IEEE International Conference on Acoustics, Speech and Signal Processing > 3941 - 3944

In this contribution, a time-varying linear prediction is proposed for speech analysis and synthesis. In comparison to the time-invariant prediction, the predictor coefficients are time-varying within the frames. For that purpose, the coefficient trajectories can be described by basis functions. This approach leads to discontinuities between the frames if the frames are analyzed independently. Therefore,...

chapter

Tandem-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation

H. Kawahara, M. Morise, T. Takahashi, R. Nisimura, et al.

2008 IEEE International Conference on Acoustics, Speech and Signal Processing > 3933 - 3936

A simple new method for estimating temporally stable power spectra is introduced to provide a unified basis for computing an interference-free spectrum, the fundamental frequency (F0), as well as aperiodicity estimation. F0 adaptive spectral smoothing and cepstral liftering based on consistent sampling theory are employed for interference-free spectral estimation. A perturbation spectrum, calculated...

chapter

Tree-guided transformation-based homograph disambiguation in Mandarin TTS system

Fangzhou Liu, Qin Shi, Jianhua Tao

2008 IEEE International Conference on Acoustics, Speech and Signal Processing > 4657 - 4660

Homograph disambiguation is the core issue of the grapheme-to-phoneme conversion in Mandarin Text-to-Speech system. In this paper, a hybrid algorithm called tree-guided transformation-based learning (TTBL), which combines decision tree with transformation-based learning (TBL), is proposed to resolve homograph ambiguity. It can automatically generate templates, thereby avoiding manually summarizing...

chapter

A cross-language state mapping approach to bilingual (Mandarin-English) TTS

Hui Liang, Yao Qian, F.K. Soong, Gongshen Liu

2008 IEEE International Conference on Acoustics, Speech and Signal Processing > 4641 - 4644

We propose a cross-language state mapping approach to HMM-based bilingual TTS. Two language-dependent decision trees are built first with a bilingual speech database recorded by a single speaker. A state mapping for every leaf node in the decision tree of a target language is created by finding the nearest leaf node in the tree of a source language. Kullback-Leibler divergence between two distributions...
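
To make the "nearest leaf node" step concrete, here is a minimal sketch that picks, for one target-language leaf, the source-language leaf with the smallest Kullback-Leibler divergence. The abstract is truncated before it says how the divergence is used, so treating it as the distance, modelling each leaf as a single univariate Gaussian `(mean, variance)`, the direction KL(target || source), and the names `kl_gauss` and `nearest_leaf` are all assumptions.

```python
import math


def kl_gauss(m1, v1, m2, v2):
    """KL( N(m1, v1) || N(m2, v2) ) for univariate Gaussians with variances v1, v2 > 0."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def nearest_leaf(target, source_leaves):
    """Return the name of the source leaf closest to `target` in KL divergence.

    `target` is a (mean, variance) pair; `source_leaves` is a hypothetical
    dict mapping leaf names to (mean, variance) pairs.
    """
    return min(
        source_leaves,
        key=lambda name: kl_gauss(target[0], target[1], *source_leaves[name]),
    )
```

KL divergence is not symmetric, so swapping the arguments can change which leaf is selected; a symmetrised variant would be an equally plausible reading of the excerpt.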

chapter

Parameterized MMSE spectral magnitude estimation for the enhancement of noisy speech

C. Breithaupt, M. Krawczyk, R. Martin

2008 IEEE International Conference on Acoustics, Speech and Signal Processing > 4037 - 4040

The enhancement of short-term spectra of noisy speech can be achieved by statistical estimation of the clean speech spectral components. We present a minimum mean-square error estimator of the clean speech spectral magnitude that uses both a parametric compression function in the estimation error criterion and a parametric prior distribution for the statistical model of the clean speech magnitude...

chapter

On the state definition for a trainable excitation model in HMM-based speech synthesis

R. Maia, T. Toda, K. Tokuda, S. Sakai, et al.

2008 IEEE International Conference on Acoustics, Speech and Signal Processing > 3965 - 3968

One of the issues of speech synthesizers based on hidden Markov models concerns the vocoded quality of the synthesized speech. From the principle of analysis-by-synthesis speech coders a trainable excitation model has been proposed to improve naturalness, where the method consists in the design of a set of state-dependent filters in a way to minimize the distortion between residual and synthetic excitation...

chapter

Modelling and synthesising F0 contours with the discrete cosine transform

J. Teutenberg, C. Watson, P. Riddle

2008 IEEE International Conference on Acoustics, Speech and Signal Processing > 3973 - 3976

The discrete cosine transform is proposed as a basis for representing fundamental frequency (F0) contours of speech. The advantages over existing representations include deterministic algorithms for both analysis and synthesis and a simple distance measure in the parameter space. A two-tier model using the DCT is shown to be able to model F0 contours to around 10 Hz RMS error. A proof-of-concept system...
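
The sketch below illustrates the general technique named in this abstract: an F0 contour is analysed into DCT coefficients and resynthesised from the first few of them. It uses the standard orthonormal DCT-II and its inverse; the paper's two-tier model is not described in the excerpt and is not reproduced here, and the function names and the choice of how many coefficients to keep are assumptions.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a sequence (analysis step)."""
    n = len(x)
    coeffs = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        total = sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(x))
        coeffs.append(scale * total)
    return coeffs


def idct(coeffs, n):
    """Rebuild n samples from the first len(coeffs) DCT coefficients (synthesis step)."""
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out


def rms_error(a, b):
    """Root-mean-square difference between two equal-length contours, in Hz."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


def coeff_distance(c1, c2):
    """Euclidean distance in coefficient space (one possible 'simple distance')."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(c1, c2)))
```

For example, `idct(dct(f0)[:3], len(f0))` gives a smoothed contour built from three coefficients, and `rms_error` measures how far it is from the original. Because this transform is orthonormal, the Euclidean distance between two full coefficient vectors equals the Euclidean distance between the corresponding contours.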

Keywords:
SPEECH SYNTHESIS

Keywords

SPEECH PROCESSING (16)
SPEECH RECOGNITION (11)
HIDDEN MARKOV MODELS (8)
NATURAL LANGUAGE PROCESSING (7)
SPEECH ANALYSIS (5)
UNIT SELECTION (5)
CONCATENATIVE SPEECH SYNTHESIS (4)
SPEECH CODING (4)
DECISION TREES (3)
GAUSSIAN PROCESSES (3)
HIDDEN MARKOV MODEL (3)
HMM (3)
PROBABILITY (3)
SPEAKER RECOGNITION (3)
STATISTICAL ANALYSIS (3)
VOICE CONVERSION (3)
AGGLOMERATIVE CLUSTERING (2)
AUDIO DATABASES (2)
CEPSTRAL ANALYSIS (2)
DISCRETE COSINE TRANSFORMS (2)
DISCRIMINATIVE TRAINING (2)
ENERGY ENVELOPE (2)
FINITE STATE MACHINES (2)
GAUSSIAN MIXTURE MODEL (2)
LEARNING (ARTIFICIAL INTELLIGENCE) (2)
LEAST MEAN SQUARES METHODS (2)
MANDARIN TEXT-TO-SPEECH SYSTEM (2)
MAXIMUM LIKELIHOOD ESTIMATION (2)
MINIMUM GENERATION ERROR (2)
MODEL ADAPTATION (2)
NATURAL SPEECH (2)
PREDICTION THEORY (2)
REGRESSION ANALYSIS (2)
SIGNAL REPRESENTATION (2)
SPEAKER ADAPTATION (2)
SPECTRAL ANALYSIS (2)
SPEECH DATABASE (2)
SPEECH ENHANCEMENT (2)
STOCHASTIC PROCESSES (2)
TEXT-TO-SPEECH SYNTHESIS (2)
3D MOTION DATA (1)
ACCENT (1)
ACCENT ESTIMATION (1)
ACOUSTIC PARAMETERS (1)
ADAPTIVE ESTIMATION (1)
ADAPTIVE SIGNAL PROCESSING (1)
ADAPTIVE SPECTRAL SMOOTHING (1)
ADMISSIBLE STOPPING (1)
AGGLOMERATIVE AND SEQUENTIAL INFORMATION BOTTLENECK (1)
ANALYSIS-BY-SYNTHESIS FEATURES (1)
ANALYSIS-BY-SYNTHESIS SPEECH CODERS (1)
APERIODIC ENERGIES (1)
APERIODICITY ESTIMATION (1)
APPROXIMATION THEORY (1)
ARABIC (1)
ARABIC LARGE VOCABULARY (1)
ARABIC SPEECH-TO-TEXT SYSTEMS (1)
ARTICULATORY RECOGNITION (1)
ARTICULATORY SYNTHESIS (1)
AUGMENTED SYSTEM (1)
AUTOMATIC CONTEXT SENSITIVE PHONE SET MAPPING METHOD (1)
AUTOMATIC EVALUATION (1)
AUTOMATIC FREQUENCY ESTIMATION (1)
AUTOMATIC JOINT PROSODY LABELING (1)
AUTOMATIC PARAMETER SELECTION (1)
AUTOMATIC PHONETICS RECONSTRUCTION (1)
AUTOMATIC SPEECH RECOGNITION (1)
AUTOMATIC TRANSCRIPTION (1)
AVERAGE VOICE (1)
AVERAGE VOICE MODEL (1)
AVERAGE VOICE MODELS (1)
BASELINE SYSTEM (1)
BI-DIRECTIONAL CROSS-LINGUAL MAPPINGS (1)
BILINGUAL (1)
BILINGUAL CODE SPEECH SYNTHESIS (1)
BILINGUAL MANDARIN-ENGLISH TTS (1)
BILINGUAL SPEECH DATABASE (1)
BLIZZARD CHALLENGE (1)
BLIZZARD CHALLENGE 2007 (1)
BREATH GROUP (1)
CASCADED MODELINGS (1)
CEPSTRAL LIFTERING (1)
CEPSTRAL MODEL ORDER SELECTION CRITERION (1)
CHINESE CHARACTER TRANSCRIPTION (1)
COMPUTATIONAL LINGUISTICS (1)
CONCATENATIVE TEXT-TO-SPEECH SYNTHESIS (1)
CONFIDENCE MEASURE (1)
CONSISTENT SAMPLING (1)
CONSISTENT SAMPLING THEORY (1)
CONTEXT CLUSTERING (1)
CONTEXT SENSITIVE MAPPING (1)
CONTEXT WINDOW LENGTH (1)
CONTEXTUAL GRAPHEME (1)
CONTEXTUAL GRAPHEMES (1)
CONTINUOUS MANDARIN SPEECH (1)
CONTOURS INTERPOLATION (1)
CORRECT PRONUNCIATION PREDICTION (1)
COST DEGRADATION CRITERION (1)
COVARIANCE ANALYSIS (1)
