A common framework for identifying bird species from audio recordings involves detecting bird song segments, which are subsequently input to a classifier. In-field recordings are contaminated with various kinds of environmental noise. For such recordings, supervised segmentation has been observed to outperform unsupervised energy-based approaches. Prior supervised segmentation work considers only pixel-level...
This paper presents an HMM-based synthesis approach for speech-laughs. The starting point of this project was the observation that smile and laughter bursts co-occur in varying proportions within amused speech utterances. A corpus with three complementary speaking styles was used to train the underlying HMM models: neutral speech, speech-smile, and finally laughter in different articulatory configurations...
This paper proposes novel models of F0 contours and phone durations using Gaussian process regression and classification (GPR and GPC) for statistical parametric speech synthesis. Although the use of frame-based GPR has shown the effectiveness of spectral feature modeling in previous studies, the application of GPR to prosodic features, i.e., F0 and phone duration, was not investigated sufficiently...
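A minimal sketch of the general idea of regressing an F0 contour with a Gaussian process, using scikit-learn as a stand-in; the paper's actual frame-level inputs, kernel, and training procedure are not given in this abstract, so the features and hyperparameters below are illustrative assumptions.

```python
# Sketch: predicting a log-F0 trajectory from per-frame inputs with GPR.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical training data: per-frame inputs (e.g. normalized position in
# the phone plus two context features) and observed log-F0 values.
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(200, 3))
y_train = 5.0 + 0.3 * np.sin(2 * np.pi * X_train[:, 0]) \
          + 0.01 * rng.standard_normal(200)

kernel = RBF(length_scale=0.2) + WhiteKernel(noise_level=1e-3)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gpr.fit(X_train, y_train)

# Predict a smooth log-F0 trajectory (with uncertainty) for new frame positions.
X_test = np.column_stack([np.linspace(0, 1, 50), np.full(50, 0.5), np.full(50, 0.5)])
f0_mean, f0_std = gpr.predict(X_test, return_std=True)
```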
This paper investigates the incorporation of hidden Markov model (HMM) based emphatic speech synthesis for audio exaggeration into an audio-visual speech synthesis framework for corrective feedback in computer-aided pronunciation training (CAPT). To improve the voice quality of the synthetic emphatic speech, this paper proposes a new method for HMM training. In this method, the contextual questions...
We propose a sparse hidden Markov model (HMM)-based single-channel speech enhancement method that models the speech and noise gains accurately in both stationary and nonstationary environments. The objective function is augmented with an ℓp regularization term, resulting in a sparse autoregressive HMM (SARHMM). The method encourages sparsity in the speech and noise modeling, which eliminates the...
This paper proposes an improved time-frequency trajectory excitation (TFTE) modeling method for a statistical parametric speech synthesis system. The proposed approach overcomes the dimensional variation problem of the training process caused by the inherent nature of the pitch-dependent analysis paradigm. By reducing the redundancies of the parameters using predicted average block coefficients (PABC),...
We propose a representation of f0 using the Continuous Wavelet Transform (CWT) and the Discrete Cosine Transform (DCT). The CWT decomposes the signal into various scales of selected frequencies, while the DCT compactly represents complex contours as a weighted sum of cosine functions. The proposed approach has the advantage of combining signal decomposition and higher-level representations, thus modeling...
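A minimal sketch of the two representations named above applied to a toy f0 contour: a continuous wavelet transform over several scales, and a truncated DCT of the same contour. The library choices (PyWavelets, SciPy), the wavelet, and the scale/order settings are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
import pywt
from scipy.fft import dct

f0 = 100.0 + 20.0 * np.sin(np.linspace(0, 4 * np.pi, 400))   # toy f0 contour (Hz)
logf0 = np.log(f0)

# CWT: decompose the contour into a handful of scales (roughly phone- to
# phrase-level periodicities, depending on the frame rate).
scales = np.array([4, 8, 16, 32, 64])
cwt_coefs, _freqs = pywt.cwt(logf0, scales, "mexh")
print(cwt_coefs.shape)          # (n_scales, n_frames)

# DCT: a compact description of the contour as a weighted sum of cosines;
# keep only the first few coefficients.
dct_coefs = dct(logf0, norm="ortho")[:10]
print(dct_coefs.shape)          # (10,)
```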
Sinusoidal vocoders can generate high quality speech, but they have not been extensively applied to statistical parametric speech synthesis. This paper presents two ways of using dynamic sinusoidal models for statistical speech synthesis, enabling the sinusoid parameters to be modelled in HMM-based synthesis. In the first method, features extracted from a fixed- and low-dimensional, perception-based...
In this paper we investigate the use of noise-robust features characterizing the speech excitation signal as complementary features to the usually considered vocal tract based features for Automatic Speech Recognition (ASR). The proposed Excitation-based Features (EBF) are tested in a state-of-the-art Deep Neural Network (DNN) based hybrid acoustic model for speech recognition. The suggested excitation...
In this paper we propose softSAD: the direct integration of speech posteriors into a speaker recognition system as an alternative to using speech activity detection (SAD). Motivated by the need to use audio from short recordings more efficiently, softSAD removes the need to discard audio using speech/non-speech decisions based on a threshold as done with SAD. Instead, softSAD explicitly integrates...
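A minimal sketch of the soft-weighting idea described above: rather than thresholding frame-level speech posteriors and discarding frames (hard SAD), every frame contributes in proportion to its speech posterior. The statistics form below (UBM-style zero- and first-order stats) is an assumption for illustration, not the paper's exact recipe.

```python
import numpy as np

def soft_sad_stats(features, frame_post, speech_post):
    """features: (T, D) frame features
    frame_post: (T, C) per-frame component responsibilities (e.g. from a UBM)
    speech_post: (T,) per-frame speech posteriors in [0, 1]
    """
    # Scale each frame's responsibilities by how speech-like it is.
    weighted = frame_post * speech_post[:, None]   # (T, C)
    n = weighted.sum(axis=0)                       # zero-order stats (C,)
    f = weighted.T @ features                      # first-order stats (C, D)
    return n, f

# Hard SAD, by contrast, would keep only the frames with speech_post above a threshold.
```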
This paper introduces a new formulation of Joint Factor Analysis (JFA) for text-dependent speaker recognition based on left-to-right modeling with tied mixture HMMs. It accommodates many different ways of extracting multiple features to characterize speakers (features may or may not be HMM state-dependent, they may be modeled with subspace or factorial priors, and these priors may be imputed from text-dependent...
Acoustic models based on Gaussian mixture models (GMMs) typically use short span acoustic feature inputs. This does not capture long-term temporal information from speech owing to the conditional independence assumption of hidden Markov models. In this paper, we present an implicit approach that approximates the joint distribution of long span features by a product of factorized models, in contrast...
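A minimal sketch of the factorization idea: the likelihood of a long-span feature vector is approximated by the product of models over shorter blocks, i.e. a sum of per-block log-likelihoods. The GMM blocks here are an illustrative stand-in; the paper's model structure is not detailed in this abstract.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy long-span features: three blocks of 13 dimensions each.
long_span = np.random.default_rng(0).standard_normal((500, 39))
blocks = [GaussianMixture(n_components=4, random_state=0) for _ in range(3)]

for k, gmm in enumerate(blocks):
    gmm.fit(long_span[:, 13 * k: 13 * (k + 1)])

# Joint log-likelihood of a long-span frame ~ sum of per-block log-likelihoods.
x = long_span[:1]
loglik = sum(gmm.score_samples(x[:, 13 * k: 13 * (k + 1)])[0]
             for k, gmm in enumerate(blocks))
print(loglik)
```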
Recently, we have proposed a general adaptation scheme for deep neural network based on discriminant condition codes and applied it to supervised speaker adaptation in speech recognition based on either frame-level cross-entropy or sequence-level maximum mutual information training criterion [1, 2, 3, 4]. In this case, each condition code is associated with one speaker in data, which is thus called...
Deep neural network (DNN) based speech recognizers have recently replaced Gaussian mixture (GMM) based systems as the state-of-the-art. HMM/DNN systems have kept many refinements of the HMM/GMM framework, even though some of these may be suboptimal for them. One such example is the creation of context-dependent tied states, for which an efficient decision tree state tying method exists. The tied states...
The deep neural network component of current hybrid speech recognizers is trained on a context of consecutive feature vectors. Here, we investigate whether the time span of this input can be extended by splitting it up and modeling it in smaller chunks. One method for this is to train a hierarchy of two networks, while the less well-known split temporal context (STC) method models the left and right...
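A minimal sketch of the split-temporal-context idea summarized above: the long input window is cut into a left and a right block, each modeled by its own small network, and a second-stage network merges their outputs. Written with PyTorch; the dimensions, layer sizes, and activations are illustrative assumptions.

```python
import torch
import torch.nn as nn

feat_dim, half_ctx, hidden, n_states = 40, 8, 512, 2000

left_net = nn.Sequential(nn.Linear(half_ctx * feat_dim, hidden), nn.Sigmoid())
right_net = nn.Sequential(nn.Linear(half_ctx * feat_dim, hidden), nn.Sigmoid())
merge_net = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.Sigmoid(),
                          nn.Linear(hidden, n_states))

window = torch.randn(32, 2 * half_ctx, feat_dim)   # batch of 16-frame input windows
left, right = window[:, :half_ctx], window[:, half_ctx:]
logits = merge_net(torch.cat([left_net(left.flatten(1)),
                              right_net(right.flatten(1))], dim=1))
print(logits.shape)                                 # (32, n_states)
```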
A Gaussian or log-linear mixture model trained by maximum likelihood may be trained further using discriminative training. It is desirable that the mixture splitting is also done during the discriminative training, to achieve a better mixture density distribution. In previous work such a discriminative splitting approach was presented. Similarly, the resolution of a deep neural network may also be increased...
To solve the acoustic-to-articulatory inversion problem, this paper proposes a deep bidirectional long short-term memory recurrent neural network and a deep recurrent mixture density network. The articulatory parameters of the current frame may have correlations with the acoustic features many frames before or after. The traditional pre-designed fixed-length context window may be either insufficient...
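A minimal sketch of a deep bidirectional LSTM mapping acoustic frame sequences to articulatory trajectories, assuming PyTorch; the layer sizes and feature dimensions are placeholders, and the mixture-density output variant mentioned in the abstract is omitted here.

```python
import torch
import torch.nn as nn

class BLSTMInversion(nn.Module):
    def __init__(self, acoustic_dim=40, articulatory_dim=12, hidden=256, layers=2):
        super().__init__()
        self.blstm = nn.LSTM(acoustic_dim, hidden, num_layers=layers,
                             batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, articulatory_dim)

    def forward(self, x):                 # x: (batch, frames, acoustic_dim)
        h, _ = self.blstm(x)              # (batch, frames, 2*hidden)
        return self.out(h)                # (batch, frames, articulatory_dim)

model = BLSTMInversion()
acoustics = torch.randn(4, 300, 40)       # 4 utterances, 300 frames each
articulation = model(acoustics)           # (4, 300, 12)
```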
In this paper, the Kullback-Leibler hidden Markov model (KL-HMM) is applied to unsupervised diarization of speech. A general approach to speaker diarization is to split the audio into uniform segments, followed by one or more iterations of clustering of the segments and resegmentation of the audio. In the Information Bottleneck (IB) approach to diarization, short uniform segments are clustered using...
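A minimal sketch of the generic diarization loop described above: cut the audio into uniform segments, cluster them, and then resegment. Plain agglomerative clustering on mean segment features is used here as a stand-in; the KL-HMM and Information Bottleneck specifics of the paper are not shown.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

frames = np.random.default_rng(0).standard_normal((3000, 20))   # toy frame features
seg_len = 100                                                    # frames per uniform segment
segments = frames[: len(frames) // seg_len * seg_len].reshape(-1, seg_len, 20)
seg_means = segments.mean(axis=1)                                # one vector per segment

labels = AgglomerativeClustering(n_clusters=2).fit_predict(seg_means)
# Each uniform segment now carries a speaker label; a resegmentation pass
# would refine the boundaries at the frame level.
```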
Based on the recently proposed speech pre-processing front-end with deep neural networks (DNNs), we first investigate different feature mappings learned directly from noisy speech via a DNN for robust speech recognition. Next, we propose to jointly train a single DNN for both feature mapping and acoustic modeling. In the end, we show that the word error rate (WER) of the jointly trained system could be significantly...
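A minimal sketch of joint training as described above: a feature-mapping network (noisy features to enhanced features) is stacked under an acoustic-model network (features to senone posteriors), and both are updated from the same cross-entropy loss. Written with PyTorch; the layer sizes and dimensions are placeholders.

```python
import torch
import torch.nn as nn

mapper = nn.Sequential(nn.Linear(440, 1024), nn.ReLU(), nn.Linear(1024, 440))
acoustic = nn.Sequential(nn.Linear(440, 1024), nn.ReLU(), nn.Linear(1024, 3000))
optimizer = torch.optim.Adam(list(mapper.parameters()) + list(acoustic.parameters()))
criterion = nn.CrossEntropyLoss()

noisy = torch.randn(64, 440)                    # spliced noisy feature vectors
senone_targets = torch.randint(0, 3000, (64,))

optimizer.zero_grad()
logits = acoustic(mapper(noisy))                # gradients flow through both networks
loss = criterion(logits, senone_targets)
loss.backward()
optimizer.step()
```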
This paper introduces a theory for max-product systems by analyzing them as discrete-time nonlinear dynamical systems that obey a superposition of a weighted maximum type and evolve on nonlinear spaces which we call complete weighted lattices. Special cases of such systems have found applications in speech recognition as weighted finite-state transducers and in belief propagation on graphical models...
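A small illustrative example of the kind of max-plus (tropical) state update such systems obey, where the role of matrix-vector multiplication is played by a maximum of sums, as in Viterbi-style recursions over a weighted lattice. This is a toy numpy sketch, not the paper's formalism.

```python
import numpy as np

def maxplus_matvec(A, x):
    # (A (*) x)_i = max_j (A[i, j] + x[j])  -- max-plus matrix-vector product
    return (A + x[None, :]).max(axis=1)

A = np.array([[0.0, -1.0], [-2.0, 0.5]])
x = np.array([1.0, 3.0])
print(maxplus_matvec(A, x))     # [2.  3.5]
```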