This paper presents a new approach and a study of the GMM-SVM system for text-dependent speaker recognition in the fixed pass-phrase scenario. The uniform-split content-based GMM-SVM system is proposed and applied to text-dependent speaker evaluation. We conducted a detailed study of the proposed method compared to the baseline GMM-SVM system on the RSR2015 database, which has been designed and collected...
In expressive TTS and voice transformation systems, implantation of expressive prosody derived from external out-of-domain sources often leads to extreme pitch modification that compromises the naturalness of the synthesized speech.
The acoustic-to-articulatory inversion problem is usually studied in a speaker-specific manner because both articulatory data and acoustic features contain speaker-specific components. This paper presents our work on speaker-adaptation training for this problem. We implement speaker adaptation in HMM-based acoustic-to-articulatory inversion mapping, and evaluate different combinatorial structures of the...
This paper presents an approach for improving the perceptual quality of speech separated from background noise at low signal-to-noise ratios. Our approach uses two stages of deep neural networks, where the first stage estimates the ideal ratio mask that separates speech from noise, and the second stage maps the ratio-masked speech to the clean speech activation matrices that are used for nonnegative...
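The first stage's training target, the ideal ratio mask, has a simple closed form: the per time-frequency-bin ratio of speech power to total power. A minimal sketch (function name and the toy spectrogram values are ours, not from the paper):

```python
import numpy as np

def ideal_ratio_mask(speech_power, noise_power):
    """Per time-frequency bin: speech power / (speech + noise) power."""
    return speech_power / (speech_power + noise_power)

# Toy power spectrograms (2 frequency bins x 3 frames), illustrative values only.
speech = np.array([[4.0, 1.0, 9.0],
                   [0.0, 3.0, 1.0]])
noise = np.array([[1.0, 1.0, 1.0],
                  [2.0, 1.0, 3.0]])

mask = ideal_ratio_mask(speech, noise)
# Applying the ideal mask to the noisy mixture recovers the speech power,
# since mask * (speech + noise) = speech by construction.
masked = mask * (speech + noise)
```

In practice the mask is estimated by the first-stage network from the noisy mixture alone, so the recovery is only approximate; the ideal mask above is what the network is trained to predict.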
Long short-term memory (LSTM) is a specific recurrent neural network (RNN) architecture that is designed to model temporal sequences and their long-range dependencies more accurately than conventional RNNs. In this paper, we propose to use deep bidirectional LSTM (BLSTM) for audio/visual modeling in our photo-real talking head system. An audio/visual database of a subject's talking is first recorded...
In this paper, a method to use SGMM speaker vectors for speaker diarization is introduced. The architecture of the Information Bottleneck (IB) based speaker diarization is utilized for this purpose. The audio for speaker diarization is split into short uniform segments. Speaker vectors are obtained from a Subspace Gaussian Mixture Model (SGMM) system trained on meeting data. The speaker vectors are...
This paper investigates how to use neural networks in statistical parametric speech synthesis. Recently, deep neural networks (DNNs) have been used for statistical parametric speech synthesis. However, the specific way in which DNNs should be used in statistical parametric speech synthesis has not been studied thoroughly. A generation process of statistical parametric speech synthesis based on generative...
Inspired by the success of deep neural network-hidden Markov model (DNN-HMM) in acoustic modeling for automatic speech recognition, a number of researchers from various fields have independently proposed the idea of combining DNN and conditional random fields (CRFs). Despite their subtle differences, this class of models is collectively referred to as “NeuroCRF” in this paper. We focus our attention...
A common approach to recognizing emotion from speech is to estimate multiple acoustic features at the sentence or turn level. These features are derived independently of the underlying lexical content. Studies have demonstrated that lexically dependent models improve emotion recognition accuracy. However, current practical approaches can only model small lexical units such as phonemes, syllables, or a few key words,...
Recent progress in acoustic modeling with deep neural networks has significantly improved the performance of automatic speech recognition systems. However, it remains an open problem how to rapidly adapt these networks with limited unsupervised data. Most existing methods for adapting a neural network involve modifying a large number of parameters, so rapid adaptation is not possible with these schemes...
In this paper, we use unconstrained frequency estimates (UFEs) from a noisy harmonic signal and propose two methods to estimate and track the pitch over time. We assume that the UFEs are multivariate-normally-distributed random variables, and derive a maximum likelihood (ML) pitch estimator by maximizing the likelihood of the UFEs over short time intervals. As the main contribution of this paper,...
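The ML idea can be illustrated with a deliberately simplified version of the model. The paper assumes a full multivariate normal over the UFEs; if we instead assume i.i.d. Gaussian errors of equal variance, f_k ~ N(k * f0, sigma^2), maximizing the likelihood over f0 reduces to weighted least squares. A sketch under that simplifying assumption (function name and values are ours):

```python
import numpy as np

def ml_pitch(freq_estimates, harmonic_orders):
    """Simplified ML pitch estimate under f_k ~ N(k * f0, sigma^2), i.i.d.:
    maximizing the likelihood over f0 gives
        f0_hat = sum(k * f_k) / sum(k^2)."""
    k = np.asarray(harmonic_orders, dtype=float)
    f = np.asarray(freq_estimates, dtype=float)
    return np.sum(k * f) / np.sum(k * k)

# Noisy frequency estimates of the first three harmonics of a ~100 Hz tone.
f0_hat = ml_pitch([101.0, 199.0, 301.0], [1, 2, 3])
```

With a full covariance matrix over the UFEs, as in the paper, the closed form generalizes to a covariance-weighted least-squares solution.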
Convolutional Neural Networks (CNNs) have demonstrated powerful acoustic modelling capabilities due to their ability to account for structural locality in the feature space; and in recent works CNNs have been shown to often outperform fully connected Deep Neural Networks (DNNs) on TIMIT and LVCSR. In this paper, we perform a detailed empirical study of CNNs under the low resource condition, wherein...
We propose the prediction-adaptation-correction RNN (PAC-RNN), in which a correction DNN estimates the state posterior probability based on both the current frame and the prediction made on the past frames by a prediction DNN. The result from the correction DNN is fed back to the prediction DNN to make better predictions for the future frames. In the PAC-RNN, we can consider that, given the new, current...
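The prediction/correction loop described above can be sketched with tiny untrained networks; the dimensions, weight matrices, and dummy frames below are illustrative stand-ins, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny illustrative dimensions (ours, not from the paper).
feat_dim, state_dim = 5, 3

W_corr = rng.standard_normal((state_dim, feat_dim + state_dim))
W_pred = rng.standard_normal((state_dim, state_dim))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def pac_step(frame, prediction):
    """One PAC-RNN step: the correction net combines the current frame with
    the prediction made from past frames; its corrected posterior is then
    fed back to the prediction net to predict the next frame's state."""
    corrected = softmax(W_corr @ np.concatenate([frame, prediction]))
    next_prediction = softmax(W_pred @ corrected)
    return corrected, next_prediction

prediction = np.full(state_dim, 1.0 / state_dim)  # uninformative start
for frame in rng.standard_normal((6, feat_dim)):  # six dummy frames
    posterior, prediction = pac_step(frame, prediction)
```

Single-layer softmax nets replace the deep networks here purely to keep the recurrence visible; in the actual system both components are DNNs trained jointly.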
Standard automatic speech recognition (ASR) systems use a phoneme-based pronunciation lexicon prepared by linguistic experts. When the hand-crafted pronunciations fail to cover the vocabulary of a new domain, a grapheme-to-phoneme (G2P) converter is used to extract pronunciations for new words, and then a phoneme-based ASR system is trained. G2P converters are typically trained only on the existing lexicons...
This paper presents a novel iterative Bayesian algorithm, the Block Iterative Bayesian Algorithm (Block-IBA), for reconstructing block-sparse signals with unknown block structures. Unlike other existing algorithms for block-sparse signal recovery, which assume the cluster structure of the non-zero elements of the unknown signal to be independent and identically distributed (i.i.d.), we use a more realistic...
We analyze the complexity of evaluating information rewards for measurement selection in sparse graphical models under the assumption that measurements are drawn from a limited number of nodes subject to a finite budget. Previous analyses [1, 2, 3] exploit the submodular property of conditional mutual information to demonstrate that greedy measurement selection comes with near-optimal guarantees. As...
Acoustic novelty detection aims at identifying abnormal/novel acoustic signals which differ from the reference/normal data that the system was trained with. In this paper, we present a novel unsupervised approach based on a denoising autoencoder. In our approach, auditory spectral features are processed by a denoising autoencoder with bidirectional Long Short-Term Memory recurrent neural networks. We...
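The underlying principle, scoring novelty by reconstruction error of an autoencoder trained only on normal data, can be shown with a minimal stand-in: a linear autoencoder with a 1-D bottleneck, whose optimal solution is the PCA subspace, so SVD substitutes for training. The data and dimensions below are illustrative, not the paper's BLSTM setup:

```python
import numpy as np

rng = np.random.default_rng(1)

# "Normal" training data living on a 1-D subspace of R^3.
direction = np.array([1.0, 2.0, 0.5])
normal = rng.standard_normal((200, 1)) * direction

# A linear autoencoder with a 1-D bottleneck is optimally the top principal
# subspace, so SVD stands in for gradient-based training here.
_, _, vt = np.linalg.svd(normal, full_matrices=False)
encoder = vt[:1]    # 1 x 3 projection onto the learned subspace
decoder = encoder.T  # 3 x 1 reconstruction from the bottleneck

def reconstruction_error(x):
    """Novelty score: distance between input and its reconstruction."""
    return np.linalg.norm(x - decoder @ (encoder @ x))

normal_err = reconstruction_error(0.7 * direction)             # in-distribution
novel_err = reconstruction_error(np.array([0.5, -2.0, 3.0]))   # off-subspace
```

An input resembling the training data reconstructs almost perfectly, while the off-subspace input produces a large error; thresholding this score flags novel signals, which is the same decision rule the BLSTM denoising autoencoder applies to auditory spectral features.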
Hidden Markov Models (HMMs) are one of the most important techniques for modeling and classifying sequential data. Maximum Likelihood (ML) and (parametric and non-parametric) Bayesian estimation of the HMM parameters suffer from local maxima, and on massive datasets they can be especially time-consuming. In this paper, we extend the spectral learning of HMMs, a moment-matching learning technique free from...
Section linking aims at relating structural units in the notation of a piece of music to their occurrences in a performance of the piece. In this paper, we address this task by presenting a score-informed hierarchical Hidden Markov Model (HHMM) for modeling musical audio signals on the temporal level of sections present in a composition, where the main idea is to explicitly model the long range and...
In this paper, we propose a new signal-noise-dependent (SND) deep neural network (DNN) framework to further improve the separation and recognition performance of the recently developed technique for general DNN-based speech separation. We adopt a divide-and-conquer strategy to design the proposed SND-DNNs with higher resolution, which a single general DNN could not well accommodate for all the speaker...