Long short-term memory recurrent neural networks (LSTM-RNNs) have been applied to various speech applications, including acoustic modeling for statistical parametric speech synthesis. One of the concerns in applying them to text-to-speech applications is their effect on latency. To address this concern, this paper proposes a low-latency, streaming speech synthesis architecture using unidirectional LSTM-RNNs...
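The truncated abstract does not show the paper's architecture, but the key property of a unidirectional LSTM for streaming is that each frame's output depends only on past frames. A minimal numpy sketch of that frame-by-frame recurrence (sizes and weights are hypothetical, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_hid = 8, 16   # hypothetical linguistic-feature / hidden sizes

# Gate weights for [input, forget, output, candidate] gates, stacked.
W = rng.standard_normal((4 * n_hid, n_in + n_hid)) * 0.1
b = np.zeros(4 * n_hid)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c):
    """One unidirectional LSTM step: it reads only the current input and
    the previous state, so output for frame t never waits on frame t+1."""
    z = W @ np.concatenate([x, h]) + b
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

# Stream frames one at a time, emitting an output per frame.
h, c = np.zeros(n_hid), np.zeros(n_hid)
outputs = []
for t in range(5):
    x_t = rng.standard_normal(n_in)   # stand-in features for frame t
    h, c = lstm_step(x_t, h, c)
    outputs.append(h.copy())
```

A bidirectional LSTM, by contrast, needs the whole utterance before producing any frame, which is why the unidirectional variant is the natural fit for low-latency synthesis.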
In DNN-based TTS synthesis, a DNN's hidden layers can be viewed as a deep transformation of linguistic features, and the output layer as a representation of the acoustic space that regresses the transformed linguistic features to acoustic parameters. The deep-layered architecture of DNNs can not only represent highly complex transformations compactly, but also take advantage of huge amounts of training data. In this...
This paper presents a Subspace Gaussian Mixture Model (SGMM) approach employed as a probabilistic generative model to estimate speaker vector representations, which are subsequently used in the speaker verification task. SGMMs have already been shown to significantly outperform traditional HMM/GMMs in Automatic Speech Recognition (ASR) applications. An extension to the basic SGMM framework makes it possible to robustly...
A complete emotional expression typically follows a complex temporal course in natural conversation. Related research on utterance-level and segment-level processing lacks an understanding of the underlying structure of emotional speech. In this study, a hierarchical affective structure of an emotional utterance, characterized by probabilistic context-free grammars (PCFGs), is proposed for emotion...
The paper presents a method for converting word-based automatic speech recognition (ASR) lattices into word-semantic (W-SE) lattices that contain the original words together with partial semantic information, so-called semantic entities. The semantic entity detection algorithm generates semantic entities based on expert-defined knowledge. The generated W-SE lattices have a smaller vocabulary and consequently...
Hidden Markov Models (HMMs) are powerful statistical techniques with many applications, and in this paper they are used for modeling asymmetric threats. The observations generated by such HMMs are generally interspersed with clutter observations that are unrelated to the HMM. In this paper a Bernoulli filter is proposed, which processes cluttered observations and is capable of detecting if there is an HMM...
This paper addresses reverberant speech recognition based on front-end processing using a DAE (Deep AutoEncoder) coupled with a DNN (Deep Neural Network) acoustic model. A DAE can effectively and flexibly learn a mapping from corrupted speech to the original clean speech based on the deep learning scheme. While this mapping is conventionally performed using only the acoustic information, we presume the mapping...
This paper investigates modeling nonlinear transformations based on deep neural networks (DNNs). Specifically, a DNN is used as a nonlinear mapping function for feature space transformation for HMM acoustic models. The nonlinear transformations are estimated under the sequence-based maximum likelihood criterion. The likelihood partition function is evaluated using the Monte Carlo method based on importance...
Although context-dependent DNN-HMM systems have achieved significant improvements over GMM-HMM systems, there is still a large performance degradation when the acoustic condition of the test data mismatches that of the training data. Hence, adaptation and adaptive training of DNNs are of great research interest. Previous works mainly focus on adapting the parameters of a single DNN by regularized or...
To develop speaker adaptation algorithms for deep neural networks (DNNs) that are suitable for large-scale online deployment, it is desirable that the adaptation model be represented in a compact form and learned in an unsupervised fashion. In this paper, we propose a novel low-footprint adaptation technique for DNNs that adapts the model through node activation functions. The approach introduces...
This paper presents a novel interactive method for recognizing handwritten words, using the inertial sensor data available on smart watches. The goal is to allow the user to write with a finger, and use the smart watch sensor signals to infer what the user has written. Past work has exploited the similarity of handwriting recognition to speech recognition in order to deploy HMM-based methods. In contrast...
To accomplish effective communication, interaction partners generally adapt their verbal and non-verbal behavior to that of their interlocutors. This behavior adaptation is often modulated by the underlying emotional states of partners. Modeling such mutual behavioral influence is critical for emotion characterization in an interaction. In this paper, we focus on explicitly modeling the mutual influence...
In this paper we introduce a novel non-blind speech enhancement procedure based on visual speech recognition (VSR). The latter is based on a generative process that analyzes sequences of talking faces and classifies them into visual speech units known as visemes. We use an effective graphical model capable of segmenting and labeling a given sequence of talking faces into a sequence of visemes. Our model captures...
Dropout and DropConnect can be viewed as regularization methods for deep neural network (DNN) training. In DNN acoustic modeling, the huge number of speech samples makes it expensive to sample the neuron mask (Dropout) or the weight mask (DropConnect) repetitively from a high-dimensional distribution. In this paper we investigate the effect of Gaussian stochastic neurons on DNN acoustic modeling....
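The truncated abstract does not give the paper's exact formulation; a common way to avoid repeated Bernoulli mask draws is to replace the mask with Gaussian multiplicative noise whose mean and variance match the scaled mask. A minimal numpy sketch of that idea (all sizes and the keep probability are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
p_keep = 0.8                          # assumed probability of keeping a neuron
h = rng.standard_normal((4, 256))     # stand-in hidden-layer activations

# Standard (inverted) dropout: sample a Bernoulli mask, rescale by 1/p_keep.
mask = rng.binomial(1, p_keep, size=h.shape) / p_keep
h_dropout = h * mask

# Gaussian multiplicative noise with the same mean (1) and variance
# ((1 - p) / p) as the scaled Bernoulli mask, so per-neuron noise
# statistics match dropout without drawing a discrete mask.
sigma = np.sqrt((1 - p_keep) / p_keep)
noise = rng.normal(1.0, sigma, size=h.shape)
h_gauss = h * noise
```

The scaled Bernoulli mask has mean 1 and variance (1 - p)/p, so the Gaussian noise is moment-matched to it; whether this is exactly the construction the paper studies cannot be verified from the snippet.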
Due to the large number of parameters in deep neural networks (DNNs), it is challenging to design a small-footprint DNN-based speech recognition system while maintaining high recognition performance. Even with a singular value decomposition (SVD) method and scalar quantization, the DNN model is still too large to be deployed on many mobile devices. Common practices like reducing the number...
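The SVD compression mentioned above replaces a weight matrix with a low-rank factorization, trading a small reconstruction error for a large drop in parameter count. A minimal numpy sketch, with a hypothetical 1024x1024 layer and rank 128 chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((1024, 1024))  # hypothetical fully-connected layer

# Keep the top-k singular values: W ~ U_k @ V_k.
k = 128
U, s, Vt = np.linalg.svd(W, full_matrices=False)
U_k = U[:, :k] * s[:k]        # 1024 x k (singular values folded in)
V_k = Vt[:k, :]               # k x 1024

# Parameter count drops from 1024*1024 to 2*1024*k.
orig_params = W.size                       # 1048576
compressed_params = U_k.size + V_k.size    # 262144
```

In practice the layer is then implemented as two smaller matrix multiplies (`x @ U_k` then `@ V_k`), which is exactly why further quantization is still needed before such models fit on the mobile devices the abstract refers to.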
We investigate the problem of incorporating higher-level symbolic score-like information into Automatic Music Transcription (AMT) systems to improve their performance. We use recurrent neural networks (RNNs) and their variants as music language models (MLMs) and present a generative architecture for combining these models with predictions from a frame level acoustic classifier. We also compare different...
Due to paper quality and long-term preservation, the ink on one side of a historical document often seeps through and appears on the other side. In this paper, a new blind ink bleed-through removal method is proposed to deal with scanned historical document images. A scanned historical document image generally consists of three components: foreground, bleed-through, and background....
We propose a novel method for analyzing acoustic scenes that can sophisticatedly estimate acoustic scenes from an acoustic event sequence with intermittent missing events. On the basis of the idea that acoustic events are temporally correlated, we model the transition of acoustic events using a hidden Markov model (HMM) and estimate missing acoustic events. Then, we incorporate the transition of acoustic...
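The abstract's core idea, that temporally correlated events let a Markov transition model fill intermittent gaps, can be illustrated with a toy inference: the most likely event at a gap, given its neighbours, maximizes A[prev, x] * A[x, next]. This is a minimal Markov-chain sketch, not the paper's full HMM, and the event labels and transition matrix are invented for illustration:

```python
import numpy as np

# Transition matrix over three hypothetical acoustic events
# (0: speech, 1: door, 2: footsteps); each row sums to 1.
A = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

def fill_missing(prev_state, next_state):
    """Most likely event at a single gap, given its neighbours:
    P(x_t | x_{t-1}, x_{t+1}) is proportional to
    A[x_{t-1}, x_t] * A[x_t, x_{t+1}]."""
    scores = A[prev_state, :] * A[:, next_state]
    return int(np.argmax(scores))

# An event sequence with one intermittent missing observation (None).
seq = [0, 0, None, 2]
gap = seq.index(None)
seq[gap] = fill_missing(seq[gap - 1], seq[gap + 1])
print(seq)  # [0, 0, 0, 2]
```

With these numbers the gap is filled with event 0, since 0.7 * 0.1 beats the alternatives; a full HMM would additionally marginalize over hidden states and handle runs of consecutive missing events.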
We formulate the problem of detecting the constituent instruments in a polyphonic music piece as a joint decoding problem. From monophonic data, parametric Gaussian Mixture Hidden Markov Models (GM-HMMs) are obtained for each instrument. We propose a method to use the above models in a factorial framework, termed Factorial GM-HMM (F-GM-HMM). The states are jointly inferred to explain the evolution...
Vocoders have recently received renewed attention as basic components in speech synthesis applications such as voice transformation, voice conversion, and statistical parametric speech synthesis. This paper presents a new vocoder synthesizer, referred to as Vocaine, that features a novel Amplitude Modulated-Frequency Modulated (AM-FM) speech model, a new way to synthesize non-stationary sinusoids using...