Recent works have shown that hierarchical models lead to significant improvement in human activity recognition, which can not only enhance descriptive capability, but also improve discriminative power. However, most existing methods exploit just one of the two advantages. In this paper, a new hierarchical spatio-temporal model (HSTM) is proposed to integrate feature learning into two-layer hierarchical...
In this paper we describe the 2016 BBN conversational telephone speech keyword spotting system; the culmination of four years of research and development under the IARPA Babel program. The system was constructed in response to the NIST Open Keyword Search (OpenKWS) evaluation of 2016. We present our technological breakthroughs in building top-performing keyword spotting processing systems for new...
This paper investigates the framework of encoder-decoder with attention for sequence labelling based spoken language understanding. We introduce Bidirectional Long Short Term Memory - Long Short Term Memory networks (BLSTM-LSTM) as the encoder-decoder model to fully utilize the power of deep learning. In the sequence labelling task, the input and output sequences are aligned word by word, while the...
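As a rough illustration of the encoder-decoder-with-attention idea for word-aligned sequence labelling described above, the sketch below computes an attention-weighted context over encoder states and predicts one slot label per input word. It is a minimal NumPy mock-up, not the paper's BLSTM-LSTM model; all names and dimensions (hidden_dim, n_labels, the bilinear attention form) are illustrative assumptions.

```python
# Minimal attention-for-aligned-labelling sketch (NumPy only, illustrative shapes).
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

T, hidden_dim, n_labels = 6, 8, 5                # sentence length, state size, slot labels
enc_states = rng.normal(size=(T, hidden_dim))    # stand-in for BLSTM encoder outputs
W_att = rng.normal(size=(hidden_dim, hidden_dim))
W_out = rng.normal(size=(2 * hidden_dim, n_labels))

labels = []
for t in range(T):                               # aligned decoding: one label per input word
    query = enc_states[t]                        # decoder query tied to the current word
    scores = enc_states @ (W_att @ query)        # bilinear attention scores (assumed form)
    context = softmax(scores) @ enc_states       # attention-weighted context vector
    logits = np.concatenate([query, context]) @ W_out
    labels.append(int(np.argmax(logits)))

print(labels)                                    # one predicted slot label per word
```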
Automatic drum transcription methods aim at extracting a symbolic representation of notes played by a drum kit in audio recordings. For automatic music analysis, this task is of particular interest as such a transcript can be used to extract high level information about the piece, e.g., tempo, downbeat positions, meter, and genre cues. In this work, an approach to transcribe drums from polyphonic...
Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying the...
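The general recipe behind such systems can be pictured as log-mel spectrogram patches fed to an image-style CNN that emits video-level label scores. The PyTorch sketch below is a deliberately small stand-in for that pipeline; the architectures compared in the paper (AlexNet, VGG, Inception, ResNet) are far larger, and the patch size, layer sizes, and label count here are assumptions.

```python
# Tiny CNN over log-mel patches as a stand-in for the audio-classification pipeline.
import torch
import torch.nn as nn

class SmallAudioCNN(nn.Module):
    def __init__(self, n_labels=3000):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Linear(64 * 24 * 16, n_labels)   # sized for 96x64 log-mel patches

    def forward(self, x):                 # x: (batch, 1, 96 frames, 64 mel bins)
        h = self.conv(x).flatten(1)
        return self.head(h)               # multi-label logits, one per video-level label

scores = SmallAudioCNN()(torch.randn(4, 1, 96, 64))
print(scores.shape)                       # torch.Size([4, 3000])
```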
Audio-visual speech recognition is a promising approach to tackling the problem of reduced recognition rates under adverse acoustic conditions. However, finding an optimal mechanism for combining multi-modal information remains a challenging task. Various methods are applicable for integrating acoustic and visual information in Gaussian-mixture-model-based speech recognition, e.g., via dynamic stream...
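One common integration scheme in this family is dynamic stream weighting, where per-frame audio and visual log-likelihoods are combined with a weight driven by an estimate of acoustic reliability. The sketch below illustrates that idea only; it is not necessarily the mechanism evaluated in the paper, and the SNR-to-weight mapping and all shapes are assumptions.

```python
# Illustrative dynamic stream weighting of audio and visual log-likelihoods.
import numpy as np

def combine_streams(log_lik_audio, log_lik_video, snr_db):
    """log_lik_*: (T, n_states) per-frame log-likelihoods; snr_db: (T,) reliability proxy."""
    # Map SNR to an audio weight in [0, 1]: clean frames trust audio, noisy frames trust video.
    lam = 1.0 / (1.0 + np.exp(-(snr_db - 5.0) / 5.0))        # logistic mapping (assumed)
    lam = lam[:, None]
    return lam * log_lik_audio + (1.0 - lam) * log_lik_video  # weighted log-linear fusion

T, S = 100, 40
rng = np.random.default_rng(1)
fused = combine_streams(rng.normal(size=(T, S)), rng.normal(size=(T, S)),
                        rng.uniform(-5, 20, size=T))
print(fused.shape)   # (100, 40) fused scores that would be passed on to Viterbi decoding
```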
In this paper, a blind bandwidth extension algorithm for music signals is proposed. The method first applies the K-means algorithm to cluster audio data in the feature space, and then constructs multiple envelope predictors for each cluster using Support Vector Regression (SVR). A set of well-established audio features for Music Information Retrieval (MIR) has been used to characterize...
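A hedged sketch of the clustering-plus-regression idea follows: K-means partitions frames in an MIR-style feature space, and an SVR envelope predictor is then fitted per cluster (a single predictor per cluster here, for simplicity). The features and the envelope target are mocked with random data; nothing below reproduces the paper's actual feature set or envelope parameterization.

```python
# K-means clustering followed by per-cluster SVR envelope regression (scikit-learn).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))        # stand-in for per-frame MIR features
y = rng.normal(size=500)              # stand-in for a high-band envelope parameter

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
predictors = {}
for c in range(kmeans.n_clusters):    # one SVR per cluster
    idx = kmeans.labels_ == c
    predictors[c] = SVR(kernel="rbf").fit(X[idx], y[idx])

# At extension time: assign a frame to its cluster, then use that cluster's SVR.
x_new = rng.normal(size=(1, 20))
c_new = int(kmeans.predict(x_new)[0])
print(predictors[c_new].predict(x_new))
```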
Conventional deep neural networks (DNN) for speech acoustic modeling rely on Gaussian mixture models (GMM) and hidden Markov models (HMM) to obtain binary class labels as the targets for DNN training. Subword classes in speech recognition systems correspond to context-dependent tied states or senones. The present work addresses some limitations of GMM-HMM senone alignments for DNN training. We hypothesize...
It has been shown that sequence-discriminative training can improve the performance for large vocabulary continuous speech recognition. Our main contribution is a novel method for reducing the computation time of any sort of sequence training while only slightly decreasing the overall performance. The method makes it possible to parallelize the forward propagation through the network, the loss and loss gradient...
A multi-stream framework with deep neural network (DNN) classifiers is applied to improve automatic speech recognition (ASR) in environments with different reverberation characteristics. We propose a room parameter estimation model to establish a reliable combination strategy which operates on either DNN posterior probabilities or word lattices. The model is implemented by training a multilayer perceptron...
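The combination step on posterior probabilities can be pictured as follows: an estimated room parameter (e.g. a reverberation-class posterior) weights the senone posteriors produced by reverberation-specific DNN streams before decoding. This is an assumed form of the fusion, sketched for illustration; all shapes, the weighted-sum rule, and the weights themselves are placeholders.

```python
# Room-parameter-weighted fusion of multi-stream senone posteriors (assumed form).
import numpy as np

def combine_posteriors(stream_posteriors, room_weights, eps=1e-10):
    """stream_posteriors: (n_streams, T, n_senones); room_weights: (n_streams,) summing to 1."""
    # Weighted sum in the probability domain; a log-linear product rule is equally common.
    fused = np.tensordot(room_weights, stream_posteriors, axes=1)
    return np.log(fused + eps)            # log posteriors handed to the HMM decoder

rng = np.random.default_rng(0)
posts = rng.dirichlet(np.ones(100), size=(3, 50))   # 3 streams, 50 frames, 100 senones
weights = np.array([0.6, 0.3, 0.1])                 # from the room-parameter MLP (assumed)
print(combine_posteriors(posts, weights).shape)      # (50, 100)
```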
We describe and analyze a simple and effective algorithm for sequence segmentation applied to speech processing tasks. We propose a neural architecture that is composed of two modules trained jointly: a recurrent neural network (RNN) module and a structured prediction model. The RNN outputs are considered as feature functions to the structured model. The overall model is trained with a structured...
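To make the division of labour concrete, the sketch below uses per-frame features (standing in for RNN outputs) inside a structured dynamic program that selects segment boundaries. The scoring rule here, within-segment deviation from the segment mean plus a fixed per-segment penalty, is an assumption for illustration, not the paper's learned structured loss.

```python
# Dynamic-programming segmentation over frame-level features (illustrative scoring).
import numpy as np

def segment_cost(feats):                       # within-segment deviation from the segment mean
    return float(np.sum((feats - feats.mean(axis=0)) ** 2))

def best_segmentation(feats, seg_penalty=2.0, max_seg_len=20):
    """feats: (T, D) frame features (e.g. RNN outputs); returns segment end indices."""
    T = len(feats)
    best = np.full(T + 1, np.inf)
    best[0] = 0.0
    back = np.zeros(T + 1, dtype=int)
    for t in range(1, T + 1):
        for s in range(max(0, t - max_seg_len), t):
            c = best[s] + segment_cost(feats[s:t]) + seg_penalty
            if c < best[t]:
                best[t], back[t] = c, s
    bounds, t = [], T                          # recover boundaries via backpointers
    while t > 0:
        bounds.append(t)
        t = back[t]
    return sorted(bounds)

rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0.0, 1.0, size=(30, 4)),    # two blocks with a change at frame 30
                   rng.normal(3.0, 1.0, size=(30, 4))])
print(best_segmentation(feats, seg_penalty=20.0, max_seg_len=40))   # boundary near 30 expected
```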
The problem of inferring the hidden state of individual nodes in social/sensor networks in which node activities affect their neighbors is growing in importance. We present an undirected generative model, a type of probabilistic model that has so far not been used for modeling latent variables influenced by neighbors in a network. We also propose an efficient inference method based on variational...
In this paper, we present an expressive visual text-to-speech (VTTS) system based on a deep neural network (DNN). Given an input text sentence and a set of expression tags, the VTTS is able to produce not only the audio speech, but also the accompanying facial movements. The expressions can either be one of the expressions in the training corpus or a blend of expressions from the training corpus....
End-to-end (E2E) systems have achieved competitive results compared to conventional hybrid hidden Markov model (HMM)-deep neural network based automatic speech recognition (ASR) systems. Such E2E systems are attractive due to the lack of dependence on alignments between input acoustic and output grapheme or HMM state sequence during training. This paper explores the design of an ASR-free end-to-end...
This paper presents a new hybrid approach for polyphonic Sound Event Detection (SED) which incorporates a temporal structure modeling technique based on a hidden Markov model (HMM) with a frame-by-frame detection method based on a bidirectional long short-term memory (BLSTM) recurrent neural network (RNN). The proposed BLSTM-HMM hybrid system makes it possible to model sound event-dependent temporal...
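One plausible reading of such a BLSTM-HMM hybrid is that frame-wise BLSTM activation probabilities for each event class are smoothed by Viterbi decoding of a small activity HMM, which imposes duration and continuity constraints. The sketch below shows that smoothing step for a single event class; the two-state topology and the transition probabilities are assumptions, not the paper's exact model.

```python
# Viterbi smoothing of frame-wise event posteriors with a 2-state (inactive/active) HMM.
import numpy as np

def viterbi_smooth(p_active, p_stay=0.98):
    """p_active: (T,) frame probabilities that the event is active; returns a 0/1 path."""
    emis = np.stack([1.0 - p_active, p_active], axis=1)           # (T, 2) emission probs
    log_trans = np.log(np.array([[p_stay, 1 - p_stay],
                                 [1 - p_stay, p_stay]]))
    log_emis = np.log(np.clip(emis, 1e-10, 1.0))
    T = len(p_active)
    delta = np.zeros((T, 2))
    psi = np.zeros((T, 2), dtype=int)
    delta[0] = np.log(0.5) + log_emis[0]
    for t in range(1, T):
        cand = delta[t - 1][:, None] + log_trans                  # rows: prev state, cols: current
        psi[t] = cand.argmax(axis=0)
        delta[t] = cand.max(axis=0) + log_emis[t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):                                # backtrack
        path[t] = psi[t + 1][path[t + 1]]
    return path                                                   # smoothed 0/1 activity

noisy = np.clip(np.r_[np.zeros(20), np.ones(30), np.zeros(20)] +
                np.random.default_rng(0).normal(0, 0.3, 70), 0.01, 0.99)
print(viterbi_smooth(noisy))
```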
Recent work on Acoustic Unit Discovery (AUD) has led to the development of a non-parametric Bayesian phone-loop model where the prior over the probability of the phone-like units is assumed to be sampled from a Dirichlet Process (DP). In this work, we propose to improve this model by incorporating a Hierarchical Pitman-Yor based bigram Language Model on top of the units' transitions. This new model...
A novel speaker segmentation approach based on deep neural networks is proposed and investigated. This approach uses deep speaker vectors (d-vectors) to represent speaker characteristics and to find speaker change points. The d-vector is a frame-level speaker-discriminative feature, whose discriminative training process corresponds to the goal of discriminating a speaker change point from a...
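A minimal sketch of using frame-level d-vectors to locate change points: the cosine distance between the mean d-vectors of two adjacent sliding windows is thresholded, and high-distance frames are flagged as candidate boundaries. The window length, threshold, and the randomly generated "d-vectors" are assumptions; the paper's own detection criterion may differ.

```python
# Adjacent-window cosine-distance change-point detection over frame-level embeddings.
import numpy as np

def change_points(dvectors, win=50, threshold=0.4):
    """dvectors: (T, D) frame-level speaker embeddings; returns candidate change frames."""
    T = len(dvectors)
    dists = np.zeros(T)
    for t in range(win, T - win):
        left = dvectors[t - win:t].mean(axis=0)
        right = dvectors[t:t + win].mean(axis=0)
        cos = left @ right / (np.linalg.norm(left) * np.linalg.norm(right) + 1e-10)
        dists[t] = 1.0 - cos                      # high distance -> likely speaker change
    return np.where(dists > threshold)[0]

rng = np.random.default_rng(0)
spk_a, spk_b = rng.normal(size=16), rng.normal(size=16)
stream = np.vstack([spk_a + 0.1 * rng.normal(size=(200, 16)),
                    spk_b + 0.1 * rng.normal(size=(200, 16))])
print(change_points(stream))                      # frames clustered around index 200
```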
It is well known that, when noisy speech is transcribed using automatic speech recognition (ASR) systems trained on clean data, a highly degraded recognition performance is obtained. The problem gets further aggravated when the targeted group happens to be child speakers. For children's speech, the acoustic correlates such as pitch and formant frequency vary significantly with age. This makes the recognition...
Deep neural networks (DNN) have achieved significant success in the field of speech recognition. One of the main advantages of the DNN is automatic feature extraction without human intervention. Therefore, we incorporate a pseudo-filterbank layer at the bottom of the DNN and train the filterbank layer and the following networks jointly, whereas most systems take pre-defined mel-scale filterbanks as...
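A minimal PyTorch sketch of such a jointly trained front end follows: a learnable non-negative matrix maps the power spectrum to filterbank-like outputs, and its weights are updated together with the acoustic-model layers by backpropagation. This is an assumed realisation for illustration, not the authors' exact layer; all sizes and the softplus constraint are placeholders.

```python
# Learnable "pseudo-filterbank" layer trained jointly with the acoustic model (sketch).
import torch
import torch.nn as nn

class PseudoFilterbankDNN(nn.Module):
    def __init__(self, n_fft_bins=257, n_filters=40, n_senones=2000):
        super().__init__()
        # Learnable filterbank weights, kept non-negative via a softplus in forward().
        self.fbank = nn.Parameter(torch.randn(n_fft_bins, n_filters) * 0.01)
        self.acoustic_model = nn.Sequential(
            nn.Linear(n_filters, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, n_senones),
        )

    def forward(self, power_spec):                      # (batch, n_fft_bins)
        fb = torch.nn.functional.softplus(self.fbank)   # non-negative filter shapes
        feats = torch.log(power_spec @ fb + 1e-6)       # log filterbank-like features
        return self.acoustic_model(feats)               # senone logits

model = PseudoFilterbankDNN()
logits = model(torch.rand(8, 257))                      # dummy power spectra
print(logits.shape)                                     # torch.Size([8, 2000])
```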
Model-based approaches to Speaker Verification (SV), such as Joint Factor Analysis (JFA), i-vector and relevance Maximum-a-Posteriori (MAP), have been shown to provide state-of-the-art performance for text-dependent systems with fixed phrases. The performance of i-vector and JFA models has been further enhanced by estimating posteriors from a Deep Neural Network (DNN) instead of a Gaussian Mixture Model (GMM)...