Search results

chapter

Cumulative moving averaged bottleneck speaker vectors for online speaker adaptation of CNN-based acoustic models

Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Atsunori Ogawa, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5175 - 5179

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Adapting acoustic models to speakers have shown to greatly improve performance for many tasks. Among the adaptation approaches, exploiting auxiliary features characterizing speakers or environments has received great attention because they allow rapid adaptation, i.e. adaptation with limited amount of speech data such as a single utterance. However, the auxiliary features are usually computed in batch...

chapter

Joint CTC-attention based end-to-end speech recognition using multi-task learning

Suyoun Kim, Takaaki Hori, Shinji Watanabe

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4835 - 4839

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Recently, there has been an increasing interest in end-to-end speech recognition that directly transcribes speech to text without any predefined alignments. One approach is the attention-based encoder-decoder framework that learns a mapping between variable-length input and output sequences in one step using a purely data-driven method. The attention model has often been shown to improve the performance...

chapter

Simultaneous segmentation and classification of bird song using CNN

Revathy Narasimhan, Xiaoli Z. Fern, Raviv Raich

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 146 - 150

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In bioacoustics, automatic animal voice detection and recognition from audio recordings is an emerging topic for animal preservation. Our research focuses on bird bioacoustics, where the goal is to segment bird syllables from the recording and predict the bird species for the syllables. Traditional methods for this task addresses the segmentation and species prediction separately, leading to propagated...

chapter

Neural decoding systems using Markov Decision Processes

Henrique Dantas, V John Mathews, Suzanne M. Wendelken, Tyler S. Davis, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 974 - 978

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper presents a framework for modeling neural decoding using electromyogram (EMG) and electrocorticogram (ECoG) signals to interpret human intent and control prosthetic arms. Specifically, the method of this paper employs Markov Decision Processes (MDP) for neural decoding, parameterizing the policy using an artificial neural network. The system is trained using a modification of the Dataset...

chapter

Improving latency-controlled BLSTM acoustic models for online speech recognition

Shaofei Xue, Zhijie Yan

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5340 - 5344

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Bidirectional long short-term memory (BLSTM) recurrent neural networks are powerful acoustic models in terms of recognition accuracy. When BLSTM acoustic models are used in decoding, the speech decoder needs to wait until the end of a whole sentence is reached, such that forward-propagation in the backward direction can then be performed. The nature of BLSTM acoustic models makes them inappropriate...

chapter

Deep Feature Consistent Variational Autoencoder

Xianxu Hou, Linlin Shen, Ke Sun, Guoping Qiu

2017 IEEE Winter Conference on Applications of Computer Vision (WACV) > 1133 - 1141

2017 IEEE Winter Conference on Applications of Computer Vision (WACV)

We present a novel method for constructing Variational Autoencoder (VAE). Instead of using pixel-by-pixel loss, we enforce deep feature consistency between the input and the output of a VAE, which ensures the VAE's output to preserve the spatial correlation characteristics of the input, thus leading the output to have a more natural visual appearance and better perceptual quality. Based on recent...

chapter

A fast and ultra low power time-based spiking neuromorphic architecture for embedded applications

Tao Liu, Wujie Wen

2017 18th International Symposium on Quality Electronic Design (ISQED) > 19 - 22

2017 18th International Symposium on Quality Electronic Design (ISQED)

Time-based Spiking Neural Network (SNN) has recently received increased attentions in neuromorphic computing system designs due to more bio-plausibility and better energy-efficiency. However, unleashing its potentials in realistic cognitive applications is facing significant challenges such as inefficient information representations and impractical learnings. In this work, we aim for exploring a practical...

chapter

An accurate HSMM-based system for Arabic phonemes recognition

Mohamed O. M. Khelifa, Mostafa Belkasmi, Yousfi Abdellah, Yahya O. M. ElHadj

2017 Ninth International Conference on Advanced Computational Intelligence (ICACI) > 211 - 216

2017 Ninth International Conference on Advanced Computational Intelligence (ICACI)

The majority of successful automatic speech recognition (ASR) systems utilize a probabilistic modeling of the speech signal via hidden Markov models (HMMs). In a standard HMM model, state duration probabilities decrease exponentially with time, which fails to satisfactorily describe the temporal structure of speech. Incorporating explicit state durational probability distribution functions (pdf) into...

chapter

Code-division multiplexed resistive pulse sensor networks for spatio-temporal detection of particles in microfluidic devices

Ningquan Wang, Ruxiu Liu, Roozbeh Khodambashi, Norh Asmare, more

2017 IEEE 30th International Conference on Micro Electro Mechanical Systems (MEMS) > 362 - 365

2017 IEEE 30th International Conference on Micro Electro Mechanical Systems (MEMS)

Spatial separation of suspended particles based on contrast in their physical or chemical properties forms the basis of various biological assays performed on lab-on-a-chip devices. To electronically acquire this information, we have recently introduced a microfluidic sensing platform, called Microfluidic CODES, which combines the resistive pulse sensing with the code division multiple access in multiplexing...

chapter

An extended experimental investigation of DNN uncertainty propagation for noise robust ASR

Karan Nathwani, Juan A. Morales-Cordovilla, Sunit Sivasankaran, Irina Illina, more

2017 Hands-free Speech Communications and Microphone Arrays (HSCMA) > 26 - 30

2017 Hands-free Speech Communications and Microphone Arrays (HSCMA)

Automatic speech recognition (ASR) in noisy environments remains a challenging goal. Recently, the idea of estimating the uncertainty about the features obtained after speech enhancement and propagating it to dynamically adapt deep neural network (DNN) based acoustic models has raised some interest. However, the results in the literature were reported on simulated noisy datasets for a limited variety...

chapter

SCNet: A simplified encoder-decoder CNN for semantic segmentation

Robail Yasrab, Naijie Gu, Xiaoci Zhang

2016 5th International Conference on Computer Science and Network Technology (ICCSNT) > 785 - 789

2016 5th International Conference on Computer Science and Network Technology (ICCSNT)

We present a simplified and novel fully convolutional neural network (CNN) architecture for semantic pixel-wise segmentation named as SCNet. Different from current CNN pipelines, proposed network uses only convolution layers with no pooling layer. The key objective of this model is to offer a more simplified CNN model with equal benchmark performance and results. It is an encoder-decoder based fully...

chapter

Cascading BLSTM networks for handwritten word recognition

Bruno Stuner, Clement Chatelain, Thierry Paquet

2016 23rd International Conference on Pattern Recognition (ICPR) > 3416 - 3421

2016 23rd International Conference on Pattern Recognition (ICPR)

Handwritten word recognition is a tough task, mixing image and natural language processing. Recently new recurrent neural networks with LSTM cells allowed significant improvements in this field. These networks are generally coupled with lexical and linguistic knowledge in order to correct character misrecognitions, namely using a lexicon driven decoding. Yet the high performances of LSTM networks...

chapter

BeamECOC: A local search for the optimization of the ECOC matrix

Cemre Zor, Berrin Yanikoglu, Erinc Merdivan, Terry Windeatt, more

2016 23rd International Conference on Pattern Recognition (ICPR) > 198 - 203

2016 23rd International Conference on Pattern Recognition (ICPR)

Error Correcting Output Coding (ECOC) is a multi-class classification technique in which multiple binary classifiers are trained according to a preset code matrix such that each one learns a separate dichotomy of the classes. While ECOC is one of the best solutions for multi-class problems, one issue which makes it suboptimal is that the training of the base classifiers is done independently of the...

chapter

Face hallucination by deep traversal network

Zhanxiang Feng, Jianhuang Lai, Xiaohua Xie, Dakun Yang, more

2016 23rd International Conference on Pattern Recognition (ICPR) > 3276 - 3281

2016 23rd International Conference on Pattern Recognition (ICPR)

In this paper, we propose a novel patch-based face hallucination method that consists of two patch-based sparse autoencoder (SAE) networks and a deep fully connected network (namely traversal network). The SAE networks are used to capture the intrinsic features of low-resolution (LR) images and high-resolution (HR) images in the hidden layers, while the traversal network is used to map features from...

chapter

Image compression via multiple learned geometric dictionaries

Danlan Huang, Xiaoming Tao, Mai Xu, Shenghua Gao, more

2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP) > 1373 - 1377

2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP)

In this paper, we present a novel image codec by leveraging sparse representation strategy for geometric pattern encoding. Specifically, we propose a Multiple Learned Geometric Dictionaries (MLGD) solution to explore various texture patterns of images, and use different dictionaries to encode homogenous smooth components and heterogeneous directional components. Profiting from model proficiency, our...

chapter

Look, listen, and decode: Multimodal speech recognition with images

Felix Sun, David Harwath, James Glass

2016 IEEE Spoken Language Technology Workshop (SLT) > 573 - 578

2016 IEEE Spoken Language Technology Workshop (SLT)

In this paper, we introduce a multimodal speech recognition scenario, in which an image provides contextual information for a spoken caption to be decoded. We investigate a lattice rescoring algorithm that integrates information from the image at two different points: the image is used to augment the language model with the most likely words, and to rescore the top hypotheses using a word-level RNN...

chapter

Max-pooling loss training of long short-term memory networks for small-footprint keyword spotting

Ming Sun, Anirudh Raju, George Tucker, Sankaran Panchapagesan, more

2016 IEEE Spoken Language Technology Workshop (SLT) > 474 - 480

2016 IEEE Spoken Language Technology Workshop (SLT)

We propose a max-pooling based loss function for training Long Short-Term Memory (LSTM) networks for small-footprint keyword spotting (KWS), with low CPU, memory, and latency requirements. The max-pooling loss training can be further guided by initializing with a cross-entropy loss trained network. A posterior smoothing based evaluation approach is employed to measure keyword spotting performance...

chapter

Abstractive headline generation for spoken content by attentive recurrent neural networks with ASR error modeling

Lang-Chi Yu, Hung-yi Lee, Lin-shan Lee

2016 IEEE Spoken Language Technology Workshop (SLT) > 151 - 157

2016 IEEE Spoken Language Technology Workshop (SLT)

Headline generation for spoken content is important since spoken content is difficult to be shown on the screen and browsed by the user. It is a special type of abstractive summarization, for which the summaries are generated word by word from scratch without using any part of the original content. Many deep learning approaches for headline generation from text document have been proposed recently,...

chapter

Speaker independent diarization for child language environment analysis using deep neural networks

Maryam Najafian, John H. L. Hansen

2016 IEEE Spoken Language Technology Workshop (SLT) > 114 - 120

2016 IEEE Spoken Language Technology Workshop (SLT)

Large-scale monitoring of the child language environment through measuring the amount of speech directed to the child by other children and adults during a vocal communication is an important task. Using the audio extracted from a recording unit worn by a child within a childcare center, at each point in time our proposed diarization system can determine the content of the child's language environment,...

chapter

Iterative training of a DPGMM-HMM acoustic unit recognizer in a zero resource scenario

Michael Heck, Sakriani Sakti, Satoshi Nakamura

2016 IEEE Spoken Language Technology Workshop (SLT) > 57 - 63

2016 IEEE Spoken Language Technology Workshop (SLT)

In this paper we propose a framework for building a full-fledged acoustic unit recognizer in a zero resource setting, i.e., without any provided labels. For that, we combine an iterative Dirichlet process Gaussian mixture model (DPGMM) clustering framework with a standard pipeline for supervised GMM-HMM acoustic model (AM) and n-gram language model (LM) training, enhanced by a scheme for iterative...

INFONA - science communication portal

Search results

Cumulative moving averaged bottleneck speaker vectors for online speaker adaptation of CNN-based acoustic models

Joint CTC-attention based end-to-end speech recognition using multi-task learning

Simultaneous segmentation and classification of bird song using CNN

Neural decoding systems using Markov Decision Processes

Improving latency-controlled BLSTM acoustic models for online speech recognition

Deep Feature Consistent Variational Autoencoder

A fast and ultra low power time-based spiking neuromorphic architecture for embedded applications

An accurate HSMM-based system for Arabic phonemes recognition

Code-division multiplexed resistive pulse sensor networks for spatio-temporal detection of particles in microfluidic devices

An extended experimental investigation of DNN uncertainty propagation for noise robust ASR

SCNet: A simplified encoder-decoder CNN for semantic segmentation

Cascading BLSTM networks for handwritten word recognition

BeamECOC: A local search for the optimization of the ECOC matrix

Face hallucination by deep traversal network

Image compression via multiple learned geometric dictionaries

Look, listen, and decode: Multimodal speech recognition with images

Max-pooling loss training of long short-term memory networks for small-footprint keyword spotting

Abstractive headline generation for spoken content by attentive recurrent neural networks with ASR error modeling

Speaker independent diarization for child language environment analysis using deep neural networks

Iterative training of a DPGMM-HMM acoustic unit recognizer in a zero resource scenario

Filter options

Publication date

Content availability

Keywords

INFONA - science communication portal

Search results

Add recipient

Sending message cancelled

Are you sure you want to cancel sending this message?

Send message

Filter options

Publication date

Date range setting

Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.

Content availability

Keywords

Reporting an error / abuse

Sending the report failed

Accessibility options