Search results

chapter

Look, listen, and decode: Multimodal speech recognition with images

Felix Sun, David Harwath, James Glass

2016 IEEE Spoken Language Technology Workshop (SLT) > 573 - 578

In this paper, we introduce a multimodal speech recognition scenario, in which an image provides contextual information for a spoken caption to be decoded. We investigate a lattice rescoring algorithm that integrates information from the image at two different points: the image is used to augment the language model with the most likely words, and to rescore the top hypotheses using a word-level RNN...

chapter

Jointly learning to align and convert graphemes to phonemes with neural attention models

Shubham Toshniwal, Karen Livescu

2016 IEEE Spoken Language Technology Workshop (SLT) > 76 - 82

We propose an attention-enabled encoder-decoder model for the problem of grapheme-to-phoneme conversion. Most previous work has tackled the problem via joint sequence models that require explicit alignments for training. In contrast, the attention-enabled encoder-decoder model allows for jointly learning to align and convert characters to phonemes. We explore different types of attention models, including...

chapter

Triply Stochastic Variational Inference for Non-linear Beta Process Factor Analysis

Kai Fan, Yizhe Zhang, Ricardo Henao, Katherine Heller

2016 IEEE 16th International Conference on Data Mining (ICDM) > 121 - 130

We propose a non-linear extension to factor analysis with beta process priors for improved data representation ability. This non-linear Beta Process Factor Analysis (nBPFA) allows data to be represented as a non-linear transformation of a standard sparse factor decomposition. We develop a scalable variational inference framework, which builds upon the ideas of the variational auto-encoder, by allowing...

chapter

Improved keyword spotting based on keyword/garbage models

Qiyu Chen, Weibin Zhang, Xiangmin Xu, Xiaofen Xing

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 4

We propose two simple methods to improve the performance of a keyword spotting system. In our application, the users are allowed to change the keywords at any time. Thus we focused on phone-based GMM-HMM models since they do not require keyword-specific training data. However, GMM-HMM based models usually have a very high false alarm rate, i.e., a keyword is not present but the system gives...

chapter

Multi-task recurrent model for true multilingual speech recognition

Zhiyuan Tang, Lantian Li, Dong Wang

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 4

Research on multilingual speech recognition remains attractive yet challenging. Recent studies focus on learning shared structures under the multi-task paradigm, in particular a feature sharing structure. This approach has been found effective in improving performance on each individual language. However, this approach is only useful when the deployed system supports just one language. In a true multilingual...

chapter

Secure computation of linear functions over linear discrete multiple-access wiretap channels

Mario Goldenbaum, Holger Boche, H. Vincent Poor

2016 50th Asilomar Conference on Signals, Systems and Computers > 1670 - 1674

In this paper, a joint source-channel coding approach is taken to the problem of securely computing a function of distributed sources over a multiple-access wiretap channel that is linear with respect to a finite field. It is shown that if the joint source distribution fulfills certain conditions and the function to be computed matches the linear structure of the channel, secrecy comes for free in...

chapter

A reordering model for Vietnamese-English statistical machine translation using dependency information

Viet Hong Tran, Huyen Thuong Vu, Thu Hoai Pham, Vinh Van Nguyen, more

2016 IEEE RIVF International Conference on Computing & Communication Technologies, Research, Innovation, and Vision for the Future (RIVF) > 125 - 130

Reordering is a major challenge in machine translation (MT) between two languages with significant differences in word order. In this paper, we present an approach to learn reordering rules as a pre-processing step based on a dependency parser in phrase-based statistical machine translation (SMT) from Vietnamese to English. A dependency parser and transformation rules are used to reorder the source sentence...

chapter

Information-theoretic limits of algorithmic noise tolerance

Daewon Seo, Lav R. Varshney

2016 IEEE International Conference on Rebooting Computing (ICRC) > 1 - 4

Statistical error compensation techniques in computing circuits are becoming prevalent, especially as implemented on nanoscale physical substrates. One such technique that has been developed and deployed is algorithmic noise tolerance (ANT), which aggregates information from several computational branches operating at different points along energy-reliability circuit tradeoffs. To understand this...

chapter

Extracting behaviour from an executable instruction set model

Brian Campbell, Ian Stark

2016 Formal Methods in Computer-Aided Design (FMCAD) > 33 - 40

Presenting large formal instruction set models as executable functions makes them accessible to engineers and useful for less formal purposes such as simulation. However, it is more difficult to extract information about the behaviour of individual instructions for reasoning. We present a method which combines symbolic evaluation and symbolic execution techniques to provide a rule-based view of instruction...

chapter

Cluster-based senone selection for the efficient calculation of deep neural network acoustic models

Jun-Hua Liu, Zhen-Hua Ling, Si Wei, Guo-Ping Hu, more

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP) > 1 - 5

In this paper, we propose a cluster-based senone selection method to speed up the computation of deep neural networks (DNN) at the decoding time of speech recognition. In DNN-based acoustic models, the large number of senones at the output layer is one of the main causes of the high computational complexity of DNNs. Inspired by the mixture selection method designed for the Gaussian mixture...

chapter

Prosodic annotation enriched statistical machine translation

Peidong Guo, Heyan Huang, Ping Jian, Yuhang Guo

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP) > 1 - 5

More and more linguistic information has been employed to improve the performance of machine translation, such as part of speech, syntactic structures, discourse contexts, and so on. However, conventional approaches typically ignore the key information beyond the text such as prosody. In this paper, we exploit and employ three prosodic features: pronunciation (phonetic alphabet and tone), prosodic...

chapter

Applying connectionist temporal classification objective function to Chinese Mandarin speech recognition

Pengrui Wang, Jie Li, Bo Xu

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP) > 1 - 5

In automatic speech recognition (ASR), connectionist temporal classification (CTC) is regarded as a method to achieve an end-to-end system. In fact, not only characters (Chars) but also context-independent phonemes (CI-Phns) or context-dependent phonemes (CD-Phns) can be used as output units of a CTC-trained neural network. The contribution of this paper mainly lies in three aspects: First, we trained...

chapter

Bidirectional Decoder Networks for Attention-Based End-to-End Offline Handwriting Recognition

Patrick Doetsch, Albert Zeyer, Hermann Ney

2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR) > 361 - 366

Recurrent neural networks that can be trained end-to-end on sequence learning tasks provide promising benefits over traditional recognition systems. In this paper, we demonstrate the application of an attention-based long short-term memory decoder network for offline handwriting recognition and analyze the segmentation, classification and decoding errors produced by the model. We further extend the...

chapter

Computer model of steganographic system based on contraction mapping with stream audio container

Maxim Shakurskiy, Victor Shakurskiy, Vladimir Volovach

2016 IEEE East-West Design & Test Symposium (EWDTS) > 1 - 4

In this paper, a digital steganographic system based on contraction mapping is considered. The proposed steganographic algorithm uses two channels to achieve informational redundancy, which allows decoding with an algorithm invariant to the container signal. This property is well suited to information hiding in a chaotic signal. This paper is devoted to the illustration of algorithm performance...

chapter

Coding theory for robust computing: Models, tools, and applications

Lara Dolecek

2016 9th International Symposium on Turbo Codes and Iterative Information Processing (ISTC) > 111 - 115

Computing under uncertainty has become increasingly important in modern information processing systems. In this work, we first review exciting recent results on the multi-faceted role coding approaches can play in this important discipline, including fundamental performance limits of noisy iterative algorithms and decoders, implications on system design, and coding-theoretic methods for approximate...

chapter

A fully convolutional deep auditory model for musical chord recognition

Filip Korzeniowski, Gerhard Widmer

2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP) > 1 - 6

Chord recognition systems depend on robust feature extraction pipelines. While these pipelines are traditionally hand-crafted, recent advances in end-to-end machine learning have begun to inspire researchers to explore data-driven methods for such tasks. In this paper, we present a chord recognition system that uses a fully convolutional deep auditory model for feature extraction. The extracted features...

chapter

Redundant frame structure using M-frame for interactive light field streaming

Benedicte Motz, Gene Cheung, Antonio Ortega

2016 IEEE International Conference on Image Processing (ICIP) > 1369 - 1373

A light field (LF) is a 2D array of closely spaced viewpoint images of a static 3D scene. In an interactive LF streaming (ILFS) scenario, a user successively requests desired neighboring viewpoints for observation, and in response the server must transmit pre-encoded data for correct decoding of the requested viewpoint images. Designing frame structures for ILFS is challenging, since at encoding time...

chapter

Fault diagnosis in DESs modeled by partially observed Petri nets

Li Yin, Zhiwu Li, Naiqi Wu

2016 IEEE International Conference on Automation Science and Engineering (CASE) > 966 - 971

In this paper, we focus on fault diagnosis in discrete event systems (DESs) which are modeled by partially observed Petri nets. We consider not only the case where faults occur either on transitions or places, but also a more general case where faults occur on both transitions and places at the same time. Some faults cannot be diagnosed directly due to the unobservability of some transitions and places...

chapter

Efficient SpiNNaker simulation of a heteroassociative memory using the Neural Engineering Framework

James Knight, Aaron R. Voelker, Andrew Mundy, Chris Eliasmith, more

2016 International Joint Conference on Neural Networks (IJCNN) > 5210 - 5217

The biological brain is a highly plastic system within which the efficacy and structure of synaptic connections are constantly changing in response to internal and external stimuli. While numerous models of this plastic behavior exist at various levels of abstraction, how these mechanisms allow the brain to learn meaningful values is unclear. The Neural Engineering Framework (NEF) is a hypothesis...

chapter

Multi-digit image synthesis using recurrent conditional variational autoencoder

Haoze Sun, Weidi Xu, Chao Deng, Ying Tan

2016 International Joint Conference on Neural Networks (IJCNN) > 375 - 380

In the field of deep neural networks, several generative methods have been proposed to address the challenges of generative and discriminative tasks, e.g., natural language processing, image captioning and image generation. In this paper, a conditional recurrent variational autoencoder is proposed for multi-digit image synthesis. This model is capable of generating multi-digit images from the given number...
