In this paper, we introduce a multimodal speech recognition scenario, in which an image provides contextual information for a spoken caption to be decoded. We investigate a lattice rescoring algorithm that integrates information from the image at two different points: the image is used to augment the language model with the most likely words, and to rescore the top hypotheses using a word-level RNN...
We propose an attention-enabled encoder-decoder model for the problem of grapheme-to-phoneme conversion. Most previous work has tackled the problem via joint sequence models that require explicit alignments for training. In contrast, the attention-enabled encoder-decoder model allows for jointly learning to align and convert characters to phonemes. We explore different types of attention models, including...
We propose a non-linear extension to factor analysis with beta process priors for improved data representation ability. This non-linear Beta Process Factor Analysis (nBPFA) allows data to be represented as a non-linear transformation of a standard sparse factor decomposition. We develop a scalable variational inference framework, which builds upon the ideas of the variational auto-encoder, by allowing...
We propose two simple methods to improve the performance of a keyword spotting system. In our application, users are allowed to change the keywords at any time. We therefore focus on phone-based GMM-HMM models, since they do not require keyword-specific training data. However, GMM-HMM-based models usually have a very high false alarm rate, i.e., a keyword is not present but the system gives...
Research on multilingual speech recognition remains attractive yet challenging. Recent studies focus on learning shared structures under the multi-task paradigm, in particular a feature sharing structure. This approach has been found effective to improve performance on each individual language. However, this approach is only useful when the deployed system supports just one language. In a true multilingual...
In this paper, a joint source-channel coding approach is taken to the problem of securely computing a function of distributed sources over a multiple-access wiretap channel that is linear with respect to a finite field. It is shown that if the joint source distribution fulfills certain conditions and the function to be computed matches the linear structure of the channel, secrecy comes for free in...
Reordering is a major challenge in machine translation (MT) between two languages with significant differences in word order. In this paper, we present an approach to learning reordering rules as a pre-processing step, based on a dependency parser, in phrase-based statistical machine translation (SMT) from Vietnamese to English. A dependency parser and transformation rules are used to reorder the source sentence...
Statistical error compensation techniques in computing circuits are becoming prevalent, especially as implemented on nanoscale physical substrates. One such technique that has been developed and deployed is algorithmic noise tolerance (ANT), which aggregates information from several computational branches operating at different points along energy-reliability circuit tradeoffs. To understand this...
Presenting large formal instruction set models as executable functions makes them accessible to engineers and useful for less formal purposes such as simulation. However, it is more difficult to extract information about the behaviour of individual instructions for reasoning. We present a method which combines symbolic evaluation and symbolic execution techniques to provide a rule-based view of instruction...
In this paper, we propose a cluster-based senone selection method to speed up the computation of deep neural networks (DNNs) at decoding time in speech recognition. In DNN-based acoustic models, the large number of senones at the output layer is one of the main causes of the high computational complexity of DNNs. Inspired by the mixture selection method designed for the Gaussian mixture...
More and more linguistic information has been employed to improve the performance of machine translation, such as part of speech, syntactic structures, discourse contexts, and so on. However, conventional approaches typically ignore the key information beyond the text such as prosody. In this paper, we exploit and employ three prosodic features: pronunciation (phonetic alphabet and tone), prosodic...
In automatic speech recognition (ASR), connectionist temporal classification (CTC) is regarded as a method to achieve an end-to-end system. In fact, not only characters (Chars) but also context-independent phonemes (CI-Phns) or context-dependent phonemes (CD-Phns) can be used as output units of a CTC-trained neural network. The contribution of this paper mainly lies in three aspects: First, we trained...
Recurrent neural networks that can be trained end-to-end on sequence learning tasks provide promising benefits over traditional recognition systems. In this paper, we demonstrate the application of an attention-based long short-term memory decoder network for offline handwriting recognition and analyze the segmentation, classification and decoding errors produced by the model. We further extend the...
In this paper, a digital steganographic system based on contraction mapping is considered. The proposed steganographic algorithm uses two channels to achieve informational redundancy, which allows decoding with an algorithm that is invariant to the container signal. This property is particularly useful when hiding information in a chaotic signal. This paper is devoted to the illustration of algorithm performance...
Computing under uncertainty has become increasingly important in modern information processing systems. In this work, we first review exciting recent results on the multi-faceted role coding approaches can play in this important discipline, including fundamental performance limits of noisy iterative algorithms and decoders, implications on system design, and coding-theoretic methods for approximate...
Chord recognition systems depend on robust feature extraction pipelines. While these pipelines are traditionally hand-crafted, recent advances in end-to-end machine learning have begun to inspire researchers to explore data-driven methods for such tasks. In this paper, we present a chord recognition system that uses a fully convolutional deep auditory model for feature extraction. The extracted features...
A light field (LF) is a 2D array of closely spaced viewpoint images of a static 3D scene. In an interactive LF streaming (ILFS) scenario, a user successively requests desired neighboring viewpoints for observation, and in response the server must transmit pre-encoded data for correct decoding of the requested viewpoint images. Designing frame structures for ILFS is challenging, since at encoding time...
In this paper, we focus on fault diagnosis in discrete event systems (DESs) which are modeled by partially observed Petri nets. We consider not only the case where faults occur either on transitions or places, but also a more general case where faults occur on both transitions and places at the same time. Some faults cannot be diagnosed directly due to the unobservability of some transitions and places...
The biological brain is a highly plastic system within which the efficacy and structure of synaptic connections are constantly changing in response to internal and external stimuli. While numerous models of this plastic behavior exist at various levels of abstraction, how these mechanisms allow the brain to learn meaningful values is unclear. The Neural Engineering Framework (NEF) is a hypothesis...
In the field of deep neural networks, several generative methods have been proposed to address the challenges of generative and discriminative tasks, e.g., natural language processing, image captioning and image generation. In this paper, a conditional recurrent variational autoencoder is proposed for multi-digit image synthesis. This model is capable of generating multi-digit images from the given number...