The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
With the constant development of smart devices, Gesture recognition is applied on more and more fields. Gesture recognition devices currently on the market are inconvenient and expensive. A gesture recognition method based on Kinect is proposed in this paper. The camera of Kinect is used to get gesture images and then a hidden Markov model is established to recognize dynamic gesture so that the operation...
Novelty detection is the task of recognising events the differ from a model of normality. This paper proposes an acoustic novelty detector based on neural networks trained with an adversarial training strategy. The proposed approach is composed of a feature extraction stage that calculates Log-Mel spectral features from the input signal. Then, an autoencoder network, trained on a corpus of “normal”...
Traditional stacked autoencoders have an equal number of encoders and decoders. However, while fine-tuned as a deep neural network the decoder portion is detached and never used. This begs the question: ‘do we need equal number of decoders and encoders’? In this study we explore asymmetric autoencoders — unequal number of encoders and decoders. We specifically address two tasks — 1. Classification...
Feature extraction becomes increasingly important as data grows high dimensional. Autoencoder as a neural network based feature extraction method achieves great success in generating abstract features of high dimensional data. However, it fails to consider the relationships of data samples which may affect experimental results of using original and new features. In this paper, we propose a Relation...
Neural machine translation (NMT) has shown promising results and rapidly gained adoption in many large-scale settings. With the NMT model being widely used in empirical productions, its long-standing weakness in handling the rare and out of vocabulary words has been amplified a lot. In order to release the model from the stress of “understanding” the rare words, copy mechanism has been proposed to...
In this work we propose a new framework for combined feature extraction and classification. The base idea stems from the sparse representation based classification; where in the training samples from each class are assumed to form a basis for representing the same. Later studies learned a basis for each class using dictionary learning; these were shallow techniques where only one level of dictionary...
Real-world data such as medical images and sensor measurements is usually high-dimensional and limited. Using such datasets directly in machine learning tasks can lead to poor generalization. Feature learning is a general approach for transforming high-dimensional data points to a representational space with lower dimensionality. Machine learning models can be trained efficiently with such representations...
Automated cell detection is a critical step for a number of computer-assisted pathology related image analysis algorithm. However, automated cell detection is complicated due to the variable cytomorphological and histological factors associated with each cell. In order to efficiently resolve the challenge of automated cell detection, deep learning strategies are widely applied and have recently been...
The High Efficiency Video Coding (HEVC) standard defines 35 Intra Prediction Modes (IPM) to provide an efficient compression of intra coded blocks. Those IPMs are signalled to the decoder through the use of three compression tools: prediction, clustering and coding. In this paper we provide improvements to these three tools through: new labels for the prediction, new tests for the clustering and new...
Most traditional template matching based keyword recognition methods don't need training data, just rely on frame matching. However, the recognition speed is relatively slow and it can't be used in practice. The LVCSR-based method needs to convert the speech signal into text signal before recognition, which has an important impact on the final recognition performance. In this paper, we propose a method...
With the completion of the IARPA Babel program, it is possible to systematically analyze the performance of speech recognition systems across a wide variety of languages. We select 16 languages from the dataset and compare performance using a deep neural network-based acoustic model. The focus is on keyword spotting using the actual term-weighted value (ATWV) metric. We demonstrate that ATWV is keyword...
This paper advances the design of CTC-based all-neural (or end-to-end) speech recognizers. We propose a novel symbol inventory, and a novel iterated-CTC method in which a second system is used to transform a noisy initial output into a cleaner version. We present a number of stabilization and initialization methods we have found useful in training these networks. We evaluate our system on the commonly...
This paper investigates the framework of encoder-decoder with attention for sequence labelling based spoken language understanding. We introduce Bidirectional Long Short Term Memory - Long Short Term Memory networks (BLSTM-LSTM) as the encoder-decoder model to fully utilize the power of deep learning. In the sequence labelling task, the input and output sequences are aligned word by word, while the...
A multi-stream framework with deep neural network (DNN) classifiers is applied to improve automatic speech recognition (ASR) in environments with different reverberation characteristics. We propose a room parameter estimation model to establish a reliable combination strategy which performs on either DNN posterior probabilities or word lattices. The model is implemented by training a multilayer perceptron...
In this paper, we propose a new supervised monaural source separation based on autoencoders. We employ the autoencoder for the dictionary training such that the nonlinear network can encode the target source with high expressiveness. The dictionary is trained by each target source without the mixture signal, which makes the system independent from the context where the dictionaries will be used. In...
This paper considers near optimal design of predictive compression system that accounts for packet loss over unreliable networks. Major challenges to address include, propagation of errors due to packet loss through the prediction loop, mismatch between statistics used for design and during operation, and above all a cost function that is fraught with poor local minima. Accurately estimating and minimizing...
When using connectionist temporal classification (CTC) based acoustic models (AMs) for large vocabulary continuous speech recognition (LVCSR), most previous studies have used a naive interpolation of the CTC-AM score and an additional language model score, although there is no theoretical justification for such an approach. On the other hand, we recently proposed a theoretically more sound decoding...
In this paper we present an extension of our previously described neural machine translation based system for punctuated transcription. This extension allows the system to map from per frame acoustic features to word level representations by replacing the traditional encoder in the encoder-decoder architecture with a hierarchical encoder. Furthermore, we show that a system combining lexical and acoustic...
In recent years, so-called, “end-to-end” speech recognition systems have emerged as viable alternatives to traditional ASR frameworks. Keyword search, localizing an orthographic query in a speech corpus, is typically performed by using automatic speech recognition (ASR) to generate an index. Previous work has evaluated the use of end-to-end systems for ASR on well known corpora (WSJ, Switchboard,...
In this paper we aim to enhance keyword search for conversational telephone speech under low-resourced conditions. Two techniques to improve the detection of out-of-vocabulary keywords are assessed in this study: using extra text resources to augment the lexicon and language model, and via subword units for keyword search. Two approaches for data augmentation are explored to extend the limited amount...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.