In this paper, we propose adapting the recurrent neural network (RNN) based language model to improve the performance of multi-accent Mandarin speech recognition. N-gram based language models have already been applied to speech recognition systems, but they struggle to capture long-span information in a sentence and suffer from severe data sparsity. Instead, RNN based language model can overcome...
More and more linguistic information has been employed to improve the performance of machine translation, such as part of speech, syntactic structures, discourse contexts, and so on. However, conventional approaches typically ignore the key information beyond the text such as prosody. In this paper, we exploit and employ three prosodic features: pronunciation (phonetic alphabet and tone), prosodic...
Conditional random fields (CRF) can generate high-quality confidence measure scores (CMS) for speech recognition systems. However, as in many other real-world machine learning tasks, annotated training data are limited while unlabeled data are abundant, and annotating the latter requires too much human effort and expertise. To address this issue, we use a scheme of CRF training for ASR confidence...
Spoken keyword search in low-resource conditions suffers from the out-of-vocabulary (OOV) problem and from insufficient text data for language model (LM) training. Web-crawled text data is used to expand the vocabulary and to augment the language model. However, the mismatch between web text and the target speech data makes the web data difficult to use effectively. New words from web data need an evaluation to exclude...
In this paper, we propose a dictionary update method for Non-negative Matrix Factorization (NMF) with high dimensional data in a spectral conversion (SC) task. Voice conversion has been widely studied due to its potential applications such as personalized speech synthesis and speech enhancement. Exemplar-based NMF (ENMF) emerges as an effective and probably the simplest choice among all techniques...
In automatic speech recognition (ASR), connectionist temporal classification (CTC) is regarded as a method for building an end-to-end system. In practice, not only characters (Chars) but also context-independent phonemes (CI-Phns) or context-dependent phonemes (CD-Phns) can be used as output units of a CTC-trained neural network. The contribution of this paper mainly lies in three aspects: First, we trained...
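As a rough illustration of why the choice of output unit is interchangeable in CTC, the sketch below applies the standard CTC collapsing rule (merge consecutive repeats, then drop blanks) to per-frame labels. It is not taken from the paper: the function name `ctc_collapse`, the blank symbol `"_"`, and the example label sequences are all assumptions made for illustration.

```python
from itertools import groupby


def ctc_collapse(frame_labels, blank="_"):
    """Collapse a per-frame label path into an output sequence.

    Standard CTC rule: merge runs of identical labels, then remove blanks.
    The labels can be characters or phonemes; the rule does not care.
    """
    return [label for label, _ in groupby(frame_labels) if label != blank]


# Character units (hypothetical frame-level path).
char_path = list("hh_e_ll_lo")
chars = "".join(ctc_collapse(char_path))

# Phoneme units (hypothetical frame-level path).
phone_path = ["HH", "HH", "_", "AH", "_", "L", "L", "_", "OW"]
phones = ctc_collapse(phone_path)
```

The blank between the two `l` runs is what allows a doubled letter to survive the merge step.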
In this paper, we investigate how we can take advantage of the availability of linguistic knowledge, particularly semantic knowledge, in Air Traffic Control (ATC) to reduce the Word Error Rate (WER) of Automatic Speech Recognition (ASR) systems. To facilitate this, we integrate semantic knowledge into post-processing by performing n-best list re-ranking. We first propose a feature called semantic...
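The general idea of n-best list re-ranking can be sketched as follows, assuming each hypothesis carries an ASR log-score and that some semantic scoring function is available. The abstract is truncated before it defines its semantic feature, so `toy_semantic_score`, the keyword set, the interpolation weight, and the example hypotheses are hypothetical placeholders rather than the authors' method.

```python
from typing import Callable, List, Tuple

Hypothesis = Tuple[str, float]  # (transcript, ASR log-score)


def rerank_nbest(
    nbest: List[Hypothesis],
    semantic_score: Callable[[str], float],
    weight: float = 1.0,
) -> List[Hypothesis]:
    """Re-rank hypotheses by ASR score plus a weighted semantic score."""
    rescored = [(text, asr + weight * semantic_score(text)) for text, asr in nbest]
    return sorted(rescored, key=lambda h: h[1], reverse=True)


def toy_semantic_score(text: str) -> float:
    """Hypothetical feature: count known ATC instruction keywords."""
    keywords = {"climb", "descend", "maintain", "contact"}
    return float(sum(1 for word in text.split() if word in keywords))


# Made-up 2-best list: the top ASR hypothesis contains a misrecognized word.
nbest = [
    ("lufthansa one two three clime flight level nine zero", -10.0),
    ("lufthansa one two three climb flight level nine zero", -10.5),
]
best_text, best_score = rerank_nbest(nbest, toy_semantic_score)[0]
```

Here the second hypothesis overtakes the first because the semantic bonus outweighs its slightly lower ASR score.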
This paper presents a Finite State Machine (FSM) approach for online handwritten English text recognition that reduces the time a user waits for the recognition result after finishing writing. The lexicon is modeled by an FSM, and then determinization and minimization are applied to reduce the number of states. The reduction of states in the FSM shortens the waiting time without degrading the recognition accuracy...
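To make the state-reduction step concrete, here is a minimal sketch that models a small lexicon as a trie (which is already a deterministic FSM, so no separate determinization is needed) and then minimizes it by merging states with identical suffix behavior. The data layout, the function names, and the four-word lexicon are illustrative assumptions; the paper's actual construction is not given in the abstract.

```python
def build_trie(words):
    """Build a deterministic acceptor: each state is {'final', 'next'}."""
    root = {"final": False, "next": {}}
    for word in words:
        node = root
        for ch in word:
            node = node["next"].setdefault(ch, {"final": False, "next": {}})
        node["final"] = True
    return root


def minimize(node, registry):
    """Merge equivalent states bottom-up (valid for acyclic automata)."""
    for ch, child in list(node["next"].items()):
        node["next"][ch] = minimize(child, registry)
    # Two states are equivalent if they agree on finality and on the
    # canonical target of every outgoing transition.
    signature = (
        node["final"],
        tuple(sorted((ch, id(c)) for ch, c in node["next"].items())),
    )
    return registry.setdefault(signature, node)


def count_states(root):
    """Count distinct reachable states."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if id(node) not in seen:
            seen.add(id(node))
            stack.extend(node["next"].values())
    return len(seen)


def accepts(root, word):
    """Return True if the automaton accepts the word."""
    node = root
    for ch in word:
        node = node["next"].get(ch)
        if node is None:
            return False
    return node["final"]


lexicon = ["walked", "talked", "walking", "talking"]
trie = build_trie(lexicon)
states_before = count_states(trie)
minimal = minimize(trie, {})
states_after = count_states(minimal)
```

For this toy lexicon the shared suffixes "alked" and "alking" are stored once after minimization, while the accepted word set is unchanged.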
Handwriting recognition has always been a difficult problem, with image related problems on the one hand and language processing on the other hand. Significant improvements have been made in handwriting recognition thanks to new recurrent neural networks based on LSTM cells. The high character recognition performances of these networks are almost systematically combined with linguistic knowledge,...
In this paper, the turbo principle is applied to the existing least symbol error rate (LSER) decision feedback equalization (DFE) for underwater channels. The performance of the DFE adaptive algorithms is aided by soft information delivered from the channel decoder. We introduce a variable step size scheme that takes soft information into account to obtain a more suitable step size which can reduce...
Chord recognition systems depend on robust feature extraction pipelines. While these pipelines are traditionally hand-crafted, recent advances in end-to-end machine learning have begun to inspire researchers to explore data-driven methods for such tasks. In this paper, we present a chord recognition system that uses a fully convolutional deep auditory model for feature extraction. The extracted features...
This paper describes an adaptive deblocking postfilter based on neural networks for use in H.265 High Efficiency Video Coding (HEVC). Blocking noise is a common problem in video coding caused by the division of the frame into blocks. The filter is adaptive because it uses different filter parameters depending on block characteristics. We use a modified HEVC decoder to export the block information...
A design method of a multiple description vector quantizer (VQ) is proposed. VQ is widely used for data compression, transmission and other processing. Here, we assume transmission channels with data erasure such as a packet-based network. Multiple description coding is a coding method used to achieve “graceful degradation” when transmitting signals through lossy channels. The proposed method is inspired...
Brain-machine interface (BMI) systems have the potential to restore function to people who suffer from paralysis due to a spinal cord injury. However, in order to achieve long-term use, BMI systems have to overcome two challenges — signal degeneration over time, and non-stationarity of signals. Effects of loss in spike signals over time can be mitigated by using local field potential (LFP) signals...
Brain Computer Interfaces (BCIs) assist individuals with motor disabilities by enabling them to control prosthetic devices with their neural activity. Performance of closed-loop BCI systems can be improved by using design strategies that leverage structured and task-relevant neural activity. We use data from high density electrocorticography (ECoG) grids implanted in three subjects to study sensory-motor...
Recent advances in Brain Computer Interfaces (BCIs) have created hope that one day paralyzed patients will be able to regain control of their paralyzed limbs. As part of an ongoing clinical study, we have implanted a 96-electrode Utah array in the motor cortex of a paralyzed human. The array generates almost 3 million data points from the brain every second. This presents several big data challenges...
This study addresses neural decoding of code modulated visual evoked potentials (c-VEPs). The c-VEP was recently developed and applied to brain computer interfaces (BCIs). c-VEP BCI exhibits faster communication speed than existing VEP-based BCIs. In c-VEP BCI, the canonical correlation analysis (CCA) that maximizes the correlation between an averaged signal and single trial signals is often used for...
Label-deficient semi-supervised learning is a challenging setting in which there is an abundance of unlabeled data but a dearth of labeled data. A hybrid network that mixes an autoencoder, capable of extracting information from unlabeled data, and a neural network classifier, which incorporates information from labeled data, can be useful in a label-deficient setting. In this case study, we examine...
This paper proposes a neural network model and learning algorithm that can be applied to encode words. The model performs word encoding and decoding, which can be applied to text encryption/decryption and word-based compression. The model is based on Deep Belief Networks (DBNs) and differs from traditional DBNs in that it is asymmetrically structured and its output is a binary...
This work proposes to learn autoencoders with sparse connections. Prior studies on autoencoders enforced sparsity on the neuronal activity; these are different from our proposed approach - we learn sparse connections. Sparsity in connections helps in learning (and keeping) the important relations while trimming the irrelevant ones. We have tested the performance of our proposed method on two tasks...