Different neural networks have exhibited excellent performance on various speech processing tasks, and each usually has specific advantages and disadvantages. We propose to use a recently developed deep learning model, the recurrent convolutional neural network (RCNN), which inherits some merits of the recurrent neural network (RNN) and the convolutional neural network (CNN), for speech processing. The core...
We introduce a novel dynamic model for discrete time-series data, in which the temporal sampling may be nonuniform. The model is specified by constructing a hierarchy of Poisson factor analysis blocks, one for the transitions between latent states and the other for the emissions between latent states and observations. Latent variables are binary and linked to Poisson factor analysis via Bernoulli-Poisson...
Finding a universal set of speech attributes shared among a large number of languages is a challenging research topic in detection-based, bottom-up cross-language speech recognition. In some recent work, articulatory features are used as a universal set of speech attributes shared across many different languages. Since they are defined by humans as a set of semantic articulatory...
Cross-language transfer speech recognition aims to adapt phoneme models from a source language to recognize a target language that lacks labeled data and other linguistic resources. In this paper, a sparse auto-encoder, a deep learning method, is introduced to derive speech features shared between the source and target languages using semi-supervised learning. It can extract the shared representation of...
This paper proposes an unsupervised method for learning speech features based on Dynamic Bayesian Networks (DBNs) that accounts for the spatiotemporal dependencies in the speech signal. Although deep networks have been successfully applied to unsupervised feature learning, their structures are often fixed before learning and they fail to capture temporal representations. In this...
Dynamic Bayesian Networks (DBNs) are a subset of the probabilistic graphical models (PGMs) that includes the hidden Markov model (HMM) as a special case. One of the principal weaknesses of HMMs is their independence assumptions on the observed and hidden processes of speech. This paper proposes to use the DBN for Tibetan continuous speech recognition. The proposed approach is based on structure learning...
Research on Tibetan speech recognition is at an early stage, so it is important to study recognition algorithms adapted to Tibetan speech. This paper investigates a Tibetan speech recognition algorithm based on the dynamic Bayesian network (DBN). The algorithm is evaluated in simulation, and through comparison with the recognition algorithm based...