The deep neural network component of current hybrid speech recognizers is trained on a context of consecutive feature vectors. Here, we investigate whether the time span of this input can be extended by splitting it up and modeling it in smaller chunks. One method for this is to train a hierarchy of two networks, while the less well-known split temporal context (STC) method models the left and right...
To solve the acoustic-to-articulatory inversion problem, this paper proposes a deep bidirectional long short term memory recurrent neural network and a deep recurrent mixture density network. The articulatory parameters of the current frame may have correlations with the acoustic features many frames before or after. The traditional pre-designed fixed-length context window may be either insufficient...
Exploiting sparseness in deep neural networks is an important method for reducing the computational cost. In this paper, we study neuron sparseness in deep neural networks for acoustic modeling. For the feed-forward stage, we only activate neurons whose input values are larger than a given threshold, and set the outputs of inactive nodes to zero. Thus, only a few nonzero outputs are fed to the next...
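The thresholding step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `sparse_forward` and `nonzero_fraction` are hypothetical, and the sketch assumes that an active neuron simply passes its pre-activation value through, which the abstract does not specify.

```python
def sparse_forward(inputs, weights, biases, threshold=0.0):
    """One feed-forward layer with threshold-based neuron sparseness.

    A neuron is "active" only if its input value z = w . x + b exceeds
    `threshold`; inactive neurons output exactly zero.
    Assumption: active neurons output z unchanged (ReLU-like behaviour).
    """
    outputs = []
    for row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + b
        outputs.append(z if z > threshold else 0.0)
    return outputs


def nonzero_fraction(values):
    """Fraction of outputs that are nonzero, i.e. fed on to the next layer."""
    return sum(1 for v in values if v != 0.0) / len(values)


# Toy layer with three neurons and two inputs (values are made up).
layer_out = sparse_forward(
    inputs=[1.0, 2.0],
    weights=[[1.0, 0.0], [0.0, -1.0], [0.5, 0.5]],
    biases=[0.0, 0.0, 0.0],
    threshold=1.2,
)
print(layer_out, nonzero_fraction(layer_out))
```

Raising `threshold` leaves fewer nonzero outputs, so the next layer has fewer multiplications to perform, at the possible cost of accuracy.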
This paper presents a novel approach for enhancing the multiple sets of acoustic patterns automatically discovered from a given corpus. In a previous work it was proposed that different HMM configurations (number of states per model, number of distinct models) for the acoustic patterns form a two-dimensional space. Multiple sets of acoustic patterns automatically discovered with the HMM configurations...
The use of context-dependent targets has become standard in hybrid DNN systems for automatic speech recognition. However, we argue that despite the use of state-tying, optimising to context-dependent targets can lead to over-fitting, and that discriminating between arbitrary tied context-dependent targets may not be optimal. We propose a multitask learning method where the network jointly predicts...
Traditional utterance phonetization methods concatenate pronunciations of uncontextualized constituent words. This approach is too weak for some languages, like French, where transitions between words imply pronunciation modifications. Moreover, it makes it difficult to consider global pronunciation strategies, for instance to model a specific speaker or a specific accent. To overcome these problems,...
Long Short Term Memory Recurrent Neural Networks (LSTM RNNs), combined with hidden Markov models (HMMs), have recently been shown to outperform other acoustic models such as Gaussian mixture models (GMMs) and deep neural networks (DNNs) for large scale speech recognition. We argue that using multi-state HMMs with LSTM RNN acoustic models is an unnecessary vestige of GMM-HMM and DNN-HMM modelling since...
Both Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) have shown improvements over Deep Neural Networks (DNNs) across a wide variety of speech recognition tasks. CNNs, LSTMs and DNNs are complementary in their modeling capabilities, as CNNs are good at reducing frequency variations, LSTMs are good at temporal modeling, and DNNs are appropriate for mapping features to a more separable...
Deep neural networks (DNNs) use a cascade of hidden representations to enable the learning of complex mappings from input to output features. They are able to learn the complex mapping from text-based linguistic features to speech acoustic features, and so perform text-to-speech synthesis. Recent results suggest that DNNs can produce more natural synthetic speech than conventional HMM-based statistical...
The problem of semantic video structuring is vital for automated management of large video collections. The goal is to automatically extract from the raw data the inner structure of a video collection; so that a whole new range of applications to browse and search video collections can be derived out of this high-level segmentation. To reach this goal, we exploit techniques that consider the full...
This paper presents two speech recognition systems which use the notion of phonetic and phonological similarity to improve the robustness of phoneme recognition. The first recognition system, YASPER, uses phonetic feature extraction engines to identify phonemes based on overlap relations between phonetic features. The second system uses the CMU Sphinx 3.7 decoder based on statistical context-dependent...
Notwithstanding the significant advances in context-aware computing in pervasive computing and self-adaptive systems, there is still much more to be desired in providing better context services. The number of sensors deployed world-wide increases very rapidly. The Internet of Things, amongst others, generates vast amounts of data of many different data types. How data are used is essential to improve...
Aiming at the problem of Chinese thesaurus construction, we propose a method that uses an HMM to extract new terms from academic literature and automatically expand the entry words of a Chinese thesaurus. This method converts the new-term extraction problem into a sequence labelling problem. The HMM fully integrates the lexical and syntactic information of new terms, as well as local context information,...
This paper presents a new context dependent tone recognition method. First we suggest that there be more than five tone modes in Chinese continuous speech. We get all new tone modes by grouping all tone feature vectors to a specific number of categories. Secondly, we recognize a sentence with the new tone modes and get the new tone sequence. Finally, we find out each original tone of the sentence...
E-Reputation is gaining increasing attention among companies. Many brands are investing heavily in managing their image across the web and virtual communities. Marketers therefore try to access the large volumes of data generated by e-reputation analysis. Their main issue is detecting what is said about their brand and how it can impact their business. As social media contributes to assessing opinions...
The growing affordability of smart phones and mobile devices has only added to this trend by encouraging prolonged durations of inactivity. In this paper, we present a middleware, called the Pervasive Middleware for Activity Recognition (PEMAR) that aims to increase the level of physical activity by creating a middleware for active games on mobile devices. For the PEMAR application, we present a human...
TnT is an efficient statistical part-of-speech (POS) tagger based on a hidden Markov model. TnT stands for Trigrams‘n’Tags. The Viterbi algorithm is used to find the best tag sequence for a given observation sequence of words. TnT performs well on known word sequences, but its performance degrades as the number of unknown words increases. In this paper, we propose a method to overcome this performance...
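As a rough illustration of Viterbi decoding for HMM tagging, the sketch below finds the best tag sequence for a short sentence. It is not TnT itself: it uses a first-order (bigram) HMM rather than trigrams to stay short, the probability tables are made-up toy values, and the `unk` floor for unseen words is a placeholder assumption, not the unknown-word method proposed in the paper.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most probable tag sequence under a first-order HMM.

    Scores are kept in log space to avoid underflow. Words missing from
    an emission table get the small floor probability `unk` (assumption).
    """
    if not words:
        return []
    # best[i][t]: log-probability of the best path ending in tag t at word i
    best = [{t: math.log(start_p[t]) + math.log(emit_p[t].get(words[0], unk))
             for t in tags}]
    back = []  # back[i][t]: best previous tag for tag t at word i + 1
    for w in words[1:]:
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[-1][p] + math.log(trans_p[p][t]))
            scores[t] = (best[-1][prev] + math.log(trans_p[prev][t])
                         + math.log(emit_p[t].get(w, unk)))
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical toy model (all numbers invented for illustration).
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
EMIT = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "barks": 0.05},
    "VERB": {"barks": 0.5, "dog": 0.01},
}

print(viterbi(["the", "dog", "barks"], TAGS, START, TRANS, EMIT))
```

With a word that appears in no emission table, every tag receives the same `unk` score, so the choice falls back entirely on the transition probabilities. That is one way to see why tagging accuracy drops as unknown words become more frequent.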
In this paper, we describe a novel approach to investigate negative behavior dynamics in online social networks as epidemic phenomena. We present a finite-state machine model for time-varying epidemic dynamics, and validate this model with experiments over a large dataset of YouTube commentaries, indicating how different epidemic patterns of behavior can be tied to specific interaction patterns among...
In this paper, we analyse the emotion of children's stories at the sentence level by considering context information. We demonstrate that the emotion of a sentence is not only dependent on its content, but also affected by its neighbours in a story. A Hidden Markov Model (HMM) based method is proposed to model the emotion sequence and to detect whether a sentence is neutral or not. We show the important...
This paper describes an approach to HMM-based Thai speech synthesis using stress context. It has been shown that context related to stressed/unstressed syllable information (stress context) significantly improves the tone correctness of the synthetic speech, but there is a problem of requiring a manual context labeling process in tone modeling. To reduce costs for the stress context labeling, we propose...