Previous studies have reached no consensus on whether perception and production are significantly correlated. A large number of empirical studies have been conducted on first and second languages from different language families. However, few studies have examined the perception-production relation in Chinese learners of English. Therefore, in the current study, under the theoretical...
Stress is an important parameter for prosody processing in speech synthesis. However, it is not easy to predict stress from text analysis because the relevant information is complex. In this paper, we explore the novel use of continuous lexical embeddings and a bidirectional long short-term memory recurrent neural network (BLSTM) model for sentential stress prediction in Mandarin speech synthesis. We look at augmenting...
This paper presents a deep neural network (DNN)-based unit selection method for waveform concatenation speech synthesis using frame-sized speech segments. In this method, three DNNs are adopted to calculate target costs and concatenation costs respectively for selecting frame-sized candidate units. The first DNN is built in the same way as the DNN-based statistical parametric speech synthesis, which...
Punctuation plays an important role in language processing. However, automatic speech recognition systems only output plain word sequences. It is therefore of interest to predict punctuation marks for plain word sequences. Previous work has focused on using lexical features or prosodic cues captured from small corpora to predict simple punctuation. Compared with simple punctuation, rich punctuation provides...
Model-based VAD approaches have been widely used and have achieved success in practice. These approaches usually cast VAD as a frame-level classification problem and employ statistical classifiers, such as a Gaussian Mixture Model (GMM) or Deep Neural Network (DNN), to assign a speech/silence label to each frame. Because the classification assumes that frames are independent, the VAD results tend to be fragile....
Although the uni-directional recurrent neural network language model (RNNLM) has been very successful, it is hard to train a bi-directional RNNLM properly due to the generative nature of language models. In this work, we propose to train a bi-directional RNNLM with noise contrastive estimation (NCE), since the properties of NCE training help the model to achieve sentence-level normalization. Experiments...
Recently, several fast speaker adaptation methods have been proposed for the hybrid DNN-HMM models based on the so-called discriminative speaker codes (SC) [1-3] and applied to unsupervised speaker adaptation in speech recognition [4]. It has been demonstrated that the SC based methods are quite effective in adapting DNNs even when only a very small amount of adaptation data is available. However,...
This paper presents a data-driven approach towards the modeling of agent behaviors in a full-fledged, commercial off-the-shelf simulation milieu for tactical military training. The modeling approach employs machine learning to identify behavioral rules and patterns in data. Potential advantages of this approach are that it may improve modeling efficiency and, perhaps more importantly, increase the...
Irony is something most people can tell is there when they see it, but it is not so easy to define, let alone detect automatically. In this paper we describe the construction of a balanced corpus of ironic vs. serious watch reviews and show the promising results achieved by classifiers trained on this corpus in predicting the presence of irony or lack thereof in product reviews from a manually labeled corpus...
Message-level and word-level polarity classification are two popular tasks in Twitter sentiment analysis. They have been commonly addressed by training supervised models from labelled data. The main limitation of these models is the high cost of data annotation. Transferring existing labels from a related problem domain is one possible solution for this problem. In this paper, we propose a simple...
Text line detection and localisation is a crucial step for full page document analysis, but still suffers from heterogeneity of real life documents. In this paper, we present a novel approach for text line localisation based on Convolutional Neural Networks and Multidimensional Long Short-Term Memory cells as a regressor in order to predict the coordinates of the text line bounding boxes directly...
One of the major challenges in opportunistic networks is the correct identification of a transmission opportunity and its corresponding duration. In this work, recurrent neural network structures are investigated for transmission opportunity forecast. The proposed method is based on in-channel spectrum sensing and the use of Elman recurrent neural network to model the occupation of the channel. The...
In this paper we will present our investigations related to contextual modeling for HMM-based handwritten Arabic text recognition. We will, first, discuss the justifications and the need for contextual modeling for handwritten Arabic text recognition. Next, we will discuss the issues related to contextual modeling for Arabic text recognition. Finally, we will present our novel class-based contextual...
Unlike in French and English, the richness and ambiguity of written Arabic text cause a great many errors. The purpose of this article is to address the tolerance of some errors in Arabic texts and to develop a system that automatically detects and corrects those errors. This work combines the Levenshtein Distance (LD) with bi-context language models based on...
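As a rough illustration of the Levenshtein Distance named in the abstract above, the following Python sketch computes the edit distance between two strings and uses it to rank correction candidates. The function names, the `max_distance` threshold, and the example words are assumptions made for illustration; they are not taken from the paper, and the paper's bi-context language models are not shown here.

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `source` into `target`."""
    # previous[j] is the distance between the prefix of `source`
    # processed so far and the first j characters of `target`.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # distance from source[:i] to the empty string
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def rank_candidates(word, vocabulary, max_distance=2):
    """Hypothetical helper: vocabulary entries within `max_distance`
    edits of `word`, closest first."""
    scored = sorted((levenshtein(word, cand), cand) for cand in vocabulary)
    return [cand for dist, cand in scored if dist <= max_distance]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(rank_candidates("speach", ["speech", "spear", "synthesis"]))
```

In a correction system of the kind the abstract describes, a distance-based candidate list like this would typically be re-ranked using context, which is the role the abstract assigns to the language models.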
Multidimensional long short-term memory recurrent neural networks achieve impressive results for handwriting recognition. However, with current CPU-based implementations, their training is very expensive and thus their capacity has so far been limited. We release an efficient GPU-based implementation which greatly reduces training times by processing the input in a diagonal-wise fashion. We use this...
This paper presents a new representation for handwritten math formulae: a Line-of-Sight (LOS) graph over handwritten strokes, computed using stroke convex hulls. Experimental results using the CROHME 2012 and 2014 datasets show that LOS graphs capture the visual structure of handwritten formulae better than commonly used graphs such as Time-series, Minimum Spanning Trees, and k-Nearest Neighbor graphs...
Agent-based modeling is a paradigm of modeling dynamic systems of interacting agents that are individually governed by specified behavioral rules. Training a model of such agents to produce an emergent behavior by specification of the emergent (as opposed to agent) behavior is easier from a demonstration perspective. While many approaches involve manual behavior specification via code or reliance...
Acoustic scene classification (ASC) has attracted growing research interest in recent years. Whereas the previous work has investigated closed-set classification scenarios, the predominant ASC application is open-set in nature. The contributions of the paper are (i) the first investigation of ASC in an open-set scenario, (ii) the formulation of open-set ASC as a detection problem, (iii) a classifier...
While off-the-shelf OCR systems work well on many modern documents, the heterogeneity of early prints provides a significant challenge. To achieve good recognition quality, existing software must be “trained” specifically to each particular corpus. This is a tedious process that involves significant user effort. In this paper we demonstrate a system that generically replaces a common part of the training...
This article presents reflections from the author's mature body of work that resulted in sizeable national (Denmark) and international (European) funded projects, a patent, a commercial product, and a Serious Games company. The main focus is on sharing a two-stage in-action and on-action emergent model for evaluating the use of ICT (serious games and creative expression) in healthcare and learning intervention...