The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Sound source separation at low-latency requires that each incoming frame of audio data be processed at very low delay, and outputted as soon as possible. For practical purposes involving human listeners, a 20 ms algorithmic delay is the uppermost limit which is comfortable to the listener. In this paper, we propose a low-latency (algorithmic delay < 20 ms) deep neural network (DNN) based source...
Scene text is one of the most important information sources for our daily life because it has particular functions such as disambiguation and navigation. In contrast, ordinary document text has no such function. Consequently, it is natural to have a hypothesis that scene text and document text have different characteristics. This paper tries to prove this hypothesis by semantic analysis of texts by...
In this paper, we focus on the text/non-text classification problem: distinguishing images that contain text from a lot of natural images. To this end, we propose a novel neural network architecture, termed Convolutional Multi-Dimensional Recurrent Neural Network (CMDRNN), which distinguishes text/non-text images by classifying local image blocks, taking both region pixels and dependencies among blocks...
In this paper we propose a novel end-to-end framework for mathematical expression (ME) recognition. The method uses a convolutional neural network (CNN) to perform mathematical symbol detection and recognition simultaneously incorporating spatial context, and can handle multi-part and touching symbols effectively. To evaluate the performance, we provide a benchmark that contains MEs both from real-life...
This paper addresses the problem of pixel-wise semantic labeling of images. To this end, we use a fully convolutional network (FCN) whose input are raw pixels, and output are pixel labels. Our key novelty is that we regularize a supervised learning of FCN, such that FCN correctly predicts pixel labels and additionally does not violate a given set of spatial object relationships of interest. The frequency...
This paper proposes an original method which integrates contextual information of words into Word2vec neural networks that learn from words and their respective context windows. In the classical word embedding approach, context windows are represented as bag-of-words, i.e. every word in the context is treated equally. A log-linear weighting approach modeling the continuous context is proposed in our...
Speech recognition performance using deep neural network based acoustic models is known to degrade when the acoustic environment and the speaker population in the target utterances are significantly different from the conditions represented in the training data. To address these mismatched scenarios, multi-style training (MTR) has been used to perturb utterances in an existing uncorrupted and potentially...
We propose a max-pooling based loss function for training Long Short-Term Memory (LSTM) networks for small-footprint keyword spotting (KWS), with low CPU, memory, and latency requirements. The max-pooling loss training can be further guided by initializing with a cross-entropy loss trained network. A posterior smoothing based evaluation approach is employed to measure keyword spotting performance...
In this paper, we consider the task of language identification in the context of mismatch conditions. Specifically, we address the issue of using unlabeled data in the domain of interest to improve the performance of a state-of-the-art system. The evaluation is performed on a 9-language set that includes data in both conversational telephone speech and narrowband broadcast speech. Multiple experiments...
State-of-the-art targeted language understanding systems rely on deep learning methods using 1-hot word vectors or off-the-shelf word embeddings. While word embeddings can be enriched with information from semantic lexicons (such as WordNet and PPDB) to improve their semantic representation, most previous research on word-embedding enriching has focused on improving intrinsic word-level tasks such...
Headline generation for spoken content is important since spoken content is difficult to be shown on the screen and browsed by the user. It is a special type of abstractive summarization, for which the summaries are generated word by word from scratch without using any part of the original content. Many deep learning approaches for headline generation from text document have been proposed recently,...
Two of the major problems in social media message classification are the data sparseness issue and the high degree of lexical variation. Paraphrases, or synonyms, are alternative ways of expressing the same meaning using different lexical variations. In this study, we try to use paraphrases to improve tweet topic classification performance. We explored two approaches to generating paraphrases, WordNet,...
Short text classification is a crucial task for information retrieval, social medial text categorization, and many other applications. In reality, due to the inherent sparsity and the limited information available in the short texts, learning and classifying short texts is a significant challenge. In this paper, we propose a new framework, WEFEST, which expands short texts using word embedding for...
In this paper, we present Bengali word embeddings and it's application in the classification of news documents. Word embeddings are multi-dimensional vectors that can be created by exploiting the linguistic context of the words in large corpus. To generate the embeddings, we collected Bengali news document of last five years from the major daily newspapers. Word embeddings are generated using the...
The security of the power system can be enhanced with the prediction of the dynamic state variables. To increase the security, a distributed predictor is developed based on the Elman Recurrent Neural Network (ERNN) in this study. To develop a scalable distributed predictor, the whole network is divided in a number of ERNNs. They take the current and the previous actual states from its own and its...
This paper presents a new methodology for detecting deterioration in performance of deep neural networks when applied to on line visual analysis problems and enabling fine-tuning, or retraining, of the network to the current data characteristics. Pre-trained deep neural networks which have a satisfactory performance on the problem under study constitute the basis of the approach, with efficient transfer...
We propose an attention-enabled encoder-decoder model for the problem of grapheme-to-phoneme conversion. Most previous work has tackled the problem via joint sequence models that require explicit alignments for training. In contrast, the attention-enabled encoder-decoder model allows for jointly learning to align and convert characters to phonemes. We explore different types of attention models, including...
Exponential growth of media generated in online social networks demands effective recommendation to improve the efficiency of media access especially for mobile users. In particular, content, objective quality or general popularity are less decisive for the prediction of user-click behavior than friendship-conditioned patterns. Existing recommender systems however, rarely consider user behavior in...
This paper investigates neural character-based morphological tagging for languages with complex morphology and large tag sets. Character-based approaches are attractive as they can handle rarely- and unseen words gracefully. More specifically, beside a rich morphology, non-canonical language, change of language or other linguistic variability can heavily degrade the accuracy of natural language processing...
In the current paper we describe how Watson Experience Manager (WEM), an industrial Question Answering (QA) tool developed by IBM, has been used in an educational context at the National Technical University of Athens (NTUA). During the postgraduate course on Data Science, three student teams experimented with WEM's QA capabilities on three different topics, namely, Nutrition, Autism and New York...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.