Search results

chapter

Low-latency sound source separation using deep neural networks

Gaurav Naithani, Giambattista Parascandolo, Tom Barker, Niels Henrik Pontoppidan, more

2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP) > 272 - 276

2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP)

Sound source separation at low-latency requires that each incoming frame of audio data be processed at very low delay, and outputted as soon as possible. For practical purposes involving human listeners, a 20 ms algorithmic delay is the uppermost limit which is comfortable to the listener. In this paper, we propose a low-latency (algorithmic delay < 20 ms) deep neural network (DNN) based source...

chapter

What does scene text tell us?

Seiichi Uchida, Yuto Shinahara

2016 23rd International Conference on Pattern Recognition (ICPR) > 4047 - 4052

2016 23rd International Conference on Pattern Recognition (ICPR)

Scene text is one of the most important information sources for our daily life because it has particular functions such as disambiguation and navigation. In contrast, ordinary document text has no such function. Consequently, it is natural to have a hypothesis that scene text and document text have different characteristics. This paper tries to prove this hypothesis by semantic analysis of texts by...

chapter

Distinguishing text/non-text natural images with Multi-Dimensional Recurrent Neural Networks

Pengyuan Lyu, Baoguang Shi, Chengquan Zhang, Xiang Bai

2016 23rd International Conference on Pattern Recognition (ICPR) > 3981 - 3986

2016 23rd International Conference on Pattern Recognition (ICPR)

In this paper, we focus on the text/non-text classification problem: distinguishing images that contain text from a lot of natural images. To this end, we propose a novel neural network architecture, termed Convolutional Multi-Dimensional Recurrent Neural Network (CMDRNN), which distinguishes text/non-text images by classifying local image blocks, taking both region pixels and dependencies among blocks...

chapter

Context-aware mathematical expression recognition: An end-to-end framework and a benchmark

Wenhao He, Yuxuan Luo, Fei Yin, Han Hu, more

2016 23rd International Conference on Pattern Recognition (ICPR) > 3246 - 3251

2016 23rd International Conference on Pattern Recognition (ICPR)

In this paper we propose a novel end-to-end framework for mathematical expression (ME) recognition. The method uses a convolutional neural network (CNN) to perform mathematical symbol detection and recognition simultaneously incorporating spatial context, and can handle multi-part and touching symbols effectively. To evaluate the performance, we provide a benchmark that contains MEs both from real-life...

chapter

Context-regularized learning of fully convolutional networks for scene labeling

Anirban Roy, Sinisa Todorovic, Longin Jan Latecki

2016 23rd International Conference on Pattern Recognition (ICPR) > 3751 - 3756

2016 23rd International Conference on Pattern Recognition (ICPR)

This paper addresses the problem of pixel-wise semantic labeling of images. To this end, we use a fully convolutional network (FCN) whose inputs are raw pixels and whose outputs are pixel labels. Our key novelty is that we regularize the supervised learning of the FCN, such that the FCN correctly predicts pixel labels and additionally does not violate a given set of spatial object relationships of interest. The frequency...

chapter

A log-linear weighting approach in the Word2vec space for spoken language understanding

Janod Killian, Mohamed Morchid, Richard Dufour, Georges Linares

2016 IEEE Spoken Language Technology Workshop (SLT) > 356 - 361

2016 IEEE Spoken Language Technology Workshop (SLT)

This paper proposes an original method which integrates contextual information of words into Word2vec neural networks that learn from words and their respective context windows. In the classical word embedding approach, context windows are represented as bag-of-words, i.e. every word in the context is treated equally. A log-linear weighting approach modeling the continuous context is proposed in our...

chapter

Automatic optimization of data perturbation distributions for multi-style training in speech recognition

Mortaza Doulaty, Richard Rose, Olivier Siohan

2016 IEEE Spoken Language Technology Workshop (SLT) > 21 - 27

2016 IEEE Spoken Language Technology Workshop (SLT)

Speech recognition performance using deep neural network based acoustic models is known to degrade when the acoustic environment and the speaker population in the target utterances are significantly different from the conditions represented in the training data. To address these mismatched scenarios, multi-style training (MTR) has been used to perturb utterances in an existing uncorrupted and potentially...

chapter

Max-pooling loss training of long short-term memory networks for small-footprint keyword spotting

Ming Sun, Anirudh Raju, George Tucker, Sankaran Panchapagesan, more

2016 IEEE Spoken Language Technology Workshop (SLT) > 474 - 480

2016 IEEE Spoken Language Technology Workshop (SLT)

We propose a max-pooling based loss function for training Long Short-Term Memory (LSTM) networks for small-footprint keyword spotting (KWS), with low CPU, memory, and latency requirements. The max-pooling loss training can be further guided by initializing with a cross-entropy loss trained network. A posterior smoothing based evaluation approach is employed to measure keyword spotting performance...

chapter

Approaches for language identification in mismatched environments

Shahan Nercessian, Pedro Torres-Carrasquillo, Gabriel Martinez-Montes

2016 IEEE Spoken Language Technology Workshop (SLT) > 335 - 340

2016 IEEE Spoken Language Technology Workshop (SLT)

In this paper, we consider the task of language identification in the context of mismatch conditions. Specifically, we address the issue of using unlabeled data in the domain of interest to improve the performance of a state-of-the-art system. The evaluation is performed on a 9-language set that includes data in both conversational telephone speech and narrowband broadcast speech. Multiple experiments...

chapter

Intent detection using semantically enriched word embeddings

Joo-Kyung Kim, Gokhan Tur, Asli Celikyilmaz, Bin Cao, more

2016 IEEE Spoken Language Technology Workshop (SLT) > 414 - 419

2016 IEEE Spoken Language Technology Workshop (SLT)

State-of-the-art targeted language understanding systems rely on deep learning methods using 1-hot word vectors or off-the-shelf word embeddings. While word embeddings can be enriched with information from semantic lexicons (such as WordNet and PPDB) to improve their semantic representation, most previous research on word-embedding enriching has focused on improving intrinsic word-level tasks such...

chapter

Abstractive headline generation for spoken content by attentive recurrent neural networks with ASR error modeling

Lang-Chi Yu, Hung-yi Lee, Lin-shan Lee

2016 IEEE Spoken Language Technology Workshop (SLT) > 151 - 157

2016 IEEE Spoken Language Technology Workshop (SLT)

Headline generation for spoken content is important since spoken content is difficult to display on a screen and for the user to browse. It is a special type of abstractive summarization, for which the summaries are generated word by word from scratch without using any part of the original content. Many deep learning approaches for headline generation from text documents have been proposed recently,...

chapter

Using paraphrases to improve tweet classification: Comparing WordNet and word embedding approaches

Quanzhi Li, Sameena Shah, Mohammad Ghassemi, Rui Fang, more

2016 IEEE International Conference on Big Data (Big Data) > 4014 - 4016

2016 IEEE International Conference on Big Data (Big Data)

Two of the major problems in social media message classification are the data sparseness issue and the high degree of lexical variation. Paraphrases, or synonyms, are alternative ways of expressing the same meaning using different lexical variations. In this study, we try to use paraphrases to improve tweet topic classification performance. We explored two approaches to generating paraphrases, WordNet,...

chapter

WEFEST: Word Embedding Feature Extension for Short Text Classification

Lei Sang, Fei Xie, Xiaojian Liu, Xindong Wu

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW) > 677 - 683

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)

Short text classification is a crucial task for information retrieval, social media text categorization, and many other applications. In reality, due to the inherent sparsity and the limited information available in the short texts, learning and classifying short texts is a significant challenge. In this paper, we propose a new framework, WEFEST, which expands short texts using word embedding for...

chapter

Bengali word embeddings and it's application in solving document classification problem

Adnan Ahmad, Mohammad Ruhul Amin

2016 19th International Conference on Computer and Information Technology (ICCIT) > 425 - 430

2016 19th International Conference on Computer and Information Technology (ICCIT)

In this paper, we present Bengali word embeddings and their application in the classification of news documents. Word embeddings are multi-dimensional vectors that can be created by exploiting the linguistic context of the words in a large corpus. To generate the embeddings, we collected Bengali news documents from the last five years from the major daily newspapers. Word embeddings are generated using the...

chapter

Power system distributed dynamic state prediction

Md. Ashfaqur Rahman, Ganesh Kumar Venayagamoorthy

2016 IEEE Symposium Series on Computational Intelligence (SSCI) > 1 - 7

2016 IEEE Symposium Series on Computational Intelligence (SSCI)

The security of the power system can be enhanced with the prediction of the dynamic state variables. To increase the security, a distributed predictor is developed based on the Elman Recurrent Neural Network (ERNN) in this study. To develop a scalable distributed predictor, the whole network is divided into a number of ERNNs. They take the current and the previous actual states from its own and its...

chapter

On line emotion detection using retrainable deep neural networks

Dimitrios Kollias, Athanasios Tagaris, Andreas Stafylopatis

2016 IEEE Symposium Series on Computational Intelligence (SSCI) > 1 - 8

2016 IEEE Symposium Series on Computational Intelligence (SSCI)

This paper presents a new methodology for detecting deterioration in performance of deep neural networks when applied to on line visual analysis problems and enabling fine-tuning, or retraining, of the network to the current data characteristics. Pre-trained deep neural networks which have a satisfactory performance on the problem under study constitute the basis of the approach, with efficient transfer...

chapter

Jointly learning to align and convert graphemes to phonemes with neural attention models

Shubham Toshniwal, Karen Livescu

2016 IEEE Spoken Language Technology Workshop (SLT) > 76 - 82

2016 IEEE Spoken Language Technology Workshop (SLT)

We propose an attention-enabled encoder-decoder model for the problem of grapheme-to-phoneme conversion. Most previous work has tackled the problem via joint sequence models that require explicit alignments for training. In contrast, the attention-enabled encoder-decoder model allows for jointly learning to align and convert characters to phonemes. We explore different types of attention models, including...

chapter

CoCo (Context vs. Content): Behavior-Inspired Social Media Recommendation for Mobile Apps

Bowen Yang, Chao Wu, Stephan Sigg, Yaoxue Zhang

2016 IEEE Global Communications Conference (GLOBECOM) > 1 - 6

GLOBECOM 2016 - 2016 IEEE Global Communications Conference

Exponential growth of media generated in online social networks demands effective recommendation to improve the efficiency of media access, especially for mobile users. In particular, content, objective quality or general popularity are less decisive for the prediction of user-click behavior than friendship-conditioned patterns. Existing recommender systems, however, rarely consider user behavior in...

chapter

Scaling character-based morphological tagging to fourteen languages

Georg Heigold, Josef van Genabith, Gunter Neumann

2016 IEEE International Conference on Big Data (Big Data) > 3895 - 3902

2016 IEEE International Conference on Big Data (Big Data)

This paper investigates neural character-based morphological tagging for languages with complex morphology and large tag sets. Character-based approaches are attractive as they can handle rarely- and unseen words gracefully. More specifically, beside a rich morphology, non-canonical language, change of language or other linguistic variability can heavily degrade the accuracy of natural language processing...

chapter

Using the IBM Watson cognitive system in educational contexts

Ilianna Kollia, Georgios Siolas

2016 IEEE Symposium Series on Computational Intelligence (SSCI) > 1 - 8

2016 IEEE Symposium Series on Computational Intelligence (SSCI)

In the current paper we describe how Watson Experience Manager (WEM), an industrial Question Answering (QA) tool developed by IBM, has been used in an educational context at the National Technical University of Athens (NTUA). During the postgraduate course on Data Science, three student teams experimented with WEM's QA capabilities on three different topics, namely, Nutrition, Autism and New York...

INFONA - science communication portal
