Search results

Items from 1 to 20 out of 31 results

chapter

Document image classification using SEMCON

Zenun Kastrati, Ali Shariq Imran

2015 20th Symposium on Signal Processing, Images and Computer Vision (STSIVA) > 1 - 6

2015 20th Symposium on Signal Processing, Images and Computer Vision (STSIVA)

In this paper, we propose a new semantic and contextual document image classification framework. The framework is composed of two main modules. The first is the text analysis module (TAM), which processes document images and extracts words from them, and the second is SEMCON, a semantic and contextual objective metric. From the list of words extracted by TAM, SEMCON...

chapter

A Typed and Handwritten Text Block Segmentation System for Heterogeneous and Complex Documents

P. Barlas, S. Adam, C. Chatelain, T. Paquet

2014 11th IAPR International Workshop on Document Analysis Systems > 46 - 50

2014 11th IAPR International Workshop on Document Analysis Systems (DAS)

This paper presents a Document Image Analysis (DIA) system able to extract homogeneous typed and handwritten text regions from complex layout documents of various types. The method is based on two connected component classification stages that successively discriminate text/non text and typed/handwritten shapes, followed by an original block segmentation method based on white rectangles detection...

article

Distributional Semantic Models for Affective Text Analysis

Nikolaos Malandrakis, Alexandros Potamianos, Elias Iosif, Shrikanth Narayanan

IEEE Transactions on Audio, Speech, and Language Processing > 2013 > 21 > 11 > 2379 - 2392

We present an affective text analysis model that can directly estimate and combine affective ratings of multi-word terms, with application to the problem of sentence polarity/semantic orientation detection. Starting from a hierarchical compositional method for generating sentence ratings, we expand the model by adding multi-word terms that can capture non-compositional semantics. The method operates...

chapter

Mathematical Formula Identification in PDF Documents

Xiaoyan Lin, Liangcai Gao, Zhi Tang, Xiaofan Lin et al.

2011 International Conference on Document Analysis and Recognition > 1419 - 1423

2011 International Conference on Document Analysis and Recognition (ICDAR)

Recognizing mathematical expressions in PDF documents is a new and important field in document analysis. It is quite different from extracting mathematical expressions in image-based documents. In this paper, we propose a novel method by combining rule-based and learning-based methods to detect both isolated and embedded mathematical expressions in PDF documents. Moreover, various features of formulas,...

chapter

A dynamic adjustment algorithm research of sentiment word weight based on context

Xu Ye-qiang, Zhu Yan-hui, Wang Wen-hua, Gao Li-chun

2011 3rd International Conference on Computer Research and Development > 3 > 19 - 22

2011 3rd International Conference on Computer Research and Development (ICCRD 2011)

The emotion tendency of a sentiment word is divided into two types: static emotion tendency and dynamic emotion tendency. A basic semantic lexicon records the static emotion tendency, but in real contexts a word's static and dynamic emotion tendencies can differ. The paper proposes a method based on a degree lexicon, a negative lexicon and the dependency relationships of sentence elements. The experimental...
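The abstract above describes adjusting a word's static polarity with degree and negative lexicons. A toy sketch of that idea follows, under loud assumptions: the three lexicons are hypothetical placeholders, the "negative lexicon" is read as a list of negation words, and a fixed window of preceding tokens stands in for the dependency relations the paper actually uses.

```python
# Hypothetical toy lexicons; a real system would use curated resources.
BASE_POLARITY = {"good": 1.0, "bad": -1.0, "happy": 1.0}  # static tendency
DEGREE = {"very": 1.5, "slightly": 0.5}                    # intensity scaling
NEGATION = {"not", "never"}                                # polarity flip


def dynamic_weights(tokens, window=2):
    """Return (word, adjusted weight) for each sentiment word in `tokens`.

    The static weight is scaled by degree words and flipped by negation
    words found among the `window` tokens preceding the sentiment word.
    This window rule is a simplification, not a dependency analysis.
    """
    result = []
    for i, tok in enumerate(tokens):
        if tok not in BASE_POLARITY:
            continue
        weight = BASE_POLARITY[tok]
        for mod in tokens[max(0, i - window):i]:
            if mod in DEGREE:
                weight *= DEGREE[mod]
            elif mod in NEGATION:
                weight = -weight
        result.append((tok, weight))
    return result
```

For example, `dynamic_weights("not very good".split())` gives `[("good", -1.5)]`: the static weight 1.0 is flipped by "not" and scaled by "very".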

chapter

Web Text Clustering Based on Concept Lattice

Yimin Shi, Jun Zhang, Xianzhong Zhang, Yanxia Li

2010 2nd International Conference on Information Engineering and Computer Science > 1 - 4

2010 2nd International Conference on Information Engineering and Computer Science (ICIECS)

Most web text clustering is based on the vector space text representation model. This produces a high-dimensional term space, which increases time complexity and loses text semantics, because the semantic relationships between terms are not considered. In this paper, a new approach is taken where a concept lattice is generated with text treated as object and terms of...

chapter

Integrating Geometric Context for Text Alignment of Handwritten Chinese Documents

Fei Yin, Qiu-Feng Wang, Cheng-Lin Liu

2010 12th International Conference on Frontiers in Handwriting Recognition > 7 - 12

2010 12th International Conference on Frontiers in Handwriting Recognition (ICFHR 2010)

The alignment of text line images with text transcript is a crucial step of handwritten document annotation. Handwritten text alignment is prone to errors due to the difficulty of character segmentation and the variability of character shape, size and position. In this paper, we propose to incorporate the geometric context of character strings to improve the alignment accuracy for offline handwritten...

chapter

Local Feature Selection for Generation of Ensembles in Text Clustering

M N Ribeiro, R B C Prudêncio

2010 Eleventh Brazilian Symposium on Neural Networks > 67 - 72

2010 Eleventh Brazilian Symposium on Neural Networks (SBRN 2010)

In the context of text clustering, global feature selection tries to identify a single subset of features which are relevant to all clusters. However, the clustering process might be improved by considering different subsets of features for locally describing each cluster. In experiments with local feature selection, it was observed that the resulting partitions were unstable but there were cohesive...

chapter

Effective object-based image retrieval using higher-level visual representation

Ismail El Sayad, Jean Martinet, Thierry Urruty, Samir Amir et al.

2010 International Conference on Machine and Web Intelligence > 218 - 224

International Conference on Machine and Web Intelligence (ICMWI 2010)

Having effective methods to access the desired images is essential nowadays, given the availability of huge amounts of digital images. The proposed approach is based on an analogy between retrieving images that contain desired objects (object-based image retrieval) and text retrieval. We propose a higher-level visual representation for object-based image retrieval that goes beyond visual appearance. The proposed...

chapter

A feature selection method for document clustering based on part-of-speech and word co-occurrence

Zitao Liu, Wenchao Yu, Yalan Deng, Yongtao Wang et al.

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery > 5 > 2331 - 2334

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD)

Feature selection is a process which chooses a subset from the original feature set according to some rules. The selected features retain their original physical meaning and provide a better understanding of the data and the learning process. However, few modern feature selection approaches take advantage of features' context information. Based on this analysis, we propose a novel feature selection method...

chapter

Keyword Extraction Using Word Co-occurrence

C Wartena, R Brussee, W Slakhorst

2010 Workshops on Database and Expert Systems Applications > 54 - 58

2010 21st International Conference on Database and Expert Systems Applications

A common strategy to assign keywords to documents is to select the most appropriate words from the document text. One of the most important criteria for a word to be selected as keyword is its relevance for the text. The tf.idf score of a term is a widely used relevance measure. While easy to compute and giving quite satisfactory results, this measure does not take (semantic) relations between words...
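The abstract above names tf.idf as the widely used baseline relevance measure. As a rough illustration, assuming the common variant tf = count / document length and idf = log(N / df) (other weightings exist), ranking a document's words by tf.idf might look like this. It does not implement the co-occurrence method the paper proposes.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: tf.idf} dict per tokenized document.

    tf  = raw count of the term / number of tokens in the document
    idf = log(N / df), where df counts documents containing the term
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    scores = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        scores.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(docs, index, k=3):
    """Pick the k highest-scoring terms of docs[index] (ties broken alphabetically)."""
    s = tfidf(docs)[index]
    return sorted(s, key=lambda t: (-s[t], t))[:k]


docs = [["text", "mining", "text"], ["image", "mining"], ["text", "image", "retrieval"]]
```

With this toy corpus, `top_keywords(docs, 2, k=1)` returns `['retrieval']`, the only term unique to the third document. A term that occurs in every document always scores 0, which is exactly the blindness to word relations that the abstract points out: tf.idf sees only counts, not how words relate to each other.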

chapter

Computer Assisted Transcription of Text Images: Results on the GERMANA Corpus and Analysis of Improvements Needed for Practical Use

Verónica Romero, Alejandro H Toselli, Enrique Vidal

2010 20th International Conference on Pattern Recognition > 2017 - 2020

2010 20th International Conference on Pattern Recognition (ICPR 2010)

We present a study of the application of Computer Assisted Transcription of Text Images (CATTI) to a task which is much closer to real applications than other tasks previously studied. The new task consists in the transcription of a new publicly available historic handwritten document, called GERMANA. A detailed analysis of the main factors influencing the system performance is presented and some strategies...

chapter

Adaptive Correction of Errors from Segmented Digital Ink Texts in Chinese Based on Context

Xi-Wen Zhang, Wei-Hua An, Yong-Gang Fu

2010 Second International Conference on Information Technology and Computer Science > 25 - 35

2010 2nd International Conference on Information Technology and Computer Science (ITCS 2010)

Digital ink texts in Chinese can neither be converted into users' desired layouts nor be recognized until their characters, lines, and paragraphs are correctly extracted. Automatically segmented digital ink texts in Chinese contain many errors, because they are free-form and mixed with other languages, and their Chinese characters have small gaps and complex structures. Paragraphs, lines,...

chapter

Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora

Richard Socher, Li Fei-Fei

2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition > 966 - 973

2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

We propose a semi-supervised model which segments and annotates images using very few labeled images and a large unaligned text corpus to relate image regions to text labels. Given photos of a sports event, all that is necessary to provide a pixel-level labeling of objects and background is a set of newspaper articles about this sport and one to five labeled images. Our model is motivated by the observation...

chapter

Recognizing Words from Source Code Identifiers Using Speech Recognition Techniques

N Madani, L Guerrouj, M Di Penta, Y Gueheneuc et al.

2010 14th European Conference on Software Maintenance and Reengineering > 68 - 77

14th European Conference on Software Maintenance and Reengineering (CSMR 2010)

The existing software engineering literature has empirically shown that a proper choice of identifiers influences software understandability and maintainability. Researchers have noticed that identifiers are one of the most important sources of information about program entities and that the semantics of identifiers guide the cognitive process. Recognizing the words forming identifiers is not an easy...
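For context, the simplest way to recover words from identifiers is to split on underscores and camel-case boundaries (the keyword list for this search includes CAMEL CASE ALGORITHM). The regex sketch below is that baseline only, not the speech-recognition-based technique this paper studies; it cannot split identifiers that lack case or underscore separators, such as `getfilename`.

```python
import re

# Runs of capitals followed by a capitalized word ("HTTPServer" -> "HTTP"),
# capitalized or lowercase words, all-caps runs, or digit runs.
_CAMEL = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_identifier(identifier):
    """Split an identifier into lowercase words at underscores,
    non-word characters, camel-case and acronym boundaries."""
    words = []
    for part in re.split(r"[_\W]+", identifier):
        words.extend(m.group(0).lower() for m in _CAMEL.finditer(part))
    return words
```

For example, `split_identifier("getHTTPResponseCode")` yields `["get", "http", "response", "code"]`, while `split_identifier("getfilename")` returns the whole identifier as a single token, which is the kind of case that motivates more elaborate approaches.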

chapter

Morpheme-based product features categorization in Chinese reviews mining

Shu Zhang, Wenjie Jia, Yingju Xia, Yao Meng et al.

2010 6th International Conference on Advanced Information Management and Service (IMS) > 324 - 329

2010 6th International Conference on Advanced Information Management and Service (IMS 2010)

Building on the analysis of product reviews, an unsupervised product feature categorization method is proposed. Morphemes, as the smallest meaningful linguistic units, are used instead of words to measure the intra-relationship among product features. Opinion words around product features are chosen to represent the inter-relationship among product features instead of full context information. The...

chapter

Evaluation of clustering algorithms for Polish Word Sense Disambiguation

B Broda, W Mazur

Proceedings of the International Multiconference on Computer Science and Information Technology > 25 - 32

2010 International Multiconference on Computer Science and Information Technology (IMCSIT 2010)

Word Sense Disambiguation in text is still a difficult problem, as the best supervised methods require laborious and costly manual preparation of training data. Thus, this work focuses on evaluating a few selected clustering algorithms in the task of Word Sense Disambiguation for Polish. We tested 6 clustering algorithms (K-Means, K-Medoids, hierarchical agglomerative clustering, hierarchical divisive...

chapter

Improving Topic Extraction in Chinese Documents Using Word Sense Disambiguation

Hongyan Song, Tianfang Yao

2009 Fourth International Conference on Innovative Computing, Information and Control (ICICIC) > 1106 - 1109

2009 Fourth International Conference on Innovative Computing, Information and Control (ICICIC 2009)

This paper reports experiments on topic extraction in Chinese documents using a feature set enriched with Word Sense Disambiguation (WSD) as semantic information. The results of these experiments suggest that incorporating WSD information into Chinese topic extraction tasks may yield improvements over models which do not use WSD information.

chapter

Automatic Domain-Ontology Relation Extraction from Semi-structured Texts

Cheng Xiao, Dequan Zheng, Yuhang Yang, Guojun Shao

2009 International Conference on Asian Language Processing > 211 - 216

2009 International Conference on Asian Language Processing (IALP 2009)

This paper presents a new method to acquire domain-ontology relations from semi-structured data sources. First, Web documents are obtained according to the co-occurrence of concept instances and attribute values. Next, formats of relation patterns are defined, and pattern instances are extracted from the Web documents, including pattern clustering and pattern combining within each cluster. Finally, relation pattern instances...

chapter

Multi-instance learning with relational information of instances

G. Herman, Getian Ye, Yang Wang, Jie Xu et al.

2009 Workshop on Applications of Computer Vision (WACV) > 1 - 7

2009 Workshop on Applications of Computer Vision (WACV 2009)

Multi-instance learning (MIL) has many applications, including image and text categorization. One of the most effective approaches to MIL is to use support vector machines with multi-instance kernels. In this paper we propose a multi-instance kernel, called MIR-kernel, that takes into account the relational information of instances when computing similarities between bags. The relational information...
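For context, a standard multi-instance kernel that ignores relations between instances is the set kernel: it sums an instance-level kernel over all instance pairs of two bags. The sketch below uses an RBF instance kernel with an arbitrarily chosen `gamma` and optional size normalization (both assumptions). It is the relation-free baseline, not the MIR-kernel proposed in this paper.

```python
import math


def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two instance feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def set_kernel(bag_x, bag_y, gamma=0.5, normalize=True):
    """Sum the instance kernel over every (x, y) pair across two bags.

    Dividing by |X| * |Y| compares the bags' mean feature embeddings,
    which keeps the kernel positive semi-definite and removes the bias
    toward large bags.
    """
    total = sum(rbf(x, y, gamma) for x in bag_x for y in bag_y)
    if normalize:
        total /= len(bag_x) * len(bag_y)
    return total


def gram_matrix(bags, gamma=0.5):
    """Bag-level Gram matrix, e.g. for an SVM with a precomputed kernel."""
    return [[set_kernel(a, b, gamma) for b in bags] for a in bags]
```

The resulting Gram matrix can be passed to any SVM implementation that accepts a precomputed kernel. Because every instance pair contributes independently, this baseline cannot use how instances within a bag relate to each other, which is the information the MIR-kernel is designed to add.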

Data set:
ieee
Keywords:
CONTEXT
FEATURE EXTRACTION
TEXT ANALYSIS


Publication type

book (30)
article (1)

Keywords

DATA MINING (12)
TRAINING (8)
NATURAL LANGUAGE PROCESSING (7)
SEMANTICS (7)
CHARACTER RECOGNITION (5)
HIDDEN MARKOV MODELS (5)
PATTERN CLUSTERING (5)
CLUSTERING ALGORITHMS (4)
DOCUMENT IMAGE PROCESSING (4)
IMAGE SEGMENTATION (4)
LEARNING (ARTIFICIAL INTELLIGENCE) (4)
OPTICAL CHARACTER RECOGNITION SOFTWARE (4)
VISUALIZATION (4)
ACCURACY (3)
COMPUTATIONAL MODELING (3)
CONTEXT MODELING (3)
DICTIONARIES (3)
HANDWRITING RECOGNITION (3)
INFORMATION RETRIEVAL (3)
KERNEL (3)
MACHINE LEARNING (3)
OPTICAL CHARACTER RECOGNITION (3)
SHAPE (3)
STATISTICAL ANALYSIS (3)
SUPPORT VECTOR MACHINES (3)
CHINESE NAMED ENTITY RECOGNITION (2)
COMPUTATIONAL LINGUISTICS (2)
ENTROPY (2)
FEATURE SELECTION (2)
HANDWRITTEN CHARACTER RECOGNITION (2)
HANDWRITTEN DOCUMENT (2)
HEURISTIC ALGORITHMS (2)
IMAGE EDGE DETECTION (2)
IMAGE RECOGNITION (2)
IMAGE REPRESENTATION (2)
INTERNET (2)
MANUALS (2)
MEASUREMENT (2)
NAMED ENTITY RECOGNITION (2)
OBJECT RECOGNITION (2)
OPINION MINING (2)
ORGANIZATIONS (2)
PART OF SPEECH (2)
PARTITIONING ALGORITHMS (2)
PIXEL (2)
PROBABILITY DISTRIBUTION (2)
SEARCH ENGINES (2)
SENTIMENT ANALYSIS (2)
TEXT CATEGORIZATION (2)
TEXT RECOGNITION (2)
VOCABULARY (2)
WORD CO-OCCURRENCE (2)
WORD PROCESSING (2)
WORD SENSE DISAMBIGUATION (2)
2-D AUTOREGRESSION (1)
2D AR MODEL COEFFICIENT ESTIMATION (1)
ABSTRACTS (1)
ACCIDENTS (1)
ADAPTIVE ERROR CORRECTION (1)
AFFECT (1)
AFFECTIVE LEXICON (1)
ALGORITHM DESIGN AND ANALYSIS (1)
ALIGNMENT ACCURACY (1)
ALLOGRAPH-BASED FEATURE EXTRACTION (1)
ANALYTICAL MODELS (1)
ANAPHORA (1)
ANNOTATION (1)
APPRAISAL (1)
ARTIFICIAL NEURAL NETWORKS (1)
AUTOMATIC DOMAIN-ONTOLOGY RELATION EXTRACTION (1)
AUTOMATIC TEXT SUMMARIZATION (1)
AUTOREGRESSIVE PROCESSES (1)
BACKGROUND (1)
BAG OF VISUAL WORDS (1)
BATTERIES (1)
BENGALI WRITER (1)
BETWEEN-CHARACTER RELATIONSHIPS (1)
BIPARTITE GRAPH (1)
BISMUTH (1)
BLANK SPACE (1)
BUILDINGS (1)
CAMEL CASE ALGORITHM (1)
CATEGORIZATION QUALITY (1)
CHARACTER RECOGNIZER (1)
CHARACTER SEGMENTATION (1)
CHARACTER STRINGS (1)
CHINESE CHARACTER (1)
CHINESE CHARACTER DESCRIPTOR (1)
CHINESE CHARACTER RECOGNITION TECHNIQUE (1)
CHINESE DOCUMENT (1)
CHINESE DOCUMENTS (1)
CHINESE NER SYSTEM (1)
CHINESE REVIEW MINING (1)
CHINESE TEXT (1)
CHINESE TOPIC EXTRACTION TASK (1)
CLICK DATA (1)
CLUSTER ENSEMBLES (1)

INFONA - science communication portal
