This work presents a binarization technique for map document images. It exploits an amalgam of global and local threshold approaches best suited for binarizing document images, such as maps, that have complex backgrounds and overlapping foreground objects. The proposed approach uses the Distance Transform (DT) and an adaptive threshold. Initially, a rough estimate of the map background is obtained using Distance...
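As a rough illustration of the local side of such a hybrid scheme, here is a minimal local-mean adaptive thresholding sketch in plain Python. The window size and offset `c` are illustrative parameters, not the paper's values, and the distance-transform background estimate is omitted:

```python
# Minimal sketch of local-mean adaptive thresholding on a grayscale
# image stored as a list of lists (values 0-255). A pixel is marked as
# foreground (1) when it is darker than its local neighbourhood mean
# minus a small constant c.
def adaptive_threshold(img, window=3, c=2):
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    r = window // 2
    for y in range(h):
        for x in range(w):
            # mean of the local neighbourhood, clipped at the borders
            ys = range(max(0, y - r), min(h, y + r + 1))
            xs = range(max(0, x - r), min(w, x + r + 1))
            vals = [img[j][i] for j in ys for i in xs]
            mean = sum(vals) / len(vals)
            out[y][x] = 1 if img[y][x] < mean - c else 0
    return out
```

Because the threshold is recomputed per pixel, a slowly varying background (as in scanned maps) does not swamp thin foreground strokes the way a single global threshold would.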
Open problems are defined differently in document image analysis than in the physical sciences, theoretical computer science, or mathematics. Instead of a formal definition, problems in DIA are stated in terms of automation of an application area (e.g., postal address reading) or a scientific subfield (e.g., image compression). The notion of a successful solution may be based on (1) the relative...
Projection methods have been used in the analysis of bitonal document images for different tasks such as page segmentation and skew correction for more than two decades. However, these algorithms are sensitive to the presence of border noise in document images. Border noise can appear along the page border due to scanning or photocopying. Over the years, several page segmentation algorithms have been...
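A projection profile of the kind these methods rely on can be sketched as follows. `line_bands` sums the black pixels per row of a bitonal image and reports contiguous runs of non-empty rows as text-line bands; border noise would add spurious mass at the ends of the profile, which is exactly the sensitivity described above:

```python
# Minimal sketch of a horizontal projection profile for a bitonal page
# (1 = black pixel): rows are summed, and contiguous runs of non-empty
# rows are reported as (start_row, end_row) text-line bands.
def line_bands(img):
    profile = [sum(row) for row in img]      # black pixels per row
    bands, start = [], None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y                        # band opens
        elif count == 0 and start is not None:
            bands.append((start, y - 1))     # band closes
            start = None
    if start is not None:
        bands.append((start, len(img) - 1))  # band runs to the last row
    return bands
```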
Web text mining is a growing research area in data mining. Interestingly, the existing Web text mining algorithms have concentrated on finding frequent patterns while discarding the less frequent ones that may contain outliers. In addition, the domain knowledge in one industry is partly different from that in others. Yet whichever domain they belong to, web texts are analyzed using the same dictionary. This...
In research on Chinese word segmentation, the BP algorithm model has many defects, such as slow convergence, a tendency to fall into local minima, and low speed and efficiency. In this paper, we propose a new particle swarm neural network algorithm (NPSO-BP) and apply it to Chinese word segmentation. The results show that the segmentation algorithm is clearly faster than...
Since traditional classification algorithms do not work well for short-text classification, we propose a search-based method employing the Naïve Bayes classification algorithm. This paper describes the whole process, including the classification algorithms, training, and evaluation. The results indicate that the classifier performs better than other methods.
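The Naïve Bayes core of such a classifier can be sketched as a standard multinomial model with Laplace smoothing. This is the textbook algorithm, not the paper's search-based variant, and tokenization is plain whitespace splitting:

```python
# Minimal sketch of a multinomial Naive Bayes text classifier with
# Laplace (add-one) smoothing over whitespace-tokenized short texts.
import math
from collections import Counter, defaultdict

def train(samples):                       # samples: [(text, label)]
    word_counts = defaultdict(Counter)    # label -> word frequencies
    label_counts = Counter()
    vocab = set()
    for text, label in samples:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab

def classify(text, model):
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best, best_lp = None, float("-inf")
    for label in label_counts:
        lp = math.log(label_counts[label] / total)       # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.lower().split():
            # add-one smoothing keeps unseen words from zeroing the score
            lp += math.log((word_counts[label][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best
```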
Traditional clustering is a powerful technique for revealing the "hot" topics among documents. However, it struggles to discover new types of events that emerge gradually. In this paper, we propose a novel model for detecting new clusters in time-streaming documents. It consists of three parts: a cluster definition based on the Multi-Representation Index Tree (MI-Tree), the new-cluster detecting...
To address the current problem of serious academic plagiarism in the web environment, this article proposes a text copy-detection algorithm based on topic bags, which uses the ideas of semantic clustering and multi-instance learning. First, a paper is divided into a three-layer construction tree: a leaf node denotes a sentence; a branch node represents a topic bag, and...
Knowledge representation is a key area of research in artificial intelligence, dealing with the proper storage and retrieval of knowledge for various useful applications. This paper demonstrates that knowledge can be easily and efficiently represented in predicate logic. The algorithm in this paper splits Urdu text/sentences into phrases/constituents and then represents these in predicate...
Script recognition is a necessary step before OCR in multilingual systems. In this paper, a novel method is proposed for identifying Farsi and Latin scripts in bilingual documents using curvature scale space features. The proposed features are rotation and scale invariant and can be used to identify scripts in different fonts. We assume that the bilingual scripts may have Farsi and...
Identification of Chinese coding type is a major and challenging issue in Chinese web content audit and analysis. In this paper, we develop a novel algorithm based on the theory of Kolmogorov complexity to identify the coding type of the Chinese characters in a given text segment. An array of text compressors is used as a filter bank to evaluate the information distance between the text under examination and the training...
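A single-compressor version of this idea is the normalized compression distance (NCD), which approximates the uncomputable Kolmogorov-based information distance with a real compressor. The sketch below uses zlib from the standard library in place of the paper's array of compressors:

```python
# Minimal sketch of a compression-based information distance:
# the normalized compression distance (NCD), using zlib at maximum
# compression level as the stand-in for Kolmogorov complexity.
import zlib

def c(data: bytes) -> int:
    # compressed length approximates the Kolmogorov complexity K(data)
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    cx, cy, cxy = c(x), c(y), c(x + y)
    # near 0 when x and y share most of their information, near 1 when not
    return (cxy - min(cx, cy)) / max(cx, cy)
```

In use, a text segment would be compared against training samples of each candidate coding type and assigned to the type with the smallest distance.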
In text categorization, feature selection is an effective dimension-reduction method. To address the excessively high dimensionality of the original feature space, the abundance of irrelevant and redundant features, and the difficulty of selecting a threshold, we propose an improved LAM feature selection algorithm (ILAMFS). First, combining the golden-section method with the LAM algorithm based on the characteristics...
Text Categorization (TC) is an important component of many information organization and information management tasks. In many TC applications, the case base grows at a fast rate, which makes the case retrieval process inefficient. Case-base maintenance learning via the GC (Generalization Capability) algorithm, which reduces the number of cases fed into the KNN algorithm, can improve efficiency...
Skew detection and correction is an important step in automated content conversion systems, on which overall system performance depends. Although there are many working solutions at present, the search for an algorithm that achieves good error rates, runs fast, and handles different layout types is still open, so new solutions for skew detection are needed. The paper at hand...
Reliable and generic methods for skew detection are a necessity for any large-scale digitization project. As one of the first processing steps, skew detection and correction heavily influences all further document analysis modules, such as geometric and logical layout analysis. This paper introduces a generic, scale-independent algorithm capable of accurately detecting the global skew angle...
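One classic way to estimate a global skew angle, in the spirit of the projection-based detectors above, is to project the black pixels onto rows rotated by each candidate angle and pick the angle whose profile is sharpest, measured below by the sum of squared row counts. The search range and step are illustrative, not this paper's method:

```python
# Minimal sketch of projection-profile skew detection: for each
# candidate angle, black pixels are binned into rotated rows; the
# angle whose profile concentrates mass into the fewest rows (highest
# sum of squared counts) is taken as the skew estimate.
import math
from collections import Counter

def estimate_skew(pixels, angles):
    """pixels: iterable of (x, y) black-pixel coordinates; angles in degrees."""
    pixels = list(pixels)
    best_angle, best_energy = None, -1.0
    for a in angles:
        rad = math.radians(a)
        # project every pixel onto rows rotated by the candidate angle
        rows = Counter(round(y * math.cos(rad) - x * math.sin(rad))
                       for x, y in pixels)
        # a sharp profile means the candidate angle aligns with text lines
        energy = sum(c * c for c in rows.values())
        if energy > best_energy:
            best_angle, best_energy = a, energy
    return best_angle
```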
This paper presents a new method for automatic text-line extraction from Arabic historical handwritten documents that exhibit overlapping and multi-touching characters. Our approach is based on block-covering analysis using an unsupervised technique. The algorithm first performs a statistical block analysis that computes the optimal number of vertical strips into which the document is decomposed...
Through research on the K-means text clustering algorithm and a semantic-based vector space model, a semantic-based K-means text clustering model is proposed to address the high dimensionality and sparsity of text data sets. The model reduces the semantic loss of the text data and improves the quality of text clustering. Experiments show that semantic-based text clustering increases...
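As the baseline such a model improves on, plain (non-semantic) K-means over term-frequency vectors with cosine similarity can be sketched as follows. The semantic weighting is deliberately omitted, and seeding with the first k documents is a simplification for determinism:

```python
# Minimal sketch of K-means text clustering: documents become
# term-frequency Counters, assignment uses cosine similarity, and
# centroids are the mean of their members' vectors.
import math
from collections import Counter

def cos(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def kmeans(docs, k, iters=10):
    vecs = [Counter(d.lower().split()) for d in docs]
    centroids = vecs[:k]                  # deterministic seeding
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vecs:
            best = max(range(k), key=lambda i: cos(v, centroids[i]))
            clusters[best].append(v)
        for i, members in enumerate(clusters):
            if members:                   # centroid = mean member vector
                merged = Counter()
                for m in members:
                    merged.update(m)
                centroids[i] = Counter({t: c / len(members)
                                        for t, c in merged.items()})
    return [max(range(k), key=lambda i: cos(v, centroids[i]))
            for v in vecs]
```

The sparsity problem the abstract mentions is visible here: Counters only store non-zero terms, but with no semantic weighting, two documents sharing no exact term have similarity zero even when they are topically related.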
General-purpose search engines take a very simple view of text documents: they treat them as bags of words. As a result, the semantics of the documents is lost after indexing. In this paper, we introduce a novel approach to improve the accuracy of Web retrieval. We utilize WordNet and the WordNet SenseRelate AllWords software as our main tools to preserve the semantics of the sentences of documents...
In recent years, the mining of text data has gradually become a new research topic, and within it the study of text clustering has attracted wide attention. This paper proposes an improved fuzzy text clustering method based on the fuzzy C-means clustering algorithm and the edit distance algorithm. We use feature evaluation to reduce the dimensionality of high-dimensional text...
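The edit distance component can be sketched with the standard dynamic-programming recurrence; the fuzzy C-means side of the method is omitted here:

```python
# Minimal sketch of the (Levenshtein) edit distance: the minimum number
# of insertions, deletions, and substitutions turning string a into b,
# computed with a rolling one-row DP table.
def edit_distance(a, b):
    prev = list(range(len(b) + 1))          # distances from the empty prefix
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                   # deletion
                           cur[j - 1] + 1,                # insertion
                           prev[j - 1] + (ca != cb)))     # substitution
        prev = cur
    return prev[-1]
```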
Classification and clustering are frequently used methods in data mining technology. This paper introduces the idea of text clustering into the study of categorization algorithms. The authors also attempt to use a self-initiated learning pattern of text categorization to design a clustering-based text categorization algorithm, with the aim of reducing the dimensionality of the training set and raising...