Search results

Items from 1 to 20 out of 32 results

chapter

Development and study of clustering algorithms for large sets of data

Yu Stekh, M Lobur, F M E Sardieh, M Dombrova, et al.

2011 11th International Conference The Experience of Designing and Application of CAD Systems in Microelectronics (CADSM) > 202 - 204

2011 11th International Conference The Experience of Designing and Application of CAD Systems in Microelectronics (CADSM 2011)

This paper focuses on document clustering algorithms that build hierarchical solutions. It evaluates the performance of different criterion functions for the problem of clustering documents.

chapter

An Improved Data Clustering Algorithm for Mining Web Documents

O H Odukoya, G A Aderounmu, E R Adagunodo

2010 International Conference on Computational Intelligence and Software Engineering > 1 - 8

2010 International Conference on Computational Intelligence and Software Engineering (CiSE 2010)

This paper formulates, simulates and assesses an improved data clustering algorithm for mining web documents with a view to preserving their conceptual similarities and eliminating the problem of speed while increasing accuracy. The improved data clustering algorithm was formulated using the concept of the K-means algorithm. Real and artificial datasets were used to test the proposed and existing algorithm...

chapter

Semi-supervised PLSA for Document Clustering

Lingfeng Niu, Yong Shi

2010 IEEE International Conference on Data Mining Workshops > 1196 - 1203

2010 10th IEEE International Conference on Data Mining Workshops (ICDMW 2010)

By utilizing the must-link or cannot-link pairwise constraints in data, semi-supervised clustering improves the performance of unsupervised clustering significantly. A number of semi-supervised clustering algorithms have been proposed to consider such pairwise constraints. However, most of them assign a hard label to each data item and produce little information about the cluster itself. In this...
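
As an illustration of the pairwise constraints mentioned in this abstract, the following is a minimal sketch, not the authors' method: it only checks which must-link and cannot-link constraints a given hard labelling violates. The function name `violated_constraints` and the data layout (constraints as index pairs) are assumptions made for this example.

```python
def violated_constraints(labels, must_link, cannot_link):
    """Return the pairwise constraints that a hard clustering violates.

    labels      -- list where labels[i] is the cluster id of item i
    must_link   -- iterable of (i, j) pairs that should share a cluster
    cannot_link -- iterable of (i, j) pairs that should be separated
    (All names here are hypothetical, chosen for illustration.)
    """
    violated = []
    for i, j in must_link:
        # A must-link pair is violated when the items land in different clusters.
        if labels[i] != labels[j]:
            violated.append(("must-link", i, j))
    for i, j in cannot_link:
        # A cannot-link pair is violated when the items land in the same cluster.
        if labels[i] == labels[j]:
            violated.append(("cannot-link", i, j))
    return violated


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    print(violated_constraints(labels, [(0, 1), (1, 2)], [(0, 3), (2, 3)]))
```

A semi-supervised algorithm would typically use such constraints during clustering (for example as a penalty term), rather than only checking them afterwards as this sketch does.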

chapter

The PARIS Algorithm for Determining Latent Topics

M Aharon, I Cohen, A Itskovitch, I Marhaim, et al.

2010 IEEE International Conference on Data Mining Workshops > 1092 - 1099

2010 10th IEEE International Conference on Data Mining Workshops (ICDMW 2010)

We introduce a new method for discovering latent topics in sets of objects, such as documents. Our method, which we call PARIS (for Principal Atoms Recognition In Sets), aims to detect principal sets of elements, representing latent topics in the data, that tend to appear frequently together. These latent topics, which we refer to as 'atoms', are used as the basis for clustering, classification, collaborative...

chapter

Clustering Algorithm on Block Division of Documents

Gang Liu, Mingyue Luo

2010 6th International Conference on Wireless Communications Networking and Mobile Computing (WiCOM) > 1 - 4

2010 6th International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM)

In the traditional K-means algorithm, the selection of the number of clusters and the initial cluster centers strongly affects the quality of clustering. To reduce the dependence on the initial centers and to locate the types of new data rapidly, an algorithm applicable to text data is proposed. In this algorithm, document density is used as a parameter. Documents are divided into blocks first...

chapter

CQIG: An Improved Web Search Results Clustering Algorithm

Ren Yong-gong, Fan Dan

2010 Seventh Web Information Systems and Applications Conference > 75 - 78

2010 7th Web Information Systems and Applications Conference (WISA 2010). Workshop on Semantic Web and Ontology (SWON2010). Workshop on Electronic Government Technology and Application (EGTA 2010)

Massive linear search results returned from traditional search engines bring much inconvenience to users when they extract the information they need. Search result clustering is critically needed for grouping documents on similar topics. The existing algorithm has drawbacks in cluster label screening, cluster quality assessment, and control of overlapping clusters. The improved clustering algorithm, CQIG,...

chapter

Clustering GML documents using maximal frequent induced subtrees

Ying-wen Zhu, Gen-lin Ji, Qin-hong Sun

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery > 5 > 2265 - 2269

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD)

An algorithm, TBCClustering, is presented in the paper for clustering GML documents using maximal frequent induced subtree patterns. TBCClustering mines the maximal frequent induced subtrees using the structural information of GML documents, obtains the best minimum support automatically, and then chooses a set of subtree patterns to form the optimistic clustering features. Finally it uses CLOPE...

chapter

XML Documents Clustering Research Based on Weighted Cosine Measure

Li Wei, Li Xiong-fei, Zhao Yan

2010 Fifth International Conference on Frontier of Computer Science and Technology > 95 - 100

2010 Fifth International Conference on Frontier of Computer Science and Technology (FCST 2010)

Recently, a large amount of work has been done in XML data mining. However, most of the existing work focuses on snapshot XML data, while XML data is dynamic in practical applications. In order to mine knowledge hidden in the frozen structures (FS), which change little or not at all during the historical changing process of an XML document, we present a method for clustering XML documents...

chapter

Community structure of the Chinese document network based on content similarity

Xin Pan, Jian-Guo Liu, Guishi Deng

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery > 4 > 1515 - 1519

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD)

Based on complex network theory, we propose a clustering algorithm based on content similarity. Firstly, the Chinese documents are represented by the vector-space model, and the content similarity between any two documents is computed by the cosine similarity. Consequently, the network node is defined as a document, and the edge weight is defined as the similarity obtained by the cosine similarity...
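
The document-network construction described in this abstract can be sketched roughly as follows. This is an illustrative assumption, not the paper's implementation: documents are tokenized by whitespace (Chinese text would first need word segmentation), term counts stand in for the vector-space weights, and the names `term_vector`, `cosine_similarity`, `similarity_network` and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words vector: term -> raw count (whitespace tokenization assumed)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def similarity_network(docs, threshold=0.0):
    """Build a weighted graph: nodes are document indices, edge weights are
    cosine similarities; pairs at or below `threshold` get no edge."""
    vectors = [term_vector(d) for d in docs]
    edges = {}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            weight = cosine_similarity(vectors[i], vectors[j])
            if weight > threshold:
                edges[(i, j)] = weight
    return edges


if __name__ == "__main__":
    docs = ["text clustering method", "clustering method for text", "gene expression data"]
    for pair, weight in similarity_network(docs).items():
        print(pair, round(weight, 3))
```

Community detection or any graph clustering step would then run on the resulting edge dictionary; that step is outside this sketch.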

chapter

Web document clustering based on Global-Best Harmony Search, K-means, Frequent Term Sets and Bayesian Information Criterion

C Cobos, J Andrade, W Constain, M Mendoza, et al.

IEEE Congress on Evolutionary Computation > 1 - 8

2010 IEEE Congress on Evolutionary Computation

This paper introduces a new description-centric algorithm for web document clustering based on the hybridization of the Global-Best Harmony Search with the K-means algorithm, Frequent Term Sets and Bayesian Information Criterion. The new algorithm defines the number of clusters automatically. The Global-Best Harmony Search provides a global strategy for a search in the solution space, based on the...

chapter

A comparison of two suffix tree-based document clustering algorithms

M Rafi, M Maujood, Murtaza Munawar Fazal, Syed Muhammad Ali

2010 International Conference on Information and Emerging Technologies > 1 - 5

2010 International Conference on Information and Emerging Technologies (ICIET)

Document clustering is an unsupervised approach extensively used to navigate, filter, summarize and manage large collections of documents, such as the World Wide Web (WWW). Recently, the focus in this domain has shifted from traditional vector-based document similarity for clustering to suffix tree-based document similarity, as it offers a more semantic representation of the text present in the document...

chapter

Effects of Similarity Metrics on Document Clustering

Kazem Taghva, Rushikesh Veni

2010 Seventh International Conference on Information Technology: New Generations > 222 - 226

Seventh International Conference on Information Technology: New Generations (ITNG 2010)

Document clustering, or unsupervised document classification, is an automated process of grouping documents with similar content. A typical technique uses a similarity function to compare documents. In the literature, many similarity functions, such as dot product or cosine measures, are proposed for the comparison operator. In this paper, we evaluate the effects of many similarity functions on k-mean...

chapter

System for a cluster analysis

Yuri Stekh, Fajsal M E Sardieh, Mykhaylo Lobur

2010 International Conference on Modern Problems of Radio Engineering, Telecommunications and Computer Science (TCSET) > 236

2010 International Conference on "Modern Problems of Radio Engineering, Telecommunications and Computer Science" (TCSET 2010)

Summary form only given. This paper focuses on the structure of a dialog graphical system for pattern recognition with the help of a distance function. It evaluates the performance of different criterion functions and algorithms for the problem of clustering large datasets.

chapter

A Document Clustering Algorithm for Web Search Engine Retrieval System

Hongwei Yang

2010 International Conference on e-Education, e-Business, e-Management and e-Learning > 383 - 386

2010 International Conference on e-Education, e-Business, e-Management, and e-Learning, (IC4E)

As the number of available Web pages grows, it becomes more difficult for users to find documents relevant to their interests. Clustering is the classification of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often proximity according to some defined distance measure. It can enable users to find the relevant documents more easily and also...

chapter

A clustering algorithm based on latent semantic model

Bu-Yu Wang, Mei-An Li, Yong-Jiang Wang

2009 International Conference on Apperceiving Computing and Intelligence Analysis > 44 - 48

2009 International Conference on Apperceiving Computing and Intelligence Analysis (ICACIA 2009)

In order to precisely procure information about Chinese persons on the web, and especially to distinguish between namesakes, this paper proposes a clustering algorithm based on a latent semantic model. For every document, it establishes a latent semantic model of the sentence-word matrix based on central distance, central segment, document length, etc., by building a central word library of person attributes. It clusters...

chapter

XML document clustering based on common tag names anywhere in the structure

M. Alishahi, M. Ravakhah, B. Shakeriaski, M. Naghibzade

2009 14th International CSI Computer Conference > 588 - 595

2009 14th International CSI Computer Conference (CSICC 2009) (Postponed from July 2009)

One of the most effective ways to extract knowledge from large information resources is applying data mining methods. Since the amount of information on the Internet is exploding, using XML documents is common as they have many advantages. Knowledge extraction from XML documents is a way to provide more utilizable results. XCLS is one of the most efficient algorithms for XML documents clustering....

chapter

Web News Summarization via Soft Clustering Algorithm

Jianwu Wu

2009 Sixth International Conference on Fuzzy Systems and Knowledge Discovery > 7 > 618 - 621

2009 Sixth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2009)

As the information available on the internet is growing explosively, this paper proposes a new method of web news summarization via a sentence clustering algorithm. It adopts a clustering algorithm to cluster all the sentences, and feature fusion is used to extract summary sentences. Experimental results show that the proposed summarization method can improve summarization performance.

chapter

Application of Genetic Algorithm in Document Clustering

Wei Jian-Xiang, Liu Huai, Sun Yue-hong, Su Xin-Ning

2009 International Conference on Information Technology and Computer Science > 1 > 145 - 148

2009 International Conference on Information Technology and Computer Science (ITCS 2009)

By researching all kinds of methods for document clustering, we put forward a new dynamic method based on the genetic algorithm (GA). K-means is a greedy algorithm, which is sensitive to the choice of cluster centers and easily converges to a local optimum. The genetic algorithm is a global convergence algorithm, which can find the best cluster centers easily. Among the traditional document clustering...

chapter

Analysis of Book Documents' Table of Content Based on Clustering

Liangcai Gao, Zhi Tang, Xiaofan Lin, Xin Tao, et al.

2009 10th International Conference on Document Analysis and Recognition > 911 - 915

2009 10th International Conference on Document Analysis and Recognition (ICDAR)

Table of contents (TOC) recognition has attracted a great deal of attention in recent years. After reviewing the merits and drawbacks of the existing TOC recognition methods, we have observed that book documents are multi-page documents with intrinsic local format consistency. Based on this finding we introduce an automatic TOC analysis method through clustering. This method first detects the decorative...

chapter

A Robust Algorithm for Fuzzy Document Clustering

Lifei Chen, Shengrui Wang, Qingshan Jiang

2009 International Conference on Advanced Information Networking and Applications Workshops > 679 - 684

2009 IEEE 23rd International Conference on Advanced Information Networking and Applications Workshops (WAINA)

In many applications of document clustering, a document may include multiple topics and thus may relate to multiple categories at the same time. Most of the existing subspace clustering algorithms can only perform hard clustering on document collections. In this paper, a fuzzy algorithm named R-FPC is introduced for document clustering. The algorithm discovers soft partitions of a data set in the...

Data set:
ieee
Keywords:
ALGORITHM DESIGN AND ANALYSIS
PATTERN CLUSTERING
DOCUMENT HANDLING


Publication type

book (31)
article (1)

Keywords

CLUSTERING ALGORITHMS (30)
DATA MINING (20)
DOCUMENT CLUSTERING (14)
PARTITIONING ALGORITHMS (12)
INTERNET (9)
CLUSTERING (6)
CLUSTERING ALGORITHM (5)
PROBABILITY DENSITY FUNCTION (5)
CLASSIFICATION ALGORITHMS (4)
INFORMATION RETRIEVAL (4)
MERGING (4)
TEXT MINING (4)
COMPUTATIONAL MODELING (3)
DATA MODELS (3)
FEATURE EXTRACTION (3)
FUZZY SET THEORY (3)
GREEDY ALGORITHMS (3)
HEURISTIC ALGORITHMS (3)
MATRIX DECOMPOSITION (3)
NATURAL LANGUAGE PROCESSING (3)
QUERY PROCESSING (3)
SEARCH ENGINES (3)
WEB SEARCH (3)
WEIGHT MEASUREMENT (3)
WORLD WIDE WEB (3)
XML (3)
ACCURACY (2)
APPROXIMATION ALGORITHMS (2)
CLUSTERING METHODS (2)
CLUSTERING QUALITY (2)
CO-CLUSTERING (2)
COMPLEXITY THEORY (2)
COMPUTERS (2)
CONVERGENCE (2)
DATA CLUSTERING ALGORITHM (2)
DATA STRUCTURES (2)
DATABASES (2)
DISTANCE FUNCTION (2)
DOCUMENT REPRESENTATION (2)
ENCODING (2)
ENTROPY (2)
EQUATIONS (2)
GALLIUM (2)
GENETIC ALGORITHM (2)
GENETIC ALGORITHMS (2)
ITERATIVE METHODS (2)
K-MEANS (2)
K-MEANS ALGORITHM (2)
K-MEANS CLUSTERING (2)
MATRIX ALGEBRA (2)
ROBUSTNESS (2)
SEARCH PROBLEMS (2)
SILICON (2)
SIMILARITY MEASURE (2)
TEXT ANALYSIS (2)
UNSUPERVISED LEARNING (2)
VECTOR SPACE MODEL (2)
VOCABULARY (2)
WEB DOCUMENT CLUSTERING (2)
ACONS ALGORITHM (1)
ADAPTATION MODEL (1)
ADJUSTED RAND INDEX VALUE (1)
AGGLOMERATIVE HIERARCHICAL CLUSTERING ALGORITHM (1)
AGGLOMERATIVE HIERARCHICAL METHOD (1)
ALGEBRAIC TRANSFORMATION (1)
ANALYTICAL MODELS (1)
APPROXIMATION METHODS (1)
ARRAYS (1)
ARTIFICIAL DATASET (1)
ASTROPHYSICS (1)
ATOMIC MEASUREMENTS (1)
AUTOMATIC DISCOVERY (1)
BAYES METHODS (1)
BAYESIAN INFORMATION CRITERION (1)
BIOLOGICAL CELLS (1)
BOOK DOCUMENTS (1)
BOOKS (1)
BUCKSHOT METHOD (1)
BUILDINGS (1)
CANBERRA DISTANCES (1)
CANDIDATE CLUSTERS MERGING (1)
CENTER WORD DISTANCE (1)
CENTRAL DISTANCE (1)
CENTRAL SEGMENT (1)
CENTRAL WORD LIBRARY (1)
CENTRAL WORD POSITION (1)
CENTRAL WORD SET (1)
CHI-SIM SIMILARITY MEASURE (1)
CHI-SQUARE (1)
CHINESE DOCUMENT NETWORK (1)
CHINESE INFORMATION CLUSTERING TECHNIQUES (1)
CHINESE PERSON INFORMATION (1)
CLASSIFICATION (1)
CLOPE ALGORITHM (1)
CLUSTER ANALYSIS (1)
CLUSTER CENTER (1)
CLUSTER ENSEMBLE (1)
