Distributed Clustering for Data Sources with Diverse Schema

N.K. Visalakshi; K. Thangavel; P. Alagambigai

doi:10.1109/ICCIT.2008.282

Source

2008 Third International Conference on Convergence and Hybrid Information Technology, vol. 1, pp. 1058-1063

Abstract

Many enterprises combine information gathered from a variety of data sources into an integrated input for a learning task. For example, to design an automated diagnostic tool for certain diseases, one may wish to integrate data gathered from many different hospitals. Analyzing and mining these distributed, heterogeneous data sources requires distributed machine learning and data mining techniques. In this paper, a Modified Distributed Combining Algorithm is proposed to cluster disparate data sources that have diverse, possibly overlapping sets of features and need not share objects. First, the objects located at each local site are grouped using the K-Means or Fuzzy C-Means clustering algorithm, and the resulting centroids are taken as the local models. Then, the set of centroids is transformed into a unified structure, and optimum values are assigned to the missing attributes. Finally, the global cluster centroids are computed to identify the global cluster model, based on cluster ensemble and centroid mapping. Experiments are carried out on various datasets from the UCI machine learning repository to assess the efficiency of the proposed algorithm.
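
The three steps in the abstract (local clustering, unifying the centroids, computing a global model) can be sketched roughly as below. This is a minimal illustration, not the authors' Modified Distributed Combining Algorithm: the abstract does not say how the "optimum values" for missing attributes are chosen or how centroid mapping works, so the sketch assumes column-mean filling and a second K-Means pass over the unified centroids. The function names and the feature names ("age", "bp", "glucose") are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's K-Means; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids

def unify(local_models, schema):
    """Stack local centroids into the global schema; absent features become NaN."""
    rows = []
    for features, centroids in local_models:
        block = np.full((len(centroids), len(schema)), np.nan)
        for col, name in enumerate(features):
            block[:, schema.index(name)] = centroids[:, col]
        rows.append(block)
    return np.vstack(rows)

def fill_missing(U):
    """ASSUMPTION: replace NaN with the column mean over sites that have the feature."""
    col_means = np.nanmean(U, axis=0)
    return np.where(np.isnan(U), col_means, U)

def global_model(sites, k):
    """sites: list of (feature_names, data) pairs, one per local site."""
    schema = sorted({name for features, _ in sites for name in features})
    local_models = [(features, kmeans(X, k)) for features, X in sites]
    unified = fill_missing(unify(local_models, schema))
    # ASSUMPTION: global centroids come from clustering the local centroids.
    return schema, kmeans(unified, k)

# Two hypothetical sites with overlapping feature sets and no shared objects.
rng = np.random.default_rng(1)
site_a = (["age", "bp"], rng.normal(size=(40, 2)))
site_b = (["bp", "glucose"], rng.normal(size=(30, 2)))
schema, centroids = global_model([site_a, site_b], k=2)
```

Only centroids leave each site in this sketch, so the raw objects never need to be pooled; that property follows from the abstract, while the filling and combining rules are stand-ins.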

Identifiers

Book ISBN: 978-0-7695-3407-7
DOI: 10.1109/ICCIT.2008.282

Keywords

pattern clustering; data mining; distributed processing; fuzzy set theory; learning (artificial intelligence); fuzzy c-means clustering algorithm; distributed data sources clustering; diverse schema; learning task; automated diagnostic tool; distributed heterogeneous data sources; distributed machine learning; data mining technique; modified distributed combining algorithm; k-means clustering algorithm; centroid mapping; cluster ensemble; Distributed databases; Clustering algorithms; Partitioning algorithms; Indexes; Algorithm design and analysis; Databases; Local Centroid; Distributed Clustering; Global Centroid; K-Means

Additional information

Data set: ieee

Publisher

IEEE
