Search results

chapter

keybin: Key-Based Binning for Distributed Clustering

Xinyu Chen, Jeremy Benson, Trilce Estrada

2017 IEEE International Conference on Cluster Computing (CLUSTER) > 572 - 581

Traditional machine learning algorithms often require computations on centralized data, but modern datasets are collected and stored in a distributed way. In addition to the cost of moving data to centralized locations, increasing concerns about privacy and security warrant distributed approaches. We propose keybin, a distributed key-based binning clustering algorithm for high-dimensional spaces....

chapter

Expectation-maximization algorithm for topic modeling on big data streams

Walisa Romsaiyud

2016 IEEE 7th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON) > 1 - 7

Expectation-Maximization (EM) is typically used to compute maximum likelihood estimates of parameters from incomplete samples. We propose a new algorithm for generating an extension of the Dynamic Topic Model (exDTM) in a time-based manner, based on the distribution of document topics, on Spark. The proposed algorithm can be applied in clustering documents from data streams for threat...

chapter

MapReduce-based k-prototypes clustering method for big data

Mohamed Aymen Ben Haj Kacem, Chiheb-Eddine Ben N'cir, Nadia Essoussi

2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA) > 1 - 7

Big data clustering has recently become a challenging task in many application domains. Traditional clustering methods cannot handle large-scale data. Furthermore, big data often contain mixed types of data, including numerical and categorical attributes. Thus, we propose in this paper the parallelization of the k-prototypes clustering method (MR-KP) using...

chapter

Machine learning approaches on map reduce for Big Data analytics

J V N Lakshmi, Ananthi Sheshasaayee

2015 International Conference on Green Computing and Internet of Things (ICGCIoT) > 480 - 484

Analyzing enormous datasets and performing the necessary processing on massive data structures calls for a new collection of algorithms and associated systems, a trend framed as Big Data. Big Data architectures span multiple machines and clusters with special-purpose subsystems. The data produced from several sources requires analysis and organization with meager amounts...

chapter

Performance enhancement of distributed K-Means clustering for big Data analytics through in-memory computation

Shwet Ketu, Sonali Agarwal

2015 Eighth International Conference on Contemporary Computing (IC3) > 318 - 324

Big Data analytics has recently emerged as a prominent research area in the field of Information Technology, serving various data-driven domains for effective processing of big data. Big data analytics faces various challenges, such as inefficient storage, processing delays, a low rate of information retrieval, and complex algorithms, which cannot be handled and managed using traditional methods...

chapter

K-Means Clustering Algorithm for Large-Scale Chinese Commodity Information Web Based on Hadoop

Geng Yushui, Zhang Lishuo

2015 14th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES) > 256 - 259

With the growing popularity of the network, product information fills many pages of the Internet. Retrieving the needed information from these pages often relies on clustering, and the current explosive growth of data creates a need for mass storage, so clustering faces problems such as high computational complexity and long running time, then...

chapter

k-Means Performance Improvements with Centroid Calculation Heuristics Both for Serial and Parallel Environments

Jeyhun Karimov, Murat Ozbayoglu, Erdogan Dogdu

2015 IEEE International Congress on Big Data (BigData Congress) > 444 - 451

K-means is the most widely used clustering algorithm due to its fairly straightforward implementations in various problems. Meanwhile, when the number of clusters increases, the number of iterations also tends to increase slightly. However, there are still opportunities for improvement, as some studies in the literature indicate. In this study, improved implementations of the k-means algorithm with a centroid...

chapter

Simulation of parallel similarity measure computations for large data sets

Pawel Czarnul, Pawel Rosciszewski, Mariusz Matuszek, Julian Szymanski

2015 IEEE 2nd International Conference on Cybernetics (CYBCONF) > 472 - 477

The paper presents our approach to implementation of similarity measure for big data analysis in a parallel environment. We describe the algorithm for parallelisation of the computations. We provide results from a real MPI application for computations of similarity measures as well as results achieved with our simulation software. The simulation environment allows us to model parallel systems of various...

INFONA - science communication portal
