Search results

chapter

Better Guarantees for k-Means and Euclidean k-Median by Primal-Dual Algorithms

Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, Justin Ward

2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS) > 61 - 72

2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS)

Clustering is a classic topic in optimization with k-means being one of the most fundamental such problems. In the absence of any restrictions on the input, the best known algorithm for k-means with a provable guarantee is a simple local search heuristic yielding an approximation guarantee of 9+ε, a ratio that is known to be tight with respect to such methods. We overcome this barrier...

chapter

Relationship of Jaccard and edit distance in malware clustering and online identification (Extended abstract)

Shlomi Dolev, Mohammad Ghanayim, Alexander Binun, Sergey Frenkel, more

2017 IEEE 16th International Symposium on Network Computing and Applications (NCA) > 1 - 5

2017 IEEE 16th International Symposium on Network Computing and Applications (NCA)

In this paper, we examine the possibility to utilize the well-known approximations of Jaccard metric in order to reduce computational complexity of Edit Distance metric estimation. The scope of our analytical results is the representing strings rather than the original (raw) textual data, still in practice we obtained a solid indication that the results can be applied to (raw) strings that have low...

chapter

Extended core-based community detection for directed networks

Anubhuti Garg, Mohammad Rehaan, Amiya Nayak

2017 International Conference on Computer, Information and Telecommunication Systems (CITS) > 302 - 306

2017 International Conference on Computer, Information and Telecommunication Systems (CITS)

The focus of this paper is on detecting overlapping communities for the directed graphs by implementing a new algorithm and analyzing it with various performance metrics. The algorithm aims at finding core nodes for the directed graph which are subset of communities and have higher contact frequency. These are then extended to find communities using compactness measurement (CM). The compactness of...

chapter

GraphSteal: Dynamic Re-Partitioning for Efficient Graph Processing in Heterogeneous Clusters

Dinesh Kumar, Arun Raj, Janakiram Dharanipragada

2017 IEEE 10th International Conference on Cloud Computing (CLOUD) > 439 - 446

2017 IEEE 10th International Conference on Cloud Computing (CLOUD)

With continuously growing data, clusters also need to grow periodically to accommodate the increased demand of data processing. This is usually done by addition of newer hardware, whose configuration might differ from the existing nodes. As a result, clusters are becoming heterogeneous in nature. For many real world machine learning and data mining applications, data is represented in the form of...

chapter

Arabic text mining based on clustering and coreference resolution

Salma Mahmood, Faiez Musa Lahmood Al-Rufaye

2017 International Conference on Current Research in Computer Science and Information Technology (ICCIT) > 140 - 144

2017 International Conference on Current Research in Computer Science and Information Technology (ICCIT)

Text mining discover and extract useful information from documents, whenever increase the size and number documents leads to redouble features. The huge features for the documents adds challenge to text mining called high dimension. The aim of this proposed study is minimize the high dimension of the documents, and improve Arabic text mining using clustering. In order to achieve this goal, we propose...

chapter

Graph Partitioning in Parallelization of Large Scale Networks

Sima Das, Jennifer Leopold, Susmita Ghosh, Sajal K. Das

2016 IEEE 41st Conference on Local Computer Networks (LCN) > 176 - 179

2016 IEEE 41st Conference on Local Computer Networks (LCN)

Real world large scale networks exhibit intrinsic community structure, with dense intra-community connectivity and sparse inter-community connectivity. Leveraging their community structure for parallelization of computational tasks and applications, is a significant step towards computational efficiency and application effectiveness. We propose a weighted depth-first search graph partitioning algorithm for...

chapter

Can We Group Similar Amazon Reviews: A Case Study with Different Clustering Algorithms

Chantal Fry, Sukanya Manna

2016 IEEE Tenth International Conference on Semantic Computing (ICSC) > 374 - 377

2016 IEEE Tenth International Conference on Semantic Computing (ICSC)

The amount of unstructured text data available is growing exponentially due to the proliferation of digital information such as emails, text messages, blogs, social media posts, and product reviews. For users of e-commerce websites such as Amazon, navigating thousands of reviews before buying a product can be a daunting task. Unsupervised machine learning techniques can be used to automatically analyze...

chapter

Diminishing Prototype Size for k-Nearest Neighbors Classification

Mohammad Mehdi Samadpour, Hamid Parvin, Farhad Rad

2015 Fourteenth Mexican International Conference on Artificial Intelligence (MICAI) > 139 - 144

2015 Fourteenth Mexican International Conference on Artificial Intelligence (MICAI)

In this paper, a new classification method based on k-Nearest Neighbor (kNN) lazy classifier is proposed. This method leverages the clustering concept to reduce the size of the training set in kNN classifier and also in order to enhance its performance in terms of time complexity. The new approach is called Modified Nearest Neighbor Classifier Based on Clustering (MNNCBC). Inspiring the traditional...

chapter

A Practical Approach on Cleaning-Up Large Data Sets

Marius Barat, Dumitru Bogdan Prelipcean, Dragos Teodor Gavrilut

2014 16th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing > 280 - 284

2014 16th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)

In this paper we propose a noise detection system based on similarities between instances. Having a data set with instances that belongs to multiple classes, a noise instance denotes a wrongly classified record. The similarity between different labeled instances is determined computing distances between them using several metrics among the standard ones. In order to ensure that this approach is computational...

chapter

SSDE-Cluster: Fast Overlapping Clustering of Networks Using Sampled Spectral Distance Embedding and GMMs

Malik Magdon-Ismail, Jonathan Purnell

2011 IEEE Third Int'l Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third Int'l Conference on Social Computing > 756 - 759

2011 IEEE Third Int'l Conference on Privacy, Security, Risk and Trust (PASSAT) / 2011 IEEE Third Int'l Conference on Social Computing (SocialCom)

Clustering social networks is vital to understanding online interactions and influence. This task becomes more difficult when communities overlap, and when the social networks become extremely large. We present an efficient algorithm for constructing overlapping clusters, (approximately linear). The algorithm first embeds the graph and then performs a metric clustering using a Gaussian Mixture Model...

INFONA - science communication portal
