Clustering is a classic topic in optimization with k-means being one of the most fundamental such problems. In the absence of any restrictions on the input, the best known algorithm for k-means with a provable guarantee is a simple local search heuristic yielding an approximation guarantee of 9+ε, a ratio that is known to be tight with respect to such methods. We overcome this barrier...
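As a rough illustration of what a local search heuristic for k-means looks like, here is a minimal sketch of the simplest single-swap variant. It assumes candidate centers are restricted to the (distinct) input points and accepts any strictly improving swap; the function names are hypothetical. The guarantee quoted in the abstract depends on details (candidate center set, number of simultaneous swaps, improvement threshold) that this sketch does not reproduce.

```python
import itertools


def cost(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(
        min(sum((p - c) ** 2 for p, c in zip(pt, ctr)) for ctr in centers)
        for pt in points
    )


def local_search_kmeans(points, k, eps=1e-9):
    """Single-swap local search; centers are chosen among the input points.

    Assumption: the input points are distinct tuples of floats.
    """
    centers = list(points[:k])  # arbitrary initial solution
    improved = True
    while improved:
        improved = False
        current = cost(points, centers)
        # Try replacing center i with every point that is not already a center.
        for i, candidate in itertools.product(range(k), points):
            if candidate in centers:
                continue
            trial = centers[:i] + [candidate] + centers[i + 1:]
            if cost(points, trial) < current - eps:
                centers, improved = trial, True
                break  # restart the scan from the improved solution
    return centers


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers = local_search_kmeans(points, k=2)
```

On this toy input the search stops with one center in each of the two well-separated groups.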
In this paper, we examine the possibility of using well-known approximations of the Jaccard metric to reduce the computational complexity of estimating the Edit Distance metric. Our analytical results cover the representing strings rather than the original (raw) textual data; still, in practice we obtained a solid indication that the results can be applied to (raw) strings that have low...
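To make the two metrics concrete, the following sketch computes an exact Jaccard distance and the edit distance for a pair of strings. Representing each string by its set of character bigrams is an assumption made here for illustration; the excerpt does not say how the paper constructs its representing strings, and the function names are hypothetical. An approximation scheme such as MinHash would replace the exact set computation in `jaccard_distance`.

```python
def qgrams(s, q=2):
    """Set of character q-grams of s (assumed representation, q=2 by default)."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def jaccard_distance(a, b, q=2):
    """Exact Jaccard distance between the q-gram sets of a and b."""
    set_a, set_b = qgrams(a, q), qgrams(b, q)
    if not set_a and not set_b:
        return 0.0  # two strings with no q-grams are treated as identical
    return 1.0 - len(set_a & set_b) / len(set_a | set_b)


def edit_distance(a, b):
    """Levenshtein distance via the standard O(len(a) * len(b)) dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution or match
            ))
        prev = curr
    return prev[-1]


d_edit = edit_distance("kitten", "sitting")
d_jaccard = jaccard_distance("kitten", "sitting")
```

The edit distance costs time proportional to the product of the string lengths, while the set-based distance only needs the q-gram sets, which is the kind of saving the abstract alludes to.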
The focus of this paper is on detecting overlapping communities in directed graphs by implementing a new algorithm and analyzing it with various performance metrics. The algorithm aims at finding core nodes of the directed graph, which are subsets of communities and have higher contact frequency. These are then extended to find communities using a compactness measurement (CM). The compactness of...
With continuously growing data, clusters also need to grow periodically to accommodate the increased demand of data processing. This is usually done by addition of newer hardware, whose configuration might differ from the existing nodes. As a result, clusters are becoming heterogeneous in nature. For many real world machine learning and data mining applications, data is represented in the form of...
Text mining discovers and extracts useful information from documents; as the size and number of documents increase, the number of features multiplies. The huge number of features poses a challenge to text mining known as high dimensionality. The aim of this proposed study is to reduce the high dimensionality of the documents and to improve Arabic text mining using clustering. In order to achieve this goal, we propose...
Real world large scale networks exhibit intrinsic community structure, with dense intra-community connectivity and sparse inter-community connectivity. Leveraging their community structure for parallelization of computational tasks and applications is a significant step towards computational efficiency and application effectiveness. We propose a weighted depth-first-search graph partitioning algorithm for...
The amount of unstructured text data available is growing exponentially due to the proliferation of digital information such as emails, text messages, blogs, social media posts, and product reviews. For users of e-commerce websites such as Amazon, navigating thousands of reviews before buying a product can be a daunting task. Unsupervised machine learning techniques can be used to automatically analyze...
In this paper, a new classification method based on the k-Nearest Neighbor (kNN) lazy classifier is proposed. This method uses clustering to reduce the size of the training set in the kNN classifier and thereby improve its time complexity. The new approach is called Modified Nearest Neighbor Classifier Based on Clustering (MNNCBC). Inspired by the traditional...
In this paper we propose a noise detection system based on similarities between instances. Given a data set whose instances belong to multiple classes, a noise instance denotes a wrongly classified record. The similarity between differently labeled instances is determined by computing distances between them using several standard metrics. In order to ensure that this approach is computational...
Clustering social networks is vital to understanding online interactions and influence. This task becomes more difficult when communities overlap, and when the social networks become extremely large. We present an efficient (approximately linear) algorithm for constructing overlapping clusters. The algorithm first embeds the graph and then performs a metric clustering using a Gaussian Mixture Model...