Clustering is a common data mining procedure that groups multi-dimensional points with similar components into distinct subsets. Among clustering algorithms, DBSCAN is one of the most popular, owing to its ability to find clusters of arbitrary shape and to identify noise in datasets. However, as data volumes grow and algorithm execution times lengthen, numerous methods have...
In this paper, we present a novel graph-based clustering algorithm (GC). GC has two main steps: the first builds a graph and identifies its key nodes as cluster centers; the second assigns every data point to one of these centers. The centers are thus selected from a graph perspective. Experimental results on 8 datasets demonstrate that GC can outperform k-means, k-medoids, Hierarchical Clustering...
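The excerpt does not say how GC builds its graph or ranks key nodes. As a purely hypothetical illustration of the two-step idea, the sketch below assumes a symmetric k-nearest-neighbour graph, treats the highest-degree nodes as key nodes (skipping nodes adjacent to an already chosen center), and assigns each point to its nearest center. The functions `build_knn_graph`, `select_centers` and `assign` are names invented for this example, not GC's actual procedure.

```python
import math

def build_knn_graph(points, k):
    """Hypothetical step 1a: adjacency sets of a symmetric k-nearest-neighbour graph."""
    n = len(points)
    adj = [set() for _ in range(n)]
    for i in range(n):
        # Sort the other points by Euclidean distance to point i.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)  # make the graph undirected
    return adj

def select_centers(adj, n_centers):
    """Hypothetical step 1b: take high-degree nodes as centers, skipping
    nodes adjacent to a center that has already been chosen."""
    order = sorted(range(len(adj)), key=lambda i: len(adj[i]), reverse=True)
    centers, blocked = [], set()
    for i in order:
        if i not in blocked:
            centers.append(i)
            blocked |= adj[i] | {i}
            if len(centers) == n_centers:
                break
    return centers

def assign(points, centers):
    """Step 2: label each point with the index of its nearest center."""
    return [min(centers, key=lambda c: math.dist(p, points[c])) for p in points]
```

On two well-separated groups of four points with `k=3`, each group forms its own clique, so one center is chosen per group and every point is labelled with the center of its own group.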
Data mining is a powerful concept with great potential to predict future trends and behavior. It refers to the extraction of hidden knowledge from large datasets using techniques such as statistical analysis, machine learning, clustering, neural networks and genetic algorithms. Hybrid data mining algorithms logically combine multiple pre-existing techniques to enhance performance and provide...
Cluster analysis is a principal method in the analytics domain of data mining. The choice of clustering algorithm directly influences the results it produces (the clusters). Data clustering is performed to identify patterns and trends that cannot be recognized simply by looking at the data. Clustering may be supervised (if a training data set is available) or unsupervised...
Spectral clustering has shown superior performance in analyzing cluster structure. However, its high computational complexity limits its application to large-scale data. To tackle this problem, many low-rank matrix approximation algorithms have been proposed, among which the Nyström method is an approach with provably lower approximation error. These algorithms commonly combine two powerful...
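For reference, the standard Nyström approximation can be written as below, in our own notation rather than the paper's: sample a set L of m landmark points (m much smaller than n), let C be the n × m columns of the kernel (affinity) matrix K for those landmarks, and let W be the m × m block of K among the landmarks; W⁺ denotes the Moore-Penrose pseudo-inverse.

```latex
\tilde{\mathbf{K}} = \mathbf{C}\,\mathbf{W}^{+}\,\mathbf{C}^{\top} \approx \mathbf{K},
\qquad
\mathbf{C} = \mathbf{K}_{:,\mathcal{L}} \in \mathbb{R}^{n \times m},
\qquad
\mathbf{W} = \mathbf{K}_{\mathcal{L},\mathcal{L}} \in \mathbb{R}^{m \times m}
```

Approximate eigenvectors of K can then be obtained from C and W in O(nm² + m³) time, compared with O(n³) for an exact eigendecomposition; the excerpt does not state which sampling scheme or rank-k variant the paper uses.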
Data mining is a method for extracting useful information from data, but classical data mining approaches cannot be applied directly to big data because of the sheer complexity involved. The data generated by numerous scientific applications and integrated environments has grown rapidly in recent years, not only in size but also in variety. The data collected is of very...
The amount and diversity of data produced and processed have increased dramatically alongside improvements in technology. Unfortunately, the data produced usually lack labels, which would otherwise make classification and information extraction easier. This has increased the importance of data clustering for extracting information. In this work, K-Means, Spectral Clustering and Girvan-Newman...
Machine learning is the field of computer science that studies algorithms, and their construction, for learning from data. In machine learning, predictions are made by applying such algorithms to specific inputs. In this paper, important classification and clustering algorithms are discussed that can be applied to the BE (Information Technology) Third Semester to evaluate students' performance...
Clustering is one of the most widely studied problems in machine learning and data mining. The choice of clustering algorithm depends on the application scenario and the data domain. The K-Means algorithm is one of the most popular clustering techniques that rely on a distance measure. In this work, an extensive empirical evaluation of three significant variants of the K-Means algorithm is carried out on the basis...
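As a point of reference for the variants discussed above, here is a minimal sketch of the standard (Lloyd's) K-Means iteration with Euclidean distance. It is not any of the three variants evaluated in the paper, whose details are cut off in this excerpt; the deterministic farthest-first initialisation is a simple choice made for this sketch (k-means++ is a common randomized alternative), and `kmeans` is an illustrative name.

```python
import math

def kmeans(points, k, iters=100):
    """Lloyd's k-means on tuples of numbers; returns (centroids, labels)."""
    # Farthest-first initialisation: start at points[0], then repeatedly add
    # the point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new.append(centroids[c])  # leave an empty cluster's centroid in place
        if new == centroids:
            break  # converged: the next assignment step would be identical
        centroids = new
    return centroids, labels
```

The distance measure enters only in the assignment step, which is where distance-based variants of K-Means typically differ.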
Data generated by both humans and machines is growing exponentially in size and structural complexity. Such voluminous, dynamic and unstructured data, termed Big Data, is analyzed and maintained so that it can be used for various purposes and applications. Big Data is generated by sources such as social media, cyber-physical systems and business entities. This enormous data generation...
Big data refers to broad data sets that are used in many fields. Processing a huge data set is time-consuming, not only because of its sheer volume but also because data types and structures can be diverse and complex. Currently, many data mining and machine learning techniques are being applied to big data problems; some of them can construct a good learning algorithm in terms...
The application of clustering algorithms to real-life data has interested many researchers, and approaches that model vagueness, as well as their hybridization with other analogous approaches, have gained special attention owing to their effectiveness. Recently, the rough intuitionistic fuzzy c-means algorithm was proposed by Tripathy et al. [3], who established its superiority over all other algorithms contained...
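Rough intuitionistic fuzzy c-means builds on the classic fuzzy c-means (FCM) algorithm. The sketch below shows only the two standard FCM update steps (memberships, then centers) with fuzzifier m = 2; it omits the rough-set and intuitionistic components entirely, so it is the baseline such hybrids extend, not the algorithm of Tripathy et al. The name `fcm` and the small `eps` guard against zero distances are choices made for this example.

```python
import math

def fcm(points, centers, m=2.0, iters=50, eps=1e-9):
    """Standard fuzzy c-means; returns (centers, memberships u[i][j] from the last iteration)."""
    c = len(centers)
    u = []
    for _ in range(iters):
        # Membership update: u[i][j] = 1 / sum_k (d_ij / d_kj) ** (2 / (m - 1)).
        u = [[0.0] * len(points) for _ in range(c)]
        for j, x in enumerate(points):
            d = [max(math.dist(x, v), eps) for v in centers]  # eps avoids division by zero
            for i in range(c):
                u[i][j] = 1.0 / sum((d[i] / d[k]) ** (2 / (m - 1)) for k in range(c))
        # Center update: mean of all points weighted by u[i][j] ** m.
        centers = []
        for i in range(c):
            w = [u[i][j] ** m for j in range(len(points))]
            total = sum(w)
            centers.append(tuple(sum(wj * x[dim] for wj, x in zip(w, points)) / total
                                 for dim in range(len(points[0]))))
    return centers, u
```

Each point's memberships across the clusters sum to 1, which is the constraint that the rough and intuitionistic extensions modify.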
To address the low accuracy and low efficiency of traditional recommendation algorithms, this paper presents a machine-learning-based social media recommendation algorithm. The algorithm builds on the traditional personalized collaborative filtering algorithm and incorporates the correlation characteristics among users in a social network. In addition, the algorithm considers...
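The excerpt gives no formulas for the algorithm. Purely as an illustration of how user-based collaborative filtering can be combined with social ties, the sketch below predicts a rating with mean-centred user-based CF, where each neighbour's weight blends rating similarity (cosine over co-rated items) with a social tie strength. The blending rule, the `alpha` parameter and the `social` dictionary are assumptions of this example, not the paper's method.

```python
import math

def cosine(a, b):
    """Cosine similarity over co-rated items (assumes non-zero ratings)."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    na = math.sqrt(sum(a[i] ** 2 for i in common))
    nb = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (na * nb)

def predict(ratings, social, user, item, alpha=0.5):
    """Hypothetical social CF: weight = alpha * cosine + (1 - alpha) * social tie."""
    mean = {u: sum(r.values()) / len(r) for u, r in ratings.items()}
    num = den = 0.0
    for v, r in ratings.items():
        if v == user or item not in r:
            continue  # only other users who rated the item act as neighbours
        w = alpha * cosine(ratings[user], r) + (1 - alpha) * social.get((user, v), 0.0)
        num += w * (r[item] - mean[v])  # neighbour's deviation from their own mean
        den += abs(w)
    return mean[user] + num / den if den else mean[user]
```

Setting `alpha=1.0` reduces the sketch to plain user-based CF; lowering it lets socially connected users influence the prediction more.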
Big data is flowing into every area of our life, professional and personal. Big data is defined as datasets whose size is beyond the ability of typical software tools to capture, store, manage and analyze, due to the time and memory complexity. Velocity is one of the main properties of big data. In this demo, we present SAMOA (Scalable Advanced Massive Online Analysis), an open-source platform for...
In this paper, we consider designing clustering algorithms that can be used in MapReduce on the Spark platform, one of the most popular programming environments for processing large datasets. We focus on the practical and popular serial Self-Organizing Map (SOM) clustering algorithm. SOM is a well-known unsupervised learning algorithm and is useful for cluster analysis of large quantities of...
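For readers unfamiliar with SOM, a minimal serial 1-D SOM is sketched below: for each input, find the best-matching unit (BMU), then pull it and its grid neighbours toward the input using a Gaussian neighbourhood, while the learning rate and neighbourhood width decay over time. The initialisation from evenly spaced input samples, the linear decay schedules and the parameter defaults are choices made for this sketch; it does not reflect the paper's Spark/MapReduce design.

```python
import math
import random

def train_som(data, n_units, epochs=100, lr0=0.1, seed=0):
    """Train a 1-D self-organizing map; returns one weight vector per unit."""
    rng = random.Random(seed)
    # Initialise units from evenly spaced input samples (a simple deterministic choice).
    step = (len(data) - 1) / max(n_units - 1, 1)
    weights = [list(data[round(u * step)]) for u in range(n_units)]
    sigma0 = max(n_units / 2, 1.0)
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):      # shuffled pass over the data
            frac = t / total
            lr = lr0 * (1 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1 - frac), 0.5)  # shrinking neighbourhood width
            bmu = best_matching_unit(weights, x)
            for u, w in enumerate(weights):
                # Gaussian neighbourhood measured on the 1-D grid of unit indices.
                h = math.exp(-((u - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
            t += 1
    return weights

def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is nearest to x."""
    return min(range(len(weights)), key=lambda u: math.dist(weights[u], x))
```

After training, mapping each input to its best-matching unit yields the cluster assignment.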
Data-driven decision support systems often benefit from human participation to validate outcomes produced by automated procedures. Perceived utility hinges on the system's ability to learn transparent, comprehensible models from data. We introduce and formalize Informative Projection Recovery: the problem of extracting a set of low-dimensional projections of data which jointly form an accurate solution...
Clustering is an important tool whose use in machine learning algorithms has grown explosively. The DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm is one of the primary methods for clustering in data mining. DBSCAN can find clusters of varying sizes and shapes, and it also detects noise. The two important parameters Epsilon (Eps)...
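Standard DBSCAN takes a radius Eps and a minimum neighbourhood size (often called MinPts, `min_pts` here). The brute-force sketch below shows how they interact: a point whose Eps-neighbourhood holds at least `min_pts` points is a core point and seeds or extends a cluster, points reachable from a core point become border points, and the rest are labelled noise. Neighbourhood queries here cost O(n) each; practical implementations use spatial indexes, and `dbscan` is an illustrative name.

```python
import math

def dbscan(points, eps, min_pts):
    """Classic DBSCAN; returns one label per point (-1 = noise, 0, 1, ... = cluster id)."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def region(i):
        # Eps-neighbourhood of point i (includes i itself), by brute force.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE          # may later be relabelled as a border point
            continue
        labels[i] = cluster            # i is a core point: start a new cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a core point: keep expanding
        cluster += 1
    return labels
```

Shrinking Eps or raising `min_pts` makes the core-point test stricter, so more points end up as noise.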
Clustering is an attractive and important data mining task used in many applications. However, earlier work on clustering focused only on categorical data, grouping similar data items based on attribute values, which leads to convergence problems in the clustering process. The proposed work enhances the existing k-means clustering process based on the categorical...
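One well-known way to adapt k-means to categorical data is Huang's k-modes, which replaces Euclidean distance with a simple matching dissimilarity and replaces cluster means with per-attribute modes. It is shown here only as background; the excerpt does not say whether the proposed enhancement resembles it. `kmodes` and `mismatch` are names chosen for this sketch, and the caller supplies the initial modes.

```python
from collections import Counter

def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes whose values differ."""
    return sum(x != y for x, y in zip(a, b))

def kmodes(records, init_modes, iters=20):
    """k-modes: k-means with mismatch distance and per-attribute modes as centers."""
    modes = [tuple(m) for m in init_modes]
    labels = []
    for _ in range(iters):
        # Assignment step: nearest mode under the mismatch count.
        labels = [min(range(len(modes)), key=lambda c: mismatch(r, modes[c])) for r in records]
        new = []
        for c in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == c]
            if not members:
                new.append(modes[c])  # keep an empty cluster's mode unchanged
                continue
            # Update step: most frequent value of each attribute among the members.
            new.append(tuple(Counter(col).most_common(1)[0][0] for col in zip(*members)))
        if new == modes:
            break  # converged
        modes = new
    return modes, labels
```

Because modes are always actual attribute values, the centers stay interpretable, unlike means, which are undefined for categorical attributes.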
With the advent of modern techniques for scientific data collection, large quantities of data are accumulating in various databases. Systematic data analysis methods are necessary to extract useful information from rapidly growing data banks. Cluster analysis is one of the major data mining methods, and the k-means clustering algorithm is widely used in many practical applications. But the...
This paper describes a new revised clustering algorithm in which each cluster center is derived from the revised mean of a subclass in the previous recursion. The modification factor consists of the mean of the cluster center in the previous recursion multiplied by a coefficient polynomial. This center-computing formula is derived from the Fisher criterion. Experimental results show that the proposed clustering...