Clustering is one of the data mining techniques used in the knowledge discovery process. It is assumed that a good representation of data points may yield good clustering results [6]. This paper discusses the effect of the coordinate system on clustering. We propose a density-based clustering approach to group objects represented in the polar coordinate system. The experiment is carried...
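To illustrate the change of representation this abstract refers to, here is a minimal sketch of converting Cartesian points to polar coordinates before clustering. The function name `to_polar` and the sample points are assumptions made for illustration; the abstract is truncated and does not describe the paper's actual procedure.

```python
import math

def to_polar(points):
    """Convert (x, y) pairs to (r, theta) pairs, with theta in radians in [-pi, pi]."""
    return [(math.hypot(x, y), math.atan2(y, x)) for x, y in points]

# Hypothetical sample points, not data from the paper.
points = [(1.0, 0.0), (0.0, 2.0), (-3.0, 0.0)]
polar = to_polar(points)
```

A density-based algorithm could then be run on the `(r, theta)` pairs instead of the `(x, y)` pairs. Note that `theta` wraps around at plus or minus pi, so a plain Euclidean distance on polar pairs treats nearby angles on either side of the wrap as far apart.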
In the new era of information and communication technology (ICT), everyone wants to store and share their data or information in online media such as cloud databases, mobile databases, grid databases, and drives. When data is stored in online media, the main problem that arises is data privacy, because different types of hackers, attackers, or crackers want to disclose private information as...
Clustering is an important facet of explorative data mining and finds extensive use in several fields. In this paper, we propose an extension of the classical Fuzzy C-Means clustering algorithm. The proposed algorithm, abbreviated as VFC, adopts a multi-dimensional membership vector for each data point instead of the traditional, scalar membership value defined in the original algorithm. The membership...
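For context, the scalar membership value of the classical Fuzzy C-Means algorithm that this abstract contrasts against can be sketched as follows. This shows only the standard FCM membership update, not the proposed VFC extension, whose definition is cut off in the abstract. The function name `fcm_memberships`, the fuzzifier default `m=2.0`, and the sample data are assumptions.

```python
import math

def fcm_memberships(points, centroids, m=2.0):
    """Standard FCM update: row i holds the membership of point i in each cluster."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centroids]
        if any(d == 0.0 for d in dists):
            # The point coincides with one or more centroids:
            # share full membership among those centroids only.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        row = [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]
        memberships.append(row)
    return memberships

# Hypothetical one-dimensional-like data laid out on the x axis.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
centroids = [(0.0, 0.0), (4.0, 0.0)]
u = fcm_memberships(points, centroids)
```

Each row of `u` sums to 1, so every point carries one scalar degree of membership per cluster.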
Data clustering is an important task in data mining, image processing and other pattern recognition problems. One of the most popular clustering algorithms is Fuzzy C-Means (FCM). The performance of FCM is strongly affected by the selection of the initial cluster centroids. Therefore, choosing a good set of initial cluster centroids is very important for the algorithm. However, it is difficult...
This paper uses the expectation-maximization (EM) clustering algorithm and a simple multidimensional projection method for visualization and data reduction. The multidimensional data is projected into a 2D Cartesian coordinate system. We run the EM and K-Means algorithms on the transformed data. The system uses Microsoft Spatial Data Base Engine as a GIS tool for visualization. We used Expectation-Maximization...
Dimension reduction techniques may be employed to increase the efficiency of clustering algorithms and for visualization purposes. In this paper, our aim is to develop a simple dimension reduction technique that converts high-dimensional data to two-dimensional data, and then to apply the K-Means clustering algorithm to the converted (two-dimensional) data. We have applied our technique to three real datasets...
Clustering is the grouping of data based on similarity. This paper introduces a k-means algorithm for clustering data streams and detecting outliers. The outlier detection technique is based on distance as well as on the time at which the points arrive in the cluster. This paper also takes into account the selection of k centers and variable size of buckets with the...
Uncertain data mining has recently attracted interest from researchers because uncertain data is present in many applications, such as the Global Positioning System (GPS), Wireless Sensor Networks (WSN), and moving object tracking. This paper studies the uncertain data clustering problem. Almost all existing algorithms for uncertain data calculate an expectation to express the distance between objects, so they can cluster...
The traditional fuzzy c-means (FCM) algorithm operates only once the cluster number c has been assigned. The value of c has a great influence on the clustering result. However, the cluster number cannot be determined automatically and must be entered manually, which hinders the use of fuzzy c-means. Some researchers have investigated the problem. By combining the concept of distance cost function...
We present an efficient genetic algorithm for mining multi-objective rules from large databases. Multiple objectives conflict with each other, which makes this an optimization problem that is very difficult to solve simultaneously. We propose a multi-objective evolutionary algorithm called the improved niched Pareto genetic algorithm (INPGA), which not only accurately selects the candidates but also saves selection...
In data mining and knowledge discovery, clustering is one of the most important techniques in the process of discovering salient structures from the data. This paper explores the idea of a statistical consensus method for combining results from multiple clusterings or partitions. We explored this idea when working with customs data from Revenue Authority. The partitions are generated by running k-means...
Nowadays, clustering algorithms are widely used in commercial applications such as customer analysis, with good results. The K-means algorithm is by far the most commonly used clustering method. However, its time consumption is fairly high on large-scale data. In this paper, we improve the K-means algorithm. Our improvement is based on the triangle inequality...
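The abstract is truncated before it explains how the triangle inequality is applied, so the following is only a sketch of one well-known way to use it in the K-means assignment step, not the paper's method. It relies on the fact that if d(c_best, c_j) >= 2 * d(x, c_best), then d(x, c_j) >= d(x, c_best), so the distance from x to c_j need not be computed. The function name `assign_with_pruning` and the sample centroids are assumptions.

```python
import math

def assign_with_pruning(point, centroids):
    """Return (index of nearest centroid, number of point-to-centroid distances skipped)."""
    # Centroid-to-centroid distances; in a full K-means loop these would be
    # computed once per iteration and reused for every point.
    cc = [[math.dist(a, b) for b in centroids] for a in centroids]
    best = 0
    best_d = math.dist(point, centroids[0])
    skipped = 0
    for j in range(1, len(centroids)):
        # Triangle inequality: centroid j cannot be closer than the current best.
        if cc[best][j] >= 2.0 * best_d:
            skipped += 1
            continue
        d = math.dist(point, centroids[j])
        if d < best_d:
            best, best_d = j, d
    return best, skipped

# Hypothetical centroids for illustration.
centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
near_origin = assign_with_pruning((1.0, 1.0), centroids)
between = assign_with_pruning((6.0, 0.0), centroids)
```

The pruning never changes the assignment; it only avoids distance computations, which matters most when clusters are well separated and the number of centroids is large.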
Outlier detection is a hot topic in data mining. After studying the existing classical outlier detection algorithms, this paper proposes an outlier mining algorithm based on confidence intervals and gives a new definition of an outlier. The method combines mathematical statistics with a density-based clustering algorithm. It first clusters the data with the DBSCAN algorithm, obtains credible sample and suspicious...
K-means clustering is a popular clustering algorithm based on partitioning the data. However, it has some shortcomings: it requires the user to specify the number of clusters in advance, it is sensitive to initial conditions, and it is easily trapped in a local solution. The global K-means algorithm proposed by Likas et al. is an incremental approach to clustering...
Based on LF, the basic model of the ant colony clustering algorithm, an improved ant colony clustering algorithm (IACC) is proposed. The constructing method, the colony similarity, and the behavior of the ant are redefined. A new adaptive parameter adjustment strategy is also presented in this paper. Experimental results on clustering benchmarks indicate that the proposed algorithm has better performance...
Outlier detection is a hot topic in data mining. After studying the existing classical outlier detection algorithms, this paper proposes an outlier mining algorithm based on probability and gives a new definition of an outlier. It first clusters the data with a density-based algorithm and determines the suspicious outliers. Outliers are then detected according to probability. The experimental results on...
Clustering methods usually require the best number of clusters to be known, or another parameter such as a threshold, which is not always easy to provide. This paper proposes a new graph-based clustering method called GBC, which automatically detects the best number of clusters without requiring any other parameter. In this method based on regions of influence, a graph is constructed and the edges of the...
Cat swarm optimization (CSO) is one of the new heuristic optimization algorithms based on swarm intelligence. Previous research shows that this algorithm performs better than other heuristic optimization algorithms, namely particle swarm optimization (PSO) and weighted PSO, on function minimization problems. In this research, a new CSO algorithm for the clustering problem is proposed...
Clustering is a technique that can divide data objects into meaningful groups. Particle swarm optimization is an evolutionary computation technique developed through a simulation of simplified social models. K-means is one of the popular unsupervised learning clustering algorithms. After analyzing particle swarm optimization and K-means algorithm, a new hybrid algorithm based on both algorithms is...
Because its dimensionality is very high, the image feature space is usually complex. Dimensionality reduction is widely used to process this space effectively. Semi-supervised clustering incorporates limited information into unsupervised clustering in order to improve clustering performance. However, many existing semi-supervised clustering methods cannot be used to handle high-dimensional...