When data mining techniques are applied to uncertain data, the uncertainty has to be taken into account to obtain high-quality results. Usually, an uncertain object is described by a probability density function, the probability density function is approximated by a large number of sample points, and the distance between two uncertain objects is expressed by the expected distance. Computing the expected distance...
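The sample-based expected distance described in this abstract can be illustrated with a minimal sketch. It assumes each uncertain object is given as a list of equally weighted sample points and that the underlying distance is Euclidean; the function name `expected_distance` is hypothetical and this is not the method of the paper itself.

```python
import math

def expected_distance(samples_a, samples_b):
    """Estimate the expected distance between two uncertain objects.

    Assumption: each object is a non-empty list of equally weighted
    sample points (tuples of floats) drawn from its density function.
    """
    total = 0.0
    for p in samples_a:
        for q in samples_b:
            # One Euclidean distance evaluation per pair of samples.
            total += math.dist(p, q)
    return total / (len(samples_a) * len(samples_b))

# Usage: object A has two samples, object B has one.
print(expected_distance([(0.0, 0.0), (0.0, 2.0)], [(0.0, 0.0)]))
```

This sketch evaluates one distance per pair of samples, so its cost grows with the product of the two sample counts.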
Motion planning is an important step in any complex robotic motion task. Many algorithms deal with this problem, and many effective approaches make use of random generation of roadmaps or motion commands. In this paper, a novel algorithm for random roadmap generation is proposed. This approach, which addresses the planning problem with a resilience philosophy, relies on a network model with some...
Supplier categorization is considered a business approach to reducing logistics costs and improving business performance. In this work we propose a data clustering algorithm for supplier categorization, namely S-Canopy clustering. It uses canopy clustering to reduce the number of distance comparisons. Comparison analysis shows that it is feasible to obtain better results for categorization...
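As background for the canopy step mentioned in this abstract, the sketch below shows one common formulation of generic canopy clustering. It is not the S-Canopy algorithm itself, whose details are not given here. The function name `canopy_clustering`, the thresholds `t1` (loose) and `t2` (tight), and the use of Euclidean distance are all assumptions made for illustration.

```python
import math

def canopy_clustering(points, t1, t2):
    """Group points into possibly overlapping canopies.

    Assumptions: points are tuples of floats, and t1 > t2 > 0.
    A point within t1 of a center joins that canopy; a point within
    t2 of a center can no longer become a center itself.
    """
    if not t1 > t2 > 0:
        raise ValueError("thresholds must satisfy t1 > t2 > 0")
    candidates = list(points)
    canopies = []
    while candidates:
        center = candidates[0]
        # Loose threshold: collect every point near the center.
        members = [p for p in points if math.dist(center, p) < t1]
        canopies.append((center, members))
        # Tight threshold: drop points that are too close to be new centers.
        candidates = [p for p in candidates if math.dist(center, p) >= t2]
    return canopies

# Usage: two well-separated pairs give two canopies.
pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
for center, members in canopy_clustering(pts, t1=3.0, t2=2.0):
    print(center, members)
```

A later, more expensive clustering step would then only compare points that share a canopy, which is where the reduction in distance comparisons comes from.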
The k-means algorithm is one of the most popular clustering algorithms. However, it is sensitive to the initial partition and to circular datasets. To address this problem, this paper introduces an improved k-means algorithm based on multiple feature points. The algorithm selects a number of feature points as cluster centroids, unlike the traditional algorithm, which uses only one centroid. In addition,...
The hierarchical clustering methods are not scalable with the size of the dataset and need many database scans. This is potentially a severe problem for large datasets. One way to speed up the hierarchical methods is to summarize the data efficiently and subsequently apply the clustering methods to the summary of the data. In this paper, we propose a new scheme to summarize the dataset called data...
The clustering performance of K-means relies heavily on the quality of the initial centroids. The initial centroids for K-means clustering are usually chosen randomly, so the algorithm may converge to the nearest local minimum rather than the global optimum. This paper proposes a new approach to optimizing the designation of initial centroids for K-means clustering. This approach is...
In this article, a distributed clustering technique suitable for dealing with large data sets is presented. The algorithm is a modified version of the very common k-means algorithm, with changes that make it executable in a distributed environment. For large inputs, the running time complexity of the k-means algorithm is very high and is measured as O(TKN), where K is...
The agglomerative hierarchical clustering algorithm for data grouping is considered. To reduce algorithmic complexity without loss of accuracy, an approach based on a speed and accuracy coefficient is proposed. Some results with quality characteristics of the clustered data are presented.
Traditional clustering approaches usually analyze static datasets in which objects are kept unchanged after being processed, but many practical datasets are dynamically modified, which means some previously learned patterns have to be updated accordingly. Re-clustering the whole dataset from scratch is not a good choice due to the frequent data modifications and the limited out-of-service time, so...
Clustering spatial data is a well-known problem that has been extensively studied. Although many methods have been proposed in the literature, few have handled the spatial constraints properly, which may have significant consequences for the effectiveness of the clustering. Taking these constraints into account during the clustering process is costly and the modeling of the constraints is paramount...
The CLARA algorithm is one of the popular clustering algorithms in use nowadays. This algorithm works on a randomly selected subset of the original data and produces near-accurate results at a faster rate than other clustering algorithms. CLARA is mainly used in data mining applications. We have used this algorithm for color image segmentation. The original CLARA is modified for producing better...
In this paper, we introduce a clustering algorithm for intrusion detection based on the WaveCluster algorithm and an entropy-based characteristics screening algorithm. The WaveCluster algorithm has a low time complexity when the data are low-dimensional, whereas actual network data are high-dimensional. We therefore reduce the dimension of the network data using characteristics screening before...
The scale of spatial data is usually very large. A clustering algorithm needs very high performance, good scalability, and the ability to deal with noisy data and high-dimensional data. A fast clustering algorithm based on one-dimensional distance calculation is proposed. The algorithm first partitions space-sets by one-dimensional distance, then clusters space-sets by set-distance and set-density. Next, it uses...
Without sufficient prior knowledge, it is difficult for a user to choose proper input parameters for a clustering algorithm. To find the best clustering result, the usual strategy is "trial and error", which repeats a clustering algorithm several times with different input parameters. It is well known that clustering analysis is a time-consuming process, so repeated clustering means costing...
K-anonymity is a model for protecting publicly released microdata from individual identification. It requires that each record be identical to at least k-1 other records in the anonymized dataset with respect to a set of privacy-related attributes. Although it is easy to anonymize the original dataset to satisfy the requirement of k-anonymity, it is important to ensure that the anonymized dataset should...
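The k-anonymity requirement stated in this abstract can be checked with a short sketch. It assumes records are dictionaries and that the privacy-related attributes are passed as a list of keys; the function name `is_k_anonymous` and the example attribute names are hypothetical.

```python
from collections import Counter

def is_k_anonymous(records, attributes, k):
    """Return True if every record shares its values on `attributes`
    with at least k-1 other records.

    Assumption: records are dicts that contain every key in `attributes`.
    """
    # Count how many records fall into each combination of attribute values.
    group_sizes = Counter(tuple(r[a] for a in attributes) for r in records)
    return all(size >= k for size in group_sizes.values())

# Usage: the third record is alone in its group, so k = 2 fails.
table = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "cold"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
]
print(is_k_anonymous(table, ["zip", "age"], 2))
```

The check only verifies the property; it does not perform the anonymization or measure how much information the anonymized dataset retains.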
Clustering is one of the fundamental data mining tasks. Many different clustering paradigms have been developed over the years, which include partitional, hierarchical, mixture model based, density-based, spectral, subspace, and so on. The focus of this paper is on full-dimensional, arbitrary shaped clusters. Existing methods for this problem suffer either in terms of the memory or time complexity...
Clustering sensor nodes as the basis of routing is an efficient mechanism for prolonging the lifetime of wireless sensor networks. In this paper, highly efficient multilevel clustering is abstracted as a rooted tree that has the properties of a minimal relay set and maximal weight according to graph theory. A mathematical model for the clustering virtual backbone is built. Based on the model,...
The state explosion problem is the primary obstacle to modeling complex systems with Petri nets; modularization and hierarchy provide ways to solve this problem. When the bottom-up method is adopted, system functions in the lower layers are combined to obtain subsystems. The idea of clustering is introduced to decide which functions should be combined. The operation to combine two functions is defined; the...
Network diameter is one of the important parameters of a network. Until now, however, there has been no algorithm with a time complexity lower than O(n^2) for this problem. As networks expand in scale, with increasing numbers of nodes and edges, it takes a lot of time to use the Floyd algorithm, whose time complexity is O(n^3), or Breadth-first Search (BFS)...
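The BFS baseline mentioned in this abstract can be sketched as follows. This is the textbook approach of running one breadth-first search from every node, not the algorithm the paper proposes. It assumes an unweighted, undirected, connected graph stored as an adjacency dictionary; the function names `bfs_eccentricity` and `diameter` are hypothetical.

```python
from collections import deque

def bfs_eccentricity(graph, source):
    """Return the largest hop distance from `source` to any other node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if len(dist) != len(graph):
        raise ValueError("graph is not connected")
    return max(dist.values())

def diameter(graph):
    """Diameter as the maximum eccentricity, one BFS per node.

    Assumption: `graph` is a non-empty dict mapping each node to its neighbors.
    """
    return max(bfs_eccentricity(graph, s) for s in graph)

# Usage: a path 0-1-2-3 has diameter 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(diameter(path))
```

With n nodes and m edges, each search visits every node and edge once, so the whole computation takes O(n(n + m)) time.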
As a fundamental problem in data mining, pattern recognition and machine learning, clustering has been studied for decades and has been improved in many aspects. However, parameter-free clustering algorithms are still quite weak, which makes their potential generalization to many promising applications rather difficult. A parameter-free clustering algorithm based on a density model is...