This paper examines a schema for graph-theoretic clustering using node-based resilience measures. Node-based resilience measures optimize an objective based on a critical set of nodes whose removal causes some severity of disconnection in the network. Beyond presenting a general framework for the use of node-based resilience measures for variations of clustering problems, we emphasize the unique...
Clustering is used in many applications, and the choice of algorithm depends on the nature of the task to be carried out. Before choosing a clustering algorithm, one needs to understand the task and then select the algorithm based on its capabilities and performance metrics. This paper makes an...
In this paper, we consider a special multi-source data clustering problem in which data points from the same source cannot be grouped into the same cluster (a cannot-link (CL) constraint) and the sizes of the generated clusters are subject to maximum thresholds. No prior information is given about the level of clutter (that is, noisy data) or the number of clusters. Particularly, the clusters...
The Hydro and Agro Informatics Institute (HAII) has installed more than 800 telemetry stations across Thailand to collect water level data for operational tasks and research, e.g., flood prevention systems. To obtain accurate results, it is crucial to control the quality of the data by detecting and filtering out anomalies. In our previous work, a data quality management system to capture various types...
In this paper we demonstrate a new density-based clustering technique, CODAS, for online clustering of streaming data into arbitrarily shaped clusters. CODAS is a two-stage process using a simple local density to initiate micro-clusters which are then combined into clusters. Memory efficiency is gained by not storing or re-using any data. Computational efficiency is gained by using hyper-spherical...
In a masquerade attack, an attacker impersonates a legitimate user. Most masquerade detection techniques developed so far are based on supervised learning. In this paper, masquerade detection based on unsupervised learning techniques is used. The clustering algorithms used are K-Means, K-Medoid, agglomerative clustering and DBSCAN. A comparative study is done based on...
Grid-based clustering algorithms are attractive for the task of data partitioning in spatial databases. In the context of big data, more and more research focuses on how to resolve the conflict between the efficiency and accuracy of clustering. Existing grid-based clustering algorithms generally have high time efficiency but do not consider the distribution of the data inside a grid. In this paper, a...
In this paper we propose a noise detection system based on similarities between instances. Given a data set with instances that belong to multiple classes, a noise instance denotes a wrongly classified record. The similarity between differently labeled instances is determined by computing distances between them using several standard metrics. In order to ensure that this approach is computational...
The use of word senses in place of surface word forms has been shown to improve performance on many computational tasks, including intelligent web search. In this paper we propose a novel approach to automatic discovery of word senses from raw text, a task referred to as Word Sense Induction (WSI). Almost all the WSI approaches proposed in the literature dealt with monolingual data and only very few...
DBSCAN is a density-based clustering algorithm. It can divide high-density regions into clusters, shield against noise effectively and discover clusters of arbitrary shape and size in a dataset. However, the DBSCAN algorithm needs to traverse the dataset to find core objects, which results in a large I/O cost when processing large-scale datasets. A fast algorithm (BEDBSCAN) is developed...
Nowadays we communicate in a digital universe. The amount of data (structured and unstructured) is exploding; this is what we call Big Data. Such voluminous data are in most cases noisy and overlapping, which makes clustering them a critical challenge. In addition, validating the resulting partitions is a serious problem. In this paper we present a new fuzzy validity index able to interpret the...
Observations from satellite lidar instruments have provided evidence of remarkable changes in polar ice sheets on a global scale. The Ice, Cloud and land Elevation Satellite-2 (ICESat-2) is scheduled for launch by NASA in 2017 and will monitor the elevation changes of polar ice sheets and vegetation canopy. To validate ICESat-2's approach of photon-counting laser altimetry, measurements obtained...
Points of interest (PoI) data plays an important role as a foundation for a wide variety of location-based services. Such data is typically obtained from an authoritative source or from users through crowdsourcing. It can be costly to maintain an up-to-date authoritative source, and data obtained from users can vary greatly in coverage and quality. We are also witnessing a proliferation of both...
This paper introduces a practical algorithm called the fuzzy-possibilistic c-means (FPCM) clustering algorithm. It is based on the fuzzy c-means (FCM) and possibilistic c-means (PCM) clustering algorithms. The FPCM algorithm addresses the existing problems of these two algorithms and produces both memberships and possibilities simultaneously. For example, the FPCM algorithm works...
Clustering is a semi-supervised or unsupervised technique for classifying a set of data according to underlying characteristics or similarity. There are many different algorithms for different applications, and each has advantages in particular fields. For the data obtained from an automotive LUX-LIDAR, the existing algorithms fail to cluster them accurately or efficiently. It...
Data mining comprises data analysis techniques that are useful for extracting hidden and interesting patterns from large datasets. Clustering techniques are important for extracting knowledge from the large amounts of spatial data collected from various applications, including GIS, satellite imagery, X-ray crystallography, remote sensing, and environmental assessment and planning. To extract...
The computation time of most clustering algorithms is affected by the number of attributes and instances. Thus, the data mining community has made efforts to make cluster induction efficient, and scalability is naturally a critical issue that the community faces. One way to handle this issue is to use a subset of all instances. This paper suggests an...
Nowadays, organizations are facing several challenges when they try to analyze generated data with the aim of extracting useful information. This analytical capacity needs to be enhanced with tools capable of dealing with big data sets without making the analytical process a difficult task. Clustering is usually used, as this technique does not require any prior knowledge about the data. However,...
Spatio-temporal clustering is a subfield of data mining that is gaining increasing scientific attention due to advances in location-based and environmental devices that register position, time and, in some cases, other semantic attributes. This process aims to group objects based on their spatial and temporal similarity, helping to discover interesting patterns and correlations in large...
Several clustering algorithms have been extensively used to analyze vast amounts of spatial data. One of these is SNN (Shared Nearest Neighbor), a density-based algorithm that has several advantages when analyzing this type of data due to its ability to identify clusters of different shapes, sizes and densities, as well as its capability to deal with noise. Taking into account...