Agricultural mechanization has a far-reaching impact on agricultural productivity and social development. The emergence of VLSI (very-large-scale integrated circuits) makes full intelligence and automation of agricultural products possible. VLSI placement now faces a double challenge: integration scale and circuit performance. From the experimental results, we find current...
Topological data analysis is a novel method for analyzing high-dimensional qualitative data using a set of properties from topology. In this paper, we explore the feasibility of topological data analysis for mining social media data by investigating the problem of image popularity. We randomly crawl images from Instagram, convert their captions to 300-dimensional numerical vectors using Word2vec, calculate...
Cluster analysis aims to classify data elements into different categories according to their similarity. It is a common task in data mining and is useful in various fields, including pattern recognition, machine learning, and information retrieval. Because the area has been studied extensively, many clustering methods have been proposed in the literature. Among them, some methods focus on mining clusters with arbitrary...
The division of police patrol districts affects patrol performance, such as average response time and workload variation. However, the possible sample space is large and the corresponding graph-partitioning problem is NP-complete. Moreover, the resulting patrol beats must be contiguous and compact. We propose a heuristic-based clustering method to divide a given police district into optimal patrol...
In this paper, a new approach is presented for data stream clustering, which has been a popular subject in recent years. The proposed approach uses two distinct data stream algorithms. It integrates localized Linear Discriminant Analysis (LLDA), which is adapted from Linear Discriminant Analysis (LDA) for data streams, into CEDAS, which uses a graph structure for...
Clustering is a classical unsupervised learning task that aims to divide a data set into several groups of similar objects. The clustering problem has been studied for many years, and many excellent clustering algorithms have been proposed. In this paper, we propose a novel density-based clustering method that is simple but effective. The primary idea of the proposed method is given as follows...
Shared Nearest Neighbor (SNN) clustering is a well-established density-based clustering algorithm that can find clusters of different sizes, shapes, and densities. SNN has been widely adopted in numerous applications. As datasets have become extremely large, it is inefficient or even impossible to store and process large-scale data on a single machine. Therefore, the scalability...
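The abstract above is truncated, so the paper's distributed variant is not shown here. As a rough illustration of the underlying idea only, the following single-machine sketch computes a shared-nearest-neighbor similarity in one common formulation: two points get a nonzero similarity only if each appears in the other's k-nearest-neighbor list, and the similarity is the number of neighbors the two lists share. The function names `knn_lists` and `snn_similarity` and the parameter `k` are assumptions made for this example, not names from the paper.

```python
from math import dist


def knn_lists(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors."""
    result = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        order = sorted(others, key=lambda j: dist(p, points[j]))
        result.append(set(order[:k]))
    return result


def snn_similarity(points, k):
    """Shared-nearest-neighbor similarity matrix (illustrative formulation).

    sim[i][j] is the number of common k-nearest neighbors of i and j,
    counted only when i and j are in each other's neighbor lists.
    """
    knn = knn_lists(points, k)
    n = len(points)
    sim = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if j in knn[i] and i in knn[j]:
                sim[i][j] = sim[j][i] = len(knn[i] & knn[j])
    return sim
```

This brute-force version compares every pair of points, which is exactly the kind of cost that motivates the scalability concern raised in the abstract.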
This research paper presents a clustering algorithm to discover clusters of unusual shapes and densities. Hierarchical and density-based approaches are used to construct a minimum spanning tree (MST); the MST can be divided into two segments. In the first segment, the local density is estimated at every data point. In the subsequent segment, hierarchical approaches are used to combine clusters according to the...
Fuzzy joint points (FJP) is a method that uses a fuzzy neighborhood notion to deal with the neighborhood parameter selection issue of classical density-based clustering and offers an unsupervised clustering tool. Recent work has improved the method's speed to enable its use in big data applications. However, the space efficiency of the method is still a limiting factor. In this work, we discuss...
CFSFDP is a clustering algorithm based on density peaks that can cluster non-spherical data sets and has the advantages of fast clustering and simple implementation. However, the global density threshold dc is specified without considering the spatial distribution of the data, which decreases clustering quality. Moreover, data sets with multi-density peaks cannot be clustered...
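To make the role of dc concrete, here is a minimal sketch of the two per-point quantities that density-peaks clustering is usually described with: a local density rho (the number of other points closer than dc) and a separation delta (the distance to the nearest point of higher density, or the largest distance for a point that has no denser neighbor). This is the basic scheme, not the improvement the truncated abstract goes on to propose; the function name `density_peak_scores` is an assumption for this example, and ties in density are not handled specially.

```python
from math import dist


def density_peak_scores(points, dc):
    """Return (rho, delta) lists for basic density-peaks clustering.

    rho[i]   : number of other points strictly closer than dc to point i.
    delta[i] : distance from i to the nearest point with higher rho; for a
               point with no denser point, the largest distance from i.
    """
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < dc)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(d[i]))
    return rho, delta
```

Points with both a large rho and a large delta are the candidate cluster centers. Because rho depends entirely on the single global value dc, a poor choice of dc changes every density estimate at once, which is the weakness the abstract points to.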
In the information era, extracting useful information from massive amounts of data and processing it in a short time has become a crucial part of data mining. CURE is a very useful hierarchical algorithm that can identify clusters of arbitrary shape as well as outliers. In this paper, we have implemented the CURE clustering algorithm in a distributed environment using Apache Hadoop...
Data mining has gained much importance in research in recent years. It is well suited to analyzing data from any field and providing decision-oriented output. Data are now generated and stored at high speed. Non-stationary systems play a major role in providing such data. The availability of such data creates scope for analysis by researchers. Such data, which are continuous, unbounded,...
Cluster analysis is a popular technique in statistics and computer science whose objective is to group similar observations into relatively distinct groups, generally known as clusters. In this paper, we propose an approach called Manifold Density Peaks Clustering to improve basic density peaks clustering. It mainly concerns three aspects. First, geodesic distance is adopted to calculate manifold...
This paper proposes a density grid-based algorithm (C_UStream) for clustering uncertain data streams in a sliding window, which can find clusters of arbitrary shapes. The statistical summary information of each grid is stored in a linked queue structure using a sampling window mechanism. To guarantee the validity of the clustering, the expired grids in the current window are removed regularly....
This paper proposes an automatic entropy-based clustering algorithm for discovering interest patterns in users' web logs. We introduce information entropy into the clustering algorithm. Compared with traditional clustering algorithms, our method does not require any parameters to be specified by the end user. It can also discover clusters of arbitrary shape and size. Experimental...
Distributed data mining techniques, mainly distributed clustering, have been widely used in the last decade because they deal with very large and heterogeneous datasets that cannot be gathered centrally. Current distributed clustering approaches normally generate global models by aggregating local results obtained at each site. While this approach mines the datasets on their locations...
This paper presents an approach to classification based on neighborhood expansion. The proposed algorithm can (1) automatically find the number of clusters, and (2) classify irregular data sets. In this approach, we first define the distance between a point and a set, and then the neighborhood of a data set. The algorithm can begin with any point in the data set and expands the point to a...
Humans analyze images mostly by their semantics. However, such semantic clustering of images is a difficult task in computer vision. This work proposes a clustering algorithm that produces a dataset with images grouped semantically. It does not use any background knowledge about either the semantics of the images or the number of clusters formed. The algorithm is based...
DBSCAN is a density-based clustering algorithm. It can divide high-density regions into clusters, filter out noise effectively, and discover clusters of arbitrary shape and size in a dataset. However, the DBSCAN algorithm needs to traverse the dataset to find core objects, which results in a large I/O cost when processing large-scale datasets. A fast algorithm (BEDBSCAN) is developed...
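For reference, the traversal cost mentioned above is visible in a minimal sketch of plain DBSCAN (not the BEDBSCAN variant, whose details are cut off in this abstract). Every region query below scans the whole dataset, and one query is issued for each point that gets expanded. The names `region_query`, `dbscan`, `eps`, and `min_pts` follow common textbook usage and are assumptions here, not identifiers from the paper.

```python
from math import dist

NOISE = -1  # label used for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        cluster += 1  # i is a core object: start a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core object
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return labels
```

Called on two tight groups of points plus one distant point, this sketch assigns one label per group and marks the distant point as noise. A spatial index in place of the linear scan in `region_query` is the usual first step toward reducing the cost.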
Clustering is a semi-supervised or unsupervised technique for classifying a set of data according to underlying characteristics or similarity. There are many different algorithms for different applications, and each has advantages in particular fields. For data obtained from an automotive LUX-LIDAR, the existing algorithms fail to cluster them accurately or efficiently. It...