The exponentially growing volume of geospatial data, accumulated at an accelerating pace, has motivated the scientific community to examine novel parallel technologies for tuning the performance of spatial queries. Managing spatial data for optimized query performance is a particularly challenging task. This is due to the growing complexity of geometric computations involved in querying...
In this paper, the implementation of the K-means clustering algorithm on a Hadoop cluster with FPGA-based hardware accelerators is presented. The proposed design follows the MapReduce programming model and uses the Hadoop Distributed File System (HDFS) for storing large datasets. The proposed FPGA-based hardware accelerator for speeding up the K-means clustering algorithm is implemented on Xilinx VC707 evaluation...
One of the most important machine learning techniques is clustering, the grouping of data into different clusters or categories. Several sound algorithms and techniques exist to perform clustering on small to medium scale data. In the era of Big Data, with applications being large-scale and data-intensive in nature, there is a significant increase in the volume, variety and velocity of data...
The abundant aspects of big data and its technology are increasing due to new methods of fetching data and diverse needs. Meteorological data is also a source of big data in terms of volume, variety, veracity and velocity, and it includes structured, unstructured and hybrid forms. This paper aims to apply the Hadoop architecture and the MapReduce algorithm to meteorological big data. It also describes...
Clustering is among the most common data mining techniques, and fuzzy clustering can model the world even more realistically and precisely. One of the most favored fuzzy clustering methods is the Fuzzy C-Means (FCM) algorithm, which is essentially the (original) K-Means clustering algorithm with a fuzzy flavor. However, there are some issues with the fuzzy clustering methods;...
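To make the relation between FCM and K-Means concrete, here is a minimal sketch of the standard FCM membership update. It is not taken from the abstract: the function name `fcm_memberships` and the fuzzifier parameter `m` are illustrative assumptions. Each point receives a degree of membership in every cluster, and as `m` approaches 1 the memberships approach the hard nearest-center assignment of K-Means.

```python
from math import dist

def fcm_memberships(point, centers, m=2.0):
    """Return the membership of `point` in each cluster (values sum to 1).

    Standard FCM update: u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)),
    where d_i is the distance from the point to center i and m > 1.
    """
    d = [dist(point, c) for c in centers]
    # A point that coincides with a center belongs fully to that center.
    if 0.0 in d:
        hit = d.index(0.0)
        return [1.0 if i == hit else 0.0 for i in range(len(d))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in d) for d_i in d]

# Hypothetical example: one point, two centers at distances 1 and 3.
soft = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)], m=2.0)
nearly_hard = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)], m=1.1)
```

With `m=2.0` the point is shared between both clusters in proportion to inverse squared distance; with `m=1.1` almost all of the membership goes to the nearer center.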
In today's digital world, digital data is coming in and going out faster than ever before. This data is of no use until we extract some useful content from it. But it is impractical and inefficient to use traditional database management techniques on big data. That is why big data technologies like Hadoop came into existence. Hadoop is an open source framework, which can be used to process...
With the phenomenal increase in digital data, it is inefficient to run traditional clustering algorithms on separate servers. To deal with this problem, researchers are migrating to distributed environments to implement the traditional clustering algorithms, more specifically K-means clustering. In traditional K-means clustering, the problem of instability caused by the random initial centers exists...
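As a rough illustration of how one K-means iteration maps onto the MapReduce model, the following sketch simulates the map, shuffle and reduce phases in a single Python process. It is not the method of any paper listed here, and it does not use Hadoop: the function names and the sample data are hypothetical.

```python
from collections import defaultdict
from math import dist

def kmeans_map(point, centers):
    """Map step: emit (index of the nearest center, (point, 1))."""
    nearest = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return nearest, (point, 1)

def kmeans_reduce(values):
    """Reduce step: average all points assigned to one center."""
    total = [0.0] * len(values[0][0])
    count = 0
    for point, n in values:
        total = [t + x for t, x in zip(total, point)]
        count += n
    return tuple(t / count for t in total)

def kmeans_iteration(points, centers):
    """One simulated MapReduce round: map, group by key, reduce."""
    groups = defaultdict(list)
    for p in points:
        key, value = kmeans_map(p, centers)
        groups[key].append(value)
    # A center with no assigned points is kept unchanged.
    return [kmeans_reduce(groups[i]) if i in groups else centers[i]
            for i in range(len(centers))]

# Hypothetical data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centers = [(1.0, 1.0), (9.0, 9.0)]
new_centers = kmeans_iteration(points, centers)
```

In a real cluster the map calls would run in parallel over data blocks, and the iteration would repeat until the centers stop moving. The result still depends on the initial `centers`, which is the instability the abstract refers to.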
In this paper, a new algorithm for visualization of high-multidimensional data is described. The algorithm follows several steps. First, centers representing several categories are selected, and the Euclidean distances between these centers are calculated in a high-dimensional space. Then these centers are placed in a 2-dimensional space in such a way that distances in this 2-dimensional space are...
K-means is the most widely used clustering algorithm due to its fairly straightforward implementation in various problems. However, when the number of clusters increases, the number of iterations also tends to increase slightly. There are still opportunities for improvement, as some studies in the literature indicate. In this study, improved implementations of the k-means algorithm with a centroid...
Big data is a set of very large and complex data that is hard to load on computers. The main challenge in the big data world relates to searching, categorizing and analyzing such data, especially when they are unbalanced. Although there is a lot of work in the field of big data, analyzing unbalanced big data is still a fundamental challenge in this area. In this paper we try to solve the problem of RSIO-LFCM...
Rough set theory has been proven to be a successful computational intelligence tool. Rough entropy is a basic concept in rough set theory, and it is usually used to measure the roughness of an information set. Existing algorithms can only deal with small data sets. Therefore, this paper proposes a method for parallel computation of entropy using MapReduce, which is popular in big data mining. Moreover, corresponding...
Big data, such as complex networks with millions of vertices and edges, is infeasible to process using conventional computation. MapReduce is a programming model that enables us to analyze big data on a cluster of computers. In this paper we propose a Parallel Structural Clustering Algorithm for big Networks (PSCAN) in MapReduce for the detection of clusters or community structures in big networks...