The aspects of big data and its technologies are multiplying due to new methods of collecting data and diverse needs. Meteorological data is also a source of big data in terms of volume, variety, veracity and velocity, and it includes structured, unstructured and hybrid forms. This paper aims to apply the Hadoop architecture and the MapReduce algorithm to meteorological big data. It also describes...
With the phenomenal increase in digital data, it is inefficient to run traditional clustering algorithms on separate servers. To deal with this problem, researchers are migrating to distributed environments to implement traditional clustering algorithms, more specifically K-means clustering. In traditional K-means clustering, the problem of instability caused by the random initial centers exists...
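One round of K-means maps naturally onto the map/reduce pattern the abstract refers to: the map phase assigns each point to its nearest center, and the reduce phase averages each group into a new center. A minimal single-round sketch in plain Python (names and 2-D test data are ours, simulating the two phases rather than running on a cluster):

```python
# Minimal sketch of one MapReduce round of K-means.
# "Map": emit (nearest-center index, point); "reduce": average each group.
from collections import defaultdict
import math

def nearest(point, centers):
    # Index of the center closest to this point (Euclidean distance).
    return min(range(len(centers)),
               key=lambda i: math.dist(point, centers[i]))

def kmeans_round(points, centers):
    groups = defaultdict(list)
    for p in points:                  # map phase: assign each point
        groups[nearest(p, centers)].append(p)
    new_centers = list(centers)
    for i, pts in groups.items():     # reduce phase: average each group
        new_centers[i] = tuple(sum(c) / len(pts) for c in zip(*pts))
    return new_centers

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers = [(0, 0), (10, 10)]
print(kmeans_round(points, centers))  # → [(0.0, 0.5), (10.0, 10.5)]
```

Iterating this round until the centers stop moving gives the full algorithm; the instability the abstract mentions comes from how the initial `centers` are chosen.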
It was planned that the entire surface of the Slovak Republic would be scanned, and there arose a need to store the resulting data and make it publicly available. For this purpose, a scalable file-based database system for storing and accessing a large amount of geographic point cloud data was developed. The principle of the system was tested and proved to be sufficient in most situations,...
K-means is the most widely used clustering algorithm due to its fairly straightforward implementation in various problems. Meanwhile, as the number of clusters increases, the number of iterations also tends to increase slightly. However, there are still opportunities for improvement, as some studies in the literature indicate. In this study, improved implementations of the k-means algorithm with a centroid...
Rough set theory has proven to be a successful computational intelligence tool. Rough entropy is a basic concept in rough set theory, usually used to measure the roughness of an information set. Existing algorithms can only deal with small data sets. Therefore, this paper proposes a method for parallel computation of entropy using MapReduce, a hot topic in big data mining. Moreover, corresponding...
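The entropy computation being parallelized reduces to counting value frequencies, which is exactly the shape of a MapReduce job: mappers emit `(value, 1)` pairs, reducers sum the counts, and the partial counts combine into H = -Σ p·log₂p. A plain-Python stand-in for that job (the function name and test data are illustrative, not from the paper):

```python
# Sketch: entropy of an attribute via map/reduce-style counting.
# "Map": emit (value, 1); "reduce": sum counts per value; then combine.
from collections import Counter
import math

def entropy(values):
    counts = Counter(values)          # simulates the shuffle + reduce
    n = len(values)
    # H = -sum over values of (p * log2 p)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(entropy(["a", "a", "b", "b"]))  # → 1.0
```

Because per-value counts are associative sums, this computation shards cleanly across mappers, which is what makes a MapReduce formulation viable for data sets too large for a single machine.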
Hadoop is a very efficient distributed processing framework. It is based on the map-reduce approach, in which an application is divided into small fragments of work, each of which may be executed on any node in the cluster. Hadoop is a very efficient tool for storing and processing unstructured, semi-structured and structured data. Unstructured data usually refers to data stored in files rather than in traditional...
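The "small fragments of work" idea can be illustrated with the canonical word-count example in the map-reduce style Hadoop popularized. This is a plain-Python simulation (the `mapper`/`reducer` names are ours, not the Hadoop API), with the sort standing in for the shuffle phase between the two stages:

```python
# Illustrative word count in map-reduce style (plain Python simulation).
from itertools import groupby

def mapper(line):
    # Map: each input line yields (word, 1) pairs independently.
    for word in line.split():
        yield word, 1

def reducer(word, counts):
    # Reduce: sum all counts emitted for one word.
    return word, sum(counts)

lines = ["big data", "big cluster"]
pairs = sorted(kv for line in lines for kv in mapper(line))  # "shuffle/sort"
result = dict(reducer(w, [c for _, c in grp])
              for w, grp in groupby(pairs, key=lambda kv: kv[0]))
print(result)  # → {'big': 2, 'cluster': 1, 'data': 1}
```

Each `mapper(line)` call touches only its own line, which is why the fragments can run on any node; only the shuffle requires coordination.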
This article first introduces the core architecture and operational mechanism of cloud computing and the Hadoop platform, then puts forward the technical architecture of a data mining platform based on Hadoop. After a thorough examination of the MapReduce programming pattern, the HSPRINT decision tree algorithm is implemented. Finally, the effectiveness of the algorithm is verified through experiments.
Big data such as complex networks with millions of vertices and edges is infeasible to process using conventional computation. MapReduce is a programming model that empowers us to analyze big data on a cluster of computers. In this paper, we propose a Parallel Structural Clustering Algorithm for big Networks (PSCAN) in MapReduce for the detection of clusters or community structures in big networks...
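SCAN-style structural clustering, which PSCAN parallelizes, is built on a structural similarity between adjacent vertices: the overlap of their closed neighborhoods, σ(u,v) = |Γ(u)∩Γ(v)| / √(|Γ(u)|·|Γ(v)|). A small sketch of that core measure in plain Python (the adjacency data and function name are illustrative; the paper's contribution is distributing this over MapReduce, which is not shown here):

```python
# Structural similarity at the heart of SCAN-style clustering.
import math

def sigma(adj, u, v):
    # Closed neighborhoods: each vertex counts itself as a neighbor.
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / math.sqrt(len(nu) * len(nv))

# Tiny example graph as adjacency sets.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(sigma(adj, 1, 2))  # → 1.0 (identical closed neighborhoods)
```

Edges whose similarity exceeds a threshold ε connect "core" vertices into clusters; since σ depends only on the two endpoints' neighbor lists, the per-edge computation shards naturally across mappers.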
Cloud storage has become increasingly popular due to its convenience, cost-effectiveness and scalability. It provides the basis for a slate of file hosting services, which offer users the ability to synchronize their files between the servers and their devices. Naive file synchronization, however, requires the whole file to be transmitted to all other locations (servers, devices) whenever the file...
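The inefficiency of naive whole-file synchronization motivates delta sync: split the file into chunks, hash each chunk, and transmit only the chunks whose hash changed. A minimal fixed-size-chunk sketch in plain Python (illustrative only; production services and rsync use rolling hashes so that insertions do not shift every later chunk):

```python
# Minimal delta-sync sketch: send only chunks whose hash changed.
import hashlib

CHUNK = 4  # tiny chunk size for demonstration

def chunk_hashes(data: bytes):
    # Hash each fixed-size chunk of the content.
    return [hashlib.sha256(data[i:i + CHUNK]).hexdigest()
            for i in range(0, len(data), CHUNK)]

def changed_chunks(old: bytes, new: bytes):
    # Indices of chunks in `new` that differ from (or extend past) `old`.
    old_h, new_h = chunk_hashes(old), chunk_hashes(new)
    return [i for i, h in enumerate(new_h)
            if i >= len(old_h) or h != old_h[i]]

# One byte changed in the middle chunk → only chunk 1 must be sent.
print(changed_chunks(b"aaaabbbbcccc", b"aaaaXbbbcccc"))  # → [1]
```

With 12 bytes this saves little, but for a multi-gigabyte file with a one-byte edit the same scheme transmits a single chunk instead of the whole file.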
At present, scheduling is a hot research issue in cloud computing; its purpose is to coordinate cloud resources so that they are used fully and rationally. Data locality is one of the main properties of the Hadoop cloud platform in particular. This paper discusses that property and proposes a new improvement to Hadoop's data-locality scheduling algorithm based on LATE. The algorithm...
With the rapid development of the Internet, e-commerce websites now routinely have to work with log datasets up to a few terabytes in size. How to remove messy data promptly at low cost and find useful information is a problem we have to face. The mining process involves several steps, from pre-processing the raw data to establishing the final models. In this paper, we describe our method to...