Info-Kmeans, a K-means clustering method employing KL-divergence as the proximity function, is one of the representative methods in information-theoretic clustering. With the explosive growth of online texts such as online reviews and user-generated content, text data is becoming sparser and much larger, which poses significant challenges to both the effectiveness and the efficiency of text clustering...
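To make the role of KL-divergence as a proximity function concrete, here is a minimal sketch of the assignment step of a K-means variant that uses it. The function names `kl_divergence` and `assign` are hypothetical and are not taken from the paper; inputs are assumed to be discrete probability distributions such as normalized term frequencies. It also shows why sparse data is awkward: a zero entry in a centroid makes the divergence infinite.

```python
import math

def kl_divergence(p, q):
    """D(p || q) = sum_i p_i * log(p_i / q_i), treating 0 * log(0 / q_i) as 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                # A zero in the centroid where the point has mass: divergence is infinite.
                return math.inf
            total += pi * math.log(pi / qi)
    return total

def assign(points, centroids):
    """Assign each point to the centroid with the smallest KL-divergence from it."""
    return [
        min(range(len(centroids)), key=lambda k: kl_divergence(x, centroids[k]))
        for x in points
    ]

points = [[0.9, 0.1], [0.1, 0.9]]
centroids = [[0.8, 0.2], [0.2, 0.8]]
labels = assign(points, centroids)
```

This is only an illustration of the proximity function; the abstract does not describe how the proposed method handles the sparsity problem.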
In big data research, an important field is the big data graph algorithm. The Bayesian Network (BN) is a very powerful graph model for causal relationship modeling and probabilistic reasoning. One key process of building a BN is discovering its structure -- a directed acyclic graph (DAG). In the literature, numerous Bayesian network structure learning algorithms are proposed to discover BN structure...
Mining abnormal patterns is important in many areas. With the prevalence of big data, an efficient algorithm named PPSpan (JOMP-based parallel Prefix Span) is proposed, building on research into traditional serial sequential pattern mining methods. Firstly, redundant parameters are eliminated with grey correlation analysis. Secondly, outlier information is extracted according to the...
Running data-intensive scientific workflows across multiple data centers faces a massive data transfer problem, which leads to low efficiency in actual workflow applications for scientists. By considering data size and data dependency, we propose a k-means algorithm based initial data placement strategy that places the most related initial data sets into the same data center at the workflow preparation stage...
With the rapid growth of cloud-based internet applications, the need for efficient resource allocation, load balancing and cost management increases. In this paper, we propose a group-auction based mechanism for the cloud instance market to efficiently allocate resources. In the market system, resource providers offer resources in the form of virtual machines. Users submit their bids. The proposed system...
With the explosion of data in the past decade, big data is becoming a research hotspot in the information field. Many cloud-based distributed data processing platforms have been proposed to provide efficient and cost-effective solutions for big data query processing, such as Hadoop, Hive, Pig, etc. However, most current research works focus on improving the performance of query processing...
Approximate duplicate-detection (or membership query) in data streams answers the question of whether an element from a large universe U (a query element) is present in a small subsequence of a data stream or not. It is an important query that has many Internet applications, such as web crawling, social networks and so on. Existing approximate duplicate-detection methods in the sliding window model...
Massive cloud-based data-intensive applications (e.g., iterative MapReduce-based) could involve graph data processing. How to effectively analyze and process large-scale graph data is an unsolved challenging problem. We present a parallel computation framework, named MyBSP, which is inspired by Google's Pregel system. MyBSP supports and implements the Bulk Synchronous Parallel (BSP) programming model,...
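The Bulk Synchronous Parallel model mentioned above can be illustrated with a single-process simulation of supersteps. This is a sketch of the general BSP idea only, not of MyBSP's or Pregel's API; the function `bsp_max` and its graph representation are hypothetical. Each vertex reads the messages sent to it in the previous superstep, updates its value, and sends messages that become visible only after the barrier.

```python
def bsp_max(graph, values):
    """Propagate the maximum value through a graph using BSP-style supersteps.

    graph:  dict mapping each vertex to a list of neighbours
            (every neighbour must also be a key of the dict).
    values: dict mapping each vertex to its initial value.
    """
    values = dict(values)
    # Superstep 0: every vertex sends its value to its neighbours.
    inbox = {v: [] for v in graph}
    for v, neighbours in graph.items():
        for n in neighbours:
            inbox[n].append(values[v])

    # Later supersteps: run until no messages are in flight.
    while any(inbox.values()):
        outbox = {v: [] for v in graph}
        for v, messages in inbox.items():
            if messages and max(messages) > values[v]:
                values[v] = max(messages)
                for n in graph[v]:
                    outbox[n].append(values[v])
        # Barrier: messages sent in this superstep are delivered in the next one.
        inbox = outbox
    return values

graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
result = bsp_max(graph, {"a": 3, "b": 6, "c": 2})
```

In a real BSP framework the per-vertex loop would run in parallel across workers, with the barrier enforced by the runtime rather than by swapping dictionaries.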
Cloud service providers (CSPs) usually deploy geographically distributed data centers to improve QoS for colocated customers. Inter-data center traffic constitutes almost half of a data center's export traffic and accounts for a significant part of the operational cost. Many store-and-forward mechanisms have been proposed to improve the efficiency of inter-data center transfer. However, existing store-and-forward...
Data center networks are becoming increasingly important with the growth of cloud computing. For any integers k ≥ 0 and n ≥ 2, the k-dimensional DCell, Dk,n, has been proposed as a server-centric data center network structure and is one of the most important data center networks. In this paper, we propose an efficient algorithm for finding disjoint paths in node-to-set routing...