Graph structures, which are often used to model relationships between data items, have drawn more and more attention. Graph datasets from many important domains are scale-free: they contain hubs, vertices whose degree is much larger than the average. These hubs may cause load imbalance, poor scalability and high communication overhead...
Work-stealing, a common user-level task scheduler for managing and scheduling tasks among worker threads, has been widely adopted in multithreaded applications. With work-stealing, worker threads attempt to steal tasks from other threads' queues when they run out of their own tasks. Though work-stealing-based applications can achieve good performance due to the dynamic load balancing, these steal...
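The stealing discipline this abstract refers to can be sketched in a few lines. The following is a minimal single-threaded simulation of the idea, not the paper's implementation; the names (`Worker`, `run`) and the random-victim policy are illustrative assumptions. Owners pop from the back of their own deque while thieves take from the front, which is the usual trick to reduce contention between owner and thief.

```python
import random
from collections import deque

class Worker:
    """One worker with its own double-ended task queue."""
    def __init__(self, wid, tasks):
        self.wid = wid
        self.queue = deque(tasks)  # owner pops from the back, thieves steal from the front
        self.executed = []

    def pop_local(self):
        return self.queue.pop() if self.queue else None

    def steal_from(self, victim):
        # Thieves take from the opposite end of the victim's deque.
        return victim.queue.popleft() if victim.queue else None

def run(workers, rng):
    """Run until every queue is empty, letting idle workers steal."""
    while any(w.queue for w in workers):
        for w in workers:
            task = w.pop_local()
            if task is None:
                # Out of local work: pick a random non-empty victim and steal.
                victims = [v for v in workers if v is not w and v.queue]
                if not victims:
                    continue
                task = w.steal_from(rng.choice(victims))
            if task is not None:
                w.executed.append(task)
    return workers

# Demo: worker 1 starts idle, so every task it runs must be stolen from worker 0.
workers = run([Worker(0, list(range(8))), Worker(1, [])], random.Random(0))
```

In the demo, the two workers end up executing four tasks each: load balancing emerges from the stealing rule alone, with no central dispatcher.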
Providing QoS guarantees for hybrid storage systems made up of both solid-state drives (SSDs) and hard disks (HDs) is a challenging problem. Since HDs and SSDs have widely different IOPS capacities, it is not sensible to treat the storage system as a monolithic black box; instead, a useful QoS model must differentiate the IOs made to different device types. Traditional storage resource...
In bioinformatics applications, suffix arrays are widely used for DNA sequence alignment in the initial exact-match phase of heuristic algorithms. With the exponential growth and availability of data, using many-core accelerators, such as GPUs, to optimize existing algorithms is very common. We present a new implementation of suffix arrays on the GPU. As a result, suffix array construction on the GPU achieves...
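The exact-match phase the abstract mentions works by sorting all suffix start positions and then binary-searching the sorted array for the pattern. A small CPU-side sketch (naive O(n² log n) construction for clarity; the paper's GPU construction is far more sophisticated):

```python
def build_suffix_array(text):
    """Naive construction: sort all suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text, sa, pattern):
    """Binary-search the suffix array for every exact occurrence of pattern."""
    n, m = len(sa), len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = n
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])  # all starting positions of exact matches
```

For example, `find_occurrences("banana", build_suffix_array("banana"), "ana")` returns the two starting positions of "ana". Because all occurrences of a pattern share a contiguous range of the suffix array, each query costs O(m log n) comparisons regardless of how many matches there are.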
Host load prediction is one of the key research issues in Cloud computing. However, due to the drastic fluctuation of the host load in the Cloud, accurately predicting the host load remains a challenge. In this paper, a discriminative model (SVM) is employed to improve the accuracy of host load prediction in a Cloud data center. A rich set of features is generated by function-based methods and...
The devices people use to capture multimedia have changed over the years with the rise of smartphones. Smartphones are readily available, easy to use, and capture multimedia at high quality. While consumers capture all of this media, device storage capacities are not changing significantly. Therefore, people look towards cloud storage solutions. The typical consumer stores files with a single provider...
CUDA is a technology developed by NVIDIA that provides a parallel computing platform and programming model for NVIDIA GPUs and compatible ones. It exploits the enormous parallel processing power of GPUs to accelerate a wide range of applications, thus reducing their execution time. rCUDA (remote CUDA) is a middleware that grants applications concurrent access to CUDA-compatible...
The well-known gap between relative CPU speeds and storage bandwidth results in the need for new strategies for managing I/O demands. In large-scale MPI applications, collective I/O has long been an effective way to achieve higher I/O rates, but it poses two constraints. First, although overlapping collective I/O and computation represents the next logical step toward a faster time to solution, MPI's...
The power consumption of a data centre (DC) can be attributed to the power consumed for running the servers and to the computer room air conditioner (CRAC) power for cooling them. The challenge is to distribute the load among servers, controlling the number of active servers and optimally balancing IT and cooling power requirement. This goal demands integration of thermal, power and workload models...
Unified Memory is an emerging technology supported by CUDA 6.X. Before CUDA 6.X, the CUDA programming model relied on programmers to explicitly manage data movement between CPU and GPU, which increases programming complexity. CUDA 6.X introduces a new technology, called Unified Memory, whose programming model defines CPU and GPU memory space as a single coherent...
We present a scalable method to extensively search for and accurately select pharmaceutical drug candidates in large spaces of drug conformations computationally generated and stored across the nodes of a large distributed system. For each ligand conformation in the dataset, our method first extracts relevant geometrical properties and transforms the properties into a single metadata point in the...
Provenance captured from E-Science experimentation is often large and complex, for instance, from agent-based simulations that have tens of thousands of heterogeneous components interacting over extended time periods. The subject of study of my dissertation is the use of E-Science provenance at scale. My initial research studied the visualization of large provenance graphs and proposed an abstract...
Although content sharing provides many benefits, content owners lose full control of their content once it is given away. Existing solutions provide limited content access control capabilities, as they are vendor-specific, unstructured and inflexible. In this paper, we present an open and flexible software solution called SelfProtect Object (SPO). SPO bundles content and policy files in an...
Nowadays, there are several open-source solutions for building private, public and even hybrid clouds, such as Eucalyptus, Apache CloudStack and OpenStack. KVM is one of the hypervisors supported by these cloud platforms. Different KVM configurations are supplied by these platforms and, in some cases, only a subset of CPU features is presented to guest systems, providing a basic abstraction...
Entity resolution is a basic operation of data quality management, and a key step in extracting value from data. Parallel data processing frameworks based on MapReduce can deal with the challenges brought by big data. However, two important issues remain: avoiding the redundant pairs produced by multi-pass blocking, and optimizing candidate pairs based on the transitive relations of similarity...
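The two issues named in this abstract can be illustrated without MapReduce machinery. The sketch below is an assumption-laden toy, not the paper's method: multi-pass blocking emits the same record pair from several blocking keys (hence deduplication), and transitive similarity (if A matches B and B matches C, then A, B, C are one entity) is captured with a union-find structure.

```python
class DisjointSet:
    """Union-find with path halving, used for transitive closure of matches."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def dedupe_candidate_pairs(blocks):
    """Multi-pass blocking emits the same pair under several blocking keys;
    keep each unordered pair exactly once."""
    seen = set()
    for block in blocks:
        for i, a in enumerate(block):
            for b in block[i + 1:]:
                seen.add(tuple(sorted((a, b))))
    return seen

def resolve(pairs):
    """Merge matched pairs into entity clusters via transitive closure."""
    ds = DisjointSet()
    for a, b in pairs:
        ds.union(a, b)
    clusters = {}
    for x in list(ds.parent):
        clusters.setdefault(ds.find(x), set()).add(x)
    return sorted(sorted(c) for c in clusters.values())
```

For instance, blocks `[["r1","r2","r3"], ["r2","r3"]]` generate the pair `("r2","r3")` twice but it is compared only once, and the three records collapse into a single cluster through transitivity.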
We present an architecture that increases persistence and reliability of automated infrastructure management in the context of hybrid, cluster-cloud environments. We describe our highly available implementation that builds upon Chef configuration management system and infrastructure-as-a-service cloud resources from Amazon Web Services. We summarize our experience with managing a 20-node Linux cluster...
The Smith-Waterman (SW) algorithm is universally used for database search owing to its high sensitivity. The widespread impact of the algorithm is reflected in the over 8000 citations it has received in the past decades. However, the algorithm's time and space complexity is prohibitively high, posing significant computational challenges. Apache Spark is an increasingly...
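The quadratic cost this abstract refers to comes from the Smith-Waterman dynamic-programming recurrence, sketched below with a rolling row so only O(min-dimension) memory is held. The scoring parameters are illustrative defaults, not values from the paper, and only the best local score is returned (no traceback):

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score via the classic O(len(a)*len(b)) DP."""
    cols = len(b) + 1
    prev = [0] * cols  # DP row for the previous character of a
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * cols
        for j in range(1, cols):
            score = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0,                    # local alignment may restart anywhere
                         prev[j - 1] + score,  # align a[i-1] with b[j-1]
                         prev[j] + gap,        # gap in b
                         cur[j - 1] + gap)     # gap in a
            best = max(best, cur[j])
        prev = cur
    return best
```

Every cell depends on three neighbours, so the full table of `len(a) * len(b)` cells must be evaluated; this is exactly the workload that data-parallel frameworks such as Spark, or GPUs, are used to partition.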