The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Innovative scientific applications and emerging dense data sources are creating a data deluge for high-end computing systems. Processing such large input data typically involves copying (or staging) onto the supercomputer's specialized high-speed storage, scratch space, for sustained high I/O throughput. The current practice of conservatively staging data as early as possible makes the data vulnerable...
Virtual machine (VM) technologies have made much progress in improving the efficiency of virtualizing CPU and memory. However, achieving high performance for I/O virtualization remains a challenge, especially for high speed networking devices such as 10 Gigabit Ethernet (10GbE) NICs, and commonly used software-based I/O virtualization approaches usually suffer significant performance degradation compared...
In this paper we present a fine grained parallel architecture that performs meaning comparison using vector cosine similarity (dot product). Meaning comparison assigns a similarity value to two objects (e.g. text documents) based on how similar their meanings (represented as two vectors) are to each other. The novelty of our design is the fine grained parallelism which is not exploited in available...
An Interest Management (IM) mechanism eliminates irrelevant status updates transmitted in Networked Virtual Environments (NVE). However, IM itself involves both computation and communication overhead, of which the latter is the focus of this paper. Traditionally, there are area-based and cell-based IM mechanisms. This paper proposes a hybrid IM mechanism for peer-to-peer NVEs, that utilizes the cell-based...
With larger and larger systems being constantly deployed, trace-based performance analysis of parallel applications has become a challenging task. Even if the amount of performance data gathered per single process is small, traces rapidly become unmanageable when merging together the information collected from all processes. In general, an efficient analysis of such a large volume of data is subject...
Exploiting parallelism in route planning algorithms is a challenging algorithmic problem with obvious applications in mobile navigation and timetable information systems. In this work, we present a novel algorithm for the so-called one-to-all profile-search problem in public transportation networks. It answers the question for all fastest connections between a given station S and any other station...
Very Long Instruction Word (VLIW) processors are a popular choice in embedded domain due to their hardware simplicity, low cost and low power consumption. Simultaneous MultiThreading (SMT) is a popular technique for improving processor performance. To maintain execution semantics, a VLIW instruction needs to be issued in entirety, which restricts the opportunities in SMT. Split-issue at operation-level...
We present collaborative peer-to-peer algorithms for the problem of approximating frequency counts for popular items distributed across the peers of a large-scale network. Our algorithms are attack-resistant in the sense that they function correctly even in the case where an adaptive and computationally unbounded adversary causes up to a 1/3 fraction of the peers in the network to suffer Byzantine...
In this paper we study algorithms for performing the LU and QR factorizations of dense matrices. Recently, two communication optimal algorithms have been introduced for distributed memory architectures, referred to as communication avoiding CALU and CAQR. In this paper we discuss two algorithms based on CAQR and CALU that are adapted to multicore architectures. They combine ideas to reduce communication...
We consider the problem of scheduling jobs on a pool of machines. Each job requires multiple machines on which it executes in parallel. For each job, the input specifies release time, deadline, processing time, profit and the number of machines required. The total number of machines may be different at different points of time. A feasible solution is a subset of jobs and a schedule for them such that...
A quality-of-service (QoS) aware, bi-directional channel NoC (BiNoC) architecture is proposed to support guarantee-service (GS) traffic while reducing packet delivery latency. By incorporating dynamically self-reconfigured bidirectional communication channels between adjacent routers, BiNoC architecture promises more flexibility for various traffic flow patterns. A novel inter-router communication...
We propose a novel job scheduling approach for homogeneous cluster computing platforms. Its key feature is the use of virtual machine technology for sharing resources in a precise and controlled manner. We justify our approach and propose several job scheduling algorithms. We present results obtained in simulations for synthetic and real-world High Performance Computing (HPC) workloads, in which we...
Hypervisor-based fault tolerance (HBFT), a checkpoint-recovery mechanism, is an emerging approach to sustaining mission-critical applications. Based on virtualization technology, HBFT provides an economic and transparent solution. However, the advantages currently come at the cost of substantial overhead during failure-free, especially for memory intensive applications. This paper presents an in-depth...
We describe an approach to parallel graph partitioning that scales to hundreds of processors and produces a high solution quality. For example, for many instances from Walshaw's benchmark collection we improve the best known partitioning. We use the well known framework of multi-level graph partitioning. All components are implemented by scalable parallel algorithms. Quality improvements compared...
Modeling and analysis of large scale scientific systems often use linear least squares regression, frequently employing Cholesky factorization to solve the resulting set of linear equations. With large matrices, this often will be performed in high performance clusters containing many processors. Assuming a constant failure rate per processor, the probability of a failure occurring during the execution...
Although stencil auto-tuning has shown tremendous potential in effectively utilizing architectural resources, it has hitherto been limited to single kernel instantiations; in addition, the large variety of stencil kernels used in practice makes this computation pattern difficult to assemble into a library. This work presents a stencil auto-tuning framework that significantly advances programmer productivity...
The majority of CPUs now sold contain multiple computing cores. However, current desktop grid computing systems either ignore the multiplicity of cores, or treat them as distinct, independent machines. The latter approach ignores the resource contention present between cores in a single CPU, while the former approach fails to take advantage of significant computing power. We propose a decentralized...
The high power consumption of cluster computing infrastructures has become a major concern. It leads to the increased heat dissipation and decreased reliability of cluster servers. Power management becomes a critical issue in cluster computing. In this paper, we start with an analysis of the relationship between cluster performance and power consumption. We study both the problem of minimizing the...
Driven by the increasing demand for large-scale and high-performance data protection, disk-based de-duplication storage has become a new research focus of the storage industry and research community where several new schemes have emerged recently. So far these systems are mainly inline de-duplication approaches, which are centralized and do not lend themselves easily to be extended to handle global...
Backfilling and short-job-first are widely acknowledged enhancements to the simple but popular first-come, first-served job scheduling policy. However, both enhancements depend on user-provided estimates of job runtime, which research has repeatedly shown to be inaccurate. We have investigated the effects of this inaccuracy on backfilling and different queue prioritization policies, determining which...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.