Modern GPUs support special protocols to exchange data directly across the PCI Express bus. While these protocols could be used to reduce GPU data transmission times, essentially by avoiding staging through host memory, they require specific hardware features that are not available on current-generation network adapters. In this paper we describe the architectural modifications required to implement peer-to-peer...
DCFA-MPI is an MPI library implementation for Intel Xeon Phi co-processor clusters, where a compute node consists of an Intel Xeon Phi co-processor card connected to the host via PCI Express with InfiniBand. DCFA-MPI enables direct data transfer between Intel Xeon Phi co-processors without assistance from the host. Since DCFA, a direct communication facility for many-core based accelerators, provides...
Packet classification is used in network firewalls to identify and filter threats or unauthorized network access at the application level. This is realized by comparing incoming packet headers against a predefined rule set. Many solutions to packet classification are available, but most of these solutions exploit some features of the rule set in order to minimize the memory footprint of rule set storage...
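The matching step described above can be sketched as the naive linear-search classifier: rules are checked in order, and the first rule whose fields all match decides the action. This is a minimal Python sketch. The `Rule` layout (source/destination prefixes, destination port range, protocol), the first-match semantics and the default-deny fallback are illustrative assumptions, not the scheme proposed in the paper.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    """One firewall rule (hypothetical field layout for illustration)."""
    src: str      # source prefix, e.g. "10.0.0.0/8"
    dst: str      # destination prefix
    dport: range  # destination port range, e.g. range(80, 81)
    proto: str    # "tcp", "udp" or "*" for any protocol
    action: str   # "allow" or "deny"


def classify(rules, src, dst, dport, proto, default="deny"):
    """Return the action of the first rule whose fields all match the header."""
    s, d = ip_address(src), ip_address(dst)
    for r in rules:
        if (s in ip_network(r.src) and d in ip_network(r.dst)
                and dport in r.dport and r.proto in ("*", proto)):
            return r.action
    return default  # no rule matched


RULES = [
    Rule("10.0.0.0/8", "0.0.0.0/0", range(22, 23), "tcp", "deny"),
    Rule("0.0.0.0/0", "192.168.1.0/24", range(80, 81), "tcp", "allow"),
    Rule("0.0.0.0/0", "0.0.0.0/0", range(0, 65536), "*", "deny"),
]
```

Each lookup scans every rule, so lookup time grows linearly with the size of the rule set; practical classifiers build more elaborate data structures to avoid this scan.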
Dynamically Reconfigurable Systems (DRS) allow hardware logic to be partially reconfigured while the rest of the design continues to operate. For example, the Auto Vision driver assistance system swaps video processing engines when the driving conditions change. However, the architectural flexibility of DRS also introduces challenges for verifying system functionality. Using Auto Vision as a case...
Managing future many-core architectures with hundreds of cores, running multiple applications in parallel, is very challenging. One of the major reasons is the communication overhead required to handle such a large system. Distributed management is proposed to reduce this overhead. The architecture is divided into regions which are managed separately. The instance managing the region and the applications...
Deep Belief Networks (DBNs) are state-of-the-art machine learning techniques and one of the most important unsupervised learning algorithms. Training DBNs is computationally intensive, which naturally motivates the investigation of FPGA acceleration. Fixed-point arithmetic can have an important influence on the execution time and prediction accuracy of a DBN. Previous studies have focused only on customized DBN...
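To make the fixed-point question concrete, the sketch below quantizes the weights and inputs of a single neuron's weighted sum and compares the result with floating point. It is a generic illustration that assumes a signed 16-bit word, round-to-nearest (ties to even, as Python's `round` does) and saturation on overflow; the number formats actually evaluated in the paper are not given in this excerpt.

```python
def to_fixed(x: float, frac_bits: int, word_bits: int = 16) -> int:
    """Quantize x to a signed fixed-point integer with `frac_bits` fractional bits.

    Rounds to nearest (ties to even) and saturates to the range of a
    `word_bits`-bit two's-complement word.
    """
    scaled = round(x * (1 << frac_bits))
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def from_fixed(q: int, frac_bits: int) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fixed_dot(weights, inputs, frac_bits: int, word_bits: int = 16) -> float:
    """Weighted sum (a neuron's pre-activation) computed in fixed point.

    Each product carries 2 * frac_bits fractional bits; products are summed in a
    wide accumulator and rescaled once with an arithmetic right shift, which
    rounds toward negative infinity.
    """
    acc = 0
    for w, x in zip(weights, inputs):
        acc += to_fixed(w, frac_bits, word_bits) * to_fixed(x, frac_bits, word_bits)
    return from_fixed(acc >> frac_bits, frac_bits)
```

Sweeping `frac_bits` at a fixed word width exposes the trade-off: more fractional bits reduce rounding error, but shrink the representable range, so large weights or activations saturate.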
Multiply-add operations form a crucial part of many digital signal processing and control engineering applications. Since their performance is decisive for application-level speed-up, it is worthwhile to explore a wide spectrum of implementation alternatives, trading increased area/energy usage for faster units on the critical path of the computation. This paper examines existing solutions and...
With the exhaustion of the IPv4 (32-bit) address space, IPv6 (128-bit) addressing is emerging to accommodate the immense growth of the Internet. However, this poses two main challenges for high-speed routers that perform packet forwarding: 1) increased IP lookup complexity and 2) increased routing table storage requirements. In this work, we present a high-performance IPv6 lookup engine based on routing...
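IP lookup means longest-prefix matching: among all routing-table prefixes that contain the destination address, the most specific one wins. The minimal Python sketch below is a generic hash-per-prefix-length baseline, not the engine proposed in the paper. It shows why 128-bit addresses raise lookup cost: a lookup may have to probe every distinct prefix length in the table, up to 129 for IPv6 versus 33 for IPv4. The table layout and next-hop labels are illustrative assumptions.

```python
from ipaddress import IPv6Address, IPv6Network
from typing import Dict, Optional


def build_table(routes: Dict[str, str]) -> Dict[int, Dict[int, str]]:
    """Index routes as {prefix_length: {network_address_as_int: next_hop}}."""
    table: Dict[int, Dict[int, str]] = {}
    for prefix, next_hop in routes.items():
        net = IPv6Network(prefix)  # strict parsing: host bits must be zero
        table.setdefault(net.prefixlen, {})[int(net.network_address)] = next_hop
    return table


def lookup(table: Dict[int, Dict[int, str]], address: str) -> Optional[str]:
    """Longest-prefix match: probe prefix lengths from longest to shortest."""
    addr = int(IPv6Address(address))
    for plen in sorted(table, reverse=True):
        mask = ((1 << plen) - 1) << (128 - plen)  # `plen` leading one-bits
        next_hop = table[plen].get(addr & mask)
        if next_hop is not None:
            return next_hop
    return None  # no matching route and no default route


# Hypothetical routing table using the 2001:db8::/32 documentation prefix.
TABLE = build_table({
    "::/0": "default",
    "2001:db8::/32": "A",
    "2001:db8:1::/48": "B",
})
```

For example, `lookup(TABLE, "2001:db8:1::5")` matches both the /32 and the /48 prefix and returns the more specific next hop, `"B"`.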
Unlike previous work on energy-efficient algorithms, which assumed that a task can be assigned to any processor, we study the problem of task Scheduling with the objective of Energy Minimization on Restricted Parallel Processors (SEMRPP). The restriction accounts for affinities between tasks and processors; that is, each task has its own set of eligible processors. It...
Scientific simulations and instruments can generate tremendous amounts of data in short time periods. Since the generated data is used to infer new knowledge, it is important to store it efficiently and make it available to scientific endeavors. Although parallel and distributed systems can help ease the management of such data, transmission and storage remain challenging problems. Compression...
Polygon overlay is one of the more complex operations in Geographic Information Systems (GIS). In GIS, a typical polygon tends to be large, often consisting of thousands of vertices. Sequential algorithms for this problem abound in the literature, and most parallel algorithms concentrate on parallelizing only the edge intersection phase. Our research aims to develop parallel algorithms to...
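The edge intersection phase that most parallel algorithms target can be sketched as an all-pairs test between the edges of two polygons, using the standard orientation (cross-product) predicate. This minimal Python sketch illustrates that phase only, with O(nm) pairwise tests for polygons of n and m vertices; it is not the authors' algorithm, and it assumes simple polygons given as vertex lists with exact (e.g. integer) coordinates.

```python
def orient(p, q, r):
    """Twice the signed area of triangle pqr: >0 left turn, <0 right turn, 0 collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def on_segment(p, q, r):
    """Given that p, q, r are collinear, is r inside the bounding box of segment pq?"""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(a, b, c, d):
    """True if segment ab and segment cd share at least one point."""
    o1, o2 = orient(a, b, c), orient(a, b, d)
    o3, o4 = orient(c, d, a), orient(c, d, b)
    if o1 * o2 < 0 and o3 * o4 < 0:
        return True  # proper crossing
    # Touching or collinear-overlap cases.
    return ((o1 == 0 and on_segment(a, b, c)) or (o2 == 0 and on_segment(a, b, d))
            or (o3 == 0 and on_segment(c, d, a)) or (o4 == 0 and on_segment(c, d, b)))


def edges(poly):
    """Closed polygon edges as (start, end) vertex pairs."""
    return [(poly[i], poly[(i + 1) % len(poly)]) for i in range(len(poly))]


def edge_intersections(poly_a, poly_b):
    """Brute-force edge intersection phase: indices (i, j) of intersecting edges."""
    return [(i, j)
            for i, (a, b) in enumerate(edges(poly_a))
            for j, (c, d) in enumerate(edges(poly_b))
            if segments_intersect(a, b, c, d)]
```

Because every (i, j) pair is independent, this phase splits naturally across workers, which is one reason parallel approaches often stop there.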
As we consider building the next generation of extreme-scale systems, many of the biggest challenges are related to memory characteristics. In particular, overcoming challenges related to resilience and memory bandwidth will require innovative strategies for improving the performance of main memory. In this paper, we propose to exploit memory content similarity to improve memory performance. We begin...
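One simple form of memory content similarity is byte-identical pages. The minimal Python sketch below groups pages by a content hash, in the style of page-deduplication schemes. The 4 KiB page size and the use of SHA-256 are assumptions for illustration; the excerpt does not say which kind of similarity the paper exploits or how.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List

PAGE_SIZE = 4096  # assumed page size in bytes


def identical_page_groups(memory: bytes, page_size: int = PAGE_SIZE) -> List[List[int]]:
    """Return groups (of size > 1) of page indices whose contents hash identically.

    A production system would also compare the bytes of candidate pages to rule
    out hash collisions; with SHA-256 such collisions are not expected in practice.
    """
    groups: Dict[bytes, List[int]] = defaultdict(list)
    for offset in range(0, len(memory), page_size):
        digest = hashlib.sha256(memory[offset:offset + page_size]).digest()
        groups[digest].append(offset // page_size)
    return [pages for pages in groups.values() if len(pages) > 1]


def redundant_pages(memory: bytes, page_size: int = PAGE_SIZE) -> int:
    """Pages that could be freed if each group of identical pages kept one copy."""
    return sum(len(g) - 1 for g in identical_page_groups(memory, page_size))
```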
Most real-world network models inherently include some degree of noise due to the approximations involved in measuring real-world data. My thesis focuses on studying how these approximations affect the stability of the networks. In this paper, we focus on the stability of betweenness centrality (BC), a metric used to measure the importance of the vertices in the network. We present our results on...
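To make the stability question concrete, the Python sketch below computes betweenness centrality with Brandes' algorithm for unweighted, undirected graphs, then deletes a single edge, standing in for a measurement error, to show how much the scores can shift. Modelling noise as one missing edge is an assumption for illustration; the excerpt does not say how the thesis perturbs its networks.

```python
from collections import deque
from typing import Dict, List, Set


def betweenness(adj: Dict[str, Set[str]]) -> Dict[str, float]:
    """Unnormalized betweenness centrality via Brandes' algorithm.

    `adj` maps each vertex of an unweighted, undirected graph to its neighbours.
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths (sigma) and predecessors.
        stack: List[str] = []
        preds: Dict[str, List[str]] = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: score / 2 for v, score in bc.items()}


def without_edge(adj: Dict[str, Set[str]], u: str, v: str) -> Dict[str, Set[str]]:
    """Copy of `adj` with the undirected edge {u, v} removed (simulated noise)."""
    out = {x: set(nbrs) for x, nbrs in adj.items()}
    out[u].discard(v)
    out[v].discard(u)
    return out


# A 4-cycle a-b-c-d-a.
CYCLE = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
```

On the 4-cycle every vertex scores 0.5; removing the single edge a-d turns it into the path a-b-c-d, where b and c jump to 2.0 and a and d drop to 0.0, so one missing edge can reorder the vertex ranking completely.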
GEPETO (for GEoPrivacy-Enhancing Toolkit) is a flexible software tool that can be used to visualize and sanitize a particular geolocated dataset, perform inference attacks on it, and measure its utility. The main objective of GEPETO is to enable a data curator (e.g., a company, a governmental agency or a data protection authority) to design, tune, experiment and evaluate various sanitization algorithms and inference...
It is well known that the I/O system, rather than the CPU or memory, is the performance bottleneck of many newly emerged data-intensive applications. Evaluating and understanding I/O system performance has become a pressing issue for the high-performance computing community. Conventional I/O performance metrics, such as Input/Output Operations Per Second (IOPS), bandwidth, and response time, are effective...
To meet the requirements of massive-scale storage environments and to improve the storage space utilization of data center hosts, we designed and implemented InfoStor, a distributed block storage system for heterogeneous environments. Through in-band storage virtualization technology, it provides the reliability of traditional enterprise arrays at low cost and with better scalability; provide...
Given the recent advent of the multicore era [1], research efforts in the area of high-performance, low-latency runtime systems have increased significantly. This research has given rise to new low-overhead scheduling techniques, small-memory-footprint parallel execution units and kernel-free contextual environments. This paper presents a framework and runtime system for a truly heterogeneous...
Understanding large-scale application behavior is critical for effectively utilizing existing HPC resources and making design decisions for upcoming systems. In this work we present a methodology for characterizing an MPI application's large-scale computation behavior and system requirements using information about the behavior of that application at a series of smaller core counts. The methodology...
Algebraic Multigrid (AMG) solvers find wide use in scientific simulation codes. Their ideal computational complexity makes them especially attractive for solving large problems on parallel machines. However, they also involve a substantial amount of data movement, posing challenges to performance and scalability. In this paper, we present an algorithm that provides a systematic means of reducing data...