The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Sparse Matrix Vector-Multiplication is an important operation for many iterative solvers. However, peak performance is limited by the fact that the commonly used algorithm alternates between compute-bound and memory-bound steps. This paper proposes a novel data structure and an FPGA-based hardware core that eliminates the limitations imposed by memory.
During recent years General-Purpose Graphical Processing Units (GP-GPUs) have entered the field of High-Performance Computing (HPC) as one of the primary architectural focuses for many research groups working with complex scientific applications. Nvidia's Tesla C2050, codenamed Fermi, and AMD's Radeon 5870 are two devices positioned to meet the computationally demanding needs of supercomputing research...
For some classes of problems, NVIDIA CUDA abstraction and hardware properties combine with problem characteristics to limit the specific problem instances that can be effectively accelerated. As a real-world example, a two-dimensional correlation-based template-matching MATLAB application is considered. While this problem has a well known solution for the common case of linear image filtering -- small...
Radar is a data-intensive measurement technique often requiring significant processing to make full use of the received signal. However, computing capacity is limited at remote or mobile radar installations thereby limiting radar data products used for real-time decisions. We used graphics processing units (GPUs) to accelerate processing of high resolution phase-coded radar data from the Modular UHF...
The graphics processing unit (GPU) has made significant strides as an accelerator in parallel computing. However, because the GPU has resided out on PCIe as a discrete device, the performance of GPU applications can be bottlenecked by data transfers between the CPU and GPU over PCIe. Emerging heterogeneous computing architectures that "fuse" the functionality of the CPU and GPU, e.g., AMD...
Object tracking is an important task in computer vision applications. One of the crucial challenges is the real-time speed requirement. In this paper we implement an object tracking system in reconfigurable hardware using an efficient parallel architecture. In our implementation, we adopt a background subtraction based algorithm. The designed object tracker exploits hardware parallelism to achieve...
Design of data structures for high performance computing (HPC) is one of the principal challenges facing researchers looking to utilize heterogeneous computing machinery. Heterogeneous systems derive cost, power, and speed efficiency by being composed of the appropriate hardware for the task. Yet, each type of processor requires a specific organization of the application state in order to achieve...
Three out of the top four supercomputers in the November 2010 TOP500 list of the world's most powerful supercomputers use NVIDIA GPUs to accelerate computations. Ninety-five systems from the list are using processors with six or more cores. Three-hundred-sixty-five systems use quad-core processor-based systems. Thirty-seven systems are using dual-core processors. The large-scale enabling of hybrid...
Open Computing Language (OpenCL) is fast becoming the standard for heterogeneous parallel computing. It is designed to run on CPUs, GPUs, and other accelerator architectures. By implementing a real world application, a solar radiation model component widely used in climate and weather models, we show that OpenCL multi-threaded programming and execution model can dramatically increase performance even...
Networking performance continues to grow but processor clock frequencies have not. Likewise, the latency to primary memory is not expected to improve dramatically either. This is leading computer architects to reconsider the networking subsystem and the roles and responsibilities of hardware and the operating system. This paper presents the first component of a new networking subsystem where the hardware...
We propose an approach for high-performance scientific computing that separates the description of algorithms from the generation of code for parallel hardware architectures like Multi-Core CPUs, GPUs or FPGAs. This way, a scientist can focus on his domain of expertise by describing his algorithms generically without the need to have knowledge of specific hardware architectures, programming languages,...
At Fermilab, we have prototyped a GPU-accelerated network performance monitoring system, called G-NetMon, to support large-scale scientific collaborations. In this work, we explore new opportunities in network traffic monitoring and analysis with GPUs. Our system exploits the data parallelism that exists within network flow data to provide fast analysis of bulk data movement between Fermilab and collaboration...
Lattice QCD calculations were one of the first applications to show the potential of GPUs in the area of high performance computing. Our interest is to find ways to effectively use GPUs for lattice calculations using the overlap operator. The large memory footprint of these codes requires the use of multiple GPUs in parallel. In this paper we show the methods we used to implement this operator efficiently...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.