The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
RDMA (Remote Direct Memory Access) is a technology that enables user applications to perform direct data transfer between the virtual memory of processes on remote endpoints, without operating system involvement or intermediate data copies. Achieving zero intermediate data copies using RDMA requires specialized network interface hardware. Software RDMA drivers emulate RDMA semantics in software to...
FPGAs are continuously increasing in both chip size and operating frequency. Dynamic reconfiguration is easier and more stable with current generation of hardware and software tools. These characteristics have made them more accessible to generic acceleration tasks instead of specialized functions. As a consequence, FPGAs are being deployed in more computing clusters than in the past. This leads to...
The Parallella is a hybrid computing platform that came into existence as the result of a Kickstarter project by Adapteva. It is composed of the high performance, energy-efficient, manycore architecture, Epiphany chip (used as co-processor) and one Zynq-7000 series chip, which normally runs a regular Linux OS version, serves as the main processor, and implements "glue logic" in its internal...
Deformable structures are abundant in various domains such as biology, medicine, life sciences, and ocean engineering. Our previous work created a numerical method, named LBM-IB method [1], to solve the fluid-structure interaction (FSI) problems. Our LBM-IB method is particularly suitable for simulating flexible (or elastic) structures immersed in a moving viscous fluid. Fluid-structure interaction...
A little over a decade ago, Goto and van de Geijn wrote about the importance of the treatment of the translation lookaside buffer (TLB) on the performance of matrix multiplication. Crucially, they did not say how important, nor did they provide results that would allow the reader to make his own judgement. In this paper, we revisit their work and look at the effect on the performance of their algorithm...
Understanding power usage in parallel workloads is crucial to develop the energy-aware software that will run in future Exascale systems. In this paper, we contribute towards this goal by introducing an integrated framework to profile, monitor, model and analyze power dissipation in parallel MPI and multi-threaded scientific applications. The framework includes an own-designed device to measure internal...
Stochastic simulations are often sensitive to the source of randomness that characterizes the statistical quality of their results. Consequently, we need highly reliable Random Number Generators (RNGs) to feed such applications. Recent developments try to shrink the computation time by relying more and more General Purpose Graphics Processing Units (GPGPUs) to speedup stochastic simulations. Such...
New high performance computing (HPC) applications recently have to face scalability over an increasing number of nodes and the programming of special accelerator hardware. Hybrid composition of large computing systems leads to a new dimension in complexity of software development. This paper presents a novel approach to gain insight into accelerator interaction and utilization without any changes...
High Performance Computing (HPC) or scientific codes are being executed across a wide variety of computing platforms from embedded processors to massively parallel GPUs. We present a comparison of the Basic Linear Algebra Subroutines (BLAS) using double-precision floating point on an FPGA, CPU and GPU. On the CPU and GPU, we utilize standard libraries on state-of-the-art devices. On the FPGA, we have...
The increasing computing requirements for GPUs (Graphics Processing Units) have favoured the design and marketing of commodity devices that nowadays can also be used to accelerate general purpose computing. Therefore, future high performance clusters intended for HPC (High Performance Computing) will likely include such devices. However, high-end GPU-based accelerators used in HPC feature a considerable...
Java is a valuable and emerging alternative for the development of parallel applications, thanks to the availability of several Java message-passing libraries and its full multithreading support. The combination of both shared and distributed memory programming is an interesting option for parallel programming multi-core systems. However, the concerns about Java performance are hindering its adoption...
Efficient intra-node shared memory communication is important for high performance computing (HPC), especially with the emergence of multi-core architectures. As clusters continue to grow in size and complexity, the use of virtual machine (VM) technologies has been suggested to ease the increasing number of management issues. As demonstrated by earlier research, shared memory communication must be...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.