Meshless methods for simulating fluid flows have evolved steadily over the years, since they are a strong alternative for handling large deformations, where mesh-based methods fail to perform efficiently. A well-known meshless method is the Moving Particle Semi-implicit (MPS) method, which was designed to simulate free-surface flows of truly incompressible fluids. Many variations and...
Due to the rapid increase in the dimension and acquisition rate of biological data, traditional analysis methods are unable to achieve acceptable accuracy. Recently, deep learning technologies have shown outstanding results in many domains, especially in pattern recognition in the field of bioinformatics. In this paper, we provide background on deep learning and its frameworks. In addition, we review...
Many neural architectures, including RBF, SVM, and FSVC classifiers, as well as deep-learning solutions, require the efficient implementation of neuron layers, each having a given number m of neurons and a specific set of parameters, and operating on a training or test set of N feature vectors, each of dimension n. Herein we investigate how to allocate the computation on GPU kernels and how to better...
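The workload described above, an m-neuron layer applied to N feature vectors of dimension n, can be sketched as an (N x n) by (n x m) product plus a per-neuron bias and activation. This is a generic illustration; the function name and the sigmoid activation are assumptions, not taken from the paper.

```python
# Generic sketch of an m-neuron layer over N feature vectors of dimension n.
# Names and the choice of sigmoid activation are illustrative assumptions.
import math

def layer_forward(X, W, bias):
    """X: N x n inputs, W: n x m weights, bias: length-m biases.
    Returns the N x m matrix of sigmoid activations."""
    n, m = len(W), len(W[0])
    out = []
    for x in X:
        row = []
        for j in range(m):
            # z = bias_j + sum_i x_i * W_ij, then squash with a sigmoid
            z = bias[j] + sum(x[i] * W[i][j] for i in range(n))
            row.append(1.0 / (1.0 + math.exp(-z)))
        out.append(row)
    return out
```

On a GPU, the natural question the abstract raises is how to partition this N x m grid of independent dot products across kernels and threads.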
This paper presents HPSM, a high-level C++ framework for exploring multi-CPU and multi-GPU systems. HPSM provides parallel loops and reductions implemented over three parallel backends: Serial, OpenMP (with the GCC and libKOMP runtimes), and StarPU. We evaluated HPSM's development effort with an AXPY program, and its performance with three parallel benchmarks: N-Body, Hotspot, and a CFD solver. The CPU-GPU combination...
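HPSM itself is a C++ framework and its real API is not shown in the abstract; as a rough, hypothetical illustration of the parallel-loop-plus-reduction pattern such frameworks expose, here is a sketch in Python using the standard library's thread pool, applied to the AXPY-style workload the abstract mentions.

```python
# Hypothetical sketch of the parallel-loop + reduction pattern; this is NOT
# HPSM's API, just an illustration of the programming model it provides.
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator

def parallel_reduce(data, map_fn, reduce_fn, init, workers=4):
    """Apply map_fn to chunks of data in parallel, then fold the partials."""
    chunk = max(1, len(data) // workers)
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(
            lambda c: reduce(reduce_fn, map(map_fn, c), init), chunks))
    return reduce(reduce_fn, partials, init)

# AXPY-style loop (y = a*x + y), followed by a sum reduction over y.
a, x, y = 2.0, [1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 10.0, 10.0]
y = [a * xi + yi for xi, yi in zip(x, y)]
total = parallel_reduce(y, lambda v: v, operator.add, 0.0)
```

A framework like HPSM would dispatch the same loop body to a serial, OpenMP, or StarPU backend without changing the user's code.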
Cache tuning has been widely studied in CPUs and shown to achieve substantial energy savings with minimal performance degradation. However, cache tuning has yet to be explored in general-purpose graphics processing units (GPGPUs), which have emerged as efficient alternatives for general-purpose high-performance computing. In this paper, we explore autonomic cache tuning for GPGPUs, where the cache...
Associative memories are models capable of storing and retrieving messages given only a part of their content. These systems have been used in applications such as database engines, network routers, natural language processing, and image recognition due to their error-correction capability in pattern retrieval. Recently, Gripon and Berrou introduced a sparse associative memory based on cliques...
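A minimal sketch of the clique idea can make the retrieval mechanism concrete: a message picks one unit per cluster, the chosen units are stored as a fully connected clique, and an erased cluster is recovered by a winner-take-all vote over edges to the known units. The class below is a simplified assumption-laden illustration, not Gripon and Berrou's exact construction.

```python
# Simplified sketch of a clique-based associative memory in the spirit of
# Gripon-Berrou networks. Cluster layout and the winner-take-all retrieval
# rule here are illustrative assumptions, not the authors' exact model.

class CliqueMemory:
    def __init__(self, clusters, neurons_per_cluster):
        self.c = clusters
        self.l = neurons_per_cluster
        self.edges = set()  # undirected edges between (cluster, neuron) units

    def store(self, message):
        """message: one neuron index per cluster; stored as a full clique."""
        units = list(enumerate(message))
        for i in range(len(units)):
            for j in range(i + 1, len(units)):
                self.edges.add(frozenset([units[i], units[j]]))

    def retrieve(self, partial):
        """partial: neuron index per cluster, or None for erased clusters.
        Each erased cluster picks the neuron with most edges to known units."""
        known = [(c, n) for c, n in enumerate(partial) if n is not None]
        out = list(partial)
        for c, n in enumerate(partial):
            if n is None:
                scores = [sum(frozenset([(c, cand), ku]) in self.edges
                              for ku in known) for cand in range(self.l)]
                out[c] = max(range(self.l), key=lambda cand: scores[cand])
        return out
```

The error-correction capability mentioned in the abstract comes from this vote: as long as enough clique edges survive, the erased unit still wins.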
GPUs have been widely used in the past decade to speed up the execution of general-purpose applications with a high level of parallelism. The efficiency of running general-purpose applications on GPUs depends on how well the processing and memory demands of the application are balanced against the hardware resources available on the target GPU, and this balance can significantly affect the power and performance of...
High-performance computing (HPC) is a strategic resource that allows research communities and developers to meet the processing demand (1 exaflop/s) of future exascale computing systems, expected by the end of the current decade. To provide this level of performance, many powerful and energy-efficient devices (MICs, GPUs) and parallel programming models have been proposed...
Technological advancements have created a need for effectively teaching GPU computing, driven by the increasing adoption of parallel computing and the growing use of GPUs for computationally intensive tasks. This paper is motivated by that need. The paper describes a semester-long course on CUDA programming. The course has significant...
Compute Unified Device Architecture (CUDA) is an attractive alternative for our ever-growing need for high-performance computing. However, to extract the full potential of CUDA one should, at the least, be familiar with the programming model and have a fair understanding of the memory and cache architecture. Yet most domain experts from fields that warrant high-performance computing...
Network functions virtualization (NFV) is now commonly used in the areas of data transmission and distributed processing. A key feature of these solutions is that they can be deployed in the cloud. Modern cloud infrastructure often includes GPGPU coprocessors for acceleration purposes. There are a number of problems for which hybrid computing technology can be used to speed up NFV: encryption,...
The architecture of high-performance data storage and processing systems has changed considerably. Modern cloud computing systems are often not only hybrid but also support hardware acceleration. The paper describes the scope of information-security protocols based on PRNGs in industrial systems. The work provides a method for implementing a GOST R 34.12-2015-based pseudo-random number generator in...
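Building a PRNG on a block cipher typically follows the counter-mode pattern: encrypt an incrementing counter under a fixed key and emit the ciphertext blocks as random output. Implementing the GOST R 34.12-2015 cipher itself is out of scope here, so the sketch below shows only the generic construction, with SHA-256 deliberately standing in for the block cipher as a clearly labeled substitute.

```python
# Generic counter-mode PRNG sketch. SHA-256 stands in for the GOST R
# 34.12-2015 block cipher, which the paper actually uses; this is an
# illustration of the construction, not of the cipher.
import hashlib

class CounterPRNG:
    def __init__(self, key: bytes):
        self.key = key
        self.counter = 0

    def _block(self) -> bytes:
        # Encrypt-the-counter pattern: block = F_key(counter)
        data = self.key + self.counter.to_bytes(16, "big")
        self.counter += 1
        return hashlib.sha256(data).digest()

    def random_bytes(self, n: int) -> bytes:
        out = b""
        while len(out) < n:
            out += self._block()
        return out[:n]
```

The construction is embarrassingly parallel, since block i depends only on the key and counter value i, which is what makes GPGPU acceleration attractive.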
The understanding of application characteristics such as hardware resource requirements and communication patterns is key to building highly utilized high-performance computing systems for target workloads at a reasonable cost and with available technology. This characterization drives the design decisions of both hardware and software. The memory access pattern is a key factor, as data movement is a major...
The processing techniques and the time consumed during the execution of a task on a CPU and a GPU vary depending on the architecture's technology and configuration. This paper compares the time consumed and the efficiency of CPUs and GPUs with different architectures and configurations when applying spatial smoothing filters to images of a fixed spatial resolution. The processing speed was increased...
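As a baseline for the kind of spatial smoothing being benchmarked, here is a plain 3x3 box (mean) filter in pure Python; the paper's exact kernel sizes and border handling are not stated in the abstract, so clamped borders below are an assumption.

```python
# Simple 3x3 box (mean) smoothing filter; border pixels are clamped to the
# nearest valid pixel. Kernel size and border policy are assumptions.

def box_smooth(img):
    """Return a mean-filtered copy of img, given as a list of row lists."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp to image edge
                    xx = min(max(x + dx, 0), w - 1)
                    acc += img[yy][xx]
            out[y][x] = acc / 9.0
    return out
```

Every output pixel is independent, which is why this filter maps so naturally onto one-GPU-thread-per-pixel execution and makes it a common CPU-vs-GPU benchmark.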
In in-memory database systems augmented by hardware accelerators, accelerating index search operations can greatly increase the runtime performance of database queries. Recently, adaptive radix trees (ART) have been shown to provide a very fast index search implementation on the CPU. Here, we focus on an accelerator-based implementation of ART. We present a detailed performance study of our GPU-based...
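To make the index-search operation concrete, here is a minimal byte-wise trie lookup. Note this is a deliberately simplified stand-in: the real ART uses adaptive node sizes (Node4/16/48/256) and path compression, which this sketch omits.

```python
# Minimal byte-wise trie index, illustrating the key-to-value search that
# ART accelerates. ART's adaptive node types and path compression are
# intentionally omitted from this sketch.

class TrieNode:
    __slots__ = ("children", "value")
    def __init__(self):
        self.children = {}  # byte -> TrieNode (ART uses fixed-size arrays)
        self.value = None

class RadixIndex:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key: bytes, value):
        node = self.root
        for b in key:
            node = node.children.setdefault(b, TrieNode())
        node.value = value

    def lookup(self, key: bytes):
        node = self.root
        for b in key:
            node = node.children.get(b)
            if node is None:
                return None
        return node.value
```

Search cost is bounded by key length rather than dataset size, and independent lookups batch naturally, which is what a GPU implementation exploits.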
The TOP500 and GREEN500 lists are two major resources for understanding and forecasting the future architecture design of high-performance computing platforms. Generally, supercomputer system design can be divided into two parts: the single computing node and the interconnection. Setting the interconnection aside, we categorize systems into two types, homogeneous and heterogeneous, based on single-node architecture. While...
We present a parallel distributed-memory algorithm for large deformation diffeomorphic registration of volumetric images that produces large isochoric deformations (locally volume preserving). Image registration is a key technology in medical image analysis. Our algorithm uses a partial differential equation constrained optimal control formulation. Finding the optimal deformation map requires the...
Effective utilization of the increasingly heterogeneous hardware in modern supercomputers is a significant challenge. Many applications have seen performance gains by using GPUs, but many implementations leave CPUs sitting idle. In this paper, we describe a runtime-managed system for coordinating heterogeneous execution. This system manages data transfers to and from GPU devices and schedules work...
Power is a major limiting factor for the future of HPC and the realization of exascale computing under a power budget. GPUs have now become a mainstream parallel computation device in HPC, and optimizing power usage on GPUs is critical to achieving future goals. GPU memory is seldom studied, especially for power usage. Nevertheless, memory accesses draw significant power and are critical to understanding...
We report the use of the parallel resources of the graphics processing unit (GPU) to solve sparse systems by optimizing and implementing a variant of the Gaussian belief propagation algorithm for sparse matrices on a Tesla 2070M GPU with the Fermi architecture. The implementation was verified with finite element method data and achieved up to a 4× improvement in execution time compared to serial...
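As a hedged sketch of the algorithm being accelerated, the following implements scalar-message Gaussian belief propagation for A x = b with a sparse symmetric matrix, following the standard formulation (exact on tree-structured sparsity patterns, convergent on e.g. diagonally dominant matrices); the paper's GPU-specific variant and optimizations are not reproduced here.

```python
# Standard scalar-message Gaussian belief propagation for solving A x = b.
# This is a reference sketch of the base algorithm, not the paper's
# GPU-optimized variant.

def gbp_solve(A, b, iters=50):
    """A: dense list-of-lists view of a sparse symmetric matrix; b: rhs."""
    n = len(b)
    nbrs = [[j for j in range(n) if j != i and A[i][j] != 0.0]
            for i in range(n)]
    P = [[0.0] * n for _ in range(n)]   # message precisions, P[i][j]: i -> j
    mu = [[0.0] * n for _ in range(n)]  # message means
    for _ in range(iters):
        P2 = [row[:] for row in P]
        mu2 = [row[:] for row in mu]
        for i in range(n):
            for j in nbrs[i]:
                # aggregate the local potential and all messages except j's
                Pi = A[i][i] + sum(P[k][i] for k in nbrs[i] if k != j)
                mi = (b[i] + sum(P[k][i] * mu[k][i]
                                 for k in nbrs[i] if k != j)) / Pi
                P2[i][j] = -A[i][j] ** 2 / Pi
                mu2[i][j] = Pi * mi / A[i][j]
        P, mu = P2, mu2
    # final beliefs: combine local potential with all incoming messages
    x = []
    for i in range(n):
        Pi = A[i][i] + sum(P[k][i] for k in nbrs[i])
        x.append((b[i] + sum(P[k][i] * mu[k][i] for k in nbrs[i])) / Pi)
    return x
```

All per-edge message updates within an iteration are independent, which is the parallelism a GPU implementation maps onto its thread blocks.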