CUDA enables general-purpose parallel computing on GPUs (GPGPU) and has been applied to many computing fields. However, CUDA's multi-address-space architecture makes memory management complicated. NVIDIA introduced Unified Virtual Addressing (UVA) in CUDA Toolkit 4.0 to address this issue. However, UVA has platform limitations and can even lose performance under certain circumstances. We propose...
Today, the use of GPUs as coprocessors to accelerate high-performance scientific applications is becoming an important practice. Still, some of the high-level programming languages such as Java require extensions or new interfaces for utilising the huge parallelism of these new devices. In this paper, we propose extensions to an existing Java-based programming and parallel computing environment called...
GPUs and other accelerators are available on many different devices, and GPGPU has been massively adopted by the HPC research community. Although a plethora of libraries and applications providing GPU support are available, the need to implement new algorithms from scratch, or to adapt sequential programs to accelerators, will always exist. Writing CUDA or OpenCL code, although an easier task...
Graphics Processing Units (GPUs) have been widely used as accelerators in large-scale heterogeneous computing systems. However, current programming models can only support the utilization of local GPUs. When using non-local GPUs, programmers need to explicitly call API functions for data communication across computing nodes. As such, programming GPUs in large-scale computing systems is more challenging...
Modern computer systems are becoming increasingly heterogeneous, comprising multi-core CPUs, GPUs, and other accelerators. Current programming approaches for such systems usually require the application developer to use a combination of several programming models (e.g., MPI with OpenCL or CUDA) in order to exploit the full compute capability of a system. In this paper, we present dOpenCL (Distributed...
Due to their outstanding computational performance, many acceleration devices, such as GPUs, the Cell Broadband Engine (Cell/B.E.), and multi-core processors, are attracting a lot of attention in the field of high-performance computing. Although there are many programming models and languages designed for programming accelerators, such as CUDA, AMD Accelerated Parallel Processing (AMD APP), and OpenCL,...
To cope with the complexity of programming GPU accelerators for medical imaging computations, we developed a framework to describe image processing kernels in a domain-specific language, which is embedded into C++. The description uses decoupled access/execute metadata, which allow the programmer to specify both execution constraints and memory access patterns of kernels. A source-to-source compiler...
Application programming for GPUs (Graphics Processing Units) is complex and error-prone, because the popular approaches - CUDA and OpenCL - are intrinsically low-level and offer no special support for systems consisting of multiple GPUs. The SkelCL library presented in this paper is built on top of the OpenCL standard and offers pre-implemented recurring computation and communication patterns (skeletons)...
Hybrid CPU/GPU computing architecture has recently become an alternative platform for high-performance computing. This architecture provides massive computational power with lower energy consumption and less economic cost than the traditional one using only CPUs. However, the complexity of GPU programming is too high for users to move their applications toward this hybrid computing architecture...
Clusters of GPUs are emerging as a new computational scenario. Programming them requires the use of hybrid models that increase the complexity of the applications, reducing the productivity of programmers. We present the implementation of OmpSs for clusters of GPUs, which supports asynchrony and heterogeneity for task parallelism. It is based on annotating a serial application with directives that...
As the demand for research on image/content authentication has significantly increased, many authentication schemes have been proposed so far, but most of them are time-consuming. This research concentrates on decreasing the time needed by an image authentication algorithm. In this paper, we present a CUDA-based implementation of a content authentication algorithm on NVIDIA's GeForce 8400 GS GPU...
The GPU (Graphics Processing Unit) provides high computational speed at very low cost compared to high-end systems. The field of parallel processing using GPUs is advancing very fast, with new technologies being introduced continually. With such advancements, it is necessary to review the major works done in this field. Graph traversal is one of the major challenges in this field. So far...
The current trend in medical research for the discovery of new drugs is the use of Virtual Screening (VS) methods. In these methods, the calculation of the non-bonded interactions, such as electrostatics or van der Waals forces, plays an important role, representing up to 80% of the total execution time. These kernels are computationally intensive and massively parallel in nature, and thus they are...
With the increasing diversity of computing systems and the rapid performance improvement of commodity hardware, heterogeneous clusters become the dominant platform for low-cost, high-performance computing. Grid-enabled and heterogeneous implementations of MPI establish it as the de facto programming model for these environments. On the other hand, task parallelism provides a natural way for exploiting...
CUDA has become a very popular programming paradigm in the parallel computing area. However, very little work has been done on characterizing CUDA kernels. In this work, we measure thread-level performance, collect basic-block-level characteristics, and glean instruction-level properties for about 35 programs from the CUDA SDK, Parboil, and Rodinia benchmark suites. In addition, we define basic...
Iris recognition stands out as one of the most accurate biometric methods in use today. However, iris recognition algorithms are currently implemented on general-purpose sequential processing systems, such as generic central processing units (CPUs). In this work, we present a more direct and parallel processing alternative using the graphics processing unit (GPU), which was originally used exclusively...
Hybrid CPU/GPU computing architecture has received great attention from researchers in high-performance computing. This new architecture provides higher computational performance than one that uses only CPUs for data computation. However, programming on this computing architecture is not easy, since programmers have to learn the GPU programming APIs and handle data communication between...
GPUs are slowly becoming ubiquitous devices in high-performance computing. Nvidia's newly released version 4.0 of the CUDA API [2] for GPU programming offers multiple ways to program GPUs and emphasizes multi-GPU environments, which are common in modern-day compute clusters. However, despite the subsequent progress in FLOP counts, the bane of large-scale computing systems has been increased...
Automatic compilation for multiple types of devices is important, especially given the current trends towards heterogeneous computing. This paper concentrates on some issues in compiling fine-grained SPMD-threaded code (e.g., GPU CUDA code) for multicore CPUs. It points out some correctness pitfalls in existing techniques, particularly in their treatment to implicit synchronizations. It then describes...
Using multi-GPU systems, including GPU clusters, is gaining popularity in scientific computing. However, when using multiple GPUs concurrently, the conventional data parallel GPU programming paradigms, e.g., CUDA, cannot satisfactorily address certain issues, such as load balancing, GPU resource utilization, overlapping fine grained computation with communication, etc. In this paper, we present a...