Processing In Memory (PIM), the concept of integrating processing directly with memory, has been attracting a lot of attention since PIM can assist in overcoming the throughput limitation caused by data movement between CPU and memory. The challenge, however, is that it requires the programmers to have a deep understanding of the PIM architecture to maximize the benefits such as data locality and...
The adoption of a programming language is positively influenced by the breadth of its software libraries. Chapel is a modern and relatively young parallel programming language. Consequently, not many domain-specific software libraries exist that are written for Chapel. Graph processing is an important domain with many applications in cyber security, energy, social networking, and health. Implementing...
For partitioned global address space (PGAS) runtimes, supporting out-of-core data computation is an important issue. Some researchers have shown that flash SSDs are useful for out-of-core data computation. In this paper, we introduce ComEx-PM, a PGAS communication runtime. ComEx-PM supports out-of-core data computation using a flash SSD. ComEx-PM launches multiple processes in each node. Memory region...
This paper will describe the application of the PGAS Global Arrays (GA) library to power grid simulations. The GridPACK™ framework has been designed to enable power grid engineers to develop parallel simulations of the power grid by providing a set of templates and libraries that encapsulate most of the details of parallel programming in higher level abstractions. The communication portions of the...
General-purpose compilers aim to extract the best average performance for all possible user applications. Due to the lack of specialization for different types of computations, compiler-attained performance often lags behind that of manually optimized libraries. In this paper, we demonstrate a new approach, programmable composition, to enable the specialization of compiler optimizations without...
Stencil computations, which are important kernels for CFD simulations, have been highly successful on GPGPU clusters, due to the high memory bandwidth and computation speed of GPU accelerators. However, the sizes of the computed domains are limited by the small capacity of GPU device memory. In order to support larger domain sizes, we utilize the memory hierarchy of GPGPU clusters; larger host memory is used...
In this paper, we introduce Bohrium, a runtime system for mapping vector operations onto a number of different hardware platforms, from simple multi-core systems to clusters and GPU-enabled systems. In order to make efficient choices, Bohrium is implemented as a virtual machine that makes runtime decisions, rather than a statically compiled library, which is the more common approach. In principle,...
In this paper, a dual-language hybrid programming method based on MATLAB engine technology and an example of its implementation are described. With this technology, many MATLAB functions can be used effectively, which reduces the programming workload, and it can also inherit the excellent VC program interface. It is therefore a good hybrid program design method for debugging hardware and software interfaces,...
Java Native Access (JNA) has been proposed to alleviate the burden of programming in Java Native Interface (JNI). JNA allows programmers to call native functions without writing any JNI code. However, JNA suffers from some performance degradation. To overcome this problem, in this paper, we modify the JNA source code and integrate the LLVM JIT compiler into JNA to improve the performance. Our experiment...
Live programming can be considered an interaction with incomplete code. Dynamic languages embrace a similar style of programming, such as pair programming and prototyping in a review session. Static languages require a certain degree of completeness of code, such as type safety and namespace resolution. SOMETHINGit is a Smalltalk library that combines dynamic Smalltalk and static Haskell and VDM-SL...
I/O performance is vital for most HPC applications, especially those that generate a vast amount of data with the growth of scale. Many studies have shown that scientific applications tend to issue small and noncontiguous accesses in an interleaving fashion, causing different processes to access overlapping regions. In such a scenario, collective I/O is a widely used optimization technique. However,...
Despite the numerous prevention and protection techniques that have been developed, the exploitation of memory corruption vulnerabilities still represents a serious threat to the security of software systems and networks. Because of the adoption of the write or execute only policy (W⊕X) and address space layout randomization (ASLR), modern operating systems have been strengthened against code injection...
We are facing a hardware revolution given by the increasing availability of multicore computers, clusters, Grids, and combinations of these. Consequently, there is plenty of computational power, but today's programmers are not fully prepared to exploit parallelism and distribution. Particularly, Java has helped in handling the heterogeneity of such environments, but there is a lack of facilities to...
The exploitation of heterogeneous resources is becoming increasingly important for general purpose computing. Unfortunately, heterogeneous systems require much more effort to be programmed than the traditional single or even multi-core computers most programmers are familiar with. Not only new concepts, but also new tools with different restrictions must be learned and applied. Additionally, many...
The Message Passing Interface (MPI) provides bindings for the three programming languages commonly used in High Performance Computing (HPC): C, C++ and Fortran. Unfortunately, MPI supports only the lowest common denominator of the three languages, providing a level of abstraction far lower than typical C++ libraries. Lately, after the decision of the MPI committee to deprecate and remove the C++ bindings...
Our ability to create systems with large amounts of hardware parallelism is exceeding the average software developer's ability to effectively program them. This is a problem that plagues our industry. Since the vast majority of the world's software developers are not parallel programming experts, making it easy to write, port, and debug applications with sufficient core and vector parallelism is essential...
A promising approach to high-level design is to start initially with an obvious but possibly inefficient design, and apply multiple transformations to meet design goals. Many hardware compilation tools support a fixed recipe of applying design transformations, but designers have few options to adapt the recipe without re-writing the tools themselves. In addition, complex transformations based on linear...
In the new development of modern education, strengthening the practical part of teaching is the key approach to improving teaching quality. In this paper, the problems existing in the teaching of data structures are expounded and analyzed in detail. New, reformed experimental teaching methods are put forward, and an example is given at the end of the paper.
Multiple-precision integer operations are key components of many security applications, but unfortunately they are computationally expensive on contemporary CPUs. In this paper, we present our design and implementation of a multiple-precision integer library for GPUs, which is implemented in CUDA. We report our experimental results, which show that a significant speedup can be achieved by GPUs as compared...
In this paper, a source to source (S2S) compiler with profiling support is designed and implemented. The focus of this compiler is to convert the source code running in the homogeneous environment to the code that can be compiled and run under the Cell BE architecture. Combined with the runtime profiling mechanism, the S2S compiler records the optimization strategies and their effects, which can be...