Stencil computations are not well optimized by general-purpose production compilers and the increased use of multicore, manycore, and accelerator-based systems makes the optimization problem even more challenging. In this paper we present Snowflake, a Domain Specific Language (DSL) for stencils that uses a "micro-compiler" approach, i.e., small, focused, domain-specific code generators....
The main contribution of this paper is to show a new photomosaic generation method by rearranging subimages of an image. In the photomosaic generation, an input image is divided into small subimages and they are rearranged such that the rearranged image reproduces another image given as a target image. Therefore, this problem can be considered as a combinatorial optimization problem to obtain the...
This study presents a new algorithm and corresponding statistical package for estimating optimal bandwidth for a nonparametric kernel regression. Kernel regression is widely used in Economics, Statistics, and other fields. The formula for the optimal "bandwidth," or smoothing parameter, is well-known. In practice, however, the computational demands of estimating the optimal bandwidth have...
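As a rough illustration of what bandwidth selection for kernel regression involves, the sketch below picks a bandwidth for a Nadaraya-Watson estimator by leave-one-out cross-validation over a small candidate grid. This is a textbook approach and an assumption for illustration only; it is not the algorithm or the package the abstract describes. The Gaussian kernel, the function names, and the candidate grid are all hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel (an assumed choice of kernel)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nw_estimate(x0, xs, ys, h, skip=None):
    """Nadaraya-Watson estimate at x0 with bandwidth h.

    If skip is an index, that observation is left out (used for cross-validation).
    Returns NaN when all kernel weights underflow to zero.
    """
    num = 0.0
    den = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        if j == skip:
            continue
        w = gaussian_kernel((x0 - xj) / h)
        num += w * yj
        den += w
    return num / den if den > 0.0 else float("nan")


def loo_cv_score(xs, ys, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        pred = nw_estimate(xi, xs, ys, h, skip=i)
        if math.isnan(pred):
            return math.inf  # bandwidth too small to be usable on this data
        total += (yi - pred) ** 2
    return total / len(xs)


def select_bandwidth(xs, ys, candidates):
    """Return the candidate bandwidth with the lowest leave-one-out score."""
    return min(candidates, key=lambda h: loo_cv_score(xs, ys, h))


# Usage on synthetic data (hypothetical values, for illustration only).
xs = [i / 10.0 for i in range(20)]
ys = [math.sin(x) for x in xs]
candidates = [0.05, 0.2, 1.0]
best_h = select_bandwidth(xs, ys, candidates)
```

Each score costs O(n^2) kernel evaluations, so a grid of m candidates costs O(m n^2); this is the kind of computational demand the abstract refers to.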
In this work we propose an accelerated stochastic learning system for very large-scale applications. Acceleration is achieved by mapping the training algorithm onto massively parallel processors: we demonstrate a parallel, asynchronous GPU implementation of the widely used stochastic coordinate descent/ascent algorithm that can provide up to 35× speed-up over a sequential CPU implementation. In order...
Because sparse matrix-vector multiplication (SpMV) is an important and widely used computational kernel in many real-world applications, it behooves us to accelerate SpMV on modern multi- and many-core architectures. While many storage formats have been developed to facilitate SpMV operations, the compressed sparse row (CSR) format is still the most popular and general storage format. However, parallelizing...
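For readers unfamiliar with the CSR format mentioned above, a minimal sequential sketch of SpMV over CSR arrays follows. It shows only the baseline kernel, not the parallelization the paper proposes; the function and variable names are illustrative assumptions.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  : nonzero entries, stored row by row
    col_idx : column index of each entry in values
    row_ptr : row i owns entries row_ptr[i] .. row_ptr[i + 1] - 1
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Example matrix (hypothetical):
# A = [[10,  0,  0],
#      [ 0, 20, 30],
#      [ 0,  0,  0]]
values = [10.0, 20.0, 30.0]
col_idx = [0, 1, 2]
row_ptr = [0, 1, 3, 3]
x = [1.0, 2.0, 3.0]
y = csr_spmv(values, col_idx, row_ptr, x)
```

Rows can hold very different numbers of nonzeros (here 1, 2, and 0), so assigning one row per thread can leave work unevenly distributed. This is one common reason that parallelizing CSR-based SpMV is harder than the loop suggests.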
Driven by the increasing diversity of current and future HPC hardware and software platforms, the HPC community has seen a dramatic increase in research and development efforts into the composability of discrete software systems. While modularity is often desirable from a software engineering, quality assurance, and maintainability perspective, the barriers between software components often hide optimization...
In this paper we investigate an emerging application, 3D scene understanding, likely to be significant in the mobile space in the near future. The goal of this exploration is to reduce execution time while meeting our quality of result objectives. In previous work, we showed for the first time that it is possible to map this application to power constrained embedded systems, highlighting that decision...
Scientists who want to exploit the computing power of the latest parallel architectures are faced with a diverse set of architectures and a number of programming languages, models and approaches. Among several such programming techniques are directive-based programming models, OpenMP and OpenACC. This paper explores the similarities and the functionality gaps between both models and presents insights...
This paper presents two parallel implementations of the Back-propagation algorithm, a widely used approach for Artificial Neural Networks (ANNs) training. These implementations permit one to increase the number of ANNs trained simultaneously taking advantage of the thread-level massive parallelism of GPUs and multi-core architecture of modern CPUs, respectively. Computational experiments are carried out with...
A major component of many advanced programming courses is an open-ended "end-of-term project" assignment. Delivering and evaluating open-ended parallel programming projects for hundreds or thousands of students brings a need for broad system reconfigurability coupled with challenges of testing and development uniformity, access to esoteric hardware and programming environments, scalability,...
This paper analyzes the performance of different implementations of a three-point angular correlation function on a variety of computational platforms. This function is used in the study of the large-scale distribution of galaxies. The function is based on histogram construction and has a large computational cost, which increases dramatically with the size of the datasets. The implementation...
The bulk execution of a sequential algorithm is to execute it for many different inputs in turn or at the same time. It is known that the bulk execution of an oblivious sequential algorithm can be implemented to run efficiently on a GPU. The bulk execution supports fine grained bitwise parallelism, allowing it to achieve high acceleration over a straightforward sequential computation. The main contribution...