In HPC applications, energy efficiency is becoming increasingly important due to architectural constraints. It is therefore of primary interest to measure and evaluate the energy efficiency of current architectures using typical HPC workloads. One of the most used and appreciated publicly available codes for computational material science simulation, and largely used in many high-end HPC system...
Phylogenetic inference is the process of reconstructing the evolutionary history of species based on their traits, nowadays mostly using molecular sequence data. Current state-of-the-art inference methods, like Bayesian and Maximum Likelihood (ML) inference, rely on the Phylogenetic Likelihood Function (PLF) as their computational core. Due to the large number of floating-point operations involved,...
With energy efficiency and power consumption being the primary impediments on the path to exascale systems, low-power high-performance embedded systems are of increasing interest. The Parallella System-on-module (SoM) created by Adapteva combines the Epiphany-IV 64-core coprocessor with a host ARM processor housed in a Zynq System-on-chip. The Epiphany integrates low-power RISC cores on a 2D mesh network...
The Texas Instruments (TI) C6678 “Shannon” is TI's most recently released Digital Signal Processor (DSP). Although its original purpose was voice and video encoding and decoding, it may have the potential to become a practical coprocessor for scientific computing. In this paper, we evaluate the C6678 in terms of its programming methodology, performance, and power efficiency. As a case study, we implemented...
In the last few years, efficient resource management has turned out to be one of the major challenges for hardware designers. Reusability strategies based on reconfiguration have shown interesting potential to address it, while also minimizing power and area. The Multi-Dataflow Composer (MDC) tool has been presented to the scientific community to automatically build up runtime coarse-grained...
A Coarse-Grained Reconfigurable Architecture (CGRA) in a hybrid system can significantly accelerate the execution of compute-intensive application kernels. However, the data communication overhead between the main processor (MP) and the CGRA may be large enough to negate the CGRA's speed-up. In this paper we address the problem of reducing the data communication overhead in a hybrid system by...
Embedded multicore devices require high performance with minimal power consumption; many systems use dedicated hardware units to meet these constraints. However, embedded systems have also become increasingly multi-purpose and must be able to execute a wide range of applications — some of which might not yet be known at design time. It is therefore difficult to choose an appropriate mix of dedicated...
Multi-core systems are now the norm, and reconfigurable systems have shown substantial benefits over general purpose ones. This paper presents a combination of the two: a fully featured reconfigurable multi-core processor based on the Leon3 processor. The platform has important features like cache coherency, a fully running modern OS (GNU/Linux) and each core has a tightly coupled reconfigurable coprocessor...
With the development of the Graphics Processing Unit (GPU) and the Compute Unified Device Architecture (CUDA) platform, researchers have shifted their attention to general-purpose computing applications on GPUs. In this paper, we present a novel parallel approach to running the artificial fish swarm algorithm (AFSA) on the GPU. Experiments are conducted by running AFSA on both the GPU and the CPU to optimize four...
We present a Floating Point Vector Coprocessor (FPVC) that works with Xilinx embedded processors. The FPVC is completely autonomous from the embedded processor, exploiting parallelism and exhibiting greater speedup than alternative vector processors. The FPVC supports scalar computation so that loops can be executed independently of the main embedded processor. Floating point addition, multiplication,...
General-purpose GPU computing (GPGPU) has taken off in the past few years, with great promise for increased desktop processing power due to the large number of fast computing cores on high-end graphics cards. Many publications have demonstrated phenomenal performance and have reported speedups of as much as 1000× over code running on multi-core CPUs. Other studies have claimed that well-tuned CPU code...
Registration of partial scan data sets is still a challenge for today's CAD systems and CAD system users. Many of the known methods rely on user interaction or feature recognition. For non-regular users this is too time consuming and error prone. The paper describes a method to register partial scan data by fitting a large fat tetrahedron (LFT) in the target point cloud. The method is computational...
Graphics Processing Units provide large computational power at a very low price, which positions them as a ubiquitous accelerator. GPGPU refers to accelerating general-purpose computations using GPUs. GPUs have been used to accelerate many linear algebra routines and numerical methods. Lanczos is an iterative method well suited for finding the extreme eigenvalues and the corresponding eigenvectors of...
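As a rough illustration of the technique this abstract names (not the authors' GPU implementation), the plain Lanczos iteration below reduces a small symmetric matrix to a tridiagonal matrix T, whose extreme eigenvalues approximate those of the original matrix. It is a minimal sketch assuming dense list-of-lists input and no reorthogonalization; the function name `lanczos` is hypothetical.

```python
import math


def lanczos(A, v0, k):
    """Run k steps of the Lanczos iteration on the symmetric matrix A.

    Hypothetical helper for illustration. Returns (alphas, betas): the
    diagonal and off-diagonal entries of the tridiagonal matrix T_k.
    No reorthogonalization is done, so this is only suitable for small k.
    """
    n = len(A)
    norm = math.sqrt(sum(x * x for x in v0))
    q = [x / norm for x in v0]   # current Lanczos vector q_j
    q_prev = [0.0] * n           # previous Lanczos vector q_{j-1}
    beta = 0.0
    alphas, betas = [], []
    for j in range(k):
        # w = A q_j - beta_{j-1} q_{j-1}; the mat-vec is the costly step
        w = [sum(A[i][c] * q[c] for c in range(n)) - beta * q_prev[i]
             for i in range(n)]
        alpha = sum(w[i] * q[i] for i in range(n))
        w = [w[i] - alpha * q[i] for i in range(n)]
        alphas.append(alpha)
        if j == k - 1:
            break
        beta = math.sqrt(sum(x * x for x in w))
        if beta < 1e-12:
            break  # invariant subspace reached; T is already complete
        betas.append(beta)
        q_prev, q = q, [x / beta for x in w]
    return alphas, betas


A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
alphas, betas = lanczos(A, [1.0, 1.0, 1.0], 3)
print(alphas, betas)
```

The matrix-vector product inside the loop typically dominates the cost, which is why it is the natural candidate for GPU offloading; the extreme eigenvalues of T would then be computed with any small tridiagonal eigensolver.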
In this paper, we propose a novel floorplanning algorithm based on simulated annealing on GPUs. Simulated annealing is an inherently sequential algorithm, far from the typical programs suitable for Single Instruction Multiple Data (SIMD) style concurrency in a GPU. We propose a fundamentally different approach of exploring the floorplan solution space, where we evaluate concurrent moves on a given...
A grid-enabled programming toolkit called GridCuda is proposed in this paper. This programming toolkit provides a platform for users to write programs with the CUDA API, and exploit GPGPU resources available in computational grids to execute their programs. Whenever the CUDA functions in user programs are invoked, they will be transparently redirected to remote GPGPUs for execution by means of remote...
General-purpose programming on graphics processing units (GPGPU) has received a lot of attention in the parallel computing community, as it promises to offer large computational power at a very low price. GPGPU is best suited for regular data-parallel algorithms; GPUs are not directly amenable to algorithms with irregular data access patterns, such as convex hull and list ranking. In this...
The implementation via CUDA of a hybrid dense dynamic programming method for knapsack problems on a multi-GPU architecture is considered. Tests are carried out on a Bull cluster with Tesla S1070 computing systems. A first series of computational results shows substantial speedup: the speedup factor is close to 28 with two GPUs.
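The abstract does not spell out the hybrid method, so the sketch below shows only the textbook dense dynamic-programming recurrence for the 0/1 knapsack problem, f_i(c) = max(f_{i-1}(c), f_{i-1}(c - w_i) + p_i), which such methods build on; it is not the authors' hybrid scheme. The function name `knapsack_dense_dp` is hypothetical.

```python
def knapsack_dense_dp(weights, profits, capacity):
    """Textbook dense DP for the 0/1 knapsack problem (hypothetical helper).

    prev[c] holds the best profit achievable with the items processed so
    far and total weight at most c.
    """
    prev = [0] * (capacity + 1)
    for w, p in zip(weights, profits):
        # Every entry of the new stage reads only the previous stage, so
        # all capacity values c could be updated concurrently (one thread
        # per c is the usual way this maps onto a GPU).
        prev = [max(prev[c], prev[c - w] + p) if c >= w else prev[c]
                for c in range(capacity + 1)]
    return prev[capacity]


print(knapsack_dense_dp([1, 3, 4, 5], [1, 4, 5, 7], 7))
```

Keeping two full stages instead of updating one array in place trades memory for independence between entries, which is what makes the per-item stage data-parallel.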
Writing efficient software for heterogeneous architectures equipped with modern accelerator devices presents a serious challenge to programmer productivity, creating a need for powerful performance-analysis tools to adequately support the software development process. To guide the design of such tools, we describe typical patterns of inefficient runtime behavior that may adversely affect the performance...
Manycore accelerators such as graphics processor units (GPUs) organize processing units into single-instruction, multiple-data “cores” to improve throughput per unit hardware cost. Programming models for these accelerators encourage applications to run kernels with large groups of parallel scalar threads. The hardware groups these threads into warps/wavefronts and executes them in lockstep, dubbed...
In this paper, an in situ method for profiling the power consumption of general-purpose graphics processing units (GPGPU) over time is presented. Based on this method, the power consumption of different modes of operation, like data transfer between the GPU and the host CPU, basic single-precision floating-point arithmetic operations (addition, subtraction, multiplication) on the multiprocessor units and instructions for shared...