Search results

Items from 1 to 20 out of 25 results

article

Performance Evaluation of Dynamic Speculative Multithreading with the Cascadia Architecture

D.A. Zier, Ben Lee

IEEE Transactions on Parallel and Distributed Systems > 2010 > 21 > 1 > 47 - 59

Thread-level parallelism (TLP) has been extensively studied in order to overcome the limitations of exploiting instruction-level parallelism (ILP) on high-performance superscalar processors. One promising method of exploiting TLP is dynamic speculative multithreading (D-SpMT), which extracts multiple threads from a sequential program without compiler support or instruction set extensions. This paper...

chapter

Investigation on Multi-Grain Parallelism in Chip Multiprocessor for Multimedia Application

Xiaoping Huang, Xiaoya Fan, Shengbing Zhang, Liwen Shi

2009 International Conference on Information Engineering and Computer Science > 1 - 4

2009 International Conference on Information Engineering and Computer Science. ICIECS 2009

With the advent of chip multiprocessor (CMP) architecture, programmer must tune the program to the architecture in order to fully utilize the hardware resource. How to parallel program multimedia application in the CMP is a big obstacle. In this paper, we introduce the potential parallelism in the multimedia application and the multi-grain parallelism architecture in the CMP; also we make a systematic...

chapter

Real-Time Motion Tracking Using the CELL BE

H. El-Sayed, L. Riha

2009 3rd International Conference on New Technologies, Mobility and Security > 1 - 7

2009 3rd International Conference on New Technologies, Mobility and Security (NTMS 2009)

Object tracking is an important computer vision problem with many civilian and military applications including surveillance, robotics and intelligent vehicle design. Most of these applications require fast processing due to their real time nature. The Cell processor is a cost-efficient commodity architecture intended for video gaming and provides new opportunities for parallel processing. This paper...

chapter

On the Robust Mapping of Dynamic Programming onto a Graphics Processing Unit

Shucai Xiao, A.M. Aji, Wu-chun Feng

2009 15th International Conference on Parallel and Distributed Systems > 26 - 33

2009 IEEE 15th International Conference on Parallel and Distributed Systems (ICPADS 2009)

Graphics processing units (GPUs) have been widely used to accelerate algorithms that exhibit massive data parallelism or task parallelism. When such parallelism is not inherent in an algorithm, computational scientists resort to simply replicating the algorithm on every multiprocessor of a NVIDIA GPU, for example, to create such parallelism, resulting in embarrassingly parallel ensemble runs that...

chapter

CUDA-Based Jacobi's Iterative Method

Zhihui Zhang, Qinghai Miao, Ying Wang

2009 International Forum on Computer Science-Technology and Applications > 1 > 259 - 262

2009 International Forum on Computer Science-Technology and Applications (IFCSTA 2009)

Solving linear equations is a common problem in the fields of science and engineering. Accelerating its solving process is of great significance. Modern GPUs are high performance many-core processors fit for large scale parallel computing. They provide us a novel way for accelerating the solving process. A GPU based parallel Jacobi's iterative solver for dense linear equations is presented in this...

chapter

Numerical Parallel Processing Based on GPU with CUDA Architecture

Chengming Zou, Chunfen Xia, Guanghui Zhao

2009 International Conference on Wireless Networks and Information Systems > 93 - 96

2009 International Conference on Wireless Networks and Information Systems (WNIS 2009)

The characteristics of modern graphics processing unit (GPU) is programmable, high price / performance ratio and high speed. It has a strong ability to adapt the parallel calculation. Based on this, the article study the general method of GPU calculating and use compute unified device architecture (CUDA) to design new parallel algorithm to accelerate the matrix inversion and binarization algorithm...

chapter

Implementation of association rule mining using CUDA

S.H. Adil, S. Qamar

2009 International Conference on Emerging Technologies > 332 - 336

2009 International Conference on Emerging Technologies (ICET)

The purpose of this paper is to implement association rule mining algorithm using Nvidia CUDA framework for general purpose computing on GPU. The major objective is to perform performance comparison of association rule mining algorithm using C based implementation on Intel Quad Core/Core2 Duo CPU with CUDA based implementation on Nvidia G80 and GTX 200 series GPU. The final outcome of this research...

chapter

Cellular Level Agent Based Modelling on the Graphics Processing Unit

P. Richmond, S. Coakley, D. Romano

2009 International Workshop on High Performance Computational Systems Biology > 43 - 50

2009 International Workshop on High Performance Computational Systems Biology (HiBi 2009)

Cellular level agent based modelling is reliant on either sequential processing environments or expensive and largely unavailable PC grids. The GPU offers an alternative architecture for such systems, however the steep learning curve associated with the GPUs data parallel architecture has previously limited the uptake of this emerging technology. In this paper we demonstrate a template driven agent...

chapter

Fine-grain Parallelism Using Multi-core, Cell/BE, and GPU Systems: Accelerating the Phylogenetic Likelihood Function

F. Pratas, P. Trancoso, A. Stamatakis, L. Sousa

2009 International Conference on Parallel Processing > 9 - 17

2009 International Conference on Parallel Processing (ICPP 2009)

We are currently faced with the situation where applications have increasing computational demands and there is a wide selection of parallel processor systems. In this paper we focus on exploiting fine-grain parallelism for a demanding bioinformatics application - MrBayes - and its phylogenetic likelihood functions (PLF) using different architectures. Our experiments compare side-by-side the scalability...

chapter

Fast Isosurface Extraction for Medical Volume Dataset on Cell BE

Hai Jin, Bo Li, Ran Zheng, Qin Zhang

2009 International Conference on Parallel Processing > 100 - 107

2009 International Conference on Parallel Processing (ICPP 2009)

The size of volumetric data generated by medical imaging and scientific simulations is increased significantly due to the dramatic advances in medical imaging modalities and computing technologies. The volumetric data generally need to be visualized and marching cubes algorithm (MC for short) is one of the standard methods of the isosurface extraction for the medical applications. However, MC algorithm...

chapter

Performance and Power Efficiency Analysis of the Symmetric Cryptograph on Two Stream Processor Architectures

Guang Xu, Hong An, Gu Liu, Ping Yao, et al.

2009 Fifth International Conference on Intelligent Information Hiding and Multimedia Signal Processing > 917 - 920

2009 Fifth International Conference on Intelligent Information Hiding and Multimedia Signal Processing. IIH-MSP 2009

Multimedia and some scientific applications have achieved good performance on the stream processor architecture by employing the stream programming model. In order to find out the way to accelerate the symmetric cryptograph on stream processor, we implement and analyze cryptograph algorithms on different stream processors in this paper. Four cipher algorithms including RC5, AES, TWOFISH and 3DES in...

chapter

Prophet: A Speculative Multi-threading Execution Model with Architectural Support Based on CMP

Zhaoyu Dong, Yinliang Zhao, Yuanke Wei, Xuhao Wang, et al.

2009 International Conference on Scalable Computing and Communications; Eighth International Conference on Embedded Computing > 103 - 108

2009 International Conference on Scalable Computing and Communications; Eighth International Conference on Embedded Computing. SCALCOM-EMBEDDEDCOM 2009

Speculative Multithreading (SpMT) has been proposed as a perspective method for sequential programs to benefit from the increasing computing resources provided by Chip Multiprocessors (CMP). This paper analyzes the extraction of thread-level parallelism from general-purpose programs and presents a speculative multi-threading execution model, Prophet. The architectural support for Prophet execution...

chapter

Just-in-Time Renaming and Lazy Write-Back on the Cell/B.E.

P. Bellens, J.M. Perez, R.M. Badia, J. Labarta

2009 International Conference on Parallel Processing Workshops > 138 - 145

2009 38th International Conference on Parallel Processing Workshops (ICPPW 2009)

Cell Superscalar (CellSs) provides a simple, flexible and easy programming approach for the Cell Broadband Engine (Cell/B.E.) that automatically exploits the inherent concurrency of applications at a function or task level. The CellSs environment is based on a source-to-source compiler that translates annotated C or Fortran code and a runtime library tailored for the Cell/B.E. that orchestrates the...

chapter

Efficient implementation for MD5-RC4 encryption using GPU with CUDA

Changxin Li, Hongwei Wu, Shifeng Chen, Xiaochao Li, et al.

2009 3rd International Conference on Anti-counterfeiting, Security, and Identification in Communication > 167 - 170

2009 3rd International Conference on Anti-counterfeiting, Security, and Identification in Communication (2009 ASID)

Benefit from the novel compute unified device architecture (CUDA) introduced by NVIDIA, graphics processing unit (GPU) turns out to be a promising solution for cryptography applications. In this paper we present an efficient implementation for MD5-RC4 encryption using NVIDIA GPU with novel CUDA programming framework. The MD5-RC4 encryption algorithm was implemented on NVIDIA GeForce 9800GTX GPU. The...

chapter

NeMo: A Platform for Neural Modelling of Spiking Neurons Using GPUs

A.K. Fidjeland, E.B. Roesch, M.P. Shanahan, W. Luk

2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors > 137 - 144

2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors

Simulating spiking neural networks is of great interest to scientists wanting to model the functioning of the brain. However, large-scale models are expensive to simulate due to the number and interconnectedness of neurons in the brain. Furthermore, where such simulations are used in an embodied setting, the simulation must be real-time in order to be useful. In this paper we present NeMo, a platform...

chapter

Balancing Locality and Parallelism on Shared-cache Mulit-core Systems

M.J. Cade, A. Qasem

2009 11th IEEE International Conference on High Performance Computing and Communications > 188 - 195

2009 11th IEEE International Conference on High Performance Computing and Communications (HPCC)

The emergence of multi-core systems opens new opportunities for thread-level parallelism and dramatically increases the performance potential of applications running on these systems. However, the state of the art in performance enhancing software is far from adequate in regards to the exploitation of hardware features on this complex new architecture. As a result, much of the performance capabilities...

chapter

Lightweight Transactional Memory systems for large scale shared memory MPSoCs

Q. Meunier, F. Petrot

2009 Joint IEEE North-East Workshop on Circuits and Systems and TAISA Conference > 1 - 4

2009 Joint IEEE North-East Workshop on Circuits and Systems and TAISA Conference (NEWCAS-TAISA)

The evolution of the consumer electronic devices leads to a consolidation of the architectures towards fairly homogeneous multiprocessor platforms. As these highly programmable architectures execute explicitly parallel programs, and until automatic parallel compilers exist, the software programmer has to expose thread (i.e. coarse grain) level parallelism to use these resources. Thread is currently...

chapter

Programming Abstractions and Toolchain for Dataflow Multithreading Architectures

K. Stavrou, D. Pavlou, M. Nikolaides, P. Petrides, et al.

2009 Eighth International Symposium on Parallel and Distributed Computing > 107 - 114

2009 Eighth International Symposium on Parallel and Distributed Computing (ISPDC)

The need to exploit multi-core systems for parallel processing has revived the concept of dataflow. In particular, the dataflow multithreading architectures have proven to be good candidates for these systems. In this work we propose an abstraction layer that enables compiling and running a program written for an abstract dataflow multithreading architecture on different implementations. More specifically,...

chapter

Acceleration of Cloth Simulation Utilizing Cell/B.E.

S. Inui, A. Rokugawa, Y. Horiba

2009 International Conference on Biometrics and Kansei Engineering > 224 - 227

2009 International Conference on Biometrics and Kansei Engineering, ICBAKE

A system the final goal of which is to design feelings of fabrics has been developed. For this purpose, cloth was modeled based on thread model and geometrical arrangement of threads of cloth. A simulation system was constructed with the cloth model. In the system, a lot of calculating time is required when wide range of cloth is treated. The system was accelerated with a Cell/B.E. processor. By SIMD...

chapter

Design of a parallel AES for graphics hardware using the CUDA framework

A. Di Biagio, A. Barenghi, G. Agosta, G. Pelosi

2009 IEEE International Symposium on Parallel & Distributed Processing > 1 - 8

2009 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Web servers often need to manage encrypted transfers of data. The encryption activity is computationally intensive, and exposes a significant degree of parallelism. At the same time, cheap multicore processors are readily available on graphics hardware, and toolchains for development of general purpose programs are being released by the vendors. In this paper, we propose an effective implementation...

Data set:
ieee
Keywords:
COMPUTER ARCHITECTURE
PARALLEL PROCESSING
YARN

INFONA - science communication portal