Thread-level parallelism (TLP) has been extensively studied in order to overcome the limitations of exploiting instruction-level parallelism (ILP) on high-performance superscalar processors. One promising method of exploiting TLP is dynamic speculative multithreading (D-SpMT), which extracts multiple threads from a sequential program without compiler support or instruction set extensions. This paper...
With the advent of chip multiprocessor (CMP) architectures, programmers must tune their programs to the architecture in order to fully utilize the hardware resources. Writing parallel programs for multimedia applications on a CMP is a major obstacle. In this paper, we introduce the potential parallelism in multimedia applications and the multi-grain parallelism architecture of the CMP; we also make a systematic...
Object tracking is an important computer vision problem with many civilian and military applications including surveillance, robotics and intelligent vehicle design. Most of these applications require fast processing due to their real time nature. The Cell processor is a cost-efficient commodity architecture intended for video gaming and provides new opportunities for parallel processing. This paper...
Graphics processing units (GPUs) have been widely used to accelerate algorithms that exhibit massive data parallelism or task parallelism. When such parallelism is not inherent in an algorithm, computational scientists resort to simply replicating the algorithm on every multiprocessor of an NVIDIA GPU, for example, to create such parallelism, resulting in embarrassingly parallel ensemble runs that...
Solving linear equations is a common problem in science and engineering, so accelerating the solution process is of great significance. Modern GPUs are high-performance many-core processors well suited to large-scale parallel computing, and they provide a novel way of accelerating the solution process. A GPU-based parallel Jacobi iterative solver for dense linear equations is presented in this...
Modern graphics processing units (GPUs) are programmable, fast, and offer a favorable price/performance ratio, and they are well suited to parallel computation. On this basis, the article studies general methods of GPU computing and uses the Compute Unified Device Architecture (CUDA) to design new parallel algorithms that accelerate matrix inversion and the binarization algorithm...
The purpose of this paper is to implement an association rule mining algorithm using the Nvidia CUDA framework for general-purpose computing on GPUs. The major objective is to compare the performance of a C-based implementation of the algorithm on Intel Quad Core/Core2 Duo CPUs with a CUDA-based implementation on Nvidia G80 and GTX 200 series GPUs. The final outcome of this research...
Cellular-level agent-based modelling is reliant on either sequential processing environments or expensive and largely unavailable PC grids. The GPU offers an alternative architecture for such systems; however, the steep learning curve associated with the GPU's data-parallel architecture has previously limited the uptake of this emerging technology. In this paper we demonstrate a template-driven agent...
We are currently faced with the situation where applications have increasing computational demands and there is a wide selection of parallel processor systems. In this paper we focus on exploiting fine-grain parallelism for a demanding bioinformatics application - MrBayes - and its phylogenetic likelihood functions (PLF) using different architectures. Our experiments compare side-by-side the scalability...
The size of volumetric data generated by medical imaging and scientific simulations has increased significantly due to dramatic advances in medical imaging modalities and computing technologies. Volumetric data generally need to be visualized, and the marching cubes (MC) algorithm is one of the standard methods of isosurface extraction for medical applications. However, the MC algorithm...
Multimedia and some scientific applications have achieved good performance on stream processor architectures by employing the stream programming model. To find out how to accelerate symmetric cryptography on stream processors, we implement and analyze cryptographic algorithms on different stream processors in this paper. Four cipher algorithms including RC5, AES, TWOFISH and 3DES in...
Speculative Multithreading (SpMT) has been proposed as a promising method for sequential programs to benefit from the increasing computing resources provided by Chip Multiprocessors (CMP). This paper analyzes the extraction of thread-level parallelism from general-purpose programs and presents a speculative multi-threading execution model, Prophet. The architectural support for Prophet execution...
Cell Superscalar (CellSs) provides a simple, flexible and easy programming approach for the Cell Broadband Engine (Cell/B.E.) that automatically exploits the inherent concurrency of applications at a function or task level. The CellSs environment is based on a source-to-source compiler that translates annotated C or Fortran code and a runtime library tailored for the Cell/B.E. that orchestrates the...
Benefiting from the novel Compute Unified Device Architecture (CUDA) introduced by NVIDIA, the graphics processing unit (GPU) turns out to be a promising solution for cryptography applications. In this paper we present an efficient implementation of MD5-RC4 encryption using an NVIDIA GPU with the novel CUDA programming framework. The MD5-RC4 encryption algorithm was implemented on an NVIDIA GeForce 9800GTX GPU. The...
Simulating spiking neural networks is of great interest to scientists wanting to model the functioning of the brain. However, large-scale models are expensive to simulate due to the number and interconnectedness of neurons in the brain. Furthermore, where such simulations are used in an embodied setting, the simulation must be real-time in order to be useful. In this paper we present NeMo, a platform...
The emergence of multi-core systems opens new opportunities for thread-level parallelism and dramatically increases the performance potential of applications running on these systems. However, the state of the art in performance-enhancing software is far from adequate with regard to the exploitation of hardware features on this complex new architecture. As a result, much of the performance capabilities...
The evolution of consumer electronic devices leads to a consolidation of architectures towards fairly homogeneous multiprocessor platforms. As these highly programmable architectures execute explicitly parallel programs, and until automatic parallel compilers exist, the software programmer has to expose thread-level (i.e. coarse-grain) parallelism to use these resources. Thread is currently...
The need to exploit multi-core systems for parallel processing has revived the concept of dataflow. In particular, the dataflow multithreading architectures have proven to be good candidates for these systems. In this work we propose an abstraction layer that enables compiling and running a program written for an abstract dataflow multithreading architecture on different implementations. More specifically,...
A system whose final goal is to design the feel of fabrics has been developed. For this purpose, cloth was modeled based on a thread model and the geometrical arrangement of the threads in the cloth. A simulation system was constructed with the cloth model. The system requires a great deal of computation time when a large area of cloth is treated. The system was accelerated with a Cell/B.E. processor. By SIMD...
Web servers often need to manage encrypted transfers of data. The encryption activity is computationally intensive, and exposes a significant degree of parallelism. At the same time, cheap multicore processors are readily available on graphics hardware, and toolchains for development of general purpose programs are being released by the vendors. In this paper, we propose an effective implementation...