This paper studies two parallelization techniques, GPGPU and Pthreads for multiprocessor architectures, for implementing an SPSO algorithm applied to the optimization of electromagnetic field devices. The GPGPU and Pthreads implementations are compared in terms of solution quality and speedup. The electromagnetic optimization problems chosen for testing the efficiency of the parallelization techniques...
In this paper, we introduce a GPU-accelerated solver for systems of linear equations with infinite precision. Infinite precision means that the system can provide a precise solution without any rounding error. Such errors usually come from the limited precision of floating-point values in their native computer representation. In a simplified description, the system is using...
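To illustrate what "no rounding error" means in practice, here is a minimal CPU-only sketch that solves a small linear system over exact rationals using Python's `fractions.Fraction`. It is not the method of the paper above, whose description is truncated here; the function name `solve_exact` and the use of Gauss-Jordan elimination are assumptions made purely for illustration.

```python
from fractions import Fraction


def solve_exact(a, b):
    """Solve a x = b by Gauss-Jordan elimination over exact rationals.

    Hypothetical helper for illustration only; inputs should be ints,
    Fractions, or decimal strings so that no rounding happens on entry.
    """
    n = len(a)
    # Augmented matrix [a | b] with every entry converted to a Fraction.
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so that the pivot becomes exactly 1.
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]


solution = solve_exact([[2, 1], [1, 3]], [3, 5])
print(solution)
```

Because every intermediate value is a ratio of arbitrary-size integers, the result is exact; the cost is that numerators and denominators can grow quickly, which is one reason such solvers are attractive targets for acceleration.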
General-purpose computing on Graphics Processing Units (GPGPUs) has become increasingly popular for a wide range of applications beyond traditional graphics rendering workloads. GPGPU exploits parallelism in applications via multithreading to hide memory latencies, and handles control complexity by barrier synchronizations. Warp scheduling algorithms have been optimized to increase memory latency hiding...
Utilizing accelerators in heterogeneous systems is an established approach for designing peta-scale applications. Today, CUDA offers a rich programming interface for GPU accelerators but requires developers to incorporate several layers of parallelism on both CPU and GPU. From this increasing program complexity emerges the need for sophisticated performance tools. This work contributes by analyzing...
We present an auto-parallelization technique for generating GPU implementations of data-structure operations from a sequential specification. The technique partitions the data-structure operations into barrier-separated phases such that each phase executes only homogeneous operations. Homogeneity is dictated by the method type, which is derived from the specification. Two key aspects of our technique...
Irregular algorithms are algorithms whose main data structures are complex, such as directed and undirected graphs, trees, etc. A useful abstraction for many irregular algorithms is the operator formulation, in which the algorithm is viewed as the iterated application of an operator to certain nodes, called active nodes, in the graph. Each operator application, called an activity, usually touches only a...
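The operator formulation described above can be sketched with a worklist of active nodes. The example below is a sequential illustration only, assuming single-source shortest paths as the irregular algorithm and edge relaxation as the operator; the function name `sssp_worklist` and the adjacency-list layout are hypothetical and not taken from the paper.

```python
from collections import deque


def sssp_worklist(graph, source):
    """Shortest paths by repeatedly applying a relaxation operator.

    graph maps each node to a list of (neighbor, weight) pairs; every node
    must appear as a key. Weights are assumed to be non-negative.
    """
    dist = {node: float("inf") for node in graph}
    dist[source] = 0
    worklist = deque([source])  # the currently active nodes
    while worklist:
        node = worklist.popleft()
        # One "activity": apply the operator to an active node. It only
        # reads and writes the node and its direct neighbors.
        for neighbor, weight in graph[node]:
            candidate = dist[node] + weight
            if candidate < dist[neighbor]:
                dist[neighbor] = candidate
                worklist.append(neighbor)  # neighbor becomes active
    return dist


example = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
print(sssp_worklist(example, "a"))
```

Activities whose neighborhoods do not overlap could in principle run concurrently, which is where a parallel runtime would find its parallelism; this sketch makes no attempt to do so.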
This paper presents a method to map and implement the 1-D FFT on a GPGPU and extends the method to the 2-D FFT. Two approaches are used to maximize performance. One is to localize data inside the caches of the GPGPU, and the other is to assign threads and blocks properly. The results show that our implementation is 3.62 times faster at performing a 32M-point 1-D FFT and 4...
Many general-purpose applications exploit Graphics Processing Units (GPUs) by executing a set of well-known data-parallel primitives. Those primitives are usually invoked from the host many times, so their throughput has a great impact on the performance of the overall system. Thus, the study of novel algorithmic strategies to optimize their implementation on current devices is an interesting topic...
In contrast to English search engines, Arabic search engines have not received their fair share of attention in modern studies, despite the continuous growth of Arabic Internet users and data. To help bridge this gap, this paper presents a novel indexing algorithm customized for Arabic documents. Our algorithm exploits the characteristics of the Arabic language to enhance indexing and lookup. Additionally, the algorithm...
The hybrid CPU/GPU computing architecture has received great attention from high-performance computing researchers. This new architecture provides higher computation performance than one that uses only CPUs for data computation. However, programming this architecture is not easy, since programmers have to learn the GPU programming APIs and handle data communication between...
Nowadays, microscopic analysis of tissue samples is increasingly done using digital imagery and special immunodiagnostic software. These are typically specific applications developed for one distinct field, but some subroutines are commonly repeated; for example, several applications contain steps that detect cell nuclei in a sample image. The aim of our research is to develop a new data parallel...
Atomic operations are important building blocks in supporting general-purpose computing on graphics processing units (GPUs). For instance, they can be used to coordinate execution between concurrent threads, and in turn, assist in constructing complex data structures such as hash tables or implementing GPU-wide barrier synchronization. While the performance of atomic operations has improved substantially...
New high-performance computing (HPC) applications now have to cope with scaling over an increasing number of nodes and with programming special accelerator hardware. The hybrid composition of large computing systems adds a new dimension of complexity to software development. This paper presents a novel approach to gain insight into accelerator interaction and utilization without any changes...
We are witnessing an increasing adoption of GPUs for performing general-purpose computation, which is usually known as GPGPU. The main challenge in developing such applications is that they often do not fit the model required by graphics processing devices, limiting the scope of applications that may benefit from the computing power provided by GPUs. Even when the application fits the GPU model,...
The Gauss-Seidel method is very efficient for solving problems such as tightly coupled constraints with possible redundancies. However, the underlying algorithm is inherently sequential. Previous works have exploited sparsity in the system matrix to extract parallelism. In this paper, we propose to study several parallelization schemes for fully coupled systems, which cannot be parallelized by existing...
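The sequential dependency mentioned above is visible in a plain Gauss-Seidel sweep: each unknown is updated using values already refreshed earlier in the same sweep. The sketch below is the textbook sequential method for a dense system, not any of the parallel schemes the paper studies; the function name `gauss_seidel` and the fixed iteration count are illustrative assumptions.

```python
def gauss_seidel(a, b, iterations=50):
    """Approximate the solution of a x = b with Gauss-Seidel sweeps.

    Assumes a has nonzero diagonal entries and that the iteration converges
    (for example, when a is strictly diagonally dominant).
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        for i in range(n):
            # x[j] for j < i was already updated in this sweep, so row i
            # cannot be computed before rows 0..i-1: the loop is sequential.
            s = sum(a[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / a[i][i]
    return x


print(gauss_seidel([[4.0, 1.0], [2.0, 3.0]], [1.0, 2.0]))
```

When the matrix is sparse, rows that do not reference each other can be updated together; in a fully coupled system every row depends on every other, which is why that shortcut is unavailable.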