Acceleration of cryptographic applications on Graphics Processing Unit (GPU) platforms is a research topic of practical interest, because these platforms provide huge computational power for this type of application. In this paper, we propose a parallel algorithm for Elliptic Curve (EC) point multiplication in order to compute EC cryptography on GPUs. The proposed approach relies on using the...
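The abstract above is truncated before the algorithm itself, but the core operation it parallelizes — EC point multiplication — can be sketched sequentially. The sketch below is a minimal reference version on a toy curve; the curve parameters, function names, and the double-and-add strategy are illustrative assumptions, not details taken from the paper:

```python
# Double-and-add scalar multiplication on a toy short-Weierstrass curve
# y^2 = x^3 + a*x + b over GF(p). The parameters are for demonstration
# only (far too small for real cryptography); None is the point at infinity.
P_MOD, A, B = 17, 2, 2

def ec_add(p1, p2):
    """Add two affine points on the curve (sequential reference version)."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p2 is the inverse of p1
    if p1 == p2:  # point doubling: tangent-line slope
        s = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:         # point addition: chord slope
        s = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    y3 = (s * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

def scalar_mult(k, point):
    """Compute k*point by scanning the bits of k (double-and-add)."""
    result, addend = None, point
    while k:
        if k & 1:
            result = ec_add(result, addend)
        addend = ec_add(addend, addend)
        k >>= 1
    return result
```

For example, with the generator (5, 1) on this textbook curve, `scalar_mult(2, (5, 1))` yields (6, 3); a GPU version would parallelize the underlying field arithmetic or process many multiplications concurrently.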
The Compute Unified Device Architecture (CUDA) is a new programming platform making use of the unified shader design of the most current Graphics Processing Units (GPUs) from NVIDIA. In this paper, we apply this revolutionary new technology to implement the automatic time gain compensation (ATGC) for medical ultrasound imaging. The parallel box filtering method and general matrix computation algorithms...
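The box filtering mentioned above parallelizes well because, with a prefix-sum (summed-area) table, every output element depends on only two table entries and can be computed independently. A minimal 1D sketch of that idea, with assumed function names (the paper's actual 2D CUDA kernel is not shown here):

```python
def box_filter(signal, radius):
    """1D box (moving-average) filter via a prefix-sum table.
    After the prefix pass, each output needs only two lookups,
    which is what makes the method easy to map onto GPU threads."""
    n = len(signal)
    prefix = [0.0] * (n + 1)
    for i, v in enumerate(signal):
        prefix[i + 1] = prefix[i] + v  # running sum of the input
    out = []
    for i in range(n):  # each iteration is independent of the others
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out
```

For instance, `box_filter([1, 2, 3, 4, 5], 1)` averages each sample with its neighbours, shrinking the window at the edges.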
Ultrasound B-mode imaging is the basic image mode which can offer anatomic information of organs for clinical diagnosis. Because of the massive computation involved in baseband processing from focused radio-frequency (RF) signals followed by envelope detection, compression and scan conversion required for high quality B-mode imaging, existing medical systems always rely on complicated hardware in real...
Optimal heuristic searches such as A* search are commonly used for low-dimensional planning such as 2D path finding. These algorithms, however, typically do not scale well to high-dimensional planning problems such as motion planning for robotic arms, computing motion trajectories for non-holonomic robotic vehicles, and motion synthesis for humanoid characters. A recently developed randomized version...
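For reference, the low-dimensional baseline the abstract contrasts against — A* on a 2D grid — can be sketched compactly. This is a generic textbook version with an admissible Manhattan heuristic, not the paper's randomized variant:

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected grid; grid[r][c] == 1 marks an obstacle.
    Returns the length of a shortest path, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):  # Manhattan distance: admissible for unit-cost moves
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries: (f = g + h, g, cell)
    best_g = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            return g
        if g > best_g.get(cell, float("inf")):
            continue  # stale heap entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None
```

The priority queue and per-state bookkeeping here are exactly the structures that become hard to maintain efficiently as the state space grows high-dimensional.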
Exploiting the graphics processing unit (GPU) is useful for obtaining higher performance with a smaller number of host machines in grid systems. One problem in GPU-accelerated grid systems is the lack of efficient multitasking mechanisms. In this paper, we propose a cooperative multitasking method capable of simultaneous execution of a graphics application and a CUDA-based scientific application on a single...
Finite difference methods continue to provide an important and parallelisable approach to many numerical simulation problems. Iterative multigrid and multilevel algorithms can converge faster than ordinary finite difference methods but can be more difficult to parallelise. Data parallel paradigms tend to lend themselves particularly well to solving regular mesh PDEs whereby low latency communications...
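The data-parallel structure the abstract refers to is visible in the simplest relaxation scheme used inside multigrid: a Jacobi sweep, where every interior point updates independently from the previous iterate. A minimal 1D Poisson sketch, with assumed names and a unit grid spacing:

```python
def jacobi_step(u, f, h):
    """One Jacobi sweep for the 1D Poisson problem -u'' = f with fixed
    boundary values. Each interior update reads only the old iterate,
    so all points can be computed in parallel (one thread per point)."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return new

def solve(f, h, sweeps):
    """Repeated Jacobi sweeps from a zero initial guess."""
    u = [0.0] * len(f)
    for _ in range(sweeps):
        u = jacobi_step(u, f, h)
    return u
```

Multigrid accelerates exactly this kind of smoother by restricting the residual to coarser grids; the coarse-grid traffic is what complicates parallelisation.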
To exploit the benefits of throughput-optimized processors such as GPUs, applications need to be redesigned to achieve performance and efficiency. In this work, we present techniques to speed up statistical timing analysis on throughput processors. We draw upon advancements in improving the efficiency of Monte Carlo based statistical static timing analysis (MC SSTA) using techniques to reduce the...
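Since the abstract is cut off before the techniques themselves, here is a minimal sketch of what baseline Monte Carlo SSTA computes: draw one delay sample per gate, take the maximum over path sums, and average over samples. Every sample is independent, which is why the workload suits throughput processors. The data layout and function names are assumptions for illustration:

```python
import random

def mc_ssta_mean(paths, n_samples, seed=0):
    """Toy Monte Carlo statistical timing. Each path is a list of
    (gate_name, mean_delay, sigma) triples; within one sample, a gate
    shared by several paths gets a single delay draw. The circuit delay
    of a sample is the max over path sums; returns the sample mean."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        draw = {}  # one Gaussian delay draw per gate, per sample
        circuit_delay = 0.0
        for path in paths:
            d = 0.0
            for gate, mean, sigma in path:
                if gate not in draw:
                    draw[gate] = rng.gauss(mean, sigma)
                d += draw[gate]
            circuit_delay = max(circuit_delay, d)
        total += circuit_delay
    return total / n_samples
```

The efficiency techniques the paper draws on would reduce how many such samples (or how much per-sample work) are needed for a given accuracy.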
This paper deals with solving large instances of the Linear Sum Assignment Problem (LSAP) under real-time constraints, using Graphics Processing Units (GPUs). The motivating scenario is an industrial application for P2P live streaming that is moderated by a central tracker that is periodically solving LSAP instances to optimize the connectivity of thousands of peers. However, our findings are generic...
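For orientation, the objective an LSAP solver optimizes can be stated by exhaustive enumeration — a one-to-one assignment of rows to columns with minimum total cost. This brute-force sketch is only viable for tiny instances and is not the paper's GPU method; function and variable names are illustrative:

```python
from itertools import permutations

def lsap_bruteforce(cost):
    """Exact LSAP on an n x n cost matrix by trying every permutation
    (O(n!) -- reference/check code only). Returns (min_cost, assignment)
    where assignment[i] is the column matched to row i."""
    n = len(cost)
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        c = sum(cost[i][perm[i]] for i in range(n))
        if c < best_cost:
            best_cost, best_perm = c, perm
    return best_cost, best_perm
```

Practical solvers (e.g. auction or Hungarian-style algorithms, which parallelize differently) reach the same optimum in polynomial time; this version is useful only to validate them on small inputs.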
The release of general purpose GPU programming environments has granted universal access to computing performance that was once available only to supercomputers. The availability of such computational power has fostered the creation and re-deployment of algorithms, new and old, creating entirely new classes of applications. In this paper, a GPU implementation of the Center-Surround Distribution...
Throughput and programmability have always been the central, but generally conflicting concerns for modern IP router designs. Current high performance routers depend on proprietary hardware solutions, which make it difficult to adapt to ever-changing network protocols. On the other hand, software routers offer the best flexibility and programmability, but could only achieve a throughput one order...
This paper explores the ability to use graphics processing units (GPUs) as co-processors to harness the inherent parallelism of batch operations in systems that require high performance. To this end we have chosen bloom filters (space-efficient data structures that support the probabilistic representation of set membership) as the queries these data structures support are often performed in batches...
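A Bloom filter's query path — k hash probes into a bit array, all independent — is what makes batches of membership tests attractive for a GPU co-processor. A minimal sketch, with assumed sizes and salted SHA-256 standing in for the k hash functions (the paper's parameters are not shown here):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k salted SHA-256 hashes set/probe bits in a
    fixed-size bit array. Queries can return false positives but never
    false negatives, and each probe is independent of the others."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # integer used as a bit array

    def _positions(self, item):
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all(self.bits >> pos & 1 for pos in self._positions(item))
```

A batched GPU version would hash many queries at once and test all their bit positions in parallel, which is precisely the access pattern the abstract targets.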
The strategy of using CUDA-compatible GPUs as a parallel computation solution to improve program performance has been increasingly widely adopted in the two years since the CUDA platform was released. Its benefit extends from the graphics domain to many other computationally intensive domains. Tiling, as the most general and important technique, is widely used for optimization in...
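The tiling idea can be shown even without a GPU: operate on small sub-blocks so that each block of data is reused while it is still close to the processor (in CUDA, staged into shared memory; on a CPU, kept in cache). A pure-Python blocked matrix multiply as a sketch of the loop structure only — names and the tile size are illustrative:

```python
def matmul_tiled(a, b, tile=2):
    """Blocked (tiled) matrix multiply C = A*B. The three outer loops
    walk tile-sized blocks; the inner loops reuse each a[i][k] across a
    whole tile row of B, the locality pattern tiling exists to create."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, m)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + tile, p)):
                            c[i][j] += aik * b[k][j]
    return c
```

In a CUDA kernel the same blocking appears as `__shared__` staging of A and B tiles with a synchronization barrier between tiles; the arithmetic is unchanged.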
In recent years, with the development of GPUs, general-purpose computation on graphics processors has become a new field. Focusing on GPU processing, this paper provides a formal description of the data-parallel mode, a detailed description of the CUDA programming model, and the principles of optimization. Comparative experiments show that CUDA has a strong ability...
Scene recognition has become a notable field in the image processing area, and many methods have been proposed in recent years, among which the idea of extracting the scene gist from global features has been shown to achieve higher retrieval accuracy than many other methods. However, the process of extracting the gist is heavily time-consuming and not suitable for real-time applications. In this paper,...
Deep reactive ion etching (DRIE) technique is a new and powerful tool in Micro-Electro-Mechanical Systems (MEMS) fabrication. A 3D DRIE simulation can help researchers understand the time-evolution of the Bosch process used in DRIE. Due to the high complexity of the algorithm used in the simulation, it is necessary to develop an algorithm that can accelerate the simulation. This paper presents a parallel...
Driven by the insatiable demand for real-time graphics, especially from the computer games market, the graphics processing unit (GPU) has become a major source of computing horsepower in recent years, as GPU performance has surpassed that of the contemporary CPU. This paper presents our study on how to efficiently recover the passwords for encrypted RAR files. Our research focus is on the AES key...
This paper presents the essence of high performance computing (HPC) in the field of computational nanotechnology and the problems encountered in applying HPC arrangements to nano-enabled calculations. A proposal to optimize computations in an HPC setup has been formulated to make nanotechnology computations more effective and realistic on a CUDA-based framework. Results and findings in...
Sparse matrix-vector multiplication (SpMV) is a common operation in numerical linear algebra and is the computational kernel of many scientific applications. It is one of the original and perhaps most studied targets for FPGA acceleration. Despite this, GPUs, which have only recently gained both general-purpose programmability and native support for double precision floating-point arithmetic, are...
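The SpMV kernel at issue is compact enough to state directly. A pure-Python sketch of y = A·x with A in the common compressed sparse row (CSR) format — the one-row-per-thread mapping in the comment is the usual first GPU strategy, not necessarily the one this paper evaluates:

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Sparse matrix-vector product y = A*x, A stored in CSR:
    values[k] is a nonzero, col_idx[k] its column, and
    row_ptr[r]..row_ptr[r+1] bounds row r's nonzeros. Each row's dot
    product is independent -- the natural one-thread-per-row GPU mapping."""
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y
```

Irregular row lengths make this memory-bound and load-imbalanced, which is why SpMV remains a stress test for both FPGA and GPU accelerators.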
Graphics processing units (GPUs) have been widely used to accelerate algorithms that exhibit massive data parallelism or task parallelism. When such parallelism is not inherent in an algorithm, computational scientists resort to simply replicating the algorithm on every multiprocessor of a NVIDIA GPU, for example, to create such parallelism, resulting in embarrassingly parallel ensemble runs that...
Modern graphics processing units (GPUs) are characterized by programmability, a high price/performance ratio, and high speed, and are well suited to parallel calculation. Based on this, the article studies general methods of GPU computing and uses the Compute Unified Device Architecture (CUDA) to design new parallel algorithms that accelerate matrix inversion and a binarization algorithm...
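Of the two accelerated kernels, binarization is the simplest to sketch: every pixel is thresholded independently, so the loop maps one-to-one onto GPU threads. A minimal global-threshold version with assumed names (the article's exact scheme is truncated above):

```python
def binarize(image, threshold=None):
    """Global-threshold binarization of a grayscale image given as a
    list of rows. If no threshold is supplied, the mean intensity is
    used. Each output pixel depends only on its input pixel, so the
    comprehension below is trivially data-parallel."""
    flat = [p for row in image for p in row]
    if threshold is None:
        threshold = sum(flat) / len(flat)  # simple adaptive choice
    return [[1 if p >= threshold else 0 for p in row] for row in image]
```

Matrix inversion, the other kernel, is much less embarrassingly parallel (its eliminations carry data dependences between steps), which is presumably where the article's algorithm design effort lies.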