Network simulation is an important technique for designing interconnection networks and communication libraries. Network simulations are also useful for analyzing the internal communication behavior of parallel applications. This paper introduces NSIM-ACE, a new interconnection network simulator. Unlike existing simulators, NSIM-ACE enables us to evaluate RDMA directly...
Bioinformatics is a research field in which computers are used to solve problems in biology. One of its topics is the analysis of genetic information, which relies on homology search: detecting similar regions in two base sequences. The Smith-Waterman algorithm is one of the most famous approaches for a...
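The abstract only names the algorithm; as background, here is a minimal, unoptimized sketch of Smith-Waterman local-alignment scoring in Python. The match/mismatch/gap scores are illustrative defaults, not values taken from the paper:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score of sequences a and b."""
    # H[i][j] holds the best score of an alignment ending at a[i-1], b[j-1].
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                      # start a fresh alignment
                          H[i - 1][j - 1] + s,    # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            best = max(best, H[i][j])
    return best
```

The zero floor in the recurrence is what makes the alignment local: a poorly scoring prefix is discarded rather than dragging the score down.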
In this paper, we propose PGIS, a parallelism- and garbage-collection-aware I/O scheduler, which identifies hot data based on trace characteristics to exploit the channel-level internal parallelism of flash-based storage systems. PGIS not only fully exploits the abundant channel resources in the SSD, but also introduces a hot-data identification mechanism to reduce the garbage collection overhead...
Parallel programming has been an active area of research in computer science and software engineering for many years. Ideally, parallelization would provide a linear speedup for computational problems; in reality, this is rarely the case. While some algorithms cannot be parallelized at all, many that can still fail to achieve the ideal linear speedup. For algorithms that can benefit from...
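The gap between ideal and observed speedup is commonly explained by Amdahl's law, which bounds speedup by the serial fraction of the work. A quick sketch (the 5% serial fraction below is an illustrative assumption, not a figure from the abstract):

```python
def amdahl_speedup(serial_fraction, n_workers):
    """Amdahl's law: speedup when a fixed fraction of the work is serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_workers)

# With even 5% serial work, 64 workers fall far short of a 64x speedup.
print(round(amdahl_speedup(0.05, 64), 1))  # → 15.4
```

As n_workers grows, the speedup approaches 1/serial_fraction (here, at most 20x), which is why the serial portion dominates at scale.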
To answer effectively on behalf of humans in a DeepQA environment, such as the American quiz show Jeopardy! (http://www.jeopardy.com), a computer must be capable of fast temporal and spatial reasoning over a large-scale commonsense knowledge base. Many existing spatial reasoners share a common limitation: they do not contain conversion rules between the directional...
We present a built-in self-restructuring system for a mesh-connected processor array where faulty processing elements are compensated for by spare processing elements located on a diagonal. First, an algorithm for restructuring the array with faulty processing elements is presented. The reliability of the system is analyzed by simulation. It is compared with that of an array with spare processing...
Modern cloud computing systems use multiple processing units per server to increase processing capability. Recently, applications with multiple parallelization options have emerged, and they serve as a promising model for efficiently utilizing the processing capacity of the system. In this paper, we consider utility-based scheduling for periodic multisegment tasks with multiple...
We discuss the future of massively parallel computing from a fundamental architecture standpoint. Our central thesis is that the various versions of Moore's Law will all unavoidably break down over the next two to three decades, due to fundamental limitations imposed by the laws of physics (especially quantum mechanics). Therefore, the end to scaling up von Neumann-based architectures by adding more...
Frequent Subgraph Mining is an essential operation for graph analytics and knowledge extraction. Due to its high computational cost, parallel solutions are necessary. Existing approaches suffer either from load imbalance or from high communication and synchronization overheads. In this paper we propose ScaleMine, a novel parallel frequent subgraph mining system for a single large graph. ScaleMine introduces...
Performance of the PLASMA dense symmetric Eigensolver is optimized for large shared memory computer systems using multiple Householder domains for dense to band reduction and a communication reducing kernel for bulge chasing. The mr3-smp code by Petschow and Bientinesi is used for the tridiagonal eigensolution and the eigenvector back-transformations employ a 1D parallel decomposition. The input matrix,...
In this paper, we present our Concurrent Systems class, in which parallel programming and parallel and distributed computing (PDC) concepts have been taught for more than 20 years. Despite several rounds of hardware changes, the class maintains its goals: learning parallel computer organizations, studying parallel algorithms, and writing code that runs on parallel and distributed...
Over three iterations and six years we have developed a project-based HPC course for single-box computers, tailored to science students in general. The course rests on strong premises: showing that assembly is what actually runs on machines, dividing parallelism into three dimensions (ILP, DLP, TLP), and applying them incrementally to a single numerical simulation throughout the course, working...
The main contribution of this paper is an implementation of a parallel convex hull algorithm on the Parallella architecture. Parallella is a single-board computer with 16 mesh-connected cores. Our implementation takes the memory architecture and the mesh-connected network of Parallella into account. We evaluated computing time and energy efficiency by comparison with various computing platforms...
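The abstract does not specify which convex hull algorithm is parallelized; as a sequential baseline for comparison, here is Andrew's monotone chain (my choice for illustration, not necessarily the paper's method):

```python
def cross(o, a, b):
    """2D cross product of vectors OA and OB; positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:                          # build lower hull, left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):                # build upper hull, right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]         # endpoints are shared by both chains
```

The two chain-building loops are independent, which is one natural place to split work across cores.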
This paper presents parallel scalar multiplication techniques for elliptic curve cryptography using a q-based addition-subtraction k-chain, which can also effectively resist side-channel attacks. Many techniques have been proposed to improve scalar multiplication, for example double-and-add, NAF, w-NAF, addition chains, and addition-subtraction chains. However, these techniques cannot resist side-channel...
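For context on the double-and-add baseline the abstract mentions, here is its generic structure with the group operations passed in as parameters (a sketch; real elliptic-curve point addition is substituted here by integer addition purely to keep the example self-contained):

```python
def double_and_add(k, P, add, double, identity):
    """Left-to-right double-and-add: compute the scalar multiple k*P.

    The per-bit branch is exactly what leaks through side channels
    (timing/power), motivating the resistant chains the paper studies.
    """
    Q = identity
    for bit in bin(k)[2:]:          # scan bits from most significant
        Q = double(Q)               # always double
        if bit == "1":
            Q = add(Q, P)           # add only when the bit is set
    return Q

# Stand-in group: integers under addition, so k*P is ordinary multiplication.
print(double_and_add(13, 5, lambda a, b: a + b, lambda a: 2 * a, 0))  # → 65
```

Swapping in elliptic-curve point addition and doubling for `add` and `double` gives scalar multiplication on the curve; the control flow is unchanged.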
In this paper, a novel graphics processing unit (GPU) implementation of the Correction Function Method (CFM) is applied to an electrostatic problem with curved surface charge distributions. The CFM is a highly accurate method in which curved surface charge distributions are immersed in a regular grid, without modifying the linear system's matrix. The CFM is shown to be an excellent candidate for parallelization,...
A directory-based chip multiprocessor (CMP) suffers from excessive directory area overhead when its size grows. This work leverages novel relinquishment coherence and superior directory efficiency (RECODE) to lower area overhead. Relinquishment coherence boosts the utilization of a hash-based, set-associative table which holds distinct present-bit vectors (PVs), as it transforms a conflict PV to its...
The development of high-resolution digital cameras for recording still images and video streams has had a momentous influence on how communication and entertainment have evolved in recent years. Processing of human faces finds many applications in domains such as law enforcement and security surveillance. A standard face processing system consists of face detection, face recognition,...
This paper proposes a method to accelerate convolutional neural networks (CNNs) by utilizing GPGPUs. The convolutional layer of a conventional CNN requires a large number of multiplication operations. This paper seeks to reduce the number of multiplications through the Winograd convolution and to parallelize the convolution by exploiting the SIMT structure of the GPGPU...
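To make the multiplication saving concrete, here is the textbook 1D Winograd transform F(2,3), which produces two convolution outputs with 4 multiplications instead of the 6 a direct computation needs (a minimal illustration; the paper's GPU kernel is not shown in the abstract):

```python
def winograd_f23(d, g):
    """Winograd F(2,3): two outputs of a 1D convolution with a 3-tap filter,
    using 4 multiplications instead of 6.

    d: 4 input values, g: 3 filter taps.
    Direct form: y0 = d0*g0 + d1*g1 + d2*g2, y1 = d1*g0 + d2*g1 + d3*g2.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return (m1 + m2 + m3, m2 - m3 - m4)
```

The filter-side factors (g0+g1+g2)/2 and (g0-g1+g2)/2 depend only on the weights, so in a CNN they are precomputed once per filter, leaving 4 multiplications per output pair at inference time.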
There is a growing research interest in quantum computing because of its promise to provide significant performance speedups over classical computers at specialized tasks. While there have been many advances in building more capable, robust, and useful quantum algorithms and software, it is not clear how a scalable, high-performance, and area-efficient quantum architecture should be designed for efficient...
Quantum computing is an innovative and exciting field at the intersection of mathematics, computer science, and physics. In this paper we used a system dynamics approach to illustrate the impact of quantum computing through a variety of factors. Three main factors, namely optimization, scaling, and parallelism, were identified as the most influential. We then attempted to identify which areas...