Network simulation is an important technique for designing interconnection networks and communication libraries. Network simulations are also useful for analyzing the internal communication behavior of parallel applications. This paper introduces NSIM-ACE, a new interconnection network simulator. Unlike existing simulators, NSIM-ACE enables us to evaluate RDMA directly...
Bioinformatics is a research field in which computers are used to solve problems in biology. One of its topics is the analysis of genetic information, which relies on homology search: detecting similar regions in two base sequences. The Smith-Waterman algorithm is one of the most famous approaches for a...
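The abstract only names the algorithm; as background, here is a minimal, unoptimized sketch of Smith-Waterman local-alignment scoring in Python. The match/mismatch/gap scores are illustrative defaults, not values taken from the paper:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score of sequences a and b."""
    # H[i][j] holds the best score of an alignment ending at a[i-1], b[j-1].
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                      # start a fresh alignment
                          H[i - 1][j - 1] + s,    # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            best = max(best, H[i][j])
    return best
```

The zero floor in the recurrence is what makes the alignment local: a poorly scoring prefix is discarded rather than dragging the score down.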
In this paper, we propose PGIS, a parallelism- and garbage-collection-aware I/O scheduler, which identifies hot data based on trace characteristics to exploit the channel-level internal parallelism of flash-based storage systems. PGIS not only fully exploits the abundant channel resources in the SSD, but also introduces a hot-data identification mechanism to reduce the garbage collection overhead...
Parallel programming has been an active area of research in computer science and software engineering for many years. Ideally, parallelization would provide a linear speedup for computational problems; in reality, this is rarely the case. While some algorithms cannot be parallelized at all, many that can still fail to achieve the ideal linear speedup. For algorithms that can benefit from...
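The gap between ideal and observed speedup is commonly explained by Amdahl's law, which bounds speedup by the serial fraction of the work. A quick sketch (the 5% serial fraction below is an illustrative assumption, not a figure from the abstract):

```python
def amdahl_speedup(serial_fraction, n_workers):
    """Amdahl's law: speedup when a fixed fraction of the work is serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_workers)

# With even 5% serial work, 64 workers fall far short of a 64x speedup.
print(round(amdahl_speedup(0.05, 64), 1))  # → 15.4
```

As n_workers grows, the speedup approaches 1/serial_fraction (here, at most 20x), which is why the serial portion dominates at scale.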
To answer effectively on behalf of humans in a DeepQA environment, such as the American quiz show Jeopardy! (http://www.jeopardy.com), a computer must be capable of fast temporal and spatial reasoning over a large-scale commonsense knowledge base. Many existing spatial reasoners share a common limitation: they do not contain conversion rules between the directional...
We present a built-in self-restructuring system for a mesh-connected processor array where faulty processing elements are compensated for by spare processing elements located on a diagonal. First, an algorithm for restructuring the array with faulty processing elements is presented. The reliability of the system is analyzed by simulation. It is compared with that of an array with spare processing...
Modern cloud computing systems use multiple processing units per server to increase processing capability. Recently, applications with multiple parallelization options have emerged, and they serve as a promising model for efficiently utilizing the processing capacity of the system. In this paper, we consider utility-based scheduling for periodic multisegment tasks with multiple...
We discuss the future of massively parallel computing from a fundamental architecture standpoint. Our central thesis is that the various versions of Moore's Law will all unavoidably break down over the next two to three decades, due to fundamental limitations imposed by the laws of physics (especially quantum mechanics). Therefore, the end to scaling up von Neumann-based architectures by adding more...
Frequent Subgraph Mining is an essential operation for graph analytics and knowledge extraction. Due to its high computational cost, parallel solutions are necessary. Existing approaches suffer either from load imbalance or from high communication and synchronization overheads. In this paper we propose ScaleMine, a novel parallel frequent subgraph mining system for a single large graph. ScaleMine introduces...
Performance of the PLASMA dense symmetric Eigensolver is optimized for large shared memory computer systems using multiple Householder domains for dense to band reduction and a communication reducing kernel for bulge chasing. The mr3-smp code by Petschow and Bientinesi is used for the tridiagonal eigensolution and the eigenvector back-transformations employ a 1D parallel decomposition. The input matrix,...
In this paper, we present our Concurrent Systems class, in which parallel programming and parallel and distributed computing (PDC) concepts have been taught for more than 20 years. Despite several rounds of hardware changes, the class maintains its goals: learning parallel computer organizations, studying parallel algorithms, and writing code that runs on parallel and distributed...
Over three iterations and six years we have developed a project-based HPC course for single-box computers, tailored to science students in general. The course rests on strong premises: showing that assembly is what actually runs on machines, dividing parallelism into three dimensions (ILP, DLP, TLP), and applying them incrementally to a single numerical simulation throughout the course, working...
The main contribution of this paper is an implementation of a parallel convex hull algorithm on the Parallella architecture. Parallella is a single-board computer with 16 mesh-connected cores. Our implementation takes the memory architecture and the mesh-connected network of Parallella into account. We evaluated computing time and energy efficiency by comparison with various computing platforms...
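The abstract does not specify which convex hull algorithm is parallelized; as a sequential baseline for comparison, here is Andrew's monotone chain (my choice for illustration, not necessarily the paper's method):

```python
def cross(o, a, b):
    """2D cross product of vectors OA and OB; positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:                          # build lower hull, left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):                # build upper hull, right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]         # endpoints are shared by both chains
```

The two chain-building loops are independent, which is one natural place to split work across cores.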
This paper presents parallel scalar multiplication techniques for elliptic curve cryptography using a q-based addition-subtraction k-chain, which can also effectively resist side-channel attacks. Many techniques have been proposed to improve scalar multiplication, for example double-and-add, NAF, w-NAF, addition chains, and addition-subtraction chains. However, these techniques cannot resist side-channel...
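For context on the double-and-add baseline the abstract mentions, here is its generic structure with the group operations passed in as parameters (a sketch; real elliptic-curve point addition is substituted here by integer addition purely to keep the example self-contained):

```python
def double_and_add(k, P, add, double, identity):
    """Left-to-right double-and-add: compute the scalar multiple k*P.

    The per-bit branch is exactly what leaks through side channels
    (timing/power), motivating the resistant chains the paper studies.
    """
    Q = identity
    for bit in bin(k)[2:]:          # scan bits from most significant
        Q = double(Q)               # always double
        if bit == "1":
            Q = add(Q, P)           # add only when the bit is set
    return Q

# Stand-in group: integers under addition, so k*P is ordinary multiplication.
print(double_and_add(13, 5, lambda a, b: a + b, lambda a: 2 * a, 0))  # → 65
```

Swapping in elliptic-curve point addition and doubling for `add` and `double` gives scalar multiplication on the curve; the control flow is unchanged.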
In this paper, a novel graphics processing unit (GPU) implementation of the Correction Function Method (CFM) is applied to an electrostatic problem with curved surface charge distributions. The CFM is a highly accurate method in which curved surface charge distributions are immersed in a regular grid, without modifying the linear system's matrix. The CFM is shown to be an excellent candidate for parallelization,...
A directory-based chip multiprocessor (CMP) suffers from excessive directory area overhead when its size grows. This work leverages novel relinquishment coherence and superior directory efficiency (RECODE) to lower area overhead. Relinquishment coherence boosts the utilization of a hash-based, set-associative table which holds distinct present-bit vectors (PVs), as it transforms a conflict PV to its...
The development of high-resolution digital cameras for recording still images and video streams has had a momentous influence on how communication and entertainment have evolved in recent years. Processing of human faces finds many applications in domains such as law enforcement and security surveillance. A standard face processing system consists of face detection, face recognition,...
This paper proposes a method to accelerate convolutional neural networks (CNNs) by utilizing GPGPUs. The convolutional layer of a conventional CNN requires a large number of multiplication operations. This paper seeks to reduce the number of multiplications through the Winograd convolution and to parallelize the convolution by exploiting the SIMT structure of the GPGPU...
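To make the multiplication saving concrete, here is the textbook 1D Winograd transform F(2,3), which produces two convolution outputs with 4 multiplications instead of the 6 a direct computation needs (a minimal illustration; the paper's GPU kernel is not shown in the abstract):

```python
def winograd_f23(d, g):
    """Winograd F(2,3): two outputs of a 1D convolution with a 3-tap filter,
    using 4 multiplications instead of 6.

    d: 4 input values, g: 3 filter taps.
    Direct form: y0 = d0*g0 + d1*g1 + d2*g2, y1 = d1*g0 + d2*g1 + d3*g2.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return (m1 + m2 + m3, m2 - m3 - m4)
```

The filter-side factors (g0+g1+g2)/2 and (g0-g1+g2)/2 depend only on the weights, so in a CNN they are precomputed once per filter, leaving 4 multiplications per output pair at inference time.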
There is a growing research interest in quantum computing because of its promise to provide significant performance speedups over classical computers at specialized tasks. While there have been many advances in building more capable, robust, and useful quantum algorithms and software, it is not clear how a scalable, high-performance, and area-efficient quantum architecture should be designed for efficient...
Quantum computing is an innovative and exciting field at the intersection of mathematics, computer science, and physics. In this paper we used a system dynamics approach to illustrate the impact of quantum computing through a variety of factors. Three main factors, namely optimization, scaling, and parallelism, were identified as the most influential. We then attempted to identify which areas...