Search results

Items from 1 to 20 out of 473 results

chapter

Aggressive pipelining of irregular applications on reconfigurable hardware

Zhaoshi Li, Leibo Liu, Yangdong Deng, Shouyi Yin, et al.

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 575-586

CPU-FPGA heterogeneous platforms offer a promising solution for high-performance and energy-efficient computing systems by providing specialized accelerators with post-silicon reconfigurability. To unleash the power of FPGA, however, the programmability gap has to be filled so that applications specified in high-level programming languages can be efficiently mapped and scheduled on FPGA. The above...

chapter

Assessing Sparse Triangular Linear System Solvers on GPUs

Daniel Erguiz, Ernesto Dufrechou, Pablo Ezzatti

2017 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW), pp. 37-42

A large number of numerical linear algebra methods used to tackle problems in diverse fields of science and engineering rely heavily on the solution of one or many sparse triangular linear systems. Since the early years, this has motivated numerous efforts that seek to produce efficient implementations of this kernel for most hardware platforms. However, this operation implies strong data dependencies...
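
The data dependency this abstract refers to can be illustrated with a minimal Python sketch of forward substitution on a lower-triangular matrix stored in CSR format. It also groups rows into "levels", a common way GPU solvers expose parallelism (rows in the same level do not depend on each other); this level-set scheme is an assumption for illustration, not necessarily one of the solvers assessed in the paper. The function names `csr_lower_solve` and `level_sets` are hypothetical.

```python
def csr_lower_solve(indptr, indices, data, b):
    """Solve L x = b by forward substitution, L lower-triangular in CSR form.

    Assumes every row stores its (nonzero) diagonal entry.
    """
    n = len(indptr) - 1
    x = [0.0] * n
    for i in range(n):
        acc = b[i]
        diag = None
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                acc -= data[k] * x[j]  # needs x[j] to be finished first
            elif j == i:
                diag = data[k]
        if not diag:
            raise ValueError(f"row {i} has no nonzero diagonal entry")
        x[i] = acc / diag
    return x


def level_sets(indptr, indices):
    """Group rows into levels; rows within one level do not depend on each other."""
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    groups = {}
    for row, lv in enumerate(level):
        groups.setdefault(lv, []).append(row)
    return [groups[lv] for lv in sorted(groups)]


# L = [[2,0,0,0],[1,1,0,0],[0,0,4,0],[0,1,1,2]] in CSR form
indptr = [0, 1, 3, 4, 7]
indices = [0, 0, 1, 2, 1, 2, 3]
data = [2.0, 1.0, 1.0, 4.0, 1.0, 1.0, 2.0]
print(csr_lower_solve(indptr, indices, data, [2.0, 3.0, 8.0, 9.0]))  # [1.0, 2.0, 2.0, 2.5]
print(level_sets(indptr, indices))  # [[0, 2], [1], [3]]
```

Rows 0 and 2 share a level and could be solved concurrently, while row 3 must wait for both row 1 and row 2.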

chapter

Introducing parallel computing concepts in computer system related courses

Han Wan, Xiaopeng Gao, Xiang Long, Bo Jiang

2017 IEEE Frontiers in Education Conference (FIE), pp. 1-7

All semiconductor market domains are converging to concurrent platforms. This trend has posed a real challenge to developing application software that effectively uses these concurrent processors to achieve efficiency and performance goals. This paper argues that computer-system-related courses are natural places to introduce parallelism, and the earlier to parallel computing concepts...

chapter

Neural network for saturation prediction of solid state drives

Jaehyung Kim, Jinuk Park, Sanghyun Park

2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 2069-2074

State-of-the-art storage devices with parallel capability have significantly reduced the performance gap between processor and storage I/O. However, their internal parallelism makes it difficult to measure utilization, which can serve as a basis for load balancing, a critical factor in improving the performance of parallel systems. When storage utilization reaches one hundred percent,...

chapter

Solving 0-1 quadratic problems with two-level parallelization of the BiqCrunch solver

Camille Coti, Etienne Leclercq, Frederic Roupin, Franck Butelle

2017 Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 445-452

In this paper we present MLTBiqCrunch, a hierarchically parallelized version of the open-source solver BiqCrunch [1]. More precisely, this version has two levels of parallelization: a coarse-grained level, which assigns a thread to each node evaluation, and a fine-grained level, which parallelizes a node evaluation when some threads are idle. We present experiments on some classical binary quadratic optimization problems...

chapter

PolyPC: Polymorphic parallel computing framework on embedded reconfigurable system

Hongyuan Ding, Miaoqing Huang

2017 27th International Conference on Field Programmable Logic and Applications (FPL), pp. 1-8

With the help of parallelism provided by the fine-grained architecture, hardware accelerators on Field Programmable Gate Arrays (FPGAs) can significantly improve the performance of many applications. However, designers are typically required to have excellent hardware programming skills and unique optimization techniques to fully explore the potential of FPGA resources. In this work, we propose the...

chapter

SuperGraph-SLP Auto-Vectorization

Vasileios Porpodas

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 330-342

SIMD vectors help improve the performance of certain applications. The code gets vectorized into SIMD form either by hand, or automatically with auto-vectorizing compilers. The Superword-Level Parallelism (SLP) vectorization algorithm is a widely used algorithm for vectorizing straight-line code and is part of most industrial compilers. The algorithm attempts to pack scalar instructions into vectors...
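
The packing step mentioned here can be shown with a toy Python sketch, assuming straight-line statements of the form `dst[i] = src1[i] op src2[i]` that are already known to be independent. Real SLP vectorizers also check dependencies and follow use-def chains from seed instructions, and this is plain SLP packing, not the SuperGraph-SLP algorithm itself. `Stmt`, `isomorphic_adjacent` and `slp_pack` are hypothetical names.

```python
from collections import namedtuple

# One scalar statement: dst[di] = src1[i1] <op> src2[i2]
Stmt = namedtuple("Stmt", "op dst di src1 i1 src2 i2")


def isomorphic_adjacent(a, b):
    """True if b does the same operation as a, one element further along."""
    return (a.op == b.op and a.dst == b.dst and a.src1 == b.src1
            and a.src2 == b.src2 and b.di == a.di + 1
            and b.i1 == a.i1 + 1 and b.i2 == a.i2 + 1)


def slp_pack(stmts, width):
    """Greedily pack runs of `width` isomorphic, adjacent statements.

    Statements that cannot form a full pack stay scalar (a pack of size 1).
    """
    packs, i = [], 0
    while i < len(stmts):
        group = [stmts[i]]
        while (len(group) < width and i + len(group) < len(stmts)
               and isomorphic_adjacent(group[-1], stmts[i + len(group)])):
            group.append(stmts[i + len(group)])
        if len(group) == width:
            packs.append(group)  # would become one vector instruction
            i += width
        else:
            packs.append([stmts[i]])  # left as a scalar instruction
            i += 1
    return packs


# a[0..3] = b[0..3] + c[0..3], followed by one unrelated multiply
code = [Stmt("+", "a", k, "b", k, "c", k) for k in range(4)]
code.append(Stmt("*", "d", 0, "b", 0, "c", 0))
print([len(p) for p in slp_pack(code, width=4)])  # [4, 1]
```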

chapter

Cloudifier virtual apps: Virtual desktop predictive analytics apps environment based on GPU computing framework

Andrei Ionut Damian, Alexandru Purdila, Nicolae Tapus

2017 13th IEEE International Conference on Intelligent Computer Communication and Processing (ICCP), pp. 133-138

The need for systems capable of conducting inferential analysis and predictive analytics is ubiquitous in a global information society. With recent advances in predictive machine learning models and massively parallel computing, a new set of resources is now potentially available to the computer science community to research and develop new, truly intelligent and innovative...

chapter

Modified Convolution Neural Network for Highly Effective Parallel Processing

Sang-Soo Park, Jung-Hyun Hong, Ki-Seok Chung

2017 IEEE International Conference on Information Reuse and Integration (IRI), pp. 325-331

Today, convolutional neural networks (CNNs) are used in many areas, such as computer vision and natural language processing. By employing hardware accelerators such as graphics processing units (GPUs), CNNs can be sped up significantly, and many studies have proposed such acceleration methods. However, it is not straightforward to parallelize a CNN on a hardware accelerator because...

chapter

OpenCL 2.0 Compiler Adaptation on LLVM for PTX Simulators

Chun-Chieh Yang, Shao-Chung Wang, Min-Yi Hsu, Yuan-Ming Chang, et al.

2017 46th International Conference on Parallel Processing Workshops (ICPPW), pp. 53-58

OpenCL continues to gather momentum on both desktop and mobile devices. The new features of OpenCL 2.0 give developers greater expressive power in programming heterogeneous computing environments. Among current experimental simulation environments, gem5-gpu supports only CUDA, whereas GPGPU-Sim can support OpenCL by compiling OpenCL kernel code to PTX with a real GPU driver. However, this driver compilation...

chapter

Runtime Data Layout Scheduling for Machine Learning Dataset

Yang You, James Demmel

2017 46th International Conference on Parallel Processing (ICPP), pp. 452-461

Machine learning (ML) approaches are widely used classification/regression methods for data mining applications. However, the time-consuming training process greatly limits their efficiency. In this paper, we use SVM (a traditional ML algorithm) and DNN (a state-of-the-art ML algorithm) as examples to illustrate the idea. For SVM, a major performance bottleneck of current tools is that...

chapter

Variable-Size Batched LU for Small Matrices and Its Integration into Block-Jacobi Preconditioning

Hartwig Anzt, Jack Dongarra, Goran Flegar, Enrique S. Quintana-Orti

2017 46th International Conference on Parallel Processing (ICPP), pp. 91-100

We present a set of new batched CUDA kernels for the LU factorization of a large collection of independent problems of different sizes, and for the subsequent triangular solves. All kernels heavily exploit the registers of the graphics processing unit (GPU) in order to deliver high performance for small problems. The development of these kernels is motivated by the need to tackle this embarrassingly parallel...
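
A minimal CPU-side Python sketch of variable-size batched LU with partial pivoting, followed by the two triangular solves, is given below. It only illustrates why this workload is embarrassingly parallel (no problem depends on another); it does not model the register-based CUDA kernels the abstract describes. All function names are hypothetical.

```python
def lu_factor(a):
    """In-place LU factorization with partial pivoting (P A = L U).

    `a` is a small dense n x n matrix given as a list of row lists.
    Returns `perm`, where row k of P A is row perm[k] of the original A.
    """
    n = len(a)
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))  # pivot row
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            a[k], a[p] = a[p], a[k]
            perm[k], perm[p] = perm[p], perm[k]
        for r in range(k + 1, n):
            a[r][k] /= a[k][k]  # multiplier, stored in the L part
            for c in range(k + 1, n):
                a[r][c] -= a[r][k] * a[k][c]
    return perm


def lu_solve(lu, perm, b):
    """Solve A x = b given the packed factors from lu_factor."""
    n = len(lu)
    y = [b[perm[i]] for i in range(n)]  # apply P to b
    for i in range(n):  # forward substitution with unit-diagonal L
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in reversed(range(n)):  # backward substitution with U
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y


def batched_lu_solve(matrices, rhs):
    """Factor and solve independent problems of different sizes.

    No problem depends on another, so on a GPU each could be assigned to
    its own thread group; this sketch simply loops over the batch.
    """
    results = []
    for a, b in zip(matrices, rhs):
        lu = [row[:] for row in a]  # keep the caller's matrix intact
        perm = lu_factor(lu)
        results.append(lu_solve(lu, perm, b))
    return results


batch = [[[4.0]], [[2.0, 1.0], [4.0, 3.0]]]
print(batched_lu_solve(batch, [[8.0], [3.0, 7.0]]))  # [[2.0], [1.0, 1.0]]
```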

chapter

Parallel Construction of Simultaneous Deterministic Finite Automata on Shared-Memory Multicores

Minyoung Jung, Jinwoo Park, Johann Blieberger, Bernd Burgstaller

2017 46th International Conference on Parallel Processing (ICPP), pp. 271-281

String pattern matching with finite automata (FAs) is a well-established method across many areas in computer science. Until now, data dependencies inherent in the pattern matching algorithm have hampered effective parallelization. To overcome the dependency-constraint between subsequent matching steps, simultaneous deterministic finite automata (SFAs) have been recently introduced. Although an SFA...
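
Under simplifying assumptions, the idea behind simultaneous automata can be sketched in Python: split the input into chunks, simulate the DFA on each chunk from every possible start state at once (the resulting start-to-end state mapping is what one SFA state represents), then compose the mappings in order. This hypothetical sketch computes the mappings on the fly; it does not construct the SFA, which is what the paper parallelizes. The names `chunk_mapping`, `compose` and `chunked_match` are illustrative.

```python
def run_dfa(delta, state, text):
    """Sequential DFA simulation: each step depends on the previous state."""
    for ch in text:
        state = delta[state][ch]
    return state


def chunk_mapping(delta, n_states, chunk):
    """Run the DFA on `chunk` from every state at once.

    The result maps start state -> end state; a vector like this is what
    one state of a simultaneous automaton stands for.
    """
    return tuple(run_dfa(delta, s, chunk) for s in range(n_states))


def compose(f, g):
    """Mapping that applies f first, then g."""
    return tuple(g[f[s]] for s in range(len(f)))


def chunked_match(delta, n_states, start, text, n_chunks):
    if not text:
        return start
    size = -(-len(text) // n_chunks)  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    # The per-chunk mappings are independent, so they could be computed
    # on separate cores; only the cheap composition below is sequential.
    maps = [chunk_mapping(delta, n_states, c) for c in chunks]
    total = tuple(range(n_states))  # identity mapping
    for m in maps:
        total = compose(total, m)
    return total[start]


# DFA tracking (number of '1' characters) mod 3
delta = [{"0": 0, "1": 1}, {"0": 1, "1": 2}, {"0": 2, "1": 0}]
text = "1101110101"
print(run_dfa(delta, 0, text), chunked_match(delta, 3, 0, text, n_chunks=4))  # 1 1
```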

chapter

High-Performance and Memory-Saving Sparse General Matrix-Matrix Multiplication for NVIDIA Pascal GPU

Yusuke Nagasaka, Akira Nukada, Satoshi Matsuoka

2017 46th International Conference on Parallel Processing (ICPP), pp. 101-110

Sparse general matrix-matrix multiplication (SpGEMM) is a key kernel in preconditioners such as the algebraic multigrid method and in graph algorithms. However, SpGEMM performance is quite low on modern processors because of random memory accesses to both the input and output matrices. Moreover, the number and pattern of non-zero elements in the output matrix, which are important for achieving locality,...
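
A minimal Python sketch of row-wise (Gustavson-style) SpGEMM on CSR inputs with two phases: a symbolic pass that counts the non-zeros of each output row so the output arrays can be allocated exactly once, and a numeric pass that accumulates each row in a hash map. This is a common way to handle the unknown output structure the abstract mentions; treat it as an assumption, not as the paper's GPU implementation. `spgemm_csr` is a hypothetical name.

```python
def spgemm_csr(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val):
    """Compute C = A * B for CSR matrices; returns (c_ptr, c_idx, c_val)."""
    n_rows = len(a_ptr) - 1
    # Symbolic phase: count distinct output columns per row.
    row_nnz = []
    for i in range(n_rows):
        cols = set()
        for k in range(a_ptr[i], a_ptr[i + 1]):
            j = a_idx[k]
            cols.update(b_idx[b_ptr[j]:b_ptr[j + 1]])
        row_nnz.append(len(cols))
    c_ptr = [0]
    for nz in row_nnz:
        c_ptr.append(c_ptr[-1] + nz)
    c_idx = [0] * c_ptr[-1]  # allocated once, exactly sized
    c_val = [0.0] * c_ptr[-1]
    # Numeric phase: hash-map accumulator per row, written out sorted.
    for i in range(n_rows):
        acc = {}
        for k in range(a_ptr[i], a_ptr[i + 1]):
            j, av = a_idx[k], a_val[k]
            for t in range(b_ptr[j], b_ptr[j + 1]):
                col = b_idx[t]
                acc[col] = acc.get(col, 0.0) + av * b_val[t]
        for off, col in enumerate(sorted(acc)):
            c_idx[c_ptr[i] + off] = col
            c_val[c_ptr[i] + off] = acc[col]
    return c_ptr, c_idx, c_val


# A = [[1, 2], [0, 3]], B = [[4, 0], [5, 6]] in CSR form
c = spgemm_csr([0, 2, 3], [0, 1, 1], [1.0, 2.0, 3.0],
               [0, 1, 3], [0, 0, 1], [4.0, 5.0, 6.0])
print(c)  # ([0, 2, 4], [0, 1, 0, 1], [14.0, 12.0, 15.0, 18.0])
```

Each output row is independent of the others, which is what makes this formulation attractive for row-parallel GPU execution.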

chapter

Parallel Space-Time Kernel Density Estimation

Erik Saule, Dinesh Panchananam, Alexander Hohl, Wenwu Tang, et al.

2017 46th International Conference on Parallel Processing (ICPP), pp. 483-492

The exponential growth of available data has increased the need for interactive exploratory analysis. Datasets can no longer be understood through manual crawling and simple statistics. In Geographical Information Systems (GIS), a dataset is often composed of events localized in space and time, and visualizing such a dataset involves building a map of where the events occurred. We focus in this paper...

chapter

An efficient FPGA-Based architecture for convolutional neural networks

Wen-Jyi Hwang, Yun-Jie Jhang, Tsung-Ming Tai

2017 40th International Conference on Telecommunications and Signal Processing (TSP), pp. 582-588

The goal of this paper is to implement an efficient FPGA-based hardware architecture for the design of fast artificial vision systems. The proposed architecture can perform the classification operations of a Convolutional Neural Network (CNN) in real time. To show the effectiveness of the architecture, some design examples such as hand posture recognition, character recognition, and face...

chapter

OpenCL-based design pattern for line rate packet processing

Jehandad Khan, Peter Athanas, Skip Booth, John Marshall

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP), pp. 190-194

The ever-changing nature of network technology requires a flexible platform that can evolve along with the technology. In this work, a complete network switch designed in OpenCL is presented, and several high-level constructs that form the building blocks of any network application targeting FPGAs are identified. These include the notion of an on-chip global memory and kernels constantly processing data...

chapter

GPU-based Gray-Level Co-occurrence Matrix for Extracting Features from Magnetic Resonance Images

Hsin-Yi Tsai, Zhang Hanyu, Che-Lun Hung, Hsian-Min Chen

2017 14th International Symposium on Pervasive Systems, Algorithms and Networks & 2017 11th International Conference on Frontier of Computer Science and Technology & 2017 Third International Symposium of Creative Computing (ISPAN-FCST-ISCC), pp. 391-396

With the continuously increasing power of computation, especially in parallel computing, computer-based texture analysis, computer-assisted classification, automated pathology detection, and similar methods are more and more commonly performed on medical images, such as X-ray and magnetic resonance (MR) images, for clinical or scientific purposes. These procedures almost always include a stage of...
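
For context on the title: a gray-level co-occurrence matrix counts how often gray level i appears together with gray level j at a fixed pixel offset, and texture features such as Haralick contrast are then computed from it. The sequential Python sketch below assumes a small integer-valued image and a single offset `(dx, dy)`; the remark that a GPU version would increment counts atomically is an assumption, not a description of the paper's implementation. `glcm` and `contrast` are hypothetical names.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count pairs (image[r][c], image[r+dy][c+dx]) of gray levels.

    m[i][j] is the number of pixel pairs at offset (dx, dy) where the first
    pixel has level i and the second has level j.
    """
    rows, cols = len(image), len(image[0])
    m = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                # A GPU version would typically do this increment atomically.
                m[image[r][c]][image[r2][c2]] += 1
    return m


def contrast(m):
    """Haralick contrast: sum of p(i, j) * (i - j)^2 over the normalized matrix."""
    total = sum(sum(row) for row in m)
    n = len(m)
    return sum(m[i][j] * (i - j) ** 2 for i in range(n) for j in range(n)) / total


img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 2, 2, 2],
       [2, 2, 3, 3]]
m = glcm(img, levels=4)  # horizontal neighbor to the right
print(m)  # [[2, 2, 1, 0], [0, 2, 0, 0], [0, 0, 3, 1], [0, 0, 0, 1]]
```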

chapter

FPGA acceleration of hyperspectral image processing for high-speed detection applications

Simon Vellas, George Lentaris, Konstantinos Maragos, Dimitrios Soudris, et al.

2017 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-4

Recent advances in photonics and imaging technology allow the development of cutting-edge, lightweight hyperspectral sensors, both push-broom/line-scanning and snapshot/frame. At the same time, emerging applications in robotics, food inspection, medicine and earth observation are posing critical challenges on real-time processing and computational efficiency, both in terms of accuracy and power consumption...

chapter

PACENet: Energy efficient acceleration for convolutional network on embedded platform

Adwaya Kulkarni, Tahmid Abtahi, Colin Shea, Amey Kulkarni, et al.

2017 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-4

Lightweight convolutional neural networks (CNNs) on tiny embedded platforms can offer an energy-efficient solution for today's IoT devices. However, CNN implementation on embedded systems faces a processing bottleneck in convolutional layers and memory storage issues in fully connected layers. In past years, heterogeneous acceleration, where compute-intensive tasks are performed on kernel-specific cores,...

Keywords:
KERNEL
PARALLEL PROCESSING

Content availability

Available (468)
None (5)

Keywords

INSTRUCTION SETS (149)
GRAPHICS PROCESSING UNITS (132)
GRAPHICS PROCESSING UNIT (98)
COMPUTER ARCHITECTURE (92)
HARDWARE (89)
GPU (82)
COMPUTATIONAL MODELING (73)
CUDA (58)
FIELD PROGRAMMABLE GATE ARRAYS (58)
PROGRAMMING (56)
OPTIMIZATION (53)
COPROCESSORS (50)
ARRAYS (46)
ALGORITHM DESIGN AND ANALYSIS (44)
PROGRAM PROCESSORS (42)
COMPUTER GRAPHIC EQUIPMENT (38)
MEMORY MANAGEMENT (38)
PERFORMANCE EVALUATION (35)
GPGPU (34)
ACCELERATION (33)
MULTIPROCESSING SYSTEMS (32)
BENCHMARK TESTING (31)
REGISTERS (30)
YARN (29)
OPENCL (28)
RUNTIME (26)
PARALLEL PROGRAMMING (24)
BANDWIDTH (23)
FPGA (23)
SYNCHRONIZATION (22)
COMPUTER GRAPHICS (21)
DATA MINING (21)
MULTICORE PROCESSING (21)
PARALLEL COMPUTING (21)
CENTRAL PROCESSING UNIT (18)
LIBRARIES (18)
MICROPROCESSOR CHIPS (18)
PIXEL (18)
THROUGHPUT (18)
IMAGE PROCESSING (17)
PIPELINES (17)
TRAINING (17)
PARALLEL ARCHITECTURES (16)
CONVOLUTION (15)
HEURISTIC ALGORITHMS (15)
COMPUTE UNIFIED DEVICE ARCHITECTURE (14)
SPARSE MATRICES (14)
LINUX (13)
SERVERS (13)
SUPPORT VECTOR MACHINES (13)
MULTI-THREADING (12)
RANDOM ACCESS MEMORY (12)
VECTORS (12)
CONTEXT (11)
DATA STRUCTURES (11)
DATABASES (11)
EMBEDDED SYSTEMS (11)
INDEXES (11)
RECONFIGURABLE ARCHITECTURES (11)
TILES (11)
ACCURACY (10)
COMPUTERS (10)
DECODING (10)
GRAPHIC PROCESSING UNIT (10)
MAGNETIC CORES (10)
MATHEMATICAL MODEL (10)
MESSAGE PASSING (10)
MESSAGE SYSTEMS (10)
PARALLEL ALGORITHMS (10)
RESOURCE MANAGEMENT (10)
APPLICATION PROGRAM INTERFACES (9)
DIGITAL SIGNAL PROCESSING (9)
HIGH PERFORMANCE COMPUTING (9)
MICROPROCESSORS (9)
OPENMP (9)
RESOURCE ALLOCATION (9)
SCHEDULING (9)
CPU (8)
ENCODING (8)
FEATURE EXTRACTION (8)
GPU COMPUTING (8)
MULTI-CORE (8)
OPTIMISATION (8)
PARALLEL (8)
PROCESSOR SCHEDULING (8)
REAL-TIME SYSTEMS (8)
SCHEDULES (8)
ANALYTICAL MODELS (7)
BIOINFORMATICS (7)
CLOCKS (7)
GRAPHICS (7)
IMAGE COLOR ANALYSIS (7)
JACOBIAN MATRICES (7)
LINEAR ALGEBRA (7)
MATRIX MULTIPLICATION (7)
SCALABILITY (7)
SIMD (7)
SOFTWARE (7)

INFONA - science communication portal
