Search results

Items from 1 to 20 out of 262 results

chapter

vPHI: Enabling Xeon Phi Capabilities in Virtual Machines

Stefanos Gerangelos, Nectarios Koziris

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 1333 - 1340

Heterogeneous processing has gained popularity in the high performance computing (HPC) area lately and it appears to have a great potential for future data centers. In this regard, accelerators, such as GPUs and Intel Xeon Phi, have already started to play a significant role in HPC systems offering a high degree of parallelism to application developers. Furthermore, hardware virtualization is gaining...

chapter

The implementation of edge detection on HSA environment

Sethakarn Prongnuch, Theerayod Wiangtong

2017 International Electrical Engineering Congress (iEECON) > 1 - 4

This paper presents the implementation of image edge detection on Heterogeneous System Architecture (HSA). HSA, which includes an ARM processor, a coprocessor and an FPGA, is compared with an x64 CPU in terms of performance and power consumption. The experimental results show that although the best execution time is from the x64 CPU, HSA has 50 times higher energy efficiency. Also, HSA can exploit coprocessors and...

chapter

A study on the method of the remote IPC based on xeon-phi hardware platform

Jeong-Hwan Lee, Seung-Jun Cha, Seung-Hyub Jeon, Sungin Jung

2016 International Conference on Information and Communication Technology Convergence (ICTC) > 601 - 603

We designed and implemented a Remote Inter-Processor Communication architecture software on Xeon Phi coprocessors and made a testbed to verify it. Also, we implemented a lightweight kernel and RIPC transmission/receiver application threads on the lightweight kernel running on Xeon Phi coprocessors. This paper proposes RIPC methods to communicate between user threads in separate Xeon Phi nodes using...

chapter

Implementing Hilbert transform for Digital Signal Processing on epiphany many-core coprocessor

Kyle L. Labowski, Patrick W. Jungwirth, James A. Ross, David A. Richie

2016 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 6

The Adapteva Epiphany MIMD architecture is a scalable 2D array of RISC cores with a fast network-on-chip (NoC) for parallel processing. The work presented here discusses the suitability of the architecture to handle software defined radio (SDR) applications such as Finite Impulse Response (FIR) filters. This paper discusses implementation of the Hilbert filter through using the COPRTHR 2.0 SDK which...

chapter

Generation of the Single Precision BLAS Library for the Parallella Platform, with Epiphany Co-processor Acceleration, Using the BLIS Framework

Miguel Tasende

2016 IEEE 14th Intl Conf on Dependable, Autonomic and Secure Computing, 14th Intl Conf on Pervasive Intelligence and Computing, 2nd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress(DASC/PiCom/DataCom/CyberSciTech) > 894 - 897

The Parallella is a hybrid computing platform that came into existence as the result of a Kickstarter project by Adapteva. It is composed of the high performance, energy-efficient, manycore architecture, Epiphany chip (used as co-processor) and one Zynq-7000 series chip, which normally runs a regular Linux OS version, serves as the main processor, and implements "glue logic" in its internal...

chapter

GPU-based nonlocal filtering for large scale SAR processing

Gerald Baier, Xiao Xiang Zhu

2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS) > 7608 - 7611

IGARSS 2016 - 2016 IEEE International Geoscience and Remote Sensing Symposium

In the past few years nonlocal filters have emerged as a serious contender for denoising synthetic aperture radar (SAR) images, offering superior noise reduction and detail preservation compared to many other filters. In this manuscript we analyze how nonlocal filters, whose computational costs were so far prohibitive for large scale processing, can be implemented efficiently on graphics processing...

chapter

Evaluating the Performance Impact of Multiple Streams on the MIC-Based Heterogeneous Platform

Zhaokui Li, Jianbin Fang, Tao Tang, Xuhao Chen, more

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 1341 - 1350

Using multiple streams can improve the overall system performance by mitigating the data transfer overhead on heterogeneous systems. Prior work focuses a lot on GPUs but little is known about the performance impact on (Intel Xeon) Phi. In this work, we apply multiple streams into six real-world applications on Phi. We then systematically evaluate the performance benefits of using multiple streams...

chapter

Exploiting Very-Wide Vectors on Intel Xeon Phi with Lattice-QCD Kernels

Andreas Diavastos, Giannos Stylianou, Giannis Koutsou

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) > 296 - 300

Our target in this work is to study ways of exploring the parallelism offered by vectorization on accelerators with very wide vector units. To this end, we implemented two kernels that derive from the Wilson Dslash operator and investigate several data layout techniques for increasing the scalability of lattice QCD scientific kernels suitable for the Intel Xeon Phi. In parts of the application where...

chapter

Reconfigurable coprocessors synthesis in the MPEG-RVC domain

Carlo Sau, Luca Fanni, Paolo Meloni, Luigi Raffo, more

2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig) > 1 - 8

Flexibility and high efficiency are common design drivers in the embedded systems domain. Coarse-grained reconfigurable coprocessors can tackle these issues, but they suffer from complex design, debugging and application mapping problems. In this paper, we propose an automated design flow that aids developers in designing and managing coarse-grained reconfigurable coprocessors. It provides both the hardware...

chapter

Exploring the Suitability of Remote GPGPU Virtualization for the OpenACC Programming Model Using rCUDA

Adrian Castello, Antonio J. Pena, Rafael Mayo, Pavan Balaji, more

2015 IEEE International Conference on Cluster Computing > 92 - 95

2015 IEEE International Conference on Cluster Computing (CLUSTER)

OpenACC is an application programming interface (API) that aims to unleash the power of heterogeneous systems composed of CPUs and accelerators such as graphic processing units (GPUs) or Intel Xeon Phi coprocessors. This directive-based programming model is intended to enable developers to accelerate their application's execution with much less effort. Coprocessors offer significant computing power...

chapter

Performance and productivity evaluation of hybrid-threading HLS versus HDLs

Gongyu Wang, Herman Lam, Alan George, Glen Edwards

2015 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 7

FPGA-based reconfigurable computing is finding its way into a wide range of application areas in which high performance and low power consumption are paramount. However, FPGA-application development using hardware-description languages (HDLs) faces many productivity challenges that limit its wide adoption, including a steep learning curve and lengthy compilation. High-level synthesis (HLS) languages...

chapter

Implementation of Short Read Alignment Algorithm in OpenCL on Xeon Phi Coprocessor

Xiquan Zhao, Chuang Liu, Guangming Tan

2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems > 1633 - 1636

2015 IEEE 17th International Conference on High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS) and 2015 IEEE 12th International Conf on Embedded Software and Systems (ICESS)

Aligning sequencing reads to a reference genome is often essential in many comparative genomics pipelines. With the maturation of next-generation DNA sequencing (NGS) technologies, an enormous amount of sequence data has been generated, which calls for the development of faster read alignment programs. In this paper we present an OpenCL implementation of the short read aligner BarraCUDA [1], which...

chapter

3D-stacked many-core architecture for biological sequence analysis problems

Pei Liu, Ahmed Hemani, Kolin Paul

2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS) > 211 - 220

Sequence analysis plays a critical role in bioinformatics, and most of its applications have compute-intensive kernels consuming over 70% of total execution time. By exploiting the compute-intensive execution stages of popular sequence analysis applications, we present and evaluate a VLSI architecture with a focus on those that target biological sequences directly, including pairwise alignment,...

chapter

Generic GNU/Linux reconfiguration platform proposal

Petr Cvek, Ondrej Novák

2015 IEEE International Workshop of Electronics, Control, Measurement, Signals and their Application to Mechatronics (ECMSM) > 1 - 6

2015 IEEE International Workshop of Electronics, Control, Measurement, Signals and their application to Mechatronics (ECMSM)

This article presents a design of a dynamically reconfigurable hybrid multiprocessor system on a chip (SoC), where individual reconfiguration partitions (RP) are time multiplexed by demands of a task. Scheduling the RPs is designed to be done by a modified Linux kernel. Design is partially implemented on the experimental platform, tested by multiple benchmarks and will be extended in the future.

chapter

Implementation of numerical methods for nanoscaled semiconductor device simulation using OpenCL

E. Coronado-Barrientos, A. Garcia-Loureiro, G. Indalecio, N. Seoane

2015 10th Spanish Conference on Electron Devices (CDE) > 1 - 4

The present work implements solvers with OpenCL of the FGMRES and preconditioned BCGSTAB algorithms. These solvers are integrated in a 3-D simulation tool of nanoscaled MOSFET transistors. Simulations are launched in two different platform devices: NVIDIA Tesla S2050 and Intel Xeon Phi 3120A. The resulting times of execution are compared against the optimized PSPARSLIB version of the FGMRES solver...

chapter

VOCL-FT: introducing techniques for efficient soft error coprocessor recovery

Antonio J. Peña, Wesley Bland, Pavan Balaji

SC15: International Conference for High Performance Computing, Networking, Storage and Analysis > 1 - 12

Popular accelerator programming models rely on offloading computation operations and their corresponding data transfers to the coprocessors, leveraging synchronization points where needed. In this paper we identify and explore how such a programming model enables optimization opportunities not utilized in traditional checkpoint/restart systems, and we analyze them as the building blocks for an efficient...

chapter

Performance and Portability with OpenCL for Throughput-Oriented HPC Workloads across Accelerators, Coprocessors, and Multicore Processors

Chongxiao Cao, Mark Gates, Azzam Haidar, Piotr Luszczek, more

2014 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems > 61 - 68

2014 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA)

Ever since accelerators and coprocessors became the mainstream hardware for throughput-oriented HPC workloads, various programming techniques have been proposed to increase productivity in terms of both the performance and ease-of-use. We evaluate these aspects of OpenCL on a number of hardware platforms for an important subset of dense linear algebra operations that are relevant to a wide range of...

chapter

Finite element numerical integration on Xeon Phi coprocessor

Filip Kruzel, Krzysztof Banas

2014 Federated Conference on Computer Science and Information Systems > 603 - 612

2014 Federated Conference on Computer Science and Information Systems (FedCSIS)

In the present article we describe the implementation of the finite element numerical integration algorithm for the Xeon Phi coprocessor. The coprocessor is an extension of the idea of the many-core specialized unit for calculations and, by assumption, its performance has to be competitive with the current families of GPUs. Its main advantage is the built-in set of 512-bit vector registers and the...

chapter

Automatic Tuning of a Parallel Pattern Library for Heterogeneous Systems with Intel Xeon Phi

Jiri Dokulil, Siegfried Benkner

2014 IEEE International Symposium on Parallel and Distributed Processing with Applications > 42 - 49

2014 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA)

Pattern libraries are important tools for high productivity application development. Their struggle for best performance is complicated by the fact that they are used to execute user-provided code, which is not known during their creation. This makes pattern libraries a good candidate for automatic software tuning. In this paper, we deal with automatic online parameter tuning of the HyPHI hybrid pattern...

Keywords:
KERNEL
COPROCESSORS

INFONA - science communication portal
