Magnitude-squared coherence (MSC) is an important method for calculating the connectivity between neural signals. It provides better spectral resolution than Welch's method and is often used in analyzing electroencephalogram (EEG) synchronization activity. The minimum variance distortionless response (MVDR) is a spectral estimation method based on matched-filterbank theory. The Cheriet-Belouchrani...
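As an illustration of the quantity involved, below is a minimal NumPy sketch of the standard Welch-averaged MSC estimate, MSC(f) = |Sxy(f)|² / (Sxx(f)·Syy(f)); the MVDR-based estimator the abstract describes replaces the windowed periodograms with matched-filterbank spectra, but the MSC ratio itself is the same. The function name and parameter choices here are illustrative, not from the paper.

```python
import numpy as np

def msc_welch(x, y, fs, nperseg=256):
    """Magnitude-squared coherence |Sxy|^2 / (Sxx * Syy) with
    Welch-style segment averaging (Hann window, 50% overlap)."""
    step = nperseg // 2
    win = np.hanning(nperseg)
    Sxx = np.zeros(nperseg // 2 + 1)
    Syy = np.zeros(nperseg // 2 + 1)
    Sxy = np.zeros(nperseg // 2 + 1, dtype=complex)
    for start in range(0, len(x) - nperseg + 1, step):
        X = np.fft.rfft(win * x[start:start + nperseg])
        Y = np.fft.rfft(win * y[start:start + nperseg])
        Sxx += np.abs(X) ** 2          # auto-spectra (constants cancel in the ratio)
        Syy += np.abs(Y) ** 2
        Sxy += X * np.conj(Y)          # cross-spectrum
    f = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return f, np.abs(Sxy) ** 2 / (Sxx * Syy)

# Two noisy channels sharing a 10 Hz component: MSC peaks near 1 at 10 Hz
# and stays low elsewhere, mimicking a pairwise EEG synchronization measure.
rng = np.random.default_rng(0)
fs, n = 256.0, 4096
t = np.arange(n) / fs
common = np.sin(2 * np.pi * 10.0 * t)
x = common + 0.5 * rng.standard_normal(n)
y = common + 0.5 * rng.standard_normal(n)
f, c = msc_welch(x, y, fs)
```

With `nperseg=256` at 256 Hz, each frequency bin is 1 Hz wide, so `c[10]` is the coherence at the shared 10 Hz component.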
The lack of support for explicit synchronization between streaming multiprocessors (SMs) in GPUs adversely impacts their ability to perform inter-block communication efficiently. In this paper, we present several approaches to inter-block synchronization using explicit/implicit CPU-based and dynamic parallelism (DP) mechanisms. Although this topic has been addressed in previous...
This paper studies two parallelization techniques for the implementation of an SPSO algorithm applied to optimizing electromagnetic field devices: GPGPU and Pthreads for multiprocessor architectures. The GPGPU and Pthreads implementations are compared in terms of solution quality and speedup. The electromagnetic optimization problems chosen for testing the efficiency of the parallelization techniques...
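For context, a minimal serial sketch of a standard PSO loop is below (parameter values and function names are illustrative assumptions, not the paper's). In the GPGPU and Pthreads versions compared by the paper, it is the per-particle fitness evaluations and velocity/position updates in this loop that get distributed across threads or GPU work-items.

```python
import numpy as np

def spso(f, dim=2, n_particles=30, iters=200, bounds=(-5.0, 5.0), seed=1):
    """Minimal standard PSO: inertia w, cognitive weight c1, social weight c2.
    Each iteration updates all particles toward their personal and global bests."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))          # positions
    v = np.zeros((n_particles, dim))                     # velocities
    pbest = x.copy()                                     # personal bests
    pbest_f = np.array([f(p) for p in x])
    g = pbest[np.argmin(pbest_f)].copy()                 # global best
    w, c1, c2 = 0.7, 1.5, 1.5
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        fx = np.array([f(p) for p in x])                 # fitness: the parallel hot spot
        better = fx < pbest_f
        pbest[better], pbest_f[better] = x[better], fx[better]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, float(pbest_f.min())

# Toy objective (sphere function) standing in for an electromagnetic cost function.
best, best_f = spso(lambda p: float(np.sum(p ** 2)))
```

The swarm converges close to the optimum at the origin; solution quality and speedup of the two parallel implementations are then compared against runs like this.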
In this paper we study the problem of estimating the unknown delay(s) in a system where we receive a linear combination of several delayed copies of a known transmitted waveform. This problem arises in many applications such as timing-based localization or wireless synchronization. Since accurate delay estimation requires wideband signals, traditional systems need high-speed analog-to-digital (A/D) converters, which poses...
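To make the estimation problem concrete, here is a sketch of the classic single-path baseline: correlate the received signal against the known waveform and take the lag of the correlation peak. This is the conventional matched-filter approach the paper's wideband setting builds on, not the paper's own (sub-Nyquist-oriented) method; the signal lengths and noise level are illustrative.

```python
import numpy as np

def estimate_delay(received, template):
    """Estimate the delay (in samples) of a known waveform by locating
    the peak of the cross-correlation with the received signal."""
    corr = np.correlate(received, template, mode="full")
    return int(np.argmax(corr)) - (len(template) - 1)

rng = np.random.default_rng(42)
template = rng.standard_normal(64)            # known transmitted waveform
true_delay = 37
received = 0.05 * rng.standard_normal(256)    # background noise
received[true_delay:true_delay + 64] += template  # one delayed copy
d = estimate_delay(received, template)
```

Resolving several closely spaced delayed copies from such correlations is exactly where the wideband (and hence high-rate sampling) requirement comes from.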
In recent years the power wall has prevented the continued scaling of single-core performance. This has led to the rise of dark silicon and motivated a move toward parallelism and specialization. As a result, energy-efficient high-throughput GPU cores are increasingly favored for accelerating data-parallel applications. However, the best way to efficiently communicate and synchronize across heterogeneous...
Heterogeneous systems, which marry CPUs and GPUs in a range of configurations, are quickly becoming the design paradigm for today's platforms because of their impressive parallel processing capabilities. However, in many existing heterogeneous systems the GPU is treated only as an accelerator by the CPU, working as a slave to the CPU master. But recently we have started to see the introduction...
Nowadays, many industrial synchronization systems rely on the Precision Time Protocol (PTP, IEEE 1588), which provides sub-microsecond time transfer. However, some applications, such as the next generation of telecommunication systems (LTE-A & 5G) or scientific infrastructures, have stricter timing requirements and must guarantee the timing service regardless of traffic load conditions...
In this paper, we study the design and implementation of a reconfigurable architecture for graph processing algorithms. The architecture uses a message-passing model targeting shared-memory multi-FPGA platforms. We take advantage of our architecture to showcase a parallel implementation of the all-pairs shortest path algorithm (APSP) for unweighted directed graphs. Our APSP implementation adopts a...
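For reference, the unweighted-APSP computation the architecture parallelizes can be sketched as one breadth-first search per source vertex; each BFS is independent of the others, which is what makes the problem a natural fit for distribution across processing elements. This is a generic CPU-side sketch, not the paper's message-passing FPGA implementation.

```python
from collections import deque

def apsp_unweighted(adj):
    """All-pairs shortest paths for an unweighted directed graph,
    computed as one BFS per source vertex. adj[u] lists successors of u.
    Distances of -1 mark unreachable vertices."""
    n = len(adj)
    dist = [[-1] * n for _ in range(n)]
    for s in range(n):                 # each source's BFS is independent
        dist[s][s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if dist[s][w] == -1:   # first visit = shortest hop count
                    dist[s][w] = dist[s][u] + 1
                    q.append(w)
    return dist

# Directed 4-cycle: 0 -> 1 -> 2 -> 3 -> 0
adj = [[1], [2], [3], [0]]
dist = apsp_unweighted(adj)
```

On the 4-cycle, the distance from 0 to 3 is 3 hops while 3 reaches 0 in 1 hop, matching the directed structure.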
Over the past decade, considerable attention has been devoted to the problem of emergence of synchronization patterns in a network of coupled oscillators, which can be observed in a variety of disciplines, from the biological to the engineering fields. In this context, the Kuramoto model is a classical model for describing synchronization phenomena that arise in large-scale systems that exploit local...
Floating-point additions in concurrent execution environments are known to be hazardous, as the result depends on the order in which the operations are performed. This problem is encountered in data-parallel execution environments such as GPUs, where reproducibility involving floating-point atomic addition is challenging. The problem is due to the rounding error or cancellation that can appear at each operation,...
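The non-associativity at the heart of this problem is easy to demonstrate on the CPU: regrouping the same three addends changes the result, which is exactly what happens when a GPU schedules atomic adds in a nondeterministic order. As one illustration of an order-independent remedy (a CPU-side analogue, not the paper's GPU technique), Python's `math.fsum` tracks exact partial sums and returns the correctly rounded total in every order.

```python
import math

# Rounding makes float addition non-associative: the grouping (and hence
# the nondeterministic scheduling of atomic adds) changes the result.
a, b, c = 1e16, 1.0, -1e16
left = (a + b) + c     # 1e16 + 1.0 rounds back to 1e16, so the sum is 0.0
right = (a + c) + b    # cancellation happens first, so the sum is 1.0

# Exact accumulation is order-independent: fsum keeps exact partial sums
# and rounds only once, so any permutation yields the same result.
s1 = math.fsum([a, b, c])
s2 = math.fsum([c, a, b])
```

`left` and `right` differ by exactly the unit that rounding discarded, while `s1` and `s2` agree.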
A microgrid is a cluster of electricity generators, energy storage systems, and loads that can operate either connected to, or disconnected from, the distribution grid. In this paper the time-related characteristics of Cyber-Physical Systems (CPSs) to be used for (IEC 61850-based) automation of microgrids are investigated. Specific constraints are taken into account, for instance: the use of heterogeneous...
In the field of many-core parallel computing, how to optimally allocate and schedule computing core resources according to the characteristics of parallel applications is a typical and fundamental problem that directly affects computing performance. After analyzing the features and mechanisms of the Kepler CUDA architecture, three heterogeneous streaming parallel computing modes and their corresponding constraints,...
Linear algebra kernels play an important role in many petroleum reservoir simulators extensively used by the industry. The growth in problem size, especially in pre-salt exploration, has increased the execution time of those kernels, thus requiring parallel programming to improve performance and make the simulation viable. On the other hand, exploiting parallelism in systems with an ever-increasing...
Nekbone is a proxy application of Nek5000, a scalable Computational Fluid Dynamics (CFD) code used for modelling incompressible flows. The Nekbone mini-application is used by several international co-design centers to explore new concepts in computer science and to evaluate their performance. We present the design and implementation of a new communication kernel in the Nekbone mini-application with...
Migration to multicore is inevitable. To harness the potential of this technology, embedded system designers need operating systems (OSes) with built-in support for multicore hardware. When designed to meet real-time requirements, multicore SMP (Symmetric Multiprocessing) OSes not only face the inherent problem of concurrent access to shared kernel resources, but also suffer...
Using multiple accelerators, such as GPUs or Xeon Phis, is attractive for improving the performance of large data-parallel applications and for increasing the size of their workloads. However, writing an application for multiple accelerators remains challenging today, because going from a single accelerator to multiple ones requires dealing with potentially non-uniform domain decomposition, inter-accelerator...
The paper considers the challenge of deductively verifying Linux kernel code written in the C programming language with extensive use of low-level memory operations and interactions with a highly concurrent environment. The paper presents an initial approach to the specification and verification of concurrent code working with shared data by proving the code's compliance with a specified synchronization discipline...
In this paper, we introduce a GPU-accelerated solver for systems of linear equations with infinite precision. Infinite precision means that the system can provide an exact solution without any rounding error. Such errors usually come from the limited precision of floating-point values in their native computer representation. In a simplified description, the system uses...
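To illustrate what "no rounding error" means in practice, here is a CPU-side sketch of exact linear solving using Python's built-in rational type: every elimination step is exact rational arithmetic, so the returned solution is exact rather than a floating-point approximation. This shows the idea only; the paper's GPU solver presumably uses its own exact representation, and the function name and example system below are illustrative.

```python
from fractions import Fraction

def solve_exact(A, b):
    """Gauss-Jordan elimination over Python's Fraction type: every
    arithmetic step is exact, so no rounding error is ever introduced."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(y)]
         for row, y in zip(A, b)]                  # augmented matrix
    for col in range(n):
        # exact arithmetic needs no pivoting for accuracy, only a nonzero pivot
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Exact solution of [[1, 2], [3, 4]] x = [5, 6]: x = (-4, 9/2), no rounding.
x = solve_exact([[1, 2], [3, 4]], [5, 6])
```

A floating-point solver would return 4.5 only up to representation error in intermediate steps; here the component is the exact rational 9/2.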
The number of cores in embedded systems is continuously growing, supporting increasingly complex concurrent applications. In order to verify that systems comply with specification requirements during the design process, fast simulation and performance analysis tools are required. These simulation frameworks typically use virtualization or host-compiled simulation techniques. On the one hand, current host...
In this paper, we present a compilation flow for HPC kernels on the REDEFINE coarse-grain reconfigurable architecture (CGRA). REDEFINE is a scalable macro-dataflow machine in which the compute elements (CEs) communicate through messages. REDEFINE offers the ability to exploit a high degree of coarse-grain and pipeline parallelism. The CEs in REDEFINE are enhanced with reconfigurable macro data-paths...