Magnitude-squared coherence (MSC) is an important method for calculating the connectivity between neural signals. It provides better spectral resolution than Welch's method and is often used in analyzing electroencephalogram (EEG) synchronization activity. The minimum variance distortionless response (MVDR) is a spectral estimation method based on matched-filterbank theory. The Cheriet-Belouchrani...
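As an illustration of the quantity involved, below is a minimal NumPy sketch of the standard Welch-averaged MSC estimate, MSC(f) = |Sxy(f)|² / (Sxx(f)·Syy(f)); the MVDR-based estimator the abstract describes replaces the windowed periodograms with matched-filterbank spectra, but the MSC ratio itself is the same. The function name and parameter choices here are illustrative, not from the paper.

```python
import numpy as np

def msc_welch(x, y, fs, nperseg=256):
    """Magnitude-squared coherence |Sxy|^2 / (Sxx * Syy) with
    Welch-style segment averaging (Hann window, 50% overlap)."""
    step = nperseg // 2
    win = np.hanning(nperseg)
    Sxx = np.zeros(nperseg // 2 + 1)
    Syy = np.zeros(nperseg // 2 + 1)
    Sxy = np.zeros(nperseg // 2 + 1, dtype=complex)
    for start in range(0, len(x) - nperseg + 1, step):
        X = np.fft.rfft(win * x[start:start + nperseg])
        Y = np.fft.rfft(win * y[start:start + nperseg])
        Sxx += np.abs(X) ** 2          # auto-spectra (constants cancel in the ratio)
        Syy += np.abs(Y) ** 2
        Sxy += X * np.conj(Y)          # cross-spectrum
    f = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return f, np.abs(Sxy) ** 2 / (Sxx * Syy)

# Two noisy channels sharing a 10 Hz component: MSC peaks near 1 at 10 Hz
# and stays low elsewhere, mimicking a pairwise EEG synchronization measure.
rng = np.random.default_rng(0)
fs, n = 256.0, 4096
t = np.arange(n) / fs
common = np.sin(2 * np.pi * 10.0 * t)
x = common + 0.5 * rng.standard_normal(n)
y = common + 0.5 * rng.standard_normal(n)
f, c = msc_welch(x, y, fs)
```

With `nperseg=256` at 256 Hz, each frequency bin is 1 Hz wide, so `c[10]` is the coherence at the shared 10 Hz component.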
The lack of support for explicit synchronization between streaming multiprocessors (SMs) in GPUs adversely impacts their ability to perform inter-block communication efficiently. In this paper, we present several approaches to inter-block synchronization using explicit/implicit CPU-based and dynamic parallelism (DP) mechanisms. Although this topic has been addressed in previous...
This paper studies two parallelization techniques for the implementation of an SPSO algorithm applied to optimizing electromagnetic field devices: GPGPU and Pthreads for multiprocessor architectures. The GPGPU and Pthreads implementations are compared in terms of solution quality and speedup. The electromagnetic optimization problems chosen for testing the efficiency of the parallelization techniques...
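For context, a minimal serial sketch of a standard PSO loop is below (parameter values and function names are illustrative assumptions, not the paper's). In the GPGPU and Pthreads versions compared by the paper, it is the per-particle fitness evaluations and velocity/position updates in this loop that get distributed across threads or GPU work-items.

```python
import numpy as np

def spso(f, dim=2, n_particles=30, iters=200, bounds=(-5.0, 5.0), seed=1):
    """Minimal standard PSO: inertia w, cognitive weight c1, social weight c2.
    Each iteration updates all particles toward their personal and global bests."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))          # positions
    v = np.zeros((n_particles, dim))                     # velocities
    pbest = x.copy()                                     # personal bests
    pbest_f = np.array([f(p) for p in x])
    g = pbest[np.argmin(pbest_f)].copy()                 # global best
    w, c1, c2 = 0.7, 1.5, 1.5
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        fx = np.array([f(p) for p in x])                 # fitness: the parallel hot spot
        better = fx < pbest_f
        pbest[better], pbest_f[better] = x[better], fx[better]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, float(pbest_f.min())

# Toy objective (sphere function) standing in for an electromagnetic cost function.
best, best_f = spso(lambda p: float(np.sum(p ** 2)))
```

The swarm converges close to the optimum at the origin; solution quality and speedup of the two parallel implementations are then compared against runs like this.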
In this paper we study the problem of estimating the unknown delay(s) in a system where we receive a linear combination of several delayed copies of a known transmitted waveform. This problem arises in many applications such as timing-based localization or wireless synchronization. Since accurate delay estimation requires wideband signals, traditional systems need high-speed analog-to-digital (A/D) converters, which poses...
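To make the estimation problem concrete, here is a sketch of the classic single-path baseline: correlate the received signal against the known waveform and take the lag of the correlation peak. This is the conventional matched-filter approach the paper's wideband setting builds on, not the paper's own (sub-Nyquist-oriented) method; the signal lengths and noise level are illustrative.

```python
import numpy as np

def estimate_delay(received, template):
    """Estimate the delay (in samples) of a known waveform by locating
    the peak of the cross-correlation with the received signal."""
    corr = np.correlate(received, template, mode="full")
    return int(np.argmax(corr)) - (len(template) - 1)

rng = np.random.default_rng(42)
template = rng.standard_normal(64)            # known transmitted waveform
true_delay = 37
received = 0.05 * rng.standard_normal(256)    # background noise
received[true_delay:true_delay + 64] += template  # one delayed copy
d = estimate_delay(received, template)
```

Resolving several closely spaced delayed copies from such correlations is exactly where the wideband (and hence high-rate sampling) requirement comes from.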
In recent years the power wall has prevented the continued scaling of single-core performance. This has led to the rise of dark silicon and motivated a move toward parallelism and specialization. As a result, energy-efficient high-throughput GPU cores are increasingly favored for accelerating data-parallel applications. However, the best way to efficiently communicate and synchronize across heterogeneous...
Heterogeneous systems, which marry CPUs and GPUs in a range of configurations, are quickly becoming the design paradigm for today's platforms because of their impressive parallel processing capabilities. However, in many existing heterogeneous systems the GPU is treated only as an accelerator by the CPU, working as a slave to the CPU master. But recently we have started to see the introduction...
Nowadays, many industrial synchronization systems rely on the Precision Time Protocol (PTP, IEEE 1588), which provides sub-microsecond time transfer. However, some applications, such as the next generation of telecommunication systems (LTE-A & 5G) or scientific infrastructures, have stricter timing requirements and must guarantee the timing service regardless of traffic load conditions...
In this paper, we study the design and implementation of a reconfigurable architecture for graph processing algorithms. The architecture uses a message-passing model targeting shared-memory multi-FPGA platforms. We take advantage of our architecture to showcase a parallel implementation of the all-pairs shortest path algorithm (APSP) for unweighted directed graphs. Our APSP implementation adopts a...
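For reference, the unweighted-APSP computation the architecture parallelizes can be sketched as one breadth-first search per source vertex; each BFS is independent of the others, which is what makes the problem a natural fit for distribution across processing elements. This is a generic CPU-side sketch, not the paper's message-passing FPGA implementation.

```python
from collections import deque

def apsp_unweighted(adj):
    """All-pairs shortest paths for an unweighted directed graph,
    computed as one BFS per source vertex. adj[u] lists successors of u.
    Distances of -1 mark unreachable vertices."""
    n = len(adj)
    dist = [[-1] * n for _ in range(n)]
    for s in range(n):                 # each source's BFS is independent
        dist[s][s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if dist[s][w] == -1:   # first visit = shortest hop count
                    dist[s][w] = dist[s][u] + 1
                    q.append(w)
    return dist

# Directed 4-cycle: 0 -> 1 -> 2 -> 3 -> 0
adj = [[1], [2], [3], [0]]
dist = apsp_unweighted(adj)
```

On the 4-cycle, the distance from 0 to 3 is 3 hops while 3 reaches 0 in 1 hop, matching the directed structure.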
Over the past decade, considerable attention has been devoted to the problem of emergence of synchronization patterns in a network of coupled oscillators, which can be observed in a variety of disciplines, from the biological to the engineering fields. In this context, the Kuramoto model is a classical model for describing synchronization phenomena that arise in large-scale systems that exploit local...
Floating-point additions in concurrent execution environments are known to be hazardous, as the result depends on the order in which the operations are performed. This problem is encountered in data-parallel execution environments such as GPUs, where reproducibility involving floating-point atomic addition is challenging. The problem is due to the rounding error or cancellation that can appear at each operation,...
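The non-associativity at the heart of this problem is easy to demonstrate on the CPU: regrouping the same three addends changes the result, which is exactly what happens when a GPU schedules atomic adds in a nondeterministic order. As one illustration of an order-independent remedy (a CPU-side analogue, not the paper's GPU technique), Python's `math.fsum` tracks exact partial sums and returns the correctly rounded total in every order.

```python
import math

# Rounding makes float addition non-associative: the grouping (and hence
# the nondeterministic scheduling of atomic adds) changes the result.
a, b, c = 1e16, 1.0, -1e16
left = (a + b) + c     # 1e16 + 1.0 rounds back to 1e16, so the sum is 0.0
right = (a + c) + b    # cancellation happens first, so the sum is 1.0

# Exact accumulation is order-independent: fsum keeps exact partial sums
# and rounds only once, so any permutation yields the same result.
s1 = math.fsum([a, b, c])
s2 = math.fsum([c, a, b])
```

`left` and `right` differ by exactly the unit that rounding discarded, while `s1` and `s2` agree.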
A microgrid is a cluster of electricity generators, energy storage systems, and loads that can operate either connected to, or disconnected from, the distribution grid. In this paper the time-related characteristics of Cyber-Physical Systems (CPSs) to be used for (IEC 61850-based) automation of microgrids are investigated. Specific constraints are taken into account, for instance: the use of heterogeneous...
In the field of many-core parallel computing, how to optimally allocate and schedule computing core resources according to the characteristics of parallel applications is a typical and fundamental problem that directly affects computing performance. After analyzing the features and mechanisms of the Kepler CUDA architecture, three heterogeneous streaming parallel computing modes and their corresponding constraints,...
Linear algebra kernels play an important role in many petroleum reservoir simulators extensively used by the industry. The growth in problem size, especially in pre-salt exploration, has increased the execution time of those kernels, thus requiring parallel programming to improve performance and make the simulation viable. On the other hand, exploiting parallelism in systems with an ever-increasing...
Nekbone is a proxy application of Nek5000, a scalable Computational Fluid Dynamics (CFD) code used for modelling incompressible flows. The Nekbone mini-application is used by several international co-design centers to explore new concepts in computer science and to evaluate their performance. We present the design and implementation of a new communication kernel in the Nekbone mini-application with...
Migration to multicore is inevitable. To harness the potential of this technology, embedded system designers need operating systems (OSes) with built-in support for multicore hardware. When designed to meet real-time requirements, multicore SMP (Symmetric Multiprocessing) OSes not only face the inherent problem of concurrent access to shared kernel resources, but also suffer...
Using multiple accelerators, such as GPUs or Xeon Phis, is attractive for improving the performance of large data-parallel applications and for increasing the size of their workloads. However, writing an application for multiple accelerators remains challenging today, because going from a single accelerator to multiple ones requires dealing with potentially non-uniform domain decomposition, inter-accelerator...
The paper considers the challenge of deductively verifying Linux kernel code written in the C programming language with extensive use of low-level memory operations and interactions with a highly concurrent environment. The paper presents an initial approach to the specification and verification of concurrent code working with shared data by proving the code's compliance with a specified synchronization discipline...
In this paper, we introduce a GPU-accelerated solver for systems of linear equations with infinite precision. Infinite precision means that the system can provide an exact solution without any rounding error. Such errors usually come from the limited precision of floating-point values in their native computer representation. In a simplified description, the system uses...
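To illustrate what "no rounding error" means in practice, here is a CPU-side sketch of exact linear solving using Python's built-in rational type: every elimination step is exact rational arithmetic, so the returned solution is exact rather than a floating-point approximation. This shows the idea only; the paper's GPU solver presumably uses its own exact representation, and the function name and example system below are illustrative.

```python
from fractions import Fraction

def solve_exact(A, b):
    """Gauss-Jordan elimination over Python's Fraction type: every
    arithmetic step is exact, so no rounding error is ever introduced."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(y)]
         for row, y in zip(A, b)]                  # augmented matrix
    for col in range(n):
        # exact arithmetic needs no pivoting for accuracy, only a nonzero pivot
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Exact solution of [[1, 2], [3, 4]] x = [5, 6]: x = (-4, 9/2), no rounding.
x = solve_exact([[1, 2], [3, 4]], [5, 6])
```

A floating-point solver would return 4.5 only up to representation error in intermediate steps; here the component is the exact rational 9/2.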
The number of cores in embedded systems is continuously growing, supporting increasingly complex concurrent applications. In order to verify that systems comply with specification requirements during the design process, fast simulation and performance analysis tools are required. These simulation frameworks typically use virtualization or host-compiled simulation techniques. On the one hand, current host...
In this paper, we present a compilation flow for HPC kernels on the REDEFINE coarse-grain reconfigurable architecture (CGRA). REDEFINE is a scalable macro-dataflow machine in which the compute elements (CEs) communicate through messages. REDEFINE offers the ability to exploit a high degree of coarse-grain and pipeline parallelism. The CEs in REDEFINE are enhanced with reconfigurable macro data-paths...