The Kalray MPPA-256 processor integrates 256 user cores and 32 system cores on a chip with 28nm CMOS technology. Each core implements a 32-bit 5-issue VLIW architecture. These cores are distributed across 16 compute clusters of 16+1 cores, and 4 quad-core I/O subsystems. Each compute cluster and I/O subsystem owns a private address space, while communication and synchronization between them is ensured...
In this paper, we propose built-in functions for the parallel programming model in SMYLE OpenCL that extend the original OpenCL semantics, given our system's own limitations and interpretation for embedded many-core architectures. On an FPGA-based platform for evaluating the embedded many-core architecture SMYLEref, the data-parallel and task-parallel programming models supported by the OpenCL execution model are...
To design and develop any auto-tuning mechanism for OpenACC, it is important to clarify the differences between conventional GPU programming models and OpenACC in terms of the available programming and tuning techniques, called performance tunabilities. This paper therefore discusses the performance tunabilities of OpenACC and OpenCL. As OpenACC cannot synchronize threads running on GPUs, some important...
Due to the growing complexity of embedded systems, simulation is becoming an increasingly time-consuming task. Detailed simulation of so-called Multi-Processor Systems-on-Chip (MPSoCs) in particular suffers from extremely long runtimes, which makes verification and debugging extraordinarily expensive. In this work, a SystemC/TLM-based methodology for accelerating simulation of NoC-based MPSoCs is presented...
In this paper, we take a new approach to thinking about programming Wireless Sensor Networks (WSNs) and introduce OSone, a distributed operating system (OS) for sensor transparency. Our philosophy is to make the network look like an ordinary computer, where each sensor of the network can be thought of as one or multiple applications. Such a system allows software developers to abstract away from networking...
More and more devices in the mobile terminal market run the Android operating system. To meet as many needs of users and device manufacturers as possible, Android adopts a complicated architecture, which leads to long boot times. The average boot time for Android devices on the market is about 50 seconds. This paper introduces a method to improve...
Irregular algorithms are algorithms whose main data structures are complex, such as directed and undirected graphs, trees, etc. A useful abstraction for many irregular algorithms is their operator formulation, in which the algorithm is viewed as the iterated application of an operator to certain nodes, called active nodes, in the graph. Each operator application, called an activity, usually touches only a...
The simplicity of concurrent programming with Transactional Memory (TM) and its recent implementation in mainstream processors greatly motivates researchers and industry to investigate this field and propose new implementations and optimizations. However, there is still no standard C system library which a wide range of TM developers can adopt. TM application developers have been forced to avoid library...
Reconfigurable devices are often employed in heterogeneous systems due to their low power and parallel processing advantages. An important usability requirement is the support of a homogeneous programming interface. Nevertheless, homogeneous programming interfaces do not eliminate the need for code tweaking to enable efficient mapping of the computation across heterogeneous architectures. In this...
Open source software has been developed for several decades, and Linux has gradually become one of the major operating systems. This raises the question of whether Windows applications can be migrated to Linux. However, there are great differences between the implementation mechanisms of Windows and Linux. In this research, we try to build a middle layer between the application and the operating system...
EGNOS is the European SBAS currently providing GPS Safety of Life augmentation over Europe. In parallel with ongoing EGNOS operations, GNSS constellations and signals are evolving (GLONASS, GPS, GALILEO, COMPASS, …). This evolving context has led ESA to launch the so-called “European GNSS Evolution Program” (EGEP) in order to explore the various evolution perspectives in more depth and evaluate possible...
This paper presents the development of a PC-based multi-function recorder using an open-source real-time application interface (RTAI) in a Linux environment. Here, various quantities such as three-phase real and reactive power (including the sign), power factor, RMS values of the voltages and currents, and frequency are estimated from the instantaneous samples of the 3-phase voltages and currents. Such...
The advent of multi-core machines has led to the need to revise the architecture of modern simulation platforms. One recent proposal of ours explored the viability of load-sharing for optimistic simulators run on top of these types of machines. In this article, we provide an extensive experimental study to assess the effects of a load-sharing architecture on run-time dynamics...
System virtualization enables multiple isolated running environments to be safely consolidated on a physical server, achieving better physical resource utilization and power savings. Virtual machines have become an essential component of most cloud/data-center system software stacks. However, virtualization has a negative impact on synchronization in the guest operating system (guest OS) and thus...
In a multi-CPU Virtual Machine (VM), virtual CPUs (VCPUs) are not guaranteed to be scheduled simultaneously. Operating System (OS) constructs such as busy-waits (mainly spin locks and TLB shoot-down) are written with the assumption of running on bare metal and waste a lot of CPU time, resulting in performance degradation. For example, suppose a spin-lock-holding VCPU is preempted (aka LHP) by the host scheduler,...
Data-parallel languages feature fine-grained parallel primitives that can be supported by compilers targeting modern many-core architectures where data parallelism must be exploited to fully utilize the hardware. Previous research has focused on converting data-parallel languages for SIMD (single instruction multiple data) architectures. However, directly applying them to today's SIMT (single instruction...
Bilateral filtering is a ubiquitous tool for several kinds of image processing applications. This work explores multicore and many-core accelerations for the embarrassingly parallel yet compute-intensive bilateral filtering kernel. For many-core architectures, we have created a novel pair-symmetric algorithm to avoid redundant calculations. For multicore architectures, we improve the algorithm by...
In recent years, improving energy efficiency and computational performance has become a major issue as smartphones and tablets have become popular. To achieve high performance, multi-core accelerators consisting of general-purpose processors and accelerators are often used. However, to use these multi-core accelerators efficiently, programmers have to consider synchronization and data transfer between...
Traditionally, the Logical Processes (LPs) forming a simulation model store their execution information in disjoint simulation states, forcing them to exchange events in order to communicate data with each other. In this work we propose the design and implementation of an extension to the traditional Time Warp (optimistic) synchronization protocol for parallel/distributed simulation, targeted at shared-memory/multicore...
This paper presents a method to map and implement the 1-D FFT on a GPGPU and extends the method to the 2-D FFT. Two approaches are used to maximize performance. One is to localize data inside the caches of the GPGPU, and the other is to assign threads and blocks properly to reach higher performance. The results show that our implementation is 3.62 times faster at performing a 32M-point 1-D FFT and 4...