Search results

chapter

A clustered manycore processor architecture for embedded and accelerated applications

Benoit Dupont de Dinechin, Renaud Ayrignac, Pierre-Edouard Beaucamps, Patrice Couvert, et al.

2013 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 6

2013 IEEE High Performance Extreme Computing Conference (HPEC)

The Kalray MPPA-256 processor integrates 256 user cores and 32 system cores on a chip with 28nm CMOS technology. Each core implements a 32-bit 5-issue VLIW architecture. These cores are distributed across 16 compute clusters of 16+1 cores, and 4 quad-core I/O subsystems. Each compute cluster and I/O subsystem owns a private address space, while communication and synchronization between them is ensured...

chapter

Design, Implementation and Evaluation of Built-in Functions on Parallel Programming Model in SMYLE OpenCL

Noriko Etani, Takuji Hieda, Hiroyuki Tomiyama

2013 IEEE 7th International Symposium on Embedded Multicore Socs > 113 - 118

2013 IEEE 7th International Symposium on Embedded Multicore Socs (MCSoC)

In this paper, we propose built-in functions on parallel programming model in SMYLE OpenCL to extend the original OpenCL semantics giving our system's original limitation and interpretation for embedded many-core architecture. On a platform using FPGA to evaluate embedded many-core architecture SMYLEref, data parallel and task parallel programming models supported by the OpenCL execution model are...

chapter

A Comparison of Performance Tunabilities between OpenCL and OpenACC

Makoto Sugawara, Shoichi Hirasawa, Kazuhiko Komatsu, Hiroyuki Takizawa, et al.

2013 IEEE 7th International Symposium on Embedded Multicore Socs > 147 - 152

2013 IEEE 7th International Symposium on Embedded Multicore Socs (MCSoC)

To design and develop any auto tuning mechanisms for OpenACC, it is important to clarify the differences between conventional GPU programming models and OpenACC in terms of available programming and tuning techniques, called performance tunabilities. This paper hence discusses the performance tunabilities of OpenACC and OpenCL. As OpenACC cannot synchronize threads running on GPUs, some important...

chapter

A SystemC modeling and simulation methodology for fast and accurate parallel MPSoC simulation

Christoph Roth, Harald Bucher, Simon Reder, Florian Buciuman, et al.

2013 26th Symposium on Integrated Circuits and Systems Design (SBCCI) > 1 - 6

2013 26th Symposium on Integrated Circuits and Systems Design (SBCCI)

Due to the growing complexity of embedded systems, simulation becomes an increasingly time-consuming task. Detailed simulation of so-called Multi-Processor Systems-on-Chip (MPSoCs) in particular is afflicted with extremely long runtimes and makes verification and debugging extraordinarily expensive. In this work, a SystemC/TLM based methodology for accelerating simulation of NoC-based MPSoCs is presented...

chapter

OSone: A distributed operating system for energy efficient Sensor Network

Bence Pasztor, Pan Hui

Proceedings of the 2013 25th International Teletraffic Congress (ITC) > 1 - 9

2013 25th International Teletraffic Congress (ITC 2013)

In this paper, we take a new approach to programming Wireless Sensor Networks (WSNs) and introduce OSone, a distributed operating system (OS) for sensor transparency. Our philosophy is to make the network look like an ordinary computer, where each sensor of the network can be thought of as one or multiple applications. Such a system allows software developers to abstract away from networking...

chapter

Fast boot and fast shutdown of Android on the embedded system

Zheng Wenxuan, Lei Hang, Yang Xia, Zhang Fangjie

2013 IEEE 11th International Conference on Electronic Measurement & Instruments > 2 > 1003 - 1008

2013 IEEE 11th International Conference on Electronic Measurement & Instruments (ICEMI)

More and more devices in the mobile terminal market employ Android as their operating system. To meet as many needs of users and device manufacturers as possible, Android adopts a complicated architecture, which leads to long boot times. The average boot time for Android devices on the market is about 50 seconds. This paper introduces a method to improve...

chapter

Data-Driven Versus Topology-driven Irregular Computations on GPUs

Rupesh Nasre, Martin Burtscher, Keshav Pingali

2013 IEEE 27th International Symposium on Parallel and Distributed Processing > 463 - 474

2013 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Irregular algorithms are algorithms with complex main data structures such as directed and undirected graphs, trees, etc. A useful abstraction for many irregular algorithms is the operator formulation, in which the algorithm is viewed as the iterated application of an operator to certain nodes, called active nodes, in the graph. Each operator application, called an activity, usually touches only a...

chapter

TM-dietlibc: A TM-aware Real-World System Library

Vesna Smiljkovic, Martin Nowack, Nebojsa Miletic, Timothy Harris, et al.

2013 IEEE 27th International Symposium on Parallel and Distributed Processing > 1266 - 1274

2013 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

The simplicity of concurrent programming with Transactional Memory (TM) and its recent implementation in mainstream processors greatly motivates researchers and industry to investigate this field and propose new implementations and optimizations. However, there is still no standard C system library which a wide range of TM developers can adopt. TM application developers have been forced to avoid library...

chapter

Throughput-oriented kernel porting onto FPGAs

Alexandros Papakonstantinou, Deming Chen, Wen-Mei Hwu, Jason Cong, et al.

2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 10

2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)

Reconfigurable devices are often employed in heterogeneous systems due to their low power and parallel processing advantages. An important usability requirement is the support of a homogeneous programming interface. Nevertheless, homogeneous programming interfaces do not eliminate the need for code tweaking to enable efficient mapping of the computation across heterogeneous architectures. In this...

chapter

An Approach of Windows Synchronization Mechanism Simulation on Linux

Rui Li, Nanjun Yang, Shilong Ma

2012 13th International Conference on Parallel and Distributed Computing, Applications and Technologies > 442 - 445

2012 13th International Conference on Parallel and Distributed Computing Applications and Technologies (PDCAT)

Open source software has been developed for several decades, and Linux has gradually become one of the major operating systems. This raises the issue of whether Windows applications can be migrated to Linux. However, there are great differences in the implementation mechanisms of Windows and Linux. In this research, we try to build a middle layer between the application and the operating system...

chapter

Support Platform for EGNOS Evolution and Demonstration (SPEED): Operational SBAS test bed as real as EGNOS

H. Delfour, J. A. Gicquel, P. Larhantec, D. Joly, et al.

2012 6th ESA Workshop on Satellite Navigation Technologies (Navitec 2012) & European Workshop on GNSS Signals and Signal Processing > 1 - 8

2012 6th ESA Workshop on Satellite Navigation Technologies (Navitec 2012) & European Workshop on GNSS Signals and Signal Processing

EGNOS is the European SBAS currently providing GPS Safety of Life augmentation over Europe. In parallel with on-going EGNOS operations, GNSS constellations and signals are evolving (GLONASS, GPS, GALILEO, COMPASS, …). This evolving context has led ESA to launch the so-called “European GNSS Evolution Program” (EGEP) in order to explore more in depth the various evolution perspectives and evaluate possible...

chapter

Development of a PC based multi-function recorder for the laboratory model power system on RTAI-Linux platform

M. Deepak, K. N. Shubhanga

2012 Annual IEEE India Conference (INDICON) > 759 - 764

2012 Annual IEEE India Conference (INDICON)

This paper presents the development of a PC based multi-function recorder using an open-source real-time application interface (RTAI) in Linux environment. Here, various quantities such as three-phase real and reactive power (including the sign), power factor, RMS value of the voltage and currents and frequency are estimated employing the instantaneous samples of 3-phase voltages and currents. Such...

chapter

Assessing load-sharing within optimistic simulation platforms

Roberto Vitali, Alessandro Pellegrini, Francesco Quaglia

Proceedings of the 2012 Winter Simulation Conference (WSC) > 1 - 13

2012 Winter Simulation Conference - (WSC 2012)

The advent of multi-core machines has led to the need for revising the architecture of modern simulation platforms. One recent proposal we made attempted to explore the viability of load-sharing for optimistic simulators run on top of these types of machines. In this article, we provide an extensive experimental study for an assessment of the effects on run-time dynamics by a load-sharing architecture...

chapter

Optimizing Xen Hypervisor by Using Lock-Aware Scheduling

Alin Zhong, Hai Jin, Song Wu, Xuanhua Shi, et al.

2012 Second International Conference on Cloud and Green Computing > 31 - 38

2012 International Conference on Cloud and Green Computing (CGC)

System virtualization enables multiple isolated running environments to be safely consolidated on a physical server, achieving better physical resource utilization and power saving. The virtual machine has become an essential component in most cloud/data-center system software stacks. However, virtualization has a negative impact on synchronization in the guest operating system (guest OS) and thus...

chapter

Paravirtualization for Scalable Kernel-Based Virtual Machine (KVM)

K. T. Raghavendra, Srivatsa Vaddagiri, Nikunj Dadhania, Jeremy Fitzhardinge

2012 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM) > 1 - 5

2012 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM)

In a multi-CPU Virtual Machine (VM), virtual CPUs (VCPUs) are not guaranteed to be scheduled simultaneously. Operating System (OS) constructs such as busy-waiting (mainly spin locks and TLB shoot-down), which are written with the assumption of running on bare metal, waste a lot of CPU time, resulting in performance degradation. For example, suppose a VCPU holding a spin lock is preempted (a.k.a. LHP) by the host scheduler,...

chapter

CuNesl: Compiling Nested Data-Parallel Languages for SIMT Architectures

Yongpeng Zhang, Frank Mueller

2012 41st International Conference on Parallel Processing > 340 - 349

2012 41st International Conference on Parallel Processing (ICPP)

Data-parallel languages feature fine-grained parallel primitives that can be supported by compilers targeting modern many-core architectures where data parallelism must be exploited to fully utilize the hardware. Previous research has focused on converting data-parallel languages for SIMD (single instruction multiple data) architectures. However, directly applying them to today's SIMT (single instruction...

chapter

Acceleration of Bilateral Filtering Algorithm for Manycore and Multicore Architectures

Dinesh Agarwal, Sami Wilf, Abinashi Dhungel, Sushil K. Prasad

2012 41st International Conference on Parallel Processing > 78 - 87

2012 41st International Conference on Parallel Processing (ICPP)

Bilateral filtering is a ubiquitous tool for several kinds of image processing applications. This work explores multicore and manycore accelerations for the embarrassingly parallel yet compute-intensive bilateral filtering kernel. For manycore architectures, we have created a novel pair-symmetric algorithm to avoid redundant calculations. For multicore architectures, we improve the algorithm by...

chapter

An OpenCL Runtime Library for Embedded Multi-Core Accelerator

Ryuichi Sakamoto, Mikiko Sato, Yusuke Koizumi, Hideharu Amano, et al.

2012 IEEE International Conference on Embedded and Real-Time Computing Systems and Applications > 419 - 422

2012 IEEE 18th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA 2012)

In recent years, improving energy efficiency and computational performance has become a major issue as smartphones and tablets have become popular. To achieve high performance, multi-core accelerators consisting of general-purpose processors and accelerators are often used. But to use these multi-core accelerators efficiently, programmers have to consider synchronization and data transfer between...

chapter

Transparent and Efficient Shared-State Management for Optimistic Simulations on Multi-core Machines

Alessandro Pellegrini, Roberto Vitali, Sebastiano Peluso, Francesco Quaglia

2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems > 134 - 141

2012 IEEE 20th International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS)

Traditionally, Logical Processes (LPs) forming a simulation model store their execution information in disjoint simulation states, forcing them to exchange events in order to communicate data with each other. In this work we propose the design and implementation of an extension to the traditional Time Warp (optimistic) synchronization protocol for parallel/distributed simulation, targeted at shared-memory/multicore...

chapter

Parallel one- and two-dimensional FFTs on GPGPUs

Mehrdad Fallahpour, Chang-Hong Lin, Ming-Bo Lin, Chin-Yu Chang

Anti-counterfeiting, Security, and Identification > 1 - 5

2012 International Conference on Anti-Counterfeiting, Security and Identification (2012 ASID)

This paper presents a method to map and implement the 1-D FFT on a GPGPU and extends the method to the 2-D FFT. Two approaches are used to maximize the performance. One is to localize data inside the caches of the GPGPU and the other is to properly assign threads and blocks to reach higher performance. The results show that our implementation is 3.62 times faster to perform 32M-point 1-D FFT and 4...

INFONA - science communication portal
