Search results

chapter

PACXX: Towards a Unified Programming Model for Programming Accelerators Using C++14

Michael Haidl, Sergei Gorlatch

2014 LLVM Compiler Infrastructure in HPC (LLVM-HPC), pp. 1-11

We present PACXX -- a unified programming model for programming many-core systems that comprise accelerators like Graphics Processing Units (GPUs). One of the main difficulties of current GPU programming is that two distinct programming models are required: the host code for the CPU is written in C/C++ with a restricted, C-like API for memory management, while the device code for the GPU has...

chapter

Towards Providing Low-Overhead Data Race Detection for Large OpenMP Applications

Joachim Protze, Simone Atzeni, Dong H. Ahn, Martin Schulz, et al.

2014 LLVM Compiler Infrastructure in HPC (LLVM-HPC), pp. 40-47

Neither static nor dynamic data race detection methods, by themselves, have proven to be sufficient for large HPC applications, as they often result in high runtime overheads and/or low race-checking accuracy. While combined static and dynamic approaches can fare better, creating such combinations, in practice, requires attention to many details. Specifically, existing state-of-the-art dynamic race...

article

Resisting Skew-Accumulation for Time-Stepped Applications in the Cloud via Exploiting Parallelism

Yu Zhang, Xiaofei Liao, Hai Jin, Geyong Min

IEEE Transactions on Cloud Computing, 2015, vol. 3, no. 1, pp. 54-65

Time-stepped applications are pervasive in the scientific computing domain but perform poorly in the cloud, because these applications execute in discrete time steps, or ticks, and use logical synchronization barriers at tick boundaries to ensure correctness. As a result, the computational and communication skew left unresolved in each tick accumulates and can slow down time-stepped applications significantly...

chapter

SAWS: Synchronization aware GPGPU warp scheduling for multiple independent warp schedulers

Jiwei Liu, Jun Yang, Rami Melhem

2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 383-394

General-purpose computing on Graphics Processing Units (GPGPU) has become increasingly popular for a wide range of applications beyond traditional graphics rendering workloads. GPGPU exploits parallelism in applications via multithreading to hide memory latencies, and handles control complexity through barrier synchronization. Warp scheduling algorithms have been optimized to increase memory latency hiding...

chapter

Efficiently enforcing strong memory ordering in GPUs

Abhayendra Singh, Shaizeen Aga, Satish Narayanasamy

2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 699-712

GPU programming models such as CUDA and OpenCL are starting to adopt a weaker data-race-free (DRF-0) memory model, which does not guarantee any semantics for programs with data-races. Before standardizing the memory model interface for GPUs, it is imperative that we understand the tradeoffs of different memory models for these devices. While there is a rich memory model literature for CPUs, studies...

chapter

Migration of CUDA Program Based on a Divide-and-Conquer Method

Nan Li, Jianmin Pang, Zheng Shan

2014 IEEE 17th International Conference on Computational Science and Engineering (CSE), pp. 1685-1691

Porting CUDA programs to other heterogeneous and many-core platforms, especially native processors, is valuable for extending the reach of CUDA applications, taking advantage of the many cores of the target platform, and supporting national industries. Traditional binary translation techniques are not adequate for this task. From the standpoint of software reverse engineering, it is feasible to design a new migration...

chapter

Comparison of two development boards for embedded system functionalities — Intel Galileo and Intel Atom board SYS9400

A Siri, G R Meghana, Roshni Kishan, Rajeshwari Hegde

2014 International Conference on Circuits, Communication, Control and Computing (I4C), pp. 153-155

The use of microcontroller boards is extremely common in day-to-day life, to such an extent that it is hard to imagine living without them. There are numerous controllers to choose from, and with every board on the market today offering rich features, it is hard to choose the best one. This research paper aims to compare two microcontroller boards and to point out the pros and cons of both. The two controllers chosen...

chapter

XcalableACC: Extension of XcalableMP PGAS Language Using OpenACC for Accelerator Clusters

Masahiro Nakao, Hitoshi Murai, Takenori Shimosaka, Akihiro Tabuchi, et al.

2014 First Workshop on Accelerator Programming using Directives (WACCPD), pp. 27-36

The present paper introduces the XcalableACC (XACC) programming model, which is a hybrid model of the XcalableMP (XMP) Partitioned Global Address Space (PGAS) language and OpenACC. XACC defines directives that enable programmers to mix XMP and OpenACC directives in order to develop applications that can use accelerator clusters with ease. Moreover, in order to improve the performance of stencil applications,...

chapter

JPI UML: UML Class and Sequence Diagrams Proposal for Aspect-Oriented JPI Applications

Cristian Vidal Silva, Rodolfo Villarroel

2014 33rd International Conference of the Chilean Computer Science Society (SCCC), pp. 120-123

Join Point Interfaces (JPI) represent a current Aspect-Oriented Programming (AOP) methodology for solving modularization issues in classic AOP. Nevertheless, as with classic AOP, requirement elicitation and software design phases are needed in the JPI software development process. To advance towards a solution to these issues, this article proposes and applies to a case study JPI...

chapter

Synthesis of synchronization using uninterpreted functions

Roderick Bloem, Georg Hofferek, Bettina Könighofer, Robert Könighofer, et al.

2014 Formal Methods in Computer-Aided Design (FMCAD), pp. 35-42

Correctness of a program with respect to concurrency is often hard to achieve, but easy to specify: the concurrent program should produce the same results as a sequential reference version. We show how to automatically insert small atomic sections into a program to ensure correctness with respect to this implicit specification. Using techniques from bounded software model checking, we transform the...

chapter

A Case Study of Hybrid Dataflow and Shared-Memory Programming Models: Dependency-Based Parallel Game Engine

Vladimir Gajinov, Igor Eric, Saša Stojanovic, Veljko Milutinovic, et al.

2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp. 1-8

Recently proposed hybrid dataflow and shared-memory programming models combine these two underlying models in order to support a wider range of problems naturally. The effectiveness of such hybrid models for parallel implementations of dense and sparse algebra problems is well known. In this paper, we show another real-world example for which hybrid dataflow models provide better support than traditional...

chapter

Leveraging OmpSs to Exploit Hardware Accelerators

Florentino Sainz, Sergi Mateo, Vicenc Beltran, Jose L. Bosque, et al.

2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp. 112-119

CUDA and OpenCL are the most widely used programming models to exploit hardware accelerators. Both programming models provide a C-based programming language to write accelerator kernels and a host API used to glue the host and kernel parts. Although this model is a clear improvement over a low-level and ad-hoc programming model for each hardware accelerator, it is still too complex and cumbersome...

chapter

Refactoring Java Concurrent Programs Based on Synchronization Requirement Analysis

Binxian Tao, Ju Qian

2014 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp. 361-370

Writing high-quality concurrent programs is challenging. A poorly written concurrent program may suffer from coarse synchronization problems, e.g., overly large critical sections, overly coarse locks, etc. These coarse synchronizations may introduce unnecessary lock contention and thereby hinder the parallel execution of running threads. To optimize them, people suggest using refactorings,...

chapter

Competitors or Cousins? Studying the parallels between distributed programming languages SystemJ and IEC61499

Roopak Sinha, Valeriy Vyatkin, Zoran Salcic, Hee Jong Park

Proceedings of the 2014 IEEE Emerging Technology and Factory Automation (ETFA), pp. 1-7

We face a glut of languages for programming distributed software today. However, only a few languages have proven their potential with wider practical use in different domains of computing. We picked two such languages, meant for different domains, to see if they could cross-pollinate and enrich one another. Specifically, we chose SystemJ, a language to program distributed embedded systems, and IEC61499,...

chapter

FedLoop: Looping on Federated MapReduce

Chun-Yu Wang, Tzu-Li Tai, Kuan-Chieh Huang, Tse-En Liu, et al.

2014 IEEE 13th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 755-762

The challenges of the Big Data era have motivated many organizations to turn towards distributed, large-scale processing platforms to deal with their data. MapReduce and its open-source implementation, Hadoop, have become highly popular thanks to a successful programming model that simplifies cluster processing. As a result, many organizations deploy their own MapReduce/Hadoop clusters to store...

chapter

CASITA: A Tool for Identifying Critical Optimization Targets in Distributed Heterogeneous Applications

Felix Schmitt, Jonas Stolle, Robert Dietrich

2014 43rd International Conference on Parallel Processing Workshops (ICCPW), pp. 186-195

Programming of high performance computing systems has become more complex over time. Several layers of parallelism need to be exploited to efficiently utilize the available resources. To support application developers and performance analysts we propose a technique for identifying the most performance critical optimization targets in distributed heterogeneous applications. We have developed CASITA,...

chapter

Times square - marriage of real-time and logical-time in GALS and synchronous languages

HeeJong Park, Avinash Malik, Zoran Salcic

2014 IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), pp. 1-10

In this paper we introduce exact and non-exact real-time waits in reactive Globally Asynchronous Locally Synchronous (GALS) programming languages and synchronous languages as their subset. The language constructs that allow use of real-time waits are illustrated on the SystemJ GALS language. They allow system designers to explicitly use, at the specification level, not only logical time but also the...

chapter

A HLS-Based Toolflow to Design Next-Generation Heterogeneous Many-Core Platforms with Shared Memory

Paolo Burgio, Andrea Marongiu, Philippe Coussy, Luca Benini

2014 12th IEEE International Conference on Embedded and Ubiquitous Computing (EUC), pp. 130-137

This work describes how we use High-Level Synthesis to support design space exploration (DSE) of heterogeneous many-core systems. Modern embedded systems increasingly couple hardware accelerators and processing cores on the same chip, to trade specialization of the platform to an application domain for increased performance and energy efficiency. However, the process of designing such a platform is...

chapter

Communication Optimal Least Squares Solver

Pawan Kumar

2014 IEEE International Conference on High Performance Computing and Communications (HPCC), 2014 IEEE 6th International Symposium on Cyberspace Safety and Security (CSS) and 2014 IEEE 11th International Conference on Embedded Software and Systems (ICESS), pp. 316-319

For matrices with full column rank, QR factorization is among the best approaches for solving a wide class of least squares (LS) problems. Using the communication-optimal variant of TSQR, we study the scalability of the least squares solver with multiple right-hand sides. The communication of the TSQR-based LS solver for multiple right-hand sides is still optimal in the sense that no additional messages are necessary...

chapter

Wireless sensor network UML profile to support model-driven development

A. R. Paulon, A. A. Frohlich, L. B. Becker, F. P. Basso

2014 12th IEEE International Conference on Industrial Informatics (INDIN), pp. 227-232

Wireless Sensor Networks (WSNs) are rapidly becoming a necessary tool in many different application areas, such as environmental monitoring, security, and safety. Hardware heterogeneity is large, so several different environments exist that support WSN programming. However, the great majority of such environments only target the programming of the sensors, forgetting about their real...

INFONA - science communication portal