We present PACXX -- a unified programming model for programming many-core systems that comprise accelerators like Graphics Processing Units (GPUs). One of the main difficulties of current GPU programming is that two distinct programming models are required: the host code for the CPU is written in C/C++ with a restricted, C-like API for memory management, while the device code for the GPU has...
Neither static nor dynamic data race detection methods, by themselves, have proven to be sufficient for large HPC applications, as they often result in high runtime overheads and/or low race-checking accuracy. While combined static and dynamic approaches can fare better, creating such combinations, in practice, requires attention to many details. Specifically, existing state-of-the-art dynamic race...
Time-stepped applications are pervasive in the scientific computing domain but perform poorly in the cloud because these applications execute in discrete time-steps, or ticks, and use logical synchronization barriers at tick boundaries to ensure correctness. As a result, the accumulated computational skew and communication skew that were unresolved in each tick can slow down time-stepped applications significantly...
General-purpose computing on Graphics Processing Units (GPGPU) has become increasingly popular for a wide range of applications beyond traditional graphics rendering workloads. GPGPU exploits parallelism in applications via multithreading to hide memory latencies, and handles control complexity by barrier synchronizations. Warp scheduling algorithms have been optimized to increase memory latency hiding...
GPU programming models such as CUDA and OpenCL are starting to adopt a weaker data-race-free (DRF-0) memory model, which does not guarantee any semantics for programs with data-races. Before standardizing the memory model interface for GPUs, it is imperative that we understand the tradeoffs of different memory models for these devices. While there is a rich memory model literature for CPUs, studies...
Porting CUDA programs to other heterogeneous and many-core platforms, especially native processors, is valuable for extending the range of CUDA applications, taking advantage of the many cores on the target platform, and supporting national industries. Traditional binary translation techniques are not suited to this task. From the standpoint of software reverse engineering, it is feasible to design a new migration...
The use of microcontroller boards is extremely common in day-to-day life, to such an extent that it is impossible to live without them. The choice of controllers is wide. In the market today, with rich features on every board, it is hard to choose the best one. This research paper aims to compare two microcontroller boards and to point out the pros and cons of both boards. The two controllers chosen...
The present paper introduces the XcalableACC (XACC) programming model, which is a hybrid model of the XcalableMP (XMP) Partitioned Global Address Space (PGAS) language and OpenACC. XACC defines directives that enable programmers to mix XMP and OpenACC directives in order to develop applications that can use accelerator clusters with ease. Moreover, in order to improve the performance of stencil applications,...
Join Point Interfaces (JPI) represent a current Aspect-Oriented Programming (AOP) methodology for solving modularization issues in classic AOP. Nevertheless, as it is for classic AOP, phases of requirement elicitation and software design are needed for the JPI software development process. In order to advance towards the solution of these issues, this article proposes and applies to a case study JPI...
Correctness of a program with respect to concurrency is often hard to achieve, but easy to specify: the concurrent program should produce the same results as a sequential reference version. We show how to automatically insert small atomic sections into a program to ensure correctness with respect to this implicit specification. Using techniques from bounded software model checking, we transform the...
Recently proposed hybrid dataflow and shared memory programming models combine these two underlying models in order to support a wider range of problems naturally. The effectiveness of such hybrid models for parallel implementations of dense and sparse algebra problems is well known. In this paper, we show another real world example for which hybrid dataflow models provide better support than traditional...
CUDA and OpenCL are the most widely used programming models to exploit hardware accelerators. Both programming models provide a C-based programming language to write accelerator kernels and a host API used to glue the host and kernel parts. Although this model is a clear improvement over a low-level and ad-hoc programming model for each hardware accelerator, it is still too complex and cumbersome...
Writing high-quality concurrent programs is challenging. A concurrent program that is not well written may suffer from coarse synchronization problems, e.g., overly large critical sections, overly coarse locks, etc. These coarse synchronizations may introduce unnecessary lock contention and thereby affect the parallel execution of running threads. To optimize them, people suggest using refactorings,...
We face a glut of languages for programming distributed software today. However, only a few languages have proven their potential with wider practical use in different domains of computing. We picked two such languages, meant for different domains, to see if they could cross-pollinate and enrich one another. Specifically, we chose SystemJ, a language to program distributed embedded systems, and IEC61499,...
The challenges of the Big Data era have motivated many organizations to turn towards distributed, large-scale processing platforms to deal with their data. Map Reduce and its open-source implementation, Hadoop, have grown to be highly popular thanks to their successful programming model for simplified cluster processing. As a result, many organizations deploy their own Map Reduce/Hadoop clusters to store...
Programming of high performance computing systems has become more complex over time. Several layers of parallelism need to be exploited to efficiently utilize the available resources. To support application developers and performance analysts we propose a technique for identifying the most performance critical optimization targets in distributed heterogeneous applications. We have developed CASITA,...
In this paper we introduce exact and non-exact real-time waits in reactive Globally Asynchronous Locally Synchronous (GALS) programming languages and synchronous languages as their subset. The language constructs that allow use of real-time waits are illustrated on the SystemJ GALS language. They allow system designers to explicitly use, at the specification level, not only logical time but also the...
This work describes how we use High-Level Synthesis to support design space exploration (DSE) of heterogeneous many-core systems. Modern embedded systems increasingly couple hardware accelerators and processing cores on the same chip, to trade specialization of the platform to an application domain for increased performance and energy efficiency. However, the process of designing such a platform is...
For a matrix with full column rank, the QR algorithm is among the best approaches for solving a wider class of least squares (LS) problems. Using the communication-optimal variant of TSQR, we study the scalability of the least squares solver with multiple right-hand sides. The communication for the TSQR-based LS solver for multiple right-hand sides is still optimal in the sense that no additional messages are necessary...
Wireless Sensor Networks (WSNs) are rapidly becoming a necessary tool in many different application areas, such as environmental monitoring, security, safety, and so on. The heterogeneity of hardware is large, so there exist several different environments that support WSN programming. However, the great majority of such environments only target sensor programming, forgetting about their real...