The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
High-performance computing increasingly makes use of heterogeneous many-core parallelism. Individual processor cores within such systems are radically simpler than their predecessors; and tasks previously the responsibility of hardware, are delegated to software. Rather than use a cache, fast on-chip memory, is exposed through a handful of address space annotations; associating pointers with discrete...
The pre-exascale systems are expected to have a significant amount of hierarchical and heterogeneous on-node memory, and this trend of system architecture in extreme-scale systems is expected to continue into the exascale era. Along with hierarchical-heterogeneous memory, the system typically has a high-performing network and a compute accelerator. This system architecture is not only effective for...
In this paper, we present our work to enable optimized one-sided communication operations on the ARM v8 architecture using a high-performance InfiniBand network interconnect, as well as an evaluation of our implementation. For this study, we started with an OpenSHMEM implementation based on Open MPI/SHMEM, and combined it with the UCX framework and the XPMEM kernel extension for shared memory communication...
GEMM is the main computational kernel in BLAS3. Its micro-kernel is either hand-crafted in assembly code or generated from C code by general-purpose compilers (guided by architecture-specific directives or auto-tuning). Therefore, either performance or portability suffers. We present a POrtable Compiler Approach, Poca, implemented in LLVM, to automatically generate and optimize this micro-kernel in...
This paper is focused on the hardware modeling and the algorithms mapping on the digital signal processor (DSP) with the very long instruction word (VLIW) architecture, such as TMS320C6000. The general methods to develop an efficient application for the target processor combine high- and/or low-level programming languages. Although the hardware capabilities of the nowadays processors and compilers...
Hardware architecture is increasingly complex, urging the development of asynchronous runtime systems with advance resource and locality management supports. However, these supports may come at the cost of complicating the user interface while programming remains one of the major constraints to wide adoption of asynchronous runtimes in practice. In this paper, we propose a solution that leverages...
In this paper we present a novel approach for functional-style programming of distributed-memory clusters, targeting data-centric applications. The programming model proposed is purely sequential, SPMD-free and based on high-level functional features introduced since C++11 specification. Additionally, we propose a novel cluster-as-accelerator design principle. In this scheme, cluster nodes act as...
In this paper we present a novel approach for functional-style programming of distributed-memory clusters, targeting data-centric applications. The programming model proposed is purely sequential, SPMD-free and based on high-level functional features introduced since C++11 specification. Additionally, we propose a novel cluster-as-accelerator design principle. In this scheme, cluster nodes act as...
Modern platform FPGAs are sufficiently dense to allow the assembly of a complete chip heterogeneous multiprocessor systems on chip (CHMPs) within a single die. Based on CHMP, every research group that sets out to explore how an application can be accelerated on an FPGA platform must firstly integrate processors, buses, memories, and support IP components into a base architecture prior to beginning...
Hybrid Diversity-aware Collective Adaptive Systems (HDA-CAS) is a new generation of socio-technical systems where both humans and machine peers complement each other and operate collectively to achieve their goals. These systems are characterized by the fundamental properties of hybridity and collectiveness, hiding from users the complexities associated with managing the collaboration and coordination...
Scripting languages such as Python and R have been widely adopted as tools for the productive development of scientific software because of the power and expressiveness of the languages and available libraries. However, deploying scripted applications on large-scale parallel computer systems such as the IBM Blue Gene/Q or Cray XE6 is a challenge because of issues including operating system limitations,...
Representative applications are versatile tools to evaluate new programming approaches, techniques and optimisations as a way to ensure continued high performance on future computing architectures. They make experimentation much easier before adopting changes/insights into the large scientific codes. In this paper we demonstrate the important role played by representative/proxy applications in designing...
Return-oriented programming is a kind of codereuse technique for attackers, which is very effective to bypass the DEP defense. However, the instruction snippet (we call it gadget) is often unprintable 1. This shortcoming can limit the ROP attack to be deployed to practice, since non-ASCII scanning can detect such ROP payload. In this paper, we present a novel method that only uses the printable gadgets,...
Code reuse attacks (CRAs) are recent security exploits that allow attackers to execute arbitrary code on a compromised machine. CRAs, exemplified by return-oriented and jump-oriented programming approaches, reuse fragments of the library code, thus avoiding the need for explicit injection of attack code on the stack. Since the executed code is reused existing code, CRAs bypass current hardware and...
Aiming at addressing difficulties in the development of embedded system, this paper proposes a new method. Based on the object-oriented concepts and by reference to the touching operating mode of the smart phones or tablet PCs, we achieved a new set of smart touching UI controls and device driver classes library in C# with the .NET Compact Framework environment. This method improves the programming...
High-end computing (HEC) applications in critical areas of science and technology tend to be more and more data intensive. I/O has become a vital performance bottleneck of modern HEC practice. Conventional HEC execution paradigms, however, are computing-centric for computation intensive applications. They are designed to utilize memory and CPU performance and have inherent limitations in addressing...
In this work we propose a flexible, generic and agent based platform for the design and simulation of wireless sensor networks (WSN) protocols and applications. This platform is independent from the physical architecture of nodes, where each node is considered as an autonomous agent having its own properties and behaviors according to the role it takes in the network. Once the description of the WSN...
This paper proposes a flexible programming model (FPM), which addresses the automatic parallel execution for functional tasks on heterogeneous multiprocessors. Guided by the simply annotated source codes, a front-end source to source compiler is provided to identify the parallel regions and generate the sources codes. A runtime middleware analyzes the inter-task data dependencies and schedules the...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.