The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Partitioned global address space (PGAS) programming model presents programmers with a globally shared address space with locality awareness and one-sided communication constructs. The shared address space and the one-sided communication constructs enhance ease-of-use of PGAS based languages and the locality awareness enables programmers and the runtime systems to achieve higher performance. Thus PGAS...
We present a runtime system that uses the explicit on-chip communication mechanisms of the SARC multi-core architecture, to implement efficiently the OpenMP programming model and enable the exploitation of fine-grain parallelism in OpenMP programs. We explore the design space of implementation of OpenMP directives and runtime intrinsics, using a family of hardware primitives; remote stores, remote...
We present a multi-core virtual platform which follows single-core architecture, SPARC v8, available as an open source development suite. The proposed multi-SPARC system operates at electronic system level to accelerate its simulation speed. TLM channels are devised to connect the processors. To simplify the use of the proposed virtual platform, we define some specific APIs for data transaction and...
Parallelism is the most important mean to exploit the computation potential of multi-core processors. Real applications, particularly, commercial applications often have strong dependence that has to be respected. In order to achieve reasonably good performance, hybrid parallelism schemes usually need to be applied in these applications. Furthermore, parallel applications with task and pipeline parallelism...
In the recent years, multicore processor designs have become increasingly popular for embedded applications, but diversified inter-core communication mechanisms have led to the difficulties in software development, integration and migration. A unified, portable, and efficient inter-core communication mechanism would have helped reduce these difficulties significantly, but such a solution did not exist...
Considering the ability to perform multi-processor architecture systems on FPGA, partial reconfiguration is an opportunity to improve weak soft-core performances by specializing coprocessors according to context-dependent application needs. But at the application level, there is a need for straightforward programming models that allow applications to be easily mapped on an ad hoc architecture without...
Clusters of Symmetric MultiProcessing (SMP) nodes with multi-core Chip-Multiprocessors (CMP), also known as SMP-CMP clusters, are becoming ubiquitous today. For Message Passing interface (MPI) programs, such clusters have a multi-layer hierarchical communication structure: the performance of intra-node communication is usually higher than that of inter-node communication; and the performance of intra-node...
The now commonplace multi-core chips have introduced, by design, a deep hierarchy of memory and cache banks within parallel computers as a tradeoff between the user friendliness of shared memory on the one side, and memory access scalability and efficiency on the other side. However, to get high performance out of such machines requires a dynamic mapping of application tasks and data onto the underlying...
The introduction of multi-core architectures generates a higher demand for parallelism in order to fully exploit the potential of modern computers. It is of vital importance that a compiler can allocate parallel workload in a cost-aware manner in order to achieve optimal performance on a multi-core architecture. This paper presents an adaptive OpenMP-based mechanism capable of generating a reasonable...
In geophysics, the appropriate subdivision of a region into segments is extremely important. ICTM (interval categorizer tesselation model) is an application that categorizes geographic regions using information extracted from satellite images. The categorization of large regions is a computational intensive problem, what justifies the proposal and development of parallel solutions in order to improve...
In this paper, we present a methodology for profiling parallel applications executing on the IBM PowerXCell 8i (commonly referred to as the ldquoCellrdquo processor). Specifically, we examine Cell-centric MPI programs on hybrid clusters containing multiple Opteron and Cell processors per node such as those used in the petascale Roadrunner system. Our implementation incurs less than 3.2 mus of overhead...
Multicore processors have been adopted for consumer electronics like portable electronics, mobile phones, car navigation systems, digital TVs and games to obtain high performance with low power consumption. The OSCAR automatic parallelizing compiler has been developed to utilize these multicores easily. Also, a new consumer electronics multicore application program interface (API) to use the OSCAR...
This paper presents a multiprocessor system on FPGA that adopts Direct Memory Access (DMA) mechanisms to move data between the external memory and the local memory of each processor. The system integrates all standard DMA primitives via a fast Application Programming Interface (API) and relies on interrupts having also the possibility to manage a command list. This interface allows to program the...
In this article, we present an original MPI-based adaptive task migration support for the HS-Scale system. Our previous communication API was modified in order to be MPI compliant. In order to enable task migration without any MMU, a Position Independent Code compilation technique is implemented. The self-adaptability is based on monitoring information collected at run-time by each processing element...
It has been common that modern embedded system products are built on platforms with system-on-a-chip (SOC) in which two or more different processor cores are put into one single chip and form the architecture of heterogeneous multiprocessor. Although providing high performance at low cost, such architecture brings new design challenges as well as increased complexity in developing embedded software...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.