Hal is a High-level Actor-based Language. Hal supports a number of communication mechanisms, local synchronization constraints, inheritance, and restricted forms of reflection. This paper discusses some issues in compiling Hal. Specifically, we describe three source-level transformations used by the compiler for Hal. Two of the transformations translate RPC-style message sending into asynchronous...
In this paper, we present a concurrent execution semantics for Parallel Program Graphs (PPGs), a general parallel program representation that includes Program Dependence Graphs (PDGs) and sequential programs. We believe that this semantics is natural to the programmer's way of thinking, and that it also provides a suitable execution model for efficient implementation on real architectures. To demonstrate...
Compilers for superscalar and VLIW processors must expose sufficient instruction-level parallelism in order to achieve high performance. Compile-time code transformations which expose instruction-level parallelism typically take into account the constraints imposed by all execution scenarios in the program. However, there are additional opportunities to increase instruction-level parallelism along the...
Hierarchical architectures attempt to provide the benefits of both VLIW/superscalar and MIMD machines by combining multiple VLIW or superscalar processors as parallel, asynchronous processors. An example of these architectures is the Intel Touchstone multicomputer. These machines provide the opportunity to execute a program in parallel at both the machine instruction level and the source statement...
We present a dynamic evaluation of the effects of data dependence analysis on the Perfect Benchmarks. We show that it is possible to measure the optimal parallelism, as defined by our model, and to compare the parallelism obtained with various data dependence tests against the optimal parallelism. We find that a variation of Banerjee's inequalities is sufficient in all cases to obtain more than half...
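For context on the test named above: the classic Banerjee bound test for a single loop can be sketched as below. This is an illustrative reconstruction of the standard test (function and parameter names are ours), not the paper's particular variation.

```python
def pos(x):
    return max(x, 0)

def neg(x):
    return max(-x, 0)

def banerjee_may_depend(a1, a0, b1, b0, L, U):
    # Tests whether A[a1*i + a0] and A[b1*i + b0] may touch the same
    # element for some iterations i, j with L <= i, j <= U.
    # A dependence requires a1*i - b1*j = b0 - a0; over the real
    # interval [L, U] the left side ranges between lo and hi, so a
    # solution can exist only if the constant lies in [lo, hi].
    c = b0 - a0
    lo = (pos(a1) * L - neg(a1) * U) - (pos(b1) * U - neg(b1) * L)
    hi = (pos(a1) * U - neg(a1) * L) - (pos(b1) * L - neg(b1) * U)
    return lo <= c <= hi

# A[i] vs A[i+1] over i in [0, 10]: dependence cannot be ruled out.
close_refs = banerjee_may_depend(1, 0, 1, 1, 0, 10)
# A[i] vs A[i+100] over i in [0, 10]: provably independent.
far_refs = banerjee_may_depend(1, 0, 1, 100, 0, 10)
```

Being a real-valued test, it can prove independence but never dependence; integer-based refinements (such as the GCD test) handle cases like `A[2*i]` vs `A[2*j+1]` that this bound cannot disprove.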
In this paper, we give experimental results of run-time partitioning of arbitrary-order loops with pointer-based structures. Arbitrary-order loops are those loops whose iterations can be executed in any order, typically iterating over a bag, list, or set. Such iterations are often used in algorithm texts for iterating over adjacency lists and other collections, and are similar to Linda's bags.
Many parallel programs require run-time support to implement the communication caused by indirect data references. In previous work, we have developed the inspector-executor paradigm to handle these cases. This paper extends that work by developing a dataflow framework to aid in placing the executor communications calls. Our dataflow analysis determines when it is safe to combine communications statements,...
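The inspector-executor paradigm referenced above can be illustrated with a minimal single-process simulation: an inspector scans the indirection array once to build a communication schedule, and an executor performs the prerecorded fetches before computing. The block-distribution, `owner`, and `ghost` names are illustrative assumptions, not the paper's API.

```python
def owner(j, block=4):
    # Block distribution: global element j lives on process j // block.
    return j // block

def inspector(idx, my_rank):
    # Inspector: scan the indirection array once and record the
    # distinct remote elements this process will need.
    return sorted({j for j in idx if owner(j) != my_rank})

def executor(global_x, idx, my_rank, schedule):
    # Executor: perform the scheduled "communication" (simulated here
    # by direct reads into a ghost buffer), then run the loop using
    # local and ghost data.
    ghost = {j: global_x[j] for j in schedule}
    out = []
    for j in idx:
        if owner(j) == my_rank:
            out.append(global_x[j])   # locally owned access
        else:
            out.append(ghost[j])      # satisfied from ghost buffer
    return out

x = [10, 20, 30, 40, 50, 60, 70, 80]   # global array, block-distributed
idx = [0, 5, 2, 6]                     # indirect references x[idx[i]]
sched = inspector(idx, my_rank=0)      # rank 0 owns elements 0..3
result = executor(x, idx, 0, sched)
```

Because the schedule is computed once, it can be reused across many executions of the same loop, which is what makes combining and hoisting the communication calls (the subject of the dataflow analysis) profitable.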
This paper describes Orca C, the first language built on the Phase Abstractions programming model. The focus is on the syntax and semantics of the data ensembles, a parallel data structuring facility to support machine-independent parallel programs. These features are compared with the data decomposition capabilities of other parallel languages, including Fortran D, Vienna Fortran, and Kali.
A compositional parallel program is a program constructed by composing component programs in parallel, where the composed program inherits properties of its components. In this paper, we describe a small extension of C++ called Compositional C++ or CC++ which is an object-oriented notation that supports compositional parallel programming. CC++ integrates different paradigms of parallel programming:...
Is the owner-computes style of parallelism, captured in a variety of data parallel languages, attractive as a paradigm for designing explicitly parallel codes? This question gives rise to a number of others. Will such use be unwieldy? Will the resulting code run well? What can such an approach offer beyond merely replicating, in a more labor intensive way, the services and coverage of data parallel...
Concurrent object-oriented programming languages are an attractive approach for programming massively-parallel machines. However, exploiting object-level concurrency is problematic as the linkage and communication overhead can overwhelm the benefits of the fine-grained concurrency. Our approach achieves efficient execution by tuning the grain size, matching the execution grain size to that efficiently...
This paper presents a static algorithm, based on standard iterative dataflow techniques, for computing per-process memory references to shared data in coarse-grained parallel programs. The algorithm constructs control flow graphs for families of processes by recognizing predicates used in control statements whose values are invariant relative to any one process, but vary across processes. It is used...
In this paper, we address the problem of supporting SPMD execution of programs that use recursively-defined dynamic data structures on distributed memory machines. The techniques developed for supporting SPMD execution of array-based programs rely on the fact that arrays are statically defined and directly addressable. As a result, these techniques do not apply to recursive data structures, which...
Considerable research on loop parallelization for shared memory multiprocessors has focused upon developing transformations for removing loop-carried dependences. In many loops, more than one such transformation is required, and hence the choice of transformations and the order in which they are applied is critical. In this paper, we present an algorithm for selecting a sequence of transformations...
This paper presents a set of compiler optimizations and their application strategies for a common class of data parallel loop nests. The arrays updated in the body of the loop nests are assumed to be partitioned into blocks (rectangular, rows, or columns) where each block is assigned to a processor. These optimizations are demonstrated in the context of a FORTRAN-90 compiler with very encouraging...
Vienna Fortran is a language extension of Fortran which provides the user with a wide range of facilities for the distribution of data structures across the processors of a distributed-memory multiprocessing machine. In contrast to current programming practice, programs in Vienna Fortran are written using global data references. Thus, the user has the advantages of a shared memory programming paradigm...
This paper presents a methodology for synthesizing parallel programs for block recursive algorithms such as fast Fourier transforms and Strassen's matrix multiplication algorithm. A block recursive algorithm is expressed as a tensor product formula which consists of matrix sums, matrix products, direct sums, tensor products, componentwise matrix operations, and stride permutations. These mathematical...
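As an illustration of how a tensor product formula specifies computation, the sketch below (our own minimal Python, not the paper's synthesis framework) applies a Kronecker product to a vector without materializing the full matrix; the independent `i` and `k` loops map directly to parallel execution.

```python
def kron_apply(A, B, x):
    # Computes y = (A ⊗ B) x directly from the definition of the
    # Kronecker product, without forming A ⊗ B explicitly.
    # A is p×q, B is r×s, x has length q*s, and y has length p*r.
    p, q = len(A), len(A[0])
    r, s = len(B), len(B[0])
    y = [0] * (p * r)
    for i in range(p):           # these two loops are independent,
        for k in range(r):       # hence trivially parallelizable
            y[i * r + k] = sum(A[i][j] * B[k][l] * x[j * s + l]
                               for j in range(q) for l in range(s))
    return y
```

Block recursive algorithms such as the FFT arise when a large transform is factored into such tensor products of small matrices, so the factorization itself encodes the parallel structure.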
In this paper we propose a loop fusion algorithm specifically designed to increase opportunities for array contraction. Array contraction is an optimization that transforms array variables into scalar variables within a loop nest. In contrast to array elements, scalar variables have better cache behavior and can be allocated to registers. In past work we investigated loop interchange and loop reversal...
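The fusion-then-contraction transformation can be shown on a toy example (the arrays and loop bodies are invented for illustration): two loops communicate through a temporary array, and after fusion each element is dead by the end of its iteration, so the array contracts to a scalar.

```python
def before(a):
    # Unfused form: t is a full temporary array carried between loops.
    n = len(a)
    t = [0] * n
    for i in range(n):       # loop 1: produce t
        t[i] = a[i] * 2
    b = [0] * n
    for i in range(n):       # loop 2: consume t
        b[i] = t[i] + 1
    return b

def after(a):
    # Fused form: each t[i] is dead after its own iteration, so the
    # array contracts to a register-allocatable scalar.
    n = len(a)
    b = [0] * n
    for i in range(n):
        t = a[i] * 2         # contracted: scalar instead of t[i]
        b[i] = t + 1
    return b
```

Fusion is legal here because the second loop reads only the value the first loop produced in the same iteration; fusing loops whose dependences cross iterations would require the interchange and reversal transformations mentioned above.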
We have designed and implemented parallel compile-time analysis algorithms based on a family of general-purpose, hybrid algorithms for data flow analysis [MR90]. Recently, we have developed optimizations that improve algorithm performance by reducing the time taken for computation, communication and dynamic scheduling. The practicality of this improved algorithm is evidenced by empirical studies of...
Although “data parallelism” has been shown to be an effective and portable way to express some types of parallel algorithms, there are many other problems for which data parallelism seems awkward and inefficient. For example, recursive decompositions and operations on irregular grids are most readily expressed using control parallelism. The problem is that control parallelism has always been associated...