The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Many present-day microprocessors have fine grain parallelism, be it in the form of a pipeline, of multiple functional units, or replicated processors. The efficient use of such architectures depends on the capability of the compiler to schedule the execution of the object code in such a way that most of the available hardware is in use while complying with data dependences. In the case of one simple...
Trade-offs between code selection, register allocation, and instruction scheduling are inherently interdependent, especially when compiling for fine-grain parallel architectures. However, the conventional approach to compiling for such machines arbitrarily separates these phases so that decisions made during any one phase place unnecessary constraints on the remaining phases. Mutation Scheduling attempts...
Just a few years ago, parallel computers were tightly-coupled SIMD, VLIW, or MIMD machines. Now, they are clusters of workstations connected by communication networks yielding ever-higher bandwidth (e.g., Ethernet, FDDI, HiPPI, ATM). For these clusters, compiler research is centered on techniques for hiding huge synchronization and communication latencies, etc. — in general, trying to make parallel...
Data and computation alignment is an important part of compiling sequential programs to architectures with non-uniform memory access times. In this paper, we show that elementary matrix methods can be used to determine communication-free alignment of code and data. We also solve the problem of replicating read-only data to eliminate communication. Our matrix-based approach leads to algorithms which...
This paper describes some aspects of the implementation of our Data Distribution Tool (DDT), which accepts programs written in Fortran77 and obtains alignment and distribution HPF directives for the arrays used in the program. In particular, we describe the phases of the tool which analyze reference patterns in loops, record preferences for alignment and obtain the alignment functions. These functions...
We consider distribution at compile time of the array data in a distributed-memory implementation of a data-parallel program written in a language like Fortran 90. We allow dynamic redistribution of data and define a heuristic algorithmic framework that chooses distribution parameters to minimize an estimate of program completion time. We represent the program as an alignment-distribution graph. We...
The paper describes a parallelization algorithm for programs consisting of arbitrary nestings of loops and sequences of loops. The code produced by our algorithm yields all the degrees of communication-free parallelism that can be obtained via loop fission, fusion, interchange, reversal, skewing, scaling, reindexing and statement reordering. The algorithm first assigns the iterations of instructions...
We present a unified framework for applying iteration reordering transformations. This framework is able to represent traditional transformations such as loop interchange, loop skewing and loop distribution as well as compositions of these transformations. Using a unified framework rather than a sequence of adhoc transformations makes it easier to analyze and predict the effects of these transformations...
Converting sequential programs to execute on parallel computers is difficult because of the need to globally optimize for both parallelism and data locality. The choice of which loop nests to parallelize, and how, drastically affects data locality. Similarly, data distribution directives, such as DISTRIBUTE in High Performance Fortran (HPF), affects available parallelism and locality. What is needed...
It is the goal of the Polaris project to develop a new parallelizing compiler that will overcome limitations of current compilers. While current parallelizing compilers may succeed on small kernels, they often fail to extract any meaningful parallelism from large applications. After a study of application codes, it was concluded that by adding a few new techniques to current compilers, automatic parallelization...
In this paper we describe an approach to the compilation of data-parallel programming languages based on a formally defined intermediate language, called V-cal. The calculus V-cal was designed to represent the semantics of data management and control primitives found in data-parallel languages and allows to describe program transformations and optimizations as semantics preserving rewrite rules. ...
Scalability and cost considerations suggest that distributed and distributed shared memory parallel computers will dominate future parallel architectures. These machines could not be used effectively unless efficient automatic and static solutions to the data partitioning and placement problem become available. Significant progress toward this end has been made in the last few years, but we are still...
Precise value-based data dependence analysis for scalars is useful for advanced compiler optimizations. The new method presented here for flow and output dependence uses Factored Use and Def chains (FUD chains), our interpretation and extension of Static Single Assignment. It is precise with respect to conditional control flow and dependence vectors. Our method detects dependences which are independent...
Many abstractions of program dependences have already been proposed, such as the Dependence Distance, the Dependence Direction Vector, the Dependence Level or the Dependence Cone. These different abstractions have different precision. The minimal abstraction associated to a transformation is the abstraction that contains the minimal amount of information necessary to decide when such a transformation...
Our parallel hybrid analysis methods facilitate the parallelization of the analysis phase of a software transformation system, by enabling deeper semantic analyses to be accomplished more efficiently than if performed sequentially. Our previous empirical studies profiled these hybrid techniques on the Reaching Definitions problem [LMR91, LR92a, LR92b]. Recently, we have applied our method to the Interprocedural...
Data-flow analysis algorithms can be classified into two categories: flow-sensitive and flow-insensitive. To improve efficiency, flow insensitive interprocedural analyses do not make use of the intraprocedural control flow information associated with individual procedures. Since pointer-induced aliases can change within a procedure, applying known flow-insensitive analyses can result in either incorrect...
In compiling array statements for distributed-memory machines, efficient generation of local index sets and communication sets is important. Several techniques for enumerating these sets for block-cyclically distributed arrays have been presented in the literature. When sufficient compile-time information is not available, generation of the structures which facilitate efficient enumeration of these...
This paper presents a framework, based on global array data flow analysis, to reduce communication costs in a program being compiled for a distributed memory machine. This framework applies techniques for partial redundancy elimination to available section descriptors, a novel representation of communication involving array sections. With a single framework, we are able to capture numerous optimizations...
Managing communication is a difficult problem in distributed memory compilation. When the exact data to be communicated cannot be determined at compile time, communication optimizations can be performed by runtime routines which generate schedule for communication. This leads to two optimization problems: placing communication so that data once communicated can be reused if possible and placing schedule calls...
The increase in the number and complexity of parallel programs has led to a need for better approaches for synchronization error detection and debugging of parallel programs. This paper presents an efficient and precise algorithm for the detection of nondeterminacy (race conditions) in parallel programs. Non determinacy exists in a program when the program yields different outputs for different runs...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.