Converting sequential programs to execute on parallel computers is difficult because of the need to globally optimize for both parallelism and data locality. The choice of which loop nests to parallelize, and how, drastically affects data locality. Similarly, data distribution directives, such as DISTRIBUTE in High Performance Fortran (HPF), affect available parallelism and locality. What is needed is a systematic approach to converting programs to parallel form, based upon analysis that identifies opportunities for both parallelism and locality in one representation.
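To illustrate this interaction, consider the following hypothetical HPF-annotated fragment (our own illustration, not drawn from the benchmarks discussed later). The inner J loop carries a dependence, so only the I loop is a parallelism candidate, and only a (BLOCK,*) distribution of the arrays lets that parallel loop run on locally owned data.

      PROGRAM SWEEP
      REAL A(1000,1000), B(1000,1000)
!HPF$ DISTRIBUTE A(BLOCK,*)
!HPF$ DISTRIBUTE B(BLOCK,*)
      INTEGER I, J
      A = 0.0
      B = 1.0
!HPF$ INDEPENDENT
      DO I = 1, 1000
         DO J = 2, 1000
!           The J loop carries a true dependence on A(I,J-1), so only
!           the I loop can be parallel; with A and B distributed
!           (BLOCK,*), each processor updates only the rows it owns.
            A(I,J) = A(I,J-1) + B(I,J)
         END DO
      END DO
      PRINT *, A(500,1000)
      END

Distributing the arrays (*,BLOCK) instead would leave the same parallelism but force every iteration of the parallel I loop to touch data owned by other processors, destroying locality.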
This paper presents a global framework for optimizing parallelism and locality, based upon constraint solving for locality between potentially parallel loop nests. We outline the theory behind the framework and provide a global algorithm for parallelizing programs while optimizing for locality. We also present and analyze results from applying the algorithm to the Perfect benchmarks, targeting the KSR-1. Unlike other approaches, we do not assume an explicit distribution of data to processors; the distribution is inferred from locality constraints and available parallelism. This approach works well for machines such as the KSR-1, which have no explicit distribution of data, but it could also be used to generate code with explicit data distribution for distributed-memory machines, for example by generating HPF.
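As a concrete, hypothetical instance of the constraints such a framework must reconcile (again our own illustration, not taken from the benchmarks), consider two consecutive loop nests over the same array: the first is parallel in I and favours a row distribution of A, while the second is parallel in J and favours a column distribution, so any single inferred distribution trades locality in one nest against the other unless the data is redistributed between them.

      PROGRAM PHASES
      REAL A(1000,1000)
      INTEGER I, J
      A = 1.0
! Phase 1: the J loop carries the dependence, so I is the parallel
! loop; locality favours distributing A by rows, i.e. A(BLOCK,*).
      DO I = 1, 1000
         DO J = 2, 1000
            A(I,J) = A(I,J) + A(I,J-1)
         END DO
      END DO
! Phase 2: the I loop carries the dependence, so J is the parallel
! loop; locality favours distributing A by columns, i.e. A(*,BLOCK).
      DO J = 1, 1000
         DO I = 2, 1000
            A(I,J) = A(I,J) + A(I-1,J)
         END DO
      END DO
      PRINT *, A(1000,1000)
      END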