The data-parallel programming language construct of a “foreach” loop is proposed in the context of hierarchically nested arrays and unbalanced k-ary trees used in high-performance applications. To perform an initial evaluation, an automatic parallelization system for C++ programs is introduced, which consists of a preprocessor and a matching library for distributed-memory, shared-memory, and mixed-model parallelism. For a full compile-time dependence analysis and a tight distributed-memory parallelization, additional application knowledge about the alignment of arrays or about indirect data access can be placed in the data declarations of the application’s code. Results for a multigrid and a fast multipole benchmark code illustrate the concept.