Automatic Parallelization

Automatic parallelization of general sequential, imperative code is one of the ultimate goals of compiler construction. Numerical algorithms in scientific computing can often be parallelized efficiently with a data-parallel approach. There are essentially three related problems to address: first, the data partitioning problem; second, the mapping problem, where sets of partitions are assigned to processors; and third, the data dependence analysis, which determines where to insert communication operations in a distributed memory environment and synchronization operations in a shared memory environment. Both partitioning and mapping can be hard problems, depending on the data dependence graph, and the dependence analysis itself can be too complex for a general-purpose compiler. However, good solutions to these problems are often known within a specific area of application. Hence, parallelization of code written in a domain-specific language may be feasible, even where parallelization in general is not.
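To make the three problems concrete, the following sketch parallelizes a one-dimensional Jacobi-style stencil with MPI. The block decomposition, the rank-to-block mapping, and the ghost-cell exchange are hypothetical illustrations of the partitioning, mapping, and communication-insertion steps; they are not output of our system.

    // Sketch: data-parallel 1-D stencil with explicit partitioning,
    // mapping, and communication (hypothetical, hand-written code).
    #include <mpi.h>
    #include <vector>

    int main(int argc, char** argv) {
      MPI_Init(&argc, &argv);
      int rank, size;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      const int N = 1024;                 // global problem size (assumed divisible)
      const int n = N / size;             // partitioning: one contiguous block,
                                          // mapping: assigned to each rank
      std::vector<double> u(n + 2, 1.0);  // local block plus two ghost cells
      std::vector<double> v(n + 2, 0.0);

      int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
      int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

      for (int iter = 0; iter < 100; ++iter) {
        // Communication is inserted exactly where the dependence analysis
        // finds references crossing a partition boundary (ghost-cell exchange).
        MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                     &u[n + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[n], 1, MPI_DOUBLE, right, 1,
                     &u[0], 1, MPI_DOUBLE, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        // Purely local, data-parallel update.
        for (int i = 1; i <= n; ++i)
          v[i] = 0.5 * (u[i - 1] + u[i + 1]);
        u.swap(v);
      }

      MPI_Finalize();
      return 0;
    }

In a shared memory environment, the two ghost-cell exchanges would be replaced by a barrier synchronization between iterations.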
We present some example codes for which automatic parallelization can be performed. Further, we have constructed a source-to-source C++ compilation system and several application libraries for this purpose.

[Figure: Overview of the compilation system. Source code with grid & tree iterators undergoes dependence analysis, macro expansion, and code generation dependent on the target architecture. Supported target models include sequential code, SSE and AltiVec instructions, POSIX threads, Boost.Thread, OpenMP, Intel TBB, MPI message passing, MPI-2 and shmem one-sided communication, Cuda and Brook+ GPUs, the Cell BE processor, and mixed models.]
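As an illustration of the kind of input the system accepts and the targets listed in the figure, the following sketch shows a hypothetical grid-iterator loop and the OpenMP code that target-dependent code generation might emit for it. The iterator names and the grid interface are assumptions made for exposition; they are not the actual library API.

    // Hypothetical grid-iterator source (names assumed, not the real API):
    //
    //   for (auto p : grid.interior())
    //     v(p) = 0.25 * (u(p.left()) + u(p.right()) + u(p.up()) + u(p.down()));
    //
    // One possible expansion for an OpenMP target: the iterator space is
    // flattened into index loops and the outer loop is parallelized, which
    // is legal here because the dependence analysis finds no loop-carried
    // dependences (u and v are distinct arrays).
    void smooth(int nx, int ny, const double* u, double* v) {
      #pragma omp parallel for
      for (int j = 1; j < ny - 1; ++j)
        for (int i = 1; i < nx - 1; ++i)
          v[j * nx + i] = 0.25 * (u[j * nx + (i - 1)] + u[j * nx + (i + 1)]
                                + u[(j - 1) * nx + i] + u[(j + 1) * nx + i]);
    }

For an MPI target, the same iterator loop would instead be restricted to the local partition, with ghost-cell exchanges inserted as in the earlier sketch.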