Mixed and Hybrid Parallel Programming Models
Current computing platforms offer several levels of parallelism:
SIMD instructions within a processor core (e.g. four floating-point
operations at once), multiple cores in a processor chip, multiple
processor chips in a shared-memory node, and multiple nodes connected
(possibly in several layers) to form a parallel computer.
Similar hierarchies can be constructed from a
cluster of Cell processor nodes, each with 8 co-processor SPUs and
SIMD instructions. An alternative is a cluster of nodes with attached GPUs for
numerical computations. The CUDA system organizes the GPU into groups
of 8 processors running strictly in SIMD mode, where 4 such groups
form a warp sharing the processor resources. A number of warps can be
executed in parallel on the available multi-processors of a GPU board. Brook+ organizes data in streams, which are written to and read from the GPU running in SIMD mode.
Parallel programming models for hierarchically structured parallel
computers will use a corresponding hierarchy of programming models,
e.g. an outer MPI message-passing layer for the distributed-memory
part of the computer, a middle layer of thread parallelism (Pthreads,
OpenMP) for the shared-memory multi-processor nodes, and an inner
SIMD layer for instruction-level data parallelism.
Similarly, message passing
can be combined with Cell or GPU inner programming models, where again
a mixture of thread and SIMD programming is employed. Note that
multi-threading is not necessarily the optimal way to use (larger,
virtual) shared-memory computers; at some point message passing or
one-sided communication may be superior.