Example Program
Problem:
compute the elements M_ij of matrix M
column blocks of M are distributed among tasks
computation times for the M_ij are highly variable
Program structure:
compute (unprocessed) rows of local matrix elements
while a task with unprocessed elements exists:
  compute rows of this task
check result matrix
Program run:
Synchronisation:
integer variable last for each task:
  contains number of first unprocessed row
  can be accessed by all tasks
every access to last is protected by a lock
barrier between:
  initialisation and computation
  computation and check
Missing synchronisation leads to race conditions, e.g.:
  initialisation/computation:
    task 1 is still in its init phase
    task 0 has already finished its local computation and
    reads the uninitialised last value of task 1
Remarks on example program (oneside.c, DistMatrix.h, DistMatrix.c)
DistMatrix:
matrix with column block distribution
contains global and local dimensions and offset
allocated with MPI_Alloc_mem
special INDEX macro
blockDistribute: global index -> local index
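The layout described above might look roughly like the following sketch. It is not the code from DistMatrix.h: the field names, the row-major storage behind INDEX, the helper dist_matrix_layout, the signature of block_distribute, and the rule that the first (n_cols % n_tasks) tasks get one extra column are all assumptions made for illustration. The sketch leaves data unallocated; in the example program the block is allocated with MPI_Alloc_mem.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout; field names are assumptions, not taken from DistMatrix.h. */
typedef struct {
    int n_rows;         /* number of rows (every task holds all rows)      */
    int n_cols_global;  /* global number of columns                        */
    int n_cols_local;   /* number of columns stored on this task           */
    int col_offset;     /* global index of the first local column          */
    double *data;       /* local block; MPI_Alloc_mem in the real program  */
} DistMatrix;

/* Element (i, j) of the local block, j being a LOCAL column index.
   Row-major storage is an assumption of this sketch. */
#define INDEX(m, i, j) ((m)->data[(i) * (m)->n_cols_local + (j)])

/* Fill in local width and offset for task `rank` out of `n_tasks`.
   Assumed rule: the first (n_cols % n_tasks) tasks get one extra column. */
void dist_matrix_layout(DistMatrix *m, int n_rows, int n_cols,
                        int rank, int n_tasks)
{
    assert(n_tasks > 0 && rank >= 0 && rank < n_tasks);
    int base = n_cols / n_tasks;
    int rest = n_cols % n_tasks;
    m->n_rows = n_rows;
    m->n_cols_global = n_cols;
    m->n_cols_local = base + (rank < rest ? 1 : 0);
    m->col_offset = rank * base + (rank < rest ? rank : rest);
    m->data = NULL;
}

/* Global column index -> local column index, or -1 if this task
   does not own the column. */
int block_distribute(const DistMatrix *m, int j_global)
{
    int j = j_global - m->col_offset;
    return (j >= 0 && j < m->n_cols_local) ? j : -1;
}
```

With 10 columns on 3 tasks, this rule gives task 1 the global columns 4 to 6, so global column 6 maps to local column 2 on that task.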
local computation:
other tasks may remotely work with this data
use lock to get next free row
remember to release lock in exit branch as well!
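The point about the exit branch can be illustrated with a small single-process sketch. It is a stand-in, not the code from oneside.c: TaskState, lock_task, unlock_task and claim_next_row are hypothetical names, and the lock is simulated by a flag. In an MPI program the lock would typically be taken with MPI_Win_lock and released with MPI_Win_unlock on the window that holds last.

```c
#include <assert.h>

/* Simulated per-task shared state (hypothetical). In the real program
   `last` would live in an MPI window accessible to all tasks. */
typedef struct {
    int last;    /* number of the first unprocessed row   */
    int n_rows;  /* total number of rows of this task     */
    int locked;  /* 1 while the simulated lock is held    */
} TaskState;

/* Stand-ins for acquiring and releasing the lock; the asserts catch
   a double lock or a release without a matching acquire. */
void lock_task(TaskState *t)   { assert(!t->locked); t->locked = 1; }
void unlock_task(TaskState *t) { assert(t->locked);  t->locked = 0; }

/* Claim the next free row of task t.
   Returns the row number, or -1 if no unprocessed row is left. */
int claim_next_row(TaskState *t)
{
    lock_task(t);
    if (t->last >= t->n_rows) {
        unlock_task(t);      /* exit branch: the lock must be released here too */
        return -1;
    }
    int row = t->last;
    t->last = row + 1;       /* read and update happen under the same lock */
    unlock_task(t);
    return row;
}
```

Forgetting the unlock in the early return would leave the lock held forever, so every other task trying to claim a row from this task would block.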
get_next:
visits neighbour tasks in cyclic way
exit criterion: local task reached
remote computations:
relate indices to task visited
compute corresponding global index
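A possible shape for the cyclic visit and the index translation is sketched below. The signatures are assumptions; the actual get_next in oneside.c may take different arguments and signal termination differently. Here -1 marks the exit criterion (the cycle has returned to the local task), and global_column is a hypothetical helper that relates a local column of the visited task to its global index.

```c
#include <assert.h>

/* Task to visit after `current`, for a traversal that started at `self`.
   Returns -1 once the cycle is back at the local task (exit criterion). */
int get_next(int current, int self, int n_tasks)
{
    assert(n_tasks > 0);
    int next = (current + 1) % n_tasks;
    return (next == self) ? -1 : next;
}

/* Global column index of local column `j_local` on the visited task.
   The offset used must be that of the VISITED task, not the local one. */
int global_column(int visited_offset, int j_local)
{
    return visited_offset + j_local;
}
```

Starting from task 2 out of 4, repeated calls visit tasks 3, 0 and 1 and then stop.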
compute:
complicated way to compute f(x) = x + eps
resistant against compiler optimisation
amount of computation strongly varying
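How the example program implements this is not shown here; one way to get a deliberately slow f(x) = x + eps that the compiler cannot collapse into a single addition is sketched below. The parameter n_steps is a hypothetical knob for the amount of work, and the use of volatile is an assumption of this sketch, not necessarily the technique used in oneside.c.

```c
#include <assert.h>

/* Returns approximately x + eps, doing work proportional to n_steps.
   Varying n_steps from element to element gives strongly varying
   computation times. */
double compute(double x, double eps, long n_steps)
{
    assert(n_steps > 0);
    /* volatile forces every update to be performed, so the loop is
       not replaced by a closed-form expression. */
    volatile double acc = x;
    double step = eps / (double)n_steps;
    for (long k = 0; k < n_steps; k++) {
        acc = acc + step;
    }
    return acc;
}
```

Because the result is built from many small additions, it agrees with x + eps only up to rounding error, which a check of the result matrix would have to tolerate.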
check:
Peter Junglas 16.2.1999