We have a number of clusters at UMBC; I happen to use our Bluegrit
cluster, and these examples are from it. MPI stands for Message
Passing Interface and is available on many multiprocessors. MPI may
be installed as the open source version MPICH. There are other
software libraries and languages for multiprocessors, yet this
lecture covers only MPI. The web page here at UMBC is
www.csee.umbc.edu/help/MPI

Programming in MPI uses the SPMD, Single Program Multiple Data, style
of programming. One program runs on all CPUs in the multiprocessor.
Each CPU has a number, called a rank in MPI, called myid in my code,
and called node or node number in comments. "if-then-else" code based
on the node number is used to have unique computation on specific
nodes. There is a master node, typically the node with rank zero in
MPI. The node number may also be used in index expressions and other
computation. Many MPI programs use the master as a number cruncher
along with the other nodes, in addition to the master serving as
overall control and synchronization.

Examples below are given first in "C" and then a few in Fortran.
Other languages may interface with the MPI library. These just show
simple MPI use; the pieces are combined later for solving
simultaneous equations on a multiprocessor. Minimal sketches of the
roll call, the scatter/gather, and the test problem appear at the
end of this section.

Just check that a message can be sent and received from each node,
processor, CPU, etc. numbered as "rank".

  roll_call.c
  roll_call.out

Just scatter unique data from the "master" to all nodes. Then gather
the unique results from all nodes.

  scat.c
  scat.out

Here is the Makefile I used.

  Makefile      for C on Bluegrit cluster

Repeating the "roll_call", just changing the language to Fortran.

  roll_call.F
  roll_call_F.out

Repeating scatter/gather, just changing the language to Fortran.

  scat.F
  scat_F.out

The Fortran version of the Makefile, with additional files I used.

  Makefile      for Fortran on Bluegrit cluster
  my_mpif.h     only needed if not on cluster nodes
  a machine file is only needed if the default machinefile is not used

Now, the purpose of this lecture: solve a huge number of simultaneous
equations on a highly parallel multiprocessor. Well, start small when
programming a multiprocessor, and print out every step to be sure the
indexing and communication are exactly correct. This is hard to read,
yet it was a necessary step.

  psimeq_debug.c
  psimeq_debug.out

Then, some clean up, removing or commenting out most debug print:

  psimeq1.c
  psimeq1.out

The input data was created so that the exact answers were 1, 2, 3 ...
It is interesting to note: because the data in double precision
floating point was from the set of integers, the answers are exact
for 8192 equations in 8192 unknowns.

  psimeq1.out8192

|A| * |X| = |Y|   given matrix |A| and vector |Y|, find vector |X|

  | 1 2 3 4 5 |   |5|   | 35 |   for 5 equations in 5 unknowns
  | 2 2 3 4 5 |   |4|   | 40 |   the solved problem is this
  | 3 3 3 4 5 | * |3| = | 49 |
  | 4 4 4 4 5 |   |2|   | 61 |
  | 5 5 5 5 5 |   |1|   | 75 |

A series of timing runs were made, changing the number of equations.
The time was expected to increase as order n^3 divided by the number
of processors being used. Reasonable agreement was measured. Using
16 processors:

  Number of   Time computing   Cube root of 16 times Time
  equations   solution (sec)   (should approximately double as the
                                number of equations doubles)
     1024           3.7           3.9
     2048          17.2           6.5
     4096          83.5          11.0
     8192         471.9          19.6

More work may be performed to minimize the amount of data sent and
received in "rbuf".
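For readers without the linked source files, here is a minimal sketch
of the kind of roll call described above, assuming an MPICH-style
installation. It is an illustration only, not the course's
roll_call.c; the file name and message format are made up. Each node
sends its rank to the master, and the master prints a line as each
reply arrives.

/* roll_call_sketch.c - a minimal roll call, not the course's roll_call.c.
   Each node sends its rank to the master (rank 0), which prints a
   line as each message arrives. */
#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
  int myid, numprocs, who;
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &myid);      /* this node's rank */
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);  /* number of nodes  */

  if (myid == 0)                             /* master node */
  {
    printf("master expecting %d replies\n", numprocs - 1);
    for (int i = 1; i < numprocs; i++)
    {
      MPI_Recv(&who, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
      printf("node %d reporting\n", who);
    }
  }
  else                                       /* every other node */
  {
    MPI_Send(&myid, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
  }

  MPI_Finalize();
  return 0;
}

With MPICH such a program is typically compiled with mpicc and
launched with mpirun or mpiexec and a process count, for example
mpirun -np 8 ./roll_call_sketch.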
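Similarly, here is a minimal sketch of the scatter/gather pattern,
again an assumption rather than the actual scat.c: the master builds
one unique value per node, MPI_Scatter delivers one value to each
node, every node computes on its value, and MPI_Gather returns the
results to the master.

/* scat_sketch.c - a minimal scatter/gather, not the course's scat.c. */
#include <stdio.h>
#include "mpi.h"

#define MAXP 64                      /* assumed maximum number of nodes */

int main(int argc, char *argv[])
{
  int myid, numprocs;
  int sbuf[MAXP], rbuf[MAXP], mine;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &myid);
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

  if (myid == 0)                     /* master builds unique data */
    for (int i = 0; i < numprocs; i++)
      sbuf[i] = 100 + i;

  /* every node receives one int from the master's sbuf into 'mine' */
  MPI_Scatter(sbuf, 1, MPI_INT, &mine, 1, MPI_INT, 0, MPI_COMM_WORLD);

  mine = mine * 2 + myid;            /* unique computation on each node */

  /* master collects one int from every node back into rbuf */
  MPI_Gather(&mine, 1, MPI_INT, rbuf, 1, MPI_INT, 0, MPI_COMM_WORLD);

  if (myid == 0)
    for (int i = 0; i < numprocs; i++)
      printf("result from node %d is %d\n", i, rbuf[i]);

  MPI_Finalize();
  return 0;
}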
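The 5 by 5 worked example above can be reproduced with a few lines of
serial C. This sketch only builds the test data A(i,j) = max(i,j) and
checks the right hand side; the data generator inside psimeq1.c may
be organized differently.

/* testcase_sketch.c - serial check of the 5 by 5 example above.
   Builds A(i,j) = max(i,j), sets X = 5,4,3,2,1 and prints Y = A*X,
   which should come out 35, 40, 49, 61, 75 as shown in the lecture. */
#include <stdio.h>

#define N 5

int main(void)
{
  double A[N][N], X[N], Y[N];

  for (int i = 0; i < N; i++)
  {
    X[i] = (double)(N - i);                     /* 5, 4, 3, 2, 1 */
    for (int j = 0; j < N; j++)
      A[i][j] = (double)((i > j ? i : j) + 1);  /* max(i,j), 1-based values */
  }

  for (int i = 0; i < N; i++)                   /* Y = A * X */
  {
    Y[i] = 0.0;
    for (int j = 0; j < N; j++)
      Y[i] += A[i][j] * X[j];
    printf("Y[%d] = %g\n", i, Y[i]);
  }
  return 0;
}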
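Finally, the third column of the timing table can be recomputed
directly: if the time is roughly proportional to n^3 / 16 on 16
processors, then the cube root of 16 times the time is roughly
proportional to n, so it should double as the number of equations
doubles. A tiny check, using the measured times from the table:

/* timing_check_sketch.c - check the n^3/p model against measured times.
   Compile with: cc timing_check_sketch.c -lm                           */
#include <stdio.h>
#include <math.h>

int main(void)
{
  int    n[]    = {1024, 2048, 4096, 8192};
  double secs[] = {3.7, 17.2, 83.5, 471.9};   /* measured, 16 processors */

  for (int i = 0; i < 4; i++)
    printf("%5d equations: cbrt(16 * %.1f) = %.1f\n",
           n[i], secs[i], cbrt(16.0 * secs[i]));
  return 0;
}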