Scientific Computing & Visualization

Multiprocessing by Message Passing MPI

Example 1.1 Integration with MPI Blocking Send/Receive

Numerical integration is chosen as the instructional example because its algorithm is trivial to parallelize; the task is narrowly confined yet computationally relevant. For this example, the integrand is the cosine function and the range of integration, from a to b, has length b-a. The work is divided evenly among the p processors, so that each processor is responsible for integrating over a partial range of length (b-a)/p. Once the local integration is complete on every processor, processor 0 collects the local integral sums from all processors to form the total sum.

First, MPI_Init is invoked to initialize MPI for all participating processes. The MPI parallel paradigm used throughout this tutorial is SPMD (Single Program Multiple Data): the identical program is executed on multiple processors. The number of processors is determined by the user at runtime with


my-machine% mpirun -np 4 a.out
The mpirun command runs the MPI executable a.out on 4 processors. This is the most frequently used command to launch an MPI job; however, it is system dependent. Consult your local system documentation for the proper way to run your MPI job. The number of processors, 4 in this example, is made available to your program as the variable p through a call to MPI_Comm_size. To distinguish one processor from another, MPI_Comm_rank is called to query for myid, the rank of the current process. Essentially, myid facilitates the handling of different data on different processors.

The blocking send and receive subroutines MPI_Send and MPI_Recv (MPI_Send is the "standard"-mode send) are used to pass the local integral sum from the individual processors to processor 0, which calculates the total sum. The MPI standard requires that a blocking send not return until the send buffer is safe to reuse. Similarly, it requires that a blocking receive not return until the receive buffer actually contains the intended message. At the end, a call to MPI_Finalize permits an orderly shutdown of MPI before the program exits.

For detailed explanations of the MPI subroutines and functions, please read MPI: The Complete Reference.

Example 1.1 Fortran code

Example 1.1 C code
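What follows is a minimal sketch of the pattern this example describes, not the tutorial's original listing. It assumes integration limits of 0 and pi/2, a midpoint rule, n = 500 intervals per processor, and an arbitrary tag value; the helper name integral is hypothetical. The sketch deliberately has every processor, including processor 0, call the blocking MPI_Send before processor 0 starts receiving, because that is the pattern whose safety the discussion examines.

```c
#include <math.h>
#include <mpi.h>
#include <stdio.h>

/* Midpoint-rule integral of cos(x) over n intervals of width h,
 * starting at x = start. (Hypothetical helper.) */
static double integral(double start, int n, double h)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        sum += cos(start + (i + 0.5) * h) * h;
    }
    return sum;
}

int main(int argc, char *argv[])
{
    const double a = 0.0;                /* assumed lower limit           */
    const double b = acos(-1.0) / 2.0;   /* assumed upper limit, pi/2     */
    const int n = 500;                   /* assumed intervals per process */
    const int master = 0;                /* rank that collects the sums   */
    const int tag = 123;                 /* arbitrary message tag         */
    int myid, p;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);   /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &p);      /* number of processes  */

    /* Each process integrates over its own sub-range of length (b-a)/p. */
    double my_range = (b - a) / p;
    double my_a = a + myid * my_range;
    double my_int = integral(my_a, n, my_range / n);

    /* Every process, the master included, sends its local sum to the
     * master. The master's send to itself relies on system buffering
     * and is therefore "unsafe" in the sense used in this tutorial. */
    MPI_Send(&my_int, 1, MPI_DOUBLE, master, tag, MPI_COMM_WORLD);

    if (myid == master) {
        double total = 0.0;
        double part;
        MPI_Status status;

        /* One matching receive per sender: p receives in total. */
        for (int src = 0; src < p; src++) {
            MPI_Recv(&part, 1, MPI_DOUBLE, src, tag, MPI_COMM_WORLD, &status);
            total += part;
        }
        printf("The integral = %f\n", total);
    }

    MPI_Finalize();
    return 0;
}
```

On many systems a program like this is built with an MPI compiler wrapper such as mpicc (linking the math library) and launched with mpirun; the exact commands are system dependent.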

Discussion

A number of points can be drawn from the above code:
  1. The choice of master is arbitrary. Choosing rank 0 is strategic, however, because 0 is the only rank that exists when p = 1 (a single-processor job).
  2. MPI_Send is used to send the message my_int from all processors to the master.
  3. This send is performed concurrently on all processors.
  4. For each point-to-point send, a matching receive on the master is expected.
  5. The matching receive requires the master to call the point-to-point receive MPI_Recv p times, each to receive the my_int sent by a processor.
  6. This effectively serializes an otherwise parallel procedure and is computationally less efficient.
  7. Usually, there are no problems when processors send messages to the master and the master receives messages from other processors. In this example, however, processor 0 must both send to and receive from itself, and both MPI_Send and MPI_Recv are blocking. If the send cannot complete until the matching receive has started, and the receive cannot start until the send has returned, neither call can finish and processor 0 deadlocks, at least theoretically. In practice, many MPI implementations provide a system memory buffer for blocking sends, even though the MPI standard does not require one. As a result, deadlock may not occur. Because the outcome depends on that optional buffering, this situation is considered "unsafe".
  8. Some MPI implementations allow the user to control the memory buffer. On the IBM pSeries, the buffer size can be set via the environment variable MP_EAGER_LIMIT. Setting it to 0 provides no buffer, which enforces the strict definition of point-to-point blocking communication. The present example deadlocks on the IBM pSeries with setenv MP_EAGER_LIMIT 0. Incidentally, this is a good trick for checking whether your code is "safe".
  9. tag is one of the parameters that help identify a message uniquely. In this application, myid is sufficient to identify a message, so tag is not needed for that purpose. If multiple messages are sent and received by the same processor, the receiver might need the tag to distinguish among them and determine what to do with each one.
  10. Both the send and the receive subroutine/function take the constant MPI_COMM_WORLD, defined in mpif.h (for Fortran) or mpi.h (for C). This is called a communicator. MPI_COMM_WORLD means that no processor allocated to this job is excluded from the task at hand (i.e., the send or receive). In some situations, only a restricted group of processors (e.g., odd- or even-numbered ranks) takes part in a message passing operation. In that case, one would create a tailored communicator to take the place of MPI_COMM_WORLD. All the numerical integration examples use only the MPI_COMM_WORLD communicator.
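As an illustration of the last point, one possible way to build a tailored communicator is MPI_Comm_split, which groups processes by a "color" value. The sketch below is not part of the integration examples; the variable names (parity_comm, color, sub_rank, sub_size) are hypothetical, and splitting by odd and even rank is just the case mentioned in the text.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int world_rank, sub_rank, sub_size;
    MPI_Comm parity_comm;   /* hypothetical name for the new communicator */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Processes with the same color end up in the same communicator:
     * color 0 collects the even ranks, color 1 the odd ranks. Using
     * world_rank as the key keeps the original rank ordering. */
    int color = world_rank % 2;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &parity_comm);

    /* Ranks are renumbered from 0 within each new communicator. */
    MPI_Comm_rank(parity_comm, &sub_rank);
    MPI_Comm_size(parity_comm, &sub_size);
    printf("World rank %d is rank %d of %d in the %s group\n",
           world_rank, sub_rank, sub_size, color == 0 ? "even" : "odd");

    MPI_Comm_free(&parity_comm);
    MPI_Finalize();
    return 0;
}
```

A send or receive that names parity_comm instead of MPI_COMM_WORLD then involves only the processes of that group.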

The shortcomings stated above will be remedied in the next example, Example 1.2.


Boston University
OIT | CCS | June 26, 2009  