Multiprocessing by Message Passing MPI
Example 1.5 Integration with MPI Collective Communications
So far, we have used point-to-point blocking and nonblocking communication
routines to perform numerical integration. In this example, we turn our
focus to collective communication routines. Unlike point-to-point
communications with which a message is passed between one processor and
another, a collective communication routine performs a one (processor)
to all (processors) , all-to-one, or all-to-all communications. Broadcasting
is a typical example of a one-to-all communication while a gather operation
is representative of an all-to-one communication.
Two collective communication routines are introduced in this example.
First, we make the program more general by allowing the number
of intervals, n, for each sub-range to be defined through run-time
input. To avoid repetition, it is read on the
master processor only. To make n available on all
processors, it must then be copied to all processes.
This can be accomplished with a broadcast operation, MPI_Bcast.
In previous examples, after the local integral sums have been
computed by all participating processes, they are individually
sent to the master who sums them to obtain the final sum.
In this example, a collective reduction routine, MPI_Reduce, will be used to perform
the same task. As you will see, they are more compact and convenient
to use than point-to-point communication routines, they are also
expected to be more efficient and less prone to errors.
Example 1.5 Fortran Code
Program Example1_5
c##################################################################################
c# #
c# This is an MPI example on parallel integration to demonstrate the use of: #
c# #
c# * MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Finalize #
c# * MPI_Reduce #
c# * MPI_SUM #
c# #
c# Dr. Kadin Tseng #
c# Scientific Computing and Visualization #
c# Boston University #
c# 1998 #
c# #
c##################################################################################
implicit none
integer n, p, i, j, proc, ierr, master, myid, tag, comm
real h, a, b, integral, pi, ai, my_int, integral_sum
include "mpif.h" ! brings in pre-defined MPI constants, ...
integer status(MPI_STATUS_SIZE) ! size defined in mpif.h
data master/0/ ! processor 0 collects integral sums from other processors
comm = MPI_COMM_WORLD
call MPI_Init(ierr) ! starts MPI
call MPI_Comm_rank(comm, myid, ierr) ! get current proc ID
call MPI_Comm_size(comm, p, ierr) ! get number of procs
pi = acos(-1.0) ! = 3.14159...
a = 0.0 ! lower limit of integration
b = pi*1./2. ! upper limit of integration
tag = 123 ! set the tag to identify this particular job
if(myid .eq. master) then
print *,'The requested number of processors =',p
print *,'enter number of increments within each process'
read(*,*)n
endif
c**Broadcast "n" to all processes
call MPI_Bcast( ! Broadcast "n" to all procs
& n, 1, MPI_INTEGER,
& master, comm, ierr)
h = (b-a)/n/p ! length of increment
ai = a + myid*n*h ! lower limit of integration for partition myid
my_int = integral(ai, h, n)
write(*,"('Process ',i2,' has the partial sum of',f10.6)")
& myid,my_int
call MPI_Reduce( ! a collective reduction operation
& my_int, integral_sum, 1, MPI_REAL,
& MPI_SUM,
& master,
& comm, ierr)
if(myid .eq. master) then
print *,'The Integral =',integral_sum
endif
call MPI_Finalize(ierr) ! let MPI finish up ...
end
real function integral(ai, h, n)
implicit none
integer n, j
real h, ai, aij
integral = 0.0 ! initialize integral
do j=0,n-1 ! sum integrals
aij = ai +(j+0.5)*h ! abscissa mid-point
integral = integral + cos(aij)*h
enddo
return
end
Example 1.5 C code
#include <mpi.h>
#include <math.h>
#include <stdio.h>
float fct(float x)
{
return cos(x);
}
/* Prototype */
float integral(float ai, float h, int n);
int main(int argc, char* argv[])
{
/*###############################################################################
# #
# This is an MPI example on parallel integration to demonstrate the use of: #
# #
# * MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Finalize #
# * MPI_Reduce #
# * MPI_SUM #
# #
# Dr. Kadin Tseng #
# Scientific Computing and Visualization #
# Boston University #
# 1998 #
# #
###############################################################################*/
int n, p, myid, tag, proc, ierr;
float h, integral_sum, a, b, ai, pi, my_int;
char line[10];
int master = 0; /* processor performing total sum */
MPI_Comm comm;
comm = MPI_COMM_WORLD;
ierr = MPI_Init(&argc,&argv); /* starts MPI */
MPI_Comm_rank(comm, &myid); /* get current process id */
MPI_Comm_size(comm, &p); /* get number of processes */
pi = acos(-1.0); /* = 3.14159... */
a = 0.; /* lower limit of integration */
b = pi*1./2.; /* upper limit of integration */
if(myid == master) {
printf("The requested number of processors is %d\n",p);
printf("Enter number of increments within each process\n");
(void) fgets(line, sizeof(line), stdin);
(void) sscanf(line, "%d", &n);
}
/* Broadcast "n" to all processes */
MPI_Bcast(
&n, 1, MPI_INT,
master, comm);
h = (b-a)/n/p; /* length of increment */
ai = a + myid*n*h; /* lower limit of integration for partition myid */
my_int = integral(ai, h, n); /* 0<=myid<=p-1 */
printf("Process %d has the partial result of %f\n", myid,my_int);
MPI_Reduce(
&my_int, &integral_sum, 1, MPI_FLOAT,
MPI_SUM,
master, comm);
if(myid == 0) {
printf("The result =%f\n",integral_sum);
}
MPI_Finalize(); /* let MPI finish up ... */
}
float integral(float ai, float h, int n)
{
int j;
float aij, integ;
integ = 0.0; /* initialize */
for (j=0;j<n;j++) { /* sum integrals */
aij = ai + (j+0.5)*h; /* mid-point */
integ += cos(aij)*h;
}
return integ;
}
Discussions
All collective communication routines are implemented so that the call
to the routine must be
invoked on all processes. For MPI_Bcast, the source of the broadcast is
master. On this process, MPI_Bcast acts as a send. On all
other processes where myid is not master, it automatically
knows that it must act as a receive.
Once my_int is computed on all processes, calling the collective reduction
subroutine MPI_Reduce will take care of collecting my_int from
all processes followed by a reduction (summation) operation.
Reduction operation can be one of two types: pre-defined
or user-defined. Pre-defined reduction operations are built in to MPI
and include, for example : MPI_SUM for summing; MPI_MAX
and MPI_MIN for finding extremums; MPI_MAXLOC and
MPI_MINLOC
for finding extremums and their corresponding locations.
In order to use MPI_Reduce, the reduction operation
must satisfy the associative rule. If it also satisfies the
commutative rule, the reduction operation may gain in efficiency.
For example, summation, or MPI_SUM, is a reduction operator that
is both associative and commutative. Subtraction, which is not on the list
of pre-defined reduction operations, is neither associative nor commutative.
User-defined reduction operations must also satisfy the associative rule and
optionally the commutative rule. For further details, see
MPI: The Complete Reference.
Example 1  |
Example 1.1 |
Example 1.2 |
Example 1.3 |
Example 1.4 |
Example 1.5
|
|