MATLAB Parallel Computing Toolbox (PCT)
The Parallel Computing Toolbox is a toolbox within MATLAB. It lets you solve computationally-intensive and data-intensive problems using MATLAB and Simulink on your local multicore computer or the Katana Cluster. Parallel processing operations such as parallel for-loops, parallel numerical algorithms, and message-passing functions let you implement task- and data-parallel algorithms in MATLAB. Converting serial MATLAB applications to parallel MATLAB applications requires few code modifications and no programming in a
low-level language. You can run your applications interactively or in batch.
Using MATLAB requires a software license, and using the PCT requires a separate PCT software license, whether you run it on your local computer or on the Katana cluster. Running a parallel MATLAB application on multiple processors of the Katana
cluster additionally requires one "remote worker" license per processor. There are currently 32 remote worker licenses available to users with Katana access. License access is fully automated; no action is required from the user.
Multicore local computers are ideal for interactive PCT sessions. They require only a PCT software license; no "local worker" licenses
are required to run the PCT. Furthermore, there is no wait time for accessing the local workers. However, at most eight (8) local workers are allowed on a local, or client, computer. Local interactive PCT sessions are very useful for learning the PCT, developing code, and debugging. If you need more workers than are available on your local computer, for example for production runs or for debugging with a larger worker count, you will need to use the workers on the Katana cluster.
The Katana cluster processors can be used for both interactive and batch processing with the PCT. However, because all PCT jobs are treated as batch jobs, starting an interactive PCT session may involve a wait when Katana is fully utilized.
If your MATLAB application runs multiple independent tasks, such as parametric studies of an analysis, image processing of a large number of image files, or Monte Carlo simulations, please consult the Running Multiple MATLAB Tasks page. This alternative method does not require the PCT, so you won't need to learn it. In addition, on a loaded queue the wait time could be significantly shorter because you only have to wait for processors to become available, rather than for both processors and remote worker licenses. Please contact Kadin Tseng (617-353-8294 or kadin@bu.edu) for more information.
The primary goal of this page is to provide instructions on how to run batch
jobs on Katana using the PCT. It also provides examples that supplement the
PCT documentation to highlight and clarify the operational and functional differences
among the parallel paradigms. For details on specific PCT commands or concepts,
please refer to the MATLAB PCT User's Guide (HTML) or the
Parallel Computing Toolbox User's Guide (PDF file for printing).
Terminology and symbols used in this and related pages
- Client — also MATLAB client; the MATLAB session (>>) where MATLAB/PCT commands are issued.
- Distributed job — multiple tasks running independently and without communications among workers. In batch, this
results in multiple single-processor jobs. This is also known as a
task-parallel job.
- Parallel job — single task running simultaneously on multiple workers that may communicate with each other. In batch, this results in one batch job with multiple processors. This is also known
as a data-parallel job.
- Local workers — processors on user's local computer.
- Remote workers — processors on the Katana Cluster.
First Time Using the PCT
Before using the PCT, you will need to complete some one-time setup. The steps depend
on whether you want to use the PCT on Katana, your local computer, or both.
- To use PCT on the Katana, you will need to set up the MATLAB batch
configuration. Here are the instructions.
- To use PCT on your local computer, you will need to install MATLAB
and the companion toolboxes, including the Parallel Computing Toolbox.
Here are the instructions.
Interactive Parallel Sessions
You can use an interactive MATLAB session to develop, debug, or run parallel codes. The two methods available for this purpose are pmode and matlabpool. The GUI-based
pmode method is designed exclusively for interactive sessions, while the
command-line-based matlabpool may be used for both interactive and
batch processing. When either of these two methods is invoked at the MATLAB prompt, a batch job is submitted to the batch queue. Once both the requested processors and worker licenses are available, you will enter a parallel session.
Interactive parallel sessions are more practical on a client computer because there is no wait time and no worker license requirement. On Katana, if processor usage is at or near capacity,
you may want to request a modest number of workers (for example, 2) to improve your chances of starting a session promptly.
Parallel Paradigms
There are several methods available within MATLAB's Parallel Computing
Toolbox for different types of parallel applications. Links to the relevant sections of
the PCT User's Guide are provided for your convenience. The Guide provides
simple examples that focus narrowly on the usage of specific functions or constructs.
To supplement these examples, this page applies every parallel method to a common
example, a linear algebraic system of equations (Ax = b), to demonstrate
how the methods differ operationally. Furthermore, the example codes are complete,
fully functional standalone codes. Each consists of pre- and post-processing (on the MATLAB client) with one or more parallel operations in between.
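Before parallelizing, it can help to have a serial reference for the common example. The following is a minimal sketch and not one of the page's original examples. It assumes magic(n) as a deterministic stand-in for the random matrix so that the result is reproducible, and it inlines the three lines of linearSystem. It needs neither the PCT nor any workers.

```matlab
% Serial reference for the Ax = b example (no PCT required).
% Assumption: magic(n) replaces rand(n) so results are reproducible.
n = 3;                 % square matrix size
c = ones(n,1);         % a vector used to generate solution
M = magic(n);          % deterministic stand-in for the random matrix
ncase = 4;             % number of systems to solve
maxerr = 0;            % largest deviation of y from x over all cases
for i = 1:ncase
    A = M + M';        % A is real and symmetric
    x = i*c;           % define solution
    b = A*x;           % right-hand side of Ax = b
    y = A\b;           % solves Ay = b; y should equal x
    maxerr = max(maxerr, norm(y - x, inf));
end
disp(['largest |y - x| over all cases: ' num2str(maxerr)])
```

Each parallel example below performs the same computation; comparing its y against this serial y is a quick correctness check.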
- spmd
After matlabpool has been invoked and workers assigned,
use spmd to activate the Single Program Multiple Data paradigm.
The spmd environment is essentially equivalent to the
pmode environment, but without the individual window for each
worker (i.e., the Parallel Command Window).
In this environment, a task specific to a processor is identified by
labindex, the processor number.
% Solves multiple systems of Ax = b with spmd.
% Process systems one at a time; each Ax=b is solved in parallel.
% matlabpool open
% spmd_example
% matlabpool close
% Kadin Tseng, Scientific Computing and Visualization, Boston University
n = 3;          % square matrix size
c = ones(n,1);  % a vector used to generate solution
matlabpool open % requests workers
for i=1:4       % solves 4 cases of Ax=b one at a time in parallel
    spmd
        M = rand(n, codistributor());    % distributed random number matrix
        C = codistributed(c, 'convert'); % distributes c
        [A, x, b] = linearSystem(M, C, i);
        y = A\b;            % solves Ay = b; y should equal x
        y1 = gather(y, 1);  % collects distributed y to worker 1
        if labindex == 1    % prints (or picks) y on worker 1
            disp(['For case ' num2str(i) '; y1 is'])
            disp(num2str(y1))
        end
    end % spmd
    % y1{1} % could also use y1 directly from client
end % for
matlabpool close
exit % must be present for batch processing

% linearSystem.m (a separate function file)
function [A, x, b] = linearSystem(M, C, i)
%function [A, x, b] = linearSystem(M, C, i)
% Returns matrices of a linear algebraic system Ax=b
% Works in parallel if input arrays are distributed
% M -- real square matrix
% C -- a vector used to generate the solution
% i -- number used to generate solution x;
%      in this example, pass in the for-loop index
% Kadin Tseng, Scientific Computing and Visualization, Boston University
A = M + M'; % A is real and symmetric
x = i*C;    % define solution
b = A * x;  % b is the RHS of the linear system Ax = b
This falls into the category of "parallel job." M is a matrix
of random numbers generated with rand. Because
"codistributor()" is specified, M is distributed among the processors.
A is a symmetric matrix and is also distributed
in the manner determined by M. c is an array from the client workspace
and is converted into the codistributed array C. Consequently, x and
b are also distributed.
With A and b distributed, y = A\b
is solved in parallel. In the matlabpool environment, a codistributed
array resides on the workers, each of which holds its local portion. On the client it appears as a "composite
array": you know it exists, but its contents cannot be accessed readily. To make it
accessible on the client, the array must first be gathered, either to a
specific worker or to all workers (broadcast). The choice depends on how you plan to use it.
- drange
This must be used within the pmode or matlabpool + spmd
environment. Task distribution is controlled by the loop index, whose
range is specified by the programmer through drange. Variables or arrays
predefined on the client are accessible on the workers as if they were
local. Variables or arrays generated as output also reside
on the workers.
% Solves multiple systems of Ax = b with drange
% Multiple systems are solved in parallel; each on a separate worker.
% To use drange, needs "matlabpool + spmd" or "pmode"
% matlabpool open
% spmd
% drange_example
% end
% matlabpool close
% Kadin Tseng, Scientific Computing and Visualization, Boston University
n = 3;         % square matrix size
c = ones(n,1); % a vector used to generate solution
r = rand(n);   % n x n random number matrix
matlabpool open
spmd
    for i=drange(1:numlabs) % use drange to distribute numlabs cases of Ax=b
        [A, x, b] = linearSystem(r, c, i) % computes A, x, b on one worker
        y = A\b; % solves Ay = b on one worker; y should equal x
    end % for
end % spmd
matlabpool close
exit % must be present for batch processing

% linearSystem.m (a separate function file)
function [A, x, b] = linearSystem(M, C, i)
%function [A, x, b] = linearSystem(M, C, i)
% Returns matrices of a linear algebraic system Ax=b
% Works in parallel if input arrays are distributed
% M -- real square matrix
% C -- a vector used to generate the solution
% i -- number used to generate solution x;
%      in this example, pass in the for-loop index
% Kadin Tseng, Scientific Computing and Visualization, Boston University
A = M + M'; % A is real and symmetric
x = i*C;    % define solution
b = A * x;  % b is the RHS of the linear system Ax = b
- dfeval
Similar to drange and parfor (to be discussed next), but without the explicit
use of a for-loop. Essentially, the programmer performs the data
distribution in lieu of the for-loop. Unlike drange or parfor, dfeval
operates only on functions, either intrinsic (such as rand) or user-defined.
The following example demonstrates the salient points of this parallel
operation. The objective is to solve multiple cases of Ax=b on
multiple workers.
% Solves multiple systems of Ax = b with dfeval
% Multiple systems are solved in parallel; each on a separate worker.
% Does NOT need matlabpool or spmd
% dfeval automatically starts a parallel batch job
% Kadin Tseng, Scientific Computing and Visualization, Boston University
n = 3;       % size of square matrix
r = rand(n); % n x n random number matrix
ntask = 4;   % number of independent tasks
% The SGE batch consists of a serial and a parallel batch config.
% dfeval automatically uses the serial config. Tasks are submitted
% to the batch queue as ntask single-processor jobs
% The cell arrays passed as input to dfeval must be of the same size;
% they play the role that drange or parfor plays in the other examples
for i=1:ntask
    M{i} = r;         % matrix
    C{i} = ones(n,1); % a vector used to generate solution
    I{i} = i;
end
[A, x, b] = dfeval(@linearSystem, M, C, I, 'Configuration', 'SGE')
% A, x, b are cell arrays returned to the client.
% For example, A{2} is the matrix A returned by worker 2
for i=1:ntask
    y = A{i}\b{i} % solves Ay = b; y should equal x
end
exit % must be present for batch processing

% linearSystem.m (a separate function file)
function [A, x, b] = linearSystem(M, C, i)
%function [A, x, b] = linearSystem(M, C, i)
% Returns matrices of a linear algebraic system Ax=b
% Works in parallel if input arrays are distributed
% M -- real square matrix
% C -- a vector used to generate the solution
% i -- number used to generate solution x;
%      in this example, pass in the for-loop index
% Kadin Tseng, Scientific Computing and Visualization, Boston University
A = M + M'; % A is real and symmetric
x = i*C;    % define solution
b = A * x;  % b is the RHS of the linear system Ax = b
- Always use the SGE batch configuration on Katana, or "local" if
running on your client computer.
- Running dfeval automatically launches N single-processor batch
jobs. This is the only parallel paradigm that is strictly in the
"distributed jobs" category.
- matlabpool is not required to run dfeval.
- There is a more effective alternative to dfeval.
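The role of the cell-array inputs in the dfeval example may be easier to see in a serial analogue. The sketch below is an illustration only and does not describe how dfeval is implemented: it uses cellfun, which calls a function once per cell, in place of dfeval. The anonymous function linsys is a hypothetical inline stand-in for linearSystem, and magic(n) is assumed in place of rand(n) so that the output is reproducible.

```matlab
% Serial analogue of the dfeval call pattern (illustration only).
% Assumptions: linsys stands in for linearSystem; magic(n) replaces rand(n).
n = 3;                         % size of square matrix
ntask = 4;                     % number of independent tasks
linsys = @(M, C, i) deal(M + M', i*C, (M + M')*(i*C));  % returns A, x, b
M = cell(1, ntask);  C = cell(1, ntask);  I = cell(1, ntask);
for i = 1:ntask
    M{i} = magic(n);           % matrix for task i
    C{i} = ones(n,1);          % a vector used to generate solution
    I{i} = i;                  % task number
end
% One call per cell; outputs come back as cell arrays of the same size
[A, x, b] = cellfun(linsys, M, C, I, 'UniformOutput', false);
y = cell(1, ntask);
for i = 1:ntask
    y{i} = A{i}\b{i};          % solves Ay = b; y{i} should equal x{i}
end
```

As in the dfeval example, element i of each input cell array supplies the arguments of task i, and element i of each output cell array holds that task's result.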
- parfor
This construct is similar to drange in that the
tasks are distributed by the loop index. However, there are differences in
several respects:
- In contrast to drange, parfor does not
work within the spmd environment.
- parfor allows reduction operations, such as summation,
which require communications among workers. Hence, parfor
is a data-parallel operation while drange is a
task-parallel operation.
% Solves multiple systems of Ax = b with parfor
% Multiple systems are solved in parallel; each on a separate worker.
% parfor needs matlabpool or pmode
% matlabpool open
% parallel-code
% matlabpool close
% Kadin Tseng, Scientific Computing and Visualization, Boston University
n = 3;         % square matrix size
c = ones(n,1); % a vector used to generate solution
r = rand(n);   % n x n random number matrix
matlabpool open
ntask = matlabpool('size');
parfor i=1:ntask % solve ntask cases of Ax=b
    [A, x, b] = linearSystem(r, c, i)
    y = A\b % solves Ay = b; y should equal x
    y       % output above diverted; so prints y this way
end
matlabpool close
% linearSystem is defined in the drange example above
exit % must be present for batch processing
The above usage of parfor is not typical.
parfor is more often used in fine-grained operations where the loop
counts are much larger. Also, parfor permits reduction operations
while drange and dfeval do not.
The following code fragment, extracted from the PCT User's Guide, demonstrates
additional features:
s = 0;
x = [];
parfor i=1:10000
s = s + i; % summation
x = [x, i]; % concatenation; this is allowed in PCT
end
These operations clearly carry a dependence from one loop iteration to the next.
However, according to the PCT User's Guide, they are
legitimate parallel operations for parfor. As shown, parfor has
more parallel capabilities than drange or dfeval.
The rules, or rather the
exceptions to the rules, should be
studied carefully to avoid incorrect usage.
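One way to see why the summation is legitimate: addition is associative and commutative, so each worker can accumulate a partial sum over its own iterations, and the partial sums can be combined afterwards in any order. The sketch below emulates this in serial MATLAB. It illustrates the idea only and is not a description of how parfor is implemented; nworkers and the block partitioning are hypothetical.

```matlab
% Serial emulation of a parfor-style summation reduction (illustration only).
% Assumptions: nworkers and the contiguous block split are hypothetical.
N = 10000;                          % loop count from the fragment above
nworkers = 4;                       % pretend number of workers
edges = round(linspace(0, N, nworkers + 1));   % block boundaries
partial = zeros(1, nworkers);       % one partial sum per pretend worker
for w = 1:nworkers
    for i = edges(w)+1 : edges(w+1) % iterations assigned to worker w
        partial(w) = partial(w) + i;
    end
end
s_blocks = sum(partial(end:-1:1));  % combine partial sums in reverse order
s_serial = sum(1:N);                % ordinary serial result
disp([s_blocks, s_serial])          % the two values agree
```

An update such as s = 2*s + i would not have this property, because its result depends on the order of the iterations.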
Note that, unlike regular for-loops, parfor places restrictions on its range specification. The range must consist of consecutive
(finite) integers laid out as a row vector. Here are some incorrect
usages:
>> parfor i=1:2:10 % WRONG; index not consecutive
>> parfor i=-5.1:10.3 % WRONG; index not integers
>> parfor i=[5;6;7;8] % WRONG; index does not form a row
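A common way around these restrictions is to loop over consecutive integer positions and look up the intended value inside the loop body. The following is a minimal sketch under that assumption; vals and out are hypothetical names, and the squaring stands in for whatever independent work each iteration does. To run in parallel it needs matlabpool, as in the parfor example above.

```matlab
% Workaround for a non-consecutive parfor range (sketch).
% Assumptions: vals and out are hypothetical names for illustration.
vals = 1:2:10;                 % the values the loop should visit
out = zeros(1, numel(vals));   % preallocated output, one slot per iteration
parfor k = 1:numel(vals)       % consecutive integers: a valid parfor range
    v = vals(k);               % recover the intended index value
    out(k) = v^2;              % independent work for this iteration
end
disp(out)
```

The same pattern covers non-integer values and column vectors: store them in vals and index with k.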
- MPI
The PCT is built on the Message Passing Interface. You have the option
to use some of its functionalities in your applications.
These include, for example, mpiInit, labSend,
labReceive.
Convert pmode code into matlabpool code
You could use pmode to perform operations interactively and then use either diary
or cut-and-paste (from the MATLAB "history" pane) to generate an m-file that uses
matlabpool for batch processing. Please consult
the table below for differences between the two to ensure a smooth conversion.
Table 1. Differences between pmode and matlabpool + spmd
| Operation | pmode | spmd | Notes |
| Graphics | No | No | Neither pmode nor spmd supports any form of graphics, such as plot or surf. Plots must be produced on the MATLAB client, which means that relevant data must first be copied from the workers to the client. |
| Interactive applications | Yes | Yes | |
| Batch | No | Yes | |
| whos | Yes | No | |
| Access worker arrays from client | Yes | No | For spmd, to enable the MATLAB client to access distributed local data, you must perform a gather. See spmd example above. |
Batch Processing
The Parallel Computing Toolbox provides myriad ways to run batch jobs. You could follow the procedures as outlined in the PCT User's Guide to submit batch jobs to the Katana batch queue. Included below is a simple batch procedure that works for all the parallel paradigms discussed above.
Recommended batch submission procedure on Katana
The examples in the previous sections demonstrate the components of
a typical code: a serial pre-processing section, a parallel processing section, and a serial post-processing section. For this type of application, the
recommended batch procedure is as follows:
- Create a batch script, named mbatch (for example)
#!/bin/csh -f
# Script for running serial or parallel MATLAB jobs in the background
# If your MATLAB application requests MATLAB Parallel Computing Toolbox
# parallel operations (matlabpool or dfeval), a batch job
# will be queued and run in batch (using the SGE configuration)
# Script name: mbatch (you can change it)
# Usage: katana% mbatch <m-file> <output-file>
# <m-file>: name of m-file to be executed; do NOT include .m
# <output-file>: output file name; may include path
nohup matlab -nodisplay -nosplash -r $1 >! $2 &
- Change the execute attribute of mbatch
katana% chmod +x mbatch
- It is imperative to include exit as the last line of your
top level m-file (i.e., the "main" program). Otherwise, tens of megabytes of junk will appear in the
output file.
- The application script or function m-file must explicitly incorporate
the appropriate resource allocation procedure in the body of the
m-file. Specifically, as shown in the Parallel
Paradigms section
above, the dfeval example requires no resource allocation, while
the drange and spmd examples both require matlabpool
plus spmd.
- mbatch script usage example
katana% mbatch drange_example mydrangeoutput
The PCT requires Java, so be sure not to include "-nojvm". The above job runs in the background. As soon as it encounters matlabpool open or dfeval, a batch request will be submitted to the batch queue. The "nohup" ensures that the job will continue to wait and eventually run to completion after you have logged out.
PCT batch processing summary
- At present, there are 32 remote worker licenses available on the
Katana Cluster for users with SCF accounts.
- A user can submit as many jobs as desired. However, no more than 16 processors (and 16 remote worker licenses for PCT applications) can be in the run state at any time by the same user.
- Users using the PCT on their local computers are limited to 8 local
workers -- provided that they have multicore hardware.
- The Katana run time limit is 24 hours which is the default for the
SGE configuration.
- A PCT job submitted to the Katana batch queue must wait until both the requested
number of processors and equal number of remote worker
licenses are available. Consequently on a loaded system, the wait time for a
queued parallel MATLAB job may be longer than non-PCT jobs.
Notes
- Many functions that provide information about standard arrays also work for codistributed
arrays; examples include ndims, length, size, and isa. However, numel
does not work on codistributed arrays: it returns the value one (1).
- You can only launch one parallel interactive session from each MATLAB
session. You can, however,
launch an additional copy of MATLAB and then proceed from there to
launch another parallel interactive session.
- MATLAB compiler may be used to compile parallel m-files into
executables. However, the compiled executable cannot run on local
workers. Also, the MATLAB compiler does not support Simulink.
(more details...)
Document Name: MATLAB PCT
Author/Maintainer: Kadin Tseng (kadin@bu.edu)
Keywords: matlab, parallel, computing, toolbox, plotting, math
Machines List: Katana Cluster
Related Help Pages: MathWorks' PCT User's Guide
Created February 12, 2009; Last Revised February 12, 2009; Last Modified August 31, 2009
URL of this document: http://scv.bu.edu/documentation/software-help/mathematics/PCT/
When matlabpool is used, codistributed arrays (on the workers) appear as
composite arrays on the client. Their contents cannot be
accessed directly from the client. To make them accessible on the client, you
need to use the gather function, either to collect the array from all
workers onto one worker or to replicate it on all workers.
You will encounter errors if gather is not used.
>> spmd
[A, x, b] = linearSystem(M, C, i);
y = A\b; % solves for y
end
>> y{1}
??? Error using ==> Composite.subsref at 72
. . .
It is illegal to retrieve a portion of a
codistributed array
To make y accessible on the client:
>> spmd
y1=gather(y, 1); % collects y to worker 1
end
>> y1{1}
ans =
2.0000
2.0000
2.0000
Note that if you use pmode, gather is not necessary.