LSF (Load Sharing Facility) Basics
Description
LSF is the batch system used on the IBM p655 systems. LSF may be run via the command line or through a graphical user interface (GUI). For details on the command-line version see the lsfbatch man page, and for the GUI version see the xlsbatch man page.
Highlights: Highly configurable, X-Windows interface.
Availability and Setup
LSF is available on the IBM p655 (twister.bu.edu, scrabble.bu.edu, marbles.bu.edu, crayon.bu.edu, litebrite.bu.edu, hotwheels.bu.edu, jacks.bu.edu, playdoh.bu.edu, and slinky.bu.edu).
The batch system is highly configurable and continues to be tuned. There are limits to what the system will allow us to do in terms of configuration (for example, it is not possible to move a job that has been started on one machine onto another machine in the middle of execution). Currently, the overall goals behind the configuration are:
- Never oversubscribe the processors.
- Minimize wait times.
Please also read the Usage policies and batch section of the document SCF User Information for a description of the batch system on the IBM p655. A table listing all of the available batch queues is online at http://scv.bu.edu/SCV/scf-techsumm.html#QUEUES.
Using LSF
Jobs that take less than 10 minutes of CPU time may be run interactively on all of the systems. In most cases, a process reaper will kill jobs that exceed this limit. The exception is a job that runs for more than 10 minutes but uses only 25% of a single processor. This exception allows users to keep jobs running interactively or in the background when the processes don't require much CPU time (e.g., emacs, xbiff).
Jobs which require more than 10 minutes of CPU time must be submitted through the batch system using LSF. There are several ways to submit a batch job. One method is to write a short script containing your run command. Make sure that you set the execute bit for this script (see the man page for chmod if you don't know how to do this). A sample script for a single-processor job is shown below:
#!/bin/csh -f
progname < infile > outfile
exit
The progname < infile > outfile line represents the command used to run your code.
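As a minimal sketch of the steps just described, the commands below write the single-processor script to a file and set its execute bit. The file name run_job.csh is a hypothetical choice; progname, infile, and outfile are the same placeholders used above.

```shell
# Write the sample job script to a file. run_job.csh is a hypothetical name.
cat > run_job.csh <<'EOF'
#!/bin/csh -f
progname < infile > outfile
exit
EOF

# Set the execute bit so the script can be run.
chmod u+x run_job.csh
```

Running ls -l run_job.csh afterwards should show an x in the owner permissions.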
For a multiprocessing job using OpenMP the number of processors is specified with the OMP_NUM_THREADS environment variable:
#!/bin/csh -f
setenv OMP_NUM_THREADS N
progname < infile > outfile
exit
where N is the number of processors required.
For a multiprocessing job using MPI, the number of processors is specified with the -procs command-line option to poe:
#!/bin/csh -f
poe progname < infile > outfile -procs N
exit
To run these scripts under LSF, use the bsub command:
bsub -q queuename scriptname
It is important that you submit your job to the right queue. Each queue is intended to be used by jobs of a specific size (number of processors) and duration (wall clock limit). See the queue summary for a description of the queues. Alternatively, you can use the bqueues command:
bqueues [-l] [queuename]
to get queue descriptions as well as current utilization.
Users on multiple projects can control which project their job is accounted to by using the -P flag to bsub:
bsub -P project_name
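The -q and -P flags can be given on the same command line. The sketch below wraps that in a hypothetical helper, submit_job, which is not part of LSF. DRY_RUN is likewise an assumed variable that makes the helper print the command instead of running it, which can be useful for checking the arguments first. The names queuename, project_name, and scriptname remain placeholders.

```shell
# submit_job QUEUE PROJECT SCRIPT
# Hypothetical helper (not part of LSF): submits SCRIPT to QUEUE and
# accounts the job to PROJECT. With DRY_RUN set, it only prints the command.
submit_job() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "bsub -q $1 -P $2 $3"
    else
        bsub -q "$1" -P "$2" "$3"
    fi
}

# Print the command that would be run, without submitting anything.
DRY_RUN=1 submit_job queuename project_name scriptname
```

Dropping DRY_RUN=1 from the last line submits the job for real.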
The bjobs command will show you the status of all of your pending and running jobs. To show the status of all of your jobs in a particular queue, run:
bjobs -q queuename
To show the status of all jobs (including those of users other than yourself), run:
bjobs -u all
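When the job list is long, a per-status count can be easier to read. The sketch below is a hypothetical filter, count_by_status, that assumes the default bjobs layout: one header line, with the job status (such as RUN or PEND) in the third column. If the output on your system is formatted differently, the column number will need adjusting.

```shell
# count_by_status: reads bjobs output on stdin and prints "STATUS count" lines.
# Assumption: one header line, and the job status is in column 3.
# Lines with fewer than three fields (e.g. wrapped host lists) are skipped.
count_by_status() {
    awk 'NR > 1 && NF >= 3 { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

# Example use on a system with LSF installed:
#   bjobs -u all | count_by_status
```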
After your job has finished, LSF will send you email telling you whether or not the job completed successfully and, if it failed, reporting the exit code. See the SCF FAQ for information about the meaning of the exit codes. The message also contains a summary of system resources used by the job.
The Motif tool xlsmon is available for detailed monitoring of loads and jobs on the IBM p655.
Additional Help/Documentation
LSF is produced by Platform Computing Corporation, and additional materials on it are available at their WWW site.
If you have questions about using the batch system, please send them to help@twister.bu.edu or, if you think they would be of general interest to the SCF community, to the scfug-l@bu.edu mailing list/newsgroup.
Document Name: lsf
Author/Maintainer: Aaron D. Fuegi (aarondf@bu.edu)
Executable: /usr/local/bin/bsub, /usr/local/bin/xlsbatch, /usr/local/bin/xlsmon
Keywords: load, sharing, batch
Machines List: IBM p655
Related Man Pages: lsf, lsfbatch, xlsbatch, xlsmon
Created April 4, 1996; Last Revised May 29, 2009; Last Modified July 21, 2009
URL of this document: http://scv.bu.edu/documentation/software-help/batchsystem/lsf.html