GPC: Job Scheduler

Overview

Resource management and load balancing on the GPC are handled by its job scheduler. Running a batch job begins with creating a wrapper script, followed by submitting it to one of the queues.

Queues

The following queues are available on the GPC:

Available Queues
Queue     Use
all.q     General jobs. 192GB of memory available per node.
himem.q   High-memory jobs. 256GB of memory available per node.

 

Creating Job Wrapper Scripts

Job submission scripts can be used to run batch jobs on the GPC. These scripts contain options for the job scheduler (specified using #$), as well as the commands needed to start your job. The following table contains a list of common scheduler options:

Scheduler Options
Option            Purpose
-q queue_list     Defines one or more queues which may be used to run your job. This can be any of the queues listed in the table above.
-pe pe_name #     Defines the parallel environment to use with '#' processes.
-N name           Defines a name for your job. This will be displayed when using qstat.
-o filename       Defines the file to use for standard output logging.
-e filename       Defines the file to use for error logging.
-j y              Merges standard and error output into a single log file, which can be specified with the '-o' option.
-j n              Writes standard and error output to separate log files, which can be specified with the '-o' and '-e' options.
-m b,e            Enables job notifications. Mail will be sent when the job begins (b), ends (e), or aborts (a). One or more of these options can be specified, separated by commas.
-M address        Defines the email address to use for job notifications.
-cwd              Executes the job from the current working directory.
-S /bin/bash      Defines the shell to use for your job. The default is /bin/bash.
-l h_rt=03:30:00  Defines a time limit of 3 hours and 30 minutes for the job.
-t 1-10:1         Defines an array job, with subtask ids counting from 1 to 10 with an increment of 1. The starting id must be less than the ending id, and the increment must be greater than 0. This essentially creates 10 identical job submissions from a single job script.
-tc 5             Defines the number of concurrent tasks for an array job. This example would allow 5 array tasks to run on the cluster at a time.

  (Note: A full list of these options can be found in the OPTIONS section of man qsub.)

 

Sample job scripts are provided below:

Example Serial Job

#!/bin/bash

#$ -q all.q

#$ -cwd

#$ -S /bin/bash

#$ -o output.log

#$ -e error.log

#$ -l h_rt=00:10:00            # Run for a max of 10 minutes

 

# Enable Additional Software

. /etc/profile.d/modules.sh

module load shared

 

# Run the job commands

./myprogram

This example can be used for a serial job in the all.q queue.

 

 

Example Single Node Parallel Job

Shared-memory jobs run within a single node. Each node offers up to 40 slots, though the per-user slot limit is slightly lower.

#!/bin/bash

#$ -q all.q

#$ -pe openmp 25

#$ -cwd

#$ -S /bin/bash

#$ -o output.log

#$ -e error.log

#$ -l h_rt=00:10:00

 

# Enable Additional Software

. /etc/profile.d/modules.sh

module load shared

 

# Run the job commands

export OMP_NUM_THREADS=$NSLOTS

./myprogram

This example can be used for an OpenMP job that uses 25 threads. The scheduler sets $NSLOTS to the number of slots granted to the job, and the script exports it as OMP_NUM_THREADS. The scheduler will place an OpenMP job on a single node.
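On the cluster the scheduler sets $NSLOTS for you; run outside the scheduler, the export above would leave OMP_NUM_THREADS empty. A defensive default, sketched below and not required on the cluster, keeps the same script usable for local testing:

```shell
# Sketch: default to one thread when $NSLOTS is not set by the scheduler,
# so the same script can also be run outside a job for local testing.
export OMP_NUM_THREADS=${NSLOTS:-1}
echo "Using $OMP_NUM_THREADS OpenMP threads"
```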

 

 

Example Multi Node Parallel Job

#!/bin/bash

#$ -pe openmpi 25

#$ -cwd

#$ -S /bin/bash

#$ -o output.log

#$ -e error.log

#$ -l h_rt=00:10:00

 

# Enable Additional Software

. /etc/profile.d/modules.sh

module load shared openmpi/gcc

 

# Run the job commands

mpirun -n $NSLOTS -bind-to hwthread ./myprogram

This example can be used for an OpenMPI job that uses 25 processes. The scheduler will first fill all available slots on one node, then spill onto additional nodes for the remaining slots.

 

 

Job Management

Interactive jobs can be started using qlogin. This opens a shell on the compute node with the lowest load.

ks347@gpc:~$ qlogin

Your job 323 ("QLOGIN") has been submitted

waiting for interactive job to be scheduled ...

Your interactive job 323 has been successfully scheduled.

Establishing /usr/global/sge/bin/qlogin_wrapper session to host node04 ...

/usr/bin/ssh -Y -p 52039 node04

 

Last login: Wed Oct 14 17:02:33 2015 from gpc

ks347@node04:~$

From here, you can run programs directly on the compute node. Please note that it is best to avoid running programs on the head node, since it manages all of the compute nodes and provides access to the cluster from remote machines.

 

 

Submitting Batch Jobs

Batch jobs can be submitted using a wrapper script with the qsub command.

ks347@gpc:~/dev/MyJob$ qsub submit.sh

Your job 324 ("submit.sh") has been submitted

 

 

Submitting Matlab Communication Jobs

ks347@gpc:~/dev/MyJob$ qsub-mparfor MatlabScript.m

 

 

Submitting Matlab Independent Jobs

ks347@gpc:~/dev/MyJob$ qsub-mtasks MatlabScript.m

 

 

Deleting Jobs

The qdel command allows you to delete a job by JobID.

ks347@gpc:~/dev/MyJob$ qdel 325

ks347 has registered the job 325 for deletion

 

 

Monitoring Jobs

The qstat command shows all jobs that are currently running or waiting in the job queue. Each job is listed with a state such as r (running) or qw (queued, waiting).

[ks347@gpc Laplace]$ qstat -f

queuename                      qtype resv/used/tot. load_avg arch          states

---------------------------------------------------------------------------------

all.q@node01                   BIP   0/0/2          0.01     lx-amd64

---------------------------------------------------------------------------------

all.q@node02                   BIP   0/2/2          0.01     lx-amd64

     26 0.55500 LaplaceMPI ks347        r     11/09/2015 10:46:22     2 1

---------------------------------------------------------------------------------

all.q@node03                   BIP   0/0/2          0.01     lx-amd64

---------------------------------------------------------------------------------

all.q@node04                   BIP   0/0/2          0.01     lx-amd64

     26 0.55500 LaplaceMPI ks347        r     11/09/2015 10:46:22     2 1

 

############################################################################

 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS

############################################################################

     26 0.00000 LaplaceMPI ks347        qw    11/09/2015 10:46:15     4 5-8:1

     27 0.55500 LaplaceMPI ks347        qw    11/09/2015 10:46:16     4 1-8:1

     28 0.55500 LaplaceMPI ks347        qw    11/09/2015 10:46:17     4 1-8:1

   

Advanced Job Wrappers

 

Array Job Wrapper

An array job can be created to submit a batch of similar tasks. An example is as follows:

#!/bin/bash

#$ -cwd

#$ -q all.q

#$ -t 1-20:1

#$ -tc 2

#$ -N QUEUE_ARRAY_TEST

 

# Enable Additional Software

. /etc/profile.d/modules.sh

module load shared openmpi/gcc

 

# Set per-task variables. $SGE_TASK_ID can be used to vary input to each task.

# Each task will have a unique value counting from 1 to the max number of tasks.

let i1=$SGE_TASK_ID

let i2=$(($SGE_TASK_ID+1000))

 

# Run the job commands using the per task variables as input

echo "Task: $i1 $i2"

./myprogram -a $i1 -b $i2

This script can be submitted using a standard qsub. The -t option specifies start_task_number-end_task_number:task_stride. The scheduler will create 20 tasks in the queue and allow at most 2 (specified by -tc) to run on the nodes at the same time.
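A common pattern, sketched below with a hypothetical inputs.txt, is to use $SGE_TASK_ID to select a different input file for each task:

```shell
#!/bin/bash
# Sketch: map $SGE_TASK_ID to one line of an input list file.
# Inside a real array job the scheduler sets SGE_TASK_ID; here we
# default it to 1 so the snippet also runs outside the scheduler.
: "${SGE_TASK_ID:=1}"

# Hypothetical input list: one input filename per line.
printf 'a.dat\nb.dat\nc.dat\n' > inputs.txt

# Pick the line matching this task's id.
INPUT=$(sed -n "${SGE_TASK_ID}p" inputs.txt)
echo "Task $SGE_TASK_ID will process: $INPUT"
```

Each task in the array then reads the same script but processes a different file.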

 

 

Compute node scratch space

Each compute node has ~1TB available for use as scratch space. I/O intensive jobs can move data onto nodes to speed up access during a job’s runtime. An example wrapper is as follows:

#!/bin/bash

#$ -q all.q

#$ -pe openmp 25

#$ -V

 

# Specify a few variables needed for this job

PROGRAM=$HOME/dev/MyProgram/myprogram

DATASET=$HOME/data/datafile

SCRATCHDIR=/scratch/ks347/$JOB_ID

 

# Check whether the scratch directory exists and create as needed

if [[ ! -d "$SCRATCHDIR" ]]

  then

    mkdir -p $SCRATCHDIR

fi

 

# Check whether our data is in scratch and copy as needed

if [[ ! -e "$SCRATCHDIR/datafile" ]]

 then

   cp $DATASET $SCRATCHDIR/datafile

   cp $PROGRAM $SCRATCHDIR/myprogram

fi

 

# Navigate to the scratch dir

cd $SCRATCHDIR

 

# Run our job commands from within the scratch dir

export OMP_NUM_THREADS=$NSLOTS

./myprogram -f datafile -o outfile

 

# Copy the output from the job commands to your homedir

# then delete the scratch dir

cp outfile $HOME/data/outputfile.$JOB_ID

rm -rf $SCRATCHDIR

This job will create a scratch directory on the node that it runs on, copy data and the job into the scratch directory on the node, then copy the job output back to the network home directory. After the job completes, the temporary scratch directory is deleted.
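If a job aborts before the final rm -rf, the scratch directory is left behind on the node. One defensive option is to install a trap near the top of the script so cleanup runs on any exit; the sketch below uses /tmp and $$ so it can run outside the scheduler, where a real wrapper would use /scratch and $JOB_ID:

```shell
#!/bin/bash
# Sketch: remove the scratch directory on any exit, even if the job
# commands fail partway through. Paths here are placeholders for
# local testing; on the cluster use /scratch/<user>/$JOB_ID.
SCRATCHDIR=/tmp/scratch-demo.$$
mkdir -p "$SCRATCHDIR"

# The EXIT trap fires whether the script finishes or aborts.
trap 'rm -rf "$SCRATCHDIR"' EXIT

echo "working in $SCRATCHDIR"
# ... job commands would run here ...
```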

 

Cloud Jobs

 

Overview

The GPC allows jobs to run within Amazon AWS when they are submitted to the gpc-aws.q queue using cmsub. It is important to use cmsub, not qsub, when submitting cloud jobs; accidentally using qsub will result in a job that produces no output.

 

Creating a cloud job wrapper script

There are a few additional options which can be added to existing wrapper scripts to allow them to work with cmsub.

#CMSUB --input    <input file>

    : One input file for the job (can be used multiple times).

 

#CMSUB --input-list    <file>

    : One file containing a list of input files.

 

#CMSUB --output    <output file>

    : One output file for the job (can be used multiple times).

 

#CMSUB --output-list    <file>           

    : One file containing a list of output files.

 

There are a few qsub options that may cause problems if present in the cmsub wrapper script. These are:

#$ -cwd

    : Execute from the current working directory. This is not currently supported by cloud jobs

#$ -V

    : Export all environment variables. This is not currently supported by cloud jobs

#$ -N NAME

    : Cloud jobs with names containing numbers are known to fail

 

Including -cwd or -V in your cloud job wrapper will result in undefined job behavior.

A full example wrapper is shown below:

#!/bin/sh

#$ -q gpc-aws.q

#$ -N MPI-CLOUD

#$ -pe openmpi 8

#$ -S /bin/bash

#$ -e /home/ks347/dev/MPI/errors.log

#$ -o /home/ks347/dev/MPI/output.log

 

#CMSUB --input /home/ks347/dev/MPI/mpi-shell

 

# Enable OpenMPI support

. /etc/profile.d/modules.sh

module load shared openmpi/gcc

 

# Run the job commands

mpirun -n $NSLOTS -bind-to hwthread mpi-shell

Managing Cloud Jobs

Similar to managing jobs submitted via qsub, there are several commands to manage cmsub jobs.

 
Submitting

Cloud jobs can be submitted using the cmsub command.

ks347@gpc:~/dev/NetLogoJob$ cmsub netlogo-cloud.sh

Submitting job: netlogo-cloud.sh(sge-521) [sge:521] ... OK

 

Monitoring

Cloud jobs can be monitored in two ways. The first is using qstat to check the status of the scheduler queue:

ks347@gpc:~/dev/NetLogoJob$ qstat -f -q gpc-aws.q

queuename                      qtype resv/used/tot. load_avg arch          states

---------------------------------------------------------------------------------

gpc-aws.q@aws-gpc-storage-temp B     0/0/16          -NA-     -NA-          au

 

############################################################################

 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS

############################################################################

    521 0.00000 MPICompile ks347        hqw   02/10/2016 09:53:13     1

 

 

When the cloud nodes are idle for more than 55 minutes, they will power down to reduce Amazon’s hourly charges. If you submit a job to the cloud when all of the cloud nodes are powered off, you will see something similar to the above. Your job will be listed in the PENDING JOBS section with an hqw status while cnode001 boots within Amazon.

To check the status of your job's cloud data transfer, you can use the -s option with cmsub, as shown below:

ks347@gpc:~/dev/NetLogoJob$ cmsub -s 521

Status for sge job with id 521 is: Booting storage node ...

 

After the cloud node has booted and begins processing your job, cmsub -s will report the status as Running.

 

 

Removing

The qdel command can be used to remove cloud jobs from the queue.

ks347@gpc:~/dev/NetLogoJob$ qdel 521