GPC: Job Scheduler


Overview

Resource management and load balancing on GPC are handled by its job scheduler. Running a batch job begins with creating a wrapper script, followed by submitting it to the queue.

  

Creating Job Wrapper Scripts

Batch jobs intended for submission to the queue require a submit script. Sample job scripts are provided below. (Note: scheduler settings are specified using #$ SETTING lines. A full list of these settings can be found under the Options section of man qsub.)

 

 

Serial Job

#!/bin/bash

#$ -q all.q                    # Submit to the all.q queue

#$ -cwd                        # Run from the current working directory

#$ -S /bin/bash                # Use bash as the job shell

#$ -o output.log               # Standard output file

#$ -e error.log                # Standard error file

#$ -l h_rt=00:10:00            # Run for a max of 10 minutes

 

# Enable Additional Software

. /etc/profile.d/modules.sh

module load shared

 

./myprogram

This example can be used for a serial job in the all.q queue.

 

 

OpenMP Job

Shared-memory OpenMP jobs run on a single node and can use up to 40 slots.

#!/bin/bash

#$ -q all.q

#$ -pe openmp 40

#$ -cwd

#$ -S /bin/bash

#$ -o output.log

#$ -e error.log

#$ -l h_rt=00:10:00

 

# Enable Additional Software

. /etc/profile.d/modules.sh

module load shared

 

export OMP_NUM_THREADS=$NSLOTS

./myprogram

This example can be used for an OpenMP job that uses 40 threads. The scheduler sets $NSLOTS to the number of allocated slots, and the wrapper exports this value as OMP_NUM_THREADS. The scheduler will place an OpenMP job entirely on a single node.

 

 

OpenMPI Job

#!/bin/bash

#$ -pe openmpi 120

#$ -cwd

#$ -S /bin/bash

#$ -o output.log

#$ -e error.log

#$ -l h_rt=00:10:00

 

# Enable Additional Software

. /etc/profile.d/modules.sh

module load shared openmpi/gcc

 

mpirun -n $NSLOTS -bind-to hwthread ./myprogram

This example can be used for an OpenMPI job that uses 120 MPI processes. The scheduler will first fill all available slots on a node, then span additional nodes for the remaining slots.

 

 

Job Management

Interactive jobs can be started using qlogin. This will place you in a shell on the compute node with the lowest load.

ks347@gpc:~$ qlogin

Your job 323 ("QLOGIN") has been submitted

waiting for interactive job to be scheduled ...

Your interactive job 323 has been successfully scheduled.

Establishing /usr/global/sge/bin/qlogin_wrapper session to host node04 ...

/usr/bin/ssh -Y -p 52039 node04

 

Last login: Wed Oct 14 17:02:33 2015 from gpc

ks347@node04:~$

From here, you can run programs directly on the compute node. Please note that it is best to avoid running programs on the head node, since it manages all of the compute nodes and provides remote access to the cluster.
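For example, the same environment setup used in the batch wrappers can be applied interactively (myprogram stands for the hypothetical program used in the examples above):

ks347@node04:~$ . /etc/profile.d/modules.sh
ks347@node04:~$ module load shared
ks347@node04:~$ ./myprogram
ks347@node04:~$ exit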

 

 

Submitting Batch Jobs

Batch jobs can be submitted using a wrapper script with the qsub command.

ks347@gpc:~/dev/MyJob$ qsub submit.sh

Your job 324 ("submit.sh") has been submitted

 

 

Submitting Matlab Communication Jobs

ks347@gpc:~/dev/MyJob$ qsub-mparfor MatlabScript.m

 

 

Submitting Matlab Independent Jobs

ks347@gpc:~/dev/MyJob$ qsub-mtasks MatlabScript.m

 

 

Deleting Jobs

The qdel command allows you to delete a job by JobID.

ks347@gpc:~/dev/MyJob$ qdel 325

ks347 has registered the job 325 for deletion

 

 

Monitoring Jobs

The qstat command will show a list of all jobs that are currently running and scheduled to run in the job queue.

[ks347@gpc Laplace]$ qstat -f

queuename                      qtype resv/used/tot. load_avg arch          states

---------------------------------------------------------------------------------

all.q@node01                   BIP   0/0/2          0.01     lx-amd64

---------------------------------------------------------------------------------

all.q@node02                   BIP   0/2/2          0.01     lx-amd64

     26 0.55500 LaplaceMPI ks347        r     11/09/2015 10:46:22     2 1

---------------------------------------------------------------------------------

all.q@node03                   BIP   0/0/2          0.01     lx-amd64

---------------------------------------------------------------------------------

all.q@node04                   BIP   0/0/2          0.01     lx-amd64

     26 0.55500 LaplaceMPI ks347        r     11/09/2015 10:46:22     2 1

 

############################################################################

 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS

############################################################################

     26 0.00000 LaplaceMPI ks347        qw    11/09/2015 10:46:15     4 5-8:1

     27 0.55500 LaplaceMPI ks347        qw    11/09/2015 10:46:16     4 1-8:1

     28 0.55500 LaplaceMPI ks347        qw    11/09/2015 10:46:17     4 1-8:1
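qstat also accepts the standard Grid Engine filtering options; for example, -u limits the listing to a single user's jobs (shown here with the example username used above):

ks347@gpc:~$ qstat -u ks347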

   

Advanced Job Wrappers

 

Array Job Wrapper

An array job can be created to submit a batch of similar tasks. An example is as follows:

#!/bin/bash

#$ -cwd

#$ -q all.q

#$ -t 1-20:1

#$ -tc 2

#$ -N "QUEUE_ARRAY_TEST"

 

# Enable Additional Software

. /etc/profile.d/modules.sh

module load shared openmpi/gcc

 

let i1=$SGE_TASK_ID

let i2=$(($SGE_TASK_ID+1000))

 

echo "Task: $i1 $i2"

./myprogram -a $i1 -b $i2

This script can be submitted with a standard qsub command. The -t option specifies start_task_number-end_task_number:task_stride. The scheduler will create 20 tasks in the queue and allow at most 2 of them (specified by -tc) to run on the nodes at the same time.
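A common variation on this pattern (not part of the example above) is to use $SGE_TASK_ID to select a per-task input file. The fragment below would replace the last lines of the wrapper; the file names and program options are hypothetical:

# Hypothetical per-task input/output selection: task N reads input_N.dat
INPUT="input_${SGE_TASK_ID}.dat"
./myprogram -f "$INPUT" -o "result_${SGE_TASK_ID}.out"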

 

 

Compute node scratch space

Each compute node has ~1TB available for use as scratch space. I/O-intensive jobs can copy data onto a node's local disk to speed up access during the job's runtime. An example wrapper is as follows:

#!/bin/bash

#$ -q all.q

#$ -pe openmp 60

#$ -V

 

PROGRAM=$HOME/dev/MyProgram/myprogram

DATASET=$HOME/data/datafile

SCRATCHDIR=/scratch/ks347/$JOB_ID

 

if [[ ! -d "$SCRATCHDIR" ]]

  then

    mkdir -p $SCRATCHDIR

fi

 

if [[ ! -e "$SCRATCHDIR/datafile" ]]

 then

   cp $DATASET $SCRATCHDIR/datafile

   cp $PROGRAM $SCRATCHDIR/myprogram

fi

 

cd $SCRATCHDIR

 

export OMP_NUM_THREADS=$NSLOTS

./myprogram -f datafile -o outfile

 

cp outfile $HOME/data/outfile.$JOB_ID

rm -rf $SCRATCHDIR

This job creates a scratch directory on the node it runs on, copies the data file and the program into that directory, and copies the job output back to the network home directory when the run finishes. After the job completes, the temporary scratch directory is deleted.
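One possible refinement, not part of the wrapper above: if myprogram exits with an error, the scratch directory would be left behind. Registering the cleanup with a trap (placed after SCRATCHDIR is defined) runs it even on failure:

# Remove the scratch directory when the script exits, even after an error
trap 'rm -rf "$SCRATCHDIR"' EXIT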

 

Cloud Jobs

 

Overview

The GPC allows jobs to run within Amazon AWS when they are submitted to the gpc-aws.q queue using cmsub. It is important to use cmsub, not qsub, when submitting cloud jobs; accidentally using qsub will result in a job that produces no output.

 

Creating a cloud job wrapper script

There are a few additional options that can be added to existing wrapper scripts to allow them to work with cmsub; a short example of the staging directives follows the list below.

#CMSUB --input    <input file>

    : One input file for the job (can be used multiple times).

 

#CMSUB --input-list    <file>

    : One file containing a list of input files.

 

#CMSUB --output    <output file>

    : One output file for the job (can be used multiple times).

 

#CMSUB --output-list    <file>           

    : One file containing a list of output files.
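For example, input and output staging directives in a wrapper preamble might look like the following (the file paths are hypothetical):

#CMSUB --input /home/ks347/dev/MyJob/datafile
#CMSUB --output /home/ks347/dev/MyJob/results.out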

 

There are a few qsub options that may cause problems if present in the cmsub wrapper script. These are:

#$ -cwd

    : Execute from the current working directory. This is not currently supported by cloud jobs.

#$ -V

    : Export all environment variables. This is not currently supported by cloud jobs.

#$ -N NAME

    : Cloud jobs with names containing numbers are known to fail.

 

Including -cwd or -V in your cloud job wrapper will result in undefined job behavior.

A full example wrapper is shown below:

#!/bin/sh

#$ -q gpc-aws.q

#$ -N MPI-CLOUD

#$ -pe openmpi 8

#$ -S /bin/bash

#$ -e /home/ks347/dev/MPI/errors.log

#$ -o /home/ks347/dev/MPI/output.log

 

#CMSUB --input /home/ks347/dev/MPI/mpi-shell

 

# Enable OpenMPI support

. /etc/profile.d/modules.sh

module load shared openmpi/gcc

 

mpirun -n $NSLOTS -bind-to hwthread mpi-shell

Managing Cloud Jobs

As with jobs submitted via qsub, there are several commands for managing cmsub jobs.

 
Submitting

Cloud jobs can be submitted using the cmsub command.

ks347@gpc:~/dev/NetLogoJob$ cmsub netlogo-cloud.sh

Submitting job: netlogo-cloud.sh(sge-521) [sge:521] ... OK

 

Monitoring

Cloud jobs can be monitored in two ways. The first is using qstat to check the status of the scheduler:

ks347@gpc:~/dev/NetLogoJob$ qstat -f -q gpc-aws.q

queuename                      qtype resv/used/tot. load_avg arch          states

---------------------------------------------------------------------------------

gpc-aws.q@aws-gpc-storage-temp B     0/0/16          -NA-     -NA-          au

 

############################################################################

 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS

############################################################################

    521 0.00000 MPICompile ks347        hqw   02/10/2016 09:53:13     1

 

 

When the cloud nodes are idle for more than 55 minutes, they will power down to reduce Amazon’s hourly charges. If you submit a job to the cloud when all of the cloud nodes are powered off, you will see something similar to the above. Your job will be listed in the PENDING JOBS section with an hqw status while cnode001 boots within Amazon.

To check the status of your job’s cloud data transfer, you can use the -s option with cmsub, as shown below:

ks347@gpc:~/dev/NetLogoJob$ cmsub -s 521

Status for sge job with id 521 is: Booting storage node ...

 

After the cloud node has booted and begins processing your job, cmsub -s will report the status as Running ...

 

 

Removing

The qdel command can be used to remove cloud jobs from the queue.

ks347@gpc:~/dev/NetLogoJob$ qdel 521