Example SLURM Script for Comet/TSCC Systems

A lot of people have recently been asking me about help with the SLURM system at UCSD, so I decided to write a blog post to help people quickly learn how to use it and get their jobs running. I have a simple script that can be executed via the sbatch command that forks jobs in a job array. This allows you to run multiple of the same jobs in parallel on the cluster.

So say you have a bunch of scripts you want to run in parallel on a SLURM system. We can put all the script commands into a file (calls.txt):

calls.txt:

script1.sh
script2.sh
script3.sh

Then we can create an sbatch script that specifies a job array that runs through these commands and executes them in a parallel job array. Here is the job array script for sbatch (runTest.sh):

runTest.sh:

#!/bin/bash
#SBATCH --account=xxxx
#SBATCH --partition=shared
#SBATCH -N 1 # Ensure that all cores are on one machine
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH -t 0-00:10 # Runtime in D-HH:MM
#SBATCH -o outputfile%a # File to which STDOUT will be written
#SBATCH -e outputerr%a # File to which STDERR will be written
#SBATCH --job-name=TestJob
#SBATCH --mail-type=ALL # Type of email notification- BEGIN,END,FAIL,ALL
#SBATCH --mail-user=xxxx@ucsd.edu # Email to which notifications will be sent
#SBATCH --array=1-3%3

linevar=`sed $SLURM_ARRAY_TASK_ID'q;d' calls.txt`
eval $linevar

As one can see, the sbatch script has several parameters. Some of the useful ones that one should specify:

account: The university PI account on SLURM (different for each lab)
partition: shared vs. compute. “shared” is generally what people use if you just want to use the cluster for straight-forward jobs. “compute” specifies a more reserved use of the cluster that needs more time (but also costs more).
nodes: specifies how many nodes to use (each node has 24 cores)
ntasks-per-node: how many tasks there are (in this case, I’m treating this all as 1 task).
cpus-per-task: How many cores to use per task in the job array.
runtime (how long to run any 1 job). Runtime is specified in D-HH:MM
-o specifies that outputfiles and -e specifies error files. You can just keep these as is for now (and change once you try running it if need be).
SLURM has the option of sending you email notifications. See comments in script
array=1-3%3 specifies that the job array contains three jobs (i.e. calls.txt has three lines) and %3 specifies they should be run 3 at a time (i.e. all at once). If you have 50 jobs you want to run 10 at a time, the command is array=1-50%10.

All the job array script does is step through calls.txt and fork them to the cluster in batches that are specified by the “array” command.

Then on Comet/TSCC,all one has to do is:
sbatch runTest.sh

You can do the following command to check the status of your job array in the Comet or TSSC queue:
squeue -u yourusername

Hopefully this makes use of SLURM a lot simpler for everyone. Cheers!

Example SLURM Script for Comet/TSCC Systems

Share this: