Griz uses the SLURM Workload Manager to distribute resources and schedule jobs. This means that to run computationally or memory intensive jobs, you must submit a job script that requests all the resources you need for that job and lists all the commands you would like to run. Below I outline an example job script and show how to connect to a compute node interactively.
Currently, during this test phase, there are no limits on the resources you can request or on how long jobs can run, but this will likely change when the server is out of testing!
I am still learning the SLURM system (my old university used TORQUE), so I am open to feedback about best practices!
Below is a basic job script called job_script.sh:
#!/bin/bash
#SBATCH --job-name=[job name]
#SBATCH --output="/path/to/desired/directory/%x-%j.out"
#SBATCH --mail-user=[your email]
#SBATCH --mail-type=ALL
#SBATCH --partition=good_lab_cpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=96000
#SBATCH --time=2:30:00 # How long the job should run

## Above is all information for SLURM. It should all appear at the top of
## the script, before your commands to run. SLURM understands lines
## beginning with ## as comments.

## Command(s) to run:
source ~/bin/anaconda3/bin/activate
conda activate biotools
# Make sure the environment with the software you need is activated.
cd /mnt/beegfs/gt156213e/
wgsim -N1000 -S1 genomes/NC_008253_1K.fna simulated_reads/sim_reads.fq /dev/null
bowtie2 -x indexes/e_coli -U simulated_reads/sim_reads.fq -S alignments/sim_reads_aligned.sam
samtools view -b -S -o alignments/sim_reads_aligned.bam alignments/sim_reads_aligned.sam
samtools view -c -f 4 alignments/sim_reads_aligned.bam
samtools view -q 42 -c alignments/sim_reads_aligned.bam
Full documentation of the sbatch options can be found at the following link:
Briefly, the options above are:
--job-name: A name to give your job that will appear in the queue.
--output: Location for SLURM to write log files. The default is the same location as the job script. %x represents the job name and %j represents the job ID.
--mail-user: An email address to receive updates from SLURM about job progress.
--mail-type: The type of email updates you'd like to receive (NONE, BEGIN, END, FAIL, ALL).
--partition: The type of node you want to run your job on. See Node info.
--nodes: The number of nodes you will need for the job.
--ntasks: The number of tasks you need for your job. Each command is a task. If you run commands with parallel or srun, set this to the number of commands you want to run simultaneously.
--cpus-per-task: The number of threads available for each task. If you run a multithreaded program, set this to the number of threads you specify for that program.
--mem: The amount of memory you need for your job. The default unit is MB.
--time: The amount of time needed for your job to run. Ignore for now!
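To illustrate how --ntasks and --cpus-per-task interact, here is a sketch of a header that runs two commands at the same time, each with 4 threads (command_a and command_b are hypothetical placeholders for your own programs):

```shell
#!/bin/bash
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=4

## Each srun call launches one task; the trailing "&" lets both tasks
## run simultaneously, and "wait" pauses the script until both finish.
srun --ntasks=1 command_a &
srun --ntasks=1 command_b &
wait
```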
Job scripts are submitted with the sbatch command:
sbatch job_script.sh
SLURM will read the options in the header of the file and assign resources accordingly before executing the desired commands.
Jobs will be assigned an ID, and a log file will be written to the path specified by --output (by default, the same location as the job script, named with the job ID). Check this file if you encounter errors during your run.
The status of running jobs can be checked by running squeue. A job can be cancelled by running scancel [job id].
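For example, to list only your own jobs and then cancel one of them (the job ID 12345 below is just a placeholder):

```shell
squeue -u $USER    # show only jobs submitted by you
scancel 12345      # cancel the job with ID 12345
```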
I have also provided a script called sres in the good-utils repository that checks node resource availability. Simply clone the repository and add it to your $PATH to run it.
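A minimal sketch of that setup (the repository URL below is a placeholder; substitute the actual location of good-utils):

```shell
git clone https://example.com/good-utils.git ~/good-utils
export PATH="$PATH:$HOME/good-utils"   # add this line to ~/.bashrc to make it permanent
```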
By default, sres prints information for all Good Lab partitions. You can provide it with the name of a particular partition to print info only for that one.
For example:
sres good_lab_cpu
should produce output similar to the following:
NODE NAME    PARTITION(S)                     TOTAL CPUs  ALLOCATED CPUs  FREE CPUs  TOTAL MEM (MB)  ALLOCATED MEM (MB)  FREE MEM (MB)  STATE
compute-0-1  good_lab_cpu,good_lab_large_cpu  72          0               72         772439          0                   765683         IDLE
compute-0-2  good_lab_cpu,good_lab_large_cpu  72          0               72         772439          0                   755300         IDLE
compute-0-3  good_lab_cpu,good_lab_large_cpu  72          0               72         772439          0                   692035         IDLE
compute-0-4  good_lab_cpu,good_lab_large_cpu  72          0               72         772439          0                   722757         IDLE
compute-0-5  good_lab_cpu,good_lab_large_cpu  72          0               72         772439          0                   765922         IDLE
compute-0-6  good_lab_cpu,good_lab_large_cpu  72          0               72         772439          0                   752593         IDLE
compute-0-7  good_lab_cpu,good_lab_large_cpu  72          0               72         772439          0                   753552         IDLE
compute-0-8  good_lab_cpu,good_lab_large_cpu  72          69              3          772439          0                   23577          MIXED
In some instances, it may be preferable to allocate resources on a compute node and run commands manually rather than through a job script. This can be especially useful for debugging and testing workflows, and can be effectively combined with screen.
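A typical pattern is to start the interactive session inside a named screen session so your work survives a dropped connection (standard screen commands):

```shell
screen -S interactive   # start a named screen session
# ...request an allocation and run commands inside it...
# detach with Ctrl-a d, then reattach later with:
screen -r interactive
```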
To run commands interactively, use salloc:
salloc -p good_lab_cpu -N1 --exclusive srun --pty bash
This will allocate one good_lab_cpu node for interactive commands. Many more options are available for salloc. See the following docs for more info:
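For example, to request part of a node rather than an exclusive allocation (a sketch; adjust the CPU and memory values to your needs):

```shell
salloc -p good_lab_cpu -N1 -c 8 --mem=16G srun --pty bash
```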
Be aware that if you request resources with salloc that are unavailable, you may be waiting in the queue, even for an interactive session.
Download the good-utils repository, which includes the interact command to request interactive node allocations. Use interact -h to see its usage.