Slurm - Useful Commands

squeue

squeue is used to view job and job step information for jobs managed by Slurm.

| Command | Explanation |
| --- | --- |
| squeue | Display jobs in queue |
| squeue -u username | Display jobs in queue owned by username |
| squeue -t state | Display jobs in a given state: R = running, PD = pending |
| squeue -t pd -u username | Options can be combined. This shows all pending jobs for username |
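As a rough illustration of combining these options in a script, the sketch below counts a user's jobs per state. The helper name `count_states` is hypothetical; `-h` (no header) and `-o "%t"` (compact state code only) are standard squeue options. The squeue call is guarded so the snippet does nothing on a machine without Slurm.

```shell
# count_states: read one state code per line on stdin and print
# "STATE COUNT" pairs sorted by state code.
# (count_states is a hypothetical helper name, not part of Slurm.)
count_states() {
    awk 'NF { n[$1]++ } END { for (s in n) print s, n[s] }' | sort
}

# Only query the scheduler when squeue is actually available.
if command -v squeue >/dev/null 2>&1; then
    # -h drops the header line; -o "%t" prints only the compact state code.
    squeue -h -u "${USER:-$(id -un)}" -o "%t" | count_states
fi
```

The counting is kept in a separate function so it can be checked against any list of state codes, not only live squeue output.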

scancel

scancel is used to cancel jobs, job arrays or job steps.

| Command | Explanation |
| --- | --- |
| scancel jobid | Cancel a specific job by jobid |
| scancel -u username | Cancel all jobs owned by username |
| scancel -t state | Cancel all jobs in a certain state (see below for the list of states) |
| scancel -t pd -u username | Cancel all pending jobs owned by username |
| scancel jobid_index | Cancel an indexed job in a job array, e.g. scancel 1234_4 |
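When several tasks of one job array need cancelling, the jobid_index form can be generated rather than typed out. This is a minimal sketch: `array_task_ids` is a hypothetical helper, and the indices 7 and 9 are made up for illustration. The scancel line is deliberately left commented out so the snippet only previews the IDs.

```shell
# array_task_ids: print "jobid_index" for each index given after the job ID.
# (array_task_ids is a hypothetical helper name, not part of Slurm.)
array_task_ids() {
    job_id=$1
    shift
    for index in "$@"; do
        printf '%s_%s\n' "$job_id" "$index"
    done
}

# Preview the IDs first; nothing is cancelled by this line.
array_task_ids 1234 4 7 9

# To actually cancel them, pass the IDs to scancel:
# scancel $(array_task_ids 1234 4 7 9)
```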

scontrol

scontrol is used to view or modify Slurm configuration, including jobs, job steps, nodes, partitions and reservations.

| Command | Explanation |
| --- | --- |
| scontrol hold jobid | Hold jobid, preventing it from being scheduled |
| scontrol release jobid | Release jobid, allowing it to be scheduled |
| scontrol show jobid -dd jobid | Show detailed information for jobid (useful for troubleshooting) |
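The detailed output of scontrol is a long list of Key=Value fields, so for troubleshooting it can help to pull out just one of them. The sketch below assumes the field you want (for example JobState or Reason) has a value without spaces; `job_field` is a hypothetical helper name, and jobid is a placeholder.

```shell
# job_field: read "scontrol show job" output on stdin and print the value
# of one Key=Value field. Assumes the value itself contains no spaces.
# (job_field is a hypothetical helper name, not part of Slurm.)
job_field() {
    tr -s ' \t' '\n\n' |
        awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2); exit }'
}

# Usage (jobid is a placeholder for a real job ID):
# scontrol show jobid -dd jobid | job_field JobState
# scontrol show jobid -dd jobid | job_field Reason
```

For a held or pending job, the Reason field is often the quickest hint as to why it has not started.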

sacct

The sacct command displays job accounting data stored in the Slurm database in a variety of forms for your analysis.

| Command | Explanation |
| --- | --- |
| sacct -u username | Show all jobs owned by username |
| sacct -S 2022-01-01 -u username | Show all jobs for username since 1 January 2022 |
| sacct -u username --format=JobID,JobName,MaxRSS,Elapsed | Use --format to report only specific fields (here the job ID, job name, peak memory use and elapsed time) |
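For further analysis it is often easier to have sacct print machine-readable output and post-process it. The sketch below uses `-n` (no header) and `-P` (fields separated by `|`), both standard sacct options, and converts the Elapsed field to seconds. `elapsed_to_seconds` is a hypothetical helper, and it assumes Elapsed looks like D-HH:MM:SS, HH:MM:SS or MM:SS.

```shell
# elapsed_to_seconds: convert an Elapsed value to a number of seconds.
# Assumes the forms D-HH:MM:SS, HH:MM:SS or MM:SS.
# (elapsed_to_seconds is a hypothetical helper name, not part of Slurm.)
elapsed_to_seconds() {
    awk -v t="$1" 'BEGIN {
        days = 0
        if (split(t, d, "-") == 2) { days = d[1]; t = d[2] }
        n = split(t, p, ":")
        if (n == 3)      secs = p[1] * 3600 + p[2] * 60 + p[3]
        else if (n == 2) secs = p[1] * 60 + p[2]
        else             secs = p[1]
        print days * 86400 + secs
    }'
}

# Only query the accounting database when sacct is actually available.
if command -v sacct >/dev/null 2>&1; then
    sacct -n -P -u "${USER:-$(id -un)}" --format=JobID,Elapsed |
    while IFS='|' read -r job_id elapsed; do
        printf '%s %s\n' "$job_id" "$(elapsed_to_seconds "$elapsed")"
    done
fi
```

For example, `elapsed_to_seconds 01:02:03` prints 3723 and `elapsed_to_seconds 1-00:00:10` prints 86410.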

Slurm Job States

A batch job goes through several states during its execution. These are the states you will typically see in the output of a command such as squeue.

| Code | State | Explanation |
| --- | --- | --- |
| PD | Pending | The job is waiting in a queue for allocation of resources |
| R | Running | The job is allocated to a node and is running |
| CG | Completing | The job is finishing but some processes are still active |
| CD | Completed | The job has completed successfully |
| F | Failed | The job failed with a non-zero exit value |
| TO | Timeout | The job was terminated by Slurm after reaching its time limit |
| S | Suspended | A running job has been stopped with its resources released to other jobs |
| ST | Stopped | A running job has been stopped with its resources retained |

A full list of states can be found here: https://slurm.schedmd.com/squeue.html#SECTION_JOB-STATE-CODES
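In scripts that read the compact codes from squeue, a small lookup can turn them back into readable names. This is only a sketch covering the codes listed above; `state_name` is a hypothetical helper, and TO is given its Slurm name, TIMEOUT.

```shell
# state_name: map a compact squeue state code to a readable name.
# Covers only the codes listed in the table above.
# (state_name is a hypothetical helper name, not part of Slurm.)
state_name() {
    case "$1" in
        PD) echo "Pending" ;;
        R)  echo "Running" ;;
        CG) echo "Completing" ;;
        CD) echo "Completed" ;;
        F)  echo "Failed" ;;
        TO) echo "Timeout" ;;   # Slurm calls this state TIMEOUT
        S)  echo "Suspended" ;;
        ST) echo "Stopped" ;;
        *)  echo "Unknown ($1)" ;;
    esac
}

state_name PD
state_name TO
```

Any code not in the table falls through to the Unknown branch, so the full list linked above is still the reference for less common states.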