Slurm - Useful Commands
squeue
squeue is used to view job and job step information for jobs managed by Slurm.
Command | Explanation |
---|---|
squeue | Display jobs in queue |
squeue -u username | Display jobs in queue owned by username |
squeue -t state | Display jobs in a given state, e.g. R = Running, PD = Pending |
squeue -t pd -u username | Options can be combined. This shows all pending jobs for username |
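As a small illustration of combining squeue options in a script, the sketch below counts a user's jobs per state. The helper name count_states is hypothetical; it assumes the standard squeue options -h (no header) and -o "%t" (print only the compact state code). The squeue call is guarded so the snippet does nothing on a machine without Slurm.

```shell
# count_states (hypothetical helper): reads one state code per line on stdin
# and prints "STATE COUNT" for each distinct state.
count_states() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# On a cluster: -h drops the header line, -o "%t" prints only the state code.
if command -v squeue >/dev/null 2>&1; then
    squeue -h -u "$(id -un)" -o "%t" | count_states
fi
```

Keeping the counting logic separate from the squeue call makes it easy to try the function on sample text first.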
scancel
scancel is used to cancel jobs, job arrays or job steps.
Command | Explanation |
---|---|
scancel jobid | Cancel a specific job by jobid |
scancel -u username | Cancel all jobs owned by username |
scancel -t state | Cancel all jobs in a certain state (see below for a list of states) |
scancel -t pd -u username | Cancel all pending jobs owned by username |
scancel jobid_index | Cancel one indexed job in a job array, e.g. scancel 1234_4 |
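Because scancel with -u and -t can remove many jobs at once, it can help to preview the command first. The following is a minimal sketch, not a Slurm feature: cancel_pending and the CONFIRM variable are hypothetical names introduced here, and the state is spelled out as PENDING, which scancel accepts as a state name.

```shell
# cancel_pending (hypothetical helper): prints the scancel command for a
# user's pending jobs, and only runs it when CONFIRM=yes is set.
cancel_pending() {
    user="$1"
    if [ "${CONFIRM:-no}" = "yes" ]; then
        scancel -t PENDING -u "$user"
    else
        echo "dry run: scancel -t PENDING -u $user"
    fi
}
```

Calling cancel_pending username prints the command; CONFIRM=yes cancel_pending username actually cancels the jobs.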
scontrol
scontrol is used to view or modify Slurm configuration and state, including jobs, job steps, nodes, partitions and reservations.
Command | Explanation |
---|---|
scontrol hold jobid | Hold jobid, preventing it from being scheduled |
scontrol release jobid | Release jobid, allowing it to be scheduled |
scontrol show jobid -dd jobid | Show detailed information for jobid (useful for troubleshooting) |
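The detailed job view from scontrol is a block of Key=Value pairs, so a single field can be pulled out with standard text tools when troubleshooting. The sketch below assumes that layout; job_field is a hypothetical helper, and it will not handle values that themselves contain spaces.

```shell
# job_field (hypothetical helper): reads scontrol-style "Key=Value" text on
# stdin and prints the value of the field named in $1.
job_field() {
    tr ' ' '\n' | awk -v key="$1" '
        index($0, key "=") == 1 { print substr($0, length(key) + 2); exit }'
}

# On a cluster (1234 is a placeholder job ID):
#   scontrol show jobid -dd 1234 | job_field Reason
```

For a pending job, fields such as JobState and Reason are usually the first things worth checking.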
sacct
The sacct command displays job accounting data stored in the Slurm database in a variety of forms for your analysis.
Command | Explanation |
---|---|
sacct -u username | Show all jobs by username |
sacct -S 2022-01-01 -u username | Show all jobs since 1 January 2022 for username |
sacct -u username --format=JobID,JobName,MaxRSS,Elapsed | --format restricts the report to the listed fields |
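sacct output is easier to process in scripts when it is pipe-delimited. The sketch below lists the IDs of failed jobs; failed_jobs is a hypothetical helper, and the sacct call assumes the standard options -n (no header) and -P (pipe-delimited output) together with the JobID, State and Elapsed fields. It is guarded so it only runs where sacct is installed.

```shell
# failed_jobs (hypothetical helper): reads "JobID|State|Elapsed" lines on
# stdin and prints the JobID of every record whose State is FAILED.
failed_jobs() {
    awk -F'|' '$2 == "FAILED" { print $1 }'
}

# On a cluster: list this user's failed jobs and job steps since 2022-01-01.
if command -v sacct >/dev/null 2>&1; then
    sacct -u "$(id -un)" -S 2022-01-01 -n -P --format=JobID,State,Elapsed | failed_jobs
fi
```

Job steps such as 101.batch appear as separate records, so a failed job may be listed more than once.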
Slurm Job States
A batch job goes through several states during its execution. Here are the typical states you'll see when you run something like squeue:
Code | State | Explanation |
---|---|---|
PD | Pending | The job is waiting in a queue for allocation of resources |
R | Running | The job is allocated to a node and is running |
CG | Completing | The job is finishing but some processes are still active |
CD | Completed | The job has completed successfully |
F | Failed | The job terminated with a non-zero exit value |
TO | Timeout | The job was terminated by Slurm after reaching its time limit |
S | Suspended | A running job has been stopped with its resources released to other jobs |
ST | Stopped | A running job has been stopped with its resources retained |
A full list of states can be found here: https://slurm.schedmd.com/squeue.html#SECTION_JOB-STATE-CODES
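When reading compact squeue output in a script, the codes listed above can be expanded with a simple lookup. This is only a sketch covering the states in this section; describe_state is a hypothetical name, and TO is expanded to Timeout, the name Slurm's own documentation uses for that code.

```shell
# describe_state (hypothetical helper): expands a compact job state code
# into a readable name; codes not listed in this section print "Unknown".
describe_state() {
    case "$1" in
        PD) echo "Pending" ;;
        R)  echo "Running" ;;
        CG) echo "Completing" ;;
        CD) echo "Completed" ;;
        F)  echo "Failed" ;;
        TO) echo "Timeout" ;;
        S)  echo "Suspended" ;;
        ST) echo "Stopped" ;;
        *)  echo "Unknown" ;;
    esac
}
```

For example, describe_state PD prints Pending.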