SLURM

Overview

The Slurm Workload Manager (formerly the Simple Linux Utility for Resource Management), or Slurm for short, is a free and open-source job scheduler for Linux.

It runs on many of the world’s supercomputers and computer clusters.

Slurm has three key functions.

First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work.

Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes.

Finally, it arbitrates contention for resources by managing a queue of pending work.
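As a rough illustration of how these three functions appear from the user's side, the sketch below assembles the text of a batch script. The `#SBATCH` lines request an allocation (node count, task count, time limit, optional exclusive access), `srun` launches the work on the allocated nodes, and submitting the script places the job in the queue of pending work. The helper `build_batch_script` and all of the values passed to it are hypothetical and exist only for this example; the directives use standard `sbatch` options.

```python
def build_batch_script(job_name, nodes, ntasks, time_limit, command, exclusive=False):
    """Return the text of a Slurm batch script (hypothetical helper, illustration only)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",   # label shown in the queue
        f"#SBATCH --nodes={nodes}",         # number of compute nodes to allocate
        f"#SBATCH --ntasks={ntasks}",       # total number of tasks to run
        f"#SBATCH --time={time_limit}",     # wall-clock limit for the allocation
    ]
    if exclusive:
        # Ask for nodes that are not shared with other jobs.
        lines.append("#SBATCH --exclusive")
    lines.append("")
    # srun starts the command on the allocated nodes.
    lines.append(f"srun {command}")
    return "\n".join(lines) + "\n"


# Example values are assumptions chosen for the demo.
script = build_batch_script("demo", 2, 4, "00:10:00", "hostname", exclusive=True)
print(script)
```

The generated text could be saved to a file and submitted with `sbatch`; the pending job would then typically be visible in the output of `squeue` until resources become available.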

Architecture

Slurm has a centralized manager, slurmctld, to monitor resources and work.

Each compute server (node) has a slurmd daemon, which can be compared to a remote shell: it waits for work, executes that work, returns status, and waits for more work.
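The wait, execute, report, repeat cycle described above can be sketched as a toy model in Python. This is not how slurmd is implemented (the real daemons communicate over the network); the names `node_daemon`, `run_demo`, and the in-process queues are assumptions made purely for illustration, and the status strings are borrowed labels.

```python
import queue
import threading


def node_daemon(work_queue, status_queue):
    """Toy stand-in for a per-node daemon: wait, execute, report, repeat."""
    while True:
        job = work_queue.get()              # wait for work
        if job is None:                     # sentinel value: shut down
            break
        job_id, func = job
        try:
            func()                          # execute the work
            status = "COMPLETED"
        except Exception:
            status = "FAILED"
        status_queue.put((job_id, status))  # return status, then loop again


def run_demo():
    """Toy stand-in for the central manager handing out two jobs."""
    work_queue, status_queue = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=node_daemon, args=(work_queue, status_queue))
    worker.start()
    work_queue.put((1, lambda: sum(range(10))))  # a job that succeeds
    work_queue.put((2, lambda: 1 / 0))           # a job that raises an error
    work_queue.put(None)                         # tell the daemon to stop
    worker.join()
    return [status_queue.get() for _ in range(2)]


results = run_demo()
print(results)
```

With a single worker the jobs are processed in submission order, so the first job reports success and the second reports failure.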

Here is an image of its structure:

Slurm Processes