
HPC@Mines Using Slurm



General Slurm Commands

---sbatch - Submit a batch script to Slurm.
---squeue - View information about jobs in the Slurm scheduling queue.
---sinfo - View information about Slurm nodes and partitions.
---scancel - Signal or cancel jobs or job steps that are under the control of Slurm.
---scontrol - View Slurm configuration and state. (Example: scontrol show node phi001)


Rosetta Stone


Shows the mapping between common PBS, Slurm, and LoadLeveler commands. Available as a PDF version and a text-only version.
Slurm Documentation


HPC@Mines Specific Commands

There are Slurm-related commands that are unique to HPC@Mines. They provide:

---Information about available nodes
---Information about the queue and running jobs
---A utility (expands, used in the example that follows) for getting a full list of the nodes used for a job

Each of these commands accepts the "-help" or "-h" option.

[joeuser@mio001 utility]$ printenv SLURM_NODELIST
[joeuser@mio001 utility]$  ./expands  $SLURM_NODELIST
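The source of expands is not shown here, so the following is only a minimal bash sketch of what such an expansion does, assuming a compressed list in the form seen in sinfo output (for example compute[000-003,100-101]). The function name expand_nodelist is hypothetical, and it handles only a plain hostname or a single prefix[...] group. On a system with Slurm installed, scontrol show hostnames "$SLURM_NODELIST" performs a similar expansion.

```shell
#!/bin/bash
# Hypothetical helper (not the HPC@Mines expands utility): expand one
# compressed node list such as "compute[000-003,100-101]" into one
# hostname per line. Handles a plain hostname or a single prefix[...] group.
expand_nodelist() {
    local list="$1"
    # A name without brackets is already a single host.
    if [[ "$list" != *"["* ]]; then
        echo "$list"
        return 0
    fi
    local prefix="${list%%\[*}"      # text before the first "["
    local body="${list#*\[}"         # text after the first "["
    body="${body%\]*}"               # drop the closing "]"
    local part lo hi width i
    local IFS=','                    # split the bracket contents on commas
    for part in $body; do
        if [[ "$part" == *-* ]]; then
            lo="${part%-*}"
            hi="${part#*-}"
            width=${#lo}             # keep the zero padding, e.g. 003
            # 10# forces base 10 so that 008 is not read as octal.
            for (( i = 10#$lo; i <= 10#$hi; i++ )); do
                printf '%s%0*d\n' "$prefix" "$width" "$i"
            done
        else
            echo "${prefix}${part}"
        fi
    done
}

expand_nodelist "compute[000-003,100-101]"
```

Run as written, the last line lists compute000 through compute003 followed by compute100 and compute101, one per line.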

Mio Specific Slurm Commands

The scheduler on Mio has partitions. If you don't care which nodes you run on, you do not need to specify a partition. If you would like to run on your group's nodes, or on the PHI or GPU nodes, you need to specify a partition. The command for submitting a batch job is "sbatch script", where script is the name of your batch script. To run in the phi partition, and thus on the phi nodes, the syntax would be "sbatch -p phi script".
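The partition can also be requested inside the batch script itself, since --partition is the long form of -p. The script below is a minimal sketch: the job name, node count, and time limit are placeholder assumptions, and describe_job is a hypothetical helper that only reports what Slurm has placed in the job's environment.

```shell
#!/bin/bash
# Placeholder values: adjust the job name, node count, and time for your work.
#SBATCH --job-name=phi_test
#SBATCH --partition=phi
#SBATCH --nodes=1
#SBATCH --time=00:10:00

# Hypothetical helper: report the partition and node list that Slurm
# exported to the job. Outside of a Slurm job both print as "none".
describe_job() {
    echo "partition=${SLURM_JOB_PARTITION:-none} nodes=${SLURM_NODELIST:-none}"
}

describe_job
```

With the directive in the script, a plain "sbatch script" should be enough; a -p option given on the command line would normally take precedence over the directive.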

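As of July 14, 2014 (15:51:39 MDT), the following partitions are defined. The columns are partition name, availability, time limit, number of nodes, node state, and node list.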


[joeuser@mio001 ~]$ sinfo -a
compute*     up 6-00:00:00     52  alloc compute[032-047,061,068-083,102-111,119-122,125-129]
compute*     up 6-00:00:00     74   idle compute[000-005,008-031,049-052,054-060,062-067,084-101,112-118,123-124]
phi          up 6-00:00:00      1    mix phi001
phi          up 6-00:00:00      1   idle phi002
gpu          up 6-00:00:00      3   idle gpu[001-003]
hkazemi      up 6-00:00:00      1   idle compute031
anewman      up 6-00:00:00      1   idle compute055
asum         up 6-00:00:00      8   idle compute[051-052,094-099]
cciobanu     up 6-00:00:00      3   idle compute[054,090-091]
cmmaupin     up 6-00:00:00     10   idle compute[016-025]
geco         up 6-00:00:00      6   idle compute[084-089]
hpc          up 6-00:00:00      2   idle compute[004-005]
ireimani     up 6-00:00:00      1  alloc compute102
jbrune       up 6-00:00:00      6   idle compute[000-003,100-101]
lcarr        up 6-00:00:00      2  alloc compute[128-129]
lcarr        up 6-00:00:00     11   idle compute[026-030,062-067]
mganesh      up 6-00:00:00      1  alloc compute061
mganesh      up 6-00:00:00      5   idle compute[056-060]
mooney       up 6-00:00:00      2   idle compute[049-050]
nsulliva     up 6-00:00:00      1  alloc compute122
nsulliva     up 6-00:00:00      1   idle compute123
pconstan     up 6-00:00:00      1  alloc compute125
pconstan     up 6-00:00:00      1   idle compute124
psava        up 6-00:00:00     44  alloc compute[032-047,068-083,103-111,119-121]
psava        up 6-00:00:00      7   idle compute[112-118]
zhiwu        up 6-00:00:00      6   idle compute[010-015]
mlusk        up 6-00:00:00      2  alloc compute[126-127]
mlusk        up 6-00:00:00      4   idle compute[008-009,092-093]
mgpu3        up 6-00:00:00      1   idle gpu003


HPC@Mines Runtime Policies

The standard maximum walltime is 6 days (144 hours), which in a batch script is:

#SBATCH --time=144:00:00
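The 6-00:00:00 limit printed by sinfo and the 144:00:00 value are the same duration. Slurm also accepts the days-hours:minutes:seconds form for --time, so #SBATCH --time=6-00:00:00 should be equivalent. The arithmetic can be sketched as follows; limit_to_hours is a hypothetical helper that ignores the minutes and seconds fields.

```shell
#!/bin/bash
# Hypothetical helper: convert a "days-hours:minutes:seconds" limit, in the
# form sinfo prints it, into whole hours. Minutes and seconds are ignored.
limit_to_hours() {
    local limit="$1"
    local days="${limit%%-*}"        # part before the "-"
    local clock="${limit#*-}"        # the HH:MM:SS part
    local hours="${clock%%:*}"       # the HH field
    # 10# forces base 10 so that values such as 08 are not read as octal.
    echo $(( 10#$days * 24 + 10#$hours ))
}

limit_to_hours "6-00:00:00"   # prints 144
```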

If you find that you need a longer walltime, the official policy is that each request will be handled on a case-by-case basis.

Rather than simply extending the maximum walltime, HPC@Mines strongly encourages other ways of tackling larger problems. There are two primary approaches:

  1. Increase the amount of parallelism
    By increasing the number of cores or nodes used in your job, you can often decrease the total wall time needed.
  2. Checkpointing
    Checkpointing is the process of saving the state of the execution, periodically or on certain events, so that the run can be picked up again at a later time.  This is extremely helpful if you are concerned that a crash or error could cause your entire run to be lost; with save points every few hours or days, only the work since the last checkpoint is at risk.
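The checkpointing idea can be illustrated with a minimal bash sketch. It assumes the program's state is just a step counter; a real application would save its own data instead. The names CHECKPOINT_FILE, TOTAL_STEPS, and run_with_checkpoints are all hypothetical.

```shell
#!/bin/bash
# Minimal checkpoint/restart sketch. All names here are illustrative.
CHECKPOINT_FILE="${CHECKPOINT_FILE:-checkpoint.txt}"
TOTAL_STEPS="${TOTAL_STEPS:-10}"

run_with_checkpoints() {
    local step=0
    # Resume from the last saved step if a checkpoint exists.
    if [[ -f "$CHECKPOINT_FILE" ]]; then
        step="$(cat "$CHECKPOINT_FILE")"
        echo "Resuming after step $step"
    fi
    while (( step < TOTAL_STEPS )); do
        step=$(( step + 1 ))
        # ... one unit of real work would go here ...
        # Write to a temporary file first, then rename it, so a crash
        # in the middle of a write cannot leave a truncated checkpoint.
        echo "$step" > "${CHECKPOINT_FILE}.tmp"
        mv "${CHECKPOINT_FILE}.tmp" "$CHECKPOINT_FILE"
    done
    echo "Finished at step $step"
}
```

Calling run_with_checkpoints from a batch script, and resubmitting the same script after a crash or a walltime limit, would continue from the last saved step rather than from the beginning.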

If you would like help in any of these areas, the HPC@Mines team is, as always, available and willing to help with the computing aspects of your research; email us at hpcinfo@mines.edu.  You may also get faster answers by first consulting members of your group or other peers who are currently running the same code, since they are already familiar with your specific context.
