HPC@Mines Using Slurm

 

 

General Slurm Commands

sbatch
Submit a batch script to Slurm.
squeue
View information about jobs located in the Slurm scheduling queue.
sinfo
View information about Slurm nodes and partitions.
scancel
Signal or cancel jobs or job steps that are under the control of Slurm.
scontrol
View Slurm configuration and state. (Example: [joeuser@mio001 ~]$ scontrol show node phi001)
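
For example, a typical cycle of submitting, monitoring, and cancelling a job might look like the following. The script name myscript.sh and the job ID 12345 are illustrative:

[joeuser@mio001 ~]$ sbatch myscript.sh
Submitted batch job 12345
[joeuser@mio001 ~]$ squeue -u joeuser
[joeuser@mio001 ~]$ scancel 12345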

 

Rosetta Stone

rosetta.pdf (PDF version; a text-only version is also available)
Shows the mapping between common PBS, Slurm, and LoadLeveler commands.
Slurm Documentation

 

HPC@Mines Specific Commands

There are Slurm-related commands that are unique to HPC@Mines.

slurmnodes
Information about available nodes
slurmjobs
Information about queued and running jobs
expands
A utility for getting the full list of nodes used for a job

"-help" or "-h" options are available for each of the commands

     Example:
[joeuser@mio001 utility]$ printenv SLURM_NODELIST
compute[004-005]
[joeuser@mio001 utility]$  ./expands  $SLURM_NODELIST
compute004
compute004
compute004
compute004
compute005
compute005
compute005
compute005
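
Inside a batch job, the same expansion can be used to build a simple machine file. The location of the expands utility and the output file name below are illustrative; adjust them to your setup:

#!/bin/bash
#SBATCH --nodes=2

# Expand the compact Slurm node list (e.g. compute[004-005]) into one
# hostname per allocated slot and save it for tools that need a host file.
# The path to expands and the output file name are illustrative.
./expands $SLURM_NODELIST > nodes.$SLURM_JOB_ID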

Mio Specific Slurm Commands

The scheduler on Mio has partitions. If you do not care which nodes you run on, you do not need to specify a partition. If you would like to run on your group's nodes, or on the PHI or GPU nodes, you need to specify a partition. As discussed below, the command for submitting a batch job is sbatch script, where script is the name of your batch script. To run in the phi partition, and thus on the phi nodes, the syntax would be sbatch -p phi script.
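
For example, a minimal batch script that requests the gpu partition might look like the following. The job name, node count, time limit, and application command are illustrative placeholders:

#!/bin/bash
#SBATCH --job-name=example       # illustrative job name
#SBATCH --partition=gpu          # equivalent to sbatch -p gpu on the command line
#SBATCH --nodes=1                # illustrative node count
#SBATCH --time=01:00:00          # illustrative walltime request

srun ./my_app                    # replace with your own application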

As of July 14 15:51:39 MDT 2014, the following partitions are defined.

 

[joeuser@mio001 ~]$ sinfo -a
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
compute*     up 6-00:00:00     52  alloc compute[032-047,061,068-083,102-111,119-122,125-129]
compute*     up 6-00:00:00     74   idle compute[000-005,008-031,049-052,054-060,062-067,084-101,112-118,123-124]
phi          up 6-00:00:00      1    mix phi001
phi          up 6-00:00:00      1   idle phi002
gpu          up 6-00:00:00      3   idle gpu[001-003]
hkazemi      up 6-00:00:00      1   idle compute031
anewman      up 6-00:00:00      1   idle compute055
asum         up 6-00:00:00      8   idle compute[051-052,094-099]
cciobanu     up 6-00:00:00      3   idle compute[054,090-091]
cmmaupin     up 6-00:00:00     10   idle compute[016-025]
geco         up 6-00:00:00      6   idle compute[084-089]
hpc          up 6-00:00:00      2   idle compute[004-005]
ireimani     up 6-00:00:00      1  alloc compute102
jbrune       up 6-00:00:00      6   idle compute[000-003,100-101]
lcarr        up 6-00:00:00      2  alloc compute[128-129]
lcarr        up 6-00:00:00     11   idle compute[026-030,062-067]
mganesh      up 6-00:00:00      1  alloc compute061
mganesh      up 6-00:00:00      5   idle compute[056-060]
mooney       up 6-00:00:00      2   idle compute[049-050]
nsulliva     up 6-00:00:00      1  alloc compute122
nsulliva     up 6-00:00:00      1   idle compute123
pconstan     up 6-00:00:00      1  alloc compute125
pconstan     up 6-00:00:00      1   idle compute124
psava        up 6-00:00:00     44  alloc compute[032-047,068-083,103-111,119-121]
psava        up 6-00:00:00      7   idle compute[112-118]
zhiwu        up 6-00:00:00      6   idle compute[010-015]
mlusk        up 6-00:00:00      2  alloc compute[126-127]
mlusk        up 6-00:00:00      4   idle compute[008-009,092-093]
mgpu3        up 6-00:00:00      1   idle gpu003

 

HPC@Mines Runtime Policies

The standard maximum walltime is 6 days:

#SBATCH --time=144:00:00

If you find you do need to request an increased walltime, the official policy is as follows:
each request will be handled on a case-by-case basis.

HPC@Mines strongly encourages other means of tackling larger problems, rather than simply extending the maximum walltime; there are two primary approaches.

  1. Increase the amount of parallelism
    By increasing the number of cores/nodes used in your job, you can often decrease the total walltime needed.
  2. Checkpointing
    Checkpointing is the process of saving the state of the execution, either periodically or on certain events, so that it can be picked up at a later time. This is extremely helpful if you are worried that a crash or error could cause your entire run to be lost; with save points every few hours or days, you only lose the work done since the last checkpoint. A sketch of a checkpoint-friendly job script appears after this list.
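
As a sketch of the checkpointing idea, a job script along the following lines restarts from a saved state and resubmits itself until the run finishes. The application name, checkpoint file, restart flag, completion marker, and script name are hypothetical; the details depend entirely on the code you are running.

#!/bin/bash
#SBATCH --time=144:00:00         # stay within the standard 6-day limit
#SBATCH --nodes=1

# Hypothetical application that periodically writes state.chk and can
# restart from it; substitute your code's own checkpoint/restart options.
if [ -f state.chk ]; then
    srun ./my_app --restart state.chk
else
    srun ./my_app
fi

# If the completion marker has not been written, submit a follow-up job
# that continues from the latest checkpoint (my_job.sh is this script's
# hypothetical file name).
if [ ! -f run.done ]; then
    sbatch my_job.sh
fi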

If you would like help in any of these areas, the HPC@Mines team is, as always, available and willing to help you with the computing aspects of your research; you may email us at hpcinfo@mines.edu. You may also find that first consulting with members of your group, or other peers who are currently running the same code, provides quicker answers to your questions, since they are already familiar with your specific context.


 