Mio Reservations, Node Selection, Interactive Runs

Reservations on Mio

Reservations are no longer required on Mio to evict people from your nodes. In the past people would set a reservation for their nodes and in doing so purge jobs from users not belonging to their group. Now, people need only run the job, selecting to run in their group's partition. See Selecting Nodes on Mio and Running only on nodes you ownbelow.

Selecting Nodes on Mio

There are two ways to manually select nodes on which to run. They can be listed on the command line or by selecting a partition. The "partition" method is discussed in the next section.

We have below a section of the man page for srun that describes how to specify a list of nodes on which to run:

-w, --nodelist=<host1,host2,... or filename>
    Request a specific list of hosts. The job will contain at least these hosts.
    The list may be specified as a comma-separated list of  hosts, a range of hosts
    (compute[1-5,7,...] for example), or a filename.  The host list will be assumed to 
    be a filename if it contains a "/" character. If you specify a max node count 
    (-N1-2) if there are more than 2 hosts in the file only the first 2 nodes will 
    be used in the  request list.   Rather  than  repeating  a host name multiple 
    times, an asterisk and a repitition count may be appended to a host name. For 
    example "compute1,compute1" and "compute1*2" are equivalent.

Example: running the script myscript on compute001, compute002, and compute003...

[joeuser@mio001 ~]sbatch --nodelist=compute[001-003]  myscript

Example: running the "hello world" program /opt/utility/phostname interactively on compute001, compute002, and compute003...

[joeuser@mio001 ~]srun --nodelist=compute[001-003]  --tasks-per-node=4 /opt/utility/phostname
[joeuser@mio001 color]$ 

Running only on nodes you own (or in a particular partition)

Every normal compute node (exceptions are GPU and PHI nodes) on mio is part of two partitions or groupings. They are part of the compute partition and they are part of a partition that is assigned to a research group. That is, each research group has a partition and their nodes are in that partition. The GPU and PHI nodes are in their own partition to prevent people from accidentally running on them.

You can see the partitions that you are allowed to use (compute, phi, gpu and your groups partions) by running the command sinfo. sinfo -node will display which partitions you are allowed to run in. sinfo -a will show all partitions.

Add the option -p partition_name to your srun command run in the named partition.

The default partition is compute which is all of the normal nodes. By default your job can end up on any nodes. Specifying your groups partition will restrict your job to "your" nodes.

Also, starting a job in your groups partition will purge any job running on your nodes that are run under the default partition. Thus, it is not necessary to create a reservation to gain access to your nodes. If you do not run in your partition your jobs have the potential to be deleted by the group owning the nodes.

Running threaded jobs and/or Running with less than N MPI tasks per node

Slurm will try to pack as many tasks on a node as it can to try to fill it so that there is at least 1 task or thread per core. So if you are running less than N MPI tasks per node where N is the number of cores slurm may put additional jobs on your node.

You can prevent this from happening by selecting setting values for the flags --tasks-per-node and --cpus-per-task on your sbatch command line or in you slurm script. The value for --tasks-per-node times --cpus-per-task should be the number of cores on the node. For example, if you are running on 2 16 core nodes you want 8 MPI tasks you might say

--nodes=2 --tasks-per-node=4 --cpus-per-task=4

where 2*4*4=32 or the total number of cores on two nodes.

You can also prevent additional jobs from running on nodes by using the --exclusive flag

Running Interactive

The page http://hpc.mines.edu/bluem/interactive.html describes how to run interactively under slurm. This page is a work in progress. In particular the section on how your environment is propagated to the compute nodes needs to be expanded.