BETA High Performance Computing

**This document is in progress**

DISCLAIMER: The Department of Statistics HPC cluster is currently in BETA mode. All users of the HPC system assume responsibility for their work and have the full understanding that the system is currently being tested and configured.

General Information
Infrastructure
Requesting Access to the HPC cluster
Getting Started
Library Paths
Example Submit Scripts

General Information

The Department of Statistics maintains its own small HPC cluster for departmental and individual research use. The project is currently in BETA mode.

Notes from a lecture given by Mike Cammilleri on SLURM are available here.


Infrastructure

The cluster is currently made up of six compute nodes, one head node, and a storage node. Each compute node consists of two Intel Xeon E5-2680 v3 @ 2.50GHz CPUs (12 cores each) and 128 GB of RAM, so with hyperthreading the total cluster resources are 288 cores (6 nodes x 2 CPUs x 12 cores x 2 threads) and 768 GB of RAM.

Two 'partitions' are available for running your jobs, depending on the time required. To view the available partitions:


[mikec@lunchbox] (293)$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
long up 14-00:00:0 4 idle marzano[01-04]
short* up 4-00:00:00 2 idle marzano[05-06]

We see two partitions. The first one listed is named 'long' and has a time limit of two weeks. The second, named 'short', is the default partition (marked with *); its time limit is 4 days. We also see that the 'long' partition has four nodes (marzano[01-04]) and the 'short' partition has two (marzano[05-06]). You can specify which partition to run your job in by using the -p option in your submit script. See examples further below.


Requesting access to Stats HPC

Send mail to lab @ stat.wisc.edu to request access to the HPC submit node, named lunchbox. From the submit node, all your data must reside in the /workspace directory in order for the HPC cluster to read and write it for your job. The HPC cluster is not configured to run from within your AFS home directory. If you have custom R packages or software requests, please email the lab and we will configure the cluster for your needs.


Getting Started - your first SLURM job

SLURM is a resource manager that schedules your jobs on the cluster. Lots of information can be found at SLURM's website. You will use SLURM to submit all jobs.

Slurm commands are located in /s/slurm/bin:

[mikec@lunchbox] (57)$ which sbatch
/s/slurm/bin/sbatch

So you will probably want to include /s/slurm/bin in your PATH in your ~/.bashrc.local file.
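
For example, a minimal sketch of a line you could add to ~/.bashrc.local:

# put the SLURM client tools (sbatch, squeue, sinfo, scontrol) on your PATH
export PATH=/s/slurm/bin:$PATH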

1. log into lunchbox
2. cd /workspace
3. mkdir -p [something, probably your username]
4. cd [something, probably your username]
5. create your script file here
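
As a concrete sketch of steps 1 through 4 (assuming SSH access to lunchbox and a hypothetical username 'yourusername'):

ssh yourusername@lunchbox.stat.wisc.edu   # 1. log into lunchbox
cd /workspace                             # 2. change to the cluster workspace
mkdir -p yourusername                     # 3. create your own directory
cd yourusername                           # 4. work from your directory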

Below is an example of a basic slurm batch script named submit.sh used to submit a single R job called 'myjob.R'.


#!/bin/bash
R CMD BATCH --no-save myjob.R myjob.out

Submit the batch script to SLURM.


[mikec@lunchbox] (31)$ pwd
/workspace/mikec
[mikec@lunchbox] (32)$ ls
submit.sh*
[mikec@lunchbox] (33)$ sbatch submit.sh
Submitted batch job 5499

TIP: It is possible to submit your SLURM batch script from your home directory by including the #SBATCH --workdir=/workspace/ directive, which tells SLURM to use /workspace as the job's working directory and write output there. However, if the program your script runs tries to read from your home directory, it will fail. See the sbatch man page for more info.
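
For example, a sketch of such a script (the /workspace/yourusername path is a placeholder; myjob.R must itself live under /workspace):

#!/bin/bash
#SBATCH --workdir=/workspace/yourusername
# the job runs with /workspace/yourusername as its working directory, so
# slurm-<jobid>.out and myjob.out land there rather than in your AFS home
R CMD BATCH --no-save myjob.R myjob.out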

To view currently running jobs, use scontrol and squeue.


[mikec@lunchbox] (17)$ scontrol show job 5499
JobId=5499 JobName=submit.sh
UserId=mikec(3691) GroupId=mikec(3691) MCS_label=N/A
Priority=110085 Nice=0 Account=mikec QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:05 TimeLimit=4-00:00:00 TimeMin=N/A
SubmitTime=2016-12-06T12:47:28 EligibleTime=2016-12-06T12:47:28
StartTime=2016-12-06T12:47:28 EndTime=2016-12-10T12:47:28 Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=short AllocNode:Sid=lunchbox:3213
ReqNodeList=(null) ExcNodeList=(null)
NodeList=marzano05
BatchHost=marzano05
NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=2,node=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) Gres=(null) Reservation=(null)
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/workspace/mikec/submit.sh
WorkDir=/workspace/mikec
StdErr=/workspace/mikec/slurm-5499.out
StdIn=/dev/null
StdOut=/workspace/mikec/slurm-5499.out
Power=


[mikec@lunchbox] (18)$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
5499 short submit.s mikec R 0:06 1 marzano05

The sinfo command can also be used to show which cluster nodes are up or in use:

[mikec@lunchbox] (19)$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
long up 14-00:00:0 4 idle marzano[01-04]
short* up 4-00:00:00 1 mix marzano05
short* up 4-00:00:00 1 idle marzano06

Specifying resources and other parameters

Below is an example of a SLURM batch script with additional parameters.


#!/bin/bash
#SBATCH -o outputfile.out
#SBATCH -e error.out
#SBATCH -D working_Directory
#SBATCH -J job_name
#SBATCH -t 72:00:00
#SBATCH -p long
#SBATCH --ntasks=48
R CMD BATCH --no-save myjob.R myjob.out

SLURM defaults to slurm-<jobid>.out for all output, but we can specify an output file with -o. We can send errors to a separate file with -e. It is useful to use -D to explicitly set the working directory; this eliminates having to use full paths in the job's execution code below. -J simply gives the job a name and has no bearing on the job's behavior. -t sets the maximum time our job will run; it tells the scheduler that our job should finish sometime before then, and that if it is still running at that point it can be stopped. The -p option tells the scheduler which partition to run your job in; in this example our job will run longer than 4 days, so we specify the 'long' partition. The last option, --ntasks (-n), specifies that we will need 48 CPUs (48 processes will launch from our script). Each user has a limit of 48 CPUs. Request fewer CPUs if you want your job scheduled more quickly.

IMPORTANT: If you set your job's intended requirements, especially -t (time), you will have a greater chance of getting your job queued and completed quickly. If you do not set these parameters, your job will be treated the same as other jobs that have not set them, resulting in "First In, First Out" scheduling, which can mean long wait times. Please set your job's time expectations.
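
For instance, sbatch accepts time limits in several formats, including minutes, HH:MM:SS, and days-HH:MM:SS; either of the following lines (hypothetical values) could appear in a submit script:

#SBATCH -t 36:00:00      # at most 36 hours
#SBATCH -t 2-00:00:00    # at most 2 days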

Other directives for multi-core and multi-node jobs are:


#!/bin/bash
#SBATCH -N 6
#SBATCH --cpus-per-task 8
R CMD BATCH --no-save myjob.R myjob.out

-N tells the scheduler to use 6 compute nodes (individual machines)
--cpus-per-task=8 says use 8 cores on each node

In most cases you'll only be concerned with --cpus-per-task unless you are invoking true parallel tasks with MPI or multi-threading.

NOTE: The #SBATCH directives tell the scheduler what resources we require. It is up to your code to actually use the allotted resources. If you do not allocate the resources with #SBATCH for what your job requires, it will most likely fail.
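
As a hedged illustration of that point, a submit script can hand the allocation to the program it launches through the environment variables SLURM sets for the job; here SLURM_CPUS_PER_TASK is forwarded as MC_CORES, which R's parallel package reads as its default core count (the use of mclapply() inside myjob.R is assumed, not shown):

#!/bin/bash
#SBATCH --cpus-per-task=8
# SLURM exports SLURM_CPUS_PER_TASK to the job environment; forward it so the
# R code sizes its parallelism to the cores actually allocated (default 1)
export MC_CORES=${SLURM_CPUS_PER_TASK:-1}
R CMD BATCH --no-save myjob.R myjob.out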

See the sbatch man page for more info.

Also, our friends at UC Berkeley are using a similar system. Chris Paciorek has useful information regarding SLURM and MPI, among other things, found here. Some of this information will translate to our system. Thanks to Chris Paciorek for assisting us with our SLURM implementation.


Library Paths

Packages for your job must be accessible from every node in the cluster. Therefore, all packages and special libraries must also be located in /workspace/user_name/some_directory. Default paths for R libraries are usually in your AFS space, such as ~username/R/, and this will not work on the HPC cluster.

R

Create a directory in /workspace/username to install packages that R will need while running in the cluster. Here's an example with user 'mikec':

mkdir -p /workspace/mikec/R/3.3
export R_LIBS="/workspace/mikec/R/3.3"

Now you can install R packages to this location:

install.packages("some_package")

Now make sure R will use this new path by setting it with .libPaths(). This adds the library path to your list of library locations. If it is the only place a package is installed, R will load it from there.

> .libPaths( c( .libPaths(), "/workspace/mikec/R/3.3") )
> .libPaths()
[1] "/afs/cs.wisc.edu/u/m/i/mikec/R/x86_64-pc-linux-gnu-library/3.3"
[2] "/usr/local/lib/R/site-library"
[3] "/usr/lib/R/site-library"
[4] "/usr/lib/R/library"
[5] "/workspace/mikec/R/3.3"
>
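
One way (a sketch, not a requirement) to make sure a batch job on the compute nodes also sees this library path is to export R_LIBS in the submit script before launching R:

#!/bin/bash
# point R at the /workspace library before it loads packages on the compute node
export R_LIBS=/workspace/mikec/R/3.3
R CMD BATCH --no-save myjob.R myjob.out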

There is a central location for R packages maintained by the lab in /workspace/software/R/library. You can tell R to check this library (or any other) first by setting your R_LIBS environment variable instead of just adding the path with .libPaths() as in the previous example:

export R_LIBS=/workspace/software/R/library

Then within R we see:

> .libPaths()
[1] "/workspace/software/R/library"
[2] "/afs/cs.wisc.edu/u/m/i/mikec/R/x86_64-pc-linux-gnu-library/3.3"
[3] "/afs/cs.wisc.edu/s/R_LIBS_UBU14-3.3/library"
[4] "/usr/lib/R/site-library"
[5] "/usr/lib/R/library"

Python

If using python from /workspace/software, then packages are available from the default location of /workspace/software/python-x.x.x/lib/pythonX.X/site-packages. These packages are installed and maintained by the lab. If you want to install to your own package location, extract the package and run its setup file with the --prefix option:


/workspace/software/python-3.5.2/bin/python3 /workspace/your/python/package/dir/setup.py install --prefix=/workspace/your/python/package/dir

Or you can use pip with similar options:


/workspace/software/python-3.5.2/bin/pip3 install --install-option="--prefix=/workspace/your/python/package/dir" package_name

To use your custom package location, set the PYTHONPATH environment variable before python execution.


export PYTHONPATH=/workspace/your/python/package/dir
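
For example, a sketch of a submit script that uses the lab-installed Python 3 with a custom package location (myscript.py and the package directory are placeholders):

#!/bin/bash
# make the custom package location visible to Python on the compute node;
# depending on how the package was installed, PYTHONPATH may need to point at
# the .../lib/pythonX.X/site-packages subdirectory under your prefix instead
export PYTHONPATH=/workspace/your/python/package/dir
/workspace/software/python-3.5.2/bin/python3 /workspace/yourusername/myscript.py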


Example Submit Scripts

Julia

When using Julia with the Stat HPC there are a couple of things to note:

1. You can either have your Julia package directory in your working directory at /workspace/yourusername/.julia, or you can request that a package be made available by mailing lab AT stat.wisc.edu. If you want to use your own package directory, remember that it must be in your directory in /workspace/yourusername/some_directory.
2. If using your own julia package directory, be sure to set the JULIA_PKGDIR environment variable in your submit script as in the example script below. Also, if this is the first time using your package and/or package directory, you should initialize it and precompile the package (a command-line sketch of these steps follows the list) by

A) setting your JULIA_PKGDIR environment variable at the prompt (export JULIA_PKGDIR=/workspace/yourusername/.julia),
B) running the julia at /workspace/software/julia-0.5.0/julia on lunchbox.stat.wisc.edu, then
C) initializing your package repository by running Pkg.init(), then
D) adding your desired package, for example, Pkg.add("PhyloNetworks"), and then
E) precompiling the package by running 'using PhyloNetworks;'
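
As a hedged sketch, steps A through E can be run from the lunchbox shell in one pass (using the same example package; substitute your own username, paths, and package name):

export JULIA_PKGDIR=/workspace/yourusername/.julia    # A) set the package directory
# B-E) start the lab's julia, then initialize, add, and precompile in one shot
/workspace/software/julia-0.5.0/julia -e 'Pkg.init(); Pkg.add("PhyloNetworks"); using PhyloNetworks'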

Example using your own julia package directory in your /workspace/yourusername directory:

#!/bin/bash
#SBATCH --mail-type=ALL
#SBATCH --mail-user=yourusername@stat.wisc.edu
export JULIA_PKGDIR=/workspace/yourusername/.julia
/workspace/software/julia-0.5.0/julia /workspace/yourusername/file_with_your_julia_code.jl

Example using the system-wide julia package directory maintained by the Stat Lab (send requests to lab AT stat.wisc.edu):

#!/bin/bash
#SBATCH --mail-type=ALL
#SBATCH --mail-user=yourusername@stat.wisc.edu
export JULIA_PKGDIR=/workspace/software/julia-0.5.0/usr/local/share/julia/site
/workspace/software/julia-0.5.0/julia /workspace/yourusername/file_with_your_julia_code.jl

**This document is in progress**