The Department of Statistics maintains its own small-scale hybrid high performance computing (HPC) and high-throughput computing (HTC) cluster to give students, faculty and staff access to a highly tailored computing environment that meets the growing needs of teaching, learning and research. This cluster uses the widely adopted Simple Linux Utility for Resource Management (SLURM) as its job scheduler.
What sets UW-Madison’s Department of Statistics HPC/HTC cluster apart is its offering of pre-built software and environments, ready for immediate use in the classroom and for the basic job submission types most common in statistics.
Our mission is to provide an easy-to-use, facilitated HPC/HTC service and infrastructure that meets the ever-changing needs of the teaching and learning community in Statistics. The highest priorities are achieving modern research goals with big data and teaching cluster computing skills to learners, including those in a growing undergraduate program. The tools, resources and consulting that support this mission will continue to evolve as the field of Statistics grows.
The Case for Cluster Computing and Batch Submission Scheduling Methods
There are many good reasons to use a batch job scheduler for most types of computation in Statistics. The most basic is logistical: work started from an interactive Linux session is tied to that session and its active display. Suppose you want to run a simulation that takes longer than a full working day. Without a batch scheduler, you would start your application by double-clicking it to open a graphical user interface, or perhaps launch it from the command line. You begin your computation and the application starts running.
At this point you need to leave the office, the coffee shop, or your kitchen table and move on with your day. For the job to finish, your computer must stay powered on and awake, and your session must stay connected the whole time. This is fragile, because many things can interrupt your job, such as:
- Laptop battery dies
- Your operating system freezes
- Wi-Fi connection drops
- Power goes out at home
- A remote server reboots
For these reasons alone, it is not advisable to start long-running jobs from an interactive session on any computer. Instead, batch job submission lets you "hand off" your job to a service that holds your request for resources until they become available, even if that is 12 hours later, and then starts your job automatically. The scheduler can also recover when a computer restarts: it can place the job back in the queue and run it again, and if your code saves checkpoints, the job can resume where it left off rather than starting over. Submitting your job to the cluster's scheduler is therefore the most reliable way to make sure it completes. If a job does fail, the scheduler captures its output and error messages in log files so that you can troubleshoot and submit again.
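A batch job is simply a script with scheduler directives at the top. SLURM reads `#SBATCH` lines from the leading comment block of any script that has a shebang, so a Python file can serve as its own batch script. The sketch below is a minimal example under stated assumptions. The job name, time limit, memory request, file names and the simulation itself are hypothetical placeholders, not settings of this cluster, so check the cluster's actual limits before reusing them. `--requeue` asks SLURM to put the job back in the queue after a node failure, subject to the cluster's configuration. A requeued job runs the script again from the top, so the checkpoint file is what lets it skip the replicates it already finished.

```python
#!/usr/bin/env python3
#SBATCH --job-name=long-sim
#SBATCH --time=24:00:00
#SBATCH --mem=4G
#SBATCH --output=long-sim_%j.out
#SBATCH --error=long-sim_%j.err
#SBATCH --requeue
"""Hypothetical long-running simulation that checkpoints its progress.

All #SBATCH values above are illustrative assumptions, not cluster defaults.
If SLURM requeues this job, the script restarts from the top; the checkpoint
file lets it skip replicates that already finished.
"""
import json
import os
import random

CHECKPOINT = "long_sim_checkpoint.json"  # hypothetical checkpoint file name


def load_checkpoint(path):
    """Return saved progress, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_rep": 0, "results": []}


def save_checkpoint(state, path):
    """Write to a temp file, then rename, so a crash never leaves a half-written checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def one_replicate(rep):
    """Placeholder simulation: mean of 1000 standard normal draws."""
    rng = random.Random(rep)  # seed per replicate so reruns give identical results
    sample = [rng.gauss(0, 1) for _ in range(1000)]
    return sum(sample) / len(sample)


def run(n_reps, path=CHECKPOINT):
    """Run replicates, resuming from the last checkpoint if one exists."""
    state = load_checkpoint(path)
    for rep in range(state["next_rep"], n_reps):
        state["results"].append(one_replicate(rep))
        state["next_rep"] = rep + 1
        save_checkpoint(state, path)
    return state["results"]


if __name__ == "__main__":
    results = run(n_reps=500)
    print(f"finished {len(results)} replicates; "
          f"mean of means = {sum(results) / len(results):.4f}")
```

You would submit this with `sbatch long_sim.py`. SLURM replaces `%j` in the log file names with the job ID, so each run keeps its own output and error logs. Writing a checkpoint after every replicate keeps the example short. For very fast replicates you would likely checkpoint less often to limit disk I/O.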