Free Tier - Technical Information ***Page Under Construction***

Hardware


Execute Nodes

January 2023 update

Free Tier has consolidated to 150 execute/compute nodes.

Standard Nodes


Model:           HPE XL170r
CPU:             Intel Xeon E5-2650 v4
Number of CPUs:  2
Cores per CPU:   12
Total Cores:     24
Memory:          128 GB
Network:         EDR InfiniBand


High Memory Nodes


Free Tier's 32 high-memory nodes have 512 GB of memory each. They are otherwise identical to the standard nodes.


GPU Nodes


Free Tier has two types of GPU nodes, both of which have the same CPUs and system memory (128 GB) as the standard nodes.

  • 12 GPU nodes each have two Nvidia K80 GPUs. 
  • 10 GPU nodes each have two Nvidia P100 GPUs. 


Storage


A 640 TB GPFS parallel file system provides scratch space and home directories.

Network

EDR InfiniBand, which provides 100 Gbit/s of throughput.

Management Nodes

Two load-balanced login nodes, plus a head node and a secondary/backup node.

Visualization Server


Free Tier has a visualization server that provides web-browser GUI access to Free Tier storage and uses GPU acceleration for remote visualization of data. It has four Nvidia Tesla K80 GPUs.


Scheduler


Free Tier uses the Slurm scheduler to manage jobs.


Fair Share

Resource allocation on our cluster is based on each group's contribution of computing cores. The Slurm scheduler uses fair-share targets and historical resource utilization to determine when jobs are scheduled to run. Within a group, priority is likewise based on historical usage, so heavier users have lower priority than lighter users. Slurm uses all of a job's attributes, such as wall time, resource constraints, and group membership, to determine the order in which jobs run.
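To make this concrete, the sketch below computes a priority from a classic fair-share factor and a weighted sum of factors, in the spirit of Slurm's multifactor priority plugin. The formula, the weights, and the numbers are illustrative assumptions for this page, not Free Tier's actual Slurm settings.

    # Minimal sketch of a fair-share priority calculation, in the spirit
    # of Slurm's classic fair-share formula and multifactor plugin.
    # Formula, weights, and numbers are illustrative, not this cluster's
    # actual configuration.

    def fairshare_factor(usage: float, shares: float) -> float:
        """Classic 2^(-usage/shares) fair-share factor in (0, 1].

        usage:  the group's fraction of historical cluster usage
        shares: the group's fraction of contributed cores
        Groups that used less than they contributed score near 1;
        heavy users decay toward 0.
        """
        return 2.0 ** (-usage / shares)

    def job_priority(usage: float, shares: float, age_factor: float,
                     w_fairshare: int = 10000, w_age: int = 1000) -> float:
        """Weighted sum of factors; the weights here are hypothetical
        (real sites set PriorityWeightFairshare etc. in slurm.conf)."""
        return w_fairshare * fairshare_factor(usage, shares) + w_age * age_factor

    # A group that contributed 10% of the cores but consumed 20% of the
    # cycles ranks below one that consumed only 5%:
    print(fairshare_factor(0.20, 0.10))  # 0.25  (heavy user)
    print(fairshare_factor(0.05, 0.10))  # ~0.71 (light user)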

Using job data such as requested wall time and resources, the scheduler can also backfill: it starts lower-priority jobs as long as they do not delay the highest-priority jobs. Because backfill works by essentially filling holes in node space, it tends to favor smaller, shorter-running jobs over larger, longer-running ones.
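The sketch below shows the backfill rule under simplifying assumptions: a single resource dimension (cores), candidates already sorted by priority, and trusted wall-time requests. It illustrates the idea rather than Slurm's actual implementation.

    # Minimal backfill sketch. Assumes one resource dimension (cores),
    # candidates already sorted by priority, and accurate wall-time
    # requests. Illustrative only, not Slurm's implementation.
    from dataclasses import dataclass

    @dataclass
    class Job:
        name: str
        cores: int
        walltime: int  # requested wall time, in minutes

    def backfill(candidates: list[Job], free_cores: int,
                 reservation_start: int) -> list[Job]:
        """Start jobs that fit in the current 'hole' in node space.

        reservation_start is the number of minutes until enough cores
        free up for the blocked highest-priority job. A candidate is
        backfilled only if it fits in the free cores AND finishes
        before that reservation, so the top job is never delayed.
        """
        started = []
        for job in candidates:
            if job.cores <= free_cores and job.walltime <= reservation_start:
                started.append(job)
                free_cores -= job.cores
        return started

    # With 24 cores free for the next 120 minutes, only the small,
    # short job slips into the hole:
    candidates = [Job("wide", 48, 60), Job("short", 8, 30), Job("long", 8, 900)]
    print(backfill(candidates, free_cores=24, reservation_start=120))
    # -> [Job(name='short', cores=8, walltime=30)]

In practice Slurm's backfill also weighs memory, GPUs, and per-partition limits, but the favoring of small, short jobs falls out of the same fit-and-finish test shown above.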

There is no preemption in the current system; a job in the queue will never interrupt or stop a running job.


Partitions