###########################################
# LANL Mustang trace FAQ version 1.0 Beta #
###########################################

1. Trace description
====================

Mustang was an HPC cluster used for capacity computing at the Los Alamos
National Lab (LANL) from 2011 to 2016. Capacity clusters such as Mustang are
architected as cost-effective, general-purpose resources for a large number
of users. Mustang was used largely by scientists, engineers, and software
developers at LANL, and it was allocated to these users at the granularity of
physical nodes. The cluster consisted of 1600 identical compute nodes, with a
total of 38400 AMD Opteron 6176 2.3GHz cores and 102TB RAM.

This Mustang dataset covers the entire 61 months of the machine's operation,
from October 2011 to November 2016, which makes it the longest publicly
available cluster trace to date. The Mustang trace is also unique in that its
jobs are shorter than those in existing HPC traces. Overall, it consists of
2.1 million multi-node jobs submitted by 565 users and collected by SLURM, an
open-source cluster resource manager. Collected data include: timestamps for
job stages from submission to termination, job properties such as size and
owner, the job's return status, and a time budget per job that, if exceeded,
causes the job to be killed.

2. Scheduling policy
====================

The general scheduling policy is strongly fair-share dominated; however,
backfilling is used even if a user's fair-share allocation is negative (i.e.,
the user has exceeded their quota). Because SLURM knows the earliest time
that the highest-priority job can start, and which resources it will need, it
can also determine which jobs can be started without delaying it. The
backfill feature allows the scheduler to start other, lower-priority jobs as
long as they do not delay the highest-priority job.

The scheduling policy for Mustang remained the same for periods of roughly 6
months at a time, corresponding to different "campaigns". It remained
fair-share dominated, but the individual project weightings changed for each
campaign period. Each campaign is expected to consist of the following
phases:

- Input and dataset validation: jobs that consist of a small number of nodes
  compared to the dataset size
- Problem partitioning: grid sizing
- Parametric studies: determining the sweet spot for job sizes, nodes,
  processes per node, memory per process, etc.
- Computation
- Analysis and visualization: jobs that are primarily I/O-bound, implying
  different node use patterns

Jobs that belong to an individual user can get cancelled in batches, where
the timestamp and duration match across all the cancelled jobs. This is
usually because job scripts allow users to specify multiple jobs linked
together by explicitly stating their dependencies in the script. If one job
fails or gets cancelled, the rest are automatically cancelled.

3. Fields
=========

3.1 user_ID

An identifier of the user submitting the job. There are 565 users in the
trace, and user IDs have been anonymized using random numbers.

3.2 group_ID

An identifier of the group of the user submitting the job. There are 565
groups in the trace, as each user belongs to their own distinct group.

3.3 submit_time

The time when the job was submitted to the scheduler's queue. The Mustang
trace covers the period from October 27, 2011 until November 8, 2016.

3.4 start_time

The time when the job began executing. The Mustang trace covers the period
from October 27, 2011 until November 8, 2016.

3.5 end_time

The time when the job completed executing. The Mustang trace covers the
period from October 27, 2011 until November 8, 2016.

3.6 job_status

The return code of a job. There are three job outcomes:

- Completed: the job completed successfully without reporting errors
- Cancelled: the job was cancelled by the user or an admin before completing
- Timeout: the job exceeded its wallclock limit and was forcefully terminated
  (see section 3.9)

3.7 node_count

The number of nodes assigned to the job by the scheduler. Since this is a
homogeneous cluster, and each node consists of 24 identical CPU cores, this
field can be used to derive the total number of cores assigned to the job by
the scheduler (node_count x 24; see the example sketch at the end of this
section).

3.8 tasks_requested

The number of tasks requested, where each task corresponds to a physical CPU
core. This may not be equal to the number of cores actually assigned to a
job, as jobs at LANL are assigned entire physical nodes. To calculate the
total number of physical CPU cores assigned to the job, see section 3.7.

3.9 wallclock_limit

The requested time quota for the job. It typically ranges from 1 to 960
minutes (i.e., 16 hours, which is the maximum wall clock time for Mustang).
A few outlier jobs have longer wallclock limits, however.
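As an illustration of how the fields above are typically combined, the
following is a minimal sketch in Python (using pandas). It is not part of the
trace release: the file name ("mustang_release_v1.0beta.csv"), the assumption
that the trace is a CSV whose column headers match the field names in this
section, the timestamp encoding, and the exact job_status strings are all
assumptions made for the example; adjust them to match the released files.

    # Minimal sketch, not part of the release. Assumes a CSV file whose
    # column names match the fields above, datetime-parseable timestamps,
    # wallclock limits in minutes (section 3.9), and job_status values
    # COMPLETED / CANCELLED / TIMEOUT.
    import pandas as pd

    CORES_PER_NODE = 24  # Mustang nodes are identical, 24 cores each (3.7)

    jobs = pd.read_csv("mustang_release_v1.0beta.csv",  # hypothetical name
                       parse_dates=["submit_time", "start_time", "end_time"])

    # Derived per-job quantities (sections 3.3-3.5 and 3.7).
    jobs["queue_wait_s"] = (jobs["start_time"]
                            - jobs["submit_time"]).dt.total_seconds()
    jobs["runtime_s"] = (jobs["end_time"]
                         - jobs["start_time"]).dt.total_seconds()
    jobs["cores"] = jobs["node_count"] * CORES_PER_NODE

    # Jobs with a Timeout status ran up against their time budget (3.6, 3.9):
    # their runtime should sit close to wallclock_limit (minutes -> seconds).
    timeouts = jobs[jobs["job_status"].str.upper() == "TIMEOUT"]
    print((timeouts["runtime_s"]
           / (timeouts["wallclock_limit"] * 60)).describe())

    # Batch cancellations (section 2): cancelled jobs of the same user whose
    # end timestamp and duration match are likely one dependency chain.
    cancelled = jobs[jobs["job_status"].str.upper() == "CANCELLED"]
    batch_sizes = cancelled.groupby(["user_ID", "end_time",
                                     "runtime_s"]).size()
    print(batch_sizes[batch_sizes > 1].sort_values(ascending=False).head())

If the released files store the timestamps as Unix epochs rather than date
strings, drop parse_dates and convert the three columns explicitly with
pd.to_datetime(..., unit="s") instead.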
4. Contact info
===============

Please direct inquiries, feedback, or suggestions regarding this trace to the
ATLAS mailing list at info@project-atlas.org

5. Usage
========

Please cite the following paper when you publish work that uses this trace:

George Amvrosiadis, Jun Woo Park, Gregory R. Ganger, Garth A. Gibson,
Elisabeth Baseman, and Nathan DeBardeleben. "On the diversity of cluster
workloads and its impact on research results." In Proceedings of the 2018
USENIX Annual Technical Conference, Boston, MA, July 11-13, 2018.