SPEChpc™ 2021 Overview / What's New?

Latest: www.spec.org/hpg/hpc2021/Docs/

This document introduces SPEChpc 2021 via a series of questions and answers.

Benchmarks: good, bad, difficult, and standard

Q1. What is SPEC?

Q2. What is a (good) benchmark?

Q3. What are some common benchmarking mistakes?

Q4. Should I benchmark my own application?

Q5. Should I use a standard benchmark?

SPEChpc 2021 Basics

Q6. What does SPEChpc 2021 measure?

Q7. Should I use SPEChpc 2021? Why or why not?

Q8. What does SPEC provide?

Q9. What must I provide?

Q10. What are the basic steps to run SPEChpc 2021?

Q11. How long does it take?

Suites and Benchmarks

Q12. What is a SPEChpc 2021 "suite"?

Q13. What are the benchmarks?

Q14. Are 5nn.benchmark, 6nn.benchmark, 7nn.benchmark, and 8nn.benchmark different?

SPEChpc 2021 Metrics

Q15. What is a Metric?

Q16. How are the metrics calculated?

Q17. What are "base" and "peak" metrics?

Q18. Which SPEChpc 2021 metric should I use?

Q19: What is a "reference machine"? Why use one?

What's new?

Q20: Compared to SPEC MPI2007, OMP2012, and ACCEL, what's new in SPEChpc 2021?

1. Benchmarks and Workloads

2. Source Code: C11, Fortran 2008, C++14

3. Compiler versions are required

4. Usability

Publishing results

Q21: Where can I find SPEChpc 2021 results?

Q22: Can I publish elsewhere? Do the rules still apply?

Transitions

Q23: What will happen to other SPEC/HPG benchmarks?

SPEChpc 2021 Benchmark Selection

Q24: What criteria were used?

Q25: Were some benchmarks 'kept' from previous suites?

Q26. Are the benchmarks comparable to other programs?

Miscellaneous

Q27: Can I run the benchmarks manually?

Q28. How do I contact SPEC?

Q29. What should I do next?

Benchmarks: good, bad, difficult, and standard

Q1. What is SPEC?

SPEC is the Standard Performance Evaluation Corporation, a non-profit organization founded in 1988 to establish standardized performance benchmarks that are objective, meaningful, clearly defined, and readily available. SPEC members include hardware and software vendors, universities, and researchers. [About SPEC]

SPEC was founded on the realization that "An ounce of honest data is worth a pound of marketing hype".

Q2. What is a (good) benchmark?

A surveyor's bench mark (two words) defines a known reference point by which other locations may be measured.
A computer benchmark performs a known set of operations by which computer performance can be measured.

Table 1: Characteristics of useful performance benchmarks

Specifies a workload - A strictly-defined set of operations to be performed.

Produces at least one metric - A numeric representation of performance. Common metrics include:

  • Time - For example, seconds to complete the workload.
  • Throughput - Work completed per unit of time, for example, jobs per hour.

Is reproducible - If repeated, will report similar (*) metrics.

Is portable - Can be run on a variety of interesting systems.

Is comparable - If the metric is reported for multiple systems, the values are meaningful and useful.

Checks for correct operation - Verifies that meaningful output is generated and that the work is actually done. "I can make it run as fast as you like if you remove the constraint of getting correct answers." (**)

Has run rules - A clear definition of required and forbidden hardware, software, optimization, tuning, and procedures.

(*) "Similar" performance will depend on context. The benchmark should include guidelines as to what variation one should expect if the benchmark is run multiple times.

(**) Author unknown. If you know who said it first, write.

Q3. What are some common benchmarking mistakes?

Creating high-quality benchmarks takes time and effort, and there are common difficulties that need to be avoided.
The difficulties listed below are based on real examples, and the solutions are what SPEChpc tries to do about them.

For each item, the list gives what the benchmark description says, the potential difficulty, and the solution.

1. "It runs Loop 1 billion times."

   Difficulty: Compiler X runs it 1 billion times faster than Compiler Y, because compilers are allowed to skip work that has no effect on program outputs ("dead code elimination").
   Solution: Benchmarks should print something.

2. "Answers are printed, but not checked, because Minor Floating Point Differences are expected."

   Difficulty:
     • What if a Minor Floating Point Difference sends it down an Utterly Different Program Path?
     • If the program hits an error condition, it might "finish" twice as fast because half the work was never attempted.
   Solution: Answers should be validated, within some sort of tolerance.

3. "The benchmark is already compiled. Just download and run."

   Difficulty: You may want to compare new hardware, new operating systems, new compilers.
   Solution: Source code benchmarks allow a broader range of systems to be tested.

4. "The benchmark is portable. Just use compiler X and operating system Y."

   Difficulty: You may want to compare other compilers and other operating systems.
   Solution: Test across multiple compilers and OS versions prior to release.

5. "The benchmark measures X."

   Difficulty: Has this been checked? If not, measurements may be dominated by benchmark setup time, rather than the intended operations.
   Solution: Analyze profile data prior to release; verify what it measures.

6. "The benchmark is a slightly modified version of Well Known Benchmark."

   Difficulty: Is there an exact writeup of the modifications? Did the modifications break comparability?
   Solution: Someone should check. Create a process to do so.

7. "The benchmark does not have a Run Rules document, because it is obvious how to run it correctly."

   Difficulty: Although "obvious" now, questions may come up. A change that seems innocent to one person may surprise another.
   Solution: Explicit rules improve the likelihood that results can be meaningfully compared.

8. "The benchmark is a collection of low-level operations representing X."

   Difficulty: How do you know that it is representative?
   Solution: Prefer benchmarks that are derived from real applications.

Q4. Should I benchmark my own application?

Yes, if you can; but it may be difficult.

Ideally, the best comparison test for systems would be your own application with your own workload. Unfortunately, it is often impractical to get a wide set of comparable system measurements using your own application with your own workload. For example, it may be difficult to extract the application sections that you want to benchmark, or to remove confidential information from data sets.

It takes time and effort to create a good benchmark, and it is easy to fall into common mistakes.

Q5. Should I use a standard benchmark?

Maybe. A standardized benchmark may provide a reference point, if you use it carefully.

You may find that a standardized benchmark has already been run on systems that you are interested in. Ideally, that benchmark will provide all the characteristics of Table 1 while avoiding common benchmark mistakes.

Before you consider the results of a standardized benchmark, you should consider whether it measures things that are important to your own application characteristics and computing needs. For example, a benchmark that emphasizes CPU performance will have limited usefulness if your primary concern is network throughput.

A standardized benchmark can serve as a useful reference point, but SPEC does not claim that any standardized benchmark can replace benchmarking your own actual application when you are selecting vendors or products.

SPEChpc 2021 Basics

Q6. What does SPEChpc 2021 measure?

SPEChpc 2021 focuses on compute intensive parallel performance across one or more nodes, which means these benchmarks emphasize the performance of:

  • the processors (and accelerators, if used),
  • the memory architecture,
  • the interconnect between nodes,
  • the compilers, and
  • the MPI implementation and the node-level parallel runtime (OpenMP or OpenACC).

SPEChpc 2021 intentionally depends on all of the above - not just the processor.

SPEChpc 2021 is not intended to stress other computer components such as graphics, Java libraries, or the I/O system. Note that there are other SPEC benchmarks that focus on those areas.

Q7. Should I use SPEChpc 2021? Why or why not?

SPEChpc 2021 provides a comparative measure of compute intensive parallel performance using MPI or, optionally, hybrid MPI+OpenACC or MPI+OpenMP. If this matches the type of workloads you are interested in, SPEChpc 2021 provides a good reference point.

Other advantages to using SPEChpc 2021 include:

  • It is portable across a wide range of systems and parallel models (see Q26).
  • A body of peer-reviewed results is available for comparison (see Q21 and Q22).
  • Non-commercial users may obtain a no-cost license (see Q29).

Limitations of SPEChpc 2021: As described above, the ideal benchmark for vendor or product selection would be your own workload on your own application. Please bear in mind that no standardized benchmark can provide a perfect model of the realities of your particular system and user community.

Q8. What does SPEC provide?

SPEChpc 2021 is distributed as a tar or ISO package that contains the benchmark source code, the tools used to build, run, validate, and report on the benchmarks, example config files, and the documentation.

The documentation is also available at www.spec.org/hpg/hpc2021/Docs/index.html, including the Linux installation guide.

Q9. What must I provide?

Briefly, you will need one or more interconnected nodes running Linux, with C, C++, and Fortran compilers, an MPI implementation, and enough memory and disk space for the suite you intend to run.

The above is only an abbreviated summary. See details in the System Requirements document.


Reportable runs for all suites require compilation of all three languages (C, C++, Fortran).

Q10. What are the basic steps to run SPEChpc 2021?

A one-page summary is in SPEChpc 2021 Quick Start. Here is a summary of the summary: install the kit, adapt an example config file to your compilers and MPI, then use runhpc to build the benchmarks, run them, and report the results.
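As a rough sketch of those steps (the install path, config file name, and rank count here are hypothetical; see the Quick Start and Install Guide for the authoritative commands):

    cd /path/to/hpc2021                  # hypothetical install location
    source shrc                          # put runhpc on your PATH

    # copy an example config file, adapt it to your compilers and MPI, then:
    runhpc --config=mysystem.cfg --ranks=24 tiny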

Q11. How long does it take to run?

Run time depends on the suite, the cluster it runs on, and how many ranks or threads are chosen.
When the workloads were created, the target reference run time was approximately 15-30 minutes per benchmark when running MPI-only with 24 ranks for the Tiny suite, 240 ranks for the Small suite, 2040 ranks for the Medium suite, and 8160 ranks for the Large suite, on an Intel Haswell-based cluster. Hence, a full compliant run could take 6 to 10 hours per suite.

This may seem excessively long, but the length is necessary to allow for future CPU performance improvements, the use of accelerators, and the ability to scale each suite to larger rank counts.

Suites and Benchmarks

Q12. What is a SPEChpc 2021 "suite"?

A suite is a set of benchmarks that are run as a group to produce one of the overall metrics.

SPEChpc 2021 includes four suites that focus on different workload sizes:

Tiny
  Contents: SPEChpc 2021 Tiny Workload, 9 benchmarks
  Metrics: SPEChpc 2021_tny_base, SPEChpc 2021_tny_peak
  How many ranks? The Tiny workloads use up to 60GB of memory and are intended for use on a single node using between 1 and 256 ranks. More nodes and ranks may be used; however, higher rank counts may see lower scaling as MPI communication becomes more dominant.
  What do higher scores mean? Higher scores indicate that less time is needed.

Small
  Contents: SPEChpc 2021 Small Workload, 9 benchmarks
  Metrics: SPEChpc 2021_sml_base, SPEChpc 2021_sml_peak
  How many ranks? The Small workloads use up to 480GB of memory and are intended for use on one or more nodes using between 64 and 1024 ranks. More ranks may be used; however, higher rank counts may see lower scaling as MPI communication becomes more dominant.
  What do higher scores mean? Higher scores indicate that less time is needed.

Medium
  Contents: SPEChpc 2021 Medium Workload, 6 benchmarks
  Metrics: SPEChpc 2021_med_base, SPEChpc 2021_med_peak
  How many ranks? The Medium workloads use up to 4TB of memory and are intended for use on a mid-size cluster using between 256 and 4096 ranks. More ranks may be used; however, higher rank counts may see lower scaling as MPI communication becomes more dominant.
  What do higher scores mean? Higher scores indicate that less time is needed.

Large
  Contents: SPEChpc 2021 Large Workload, 6 benchmarks
  Metrics: SPEChpc 2021_lrg_base, SPEChpc 2021_lrg_peak
  How many ranks? The Large workloads use up to 14.5TB of memory and are intended for use on a larger cluster using between 2048 and 32,768 ranks. More ranks may be used; however, higher rank counts may see lower scaling as MPI communication becomes more dominant.
  What do higher scores mean? Higher scores indicate that less time is needed.

The "Short Tag" (Tiny, Small, Medium, Large) is the canonical abbreviation for use with runhpc, where context is defined by the tools. In a published document, context may not be clear, so to avoid ambiguity, the Suite Name or the Metrics should be spelled as shown above.
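For example, a user with a handful of nodes might run the two smaller suites like this (the config file name is hypothetical; rank counts are chosen from the ranges above):

    runhpc --config=mysystem.cfg --ranks=24  tiny    # a single node
    runhpc --config=mysystem.cfg --ranks=240 small   # several nodes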

Q13. What are the benchmarks?

SPEChpc has 9 benchmarks, organized into 4 suites by workload size: Tiny, Small, Medium, and Large.

Each entry gives: Application Name; its benchmarks in the Tiny, Small, Medium, and Large suites; language; approximate lines of code (LOC); and application area.

LBM D2Q37: 505.lbm_t, 605.lbm_s, 705.lbm_m, 805.lbm_l. C, ~9,000 LOC. Computational Fluid Dynamics.
SOMA Offers Monte-Carlo Acceleration: 513.soma_t, 613.soma_s (not included in Medium or Large). C, ~9,500 LOC. Physics / Polymeric Systems.
TeaLeaf: 518.tealeaf_t, 618.tealeaf_s, 718.tealeaf_m, 818.tealeaf_l. C, ~5,400 LOC. Physics / High Energy Physics.
CloverLeaf: 519.clvleaf_t, 619.clvleaf_s, 719.clvleaf_m, 819.clvleaf_l. Fortran, ~12,500 LOC. Physics / High Energy Physics.
Minisweep: 521.miniswp_t, 621.miniswp_s (not included in Medium or Large). C, ~17,500 LOC. Nuclear Engineering - Radiation Transport.
POT3D: 528.pot3d_t, 628.pot3d_s, 728.pot3d_m, 828.pot3d_l. Fortran, ~495,000 LOC (includes the HDF5 library). Solar Physics.
SPH-EXA: 532.sph_exa_t, 632.sph_exa_s (not included in Medium or Large). C++14, ~3,400 LOC. Astrophysics and Cosmology.
HPGMG-FV: 534.hpgmgfv_t, 634.hpgmgfv_s, 734.hpgmgfv_m, 834.hpgmgfv_l. C, ~16,700 LOC. Cosmology, Astrophysics, Combustion.
miniWeather: 535.weather_t, 635.weather_s, 735.weather_m, 835.weather_l. Fortran, ~1,100 LOC. Weather.

Q14. Are 5nn.benchmark, 6nn.benchmark, 7nn.benchmark, and 8nn.benchmark different?

Most of the benchmarks in the table above include:

 5nn.benchmark_t for the Tiny workload
 6nn.benchmark_s for the Small workload
 7nn.benchmark_m for the Medium workload
 8nn.benchmark_l for the Large workload

Benchmarks within each group are similar to each other and share the same source code. The main difference is the workload size; however, the compiler flags and MPI configuration may vary as well. See: [memory]
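Because each workload size is a separate benchmark, the same application can be built and run at more than one size. A sketch (the config file name is hypothetical):

    runhpc --config=mysystem.cfg --action=build 505.lbm_t 605.lbm_s
    runhpc --config=mysystem.cfg --ranks=24  505.lbm_t
    runhpc --config=mysystem.cfg --ranks=240 605.lbm_s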

SPEChpc 2021 Metrics

Q15. What is a Metric?

A metric is a single composite score (higher is better) for a SPEChpc Suite result which can be compared to other results from the same Suite.

There are many ways to measure computer performance. Among the most common are:

  • Time - for example, seconds to complete the workload.
  • Throughput - work completed per unit of time, for example, jobs per hour.

SPEChpc 2021 uses a time-based, strong-scaling metric: the problem size stays fixed, and adding more ranks or threads reduces the time to solution.

Q16. How are the metrics calculated?

For each benchmark, a performance ratio is calculated as:

   time on the reference machine / time on the SUT (System Under Test)

For example, the reference machine ran 505.lbm_t using 24 ranks in 2250 seconds. A particular SUT took about 1/5 the time, scoring about 5. More precisely: 2250/444 = 5.067567.

The reference times may be found in the results posted at www.spec.org/hpg/hpc2021/results/
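The overall metric for a suite is the geometric mean of the individual benchmark ratios (see also Q19). In LaTeX notation, for a suite of n benchmarks:

    \text{SPEChpc metric} = \left( \prod_{i=1}^{n} \frac{t_{\mathrm{ref},i}}{t_{\mathrm{SUT},i}} \right)^{1/n}

So, for example, if every benchmark in a suite scored a ratio of about 5, the suite metric would also be about 5.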

Q17. What are "base" and "peak" metrics?

SPEChpc benchmarks are distributed as source code and must be compiled, which leads to questions:
How should they be compiled? Which node-level parallel model, if any, should be used? There are many possibilities, ranging from

--debug --no-optimize

at the low end, through highly customized optimization and even source code rewriting at the high end. Any point chosen from that range might seem arbitrary to those whose interests lie at a different point. Nevertheless, choices must be made.

For SPEChpc 2021, SPEC has chosen to allow two points in the range. The first may be of more interest to those who prefer a relatively simple build process; the second may be of more interest to those who are willing to invest more effort in order to achieve better performance.

Options allowed under the base rules are a subset of those allowed under the peak rules. A legal base result is also legal under the peak rules but a legal peak result is NOT necessarily legal under the base rules.
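As a sketch of how the two tuning levels can look in a config file (the section syntax is discussed under Q20; the optimization flags here are hypothetical examples, not recommendations):

    default=base:            # base: one set of flags, used for every benchmark
    OPTIMIZE = -O2

    default=peak:            # peak: more aggressive tuning is allowed...
    OPTIMIZE = -O3

    505.lbm_t=peak:          # ...including per-benchmark flags
    OPTIMIZE = -Ofast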

For more information, see the SPEChpc 2021 Run and Reporting Rules.

Q18. Which SPEChpc 2021 metric should I use?

It depends on your needs; you get to choose, depending on how you use computers, and these choices will differ from person to person. In practice, though, the choice of suite is largely determined by the size of the cluster being benchmarked.

Q19. What is a "reference machine"? Why use one?

SPEC uses a reference machine to normalize the performance metrics used in the SPEChpc 2021 suites. Each benchmark is run and measured on this machine to establish a reference time for that benchmark. These times are then used in the SPEC calculations.

The reference system is TU Dresden's Taurus system, using its Haswell CPU islands. Each node contains two 12-core Haswell sockets (24 cores total) with 64GB of memory. The Tiny reference time uses 24 ranks on a single node. Small uses 10 nodes (240 ranks), Medium uses 85 nodes (2040 ranks), and Large uses 340 nodes (8160 ranks).

Note that when comparing any two systems measured with SPEChpc 2021, their performance relative to each other would remain the same even if a different reference machine were used. This is a consequence of the mathematics involved in calculating the individual and overall (geometric mean) metrics.
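To see why, consider two systems A and B measured with the same suite. The ratio of their overall metrics is, in LaTeX notation:

    \frac{\left( \prod_{i=1}^{n} t_{\mathrm{ref},i}/t_{A,i} \right)^{1/n}}
         {\left( \prod_{i=1}^{n} t_{\mathrm{ref},i}/t_{B,i} \right)^{1/n}}
    = \left( \prod_{i=1}^{n} \frac{t_{B,i}}{t_{A,i}} \right)^{1/n}

in which the reference times cancel completely.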

Q20. Compared to SPEC MPI2007, OMP2012, and ACCEL, what's new in SPEChpc 2021?

While previous SPEC/HPG benchmarks each focused on a single parallel model (MPI, OpenMP, or OpenACC), SPEChpc combines these into a single comprehensive set of suites that can use MPI, MPI+OpenMP, or MPI+OpenACC. This lets users select the model appropriate for their system, and allows comparisons across a wider variety of systems.

1. Benchmarks and Workloads - Nearly all of the benchmarks and their workloads are new to SPEC/HPG (see Q13 and Q25), and most applications are provided at four workload sizes (see Q12 and Q14).

2. Source Code: C11, Fortran 2008, C++14 - The benchmark sources use these modern language standards, so your compilers will need to support them (see the System Requirements).

3. Compiler versions are required

SPEChpc performance depends on compilers. For SPEChpc 2021, all config files must include:

CC_VERSION_OPTION   = (switch that causes the C compiler to print its version)
CXX_VERSION_OPTION  = (switch that causes the C++ compiler to print its version)
FC_VERSION_OPTION   = (switch that causes the Fortran compiler to print its version)

Builds will fail until you add the above to your config file.
Your installed copy of SPEChpc 2021 includes Example config files that demonstrate what to include for a variety of compilers. [detail]
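For example, the GNU compilers print their versions when given --version, so a config file for gcc, g++, and gfortran might say:

    CC_VERSION_OPTION  = --version
    CXX_VERSION_OPTION = --version
    FC_VERSION_OPTION  = --version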

4. Usability

Environment - It is easier to get things in and out of the environment now (see the sketch after this list):

  • You can access environment variables using ENV macros.
  • You can set the environment for the full run using preenv.
  • You can set the environment for a single benchmark (in peak only) using env_vars.
  • OMP_NUM_THREADS and ACC_NUM_CORES are set automatically via --threads or threads.
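A sketch, assuming the preENV_ prefix and %{ENV_...} macro conventions of the SPEC config file preprocessor (see config.html for the authoritative spelling; the values here are hypothetical):

    preENV_OMP_STACKSIZE = 192M    # exported to the environment for the full run
    label = %{ENV_USER}            # an ENV macro: reads $USER from your shell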

Command line wins - The former interaction of command line and config file was deemed too confusing. For SPEChpc 2021, if an option can be specified both in a config file and on the runhpc command line, then the command line option is used.
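For example (hypothetical config file and values):

    # mysystem.cfg contains:  threads = 1
    # the command line says otherwise, so the run uses 8 threads:
    runhpc --config=mysystem.cfg --threads=8 tiny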

Header section - The header section allows you to set many general options for the run (examples: number of iterations, where to email results). For previous suites, all such settings had to be at the top of the config file, which was sometimes inconvenient (e.g. when including files). For SPEChpc 2021, you can return to the header section at any time, using the section named

default:

or, equivalently,

default=default=default:
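For example (the option values are illustrative):

    teeout = yes           # header section: general options for the run

    default=base:          # a named section: build options
    OPTIMIZE = -O2

    default:               # back to the header section
    iterations = 2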

Easier-to-navigate reference document - config.html tries to provide memorable "#topicURLs" for easier navigation.
Section URLs of the long document are intended to be easy to memorize or guess: memorize www.spec.org/hpc2021/Docs/config.html, and the rest is usually just #topic.

Macros can be dumped for debugging, and less scary informational messages can be printed.

Synonyms - Commonly used options have synonyms; for example, threads may be given either on the command line (--threads) or in the config file (threads).

Publishing results

Q21: Where can I find SPEChpc 2021 results?

SPEChpc 2021 results submitted to SPEC are available at https://www.spec.org/hpg/hpc2021/results/.

Q22: Can I publish elsewhere? Do the rules still apply?

Yes, SPEChpc 2021 results can be published independently, and Yes, the rules still apply.

Although you are allowed to publish independently, SPEC encourages results to be submitted for publication on SPEC's web site, because it ensures a peer review process and uniform presentation of all results.

The Fair Use rule recognizes that Academic and Research usage of the benchmarks may be less formal; the key requirement is that non-compliant numbers must be clearly distinguished from rule-compliant results.

SPEChpc results may be estimated. Estimates must be clearly marked.

Transitions

Q23. What will happen to SPEC MPI2007, OMP2012, and ACCEL?

SPEC/HPG has not retired these benchmarks, but may do so in the future.

SPEChpc 2021 Benchmark Selection

Q24. What criteria were used to select the benchmarks?

SPEC considered, among other criteria: whether a candidate is derived from a real application (see Q3), whether it can be made portable across compilers, MPI implementations, and node-level parallel models, and whether its workloads can scale across the suite sizes.

Q25. Were some benchmarks 'kept' from previous suites?

Almost all of the benchmarks included in SPEChpc are new. Only CloverLeaf comes from a previous suite, SPEC ACCEL. CloverLeaf's core algorithms are the same as in SPEC ACCEL, but the SPEChpc version adds MPI and OpenMP for CPUs, and updates the OpenMP target directives.

Q26. Are the benchmarks comparable to other programs?

Many of the SPEChpc 2021 benchmarks have been derived from publicly available applications. The individual benchmarks in this suite may be similar, but are NOT identical to benchmarks or programs with similar names which may be available from sources other than SPEC. In particular, SPEC has invested significant effort to improve portability and to minimize hardware dependencies, to avoid unfairly favoring one hardware platform over another. For this reason, the application programs in this distribution may perform differently from commercially available versions of the same application.

Therefore, it is not valid to compare SPEChpc 2021 benchmark results with anything other than other SPEChpc 2021 benchmark results.

Miscellaneous

Q27. Can I run the benchmarks manually?

To generate rule-compliant results, an approved toolset must be used. If several attempts at using the SPEC-provided tools are not successful, you should contact SPEC for technical support. SPEC may be able to help you, but this is not always possible -- for example, if you are attempting to build the tools on a platform that is not available to SPEC.

If you just want to work with the benchmarks and do not care to generate publishable results, SPEC provides information about how to do so.

Q28. How do I contact SPEC?

SPEC can be contacted in several ways. For general information, including other means of contacting SPEC, please see SPEC's Web Site at:

https://www.spec.org/

General questions can be emailed to: info@spec.org
SPEChpc 2021 Technical Support Questions can be sent to: hpc2021support@spec.org

Q29. What should I do next?

If you don't have SPEChpc 2021, it is hoped that you will consider ordering it.

Non-commercial users may obtain a no-cost license by applying at: https://www.spec.org/hpgdownload.html
If you are ready to get started, please follow one of these two paths:

I feel impatient; let me dive in: go to the Quick Start.

I want a clear and complete explanation: read the System Requirements, then follow the Install Guide for Linux.

SPEChpc™2021 Overview / What's New?: Copyright © 2021 Standard Performance Evaluation Corporation (SPEC)