SPEC CPU®2017 Run and Reporting Rules
SPEC® Open Systems Group

$Id: runrules.html 6491 2020-08-19 23:11:28Z JohnHenning $ Latest: www.spec.org/cpu2017/Docs/

This document sets the requirements to build, run, and report on the SPEC CPU®2017 benchmarks.
These rules may be updated from time to time.
Latest version: www.spec.org/cpu2017/Docs/runrules.html

Testers are required to comply with the version posted as of the date of their testing.
In the event of substantive changes, a notice will be posted at SPEC's top-level page, www.spec.org, to define a transition period during which compliance with the new rules is phased in.

1. Philosophy

1.1 Purpose

1.2 A SPEC CPU 2017 Result Is An Observation

1.2.1 Test Methods

1.2.2 Conditions of Observation

1.2.3 Assumptions About the Tester

1.3 A SPEC CPU 2017 Result Is A Declaration of Expected Performance

1.3.1 Reproducibility

1.3.2 Obtaining Components

1.4 A SPEC CPU 2017 Result is a Claim about Maturity of Performance Methods

1.5 Peak and base builds

1.6 Power Measurement

1.7 Estimates

1.8 About SPEC

1.8.1 Publication on SPEC's web site is encouraged

1.8.2 Publication on SPEC's web site is not required

1.8.3 SPEC May Require New Tests

1.8.4 SPEC May Adapt the Suites

1.9 Usage of the Philosophy Rule

2. Building SPEC CPU 2017

2.1 General Rules for Building the Benchmark

2.1.1 SPEC's tools must be used

2.1.2 The runcpu build environment

2.1.3 Continuous Build requirement

2.1.4 Cross-compilation allowed

2.2 Rules for Selecting Compilation Flags

2.2.1 Must not use names

2.2.2 Limitations on library substitutions

2.2.3 Feedback directed optimization is allowed in peak.

2.2.4 Limitations on size changes

2.2.5 Portability Flags

2.2.6 Compiler parallelization is allowed for SPECspeed, not allowed for SPECrate

2.3 Base Optimization Rules

2.3.1 Safety and Standards Conformance

2.3.2 C++ RTTI and exceptions required

2.3.3 IEEE-754 is not required

2.3.4 Regarding accuracy

2.3.5 Base flags same for all

2.3.5.1 Inter-module optimization for multi-language benchmarks

2.3.6 Feedback directed optimization must not be used in base.

2.3.7 Assertion flags must not be used in base.

2.3.8 Floating point reordering allowed

2.3.9 Portability Flags for Data Models

2.3.10 Alignment switches are allowed

2.3.11 Pointer sizes

3. Running SPEC CPU 2017

3.1 Single File System

3.2 Continuous Run Requirements

3.2.1 Base, Peak, and Basepeak

3.2.2 Untimed workloads

3.3 System State

3.4 Run-time environment

3.5 Submit

3.6 SPECrate Number of copies

3.7 SPECspeed Number of threads

3.8 Run-Time Dynamic Optimization

3.8.1 Definitions and Background

3.8.2 RDO Is Allowed, Subject to Certain Conditions

3.8.3 RDO Disclosure and Resources

3.8.4 RDO Settings Cannot Be Changed At Run-time

3.8.5 RDO and safety in base

3.8.6 RDO carry-over by program is not allowed

3.9 Power and Temperature Measurement

3.9.1 SPEC PTDaemon must be used

3.9.2 Line Voltage Source

3.9.3 Environmental Conditions

3.9.4 Network Interfaces

3.9.5 Power Analyzer requirements

3.9.6 Temperature Sensor Requirements

3.9.7 DC Line Voltage

3.9.8 Power Measurement Exclusion

4. Results Disclosure

4.0 One-sentence SUMMARY of Disclosure Requirements

4.1 General disclosure requirements

4.1.1 Tester's responsibility

4.1.2 Sysinfo must be used for published results

4.1.3 Information learned later

4.1.4 Peak Metrics are Optional

4.1.5 Base must be disclosed

4.2 Systems not yet shipped

4.2.1 Pre-production software can be used

4.2.2 Software component names

4.2.3 Specifying dates

4.2.4 If dates are not met

4.2.5 Performance changes for pre-production systems

4.3 Performance changes for production systems

4.4 Configuration Disclosure

4.4.1 System Identification

4.4.2 Hardware Configuration

4.4.3 Software Configuration

4.4.4 Power Management

4.5 Tuning Information

4.6 Description of Tuning Options ("Flags File")

4.7 A result may be published for only one system

4.8 Configuration Disclosure for User Built Systems

4.9 Documentation for cross-compiles

4.10 Metrics

4.10.1 SPECspeed Metrics

4.10.2 SPECrate Metrics

4.10.3 Energy Metrics

5. SPEC Process Information

5.1 Run Rule Exceptions

5.2 Publishing on the SPEC website

5.3 Fair Use

5.4 Research and Academic usage of CPU 2017

5.5 Required Disclosures

5.6 Estimates

5.6.1 Estimates are not allowed for energy metrics

5.6.2 Estimates are allowed for performance metrics

5.7 Procedures for Non-compliant results

Changes since the release of SPEC CPU 2017 v1.1.0

Changes since the release of SPEC CPU 2017 v1.0.0

Highlights of Changes from SPEC CPU 2006:

1. Philosophy

This philosophy section describes the basic rules upon which the rest of the SPEC CPU 2017 rules are built. It provides an overview of the purpose, definitions, methods, and assumptions for the rest of the SPEC CPU 2017 run rules.

1.1. Purpose

The purpose of the SPEC CPU 2017 benchmark and its run rules is to further the cause of fair and objective CPU benchmarking. The rules help ensure that published results are meaningful, comparable to other results, and reproducible. SPEC believes that the user community benefits from an objective series of tests which serve as a common reference.

Per the SPEC license agreement, all SPEC CPU 2017 results disclosed in public -- whether in writing or in verbal form -- must adhere to the SPEC CPU 2017 Run and Reporting Rules, or be clearly described as estimates.

A published SPEC CPU 2017 result is three things:

  1. A performance observation (rule 1.2);
  2. A declaration of expected performance (rule 1.3); and
  3. A claim about maturity of performance methods (rule 1.4).

1.2. A SPEC CPU 2017 Result Is An Observation

A published SPEC CPU 2017 result is an empirical report of performance observed when carrying out certain computationally intensive tasks.

1.2.1. Test Methods

SPEC supplies the CPU 2017 benchmarks in the form of source code, which testers are not allowed to modify except under certain very restricted circumstances. SPEC CPU 2017 includes 43 benchmarks, organized into 4 suites: SPECspeed®2017 Integer (intspeed), SPECspeed®2017 Floating Point (fpspeed), SPECrate®2017 Integer (intrate), and SPECrate®2017 Floating Point (fprate).

Note: this document avoids the (otherwise common) usage "CPU 2017 suite" (singular), instead insisting on "CPU 2017 suites" (plural). Thus a rule that requires consistency within a suite means that consistency is required across a set of 10 or 13 benchmarks, not a set of 43.

Testers supply compilers and a System Under Test (SUT). Testers may set optimization flags and, where needed, portability flags, in a SPEC config file. The SPEC CPU 2017 tools then automatically build the benchmarks, run them, and validate their output.

In order to provide some assurance that results are repeatable, each benchmark is run more than once. The tester may choose:

  1. To run each benchmark three times, in which case the tools use the median time.
  2. Or to run each benchmark twice, in which case the tools use the slower of the two runs.
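For example, the choice might be expressed as follows (a minimal sketch; it assumes the iterations config file option and the corresponding runcpu --iterations switch, and the config file name is illustrative):

   # In the config file:
   iterations = 3

   # Or on the runcpu command line:
   runcpu --config=myconfig.cfg --iterations=2 intrate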

1.2.2. Conditions of Observation

The report that certain performance has been observed is meaningful only if the conditions of observation are stated. SPEC therefore requires that a published result include a description of all performance-relevant conditions.

1.2.3. Assumptions About the Tester

It is assumed that the tester:

  1. is willing to describe the observation and its conditions clearly;
  2. is able to learn how to operate the SUT in ways that comply with the rules in this document, for example by selecting compilation options that meet SPEC's requirements;
  3. knows the SUT better than those who have only indirect contact with it;
  4. is honest: SPEC CPU does not employ an independent auditor process, though it does have requirements for reproducibility and does encourage use of a peer review process.

The person who actually carries out the test is, therefore, the first and the most important audience for these run rules. The rules attempt to help the tester by trying to be clear about what is and what is not allowed.

1.3. A Published SPEC CPU 2017 Result Is a Declaration of Expected Performance

A published SPEC CPU 2017 result is a declaration that the observed level of performance can be obtained by others. Such declarations are widely used by vendors in their marketing literature, and are expected to be meaningful to ordinary customers.

1.3.1. Reproducibility

It is expected that later testers can obtain a copy of the SPEC CPU 2017 suites, obtain the components described in the original result, and reproduce the claimed performance, within a small range to allow for run-to-run variation.

1.3.2. Obtaining Components

Therefore, it is expected that the components used in a published result can in fact be obtained, with the level of quality commonly expected for products sold to ordinary customers. Such components must:

  1. be specified using customer-recognizable names,
  2. be generally available within certain time frames,
  3. provide documentation,
  4. provide an option for customer support,
  5. be of production quality, and
  6. provide a suitable environment for programming.

The judgment of whether a component meets the above list may sometimes pose difficulty, and various references are given in these rules to guidelines for such judgment. But by way of introduction, imagine a vendor-internal version of a compiler, designated only by an internal code name, unavailable to customers, which frequently generates incorrect code. Such a compiler would fail to provide a suitable environment for general programming, and would not be ready for use in a SPEC CPU 2017 result.

1.4. A SPEC CPU 2017 Result is a Claim About Maturity of Performance Methods

A published SPEC CPU 2017 result carries an implicit claim that the performance methods it employs are more than just "prototype" or "experimental" or "research" methods; it is a claim that there is a certain level of maturity and general applicability in its methods. Unless clearly described as an estimate, a published SPEC result is a claim that the performance methods employed (whether hardware or software, compiler or other):

  1. generate correct code for a class of programs larger than the SPEC CPU 2017 suites,
  2. improve performance for a class of programs larger than the SPEC CPU 2017 suites,
  3. are recommended by the vendor for a specified class of programs larger than the SPEC CPU 2017 suites,
  4. are generally available, documented, supported, and
  5. if used as part of base (rule 2.3), are safe (rule 2.3.1).

SPEC is aware of the importance of optimizations in producing the best performance. SPEC is also aware that it is sometimes hard to draw an exact line between legitimate optimizations that happen to benefit SPEC benchmarks and optimizations that exclusively target the SPEC benchmarks. However, with the list above, SPEC wants to raise awareness among implementers and end users of unwanted benchmark-specific optimizations, which would be incompatible with SPEC's goal of fair benchmarking.

The tester must describe the performance methods that are used in terms that a performance-aware user can follow, so that users can understand how the performance was obtained and can determine whether the methods may be applicable to their own applications. The tester must be able to make a credible public claim that a class of applications in the real world may benefit from these methods.

1.5. Peak and base builds

"Peak" metrics may be produced by building each benchmark in the suite with a set of optimizations individually selected for that benchmark. The optimizations selected must adhere to the set of general benchmark optimization rules described in rule 2.1 below. This may also be referred to as "aggressive compilation".

"Base" metrics must be produced by building all the benchmarks in the suite with a common set of optimizations. In addition to the general benchmark optimization rules (rule 2.1), base optimizations must adhere to a stricter set of rules described in rule 2.2.

These additional rules serve to form a "baseline" of performance that can be obtained with a single set of compiler switches, single-pass make process, and a high degree of portability, safety, and performance.

  1. The choice of a single set of switches and single-pass make process is intended to reflect the performance that may be attained by a user who is interested in performance, but who prefers not to invest the time required for tuning of individual programs, development of training workloads, and development of multi-pass Makefiles.

  2. SPEC allows base builds to assume that the program follows the relevant language standard (i.e. it is portable). But this assumption may be made only where it does not interfere with getting the expected answer. For all testing, SPEC requires that benchmark outputs match an expected set of outputs, typically within a benchmark-defined tolerance to allow for implementation differences among systems.

    Because the SPEC CPU 2017 benchmarks are drawn from the compute intensive portion of real applications, some of them use popular practices that compilers must commonly cater for, even if those practices are nonstandard. In particular, some of the programs (and, therefore, all of base) may have to be compiled with settings that do not exploit all optimization possibilities that would be possible for programs with perfect standards compliance.

  3. In base, the compiler may not make unsafe assumptions that are more aggressive than what the language standard allows.

  4. Finally, as a performance suite, SPEC CPU has throughout its history nevertheless allowed certain common optimizations to be included in base, such as reordering of operands in accordance with algebraic identities.

Rules for building the benchmarks are described in rule 2.

1.6. Power Measurement

Optionally, power may be measured and energy metrics may be produced. In order to provide high-quality power reporting, SPEC requires the use of independent measurement hardware: a power analyzer and a temperature sensor, both read by SPEC's PTDaemon tool running on a controller system separate from the SUT (rule 3.9.1).

Power measurements are subject to validity checks to ensure that a sufficient number of valid samples are collected and that voltage and temperature fall within expected limits. These checks are in addition to the usual checks that benchmarks produce acceptable answers.

For more details on Power Measurement, see rule 3.9.

1.7. Estimates

SPEC CPU 2017 energy metrics must not be estimated.

Power consumption is affected by physical manufacturing variations in all of the active parts in a system; therefore, power consumption cannot be estimated with enough accuracy to allow SPEC CPU energy metrics to be estimated.

SPEC CPU 2017 performance metrics may be estimated.

All estimates (rule 5.6) must be clearly designated as such.

This philosophy rule has described how a "result" is an empirical report (rule 1.2) of performance, includes a full disclosure of performance-relevant conditions (rule 1.2.2), can be reproduced (rule 1.3.1), and uses mature performance methods (rule 1.4). By contrast, estimates may fail to provide one or even all of these characteristics.

Nevertheless, estimates have long been seen as valuable for SPEC CPU benchmarks. Estimates are set at inception of a new chip design and are tracked carefully through analytic, simulation, and HDL (Hardware Description Language) models. They are validated against prototype hardware and, eventually, production hardware. With chip designs taking years, and requiring very large investments, estimates are central to corporate roadmaps. Such roadmaps may compare SPEC CPU estimates for several generations of processors, and, explicitly or by implication, contrast one company's products and plans with another's.

SPEC wants the CPU benchmarks to be useful, and part of that usefulness is allowing the performance metrics to be estimated.

The key philosophical point is simply that, where allowed, estimates (rule 5.6) must be clearly distinguished from results.

1.8. About SPEC

1.8.1. Publication on SPEC's web site is encouraged

SPEC encourages the review of CPU 2017 results by the relevant subcommittee, and subsequent publication on SPEC's web site (www.spec.org/cpu2017). SPEC uses a peer-review process prior to publication, in order to improve consistency in the understanding, application, and interpretation of these run rules.

1.8.2. Publication on SPEC's web site is not required

Review by SPEC is not required. Testers may publish rule-compliant results independently. No matter where published, all results publicly disclosed must adhere to the SPEC Run and Reporting Rules, or be clearly marked as estimates. SPEC may take action (rule 5.7) if the rules are not followed.

1.8.3. SPEC May Require New Tests

In cases where it appears that the run rules have not been followed, SPEC may investigate such a claim and require that a result be regenerated, or may require that the tester correct the deficiency (e.g. make the optimization more general purpose or correct problems with code generation).

1.8.4. SPEC May Adapt the Suites

The SPEC Open Systems Group reserves the right to adapt the SPEC CPU 2017 suites as it deems necessary to preserve its goal of fair benchmarking. Such adaptations might include (but are not limited to) removing benchmarks, modifying codes or workloads, adapting metrics, republishing old results adapted to a new metric, or requiring retesting by the original tester.

1.9. Usage of the Philosophy Rule

This philosophy rule is intended to introduce concepts of fair benchmarking. It is understood that in some cases, this rule uses terms that may require judgment, or which may lack specificity. For more specific requirements, please see the rules below.

In case of a conflict between this philosophy rule and a run rule in one of the rules below, normally the run rule found below takes priority.

Nevertheless, there are several conditions under which questions should be resolved by reference to rule 1: (a) self-conflict: if rules below are found to impose incompatible requirements; (b) ambiguity: if they are unclear or silent with respect to a question that affects how a result is obtained, published, or interpreted; (c) obsolescence: if the rules below are made obsolete by changing technical circumstances or by directives from superior entities within SPEC.

When questions arise as to interpretation of the run rules:

  1. Interested parties should seek first to resolve questions based on the rules as written in the rules that follow. If this is not practical (because of problems of contradiction, ambiguity, or obsolescence), then the principles of the philosophy rule should be used to resolve the issue.

  2. The SPEC CPU subcommittee should be notified of the issue. Contact information may be found via the SPEC web site, www.spec.org.

  3. SPEC may choose to issue a ruling on the issue at hand, and may choose to amend the rules to avoid future such issues.

2. Building SPEC CPU 2017

Definition: "flag". For the purpose of these run rules, a "flag" is any means of expressing a choice regarding transformation of SPEC-supplied source into a binary executable. The term is not limited to traditional command line switches.
Other examples: environment variables; system-global options; choices made when products are installed.

2.1. General Rules for Building the Benchmark

2.1.1. SPEC's tools must be used

SPEC provides a set of tools that control how the benchmarks are run. The primary tool is called runcpu, which reads user-supplied config files, builds the benchmarks, and runs them.
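For instance, a typical invocation names a config file, a tuning level, and one or more suites or benchmarks (a sketch only; the config file name is illustrative):

   runcpu --config=myconfig.cfg --tune=base intrate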

Some Fortran programs in SPEC CPU 2017 are preprocessed. Fortran preprocessing must be done using the SPEC-supplied preprocessor, under the control of runcpu. It is not permitted to use a preprocessor that is supplied as part of the vendor's compiler.

SPEC supplies pre-compiled versions of the tools for many systems. For new systems, the document Building the Tools explains how to build them and how to obtain approval. Approval must be obtained prior to any publication of a result using those tools.

2.1.2. The runcpu build environment

When runcpu is used to build the SPEC CPU 2017 benchmarks, anything that contributes to performance must be disclosed (rule 4) and must meet the usual general availability tests, as described in the Philosophy (rule 1): supported, documented, product quality, recommended, and so forth. These requirements apply to all aspects of the build environment, including but not limited to:

  1. The operating system and any tuning thereof.
  2. Performance-enhancing software, firmware, or hardware.
  3. Resource management.
  4. Environment variables

2.1.3. Continuous Build requirement

For a reportable run, a suite of benchmarks is compiled (for example, 10 benchmarks for fpspeed). Optional peak tuning doubles the number of compiles (making 20 for the example of fpspeed). If a result is made public, then it must be possible for new testers to use its config file to compile all the benchmarks (both base and peak, if peak was used) in a single invocation of runcpu; and obtain executable binaries that are, from a performance point of view, equivalent to the binaries used by the original tester.

Of course, the new tester may need to set up the system to match the original build environment, as described just above (rule 2.1.2) and may need to make minor config file adjustments, e.g. for directory paths.

Note that this rule does not require that the original tester actually build all the benchmarks in a single invocation. Instead, it requires that the tester ensure that nothing would prevent a continuous build. The simplest and least error-prone way to meet this requirement is simply to do a complete build of all the benchmarks in a single invocation of runcpu. Nevertheless, SPEC recognizes that there is a cost to benchmarking and that it may be convenient to build benchmarks individually, perhaps as part of a tuning project.
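For example, such a complete single-invocation build might look like the following (a sketch; the config file name is illustrative):

   runcpu --config=myconfig.cfg --action=build --tune=base,peak fprate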

Here are some examples of practices that this rule prohibits:

  1. Not allowed: Two benchmarks within a suite (for example, fprate) require incompatible OS tuning and a reboot is required in between their builds.
  2. Not allowed: One C++ benchmark within a suite does better when compiler patch 42 is installed, and another does worse; so the tester compiles one on a system with patch 42 and the other on a system that does not have patch 42.
  3. Not allowed: when all the benchmarks are built in a single invocation, a resource runs out and runcpu crashes. Therefore, the tester builds them one at a time.

2.1.4. Cross-compilation allowed

It is permitted to use cross-compilation, that is, a build process where the benchmark executables are built on a system different than the SUT. The runcpu tool must be used (in accordance with rule 2.1.2) on both systems, typically with --action=build on the host(s) and --action=validate on the SUT.
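A hedged sketch of the typical flow (the config file name is illustrative, and the mechanism for transferring the built executables to the SUT is left to the tester):

   # On the build host:
   runcpu --config=myconfig.cfg --action=build intspeed

   # On the SUT, after the built executables have been made available there:
   runcpu --config=myconfig.cfg --action=validate intspeed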

Documentation of cross-compiles is described in rule 4.9.

2.2. Rules for Selecting Compilation Flags

The following rules apply to compiler flag selection for SPEC CPU 2017 Peak and Base Metrics. Additional rules for Base Metrics follow in rule 2.3.

2.2.1. Must not use names

Benchmark source file or variable or subroutine names must not be used within optimization flags or compiler/build options.

Identifiers used in preprocessor directives to select alternative benchmark source code are also forbidden, except for a rule-compliant library substitution (rule 2.2.2) or an approved portability flag (rule 2.2.5).

For example, if a benchmark uses #ifdef IDENTIFIER to provide alternative source code under the control of compiler flag -DIDENTIFIER, that flag may not be used unless it meets the criteria of rule 2.2.2 or rule 2.2.5.

2.2.2. Limitations on library substitutions

Flags which substitute pre-computed (e.g. library-based) routines for routines defined in the benchmark on the basis of the routine's name must not be used. Exceptions are:

  1. the function alloca(). It is permitted to use a flag that substitutes a built-in alloca(). Such a flag may be applied to individual benchmarks (in both base and peak).

  2. the level 1, 2 and 3 BLAS functions in the floating point benchmarks, and the netlib-interface-compliant FFT functions. Such substitution may be used in a peak run, but must not be used in base.

Note: rule 2.2.2 does not forbid flags that select alternative implementations of library functions defined in an ANSI/ISO language standard. For example, such flags might select an optimized library of these functions, or allow them to be inlined.

2.2.3. Feedback directed optimization is allowed in peak.

Feedback directed optimization may be used in peak. Only the training input (which is automatically selected by runcpu) may be used for the run(s) that generate(s) feedback data.

Optimization with multiple feedback runs is also allowed (build, run, build, run, build...).
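As an illustration, a two-pass feedback build for an individual benchmark in peak might be expressed as follows. This is a hedged sketch: the PASS1_/PASS2_ variables follow common SPEC config file usage, and -fprofile-generate/-fprofile-use are GCC spellings; both are assumptions about one particular toolchain rather than a requirement of these rules.

   525.x264_r=peak:
      PASS1_OPTIMIZE = -O3 -fprofile-generate
      PASS2_OPTIMIZE = -O3 -fprofile-use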

The requirement to use only the train data set at compile time shall not be taken to forbid the use of run-time dynamic optimization tools that would observe the reference execution and dynamically modify the in-memory copy of the benchmark. However, such tools must not in any way affect later executions of the same benchmark (for example, when running multiple times in order to determine the median run time). Such tools must also be disclosed in the publication of a result, and must be used for the entire suite (see rule 3.8).

2.2.4. Limitations on size changes

Flags that change a data type size to a size different from the default size of the compilation system (after the data model has been chosen, rule 2.3.9) are not allowed.

Exceptions are:

  1. For SPECrate, pointer sizes may be set in a manner which requires, or which assumes, that the benchmarks (code+data) fit into 32 bits of address space. This exception is allowed even in base (rule 2.3.11).
  2. Changes to integer data types may be made:
    1. In base, only if it is safe (rule 2.3.1).
    2. For peak, only if the effects are fully disclosed via the flags file (rule 4.6). For example, it must be fully disclosed if there are effects on accuracy, or if the user must assert that the input data falls in a certain range.

2.2.5. Portability Flags

Rule 2.3.5 requires that all benchmarks use the same flags in base. Portability flags are an exception: they may differ from one benchmark to another, even in base. The first three items below describe rules for using them; the remaining items describe how they are proposed and approved.

  1. Portability flags must be used via the provided config file PORTABILITY lines (such as CPORTABILITY, FPORTABILITY, etc); a brief config file sketch follows this list.

  2. Portability flags must be approved by the SPEC CPU Subcommittee.

  3. If a given portability problem (within a given language) occurs in multiple places within a suite, then, in base, the same method(s) must be applied to solve all instances of the problem.

  4. The initial published results for CPU 2017 will include a reviewed set of portability flags on several operating systems; later users who propose to apply additional portability flags must prepare a justification for their use.

  5. A proposed portability flag will normally be approved if one of the following conditions holds:

    1. The flag selects a performance-neutral alternate benchmark source, and the benchmark cannot build and execute correctly on the given platform unless the alternate source is selected. (Examples might be flags such as -DHOST_WORDS_BIG_ENDIAN, -DHAVE_SIGNED_CHAR.)

    2. The flag selects a compiler mode that allows basic parsing of the input source program, and it is not possible to set that flag for all programs of the given language in the suite. (An example might be -fixedform, to select fixed-format Fortran source.)

    3. The flag selects features from a certain version of the language, and it is not possible to set that flag for all programs of the given language in the suite.

    4. The flag solves a data model problem, as described in rule 2.3.9.

    5. The flag selects a resource limit, and it is not possible to set that flag for all programs of the given language in the suite.

  6. A proposed portability flag will normally not be approved unless it is essential in order to successfully build and run the benchmark.

  7. If more than one solution can be used for a problem, the subcommittee will review attributes such as precedent from previously published results, performance neutrality, standards compliance, amount of code affected, impact on the expressed original intent of the program, and good coding practices (in rough order of priority).

  8. If a benchmark is discovered to violate the relevant standard, that may or may not be reason for the subcommittee to grant a portability flag. If the justification for a portability flag is standards compliance, the tester must include a specific reference to the offending source code module and line number, and a specific reference to the relevant sections of the appropriate standard. The tester should also address impact on the other attributes mentioned in the previous paragraph.

  9. If a library is specified as a portability flag, SPEC may request that the table of contents of the library be included in the disclosure.
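As noted in item 1 above, portability flags are specified per benchmark via PORTABILITY lines in the config file. A minimal sketch of the mechanism follows; the benchmark/flag pairings shown are merely illustrative, not an approved set for any particular platform.

   500.perlbench_r,600.perlbench_s=default:
      PORTABILITY = -DSPEC_LINUX_X64
   523.xalancbmk_r,623.xalancbmk_s=default:
      CXXPORTABILITY = -DSPEC_LINUX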

2.2.6. Compiler parallelization is allowed for SPECspeed, not allowed for SPECrate

Compiler flags that enable multi-threaded execution, whether by explicit OpenMP directive, or by automatic parallelization, are allowed only when building the SPECspeed benchmarks. For SPECrate, it is forbidden to use compiler parallelization (both explicit OpenMP and auto-parallelization are forbidden).

For example, the GCC flags -fopenmp and -floop-parallelize-all are allowed when building the SPECspeed Integer benchmarks, such as 657.xz_s; the same switches are forbidden when building the SPECrate Integer benchmarks, such as 557.xz_r.

Note: this rule does not forbid the use of SIMD instructions (Single Instruction, Multiple Data). For example, the GCC flag -mfpmath=sse is allowed in both SPECspeed and SPECrate.
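For example, a config file might enable OpenMP only for the SPECspeed suites (a hedged sketch using GCC-style option spellings, which vary by compiler):

   intspeed,fpspeed=base:
      OPTIMIZE = -O3 -fopenmp
   intrate,fprate=base:
      OPTIMIZE = -O3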

2.3. Base Optimization Rules

The optimizations used to produce SPEC CPU 2017 Base Metrics must meet the requirements of this section, in addition to the requirements of rule 2.1 and rule 2.2 above.

2.3.1. Safety and Standards Conformance

The optimizations used are expected to be safe, and it is expected that system or compiler vendors would endorse the general use of these optimizations by customers who seek to achieve good application performance.

  1. The requirements that optimizations be safe, and that they generate correct code for a class of programs larger than the suites themselves (rule 1.4), are normally interpreted as requiring that the system, as used in base, implement the language correctly.

  2. "The language" is defined by the appropriate ANSI/ISO standard (C99, Fortran-2003, C++2003).

  3. The principle of standards conformance is not automatically applied, because SPEC has historically allowed certain exceptions:

    1. Rule 2.3.8 allows reordering of arithmetic operands.
    2. SPEC has not insisted on conformance to the C standard in the setting of errno.
    3. SPEC has not dealt with (and does not intend to deal with) language standard violations that are performance neutral for the CPU 2017 suites.
    4. When a more recent language standard modifies a requirement imposed by an earlier standard, SPEC will also accept systems that adhere to the more recent ANSI/ISO language standard.
  4. Otherwise, a deviation from the standard that is not performance neutral, and gives the particular implementation a performance advantage over standard-conforming implementations, is considered an indication that the requirements about "safe" and "correct code" optimizations are probably not met. Such a deviation may be a reason for SPEC to mark a result non-compliant (NC).

  5. If an optimization causes any SPEC CPU 2017 benchmark to fail to validate, and if the relevant portion of this benchmark's code is within the language standard, the failure is taken as additional evidence that an optimization is not safe.

2.3.2. C++ RTTI and exceptions required

The C++ standard calls for support of both run-time type information (RTTI) and exception handling. The compiler, as used in base, must enable these.

Example 1: a compiler enables exception handling by default; it can be turned off with --noexcept. The switch --noexcept is not allowed in base.

Example 2: a compiler defaults to no run time type information, but allows it to be turned on via --rtti. The switch --rtti must be used in base.

2.3.3. IEEE-754 is not required

  1. SPEC CPU 2017 does not require that compilers or hardware implement IEEE-754.
  2. Other floating point implementations may conceivably be used with CPU 2017.
  3. SPEC CPU 2017 attempts to avoid dependencies on IEEE-754 bit encoding, infinities, NaNs, and subnormal numbers.
  4. In practice, using other floating point implementations may be difficult or impossible because of the requirement to pass validation.
    In particular:

    1. SPEC CPU 2017 benchmarks are primarily drawn from real applications, which have been developed on systems that use the IEEE-754 formats 'binary32' and 'binary64' (formerly known as 'single' and 'double').
    2. SPEC has not tested CPU 2017 using other implementations.
    3. Therefore, there may be latent IEEE-754 assumptions or dependencies within the benchmarks that would cause other floating point implementations to fail validation.

2.3.4. Regarding accuracy

Because language standards generally do not set specific requirements for accuracy, SPEC has also chosen not to do so. Nevertheless:

  1. Optimizations are expected to generate code that provides appropriate accuracy for a class of applications larger than the SPEC benchmarks themselves.
  2. Implementations are encouraged to clearly document any accuracy limitations.
  3. Implementations are encouraged to adhere to the principle of "no surprises"; this can be achieved both by predictable algorithms and by documentation.

In cases where the class of appropriate applications appears to be so narrowly drawn as to constitute a "benchmark special", that may be a reason for SPEC to mark a result non-compliant (NC).

2.3.5. Base flags same for all

For base, the compilation system and all flags must be the same within a suite and language, with the exception of portability flags (rule 2.2.5). Flags are not required to be the same for differing languages, nor for differing suites. The requirement for consistency includes but is not limited to:

  1. Compilation system components:
    1. preprocessors
    2. compilers
    3. linkers
    4. libraries
    5. post-link-optimizers
  2. Flags:
    1. optimization (e.g. -O3)
    2. warning levels (e.g. -w)
    3. verbosity (e.g. -v)
    4. object file controls (e.g. -c, -o)
    5. OpenMP
    6. number of runtime threads (if controlled by compiler options)
    7. language dialect (e.g. -c99)
    8. assertion of standard compliance as explained in rule 2.3.7

Order: Base flags must be used in the same order for all compiles of a given language within a suite.

Multi-language benchmarks: Each module must be compiled the same way as other modules of its language in the suite. Multi-language benchmarks must use the same link flags as other benchmarks with the same "primary" language as designated by SPEC -- see "About Linking" in Make Variables.

The SPEC tools provide methods to set flags on a per-language and per-suite basis. The example below demonstrates legal differences by language and by suite. Notice that a benchmark with both C++ and Fortran modules will use two different optimization levels -- which is legal, is required, and is automatically done.

fpspeed=base:
   CXXOPTIMIZE   = -O3
   FOPTIMIZE     = -O4
fprate=base:
   CXXOPTIMIZE   = -O3
   FOPTIMIZE     = -O5

2.3.5.1. Inter-module optimization for multi-language benchmarks

For mixed-language benchmarks, if the compilers have an incompatible inter-module optimization format, flags that require inter-module format compatibility may be dropped from base optimization of mixed-language benchmarks. The same flags must be dropped from all benchmarks that use the same combination of languages. All other base optimization flags for a given language must be retained for the modules of that language.

For example, suppose that a suite has exactly two benchmarks that employ both C and Fortran, namely 997.CFmix1 and 998.CFmix2. A tester uses a C compiler and Fortran compiler that are sufficiently compatible to be able to allow their object modules to be linked together - but not sufficiently compatible to allow inter-module optimization. The C compiler spells its intermodule optimization switch -ifo, and the Fortran compiler spells its switch --intermodule_optimize. In this case, the following would be legal:

fp=base:
   COPTIMIZE = -fast -O4 -ur=8 -ifo
   FOPTIMIZE = --prefetch:all --optimize:5 --intermodule_optimize
   FLD=/usr/opt/advanced/ld
   FLDOPT=--nocompress --lazyload --intermodule_optimize

997.CFmix1,998.CFmix2=base:
   COPTIMIZE = -fast -O4 -ur=8 
   FOPTIMIZE = --prefetch:all --optimize:5 
   FLD=/usr/opt/advanced/ld
   FLDOPT=--nocompress --lazyload

Following the precedence rules as explained in config.html, the above section specifiers set default tuning for the C and Fortran benchmarks in the floating point suite, but the tuning is modified for the two mixed-language benchmarks to remove switches that would have attempted inter-module optimization.

2.3.6. Feedback directed optimization must not be used in base.

Feedback directed optimization must not be used in base for SPEC CPU 2017.

2.3.7. Assertion flags must not be used in base.

An assertion flag is one that supplies semantic information that the compilation system did not derive from the source statements of the benchmark.

With an assertion flag, the programmer asserts to the compiler that the program has properties that allow the compiler to apply more aggressive optimization techniques. (The common historical example would be to assume that C pointers do not alias, prior to widespread adoption of the C89 aliasing rules.) The problem is that there can be legal programs (possibly strange, but still standard-conforming programs) where such a property does not hold. These programs could crash or give incorrect results if an assertion flag is used. This is the reason why such flags are sometimes also called "unsafe flags". Assertion flags should never be applied to a production program without previous careful checks; therefore they must not be used for base.

Exception: a tester is free to turn on a flag that asserts that the benchmark source code complies to the relevant standard (e.g. -ansi_alias). Note, however, that if such a flag is used, it must be applied to all compiles of the given language (C, C++, or Fortran) in a suite, while still passing SPEC's validation tools with correct answers for all the affected programs.

2.3.8. Floating point reordering allowed

Base results may use flags which affect the numerical accuracy or sensitivity by reordering floating-point operations based on algebraic identities.

2.3.9. Portability Flags for Data Models

"Data model flags" select sizes for items such as int, long, and pointers. For example, several benchmarks use -DSPEC_LP64, -DSPEC_P64, and/or -DSPEC_ILP64 to control the data model.

  1. A tester is free to select a data model that is supported on a particular system.

  2. Data model flags must be set consistently for all modules of a given language in base.

    1. Unlike other portability flags (see rule 2.2.5), there is no requirement to prove that data model flags are needed benchmark-by-benchmark.
    2. Instead, the requirement is that they be used consistently.
    3. The intent is to encourage consistency in base, and to ensure that Application Binary Interfaces (ABIs) are used correctly, even if it is not immediately easy to prove that they should be.
  3. If for some reason it is not practical to use a consistent data model in base, then a tester could describe the problem and request that SPEC allow use of an inconsistent data model in base. SPEC would consider such a request using the same process outlined in rule 2.2.5, including technical arguments as to the nature of the data model problem and the practicality of alternatives, if any. SPEC might or might not grant the request. SPEC might also choose to fix source code limitations, if any, that are causing difficulty.

  4. Many systems allow 32-bit programs to nevertheless use 64 bits for file-related types, under the control of flags such as -D_FILE_OFFSET_BITS=64 or -D_LARGE_FILES. This rule grants the same freedom, and imposes the same requirements, for file size flags.

  5. Portability flags for data models, like all other flags, must be documented, per rule 4.6. In some cases, SPEC supplies documentation (e.g. for -DSPEC_LP64); otherwise, the user must provide the documentation.

Example:

A tester wishes to use 32 bits where possible.
For SPECspeed, as explained in the System Requirements, some benchmarks use data sets much larger than will fit into 32 bits.
Therefore, for SPECspeed, the tester selects 64-bit.
The file system always uses 64 bits on this particular system.

In the config file, the tester writes:

   intrate,fprate=base:
     OPTIMIZE    = -m32 -O5
     PORTABILITY = -D_FILE_OFFSET_BITS=64
   intspeed,fpspeed=base:
     OPTIMIZE    = -m64 -O5
     PORTABILITY = -DSPEC_LP64

The user also edits the flags file to add

   <flag name="F-D_FILE_OFFSET_BITS:64" class="portability">
   Use 64 bits for file-related types.
   </flag>

2.3.10. Alignment switches are allowed

Switches that cause data to be aligned on natural boundaries may be used in base.

2.3.11. Pointer sizes

For SPECrate base, pointer sizes may be set in a manner which requires, or which assumes, that the benchmarks (code+data) fit into 32 bits of address space.

3. Running SPEC CPU 2017

3.1. Single File System

SPEC allows any type of file system (disk-based, memory-based, NFS, DFS, FAT, NTFS etc.) to be used. The type of file system must be disclosed in reported results.

SPEC requires the use of a single file system to contain the directory tree for the SPEC CPU 2017 suite being run, unless the output_root feature is used, in which case at most two file systems may be used: one for the installed SPEC CPU 2017 tree, and one for the run directories.

When running multiple copies for SPECrate testing, all of the run directories must be within the same file system. All copies must be executed under the control of the runcpu program, which:

  1. provides a working directory for each copy of the benchmark;
  2. places one copy of the benchmark binary into the run directories, and causes each of the copies to execute the same binary;
  3. where practical, does the same for input data: it places one copy of the inputs into the run directories, and causes each of the copies to read the same inputs;
  4. validates that each copy computes acceptable answers.

Using the config file features bench_post_setup and/or post_setup, a system command may be issued at the conclusion of the setup of each benchmark and/or at the conclusion of the setup of all benchmarks, to cause the benchmark data to be written to stable storage (e.g. sync). Note: It is not the intent of this run rule to provide a hook for a more generalized cleanup of memory; the intent is simply to allow dirty file system data to be written to stable storage.
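A minimal sketch of such usage, assuming that post_setup accepts a system command as described above:

   post_setup = sync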

3.2. Continuous Run Requirements

All benchmark executions, including the validation steps, contributing to a particular result page must occur continuously, that is, with one user invocation of runcpu. (Internally, runcpu may generate additional invocations of itself as needed; such internal details are not the topic of this rule, which concerns only user-requested invocations.)

It is permitted but not required to compile in the same runcpu invocation as the execution.

It is permitted but not required to run more than one suite (intrate, fprate, intspeed, and/or fpspeed) using a single runcpu command.

3.2.1. Base, Peak, and Basepeak

If a result page will contain both base and peak results, a single runcpu invocation must generate both. When both base and peak are run, the tools run the base executables first, followed by the peak executables.

It is permitted to publish base results as peak. This may be:

  1. Done for an entire suite or for an individual benchmark.
  2. Decided before a run (via the config file basepeak option) or afterwards (via rawformat).
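For example, a tester might decide before the run to report base binaries as peak for a single benchmark (a sketch; the benchmark chosen and the exact option spelling are illustrative):

   521.wrf_r=peak:
      basepeak = yes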

3.2.2. Untimed workloads

Reportable runs include, in the same user invocation, the untimed execution of the two smaller workloads known as "test" and "train" (see runcpu --size). These are followed by the timed "ref" workload. All workloads are checked for valid answers.
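For example, a single reportable invocation that exercises test, train, and ref for one suite might look like this (a sketch; the config file name is illustrative):

   runcpu --config=myconfig.cfg --reportable intrate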

3.3. System State

The system state (for example, "Multi-User", "Single-User", "Safe Mode With Networking") may be selected by the tester. This state must be disclosed. As described in rule 4.4.3, the tester must also disclose whether any services or daemons are shut down, and any changes to tuning parameters.

3.4. Run-time environment

The run-time environment for the benchmarks:

  1. Must be fully described (rule 4).

  2. Must use generally available features, as described in the Philosophy (rule 1 -- the features must be supported, documented, of product quality, and so forth).

  3. Must not differ from benchmark to benchmark during base runs. All means by which a tester might attempt to change the environment for individual benchmarks in base are forbidden, including but not limited to config file hooks such as submit, monitor_pre_bench, and env_vars. If an environment variable is needed for base, the preferred method is preenv, because it restarts runcpu with the specified environment variables and automatically documents them in reports. A config file may set an environment variable in a submit command only on condition that the same submit command must be used for all base benchmarks in a suite.

  4. May differ in peak, using the config file features env_vars and submit.

  5. May be affected by compile-time flags. Sometimes choices can be made at compile time that cause benchmark binaries to carry information about a desired runtime environment. Rule 3.4 does not forbid use of such flags, and the same rules apply for them as for any other compiler flags.

Example 1: It would not be acceptable in base to use submit or env_vars to cause different benchmarks to pick differing page sizes, differing number of threads, or differing choices for local vs. shared memory. In peak, all of these could differ by benchmark.

Example 2: Suppose that a compiler flag, --bigpages=yes, requests that a binary be run with big pages (if available).
Each of the following is permitted: using the flag in base, provided it is applied to all benchmarks of the given language in the suite (rule 2.3.5); or using it in peak for some benchmarks and not others.
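As a concrete illustration of item 3 above, an environment variable needed in base is typically set with a preenv line in the config file. A hedged sketch; the preENV_ spelling follows common config file usage, and the variable and path are purely illustrative:

   preENV_LD_LIBRARY_PATH = /usr/local/mylibs/lib64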

3.5. Submit

The config file option submit may be used to assign work to processors. It is commonly used for SPECrate tests, but can also be used for SPECspeed. The tester may, if desired:

  1. Place benchmark copies or threads on desired processors.
  2. Place the benchmark memory on a desired memory unit and request desired memory types (such as a specific page size).
  3. Set environment variables, such as library paths. Note: the SPEC CPU 2017 tools control OMP_NUM_THREADS; that variable must not be set in a submit command.
  4. Do arithmetic (e.g. via shell commands) to derive a valid processor number from the SPEC copy number.
  5. Cause the tools to write each copy's benchmark invocation lines to a file, which is then sent to its processor.
  6. Reference a testbed description provided by the tester (such as a topology file). The same description must be used by all benchmarks.

The submit command must not be used to set differing run time environments among benchmarks for base (see rule 3.4).

In base, the submit command must be the same for all benchmarks in a suite. In peak, different benchmarks may use different submit commands.
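For instance, a SPECrate submit command might bind each copy to a distinct processor. A hedged sketch: numactl, the $SPECCOPYNUM substitution, and $command follow common config file usage on one class of systems and should be checked against the config file documentation for the actual environment.

   intrate,fprate=default:
      submit = numactl --physcpubind=$SPECCOPYNUM -- $command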

3.6. SPECrate Number of copies

For SPECrate®2017_int_base and SPECrate®2017_fp_base, the tester selects how many copies to run, and the tools apply that to the base runs for all benchmarks in that suite.

For SPECrate®2017_int_peak and SPECrate®2017_fp_peak, the tester is allowed to select a different number of copies for each benchmark.
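For example (a sketch; the copy counts are arbitrary):

   intrate=base:
      copies = 64
   502.gcc_r=peak:
      copies = 56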

3.7. SPECspeed Number of threads

Note: the rules just below about number of threads are in regard to what the tester requests using runcpu --threads or the corresponding config feature. In some cases, the tester might request a particular number of threads, but a different number might be implemented by the actual benchmark.

For SPECspeed®2017_int_base and SPECspeed®2017_fp_base, the tester selects how many threads are desired, and the tools apply that to the base runs for all benchmarks in that suite.

For SPECspeed®2017_int_peak and SPECspeed®2017_fp_peak, the tester is allowed to select a different number of threads for each benchmark.
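For example (a sketch; the thread counts are arbitrary):

   intspeed,fpspeed=base:
      threads = 64
   657.xz_s=peak:
      threads = 32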

3.8. Run-Time Dynamic Optimization

3.8.1. Definitions and Background

As used in these run rules, the term "run-time dynamic optimization" (RDO) refers broadly to any method by which a system adapts to improve performance of an executing program based upon observation of its behavior as it runs. This is an intentionally broad definition, intended to include techniques such as:

  1. rearrangement of code to improve instruction cache performance
  2. replacement of emulated instructions by native code
  3. value prediction
  4. branch predictor training
  5. reallocation of on-chip functional units among hardware threads
  6. TLB training
  7. adjustment of the supply of big pages

RDO may be under control of hardware, software, or both.

Understood this broadly, RDO is already commonly in use, and usage can be expected to increase. SPEC believes that RDO is useful, and does not wish to prevent its development. Furthermore, SPEC views at least some RDO techniques as appropriate for base, on the grounds that some techniques may require no special settings or user intervention; the system simply learns about the workload and adapts.

However, benchmarking a system that includes RDO presents a challenge. A central idea of SPEC benchmarking is to create tests that are repeatable: if you run a benchmark suite multiple times, it is expected that results will be similar, although there will be a small degree of run-to-run variation. But an adaptive system may recognize the program that it is asked to run, and "carry over" lessons learned in the previous execution; therefore, it might complete a benchmark more quickly each time it is run. Furthermore, unlike in real life, the programs in the benchmark suites are presented with the same inputs each time they are run: value prediction is too easy if the inputs never change. In the extreme case, an adaptive system could be imagined that notices which program is about to run, notices what the inputs are, and which reduces the entire execution to a print statement. In the interest of benchmarking that is both repeatable and representative of real-life usage, it is therefore necessary to place limits on RDO carry-over.

3.8.2. RDO Is Allowed, Subject to Certain Conditions

Run time dynamic optimization is allowed, subject to the usual provisions that the techniques must be generally available, documented, and supported. It is also subject to the conditions listed in the rules immediately following.

3.8.3. RDO Disclosure and Resources

Rule 4 applies to run-time dynamic optimization: any settings which the tester has set to non-default values must be disclosed. If RDO requires any hardware resources, these must be included in the description of the hardware configuration.

For example, suppose that a system can be described as a 64-core system. After experimenting for a while, the tester decides that optimum SPECrate throughput is achieved by dedicating 4 cores to the run-time dynamic optimizer, and running only 60 copies of the benchmarks. The system under test is still correctly described as a 64-core system, even though only 60 cores ran SPEC code.

3.8.4. RDO Settings Cannot Be Changed At Run-time

Run time dynamic optimization settings must not be changed at run-time, with the exception that rule 3.4 (e) also applies to RDO. For example, in peak it would be acceptable to compile a subset of the benchmarks with a flag that suggests to the run-time dynamic optimizer that code rearrangement should be attempted. Of course, rule 2.2.1 also would apply: such a flag could not tell RDO which routines to rearrange.

3.8.5. RDO and safety in base

If run-time dynamic optimization is effectively enabled for base (after taking into account the system state at run-time and any compilation flags that interact with the run-time state), then RDO must comply with the safety rule (rule 2.3.1). It is understood that the safety rule has sometimes required judgment, including deliberation by SPEC in order to determine its applicability. The following is intended as guidance for the tester and for SPEC:

  1. If an RDO system optimizes a SPEC benchmark in a way which allows it to successfully process the SPEC-supplied inputs, that is not enough to demonstrate safety. If it can be shown that a different, but valid, input causes the program running under RDO to fail (either by giving a wrong answer or by exiting), where such failure does not occur without RDO; and if it is not a fault of the original source code; then this is taken as evidence that the RDO method is not safe.

  2. If an RDO system requires that programs use a subset of the relevant ANSI/ISO language standard, or requires that they use non-standard features, then this is taken as evidence that it is not safe.

  3. An RDO system is allowed to assume (rule 1.5 (b)) that programs adhere to the relevant ANSI/ISO language standard.

3.8.6. RDO carry-over by program is not allowed

As described in rule 3.8.1, SPEC has an interest in preventing carry-over of information from run to run. Specifically, no information may be carried over which identifies the specific program or executable image. Here are some examples of behavior that is, and is not, allowed.

  1. It doesn't matter whether the information is intentionally stored, or just "left over"; if it's about a specific program, it's not allowed:

    1. Allowed: when a program is run, its use of emulated instructions is noticed by the run-time dynamic optimizer, and these are replaced as it runs, during this run only, by native code.
    2. Not allowed: when the program is re-run, a disk cache is consulted to find out what instructions were replaced last time, and the replacement code is used instead of the original program.
    3. Not allowed: when the program is re-run, the replacement native instructions are still sitting in memory, and the replacement instructions are used instead of the original program.
  2. If information is left over from a previous run that is not associated with a specific program, that is allowed:

    1. Allowed: a virtually-indexed branch predictor is trained during the reference run of 500.perlbench_r. When the second run of 500.perlbench_r is begun, a portion of the branch predictor tables happens to still be in the state that it was in at the end of the previous run (i.e. some entries have not been re-used during runs of intervening programs).
    2. Not allowed: a branch predictor specifically identifies certain virtual addresses as belonging to the executable for 500.perlbench_r, and on the second run of that executable it uses that knowledge.
  3. Any form of RDO that relies on memory of a specific program from a previous run is forbidden:

    1. Allowed: while 500.perlbench_r is running, the run-time dynamic optimizer notices that it seems to be doing a poor job of rearranging instructions for instruction cache packing today, and gives up for the duration of this run.
    2. Not allowed: the next time 500.perlbench_r runs, the run-time dynamic optimizer remembers that it had difficulty last time and decides not to even try this time.
    3. Not allowed: the run-time dynamic optimizer recognizes that this new program is 500.perlbench_r by the fact that it has the same filename, or has the same size, or has the same checksum, or contains the same symbols.
  4. The system is allowed to respond to the currently running program, and to the overall workload:

    1. Allowed: the operating system notices that demand for big pages is intense for the currently running program, and takes measures to increase their supply.
    2. Not allowed: the operating system notices that the demand for big pages is intense for certain programs, and takes measures to supply big pages to those specific programs.
    3. Allowed: the operating system notices that the demand for big pages is intense today, and takes measures to increase the supply of them. This causes all but the first few SPECspeed®2017_fp benchmarks to run more quickly, as the bigpage supply is improved.

3.9. Power and Temperature Measurement

SPEC CPU 2017 allows optional power measurement. If energy metrics are reported, the rules in this section 3.9 apply.

Note: Testers are, as always, expected to follow all other rules in this document. For convenience, it may be noted that power-related changes were made to several rules as of the introduction of SPEC CPU 2017 v1.1.

Additional rule updates may be made from time to time, as mentioned at the top of this document.

3.9.1 SPEC PTDaemon must be used

SPEC's Power and Temperature reporting tool, SPEC PTDaemon, must be used to gather power measurements. The PTDaemon runs on a "controller system", which must be separate from the SUT, collecting data from power analyzers (see rule 3.9.5) and temperature sensors (see rule 3.9.6).

The PTDaemon, along with runcpu, performs various checks on the quality of the measurements. These checks may be adjusted from time to time with subsequent releases of PTDaemon or SPEC CPU 2017.

It is permitted to reformat a power+performance run as performance-only, using the rawformat utility with the --nopower option. You may wish to do so if a run is marked invalid due to sampling or other problems detected during power measurement.

3.9.2. Line Voltage Source

The preferred line voltage source is the main AC power as provided by local utility companies. Power generated from other sources often contains unwanted harmonics that many power analyzers cannot measure correctly, and would therefore generate inaccurate results.

  1. The AC Line Voltage Source must meet the following characteristics:

  2. If an unlisted AC line voltage source is used, a reference to the standard must be provided to SPEC.

  3. For situations in which the appropriate voltages are not provided by local utility companies (e.g. measuring a server in the United States which is configured for European markets, or measuring a server in a location where the local utility line voltage does not meet the required characteristics), an AC power source may be used. In such situations, the following requirements must be met, and the relevant measurements or power source specifications disclosed in the power_notes section of the report:

    1. The total harmonic distortion of the output voltage (under load), based on IEC standards, must be less than 5%
    2. The AC power source needs to meet the frequency and voltage characteristics previously listed in this section.
    3. The AC power source must not manipulate its output in a way that would alter the power measurements compared to a measurement made using a compliant line voltage source without the power source.

    The intent is that the AC power source does not interfere with measurements such as power factor by trying to adjust its output power to improve the power factor of the load.

  4. The usage of an uninterruptible power source (UPS) as the line voltage source is allowed, but the voltage output must be a pure sine-wave. For placement of the UPS, see 3.9.5.a. This usage must be specified in the power_notes section of the report.

  5. Systems that are designed to be able to run normal operations without an external source of power cannot be used to produce valid energy metrics. Some examples of disallowed systems are notebook computers, hand-held computers/communication devices, and servers that are designed to frequently operate on integrated batteries without external power.

    Systems with batteries intended to preserve operations during a temporary lapse of external power, or to maintain data integrity during an orderly shutdown when power is lost, can be used to produce valid energy metrics. For SUT components that have an integrated battery, the battery must be charged at the end of each measurement interval at least to its level of charge at the beginning of that interval, and proof of this must be provided.

    Note that integrated batteries that are intended to maintain such things as durable cache in a storage controller can be assumed to remain fully charged. The above paragraph is intended to address "system" batteries that can provide primary power for the SUT.

3.9.3. Environmental Conditions

For runs with power measurement, there are restrictions on the physical environment in which the run takes place.

  1. Ambient temperature must not go below 20°C for the duration of the run.
  2. Ambient temperature must not go above the documented upper operating limit of the SUT for the duration of the run.
  3. Elevation must be within the documented operating specification of the SUT.
  4. Humidity must remain within documented operating specification of the SUT for the duration of the run.
  5. Overtly directing air flow in order to increase cooling is not allowed unless it is consistent with normal data center practices.

The intent is to discourage extreme environments that may artificially affect power consumption or performance of the SUT, either before or during the benchmark run.

3.9.4. Network Interfaces

For runs with power measurement, at least one configured network interface on each SUT must be connected and operating at a minimum speed of 1 Gb/s, unless the adapter has a maximum speed of less than 1 Gb/s, in which case it must run at its full rated speed.

Automatically reducing network speed and power consumption in response to traffic levels is allowed for network interface controllers with such capabilities, as long as they are also capable of increasing to their configured speed automatically.

3.9.5. Power analyzer requirements

  1. The power analyzer must be located between the AC line voltage source (or UPS) and the SUT. No other active components are allowed between the AC line voltage source and the SUT.

  2. Power analyzer configuration settings that are set by the SPEC PTDaemon must not be manually overridden.

  3. SPEC maintains a list of accepted measurement devices. The devices on that list are currently supported by the SPEC PTDaemon and have specifications compliant with the requirements below.

    1. Measurements: the analyzer must report true RMS power in watts, as well as voltage and either amperes or power factor.
    2. Accuracy: Measurements must be reported by the analyzer with an overall uncertainty of 1% or better for the RMS power ranges measured during the benchmark run. Overall uncertainty means the sum of all specified analyzer uncertainties for the measurements made during the benchmark run. (A worked sketch follows this list.)
    3. Calibration: the analyzer must have been calibrated within the past year to a standard traceable to NIST (U.S.A.) or a counterpart national metrology institute in other countries.
    4. Crest Factor: The analyzer must support a current crest factor of at least 3. For analyzers that do not specify a crest factor, the analyzer must be capable of measuring an amperage spike of at least 3 times the maximum amperage measured during any 1-second sample of the benchmark run.
    5. Logging: The analyzer must have an interface that allows its measurements to be read by the SPEC PTDaemon. The reading rate supported by the analyzer must be at least 1 set of measurements per second. The data averaging interval of the analyzer must be either 1 (preferred) or 2 times the reading interval. "Data averaging interval" is defined as the time period over which all samples captured by the high-speed sampling electronics of the analyzer are averaged to provide the measurement set.
  4. The tester should pick a range that meets the above requirements. A power analyzer may meet these requirements when used in some power ranges but not in others, due to the dynamic nature of power analyzer accuracy and crest factor.

  5. The current range may need to be set to different values for different benchmarks within SPEC CPU 2017. This is allowed, for both base and peak. Rules 2.3.5 and 3.4 shall not be interpreted to forbid doing so.

    For example, the following is allowed. Notice that the integer rate benchmark 557.xz_r uses a different current_range than the other integer rate benchmarks.

    intrate=base:
       current_range = 3
    557.xz_r=base:
       current_range = 4
    intrate=peak:
       current_range = 5
    557.xz_r=peak:
       current_range = 6
  6. A power analyzer's auto-ranging function may be used only if it is not possible to pick a specific current range for a benchmark.

    For example, it is conceivable that a SPECspeed benchmark might need to be measured in a higher current range during a computation phase that uses many OpenMP threads, but in a lower current range during an analysis and reporting phase.

    If auto-ranging is selected, use the free-form notes section to explain why it was needed.

    For results that are used in public (whether at SPEC's web site or elsewhere), if auto-ranging is used, evidence that it is needed must be supplied on request, following the procedures of rule 5.5, Required Disclosures. In this case, the rawfiles to be supplied would be rawfiles that demonstrate the problem when auto-ranging is not used.

    For example: a published result uses current_range=auto for benchmark 649.fotonik3d_s. The power analyzer has current ranges of 6.25A and 25A. Upon request, the tester provides rawfiles demonstrating failure when attempting 649.fotonik3d_s with both these current ranges.
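
As an informal illustration of the accuracy requirement in item 3.b above, the following Python fragment (not part of the SPEC toolset) computes per-reading uncertainty for a hypothetical analyzer whose accuracy is specified as a percentage of reading plus a percentage of the selected range. The specification format, the chosen range, and the sample values are all assumptions; SPEC PTDaemon performs the authoritative checks.

# Hypothetical analyzer accuracy specification (assumed format):
#   +/- 0.5% of reading, +/- 0.2% of the selected range
PCT_OF_READING = 0.005
PCT_OF_RANGE = 0.002
SELECTED_RANGE_WATTS = 400.0            # hypothetical range chosen by the tester

def reading_uncertainty_watts(watts):
    """Uncertainty of one reading, in watts, under the assumed specification."""
    return PCT_OF_READING * watts + PCT_OF_RANGE * SELECTED_RANGE_WATTS

samples = [195.0, 210.0, 205.0, 220.0]  # hypothetical 1-second power readings

worst_relative = max(reading_uncertainty_watts(w) / w for w in samples)
print(f"worst-case relative uncertainty: {worst_relative:.2%}")  # must be 1% or better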

Initial power analyzer setup guidance can be found in the Power and Temperature Measurement Setup Guide.

3.9.6. Temperature Sensor Requirements

  1. Temperature must be measured between 10 mm and 50 mm in front of (upwind of) the main airflow inlet of the SUT.
  2. SPEC maintains a list of accepted measurement devices. Compliant temperature sensors meet the following specifications:

    1. Accuracy: Measurements must be reported by the sensor with an overall accuracy of ± 0.5°C or better for the ranges measured during the benchmark run.
    2. Logging: The sensor must have an interface that allows its measurements to be read by the SPEC PTDaemon.
    3. The reading rate supported by the sensor must be at least 4 sets of measurements per minute.

Initial temperature sensor setup guidance can be found in the Power and Temperature Measurement Setup Guide.

3.9.7. DC Line Voltage

SPEC CPU 2017 power measurement is neither supported nor tested with DC loads.

3.9.8. Power Measurement Exclusion

Network switches and console access devices such as KVMs or serial terminal servers need not be included in the power measurement of the SUT.

4. Results Disclosure

4.0. One-sentence SUMMARY of Disclosure Requirements

All of the various parts of rule 4 may be summarized in one sentence:

For results that are used in public, SPEC requires a full disclosure of results and configuration details sufficient to independently reproduce the results.

Distributions of results within a company for internal use only, or distributions under a non-disclosure agreement, are not considered "public" usage.

4.1. General disclosure requirements

Requirements overview: A full disclosure of results must include:

  1. The components of the disclosure page, as generated by the SPEC tools.
  2. The tester's configuration file and any supplemental files needed to build and run the tests.
  3. A flags definition disclosure.
  4. A full description of the parts and procedures that are needed to reproduce the result, including but not limited to: hardware, firmware/BIOS, and software.
  5. All configuration choices differing from default.
  6. For results with energy metrics, full details of any hardware or non-default settings that affect power consumption, even if they do not affect performance.
  7. Any other information relevant to reproducing the result.

Pre-defined fields are provided for many of these items, as described below. In the case where an item falls outside the pre-defined fields, it must be described using the free-form notes sections.

Note: The disclosure rules are not intended to imply that the tester must include massively redundant information, nor details that are not relevant to reproducing the result.

4.1.1. Tester's responsibility

The requirements for full disclosure apply even if the tester does not personally set up the system. The tester is responsible for ascertaining and documenting all performance-relevant steps that an ordinary customer would take in order to achieve the same performance.

4.1.2. Sysinfo must be used for published results

SPEC CPU 2017 published results must use a SPEC-supplied tool, sysinfo, which examines various aspects of the System Under Test. If for some reason it is not technically feasible to use sysinfo (perhaps on a new architecture), the tester should contact SPEC for assistance.

SPEC may update the sysinfo tool from time to time; updated versions can be downloaded from the SPEC web site using runcpu --update. SPEC may require that published results use a specific version of the tool.

4.1.3. Information learned later

It is expected that all published results are fully described when published. If it should happen that a performance relevant feature is discovered subsequent to publication, the publication must be updated. Note that SPEC has a Penalties and Remedies process which is designed to encourage prompt action in such cases. The Penalties and Remedies document is attached to the Fair Use page, www.spec.org/fairuse.html.

Example 1: If the SuperHero Model 1 comes with a write-through L3 cache, and the SuperHero Model 2 comes with a write-back L3 cache, then specifying the model number is sufficient, and no additional steps need to be taken to document the cache protocol. But if the Model 3 is available with both write-through and write-back L3 caches, then a full disclosure must specify which L3 cache is used.

Example 2: A tester reasonably believes that the choice of whether or not the SUT has manpages is not performance relevant, and does not specify whether that installation option is selected.

Example 3: The same tester as in example 2 is using the recently revised SuperHero OS v13. The tester is surprised to discover that as of v13, the act of installing manpages brings along the entire Superhero Library Online World (SLOW), which includes a web protocol stack containing two dozen software packages, each of which eats 2% of a CPU. The tester:
- de-selects manpages and SLOW,
- documents that fact in the full disclosure for a result that will be published next week, and
- generates a request to the SPEC Editor to add a note to all previously-published SuperHero v13 result pages to indicate whether or not each one includes SLOW.

4.1.4. Peak Metrics are Optional

Publication of peak results is considered optional by SPEC, so the tester may choose to publish only base results. Since by definition base results adhere to all the rules that apply to peak results, the tester may choose to refer to these results by either the base or peak metric names (e.g. SPECspeed®2017_int_base or SPECspeed®2017_int_peak).

It is permitted to publish base-only results. Alternatively, the use of the flag basepeak is permitted, as described in rule 3.2.1.

4.1.5. Base must be disclosed

For results published on its web site, SPEC requires that base results be published whenever peak results are published. If peak results are published outside of the SPEC web site -- www.spec.org -- in a publicly available medium, the tester must supply base results on request, as described in rule 5.5, Required Disclosures.

4.2. Systems not yet shipped

If a tester publishes results for a hardware or software configuration that has not yet shipped:

  1. The component suppliers must have firm plans to make production versions of all components generally available, within 3 months of the first public release of the result (whether first published by the tester or by SPEC). "Generally available" is defined in the SPEC Open Systems Group Policy document, which can be found at www.spec.org/osg/policy.html.

  2. The tester must specify the general availability dates that are planned.

  3. It is acceptable to test larger configurations than customers are currently ordering, provided that the larger configurations can be ordered and the company is prepared to ship them.

4.2.1. Pre-production software can be used

A "pre-production", "alpha", "beta", or other pre-release version of a compiler (or other software) can be used in a test, provided that the performance-related features of the software are committed for inclusion in the final product.

The tester must practice due diligence to ensure that the tests do not use an uncommitted prototype with no particular shipment plans. An example of due diligence would be a memo from the compiler Project Leader which asserts that the tester's version accurately represents the planned product, and that the product will ship on date X.

The final, production version of all components must be generally available within 3 months after first public release of the result.

4.2.2. Software component names

When specifying a software component name in the results disclosure, the component name that should be used is the name that customers are expected to be able to use to order the component, as best as can be determined by the tester. It is understood that sometimes this may not be known with full accuracy; for example, the tester may believe that the component will be called "TurboUnix V5.1.1" and later find out that it has been renamed "TurboUnix V5.2", or even "Nirvana 1.0". In such cases, an editorial request can be made to update the result after publication.

Some testers may wish to also specify the exact identifier of the version actually used in the test (for example, "build 20020604"). Such additional identifiers may aid in later result reproduction, but are not required; the key point is to include the name that customers will be able to use to order the component.

4.2.3. Specifying dates

The configuration disclosure includes fields for both "Hardware Availability" and "Software Availability". In both cases, the date which must be used is the date of the component which is the last of the respective type to become generally available. (The SPEC CPU suite used for testing is NOT part of the consideration for software availability date.)

4.2.4. If dates are not met

If a software or hardware availability date changes, but still falls within 3 months of first publication, a result page may be updated on request to SPEC.

If a software or hardware availability date changes to more than 3 months after first publication, the result is considered Non-Compliant.

4.2.5. Performance changes for pre-production systems

SPEC is aware that performance results for pre-production systems may sometimes be subject to change, for example when a last-minute bugfix reduces the final performance or increases energy consumption.

For results measured on pre-production systems, if the tester becomes aware of something that will reduce production system performance by more than 1.75% on an overall performance metric (for example, SPECspeed®2017_fp_base) or by more than 5% on an overall energy metric (for example, SPECrate®2017_fp_energy_base), the tester is required to republish the result, and the original result shall be considered non-compliant.
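
As a purely illustrative sketch, with made-up metric values rather than numbers from any published result, the following Python fragment applies the 1.75% and 5% thresholds used here and in rule 4.3:

def reduction(original, revised):
    """Fractional reduction of an overall metric from its originally published value."""
    return (original - revised) / original

published_perf, production_perf = 100.0, 98.0       # hypothetical SPECspeed®2017_fp_base values
published_energy, production_energy = 50.0, 48.0    # hypothetical SPECrate®2017_fp_energy_base values

must_republish = reduction(published_perf, production_perf) > 0.0175     # 2.0% reduction -> True
energy_exceeded = reduction(published_energy, production_energy) > 0.05  # 4.0% reduction -> False
print(must_republish, energy_exceeded)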

4.3. Performance changes for production systems

As mentioned just above, performance may sometimes change for pre-production systems; but this is also true of production systems (that is, systems that have already begun shipping). For example, a later revision to the firmware, or a mandatory OS bugfix, might reduce performance.

For production systems, if the tester becomes aware of something that reduces performance by more than 1.75% on an overall performance metric (for example, SPECspeed®2017_fp_base) or by more than 5% on an overall energy metric (for example, SPECrate®2017_fp_energy_base), the tester is encouraged but not required to republish the result. In such cases, the original result is not considered non-compliant. The tester is also encouraged, but not required, to include a reference to the change that caused the difference in performance or power consumption (e.g. "with OS patch 20020604-02").

4.4. Configuration Disclosure

The SPEC tools allow the tester to describe the system in the configuration file, prior to starting the measurement (i.e. prior to the runcpu command).

  1. It is acceptable to update the information after a measurement has been completed, by editing the rawfile. Rawfiles include a marker that separates the user-editable portion from the rest of the file.

    # =============== do not edit below this point ===================
          

    Edits are forbidden beyond that marker.

  2. The rawformat utility may be used to reformat results after edits.
  3. rawformat may be used to publish base results as base+peak (see rule 3.2.1).
  4. rawformat may be used to publish a base+peak result as base only (see runcpu.html#baseonly).
  5. rawformat may be used to update flags files (see rule 4.6.b).
  6. rawformat may be used to publish a power+performance result as performance-only (see rule 3.9.1).

The following rules describe the various elements that make up the disclosure of the system configuration tested.

4.4.1. System Identification

  1. Model Name
  2. Test Date: Month, Year
  3. Hardware Availability Date: Month, Year. If more than one date applies, use the latest one.
  4. Software Availability Date: Month, Year. If more than one date applies, use the latest one. (The SPEC CPU suite used for testing is NOT part of the consideration for software availability date.)
  5. Hardware Vendor
  6. Test sponsor: the entity sponsoring the testing (defaults to hardware vendor).
  7. Tester: the entity actually carrying out the tests (defaults to test sponsor).
  8. CPU 2017 license number of the test sponsor or the tester

4.4.2. Hardware Configuration

  1. CPU Name: A manufacturer-determined processor formal name.

    It is expected that the formal name, perhaps with the addition of the Nominal MHz, identifies the CPU sufficiently so that a customer would know what to order if they wish to reproduce the result. If this is not true, use the free-form notes to further identify the CPU.

  2. CPU Nominal MHz: Nominal chip frequency, in megahertz, as specified by the CPU chip vendor. Exception: In cases where a system vendor has adjusted the frequency ("over-clocking" or "under-clocking") enter the adjusted frequency here and clarify the adjustment in the notes section. Reminder: published results must use supported configurations. Over-clocked parts are not allowed unless supported (perhaps by the system vendor).

  3. CPU Maximum MHz: Maximum chip frequency, in megahertz, as specified by the CPU chip vendor. Exception: If a system vendor adjusts the maximum, enter the adjusted frequency here, in which case the same considerations apply as described just above for Nominal MHz.

  4. Number of CPUs in System. As of mid-2016, it is assumed that systems can be described as containing one or more "chips", each of which contains some number of "cores", each of which can run some number of hardware "threads". Fields are provided in the results disclosure for each of these. If industry practice evolves such that these terms are no longer sufficient to describe processors, SPEC may adjust the field set.

    The current fields are:

    1. hw_ncores: number of processor cores enabled (total) during this test
    2. hw_nchips: number of processor chips enabled during this test
    3. hw_nthreadspercore: number of hardware threads enabled per core during this test

    Note: if resources are disabled, the method(s) used for such disabling must be documented and supported.

  5. Number of CPUs orderable. Specify the number of processors that can be ordered, using whatever units the customer would use when placing an order. If necessary, provide a mapping from that unit to the chips/cores units just above. For example:

    1 to 8 TurboCabinets. Each TurboCabinet contains 4 chips.

  6. Level 1 (primary) Cache: Size, location, number of instances (e.g. "32 KB I + 64 KB D on chip per core")

  7. Level 2 (secondary) Cache: Size, location, number of instances

  8. Level 3 (tertiary) Cache: Size, location, number of instances

  9. Other Cache: Size, location, number of instances

  10. Memory: Size in MB/GB. Performance relevant information as to the memory configuration must be included, either in the field or in the notes section. If there is one and only one way to configure memory of the stated size, then no additional detail need be disclosed. But if a buyer of the system has choices to make, then the result page must document the choices that were made by the tester.

    For example, the tester may need to document number of memory carriers, size of DIMMs, banks, interleaving, access time, or even arrangement of modules: which sockets were used, which were left empty, which sockets had the bigger DIMMs.

    Exception: if the tester has evidence that a memory configuration choice does not affect performance, then SPEC does not require disclosure of the choice made by the tester.

    For example, if a 1GB system is known to perform identically whether configured with 8 x 128MB DIMMs or 4 x 256MB DIMMs, then SPEC does not require disclosure of which choice was made.

  11. Disk Subsystem: Size (MB/GB), Type (SCSI, Fast SCSI etc.), other performance-relevant characteristics. The disk subsystem used for the SPEC CPU 2017 run directories must be described. If other disks are also performance relevant, then they must also be described.

  12. Other Hardware: Additional equipment added to improve performance

4.4.3. Software Configuration

  1. Operating System: Name and Version
  2. System State:

    1. On Linux systems with multiple run levels, the system state must be described by stating the run level and a very brief description of the meaning of that run level, for example:

      System State: Run level 4 (multi-user with display manager)

    2. On other systems: If the system is installed and booted using default options, document the System State as "Default".

      If the system is used in a non-default mode, document the system state using the vocabulary appropriate to that system (for example, "Safe Mode with Networking", "Single User Mode").

    3. Some Unix (and Unix-like) systems have deprecated the concept of "run levels", preferring other terminology for state description. In such cases, the system state field should use the vocabulary recommended by the operating system vendor.

    4. Additional detail about system state may be added in free form notes.

  3. File System Type used for the SPEC CPU 2017 run directories

  4. Compilers:

    1. C Compiler Name and Version
    2. C++ Compiler Name and Version
    3. Fortran Compiler Name and Version
    4. Pre-processors (if used): Name and Version
  5. Parallel: Automatically set to 'yes' if any of the benchmarks are compiled to use multiple hardware threads, cores, and/or chips.

    1. This field is specifically for compiler parallelism, whether from explicit OpenMP markup or from compiler automatic parallelism, typically derived by loop analysis.
    2. It is set via compiler flag attributes, as documented in flag-description.html.
    3. SPEC is aware that there are many other forms of parallel operation on contemporary systems, in addition to the above. Such other forms of parallelism are not reported via this field. For example, none of the following would alone be a reason to report this field as 'yes':

      • a multiply instruction takes 4 operands
      • a math library routine uses 4 chips for a matrix multiply
      • a chip keeps many instructions in-flight, overlapping memory fetches with multiply operations
      • a chip resource manager notices a heavy demand for multiply operations, and reallocates a functional unit
      • a file system manager writes results asynchronously to a RAID array
  6. Other Software: Additional software added to improve performance
  7. Scripted Installations and Pre-configured Software: In order to reduce the cost of benchmarking, test systems are sometimes installed using automatic scripting, or installed as preconfigured system images. A tester might use a set of scripts that configure the corporate-required customizations for IT Standards, or might install by copying a disk image that includes Best Practices of the performance community. SPEC understands that there is a cost to benchmarking, and does not forbid such installations, with the proviso that the tester is responsible for disclosing how end users can achieve the claimed performance (using the appropriate fields above).

    Example: the Corporate Standard Jumpstart Installation Script has 73 documented customizations and 278 undocumented customizations, 34 of which no one remembers. Of the various customizations, 17 are performance relevant for SPEC CPU 2017 - and 4 of these are in the category "no one remembers". The tester is nevertheless responsible for finding and documenting all 17. Therefore to remove doubt, the tester prudently decides that it is less error-prone and more straightforward to simply start from customer media, rather than the Corporate Jumpstart.

4.4.4. Power Management

Power Management: Briefly summarize any non-default settings for power management, whether set in BIOS, firmware, operating system, or elsewhere.

Explain your settings in a platform flags file and/or the power notes.

4.5. Tuning Information

  1. Base flags list
  2. Peak flags list for each benchmark
  3. Portability flags used for any benchmark
  4. Base pointers: size of pointers in base.
    1. "32-bit": if all benchmarks in base are compiled with switches that request only 32-bit pointers.
    2. "64-bit": if all benchmarks in base are compiled with switches that request only 64-bit pointers.
    3. "32/64-bit": if there is a mix of 32-bit and 64-bit
  5. Peak pointers: size of pointers in peak.
  6. System Services: If performance relevant system services or daemons are shut down (e.g. remote management service, disk indexer / defragmenter, spyware defender, screen savers) these must be documented in the notes section. Incidental services that are not performance relevant may be shut down without being disclosed, such as the print service on a system with no printers attached. The tester remains responsible for the results being reproducible as described.

  7. System and other tuning: Operating System tuning selections and other tuning that has been selected by the tester (including but not limited to firmware/BIOS, environment variables, kernel options, file system tuning options, and options for any other performance-relevant software packages) must be documented in the configuration disclosure in the rawfile. The meaning of the settings must also be described, in either the free form notes or in the flags file (rule 4.6). The tuning parameters must be documented and supported.

  8. Any additional notes, such as any use of SPEC-approved alternate sources or tool changes.

  9. If a change is planned for the spelling of a tuning string, both spellings should be documented in the notes section.

    For example, suppose the tester uses a pre-release compiler with:

    f90 -O4 --newcodegen --loopunroll:outerloop:alldisable

    but the tester knows that the new code generator will be automatically applied in the final product, and that the spelling of the unroll switch will be simpler than the spelling used here. The recommended spelling for customers who wish to achieve the effect of the above command will be:

    f90 -O4 -no-outer-unroll

    In this case, the flags report will include the actual spelling used by the tester, but a note should be added to document the spelling that will be recommended for customers.

4.6. Description of Tuning Options ("Flags File")

SPEC CPU 2017 provides benchmarks in source code form, which are compiled under control of SPEC's toolset. Compilation flags are detected and reported by the tools with the help of "flag description files". Such files provide information about the syntax of flags and their meaning.

  1. Flags file required: A result will be marked "invalid" unless it has an associated flag description file. A description of how to write one may be found in flag-description.html.

  2. Results may be reformatted to fix invalid flags files. If a result is marked "invalid" solely due to a missing or incorrect flags file, it is allowed to fix the problem by incorporating an updated flags file, using the rawformat utility --flagsurl option.

  3. Flags description files are not limited to compiler flags. Although these descriptions have historically been called "flags files", flag description files are also used to describe other performance-relevant options.

  4. Notes section or flags file? As mentioned above (rule 4.5), all tuning must be disclosed, and the meaning of the tuning options must be described. In general, it is recommended that the result page should state what tuning has been done, and the flags file should state what it means. As an exception, if a definition is brief, it may be more convenient, and it is allowed, to simply include the definition in the notes section.

  5. Required detail: The level of detail in the description of a flag is expected to be sufficient so that an interested technical reader can form a preliminary judgment of whether he or she would also want to apply the option.

    1. This requirement is phrased as a "preliminary judgment" because a complete judgment of a performance option often requires testing with the user's own application, to ensure that there are no unintended consequences.

    2. At minimum, if a flag has implications for safety, accuracy, or standards conformance, such implications must be disclosed.

    3. For example, one might write:

      When --algebraII is used, the compiler is allowed to use the rules of elementary algebra to simplify expressions and perform calculations in an order that it deems efficient. This flag allows the compiler to perform arithmetic in an order that may differ from the order indicated by programmer-supplied parentheses.

      The preceding sentence ("This flag allows...") is an example of a deviation from a standard which must be disclosed.

  6. Description of Feedback-directed optimization: If feedback directed optimization is used, the description must indicate whether training runs:

    1. gather information regarding execution paths
    2. gather information regarding data values
    3. use hardware performance counters
    4. gather data for optimizations unique to FDO

    Hardware performance counters are often available to provide information such as branch mispredict frequencies, cache misses, or instruction frequencies. If they are used during the training run, the description needs to note this; but SPEC does not require a description of exactly which performance counters are used.

    As with any other optimization, if the optimizations performed have effects regarding safety, accuracy, or standards conformance, these effects must be described.

  7. Flag file sources: It is acceptable to build flags files using previously published results, or to reference a flags file provided by someone else (e.g. a compiler vendor). Doing so does not relieve an individual tester of the responsibility to ensure that his or her own result is accurate, including all its descriptions.

4.7. A result may be published for only one system

Previous versions of these run rules have allowed a single SPEC CPU result to be published for multiple "equivalent" systems, for example when a single system is sold with two different model numbers.

New: As of the release of SPEC CPU 2017 V1.1, such publication is no longer allowed. The measurements must be done on the actual system for which the results are claimed, and a result may be published for only one system.

This change was made for several reasons.

4.8. Configuration Disclosure for User Built Systems

SPEC CPU 2017 results are for systems, not just for chips: it is required that a user be able to obtain the system described in the result page and reproduce the result (within a small range for run-to-run variation).

Nevertheless, SPEC recognizes that chip and motherboard suppliers have a legitimate interest in CPU benchmarking. For those suppliers, the performance-relevant hardware components typically are the cpu chip, motherboard, and memory; but users would not be able to reproduce a result using only those three. To actually run the benchmarks, the user has to supply other components, such as a case, power supply, and disk; perhaps also a specialized CPU cooler, extra fans, a disk controller, graphics card, network adapter, BIOS, and configuration software.

Such systems are sometimes referred to as "white box", "home built", "kit built", or by various informal terms. For SPEC purposes, the key point is that the user has to do extra work in order to reproduce the performance of the tested components; therefore, this document refers to such systems as "user built".

  1. For user built systems, the configuration disclosure must supply a parts list sufficient to reproduce the result. As of the listed availability dates in the disclosure, the user should be able to obtain the items described in the disclosure, spread them out on an anti-static work area, and, by following the instructions supplied with the components, plus any special instructions in the SPEC disclosure, build a working system that reproduces the result. It is acceptable to describe components using a generic name (e.g. "Any ATX case"), but the recipe must also give specific model names or part numbers that the user could order (e.g. "such as a Mimble Company ATX3 case").

  2. Component settings that are listed in the disclosure must be within the supported ranges for those components. For example, if the memory timings are manipulated in the BIOS, the selected timings must be supported for the chosen type of memory.

  3. Graphics adapters:

    1. Sometimes a motherboard does not provide a graphics adapter. For many operating systems, in order to install the software, a graphics card must be added; but the choice of adapter does not noticeably affect SPEC CPU 2017 performance. For such a system, the graphics adapter can be described in the free form notes, and does not need to be listed in the field "Other Hardware".
    2. Other motherboards include a built-in graphics adapter, but SPEC CPU 2017 performance improves when an external adapter is added. If one is added, it is, therefore, performance relevant: list the graphics adapter that was used under "Other Hardware".
  4. Power modes: Sometimes CPU chips are capable of running with differing performance characteristics according to how much power the user would like to spend. If non-default power choices are made for a user built system, those choices must be documented in the notes section.

  5. Cooling systems: Sometimes CPU chips are capable of running with degraded performance if the cooling system (fans, heatsinks, etc.) is inadequate. When describing user built systems, the notes section must describe how to provide cooling that allows the chip to achieve the measured performance.

  6. Components for a user built system may be divided into two kinds: performance-relevant (for SPEC CPU 2017), and non-performance-relevant. For example, SPEC CPU 2017 benchmark scores are affected by memory speed, and motherboards often support more than one choice for memory; therefore, the choice of memory type is performance-relevant. By contrast, the motherboard needs to be mounted in a case. Which case is chosen is not normally performance-relevant; it simply has to be the correct size (e.g. ATX, microATX, etc.).

    1. Performance-relevant components must be described in fields for "Configuration Disclosure" (see rules 4.4.2, and 4.4.3). These fields begin with hw_ or sw_ in the config file, as described in config.html (including hw_other and sw_other, which can be used for components not already covered by other fields). If more detail is needed beyond what will fit in the fields, add more information under the free-form notes.

    2. Components that are not performance-relevant are to be described in the free-form notes.

Example:

hw_cpu_name    = Frooble 1500 
hw_memory      = 2 GB (2x 1GB Mumble Inc Z12 DDR2 1066) 
sw_other       = SnailBios 17
notes_plat_000 = 
notes_plat_005 = The BIOS is the Mumble Inc SnailBios Version 17,
notes_plat_010 = which is required in order to set memory timings
notes_plat_015 = manually to DDR2-800 5-5-5-15.  The 2 DIMMs were
notes_plat_020 = configured in dual-channel mode. 
notes_plat_025 = 
notes_plat_030 = A standard ATX case is required, along with a 500W
notes_plat_035 = (minimum) ATX power supply [4-pin (+12V), 8-pin (+12V)
notes_plat_040 = and 24-pin are required].  An AGP or PCI graphics
notes_plat_045 = adapter is required in order to configure the system.
notes_plat_050 =
notes_plat_055 = The Frooble 1500 CPU chip is available in a retail box,
notes_plat_060 = part 12-34567, with appropriate heatsinks and fan assembly.  
notes_plat_065 =
notes_plat_070 = As tested, the system used a Mimble Company ATX3 case,
notes_plat_075 = a Frimble Ltd PS500 power supply, and a Frumble
notes_plat_080 = Corporation PCIe Z19 graphics adapter.
notes_plat_085 = 

4.9. Documentation for cross-compiles

It was mentioned in rule 2 that it is allowed to build on a different system than the system under test. This rule describes when and how to document such builds.

  1. Circumstances under which additional documentation is required for the build environment

    1. If all components of the build environment are available for the run environment, and if both belong to the same product family and are running the same operating system versions, then this is not considered a cross-compilation. The fact that the binaries were built on a different system than the run time system does not need to be documented.

    2. If the software used to build the benchmark executables is not available on the SUT, or if the host system provides performance gains via specialized tuning or hardware not available on the SUT, the host system(s) and software used for the benchmark building process must be documented.

    3. Sometimes, the person building the benchmarks may not know which of the two previous paragraphs apply, because the benchmark binaries and config file are redistributed to other users who run the actual tests. In this situation, the build environment must be documented.

  2. How to document a build environment.

    1. Compiler name and revision: document in the usual manner, in the Software section (i.e. using config file fields that begin with 'sw_').
    2. Performance libraries added (e.g. optimized memory allocators, tuned math libraries): document in the Software section. This requirement applies even if the binaries are statically linked.
    3. Build-time hardware: do not document in the Hardware section (i.e. do not use the config file fields that begin with 'hw_'); use the notes section, instead.
    4. Build-time Operating System: do not document in the Software section; use the notes section instead.
    5. Dependencies: Benchmark binaries sometimes have dependencies that must be satisfied on the SUT, and it is not uncommon for these dependencies to vary depending on characteristics of the SUT.
      Example: on OS Rev 10 you need patch #1234 with a new runtime loader, but on OS Rev 11, you do not need any patches, because Rev 11 already includes the new loader.
      Such dependencies are usually best documented in the notes, where space is available to explain the circumstances under which they apply. If a dependency is elevated to the Software section, perhaps because it is felt to be a major item that needs visibility, care must be taken to avoid confusion.
    6. Other components: If there are other hardware or software components of the build system that are relevant to performance of the generated binaries, disclose these components in the notes section.
    7. Notes Format: In the notes section, add a statement of the form "Binaries were compiled on a system with <Number of CPU chips> <type of CPU chip(s)> + <nn> <unit> Memory, using <name of os>  <os version> <(other info, such as dependencies and other components)>." Examples:
      1. Binaries were compiled on a system with 2x Model 19 CPU chips + 3 TB Memory using AcmeOS V11
      2. Binaries were compiled on a system with 2x Model 19 CPU chips + 3 TB Memory using AcmeOS V11 (with binutils V99.97 and the Acme CPU Accelerator).

4.10. Metrics

The actual test results consist of the elapsed times and ratios for the individual benchmarks and the overall SPEC metric produced by running the benchmarks via the SPEC tools. The required use of the SPEC tools helps ensure that the results generated are based on benchmarks built, run, and validated according to the SPEC run rules. Below is a list of the measurement components for each SPEC CPU 2017 suite and metric:

4.10.1. SPECspeed® Metrics

The elapsed time in seconds for each of the benchmarks in the SPECspeed 2017 Integer or SPECspeed 2017 Floating Point suite is measured and the ratio to the reference machine (a Sun Fire V490 using the year 2006 UltraSPARC IV+ chip) is calculated for each benchmark as:

SPECspeed Benchmark Performance Ratios
--------------------------------------
   Time on the reference machine
   / Time on the system under test

The overall SPECspeed metrics are calculated as a Geometric Mean of the individual ratios, where each ratio is based on the median execution time from three runs, or the slower of two runs, as explained in rule 1.2.1. All runs are required to validate correctly.

The benchmark executables must have been built according to the rules described in rule 2 above.
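
As an informal illustration of this calculation (not the SPEC tools' implementation, and using made-up times, including hypothetical reference times), the following Python sketch forms per-benchmark ratios from the median of three runs and combines them with a geometric mean:

from math import prod
from statistics import median

# Hypothetical reference-machine and SUT times, in seconds, for two benchmarks;
# each SUT entry holds the elapsed times of three validated runs.
reference_seconds = {"600.perlbench_s": 1775.0, "602.gcc_s": 3981.0}
measured_seconds = {"600.perlbench_s": [300.0, 310.0, 305.0],
                    "602.gcc_s": [550.0, 545.0, 560.0]}

ratios = {bm: reference_seconds[bm] / median(times)     # each ratio uses the median of the three runs
          for bm, times in measured_seconds.items()}
overall = prod(ratios.values()) ** (1.0 / len(ratios))  # geometric mean of the per-benchmark ratios
print(ratios, round(overall, 2))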

4.10.2. SPECrate® Metrics

The SPECrate (throughput) metrics are calculated based on the execution time for a tester-selected number of copies of each benchmark. The same number of copies must be used for all benchmarks in a base test; in peak, the tester is free to select a different number of copies for each benchmark. The number of copies selected is usually a function of the number of CPUs in the system.

The SPECrate is calculated for each benchmark as:

SPECrate Benchmark Performance Ratios
-------------------------------------
   Number of copies *
   (Time on the reference machine 
   / Time on the system under test) 

The overall SPECrate metrics are calculated as a geometric mean from the individual benchmark SPECrate metrics using the median time from three runs or the slower of two runs, as explained above (rule 1.2.1).

As with the SPECspeed metric, all copies of the benchmark during each run are required to have validated correctly.
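
Continuing the informal sketch above (again with made-up numbers), a SPECrate ratio additionally multiplies the time ratio by the number of copies:

copies = 64                      # in base, the same copy count is used for every benchmark
reference_seconds = 1592.0       # hypothetical reference-machine time for one benchmark
sut_seconds = 400.0              # median (or slower-of-two) elapsed time for the copies on the SUT

specrate_ratio = copies * (reference_seconds / sut_seconds)
print(round(specrate_ratio, 2))  # the overall metric is the geometric mean of such per-benchmark ratios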

4.10.3. Energy Metrics

SPEC CPU 2017 includes the ability to measure and report energy metrics, including Maximum Power, Average Power, and Energy (in kilojoules).

As of SPEC CPU 2017 V1.1, power measurement and reporting:

  1. Is optional. Testers are not required to measure or report power.
  2. Produces official metrics. The SPEC Fair Use Rules www.spec.org/fairuse.html describe the allowed usage of SPEC Metrics for system comparisons. For purposes of Fair Use, power metrics from SPEC CPU 2017 v1.1 and later are official SPEC Metrics and as such may be used in comparisons.

If power measurement is enabled, the total energy consumed in kilojoules for each of the benchmarks is measured and the energy ratio to the consumption of the reference machine (a Sun Fire V490 using the year 2006 UltraSPARC IV+ chip) is calculated as:

SPECspeed Benchmark Energy Ratios           SPECrate Benchmark Energy Ratios
---------------------------------           ----------------------------------
Energy on the reference machine             Number of copies *
/ Energy on the system under test           (Energy on the reference machine 
                                            / Energy on the system under test) 

The overall energy metrics are calculated as a Geometric Mean of the individual energy ratios, where each ratio is based on the total energy consumed by the performance run selected for reporting. The overall metrics are called:

SPECspeed®2017_int_energy_base              SPECrate®2017_int_energy_base
SPECspeed®2017_int_energy_peak              SPECrate®2017_int_energy_peak
SPECspeed®2017_fp_energy_base               SPECrate®2017_fp_energy_base
SPECspeed®2017_fp_energy_peak               SPECrate®2017_fp_energy_peak
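
The energy ratios follow the same pattern as the performance ratios. The informal sketch below, again with made-up kilojoule totals, shows both forms:

ref_kilojoules, sut_kilojoules, copies = 5000.0, 900.0, 64  # hypothetical per-benchmark energy totals

specspeed_energy_ratio = ref_kilojoules / sut_kilojoules            # SPECspeed energy ratio
specrate_energy_ratio = copies * (ref_kilojoules / sut_kilojoules)  # SPECrate energy ratio
# Each overall energy metric is the geometric mean of the per-benchmark energy ratios,
# taken from the performance run selected for reporting.
print(round(specspeed_energy_ratio, 2), round(specrate_energy_ratio, 2))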

5. SPEC Process Information

5.1. Run Rule Exceptions

If, for some reason, the tester cannot run the benchmarks as specified in these rules, the tester can seek SPEC approval for performance-neutral alternatives. No publication may be done without such approval. See Technical Support for SPEC CPU 2017 for information.

5.2. Publishing on the SPEC website

Procedures for result submission are described in the Open Systems Group Policies and Procedures Document, www.spec.org/osg/policy.html. Additional information on how to publish a result on SPEC's web site may be obtained from the SPEC office. Contact information is maintained at the SPEC web site, www.spec.org/.

5.3. Fair Use

Consistency and fairness are guiding principles for SPEC. To help assure that these principles are met, any organization or individual who makes public use of SPEC benchmark results must do so in accordance with the SPEC Fair Use Rules, as posted at www.spec.org/fairuse.html.

5.4. Research and Academic usage of CPU 2017

SPEC encourages use of the CPU 2017 suites in academic and research environments. It is understood that experiments in such environments may be conducted in a less formal fashion than that demanded of testers who publish on the SPEC web site. For example, a research environment may use early prototype hardware that simply cannot be expected to stay up for the length of time required to meet the Continuous Run requirement (see rule 3.2), or may use research compilers that are unsupported and are not generally available (see rule 1).

Nevertheless, SPEC would like to encourage researchers to obey as many of the run rules as practical, even for informal research. SPEC respectfully suggests that following the rules will improve the clarity, reproducibility, and comparability of research results.

Where the rules cannot be followed, SPEC requires that the deviations from the rules be clearly disclosed, and that any SPEC metrics (such as SPECrate®2017_int_base) be clearly marked as estimated.

It is especially important to clearly distinguish results that do not comply with the run rules when the areas of non-compliance are major, such as not using the reference workload, or only being able to correctly validate a subset of the benchmarks.

5.5. Required Disclosures

If a SPEC CPU 2017 licensee publicly discloses a CPU 2017 result (for example in a press release, academic paper, magazine article, or public web site), and does not clearly mark the result as an estimate, any SPEC member may request that the rawfile(s) from the run(s) be sent to SPEC. The rawfiles must be made available to all interested members no later than 10 working days after the request. The rawfile is expected to be complete, including configuration information (rule 4.4 above).

A required disclosure is considered public information as soon as it is provided, including the configuration description.

For example, Company A claims a result of 1000 SPECrate®2017_int_peak. A rawfile is requested, and supplied. Company B notices that the result was achieved by stringing together 50 chips in single-user mode. Company B is free to use this information in public (e.g. it could compare the Company A machine vs. a Company B machine that scores 999 using only 25 chips in multi-user mode).

Review of the result: Any SPEC member may request that a required disclosure be reviewed by the SPEC CPU subcommittee. At the conclusion of the review period, if the tester does not wish to have the result posted on the SPEC result pages, the result will not be posted. Nevertheless, as described above, the details of the disclosure are public information.

When public claims are made about CPU 2017 results, whether by vendors or by academic researchers, SPEC reserves the right to take action if the rawfile is not made available, or shows different performance than the tester's claim, or has other rule violations.

5.6. Estimates

5.6.1 Estimates are not allowed for energy metrics

Estimates are not allowed for any of the SPEC CPU 2017 energy metrics. All public use of SPEC CPU 2017 Energy Metrics must be from rule-compliant results.

5.6.2 Estimates are allowed for performance metrics

SPEC CPU 2017 performance metrics (but not energy metrics) may be estimated.

  1. All estimates must be clearly identified as such. It is acceptable to estimate a single metric (for example, SPECrate®2017_int_base, or SPECspeed®2017_fp_peak, or the elapsed seconds for 500.perlbench_r).
  2. It is permitted to estimate only the peak metric; one is not required to provide a corresponding estimate for base.

  3. SPEC requires that every use of an estimated number be clearly marked with "est." or "estimated" next to each estimated number, rather than in a footnote buried at the bottom of a page. For example:

     The JumboFast will achieve estimated performance of:
       Model 1   SPECrate®2017_int_base  50 est.
                 SPECrate®2017_int_peak  60 est.
       Model 2   SPECrate®2017_int_base  70 est.
                 SPECrate®2017_int_peak  80 est.
          
  4. If estimates are used in graphs, the word "estimated" or "est." must be plainly visible within the graph, for example in the title, the scale, the legend, or next to each individual result that is estimated. Note that the term "plainly visible" in this rule is not defined; it is intended as a call for responsible design of graphical elements. Nevertheless, for the sake of giving at least rough guidance, here are two examples of the right way and wrong way to mark estimated results in graphs:

    • Acceptable: a 3 inch by 4 inch graph has 12 point (=1 pica) "est." markings directly above the top of every affected bar, using black type against a white background.
    • Unacceptable: a 1 meter by 3 meter poster has 12 point "est." markings ambiguously placed, with light gray text on a dark gray background.
  5. Licensees are encouraged to give a rationale or methodology for any estimates, together with other information that may help the reader assess the accuracy of the estimate. For example:

    1. "This is a measured estimate: SPEC CPU 2017 was run on pre-production hardware. Customer systems, planned for Q4, are expected to be similar."
    2. "Performance estimates are modeled using the cycle simulator GrokSim Mark IV. It is likely that actual hardware, if built, would include significant differences."
  6. Those who publish estimates are encouraged to publish actual SPEC CPU 2017 metrics as soon as possible.

5.7. Procedures for Non-Compliant Results

The procedures regarding Non-Compliant results are briefly outlined in the SPEC Open Systems Group (OSG) Policy Document, www.spec.org/osg/policy.html, and are described in detail in the document "Violations Determination, Penalties, and Remedies", www.spec.org/spec/docs/penaltiesremedies.pdf.

SPEC CPU®2017 Run and Reporting Rules SPEC Open Systems Group: Copyright © 2017-2021 Standard Performance Evaluation Corporation (SPEC®)