OMP2012 Flag Description
Hewlett Packard Enterprise Integrity MC990 X (2.20 GHz, Intel Xeon E7-8890 v4)

Test sponsored by HPE

Copyright © 2012 Intel Corporation. All Rights Reserved.


Base Compiler Invocation

C benchmarks

C++ benchmarks

Fortran benchmarks


Peak Compiler Invocation

C benchmarks

C++ benchmarks

Fortran benchmarks


Base Portability Flags

350.md

357.bt331

363.swim

367.imagick

370.mgrid331


Peak Portability Flags

350.md

357.bt331

363.swim

367.imagick

370.mgrid331


Base Optimization Flags

C benchmarks

C++ benchmarks

Fortran benchmarks

350.md

351.bwaves

357.bt331

360.ilbdc

362.fma3d

363.swim

370.mgrid331

371.applu331


Peak Optimization Flags

C benchmarks

352.nab

358.botsalgn

359.botsspar

367.imagick

372.smithwa

C++ benchmarks

376.kdtree

Fortran benchmarks

350.md

351.bwaves

357.bt331

360.ilbdc

362.fma3d

363.swim

370.mgrid331

371.applu331


Implicitly Included Flags

This section contains descriptions of flags that were included implicitly by other flags, but which do not have a permanent home at SPEC.


Shell, Environment, and Other Software Settings

OpenMP Tuning Flags

  • KMP_AFFINITY

    The KMP_AFFINITY environment variable uses the following general syntax:

    Syntax

    KMP_AFFINITY=[<modifier>,...]<type>[,<permute>][,<offset>]

    For example, to list a machine topology map, specify KMP_AFFINITY=verbose,none to use a modifier of verbose and a type of none.

    The supported arguments are described below.

    Argument: modifier
    Default: noverbose, respect, granularity=core
    Description: Optional. String consisting of keyword and specifier.

    • granularity=<specifier>
      takes the following specifiers: fine, thread, and core
    • norespect
    • noverbose
    • nowarnings
    • proclist={<proc-list>}
    • respect
    • verbose
    • warnings

    Argument: type
    Default: none
    Description: Required string. Indicates the thread affinity to use.

    • compact
    • disabled
    • explicit
    • none
    • scatter
    • logical (deprecated; instead use compact, but omit any permute value)
    • physical (deprecated; instead use scatter, possibly with an offset value)

    The logical and physical types are deprecated but supported for backward compatibility.

    Argument: permute
    Default: 0
    Description: Optional. Positive integer value. Not valid with type values of explicit, none, or disabled.

    Argument: offset
    Default: 0
    Description: Optional. Positive integer value. Not valid with type values of explicit, none, or disabled.

    Affinity Types

    Type is the only required argument.

    type = none (default)

    Does not bind OpenMP threads to particular thread contexts; however, if the operating system supports affinity, the compiler still uses the OpenMP thread affinity interface to determine machine topology. Specify KMP_AFFINITY=verbose,none to list a machine topology map.

    type = compact

    Specifying compact assigns OpenMP thread <n>+1 to a free thread context as close as possible to the thread context where OpenMP thread <n> was placed. In a topology map, the nearer a node is to the root, the more significance the node has when sorting the threads.

    type = disabled

    Specifying disabled completely disables the thread affinity interfaces. This forces the OpenMP run-time library to behave as if the affinity interface were not supported by the operating system. This includes the low-level API interfaces such as kmp_set_affinity and kmp_get_affinity, which have no effect and return a nonzero error code.

    type = explicit

    Specifying explicit assigns OpenMP threads to a list of OS proc IDs that have been explicitly specified by using the proclist= modifier, which is required for this affinity type.

    type = scatter

    Specifying scatter distributes the threads as evenly as possible across the entire system. scatter is the opposite of compact, so the leaf nodes are most significant when sorting through the machine topology map.

    Deprecated Types: logical and physical

    Types logical and physical are deprecated and may become unsupported in a future release. Both are supported for backward compatibility.

    For logical and physical affinity types, a single trailing integer is interpreted as an offset specifier instead of a permute specifier. In contrast, with compact and scatter types, a single trailing integer is interpreted as a permute specifier.

    Specifying logical assigns OpenMP threads to consecutive logical processors, which are also called hardware thread contexts. The type is equivalent to compact, except that the permute specifier is not allowed. Thus, KMP_AFFINITY=logical,n is equivalent to KMP_AFFINITY=compact,0,n (this equivalence holds whether or not a granularity=fine modifier is present).

    Permute and offset combinations

    For both compact and scatter, permute and offset are allowed; however, if you specify only one integer, the compiler interprets the value as a permute specifier. Both permute and offset default to 0.  

    The permute specifier controls which levels are most significant when sorting the machine topology map. A value for permute forces the mappings to make the specified number of most significant levels of the sort the least significant, and it inverts the order of significance. The root node of the tree is not considered a separate level for the sort operations.

    The offset specifier indicates the starting position for thread assignment.
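    The rules above for reading the trailing integers can be sketched as a small shell function. This is only an illustration: the function name parse_kmp_affinity and the aff_* variables are hypothetical, and the sketch assumes that no proclist= modifier is present, since its list may itself contain commas.

```shell
# Sketch: split a KMP_AFFINITY value into modifiers, type, permute and offset.
# Assumption: no proclist= modifier is present (its list may contain commas).
parse_kmp_affinity() {
    aff_modifiers=""
    aff_type=""
    aff_permute=0
    aff_offset=0
    aff_ints=""
    old_ifs=$IFS
    IFS=,
    set -- $1                  # split the value on commas
    IFS=$old_ifs
    for tok in "$@"; do
        case $tok in
            compact|disabled|explicit|none|scatter|logical|physical)
                aff_type=$tok ;;
            ''|*[!0-9]*)
                # anything else that is not a plain integer is a modifier
                aff_modifiers="${aff_modifiers:+$aff_modifiers }$tok" ;;
            *)
                aff_ints="${aff_ints:+$aff_ints }$tok" ;;
        esac
    done
    set -- $aff_ints           # the trailing integers, in order
    case $aff_type in
        logical|physical)
            # a single trailing integer is an offset for these types
            aff_offset=${1:-0} ;;
        *)
            # otherwise the first integer is permute, the second is offset
            aff_permute=${1:-0}
            aff_offset=${2:-0} ;;
    esac
}

parse_kmp_affinity "granularity=fine,compact,1"
echo "type=$aff_type permute=$aff_permute offset=$aff_offset modifiers=$aff_modifiers"
```

    With compact,1 the single integer lands in permute, whereas logical,4 would place it in offset, matching the distinction described above.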

    Modifier Values for Affinity Types

    Modifiers are optional arguments that precede type. If you do not specify a modifier, the noverbose, respect, and granularity=core modifiers are used automatically.

    Modifiers are interpreted in order from left to right, and can negate each other. For example, specifying KMP_AFFINITY=verbose,noverbose,scatter is equivalent to setting KMP_AFFINITY=noverbose,scatter, or just KMP_AFFINITY=scatter.
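    The left-to-right rule can be illustrated with a short shell sketch that reports which verbosity setting wins. The function name effective_verbosity is hypothetical; the run-time library performs this resolution internally, and the sketch only mirrors the documented behavior.

```shell
# Sketch: resolve verbose/noverbose the way the text describes,
# scanning the modifiers from left to right so that the last one wins.
effective_verbosity() {
    state=noverbose            # documented default
    old_ifs=$IFS
    IFS=,
    set -- $1                  # split the value on commas
    IFS=$old_ifs
    for tok in "$@"; do
        case $tok in
            verbose|noverbose) state=$tok ;;
        esac
    done
    echo "$state"
}

effective_verbosity "verbose,noverbose,scatter"
```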

    modifier = noverbose (default)

    Does not print verbose messages.

    modifier = verbose

    Prints messages concerning the supported affinity. The messages include information about the number of packages, number of cores in each package, number of thread contexts for each core, and OpenMP thread bindings to physical thread contexts.

    Information about binding OpenMP threads to physical thread contexts is indirectly shown in the form of the mappings between hardware thread contexts and the operating system (OS) processor (proc) IDs. The affinity mask for each OpenMP thread is printed as a set of OS processor IDs.

    KMP_AFFINITY example, syntax: KMP_AFFINITY=[modifier,...]type[,permute][,offset]

    The value for the environment variable KMP_AFFINITY affects how the threads from an auto-parallelized program are scheduled across processors. It applies to binaries built with -openmp and -parallel (Linux and Mac OS X) or /Qopenmp and /Qparallel (Windows).

    Please see the Thread Affinity Interface article in the Intel Composer XE Documentation for more details.

    Example: KMP_AFFINITY=granularity=fine,scatter

    Example: KMP_AFFINITY=compact,1

  • KMP_LIBRARY

    KMP_LIBRARY = [ throughput | turnaround | serial ]. Selects the OpenMP run-time library execution mode.

    Execution modes

    The compiler with OpenMP enables you to run an application under different execution modes that can be specified at run time. The libraries support the serial, turnaround, and throughput modes.

    Serial

    The serial mode forces parallel applications to run on a single processor.

    Turnaround

    In a dedicated (batch or single user) parallel environment where all processors are exclusively allocated to the program for its entire run, it is most important to effectively utilize all of the processors all of the time. The turnaround mode is designed to keep active all of the processors involved in the parallel computation in order to minimize the execution time of a single job. In this mode, the worker threads actively wait for more parallel work, without yielding to other threads.

    Avoid over-allocating system resources. This occurs if either too many threads have been specified, or if too few processors are available at run time. If system resources are over-allocated, this mode will cause poor performance. The throughput mode should be used instead if this occurs.

    Throughput

    In a multi-user environment where the load on the parallel machine is not constant or where the job stream is not predictable, it may be better to design and tune for throughput. This minimizes the total time to run multiple jobs simultaneously. In this mode, the worker threads will yield to other threads while waiting for more parallel work.

    The throughput mode is designed to make the program aware of its environment (that is, the system load) and to adjust its resource usage to produce efficient execution in a dynamic environment. This mode is the default.
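    As a rough illustration of the guidance above, the following sketch picks a mode by comparing the requested thread count with the number of available processors. The function name choose_kmp_library and the rule itself are assumptions made for illustration, and the sketch presumes a dedicated machine; it is not part of the OpenMP run-time library.

```shell
# Sketch: choose turnaround only when the machine is not over-allocated,
# otherwise fall back to throughput as the text recommends.
choose_kmp_library() {
    threads=$1
    procs=$2
    if [ "$threads" -gt "$procs" ]; then
        echo throughput
    else
        echo turnaround
    fi
}

# Illustrative values: 8 threads on 8 available processors.
KMP_LIBRARY=$(choose_kmp_library 8 8)
export KMP_LIBRARY
echo "KMP_LIBRARY=$KMP_LIBRARY"
```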

  • KMP_BLOCKTIME

    KMP_BLOCKTIME = value. Sets the time, in milliseconds, that a thread should wait, after completing the execution of a parallel region, before sleeping. Use the optional character suffixes s (seconds), m (minutes), h (hours), or d (days) to specify the units. Specify infinite for an unlimited wait time.
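    To make the unit suffixes concrete, the sketch below converts a KMP_BLOCKTIME value to milliseconds. The helper name blocktime_to_ms is hypothetical, and only the suffixes listed above are handled.

```shell
# Sketch: convert a KMP_BLOCKTIME value to milliseconds.
# A bare number is already in milliseconds; "infinite" is passed through.
blocktime_to_ms() {
    case $1 in
        infinite) echo infinite ;;
        *s) echo $(( ${1%s} * 1000 )) ;;
        *m) echo $(( ${1%m} * 60 * 1000 )) ;;
        *h) echo $(( ${1%h} * 60 * 60 * 1000 )) ;;
        *d) echo $(( ${1%d} * 24 * 60 * 60 * 1000 )) ;;
        *)  echo "$1" ;;
    esac
}

blocktime_to_ms 2s
```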

  • KMP_STACKSIZE

    KMP_STACKSIZE = value. Sets the number of bytes to allocate for each OpenMP* thread to use as the private stack for the thread. Recommended size is 16m. Use the optional suffixes: b (bytes), k (kilobytes), m (megabytes), g (gigabytes), or t (terabytes) to specify the units. This variable does not affect the native operating system threads created by the user program nor the thread executing the sequential part of an OpenMP* program or parallel programs created using -parallel.
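    The size suffixes can be illustrated in the same way. The sketch below converts a KMP_STACKSIZE value to bytes; the helper name stacksize_to_bytes is hypothetical, and the use of binary multiples (1k = 1024 bytes) is an assumption that the text above does not state.

```shell
# Sketch: convert a KMP_STACKSIZE value to bytes.
# Assumption: suffixes are binary multiples (1k = 1024 bytes).
stacksize_to_bytes() {
    case $1 in
        *b) echo "${1%b}" ;;
        *k) echo $(( ${1%k} * 1024 )) ;;
        *m) echo $(( ${1%m} * 1024 * 1024 )) ;;
        *g) echo $(( ${1%g} * 1024 * 1024 * 1024 )) ;;
        *t) echo $(( ${1%t} * 1024 * 1024 * 1024 * 1024 )) ;;
        *)  echo "$1" ;;       # no suffix: the value is already in bytes
    esac
}

export KMP_STACKSIZE=16m       # the recommended size from the text
stacksize_to_bytes "$KMP_STACKSIZE"
```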

  • KMP_DETERMINISTIC_REDUCTION

    KMP_DETERMINISTIC_REDUCTION = value. Enables (true) or disables (false) the use of a specific ordering of the reduction operations for implementing the reduction clause for an OpenMP parallel region. This has the effect that, for a given number of threads, in a given parallel region, for a given data set and reduction operation, a floating point reduction done for an OpenMP* reduction clause has a consistent floating point result from run to run, since round-off errors are identical. Default: value = false/0

  • KMP_SCHEDULE

    KMP_SCHEDULE = type[,chunk]. Fine-tunes the load balancing of parallel loops that are statically scheduled under OpenMP with no chunk size specification. Setting it to "static,balanced" results in (#iterations/#threads) iterations, rounded to the next lower integer, being allocated to most threads, with at most one additional iteration being allocated to some threads. Although the largest number of iterations assigned to any thread remains the same, this results in a more even sharing of iterations between threads, which may sometimes lead to a performance improvement relative to the default static thread distribution.
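    The arithmetic behind "static,balanced" can be worked through with a short sketch. The function name balanced_counts is hypothetical, and which threads receive the extra iteration is an assumption (here, the lowest-numbered ones); only the per-thread counts follow from the description above.

```shell
# Sketch: per-thread iteration counts for a balanced static split.
# Every thread gets floor(iters/threads); the remainder is handed out
# one iteration at a time (assumed here to go to the first threads).
balanced_counts() {
    iters=$1
    threads=$2
    base=$(( iters / threads ))
    extra=$(( iters % threads ))
    t=0
    out=""
    while [ "$t" -lt "$threads" ]; do
        if [ "$t" -lt "$extra" ]; then
            n=$(( base + 1 ))
        else
            n=$base
        fi
        out="${out:+$out }$n"
        t=$(( t + 1 ))
    done
    echo "$out"
}

balanced_counts 10 4
```

    For 10 iterations on 4 threads this yields counts of 3, 3, 2, and 2, so no thread is more than one iteration away from any other.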

  • OMP_NUM_THREADS

    Sets the maximum number of threads to use for OpenMP* parallel regions if no other value is specified in the application. This environment variable applies to both -openmp and -parallel. Example syntax on a Linux system with 8 cores: export OMP_NUM_THREADS=8

  • OMP_DYNAMIC

    OMP_DYNAMIC=[ 1 | 0 ] Enables (1) or disables (0) the dynamic adjustment of the number of threads.

  • OMP_WAIT_POLICY

    OMP_WAIT_POLICY = [ active | passive ]. Decides whether threads spin (active) or sleep (passive) while they are waiting.

  • OMP_SCHEDULE

    OMP_SCHEDULE = type[,chunk]

    type = [ static | dynamic | guided | auto ]
    chunk = optional positive integer that specifies the chunk size

    The OMP_SCHEDULE environment variable controls the schedule type and chunk size of all loop directives that have the schedule type runtime.

  • OMP_NESTED

    OMP_NESTED={ 1 | 0 } Enables creation of new teams in case of nested parallel regions (1,true) or serializes (0,false) all nested parallel regions. Default is 0.


  • Operating System Tuning Parameters

    OS Tuning

    submit= MYMASK=`printf '0x%x' \$((1<<\$SPECCOPYNUM))`; /usr/bin/taskset \$MYMASK $command

    When running multiple copies of benchmarks, the SPEC config file feature submit is sometimes used to cause individual jobs to be bound to specific processors. This specific submit command is used for Linux. It builds a CPU mask with a single bit set at position $SPECCOPYNUM (stored in MYMASK) and runs the benchmark command ($command) under /usr/bin/taskset with that mask.
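    The mask computation in the submit line can be tried on its own, outside a SPEC config file and without the backslash escaping used there. The function name copy_mask is hypothetical; the printf and shift expression are the same ones that appear in the submit command.

```shell
# Sketch: the mask that the submit line would compute for a given copy number.
copy_mask() {
    printf '0x%x' $((1 << $1))
}

for SPECCOPYNUM in 0 1 2 3; do
    echo "copy $SPECCOPYNUM -> mask $(copy_mask "$SPECCOPYNUM")"
done
```

    Because shell arithmetic is typically limited to 64-bit integers, a mask built this way can only address a limited range of processor numbers.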

    ulimit -s [n | unlimited] (Linux)

    Sets the stack size to n kbytes, or unlimited to allow the stack size to grow without limit.

    vm.max_map_count=n (Linux)

    The maximum number of memory map areas a process may have. Memory map areas are used as a side-effect of calling malloc, directly by mmap and mprotect, and also when loading shared libraries.
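    A minimal sketch for inspecting the current value, assuming a Linux system that exposes the setting under /proc/sys. The function name read_max_map_count is hypothetical.

```shell
# Sketch: report the current vm.max_map_count, if the kernel exposes it.
read_max_map_count() {
    if [ -r /proc/sys/vm/max_map_count ]; then
        cat /proc/sys/vm/max_map_count
    else
        echo unknown
    fi
}

echo "vm.max_map_count is currently: $(read_max_map_count)"

# Changing the value requires root privileges, for example:
#   sysctl -w vm.max_map_count=<n>
```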

    Performance Governors (Linux)

    The in-kernel CPU frequency governors are pre-configured power schemes for the CPU. The CPUfreq governors use P-states to change frequencies and lower power consumption. The dynamic governors can switch between CPU frequencies, based on CPU utilization to allow for power savings while not sacrificing performance.

    For the Performance Governor, the CPU frequency is statically set to the highest possible for maximum performance.

    On SUSE SLES 11 and 12 systems you can set the in-kernel CPU frequency governor for all CPUs to the Performance Governor with the following command:
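    The command itself is not reproduced in this copy of the document. As a hedged sketch, the following reports which governors are currently active by reading the cpufreq entries in sysfs; the function name list_governors is hypothetical. The commented cpupower invocation is an assumption about a typical way to select the Performance Governor, not a command confirmed by this document.

```shell
# Sketch: list the cpufreq governor active on each CPU, where available.
list_governors() {
    found=""
    for f in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue      # skip if cpufreq is not exposed
        found="$found $(cat "$f")"
    done
    echo "governors in use:${found:- none reported}"
}

list_governors

# Assumed way to switch all CPUs (requires root and the cpupower utility):
#   cpupower frequency-set -g performance
```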


    Firmware / BIOS / Microcode Settings

    Firmware Settings

    One or more of the following settings may have been set. If so, the "Platform Notes" section of the report will say so; and you can read below to find out more about what these settings mean.

    Intel Hyperthreading Options (Default = Enabled):

    This feature allows enabling/disabling of logical processor cores on processors supporting Intel's Hyper-Threading Technology. This option may improve overall performance for applications that will benefit from higher processor core count.

    Intel Turbo Boost Technology (Turbo) (Default = Enabled):

    Turbo Boost Technology enables the processor to transition to a higher frequency than the processor's rated speed if the processor has available power and is within temperature specifications. Disabling this option reduces power usage and also reduces the system's maximum achievable performance under some workloads.

    Memory RAS Configuration (Default = RAS):

    This option controls the configuration of memory reliability, availability and serviceability (RAS) features. When this option is set to Maximum Performance, system performance is optimized. When it is set to RAS (also known as lockstep mode), system reliability is enhanced with a degree of memory redundancy.


    Flag description origin markings:

    [user] Indicates that the flag description came from the user flags file.
    [suite] Indicates that the flag description came from the suite-wide flags file.
    [benchmark] Indicates that the flag description came from a per-benchmark flags file.

    The flags files that were used to format this result can be browsed at
    http://www.spec.org/omp2012/flags/HP-Platform-Flags-Intel-V1.2-Integrity-revD.20160804.html,
    http://www.spec.org/omp2012/flags/hp-ic16.0.2-linux64.v1.html.

    You can also download the XML flags sources by saving the following links:
    http://www.spec.org/omp2012/flags/HP-Platform-Flags-Intel-V1.2-Integrity-revD.20160804.xml,
    http://www.spec.org/omp2012/flags/hp-ic16.0.2-linux64.v1.xml.


    For questions about the meanings of these flags, please contact the tester.
    For other inquiries, please contact webmaster@spec.org
    Copyright 2012-2016 Standard Performance Evaluation Corporation
    Tested with SPEC OMP2012 v1.0.
    Report generated on Thu Aug 4 10:51:33 2016 by SPEC OMP2012 flags formatter v538.