OMP2012 Flag Description
Huawei Huawei Kunlun 9008 V5 (Intel Xeon Platinum 8280, 2.7 GHz)

Copyright © 2012 Intel Corporation. All Rights Reserved.

Base Compiler Invocation

C benchmarks

C++ benchmarks

Fortran benchmarks

Base Portability Flags




Base Optimization Flags

C benchmarks

C++ benchmarks

Fortran benchmarks

Implicitly Included Flags

This section contains descriptions of flags that were included implicitly by other flags, but which do not have a permanent home at SPEC.

Shell, Environment, and Other Software Settings

OpenMP Tuning Flags


    The KMP_AFFINITY environment variable uses the following general syntax:

    KMP_AFFINITY=[<modifier>,...]<type>[,<permute>][,<offset>]



    For example, to list a machine topology map, specify KMP_AFFINITY=verbose,none to use a modifier of verbose and a type of none.

    The following table describes the supported specific arguments.








    modifier: Optional. String consisting of keyword and specifier.

    • granularity=<specifier>
      takes the following specifiers: fine, thread, and core

    • norespect

    • noverbose

    • nowarnings

    • proclist={<proc-list>}

    • respect

    • verbose

    • warnings



    type: Required string. Indicates the thread affinity to use.

    • compact

    • disabled

    • explicit

    • none

    • scatter

    • logical (deprecated; instead use compact, but omit any permute value)

    • physical (deprecated; instead use scatter, possibly with an offset value)

    The logical and physical types are deprecated but supported for backward compatibility.



    permute: Optional. Positive integer value. Not valid with type values of explicit, none, or disabled.



    offset: Optional. Positive integer value. Not valid with type values of explicit, none, or disabled.
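
    The four arguments described above can be combined into one value. The following is a minimal sketch, not part of the original flag description: the helper name build_kmp_affinity is hypothetical, and the argument order (modifiers, then type, then permute, then offset) is taken from the descriptions in this section.

```shell
# Hypothetical helper: joins the KMP_AFFINITY arguments in the order
# modifier(s), type, permute, offset. Empty arguments are skipped.
build_kmp_affinity() (
    modifiers=$1; atype=$2; permute=$3; offset=$4
    value=$atype
    [ -n "$modifiers" ] && value="$modifiers,$value"
    [ -n "$permute" ] && value="$value,$permute"
    [ -n "$offset" ] && value="$value,$offset"
    printf '%s\n' "$value"
)

KMP_AFFINITY=$(build_kmp_affinity "verbose,granularity=fine" compact 1 0)
export KMP_AFFINITY
echo "KMP_AFFINITY=$KMP_AFFINITY"
```

    Only type is required, so build_kmp_affinity "" none "" "" yields the default value none.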

    Affinity Types

    Type is the only required argument.

    type = none (default)

    Does not bind OpenMP threads to particular thread contexts; however, if the operating system supports affinity, the compiler still uses the OpenMP thread affinity interface to determine machine topology. Specify KMP_AFFINITY=verbose,none to list a machine topology map.

    type = compact

    Specifying compact assigns the OpenMP thread <n>+1 to a free thread context as close as possible to the thread context where the <n>th OpenMP thread was placed. For example, in a topology map, the nearer a node is to the root, the more significance the node has when sorting the threads.

    type = disabled

    Specifying disabled completely disables the thread affinity interfaces. This forces the OpenMP run-time library to behave as if the affinity interface was not supported by the operating system. This includes the low-level API interfaces such as kmp_set_affinity and kmp_get_affinity, which have no effect and will return a nonzero error code.

    type = explicit

    Specifying explicit assigns OpenMP threads to a list of OS proc IDs that have been explicitly specified by using the proclist= modifier, which is required for this affinity type.
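
    As an illustration of the explicit type, the sketch below binds four OpenMP threads to four OS proc IDs. The IDs 0, 2, 4, and 6 are purely illustrative, and the square-bracket list notation is an assumption based on Intel's compiler documentation rather than on this flags file.

```shell
# Assumption: the proc list is written in square brackets; the OS proc IDs
# 0,2,4,6 are examples only and must exist on the target machine.
export OMP_NUM_THREADS=4
export KMP_AFFINITY="granularity=fine,proclist=[0,2,4,6],explicit"
echo "KMP_AFFINITY=$KMP_AFFINITY"
```

    Adding the verbose modifier in front would print the resulting bindings, which is a convenient way to confirm that the IDs are valid.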

    type = scatter

    Specifying scatter distributes the threads as evenly as possible across the entire system. scatter is the opposite of compact; so the leaves of the node are most significant when sorting through the machine topology map.

    Deprecated Types: logical and physical

    Types logical and physical are deprecated and may become unsupported in a future release. Both are supported for backward compatibility.

    For logical and physical affinity types, a single trailing integer is interpreted as an offset specifier instead of a permute specifier. In contrast, with compact and scatter types, a single trailing integer is interpreted as a permute specifier.

    Specifying logical assigns OpenMP threads to consecutive logical processors, which are also called hardware thread contexts. The type is equivalent to compact, except that the permute specifier is not allowed. Thus, KMP_AFFINITY=logical,n is equivalent to KMP_AFFINITY=compact,0,n (this equivalence is true regardless of whether or not a granularity=fine modifier is present).

    Permute and offset combinations

    For both compact and scatter, permute and offset are allowed; however, if you specify only one integer, the compiler interprets the value as a permute specifier. Both permute and offset default to 0.  

    The permute specifier controls which levels are most significant when sorting the machine topology map. A value for permute forces the mappings to make the specified number of most significant levels of the sort the least significant, and it inverts the order of significance. The root node of the tree is not considered a separate level for the sort operations.

    The offset specifier indicates the starting position for thread assignment.
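
    The single-integer rule can be made concrete with a small parser. This is a sketch under stated assumptions: the function name affinity_permute_offset is hypothetical, it only mimics the interpretation rules given in this section, and it does not handle proclist values (which contain commas of their own).

```shell
# Hypothetical parser: prints "permute=<p> offset=<o>" for a KMP_AFFINITY
# value. A single trailing integer is permute for compact/scatter and
# offset for the deprecated logical/physical types. Both default to 0.
affinity_permute_offset() (
    set -f                                   # no pathname expansion
    value=$1; permute=0; offset=0; atype=""; ints=""
    IFS=,
    for field in $value; do
        case $field in
            compact|scatter|logical|physical) atype=$field ;;
            ''|*[!0-9]*) ;;                  # modifier or another type
            *) ints="$ints $field" ;;        # trailing integer
        esac
    done
    IFS=' '
    set -- $ints
    if [ $# -eq 2 ]; then
        permute=$1; offset=$2
    elif [ $# -eq 1 ]; then
        case $atype in
            logical|physical) offset=$1 ;;
            *) permute=$1 ;;
        esac
    fi
    echo "permute=$permute offset=$offset"
)

affinity_permute_offset "granularity=fine,compact,1,0"
affinity_permute_offset "scatter,2"
affinity_permute_offset "logical,4"
```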

    Modifier Values for Affinity Types

    Modifiers are optional arguments that precede type. If you do not specify a modifier, the noverbose, respect, and granularity=core modifiers are used automatically.

    Modifiers are interpreted in order from left to right, and can negate each other. For example, specifying KMP_AFFINITY=verbose,noverbose,scatter is equivalent to setting KMP_AFFINITY=noverbose,scatter, or just KMP_AFFINITY=scatter.
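
    The left-to-right rule can be sketched as follows. The function effective_verbosity is a hypothetical illustration of the rule, not a description of how the run-time library is implemented; it starts from the stated default and lets the last verbose or noverbose keyword win.

```shell
# Hypothetical illustration: the last verbose/noverbose keyword wins;
# noverbose is the default when neither keyword is present.
effective_verbosity() (
    set -f
    state=noverbose
    IFS=,
    for field in $1; do
        case $field in
            verbose|noverbose) state=$field ;;
        esac
    done
    echo "$state"
)

effective_verbosity "verbose,noverbose,scatter"
effective_verbosity "noverbose,verbose,scatter"
```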

    modifier = noverbose (default)

    Does not print verbose messages.

    modifier = verbose

    Prints messages concerning the supported affinity. The messages include information about the number of packages, number of cores in each package, number of thread contexts for each core, and OpenMP thread bindings to physical thread contexts.

    Information about binding OpenMP threads to physical thread contexts is indirectly shown in the form of the mappings between hardware thread contexts and the operating system (OS) processor (proc) IDs. The affinity mask for each OpenMP thread is printed as a set of OS processor IDs.


    KMP_LIBRARY = { throughput | turnaround | serial }. Selects the OpenMP run-time library execution mode. The options for the variable value are throughput, turnaround, and serial.

    Execution modes

    The compiler with OpenMP enables you to run an application under different execution modes that can be specified at run time. The libraries support the serial, turnaround, and throughput modes.


    The serial mode forces parallel applications to run on a single processor.


    In a dedicated (batch or single user) parallel environment where all processors are exclusively allocated to the program for its entire run, it is most important to effectively utilize all of the processors all of the time. The turnaround mode is designed to keep active all of the processors involved in the parallel computation in order to minimize the execution time of a single job. In this mode, the worker threads actively wait for more parallel work, without yielding to other threads.

    Avoid over-allocating system resources. This occurs if either too many threads have been specified, or if too few processors are available at run time. If system resources are over-allocated, this mode will cause poor performance. The throughput mode should be used instead if this occurs.


    In a multi-user environment where the load on the parallel machine is not constant or where the job stream is not predictable, it may be better to design and tune for throughput. This minimizes the total time to run multiple jobs simultaneously. In this mode, the worker threads will yield to other threads while waiting for more parallel work.

    The throughput mode is designed to make the program aware of its environment (that is, the system load) and to adjust its resource usage to produce efficient execution in a dynamic environment. This mode is the default.
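
    The guidance above can be summarized as a simple decision: use turnaround on a dedicated machine when the thread count does not exceed the available processors, and fall back to throughput otherwise. The sketch below assumes a dedicated system; the helper name choose_kmp_library is hypothetical, and getconf _NPROCESSORS_ONLN is used to count online processors.

```shell
# Hypothetical helper: picks an execution mode from the requested thread
# count and the number of available processors.
choose_kmp_library() (
    threads=$1; processors=$2
    if [ "$threads" -le "$processors" ]; then
        echo turnaround      # no over-allocation: keep workers spinning
    else
        echo throughput      # over-allocated: let workers yield
    fi
)

processors=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
threads=$processors          # assumption: one thread per online processor
KMP_LIBRARY=$(choose_kmp_library "$threads" "$processors")
export KMP_LIBRARY
echo "KMP_LIBRARY=$KMP_LIBRARY"
```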


    KMP_BLOCKTIME = value. Sets the time, in milliseconds, that a thread should wait, after completing the execution of a parallel region, before sleeping. Use the optional character suffixes: s (seconds), m (minutes), h (hours), or d (days) to specify the units. Specify infinite for an unlimited wait time.
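
    To make the unit suffixes concrete, the following sketch converts a KMP_BLOCKTIME value to milliseconds. The function blocktime_to_ms is a hypothetical helper written for this illustration; it covers only plain integers, the four suffixes listed above, and infinite.

```shell
# Hypothetical helper: converts a KMP_BLOCKTIME value to milliseconds.
# A value without a suffix is already in milliseconds.
blocktime_to_ms() (
    value=$1
    case $value in
        infinite) echo infinite ;;
        *s) echo $(( ${value%s} * 1000 )) ;;
        *m) echo $(( ${value%m} * 60000 )) ;;
        *h) echo $(( ${value%h} * 3600000 )) ;;
        *d) echo $(( ${value%d} * 86400000 )) ;;
        *)  echo "$value" ;;
    esac
)

export KMP_BLOCKTIME=2s
echo "KMP_BLOCKTIME=$KMP_BLOCKTIME is $(blocktime_to_ms "$KMP_BLOCKTIME") ms"
```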


    KMP_STACKSIZE = value. Sets the number of bytes to allocate for each OpenMP* thread to use as the private stack for the thread. Recommended size is 16m. Use the optional suffixes: b (bytes), k (kilobytes), m (megabytes), g (gigabytes), or t (terabytes) to specify the units. This variable does not affect the native operating system threads created by the user program nor the thread executing the sequential part of an OpenMP* program or parallel programs created using -parallel.
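
    A short sketch of the recommended setting follows. It also prints the shell's own stack limit, on the assumption (true on typical Linux systems, but not stated in this flags file) that the thread running the sequential part is bounded by that operating system limit rather than by KMP_STACKSIZE.

```shell
# Recommended per-thread stack size for OpenMP worker threads.
export KMP_STACKSIZE=16m
echo "KMP_STACKSIZE=$KMP_STACKSIZE"

# Assumption: the initial thread's stack is governed by the OS limit,
# which the shell reports in kilobytes or as "unlimited".
echo "initial thread stack limit: $(ulimit -s)"
```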


    OMP_NUM_THREADS = value. Sets the maximum number of threads to use for OpenMP* parallel regions if no other value is specified in the application. This environment variable applies to both -openmp and -parallel. Example syntax on a Linux system with 8 cores: export OMP_NUM_THREADS=8


    OMP_DYNAMIC={ 1 | 0 } Enables (1, true) or disables (0, false) the dynamic adjustment of the number of threads.


    OMP_SCHEDULE={ type[,chunk size] } Controls the scheduling of the for-loop work-sharing construct. type can be one of static, dynamic, guided, or runtime. chunk size should be a positive integer.


    OMP_NESTED={ 1 | 0 } Enables creation of new teams in case of nested parallel regions (1,true) or serializes (0,false) all nested parallel regions. Default is 0.
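
    Taken together, the standard OpenMP variables above might be set as in the sketch below. The specific values (8 threads, dynamic scheduling with a chunk size of 4) are illustrative assumptions, not the settings used for this result.

```shell
export OMP_NUM_THREADS=8         # e.g. one thread per core on an 8-core system
export OMP_DYNAMIC=0             # keep the number of threads fixed
export OMP_SCHEDULE="dynamic,4"  # dynamic scheduling, chunk size 4
export OMP_NESTED=0              # serialize nested parallel regions (default)

# Show the OpenMP settings now visible to child processes.
env | grep '^OMP_' | sort
```

    In the OpenMP specification, OMP_SCHEDULE takes effect only for loops declared with schedule(runtime), so it has no influence on loops whose schedule is fixed in the source.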

  • Operating System Tuning Parameters

    Install only the relevant files

    Select only test-related files when installing the operating system, so that many services are not installed; this reduces the resources consumed by the operating system itself. Install the operating system as follows:

    1. For the software installation mode, select 'Customize now'.

    2. In the 'Base System' column, select the following packages: 'Base', 'Compatibility Libraries', 'Java Platform', 'Large Systems Performance', 'Performance Tools', and 'Perl Support'. In the 'Development' column, select 'Development tools'. No other packages are installed.


    Set this environment variable to "yes" to enable applications to use large pages.


    Setting this environment variable is necessary to enable applications to use large pages.

    Cpufreq setting

    "cpupower frequency-set" provides a simplified mechanism to adjust processor frequencies when CPU frequency scaling is enabled in the OS. See the cpupower-frequency-set man page for details. Here is a brief description of the options used in the config file. By default, settings are applied to all logical CPUs in the system. Frequencies can be passed in Hz, kHz (default), MHz, GHz, or THz by postfixing the value with the desired unit name, without any space. Available frequencies and governors can be determined with "cpupower frequency-info".

    Tmpfs filesystem setting

    Tmpfs is a file system which keeps all files in virtual memory. A tmpfs file system will go to swap if memory pressure demands real memory for applications. This can have a very negative effect on the I/O load and system performance.
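
    One way to limit the swap risk described above is to cap the size of the tmpfs mount. The sketch below only prints the mount command, because mounting requires root; the size 16g and the mount point /mnt/tmpfs_scratch are hypothetical values chosen for illustration.

```shell
# Dry-run wrapper: prints the command instead of running it.
dry_run() { printf '%s\n' "$*"; }

tmpfs_size=16g                     # hypothetical cap on the file system
mount_point=/mnt/tmpfs_scratch     # hypothetical mount point

dry_run mount -t tmpfs -o "size=$tmpfs_size" tmpfs "$mount_point"

# List the mount points of tmpfs file systems already present (Linux only).
if [ -r /proc/mounts ]; then
    awk '$3 == "tmpfs" { print $2 }' /proc/mounts
fi
```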

    Process tuning setting

    Each process is assigned a time period, known as its time slice, during which it is allowed to run. Increasing the process time slice can have a positive effect on compute-intensive tasks. The related kernel parameters include sched_wakeup_granularity_ns and sched_min_granularity_ns.

    Transparent Huge Pages

    Transparent Hugepages increase the memory page size from 4 kilobytes to 2 megabytes. Transparent Hugepages provide significant performance advantages on systems with highly contended resources and large memory workloads. If memory utilization is too high or memory is badly fragmented which prevents hugepages being allocated, the kernel will assign smaller 4k pages instead.

    On RedHat EL6 and later, Transparent Hugepages are used by default if /sys/kernel/mm/redhat_transparent_hugepage/enabled is set to always. The default value is always.

    On SUSE SLES11 and later, Transparent Hugepages are used by default if /sys/kernel/mm/transparent_hugepage/enabled is set to always. The default value is always.
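
    The current setting can be read from either of the two paths mentioned above. In these files the active mode is the word in square brackets, for example "[always] madvise never". The helper name thp_selected below is hypothetical; the sketch reads whichever file exists and prints nothing on systems without Transparent Hugepages support.

```shell
# Hypothetical helper: extracts the bracketed (active) mode from the
# contents of a transparent_hugepage "enabled" file.
thp_selected() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

for thp_file in /sys/kernel/mm/transparent_hugepage/enabled \
                /sys/kernel/mm/redhat_transparent_hugepage/enabled; do
    if [ -r "$thp_file" ]; then
        echo "$thp_file: $(thp_selected "$(cat "$thp_file")")"
    fi
done
```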

    Kernel Boot Parameter

    nohz_full: This kernel option sets adaptive tick mode (NOHZ_FULL) on the specified processors. Since the number of interrupts is reduced to one per second, latency-sensitive applications can take advantage of it.
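
    Whether the option is active can be checked from the kernel command line. The sketch below is illustrative: the function nohz_full_cpus is a hypothetical helper, and the CPU list 1-27 mentioned in the usage note is an example, not the value used for this result.

```shell
# Hypothetical helper: prints the CPU list given to nohz_full= on a kernel
# command line, or "none" if the option is absent.
nohz_full_cpus() (
    set -f
    for arg in $1; do
        case $arg in
            nohz_full=*) echo "${arg#nohz_full=}"; return ;;
        esac
    done
    echo none
)

cmdline=$(cat /proc/cmdline 2>/dev/null || true)
echo "nohz_full CPUs: $(nohz_full_cpus "$cmdline")"
```

    For a command line containing nohz_full=1-27, the helper prints 1-27.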

    Firmware / BIOS / Microcode Settings

    Hardware Prefetch:

    This BIOS option allows the enabling/disabling of a processor mechanism to prefetch data into the cache according to a pattern-recognition algorithm. In some cases, setting this option to Disabled may improve performance. Users should only disable this option after performing application benchmarking to verify improved performance in their environment.

    Adjacent Sector Prefetch:

    This BIOS option allows the enabling/disabling of a processor mechanism to fetch the adjacent cache line within a 128-byte sector that contains the data needed due to a cache line miss. In some cases, setting this option to Disabled may improve performance. Users should only disable this option after performing application benchmarking to verify improved performance in their environment.

    Intel Turbo Boost Technology:

    Enabling this option allows the processor cores to automatically increase their frequency, and therefore performance, when they are running below power and temperature limits.

    Intel Hyper-Threading Technology:

    Enabling this option uses processor resources more efficiently by allowing multiple threads to run on each core, which increases processor throughput and improves overall performance on threaded software.

    Power Policy Select (Default=Custom)

    Values for this BIOS setting can be:

    Efficiency: Maximizes the power efficiency of the server.

    Performance: Maximizes the performance of the server.

    Custom: Allows the user to customize power- and performance-related options individually.

    Lockstep Memory Mode (Default=Enabled)

    Values for this BIOS setting can be Enabled or Disabled. Lockstep memory mode uses two memory channels at a time and provides an even higher level of protection.

    Cooling Configuration

    The Baseboard Management Controller allows the user to adjust the fan speed manually. If the server is in a stressful environment and the CPUs run at a high temperature, the fan speed can be raised to 100%.

    Memory Power Saving

    Selects the memory power saving mode. Depending on the selected mode, the Power Down clock mode, CKE, and IBT are initialized accordingly. Disabling this feature keeps the memory in high-performance mode.


    Core C3 and Core C6 can be disabled for latency-sensitive applications in order to minimize latency, but disabling Core C-states can also significantly limit the amount of turbo when a low number of cores are active. Enabling C3 and C6 is recommended for the SPEC CPU benchmarks.

    VT Support

    If virtualization is not used, this option should be set to "Disabled". This can result in slight performance gains and energy savings.

    Memory Patrol Scrub

    This BIOS option allows the enabling/disabling of Memory Periodic Patrol Scrubber. The Memory Periodic Patrol Scrubber corrects memory soft errors so that, over the length of the system runtime, the risk of producing multi-bit and uncorrectable errors is reduced.

    IMC (Integrated memory controller) Interleaving

    This BIOS option controls the interleaving between the Integrated Memory Controllers (IMCs). Memory can be interleaved across sockets, memory controllers, DDR channels, and ranks. Memory is interleaved for performance and thermal distribution.

    If IMC Interleaving is set to 2-way, addresses will be interleaved between the two IMCs.

    If IMC Interleaving is set to 1-way, there will be no interleaving.

    If IMC Interleaving is set to auto, the result depends on the SNC (Sub NUMA Clustering) setting: when SNC is enabled, IMC Interleaving will be 1-way; when SNC is disabled, IMC Interleaving will be 2-way.

    If SNC is disabled, IMC Interleaving should be set to 2-way. If SNC is enabled, IMC Interleaving should be set to 1-way.

    Sub NUMA Cluster(SNC)

    SNC breaks up the last level cache (LLC) into disjoint clusters based on address range, with each cluster bound to a subset of the memory controllers in the system. SNC improves average latency to the LLC and memory. SNC is a replacement for the cluster on die (COD) feature found in previous processor families. For a multi-socketed system, all SNC clusters are mapped to unique NUMA (Non Uniform Memory Access) domains.

    SNC AUTO supports 1-cluster or 2-clusters depending on IMC interleave. SNC and IMC interleave both AUTO will support 1-cluster 2-way IMC interleave.

    SNC Enable supports Full SNC (2 clusters) and 1-way IMC interleave. Utilizes LLC capacity more efficiently and reduces latency due to core/IMC proximity. This may provide performance improvement on NUMA-aware operating systems.

    SNC Disable supports 1-cluster and 2-way IMC interleave; the LLC is treated as one cluster.

    LLC Dead Line Allocation

    In some Intel CPU caching schemes, mid-level cache (MLC) evictions are filled into the last level cache (LLC). If a line is evicted from the MLC to the LLC, the core can flag the evicted MLC lines as "dead." This means that the lines are not likely to be read again. This option allows dead lines to be dropped and never fill the LLC if the option is disabled.

    Values for this BIOS option can be:

    Disabled: Disabling this option can save space in the LLC by never filling MLC dead lines into the LLC.

    Enabled: Opportunistically fill MLC dead lines in LLC, if space is available.

    Last Level Cache (LLC) Prefetch

    This option configures the processor last level cache (LLC) prefetch feature as a result of the non-inclusive cache architecture. The LLC prefetcher exists on top of other prefetchers that can prefetch data into the core data cache unit (DCU) and mid-level cache (MLC). In some cases, setting this option to disabled can improve performance. Typically, setting this option to enabled provides better performance.

    Values for this BIOS option can be:

    Disabled: Disables the LLC prefetcher. The other core prefetchers are unaffected.

    Enabled: Gives the core prefetcher the ability to prefetch data directly to the LLC.

    Xtended Prediction Table (XPT) Prefetch

    The Xtended Prediction Table (XPT) prefetcher exists on top of other prefetchers that can prefetch data into the DCU, MLC, and LLC. The XPT prefetcher will issue a speculative DRAM read request in parallel to an LLC lookup. This prefetch bypasses the LLC, saving latency. In some cases, setting this option to disabled can improve performance. Typically, setting this option to enabled provides better performance.

    Values for this BIOS option can be:

    Enabled: Allows a read request sent to the LLC to speculatively issue a copy of the read to DRAM.

    Disabled: Read requests to the LLC are not allowed to send a speculative read to DRAM.

    Flag description origin markings:

    [user] Indicates that the flag description came from the user flags file.
    [suite] Indicates that the flag description came from the suite-wide flags file.
    [benchmark] Indicates that the flag description came from a per-benchmark flags file.

    The flags files that were used to format this result can be browsed at,

    You can also download the XML flags sources by saving the following links:,

    For questions about the meanings of these flags, please contact the tester.
    For other inquiries, please contact
    Copyright 2012-2019 Standard Performance Evaluation Corporation
    Tested with SPEC OMP2012 v1.0.
    Report generated on Tue Apr 2 13:36:28 2019 by SPEC OMP2012 flags formatter v538.