SPEC OMP2012 Flag Description for the PGI Compilers version 19.4 for x86 CPUs

Optimization Flags

- -O
- -O0(?=\s|$)
- Set the optimization level to -O2
- -O0
- -O0(?=\s|$)
- A basic block is generated for each C statement. No scheduling is done between statements. No global optimizations are performed.
- -O1
- -O1(?=\s|$)
- Level-one optimization specifies local optimization (-O1). The compiler performs scheduling of basic blocks as well as register allocation. This optimization level is a good choice when the code is very irregular; that is it contains many short statements containing IF statements and the program does not contain loops (DO or DO WHILE statements). For certain types of code, this optimization level may perform better than level-two (-O2) although this case rarely occurs. The PGI compilers perform many different types of local optimizations, including but not limited to: Algebraic identity removal Constant folding Common subexpression elimination Local register optimization Peephole optimizations Redundant load and store elimination Strength reductions Note that this is the default optimiation level when no optimization flags are specified on the compilation command line.
- -O2
- -O2(?=\s|$)
- Level-two optimization (-O2 or -O) specifies global optimization. The -fast option generally will specify global optimization; however, the -fast switch will vary from release to release depending on a reasonable selection of switches for any one particular release. The -O or -O2 level performs all level-one local optimizations as well as global optimizations. Control flow analysis is applied and global registers are allocated for all functions and subroutines. Loop regions are given special consideration. This optimization level is a good choice when the program contains loops, the loops are short, and the structure of the code is regular. The PGI compilers perform many different types of global optimizations, including but not limited to: Branch to branch elimination Constant propagation Copy propagation Dead store elimination Global register allocation Invariant code motion Induction variable elimination
- -O3
- -O3(?=\s|$)
- All level 1 and 2 optimizations are performed. In addition, this level enables more aggressive code hoisting and scalar replacement optimizations that may or may not be profitable.
- -O4
- -O4(?=\s|$)
- Performs all level 1, 2, and 3 optimizations and enables hoisting of guarded invariant floating point expressions.
- -tp
- -tp=(px|bulldozer|piledriver|zen|sandybridge|haswell|knl|skylake)(?=\s|$)
- Specify the type(s) of the target processors(s). The PGI compilers produce code specifically targeted to the type of processor on which the compilation is performed. In particular, the default is to use all supported instructions wherever possible when compiling on a given system. The default target processor is auto-selected depending on the processor on which the compilation is performed. You can specify a target processor to compile for a different processor type, such as to select a more generic processor, allowing the code to run on more system types. Specifying two or more target processors enables unified binary code generation, where two or more versions of each function may be generated, each version optimized for the specific instruction set available in each target processor. Executables created on a given system without the -tp flag may not be usable on previous generation systems. For example, executables created on an Intel Skylake processor may use instructions that are not available on earlier Intel Sandy Bridge systems. The following list contains the possible suboptions for -?tp and the processors that each suboption is intended to target. px - generate code that is usable on any x86-64 processor-based system. bulldozer - generate code for AMD Bulldozer and compatible processors. piledriver - generate code that is usable on any AMD Piledriver processor-based system. zen - generate code that is usable on any AMD Zen processor-based system (Epyc, Ryzen). sandybridge - generate code for Intel Sandy Bridge and compatible processors. haswell - generate code that is usable on any Intel Haswell processor-based system. knl - generate code that is usable on any Intel Knights Landing processor-based system. skylake - generate code that is usable on an Intel Skylake Xeon processor-based system.
- -mp
- -mp(?=\s|$)
- Instructs the compiler to interpret user-inserted OpenMP shared-memory parallel programming directives, and to generate an executable file which will utilize multiple processors in a shared-memory parallel system. Use the -mp option to instruct the compiler to interpret user-inserted OpenMP shared-memory parallel programming directives and to generate an executable file which utilizes multiple processors in a shared-memory parallel system. The suboptions are one or more of the following: align Forces loop iterations to be allocated to OpenMP processes using an algorithm that maximizes alignment of vector sub-sections in loops that are both parallelized and vectorized for SSE. This allocation can improve performance in program units that include many such loops. It can also result in load-balancing problems that significantly decrease performance in program units with relatively short loops that contain a large amount of work in each iteration. The numa suboption uses libnuma on systems where it is available. allcores Instructs the compiler to target all available cores. You specify this suboption at link time. bind Instructs the compiler to bind threads to cores. You specify this suboption at link time. [no]numa Uses [does not use] libnuma on systems where it is available. For a detailed description of this programming model and the associated directives, refer to Section 9, Using OpenMP of the PGI Compiler User's Guide. To set this option in PVF, use the Fortran | Language | Enable OpenMP Directives property, described in Enable OpenMP Directives.
- -Bstatic_pgi
- -Bstatic_pgi(?=\s|$)
- Staticily link with the PGI runtime libraries. System libraries may still be dynamically linked.
  Description You can use this option to explicitly compile for and link to the static version of the PGI runtime libraries. Note:On Linux,-Bstatic_pgi results in code that runs on most Linux systems without requiring a Portability package. For more information on using static libraries on Windows, refer to ‘Creating and Using Static Libraries on Windows’in the‘Creating and Using Libraries’section of the PGI Compiler User's Guide.
- -m64
- -m64(?=\s|$)
- Use the 64-bit compiler for the default processor type.
  Description Use this option to specify the 64-bit compiler as the default processor type.
- -Mallocatable
- -Mallocatable=(95|03)(?=\s|$)
- Controls whether Fortran 95 or Fortran 2003 semantics are used in allocatable array assignments. The default behavior is to use Fortran 95 semantics; the 03 option instructs the compiler to use Fortran 2003 semantics.
- -Mdefaultunit
- -Mdefaultunit(?=\s|$)
- Instructs the compiler to treat "*" as a synonym for standard input for reading and standard output for writing.
- -Mnostride0
- -Mnostride0(?=\s|$)
- Instructs the compiler to perform certain optimizations and to disallow for stride 0 array references.
- -Mnoupcase
- -Mnoupcase(?=\s|$)
- Instructs the compiler to convert all identifiers to lower case. This selection affects the linking process. If you compile and link the same source code using-Mupcase on one occasion and -Mnoupcase on another, you may get two different executables, depending on whether the source contains uppercase letters. The standard libraries are compiled using -Mnoupcase.
- -Masmkeyword
- -Masmkeyword(?=\s|$)
- Instructs the compiler to allow the asm keyword in C source files. The syntax of the asm statement is as follows: asm("statement"); Where statement is a legal assembly-language statement. The quote marks are required. Note:The current default is to support gcc's extended asm, where the syntax of extended asm includes asm strings. The -M[no]asmkeyword switch is useful only if the target device is a Pentium 3 or older cpu type (-tp piii|p6|k7|athlon|athlonxp|px).
- -Mnoasmkeyword
- -Mnoasmkeyword(?=\s|$)
- Instructs the compiler to allow the asm keyword in C source files. The syntax of the asm statement is as follows: asm("statement"); Where statement is a legal assembly-language statement. The quote marks are required. Note:The current default is to support gcc's extended asm, where the syntax of extended asm includes asm strings. The -M[no]asmkeyword switch is useful only if the target device is a Pentium 3 or older cpu type (-tp piii|p6|k7|athlon|athlonxp|px).
- -Msingle
- -Msingle(?=\s|$)
- Instructs the compiler to convert float parameters to double parameters in non-prototyped functions.
- -Mnosingle
- -Mnosingle(?=\s|$)
- Instructs the compiler to convert float parameters to double parameters in non-prototyped functions.
- -fast
- -fast(?=\s|$)
- This options creates a generally optimal set of flags for targets that support SIMD capability. It incorporates optimization options to enable use of vector streaming SIMD instructions (64-bit targets) and enable vectorization with SIMD instructions, cache aligned and flushz.
- -fastsse
- -fastsse(?=\s|$)
- Generally optimal set of flags for targets that include SSE/SSE2 capability.
- -Mipa
- -Mipa(?=\s|$)
- Pass options to the interprocedural analyzer.Note:-Mipa is not compatible with parallel make environments (e.g., pmake). -Mipa implies -O2, and the minimum optimization level that can be specified in combination with -Mipa is -O2. For example, if you specify -Mipa -O1 on the command line, the optimization level is automatically elevated to -O2 by the compiler driver. Typically, as recommended, you would use -Mipa=fast. fast : choose IPA options generally optimal for the target. To see settings for -Mipa=fast on a given target, use -help. inline : perform automatic function inlining. If the optional :n is provided, limit inlining to at most n levels. IPA-based function inlining is performed from leaf routines upward. fast,inline: Enables interprocedural analysis and optimization. Also enables automatic procedure inlining.
- -Mnoiomutex
- -Mnoiomutex(?=\s|$)
- Instructs the compiler not to generate critical section calls around Fortran I/O statements.
- -Mno8intrinsics
- -Mnor8intrinsics(?=\s|$)
- (pgf77,�pgf95, andpgfortran�only)�The compiler does not promote the intrinsics�CMPLX�and�REAL�to�DCMPLX�and�DBLE, respectively.
- -Mnosave
- -Mnosave(?=\s|$)
- Instructs the compiler not to assume that all local variables are subject to the�SAVE�statement.
- -Mpre
- -Mpre(?=\s|$)
- Enables partial redundancy elimination.
- -lfftw3
- -lfftw3(?=\s|$)
- FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e. the discrete cosine/sine transforms or DCT/DST).
- -lamdlibm
- -lamdlibm(?=\s|$)
- AMD LibM is a software library containing a collection of basic math functions optimized for x86-64 processor-based machines. It provides many routines from the list of standard C99 math functions. Applications can link into AMD LibM library and invoke math functions instead of compiler’s math functions for better accuracy and performance.
- -lfftw3_omp
- -lfftw3_omp(?=\s|$)
- FFTW is a comprehensive collection of fast C routines for computing the Discrete Fourier Transform (DFT) and various special cases thereof. It is an open-source implementation of the Fast Fourier transform algorithm. It can compute transforms of real and complex-values arrays of arbitrary size and dimension. An AMD optimized FFTW that includes selective kernels and routines optimized for the AMD EPYC™ processor family is available. Source code is available on GitHub https://github.com/amd/amd-fftw
- -Mlre
- -Mlre(?=\s|$)
- Enables loop-carried redundancy elimination, an optimization that can reduce the number of arithmetic operations and memory references in loops.
- -Mnor8intrinsics
- -Mnor8intrinsics(?=\s|$)
- (pgf77,�pgf95, andpgfortran�only)�The compiler does not promote the intrinsics�CMPLX�and�REAL�to�DCMPLX�and�DBLE, respectively.
- -Mschar
- -Mschar(?=\s|$)
- Specifies signed char characters. The compiler treats "plain" char declarations as signed char.
- -Minline
- -Minline=levels:(\d+)\b
- Enables function inlining. Instructs the inliner to perform N levels of inlining where N is a supplied constant value. If no value is suppiled, then the default value of 2 is used.
- -lblis
- -lblis(?=\s|$)
- BLIS is a portable open-source software framework for instantiating high-performance Basic Linear Algebra Subprograms (BLAS) – like dense linear algebra libraries. The framework was designed to isolate essential kernels of computation that, when optimized, immediately enable optimized implementations of most of its commonly used and computationally intensive operations. Select kernels have been optimized for the AMD EPYCTM processor family by AMD and others. Source code is available on GitHub https://github.com/amd/blis.
- -Msafeptr
- -Msafeptr(?=\s|$)
- Instructs the C/C++ compiler to override data dependencies between pointers of a given storage class.
- -Mconcur
- -Mconcur(?=\s|$)
- Instructs the compiler to enable auto-concurrentization of loops. If specified, the compiler uses multiple processors to execute loops that it determines to be parallelizable; thus, loop iterations are split to execute optimally in a multithreaded execution context.
- -Mpfi
- -Mpfi(?=\s|$)
- Instrument the generated code and link in libraries for dynamic collection of profile and data information at runtime.
- -Mpfo
- -Mpfo(?=\s|$)
- Enable profile feedback information. The�nopfo�option is valid only immediately following the�inline�suboption.�-Mipa=inline,nopfo�tells IPA to ignore PFO information when deciding what functions to inline, if PFO information is available.
- -Mmovnt
- -Mmovnt(?=\s|$)
- The –Mmovnt option instructs the compiler to generate non-temporal stores and prefetch instructions, even when it cannot determine that this is beneficial.
- -Mnontemporal
- -Mnontemporal(?=\s|$)
- Worth a try during code tuning. May especially be useful for memory-bound code, since this supports cache bypass for streaming writes. instructs the compiler to generate nontemporal move and prefetch instructions even in cases where the compiler cannot determine statically at compile-time that these instructions will be beneficial.
- --no_exceptions
- --no_exceptions(?=\s|$)
- Removes exception handling from user code. For C++, declares that the functions in this file generate no C++ exceptions, allowing more optimal code generation.
- --zc_eh
- --zc_eh(?=\s|$)
- The --zc_eh option allows zero-cost exception handling for C++.
- -Mkeepasm
- -Mkeepasm(?=\s|$)
- Instructs the compiler to keep the assembly file.
- -Mbackslash
- -Mbackslash(?=\s|$)
- Instructs the compiler to treat the backslash as a normal character, and not as an escape character in quoted strings.
- -Mdlines
- -Mdlines(?=\s|$)
- Instructs the compiler to treat lines containing "D" in column 1 as executable statements (ignoring the "D").
- -alias
- -alias=(ansi|traditional)(?=\s|$)
- select optimizations based on type-based pointer alias rules in C and C++.
  ansi
  
  Enable optimizations using ANSI C type-based pointer disambiguation.
  
  traditional
  
  Disable type-based pointer disambiguation.
- -pgcpplibs
- -pgc\+\+libs(?=\s|$)
- Use this option to instruct the compiler to append C⁠+⁠+ runtime libraries to the link line for programs built using either PGF77 or PGF90 .
- -Mfixed
- -Mfixed(?=\s|$)
- Instructs the compiler to assume input source files are in FORTRAN�77-style fixed form format.
- -pgf77libs
- -pgf77libs(?=\s|$)
- Use this option to instruct the compiler to append PGF77 runtime libraries to the link line.
- -pgf90libs
- -pgf90libs(?=\s|$)
- Use this option to instruct the compiler to append�PGF90/PGF95/PGFORTRAN runtime libraries to the link line.
- -Mfpapprox
- -Mfpapprox(?=\s|$)
- Perform certain floating point operations using low-precision approximation. -Mnofpapprox specifies not to use low-precision fp approximation operations. By default -Mfpapprox is not used. If -⁠Mfpapprox is used without suboptions, it defaults to use approximate div, sqrt, and rsqrt. The available suboptions are these: div Approximate floating point division sqrt Approximate floating point square root rsqrt Approximate floating point reciprocal square root
- -Mfprelaxed
- -Mfprelaxed(?=\s|$)
- Instructs the compiler to use [not use] relaxed precision in the calculation of some intrinsic functions. Can result in improved performance at the expense of numerical accuracy.
- -Mnom128
- -Mnom128(?=\s|$)
- instructs the compiler to recognize [ignore] __m128, __m128d, and __m128i datatypes. floating-point constants as float data types, instead of double data types. This option can improve the performance of single-precision code.
- -Mcray
- -Mcray=pointer(?=\s|$)
- (Fortran only) Force Cray Fortran (CF77) compatibility with respect to the listed options. Possible values of option include:
  pointer
  
  for purposes of optimization, it is assumed that pointer-based variables do not overlay the storage of any other ariable.

- -Mfree
- -Mfree(?=\s|$)
- Instructs the compiler to assume input source files are in Fortran 90/95 freeform format.
- -mcmodel
- -mcmodel=medium(?=\s|$)
- Generate code which supports the medium memory model in the linux86-64 environment. Generate code for the medium memory model. The default small memory model limits the combined area for a user's object or executable to 1GB, with the Linux kernel managing usage of the second 1GB of address for system routines, shared libraries, stacks, etc. Programs are started at a fixed address, and the program can use a single instruction to make most memory references. The medium memory model allows for larger than 2GB data areas, or .bss sections. Program units compiled using either -mcmodel=medium or -fpic require additional instructions to reference memory. The effect on performance is a function of the data-use of the application. The -mcmodel=medium switch must be used at both compile time and link time to create 64-bit executables. Program units compiled for the default small memory model can be linked into medium memory model executables as long as they are compiled -fpic, or position-independent.
  (For use only on 64-bit Linux targets) Generates code for the medium memory model in the linux86-64 execution environment. Implies -Mlarge_arrays.
  
  Default: The compiler generates code for the small memory model.
  
  Usage
  
  The following command line requests position independent code be generated, and the option -mcmodel=medium be passed to the assembler and linker:
  
  $ pgfortran -mcmodel=medium myprog.f
  Description
  The default small memory model of the linux86-64 environment limits the combined area for a user's object or executable to 1GB,
  
  with the Linux kernel managing usage of the second 1GB of address for system routines, shared libraries, stacks, and so on.
  
  Programs are started at a fixed address, and the program can use a single instruction to make most memory references.
  
  The medium memory model allows for larger than 2GB data areas, or .bss sections. Program units compiled using either -mcmodel=medium
  
  or -fpic require additional instructions to reference memory.
  The effect on performance is a function of the data-use of the application. The -mcmodel=medium switch must be used at both compile
  
  time and link time to create 64-bit executables. Program units compiled for the default small memory model can be linked into medium
  
  memory model executables as long as they are compiled with the option -fpic, or position-independent.
  
  The linux86-64 environment provides static libxxx.a archive libraries, that are built both with and without -fpic, and dynamic
  
  libxxx.so shared object libraries that are compiled with -fpic. Using the link switch -mcmodel=medium implies the -fpic switch
  
  and utilizes the shared libraries by default. The directory $PGI/linux86-64//lib?contains the libraries for building small
  
  memory model codes; and the directory $PGI/linux86-64//libso contains shared libraries for building both -fpic and
  
  -mcmodel=medium?executables.
  
  Note:
  
  -mcmodel=medium -fpic is not allowed to create shared libraries. However, you can create static archive libraries (.a) that are -fpic.

Argument	Default	Description
modifier	noverbose respect granularity=core	Optional. String consisting of keyword and specifier. granularity=<specifier> takes the following specifiers: fine, thread, and core norespect noverbose nowarnings proclist={<proc-list>} respect verbose warnings
type	none	Required string. Indicates the thread affinity to use. compact disabled explicit none scatter logical (deprecated; instead use compact, but omit any permute value) physical (deprecated; instead use scatter, possibly with an offset value) The logical and physical types are deprecated but supported for backward compatibility.
permute	0	Optional. Positive integer value. Not valid with type values of explicit, none, or disabled.
offset	0	Optional. Positive integer value. Not valid with type values of explicit, none, or disabled.

SPEC OMP2012 Flag Description for the PGI Compilers version 19.4 for x86 CPUs

Sections

Optimization Flags

Portability Flags

Compiler Flags

Shell, Environment, and Other Software Settings

Open MP Tuning Flags

Syntax

Default

Description

Affinity Types

type = none (default)

type = compact

type = disabled

type = explicit

type = scatter

Deprecated Types: logical and physical

Permute and offset combinations

Modifier Values for Affinity Types

modifier = noverbose (default)

modifier = verbose

Execution modes

Serial

Turnaround

Throughput