IBM XL Compiler Flags, Common Operating System Commands and Environment Settings

Optimization Flags

-B
-D__extern_always_inline=inline
-hugetlbfs_align
-hugetlbfs_BDT
-lessl
-lhugetlbfs
-lmass
-ltcmalloc
-link_dl_static
-link_emit_relocation
-link_mul_defs
-link_no_whole_archive
-link_whole_archive
-lstd8d
-Lstd
-O
- -O2
  - -O
-O2
- -O
  - -O2
-O3
- -O2
  - -O
- -qhot=level=0
-O4
- -O3
  - -O2
    - -O
  - -qhot=level=0
- -qipa=level=1
- -qarch=auto
- -qtune=auto
- -qsimd=auto
-O5
- -O4
  - -O3
    - -O2
      - -O
    - -qhot=level=0
  - -qipa=level=1
  - -qarch=auto
  - -qtune=auto
  - -qsimd=auto
- -qipa=level=2
-q64
-qalias
-qalign
-qalloca
-qarch
-qassert
-qenablevmx
-qenum
-qessl
-qfdpr
-qhot
-qinlglue
-qinline
-qipa=inline
-qipa=level
-qipa=threads
-qnoinline
-qnoenablevmx
-qnoipa
-qnoprefetch
-qnothreaded
-qpdf1
-qpdf2
-qprefetch
-qrestrict
-qrtti
-qsave
-qsimd
-qsmallstack=dynlenonheap
-qsmp=auto
-qsmp=omp
-qspill
-qstrict=nans
-qtune
-qvecnvol
-qxlf90
-Rstd
-tl
-qdatasmall
-qunroll
-qnounroll
-funroll-all-loops
-qlibansi
-qpagesize=16M
-qipa=noobject:partition=large
-qipa=noobject
-qsmallstack
-qipa

- -B
- -B/\S*
- Determines substitute path names for XL Fortran executables such as the compiler, assembler, linker, and preprocessor. It can be used in combination with the -t option, which determines which of these components are affected by -B. Example : -B/opt/at10.0/share/libhugetlbfs/
- -D__extern_always_inline=inline
- -D__extern_always_inline=inline\b
- Macro to have compiler always inline externs if specified.
- -hugetlbfs_align
- -Wl,--hugetlbfs-align
- Pass the --hugetlbfs-align flag to the linker so that we can control (by environment variable HUGETLB_ELFMAP) which program segments are placed in hugepages.
- -hugetlbfs_BDT
- -Wl,--hugetlbfs-link=BDT
- Pass the --hugetlbfs-link=BDT flag to the linker so that the text, initialized data, and BSS segments of the application are backed by hugepages.
- -lessl
- -lessl\b
- Link the Engineering and Scientific Subroutine Library (ESSL).
- -lhugetlbfs
- (?:^|(?<=\s))-lhugetlbfs(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Link with libhugetlbfs.so. This enables the heap to be backed by 16 MB pages.
- -lmass
- -lmass\b
- Link the mathematical acceleration subsystem libraries (MASS), which contain libraries of tuned mathematical intrinsic functions.
- -ltcmalloc
- (?:^|(?<=\s))-ltcmalloc(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Link with tcmalloc library for Linux on POWER. This is a library that optimizes calls to new, delete, malloc and free.
- -link_dl_static
- /usr/lib/libdl.a
- Instructs the linker to include libdl.a, which enables the dynamic linking loader. Links the "/usr/lib/libdl.a" library.
- -link_emit_relocation
- -Wl,-q\b
- Pass the -q flag to the linker causing the final executable to have the relocation information.
- -link_mul_defs
- -Wl,-z,muldefs
- Instructs the linker to allow multiple definitions and the first definition will be used. Normally when a symbol is defined multiple times, the linker will report a fatal error.
- -link_no_whole_archive
- -Wl,--no-whole-archive
- Turn off the effect of the --whole-archive flag.
- -link_whole_archive
- -Wl,--whole-archive\s/\S*
- Instructs the linker to include every object file in the specified library, rather than searching the library for the required object files. Example : "-Wl,--whole-archive /usr/lib/libhugetlbfs.a"
- -lstd8d
- (?:^|(?<=\s))-lstd8d(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Link with the Apache C++ Standard Library ("stdcxx"). "libstd8d.so" is a 32-bit shared library with optimization enabled.
- -Lstd
- -L\s*[^ ]*stdcxx[^ ]*
- Adds the directory for the Apache C++ Standard Library to the search path at link time.
- -O
- -O\b
- -O enables the level of optimization that represents the best tradeoff between compilation speed and run-time performance. If you need a specific level of optimization, specify the appropriate numeric value. Currently, -O is equivalent to -O2.
- Includes:
  - -O2
    - -O
      - -O2
- -O2
- -O2\b
- -O2 performs a set of optimizations that are intended to offer improved performance without an unreasonable increase in time or storage that is required for compilation including :
  - Eliminates redundant code
  - Basic loop optimization
  - Can structure code to take advantage of -qarch and -qtune settings
- Includes:
  - -O
    - -O2
      - -O
- -O3
- -O3\b
- -O3 performs additional optimizations that are memory intensive, compile-time intensive, and may change the semantics of the program slightly, unless -qstrict is specified. We recommend these optimizations when the desire for run-time speed improvements outweighs the concern for limiting compile-time resources.
  The optimizations provided include:
  - In-depth memory access analysis
  - Better loop scheduling
  - High-order loop analysis and transformations (-qhot=level=0)
  - Inlining of small procedures within a compilation unit by default
  - Eliminating implicit compile-time memory usage limits
  - Widening, which merges adjacent load/stores and other operations
  - Pointer aliasing improvements to enhance other optimizations
  -O3 is equivalent to the following flags :
  - -O2
  - -qhot=level=0
- Includes:
  - -O2
    - -O
      - -O2
  - -qhot=level=0
- -O4
- -O4\b
- Perform optimizations for maximum performance. This includes interprocedural analysis on all of the objects presented on the "link" step.
  -O4 is equivalent to the following flags:
  - -O3
  - -qipa=level=1
  - -qarch=auto
  - -qtune=auto
  - -qsimd=auto
- Includes:
  - -O3
    - -O2
      - -O
    - -qhot=level=0
  - -qipa=level=1
  - -qarch=auto
  - -qtune=auto
  - -qsimd=auto
- -O5
- -O5\b
- Perform optimizations for maximum performance. This includes maximum interprocedural analysis on all of the objects presented on the "link" step. This level of optimization will increase the compiler's memory usage and compile time requirements. -O5 provides all of the functionality of the -O4 option, but also provides the functionality of the -qipa=level=2 option.
  -O5 is equivalent to the following flags :
  - -O4
  - -qipa=level=2
- Includes:
  - -O4
    - -O3
      - -O2
        - -O
      - -qhot=level=0
    - -qipa=level=1
    - -qarch=auto
    - -qtune=auto
    - -qsimd=auto
  - -qipa=level=2
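The "is equivalent to" lists for -O3, -O4, and -O5 nest inside one another, so it can help to see the full set of flags a level pulls in. The following is a minimal sketch that only walks the equivalences quoted in the entries above; the names `EQUIVALENT` and `expand` are hypothetical, and the sketch says nothing about how the compiler resolves a flag that appears with two values (for example -qipa=level=1 and -qipa=level=2 under -O5).

```python
# Direct equivalences copied from the -O, -O3, -O4 and -O5 entries above.
# EQUIVALENT and expand are illustrative names, not part of any IBM tool.
EQUIVALENT = {
    "-O": ["-O2"],
    "-O3": ["-O2", "-qhot=level=0"],
    "-O4": ["-O3", "-qipa=level=1", "-qarch=auto", "-qtune=auto", "-qsimd=auto"],
    "-O5": ["-O4", "-qipa=level=2"],
}


def expand(flag):
    """Return the flag followed by every flag it implies, depth-first, no repeats."""
    seen = []

    def visit(current):
        if current in seen:
            return
        seen.append(current)
        for implied in EQUIVALENT.get(current, []):
            visit(implied)

    visit(flag)
    return seen


for implied_flag in expand("-O5"):
    print(implied_flag)
```

The result lists each flag once, in the order it is first reached from the requested level.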
- -q64
- -q64\b
- Generates 64-bit ABI binaries. The default is to generate 64-bit ABI binaries on little-endian Linux.
- -qalias
- -qalias=(noansi|nostd)\b
- -qalias=ansi | noansi :
  If ansi is specified, type-based aliasing is used during optimization, which restricts the lvalues that can be safely used to access a data object. The default is ansi for the xlc, xlC, and c89 commands. This option has no effect unless you also specify the -O option.
  -qalias=std | nostd :
  Indicates whether the compilation units contain any non-standard aliasing. If so, specify nostd.
- -qalign
- -qalign=(\S+)\b
- Specifies what aggregate alignment rules the compiler uses for file compilation, where the alignment options are:
  - bit_packed : The compiler uses the bit_packed alignment rules.
  - full : The compiler uses the RISC System/6000 alignment rules. This is the same as power.
  - mac68k : The compiler uses the Macintosh alignment rules. This suboption is valid only for 32-bit compilations.
  - natural : The compiler maps structure members to their natural boundaries.
  - packed : The compiler uses the packed alignment rules.
  - power : The compiler uses the RISC System/6000 alignment rules.
  - twobyte : The compiler uses the Macintosh alignment rules. This suboption is valid only for 32-bit compilations.
  The default is -qalign=full.
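To illustrate the difference between natural and packed layouts in general terms, the sketch below uses Python's standard `struct` module to size a record holding one char followed by one int. This is only an analogy: `struct` follows the host platform's C rules, not the XL -qalign suboptions, so the exact numbers are an assumption about the machine running the script.

```python
import struct

# "@" = native sizes with native alignment (padding is inserted so the int
# starts on its natural boundary).
natural_size = struct.calcsize("@ci")

# "=" = standard sizes with no alignment padding between members.
packed_size = struct.calcsize("=ci")

print("natural layout:", natural_size, "bytes")
print("packed layout: ", packed_size, "bytes")
print("padding saved: ", natural_size - packed_size, "bytes")
```

A packed layout saves space but can leave members on unaligned addresses, which is the trade-off the alignment suboptions control.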
- -qalloca
- -qalloca\b
- Indicates that the compiler understands how to do alloca(). This flag is not supported on little-endian Linux.
- -qarch
- -qarch=(\S+)\b
- Produces object code containing instructions that will run on the specified processors. auto selects the processor the compile is being done on.
  Supported values for this flag are :
  - auto - Use the processor on which the program is compiled.
  - pwr9 - The POWER9 processor based systems.
  - pwr8 - The POWER8 processor based systems.
- -qassert
- -qassert=(refalign|contiguous)?\b
- -qassert=refalign | norefalign | contig :
  - refalign specifies that all pointers inside the compilation unit only point to data that is naturally aligned according to the length of the pointer types.
  - contig specifies the compiler can perform optimizations according to the memory layout of the objects occupying contiguous blocks of memory.
- -qenablevmx
- -qenablevmx\b
- Enables the generation of vector instructions for processors that support them.
- -qenum
- -qenum=small\b
- Specifies that enumerations occupy the smallest amount of storage that can represent their range of values.
- -qessl
- -qessl\b
- Specifies that, if either -lessl or -lesslsmp is specified, then Engineering and Scientific Subroutine Library (ESSL) routines should be used in place of some Fortran 90 intrinsic procedures when there is a safe opportunity to do so.
- -qfdpr
- -qfdpr\b
- The compiler generates additional symbol information for use by the "fdpr" binary optimization tool.
- -qhot
- -qhot(=arraypad|=simd|=(no)?vector|=level=[01])?\b
- Performs high-order transformations on loops during optimization. Some example usages are: -qhot, -qhot=level=1, -qhot=simd, -qhot=novector
  
  The supported values for suboption are :
  - arraypad - The compiler will pad any arrays where it infers that there may be a benefit.
  - level=0 - The compiler performs a limited set of high-order loop transformations.
  - level=1 - The compiler performs its full set of high-order loop transformations.
  - simd - Replaces certain instruction sequences with vector instructions.
  - vector - Replaces certain instruction sequences with calls to the MASS library.
  Specifying -qhot without suboptions implies -qhot=nosimd, -qhot=noarraypad, -qhot=vector and -qhot=level=1. The -qhot option is also implied by -O4 and -O5.
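Each entry in this list carries a regular expression that recognizes the flag on a command line. As a sketch of how the -qhot pattern behaves, the snippet below applies it to individual whitespace-separated tokens with `re.fullmatch`. Matching whole tokens is an assumption made for this example; a tool that consumes these patterns may instead search within the full command line, and the helper name `matches_qhot` is hypothetical.

```python
import re

# Pattern copied verbatim from the -qhot entry above.
QHOT = re.compile(r"-qhot(=arraypad|=simd|=(no)?vector|=level=[01])?\b")


def matches_qhot(token):
    """True when the entire token is a -qhot spelling accepted by the pattern."""
    return QHOT.fullmatch(token) is not None


tokens = ["-qhot", "-qhot=level=1", "-qhot=simd", "-qhot=novector",
          "-qhot=level=2", "-qhotel"]
for token in tokens:
    print(f"{token:16} {matches_qhot(token)}")
```

The trailing `\b` is what keeps the pattern from accepting a longer word that merely starts with -qhot.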
- -qinlglue
- -qinlglue\b
- This option inlines glue code that optimizes external function calls when compiling.
- -qinline
- -qinline=(\S+)\b
- The inline option specifies the threshold and limit of inlined functions. Example : -qinline=40.
- -qipa=inline
- -qipa=inline=(\S+)\b
- The inline suboption specifies the threshold and limit of inlined functions. Examples : -qipa=inline=limit=1000 and -qipa=inline=threshold=100
- -qipa=level
- -qipa=level=[012]\b
- Enhances optimization by doing detailed analysis across procedures (interprocedural analysis or IPA). The level determines the amount of interprocedural analysis and optimization that is performed.
  - level=0 does only minimal interprocedural analysis and optimization.
  - level=1 turns on inlining, limited alias analysis, and limited call-site tailoring.
  - level=2 turns on full interprocedural data flow and alias analysis.
- -qipa=threads
- -qipa=threads(=\d+)?\b
- The threads suboption allows the IPA optimizer to run portions of the optimization process in parallel threads, which can speed up the compilation process on multi-processor systems. With -qipa=threads, all the available threads may be used; with -qipa=threads=N, up to N threads may be used, where N must be a positive integer.
  Specifying nothreads does not run any parallel threads; this is equivalent to running one serial thread. This option does not affect the code in the final binary created.
- -qnoinline
- -qnoinline\b
- This option specifies that no functions are to be inlined.
- -qnoenablevmx
- -qnoenablevmx\b
- Disables the generation of vector instructions.
- -qnoipa
- -qnoipa\b
- Suppresses interprocedural analysis (IPA), which is enabled by default at optimization levels -O4 and -O5.
- -qnoprefetch
- -qnoprefetch\b
- The noprefetch option will not add any prefetch instructions automatically.
- -qnothreaded
- -qnothreaded\b
- Do not use the XL compiler thread information.
- -qpdf1
- -qpdf1\b
- The option used in the first pass of a profile directed feedback (PDF) compile; it causes PDF information to be generated. The profile directed feedback optimization gathers data on both execution path and data values. It does not use hardware counters, nor does it gather any data other than path and data values for PDF specific optimizations.
- -qpdf2
- -qpdf2\b
- The option used in the second pass of a profile directed feedback compile that causes PDF information to be utilized during optimization.
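The two PDF passes bracket a training run of the instrumented program. The sketch below only assembles the three command lines as argument lists and does not execute anything. The compiler name `xlc`, the file names, the -O3 level, and the helper `pdf_commands` are assumptions chosen for illustration; a real build would use its own sources and a representative training workload.

```python
def pdf_commands(compiler, sources, output, opt="-O3"):
    """Return the command lines of a two-pass PDF build as argument lists."""
    # Pass 1: build an instrumented binary that records profile data when run.
    instrumented = [compiler, opt, "-qpdf1", *sources, "-o", output]
    # Training run: execute the instrumented binary on representative input.
    training_run = ["./" + output]
    # Pass 2: rebuild, letting the optimizer use the recorded profile data.
    optimized = [compiler, opt, "-qpdf2", *sources, "-o", output]
    return [instrumented, training_run, optimized]


for command in pdf_commands("xlc", ["prog.c"], "prog"):
    print(" ".join(command))
```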
- -qprefetch
- -qprefetch=(aggressive|dscr=(\S+))\b
- Inserts prefetch instructions automatically where there are opportunities to improve code performance.
  - -qprefetch=aggressive : Aggressively prefetch data.
  - -qprefetch=dscr option causes the Data Streams Control Register to be set to the value specified when executing this program.
  Example : -qprefetch=dscr=42
- -qrestrict
- -qrestrict\b
- Adds the restrict type qualifier to the pointer parameters within all functions without modifying the source file.
- -qrtti
- -qrtti\b
- Causes the C++ compiler to generate Run Time Type Identification (RTTI) code.
- -qsave
- -qsave\b
- Specifies that all local variables be treated as STATIC.
- -qsimd
- -q(no)?simd(=auto|=noauto)?\b
- - -qsimd : enables the generation of vector instructions for processors that support them.
  - -qnosimd : disables the generation of vector instructions.
  - Default : whether -qsimd is specified or not, -qsimd=auto is implied at the -O3 or higher optimization level; -qsimd=noauto is implied at the -O2 or lower optimization level.
- -qsmallstack=dynlenonheap
- -qsmallstack=dynlenonheap\b
- Causes the Fortran compiler to allocate dynamic arrays on the heap instead of the stack.
- -qsmp=auto
- -qsmp=auto\b
- Yes
- Causes the compiler to automatically generate parallel code using OMP controls when possible.
- -qsmp=omp
- -qsmp=omp\b
- Yes
- Tell the compiler that OMP controls are used to identify parallel code.
- -qspill
- -qspill(=[0-9]*)\b
- Specifies the size of the register allocation spill area in bytes.
- -qstrict=nans
- -qstrict=nans\b
- Disables transformations that may produce incorrect results in the presence of IEEE floating-point NaN (not-a-number) values, or that may incorrectly produce such values.
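The reason NaN values need this care is that familiar algebraic identities stop holding once a NaN is involved. The short sketch below shows three such identities failing under IEEE arithmetic as implemented by Python floats. Which specific rewrites the XL optimizer performs is not stated in this document; these are generic examples of NaN-sensitive simplifications.

```python
import math

x = math.nan

# A NaN never compares equal to itself, so "x == x" cannot be folded to True.
self_equal = (x == x)

# Arithmetic on a NaN yields a NaN, so "x - x" cannot be folded to 0.0
# and "x * 0.0" cannot be folded to 0.0.
difference = x - x
product = x * 0.0

print("x == x      ->", self_equal)
print("x - x is NaN ->", math.isnan(difference))
print("x * 0 is NaN ->", math.isnan(product))
```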
- -qtune
- -qtune=(\S+)\b
- Specifies the system architecture for which the executable program is optimized. This includes instruction scheduling and cache setting. Allows specification of a target SMT mode to direct optimizations for best performance in that mode.
  The supported values for suboption are :
  - auto - Use the processor on which the program is compiled.
  - pwr9 - The POWER9 processor based systems.
  - pwr8 - The POWER8 processor based systems.
  - st - Optimizations are tuned for single-threaded execution.
  - smt2 - Optimizations are tuned for SMT2 execution mode.
  - smt4 - Optimizations are tuned for SMT4 execution mode.
  - smt8 - Optimizations are tuned for SMT8 execution mode.
- -qvecnvol
- -qvecnvol\b
- Specifies whether to use volatile or non-volatile vector registers. Volatile vector registers are registers whose value is not preserved across function calls so the compiler will not depend on values in them across function calls.
- -qxlf90
- -qxlf90=(signedzero|nosignedzero|autodealloc|noautodealloc|oldpad|nooldpad|)\b
- Determines whether the compiler provides the Fortran 90 or the Fortran 95 level of support for certain aspects of the language.
  suboption can be one of the following :
  - signedzero | nosignedzero : Determines how the SIGN(A,B) function handles signed real 0.0. In addition, determines whether negative internal values will be prefixed with a minus when formatted output would produce a negative sign zero.
  - autodealloc | noautodealloc : Determines whether the compiler deallocates allocatable arrays that are declared locally without either the SAVE or the STATIC attribute and have a status of currently allocated when the subprogram terminates.
  - oldpad | nooldpad : When the PAD= specifier is present in the INQUIRE statement, specifying -qxlf90=nooldpad returns UNDEFINED when there is no connection, or when the connection is for unformatted I/O. This behavior conforms with the Fortran 95 standard and above. Specifying -qxlf90=oldpad preserves the Fortran 90 behavior.
  - Default: signedzero, autodealloc and nooldpad for the xlf95, xlf95_r, xlf95_r7 and f95 invocation commands. nosignedzero, noautodealloc and oldpad for all other invocation commands.
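The signedzero suboption concerns what SIGN(A,B) returns when B is a negative zero. The sketch below is a rough Python analogue of the two behaviors, not XL Fortran code. Mapping signedzero to the variant that honors the sign bit of a negative zero is an assumption based on the Fortran 95 treatment of signed zeros; the function names are hypothetical.

```python
import math


def sign_honoring_negative_zero(a, b):
    """SIGN(A,B) that transfers the sign bit of B, including that of -0.0."""
    return math.copysign(abs(a), b)


def sign_ignoring_negative_zero(a, b):
    """SIGN(A,B) that treats any zero B, positive or negative, as non-negative."""
    return abs(a) if b >= 0 else -abs(a)


print(sign_honoring_negative_zero(3.0, -0.0))
print(sign_ignoring_negative_zero(3.0, -0.0))
```

The two variants agree for every nonzero B and differ only when B is -0.0.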
- -Rstd
- -R\s*[^ ]*stdcxx[^ ]*
- Specifies library search directory for the Apache C++ Standard Library for use by the runtime linker. The information is recorded in the object file and passed to the runtime linker.

- -tl
- (?:^|(?<=\s))-tl(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Applies the prefix specified by the -B option to the designated components.

  | Parameter | Description | Executable name |
  | --- | --- | --- |
  | a | Assembler | as |
  | b | Low-level optimizer | xlfcode |
  | c | Compiler front end | xlfentry |
  | d | Disassembler | dis |
  | F | C preprocessor | cpp |
  | h | Array language optimizer | xlfhot |
  | I | High-level optimizer, compile step | ipa |
  | l | Linker | ld |
  | z | Binder | bolt |

- -qdatasmall
- -qdatasmall\b
- This option indicates to the compiler that each dynamic object allocated in the program fits within the size of 4GB.
- -qunroll
- -qunroll(=auto|yes|no|n)\b
- Unrolls inner loops in the program. This can help improve program performance.
  suboption can be one of the following :
  - auto : This suboption is equivalent to -funroll-loops.
  - yes : This suboption is equivalent to -funroll-all-loops.
  - no : Instructs the compiler to not unroll loops.
  - n : Instructs the compiler to unroll loops by a factor of n. In other words, the body of a loop is replicated to create n copies and the number of iterations is reduced to 1/n of the original count. The -qunroll=n option specifies a global unroll factor that affects all loops that do not have an unroll pragma already. The value of n must be a positive integer.
  - Default: -qunroll=auto
  If -qunroll is specified with no suboptions, the compiler assumes -qunroll=yes. -qnounroll is equivalent to -qunroll=no.
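To make the effect of an unroll factor concrete, the sketch below writes out by hand what unrolling a summation loop by a factor of 4 looks like: the body appears four times per iteration and a short remainder loop handles leftover elements. It is a hand-written illustration of the transformation, not output from the compiler, and the function names are hypothetical.

```python
def total(values):
    """Plain loop: one element is processed per iteration."""
    acc = 0
    for value in values:
        acc += value
    return acc


def total_unrolled_by_4(values):
    """Same sum with the loop body replicated four times per iteration."""
    acc = 0
    n = len(values)
    main_end = n - n % 4  # largest multiple of 4 not exceeding n
    i = 0
    while i < main_end:
        acc += values[i]
        acc += values[i + 1]
        acc += values[i + 2]
        acc += values[i + 3]
        i += 4
    # Remainder loop: handles the final 0 to 3 elements.
    while i < n:
        acc += values[i]
        i += 1
    return acc


data = list(range(10))
print(total(data), total_unrolled_by_4(data))
```

The unrolled version runs the main loop roughly a quarter as many times, at the cost of a larger loop body.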
- -qnounroll
- -qnounroll\b
- This flag is equivalent to -qunroll=no.
- -funroll-all-loops
- -funroll-all-loops\b
- Instructs the compiler to search for more opportunities for loop unrolling than that performed with -funroll-loops. In general, -funroll-all-loops has more chances to increase compile time or program size than -funroll-loops processing, but it might also improve your application's performance.
- -qlibansi
- -qlibansi\b
- Assumes that all functions with the name of an ANSI C defined library function are, in fact, the library functions.
- -qpagesize=16M
- -qpagesize=(\S+)\b
- Asserts the minimum physical pagesize during program execution.
- -qipa=noobject:partition=large
- -qipa=noobject:partition(\S+)\b
- The noobject suboption specifies whether to include standard object code in the object files. It can substantially reduce overall compilation time by not generating object code during the first IPA phase. This option does not affect the code in the final binary created.
  The partition suboption, partition={small|medium|large}, specifies the size of the program sections that are analyzed together. Larger partitions may produce better analysis but require more storage.
  Default: partition=medium
- -qipa=noobject
- -qipa=noobject\b
- Specifies whether to include standard object code in the object files. The noobject suboption can substantially reduce overall compilation time, by not generating object code during the first IPA phase. This option adds -qipa=level=1 if that is not already set.
- -qsmallstack
- -qsmallstack\b
- Reduces the size of the stack frame. Programs that allocate large amounts of data to the stack, such as threaded programs, may result in stack overflows. This option can reduce the size of the stack frame to help avoid overflows.
- -qipa
- -qipa\b
- Enhances optimization by doing detailed analysis across procedures (interprocedural analysis or IPA).

fdprpro is a Feedback Directed Program Restructuring optimization tool that is available for the IBM POWER platform. It can be used optionally during FDO.
