Compilers: IBM XL C/C++ Version 13.1 for Linux
Compilers: IBM XL Fortran Version 15.1 for Linux
Libraries: IBM Advance Toolchain 7
Operating systems: Red Hat Enterprise Linux Server release 7
Last updated: 16-June-2014
Perform optimizations for maximum performance. This includes maximum interprocedural analysis on all of the objects presented on the "link" step. This level of optimization will increase the compiler's memory usage and compile time requirements. -O5 Provides all of the functionality of the -O4 option, but also provides the functionality of the -qipa=level=2 option.
-O5 is equivalent to the following flags
Perform optimizations for maximum performance. This includes interprocedural analysis on all of the objects presented on the "link" step.
-O4 is equivalent to the following flags
Produces object code containing instructions that will run on the specified processors. "auto" selects the processor the compile is being done on. "pwr5x" is the POWER5+ processor.
Supported values for this flag are
Specifies the system architecture for which the executable program is optimized. This includes instruction scheduling and cache setting.
The supported values for suboption are
This option specifies that no functions are to be inlined.
This option inlines glue code that optimizes external function calls when compiling.
Performs high-order transformations on loops during optimization. The supported values for suboption are:
Specifying -qhot without suboptions implies -qhot=nosimd, -qhot=noarraypad, -qhot=vector and -qhot=level=1. The -qhot option is also implied by -O4 and -O5.
Enhances optimization by doing detailed analysis across procedures (interprocedural analysis or IPA). The level determines the amount of interprocedural analysis and optimization that is performed.
level=0 does only minimal interprocedural analysis and optimization
level=1 turns on inlining, limited alias analysis, and limited call-site tailoring
level=2 turns on full interprocedural data flow and alias analysis
Suppresses interprocedural analysis (IPA), which is enabled by default at optimization levels -O4 and -O5.
The option used in the first pass of a profile-directed feedback (PDF) compile, which causes PDF information to be generated. PDF optimization gathers data on both execution paths and data values. It does not use hardware counters, nor does it gather any data other than path and data values for PDF-specific optimizations.
The option used in the second pass of a profile-directed feedback (PDF) compile, which causes PDF information to be used during optimization.
The compiler generates additional symbol information for use by the "fdpr" binary optimization tool.
Do not use the XL compiler thread information.
Do not use the XL compiler compat macros.
         -qxlf90=<suboption>
                Determines whether the compiler provides the
                Fortran 90 or the Fortran 95 level of support for
                certain aspects of the language. <suboption> can be
                one of the following:
                signedzero | nosignedzero
                     Determines how the SIGN(A,B) function handles
                     signed real 0.0. In addition, determines
                     whether negative internal values will be
                     prefixed with a minus when formatted output
                     would produce a negative sign zero.
                autodealloc | noautodealloc
                     Determines whether the compiler deallocates
                     allocatable arrays that are declared locally
                     without either the SAVE or the STATIC
                     attribute and have a status of currently
                     allocated when the subprogram terminates.
                oldpad | nooldpad
                     When the PAD=specifier is present in the
                     INQUIRE statement, specifying -qxlf90=nooldpad
                     returns UNDEFINED when there is no connection,
                     or when the connection is for unformatted I/O.
                     This behavior conforms with the Fortran 95
                     standard and above. Specifying -qxlf90=oldpad
                     preserves the Fortran 90 behavior.
                Default:
                     o signedzero, autodealloc and nooldpad for the
                     xlf95, xlf95_r, xlf95_r7 and f95 invocation
                     commands.
                     o nosignedzero, noautodealloc and oldpad for
                     all other invocation commands.
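The signedzero distinction is easiest to see with a negative zero. The following is a minimal sketch in Python rather than Fortran, and it does not invoke the XL compiler. The two helper functions (`sign_honouring_zero_sign` and `sign_comparing_with_zero`) are hypothetical names introduced here, and the mapping of each one to the signedzero and nosignedzero suboptions is an assumption based on the usual Fortran 95 versus Fortran 90 reading of SIGN(A,B).

```python
import math


def sign_honouring_zero_sign(a: float, b: float) -> float:
    """SIGN(A,B) that looks at the sign bit of B, so -0.0 counts as negative.

    Assumed to correspond to the signedzero behavior.
    """
    return math.copysign(abs(a), b)


def sign_comparing_with_zero(a: float, b: float) -> float:
    """SIGN(A,B) that compares B with zero, so -0.0 counts as non-negative.

    Assumed to correspond to the nosignedzero behavior.
    """
    return abs(a) if b >= 0.0 else -abs(a)


negative_zero = -0.0

print(sign_honouring_zero_sign(3.0, negative_zero))  # -3.0
print(sign_comparing_with_zero(3.0, negative_zero))  # 3.0

# Formatted output of a negative zero keeps the minus sign here.
print(f"{negative_zero:.1f}")  # -0.0
```

For any non-zero B the two helpers agree; only a zero with its sign bit set separates them.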
Generates 64-bit ABI binaries. The default is to generate 32-bit ABI binaries.
Causes the Fortran compiler to allocate dynamic arrays on the heap instead of the stack.
Specifies that all local variables be treated as STATIC.
Enables the generation of vector instructions for processors that support them.
Specifies whether to use volatile or non-volatile vector registers. Volatile vector registers are registers whose value is not preserved across function calls so the compiler will not depend on values in them across function calls.
Link the mathematical acceleration subsystem libraries (MASS), which contain libraries of tuned mathematical intrinsic functions.
Link the Engineering and Scientific Subroutine Library (ESSL).
Specifies that, if either -lessl or -lesslsmp are also specified, then Engineering and Scientific Subroutine Library (ESSL) routines should be used in place of some Fortran 90 intrinsic procedures when there is a safe opportunity to do so.
Causes the C++ compiler to generate run-time type identification (RTTI) code.
qalias=ansi | noansi If ansi is specified, type-based aliasing is used during optimization, which restricts the lvalues that can be safely used to access a data object. The default is ansi for the xlc, xlC, and c89 commands. This option has no effect unless you also specify the -O option.
qalias=std | nostd Indicates whether the compilation units contain any non-standard aliasing (see Compiler Reference for more information). If so, specify nostd.
           Specifies what aggregate alignment rules the
                compiler uses for file compilation, where the
                alignment options are:
                bit_packed
                     The compiler uses the bit_packed alignment
                     rules.
                full
                     The compiler uses the RISC System/6000
                     alignment rules. This is the same as power.
                mac68k
                     The compiler uses the Macintosh alignment
                     rules.  This suboption is valid only for 32-
                     bit compilations.
                natural
                     The compiler maps structure members to their
                     natural boundaries.
                packed
                     The compiler uses the packed alignment rules.
                power
                     The compiler uses the RISC System/6000
                     alignment rules.
                twobyte
                     The compiler uses the Macintosh alignment
                     rules.  This suboption is valid only for 32-
                     bit compilations.  The mac68k option is the
                     same as twobyte.
                The default is -qalign=full.
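The difference between natural and packed layouts can be sketched without the compiler. The example below uses Python's standard `struct` module on the host machine, so it only illustrates the general idea of padding; it does not reproduce the XL compiler's rules for any particular suboption. The record layout (one char followed by one 32-bit int) is a hypothetical example.

```python
import struct

# Hypothetical record: struct { char tag; int value; }
# "@" asks for native alignment, so padding follows the one-byte tag.
natural_size = struct.calcsize("@ci")
# "=" asks for no alignment, so the members are stored back to back.
packed_size = struct.calcsize("=ci")

print("natural layout size:", natural_size)
print("packed layout size: ", packed_size)
print("padding bytes:      ", natural_size - packed_size)
```

A packed layout saves space, but members may then sit on unaligned addresses, which is why the choice of alignment rule can affect both data compatibility and performance.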
qassert=refalign | norefalign | contig
refalign specifies that all pointers inside the compilation unit only point to data that is naturally aligned according to the length of the pointer types.
contig specifies the compiler can perform optimizations according to the memory layout of the objects occupying contiguous blocks of memory.
qprefetch=aggressive Aggressively prefetch data
The prefetch=dscr option causes the Data Streams Control Register to be set to the value specified when executing this program.
The noprefetch option causes the compiler to generate no prefetch instructions and to not adjust the DSCR when executing this program.
qrestrict TBD
Causes the compiler to automatically generate parallel code using OMP controls when possible.
Tell the compiler that OMP controls are used to identify parallel code.
                Ensures that optimizations done by default at
                optimization levels -O3 and higher, and, optionally
                at -O2, do not alter the semantics of a program.
                The -qstrict=all, -qstrict=precision,
                -qstrict=exceptions, -qstrict=ieeefp, and
                -qstrict=order suboptions and their negative forms
                are group suboptions that affect multiple,
                individual suboptions. Group suboptions act as if
                either the positive or the no form of every
                suboption of the group is specified.
                Default:
                     o Always -qstrict or -qstrict=all when the
                     -qnoopt or -O0 optimization level is in effect
                     o -qstrict or -qstrict=all is the default when
                     the -O2 or -O optimization level is in effect
                     o -qnostrict or -qstrict=none is the default
                     when -O3 or a higher optimization level is in
                     effect
                <suboptions_list> is a colon-separated list of one
                or more of the following:
                all | none
                     all disables all semantics-changing
                     transformations, including those controlled by
                     the ieeefp, order, library, precision, and
                     exceptions suboptions.  none enables these
                     transformations.
                precision | noprecision
                     precision disables all transformations that
                     are likely to affect floating-point precision,
                     including those controlled by the subnormals,
                     operationprecision, association,
                     reductionorder, and library suboptions.
                     noprecision enables these transformations.
                exceptions | noexceptions
                     exceptions disables all transformations likely
                     to affect exceptions or be affected by them,
                     including those controlled by the nans,
                     infinities, subnormals, guards, and library
                     suboptions. noexceptions enables these
                     transformations.
                ieeefp | noieeefp
                     ieeefp disables transformations that affect
                     IEEE floating-point compliance, including
                     those controlled by the nans, infinities,
                     subnormals, zerosigns, and operationprecision
                     suboptions. noieeefp enables these
                     transformations.
                nans | nonans
                     nans disables transformations that may produce
                     incorrect results in the presence of, or that
                     may incorrectly produce IEEE floating-point
                     signaling NaN (not-a-number) values. nonans
                     enables these transformations.
                infinities | noinfinities
                     infinities disables transformations that may
                     produce incorrect results in the presence of,
                     or that may incorrectly produce floating-point
                     infinities.  noinfinities enables these
                     transformations.
                subnormals | nosubnormals
                     subnormals disables transformations that may
                     produce incorrect results in the presence of,
                     or that may incorrectly produce IEEE
                     floating-point subnormals (formerly known as
                     denorms). nosubnormals enables these
                     transformations.
                zerosigns | nozerosigns
                     zerosigns disables transformations that may
                     affect or be affected by whether the sign of a
                     floating-point zero is correct. nozerosigns
                     enables these transformations.
                operationprecision | nooperationprecision
                     operationprecision disables transformations
                     that produce approximate results for
                     individual floating-point operations.
                     nooperationprecision enables these
                     transformations.
                order | noorder
                     order disables all code reordering between
                     multiple operations that may affect results or
                     exceptions, including those controlled by the
                     association, reductionorder, and guards
                     suboptions. noorder enables code reordering.
                association | noassociation
                     association disables reordering operations
                     within an expression. noassociation enables
                     reordering operations.
                reductionorder | noreductionorder
                     reductionorder disables parallelizing
                     floating-point reductions. noreductionorder
                     enables these reductions.
                guards | noguards
                     guards disables moving operations past guards
                     or calls which control whether the operation
                     should be executed or not. noguards enables
                     these moving operations.
                library | nolibrary
                     library disables transformations that affect
                     floating-point library functions. nolibrary
                     enables these transformations.
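Why the association and reductionorder suboptions matter can be shown with ordinary double-precision arithmetic. The sketch below is plain Python, not compiler output, and the input values are chosen here only to make the effect visible: regrouping an expression or splitting a sum into partial sums changes the rounded result.

```python
# Reassociation: the same three operands, grouped two ways.
a, b, c = 1e20, -1e20, 1.0

as_written = (a + b) + c    # the large terms cancel first, then 1.0 is added
regrouped = a + (b + c)     # 1.0 is absorbed by -1e20 before the cancellation

print(as_written)  # 1.0
print(regrouped)   # 0.0

# Reduction order: a serial sum versus two partial sums that are then
# combined, the way a parallelized reduction might be split.
values = [1e20, 1.0, -1e20, 1.0]

serial_sum = 0.0
for v in values:
    serial_sum += v

partial_sums = [values[0] + values[1], values[2] + values[3]]
split_sum = partial_sums[0] + partial_sums[1]

print(serial_sum)  # 1.0
print(split_sum)   # 0.0
```

Neither ordering is wrong as floating-point arithmetic, but they disagree, so a program that depends on bit-for-bit reproducible results needs the ordering preserved.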
Macro to have compiler always inline externs if specified.
The inline option specifies the threshold and limit of inlined functions
The inline suboption specifies the threshold and limit of inlined functions
The partition suboption specifies the size of the program sections that are analyzed together. Larger partitions may produce better analysis but require more storage. Default is medium.
The threads suboption allows the IPA optimizer to run portions of the optimization process in parallel threads, which can speed up the compilation process on multi-processor systems. All the available threads, or the number specified by N, may be used. N must be a positive integer. Specifying nothreads does not run any parallel threads; this is equivalent to running one serial thread. This option does not affect the code in the final binary created.
Link with tcmalloc's library for Linux on POWER. This is a library that optimizes calls to new, delete, malloc and free.
Link with libhugetlbfs.so. This enables the heap to be backed by 16 MB pages.
| Parameter | Description | Executable name | 
|---|---|---|
| a | Assembler | as | 
| b | Low-level optimizer | xlfcode | 
| c | Compiler front end | xlfentry | 
| d | Disassembler | dis | 
| F | C preprocessor | cpp | 
| h | Array language optimizer | xlfhot | 
| I | High-level optimizer, compile step | ipa | 
| l | Linker | ld | 
| z | Binder | bolt | 
Instructs the linker to include every object file in the specified library, rather than searching the library for the required object files.
Instructs the linker to include libdl.a to enable the dynamic linking loader.
Turn off the effect of the --whole-archive flag.
Instructs the linker to allow multiple definitions of a symbol and to use the first definition. Normally, when a symbol is defined multiple times, the linker reports a fatal error.
Pass the --hugetlbfs-link=BDT flag to the linker so that the text, initialized data, and BSS segments of the application are backed by hugepages.
Pass the --hugetlbfs-align flag to the linker so that we can control (by environment variable HUGETLB_ELFMAP) which program segments are placed in hugepages.
Determines substitute path names for XL Fortran executables such as the compiler, assembler, linker, and preprocessor. It can be used in combination with the -t option, which determines which of these components are affected by -B.
Pass the -q flag to the linker, causing the final executable to retain its relocation information.
Link with the Apache C++ Standard Library ("stdcxx"). "libstd8d.so" is a 32-bit shared library with optimization enabled.
Adds the directory for the Apache C++ Standard Library to the search path at link time.
Specifies library search directory for the Apache C++ Standard Library for use by the runtime linker. The information is recorded in the object file and passed to the runtime linker.
Causes the compiler to treat "char" variables as signed instead of the default of unsigned.
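The practical effect of char signedness shows up when a byte has its top bit set. The sketch below is Python rather than C and only mimics the two interpretations of the same byte; it makes no claim about the code the XL compiler generates.

```python
raw = bytes([0xFF])  # one byte with every bit set

# The same bit pattern read as a signed char and as an unsigned char.
as_signed_char = int.from_bytes(raw, byteorder="little", signed=True)
as_unsigned_char = int.from_bytes(raw, byteorder="little", signed=False)

print(as_signed_char)    # -1
print(as_unsigned_char)  # 255

# A comparison such as "c < 0" therefore depends on the signedness in effect.
print(as_signed_char < 0, as_unsigned_char < 0)  # True False
```

Code that compares char values with negative constants, or that widens a char to int, can behave differently depending on which interpretation is in effect.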
Indicates that the input Fortran source program is in fixed form.
Adds an underscore to global entities to match the C compiler ABI.
Permits the usage of "//" to introduce a comment that lasts until the end of the current source line, as in C++.
Invoke the IBM XL C compiler. 32-bit binaries are produced by default. Link with the IBM Advance Toolchain libraries.
Invoke the IBM XL C++ compiler. 32-bit binaries are produced by default. Link with the IBM Advance Toolchain libraries.
Invoke the IBM XL Fortran compiler. 32-bit binaries are produced by default. Link with the IBM Advance Toolchain libraries.
Accepts almost any C dialect.
Specifies whether to include standard object code in the object files. The noobject suboption can substantially reduce overall compilation time, by not generating object code during the first IPA phase. This option does not affect the code in the final binary created.
Specifies the size of the compiler's internal program storage areas, in bytes.
Causes the compiler to output a traceback if it abends.
Suppresses the message with the message number specified.
Suppresses informational, language-level, and warning messages. This option sets -qflag=e:e.
Usage:
      - First, copy the original executable (baseexe) to baseexe.orig.
      - Then, the executable is instrumented and its initial profile generated, as follows:
        $ fdprpro -a instr baseexe
        The output will be generated (by default) in baseexe.instr and its profile in baseexe.nprof.
      - Next, run baseexe.instr using the training data. This will fill the profile file with information that characterizes the training workload.
      - Finally, re-run FDPR-Pro with the profile file provided, as follows:
        $ fdprpro -a opt -f baseexe.nprof [optimization options] baseexe
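The steps above can be collected into a small helper that assembles the command lines without running anything. This is a sketch under stated assumptions: `fdpr_commands` is a hypothetical function name, the `cp` backup command and the bare `./baseexe.instr` training invocation are illustrative guesses (the real training run needs the workload's own arguments), and only the two `fdprpro` invocations are taken from the usage text.

```python
import shlex


def fdpr_commands(exe, opt_options=()):
    """Return the command line for each step of the FDPR-Pro workflow.

    Nothing is executed; each value is an argv-style list of strings.
    """
    profile = f"{exe}.nprof"  # default profile name produced by "-a instr"
    return {
        # Assumption: a plain copy is used to keep the original executable.
        "backup": ["cp", exe, f"{exe}.orig"],
        "instrument": ["fdprpro", "-a", "instr", exe],
        # Assumption: training arguments are workload-specific and omitted.
        "train": [f"./{exe}.instr"],
        "optimize": ["fdprpro", "-a", "opt", "-f", profile, *opt_options, exe],
    }


for step, argv in fdpr_commands("baseexe").items():
    print(f"{step}: {shlex.join(argv)}")
```

Optimization options are passed through unchanged, for example `fdpr_commands("baseexe", ["-dp"])` to request the data-prefetch option.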
fdprpro [options] -f profile program
where -f specifies the profile run data.  program is the name of the executable.
[options] can be one or more of the following:
  Action Options:
  -a/--action [action] 	Specifies customized actions
  where [action] can be one of the following:
  	anl          analyze program
  	instr        generate instrumented program for profile gathering (same as -1)
  	opt          generate optimized program (same as -3)
  	check_sign   check fdpr signature in the input program
 Action Options:     
  -anl, --analyze-program
                         Analyze the program but do not create a modified binary.
                         This option is used to generate profile/code coverage
                         reports in text format. When used with the -d option it
                         will generate the disassembly of the original program
  -cci, --code-coverage-instrumentation
                         Instrument program in order to obtain code coverage
                         information. program must be compiled with line number
                         debug info
  -pi, --profile-instrumentation
                         Instrument the program to obtain execution count profile
  -ui, --user-instrumentation
                         Instrument the program by inserting calls to
                         user-supplied functions compiled into a shared library
 Analysis Options:   
  -aawc/-noaawc, --analyze-assembly-written-csects/--noanalyze-assembly-written-csects
                         Analyze/Do not analyze objects written in Assembly.
  -acf <analysis configuration file>, --analysis-configuration-file <analysis configuration file>
                         Provide a configuration file of analysis information
                         (advanced option)
  -asd, --analyze-static-data
                         Analyze static data objects as distinct data elements
                         for data reordering (unsafe for certain compilers)
  -esa, --extra-safe-analysis
                         Limit analysis phase to compiler generated code
  -fca, --funcsect-analysis
                         Apply special analysis for an input executable that was
                         compiled with the -qfuncsect compiler option
  -ff <string>, --file-format <string>
                         Input file format: can be LM (load module) or PO
                         (program object)
  -ifl <file>, --ignored-function-list <file>
                         Set the ignored function list. The file contains names
                         of functions that are considered unsafe and thus are
                         not modified
  -iinf, --ignore-info   Ignore .info sections produced with the -qfdpr option
                         during compile time
 Instrumentation Options:
                          
  -ccl <level>, --code-coverage-level <level>
                         Perform code coverage at the basic block level (BB) or
                         at the function level (FUNC). Default is BB
  -ei, --embedded-instrumentation
                         Perform embedded instrumentation. The profile will be
                         collected into the application's global data area. When
                         the application terminates, the collected data will be
                         lost
  -fd <Fdesc>, --file-descriptor <Fdesc>
                         Set the file descriptor number to be used when opening
                         the profile file. The default of <Fdesc> is set to the
                         maximum-allowed number of open files
  -icsp, --instr-call-site-profiling
                         Instrument each basic block in order to collect each
                         caller context frequency
  -icvp, --instr-call-value-profiling
                         Instrument the values of parameters passed in function
                         calls
  -imullX, --mullX-instrumentation
                         Perform value profiling of RA and RB operands in mullX
                         instructions
  -iderat, --derat-instrumentation
                         Perform value profiling of RA and RB operands in
                         load/store indexed instructions
  -infp, --ignore-not-found-procedures
                         Ignore not found procedures defined in the
                         instrumentation directives file and do not exit with
                         error
  -ipcr/-noipcr, --instrumentation-preserve-condition-register/--noinstrumentation-preserve-condition-register
                         Preserve/Do not preserve the condition register while
                         calling stubs
  -ipctr/-noipctr, --instrumentation-preserve-count-register/--noinstrumentation-preserve-count-register
                         Preserve/Do not preserve the count register while
                         calling stubs
  -ipe/-noipe, --instrumentation-preserve-environment/--noinstrumentation-preserve-environment
                         Do not preserve registers that are not overwritten while
                         calling stubs. -noipe implies -noipvr -noipspr
  -iplr/-noiplr, --instrumentation-preserve-link-register/--noinstrumentation-preserve-link-register
                         Preserve/Do not preserve the link register while calling
                         stubs
  -ipnvr, --instrumentation-preserve-non-volatile-registers
                         Preserve the non volatile registers while calling stubs.
  -ipspr/-noipspr, --instrumentation-preserve-special-registers/--noinstrumentation-preserve-special-registers
                         Preserve/Do not preserve the special purpose registers
                         while calling stubs
  -ipvr/-noipvr, --instrumentation-preserve-volatile-registers/--noinstrumentation-preserve-volatile-registers
                         Preserve/Do not preserve the volatile registers while
                         calling stubs. -noipvr implies -noipnvr and -nosfp
  -ipxer/-noipxer, --instrumentation-preserve-fixed-point-exception-register/--noinstrumentation-preserve-fixed-point-exception-register
                         Preserve/Do not preserve the fixed-point exception
                         register while calling stubs
  -issu, --instrumentation-safe-stack-usage
                         Ensure that additional stack space is properly allocated
                         for the instrumented run. Use this option if your
                         application uses the stack extensively (e.g., when the
                         program uses alloca()). Note that this option adds
                         extra overhead on instrumentation code
  -iso <offset>, --instrumentation-stack-offset <offset>
                         Set the offset from the stack, a negative number, where
                         the instrumentation's area for saving registers is kept
                         at runtime. Use with care
  -M <addr>, --profile-map <addr>
                         Set the shared memory segment address for profiling.
                         Alternative shared memory addresses are needed when the
                         instrumented program application creates a conflict
                         with the shared-memory addresses preserved for the
                         profiling. Typical alternative values are 0x40000000,
                         0x50000000, ... up to 0xC0000000. The default is set to
                         0x3000000
  -ptm, --profile-to-memory
                         Use shared memory key instead of file mapping to obtain
                         a shared memory area for the profile data
  -ri/-nori, --register-instrumentation/--noregister-instrumentation
                         Instrument/Do not instrument the input program file to
                         collect profile information about indirect branches via
                         registers. The default is set to collect the profile
                         information
  -sfp/-nosfp, --save-floating-point-registers/--nosave-floating-point-registers
                         Save/Do not save floating point registers in
                         instrumented code. The default is set to save floating
                         point registers
  -shmkey <key number>, --shared-memory-key <key number>
                         Specify a shared memory key to use when creating a
                         shared memory area for the profile. The default key is
                         created by hashing the profile file name (with ftok).
  -spescr <0-127>, --spe-scratch-register <0-127>
                         Specify a global SPE scratch register, decreasing
                         instrumentation overhead, in order to minimize
                         possibility of local store overflow
 Profile Files Options:
                          
  -af <prof_file>, --ascii-profile-file <prof_file>
                         Set the name of a text format profile file containing
                         profile information.
  -aop, --accept-old-profile
                         Accept the old profile file collected on previous
                         versions of the input program file (requires the -f
                         flag)
  -f <prof_file>, --profile-file <prof_file>
                         Set the profile file name. The profile file is created
                         during the instrumentation phase and read during the
                         optimization phase. The profile file is updated each
                         time you run the instrumented program
  -fdir <prof_file_dir>, --profile-file-directory <prof_file_dir>
                         Set the run-time location of the profile file. The
                         profile will be searched for at this location during
                         the profiling phase. The default location is the path
                         given in the profile file name (-f option). Applicable
                         only at instrumentation phase
 Optimization Options:
                          
  -A <alignment>, --align-code <alignment>
                         Specify code alignment strategy. 1: Use grouping rules
                         of target machine (default), 2: Same as 1 but consider
                         also hotness of branch targets. See -m for the selected
                         machine model.
  -abb <factor>, --align-basic-blocks <factor>
                         Align basic blocks that are hotter than the average by a
                         given (float) <factor>. This is a lower-level
                         machine-specific alignment compared to --align-code.
                         Value of -1 (the default) disables this option
  -bh <factor>, --branch-hint <factor>
                         Add branch hints to basic blocks that are hotter than
                         the average by a given (float) <factor>. This is an
                         SPE-specific optimization. Value of -1 (the default)
                         disables this option
  -ccc <threshold>, --cold-code-connector <threshold>
                         Preserves original order for code which is less
                         frequently executed than given threshold
  -bldcg, --build-dcg    Build a Data Connectivity Graph (DCG) for enhanced data
                         reordering (applicable only with the -RD flag)
  -cbpth, --cold-branch-prediction-threshold
                         Set the Cold Branch Prediction Threshold for branch
                         prediction optimization. Branches whose execution count
                         relative to the average is below this value will be
                         statically predicted. Allowed values are between (0,1).
                         Default is -1 - optimization is not applied.
                         (Applicable only with the -bp flag)
  -bpth, --branch-prediction-threshold
                         Set threshold for event based branch prediction
                         optimization
  -pbp, --preserve-branch-predication
                         Preserve branch predication pattern (bc+8) and avoid
                         code reordering and branch prediction
  -btcar, --branch-table-csect-anchor-removal
                         Eliminate load instructions used when accessing branch
                         tables
  -cbsi, --chain-based-selective-inline
                         Perform selective inlining of functions that produce
                         long hot chains of code
  -cbtd, --convert-bss-to-data
                         Convert BSS section into a data section. This is useful
                         for more aggressive tocload and RD optimizations
  -cib-opt, --convert-indirect-branches-optimization
                         Convert indirect branch to direct branch
  -cRD, --conservativeRD
                         Perform conservative static data reordering by packing
                         together all frequently referenced static variables
  -dce, --dead-code-elimination
                         Eliminate instructions related to unused local variables
                         within frequently executed functions. This is useful
                         mainly after applying function inlining optimization
  -dp, --data-prefetch   Insert data-cache prefetch instructions to improve
                         data-cache performance
  -dpht <threshold>, --data-placement-hotness-threshold <threshold>
                         Set data placement algorithm hotness threshold between
                         (0,1), where 0 reorders the static variables in large
                         groups based on the control flow, and 1 reorders the
                         variables in very small groups based on their access
                         frequency. (This is applicable only with the -RD flag)
  -dpnf <factor>, --data-placement-normalization-factor <factor>
                         Set data placement algorithm normalization factor
                         between (0,1), where 0 causes static variables to be
                         reordered regardless of their size, and 1 locates only
                         small sized variables first. (applicable only with the
                         -RD flag)
  -ece, --epilog-code-eliminate
                         Reduce code size by grouping common instructions in
                         function epilogs into a single unified code sequence
  -fatc <num_of_bytes>, --fat-const <num_of_bytes>
                         Inflate constant areas in code section by adding
                         <num_of_bytes> (entire set to 255) to each constant
                         area
  -fatd <num_of_bytes>, --fat-data <num_of_bytes>
                         Inflate data section by adding <num_of_bytes> (entire
                         set to 255) to each data basic unit
  -fatn <num_of_nops>, --fat-nop <num_of_nops>
                         Inflate code section by adding <num_of_nops> to each
                         code basic block
  -bined <binary_editor>, --binary-editor <binary_editor>
                         Edit existing binary code (advanced option)
  -fc, --function-cloning
                         Enable function cloning phase only during function
                         inlining optimizations (applicable only with function
                         inlining flags: -i, -si, -ihf, -isf, -shci)
  -hr, --hco-reschedule  Relocate instructions from frequently executed code to
                         rarely executed code areas, when possible
  -hrf <factor>, --hco-resched-factor <factor>
                         Set the aggressiveness of the -hr optimization option
                         according to a factor value between (0,1), where 0 is
                         the least aggressive factor (applicable only with the
                         -hr option)
  -tasr, --toc-anchor-store-reschedule
                         Relocate TOC store instructions from frequently executed
                         code to rarely executed code areas, when possible
  -i, --inline           Same as --selective-inline with --inline-small-funcs 12
  -ia, --indirect-analysis
                         Perform indirect branch target analysis
  -icm-opt, --icm-optimization
                         Replace a sequence of l/ltr or ly/ltr instructions with
                         an icm or icmy instruction, respectively
  -ihf <pct>, --inline-hot-functions <pct>
                         Inline all function call sites to functions that have a
                         frequency count greater than the given <pct> frequency
                         percentage
  -iplte, --inline-plt-entries
                         Replaces the call to a PLT entry with the PLT entry code
                         itself, by inlining the first part of the entry
  -isf <size>, --inline-small-funcs <size>
                         Inline all functions that are smaller than or equal to
                         the given <size> in bytes
  -kr, --killed-registers
                         Eliminate stores and restores of registers that are
                         killed (overwritten) after frequently executed function
                         calls
  -lal-opt, --load-after-load-optimization
                         Replace two load instructions from the same memory
                         location with one load instruction and one placement
                         instruction
  -lap, --load-address-propagation
                         Eliminate load instructions of variable addresses by
                         re-using pre-loaded addresses of adjacent variables
  -larl-opt, --larl-optimization
                         Replace a sequence of bras/const area/llgt instructions
                         with a single larl instruction
  -las, --load-after-store
                         Add NOP instructions to place each load instruction
                         further apart following a store instruction that
                         references the same memory address
  -plas, --pattern-based-load-after-store
                         Optimizes inefficient memory access patterns in order to
                         avoid load-after-store events. 
  -ebplas, --event-based-pattern-based-load-after-store
                         Optimizes inefficient memory access patterns in order to
                         avoid load-after-store events. The optimization is
                         possible if PM_MRK_LSU_REJECT_LHS profile is available
  -rcl, --remove-constant-load
                         Reduces the number of load instructions used to bring
                         constant values into registers. The parameter is used
                         to control which version of optimization is applied,
                         versions from 0 to 3 are available.
  -ldce, --local-dead-code-optimization
                         Local dead code elimination (basic block scope only) -
                         needless when using -dce
  -ldp-opt, --long-displacement-optimization
                         Replace an instruction that has a long displacement with
                         the matching instruction that has a short displacement,
                         according to the displacement operand (e.g. ay-->a,
                         oy-->o, xy-->x, etc.)
  -lgfr-opt, --lgfr-optimization
                         Where possible, replace a 32-bit instruction with its
                         matching 64-bit instruction
  -llgh-opt, --llgh-optimization
                         Replace a sequence of lh/nilh/llgfr instructions with a
                         single llgh instruction
  -fce, --fix-cobol-entries
                         An optimization for COBOL code - fixes entries of
                         CSECTs. Needed for HLR optimizations.
  -pvgc <mode>, --print-visual-graph-csect <mode>
                         Print a .dot file with CFG information for each csect.
                         Mode 0 produces a graph with the full instruction list
                         for each node; mode 1 produces a graph with short node
                         descriptions.
  -pvgf <mode>, --print-visual-graph-func <mode>
                         Print a .dot file with CFG information for each
                         function. Mode 0 produces a graph with the full
                         instruction list for each node; mode 1 produces a graph
                         with short node descriptions.
  -lro, --link-register-optimization
                         Eliminate saves and restores of the link register in
                         frequently-executed functions
  -lu <aggressiveness_factor>, --loop-unroll <aggressiveness_factor>
                         Unroll short loops containing one to several basic
                         blocks according to an aggressiveness factor between
                         (1,9), where 1 is the least aggressive unrolling option
                         for very hot and short loops
  -lun <unrolling_number>, --loop-unrolling-number <unrolling_number>
                         Set the number of unrolled iterations in each unrolled
                         loop. The allowed range is between (2,50). Default is
                         set to 2. (Applicable only with the -lu flag)
  -lux <unrolling_factor>, --loop-unroll-extended <unrolling_factor>
                         Unroll hot loops using the given unrolling factor. The
                         allowed values are integers that are powers of 2.
                         Value -1 disables the optimization, value 1 calculates
                         the unrolling factor automatically, given a machine
                         model
  -mvc-opt, --mvc-optimization
                         Replace an mvc instruction with lg/stg instructions
  -nillr15-opt, --nillr15-optimization
                         Remove a nill r15,0xfffe instruction if followed by an
                         stmg r14,r12,8(r13) instruction
  -sls, --store-load-on-stack-opt
                         Optimize store load on stack pattern
  -fmrx, --fmr-to-xxlor  Replace FMR instructions from reordered code with XXLOR
                         instruction
  -xscpx, --xscpsgndp-to-xxlor
                         Replace Xscpsgndp instructions from reordered code with
                         XXLOR instruction
  -dir, --dependant-instr-resched
                         Put a NOP between dependent instructions
  -O                     Switch on basic optimizations only. Same as -RC -nop -bp
                         -bf
  -O2                    Switch on less aggressive optimization flags. Same as -O
                         -hr -pto -isf 8 -tlo -kr -see 0
  -O3                    Switch on aggressive optimization flags. Same as -O2 -RD
                         -isf 12 -si -lro -las -vro -btcar (for XCOFF files) -lu
                         9 -rt 0 -so -see 1 -oderat
  -O4                    Switch on aggressive optimization flags together with
                         aggressive function inlining. Same as -O3 -sidf 50 -ihf
                         20 -sdp 9 -shci 90 and -bldcg (for XCOFF files)
  -ocvp, --opt-call-value-profiling
                         Specialize function calls according to the values of
                         their passed parameters
  -ocsp, --opt-call-site-profiling
                         Cluster functions with similar behaviour according to
                         calling context
  -omullX, --mullX-optimization
                         Optimize mullX instructions by adding a run-time check
                         on RA and RB and performing equivalent operations with
                         lower penalty. The optimization requires the use of
                         -imullX in the instrumentation phase
  -oderat, --derat-optimization
                         Optimize load/store indexed instructions by adding a
                         run-time check on RA and RB and performing equivalent
                         operations with lower penalty. The optimization
                         requires the use of -iderat in the instrumentation
                         phase
  -pbsi, --path-based-selective-inline
                         Perform selective inlining of dominant hot function
                         calls based on the control flow paths leading to hot
                         functions
  -pc, --preserve-csects
                         Preserve CSects' boundaries in reordered code
  -pca, --propagate-constant-area
                         Relocate the constant variables area to the top of the
                         code section when possible
  -pfb, --preserve-first-bb
                         Preserve original location of the entry point basic
                         block in program
  -pp, --preserve-functions
                         Preserve functions' boundaries in reordered code
  -pr/-nopr, --ptrgl-r11/--noptrgl-r11
                         Perform/Do not perform removal of R11 load instruction
                         in _ptrgl csect (the default is to perform the
                         optimization)
  -pto, --ptrgl-optimization
                         Perform optimization of indirect call instructions via
                         registers by replacing them with conditional direct
                         jumps
  -ptoht <heatness_threshold>, --ptrgl-optimization-heatness-threshold <heatness_threshold>
                         Set the frequency threshold for indirect calls that are
                         to be optimized by -pto optimization. Allowed range
                         between 0 and 1. Default is set to 0.8. (Applicable
                         only with -pto flag)
  -ptosl <limit_size>, --ptrgl-optimization-size-limit <limit_size>
                         Set the limit of the number of conditional statements
                         generated by -pto optimization. Allowed values are
                         between 1 and 100. Default value is set to 3.
                         (Applicable only with the -pto flag)
  -rcaf <aggressiveness_factor>, --reorder-code-aggressivenes-factor <aggressiveness_factor>
                         Set the aggressiveness of code reordering optimization.
                         Allowed values are [0 | 1 | 2], where 0 preserves the
                         original code order and 2 is the most aggressive.
                         Default is set to 1. (Applicable only with the -RC
                         flag)
  -rccrf <reversal_factor>, --reorder-code-condition-reversal-factor <reversal_factor>
                         Set the threshold fraction that determines when to
                         enable condition reversal for each conditional branch
                         during code reordering. Allowed input range is between
                         0.0 and 1.0 where 0.0 tries to preserve original
                         condition direction and 1.0 ignores it. Default is set
                         to 0.8 (Applicable only with the -RC flag)
  -rcctf <termination_factor>, --reorder-code-chain-termination-factor <termination_factor>
                         Set the threshold fraction that determines when to
                         terminate each chain of basic blocks during code
                         reordering. Allowed input range is between 0.0 and 1.0
                         where 0.0 generates long chains and 1.0 creates single
                         basic block chains. Default is set to 0.05. (Applicable
                         only with the -RC flag)
  -RD, --reorder-data    Perform static data reordering
  -ippcf, --instrument-for-path-profiling
                         Perform cross function path profiling instrumentation
  -ppcf, --optimize-with-path-profiling
                         Perform cross function path profiling optimization
  -rmte, --remove-multiple-toc-entries
                         Remove multiple TOC entries pointing to the same
                         location in the input program file
  -rt <removal_factor>, --reduce-toc <removal_factor>
                         Perform removal of TOC entries according to a removal
                         factor between (0,1), where 0 removes non-accessed TOC
                         entries only and 1 removes all possible TOC entries
  -rtb, --remove-traceback-tables
                         Remove traceback tables in reordered code
  -rcs, --remove-csect-symbols
                         Remove csect symbols
  -sal-opt, --store-after-load-optimization
                         Remove store after load when there is no change
  -scca <level>, --safe-calling-conventions-analysis <level>
                         Determine how conservative FDPR must be when analysing a
                         function that may break calling conventions
  -sdp <aggressiveness_factor>, --stride-data-prefetch <aggressiveness_factor>
                         Perform data prefetching within frequently executed
                         loops based on stride analysis, according to an
                         aggressiveness factor between (1,9), where 1 is the
                         least aggressive
  -sdpila <instructions_number>, --stride-data-prefetch-instruction-look-ahead <instructions_number>
                         Set the number of instructions for which data is
                         prefetched into the cache ahead of time. Default value
                         is platform dependent. (Applicable only with the -sdp
                         flag)
  -sdpms <stride_min_size>, --stride-data-prefetch-min-size <stride_min_size>
                         Set the minimal stride size in bytes, for which data
                         will be considered a candidate for prefetching. Default
                         value is set to 128 bytes. (Applicable only with the
                         -sdp flag)
  -ebp <evt_based_prefetch>, --event-based-prefetch <evt_based_prefetch>
                         Perform data prefetching based on the events file
  -ebpla <instructions_number>, --event-based-prefetch-look-ahead <instructions_number>
                         Set the number of instructions for which event based
                         prefetch is performed. Default value is platform
                         dependent. (Applicable only with the -ebp flag)
  -see <level>           Use simplified prolog/epilog for functions that perform
                         conditional early-exit. Use basic optimization with
                         <level>=0 and maximal with <level>=1
  -shci <pct>, --selective-hot-code-inline <pct>
                         Perform selective inlining of functions in order to
                         decrease the total number of execution counts, so that
                         only functions with hotness above the given percentage
                         are inlined
  -si, --selective-inline
                         Perform selective inlining of dominant hot function
                         calls
  -sidf <percentage_factor>, --selective-inline-dominant-factor <percentage_factor>
                         Set a dominant factor percentage for selective inline
                         optimization. The allowed range is between 0 and 100.
                         Default is set to 80. (Applicable only with the -si and
                         -pbsi flags)
  -siht <frequency_factor>, --selective-inline-hotness-threshold <frequency_factor>
                         Set a hotness threshold factor percentage for selective
                         inline optimization to inline all dominant function
                         calls that have a frequency count greater than the
                         given frequency percentage. Default is set to 100.
                         (Applicable only with the -si and -pbsi flags)
  -slbp, --spinlock-branch-prediction
                         Perform branch prediction bit setting for conditional
                         branches in spinlock code containing l*arx and st*cx
                         instructions. (Applicable after -bp flag)
  -sldp, --spinlock-data-prefetch
                         Perform data prefetching for memory access instructions
                         preceding spinlock code containing l*arx and st*cx
                         instructions
  -sll <Lib1:Prof1,...,LibN:ProfN>, --static-link-libraries <Lib1:Prof1,...,LibN:ProfN>
                         Statically link hot code from specified dynamically
                         linked libraries to the input program. The parameter
                         consists of a comma-separated list of libraries and
                         their profiles. IMPORTANT: Licensing rights of
                         specified libraries should be observed when applying
                         this copying optimization
  -sllht <hotness_threshold>, --static-link-libraries-hotness-threshold <hotness_threshold>
                         Set hotness threshold for the --static-link-libraries
                         optimization. The allowed input range is between 0
                         (least aggressive) and 1, or -1, which does not require
                         a profile and selects all code that might be called by
                         the input program from the given libraries. Default is
                         set at 0.5
  -so, --stack-optimization
                         Reduce the stack frame size of functions that are called
                         with a small number of arguments
  -spc, --shortcut-plt-calls
                         Shortcut PLT calls in shared libraries to local
                         functions if they exist. Note: Resolving to external
                         symbols is disabled for such calls
  -stf, --stack-flattening
                         Merge the stack frames of inlined functions with the
                         frames of the calling functions
  -tb, --preserve-traceback-tables
                         Force the restructuring of traceback tables in reordered
                         code. If -tb option is omitted, traceback tables are
                         automatically included only for C++ applications that
                         use the Try & Catch mechanism
  -tlo, --tocload-optimization
                         Replace each load instruction that references the TOC
                         with a corresponding add-immediate instruction via the
                         TOC anchor register, where possible
  -ucde, --unreachable-code-data-elimination
                         Remove unreachable code and non-accessed static data
  -vro, --volatile-registers-optimization
                         Eliminate stores and restores of non-volatile registers
                         in frequently executed functions by using available
                         volatile registers
  -vrox, --volatile-registers-extended-optimization
                         Eliminate stores and restores of non-volatile registers
                         in frequently executed functions by using available
                         volatile registers, the extended version supports FP
                         registers and transparency
 Output Options:     
  -bcdf <file>, --binary-code-dump-file <file>
                         Create a binary dump of the code (opcodes) with
                         annotations of addresses.
  -ccgi <mode>, --code-coverage-generate-info <mode>
                         Produce coverage information in a file based on profile
                         information. Use <mode>=XML for an XML output and
                         <mode>=FLAT for a formatted text file. The generated
                         file is <output file>.cci[.xml]
  -cep, --complement-edge-profile
                         Complements partial profile information given for the
                         basic blocks' frequencies by adding missing basic
                         block-to-basic block edge counts
  -d, --disassemble-text
                         Print the disassembled text section of the output
                         program into <output_file>.dis_text file
  -dap, --dump-ascii-profile
                         Dump profile information in ASCII format into
                         <program>.aprof (requires the -f flag).
  -db, --disassemble-bss
                         Print the disassembled bss section of the output program
                         into <output_file>.dis_bss file
  -dd, --disassemble-data
                         Print the disassembled data section of the output
                         program into <output_file>.dis_data file
  -diap, --dump-initial-ascii-profile
                         Dump the given profile information in ASCII format into
                         <program>.aprof.init (requires the -f flag)
  -dim, --dump-instruction-mix
                         Dump instruction mix statistics based on gathered
                         profile information
  -dm, --dump-mapper     Print a map of basic blocks and static variables with
                         their respective new -> old addresses into a
                         <program>.mapper file
  -enc, --encapsulate    Encapsulate SPE executables present in the PPE input
                         (see --spe-directory)
  -o <output_file>, --output-file <output_file>
                         Set the name of the output file. The default
                         instrumented file is <program>.instr. The default
                         optimized file is <program>.fdpr
  -scl, --show-constant-load
                         Add annotations in the fdpr disassembly on load
                         instructions used to bring constant values into
                         registers (requires the -d flag)
  -pds, --preserve-debug-symbols
                         Preserve debug symbols
  -plc, --preserve-linkage-conventions
                         Preserve linkage conventions
  -ppcf, --print-prof-counts-file
                         Print a text format of the profiling counters into a
                         <program>.counts file (requires the -f flag).
  -sf, --strip-file      Strip the output file
  -simo, --single-input-multiple-outputs
                         Optimize in parallel into multiple outputs as specified
                         by option sets read from stdin
  -spedir <directory>, --spe-directory <directory>
                         Set the directory into which SPE executables will be
                         extracted and from which they will be encapsulated
 General Options:    
  -cell, --cell-supervisor
                         Integrated PPE/SPE processing. Perform SPE extraction,
                         processing, and encapsulation automatically prior to
                         PPE processing
  -h, --help             Print the online help
  -j <jour_file>, --journal <jour_file>
                         Output optimization journal information to <jour_file>
  -smt, --smt_mode       Set SMT mode (1:ST, 2: (SMT2-shared, SMT2-split),
                         4:SMT4, 8:SMT8)
  -m <machine-model>, --machine <machine-model>
                         Generate code for the specified machine model. Target
                         machine can be one of the following models: power2,
                         power3, ppc405, ppc440, power4, ppc970, power5, power6,
                         power7, power8, ppe, spe, spe_edp, z10, z9.
                         Default is power7
  -q, --quiet            Set the output mode to quiet, suppressing informational
                         messages
  -st <stat_file>, --statistics <stat_file>
                         Output statistics information to <stat_file>. If
                         <stat_file> is '-', the output goes to the standard
                         output. See --verbose for the default
  -v <level>, --verbose <level>
                         Set verbose output mode level. When set, various
                         statistics about the output program are printed into
                         the file <program>.stat. Allowed level range is between
                         0 and 3. Default is set to 0
  -V, --version          Print the version number
  -w <level>, --warning-level <level>
                         Set the warning level so only errors of this level and
                         below will be printed. The levels are: 1: errors, 2:
                         warnings, 3: debug warning, 4: debug information.
                         Default is 2
  -armember              For archive files: list of archive members to be
                         optimized. If -armember is not specified, all members
                         are optimized
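
The -O, -O2, -O3 and -O4 entries above are each defined as the previous level plus extra flags. The following Python sketch expands a level into the flat flag list those entries describe. It is illustrative only and is not part of fdpr. The names LEVELS and expand are hypothetical, and the XCOFF-only additions (-btcar for -O3, -bldcg for -O4) are left out. The expanded lists repeat some flags with different values (for example -isf 8 and -isf 12 under -O3). The sketch keeps both occurrences, because the help text does not say how fdpr resolves repeats.

```python
# Equivalences copied from the -O, -O2, -O3 and -O4 help entries.
# Assumption: the XCOFF-only flags (-btcar, -bldcg) are deliberately omitted.
LEVELS = {
    "-O": ["-RC", "-nop", "-bp", "-bf"],
    "-O2": ["-O", "-hr", "-pto", "-isf", "8", "-tlo", "-kr", "-see", "0"],
    "-O3": ["-O2", "-RD", "-isf", "12", "-si", "-lro", "-las", "-vro",
            "-lu", "9", "-rt", "0", "-so", "-see", "1", "-oderat"],
    "-O4": ["-O3", "-sidf", "50", "-ihf", "20", "-sdp", "9", "-shci", "90"],
}


def expand(level):
    """Return the flat list of flags that a -O level stands for."""
    flags = []
    for token in LEVELS[level]:
        if token in LEVELS:
            # A nested level such as "-O2" inside "-O3": expand it in place.
            flags.extend(expand(token))
        else:
            flags.append(token)
    return flags


if __name__ == "__main__":
    for name in LEVELS:
        print(name, "=", " ".join(expand(name)))
```

Running the script prints one line per level, which makes it easy to see which individual options a given level adds over the one below it.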