<?xml version="1.0"?>
<!DOCTYPE flagsdescription
    SYSTEM "http://www.spec.org/dtd/cpuflags1.dtd"
>

<!-- The lines above are NOT optional.  If you're adept at reading DTDs,
     the one that this file conforms to is at the URL listed above.  
     
     But most humans writing a flags file will want to have it automatically 
     checked using a validating parser such as RXP (available at
     http://www.ltg.ed.ac.uk/~richard/rxp.html), or use one of the on-line
     parsers:
       http://www.stg.brown.edu/service/xmlvalid/
       http://www.cogsci.ed.ac.uk/~richard/xml-check.html
     
     The parser used by the CPU tools is _not_ a validating parser, so it
     may be possible to sneak things by it that would not pass the checkers
     above.  However, if the checkers above say that your file is clean, it's
     clean.

     Flag files submitted to SPEC _will_ be checked by a validating parser.
     Invalid or not-well-formed flag files will be rejected.
-->

<!-- **********************************************************************
     **********************************************************************
     Unless otherwise explicitly noted, all references to "section n.nn"
     refer to flag_description.html, available at

     http://www.spec.org/cpu2006/docs/flag_description.html
     **********************************************************************
     ********************************************************************** -->

<!--
     This file is
       Copyright (C) 2006 Standard Performance Evaluation Corporation
       All Rights Reserved
     
     This file may be freely modified and redistributed, provided that the
     copyright notice above and this notice remain unaltered.

-->

<flagsdescription>

<filename>IBM-Linux-XL</filename>

<title>Linux on Power with IBM XL Compilers SPEC CPU 2006 Flags</title>

<style>
<![CDATA[
body { background: white; }
]]>
</style>

<!-- =====================================================================
  The <header> section is also entirely optional.  If it is provided, and
  no class is specified, then it will be inserted verbatim at the top
  of the flags dump.

  If a class is specified, that text will be inserted verbatim before flags
  of that class.  
  
  As the contents should be HTML, it will save lots of time to just enclose
  the whole thing in a CDATA section.  Section 2.3.1 again.
     ===================================================================-->
<header>
<![CDATA[
<p>Compilers: IBM XL C/C++ Advanced Edition for Linux V9.0 and XL Fortran Advanced Edition for Linux V11.1</p>
<p>Compilers: IBM XL C/C++ for Linux V10.1 and XL Fortran for Linux V12.1</p>
<p>Compilers: IBM XL C/C++ for Linux V11.1 and XL Fortran for Linux V13.1</p>
<p>Operating systems: SUSE Linux Enterprise 10, SUSE Linux Enterprise 11, and Red Hat Enterprise Linux Advanced Platform 5</p>
<p>Last updated: 23-Jun-2010</p>
]]>
</header>

<!-- =====================================================================
  Information about the meaning of boot-time settings, BIOS options,
  kernel tuning, and so forth can go in the 'platform_settings' section.

  They'll be appended to the end of both the flags dump and per-result flag report.

  As the contents should be HTML, it will save lots of time to just enclose
  the whole thing in a CDATA section.  Section 2.3.1 again.
     ===================================================================-->
<platform_settings>
<![CDATA[

<ul>
<li> <kbd>ulimit -s 1048576</kbd> 
<br />Sets the maximum stack size to "<kbd>1048576 KB</kbd>".</li>
<li> To reserve 200 huge pages from the physical memory pool, issue the following command:
<pre>
echo 200 > /proc/sys/vm/nr_hugepages
</pre> 
or, to allow up to 200 huge pages to be allocated from the dynamic hugepage pool:
<pre>
echo 200 > /proc/sys/vm/nr_overcommit_hugepages
</pre>
</li>
<li> chsyscfg -m <tt>system</tt> -r prof -i name=<tt>profile</tt>,lpar_name=<tt>partition</tt>,lpar_proc_compat_mode=POWER6_enhanced <br />
      This command enables the optional PowerPC architecture instructions supported on POWER6.<br />

<pre>
Usage: chsyscfg -r lpar | prof | sys | sysprof | frame
                -m &lt;managed system&gt; | -e &lt;managed frame&gt;
                -f &lt;configuration file&gt; | -i "&lt;configuration data&gt;"
                [--help]

Changes partitions, partition profiles, system profiles, or the attributes of a
managed system or a managed frame.

    -r                        - the type of resource(s) to be changed:
                                  lpar    - partition
                                  prof    - partition profile
                                  sys     - managed system
                                  sysprof - system profile
                                  frame   - managed frame
    -m &lt;managed system&gt;       - the managed system's name
    -e &lt;managed frame&gt;        - the managed frame's name
    -f &lt;configuration file&gt;   - the name of the file containing the
                                configuration data for this command.
                                The format is:
                                  attr_name1=value,attr_name2=value,...
                                or
                                  "attr_name1=value1,value2,...",...
    -i "&lt;configuration data&gt;" - the configuration data for this command.
                                The format is:
                                  "attr_name1=value,attr_name2=value,..."
                                or
                                  ""attr_name1=value1,value2,...",..."
    --help                    - prints this help

The valid attribute names for this command are:
    -r prof     required: name, lpar_id | lpar_name
                optional: ...
                          lpar_proc_compat_mode (default | POWER6_enhanced)
</pre> </li>

<li> Each process was bound to a CPU by using the <tt>submit</tt> option with the numactl command:
<pre> 
submit = numactl --membind=\$SPECCOPYNUM --physcpubind=\$SPECCOPYNUM $command
</pre> </li>

<li> numactl : Control NUMA policy for processes or shared memory
<pre>
     --membind=nodes
       Only  allocate  memory  from  nodes.   Allocation will fail when
       there is not enough memory available on these nodes.

    --physcpubind=cpus
       Only execute process on cpus.  This accepts physical cpu numbers
       as shown in the processor fields of /proc/cpuinfo.
</pre> </li>
 

<li>Environment variables that can be set before the run:
<pre>
HUGETLB_VERBOSE=0 : Turns off all debugging messages from libhugetlbfs.
HUGETLB_MORECORE=yes : Instructs libhugetlbfs to override libc's normal morecore() function with a hugepage version and use it for malloc().
HUGETLB_MORECORE_HEAPBASE=0x50000000 : Specifies that the hugepage heap starts at address 0x50000000.
HUGETLB_ELFMAP=R : Instructs libhugetlbfs to place the text segment in hugepages.
HUGETLB_ELFMAP=W : Instructs libhugetlbfs to place the data and BSS segments in hugepages.
HUGETLB_ELFMAP=RW : Instructs libhugetlbfs to place all segments in hugepages.
HUGETLB_ELFMAP=no : Instructs libhugetlbfs not to place any segment in hugepages.
XLFRTEOPTS=intrinthrds=1 : Causes the Fortran runtime to use only a single thread.
</pre></li>
<li>IBM Post-Link Optimization (fdprpro): </li>
</ul>
<pre>

      - First, the original executable (baseexe) is copied to baseexe.orig. 

      - Then, the executable is instrumented and its initial profile generated, as follows: 
        $ fdprpro -a instr baseexe 
        The output will be generated (by default) in baseexe.instr and its profile in baseexe.nprof. 

      - Next, run baseexe.instr using the training data. This will fill the profile file with information that characterizes the training workload.

      - Finally, re-run FDPR-Pro with the profile file provided, as follows: 
        $ fdprpro -a opt -f baseexe.nprof [optimization options] baseexe 

      Instrumentation Options Descriptions:
       -ei, --embedded-instrumentation
            Perform embedded instrumentation. The profile will be collected
            into global variables.

       -fd Fdesc, --file-descriptor Fdesc
            Set the file descriptor number to be used when opening the profile
            file. The default of Fdesc is set to the maximum-allowed number of
            open files.

       -imullX, --mullX-instrumentation
            Perform value profiling of RA and RB operands in mullX
            instructions.

       -issu, --instrumentation-safe-stack-usage
            Ensure additional stack space is properly allocated for the
            instrumented run. Use this option if your application uses stack
            extensively (e.g., when the program uses alloca()). Note that this
            option adds extra overhead on instrumentation code.

       -iso offset, --instrumentation-stack-offset offset
            Set the offset from the stack, a negative number, where the
            instrumentation's area for saving registers is kept at runtime.
            Use with care.

       -M addr, --profile-map addr
            Set shared memory segment address for profiling. Alternative
            shared memory addresses are needed when the instrumented program
            application creates a conflict with the shared-memory addresses
            preserved for the profiling. Typical alternative values are
            0x40000000, 0x50000000, ... up to 0xC0000000. The default is set
            to 0x3000000.

       -[no]ri, --[no]register-instrumentation
            Instrument the input program file to collect profile information
            about indirect branches via registers. The default is set to
            collect the profile information.

       -[no]sfp, --[no]save-floating-point-registers
            Save floating point registers in instrumented code. The default is
            set to save floating point registers.

       Optimization Options Descriptions:

       -A alignment, --align-code alignment
            Align program so that hot code will be aligned on alignment-byte
            addresses.

       -abb factor, --align-basic-blocks factor
            Align basic blocks that are hotter than the average by a given
            (float) factor. This is a lower-level machine-specific alignment
            compared to --align-code. Value of -1 (the default) disables this
            option.

       -bf, --branch-folding
            Eliminate branch to branch instructions.

       -bldcg, --build-dcg
            Build a Data Connectivity Graph (DCG) for enhanced data reordering
            (applicable only with the -RD flag).

       -bp, --branch-prediction
            Set branch prediction bit for conditional branches according to
            the collected profile.

       -btcar, --branch-table-csect-anchor-removal
            Eliminate load instructions used when accessing branch tables.

       -cbtd, --convert-bss-to-data
            Convert BSS section into a data section. This is useful for more
            aggressive tocload and RD optimizations.

       -cRD, --conservativeRD
            Perform conservative static data reordering by packing together
            all frequently referenced static variables.

       -dce, --dead-code-elimination
            Eliminate instructions related to unused local variables within
            frequently executed functions. This is useful mainly after
            applying function inlining optimization.

       -dp, --data-prefetch
            Insert data-cache prefetch instructions to improve data-cache
            performance.

       -dpht threshold, --data-placement-hotness-threshold threshold
            Set data placement algorithm hotness threshold between (0,1),
            where 0 reorders the static variables in large groups based on the
            control flow, and 1 reorders the variables in very small groups
            based on their access frequency. (This is applicable only with the
            -RD flag).

       -dpnf factor, --data-placement-normalization-factor factor
            Set data placement algorithm normalization factor between (0,1),
            where 0 causes static variables to be reordered regardless of
            their size, and 1 locates only small sized variables first.
            (applicable only with the -RD flag).

       -ece, --epilog-code-eliminate
            Reduce code size by grouping common instructions in function
            epilogs into a single unified code.

       -fc, --function-cloning
            Enable function cloning phase only during function inlining
            optimizations (applicable only with function inlining flags: -i,
            -si, -ihf, -isf, -shci).

       -hr, --hco-reschedule
            Relocate instructions from frequently executed code to rarely
            executed code areas, when possible.

       -hrf factor, --hco-resched-factor factor
            Set the aggressiveness of the -hr optimization option according to
            a factor value between (0,1), where 0 is the least aggressive
            factor (applicable only with the -hr option).

       -i, --inline
            Same as --selective-inline with --inline-small-funcs 12.

       -ihf pct, --inline-hot-functions pct
            Inline all function call sites to functions that have a frequency
            count greater than the given pct frequency percentage.

       -isf size, --inline-small-funcs size
            Inline all functions that are smaller than or equal to the given
            size in bytes.

       -kr, --killed-registers
            Eliminate stores and restores of registers that are killed
            (overwritten) after frequently executed function calls.

       -lap, --load-address-propagation
            Eliminate load instructions of variable addresses by re-using
            preloaded addresses of adjacent variables.

       -las, --load-after-store
            Add NOP instructions to place each load instruction further apart
            following a store instruction that references the same memory
            address.

       -lro, --link-register-optimization
            Eliminate saves and restores of the link register in frequently
            executed functions.

       -lu aggressiveness_factor, --loop-unroll aggressiveness_factor
            Unroll short loops containing one to several basic blocks
            according to an aggressiveness factor between (1,9), where 1 is
            the least aggressive unrolling option for very hot and short loops.

       -lun unrolling_number, --loop-unrolling-number unrolling_number
            Set the number of unrolled iterations in each unrolled loop. The
            allowed range is between (2,50). Default is set to 2. (Applicable
            only with the -lu flag).

       -nop, --nop-removal
            Remove NOP instructions from reordered code.

       -O   Switch on basic optimizations only. Same as -RC -nop -bp -bf.

       -O2  Switch on less aggressive optimization flags. Same as -O -hr -pto
            -isf 8 -tlo -kr.

       -O3  Switch on aggressive optimization flags. Same as -O2 -RD -isf 12
            -si -dp -lro -las -vro -btcar -lu 9 -rt 0 -so.

       -O4  Switch on aggressive optimization flags together with aggressive
            function inlining. Same as -O3 -sidf 50 -ihf 20 -sdp 9 -shci 90
            and -bldcg (for XCOFF files).

       -O5  Switch on aggressive optimization flags together with HLR
            optimization. Same as -O4 -sa -gcpyp -gcnstp -dce -vrox.

       -omullX, --mullX-optimization
            Optimize mullX instructions by adding a run-time check on RA and
            RB and performing equivalent operations with lower penalty. The
            optimization requires the use of -imullX in the instrumentation
            phase.

       -pbsi, --path-based-selective-inline
            Perform selective inlining of dominant hot function calls based on
            the control flow paths leading to hot functions.

       -pc, --preserve-csects
            Preserve CSects' boundaries in reordered code.

       -pca, --propagate-constant-area
            Relocate the constant variables area to the top of the code
            section when possible.

       -pfb, --preserve-first-bb
            Preserve original location of the entry point basic block in
            the program.

       -pp, --preserve-functions
            Preserve functions' boundaries in reordered code.

       -[no]pr, --[no]ptrgl-r11
            Perform removal of R11 load instruction in _ptrgl csect.

       -pto, --ptrgl-optimization
            Perform optimization of indirect call instructions via registers
            by replacing them with conditional direct jumps.

       -ptoht heatness_threshold, --ptrgl-optimization-heatness-threshold
       heatness_threshold
            Set the frequency threshold for indirect calls that are to be
            optimized by -pto optimization. Allowed range between 0 and 1.
            Default is set to 0.8. (Applicable only with -pto flag).

       -ptosl limit_size, --ptrgl-optimization-size-limit limit_size
            Set the limit of the number of conditional statements generated by
            -pto optimization. Allowed values are between 1 and 100. Default
            value is set to 3. (Applicable only with the -pto flag).

       -RC, --reorder-code
            Perform code reordering.

       -rcaf aggressiveness_factor, --reorder-code-aggressivenes-factor
       aggressiveness_factor
            Set the aggressiveness of code reordering optimization. Allowed
            values are [0 1 2], where 0 preserves the original code order
            and 2 is the most aggressive. Default is set to 1. (Applicable
            only with the -RC flag).

       -rccrf reversal_factor, --reorder-code-condition-reversal-factor
       reversal_factor
            Set the threshold fraction that determines when to enable
            condition reversal for each conditional branch during code
            reordering. Allowed input range is between 0.0 and 1.0, where 0.0
            tries to preserve original condition direction and 1.0 ignores it.
            Default is set to 0.8. (Applicable only with the -RC flag).

       -rcctf termination_factor, --reorder-code-chain-termination-factor
       termination_factor
            Set the threshold fraction that determines when to terminate each
            chain of basic blocks during code reordering. Allowed input range
            is between 0.0 and 1.0, where 0.0 generates long chains and 1.0
            creates single basic block chains. Default is set to 0.05.
            (Applicable only with the -RC flag).

       -RD, --reorder-data
            Perform static data reordering.

       -rmte, --remove-multiple-toc-entries
            Remove multiple TOC entries pointing to the same location in the
            input program file.

       -rt removal_factor, --reduce-toc removal_factor
            Perform removal of TOC entries according to a removal factor
            between (0,1), where 0 removes non-accessed TOC entries only and 1
            removes all possible TOC entries.

       -rtb, --remove-traceback-tables
            Remove traceback tables in reordered code.

       -sdp aggressiveness_factor, --stride-data-prefetch
       aggressiveness_factor
            Perform data prefetching within frequently executed loops based on
            stride analysis, according to an aggressiveness factor between
            (1,9), where 1 is the least aggressive.

       -sdpla iterations_number, --stride-data-prefetch-look-ahead
       iterations_number
            Set the number of iterations for which data is prefetched into the
            cache ahead of time. Default value is set to 4 iterations.
            (Applicable only with the -sdp flag).

       -sdpms stride_min_size, --stride-data-prefetch-min-size stride_min_size
            Set the minimal stride size in bytes, for which data will be
            considered a candidate for prefetching. Default value is set to
            128 bytes. (Applicable only with the -sdp flag).

       -see level
            Use simplified prolog/epilog for functions that perform
            conditional early-exit. Use basic optimization with level=0 and
            maximal with level=1.

       -shci pct, --selective-hot-code-inline pct
            Perform selective inlining of functions in order to decrease the
            total number of execution counts, so that only functions with
            hotness above the given percentage are inlined.

       -si, --selective-inline
            Perform selective inlining of dominant hot function calls.

       -sidf percentage_factor, --selective-inline-dominant-factor
       percentage_factor
            Set a dominant factor percentage for selective inline
            optimization. The allowed range is between 0 and 100. Default is
            set to 80. (Applicable only with the -si and -pbsi flags).

       -siht frequency_factor, --selective-inline-hotness-threshold
       frequency_factor
            Set a hotness threshold factor percentage for selective inline
            optimization to inline all dominant function calls that have a
            frequency count greater than the given frequency percentage.
            Default is set to 100. (Applicable only with the -si and -pbsi
            flags).

       -slbp, --spinlock-branch-prediction
            Perform branch prediction bit setting for conditional branches in
            spinlock code containing l*arx and st*cx instructions. (Applicable
            after -bp flag).

       -sldp, --spinlock-data-prefetch
            Perform data prefetching for memory access instructions preceding
            spinlock code containing l*arx and st*cx instructions.

       -sll Lib1:Prof1,...,LibN:ProfN, --static-link-libraries
       Lib1:Prof1,...,LibN:ProfN
            Statically link hot code from specified dynamically linked
            libraries to the input program. The parameter consists of a
            comma-separated list of libraries and their profiles. IMPORTANT:
            Licensing rights of specified libraries should be observed when
            applying this copying optimization.

       -sllht hotness_threshold, --static-link-libraries-hotness-threshold
       hotness_threshold
            Set hotness threshold for the --static-link-libraries
            optimization. The allowed input range is between 0 (least
            aggressive) and 1, or -1, which does not require a profile and
            selects all code that might be called by the input program from
            the given libraries. Default is set at 0.5.

       -so, --stack-optimization
            Reduce the stack frame size of functions that are called with a
            small number of arguments.

       -spc, --shortcut-plt-calls
            Shortcut PLT calls in shared libraries to local functions if they
            exist. Note: Resolving to external symbols is disabled for such
            calls.

       -stf, --stack-flattening
            Merge the stack frames of inlined functions with the frames of the
            calling functions.

       -tb, --preserve-traceback-tables
            Force the restructuring of traceback tables in reordered code. If
            -tb option is omitted, traceback tables are automatically included
            only for C++ applications that use the Try &amp; Catch mechanism.

       -tlo, --tocload-optimization
            Replace each load instruction that references the TOC with a
            corresponding add-immediate instruction via the TOC anchor
            register, where possible.

       -ucde, --unreachable-code-data-elimination
            Remove unreachable code and non-accessed static data.

       -vro, --volatile-registers-optimization
            Eliminate stores and restores of non-volatile registers in
            frequently executed functions by using available volatile
            registers.

       -vrox, --volatile-registers-extended-optimization
            Eliminate stores and restores of non-volatile registers in
            frequently executed functions by using available volatile
            registers. The extended version supports FP registers and
            transparency.

       General Options:

       -h, --help
            Print online help.

       -m machine-model, --machine machine-model
            Generate code for the specified machine model. Target machine can be one of the following models: power2, power3, ppc405, ppc440,
            power4, ppc970, power5, power6, ppe, spe, spe_edp, z10, z9. Default is set to no machine.

       -q, --quiet
            Set quiet output mode, suppressing informational messages.

       -st stat_file, --statistics stat_file
            Output statistics information to stat_file. If stat_file is '-', the output goes to standard output. See --verbose for the default.

       -v level, --verbose level
            Set verbose output mode level. When set, various statistics about the target optimized program are printed into the file
            program.stat. Allowed level range is between 0 and 3. Default is set to 0.

       -V, --version
            Print version.

</pre>
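<p>The instrument, train, and re-optimize steps above can be sketched as a small shell helper. This is only an illustration and not the script used for any result: the names <tt>run</tt>, <tt>fdpr_flow</tt>, and <tt>DRY_RUN</tt> are hypothetical, the executable is assumed to be in the current directory, and the instrumented binary is assumed to exercise the training workload when run without arguments. The optimization options are whatever the caller passes.</p>

```shell
#!/bin/sh
# Illustrative sketch of the FDPR-Pro flow described above.
# run, fdpr_flow and DRY_RUN are hypothetical names, not part of fdprpro.

# Print a command, then execute it unless DRY_RUN is set to 1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# fdpr_flow EXE [optimization options]
# Assumes EXE is in the current directory and that running EXE.instr
# with no arguments exercises the training workload.
fdpr_flow() {
    exe=$1
    shift
    run cp "$exe" "$exe.orig" || return 1           # keep the original binary
    run fdprpro -a instr "$exe" || return 1         # writes EXE.instr and EXE.nprof
    run "./$exe.instr" || return 1                  # training run fills the profile
    run fdprpro -a opt -f "$exe.nprof" "$@" "$exe"  # optimize using the profile
}
```

<p>With <tt>DRY_RUN</tt> set to 1, <kbd>fdpr_flow baseexe -O3</kbd> only prints the four commands, which makes it easy to review the sequence before running it for real.</p>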
]]>
</platform_settings>

<flag
      name="invocation_path_stripper"
      class="other"
      regexp="(?:/\S+/)?(xlc|xlC|xlf95|fdpr)\b"
>
<include flag="INVOCATION_PATH" flagtext=" $1 " />
<include text="$2" />
<display enable="0" />
</flag>

<flag
      name="lop_xlc"
      class="compiler"
      regexp="xlc\b">

<example>exampleOFxlc</example>
<![CDATA[
<p>
Invoke the IBM XL C compiler. 32-bit binaries are produced by default.
</p>
]]>
</flag>

<flag
      name="lop_xlcpp"
      class="compiler"
      regexp="xlC\b">

<example>exampleOFxlC</example>
<![CDATA[
<p>
Invoke the IBM XL C++ compiler. 32-bit binaries are produced by default.
</p>
]]>
</flag>

<flag
      name="lop_xlf95"
      class="compiler"
      regexp="xlf95\b">

<example>exampleOFxlf95</example>
<![CDATA[
<p>
Invoke the IBM XL Fortran compiler. 32-bit binaries are produced by default.
</p>
]]>
</flag>

<flag
      name="lop_xlf95_r"
      class="compiler"
      regexp="xlf95_r\b">

<example>exampleOFxlf95_r</example>
<![CDATA[
<p>
Invoke the IBM XL Fortran compiler through its thread-safe (<tt>_r</tt>) invocation.
</p>
]]>
</flag>

<flag
      name="lop_fdpr"
      class="compiler"
      regexp="fdpr\b">

<example>fdpr -O3</example>
<![CDATA[
<p>
Invoke IBM FDPR (Feedback Directed Program Restructuring), a post-link tool that applies feedback-directed optimizations (FDO) to a binary module.
</p>
]]>
</flag>


<flag name="F-O5"
      class="optimization"
>
<example>
-O5
</example>

<![CDATA[
<p>
Performs optimizations for maximum performance. This includes maximum
interprocedural analysis on all of the objects presented at the "link"
step. This level of optimization increases the compiler's memory
usage and compile time. -O5 provides all of the functionality
of the -O4 option, plus the functionality of the
-qipa=level=2 option.
</p>

<p>-O5 is equivalent to the following flags:</p>
<ul>
  <li> <tt>-O4</tt> </li>
  <li> <tt>-qipa=level=2</tt> </li>
  <li> <tt>-qarch=auto</tt> </li>
  <li> <tt>-qtune=auto</tt> </li>
</ul>
]]>
<include flag="F-O4" />
<include flag="F-qipa:level" flagtext="-qipa=level=2" />
<include flag="F-qarch" flagtext="-qarch=auto" />
<include flag="F-qtune" flagtext="-qtune=auto" />
</flag>

<flag name="F-O4"
      class="optimization"
>
<example>
-O4
</example>

<![CDATA[
<p>
Perform optimizations for maximum performance. This includes
interprocedural analysis on all of the objects presented on the "link" 
step.
</p>

<p>-O4 is equivalent to the following flags:</p>
<ul>
  <li> <tt>-O3</tt> </li>
  <li> <tt>-qipa=level=1</tt> </li>
  <li> <tt>-qarch=auto</tt> </li>
  <li> <tt>-qtune=auto</tt> </li>
</ul>
]]>
<include flag="F-O3" />
<include flag="F-qipa:level" flagtext="-qipa=level=1" />
<include flag="F-qarch" flagtext="-qarch=auto" />
<include flag="F-qtune" flagtext="-qtune=auto" />
</flag>

<flag name="F-O3"
      class="optimization"
>
<example>-O3</example>
<![CDATA[
<p>
Performs additional optimizations that are memory intensive, compile-time
intensive, and may change the semantics of the program slightly, unless
-qstrict is specified. We recommend these optimizations when the desire for
run-time speed improvements outweighs the concern for limiting compile-time
resources. The optimizations provided include:
</p>
<ul>
  <li> In-depth memory access analysis </li>
  <li> Better loop scheduling </li>
  <li> High-order loop analysis and transformations (-qhot=level=0) </li>
  <li> Inlining of small procedures within a compilation unit by default </li>
  <li> Eliminating implicit compile-time memory usage limits </li>
  <li> Widening, which merges adjacent load/stores and other operations </li>
  <li> Pointer aliasing improvements to enhance other optimizations </li>
</ul>

<p>-O3 is equivalent to the following flags:</p>
<ul>
  <li> <tt>-O2</tt> </li>
  <li> <tt>-qhot=level=0</tt> </li>
</ul>
]]>
<include flag="F-O2" />
<include flag="F-qhot" flagtext="-qhot=level=0" />
</flag>

<flag name="F-O2"
      class="optimization"
      regexp="-O2\b">
<example>-O2</example>
<![CDATA[
<p>
Performs a set of optimizations intended to improve performance without
an unreasonable increase in the time or storage required for compilation,
including:
</p>
<ul>
  <li> Elimination of redundant code </li>
  <li> Basic loop optimization </li>
  <li> Structuring code to take advantage of -qarch and -qtune settings </li>
</ul>
]]>
</flag>

<flag name="F-O"
      class="optimization"
>
<example>-O</example>
<![CDATA[
<p>
Enables the level of optimization that represents the best tradeoff between compilation speed and run-time performance. If you need a specific level of optimization, specify the appropriate numeric value. Currently, -O is equivalent to -O2.
</p>
]]>
<include flag="F-O2" />
</flag>

<flag name="F-qhot"
      class="optimization"
>
<example>-qhot</example>
<![CDATA[
<pre>
Performs high-order transformations on loops during optimization.
    o arraypad
      The compiler will pad any arrays where it infers that there may be a benefit. 
    o level=0
      The compiler performs a limited set of high-order loop transformations. 
    o level=1
      The compiler performs its full set of high-order loop transformations. 
    o simd
      Replaces certain instruction sequences with vector instructions. 
    o vector
      Replaces certain instruction sequences with calls to the MASS library. 

Specifying -qhot without suboptions implies -qhot=nosimd, -qhot=noarraypad, -qhot=vector and -qhot=level=1. The -qhot option is also implied by -O4 and -O5. 
</pre>
]]>
</flag>

<flag name="F-qarch"
      class="optimization"
      regexp="-qarch=(\S+)\b"
>
<example>-qarch=pwr5x, -qarch=auto</example>
<![CDATA[
<p>
Produces object code containing instructions that will run on the
specified processors. "auto" selects the processor on which the
compilation is done. "pwr5x" is the POWER5+ processor.
</p>

<p>Supported values for this flag are</p>
<ul>
  <li>auto	- Use the processor on which the program is compiled.</li>
  <li>pwr7	- The POWER7 processor based systems.</li>
  <li>pwr6e	- The POWER6 processor based systems running in "Enhanced" mode.</li>
  <li>pwr6	- The POWER6 processor based systems.</li>
  <li>pwr5x	- The POWER5+ processor based systems.</li>
  <li>pwr5	- The POWER5 processor based systems.</li>
  <li>pwr4	- The POWER4 processor based systems.</li>
  <li>ppc970	- The PPC970 processor based systems.</li>
</ul>
]]>
</flag>

<flag name="F-qtune"
      class="optimization"
      regexp="-qtune=(\S+)\b"
>
<example>-qtune=pwr4, -qtune=auto</example>
<![CDATA[
<p>
Specifies the architecture system for which the executable program
is optimized.  This includes instruction scheduling and cache setting.
</p>

<p>Supported values for this flag are</p>

<ul>
  <li>auto	- Use the processor on which the program is compiled.</li>
  <li>pwr7	- POWER7 processor-based systems.</li>
  <li>pwr6	- POWER6 processor-based systems.</li>
  <li>pwr5x	- POWER5+ processor-based systems.</li>
  <li>pwr5	- POWER5 processor-based systems.</li>
  <li>pwr4	- POWER4 processor-based systems.</li>
  <li>ppc970	- PPC970 processor-based systems.</li>
</ul>
]]>
</flag>

<flag name="F-qipa:level"
      class="optimization"
      regexp="-qipa=level=[012]\b">
<example>
-qipa=level=2
</example>
<![CDATA[
<p>
Enhances optimization by doing detailed analysis across procedures
(interprocedural analysis or IPA). 
The <tt>level</tt> determines the amount of interprocedural analysis
and optimization that is performed.
</p>

<p>
  <tt>level=0</tt> does only minimal interprocedural analysis and optimization.
</p>

<p>
  <tt>level=1</tt> turns on inlining, limited alias analysis, and limited
  call-site tailoring.
</p>

<p>
  <tt>level=2</tt> turns on full interprocedural data flow and alias analysis.
</p>
]]>
</flag>
 
<flag name="F-qalias"
      class="optimization"
      regexp="-qalias=(noansi|nostd)\b">
<example>
-qalias=noansi
</example>
<![CDATA[
<pre>
 qalias=ansi | noansi
   If ansi is specified, type-based aliasing is
   used during optimization, which restricts the
   lvalues that can be safely used to access a
   data object. The default is ansi for the xlc,
   xlC, and c89 commands. This option has no
   effect unless you also specify the -O option.

 qalias=std | nostd
   Indicates whether the compilation units contain
   any non-standard aliasing (see Compiler Reference
   for more information). If so, specify nostd. 
</pre>
]]>
</flag>

<flag name="F-lhugetlbfs"
      class="optimization"
>
Link with libhugetlbfs.so. This enables the heap to be backed by 16-megabyte pages.
</flag>

<flag name="F-qfixed"
      class="portability"
>
Indicates that the input Fortran source program is in fixed form.
</flag>

<flag name="F-qextname"
      class="portability"
>
Adds an underscore to global entities to match the C compiler ABI.
</flag>

<flag name="F-qchars:signed"
      class="portability"
>
Causes the compiler to treat "char" variables as signed instead of the
default of unsigned.
</flag>

<flag name="F-qalloca"
      class="optimization"
>
Indicates that the compiler should provide built-in support for alloca().
</flag>

<flag name="F-q64"
      class="optimization"
>
Generates 64-bit ABI binaries. The default is to generate 32-bit binaries.
</flag>

<flag name="F-qalign"
      class="optimization"
>
<![CDATA[
<pre>
Specifies what aggregate alignment rules the compiler uses for file compilation,
where the alignment options are:

bit_packed
   The compiler uses the bit_packed alignment rules.

full
   The compiler uses the RISC System/6000 alignment rules. This is the same 
   as power.

mac68k
  The compiler uses the Macintosh alignment rules.  This suboption is valid only
  for 32-bit compilations.
  
natural 
  The compiler maps structure members to their natural boundaries.  

packed 
  The compiler uses the packed alignment rules.  

power 
  The compiler uses the RISC System/6000 alignment rules.  
  
twobyte 
  The compiler uses the Macintosh alignment rules.  This suboption is valid
  only for 32-bit compilations.  The mac68k option is the same as twobyte.

The default is -qalign=full.

</pre>
]]>
</flag>

<flag name="F-lxlf90_r"
      class="optimization"
>
Link the Fortran runtime library libxlf90_r.so, which is required by libessl.so.
</flag>

<flag name="F-lmass"
      class="optimization"
>
Link the Mathematical Acceleration Subsystem (MASS) libraries, which contain tuned mathematical intrinsic functions.
</flag>

<flag name="F-lessl"
      class="optimization"
>
<![CDATA[
<p>
Link the Engineering and Scientific Subroutine Library (ESSL), libessl.so.
ESSL is a collection of subroutines providing a wide range of performance-tuned mathematical functions for many common scientific and engineering applications. The mathematical subroutines are divided into nine computational areas:
</p>
<ul>

    <li> Linear Algebra Subprograms</li>
    <li> Matrix Operations</li>
    <li> Linear Algebraic Equations</li>
    <li> Eigensystem Analysis</li>
    <li> Fourier Transforms, Convolutions, Correlations and Related Computations</li>
    <li> Sorting and Searching</li>
    <li> Interpolation</li>
    <li> Numerical Quadrature</li>
    <li> Random Number Generation</li>
</ul>
]]>
</flag>

<flag name="F-qessl"
      class="optimization"
>
Specifies that, if either -lessl or -lesslsmp is also specified, then Engineering and Scientific Subroutine Library (ESSL) routines should be used in place of some Fortran 90 intrinsic procedures when there is a safe opportunity to do so.
</flag>

<flag name="F-qpdf1"
      class="optimization"
>
The option used in the first pass of a profile-directed feedback (PDF) compile that causes PDF information
to be generated. The profile-directed feedback optimization gathers data on both execution paths and
data values. It does not use hardware counters, nor does it gather any data other than path and data values
for PDF-specific optimizations.
</flag>

<flag name="F-qpdf2"
      class="optimization"
>
The option used in the second pass of a profile-directed feedback (PDF) compile that causes PDF information
to be utilized during optimization.
</flag>

<flag name="F-qlanglvl:extc99"
      class="compiler"
>
Supports the ISO C99 standard and accepts implementation-specific language extensions.
</flag>

<flag name="F-lsmartheap"
      class="optimization"
>
Link with MicroQuill's SmartHeap (32-bit) library for Linux on POWER. This is a library that 
optimizes calls to new, delete, malloc and free.
</flag>

<flag name="F-lsmartheap64"
      class="optimization"
>
Link with MicroQuill's SmartHeap (64-bit) library for Linux on POWER. This is a library that
optimizes calls to new, delete, malloc and free.
</flag>

<flag name="F-tl"
      class="optimization"
>
<![CDATA[
Applies the prefix specified by the -B option to the designated components.
<table>
<tr>
<th align="left">Parameter</th>
<th align="left">Description</th>
<th align="left">Executable name</th>
</tr>
<tr>
<td>a</td>
<td>Assembler</td>
<td>as</td>
</tr>
<tr>
<td>b</td>
<td>Low-level optimizer</td>
<td>xlfcode</td>
</tr>
<tr>
<td>c</td>
<td>Compiler front end</td>
<td>xlfentry</td>
</tr>
<tr>
<td>d</td>
<td>Disassembler</td>
<td>dis</td>
</tr>
<tr>
<td>F</td>
<td>C preprocessor</td>
<td>cpp</td>
</tr>
<tr>
<td>h</td>
<td>Array language optimizer</td>
<td>xlfhot</td>
</tr>
<tr>
<td>I</td>
<td>High-level optimizer, compile step</td>
<td>ipa</td>
</tr>
<tr>
<td>l</td>
<td>Linker</td>
<td>ld</td>
</tr>
<tr>
<td>z</td>
<td>Binder</td>
<td>bolt</td>
</tr>
</table>
]]>
</flag>

<flag name="F-qxlf90"
      class="optimization"
      regexp="-qxlf90=nosignedzero\b">
<example>
-qxlf90=nosignedzero
</example>
<![CDATA[
<pre>
         -qxlf90=&lt;suboption&gt;
                Determines whether the compiler provides the
                Fortran 90 or the Fortran 95 level of support for
                certain aspects of the language. &lt;suboption&gt; can be
                one of the following:

                signedzero | nosignedzero
                     Determines how the SIGN(A,B) function handles
                     signed real 0.0. In addition, determines
                     whether negative internal values will be
                     prefixed with a minus when formatted output
                     would produce a negative sign zero.
                autodealloc | noautodealloc
                     Determines whether the compiler deallocates
                     allocatable arrays that are declared locally
                     without either the SAVE or the STATIC
                     attribute and have a status of currently
                     allocated when the subprogram terminates.
                oldpad | nooldpad
                     When the PAD=specifier is present in the
                     INQUIRE statement, specifying -qxlf90=nooldpad
                     returns UNDEFINED when there is no connection,
                     or when the connection is for unformatted I/O.
                     This behavior conforms with the Fortran 95
                     standard and above. Specifying -qxlf90=oldpad
                     preserves the Fortran 90 behavior.

                Default:
                     o signedzero, autodealloc and nooldpad for the
                     xlf95, xlf95_r, xlf95_r7 and f95 invocation
                     commands.
                     o nosignedzero, noautodealloc and oldpad for
                     all other invocation commands.
</pre>
]]>
</flag>

<flag name="F-qstrict"
      class="optimization" 
      regexp="-qstrict|-qnostrict\b"
>
<![CDATA[
<pre>
 qstrict
    Turns off aggressive optimizations which have the potential to alter the
    semantics of your program.  -qstrict sets -qfloat=nofltint:norsqrt.
 qnostrict
    Sets -qfloat=rsqrt.

 These options are only valid with -O2 or higher optimization levels.
 Default: 
    o -qnostrict at -O3 or higher.
    o -qstrict otherwise.
</pre>
]]>
</flag>

<flag name="F-qstaticlink"
      class="optimization"
>
Controls how shared and non-shared runtime libraries are linked into an application.

When -qstaticlink is in effect, the compiler links only static libraries with the object file named in the invocation. When -qnostaticlink is in effect, the compiler links shared libraries with the object file named in the invocation.

This option provides the ability to specify linking rules that are equivalent to those implied by the GNU options -static, -static-libgcc, and -shared-libgcc, used singly and in combination.

</flag>

<flag name="F-qnoenablevmx"
      class="optimization"
>
Disables generation of vector instructions for processors that support them.
</flag>


<flag name="link_whole_archive"
      class="optimization"
      regexp="-Wl,--whole-archive\s/\S*"
>
<example>
"-Wl,--whole-archive /usr/lib/libhugetlbfs.a"
</example>
Instructs the linker to include every object file in the specified library,
rather than searching the library for the required object files.
</flag>

<flag name="link_dl_static"
      class="optimization"
      regexp="/usr/lib/libdl.a"
>
<example>
"/usr/lib/libdl.a"
</example>
Instructs the linker to include libdl.a to enable the dynamic linking loader.
</flag>

<flag name="link_no_whole_archive"
      class="optimization"
      regexp="-Wl,--no-whole-archive"
>
Turns off the effect of the --whole-archive flag.
</flag>

<flag name="link_mul_defs" 
      class="optimization"
      regexp="-Wl,-z,muldefs"
>
Instructs the linker to allow multiple definitions of a symbol; the first definition
is used. Normally, when a symbol is defined multiple times, the linker reports
a fatal error.
</flag>

<flag name="hugetlbfs_BDT"
      class="optimization"
      regexp="-Wl,--hugetlbfs-link=BDT"
>
Pass the --hugetlbfs-link=BDT flag to the linker so that
the text, initialized data, and BSS segments of the application are backed by hugepages.
</flag>

<flag name="hugetlbfs_align"
      class="optimization"
      regexp="-Wl,--hugetlbfs-align"
>
Pass the --hugetlbfs-align flag to the linker so that the HUGETLB_ELFMAP environment variable
can control which program segments are placed in hugepages.
</flag>

<flag name="F-B"
      class="optimization"
      regexp="-B/\S*"
>
<example>
-B/usr/share/libhugetlbfs/
</example>
Determines substitute path names for XL Fortran executables such as the compiler, assembler, linker, and preprocessor.  It can be used in combination with the -t option, which determines which of these components are affected by -B.
</flag>

<flag name="link_emit_relocation"
      class="optimization"
      regexp="-Wl,-q\b"
>
Pass the -q flag to the linker, causing the final executable to retain relocation information.
</flag>

<flag name="F-DSPEC_CPU_LINUX_PPC"
      class="portability"
>
This macro indicates that the benchmark is being compiled on a PowerPC-based Linux system.
</flag>

<flag name="F-qrtti"
      class="optimization"
>
Causes the C++ compiler to generate Run Time Type Identification code for exception handling and for use by the typeid and dynamic_cast operators.
</flag>

<flag name="F-qsmallstack:dynlenonheap"
      class="optimization"
>
Causes the Fortran compiler to allocate dynamic arrays on the heap instead of the stack.
</flag>

<flag name="F-qipa:noobject"
      class="other"
      regexp="-qipa=noobject\b">
<example>
-qipa=noobject
</example>

<![CDATA[
<p>
 Specifies whether to include standard object code in the object files.
 The <tt>noobject</tt> suboption can substantially reduce overall
 compilation time, by not generating object code during the first IPA phase.
 This option does not affect the code in the final binary created.
</p>
]]>
</flag>


<flag name="F-qipa:threads"
      class="other"
      regexp="-qipa=threads\b">
<example>
-qipa=threads
</example>
<![CDATA[
<p>
 The <tt>threads</tt> suboption allows the IPA optimizer to run portions
 of the optimization process in parallel threads, which can speed up the
 compilation process on multi-processor systems. All the available
 threads, or the number specified by N, may be used. N must be a positive
 integer. Specifying <tt>nothreads</tt> does not run any parallel threads;
 this is equivalent to running one serial thread.
 This option does not affect the code in the final binary created.
</p>
]]>
</flag>



<flag
      name="INVOCATION_PATH"
      class="other"
      regexp="/\S+/bin/"
>
The path used to invoke the compilers.
<display enable="0" />
</flag>

<flag name="F-qsimd"
      class="optimization"
      regexp="-q(no)?simd\b">
<example>
-qsimd
-qnosimd
</example>
Enables (-qsimd) or disables (-qnosimd) the generation of vector instructions
for processors that support them.
</flag>

<flag name="F-qassert"
      class="optimization"
      regexp="-qassert=refalign\b">
<example>
-qassert=refalign
</example>
<![CDATA[
<pre>
 qassert=refalign | norefalign
   Specifies that all pointers inside the compilation
   unit only point to data that is naturally aligned
   according to the length of the pointer types.
</pre>
]]>
</flag>

<flag name="F-qipa:inline"
      class="optimization"
      regexp="-qipa=inline=(\S+)\b">
<example>
-qipa=inline=limit=1000
-qipa=inline=threshold=100
</example>
<![CDATA[
<p>
 The <tt>inline</tt> suboption specifies the threshold and
 limit for inlining functions.
</p>
]]>
</flag>

<flag name="F-lstd8d"
      class="optimization"
>
Link with the Apache C++ Standard Library ("stdcxx"). "libstd8d.so" is a 32-bit shared library with optimization enabled.
</flag>


<flag name="Lstd"
      class="optimization"
      regexp="-L\s*[^ ]*stdcxx[^ ]*">
Adds the directory for the Apache C++ Standard Library to the search path at link time.
</flag>

<flag name="Rstd"
      class="optimization"
      regexp="-R\s*[^ ]*stdcxx[^ ]*">
<![CDATA[
<p>Specifies the library search directory for the Apache C++ Standard Library for use by the runtime linker.  The information is recorded in the object file and passed to the runtime linker.</p>
]]>
</flag>

<flag name="F-qcpp_stdinc"
      class="optimization"
      regexp="-qcpp_stdinc\s*[^ ]*stdcxx[^ ]*">
Changes the default search path for the XL C++ header files to use the header files from the Apache C++ Standard Library.
</flag>

<flag name="F-qipa:partition"
      class="optimization"
      regexp="-qipa=partition=large\b">
<example>
-qipa=partition=large
</example>
<![CDATA[
<p>
 The <tt>partition</tt> suboption specifies the size of the program
 sections that are analysed together.  Larger partitions may produce
 better analysis but require more storage. The default is medium.
</p>
]]>
</flag>

</flagsdescription>

