IBM Flag disclosure -- 15 February 2000

XLC/XLF options:
----------------
-1            - Executes DO loops at least once, if reached.
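
The effect of -1 can be sketched in C rather than Fortran. The function names
below are hypothetical and exist only for this illustration. A standard DO
loop tests its trip count before the first iteration; a one-trip loop runs the
body once before testing the bound.

```c
#include <assert.h>

/* Standard semantics: the bound is tested first, so a loop such as
   DO I = 1, 0 runs zero times. */
int trips_standard(int first, int last)
{
    int count = 0;
    for (int i = first; i <= last; i++)
        count++;
    return count;
}

/* One-trip semantics (what -1 selects): the body runs once before the
   bound is tested, so DO I = 1, 0 still executes one iteration. */
int trips_onetrip(int first, int last)
{
    int count = 0;
    int i = first;
    do {
        count++;
        i++;
    } while (i <= last);
    return count;
}

/* Usage: the two forms agree on a non-empty range and differ on an
   empty one. */
void demo_onetrip(void)
{
    assert(trips_standard(1, 3) == 3);
    assert(trips_onetrip(1, 3) == 3);
    assert(trips_standard(1, 0) == 0);
    assert(trips_onetrip(1, 0) == 1);
}
```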

-ma           - use built-in alloca() function

-O            - optimization level 1 turned on
-O3           - optimization level 3 turned on
-O4           - equivalent to '-O3 -qipa', with automatic generation of
                architecture and tuning option ideal for that platform

-Q            - Turn inlining on
-Q=xxx        - Inline functions < xxx lines

-qalias=noaryovrlp - Program does not contain array assignments of overlapping
                or storage-associated arrays; can produce significant
                performance improvements for array language.
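
A likely reason this assertion helps, sketched in C with hypothetical function
names: an array assignment such as A(2:N) = A(1:N-1) must behave as if the
whole right-hand side were read before anything is stored. When the two sides
may overlap, a naive element-by-element copy gives a different answer, so the
compiler has to be more conservative (modeled here with a temporary).

```c
#include <assert.h>
#include <string.h>

#define N 5

/* Element-by-element forward copy with no temporary.  Matches array
   assignment only when source and destination do not overlap. */
void shift_no_temp(int *a, int n)
{
    for (int i = 0; i < n - 1; i++)
        a[i + 1] = a[i];
}

/* Array-assignment semantics: the right-hand side is read in full before
   any element is stored, modeled with an explicit temporary. */
void shift_with_temp(int *a, int n)
{
    int tmp[N];
    assert(n >= 1 && n <= N);
    memcpy(tmp, a, (size_t)(n - 1) * sizeof a[0]);
    memcpy(a + 1, tmp, (size_t)(n - 1) * sizeof a[0]);
}

/* Usage: on overlapping data the two versions disagree. */
void demo_overlap(void)
{
    int a[N] = {1, 2, 3, 4, 5};
    int b[N] = {1, 2, 3, 4, 5};
    shift_no_temp(a, N);    /* a becomes 1 1 1 1 1 */
    shift_with_temp(b, N);  /* b becomes 1 1 2 3 4 */
    assert(a[4] == 1);
    assert(b[4] == 4);
}
```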

-qansialias   - Use type-based aliasing during optimization

-qarch=ppc    - sets architecture to PowerPC
-qarch=power2 - sets architecture to Power2
-qarch=pwrx   - sets architecture to Power2
-qarch=pwr3   - sets architecture to Power3
-qarch=rs64a  - sets architecture to PowerPC RS64-I
-qarch=rs64b  - sets architecture to PowerPC RS64-II

-qassert=addr - Variables are disjoint from pointers unless their address is
                taken.

-qassert=allp - Pointers are never aliased.

-qcompact     - Reduce code size where possible, at the expense of execution
                speed.  Code size is reduced by inhibiting optimizations that
                replicate or expand code inline.

-qdpc         - increase the precision of real constants, for maximum accuracy
                when assigning real constants to DOUBLE PRECISION variables.

-qhot         - performs high order loop transformations
-qhot=arraypad=n - Performs additional loop optimization and pads array
                dimensions to prevent cache misses.
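
Why padding an array dimension can reduce cache misses, as a minimal sketch.
The cache geometry (128-byte lines, 128 sets, direct-mapped), the pad of 8
elements, and the function name are all assumptions made for illustration and
do not describe any particular processor or the compiler's actual choice. The
example uses C row-major layout, so the padded dimension is the last one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache geometry, chosen only for illustration. */
#define LINE_BYTES 128u
#define NUM_SETS   128u
#define ROWS        64u

/* Count how many distinct cache sets are touched when walking down one
   column of a row-major array of doubles with leading dimension ld. */
unsigned distinct_sets(size_t ld)
{
    unsigned char seen[NUM_SETS] = {0};
    unsigned count = 0;
    for (size_t i = 0; i < ROWS; i++) {
        size_t offset = i * ld * sizeof(double);   /* byte offset of a[i][0] */
        size_t set = (offset / LINE_BYTES) % NUM_SETS;
        if (!seen[set]) {
            seen[set] = 1;
            count++;
        }
    }
    return count;
}

/* Usage: a power-of-two dimension funnels the column into 2 sets, while a
   small pad spreads the same 64 accesses over 64 sets. */
void demo_arraypad(void)
{
    assert(distinct_sets(1024) == 2);
    assert(distinct_sets(1024 + 8) == 64);
}
```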

-qhsflt       - speeds up calculations by not rounding single-precision
-qfloat=hsflt   expressions and by replacing floating-point division with
                multiplication by the reciprocal of the divisor
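
Replacing division by reciprocal multiplication can change results in the
last bit. A minimal sketch, assuming IEEE 754 double arithmetic with no excess
precision; the function names are hypothetical.

```c
#include <assert.h>

/* Direct division: a single correctly rounded operation. */
double div_direct(double x, double d)
{
    return x / d;
}

/* Reciprocal form: 1/d is rounded, then the product is rounded again. */
double div_reciprocal(double x, double d)
{
    double r = 1.0 / d;
    return x * r;
}

/* Usage: the two forms agree when 1/d is exact (d a power of two) and can
   differ otherwise. */
void demo_reciprocal(void)
{
    assert(div_direct(3.0, 4.0) == div_reciprocal(3.0, 4.0));
    assert(div_direct(3.0, 10.0) != div_reciprocal(3.0, 10.0));
}
```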

-qinlglue     - Generate fast external linkage by inlining the code (pointer
                glue code) necessary at calls via a function pointer and calls
                to external procedures.

-qintlog      - allows for mixing integer and logical data entities in
                expressions and statements

-qipa[=options] - turns on interprocedural analysis
 ipa options:
 inline=limit=n - Perform inlining where appropriate (compiler's decision)
                  but limit inlined code to no more than n bytes of object 
                  code 
 level=2      - Turn on inlining, cloning, full alias analysis, constant
                propagation, call-site tailoring, and dead code removal
 noobject     - omit an IPA pass; used only to save compilation time
 partition=large - Specifies the size of program sections that are analyzed
                   together.  Larger partitions produce better analysis but
                   require more storage.

-qlanglvl=ansi - Specify the language level to use during compilation.
                 ANSI standard, in this case.

-qlibansi     - Assumes that all functions with the names of ANSI C
                library functions are in fact the system functions.

-qlog4        - Logical expressions that have a LOGICAL result are of type 
                LOGICAL(4).

-qmaxmem=-1   - No limit to how much memory to use during compilation

-qnosave      - sets default storage class of local variables to automatic
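
The difference between saved (static) and automatic local storage, sketched
in C with hypothetical function names. The automatic version initializes its
variable explicitly, which C requires for a defined result; the point is only
that nothing carries over between calls.

```c
#include <assert.h>

/* Static storage (analogous to a SAVEd local): the value persists from
   one call to the next. */
int next_saved(void)
{
    static int calls = 0;
    calls++;
    return calls;
}

/* Automatic storage (the default that -qnosave selects): the variable is
   created afresh on every call, so no state is retained. */
int next_automatic(void)
{
    int calls = 0;
    calls++;
    return calls;
}

/* Usage: only the static version counts across calls. */
void demo_storage(void)
{
    int first = next_saved();
    int second = next_saved();
    assert(second == first + 1);
    assert(next_automatic() == 1);
    assert(next_automatic() == 1);
}
```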

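-qpdf1/pdf2   - profile directed feedback optimization: compile with -qpdf1,
                run the program on typical input to collect a profile, then
                recompile with -qpdf2 to optimize using that profile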

-qrndsngl     - rounds the result of each single-precision operation to single-
                precision, rather than waiting until the full expression is
                evaluated
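
Rounding after every single-precision operation versus rounding once at the
end can give different answers. A minimal sketch in C, assuming IEEE 754
float and double; the function names are hypothetical.

```c
#include <assert.h>

/* Round after every operation: each intermediate is stored as a float. */
float sum_round_each(float a, float b, float c)
{
    volatile float t = a + b;   /* volatile forces a single-precision store */
    volatile float u = t + c;
    return u;
}

/* Round once at the end: intermediates are kept in double precision. */
float sum_round_last(float a, float b, float c)
{
    double t = (double)a + (double)b + (double)c;
    return (float)t;
}

/* Usage: 2^24 + 1 is not representable as a float, so adding 1 twice with
   per-operation rounding leaves the value unchanged, while the double
   intermediate reaches 2^24 + 2. */
void demo_rndsngl(void)
{
    assert(sum_round_each(16777216.0f, 1.0f, 1.0f) == 16777216.0f);
    assert(sum_round_last(16777216.0f, 1.0f, 1.0f) == 16777218.0f);
}
```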

-qstrict      - ensures that optimization level 3 does not alter the semantics
                of the program

-qtbtable=none - Don't generate traceback information

-qdatalocal   - assume all data items are local

-qtune=604    - instruction selection, scheduling, and other implementation 
                dependent performance enhancements for the PowerPC 604/604e

-qtune=pwr2   - instruction selection, scheduling, and other implementation 
                dependent performance enhancements for Power2

-qtune=pwr3   - instruction selection, scheduling, and other implementation 
                dependent performance enhancements for Power3

-qtune=rs64a  - instruction selection, scheduling, and other implementation 
                dependent performance enhancements for the PowerPC RS64-I

-qtune=rs64b  - instruction selection, scheduling, and other implementation 
                dependent performance enhancements for the PowerPC RS64-II

-qunroll[=n]  - Allow the optimizer to unroll loops, where the optional
                parameter n specifies the loop unrolling factor (default 4).
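
What unrolling by a factor of 4 looks like when written by hand, as a sketch
in C with hypothetical function names. The code the optimizer actually
generates may differ; this only shows the shape of the transformation,
including the cleanup loop for leftover iterations.

```c
#include <assert.h>
#include <stddef.h>

/* Original loop: one element per iteration. */
long sum_rolled(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by 4: four elements per iteration, then a cleanup loop for the
   remaining n % 4 elements. */
long sum_unrolled4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)
        s += a[i];
    return s;
}

/* Usage: both versions compute the same sum, including when n is not a
   multiple of 4. */
void demo_unroll(void)
{
    int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    assert(sum_rolled(a, 10) == 55);
    assert(sum_unrolled4(a, 10) == 55);
}
```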

-qxlf77=nopersistent - Disables saving the addresses of arguments to
                       subprograms with ENTRY statements in static storage.


Linker Options:
---------------
-lmass       - Link the mathematical acceleration subsystem libraries (MASS),
               which contain libraries of tuned mathematical intrinsic
               functions.  See www.austin.ibm.com/tech/MASS.

-bnso        - Brings referenced library procedures into the object file

-bI:/lib/syscalls.exp - Create statically linked object files (syscalls.exp
                        supplies the names of the routines that can be
                        imported).

-lhmu         
-lhm
-lhu          - link fast malloc libraries.  These libraries are part of the
                memdbg package that is included with IBM C compilers

/usr/ccs/lib/bmalloc.o - A high performance implementation of the Berkeley
                malloc package.


KAP Preprocessor Options:
-------------------------
-Pk -Wp       - turns on the KAP preprocessor
    -ag=a     - pads common blocks and memory local to the subroutine to avoid
                cache line collisions.
    -ag=b     - kapf can adjust the leading dimensions of arrays in COMMON away
                from a power of 2 if the arrays are not used as actual
                arguments to any user procedure calls.  
    -r=2      - sets roundoff level to 2
    -ur2=xxx  - sets a maximum weight (estimate of work) for each unrolled
                iteration. (Work is estimated by counting operands and
                operators in a loop.)
    -inl      - inline
    -ur=xxx   - maximum number of iterations of a loop to unroll
    -lm=5     - Limit amount of loop nesting.
    -fuse     - enables loop fusion, a conventional compiler optimization that
                merges two adjacent loops into a single loop.
    -f        - Leave pre-processed source file around
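
Loop fusion, as enabled by -fuse above, can be sketched by hand in C. The
function names are hypothetical, and the arrays a and b are assumed not to
overlap. Fusion is safe here because each b[i] depends only on the a[i]
computed in the same iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Before fusion: two adjacent loops over the same index range. */
void two_loops(double *a, double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = 2.0 * a[i];
    for (size_t i = 0; i < n; i++)
        b[i] = a[i] + 1.0;
}

/* After fusion: one loop, so each a[i] is reused while still at hand. */
void fused_loop(double *a, double *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        a[i] = 2.0 * a[i];
        b[i] = a[i] + 1.0;
    }
}

/* Usage: both versions produce identical arrays. */
void demo_fusion(void)
{
    double a1[3] = {1.0, 2.0, 3.0}, b1[3];
    double a2[3] = {1.0, 2.0, 3.0}, b2[3];
    two_loops(a1, b1, 3);
    fused_loop(a2, b2, 3);
    for (size_t i = 0; i < 3; i++) {
        assert(a1[i] == a2[i]);
        assert(b1[i] == b2[i]);
    }
    assert(b2[2] == 7.0);
}
```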


Vast Preprocessor Options:
--------------------------
-Pv -Wp       - turns on the Vast Pre-processor
    -me       - informs the preprocessor to enable alignment, inter-array
                padding and array redimensioning.
    -o        - Leave pre-processed source file around
    -ew       - is the same as -ea478
    -ea2478   - combines the following options:
              (-ea allows associative transformations.)
              (-e2 specifies that no data dependencies exist in loops
                    containing pointer-based variables.)
              (-e4 generates calls to optimized BLAS library routines.)
              (-e7 automatically expands called routines inline.)
              (-e8 searches input file first for expandable routines.)
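
Associative transformations of the kind -ea permits regroup floating-point
operations, which can change results because floating-point addition is not
associative. A minimal sketch in C, assuming IEEE 754 double arithmetic; the
function names are hypothetical.

```c
#include <assert.h>

/* Left-to-right grouping: (a + b) + c. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

/* Regrouped: a + (b + c). */
double sum_right(double a, double b, double c)
{
    return a + (b + c);
}

/* Usage: with a large cancelling pair, the small term survives in one
   grouping and is lost to rounding in the other. */
void demo_reassociate(void)
{
    assert(sum_left(1e16, -1e16, 1.0) == 1.0);
    assert(sum_right(1e16, -1e16, 1.0) == 0.0);
}
```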


FDPR:
-----
The fdpr (feedback directed program restructuring) program optimizes the 
executable image of a program by collecting information on the behavior of 
the program while the program is used for some typical workload, and then 
creating a new version.  It is available on AIX Version 4 systems as part 
of the Performance Toolbox for AIX.

Options:
    -R2       - Employ a program-reordering technique in which the original 
                structure of the program, including traceback entries, is 
                preserved. 
    -R3       - Employ global reordering techniques that do not preserve
                debug information.