IBM Flag disclosure -- 15 February 2000

XLC/XLF options:
----------------
-1                    - Executes DO loops at least once, if reached.
-ma                   - use built-in alloca() function
-O                    - optimization level 1 turned on
-O3                   - optimization level 3 turned on
-O4                   - equivalent to '-O3 -qipa', with automatic generation
                        of the architecture and tuning options ideal for
                        that platform
-Q                    - Turn inlining on
-Q=xxx                - Inline functions of fewer than xxx lines
-qalias=noaryovrlp    - Program does not contain array assignments of
                        overlapping or storage-associated arrays; can
                        produce significant performance improvements for
                        array language.
-qansialias           - Use type-based aliasing during optimization
-qarch=ppc            - sets architecture to PowerPC
-qarch=power2         - sets architecture to Power2
-qarch=pwrx           - sets architecture to Power2
-qarch=pwr3           - sets architecture to Power3
-qarch=rs64a          - sets architecture to PowerPC RS64-I
-qarch=rs64b          - sets architecture to PowerPC RS64-II
-qassert=addr         - Variables are disjoint from pointers unless their
                        address is taken.
-qassert=allp         - Pointers are never aliased.
-qcompact             - Reduce code size where possible, at the expense of
                        execution speed. Code size is reduced by inhibiting
                        optimizations that replicate or expand code inline.
-qdpc                 - increase the precision of real constants, for
                        maximum accuracy when assigning real constants to
                        DOUBLE PRECISION variables.
-qhot                 - performs high order loop transformations
-qhot=arraypad=n      - Performs additional loop optimization and pads
                        array dimensions to prevent cache misses.
-qhsflt               - prevents rounding of single-precision expressions
-qfloat=hsflt           and replaces floating-point division by
                        multiplication by the reciprocal of the divisor
-qinlglue             - Generate fast external linkage by inlining the code
                        (pointer glue code) necessary at calls via a
                        function pointer and calls to external procedures.
-qintlog              - allows for mixing integer and logical data entities
                        in expressions and statements
-qipa[=options]       - turns on interprocedural analysis
  ipa options:
    inline=limit=n    - Perform inlining where appropriate (compiler's
                        decision) but limit inlined code to no more than
                        n bytes of object code
    level=2           - Turn on inlining, cloning, full alias analysis,
                        constant propagation, call-site tailoring, and
                        dead code removal
    noobject          - omit an IPA pass; used only to save compilation
                        time
    partition=large   - Specifies the size of program sections that are
                        analyzed together. Larger partitions produce
                        better analysis but require more storage.
-qlanglvl=ansi        - Specify the language level to use during
                        compilation. ANSI standard, in this case.
-qlibansi             - Assumes that all functions with the names of ANSI C
                        library functions are in fact the system functions.
-qlog4                - Logical expressions that have a LOGICAL result are
                        of type LOGICAL(4).
-qmaxmem=-1           - No limit to how much memory to use during
                        compilation
-qnosave              - sets default storage class of local variables to
                        automatic
-qpdf1/pdf2           - profile directed feedback optimization
-qrndsngl             - rounds the result of each single-precision
                        operation to single-precision, rather than waiting
                        until the full expression is evaluated
-qstrict              - ensures that optimization level 3 does not alter
                        the semantics of the program
-qtbtable=none        - Don't generate traceback information
-qdatalocal           - assume all data items are local
-qtune=604            - instruction selection, scheduling, and other
                        implementation dependent performance enhancements
                        for the PowerPC 604/604e
-qtune=pwr2           - instruction selection, scheduling, and other
                        implementation dependent performance enhancements
                        for Power2
-qtune=pwr3           - instruction selection, scheduling, and other
                        implementation dependent performance enhancements
                        for Power3
-qtune=rs64a          - instruction selection, scheduling, and other
                        implementation dependent performance enhancements
                        for the PowerPC RS64-I
-qtune=rs64b          - instruction selection, scheduling, and other
                        implementation dependent performance enhancements
                        for the PowerPC RS64-II
-qunroll[=n]          - Allow the optimizer to unroll loops, where the
                        optional parameter n specifies the loop unrolling
                        factor (default 4).
-qxlf77=nopersistent  - Disables saving the addresses of arguments to
                        subprograms with ENTRY statements in static
                        storage.

Linker Options:
---------------
-lmass                - Link the mathematical acceleration subsystem
                        libraries (MASS), which contain libraries of tuned
                        mathematical intrinsic functions.
                        See www.austin.ibm.com/tech/MASS.
-bnso                 - Brings referenced library procedures into the
                        object file
-bI:/lib/syscalls.exp - Create statically linked object files
                        (syscalls.exp supplies the names of the routines
                        that can be imported).
-lhmu -lhm -lhu       - link fast malloc libraries. These libraries are
                        part of the memdbg package that is included with
                        IBM C compilers
/usr/ccs/lib/bmalloc.o
                      - A high performance implementation of the Berkeley
                        malloc package.

KAP Preprocessor Options:
-------------------------
-Pk -Wp               - turns on the KAP pre-processor
-ag=a                 - pads common blocks and memory local to the
                        subroutine to avoid cache line collisions.
-ag=b                 - kapf can adjust the leading dimensions of arrays in
                        COMMON away from a power of 2 if the arrays are not
                        used as actual arguments to any user procedure
                        calls.
-r=2                  - sets roundoff level to 2
-ur2=xxx              - sets a maximum weight (estimate of work) for each
                        unrolled iteration. (Work is estimated by counting
                        operands and operators in a loop.)
-inl                  - inline
-ur=xxx               - maximum number of iterations of a loop to unroll
-lm=5                 - Limit amount of loop nesting.
-fuse                 - The fuse command line option enables loop fusion, a
                        conventional compiler optimization that transforms
                        two adjacent loops into a single loop.
-f                    - Leave pre-processed source file around

Vast Preprocessor Options:
--------------------------
-Pv -Wp               - turns on the Vast pre-processor
-me                   - informs the preprocessor to enable alignment,
                        inter-array padding and array redimensioning.
-o                    - Leave pre-processed source file around
-ew                   - is the same as -ea478
-ea2478               - (-ea allows associative transformations.)
                        (-e2 specifies that no data dependencies exist in
                        loops containing pointer-based variables.)
                        (-e4 generates calls to optimized BLAS library
                        routines.)
                        (-e7 automatically expands called routines inline.)
                        (-e8 searches input file first for expandable
                        routines.)

FDPR:
-----
The fdpr (feedback directed program restructuring) program optimizes the
executable image of a program by collecting information on the behavior of
the program while the program is used for some typical workload, and then
creating a new version. It is available on AIX Version 4 systems as part of
the Performance Toolbox for AIX.

Options:
-R2                   - Employ a program-reordering technique in which the
                        original structure of the program, including
                        traceback entries, is preserved.
-R3                   - Employ global reordering techniques that do not
                        preserve debug information.