Compilers: IBM XL C/C++ Version 13.1.7 for Linux
Compilers: IBM XL Fortran Version 15.1.7 for Linux
Libraries: IBM Advance Toolchain version 11.0-0. Available for download at: https://ibm.biz/AdvanceToolchain
IBM Post Link Optimizer: IBM Feedback Directed Program Restructuring (FDPR) for Linux on Power 5.6.4-0. Available for download at: https://developer.ibm.com/linuxonpower/sdk-packages/
Operating systems: SLES 12 SP3
Determines substitute path names for XL Fortran executables such as the compiler, assembler, linker, and preprocessor. It can be used in combination with the -t option, which determines which of these components are affected by -B. Example : -B/opt/at10.0/share/libhugetlbfs/
Macro to have compiler always inline externs if specified.
Pass the --hugetlbfs-align flag to the linker so that we can control (by environment variable HUGETLB_ELFMAP) which program segments are placed in hugepages.
Pass the --hugetlbfs-link=BDT flag to the linker so that the text, initialized data, and BSS segments of the application are backed by hugepages.
Link the Engineering and Scientific Subroutine Library (ESSL).
Link with libhugetlbfs.so. This enables the heap to be backed by 16 MB huge pages.
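As a rough illustration of what backing the heap with 16 MB pages means for sizing, the following sketch rounds a heap size up to whole pages. The function name and the sample sizes are hypothetical; only the 16 MB page size comes from the description above.

```python
# Page size named in the flag description above (16 megabytes).
HUGE_PAGE_BYTES = 16 * 1024 * 1024


def huge_pages_needed(heap_bytes: int) -> int:
    """Return how many 16 MB pages are needed to hold heap_bytes."""
    if heap_bytes < 0:
        raise ValueError("heap size must be non-negative")
    # Ceiling division: a partially used page still occupies a whole page.
    return -(-heap_bytes // HUGE_PAGE_BYTES)


if __name__ == "__main__":
    one_gib = 1024 * 1024 * 1024
    print(huge_pages_needed(one_gib))      # whole pages for a 1 GiB heap
    print(huge_pages_needed(one_gib + 1))  # one extra byte costs one more page
```

A heap that grows by even a single byte past a page boundary consumes another full page, which is worth keeping in mind when reserving huge pages on the system.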
Link the mathematical acceleration subsystem libraries (MASS), which contain libraries of tuned mathematical intrinsic functions.
Link with tcmalloc library for Linux on POWER. This is a library that optimizes calls to new, delete, malloc and free.
Instructs the linker to include libdl.a, the dynamic linking loader library, by linking "/usr/lib/libdl.a".
Pass the -q flag to the linker causing the final executable to have the relocation information.
Instructs the linker to allow multiple definitions and the first definition will be used. Normally when a symbol is defined multiple times, the linker will report a fatal error.
Turn off the effect of the --whole-archive flag.
Instructs the linker to include every object file in the specified library, rather than searching the library for the required object files. Example : "-Wl,--whole-archive /usr/lib/libhugetlbfs.a"
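The two flags above are normally used as a pair around the archive they apply to. The sketch below assembles such a bracketed argument list. It assumes that the flag described as turning off the effect is the GNU ld spelling `--no-whole-archive`. The helper name is hypothetical, and the archive path is the one from the example above.

```python
def whole_archive_link_args(archive_path):
    """Return linker arguments that pull in every object of one archive."""
    return [
        "-Wl,--whole-archive",     # include every object file in the archive
        archive_path,
        "-Wl,--no-whole-archive",  # assumed spelling; restores normal archive searching
    ]


if __name__ == "__main__":
    print(" ".join(whole_archive_link_args("/usr/lib/libhugetlbfs.a")))
```

Closing the bracket matters because libraries listed later on the link line would otherwise also be included in full.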
Link with the Apache C++ Standard Library ("stdcxx"). "libstd8d.so" is a 32-bit shared library with optimization enabled.
Adds the directory for the Apache C++ Standard Library to the search path at link time.
The optimizations provided include:
-O3 is equivalent to the following flags :
-O4 is equivalent to the following flags:
-O5 is equivalent to the following flags :
Generates 64-bit ABI binaries. The default is to generate 64-bit ABI binaries on little-endian Linux.
-qalias=ansi | noansi :
If ansi is specified, type-based aliasing is used during optimization, which restricts the lvalues that can be safely used to access a data object. The default is ansi for the xlc, xlC, and c89 commands. This option has no effect unless you also specify the -O option.
-qalias=std | nostd :
Indicates whether the compilation units contain any non-standard aliasing. If so, specify nostd.
Specifies what aggregate alignment rules the compiler uses for file compilation, where the alignment options are:
The default is -qalign=full.
Indicates that the compiler understands how to do alloca(). This flag is not supported on little-endian Linux.
Supported values for this flag are :
Enables the generation of vector instructions for processors that support them.
Tell the compiler that enum size is small.
Specifies that, if either -lessl or -lesslsmp is specified, then Engineering and Scientific Subroutine Library (ESSL) routines should be used in place of some Fortran 90 intrinsic procedures when there is a safe opportunity to do so.
The compiler generates additional symbol information for use by the "fdpr" binary optimization tool.
The supported values for suboption are :
Specifying -qhot without suboptions implies -qhot=nosimd, -qhot=noarraypad, -qhot=vector and -qhot=level=1. The -qhot option is also implied by -O4 and -O5.
This option inlines glue code that optimizes external function calls when compiling.
The inline option specifies the threshold and limit of inlined functions. Example : -qinline=40.
The inline suboption specifies the threshold and limit of inlined functions. Examples : -qipa=inline=limit=1000 and -qipa=inline=threshold=100
Enhances optimization by doing detailed analysis across procedures (interprocedural analysis or IPA). The level determines the amount of interprocedural analysis and optimization that is performed.
Specifying nothreads does not run any parallel threads; this is equivalent to running one serial thread. This option does not affect the code in the final binary created.
This option specifies that no functions are to be inlined.
Disables the generation of vector instructions.
Suppresses interprocedural analysis (IPA), which is enabled by default at optimization levels -O4 and -O5.
The noprefetch option will not add any prefetch instructions automatically.
Do not use the XL compiler thread information.
The option used in the first pass of a profile directed feedback (PDF) compile that causes PDF information to be generated. The profile directed feedback optimization gathers data on both execution path and data values. It does not use hardware counters, nor gather any data other than path and data values for PDF specific optimizations.
The option used in the second pass of a profile directed feedback compile that causes PDF information to be utilized during optimization.
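The two passes described above bracket a training run of the instrumented binary. The sketch below only assembles the three command lines and does not execute anything. It assumes the two options are spelled `-qpdf1` and `-qpdf2` (the flag names themselves do not appear in this text). The helper name, source file, and training arguments are hypothetical.

```python
import shlex


def pdf_build_steps(compiler, sources, output, train_args, opt="-O3"):
    """Return the command lines for a two-pass profile directed feedback build."""
    # Pass 1: build an instrumented binary that records path and value data.
    pass1 = [compiler, opt, "-qpdf1", *sources, "-o", output]
    # Training run: execute the instrumented binary on representative input.
    train = [output, *train_args]
    # Pass 2: rebuild, letting the optimizer consume the recorded profile.
    pass2 = [compiler, opt, "-qpdf2", *sources, "-o", output]
    return [pass1, train, pass2]


if __name__ == "__main__":
    for step in pdf_build_steps("xlc", ["bench.c"], "./bench", ["train.in"]):
        print(shlex.join(step))
```

The same optimization level is passed to both compile steps in this sketch so that the profile collected in the first pass matches the code being optimized in the second.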
Inserts prefetch instructions automatically where there are opportunities to improve code performance.
Example : -qprefetch=dscr=42
Adds the restrict type qualifier to the pointer parameters within all functions without modifying the source file.
Causes the C++ compiler to generate Run Time Type Identification code.
Specifies that all local variables be treated as STATIC.
Causes the Fortran compiler to allocate dynamic arrays on the heap instead of the stack.
Causes the compiler to automatically generate parallel code using OMP controls when possible.
Tell the compiler that OMP controls are used to identify parallel code.
Specifies the size of the register allocation spill area in bytes.
Disables transformations that may produce incorrect results in the presence of IEEE floating-point NaN (not-a-number) values, or that may incorrectly produce such values.
The supported values for suboption are :
Specifies whether to use volatile or non-volatile vector registers. Volatile vector registers are registers whose value is not preserved across function calls so the compiler will not depend on values in them across function calls.
suboption can be one of the following :
Specifies library search directory for the Apache C++ Standard Library for use by the runtime linker. The information is recorded in the object file and passed to the runtime linker.
Parameter | Description | Executable name |
---|---|---|
a | Assembler | as |
b | Low-level optimizer | xlfcode |
c | Compiler front end | xlfentry |
d | Disassembler | dis |
F | C preprocessor | cpp |
h | Array language optimizer | xlfhot |
I | High-level optimizer, compile step | ipa |
l | Linker | ld |
z | Binder | bolt |
This option indicates to the compiler that each dynamic object allocated in the program fits within the size of 4GB.
suboption can be one of the following :
If -qunroll is specified with no suboptions, the compiler assumes -qunroll=yes. -qnounroll is equivalent to -qunroll=no.
This flag is equivalent to -qunroll=no.
Instructs the compiler to search for more opportunities for loop unrolling than that performed with -funroll-loops. In general, -funroll-all-loops has more chances to increase compile time or program size than -funroll-loops processing, but it might also improve your application's performance.
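To make the transformation concrete, the sketch below shows by hand what unrolling a simple reduction loop by a factor of four looks like. It is only an illustration of the idea behind -funroll-loops and -funroll-all-loops, not the compiler's actual output. The function names and the factor of four are assumptions chosen for the example.

```python
def sum_rolled(values):
    """Reference loop: one addition and one loop test per element."""
    total = 0
    for v in values:
        total += v
    return total


def sum_unrolled_by_4(values):
    """Same reduction with the loop body replicated four times."""
    total = 0
    n = len(values)
    main_end = n - (n % 4)  # largest multiple of 4 not exceeding n
    i = 0
    while i < main_end:
        # Four elements per iteration: fewer loop tests, larger body.
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
        i += 4
    # Remainder loop handles the 0 to 3 leftover elements.
    for j in range(main_end, n):
        total += values[j]
    return total


if __name__ == "__main__":
    data = list(range(10))
    print(sum_rolled(data), sum_unrolled_by_4(data))
```

The larger body and the extra remainder loop are where the increase in program size mentioned above comes from.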
Assumes that all functions with the name of an ANSI C defined library function are, in fact, the library functions.
Asserts the minimum physical pagesize during program execution.
The flag -qipa=noobject specifies whether to include standard object code in the object files. The noobject suboption can substantially reduce overall compilation time, by not generating object code during the first IPA phase. This option does not affect the code in the final binary created.
The partition suboption specifies the size of the program sections that are analyzed together. Larger partitions may produce better analysis but require more storage. Default is medium.
partition={small|medium|large}
Specifies the size of program sections that are analyzed together. Larger partitions may produce better analysis but require more storage.
Default: partition=medium
Specifies whether to include standard object code in the object files. The noobject suboption can substantially reduce overall compilation time, by not generating object code during the first IPA phase. This option adds -qipa=level=1 if that is not already set.
Reduces the size of the stack frame. Programs that allocate large amounts of data to the stack, such as threaded programs, may result in stack overflows. This option can reduce the size of the stack frame to help avoid overflows.
Enhances optimization by doing detailed analysis across procedures (interprocedural analysis or IPA).
-qchars=signed : Causes the compiler to treat the type "char" as signed instead of the default of unsigned.
-qchars=unsigned : Causes the compiler to treat the type "char" as unsigned. This is the default.
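The practical effect of the two settings is how a byte with its high bit set is interpreted. The sketch below uses Python's `struct` module to read the same byte as a signed and as an unsigned 8-bit value. It illustrates the semantics only and does not involve the compiler; the helper names are hypothetical.

```python
import struct


def as_signed_char(byte_value):
    """Interpret one byte (0..255) as a signed 8-bit char."""
    return struct.unpack("b", bytes([byte_value]))[0]


def as_unsigned_char(byte_value):
    """Interpret one byte (0..255) as an unsigned 8-bit char."""
    return struct.unpack("B", bytes([byte_value]))[0]


if __name__ == "__main__":
    for raw in (0x41, 0x7F, 0x80, 0xFF):
        print(hex(raw), as_signed_char(raw), as_unsigned_char(raw))
```

Values up to 0x7F read the same either way. Code that compares a plain char against a negative constant, or uses one as an array index, behaves differently for bytes from 0x80 upward, which is why source written for a signed-char platform may need -qchars=signed.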
Note: this particular portability flag is included for 526.blender_r per the recommendation in its documentation - see http://www.spec.org/cpu2017/Docs/benchmarks/526.blender_r.html.
Permits the usage of "//" to introduce a comment that lasts until the end of the current source line, as in C++.
Adds an underscore to global entities to match the C compiler ABI.
Indicates that the input Fortran source program is in fixed form.
Do not use the XL compiler compat macros.
<suboption> must be one of the following suboptions:
Default: -qufmt=le
The xlc_r invocation is the thread-safe version of the xlc compiler. The xlc_at and xlc_r_at invocations link with the IBM Advance Toolchain libraries.
Only 64-bit compilation is supported.
The xlC_r invocation is the thread-safe version of the xlC compiler. The xlC_at and xlC_r_at invocations link with the IBM Advance Toolchain libraries. Only 64-bit compilation is supported.
The xlf95_r invocation is the thread-safe version of the xlf95 compiler. The xlf95_at and xlf95_r_at invocations link with the IBM Advance Toolchain libraries. Only 64-bit compilation is supported.
Compilation conforms to the ISO C99 standard and accepts implementation-specific language extensions.
Causes the compiler to output a traceback if it abends.
Specifies the size of the compiler's internal program storage areas, in bytes. Example : -qspillsize=512.
Suppresses the message with the message number specified. Examples : -qsuppress=1500-036 and -qsuppress=cmpmsg.
Suppresses informational, language-level, and warning messages. This option sets -qflag=e:e.
fdprpro is a Feedback Directed Program Restructuring optimization tool that is available for the IBM POWER platform. It can be used optionally during FDO.
An example command to invoke fdprpro for the optimization pass is:
Additional details regarding usage and flags are provided below.

Usage: fdprpro -a/--action [action] [options] program

where `program' specifies the input program in the form of an executable or a shared object.

[action] can be one of the following:
  anl         analyze program
  instr       generate instrumented program for profile gathering
  opt         generate optimized program
  check_sign  check FDPR signature in the input program
  sample      generate script file for collecting sampled profile

[options] can be one of the following:

Analysis Options:
  -aawc, --analyze-assembly-written-csects
      Analyze objects written in Assembly.
  -acf <analysis configuration file>, --analysis-configuration-file <analysis configuration file>
      Provide a configuration file of analysis information (advanced option)
  -asd/-noasd, --analyze-static-data/--noanalyze-static-data
  -ifl <file>, --ignored-function-list <file>
      Set the ignored function list. The file contains names of functions that are considered unsafe and thus are not modified

Instrumentation Options:
  -fd <Fdesc>, --file-descriptor <Fdesc>
      Set the file descriptor number to be used when opening the profile file. The default of <Fdesc> is set to the maximum-allowed number of open files
  -icvp, --instr-call-value-profiling
      Instrument the values of parameters passed in function calls
  -imullX, --mullX-instrumentation
      Perform value profiling of RA and RB operands in mullX instructions
  -iderat, --derat-instrumentation
      Perform value profiling of RA and RB operands in load/store indexed instructions
  -issu, --instrumentation-safe-stack-usage
      Ensure that additional stack space is properly allocated for the instrumented run. Use this option if your application uses the stack extensively (e.g., when the program uses alloca()). Note that this option adds extra overhead on instrumentation code
  -iso <offset>, --instrumentation-stack-offset <offset>
      Set the offset from the stack, a negative number, where the instrumentation's area for saving registers is kept at runtime. Use with care
  -M <addr>, --profile-map <addr>
      Set the shared memory segment address for profiling. Alternative shared memory addresses are needed when the instrumented program application creates a conflict with the shared-memory addresses preserved for the profiling. Typical alternative values are 0x40000000, 0x50000000, ... up to 0xC0000000. The default is set to 0x3000000
  -ptm, --profile-to-memory
      Use shared memory key instead of file mapping to obtain a shared memory area for the profile data
  -ri/-nori, --register-instrumentation/--noregister-instrumentation
      Instrument/Do not instrument the input program file to collect profile information about indirect branches via registers. The default is set to collect the profile information
  -sfp/-nosfp, --save-floating-point-registers/--nosave-floating-point-registers
      Save/Do not save floating point registers in instrumented code. The default is set to save floating point registers
  -shmkey <key number>, --shared-memory-key <key number>
      Specify a shared memory key to use when creating a shared memory area for the profile. The default key is created by hashing the profile file name (with ftok).

Profile Files Options:
  -af <prof_file>, --ascii-profile-file <prof_file>
      Set the name of a text format profile file containing profile information.
  -aop, --accept-old-profile
      Accept the old profile file collected on previous versions of the input program file (requires the -f flag)
  -f <prof_file>, --profile-file <prof_file>
      Set the profile file name. The profile file is created during the instrumentation phase and read during the optimization phase. The profile file is updated each time you run the instrumented program
  -fdir <prof_file_dir>, --profile-file-directory <prof_file_dir>
      Set the run-time location of the profile file. The profile will be searched for during the profiling phase at this location. The default location is the path given in the profile file name (-f option). Applicable only at instrumentation phase

Optimization Options:
  -A <alignment>, --align-code <alignment>
      Specify code alignment strategy. 1: Use grouping rules of target machine (default), 2: Same as 1 but consider also hotness of branch targets. See -m for the selected machine model.
  -abb <factor>, --align-basic-blocks <factor>
      Align basic blocks that are hotter than the average by a given (float) <factor>. This is a lower-level machine-specific alignment compared to --align-code. Value of -1 (the default) disables this option
  -bf, --branch-folding
      Eliminate branch to branch instructions
  -ccc <threshold>, --cold-code-connector <threshold>
      Preserves original order for code which is less frequently executed than given threshold
  -bp, --branch-prediction
      Set branch prediction bit for conditional branches according to the collected profile
  -cbpth, --cold-branch-prediction-threshold
      Set the Cold Branch Prediction Threshold for branch prediction optimization. Branches whose execution count relative to the average is below this value will be statically predicted. Allowed values are between (0,1). Default is -1 - optimization is not applied. (Applicable only with the -bp flag)
  -pbp, --preserve-branch-predication
      Preserve branch predication pattern (bc+8) and avoid code reordering and branch prediction
  -cbsi, --chain-based-selective-inline
      Perform selective inlining of functions that produce long hot chains of code
  -dce, --dead-code-elimination
      Eliminate instructions related to unused local variables within frequently executed functions. This is useful mainly after applying function inlining optimization
  -dp, --data-prefetch
      Insert data-cache prefetch instructions to improve data-cache performance
  -ece, --epilog-code-eliminate
      Reduce code size by grouping common instructions in function epilogs, into a single unified code
  -fatc <num_of_bytes>, --fat-const <num_of_bytes>
      Inflate constant areas in code section by adding <num_of_bytes> (entire set to 255) to each constant area
  -fatd <num_of_bytes>, --fat-data <num_of_bytes>
      Inflate data section by adding <num_of_bytes> (entire set to 255) to each data basic unit
  -fatn <num_of_nops>, --fat-nop <num_of_nops>
      Inflate code section by adding <num_of_nops> to each code basic block
  -bined <binary_editor>, --binary-editor <binary_editor>
      Edit existing binary code (advanced option)
  -hr, --hco-reschedule
      Relocate instructions from frequently executed code to rarely executed code areas, when possible
  -hrf <factor>, --hco-resched-factor <factor>
      Set the aggressiveness of the -hr optimization option according to a factor value between (0,1), where 0 is the least aggressive factor (applicable only with the -hr option)
  -tasr, --toc-anchor-store-reschedule
      Relocate TOC store instructions from frequently executed code to rarely executed code areas, when possible
  -i, --inline
      Same as --selective-inline with --inline-small-funcs 12
  -ihf <pct>, --inline-hot-functions <pct>
      Inline all function call sites to functions that have a frequency count greater than the given <pct> frequency percentage
  -isf <size>, --inline-small-funcs <size>
      Inline all functions that are smaller than or equal to the given <size> in bytes
  -kr, --killed-registers
      Eliminate stores and restores of registers that are killed (overwritten) after frequently executed function calls
  -lap, --load-address-propagation
      Eliminate load instructions of variable addresses by re-using pre-loaded addresses of adjacent variables
  -las, --load-after-store
      Add NOP instructions to place each load instruction further apart following a store instruction that references the same memory address
  -plas, --pattern-based-load-after-store
      Optimizes inefficient memory access patterns in order to avoid load-after-store events.
  -ebplas, --event-based-pattern-based-load-after-store
      Optimizes inefficient memory access patterns in order to avoid load-after-store events. The optimization is possible if PM_MRK_LSU_REJECT_LHS profile is available
  -rcl, --remove-constant-load
      Reduces the number of load instructions used to bring constant values into registers. The parameter is used to control which version of optimization is applied, versions from 0 to 3 are available.
  -pvgc <mode>, --print-visual-graph-csect <mode>
      Print a .dot file with CFG information for each csect. Mode 0 is for a graph containing full instructions list for each node, 1 is for a graph with short nodes description.
  -pvgf <mode>, --print-visual-graph-func <mode>
      Print a .dot file with CFG information for each function. Mode 0 is for a graph containing full instructions list for each node, 1 is for a graph with short nodes description.
  -lro, --link-register-optimization
      Eliminate saves and restores of the link register in frequently-executed functions
  -lu <aggressiveness_factor>, --loop-unroll <aggressiveness_factor>
      Unroll short loops containing one to several basic blocks according to an aggressiveness factor between (1,9), where 1 is the least aggressive unrolling option for very hot and short loops
  -lun <unrolling_number>, --loop-unrolling-number <unrolling_number>
      Set the number of unrolled iterations in each unrolled loop. The allowed range is between (2,50). Default is set to 2. (Applicable only with the -lu flag)
  -lux <unrolling_factor>, --loop-unroll-extended <unrolling_factor>
      Unroll hot loops using given unrolling factor. The allowed values are integer numbers that are power of 2. Value -1 disables the optimization, value 1 calculates the unrolling factor automatically, given a machine model
  -nop, --nop-removal
      Remove NOP instructions from reordered code
  -sls, --store-load-on-stack-opt
      Optimize store load on stack pattern
  -fmrx, --fmr-to-xxlor
      Replace FMR instructions from reordered code with XXLOR instruction
  -xscpx, --xscpsgndp-to-xxlor
      Replace Xscpsgndp instructions from reordered code with XXLOR instruction
  -divopt, --divide-optimization
      Replace fdiv/fdivs instructions with fre + fmul/fmuls instructions
  -tslopt, --toc-store-in-loop-optimization
      Remove toc store instructions from the loop and place toc store instruction before loop
  -ifopt, --instruction_fusion_optimization
      Put together two instructions suitable for fusion
  -liopt, --loop_invariant_optimization
      Move loop invariant instructions out of the loop
  -sfopt, --simple_functions_opt
      Inlining of the simple functions (isascii, isdigit)
  -dir, --dependant-instr-resched
      Put NOP between dependent instructions
  -O
      Switch on basic optimizations only. Same as -RC -nop -bp -bf
  -O2
      Switch on less aggressive optimization flags. Same as -O -hr -pto -isf 8 -tlo -kr -see 0
  -O3
      Switch on aggressive optimization flags. Same as -O2 -RD -isf 12 -si -lro -las -vro -btcar (for XCOFF files) -lu 9 -rt 0 -so -see 1 -oderat -tslopt
  -O4
      Switch on aggressive optimization flags together with aggressive function inlining. Same as -O3 -sidf 50 -ihf 20 -sdp 9 -shci 90 and -bldcg (for XCOFF files)
  -ocvp, --opt-call-value-profiling
      Specialize function calls according to the values of their passed parameters
  -omullX, --mullX-optimization
      Optimize mullX instructions by adding a run-time check on RA and RB and performing equivalent operations with lower penalty. The optimization requires the use of -imullX in the instrumentation phase
  -oderat, --derat-optimization
      Optimize load/store indexed instructions by adding a run-time check on RA and RB and performing equivalent operations with lower penalty. The optimization requires the use of -iderat in the instrumentation phase
  -pbsi, --path-based-selective-inline
      Perform selective inlining of dominant hot function calls based on the control flow paths leading to hot functions
  -pca, --propagate-constant-area
      Relocate the constant variables area to the top of the code section when possible
  -pr/-nopr, --ptrgl-r11/--noptrgl-r11
      Perform/Do not perform removal of R11 load instruction in _ptrgl csect (the default is to perform the optimization)
  -pto, --ptrgl-optimization
      Perform optimization of indirect call instructions via registers by replacing them with conditional direct jumps
  -ptoht <heatness_threshold>, --ptrgl-optimization-heatness-threshold <heatness_threshold>
      Set the frequency threshold for indirect calls that are to be optimized by -pto optimization. Allowed range between 0 and 1. Default is set to 0.8. (Applicable only with -pto flag)
  -ptosl <limit_size>, --ptrgl-optimization-size-limit <limit_size>
      Set the limit of the number of conditional statements generated by -pto optimization. Allowed values are between 1 and 100. Default value is set to 3. (Applicable only with the -pto flag)
  -RC, --reorder-code
      Perform code reordering
  -rcaf <aggressiveness_factor>, --reorder-code-aggressivenes-factor <aggressiveness_factor>
      Set the aggressiveness of code reordering optimization. Allowed values are [0 | 1 | 2], where 0 preserves the original code order and 2 is the most aggressive. Default is set to 1. (Applicable only with the -RC flag)
  -rccrf <reversal_factor>, --reorder-code-condition-reversal-factor <reversal_factor>
      Set the threshold fraction that determines when to enable condition reversal for each conditional branch during code reordering. Allowed input range is between 0.0 and 1.0 where 0.0 tries to preserve original condition direction and 1.0 ignores it. Default is set to 0.8 (Applicable only with the -RC flag)
  -rcctf <termination_factor>, --reorder-code-chain-termination-factor <termination_factor>
      Set the threshold fraction that determines when to terminate each chain of basic blocks during code reordering. Allowed input range is between 0.0 and 1.0 where 0.0 generates long chains and 1.0 creates single basic block chains. Default is set to 0.05. (Applicable only with the -RC flag)
  -RD, --reorder-data
      Perform static data reordering
  -ippcf, --instrument-for-path-profiling
      Perform cross function path profiling instrumentation
  -ppcf, --optimize-with-path-profiling
      Perform cross function path profiling optimization
  -rmte, --remove-multiple-toc-entries
      Remove multiple TOC entries pointing to the same location in the input program file
  -rt <removal_factor>, --reduce-toc <removal_factor>
      Perform removal of TOC entries according to a removal factor between (0,1), where 0 removes non-accessed TOC entries only and 1 removes all possible TOC entries
  -rtb, --remove-traceback-tables
      Remove traceback tables in reordered code
  -sdp <aggressiveness_factor>, --stride-data-prefetch <aggressiveness_factor>
      Perform data prefetching within frequently executed loops based on stride analysis, according to an aggressiveness factor between (1,9), where 1 is the least aggressive
  -sdpila <instructions_number>, --stride-data-prefetch-instruction-look-ahead <instructions_number>
      Set the number of instructions for which data is prefetched into the cache ahead of time. Default value is platform dependent. (Applicable only with the -sdp flag)
  -sdpms <stride_min_size>, --stride-data-prefetch-min-size <stride_min_size>
      Set the minimal stride size in bytes, for which data will be considered a candidate for prefetching. Default value is set to 128 bytes. (Applicable only with the -sdp flag)
  -ebp <evt_based_prefetch>, --event-based-prefetch <evt_based_prefetch>
      Perform data prefetching based on the events file
  -ebpla <instructions_number>, --event-based-prefetch-look-ahead <instructions_number>
      Set the number of instructions for which event based prefetch is performed. Default value is platform dependent. (Applicable only with the -ebp flag)
  -vecopt
      Use vector optimizations (remove double xxswapd, remove redundant load, remove xxlnand and replace data with complemented data, replace lxvd2x from rodata and xxswapd by lvx)
  -see <level>
      Use simplified prolog/epilog for functions that perform conditional early-exit. Use basic optimization with <level>=0 and maximal with <level>=1
  -shci <pct>, --selective-hot-code-inline <pct>
      Perform selective inlining of functions in order to decrease the total number of execution counts, so that only functions with hotness above the given percentage are inlined
  -si, --selective-inline
      Perform selective inlining of dominant hot function calls
  -chca, --convert_hole_to_constareas
      Convert Holes In SafeCSects To ConstAreas
  -sidf <percentage_factor>, --selective-inline-dominant-factor <percentage_factor>
      Set a dominant factor percentage for selective inline optimization. The allowed range is between 0 and 100. Default is set to 80. (Applicable only with the -si and -pbsi flags)
  -siht <frequency_factor>, --selective-inline-hotness-threshold <frequency_factor>
      Set a hotness threshold factor percentage for selective inline optimization to inline all dominant function calls that have a frequency count greater than the given frequency percentage. Default is set to 100. (Applicable only with the -si and -pbsi flags)
  -slbp, --spinlock-branch-prediction
      Perform branch prediction bit setting for conditional branches in spinlock code containing l*arx and st*cx instructions. (Applicable after -bp flag)
  -sldp, --spinlock-data-prefetch
      Perform data prefetching for memory access instructions preceding spinlock code containing l*arx and st*cx instructions
  -sll <Lib1:Prof1,...,LibN:ProfN>, --static-link-libraries <Lib1:Prof1,...,LibN:ProfN>
      Statically link hot code from specified dynamically linked libraries to the input program. The parameter consists of a comma-separated list of libraries and their profiles. IMPORTANT: Licensing rights of specified libraries should be observed when applying this copying optimization
  -sllht <hotness_threshold>, --static-link-libraries-hotness-threshold <hotness_threshold>
      Set hotness threshold for the --static-link-libraries optimization. The allowed input range is between 0 (least aggressive) and 1, or -1, which does not require a profile and selects all code that might be called by the input program from the given libraries. Default is set at 0.5
  -so, --stack-optimization
      Reduce the stack frame size of functions that are called with a small number of arguments
  -spc, --shortcut-plt-calls
      Shortcut PLT calls in shared libraries to local functions if they exist. Note: Resolving to external symbols is disabled for such calls
  -tb, --preserve-traceback-tables
      Force the restructuring of traceback tables in reordered code. If -tb option is omitted, traceback tables are automatically included only for C++ applications that use the Try & Catch mechanism
  -tlo, --tocload-optimization
      Replace each load instruction that references the TOC with a corresponding add-immediate instruction via the TOC anchor register, where possible
  -vro, --volatile-registers-optimization
      Eliminate stores and restores of non-volatile registers in frequently executed functions by using available volatile registers
  -vrox, --volatile-registers-extended-optimization
      Eliminate stores and restores of non-volatile registers in frequently executed functions by using available volatile registers, the extended version supports FP registers and transparency

Output Options:
  -cep, --complement-edge-profile
      Complements partial profile information given for the basic blocks' frequencies by adding missing basic block-to-basic block edge counts
  -d, --disassemble-text
      Print the disassembled text section of the output program into <output_file>.dis_text file
  -dap, --dump-ascii-profile
      Dump profile information in ASCII format into <program>.aprof (requires the -f flag).
  -db, --disassemble-bss
      Print the disassembled bss section of the output program into <output_file>.dis_bss file
  -dd, --disassemble-data
      Print the disassembled data section of the output program into <output_file>.dis_data file
  -diap, --dump-initial-ascii-profile
      Dump the given profile information in ASCII format into <program>.aprof.init (requires the -f flag)
  -dim, --dump-instruction-mix
      Dump instruction mix statistics based on gathered profile information
  -dm, --dump-mapper
      Print a map of basic blocks and static variables with their respective new -> old addresses into a <program>.mapper file
  -o <output_file>, --output-file <output_file>
      Set the name of the output file. The default instrumented file is <program>.instr. The default optimized file is <program>.fdpr
  -scl, --show-constant-load
      Adds annotations in fdpr disassembly on load instructions used to bring constant values into registers (requires -d flag)
  -ppcf, --print-prof-counts-file
      Print a text format of the profiling counters into a <program>.counts file (requires the -f flag).
  -sf, --strip-file
      Strip the output file
  -simo, --single-input-multiple-outputs
      Optimize in parallel into multiple outputs as specified by option sets read from stdin

General Options:
  -h, --help
      Print the online help
  -j <jour_file>, --journal <jour_file>
      Output optimization journal information to <jour_file>
  -smt, --smt_mode
      Set SMT mode (1:ST, 2: (SMT2-shared, SMT2-split), 4:SMT4, 8:SMT8)
  -m <machine-model>, --machine <machine-model>
      Generate code for the specified machine model. Target machine can be one of the following models: power2, power3, ppc405, ppc440, power4, ppc970, power5, power6, power7, ppe, spe, spe_edp, z10, z9. Default is power7
  -q, --quiet
      Set the output mode to quiet, suppressing informational messages
  -st <stat_file>, --statistics <stat_file>
      Output statistics information to <stat_file>. If <stat_file> is '-', the output goes to the standard output. See --verbose for the default
  -v <level>, --verbose <level>
      Set verbose output mode level. When set, various statistics about the output program are printed into the file <program>.stat. Allowed level range is between 0 and 3. Default is set to 0
  -V, --version
      Print the version number
  -w <level>, --warning-level <level>
      Set the warning level so only errors of this level and below will be printed. The levels are: 1: errors, 2: warnings, 3: debug warning, 4: debug information. Default is 2

Notes:
  - Analysis options should be specified identically in the instrumentation and optimization phases
  - Some options are relevant only to specific platforms
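Read together, the actions and options listed in the usage text suggest a three-step flow: instrument the program, run the instrumented binary on a training workload, then optimize using the collected profile. The sketch below assembles those command lines without executing them. It is not the command used for the reported results. The helper name, the `.prof` profile file name, and the workload arguments are hypothetical. The `-a`, `-f`, `-o`, and `-O3` options and the default `.instr` and `.fdpr` output names are taken from the usage text.

```python
import shlex


def fdpr_steps(program, workload_args, opt_level="-O3"):
    """Return command lines for an instrument / train / optimize flow."""
    profile = program + ".prof"    # hypothetical profile file name
    instrumented = program + ".instr"
    optimized = program + ".fdpr"
    return [
        # Step 1: generate an instrumented program for profile gathering.
        ["fdprpro", "-a", "instr", "-f", profile, "-o", instrumented, program],
        # Step 2: run the instrumented program; each run updates the profile.
        [instrumented, *workload_args],
        # Step 3: generate the optimized program from the collected profile.
        ["fdprpro", "-a", "opt", opt_level, "-f", profile, "-o", optimized, program],
    ]


if __name__ == "__main__":
    for step in fdpr_steps("./bench", ["train.in"]):
        print(shlex.join(step))
```

Per the notes in the usage text, any analysis options added to the first step would need to be repeated identically in the third.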