Siemens CDS++ compiler flags (as of May 1999)
=============================================

The following is a list of short explanations of the compiler/linker
flags used for SPEC CINT95 result submissions for Siemens RM systems
using the CDS++ 2.0 A compiler.

Most flags are semantically similar to those used with the predecessor
compiler, pyrC6. The syntax has changed in many cases; in several cases,
the old syntax is still accepted alongside the new one.

Future result submissions that use new compilers or new compiler
versions will likely use different flags; this flag description will
then be superseded by a new one.

---------------------------------------------------------------------

1. Compiler Flags

[Syntax note: For most flags that have a numeric parameter (e.g.,
inlining control), the parameter can be separated from the flag by
either a comma "," or a colon ":". After "-F" or "-K", the blank space
is optional.]

-qfeedback
    Standard (1-pass) feedback optimization: produce code that collects
    call graph and flow graph information suitable for feedback-directed
    optimization.

-F profdir,
    Specifies that profiling information should be written to and read
    from the directory given as the argument. Default is ./PROF.

-qfeedback2
    Additional (2-pass) feedback optimization: produce code that collects
    information from an executable optimized in a first pass of feedback
    optimization (i.e., one compiled with -qfeedback / -F O4 or
    -qfeedback / -F O5).

-F profdir2,
    Specifies that profiling information from 2-pass feedback compilation
    should be written to and read from the directory given as the
    argument. Default is ./PROF2.

-F use_fb2
    Specifies that profiling information from 2-pass feedback compilation
    should be used in generating the (final) executable. Must be used
    together with -F O4 or -F O5.
-F X4
    Performs all safe and generally applicable optimizations, including
    interprocedural optimizations, register allocation across function
    calls, and feedback-directed optimizations (function inlining,
    procedure positioning, branch elimination, procedure splitting,
    register allocation, and cross-basic-block scheduling).
    This flag also directs the compiler to produce non-position-independent
    code, to generate code using the MIPS4 ISA instruction set, to inline
    alloca, printf, memcpy, memset, memcmp, and memmove, and to use U-code
    system libraries. These libraries provide the same system services as
    their regular counterparts, but in a form better suited to
    interprocedural optimization.
    The flag also includes -F fast_int_mul (see below).

-F cost_benefit,<n>
    Tells the "umerge" phase to consider for inlining/cloning only those
    functions whose estimated ratio of cost over savings is less than n.

-F G
    Specifies that data items smaller than the given number of bytes
    should be placed in the global data area and accessed using a faster
    addressing mode. Default is 0.

-F inline_limit,
    Sets a size threshold for inlining/cloning: a call will not be
    inlined/cloned if the resulting function (after inlining/cloning)
    exceeds the given number of basic blocks. Default is 500.

-F loopunroll,
    Tells the optimizer to unroll loops the given number of times.
    Default is 4; -F X4 sets it to 8.

-F unrolllimit,
    The argument is the limit on the number of instructions within a
    loop unrolled by the optimizer. Default is 320; -F X4 sets it to 2000.

-F fast_int_mul
    Directs the optimizer to use the floating-point unit to perform
    32-bit integer multiplications wherever doing so would result in
    correct, faster code. Because this flag changes the behavior of
    multiplications that overflow, programs that depend on the truncation
    of two's-complement multiplication to 32 bits (the default behavior)
    should not use this flag.
    Because the difference from the default behavior appears only in
    overflow cases (not in legal C programs), and because rule 2.2.5 of
    the CPU95 Run Rules exempts numerical accuracy flags from baseline
    restrictions anyway, this flag is not an assertion flag in the sense
    of the CPU95 Run Rules.

-F no_positioning
    Disables the procedure positioning feedback optimization.

-F afep,<num>
    Subroutine entries are allocated on 2 ** num byte boundaries.
    Default is num=2.

-F hot_switch_opt,<num1>,<num2>
-Wc,-xjp_mh_opt,<num1>,<num2>
    Controls the hot switch optimization, which uses conditional branches
    instead of indirect jumps for C switch statements. For a switch label
    to be considered for this optimization, the label's relative frequency
    of execution must be greater than num1 percent. The parameter num2
    limits the maximum number of conditional branches. -F X4 sets the
    values to 6 and 5, respectively.
    (-Wc,-xxx: syntax for flags that direct the compiler's code generator)

-KOlimit,<num>
-F Olimit,<num>
    Changes the threshold size for optimizing very large programs. The
    argument specifies the maximum size, in basic blocks, of a function
    that will be optimized by the global optimizer. The default value is
    1000. The optimization phase of the compiler warns the user if this
    flag is needed to optimize a particular program. -F X4 sets num to
    4000.

-Kr4000
    Causes pipeline optimization for the R4000 and R4400 CPUs.

-Wb,-xxx
    Syntax for flags that direct the compiler's back end.

-Wb,-br_likely_cntl,,
    Controls the branch-likely optimization, which sets the likely bit in
    a conditional branch. If feedback indicates that a conditional branch
    is probably taken and the branch cannot be reversed, the branch's
    likely bit is set if both of the following criteria are met:
    1) the branch is taken at least the percentage of the time given by
       the first argument, and
    2) the second argument equals 0, or the branch is taken more often
       than the branch's function is called by the factor given by the
       second argument.
    Both arguments are expressed as percentages.
-Wb,-prefetch,,
-F prefetch,,
    Inserts prefetch instructions in loops that appear to access memory
    serially. Only loops with at least as many iterations as the first
    argument are considered. The second argument is the expected latency
    of fetches from memory, in machine instruction cycles. Off by default;
    -F X4 turns it on and sets both values to 400.

-WG,-xxx / -Wg,-xxx / -Wn,-xxx
    Flags of these forms control the "inliner" pass of the compiler
    (-Wg,-xxx), the "cloner" pass (-Wn,-xxx), or both (-WG,-xxx). A setting
    in the more specific form (lowercase g or n) overrides the more
    general setting (uppercase G). Although the following descriptions
    use the -WG,-xxx form, they hold for the other forms as well.
    Some flags exist only for the cloner (the pass that optimizes
    subroutines for specific call sites); they provide finer control over
    the cloning process. They can be written in the form -WG,-xxx or
    -Wn,-xxx; the following descriptions use the form -Wn,-xxx.

-WG,-boc:<n>
    Tells the "umerge" phase to consider for inlining/cloning only those
    functions whose estimated ratio of runtime cycles saved to the I-cache
    cost of inlining/cloning is greater than or equal to n.

-WG,-clone_expansion:
-Wn,-clone_expansion:
    Directs the cloner and/or inliner to limit the maximum relative
    growth of the program to the given factor. The default is 1.3.

-Wn,-recursion_depth:
    Sets the maximum number of function calls through which the cloner
    will search to identify recursive functions. For example, with
    -Wn,-recursion_depth:1, functions that call themselves are considered
    recursive.

-WG,-only_clone_recursion
-Wn,-only_clone_recursion
    Directs the cloner and/or inliner to clone only recursive functions.

-WG,-recursion_limit:
-Wn,-recursion_limit:
    Directs the cloner and/or inliner to limit the maximum number of
    basic blocks in a recursive function to the given value.
    If -Wn,-recursion_limit is not given, this value is set by the
    -WG,-inline_limit flag; if neither flag is given, the default is 500.

-Wo,-xxx
    Syntax for flags that direct the compiler's optimizer pass.

-Wo,-no_const_in_reg
    Tells the optimizer not to put constants in registers.

-Wo,-recursive_calls
    Directs "uopt" to use different heuristics that result in better
    performance if there are recursive function calls in the source code.
    Only effective in -F X4 mode.

-Wo,-splitedges,<num>
    Controls the edge-splitting algorithm in "uopt", which inserts an
    empty basic block on infrequently executed control flow edges to
    increase optimization opportunities. This optimization uses feedback
    information to limit the number of split edges and avoid excessive
    compilation time. "uopt" splits an edge if its execution frequency
    multiplied by num is less than the smaller of the execution
    frequencies of the edge's head and tail basic blocks. Setting num to
    zero disables edge splitting.

2. Linker Flags

-dn
    Passed to ld; specifies static linking in the link editor.

3. Portability Flags

-DI_TIME
-DI_SYS_TIME
    Enable certain (SPEC-approved) source code parts via conditional
    compilation.

Questions?

More details can be found in the compiler documentation. SPEC-specific
questions should be sent to the SPEC OSG representative Reinhold
Weicker, reinhold.weicker@pdb.siemens.de