------------------------------------------------------
Hewlett-Packard Company
SPEC CPU2000 FLAG DESCRIPTIONS - Intel C/C++/Visual FORTRAN Compilers
Version 9.1 - Windows and Linux - HP-20061003-IC91.txt
------------------------------------------------------

------------------------------------------------------
General Options (C/C++/FORTRAN)
------------------------------------------------------

-arch:keyword
    Determines the version of the architecture for which the compiler
    generates instructions. keyword is the processor type.
    Possible values are:
      SSE:  Optimizes for Intel Pentium 4 processors with Streaming
            SIMD Extensions (SSE).
      SSE2: Optimizes for Intel Pentium 4 processors with Streaming
            SIMD Extensions 2 (SSE2).

-fast
    This option maximizes speed across the entire program by including
    the following options (IA32, EM64T):
      -O3 -Qipo -Qprec-div- -QxP          (Windows)
      -O3 -ipo -no-prec-div -xP -static   (Linux)

-O{1|2|3}
    Optimization-level options:
      1: Optimizes for speed, but disables some optimizations which
         increase code size for a small speed benefit. Includes inline
         expansion for intrinsic functions, global optimizations, and
         string pooling optimizations.
      2: This is the default level of optimization. Optimizes for
         speed. The -O2 option includes -O1 optimizations and in
         addition enables inlining of intrinsics and more speed
         optimizations.
      3: Builds on -O1 and -O2 optimizations by enabling high-level
         optimization. This level does not guarantee higher
         performance unless loop and memory access transformations
         take place. In conjunction with -QaxK/-QxK and -QaxW/-QxW,
         this switch causes the compiler to perform more aggressive
         data dependency analysis than for -O2. This may result in
         longer compilation times.

-Oa[-]
    Assume [do not assume] no aliasing in program.

-Qansi_alias[-] (Windows)
-ansi-alias[-] (Linux)
    Enable/disable use of ANSI aliasing rules in optimizations; the
    user asserts that the program adheres to these rules.
    The default for C++ is -Qansi_alias-, meaning that the ANSI
    aliasing rules are not assumed. The default for the Fortran
    compiler is -Qansi_alias. For C++, the -Qansi_alias flag will
    enable optimizations that would otherwise be prevented by
    potential aliasing.

-Qauto-ilp32 (Windows)
-auto-ilp32 (Linux)
    This option instructs the compiler to analyze the program to
    determine if there are 64-bit pointers which can be safely shrunk
    into 32-bit pointers. In order for this option to be effective,
    the compiler must be able to optimize using the -ipo/-Qipo option,
    and must be able to analyze all library/external calls the program
    makes. This option imposes the following restriction on the
    program:
      The program cannot malloc any objects greater than 2**31 bytes
      in size.
    If the program does not satisfy this restriction, unpredictable
    behavior may occur.

-Qipo (Windows)
-ipo (Linux)
    Multi-file interprocedural (IP) optimizations that include:
      - inline function expansion
      - interprocedural constant propagation
      - dead code elimination
      - propagation of function characteristics
      - passing arguments in registers
      - loop-invariant code motion

-Qoption,string,options
    Passes options to a specified tool.
    string is the name of the tool. It can be any of the following:
      c            - Indicates the Intel C++ compiler.
      cpp (or fpp) - Indicates the Intel C++ preprocessor.
      cxxinc       - Indicates C++ header files.
      cinc         - Indicates C header files.
      asm          - Indicates the assembler.
      link         - Indicates the linker.
      prof         - Indicates the profiler.
    On Windows systems, the following is also available:
      masm         - Indicates the Microsoft assembler.
    On Linux and Mac OS systems, the following are also available:
      as           - Indicates the assembler.
      gas          - Indicates the GNU assembler.
      ld           - Indicates the loader.
      gld          - Indicates the GNU loader.
      lib          - Indicates an additional library.
      crt          - Indicates the crt%.o files linked into
                     executables to contain the place to start
                     execution.
    options are one or more comma-separated, valid options for the
    designated tool.
    -Qoption can be used with the -Qipo flag to refine IPO. The valid
    options that can be used for this purpose are:
      -ip_args_in_regs=0
          Disables the passing of arguments in registers.
      -ip_ninl_max_stats=n
          Sets the valid max number of intermediate language
          statements for a function that is expanded in line. The
          number n is a positive integer. The number of intermediate
          language statements usually exceeds the actual number of
          source language statements. The default value for n is 230.
          The compiler uses a larger limit for user inline functions.
      -ip_ninl_min_stats=n
          Sets the valid min number of intermediate language
          statements for a function that is expanded in line. The
          number n is a positive integer. The default values for
          ip_ninl_min_stats are:
            IA-32 compiler: ip_ninl_min_stats = 7
      -ip_ninl_max_total_stats=n
          Sets the maximum increase in size of a function, measured
          in intermediate language statements, due to inlining. n is
          a positive integer whose default value is 2000.

-Qprec-div[-] (Windows)
-[no-]prec-div (Linux)
    Improves precision of floating point divides.

-Qprof_gen (Windows)
-prof_gen (Linux)
    Instrument program for profiling for the first phase of two-phase
    profile guided optimization.

-Qprof_use (Windows)
-prof_use (Linux)
    Instructs the compiler to produce a profile-optimized executable
    and merges available dynamic information (.dyn) files into a
    pgopti.dpi file. If you perform multiple executions of the
    instrumented program, -Qprof_use merges the dynamic information
    files again and overwrites the previous pgopti.dpi file. Without
    any other options, the current directory is searched for .dyn
    files.

-Qrcd (Windows)
-rcd (Linux)
    The Intel compiler uses the -Qrcd option to improve the
    performance of code that requires floating-point-to-integer
    conversions. The system default floating point rounding mode is
    round-to-nearest. This means that values are rounded during
    floating point calculations.
    However, the C language requires floating point values to be
    truncated when a conversion to an integer is involved. To do this,
    the compiler must change the rounding mode to truncation before
    each floating point-to-integer conversion and change it back
    afterwards. The -Qrcd option disables the change to truncation of
    the rounding mode for all floating point calculations, including
    floating point-to-integer conversions. Turning on this option can
    improve performance, but floating point conversions to integer may
    not conform to C semantics.

-static (Linux)
    This option prevents linking with shared libraries. It causes the
    executable to link all libraries statically.

-Qunroll[n] (Windows)
-unroll[n] (Linux)
    Specifies the maximum number of times to unroll a loop. Omit n to
    let the compiler decide whether to perform unrolling or not. Use
    n = 0 to disable the unroller. If n is not specified, the compiler
    automatically chooses the maximum number of times to unroll a
    loop.

-Qax{K|W|N|B|P} (Windows)
-ax{K|W|N|B|P} (Linux)
    Directs the compiler to generate processor-specific code if there
    is a performance benefit, while also generating generic IA-32
    code. The letter following the option is the processor for which
    you want to target your program. Possible values are:
      K: Code is optimized for Intel® Pentium® III and compatible
         Intel processors.
      W: Code is optimized for Intel Pentium 4 and compatible Intel
         processors.
      N: Code is optimized for Intel Pentium 4 and compatible Intel
         processors with Streaming SIMD Extensions 2. The resulting
         code may contain unconditional use of features that are not
         supported on other processors. This option also enables new
         optimizations in addition to Intel processor-specific
         optimizations including advanced data layout and code
         restructuring optimizations to improve memory accesses for
         Intel processors.
      B: Code is optimized for Intel Pentium M and compatible Intel
         processors. This option also enables new optimizations in
         addition to Intel processor-specific optimizations.
      P: Code is optimized for Intel® Core™ Duo processors, Intel®
         Core™ Solo processors, Intel® Pentium® 4 processors with
         Streaming SIMD Extensions 3, and compatible Intel processors
         with Streaming SIMD Extensions 3. The resulting code may
         contain unconditional use of features that are not supported
         on other processors. This option also enables new
         optimizations in addition to Intel processor-specific
         optimizations including advanced data layout and code
         restructuring optimizations to improve memory accesses for
         Intel processors.

-Qx{K|W|N|B|P} (Windows)
-x{K|W|N|B|P} (Linux)
    Generate specialized code for the specified processor while also
    generating generic code. The letter following the option is the
    processor for which you want to target your program. Possible
    values are:
      K: Code is optimized for Intel® Pentium® III and compatible
         Intel processors.
      W: Code is optimized for Intel Pentium 4 and compatible Intel
         processors.
      N: Code is optimized for Intel Pentium 4 and compatible Intel
         processors with Streaming SIMD Extensions 2. The resulting
         code may contain unconditional use of features that are not
         supported on other processors. This option also enables new
         optimizations in addition to Intel processor-specific
         optimizations including advanced data layout and code
         restructuring optimizations to improve memory accesses for
         Intel processors.
      B: Code is optimized for Intel Pentium M and compatible Intel
         processors. This option also enables new optimizations in
         addition to Intel processor-specific optimizations.
      P: Code is optimized for Intel® Core™ Duo processors, Intel®
         Core™ Solo processors, Intel® Pentium® 4 processors with
         Streaming SIMD Extensions 3, and compatible Intel processors
         with Streaming SIMD Extensions 3. The resulting code may
         contain unconditional use of features that are not supported
         on other processors.
         This option also enables new optimizations in addition to
         Intel processor-specific optimizations including advanced
         data layout and code restructuring optimizations to improve
         memory accesses for Intel processors.

    Additional Notes on N and P:
    ------------------------------------
    The N and P options target your program to run on Intel Pentium 4
    and compatible Intel processors. The resulting code might contain
    unconditional use of features that are not supported on other
    processors. Programs in which the function main() is compiled with
    this option will detect non-compatible processors and generate an
    error message during execution. These options also enable new
    optimizations in addition to Intel processor-specific
    optimizations including advanced data layout and code
    restructuring optimizations to improve memory accesses for Intel
    processors.

-Zp{1|2|4|8|16}
    Specifies the strictest alignment constraint for structure and
    union types as one of the following: 1, 2, 4, 8, or 16 (default)
    bytes.

------------------------------------------------------
Flags Specific to C/C++
------------------------------------------------------

-Qcxx_features (Windows)
    Enables standard C++ features without disabling Microsoft features
    and within the bounds of what is provided in the Microsoft headers
    and libraries. This option also enables -GR and -GX.

-GR
    Enables C++ Runtime Type Information (RTTI).

-GX
    Enables the full C++ Exception Handling unwind semantics.

------------------------------------------------------
Flags Specific to FORTRAN
------------------------------------------------------

-Qauto (Windows)
-auto (Linux)
    Causes all variables to be allocated on the stack, rather than in
    local static storage.

-Qscalar_rep[-] (Windows)
-[no-]scalar-rep (Linux)
    Enables (DEFAULT) [disables] scalar replacement performed during
    loop transformations.
------------------------------------------------------
General Options and Libraries
------------------------------------------------------

The starting tokens "/" and "-" are both equivalent for flags passed
to the compiler. For example, -QxW and /QxW are identical switches.

+FDO
PASS1=-Qprof_gen
PASS2=-Qprof_use
    Using feedback-directed optimization, a profile is generated on
    the first pass of compilation and used on the second pass.

shlW32M.lib
    MicroQuill SmartHeap Library available from
    http://www.microquill.com

------------------------------------------------------
Benchmark-Specific Portability Options
------------------------------------------------------

-DSPEC__CPU2000_LP64
    Compile using LP64 programming model.

176.gcc:
    -Dalloca=_alloca
        So as to use the built-in optimized alloca.
    /F10000000
        176.gcc uses alloca and this option tells the linker to
        pre-allocate 10MB of stack. The default amount of stack
        allocated is not enough and 176.gcc crashes with a run-time
        error.

178.galgel:
    -FI (Linux)
    /FI
        Fixed-format F90 source code.
    /F32000000
        Same as with 176.gcc, pre-allocates a 32MB stack.

186.crafty:
    -DNT_i386
        Specifies that it is a Windows NT Intel processor-based
        system, which makes the compiler use "_int64" as the 64-bit
        variable that 186.crafty needs.

252.eon:
    -DHAS_ERRLIST
        The programming environment provides a specification for
        "sys_errlist[]".

253.perlbmk:
    -DSPEC_CPU2000_NTOS
        This enables the code changes for porting to Windows.
    -DPERLDLL
        On Windows, we need a single perl.exe instead of a perl.exe
        plus perl.dll. This pre-define ensures that the changes
        necessary to get a single, UNIX-style executable are applied,
        avoiding the indirect calls that can cause a 10% performance
        degradation. This allows the Windows-based executable to be as
        close as possible to the Unix-based one.
    /MT
        Use the static multi-threaded library; otherwise the benchmark
        will not compile.

254.gap:
    -DSYS_HAS_CALLOC_PROTO
    -DSYS_HAS_MALLOC_PROTO
        These two pre-defines tell of the existence of malloc and
        calloc prototypes.
----------------------------------------------------------------------------
BIOS Settings Notes
----------------------------------------------------------------------------

Note: The settings described in this section do not apply to every
ProLiant server.

Power Regulator for ProLiant support
(Default = HP Dynamic Power Savings Mode):
  - HP Dynamic Power Savings Mode: Automatically varies processor
    speed and power usage based on processor utilization. Allows
    reducing overall power consumption with little or no impact on
    performance. Does not require OS support.
  - HP Static Low Power Mode: Reduces processor speed and power usage.
    Guarantees a lower maximum power usage for the system. Performance
    impacts will be greater for environments with higher processor
    utilization.
  - HP Static High Performance Mode: Processors will run in their
    maximum power/performance state at all times regardless of the OS
    power management policy.
  - OS Control Mode: Processors will run in their maximum
    power/performance state at all times unless the OS enables a power
    management policy.

Adjacent Sector Prefetch (Default = Enabled):
    This option enables or disables a processor mechanism that, on a
    cache line miss, also fetches the adjacent cache line within the
    128-byte sector that contains the needed data. In some limited
    cases, setting this option to Disabled may improve performance. In
    the majority of cases, the default value of Enabled provides
    better performance. Users should only disable this option after
    performing application benchmarking to verify improved performance
    in their environment.