SPEC CPU2006 Flags Disclosure for the Intel Compilers (v10.1) on Mac OSX

macosx-iccifort-v10.1-flags-file-20080408

Last updated: 8-Apr-2008

This flags disclosure file describes the compiler flags associated with the following Intel compilers:

Intel C++ Compiler for Mac OSX v10.1
Intel Fortran Compiler for Mac OSX v10.1

]]>

Platform settings

The system under test is made reasonably quiet by turning off the following in the System Preferences panel:

Automatic Software Updates (turned ON by default)
Screen Savers (turned ON by default)
Unused wireless and bluetooth connectivity (turned ON by default)
Network time synchronization (turned ON by default)

OMP_NUM_THREADS

Sets the maximum number of threads to use for OpenMP* parallel regions if no other value is specified in the application. This environment variable applies to both -openmp and -parallel (Linux and Mac OS X).
Example syntax on a Mac OS X system with 8 cores:
export OMP_NUM_THREADS=8
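
Rather than hard-coding the value, the thread count can be derived from the machine. The following is a minimal sketch, assuming that `getconf _NPROCESSORS_ONLN` is available and that one thread per online logical CPU is wanted (logical CPUs may outnumber physical cores); the variable name `ncpu` is illustrative only.

```shell
# Ask the system for the number of online logical CPUs (assumption: getconf
# supports _NPROCESSORS_ONLN on this system).
ncpu=$(getconf _NPROCESSORS_ONLN 2>/dev/null)

# Fall back to a single thread if the query failed or returned a non-number.
case "$ncpu" in
  ''|*[!0-9]*) ncpu=1 ;;
esac

export OMP_NUM_THREADS="$ncpu"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```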

]]> icc icc invokes the Intel C++ compiler. It is invoked as:

icc [ options ] file1 [ file2 ... ]

where,

options: represent zero or more compile options
fileN: is a C/C++ source (.C .c .cc .cp .cpp .cxx .c++ .i), assembly (.s), object (.o), static library (.a), or other linkable file.

Invoking the compiler using icc compiles .c and .i files as C. Using icc only links in C++ libraries if C++ source is provided on the command line.

]]> invokes the Intel C compiler for Intel 64 applications

]]> invokes the Intel C++ compiler for Intel 64 applications

]]> invokes the Intel C compiler for Intel 32 applications

]]> invokes the Intel C++ compiler for Intel 32 applications

]]> invokes the Intel Fortran compiler for Intel 32 applications

]]> invokes the Intel Fortran compiler for Intel 64 applications

]]> icpc The icpc command uses the same compiler options as the icc command. Invoking the compiler using icpc compiles .c and .i files as C++. Using icpc always links in C++ libraries.
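
The difference between the two drivers comes down to how the .c and .i suffixes are treated. The helper below is a hypothetical sketch (the function name `lang_for` and the file names are not part of the compiler) that restates the suffix rules described for icc and icpc; it does not invoke either compiler.

```shell
# lang_for DRIVER FILE: print the language the driver is described as using
# for FILE, based only on the suffix rules stated in this disclosure.
lang_for() {
  case "$2" in
    *.c|*.i)
      # icc treats .c and .i as C; icpc treats them as C++.
      if [ "$1" = "icpc" ]; then echo "C++"; else echo "C"; fi ;;
    *.C|*.cc|*.cp|*.cpp|*.cxx|*.c++)
      echo "C++" ;;
    *)
      echo "not a C/C++ source" ;;
  esac
}

lang_for icc  main.c     # hypothetical file names
lang_for icpc main.c
lang_for icc  util.cpp
```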

]]> ifort ifort invokes the Intel Fortran compiler. It is invoked as:

ifort [ options ] file1 [ file2 ... ]

where,

options: represent zero or more compile options
fileN: is a Fortran source file, assembly file, object file, object library, or other linkable file.

]]> Compiler option to set the path for include files. Used in some integer peak benchmarks which were built using the Intel 64-bit C++ compiler.

Compiler option to set the path for library files. Used in some integer peak benchmarks which were built using the Intel 64-bit C++ compiler.

Compiler option to set the path for include files. Used in some peak benchmarks which were built using the Intel 32-bit C++ compiler.

Compiler option to set the path for library files. Used in some integer peak benchmarks which were built using the Intel 32-bit C++ compiler.

Compiler option to set the path for include files. Used in some peak benchmarks which were built using the Intel 32-bit Fortran compiler.

Compiler option to set the path for library files. Used in some integer peak benchmarks which were built using the Intel 32-bit Fortran compiler.

Compiler option to set the path for include files. Used in some peak benchmarks which were built using the Intel 64-bit Fortran compiler.

Compiler option to set the path for library files. Used in some integer peak benchmarks which were built using the Intel 64-bit Fortran compiler.

For mixed-language benchmarks, tell the compiler that the main program is not written in Fortran.

]]> Enables optimizations for speed and disables some optimizations that increase code size and affect speed. To limit code size, this option:

Enables global optimization; this includes data-flow analysis, code motion, strength reduction and test replacement, split-lifetime analysis, and instruction scheduling.
Disables intrinsic recognition and intrinsics inlining.
Disables loop unrolling.

The O1 option may improve performance for applications with very large code size, many branches, and execution time not dominated by code within loops.

On IA-32 Mac OSX platforms, -O1 sets the following:

-unroll0,
-fno-builtin,
-mno-ieee-fp,
-fomit-frame-pointer (same as -fp),
-ffunction-sections

]]> Enables optimizations for speed. This is the generally recommended optimization level. This option also enables:

Inlining of intrinsics
Intra-file interprocedural optimizations, which include:
- inlining
- constant propagation
- forward substitution
- routine attribute propagation
- variable address-taken analysis
- dead static function elimination
- removal of unreferenced variables
The following capabilities for performance gain:
- constant propagation
- copy propagation
- dead-code elimination
- global register allocation
- global instruction scheduling and control speculation
- loop unrolling
- optimized code selection
- partial redundancy elimination
- strength reduction/induction variable simplification
- variable renaming
- exception handling optimizations
- tail recursions
- peephole optimizations
- structure assignment lowering and optimizations
- dead store elimination

]]> Enables O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations.

Enables optimizations for maximum speed, such as:

Loop unrolling, including instruction scheduling
Code replication to eliminate branches
Padding the size of certain power-of-two arrays to allow more efficient cache use

On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux/Mac OSX), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times.

The O3 optimizations may not improve performance unless loop and memory access transformations take place. In some cases they may even slow code down compared to O2.

The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.

]]> This option enables additional interprocedural optimizations for single file compilation. These optimizations are a subset of full intra-file interprocedural optimizations. One of these optimizations enables the compiler to perform inline function expansion for calls to functions defined within the current source file.

]]> This option enables multi-file interprocedural optimizations that include:

inline function expansion
interprocedural constant propagation
dead code elimination
propagation of function characteristics
passing arguments in registers
loop-invariant code motion

When you specify this option, the compiler performs inline function expansion for calls to functions defined in separate files.

]]> The -fast option enhances execution speed across the entire program by including the following options that can improve run-time performance:

-xT (optimizations for Intel Core 2 Duo processor family)
-O3 (maximum speed and high-level optimizations)
-ipo (enables interprocedural optimizations across files)
-no-prec-div (disable -prec-div), where -prec-div improves precision of FP divides (some speed impact)
-mdynamic-no-pic, where -mdynamic-no-pic indicates that code is not relocatable

Options set by -fast cannot be overridden; to change the behavior, list the options separately instead of using -fast. The options set by -fast may change from release to release.
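
Because the individual options cannot be overridden while -fast is in effect, one way to vary a single setting is to spell the list out. The following is a dry-run sketch that only prints a command line; the variable names and the source file `prog.c` are hypothetical, and the flag list is the one given above for this release.

```shell
# Options that this disclosure lists as implied by -fast.
FAST_EQUIV="-xT -O3 -ipo -no-prec-div -mdynamic-no-pic"

# Build an explicit list that keeps everything except -ipo.
custom=""
for flag in $FAST_EQUIV; do
  [ "$flag" = "-ipo" ] && continue
  custom="${custom:+$custom }$flag"
done

# Print the resulting command instead of running the compiler.
echo "icc $custom prog.c"
```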

]]> The -xT option tells the compiler to generate optimized code for the Intel Core 2 Duo processor family. It can generate SSSE3, SSE3, SSE2, and SSE instructions for the Intel processors.

]]> Tells the compiler to generate code for IA-32 architecture. If this flag is not specified, the compiler generates code based on whether the 32-bit or the 64-bit compiler is in the search path.

]]> Tells the compiler to generate code for EM64T architecture. If this flag is not specified, the compiler generates code based on whether the 32-bit or the 64-bit compiler is in the search path.

]]> Enables the compiler to generate runtime control code for effective automatic parallelization.

]]> Tells the auto-parallelizer to generate multithreaded code for loops that can be safely executed in parallel. To use this option, you must also use option O2 or O3.

]]> Tells the compiler to link in the optimized malloc implementation that resides under /usr/lib.

]]> Links the 32-bit Intel C++ compiler libraries.

]]> Links the 64-bit Intel C++ compiler libraries.

]]> Links the 32-bit Intel Fortran compiler libraries.

]]> Links the 64-bit Intel Fortran compiler libraries.

]]> Code is not relocatable, but external references are relocatable.

]]> This option improves precision of floating-point divides. It has a slight impact on speed.

With some optimizations, such as -xN and -xB (Linux) or /QxN and /QxB (Windows), the compiler may change floating-point division computations into multiplication by the reciprocal of the denominator. For example, A/B is computed as A * (1/B) to improve the speed of the computation.

However, sometimes the value produced by this transformation is not as accurate as full IEEE division. When it is important to have fully precise IEEE division, use this option to disable the floating-point division-to-multiplication optimization. The result is more accurate, with some loss of performance.

If you specify -no-prec-div (Linux and Mac OSX), it enables optimizations that give slightly less precise results than full IEEE division. The default is -prec-div.
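
The accuracy difference between A/B and A * (1/B) can be observed without the compiler. The sketch below uses awk, whose numbers are assumed to be IEEE double precision, and the operands 49 and 49 chosen for illustration; it shows only the rounding effect of the reciprocal form, not the code the compiler actually generates.

```shell
# Compute 49/49 as a true division and as a multiplication by the reciprocal.
quot=$(awk 'BEGIN { a = 49; b = 49; printf "%.17g", a / b }')
recip=$(awk 'BEGIN { a = 49; b = 49; printf "%.17g", a * (1 / b) }')

echo "a / b       = $quot"
echo "a * (1 / b) = $recip"
```

The reciprocal 1/49 is rounded before the multiplication, so the product falls just short of the exact quotient.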

]]> Instrument program for profiling for the first phase of two-phase profile guided optimization. This instrumentation gathers information about a program's execution paths and data values but does not gather information from hardware performance counters. The profile instrumentation also gathers data for optimizations which are unique to profile-feedback optimization.

]]> Instructs the compiler to produce a profile-optimized executable and merges available dynamic information (.dyn) files into a pgopti.dpi file. If you perform multiple executions of the instrumented program, this option merges the dynamic information files again and overwrites the previous pgopti.dpi file.

Without any other options, the current directory is searched for .dyn files.

]]> Generates static binaries. Libraries are statically linked into the executable. The default behavior on Mac OS X is to produce dynamically linked binaries. This flag has been deprecated in the 10.x compiler; use -static-intel instead.

]]> This option causes the Intel-provided libraries to be linked in statically. It is the opposite of -shared-intel. Note that when this option is provided, libguide is also linked in statically.

]]> Tells the compiler the maximum number of times (n) to unroll loops.

]]> Disables inline expansion of all intrinsic functions.

]]> Disables conformance to the ANSI C and IEEE 754 standards for floating-point arithmetic.

]]> Allows use of EBP as a general-purpose register in optimizations.

]]> Places each function in its own COMDAT section.

]]> Pass options o1, o2, etc. to the linker for processing.

]]> Specifies the initial address of the stack pointer value, where value is a hexadecimal number rounded to the segment alignment. The default segment alignment is the target pagesize (currently, 1000 hexadecimal for the PowerPC and for i386). If -stack_size is specified and -stack_addr is not, a default stack address specific for the architecture being linked will be used and its value printed as a warning message. This creates a segment named __UNIXSTACK. Note that the initial stack address will be either at the high address of the segment or the low address of the segment depending on which direction the stack grows for the architecture being linked.

]]> Specifies the size of the stack segment value, where value is a hexadecimal number rounded to the segment alignment. The default segment alignment is the target pagesize (currently, 1000 hexadecimal for the PowerPC and for i386). If -stack_addr is specified and -stack_size is not, a default stack size specific for the architecture being linked will be used and its value printed as a warning message. This creates a segment named __UNIXSTACK.
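
The rounding to the segment alignment can be worked through with shell arithmetic. This is a sketch under stated assumptions: the requested size 0x1F400 is a made-up value, rounding is assumed to go up to the next multiple of the 0x1000 page size, and the printed option assumes the usual comma-separated form for passing linker options through the compiler driver.

```shell
page=$((0x1000))     # default segment alignment given above
want=$((0x1F400))    # hypothetical requested stack size

# Round up to the next multiple of the page size (assumed direction).
aligned=$(( (want + page - 1) / page * page ))

# Print the linker option instead of running a link step.
printf '%s0x%x\n' "-Wl,-stack_size," "$aligned"
```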

]]>