Fujitsu Limited PQ580A SPEC CPU Flags Description

Fujitsu.PQ580A.ipf.linux.flags

Compilers: Intel Compilers for C++ and Fortran, Version 10.1 for IPF Linux64
Operating system: Red Hat Enterprise Linux 5.1 (for Intel Itanium)

]]>

Processes are bound to CPUs using numactl and taskset.

taskset -c cpulist command {arguments ...}
launch a new COMMAND with a CPU affinity specified by cpulist
numactl --membind nodes command {arguments ...}
launch a new COMMAND with a memory placement policy that only allocates memory from nodes
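
The effect of taskset -c can be illustrated from inside a program. The following is a minimal sketch, assuming Linux with glibc; the function name pin_to_first_allowed_cpu is hypothetical and is not part of the benchmark setup. It restricts the calling process to a single CPU with sched_setaffinity(), which is the same kind of affinity mask that taskset establishes before launching a command. It does not cover the memory placement done by numactl --membind.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling process to the lowest-numbered CPU it is currently
   allowed to run on. Returns that CPU number, or -1 on failure.
   (Hypothetical helper, for illustration only.) */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed;

    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);          /* mask containing only this CPU */
            if (sched_setaffinity(0, sizeof one, &one) != 0)
                return -1;
            return cpu;
        }
    }
    return -1;                           /* no allowed CPU found */
}
```

A process pinned this way passes its affinity mask on to any child it starts, which is why binding the launcher is enough to bind the benchmark.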

limit stacksize unlimited
set the stack size to unlimited using the command 'ulimit -s unlimited' prior to the run
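
As a rough programmatic counterpart to the stack size setting, the sketch below uses the POSIX getrlimit() and setrlimit() calls. The function name raise_stack_limit is hypothetical. Unlike the shell command, it only raises the soft limit up to the existing hard limit, so the result is unlimited only when the hard limit already is.

```c
#include <sys/resource.h>

/* Raise the soft stack-size limit to the hard limit, which is the most
   an unprivileged process may request. Returns 0 on success, -1 on
   failure. (Hypothetical helper, for illustration only.) */
int raise_stack_limit(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_STACK, &lim) != 0)
        return -1;

    /* rlim_max is RLIM_INFINITY when the hard limit is unlimited. */
    lim.rlim_cur = lim.rlim_max;
    return setrlimit(RLIMIT_STACK, &lim);
}
```

Resource limits are inherited by child processes, so a launcher that calls this before starting a benchmark gives the benchmark the raised limit.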

The memory system is in "Non Mirror Mode".
The PRIMEQUEST 580A/540A/520A memory system supports DSSA (Dual Sync System Architecture)
and works in one of the following two modes.

Mirror Mode
Address buses and data buses are duplicated,
and most internal operations of the chipset are also duplicated.
In this mode, system memory throughput is halved, but higher
reliability is expected from the memory system duplication.
Non Mirror Mode
Address buses and data buses are not duplicated,
and the internal operations of the chipset are not duplicated.
In this mode, full memory bandwidth is available,
but the system duplication for higher reliability is not in effect.

The following 2 environment variables were set.

MALLOC_MMAP_MAX_=0
MALLOC_TRIM_THRESHOLD_=-1
These settings cause malloc to obtain memory from the system with sbrk() calls instead of mmap() calls.

]]> Specifies that the main program is not written in Fortran, and prevents the compiler from linking for_main.o into applications.
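
The MALLOC_MMAP_MAX_ and MALLOC_TRIM_THRESHOLD_ settings listed above have in-program counterparts in the glibc mallopt() interface. The sketch below assumes glibc; the function name prefer_sbrk_heap is hypothetical, and the benchmark runs themselves used the environment variables, not this code.

```c
#include <malloc.h>

/* Apply the two allocator settings from inside the program (glibc only).
   Returns 1 if both mallopt() calls succeeded, 0 otherwise.
   (Hypothetical helper, for illustration only.) */
int prefer_sbrk_heap(void)
{
    /* Counterpart of MALLOC_MMAP_MAX_=0: do not serve requests with mmap(). */
    int ok_mmap = mallopt(M_MMAP_MAX, 0);

    /* Counterpart of MALLOC_TRIM_THRESHOLD_=-1: never trim the heap,
       so freed memory is kept for reuse instead of being returned. */
    int ok_trim = mallopt(M_TRIM_THRESHOLD, -1);

    return ok_mmap == 1 && ok_trim == 1;
}
```

Setting the environment variables has the advantage that the program does not need to be modified or recompiled.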

]]> Maximizes speed across the entire program.

Sets the following options:

-O3
-ipo
-static

]]> Enables O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enables optimizations for maximum speed, such as:

Loop unrolling, including instruction scheduling
Code replication to eliminate branches
Padding the size of certain power-of-two arrays to allow more efficient cache use.

On Intel Itanium processors, the O3 option enables optimizations for technical computing applications (loop-intensive code): loop optimizations and data prefetch.

The O3 optimizations may not improve performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations.
The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.

]]> Enables optimizations for speed. This is the generally recommended optimization level.
This option enables optimizations for speed, including global code scheduling, software pipelining, predication, and speculation.
This option also enables:

Inlining of intrinsics
Intra-file interprocedural optimizations, which include:
- inlining
- constant propagation
- forward substitution
- routine attribute propagation
- variable address-taken analysis
- dead static function elimination
- removal of unreferenced variables
The following capabilities for performance gain:
- constant propagation
- copy propagation
- dead-code elimination
- global register allocation
- global instruction scheduling and control speculation
- loop unrolling
- optimized code selection
- partial redundancy elimination
- strength reduction/induction variable simplification
- variable renaming
- exception handling optimizations
- tail recursions
- peephole optimizations
- structure assignment lowering and optimizations
- dead store elimination
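
One item in the list above, strength reduction with induction variable simplification, can be shown at source level. This is a hand-written sketch of the idea, not output from the Intel compiler, and the function names are hypothetical. The first function multiplies on every iteration; the second replaces the multiplication with a running addition.

```c
#include <stddef.h>

/* Before: one multiplication per iteration to compute the index. */
long sum_strided_mul(const long *data, size_t count, size_t stride)
{
    long sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += data[i * stride];
    return sum;
}

/* After: the multiplication is replaced by an addition on a separate
   induction variable, which is the effect of strength reduction. */
long sum_strided_add(const long *data, size_t count, size_t stride)
{
    long sum = 0;
    size_t index = 0;
    for (size_t i = 0; i < count; i++) {
        sum += data[index];
        index += stride;
    }
    return sum;
}
```

Both functions visit the same elements in the same order, so they return the same result.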

]]> Enables optimizations for speed and disables some optimizations that increase code size and affect speed.
To limit code size, this option:

Enables global optimization; this includes data-flow analysis, code motion, strength reduction and test replacement, split-lifetime analysis, and instruction scheduling.
Disables intrinsic recognition and intrinsics inlining.
On Itanium-based systems, it disables software pipelining, loop unrolling, and global code scheduling.

On Intel Itanium processors, this option also enables optimizations for server applications (straight-line and branch-like code with a flat profile).

The O1 option may improve performance for applications with very large code size, many branches, and execution time not dominated by code within loops.

-unroll0, -fbuiltin, -mno-ieee-fp, -fomit-frame-pointer (same as -fp), -ffunction-sections

]]> Tells the compiler the maximum number of times to unroll loops.
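
To show what an unroll limit controls, the sketch below writes out by hand a loop unrolled four times next to its original form. It only illustrates the shape of the transformation; the function names are hypothetical and the code the compiler actually generates will differ.

```c
#include <stddef.h>

/* Original loop: one element and one loop test per iteration. */
long sum_rolled(const long *v, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* Unrolled four times: four elements per loop test, followed by a
   remainder loop for the last n % 4 elements. */
long sum_unrolled4(const long *v, size_t n)
{
    long sum = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        sum += v[i];
        sum += v[i + 1];
        sum += v[i + 2];
        sum += v[i + 3];
    }
    for (; i < n; i++)      /* remainder iterations */
        sum += v[i];
    return sum;
}
```

A limit of 0, as in -unroll0, corresponds to leaving the loop in its original form.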

]]> Enables inline expansion of all intrinsic functions.

]]> Disables conformance to the ANSI C and IEEE 754 standards for floating-point arithmetic.

]]> Enables EBP to be used as a general-purpose register.

]]> Places each function in its own COMDAT section.

]]> Enables multifile interprocedural optimizations between files.

]]> Prevents linking with shared libraries.

]]> Instruments a program for profiling to get the execution count of each basic block. It also creates a new static profile information file.

]]> Enables use of profiling information (including function splitting and function grouping) during optimization. It enables option -fnsplit.

]]> Enables function splitting. This option is enabled automatically if you specify -prof-use.

]]> Enables use of faster but slightly less accurate code sequences for math functions, such as divide and sqrt. When compared to strict IEEE* precision, this option slightly reduces the accuracy of floating-point calculations performed by these functions, usually limited to the least significant digit.

This option also enables the performance of more aggressive floating-point transformations, which may affect accuracy.

]]> Disables prefetch insertion optimization.
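
For readers unfamiliar with prefetch insertion, the sketch below shows by hand roughly what the optimization adds to a loop. It uses __builtin_prefetch, a GCC-compatible builtin, as a stand-in for the prefetch instructions the compiler would emit; the function name and the look-ahead distance are hypothetical.

```c
#include <stddef.h>

/* Hypothetical look-ahead distance, in elements. A compiler would pick
   this from its own model of memory latency. */
enum { PREFETCH_DISTANCE = 16 };

/* Sum an array while requesting data a fixed distance ahead of the
   element currently being read. */
long sum_with_prefetch(const long *v, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&v[i + PREFETCH_DISTANCE], 0, 3); /* read, keep in cache */
        sum += v[i];
    }
    return sum;
}
```

The prefetch is only a hint, so the result is the same with or without it; only the timing can change.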

]]> Specifies that aliasing should not be assumed in the program.

]]> Do not assume arguments may be aliased.
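
A related, source-level way to make the same promise for individual pointers is the C99 restrict qualifier. The sketch below is an illustration under that assumption, with a hypothetical function name; the option itself applies to all arguments in the compilation and needs no source change.

```c
#include <stddef.h>

/* dst and src are declared not to overlap, so the compiler may load
   several src elements before storing to dst, or vectorize the loop.
   Calling this with overlapping arrays is undefined behavior. */
void scale_add(float *restrict dst, const float *restrict src,
               float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += k * src[i];
}
```

Without such a promise, the compiler has to assume that each store to dst might change a later element of src.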

]]> Tells the compiler to assume that the program adheres to ISO C Standard aliasability rules.
If your program adheres to these rules, then this option allows the compiler to optimize more aggressively. If it doesn't adhere to these rules, then it can cause the compiler to generate incorrect code.
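
As an example of code that stays within these rules, the sketch below reads the bit pattern of a float by copying its bytes with memcpy() instead of casting the pointer. The function name is hypothetical, and the sketch assumes a 32-bit float in IEEE 754 format.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Return the bit pattern of f. Copying the bytes is well defined,
   whereas reading the object through a cast such as *(uint32_t *)&f
   accesses a float through an incompatible type, which is the kind of
   code that can be miscompiled when the aliasing rules are assumed. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Compilers commonly turn a fixed-size memcpy() like this into a single register move, so the conforming version need not be slower.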

]]> Enables language compatibility with the gcc option -ansi and provides the same level of ANSI standard conformance as that option.

This option sets option -fmath-errno.

]]> Tells the compiler to assume that the program tests errno after calls to math library functions. This restricts optimization because it causes the compiler to treat most math functions as having side effects.

]]> The -Wl option directs the compiler to pass a list of arguments to the linker. In this case, "-z muldefs" is passed to the linker. For the GNU linker (ld), the "-z keyword" option accepts several recognized keywords. The keyword "muldefs" allows multiple definitions. It enables, for example, linking with third-party libraries such as SmartHeap from MicroQuill.

]]> MicroQuill SmartHeap Library available from http://www.microquill.com/
To link SmartHeap with C applications, you must link with libsmartheap64.a
To link SmartHeap with C++ applications, you must link with libsmartheap64.a and libsmartheapC64.a

]]> -mtune=cpu Performs optimizations for a specified CPU. On Itanium(R)-based Linux systems, you can specify one of the following values.

itanium: Optimizes for Intel(R) Itanium(R) processors.
itanium2: Optimizes for Intel(R) Itanium(R) 2 processors.
itanium2-p9000: Optimizes for Dual-Core Intel(R) Itanium(R) 2 Processor 9000 Sequence processors.

]]> Instructs the compiler to analyze and transform the program so that 64-bit pointers are shrunk to 32-bit pointers, and 64-bit longs (on Linux) are shrunk into 32-bit longs wherever it is legal and safe to do so. In order for this option to be effective the compiler must be able to optimize using the -ipo option and must be able to analyze all library or external calls the program makes.

]]> -opt-mem-bandwidthn Enables or disables performance tuning and heuristics that control memory bandwidth use among processors. It allows the compiler to be less aggressive with optimizations that might consume more bandwidth, so that the bandwidth can be well-shared among multiple processors for a parallel program. For values of n greater than 0, the option tells the compiler to enable a set of performance tuning and heuristics in compiler optimizations such as prefetching, privatization, aggressive code motion, and so forth, for reducing memory bandwidth pressure and balancing memory bandwidth traffic among threads. The n value is the level of optimizing for memory bandwidth usage. You can specify one of the following values for n:

0 -- Disables a set of performance tuning and heuristics in compiler optimizations for parallel code. This is the default for serial code.
1 -- Enables a set of performance tuning and heuristics in compiler optimizations for multithreaded code generated by the compiler. This is the default if compiler option -parallel or -openmp is specified, or Cluster OpenMP option -cluster-openmp is specified (see the Cluster OpenMP documentation).
2 -- Enables a set of performance tuning and heuristics in compiler optimizations for parallel code such as Windows Threads, pthreads, and MPI code, besides multithreaded code generated by the compiler.

]]> -inline-factor=n Specifies the percentage multiplier that should be applied to all inlining options that define upper limits: -inline-max-size, -inline-max-total-size, -inline-max-per-routine, and -inline-max-per-compile.

This option takes the default value for each of the above options and multiplies it by n divided by 100. For example, if 200 is specified, all inlining options that define upper limits are multiplied by a factor of 2. This option is useful if you do not want to individually increase each option limit.

n is a positive integer specifying the percentage value. The default value is 100 (a factor of 1).
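
The scaling rule can be written out as a one-line calculation. In the sketch below the function name is hypothetical, the limit values used in the example are made up for illustration, and the use of integer division is an assumption, since the rounding the compiler applies is not stated here.

```c
#include <assert.h>

/* Scale an inlining upper limit by -inline-factor=n:
   effective limit = default limit * n / 100.
   (Hypothetical helper; integer division is an assumption.) */
long scaled_inline_limit(long default_limit, long factor_percent)
{
    assert(default_limit >= 0 && factor_percent > 0);
    return default_limit * factor_percent / 100;
}
```

For example, a limit whose default is 2000 becomes 4000 with a factor of 200 and stays at 2000 with the default factor of 100.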

]]> -inline-max-size=n Specifies the lower limit for the size of what the inliner considers to be a large routine. It specifies the boundary between what the inliner considers to be medium and large-size routines.

The inliner prefers to inline small routines. It has a preference against inlining large routines. So, any large routine is highly unlikely to be inlined.

n is a positive integer that specifies the minimum size of a large routine.

]]> -inline-max-per-routine=n Specifies the maximum number of times the inliner may inline into a particular routine. It limits the number of times that inlining can be applied to any routine.

n is a positive integer that specifies the maximum number.

]]> -inline-max-total-size=n Specifies how much larger a routine can normally grow when inline expansion is performed. It limits the potential size of the routine. For example, if 2000 is specified for n, the size of any routine will normally not increase by more than 2000.

n is a positive integer that specifies the permitted increase in the size of the routine.

]]> -inline-min-size=n Specifies the upper limit for the size of what the inliner considers to be a small routine. It specifies the boundary between what the inliner considers to be small and medium-size routines. n is a positive integer that specifies the maximum size of a small routine.

The inliner has a preference to inline small routines. So, when a routine is smaller than or equal to the specified size, it is very likely to be inlined.

]]> This option turns on versioning of modulo operations for certain types of operands (e.g. x%y where y is dynamically determined to be a power of 2). The default is modulo versioning off. This option may improve performance. Versioning of modulo operations commonly results in possibly large speedups for x%y where y is a power of 2. However, the optimization could hurt performance slightly if y is not a power of 2.

This option tells the compiler to use more aggressive unrolling for certain loops. The default is -no-unroll-aggressive (the compiler uses less aggressive default heuristics when unrolling loops). This option may improve performance. On the Itanium architecture, this option enables additional complete unrolling for loops that have multiple exits or outer loops that have a small constant trip count.

This option controls the prefetches that are issued for a memory access in the next iteration, typically done in a pointer-chasing loop. This option should improve performance. The default is -no-opt-prefetch-next-iteration (next iteration prefetch off).

This option controls the prefetches that are issued before the loop is entered. These prefetches target the initial iterations of the loop. The default is -opt-prefetch-initial-values (prefetch for initial iterations on) at -O1 and higher optimization levels.

This option controls the loadpair optimization. The loadpair optimization is enabled by default when -O3 is used for Itanium. -no-opt-loadpair turns the loadpair optimization off.

Enables or disables use of the "exclusive hint" when generating prefetch instructions. (IA-64 architecture only, default: off)

The Itanium architecture provides mechanisms, such as instruction templates, branch hints, and cache hints to enable the compiler to communicate compile-time information to the processor. "exclusive hint" is one of the cache hints and tells the processor to bring the prefetched cache line into the cache in exclusive state.
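
The versioning of modulo operations described earlier in this entry can be pictured as the compiler emitting two versions of x%y and choosing between them at run time. The sketch below writes that out by hand for unsigned operands; the function name is hypothetical and the code the compiler generates will differ.

```c
#include <assert.h>

/* Hand-written picture of a versioned x % y for unsigned operands.
   A run-time test selects a cheap bit-mask path when y is a power of
   two and falls back to the general remainder otherwise. */
unsigned mod_versioned(unsigned x, unsigned y)
{
    assert(y != 0);

    if ((y & (y - 1)) == 0)     /* y has exactly one bit set */
        return x & (y - 1);     /* fast version: mask the low bits */

    return x % y;               /* general version */
}
```

The extra test is the cost mentioned above: when y is not a power of 2, the check is executed but the general remainder is still needed.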

]]> Invoke the Intel C++ compiler for IPF Linux64 to compile C applications

]]> Invoke the Intel C++ compiler for IPF Linux64 to compile C++ applications

]]> Invoke the Intel Fortran compiler for IPF Linux64

]]>