IBM AIX Flag Disclosure
SPEC OMP2001

For use with AIX submissions with the IBM XL compilers.

Last Revised 23 May, 2007

Notes
=====
The IBM C/C++ & Fortran compilers produce 32-bit binaries by default.
Flags are described below which cause the compilers to produce 64-bit
binaries.

Source Level Portability Options
================================

Compiler Invocation
===================
xlc_r
    The same as "xlc" except that it generates a threadsafe executable,
    compliant with the POSIX pthreads API.

xlf90_r
    The same as "xlf90" except that it generates a threadsafe
    executable, compliant with the POSIX pthreads API.

cleanpdf
    Erase the information in the PDF directory if any exists to ensure
    no feedback information is reused between compilations.

Compiler Options
================
-O
    Performs optimizations that the compiler developers considered the
    best combination for compilation speed and runtime performance.

-O3
    Perform some memory and compile time intensive optimizations in
    addition to those executed with -O. The -O3 specific optimizations
    have the potential to slightly alter the semantics of a user's
    program. Optimizations may include, but are not limited to:
        Aggressive code motion, and scheduling on computations that
        have the potential to raise an exception, but no valid
        exceptions will be suppressed;
        Relaxed conformance to IEEE rules in cases where the difference
        in the results is not important to an application;
        Rewriting of floating point expressions.

-O4
    Equivalent to -O3 -qipa -qhot with automatic generation of
    architecture ( -qarch= ) and tuning ( -qtune= ) options ideal for
    that platform. The qipa level defaults to level=1.

-O5
    Equivalent to -O3 -qipa=level=2 -qhot with automatic generation of
    architecture ( -qarch= ) and tuning ( -qtune= ) options ideal for
    that platform.

-Q, -qinline
    The -Q option without any list inlines all appropriate procedures,
    subject to limits on the number of inlined calls and the amount of
    code size increase as a result.
    -qinline is an alias for -Q.

-q64
    Selects 64-bit compiler mode.

-qalign=struct=natural
-qalign=natural
    The compiler maps structure members to their natural boundaries.
    The first form is used by the Fortran compiler; the second form is
    used by the C compiler and is a deprecated form for the Fortran
    compiler.

-qarch=pwr6
    Produces object code containing instructions that will run on
    power6 processors.

-qarch=pwr6e
    Produces object code containing instructions that will run on
    power6 processors executing in "Enhanced" mode which includes
    instructions that are in addition to the PowerPC standard.

-qarch=auto
    Produces object code containing instructions that will run on the
    hardware platform on which the program is compiled.

-qfixed
    Indicates that the input source program is in fixed form. Allows
    fixed format Fortran 77 programs to be compiled using the xlf90
    compiler invocation.

-qfixed=
    States that Fortran code is in fixed source form, with optional
    argument specifying the maximum line length.

-qhot
    Perform high-order transformations on loops during optimization.

-qhot=arraypad
    Pad the sizes of arrays to align better in cache.

-qipa=level=1
    Turns on interprocedural analysis with inlining, limited alias
    analysis, and limited call-site tailoring. This is the default
    level of -qipa.

-qipa=level=2
    Turns on interprocedural analysis with inlining, cloning, full
    alias analysis, constant propagation, call-site tailoring, and dead
    code removal.

-qipa=noobject
    Do not generate object files during the first stage of
    interprocedural analysis.

-qinline
    Alias for -Q. See -Q.

-qipa=partition=large
    Specifies the size of the regions within the program to analyze.
    Larger partitions contain more procedures, which result in better
    interprocedural analysis but require more storage to optimize.

-qmaxmem=-1
    Allows the compiler to use as much memory as it needs to execute.

-qpdf1/pdf2
    Profile directed feedback optimization

-qsmp=omp
    Enable OpenMP parallelization directives.

-qsuffix=f=f90
    Sets the suffix for source files to be .f90. The .f90 suffix is
    required by xlf90 to compile Fortran 90 programs.

-qsuppress=cmpmsg
    Suppress the output of the specified message(s). cmpmsg is the
    message put out at the compilation completion of each Fortran
    routine.

-qtune=pwr6
    Instruction selection, scheduling, and other implementation
    dependent performance enhancements for the Power6 processors.

-qtune=auto
    Instruction selection, scheduling, and other implementation
    dependent performance enhancements for the hardware platform on
    which the program is compiled.

-qunroll=n
    Unrolls inner loops in the program by a factor of n.

-w
    Suppress warning messages from the C, C++, and Fortran compilers.

Linker Options
==============
-bdatapsize:64K
-bstackpsize:64K
-btextpsize:64K
    These flags set the page sizes of the data, stack, and text
    segments to 64K.

-bmaxdata:0x........
    Sets the maximum combined size of the program's stack and data
    segments to this number of bytes, specified in hexadecimal, when
    the default is too small.

Large Page Settings:
====================
chuser capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE $USER
    Allows $USER (non-root ID) to access the large pages that are
    available. It takes effect on next login.

bosboot -a
    Creates a boot image used on the next system reboot.

shutdown -rF
    Halt the operating system and reboot.

AIX Environment Variables:
==========================
MEMORY_AFFINITY=MCM
    Turn on Memory Affinity which has been enabled with the vmo
    command.

MALLOCOPTIONS=multiheap
    Maintains multiple heaps in the process, for servicing simultaneous
    "malloc" requests.

OMP_DYNAMIC=FALSE
    Disables dynamic adjustment of the number of available threads.

OMP_NUM_THREADS=...
    The exact number of threads available to be used, or if OMP_DYNAMIC
    is TRUE, the upper limit on the number of available threads.

XLFRTEOPTS=intrinthds={num_threads}
    Specifies the number of threads for parallel execution of the
    MATMUL and RANDOM_NUMBER intrinsic procedures. The default value
    for num_threads when using the MATMUL intrinsic equals the number
    of processors online. The default value for num_threads when using
    the RANDOM_NUMBER intrinsic is equal to the number of processors
    online*2. Changing the number of threads available to the MATMUL
    and RANDOM_NUMBER intrinsic procedures can influence performance.

XLSMPOPTS
    A list of runtime settings affecting SMP execution. Here are some
    of the possibilities:

    SCHEDULE=STATIC
        Work is scheduled to threads round-robin.

    SPINS=0
        Allows work-requests to spin indefinitely without the thread
        having to yield the time-slice.

    STACK=....
        Specifies the largest allowable size of a thread's stack, in
        bytes.

    YIELDS=0
        Allows the thread to yield an indefinite number of times
        without being driven into a sleep state.

    STARTPROC=n
        When assigning threads to CPUs, begin with thread n on CPU n.

    STRIDE=X
        When assigning the next thread to a CPU, add X to the current
        CPU index instead of using (CPU+1).

System & Process Management:
============================
The following commands are used to bind processes to processors in
SPEC/CPU runs. The SPEC/CPU harness uses the $SPECUSERNUM variable to
enumerate the different processes in a rate-run; in the text of the
SPEC/CPU config-file, this is expressed as "\$SPECUSERNUM" in order for
the variable-name to be evaluated at runtime.

bindprocessor X Y
    AIX command, binding process X to CPU Y. Note that this binds the
    main process thread, while OMP child threads may be assigned to
    other CPUs.

smtctl -m on -w boot
smtctl -m off -w boot
    AIX commands enabling & disabling SMT (Simultaneous
    Multi-Threading) which allows a single CPU core to process multiple
    execution threads simultaneously. These forms of the command must
    be followed by a "bosboot -a" command and a "shutdown -r" reboot.

drmgr -r -c cpu
    AIX command, deallocating one processor from the Operating System
    partition so it is not available for computation. The processors
    are reallocated on system reboot.
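
As an illustration of how the settings listed under "AIX Environment Variables" might be combined before an OpenMP run, here is a minimal sketch. The thread count of 4 and the STARTPROC/STRIDE values are hypothetical and are not taken from any published result. The sketch also assumes that XLSMPOPTS sub-options are written in lower case and joined with colons. The STACK sub-option is left out because its value is not given above.

```shell
# Illustrative values only: NTHREADS and the CPU placement below are
# assumptions made for this sketch, not settings from a real submission.
NTHREADS=4

# Memory and allocator behaviour, as described in the variable list.
export MEMORY_AFFINITY=MCM
export MALLOCOPTIONS=multiheap

# Fix the OpenMP thread count and disable dynamic adjustment.
export OMP_DYNAMIC=FALSE
export OMP_NUM_THREADS="$NTHREADS"

# Thread count for the MATMUL and RANDOM_NUMBER intrinsics.
export XLFRTEOPTS="intrinthds=$NTHREADS"

# SMP runtime settings; sub-options are assumed to be colon-separated.
export XLSMPOPTS="schedule=static:spins=0:yields=0:startproc=0:stride=1"

# Print the resulting settings so they can be recorded with the run.
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
echo "XLFRTEOPTS=$XLFRTEOPTS"
echo "XLSMPOPTS=$XLSMPOPTS"
```

With stride=1 the placement matches the default (CPU+1) stepping; a larger stride would spread consecutive threads further apart.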