FLAG DESCRIPTIONS
SUN C AND FORTRAN
Sun ONE Studio 8 (S1S8)
          03/09/03

Compiler Flags

Flag                               Description

-autopar			   Perform automatic loop parallelization.

-D                                 Set definition for preprocessor.

-dalign                            Assume double-type data is double
                                   aligned.

-dn                                Specify static binding.

-e                                 Accept extended (132 character) input
                                   source lines (FORTRAN).

-fast                              This is a convenience option for selecting
				   a set of optimizations for performance and
				   it chooses the following switches that are
				   defined elsewhere in this page:

				   (C)
	                             -D__MATHERR_ERRNO_DONTCARE
				     -dalign
				     -fns
				     -fsimple=2
				     -fsingle
				     -ftrap=%none
				     -xalias_level=basic
				     -xbuiltin=%all
				     -xdepend
				     -xlibmil
				     -xO5
				     -xprefetch=auto,explicit
				     -xtarget=native

			      	   (C++)
				     -dalign
				     -fns
				     -fsimple=2
				     -ftrap=%none
				     -xbuiltin=%all
				     -xlibmil
				     -xlibmopt
				     -xO5
				     -xtarget=native

			      	   (Fortran)
				     -dalign
				     -depend
				     -fns
				     -fsimple=2
				     -ftrap=common
				     -xlibmil
				     -xlibmopt
				     -xO5
				     -xpad=local
				     -xprefetch=auto,explicit
				     -xtarget=native
				     -xvector=yes

-fixed                             Accept fixed-format input source files
                                   (FORTRAN).

-fns                               Select non-standard floating point
                                   mode.

                                   This flag causes the nonstandard
                                   floating point mode to be enabled when
                                   a program begins execution. By default,
                                   the nonstandard floating point mode
                                   will not be enabled automatically.

                                   On some SPARC systems, the nonstandard
                                   floating point mode disables "gradual
                                   underflow", causing tiny results to be
                                   flushed to zero rather than producing
                                   subnormal numbers. It also causes
                                   subnormal operands to be silently
                                   replaced by zero. On those SPARC
                                   systems that do not support gradual
                                   underflow and subnormal numbers in
                                   hardware, use of this option can
                                   significantly improve the performance
                                   of some programs.

                                   Warning: When nonstandard mode is
                                   enabled, floating point arithmetic may
                                   produce results that do not conform
                                   to the requirements of the IEEE 754
                                   standard. See the Numerical Computation
                                   Guide for more information.
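
As an illustration of the gradual underflow that -fns may disable, the
following minimal C sketch halves the smallest normal double. Under
standard IEEE 754 arithmetic the result is a nonzero subnormal number;
in a flush-to-zero mode it would be 0.0 instead. The function names are
hypothetical and are not part of any Sun library.

```c
#include <float.h>
#include <math.h>

/* Returns 1 if x is a subnormal (denormalized) double, 0 otherwise. */
int is_subnormal(double x)
{
    return fpclassify(x) == FP_SUBNORMAL;
}

/* Halves the smallest normal double (DBL_MIN). With gradual underflow
 * the result is a nonzero subnormal; with flush-to-zero it would be 0.0. */
double half_of_smallest_normal(void)
{
    volatile double tiny = DBL_MIN;  /* volatile: keep the division at run time */
    return tiny / 2.0;
}
```

A program whose results depend on such tiny values is one that may
behave differently when nonstandard mode is enabled.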

-fsimple=0                         Permits no simplifying assumptions.
                                   Preserves strict IEEE 754 conformance.

-fsimple=1                         With -fsimple=1, the optimizer can
                                   assume the following:

                                   o The IEEE 754 default
                                   rounding/trapping modes do not change
                                   after process initialization.

                                   o Computations producing no visible
                                   result other than potential
                                   floating-point exceptions may be
                                   deleted.

                                   o Computations with Infinity or NaNs as
                                   operands need not propagate NaNs to
                                   their results. For example, x*0 may be
                                   replaced by 0.

                                   o Computations do not depend on sign of
                                   zero.
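
The last two assumptions can be made concrete with a short C sketch.
Under strict IEEE 754 arithmetic, multiplying a NaN or an Infinity by
zero yields NaN, and multiplying a negative number by +0 yields -0; an
optimizer that replaces x*0 by 0 would change both results. The helper
names are hypothetical.

```c
#include <math.h>

/* Multiplies x by zero at run time; the volatile operand discourages
 * the compiler from folding the product at compile time. */
double times_zero(double x)
{
    volatile double zero = 0.0;
    return x * zero;
}

/* Nonzero if x*0 is NaN, as IEEE 754 requires for NaN or Infinity x. */
int result_is_nan(double x)
{
    return isnan(times_zero(x)) != 0;
}

/* Nonzero if x*0 is negative zero, as IEEE 754 requires for negative x. */
int result_is_negative_zero(double x)
{
    double r = times_zero(x);
    return r == 0.0 && signbit(r) != 0;
}
```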

-fsimple=2                         Permits aggressive floating point
                                   optimizations that may cause programs
                                   to produce different numeric results
                                   due to changes in rounding. Even with
                                   -fsimple=2, the optimizer still is not
                                   permitted to introduce a floating point
                                   exception in a program that otherwise
                                   produces none.
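
One familiar way that rounding changes can alter a result is a change
in the order of additions. The sketch below is only an illustration of
that effect; it is an assumption, not a statement from this page, that
the optimizer performs this particular transformation. The function
names are hypothetical.

```c
/* (a + b) + c, evaluated as written. */
double sum_left(double a, double b, double c)
{
    volatile double t = a + b;  /* volatile: force rounding of the partial sum */
    return t + c;
}

/* a + (b + c): the same operands with a different association. */
double sum_right(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}
```

With a = 1e30, b = -1e30 and c = 1.0, sum_left returns 1.0 while
sum_right returns 0.0, because 1.0 is lost when it is added to -1e30.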

-fsimple[=n]                       Allows the compiler to make simplifying
                                   assumptions concerning floating-point
                                   arithmetic.

-ftrap=t                           Sets the IEEE 754 trapping mode in
                                   effect at startup.

                                   t is a comma-separated list that
                                   consists of one or more of the
                                   following: %all, %none, common,
                                   [no%]invalid, [no%]overflow,
                                   [no%]underflow, [no%]division,
                                   [no%]inexact.

                                   The default is -ftrap=%none.

                                   This option sets the IEEE 754 trapping
                                   modes that are established at program
                                   initialization. Processing is
                                   left-to-right. The common exceptions,
                                   by definition, are invalid, division by
                                   zero, and overflow.

                                   o %none, the default, turns off all
                                   trapping modes.

                                   Do not use this option for programs
                                   that depend on IEEE standard exception
                                   handling; you can get different
                                   numerical results, premature program
                                   termination, or unexpected SIGFPE
                                   signals.

-libmil                            Use inline expansion templates for
                                   libm.

-lm                                Link with math library.

-lmopt                             This chooses the math library that is
                                   optimized for speed.

-lmtmalloc                         Fast concurrent malloc library suitable for
                                   multi-threaded applications.

-native                            Select native machine characteristics
                                   for optimization.

-openmp				   Enable explicit parallelization with
				   Fortran 90 OpenMP directives.

-pad				   Synonymous with -xpad (see -xpad below)

-Qoption <phase> <flags>           Pass flags along to compiler phase:

                                   f90comp Fortran first pass

                                   iropt Global optimizer

                                   cg Code generator

-Qoption iropt <flags>             See -W2,<flags> below.

-Qoption iropt -Ainline[:cp=<n>]   Control the optimizer's loop inliner:
[:cs=<n>][:inc=<n>][:irs=<n>]	        cp=<n> The minimum call site frequency
[:mi][:recursion=1]			       counter in order to consider
					       a routine for inlining
               			        cs=<n> Set inline callee size limit to
					       n. The unit roughly corresponds
					       to the number of instructions.
               				inc=<n> The inliner is allowed to
						increase the size of
						the program by up to n%.
               				irs=<n> Allow routines to increase by
						up to n. The unit roughly
						corresponds to the number of
						instructions.
               				mi Perform maximum inlining (without
					   considering code size increase).
               				recursion=1 Allow routines that are
						    called recursively to
						    still be eligible for
						    inlining.

-Qoption iropt -Apf:const	   Mark prefetch candidates with detailed
				   analysis of constants in array subscripts.

-Qoption iropt -Apf:largedim       Mark prefetch candidates by assuming a large
				   first-dimension size for all arrays with
				   unknown sizes at compile time.

-Qoption iropt -Apf:outer=<n>	   Turn on (1) prefetch candidates marking in
				   the outer loop. 0 turns it off.

-Qoption iropt -Apf:pdl=1	   Do prefetching for one-level indirect
				   memory references.

-Qoption iropt -Atile:skewp[:b<n>] Perform loop tiling which is enabled by
				   loop skewing. Loop skewing transforms
				   a non-fully interchangeable loop nest to
				   a fully interchangeable loop nest.
				   The optional b<n> sets the tiling block
				   size to n.

-Qoption iropt -Aujam:inner=g      Increase the probability that
				   small-trip-count inner loops will be fully
				   unrolled.

-Qoption iropt -Athr		   Perform tree height reduction optimizations.

-Qoption iropt -whole              Do whole program optimizations.

-Qoption cg -Qlp=<n>[-av=<n>]      Control irregular loop prefetching:
[-t=<n>][-fa=<n>][-fl=<n>] 		lp=<n> Turns the module on (1) or
[-ol=<n>]				       off (0) (default is on
					       for F90; off for C/C++)
               				-av=<n> Sets the prefetch look ahead
						distance, in bytes.
						Default is 256.
               				-t=<n> Sets the number of attempts at
					       prefetching. If not specified,
					       t=2 if -xprefetch_level=3 has
					       been set; otherwise, defaults
					       to t=1.
               				-fa=<n> 1=Force user settings to
						override internally computed
						values.
               				-fl=<n> 1=Force the optimization to
						be turned on for all languages.
					-ol=<n> Turns on (1) prefetching for
						outer loop.

-Qoption cg -Qms_pipe+prefolim=<n> Set prefetch ahead distance assuming that
				   the number of outstanding prefetches is <n>.
				   With larger <n>, the ahead distance gets
				   larger. Default value for <n> is 8 on
				   UltraSPARC-III.

-stackvar                          Allocate routine local variables on
                                   stack (FORTRAN).

-W<phase>,<flags>                  Pass flags along to compiler phase
                                   (2=optimizer, c=code generator)

-W2,<flags>                        Also see -Qoption iropt <flags> above.

-W2,-Ainline:call_in_pragma	   Consider functions called in parallel
                                   regions and loops as candidates
                                   for inlining

-W2,-Apf:outer=<n>	           Turn on (1) prefetch candidates marking in
				   the outer loop. 0 turns it off.

-Xa                                Assume ANSI C conformance, allow K & R
                                   extensions. (default mode)

-xalias_level=<a>                  Allows compiler to perform type-based
                                   alias analysis at the given alias
                                   level (C).

                                   basic assume ISO C9X aliasing rules for
                                   basic types only.

                                   std assume ISO C9X aliasing rules.

                                   strong assume all pointers are type
                                   safe (strongly typed).
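
Type-based alias analysis relies on objects being accessed through
pointers of a compatible type. As a minimal sketch, the C function
below inspects the bits of a float with memcpy rather than through a
cast pointer such as *(uint32_t *)&f, so it does not depend on how
aggressively aliasing is analyzed. It assumes a 32-bit IEEE 754 float;
the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Returns the object representation of f as a 32-bit integer.
 * Assumes sizeof(float) == 4 (IEEE 754 single precision). */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* byte copy: no typed pointer access */
    return bits;
}
```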

-xarch=<a>                         Limit the set of instructions the
                                   compiler may use.

-xbuiltin=%all (C, C++)            Substitute intrinsic functions or inline
				   system functions where profitable for
				   performance.

-Xc                                Assume strict ANSI C conformance.

-xcache=<c>                        Defines the cache properties for use by
                                   the optimizer.

                                   c must be one of the following:

                                   o native (set parameters for the host
                                   environment)

                                   o s1/l1/a1

                                   o s1/l1/a1:s2/l2/a2

                                   o s1/l1/a1:s2/l2/a2:s3/l3/a3

                                   The si/li/ai are defined as follows:

                                   si The size of the data cache at level
                                   i, in kilobytes.

                                   li The line size of the data cache at
                                   level i, in bytes.

                                   ai The associativity of the data cache
                                   at level i.
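
The si/li/ai format can be illustrated with a small C sketch that
parses such a string and derives the number of cache sets from the
three values. The parser, the struct and the sample values used in the
note after the code are hypothetical and are not taken from the
compiler; the native keyword is not handled.

```c
#include <stdio.h>

struct cache_level {
    long size_kb;     /* si: data cache size in kilobytes */
    long line_bytes;  /* li: line size in bytes */
    long assoc;       /* ai: associativity */
};

/* Parses "s1/l1/a1[:s2/l2/a2[:s3/l3/a3]]" into out[].
 * Returns the number of levels (1 to 3), or -1 on malformed input. */
int parse_xcache(const char *spec, struct cache_level out[3])
{
    int n = 0;
    while (n < 3) {
        int used = 0;
        struct cache_level *lv = &out[n];
        if (sscanf(spec, "%ld/%ld/%ld%n", &lv->size_kb, &lv->line_bytes,
                   &lv->assoc, &used) != 3)
            return -1;
        if (lv->size_kb <= 0 || lv->line_bytes <= 0 || lv->assoc <= 0)
            return -1;
        n++;
        spec += used;
        if (*spec == '\0')
            return n;       /* end of string after a complete level */
        if (*spec != ':')
            return -1;      /* unexpected character after a level */
        spec++;
    }
    return -1;              /* more than three levels */
}

/* Number of sets in one level: total bytes / (line size * associativity). */
long cache_sets(const struct cache_level *lv)
{
    return lv->size_kb * 1024 / (lv->line_bytes * lv->assoc);
}
```

For the made-up value 64/32/4:8192/512/2, the first level has
64*1024/(32*4) = 512 sets.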

-xchip=<c>                         Specifies the target processor for use
                                   by the optimizer. ultra3 (C, C++,
                                   Fortran) for UltraSPARC-III based
                                   machines.

-xdepend                           Analyze loops for data dependencies.
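
As a rough illustration of what a data dependence is, the C sketch
below contrasts a loop whose iterations are independent with one that
carries a dependence from each iteration to the next. Which loops the
compiler actually transforms is not specified here; the function names
are hypothetical.

```c
/* Independent iterations: a[i] depends only on b[i] and c[i],
 * provided the arrays do not overlap. */
void add_arrays(int n, double *a, const double *b, const double *c)
{
    for (int i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Loop-carried dependence: a[i] needs a[i-1] from the previous
 * iteration, so the iterations cannot simply be reordered. */
void running_sum(int n, double *a, const double *b)
{
    if (n <= 0)
        return;
    a[0] = b[0];
    for (int i = 1; i < n; i++)
        a[i] = a[i - 1] + b[i];
}
```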

-xipo=n                            Performs optimizations across all
                                   object files in the link step: 0=off,
                                   1=on, 2=performs whole-program
                                   detection and analysis.

-xlibmopt                          This chooses the math library that is
                                   optimized for speed.

-xlic_lib=sunperf		   Link with Sun Performance library (this
				   library implements optimized BLAS 1,2,3,
				   LAPACK, FFTPACK, Sparse linear algebra and
				   other mathematical functions).

-xO1                               Does basic local optimization
                                   (peephole).

-xO2                               -xO1 plus more local and global
                                   optimizations.

-xO3                               Besides what -xO2 does, it optimizes
                                   references or definitions for external
                                   variables. Loop unrolling and software
                                   pipelining are also performed.

-xO4                               -xO3 plus function inlining.

-xO5                               Besides what -xO4 does, it enables
                                   speculative code motion.

-xopenmp			   Enable explicit parallelization with
				   C OpenMP directives.
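
A minimal sketch of a C OpenMP directive of the kind this flag enables
is shown below. The pragma is standard OpenMP; the function name is
hypothetical. When OpenMP is not enabled the pragma is ignored and the
loop runs serially.

```c
/* Sums the squares of x[0..n-1]. The directive asks for the loop to be
 * run in parallel, with per-thread partial sums combined into sum. */
double sum_of_squares(int n, const double *x)
{
    double sum = 0.0;
    int i;
#pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += x[i] * x[i];
    return sum;
}
```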

-xpad=common[:<n>]                 Pad common block variables, for better
                                   use of cache. n specifies the amount of
                                   padding to apply, in units that are the
				   same size as the array elements. If no
				   parameter is specified then the compiler
				   selects one automatically.

-xpad=local[:<n>]                  Pad local variables only, for better
                                   use of cache. n specifies the amount of
                                   padding to apply, in units that are the
				   same size as the array elements. If no
				   parameter is specified then the compiler
				   selects one automatically.

-xpagesize=<n> 			   Set the preferred page size for running
				   the program.

-xprefetch[=value]                 Enable prefetch instructions on those
                                   architectures that support prefetch,
                                   such as UltraSPARC II (-xarch=v8plus,
                                   v8plusa, v8plusb, v9, v9a, or v9b).

                                   auto

                                   Enable automatic generation of prefetch
                                   instructions

                                   no%auto

                                   Disable automatic generation of
                                   prefetch instructions

                                   explicit

                                   Enable explicit prefetch macros

                                   no%explicit

                                   Disable explicit prefetch macros

                                   yes

                                   -xprefetch=yes is the same as
                                   -xprefetch=auto,explicit

                                   no

                                   -xprefetch=no is the same as
                                   -xprefetch=no%auto,no%explicit

                                   Defaults

                                   If -xprefetch is not specified,
                                   -xprefetch=no%auto,explicit is assumed.

                                   If only -xprefetch is specified,
                                   -xprefetch=auto,explicit is assumed.

-xprefetch=latx:<n> 		   Adjust the compiler's assumptions about
				   prefetch latency by the specified factor.
				   Typically values in the range of 0.5 to 2.0
				   will be useful. A lower number might
				   indicate that data will usually be cache
				   resident; a higher number might indicate
				   a relatively larger gap between the
				   processor speed and the memory speed
				   (compared to the assumptions built into
				   the compiler).

-xprefetch_level=<n>		   Control how aggressively the compiler
				   inserts prefetches in loops:
				   -xprefetch_level=1	compiler inserts
							prefetches only in
							loops with no control
							flow
				   -xprefetch_level=2	compiler inserts
							prefetches in loops
							with control flow
				   -xprefetch_level=3	compiler aggressively
							inserts prefetches in
							loops with control
							flow

-xprofile                          As used in the notes section of the report,
                                   an indication that profiling was used.  The
                                   configuration file used the following:
                                   fdo_pre0:  rm -rf ./feedback.profile \
                                              ./SunWS_cache
                                   PASS1:     -xprofile=collect:./feedback
                                   PASS2:     -xprofile=use:./feedback

-xprofile=collect                  Collect profile data for feedback
                                   directed optimizations.

-xprofile=use                      Use data collected for profile
                                   feedback.

-xreduction			   Recognize reduction operations in loops.

-xrestrict[=f1,...,f2,%all,        Treat pointer-valued function
%none]                             parameters as restricted pointers. The
                                   default is %none. Specifying -xrestrict
                                   is equivalent to specifying
                                   -xrestrict=%all.
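
For comparison, the C99 restrict qualifier states the same kind of
promise in the source for individual parameters. The sketch below is an
illustration only; the function name is hypothetical, and callers must
ensure that the two arrays really do not overlap.

```c
/* y[i] += a * x[i]. The restrict qualifiers promise that x and y do
 * not overlap, so loads of x need not be repeated after stores to y. */
void axpy(int n, double a, const double *restrict x, double *restrict y)
{
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];
}
```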

-Xt                                Assume K & R conformance, allow ANSI C.

-xtarget=native                    Same as -native

-xvector			   Enable automatic calls to SPARC vector
				   math library functions.


------------------------------------------------------------------
Kernel Parameters

Flag                               Description

shmsys:shminfo_shmmin              Minimum size of System V shared memory
                                   segment that can be created.

shmsys:shminfo_shmmax              Maximum size of System V shared memory
                                   segment that can be created. This
                                   parameter is an upper limit that is
                                   checked before the system sees if it
                                   actually has the physical resources to
                                   create the requested memory segment.

shmsys:shminfo_shmmni              System wide limit on number of shared
                                   memory segments that can be created.

shmsys:shminfo_shmseg              Limit on the number of shared memory
                                   segments that any one process can
                                   create.

tune_t_fsflushr			   Specifies the number of seconds between
				   fsflush (system daemon for file system
				   flushing) invocations.

autoup				   Along with tune_t_fsflushr, autoup controls
				   the amount of memory examined for dirty
				   pages in each invocation and the frequency
				   of file system sync operations.

segvn_comb_thrshld		   Specifies a threshold when two adjacent
				   segvn (vnode segment driver) segments
				   should be concatenated together.

------------------------------------------------------------------

Environment Variables

Flag                               Description

OMP_DYNAMIC			   Enables or disables dynamic adjustment of
				   the number of threads available for
				   execution of parallel regions.

OMP_NUM_THREADS			   Sets the number of threads to use
				   during execution, unless that number is
				   explicitly changed by calling the
				   OMP_SET_NUM_THREADS subroutine.

MT_BIND_PROCESSOR		   This environment variable can be used to
				   bind the LWPs (lightweight processes) of
				   a multithreaded program to processors.
				   Performance can be enhanced with processor
				   binding, but performance degradation
				   will occur if multiple LWPs are bound to
				   the same processor.
				   MT_BIND_PROCESSOR = TRUE: bind LWPs to
							     processors.
				   MT_BIND_PROCESSOR = FALSE: do not bind LWPs
							      to processors
							      (default).
				   For MT_BIND_PROCESSOR=TRUE, LWPs are bound
				   in a round-robin fashion starting with the
				   processor whose "virtual-id" is 0.  Virtual
				   processor IDs are consecutive integers that
				   start with 0, and may or may not be
				   identical to the actual processor
				   IDs.  If n processors are available online,
				   then their virtual processor IDs are 0, 1,
				   ..., n-1.
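
The round-robin rule can be sketched in C as follows. It is an
assumption of this sketch that LWPs are numbered from 0 in the order in
which they are bound; the function name is hypothetical.

```c
/* Virtual processor ID for the k-th LWP (k counted from 0) when
 * n_online processors are available. Returns -1 for invalid input. */
int virtual_id_for_lwp(int k, int n_online)
{
    if (k < 0 || n_online <= 0)
        return -1;
    return k % n_online;  /* wrap around after the last processor */
}
```

With 4 processors online, LWPs 0 to 3 map to virtual IDs 0 to 3, and
LWP 4 wraps around to virtual ID 0, which is where two LWPs begin to
share a processor.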

STACKSIZE			   A default stacksize of 4 MB (for 32-bit
				   programs) and 8 MB (for 64-bit programs) is
				   used for additional threads created in
				   an OpenMP program. The environment variable
				   STACKSIZE can be used to set it to
				   a different value. For example,
				   setenv STACKSIZE 2048 creates threads with
				   stacksize of 2 MB each.
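
The example above implies that the value is given in kilobytes. Under
that reading, a minimal C sketch of the resulting stack size in bytes
is shown below; the function is hypothetical and does not describe how
the runtime library parses the variable.

```c
#include <stdlib.h>

/* Stack size in bytes for a STACKSIZE value given in kilobytes.
 * value == NULL means the variable is unset, so the default applies:
 * 4 MB for 32-bit programs, 8 MB for 64-bit programs.
 * Returns -1 if the value is not a positive decimal number. */
long stacksize_bytes(const char *value, int is_64bit)
{
    char *end;
    long kb;

    if (value == NULL)
        return (is_64bit ? 8L : 4L) * 1024L * 1024L;
    kb = strtol(value, &end, 10);
    if (end == value || *end != '\0' || kb <= 0)
        return -1;
    return kb * 1024L;
}
```

A caller could pass getenv("STACKSIZE") as the first argument.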

MPSSHEAP=<n> 			   Specify the preferred page size for heap.
				   The specified page size is applied to all
				   created processes.

MPSSSTACK=<n> 			   Specify the preferred page size for stack.
				   The specified page size is applied to all
				   created processes.

LD_PRELOAD=mpss.so.1 (Unix)        Allow use of the mpss.so.1 shared object,
				   which provides a means by which preferred
				   stack and/or heap page sizes can be selected.

--------------------------------------------------------

src.alt modification for 331.art_l peak runs

The src.alt for 331.art_l is a modified version of the file scanner.c.

$ diff scanner.c src.alt/fabs/scanner.c
756c756
<     if (abs(ttemp - f1_layer[o][ti].P) < FLOAT_COMPARE_TOLERANCE)
---
>     if (fabs(ttemp - f1_layer[o][ti].P) < FLOAT_COMPARE_TOLERANCE)
903c903
<     if (abs(ttemp - f1_layer[o][ti].P) < FLOAT_COMPARE_TOLERANCE)
---
>     if (fabs(ttemp - f1_layer[o][ti].P) < FLOAT_COMPARE_TOLERANCE)

The integer abs function is replaced by the floating-point absolute
value function (fabs) in two places. The same substitution is made in
ompm2001-isoc-20020619 for 330.art in OMPM:
330.art/src/src.alt/hpg.1/scanner.c.
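
The effect of this kind of change can be sketched in C. When abs from
<stdlib.h> is applied to a floating-point difference, the argument is
converted to int first, so any difference smaller than 1 in magnitude
becomes 0 and passes the tolerance test. The helper names and the
tolerance value used below are hypothetical; the explicit cast stands
in for the implicit conversion.

```c
#include <math.h>
#include <stdlib.h>

/* Comparison using the integer abs: the fractional part of the
 * difference is discarded before the absolute value is taken. */
int close_with_abs(double a, double b, double tol)
{
    return abs((int)(a - b)) < tol;
}

/* Comparison using fabs: the full floating-point difference is kept. */
int close_with_fabs(double a, double b, double tol)
{
    return fabs(a - b) < tol;
}
```

For a = 1.0, b = 1.9 and a tolerance of 1e-6, close_with_abs reports
the values as equal, while close_with_fabs does not.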