<?xml version="1.0"?>

<!DOCTYPE flagsdescription SYSTEM

       "http://www.spec.org/dtd/cpuflags1.dtd">



<!-- This file defines flags for use with the PGI 7.1-3 and PathScale 3.0 compilers on AMD Opteron (Family 10h) processors -->

<flagsdescription>



<filename>amd814GH-flags</filename>



<title>PGI 7.1-3 and PathScale 3.0 Compilers for Linux. Optimization, Compiler, and Other flags for use by SPEC CPU2006</title>



<!-- Style -->



<style>

<![CDATA[

body {

  margin:  1em;

  border:  0;

  padding: 0;

  background-repeat: no-repeat;

  background-attachment: fixed;

  background-position: 100% 0;

  color:      black;

  font-family: "Times Roman", times, serif;

}



div.flagDesc {

  clear: both;

  color: black;

  background-color: #d6e7f7;

  border: 1px solid blue;

  margin: 0 auto;

  width: 90%;

}



ul.flagTOC {

  list-style-type: none;

  margin: 0;

  padding: 0;

}



ul.flagTOC > li {

  border: 1px solid #d6e7f7;

  background: #d6e7f7;

}



ul.flagTOC > li > a:link {

   color: blue;

}



ul.SubMenu li {

  border: 1px solid #d6e7f7; /* rgb(211, 211, 211); */

}



ul.SubMenu {

  border: 1px solid blue;

  background-color: #d6e7f7;

}

]]>

</style>





<!-- Header -->



<header>

<![CDATA[

<div id='banner'>

<h2><b>Compilers: PGI 7.1-3, PathScale 3.0</b></h2>

<h2><b>Operating systems: Linux</b></h2>

</div>

]]>

</header>



<!-- Platform Settings -->

<platform_settings>

<![CDATA[

<p><b>Linux Huge Page settings</b></p>

<p>To take full advantage of PGI's huge page runtime library, your system must be configured to use huge pages.
It is safe to run binaries compiled with "-Msmartalloc=huge" on systems that are not configured to use huge pages; however, they will not
benefit from the performance improvements that huge pages offer. To configure your system for huge pages, perform the following steps:
</p>

<ul>

      <li>Create a mount point for the huge pages: "mkdir /mnt/hugepages"</li>

      <li>The huge page file system needs to be mounted each time the system boots. Add the following to a system boot configuration file before any services are started: "mount -t hugetlbfs nodev /mnt/hugepages"</li>

      <li>Set vm/nr_hugepages=N in your /etc/sysctl.conf file, where N is the maximum number of huge pages the system may allocate.</li>

      <li>Reboot to have the changes take effect.</li>

</ul>

<p>Further information about huge pages can be found in the Linux kernel documentation file /usr/src/linux/Documentation/vm/hugetlbpage.txt.</p>
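<p>Choosing N is simple arithmetic: divide the memory you want to reserve by the huge page size and round up. The Python sketch below is illustrative only. The helper names are hypothetical, and it assumes the kernel reports the page size on a "Hugepagesize:" line of /proc/meminfo (typically 2048 kB on x86-64). It emits the dotted key vm.nr_hugepages, which sysctl accepts in place of vm/nr_hugepages.</p>

```python
# Hypothetical helpers for sizing vm.nr_hugepages; not part of PGI or Linux.

def hugepage_size_bytes(meminfo_text):
    """Parse the huge page size (reported in kB) from /proc/meminfo text."""
    for line in meminfo_text.splitlines():
        if line.startswith("Hugepagesize:"):
            return int(line.split()[1]) * 1024
    raise ValueError("no Hugepagesize line found")


def nr_hugepages_for(budget_bytes, page_bytes):
    """Smallest N such that N huge pages cover budget_bytes (ceiling division)."""
    return -(-budget_bytes // page_bytes)


def sysctl_line(n):
    """Line to append to /etc/sysctl.conf."""
    return f"vm.nr_hugepages = {n}"


if __name__ == "__main__":
    # On a real system: meminfo = open("/proc/meminfo").read()
    meminfo = "MemTotal:       16384000 kB\nHugepagesize:       2048 kB\n"
    page = hugepage_size_bytes(meminfo)
    print(sysctl_line(nr_hugepages_for(4 * 1024**3, page)))  # vm.nr_hugepages = 2048
```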

<p><b>PGI_HUGE_PAGES</b></p>

<p>The maximum number of huge pages an application is allowed to use can be set at run time via the environment variable PGI_HUGE_PAGES. If it is not set, the process may use all available huge pages when compiled with "-Msmartalloc=huge", or a maximum of <b>n</b> pages when compiled with "-Msmartalloc=huge:<b>n</b>".</p>

<p><b>Using numactl to bind processes and memory to cores</b></p>

<p>For multi-copy runs, or for single-copy runs on systems with multiple sockets, it is advantageous to bind each process to a particular core. Otherwise, the OS may move a process from one core to another, which can affect performance. To help with this, SPEC allows the use of a "submit" command in which users can specify a utility that binds processes. We have found the utility 'numactl' to be the best choice.</p>

<p>numactl runs processes with a specific NUMA scheduling or memory placement policy. The policy is set for a command and inherited by all of its children. The numactl flag "--physcpubind" specifies the core(s) to which the process is bound. "-l" instructs numactl to keep a process's memory on the local node, while "-m" specifies the node(s) on which to place a process's memory. For full details on using numactl, please refer to your Linux documentation ('man numactl').</p>

<p>We have found that some versions of numactl, particularly the version found on SLES 10, incorrectly interpret application arguments as their own. For example, with the command "numactl --physcpubind=0 -l a.out -m a", numactl will interpret a.out's "-m" option as its own "-m" option. To work around this problem, put the command to be run in a shell script and then run the shell script using numactl. For example: "echo 'a.out -m a' > run.sh ; numactl --physcpubind=0 bash run.sh"</p>
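<p>Here is a Python sketch of that workaround. It assumes copy N of a benchmark is bound to core N and that the wrapper is named run.sh, as in the example. Both choices, and the helper names, are illustrative rather than part of numactl or the SPEC tools. Because numactl only sees "bash run.sh", the application's own "-m" option never reaches numactl's option parser.</p>

```python
import os
import shlex
import tempfile


def write_wrapper(command, directory):
    """Save the benchmark command line in directory/run.sh and return its path."""
    path = os.path.join(directory, "run.sh")
    with open(path, "w") as f:
        f.write(command + "\n")
    return path


def numactl_argv(core, script_path):
    """Bind to one core (--physcpubind), allocate memory locally (-l), run via bash."""
    return ["numactl", f"--physcpubind={core}", "-l", "bash", script_path]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        script = write_wrapper("a.out -m a", workdir)
        # Print the command rather than run it, so the sketch works without numactl.
        print(shlex.join(numactl_argv(0, script)))
```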

<p><b>ulimit -s &lt;n&gt;</b></p>

<p>
       Sets the stack size to <b>n</b> kbytes, or <b>unlimited</b> to allow the stack size
       to grow without limit.
</p>

<p><b>ulimit -l &lt;n&gt;</b></p>

<p>
       Sets the maximum size of memory that may be locked into physical memory to <b>n</b> kbytes, or <b>unlimited</b> for no limit.
</p>
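<p>Both limits can also be inspected or changed from inside a program. The Python sketch below assumes Linux and the standard resource module. It converts a ulimit-style argument (kbytes, or "unlimited") into the byte count that setrlimit() expects. "ulimit -s" corresponds to RLIMIT_STACK and "ulimit -l" to RLIMIT_MEMLOCK. The helper names are hypothetical. Note that the sketch changes only the soft limit, whereas the shell's ulimit without -S or -H sets both limits.</p>

```python
import resource


def ulimit_arg_to_rlimit(arg):
    """Map a ulimit argument ("unlimited" or a size in kbytes) to an rlimit value."""
    if arg == "unlimited":
        return resource.RLIM_INFINITY
    return int(arg) * 1024


def set_soft_limit(which, arg):
    """Set the soft limit of `which` (e.g. resource.RLIMIT_STACK).

    Raises ValueError if the request exceeds the current hard limit.
    """
    _soft, hard = resource.getrlimit(which)
    resource.setrlimit(which, (ulimit_arg_to_rlimit(arg), hard))


if __name__ == "__main__":
    for name in ("RLIMIT_STACK", "RLIMIT_MEMLOCK"):
        soft, hard = resource.getrlimit(getattr(resource, name))
        print(name, soft, hard)
```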

]]>

</platform_settings>



<!-- Compilers -->



<flag name='pgcc' class='compiler' regexp="pgcc\b" >

<![CDATA[

 <p>The PGI C compiler.</p>

 ]]>

<example>pgcc</example>

</flag>



<flag name='pgcpp' class='compiler' regexp="pgcpp\b">

<![CDATA[

 <p>The PGI C++ compiler.</p>

 ]]>

<example>pgcpp</example>

</flag>



<flag name='pgf95' class='compiler' regexp="pgf95\b" >

<![CDATA[

 <p>The PGI Fortran 95 compiler.</p>

 ]]>

<example>pgf95</example>

</flag>



<flag name="pathcc" class="compiler" regexp="pathcc">

   <example>pathcc</example>

   <![CDATA[

     <p>Invoke the PathScale C compiler.<br>
     Also used to invoke the linker for C programs.</p>

   ]]>

</flag>



<flag name="pathCC" class="compiler" regexp="pathCC">

   <example>pathCC</example>

   <![CDATA[

      <p>Invoke the PathScale C++ compiler.<br>
      Also used to invoke the linker for C++ programs.</p>

   ]]>

</flag>



<flag name="pathf95" class="compiler" regexp="pathf95">

   <example>pathf95</example>

   <![CDATA[

      <p>Invoke the PathScale Fortran 77, 90, and 95 compilers.<br>
      Also used to invoke the linker for Fortran programs and
      for mixed C/Fortran programs. pathf90 and pathf95 are synonymous.</p>

   ]]>

</flag>





<!-- Portability, Other Flags. -->



<flag name="F-fno-second-underscore" compilers="pathcc, pathCC, pathf95" class="portability">

   <example>-fno-second-underscore</example>

   <![CDATA[

      <p><b>CFP2006:</b></p>

      <p>If -funderscoring is in effect and the original Fortran external
      identifier contains an underscore, -fsecond-underscore appends
      a second underscore to the one added by -funderscoring;
      -fno-second-underscore does not append a second underscore.
      The default is both -funderscoring and -fsecond-underscore, the
      same defaults that g77 uses. -fno-second-underscore corresponds
      to the default policies of PGI Fortran and Intel Fortran.</p>
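<p>The naming rule can be sketched as a small Python function. It assumes names are lower-cased, which is the usual convention on Linux, and the function name is hypothetical. It models the policy described above rather than any compiler's source code. A C routine called from Fortran must be defined under exactly the name this rule produces. For that reason, objects built with different underscoring policies may fail to link, with unresolved symbols.</p>

```python
def fortran_external_name(name, underscoring=True, second_underscore=True):
    """Return the linker symbol for a Fortran external identifier."""
    symbol = name.lower()                  # Fortran names are case-insensitive
    if not underscoring:
        return symbol                      # -fno-underscoring: no decoration
    symbol += "_"                          # -funderscoring: one trailing underscore
    if second_underscore and "_" in name:  # -fsecond-underscore: only names with "_"
        symbol += "_"
    return symbol


if __name__ == "__main__":
    print(fortran_external_name("SOLVE"))                              # solve_
    print(fortran_external_name("MY_SOLVE"))                           # my_solve__
    print(fortran_external_name("MY_SOLVE", second_underscore=False))  # my_solve_
```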



   ]]>

</flag>



<flag name="F-statics" class="other"

      compilers="pathcc, pathCC, pathf95" regexp="-static">

   <example>-static</example>

   <![CDATA[

      <p>-static: On systems that support dynamic linking, this prevents linking with
      the shared libraries. On other systems, this option has no effect.</p>

   ]]>

</flag>



<flag name="w" class="other"

 compilers="pgcc,pgcpp,pgf95" regexp="-w\b">

<![CDATA[

 <p>Disable warning messages.</p>

]]>

<example>-w</example>

</flag>



<flag name="Mnomain" class="portability"

 compilers="pgf95"

 regexp="-Mnomain\b">

<![CDATA[

 <p>Do not include the Fortran main program object module when linking.</p>

]]>

<example>-Mnomain</example>

</flag>



<flag name="c9x" class="optimization"

compilers="pgcc" regexp="-c9x\b">

<![CDATA[

 <p>Use C99 language features.</p>

]]>

<example>-c9x</example></flag>



<!-- Optimization Flags -->





<flag name="fast"

  class="optimization"

  compilers="pgcc, pgcpp, pgf95"

  regexp="-fasts?s?e?\b">

<![CDATA[

 <p>Chooses generally optimal flags for the target platform. As of the PGI 7.0 release, the flags "-fast"
and "-fastsse" are equivalent for 64-bit compilations. For 32-bit compilations, "-fast" does not include
"-Mscalarsse", "-Mcache_align", or "-Mvect=sse".</p>

 ]]>

<example>-fast</example>

<include flag="O2" />

<include flag="Munroll_c_n" flagtext="-Munroll=c:1" />

<include flag="Mautoinline" />

<include flag="Msmart" />

<include flag="Mlre" />

<include flag="Mnoframe" />

<include flag="Mvect_sse" />

<include flag="Mcache_align" />

<include flag="Mflushz" />

<include flag="Mdaz" />

<include flag="Mscalarsse" />

</flag>



<flag name="F-splitting:all" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-(CG|IPA|LNO|OPT|WOPT):([^:\s]+):(\S+)(?=\s|$)">

   <include text="-$1:$2" />

   <include text="-$1:$3" />

   <display enable="0" />

 This rule is used to split a flag group containing sub-options into multiple flag descriptions.
 Please refer to the flag file rules for the individual sub-options for the actual flag descriptions.

</flag>



<flag name="F-O_n" class="optimization" compilers="pathcc, pathCC, pathf95" regexp="-O[0-3]\b">

   <example>-O3</example>

   <![CDATA[

      <p>Specify the basic level of optimization desired.<br>
      The options can be one of the following:</p>

         <p style="text-indent: -25px; margin-left: 25px">
            0&nbsp;&nbsp;&nbsp; Turn off all optimizations.</p>

         <p style="text-indent: -25px; margin-left: 25px">
            1&nbsp;&nbsp;&nbsp; Turn on local optimizations that
            can be done quickly. Do peephole optimizations and
            instruction scheduling.</p>

         <p style="text-indent: -25px; margin-left: 25px">
            2&nbsp;&nbsp;&nbsp; Turn on extensive optimization.
            This is the default.<br>
            The optimizations at this level are generally conservative,
            in the sense that they are virtually always beneficial and
            avoid changes that affect such things as floating-point accuracy.
            In addition to the level 1 optimizations, do inner loop
            unrolling, if-conversion, two passes of instruction scheduling,
            global register allocation, dead store elimination,
            instruction scheduling across basic blocks,
            and partial redundancy elimination.</p>

         <p style="text-indent: -25px; margin-left: 25px">
            3&nbsp;&nbsp;&nbsp; Turn on aggressive optimization.<br>
            The optimizations at this level are distinguished from -O2
            by their aggressiveness, generally seeking the highest-quality
            generated code even if it requires extensive compile time.
            They may include optimizations that are generally beneficial
            but may hurt performance in some cases.<br>
            This includes, but is not limited to, turning on the
            Loop Nest Optimizer, -LNO:opt=1, and setting
            -OPT:roundoff=1:IEEE_arithmetic=2:Olimit=9000:reorg_common=ON.</p>

         <p style="text-indent: -25px; margin-left: 25px">
            s&nbsp;&nbsp;&nbsp; Specify that code size is to be given
            priority in trade-offs with execution time.</p>

      <p>If no value is specified, 2 is assumed.</p>

   ]]>

</flag>



<flag name="F-fno-math-errno" compilers="pathcc, pathCC, pathf95" class="optimization">

   <example>-fno-math-errno</example>

   <![CDATA[

      <p>Do not set ERRNO after calling math functions that are executed
      with a single instruction, e.g. sqrt. A program that relies on IEEE
      exceptions for math error handling may want to use this flag for speed
      while maintaining IEEE arithmetic compatibility. This is implied by
      -Ofast. The default is -fmath-errno.</p>

   ]]>

</flag>



<flag name="F-fexceptions" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-f(no-|)exceptions">

   <example>-fno-exceptions</example>

   <![CDATA[

      <p>(For C++ only) -fexceptions enables exception handling.
      This is the default.
      -fno-exceptions disables exception handling.</p>

   ]]>

</flag>



<flag name="F-ffast-math" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-f(no-|)fast-math">

   <example>-ffast-math</example>

   <![CDATA[

      <p>-ffast-math improves FP speed by relaxing ANSI and IEEE rules.
      -fno-fast-math tells the compiler to conform to ANSI and IEEE
      math rules at the expense of speed. -ffast-math implies
      -OPT:IEEE_arithmetic=2 -fno-math-errno. -fno-fast-math
      implies -OPT:IEEE_arithmetic=1 -fmath-errno.</p>

   ]]>

</flag>



<flag name="F-OPT:div_split" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-OPT:div_split=(on|off|0|1)\b">

    <example>-OPT:div_split</example>

   <![CDATA[

      <p>-OPT:div_split=(ON|OFF)<br>
      Enable or disable changing x/y into x*(recip(y)). This is OFF by
      default, but is enabled by -OPT:Ofast or -OPT:IEEE_arithmetic=3.
      This transformation generates fairly accurate code.</p>
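<p>Here is a Python sketch of the rewrite, using hypothetical function names. The reciprocal is computed once and reused, so a loop of divisions by the same y becomes a single division plus cheaper multiplications. The two versions typically agree to within an ulp or so, but they are not guaranteed to be bit-identical. That is why the transformation is tied to relaxed IEEE settings.</p>

```python
def divide_each(values, y):
    """Reference version: one division per element."""
    return [x / y for x in values]


def divide_each_split(values, y):
    """div_split-style version: one reciprocal, then a multiply per element."""
    recip = 1.0 / y
    return [x * recip for x in values]


if __name__ == "__main__":
    xs = [float(i) for i in range(1, 1001)]
    exact = divide_each(xs, 49.0)
    split = divide_each_split(xs, 49.0)
    worst = max(abs(a - b) / a for a, b in zip(exact, split))
    mismatches = sum(a != b for a, b in zip(exact, split))
    print(f"{mismatches} of {len(xs)} results differ; max relative diff {worst:.2e}")
```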

   ]]>

</flag>



<flag name="F-OPT:Olimit" class="optimization" compilers="pathcc, pathCC, pathf95" regexp="-OPT:Olimit=(\d+)">

    <example>-OPT:Olimit</example>

   <![CDATA[

      <p>-OPT:Olimit=N<br>
      Disable optimization when the size of a program unit is greater than N. When N is 0,
      program unit size is ignored and optimization will not be
      disabled because of compile-time limits.
      The default is 0 when -OPT:Ofast is specified,
      9000 when -O3 is specified, and 6000 otherwise.</p>

   ]]>

</flag>



<flag name="F-Ofast" class="optimization" compilers="pathcc, pathCC, pathf95" regexp="-Ofast">

   <example>-Ofast</example>

   <![CDATA[

      <p>Equivalent to -O3 -ipa -OPT:Ofast -fno-math-errno -ffast-math.<br>
      Use optimizations selected to maximize performance.
      Although the optimizations are generally safe, they may affect
      floating-point accuracy due to rearrangement of computations.</p>

      <p>NOTE: -Ofast enables -ipa (inter-procedural analysis),
      which places limitations on how libraries and .o files are built.</p>

   ]]>

   <include flag="F-O_n" />

   <include flag="F-ipa" />

   <include flag="F-OPT:Ofast" />

   <include flag="F-fno-math-errno" />

   <include flag="F-ffast-math" />

   <display enable="1" />

</flag>



<flag name="F-fb_create_fbdata" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-fb_create fbdata">

   <example>-fb_create fbdata</example>

   <![CDATA[

      <p>-fb_create &lt;path&gt;<br>
      Used to specify that an instrumented executable program is to be
      generated. Such an executable is suitable for producing feedback
      data files with the specified prefix for use in feedback-directed
      optimization (FDO).
      The commonly used prefix is "fbdata".<br>
      This is OFF by default.</p>

      <p>During the training run, the instrumented executable produces information regarding execution paths and data values, but
      does not generate information by using hardware performance counters.</p>

   ]]>

</flag>



<flag name="F-fb_opt_fbdata" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-fb_opt fbdata">

   <example>-fb_opt fbdata</example>

   <![CDATA[

      <p>-fb_opt &lt;prefix for feedback data files&gt;<br>
      Used to specify feedback-directed optimization (FDO) by extracting
      feedback data from files with the specified prefix, which were
      previously generated using -fb_create.
      The commonly used prefix is "fbdata".
      The same optimization flags should be used
      for both the -fb_create and -fb_opt compile steps.
      Feedback data files created from executables compiled
      with different optimization flags may give checksum errors.<br>
      FDO is OFF by default.</p>

      <p>During the -fb_opt compilation phase, information regarding execution paths and data values is
      used to improve the information available to the optimizer. FDO enables some optimizations that
      are performed only when the feedback data file is available. The safety of optimizations performed under FDO is
      consistent with the level of safety implied by the other optimization flags (outside of -fb_create and
      -fb_opt) specified on the compile and link lines.</p>
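<p>The two-pass workflow can be sketched as a short Python helper that builds the three commands: instrumented compile, training run, and optimized compile. The source and executable names, the training input, and the "-O3" flag set are placeholders. The point of the sketch is that both compiles share the same optimization flags and differ only in -fb_create versus -fb_opt, with the same "fbdata" prefix.</p>

```python
OPT_FLAGS = ["-O3"]  # placeholder: use the same flag set for both compiles
PREFIX = "fbdata"


def fdo_commands(compiler, source, exe, train_args):
    """Return argv lists for the instrumented build, training run, and FDO build."""
    instrumented = [compiler, *OPT_FLAGS, "-fb_create", PREFIX, source, "-o", exe]
    training = [f"./{exe}", *train_args]
    optimized = [compiler, *OPT_FLAGS, "-fb_opt", PREFIX, source, "-o", exe]
    return [instrumented, training, optimized]


if __name__ == "__main__":
    for argv in fdo_commands("pathcc", "prog.c", "prog", ["train.in"]):
        print(" ".join(argv))
```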

   ]]>

</flag>



<flag name="F-ipa" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-ipa">

   <example>-ipa</example>

   <![CDATA[

      <p>Invoke inter-procedural analysis (IPA). Specifying this option is
      identical to specifying -IPA or -IPA:.
      Default settings for the individual IPA sub-options are used.</p>

   ]]>

</flag>



<flag name="F-CG:load_exe" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-CG:load_exe=\d+">

   <example>-CG:load_exe</example>

   <![CDATA[

      <p>-CG:load_exe=N : Specify the threshold for subsuming a memory load
      operation into the operand of an arithmetic instruction.
      A value of 0 turns off this subsumption optimization.
      If N is 1, this subsumption is performed only when the result of
      the load has only one use.
      In general, the subsumption is not performed if the number of times the result
      of the load is used exceeds N, a non-negative integer.<br>
      If the ABI is 64-bit and the language is Fortran, the default for N
      is 2; otherwise the default is 1.</p>
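<p>The decision rule can be restated as a small Python sketch. The names are hypothetical, and the sketch models the description above, not the compiler's implementation. On x86-64, "subsuming" a load means that an instruction such as addsd reads its operand directly from memory instead of from a register filled by a separate load.</p>

```python
def default_load_exe(abi_64bit, fortran):
    """Default threshold N: 2 for 64-bit Fortran, 1 otherwise."""
    return 2 if (abi_64bit and fortran) else 1


def subsume_load(use_count, n):
    """True if a load whose result has use_count uses may be folded (N = 0: never)."""
    return n != 0 and n >= use_count


if __name__ == "__main__":
    n = default_load_exe(abi_64bit=True, fortran=False)  # 64-bit C: N = 1
    print(subsume_load(1, n), subsume_load(2, n))        # True False
```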

   ]]>

</flag>



<flag name="F-CG:movnti" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-CG:movnti=\d+">

   <example>-CG:movnti</example>

   <![CDATA[

      <p>-CG:movnti=N : Convert ordinary stores to non-temporal stores
      when writing memory blocks larger than N KB. When N
      is set to 0, this transformation is avoided.
      The default value is 120 (KB).</p>

   ]]>

</flag>



<flag name="F-INLINE:aggressive" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-INLINE:aggressive=(on|off|0|1)">

   <example>-INLINE:aggressive</example>

   <![CDATA[

      <p>-INLINE:aggressive : Tell the compiler to be more aggressive about inlining.
      The default is -INLINE:aggressive=OFF.</p>

   ]]>

</flag>



<flag name="F-IPA:plimit" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-IPA:plimit=\d+">

   <example>-IPA:plimit</example>

   <![CDATA[

      <p>-IPA:plimit=N : This option stops inlining into a specific
      subprogram once it reaches size N in the intermediate representation.
      The default is 2500.</p>

   ]]>

</flag>



<flag name="F-LNO:blocking" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-LNO:blocking=(on|off|0|1)\b">

   <example>-LNO:blocking</example>

   <![CDATA[

      <p>Specify options and transformations performed on loop nests
      by the Loop Nest Optimizer (LNO). The -LNO options are enabled only
      if -O3 is also specified on the compiler command line.</p>

      <p>-LNO:blocking : Enable or disable the cache blocking transformation.
      The default is ON.</p>
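<p>Cache blocking (tiling) restructures a loop nest so that it works on one small tile of the data at a time. The Python sketch below applies it to a matrix transpose, using hypothetical function names. The tile size of 32 is an arbitrary assumption; LNO chooses its own. Because Python does not expose cache behavior, the sketch only illustrates the loop structure and checks that the result is unchanged.</p>

```python
def transpose_naive(a, n):
    """Transpose an n x n matrix stored row-major in a flat list."""
    out = [0] * (n * n)
    for i in range(n):
        for j in range(n):
            out[j * n + i] = a[i * n + j]
    return out


def transpose_blocked(a, n, block=32):
    """Same result, but visits the matrix tile by tile so that one tile of the
    source and one of the destination can stay resident in cache."""
    out = [0] * (n * n)
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            for i in range(ii, min(ii + block, n)):
                for j in range(jj, min(jj + block, n)):
                    out[j * n + i] = a[i * n + j]
    return out


if __name__ == "__main__":
    n = 100
    a = list(range(n * n))
    print(transpose_blocked(a, n) == transpose_naive(a, n))  # True
```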

   ]]>

</flag>



<flag name="F-LNO:full_unroll" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-LNO:(full_unroll|fu)=\d+\b">

   <example>-LNO:full_unroll</example>

   <![CDATA[

      <p>-LNO:full_unroll,fu=N : Fully unroll loops with trip_count &lt;= N
      inside LNO. N can be any integer between 0 and 100.
      The default value for N is 5. Setting this flag to 0 disables
      full unrolling of small trip count loops inside LNO.</p>
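<p>As an illustration (a Python sketch with hypothetical names, not LNO output), a loop with a constant trip count of 3 is within the default limit of 5. It can therefore be replaced by straight-line code with no loop counter, compare, or branch.</p>

```python
def dot3_loop(a, b):
    """Original loop with a compile-time trip count of 3."""
    s = 0.0
    for k in range(3):
        s += a[k] * b[k]
    return s


def dot3_unrolled(a, b):
    """What full unrolling conceptually produces for the loop above."""
    s = 0.0
    s += a[0] * b[0]
    s += a[1] * b[1]
    s += a[2] * b[2]
    return s


def fully_unrolls(trip_count, n=5):
    """Apply the -LNO:full_unroll=N threshold; N = 0 disables full unrolling."""
    return n != 0 and n >= trip_count


if __name__ == "__main__":
    a, b = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
    print(dot3_loop(a, b), dot3_unrolled(a, b))  # 32.0 32.0
    print(fully_unrolls(3), fully_unrolls(8))    # True False
```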

   ]]>

</flag>



<flag name="F-LNO:fission" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-LNO:fission=[0-2]\b">

   <example>-LNO:fission</example>

   <![CDATA[

      <p>-LNO:fission=N : Perform loop fission. N can be one of the following:<br>
      0 = Disable loop fission (default)<br>
      1 = Perform normal loop fission as necessary<br>
      2 = Specify that fission be tried before fusion</p>

      <p>Because -LNO:fusion is on by default, turning on fission without turning off
      fusion may result in their effects being nullified. Ordinarily, fusion is
      applied before fission. Specifying -LNO:fission=2 turns on fission and
      causes it to be applied before fusion.</p>
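<p>Here is a Python sketch of fission, using hypothetical names. A loop containing two independent statements is split into two loops, each of which touches fewer arrays. Fusion is the inverse transformation.</p>

```python
def fused(a, b, c, d):
    """Original loop: both statements share one loop (len(a) == len(c))."""
    for i in range(len(a)):
        a[i] = b[i] + 1.0
        c[i] = d[i] * 2.0


def fissioned(a, b, c, d):
    """After fission: one loop per statement. This is legal here because
    neither statement reads what the other writes."""
    for i in range(len(a)):
        a[i] = b[i] + 1.0
    for i in range(len(c)):
        c[i] = d[i] * 2.0


if __name__ == "__main__":
    b, d = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
    a1, c1, a2, c2 = [0.0] * 3, [0.0] * 3, [0.0] * 3, [0.0] * 3
    fused(a1, b, c1, d)
    fissioned(a2, b, c2, d)
    print(a1 == a2 and c1 == c2)  # True
```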

   ]]>

</flag>



<flag name="F-LNO:ignore_feedback" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-LNO:ignore_feedback=(on|off|0|1)\b">

   <example>-LNO:ignore_feedback</example>

   <![CDATA[

      <p>-LNO:ignore_feedback=(on|off|0|1) : If the flag is ON, feedback information
      from the loop annotations is ignored in LNO transformations.
      The default is OFF.</p>

   ]]>

</flag>



<flag name="F-LNO:minvariant" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-LNO:minvariant=(on|off|0|1)\b">

   <example>-LNO:minvariant</example>

   <![CDATA[

      <p>-LNO:minvariant : Enable or disable moving loop-invariant expressions out
      of loops. The default is ON.</p>
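<p>Here is a Python sketch of loop-invariant code motion, using hypothetical names. scale * offset does not change inside the loop, so it can be computed once before the loop without changing the result.</p>

```python
def offset_all(xs, scale, offset):
    """Original loop: the invariant product is recomputed every iteration."""
    out = []
    for x in xs:
        out.append(x + scale * offset)
    return out


def offset_all_hoisted(xs, scale, offset):
    """After the transformation: the product is computed once, before the loop."""
    shift = scale * offset
    out = []
    for x in xs:
        out.append(x + shift)
    return out


if __name__ == "__main__":
    xs = [0.1 * k for k in range(5)]
    print(offset_all(xs, 1.5, 0.3) == offset_all_hoisted(xs, 1.5, 0.3))  # True
```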

   ]]>

</flag>



<flag name="F-LNO:opt" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-LNO:opt=(0|1)\b">

   <example>-LNO:opt=0</example>

   <![CDATA[

      <p>This option controls the LNO optimization level. The options can be
      one of the following:<br>
      0 = Disable nearly all loop nest optimizations.<br>
      1 = Perform full loop nest transformations. This is the default.</p>

   ]]>

</flag>



<flag name="F-LNO:ou_prod_max" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-LNO:ou_prod_max=\d+\b">

   <example>-LNO:ou_prod_max</example>

   <![CDATA[

      <p>-LNO:ou_prod_max=N : This option indicates that the product
      of the unrolling factors of the various outer loops in a given loop nest
      is not to exceed N, where N is a positive integer.
      The default is 16.</p>

   ]]>

</flag>



<flag name="F-LNO:prefetch_ahead" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-LNO:prefetch_ahead=\d+\b">

   <example>-LNO:prefetch_ahead</example>

   <![CDATA[

      <p>-LNO:prefetch_ahead=N : Prefetch N cache line(s) ahead.

      The default is 2.</p>

   ]]>

</flag>



<flag name="F-LNO:prefetch" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-LNO:prefetch=[0-3]\b">

   <example>-LNO:prefetch</example>

   <![CDATA[

      <p>-LNO:prefetch=(0|1|2|3) : This option specifies
      the level of prefetching.</p>

      <p>0 = Prefetch disabled.</p>

      <p>1 = Prefetch is done only for arrays that are always referenced
      in each iteration of a loop.</p>

      <p>2 = Prefetch is done without the above restriction.
      This is the default.</p>

      <p>3 = Most aggressive.</p>

       ]]>

</flag>



<flag name="F-LNO:simd" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-LNO:simd=[0-2]\b">

   <example>-LNO:simd</example>

   <![CDATA[

      <p>-LNO:simd=(0|1|2) : This option enables or disables
      inner loop vectorization.</p>

      <p>0 = Turn off the vectorizer.</p>

      <p>1 = (Default) Vectorize only if the compiler can determine that
      there is no undesirable performance impact due to sub-optimal
      alignment, and only if vectorization does not introduce
      accuracy problems with floating-point operations.</p>

      <p>2 = Vectorize without any constraints (most aggressive).</p>

   ]]>

</flag>



<flag name="F-m32" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-m32">

   <example>-m32</example>

   <![CDATA[

      <p>Compile for 32-bit ABI, also known as x86 or IA32.</p>

   ]]>

</flag>



<flag name="F-OPT:alias" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-OPT:alias=(typed|restrict|disjoint)\b">

    <example>-OPT:alias</example>

   <![CDATA[

      <p>The -OPT: option group controls miscellaneous optimizations.
      These options override defaults based on the main
      optimization level.</p>

      <p>-OPT:alias=&lt;name&gt;<br>
      Specify the pointer aliasing model
      to be used. By specifying one or more of the following for &lt;name&gt;,
      the compiler is able to make assumptions throughout the compilation:</p>

      <p style="text-indent: -25px; margin-left: 25px">
      typed<br>
         Assume that the code adheres to the ANSI/ISO C standard,
         which states that two pointers of different types cannot point
         to the same location in memory.
         This is ON by default when -OPT:Ofast is specified.</p>

      <p style="text-indent: -25px; margin-left: 25px">
      restrict<br>
         Specify that distinct pointers are assumed to point to distinct,
         non-overlapping objects. This is OFF by default.</p>

      <p style="text-indent: -25px; margin-left: 25px">
      disjoint<br>
         Specify that any two pointer expressions are assumed to point
         to distinct, non-overlapping objects. This is OFF by default.</p>
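<p>The Python sketch below, which uses hypothetical names, shows why these assumptions matter. Both functions compute dst[i] = 2 * src[i] over a shared buffer. The second one loads every element before storing any. A compiler may perform this kind of reordering (for example, when vectorizing) once it is allowed to assume that the destination and source do not overlap. The results agree when the regions are disjoint and differ when they overlap.</p>

```python
def scale_copy_in_order(buf, dst, src, n):
    """buf[dst + i] = 2 * buf[src + i], one element at a time, as written."""
    for i in range(n):
        buf[dst + i] = 2.0 * buf[src + i]


def scale_copy_reordered(buf, dst, src, n):
    """Same statement, but all loads happen before any store."""
    loaded = [buf[src + i] for i in range(n)]
    for i in range(n):
        buf[dst + i] = 2.0 * loaded[i]


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0, 5.0]
    b = list(a)
    scale_copy_in_order(a, dst=1, src=0, n=4)   # regions overlap
    scale_copy_reordered(b, dst=1, src=0, n=4)
    print(a)  # [1.0, 2.0, 4.0, 8.0, 16.0]
    print(b)  # [1.0, 2.0, 4.0, 6.0, 8.0]
```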

   ]]>

</flag>



<flag name="F-OPT:IEEE_arith" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-OPT:(IEEE_arithmetic|IEEE_arith|IEEE_a)=[1-3]\b">

    <example>-OPT:IEEE_arithmetic</example>

   <![CDATA[

      <p>-OPT:IEEE_arithmetic,IEEE_arith,IEEE_a=(1|2|3)<br>
      Specify the level of conformance to IEEE 754 floating-point
      roundoff/overflow behavior.
      The options can be one of the following:</p>

      <p>1 = Adhere to IEEE accuracy. This is the default when optimization
      levels -O0, -O1, and -O2 are in effect.</p>

      <p>2 = May produce inexact results not conforming to IEEE 754.
      This is the default when -O3 is in effect.</p>

      <p>3 = All mathematically valid transformations are allowed.</p>

   ]]>

</flag>



<flag name="F-OPT:malloc_alg" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-OPT:malloc_alg=[0-1]\b">

    <example>-OPT:malloc_alg</example>

   <![CDATA[

      <p>-OPT:malloc_alg=(0|1)<br>
      Select an alternate malloc algorithm which may improve speed.
      The compiler adds setup code in the
      C/C++/Fortran "main" function to enable the chosen algorithm.
      The default is 0.</p>

   ]]>

</flag>



<flag name="F-OPT:Ofast" class="optimization"

    compilers="pathcc, pathCC, pathf95" regexp="-OPT:Ofast\b">

    <example>-OPT:Ofast</example>

   <![CDATA[

      <p>-OPT:Ofast<br>
      Use optimizations selected to maximize performance.
      Although the optimizations are generally safe, they may affect
      floating-point accuracy due to rearrangement of computations.
      This effectively turns on the following optimizations:
      -OPT:ro=2:Olimit=0:div_split=ON:alias=typed.</p>
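<p>The combined form above is shorthand for four separate -OPT sub-options. The Python sketch below splits such a group using the same regular expression as the F-splitting:all rule in this file. It re-applies the expression to the remainder until no colon-separated sub-options are left, which is how that rule's two include lines appear intended to work. The sketch is an illustration only, not the SPEC tools' own code, and split_group is a hypothetical name.</p>

```python
import re

# Same pattern as the F-splitting:all rule's regexp attribute.
SPLIT = re.compile(r"-(CG|IPA|LNO|OPT|WOPT):([^:\s]+):(\S+)(?=\s|$)")


def split_group(flag):
    """Expand "-GROUP:a:b:c" into ["-GROUP:a", "-GROUP:b", "-GROUP:c"]."""
    m = SPLIT.fullmatch(flag)
    if m is None:
        return [flag]  # a single sub-option: nothing left to split
    group, first, rest = m.groups()
    return [f"-{group}:{first}"] + split_group(f"-{group}:{rest}")


if __name__ == "__main__":
    print(split_group("-OPT:ro=2:Olimit=0:div_split=ON:alias=typed"))
    # ['-OPT:ro=2', '-OPT:Olimit=0', '-OPT:div_split=ON', '-OPT:alias=typed']
```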

   ]]>

   <include flag="F-OPT:ro"/>

   <include flag="F-OPT:Olimit"/>

   <include flag="F-OPT:div_split"/>

   <include flag="F-OPT:alias"/>

</flag>



<flag name="F-OPT:ro" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-OPT:(roundoff|ro)=[0-3]">

    <example>-OPT:ro</example>

   <![CDATA[

      <p>-OPT:roundoff,ro=(0|1|2|3)<br>
      Specify the level of acceptable departure from source language
      floating-point, round-off, and overflow semantics.
      The options can be one of the following:</p>

      <p>0 = Inhibit optimizations that might affect the floating-point
      behavior. This is the default when optimization levels -O0, -O1,
      and -O2 are in effect.</p>

      <p>1 = Allow simple transformations that might cause limited
      round-off or overflow differences. Compounding such transformations
      could have more extensive effects.
      This is the default when -O3 is in effect.</p>

      <p>2 = Allow more extensive transformations, such as the
      reordering of reduction loops.
      This is the default level when -OPT:Ofast is specified.</p>

      <p>3 = Enable any mathematically valid transformation.</p>
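<p>Level 2's reordering of reduction loops is the kind of change that can alter results. In the Python sketch below, which uses hypothetical names, the first function adds left to right as written. The second keeps two partial sums, as a vectorized loop might. With the data shown, 1e16 + 1.0 rounds back to 1e16 in double precision, so the two orders give different answers.</p>

```python
def sum_in_order(values):
    """Reduction in source order, as required at -OPT:ro=0 or 1."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_two_lanes(values):
    """Reordered reduction with two partial sums, as a vectorized loop might do."""
    lanes = [0.0, 0.0]
    for i, v in enumerate(values):
        lanes[i % 2] += v
    return lanes[0] + lanes[1]


if __name__ == "__main__":
    data = [1e16, 1.0, -1e16, 1.0]
    print(sum_in_order(data))   # 1.0  (1e16 + 1.0 rounds back to 1e16)
    print(sum_two_lanes(data))  # 2.0  (the large terms cancel first)
```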

   ]]>

</flag>



<flag name="F-OPT:unroll_size" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-OPT:unroll_size=(\d+)">

    <example>-OPT:unroll_size</example>

   <![CDATA[

      <p>-OPT:unroll_size=N<br>
      Set the ceiling on the maximum number of instructions for an
      unrolled inner loop. If N=0, the ceiling is disregarded.
      The default is 40.</p>

   ]]>

</flag>



<flag name="F-OPT:unroll_times_max" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-OPT:(unroll_times_max|unroll_times)=(\d+)">

    <example>-OPT:unroll_times_max</example>

   <![CDATA[

      <p>-OPT:unroll_times_max=N<br>
      Unroll inner loops by a maximum of N. The default is 4.</p>
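<p>Here is a Python sketch of unrolling by the default factor of 4, with a remainder loop for trip counts that are not a multiple of 4. The names are hypothetical; the compiler performs this on generated code, subject to the -OPT:unroll_size limit. Because the additions happen in the same order, the result is bit-identical to the rolled loop.</p>

```python
def sum_rolled(xs):
    """Original inner loop."""
    s = 0.0
    for x in xs:
        s += x
    return s


def sum_unrolled_by_4(xs):
    """Same loop unrolled by 4, plus a remainder loop."""
    s = 0.0
    n = len(xs)
    main = n - n % 4              # iterations handled four at a time
    for i in range(0, main, 4):
        s += xs[i]
        s += xs[i + 1]
        s += xs[i + 2]
        s += xs[i + 3]
    for i in range(main, n):      # remainder: 0 to 3 iterations
        s += xs[i]
    return s


if __name__ == "__main__":
    xs = [0.1 * k for k in range(11)]
    print(sum_rolled(xs) == sum_unrolled_by_4(xs))  # True
```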

   ]]>

</flag>



<flag name="F-L_lib_directory_lsmartheap" class="optimization"

 compilers="pathcc, pathCC, pathf95" regexp="-L\S+\s+-lsmartheap">

 <example>-L/cpu2006/work/cpu2006/SmartHeap -lsmartheap</example>

   <![CDATA[

   <p>-L/cpu2006/work/cpu2006/SmartHeap -lsmartheap<br>
      When used as an EXTRA_CLIB or EXTRA_CXXLIB variable,
      this results in linking with MicroQuill's SmartHeap 8 (32-bit) library
      for Linux, a library that optimizes calls to new, delete, malloc, and free.</p>

   ]]>

</flag>



<flag name="F-WOPT:aggstr" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-WOPT:aggstr=\d+">

   <example>-WOPT:aggstr</example>

   <![CDATA[

      <p>The -WOPT: option group specifies options that affect the global optimizer.
      These options are enabled at -O2 or above.</p>

      <p>-WOPT:aggstr=N<br>
      This controls the aggressiveness of the strength reduction optimization
      performed by the scalar optimizer, in which induction expressions
      within a loop are replaced by temporaries that are incremented
      together with the loop variable. When strength reduction is overdone,
      the additional temporaries increase register pressure, resulting in
      excessive register spills that decrease performance.
      The value specified must be a non-negative integer, which specifies
      the maximum number of induction expressions that will be strength-reduced
      across an index variable increment.
      When set to 0, strength reduction is performed only for non-trivial
      induction expressions. The default is 11.</p>
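<p>Here is a Python sketch of strength reduction, using hypothetical names. The induction expression i * stride + col costs a multiplication on every iteration. It is replaced by a temporary that is advanced by stride in step with the loop. Each such temporary needs a register, which is the pressure -WOPT:aggstr limits.</p>

```python
def gather_column(flat, rows, stride, col):
    """Original loop: the index is recomputed with a multiply each iteration."""
    out = []
    for i in range(rows):
        out.append(flat[i * stride + col])
    return out


def gather_column_reduced(flat, rows, stride, col):
    """After strength reduction: a temporary index advanced by an add."""
    out = []
    idx = col                     # replaces i * stride + col
    for _ in range(rows):
        out.append(flat[idx])
        idx += stride
    return out


if __name__ == "__main__":
    flat = list(range(20))        # a 4 x 5 matrix stored row-major
    print(gather_column(flat, 4, 5, 2), gather_column_reduced(flat, 4, 5, 2))
```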

   ]]>

</flag>



<flag name="F-WOPT:retype_expr" class="optimization"

      compilers="pathcc, pathCC, pathf95" regexp="-WOPT:retype_expr=(on|off|0|1)\b">

   <example>-WOPT:retype_expr</example>

   <![CDATA[

      <p>-WOPT:retype_expr=(ON|OFF)<br>
      Enables the optimization in the compiler that converts 64-bit address
      computation to use 32-bit arithmetic as much as possible.
      The default is OFF.</p>

   ]]>

</flag>





<flag name="no-exceptions"

 class="optimization"

 compilers="pgcpp"

 regexp="--no_exceptions\b">

<![CDATA[

 <p>Disable C++ exception handling support.</p>

 ]]>

<example>--no_exceptions</example>

</flag>



<flag name="no-rtti"

 class="optimization"

 compilers="pgcpp"

 regexp="--no_rtti\b">

<![CDATA[

 <p>Disable C++ run-time type information (RTTI) support.</p>

 ]]>

<example>--no_rtti</example>

</flag>



<flag name="zc_eh" class="optimization"

 compilers="pgcpp"

 regexp="--zc_eh\b">

<![CDATA[

<p>Generate zero-overhead C++ exception handlers.</p>

]]>

</flag>



<flag name="Mautoinline"

 class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mautoinline\b">

<![CDATA[

<p>Inline functions declared with the inline keyword.</p>

]]>

</flag>



<flag name="Mnoautoinline"

 class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mnoautoinline\b">

<![CDATA[

<p>Disable inlining of functions declared with the inline keyword.</p>

]]>

</flag>





<flag name="Mcache_align"

 class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mcache_align\b">

<![CDATA[

 <p>Align "unconstrained" data objects of size greater than or equal to 16
bytes on cache-line boundaries. An "unconstrained" object is a variable or
array that is not a member of an aggregate structure or common block, is not
allocatable, and is not an automatic array. This is on by default on 64-bit Linux systems.</p>
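<p>The address arithmetic involved can be sketched in Python, using hypothetical names. The 64-byte line size is an assumption that matches the caches of AMD Family 10h processors. The 16-byte threshold comes from the description above.</p>

```python
CACHE_LINE = 64  # bytes; assumed line size


def align_up(address, alignment=CACHE_LINE):
    """Round address up to the next multiple of alignment (ceiling division)."""
    return -(-address // alignment) * alignment


def eligible(size_bytes):
    """Only objects of at least 16 bytes are cache-line aligned."""
    return size_bytes >= 16


if __name__ == "__main__":
    print(hex(align_up(0x1004)))  # 0x1040
    print(align_up(128))          # 128 (already aligned)
```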

 ]]>

<example>-Mcache_align</example>

</flag>



<flag name="Mflushz"

  class="optimization"

  compilers="pgcc, pgcpp, pgf95"

  regexp="-Mflushz\b">

<![CDATA[

 <p>Set SSE to flush-to-zero mode: if a floating-point underflow occurs, the result is set to zero.</p>

]]>

<example>-Mflushz</example>

</flag>



<flag name="Mdaz"

  class="optimization"

  compilers="pgcc, pgcpp, pgf95"

  regexp="-Mdaz\b">

<![CDATA[

 <p>Treat denormalized numbers as zero.  Included with "-fast" on Intel-based systems.  On AMD-based systems, "-Mdaz" is

not included by default with "-fast".</p>

]]>

<example>-Mdaz</example>

</flag>



<flag name="Mframe" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mframe\b">

<![CDATA[

 <p>Generate code to set up a stack frame.</p>

]]>

<example>-Mframe</example>

</flag>



<flag name="Mnoframe" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mnoframe\b">

<![CDATA[

 <p>Eliminates operations that set up a true stack frame pointer for every function.  With this option enabled, you

cannot perform a traceback on the generated code and you cannot access local variables.</p>

]]>

<example>-Mnoframe</example>

</flag>



<flag name="Mfprelaxed_subopt" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mfprelaxed=([^,\s]+),(\S+)\b"	>

<include text="-Mfprelaxed=$1" />

<include text="-Mfprelaxed=$2" />

<display enable="0" />

CPU2006 flags file rule used to split an optimization flag containing sub-options into multiple flag descriptions.

The actual flag descriptions are given in the rules for the individual sub-options.

</flag>



<flag name="Mfprelaxed_rsqrt" class="optimization"

      compilers="pgcc, pgcpp, pgf95"

      regexp="-Mfprelaxed=rsqrt\b">

<![CDATA[

 <p>Instructs the compiler to use relaxed precision in the calculation of floating-point reciprocal square root (1/sqrt).  Can result in

improved performance at the expense of numerical accuracy.</p>

 ]]>

<example>-Mfprelaxed=rsqrt</example>

</flag>



<flag name="Mfprelaxed_sqrt" class="optimization"

      compilers="pgcc, pgcpp, pgf95"

      regexp="-Mfprelaxed=sqrt\b">

<![CDATA[

 <p>Instructs the compiler to use relaxed precision in the calculation of floating-point square root.  Can result in

improved performance at the expense of numerical accuracy.</p>

 ]]>

<example>-Mfprelaxed=sqrt</example>

</flag>



<flag name="Mfprelaxed_div" class="optimization"

      compilers="pgcc, pgcpp, pgf95"

      regexp="-Mfprelaxed=div\b">

<![CDATA[

 <p>Instructs the compiler to use relaxed precision in the calculation of floating-point division.  Can result in improved performance at the expense of numerical accuracy.</p>

 ]]>

<example>-Mfprelaxed=div</example>

</flag>



<flag name="Mfprelaxed_order" class="optimization"

      compilers="pgcc, pgcpp, pgf95"

      regexp="-Mfprelaxed=order\b">

<![CDATA[

 <p>Instructs the compiler to allow floating-point expression reordering, including factoring. Can result in

improved performance at the expense of numerical accuracy.</p>

 ]]>

<example>-Mfprelaxed=order</example>

</flag>



<flag name="Mfprelaxed"	class="optimization"

      compilers="pgcc, pgcpp, pgf95"

      regexp="-Mfprelaxed\b">

<![CDATA[

 <p>Instructs the compiler to use relaxed precision in the calculation of some intrinsic functions.  Can result in

improved performance at the expense of numerical accuracy.  The default on an AMD system is "-Mfprelaxed=rsqrt,order".  The

default on an Intel system is "-Mfprelaxed=rsqrt,sqrt,div,order".</p>

 ]]>

<example>-Mfprelaxed</example>

<include flag="Mfprelaxed_rsqrt" />

<include flag="Mfprelaxed_sqrt"	/>

<include flag="Mfprelaxed_div" />

<include flag="Mfprelaxed_order" />

</flag>





<flag name="Mfpapprox_subopt" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mfpapprox=([^,\s]+),(\S+)\b" >

<include text="-Mfpapprox=$1" />

<include text="-Mfpapprox=$2" />

<display enable="0" />

CPU2006 flags file rule used to split an optimization flag containing sub-options into multiple flag descriptions.

The actual flag descriptions are given in the rules for the individual sub-options.

</flag>



<flag name="Mfpapprox_rsqrt" class="optimization"

      compilers="pgcc, pgcpp, pgf95"

      regexp="-Mfpapprox=rsqrt\b">

<![CDATA[

 <p>Instructs the compiler to use low-precision approximation in the calculation of reciprocal square root (1/sqrt).  Can result in

improved performance at the expense of numerical accuracy.</p>

 ]]>

<example>-Mfpapprox=rsqrt</example>

</flag>



<flag name="Mfpapprox_sqrt" class="optimization"

      compilers="pgcc, pgcpp, pgf95"

      regexp="-Mfpapprox=sqrt\b">

<![CDATA[

 <p>Instructs the compiler to use low-precision approximation in the calculation of square root.  Can result in

improved performance at the expense of numerical accuracy.</p>

 ]]>

<example>-Mfpapprox=sqrt</example>

</flag>



<flag name="Mfpapprox_div" class="optimization"

      compilers="pgcc, pgcpp, pgf95"

      regexp="-Mfpapprox=div\b">

<![CDATA[

 <p>Instructs the compiler to use low-precision approximation in the calculation of floating-point division.  Can result in

improved performance at the expense of numerical accuracy.</p>

 ]]>

<example>-Mfpapprox=div</example>

</flag>



<flag name="Mfpapprox" class="optimization"

      compilers="pgcc, pgcpp, pgf95"

      regexp="-Mfpapprox\b">

<![CDATA[

 <p>Instructs the compiler to perform low-precision approximation in the calculation of floating-point division, square root, and reciprocal square root.

Can result in improved performance at the expense of numerical accuracy.</p>

 ]]>

<example>-Mfpapprox</example>

<include flag="Mfpapprox_rsqrt"	/>

<include flag="Mfpapprox_sqrt" />

<include flag="Mfpapprox_div" />

</flag>



<flag name="Mprefetch_subopt" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mprefetch=([^,\s]+),(\S+)\b" >

<include text="-Mprefetch=$1" />

<include text="-Mprefetch=$2" />

<display enable="0" />

CPU2006 flags file rule used to split an optimization flag containing sub-options into multiple flag descriptions.

The actual flag descriptions are given in the rules for the individual sub-options.

</flag>



<flag name="Mprefetch_d_m" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mprefetch=d[istance]*:(\d+)\b">

<![CDATA[

 <p>Set the fetch-ahead distance for prefetch instructions to <b>m</b> cache lines.</p>

]]>

<example>-Mprefetch=d:m</example>

<include flag="Mprefetch" />

</flag>



<flag name="Mprefetch_n_p" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mprefetch=n:(\d+)\b">

<![CDATA[

 <p>Set the maximum number of prefetch instructions to generate for a given loop to <b>p</b>.</p>

]]>

<example>-Mprefetch=n:p</example>

<include flag="Mprefetch" />

</flag>



<flag name="Mprefetch_nta" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mprefetch=nta\b">

<![CDATA[

 <p>Use the <i>prefetchnta</i> instruction.</p>

]]>

<example>-Mprefetch=nta</example>

<include flag="Mprefetch" />

</flag>



<flag name="Mprefetch_plain" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mprefetch=plain\b">

<![CDATA[

 <p>Use the <i>prefetch</i> instruction.</p>

]]>

<example>-Mprefetch=plain</example>

<include flag="Mprefetch" />

</flag>



<flag name="Mprefetch_t0" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mprefetch=t0\b" >

<![CDATA[

 <p>Use the <i>prefetcht0</i> instruction.</p>

]]>

<example>-Mprefetch=t0</example>

<include flag="Mprefetch" />

</flag>



<flag name="Mprefetch_w" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mprefetch=w\b" >

<![CDATA[

 <p>Use the AMD-specific <i>prefetchw</i> instruction.</p>

]]>

<example>-Mprefetch=w</example>

<include flag="Mprefetch" />

</flag>



<flag name="Mprefetch" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mprefetch\b">

<![CDATA[

 <p>Enable generation of prefetch instructions on processors where they are supported.</p>

]]>

<example>-Mprefetch</example>

</flag>



<flag name="Mnoprefetch" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mnoprefetch\b" >

<![CDATA[

 <p>Disable generation of prefetch instructions.</p>

]]>

<example>-Mnoprefetch</example>

</flag>



<flag name="Mscalarsse"	class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mscalarsse\b">

<![CDATA[

 <p>Use SSE/SSE2 instructions to perform scalar floating-point arithmetic on targets where these

 instructions are supported.</p>

]]>

<example>-Mscalarsse</example>

</flag>



<flag name="Mnoscalarsse" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mnoscalarsse\b">

<![CDATA[

 <p>Do not use SSE/SSE2 instructions to perform scalar floating-point arithmetic; use x87 operations instead.</p>

]]>

<example>-Mnoscalarsse</example>

</flag>



<flag name="Msignextend" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Msignextend\b">

<![CDATA[

 <p>Instructs the compiler to extend the sign bit that is set as a result of converting an object of one

 data type to an object of a larger signed data type.</p>

]]>

<example>-Msignextend</example>

</flag>



<flag name="Mlre_array"	class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mlre=array\b">

<![CDATA[

 <p>Treat individual array element references as candidates for possible loop-carried redundancy elimination.

 The default is to eliminate only redundant expressions involving two or more operands.</p>

]]>

<example>-Mlre=array</example>

<include flag="Mlre" />

</flag>



<flag name="Mlre_assoc"	class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mlre=assoc\b">

<![CDATA[

 <p>Allow expression re-association; specifying this sub-option can increase opportunities for loop-carried

 redundancy elimination.</p>

]]>

<example>-Mlre=assoc</example>

<include flag="Mlre" />

</flag>



<flag name="Mlre_noassoc" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mlre=noassoc\b">

<![CDATA[

 <p>Disable expression re-association.</p>

]]>

<example>-Mlre=noassoc</example>

<include flag="Mlre" />

</flag>



<flag name="Mlre" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mlre\b">

<![CDATA[

 <p>Enables loop-carried redundancy elimination, an optimization that can reduce the number of arithmetic operations

 and memory references in loops.</p>

]]>

<example>-Mlre</example>

</flag>



<flag name="Mnolre" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mnolre\b">

<![CDATA[

 <p>Disable loop-carried redundancy elimination.</p>

]]>

<example>-Mnolre</example>

</flag>



<flag name="Mnovintr" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mnovintr\b">

<![CDATA[

 <p>Instructs the compiler not to perform idiom recognition or introduce calls to hand-optimized vector functions.</p>

]]>

<example>-Mnovintr</example>

</flag>



<flag name="Mpfi" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mpfi\b">

<![CDATA[

 <p>Generate profile-feedback instrumentation (PFI); this includes extra code to collect run-time statistics and dump

 them to a trace file for use in a subsequent compilation.  PFI gathers information about a program's execution and data values

but does not gather information from hardware performance counters.  PFI does gather data for optimizations that are unique to profile-feedback optimization.</p>

]]>

<example>-Mpfi</example>

</flag>



<flag name="Mpfo" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mpfo\b">

<![CDATA[

 <p>Enable profile-feedback optimizations, using the trace file produced by running an executable built with -Mpfi.</p>

]]>

<example>-Mpfo</example>

</flag>



<flag name="Mipa_subopt" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=([^,\s]+),(\S+)\b">

<include text="-Mipa=$1" />

<include text="-Mipa=$2" />

<display enable="0" />

CPU2006 flags file rule used to split an optimization flag containing sub-options into multiple flag descriptions.

The actual flag descriptions are given in the rules for the individual sub-options.

</flag>



<flag name="Mipa_align"	class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=align\b">

<![CDATA[

 <p>Interprocedural Analysis option: Recognize when the targets of pointer dummy arguments are aligned.</p>

]]>

<example>-Mipa=align</example>

</flag>



<flag name="Mipa_noalign" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=noalign\b">

<![CDATA[

 <p>Interprocedural Analysis option: Do not recognize when the targets of pointer dummy arguments are aligned.</p>

]]>

<example>-Mipa=noalign</example>

</flag>



<flag name="Mipa_arg" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=arg\b">

<![CDATA[

 <p>Interprocedural Analysis option: Remove arguments replaced by -Mipa=ptr,const.</p>

]]>

<include flag="Mipa_ptr" />

<include flag="Mipa_const" />

<example>-Mipa=arg</example>

</flag>



<flag name="Mipa_noarg"	class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=noarg\b">

<![CDATA[

 <p>Interprocedural Analysis option: Do not remove arguments replaced by -Mipa=ptr,const.</p>

]]>

<include flag="Mipa_ptr" />

<include flag="Mipa_const" />

<example>-Mipa=noarg</example>

</flag>



<flag name="Mipa_cg" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=cg\b">

<![CDATA[

 <p>Interprocedural Analysis option: Generate call graph information for the pgicg tool.</p>

]]>

<example>-Mipa=cg</example>

</flag>



<flag name="Mipa_nocg" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=nocg\b">

<![CDATA[

 <p>Interprocedural Analysis option: Do not generate call graph information for the pgicg

 tool.</p>

]]>

<example>-Mipa=nocg</example>

</flag>



<flag name="Mipa_const"	class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=const\b">

<![CDATA[

 <p>Interprocedural Analysis option: Enable interprocedural constant propagation.</p>

]]>

<example>-Mipa=const</example>

</flag>



<flag name="Mipa_noconst" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=noconst\b">

<![CDATA[

 <p>Interprocedural Analysis option: Disable interprocedural constant propagation.</p>

]]>

<example>-Mipa=noconst</example>

</flag>



<flag name="Mipa_except" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=except:([\-\w,]+)\b">

<![CDATA[

 <p>Interprocedural Analysis option: Used with -Mipa=inline to specify functions which should not be inlined.</p>

]]>

<include flag="Mipa_inline" />

<example>-Mipa=except:func</example>

</flag>



<flag name="Mipa_fast" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=fast\b">

<![CDATA[

 <p>Instructs the compiler to perform interprocedural analysis.  Equivalent to -Mipa=align,arg,const,f90ptr,shape,globals,libc,localarg,ptr,pure.</p>

]]>

<example>-Mipa=fast</example>

<include flag="Mipa_align" />

<include flag="Mipa_arg" />

<include flag="Mipa_const" />

<include flag="Mipa_f90ptr" />

<include flag="Mipa_shape" />

<include flag="Mipa_globals" />

<include flag="Mipa_libc" />

<include flag="Mipa_localarg" />

<include flag="Mipa_ptr" />

<include flag="Mipa_pure" />

</flag>



<flag name="Mipa_force"	class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=force\b"	>

<![CDATA[

 <p>Interprocedural Analysis option: Force all objects to recompile regardless

 of whether IPA information has changed.</p>

]]>

<example>-Mipa=force</example>

</flag>



<flag name="Mipa_globals" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=globals\b">

<![CDATA[

 <p>Interprocedural Analysis option: Optimize references to global values.</p>

]]>

<example>-Mipa=globals</example>

</flag>



<flag name="Mipa_noglobals" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=noglobals\b">

<![CDATA[

 <p>Interprocedural Analysis option: Do not optimize references to global values.</p>

]]>

<example>-Mipa=noglobals</example>

</flag>



<flag name="Mipa_inline:n" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=inline:(\d+)\b">

<![CDATA[

 <p>Interprocedural Analysis option: Automatically determine which functions

 to inline, limited to <b>n</b> levels.  IPA-based function inlining is performed from leaf

 routines upward.</p>

]]>

<example>-Mipa=inline:n</example>

</flag>



<flag name="Mipa_inline" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=inline\b">

<![CDATA[

 <p>Interprocedural Analysis option: Automatically determine which functions to inline.

 IPA-based function inlining is performed from leaf routines upward.</p>

]]>

<example>-Mipa=inline</example>

</flag>





<flag name="Mipa_libinline" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=libinline\b">

<![CDATA[

 <p>Interprocedural Analysis option: Allow inlining of routines from libraries.</p>

]]>

<include flag="Mipa_inline" />

<example>-Mipa=libinline</example>

</flag>



<flag name="Mipa_nolibinline" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=nolibinline\b">

<![CDATA[

 <p>Interprocedural Analysis option: Do not inline routines from libraries.</p>

]]>

<include flag="Mipa_inline" />

<example>-Mipa=nolibinline</example>

</flag>



<flag name="Mipa_libc" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=libc\b">

<![CDATA[

 <p>Interprocedural Analysis option: Optimize calls to certain functions in the standard system C library, libc.</p>

]]>

<example>-Mipa=libc</example>

</flag>



<flag name="Mipa_libopt" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=libopt\b">

<![CDATA[

 <p>Interprocedural Analysis option: Allow recompiling and optimization of routines from libraries using IPA information.</p>

]]>

<example>-Mipa=libopt</example>

</flag>



<flag name="Mipa_nolibopt" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=nolibopt\b">

<![CDATA[

 <p>Interprocedural Analysis option: Do not optimize routines in libraries.</p>

]]>

<example>-Mipa=nolibopt</example>

</flag>



<flag name="Mipa_localarg" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=localarg\b">

<![CDATA[

 <p>Interprocedural Analysis option: Same as -Mipa=arg, and additionally externalizes local pointer targets.</p>

]]>

<include flag="Mipa_arg" />

<example>-Mipa=localarg</example>

</flag>



<flag name="Mipa_local"	class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=local\b">

<![CDATA[

 <p>Interprocedural Analysis option: Same as -Mipa=arg, and additionally externalizes local pointer targets.</p>

]]>

<include flag="Mipa_arg" />

<example>-Mipa=local</example>

</flag>



<flag name="Mipa_nolocalarg" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=nolocal[arg]*\b">

<![CDATA[

 <p>Interprocedural Analysis option: Do not externalize local pointer targets.</p>

]]>

<example>-Mipa=nolocalarg</example>

</flag>



<flag name="Mipa_ptr" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=ptr\b">

<![CDATA[

 <p>Interprocedural Analysis option: Enable pointer disambiguation across procedure calls.</p>

]]>

<example>-Mipa=ptr</example>

</flag>



<flag name="Mipa_noptr"	class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=noptr\b">

<![CDATA[

 <p>Interprocedural Analysis option: Disable pointer disambiguation.</p>

]]>

<example>-Mipa=noptr</example>

</flag>



<flag name="Mipa_f90ptr" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=f90ptr\b">

<![CDATA[

 <p>Interprocedural Analysis option: Enable Fortran 90/95 pointer disambiguation across calls.</p>

]]>

<example>-Mipa=f90ptr</example>

</flag>



<flag name="Mipa_nof90ptr" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=nof90ptr\b">

<![CDATA[

 <p>Interprocedural Analysis option: Disable Fortran 90/95 pointer disambiguation.</p>

]]>

<example>-Mipa=nof90ptr</example>

</flag>



<flag name="Mipa_pure" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=pure\b">

<![CDATA[

 <p>Interprocedural Analysis option: Pure function detection.</p>

]]>

<example>-Mipa=pure</example>

</flag>



<flag name="Mipa_nopure" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=nopure\b">

<![CDATA[

 <p>Interprocedural Analysis option: Disable pure function detection.</p>

]]>

<example>-Mipa=nopure</example>

</flag>



<flag name="Mipa_reshape" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=reshape\b">

<![CDATA[

 <p>Interprocedural Analysis option: Allows inlining in Fortran even when array shapes do not match.</p>

]]>

<example>-Mipa=reshape</example>

</flag>



<flag name="Mipa_shape"	class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=shape\b">

<![CDATA[

 <p>Interprocedural Analysis option: Perform Fortran 90 array shape propagation.</p>

]]>

<example>-Mipa=shape</example>

</flag>



<flag name="Mipa_noshape" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=noshape\b">

<![CDATA[

 <p>Interprocedural Analysis option: Disable Fortran 90 array shape propagation.</p>

]]>

<example>-Mipa=noshape</example>

</flag>



<flag name="Mipa_vestigial" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=vestigial\b">

<![CDATA[

 <p>Interprocedural Analysis option: Remove functions that are never called.</p>

]]>

<example>-Mipa=vestigial</example>

</flag>



<flag name="Mipa_novestigial" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa=novestigial\b">

<![CDATA[

 <p>Interprocedural Analysis option: Do not remove functions that are never called.</p>

]]>

<example>-Mipa=novestigial</example>

</flag>



<flag name="Mipa" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mipa\b">

<![CDATA[

 <p>Enable Interprocedural Analysis.</p>

]]>

<include flag="Mipa_const" />

<example>-Mipa</example>

</flag>



<flag name="Mconcur_subopt" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mconcur=([^,\s]+),(\S+)\b">

<include text="-Mconcur=$1" />

<include text="-Mconcur=$2" />

<display enable="0" />

CPU2006 flags file rule used to split an optimization flag containing sub-options into multiple flag descriptions.

The actual flag descriptions are given in the rules for the individual sub-options.

</flag>



<flag name="Mconcur_altcode" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mconcur=altcode\b">

<![CDATA[

 <p>Instructs the parallelizer to generate alternate serial code for parallelized loops.  Without arguments,

 the parallelizer determines an appropriate cutoff length and generates serial code to be executed whenever

 the loop count is less than or equal to that length.</p>

]]>

<example>-Mconcur=altcode</example>

<include flag="Mconcur"	/>

</flag>



<flag name="Mconcur_altcoden" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mconcur=altcode:(\d+)\b">

<![CDATA[

 <p>Instructs the parallelizer to generate alternate serial code for parallelized loops.  With an argument <b>n</b>, the serial alternate code

 is executed whenever the loop count is less than or equal to <b>n</b>.</p>

]]>

<example>-Mconcur=altcode:n</example>

<include flag="Mconcur"	/>

</flag>



<flag name="Mconcur_noaltcode" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mconcur=noaltcode\b">

<![CDATA[

 <p>Always execute the parallelized version of a loop regardless of the loop count.</p>

]]>

<example>-Mconcur=noaltcode</example>

<include flag="Mconcur"	/>

</flag>



<flag name="Mconcur_noassoc" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mconcur=noassoc\b" >

<![CDATA[

 <p>Disables parallelization of loops with reductions.</p>

]]>

<example>-Mconcur=noassoc</example>

<include flag="Mconcur"	/>

</flag>



<flag name="Mconcur_cncall" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mconcur=cncall\b">

<![CDATA[

 <p>Assume that loops containing calls are safe to parallelize, allowing such loops to be

 candidates for parallelization.  In addition, no minimum loop-count threshold must be satisfied before

 parallelization occurs, and the last values of scalars are assumed to be safe.</p>

]]>

<example>-Mconcur=cncall</example>

<include flag="Mconcur"	/>

</flag>



<flag name="Mconcur_nocncall" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mconcur=nocncall\b">

<![CDATA[

 <p>Do not assume loops containing calls are safe to parallelize.</p>

]]>

<example>-Mconcur=nocncall</example>

<include flag="Mconcur"	/>

</flag>



<flag name="Mconcur_dist_block"	class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mconcur=dist:block\b">

<![CDATA[

 <p>Parallelize with block distribution.  Contiguous blocks of iterations of a parallelizable loop

 are assigned to the available processors.</p>

]]>

<example>-Mconcur=dist:block</example>

<include flag="Mconcur"	/>

</flag>



<flag name="Mconcur_dist_cyclic" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mconcur=dist:cyclic\b">

<![CDATA[

 <p>Parallelize with cyclic distribution.  The outermost parallelizable loop in any loop nest is

 parallelized.  If a parallelized loop is innermost, its iterations are allocated to processors cyclically.

 For example, if there are 3 processors executing a loop, processor 0 performs iterations 0, 3, 6, etc.; processor 1

 performs iterations 1, 4, 7, etc.; and processor 2 performs iterations 2, 5, 8, etc.</p>

]]>

<example>-Mconcur=dist:cyclic</example>

<include flag="Mconcur"	/>

</flag>



<flag name="Mconcur_innermost" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mconcur=innermost\b">

<![CDATA[

 <p>Enable parallelization of innermost loops.</p>

]]>

<example>-Mconcur=innermost</example>

<include flag="Mconcur"	/>

</flag>



<flag name="Mconcur_noinnermost" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mconcur=noinnermost\b">

<![CDATA[

 <p>Disable parallelization of innermost loops.</p>

]]>

<example>-Mconcur=noinnermost</example>

<include flag="Mconcur"	/>

</flag>





<flag name="Mconcur" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mconcur\b">

<![CDATA[

 <p>Instructs the compiler to enable auto-concurrentization of loops.  If <i>-Mconcur</i> is specified, multiple processors

 will be used to execute loops that the compiler determines to be parallelizable.</p>

]]>

<include flag="MP_BIND"	/>

<include flag="MP_BLIST" />

<include flag="NCPUS" />

<example>-Mconcur</example>

</flag>



<flag name="Minline_subopt" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Minline=([^,\s]+),(\S+)\b">

<include text="-Minline=$1" />

<include text="-Minline=$2" />

<display enable="0" />

CPU2006 flags file rule used to split an optimization flag containing sub-options into multiple flag descriptions.

The actual flag descriptions are given in the rules for the individual sub-options.

</flag>



<flag name="Minline_lib" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Minline=lib:([\.\-\w]+)\b">

<![CDATA[

 <p>Instructs the inliner to inline the functions within the library <b>filename.ext</b>.</p>

]]>

<example>-Minline=lib:filename.ext</example>

</flag>



<flag name="Minline_except" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Minline=except:([\-\w,]+)\b" >

<![CDATA[

 <p>Instructs the inliner to inline all eligible functions except <b>func</b>, a function in the source text.

 Multiple functions can be listed, separated by commas.</p>

]]>

<example>-Minline=except:func</example>

</flag>



<flag name="Minline_name" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Minline=name:([\-\w,]+)\b" >

<![CDATA[

 <p>Instructs the inliner to inline function <b>func</b>.</p>

]]>

<example>-Minline=name:func</example>

</flag>



<flag name="Minline_reshape" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Minline=reshape\b">

<![CDATA[

 <p>Allows inlining in Fortran even when array shapes do not match.</p>

]]>

<example>-Minline=reshape</example>

</flag>



<flag name="Minline_size" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Minline=size:(\d+)\b">

<![CDATA[

 <p>Instructs the inliner to inline functions with <b>n</b> or fewer statements.</p>

]]>

<example>-Minline=size:n</example>

</flag>



<flag name="Minline_levels" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Minline=levels:(\d+)\b">

<![CDATA[

 <p>Instructs the inliner to perform <b>n</b> levels of inlining.</p>

]]>

<example>-Minline=levels:n</example>

</flag>



<flag name="Minline" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Minline\b">

<![CDATA[

 <p>Instructs the inliner to perform 1 level of inlining.</p>

]]>

<example>-Minline</example>

</flag>



<flag name="Mnopropcond" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mnopropcond\b">

<![CDATA[

 <p>Disable constant propagation from assertions derived from equality conditionals.</p>

]]>

<example>-Mnopropcond</example>

</flag>



<flag name="Msmartalloc_huge_n"	class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Msmartalloc\=huge\:\d+\b">

<![CDATA[

 <p>Link with the huge page runtime library and allocate at most <b>n</b> huge pages.</p>

]]>

<example>-Msmartalloc=huge:128</example>

</flag>



<flag name="Msmartalloc_huge" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Msmartalloc\=huge\b">

<![CDATA[

 <p>Link with the huge page runtime library.</p>

]]>

<example>-Msmartalloc=huge</example>

</flag>



<flag name="Msmartalloc_hugebss" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Msmartalloc\=hugebss\b">

<![CDATA[

 <p>Link with the huge page runtime library and use huge pages for the executable's .bss section.</p>

]]>

<example>-Msmartalloc=hugebss</example>

</flag>



<flag name="Msmartalloc" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Msmartalloc\b">

<![CDATA[

 <p>Adds a call to the routine "mallopt" in the main routine.  This option can have a dramatic impact on the performance of programs that dynamically allocate memory, especially those that make a few large calls to malloc.  To be effective, this switch must be specified when compiling the file containing the Fortran, C, or C++ main routine.</p>

]]>

<example>-Msmartalloc</example>

</flag>



<flag name="Msafeptr_subopt" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Msafeptr=([^,\s]+),(\S+)\b">

<include text="-Msafeptr=$1" />

<include text="-Msafeptr=$2" />

<display enable="0" />

CPU2006 flags file rule used to split an optimization flag containing sub-options into multiple flag descriptions.

The actual flag descriptions are given in the rules for the individual sub-options.

</flag>



<flag name="Msafeptr_all" class="optimization"

 compilers="pgcc, pgcpp"

 regexp="-Msafeptr=all\b">

<![CDATA[

 <p>Assume all pointers and arrays are independent and safe for aggressive optimizations,

 and in particular that no pointers or arrays overlap or conflict with each other.</p>

]]>

<example>-Msafeptr=all</example>

<include flag="Msafeptr" />

</flag>



<flag name="Msafeptr_arg" class="optimization"

 compilers="pgcc, pgcpp"

 regexp="-Msafeptr=arg\b">

<![CDATA[

 <p>Instructs the compiler to treat arrays and pointers with the same copy-in and copy-out

 semantics as Fortran dummy arguments.</p>

]]>

<example>-Msafeptr=arg</example>

<include flag="Msafeptr" />

</flag>



<flag name="Msafeptr_auto" class="optimization"

 compilers="pgcc, pgcpp"

 regexp="-Msafeptr=auto\b">

<![CDATA[

 <p>Instructs the compiler to assume that local pointers and arrays do not overlap or

 conflict with each other and are independent.</p>

]]>

<example>-Msafeptr=auto</example>

<include flag="Msafeptr" />

</flag>



<flag name="Msafeptr_local" class="optimization"

 compilers="pgcc, pgcpp"

 regexp="-Msafeptr=local\b">

<![CDATA[

 <p>Instructs the compiler to assume that local pointers and arrays do not overlap or

 conflict with each other and are independent.</p>

]]>

<example>-Msafeptr=local</example>

<include flag="Msafeptr" />

</flag>



<flag name="Msafeptr_static" class="optimization"

 compilers="pgcc, pgcpp"

 regexp="-Msafeptr=static\b">

<![CDATA[

 <p>Instructs the compiler that static pointers and arrays do not overlap or conflict

 with each other and are independent.</p>

]]>

<example>-Msafeptr=static</example>

<include flag="Msafeptr" />

</flag>



<flag name="Msafeptr_global" class="optimization"

 compilers="pgcc, pgcpp"

 regexp="-Msafeptr=global\b">

<![CDATA[

 <p>Instructs the compiler that global or external pointers and arrays do not overlap or

 conflict with each other and are independent.</p>

]]>

<example>-Msafeptr=global</example>

<include flag="Msafeptr" />

</flag>



<flag name="Msafeptr" class="optimization"

 compilers="pgcc, pgcpp"

 regexp="-Msafeptr\b">

<![CDATA[

 <p>Instructs the C/C++ compiler to override data dependencies between pointers of a given storage class.</p>
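<p>The following C sketch illustrates the kind of dependence that -Msafeptr allows the compiler to ignore. The function add_one is hypothetical and is not part of any PGI library. When dst and src overlap, each iteration reads the element stored by the previous one, so the compiler must normally preserve that order. Under -Msafeptr the compiler may assume the pointers never overlap and reorder or vectorize the loads and stores, which changes the result if they actually do overlap. These options should therefore only be used when the pointers really are independent.</p>

<pre>
```c
/* Hypothetical kernel used only to illustrate pointer overlap.
   Without -Msafeptr the compiler must assume dst and src may overlap,
   so it has to keep the store-then-load order of every iteration. */
static void add_one(double *dst, const double *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i] + 1.0;   /* may read an element written by the previous iteration */
}
```
</pre>

<p>With a = {0, 0, 0, 0, 0}, the call add_one(a + 1, a, 4) leaves a = {0, 1, 2, 3, 4} when the dependence is honored; a version that loaded all of src before storing anything would instead produce {0, 1, 1, 1, 1}.</p>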

]]>

<example>-Msafeptr</example>

</flag>



<flag name="Munroll_subopt" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Munroll=([^,\s]+),(\S+)\b">

<include text="-Munroll=$1" />

<include text="-Munroll=$2" />

<display enable="0" />

CPU2006 flags file rule used to split an optimization flag containing sub-options into multiple flag descriptions.

Refer to the flags file rules for the individual sub-options for the actual flag descriptions.

</flag>



<flag name="Munroll_c_n" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Munroll=c:(\d+)\b">

<![CDATA[

 <p>"-Munroll=c:n" instructs the compiler to completely unroll loops with a constant loop count of less than

 or equal to <b>n</b>, where <b>n</b> is a supplied constant value. If no constant value is given, then a default of 4 is used.</p>

]]>

<example>-Munroll=c:n</example>

</flag>



<flag name="Munroll_n_n" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Munroll=n:(\d+)\b">

<![CDATA[

 <p>"-Munroll=n:n" instructs the compiler to unroll loops <b>n</b> times, where <b>n</b> is a supplied constant value.

If no constant value is given, then a default of 4 is used.</p>
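<p>As a hand-written illustration (not the code the PGI compiler actually generates), unrolling a loop 4 times, as -Munroll=n:4 requests, replaces one copy of the loop body per iteration with four copies, plus a remainder loop for the final n % 4 iterations. The function names below are hypothetical. Because the additions stay in their original order, both versions return identical results.</p>

<pre>
```c
/* Rolled loop: one addition per iteration. */
static double sum_rolled(const double *x, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Hand-unrolled by 4 (hypothetical illustration of -Munroll=n:4):
   the main loop performs four additions per trip and the remainder
   loop handles the last n % 4 elements.  The additions keep their
   original order, so the result is bit-identical to sum_rolled(). */
static double sum_unrolled4(const double *x, int n)
{
    double s = 0.0;
    int i = 0;
    for (; i + 3 < n; i += 4) {
        s += x[i];
        s += x[i + 1];
        s += x[i + 2];
        s += x[i + 3];
    }
    for (; i < n; i++)      /* remainder: 0 to 3 iterations */
        s += x[i];
    return s;
}
```
</pre>

<p>The unrolled form trades code size for fewer loop-control instructions and more room for instruction scheduling per trip.</p>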

]]>

<example>-Munroll=n:n</example>

</flag>



<flag name="Munroll_m_n" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Munroll=m:(\d+)\b">

<![CDATA[

 <p>"-Munroll=m:n" instructs the compiler to unroll loops with multiple blocks <b>n</b> times, where <b>n</b> is a supplied constant value.

If no constant value is given, then a default of 4 is used.</p>

]]>

<example>-Munroll=m:n</example>

</flag>



<flag name="Munroll_m" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Munroll=m\b">

<![CDATA[

 <p>Instructs the compiler to unroll loops with multiple blocks 4 times (the default).</p>

]]>

<example>-Munroll=m</example>

</flag>



<flag name="Munroll" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Munroll\b">

<![CDATA[

 <p>Invokes the loop unroller.</p>

]]>

<example>-Munroll</example>

</flag>



<flag name="Mnounroll" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mnounroll\b">

<![CDATA[

 <p>Disable loop unrolling.</p>

]]>

<example>-Mnounroll</example>

</flag>



<flag name="Mnodepchk" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="\-Mnodepchk\b">

<![CDATA[

 <p>Don't check dependence relations for vector or parallel code.</p>

]]>

</flag>



<flag name="Msafe_lastval" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="\-Msafe_lastval\b">

<![CDATA[

 <p>Allow parallelization of loops with conditional scalar assignments.</p>

]]>

</flag>



<flag name="Mstride0" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="\-Mstride0\b">

<![CDATA[

 <p>Generate code to check for zero loop increments.</p>

]]>

</flag>



<flag name="Msmart" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Msmart\b">

<![CDATA[

 <p>Enable optional post-pass instruction scheduling.</p>

]]>

<example>-Msmart</example>

</flag>



<flag name="Mnosmart" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mnosmart\b">

<![CDATA[

 <p>Disable optional post-pass instruction scheduling.</p>

]]>

<example>-Mnosmart</example>

</flag>



<flag name="Mvect_subopt" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mvect=([^,\s]+),([\:\S]+)\b">

<include text="-Mvect=$1" />

<include text="-Mvect=$2" />

<display enable="0" />

CPU2006 flags file rule used to split an optimization flag containing sub-options into multiple flag descriptions.

Refer to the flags file rules for the individual sub-options for the actual flag descriptions.

</flag>



<flag name="Mnovect" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mnovect\b">

<![CDATA[

 <p>Disable automatic vector pipelining.</p>

]]>

<example>-Mnovect</example>

</flag>



<flag name="Mvect_altcode" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mvect=altcode\b">

<![CDATA[

 <p>Instructs the vectorizer to generate alternate code for vectorized loops when appropriate. For each

 vectorized loop the compiler decides whether to generate altcode and what type or types to generate, which may

 be any or all of:</p>

<ul>

<li>Altcode without iteration peeling</li>

<li>Altcode with non-temporal stores and other data cache optimizations</li>

<li>Altcode based on array alignments calculated dynamically at runtime</li>

</ul>

<p>The compiler also determines suitable loop count and array alignment conditions for executing the altcode.</p>

]]>

<example>-Mvect=altcode</example>

</flag>



<flag name="Mvect_noaltcode" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mvect=noaltcode\b">

<![CDATA[

 <p>Disables alternate code generation for vectorized loops.</p>

]]>

<example>-Mvect=noaltcode</example>

</flag>



<flag name="Mvect_assoc" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mvect=assoc\b">

<![CDATA[

 <p>Instructs the vectorizer to enable certain associativity conversions that can change the results of a computation

 due to round-off error. A typical optimization is to change an arithmetic operation to another arithmetic operation that is

 mathematically equivalent but can produce a computationally different result because of round-off error.</p>
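<p>The sketch below shows why associativity conversions can change results. The function names are hypothetical, and the example assumes IEEE double arithmetic on an x86-64 (SSE2) target, compiled without fast-math options. A vectorizer that splits a sum into independent lanes adds the elements in a different order than the source specifies. For the input {1e16, 1.0, -1e16, 1.0}, the in-order sum is 1.0 because 1e16 + 1.0 rounds back to 1e16, while the two-lane sum is 2.0.</p>

<pre>
```c
/* Strict left-to-right sum: the order the source code specifies. */
static double sum_in_order(const double *x, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Two-lane partial sums combined at the end: the kind of reordering a
   vectorizer may perform when associativity conversions are allowed.
   Assumes n is even to keep the sketch short. */
static double sum_two_lanes(const double *x, int n)
{
    double lane0 = 0.0, lane1 = 0.0;
    for (int i = 0; i < n; i += 2) {
        lane0 += x[i];       /* even-indexed elements */
        lane1 += x[i + 1];   /* odd-indexed elements */
    }
    return lane0 + lane1;
}
```
</pre>

<p>Both functions are mathematically equivalent; only the rounding of intermediate results differs, which is exactly the effect -Mvect=noassoc prevents.</p>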

]]>

<example>-Mvect=assoc</example>

</flag>



<flag name="Mvect_noassoc" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mvect=noassoc\b">

<![CDATA[

 <p>Instructs the vectorizer to disable associativity conversions.</p>

]]>

<example>-Mvect=noassoc</example>

</flag>



<flag name="Mvect_cachesize_n" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="\-Mvect=cachesize:(\d+)\b">

<![CDATA[

 <p>Instructs the vectorizer, when performing cache tiling optimizations, to assume a cache size of <b>n</b>.

 The default size is <b>n</b>=262144.</p>

]]>

<example>-Mvect=cachesize:n</example>

</flag>



<flag name="Mvect_fuse"	class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mvect=fuse\b">

<![CDATA[

 <p>Instructs the vectorizer to enable loop fusion.</p>

]]>

<example>-Mvect=fuse</example>

<include flag="Mvect" />

</flag>



<flag name="Mvect_nogather" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mvect=nogather\b">

<![CDATA[

 <p>Instructs the vectorizer to disable vectorization of indirect array references.</p>

]]>

<example>-Mvect=nogather</example>

<include flag="Mvect" />

</flag>



<flag name="Mvect_idiom" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mvect=idiom\b">

<![CDATA[

 <p>Instructs the vectorizer to enable idiom recognition.</p>

]]>

<example>-Mvect=idiom</example>

</flag>



<flag name="Mvect_noidiom" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mvect=noidiom\b">

<![CDATA[

 <p>Instructs the vectorizer to disable idiom recognition.</p>

]]>

<example>-Mvect=noidiom</example>

</flag>



<flag name="Mvect_nosizelimit" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mvect=nosizelimit\b">

<![CDATA[

 <p>Generate vector loops for all loops where possible regardless of the number of

 statements in the loop. This overrides a heuristic in the vectorizer that ordinarily

 prevents vectorization of loops with a number of statements that exceeds a certain threshold.</p>

]]>

<example>-Mvect=nosizelimit</example>

<include flag="Mvect" />

</flag>



<flag name="Mvect_prefetch" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mvect=prefetch\b">

<![CDATA[

 <p>Instructs the vectorizer to generate prefetch instructions.</p>

]]>

<example>-Mvect=prefetch</example>

<include flag="Mvect" />

</flag>



<flag name="Mvect_sse" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mvect=sse\b">

<![CDATA[

 <p>Instructs the vectorizer to search for vectorizable loops and, where possible, make use of

 SSE, SSE2, and prefetch instructions.</p>

]]>

<include flag="Mvect" />

<example>-Mvect=sse</example>

</flag>



<flag name="Mvect" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mvect\b">

<![CDATA[

 <p>Enable automatic vector pipelining.</p>

]]>

<include flag="Mvect_assoc" />

<include flag="Mvect_altcode" />

<include text="-Mvect=cachesize:262144" />

<example>-Mvect</example>

</flag>



<flag name="Mnofptrap" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mnofptrap\b">

<![CDATA[

 <p>Disables -Ktrap=fp.</p>

]]>

<example>-Mnofptrap</example>

<include flag="Ktrap" />

</flag>



<flag name="Ktrap" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Ktrap\w*\b">

<![CDATA[

 <p>-Ktrap is only processed by the compilers when compiling the file that contains the main program. The options inv, denorm, divz, ovf, unf, and inexact correspond to the processor's exception mask bits invalid operation, denormalized operand, divide-by-zero, overflow, underflow, and precision, respectively. Normally, the processor's exception mask bits are on (floating-point exceptions are masked; the processor recovers from the exceptions and continues). If a floating-point exception occurs and its corresponding mask bit is off (unmasked), execution terminates with an arithmetic exception (C's SIGFPE signal). -Ktrap=fp is equivalent to -Ktrap=inv,divz,ovf.

</p>

]]>

<example>-Ktrap=fp</example>

</flag>



<flag name="Mlongbranch" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mlongbranch\b">

<![CDATA[

 <p>Enable long branches.</p>

]]>

<example>-Mlongbranch</example>

</flag>



<flag name="acml" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-lacml\b">

<![CDATA[

 <p>Link with the AMD Core Math Library, available from www.amd.com.</p>

]]>

<example>-lacml</example>

</flag>



<flag name="mp"	class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-mp\b">

<![CDATA[

 <p>Use the -mp option to instruct the compiler to interpret user-inserted OpenMP shared-memory parallel programming directives and generate an executable file that will utilize multiple processors in a shared-memory parallel system.



When used only as a linker flag, the PGI OpenMP runtime is linked, and users can use the environment variables MP_BIND and MP_BLIST to bind a serial program to a CPU.

</p>

]]>

<include flag="MP_BIND"	/>

<include flag="MP_BLIST" />

<include flag="NCPUS" />

<example>-mp</example>

</flag>



<flag name="mp_align" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-mp=align\b">

<![CDATA[

 <p>The align sub-option to -mp forces loop iterations to be allocated to OpenMP processes using an algorithm that maximizes alignment of vector sub-sections in loops that are both parallelized and vectorized for SSE. This can improve performance in program units that include many such loops. However, it can result in load-balancing problems that significantly decrease performance in program units with relatively short loops that contain a large amount of work in each iteration.</p>

]]>

<example>-mp=align</example>

<include flag="mp" />

</flag>



<flag name="mp_numa" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-mp=numa\b">

<![CDATA[

 <p>The numa sub-option to -mp uses libnuma on systems where it is available.</p>

]]>

<example>-mp=numa</example>

<include flag="mp" />

</flag>



<flag name="mp_nonuma" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-mp=nonuma\b">

<![CDATA[

 <p>The nonuma sub-option to -mp tells the driver not to link with libnuma.</p>

]]>

<example>-mp=nonuma</example>

<include flag="mp" />

</flag>



<flag name="mcmodel_medium" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-mcmodel=medium\b">

<![CDATA[

 <p>(For use only on 64-bit Linux targets) Generate code for the medium memory model in the linux86-64 execution environment. The default small memory model of the linux86-64 environment limits the combined area for a user's object or executable to 1GB, with the Linux kernel managing usage of the second 1GB of address space for system routines, shared libraries, stacks, etc. Programs are started at a fixed address, and the program can use a single instruction to make most memory references. The medium memory model allows for data areas, or .bss sections, larger than 2GB. Program units compiled using either -mcmodel=medium or -fpic require additional instructions to reference memory. The effect on performance is a function of the data use of the application. The -mcmodel=medium switch must be used at both compile time and link time to create 64-bit executables. Program units compiled for the default small memory model can be linked into medium memory model executables as long as they are compiled with -fpic (position-independent code).</p>

]]>

<example>-mcmodel=medium</example>

<include flag="Mlarge_arrays" />

</flag>



<flag name="Mlarge_arrays" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mlarge_arrays\b">

<![CDATA[<p>Enable support for 64-bit indexing and single static data objects larger than 2GB in size. This option is the default in the presence of -mcmodel=medium. It can also be used with the default small memory model for certain 64-bit applications that manage their own memory space.</p>

]]>

<example>-Mlarge_arrays</example>

</flag>



<flag name="Mdse" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Mdse\b">

<![CDATA[

<p>Enable dead store elimination.</p>

]]>

</flag>



<flag name="alias_ansi"	class="optimization"

 compilers="pgcc, pgcpp"

 regexp="\-alias\=ansi\b">

<![CDATA[

<p>Enable optimizations using ANSI C type-based pointer disambiguation.</p>

]]>

</flag>



<flag name="Bstatic_pgi" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-Bstatic_pgi\b">

<![CDATA[

<p>Statically link with the PGI runtime libraries. System libraries may still be dynamically linked.</p>

]]>

</flag>



<flag name="Odefault" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-O\b">

<![CDATA[

 <p>Set the optimization level to -O2.</p>

]]>

<include flag="O2" />

<example>-O</example>

</flag>



<flag name="O0"	class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-O0\b">

<![CDATA[

 <p>A basic block is generated for each source statement. No scheduling is done

between statements. No global optimizations are performed.</p>

]]>

<example>-O0</example>

</flag>



<flag name="O1"	class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-O1\b">

<![CDATA[

 <p>Level-one optimization specifies local optimization (-O1). The compiler performs scheduling of basic blocks as well as register allocation. This optimization level is a good choice when the code is very irregular; that is, it contains many short statements containing IF statements and does not contain loops (DO or DO WHILE statements). For certain types of code, this optimization level may perform better than level-two (-O2), although this case rarely occurs.</p>

<p>The PGI compilers perform many different types of local optimizations, including but not limited to:</p>

<ul>

<li>Algebraic identity removal</li>

<li>Constant folding</li>

<li>Common subexpression elimination</li>

<li>Local register optimization</li>

<li>Peephole optimizations</li>

<li>Redundant load and store elimination</li>

<li>Strength reductions</li>

</ul>



]]>

<example>-O1</example>

</flag>



<flag name="O2"	class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-O2\b">

<![CDATA[

 <p>

Level-two optimization (-O2 or -O) specifies global optimization. The -fast option generally specifies global optimization; however, the set of switches selected by -fast varies from release to release. The -O or -O2 level performs all level-one local optimizations as well as global optimizations. Control flow analysis is applied and global registers are allocated for all functions and subroutines. Loop regions are given special consideration. This optimization level is a good choice when the program contains loops, the loops are short, and the structure of the code is regular.</p>

<p>The PGI compilers perform many different types of global optimizations, including but not limited to:</p>

<ul>

<li>Branch to branch elimination</li>

<li>Constant propagation</li>

<li>Copy propagation</li>

<li>Dead store elimination</li>

<li>Global register allocation</li>

<li>Invariant code motion</li>

<li>Induction variable elimination</li>

</ul>


]]>

<include flag="O1" />

<example>-O2</example>

</flag>



<flag name="O3"	class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-O3\b">

<![CDATA[

 <p>All level 1 and 2 optimizations are performed.

In addition, this level enables more aggressive code hoisting and scalar replacement optimizations that may or may not be profitable.</p>

]]>

<include flag="O2" />

<example>-O3</example>

</flag>



<flag name="O4"	class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-O4\b">

<![CDATA[

 <p>Performs all level 1, 2, and 3 optimizations and enables hoisting of guarded invariant floating point expressions.</p>

]]>

<include flag="O3" />

<example>-O4</example>

</flag>



<flag name="tp_subopt" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-tp\s+([^,\s]+),(\S+)\b">

<include text="-tp $1" />

<include text="-tp $2" />

<display enable="1" />

Create a Unified Binary using multiple targets.

</flag>



<flag name="tpk8-32" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-tp\s+k8-32\b">

<![CDATA[

 <p>Specify the type of the target processor as an AMD64 processor in 32-bit mode.</p>

]]>

<example>-tp k8-32</example>

</flag>



<flag name="tpk8-64" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-tp\s+k8-64\b">

<![CDATA[

 <p>Specify the type of the target processor as an AMD64 processor in 64-bit mode.</p>

]]>

<example>-tp k8-64</example>

</flag>



<flag name="tpk8" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-tp\s+k8(?:$|\s)">

<![CDATA[

 <p>Specify the type of the target processor as an AMD64 processor in 32-bit mode.</p>

]]>

<example>-tp k8</example>

</flag>



<flag name="tpbarcelona-64" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-tp\s+barcelona-64\b">

<![CDATA[

 <p>Specify the type of the target processor as an AMD64 Barcelona processor in 64-bit mode.</p>

]]>

<example>-tp barcelona-64</example>

</flag>





<flag name="tpbarcelona32" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-tp\s+barcelona-32\b">

<![CDATA[

 <p>Specify the type of the target processor as an AMD64 Barcelona processor in 32-bit mode.</p>

]]>

<example>-tp barcelona-32</example>

</flag>





<flag name="tpbarcelona" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-tp\s+barcelona\b">

<![CDATA[

 <p>Specify the type of the target processor as an AMD64 Barcelona processor in 32-bit mode.</p>

]]>

<example>-tp barcelona</example>

</flag>



<flag name="tpp7-64" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-tp\s+p7-64\b">

<![CDATA[

 <p>Specify the type of the target processor as Intel P7 architecture with

 EM64T, 64-bit mode.</p>

]]>

<example>-tp p7-64</example>

</flag>



<flag name="tpp7" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-tp\s+p7\b">

<![CDATA[

 <p>Specify the type of the target processor as Intel P7 architecture (Pentium

 4, Xeon, Centrino).</p>

]]>

<example>-tp p7</example>

</flag>



<flag name="tpcore2-64"	class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-tp\s+core2-64\b">

<![CDATA[

 <p>Specify the type of the target processor as Intel Core 2 EM64T or compatible architecture using 64-bit mode.</p>

]]>

<example>-tp core2-64</example>

</flag>



<flag name="tpcore2" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-tp\s+core2\b">

<![CDATA[

 <p>Specify the type of the target processor as Intel Core 2 or compatible architecture using 32-bit mode.</p>

]]>

<example>-tp core2</example>

</flag>



<flag name="tpx64" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-tp\s+x64\b">

<![CDATA[

 <p>Use the unified AMD/Intel 64-bit mode.</p>

]]>

<example>-tp x64</example>

</flag>





<flag name="tp"	class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-tp\s+([\-\w]+)\b">

<include text="-tp $1" />

<display enable="0" />

</flag>



<flag name="NCPUS" class="other"

 compilers="pgcc, pgcpp, pgf95"

 regexp="NCPUS\b">

<![CDATA[

 <p>The NCPUS environment variable can be used to set the number of processes or threads used

in parallel regions. The default is to use only one process or thread (serial mode). If both

OMP_NUM_THREADS and NCPUS are set, the value of OMP_NUM_THREADS takes precedence.

Warning: setting NCPUS to a value larger than the number of physical processors or cores in your system

can cause parallel programs to run very slowly.</p>

]]>

</flag>



<flag name="MP_BIND" class="other"

 compilers="pgcc, pgcpp, pgf95"

 regexp="MP_BIND\b">

<![CDATA[

<p>The MP_BIND environment variable can be set to yes or y to bind processes or threads

executing in a parallel region to physical processors, or to no or n to disable such binding. The default is

not to bind processes to processors. This is an execution-time environment variable interpreted by the

PGI runtime support libraries. It does not affect the behavior of the PGI compilers in any way. Note: the

MP_BIND environment variable is not supported on all platforms.</p>

]]>

</flag>



<flag name="MP_BLIST" class="other"

 compilers="pgcc, pgcpp, pgf95"

 regexp="MP_BLIST\b">

<![CDATA[

 <p>In addition to MP_BIND, the MP_BLIST variable can be used to define the thread-to-CPU relationship.

For example, setting MP_BLIST=3,2,1,0 maps CPUs 3, 2, 1, and 0 to threads 0, 1, 2, and 3, respectively.</p>
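<p>The mapping can be pictured with the following C sketch. The helper parse_blist is hypothetical and is not how the PGI runtime is implemented; it assumes decimal CPU numbers separated by single commas with no spaces. It only shows that the i-th entry of the list names the CPU for thread i, so "3,2,1,0" binds thread 0 to CPU 3 and thread 3 to CPU 0.</p>

<pre>
```c
/* Hypothetical helper (not part of the PGI runtime): parse an MP_BLIST-style
   list such as "3,2,1,0" so that thread t is bound to CPU cpu[t].
   Returns the number of CPUs parsed, or -1 on malformed input. */
static int parse_blist(const char *s, int *cpu, int max_threads)
{
    int count = 0;
    for (;;) {
        if (count == max_threads || *s < '0' || *s > '9')
            return -1;                  /* too many entries, empty entry, or bad character */
        int value = 0;
        while (*s >= '0' && *s <= '9')  /* accumulate one decimal CPU number */
            value = value * 10 + (*s++ - '0');
        cpu[count++] = value;           /* thread (count - 1) runs on CPU value */
        if (*s == '\0')
            return count;               /* end of list */
        if (*s++ != ',')
            return -1;                  /* entries must be comma-separated */
    }
}
```
</pre>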

]]>

</flag>



<flag name="PGI_HUGE_PAGES" class="other"

 compilers="pgcc, pgcpp, pgf95"

 regexp="PGI_HUGE_PAGES\b">

<![CDATA[

 <p>Sets the maximum number of huge pages each process is allowed to use. If not set, then the process may use all available huge pages.</p>

]]>

</flag>



<flag name="Link_path" class="other"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-L[\w\\\/]+\b">

<![CDATA[

<p>Specifies a directory to search for libraries. Use -L to add directories to the search path for library files.

Multiple -L options are valid. However, the position of multiple -L options is important relative to -l

options supplied.</p>

]]>

<example>-L/path/to/libs</example>

</flag>



<flag name="SmartHeap" class="optimization"

 compilers="pgcc, pgcpp, pgf95"

 regexp="-lsmartheap\b">

<![CDATA[

<p>Link using <a href="http://www.microquill.com/">MicroQuill's</a> SmartHeap 8 (32-bit) library for Linux.

Description from Microquill: </p>

<p>SmartHeap is a fast (3X-100X faster than compiler-supplied libraries), portable (Windows, Linux, Solaris, HP-UX, IBM-AIX, Dec OSF Tru64, SGI Irix), reliable, ANSI-compliant malloc/operator new library. SmartHeap supports multiple memory pools, includes a fixed-size allocator, and is thread-safe. SmartHeap also includes comprehensive memory debugging APIs to detect leakage, overwrites, double-frees, wild pointers, out of memory, references to previously freed memory, and other memory errors.

</p>

]]>

</flag>



</flagsdescription>

