SPEC Accel OpenMP Flag Description for the Intel(R) C/C++ Compiler for IA32 and Intel 64 applications and Intel(R) Fortran Compiler for IA32 and Intel 64 applications

Optimization Flags

-Istd
-Istdi
-Lstd
-qopenmp
-qopenmp-offload
-Ofast
-O3
-no-prec-div
-fp-model
-xCORE-AVX512
-qopt-zmm-usage
-fimf-precision
-no-prec-sqrt
-qopt-multiple-gather-scatter-by-shuffles
-qopt-streaming-stores
-ip
-ipo
-qopt-prefetch

- -Istd
- -I.?\s*[^ ]*include[^ ]*
- Adds the directory for include files to the search path at compile time.
- -Istdi
- -I.?\s
- Adds the directory for include files to the search path at compile time.
- -Lstd
- -L\s*[^ ]*[^ ]*
- Adds the directory to the library search path at link time.
- -qopenmp
- -qopenmp(?=\s|$)
- Enable the compiler to generate multi-threaded code based on the OpenMP* directives (same as -fopenmp)
- -qopenmp-offload
- -qopenmp-offload=(host|mic|gfx)(?=\s|$)
- Enables OpenMP* offloading compilation for target pragmas. This option only applies to Intel(R) MIC Architecture and Intel(R) Graphics Technology. Enabled by default with -qopenmp. Use -qno-openmp-offload to disable.
  The kind argument sets the default device for target pragmas:
  host - allow target code to run on host system while still doing the outlining for offload
  mic - specify Intel(R) MIC Architecture
  gfx - specify Intel(R) Graphics Technology
- -Ofast
- -Ofast(?=\s|$)
- Enable -O3 -no-prec-div -fp-model fast=2 optimizations.
- -O3
- -O3(?=\s|$)
- Optimize for maximum speed and enable more aggressive optimizations that may not improve performance on some programs.
- -no-prec-div
- -no-prec-div(?=\s|$)
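- Allows faster but slightly less precise floating-point division, for example by replacing a divide with a multiplication by the reciprocal of the denominator. The opposite setting, -prec-div, improves the precision of FP divides (some speed impact).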
- -fp-model
- -fp-model=([a-z\,/]+|$)(?=\s|$)
- -fp-model
  enable floating point model variation
  [no-]except - enable/disable floating point exception semantics
  fast[=1|2] - enables more aggressive floating point optimizations
  precise - allows value-safe optimizations
  source - enables intermediates in source precision; sets -assume protect_parens for Fortran
  strict - enables -fp-model precise -fp-model except, disables contractions and enables pragma stdc fenv_access
  consistent - enables consistent, reproducible results for different optimization levels or between different processors of the same architecture
  double - rounds intermediates in 53-bit (double) precision
  extended - rounds intermediates in 64-bit (extended) precision
- -xCORE-AVX512
- -xCORE-AVX512(?=\s|$)
- May generate Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) Foundation instructions, Intel(R) AVX-512 Conflict Detection instructions, Intel(R) AVX-512 Doubleword and Quadword instructions, Intel(R) AVX-512 Byte and Word instructions and Intel(R) AVX-512 Vector Length Extensions for Intel(R) processors, and the instructions enabled with CORE-AVX2.
- -qopt-zmm-usage
- -qopt-zmm-usage=(low|high)(?=\s|$)
- -qopt-zmm-usage=
  Specifies the level of zmm registers usage. You can specify one of the following:
  low - Tells the compiler that the compiled program is unlikely to benefit from zmm registers usage. It specifies that the compiler should avoid using zmm registers unless it can prove the gain from their usage.
  high - Tells the compiler to generate zmm code without restrictions
- -fimf-precision
- -fimf-precision=(high|medium|low)($|[a-z\,\:]+|)
- -fimf-precision=value[:funclist]
  defines the accuracy (precision) for math library functions
  value - defined as one of the following values
  high - equivalent to max-error = 0.6
  medium - equivalent to max-error = 4 (DEFAULT)
  low - equivalent to accuracy-bits = 11 (single precision); accuracy-bits = 26 (double precision)
  funclist - optional comma separated list of one or more math library functions to which the attribute should be applied
- -no-prec-sqrt
- -no-prec-sqrt(?=\s|$)
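- Allows the compiler to use a faster but less precise implementation of square root. The opposite setting, -prec-sqrt, disables these square root optimizations in favor of accuracy.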
- -qopt-multiple-gather-scatter-by-shuffles
- -qopt-multiple-gather-scatter-by-shuffles(?=\s|$)
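- Enables the optimization for multiple adjacent gather/scatter type vector memory references, which the compiler may then implement with shuffle sequences instead of separate gather/scatter instructions.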
- -qopt-streaming-stores
- -qopt-streaming-stores (always|auto|never)
- Specifies whether streaming stores are generated:
  
  always - enables generation of streaming stores under the assumption that the application is memory bound
  
  auto - compiler decides when streaming stores are used (DEFAULT)
  
  never - disables generation of streaming stores
- -ip
- -ip(?=\s|$)
- Enable single-file interprocedural (IP) optimization within files.
- -ipo
- -ipo(?=\s|$)
- Enable multi-file interprocedural (IP) optimization between files.
- -qopt-prefetch
- -qopt-prefetch=([0-5])(?=\s|$)
- Sets the level of prefetch insertion to n, where n may be 0 through 5 inclusive; 0 disables prefetch insertion. Default is 2.
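
As an illustration of the kind of code that -qopenmp acts on, the following is a minimal sketch of an OpenMP parallel reduction in C. The function name dot and the file name used in the note below are hypothetical and are not taken from any SPEC ACCEL benchmark. When OpenMP is not enabled, the pragma is ignored and the loop runs serially with the same result.

```c
#include <stddef.h>

/* Hypothetical example: dot product of two vectors of length n.
   With OpenMP enabled (for example via -qopenmp), the loop iterations
   are divided among threads and the per-thread partial sums are
   combined by the reduction clause. Without OpenMP, the pragma is
   ignored and the loop runs serially. */
double dot(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Assuming the function is saved in a file such as dot.c, a command along the lines of "icc -qopenmp -O3 -c dot.c" would compile it with OpenMP enabled. Because a parallel reduction may add the partial sums in a different order than a serial loop, results for general floating-point data can differ in the last bits between runs or thread counts.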

- -port_80
- -80
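- FPORTABILITY flag. Treats the statement field of each fixed-form Fortran source line as ending at column 80 (an alternate spelling of -extend-source 80).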
- -port_noformain
- -nofor-main
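- Specifies that the main program is not written in Fortran (for example, main is written in C), so the Fortran main entry object is not linked into the application.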
- -declare_use_inner_simd
- -DSPEC_USE_INNER_SIMD
- Enables the use of nested SIMD statements for OpenMP.
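
A preprocessor definition such as -DSPEC_USE_INNER_SIMD typically works by guarding a pragma in the benchmark source. The sketch below shows one way this could look; the kernel scale_rows is hypothetical and is not taken from any SPEC ACCEL benchmark, so the actual guarded code may differ.

```c
#include <stddef.h>

/* Hypothetical kernel: multiplies every element of row r of a
   rows x cols matrix (row-major storage) by factor[r]. */
void scale_rows(double *m, const double *factor, size_t rows, size_t cols)
{
    /* Outer loop: rows are distributed across threads when OpenMP is enabled. */
    #pragma omp parallel for
    for (long r = 0; r < (long)rows; r++) {
#ifdef SPEC_USE_INNER_SIMD
        /* Present only when compiled with -DSPEC_USE_INNER_SIMD:
           asks the compiler to vectorize the inner loop. */
        #pragma omp simd
#endif
        for (long c = 0; c < (long)cols; c++) {
            m[r * (long)cols + c] *= factor[r];
        }
    }
}
```

The result is the same with or without the definition; only the nested SIMD directive, and therefore the code the compiler may generate for the inner loop, changes.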

Compiler Flags

-intel_cc
-intel_CC
-intel_f90

- -intel_cc
- (?:/\S+/)?icc\b
- Invoke the Intel C compiler.
- -intel_CC
- (?:/\S+/)?icpc(?=\s|$)
- Invoke the Intel C++ compiler.
- -intel_f90
- (?:/\S+/)?ifort\b
- Invoke the Intel Fortran compiler.

Other Flags

-lfftw3

- -lfftw3
- -lfftw3(?=\s|$)
- Link with the FFTW 3.3.9 library for Linux. The FFTW library was compiled with -O3 -xCORE-AVX512 -qopt-zmm-usage=high.
  
  Description from FFTW:
  
  FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e. the discrete cosine/sine transforms or DCT/DST).

Shell, Environment, and Other Software Settings

One or more of the following settings may have been applied to the testbed. If so, the "Platform Notes" section of the report will say so; and you can read below to find out more about what these settings mean.

LD_LIBRARY_PATH=<directories> (linker)
LD_LIBRARY_PATH controls the search order for both the compile-time and run-time linkers. The default is usually sufficient, but testers may choose to set it explicitly (as documented in the notes in the submission) to ensure that the correct versions of libraries are picked up.

STACKSIZE=<n> (Unix)
Sets the size of the stack (temporary storage area) for each worker thread of a multithreaded program.