CPU2017 Result Flag Description

Base Optimization Flags

C benchmarks

- -m64
- CC, LD
- Generates code for a 64-bit environment, in which int is 32 bits and long and pointers are 64 bits, targeting AMD's x86-64 architecture. The compiler generates code for the AMD64, Intel 64, and x86-64 64-bit ABI. On a 32-bit host the default is the 32-bit ABI. On a 64-bit host the default is the 64-bit ABI if the specified target platform is 64-bit; otherwise the default is 32-bit.
- -Wl,-mllvm -Wl,-align-all-nofallthru-blocks=6
- LDFLAGS
- Forces the alignment of all blocks that have no fall-through predecessors (i.e. does not add nops that are executed). The value is given in log2 format (e.g. 4 means align on 16-byte boundaries, so 6 means align on 64-byte boundaries).
- -Wl,-mllvm -Wl,-reduce-array-computations=3
- LDFLAGS
- Eliminates array computations based on their usage: computations on unused array elements and on zero-valued array elements are removed. This optimization requires -flto, because it depends on whole-program analysis.
  
  Possible values:
  - 1: Eliminates the computations on unused array elements
  - 2: Eliminates the computations on zero-valued array elements
  - 3: Eliminates the computations on unused and zero-valued array elements
- -O3
- COPTIMIZE
- Like -O2, except that it enables optimizations that take longer to perform or that may generate larger code (in an attempt to make the program run faster).
  
  If multiple "O" options are used, with or without level numbers, the last such option is the one that is effective.
- Includes:
  - -O2
    - -O1
- -march=znver5
- COPTIMIZE
- Specifies that Clang should generate code for a specific processor family member and later. For example, if you specify -march=znver1, the compiler is allowed to generate instructions that are valid on AMD Zen processors, but which may not exist on earlier products. -march=znver4 enables the AVX-512 ISA for Genoa (znver4) processors.
- -fveclib=AMDLIBM
- COPTIMIZE
- Use the given vector functions library.
- -ffast-math
- COPTIMIZE
- Enables a range of optimizations that provide faster, though sometimes less precise, mathematical operations that may not conform to the IEEE-754 specifications. When this option is specified, the __STDC_IEC_559__ macro is ignored even if set by the system headers.
- -fopenmp
- Yes
- COPTIMIZE
- Enables handling of OpenMP directives and generates parallel code. The OpenMP library to be linked can be specified through the -fopenmp=library option.
- -DSPEC_OPENMP
- COPTIMIZE
- Definition of this macro indicates that compilation for parallel operation is enabled, and that any OpenMP directives or pragmas will be visible to the compiler. The behavior of this macro is overridden if -DSPEC_SUPPRESS_OPENMP also appears in the list of compilation flags.
- -flto
- COPTIMIZE
- Generate output files in LLVM formats suitable for link time optimization. When used with -S this generates LLVM intermediate language assembly files, otherwise this generates LLVM bitcode format object files (which may be passed to the linker depending on the stage selection options).
- -fremap-arrays
- COPTIMIZE
- This option enables an optimization that transforms the data layout of a single dimensional array to provide better cache locality by analysing the access patterns.
- -fstrip-mining
- COPTIMIZE
- Enables loop strip mining optimization. This optimization breaks a large loop into smaller segments or strips to improve temporal and spatial locality.
- -fstruct-layout=7
- COPTIMIZE
- Analyzes the whole program to determine whether the structures in the code can be peeled, whether dead or redundant fields can be deleted, and whether the pointer or integer fields in a structure can be compressed. If feasible, this optimization transforms the code to enable these improvements. The transformation is likely to improve cache utilization and memory bandwidth, and is expected to improve the scalability of programs executed on multiple cores. It is effective only with -flto, because whole-program analysis is required to perform this optimization. You can choose how aggressively the optimization is applied to your application, with 1 being the least aggressive and 7 being the most aggressive level.
  
  Possible values:
  - fstruct-layout=0: disables structure peeling (default).
  - fstruct-layout=1: enables structure peeling.
  - fstruct-layout=2: enables structure peeling and selectively compresses self-referential pointers in these structures to 32-bit pointers wherever safe.
  - fstruct-layout=3: enables structure peeling and selectively compresses self-referential pointers in these structures to 16-bit pointers wherever safe.
  - fstruct-layout=4: enables structure peeling, pointer compression as in level 2 and further enables compression of structure fields which are of 64-bit integer type to 32-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=5: enables structure peeling, pointer compression as in level 3 and further enables compression of structure fields which are of 64-bit integer type to 32-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=6: enables structure peeling, pointer compression as in level 2 and further enables compression of structure fields of 64-bit integer type to 16-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=7: enables structure peeling, pointer compression as in level 3 and further enables compression of structure fields of 64-bit integer type to 16-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=8: enables structure peeling, pointer compression and 64-bit integer type compression as in level 6, and creates an optimal ordering of peeled structure fields, which could improve runtime performance.
  - fstruct-layout=9: enables structure peeling, pointer compression and 64-bit integer type compression as in level 7, and creates an optimal ordering of peeled structure fields, which could improve runtime performance.
  Note:
  fstruct-layout=4 and fstruct-layout=5 are derived from fstruct-layout=2 and fstruct-layout=3 respectively, with the added feature of safe compression of 64-bit integer fields to 32-bit integer fields in structures. Going from fstruct-layout=4 to fstruct-layout=5 may result in higher performance if the pointer values are such that the pointers can be compressed to 16 bits.
  
  fstruct-layout=6 and fstruct-layout=7 are derived from fstruct-layout=2 and fstruct-layout=3 respectively, with the added feature of safe compression of 64-bit integer fields to 16-bit integer fields in structures. Going from fstruct-layout=6 to fstruct-layout=7 may result in higher performance if the pointer values are such that the pointers can be compressed to 16 bits.
- -mllvm -inline-threshold=1000
- COPTIMIZE
- Sets the compiler's inlining threshold level to the value passed as the argument. The inline threshold is used in the inliner heuristics to decide which functions should be inlined.
- -mllvm -reduce-array-computations=3
- COPTIMIZE
- Eliminates array computations based on their usage: computations on unused array elements and on zero-valued array elements are removed. This optimization requires -flto, because it depends on whole-program analysis.
  
  Possible values:
  - 1: Eliminates the computations on unused array elements
  - 2: Eliminates the computations on zero-valued array elements
  - 3: Eliminates the computations on unused and zero-valued array elements
- -mllvm -unroll-threshold=50
- COPTIMIZE
- Sets the limit at which loops will be unrolled. For example, if unroll threshold is set to 100 then only loops with 100 or fewer instructions will be unrolled.
- -zopt
- COPTIMIZE
- This option enables a subset of scalar, vector and loop transformations including improved variants of loop invariant code motion, SLP and loop vectorizations, loop-fusion, loop-interchange, loop-unswitch, loop tiling and loop distribution.
- -mrecip=none
- EXTRA_CFLAGS
- The -mrecip option enables use of the RCPSS and RSQRTSS instructions, with an additional Newton-Raphson step to increase precision, instead of DIVSS and SQRTSS. The value none disables these reciprocal estimates, so DIVSS and SQRTSS are used.
- -fopenmp=libomp
- Yes
- EXTRA_LIBS
- Enables handling of OpenMP directives and generates parallel code. The OpenMP library to be linked can be specified through the -fopenmp=library option.
- -lomp
- EXTRA_LIBS
- Instructs the compiler to link with the OpenMP runtime libraries.
- -lamdlibm
- EXTRA_LIBS
- Instructs the compiler to link with the AMD-supported optimized math library.
- -lamdalloc
- EXTRA_LIBS
- amdalloc is AMD's memory allocator, based on the jemalloc library, and is available as part of the AOCC binary package.
- -lflang
- EXTRA_LIBS
- Instructs the compiler to link with flang Fortran runtime libraries.
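
The -fstrip-mining entry above describes breaking a large loop into smaller strips. As a rough illustration only, the Python sketch below models the loop shape before and after such a transformation. It is not AOCC's actual implementation; the function names and the strip size of 4 are assumptions made for this example.

```python
def sum_squares_plain(values):
    """Original loop: a single pass over the whole index range."""
    total = 0
    for i in range(len(values)):
        total += values[i] * values[i]
    return total


def sum_squares_strip_mined(values, strip=4):
    """Strip-mined loop: the outer loop steps from strip to strip and the
    inner loop walks one strip. `strip` is a hypothetical tuning value."""
    total = 0
    n = len(values)
    for start in range(0, n, strip):
        end = min(start + strip, n)  # the last strip may be shorter
        for i in range(start, end):
            total += values[i] * values[i]
    return total


data = list(range(10))
# Both loops visit the same elements in the same order.
assert sum_squares_plain(data) == sum_squares_strip_mined(data)
```

The result is unchanged; only the iteration structure differs, which gives a compiler a bounded inner loop whose working set can fit in cache.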

Fortran benchmarks

- -m64
- FC, LD
- Generates code for a 64-bit environment, in which int is 32 bits and long and pointers are 64 bits, targeting AMD's x86-64 architecture. The compiler generates code for the AMD64, Intel 64, and x86-64 64-bit ABI. On a 32-bit host the default is the 32-bit ABI. On a 64-bit host the default is the 64-bit ABI if the specified target platform is 64-bit; otherwise the default is 32-bit.
- -Wl,-mllvm -Wl,-align-all-nofallthru-blocks=6
- LDFLAGS
- Forces the alignment of all blocks that have no fall-through predecessors (i.e. does not add nops that are executed). The value is given in log2 format (e.g. 4 means align on 16-byte boundaries, so 6 means align on 64-byte boundaries).
- -Wl,-mllvm -Wl,-reduce-array-computations=3
- LDFLAGS
- Eliminates array computations based on their usage: computations on unused array elements and on zero-valued array elements are removed. This optimization requires -flto, because it depends on whole-program analysis.
  
  Possible values:
  - 1: Eliminates the computations on unused array elements
  - 2: Eliminates the computations on zero-valued array elements
  - 3: Eliminates the computations on unused and zero-valued array elements
- -Wl,-mllvm -Wl,-enable-X86-prefetching
- LDFFLAGS
- This optimization enables generation of prefetch instructions for tightly coupled loops.
- -DSPEC_OPENMP
- FOPTIMIZE
- Definition of this macro indicates that compilation for parallel operation is enabled, and that any OpenMP directives or pragmas will be visible to the compiler. The behavior of this macro is overridden if -DSPEC_SUPPRESS_OPENMP also appears in the list of compilation flags.
- -O3
- FOPTIMIZE
- Like -O2, except that it enables optimizations that take longer to perform or that may generate larger code (in an attempt to make the program run faster).
  
  If multiple "O" options are used, with or without level numbers, the last such option is the one that is effective.
- Includes:
  - -O2
    - -O1
- -march=znver5
- FOPTIMIZE
- Specifies that Clang should generate code for a specific processor family member and later. For example, if you specify -march=znver1, the compiler is allowed to generate instructions that are valid on AMD Zen processors, but which may not exist on earlier products. -march=znver4 enables the AVX-512 ISA for Genoa (znver4) processors.
- -fveclib=AMDLIBM
- FOPTIMIZE
- Use the given vector functions library.
- -ffast-math
- FOPTIMIZE
- Enables a range of optimizations that provide faster, though sometimes less precise, mathematical operations that may not conform to the IEEE-754 specifications. When this option is specified, the __STDC_IEC_559__ macro is ignored even if set by the system headers.
- -fopenmp
- Yes
- FOPTIMIZE
- Enables handling of OpenMP directives and generates parallel code. The OpenMP library to be linked can be specified through the -fopenmp=library option.
- -flto
- FOPTIMIZE
- Generate output files in LLVM formats suitable for link time optimization. When used with -S this generates LLVM intermediate language assembly files, otherwise this generates LLVM bitcode format object files (which may be passed to the linker depending on the stage selection options).
- -funroll-loops
- FOPTIMIZE
- This option instructs the compiler to unroll loops wherever possible.
- -mllvm -lsr-in-nested-loop
- FOPTIMIZE
- Enables loop strength reduction for nested loop structures. By default, the compiler performs loop strength reduction only for the innermost loop.
- -mllvm -reduce-array-computations=3
- FOPTIMIZE
- Eliminates array computations based on their usage: computations on unused array elements and on zero-valued array elements are removed. This optimization requires -flto, because it depends on whole-program analysis.
  
  Possible values:
  - 1: Eliminates the computations on unused array elements
  - 2: Eliminates the computations on zero-valued array elements
  - 3: Eliminates the computations on unused and zero-valued array elements
- -Mrecursive
- FOPTIMIZE
- Allocate local variables on the stack, thus allowing recursion. SAVEd, data-initialized, or namelist members are always allocated statically, regardless of the setting of this switch.
- -zopt
- FOPTIMIZE
- This option enables a subset of scalar, vector and loop transformations including improved variants of loop invariant code motion, SLP and loop vectorizations, loop-fusion, loop-interchange, loop-unswitch, loop tiling and loop distribution.
- -fopenmp=libomp
- Yes
- EXTRA_LIBS
- Enables handling of OpenMP directives and generates parallel code. The OpenMP library to be linked can be specified through the -fopenmp=library option.
- -lomp
- EXTRA_LIBS
- Instructs the compiler to link with the OpenMP runtime libraries.
- -lamdlibm
- EXTRA_LIBS
- Instructs the compiler to link with the AMD-supported optimized math library.
- -lamdalloc
- EXTRA_LIBS
- amdalloc is AMD's memory allocator, based on the jemalloc library, and is available as part of the AOCC binary package.
- -lflang
- EXTRA_LIBS
- Instructs the compiler to link with flang Fortran runtime libraries.
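
The -mllvm -lsr-in-nested-loop entry above refers to loop strength reduction in nested loops. As a conceptual sketch only, the Python functions below show the general idea of strength reduction: a multiplication that depends on the loop counters is replaced by running additions, in the outer loop as well as the inner one. The function names are hypothetical and the code does not reproduce what the compiler emits.

```python
def flat_indices_multiply(nrows, ncols):
    """Baseline: recompute i * ncols + j with a multiply on every iteration."""
    out = []
    for i in range(nrows):
        for j in range(ncols):
            out.append(i * ncols + j)
    return out


def flat_indices_reduced(nrows, ncols):
    """Strength-reduced form: the multiply is replaced by additions.
    `row_base` is carried across outer-loop iterations, which is the
    part that goes beyond an innermost-loop-only reduction."""
    out = []
    row_base = 0
    for _ in range(nrows):
        idx = row_base
        for _ in range(ncols):
            out.append(idx)
            idx += 1          # inner induction variable
        row_base += ncols     # outer induction variable
    return out


assert flat_indices_multiply(3, 4) == flat_indices_reduced(3, 4)
```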

Benchmarks using both Fortran and C

- -m64
- CC, FC, LD
- Generates code for a 64-bit environment, in which int is 32 bits and long and pointers are 64 bits, targeting AMD's x86-64 architecture. The compiler generates code for the AMD64, Intel 64, and x86-64 64-bit ABI. On a 32-bit host the default is the 32-bit ABI. On a 64-bit host the default is the 64-bit ABI if the specified target platform is 64-bit; otherwise the default is 32-bit.
- -Wl,-mllvm -Wl,-align-all-nofallthru-blocks=6
- LDFLAGS
- Forces the alignment of all blocks that have no fall-through predecessors (i.e. does not add nops that are executed). The value is given in log2 format (e.g. 4 means align on 16-byte boundaries, so 6 means align on 64-byte boundaries).
- -Wl,-mllvm -Wl,-reduce-array-computations=3
- LDFLAGS
- Eliminates array computations based on their usage: computations on unused array elements and on zero-valued array elements are removed. This optimization requires -flto, because it depends on whole-program analysis.
  
  Possible values:
  - 1: Eliminates the computations on unused array elements
  - 2: Eliminates the computations on zero-valued array elements
  - 3: Eliminates the computations on unused and zero-valued array elements
- -Wl,-mllvm -Wl,-enable-X86-prefetching
- LDFFLAGS
- This optimization enables generation of prefetch instructions for tightly coupled loops.
- -O3
- COPTIMIZE, FOPTIMIZE
- Like -O2, except that it enables optimizations that take longer to perform or that may generate larger code (in an attempt to make the program run faster).
  
  If multiple "O" options are used, with or without level numbers, the last such option is the one that is effective.
- Includes:
  - -O2
    - -O1
- -march=znver5
- COPTIMIZE, FOPTIMIZE
- Specifies that Clang should generate code for a specific processor family member and later. For example, if you specify -march=znver1, the compiler is allowed to generate instructions that are valid on AMD Zen processors, but which may not exist on earlier products. -march=znver4 enables the AVX-512 ISA for Genoa (znver4) processors.
- -fveclib=AMDLIBM
- COPTIMIZE, FOPTIMIZE
- Use the given vector functions library.
- -ffast-math
- COPTIMIZE, FOPTIMIZE
- Enables a range of optimizations that provide faster, though sometimes less precise, mathematical operations that may not conform to the IEEE-754 specifications. When this option is specified, the __STDC_IEC_559__ macro is ignored even if set by the system headers.
- -fopenmp
- Yes
- COPTIMIZE, FOPTIMIZE
- Enables handling of OpenMP directives and generates parallel code. The OpenMP library to be linked can be specified through the -fopenmp=library option.
- -DSPEC_OPENMP
- COPTIMIZE, FOPTIMIZE
- Definition of this macro indicates that compilation for parallel operation is enabled, and that any OpenMP directives or pragmas will be visible to the compiler. The behavior of this macro is overridden if -DSPEC_SUPPRESS_OPENMP also appears in the list of compilation flags.
- -flto
- COPTIMIZE, FOPTIMIZE
- Generate output files in LLVM formats suitable for link time optimization. When used with -S this generates LLVM intermediate language assembly files, otherwise this generates LLVM bitcode format object files (which may be passed to the linker depending on the stage selection options).
- -fremap-arrays
- COPTIMIZE
- This option enables an optimization that transforms the data layout of a single dimensional array to provide better cache locality by analysing the access patterns.
- -fstrip-mining
- COPTIMIZE
- Enables loop strip mining optimization. This optimization breaks a large loop into smaller segments or strips to improve temporal and spatial locality.
- -fstruct-layout=7
- COPTIMIZE
- Analyzes the whole program to determine whether the structures in the code can be peeled, whether dead or redundant fields can be deleted, and whether the pointer or integer fields in a structure can be compressed. If feasible, this optimization transforms the code to enable these improvements. The transformation is likely to improve cache utilization and memory bandwidth, and is expected to improve the scalability of programs executed on multiple cores. It is effective only with -flto, because whole-program analysis is required to perform this optimization. You can choose how aggressively the optimization is applied to your application, with 1 being the least aggressive and 7 being the most aggressive level.
  
  Possible values:
  - fstruct-layout=0: disables structure peeling (default).
  - fstruct-layout=1: enables structure peeling.
  - fstruct-layout=2: enables structure peeling and selectively compresses self-referential pointers in these structures to 32-bit pointers wherever safe.
  - fstruct-layout=3: enables structure peeling and selectively compresses self-referential pointers in these structures to 16-bit pointers wherever safe.
  - fstruct-layout=4: enables structure peeling, pointer compression as in level 2 and further enables compression of structure fields which are of 64-bit integer type to 32-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=5: enables structure peeling, pointer compression as in level 3 and further enables compression of structure fields which are of 64-bit integer type to 32-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=6: enables structure peeling, pointer compression as in level 2 and further enables compression of structure fields of 64-bit integer type to 16-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=7: enables structure peeling, pointer compression as in level 3 and further enables compression of structure fields of 64-bit integer type to 16-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=8: enables structure peeling, pointer compression and 64-bit integer type compression as in level 6, and creates an optimal ordering of peeled structure fields, which could improve runtime performance.
  - fstruct-layout=9: enables structure peeling, pointer compression and 64-bit integer type compression as in level 7, and creates an optimal ordering of peeled structure fields, which could improve runtime performance.
  Note:
  fstruct-layout=4 and fstruct-layout=5 are derived from fstruct-layout=2 and fstruct-layout=3 respectively, with the added feature of safe compression of 64-bit integer fields to 32-bit integer fields in structures. Going from fstruct-layout=4 to fstruct-layout=5 may result in higher performance if the pointer values are such that the pointers can be compressed to 16 bits.
  
  fstruct-layout=6 and fstruct-layout=7 are derived from fstruct-layout=2 and fstruct-layout=3 respectively, with the added feature of safe compression of 64-bit integer fields to 16-bit integer fields in structures. Going from fstruct-layout=6 to fstruct-layout=7 may result in higher performance if the pointer values are such that the pointers can be compressed to 16 bits.
- -mllvm -inline-threshold=1000
- COPTIMIZE
- Sets the compiler's inlining threshold level to the value passed as the argument. The inline threshold is used in the inliner heuristics to decide which functions should be inlined.
- -mllvm -reduce-array-computations=3
- COPTIMIZE, FOPTIMIZE
- Eliminates array computations based on their usage: computations on unused array elements and on zero-valued array elements are removed. This optimization requires -flto, because it depends on whole-program analysis.
  
  Possible values:
  - 1: Eliminates the computations on unused array elements
  - 2: Eliminates the computations on zero-valued array elements
  - 3: Eliminates the computations on unused and zero-valued array elements
- -mllvm -unroll-threshold=50
- COPTIMIZE
- Sets the limit at which loops will be unrolled. For example, if unroll threshold is set to 100 then only loops with 100 or fewer instructions will be unrolled.
- -zopt
- COPTIMIZE, FOPTIMIZE
- This option enables a subset of scalar, vector and loop transformations including improved variants of loop invariant code motion, SLP and loop vectorizations, loop-fusion, loop-interchange, loop-unswitch, loop tiling and loop distribution.
- -funroll-loops
- FOPTIMIZE
- This option instructs the compiler to unroll loops wherever possible.
- -mllvm -lsr-in-nested-loop
- FOPTIMIZE
- Enables loop strength reduction for nested loop structures. By default, the compiler performs loop strength reduction only for the innermost loop.
- -Mrecursive
- FOPTIMIZE
- Allocate local variables on the stack, thus allowing recursion. SAVEd, data-initialized, or namelist members are always allocated statically, regardless of the setting of this switch.
- -mrecip=none
- EXTRA_CFLAGS
- The -mrecip option enables use of the RCPSS and RSQRTSS instructions, with an additional Newton-Raphson step to increase precision, instead of DIVSS and SQRTSS. The value none disables these reciprocal estimates, so DIVSS and SQRTSS are used.
- -fopenmp=libomp
- Yes
- EXTRA_LIBS
- Enables handling of OpenMP directives and generates parallel code. The OpenMP library to be linked can be specified through the -fopenmp=library option.
- -lomp
- EXTRA_LIBS
- Instructs the compiler to link with the OpenMP runtime libraries.
- -lamdlibm
- EXTRA_LIBS
- Instructs the compiler to link with the AMD-supported optimized math library.
- -lamdalloc
- EXTRA_LIBS
- amdalloc is AMD's memory allocator, based on the jemalloc library, and is available as part of the AOCC binary package.
- -lflang
- EXTRA_LIBS
- Instructs the compiler to link with flang Fortran runtime libraries.
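
The -mrecip entry above mentions an additional Newton-Raphson step applied to a reciprocal estimate. The following Python sketch shows, under stated assumptions, what one such step does numerically. The rough estimate is modeled here as the true reciprocal with a relative error of 2**-12; that figure is an assumption chosen for illustration, not a statement about any particular processor, and the function name is hypothetical.

```python
def newton_raphson_reciprocal(a, estimate):
    """Refine an approximate 1/a with one Newton-Raphson step:
    x1 = x0 * (2 - a * x0)."""
    return estimate * (2.0 - a * estimate)


a = 3.0
# Assumption for this sketch: a low-precision estimate of 1/a whose
# relative error is 2**-12.
rough = (1.0 / a) * (1.0 + 2.0 ** -12)
refined = newton_raphson_reciprocal(a, rough)

rough_error = abs(a * rough - 1.0)
refined_error = abs(a * refined - 1.0)
# One step roughly squares the relative error.
assert refined_error < rough_error
```

Because each step roughly squares the relative error, a single refinement is enough to turn a coarse estimate into a result close to single precision, which is why the estimate-plus-refinement sequence can stand in for a full divide.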

Benchmarks using Fortran, C, and C++

- -m64
- CC, CXX, FC, LD
- Generates code for a 64-bit environment, in which int is 32 bits and long and pointers are 64 bits, targeting AMD's x86-64 architecture. The compiler generates code for the AMD64, Intel 64, and x86-64 64-bit ABI. On a 32-bit host the default is the 32-bit ABI. On a 64-bit host the default is the 64-bit ABI if the specified target platform is 64-bit; otherwise the default is 32-bit.
- -std=c++14
- CXX, LD
- Selects the C++ language dialect.
- -Wl,-mllvm -Wl,-align-all-nofallthru-blocks=6
- LDFLAGS
- Forces the alignment of all blocks that have no fall-through predecessors (i.e. does not add nops that are executed). The value is given in log2 format (e.g. 4 means align on 16-byte boundaries, so 6 means align on 64-byte boundaries).
- -Wl,-mllvm -Wl,-reduce-array-computations=3
- LDFLAGS
- Eliminates array computations based on their usage: computations on unused array elements and on zero-valued array elements are removed. This optimization requires -flto, because it depends on whole-program analysis.
  
  Possible values:
  - 1: Eliminates the computations on unused array elements
  - 2: Eliminates the computations on zero-valued array elements
  - 3: Eliminates the computations on unused and zero-valued array elements
- -Wl,-mllvm -Wl,-x86-use-vzeroupper=false
- LDCXXFLAGS
- This option controls the vzeroupper instruction generation before a transfer of control flow. Not emitting the vzeroupper instruction can help minimize the AVX to SSE transition penalty.
- -O3
- COPTIMIZE, CXXOPTIMIZE, FOPTIMIZE
- Like -O2, except that it enables optimizations that take longer to perform or that may generate larger code (in an attempt to make the program run faster).
  
  If multiple "O" options are used, with or without level numbers, the last such option is the one that is effective.
- Includes:
  - -O2
    - -O1
- -march=znver5
- COPTIMIZE, CXXOPTIMIZE, FOPTIMIZE
- Specifies that Clang should generate code for a specific processor family member and later. For example, if you specify -march=znver1, the compiler is allowed to generate instructions that are valid on AMD Zen processors, but which may not exist on earlier products. -march=znver4 enables the AVX-512 ISA for Genoa (znver4) processors.
- -fveclib=AMDLIBM
- COPTIMIZE, CXXOPTIMIZE, FOPTIMIZE
- Use the given vector functions library.
- -ffast-math
- COPTIMIZE, CXXOPTIMIZE, FOPTIMIZE
- Enables a range of optimizations that provide faster, though sometimes less precise, mathematical operations that may not conform to the IEEE-754 specifications. When this option is specified, the __STDC_IEC_559__ macro is ignored even if set by the system headers.
- -fopenmp
- Yes
- COPTIMIZE, CXXOPTIMIZE, FOPTIMIZE
- Enables handling of OpenMP directives and generates parallel code. The OpenMP library to be linked can be specified through the -fopenmp=library option.
- -DSPEC_OPENMP
- COPTIMIZE, CXXOPTIMIZE, FOPTIMIZE
- Definition of this macro indicates that compilation for parallel operation is enabled, and that any OpenMP directives or pragmas will be visible to the compiler. The behavior of this macro is overridden if -DSPEC_SUPPRESS_OPENMP also appears in the list of compilation flags.
- -flto
- COPTIMIZE, CXXOPTIMIZE, FOPTIMIZE
- Generate output files in LLVM formats suitable for link time optimization. When used with -S this generates LLVM intermediate language assembly files, otherwise this generates LLVM bitcode format object files (which may be passed to the linker depending on the stage selection options).
- -fremap-arrays
- COPTIMIZE
- This option enables an optimization that transforms the data layout of a single dimensional array to provide better cache locality by analysing the access patterns.
- -fstrip-mining
- COPTIMIZE
- Enables loop strip mining optimization. This optimization breaks a large loop into smaller segments or strips to improve temporal and spatial locality.
- -fstruct-layout=7
- COPTIMIZE
- Analyzes the whole program to determine whether the structures in the code can be peeled, whether dead or redundant fields can be deleted, and whether the pointer or integer fields in a structure can be compressed. If feasible, this optimization transforms the code to enable these improvements. The transformation is likely to improve cache utilization and memory bandwidth, and is expected to improve the scalability of programs executed on multiple cores. It is effective only with -flto, because whole-program analysis is required to perform this optimization. You can choose how aggressively the optimization is applied to your application, with 1 being the least aggressive and 7 being the most aggressive level.
  
  Possible values:
  - fstruct-layout=0: disables structure peeling (default).
  - fstruct-layout=1: enables structure peeling.
  - fstruct-layout=2: enables structure peeling and selectively compresses self-referential pointers in these structures to 32-bit pointers wherever safe.
  - fstruct-layout=3: enables structure peeling and selectively compresses self-referential pointers in these structures to 16-bit pointers wherever safe.
  - fstruct-layout=4: enables structure peeling, pointer compression as in level 2 and further enables compression of structure fields which are of 64-bit integer type to 32-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=5: enables structure peeling, pointer compression as in level 3 and further enables compression of structure fields which are of 64-bit integer type to 32-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=6: enables structure peeling, pointer compression as in level 2 and further enables compression of structure fields of 64-bit integer type to 16-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=7: enables structure peeling, pointer compression as in level 3 and further enables compression of structure fields of 64-bit integer type to 16-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=8: enables structure peeling, pointer compression and 64-bit integer type compression as in level 6, and creates an optimal ordering of peeled structure fields, which could improve runtime performance.
  - fstruct-layout=9: enables structure peeling, pointer compression and 64-bit integer type compression as in level 7, and creates an optimal ordering of peeled structure fields, which could improve runtime performance.
  Note:
  fstruct-layout=4 and fstruct-layout=5 are derived from fstruct-layout=2 and fstruct-layout=3 respectively, with the added feature of safe compression of 64-bit integer fields to 32-bit integer fields in structures. Going from fstruct-layout=4 to fstruct-layout=5 may result in higher performance if the pointer values are such that the pointers can be compressed to 16 bits.
  
  fstruct-layout=6 and fstruct-layout=7 are derived from fstruct-layout=2 and fstruct-layout=3 respectively, with the added feature of safe compression of 64-bit integer fields to 16-bit integer fields in structures. Going from fstruct-layout=6 to fstruct-layout=7 may result in higher performance if the pointer values are such that the pointers can be compressed to 16 bits.
- -mllvm -inline-threshold=1000
- COPTIMIZE
- Sets the compiler's inlining threshold level to the value passed as the argument. The inline threshold is used in the inliner heuristics to decide which functions should be inlined.
- -mllvm -reduce-array-computations=3
- COPTIMIZE, CXXOPTIMIZE, FOPTIMIZE
- Eliminates array computations based on their usage: computations on unused array elements and on zero-valued array elements are removed. This optimization requires -flto, because it depends on whole-program analysis.
  
  Possible values:
  - 1: Eliminates the computations on unused array elements
  - 2: Eliminates the computations on zero-valued array elements
  - 3: Eliminates the computations on unused and zero-valued array elements
- -mllvm -unroll-threshold=50
- COPTIMIZE
- Sets the limit at which loops will be unrolled. For example, if unroll threshold is set to 100 then only loops with 100 or fewer instructions will be unrolled.
- -zopt
- COPTIMIZE, CXXOPTIMIZE, FOPTIMIZE
- This option enables a subset of scalar, vector and loop transformations including improved variants of loop invariant code motion, SLP and loop vectorizations, loop-fusion, loop-interchange, loop-unswitch, loop tiling and loop distribution.
- -mllvm -loop-unswitch-threshold=200000
- CXXOPTIMIZE
- Sets the limit at which loops will be unswitched. For example, if the unswitch threshold is set to 100, then only loops with 100 or fewer instructions will be unswitched.
- -mllvm -unroll-threshold=100
- CXXOPTIMIZE
- Sets the limit at which loops will be unrolled. For example, if unroll threshold is set to 100 then only loops with 100 or fewer instructions will be unrolled.
- -funroll-loops
- FOPTIMIZE
- This option instructs the compiler to unroll loops wherever possible.
- -mllvm -lsr-in-nested-loop
- FOPTIMIZE
- Enables loop strength reduction for nested loop structures. By default, the compiler performs loop strength reduction only for the innermost loop.
- -Mrecursive
- FOPTIMIZE
- Allocate local variables on the stack, thus allowing recursion. SAVEd, data-initialized, or namelist members are always allocated statically, regardless of the setting of this switch.
- -mrecip=none
- EXTRA_CFLAGS
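- The -mrecip option controls whether the compiler uses the RCPSS and RSQRTSS estimate instructions, refined by an additional Newton-Raphson step to increase precision, instead of DIVSS and SQRTSS. The value none disables these estimates, so DIVSS and SQRTSS are used.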
- -fopenmp=libomp
- Yes
- EXTRA_LIBS
- Enable handling of OpenMP directives and generate parallel code. The openmp library to be linked can be specified through -fopenmp=library option.
- -lomp
- EXTRA_LIBS
- Instructs the compiler to link with the OpenMP runtime libraries.
- -lamdlibm
- EXTRA_LIBS
- Instructs the compiler to link with AMD-supported optimized math library.
- -lamdalloc
- EXTRA_LIBS
- amdalloc is AMD's memory allocator, based on the jemalloc library, and is available as part of the AOCC binary package.
- -lflang
- EXTRA_LIBS
- Instructs the compiler to link with flang Fortran runtime libraries.
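
The -mrecip entry above refers to refining a reciprocal estimate with a Newton-Raphson step. The following is a minimal Python sketch of why one such step helps; it is an illustration of the arithmetic, not compiler output. `crude_reciprocal` is a hypothetical stand-in for a low-precision hardware estimate, and the 1e-3 relative error is an assumed value chosen only to make the effect visible.

```python
def newton_step(a: float, x: float) -> float:
    """One Newton-Raphson refinement of x as an estimate of 1/a."""
    return x * (2.0 - a * x)


def crude_reciprocal(a: float) -> float:
    """Hypothetical low-precision estimate: the true reciprocal
    perturbed by an assumed relative error of about 1e-3."""
    return (1.0 / a) * (1.0 + 1e-3)


a = 3.0
x0 = crude_reciprocal(a)      # estimate before refinement
x1 = newton_step(a, x0)       # estimate after one refinement step

err0 = abs(a * x0 - 1.0)      # relative error before, about 1e-3
err1 = abs(a * x1 - 1.0)      # relative error after, about 1e-6
print(err0, err1)
```

Each step roughly squares the relative error, which is why a single step is often enough to bring a coarse estimate close to single-precision accuracy.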

Peak Optimization Flags

C benchmarks

644.nab_s

- -m64
- CC, LD
- Generates code for a 64-bit environment. The 64-bit environment sets int to 32 bits and long and pointer to 64 bits and generates code for AMD's x86-64 architecture. The compiler generates AMD64, INTEL64, x86-64 64-bit ABI. The default on a 32-bit host is 32-bit ABI. The default on a 64-bit host is 64-bit ABI if the target platform specified is 64-bit, otherwise the default is 32-bit.
- -Wl,-mllvm -Wl,-align-all-nofallthru-blocks=6
- LDFLAGS
- Forces the alignment of all blocks that have no fall-through predecessors, so that the padding nops are never executed. The value is given in log2 format (e.g., 4 means align on 16-byte boundaries).
- -Wl,-mllvm -Wl,-reduce-array-computations=3
- LDFLAGS
- This option eliminates array computations based on their usage: computations on unused array elements and on zero-valued array elements are removed. The optimization requires -flto, because it relies on whole-program analysis.
  
  Possible values:
  - 1: Eliminates the computations on unused array elements
  - 2: Eliminates the computations on zero valued array elements
  - 3: Eliminates the computations on unused and zero valued array elements
- -Ofast
- COPTIMIZE
- Enables all the optimizations from -O3 along with other aggressive optimizations that may violate strict compliance with language standards. Refer to the AOCC options document for the language you're using for more detailed documentation of optimizations enabled under -Ofast.
- Includes:
  - -O3
    - -O2
      - -O1
- -march=znver5
- COPTIMIZE
- Specify that Clang should generate code for a specific processor family member and later. For example, if you specify -march=znver1, the compiler is allowed to generate instructions that are valid on AMD Zen processors, but which may not exist on earlier products. -march=znver4 enables AVX 512 ISA for Genoa (znver4) processors.
- -fveclib=AMDLIBM
- COPTIMIZE
- Use the given vector functions library.
- -ffast-math
- COPTIMIZE
- Enables a range of optimizations that provide faster, though sometimes less precise, mathematical operations that may not conform to the IEEE-754 specifications. When this option is specified, the __STDC_IEC_559__ macro is ignored even if set by the system headers.
- -fopenmp
- Yes
- COPTIMIZE
- Enable handling of OpenMP directives and generate parallel code. The openmp library to be linked can be specified through -fopenmp=library option.
- -flto
- COPTIMIZE
- Generate output files in LLVM formats suitable for link time optimization. When used with -S this generates LLVM intermediate language assembly files, otherwise this generates LLVM bitcode format object files (which may be passed to the linker depending on the stage selection options).
- -DSPEC_OPENMP
- COPTIMIZE
- Definition of this macro indicates that compilation for parallel operation is enabled, and that any OpenMP directives or pragmas will be visible to the compiler. The behavior of this macro is overridden if -DSPEC_SUPPRESS_OPENMP also appears in the list of compilation flags.
- -fremap-arrays
- COPTIMIZE
- This option enables an optimization that transforms the data layout of a single dimensional array to provide better cache locality by analysing the access patterns.
- -fstrip-mining
- COPTIMIZE
- Enables loop strip mining optimization. This optimization breaks a large loop into smaller segments or strips to improve temporal and spatial locality.
- -fstruct-layout=9
- COPTIMIZE
- Analyzes the whole program to determine if the structures in the code can be peeled, if dead or redundant fields can be deleted, and if the pointer or integer fields in the structure can be compressed. If feasible, this optimization transforms the code to enable these improvements. This transformation is likely to improve cache utilization and memory bandwidth. It is expected to improve the scalability of programs executed on multiple cores. This is effective only under -flto, as whole-program analysis is required to perform this optimization. You can choose the level of aggressiveness with which this optimization is applied to your application, with 1 being the least aggressive and 9 being the most aggressive of the levels listed below.
  
  Possible values:
  - fstruct-layout=0: disables structure peeling (default).
  - fstruct-layout=1: enables structure peeling.
  - fstruct-layout=2: enables structure peeling and selectively compresses self-referential pointers in these structures to 32-bit pointers wherever safe.
  - fstruct-layout=3: enables structure peeling and selectively compresses self-referential pointers in these structures to 16-bit pointers wherever safe.
  - fstruct-layout=4: enables structure peeling, pointer compression as in level 2 and further enables compression of structure fields which are of 64-bit integer type to 32-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=5: enables structure peeling, pointer compression as in level 3 and further enables compression of structure fields which are of 64-bit integer type to 32-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=6: enables structure peeling, pointer compression as in level 2 and further enables compression of structure fields which are of 64-bit integer type to 16-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=7: enables structure peeling, pointer compression as in level 3 and further enables compression of structure fields which are of 64-bit integer type to 16-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=8: enables structure peeling, pointer compression, and 64-bit integer type compression as in level 6, and creates an optimal ordering of peeled structure fields, which could improve runtime performance.
  - fstruct-layout=9: enables structure peeling, pointer compression, and 64-bit integer type compression as in level 7, and creates an optimal ordering of peeled structure fields, which could improve runtime performance.
  Note:
  fstruct-layout=4 and fstruct-layout=5 are derived from fstruct-layout=2 and fstruct-layout=3 respectively with the added feature of safe compression of 64-bit integer fields to 32-bit integer fields in structures. Going from fstruct-layout=4 to fstruct-layout=5 may result in higher performance if the pointer values are such that the pointers can be compressed to 16-bits.
  
  fstruct-layout=6 and fstruct-layout=7 are derived from fstruct-layout=2 and fstruct-layout=3 respectively, with the added feature of safe compression of 64-bit integer fields to 16-bit integer fields in structures. Going from fstruct-layout=6 to fstruct-layout=7 may result in higher performance if the pointer values are such that the pointers can be compressed to 16 bits.
- -mllvm -inline-threshold=1000
- COPTIMIZE
- Sets the compiler's inlining threshold level to the value passed as the argument. The inline threshold is used in the inliner heuristics to decide which functions should be inlined.
- -mllvm -reduce-array-computations=3
- COPTIMIZE
- This option eliminates array computations based on their usage: computations on unused array elements and on zero-valued array elements are removed. The optimization requires -flto, because it relies on whole-program analysis.
  
  Possible values:
  - 1: Eliminates the computations on unused array elements
  - 2: Eliminates the computations on zero valued array elements
  - 3: Eliminates the computations on unused and zero valued array elements
- -mllvm -unroll-threshold=50
- COPTIMIZE
- Sets the limit at which loops will be unrolled. For example, if unroll threshold is set to 100 then only loops with 100 or fewer instructions will be unrolled.
- -zopt
- COPTIMIZE
- This option enables a subset of scalar, vector and loop transformations including improved variants of loop invariant code motion, SLP and loop vectorizations, loop-fusion, loop-interchange, loop-unswitch, loop tiling and loop distribution.
- -mrecip=none
- EXTRA_CFLAGS
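- The -mrecip option controls whether the compiler uses the RCPSS and RSQRTSS estimate instructions, refined by an additional Newton-Raphson step to increase precision, instead of DIVSS and SQRTSS. The value none disables these estimates, so DIVSS and SQRTSS are used.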
- -fopenmp=libomp
- Yes
- EXTRA_LIBS
- Enable handling of OpenMP directives and generate parallel code. The openmp library to be linked can be specified through -fopenmp=library option.
- -lomp
- EXTRA_LIBS
- Instructs the compiler to link with the OpenMP runtime libraries.
- -lamdlibm
- EXTRA_LIBS
- Instructs the compiler to link with AMD-supported optimized math library.
- -lamdalloc
- EXTRA_LIBS
- amdalloc is AMD's memory allocator, based on the jemalloc library, and is available as part of the AOCC binary package.
- -lflang
- EXTRA_LIBS
- Instructs the compiler to link with flang Fortran runtime libraries.
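
The -fstrip-mining entry above describes breaking a large loop into smaller strips. The following Python sketch shows the shape of that transformation on a simple reduction. The function names and the strip size of 4 are illustrative assumptions; the compiler chooses its own strip sizes and applies the transformation to compiled loops, not to source like this.

```python
def sum_plain(data):
    """Original form: one loop over the whole index range."""
    total = 0
    for i in range(len(data)):
        total += data[i]
    return total


def sum_strip_mined(data, strip=4):
    """Strip-mined form: an outer loop steps from strip to strip and
    an inner loop walks the elements inside the current strip."""
    total = 0
    n = len(data)
    for start in range(0, n, strip):
        # min() handles a final strip shorter than `strip`.
        for i in range(start, min(start + strip, n)):
            total += data[i]
    return total


values = list(range(10))
print(sum_plain(values), sum_strip_mined(values))
```

Both forms visit the same elements in the same order; the strip-mined form only adds a loop level, which gives later transformations a fixed-size block to work on.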

Fortran benchmarks

603.bwaves_s

- -m64
- FC, LD
- Generates code for a 64-bit environment. The 64-bit environment sets int to 32 bits and long and pointer to 64 bits and generates code for AMD's x86-64 architecture. The compiler generates AMD64, INTEL64, x86-64 64-bit ABI. The default on a 32-bit host is 32-bit ABI. The default on a 64-bit host is 64-bit ABI if the target platform specified is 64-bit, otherwise the default is 32-bit.
- -Wl,-mllvm -Wl,-align-all-nofallthru-blocks=6
- LDFLAGS
- Forces the alignment of all blocks that have no fall-through predecessors, so that the padding nops are never executed. The value is given in log2 format (e.g., 4 means align on 16-byte boundaries).
- -Wl,-mllvm -Wl,-reduce-array-computations=3
- LDFLAGS
- This option eliminates array computations based on their usage: computations on unused array elements and on zero-valued array elements are removed. The optimization requires -flto, because it relies on whole-program analysis.
  
  Possible values:
  - 1: Eliminates the computations on unused array elements
  - 2: Eliminates the computations on zero valued array elements
  - 3: Eliminates the computations on unused and zero valued array elements
- -Wl,-mllvm -Wl,-enable-X86-prefetching
- LDFFLAGS
- This optimization enables generation of prefetch instructions for tightly coupled loops.
- -DSPEC_OPENMP
- FOPTIMIZE
- Definition of this macro indicates that compilation for parallel operation is enabled, and that any OpenMP directives or pragmas will be visible to the compiler. The behavior of this macro is overridden if -DSPEC_SUPPRESS_OPENMP also appears in the list of compilation flags.
- -Ofast
- FOPTIMIZE
- Enables all the optimizations from -O3 along with other aggressive optimizations that may violate strict compliance with language standards. Refer to the AOCC options document for the language you're using for more detailed documentation of optimizations enabled under -Ofast.
- Includes:
  - -O3
    - -O2
      - -O1
- -march=znver5
- FOPTIMIZE
- Specify that Clang should generate code for a specific processor family member and later. For example, if you specify -march=znver1, the compiler is allowed to generate instructions that are valid on AMD Zen processors, but which may not exist on earlier products. -march=znver4 enables AVX 512 ISA for Genoa (znver4) processors.
- -fveclib=AMDLIBM
- FOPTIMIZE
- Use the given vector functions library.
- -ffast-math
- FOPTIMIZE
- Enables a range of optimizations that provide faster, though sometimes less precise, mathematical operations that may not conform to the IEEE-754 specifications. When this option is specified, the __STDC_IEC_559__ macro is ignored even if set by the system headers.
- -fopenmp
- Yes
- FOPTIMIZE
- Enable handling of OpenMP directives and generate parallel code. The openmp library to be linked can be specified through -fopenmp=library option.
- -fscalar-transform
- FOPTIMIZE
- This option enables a subset of scalar transformations, including improved variants of code movement optimizations such as hoisting and invariant code motion.
- -fvector-transform
- FOPTIMIZE
- This option enables a subset of vector transformations including improved variants of SLP and loop vectorization.
- -mllvm -reduce-array-computations=3
- FOPTIMIZE
- This option eliminates array computations based on their usage: computations on unused array elements and on zero-valued array elements are removed. The optimization requires -flto, because it relies on whole-program analysis.
  
  Possible values:
  - 1: Eliminates the computations on unused array elements
  - 2: Eliminates the computations on zero valued array elements
  - 3: Eliminates the computations on unused and zero valued array elements
- -Mrecursive
- FOPTIMIZE
- Allocate local variables on the stack, thus allowing recursion. SAVEd, data-initialized, or namelist members are always allocated statically, regardless of the setting of this switch.
- -fopenmp=libomp
- Yes
- EXTRA_LIBS
- Enable handling of OpenMP directives and generate parallel code. The openmp library to be linked can be specified through -fopenmp=library option.
- -lomp
- EXTRA_LIBS
- Instructs the compiler to link with the OpenMP runtime libraries.
- -lamdlibm
- EXTRA_LIBS
- Instructs the compiler to link with AMD-supported optimized math library.
- -lamdalloc
- EXTRA_LIBS
- amdalloc is AMD's memory allocator, based on the jemalloc library, and is available as part of the AOCC binary package.
- -lflang
- EXTRA_LIBS
- Instructs the compiler to link with flang Fortran runtime libraries.
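
The -align-all-nofallthru-blocks entry above gives its value in log2 format. The following Python sketch works through that arithmetic: a value of 4 corresponds to 16-byte boundaries and the value 6 used in this result corresponds to 64-byte boundaries. The helper names are hypothetical and only illustrate the calculation.

```python
def alignment_bytes(log2_value: int) -> int:
    """Convert a log2 alignment value into a byte boundary."""
    return 1 << log2_value


def align_up(address: int, log2_value: int) -> int:
    """Round an address up to the next boundary of that alignment."""
    align = alignment_bytes(log2_value)
    return (address + align - 1) & ~(align - 1)


print(alignment_bytes(4))   # 16
print(alignment_bytes(6))   # 64
print(align_up(65, 6))      # 128
```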

Benchmarks using both Fortran and C

621.wrf_s

- -m64
- CC, FC, LD
- Generates code for a 64-bit environment. The 64-bit environment sets int to 32 bits and long and pointer to 64 bits and generates code for AMD's x86-64 architecture. The compiler generates AMD64, INTEL64, x86-64 64-bit ABI. The default on a 32-bit host is 32-bit ABI. The default on a 64-bit host is 64-bit ABI if the target platform specified is 64-bit, otherwise the default is 32-bit.
- -Wl,-mllvm -Wl,-align-all-nofallthru-blocks=6
- LDFLAGS
- Forces the alignment of all blocks that have no fall-through predecessors, so that the padding nops are never executed. The value is given in log2 format (e.g., 4 means align on 16-byte boundaries).
- -Wl,-mllvm -Wl,-reduce-array-computations=3
- LDFLAGS
- This option eliminates array computations based on their usage: computations on unused array elements and on zero-valued array elements are removed. The optimization requires -flto, because it relies on whole-program analysis.
  
  Possible values:
  - 1: Eliminates the computations on unused array elements
  - 2: Eliminates the computations on zero valued array elements
  - 3: Eliminates the computations on unused and zero valued array elements
- -Wl,-mllvm -Wl,-enable-X86-prefetching
- LDFFLAGS
- This optimization enables generation of prefetch instructions for tightly coupled loops.
- -Ofast
- COPTIMIZE, FOPTIMIZE
- Enables all the optimizations from -O3 along with other aggressive optimizations that may violate strict compliance with language standards. Refer to the AOCC options document for the language you're using for more detailed documentation of optimizations enabled under -Ofast.
- Includes:
  - -O3
    - -O2
      - -O1
- -march=znver5
- COPTIMIZE, FOPTIMIZE
- Specify that Clang should generate code for a specific processor family member and later. For example, if you specify -march=znver1, the compiler is allowed to generate instructions that are valid on AMD Zen processors, but which may not exist on earlier products. -march=znver4 enables AVX 512 ISA for Genoa (znver4) processors.
- -fveclib=AMDLIBM
- COPTIMIZE, FOPTIMIZE
- Use the given vector functions library.
- -ffast-math
- COPTIMIZE, FOPTIMIZE
- Enables a range of optimizations that provide faster, though sometimes less precise, mathematical operations that may not conform to the IEEE-754 specifications. When this option is specified, the __STDC_IEC_559__ macro is ignored even if set by the system headers.
- -fopenmp
- Yes
- COPTIMIZE, FOPTIMIZE
- Enable handling of OpenMP directives and generate parallel code. The openmp library to be linked can be specified through -fopenmp=library option.
- -flto
- COPTIMIZE
- Generate output files in LLVM formats suitable for link time optimization. When used with -S this generates LLVM intermediate language assembly files, otherwise this generates LLVM bitcode format object files (which may be passed to the linker depending on the stage selection options).
- -DSPEC_OPENMP
- COPTIMIZE, FOPTIMIZE
- Definition of this macro indicates that compilation for parallel operation is enabled, and that any OpenMP directives or pragmas will be visible to the compiler. The behavior of this macro is overridden if -DSPEC_SUPPRESS_OPENMP also appears in the list of compilation flags.
- -fremap-arrays
- COPTIMIZE
- This option enables an optimization that transforms the data layout of a single dimensional array to provide better cache locality by analysing the access patterns.
- -fstrip-mining
- COPTIMIZE
- Enables loop strip mining optimization. This optimization breaks a large loop into smaller segments or strips to improve temporal and spatial locality.
- -fstruct-layout=9
- COPTIMIZE
- Analyzes the whole program to determine if the structures in the code can be peeled, if dead or redundant fields can be deleted, and if the pointer or integer fields in the structure can be compressed. If feasible, this optimization transforms the code to enable these improvements. This transformation is likely to improve cache utilization and memory bandwidth. It is expected to improve the scalability of programs executed on multiple cores. This is effective only under -flto, as whole-program analysis is required to perform this optimization. You can choose the level of aggressiveness with which this optimization is applied to your application, with 1 being the least aggressive and 9 being the most aggressive of the levels listed below.
  
  Possible values:
  - fstruct-layout=0: disables structure peeling (default).
  - fstruct-layout=1: enables structure peeling.
  - fstruct-layout=2: enables structure peeling and selectively compresses self-referential pointers in these structures to 32-bit pointers wherever safe.
  - fstruct-layout=3: enables structure peeling and selectively compresses self-referential pointers in these structures to 16-bit pointers wherever safe.
  - fstruct-layout=4: enables structure peeling, pointer compression as in level 2 and further enables compression of structure fields which are of 64-bit integer type to 32-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=5: enables structure peeling, pointer compression as in level 3 and further enables compression of structure fields which are of 64-bit integer type to 32-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=6: enables structure peeling, pointer compression as in level 2 and further enables compression of structure fields which are of 64-bit integer type to 16-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=7: enables structure peeling, pointer compression as in level 3 and further enables compression of structure fields which are of 64-bit integer type to 16-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=8: enables structure peeling, pointer compression, and 64-bit integer type compression as in level 6, and creates an optimal ordering of peeled structure fields, which could improve runtime performance.
  - fstruct-layout=9: enables structure peeling, pointer compression, and 64-bit integer type compression as in level 7, and creates an optimal ordering of peeled structure fields, which could improve runtime performance.
  Note:
  fstruct-layout=4 and fstruct-layout=5 are derived from fstruct-layout=2 and fstruct-layout=3 respectively with the added feature of safe compression of 64-bit integer fields to 32-bit integer fields in structures. Going from fstruct-layout=4 to fstruct-layout=5 may result in higher performance if the pointer values are such that the pointers can be compressed to 16-bits.
  
  fstruct-layout=6 and fstruct-layout=7 are derived from fstruct-layout=2 and fstruct-layout=3 respectively, with the added feature of safe compression of 64-bit integer fields to 16-bit integer fields in structures. Going from fstruct-layout=6 to fstruct-layout=7 may result in higher performance if the pointer values are such that the pointers can be compressed to 16 bits.
- -mllvm -inline-threshold=1000
- COPTIMIZE
- Sets the compiler's inlining threshold level to the value passed as the argument. The inline threshold is used in the inliner heuristics to decide which functions should be inlined.
- -mllvm -reduce-array-computations=3
- COPTIMIZE, FOPTIMIZE
- This option eliminates array computations based on their usage: computations on unused array elements and on zero-valued array elements are removed. The optimization requires -flto, because it relies on whole-program analysis.
  
  Possible values:
  - 1: Eliminates the computations on unused array elements
  - 2: Eliminates the computations on zero valued array elements
  - 3: Eliminates the computations on unused and zero valued array elements
- -mllvm -unroll-threshold=50
- COPTIMIZE
- Sets the limit at which loops will be unrolled. For example, if unroll threshold is set to 100 then only loops with 100 or fewer instructions will be unrolled.
- -zopt
- COPTIMIZE, FOPTIMIZE
- This option enables a subset of scalar, vector and loop transformations including improved variants of loop invariant code motion, SLP and loop vectorizations, loop-fusion, loop-interchange, loop-unswitch, loop tiling and loop distribution.
- -funroll-loops
- FOPTIMIZE
- This option instructs the compiler to unroll loops wherever possible.
- -mllvm -lsr-in-nested-loop
- FOPTIMIZE
- Enables loop strength reduction for nested loop structures. By default, the compiler performs loop strength reduction only for the innermost loop.
- -Mrecursive
- FOPTIMIZE
- Allocate local variables on the stack, thus allowing recursion. SAVEd, data-initialized, or namelist members are always allocated statically, regardless of the setting of this switch.
- -fopenmp=libomp
- Yes
- EXTRA_LIBS
- Enable handling of OpenMP directives and generate parallel code. The openmp library to be linked can be specified through -fopenmp=library option.
- -lomp
- EXTRA_LIBS
- Instructs the compiler to link with the OpenMP runtime libraries.
- -lamdlibm
- EXTRA_LIBS
- Instructs the compiler to link with AMD-supported optimized math library.
- -lamdalloc
- EXTRA_LIBS
- amdalloc is AMD's memory allocator, based on the jemalloc library, and is available as part of the AOCC binary package.
- -lflang
- EXTRA_LIBS
- Instructs the compiler to link with flang Fortran runtime libraries.
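
The -fstruct-layout levels listed above combine a few independent features. The following Python sketch restates that list as a lookup table so the differences between levels are easier to compare. The `LayoutLevel` type and `describe` helper are hypothetical names introduced for illustration; the table is transcribed from the "Possible values" list and does not model the compiler's safety checks, which decide whether a compression is actually applied.

```python
from typing import NamedTuple, Optional


class LayoutLevel(NamedTuple):
    peeling: bool                 # structure peeling enabled
    pointer_bits: Optional[int]   # width self-referential pointers may be compressed to
    integer_bits: Optional[int]   # width 64-bit integer fields may be compressed to
    reorder_fields: bool          # reordering of peeled structure fields


# None means "no compression of this kind at this level".
LEVELS = {
    0: LayoutLevel(False, None, None, False),
    1: LayoutLevel(True, None, None, False),
    2: LayoutLevel(True, 32, None, False),
    3: LayoutLevel(True, 16, None, False),
    4: LayoutLevel(True, 32, 32, False),
    5: LayoutLevel(True, 16, 32, False),
    6: LayoutLevel(True, 32, 16, False),
    7: LayoutLevel(True, 16, 16, False),
    8: LayoutLevel(True, 32, 16, True),
    9: LayoutLevel(True, 16, 16, True),
}


def describe(level: int) -> str:
    """Summarize one level in a single line."""
    cfg = LEVELS[level]
    parts = ["peeling" if cfg.peeling else "no peeling"]
    if cfg.pointer_bits is not None:
        parts.append(f"pointers -> {cfg.pointer_bits}-bit")
    if cfg.integer_bits is not None:
        parts.append(f"64-bit ints -> {cfg.integer_bits}-bit")
    if cfg.reorder_fields:
        parts.append("field reordering")
    return f"-fstruct-layout={level}: " + ", ".join(parts)


print(describe(9))
```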

627.cam4_s

- -m64
- CC, FC, LD
- Generates code for a 64-bit environment. The 64-bit environment sets int to 32 bits and long and pointer to 64 bits and generates code for AMD's x86-64 architecture. The compiler generates AMD64, INTEL64, x86-64 64-bit ABI. The default on a 32-bit host is 32-bit ABI. The default on a 64-bit host is 64-bit ABI if the target platform specified is 64-bit, otherwise the default is 32-bit.
- -Wl,-mllvm -Wl,-align-all-nofallthru-blocks=6
- LDFLAGS
- Forces the alignment of all blocks that have no fall-through predecessors, so that the padding nops are never executed. The value is given in log2 format (e.g., 4 means align on 16-byte boundaries).
- -Wl,-mllvm -Wl,-reduce-array-computations=3
- LDFLAGS
- This option eliminates array computations based on their usage: computations on unused array elements and on zero-valued array elements are removed. The optimization requires -flto, because it relies on whole-program analysis.
  
  Possible values:
  - 1: Eliminates the computations on unused array elements
  - 2: Eliminates the computations on zero valued array elements
  - 3: Eliminates the computations on unused and zero valued array elements
- -Wl,-mllvm -Wl,-enable-X86-prefetching
- LDFFLAGS
- This optimization enables generation of prefetch instructions for tightly coupled loops.
- -Ofast
- COPTIMIZE, FOPTIMIZE
- Enables all the optimizations from -O3 along with other aggressive optimizations that may violate strict compliance with language standards. Refer to the AOCC options document for the language you're using for more detailed documentation of optimizations enabled under -Ofast.
- Includes:
  - -O3
    - -O2
      - -O1
- -march=znver5
- COPTIMIZE, FOPTIMIZE
- Specify that Clang should generate code for a specific processor family member and later. For example, if you specify -march=znver1, the compiler is allowed to generate instructions that are valid on AMD Zen processors, but which may not exist on earlier products. -march=znver4 enables AVX 512 ISA for Genoa (znver4) processors.
- -fveclib=AMDLIBM
- COPTIMIZE, FOPTIMIZE
- Use the given vector functions library.
- -ffast-math
- COPTIMIZE, FOPTIMIZE
- Enables a range of optimizations that provide faster, though sometimes less precise, mathematical operations that may not conform to the IEEE-754 specifications. When this option is specified, the __STDC_IEC_559__ macro is ignored even if set by the system headers.
- -fopenmp
- Yes
- COPTIMIZE, FOPTIMIZE
- Enable handling of OpenMP directives and generate parallel code. The openmp library to be linked can be specified through -fopenmp=library option.
- -flto
- COPTIMIZE, FOPTIMIZE
- Generate output files in LLVM formats suitable for link time optimization. When used with -S this generates LLVM intermediate language assembly files, otherwise this generates LLVM bitcode format object files (which may be passed to the linker depending on the stage selection options).
- -DSPEC_OPENMP
- COPTIMIZE, FOPTIMIZE
- Definition of this macro indicates that compilation for parallel operation is enabled, and that any OpenMP directives or pragmas will be visible to the compiler. The behavior of this macro is overridden if -DSPEC_SUPPRESS_OPENMP also appears in the list of compilation flags.
- -fremap-arrays
- COPTIMIZE
- This option enables an optimization that transforms the data layout of a single dimensional array to provide better cache locality by analysing the access patterns.
- -fstrip-mining
- COPTIMIZE
- Enables loop strip mining optimization. This optimization breaks a large loop into smaller segments or strips to improve temporal and spatial locality.
- -fstruct-layout=9
- COPTIMIZE
- Analyzes the whole program to determine if the structures in the code can be peeled, if dead or redundant fields can be deleted, and if the pointer or integer fields in the structure can be compressed. If feasible, this optimization transforms the code to enable these improvements. This transformation is likely to improve cache utilization and memory bandwidth. It is expected to improve the scalability of programs executed on multiple cores. This is effective only under -flto, as whole-program analysis is required to perform this optimization. You can choose the level of aggressiveness with which this optimization is applied to your application, with 1 being the least aggressive and 9 being the most aggressive of the levels listed below.
  
  Possible values:
  - fstruct-layout=0: disables structure peeling (default).
  - fstruct-layout=1: enables structure peeling.
  - fstruct-layout=2: enables structure peeling and selectively compresses self-referential pointers in these structures to 32-bit pointers wherever safe.
  - fstruct-layout=3: enables structure peeling and selectively compresses self-referential pointers in these structures to 16-bit pointers wherever safe.
  - fstruct-layout=4: enables structure peeling, pointer compression as in level 2 and further enables compression of structure fields which are of 64-bit integer type to 32-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=5: enables structure peeling, pointer compression as in level 3 and further enables compression of structure fields which are of 64-bit integer type to 32-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=6: enables structure peeling, pointer compression as in level 2 and further enables compression of structure fields which are of 64-bit integer type to 16-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=7: enables structure peeling, pointer compression as in level 3 and further enables compression of structure fields which are of 64-bit integer type to 16-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=8: enables structure peeling, pointer compression, and 64-bit integer type compression as in level 6, and creates an optimal ordering of peeled structure fields, which could improve runtime performance.
  - fstruct-layout=9: enables structure peeling, pointer compression, and 64-bit integer type compression as in level 7, and creates an optimal ordering of peeled structure fields, which could improve runtime performance.
  Note:
  fstruct-layout=4 and fstruct-layout=5 are derived from fstruct-layout=2 and fstruct-layout=3 respectively with the added feature of safe compression of 64-bit integer fields to 32-bit integer fields in structures. Going from fstruct-layout=4 to fstruct-layout=5 may result in higher performance if the pointer values are such that the pointers can be compressed to 16-bits.
  
  fstruct-layout=6 and fstruct-layout=7 are derived from fstruct-layout=2 and fstruct-layout=3 respectively, with the added feature of safe compression of 64-bit integer fields to 16-bit integer fields in structures. Going from fstruct-layout=6 to fstruct-layout=7 may result in higher performance if the pointer values are such that the pointers can be compressed to 16 bits.
- -mllvm -inline-threshold=1000
- COPTIMIZE
- Sets the compiler's inlining threshold level to the value passed as the argument. The inline threshold is used in the inliner heuristics to decide which functions should be inlined.
- -mllvm -reduce-array-computations=3
- COPTIMIZE, FOPTIMIZE
- This option eliminates array computations based on their usage: computations on unused array elements and on zero-valued array elements are removed. The optimization requires -flto, because it relies on whole-program analysis.
  
  Possible values:
  - 1: Eliminates the computations on unused array elements
  - 2: Eliminates the computations on zero valued array elements
  - 3: Eliminates the computations on unused and zero valued array elements
- -mllvm -unroll-threshold=50
- COPTIMIZE
- Sets the limit at which loops will be unrolled. For example, if unroll threshold is set to 100 then only loops with 100 or fewer instructions will be unrolled.
- -zopt
- COPTIMIZE, FOPTIMIZE
- This option enables a subset of scalar, vector and loop transformations including improved variants of loop invariant code motion, SLP and loop vectorizations, loop-fusion, loop-interchange, loop-unswitch, loop tiling and loop distribution.
- -Mrecursive
- FOPTIMIZE
- Allocate local variables on the stack, thus allowing recursion. SAVEd, data-initialized, or namelist members are always allocated statically, regardless of the setting of this switch.
- -mrecip=none
- EXTRA_CFLAGS
- The -mrecip option controls the use of RCPSS and RSQRTSS instructions, with an additional Newton-Raphson step to increase precision, in place of DIVSS and SQRTSS. The value none disables these reciprocal estimates.
- -fopenmp=libomp
- Yes
- EXTRA_LIBS
- Enable handling of OpenMP directives and generate parallel code. The openmp library to be linked can be specified through -fopenmp=library option.
- -lomp
- EXTRA_LIBS
- Instructs the compiler to link with the OpenMP runtime libraries.
- -lamdlibm
- EXTRA_LIBS
- Instructs the compiler to link with AMD-supported optimized math library.
- -lamdalloc
- EXTRA_LIBS
- amdalloc is AMD's memory allocator, based on the jemalloc library, and is available as part of the AOCC binary package.
- -lflang
- EXTRA_LIBS
- Instructs the compiler to link with flang Fortran runtime libraries.

628.pop2_s

- -m64
- CC, FC, LD
- Generates code for a 64-bit environment. The 64-bit environment sets int to 32 bits and long and pointer to 64 bits and generates code for AMD's x86-64 architecture. The compiler generates AMD64, INTEL64, x86-64 64-bit ABI. The default on a 32-bit host is 32-bit ABI. The default on a 64-bit host is 64-bit ABI if the target platform specified is 64-bit, otherwise the default is 32-bit.
- -Wl,-mllvm -Wl,-align-all-nofallthru-blocks=6
- LDFLAGS
- Forces the alignment of all blocks that have no fall-through predecessors (i.e. don't add nops that are executed). In log2 format (e.g 4 means align on 16B boundaries).
- -Wl,-mllvm -Wl,-reduce-array-computations=3
- LDFLAGS
- This option eliminates the array computations based on their usage. The computations on unused array elements and computations on zero valued array elements are eliminated with this optimization. -flto as whole program analysis is required to perform this optimization.
  
  Possible values:
  - 1: Eliminates the computations on unused array elements
  - 2: Eliminates the computations on zero valued array elements
  - 3: Eliminates the computations on unused and zero valued array elements
- -Wl,-mllvm -Wl,-enable-X86-prefetching
- LDFFLAGS
- This optimization enables generation of prefetch instructions for tightly coupled loops.
- -Ofast
- COPTIMIZE, FOPTIMIZE
- Enables all the optimizations from -O3 along with other aggressive optimizations that may violate strict compliance with language standards. Refer to the AOCC options document for the language you're using for more detailed documentation of optimizations enabled under -Ofast.
- Includes:
  - -O3
    - -O2
      - -O1
- -march=znver5
- COPTIMIZE, FOPTIMIZE
- Specify that Clang should generate code for a specific processor family member and later. For example, if you specify -march=znver1, the compiler is allowed to generate instructions that are valid on AMD Zen processors, but which may not exist on earlier products. -march=znver4 enables AVX 512 ISA for Genoa (znver4) processors.
- -fveclib=AMDLIBM
- COPTIMIZE, FOPTIMIZE
- Use the given vector functions library.
- -ffast-math
- COPTIMIZE, FOPTIMIZE
- Enables a range of optimizations that provide faster, though sometimes less precise, mathematical operations that may not conform to the IEEE-754 specifications. When this option is specified, the __STDC_IEC_559__ macro is ignored even if set by the system headers.
- -fopenmp
- Yes
- COPTIMIZE, FOPTIMIZE
- Enable handling of OpenMP directives and generate parallel code. The openmp library to be linked can be specified through -fopenmp=library option.
- -flto
- COPTIMIZE
- Generate output files in LLVM formats suitable for link time optimization. When used with -S this generates LLVM intermediate language assembly files, otherwise this generates LLVM bitcode format object files (which may be passed to the linker depending on the stage selection options).
- -DSPEC_OPENMP
- COPTIMIZE, FOPTIMIZE
- Definition of this macro indicates that compilation for parallel operation is enabled, and that any OpenMP directives or pragmas will be visible to the compiler. The behavior of this macro is overridden if -DSPEC_SUPPRESS_OPENMP also appears in the list of compilation flags.
- -fremap-arrays
- COPTIMIZE
- This option enables an optimization that transforms the data layout of a single dimensional array to provide better cache locality by analysing the access patterns.
- -fstrip-mining
- COPTIMIZE
- Enables loop strip mining optimization. This optimization breaks a large loop into smaller segments or strips to improve temporal and spatial locality.
- -fstruct-layout=9
- COPTIMIZE
- Analyzes the whole program to determine if the structures in the code can be peeled, if dead or redundant fields can be deleted, and if the pointer or integer fields in the structure can be compressed. If feasible, this optimization transforms the code to enable these improvements. This transformation is likely to improve cache utilization and memory bandwidth. It is expected to improve the scalability of programs executed on multiple cores. This optimization is effective only with -flto, because whole-program analysis is required to perform it. You can choose different levels of aggressiveness with which this optimization can be applied to your application; with 1 being the least aggressive and 7 being the most aggressive level.
  
  Possible values:
  - fstruct-layout=0: disables structure peeling (default).
  - fstruct-layout=1: enables structure peeling.
  - fstruct-layout=2: enables structure peeling and selectively compresses self-referential pointers in these structures to 32-bit pointers wherever safe.
  - fstruct-layout=3: enables structure peeling and selectively compresses self-referential pointers in these structures to 16-bit pointers wherever safe.
  - fstruct-layout=4: enables structure peeling, pointer compression as in level 2 and further enables compression of structure fields which are of 64-bit integer type to 32-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=5: enables structure peeling, pointer compression as in level 3 and further enables compression of structure fields which are of 64-bit integer type to 32-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=6: enables structure peeling and pointer compression as in level 2, and further compresses structure fields of 64-bit integer type to 16-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=7: enables structure peeling and pointer compression as in level 3, and further compresses structure fields of 64-bit integer type to 16-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=8: enables structure peeling, pointer compression, and 64-bit integer type compression as in level 6, and creates an optimal ordering of peeled structure fields, which could improve runtime performance.
  - fstruct-layout=9: enables structure peeling, pointer compression, and 64-bit integer type compression as in level 7, and creates an optimal ordering of peeled structure fields, which could improve runtime performance.
  Note:
  fstruct-layout=4 and fstruct-layout=5 are derived from fstruct-layout=2 and fstruct-layout=3 respectively, with the added feature of safe compression of 64-bit integer fields to 32-bit integer fields in structures. Going from fstruct-layout=4 to fstruct-layout=5 may result in higher performance if the pointer values are such that the pointers can be compressed to 16 bits.
  
  fstruct-layout=6 and fstruct-layout=7 are derived from fstruct-layout=2 and fstruct-layout=3 respectively, with the added feature of safe compression of 64-bit integer fields to 16-bit integer fields in structures. Going from fstruct-layout=6 to fstruct-layout=7 may result in higher performance if the pointer values are such that the pointers can be compressed to 16 bits.
- -mllvm -inline-threshold=1000
- COPTIMIZE
- Sets the compiler's inlining threshold level to the value passed as the argument. The inline threshold is used in the inliner heuristics to decide which functions should be inlined.
- -mllvm -reduce-array-computations=3
- COPTIMIZE, FOPTIMIZE
- This option eliminates the array computations based on their usage. The computations on unused array elements and computations on zero valued array elements are eliminated with this optimization. -flto as whole program analysis is required to perform this optimization.
  
  Possible values:
  - 1: Eliminates the computations on unused array elements
  - 2: Eliminates the computations on zero valued array elements
  - 3: Eliminates the computations on unused and zero valued array elements
- -mllvm -unroll-threshold=50
- COPTIMIZE
- Sets the limit at which loops will be unrolled. For example, if unroll threshold is set to 100 then only loops with 100 or fewer instructions will be unrolled.
- -zopt
- COPTIMIZE
- This option enables a subset of scalar, vector and loop transformations including improved variants of loop invariant code motion, SLP and loop vectorizations, loop-fusion, loop-interchange, loop-unswitch, loop tiling and loop distribution.
- -fscalar-transform
- FOPTIMIZE
- This option enables a subset of scalar transformations including improved variants of various code movement optimizations like hoisting and invariant code movement.
- -fvector-transform
- FOPTIMIZE
- This option enables a subset of vector transformations including improved variants of SLP and loop vectorization.
- -Mrecursive
- FOPTIMIZE
- Allocate local variables on the stack, thus allowing recursion. SAVEd, data-initialized, or namelist members are always allocated statically, regardless of the setting of this switch.
- -fopenmp=libomp
- Yes
- EXTRA_LIBS
- Enable handling of OpenMP directives and generate parallel code. The openmp library to be linked can be specified through -fopenmp=library option.
- -lomp
- EXTRA_LIBS
- Instructs the compiler to link with the OpenMP runtime libraries.
- -lamdlibm
- EXTRA_LIBS
- Instructs the compiler to link with AMD-supported optimized math library.
- -lamdalloc
- EXTRA_LIBS
- amdalloc is AMD's memory allocator, based on the jemalloc library, and is available as part of the AOCC binary package.
- -lflang
- EXTRA_LIBS
- Instructs the compiler to link with flang Fortran runtime libraries.
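
The -align-all-nofallthru-blocks value described above is a log2 exponent, so the resulting alignment in bytes is 2 raised to that value. A minimal sketch of that arithmetic follows; the helper name alignment_bytes is hypothetical and used only for illustration, and it is not part of any compiler interface.

```python
def alignment_bytes(log2_value: int) -> int:
    """Convert a log2 alignment exponent into a byte boundary."""
    if log2_value < 0:
        raise ValueError("log2 alignment must be non-negative")
    return 1 << log2_value


# The description's own example: 4 means 16-byte boundaries.
print(alignment_bytes(4))
# The value used in the flag lists above is 6.
print(alignment_bytes(6))
```

Under this reading, the setting of 6 used above requests alignment on 64-byte boundaries.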

Benchmarks using Fortran, C, and C++

- -m64
- CC, CXX, FC, LD
- Generates code for a 64-bit environment. The 64-bit environment sets int to 32 bits and long and pointer to 64 bits and generates code for AMD's x86-64 architecture. The compiler generates AMD64, INTEL64, x86-64 64-bit ABI. The default on a 32-bit host is 32-bit ABI. The default on a 64-bit host is 64-bit ABI if the target platform specified is 64-bit, otherwise the default is 32-bit.
- -std=c++14
- CXX, LD
- Selects the C++ language dialect.
- -Wl,-mllvm -Wl,-align-all-nofallthru-blocks=6
- LDFLAGS
- Forces the alignment of all blocks that have no fall-through predecessors (i.e. don't add nops that are executed). In log2 format (e.g 4 means align on 16B boundaries).
- -Wl,-mllvm -Wl,-reduce-array-computations=3
- LDFLAGS
- This option eliminates the array computations based on their usage. The computations on unused array elements and computations on zero valued array elements are eliminated with this optimization. -flto as whole program analysis is required to perform this optimization.
  
  Possible values:
  - 1: Eliminates the computations on unused array elements
  - 2: Eliminates the computations on zero valued array elements
  - 3: Eliminates the computations on unused and zero valued array elements
- -Wl,-mllvm -Wl,-x86-use-vzeroupper=false
- LDCXXFLAGS
- This option controls the vzeroupper instruction generation before a transfer of control flow. Not emitting the vzeroupper instruction can help minimize the AVX to SSE transition penalty.
- -Ofast
- COPTIMIZE, CXXOPTIMIZE, FOPTIMIZE
- Enables all the optimizations from -O3 along with other aggressive optimizations that may violate strict compliance with language standards. Refer to the AOCC options document for the language you're using for more detailed documentation of optimizations enabled under -Ofast.
- Includes:
  - -O3
    - -O2
      - -O1
- -march=znver5
- COPTIMIZE, CXXOPTIMIZE, FOPTIMIZE
- Specify that Clang should generate code for a specific processor family member and later. For example, if you specify -march=znver1, the compiler is allowed to generate instructions that are valid on AMD Zen processors, but which may not exist on earlier products. -march=znver4 enables AVX 512 ISA for Genoa (znver4) processors.
- -fveclib=AMDLIBM
- COPTIMIZE, CXXOPTIMIZE, FOPTIMIZE
- Use the given vector functions library.
- -ffast-math
- COPTIMIZE, CXXOPTIMIZE, FOPTIMIZE
- Enables a range of optimizations that provide faster, though sometimes less precise, mathematical operations that may not conform to the IEEE-754 specifications. When this option is specified, the __STDC_IEC_559__ macro is ignored even if set by the system headers.
- -fopenmp
- Yes
- COPTIMIZE, CXXOPTIMIZE, FOPTIMIZE
- Enable handling of OpenMP directives and generate parallel code. The openmp library to be linked can be specified through -fopenmp=library option.
- -flto
- COPTIMIZE, CXXOPTIMIZE, FOPTIMIZE
- Generate output files in LLVM formats suitable for link time optimization. When used with -S this generates LLVM intermediate language assembly files, otherwise this generates LLVM bitcode format object files (which may be passed to the linker depending on the stage selection options).
- -DSPEC_OPENMP
- COPTIMIZE, CXXOPTIMIZE, FOPTIMIZE
- Definition of this macro indicates that compilation for parallel operation is enabled, and that any OpenMP directives or pragmas will be visible to the compiler. The behavior of this macro is overridden if -DSPEC_SUPPRESS_OPENMP also appears in the list of compilation flags.
- -fremap-arrays
- COPTIMIZE
- This option enables an optimization that transforms the data layout of a single dimensional array to provide better cache locality by analysing the access patterns.
- -fstrip-mining
- COPTIMIZE
- Enables loop strip mining optimization. This optimization breaks a large loop into smaller segments or strips to improve temporal and spatial locality.
- -fstruct-layout=9
- COPTIMIZE
- Analyzes the whole program to determine if the structures in the code can be peeled, if dead or redundant fields can be deleted, and if the pointer or integer fields in the structure can be compressed. If feasible, this optimization transforms the code to enable these improvements. This transformation is likely to improve cache utilization and memory bandwidth. It is expected to improve the scalability of programs executed on multiple cores. This optimization is effective only with -flto, because whole-program analysis is required to perform it. You can choose different levels of aggressiveness with which this optimization can be applied to your application; with 1 being the least aggressive and 7 being the most aggressive level.
  
  Possible values:
  - fstruct-layout=0: disables structure peeling (default).
  - fstruct-layout=1: enables structure peeling.
  - fstruct-layout=2: enables structure peeling and selectively compresses self-referential pointers in these structures to 32-bit pointers wherever safe.
  - fstruct-layout=3: enables structure peeling and selectively compresses self-referential pointers in these structures to 16-bit pointers wherever safe.
  - fstruct-layout=4: enables structure peeling, pointer compression as in level 2 and further enables compression of structure fields which are of 64-bit integer type to 32-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=5: enables structure peeling, pointer compression as in level 3 and further enables compression of structure fields which are of 64-bit integer type to 32-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=6: enables structure peeling and pointer compression as in level 2, and further compresses structure fields of 64-bit integer type to 16-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=7: enables structure peeling and pointer compression as in level 3, and further compresses structure fields of 64-bit integer type to 16-bit integer type. This is performed under a strict safety check.
  - fstruct-layout=8: enables structure peeling, pointer compression, and 64-bit integer type compression as in level 6, and creates an optimal ordering of peeled structure fields, which could improve runtime performance.
  - fstruct-layout=9: enables structure peeling, pointer compression, and 64-bit integer type compression as in level 7, and creates an optimal ordering of peeled structure fields, which could improve runtime performance.
  Note:
  fstruct-layout=4 and fstruct-layout=5 are derived from fstruct-layout=2 and fstruct-layout=3 respectively, with the added feature of safe compression of 64-bit integer fields to 32-bit integer fields in structures. Going from fstruct-layout=4 to fstruct-layout=5 may result in higher performance if the pointer values are such that the pointers can be compressed to 16 bits.
  
  fstruct-layout=6 and fstruct-layout=7 are derived from fstruct-layout=2 and fstruct-layout=3 respectively, with the added feature of safe compression of 64-bit integer fields to 16-bit integer fields in structures. Going from fstruct-layout=6 to fstruct-layout=7 may result in higher performance if the pointer values are such that the pointers can be compressed to 16 bits.
- -mllvm -inline-threshold=1000
- COPTIMIZE
- Sets the compiler's inlining threshold level to the value passed as the argument. The inline threshold is used in the inliner heuristics to decide which functions should be inlined.
- -mllvm -reduce-array-computations=3
- COPTIMIZE, CXXOPTIMIZE, FOPTIMIZE
- This option eliminates the array computations based on their usage. The computations on unused array elements and computations on zero valued array elements are eliminated with this optimization. -flto as whole program analysis is required to perform this optimization.
  
  Possible values:
  - 1: Eliminates the computations on unused array elements
  - 2: Eliminates the computations on zero valued array elements
  - 3: Eliminates the computations on unused and zero valued array elements
- -mllvm -unroll-threshold=50
- COPTIMIZE
- Sets the limit at which loops will be unrolled. For example, if unroll threshold is set to 100 then only loops with 100 or fewer instructions will be unrolled.
- -zopt
- COPTIMIZE, CXXOPTIMIZE, FOPTIMIZE
- This option enables a subset of scalar, vector and loop transformations including improved variants of loop invariant code motion, SLP and loop vectorizations, loop-fusion, loop-interchange, loop-unswitch, loop tiling and loop distribution.
- -mllvm -unroll-threshold=100
- CXXOPTIMIZE
- Sets the limit at which loops will be unrolled. For example, if unroll threshold is set to 100 then only loops with 100 or fewer instructions will be unrolled.
- -Mrecursive
- FOPTIMIZE
- Allocate local variables on the stack, thus allowing recursion. SAVEd, data-initialized, or namelist members are always allocated statically, regardless of the setting of this switch.
- -fopenmp=libomp
- Yes
- EXTRA_LIBS
- Enable handling of OpenMP directives and generate parallel code. The openmp library to be linked can be specified through -fopenmp=library option.
- -lomp
- EXTRA_LIBS
- Instructs the compiler to link with the OpenMP runtime libraries.
- -lamdlibm
- EXTRA_LIBS
- Instructs the compiler to link with AMD-supported optimized math library.
- -lamdalloc
- EXTRA_LIBS
- amdalloc is AMD's memory allocator, based on the jemalloc library, and is available as part of the AOCC binary package.
- -lflang
- EXTRA_LIBS
- Instructs the compiler to link with flang Fortran runtime libraries.
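
The fstruct-layout levels listed above combine four independent features: peeling, pointer compression width, 64-bit integer compression width, and field reordering. The table below is a sketch that restates the level descriptions as data so the combinations can be compared at a glance. The type and field names are hypothetical and chosen only for this illustration; nothing here reflects the compiler's internal representation.

```python
from typing import NamedTuple, Optional


class StructLayoutLevel(NamedTuple):
    peeling: bool
    pointer_bits: Optional[int]  # None means pointers are not compressed
    int64_bits: Optional[int]    # None means 64-bit integer fields are not compressed
    reorder_fields: bool


# Transcribed from the "Possible values" list for -fstruct-layout.
LEVELS = {
    0: StructLayoutLevel(False, None, None, False),
    1: StructLayoutLevel(True, None, None, False),
    2: StructLayoutLevel(True, 32, None, False),
    3: StructLayoutLevel(True, 16, None, False),
    4: StructLayoutLevel(True, 32, 32, False),
    5: StructLayoutLevel(True, 16, 32, False),
    6: StructLayoutLevel(True, 32, 16, False),
    7: StructLayoutLevel(True, 16, 16, False),
    8: StructLayoutLevel(True, 32, 16, True),
    9: StructLayoutLevel(True, 16, 16, True),
}


def describe(level: int) -> str:
    """Return a one-line summary of what a given level enables."""
    cfg = LEVELS[level]
    if not cfg.peeling:
        return "structure peeling disabled"
    parts = ["peeling"]
    if cfg.pointer_bits is not None:
        parts.append(f"pointers to {cfg.pointer_bits}-bit")
    if cfg.int64_bits is not None:
        parts.append(f"64-bit integers to {cfg.int64_bits}-bit")
    if cfg.reorder_fields:
        parts.append("field reordering")
    return ", ".join(parts)


print(describe(9))
```

Read this way, level 9 as used in the flag lists above is level 7 plus reordering of the peeled fields.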

Implicitly Included Flags

This section contains descriptions of flags that were included implicitly by other flags, but which do not have a permanent home at SPEC.

Commands and Options Used to Submit Benchmark Runs

For multi-copy runs or single copy runs on systems with multiple sockets, it is advantageous to bind a process to a particular core. Otherwise, the OS may arbitrarily move your process from one core to another. This can affect performance. To help, SPEC allows the use of a "submit" command where users can specify a utility to use to bind processes. We have found the utility 'numactl' to be the best choice.

numactl runs processes with a specific NUMA scheduling or memory placement policy. The policy is set for a command and inherited by all of its children. The numactl flag "--physcpubind" specifies which core(s) to bind the process to. "-l" instructs numactl to keep a process's memory on the local node, while "-m" specifies the node(s) on which to place a process's memory. For full details on using numactl, please refer to your Linux documentation, 'man numactl'.

Note that some older versions of numactl incorrectly interpret application arguments as its own. For example, with the command "numactl --physcpubind=0 -l a.out -m a", numactl will interpret a.out's "-m" option as its own "-m" option. To work around this problem, we put the command to be run in a shell script and then run the shell script using numactl. For example: "echo 'a.out -m a' > run.sh ; numactl --physcpubind=0 bash run.sh"
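
The wrapper-script workaround described above can be sketched as follows. This is an illustration only: the function names are hypothetical, the script name run.sh and the a.out command come from the example in the text, and the sketch only builds the script contents and the argument list without running anything.

```python
import shlex
from typing import List


def wrapper_script(command: List[str]) -> str:
    """Return the contents of a shell script that runs the benchmark command.

    Keeping the benchmark's own options inside the script hides them
    from numactl's option parser.
    """
    return shlex.join(command) + "\n"


def numactl_argv(core: int, script_path: str, local_memory: bool = False) -> List[str]:
    """Build the numactl command line that runs the wrapper script on one core."""
    argv = ["numactl", f"--physcpubind={core}"]
    if local_memory:
        argv.append("-l")  # keep memory on the local NUMA node
    argv += ["bash", script_path]
    return argv


print(wrapper_script(["a.out", "-m", "a"]), end="")
print(shlex.join(numactl_argv(0, "run.sh")))
```

Because numactl only ever sees "bash run.sh", the benchmark's "-m" option cannot be mistaken for numactl's own.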

Shell, Environment, and Other Software Settings

numactl --interleave=all runcpu executes the SPEC CPU command runcpu so that memory is allocated across all NUMA nodes rather than from a single node. This helps prevent local node out-of-memory conditions, which can occur when runcpu is executed without interleaving. For full details on using numactl, please refer to your Linux documentation, 'man numactl'.

THP is an abstraction layer that automates most aspects of creating, managing, and using huge pages. It is designed to hide much of the complexity in using huge pages from system administrators and developers. Huge pages increase the memory page size from 4 kilobytes to 2 megabytes. This provides significant performance advantages on systems with highly contended resources and large memory workloads. If memory utilization is too high or memory is badly fragmented which prevents huge pages being allocated, the kernel will assign smaller 4k pages instead. Most recent Linux OS releases have THP enabled by default.
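
To make the page-size difference concrete, the following sketch counts how many pages are needed to map a given amount of memory at each of the two sizes mentioned above. The 1 GiB working set is an arbitrary example value, not a figure taken from any benchmark.

```python
SMALL_PAGE = 4 * 1024          # 4 kilobytes
HUGE_PAGE = 2 * 1024 * 1024    # 2 megabytes


def pages_needed(size_bytes: int, page_size: int) -> int:
    """Number of pages required to cover size_bytes, rounding up."""
    return -(-size_bytes // page_size)


working_set = 1 << 30  # 1 GiB, an arbitrary example size
print(pages_needed(working_set, SMALL_PAGE))
print(pages_needed(working_set, HUGE_PAGE))
```

The same memory is covered by 512 times fewer pages at the larger size, which reduces the number of address translations the processor has to track.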

THP usage is controlled by the sysfs setting /sys/kernel/mm/transparent_hugepage/enabled. Possible values:

The SPEC CPU benchmark codes themselves never explicitly request huge pages, as the mechanism to do that is OS-specific and can change over time. Libraries such as amdalloc which are used by the benchmarks may explicitly request huge pages, and use of such libraries can make the "madvise" setting relevant and useful.

When no huge pages are immediately available and one is requested, how the system handles the request for THP creation is controlled by the sysfs setting /sys/kernel/mm/transparent_hugepage/defrag. Possible values:

An application that "always" requests THP often can benefit from waiting for an allocation until those huge pages can be assembled.
For more information see the Linux transparent hugepage documentation.
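
When read, the two sysfs files named above list all possible values on one line and mark the active one with square brackets. A minimal sketch for extracting the active value is shown below. The function names are hypothetical, and the sample string is an illustration of the format rather than output captured from a tested system.

```python
import re
from typing import Optional


def active_thp_value(sysfs_text: str) -> Optional[str]:
    """Return the bracketed (active) value from a THP sysfs line, if any."""
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    return match.group(1) if match else None


def read_thp_setting(name: str = "enabled") -> Optional[str]:
    """Read /sys/kernel/mm/transparent_hugepage/<name>; None if unavailable."""
    path = f"/sys/kernel/mm/transparent_hugepage/{name}"
    try:
        with open(path) as handle:
            return active_thp_value(handle.read())
    except OSError:
        return None


# Illustrative input in the kernel's bracketed format.
print(active_thp_value("always [madvise] never"))
```

Calling read_thp_setting("defrag") applies the same parsing to the defrag setting on systems where the file exists.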

Sets the stack size to n kbytes, or unlimited to allow the stack size to grow without limit.

Disables the cpu frequency scaling program in order to set the CPUs to the highest supported frequency.

An environment variable that indicates the location in the filesystem of bundled libraries to use when running the benchmark binaries.

target nowait is supported via hidden helper task, which is a task not bound to any parallel region. A hidden helper team with a number of threads is created when the first hidden helper task is encountered.

The number of threads can be configured via the environment variable LIBOMP_NUM_HIDDEN_HELPER_THREADS. The default is 8. If LIBOMP_NUM_HIDDEN_HELPER_THREADS is 0, the hidden helper task is disabled and support falls back to a regular OpenMP task. The hidden helper task can also be disabled by setting the environment variable LIBOMP_USE_HIDDEN_HELPER_TASK=OFF.
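
The rules in the paragraph above can be restated as a small decision function. This is a model of the documented behaviour only, not the OpenMP runtime's actual parsing code; the function name is hypothetical, and the runtime may accept spellings of the values other than the ones handled here.

```python
from typing import Mapping

DEFAULT_HIDDEN_HELPER_THREADS = 8  # default stated in the text above


def hidden_helper_threads(env: Mapping[str, str]) -> int:
    """Return the hidden helper thread count implied by env; 0 means disabled."""
    # Explicit switch: OFF disables the hidden helper task entirely.
    if env.get("LIBOMP_USE_HIDDEN_HELPER_TASK", "").upper() == "OFF":
        return 0
    raw = env.get("LIBOMP_NUM_HIDDEN_HELPER_THREADS")
    if raw is None:
        return DEFAULT_HIDDEN_HELPER_THREADS
    # A value of 0 also disables the hidden helper task.
    return max(int(raw), 0)


print(hidden_helper_threads({}))
print(hidden_helper_threads({"LIBOMP_NUM_HIDDEN_HELPER_THREADS": "0"}))
```

A result of 0 corresponds to the fallback case, where target nowait is handled as a regular OpenMP task.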

This OS setting controls automatic NUMA balancing on memory mapping and process placement. NUMA balancing incurs overhead for no benefit on workloads that are already bound to NUMA nodes.

This setting can be used to select the type of process address space randomization. Defaults differ based on whether the architecture supports ASLR, whether the kernel was built with the CONFIG_COMPAT_BRK option or not, or the kernel boot options used.

Disabling ASLR can make process execution more deterministic and runtimes more consistent. For more information see the randomize_va_space entry in the Linux sysctl documentation.
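
For reference, the randomize_va_space setting takes the values 0, 1, and 2. The sketch below maps each value to a short summary paraphrased from the Linux sysctl documentation; the function name is hypothetical, and the kernel documentation remains the authoritative source for the exact wording.

```python
ASLR_MODES = {
    0: "disabled: no address space randomization",
    1: "partial: mmap base, stack, and VDSO page addresses are randomized",
    2: "full: as 1, plus heap (brk) randomization",
}


def describe_aslr(value: int) -> str:
    """Return a short description of a randomize_va_space value."""
    return ASLR_MODES.get(value, f"unrecognized value {value}")


for mode in (0, 1, 2):
    print(mode, describe_aslr(mode))
```

Disabling ASLR as described above corresponds to the value 0.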

The two commands are equivalent: echo 3 > /proc/sys/vm/drop_caches and sysctl -w vm.drop_caches=3. Both must be run as root. The commands are used to free up the filesystem page cache, dentries, and inodes.

The amdalloc library is a variant of jemalloc library. The amdalloc library has tunable parameters, many of which may be changed at run-time via several mechanisms, one of which is the MALLOC_CONF environment variable. Other methods, as well as the order in which they're referenced, are detailed in the jemalloc documentation's TUNING section.

The options that can be tuned at run-time are everything in the jemalloc documentation's MALLCTL NAMESPACE section that begins with "opt.".
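
MALLOC_CONF takes a comma-separated list of name:value pairs, where each name is an "opt." entry with the "opt." prefix removed. The following sketch assembles such a string. The helper name is hypothetical, and the options shown (retain and thp, both documented jemalloc options) are illustrative only and are not the settings used for any particular result.

```python
from typing import Mapping, Union


def malloc_conf(options: Mapping[str, Union[str, int, bool]]) -> str:
    """Build a MALLOC_CONF value from option names (without the 'opt.' prefix)."""
    parts = []
    for name, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # jemalloc spells booleans in lowercase
        parts.append(f"{name}:{value}")
    return ",".join(parts)


# Illustrative options only.
print(malloc_conf({"retain": True, "thp": "always"}))
```

The resulting string is what would be exported in the MALLOC_CONF environment variable before the benchmark binary starts.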

An environment variable used to initialize the allocated memory. Setting PGHPF_ZMEM to "Yes" has the effect of initializing all allocated memory to zero.

This environment variable is used to set the thread affinity for threads spawned by OpenMP.

This environment variable is defined as part of the OpenMP standard. Setting it to "false" prevents the OpenMP runtime from dynamically adjusting the number of threads to use for parallel execution.

This environment variable is defined as part of the OpenMP standard. Setting it to "static" causes loop iterations to be assigned to threads in round-robin fashion in the order of the thread number.
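
The round-robin assignment described above can be sketched for the case where a chunk size is given, as in a setting such as "static,2". The function name is hypothetical. When no chunk size is given, the OpenMP specification instead divides the iterations into roughly equal chunks with at most one per thread, a case this sketch does not model.

```python
from typing import List


def static_schedule(n_iterations: int, n_threads: int, chunk_size: int) -> List[List[int]]:
    """Assign loop iterations to threads as a static schedule with a chunk size would.

    Iterations are grouped into consecutive chunks, and the chunks are handed
    to threads in round-robin order of thread number.
    """
    if n_threads < 1 or chunk_size < 1:
        raise ValueError("n_threads and chunk_size must be positive")
    assignment: List[List[int]] = [[] for _ in range(n_threads)]
    for i in range(n_iterations):
        chunk_index = i // chunk_size
        assignment[chunk_index % n_threads].append(i)
    return assignment


# 8 iterations, 2 threads, chunks of 2 iterations each.
for thread, iterations in enumerate(static_schedule(8, 2, 2)):
    print(thread, iterations)
```

Because the mapping depends only on the iteration number, thread count, and chunk size, it is the same on every run, which helps keep runtimes consistent.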

This environment variable is defined as part of the OpenMP standard and controls the size of the stack for threads created by OpenMP.

This environment variable is defined as part of the OpenMP standard and limits the maximum number of OpenMP threads that can be created.

Operating System Tuning Parameters

ulimit is a command used to set or check user limits on system resources such as memory, CPU, and the number of open files. Below are common usages of ulimit:

irqbalance is a Linux background service that distributes hardware interrupts across multiple CPU cores to prevent overloading a single core and improve system performance.

Performance governors are part of Linux's CPU frequency scaling mechanisms, used to determine how the CPU frequency should be managed. Simply put, they control "how fast the CPU should run under different conditions." Common CPU governors include:

When set to performance, the CPU will always operate at its maximum frequency to deliver the highest computing performance. This will improve overall system performance.

Many companies execute the following command when conducting system performance testing to ensure that the CPU operates at its maximum frequency:

tuned-adm is a command-line tool used to manage performance tuning settings on Linux systems. It allows users to select predefined tuning profiles that automatically adjust CPU, power saving, I/O, and network parameters according to the system’s intended usage, optimizing either performance or energy efficiency. The following four are the most commonly used profiles:

To clear the Linux filesystem cache during testing or prior to benchmarking, the following command is used:

Firmware / BIOS / Microcode Settings

SMT Control is a setting that enables or disables Simultaneous Multithreading (SMT), allowing each CPU core to execute one or more threads concurrently to improve multitasking performance or ensure thread isolation. Values for this BIOS option can be:

This setting controls how the system balances power efficiency and performance across CPU, memory, and I/O subsystems. Values for this BIOS option can be:

Performance Mode forces the system to operate at its highest performance level, sacrificing power efficiency for maximum speed. Values for this BIOS option can be:

ASPM (Active State Power Management) is a PCI Express power-saving feature that reduces power consumption by placing links into lower power states when idle. Values for this BIOS option can be:

CPPC (Collaborative Processor Performance Control) allows the OS and processor to work together to optimize performance and power efficiency by selecting appropriate performance levels dynamically. Values for this BIOS option can be:

Allows for disabling memory interleaving. Note that NUMA nodes per socket will be honored regardless of this setting. Values for this BIOS option can be:

SVM (Secure Virtual Machine) Mode is a BIOS setting that enables or disables hardware-assisted virtualization on AMD processors. When enabled, it allows the use of virtualization technologies such as AMD-V, which are required by hypervisors (e.g., VMware, Hyper-V, KVM) to run virtual machines with hardware-level isolation and improved performance. Values for this BIOS option can be:

SR-IOV (Single Root I/O Virtualization) is a hardware-assisted virtualization technology that allows a single physical PCIe device (such as a network interface card) to present multiple virtual functions (VFs) to the operating system or hypervisor. This enables more efficient and direct access to hardware for virtual machines, reducing I/O overhead and improving performance in virtualized environments. Values for this BIOS option can be:

SEV (Secure Encrypted Virtualization) is an AMD security technology that encrypts the memory of virtual machines, protecting guest data from being accessed or tampered with by the hypervisor or other VMs. It enhances data confidentiality in cloud or multi-tenant environments by isolating VMs at the hardware level. Values for this BIOS option can be:

BoostFmaxEn determines whether the CPU's maximum frequency (Fmax) is set automatically by the system or manually by the user. Values for this BIOS option can be:

BoostFmax defines the maximum frequency (in MHz) the CPU is allowed to reach when frequency boosting is enabled.

Determinism Control is a BIOS setting used on AMD EPYC processors to influence how the system behaves in terms of frequency and performance consistency across cores and sockets. It ensures predictable performance, which is especially useful in multi-socket or multi-node systems where workloads must remain consistent across processors. Values for this BIOS option can be:

Determinism Enable is a setting that determines whether a system prioritizes consistent power behavior or peak performance when determinism is manually controlled. It works in conjunction with the Determinism Control setting to fine-tune system response across cores and sockets. Values for this BIOS option can be:

TDP Control determines how the processor’s Thermal Design Power (TDP) is managed: either automatically by the system or manually through user-defined limits. This setting affects CPU power consumption and thermal behavior. Values for this BIOS option can be:

TDP (Thermal Design Power) sets a power consumption target for the CPU in watts, helping manage thermal output and power limits during operation. It is especially relevant when TDP Control is set to Manual.

PPT Control (Package Power Tracking Control) determines whether the maximum allowable CPU package power (PPT limit) is automatically set by the system or manually defined by the user to control CPU power usage. Values for this BIOS option can be:

PPT defines the upper limit of total power consumption (in watts) for the CPU package, including cores, cache, and SoC components, to ensure thermal and electrical safety.

ACPI CST C2 Latency defines the response time (in microseconds) for the processor to exit the C2 low-power state and return to full operation. This setting influences how quickly the CPU can resume tasks after being in power-saving mode.
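The exit latencies that the Linux cpuidle framework is using can be listed from sysfs, where each `stateN` directory carries a `name` and a `latency` file (in microseconds). The sketch below reads them. The helper name `read_idle_states` is hypothetical, and whether the listed C2 value matches this BIOS option depends on which cpuidle driver is active, so the link is an assumption.

```python
from pathlib import Path


def read_idle_states(cpuidle_dir):
    """Return [(state_name, exit_latency_us)] for each stateN directory, in index order."""
    state_dirs = sorted(
        Path(cpuidle_dir).glob("state[0-9]*"),
        key=lambda p: int(p.name[len("state"):]),  # numeric sort: state2 before state10
    )
    states = []
    for state in state_dirs:
        name = (state / "name").read_text().strip()
        latency_us = int((state / "latency").read_text().strip())
        states.append((name, latency_us))
    return states


if __name__ == "__main__":
    for name, latency_us in read_idle_states("/sys/devices/system/cpu/cpu0/cpuidle"):
        print(f"{name}: exit latency {latency_us} us")
```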

Memory Target Speed sets the desired memory (DRAM) operating frequency for the system, affecting overall memory bandwidth and latency performance. Values for this BIOS option can be:
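To make the bandwidth effect concrete, theoretical peak DRAM bandwidth scales linearly with the transfer rate: transfers per second times bytes per transfer times the number of channels. The sketch below works through that arithmetic. The speeds and the channel count are illustrative assumptions, not values taken from this result, and sustained bandwidth is always lower than the theoretical peak.

```python
def peak_bandwidth_gbs(transfer_rate_mts, channels, bus_width_bits=64):
    """Theoretical peak DRAM bandwidth in GB/s (10^9 bytes per second).

    transfer_rate_mts: memory speed in mega-transfers per second (e.g. 6400).
    channels:          number of populated memory channels (assumed value).
    bus_width_bits:    data width per channel; 64 is assumed here.
    """
    bytes_per_transfer = bus_width_bits // 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer * channels / 1e9


if __name__ == "__main__":
    # Hypothetical comparison of two target speeds on a 12-channel socket.
    for speed in (4800, 6400):
        print(f"{speed} MT/s x 12 channels: {peak_bandwidth_gbs(speed, 12):.1f} GB/s")
```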

NUMA Nodes Per Socket (NPS) determines how many NUMA (Non-Uniform Memory Access) domains are created per CPU socket, impacting memory locality, bandwidth, and latency for multi-threaded workloads. Values for this BIOS option can be:
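The resulting topology can be inspected on Linux by counting how many NUMA nodes contain CPUs from each physical package. The sketch below does this from sysfs. The function names are hypothetical, and the count reflects every setting that creates NUMA domains, so it equals the NPS value only if nothing else (such as reporting L3 caches as NUMA domains) adds nodes.

```python
from collections import defaultdict
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8' into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a memory-only node
        lo, _, hi = part.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus


def numa_nodes_per_socket(sys_root="/sys/devices/system"):
    """Map each physical package id to the number of NUMA nodes holding its CPUs."""
    root = Path(sys_root)
    nodes_by_socket = defaultdict(set)
    for node_dir in root.glob("node/node[0-9]*"):
        cpus = parse_cpulist((node_dir / "cpulist").read_text())
        if not cpus:
            continue
        # The first CPU of the node identifies which socket the node belongs to.
        pkg_file = root / "cpu" / f"cpu{cpus[0]}" / "topology" / "physical_package_id"
        nodes_by_socket[int(pkg_file.read_text().strip())].add(node_dir.name)
    return {pkg: len(nodes) for pkg, nodes in nodes_by_socket.items()}


if __name__ == "__main__":
    for pkg, count in sorted(numa_nodes_per_socket().items()):
        print(f"socket {pkg}: {count} NUMA node(s)")
```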

DRAM Scrub Time defines the periodic interval for background memory error correction (memory scrubbing), which helps detect and repair soft errors (bit flips) in DRAM to improve system reliability. Values for this BIOS option can be:
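As a back-of-the-envelope illustration, if the scrub time is taken to be the period in which all installed memory is scrubbed once (an assumption about this option's semantics), the average background scrub rate follows from dividing capacity by the interval. The capacity and interval below are hypothetical, not values from this result.

```python
def scrub_rate_mib_per_s(memory_gib, interval_hours):
    """Average scrub rate in MiB/s needed to cover memory_gib once per interval_hours."""
    if interval_hours <= 0:
        raise ValueError("interval_hours must be positive")
    return memory_gib * 1024 / (interval_hours * 3600)


if __name__ == "__main__":
    # Hypothetical system: 768 GiB of DRAM scrubbed once every 24 hours.
    print(f"{scrub_rate_mib_per_s(768, 24):.2f} MiB/s")
```

A shorter interval raises the background rate proportionally, which is the trade-off between faster error detection and memory bandwidth consumed by scrubbing.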

L1 Stride Prefetcher is a processor feature that attempts to pre-load data into the L1 cache by predicting memory access patterns with regular strides, helping improve performance by reducing cache miss latency. Values for this BIOS option can be:

APBDIS (Algorithm Performance Boost Disable) is a BIOS setting that controls whether the processor's Algorithm Performance Boost (APB) feature is enabled or disabled. With APB enabled, the Infinity Fabric P-state is switched dynamically based on load to save power; with APB disabled, the Infinity Fabric runs at a fixed P-state. Values for this BIOS option can be:

This BIOS setting defines whether each L3 cache segment is treated as a separate NUMA (Non-Uniform Memory Access) domain by reporting it in the ACPI SRAT (System Resource Affinity Table). This can affect how the OS and applications schedule memory and threads. Values for this BIOS option can be:

For questions about the meanings of these flags, please contact the tester.
For other inquiries, please contact info@spec.org
Copyright 2017-2025 Standard Performance Evaluation Corporation
Tested with SPEC CPU2017 v1.1.9.
Report generated on 2025-09-30 11:46:52 by SPEC CPU2017 flags formatter v5178.


CPU2017 Flag Description
Compal Electronics, Inc. SR224-2A AMD EPYC 9755

Compilers: AMD Optimizing C/C++ Compiler Suite

Base Compiler Invocation

Peak Compiler Invocation

Base Portability Flags

Peak Portability Flags

Base Optimization Flags

Peak Optimization Flags

Base Other Flags

Peak Other Flags

Implicitly Included Flags
