SPEC ACCEL PGI - Platform settings
Shell, Environment, and Other Software Settings - All compiler versions
- Set stack size to unlimited
- The command "ulimit -s unlimited" is used to set the stack size limit to unlimited.
- numactl --interleave=all "runspec command"
- Launching a process with numactl --interleave=all sets the memory interleave policy so that memory is allocated round-robin across nodes. When memory cannot be allocated on the current interleave target, allocation falls back to other nodes.
- OMP_NUM_THREADS
- Sets the maximum number of threads to use for OpenMP parallel regions if no other value is specified in the application. Example syntax on a Linux system with 8 cores: export OMP_NUM_THREADS=8
- OMP_STACKSIZE
- Specifies the stack size to be allocated for each thread.
- ACC_NUM_CORES
- Sets the maximum number of CPU cores to use. Default is to use all available physical cores.
- HUGETLB_PATH
- Set the huge TLB pages mount path.
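The settings above could be combined in a launch script along the following lines. This is a minimal sketch, not a recommended configuration: the 8-core count follows the example above, while the OMP_STACKSIZE value, the HUGETLB_PATH mount point, and the RUNSPEC_CMD and LAUNCH variable names are assumptions chosen for illustration. The launch line is only composed and printed, not executed.

```shell
# Sketch of a launch environment; all values are illustrative assumptions.

# Raise the stack size limit. This can fail if the hard limit is lower,
# so the failure is reported instead of aborting the script.
ulimit -s unlimited 2>/dev/null || echo "warning: could not raise stack limit" >&2

# One OpenMP thread per core on a hypothetical 8-core system.
export OMP_NUM_THREADS=8

# Per-thread stack size; 512M is an assumed value, not a recommendation.
export OMP_STACKSIZE=512M

# Limit the number of CPU cores used to 8 (assumed to match the core count).
export ACC_NUM_CORES=8

# Hypothetical hugetlbfs mount point; substitute the real mount path.
export HUGETLB_PATH=/mnt/hugetlbfs

# Compose the interleaved launch line. RUNSPEC_CMD is a placeholder for
# the actual runspec invocation.
RUNSPEC_CMD="runspec command"
LAUNCH="numactl --interleave=all $RUNSPEC_CMD"
echo "$LAUNCH"
```

The ulimit setting applies to the current shell and its children, so it has to be issued in the same shell that later starts the run.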
Shell, Environment, and Other Software Settings - PGI LLVM Compiler - x86 and Power architectures
- OMP_PROC_BIND
- If set to 'true', bind CPU threads to cores. Default is 'false'.
- OMP_PLACES
- Set the thread to core number binding.
- Example: OMP_PLACES={0},{1},{2},{3}
- KMP_AFFINITY
- Syntax: KMP_AFFINITY=[<modifier>,...]<type>[,<permute>][,<offset>]
The value for the environment variable KMP_AFFINITY affects how the threads from an auto-parallelized program are scheduled across processors.
It applies to binaries built with -mp using the PGI LLVM compilers on x86 and Power.
modifier:
granularity=fine Causes each OpenMP thread to be bound to a single thread context.
type:
compact Specifying compact assigns the OpenMP thread <n>+1 to a free thread context as close as possible to the thread context where the <n> OpenMP thread was placed.
scatter Specifying scatter distributes the threads as evenly as possible across the entire system.
permute: The permute specifier is an integer value that controls which levels are most significant when sorting the machine topology map. A value for permute forces the mappings to make the specified number of most significant levels of the sort the least significant, and it inverts the order of significance.
offset: The offset specifier indicates the starting position for thread assignment.
- Example: KMP_AFFINITY=granularity=fine,scatter
Specifying granularity=fine selects the finest granularity level and causes each OpenMP or auto-par thread to be bound to a single thread context.
This ensures that there is only one thread per core on cores supporting HyperThreading Technology.
Specifying scatter distributes the threads as evenly as possible across the entire system.
Hence a combination of these two options will spread the threads evenly across sockets, with one thread per physical core.
- Example: KMP_AFFINITY=compact,1,0
Specifying compact will assign the n+1 thread to a free thread context as close as possible to thread n.
A default granularity=core is implied if no granularity is explicitly specified.
Specifying 1,0 sets permute and offset values of the thread assignment.
With a permute value of 1, thread n+1 is assigned to a consecutive core. With an offset of 0, the process's first thread (thread 0) will be assigned to thread context 0.
The same behavior is exhibited in a multisocket system.
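As an illustration of the binding settings above, the following sketch builds an OMP_PLACES list with one place per core and enables binding. The core count of 4 and the NCORES and PLACES variable names are assumptions for this example. The KMP_AFFINITY form is shown only as a commented-out alternative; how the runtime resolves the case where both styles are set is not covered here.

```shell
# Sketch: one place per core on a hypothetical 4-core system.
NCORES=4   # assumed core count

# Build the place list {0},{1},...,{NCORES-1}.
PLACES=""
i=0
while [ "$i" -lt "$NCORES" ]; do
  # Append a comma only when the list is already non-empty.
  PLACES="${PLACES:+$PLACES,}{$i}"
  i=$((i + 1))
done

# Bind threads to cores using the generated places.
export OMP_PROC_BIND=true
export OMP_PLACES="$PLACES"

# Alternative for binaries built with -mp (use instead of the two lines above):
# export KMP_AFFINITY=granularity=fine,scatter

echo "OMP_PLACES=$OMP_PLACES"
```

With NCORES=4 the script prints OMP_PLACES={0},{1},{2},{3}, which matches the example given for OMP_PLACES.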
Shell, Environment, and Other Software Settings - PGI Native x86 compilers
- MP_BIND
- If set to 'yes', bind CPU threads to cores. Default is 'no'.
- MP_BLIST
- Set the thread to core number binding.
- Example: MP_BLIST=0,1,2,3
- MP_SPIN
- Specifies the number of times to check a semaphore before calling sched_yield() (on Linux or macOS) or _sleep() (on Windows).
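For binaries built with the PGI native x86 compilers, the three variables above might be set together as in the sketch below. The core list and the spin count are assumed values chosen for illustration, not tuned recommendations.

```shell
# Sketch for the PGI native x86 compilers; all values are illustrative assumptions.

# Enable binding of CPU threads to cores.
export MP_BIND=yes

# Core list used for the binding on a hypothetical 4-core system.
export MP_BLIST=0,1,2,3

# Hypothetical spin count: semaphore checks before the thread yields.
export MP_SPIN=1000000

echo "MP_BIND=$MP_BIND MP_BLIST=$MP_BLIST MP_SPIN=$MP_SPIN"
```

MP_BLIST is only meaningful when MP_BIND is enabled, so the two are set as a pair here.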