SPEC CPU2017 Flag Description for Quanta Computer Inc.

Operating System Tuning Parameters

OS Tuning

ulimit:

Used to set user limits of system-wide resources. Provides control over resources available to the shell and processes started by it. Some common ulimit commands may include:

ulimit -s [n | unlimited]: Set the stack size to n kbytes, or unlimited to allow the stack size to grow without limit.
ulimit -l (number): Set the maximum size that can be locked into memory.
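
As a minimal sketch of how these limits behave, the following assumes a shell whose ulimit builtin accepts -S and -s (bash and dash do). The helper name and the 1024 kbyte value are illustrative only and are not settings taken from this document. The change runs inside a subshell, so the calling shell keeps its own limit.

```shell
# Hypothetical helper: print the soft stack limit seen inside a subshell
# after setting it to the given number of kbytes. The parentheses confine
# the change to the subshell; the calling shell's limit is not modified.
stack_limit_in_subshell() {
    ( ulimit -S -s "$1" && ulimit -S -s )
}

# Report the current soft limit, then the limit seen inside the subshell.
echo "current soft stack limit: $(ulimit -S -s)"
echo "inside subshell:          $(stack_limit_in_subshell 1024)"
```

Because limits are inherited by child processes, a benchmark launched from a shell picks up whatever limits that shell had at launch time.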

Performance Governors (Linux):

In-kernel CPU frequency governors are pre-configured power schemes for the CPU. The CPUfreq governors use P-states to change frequencies and lower power consumption. The dynamic governors can switch between CPU frequencies based on CPU utilization, allowing power savings without sacrificing performance.

Several governors are available, such as the performance governor and the powersave governor. The governor is selected with the following cpupower option:

--governor, -g

The governor defines the power characteristics of the system CPU, which in turn affects CPU performance. Each governor has its own unique behavior, purpose, and suitability in terms of workload.
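
To see which governor is currently active, one option is to read the per-CPU cpufreq files in sysfs. The sketch below assumes the system exposes /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor. The function name and its optional directory argument are hypothetical and exist only to make the example easy to try.

```shell
# Print each CPU's current governor as "cpuN: governor".
# $1 (optional, hypothetical): directory to scan instead of the real sysfs path.
show_governors() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    for f in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue          # skip CPUs without a cpufreq entry
        cpu="${f#"$cpu_root"/}"          # drop the leading directory
        echo "${cpu%%/*}: $(cat "$f")"   # keep only the "cpuN" component
    done
}

show_governors
```

If nothing is printed, the kernel is not exposing a cpufreq driver for these CPUs.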

On many Linux systems one can set the governor for all CPUs through the cpupower utility with the following command:

"cpupower frequency-set -g performance"

Tuning Kernel parameters:

The following Linux Kernel parameters were tuned to better optimize performance of some areas of the system:

dirty_background_ratio: Set through "echo 40 > /proc/sys/vm/dirty_background_ratio". This setting can help Linux disk caching and performance by setting the percentage of system memory that can be filled with dirty pages before the kernel begins writing them to disk in the background.
dirty_ratio: Set through "echo 40 > /proc/sys/vm/dirty_ratio". This setting is the absolute maximum amount of system memory that can be filled with dirty pages before everything must get committed to disk.
swappiness: The swappiness value can range from 0 to 100 (recent kernels accept values up to 200). A value of 100 will cause the kernel to swap out inactive processes frequently in favor of file system performance, resulting in large disk cache sizes. A value of 1 tells the kernel to only swap processes to disk if absolutely necessary. This can be set through a command like "echo 1 > /proc/sys/vm/swappiness".
ksm/sleep_millisecs: Set through "echo 200 > /sys/kernel/mm/ksm/sleep_millisecs". This setting controls how many milliseconds the ksmd (KSM daemon) should sleep before the next scan.
khugepaged/scan_sleep_millisecs: Set through "echo 50000 > /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs". This setting controls how many milliseconds khugepaged waits after a hugepage allocation failure before the next allocation attempt.
numa_balancing: Disabled through "echo 0 > /proc/sys/kernel/numa_balancing". This feature will automatically migrate data on demand so memory nodes are aligned to the local CPU that is accessing data. Depending on the workload involved, enabling this can boost the performance if the workload performs well on NUMA hardware. If the workload is statically set to balance between nodes, then this service may not provide a benefit.
Zone Reclaim Mode: Zone reclaim allows the reclaiming of pages from a zone if the number of free pages falls below a watermark even if other zones still have enough pages available. Reclaiming a page can be more beneficial than taking the performance penalties that are associated with allocating a page on a remote zone, especially for NUMA machines. To tell the kernel to free local node memory rather than grabbing free memory from remote nodes, use a command like "echo 1 > /proc/sys/vm/zone_reclaim_mode".
file system page cache: The command "echo 3 > /proc/sys/vm/drop_caches" is used to free slab objects and pagecache.
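
Before or after applying the settings above, it can be useful to record the current values. The following is a read-only sketch. The function name and its optional root argument are hypothetical, and the list of tunables is simply the subset of the /proc/sys entries named above.

```shell
# Print the current value of several tunables described above.
# $1 (optional, hypothetical): root to read from instead of /proc/sys.
show_vm_tunables() {
    proc_sys="${1:-/proc/sys}"
    for name in vm/dirty_background_ratio vm/dirty_ratio vm/swappiness \
                vm/zone_reclaim_mode kernel/numa_balancing; do
        if [ -r "$proc_sys/$name" ]; then
            echo "$name = $(cat "$proc_sys/$name")"
        else
            echo "$name = (not available)"   # e.g. kernel built without NUMA
        fi
    done
}

show_vm_tunables
```

Reading these files does not require root. Writing to them, as in the echo commands above, normally does.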

kernel.randomize_va_space (ASLR)

This setting can be used to select the type of process address space randomization. Defaults differ based on whether the architecture supports ASLR, whether the kernel was built with the CONFIG_COMPAT_BRK option or not, or the kernel boot options used.

Possible settings:

0: Turn process address space randomization off.
1: Randomize addresses of mmap base, stack, and VDSO pages.
2: Additionally randomize the heap. (This is probably the default.)

Disabling ASLR can make process execution more deterministic and runtimes more consistent.

For more information, see the randomize_va_space entry in the Linux sysctl documentation: https://www.kernel.org/doc/Documentation/sysctl/kernel.txt
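
The three values listed above can be decoded with a small helper. This is only a sketch: the function name is hypothetical, and the descriptions are paraphrased from the list of possible settings in this section.

```shell
# Translate a randomize_va_space value into a short description.
describe_aslr() {
    case "$1" in
        0) echo "randomization off" ;;
        1) echo "mmap base, stack, and VDSO randomized" ;;
        2) echo "mmap base, stack, VDSO, and heap randomized" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Report the running system's setting when the file is readable.
f=/proc/sys/kernel/randomize_va_space
if [ -r "$f" ]; then
    describe_aslr "$(cat "$f")"
fi
```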

Transparent Hugepages (THP)

THP is an abstraction layer that automates most aspects of creating, managing, and using huge pages. It is designed to hide much of the complexity in using huge pages from system administrators and developers. Huge pages increase the memory page size from 4 kilobytes to 2 megabytes. This can provide significant performance advantages on systems with highly contended resources and large memory workloads. If memory utilization is too high, or memory is so fragmented that huge pages cannot be allocated, the kernel will assign smaller 4 KB pages instead. Most recent Linux OS releases have THP enabled by default.

THP usage is controlled by the sysfs setting /sys/kernel/mm/transparent_hugepage/enabled.

Possible values:

never: entirely disable THP usage.
madvise: enable THP usage only inside regions marked MADV_HUGEPAGE using madvise(2).
always: enable THP usage system-wide. This is the default.

THP creation is controlled by the sysfs setting /sys/kernel/mm/transparent_hugepage/defrag.

Possible values:

never: if no THP are available to satisfy a request, do not attempt to make any.
defer: an allocation requesting THP when none are available gets normal pages while requesting THP creation in the background.
defer+madvise: acts like "always", but only for allocations in regions marked MADV_HUGEPAGE using madvise(2); for all other regions it's like "defer".
madvise: acts like "always", but only for allocations in regions marked MADV_HUGEPAGE using madvise(2). This is the default.
always: an allocation requesting THP when none are available will stall until some are made.

An application that "always" requests THP often can benefit from waiting for an allocation until those huge pages can be assembled.

For more information, see the Linux transparent hugepage documentation: https://www.kernel.org/doc/Documentation/vm/transhuge.txt
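
When read, both sysfs files list every possible value and mark the active one with square brackets, for example "always [madvise] never". A minimal sketch for extracting the active value follows. The function name is hypothetical.

```shell
# Print the bracketed (active) value from a THP sysfs file.
# $1: path to a file whose content looks like "always [madvise] never".
thp_active() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$1"
}

# Report the active value of both settings when the files are readable.
for knob in enabled defrag; do
    f=/sys/kernel/mm/transparent_hugepage/$knob
    if [ -r "$f" ]; then
        echo "$knob: $(thp_active "$f")"
    fi
done
```

A value is changed by writing one of the listed words to the file as root, in the same way as the echo commands used for the kernel parameters above.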

Firmware / BIOS / Microcode Settings

Determinism Control:

This BIOS option allows the user to choose AGESA determinism control. Available settings are:

Manual: User can set customized determinism.
Auto (Default setting): Use the fused determinism.

Determinism Slider:

Selects the determinism mode for the CPU:

Auto (Default setting): Use default performance determinism settings.
Power: Maximizes performance within the power limits defined by cTDP and PPT.
Performance: Provides predictable performance across all processors of the same type.

cTDP Control (Configurable TDP):

TDP is an acronym for “Thermal Design Power.” TDP is the recommended target for power used when designing the cooling capacity for a server. EPYC processors are able to control this target power consumption within certain limits. This capability is referred to as “configurable TDP” or "cTDP." cTDP can be used to reduce power consumption for greater efficiency, or in some cases, increase power consumption above the default value to provide additional performance. cTDP is controlled using a BIOS option.

The default EPYC cTDP value corresponds with the microprocessor’s nominal TDP. For the EPYC 7601, the default value is 180W. The default cTDP value is set at a good balance between performance and energy efficiency. The EPYC 7601 cTDP can be reduced as low as 165W, which will minimize the power consumption for the processor under load, but at the expense of peak performance. Increasing the EPYC 7601 cTDP to 200W will maximize peak performance by allowing the CPU to maintain higher dynamic clock speeds, but will make the microprocessor less energy efficient. Note that at maximum cTDP, the CPU thermal solution must be capable of dissipating at least 200W or the EPYC 7601 processor might engage in thermal throttling under load.

The available cTDP ranges for each EPYC model are in the table below:

Model	Nominal TDP (W)	Minimum cTDP (W)	Maximum cTDP (W)
EPYC 9654	360	320	400
EPYC 9654P	360	320	400
EPYC 9754	360	320	400

Package Power Limit (PPT) Control:

Specifies the maximum power that each CPU package may consume in the system. The actual power limit is the maximum of the Package Power Limit and cTDP. Available settings are:

Auto (Default setting): Use the fused processor PPT value.
Manual: Lets the user specify a customized processor PPT value.

NUMA nodes per socket (NPS):

Non-Uniform Memory Architecture (NUMA) enables the CPU cores to access memory via NUMA domains / nodes. Users can specify the number of desired NUMA nodes per populated socket in the system:

NPS0: Attempts to interleave the two CPU sockets together.
NPS1: Each physical processor is a NUMA node, and memory accesses are interleaved across all memory channels directly connected to the physical processor.
NPS2: Each physical processor is two NUMA nodes, and memory accesses are interleaved across 4 memory channels.
NPS4: Each physical processor is four NUMA nodes, and memory accesses are interleaved across 2 memory channels.
Auto (Default setting): BIOS will use default NPS setting NPS1.
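
One way to check the effect of this setting from Linux is to count the node directories the kernel exposes in sysfs. The sketch below assumes the standard /sys/devices/system/node layout. The function name and its optional directory argument are hypothetical.

```shell
# Count the NUMA nodes visible to the OS.
# $1 (optional, hypothetical): directory to scan instead of the real sysfs path.
count_numa_nodes() {
    node_root="${1:-/sys/devices/system/node}"
    n=0
    for d in "$node_root"/node[0-9]*; do
        if [ -d "$d" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

echo "NUMA nodes visible to the OS: $(count_numa_nodes)"
```

With NPS1, NPS2, or NPS4, the count would normally be expected to equal the number of populated sockets multiplied by the NPS value, provided no other option that adds NUMA domains is enabled. Treat that as an expectation to verify rather than a guarantee.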

ACPI SRAT L3 Cache as NUMA Domain:

Enable the option to report each L3 cache as a NUMA domain to BIOS ACPI System Resource Affinity Table (SRAT):

Disable: Do not report each L3 cache as a NUMA domain to the OS.
Enable: Report each L3 cache as a NUMA domain to the OS.
Auto (Default setting): BIOS will use default setting.

SMT Control:

Can be used to disable simultaneous multithreading (SMT). To re-enable SMT, a POWER CYCLE is needed after selecting the 'Auto' option. WARNING - S3 is NOT SUPPORTED on systems where SMT is disabled.

Disable: Single hardware thread per core.
Auto (Default setting): Two hardware threads per core.
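
Whether SMT is active can be checked from Linux on kernels that provide the /sys/devices/system/cpu/smt/active file. The following is a sketch under that assumption. The function name and its optional path argument are hypothetical.

```shell
# Report whether SMT is currently active according to sysfs.
# $1 (optional, hypothetical): file to read instead of the real sysfs path.
smt_state() {
    f="${1:-/sys/devices/system/cpu/smt/active}"
    if [ ! -r "$f" ]; then
        echo "unknown"        # older kernel, or file not exposed
        return 0
    fi
    case "$(cat "$f")" in
        1) echo "enabled" ;;
        0) echo "disabled" ;;
        *) echo "unknown" ;;
    esac
}

echo "SMT: $(smt_state)"
```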

L1 Stream HW Prefetcher:

Uses the history of L1 cache memory access patterns to fetch additional sequential lines in ascending or descending order:

Auto: Default BIOS settings for general purpose.
Enable: Enable L1 Stream HW Prefetcher.
Disable: Disable L1 Stream HW Prefetcher.

L2 Stream HW Prefetcher:

Uses the history of L2 cache memory access patterns to fetch additional sequential lines in ascending or descending order:

Auto: Default BIOS settings for general purpose.
Enable: Enable L2 Stream HW Prefetcher.
Disable: Disable L2 Stream HW Prefetcher.

Memory interleaving:

This setting allows interleaved memory accesses across multiple memory channels in each socket, providing higher memory bandwidth.

0 = Disable.
1 = Auto (Default setting).

IOMMU:

Controls the I/O Memory Management Unit (IOMMU), which extends the AMD64 system architecture by adding support for address translation and system memory access protection on DMA transfers from peripheral devices.

0 = Disable.
1 = Enable (Default setting).

APBDIS:

APBDIS disables IO Boost on the uncore. It is provided for users who need to block uncore optimizations that affect the base core clock speed. Enabling APBDIS locks the fabric clock to the non-boosted speeds. Available settings are:

0 = not APBDIS (mission mode).
1 = Enable APBDIS.

Fixed SOC Pstate:

Specifies a fixed SOC P-state. This option is available only if APBDIS is enabled. Available settings are:

P0: Highest-performing SOC P-state.
P1: Next-highest-performing SOC P-state.
P2: Minimum Infinity Fabric P-State.
Auto (Default setting): Dynamic.

ACPI CST C2 Latency:

Enter in microseconds (decimal value). Larger C2 latency values will reduce the number of C2 transitions and reduce C2 residency. Fewer transitions can help when performance is sensitive to the latency of C2 entry and exit. Higher residency can improve performance by allowing higher frequency boost and reducing idle core power.

Default: 800

Last updated Aug. 15, 2023.