SPEC CPU2017 Flag Description for Quanta Computer Inc.
OS Tuning
ulimit:
Used to set user limits of system-wide resources. Provides control over resources available to the shell and processes started by it. Some common ulimit commands may include:
- ulimit -s [n | unlimited]: Set the stack size to n kbytes, or unlimited to allow the stack size to grow without limit.
- ulimit -l (number): Set the maximum size that can be locked into memory.
Disabling Linux services:
Certain Linux services may be disabled to minimize tasks that may consume CPU cycles.
irqbalance:
Disabled through "service irqbalance stop". Depending on the workload involved, the irqbalance service reassigns various IRQ's to system CPUs. Though this service might help in some situations, disabling it can also help environments which need to minimize or eliminate latency to more quickly respond to events.
Performance Governors (Linux):
In-kernel CPU frequency governors are pre-configured power schemes for the CPU. The CPUfreq governors use P-states to change frequencies and lower power consumption. The dynamic governors can switch between CPU frequencies, based on CPU utilization to allow for power savings while not sacrificing performance.
Other options beside a generic performance governor can be set, such as the perf-bias:
--perf-bias, -b
On supported Intel processors, this option sets a register which allows the cpupower utility (or other software/firmware) to set a policy that controls the relative importance of performance versus energy savings to the processor. The range of valid numbers is 0-15, where 0 is maximum performance and 15 is maximum energy efficiency.
The processor uses this information in model-specific ways when it must select trade-offs between performance and energy efficiency. This policy hint does not supersede Processor Performance states (P-states) or CPU Idle power states (C-states), but allows software to have influence where it would otherwise be unable to express a preference.
On many Linux systems one can set the perf-bias for all CPUs through the cpupower utility with one of the following commands:
- "cpupower -c all set -b 0"
- "cpupower -c all set --perf-bias 0"
- "cpupower set -b 0"
Tuning Kernel parameters:
The following Linux Kernel parameters were tuned to better optimize performance of some areas of the system:
- dirty_background_ratio: Set through "echo 40 > /proc/sys/vm/dirty_background_ratio". This setting can help Linux disk caching and performance by setting the percentage of system memory that can be filled with dirty pages.
- dirty_ratio: Set through "echo 40 > /proc/sys/vm/dirty_ratio". This setting is the absolute maximum amount of system memory that can be filled with dirty pages before everything must get committed to disk.
- swappiness: The swappiness value can range from 1 to 100. A value of 100 will cause the kernel to swap out inactive processes frequently in favor of file system performance, resulting in large disk cache sizes. A value of 1 tells the kernel to only swap processes to disk if absolutely necessary. This can be set through a command like "echo 1 > /proc/sys/vm/swappiness"
- ksm/sleep_millisecs: Set through "echo 200 > /sys/kernel/mm/ksm/sleep_millisecs". This setting controls how many milliseconds the ksmd (KSM daeomn) should sleep before the next scan.
- khugepaged/scan_sleep_millisecs: Set through "echo 50000 > /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs". This setting controls how many milliseconds to wait in khugepaged is there is a hugepage allocation failure to throttle the next allocation attempt.
- numa_balancing: Disabled through "echo 0 > /proc/sys/kernel/numa_balancing". This feature will automatically migrate data on demand so memory nodes are aligned to the local CPU that is accessing data. Depending on the workload involved, enabling this can boost the performance if the workload performs well on NUMA hardware. If the workload is statically set to balance between nodes, then this service may not provide a benefit.
- Zone Reclaim Mode: Zone reclaim allows the reclaiming of pages from a zone if the number of free pages falls below a watermark even if other zones still have enough pages available. Reclaiming a page can be more beneficial than taking the performance penalties that are associated with allocating a page on a remote zone, especially for NUMA machines. To tell the kernel to free local node memory rather than grabbing free memory from remote nodes, use a command like "echo 1 > /proc/sys/vm/zone_reclaim_mode"
- max_map_count-n: The maximum number of memory map areas a process may have. Memory map areas are used as a side-effect of calling malloc, directly by mmap and mprotect, and also when loading shared libraries.
tuned-adm:
The tuned-adm tool is a commandline interface for switching between different tuning profiles available to the tuned tuning daeomn available in supported Linux distros. The default configuration file is located in /etc/tuned.conf and the supported profiles can be found in /etc/tune-profiles.
Some profiles that may be available by default include: default, desktop-powersave, server-powersave, laptop-ac-powersave, laptop-battery-powersave, spindown-disk, throughput-performance, latency-performance, enterprise-storage
To set a profile, one can issue the command "tuned-adm profile (profile_name)". Here are details about relevant profiles.
- throughput-performance: Server profile for typical throughput tuning. This profile disables tuned and ktune power saving features, enables sysctl settings that may improve disk and network IO throughphut performance, switches to the deadline scheduler, and sets the CPU governor to performance.
- latency-performance: Server profile for typical latency tuning. This profile disables tuned and ktune power saving features, enables the deadline IO scheduler, and sets the CPU governor to performance.
- enterprise-storage: Server profile to high disk throughput tuning. This profile disables tuned and ktune power saving features, enables the deadline IO scheduler, enables hugepages and disables disk barriers, increases disk readahead values, and sets the CPU governor to performance
Transparent Huge Pages (THP):
THP is an abstraction layer that automates most aspects of creating, managing, and using huge pages. THP is designed to hide much of the complexity in using huge pages from system administrators and developers, as normal huge pages must be assigned at boot time, can be difficult to manage manually, and often require significant changes to code in order to be used effectively. Transparent Hugepages increase the memory page size from 4 kilobytes to 2 megabytes. Transparent Hugepages provide significant performance advantages on systems with highly contended resources and large memory workloads. If memory utilization is too high or memory is badly fragmented which prevents hugepages being allocated, the kernel will assign smaller 4k pages instead. Most recent Linux OS releases have THP enabled by default.
Linux Huge Page settings:
If you need finer control and manually set the Huge Pages you can follow the below steps:
- Create a mount point for the huge pages: "mkdir /mnt/hugepages"
- The huge page file system needs to be mounted when the systems reboots. Add the following to a system boot configuration file before any services are started: "mount -t hugetlbfs nodev /mnt/hugepages"
- Set vm/nr_hugepages=N in your /etc/sysctl.conf file where N is the maximum number of pages the system may allocate.
- Reboot to have the changes take effect.
Note that further information about huge pages may be found in your Linux documentation file: /usr/src/linux/Documentation/vm/hugetlbpage.txt
- Determinism Control:
- This BIOS option allows user to choose AGESA determinism control. Available settings are:
-
- Manual: User can set customized determinism.
- Auto (Default setting): Use the fused determinism.
- Determinism Slider:
- Selects the determinism mode for the CPU:
-
- Auto : Use default performance determinism settings.
- Power: Maximizes performance within the power limits defined by cTDP and PPT.
- Performance: Provides predictable performance across all processors of the same type.
- cTDP Control(Configurable TDP):
- TDP is an acronym for “Thermal Design Power.” TDP is the recommended target for power used when designing the cooling capacity for a server. Configures the maximum power that the CPU will consume, up to the platform power limit (PPT). Valid values vary by CPU model. If value outside the valid range is set, the CPU will automatically adjust the value so that it does fall within the valid range. When increasing cTDP, additional power will only be consumed up to the Package Power Limit (PPT), which may be less than the cTDP setting.
-
Model |
Nominal TDP |
Minimum cTDP |
Maximum cTDP** |
EPYC 7742 |
225 |
225 |
240 |
EPYC 7702 |
200 |
165 |
200 |
- Package Power Limit (PPT) Control:
- Specifies the maximum power that each CPU package may consume in the system. The actual power limit is the maximum of the Package Power Limit and cTDP. Available settings are:
-
- Auto (Default setting): Use the fused processor PPT value.
- Manual: Let user specifies customized processor PPT value.
- NUMA nodes per socket (NPS):
- Non-Uniform Memory Architecture (NUMA) enables the CPU cores to access memory via NUMA domains / nodes. Users can specify the number of desired NUMA nodes per populated socket in the system:
-
- NPS0: Zero will attempt to interleave two CPU socket together.
- NPS1: Each physical processor is a NUMA node, and memory accesses are interleaved across all memory channels directly connected to the physical processor.
- NPS2: Each physical processor is two NUMA nodes, and memory accesses are interleaved across 4 memory channels.
- NPS4: Each physical processor is four NUMA nodes, and memory accesses are interleaved across 2 memory channels.
- Auto: BIOS will use default NPS setting NPS1.
- SMT Control:
- Can be used to disable symmetric multithreading. To re-enable SMT, a POWER CYCLE is needed after selecting the 'Auto' option. WARNING - S3 is NOT SUPPORTED on systems where SMT is disabled.
- IOMMU:
- Enable: Enables the I/O Memory Management Unit (IOMMU), which extends the AMD64 system architecture by adding support for address translation and system memory access protection on DMA transfers from peripheral devices.
- APBDIS:
- APBDis is an IO Boost disable on uncore. For any system user that needs to block these uncore optimizations that are impacting base core clock speed, we are exposing a method to disable this behavior called APBDis. This locks the fabric clock to the non-boosted speeds. Available settings are:
-
- 0 = not APEDIS (mission mode).
- 1 = Enable APBDIS.
Last updated Oct. 29, 2019.