SPEC CPU2017 Platform Settings for HPE XD225v AMD-based systems
Operating System (OS) Application/Service Tuning:
The following OS tunes may have been applied to better optimize performance of some areas of the system:
- ulimit: Used to set user limits of system-wide resources. Provides control over resources available to the shell and processes started by it. Some common ulimit commands may include:
- ulimit -s [n | unlimited]: Set the stack size to n kbytes, or unlimited to allow the stack size to grow without limit.
- ulimit -l (number): Set the maximum size that can be locked into memory.
- Performance/Scaling Governors (Linux): In-kernel CPU frequency governors are pre-configured power schemes for the CPU. The CPUfreq governors use P-states to change frequencies and lower power consumption. The dynamic governors can switch between CPU frequencies based on CPU utilization, allowing power savings without sacrificing performance. To set the governor, use the following command: "cpupower frequency-set -r -g {desired_governor}". CPUfreq provides the following generic scaling governors, located in a subdirectory (/sys/devices/system/cpu/cpufreq/): performance, powersave, ondemand, conservative, etc.
- Disabling Linux services: Certain Linux services may be disabled to minimize tasks that may consume CPU cycles.
- irqbalance: Disabled through "service irqbalance stop". Depending on the workload involved, the irqbalance service reassigns various IRQs to system CPUs. Though this service might help in some situations, disabling it can also help environments which need to minimize or eliminate latency to more quickly respond to events.
- tuned-adm: The tuned-adm tool is a command-line interface for switching between the tuning profiles provided by the tuned tuning daemon in supported Linux distros. The default configuration file is located in /etc/tuned.conf and the supported profiles can be found in /etc/tune-profiles. Some profiles that may be available by default include: default, desktop-powersave, server-powersave, laptop-ac-powersave, laptop-battery-powersave, spindown-disk, throughput-performance, latency-performance, enterprise-storage. To set a profile, one can issue the command "tuned-adm profile (profile_name)". Here are details about relevant profiles:
- throughput-performance: Server profile for typical throughput tuning. This profile disables tuned and ktune power saving features, enables sysctl settings that may improve disk and network IO throughput performance, switches to the deadline scheduler, and sets the CPU governor to performance.
- balanced: Provides a balance between performance and power consumption. The profile uses auto-scaling and auto-tuning when possible. A possible drawback is increased latency.
- latency-performance: Server profile for typical latency tuning. This profile disables tuned and ktune power saving features, enables the deadline IO scheduler, and sets the CPU governor to performance.
- enterprise-storage: Server profile for high disk throughput tuning. This profile disables tuned and ktune power saving features, enables the deadline IO scheduler, enables hugepages and disables disk barriers, increases disk readahead values, and sets the CPU governor to performance.
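The commands above that change settings generally need root privileges. As a minimal, read-only sketch of how the resulting state could be checked, the function below prints the soft stack and locked-memory limits, the governor of cpu0, and the active tuned profile. The function name report_os_tuning and its optional directory argument are illustrative assumptions, not part of any tool named above; the argument exists only so the function can be pointed at a scratch directory.

```shell
#!/bin/sh
# Read-only report of the OS tuning state described above.
# report_os_tuning is a hypothetical helper name; its optional argument
# overrides the CPU sysfs directory (default: the real sysfs path).
report_os_tuning() {
  cpu_dir="${1:-/sys/devices/system/cpu}"

  # Soft limits of the current shell, in kbytes or "unlimited".
  printf 'stack_limit=%s\n' "$(ulimit -S -s)"
  printf 'memlock_limit=%s\n' "$(ulimit -S -l)"

  # Governor for cpu0; other CPUs have a matching cpuN/cpufreq directory.
  gov_file="$cpu_dir/cpu0/cpufreq/scaling_governor"
  if [ -r "$gov_file" ]; then
    printf 'governor=%s\n' "$(cat "$gov_file")"
  else
    printf 'governor=unavailable\n'
  fi

  # Active tuned profile, if tuned-adm is installed and the daemon runs.
  if command -v tuned-adm >/dev/null 2>&1; then
    tuned-adm active 2>/dev/null || printf 'tuned=not running\n'
  else
    printf 'tuned=not installed\n'
  fi
  return 0
}
```

Calling report_os_tuning with no argument reads the live system. Running it before and after a tuning change gives a quick confirmation that the change took effect.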
OS Kernel Parameter Tuning:
The following Linux Kernel parameters were tuned to better optimize performance of some areas of the system:
- dirty_background_ratio: Set through "echo 40 > /proc/sys/vm/dirty_background_ratio". This setting can help Linux disk caching and performance by setting the percentage of system memory that can be filled with dirty pages before the kernel starts writing them to disk in the background.
- dirty_ratio: Set through "echo 8 > /proc/sys/vm/dirty_ratio". This setting is the absolute maximum amount of system memory that can be filled with dirty pages before everything must get committed to disk.
- ksm/sleep_millisecs: Set through "echo 200 > /sys/kernel/mm/ksm/sleep_millisecs". This setting controls how many milliseconds the ksmd (KSM daemon) should sleep before the next scan.
- khugepaged/scan_sleep_millisecs: Set through "echo 50000 > /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs". This setting controls how many milliseconds khugepaged waits between scan passes.
- swappiness: The swappiness value can range from 0 to 100 (kernels 5.8 and later accept values up to 200). A value of 100 will cause the kernel to swap out inactive processes frequently in favor of file system performance, resulting in large disk cache sizes. A value of 1 tells the kernel to only swap processes to disk if absolutely necessary. This can be set through a command like "echo 1 > /proc/sys/vm/swappiness".
- numa_balancing: Disabled through "echo 0 > /proc/sys/kernel/numa_balancing". This feature will automatically migrate data on demand so memory nodes are aligned to the local CPU that is accessing data. Depending on the workload involved, enabling this can boost the performance if the workload performs well on NUMA hardware. If the workload is statically set to balance between nodes, then this service may not provide a benefit.
- Zone Reclaim Mode: Zone reclaim allows the reclaiming of pages from a zone if the number of free pages falls below a watermark even if other zones still have enough pages available. Reclaiming a page can be more beneficial than taking the performance penalties that are associated with allocating a page on a remote zone, especially for NUMA machines. To tell the kernel to free local node memory rather than grabbing free memory from remote nodes, use a command like "echo 1 > /proc/sys/vm/zone_reclaim_mode"
- Free the file system page cache: The command "echo 1 > /proc/sys/vm/drop_caches" is used to free up the filesystem page cache.
- kernel/randomize_va_space, also known as ASLR (Address Space Layout Randomization): This setting can be used to select the type of process address space randomization. Defaults differ based on whether the architecture supports ASLR, whether the kernel was built with the CONFIG_COMPAT_BRK option or not, or the kernel boot options used. Possible settings:
- 0: Turn process address space randomization off.
- 1: Randomize addresses of mmap base, stack, and VDSO pages.
- 2: Additionally randomize the heap. (This is the default when the kernel is built without the CONFIG_COMPAT_BRK option.)
- Disabling ASLR can make process execution more deterministic and runtimes more consistent. For more information see the randomize_va_space entry in the Linux sysctl documentation.
- Transparent Hugepages (THP): THP is an abstraction layer that automates most aspects of creating, managing, and using huge pages. It is designed to hide much of the complexity in using huge pages from system administrators and developers. Huge pages increase the memory page size from 4 kilobytes to 2 megabytes. This provides significant performance advantages on systems with highly contended resources and large memory workloads. If memory utilization is too high or memory is badly fragmented which prevents hugepages being allocated, the kernel will assign smaller 4k pages instead. Most recent Linux OS releases have THP enabled by default. THP usage is controlled by the sysfs setting /sys/kernel/mm/transparent_hugepage/enabled. Possible values:
- never: entirely disable THP usage.
- madvise: enable THP usage only inside regions marked MADV_HUGEPAGE using madvise(2).
- always: enable THP usage system-wide. This is the default.
- THP creation is controlled by the sysfs setting /sys/kernel/mm/transparent_hugepage/defrag. Possible values:
- never: if no THP are available to satisfy a request, do not attempt to make any.
- defer: an allocation requesting THP when none are available gets normal pages while requesting THP creation in the background.
- defer+madvise: acts like "always", but only for allocations in regions marked MADV_HUGEPAGE using madvise(2); for all other regions it's like "defer".
- madvise: acts like "always", but only for allocations in regions marked MADV_HUGEPAGE using madvise(2). This is the default.
- always: an allocation requesting THP when none are available will stall until some are made.
- An application that "always" requests THP often can benefit from waiting for an allocation until those huge pages can be assembled. For more information see the Linux transparent hugepage documentation.
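The settings above are written with root privileges, but they can be read back by any user. The following is a minimal sketch, under the assumption that the files live at the paths quoted above. It prints the current value of each tunable that exists and extracts the active choice from the THP selector files, whose content has the form "always [madvise] never" with the active value in brackets. The function names thp_selected and report_kernel_tuning, and the optional directory arguments, are hypothetical helpers introduced for illustration.

```shell
#!/bin/sh
# Print the bracketed (active) choice from a sysfs selector file such as
# /sys/kernel/mm/transparent_hugepage/enabled.
thp_selected() {
  sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Read-only report of the kernel tunables discussed above.
# Optional arguments override the /proc/sys and /sys/kernel/mm roots.
report_kernel_tuning() {
  proc_sys="${1:-/proc/sys}"
  sys_mm="${2:-/sys/kernel/mm}"

  for key in vm/dirty_background_ratio vm/dirty_ratio vm/swappiness \
             vm/zone_reclaim_mode kernel/numa_balancing \
             kernel/randomize_va_space; do
    if [ -r "$proc_sys/$key" ]; then
      printf '%s=%s\n' "$key" "$(cat "$proc_sys/$key")"
    fi
  done

  for knob in enabled defrag; do
    f="$sys_mm/transparent_hugepage/$knob"
    if [ -r "$f" ]; then
      printf 'thp_%s=%s\n' "$knob" "$(thp_selected "$f")"
    fi
  done
  return 0
}
```

Tunables that are absent on a given kernel are skipped silently, so the output lists only what the running system exposes.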
Linux Huge Page settings:
If one prefers not to use Transparent Hugepages, one can set up Huge Pages by following the steps below:
- Create a mount point for the huge pages: "mkdir /mnt/hugepages"
- The huge page file system needs to be mounted when the system reboots. Add the following to a system boot configuration file before any services are started: "mount -t hugetlbfs nodev /mnt/hugepages"
- Set vm/nr_hugepages=N in your /etc/sysctl.conf file where N is the maximum number of pages the system may allocate.
- Reboot to have the changes take effect.
Note that further information about huge pages may be found in your Linux documentation file: /usr/src/linux/Documentation/vm/hugetlbpage.txt
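Choosing N for vm/nr_hugepages is a small arithmetic step: divide the amount of memory to reserve by the huge page size and round up. The sketch below illustrates that calculation and reads back the reservation from /proc/meminfo after a reboot. The function names are hypothetical, and the 2048 kB default page size is an assumption that matches the 2 megabyte huge pages mentioned earlier; other page sizes can be passed explicitly.

```shell
#!/bin/sh
# Number of huge pages (the N in vm/nr_hugepages=N) needed to cover a
# target size in MiB, rounded up, for a given huge page size in kB.
pages_for_mib() {
  target_mib="$1"
  page_kb="${2:-2048}"
  printf '%d\n' $(( (target_mib * 1024 + page_kb - 1) / page_kb ))
}

# Total memory reserved for huge pages, in kB, computed as
# HugePages_Total * Hugepagesize from a meminfo-format file.
hugepage_reserved_kb() {
  awk '/^HugePages_Total:/ { n = $2 }
       /^Hugepagesize:/    { s = $2 }
       END { printf "%d\n", n * s }' "${1:-/proc/meminfo}"
}
```

For example, "pages_for_mib 1024" gives the page count for a 1 GiB reservation with 2 MB pages, and "hugepage_reserved_kb" with no argument reports what the running kernel actually reserved.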
Environment Variables:
The following Linux environment variables may have been tuned to better optimize performance of some areas of the system:
- GOMP_CPU_AFFINITY: Used to bind threads to specific CPUs. The variable should contain a space-separated or comma-separated list of CPUs. This list may contain different kinds of entries: either single CPU numbers in any order, a range of CPUs (M-N) or a range with some stride (M-N:S). CPU numbers are zero based. For example, GOMP_CPU_AFFINITY="0 3 1-2 4-15:2" will bind the initial thread to CPU 0, the second to CPU 3, the third to CPU 1, the fourth to CPU 2, the fifth to CPU 4, the sixth through tenth to CPUs 6, 8, 10, 12, and 14 respectively and then start assigning back from the beginning of the list. GOMP_CPU_AFFINITY=0 binds all threads to CPU 0. There is no libgomp library routine to determine whether a CPU affinity specification is in effect. As a workaround, language-specific library functions, e.g., getenv in C or GET_ENVIRONMENT_VARIABLE in Fortran, may be used to query the setting of the GOMP_CPU_AFFINITY environment variable. A defined CPU affinity on startup cannot be changed or disabled during the runtime of the application. If both GOMP_CPU_AFFINITY and OMP_PROC_BIND are set, OMP_PROC_BIND has a higher precedence. If neither has been set and OMP_PROC_BIND is unset, or when OMP_PROC_BIND is set to FALSE, the host system will handle the assignment of threads to CPUs.
- OMP_DYNAMIC: Dynamic adjustment of threads. Enable or disable the dynamic adjustment of the number of threads within a team. The value of this environment variable shall be TRUE or FALSE. If undefined, dynamic adjustment is disabled by default.
- OMP_SCHEDULE: How threads are scheduled. Allows specifying the schedule type and chunk size. The value of the variable shall have the form type[,chunk], where type is one of static, dynamic or guided. The optional chunk size shall be a positive integer. If undefined, dynamic scheduling and a chunk size of 1 is used.
- OMP_THREAD_LIMIT: Set the maximum number of threads. Specifies the number of threads to use for the whole program. The value of this variable shall be a positive integer. If undefined, the number of threads is not limited.
- MALLOC_CONF: This environment variable affects the execution of the allocation functions. If the environment variable MALLOC_CONF is set, the characters it contains will be interpreted as options.
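The GOMP_CPU_AFFINITY list syntax is easier to check when expanded into the order in which threads are bound. The function below is a minimal sketch of that expansion for the three entry kinds described above (single CPU, range M-N, and strided range M-N:S). The name expand_affinity is a hypothetical helper, not part of libgomp, and the sketch assumes well-formed input with a positive stride.

```shell
#!/bin/sh
# Expand a GOMP_CPU_AFFINITY-style list into the CPU binding order.
# Illustrative helper only; libgomp does its own parsing at startup.
expand_affinity() {
  out=""
  # Entries may be separated by spaces or commas.
  for entry in $(printf '%s' "$1" | tr ',' ' '); do
    case "$entry" in
      *-*:*) range="${entry%%:*}"; stride="${entry##*:}" ;;
      *-*)   range="$entry"; stride=1 ;;
      *)     range="$entry-$entry"; stride=1 ;;
    esac
    # Reject a zero or negative stride, which would never terminate.
    [ "$stride" -gt 0 ] || return 1
    first="${range%%-*}"
    last="${range##*-}"
    cpu="$first"
    while [ "$cpu" -le "$last" ]; do
      out="$out $cpu"
      cpu=$((cpu + stride))
    done
  done
  printf '%s\n' "${out# }"
}

# Variables are typically exported before launching the program, e.g.
# (./app is a placeholder for the benchmark binary):
#   GOMP_CPU_AFFINITY="0 3 1-2 4-15:2" OMP_DYNAMIC=FALSE ./app
```

Running expand_affinity "0 3 1-2 4-15:2" prints "0 3 1 2 4 6 8 10 12 14", which matches the thread-by-thread description in the GOMP_CPU_AFFINITY entry.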
Firmware Settings:
One or more of the following settings may have been set. If so, the "Platform Notes" section of the report will say so; and you can read below to find out more about what these settings mean.
- Determinism Control (Default = Auto): This option allows the user to choose between an Auto and Manual mode for Determinism Control. Values for this BIOS option can be:
- Auto: Use default performance determinism settings.
- Manual: Specify custom power/performance determinism.
- Determinism Enable (Default = Power): This option allows the user to select either Power or Performance Determinism. Values for this BIOS option can be:
- Power: Maximum performance of any individual system by leveraging the capabilities of a given CPU to the maximum, resulting in a varying performance range across the datacenter or larger deployments.
- Performance: Uniform performance across identically configured systems in a datacenter.
- NUMA nodes per socket (Default = Auto): Specifies the number of desired NUMA nodes per socket. This setting enables a trade-off between minimizing local memory latency for NUMA-aware or highly parallelizable workloads vs. maximizing per-core memory bandwidth for non-NUMA-friendly workloads. NPS2 and/or NPS4 may not be an option on certain OPNs or with certain memory populations. Values for this BIOS option can be:
- Auto: Use platform- and OPN-default NUMA nodes per socket.
- NPS0: Attempt to interleave the two sockets together.
- NPS1: Indicates a single NUMA node per socket. This setting configures all memory channels on the processor into a single NUMA domain. All of the processor cores, all attached memory, and all PCIe devices connected to the SoC are in that one NUMA domain. Memory accesses are interleaved across all 12 memory channels into a single address space.
- NPS2: 2 NUMA domains per socket, which interleaves the corresponding six memory channels within the same 6 CCD NUMA domain. Half of the cores and half of the memory channels of the SoC are grouped together into one NUMA domain, with the remaining cores and memory channels grouped into the second NUMA domain. Memory is interleaved across the six memory channels of each NUMA domain.
- NPS4: Partitions the processor into four NUMA domains, with each logical quadrant configured as its own NUMA domain. Memory is interleaved across the three memory channels within each quadrant. PCIe devices will be local to one of the four NUMA domains depending on the quadrant (of the I/O die) that has the PCIe root complex for that device. Every pair of memory channels is interleaved. This is recommended for HPC and other highly-parallel workloads. You must use NPS4 when booting Windows systems with SMT enabled for AMD EPYC processors with more than 64 cores because Windows limits the size of a CPU group to a maximum of 64 logical cores.
- PPT Control (Default = Auto): Enables or disables the PPT (Package Power Tracker) control. Values for this BIOS option can be:
- Auto: Use platform- and OPN-default PPL (Package Power Limit).
- Manual: Set customized PPL.
- PPT (Default = 0): This option is available if the user sets the PPT Control to Manual.
- Values 100-500: Set PPT, in Watts.
- TDP Control (Default = Auto): Enables or disables the TDP (Thermal Design Power) control. Values for this BIOS option can be:
- Auto: Use platform- and OPN-default TDP.
- Manual: Set custom configurable TDP.
- TDP (Default = 0): This option is available if the user sets the TDP Control to Manual.
- Values 100-500: Set configurable TDP, in Watts.
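The effect of the NUMA nodes per socket setting can be cross-checked from a running Linux system by counting the NUMA nodes the kernel exposes under /sys/devices/system/node and dividing by the number of sockets. The sketch below assumes that sysfs layout and takes the socket count as an argument. The function names are hypothetical, NPS0 (one node spanning both sockets) is not handled, and other firmware options that affect NUMA topology may change the count.

```shell
#!/bin/sh
# Count NUMA nodes exposed by the kernel. The optional argument overrides
# the sysfs node directory (default: /sys/devices/system/node).
count_numa_nodes() {
  node_dir="${1:-/sys/devices/system/node}"
  count=0
  for d in "$node_dir"/node[0-9]*; do
    if [ -d "$d" ]; then
      count=$((count + 1))
    fi
  done
  printf '%s\n' "$count"
}

# Derive nodes per socket from a node count and a socket count.
nodes_per_socket() {
  nodes="$1"
  sockets="$2"
  [ "$sockets" -gt 0 ] || return 1
  printf '%s\n' $((nodes / sockets))
}
```

On a two-socket system, for example, nodes_per_socket "$(count_numa_nodes)" 2 would be expected to print 4 when NPS4 is in effect.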
Last updated July 29, 2025.