Operating System (OS) Application/Service Tuning:
The following OS tunes could've been applied to better optimize performance of some areas of the system:
- ulimit: Used to set user limits of system-wide resources. Provides control over resources available to the shell and processes started by it. Some common ulimit commands may include:
- ulimit -s [n | unlimited]: Set the stack size to n kbytes, or unlimited to allow the stack size to grow without limit.
- ulimit -l (number): Set the maximum size that can be locked into memory.
- Performance Governors (Linux): In-kernel CPU frequency governors are pre-configured power schemes for the CPU. The CPUfreq governors use P-states to change frequencies and lower power consumption. The dynamic governors can switch between CPU frequencies, based on CPU utilization to allow for power savings while not sacrificing performance. To set the governor, use the following commmand: "cpupower frequency-set -r -g {desired_governor}"
- Disabling Linux services: Certain Linux services may be disabled to minimize tasks that may consume CPU cycles.
- irqbalance: Disabled through "service irqbalance stop". Depending on the workload involved, the irqbalance service reassigns various IRQ's to system CPUs. Though this service might help in some situations, disabling it can also help environments which need to minimize or eliminate latency to more quickly respond to events.
- tuned-adm: The tuned-adm tool is a commandline interface for switching between different tuning profiles available to the tuned tuning daemon available in supported Linux distros. The default configuration file is located in /etc/tuned.conf and the supported profiles can be found in /etc/tune-profiles. Some profiles that may be available by default include: default, desktop-powersave, server-powersave, laptop-ac-powersave, laptop-battery-powersave, spindown-disk, throughput-performance, latency-performance, enterprise-storage. To set a profile, one can issue the command "tuned-adm profile (profile_name)". Here are details about relevant profiles:
- throughput-performance: Server profile for typical throughput tuning. This profile disables tuned and ktune power saving features, enables sysctl settings that may improve disk and network IO throughput performance, switches to the deadline scheduler, and sets the CPU governor to performance.
- latency-performance: Server profile for typical latency tuning. This profile disables tuned and ktune power saving features, enables the deadline IO scheduler, and sets the CPU governor to performance.
- enterprise-storage: Server profile to high disk throughput tuning. This profile disables tuned and ktune power saving features, enables the deadline IO scheduler, enables hugepages and disables disk barriers, increases disk readahead values, and sets the CPU governor to performance
OS Kernel Parameter Tuning:
The following Linux Kernel parameters were tuned to better optimize performance of some areas of the system:
- dirty_background_ratio: Set through "echo 40 > /proc/sys/vm/dirty_background_ratio". This setting can help Linux disk caching and performance by setting the percentage of system memory that can be filled with dirty pages.
- dirty_ratio: Set through "echo 8 > /proc/sys/vm/dirty_ratio". This setting is the absolute maximum amount of system memory that can be filled with dirty pages before everything must get committed to disk.
- ksm/sleep_millisecs: Set through "echo 200 > /sys/kernel/mm/ksm/sleep_millisecs". This setting controls how many milliseconds the ksmd (KSM daemon) should sleep before the next scan.
- khugepaged/scan_sleep_millisecs: Set through "echo 50000 > /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs". This setting controls how many milliseconds to wait in khugepaged is there is a hugepage allocation failure to throttle the next allocation attempt.
- swappiness: The swappiness value can range from 1 to 100. A value of 100 will cause the kernel to swap out inactive processes frequently in favor of file system performance, resulting in large disk cache sizes. A value of 1 tells the kernel to only swap processes to disk if absolutely necessary. This can be set through a command like "echo 1 > /proc/sys/vm/swappiness"
- numa_balancing: Disabled through "echo 0 > /proc/sys/kernel/numa_balancing". This feature will automatically migrate data on demand so memory nodes are aligned to the local CPU that is accessing data. Depending on the workload involved, enabling this can boost the performance if the workload performs well on NUMA hardware. If the workload is statically set to balance between nodes, then this service may not provide a benefit.
- Zone Reclaim Mode: Zone reclaim allows the reclaiming of pages from a zone if the number of free pages falls below a watermark even if other zones still have enough pages available. Reclaiming a page can be more beneficial than taking the performance penalties that are associated with allocating a page on a remote zone, especially for NUMA machines. To tell the kernel to free local node memory rather than grabbing free memory from remote nodes, use a command like "echo 1 > /proc/sys/vm/zone_reclaim_mode"
- Free the file system page cache: The command "echo 1> /proc/sys/vm/drop_caches" is used to free up the filesystem page cache.
- kernel/randomize_va_space, also known as ASLR (Address Space Layout Randomization): This setting can be used to select the type of process address space randomization. Defaults differ based on whether the architecture supports ASLR, whether the kernel was built with the CONFIG_COMPAT_BRK option or not, or the kernel boot options used. Possible settings:
- 0: Turn process address space randomization off.
- 1: Randomize addresses of mmap base, stack, and VDSO pages.
- 2: Additionally randomize the heap. (This is probably the default.)
- Disabling ASLR can make process execution more deterministic and runtimes more consistent. For more information see the randomize_va_space entry in the Linux sysctl documentation.
- Transparent Hugepages (THP): THP is an abstraction layer that automates most aspects of creating, managing, and using huge pages. It is designed to hide much of the complexity in using huge pages from system administrators and developers. Huge pages increase the memory page size from 4 kilobytes to 2 megabytes. This provides significant performance advantages on systems with highly contended resources and large memory workloads. If memory utilization is too high or memory is badly fragmented which prevents hugepages being allocated, the kernel will assign smaller 4k pages instead. Most recent Linux OS releases have THP enabled by default. THP usage is controlled by the sysfs setting /sys/kernel/mm/transparent_hugepage/enabled. Possible values:
- never: entirely disable THP usage.
- madvise: enable THP usage only inside regions marked MADV_HUGEPAGE using madvise(3).
- always: enable THP usage system-wide. This is the default.
- THP creation is controlled by the sysfs setting /sys/kernel/mm/transparent_hugepage/defrag. Possible values:
- never: if no THP are available to satisfy a request, do not attempt to make any.
- defer: an allocation requesting THP when none are available get normal pages while requesting THP creation in the background.
- defer+madvise: acts like "always", but only for allocations in regions marked MADV_HUGEPAGE using madvise(3); for all other regions it's like "defer".
- madvise: acts like "always", but only for allocations in regions marked MADV_HUGEPAGE using madvise(3). This is the default.
- always: an allocation requesting THP when none are available will stall until some are made.
- An application that "always" requests THP often can benefit from waiting for an allocation until those huge pages can be assembled. For more information see the Linux transparent hugepage documentation.
Linux Huge Page settings:
If one prefers not to use Transparent Hugepages, one can always setup Huge Pages by following the below steps:
- Create a mount point for the huge pages: "mkdir /mnt/hugepages"
- The huge page file system needs to be mounted when the systems reboots. Add the following to a system boot configuration file before any services are started: "mount -t hugetlbfs nodev /mnt/hugepages"
- Set vm/nr_hugepages=N in your /etc/sysctl.conf file where N is the maximum number of pages the system may allocate.
- Reboot to have the changes take effect.
Note that further information about huge pages may be found in your Linux documentation file: /usr/src/linux/Documentation/vm/hugetlbpage.txt
Environment Variables:
The following Linux envrionment variables that could've possibly been tuned to better optimize performance of some areas of the system:
- GOMP_CPU_AFFINITY: Used to bind threads to specific CPUs. The variable should contain a space-separated or comma-separated list of CPUs. This list may contain different kinds of entries: either single CPU numbers in any order, a range of CPUs (M-N) or a range with some stride (M-N:S). CPU numbers are zero based. For example, GOMP_CPU_AFFINITY="0 3 1-2 4-15:2" will bind the initial thread to CPU 0, the second to CPU 3, the third to CPU 1, the fourth to CPU 2, the fifth to CPU 4, the sixth through tenth to CPUs 6, 8, 10, 12, and 14 respectively and then start assigning back from the beginning of the list. GOMP_CPU_AFFINITY=0 binds all threads to CPU 0. There is no libgomp library routine to determine whether a CPU affinity specification is in effect. As a workaround, language-specific library functions, e.g., getenv in C or GET_ENVIRONMENT_VARIABLE in Fortran, may be used to query the setting of the GOMP_CPU_AFFINITY environment variable. A defined CPU affinity on startup cannot be changed or disabled during the runtime of the application. If both GOMP_CPU_AFFINITY and OMP_PROC_BIND are set, OMP_PROC_BIND has a higher precedence. If neither has been set and OMP_PROC_BIND is unset, or when OMP_PROC_BIND is set to FALSE, the host system will handle the assignment of threads to CPUs.
- OMP_DYNAMIC: Dynamic adjustment of threads. Enable or disable the dynamic adjustment of the number of threads within a team. The value of this environment variable shall be TRUE or FALSE. If undefined, dynamic adjustment is disabled by default.
- OMP_SCHEDULE: How threads are scheduled. Allows to specify schedule type and chunk size. The value of the variable shall have the form: type[,chunk] where type is one of static, dynamic or guided. The optional chunk size shall be a positive integer. If undefined, dynamic scheduling and a chunk size of 1 is used.
- OMP_THREAD_LIMIT: Set the maximum number of threads. Specifies the number of threads to use for the whole program. The value of this variable shall be a positive integer. If undefined, the number of threads is not limited.
- MALLOC_CONF: This environment variable affects the execution of the allocation functions. If the environment variable MALLOC_CONF is set, the characters it contains will be interpreted as options.
]]>