SFS 3.0
Standard Performance Evaluation Corporation (SPEC)
6585 Merchant Place, Suite 100
Warrenton, VA 20187, USA
Phone: 540-349-7878
Fax: 540-349-5992
E-Mail: info@spec.org
www.spec.org
Copyright (c) 2001 by Standard Performance Evaluation Corporation
(SPEC)
All rights reserved
SPEC and SFS are registered trademarks of the Standard Performance Evaluation Corporation
NFS is a registered trademark of Sun Microsystems, Inc.
CHAPTER 1 Introduction to SPECsfs
TABLE 1. SFS work loads and their mix percentages
Basis for change in mix of operations.
Modifications in the file set in SFS 2.0
The new work loads for SFS 3.0
Modifications in the file set in SFS 3.0
TABLE 2. File size distribution
Working set and file distribution
Support for Linux & BSD clients
Source and build changes for portability
Shell-script changes for portability
Documentation changes in SFS 3.0
CHAPTER 2 Running Instructions
Complying with the Uniform Access Rule
More obscure variables in the RC file.
Table 3. Available values for the DEBUG flags:
Setting up the SFS Environment and building tools
Setting the Environment/Compiler Variables
Setting up the SPECsfs Parameters
SFS Remote Client Setup Utilities
Compiling and Running SFS without the menu-driven tools
Generating the Submission File
CHAPTER 4 SFS Disclosures and Run Rules
Fair Use of SPECsfs97_R1 Results
Research and Academic usage of SPECsfs97_R1
Overview of SPEC SFS 3.0 Run Rules
Benchmark Software Requirements
Server configuration requirements
SPEC’s Description of Stable Storage for SFS 3.0
Protocol definition of stable storage and its use
Stable storage further defined
Examples which are not considered stable storage
SPEC’s Description of Uniform Access for SFS 3.0
Network configuration requirements
Benchmark Execution Requirements
Server File System Creation and Configuration
Data Point Specification for Results Disclosure
Maximum response time for Results Disclosure
Over all response time calculation
Benchmark Modifiable Parameters
Valid methods for benchmark execution
Benchmark metric or minimum disclosure
Full disclosure of benchmark results
Server stable storage configuration
Other server hardware configuration
Network hardware configuration
SPEC SFS 3.0 (SFS97_R1) is
the latest version of the Standard Performance Evaluation Corp.'s benchmark
that measures NFS file server throughput and response time. It provides
a standardized method for comparing performance across different vendor
platforms. This is an incremental release that is based upon the design of
SFS 2.0 (SFS97) and addresses several critical problems uncovered in that release.
Additionally, it addresses several tools issues and includes revisions to the
run and reporting rules.
The major features of SPEC SFS 3.0 (SFS97_R1) include:
resolves defects uncovered in SFS 2.0
measures results for both NFS protocol version 3 and version 2,
either TCP or UDP can be used as the network transport,
the operation mix closely matches real-world NFS workloads,
the benchmark distribution CD includes precompiled and tested binaries,
has an interface to accommodate both accomplished and novice users,
includes a report page generation tool.
This document specifies the guidelines on how SPEC SFS 3.0 is to be run
for measuring and publicly reporting performance results. These rules
have been established by the SPEC SFS Subcommittee and
approved by the SPEC Open Systems Steering Committee. They ensure that
results generated with this suite are meaningful, comparable to other generated
results, and repeatable. Per the SPEC license
agreement, all results publicly disclosed must adhere to these Run
and Reporting Rules.
This document also includes the background and design of the SFS benchmark
and a guide to using the SFS tools.
SPEC released SFS 1.0 in 1993. In November of 1994 SFS 1.1 was released which fixed a set of minor problems. Version 1.X of the SFS benchmark and its related work load were commonly referred to as LADDIS [Wittle]. SFS 1.X contains support for measuring NFS version 2 servers with the UDP network transport.
With the advance of NFS server technology and the continuing change in customer workloads, SPEC has updated SFS 1.1 to reflect these changes. SFS 2.0, released in December of 1997, reflects the efforts of SPEC in this regard. With the release of SFS 2.0, the LADDIS name was replaced with the preferred name of SPECsfs97.
The SPECsfs benchmark is a synthetic benchmark that generates an increasing load of NFS operations against the server and measures the response time (which degrades) as load increases. The older version, SFS 1.1, only supports NFS version 2 over UDP for results generation. SFS 2.0 added support for NFS version 3 server measurements. SFS 2.0 also added support for the use of TCP as a network transport in generating benchmark results. The SPECsfs workload consists primarily of the mix of NFS operations, the file set, block size distribution, and the percentage of writes which are appends versus overwrites.
The single workload in SFS 1.1 measured NFS Version 2 over UDP and presented the server with a heavy write-oriented mix of operations (see Table 1).
The 15% WRITE component for NFS was considered high, and WRITE activity dominated processing on most servers during a run of the SFS 1.1 work load. The operation mix for the SFS 1.1 workload was obtained primarily from nhfsstone (a synthetic NFS Version 2 benchmark developed by Legato Systems). Block size and fragment distributions were derived from studies at Digital. Append mode writes accounted for 70% of the total writes generated by the workload.
In SFS 1.1, 5MB of data per NFS op/s was created to force increasing disk head motion when the server misses the cache, and 1MB per NFS op/s was actually accessed (that is, 20% of the data created was accessed at any generated load point). The 1MB of data accessed per NFS op/s was accessed according to a Poisson distribution to provide a simulation of more frequently accessed files.
TABLE 1. SFS work loads and their mix percentages
| NFS Operation | SFS 1.1 NFSv2 | SFS 2.0 & 3.0 NFSv2 | SFS 2.0 & 3.0 NFSv3 |
|---------------|---------------|---------------------|---------------------|
| LOOKUP        | 34%           | 36%                 | 27%                 |
| READ          | 22%           | 14%                 | 18%                 |
| WRITE         | 15%           | 7%                  | 9%                  |
| GETATTR       | 13%           | 26%                 | 11%                 |
| READLINK      | 8%            | 7%                  | 7%                  |
| READDIR       | 3%            | 6%                  | 2%                  |
| CREATE        | 2%            | 1%                  | 1%                  |
| REMOVE        | 1%            | 1%                  | 1%                  |
| FSSTAT        | 1%            | 1%                  | 1%                  |
| SETATTR       | 1%            |                     |                     |
| READDIRPLUS   |               |                     | 9%                  |
| ACCESS        |               |                     | 7%                  |
| COMMIT        |               |                     | 5%                  |
SFS 2.0 supported both NFS version 2 and NFS version 3. The results for each version were not comparable. The NFS Version 2 mix was derived from NFS server data. The NFS Version 3 mix was desk-derived from the NFS Version 2 mix. Neither of these workloads was comparable to the SFS 1.1 workload.
From SFS 1.1, there were two main areas of change in the workload generated by the benchmark. To determine the workload mix, data was collected from over 1000 servers over a one month period. Each server was identified as representing one of a number of environments, MCAD, Software Engineering, etc. A mathematical cluster analysis was performed to identify a correlation between the servers. One cluster contained over 60% of the servers and was the only statistically significant cluster. There was no correlation between this mix and any single identified environment. The conclusion was that the mix is representative of most NFS environments and was used as the basis of the NFS version 2 workload.
Due to the relatively low market penetration of NFS version 3 (compared to NFS version 2), it was difficult to obtain the widespread data to perform a similar data analysis. Starting with the NFS version 2 mix and using published comparisons of NFS version 3 and NFS version 2 given known client workloads [Pawlowski], the NFS version 3 mix was derived and verified against the Sun Microsystems network of servers.
The file sets in the SFS 2.0 workloads were modified so that the overall size doubled as compared to SFS 1.1 (10 MB per op/s of requested load). As disk capacities have grown, so has the quantity of data stored on the disk. By increasing the overall file set size a more realistic access pattern was achieved. Although the size doubled, the percentage of data accessed was cut in half, resulting in the same absolute amount of data accessed. While the amount of disk space used grew at a rapid rate, the amount actually accessed grew at a substantially slower rate. Also, the file set was changed to include a broader range of file sizes (see Table 2 below). The basis for this modification was a study done of a large AFS distributed file system installation that was at the time being used for a wide range of applications. These applications ranged from classic software development to administrative support applications to automated design applications and their data sets. The SFS 2.0 file set included some very large files which are never actually accessed but which affect the distribution of files on disk by virtue of their presence.
There is no change in the mix of operations in SFS 3.0 (see TABLE 1 above). The mix is the same as in SFS 2.0. However, the results for SFS 3.0 are not comparable to results from SFS 2.0 or SFS 1.1. SFS 3.0 contains changes in the working set selection algorithm that fix errors that were present in the previous versions. The selection algorithm in SFS 3.0 accurately enforces the originally defined working set for SFS 2.0. Also, enhancements to the workload mechanism improve the benchmark's ability to maintain a more even load on the SUT during the benchmark. These enhancements affect the workload and the results. Results from SFS 3.0 should only be compared with other results from SFS 3.0.
The files selected by SFS 3.0 are on a “best fit” basis, instead of purely random as with SFS 1.1. The “best fit” algorithm in SFS 2.0 contained an error that prevented it from working as intended. This has been corrected in SFS 3.0.
SFS 3.0 contains changes in the working set selection algorithm that fix errors that were present in the previous versions. The file set used in SFS 3.0 is the same file set as was used in SFS 2.0, with algorithmic enhancements to eliminate previous errors in the file-set selection mechanism. The errors in previous versions of SFS often reduced the portion of the file set actually accessed, which is called the "working set".
TABLE 2. File size distribution
| Percentage | Filesize |
|------------|----------|
| 33%        | 1KB      |
| 21%        | 2KB      |
| 13%        | 4KB      |
| 10%        | 8KB      |
| 8%         | 16KB     |
| 5%         | 32KB     |
| 4%         | 64KB     |
| 3%         | 128KB    |
| 2%         | 256KB    |
| 1%         | 1MB      |
There are several areas of change in the SPEC SFS 3.0 benchmark. The changes are grouped into the following areas:
Measurement of time.
Regulation of the load.
Working set and file access distribution.
Other enhancements
Documentation changes
Within each of these areas there is a brief description of what motivated the change along with a detailed description of the new mechanisms.
Gettimeofday () resolution.
In the SFS 2.0 benchmark time was measured with the gettimeofday () interface. The function gettimeofday () was used to measure intervals of time that were short in duration. SFS 3.0 now measures the resolution of the gettimeofday () function to ensure its resolution is sufficient to measure these short events. If the resolution of gettimeofday () is 100 microseconds or better then the benchmark will proceed. If it is not then the benchmark will log the resolution and terminate. The user must increase the resolution to at least 100 microseconds before the benchmark will permit the measurement to continue.
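The resolution check described above can be illustrated with a short sketch. This is not the benchmark's own code (the benchmark itself calls gettimeofday ()); it is a Python approximation that uses time.time() as a stand-in clock, and the function names and the sample count are assumptions made for illustration. Only the 100 microsecond threshold comes from the text.

```python
import time

# Threshold stated in the text: the clock must resolve 100 microseconds or better.
REQUIRED_RESOLUTION_US = 100.0


def measure_clock_resolution_us(samples=200):
    """Return the smallest visible step between successive clock reads, in microseconds.

    time.time() stands in for gettimeofday() here; the sample count is arbitrary.
    """
    smallest = float("inf")
    for _ in range(samples):
        start = time.time()
        now = time.time()
        while now <= start:  # spin until the clock visibly advances
            now = time.time()
        smallest = min(smallest, (now - start) * 1e6)
    return smallest


def resolution_is_sufficient(resolution_us, limit_us=REQUIRED_RESOLUTION_US):
    """True when the measured step is at least as fine as the required limit."""
    return resolution_us <= limit_us
```

A caller would measure once at start-up and, if resolution_is_sufficient() returns False, log the measured value and stop, which mirrors the behavior described for SFS 3.0.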
Select () resolution compensation.
In the SFS benchmark there is a regulation mechanism that establishes a steady workload. This regulation mechanism uses select () to introduce sleep intervals on the clients. This sleep interval is needed if the client is performing more requests than was intended. In SFS 2.0 the regulation mechanism relied on select () to suspend the client for the specified number of microseconds. The last parameter to select is a pointer to a timeval structure. The timeval structure contains a field for seconds and another field for microseconds. The implementation of select may or may not provide microsecond granularity. If the requested value is less than the granularity of the implementation then it is rounded up to the nearest value that is supported by the system. On many systems the granularity is 10 milliseconds. The mechanism in SFS 2.0 could fail if the granularity of the select () call was insufficient. It was possible that the benchmark could attempt to slow the client by a few milliseconds and have the unintended effect of slowing the client by 10 milliseconds or more.
The SFS 2.0 benchmark makes adjustments to the sleep interval at two different times during the benchmark. During the warm-up phase the benchmark makes adjustments every 2 seconds. In the run phase it makes adjustments every 10 seconds. Once the adjustment was made the adjustment value was used for the rest of the interval (2 seconds or 10 seconds) until the next time the adjustment was recalculated. If the granularity of select's timeout was insufficient then the sleep duration would be incorrect and would be used for the entire next interval.
The mechanism used in SFS 3.0 is more complex. The sleep interval is calculated as it was in SFS 2.0. When the client is suspended in select () and re-awakens, it checks the amount of time that has passed using the gettimeofday () interface. This allows the client to know if the amount of time that it was suspended was the desired value. In SFS 3.0 the requested sleep interval is examined with each NFS operation. If the requested sleep interval was for 2 milliseconds and the actual time that the client slept was 10 milliseconds, then the remaining 8 milliseconds of extra sleep time is remembered in a compensation variable. When the next NFS operation is requested and goes to apply the sleep interval of 2 milliseconds, the remaining 8 milliseconds is decremented by the requested 2 milliseconds and no actual sleep will be performed. Once the remainder has been consumed, the process begins again. This mechanism permits the client to calculate and use sleep intervals that are smaller than the granularity of the select () system call. The new mechanism performs these compensation calculations on every NFS operation.
The gettimeofday () interface measures the wall-clock interval during which the
process was suspended by select (). This interval may occasionally
include periods of time that were unrelated to select (), such
as context switches, cron jobs, interrupts, and so on. SFS 3.0 resets
the compensation variable whenever it reaches 100 milliseconds, so that
noise from unrelated events does not overload the compensation mechanism.
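The compensation scheme can be sketched as follows. This is an illustrative Python model, not the benchmark source: the class name, the injected sleep function, and the rule used when the stored remainder is smaller than the request (a full sleep is issued) are all assumptions. The 2 ms / 10 ms example and the 100 ms reset come from the text.

```python
class CompensatedSleeper:
    """Model of per-operation sleep compensation (names are hypothetical)."""

    RESET_LIMIT_MS = 100  # reset threshold stated in the text

    def __init__(self, sleep_fn):
        # sleep_fn(requested_ms) performs the sleep and returns the
        # wall-clock milliseconds that actually elapsed.
        self.sleep_fn = sleep_fn
        self.compensation_ms = 0

    def pause(self, requested_ms):
        """Apply one per-operation sleep; return the milliseconds actually slept."""
        if self.compensation_ms >= requested_ms:
            # Earlier oversleep covers this request: consume it, do not sleep.
            self.compensation_ms -= requested_ms
            return 0
        actual_ms = self.sleep_fn(requested_ms)
        # Remember any oversleep so later operations can skip their sleep.
        self.compensation_ms += max(0, actual_ms - requested_ms)
        if self.compensation_ms >= self.RESET_LIMIT_MS:
            # Treat very large values as noise from unrelated events.
            self.compensation_ms = 0
        return actual_ms


def coarse_sleep_10ms(requested_ms):
    """Simulated select(): rounds every request up to a 10 ms granularity."""
    return ((requested_ms + 9) // 10) * 10
```

With the simulated 10 ms granularity, ten operations that each request 2 ms trigger only two real sleeps of 10 ms, so the total time slept equals the 20 ms that was requested.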
The SFS 2.0 benchmark uses a mechanism to establish a steady workload. This mechanism calculates the amount of work that needs to be completed in the next interval. It calculates the amount of sleep time (sleep duration per NFS operation) that will be needed for each operation so that the desired throughput will be achieved. During the warm-up phase the interval for this calculation is every 2 seconds. During the run phase the interval for this calculation is every 10 seconds. If the client performs more operations per second than was desired, then the sleep duration for each NFS operation over the next interval is increased. In SFS 2.0 the sleep duration for each NFS operation could be increased too quickly and result in the client sleeping for the entire next interval. This resulted in no work being performed for the entire next interval. When the next interval completed, the algorithm in SFS 2.0 could determine that it needed to decrease the sleep duration for the next interval. The next interval would perform work and could then again have performed too much work and once again cause the next sleep duration calculation to overshoot and cause the next interval to perform no work. This oscillation could continue for the duration of the test.
In SFS 3.0 the new sleep interval is restricted to be no more than:
2 * (previous_sleep_interval + 5)
where the units are milliseconds.
This reduces how aggressively the algorithm increments the sleep interval and permits the steady workload to be achieved. In SFS 2.0 the calculation for how much work to perform in the next interval would attempt to catch up completely in the next interval. This has been changed so that the sleep duration will not change too rapidly.
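As a small worked illustration of the limit above, the following Python sketch clamps a desired sleep interval to the stated bound. The function and parameter names are hypothetical; only the bound 2 * (previous_sleep_interval + 5), in milliseconds, comes from the text.

```python
def next_sleep_interval_ms(desired_ms, previous_ms):
    """Clamp the newly calculated sleep interval to 2 * (previous + 5) ms."""
    limit_ms = 2 * (previous_ms + 5)
    return min(desired_ms, limit_ms)


def ramp(desired_ms, start_ms, steps):
    """Show how the interval grows over successive adjustments toward desired_ms."""
    history = []
    current = start_ms
    for _ in range(steps):
        current = next_sleep_interval_ms(desired_ms, current)
        history.append(current)
    return history
```

Starting from 0 ms with a very large desired value, successive adjustments yield 10, 30, 70, and 150 ms, so the interval can still grow quickly but cannot jump straight to a value that idles the whole next interval.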
SFS 3.0 also checks the quantity of work to be performed for each interval, in the run phase, and if any interval contains zero operations then the benchmark logs an error and terminates.
In order to understand the changes in the SFS 3.0 benchmark there is need for the reader to become familiar with several internal mechanisms in SFS 2.0. The following is a brief description of these mechanisms. The following graphic is provided to assist in understanding the overall file distribution of SFS.
The SFS benchmark creates files that will later be used for measurement of the system's performance. The working set of the SFS benchmark is 10 percent of all of the files that it creates. This working set is established when the benchmark initializes. This initialization groups the files in the working set into access groups. Each group contains the same number of files. For each group there is a probability of access that is calculated using a Poisson distribution for all of the groups. The use of a Poisson distribution simulates the access behavior of file servers, namely that some files are accessed more frequently than others. The Poisson probability is used to create a range value that each group encompasses. The range value for each group is:
Poisson probability * 1000 + previous_groups_range_value
For groups with a low probability the range value is incremented by a small number. For groups with a high probability the range value is incremented by a large number.
During the run phase each NFS operation selects one of the files in the working set to be accessed. Since the time to select the file is inside the measurement section it is critical that the file selection mechanism be as non-intrusive as possible. This selection mechanism uses a random number that is less than or equal to the maximum range value that was calculated for all the groups. A binary search of the group's ranges is performed to select the group that corresponds to this random value. After the group is selected then another random number is used to select a particular file within the group.
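The range construction and the binary-search selection can be sketched in Python as shown below. This is an illustration under stated assumptions, not the benchmark's code: the Poisson parameter passed in (lam) is a hypothetical choice because the text does not give the value SFS uses, the truncation to an integer is an assumed form of the rounding, and all function names are invented for the example.

```python
import bisect
import math
import random


def poisson_pmf(k, lam):
    """Poisson probability of k, computed in log space to avoid overflow."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def build_group_ranges(num_groups, lam):
    """Cumulative range values: probability * 1000 + previous group's range value."""
    ranges = []
    running = 0
    for group in range(num_groups):
        running += int(poisson_pmf(group, lam) * 1000)  # assumed integer truncation
        ranges.append(running)
    return ranges


def select_group(ranges, rng=random):
    """Pick a random value up to the maximum range, then binary-search for its group."""
    pick = rng.randint(1, ranges[-1])
    return bisect.bisect_left(ranges, pick)


def select_file(ranges, files_per_group, rng=random):
    """Select a group by range value, then a random file index within that group."""
    group = select_group(ranges, rng)
    return group, rng.randrange(files_per_group)
```

Under these assumptions, a group whose probability truncates to zero adds nothing to the running range value and can never be selected. For example, with 192 groups and lam set to 96.0 the first group's range value is 0, which illustrates how rounding can remove groups from the working set.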
The following is a graphical representation of the SFS 2.0 Poisson distribution that would be used when the operations/second/process is 25. This results in 12 access groups with the access distribution seen below.
The problem with SFS 2.0 is that the Poisson distribution could deteriorate as the number of files being accessed by any process became large. The probabilities for some of the access groups became zero due to rounding. The following is a graphical representation of the SFS 2.0 Poisson distribution that would be used when the operations/sec/process is increased and there are 192 access groups.
In SFS 2.0 the number of files in the working set is reduced as the number of operations/sec/proc is increased.
The algorithm also contained a mathematical error that could eventually reduce the number of access groups to one. This was not seen in any previous results as the criteria to activate this defect was that the number of operations/second/process would need to be above 500 and no previous results were in this range. For more details on each defect in SFS 2.0 see the “Defects in SFS 2.0” written by Stephen Gold from Network Appliance on the SPEC web site.
The defects in SFS 2.0 resulted in an overall reduction in the working set and may have impacted the SPECsfs97 results. The exact impact on the result depends on the size of the caches in the server and other factors. If the caches were sufficiently large as to encompass the entire 10 percent of all of the files that were created (the intended working set) then the impact on the result may be negligible. This is because if all of the files that should have been accessed would have fit in the caches then the selection of which file to access becomes moot.
In SFS 3.0 the file selection algorithm has been changed so that reduction
of the working set no longer occurs. The algorithm in SFS 3.0 is based
on the Poisson probabilities used in SFS 2.0, but SFS 3.0
manipulates the probabilities to ensure that all of the files in the
working set have a reasonable probability of being accessed.
To achieve this, SFS 3.0 implements a "cyclical Poisson" distribution.
The following graph shows the SFS 3.0 access probabilities for 192 access
groups:
Instead of varying the parameter of the Poisson distribution to generate
values for 192 access groups, the relative probabilities for 25 ops/sec
(12 access groups) are simply repeated as many times as
necessary. Thus there are no access groups with extremely small probabilities,
and no huge floating-point values are needed to compute them.
For 192 access groups, a total of 16 repetitions or "cycles" are
used. Each cycle of access groups has the same aggregate probability of
access, namely 1/16. (The number of access groups in SFS 3.0 is always
a multiple of 12, so there are no partial cycles.)
The Poisson probabilities for 25 ops/sec (12 access groups) are scaled
down by 16 (the number of cycles) and applied to the first 12 access groups,
which constitute the first cycle of the distribution. The same
probabilities are also applied to the next 12 access groups (the second
cycle) and the process is repeated across all 16 cycles.
Another view of the working set is to divide it into 12 distinct access-group "generations",
each of which is represented by a single access group in each cycle. Within
a given generation, all the access-groups have the same probability of
access. For instance, groups 1, 13, 25, ... 181 constitute one generation.
The cache profile across 192 groups looks very much like the cache profile across 12 groups. Why is this the case? The answer is that the cyclical Poisson distribution results in the following distribution of accesses across generations:
In the above graph there are still 192 access groups, but they have been
aggregated together into 12 generations. The probability of access
for each generation has been plotted. Note that the curve for
192 access groups (with 16 access groups per generation) looks the
same as the one for 12 access groups (with one access group per generation).
Hence the cache behavior of SFS is no longer sensitive to the number
of load-generating processes used to achieve a given request rate.
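The cyclical Poisson construction can be sketched in Python as follows. This is a minimal model and not the benchmark source: the Poisson parameter (lam) for the 12-group base case and the normalization of the base probabilities to sum to 1 are assumptions made for illustration, and the function names are hypothetical. The cycle length of 12, the scaling by the number of cycles, and the grouping into generations follow the text.

```python
import math

GROUPS_PER_CYCLE = 12  # the 25 ops/sec base case has 12 access groups


def base_probabilities(lam=6.0):
    """Poisson probabilities for one cycle, normalized to sum to 1 (lam is assumed)."""
    raw = [math.exp(-lam) * lam ** k / math.factorial(k)
           for k in range(GROUPS_PER_CYCLE)]
    total = sum(raw)
    return [p / total for p in raw]


def cyclical_probabilities(num_groups, lam=6.0):
    """Repeat the base probabilities once per cycle, scaled down by the cycle count."""
    if num_groups % GROUPS_PER_CYCLE != 0:
        raise ValueError("number of access groups must be a multiple of 12")
    cycles = num_groups // GROUPS_PER_CYCLE
    base = base_probabilities(lam)
    return [p / cycles for _ in range(cycles) for p in base]


def generation_probabilities(probs):
    """Aggregate groups 1, 13, 25, ... (and so on) into the 12 generations."""
    return [sum(probs[g::GROUPS_PER_CYCLE]) for g in range(GROUPS_PER_CYCLE)]
```

For 192 access groups this produces 16 cycles that each carry an aggregate probability of 1/16, and summing by generation recovers the 12-group base curve, which is the property the preceding paragraphs describe.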
In theory, the same effect could have been achieved by always having
12 access groups, no matter how many files there are. This was not
done for fear that exhaustive searches for files within a very large access
group would be expensive, causing the load-generators to bog down in
file selection.
The SFS 3.0 benchmark contains support for Linux and BSD clients. As the popularity of these other operating systems continues to grow the demand for their support was seen as an indication of the importance of SFS support.
In SFS 2.0, after an access group was selected for access, a file was selected for access. This selection was done by picking a random file within the group to be accessed and then searching for a file that meets the transfer size criteria. There was an attempt to pick a file based on the best fit of transfer size and available file sizes. This mechanism was not working correctly due to an extra line of code that was not needed. This defect resulted in the first file to have a size equal to or larger than the transfer size being picked.
SFS 3.0 eliminates the extra line of code and permits the selection to be a best fit selection instead of a first fit selection.
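The difference between first fit and best fit can be illustrated with the sketch below. It is not the benchmark's code: the function names are hypothetical, and "best fit" is assumed here to mean the smallest file whose size is at least the transfer size, which the text does not spell out.

```python
def first_fit(file_sizes, start, transfer_size):
    """Return the index of the first file at or after start that is large enough."""
    count = len(file_sizes)
    for step in range(count):
        idx = (start + step) % count  # wrap around the group
        if file_sizes[idx] >= transfer_size:
            return idx
    return None


def best_fit(file_sizes, start, transfer_size):
    """Return the index of the smallest file that still holds the transfer (assumed rule)."""
    count = len(file_sizes)
    best = None
    for step in range(count):
        idx = (start + step) % count
        size = file_sizes[idx]
        if size >= transfer_size and (best is None or size < file_sizes[best]):
            best = idx
    return best
```

For a group with file sizes of 64, 8, 4, and 16 KB and a 4 KB transfer starting at the first file, first fit returns the 64 KB file while best fit returns the 4 KB file.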
The “generate” script is now included with SFS 3.0. This shell script is used to create the submission that is sent to SPEC for review and publication of SFS results.
In SFS 2.0 the benchmark tested the Readdirplus() functionality of NFS version 3. However it did not validate that all of the requested data was returned by the operation. SFS 3.0 performs the additional validation and ensures that all of the requested attributes and data are returned from the Readdirplus() operation.
Updated release number from 2.0 to 3.0.
Updated version date from "23 October 1997" to "20 June 2001".
Updated all SPEC copyrights to 2001.
Update SPEC mailing address to reflect move from Manassas
to Warrenton.
Reduction in the memory required by each client to run the benchmark.
Corrected the calculations of atime.nseconds
and mtime.nseconds.
Format total_fss_bytes using "%10lu" instead
of "%10d" to avoid wraparound.
Set variables in sfs_mcr so that processes
get cleaned up.
Fix a typo in the sample sfs_rc.
New compiler flags for IBM.
Add "linux" and "freebsd" wrappers.
Don't include <stropts.h> on FreeBSD.
Save sockaddr_in before calling ioctl(SIOCGIFFLAGS).
Limit select's size to FD_SETSIZE.
Remove unused svc_getreq() function which
was nonportable.
Implemented a better way to set CDROM_BENCHDIR.
Removed use of the "function" keyword
when defining shell functions.
Deleted dangling "-a" in installsfs
conditional.
Added missing back-tics in install_sfs and
run_sfs.
Eliminated dependencies on /usr/tmp directory.
Use back-tic to set SPEC_HOME.
Don't use '.' to invoke sfsenv.
Use pwd_mkdb to update FreeBSD password database.
The SFS 3.0 Users Guide has been updated to reflect:
There are several things you must set up on your server before you can successfully execute a benchmark run.
1. Configure enough disk space. SPECsfs needs 10 MB of disk space for each NFSops you will be generating, with space for 10% growth during a typical benchmark run (10 measured load levels, 5 minutes per measured load). You may mount your test disks anywhere in your server's file space that is convenient for you. The NFSops a server can process is often limited by the number of independent disk drives configured on the server. In the past, a disk drive could generally sustain on the order of 100-200 NFSops. This was only a rule of thumb, and this value will change as new technologies become available. However, you will need to ensure you have sufficient disks configured to sustain the load you intend to measure.
2. Initialize and mount all file systems. According to the Run and Disclosure Rules, you must completely initialize all file systems you will be measuring before every benchmark run. On Unix systems, this is accomplished with the “newfs” command. Just deleting all files on the test disks is not sufficient because there can be lingering effects of the old files (e.g. the size of directory files, location of inodes on the disk) which affect the performance of the server. The only way to ensure a repeatable measurement is to re-initialize all data structures on the disks between benchmark runs. However, if you are not planning on disclosing the result, you do not need to perform this step.
3. Export all file systems to all clients. This gives the clients permission to mount, read, and write to your test disks. The benchmark program will fail without this permission.
4. Verify that all RPC services work. The benchmark programs use port mapping, mount, and NFS services provided by the server. The benchmark will fail if these services do not work for all clients on all networks. If your client systems have NFS client software installed, one easy way to do this is to attempt mounting one or more of the server's disks on the client. NFS servers generally allow you to tune the number of resources to handle UDP and/or TCP requests. When benchmarking using the TCP protocol, you must make sure that UDP support is at least minimally configured or the benchmark will fail to initialize.
5. Ensure your server is idle. Any other work being performed by your server is likely to perturb the measured throughput and response time. The only safe way to make a repeatable measurement is to stop all non-benchmark related processing on your server during the benchmark run.
6. Ensure that your test network is idle. Any extra traffic on your network will make it difficult to reproduce your results, and will probably make your server look slower. The easiest thing to do is to have a separate, isolated network between the clients and the server during the test.
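The sizing arithmetic in step 1 can be sketched as a small Python helper. This is only a rough planning aid under stated assumptions: the 10% growth is interpreted as extra space on top of the 10 MB per NFSops, the per-disk throughput is the rule-of-thumb range from the text and should be replaced with a figure for your own hardware, and the function names are hypothetical.

```python
def required_disk_mb(peak_nfsops, mb_per_op=10, growth_percent=10):
    """Disk space for the file set plus growth, rounded up to whole megabytes."""
    scaled = peak_nfsops * mb_per_op * (100 + growth_percent)
    return -(-scaled // 100)  # integer ceiling division


def disks_needed(peak_nfsops, ops_per_disk):
    """Minimum number of drives if each sustains ops_per_disk (a rule of thumb only)."""
    return -(-peak_nfsops // ops_per_disk)
```

For a peak load of 2000 NFSops this gives 22000 MB of disk space, and between 10 and 20 drives if each drive sustains 200 or 100 NFSops respectively.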
At this point, your server should be ready to measure. You must now set up a few things on your client systems so they can run the benchmark programs.
1. Create “spec” user. SPECsfs should run as a non-root user.
2. The SPECsfs programs must be installed on clients.
3. Ensure sfs and sfs3 are setUID root, if necessary. Some NFS servers only accept mount requests if sent from a reserved UDP or TCP port, and only the root user can send packets from reserved ports. Since SPECsfs generally is run as a non-root user, the sfs and sfs3 programs must be set to execute with an effective UID of root.
To get the benchmark to use a reserved port, you must include a -DRESVPORT option in your compile command. This is easiest to accomplish by editing the Makefile wrapper file (M.xxxx) for your client systems. The build process will then make the client use a reserved port and will arrange to run the benchmark programs as root. However, you may want to verify this works the first time you try it.
4. Configure and verify network connectivity between all clients and server. Clients must be able to send IP packets to each other and to the server. How you configure this is system-specific and is not described in this document. Two easy ways to verify network connectivity are to use a “ping” program or the netperf benchmark (http://onet1.external.hp.com/netperf/NetperfPage.html).
5. If clients have NFS client code, verify they can mount and access server file systems. This is another good way to verify your network is properly configured. You should unmount the server's test disks before running the benchmark.
6. Configure remote shell access. The Prime Client needs to be able to execute commands on the other client systems using rsh (remsh on HP-UX, AT&T Unix, and Unicos). For this to work, you need to create a .rhosts file in the spec user's home directory.
A good test of this is to execute this command from the prime client:
$ rsh client_name "rsh prime_client date"
If this works, all is well.
7. The Prime Client must have sufficient file space in the SFS file tree to hold the result and log files for a run. Each run generates a log file of 10 to 100 kilobytes, plus a result file of 10 to 100 kilobytes. Each client also generates a log file of one to 10 kilobytes.
Once you have the clients and server configured, you must set some parameters for the benchmark itself, which you do in a file called the “rc file”. The actual name of the file is a prefix picked by you, and the suffix “_rc”. The default version shipped with the benchmark is delivered as “sfs_rc” in the benchmark source directory. The SPECsfs tools allow you to modify parameters in the rc file. If you want to manually edit this file, the sfs_rc file should be copied to the results directory. The sfs_rc file can then be edited directly. The sfs_rc file is executed by a Bourne shell program, so all the lines in the RC file must be in Bourne shell format. Most important, any variable which is a list of values must have its value enclosed in double quotes.
There are several parameters you must set, and several others you may change to suit your needs while performing a disclosable run. There are also many other parameters you may change which change the benchmark behavior, but lead to an undisclosable run (for example, turning on debug logging).
The parameters you can/must set are:
1. MNT_POINTS : This parameter specifies the names of the file systems the clients will use when testing the server. It can take two forms.
The first form is a list of host:path pairs specifying the file systems this particular client will be using. For example, if the server is named “testsys” and has three test mount points named “/test1”, “/test2”, and “/test3”, the list would be “testsys:/test1 testsys:/test2 testsys:/test3”. You must be very careful when specifying the mount point to comply with the uniform access rule (see below).
The second form is simply the name of a file containing a list of mount points for each client. The format of the file is:
client_name server:path server:path...
client_name server:path server:path...
And so on, one line for each client system. This file gets stored in the “results” directory, the same place as the rc file.
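The file format above is simple enough to check before a run. The following Python sketch is not part of the SPECsfs tools; the function name parse_mount_file and the client names are hypothetical, and skipping blank lines is an assumption, since the format description does not mention them.

```python
def parse_mount_file(text):
    """Parse 'client_name server:path server:path...' lines into a dict."""
    mounts = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        client, entries = fields[0], fields[1:]
        for entry in entries:
            server, sep, path = entry.partition(":")
            if not (server and sep and path):
                raise ValueError(f"line {line_no}: bad mount point {entry!r}")
        mounts[client] = entries
    return mounts


# Hypothetical clients using the "testsys" server from the example above.
example = """\
client1 testsys:/test1 testsys:/test2
client2 testsys:/test3 testsys:/test1
"""
mounts = parse_mount_file(example)
for client, entries in mounts.items():
    print(client, entries)
```

A malformed entry (one without a colon, for instance) raises ValueError with the offending line number, which is usually easier to act on than a failed mount during the run.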
2. LOAD , INCR_LOAD , and NUM_RUNS : These parameters specify the aggregate load the clients will generate. You can specify the load points two ways:
• You can explicitly specify a list of load levels in LOAD. So, if you want to test a load of 100, 300, and 700 NFSops, you would set LOAD to “100 300 700”.
• If you want to test a set of evenly spaced load points, set all three parameters. Set LOAD to the lowest load level, set INCR_LOAD to the amount by which you would like to increase the load for each measured run, and set NUM_RUNS to the number of load points you would like to measure. This is the easiest way to configure a disclosable run.
For example, if you would like to measure 10 evenly spaced points ending at 2000 NFSops, you would set LOAD to 200, INCR_LOAD to 200, and NUM_RUNS to 10.
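To see which load points a given combination produces, the arithmetic can be sketched in Python as follows. This is an illustration only, not the logic of the SPECsfs scripts; the function name load_points is hypothetical, and it assumes that a multi-valued LOAD is used as given while a single-valued LOAD is stepped NUM_RUNS times by INCR_LOAD.

```python
def load_points(load, incr_load=0, num_runs=1):
    """Return the list of requested load levels in NFSops."""
    values = [int(v) for v in str(load).split()]
    if len(values) > 1:
        # Assumption: an explicit list in LOAD is used exactly as written.
        return values
    # Evenly spaced points starting at LOAD, one per measured run.
    return [values[0] + i * incr_load for i in range(num_runs)]


print(load_points("100 300 700"))   # [100, 300, 700]
print(load_points(200, 200, 10))    # [200, 400, ..., 2000]
```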
3. CLIENTS : These are the names of all the client systems you will use to load your server. If you will be generating load with the prime client, include it on this list.
4. PROCS : This is the number of load generating processes (“procs”) you want to run on each client system. As you add procs, you can have more NFS requests outstanding at any given time, and you can use more file systems on the server, all of which tends to increase the load your server can process (until either the disks or the processors run out of capacity).
There is a relationship between the values of PROCS, CLIENTS, and MNT_POINTS. The number of mount points specified in MNT_POINTS must equal the value of PROCS, or equal the value of PROCS times the number of clients in CLIENTS. In the first case, each mount point will be accessed by one proc on each client. In the second case, each listed mount point will be accessed by exactly one proc on one client. The first PROCS mount points will be used by the first client, the second PROCS mount points by the second client, and so forth.
You may specify the same mount point multiple times in MNT_POINTS. This allows you to have more than one process accessing a given file system on the server, without having all clients loading that file system. If a file system spans multiple disks (for example, RAID level 0, 1 …), then care must be taken to conform to the uniform access rule.
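The relationship between the proc count, the client list, and the mount point list can be checked mechanically. The Python sketch below is an illustration under the rules just described, not code from the benchmark; the function name mounts_per_client and the client names are hypothetical.

```python
def mounts_per_client(clients, procs, mnt_points):
    """Map each client to the mount points its procs would use."""
    client_list = clients.split()
    mounts = mnt_points.split()
    if len(mounts) == procs:
        # First case: every client uses the same list, one proc per entry.
        return {c: list(mounts) for c in client_list}
    if len(mounts) == procs * len(client_list):
        # Second case: consecutive groups of `procs` entries, one group per client.
        return {c: mounts[i * procs:(i + 1) * procs]
                for i, c in enumerate(client_list)}
    raise ValueError(
        f"{len(mounts)} mount points given; expected {procs} "
        f"or {procs * len(client_list)}")


# Hypothetical clients; the server and paths follow the "testsys" example.
shared = mounts_per_client("client1 client2", 3,
                           "testsys:/test1 testsys:/test2 testsys:/test3")
split = mounts_per_client("client1 client2", 2,
                          "testsys:/test1 testsys:/test2 testsys:/test3 testsys:/test1")
print(shared)
print(split)
```

In the second call, testsys:/test1 is listed twice, so it is loaded by one proc on each client while the other two file systems are loaded by one proc in total. That kind of imbalance is exactly what you need to look for when checking uniform access.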
5. NFS_VERSION : Leave this unset or set it to 2 to measure NFS protocol version 2. Set it to 3 to measure NFS protocol version 3.
6. TCP : Set this to 1 or “on” to use TCP to communicate between the clients and the server. Leave it unset or set to 0 to use UDP.
7. BIOD_MAX_READS and BIOD_MAX_WRITES : SPECsfs emulates the read-ahead and write-behind behavior of NFS block I/O daemons. These allow a client to have multiple read and write requests outstanding at a given time. BIOD_MAX_READS and BIOD_MAX_WRITES configure how many read or write operations SPECsfs will transmit before stopping and waiting for replies. You can set these to any value from 0 to 32, inclusive.
There are many other parameters you can modify in the rc file, but generally none are necessary. They allow you to change the NFS operation mix, change run duration parameters, or turn on debugging information. Modifying most of these parameters will lead to an invalid (that is, undisclosable) run. The full list of parameters is documented at the end of the sfs_rc file and at the end of this section.
Complying with the Uniform Access Rule
The most common way to perform an undisclosable run is to violate the uniform access rule (see “SPEC's Description of Uniform Access for SFS 3.0”). In some systems, it is possible to complete an NFS operation especially fast if the request is made through one network interface and the data is stored on just the right file system. The intent of the rule is to prevent the benchmarker (that's you) from taking advantage of these fast paths to get an artificially good result.
The specific wording of the rule states that “for every network, all file systems should be accessed by all clients uniformly.” The practical implication of the uniform access rule is you must be very careful with the order in which you specify mount points in the MNT_POINTS variable.
The fool-proof way to comply with the uniform access rule is to have every client access every file system, evenly spreading the load across the network paths between the client and server. This works pretty well for small systems, but may require more procs per client than you want to use when testing large servers.
If you want to run fewer procs on your clients than you have file systems, you will need to take some care figuring out the mount points for each client.
Uniform access is a slippery subject. It is much easier to examine a configuration and say whether it is uniform than it is to come up with a perfect algorithm for generating complying mount point lists. There will always be new configurations invented which do not fit any of the examples described below. You must always examine the access patterns and verify there is nothing new and innovative about your systems which makes it accidentally violate the uniform access rule.
Below are some examples of generating mount point lists which do comply with the uniform access rule.
To begin, you must first determine the number of file systems, clients, and load generating processes you will be using. Once you have that, you can start deciding how to assign procs to file systems. As a first example, we will use the following file server:
Clients C1 and C2 are attached to Network1, and the server's address on that net is S1. It has two disk controllers (DC1 and DC2), with four file systems attached to each controller (F1 through F8).
You start by assigning F1 to proc1 on client 1. That was the easy part.
You next switch to DC2 and pick the first unused file system (F5). Assign this to client 1, proc 2.
Continue assigning file systems to client 1, each time switching to a different disk controller and picking the next unused disk on that controller, until client 1 has PROCS file systems. In the picture above, you will be following a zig-zag pattern from the top row to the bottom, then up to the top again. If you had three controllers, you would hit the top, then middle, then bottom controller, then move back to the top again. When you run out of file systems on a single controller, go back and start reusing them, starting from the first one.
Now that client 1 has all its file systems, pick the next controller and unused file system (just like before) and assign this to client 2. Keep assigning file systems to client 2 until it also has PROCS file systems.
If there were a third client, you would keep assigning it file systems, like you did for client 2.
If you look at the result in tabular form, it looks something like this (assuming 4 procs per client):
C1: S1:F1 S1:F5 S1:F2 S1:F6
C2: S1:F3 S1:F7 S1:F4 S1:F8
The above form is how you would specify the mount points in a file. If you wanted to specify the mount points in the RC file directly, then it would look like this:
CLIENTS="C1 C2"
PROCS=4
MNT_POINTS="S1:F1 S1:F5 S1:F2 S1:F6 S1:F3 S1:F7 S1:F4 S1:F8"
If we had 6 procs per client, it would look like this:
C1: S1:F1 S1:F5 S1:F2 S1:F6 S1:F3 S1:F7
C2: S1:F4 S1:F8 S1:F1 S1:F5 S1:F2 S1:F6
Note that file systems F1, F2, F5, and F6 each get loaded by two procs (one from each client) and the remainder get loaded by one proc each. Given the total number of procs, this is as uniform as possible. In a real benchmark configuration, it is rarely useful to have an unequal load on a given disk, but there might be some reasons this makes sense.
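The zig-zag procedure described above can be written down as a short Python sketch. It is an illustration of the steps in this section, not a tool shipped with SPECsfs, and it does not replace examining your own configuration. The function name assign_file_systems and its argument layout are hypothetical.

```python
def assign_file_systems(clients, procs, controllers):
    """Rotate through the controllers, taking the next file system on each.

    clients:     list of (client_name, server_address) pairs, in order
    procs:       number of procs (file systems) per client
    controllers: list of lists, the file systems behind each controller
    """
    next_fs = [0] * len(controllers)   # next unused file system per controller
    ctrl = 0                           # controller to use for the next pick
    table = {}
    for client, server in clients:
        picks = []
        for _ in range(procs):
            fs_list = controllers[ctrl]
            # Wrap around and reuse file systems once a controller runs out.
            fs = fs_list[next_fs[ctrl] % len(fs_list)]
            picks.append(f"{server}:{fs}")
            next_fs[ctrl] += 1
            ctrl = (ctrl + 1) % len(controllers)
        table[client] = picks
    return table


controllers = [["F1", "F2", "F3", "F4"], ["F5", "F6", "F7", "F8"]]
clients = [("C1", "S1"), ("C2", "S1")]

four = assign_file_systems(clients, 4, controllers)
six = assign_file_systems(clients, 6, controllers)
for name, table in (("4 procs", four), ("6 procs", six)):
    print(name)
    for client, picks in table.items():
        print(f"  {client}: {' '.join(picks)}")
```

With the two-controller server above, the sketch reproduces both tables shown in this section. Joining the per-client lists in client order gives the value for MNT_POINTS.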
The next wrinkle comes if you should have more than one network interface on your server, like so:
Clients C1 and C2 are on Network1, and the server's address is S1. Clients C3 and C4 are on Network2, and the server's address is S2.
We start the same way, assigning F1 to proc 1 of C1, then assigning file systems to C1 by rotating through the disk controllers and file systems. When C1 has PROCS file systems, we then switch to the next client on the same network, and continue assigning file systems. When all clients on that network have file systems, switch to the first client on the next network, and keep going. Assuming two procs per client, the result is:
C1: S1:F1 S1:F5
C2: S1:F2 S1:F6
C3: S2:F3 S2:F7
C4: S2:F4 S2:F8
And the mount point list is:
MNT_POINTS="S1:F1 S1:F5 S1:F2 S1:F6 S2:F3 S2:F7 S2:F4 S2:F8"
The first two mount points are for C1, the second two for C2, and so forth.
These examples are meant to be only that, examples. There are more complicated configurations which will require you to spend some time analyzing the configuration and assuring yourself (and possibly SPEC) that you have achieved uniform access. You need to examine each component in your system and answer the question “is the load seen by this component coming uniformly from all the upstream components, and is it being passed along in a uniform manner to the downstream ones?” If the answer is yes, then you are probably in compliance.
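One way to start answering that question is to count how many procs land on each component. The Python sketch below tallies procs per file system, per disk controller, and per server address for the two-network example. It is only a first-pass aid: the function names are hypothetical, the controller mapping is taken from the example server, and equal counts do not by themselves prove compliance with the rule.

```python
from collections import Counter


def load_counts(table, controller_of):
    """Count procs per file system, per controller, and per server address."""
    per_fs = Counter({fs: 0 for fs in controller_of})
    per_ctrl = Counter({ctrl: 0 for ctrl in controller_of.values()})
    per_addr = Counter()
    for mounts in table.values():
        for mount in mounts:
            addr, fs = mount.split(":", 1)
            per_fs[fs] += 1
            per_ctrl[controller_of[fs]] += 1
            per_addr[addr] += 1
    return per_fs, per_ctrl, per_addr


def spread(counts):
    """Difference between the most and least loaded component."""
    return max(counts.values()) - min(counts.values())


# F1-F4 sit behind DC1 and F5-F8 behind DC2, as in the example server.
controller_of = {f"F{i}": ("DC1" if i <= 4 else "DC2") for i in range(1, 9)}
table = {
    "C1": ["S1:F1", "S1:F5"],
    "C2": ["S1:F2", "S1:F6"],
    "C3": ["S2:F3", "S2:F7"],
    "C4": ["S2:F4", "S2:F8"],
}
per_fs, per_ctrl, per_addr = load_counts(table, controller_of)
for label, counts in (("file system", per_fs),
                      ("controller", per_ctrl),
                      ("server address", per_addr)):
    print(label, dict(counts), "spread:", spread(counts))
```

A spread of zero at every level means each component of that kind sees the same number of procs. A nonzero spread, or a file system with a count of zero, is a prompt to look more closely at the configuration.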
As mentioned above, there are many more parameters you can set in the RC file. Here is the list and what they do.
The following options may be set and still yield a disclosable benchmark run:
1. SFS_USER : This is the user name of the user running the benchmark. It is used when executing remote shell commands on other clients from the prime client. You would only want to modify this if you are having trouble remotely executing commands.
2. SFS_DIR and WORK_DIR : These are the directory names containing the SPECsfs programs ( SFS_DIR ), the RC file, and logging and output files ( WORK_DIR ). If you configure your clients with the same path for these directories on all clients, you should not need to fool with this. One easy way to accomplish this is to export the SFS directory tree from the prime client and NFS mount it at the same place on all clients.
3. PRIME_MON_SCRIPT and PRIME_MON_ARGS : This is the name (and argument list) of a program which SPECsfs will start running during the measurement phase of the benchmark. This is often used to start some performance measurement program while the benchmark is running so you can figure out what is going on and tune your system.
Look at the script “sfs_ext_mon” in the SPECsfs source directory for an example of a monitor script.
4. RSH : This is the name of the remote command execution command on your system. The command wrapper file (C.xxxx) should have set this for you, but you can override it here. On most Unix systems, it is “rsh”, but on a few (e.g. HP-UX and Unicos), it is called “remsh”.
These remaining parameters may be set, but SPEC will reject the result for disclosure. They are available only to help you debug or experiment with your server.
5. WARMUP_TIME and RUNTIME : These set the duration of the warmup period and the actual measurement period of the benchmark. They must be 300, or the submission will be rejected for disclosure.
6. MIXFILE : This specifies the name of a file in WORK_DIR which describes the operation mix to be executed by the benchmark. You must leave this unspecified to disclose the result. However, if you want to change the mix for some reason, this gives you the ability.
Look in the file sfs_c_man.c near the function setmix() for a description of the mix file format. The easiest to use format is as follows:
SFS MIXFILE VERSION 2
opname xx%
opname yy%
# comment
opname xx%
The first line must be the exact string “SFS MIXFILE VERSION 2” and nothing else. The subsequent lines are either comments (denoted with a hash character in the first column) or the name of an operation and its percentage in the mix (one to three digits, followed by a percent character). The operation names are: null, getattr, setattr, root, lookup, readlink, read, wrcache, write, create, remove, rename, link, symlink, mkdir, rmdir, readdir, fsstat, access, commit, fsinfo, mknod, pathconf, and readdirplus. The total percentages must add up to 100 percent.
7. ACCESS_PCNT : This sets the percentage of the files created on the server which will be accessed for I/O operations (i.e. will be read or written). This must be left unmodified for a result to be submitted for publication.
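Before experimenting with a custom mix, it can help to check the file against the rules just listed. The following Python sketch is a stand-alone checker written for this purpose, not the parser in sfs_c_man.c. The function name parse_mixfile is hypothetical, the percentages in the sample are purely illustrative, and two details are assumptions: the operation name and percentage are separated by whitespace, and blank lines are ignored.

```python
import re

HEADER = "SFS MIXFILE VERSION 2"
OP_NAMES = {
    "null", "getattr", "setattr", "root", "lookup", "readlink", "read",
    "wrcache", "write", "create", "remove", "rename", "link", "symlink",
    "mkdir", "rmdir", "readdir", "fsstat", "access", "commit", "fsinfo",
    "mknod", "pathconf", "readdirplus",
}
# Operation name, whitespace (assumed), one to three digits, percent sign.
OP_LINE = re.compile(r"^([a-z]+)\s+(\d{1,3})%$")


def parse_mixfile(text):
    """Return {operation: percent}, raising ValueError on a format error."""
    lines = text.splitlines()
    if not lines or lines[0] != HEADER:
        raise ValueError("first line must be exactly " + repr(HEADER))
    mix = {}
    for line_no, line in enumerate(lines[1:], start=2):
        if line.startswith("#") or not line.strip():
            continue  # comment, or blank line (ignoring blanks is an assumption)
        match = OP_LINE.match(line.strip())
        if not match or match.group(1) not in OP_NAMES:
            raise ValueError(f"line {line_no}: cannot parse {line!r}")
        mix[match.group(1)] = int(match.group(2))
    if sum(mix.values()) != 100:
        raise ValueError(f"percentages add up to {sum(mix.values())}, not 100")
    return mix


# Illustrative percentages only; this is not a SPEC-defined mix.
sample = """SFS MIXFILE VERSION 2
lookup 50%
# reads and writes
read 30%
write 20%
"""
mix = parse_mixfile(sample)
print(mix)
```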
8. DEBUG : This turns on debugging messages to help you understand why the benchmark is not working. The syntax is a list of comma-separated values or ranges, turning on debugging flags. A range is specified as a low value, a hyphen, and a high value (e.g. “3-5” turns on flags 3, 4, and 5), so the value “3,4,8-10” turns on flags 3, 4, 8, 9, and 10.
To truly understand what gets reported with each debugging flag, you need to read the source code. The messages are terse, cryptic, and not meaningful without really understanding what the code is trying to do. Note the child debugging information will only be generated by one child process, the first child on the first client system. This must not be modified for a valid submission.
| Value | Name of flag | Comment |
|---|---|---|
| 1 | DEBUG_NEW_CODE | Obsolete and unused |
| 2 | DEBUG_PARENT_GENERAL | Information about the parent process running on each client system |
| 3 | DEBUG_PARENT_SIGNAL | Information about signals between the parent process and child processes |
| 4 | DEBUG_CHILD_ERROR | Information about failed NFS operations |
| 5 | DEBUG_CHILD_SIGNAL | Information about signals received by the child processes |
| 6 | DEBUG_CHILD_XPOINT | Every 10 seconds, the benchmark checks its progress versus how well it is supposed to be doing (for example, verifying it is hitting the intended operation rate). This option gives you information about each checkpoint |
| 7 | DEBUG_CHILD_GENERAL | Information about the child in general |
| 8 | DEBUG_CHILD_OPS | Information about operation starts, stops, and failures |
| 9 | DEBUG_CHILD_FILES | Information about what files the child is accessing |
| 10 | DEBUG_CHILD_RPC | Information about the actual RPCs generated and completed by the child |
| 11 | DEBUG_CHILD_TIMING | Information about the amount of time a child process spends sleeping to pace itself |
| 12 | DEBUG_CHILD_SETUP | Information about the files, directories, and mix percentages used by a child process |
| 13 | DEBUG_CHILD_FIT | Information about the child's algorithm to find files of the appropriate size for a given operation |
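To check which flags a given DEBUG value turns on, the comma and range syntax can be expanded with a few lines of Python. This sketch only mirrors the syntax described above and the flag values in the table; it is not the benchmark's own parser, and the function name expand_debug is hypothetical.

```python
FLAG_NAMES = {
    1: "DEBUG_NEW_CODE", 2: "DEBUG_PARENT_GENERAL", 3: "DEBUG_PARENT_SIGNAL",
    4: "DEBUG_CHILD_ERROR", 5: "DEBUG_CHILD_SIGNAL", 6: "DEBUG_CHILD_XPOINT",
    7: "DEBUG_CHILD_GENERAL", 8: "DEBUG_CHILD_OPS", 9: "DEBUG_CHILD_FILES",
    10: "DEBUG_CHILD_RPC", 11: "DEBUG_CHILD_TIMING", 12: "DEBUG_CHILD_SETUP",
    13: "DEBUG_CHILD_FIT",
}


def expand_debug(value):
    """Expand a DEBUG value such as '3,4,8-10' into a sorted list of flags."""
    flags = set()
    for part in value.split(","):
        part = part.strip()
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            if low > high:
                raise ValueError(f"bad range {part!r}")
            flags.update(range(low, high + 1))
        else:
            flags.add(int(part))
    unknown = sorted(f for f in flags if f not in FLAG_NAMES)
    if unknown:
        raise ValueError(f"unknown debug flags: {unknown}")
    return sorted(flags)


flags = expand_debug("3,4,8-10")
print(flags)                               # [3, 4, 8, 9, 10]
print([FLAG_NAMES[f] for f in flags])
```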
The following are things that one may wish to adjust to obtain the maximum throughput for the SUT.
CHAPTER 3 SFS tools
This section briefly describes the usage of the run tools provided with the SPEC System File Server (SFS) Release 3.0 suite. These tools provide both a novice mode (query driven) and an advanced mode (menu driven) interface, with scripts that set up the environment, set various benchmark parameters, compile the benchmark, conduct benchmark validation, execute the benchmark, view results from a run, and archive the results. The results obtained from multiple data points within a run are also collected in a form suitable for use with other result formatting tools. These tools are used on the primary load generator (Prime-Client) for benchmark setup and control as well as on the rest of the NFS load generators (clients) to assist in compiling the programs.
While not required to run the benchmark, the SFS tools can facilitate the “quick” running of the benchmark for tuning various components of the system and results reporting.
This section does not cover the complete Client-Server environment setup in detail. It touches only the portions currently handled by the tools. For information on how to set up and run the SFS suite the reader is advised to refer to the section on running SFS above.
The SFS Benchmark uses the UNIX “Makefile” structure (similar to other SPEC Suites) to build tools, compile the benchmark source into executables, and to clean directories of all executables. If you are familiar with other SPEC suites, navigating around SFS should be very similar. It is important to note that unlike other SPEC benchmarks, SPECsfs's validation and execution functions are built into the “sfs_mgr” script suppli