Defects in SFS 2.0 Which Affect the Working-Set

by Stephen Gold, SFS vice-chair

FINAL DRAFT, last updated 19 July 2001

1.0 Executive Summary

Significant defects have recently been discovered in the SFS 2.0 benchmark suite, causing it to be withdrawn. Each of these defects tends to reduce the size of the benchmark's working-set as the number of load-generating processes is reduced. Since test-sponsors have great freedom in deciding how many processes to use, the resulting variations in the working-set hamper meaningful comparison of SFS 2.0 test results.

Defect #1 manifests when the requested ops/sec/process is in the hundreds. In observed incidents, the working-set of the benchmark was reduced by as much as 27%.

Defect #2 manifests whenever the requested ops/sec/process is 26 or more. The working-set of the benchmark may be reduced by a factor of three or more.

Defect #3 manifests only when the requested ops/sec/process is 500 or more. The working-set of the benchmark may be reduced by two orders of magnitude or more.

Defect #4: Even if the first three defects were corrected, the cache profile of the working-set would still vary with the number of load-generating processes used. As processes are reduced, the fileset distribution narrows.

Because of these defects, many of the published SFS 2.0 results are not comparable. Comparability exists only for results that were run on servers which would have cached the intended working-set of the benchmark. Based on simulation, it is believed that 115 out of the 248 approved SFS 2.0 results meet this criterion for comparability.

2.0 Technical Background

The SFS 2.0 benchmark suite was released by SPEC in December 1997. SFS 2.0 is used to evaluate and compare the performance of fileservers under steady NFS loads. SFS 2.0 was withdrawn by the SPEC Open Systems Group in June 2001 due to the defects which are the subject of this document.

Important features of SFS 2.0 included:

  • coordination of multiple load-generating clients
  • coordination of multiple load-generating processes on each client
  • portability between different load-generating clients
  • a dataset which grows in direct proportion to the requested load
  • adaptive regulation of the load generated by each process
  • measurement of both achieved throughput and average response-time
  • a 300-second warmup phase before each measurement phase
  • run-rules intended to create a level playing-field for competition
  • reporting rules intended to provide all information needed to reproduce each published result

The SFS 2.0 suite consists of two workloads, SPECsfs97.v2 and SPECsfs97.v3. The differences between the two workloads boil down to the operation-mixes used. Data-set generation, file selection, and rate regulation are done exactly the same way for both workloads. Thus, defects in these areas affect both workloads.

Much of the remainder of this discussion assumes that the reader is a licensee with access to the source-code. But sufficient detail is presented to allow a non-licensee to grasp the broad outlines of how the benchmark works and what went wrong.

2.1 Working-Set

The working-set of an NFS workload over an interval is the number of distinct filesystem blocks accessed during the interval.

All modern NFS servers have some ability to cache frequently-accessed filesystem blocks in their buffer cache for later reuse. This is done to reduce the probability that any particular access will require a (costly) disk read.

On many servers, the buffer cache is smaller than the working-set of the SFS 2.0 benchmark over its 300-second measurement phase. (One reason for the 300-second warmup phase in SFS 2.0 is to warm up the cache sufficiently to approximate the steady-state behavior.) For such servers, changes to the working-set size can have a large impact on the number of disk operations, the utilization of the CPU and disks, achievable throughput, and average response-times.

2.2 How the SFS 2.0 dataset is Generated

The design intent of SFS 2.0 was that the dataset would be 10 MBytes per op/s of requested load.

Looking at the implementation (sfs_c_chd.c) we see that each load-generating process has its own dataset, rooted in its own subdirectory, with no sharing between processes. The dataset of each process consists of four so-called "working sets". Each "working set" is used for a different group of NFS operations:

Io_working_set
used for "I/O operations": GETATTR, SETATTR, READ, WRITE, FSSTAT, ACCESS, and non-failing LOOKUP operations
Non_io_working_set
used for CREATE, REMOVE, and failing LOOKUP operations
Dir_working_set
used for READDIR and READDIRPLUS operations
Symlink_working_set
used for READLINK operations

In this paper we are mainly concerned with the Io_working_set.

The code that sizes the "working sets" is in sfs_c_man.c. Num_io_files gets set to roughly 390 times the requested op/sec for the process. The INIT phase ensures that this number of "I/O" files (with names like file_en.00001) get created on the server. Since the average file size is about 27 KBytes, this works out to about 10 MBytes per requested op/sec.

Note that the achieved ops/sec reported in the benchmark disclosure generally differs slightly from the requested ops/sec.

2.3 How "I/O" Files are Selected

The design intent of SFS 2.0 was that only 10% of the files would be accessed in the course of a run, and furthermore that the files would be accessed according to a Poisson distribution, causing some files to be accessed much more frequently than others.

Num_working_io_files is set to roughly 10% of Num_io_files. The init_fileinfo() function in sfs_c_chd.c randomly selects Num_working_io_files of the "I/O" files to go into the Io_working_set.

To facilitate generation of the Poisson distribution, the Io_working_set is divided into eight or more access-groups of roughly equal size. An access-group in the Io_working_set typically contains about 100 files. The number of access-groups is always a multiple of four. To be precise:

   group_cnt = 8 + ((Num_working_io_files/500) * 4)
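
The sizing chain from request rate to group count can be sketched as follows. This is a minimal illustration, not the code in sfs_c_man.c: the helper names and the constants FILES_PER_OP and WORKING_PERCENT are assumptions that simply encode the "roughly 390" and "roughly 10%" figures given above.

```c
#include <assert.h>

/* Sizing constants as described in the text: "roughly 390" I/O files per
 * requested op/sec, of which "roughly 10%" form the working set. Both are
 * approximations (assumptions), not the exact expressions in sfs_c_man.c. */
enum { FILES_PER_OP = 390, WORKING_PERCENT = 10 };

/* Hypothetical helper: approximate Num_working_io_files for one process. */
int working_io_files(int ops_per_proc)
{
    assert(ops_per_proc > 0);
    return ops_per_proc * FILES_PER_OP * WORKING_PERCENT / 100;
}

/* The group_cnt formula quoted above. The division truncates, so the
 * count stays fixed over a range of request rates and then jumps by four. */
int access_group_count(int ops_per_proc)
{
    return 8 + (working_io_files(ops_per_proc) / 500) * 4;
}
```

Under these assumptions, access_group_count(26) yields 16 groups and access_group_count(500) yields 164 groups, which agrees with the group counts quoted elsewhere in this report.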

Each access-group is assigned a relative weight, called "next_value". The sum of the weights from 0 to i is stored in Io_working_set.entries[i].range. Then the sum of all the weights is stored in Io_working_set.max_range.

When a load-generating process wants to generate an NFS operation, it calls the do_op() function in sfs_c_chd.c, which calls op(). If the operation chosen by do_op() is one which uses the Io_working_set, op() calls randfh() with file_type=Sfs_io_file. Some operations (like READ and WRITE) may call randfh() more than once, until they find a file with particular properties.

In order to pick an access-group, randfh() generates a random value between 0 and (Io_working_set.max_range - 1), inclusive. It then does a binary search over the access-groups, starting at group (group_cnt/2 - 1) and proceeding until it finds a group such that either:

  • Io_working_set.entries[group - 1].range < value && value <= Io_working_set.entries[group].range

or

  • group == 0 && value <= Io_working_set.entries[group].range

If the random value were uniformly distributed and the ".range" fields were monotonically non-decreasing, the probability of selecting a particular group would be:

   (Io_working_set.entries[group].range - Io_working_set.entries[group - 1].range)/Io_working_set.max_range  (if group > 0)

or 

   Io_working_set.entries[0].range/Io_working_set.max_range  (if group = 0)

which would equal next_value/Io_working_set.max_range . (Recall that a different next_value is computed for each group.)
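
The intended behavior of the cumulative ranges can be sketched as follows. This is a simplified stand-in rather than the benchmark's randfh(): the names build_ranges and pick_group are hypothetical, and the search is a textbook lower-bound binary search, not the benchmark's own loop. It assumes the ranges are monotonically non-decreasing.

```c
#include <assert.h>
#include <stddef.h>

/* Fill range[i] with the sum of weight[0..i], as described for the
 * ".range" fields. Returns the total of all weights (max_range). */
int build_ranges(const int *weight, int *range, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += weight[i];
        range[i] = sum;
    }
    return sum;
}

/* Return the lowest-numbered group whose range is >= value. For any group
 * above 0 this is exactly the group with range[group - 1] < value and
 * value <= range[group]. */
size_t pick_group(const int *range, size_t n, int value)
{
    assert(n > 0 && value <= range[n - 1]);
    size_t lo = 0, hi = n - 1;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (range[mid] >= value)
            hi = mid;          /* answer is mid or something below it */
        else
            lo = mid + 1;      /* answer is strictly above mid */
    }
    return lo;
}
```

With weights {1, 2, 0, 7} the ranges are {1, 3, 3, 10}. A group whose weight is zero shares its range with its predecessor, so no value from 1 upward ever selects it.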

In order to pick a file, randfh() generates random offsets within the selected group. If the operation is a READ or a non-append WRITE, randfh() keeps generating offsets until it finds a file that is large enough for the current operation.

2.4 How Request Rates are Regulated

Between calls to do_op(), each load-generating process sleeps by calling msec_sleep(). Each process regulates its request rate independently of the other processes. This is done by periodically adjusting the average sleep-time, Target_sleep_mspc. The goal of the adjustment is to find a value of Target_sleep_mspc which causes the process to generate its share of the requested total rate, which is Child_call_load.

The same adjustment algorithm is used for both the warmup phase and the measurement phase, just with different parameters. It is fairly simple. A process starts the warmup phase with Target_sleep_mspc calculated based on sleeping 50% of the time. After a period of time has elapsed, the process calls check_call_rate() to adjust Target_sleep_mspc.

The check_call_rate() function begins by comparing actual requests generated since start of the current phase (Reqs_this_test) with a target based on multiplying the Child_call_load by elapsed time in the current phase (elapsed_time). The difference between the target and actual requests is added to the normal request count for the upcoming check-period (req_target_per_period) and that is the number of requests to be attempted in the upcoming check-period. In other words, the process attempts to be completely caught up by the end of the upcoming check-period.

The check_call_rate() function sets Target_sleep_mspc to reflect the number of requests to be attempted, the duration of a check-period, and the expected work time per request in the upcoming check-period. Since the actual work times are not known, the code substitutes (as an approximation) the average work time for all requests so far in the current phase.
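
The catch-up rule can be sketched as follows. This is an illustration under stated assumptions, not the body of check_call_rate(): the function names are hypothetical, and the sleep formula in target_sleep_ms() is only one plausible way to combine the three quantities named above.

```c
#include <assert.h>

/* Requests to attempt in the upcoming check-period: the normal per-period
 * count plus the accumulated deficit (or minus the accumulated surplus). */
double requests_to_attempt(double child_call_load,  /* requested ops/sec */
                           double elapsed_sec,      /* time so far in phase */
                           double reqs_this_test,   /* requests so far */
                           double period_sec)       /* check-period length */
{
    double target_so_far = child_call_load * elapsed_sec;
    double req_target_per_period = child_call_load * period_sec;
    double attempt = req_target_per_period + (target_so_far - reqs_this_test);
    return attempt > 0.0 ? attempt : 0.0;   /* cannot attempt fewer than zero */
}

/* Average sleep per request in msec: the time left in the period after the
 * expected work time, spread over the requests. ASSUMPTION: illustrative
 * formula, not the expression used by the benchmark. */
double target_sleep_ms(double attempt, double period_sec, double avg_work_ms)
{
    assert(period_sec > 0.0);
    double period_ms = period_sec * 1000.0;
    if (attempt <= 0.0)
        return period_ms;                   /* nothing to do: sleep all period */
    double idle_ms = period_ms - attempt * avg_work_ms;
    return idle_ms > 0.0 ? idle_ms / attempt : 0.0;
}
```

For example, a process asked for 100 ops/sec that has made 400 requests after one 2-second period gets requests_to_attempt(100.0, 2.0, 400.0, 2.0) equal to zero for the next period, since the surplus of 200 cancels the normal quota of 200.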

During the warmup phase, each check-period lasts for two seconds. Since the warmup phase is 300 seconds, a total of 150 checks are made during the warmup phase. By that time the Target_sleep_mspc has hopefully converged to a stable value. During the measurement phase (which lasts for another 300 seconds) the checking and adjustments continue, but the check-period is increased to 10 seconds.

The load-generating process does not attempt to sleep for exactly Target_sleep_mspc each time. Instead, it generates a random number of milliseconds (rand_sleep_msec) which is distributed uniformly between roughly 50% and 150% of Target_sleep_mspc. This randomized sleep interval is passed to msec_sleep(), which in turn calls the library function select(). The timeout argument passed to select() is based directly on rand_sleep_msec.
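
A minimal sketch of the randomized sleep follows. The function names are hypothetical and the random draw is passed in by the caller so that the arithmetic is easy to check; the benchmark's own random-number source and rounding are not reproduced here. The select() call with no descriptors is the standard POSIX idiom for a sub-second sleep.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

/* Map a random draw onto the interval [target/2, target/2 + target], i.e.
 * roughly 50% to 150% of the target sleep time. */
int jittered_sleep_ms(int target_ms, unsigned draw)
{
    if (target_ms <= 0)
        return 0;
    return target_ms / 2 + (int)(draw % ((unsigned)target_ms + 1u));
}

/* Sleep for msec milliseconds using select() with an empty descriptor set.
 * Returns what select() returns: 0 on timeout, -1 on error. */
int sleep_ms(int msec)
{
    struct timeval tv;

    assert(msec >= 0);
    tv.tv_sec = msec / 1000;
    tv.tv_usec = (msec % 1000) * 1000;
    return select(0, NULL, NULL, NULL, &tv);
}
```

A caller would combine the two as sleep_ms(jittered_sleep_ms(target, rand())), with rand() standing in for whatever generator is actually used.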

3.0 Description of the Defects

3.1 Defect #1: request-rate regulation is unstable, causing processes to sleep for many seconds, aka "The oscillation defect"

discovered May 7, 2001

The rate adjustment algorithm described in Section 2.4 is highly aggressive. For instance, if the process generates twice the requested rate during the first check-period of the warmup phase, it will try to recover completely during the second check-period. To do so, it will attempt to make zero requests during that period. This is rather extreme considering that there are still 149 periods remaining during the warmup phase!

Because the adjustment algorithm is so aggressive, it tends to be unstable, particularly when the process is asked to generate a large number of requests per second.

The instability of the adjustment algorithm is exacerbated by difficulties in accurately controlling the interval between requests. On many UNIX systems, the select() timeout resolution is one clock-tick, which typically means 10 msec. For instance, msec_sleep(1) is likely to sleep for at least 10 msec, for an error of >900% with respect to the requested interval!
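
The effect of timer resolution can be illustrated with a little arithmetic. The sketch below assumes that a requested timeout is simply rounded up to a whole number of clock ticks; real kernels may add a further tick or behave differently, and all function names are hypothetical.

```c
#include <assert.h>

/* ASSUMPTION: the timeout is rounded up to the next multiple of the tick. */
int quantized_sleep_ms(int requested_ms, int tick_ms)
{
    assert(requested_ms >= 0 && tick_ms > 0);
    return (requested_ms + tick_ms - 1) / tick_ms * tick_ms;
}

/* Error of the actual sleep relative to the requested sleep, in percent. */
double sleep_error_percent(int requested_ms, int tick_ms)
{
    assert(requested_ms > 0);
    int actual_ms = quantized_sleep_ms(requested_ms, tick_ms);
    return 100.0 * (actual_ms - requested_ms) / requested_ms;
}

/* Upper bound on the request rate of one process when every request costs
 * work_ms of work plus one quantized sleep. */
double max_calls_per_sec(int requested_ms, double work_ms, int tick_ms)
{
    double cycle_ms = quantized_sleep_ms(requested_ms, tick_ms) + work_ms;
    assert(cycle_ms > 0.0);
    return 1000.0 / cycle_ms;
}
```

Under this assumption, a 1 msec request on a 10 msec tick sleeps 10 msec (a 900% error), and a process sleeping that way cannot exceed about 100 requests per second no matter how fast the server responds.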

If the adjustment algorithm sees that the process is falling behind with Target_sleep_mspc=1, it is likely to try Target_sleep_mspc=0. This causes a sudden jump in the request-rate, lasting for an entire period. This can put the process so far ahead that it will stall for the period after that.

Furthermore, SFS 2.0 tests are usually run with more load-generating processes than CPUs. This means that a process whose select() timeout has expired may have to wait for another process to relinquish the CPU before it can return from select() and generate another request. This increases the unpredictability of the msec_sleep() function and tends to introduce further instability.

The instability, when present, can be observed by enabling the CHILD_XPOINT debugging code in check_call_rate(), which can be accomplished by setting DEBUG=6 in the sfs_rc file. If you see debug output containing the string " 0 call, rqnd " then you will know that one of the load-generating processes generated zero calls for that check-period.

The zero-call behavior has been observed both during warmup phases and measurement phases. With respect to measurement phases, the behavior has been observed for request-rates ranging from 200 to 400 ops/sec/process, affecting 6-27% of the adjustment periods. It is possible that the defect could occur for request-rates outside this range.

When a load-generating process experiences a zero-call period during the measurement phase, its portion of the fileset receives no accesses for ten seconds. Over that interval, the working-set of the benchmark is reduced by 1/N from the intended working-set, where N is the total number of load-generating processes.

Smaller oscillations in the request-rate, say from 50% to 150% of the requested rate, would also tend to reduce the working-set, though the effect is harder to quantify.

To summarize:

When the requested ops/sec/process is in the hundreds, instability in the rate regulation algorithm can cause the request-rate of a process to oscillate wildly. Sometimes the request-rate of a process goes to zero for ten seconds at a time. The frequency and severity of the problem are poorly understood. In observed incidents, the working-set of the benchmark was reduced by 6%-27%.

3.2 Defect #2: I/O access-group probabilities can round to zero, aka "The distribution defect"

discovered May 7, 2001

The next_value variable (i.e., the weight given to access-group i) is calculated as:

   next_value = (int) (1000 * (cumulative_lambda_power /
                     (e_to_the_lambda * cumulative_x_factorial)))

where:

  • lambda = group_cnt / 2
  • cumulative_lambda_power is lambda to the ith power

and

  • cumulative_x_factorial is i!.

Whenever the requested ops/sec/process is in the range 26 to 38, there will be 16 access groups, lambda = 8, and e_to_the_lambda = 2980.96. This causes access-group #0 to have next_value of:

      (int) (1000 * (1.0 / (2980.96 * 1.0))) = (int) 0.335 = 0

So Io_working_set.entries[0].range == 0, which means that group #0, containing roughly 6% of the Io_working_set files, is never selected by randfh(Sfs_io_file). In effect, the probability of access, which should have been 0.03%, got rounded down to zero.
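
The truncation can be reproduced with a short sketch. It is not the benchmark's code: the function name is hypothetical, the factorial term is taken to be i! (which is what the worked values in this report imply: 1.0 for group 0 and about 1.23e+289 for group 162), and e_to_the_lambda is built by repeated multiplication because lambda is a whole number. The sketch is limited to 160 groups or fewer, where no NaN arises.

```c
#include <assert.h>

/* Fill weight[0..group_cnt-1] with the truncated Poisson weights and return
 * how many of them came out as zero (the inaccessible access-groups). */
int poisson_weights(int group_cnt, int *weight)
{
    assert(group_cnt > 0 && group_cnt <= 160);

    int lambda = group_cnt / 2;
    double e_to_the_lambda = 1.0;
    for (int k = 0; k < lambda; k++)
        e_to_the_lambda *= 2.718281828459045;   /* e, to double precision */

    double cumulative_lambda_power = 1.0;       /* lambda^0 */
    double cumulative_x_factorial = 1.0;        /* 0! */
    int zero_groups = 0;

    for (int i = 0; i < group_cnt; i++) {
        if (i > 0) {
            cumulative_lambda_power *= lambda;  /* now lambda^i */
            cumulative_x_factorial *= i;        /* now i! */
        }
        /* The cast truncates toward zero, so any weight below 1.0 is lost. */
        weight[i] = (int)(1000 * (cumulative_lambda_power /
                          (e_to_the_lambda * cumulative_x_factorial)));
        if (weight[i] == 0)
            zero_groups++;
    }
    return zero_groups;
}
```

Called with 16 groups, this sketch reports one zero-weight group (group #0), and with 8 or 12 groups it reports none.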

As the request-rates increase, so do group_cnt, lambda, and e_to_the_lambda. next_value=0 occurs for more and more access-groups, making them inaccessible. This phenomenon starts at the margins of the distribution (groups 0 and group_cnt-1, where the Poisson probabilities are lowest) and spreads inward toward the mode, as illustrated in the following table:

Requested    Number of I/O access-groups:       Inaccessible
ops/proc     total  inaccessible  accessible    I/O access-groups:
  1-12          8     0  (0%)          8         none
 13-25         12     0  (0%)         12         none
 26-38         16     1  (6%)         15         {0}
 39-51         20     2 (10%)         18         {0-1}
 52-64         24     3 (13%)         21         {0-2}
 65-76         28     5 (18%)         23         {0-3, 27}
 77-89         32     8 (25%)         24         {0-5, 30-31}
 90-102        36    11 (31%)         25         {0-6, 32-35}
103-115        40    13 (33%)         27         {0-7, 35-39}
116-128        44    17 (39%)         27         {0-9, 37-43}
129-141        48    19 (40%)         29         {0-10, 40-47}
142-153        52    22 (42%)         30         {0-11, 42-51}
 ...
193-205        68    35 (51%)         33         {0-18, 52-67}
 ...
295-307       100    60 (60%)         40         {0-30, 71-99}
 ...
398-410       132    87 (66%)         45         {0-44, 90-131}
411-423       136    90 (66%)         46         {0-45, 92-135}
424-435       140    93 (66%)         47         {0-47, 95-139}
436-448       144    97 (67%)         47         {0-49, 97-143}
 ...
475-487       156   107 (69%)         49         {0-54, 104-155}
488-499       160   111 (69%)         49         {0-56, 106-159}

The "inaccessible" groups in this table are just the ones which have next_value=0.

Here is the same data presented in graphical form:

Note that below 26 requested ops/sec/proc, all the access-groups are accessible and this defect has no effect.

The trend would continue past 500 requested ops/sec/proc, except that at that point the existence of defect #3 complicates the issue.

Inaccessible I/O access-groups reduce the number of files accessed from what the benchmark intended. Of the "Files accessed for I/O operations" printed in the client log, it might be the case that only 1/3 were truly accessible via those operations. Thus the I/O operations get concentrated over a smaller set of files than was intended.

The inaccessible files are precisely the ones that would be accessed least frequently in a correct Poisson distribution. On the other hand, these files are also the ones that are most likely to miss in the server's buffer cache, so their inaccessibility could have a relatively large impact on buffer-cache miss-rates. Buffer-cache miss-rates could affect the average response-times reported by the benchmark as well as the peak throughputs.

Don Capps of Hewlett-Packard has created a simulation of the SFS 2.0 benchmark which can quickly determine, for a given SFS 2.0 result, what fraction of the Io_working_set access-groups were inaccessible due to this defect.

To summarize:

Whenever the requested ops/sec/process is 26 or more, there are some I/O access-groups which cannot be accessed because their probability of access is zero. The number of inaccessible access-groups generally increases as the request-rate increases. At 475 requested ops/sec/process, the working-set of the "I/O operations" in SFS 2.0 is reduced by more than a factor of three.

3.3 Defect #3: I/O access-group ranges can be negative, aka "The floating-point overflow defect"

discovered May 7, 2001

Recall that the next_value (the weight of access-group i) is calculated as:

   next_value = (int) (1000 * (cumulative_lambda_power /
                     (e_to_the_lambda * cumulative_x_factorial)))

All three variables in the right-hand side of the assignment are double-precision. If the load-generating client implements IEEE Standard 754 floating point (as most do) the largest accurately-represented value is roughly 2e+308.

For 500 requested ops/sec/process, there are 164 groups and lambda=82. Something strange happens for group #162, since cumulative_lambda_power (82 to the 162nd power) is roughly 1e+310, which is represented as Infinity. The denominator is also Infinity, since e_to_the_lambda (4.094e+35) times cumulative_x_factorial (1.22969e+289) also overflows 2e+308. So the quotient is Infinity divided by Infinity, or NaN (not-a-number). 1000 times NaN is still NaN. And converting NaN to an int results in (2^31 - 1) or 2147483647.

Now, the "range" values for each access-group are declared as ints. It so happens that previous_range = Io_working_set.entries[161].range = 970.

Adding 2147483647 and 970 using 32-bit twos-complement arithmetic (which is what most compilers generate for ints) yields -2147482679, which is stored in Io_working_set.entries[162].range.

Since the NaN also occurs for i=163, next_value is (2^31 - 1) again. Io_working_set.entries[163].range and Io_working_set.max_range both get set to 968.
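
The arithmetic above can be checked with the following sketch. Two behaviors are emulated explicitly rather than relied upon, because ISO C leaves both undefined: converting NaN to int (assumed here to yield 2^31 - 1, as reported above) and signed 32-bit overflow (assumed to wrap). All function names are hypothetical, and the factorial term is taken to be i!, as the worked values imply.

```c
#include <assert.h>
#include <limits.h>
#include <math.h>
#include <stdint.h>

/* Unrounded weight of access-group i, computed in double precision. */
double raw_weight(int i, int lambda)
{
    assert(i >= 0 && lambda > 0);
    double e_to_the_lambda = 1.0, power = 1.0, factorial = 1.0;
    for (int k = 0; k < lambda; k++)
        e_to_the_lambda *= 2.718281828459045;
    for (int k = 1; k <= i; k++) {
        power *= lambda;        /* may overflow to Infinity */
        factorial *= k;
    }
    /* Infinity / Infinity is NaN; finite / Infinity is 0. */
    return 1000 * (power / (e_to_the_lambda * factorial));
}

/* ASSUMPTION: NaN converts to INT_MAX, as observed on the clients studied. */
int weight_to_int(double w)
{
    if (isnan(w) || w >= (double)INT_MAX)
        return INT_MAX;
    return (int)w;
}

/* Add nan_count weights of 2^31 - 1 to a range with 32-bit wraparound.
 * The final conversion back to int32_t wraps on common compilers. */
int32_t range_after_nans(int32_t previous_range, int nan_count)
{
    uint32_t range = (uint32_t)previous_range;
    for (int k = 0; k < nan_count; k++)
        range += (uint32_t)INT32_MAX;
    return (int32_t)range;
}
```

With lambda = 82, raw_weight(162, 82) is NaN, and range_after_nans(970, 1) and range_after_nans(970, 2) give -2147482679 and 968 respectively, the values quoted above for groups #162 and #163.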

Now one of the assumptions of the binary search algorithm (namely, that the "range" fields are monotonically non-decreasing) has been violated for the last two access-groups. This violation is the basis for defect #3.

For 164 groups this defect is not a serious matter, since the binary search starts at group=81 and never reaches the last two access-groups for any random value between 0 and 967.

As the request-rate increases beyond 512 requested ops/sec/process, so do group_cnt, lambda, and e_to_the_lambda. next_value=NaN occurs for more and more access-groups. As long as the NaNs are confined to the highest access-groups and there are an even number of NaNs, this defect has little or no effect.

However, if the number of NaNs happens to be odd instead of even (as happens when there are 168 or 176 groups) then Io_working_set.max_range will be a large negative integer. In this case the random value generated will be a positive number in the range 0 to 1-max_range. The effect is to cause the vast majority of accesses to go to one or two access-groups.

For example, consider a process with 168 access-groups. There are seven NaNs and max_range = -2147482689. The binary search algorithm picks a value in the range 0 to 2147482690. The binary search algorithm starts at group=83, which has Io_working_set.entries[83].range = 469.

By far the most likely scenario is that the random value is greater than 966, in which case the binary search examines groups 125, 146, 157, 162, 165, and 166, before settling on group=167. Over the course of a 300-second measurement phase (in which the process generates on the order of a million I/O operations) the expected number of random values between 0 and 966 is less than 1. In effect, the working-set for I/O operations has been reduced to a single access-group.

For exactly 304 access-groups, max_range is zero, causing a division by zero error which terminates the benchmark with a core dump.

When the number of access-groups exceeds 304, the NaNs invade the middle access-groups, where the binary search starts. Even if the number of NaNs is even, the binary search algorithm can get seriously confused by them. Once again, the effect is to cause the vast majority of accesses to go to one or two access-groups.

For example, consider a process with 324 access-groups. This time there are 184 NaNs and max_range=-162. The binary search algorithm picks a value in the range 0 to 161. The binary search algorithm starts at group=161. But:

   Io_working_set.entries[161].range = 0 and
   Io_working_set.entries[162].range = 2147483647.

By walking through the code, you can see that group #161 is selected for value=0 and group #162 is selected for 0 < value < 162:

        if (work_set->entries[group].range == value)
            break;
        if (work_set->entries[group].range > value) {
            ...
        } else if (work_set->entries[group].range < value) {
            if (work_set->entries[group+1].range > value) {
                group++;
                break;           

Thus, out of 324 groups, only two groups are accessible, and group #162 gets over 99% of the I/O accesses.
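
The walk-through can be replayed with a small sketch that covers only the branches quoted in the excerpt. The function name first_probe and the BRANCH_NOT_QUOTED marker are hypothetical; any path that falls into code not shown above is reported as such rather than guessed at.

```c
#include <assert.h>

enum { BRANCH_NOT_QUOTED = -1 };

/* Apply the quoted comparisons once, at the group where the search starts.
 * Returns the selected group, or BRANCH_NOT_QUOTED if the decision depends
 * on code that the excerpt elides. */
int first_probe(const int *range, int group, int value)
{
    assert(group >= 0);
    if (range[group] == value)
        return group;                 /* exact match ends the search */
    if (range[group] > value)
        return BRANCH_NOT_QUOTED;     /* body elided in the excerpt */
    if (range[group + 1] > value)
        return group + 1;             /* value lies just above this group */
    return BRANCH_NOT_QUOTED;
}
```

With entries[161].range = 0 and entries[162].range = 2147483647, the probe at group 161 returns 161 for value 0 and 162 for every value from 1 to 161.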

Don Capps's simulator calculates the group-access probabilities, generates random numbers, and performs the binary search using algorithms equivalent to those in the SFS 2.0 load generator. The number of accesses simulated is always 300 times the requested ops/sec. This is slightly unrealistic because (for various reasons) the actual number of I/O accesses per requested op is not really unity. Nevertheless the simulator provides a very convincing illustration of how the number of I/O access-groups actually accessed varies for different numbers of requested ops/sec/process.

The following table, generated using the simulator, shows how the number of groups with next_value=0 and next_value=NaN varies for selected numbers of access-groups. The last column shows how many access-groups were actually selected by the binary search algorithm after it had been invoked millions of times.

Requested    number of I/O access-groups:
ops/proc     total  next_value=0  next_value=NaN  selected via binary search
488-499       160      111              0              49 (31%)
500-512       164      112              2              49 (30%)
513-525       168      111              7               2  (1%) 
526-538       172      109             12              46 (27%)
539-551       176      108             17               2  (1%)
552-564       180      106             22              45 (25%)
 ...
590-602       192      103             36              44 (23%)
 ...
629-641       204       99             50              44 (22%)
 ...
667-679       216       96             56              30 (14%)
680-692       220       96             68              30 (14%)
693-705       224       94             73               2 (<1%)
706-717       228       93             78              32 (14%)
718-730       232       92             82              33 (14%)
 ...
757-769       244       95             96              32 (13%)
770-782       248       96            100              32 (13%)
 ...
795-807       256      100            109               3  (1%)
 ...
885-897       284      113            140              28 (10%)
898-910       288      113            145               1 (<1%)
911-923       292      116            149               1 (<1%)  
924-935       296      118            153               1 (<1%)
936-948       300      120            158               1 (<1%)
949-961       304      122            162              [core-dump]
962-974       308      122            167               1 (<1%)
975-987       312      125            171               1 (<1%)
988-999       316      127            175               1 (<1%)
1000-1012     320      129            180               2 (<1%)
1013-1025     324      131            184               2 (<1%)
1026-1038     328      133            188               1 (<1%)

Here is the same data presented in graphical form:

To summarize:

NaNs are generated whenever the requested ops/sec/process is 500 or more. These NaNs cause violation of the assumptions underlying the algorithm used to select access-groups.

When the number of NaNs is even and less than 150, the number of groups accessed generally declines as the request-rate increases, reaching 28 groups at 897 requested ops/sec/process. The working-set of the "I/O operations" in SFS 2.0 is reduced by up to an order of magnitude.

When the number of NaNs is odd and/or greater than 150, the number of groups accessed is usually three or less, with most of the accesses going to a single group. The working-set of the "I/O operations" in SFS 2.0 is reduced by at least two orders of magnitude.

3.4 Defect #4: fileset distribution narrows as processes are reduced

discovered June 5, 2001

Defects #2 and #3 can be remedied by making minor changes to the software which sets up the ranges. But even if the Poisson distribution were implemented as intended, there would still be a serious problem with the benchmark. The problem lies in the way the Poisson parameter "lambda" varies with the per-process request rate.

Although buffer cache algorithms vary in detail from server to server, it is reasonable to assume that the cacheability of an access-group increases with its frequency of access. So when analyzing cache behavior, it makes sense to sort the access-groups in order of probability, so that the rank of the most frequently-accessed group is 1 and the rank of the least frequently-accessed group is N. Plotting a group's probability of access against its sorted rank produces a "cache profile" of the workload.

As the number of groups grows, the cache profile becomes more and more concentrated on the left edge of the graph. In other words, a greater and greater fraction of the accesses occur in the busiest parts of the fileset.

For a simple illustration, imagine a server which always hits in the most-frequently accessed 1/8 of the groups and always misses in the remaining 7/8 of the groups. For a target load of 50,000 ops/sec, the Io_working_set size is always roughly 50 GBytes. By simply varying the number of load-generating processes, the fraction of accesses which are to cached groups can be varied from 20% to 84%, as shown below:

Procs   Ops/proc  Groups/proc    Cached_Groups Lambda    % cached
5000       10         8            5000           4        20.6%
1500       33        16            3000           8        28.1%
 625       80        32            2500          16        38.5%
 263      190        64            2104          32        52.1%
 128      390       128            2048          64        68.3%
  62      806       256            1859         128        84.2%
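
The "% cached" column can be approximated with the sketch below. It is an independent illustration, not the program that produced the table: it assumes an ideal Poisson distribution with lambda = group_cnt/2, truncated to the existing groups and renormalized, and a cache that holds exactly the busiest one-eighth of the groups. The function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator: larger probabilities first. */
static int descending(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x < y) - (x > y);
}

/* Fraction of accesses that land in the busiest 1/8 of the access-groups.
 * Returns -1.0 if memory cannot be allocated. */
double cached_fraction(int group_cnt)
{
    assert(group_cnt >= 8);
    double lambda = group_cnt / 2;
    double *p = malloc((size_t)group_cnt * sizeof *p);
    if (p == NULL)
        return -1.0;

    /* Unnormalized Poisson terms lambda^i / i!, built by recurrence. The
     * common factor e^(-lambda) cancels when dividing by the total. */
    double total = 0.0;
    p[0] = 1.0;
    for (int i = 0; i < group_cnt; i++) {
        if (i > 0)
            p[i] = p[i - 1] * lambda / i;
        total += p[i];
    }

    qsort(p, (size_t)group_cnt, sizeof *p, descending);

    double cached = 0.0;
    for (int i = 0; i < group_cnt / 8; i++)
        cached += p[i];

    free(p);
    return cached / total;
}
```

Under these assumptions the result is about 0.206 for 8 groups and about 0.281 for 16 groups, in line with the first two rows of the table.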

The only constraints on the number of load-generating processes come from the performance capabilities of the server-under-test and compliance with the Uniform Access Rule (UAR).

Of course, if the server-under-test caches the benchmark's working-set over the entire measurement phase, then the changes in the cache profile should have little or no impact on the results.

To summarize:

Even if the other three defects were corrected, the cacheability of the working-set would still vary with the number of load-generating processes used. This variability would invalidate any comparison of SFS 2.0 results for servers that were tested with different numbers of processes, unless it could be shown that both servers were operating entirely out of cache.

4.0 Effect on Approved Results

A total of 248 SFS 2.0 results have been approved by SPEC. 247 of those results have been published on the SPEC website. (The remaining result is awaiting the vendor's decision whether or not to publish.)

Don Capps's simulator has been used to demonstrate that, for 115 of the published SFS 2.0 results, the intended working-set would have fit in the server's memory. Since the defects described in this report affect only the working-set of the benchmark, such results can be considered comparable to one another. Summarized simulation results for all 248 approved SFS 2.0 results are presented in Appendix A.

Of the 248 approved results, the requested ops/sec/process at peak ranges from 18.89 to 1000. Only one result exceeded the 499 requested ops/sec/process threshold for triggering defect #3. 230 results (93%) exceeded the 25 requested ops/sec/process threshold for triggering defect #2. Because defect #1 has not been adequately modeled, it is unknown how many of the approved results might have triggered that defect.

Of all the approved SFS 2.0 results, the five with the highest requested ops/sec/process at the peak were:

  1. 4054 SPECsfs97.v2 ops per second with an overall response time of 0.94 ms: 1000 requested ops/sec/process (not published yet)
  2. 14186 SPECsfs97.v3 ops per second with an overall response time of 1.55 ms: 447 requested ops/sec/process
  3. 10011 SPECsfs97.v3 ops per second with an overall response time of 1.24 ms: 420 requested ops/sec/process
  4. 13056 SPECsfs97.v3 ops per second with an overall response time of 1.46 ms: 406.25 requested ops/sec/process
  5. 17286 SPECsfs97.v3 ops per second with an overall response time of 1.33 ms: 402.5 requested ops/sec/process

All five of these results were submitted in 2001. Aside from these five results, no other approved results exceeded 400 requested ops/sec/process.

Using the tables in this report, one can see that:

  • Result 1 was affected by defect #3 at the peak, to the extent that only 2 of each process's 320 access-groups were actually accessed. In other words, the set of groups accessed was reduced by a factor of 160.
  • Results 2 through 5 were affected by defect #2 at the peak:
    • Result 2 was affected to the extent that only 47 each process's 144 access-groups were actually accessed. In other words, the set of groups accessed was reduced by a factor of 3.06x.
    • Result 3 was affected to the extent that only 46 each process's 136 access-groups were actually accessed. In other words, the set of groups accessed was reduced by a factor of 3.09x.
    • Results 4 and 5 were affected to the extent that only 45 of each process's 132 access-groups were actually accessed. In other words, the set of groups accessed was reduced by a factor of 2.93.
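The reduction factors in this list are simply the ratio of each process's access-groups to the number of groups actually accessed. A minimal Python sketch of that arithmetic, using the figures quoted for results 1, 2, 4 and 5 (the function name is illustrative):

```python
def reduction_factor(groups_per_process, groups_accessed):
    """Factor by which the set of accessed access-groups shrank."""
    return groups_per_process / groups_accessed


print(round(reduction_factor(320, 2), 2))   # 160.0  (result 1, defect #3)
print(round(reduction_factor(144, 47), 2))  # 3.06   (result 2, defect #2)
print(round(reduction_factor(132, 45), 2))  # 2.93   (results 4 and 5, defect #2)
```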

According to the simulator, none of these five results was obtained with a server that could operate entirely out of cache with the intended working-set. So in each case the performance of the server-under-test was exaggerated by the defects described in this report.

Many other results were affected as well. See Appendix A for more information about specific results.

5.0 Conclusions

Substantial defects have recently been discovered in the SFS 2.0 benchmark suite, causing it to be withdrawn. Each of these defects tends to reduce the size of the benchmark's working-set as the number of load-generating processes is reduced. Since test-sponsors have great freedom in deciding how many processes to use, the resulting variations in the working-set hamper meaningful comparison of SFS 2.0 test results for servers that do not cache the entire dataset.

Defect #1 manifests when the requested ops/sec/process is in the hundreds. Instability in the rate regulation algorithm can cause the request-rate of a process to go to zero for ten seconds at a time. The frequency and severity of the problem are poorly understood. In observed incidents, the working-set of the benchmark was reduced by 6%-27%.

Defect #2 manifests whenever the requested ops/sec/process is 26 or more. Some I/O access-groups become inaccessible because their probability of access is rounded down to zero. The number of inaccessible access-groups generally increases as the request-rate increases. At 475 requested ops/sec/process, the working-set of the "I/O operations" in SFS 2.0 is reduced by more than a factor of three. About 92% of the approved SFS 2.0 runs suffered from inaccessible access-groups because of this defect, though in about half of those cases the reported performance is thought to have been unaffected because the server-under-test could hold the intended working-set in cache.
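The rounding effect behind defect #2 can be illustrated in isolation. The following Python sketch is not the SFS 2.0 selection code: the function name, the weights, and the scale values are hypothetical, chosen only to show how truncating a scaled probability to an integer leaves low-probability groups with a share of zero, so that they can never be selected. It does not attempt to model how the effect grows with the request-rate.

```python
def reachable_groups(weights, scale):
    """Indices of groups whose integer share int(weight * scale) is nonzero.

    int() truncates toward zero, so any group with weight * scale < 1
    receives a share of 0 and becomes unreachable.
    """
    return [i for i, w in enumerate(weights) if int(w * scale) > 0]


# Hypothetical skewed weights: each group is half as likely as the previous one.
weights = [0.5 ** k for k in range(1, 11)]

print(len(reachable_groups(weights, 1000)))  # 9 of 10 groups remain reachable
print(len(reachable_groups(weights, 100)))   # 6 of 10 groups remain reachable
```

In this toy model, the coarser the integer resolution relative to the smallest probabilities, the more groups drop out of the reachable set.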

Defect #3 manifests only when the requested ops/sec/process is 500 or more. Some I/O access-groups become inaccessible because the propagation of floating-point overflows into NaNs results in the violation of an assumption underlying the access-group selection algorithm. When the number of NaNs is even and less than 150, the working-set of the "I/O operations" in SFS 2.0 is reduced by up to an order of magnitude. When the number of NaNs is odd and/or greater than 150, the defect becomes much more serious and the working-set of the "I/O operations" in SFS 2.0 is reduced by at least two orders of magnitude. Only one approved SFS 2.0 result has been affected by this defect. As of the publication of this report, that result has not been published on the SPEC website.
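The general mechanism, an overflow turning into a NaN that then breaks a selection step, can be sketched in Python. This is an illustration under stated assumptions, not the SFS 2.0 algorithm: it assumes, hypothetically, a selection step that performs a binary search over a list of increasing cumulative values, and the list contents and probe values are invented for the example.

```python
import bisect
import math

# Floating-point overflow yields infinity; arithmetic on infinities yields NaN.
overflowed = 1e308 * 10                 # inf
not_a_number = overflowed - overflowed  # nan
assert math.isinf(overflowed) and math.isnan(not_a_number)


def reachable_indices(cumulative, probes):
    """Set of indices a binary search can return for the given probe values."""
    return {bisect.bisect_left(cumulative, p) for p in probes}


probes = [0.5, 1.5, 2.5, 3.5, 4.5]
healthy = [1.0, 2.0, 3.0, 4.0, 5.0]
corrupted = [1.0, 2.0, not_a_number, 4.0, 5.0]  # one entry replaced by NaN

print(sorted(reachable_indices(healthy, probes)))    # [0, 1, 2, 3, 4]
print(sorted(reachable_indices(corrupted, probes)))  # [0, 1, 2]
```

Because every ordered comparison against a NaN is false, the search no longer sees an ordered list, and the entries beyond the NaN are never returned. Whether SFS 2.0 fails in exactly this way is not established by this sketch.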

Even if the first three defects were corrected, the cache profile of the working-set would still vary with the number of load-generating processes used. This variability would invalidate any comparison of SFS 2.0 results for servers that were tested with different numbers of processes, unless it could be shown that both servers were operating entirely out of cache.

Because of these defects, many of the published SFS 2.0 results are not comparable. Comparability exists only for results that were run on servers which would have cached the intended working-set of the benchmark. Based on simulation, it is believed that 115 out of the 248 approved SFS 2.0 results meet this criterion for comparability.

Appendix A: Simulation Results

Here are the summarized simulation results for all 248 approved SFS 2.0 results. There are a few caveats to be considered in interpreting these results:

  • defect #1 is not simulated
  • the simulator considers only the I/O working-set, not any of the other working-sets
  • the simulator assumes that all RAM and NVRAM listed in the disclosure are available for buffer-cache
  • the simulator assumes that there are exactly 300 accesses to the I/O working set for each requested op/sec
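The last caveat allows the "accesses to uncacheable I/O files" columns in the tables below to be read as fractions. The following Python sketch assumes, as an interpretation not stated explicitly in this report, that those columns count accesses out of the simulated total of 300 accesses per requested op/sec. The constant and function names are illustrative.

```python
ACCESSES_PER_REQUESTED_OP = 300  # simulator assumption stated in the caveats


def uncacheable_access_fraction(peak_requested_load, uncacheable_accesses):
    """Fraction of simulated I/O accesses that go to uncacheable files."""
    total_accesses = ACCESSES_PER_REQUESTED_OP * peak_requested_load
    return uncacheable_accesses / total_accesses


# First row of the SPECsfs97.v2 table: 1200 requested ops/sec at peak,
# 6165 estimated uncacheable accesses under SFS 2.0 and 83214 under SFS 3.0.
print(f"{uncacheable_access_fraction(1200, 6165):.1%}")   # 1.7%
print(f"{uncacheable_access_fraction(1200, 83214):.1%}")  # 23.1%
```

Under that reading, the gap between the two percentages indicates how much of the intended cache-miss traffic the defects removed for a given result.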

Here are the 98 approved SPECsfs97.v2 results, sorted by achieved throughput at peak:

Approved result | Peak requested load (ops/sec) | Number of processes | Peak requested ops/sec/process | SUT memory (MBytes) | I/O files created (size in GBytes) | SFS 2.0 I/O access-groups | SFS 2.0 I/O access-groups accessed (percent) | SFS 2.0 est. I/O files accessed: cacheable+uncacheable | SFS 2.0 est. accesses to uncacheable I/O files | SFS 3.0 est. I/O files accessed: cacheable+uncacheable | SFS 3.0 est. accesses to uncacheable I/O files | Status of the result, with respect to defects
1146 SPECsfs97.v2 ops per second with an overall response time of 3.60 ms 1200 9 133.33 264 468000 (11) 432 261 (60%) 20790+4392 6165 20790+4392 83214 probably affected
1300 SPECsfs97.v2 ops per second with an overall response time of 3.27 ms 1398 9 155.33 264 545220 (13) 504 306 (57%) 20790+6318 12510 20790+6318 126261 probably affected
2010 SPECsfs97.v2 ops per second with an overall response time of 1.88 ms 2000 8 250.00 512 780000 (19) 672 296 (44%) 31968+0 0 31968+0 98056 probably affected
2217 SPECsfs97.v2 ops per second with an overall response time of 5.58 ms 2200 64 34.38 1028 858000 (21) 1024 768 (100%) 71744+0 0 71744+0 0 I/O working set was cacheable; probably not affected
2366 SPECsfs97.v2 ops per second with an overall response time of 4.14 ms 2400 39 61.54 264 936000 (23) 936 819 (88%) 20787+48282 288951 20787+48282 378534 probably affected
2691 SPECsfs97.v2 ops per second with an overall response time of 3.98 ms 2700 24 112.50 264 1053000 (26) 960 648 (75%) 20784+41016 283320 20784+41016 460296 probably affected
2903 SPECsfs97.v2 ops per second with an overall response time of 8.07 ms 2880 56 51.43 512 1123200 (28) 1344 1176 (88%) 40320+41552 161504 40320+41552 281456 probably affected
3141 SPECsfs97.v2 ops per second with an overall response time of 3.82 ms 3100 40 77.50 1088 1209000 (30) 1280 1080 (75%) 77120+0 0 77120+0 38800 probably affected
3372 SPECsfs97.v2 ops per second with an overall response time of 5.34 ms 3360 56 60.00 3072 1310400 (32) 1344 1176 (88%) 96432+0 0 96432+0 0 I/O working set was cacheable; probably not affected
3425 SPECsfs97.v2 ops per second with an overall response time of 7.56 ms 3400 48 70.83 1024 1326000 (33) 1344 1008 (88%) 80640+9504 9504 80640+9504 96192 probably affected
3827 SPECsfs97.v2 ops per second with an overall response time of 3.92 ms 3840 88 43.64 3072 1497600 (37) 1760 1936 (92%) 118448+0 0 118448+0 0 I/O working set was cacheable; probably not affected
3930 SPECsfs97.v2 ops per second with an overall response time of 3.11 ms 3996 66 60.55 544 1558440 (39) 1584 1386 (88%) 42834+72006 368478 42834+72006 532356 probably affected
4054 SPECsfs97.v2 ops per second with an overall response time of 0.94 ms (unpublished) 4000 4 1000.00 800 1560000 (39) 1280 20 (2%) 968+0 0 968+0 328628 probably affected
4201 SPECsfs97.v2 ops per second with an overall response time of 3.36 ms 4194 63 66.57 544 1635660 (40) 1764 1323 (88%) 42840+68229 372834 42840+68229 581427 probably affected
4217 SPECsfs97.v2 ops per second with an overall response time of 3.32 ms 4200 84 50.00 544 1638000 (41) 1680 1848 (92%) 42840+85512 470316 42840+85512 587160 probably affected
4508 SPECsfs97.v2 ops per second with an overall response time of 2.74 ms 4500 216 20.83 7168 1755000 (43) 2592 2592 (100%) 156168+0 0 156168+0 0 <26 requested ops/sec/process and I/O working set was cacheable; probably not affected
4549 SPECsfs97.v2 ops per second with an overall response time of 6.28 ms 4596 84 54.71 1024 1792440 (44) 2016 1764 (88%) 80640+50820 133056 80640+50820 305256 probably affected
4672 SPECsfs97.v2 ops per second with an overall response time of 7.61 ms 4698 72 65.25 4096 1832220 (45) 2016 1512 (88%) 123912+0 0 123912+0 0 I/O working set was cacheable; probably not affected
4752 SPECsfs97.v2 ops per second with an overall response time of 3.73 ms 4752 45 105.60 544 1853280 (46) 1800 1215 (75%) 42840+65520 387540 42840+65520 726345 probably affected
5023 SPECsfs97.v2 ops per second with an overall response time of 3.89 ms 4998 54 92.56 544 1949220 (48) 1944 1350 (69%) 42822+74682 481410 42822+74682 800010 probably affected
5095 SPECsfs97.v2 ops per second with an overall response time of 3.60 ms 5100 48 106.25 544 1989000 (49) 1920 1296 (75%) 42816+73152 472032 42816+73152 828000 probably affected
5240 SPECsfs97.v2 ops per second with an overall response time of 5.69 ms 5200 56 92.86 3072 2028000 (50) 2016 1400 (69%) 121856+0 0 121856+0 0 I/O working set was cacheable; probably not affected
5303 SPECsfs97.v2 ops per second with an overall response time of 4.64 ms 5250 84 62.50 1024 2047500 (51) 2016 1764 (88%) 80640+69720 234360 80640+69720 449904 probably affected
5402 SPECsfs97.v2 ops per second with an overall response time of 3.92 ms 5376 56 96.00 544 2096640 (52) 2016 1400 (69%) 42840+84896 581392 42840+84896 905520 probably affected
5550 SPECsfs97.v2 ops per second with an overall response time of 4.78 ms 5500 84 65.48 1024 2145000 (53) 2352 1764 (88%) 80640+65772 226800 80640+65772 505260 probably affected
5952 SPECsfs97.v2 ops per second with an overall response time of 3.96 ms 5929 132 44.92 3072 2312310 (57) 2640 2904 (92%) 182028+0 0 182028+0 0 I/O working set was cacheable; probably not affected
6030 SPECsfs97.v2 ops per second with an overall response time of 4.89 ms 6000 210 28.57 1216 2340000 (58) 3360 2520 (100%) 95760+100800 378210 95760+100800 466200 probably affected
6071 SPECsfs97.v2 ops per second with an overall response time of 3.41 ms 6048 102 59.29 1056 2358720 (59) 2448 2142 (88%) 83130+92820 355572 83130+92820 611592 probably affected
6155 SPECsfs97.v2 ops per second with an overall response time of 2.85 ms 6296 112 56.21 1056 2455440 (61) 2688 2352 (88%) 83104+97328 404656 83104+97328 674240 probably affected
7025 SPECsfs97.v2 ops per second with an overall response time of 8.25 ms 7050 72 97.92 4096 2749500 (68) 2592 1800 (69%) 167112+0 0 167112+0 0 I/O working set was cacheable; probably not affected
7046 SPECsfs97.v2 ops per second with an overall response time of 4.79 ms 7050 90 78.33 1216 2749500 (68) 2880 2430 (75%) 95760+79830 294390 95760+79830 713880 probably affected
7206 SPECsfs97.v2 ops per second with an overall response time of 2.76 ms 7200 128 56.25 1024 2808000 (70) 3072 2688 (88%) 80640+125568 621568 80640+125568 924032 probably affected
7431 SPECsfs97.v2 ops per second with an overall response time of 2.30 ms 7560 54 140.00 4960 2948400 (73) 2592 1566 (60%) 158112+0 0 158112+0 0 I/O working set was cacheable; probably not affected
7462 SPECsfs97.v2 ops per second with an overall response time of 3.53 ms 7440 66 112.73 1056 2901600 (72) 2640 1782 (75%) 83160+86790 394812 83160+86790 953502 probably affected
7612 SPECsfs97.v2 ops per second with an overall response time of 7.46 ms 7548 72 104.83 4096 2943720 (73) 2880 1944 (75%) 173232+0 0 173232+0 0 I/O working set was cacheable; probably not affected
7750 SPECsfs97.v2 ops per second with an overall response time of 2.67 ms 7800 48 162.50 1056 3042000 (76) 2688 1632 (57%) 83136+67968 298800 83136+67968 1049760 probably affected
8165 SPECsfs97.v2 ops per second with an overall response time of 3.04 ms 8280 576 14.38 14336 3229200 (80) 6912 6912 (100%) 285120+0 0 285120+0 0 <26 requested ops/sec/process and I/O working set was cacheable; probably not affected
8170 SPECsfs97.v2 ops per second with an overall response time of 2.72 ms 8100 128 63.28 1024 3159000 (79) 3072 2688 (88%) 80640+149504 828416 80640+149504 1146880 probably affected
8292 SPECsfs97.v2 ops per second with an overall response time of 6.03 ms 8280 168 49.29 3072 3229200 (80) 3360 3696 (92%) 241920+12600 12600 241920+12600 76104 probably affected
9079 SPECsfs97.v2 ops per second with an overall response time of 2.56 ms 9180 54 170.00 4960 3580200 (89) 3240 1728 (53%) 172800+0 0 172800+0 0 I/O working set was cacheable; probably not affected
9133 SPECsfs97.v2 ops per second with an overall response time of 3.60 ms 9096 72 126.33 5248 3547440 (88) 3168 2160 (62%) 195984+0 0 195984+0 0 I/O working set was cacheable; probably not affected
9825 SPECsfs97.v2 ops per second with an overall response time of 5.92 ms 9760 112 87.14 5120 3806400 (95) 3584 3024 (75%) 243488+0 0 243488+0 0 I/O working set was cacheable; probably not affected
10384 SPECsfs97.v2 ops per second with an overall response time of 3.93 ms 10350 198 52.27 4096 4036500 (101) 4752 4158 (88%) 295218+0 0 295218+0 46728 probably affected
10715 SPECsfs97.v2 ops per second with an overall response time of 5.43 ms 10593 117 90.54 8192 4131270 (103) 4212 2925 (69%) 250146+0 0 250146+0 0 I/O working set was cacheable; probably not affected
10724 SPECsfs97.v2 ops per second with an overall response time of 3.09 ms 10992 72 152.67 9344 4286880 (107) 3744 2160 (62%) 222912+0 0 222912+0 0 I/O working set was cacheable; probably not affected
11382 SPECsfs97.v2 ops per second with an overall response time of 2.99 ms 11400 192 59.38 2112 4446000 (111) 4608 4032 (88%) 166272+164928 580224 166272+164928 1053888 probably affected
11806 SPECsfs97.v2 ops per second with an overall response time of 5.16 ms 11700 240 48.75 3072 4563000 (114) 4800 5280 (92%) 241920+120240 248160 241920+120240 531840 probably affected
11823 SPECsfs97.v2 ops per second with an overall response time of 5.62 ms 11760 168 70.00 5120 4586400 (114) 4704 3528 (88%) 313992+0 0 313992+0 6552 probably affected
13435 SPECsfs97.v2 ops per second with an overall response time of 3.17 ms 13496 128 105.44 2112 5263440 (131) 5120 3456 (75%) 166272+141952 545664 166272+141952 1553536 probably affected
13605 SPECsfs97.v2 ops per second with an overall response time of 3.15 ms 13520 1536 8.80 14336 5272800 (132) 12288 16896 (92%) 494592+0 0 494592+0 0 <26 requested ops/sec/process and I/O working set was cacheable; probably not affected
14081 SPECsfs97.v2 ops per second with an overall response time of 1.99 ms 13992 48 291.50 10240 5456880 (136) 4608 1872 (41%) 208704+0 0 208704+0 0 I/O working set was cacheable; probably not affected
14140 SPECsfs97.v2 ops per second with an overall response time of 2.91 ms 14300 320 44.69 2048 5577000 (139) 6400 7040 (92%) 161280+280000 1384960 161280+280000 1802240 probably affected
14279 SPECsfs97.v2 ops per second with an overall response time of 6.28 ms 14400 168 85.71 5120 5616000 (140) 5376 4536 (75%) 357000+0 0 357000+0 163128 probably affected
14941 SPECsfs97.v2 ops per second with an overall response time of 5.64 ms 15120 224 67.50 6144 5896800 (147) 6272 4704 (88%) 405440+0 0 405440+0 45024 probably affected
15053 SPECsfs97.v2 ops per second with an overall response time of 5.76 ms 15030 567 26.51 3648 5861700 (146) 9072 6804 (100%) 286902+205821 610659 286902+205821 824418 probably affected
15235 SPECsfs97.v2 ops per second with an overall response time of 1.54 ms 15200 50 304.00 3200 5928000 (148) 5000 2000 (42%) 221000+0 0 221000+0 1136350 probably affected
15270 SPECsfs97.v2 ops per second with an overall response time of 1.91 ms 15984 48 333.00 10240 6233760 (156) 5184 1968 (38%) 225888+0 0 225888+0 0 I/O working set was cacheable; probably not affected
15421 SPECsfs97.v2 ops per second with an overall response time of 4.20 ms 15400 264 58.33 6144 6006000 (150) 6336 5544 (88%) 438240+0 0 438240+0 63360 probably affected
16060 SPECsfs97.v2 ops per second with an overall response time of 6.13 ms 16524 486 34.00 3648 6444360 (161) 7776 5832 (100%) 287226+250290 861678 287226+250290 1128492 probably affected
16138 SPECsfs97.v2 ops per second with an overall response time of 3.00 ms 16000 256 62.50 2048 6240000 (156) 6144 5376 (88%) 161280+296960 1614592 161280+296960 2240768 probably affected
16832 SPECsfs97.v2 ops per second with an overall response time of 2.98 ms 16698 216 77.31 16384 6512220 (163) 6912 5832 (75%) 416448+0 0 416448+0 0 I/O working set was cacheable; probably not affected
17437 SPECsfs97.v2 ops per second with an overall response time of 4.50 ms 17500 450 38.89 3648 6825000 (170) 9000 9900 (92%) 287100+245700 790650 287100+245700 1333800 probably affected
17674 SPECsfs97.v2 ops per second with an overall response time of 5.69 ms 17600 224 78.57 12288 6864000 (171) 7168 6048 (75%) 437024+0 0 437024+0 0 I/O working set was cacheable; probably not affected
18235 SPECsfs97.v2 ops per second with an overall response time of 3.63 ms 18240 480 38.00 8192 7113600 (178) 7680 5760 (100%) 595200+0 0 595200+0 0 I/O working set was cacheable; probably not affected
18293 SPECsfs97.v2 ops per second with an overall response time of 3.80 ms 19000 216 87.96 3456 7410000 (185) 6912 5832 (75%) 272160+202176 676944 272160+202176 1794960 probably affected
18431 SPECsfs97.v2 ops per second with an overall response time of 4.33 ms 18700 264 70.83 8192 7293000 (182) 7392 5544 (88%) 495792+0 0 495792+0 11352 probably affected
18860 SPECsfs97.v2 ops per second with an overall response time of 3.57 ms 18400 1536 11.98 20480 7176000 (179) 12288 16896 (92%) 681984+0 0 681984+0 0 <26 requested ops/sec/process and I/O working set was cacheable; probably not affected
19755 SPECsfs97.v2 ops per second with an overall response time of 3.77 ms 20394 243 83.93 3456 7953660 (199) 7776 6561 (75%) 272160+232308 888165 272160+232308 2091258 probably affected
20176 SPECsfs97.v2 ops per second with an overall response time of 5.58 ms 21120 224 94.29 12288 8236800 (206) 8064 5600 (69%) 501536+0 0 501536+0 0 I/O working set was cacheable; probably not affected
20406 SPECsfs97.v2 ops per second with an overall response time of 6.77 ms 20250 720 28.12 6144 7897500 (197) 11520 8640 (100%) 483840+184320 339120 483840+184320 545760 probably affected
20460 SPECsfs97.v2 ops per second with an overall response time of 3.78 ms 20400 480 42.50 8192 7956000 (199) 9600 10560 (92%) 622560+0 0 622560+0 84960 probably affected
20683 SPECsfs97.v2 ops per second with an overall response time of 3.21 ms 21000 420 50.00 3072 8190000 (205) 8400 9240 (92%) 241920+399840 2002140 241920+399840 2619120 probably affected
20925 SPECsfs97.v2 ops per second with an overall response time of 4.18 ms 21010 264 79.58 12288 8193900 (205) 8448 7128 (75%) 518496+0 0 518496+0 0 I/O working set was cacheable; probably not affected
22017 SPECsfs97.v2 ops per second with an overall response time of 4.04 ms 22000 450 48.89 3648 8580000 (214) 9000 9900 (92%) 287100+391950 1651050 287100+391950 2331000 probably affected
22885 SPECsfs97.v2 ops per second with an overall response time of 5.17 ms 24300 252 96.43 12288 9477000 (237) 9072 6300 (69%) 574812+0 0 574812+0 0 I/O working set was cacheable; probably not affected
23687 SPECsfs97.v2 ops per second with an overall response time of 3.32 ms 23496 360 65.27 3072 9163440 (229) 10080 7560 (88%) 241920+377640 2069640 241920+377640 3260880 probably affected
23815 SPECsfs97.v2 ops per second with an overall response time of 4.39 ms 24288 264 92.00 12288 9472320 (237) 9504 6600 (69%) 571032+0 0 571032+0 0 I/O working set was cacheable; probably not affected
25639 SPECsfs97.v2 ops per second with an overall response time of 9.94 ms 25440 720 35.33 8192 9921600 (248) 11520 8640 (100%) 645120+191520 317520 645120+191520 535680 probably affected
25849 SPECsfs97.v2 ops per second with an overall response time of 3.79 ms 25776 480 53.70 12288 10052640 (251) 11520 10080 (88%) 739200+0 0 739200+0 0 I/O working set was cacheable; probably not affected
25973 SPECsfs97.v2 ops per second with an overall response time of 1.55 ms 26700 80 333.75 6400 10413000 (260) 8960 3360 (39%) 368480+0 0 368480+0 1518960 probably affected
27097 SPECsfs97.v2 ops per second with an overall response time of 3.70 ms 27000 252 107.14 8192 10530000 (263) 10080 6804 (75%) 616392+0 0 616392+0 757512 probably affected
27210 SPECsfs97.v2 ops per second with an overall response time of 3.73 ms 27000 252 107.14 8192 10530000 (263) 10080 6804 (75%) 616392+0 0 616392+0 757512 probably affected
27646 SPECsfs97.v2 ops per second with an overall response time of 4.25 ms 27840 240 116.00 15360 10857600 (271) 10560 7200 (62%) 601680+0 0 601680+0 0 I/O working set was cacheable; probably not affected
27688 SPECsfs97.v2 ops per second with an overall response time of 3.46 ms 28496 512 55.66 4096 11113440 (278) 12288 10752 (88%) 322560+497152 2415104 322560+497152 3596288 probably affected
28265 SPECsfs97.v2 ops per second with an overall response time of 3.87 ms 28050 396 70.83 18432 10939500 (273) 11088 8316 (88%) 743688+0 0 743688+0 0 I/O working set was cacheable; probably not affected
30555 SPECsfs97.v2 ops per second with an overall response time of 1.56 ms 30400 100 304.00 6400 11856000 (296) 10000 4000 (42%) 442000+0 0 442000+0 2272700 probably affected
30684 SPECsfs97.v2 ops per second with an overall response time of 3.59 ms 30496 512 59.56 4096 11893440 (297) 12288 10752 (88%) 322560+560640 2870272 322560+560640 4072960 probably affected
30703 SPECsfs97.v2 ops per second with an overall response time of 3.78 ms 31200 2160 14.44 21504 12168000 (304) 25920 25920 (100%) 1069200+0 0 1069200+0 0 <26 requested ops/sec/process and I/O working set was cacheable; probably not affected
31803 SPECsfs97.v2 ops per second with an overall response time of 4.42 ms 31680 352 90.00 24576 12355200 (309) 12672 8800 (69%) 748000+0 0 748000+0 0 I/O working set was cacheable; probably not affected
40218 SPECsfs97.v2 ops per second with an overall response time of 6.31 ms 40000 400 100.00 16384 15600000 (390) 14400 10000 (69%) 950400+0 0 950400+0 112800 probably affected