
Standard Performance Evaluation Corporation


The Workload for the SPECweb96 Benchmark


Background

The workload for SPECweb96 is based upon analysis of logs from several servers. During the investigation we had access to logs from NCSA's site (made popular by Mosaic), the home pages for Hewlett-Packard and HAL Computers, and even a small site supporting several comic strip artists. We then compared the findings from these logs against summary data provided by Netscape and CommerceNet.

These logs showed remarkable similarity in the relationships between file sizes and their frequency of access. A fair number of requests were to quite small files: small graphical elements, short text files, and so on. Most of the accesses were to files several KB in size: mostly HTML files and their main graphics. Then, as file size increased, frequency trailed off; there was a reasonable number of requests for HTML files and other documents and pictures that were between 10 and 100 KB in size. Finally, there were only occasional accesses to larger documents and multimedia files larger than 100 KB. In short, the typical activity was browsing home pages and indices before finally selecting only a few large files to download.

File Mix

After reviewing all this data, we settled on a workload mix built out of files in four classes: files less than 1KB account for 35% of all requests, files between 1KB and 10KB for 50%, files between 10KB and 100KB for 14%, and files between 100KB and 1MB for the remaining 1%.

        TABLE 1
      File Sizes per Class and Frequency of Access

      Class 0       0 -- 1KB      35%
      Class 1     1KB -- 10KB     50%
      Class 2    10KB -- 100KB    14%
      Class 3   100KB -- 1MB       1%
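
As an illustration of how a load generator might apply the mix in Table 1, the following Python sketch draws a class number with the stated probabilities. The names CLASS_MIX and pick_class are hypothetical and are not taken from the benchmark source; this is one possible approach, not necessarily how SPECweb96 itself selects a class.

```python
import random

# Share of all requests per class, copied from Table 1.
CLASS_MIX = {0: 0.35, 1: 0.50, 2: 0.14, 3: 0.01}


def pick_class(rng: random.Random) -> int:
    """Return a class number (0-3) drawn according to the Table 1 mix."""
    classes = list(CLASS_MIX)
    weights = list(CLASS_MIX.values())
    # random.choices performs a weighted draw; k=1 returns a one-element list.
    return rng.choices(classes, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(96)  # fixed seed so the demo is repeatable
    counts = {c: 0 for c in CLASS_MIX}
    for _ in range(100_000):
        counts[pick_class(rng)] += 1
    for c, n in counts.items():
        print(f"Class {c}: {n / 1000:.1f}% of requests")
```

Over a large number of draws the observed shares should settle close to 35%, 50%, 14%, and 1%.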

There are 9 discrete sizes within each class (e.g. 1KB, 2KB, and so on up to 9KB; then 10KB, 20KB, through 90KB; etc.), resulting in a total of 36 different files (9 in each of 4 classes).

        TABLE 2
    Sizes (in bytes) of Files in Each Class

  Class 0   Class 1   Class 2   Class 3
  102   1024    10,240    102,400
  204   2048    20,480    204,800
  .
  .
  .
  922   9216    92,160    921,600
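
The sizes listed for Classes 1 through 3 follow a simple pattern: the n-th file in a class is n times the smallest file in that class. A minimal Python sketch of that pattern is shown below. The function name file_size is hypothetical. Class 0 is deliberately left out, because the rows shown in Table 2 do not make clear how its fractional sizes are rounded.

```python
def file_size(file_class: int, index: int) -> int:
    """Size in bytes of file `index` (1..9) within class 1, 2, or 3.

    Class 0 is not handled: Table 2 elides the rows that would show
    how its sizes are rounded, so this sketch does not guess.
    """
    if file_class not in (1, 2, 3):
        raise ValueError("this sketch covers classes 1-3 only")
    if not 1 <= index <= 9:
        raise ValueError("index must be between 1 and 9")
    # The smallest file is 1024 bytes in class 1, 10,240 in class 2,
    # and 102,400 in class 3; each later file adds one more multiple.
    return index * 1024 * 10 ** (file_class - 1)
```

For example, file_size(2, 9) gives 92,160, matching the last Class 2 entry in Table 2.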

However, accesses within a class are not evenly distributed; they are allocated using a Poisson distribution centered around the midpoint within the class. The resulting access pattern mimics the behavior where some files (such as "index.html") are more popular than the rest, and some files (such as "mydog.gif") are rarely requested.
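
The exact parameters of the distribution are not given here, so the following Python sketch only illustrates the general idea. It assumes a Poisson distribution with a mean of 4.5 (an assumed value chosen so that the fifth of the nine files is the most popular), cut off after nine files and rescaled so the weights sum to 1. The name poisson_weights and both default values are hypothetical.

```python
import math


def poisson_weights(n_files: int = 9, lam: float = 4.5) -> list[float]:
    """Relative access frequency for each of `n_files` files in a class.

    Uses the Poisson probability mass function with mean `lam`
    (an assumed value, not taken from the benchmark), truncated to
    the first `n_files` outcomes and renormalized to sum to 1.
    """
    raw = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(n_files)]
    total = sum(raw)
    return [w / total for w in raw]


if __name__ == "__main__":
    for k, w in enumerate(poisson_weights()):
        print(f"file {k}: {w:.3f}")
```

Under these assumptions the files near the middle of the class receive the largest share of requests, and the files at either end are requested only rarely.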

Scaling

Finally, it was decided that the total size of the file set should scale with the expected throughput of the server. This is not to say that the size of web sites grows larger as they become more popular, but rather, that the expectations for a high-end server are much greater than for a smaller server. In particular, it should not be unreasonable to assume that two smaller systems might be replaced by one that has twice the performance rating; however, this only holds if the larger system can also handle the files from both, not just the higher request rate. Recognizing that there is likely to be overlap and that file space probably does not grow linearly, SPEC chose to have the file set size grow slowly: the file set size will double as the expected throughput quadruples.

The number of directories is set by the scaling function described above, which can be stated as: sqrt( throughput / 5 ) * 10. The directory counts listed in Table 3 are this value truncated to a whole number.
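
The scaling function can be checked with a few lines of Python. In this sketch the function name num_directories is hypothetical, and truncating to a whole number is an assumption that happens to reproduce the Dirs column of Table 3.

```python
import math


def num_directories(ops: float) -> int:
    """Directory count for a target throughput, per sqrt(ops / 5) * 10.

    Truncation toward zero is an assumption; it matches the directory
    counts listed in Table 3.
    """
    return math.floor(math.sqrt(ops / 5) * 10)


if __name__ == "__main__":
    for ops in (1, 2, 5, 10, 20, 50, 100, 200, 500, 1000):
        print(f"Ops: {ops:5d}  Dirs: {num_directories(ops):4d}")
```

Because the function is a square root, quadrupling the throughput doubles the directory count: 20 ops gives 20 directories and 80 ops gives 40.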

        TABLE 3
  Number of Directories (And Resulting Disk Space)
      Based Upon the Target Throughput ("Ops")

  Ops:    1 Dirs:    4  Size:  22 MB
  Ops:    2 Dirs:    6  Size:  31 MB
  Ops:    5 Dirs:   10  Size:  49 MB
  Ops:   10 Dirs:   14  Size:  69 MB
  Ops:   20 Dirs:   20  Size:  98 MB
  Ops:   50 Dirs:   31  Size: 154 MB
  Ops:  100 Dirs:   44  Size: 218 MB
  Ops:  200 Dirs:   63  Size: 309 MB
  Ops:  500 Dirs:  100  Size: 488 MB
  Ops: 1000 Dirs:  141  Size: 690 MB

Summary

The resulting workload can be thought of as the behavior of a system supporting the "home pages" for a number of "members". There is a set of directories, one per member, each holding 36 files: a complete set of nine files for each of the four classes.

Requests are spread evenly across all applicable directories. Within each directory, or each "member's home", accesses are distributed according to the functions defined above, resulting in several popular files, a few common files, and a number of infrequently requested files.
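
Putting the pieces together, a single request can be described by three choices: a directory, a class, and a file within that class. The Python sketch below shows one way to make those choices. It is an illustration only: the names are hypothetical, the within-class weights assume a Poisson shape with a mean of 4.5 (a value not given in this document), and the result is returned as a tuple of numbers rather than a path because the on-disk naming scheme is not described here.

```python
import math
import random

# Share of requests per class 0-3, from Table 1.
CLASS_MIX = [0.35, 0.50, 0.14, 0.01]

# Unnormalized Poisson weights for the nine files in a class.
# The mean of 4.5 is an assumption made for this sketch.
FILE_WEIGHTS = [4.5 ** k / math.factorial(k) for k in range(9)]


def pick_request(num_dirs: int, rng: random.Random) -> tuple[int, int, int]:
    """Return (directory, class, file index) for one simulated request."""
    directory = rng.randrange(num_dirs)  # spread evenly across directories
    file_class = rng.choices(range(4), weights=CLASS_MIX)[0]
    file_index = rng.choices(range(9), weights=FILE_WEIGHTS)[0]
    return directory, file_class, file_index


if __name__ == "__main__":
    rng = random.Random(96)  # fixed seed so the demo is repeatable
    for _ in range(5):
        print(pick_request(10, rng))
```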