Standard Performance Evaluation Corporation
The Workload for the SPECweb96 Benchmark
The workload for SPECweb96 is based upon analysis of logs from several servers. During the investigation we had access to logs from NCSA's site (made popular by Mosaic), the home pages for Hewlett-Packard and HAL Computers, and even a small site supporting several comic strip artists. We then compared the findings from these logs against summary data provided by Netscape and CommerceNet.
These logs showed remarkable similarity in the relationships between file sizes and their frequency of access. A fair number of requests were to quite small files: small graphical elements, short text files, and so on. The largest share of accesses went to files several KB in size: mostly HTML files and their main graphics. Then, as file size increased, frequency trailed off: there was a reasonable number of requests for HTML files and other documents and pictures between 10 KB and 100 KB in size. Finally, there were only occasional accesses to documents and multimedia files larger than 100 KB. In short, the common activity was browsing home pages and indices before finally selecting only a few large files to download.
After reviewing all this data, we settled on a workload mix built from files in four classes: files smaller than 1 KB account for 35% of all requests, files between 1 KB and 10 KB for 50%, files between 10 KB and 100 KB for 14%, and files between 100 KB and 1 MB for 1%.
TABLE 1 File Sizes per Class and Frequency of Access

Class      File size         Frequency of access
Class 0    0 -- 1KB          35%
Class 1    1KB -- 10KB       50%
Class 2    10KB -- 100KB     14%
Class 3    100KB -- 1MB       1%
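The request mix in Table 1 can be illustrated with a short weighted-sampling sketch. This is not the benchmark's own generator; the names `CLASS_MIX` and `pick_class` are hypothetical, and only the four percentages come from the table.

```python
import random

# Request mix from Table 1: class index -> share of all requests.
CLASS_MIX = {0: 0.35, 1: 0.50, 2: 0.14, 3: 0.01}


def pick_class(rng):
    """Return a class index (0-3) drawn according to CLASS_MIX."""
    classes = list(CLASS_MIX)
    weights = list(CLASS_MIX.values())
    return rng.choices(classes, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(96)  # arbitrary fixed seed, for repeatable runs only
    total = 100_000
    counts = {c: 0 for c in CLASS_MIX}
    for _ in range(total):
        counts[pick_class(rng)] += 1
    for c, n in counts.items():
        print(f"Class {c}: {n / total:.1%}")
```

Over a long run the observed shares should settle close to 35%, 50%, 14%, and 1%.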
There are 9 discrete sizes within each class (e.g. 1 KB, 2 KB, on up to 9KB, then 10 KB, 20 KB, through 90KB, etc.), resulting in a total of 36 different files (9 in each of 4 classes).
TABLE 2 Sizes (in bytes) of Files in Each Class

Class 0    Class 1    Class 2    Class 3
102        1024       10,240     102,400
204        2048       20,480     204,800
. . .
922        9216       92,160     921,600
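For Classes 1 through 3, the sizes shown in Table 2 are whole multiples of 1024, 10,240, and 102,400 bytes, which the following sketch reproduces. The function name `file_size` is hypothetical. Class 0 is deliberately left out: the values shown for it (102, 204, 922) do not follow from a single obvious rounding rule, so this sketch does not guess at one.

```python
def file_size(class_index, file_index):
    """Size in bytes of file `file_index` (1-9) in Class 1, 2, or 3."""
    if class_index not in (1, 2, 3):
        raise ValueError("this sketch only covers Classes 1-3")
    if not 1 <= file_index <= 9:
        raise ValueError("file_index must be between 1 and 9")
    # Class 1 steps by 1024 bytes, Class 2 by 10,240, Class 3 by 102,400.
    return file_index * 1024 * 10 ** (class_index - 1)


if __name__ == "__main__":
    for cls in (1, 2, 3):
        print(cls, [file_size(cls, k) for k in range(1, 10)])
```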
However, accesses within a class are not evenly distributed; they are allocated using a Poisson distribution centered around the midpoint within the class. The resulting access pattern mimics the behavior where some files (such as "index.html") are more popular than the rest, and some files (such as "mydog.gif") are rarely requested.
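One way to picture this distribution is to weight the nine files of a class with a Poisson probability mass function. The sketch below assumes a mean of 4 (the middle of file indices 0 through 8) and renormalizes over the nine files; neither detail is stated in the text, and `poisson_weights` is a hypothetical name.

```python
import math

FILES_PER_CLASS = 9


def poisson_weights(mean=4.0, n=FILES_PER_CLASS):
    """Return n access probabilities shaped like a Poisson(mean) pmf.

    The pmf is truncated to indices 0..n-1 and rescaled to sum to 1.
    Both the mean and the truncation are assumptions of this sketch.
    """
    raw = [math.exp(-mean) * mean ** k / math.factorial(k) for k in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


if __name__ == "__main__":
    for k, w in enumerate(poisson_weights()):
        print(f"file {k}: {w:.3f}")
```

Under these assumptions the files near the middle of the class receive most of the requests, while those at either end are requested rarely.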
Finally, it was decided that the total size of the file set should scale with the expected throughput of the server. This is not to say that the size of web sites grows larger as they become more popular, but rather, that the expectations for a high-end server are much greater than for a smaller server. In particular, it should not be unreasonable to assume that two smaller systems might be replaced by one that has twice the performance rating; however, this only holds if the larger system can also handle the files from both, not just the higher request rate. Recognizing that there is likely to be overlap and that file space probably does not grow linearly, SPEC chose to have the file set size grow slowly: the file set size will double as the expected throughput quadruples.
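The number of directories is set by the above-mentioned scaling function, which can be stated as: directories = sqrt( throughput / 5 ) * 10. Because of the square root, quadrupling the throughput doubles the number of directories.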
TABLE 3 Number of Directories (And Resulting Disk Space) Based Upon the Target Throughput ("Ops")

Ops      Dirs    Size
1        4       22 MB
2        6       31 MB
5        10      49 MB
10       14      69 MB
20       20      98 MB
50       31      154 MB
100      44      218 MB
200      63      309 MB
500      100     488 MB
1000     141     690 MB
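The directory counts in Table 3 can be reproduced from the scaling function with a few lines of code. The function name `directory_count` is hypothetical, and truncating the result to a whole number is an assumption inferred from the table (for example, 100 Ops gives about 44.7, listed as 44).

```python
import math


def directory_count(ops):
    """Directories for a target throughput: sqrt(ops / 5) * 10.

    The fractional part is dropped; this is inferred from Table 3
    rather than stated in the text.
    """
    return int(math.sqrt(ops / 5) * 10)


if __name__ == "__main__":
    for ops in (1, 2, 5, 10, 20, 50, 100, 200, 500, 1000):
        print(f"Ops: {ops:5d}  Dirs: {directory_count(ops)}")
```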
The resulting workload can be thought of as the behavior of a system supporting the "home pages" for a number of "members"; thus there is a set of directories, one per member, with 36 files each: a complete set of nine files for each of the four classes.
Requests are spread evenly across all applicable directories. Within each directory, or each "member's home", accesses are distributed according to the functions defined above, resulting in several popular files, a few common files, and a number of infrequently requested files.
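Putting the pieces together, a single simulated request could be chosen roughly as follows. This is a sketch under stated assumptions, not the benchmark's own request generator: `pick_request` is a hypothetical name, the per-file popularity curve assumes a Poisson shape with mean 4 over file indices 0 through 8, and the result is a tuple of indices rather than a file path.

```python
import math
import random

CLASS_WEIGHTS = [0.35, 0.50, 0.14, 0.01]  # request mix for Classes 0-3
FILES_PER_CLASS = 9

# Assumed popularity curve: Poisson pmf with mean 4 over file indices 0-8.
# random.choices does not require the weights to sum to 1.
FILE_WEIGHTS = [math.exp(-4.0) * 4.0 ** k / math.factorial(k)
                for k in range(FILES_PER_CLASS)]


def pick_request(num_dirs, rng):
    """Return (directory, class, file) indices for one simulated request."""
    directory = rng.randrange(num_dirs)  # spread evenly across directories
    cls = rng.choices(range(4), weights=CLASS_WEIGHTS)[0]
    file_index = rng.choices(range(FILES_PER_CLASS), weights=FILE_WEIGHTS)[0]
    return directory, cls, file_index


if __name__ == "__main__":
    rng = random.Random(96)  # arbitrary fixed seed, for repeatable runs only
    for _ in range(5):
        print(pick_request(num_dirs=10, rng=rng))
```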