SPECweb2005 Benchmark Design Document

SPECweb2005 Release 1.20 Benchmark Design Document

Version 1.20, Last modified 04/05/2006

1.0 Overview of SPECweb2005

SPECweb2005 is a software benchmark product developed by the Standard Performance Evaluation Corporation (SPEC), a non-profit group of computer vendors, system integrators, universities, research organizations, publishers, and consultants. It is designed to measure a system's ability to act as a web server servicing static and dynamic page requests.

SPECweb2005 is the successor to SPECweb99 and SPECweb99_SSL. It can measure both SSL and non-SSL request/response performance, and it continues the tradition of giving Web users the most objective and most representative benchmark for measuring web server performance.

Rather than offering a single benchmark workload that attempts to approximate the breadth of web server workload characteristics found today, SPECweb2005 has chosen a 3-workload benchmark design: banking, ecommerce, and support. Additionally, the change from a concurrent connection-based workload metric to a simultaneous session-based workload metric is intended to offer a more direct correlation between the benchmark workload scores and the number of users a web server can support for a given workload.

This paper will discuss the benchmark architecture and performance metrics. Separate design documents offer workload-specific design details.

2.0 Logical Components of SPECweb2005

SPECweb2005 has four major logical components: the clients, the prime client, the web server, and the back-end simulator (BeSim). These logical components of the benchmark are illustrated below:

Logical Components of SPECweb2005

2.1 Client

The benchmark clients run the application program that sends HTTP requests to the server and receives HTTP responses from the server. For portability, this application program and the prime client program have been written in Java. Note that as a logical component, one or more load-generating clients may exist on a single physical system.

2.2 Prime Client

The prime client initializes and controls the behavior of the clients, runs initialization routines against the web server and BeSim, and collects and stores the results of the benchmark tests. Similarly, as a logical component it may be located on a separate physical system from the clients, or it may be on the same physical system as one of the clients.

2.3 Web Server

The web server is that collection of hardware and software that handles the requests issued by the clients. In this documentation, we shall refer to it as the SUT (System Under Test) or Web Server. The HTTP server software may also be referred to as the HTTP daemon.

2.4 Back-End Simulator (BeSim)

BeSim emulates a back-end application server that the web server must communicate with in order to retrieve specific information needed to complete an HTTP response (customer data, for example). BeSim design documentation is located here: BeSimDesign.html

3.0 Performance Metrics

The primary performance metric for reporting benchmark results is SPECweb2005. Additionally, there are submetric scores for each workload displayed in the workload-specific results files. It is important to understand the significance of both the primary metric scores and the submetric scores for SPECweb2005 results.

3.1 Primary Metric (SPECweb2005)

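The primary metric, SPECweb2005, is the geometric mean of the ratios of the workload submetric scores to their respective workload reference scores, with that result multiplied by 100, as shown in the formula below:

SPECweb2005 Primary Metric Calculation

SPECweb2005 = 100 * (R_Banking * R_Ecommerce * R_Support)^(1/3), where R_w = (submetric score for workload w) / (reference score for workload w)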

The SPECweb2005 primary metric provides an overall system score relative to the submetric scores for the reference system. Since the reference system would have a score of 100 using this calculation, a score of 200 would represent an overall score that is double the reference score. The geometric mean was chosen in order to prevent any single workload submetric score from having excessive weight in the primary metric score.
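The arithmetic described above can be sketched in a few lines of Java. This is only an illustration of the calculation, not code from the harness: the class name, method name, and all score values are hypothetical, and the real reference scores are not given in this document.

```java
/** Illustrative sketch of the primary-metric arithmetic; not harness code. */
final class PrimaryMetricSketch {

    /**
     * Returns 100 times the geometric mean of the submetric/reference ratios.
     * Both arrays must have the same length and the same workload order.
     */
    static double primaryMetric(double[] submetrics, double[] references) {
        if (submetrics.length == 0 || submetrics.length != references.length) {
            throw new IllegalArgumentException("need one reference score per submetric");
        }
        // Sum the logarithms of the ratios rather than multiplying the ratios
        // directly; the result is the same geometric mean.
        double logSum = 0.0;
        for (int i = 0; i < submetrics.length; i++) {
            logSum += Math.log(submetrics[i] / references[i]);
        }
        return 100.0 * Math.exp(logSum / submetrics.length);
    }

    public static void main(String[] args) {
        // Hypothetical reference scores and measured submetrics, in simultaneous sessions.
        double[] references = {1000.0, 1000.0, 1000.0};
        double[] measured = {4000.0, 1000.0, 2000.0};
        // Ratios are 4, 1, and 2; their geometric mean is 2, so the score is about 200.
        System.out.println(primaryMetric(measured, references));
    }
}
```

Note how the example system scores roughly 200 even though one workload ran at four times the reference and another only matched it: the geometric mean keeps the single strong workload from dominating the overall score.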

3.2 Workload Submetrics

Workload submetric scores are in units of simultaneous sessions. They represent the number of simultaneous user sessions the SUT was able to support while meeting the quality-of-service (QOS) requirements of the benchmark. While the primary metric offers a relative, overall performance score for the system being measured, the submetric scores offer a workload-by-workload view of a system's performance characteristics in units of interest for real-world web sites.

4.0 SPECweb2005 Internals

4.1 Architecture

The SPECweb2005 benchmark is used to measure the performance of HTTP servers. The HTTP server workload is driven by one or more client systems, and controlled by the prime client. Each client sends HTTP requests to the server and validates the server responses. When all of the HTTP requests that constitute a web page have been sent and their responses received (typically, this is a dynamic response plus any embedded image files), the client records the number of bytes received, the response time, and the QOS criteria met for that web page transaction.

Prior to the start of the benchmark, one or more client processes are started on each of the client systems. These processes listen either on the default port (1099) or on another port specified by the user in Test.config. Once all client processes have been started, the client systems are ready for workload and run-specific initialization by the prime client.

The prime client will read in the key-value pairs from the configuration files, Test.config, Testbed.config, and the workload-specific configuration file (e.g., SPECweb_Banking.config), and perform initialization for the web server and for BeSim. Upon successful completion, it will initialize each client process, passing each client process the configuration information read from the configuration files, as well as any configuration information the prime client calculated (number of load generating threads, for example). When all initialization has completed successfully, the prime client will start the benchmark run.

At the end of the benchmark run, the prime client collects the result data from all clients, aggregates this data, and writes this information to a results file. When all three iterations have finished, an ASCII text report file and an HTML report file are also generated.

4.2 Implementation

The number of simultaneous sessions corresponds to the number of load-generating processes/threads that will continuously send requests to the HTTP server during the benchmark run. Each of these threads will start a "user session" that will traverse a series of workload-dependent states. Once the user session ends, the thread will start a new user session and repeat this process. This process is intended to represent users entering a site, making a series of HTTP requests of the server, and then leaving the site. A new user session starts as soon as the previous user session ends, and this process continues until the benchmark run is complete.
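The session loop described above can be pictured as a walk through a table of state transition probabilities. The following is a minimal sketch under stated assumptions, not the harness implementation: the class, the EXIT marker, the two-state table, and the rule that leftover probability mass ends the session are all hypothetical choices made for illustration.

```java
import java.util.Random;

/** Illustrative sketch of one load-generating thread's session loop; not harness code. */
final class SessionWalkSketch {

    /** Hypothetical marker meaning "the user leaves the site". */
    static final int EXIT = -1;

    /**
     * Picks the next state from one row of transition probabilities.
     * row[i] is the probability of moving to state i; any probability mass
     * left over (1 minus the row sum) is treated as leaving the site.
     * u is a uniform random number in [0, 1).
     */
    static int nextState(double[] row, double u) {
        double cumulative = 0.0;
        for (int i = 0; i < row.length; i++) {
            cumulative += row[i];
            if (u < cumulative) {
                return i;
            }
        }
        return EXIT;
    }

    /** Runs one user session and returns the number of page requests it made. */
    static int runSession(double[][] transitions, int startState, Random rng) {
        int state = startState;
        int requests = 0;
        while (state != EXIT) {
            requests++; // a real load generator would request the page for this state here
            state = nextState(transitions[state], rng.nextDouble());
        }
        return requests;
    }

    public static void main(String[] args) {
        // Hypothetical two-state workload; each row leaves some probability for EXIT,
        // so every session terminates with probability 1.
        double[][] transitions = {
            {0.0, 0.8}, // from state 0: 80% to state 1, 20% leave
            {0.5, 0.2}, // from state 1: 50% to state 0, 20% stay, 30% leave
        };
        Random rng = new Random(42);
        // A thread starts a new session as soon as the previous one ends.
        for (int session = 1; session <= 3; session++) {
            System.out.println("session " + session + ": "
                    + runSession(transitions, 0, rng) + " page requests");
        }
    }
}
```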

The prime client controls the phases of the benchmark run. These phases are illustrated in the diagram, below:

SPECweb2005 Benchmark Phases

The thread ramp-up period, A, is the time period across which the load generating threads are started. This phase is designed to ramp up user activity rather than beginning the benchmark run with an immediate and full-load spike in requests.

The warm-up period, B, is intended to be a time during which the system can prime its cache prior to the actual measurement interval. At the end of the warm-up period, all results are cleared from the load generator, and recording starts anew. Accordingly, any errors reported prior to the beginning of the run period will not be reflected in the final results for this benchmark run.

The run period, C, is the interval during which benchmark results are recorded. The results of all HTTP requests sent and responses received during this interval will be recorded in the final benchmark results.

The thread ramp-down period, D, is simply the inverse of A. It is the period during which all load-generating threads are stopped. Although load generating threads are still making requests to the server during this interval, all recording of results will have stopped at the end of the run period.

The ramp-down period, E, is the time given to the client and server to return to their "unloaded" state. This is primarily intended to assure sufficient time for TCP connection clean-up before the start of the next test iteration.

The ramp-up period, F, replaces the warm-up period, B, for the second and third benchmark run iterations. It is presumed at this point that the server's cache is already primed, so it requires a shorter period of time between the thread ramp-up period and the run period for these subsequent iterations in order to reach a steady-state condition.
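To make the phase sequence concrete, the sketch below adds up the phase durations for a three-iteration run. It assumes that each iteration consists of phases A through E in order, with F substituted for B after the first iteration, and that phase E also follows the final iteration; the durations used are hypothetical and are not values taken from the run rules.

```java
/** Illustrative sketch of the benchmark phase arithmetic; not harness code. */
final class PhaseTimelineSketch {

    /**
     * Length in seconds of one iteration (numbered from 1).
     * Phases: A = thread ramp-up, B = warm-up (first iteration only),
     * F = ramp-up (later iterations), C = run, D = thread ramp-down, E = ramp-down.
     */
    static int iterationSeconds(int iteration, int threadRampUp, int warmUp, int rampUp,
                                int run, int threadRampDown, int rampDown) {
        // The first iteration primes the cache with B; later iterations use the shorter F.
        int settle = (iteration == 1) ? warmUp : rampUp;
        return threadRampUp + settle + run + threadRampDown + rampDown;
    }

    /** Total length in seconds of a run with the given number of iterations. */
    static int totalSeconds(int iterations, int threadRampUp, int warmUp, int rampUp,
                            int run, int threadRampDown, int rampDown) {
        int total = 0;
        for (int i = 1; i <= iterations; i++) {
            total += iterationSeconds(i, threadRampUp, warmUp, rampUp,
                                      run, threadRampDown, rampDown);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical durations: A=180, B=300, F=180, C=1800, D=180, E=180 seconds.
        System.out.println(iterationSeconds(1, 180, 300, 180, 1800, 180, 180)); // 2640
        System.out.println(iterationSeconds(2, 180, 300, 180, 1800, 180, 180)); // 2520
        System.out.println(totalSeconds(3, 180, 300, 180, 1800, 180, 180));     // 7680
    }
}
```

Only the time spent in phase C contributes to the reported results; the other phases exist to bring the system into and out of a steady state.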

A load-generating thread will make a dynamic request to the HTTP server on one connection, then will reuse that connection as well as an additional connection to the server to make parallel image requests for that page. This is intended to emulate the common browser behavior of using multiple connections to request the page image files from the server. Note that the load-generating thread does not extract the images from the web page returned, as that would create unnecessary page parsing burden for the load-generating threads. Instead, the page image files to be requested for each dynamic page are retrieved from the workload-specific configuration file.

For each page requested by a load-generating thread, the load generator will start a timer immediately before sending the page request to the HTTP server, and it will stop the timer as soon as the last byte of the response for that page is received. It will likewise time the responses for all supporting image files and add those response times to the dynamic page response time in order to arrive at a total response time for that page that includes all supporting image files. Valid responses will then have their aggregate page response time checked against their respective QOS values for the workload, and the value for the corresponding QOS field (TIME_GOOD, TIME_TOLERABLE, or TIME_FAIL) will be incremented.
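The page timing and QOS bookkeeping described above might look like the following sketch. The QOS field names come from this document; the class, the threshold values, and the choice to treat a time exactly equal to a threshold as meeting it are assumptions made for illustration.

```java
/** Illustrative sketch of page-level QOS classification; not harness code. */
final class PageQosSketch {

    enum Qos { TIME_GOOD, TIME_TOLERABLE, TIME_FAIL }

    /** Adds the response times of all supporting image files to the dynamic page time. */
    static long totalPageMillis(long dynamicMillis, long[] imageMillis) {
        long total = dynamicMillis;
        for (long millis : imageMillis) {
            total += millis;
        }
        return total;
    }

    /** Compares the aggregate page response time against the workload's QOS limits. */
    static Qos classify(long totalMillis, long goodMillis, long tolerableMillis) {
        if (totalMillis <= goodMillis) {
            return Qos.TIME_GOOD;
        }
        if (totalMillis <= tolerableMillis) {
            return Qos.TIME_TOLERABLE;
        }
        return Qos.TIME_FAIL;
    }

    public static void main(String[] args) {
        // Hypothetical limits: 2000 ms for "good", 4000 ms for "tolerable".
        long total = totalPageMillis(1200, new long[] {300, 250});
        int[] counters = new int[Qos.values().length];
        counters[classify(total, 2000, 4000).ordinal()]++; // increment the matching QOS field
        System.out.println(total + " ms -> " + classify(total, 2000, 4000));
    }
}
```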

At the end of a run, the prime client aggregates the run data from all of the clients and determines whether the run met the benchmark QOS criteria.

4.3 Harness Software Design

The benchmark harness has three primary software components: the prime client code, the client "base" code, and the client workload code. Additionally, the reporter code is used to generate HTML and ASCII-formatted reports from the "raw" reports created by the prime client.

The SPECweb2005 harness separates the common benchmark harness functionality contained in the prime client code and the client base code from the workload-specific functionality contained in the client workload code. With this design, workloads can more easily be added, removed, or replaced without changes to the base harness code. The workload-specific client code for SPECweb2005 consists of three class files, one for each type of workload: SPECweb_Banking, SPECweb_Ecommerce, and SPECweb_Support. Adding or replacing a workload simply requires that the new class implement the methods expected by the base harness code, and that the workload's base name match the base name of the corresponding workload configuration file. The key harness class files are illustrated below:

SPECweb2005 Key Java Classes

The three classes invoked from the command line are specweb, specwebclient, and reporter.

specwebclient is invoked on one or more client systems to start the client processes that will generate load against the HTTP server. Once invoked, specwebclient listens on the assigned port waiting for the prime client's instructions.

The prime client is started by invoking specweb on the prime client system. specweb reads in the relevant configuration files (Test.config, Testbed.config, and the workload-specific configuration file) and then lets the SPECwebControl class handle benchmark execution. SPECwebControl then creates a RemoteLoadGenerator to handle communication between the prime client and the client processes (represented by the blue line, above). RemoteLoadGenerator communicates with the specwebclient processes via Java's Remote Method Invocation (RMI). The RMI methods called by RemoteLoadGenerator to control specwebclient are:

createThreads():
creates the number of load-generating threads/processes that will send requests to the server. To do this, it checks the configuration information sent by the prime client, and then creates the appropriate workload-specific objects: SPECweb_Banking, SPECweb_Ecommerce, or SPECweb_Support. (It could also invoke a custom workload, such as SPECweb_Custom, provided there was a SPECweb_Custom.class and a SPECweb_Custom.config file, with the latter being specified in the Test.config file.)
setup():
handles initialization of static workload class variables. Since these need only be set once per specwebclient process, rather than for each load-generating thread, this initialization is executed only by the first load-generating thread for each specwebclient process.
start():
starts each load-generating thread. start() corresponds to the beginning of Phase A, THREAD_RAMPUP_SECONDS, in the benchmark phase chart, above.
stop():
stops each load-generating thread. stop() corresponds to the beginning of Phase D, THREAD_RAMPDOWN_SECONDS, in the benchmark phase chart, above.
getHeartbeat():
tells the prime client that this client process is still "alive". The prime client calls this method at a regular interval to affirm that the test is still running properly, rather than waiting until the end of a benchmark run to discover problems.
waitComplete():
gives the load-generating threads a finite amount of time to "clean up" at the end of the run, and then attempts to terminate all of them. It then displays the benchmark run statistics for that client process.
exit():
allows the prime client to kill the client process. This is used at the end of a complete benchmark run (i.e. completion of all iterations) if KILL_CLIENT is set to a value greater than 0 in Test.config. Otherwise, the client remains listening on the port, essentially in the same state as when specwebclient was first invoked.
getStatistics():
returns the results collected by this client process during the benchmark run. This is called at the end of each iteration by the prime client to collect the results from each of the client processes.
clearStatistics():
clears all results collected prior to the beginning of Phase C, the RUN_SECONDS measurement interval.
isReady():
returns true when all load-generating threads have been created on the client.
cleanUp():
gets rid of objects created during the run that will be recreated in any subsequent run, including the LoadGenerator object on the client. Although this is not required, explicitly nullifying these objects allows the memory to be reclaimed more quickly, and creates less work for Java's garbage collector.
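For readers unfamiliar with RMI, the method list above could be exposed to the prime client through a remote interface along the following lines. The method names are taken from the list; the interface name, all parameters, and all return types are assumptions for illustration and should not be read as the harness's actual signatures.

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;

/**
 * Illustrative sketch of a remote control interface for a client process.
 * Parameters and return types are hypothetical; only the method names
 * come from the design document.
 */
interface ClientControlSketch extends Remote {

    // Every remotely callable method must declare RemoteException.
    void createThreads(int threadCount, String workloadName) throws RemoteException;

    void setup() throws RemoteException;

    void start() throws RemoteException;            // beginning of Phase A

    void stop() throws RemoteException;             // beginning of Phase D

    boolean getHeartbeat() throws RemoteException;  // polled at a regular interval

    void waitComplete() throws RemoteException;

    void exit() throws RemoteException;

    Serializable getStatistics() throws RemoteException; // results must cross the wire

    void clearStatistics() throws RemoteException;  // called before Phase C begins

    boolean isReady() throws RemoteException;

    void cleanUp() throws RemoteException;
}
```

Because every call crosses the network, each method declares RemoteException, and anything returned to the prime client has to be serializable.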

The reporter class is invoked only to recreate the ASCII and HTML results for a previously created single or combined raw file, or to combine 3 separate workload .raw files into one submittable .raw file for SPEC submission. The latter is the method to initially generate the raw and formatted files for a complete SPECweb2005 result. Note that the individual .raw files created at the end of the benchmark run for each of the workloads are essentially "interim" results files. They must be combined using the reporter to create a submittable raw file for SPEC submission.

4.4 Rated Receive and Quality of Service

Like SPECweb99, SPECweb2005 uses a rated receive mechanism for simulating connection speeds. However, for SPECweb2005 the simulated connection speeds have been increased to a maximum rate of 100,000 bytes/sec. This rate was chosen to reflect the higher connection speeds common today with the widespread adoption of broadband.

One of the most notable changes in SPECweb2005 from its predecessors is the change from connection-based QOS to a web page-based QOS. For all except the Support workload's "download" request, a time-based QOS is used. Specifically, QOS is based on the amount of time that elapses between a web page request and the receipt of the complete web page, including any supporting image files. For the Support workload's download state, a more appropriate byte rate-based QOS is applied. That is, the total bytes received from a download request divided by the download time must meet the byte rate stipulated in the QOS requirements.
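The byte rate-based check for the download state amounts to a single division and comparison, sketched below. The class, method names, and the required rate used in the example are hypothetical; the actual QOS byte rates are defined by the workload, not by this sketch.

```java
/** Illustrative sketch of the byte rate-based QOS check; not harness code. */
final class DownloadQosSketch {

    /** Average transfer rate of one download, in bytes per second. */
    static double bytesPerSecond(long bytesReceived, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsed time must be positive");
        }
        return bytesReceived * 1000.0 / elapsedMillis;
    }

    /** True when the download met or exceeded the required byte rate. */
    static boolean meetsRate(long bytesReceived, long elapsedMillis,
                             double requiredBytesPerSecond) {
        return bytesPerSecond(bytesReceived, elapsedMillis) >= requiredBytesPerSecond;
    }

    public static void main(String[] args) {
        // Hypothetical requirement of 95,000 bytes/sec.
        System.out.println(meetsRate(1_000_000, 10_000, 95_000.0)); // 100,000 bytes/sec: true
        System.out.println(meetsRate(1_000_000, 20_000, 95_000.0)); // 50,000 bytes/sec: false
    }
}
```

A rate-based criterion suits downloads because a fixed time limit would unfairly penalize large files and be trivially met by small ones.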

4.5 Think Time and HTTP 304 Server Responses

Another new feature for the SPECweb2005 benchmark is the inclusion of "think time" between user session requests (i.e. between workload "states"). After the initial request in a user session, the load generating thread will calculate a random, exponentially-distributed think time between the values of THINK_INTERVAL and THINK_MAX, with an average value of THINK_TIME, as specified in each workload's configuration file.

The load generating thread will then wait that amount of time before issuing another request for that session. This delay is meant to more closely emulate end-user behavior between requests. As a result, connections to the server are kept open much longer than they otherwise would be, and benchmark tuning requires more judicious choices for the server's keep-alive timeout value (particularly for SSL connections), as is the case for real-world web servers.
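One simple way to produce such a think time is to draw from an exponential distribution by inverse-transform sampling and then clamp the result to the allowed range, as sketched below. This document does not say how the harness enforces the bounds, so the clamping step is an assumption; note that clamping shifts the realized average slightly away from THINK_TIME. The numeric values are hypothetical.

```java
import java.util.Random;

/** Illustrative sketch of think-time sampling; not the harness's actual algorithm. */
final class ThinkTimeSketch {

    /**
     * Maps a uniform random number u in [0, 1) to a think time in seconds.
     * thinkTime is the mean of the unclamped exponential distribution;
     * thinkInterval and thinkMax are the lower and upper bounds.
     */
    static double thinkSeconds(double u, double thinkTime,
                               double thinkInterval, double thinkMax) {
        // Inverse-transform sampling of an exponential distribution.
        double sample = -thinkTime * Math.log(1.0 - u);
        // Assumed bounding strategy: clamp into [thinkInterval, thinkMax].
        return Math.min(thinkMax, Math.max(thinkInterval, sample));
    }

    public static void main(String[] args) {
        // Hypothetical settings: THINK_TIME=10, THINK_INTERVAL=1, THINK_MAX=150 seconds.
        Random rng = new Random(7);
        double sum = 0.0;
        int samples = 100_000;
        for (int i = 0; i < samples; i++) {
            sum += thinkSeconds(rng.nextDouble(), 10.0, 1.0, 150.0);
        }
        System.out.println("average think time: " + (sum / samples) + " s");
    }
}
```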

SPECweb2005 has also added support for generating HTTP 304 (Not Modified) server responses. This is accomplished by calling the HTTP server's init script, which returns the current time on the server. The harness then uses that value as the time value in all subsequent If-Modified-Since requests to the HTTP server, assuring that the server will return an HTTP 304 response. How frequently each static image request results in a 304 response is controlled via the "304 response %" value assigned to each of the static image files in the workload-specific configuration file.
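The conditional-request mechanism can be sketched as follows. The sketch assumes the server time is available as a java.time.Instant and that the "304 response %" value is applied as a simple per-request probability; both are assumptions, as are the class and method names. The date format is the standard HTTP date form.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Illustrative sketch of building If-Modified-Since requests; not harness code. */
final class ConditionalRequestSketch {

    // HTTP date format, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    /** Builds the request header line from the time reported by the server. */
    static String ifModifiedSinceHeader(Instant serverTime) {
        return "If-Modified-Since: " + HTTP_DATE.format(serverTime);
    }

    /**
     * Decides whether a given image request should be sent conditionally.
     * percent304 is the file's "304 response %" value (0 to 100);
     * u is a uniform random number in [0, 1).
     */
    static boolean shouldSendConditional(double percent304, double u) {
        return u * 100.0 < percent304;
    }

    public static void main(String[] args) {
        // The files on the server were last modified before the server's "now",
        // so a request carrying this header should be answered with HTTP 304.
        Instant serverTime = Instant.now(); // stand-in for the value the init script returns
        if (shouldSendConditional(75.0, Math.random())) {
            System.out.println(ifModifiedSinceHeader(serverTime));
        } else {
            System.out.println("(unconditional request, full response expected)");
        }
    }
}
```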

4.6 Workload Design

Workload-specific design details are covered in the separate design documents for the Banking, Ecommerce, and Support workloads.

4.7 Harness Configuration Files

In addition to making it easier to create and add new workloads, the SPECweb2005 harness has been made considerably more flexible by exposing more of its behavior in the configuration files. For example, state transitions can be modified by changing the state transition probabilities in the workload-specific configuration file (the STATE_n values). The static image files that are requested along with a dynamic web page can be modified by changing the PAGE_n_FILES values, also in the workload-specific configuration file. You can even change the names of the image files requested, their sizes, and the frequency with which requests for these image files return an HTTP 304 (Not Modified) response, simply through changes to the Image File Details section of the same configuration file.

Obviously, such changes would result in non-compliant benchmark runs, but adding such flexibility increases the benchmark's usefulness in capacity planning and other research.

5.0 Conclusion

SPECweb2005 is a standardized benchmark for measuring web server performance. Building upon the success of its predecessors, it provides an objective measure that allows users to make fair comparisons between results from a wide range of systems.

SPECweb2005 includes three separate workloads representing three common yet disparate types of consumer activity: shopping, banking, and downloading files. By offering PHP and JSP scripts to generate the dynamic server responses, SPECweb2005 also assures that the HTTP server's dynamic request performance is based on two of the most common technologies for generating dynamic content today. And though SPECweb2005 is not designed as a capacity planning tool, by exposing more benchmark tuning flexibility in the configuration files and making it easier to add custom workloads, SPECweb2005 further extends its potential usefulness in capacity planning.

This whitepaper has described the benchmark architecture. It is not a guide to running the benchmark. For information on running the benchmark please refer to the User Guide included with the benchmark CD. Also, please refer to the Run Rules that govern what constitutes a valid SPECweb2005 run prior to running tests whose results will be submitted to SPEC for publication on the SPEC web site or publicly disclosed as a valid SPECweb2005 result.

6.0 Additional Information

More information on SPECweb2005 can be found at the SPEC web site at:

http://www.spec.org/web2005/

Java® is a registered trademark of Sun Microsystems.