SPECweb2005 Release 1.0 Run and Reporting Rules

Version 0.12, Last modified 6/12/2005




1.0 Introduction

This document specifies how SPECweb2005 is to be run for measuring and publicly reporting performance results. These rules abide by the norms laid down by the SPEC Web Subcommittee and approved by the SPEC Open Systems Steering Committee. This ensures that results generated with this suite are meaningful, comparable to other generated results, and are repeatable (with documentation covering factors pertinent to duplicating the results).

Per the SPEC license agreement, all results publicly disclosed must adhere to these Run and Reporting Rules.

1.1 Philosophy

The general philosophy behind the rules of SPECweb2005 is to ensure that an independent party can reproduce the reported results.

The following attributes are expected:

Furthermore, SPEC expects that any public use of results from this benchmark suite shall be for System Under Test (SUT) and configurations that are appropriate for public consumption and comparison. Thus, it is also expected that:

SPEC requires that any public use of results from this benchmark follow the SPEC OSG Fair Use Policy and those specific to this benchmark (see Fair Use section below).  In the case where it appears that these guidelines have not been adhered to, SPEC may investigate and request that the published material be corrected.

1.2 Fair Use of SPECweb2005 Results

When competitive comparisons are made using SPECweb2005 benchmark results the following benchmark specific rules apply:

  1. Only the following approved normalized metrics and submetrics may be used: SPECweb2005,  SPECweb2005_Banking, SPECweb2005_Ecommerce, SPECweb2005_Support.
  2. Simultaneous User Sessions may be used when comparing results from any one workload.
  3. Median Aggregate QoS Compliance and/or Total Weighted Aggregate Byte Rate values may be used to distinguish between SPECweb2005 workload-specific submetrics at the same value.
  4. The following comparisons between result categories (Single Node Platform, Homogeneous Multi Node Platform, and Heterogeneous/Solution Platform) are allowed where a basis for the comparison has been defined; all others are prohibited:

SPEC expects that the following template be used:

SPEC® and SPECweb® are registered trademarks of the Standard Performance Evaluation Corp. (SPEC). Competitive numbers shown reflect results published on www.spec.org as of <date>. [The comparison presented is based on <basis for comparison>].  For the latest SPECweb2005 results visit http://www.spec.org/osg/web2005.
(Note: [...] above required only if selective comparisons are used.)

Example:

SPECweb2005 is a trademark of the Standard Performance Evaluation Corp. (SPEC). Competitive numbers shown reflect results published on www.spec.org as of November 12, 2005. The comparison presented is based on best performing 4-core Single Node Platform servers currently shipping by Vendor 1, Vendor 2 and Vendor 3. For the latest SPECweb2005 results visit http://www.spec.org/osg/web2005.

The rationale for the template is to provide fair comparisons, by ensuring that:


1.3 Research and Academic Usage

SPEC encourages use of the SPECweb2005 benchmark in academic and research environments. It is understood that experiments in such environments may be conducted in a less formal fashion than that demanded of licensees submitting to the SPEC web site. For example, a research environment may use early prototype hardware or software that simply cannot be expected to function reliably for the length of time required to complete a compliant data point, or may use research hardware and/or software components that are not generally available. Nevertheless, SPEC encourages researchers to obey as many of the run rules as practical, even for informal research. SPEC respectfully suggests that following the rules will improve the clarity, reproducibility, and comparability of research results.

Where the rules cannot be followed, the deviations from the rules must be disclosed. SPEC requires that these non-compliant results be clearly distinguished from results officially submitted to SPEC or those that may be published as valid SPECweb2005 results. For example, a research paper may report the number of simultaneous sessions achieved but may not refer to them as SPECweb2005 results if the results are not compliant.


1.4 Caveat

SPEC reserves the right to adapt the benchmark codes, workloads, and rules of SPECweb2005 Release 1.0 as deemed necessary to preserve the goal of fair benchmarking. SPEC will notify members and licensees whenever it makes changes to this document and will rename the metrics.

Relevant standards are cited in these run rules as URL references, and are current as of the date of publication. Changes or updates to these referenced documents or URLs may necessitate repairs to the links and/or amendment of the run rules. The most current run rules will be available at the SPEC Web site at http://www.spec.org. SPEC will notify members and licensees whenever it makes changes to the suite.


2.0 Running the SPECweb2005 Release 1.0 Benchmark

2.1 Environment

2.1.1 Protocols

As the WWW is defined by its interoperable protocol definitions, SPECweb2005 requires adherence to the relevant protocol standards. The Web server is expected to be HTTP 1.1 compliant. The benchmark environment shall be governed by the following standards:

To run SPECweb2005, in addition to all the above standards, SPEC requires the SUT to support SSLv3 as defined in the following:

Of the various ciphers supported in SSLv3, cipher SSL_RSA_WITH_RC4_128_MD5 is currently required for all workload components that use SSL.  It was selected as one of the most commonly used SSLv3 ciphers and allows results to be directly compared to each other. SSL_RSA_WITH_RC4_128_MD5 consists of:

For further explanation of these protocols, the following might be helpful:

The current text of all IETF RFCs may be obtained from: http://ietf.org/rfc.html

2.1.2 Testbed Configuration

These requirements apply to all hardware and software components used in producing the benchmark result, including the System under Test (SUT), network, and clients.

2.1.3 System Under Test (SUT)

For a run to be valid, the following attributes must hold true:

2.2 Measurement

2.2.1 Load Generation

The SPECweb2005 individual workload metrics represent the actual number of user sessions that a server can support while meeting quality of service (QoS) and validation requirements for the given workload.  In the benchmark run, a number of simultaneous user sessions are requested. Typically, each user session starts with a single thread requesting a dynamically created file or page. Once this file is received and the embedded files within the page need to be requested, two threads corresponding to that user session actively make connections and request files on these connections. The number of threads making requests on behalf of a given user session is limited to two, in order to comply with the HTTP 1.1 recommendations.

The load generated is based on page requests, transition between pages and the static images accessed within each page, as defined in the SPECweb2005 Design Specification.

The QoS requirements for each workload are defined in terms of two parameters, Time_Good and Time_Tolerable. QoS requirements are page based, Time_Good and Time_Tolerable values are defined separately for each workload (Time_Tolerable > Time_Good). For each page, 95% of the page requests (including all the embedded files within that page) are expected to be returned within Time_Good and 99% of the requests within Time_Tolerable.  Very large static files (i.e. Support downloads) use specific byte rates as their QoS requirements.
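As an illustration, the 95%/99% page-based QoS check described above can be sketched as follows. The Time_Good and Time_Tolerable thresholds used in the example are hypothetical; the actual values are defined separately for each workload.

```python
def qos_compliant(page_times, time_good, time_tolerable):
    """Check the 95%/99% page-based QoS rule.

    page_times: response times (seconds) for completed requests to one page,
    each time covering the dynamic request plus all embedded files.
    """
    n = len(page_times)
    good = sum(1 for t in page_times if t <= time_good)
    tolerable = sum(1 for t in page_times if t <= time_tolerable)
    return good >= 0.95 * n and tolerable >= 0.99 * n

# Hypothetical thresholds: Time_Good = 2s, Time_Tolerable = 4s.
times = [0.8] * 96 + [3.0] * 3 + [5.0]    # 100 page requests
print(qos_compliant(times, 2.0, 4.0))     # 96% within good, 99% within tolerable -> True
```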

The validation requirement for each workload is that less than 1% of requests for any given page and less than 0.5% of all page requests in a given test iteration fail validation.

It is required in this benchmark that all user sessions be run at the HIGH-SPEED-INTERNET speed of 100,000 bytes/sec.

In addition, the URL retrievals (or operations) performed must also meet the following quality criteria:

Note: The Weighted Percentage Difference for any given workload page is calculated using the following formulas:

WPD = PageMix% * ETR

ETR = (#Sessions * RunTime) / (ThinkTime * %RwTT + AvgRspTime)

Where:

Workload Page Mix Percentage Table

Banking              Mix %     Ecommerce        Mix %     Support        Mix %
------------------   ------    --------------   ------    ------------   ------
acct summary         15.11%    billing           3.37%    catalog        11.71%
add payee             1.12%    browse           11.75%    download        6.76%
bill pay             13.89%    browse product   10.03%    file           13.51%
bill pay status       2.23%    cart              5.30%    file catalog   22.52%
check detail html     8.45%    confirm           2.53%    home            8.11%
check image          16.89%    customize1       16.93%    product        24.78%
change profile        1.22%    customize2        8.95%    search         12.61%
login                21.53%    customize3        6.16%
logout                6.16%    index            13.08%
payee info            0.80%    login             3.78%
post check order      0.88%    product detail    8.02%
post fund transfer    1.24%    search            6.55%
post profile          0.88%    shipping          3.55%
quick pay             6.67%
request checks        1.22%
req xfer form         1.71%
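The ETR and WPD formulas above can be sketched in code. The input values below are hypothetical, and the variable interpretations (sessions, run time in seconds, think time, fraction of requests with think time, average response time) are assumptions inferred from the formula names rather than normative definitions.

```python
def expected_tx_rate(sessions, run_time, think_time, pct_rw_tt, avg_rsp_time):
    """ETR = (#Sessions * RunTime) / (ThinkTime * %RwTT + AvgRspTime)."""
    return (sessions * run_time) / (think_time * pct_rw_tt + avg_rsp_time)

def weighted_pct_difference(page_mix_pct, etr):
    """WPD = PageMix% * ETR."""
    return page_mix_pct * etr

# Hypothetical values: 1000 sessions, 1800 s run time, 10 s think time,
# 95% of requests subject to think time, 0.5 s average response time.
etr = expected_tx_rate(1000, 1800, 10.0, 0.95, 0.5)
wpd = weighted_pct_difference(0.1511, etr)   # Banking "acct summary" mix
print(round(etr), round(wpd))                # 180000 27198
```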








2.2.2 Benchmark Parameters

Workload-specific configuration files are supplied with the harness. All configurable parameters are listed in these files. For a run to be valid, all the parameters in the configuration files must be left at default values, except for the ones that are marked and listed clearly as "Configurable Workload Properties".

2.2.3 Running SPECweb2005 Workloads

Since SPECweb2005 contains three distinct workloads (banking, ecommerce, and support), the benchmarker may:


2.3 Workload Filesets

The particular files referenced shall be determined by the workload generation in the benchmark itself. A fileset for a workload consists of content that the dynamic scripts reference. This represents images, static content, and also "padding" to bring the dynamic page sizes in line with those observed in real-world Web sites. All filesets are to be generated using the Wafgen fileset generator supplied with the benchmark tools. It is the responsibility of the benchmarker to ensure that these files are placed on the SUT so that they can be accessed properly by the benchmark. These files, and only these files, shall be used as the target fileset. The benchmark performs internal validations to verify the expected results. No modification or bypassing of this validation is allowed.

The SUT is required to be configured with the storage to contain all necessary software and logs for compliant runs of all three workloads.  At a minimum, the system must also be configured to contain the largest fileset of the three workloads, such that each of the other two workload filesets can be mapped into the same storage footprint.  If the system has not been configured with storage to hold the filesets for all three workloads concurrently, then the benchmarker must use the same I/O subsystem (disks, controllers, etc.) and not add or remove storage.  The disclosure details must indicate whether the filesets were stored concurrently or remapped between workload runs.

2.3.1 Banking Fileset

For the Banking workload, we define two types of files:

1. The embedded image files, which do not grow with the load. Details on these files (bytes and type) are specified in the design document.
2. The check images, whose number increases linearly with the number of simultaneous connections supported. For each connection supported, check images are maintained for 50 users, each in its own directory. For each user defined, 20 check images are maintained: 10 representing the front of the checks and the other 10 representing the back of the checks.

The above assumes that under high load conditions in a banking environment, we would expect to see no more than 1% of the banking customers logged in at the same time.
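The Banking fileset scaling rule above can be sketched as follows (the session count in the example is illustrative):

```python
def banking_check_images(simultaneous_sessions, users_per_session=50,
                         images_per_user=20):
    """Number of check image files in the Banking fileset.

    Per the rule above: 50 user directories per supported session, each
    holding 10 front-of-check and 10 back-of-check images (20 per user).
    """
    return simultaneous_sessions * users_per_session * images_per_user

print(banking_check_images(1000))   # 1,000,000 check images for 1000 sessions
```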

2.3.2 E-commerce Fileset

For the E-commerce workload, two types of files are defined:

1. The embedded image files, which do not grow with the load. Details on these files (bytes and type) are specified in the design document.
2. The product images, which increase linearly with the number of simultaneous sessions requested. For each simultaneous session, 5 "product line" directories are created. Each product line directory contains images for 10 different "products". Each product has 3 different sizes, representing the various views of products that are often presented to users (i.e., thumbnails, medium-sized, and larger close-up views).

2.3.3 Support Site Fileset

For the support site workload, two types of files are defined:

1. The embedded image files, which do not grow with the load. Details on these files (bytes and type) are specified in the design document.
2. The file downloads, which increase linearly with the number of simultaneous sessions requested. The ratio of simultaneous sessions to download directories is 4:1. Each directory contains downloads for 5 different categories (i.e. flash BIOS upgrades, video card drivers, etc.).  The file sizes were determined by analyzing the file sizes observed at various hardware vendors' support sites.
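The scaling rules for the E-commerce product images (section 2.3.2) and the support-site download directories above can be sketched as follows (session counts are illustrative):

```python
def ecommerce_product_images(sessions, lines_per_session=5,
                             products_per_line=10, sizes_per_product=3):
    """Product images: 5 product-line directories per simultaneous session,
    10 products per line, 3 image sizes per product (thumbnail, medium,
    close-up)."""
    return sessions * lines_per_session * products_per_line * sizes_per_product

def support_download_dirs(sessions, sessions_per_dir=4):
    """Download directories: one per 4 simultaneous sessions (4:1 ratio)."""
    return sessions // sessions_per_dir

print(ecommerce_product_images(1000))  # 150,000 product images
print(support_download_dirs(1000))     # 250 download directories
```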

2.4 Dynamic Request Processing

SPECweb2005 follows a page-based model. Each page is initiated by a dynamic GET or POST request, which runs a dynamic script on the server and returns a dynamically created Web page. Associated with each dynamic page is a set of static files or images, which the client requests right after the receipt of the dynamically created page. The page is marked as complete when all the associated images/static files for that page are fully received.
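The page-based model can be sketched as follows. The URLs and the `fetch` placeholder are illustrative, not part of the benchmark; the two-connection limit follows the HTTP 1.1 recommendation noted in section 2.2.1.

```python
import concurrent.futures

def fetch(url):
    """Placeholder for an HTTP GET/POST; returns the (simulated) body."""
    return f"content of {url}"

def fetch_page(dynamic_url, embedded_urls, max_connections=2):
    """Fetch a dynamic page, then its embedded static files.

    The page counts as complete only when the dynamic response and every
    embedded file have been fully received; at most two connections are
    used per user session.
    """
    page = fetch(dynamic_url)                # dynamic GET/POST starts the page
    with concurrent.futures.ThreadPoolExecutor(max_connections) as pool:
        statics = list(pool.map(fetch, embedded_urls))
    return page, statics                     # page is marked complete here

page, images = fetch_page("/bank/account_summary.php",
                          ["/images/logo.gif", "/images/btn.gif"])
print(len(images))   # 2
```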

Only the dynamic scripts provided in the benchmark kit may be used for submissions/publications. In version 1.0 of the release, implementations in PHP and JSP are provided.

The pseudo code reference specifications are the standard definition of the functionality. Any dynamic implementation must follow the specification exactly.

Approval of any newly submitted dynamic code for future releases will include testing conformance to pseudo code as well as running of the code on other platforms by active members of the sub-committee. This will be done in order to ensure compliance with the letter and spirit of the benchmark, namely whether the scripts used to code the dynamic requests are representative of  scripts commonly in use within the relevant customer base.  An acceptable scripting language must meet the following requirements:

The sub-committee plans to release an updated benchmark kit every 6 months as needed. In order to allow sufficient time for the review and release process, the submitter is required to submit the dynamic code during the first two months of the cycle.   It is recommended that the submitter inform the sub-committee of the plan to provide an implementation prior to the actual code submission. 

All dynamic implementations submitted to SPEC must include a signed permission to use form and must be freely available for use by other members and licensees of the benchmark.


3.0 Reporting Results

3.1 Metrics And Reference Format

The reported metric, SPECweb2005,  will be derived from a set of compliant results from all three workloads in the suite:

The SPECweb2005 metric is a "supermetric" that is the geometric mean of the three normalized submetrics for each workload. The normalized submetric for a given workload is defined as the ratio of the workload metric for the SUT to the workload metric for the reference platform, multiplied by 100.
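A minimal sketch of that computation follows; the reference-platform session counts used below are hypothetical placeholders, not SPEC's published values.

```python
def specweb2005_metric(sut, reference):
    """Geometric mean of the three normalized submetrics.

    sut / reference: dicts mapping workload name to its
    simultaneous-session metric. Each normalized submetric is
    (SUT metric / reference metric) * 100 for that workload.
    """
    workloads = ("banking", "ecommerce", "support")
    normalized = [100.0 * sut[w] / reference[w] for w in workloads]
    product = normalized[0] * normalized[1] * normalized[2]
    return product ** (1.0 / 3.0)

# Hypothetical session counts for illustration only:
ref = {"banking": 100, "ecommerce": 130, "support": 90}
sut = {"banking": 2000, "ecommerce": 2600, "support": 1800}
print(round(specweb2005_metric(sut, ref)))   # 2000
```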

The individual workload metric is the number of simultaneous sessions from a compliant test run consisting of three consecutive valid and conforming iterations of the benchmark, using one invocation of "java specweb".

Each iteration consists of a minimum 3-minute thread ramp-up, a minimum 5-minute warm-up period, and a 30-minute measurement period (i.e., run time, which may be increased to ensure at least 100 requests for each page type are completed when the load is minimal). There are also corresponding ramp-down periods (3 minutes + 5 minutes) between iterations.

The SPECweb2005 reference platform consists of:

Hardware:

Processor: Athlon MP 1.2 GHz
Motherboard: Tyan S2462
Memory: 2048 MB RAM in 4 x 512 MB DIMMs
Disk Subsystem: Adaptec AIC-7899W on-board
  One channel: 1 x 18 GB U160 SCSI drive for OS/swap/logs
  Other channel: 3 x 18 GB U160 SCSI drives in software RAID0 for fileset
Network: Single-port Intel Pro/1000XT NIC

Software:

OS: Fedora Core release 2 with 2.6.5-1.358 kernel
Webserver: Apache 1.3.31 with Mod_ssl 2.8.20 and OpenSSL 0.9.7e
Scripting language: PHP 4.3.9 and Smarty 2.6.6

The metric SPECweb2005 and individual workload metrics may not be associated with any estimated results. This includes adding, multiplying or dividing measured results to create a derived metric for some other system configuration.

The report of results for the SPECweb2005 benchmark is generated in ASCII and HTML format by the provided SPEC tools. These tools may not be changed without prior SPEC approval. The tools perform error checking and will flag some error conditions as resulting in an "invalid run".  However, these automatic checks are only there for debugging convenience, and do not relieve the benchmarker of the responsibility to check the results and follow the run and reporting rules.


3.1.1 Categorization of Results

SPECweb2005 results will be categorized into single-node and multiple-node results, where the terms single and multiple node are as defined in this section. Multiple-node results are further divided into two types, homogeneous and heterogeneous. Moreover, for submissions involving homogeneous multiple nodes, the subcommittee also requires a submission on a corresponding single-node platform (see details in the following paragraphs).

A Single Node Platform for SPECweb2005 consists of one or more processors executing a single instance of an OS and one or more instances of the same Web server software. Externally attached storage for software and filesets may be used; all other performance critical operations must be performed within the single server node.  A single common set of NICs must be used across all 3 workloads to relay all HTTP and HTTPS  traffic.

Example:
                                 |
test harness (clients, switches)=|=Server NICs:Server Node:Storage
                                 |

A Homogeneous Multi Node Platform for SPECweb2005 consists of two or more electrically equivalent single Node Servers in a single chassis or connected through a shared bus. Each node contains the same number and type of processing units and devices, and each node executes a single instance of an OS and one or more instances of the same Web server software. Storage for the filesets may be duplicated or shared. All incoming requests from the test harness must be load balanced either by a single node that receives all incoming requests and balances the load across the other nodes (A) or by a separate load balancing appliance that serves that function (B). Each node must contain a single common set of NICs that must be used across all 3 workloads to relay all HTTP and HTTPS traffic.

If a separate load balancing appliance is used it must be included in the SUT's definition. 

A)
                                 |
test harness (clients, switches)=|=Node_1 NICs:Node_1_LB:Node_2:..:Node_N
                                 |
B)
                                                  Node_1
                                 |               /
test harness (clients, switches)=|=LoadBalancer +------Node_2
                                 |               \
                                                  Node_N



A Heterogeneous/Solution Platform for SPECweb2005 consists of any combination of server nodes and appliances that have been networked together to provide all the performance critical functions measured by the benchmark. All incoming requests from the test harness must be load balanced by either a single node that receives all incoming requests and balances the load across the other nodes or by a separate load balancing appliance that serves that function. Electrical equivalence between server nodes is not required.

Storage for the filesets may be duplicated or shared. Additional appliances that provide performance critical operations such as intelligent switches or SSL appliances may be used. All nodes and appliances used must be included in the SUT's definition. Examples: C & D.

C)

                                 |
test harness (clients, switches)-|-I_Switch-Node_1 NICs:Node_1_LB:Node_2:..:Node_N
                                 |

D)

                                                  SSLappliance-ImageServer_1
                                 |               /
test harness (clients, switches)-|-LoadBalancer-+-SSLappliance-Node_2
                                 |               \
                                                  SSLappliance-Node_N

3.2 Testbed Configuration

All system configuration information required to duplicate published performance results must be reported. Any software or hardware tunings that differ from the default configuration, including details on network interfaces, must be reported.

3.2.1 SUT Hardware

The SUT hardware configuration must not be changed between workload runs.  However, not all hardware used in one workload is required to be used in another.  In the case where multiple controllers are used for one workload, the same controllers must be electronically connected, and some subset of those controllers must be used, for the other workloads.

In the case of NICs, all NICs must be used by each workload and each NIC must carry a significant portion of the network traffic.

The following SUT hardware components must be reported:

The documentation of the hardware for a result in the Heterogeneous/Solution Platform category must also include a diagram of the configuration.

3.2.2 SUT Software

The following SUT software components must be reported:

3.2.3 Network Configuration

A brief description of the network configuration used to achieve the benchmark results is required. The minimum information to be supplied is:

3.2.4 Clients

The following load generator hardware components must be reported:

3.2.5 Backend Simulator (BeSim)

The following BeSim hardware and software components must be reported:

Note: BeSim API code is provided as part of the SPECweb2005 kit, and can be compiled in several different ways: ISAPI, NSAPI, or FastCGI.  For more information, please see the User's Guide.

3.2.6 General Availability Dates

The dates of general customer availability (month and year) must be listed for the major components: hardware, HTTP server, and operating system. All system, hardware, and software features are required to be generally available on or before the date of publication, or within 90 days of the date of publication (except where precluded by these rules; see section 3.2.7). When multiple components have different availability dates, the latest availability date must be listed.

Products are considered generally available if they are orderable by ordinary customers and ship within a reasonable time frame. This time frame is a function of the product size and classification, and common practice. The availability of support and documentation for the products must coincide with the release of the products.

Hardware products that are still supported by their original or primary vendor may be used if their original general availability date was within the last five years. The five-year limit is waived for hardware used in client and BeSim systems.

Software products that are still supported by their original or primary vendor may be used if their original general availability date was within the last three years.

In the disclosure, the benchmarker must identify any component that is no longer orderable by ordinary customers.

If pre-release hardware or software is tested, then the test sponsor represents that the performance measured is generally representative of the performance to be expected on the same configuration of the released system. If the sponsor later finds the performance of the released system to be more than 5% lower than that reported for the pre-release system, then the sponsor shall submit a new, corrected test result.

3.2.7 Rules on the Use of Open Source Applications

SPECweb2005 does permit Open Source Applications outside of a commercial distribution or support contract, with some limitations. The following rules govern the admissibility of an Open Source Application in the context of a benchmark run or implementation. Open Source Applications do not include shareware and freeware, where the source is not part of the distribution.

  1. Open Source Application rules do not apply to Open Source operating systems, which still require a commercial distribution and support.

  2. Only a "stable" release can be used in the benchmark environment; non-"stable" releases (alpha, beta, or release candidates) cannot be used.  Reason: An open source project is not contractually bound, and volunteer resources make predictable future release dates unlikely (i.e., it may be more likely to miss SPEC's 90-day General Availability window).  A "stable" release is one that is clearly denoted as a stable release, or a release that is available and recommended for general use. It must be a release that is not on the development fork and is not designated as an alpha, beta, test, preliminary, pre-release, prototype, release candidate, or any other term that indicates that it may not be suitable for general use.

  3. The initial "stable" release of the application must be a minimum of 12 months old.  Reason: This helps ensure that the software has real application to the intended user base and is not a benchmark special that is put out with a benchmark result and only available for the first three months to meet SPEC's forward availability window.

  4. At least 2 additional releases (major, minor, or bug fix) must also have been completed.  Reason: This helps establish a track record for the project that it is actively being maintained.

  5. An established online support forum must be in place and clearly active, "usable", and "useful".  It is expected that there be at least one posting within the last 90 days.  Postings from the benchmarkers or their representatives, or members of the Web subcommittee, will not be included in the count. Reason: Another aspect that establishes that support is available for the software.  However, benchmarkers must not cause the forum to appear active when it otherwise would not be. A "useful" support forum is defined as one that provides useful responses to users' questions, such that if a previously unreported problem is reported with sufficient detail, it is responded to by a project developer or community member with sufficient information that the user ends up with a solution, a workaround, or notification that the issue will be addressed in a future release or that it is outside the scope of the project.  The archive of the problem-reporting tool must have examples of this level of conversation. A "usable" support forum is defined as one where the problem-reporting tool is available without restriction, has a simple user interface, and users can access old reports.

  6. The project must have at least 2 identified developers contributing and maintaining the application.  Reason: To help ensure that this is a real application with real developers and not a fly-by-night benchmark special.

  7. The application must use a standard open source license such as one of those listed at http://www.opensource.org/licenses/.

  8. The “stable” release used in the actual test run must be the current stable release at the time the test result is run or the prior “stable” release if the superseding/current “stable” release will be less than 90 days old at the time the result is made public.

  9. The “stable” release used in the actual test run must be no older than 18 months.  If there has not been a “stable” release within 18 months, then the open source project may no longer be active and as such may no longer meet these requirements.  An exception may be made for “mature” projects (see below).

  10. In rare cases, open source projects may reach “maturity” where the software requires little or no maintenance and there may no longer be active development.  If it can be demonstrated that the software is still in general use and recommended either by commercial organizations or active open source projects or user forums and the source code for the software is less than 20,000 lines, then a request can be made to the subcommittee to grant this software “mature” status.  This status may be reviewed semi-annually.  An example of a “mature” project would be the FastCGI library.

3.2.8 Test Sponsor

The reporting page must list the date the test was performed, month and year, the organization which performed the test and is reporting the results, and the SPEC license number of that organization.

3.2.9 Notes

This section is used to document:

3.3 Log File Review

The following additional information may be required to be provided for SPEC's results review:

The submitter is required to keep the entire log file from both the SUT and the BeSim box, for each of the three workloads, for the duration of the review period.


4.0 Submission Requirements for SPECweb2005

Once you have a compliant run and wish to submit it to SPEC, you will need to provide the following:

Once you have the submission ready, please email SPECweb2005 submissions to subweb2005@spec.org.

SPEC encourages the submission of results for review by the relevant subcommittee and subsequent publication on SPEC's web site. Licensees may publish compliant results independently; however, any SPEC member may request a full disclosure report for that result and the test sponsor must comply within 10 business days. Issues raised concerning a result's compliance to the run and reporting rules will be taken up by the relevant subcommittee regardless of whether or not the result was formally submitted to SPEC.



5.0 The SPECweb2005 Release 1.0 Benchmark Kit

SPEC provides client driver software, which includes tools for running the benchmark and reporting its results.  This client driver is written in Java; precompiled class files are included with the kit, so no build step is necessary. This software implements various checks for conformance with these run and reporting rules. Therefore the SPEC software must be used, except that necessary substitution of equivalent functionality (e.g., fileset generation) may be done only with prior approval from SPEC. Any such substitution must be reviewed and deemed "performance-neutral" by the OSSC.

The kit also includes Java code for the file set generator (Wafgen) and C code for BeSim.

SPEC also provides server-side script code for each workload. In the initial release, PHP and JSP scripts are provided.  These scripts have been tested for functionality and correctness on various operating systems and Web servers. Hence all submissions must use either of these script implementations.  Any new dynamic script implementation will be evaluated by the sub-committee according to the acceptance process (see section 2.4)

Once the code is approved by the sub-committee, it will be made available on the SPEC Web site for any licensee to use in their tests/submissions.  Upon approval, the new implementation will be made available in future releases of the benchmark and may not be used until after the release of the new version.


Copyright© 2005 Standard Performance Evaluation Corporation

Java® is a registered trademark of Sun Microsystems.