SPECweb2005 Release 1.20 Run and Reporting Rules
Version 1.20, Last modified 6/30/08
1.0 Introduction
1.1 Philosophy
1.2 Fair Use of SPECweb2005 Results
1.3 Research and Academic Usage
1.4 Caveat
2.0 Running the SPECweb2005 Benchmark
2.1 Environment
2.1.1 Protocols
2.1.2 Testbed Configuration
2.1.3 System Under Test (SUT)
2.2 Measurement
2.2.1 Load Generation
2.2.2 Benchmark Parameters
2.2.3 Running SPECweb2005 Workloads
2.3 Workload Filesets
2.3.1 Banking Fileset
2.3.2 E-commerce Fileset
2.3.3 Support Site Fileset
2.4 Dynamic Request Processing
3.0 Reporting Results
3.1 Metrics And Reference Format
3.1.1 Categorization of Results
3.2 Testbed Configuration
3.2.1 SUT Hardware
3.2.2 SUT Software
3.2.2.1 SUT Software Tuning Allowances
3.2.3 Network Configuration
3.2.4 Clients
3.2.5 Backend Simulator (BeSim)
3.2.6 General Availability Dates
3.2.7 Rules on the Use of Open Source Applications
3.2.8 Test Sponsor
3.2.9 Notes
3.3 Log File Review
4.0 Submission Requirements for SPECweb2005
5.0 The SPECweb2005 Benchmark Kit
This document specifies how SPECweb2005 is to be run for measuring and publicly reporting performance results. These rules abide by the norms laid down by the SPEC Web Subcommittee and approved by the SPEC Open Systems Steering Committee. This ensures that results generated with this suite are meaningful, comparable to other generated results, and are repeatable (with documentation covering factors pertinent to duplicating the results).
Per the SPEC license agreement, all results publicly disclosed must adhere to these Run and Reporting Rules.
The general philosophy behind the rules of SPECweb2005 is to ensure that an independent party can reproduce the reported results.
The following attributes are expected:
Furthermore, SPEC expects that any public use of results from this benchmark suite shall be for System Under Test (SUT) and configurations that are appropriate for public consumption and comparison. Thus, it is also expected that:
SPEC requires that any public use of results from this benchmark follow the SPEC OSG Fair Use Policy and those specific to this benchmark (see Fair Use section below). In the case where it appears that these guidelines have not been adhered to, SPEC may investigate and request that the published material be corrected.
When public disclosures and competitive comparisons are made using SPECweb2005 benchmark results, the following benchmark-specific rules apply:
SPEC expects that the following template be used:
SPEC® and SPECweb® are registered trademarks of the Standard Performance Evaluation Corp. (SPEC). Competitive numbers shown reflect results published on www.spec.org as of <date>. [The comparison presented is based on <basis for comparison>]. For the latest SPECweb2005 results visit http://www.spec.org/osg/web2005.
(Note: the bracketed sentence above is required only if selective comparisons are used.)
Example:
SPECweb2005 is a trademark of the Standard Performance Evaluation Corp. (SPEC). Competitive numbers shown reflect results published on www.spec.org as of November 12, 2005. The comparison presented is based on best performing 4-core Single Node Platform servers currently shipping by Vendor 1, Vendor 2 and Vendor 3. For the latest SPECweb2005 results visit http://www.spec.org/osg/web2005.
The rationale for the template is to provide fair comparisons, by ensuring that:
SPEC encourages use of the SPECweb2005 benchmark in academic and research environments. It is understood that experiments in such environments may be conducted in a less formal fashion than that demanded of licensees submitting to the SPEC web site. For example, a research environment may use early prototype hardware or software that simply cannot be expected to function reliably for the length of time required to complete a compliant data point, or may use research hardware and/or software components that are not generally available. Nevertheless, SPEC encourages researchers to obey as many of the run rules as practical, even for informal research. SPEC respectfully suggests that following the rules will improve the clarity, reproducibility, and comparability of research results.
Where the rules cannot be followed, the deviations from the rules must be disclosed. SPEC requires these non-compliant results be clearly distinguished from results officially submitted to SPEC or those that may be published as valid SPECweb2005 results. For example, a research paper can use simultaneous sessions but may not refer to them as SPECweb2005 results if the results are not compliant.
SPEC reserves the right to adapt the benchmark codes, workloads, and rules of SPECweb2005 as deemed necessary to preserve the goal of fair benchmarking. SPEC will notify members and licensees whenever it makes changes to this document and will rename the metrics.
Relevant standards are cited in these run rules as URL references, and are current as of the date of publication. Changes or updates to these referenced documents or URLs may necessitate repairs to the links and/or amendment of the run rules. The most current run rules will be available at the SPEC Web site at http://www.spec.org. SPEC will notify members and licensees whenever it makes changes to the suite.
As the WWW is defined by its interoperative protocol definitions, SPECweb2005 requires adherence to the relevant protocol standards. It is expected that the Web server is HTTP 1.1 compliant. The benchmark environment shall be governed by the following standards:
To run SPECweb2005, in addition to all the above standards, SPEC requires the SUT to support SSLv3 as defined in the following:
Of the various ciphers supported in SSLv3, cipher SSL_RSA_WITH_RC4_128_MD5 is currently required for all workload components that use SSL. It was selected as one of the most commonly used SSLv3 ciphers and allows results to be directly compared to each other. SSL_RSA_WITH_RC4_128_MD5 consists of:
A compliant result must use the cipher suite listed above, and must employ the 1024-bit key for RSA public key encryption, the 128-bit key for RC4 bulk data encryption, and a 128-bit output for the Message Authentication Code.
For further explanation of these protocols, the following might be helpful:
The current text of all IETF RFCs may be obtained from: http://ietf.org/rfc.html
All marketed standards that a software product states as being adhered to must have passed the relevant test suites used to ensure compliance with the standards. For example, in the case of JavaServer Pages, one must pass the published test suites from Sun.
These requirements apply to all hardware and software components used in producing the benchmark result, including the System under Test (SUT), network, and clients.
The SUT must conform to the appropriate networking standards, and must utilize variations of these protocols to satisfy requests made during the benchmark.
The value of TCP TIME_WAIT must be at least 60 seconds (i.e., if a connection between the SUT and a client enters TIME_WAIT, it must stay in TIME_WAIT for at least 60 seconds).
The SUT must be comprised of components that are generally available on or before date of publication, or shall be generally available within 90 days of the first publication of these results.
Any deviations from the standard default configuration for testbed configuration components must be documented so an independent party would be able to reproduce the configuration and the result without further assistance.
The connections between a SPECweb2005 load generating machine and the SUT must not use a TCP Maximum Segment Size (MSS) greater than 1460 bytes. This needs to be accomplished by platform-specific means outside the benchmark code itself. The method used to set the TCP MSS must be disclosed. MSS is the largest "chunk" of data that TCP will send to the other end. The resulting IP datagram is normally 40 bytes larger: 20 bytes for the TCP header and 20 bytes for the IP header resulting in an MTU (Maximum Transmission Unit) of 1500 bytes.
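The MSS/MTU arithmetic above, and one way a tester might cap the segment size, can be sketched as follows. This is illustrative only: the run rules require the cap to be applied by platform-specific means outside the benchmark code, and the `TCP_MAXSEG` socket option shown here is one Linux mechanism among several (route metrics and interface MTU settings are alternatives).

```python
import socket

MSS_LIMIT = 1460      # run-rule maximum TCP MSS, in bytes
TCP_HEADER = 20       # bytes, assuming no TCP options
IP_HEADER = 20        # bytes, assuming no IP options

def ip_datagram_size(mss: int) -> int:
    """IP datagram size (and hence required MTU) for a given MSS."""
    return mss + TCP_HEADER + IP_HEADER

# Illustrative only: capping the MSS on a Linux socket before connecting.
def make_capped_socket(mss: int = MSS_LIMIT) -> socket.socket:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG, mss)
    return s
```

With the 1460-byte limit required here, `ip_datagram_size(1460)` gives the familiar 1500-byte Ethernet MTU.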
The BeSim engine must be run on a physically different system from the SUT.
For a run to be valid, the following attributes must hold true:
The SPECweb2005 individual workload metrics represent the actual number of user sessions that a server can support while meeting quality of service (QoS) and validation requirements for the given workload. In the benchmark run, a number of simultaneous user sessions are requested. Typically, each user session would start with a single thread requesting a dynamically created file or page. Following the receipt of this file and the need to request multiple embedded files within the page, two threads corresponding to that user session actively make connections and request files on these connections. The number of threads making requests on behalf of a given user session is limited to two, in order to comply with the HTTP 1.1 recommendations.
The load generated is based on page requests, transition between pages and the static images accessed within each page, as defined in the SPECweb2005 Design Specification.
The QoS requirements for each workload are defined in terms of two parameters, Time_Good and Time_Tolerable. QoS requirements are page based; Time_Good and Time_Tolerable values are defined separately for each workload (Time_Tolerable > Time_Good). For each page, 95% of the page requests (including all the embedded files within that page) are expected to be returned within Time_Good and 99% of the requests within Time_Tolerable. Very large static files (i.e., Support downloads) use specific byte rates as their QoS requirements.
The validation requirement for each workload is such that less than 1% of requests for any given page and less than 0.5% of all page requests in a given test iteration fail validation.
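The per-page QoS and validation requirements above reduce to percentile and error-rate tests, which can be sketched as follows (function and parameter names are mine, not the harness's):

```python
def meets_page_qos(response_times, time_good, time_tolerable):
    """QoS rule for one page type: 95% of page requests must complete
    within Time_Good and 99% within Time_Tolerable.
    `response_times` holds full-page response times (page plus all
    embedded files), in seconds."""
    n = len(response_times)
    good = sum(t <= time_good for t in response_times)
    tolerable = sum(t <= time_tolerable for t in response_times)
    return good / n >= 0.95 and tolerable / n >= 0.99

def meets_validation(page_errors, page_requests, total_errors, total_requests):
    """Validation rule: <1% failures for any given page and <0.5% of
    all page requests in the iteration."""
    return (page_errors / page_requests < 0.01
            and total_errors / total_requests < 0.005)
```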
It is required in this benchmark that all user sessions be run at the HIGH-SPEED-INTERNET speed of 100,000 bytes/sec.
In addition, the URL retrievals (or operations) performed must also meet the following quality criteria:
Note: The Weighted Percentage Difference (WPD) for any given workload page is calculated using the following formulas:
WPD = PageMix% * ETR
ETR = (#Sessions * RunTime) / (ThinkTime * %RwTT + AvgRspTime)
Where:
Workload Page Mix Percentage Table

Banking              Mix %  | Ecommerce       Mix %  | Support       Mix %
acct summary         15.11% | billing          3.37% | catalog       11.71%
add payee             1.12% | browse          11.75% | download       6.76%
bill pay             13.89% | browse product  10.03% | file          13.51%
bill pay status       2.23% | cart             5.30% | file catalog  22.52%
check detail html     8.45% | confirm          2.53% | home           8.11%
check image          16.89% | customize1      16.93% | product       24.78%
change profile        1.22% | customize2       8.95% | search        12.61%
login                21.53% | customize3       6.16% |
logout                6.16% | index           13.08% |
payee info            0.80% | login            3.78% |
post check order      0.88% | product detail   8.02% |
post fund transfer    1.24% | search           6.55% |
post profile          0.88% | shipping         3.55% |
quick pay             6.67% |                        |
request checks        1.22% |                        |
req xfer form         1.71% |                        |
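The WPD formulas above can be applied directly to the page mix table. The sketch below uses my own parameter names, and the reading of ETR as the expected total request count per iteration is an assumption inferred from the shape of the formula, not a statement from the benchmark documentation:

```python
def expected_total_requests(sessions, run_time, think_time, pct_rw_tt, avg_rsp_time):
    """ETR = (#Sessions * RunTime) / (ThinkTime * %RwTT + AvgRspTime)."""
    return (sessions * run_time) / (think_time * pct_rw_tt + avg_rsp_time)

def weighted_page_requests(page_mix, etr):
    """WPD = PageMix% * ETR, e.g. page_mix=0.1511 for the
    Banking 'acct summary' page from the table above."""
    return page_mix * etr
```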
Workload-specific configuration files are supplied with the harness. All configurable parameters are listed in these files. For a run to be valid, all the parameters in the configuration files must be left at default values, except for the ones that are marked and listed clearly as "Configurable Workload Properties".
Since SPECweb2005 contains three distinct workloads (banking, ecommerce, and support), the benchmarker may:
The particular files referenced shall be determined by the workload generation in the benchmark itself. A fileset for a workload consists of content that the dynamic scripts reference. This represents images, static content, and also "padding" to bring the dynamic page sizes in line with those observed in real-world Web sites. All filesets are to be generated using the Wafgen fileset generator supplied with the benchmark tools. It is the responsibility of the benchmarker to ensure that these files are placed on the SUT so that they can be accessed properly by the benchmark. These files, and only these files, shall be used as the target fileset. The benchmark performs internal validations to verify the expected results. No modification or bypassing of this validation is allowed.
The SUT is required to be configured with the storage to contain all necessary software and logs for compliant runs of all three workloads. As a minimum, the system must also be configured to contain the largest fileset of the three workloads, such that each of the other two workload filesets can be mapped into the same storage footprint. If the system has not been configured with storage to hold the filesets for all three workloads concurrently, then the benchmarker must use the same I/O subsystem (disks, controllers, etc.) and not add or remove storage. The disclosure details must indicate whether the filesets were stored concurrently or remapped between workload runs.
For the Banking workload, we define two types of files:
1. The embedded image files, which do not grow with the load. Details on these files (bytes and type) are specified in the design document.
2. The check images, whose number increases linearly with the number of simultaneous connections supported. For each connection supported, check images are maintained for 50 users, each in its own directory. For each user defined, 20 check images are maintained: 10 representing the front of the checks and 10 representing the back of the checks.
The above assumes that under high load conditions in a banking environment, we would expect to see no more than 1% of the banking customers logged in at the same time.
For the E-commerce workload, two types of files are defined:
1. The embedded image files, which do not grow with the load. Details on these files (bytes and type) are specified in the design document.
2. The product images, which increase linearly with the number of simultaneous sessions requested. For each simultaneous session, 5 "product line" directories are created. Each product line directory contains images for 10 different "products". Each product has 3 different sizes, representing the various views of products that are often presented to users (i.e., thumbnails, medium-sized, and larger close-up views).
For the support site workload, two types of files are defined:
1. The embedded image files, which do not grow with the load. Details on these files (bytes and type) are specified in the design document.
2. The file downloads, which increase linearly with the number of simultaneous sessions requested. The ratio of simultaneous sessions to download directories is 4:1. Each directory contains downloads for 5 different categories (e.g., flash BIOS upgrades, video card drivers, etc.). The file sizes were determined by analyzing the file sizes observed at various hardware vendors' support sites.
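The per-session fileset scaling rules in sections 2.3.1 through 2.3.3 can be summarized as a small calculator. The constants come straight from the text above; the function names are mine:

```python
def banking_check_images(simultaneous_connections):
    # 50 users per supported connection, 20 check images per user
    # (10 front + 10 back)
    return simultaneous_connections * 50 * 20

def ecommerce_product_images(simultaneous_sessions):
    # 5 product-line directories per session, 10 products per directory,
    # 3 image sizes per product
    return simultaneous_sessions * 5 * 10 * 3

def support_download_dirs(simultaneous_sessions):
    # 4:1 ratio of simultaneous sessions to download directories
    return simultaneous_sessions // 4
```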
SPECweb2005 follows a page-based model. Each page is initiated by a dynamic GET or POST request, which runs a dynamic script on the server and returns a dynamically created Web page. Associated with each dynamic page is a set of static files or images, which the client requests right after the receipt of the dynamically created page. A page is marked as complete when all the associated images/static files for that page are fully received.
Only the dynamic scripts provided in the benchmark kit may be used for submissions/publications. The current release provides implementations in PHP and JSP.
The pseudo code reference specifications are the standard definition of the functionality. Any dynamic implementation must follow the specification exactly.
For new dynamic implementations, the submitter must inform the sub-committee at least one month prior to the actual code submission. All dynamic implementations submitted to SPEC must include a signed permission to use form and must be freely available for use by other members and licensees of the benchmark. Once the code has been submitted, the sub-committee will then review the code for a period of four months. Barring any issues with the implementation, the sub-committee will then incorporate the implementation into a new version of the benchmark.
Approval of any newly submitted dynamic code for future releases will include testing conformance to pseudo code as well as running of the code on other platforms by active members of the sub-committee. This will be done in order to ensure compliance with the letter and spirit of the benchmark, namely whether the scripts used to code the dynamic requests are representative of scripts commonly in use within the relevant customer base. An acceptable scripting language must meet the following requirements:
The reported metric, SPECweb2005, will be derived from a set of compliant results from all three workloads in the suite:
The SPECweb2005 metric is a "supermetric" that is the geometric mean of the three normalized submetrics for each workload. The normalized submetric for a given workload is defined as the ratio of the workload metric for the SUT to the workload metric for the reference platform multiplied by 100.
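The supermetric computation described above can be sketched as follows. The reference-platform workload metrics are parameters here because their actual values are defined by SPEC and are not reproduced in this sketch:

```python
from math import prod

def normalized_submetric(sut_metric, reference_metric):
    """Workload submetric normalized to the reference platform, times 100."""
    return sut_metric / reference_metric * 100.0

def specweb2005_supermetric(sut, reference):
    """Geometric mean of the three normalized submetrics.
    `sut` and `reference` map workload name -> simultaneous sessions."""
    subs = [normalized_submetric(sut[w], reference[w])
            for w in ("banking", "ecommerce", "support")]
    return prod(subs) ** (1.0 / len(subs))
```

A SUT that exactly matches the reference platform on all three workloads scores 100 by construction.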
The individual workload metric is the number of simultaneous sessions from a compliant test run consisting of three consecutive valid and conforming iterations of the benchmark, using one invocation of "java specweb".
Each iteration consists of a minimum 3-minute thread ramp-up, a minimum 5-minute warm-up period, and a 30-minute measurement period (i.e., the run time, which may be increased to ensure that at least 100 requests for each page type are completed when the load is minimal). There are also corresponding ramp-down periods (3 minutes + 5 minutes) between iterations.
The SPECweb2005 reference platform consists of:
The metric SPECweb2005 and individual workload metrics may not be associated with any estimated results. This includes adding, multiplying or dividing measured results to create a derived metric for some other system configuration.
The report of results for the SPECweb2005 benchmark is generated in ASCII and HTML format by the provided SPEC tools. These tools may not be changed without prior SPEC approval. The tools perform error checking and will flag some error conditions as resulting in an "invalid run". However, these automatic checks are only there for debugging convenience, and do not relieve the benchmarker of the responsibility to check the results and follow the run and reporting rules.
SPEC reviews and accepts for publication on SPEC's website only a complete and compliant set of results for all three workloads run and reported according to these rules. SPEC allows the public disclosure of the SPECweb2005 metric as well as the individual workload metrics: SPECweb2005_Banking, SPECweb2005_Ecommerce, and SPECweb2005_Support from compliant runs of these workloads without formal submission to SPEC. All public disclosures must adhere to the Fair Use Rules. Full disclosure reports of all test and configuration details as described in these run and reporting rules must be made available. Licensees are encouraged to run all three workloads and submit them to SPEC for publication.
SPECweb2005 results will be categorized into single and multiple node results, where the terms single and multiple node are as defined in this section. Multiple node platforms are further defined to be of two types, homogeneous and heterogeneous. Moreover, for submissions involving homogeneous multiple nodes, the subcommittee will also require a submission on a corresponding single-node platform (see details in the following paragraphs).
A Single Node Platform for SPECweb2005 consists of one or more processors executing a single instance of a first-level supervisor software, i.e., an operating system or a hypervisor hosting one or more instances of the same guest operating system, where one or more instances of the same web server software are executed on the main operating system or the guest operating systems. Externally attached storage for software and filesets may be used; all other performance critical operations must be performed within the single server node. A single common set of NICs must be used across all 3 workloads to relay all HTTP and HTTPS traffic.
Example:
|
test harness (clients, switches)=|=Server NICs:Server Node:Storage
|
A Homogeneous Multi Node Platform for SPECweb2005 consists of two or more electrically equivalent single Node Servers in a single chassis or connected through a shared bus. Each node contains the same number and type of processing units and devices, and each node executes a single instance of an OS and one or more instances of the same Web server software.
Storage for the filesets may be duplicated or shared. All incoming requests from the test harness must be load balanced either by a single node that receives all incoming requests and balances the load across the other nodes (A) or by a separate load balancing appliance that serves that function (B). Each node must contain a single common set of NICs that must be used across all 3 workloads to relay all HTTP and HTTPS traffic.
If a separate load balancing appliance is used, it must be included in the SUT's definition.
A)
|
test harness (clients, switches)=|=Node_1 NICs:Node_1_LB:Node_2:..:Node_N
|
B)
Node_1
| /
test harness (clients, switches)=|=LoadBalancer +------Node_2
| \
Node_N
A Heterogeneous/Solution Platform for SPECweb2005 consists of any combination of server nodes and appliances that have been networked together to provide all the performance critical functions measured by the benchmark. All incoming requests from the test harness must be load balanced by either a single node that receives all incoming requests and balances the load across the other nodes or by a separate load balancing appliance that serves that function. Electrical equivalence between server nodes is not required.
Storage for the filesets may be duplicated or shared. Additional appliances that provide performance critical operations such as intelligent switches or SSL appliances may be used. All nodes and appliances used must be included in the SUT's definition. Examples: C & D.
C)
|
test harness (clients, switches)-|-I_Switch-Node_1 NICs:Node_1_LB:Node2:..:Node_N
|
D)
SSLappliance-ImageServer_1
| /
test harness (clients, switches)-|-LoadBalancer-+-SSLappliance-Node_2
| \
SSLappliance-Node_N
All system configuration information required to duplicate published performance results must be reported. Tunings that deviate from the default configuration for software and hardware settings, including details on network interfaces, must be reported.
The SUT hardware configuration must not be changed between workload runs. However, not all hardware used in one workload is required to be used in another. In the case where multiple controllers are used for one workload, the same controllers must be electronically connected, and some subset of those controllers must be used, for the other workloads. In the case of NICs, all NICs must be used by each workload and each NIC must carry a significant portion of the network traffic.
The following SUT hardware components must be reported:
The documentation of the hardware for a result in the Heterogeneous/Platform category must also include a diagram of the configuration.
The following SUT software components must be reported:
The following SUT software tunings are acceptable:
The following SUT software tunings are not acceptable:
A brief description of the network configuration used to achieve the benchmark results is required. The minimum information to be supplied is:
The following load generator hardware components must be reported:
The following BeSim hardware and software components must be reported:
Note: BeSim API code is provided as part of the SPECweb2005 kit, and can be compiled in several different ways: ISAPI, NSAPI, or FastCGI. For more information, please see the User's Guide.
The dates of general customer availability must be listed (month and year) for the major components: hardware, HTTP server, and operating system. All the system, hardware, and software features are required to be generally available on or before the date of publication, or within 90 days of the date of publication (except where precluded by these rules; see section 3.2.7). When multiple components have different availability dates, the latest availability date must be listed.
Products are considered generally available if they are orderable by ordinary customers and ship within a reasonable time frame. This time frame is a function of the product size and classification, and common practice. The availability of support and documentation for the products must coincide with the release of the products.
Hardware products that are still supported by their original or primary vendor may be used if their original general availability date was within the last five years. The five-year limit is waived for hardware used in client and BeSim systems.
Software products that are still supported by their original or primary vendor may be used if their original general availability date was within the last three years.
In the disclosure, the benchmarker must identify any component that is no longer orderable by ordinary customers.
If pre-release hardware or software is tested, then the test sponsor represents that the performance measured is generally representative of the performance to be expected on the same configuration of the released system. If the sponsor later finds the released system's performance to be more than 5% lower than that reported for the pre-release system, then the sponsor shall submit a new, corrected test result.
SPECweb2005 does permit Open Source Applications outside of a commercial distribution or support contract with some limitations. The following are the rules that govern the admissibility of the Open Source Application in the context of a benchmark run or implementation. Open Source Applications do not include shareware and freeware, where the source is not part of the distribution.
The reporting page must list the date the test was performed, month and year, the organization which performed the test and is reporting the results, and the SPEC license number of that organization.
This section is used to document:
The following additional information may be required to be provided for SPEC's results review:
The submitter is required to keep the entire log file from both the SUT and the BeSim box, for each of the three workloads, for the duration of the review period.
Once you have a compliant run and wish to submit it to SPEC for review, you will need to provide the following:
Once you have the submission ready, please email SPECweb2005 submissions to subweb2005@spec.org.
SPEC encourages the submission of results for review by the relevant subcommittee and subsequent publication on SPEC's web site. Licensees may publish compliant results independently; however, any SPEC member may request a full disclosure report for that result and the test sponsor must comply within 10 business days. Issues raised concerning a result's compliance to the run and reporting rules will be taken up by the relevant subcommittee regardless of whether or not the result was formally submitted to SPEC.
SPEC provides client driver software, which includes tools for running the benchmark and reporting its results. This client driver is written in Java; precompiled class files are included with the kit, so no build step is necessary. This software implements various checks for conformance with these run and reporting rules. Therefore the SPEC software must be used; except that necessary substitution of equivalent functionality (e.g. fileset generation) may be done only with prior approval from SPEC. Any such substitution must be reviewed and deemed "performance-neutral" by the OSSC.
The kit also includes Java code for the file set generator (Wafgen) and C code for BeSim.
SPEC also provides server-side script code for each workload. In the initial release, PHP and JSP scripts are provided. These scripts have been tested for functionality and correctness on various operating systems and Web servers. Hence all submissions must use either of these script implementations. Any new dynamic script implementation will be evaluated by the sub-committee according to the acceptance process (see section 2.4).
Once the code is approved by the sub-committee, it will be made available on the SPEC Web site for any licensee to use in their tests/submissions. Upon approval, the new implementation will be made available in future releases of the benchmark and may not be used until after the release of the new version.
Copyright © 2005-2006 Standard Performance Evaluation Corporation. All rights reserved.
Java® is a registered trademark of Sun Microsystems.