SPECweb2005 Release 1.20 Run and Reporting Rules
Version 1.20, Last modified 6/30/08
1.0 Introduction
1.1 Philosophy
1.2 Fair Use of SPECweb2005 Results
1.3 Research and Academic Usage
1.4 Caveat
2.0 Running the SPECweb2005 Benchmark
2.1 Environment
2.1.1 Protocols
2.1.2 Testbed Configuration
2.1.3 System Under Test (SUT)
2.2 Measurement
2.2.1 Load Generation
2.2.2 Benchmark Parameters
2.2.3 Running SPECweb2005 Workloads
2.3 Workload Filesets
2.3.1 Banking Fileset
2.3.2 E-commerce Fileset
2.3.3 Support Site Fileset
2.4 Dynamic Request Processing
3.0 Reporting Results
3.1 Metrics And Reference Format
3.1.1 Categorization of Results
3.2 Testbed Configuration
3.2.1 SUT Hardware
3.2.2 SUT Software
3.2.2.1 SUT Software Tuning Allowances
3.2.3 Network Configuration
3.2.4 Clients
3.2.5 Backend Simulator (BeSim)
3.2.6 General Availability Dates
3.2.7 Rules on the Use of Open Source Applications
3.2.8 Test Sponsor
3.2.9 Notes
3.3 Log File Review
4.0 Submission Requirements for SPECweb2005
5.0 The SPECweb2005 Benchmark Kit
This document specifies how SPECweb2005 is to be run for measuring and publicly reporting performance results. These rules abide by the norms laid down by the SPEC Web Subcommittee and approved by the SPEC Open Systems Steering Committee. This ensures that results generated with this suite are meaningful, comparable to other generated results, and are repeatable (with documentation covering factors pertinent to duplicating the results).
Per the SPEC license agreement, all results publicly disclosed must adhere to these Run and Reporting Rules.
The general philosophy behind the rules of SPECweb2005 is to ensure that an independent party can reproduce the reported results.
The following attributes are expected:
Furthermore, SPEC expects that any public use of results from this benchmark suite shall be for System Under Test (SUT) and configurations that are appropriate for public consumption and comparison. Thus, it is also expected that:
SPEC requires that any public use of results from this benchmark follow the SPEC OSG Fair Use Policy and those specific to this benchmark (see Fair Use section below). In the case where it appears that these guidelines have not been adhered to, SPEC may investigate and request that the published material be corrected.
When public disclosures and competitive comparisons are made using SPECweb2005 benchmark results, the following benchmark-specific rules apply:
SPEC expects that the following template be used:
SPEC® and SPECweb® are registered trademarks of the Standard Performance Evaluation Corp. (SPEC). Competitive numbers shown reflect results published on www.spec.org as of <date>. [The comparison presented is based on <basis for comparison>]. For the latest SPECweb2005 results visit http://www.spec.org/osg/web2005.
(Note: the bracketed sentence above is required only if selective comparisons are used.)
Example:
SPECweb2005 is a trademark of the Standard Performance Evaluation Corp. (SPEC). Competitive numbers shown reflect results published on www.spec.org as of November 12, 2005. The comparison presented is based on best performing 4-core Single Node Platform servers currently shipping by Vendor 1, Vendor 2 and Vendor 3. For the latest SPECweb2005 results visit http://www.spec.org/osg/web2005.
The rationale for the template is to provide fair comparisons, by ensuring that:
SPEC encourages use of the SPECweb2005 benchmark in academic and research environments. It is understood that experiments in such environments may be conducted in a less formal fashion than that demanded of licensees submitting to the SPEC web site. For example, a research environment may use early prototype hardware or software that simply cannot be expected to function reliably for the length of time required to complete a compliant data point, or may use research hardware and/or software components that are not generally available. Nevertheless, SPEC encourages researchers to obey as many of the run rules as practical, even for informal research. SPEC respectfully suggests that following the rules will improve the clarity, reproducibility, and comparability of research results.
Where the rules cannot be followed, the deviations from the rules must be disclosed. SPEC requires these non-compliant results be clearly distinguished from results officially submitted to SPEC or those that may be published as valid SPECweb2005 results. For example, a research paper can use simultaneous sessions but may not refer to them as SPECweb2005 results if the results are not compliant.
SPEC reserves the right to adapt the benchmark codes, workloads, and rules of SPECweb2005 as deemed necessary to preserve the goal of fair benchmarking. SPEC will notify members and licensees whenever it makes changes to this document and will rename the metrics.
Relevant standards are cited in these run rules as URL references, and are current as of the date of publication. Changes or updates to these referenced documents or URLs may necessitate repairs to the links and/or amendment of the run rules. The most current run rules will be available at the SPEC Web site at http://www.spec.org. SPEC will notify members and licensees whenever it makes changes to the suite.
As the WWW is defined by its interoperative protocol definitions, SPECweb2005 requires adherence to the relevant protocol standards. It is expected that the Web server is HTTP 1.1 compliant. The benchmark environment shall be governed by the following standards:
To run SPECweb2005, in addition to all the above standards, SPEC requires the SUT to support SSLv3 as defined in the following:
Of the various ciphers supported in SSLv3, cipher SSL_RSA_WITH_RC4_128_MD5 is currently required for all workload components that use SSL. It was selected as one of the most commonly used SSLv3 ciphers and allows results to be directly compared to each other. SSL_RSA_WITH_RC4_128_MD5 consists of:
A compliant result must use the cipher suite listed above, and must employ the 1024-bit key for RSA public key encryption, the 128-bit key for RC4 bulk data encryption, and a 128-bit output for the Message Authentication Code.
For further explanation of these protocols, the following might be helpful:
The current text of all IETF RFCs may be obtained from: http://ietf.org/rfc.html
All marketed standards that a software product states as being adhered to must have passed the relevant test suites used to ensure compliance with the standards. For example, in the case of JavaServer Pages, one must pass the published test suites from Sun.
These requirements apply to all hardware and software components used in producing the benchmark result, including the System under Test (SUT), network, and clients.
The SUT must conform to the appropriate networking standards, and must utilize variations of these protocols to satisfy requests made during the benchmark.
The value of TCP TIME_WAIT must be at least 60 seconds (i.e., if a connection between the SUT and a client enters TIME_WAIT, it must stay in TIME_WAIT for at least 60 seconds).
The SUT must be comprised of components that are generally available on or before date of publication, or shall be generally available within 90 days of the first publication of these results.
Any deviations from the standard default configuration for testbed configuration components must be documented so an independent party would be able to reproduce the configuration and the result without further assistance.
The connections between a SPECweb2005 load generating machine and the SUT must not use a TCP Maximum Segment Size (MSS) greater than 1460 bytes. This needs to be accomplished by platform-specific means outside the benchmark code itself. The method used to set the TCP MSS must be disclosed. MSS is the largest "chunk" of data that TCP will send to the other end. The resulting IP datagram is normally 40 bytes larger: 20 bytes for the TCP header and 20 bytes for the IP header resulting in an MTU (Maximum Transmission Unit) of 1500 bytes.
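The MSS/MTU arithmetic above, and one way a tester might cap the segment size, can be sketched as follows. This is illustrative only: the run rules require the cap to be applied by platform-specific means outside the benchmark code, and the `TCP_MAXSEG` socket option shown here is one Linux mechanism among several (route metrics and interface MTU settings are alternatives).

```python
import socket

MSS_LIMIT = 1460      # run-rule maximum TCP MSS, in bytes
TCP_HEADER = 20       # bytes, assuming no TCP options
IP_HEADER = 20        # bytes, assuming no IP options

def ip_datagram_size(mss: int) -> int:
    """IP datagram size (and hence required MTU) for a given MSS."""
    return mss + TCP_HEADER + IP_HEADER

# Illustrative only: capping the MSS on a Linux socket before connecting.
def make_capped_socket(mss: int = MSS_LIMIT) -> socket.socket:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG, mss)
    return s
```

With the 1460-byte limit required here, `ip_datagram_size(1460)` gives the familiar 1500-byte Ethernet MTU.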
The BeSim engine must be run on a physically different system from the SUT.
For a run to be valid, the following attributes must hold true:
The SPECweb2005 individual workload metrics represent the actual number of user sessions that a server can support while meeting quality of service (QoS) and validation requirements for the given workload. In the benchmark run, a number of simultaneous user sessions are requested. Typically, each user session would start with a single thread requesting a dynamically created file or page. Following the receipt of this file and the need to request multiple embedded files within the page, two threads corresponding to that user session actively make connections and request files on these connections. The number of threads making requests on behalf of a given user session is limited to two, in order to comply with the HTTP 1.1 recommendations.
The load generated is based on page requests, transition between pages and the static images accessed within each page, as defined in the SPECweb2005 Design Specification.
The QoS requirements for each workload are defined in terms of two parameters, Time_Good and Time_Tolerable. QoS requirements are page based; Time_Good and Time_Tolerable values are defined separately for each workload (Time_Tolerable > Time_Good). For each page, 95% of the page requests (including all the embedded files within that page) are expected to be returned within Time_Good and 99% of the requests within Time_Tolerable. Very large static files (i.e., Support downloads) use specific byte rates as their QoS requirements.
The validation requirement for each workload is such that less than 1% of requests for any given page and less than 0.5% of all page requests in a given test iteration fail validation.
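The per-page QoS and validation requirements above reduce to percentile and error-rate tests, which can be sketched as follows (function and parameter names are mine, not the harness's):

```python
def meets_page_qos(response_times, time_good, time_tolerable):
    """QoS rule for one page type: 95% of page requests must complete
    within Time_Good and 99% within Time_Tolerable.
    `response_times` holds full-page response times (page plus all
    embedded files), in seconds."""
    n = len(response_times)
    good = sum(t <= time_good for t in response_times)
    tolerable = sum(t <= time_tolerable for t in response_times)
    return good / n >= 0.95 and tolerable / n >= 0.99

def meets_validation(page_errors, page_requests, total_errors, total_requests):
    """Validation rule: <1% failures for any given page and <0.5% of
    all page requests in the iteration."""
    return (page_errors / page_requests < 0.01
            and total_errors / total_requests < 0.005)
```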
It is required in this benchmark that all user sessions be run at the HIGH-SPEED-INTERNET speed of 100,000 bytes/sec.
In addition, the URL retrievals (or operations) performed must also meet the following quality criteria:
Note: The Weighted Percentage Difference (WPD) for any given workload page is calculated using the following formulas:
WPD = PageMix% * ETR
ETR = (#Sessions * RunTime) / (ThinkTime * %RwTT + AvgRspTime)
Where:
Workload Page Mix Percentage Table

Banking              Mix %  | Ecommerce       Mix %  | Support       Mix %
acct summary         15.11% | billing          3.37% | catalog       11.71%
add payee             1.12% | browse          11.75% | download       6.76%
bill pay             13.89% | browse product  10.03% | file          13.51%
bill pay status       2.23% | cart             5.30% | file catalog  22.52%
check detail html     8.45% | confirm          2.53% | home           8.11%
check image          16.89% | customize1      16.93% | product       24.78%
change profile        1.22% | customize2       8.95% | search        12.61%
login                21.53% | customize3       6.16% |
logout                6.16% | index           13.08% |
payee info            0.80% | login            3.78% |
post check order      0.88% | product detail   8.02% |
post fund transfer    1.24% | search           6.55% |
post profile          0.88% | shipping         3.55% |
quick pay             6.67% |                        |
request checks        1.22% |                        |
req xfer form         1.71% |                        |
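The WPD formulas above can be applied directly to the page mix table. The sketch below uses my own parameter names, and the reading of ETR as the expected total request count per iteration is an assumption inferred from the shape of the formula, not a statement from the benchmark documentation:

```python
def expected_total_requests(sessions, run_time, think_time, pct_rw_tt, avg_rsp_time):
    """ETR = (#Sessions * RunTime) / (ThinkTime * %RwTT + AvgRspTime)."""
    return (sessions * run_time) / (think_time * pct_rw_tt + avg_rsp_time)

def weighted_page_requests(page_mix, etr):
    """WPD = PageMix% * ETR, e.g. page_mix=0.1511 for the
    Banking 'acct summary' page from the table above."""
    return page_mix * etr
```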
Workload-specific configuration files are supplied with the harness. All configurable parameters are listed in these files. For a run to be valid, all the parameters in the configuration files must be left at default values, except for the ones that are marked and listed clearly as "Configurable Workload Properties".
Since SPECweb2005 contains three distinct workloads (banking, ecommerce, and support), the benchmarker may:
The particular files referenced shall be determined by the workload generation in the benchmark itself. A fileset for a workload consists of content that the dynamic scripts reference. This represents images, static content, and also "padding" to bring the dynamic page sizes in line with those observed in real-world Web sites. All filesets are to be generated using the Wafgen fileset generator supplied with the benchmark tools. It is the responsibility of the benchmarker to ensure that these files are placed on the SUT so that they can be accessed properly by the benchmark. These files, and only these files, shall be used as the target fileset. The benchmark performs internal validations to verify the expected results. No modification or bypassing of this validation is allowed.
The SUT is required to be configured with the storage to contain all necessary software and logs for compliant runs of all three workloads. As a minimum, the system must also be configured to contain the largest fileset of the three workloads, such that each of the other two workload filesets can be mapped into the same storage footprint. If the system has not been configured with storage to hold the filesets for all three workloads concurrently, then the benchmarker must use the same I/O subsystem (disks, controllers, etc.) and not add or remove storage. The disclosure details must indicate whether the filesets were stored concurrently or remapped between workload runs.
For the Banking workload, we define two types of files:
1. The embedded image files, which do not grow with the load. Details on these files (bytes and type) are specified in the design document.
2. The check images, whose number increases linearly with the number of simultaneous connections supported. For each connection supported, check images are maintained for 50 users, each in its own directory. For each user defined, 20 check images are maintained: 10 representing the front of the checks and 10 representing the back of the checks.
The above assumes that under high load conditions in a banking environment, we would expect to see no more than 1% of the banking customers logged in at the same time.
For the E-commerce workload, two types of files are defined:
1. The embedded image files, which do not grow with the load. Details on these files (bytes and type) are specified in the design document.
2. The product images, which increase linearly with the number of simultaneous sessions requested. For each simultaneous session, 5 "product line" directories are created. Each product line directory contains images for 10 different "products". Each product has 3 different sizes, representing the various views of products that are often presented to users (i.e., thumbnails, medium-sized, and larger close-up views).
For the support site workload, two types of files are defined:
1. The embedded image files, which do not grow with the load. Details on these files (bytes and type) are specified in the design document.
2. The file downloads, which increase linearly with the number of simultaneous sessions requested. The ratio of simultaneous sessions to download directories is 4:1. Each directory contains downloads for 5 different categories (e.g., flash BIOS upgrades, video card drivers, etc.). The file sizes were determined by analyzing the file sizes observed at various hardware vendors' support sites.
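The per-session fileset scaling rules in sections 2.3.1 through 2.3.3 can be summarized as a small calculator. The constants come straight from the text above; the function names are mine:

```python
def banking_check_images(simultaneous_connections):
    # 50 users per supported connection, 20 check images per user
    # (10 front + 10 back)
    return simultaneous_connections * 50 * 20

def ecommerce_product_images(simultaneous_sessions):
    # 5 product-line directories per session, 10 products per directory,
    # 3 image sizes per product
    return simultaneous_sessions * 5 * 10 * 3

def support_download_dirs(simultaneous_sessions):
    # 4:1 ratio of simultaneous sessions to download directories
    return simultaneous_sessions // 4
```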
SPECweb2005 follows a page-based model. Each page is initiated by a dynamic GET or POST request, which runs a dynamic script on the server and returns a dynamically created Web page. Associated with each dynamic page is a set of static files or images, which the client requests right after the receipt of the dynamically created page. A page is marked as complete when all the associated images/static files for that page are fully received.
Only the dynamic scripts provided in the benchmark kit may be used for submissions/publications. The current release provides implementations in PHP and JSP.
The pseudo code reference specifications are the standard definition of the functionality. Any dynamic implementation must follow the specification exactly.
For new dynamic implementations, the submitter must inform the sub-committee at least one month prior to the actual code submission. All dynamic implementations submitted to SPEC must include a signed permission to use form and must be freely available for use by other members and licensees of the benchmark. Once the code has been submitted, the sub-committee will then review the code for a period of four months. Barring any issues with the implementation, the sub-committee will then incorporate the implementation into a new version of the benchmark.
Approval of any newly submitted dynamic code for future releases will include testing conformance to pseudo code as well as running of the code on other platforms by active members of the sub-committee. This will be done in order to ensure compliance with the letter and spirit of the benchmark, namely whether the scripts used to code the dynamic requests are representative of scripts commonly in use within the relevant customer base. An acceptable scripting language must meet the following requirements:
The reported metric, SPECweb2005, will be derived from a set of compliant results from all three workloads in the suite:
The SPECweb2005 metric is a "supermetric" that is the geometric mean of the three normalized submetrics for each workload. The normalized submetric for a given workload is defined as the ratio of the workload metric for the SUT to the workload metric for the reference platform multiplied by 100.
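The supermetric computation described above can be sketched as follows. The reference-platform workload metrics are parameters here because their actual values are defined by SPEC and are not reproduced in this sketch:

```python
from math import prod

def normalized_submetric(sut_metric, reference_metric):
    """Workload submetric normalized to the reference platform, times 100."""
    return sut_metric / reference_metric * 100.0

def specweb2005_supermetric(sut, reference):
    """Geometric mean of the three normalized submetrics.
    `sut` and `reference` map workload name -> simultaneous sessions."""
    subs = [normalized_submetric(sut[w], reference[w])
            for w in ("banking", "ecommerce", "support")]
    return prod(subs) ** (1.0 / len(subs))
```

A SUT that exactly matches the reference platform on all three workloads scores 100 by construction.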
The individual workload metric is the number of simultaneous sessions from a compliant test run consisting of three consecutive valid and conforming iterations of the benchmark, using one invocation of "java specweb".
Each iteration consists of a minimum 3-minute thread ramp-up, a minimum 5-minute warm-up period, and a 30-minute measurement period (i.e., the run time, which may be increased to ensure that at least 100 requests for each page type are completed when the load is minimal). There are also corresponding ramp-down periods (3 minutes + 5 minutes) between iterations.
The SPECweb2005 reference platform consists of:
The metric SPECweb2005 and individual workload metrics may not be associated with any estimated results. This includes adding, multiplying or dividing measured results to create a derived metric for some other system configuration.
The report of results for the SPECweb2005 benchmark is generated in ASCII and HTML format by the provided SPEC tools. These tools may not be changed without prior SPEC approval. The tools perform error checking and will flag some error conditions as resulting in an "invalid run". However, these automatic checks are only there for debugging convenience, and do not relieve the benchmarker of the responsibility to check the results and follow the run and reporting rules.
SPEC reviews and accepts for publication on SPEC's website only a complete and compliant set of results for all three workloads run and reported according to these rules. SPEC allows the public disclosure of the SPECweb2005 metric as well as the individual workload metrics: SPECweb2005_Banking, SPECweb2005_Ecommerce, and SPECweb2005_Support from compliant runs of these workloads without formal submission to SPEC. All public disclosures must adhere to the Fair Use Rules. Full disclosure reports of all test and configuration details as described in these run and reporting rules must be made available. Licensees are encouraged to run all three workloads and submit them to SPEC for publication.
SPECweb2005 results will be categorized into single and multiple node results, where the terms single and multiple node are as defined in this section. Multiple node platforms are further defined to be of two types, homogeneous and heterogeneous. Moreover, for submissions involving homogeneous multiple nodes, the subcommittee will also require a submission on a corresponding single-node platform (see details in the following paragraphs).
A Single Node Platform for SPECweb2005 consists of one or more processors executing a single instance of a first-level supervisor software, i.e., an operating system or a hypervisor hosting one or more instances of the same guest operating system, where one or more instances of the same web server software are executed on the main operating system or the guest operating systems. Externally attached storage for software and filesets may be used; all other performance critical operations must be performed within the single server node. A single common set of NICs must be used across all 3 workloads to relay all HTTP and HTTPS traffic.
Example:
|
test harness (clients, switches)=|=Server NICs:Server Node:Storage
|
A Homogeneous Multi Node Platform for SPECweb2005 consists of two or more electrically equivalent single Node Servers in a single chassis or connected through a shared bus. Each node contains the same number and type of processing units and devices, and each node executes a single instance of an OS and one or more instances of the same Web server software.
Storage for the filesets may be duplicated or shared. All incoming requests from the test harness must be load balanced either by a single node that receives all incoming requests and balances the load across the other nodes (A) or by a separate load balancing appliance that serves that function (B). Each node must contain a single common set of NICs that must be used across all 3 workloads to relay all HTTP and HTTPS traffic.
If a separate load balancing appliance is used, it must be included in the SUT's definition.
A)
|
test harness (clients, switches)=|=Node_1 NICs:Node_1_LB:Node_2:..:Node_N
|
B)
Node_1
| /
test harness (clients, switches)=|=LoadBalancer +------Node_2
| \
Node_N
A Heterogeneous/Solution Platform for SPECweb2005 consists of any combination of server nodes and appliances that have been networked together to provide all the performance critical functions measured by the benchmark. All incoming requests from the test harness must be load balanced by either a single node that receives all incoming requests and balances the load across the other nodes or by a separate load balancing appliance that serves that function. Electrical equivalence between server nodes is not required.
Storage for the filesets may be duplicated or shared. Additional appliances that provide performance critical operations such as intelligent switches or SSL appliances may be used. All nodes and appliances used must be included in the SUT's definition. Examples: C & D.
C)
|
test harness (clients, switches)-|-I_Switch-Node_1 NICs:Node_1_LB:Node2:..:Node_N
|
D)
SSLappliance-ImageServer_1
| /
test harness (clients, switches)-|-LoadBalancer-+-SSLappliance-Node_2
| \
SSLappliance-Node_N
All system configuration information required to duplicate published performance results must be reported. Tunings that deviate from the default configuration for software and hardware settings, including details on network interfaces, must be reported.
The SUT hardware configuration must not be changed between workload runs. However, not all hardware used in one workload is required to be used in another. In the case where multiple controllers are used for one workload, the same controllers must be electronically connected, and some subset of those controllers must be used, for the other workloads. In the case of NICs, all NICs must be used by each workload and each NIC must carry a significant portion of the network traffic.
The following SUT hardware components must be reported:
The documentation of the hardware for a result in the Heterogeneous/Platform category must also include a diagram of the configuration.
The following SUT software components must be reported:
The following SUT software tunings are acceptable:
The following SUT software tunings are not acceptable:
A brief description of the network configuration used to achieve the benchmark results is required. The minimum information to be supplied is:
The following load generator hardware components must be reported:
The following BeSim hardware and software components must be reported:
Note: BeSim API code is provided as part of the SPECweb2005 kit, and can be compiled in several different ways: ISAPI, NSAPI, or FastCGI. For more information, please see the User's Guide.
The dates of general customer availability must be listed (month and year) for the major components: hardware, HTTP server, and operating system. All the system, hardware, and software features are required to be generally available on or before the date of publication, or within 90 days of the date of publication (except where precluded by these rules; see section 3.2.7). When multiple components have different availability dates, the latest availability date must be listed.
Products are considered generally available if they are orderable by ordinary customers and ship within a reasonable time frame. This time frame is a function of the product size and classification, and common practice. The availability of support and documentation for the products must coincide with the release of the products.
Hardware products that are still supported by their original or primary vendor may be used if their original general availability date was within the last five years. The five-year limit is waived for hardware used in client and BeSim systems.
Software products that are still supported by their original or primary vendor may be used if their original general availability date was within the last three years.
In the disclosure, the benchmarker must identify any component that is no longer orderable by ordinary customers.
If pre-release hardware or software is tested, then the test sponsor represents that the performance measured is generally representative of the performance to be expected on the same configuration of the released system. If the sponsor later finds the released system's performance to be more than 5% lower than that reported for the pre-release system, then the sponsor shall submit a new, corrected test result.
SPECweb2005 does permit Open Source Applications outside of a commercial distribution or support contract with some limitations. The following are the rules that govern the admissibility of the Open Source Application in the context of a benchmark run or implementation. Open Source Applications do not include shareware and freeware, where the source is not part of the distribution.
The reporting page must list the date the test was performed, month and year, the organization which performed the test and is reporting the results, and the SPEC license number of that organization.
This section is used to document:
The following additional information may be required to be provided for SPEC's results review:
The submitter is required to keep the entire log file from both the SUT and the BeSim box, for each of the three workloads, for the duration of the review period.
Once you have a compliant run and wish to submit it to SPEC for review, you will need to provide the following:
Once you have the submission ready, please email SPECweb2005 submissions to subweb2005@spec.org.
SPEC encourages the submission of results for review by the relevant subcommittee and subsequent publication on SPEC's web site. Licensees may publish compliant results independently; however, any SPEC member may request a full disclosure report for that result and the test sponsor must comply within 10 business days. Issues raised concerning a result's compliance to the run and reporting rules will be taken up by the relevant subcommittee regardless of whether or not the result was formally submitted to SPEC.
SPEC provides client driver software, which includes tools for running the benchmark and reporting its results. This client driver is written in Java; precompiled class files are included with the kit, so no build step is necessary. This software implements various checks for conformance with these run and reporting rules. Therefore the SPEC software must be used; except that necessary substitution of equivalent functionality (e.g. fileset generation) may be done only with prior approval from SPEC. Any such substitution must be reviewed and deemed "performance-neutral" by the OSSC.
The kit also includes Java code for the file set generator (Wafgen) and C code for BeSim.
SPEC also provides server-side script code for each workload. In the initial release, PHP and JSP scripts are provided. These scripts have been tested for functionality and correctness on various operating systems and Web servers. Hence all submissions must use either of these script implementations. Any new dynamic script implementation will be evaluated by the sub-committee according to the acceptance process (see section 2.4).
Once the code is approved by the sub-committee, it will be made available on the SPEC Web site for any licensee to use in their tests/submissions. Upon approval, the new implementation will be made available in future releases of the benchmark and may not be used until after the release of the new version.
Copyright © 2005-2006 Standard Performance Evaluation Corporation. All rights reserved.
Java® is a registered trademark of Sun Microsystems.