SPECmail2001 Release 1.0 Run and Reporting Rules

Version 1.00
Last modified: December 28, 2000

1.0 Introduction

This document specifies how SPECmail2001 is to be run for measuring and publicly reporting performance results. These rules have been established by the SPEC Mail Server Subcommittee and approved by the SPEC Open Systems Steering Committee. The rules ensure that results generated with this suite are meaningful, comparable to other generated results, and are repeatable (with documentation covering factors pertinent to duplicating the results).

Per the SPEC license agreement, all results publicly disclosed must adhere to these Run and Reporting Rules.

1.1 Philosophy

SPEC believes the user community will benefit from an objective series of tests, which can serve as common reference and be considered as part of an evaluation process.

SPEC is aware of the importance of optimizations in producing the best system performance. SPEC is also aware that it is sometimes hard to draw an exact line between legitimate optimizations that happen to benefit SPEC benchmarks and optimizations that specifically target the SPEC benchmarks. SPEC wants to increase the awareness of implementers and end users regarding unwanted benchmark-specific optimizations that would be incompatible with SPEC's goal of fair benchmarking.

SPEC expects that any public use of results from this benchmark suite shall be for Systems Under Test (SUTs) and configurations that are appropriate for public consumption and comparison. Thus, it is required that:

Hardware and software used to run this benchmark must provide a suitable environment for supporting Internet mail transmission using standardized email protocols.
Optimizations utilized must improve performance for a larger class of workloads than just the ones defined by this benchmark suite. There must be no benchmark specific optimizations.
The SUT and configuration are generally available, documented, supported, and encouraged by the providers.

To ensure that results are relevant to end-users, SPEC expects that the hardware and software implementations used for running the SPEC benchmarks adhere to the following conventions:

Proper use of the SPEC benchmark tools as provided.
Availability of an appropriate full disclosure report.
Support for all of the appropriate protocols.

1.2 Caveat

SPEC reserves the right to investigate any case where it appears that these guidelines and the associated benchmark run and reporting rules have not been followed for a published SPEC benchmark result. SPEC may request that the result be withdrawn from the public forum in which it appears and that the benchmarker correct any deficiency in product or process before submitting or publishing future results.

SPEC reserves the right to adapt the benchmark codes, workloads, and rules of SPECmail2001 as deemed necessary to preserve the goal of fair benchmarking. SPEC will notify members and licensees if changes are made to the benchmark and will rename the metrics (e.g. from SPECmail2001 to SPECmail2001a).

Relevant standards are cited in these run rules as URL references, and are current as of the date of publication. Changes or updates to these referenced documents or URLs may necessitate repairs to the links and/or amendment of the run rules. The most current run rules will be available at the SPEC web site at http://www.spec.org. SPEC will notify members and licensees whenever it makes changes to the documentation.

2.0 Run Rules

The production of compliant SPECmail2001 test results requires that the tests be run in accordance with these run rules. These rules relate to the requirements for the System Under Test (SUT) and the testbed (i.e. SUT, clients, and network), including protocols, operation, configuration, test staging, optimizations, and measurement.

2.1 Protocols

As Internet email is defined by its protocol definitions, SPECmail2001 requires adherence to the relevant protocol standards:

RFC 821 : Simple Mail Transfer Protocol (SMTP)
RFC 1939 : Post Office Protocol - Version 3 (POP3)

The SMTP and POP3 protocols imply the following:

RFC 791 : Internet Protocol (IPv4)
RFC 2460 : Internet Protocol, Version 6 (IPv6) [ may be used in place of IPv4 ]
RFC 792 : Internet Control Message Protocol (ICMP)
RFC 793 : Transmission Control Protocol (TCP)
RFC 950 : Internet Standard Subnetting Procedure
RFC 1122 : Requirements for Internet Hosts - Communication Layers

Internet standards are evolving standards. Adherence to related RFCs (e.g. RFC 1191, Path MTU Discovery) is also acceptable, provided the implementation retains the characteristic of interoperability with other implementations.

2.2 General Availability

The entire testbed (SUT, clients, and network) must be composed of components that are generally available, or shall be generally available within three months of the first publication of the results.

Products are considered generally available if they are orderable by ordinary customers and ship within a reasonable time frame. This time frame is a function of the product size and classification, and common practice. Some limited quantity of the product must have shipped on or before the close of the stated availability window. Shipped products do not have to match the tested configuration in terms of CPU count, memory size, and disk count or size, but the tested configuration must be available to ordinary customers. The availability of support and documentation for the products must coincide with the release of the products.

Hardware products that are still supported by their original or primary vendor may be used if their original general availability date was within the last five years. The five-year limit is waived for hardware used in client systems.

Software products that are still supported by their original or primary vendor may be used if their original general availability date was within the last three years.

In the disclosure, the benchmarker must identify any component that is no longer orderable by ordinary customers.
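The age limits above can be expressed as a simple date check. The sketch below is illustrative only and is not part of the SPEC tools. It assumes that "within the last five years" and "within the last three years" are measured back from a reference date the caller supplies (for example, the date of first publication), and that the caller has already confirmed the component is still supported by its original or primary vendor. The function and constant names are hypothetical.

```python
from datetime import date

HARDWARE_MAX_AGE_YEARS = 5  # section 2.2; waived for hardware used in client systems
SOFTWARE_MAX_AGE_YEARS = 3  # section 2.2


def years_before(d: date, years: int) -> date:
    """Same calendar day `years` earlier; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def within_age_limit(kind: str, original_ga: date, reference: date,
                     client_only: bool = False) -> bool:
    """True if a still-supported component meets the section 2.2 age limit.

    `reference` is an assumed cut-off date (e.g. first publication) from
    which the limit is measured backwards.
    """
    if kind == "hardware":
        if client_only:
            return True  # the five-year limit does not apply to client hardware
        limit = HARDWARE_MAX_AGE_YEARS
    elif kind == "software":
        limit = SOFTWARE_MAX_AGE_YEARS
    else:
        raise ValueError(f"unknown component kind: {kind!r}")
    return original_ga >= years_before(reference, limit)
```

For a reference date of March 1, 2001, server hardware first available in June 1995 falls outside the limit, while the same hardware used only in a client system is still acceptable.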

2.3 Stable Storage

The SUT must utilize stable storage for the mail store. Mail servers are expected to safely store any email they have accepted until the recipient has disposed of it. To do this, mail servers must be able to recover the mail store without loss from multiple power failures (including cascading power failures), operating system failures, and hardware failures of components (e.g. CPU) other than the storage medium. At any point where the data can be cached, after the server has accepted the message and acknowledged its receipt, there must be a mechanism to ensure any cached message survives the server failure.

Examples of stable storage include:

Media commit of data; i.e. the message has been successfully written to the disk media.
An immediate reply disk drive with battery-backed on-drive intermediate storage or an uninterruptible power supply (UPS).
Server commit of data with battery-backed intermediate storage and recovery software.
Cache commit with UPS.

Examples which are not considered stable storage:

An immediate reply disk drive without battery-backed on-drive intermediate storage or UPS.
Cache commit without UPS.
Server commit of data without battery-backed intermediate storage and recovery software.

If a UPS is required by the SUT to meet the stable storage requirement, the benchmarker is not required to perform the test with a UPS in place. The benchmarker must state in the disclosure that a UPS is required. Supplying a model number for an appropriate UPS is encouraged but not required.

If a battery-backed component is used to meet the stable storage requirement, that battery must have sufficient power to maintain the data for at least 48 hours to allow any cached data to be committed to media and the system to be gracefully shut down. The system or component must also be able to detect a low battery condition and prevent the use of the component or provide for a graceful system shutdown.

2.4 Single Logical Server

The SUT must present to mail clients the appearance and behavior of a single logical server for each protocol. Specifically, the SUT must present a single system view, in that the results of any mail transaction from a client that change the state on the SUT must be visible to any/all other clients on any subsequent mail transaction. For example, if User_1 has 10 mail messages in his mailbox on the SUT, then that user could read those 10 messages from any client system.

2.5 Mail Server Logging

For a run to be valid, the following attributes related to logging must hold true:

The mail server must make at least one entry into a log file for each SMTP and POP session initiated. The log entry must include at a minimum the following fields:

SMTP:
time stamp (month / day / hour / minute / sec)
message identifier of the transferred message
POP:
time stamp (month / day / hour / minute / sec)
user identifier

The log file records do not have to be synchronously committed to storage, but must be scheduled for non-volatile storage within 60 seconds.
The server must maintain the log for the entire duration of the run.
A binary format may be used for logging; however, ASCII translation is required when providing log files as part of the full disclosure materials.
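As an illustration of the minimum content of a log record, the following sketch formats one ASCII line per session. The line layout, the function name format_entry, and the sample identifiers are assumptions; any layout that carries the required fields satisfies the rule. The 60-second scheduling requirement is left to the mail server's own log-flushing mechanism and is not modeled here.

```python
from datetime import datetime


def format_entry(protocol: str, when: datetime, identifier: str) -> str:
    """Build one ASCII log line holding the minimum fields from section 2.5.

    For SMTP sessions `identifier` is the message identifier of the
    transferred message; for POP sessions it is the user identifier.
    """
    if protocol not in ("SMTP", "POP"):
        raise ValueError(f"unexpected protocol: {protocol}")
    # month/day hour:minute:second, e.g. "03/14 09:26:53"
    return f"{when:%m/%d %H:%M:%S} {protocol} {identifier}"
```

For example, format_entry("SMTP", datetime(2001, 3, 14, 9, 26, 53), "<msg-123@example.com>") yields "03/14 09:26:53 SMTP <msg-123@example.com>".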

2.6 Networking

For a run to be valid, the following attributes that relate to TCP/IP network configuration must hold true:

Since SPECmail2001 is a representation of ISP mail servers, the connections between a load generating client and the SUT must not use a TCP Maximum Segment Size (MSS) greater than 1460 bytes. This needs to be accomplished by platform-specific means outside the benchmark code itself. The method used to set the TCP MSS must be disclosed. MSS is the largest "chunk" of data that TCP will send to the other end. The resulting IP datagram is normally 40 bytes larger: 20 bytes for the TCP header and 20 bytes for the IP header resulting in a Maximum Transmission Unit (MTU) of 1500 bytes.
The value of TIME_WAIT must be at least 60 seconds.
On those systems that do not dynamically allocate TCP TIME_WAIT table entries, the appropriate system parameter must be configured to ensure that user ports are not reused before the TIME_WAIT period expires. This would be set on a per-client and per-server-node basis as applicable. As a basis for calculation, it may be assumed that the benchmark will generate 200 connections/minute per 10,000 SPECmail2001 users, each lasting no more than 30 seconds. Each connection therefore occupies a table entry for at most 30 + 60 = 90 seconds (1.5 minutes), so for each 10,000 SPECmail2001 users there must be at least 300 TIME_WAIT table entries (200 connections/minute over 1.5 minutes).

Note: SPEC intends to follow relevant standards wherever practical, but for this performance-sensitive parameter that is difficult because the standards are ambiguous. RFC 1122 requires that TIME_WAIT be 2 times the maximum segment lifetime (MSL), and RFC 793 suggests a value of 2 minutes for MSL but describes that value as an engineering choice that may be changed. So the standards do not effectively fix a value for TIME_WAIT. However, current TCP/IP implementations define a de facto lower limit for TIME_WAIT of 60 seconds, which is the value used in most BSD-derived UNIX implementations.
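The networking requirements above reduce to a little arithmetic. The sketch below is not part of the benchmark, and its names are hypothetical. It checks a proposed MSS and TIME_WAIT against the limits, and sizes the TIME_WAIT table using the stated basis of 200 connections/minute per 10,000 users, each lasting up to 30 seconds. It assumes IPv4 and TCP headers without options, as in the 1460 + 40 = 1500 calculation.

```python
MAX_MSS_BYTES = 1460           # largest TCP MSS allowed between clients and SUT
TCP_IP_HEADER_BYTES = 20 + 20  # TCP header + IPv4 header, no options
MIN_TIME_WAIT_S = 60           # minimum TIME_WAIT in seconds


def mtu_for_mss(mss: int) -> int:
    """IP datagram size for a full-sized segment (no TCP or IP options)."""
    return mss + TCP_IP_HEADER_BYTES


def network_settings_ok(mss: int, time_wait_s: int) -> bool:
    """Check the MSS ceiling and TIME_WAIT floor from section 2.6."""
    return mss <= MAX_MSS_BYTES and time_wait_s >= MIN_TIME_WAIT_S


def required_time_wait_entries(users: int, time_wait_s: int = MIN_TIME_WAIT_S,
                               conns_per_min_per_10k: int = 200,
                               max_conn_s: int = 30) -> int:
    """Minimum TIME_WAIT table entries, using the sizing basis in section 2.6.

    A port stays occupied for the connection itself (up to max_conn_s) plus
    TIME_WAIT, so entries = connection rate * occupancy time, rounded up.
    Integer arithmetic avoids floating-point error at exact multiples.
    """
    numerator = conns_per_min_per_10k * users * (max_conn_s + time_wait_s)
    denominator = 10_000 * 60  # rate is per 10,000 users, per 60 seconds
    return -(-numerator // denominator)  # ceiling division
```

With the 60-second TIME_WAIT, required_time_wait_entries(10_000) gives 300, matching the rule. A system that keeps a longer TIME_WAIT holds each port longer; extending the same calculation to a 240-second TIME_WAIT (twice a 2-minute MSL) gives 900 entries per 10,000 users. That extension is an extrapolation of the stated basis, not a separate SPEC requirement.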

2.7 Initializing and Running the Benchmark

To make an official SPECmail2001 test run, the benchmarker must perform the following steps:

Pre-populate the mail store on the server. This can be accomplished by either of the following:

Running the initialization sequence (specmail -initonly) as described in the User Guide.

Restoring an archive of the mail store created after the successful completion of the initialization sequence described above.

Start the SPECmail2001 test using the default options required for a compliant test.

2.8 Optimization

Benchmark specific optimization is not allowed. Any optimization of either the configuration or software used on the SUT must improve performance for a larger class of workloads than that defined by this benchmark and must be supported and recommended by the provider. Optimizations that take advantage of the benchmark's specific features are forbidden. Examples of inappropriate optimization include, but are not limited to, taking advantage of specially formed test user account names, the fixed set of message sizes in the workload, or the workload's mailbox sizes.

2.9 Measurement

The provided SPECmail2001 tools must be used to run and produce measured SPECmail2001 results. The SPECmail2001 metric is a function of the SPECmail2001 workload, the associated mail store and the defined Quality of Service criteria. SPECmail2001 results are not comparable to any other mail server performance metric.

2.9.1 Metric

SPECmail2001 expresses performance in terms of SPECmail2001 Messages per Minute (MPM). The benchmarker specifies the number of users for which the benchmark tools will generate a workload. The load generators will generate a mix of SMTP and POP3 transactions that are presented to the mail server such that 1 MPM is representative of load expected during the peak hour for 200 POP consumer users. In addition to the MPM metric, the benchmark will also report the configured number of SPECmail2001 users.
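The relationship between the configured user count and the metric can be written as a simple conversion. This is a sketch assuming the nominal ratio of 200 users per MPM stated above; the measured MPM reported by the tools is what counts, and the names here are hypothetical.

```python
USERS_PER_MPM = 200  # section 2.9.1: 1 MPM ~ peak-hour load of 200 POP consumer users


def nominal_mpm(users: int) -> float:
    """Nominal MPM that a configured user count represents."""
    return users / USERS_PER_MPM


def users_for_target(mpm: float) -> int:
    """Configured users needed to represent a target MPM (rounded to a whole user)."""
    return round(mpm * USERS_PER_MPM)
```

For example, 10,000 configured users correspond to a nominal 50 MPM, and a target of 1,000 MPM corresponds to 200,000 users.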

2.9.2 Workload

SPECmail2001 requires that for each SMTP incoming message received by the SUT, the SUT must also handle a selection of POP transactions. The POP transactions include AUTH, STAT, RETR, and DELE. SMTP incoming messages that are not intended for local users are relayed as outgoing SMTP messages. The workload parameters required for a valid run are contained in the default workload parameter file supplied with the benchmark. A detailed explanation of the workload is included in the SPECmail2001 Architecture White Paper.

2.9.3 Mail Store

It is the responsibility of the benchmarker to ensure that the messages that make up the mail store are placed on the SUT so that they can be accessed properly by the benchmark. These messages and only these messages shall be used as the target working set. The benchmark performs internal validations to verify the expected results. No modification or bypassing of this validation is allowed.

The benchmark determines the initial working set size for the test based on a function of the number of POP3 users specified for the test, the message size distribution, and mailbox size distribution. An estimate of the raw byte count for the working set can be calculated as follows:

Working_Set_Size = 100KB * UserCount

The actual size of the mail store and the amount of disk space to contain it will be a function of the mail server products in use and any additional storage overhead needed or configured. It is recommended that an additional 10% of storage space be available to accommodate the fluctuations in the workload.
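The estimate above and the recommended 10% margin can be combined into a rough sizing helper. This sketch takes the 100KB-per-user figure as given (raw message bytes only) and deliberately ignores the product-specific storage overhead mentioned above, which must be added separately for the mail server in use. The names are hypothetical.

```python
KB_PER_USER = 100    # section 2.9.3 estimate of raw mail store size per user
MARGIN_PERCENT = 10  # recommended extra storage for workload fluctuations


def working_set_kb(users: int) -> int:
    """Raw working-set estimate: Working_Set_Size = 100KB * UserCount."""
    return KB_PER_USER * users


def storage_with_margin_kb(users: int, margin_percent: int = MARGIN_PERCENT) -> int:
    """Raw estimate plus the recommended margin, rounded up to a whole KB.

    Product-specific storage overhead is NOT included and must be added
    for the mail server actually used.
    """
    raw = working_set_kb(users) * (100 + margin_percent)
    return -(-raw // 100)  # ceiling division
```

For 10,000 users this gives 1,000,000 KB (roughly 1 GB) of raw message data, or 1,100,000 KB with the recommended margin.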

The benchmarker is responsible for configuring the SUT with the corresponding number of user accounts and mailboxes required for the test. The benchmark suite provides tools for the initial population of the mail store.

Since the working set is not static and changes over the course of the test as messages are added or deleted, it is allowable for the benchmarker to capture the mail store image after the tools have created the initial population (see section 2.7).

2.9.4 Quality of Service Criteria

The SPECmail2001 benchmark has specific Quality of Service (QoS) criteria for response times, delivery times and error rates. The QoS criteria are checked by the benchmark tools.

SPECmail2001 requires that for each request type except SMTP-DATA and POP-RETR commands, 95% of all response times must be less than 5 seconds.
For the SMTP-DATA and POP-RETR commands, 95% of all messages transferred to/from the mail server must complete their transfer within the time needed to move the message at half the modem speed, plus 5 seconds.
SPECmail2001 requires that 95% of all messages to local users get delivered to the target mailbox within 60 seconds.
SPECmail2001 requires that 95% of all messages to remote mail users must be received by the mail server (sink) within the measurement period.
SPECmail2001 requires that not more than 1% of transactions fail.
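The QoS criteria can be expressed as pass/fail checks over per-transaction samples. The benchmark tools perform these checks themselves; the sketch below only illustrates the arithmetic. It assumes each criterion is evaluated as the fraction of samples meeting its limit, and it takes the modem speed as a parameter (modem_bps, a hypothetical name) because this section does not state its value. All function names are hypothetical.

```python
from typing import Iterable, Sequence, Tuple

REQUIRED_SHARE = 0.95  # "95% of all ..." in section 2.9.4


def share(flags: Iterable[bool]) -> float:
    """Fraction of samples that met their limit."""
    flags = list(flags)
    if not flags:
        raise ValueError("no samples")
    return sum(flags) / len(flags)


def responses_ok(seconds: Sequence[float]) -> bool:
    """All request types except SMTP-DATA and POP-RETR: 95% under 5 seconds."""
    return share(t < 5.0 for t in seconds) >= REQUIRED_SHARE


def transfer_limit_s(size_bytes: int, modem_bps: float) -> float:
    """SMTP-DATA / POP-RETR limit: time to move the message at half modem speed, plus 5 s."""
    return size_bytes * 8 / (modem_bps / 2) + 5.0


def transfers_ok(samples: Sequence[Tuple[int, float]], modem_bps: float) -> bool:
    """samples holds (message size in bytes, observed transfer time in seconds)."""
    return share(t <= transfer_limit_s(n, modem_bps) for n, t in samples) >= REQUIRED_SHARE


def local_delivery_ok(delays_s: Sequence[float]) -> bool:
    """95% of messages to local users reach the mailbox within 60 seconds."""
    return share(d <= 60.0 for d in delays_s) >= REQUIRED_SHARE


def remote_delivery_ok(received_in_period: Sequence[bool]) -> bool:
    """95% of messages to remote users reach the sink within the measurement period."""
    return share(received_in_period) >= REQUIRED_SHARE


def failures_ok(failed: int, total: int) -> bool:
    """Not more than 1% of transactions may fail (integer form avoids float rounding)."""
    return failed * 100 <= total
```

For instance, with a hypothetical modem_bps of 56000, a 7,000-byte message is allowed 7000 * 8 / 28000 + 5 = 7 seconds.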

2.9.5 POP3 Autologout Timer

According to the POP3 RFC 1939:

A POP3 server MAY have an inactivity autologout timer. Such a timer MUST be of at least 10 minutes duration. The receipt of any command from the client during that interval should suffice to reset the autologout timer. When the timer expires, the session does NOT enter the UPDATE state--the server should close the TCP connection without removing any messages or sending any response to the client.

If the mail server includes an inactivity autologout timer, it must be set to at least 10 minutes. It is recommended that the timer not be set to longer than 10 minutes as this could cause a slight increase in POP3 lock conflicts particularly at the 120% load level.

2.10 Load Generators

The SPECmail2001 benchmark requires the use of one or more client systems. One client system is designated the prime client and will run the benchmark manager. One or more client systems act as load generators. One client system is designated as the smtpsink to handle the mail to remote addresses. Please refer to the User Guide for more detail on these roles.

A server component of the SUT must not be used as a load generator or a smtpsink when testing to produce valid SPECmail2001 results. A server component may be used as the prime client, but this is not recommended.

The client systems must have a Java Runtime Environment (JRE) version 1.1.8 or higher installed in order to run the benchmark tools.

2.11 SPECmail2001 Parameters

The SPECmail2001 benchmark provides two parameter files that contain the testbed configuration and workload parameters. The file SPECmail_config.rc contains the testbed (clients and SUT) configuration information that appears in the final report and must be modified to contain the site-specific information.

The file SPECmail_fixed.rc contains the default workload parameters used to produce a compliant test result. This file must not be altered. Modifying the SPECmail_fixed.rc will not prevent the benchmark from running, but the results generated using the modified SPECmail_fixed.rc file will always be marked non-compliant.

To help ensure that the content of the parameter files is correct and can be used to produce a compliant test run, benchmarkers are encouraged to invoke the java specmail command with the -compliant switch. Then if there are problems in the rc files, the benchmark will generate appropriate warning messages and immediately discontinue the test.

The SPECmail2001 User Guide provides detailed documentation on the parameters in the SPECmail_config.rc and SPECmail_fixed.rc files.

3.0 Reporting Rules

In order to publicly disclose SPECmail2001 results, the benchmarker must adhere to these reporting rules in addition to having followed the run rules above. The goal of the reporting rules is to ensure the SUT and testbed are sufficiently documented such that someone could understand the results and reproduce the test.

3.1 Metrics And Result Reports

The benchmark single figure of merit, SPECmail2001 messages per minute, is the throughput measured during the run at the 100% load level. A complete benchmark result consists of three separate measurements, at the 80%, 100%, and 120% load levels, shown on the results reporting page. A detailed breakdown of each test is included on the reporting page.

The report of results for the SPECmail2001 benchmark is generated in HTML by the provided SPEC tools. These tools may not be changed, except for portability reasons with prior SPEC approval. The tools perform error checking and will flag some error conditions resulting in an "invalid result". However, these automatic checks are only there for debugging convenience and do not relieve the benchmarker of the responsibility to check the results and follow the run and reporting rules.

The section of the output.raw file that contains actual test measurements must not be altered. Corrections to the SUT descriptions may be made as needed to produce a properly documented disclosure.

3.2 Results Disclosure and Usage

Any SPECmail2001 result produced in compliance with these run and reporting rules may be publicly disclosed and represented as a valid SPECmail2001 result.

Any test result not in full compliance with the run and reporting rules must not be represented using the SPECmail2001 metric name.

The metric SPECmail2001 messages per minute must not be associated with any estimated results. This includes adding, multiplying or dividing measured results to create a derived metric.

3.2.1 Fair Use of SPECmail2001 Results

When competitive comparisons are made using SPECmail2001 benchmark results available from the SPEC web site, SPEC requires that the following template be used:

SPECmail2001 is a trademark of the Standard Performance Evaluation Corp. (SPEC). Competitive numbers shown reflect results published on www.spec.org from date to date. [The comparison presented is based on basis for comparison.] For the latest SPECmail2001 results visit http://www.spec.org/osg/mail2001.

Notes:

The reported dates must cover the period in which the competitive results were published on the SPEC web site.
The bracketed phrase above ([...]) is required only if selective comparisons are used.

Example:

SPECmail2001 is a trademark of the Standard Performance Evaluation Corp. (SPEC). Competitive numbers shown reflect results published on www.spec.org from Jan 12 to Mar 31, 2001. The comparison presented is based on best performing 4-cpu servers currently shipping by Vendor 1, Vendor 2 and Vendor 3. For the latest SPECmail2001 results visit http://www.spec.org/osg/mail2001.
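The template can also be filled in mechanically. This hypothetical helper reproduces the template wording, adding the bracketed basis-for-comparison sentence only when a selective comparison is being made; with the inputs of the example above it produces the example text verbatim.

```python
from typing import Optional

_TEMPLATE = (
    "SPECmail2001 is a trademark of the Standard Performance Evaluation Corp. (SPEC). "
    "Competitive numbers shown reflect results published on www.spec.org "
    "from {start} to {end}. {basis}"
    "For the latest SPECmail2001 results visit http://www.spec.org/osg/mail2001."
)


def fair_use_statement(start: str, end: str, basis: Optional[str] = None) -> str:
    """Fill in the section 3.2.1 template; the basis sentence is only for selective comparisons."""
    basis_sentence = f"The comparison presented is based on {basis}. " if basis else ""
    return _TEMPLATE.format(start=start, end=end, basis=basis_sentence)
```

Calling fair_use_statement("Jan 12", "Mar 31, 2001", "best performing 4-cpu servers currently shipping by Vendor 1, Vendor 2 and Vendor 3") returns the example statement; omitting the third argument drops the bracketed sentence.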

The rationale for the template is to provide fair comparisons by ensuring that:

The time period when the competitive data was published is clearly mentioned.
The subset of results used for comparison is clearly defined.
A reference to http://www.spec.org is included.

3.2.2 Research and Academic Usage of SPECmail2001

SPEC encourages use of the SPECmail2001 benchmark in academic and research environments. It is understood that experiments in such environments may be conducted in a less formal fashion than that required of licensees submitting to the SPEC web site or otherwise disclosing valid SPECmail2001 results.

For example, a research environment may use early prototype hardware that simply cannot be expected to stay up for the length of time required to run the entire benchmark, or may use research software that is unsupported and not generally available. Nevertheless, SPEC encourages researchers to obey as many of the run rules as practical, even for informal research. SPEC suggests that following the rules will improve the clarity, reproducibility, and comparability of research results. Where the rules cannot be followed, SPEC requires the results be clearly distinguished from fully compliant results such as those officially submitted to SPEC, by disclosing the deviations from the rules and avoiding the use of the SPECmail2001 metric name.

3.3 Testbed Configuration Disclosure

The system configuration information that is required to duplicate published performance results must be reported. This list is not intended to be all-inclusive, nor is each performance-neutral feature in the list required to be described. The rule is: If it affects performance or the feature is required to duplicate the results, then it must be described.

Any deviations from the standard default configuration for the SUT must be documented, so an independent party would be able to reproduce the result without further assistance.

For most of the following configuration details, there is an entry in the configuration file, and a corresponding entry in the tool-generated HTML result page. If information needs to be included that does not fit into these entries, the Notes sections must be used.

3.3.1 SUT Hardware

The following SUT hardware components must be reported:

Vendor's name
System model number, type and clock rate of processor, number of processors, and main memory size
Size and organization of primary, secondary, and other cache, per processor. If a level of cache is shared among processors in a system, it must be stated in the notes section of the disclosure.
Memory configuration options, if they affect performance, e.g. interleaving and access time
Other hardware, e.g. write caches, or other accelerators
Number, type, model, and capacity of disk controllers and drives
Disk subsystem configuration details, if they affect performance

3.3.2 SUT Software

The following SUT software components must be reported:

Mail Server software and version
Operating system and version
Type of file system
The values of maximum segment lifetime (MSL) and TIME_WAIT. If TIME_WAIT is not equal to 2*MSL, that must be noted. (Reference section 4.2.2.13 of RFC 1122).
Any other software packages used during the benchmarking process
Other clarifying information as required to reproduce benchmark results; e.g. number of daemons, server buffer cache size, disk striping, non-default kernel parameters, and logging mode
Additionally, the submitter must make available a description of the tuning features that were utilized; e.g. kernel parameters and software settings, including the purpose of that tuning feature. Where possible, it must be noted how the values used differ from the default settings for that tuning feature. This disclosure can be part of the Notes sections or a separate document.

3.3.3 Network Configuration

A brief description of the network configuration used to achieve the benchmark result is required. The minimum information to be supplied is:

Number, type, and model of network controllers
Number and type of networks used
Base speed of network
A network configuration notes section may be used to list the following additional information:

Number, type, model, and relationship of external network components to support the SUT (e.g., any external routers, hubs, switches, etc.)
Relationship of clients, client type, and networks (including routers, hubs, switches, etc.), i.e. which clients are connected to which LAN segments. For example: "clients 1 and 2 on one ATM-622, clients 3 and 4 on second ATM-622, and clients 5, 6, and 7 each on their own 100TX segment."

3.3.4 Client Systems

The following client system properties must be reported:

Number of client systems
System model number, processor type and clock rate, number of processors
Main memory size
Network Controller
Operating System and Version
JRE version used to run the benchmark (i.e. to invoke specmail and specmailclient)
Any non-default parameters (e.g. email, TCP, OS, and Network tuning parameters)

3.3.5 Configuration Diagram

A configuration diagram of the SUT must be provided in a common graphics format (e.g. PNG, JPEG, GIF). This will be included in the HTML formatted results page. An example would be a line drawing that provides a pictorial representation of the SUT including the network connections between clients, server nodes, switches and the storage hierarchy and any other complexities of the SUT that can best be described graphically.

3.3.6 General Availability Dates

The dates of general customer availability must be listed for the major components: hardware, mail server software, and operating system, by month and year. All the system, hardware and software features are required to be available within three months of the first publication of the result. The overall hardware availability date must be the latest of the hardware availability dates. The overall software availability date must be the latest of the software availability dates.

If pre-release hardware or software is used, then the test sponsor represents that the performance measured is the performance to be expected on the same configuration of the release system. If the test sponsor later finds the performance has dropped by more than 5% of that reported for the pre-release system, then the test sponsor must resubmit a corrected test result.
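The date and performance rules in this section can be checked at month granularity. The sketch assumes that "within three months" means the availability month is no more than three calendar months after the month of first publication, and it reads "dropped by more than 5%" as released performance below 95% of the reported value. The function names are hypothetical.

```python
from typing import Iterable, Tuple

YearMonth = Tuple[int, int]  # (year, month), the granularity required by section 3.3.6


def month_index(ym: YearMonth) -> int:
    """Months since year 0, so that differences count calendar months."""
    year, month = ym
    return year * 12 + (month - 1)


def overall_availability(dates: Iterable[YearMonth]) -> YearMonth:
    """The overall hardware (or software) date is the latest component date."""
    return max(dates, key=month_index)


def available_in_time(overall: YearMonth, first_publication: YearMonth) -> bool:
    """Available no later than three months after first publication."""
    return month_index(overall) - month_index(first_publication) <= 3


def resubmission_required(reported_mpm: float, released_mpm: float) -> bool:
    """True if released-system performance dropped by more than 5% of the reported value."""
    return reported_mpm - released_mpm > 0.05 * reported_mpm
```

For example, hardware dated November 2000, December 2000 and February 2001 has an overall date of February 2001, which satisfies the rule for a result first published in November 2000 (exactly three months earlier).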

For additional information on general availability requirements, please refer to section 2.2 above.

3.3.7 Test Sponsor

The reporting page must list:

Organization which is reporting the results
SPEC license number of that organization
Date the test was performed, by month and year

3.3.8 Disclosure Notes

The Notes section is used to document information such as:

System tuning parameters other than default
Process tuning parameters other than default
MTU size of the network used
Background load, if any
Any approved portability change made to the individual benchmark source code, including the module name and line number of the change
Information such as compilation options must be listed if the end user is required to build the server software from sources
Critical customer-identifiable firmware or option versions such as network and disk controllers
Additional important information required to reproduce the results from other reporting sections that require a larger text area
Any supplemental drawings or detailed written descriptions, or pointers to same, that may be needed to clarify some portion of the SUT
Definitions of tuning parameters may be included or a pointer supplied to a separate document
Part numbers or sufficient information that would allow the end user to order the SUT configuration if desired
Identification of any components used that are supported but are no longer orderable by ordinary customers

3.4 Mail Server Log File Review

The following additional information must be provided if requested for SPEC's results review:

ASCII versions of the SMTP and POP log files from the SUT

In order to minimize disk space requirements, the submitter is only required to keep the section of the log that covers the 100% load level phase. However, having the log files available in their entirety during the review is preferred.

4.0 Submission Requirements for SPECmail2001

Once the test sponsor has a compliant run and wishes to submit it to SPEC for review, they will need to provide the following:

The output.raw file containing the information outlined in section 3
File containing the configuration diagram of the SUT
Any supplemental information, such as tuning descriptions or additional configuration information that helps explain the SUT but did not fit within the report format

Once the submission is ready, please e-mail it to submail2001@spec.org

Retain the following for possible request during the review:

The SUT's SMTP and POP log files from the run in ASCII format

SPEC encourages the submission of results for review by the relevant subcommittee and subsequent publication on SPEC's web site. Vendors may publish compliant results independently; however, any SPEC member may request a full disclosure report for that result and the test sponsor must comply within 10 business days. Issues raised concerning a result's compliance to the run and reporting rules will be taken up by the relevant subcommittee regardless of whether or not the result was formally submitted to SPEC.

5.0 SPECmail2001 Benchmark Kit

SPEC provides client driver software, which includes the tools for running the benchmark and reporting its results. This software includes a number of checks for conformance with these run and reporting rules.

The client driver software is provided as Java bytecode. SPEC also includes the Java source in the distribution. Only the supplied Java bytecode may be used to produce publishable SPECmail2001 results. SPEC requires the user to provide any other software needed to run the benchmark, e.g. OS and JRE.

The kit also includes the SPECmail_config.rc and SPECmail_fixed.rc files described above and a copy of the benchmark documentation (User Guide, Architecture White Paper, FAQ, and Run and Reporting Rules).

Licensees will be notified of any significant updates to the benchmark tools or documentation. Updated versions of the documentation will be available at http://www.spec.org.