Standard Performance Evaluation Corporation
SPECmail2008 Release 1.0
Run and Reporting Rules
Metric: SPECmail_MSEnt2008 - Mailserver Enterprise 2008
Document: v1.00, Last modified: 17 July 2008
This document specifies how SPECmail2008 is to be run for measuring and publicly reporting performance results. These rules abide by the norms laid down by SPEC. The rules ensure that results generated with this suite are meaningful, comparable to other generated results, and repeatable (with documentation covering factors pertinent to duplicating the results).
Per the SPEC license agreement, all results publicly disclosed must adhere to these Run and Reporting Rules.
SPEC believes the user community will benefit from an objective series of tests, which can serve as common reference and be considered as part of an evaluation process.
SPEC is aware of the importance of optimizations in producing the best system performance. SPEC is also aware that it is sometimes hard to draw an exact line between legitimate optimizations that happen to benefit SPEC benchmarks and optimizations that specifically target the SPEC benchmarks. SPEC wants to increase awareness among implementers and end users of unwanted benchmark-specific optimizations that would be incompatible with SPEC's goal of fair benchmarking.
SPEC expects that any public use of results from this benchmark suite shall be for Systems Under Test (SUTs) and configurations that are appropriate for public consumption and comparison. Thus, it is required that:
To ensure that results are relevant and publishable, SPEC expects that the hardware and software implementations used for running the SPEC benchmarks adhere to the following conventions:
SPEC reserves the right to investigate any case where it appears that these guidelines and the associated benchmark run and reporting rules have not been followed for a published SPEC benchmark result. SPEC may request that the result be withdrawn from the public forum in which it appears and that the benchmarker correct any deficiency in product or process before submitting or publishing future results.
SPEC reserves the right to adapt the benchmark codes, workloads, and rules of SPECmail2008 as deemed necessary to preserve the goal of fair benchmarking. SPEC will notify members and licensees if changes are made to the benchmark and will rename the metrics (e.g. from SPECmail_MSEnt2008 to SPECmail_MSEnt2008a).
Relevant standards are cited in these run rules as URL references, and are current as of the date of publication. Changes or updates to these referenced documents or URLs may necessitate repairs to the links and/or amendment of the run rules. The most current run rules will be available at the SPEC web site at http://www.spec.org. SPEC will notify members and licensees whenever it makes changes to the documentation.
2.0 Run Rules
The production of any compliant SPECmail2008 test results requires that the tests be run in accordance with these run rules. These rules relate to the requirements for the System Under Test (SUT) and the test bed (i.e. SUT, clients, and network), including protocols, operation, configuration, test staging, optimizations, and measurement.
As Internet email is defined by its protocol definitions, SPECmail2008 requires adherence to the relevant protocol standards:
The SMTP and IMAP4 protocols imply the following:
: Internet Protocol (IPv4)
Internet standards are evolving standards. Adherence to related RFCs (e.g. RFC 1191 Path MTU Discovery) is also acceptable provided the implementation retains the characteristic of interoperability with other implementations.
2.2 General Availability
The entire test bed (SUT, clients, and network) must be composed of components that are generally available, or will be generally available within 3 months of the first publication of the results. For more detailed information on the Report Generating rules, please refer to Section 3.3.
Products are considered generally available if they are orderable by ordinary customers and ship within a reasonable time frame. This time frame is a function of the product size and classification, and common practice. Some limited quantity of the product must have shipped on or before the close of the stated availability window. Shipped products do not have to match the tested configuration in terms of CPU count, memory size, and disk count or size, but the tested configuration must be available to ordinary customers. The availability of support and documentation for the products must coincide with the release of the products.
Hardware products that are still supported by their original or primary vendor may be used if their original general availability date was within the last five years. The five-year limit is waived for hardware used in client systems.
Software products that are still supported by their original or primary vendor may be used if their original general availability date was within the last three years.
In the disclosure, the benchmarker must identify any component that is no longer orderable by ordinary customers.
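The age limits above can be sketched as a small check. This is only an illustration: the function names are hypothetical, and measuring the limit back from the test date is an assumption, since the rules do not name the reference date. The sketch also does not verify that the vendor still supports the product, which the rules require separately.

```python
from datetime import date

# Age limits taken from the rules above; all names here are illustrative.
HARDWARE_LIMIT_YEARS = 5
SOFTWARE_LIMIT_YEARS = 3


def years_before(day: date, years: int) -> date:
    """Return the date `years` years before `day` (Feb 29 falls back to Feb 28)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def within_age_limit(ga_date: date, test_date: date, kind: str,
                     client_hardware: bool = False) -> bool:
    """Check a product's original GA date against the applicable age limit.

    Assumption: the limit is measured back from `test_date`.
    """
    if kind == "hardware":
        if client_hardware:
            return True  # the five-year limit is waived for client systems
        limit = HARDWARE_LIMIT_YEARS
    elif kind == "software":
        limit = SOFTWARE_LIMIT_YEARS
    else:
        raise ValueError(f"unknown component kind: {kind}")
    return ga_date >= years_before(test_date, limit)
```

For example, `within_age_limit(date(2003, 1, 1), date(2008, 7, 17), "hardware")` evaluates to `False` for SUT hardware, while the same call with `client_hardware=True` passes because of the waiver.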
2.3 Stable Storage
The SUT must utilize stable storage for the mail store. Mail servers are expected to safely store any email they have accepted until the recipient has disposed of it. To do this, mail servers must be able to recover the mail store without loss from multiple power failures (including cascading power failures), operating system failures, and hardware failures of components (e.g. CPU) other than the storage medium. At any point where the data can be cached, after the server has accepted the message and acknowledged its receipt, there must be a mechanism to ensure any cached message survives a server failure.
If a UPS is required by the SUT to meet the stable storage requirement, the benchmarker is not required to perform the test with a UPS in place. The benchmarker must state in the disclosure that a UPS is required. Supplying a model number for an appropriate UPS is encouraged but not required.
If a battery-backed component is used to meet the stable storage requirement, that battery must have sufficient power to maintain the data for at least 48 hours to allow any cached data to be committed to media and the system to be gracefully shut down. The system or component must also be able to detect a low battery condition and prevent the use of the component or provide for a graceful system shutdown.
2.4 Single Logical Server
The SUT must present to mail clients the appearance and behavior of a single logical server for each protocol. Specifically, the SUT must present a single system view, in that the results of any mail transaction from a client that change the state on the SUT must be visible to any/all other clients on any subsequent mail transaction. For example, if User_1 has 10 mail messages in his mailbox on the SUT, then that user could read those 10 messages from any client system.
2.5 Mail Server Logging
For a run to be valid, the following attributes related to logging must hold true:
For a run to be valid, the following attributes that relate to TCP/IP network configuration must hold true:
Note: SPEC intends to follow relevant standards wherever practical, but with respect to this performance-sensitive parameter it is difficult due to ambiguity in the standards. RFC 1122 requires that TIME_WAIT be 2 times the maximum segment lifetime (MSL), and RFC 793 suggests a value of 2 minutes for MSL. So TIME_WAIT itself is effectively not limited by the standards. However, current TCP/IP implementations define a de facto lower limit for TIME_WAIT of 60 seconds, which is the value used in most BSD-derived UNIX implementations.
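The arithmetic behind this note can be written out as a short sketch. The constant and function names are illustrative only. The two values come from the note itself: the 2-minute MSL suggested by RFC 793 and the 60-second de facto TIME_WAIT.

```python
RFC793_SUGGESTED_MSL_S = 120  # RFC 793 suggests an MSL of 2 minutes
DE_FACTO_TIME_WAIT_S = 60     # de facto lower limit cited in the note


def time_wait_from_msl(msl_seconds: float) -> float:
    """RFC 1122 relation: TIME_WAIT is twice the maximum segment lifetime."""
    return 2 * msl_seconds


def implied_msl(time_wait_seconds: float) -> float:
    """MSL implied by a given TIME_WAIT under the same relation."""
    return time_wait_seconds / 2


# With the suggested MSL, TIME_WAIT would be 240 seconds (4 minutes).
suggested_time_wait = time_wait_from_msl(RFC793_SUGGESTED_MSL_S)

# The 60-second de facto value corresponds to an MSL of only 30 seconds.
de_facto_msl = implied_msl(DE_FACTO_TIME_WAIT_S)
```

Because MSL is only a suggested value, the 240-second figure is not binding, which is why the note treats 60 seconds as the practical reference point.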
2.7 Initializing and Running the Benchmark
To make an official SPECmail2008 test run, the benchmarker must perform the following steps:
Benchmark-specific optimization is not allowed. Any optimization of either the configuration or software used on the SUT must improve performance for a larger class of workloads than that defined by this benchmark and must be supported and recommended by the provider. Optimizations that take advantage of the benchmark's specific features are forbidden. Examples of inappropriate optimization include, but are not limited to, taking advantage of specially formed test user account names, the fixed set of message sizes in the workload, or the workload's mailbox sizes.
The provided SPECmail2008 tools must be used to run and produce the measured SPECmail_MSEnt2008 results. The SPECmail_MSEnt2008 metric is a function of the SPECmail2008 Enterprise workload, the associated mail store and the defined Quality of Service criteria. SPECmail_MSEnt2008 results are not comparable to any other mail server performance metric.
SPECmail2008 expresses performance in terms of SPECmail_MSEnt2008 IMAP Sessions per Hour. The benchmarker specifies the number of users for which the benchmark tools will generate a workload. The load generators present a predefined mixture of SMTP and IMAP4 transactions to the E-mail server. Each SPECmail_MSEnt2008 IMAP user is represented by 1 or more SPECmail2008 Command sequences, in predefined combinations and durations during the peak hour. In addition to the SPECmail_MSEnt2008 metric, the benchmark also reports the configured number of SPECmail2008 users.
The SPECmail_MSEnt2008 profile requires that the SUT handle a selection of IMAP transactions and SMTP incoming messages for each IMAP user. The IMAP commands include LOGIN, FETCH, LIST, APPEND, SELECT, STORE and EXPUNGE, among others. SMTP incoming messages not intended for local users are relayed as outgoing SMTP messages. The workload parameters required for a valid run are contained in the default workload parameter file supplied with the benchmark. A detailed explanation of the workload is included in the SPECmail2008 Architecture White Paper.
2.9.3 Mail Store
It is the benchmarker's responsibility to ensure that the messages that make up the mail store are placed on the SUT so that they can be accessed properly by the benchmark. These folders and messages shall be used as the target working set. The benchmark performs internal validations to verify the expected results. No modification or bypassing of this validation is allowed.
The benchmark determines the initial working set size for the test based on a function of the number of IMAP4 users specified for the test, the message size distribution, and mailbox size distribution. Use the following rules as an estimate of the raw byte count needed for the data working set:
The actual size of the mail store and the amount of disk space to hold it is a function of the E-mail server product used and any additional storage overhead needed or configured. Another 10% should be added to the total storage space to accommodate the fluctuations in the workload.
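As a rough sketch of the sizing arithmetic described above, the function below starts from a raw byte count (however it was estimated), applies a product-specific overhead, and then adds the 10% headroom. The function name and the `overhead_percent` parameter are hypothetical; the overhead depends entirely on the mail server product and is not defined by the benchmark.

```python
WORKLOAD_HEADROOM_PERCENT = 10  # "another 10%" from the paragraph above


def provisioned_bytes(raw_working_set_bytes: int, overhead_percent: int = 0) -> int:
    """Estimate the disk space to provision, rounded up to a whole byte.

    raw_working_set_bytes: estimated raw size of the data working set.
    overhead_percent: hypothetical product-specific storage overhead, as a
        whole percentage (e.g. 25 for indexes and metadata).
    """
    if raw_working_set_bytes < 0 or overhead_percent < 0:
        raise ValueError("inputs must be non-negative")
    # Integer arithmetic avoids floating-point rounding surprises.
    scaled = (raw_working_set_bytes
              * (100 + overhead_percent)
              * (100 + WORKLOAD_HEADROOM_PERCENT))
    return -(-scaled // 10_000)  # ceiling division


# Example: a 500 GB raw working set with an assumed 25% product overhead.
example = provisioned_bytes(500 * 10**9, overhead_percent=25)
```

Here the headroom is applied to the total after overhead, which is one reading of "added to the total storage space". Applying it to the raw size alone would give a smaller figure.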
The benchmarker is responsible for configuring the SUT with the corresponding number of user accounts and mailboxes required for the test. The benchmark suite provides tools for the initial population of the mail store.
Since the working set is not static and changes over the course of the test as messages are added or deleted, it is permitted for the benchmarker to capture the mail store image after the tools have created the initial population but before running any load tests (see section 2.7).
2.9.4 Quality of Service Criteria
The SPECmail2008 benchmark has specific Quality of Service (QoS) criteria for response times, delivery times and error rates. The QoS criteria are checked by the benchmark tools.
2.9.5 IMAP4 Autologout Timer
According to the IMAP4 specification, RFC 2060:
An IMAP4 server MAY have an inactivity autologout timer. Such a timer MUST be at least 30 minutes duration. The receipt of any command from the client during that interval should suffice to reset the autologout timer. When the timer expires, the session does NOT enter the UPDATE state--the server should close the TCP connection without expunging any messages or sending any response to the client. Messages marked DELETED will remain in the server until another IMAP session issues the EXPUNGE command.
2.10 Load Generators
The SPECmail2008 benchmark requires the use of one or more client systems. One client system is designated as the prime client and will run the benchmark manager. One or more client systems act as load generators. One client system is designated as the smtpsink to handle the e-mails sent to remote addresses. Please refer to the User Guide for more detail on these roles.
A server component of the SUT must not be used as a load generator or an smtpsink when testing to produce valid SPECmail2008 results. A server component may be used as the prime client, but this is not recommended.
The client systems must have a Java Runtime Environment (JRE) version 1.5 or higher installed in order to run the benchmark tools.
2.11 SPECmail2008 Parameters
The SPECmail2008 benchmark provides three (3) parameter files that contain the testbed configuration and workload parameters.
The file SPECimap_sysinfo.rc contains the site-specific information that appears in the final report.
The file SPECimap_config.rc contains the testbed (clients and SUT) configuration information that should be modified. This data also appears in the final report.
The file SPECimap_fixed.rc contains the default workload parameters used to produce a compliant test result. This file must not be altered. Modifying the SPECimap_fixed.rc will not prevent the benchmark from running, but the results generated using the modified SPECimap_fixed.rc file will always be marked non-compliant.
To help ensure that the content of the parameter files is correct and can be used to produce a compliant test run, benchmarkers are encouraged to invoke the java specimap command with the -compliant switch. Then if there are problems in the rc files, the benchmark will generate appropriate warning messages but continue running the compliant test.
The SPECmail2008 User Guide provides detailed documentation on the parameters in the SPECimap_config.rc, SPECimap_sysinfo.rc and SPECimap_fixed.rc files.
3.0 Reporting Rules
In order to publicly disclose SPECmail2008 results, the benchmarker must adhere to these reporting rules in addition to having followed the run rules above. The goal of the reporting rules is to ensure the SUT and testbed are sufficiently documented such that someone could understand the results and reproduce the test.
3.1 Metrics And Result Reports
The benchmark's single figure of merit, SPECmail_MSEnt2008 Sessions per Hour, is the throughput measured during the run at the 100% load level.
The report of results for the SPECmail2008 benchmark is generated in HTML by the provided SPEC tools. These tools may not be changed, except for portability reasons with prior SPEC approval. The tools perform error checking and will flag some error conditions resulting in an "invalid result". However, these automatic checks are only there for debugging convenience and do not relieve the benchmarker of the responsibility to check the results and follow the run and reporting rules.
The section of the output.raw file that contains actual test measurements must not be altered. Corrections to the SUT descriptions may be made as needed to produce a properly documented disclosure.
3.2 Results Disclosure and Usage
Any SPECmail2008 result produced, reviewed and accepted by SPEC in compliance with these run and reporting rules may be publicly disclosed and represented as a valid SPECmail2008 result.
Any test result not in full compliance with the run and reporting rules must not be represented using the SPECmail2008 SPECmail_MSEnt2008 metric name.
The metric SPECmail_MSEnt2008 Sessions per Hour must not be associated with any estimated results. This includes adding, multiplying or dividing measured results to create a derived metric.
Compliant runs need to be submitted to SPEC for review and must be accepted prior to public disclosure. Submissions must include the Submission File, a Configuration Diagram, and the Full Disclosure Archive for the run.
3.2.1 Fair Use of SPECmail2008 Results
Any public use of SPECmail2008 results must, at the time of
publication, adhere to the then-currently-posted version of SPEC's Fair
When competitive comparisons are made using SPECmail2008 benchmark results available from the SPEC web site, SPEC requires that the following template be used:
The rationale for the template is to provide fair comparisons by ensuring that:
3.2.2 Research and Academic Usage of SPECmail2008
SPEC encourages use of the SPECmail2008 benchmark in academic and research environments. It is understood that experiments in such environments may be conducted in a less formal fashion than that required of licensees submitting to the SPEC web site or otherwise disclosing valid SPECmail2008 results.
For example, a research environment may use early prototype hardware that simply cannot be expected to stay up for the length of time required to run the entire benchmark, or may use research software that is unsupported and not generally available. Nevertheless, SPEC encourages researchers to obey as many of the run rules as practical, even for informal research. SPEC suggests that following the rules will improve the clarity, reproducibility, and comparability of research results. Where the rules cannot be followed, SPEC requires the results be clearly distinguished from fully compliant results such as those officially submitted to SPEC, by disclosing the deviations from the rules and avoiding the use of the SPECmail2008 metric name.
3.3 Testbed Configuration Disclosure
The system configuration information that is required to duplicate published performance results must be reported. This list is not intended to be all-inclusive, nor is each performance neutral feature in the list required to be described. The rule is: If it affects performance or the feature is required to duplicate the results, then it must be described.
Any deviations from the standard default configuration for the SUT must be documented, so an independent party would be able to reproduce the result without further assistance.
For most of the following configuration details, there is an entry in the configuration file, and a corresponding entry in the tool-generated HTML result page. If information needs to be included that does not fit into these entries, the Notes sections must be used.
3.3.1 SUT Hardware
The following SUT hardware components must be reported:
3.3.2 SUT Software
The following SUT software components must be reported:
3.3.3 Network Configuration
A brief description of the network configuration used to achieve the benchmark result is required. The minimum information to be supplied is:
3.3.4 Client Systems
The following client system properties must be reported:
3.3.5 Configuration Diagram
A configuration diagram of the SUT must be provided in a common graphics format (e.g. PNG, JPEG, GIF). This will be included in the HTML formatted results page. An example would be a line drawing that provides a pictorial representation of the SUT including the network connections between clients, server nodes, switches and the storage hierarchy and any other complexities of the SUT that can best be described graphically.
3.3.6 General Availability Dates
The dates of general customer availability must be listed for the major components: hardware, mail server software, and operating system, by month and year. All the system, hardware and software features are required to be available within 3 months of the first publication of the result. The overall hardware availability date must be the latest of the hardware availability dates. The overall software availability date must be the latest of the software availability dates.
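The two date rules above (the overall date is the latest component date, and everything must be available within 3 months of first publication) can be sketched as follows. Dates are modeled as `(year, month)` tuples because the rules ask for month and year only. The function names are illustrative, and counting the 3-month window in whole calendar months is an assumption.

```python
from typing import Iterable, Tuple

YearMonth = Tuple[int, int]  # (year, month), e.g. (2008, 7)


def month_index(ym: YearMonth) -> int:
    """Map a (year, month) pair onto a running month count."""
    year, month = ym
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month: {month}")
    return year * 12 + (month - 1)


def overall_availability(dates: Iterable[YearMonth]) -> YearMonth:
    """The overall availability date is the latest of the component dates."""
    return max(dates, key=month_index)


def available_in_time(overall: YearMonth, first_publication: YearMonth,
                      window_months: int = 3) -> bool:
    """True if `overall` is no later than `window_months` after publication."""
    return month_index(overall) - month_index(first_publication) <= window_months


# Example with made-up component dates.
hardware = overall_availability([(2008, 5), (2008, 9)])   # latest hardware date
software = overall_availability([(2008, 3), (2008, 11)])  # latest software date
```

With a first publication in July 2008, the hardware date in this example falls inside the window and the software date falls outside it.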
If pre-release hardware or software is used, then the test sponsor represents that the performance measured is the performance to be expected on the same configuration of the release system. If the test sponsor later finds the performance has dropped by more than 5% of that reported for the pre-release system, then the test sponsor must resubmit a corrected test result.
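The 5% rule can be expressed as a simple relative-drop check. This is a minimal sketch with hypothetical names; both inputs are assumed to be SPECmail_MSEnt2008 results measured on the same configuration.

```python
def relative_drop(pre_release_result: float, release_result: float) -> float:
    """Fractional performance drop relative to the pre-release result."""
    if pre_release_result <= 0:
        raise ValueError("pre-release result must be positive")
    return (pre_release_result - release_result) / pre_release_result


def must_resubmit(pre_release_result: float, release_result: float,
                  threshold: float = 0.05) -> bool:
    """True if performance dropped by more than 5% of the reported result."""
    return relative_drop(pre_release_result, release_result) > threshold
```

For instance, a reported result of 1000 followed by 940 on the release system is a 6% drop and would call for a corrected submission, whereas 960 (a 4% drop) would not.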
For additional information on general availability requirements, please refer to section 2.2 above.
3.3.7 Test Sponsor
The reporting page must list:
3.3.8 Disclosure Notes
The Notes section is used to document information such as:
3.4 Mail Server Log File Review
The following additional information must be provided if requested for SPEC's results review:
4.0 Submission Requirements for SPECmail2008
Once the test sponsor has a compliant run and wishes to submit it to SPEC for review, they will need to provide the following:
Note: Sometimes the submission needs to include supplemental information that did not fit within the report format, such as unusual setup/tuning needs or additional configuration information that helps explain the SUT. These additional files should be given to SPEC by special arrangement with the SPEC office staff. They should not be included in the submission e-mail message, because they will be stripped during results extraction.
Once the submission package is ready, please e-mail it to email@example.com to begin the submission process.
Retain the following for possible request during the review:
SPEC encourages the submission of results for review by the relevant subcommittee and subsequent publication on SPEC's web site. Vendors may publish compliant results independently as long as said results have been approved by the relevant SPEC subcommittee.
5.0 SPECmail2008 Benchmark Kit
SPEC provides client driver software, which includes the tools for running the benchmark and reporting its results. This software includes a number of checks for conformance with these run and reporting rules.
The client driver software is provided as Java bytecode and may be used to produce publishable SPECmail2008 results. SPEC requires the user to provide any other software needed to run the benchmark, e.g. OS and JRE.
The kit also includes the SPECimap_config.rc, SPECimap_sysinfo.rc and SPECimap_fixed.rc files described above and a copy of the benchmark documentation (User Guide, Architecture White Paper, FAQ, and Run and Reporting Rules).
Licensees will be notified of any significant updates to the benchmark tools or documentation. Updated versions of the documentation will be available at http://www.spec.org.
Copyright (c) 2008 Standard Performance Evaluation Corporation