Run and Reporting Rules
This document specifies how the benchmarks in the SPEC JVM Client98 suite are to be run for measuring and publicly reporting performance results. These rules are intended to ensure that results generated with this suite are meaningful, comparable to other generated results, and are repeatable (with documentation covering factors pertinent to duplicating the results).
Per the SPEC license agreement, all publicly disclosed results must adhere to these Run and Reporting Rules.
SPEC intends that this benchmark suite be applicable to standalone Java client computers, either with disk (e.g., PC, workstation) or without disk (e.g., network computer), executing programs in an ordinary Java platform environment. This suite is intended to measure the performance of Java clients, not that of Java servers or embedded systems, even though the benchmarks will run in those environments. The benchmarks measure the speed at which the Java Virtual Machine executes Java byte codes, which is fundamental to overall Java performance in many application environments.
In addition to basic byte code execution, this benchmark suite requires graphics, networking, and I/O, and these functions will influence benchmark performance, in some cases significantly so. However, SPEC intends that these functions will not ordinarily dominate benchmark performance, and therefore these benchmarks should not be taken as representative of application environments which are dominated by these functions.
These benchmarks require a version 1.1 Java Virtual Machine, or later. They are applicable both to systems using 64-bit intermediate floating point values, and to those using 80-bit values.
These benchmarks do not provide a comparison of Java performance to that of C or C++. SPEC recognizes that a number of environments have been created with the intent to provide improved performance by performing some work statically prior to program execution instead of dynamically during program execution. These benchmarks are not intended to compare the dynamic Java platform against such statically compiled programs.
The general philosophy behind the rules for running the SPEC JVM Client98 benchmark is to ensure that an independent party can reproduce the reported results. The SPEC benchmark tools as provided must be used to run the benchmarks.
SPEC is aware of the importance of optimizations in producing the best system performance. SPEC is also aware that it is sometimes hard to draw an exact line between legitimate optimizations that happen to benefit SPEC benchmarks and optimizations that specifically target the SPEC benchmarks. However, with the list below, SPEC wants to increase the awareness of implementors and end users of issues of unwanted benchmark-specific optimizations that would be incompatible with SPEC's goal of fair benchmarking. To ensure that results are relevant to end users, SPEC expects that the hardware and software implementations used for running the SPEC benchmarks adhere to the following conventions:
If it appears that the above guidelines have not been followed, SPEC may investigate such a claim and request that the offending optimization (e.g., a SPEC-benchmark-specific pattern match) be removed and the results resubmitted. Alternatively, SPEC may request that the vendor correct the deficiency (e.g., make the optimization more general purpose or correct problems with code generation) before submitting results based on the optimization.
SPEC reserves the right to adapt the benchmark codes, workloads, and rules of SPEC JVM Client98 as deemed necessary to preserve the goal of fair benchmarking. SPEC will notify members and licensees whenever it makes changes to the suite and will rename the metrics. In the event that a workload is removed, SPEC reserves the right to republish in summary form "adapted" results for previously published systems, converted to the new metric. In the case of other changes, a republication may necessitate retesting and may require support from the original test sponsor.
Relevant standards cited in these run rules are current as of the date of publication. Changes or updates to these referenced documents, or other reasons may necessitate amendment of these run rules. The current run rules will be available at the SPEC web site at http://www.spec.org.
Tested systems must provide an environment suitable for running typical Java Version 1.1 programs and must be generally available for that purpose. Any tested system must include an implementation of the Java (tm) Virtual Machine as described by the following references, or as amended by SPEC for later Java versions:
The system must also include an implementation of those packages and their classes that are referenced by this suite as described within the following references:
The SPEC JVM Client98 benchmark suite runs as an applet on a client system (Network Computer, Personal Computer, or Workstation). The SPEC JVM Client98 software is installed onto the file system of a web server. The benchmark home page, index.html, is loaded as a URL by a web browser on the client, and provides instructions and documentation for running the benchmarks, as well as a link to the page containing the SPEC JVM Client98 benchmark applet.
Reportable results must be run with the SPEC tools as an applet from a web server, either running on the same machine as the benchmarks or running on another machine. Benchmark classes must be loaded from that web server; e.g., ensure that the CLASSPATH environment variable, or its equivalent, is not set in a way that causes benchmark classes to be loaded from a local filesystem.
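As an illustration only (this is not part of the SPEC tools, and the class name is hypothetical), a tester might confirm where the classes are coming from with a small diagnostic applet along the following lines:

    import java.applet.Applet;
    import java.awt.Graphics;
    import java.net.URL;

    // Hypothetical diagnostic applet: reports where its own class was loaded
    // from, so a tester can check that classes are served over HTTP by the web
    // server rather than read from a local CLASSPATH entry.
    public class LoadOriginApplet extends Applet {
        private String origin = "unknown";

        public void init() {
            // For a compliant run the code base should be an http: URL on the
            // web server, not a file: URL on the client machine.
            URL codeBase = getCodeBase();
            // Ask the applet's class loader where this class itself came from.
            URL self = getClass().getResource("LoadOriginApplet.class");
            origin = "code base = " + codeBase + ", class loaded from = " + self;
        }

        public void paint(Graphics g) {
            g.drawString(origin, 10, 20);
        }
    }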
SPEC requires the use of a single file system on a web server system to contain the directory tree for the suite being run. SPEC allows any type of file system (disk-based, memory-based, NFS, DFS, etc.) to be used. The type of file system must be disclosed in reported results.
The classes of the benchmark suite are provided as individual class files rather than being collected in a JAR archive. The benchmark must be run using these individual class files, not a JAR or other class file archive. SPEC recognizes that JAR archives are commonly used to speed execution particularly of Java applets loaded over the network. However, by not using JAR we ensure an apples-to-apples comparison without potential second order performance impacts from increased memory utilization.
There are a number of parameters that control the operation of the SPEC benchmark tools; they may be set by property files, command line arguments, HTML applet parameters, and GUI controls. The use of all these controls is explained in the benchmark documentation. The properties in the file "props/spec" may not be changed from the values as provided by SPEC. The properties in the file "props/user" may be set to any desired value. In particular, a reportable result must:
All benchmark settings must be reported. The benchmark tools provide for such reporting automatically.
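As a rough illustration of what such a settings report covers (the SPEC tools generate the official report; the sketch below is not part of them), the two property files could be dumped with a short Java program. The file names "props/spec" and "props/user" are those shipped with the suite; everything else in the sketch is hypothetical:

    import java.io.FileInputStream;
    import java.util.Enumeration;
    import java.util.Properties;

    // Minimal sketch: print the settings in the two benchmark property files so
    // they can be checked against a reported result. "props/spec" must keep the
    // values shipped by SPEC; "props/user" may be set to any desired value.
    public class DumpBenchmarkProperties {
        public static void main(String[] args) throws Exception {
            dump("props/spec");   // fixed by SPEC; must not be changed
            dump("props/user");   // site-specific; any value is allowed
        }

        private static void dump(String file) throws Exception {
            Properties p = new Properties();
            FileInputStream in = new FileInputStream(file);
            p.load(in);
            in.close();
            System.out.println("--- " + file + " ---");
            for (Enumeration e = p.propertyNames(); e.hasMoreElements();) {
                String key = (String) e.nextElement();
                System.out.println(key + " = " + p.getProperty(key));
            }
        }
    }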
Tuning may be accomplished through JVM command line flags, system configuration settings, web browser configuration settings, initialization files, environment variables, or other means. Tuning options may affect a JIT compiler, any other part of the JVM, or an underlying operating system, if any. The following rules apply to tuning for the SPEC JVM Client98 benchmark suite.
Results will be reported in three categories of physical memory size. These categories will be printed on the reporting page, and the SPEC web site will group results by these categories. "Physical" memory size is taken to include the total memory size set in software for the OS to use for all purposes; this must be equivalent to, or more conservative than, physically removing memory from the machine. The categories of memory size reported are:
Proper use of the SPEC benchmark tools and reporting page generation program will take care of the production of a correctly formatted reporting page. The test sponsor is responsible for the content of all information, and for reviewing the page to ensure that all information is correct. If there is any problem with the SPEC tools in preparing a correct reporting page, the test sponsor can contact SPEC to possibly obtain a bug fix.
For each benchmark, the reference time is divided by the elapsed time in seconds on the system under test, giving a SPEC ratio that indicates the speed of the tested machine compared to the reference machine whose times are provided by SPEC. The composite metric is calculated as a geometric mean of the individual ratios. All runs of each benchmark made with the SPEC tools are required to validate correctly.
SPEC derived the reference times by executing the benchmarks on a reference machine. These reference times are now fixed, and are used by the SPEC reporting software for all calculations of SPEC ratios. The reference machine is:
The metrics are calculated as the geometric means of the SPEC ratios. All benchmark programs are weighted equally, with the exception of _200_check, whose execution time is neither reported nor used in metric calculation. The benchmark tools run each benchmark a number of times according to criteria set by the benchmarker. The SPECjvm98 metric is calculated from the best ratios, and the SPECjvm_base98 metric is calculated from the worst ratios.
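The arithmetic can be sketched as follows; the reference and elapsed times used here are invented purely for illustration, and the real reference times and elapsed times come from SPEC and from the benchmark tools respectively:

    // Illustrative arithmetic only: how a SPEC ratio and the composite
    // geometric mean are formed.
    public class MetricSketch {
        // SPEC ratio: reference time divided by the measured elapsed time, so a
        // faster machine yields a larger ratio.
        static double ratio(double referenceSeconds, double elapsedSeconds) {
            return referenceSeconds / elapsedSeconds;
        }

        // Composite metric: geometric mean of the per-benchmark ratios
        // (the N-th root of their product), weighting each benchmark equally.
        static double geometricMean(double[] ratios) {
            double product = 1.0;
            for (int i = 0; i < ratios.length; i++) {
                product *= ratios[i];
            }
            return Math.pow(product, 1.0 / ratios.length);
        }

        public static void main(String[] args) {
            // Hypothetical reference times and best/worst elapsed times
            // (seconds) for a three-benchmark example.
            double[] reference    = { 120.0, 300.0, 200.0 };
            double[] bestElapsed  = {  60.0, 150.0, 100.0 };
            double[] worstElapsed = {  80.0, 200.0, 125.0 };

            double[] best  = new double[reference.length];
            double[] worst = new double[reference.length];
            for (int i = 0; i < reference.length; i++) {
                best[i]  = ratio(reference[i], bestElapsed[i]);
                worst[i] = ratio(reference[i], worstElapsed[i]);
            }

            System.out.println("SPECjvm98      (best ratios)  = " + geometricMean(best));
            System.out.println("SPECjvm_base98 (worst ratios) = " + geometricMean(worst));
        }
    }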
The table of results shows SPEC ratios for each of the benchmarks in the suite, and an overall geometric mean. Each benchmark is executed at least twice, and both the worst and best of all runs are reported here.
A bar chart depicts the SPEC ratios of each benchmark, worst and best, and also the geometric means of the worst ratios and of the best ratios.
Client Hardware
Client Software
Server Hardware
Server Software
The dates (month and year) of general customer availability must be listed for the major components: hardware, HTTP server, and operating system. All system hardware and software features are required to be available within 6 months of the date of test.
If pre-release hardware or software is tested, then the test sponsor represents that the performance measured is generally representative of the performance to be expected on the same configuration of the released system. If the sponsor later finds the performance of the released system to be 5% lower than that reported for the pre-release system, then the sponsor is requested to report a corrected test result.
This section is used to document any other conditions which are necessary for an independent party to reproduce the measured test results. For example:
If a test result is invalid for any reason this section of the reporting page lists the reason(s) why it is not valid. An invalid result page will also have "INVALID RESULT" stamped across the page background, and no geometric mean is calculated or reported.
A table lists details about each execution of each benchmark:
A bar graph depicts the memory in use and total memory at the end of each individual benchmark run.
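The SPEC tools gather these memory figures automatically; as a rough indication of how figures of this kind can be obtained from a Java Virtual Machine (a sketch only, not the tools' actual code), consider:

    // Minimal sketch: sample "total memory" and "memory in use" from the JVM.
    public class MemorySnapshot {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            long total = rt.totalMemory();   // bytes currently claimed by the JVM heap
            long free  = rt.freeMemory();    // bytes of that heap not currently in use
            long used  = total - free;       // approximate memory in use
            System.out.println("Total memory : " + total + " bytes");
            System.out.println("Memory in use: " + used + " bytes");
        }
    }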
SPEC requires a full disclosure of results and configuration details sufficient to reproduce the results. A full disclosure report of all information described in these rules must be available on request within two weeks of any public disclosure of SPEC JVM Client98 results. Acceptable means of providing disclosures include:
A full disclosure of results will include:
If for some reason, the test sponsor cannot run the benchmarks as specified in these rules, the test sponsor can seek SPEC approval for performance-neutral alternatives. No publication of SPEC metrics may be made without such approval, except for research use as described below.
Other modes of running the benchmarks will be provided for research purposes. Any publication of performance information derived using the SPEC JVM Client98 benchmarks must credit SPEC as the source of the benchmarks. Any publication which is not compliant with all of the run rules must not represent, directly or indirectly, that the result is a SPECjvm98 metric. It must explicitly state that the result is not comparable with a SPECjvm98 metric.
In any such use, only the elapsed time of individual benchmarks may be reported. SPEC ratios and geometric means may not be calculated. Elapsed times on different systems and test conditions may be compared with one another. Test conditions must be described sufficiently to allow a third party to reproduce the results. No derivative metrics may be devised based on these benchmarks.
SPECjvm98 metrics may be estimated. All estimates must be clearly identified as such. Licensees are encouraged to give a rationale or methodology for any estimates, and to publish actual SPECjvm98 metrics as soon as possible.
Source code for some of the benchmarks is provided so that people can better understand what the benchmarks are doing, and to facilitate academic research. However, no compilation of Java source code to class files is required to run the benchmarks, and no such compilation is allowed for reported results. The SPEC tools to run the benchmarks are also provided as Java class files. No compilation of the SPEC tools is required or allowed.