CINT92 and CFP92 Homogeneous Capacity Method
Offers Fair Measure of Processing Capacity
by Alexander Carlton
Things change fast in the realm of computers, so to keep abreast of
advancements in system development, SPEC has developed a new way to
measure the basic computing capacity of a system. The "SPECmark89"
was a useful measure of a processor's speed; however, in today's
world where increasing numbers of machines come with several or
many processors inside each system, a single processor's speed
may not be as interesting as the overall processing capacity of
the system -- not just _how_fast_ one task can be done, but
_how_many_ tasks can be done each hour.
In simpler times, one was only concerned with how fast a system
could complete a task. The faster it could complete one task, then
the faster it could turn to start the next. Now, with multi-processor
machines (machines with more than one available computing processor),
systems can perform more than one task at the very same time. Such
a multi-processor system may not be able to perform
any one task faster than a competing uni-processor model, but it may
be able to complete more tasks per hour than its rivals. Now the
issue is not measuring a system's speed, but measuring how much it
can do.
SPEC has defined a new method that can be used to determine and
compare the processing power available from systems of any degree
of multi-processing. The "SPEC Homogeneous Capacity Method" provides
one with a fair measure for the processing capacity of a system --
how much work can it perform in a given amount of time. The
"SPECrate" is the resulting new metric, the rate at which a system
can complete the defined tasks.
Like the SPECmark89, the SPECrate is a component measure. It does
not attempt to measure factors beyond the basic processing subsystem
(CPUs, cache, bus, memory, compilers, etc.). The Capacity Method is
not a system level benchmark. The SPEC SDM benchmarks provide an
excellent means for making system level measurements. The Capacity
Method is a useful test for deriving a fair value for the raw CPU
horsepower of a system regardless of the number of processors
available.
The SPECrate is a capacity measure. It is not a measure of how fast
a system can perform any task; rather it is a measure of how many of
those tasks that system can complete within an arbitrary time interval.
To use an analogy, imagine that this is not about computers, it is
about cooking stoves. The SPECmark89 would be a rating of how fast
one burner could bring one cup of water to a boil and the SPECrate
would be a rating of how much water can be boiled by that stove (using
whatever number of burners are available). If a simple single-burner
stove could boil one cup of water in 5 minutes, then it would have
a rate of 12 cups an hour; but if a four-burner stove could bring
four separate cups of water to boil all in 15 minutes, it would
have a rate of 16 cups an hour. The second stove would have a
greater capacity, i.e. a higher SPECrate, but the first stove
would be better for boiling an individual egg due to its better
SPECmark89.
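The stove arithmetic above works out as follows (plain Python, purely
illustrative):

```python
MINUTES_PER_HOUR = 60

# Single-burner stove: one cup brought to a boil in 5 minutes.
single_burner_rate = 1 * (MINUTES_PER_HOUR / 5)    # 12.0 cups per hour

# Four-burner stove: four cups, all boiling within 15 minutes.
four_burner_rate = 4 * (MINUTES_PER_HOUR / 15)     # 16.0 cups per hour
```

The four-burner stove is slower per cup but has the greater capacity.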
Benchmarks
The Capacity Method uses the exact same benchmarks as defined for
the CINT92 and CFP92 suites. Neither source code nor makefiles have been
changed in any way. Thus the binaries used are identical to those
used for the speed metrics. Only the tools for executing and
evaluating the tests have changed.
Method
As with the traditional speed measures, each benchmark in a suite is
measured independently. A user is free to choose, for each benchmark
in a suite, the appropriate number of copies to run in order to
maximize the performance.
What is measured is the elapsed time from when all copies are
launched simultaneously until the time the last copy finishes. The
elapsed time and the number of copies executed are then used in a
formula to calculate a completion rate: the SPECrate.
Metric
SPECrate = #CopiesRun * ReferenceFactor * UnitTime/ElapsedExecutionTime
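The formula can be sketched as a small Python function. The function and
parameter names are illustrative, not part of any SPEC tool; the 25,500-second
longest reference time and the 604,800-second week are the constants defined
later in this article.

```python
def spec_rate(copies_run, elapsed_seconds, reference_seconds,
              longest_reference_seconds=25500,   # 056.ear's reference time
              unit_time_seconds=604800):         # one week
    """SPECrate = #CopiesRun * ReferenceFactor * UnitTime / ElapsedExecutionTime."""
    reference_factor = reference_seconds / longest_reference_seconds
    return copies_run * reference_factor * unit_time_seconds / elapsed_seconds

# The 015.doduc example discussed below: 3 copies in 622 seconds,
# with a 1860-second reference time, yields the 212.7 reported in
# the text (more precisely, about 212.77).
rate = spec_rate(3, 622, 1860)
```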
The first term, #CopiesRun, is taken directly from the measurement
and is typically chosen to optimize the SPECrate for that benchmark. If a
system took 622 seconds to complete 3 copies of 015.doduc, then one
can simplistically state that the system completes 3 runs of 015.doduc
per every 622 seconds.
The second term, the ReferenceFactor, is a normalization factor,
defined so that each benchmark's job is of a similar duration despite
the very wide variance evident in the sizes of the defined workloads.
SPEC has defined a standard job length and then uses a reference
factor to scale each benchmark's workload up to that length. The
standard job length is taken simply as the length of the benchmark
with the longest SPEC reference time: 056.ear. So, the
ReferenceFactor for each benchmark is defined as the ratio of that
benchmark's reference time over the longest reference time:
25,500 seconds.
The ReferenceFactor for the short benchmark 015.doduc is then
1860/25500. Therefore, if our example system can do 3 015.doduc
runs every 622 seconds, that means it does just under 219 thousandths
(0.218823529) of a "doduc-job" every 622 seconds, or 0.000351806
"doduc-job"s per second.
Which brings us to the third term, UnitTime/ElapsedExecutionTime,
which converts from very short seconds to a unit of time more
appropriate for this work. The chosen time interval, or unit,
is one week (604,800 seconds) -- one week being the smallest time
interval within which the SPEC reference machine (the venerable
VAX11-780) can complete a significant number of jobs.
Then, going back to that example system which did 3 015.doduc
runs in 622 seconds, or about 352 millionths of a "doduc-job"
per second; that example system would have a "SPECrate" of 212.7
for 015.doduc (see formula below):
3 * (1860/25500) * (604800/622)
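That arithmetic can be checked step by step (plain Python, for
illustration only):

```python
# Verify the 015.doduc example from the text, step by step.
reference_factor = 1860 / 25500                # ~0.21882 "doduc-jobs" per run
jobs_per_second = 3 * reference_factor / 622   # ~0.000351806 jobs per second
specrate = jobs_per_second * 604800            # ~212.77, reported as 212.7
```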
Finally, the summary values SPECrate_int92 for the CINT92 suite
and SPECrate_fp92 for the CFP92 suite are calculated by taking the
geometric mean of all the SPECrates from each benchmark in a suite.
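The suite summary can be sketched as a geometric mean; the function name
here is illustrative:

```python
import math

def suite_specrate(per_benchmark_rates):
    """Geometric mean of the individual SPECrates in a suite --
    how SPECrate_int92 and SPECrate_fp92 are summarized."""
    logs = [math.log(rate) for rate in per_benchmark_rates]
    return math.exp(sum(logs) / len(logs))
```

For example, two benchmarks with SPECrates of 100 and 400 would summarize
to 200, since the geometric mean weights ratios rather than differences.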
Mathematics
The tests measure an elapsed time to complete some number of
independent copies of a benchmark, equivalent to a completion rate
over an arbitrary time interval.
These measured completion rates are then normalized by means of a
reference factor to calculate a completion rate for a normalized
job definition. The time interval is then scaled to a unit (one week)
that yields results of a convenient magnitude.
Finally, a mean can be taken across all the measurements in a suite.
The geometric mean is used as the individual values have already
been weighted by their normalizing reference factors.
History
Back in 1990, SPEC announced its Release 1.2b. The main feature of
Release 1.2b was the definition and support for the "SPECthruput89"
metric with its "SPEC Thruput Method A: Homogeneous Load" method.
SPECthruput89 attempted to measure the per-processor speed available
in a system, and then allowed for the calculation of an aggregate
across all the processors.
This proved to be confusing to many. Unfortunately it was easy
to make invalid comparisons between SPECmark89s and SPECthruput89s
or even mistake values between these metrics. It is not fair to
compare the speed of a uni-processor machine against the throughput
of a multi-processor. However, many believed that it would be
acceptable to compare SPECthruput89s against SPECmark89s, because
the SPECthruput89 looked like a SPECmark89 both in terms of the
results and the means to calculate those results.
During the definition of CINT92 and CFP92, SPEC spent a great
deal of effort to improve the measures for multiprocessor systems.
During these discussions we realized that by focusing on a system's
total completion rate rather than trying to aggregate its per-
processor speeds, we could get a simpler metric that would be
valid and fair across systems of any number of processors.
The SPECrates defined for CINT92 and CFP92 are real throughput,
or capacity, measures; and as such are meaningful metrics across
all degrees of multi-processing. Since it measures a system's
total CPU capacity, a SPECrate for a uniprocessor should be
directly and easily comparable to a SPECrate for another uni-
or a 2- or an 8-processor system. What is being compared
between the systems is a rating of the available CPU horsepower
in the entire system -- not just how fast any one crank inside
is turning, but how much power is made available to you to do
your work.
Copyright (c) 1995 Standard Performance Evaluation Corporation