CINT92 and CFP92 Homogeneous Capacity Method
Offers Fair Measure of Processing Capacity
by Alexander Carlton
Things change fast in the realm of computers, so to keep abreast of
advancements in system development, SPEC has developed a new way to
measure the basic computing capacity of a system. The "SPECmark89"
was a useful measure of a processor's speed; however, in today's
world where increasing numbers of machines come with several or
many processors inside each system, a single processor's speed
may not be as interesting as the overall processing capacity of
the system -- not just _how_fast_ one task can be done, but
_how_many_ tasks can be done each hour.
In simpler times, one was only concerned with how fast a system
could complete a task. The faster it could complete one task, then
the faster it could turn to start the next. Now, with multi-processor
machines (machines with more than one available computing processor),
systems can perform more than one task at the very same time. Such
a multi-processor system may not be able to perform
any one task faster than a competing uni-processor model, but it may
be able to complete more tasks per hour than its rivals. Now the
issue is not measuring a system's speed, but measuring how much it
can do.
SPEC has defined a new method that can be used to determine and
compare the processing power available from systems of any degree
of multi-processing. The "SPEC Homogeneous Capacity Method" provides
one with a fair measure for the processing capacity of a system --
how much work can it perform in a given amount of time. The
"SPECrate" is the resulting new metric, the rate at which a system
can complete the defined tasks.
Like the SPECmark89, the SPECrate is a component measure. It does
not attempt to measure factors beyond the basic processing subsystem
(CPUs, cache, bus, memory, compilers, etc.). The Capacity Method is
not a system level benchmark. The SPEC SDM benchmarks provide an
excellent means for making system level measurements. The Capacity
Method is a useful test for deriving a fair value for the raw CPU
horsepower of a system regardless of the number of processors
available.
The SPECrate is a capacity measure. It is not a measure of how fast
a system can perform any task; rather it is a measure of how many of
those tasks that system can complete within an arbitrary time interval.
To use an analogy, imagine that this is not about computers, it is
about cooking stoves. The SPECmark89 would be a rating of how fast
one burner could bring one cup of water to a boil and the SPECrate
would be a rating of how much water can be boiled by that stove (using
whatever number of burners are available). If a simple single-burner
stove could boil one cup of water in 5 minutes, then it would have
a rate of 12 cups an hour; but if a four-burner stove could bring
four separate cups of water to boil all in 15 minutes, it would
have a rate of 16 cups an hour. The second stove would have a
greater capacity, i.e. a higher SPECrate, but the first stove
would be better for boiling an individual egg due to its better
SPECmark89.
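The stove arithmetic above works out as follows (plain Python, purely
illustrative):

```python
MINUTES_PER_HOUR = 60

# Single-burner stove: one cup brought to a boil in 5 minutes.
single_burner_rate = 1 * (MINUTES_PER_HOUR / 5)    # 12.0 cups per hour

# Four-burner stove: four cups, all boiling within 15 minutes.
four_burner_rate = 4 * (MINUTES_PER_HOUR / 15)     # 16.0 cups per hour
```

The four-burner stove is slower per cup but has the greater capacity.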
Benchmarks
The Capacity Method uses the exact same benchmarks as defined for
the CINT92 and CFP92 suites. Neither source code nor makefiles have been
changed in any way. Thus the binaries used are identical to those
used for the speed metrics. Only the tools for executing and
evaluating the tests have changed.
Method
As with the traditional speed measures, each benchmark in a suite is
measured independently. A user is free to choose, for each benchmark
in a suite, the appropriate number of copies to run in order to
maximize the performance.
What is measured is the elapsed time from when all copies are
launched simultaneously until the time the last copy finishes. The
elapsed time and the number of copies executed are then used in a
formula to calculate a completion rate: the SPECrate.
Metric
SPECrate = #CopiesRun * ReferenceFactor * UnitTime/ElapsedExecutionTime
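The formula can be sketched as a small Python function. The function and
parameter names are illustrative, not part of any SPEC tool; the 25,500-second
longest reference time and the 604,800-second week are the constants defined
later in this article.

```python
def spec_rate(copies_run, elapsed_seconds, reference_seconds,
              longest_reference_seconds=25500,   # 056.ear's reference time
              unit_time_seconds=604800):         # one week
    """SPECrate = #CopiesRun * ReferenceFactor * UnitTime / ElapsedExecutionTime."""
    reference_factor = reference_seconds / longest_reference_seconds
    return copies_run * reference_factor * unit_time_seconds / elapsed_seconds

# The 015.doduc example discussed below: 3 copies in 622 seconds,
# with a 1860-second reference time, yields the 212.7 reported in
# the text (more precisely, about 212.77).
rate = spec_rate(3, 622, 1860)
```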
The first term, #CopiesRun, is taken directly from the measurement
and is typically chosen to optimize the SPECrate for that benchmark. If a
system took 622 seconds to complete 3 copies of 015.doduc, then one
can simplistically state that the system completes 3 runs of 015.doduc
per every 622 seconds.
The second term, the ReferenceFactor, is a normalization factor,
defined so that each benchmark's job is of a similar duration despite
the very wide variance evident in the sizes of the defined workloads.
SPEC has defined a standard job length and then uses a reference
factor to scale each benchmark's workload up to that length. The
standard job length is taken simply as the length of the benchmark
with the longest SPEC reference time: 056.ear. So, the
ReferenceFactor for each benchmark is defined as the ratio of that
benchmark's reference time over the longest reference time:
25,500 seconds.
The ReferenceFactor for the short benchmark 015.doduc is then
1860/25500. Therefore, if our example system can do 3 015.doduc
runs every 622 seconds, that means it does just under 219 thousandths
(0.218823529) of a "doduc-job" every 622 seconds, or 0.000351806
"doduc-job"s per second.
Which brings us to the third term, UnitTime/ElapsedExecutionTime,
which converts from very short seconds to a unit of time more
appropriate for this work. The chosen time interval, or unit,
is one week (604,800 seconds) -- one week being the smallest time
interval within which the SPEC reference machine (the venerable
VAX11-780) can complete a significant number of jobs.
Then, going back to that example system which did 3 015.doduc
runs in 622 seconds, or about 352 millionths of a "doduc-job"
per second; that example system would have a "SPECrate" of 212.7
for 015.doduc (see formula below):
3 * (1860/25500) * (604800/622)
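That arithmetic can be checked step by step (plain Python, for
illustration only):

```python
# Verify the 015.doduc example from the text, step by step.
reference_factor = 1860 / 25500                # ~0.21882 "doduc-jobs" per run
jobs_per_second = 3 * reference_factor / 622   # ~0.000351806 jobs per second
specrate = jobs_per_second * 604800            # ~212.77, reported as 212.7
```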
Finally, the summary values SPECrate_int92 for the CINT92 suite
and SPECrate_fp92 for the CFP92 suite are calculated by taking the
geometric mean of all the SPECrates from each benchmark in a suite.
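The suite summary can be sketched as a geometric mean; the function name
here is illustrative:

```python
import math

def suite_specrate(per_benchmark_rates):
    """Geometric mean of the individual SPECrates in a suite --
    how SPECrate_int92 and SPECrate_fp92 are summarized."""
    logs = [math.log(rate) for rate in per_benchmark_rates]
    return math.exp(sum(logs) / len(logs))
```

For example, two benchmarks with SPECrates of 100 and 400 would summarize
to 200, since the geometric mean weights ratios rather than differences.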
Mathematics
The tests measure an elapsed time to complete some number of
independent copies of a benchmark, equivalent to a completion rate
over an arbitrary time interval.
These measured completion rates are then normalized by means of a
reference factor to calculate a completion rate for a normalized
job definition. The time interval is then scaled to a unit (one week)
that yields results of a convenient magnitude.
Finally, a mean can be taken across all the measurements in a suite.
The geometric mean is used as the individual values have already
been weighted by their normalizing reference factors.
History
Back in 1990, SPEC announced its Release 1.2b. The main feature of
Release 1.2b was the definition and support for the "SPECthruput89"
metric with its "SPEC Thruput Method A: Homogeneous Load" method.
SPECthruput89 attempted to measure the per-processor speed available
in a system, and then allowed for the calculation of an aggregate
across all the processors.
This proved to be confusing to many. Unfortunately it was easy
to make invalid comparisons between SPECmark89s and SPECthruput89s
or even mistake values between these metrics. It is not fair to
compare the speed of a uni-processor machine against the throughput
of a multi-processor. However, many believed that it would be
acceptable to compare SPECthruput89s against SPECmark89s, because
the SPECthruput89 looked like a SPECmark89 both in terms of the
results and the means to calculate those results.
During the definition of CINT92 and CFP92, SPEC spent a great
deal of effort to improve the measures for multiprocessor systems.
During these discussions we realized that by focusing on a system's
total completion rate rather than trying to aggregate its per-
processor speeds, we could get a simpler metric that would be
valid and fair across systems of any number of processors.
The SPECrates defined for CINT92 and CFP92 are real throughput,
or capacity, measures; and as such are meaningful metrics across
all degrees of multi-processing. Since it measures a system's
total CPU capacity, a SPECrate for a uniprocessor should be
directly and easily comparable to a SPECrate for another uni-
or a 2- or an 8-processor system. What is being compared
between the systems is a rating of the available CPU horsepower
in the entire system -- not just how fast any one crank inside
is turning, but how much power is made available to you to do
your work.
Copyright (c) 1995 Standard Performance Evaluation Corporation