Author Topic: Calibration or load level scaling problem (Read 10209 times)

Norbert Schmitt · « **on:** September 19, 2016, 04:59:49 AM »

Hello,

I am currently writing my master's thesis and have developed a Chauffeur worklet. The worklet runs native C++ code through JNI.

During measurements I came across what I suspect is a bug in Chauffeur: the load level scaling did not work properly. At the 100% load level, the run reached the calibrated transaction throughput, and the actual load for every other load level is reported as matching its target. Yet the transaction counts do not fit. For load levels below 100%, it seems that a different calibration value, lower than the reported one, is used.

I am going to rerun the worklet to see whether the problem can be reproduced, but I would appreciate some advice on where to start looking for the root cause. The native code ran fine in the measurements before and after this one, and each transaction blocks until it has been executed. Could the problem be related to the large number of transactions?

Code:

   Phase    | Interval | Actual |      Score      |  Host  | Client |   Elapsed   |   Transaction   | Transaction  | Transaction 
            |          |  Load  |                 |   CV   |   CV   | Measurement |                 |    Count     |   Time (s)  
            |          |        |                 |        |        |  Time (s)   |                 |              |             
----------- | -------- | ------ | --------------- | ------ | ------ | ----------- | --------------- | ------------ | ------------
  warmup    |   max    |        |  1,711,211.135  |  0.0%  |  3.1%  |   120.052   | PetTransaction  | 205,432,535  |   852.048   
calibration |   max    |        |  1,544,903.586  |  0.0%  |  3.2%  |   120.052   | PetTransaction  | 185,467,074  |   857.551   
            |   max    |        |  1,706,569.464  |  0.0%  |  2.0%  |   120.044   | PetTransaction  | 204,862,882  |   855.405   
            |   max    |        |  1,693,478.034  |  0.0%  |  2.0%  |   120.048   | PetTransaction  | 203,298,379  |   855.923   
            | Calibrat |        |  1,700,021.825  |        |        |             |                 |              |             
            |   ion    |        |                 |        |        |             |                 |              |             
            |  Result  |        |                 |        |        |             |                 |              |             
measurement |   100%   | 100.0% |  1,695,528.229  |  0.0%  |  2.1%  |   120.040   | PetTransaction  | 203,530,810  |   856.197   
            |   90%    | 90.0%  |  1,006,323.644  |  0.0%  |  0.0%  |   120.064   | PetTransaction  | 120,812,465  |   718.304   
            |   80%    | 80.0%  |   894,418.676   |  0.0%  |  0.0%  |   120.044   | PetTransaction  | 107,369,451  |   614.566   
            |   70%    | 70.0%  |   782,627.988   |  0.0%  |  0.0%  |   120.043   | PetTransaction  |  93,949,039  |   584.892   
            |   60%    | 60.0%  |   670,896.028   |  0.0%  |  0.0%  |   120.044   | PetTransaction  |  80,536,603  |   579.594   
            |   50%    | 50.0%  |   559,069.424   |  0.0%  |  0.0%  |   120.044   | PetTransaction  |  67,112,856  |   574.692   
            |   40%    | 40.0%  |   447,211.301   |  0.0%  |  0.0%  |   120.044   | PetTransaction  |  53,684,693  |   582.760   
            |   30%    | 30.0%  |   335,443.047   |  0.0%  |  0.0%  |   120.040   | PetTransaction  |  40,266,545  |   464.538   
            |   20%    | 20.0%  |   223,659.301   |  0.0%  |  0.0%  |   120.044   | PetTransaction  |  26,848,700  |   287.977   
            |   10%    | 10.0%  |   111,755.848   |  0.0%  |  0.1%  |   120.044   | PetTransaction  |  13,415,605  |   152.997
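The mismatch in the report above can be quantified with a short script. This is only a sketch: the numbers are copied from the report, and the function names (`implied_calibration`, `summarize`) are made up for illustration. It compares each reported score with `load level x Calibration Result` and derives the calibration value each interval's score would imply.

```python
# Values copied from the report above.
CALIBRATION_RESULT = 1_700_021.825  # score in the "Calibration Result" row

# Load level -> reported score of that measurement interval.
REPORTED_SCORES = {
    1.0: 1_695_528.229,
    0.9: 1_006_323.644,
    0.8: 894_418.676,
    0.7: 782_627.988,
    0.6: 670_896.028,
    0.5: 559_069.424,
    0.4: 447_211.301,
    0.3: 335_443.047,
    0.2: 223_659.301,
    0.1: 111_755.848,
}


def implied_calibration(level, score):
    """Calibration value that would yield `score` at the given load level."""
    return score / level


def summarize(calibration, scores):
    """Return (level, expected, reported, reported/expected, implied) rows."""
    rows = []
    for level, score in scores.items():
        expected = level * calibration
        rows.append(
            (level, expected, score, score / expected,
             implied_calibration(level, score))
        )
    return rows


for level, expected, score, ratio, implied in summarize(
        CALIBRATION_RESULT, REPORTED_SCORES):
    print(f"{level:>4.0%}  expected {expected:>13,.1f}  "
          f"reported {score:>13,.1f}  ratio {ratio:.3f}  "
          f"implied calibration {implied:>13,.1f}")
```

With these inputs, every interval below 100% implies a calibration value near 1.118 million, roughly 66% of the reported 1.70 million, while the 100% interval matches the reported calibration result.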

The config.xml:

Code:

<?xml version="1.0" encoding="UTF-8"?>
<chauffeur xmlns="http://spec.org/power_chauffeur" xmlns:xi="http://www.w3.org/2001/XInclude">
	<xi:include href="test-environment.xml"/>

	<definitions>
		<!-- Interval length for calibration and measurement intervals. -->
		<interval-length id="testIntervalLength">
			<premeasurement>30s</premeasurement>
			<measurement>120s</measurement>
			<postmeasurement>10s</postmeasurement>
		</interval-length>

		<!-- Interval length for warmup intervals. Experience shows that
		web server loads (such as BUNGEE or HTTP_DS2) profit from long
		warmup periods. These increase stability for the measurements. -->
		<interval-length id="warmupLength">
			<premeasurement>30s</premeasurement>
			<measurement>120s</measurement>
			<postmeasurement>10s</postmeasurement>
		</interval-length>
	</definitions>

	<warmup-phase id="petwarmup">
		<sequence>
			<interval-series className="NoDelaySeries">
				<scenario-mix-factory>javaScenarioMix</scenario-mix-factory>
				<interval-count>1</interval-count>
			</interval-series>
			<interval-length ref="warmupLength"/>
		</sequence>
	</warmup-phase>

	<calibration-phase id="petcalibration">
		<sequence>
			<interval-series className="NoDelaySeries">
				<scenario-mix-factory>javaScenarioMix</scenario-mix-factory>
				<interval-count>3</interval-count>
			</interval-series>
			<interval-length ref="testIntervalLength"/>
		</sequence>
		<calibrator className="AverageThroughputCalibrator">
			<average-intervals>2</average-intervals>
		</calibrator>
	</calibration-phase>

	<measurement-phase id="petmeasurement">
		<sequence>
			<interval-series className="GraduatedMeasurementSeries">
				<scenario-mix-factory>javaScenarioMix</scenario-mix-factory>
				<interval-count>10</interval-count>
			</interval-series>
			<interval-length ref="testIntervalLength"/>
		</sequence>
	</measurement-phase>

	<suite>
		<client-configuration>
			<clients key="PetConfig">
				<count>logicalCores</count>
				<option-set/>
			</clients>
		</client-configuration>

		<description className="tools.descartes.power.chauffeur.pet.PetSuite" classpath="lib/pet.jar"/>

		<xi:include href="listeners.xml"/> 

		<workload enabled="true">
			<name>Pet</name>

			<worklet enabled="true">
				<name>Pet_Native</name>

				<launch-definition id="launchDef">
					<configuration-key>PetConfig</configuration-key>
				</launch-definition>

				<workletDefinition>
					<location>tools/descartes/power/chauffeur/pet/pet.xml</location>
					<classpath>
						<entry>lib/pet.jar</entry>
					</classpath>
				</workletDefinition>

				<!-- Sets the number of users per Client to 1. This setting really
				doesn't matter, as BUNGEE users don't do anything of note. -->
				<max-per-client-users>1</max-per-client-users>

				<!-- Sets the number of transaction threads to 1. Set this if you
				want BUNGEE to only use one driver thread that sends requests to
				the internal webserver. Increase this number for more drivers.
				Removing (or commenting) this line should set the amount of driver
				threads to the amount of logical cores on the SUT. -->
				<num-transaction-threads>1</num-transaction-threads>

				<warmup-phase ref="petwarmup"/>
				<calibration-phase ref="petcalibration"/>
				<measurement-phase ref="petmeasurement"/>
			</worklet>
		</workload>
	</suite>
</chauffeur>

JeremyArnold · « **Reply #1 on:** September 19, 2016, 11:09:07 AM »

I agree this looks odd. I'm not sure what would be causing this, but here are a couple of things we can look at.

In results.xml, you should find:
<phase name="Measurement" type="measurement">

Just below this will be the results for each interval, including the <score> (should match the value from the report), <scenariosPerSecond>, <transactionsPerSecond>, and <targetScenariosPerSecond>. I'm most interested in the <targetScenariosPerSecond> -- we would expect the target for the 90% load level to be 90% of the target for the 100% interval. If it's not, something strange is going on and we can investigate further in that direction.

For the 100% load level, the <scenariosPerSecond> and <targetScenariosPerSecond> values should be very close. If they aren't, that will give us a different direction to investigate, and I suspect that is what we will find. Chauffeur does treat the 100% load level in a special way, much like the NoDelaySeries used during warmup and calibration. So my theory is that in the 100% load level we are somehow significantly exceeding the <targetScenariosPerSecond>, while in the other load levels Chauffeur doesn't allow that. But that would still leave two questions: why the calibrated value doesn't reflect this higher rate, and why the "Actual Load" is reported as 100.0% when it would really be something greater than 100%.

If you are willing to post your results.xml, I can certainly take a look. Otherwise, you can start by looking at the things I mentioned above and then we'll go on from there.
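For anyone who prefers to script the check described above, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It rests on several assumptions: only the element and attribute names quoted in this thread come from Chauffeur's output, while the `<results>` root, the exact nesting, and the helper names (`measurement_intervals`, `target_ratio`) are hypothetical. The sample values are the ones posted in Reply #2, and treating a non-positive target as "no target set" is a choice made for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for results.xml. The <results> root and the nesting
# are assumptions; the numbers are copied from the excerpt in Reply #2.
SAMPLE = """<results>
  <phase name="Measurement" type="measurement">
    <interval name="100%"><result>
      <score>1695528.228719</score>
      <scenariosPerSecond>1118110.526758</scenariosPerSecond>
      <transactionsPerSecond>1695528.188202</transactionsPerSecond>
      <targetScenariosPerSecond>-1</targetScenariosPerSecond>
    </result></interval>
    <interval name="90%"><result>
      <score>1006323.643981</score>
      <scenariosPerSecond>1006237.680313</scenariosPerSecond>
      <transactionsPerSecond>1006237.536771</transactionsPerSecond>
      <targetScenariosPerSecond>1006289.308176</targetScenariosPerSecond>
    </result></interval>
    <interval name="80%"><result>
      <score>894418.675741</score>
      <scenariosPerSecond>894418.655098</scenariosPerSecond>
      <transactionsPerSecond>894418.597319</transactionsPerSecond>
      <targetScenariosPerSecond>894454.382826</targetScenariosPerSecond>
    </result></interval>
  </phase>
</results>"""

FIELDS = ("score", "scenariosPerSecond", "transactionsPerSecond",
          "targetScenariosPerSecond")


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def measurement_intervals(root):
    """Map interval name -> {field: float} for every measurement phase."""
    found = {}
    for phase in root.iter():
        if local_name(phase.tag) != "phase" or phase.get("type") != "measurement":
            continue
        for interval in phase.iter():
            if local_name(interval.tag) != "interval":
                continue
            values = {}
            for node in interval.iter():
                name = local_name(node.tag)
                if name in FIELDS and node.text:
                    values.setdefault(name, float(node.text))
            found[interval.get("name")] = values
    return found


def target_ratio(intervals, lower, upper):
    """Ratio of two targets, or None if either one is non-positive."""
    a = intervals[lower]["targetScenariosPerSecond"]
    b = intervals[upper]["targetScenariosPerSecond"]
    if a <= 0 or b <= 0:
        return None  # assumption: a non-positive value means "no target set"
    return a / b


intervals = measurement_intervals(ET.fromstring(SAMPLE))
for name, values in intervals.items():
    print(name, values)
print("80% / 90% target ratio:", target_ratio(intervals, "80%", "90%"))
print("90% / 100% target ratio:", target_ratio(intervals, "90%", "100%"))
```

To run it against a real file, replace `ET.fromstring(SAMPLE)` with `ET.parse("results.xml").getroot()`. The traversal may need adjusting if the actual nesting differs from what is assumed here.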

Norbert Schmitt · « **Reply #2 on:** September 19, 2016, 11:56:09 AM »

I would have provided the results.xml, but the forum does not let me attach larger files.

While looking through the results I found something odd. The <targetScenariosPerSecond> for the 100% interval is -1. For all other load levels, the <scenariosPerSecond> and <targetScenariosPerSecond> are very close. I also looked at the calibration and warmup intervals, and all of their <targetScenariosPerSecond> values are -1 as well.

The 100%, 90% and 80% load levels:

Code:

<interval name="100%">
            <result>
              <score>1695528.228719</score>
              <elapsed>120039767794</elapsed>
              <scenariosPerSecond>1118110.526758</scenariosPerSecond>
              <transactionsPerSecond>1695528.188202</transactionsPerSecond>
              <targetScenariosPerSecond>-1</targetScenariosPerSecond>
...
<interval name="90%">
            <result>
              <score>1006323.643981</score>
              <elapsed>120063564104</elapsed>
              <scenariosPerSecond>1006237.680313</scenariosPerSecond>
              <transactionsPerSecond>1006237.536771</transactionsPerSecond>
              <targetScenariosPerSecond>1006289.308176</targetScenariosPerSecond>
...
<interval name="80%">
            <result>
              <score>894418.675741</score>
              <elapsed>120043848956</elapsed>
              <scenariosPerSecond>894418.655098</scenariosPerSecond>
              <transactionsPerSecond>894418.597319</transactionsPerSecond>
              <targetScenariosPerSecond>894454.382826</targetScenariosPerSecond>

JeremyArnold · « **Reply #3 on:** September 19, 2016, 12:42:10 PM »

Ahh, yes. The targetScenariosPerSecond==-1 is another side-effect of the special treatment of the 100% interval. Basically, when we see that we are trying to run 100%, we just treat it as running with no delays between transactions (like the NoDelaySeries). Earlier versions of Chauffeur did not do that, but we were encountering a fair number of cases where small delays during the 100% interval were causing runs to be invalid.

I think the biggest clue here is that in the 90% interval (and presumably the others), the <scenariosPerSecond> and <transactionsPerSecond> are very close, but in the 100% interval the <transactionsPerSecond> value is about 50% higher. I think you are running with a batch size of 1, so we expect the two values to be about the same. The target is based on the scenariosPerSecond value, but the score is based on the transaction rate. So I think this explains the numbers you are seeing.

Now we just need to figure out why the transactionsPerSecond value is so much higher than the scenariosPerSecond during the 100% load level. I'll have to think about what might be causing that. When the batch size is 1 and the transaction rate is sufficiently high (as it is here), the values should normally be very close.
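The arithmetic behind this explanation can be reproduced from the values in Reply #2. The following is an illustrative sketch only, with hypothetical helper names. It computes transactions per scenario for each interval and the full-load scenario rate implied by each target (`target / load level`), to be compared against the measured `<scenariosPerSecond>` of the 100% interval.

```python
# Values copied from the results.xml excerpt in Reply #2.
INTERVALS = {
    "100%": {"level": 1.0, "sps": 1118110.526758,
             "tps": 1695528.188202, "target": -1.0},
    "90%": {"level": 0.9, "sps": 1006237.680313,
            "tps": 1006237.536771, "target": 1006289.308176},
    "80%": {"level": 0.8, "sps": 894418.655098,
            "tps": 894418.597319, "target": 894454.382826},
}


def transactions_per_scenario(interval):
    """Ratio of transactionsPerSecond to scenariosPerSecond."""
    return interval["tps"] / interval["sps"]


def implied_full_load_rate(interval):
    """Scenario rate at 100% implied by the target, or None if no target."""
    if interval["target"] <= 0:
        return None  # -1 marks the no-delay treatment of the 100% interval
    return interval["target"] / interval["level"]


full_load_sps = INTERVALS["100%"]["sps"]
for name, interval in INTERVALS.items():
    ratio = transactions_per_scenario(interval)
    implied = implied_full_load_rate(interval)
    if implied is None:
        print(f"{name}: tps/sps = {ratio:.4f}, no target set")
    else:
        deviation = (implied - full_load_sps) / full_load_sps
        print(f"{name}: tps/sps = {ratio:.4f}, implied full-load rate = "
              f"{implied:,.1f} ({deviation:+.4%} vs. measured 100% sps)")
```

With these inputs, the 90% and 80% targets both imply a full-load rate within about 0.01% of the 100% interval's scenariosPerSecond (about 1.118 million), not of its transactionsPerSecond (about 1.696 million), and the tps/sps ratio is close to 1.0 everywhere except at 100%, where it is roughly 1.5.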

Maxinut · « **Reply #4 on:** February 08, 2018, 10:56:25 PM »

I have the same problem. What can I do to fix it?

JeremyArnold · « **Reply #5 on:** February 08, 2018, 11:49:25 PM »

I don't think we ever got to the bottom of this issue, and I haven't observed it with worklets we've developed inside of SPEC.

If you are willing to post the same portion of your results.xml that Norbert did (showing the 100%, 90%, and 80% interval results), I'll confirm that we're seeing similar behavior in your case. Then I'll work with you to try to identify what is causing the issue.

Please also let me know what version of the Chauffeur WDK you are using.
