Author Topic: runspecvirt fail  (Read 26827 times)

zgy

  • Newbie
  • *
  • Posts: 42
  • Karma: +0/-0
Re: runspecvirt fail
« Reply #45 on: September 21, 2017, 01:09:38 PM »
Of course, every time I run SPEC, I reboot all VMs.

My host is CentOS 7, and all VMs (client and workload) are CentOS 6. I use the KVM hypervisor.

I found that my physical host's date is not in sync with the VMs. Does that matter?

[root@client1 SPECvirt]# ./synccheck.sh
infraserver1:
Fri Sep 22 01:07:22 CST 2017
websserver1:
Fri Sep 22 01:07:23 CST 2017
appserver1:
Fri Sep 22 01:07:23 CST 2017
dbserver1:
Fri Sep 22 01:07:23 CST 2017
batchserver1:
Fri Sep 22 01:07:23 CST 2017
mailserver1:
Fri Sep 22 01:07:23 CST 2017
wclient1:
Fri Sep 22 01:07:24 CST 2017
client1:
Fri Sep 22 01:07:24 CST 2017
[root@client1 SPECvirt]# exit
logout
Connection to 172.21.128.242 closed.
[root@localhost ~]# date
Thu Sep 21 13:07:57 EDT 2017
« Last Edit: September 21, 2017, 01:15:13 PM by zgy »

lroderic

  • Moderator
  • Full Member
  • *****
  • Posts: 167
  • Karma: +6/-0
Re: runspecvirt fail
« Reply #46 on: September 21, 2017, 01:24:25 PM »
It's easiest to have everything in sync - hosts, client VMs, workload VMs. In timesynctiles.sh, use the host's hostname as TIMEHOST, then all VMs will sync to the host.

Lisa
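
Time zone labels such as CST and EDT make it hard to judge by eye whether two machines agree, so one option is to compare epoch seconds (`date +%s`) instead. The following is a minimal sketch under stated assumptions, not part of the SPECvirt kit: `max_skew` is a hypothetical helper, and the host names in the usage comment are simply the ones shown in the synccheck.sh output in this thread.

```shell
# max_skew: hypothetical helper, not part of the SPECvirt kit.
# Takes one or more epoch-second timestamps and prints the difference
# between the largest and the smallest of them.
max_skew() {
    min=$1
    max=$1
    for t in "$@"; do
        if [ "$t" -lt "$min" ]; then min=$t; fi
        if [ "$t" -gt "$max" ]; then max=$t; fi
    done
    echo $(( max - min ))
}

# Usage sketch (assumes passwordless ssh to each VM, as synccheck.sh appears to use):
#   max_skew $(for h in client1 wclient1 dbserver1; do ssh "$h" date +%s; done)
```

Because the ssh calls run one after another, a second or two of apparent skew can come from the collection itself rather than from the clocks.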

zgy

« Reply #47 on: September 21, 2017, 02:06:53 PM »
My client VMs (client1 and wclient1) and the workload VMs run on different physical hosts.

Can timesynctiles.sh still work in that case?

For example, client1 and wclient1 run on phyname1, and the workload VMs run on phyname2.

How should I run it? Thanks.

lroderic

« Reply #48 on: September 21, 2017, 02:11:27 PM »
It doesn't matter what the time is on the client host. The time sync script gets and sets the time on the client VMs and workload VMs, not the client host.

Please read the comments at the top of timesynctiles.sh.

You need a source for the time (TIMEHOST), so use the host where the VMs run. Let's say the host where the workload VMs run is 172.16.20.20. Put the following in your runspecvirt.sh script:

Code:
/opt/SPECvirt/timesynctiles.sh client 172.16.20.20

Lisa

zgy

« Reply #49 on: September 21, 2017, 02:34:19 PM »
I have synced the date across the hosts, the client VMs, and the workload VMs, and I will rerun it.

[root@localhost opt]# ./synccheck.sh
infraserver1:
Fri Sep 22 02:28:55 CST 2017
websserver1:
Fri Sep 22 02:28:55 CST 2017
appserver1:
Fri Sep 22 02:28:55 CST 2017
dbserver1:
Fri Sep 22 02:28:56 CST 2017
batchserver1:
Fri Sep 22 02:28:56 CST 2017
mailserver1:
Fri Sep 22 02:28:56 CST 2017
wclient1:
Fri Sep 22 02:28:56 CST 2017
client1:
Fri Sep 22 02:28:56 CST 2017
client-host:
Fri Sep 22 02:28:56 CST 2017
wrokload-host:
Fri Sep 22 02:28:56 CST 2017

zgy

« Reply #50 on: September 21, 2017, 03:51:33 PM »
Hi, it still fails.

2017-09-22 03:26:09:896 setting hostsReady = true
2017-09-22 03:26:18:125 specvirt: waiting on 1 prime clients.
2017-09-22 03:28:25:119 PrimeControl: Workload and prime controller builds: 80
2017-09-22 03:28:25:120 PrimeControl: awaiting runtime started signal from prime clients
2017-09-22 03:28:36:592 specvirt: clock sync check completed successfully
2017-09-22 03:28:36:593 specvirt: initiating workload ramp-up.
2017-09-22 03:28:36:593 Polling start time = Fri Sep 22 03:48:37 CST 2017
2017-09-22 03:28:36:593 Polling end time   = Fri Sep 22 05:48:37 CST 2017
2017-09-22 03:33:37:699 PrimeControl: all workloads started.
2017-09-22 03:48:39:593 [ERROR] wclient1:1096 (PRIME_HOST[0][1]) failed to enter run phase before start of polling interval!
2017-09-22 03:48:39:593 PrimeControl: dumping polling start response times...
2017-09-22 03:48:39:593 client1:1098 (PRIME_HOST[0][0]) msec after pollStart: 9
2017-09-22 03:48:39:593 [ERROR] wclient1:1096 (PRIME_HOST[0][1]) msec after pollStart: 9223372036854775807
2017-09-22 03:48:39:594 client1:1094 (PRIME_HOST[0][2]) msec after pollStart: 17
2017-09-22 03:48:39:594 client1:1092 (PRIME_HOST[0][3]) msec after pollStart: 5
2017-09-22 03:48:39:594 PrimeControl: [ERROR] one or more workloads failed to start runtime before start of polling interval. Aborting.
2017-09-22 03:48:39:594 PrimeControl: sending abortTest() to prime clients.
2017-09-22 03:48:39:594 PrimeControl: id=0, abortID=-1
2017-09-22 03:48:39:594 PrimeControl: id=1, abortID=-1
2017-09-22 03:48:39:594 PrimeControl: masters[0]=client1:1098
2017-09-22 03:48:39:594 PrimeControl: masters[1]=wclient1:1096
2017-09-22 03:48:39:594 PrimeControl: id=2, abortID=-1
2017-09-22 03:48:39:595 PrimeControl: masters[2]=client1:1094
2017-09-22 03:48:39:594 PrimeControl: id=3, abortID=-1
2017-09-22 03:48:39:595 PrimeControl: masters[3]=client1:1092


I have synced the clocks, and I rebooted all VMs before the run.

[root@localhost opt]# ./synccheck.sh
infraserver1:
Fri Sep 22 03:49:21 CST 2017
websserver1:
Fri Sep 22 03:49:21 CST 2017
appserver1:
Fri Sep 22 03:49:21 CST 2017
dbserver1:
Fri Sep 22 03:49:22 CST 2017
batchserver1:
Fri Sep 22 03:49:22 CST 2017
mailserver1:
Fri Sep 22 03:49:22 CST 2017
wclient1:
Fri Sep 22 03:49:22 CST 2017
client1:
Fri Sep 22 03:49:23 CST 2017
client-host:
Fri Sep 22 03:49:22 CST 2017
wrokload-host:
Fri Sep 22 03:49:22 CST 2017


Is there any other possible cause of this error?

Thanks.
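
To pick the failing prime client out of a long controller log, the lines reporting the huge msec after pollStart value can be filtered. This is only a sketch: `failed_prime_hosts` is a hypothetical helper, and it assumes the log lines have the same layout as the excerpt above (host:port immediately before the `(PRIME_HOST[..][..])` field).

```shell
# failed_prime_hosts: hypothetical helper, not part of the SPECvirt kit.
# Reads a controller log on stdin and prints the host:port of every prime
# client whose "msec after pollStart" value is 9223372036854775807.
failed_prime_hosts() {
    awk '/msec after pollStart: 9223372036854775807/ {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^\(PRIME_HOST/) print $(i - 1)
    }'
}

# Usage sketch (the log file name is an assumption; use whichever file holds the output above):
#   failed_prime_hosts < primectrl.out
```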

lroderic

« Reply #51 on: September 21, 2017, 03:54:20 PM »
Reboot the webserver VM and wclient1 VM and retry.

zgy

« Reply #52 on: September 21, 2017, 05:17:20 PM »
Same error:
2017-09-22 04:54:20:103 specvirt: clock sync check completed successfully
2017-09-22 04:54:20:103 specvirt: initiating workload ramp-up.
2017-09-22 04:54:20:104 Polling start time = Fri Sep 22 05:14:21 CST 2017
2017-09-22 04:54:20:104 Polling end time   = Fri Sep 22 07:14:21 CST 2017
2017-09-22 04:59:25:990 PrimeControl: all workloads started.
2017-09-22 05:14:23:104 [ERROR] wclient1:1096 (PRIME_HOST[0][1]) failed to enter run phase before start of polling interval!
2017-09-22 05:14:23:104 PrimeControl: dumping polling start response times...
2017-09-22 05:14:23:104 client1:1098 (PRIME_HOST[0][0]) msec after pollStart: 9
2017-09-22 05:14:23:104 [ERROR] wclient1:1096 (PRIME_HOST[0][1]) msec after pollStart: 9223372036854775807
2017-09-22 05:14:23:104 client1:1094 (PRIME_HOST[0][2]) msec after pollStart: 18
2017-09-22 05:14:23:104 client1:1092 (PRIME_HOST[0][3]) msec after pollStart: 7
2017-09-22 05:14:23:104 PrimeControl: [ERROR] one or more workloads failed to start runtime before start of polling interval. Aborting.
2017-09-22 05:14:23:104 PrimeControl: sending abortTest() to prime clients.
2017-09-22 05:14:23:105 PrimeControl: id=0, abortID=-1
2017-09-22 05:14:23:105 PrimeControl: id=1, abortID=-1
2017-09-22 05:14:23:105 PrimeControl: masters[0]=client1:1098

"/opt/SPECvirt/timesynctiles.sh client 172.24.11.91" has been added to runspecvirt.sh,
and 172.24.11.91 is the IP of the host that the workload VMs run on.


wclient1 has also been added to timesynctiles.sh, like this:
# Set time on client and VMs for each tile
for i in `seq 1 $tiles`;
do
    echo $CLIENT$i:
    ssh $CLIENT$i date `ssh $PRIME date +%m%d%H%M.%S`
    echo wclient1:
    ssh wclient1 date `ssh $PRIME date +%m%d%H%M.%S`
    echo dbserver$i
    ssh dbserver$i date `ssh $PRIME date +%m%d%H%M.%S`
    echo appserver$i
    ssh appserver$i date `ssh $PRIME date +%m%d%H%M.%S`
    echo batchserver$i
    ssh batchserver$i date `ssh $PRIME date +%m%d%H%M.%S`
    echo mailserver$i
    ssh mailserver$i date `ssh $PRIME date +%m%d%H%M.%S`
    echo infraserver$i
    ssh infraserver$i date `ssh $PRIME date +%m%d%H%M.%S`
    echo webserver$i
    ssh webserver$i date `ssh $PRIME date +%m%d%H%M.%S`
done
« Last Edit: September 21, 2017, 05:20:10 PM by zgy »

zgy

« Reply #53 on: September 21, 2017, 05:26:26 PM »
I wonder why the msec after pollStart number is the same every time. Is it an overflow?
104 [ERROR] wclient1:1096 (PRIME_HOST[0][1]) msec after pollStart: 9223372036854775807
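
For what it's worth, the logged number is exactly 2^63 - 1, the largest value a signed 64-bit integer can hold (Long.MAX_VALUE in Java). That makes it look like a saturated or placeholder value rather than a measured delay, though what the harness means by it is not established by this check. A quick way to confirm the arithmetic:

```shell
logged=9223372036854775807                   # value copied from the log line
max_int64=$(printf '%d' 0x7fffffffffffffff)  # 2^63 - 1, written in hex
if [ "$logged" = "$max_int64" ]; then
    echo "logged value equals the signed 64-bit maximum"
fi
```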

lroderic

« Reply #54 on: September 21, 2017, 05:44:10 PM »
Back in https://www.spec.org/forums/index.php?topic=89.msg540#msg540, for some reason I don't understand, you decided to set RAMP_SECONDS and WARMUP_SECONDS for each individual tile and workload:

Code:
RAMP_SECONDS[0][0] = 1200
RAMP_SECONDS[0][1] = 1200
..
WARMUP_SECONDS[0][0] = 900
WARMUP_SECONDS[0][1] = 900
..

Are you trying to use tile or workload indexing with POLL_INTERVAL_SEC like this?

Code:
POLL_INTERVAL_SEC[0][0] = 7200
POLL_INTERVAL_SEC[0][1] = 600

If so, get rid of it. Entirely unnecessary and probably the cause of your timing problems. Set:

Code:
RAMP_SECONDS = 300
WARMUP_SECONDS = 600
POLL_INTERVAL_SEC = 1200

If this doesn't work, it's probably best to start again from scratch with the Example VM guide, applying what you've learned here.

Lisa
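
To see what these three values imply for the run timeline, the arithmetic can be sketched as below. It assumes that polling starts RAMP_SECONDS + WARMUP_SECONDS after ramp-up begins and lasts POLL_INTERVAL_SEC; that is an assumption for illustration (the harness documentation is the authority), and `poll_window` is a hypothetical helper.

```shell
# poll_window: hypothetical helper, for illustration only.
# Args: RAMP_SECONDS WARMUP_SECONDS POLL_INTERVAL_SEC
# Prints the offsets, in seconds after ramp-up begins, at which polling
# would start and end under the assumption stated above.
poll_window() {
    start=$(( $1 + $2 ))
    end=$(( start + $3 ))
    echo "$start $end"
}

poll_window 300 600 1200   # prints "900 2100"
```

With the suggested settings, every workload would need to reach its run phase within 900 seconds of the start of ramp-up, and the whole measurement would finish 35 minutes after ramp-up begins.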

zgy

« Reply #55 on: September 22, 2017, 08:21:14 AM »
Still the same error.

If I set WORKLOAD_LOAD_LEVEL[1] = 500, it runs OK.

lroderic

« Reply #56 on: September 22, 2017, 12:35:49 PM »
This huge msec after pollStart means that it is actually returning a time that is before the pollStart. The granularity is too low. Which clock timer do you use? tsc has been used for most submissions. Please review the VM XML definition files in a compliant KVM submission. Also set DEBUG = 10 in Control.config to return the actual timer values in addition to just the msec after pollStart values.

Lisa
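
On a Linux guest, the clock source currently in use can be read from sysfs, which is one way to answer the "which clock timer" question from inside each VM. The sketch below is not part of the SPECvirt kit; `clocksource_report` is a hypothetical helper, and the sysfs directory is the standard Linux location.

```shell
# clocksource_report: hypothetical helper, not part of the SPECvirt kit.
# Prints the clock source a Linux system is using and the ones it offers.
# The optional argument overrides the sysfs directory (handy for testing).
clocksource_report() {
    cs_dir=${1:-/sys/devices/system/clocksource/clocksource0}
    if [ -r "$cs_dir/current_clocksource" ]; then
        echo "current: $(cat "$cs_dir/current_clocksource")"
        echo "available: $(cat "$cs_dir/available_clocksource")"
    else
        echo "no clocksource information under $cs_dir" >&2
        return 1
    fi
}

# Usage sketch (assumes passwordless ssh to the VM):
#   ssh wclient1 cat /sys/devices/system/clocksource/clocksource0/current_clocksource
```

In libvirt, guest timers are configured in the `<clock>` element of the VM XML definition, which is why comparing against the XML files of a compliant KVM submission is a reasonable next step.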

zgy

« Reply #57 on: September 29, 2017, 05:43:56 AM »
Hi, I rebuilt SPEC, and it works now.

Is there any config setting to turn off the log output written to prime-client1_109*.log, like the following? Thanks.

prime-client1_1096.log:
2017-09-29 13:18:34:344,1194608,1191017,0,3591,527103813688,6439174388,1142178,1235,47604,263,392415,0
2017-09-29 13:18:44:346,1196296,1192705,0,3591,528036629793,6450488767,1143780,1236,47689,263,392415,0
2017-09-29 13:18:54:342,1198075,1194484,0,3591,528837322555,6460290109,1145475,1236,47773,263,392415,0
2017-09-29 13:19:04:343,1199785,1196194,0,3591,529603659846,6469362282,1147118,1237,47839,263,392415,0
2017-09-29 13:19:14:345,1201381,1197790,0,3591,530106108549,6475499332,1148659,1240,47891,263,392415,0
2017-09-29 13:19:24:342,1203066,1199475,0,3591,530756785877,6484256698,1150232,1246,47997,263,392415,0
2017-09-29 13:19:34:342,1204799,1201208,0,3591,531657357060,6495094444,1151876,1247,48085,263,392415,0
2017-09-29 13:19:44:343,1206484,1202893,0,3591,532335631319,6502995513,1153500,1249,48144,263,392415,0

prime-client1_1092.log:
2017-09-29 11:20:04:901,1,1,2,2,2,2,1,0
2017-09-29 11:20:14:901,1,2,2,2,2,4,2,0
2017-09-29 11:20:24:902,1,3,3,2,3,7,3,0
2017-09-29 11:20:34:901,1,4,0,0,3,7,4,0
2017-09-29 11:20:44:901,1,5,3,0,3,10,5,0
2017-09-29 11:20:54:901,1,6,2,0,3,12,6,0
2017-09-29 11:21:04:901,1,7,13,0,13,25,7,0
2017-09-29 11:21:14:900,1,8,1,0,13,26,8,0

prime-client1_1094.log:
2017-09-29 11:36:54:902,125574,125574,0,0,12957367,0,3928
2017-09-29 11:37:04:901,126746,126746,0,0,13060825,0,3928
2017-09-29 11:37:14:901,128061,128061,0,0,13148085,0,3928
2017-09-29 11:37:24:902,129573,129573,0,0,13257560,0,3928
2017-09-29 11:37:34:902,130983,130983,0,0,13383548,0,3928
2017-09-29 11:37:44:902,132530,132530,0,0,13484085,0,3928
2017-09-29 11:37:54:902,134015,134015,0,0,13611187,0,3928
2017-09-29 11:38:04:902,135443,135443,0,0,13890547,0,3928
2017-09-29 11:38:14:902,136845,136845,0,0,13972570,0,3928


prime-client1_1098.log:
2017-09-29 13:11:34:911,170275,0.2,170144,0.4,339858,0.2,453466,1.25
2017-09-29 13:11:44:912,170571,0.2,170431,0.4,340420,0.2,454204,1.25
2017-09-29 13:11:54:915,170849,0.2,170708,0.4,341028,0.2,454968,1.25
2017-09-29 13:12:04:912,171198,0.2,171051,0.4,341801,0.2,456273,1.25
2017-09-29 13:12:14:910,171661,0.2,171528,0.4,342794,0.2,457823,1.25
2017-09-29 13:12:24:912,172211,0.2,172033,0.4,343896,0.2,459

lroderic

« Reply #58 on: September 29, 2017, 09:38:48 AM »
Congratulations on some excellent response times.

> Is there any config to close the log info that locates in prime-client1_109*.log
> like follows?  thanks.
>
> 2017-09-29 13:11:34:911,170275,0.2,170144,0.4,339858,0.2,453466,1.25

Not sure what you're looking for here. Those entries in primectrl.out are the workload info and response times for each of the workload transactions through the test.

This shows that the app server response time is 1.25, which means you most likely still have capacity on the host. You can set DEBUG_LEVEL = 0, but that's as low as logging goes. Does that answer your question?

Lisa

Miles

  • Jr. Member
  • **
  • Posts: 72
  • Karma: +0/-0
Re: runspecvirt fail
« Reply #59 on: October 03, 2017, 04:25:19 AM »
Hi,

> HI,I rebuild the SPEC, and it can works now .

I have encountered the same failure and am still stuck on it.
Would you please share how you rebuilt SPEC?
Did you set up SPECvirt again on the clients, reinstall the VMs and scripts, or ... ?

Thanks.