SPEC Community

Product Support => SPECvirt_sc2013 => Topic started by: zgy on September 07, 2017, 05:06:37 AM

Title: runspecvirt fail
Post by: zgy on September 07, 2017, 05:06:37 AM
HI all:
as i run ./runspecvirt.sh , it fail, shows like this:

[root@client1 SPECvirt]# tail -50 primectrl.out
2017-09-07 16:38:05:383 PrimeControl: Master 1: client1:1098
2017-09-07 16:38:05:383 PrimeControl: Master 2: client1:1096
2017-09-07 16:38:05:383 PrimeControl: Master 3: client1:1094
2017-09-07 16:38:05:383 PrimeControl: Master 4: client1:1092
2017-09-07 16:38:05:385 PrimeControl: adding host client1:1098
2017-09-07 16:38:05:394 PrimeControl: adding host client1:1096
2017-09-07 16:38:05:399 PrimeControl: adding host client1:1094
2017-09-07 16:38:05:405 PrimeControl: adding host client1:1092
2017-09-07 16:38:05:420 First client for 0: 172.21.128.219:1091
2017-09-07 16:38:05:434 First client for 1: 172.21.128.219:1010
2017-09-07 16:38:05:444 First client for 2: 172.21.128.219:1200
2017-09-07 16:38:05:450 First client for 3: 172.21.128.219:1902
2017-09-07 16:38:05:454 PrimeControl: starting clients...
2017-09-07 16:38:05:454 PrimeControl: clients.length = 4
2017-09-07 16:38:05:454 PrimeControl: clients[0].length = 1
2017-09-07 16:38:05:454 PrimeControl: starting clients[0][0]: 172.21.128.219:1091
2017-09-07 16:38:05:475 PrimeControl: started client: 172.21.128.219:1091
2017-09-07 16:38:05:475 PrimeControl: clients[1].length = 1
2017-09-07 16:38:05:476 PrimeControl: starting clients[1][0]: 172.21.128.219:1010
2017-09-07 16:38:05:482 PrimeControl: started client: 172.21.128.219:1010
2017-09-07 16:38:05:482 PrimeControl: clients[2].length = 1
2017-09-07 16:38:05:482 PrimeControl: starting clients[2][0]: 172.21.128.219:1200
2017-09-07 16:38:05:491 PrimeControl: started client: 172.21.128.219:1200
2017-09-07 16:38:05:491 PrimeControl: clients[3].length = 1
2017-09-07 16:38:05:491 PrimeControl: starting clients[3][0]: 172.21.128.219:1902
2017-09-07 16:38:05:510 PrimeControl: started client: 172.21.128.219:1902
2017-09-07 16:38:05:517 PrimeControl: PTDs not used for this benchmark run!
2017-09-07 16:38:05:517 PrimeControl: starting 4 masters.
2017-09-07 16:38:39:648 PrimeControl: master[0][0] sleeping 20 sec.
2017-09-07 16:38:39:648 PrimeControl: master[0][1] sleeping 20 sec.
2017-09-07 16:38:39:648 PrimeControl: master[0][2] sleeping 20 sec.
2017-09-07 16:38:39:648 PrimeControl: master[0][3] sleeping 20 sec.
2017-09-07 16:38:59:687 PrimeControl: waiting on 4 prime client(s).
2017-09-07 16:39:00:146 Sending config to client1:1092
2017-09-07 16:39:00:151 Sending config to client1:1098
2017-09-07 16:39:00:158 Sending config to client1:1094
2017-09-07 16:39:00:200 Sending config to client1:1096
2017-09-07 16:39:00:344 specvirt: waiting on 3 prime clients.
2017-09-07 16:39:00:618 specvirt: waiting on 2 prime clients.
.
2017-09-07 16:39:00:721 setting hostsReady = true
2017-09-07 16:39:09:684 specvirt: waiting on 1 prime clients.
2017-09-07 16:41:15:791 [ERROR] Received abort signal from client1:1098. Terminating.
2017-09-07 16:41:15:791 PrimeControl: sending abortTest() to prime clients.
2017-09-07 16:41:15:792 PrimeControl: id=1, abortID=0
2017-09-07 16:41:15:792 PrimeControl: masters[1]=client1:1096
2017-09-07 16:41:15:792 PrimeControl: id=3, abortID=0
2017-09-07 16:41:15:792 PrimeControl: id=2, abortID=0
2017-09-07 16:41:15:792 PrimeControl: masters[2]=client1:1094
2017-09-07 16:41:15:792 PrimeControl: masters[3]=client1:1092

so i touch client-1098 log, and it shows:
prime-client1_1098:
[root@client1 SPECvirt]# tail -50 logs/20170907-163805/prime-client1_1098.log
2017-09-07 16:38:59:890 Looking up SPECvirt controller: client1
2017-09-07 16:39:00:150 masterID: 0, tile: 0, workload: 0
2017-09-07 16:39:00:151 hostname: client1
Hostname of prime client: client1
2017-09-07 16:39:00:219 Thu Sep 07 16:38:05 CST 2017
2017-09-07 16:39:00:272 RMI server started: client1:9900
2017-09-07 16:39:00:273 Total clients: 1
2017-09-07 16:39:00:273 Adding host client1:1091
2017-09-07 16:39:00:291 Setting up clients...
2017-09-07 16:39:00:717 calling getHostVM() on jappclient...
2017-09-07 16:41:15:786 Remote exception in setup() from client1:1091
java.rmi.ServerException: RemoteException occurred in server thread; nested exception is:
        java.rmi.RemoteException: Connection refused (Connection refused); nested exception is:
        java.lang.Exception: Connection refused (Connection refused)
2017-09-07 16:41:15:787 Aborting...
2017-09-07 16:41:15:791 abortTest() signal sent...
2017-09-07 16:41:15:791 clientsRunning = true
2017-09-07 16:41:15:792 calling client.exit()...

PLS, the pollme is running in ALL vms, iptables is stop, and SSLv3 is removed  from the jdk.tls.disabledAlgorithms property in the java.security file( /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.141-3.b16.el6_9.x86_64/jre/lib/security/java.security),


Run pollmecheck.sh from client shows:
[root@client1 SPECvirt]# ./pollmecheck.sh
Tile 1 VMs:
dbserver1:
root@dbserver1's password:
root      3643     1  0 16:31 pts/0    00:00:01 java -jar /opt/SPECpoll/pollme.jar -n dbserver -p 8001
appserver1:
root@appserver1's password:
root      4025     1  0 16:30 pts/0    00:00:01 java -jar /opt/SPECpoll/pollme.jar -n appserver -p 8001
batchserver1:
root@batchserver1's password:
root      2362     1  0 16:27 ?        00:00:01 java -jar /opt/SPECpoll/pollme.jar -n batchserver -p 8001
mailserver1:
root@mailserver1's password:
root      2351     1  0 16:26 ?        00:00:02 java -jar /opt/SPECpoll/pollme.jar -n mailserver -p 8001
infraserver1:
root@infraserver1's password:
root      3375     1  0 16:32 pts/0    00:00:01 java -jar /opt/SPECpoll/pollme.jar -n infraserver -p 8001
webserver1:
root@webserver1's password:
root      3512     1  0 16:29 pts/0    00:00:02 java -jar /opt/SPECpoll/pollme.jar -n webserver -p 8001
 

Polleme.out:
[root@infraserver ~]# cat /tmp/pollme.out
Creating RMI listener using RMI Registry port 8001
infraserver/172.21.128.220:8001 ready...
infraserver: 1504773548135

[root@webserver ~]# cat /tmp/pollme.out
Creating RMI listener using RMI Registry port 8001
webserver/172.21.128.221:8001 ready...
2017-09-07 16:39:07:755 contacting host 'infraserver'...
webserver: 1504773547774
infraserver: 1504773548135

[root@mailserver ~]# cat /tmp/pollme.out
Creating RMI listener using RMI Registry port 8001
mailserver/172.21.128.222:8001 ready...
mailserver: 1504773534638
[root@mailserver ~]#

[root@appserver ~]# cat /tmp/pollme.out
Creating RMI listener using RMI Registry port 8001
appserver/172.21.128.224:8001 ready...
[root@appserver ~]#

[root@dbserver ~]# cat /tmp/pollme.out
Creating RMI listener using RMI Registry port 8001
dbserver/172.21.128.225:8001 ready...
[root@dbserver ~]#

[root@batchserver ~]# cat /tmp/pollme.out
Creating RMI listener using RMI Registry port 8001
batchserver/172.21.128.227:8001 ready...
batchserver: 1504773537995
[root@batchserver ~]#

My etc/hosts is:
# External VM-to-client communications
172.21.128.220     infraserver infraserver1
172.21.128.221     webserver webserver1
172.21.128.222     mailserver mailserver1
172.21.128.224     appserver appserver1 specdelivery specemulator
172.21.128.225     dbserver dbserver1
172.21.128.227     batchserver batchserver1

172.21.128.219  client1 specdriver

# Internal VM-to-VM only communications
172.21.128.240       infraserver1-int
172.21.128.241       webserver1-int
172.21.128.244       appserver1-int
172.21.128.245       dbserver1-int specdb

java version is 1.8.0_141 expect appserver is 1.7.0_151, because java 1.8 is incompatible with glassfish3, so i use 1.7.0_151 in appserver.

How to deal whit it now?

attachments is the client log and primectl.out,  also my Control.config in it , please help have a look, thanks so much.
Title: Re: runspecvirt fail
Post by: lroderic on September 07, 2017, 11:55:49 AM
Thanks for all the logs. The problem is with appserver's internal network connection to dbserver evidenced in prime-client1_1098.log which gets reported into primectrl.out. Let's work on that without the noise of the other workloads. In Control.config, please change NUM_WORKLOADS=1 to run only app/dbserver.

You're not using an internal network since both the internal vNICs and external vNICs are on 172.21.128. The point of using internal vNICs for communications between appserver and dbserver is to move that intra-VM network traffic off of the LAN between the client and workload VMs, reducing network traffic between the client and workload VMs. The error in client-172.21.128.219_1091.log means that SPECjAppServer is trying to connect to a hostname it can't find.

To make the example VM configuration work, please create a "dummy" internal network that lets VMs communicate only with each other. If you use vSphere, see https://communities.vmware.com/thread/234233 (https://communities.vmware.com/thread/234233) to set up a "host-only" network. If you use KVM, see http://www.linux-kvm.org/page/Networking#Private_Virtual_Bridge (http://www.linux-kvm.org/page/Networking#Private_Virtual_Bridge) for setting up a private virtual bridge.

Once you create the internal network and assign a second vNIC on each VM to it, edit the VM's /etc/hosts to reflect the IP addresses for those vNICS. The Example VM guide states in Section 2.0:

Code: [Select]
Each VM must have at least one virtual network adapter associated with a physical network that can communicate
with the client. Review the /etc/hosts file for each cloned VM for the IP address assignments. Assign the first network
adapter (eth0) to the external network (for example 192.168). Assign the second network adapter (eth1) to the internal
network (for example 10.10).


Using this example, we'll use 10.10.1 for the internal network on app/db and web/infra VMs:

Code: [Select]
# External VM-to-client communications
172.21.128.220     infraserver infraserver1
172.21.128.221     webserver webserver1
172.21.128.222     mailserver mailserver1
172.21.128.224     appserver appserver1 specdelivery specemulator
172.21.128.225     dbserver dbserver1
172.21.128.227     batchserver batchserver1

# Internal VM-to-VM only communications
10.10.1.220       infraserver1-int
10.10.1.221       webserver1-int
10.10.1.224       appserver1-int
10.10.1.225       dbserver1-int specdb

You could also do a search on "SPECpoll" here in the forum to see others who've had problems with getting SPECpoll working.

Lisa
Title: Re: runspecvirt fail
Post by: zgy on September 08, 2017, 10:15:49 AM
HI Lisa, Thanks for your help.

After I use 10.10.1 for the internal network, I rerun runspecvirt.sh , now the error info is:
2017-09-08 22:10:34:097 SslConnection: [ERROR] IOException during SSL handshake: javax.net.ssl.SSLHandshakeException: No appropriate protocol (protocol is disabled or cipher suites are inappropriate)
2017-09-08 22:10:34:097 Connection: [ERROR] createSocket() failed.
2017-09-08 22:10:34:097 HttpRequestSched: Creating socket...
2017-09-08 22:10:34:097 SslConnection: [ERROR] IOException during SSL handshake: javax.net.ssl.SSLHandshakeException: No appropriate protocol (protocol is disabled or cipher suites are inappropriate)
2017-09-08 22:10:34:097 Connection: [ERROR] createSocket() failed.
2017-09-08 22:10:34:097 HttpRequestSched: Creating socket...
2017-09-08 22:10:34:097 SslConnection: [ERROR] IOException during SSL handshake: javax.net.ssl.SSLHandshakeException: No appropriate protocol (protocol is disabled or cipher suites are inappropriate)
2017-09-08 22:10:34:098 Connection: [ERROR] createSocket() failed.
2017-09-08 22:10:34:098 HttpRequestSched: Creating socket...


After I did  below, but The error still exists
1:Remove SSLv3 and MD5withRSA from the jdk.tls.disabledAlgorithms property in the $JDK_HOME/jre/lib/security/java.security file (SPECvirt assumes an MD5-based SSLv3 cipher).
2:Add SSL_RSA_WITH_RC4_128_MD5 to the jdk.tls.legacyAlgorithms list in the $JDK_HOME/jre/lib/security/java.security file.
3:Remove !MD5 from the SSLCipherSuite list in /etc/httpd/conf.d/ssl.conf file.


Title: Re: runspecvirt fail
Post by: lroderic on September 08, 2017, 10:40:27 AM
Hi. Your app/dbserver got further, but since you ran with NUM_WORKLOADS=4, we can't tell if your changes to the hosts file for app/dbserver worked. I would've made sure that was working before moving on to web/infraserver, but OK.

Thanks for reading the Technical Support FAQ. You're using Java 1.8.0_141 on all VMs except for the appserver? You made those three SSL changes on the client, correct?

Lisa

Title: Re: runspecvirt fail
Post by: zgy on September 09, 2017, 12:03:13 AM
YES, I using java 1.8.0_141 on all VMs expect appserver, 

I made those threre SSL changes on all VMs,   it is right?

Or just for the client VM?
Title: Re: runspecvirt fail
Post by: lroderic on September 09, 2017, 12:54:08 AM
You need those changes only on the client, not on any other VMs. As it says in https://www.spec.org/virt_sc2013/docs/SPECvirt_TechnicalSupport.html (https://www.spec.org/virt_sc2013/docs/SPECvirt_TechnicalSupport.html) under Webserver/Infraserver:

To enable SSLv3, log into the client and: ...

You might consider using TLS_RSA_WITH_AES_128_CBC_SHA. See https://www.spec.org/forums/index.php?topic=58.msg358 (https://www.spec.org/forums/index.php?topic=58.msg358) for info on making it work. If you end up using TLS and intend to submit your test to SPEC for acceptance, make sure to apply the patch at https://www.spec.org/virt_sc2013/docs/SPECvirt_Patches.html (https://www.spec.org/virt_sc2013/docs/SPECvirt_Patches.html).

Lisa
Title: Re: runspecvirt fail
Post by: zgy on September 11, 2017, 05:22:25 AM
I use TLS_RSA_WITH_AES_128_CBC_SHA, now got futher, but anthor error....


Clientmgr1_1088.out:
-> 2017-09-11 17:32:33:537 SPECweb_Support: [ERROR] SocketTimeoutException encountered during run!
-> 2017-09-11 17:32:33:537 SPECweb_Support[653]: State Switch [4,3]
-> 2017-09-11 17:32:33:537 WorkloadScheduler[653]: Thinking for 28000 msec
-> 2017-09-11 17:32:33:562 Connection: SocketTimeoutException waiting for end-of-header
-> 2017-09-11 17:32:33:562 SPECweb_Support[687]: STATE 2; RESPONSE LENGTH = 20480
-> 2017-09-11 17:32:33:562 SPECweb_Support[687]: STATE 2; FILE BYTES READ = -4
-> 2017-09-11 17:32:33:562 SPECweb_Support: [ERROR] SocketTimeoutException encountered during run!
-> 2017-09-11 17:32:33:562 SPECweb_Support[687]: State Switch [2,3]
-> 2017-09-11 17:32:33:562 WorkloadScheduler[687]: Thinking for 12000 msec
-> 2017-09-11 17:32:33:565 Connection: SocketTimeoutException waiting for end-of-header
-> 2017-09-11 17:32:33:565 SPECweb_Support[2219]: STATE 1; RESPONSE LENGTH = 20480
-> 2017-09-11 17:32:33:565 SPECweb_Support[2219]: STATE 1; FILE BYTES READ = -4
-> 2017-09-11 17:32:33:565 SPECweb_Support: [ERROR] SocketTimeoutException encountered during run!
-> 2017-09-11 17:32:33:565 SPECweb_Support[2219]: State Switch [1,1]
-> 2017-09-11 17:32:33:565 WorkloadScheduler[2219]: Thinking for 6000 msec
-> 2017-09-11 17:32:33:575 Connection: SocketTimeoutException waiting for end-of-header
-> 2017-09-11 17:32:33:575 SPECweb_Support[688]: STATE 4; RESPONSE LENGTH = 20480
-> 2017-09-11 17:32:33:575 SPECweb_Support[688]: STATE 4; FILE BYTES READ = -4
-> 2017-09-11 17:32:33:575 SPECweb_Support: [ERROR] SocketTimeoutException encountered during run!
-> 2017-09-11 17:32:33:575 SPECweb_Support[688]: State Switch [4,3]
-> 2017-09-11 17:32:33:575 WorkloadScheduler[688]: Thinking for 29000 msec
-> 2017-09-11 17:32:33:625 Connection: SocketTimeoutException waiting for end-of-header
-> 2017-09-11 17:32:33:625 SPECweb_Support[423]: STATE 3; RESPONSE LENGTH = 20480
-> 2017-09-11 17:32:33:625 SPECweb_Support[423]: STATE 3; FILE BYTES READ = -4
-> 2017-09-11 17:32:33:625 SPECweb_Support: [ERROR] SocketTimeoutException encountered during run!
-> 2017-09-11 17:32:33:625 SPECweb_Support[423]: State Switch [3,2]
-> 2017-09-11 17:32:33:625 WorkloadScheduler[423]: Thinking for 14000 msec
-> 2017-09-11 17:32:33:714 Connection: SocketTimeoutException waiting for end-of-header
-> 2017-09-11 17:32:33:714 SPECweb_Support[816]: STATE 1; RESPONSE LENGTH = 20480
-> 2017-09-11 17:32:33:714 SPECweb_Support[816]: STATE 1; FILE BYTES READ = -4
-> 2017-09-11 17:32:33:714 SPECweb_Support: [ERROR] SocketTimeoutException encountered during run!
-> 2017-09-11 17:32:33:714 SPECweb_Support[816]: State Switch [1,3]
-> 2017-09-11 17:32:33:714 WorkloadScheduler[816]: Thinking for 3000 msec
-> 2017-09-11 17:32:33:720 Connection: SocketTimeoutException waiting for end-of-header
-> 2017-09-11 17:32:33:720 SPECweb_Support[57]: STATE 3; RESPONSE LENGTH = 20480
-> 2017-09-11 17:32:33:720 SPECweb_Support[57]: STATE 3; FILE BYTES READ = -4
-> 2017-09-11 17:32:33:720 SPECweb_Support: [ERROR] SocketTimeoutException encountered during run!
-> 2017-09-11 17:32:33:720 SPECweb_Support[57]: State Switch [3,2]
-> 2017-09-11 17:32:33:720 WorkloadScheduler[57]: Thinking for 5000 msec
-> 2017-09-11 17:32:33:774 Connection: SocketTimeoutException waiting for end-of-header
-> 2017-09-11 17:32:33:774 SPECweb_Support[217]: STATE 3; RESPONSE LENGTH = 20480
-> 2017-09-11 17:32:33:774 SPECweb_Support[217]: STATE 3; FILE BYTES READ = -4
-> 2017-09-11 17:32:33:774 SPECweb_Support: [ERROR] SocketTimeoutException encountered during run!
-> 2017-09-11 17:32:33:774 SPECweb_Support[217]: State Switch [3,4]
-> 2017-09-11 17:32:33:774 WorkloadScheduler[217]: Thinking for 7000 msec
-> 2017-09-11 17:32:33:821 SslConnection: [ERROR] IOException creating socket. Exception is:
-> java.net.ConnectException: Connection timed out (Connection timed out)
-> hostServer is: webserver; port is: 443
-> 2017-09-11 17:32:33:821 Connection: IOException closing SSL socket: java.net.SocketException: Socket is closed
-> 2017-09-11 17:32:33:821 Connection: IOException closing SSL socket: java.net.SocketException: Socket is closed
-> 2017-09-11 17:32:33:830 SslConnection: [ERROR] IOException creating socket. Exception is:
-> java.net.ConnectException: Connection timed out (Connection timed out)
-> hostServer is: webserver; port is: 443
-> 2017-09-11 17:32:33:830 Connection: IOException closing SSL socket: java.net.SocketException: Socket is closed
-> 2017-09-11 17:32:33:830 Connection: IOException closing SSL socket: java.net.SocketException: Socket is closed
-> 2017-09-11 17:32:33:831 SslConnection: [ERROR] IOException creating socket. Exception is:
-> java.net.ConnectException: Connection timed out (Connection timed out)
-> hostServer is: webserver; port is: 443
-> 2017-09-11 17:32:33:831 Connection: IOException closing SSL socket: java.net.SocketException: Socket is closed
-> 2017-09-11 17:32:33:831 Connection: IOException closing SSL socket: java.net.SocketException: Socket is closed
-> 2017-09-11 17:32:33:838 SslConnection: [ERROR] IOException creating socket. Exception is:
-> java.net.ConnectException: Connection timed out (Connection timed out)
-> hostServer is: webserver; port is: 443
-> 2017-09-11 17:32:33:838 Connection: IOException closing SSL socket: java.net.SocketException: Socket is closed
-> 2017-09-11 17:32:33:838 Connection: IOException closing SSL socket: java.net.SocketException: Socket is closed
-> 2017-09-11 17:32:33:856 SslConnection: [ERROR] IOException creating socket. Exception is:
-> java.net.ConnectException: Connection timed out (Connection timed out)
-> hostServer is: webserver; port is: 443
-> 2017-09-11 17:32:33:856 SPECweb_Support[613]: STATE 5; RESPONSE LENGTH = 20480
-> 2017-09-11 17:32:33:856 SPECweb_Support[613]: STATE 5; FILE BYTES READ = -1
-> 2017-09-11 17:32:33:856 SPECweb_Support: [ERROR] STATE 5; makeHttpRequest() failed.
-> 2017-09-11 17:32:33:856 SPECweb_Support[613]: State Switch [5,4]
-> 2017-09-11 17:32:33:858 Connection: SocketTimeoutException waiting for end-of-header
-> 2017-09-11 17:32:33:858 SPECweb_Support[795]: STATE 2; RESPONSE LENGTH = 20480
-> 2017-09-11 17:32:33:858 SPECweb_Support[795]: STATE 2; FILE BYTES READ = -4
-> 2017-09-11 17:32:33:858 SPECweb_Support: [ERROR] SocketTimeoutException encountered during run!
-> 2017-09-11 17:32:33:858 SPECweb_Support[795]: State Switch [2,3]
-> 2017-09-11 17:32:33:858 WorkloadScheduler[795]: Thinking for 9000 msec
-> 2017-09-11 17:32:33:859 SslConnection: [ERROR] IOException creating socket. Exception is:
-> java.net.ConnectException: Connection timed out (Connection timed out)
-> hostServer is: webserver; port is: 443
-> 2017-09-11 17:32:33:859 Connection: IOException closing SSL socket: java.net.SocketException: Socket is closed
-> 2017-09-11 17:32:33:859 Connection: IOException closing SSL socket: java.net.SocketException: Socket is closed
-> 2017-09-11 17:32:33:867 SslConnection: [ERROR] IOException creating socket. Exception is:
-> java.net.ConnectException: Connection timed out (Connection timed out)
-> hostServer is: webserver; port is: 443
-> 2017-09-11 17:32:33:867 SPECweb_Support[504]: STATE 0; RESPONSE LENGTH = 20480
-> 2017-09-11 17:32:33:867 SPECweb_Support[504]: STATE 0; FILE BYTES READ = -1
-> 2017-09-11 17:32:33:867 SPECweb_Support: [ERROR] STATE 0; makeHttpRequest() failed.
-> 2017-09-11 17:32:33:867 SPECweb_Support[504]: State Switch [0,2]
-> 2017-09-11 17:32:33:964 Connection: SocketTimeoutException waiting for end-of-header
-> 2017-09-11 17:32:33:964 SPECweb_Support[1810]: STATE 4; RESPONSE LENGTH = 20480
-> 2017-09-11 17:32:33:964 SPECweb_Support[1810]: STATE 4; FILE BYTES READ = -4
-> 2017-09-11 17:32:33:964 SPECweb_Support: [ERROR] SocketTimeoutException encountered during run!
-> 2017-09-11 17:32:33:964 SPECweb_Support[1810]: State Switch [4,3]
-> 2017-09-11 17:32:33:964 WorkloadScheduler[1810]: Thinking for 1000 msec
-> 2017-09-11 17:32:33:965 SslConnection: [ERROR] IOException creating socket. Exception is:
-> java.net.ConnectException: Connection timed out (Connection timed out)
-> hostServer is: webserver; port is: 443
-> 2017-09-11 17:32:33:965 Connection: IOException closing SSL socket: java.net.SocketException: Socket is closed
-> 2017-09-11 17:32:33:965 Connection: [ERROR] createSocket() failed.
-> 2017-09-11 17:32:33:965 HttpRequestSched: Creating socket...
-> 2017-09-11 17:32:33:965 Connection: IOException closing SSL socket: java.net.SocketException: Socket is closed



[root@client1 SPECvirt]# cat primectrl.out
2017-09-11 17:03:23:329 Mon Sep 11 17:03:23 CST 2017
2017-09-11 17:03:23:330 specvirt: maxPreRunTime = 1201
2017-09-11 17:03:23:330 specvirt: runTime = 7200
2017-09-11 17:03:23:330 specvirt: runTime = 7200
2017-09-11 17:03:23:331 specvirt: runTime = 7200
2017-09-11 17:03:23:331 specvirt: runTime = 7200
2017-09-11 17:03:23:331 specvirt: runTime = 600
2017-09-11 17:03:23:331 specvirt: runTime = 600
2017-09-11 17:03:23:331 specvirt: runTime = 600
2017-09-11 17:03:23:331 specvirt: runTime = 600
2017-09-11 17:03:23:397 RMI server started: client1:9990
2017-09-11 17:03:23:404 [INFO] This is a perf-only benchmark run. Skipping active idle polling interval.
2017-09-11 17:03:23:405 PrimeControl: preparing client drivers.
2017-09-11 17:03:23:405 PrimeControl: PRIME_HOST 0 = client1:1098
2017-09-11 17:03:23:405 PrimeControl: PRIME_HOST 1 = client1:1096
2017-09-11 17:03:23:405 PrimeControl: PRIME_HOST 2 = client1:1094
2017-09-11 17:03:23:405 PrimeControl: PRIME_HOST 3 = client1:1092
2017-09-11 17:03:23:407 PrimeControl: Master 1: client1:1098
2017-09-11 17:03:23:407 PrimeControl: Master 2: client1:1096
2017-09-11 17:03:23:407 PrimeControl: Master 3: client1:1094
2017-09-11 17:03:23:407 PrimeControl: Master 4: client1:1092
2017-09-11 17:03:23:409 PrimeControl: adding host client1:1098
2017-09-11 17:03:23:418 PrimeControl: adding host client1:1096
2017-09-11 17:03:23:424 PrimeControl: adding host client1:1094
2017-09-11 17:03:23:431 PrimeControl: adding host client1:1092
2017-09-11 17:03:23:448 First client for 0: 172.21.128.219:1091
2017-09-11 17:03:23:462 First client for 1: 172.21.128.219:1010
2017-09-11 17:03:23:473 First client for 2: 172.21.128.219:1200
2017-09-11 17:03:23:479 First client for 3: 172.21.128.219:1902
2017-09-11 17:03:23:484 PrimeControl: starting clients...
2017-09-11 17:03:23:484 PrimeControl: clients.length = 4
2017-09-11 17:03:23:485 PrimeControl: clients[0].length = 1
2017-09-11 17:03:23:485 PrimeControl: starting clients[0][0]: 172.21.128.219:1091
2017-09-11 17:03:23:512 PrimeControl: started client: 172.21.128.219:1091
2017-09-11 17:03:23:512 PrimeControl: clients[1].length = 1
2017-09-11 17:03:23:512 PrimeControl: starting clients[1][0]: 172.21.128.219:1010
2017-09-11 17:03:23:516 PrimeControl: started client: 172.21.128.219:1010
2017-09-11 17:03:23:516 PrimeControl: clients[2].length = 1
2017-09-11 17:03:23:517 PrimeControl: starting clients[2][0]: 172.21.128.219:1200
2017-09-11 17:03:23:524 PrimeControl: started client: 172.21.128.219:1200
2017-09-11 17:03:23:524 PrimeControl: clients[3].length = 1
2017-09-11 17:03:23:526 PrimeControl: starting clients[3][0]: 172.21.128.219:1902
2017-09-11 17:03:23:529 PrimeControl: started client: 172.21.128.219:1902
2017-09-11 17:03:23:530 PrimeControl: PTDs not used for this benchmark run!
2017-09-11 17:03:23:530 PrimeControl: starting 4 masters.
2017-09-11 17:10:55:297 PrimeControl: master[0][0] sleeping 20 sec.
2017-09-11 17:10:55:297 PrimeControl: master[0][2] sleeping 20 sec.
2017-09-11 17:10:55:298 PrimeControl: master[0][1] sleeping 20 sec.
2017-09-11 17:10:55:298 PrimeControl: master[0][3] sleeping 20 sec.
2017-09-11 17:11:15:349 PrimeControl: waiting on 4 prime client(s).
2017-09-11 17:11:15:850 Sending config to client1:1094
2017-09-11 17:11:15:858 Sending config to client1:1098
2017-09-11 17:11:15:859 Sending config to client1:1092
2017-09-11 17:11:15:893 Sending config to client1:1096
2017-09-11 17:11:16:209 specvirt: waiting on 3 prime clients.
.
2017-09-11 17:11:16:390 setting hostsReady = true
2017-09-11 17:11:16:539 specvirt: waiting on 2 prime clients.
2017-09-11 17:11:26:809 specvirt: waiting on 1 prime clients.
2017-09-11 17:13:32:065 PrimeControl: Workload and prime controller builds: 80
2017-09-11 17:13:32:065 PrimeControl: awaiting runtime started signal from prime clients
2017-09-11 17:13:49:066 specvirt: clock sync check completed successfully
2017-09-11 17:13:49:066 specvirt: initiating workload ramp-up.
2017-09-11 17:13:49:067 Polling start time = Mon Sep 11 17:33:50 CST 2017
2017-09-11 17:13:49:067 Polling end time   = Mon Sep 11 19:33:50 CST 2017
2017-09-11 17:18:50:103 PrimeControl: all workloads started.
2017-09-11 17:33:50:116 PrimeControl: all workloads in run time.
2017-09-11 17:33:50:117 PrimeControl: dumping polling start response times...
2017-09-11 17:33:50:117 client1:1098 (PRIME_HOST[0][0]) msec after pollStart: 14
2017-09-11 17:33:50:117 client1:1096 (PRIME_HOST[0][1]) msec after pollStart: 50
2017-09-11 17:33:50:117 client1:1094 (PRIME_HOST[0][2]) msec after pollStart: 26
2017-09-11 17:33:50:118 client1:1092 (PRIME_HOST[0][3]) msec after pollStart: 7
2017-09-11 17:33:50:118 PrimeControl: checking polling start response times...
2017-09-11 17:33:50:120 PrimeControl: sleeping for 0 sec
2017-09-11 17:33:50:120 PrimeControl: sending results counter reset command.
2017-09-11 17:33:50:120 PrimeControl: polling for 7200 sec


Client:
[root@client1 SPECvirt]# ulimit -n
999999
[root@client1 SPECvirt]# ulimit -u
unlimited
[root@client1 SPECvirt]#

My client run on a different physic whit other server VM, and I just use 1GE Network card.  Is it a  possibility
that resulting in the error( [ERROR] SocketTimeoutException encountered during run)?

Can I reduse the SIMULTANEOUS_SESSIONS to temply fix this error?  If so, what steps should i do ?

Or which log should i check?  thanks
Title: Re: runspecvirt fail
Post by: zgy on September 11, 2017, 07:48:17 AM
At the end , error info is:
[root@client1 SPECvirt]# cat primectrl.out
2017-09-11 17:03:23:329 Mon Sep 11 17:03:23 CST 2017
2017-09-11 17:03:23:330 specvirt: maxPreRunTime = 1201
2017-09-11 17:03:23:330 specvirt: runTime = 7200
2017-09-11 17:03:23:330 specvirt: runTime = 7200
2017-09-11 17:03:23:331 specvirt: runTime = 7200
2017-09-11 17:03:23:331 specvirt: runTime = 7200
2017-09-11 17:03:23:331 specvirt: runTime = 600
2017-09-11 17:03:23:331 specvirt: runTime = 600
2017-09-11 17:03:23:331 specvirt: runTime = 600
2017-09-11 17:03:23:331 specvirt: runTime = 600
2017-09-11 17:03:23:397 RMI server started: client1:9990
2017-09-11 17:03:23:404 [INFO] This is a perf-only benchmark run. Skipping active idle polling interval.
2017-09-11 17:03:23:405 PrimeControl: preparing client drivers.
2017-09-11 17:03:23:405 PrimeControl: PRIME_HOST 0 = client1:1098
2017-09-11 17:03:23:405 PrimeControl: PRIME_HOST 1 = client1:1096
2017-09-11 17:03:23:405 PrimeControl: PRIME_HOST 2 = client1:1094
2017-09-11 17:03:23:405 PrimeControl: PRIME_HOST 3 = client1:1092
2017-09-11 17:03:23:407 PrimeControl: Master 1: client1:1098
2017-09-11 17:03:23:407 PrimeControl: Master 2: client1:1096
2017-09-11 17:03:23:407 PrimeControl: Master 3: client1:1094
2017-09-11 17:03:23:407 PrimeControl: Master 4: client1:1092
2017-09-11 17:03:23:409 PrimeControl: adding host client1:1098
2017-09-11 17:03:23:418 PrimeControl: adding host client1:1096
2017-09-11 17:03:23:424 PrimeControl: adding host client1:1094
2017-09-11 17:03:23:431 PrimeControl: adding host client1:1092
2017-09-11 17:03:23:448 First client for 0: 172.21.128.219:1091
2017-09-11 17:03:23:462 First client for 1: 172.21.128.219:1010
2017-09-11 17:03:23:473 First client for 2: 172.21.128.219:1200
2017-09-11 17:03:23:479 First client for 3: 172.21.128.219:1902
2017-09-11 17:03:23:484 PrimeControl: starting clients...
2017-09-11 17:03:23:484 PrimeControl: clients.length = 4
2017-09-11 17:03:23:485 PrimeControl: clients[0].length = 1
2017-09-11 17:03:23:485 PrimeControl: starting clients[0][0]: 172.21.128.219:1091
2017-09-11 17:03:23:512 PrimeControl: started client: 172.21.128.219:1091
2017-09-11 17:03:23:512 PrimeControl: clients[1].length = 1
2017-09-11 17:03:23:512 PrimeControl: starting clients[1][0]: 172.21.128.219:1010
2017-09-11 17:03:23:516 PrimeControl: started client: 172.21.128.219:1010
2017-09-11 17:03:23:516 PrimeControl: clients[2].length = 1
2017-09-11 17:03:23:517 PrimeControl: starting clients[2][0]: 172.21.128.219:1200
2017-09-11 17:03:23:524 PrimeControl: started client: 172.21.128.219:1200
2017-09-11 17:03:23:524 PrimeControl: clients[3].length = 1
2017-09-11 17:03:23:526 PrimeControl: starting clients[3][0]: 172.21.128.219:1902
2017-09-11 17:03:23:529 PrimeControl: started client: 172.21.128.219:1902
2017-09-11 17:03:23:530 PrimeControl: PTDs not used for this benchmark run!
2017-09-11 17:03:23:530 PrimeControl: starting 4 masters.
2017-09-11 17:10:55:297 PrimeControl: master[0][0] sleeping 20 sec.
2017-09-11 17:10:55:297 PrimeControl: master[0][2] sleeping 20 sec.
2017-09-11 17:10:55:298 PrimeControl: master[0][1] sleeping 20 sec.
2017-09-11 17:10:55:298 PrimeControl: master[0][3] sleeping 20 sec.
2017-09-11 17:11:15:349 PrimeControl: waiting on 4 prime client(s).
2017-09-11 17:11:15:850 Sending config to client1:1094
2017-09-11 17:11:15:858 Sending config to client1:1098
2017-09-11 17:11:15:859 Sending config to client1:1092
2017-09-11 17:11:15:893 Sending config to client1:1096
2017-09-11 17:11:16:209 specvirt: waiting on 3 prime clients.
.
2017-09-11 17:11:16:390 setting hostsReady = true
2017-09-11 17:11:16:539 specvirt: waiting on 2 prime clients.
2017-09-11 17:11:26:809 specvirt: waiting on 1 prime clients.
2017-09-11 17:13:32:065 PrimeControl: Workload and prime controller builds: 80
2017-09-11 17:13:32:065 PrimeControl: awaiting runtime started signal from prime clients
2017-09-11 17:13:49:066 specvirt: clock sync check completed successfully
2017-09-11 17:13:49:066 specvirt: initiating workload ramp-up.
2017-09-11 17:13:49:067 Polling start time = Mon Sep 11 17:33:50 CST 2017
2017-09-11 17:13:49:067 Polling end time   = Mon Sep 11 19:33:50 CST 2017
2017-09-11 17:18:50:103 PrimeControl: all workloads started.
2017-09-11 17:33:50:116 PrimeControl: all workloads in run time.
2017-09-11 17:33:50:117 PrimeControl: dumping polling start response times...
2017-09-11 17:33:50:117 client1:1098 (PRIME_HOST[0][0]) msec after pollStart: 14
2017-09-11 17:33:50:117 client1:1096 (PRIME_HOST[0][1]) msec after pollStart: 50
2017-09-11 17:33:50:117 client1:1094 (PRIME_HOST[0][2]) msec after pollStart: 26
2017-09-11 17:33:50:118 client1:1092 (PRIME_HOST[0][3]) msec after pollStart: 7
2017-09-11 17:33:50:118 PrimeControl: checking polling start response times...
2017-09-11 17:33:50:120 PrimeControl: sleeping for 0 sec
2017-09-11 17:33:50:120 PrimeControl: sending results counter reset command.
2017-09-11 17:33:50:120 PrimeControl: polling for 7200 sec
2017-09-11 19:33:50:083 PrimeControl: client1:1092 (PRIME_HOST[0][3]) run complete; numStarted = 3
2017-09-11 19:33:50:121 PrimeControl: stopping result polling.
2017-09-11 19:33:50:121 PrimeControl: waiting for all workloads to stop...
2017-09-11 19:33:50:123 PrimeControl: [ERROR] remote exception occured  while calling resetCounters() on 172.21.128.219: java.rmi.ConnectException: Connection refused to host: 172.21.128.219; nested exception is:
        java.net.ConnectException: Connection refused (Connection refused)
2017-09-11 19:33:50:171 PrimeControl: client1:1094 (PRIME_HOST[0][2]) run complete; numStarted = 2
2017-09-11 19:33:56:899 PrimeControl: client1:1098 (PRIME_HOST[0][0]) run complete; numStarted = 1
2017-09-11 19:34:00:122 PrimeControl: waiting for 1 masters to finish
2017-09-11 19:34:15:122 PrimeControl: waiting for 1 masters to finish
2017-09-11 19:34:30:123 PrimeControl: waiting for 1 masters to finish
2017-09-11 19:34:45:123 PrimeControl: waiting for 1 masters to finish
2017-09-11 19:35:00:124 PrimeControl: waiting for 1 masters to finish
2017-09-11 19:35:15:124 PrimeControl: waiting for 1 masters to finish
2017-09-11 19:35:30:125 PrimeControl: waiting for 1 masters to finish
2017-09-11 19:35:45:125 PrimeControl: waiting for 1 masters to finish
2017-09-11 19:36:00:126 PrimeControl: waiting for 1 masters to finish
2017-09-11 19:36:15:126 PrimeControl: waiting for 1 masters to finish
2017-09-11 19:36:30:127 PrimeControl: waiting for 1 masters to finish
2017-09-11 19:36:45:127 PrimeControl: waiting for 1 masters to finish
2017-09-11 19:37:00:128 PrimeControl: waiting for 1 masters to finish
2017-09-11 19:37:15:128 PrimeControl: waiting for 1 masters to finish
2017-09-11 19:37:30:129 PrimeControl: waiting for 1 masters to finish
2017-09-11 19:37:45:129 PrimeControl: waiting for 1 masters to finish
2017-09-11 19:38:00:130 PrimeControl: waiting for 1 masters to finish
2017-09-11 19:38:15:130 PrimeControl: waiting for 1 masters to finish
2017-09-11 19:38:30:131 PrimeControl: waiting for 1 masters to finish
2017-09-11 19:38:45:131 PrimeControl: waiting for 1 masters to finish
2017-09-11 19:39:00:132 PrimeControl: [ERROR] Maximum prime client wait time exceeded!
2017-09-11 19:39:00:133 PrimeControl: [ERROR] startMasters() failed!
2017-09-11 19:39:00:133 PrimeControl: sending abortTest() to prime clients.
2017-09-11 19:39:00:134 PrimeControl: [ERROR] Remote exception occured while calling getValidationRept() on client1:1096 for tile  0, benchmark 1: java.rmi.ConnectException: Connection refused to host: 172.21.128.219; nested exception is:
        java.net.ConnectException: Connection refused (Connection refused)
2017-09-11 19:39:00:135 PrimeControl: id=1, abortID=-1
2017-09-11 19:39:00:135 PrimeControl: masters[1]=client1:1096
2017-09-11 19:39:00:136 PrimeControl: [ERROR] exception occurred sending abortTest signal to specweb_Stub[UnicastRef [liveRef: [endpoint:[172.21.128.219:4188](remote),objID:[544cc4e3:15e70350181:-7ffe, -1590442930850674644]]]]. Exception was:
 java.rmi.ConnectException: Connection refused to host: 172.21.128.219; nested exception is:
        java.net.ConnectException: Connection refused (Connection refused)
2017-09-11 19:39:00:152 PrimeControl: id=3, abortID=-1
2017-09-11 19:39:00:152 PrimeControl: masters[3]=client1:1092
2017-09-11 19:39:00:160 PrimeControl: id=0, abortID=-1
2017-09-11 19:39:00:160 PrimeControl: masters[0]=client1:1098
2017-09-11 19:39:00:184 PrimeControl: id=2, abortID=-1
2017-09-11 19:39:00:184 PrimeControl: masters[2]=client1:1094
2017-09-11 19:39:01:810 PrimeControl: stopping remote client processes
2017-09-11 19:39:03:821 PrimeControl: stopping local client threads
2017-09-11 19:39:03:822 PrimeControl: terminating run. Please wait...
2017-09-11 19:39:04:828 specvirt: benchmark run failed!
2017-09-11 19:39:04:828 specvirt: Done!

I not sure the SSLv3 problem is Solved, because i found https://www.spec.org/forums/index.php?topic=40.0 is Similar with me, and the reason is still the SSLv3!!

Follwing is the log.
Title: Re: runspecvirt fail
Post by: zgy on September 11, 2017, 08:01:54 AM
20170911-170323_part0

(include Control.config,  SPECweb2005/Test.confg,  java.security,  primectrl.out ... )
Title: Re: runspecvirt fail
Post by: zgy on September 11, 2017, 08:06:27 AM
client-172.21.128.219_1010_part0
Title: Re: runspecvirt fail
Post by: zgy on September 11, 2017, 08:11:16 AM
client-172.21.128.219_1010_part1
Title: Re: runspecvirt fail
Post by: zgy on September 11, 2017, 08:11:34 AM
client-172.21.128.219_1010_part2
Title: Re: runspecvirt fail
Post by: zgy on September 11, 2017, 08:12:03 AM
client-172.21.128.219_1010_part3
client-172.21.128.219_1010_part4
Title: Re: runspecvirt fail
Post by: zgy on September 11, 2017, 08:12:27 AM
client-172.21.128.219_1010_part5
Title: Re: runspecvirt fail
Post by: lroderic on September 11, 2017, 12:09:04 PM
Hi. Please simplify and set NUM_WORKLOADS=2 so we can debug what's failing without the extra log data from mailserver (client1:1094) and batchserver (client1:1092).

You're on the right track. Since the client requires ~1.4 GbE, you'll need to separate your web/infraserver client from the other workloads. Until you do that, please reduce the SIMULTANEOUS_SESSIONS to 1000 and rerun. On the client, what do you have in /etc/sysctl.conf? At a minimum:

Code: [Select]
fs.file-max=1000000
net.core.optmem_max = 20000000
net.core.rmem_default = 20000000
net.core.rmem_max = 20000000
net.core.wmem_default = 20000000
net.core.wmem_max = 20000000
net.core.somaxconn = 8192
net.ipv4.tcp_max_tw_buckets = 500000
net.ipv4.tcp_mem = 20000000 20000000 200000000
net.ipv4.tcp_rmem = 20000000 20000000 200000000
net.ipv4.tcp_wmem = 20000000 20000000 200000000
net.ipv4.ip_local_port_range = 4096 65535
net.ipv4.tcp_tw_reuse=1
vm.swappiness = 0
vm.overcommit_memory = 0

You can disable IPV6 if you like.

Lisa
Title: Re: runspecvirt fail
Post by: zgy on September 11, 2017, 09:05:51 PM
YES, i have set /ets/sysctl.conf
[root@client1 SPECvirt]# tail -20  /etc/sysctl.conf
# SPECvirt_sc2013 tunings for a client
fs.file-max=1000000

net.core.optmem_max = 20000000
net.core.rmem_default = 20000000
net.core.rmem_max = 20000000
net.core.wmem_default = 20000000
net.core.wmem_max = 20000000
net.core.somaxconn = 8192
net.ipv4.tcp_max_tw_buckets = 500000
net.ipv4.tcp_mem = 20000000 20000000 200000000
net.ipv4.tcp_rmem = 20000000 20000000 200000000
net.ipv4.tcp_wmem = 20000000 20000000 200000000
net.ipv4.ip_local_port_range = 4096 65535
net.ipv4.tcp_tw_reuse=1

vm.swappiness = 0
vm.overcommit_memory = 0


How should i do if i want to reduce the SIMULTANEOUS_SESSIONS to 1000?
1:  In webserver ,  set SIMULTANEOUS_SESSIONS=1000(/opt/SPECweb2005/wafgen/unix/support_downloads_props.rc)
2:  Also In webserver, run:
     cd /opt/SPECweb2005/wafgen
     ./Wafgen unix/support_downloads_props.rc
     ./Wafgen unix/support_image_props.rc
3:reboot VM
4:rerun

Is it right?
Title: Re: runspecvirt fail
Post by: lroderic on September 11, 2017, 11:56:45 PM
The best way is to edit Control.config and set WORKLOAD_LOAD_LEVEL[1]=1000. No need to rerun Wafgen. (That recreates the web store to embed the tile number in the data, which you already did.) This is documented in the Client Harness User Guide at https://www.spec.org/virt_sc2013/docs/SPECvirt_ClientHarnessUserGuide.html#mozTocId783824 (https://www.spec.org/virt_sc2013/docs/SPECvirt_ClientHarnessUserGuide.html#mozTocId783824).

Lisa
Title: Re: runspecvirt fail
Post by: zgy on September 12, 2017, 04:27:11 AM
Hi, Now I set WORKLOAD_LOAD_LEVEL[1]=500,  the result is still something error:

[root@client1 SPECvirt]# tail -70 primectrl.out
0,0,2017-09-12 14:58:58:032,36398,10.0,35428,10.0,74389,10.0,388657,7.25
0,1,2017-09-12 14:58:58:023,243904,243841,0,63,111456794488,1296346276,219810,14840,9191,259,391583,10
0,2,2017-09-12 14:58:58:023,513419,435919,77500,0,1253548202,0,35483
0,3,2017-09-12 14:58:58:022,1,715,1,0,26,3040,715,0

0,0,2017-09-12 14:59:08:032,36493,10.0,35501,10.0,74530,10.0,388889,7.25
0,1,2017-09-12 14:59:08:024,244240,244177,0,63,111569524905,1297613194,220133,14852,9192,259,391583,10
0,2,2017-09-12 14:59:08:024,514055,436502,77553,0,1254389601,0,35483
0,3,2017-09-12 14:59:08:022,1,716,1,0,26,3041,716,0

0,0,2017-09-12 14:59:18:031,36539,10.0,35577,10.0,74559,10.0,389200,7.25
0,1,2017-09-12 14:59:18:024,244590,244527,0,63,111751904747,1299481555,220482,14853,9192,259,391583,10
0,2,2017-09-12 14:59:18:024,514848,437093,77755,0,1256951286,0,35483
0,3,2017-09-12 14:59:18:022,1,717,1,0,26,3042,717,0

0,0,2017-09-12 14:59:28:030,36628,10.0,35661,10.0,74767,10.0,389510,7.25
0,1,2017-09-12 14:59:28:024,244870,244807,0,63,111916854639,1301179513,220762,14853,9192,259,391583,10
0,2,2017-09-12 14:59:28:024,515618,437730,77888,0,1257989336,0,35483
0,3,2017-09-12 14:59:28:022,1,718,2,0,26,3044,718,0

0,0,2017-09-12 14:59:38:031,36708,10.0,35741,10.0,74865,10.0,389832,7.25
0,1,2017-09-12 14:59:38:024,245215,245152,0,63,112034351619,1302531274,221081,14878,9193,259,391583,10
0,2,2017-09-12 14:59:38:024,516376,438408,77968,0,1258743364,0,35483
0,3,2017-09-12 14:59:38:022,1,719,1,0,26,3045,719,0

2017-09-12 14:59:47:991 PrimeControl: client1:1092 (PRIME_HOST[0][3]) run complete; numStarted = 3
2017-09-12 14:59:48:015 PrimeControl: stopping result polling.
2017-09-12 14:59:48:015 PrimeControl: waiting for all workloads to stop...
2017-09-12 14:59:48:059 PrimeControl: client1:1094 (PRIME_HOST[0][2]) run complete; numStarted = 2
2017-09-12 14:59:56:446 PrimeControl: client1:1098 (PRIME_HOST[0][0]) run complete; numStarted = 1
2017-09-12 14:59:58:015 PrimeControl: waiting for 1 masters to finish
2017-09-12 15:00:13:016 PrimeControl: waiting for 1 masters to finish
2017-09-12 15:00:28:017 PrimeControl: waiting for 1 masters to finish
2017-09-12 15:00:43:017 PrimeControl: waiting for 1 masters to finish
2017-09-12 15:00:58:018 PrimeControl: waiting for 1 masters to finish
2017-09-12 15:01:13:018 PrimeControl: waiting for 1 masters to finish
2017-09-12 15:01:28:019 PrimeControl: waiting for 1 masters to finish
2017-09-12 15:01:43:019 PrimeControl: waiting for 1 masters to finish
2017-09-12 15:01:58:020 PrimeControl: waiting for 1 masters to finish
2017-09-12 15:02:13:020 PrimeControl: waiting for 1 masters to finish
2017-09-12 15:02:28:021 PrimeControl: waiting for 1 masters to finish
2017-09-12 15:02:43:021 PrimeControl: waiting for 1 masters to finish
2017-09-12 15:02:56:531 PrimeControl: client1:1096 (PRIME_HOST[0][1]) run complete; numStarted = 0
2017-09-12 15:02:56:531 PrimeControl: all workloads stopped
2017-09-12 15:02:56:579 Workload validation errors reported!:
0-1-0 = Iteration 1 failed 95% TIME GOOD QoS. Achieved: 90.1%
0-1-1 = Iteration 1 failed 99% TIME TOLERABLE QoS. Achieved: 96.2%
0-0-0 = Dealer 90% Response Time FAILED
0-0-1 = Dealer Avg. Response Time FAILED
0-0-2 = Purchase Cycle Time Avg. FAILED
0-0-3 = Manage Cycle Time Avg. FAILED
0-0-4 = Browse Cycle Time Avg. FAILED
0-0-5 = Vehicle Purchasing Rate FAILED
0-0-6 = Largeorder Vehicle Purchase Rate FAILED
0-0-7 = Regular Vehicle Purchase Rate FAILED
0-0-8 = LargeOrderLine Vehicle Rate FAILED
0-0-9 = PlannedLines Vehicle Rate FAILED
0-0-10 = Manufacturing 90% Response Time FAILED
2017-09-12 15:02:56:599 PrimeControl: aggregate audit...
2017-09-12 15:02:56:601 PrimeControl: aggregate audit...
2017-09-12 15:02:56:609 PrimeControl: aggregate audit...
2017-09-12 15:02:56:724 PrimeControl: aggregate audit...
2017-09-12 15:02:56:724 PrimeControl: validating aggregate audit...
2017-09-12 15:02:58:842 PrimeControl: stopping clients.
2017-09-12 15:02:58:842 PrimeControl: stopping remote client processes
2017-09-12 15:03:00:852 PrimeControl: stopping local client threads
  > Loading Raw Result File..

2017-09-12 15:03:00:934 PrimeControl: terminating run. Please wait...
2017-09-12 15:03:01:940 specvirt: Done!

And I find there are many ERROR in Clientmgr1_1088.out 

Following it the log (also perf.html in it), thanks.
Title: Re: runspecvirt fail
Post by: zgy on September 12, 2017, 04:27:51 AM
Clientmgr1_1088.out_part0
Clientmgr1_1088.out_part1
Title: Re: runspecvirt fail
Post by: zgy on September 12, 2017, 04:28:19 AM
Clientmgr1_1088.out_part2
Clientmgr1_1088.out_part3
Title: Re: runspecvirt fail
Post by: zgy on September 12, 2017, 04:28:42 AM
Clientmgr1_1088.out_part4
Clientmgr1_1088.out_part5
Title: Re: runspecvirt fail
Post by: zgy on September 12, 2017, 04:29:09 AM
Clientmgr1_1088.out_part6
Clientmgr1_1088.out_part7
Title: Re: runspecvirt fail
Post by: zgy on September 12, 2017, 04:29:33 AM
Clientmgr1_1088.out_part8
Clientmgr1_1088.out_part9
Title: Re: runspecvirt fail
Post by: zgy on September 12, 2017, 04:29:56 AM
Clientmgr1_1088.out_part10
Clientmgr1_1088.out_part11
Title: Re: runspecvirt fail
Post by: zgy on September 12, 2017, 04:30:22 AM
Clientmgr1_1088.out_part12
Clientmgr1_1088.out_part13
Title: Re: runspecvirt fail
Post by: zgy on September 12, 2017, 04:30:35 AM
Clientmgr1_1088.out_part14
Title: Re: runspecvirt fail
Post by: lroderic on September 12, 2017, 10:58:48 AM
Code: [Select]
0,1,2017-09-12 14:59:38:024,245215,245152,0,63,112034351619,1302531274,221081,14878,9193,259,391583,10
Congratulations, your web/infraserver test ran @ 500 simultaneous sessions. The harness failure for web/infraserver you see is because you reduced the workload from 2500 to 500, so the SPECvirt harness will report that as a validation error. Do you see any errors in Clientmgr1_1096.out?

Would you please post a sample of errors in Clientmgr1_1088.out? Because I'm not seeing problems with web/infraserver in there. These are normal when you have debugging on:

Code: [Select]
-> 2017-09-12 13:41:37:847 WorkloadScheduler[162]: Thinking for 1000 msec
-> 2017-09-12 13:41:37:860 SPECweb_Support[34]: STATE 4; RESPONSE LENGTH = 20480
-> 2017-09-12 13:41:37:860 SPECweb_Support[34]: STATE 4; FILE BYTES READ = 96572

Your error running at 2500 sessions is because the network is saturated. I don't think you have it set up correctly for performance because app/dbserver should report in the sub-two second range, but it's taking 7.25 sec. The errors the harness is reporting for that is a result of a shortened run since there isn't time for the appserver to meet the required mix of transaction types.

If you have no chance at upgrading to 10 GbE, look at Appendix B of Client Harness User Guide at https://www.spec.org/virt_sc2013/docs/SPECvirt_ClientHarnessUserGuide.html#mozTocId969790 (https://www.spec.org/virt_sc2013/docs/SPECvirt_ClientHarnessUserGuide.html#mozTocId969790) for instructions on setting up a dedicated client for web.

On webserver and infraserver, please issue the ifconfig command and post the results here. Also, please post Clientmgr1_1096.out.

Lisa
Title: Re: runspecvirt fail
Post by: zgy on September 14, 2017, 07:05:33 AM
Thanks, I will use 10Ge network card.
Title: Re: runspecvirt fail
Post by: zgy on September 19, 2017, 09:02:24 AM
I  setup up a dedicated client for web forward : https://www.spec.org/virt_sc2013/docs/SPECvirt_ClientHarnessUserGuide.html#mozTocId969790

But it fail:
[root@client1 SPECvirt]# cat primectrl.out
2017-09-19 20:03:21:554 Tue Sep 19 20:03:21 CST 2017
2017-09-19 20:03:21:609 RMI server started: client1:9990
2017-09-19 20:03:21:615 [INFO] This is a perf-only benchmark run. Skipping active idle polling interval.
2017-09-19 20:03:21:616 PrimeControl: preparing client drivers.
2017-09-19 20:03:21:693 PrimeControl: starting clients...
2017-09-19 20:03:21:693 PrimeControl: starting clients...
2017-09-19 20:03:21:846 PrimeControl: PTDs not used for this benchmark run!
2017-09-19 20:03:21:846 PrimeControl: starting 4 masters.
2017-09-19 20:06:58:005 PrimeControl: waiting on 4 prime client(s).
.
2017-09-19 20:06:59:026 setting hostsReady = true
2017-09-19 20:09:14:401 PrimeControl: Workload and prime controller builds: 80
2017-09-19 20:09:26:996 specvirt: clock sync check completed successfully
2017-09-19 20:09:26:996 specvirt: initiating workload ramp-up.
2017-09-19 20:09:26:996 Polling start time = Tue Sep 19 20:29:27 CST 2017
2017-09-19 20:09:26:996 Polling end time   = Tue Sep 19 22:29:27 CST 2017
2017-09-19 20:14:28:005 PrimeControl: all workloads started.
2017-09-19 20:29:28:037 PrimeControl: all workloads in run time.
2017-09-19 20:29:28:037 PrimeControl: checking polling start response times...
2017-09-19 20:29:28:039 PrimeControl: sleeping for 0 sec
2017-09-19 20:29:28:039 PrimeControl: sending results counter reset command.
2017-09-19 20:29:28:039 PrimeControl: polling for 7200 sec
2017-09-19 20:29:38:050 [ERROR] Received abort signal from wclient1:1096. Terminating.
2017-09-19 20:29:38:050 PrimeControl: sending abortTest() to prime clients.
2017-09-19 20:29:38:051 PrimeControl: [ERROR] startMasters() failed!
[root@client1 SPECvirt]#


Then I check Clientmgr1_1096.out from wclietn1, it shows:
-> 2017-09-19 20:29:52:467 RemoteLoadGen: [ERROR] Unable to contact wclient1:1010
-> 2017-09-19 20:29:52:468 RemoteLoadGen: [ERROR] 1 remote clients, but only 0 responded
-> 2017-09-19 20:29:52:468 SpecwebControl: [ERROR] Client(s) not responding. Aborting test.
-> 2017-09-19 20:29:52:468 RemoteLoadGen: [ERROR] Remote exception setting server reset data collection from wclient1:1010
-> java.rmi.ConnectException: Connection refused to host: 172.21.128.241; nested exception is:
->      java.net.ConnectException: Connection refused (Connection refused)
-> 2017-09-19 20:29:52:468 SpecwebControl: Stopping remote clients.
-> 2017-09-19 20:29:52:470 RemoteLoadGen: 180-second ramp-down starting.

And I check netstat from wclient1, 1010 port is not listened,
[root@wclient1 SPECvirt]# netstat -tunlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name
tcp        0      0 0.0.0.0:111                 0.0.0.0:*                   LISTEN      1714/rpcbind
tcp        0      0 0.0.0.0:38035               0.0.0.0:*                   LISTEN      1758/rpc.statd
tcp        0      0 192.168.122.1:53            0.0.0.0:*                   LISTEN      2551/dnsmasq
tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN      2140/sshd
tcp        0      0 127.0.0.1:631               0.0.0.0:*                   LISTEN      1793/cupsd
tcp        0      0 127.0.0.1:25                0.0.0.0:*                   LISTEN      2226/master
tcp        0      0 :::1088                     :::*                        LISTEN      3863/java
tcp        0      0 :::6596                     :::*                        LISTEN      3797/java
tcp        0      0 :::1096                     :::*                        LISTEN      3797/java
tcp        0      0 :::34346                    :::*                        LISTEN      3863/java
tcp        0      0 :::40459                    :::*                        LISTEN      1758/rpc.statd
tcp        0      0 :::111                      :::*                        LISTEN      1714/rpcbind
tcp        0      0 :::22                       :::*                        LISTEN      2140/sshd
tcp        0      0 ::1:631                     :::*                        LISTEN      1793/cupsd
tcp        0      0 ::1:25                      :::*                        LISTEN      2226/master
udp        0      0 0.0.0.0:67                  0.0.0.0:*                               2551/dnsmasq
udp        0      0 0.0.0.0:993                 0.0.0.0:*                               1645/portreserve
udp        0      0 0.0.0.0:995                 0.0.0.0:*                               1645/portreserve
udp        0      0 0.0.0.0:617                 0.0.0.0:*                               1714/rpcbind
udp        0      0 0.0.0.0:110                 0.0.0.0:*                               1645/portreserve
udp        0      0 0.0.0.0:111                 0.0.0.0:*                               1714/rpcbind
udp        0      0 0.0.0.0:33782               0.0.0.0:*                               1758/rpc.statd
udp        0      0 0.0.0.0:631                 0.0.0.0:*                               1793/cupsd
udp        0      0 0.0.0.0:143                 0.0.0.0:*                               1645/portreserve
udp        0      0 192.168.122.1:53            0.0.0.0:*                               2551/dnsmasq
udp        0      0 127.0.0.1:703               0.0.0.0:*                               1758/rpc.statd
udp        0      0 :::65354                    :::*                                    1758/rpc.statd
udp        0      0 :::617                      :::*                                    1714/rpcbind
udp        0      0 :::111                      :::*                                    1714/rpcbind

And I fount there is no spec process running on wclient1
[root@wclient1 SPECvirt]# ps afx | grep spec
12408 pts/1    S+     0:00          \_ grep spec
[root@wclient1 SPECvirt]#
[root@wclient1 SPECvirt]#
[root@wclient1 SPECvirt]# ps afx | grep client
12416 pts/1    S+     0:00          \_ grep client
 3797 ?        Sl     0:02 java -jar clientmgr.jar -p 1096 -log
 3863 ?        Sl     0:08 java -jar clientmgr.jar -p 1088 -log
[root@wclient1 SPECvirt]#

why?
Title: Re: runspecvirt fail
Post by: zgy on September 19, 2017, 09:05:33 AM
My configure:
[root@client1 SPECvirt]# cat Control.config | grep wclient
PRIME_HOST[0][1] = "wclient1:1096"
WORKLOAD_CLIENTS[1] = "wclient1:1010"


[root@client1 SPECvirt]# cat Clientmgr.sh
# Clientmgr [tile_index]
# Script called from runspecvirt.sh
#
java -jar clientmgr.jar -p 1098 -log > Clientmgr$1_1098.out 2>&1 &
#java -jar clientmgr.jar -p 1096 -log > Clientmgr$1_1096.out 2>&1 &
ssh wclient$1 ". /root/.bash_profile ; cd /opt/SPECvirt ; java -jar clientmgr.jar -p 1096 -log > Clientmgr$1_1096.out 2>&1 & "
ssh wclient$1 ". /root/.bash_profile ; cd /opt/SPECvirt ; java -jar clientmgr.jar -p 1088 -log > Clientmgr$1_1088w.out 2>&1 & "
java -jar clientmgr.jar -p 1094 -log > Clientmgr$1_1094.out 2>&1 &
java -jar clientmgr.jar -p 1092 -log > Clientmgr$1_1092.out 2>&1 &
java -jar clientmgr.jar -p 1088 -log > Clientmgr$1_1088.out 2>&1 &








Title: Re: runspecvirt fail
Post by: lroderic on September 19, 2017, 01:07:33 PM
On both client1 and wclient1, please issue the command ifconfig and post the results here. Also, please post the contents of each client's /etc/hosts.

Lisa
Title: Re: runspecvirt fail
Post by: zgy on September 19, 2017, 08:55:48 PM
[root@client1 SPECvirt]# ifconfig
eth0      Link encap:Ethernet  HWaddr 52:54:00:80:3E:47
          inet addr:172.21.128.242  Bcast:172.21.128.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe80:3e47/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13767853 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11040281 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:21092275871 (19.6 GiB)  TX bytes:1277646784 (1.1 GiB)

eth1      Link encap:Ethernet  HWaddr 52:54:00:80:3E:48
          inet6 addr: fe80::5054:ff:fe80:3e48/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:541182 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:58465062 (55.7 MiB)  TX bytes:508 (508.0 b)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:15244 errors:0 dropped:0 overruns:0 frame:0
          TX packets:15244 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1700356 (1.6 MiB)  TX bytes:1700356 (1.6 MiB)

virbr0    Link encap:Ethernet  HWaddr 52:54:00:79:40:07
          inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)



[root@wclient1 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 52:54:00:A0:4E:9B
          inet addr:172.21.128.241  Bcast:172.21.128.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fea0:4e9b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:10895663 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8802015 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:16612923171 (15.4 GiB)  TX bytes:915310848 (872.9 MiB)

eth1      Link encap:Ethernet  HWaddr 52:54:00:B0:80:D8
          inet6 addr: fe80::5054:ff:feb0:80d8/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:664266 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:71517156 (68.2 MiB)  TX bytes:488 (488.0 b)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:5570 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5570 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:544845 (532.0 KiB)  TX bytes:544845 (532.0 KiB)

virbr0    Link encap:Ethernet  HWaddr 52:54:00:79:40:07
          inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)


[root@client1 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
##

##
# Defaults used for the SPECvirt_sc2013 Example VM Setup Guide.

# External VM-to-client communications
172.24.11.220     infraserver infraserver1
172.24.11.221     webserver webserver1
172.24.11.222     mailserver mailserver1
172.24.11.223     appserver appserver1 specdelivery specemulator
172.24.11.224     dbserver dbserver1
172.24.11.225     batchserver batchserver1

172.21.128.242    client1 specdriver
172.21.128.241    wclient1

# Internal VM-to-VM only communications
10.10.1.220       infraserver1-int
10.10.1.221       webserver1-int
10.10.1.224       appserver1-int
10.10.1.225       dbserver1-int specdb


[root@wclient1 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
##

##
# Defaults used for the SPECvirt_sc2013 Example VM Setup Guide.

# External VM-to-client communications
172.24.11.220     infraserver infraserver1
172.24.11.221     webserver webserver1
172.24.11.222     mailserver mailserver1
172.24.11.223     appserver appserver1 specdelivery specemulator
172.24.11.224     dbserver dbserver1
172.24.11.225     batchserver batchserver1

172.21.128.242    client1 specdriver
172.21.128.241    wclient1

# Internal VM-to-VM only communications
10.10.1.220       infraserver1-int
10.10.1.221       webserver1-int
10.10.1.224       appserver1-int
10.10.1.225       dbserver1-int specdb



As https://www.spec.org/virt_sc2013/docs/SPECvirt_ClientHarnessUserGuide.html#mozTocId969790 said,  /etc/hosts  will be updateed like this:

192.168.122.10 client1
192.168.122.11 wclient1

192.168.122.20 client2
192.168.122.21 wclient2

But VM_EXAMPLE shows us /etc/hosts is :
192.168.1.1 infraserver infraserver1
192.168.1.2 webserver webserver1
192.168.1.3 mailserver mailserver1
192.168.1.4 appserver appserver1 specdelivery specemulator
192.168.1.5 dbserver dbserver1
192.168.1.6 batchserver batchserver1

192.168.1.8 client1 specdriver

So In Multy client "specdriver" is not needed?  Or it is a typo?

Thansk.
Title: Re: runspecvirt fail
Post by: lroderic on September 20, 2017, 11:28:20 AM
Here's your trouble:

eth1      Link encap:Ethernet  HWaddr 52:54:00:80:3E:48
          inet6 addr: fe80::5054:ff:fe80:3e48/64 Scope:Link

and

# Internal VM-to-VM only communications
10.10.1.220       infraserver1-int
...

The clients need to use only one vNIC, and it uses the external network to communicate with the workload VMs. You've got a second vNIC for the internal network on the client, and you need to disable/remove that on the client. In Section 2.4 of the Example VM guide:

   The script also sets up hostnames for the internal network interface (eth1 between the VMs only) to
   use the following internal IP addresses. Only network-intensive workload VMs
have a second interface defined.

I'll update the guide to clarify that the clients only use the external network. Workload VMs use both the external and internal network.
I have a picture but have to figure out how to post it here.

Regarding specdriver, that's not a problem. That needs to be there.

Lisa


Title: Re: runspecvirt fail
Post by: lroderic on September 20, 2017, 12:22:16 PM
Here's the picture. External network (blue) for all workload VMs and clients. Internal network (green) only between infra/web and app/db.

With the internal network, you're isolating the traffic between the VMs to offload it from the external network.

Lisa
Title: Re: runspecvirt fail
Post by: zgy on September 21, 2017, 09:08:42 AM
Thanks Lisa

Now I used 10Ge card and remove the eth1 and result is: (Also fail if not use wclient.)
primectrl.out:
2017-09-21 20:45:38:090 [ERROR] wclient1:1096 (PRIME_HOST[0][1]) failed to enter run phase before start of polling interval!
2017-09-21 20:45:38:090 PrimeControl: dumping polling start response times...
2017-09-21 20:45:38:090 client1:1098 (PRIME_HOST[0][0]) msec after pollStart: 8
2017-09-21 20:45:38:091 [ERROR] wclient1:1096 (PRIME_HOST[0][1]) msec after pollStart: 9223372036854775807
2017-09-21 20:45:38:091 client1:1094 (PRIME_HOST[0][2]) msec after pollStart: 16
2017-09-21 20:45:38:091 client1:1092 (PRIME_HOST[0][3]) msec after pollStart: 7
2017-09-21 20:45:38:091 PrimeControl: [ERROR] one or more workloads failed to start runtime before start of polling interval. Aborting.
2017-09-21 20:45:38:091 PrimeControl: sending abortTest() to prime clients.
2017-09-21 20:45:38:091 PrimeControl: id=0, abortID=-1
2017-09-21 20:45:38:091 PrimeControl: id=1, abortID=-1
2017-09-21 20:45:38:092 PrimeControl: masters[0]=client1:1098
2017-09-21 20:45:38:092 PrimeControl: masters[1]=wclient1:1096
2017-09-21 20:45:38:092 PrimeControl: id=3, abortID=-1
2017-09-21 20:45:38:092 PrimeControl: masters[3]=client1:1092
2017-09-21 20:45:38:092 PrimeControl: id=2, abortID=-1
2017-09-21 20:45:38:092 PrimeControl: masters[2]=client1:1094
2017-09-21 20:45:54:960 PrimeControl: stopping remote client processes
2017-09-21 20:45:56:970 PrimeControl: stopping local client threads
2017-09-21 20:45:56:971 PrimeControl: [ERROR] startMasters() failed!
2017-09-21 20:45:56:971 PrimeControl: terminating run. Please wait...
2017-09-21 20:45:57:977 specvirt: benchmark run failed!
2017-09-21 20:45:57:977 specvirt: Done!

ERROR info from wclient1:
[root@wclient1 logs]# grep -rn ERROR .
./Clientmgr1_1096.out:138:-> Checking keyname:IGNORE_WPD_ERRORS with value: 0for compliance.
./20170921-201929/prime-wclient1_1096.log:131:Checking keyname:IGNORE_WPD_ERRORS with value: 0for compliance.
./20170921-201929/client-172.21.128.241_1010.log:575939:2017-09-21 20:45:46:925 MTFF[178]: [ERROR] Incorrect file size returned. Expected: 240; received: 0
./20170921-201929/client-172.21.128.241_1010.log:575941:2017-09-21 20:45:47:053 MTFF[269]: [ERROR] Incorrect file size returned. Expected: 240; received: 0
./20170921-201929/client-172.21.128.241_1010.log:575942:2017-09-21 20:45:47:052 MTFF[135]: [ERROR] Incorrect file size returned. Expected: 80; received: 0
./20170921-201929/client-172.21.128.241_1010.log:575943:2017-09-21 20:45:47:006 MTFF[249]: [ERROR] Incorrect file size returned. Expected: 260; received: -252
./Clientmgr1_1088w.out:575945:-> 2017-09-21 20:45:46:925 MTFF[178]: [ERROR] Incorrect file size returned. Expected: 240; received: 0
./Clientmgr1_1088w.out:575947:-> 2017-09-21 20:45:47:053 MTFF[269]: [ERROR] Incorrect file size returned. Expected: 240; received: 0
./Clientmgr1_1088w.out:575948:-> 2017-09-21 20:45:47:052 MTFF[135]: [ERROR] Incorrect file size returned. Expected: 80; received: 0
./Clientmgr1_1088w.out:575949:-> 2017-09-21 20:45:47:006 MTFF[249]: [ERROR] Incorrect file size returned. Expected: 260; received: -252
[root@wclient1 logs]# grep -rn Error .
[root@wclient1 logs]#

ERROR info from client1:
[root@client1 20170921-201929]# grep -rn ERROR .
./prime-client1_1094.log:51:RESULT.IMAP_APPEND.100.ERROR"0"
./prime-client1_1094.log:60:RESULT.IMAP_FETCH_NUM_RFC822HEADER.100.ERROR"0"
./prime-client1_1094.log:69:RESULT.IMAP_FETCH_RFC822.100.ERROR"0"
./prime-client1_1094.log:78:RESULT.IMAP_LOGIN.100.ERROR"0"
./prime-client1_1094.log:87:RESULT.IMAP_LOGOUT.100.ERROR"0"
./prime-client1_1094.log:96:RESULT.IMAP_SEARCH_ALL_SUBJECT.100.ERROR"0"
./prime-client1_1094.log:105:RESULT.IMAP_SEARCH_UNSEEN.100.ERROR"0"
./prime-client1_1094.log:114:RESULT.IMAP_SELECT_INBOX.100.ERROR"0"
./prime-client1_1094.log:123:RESULT.IMAP_SELECT_INBOX.SPEC.100.ERROR"0"
./prime-client1_1094.log:132:RESULT.IMAP_UID_STORE_NUM_UNSET_FLAGS_SEEN.100.ERROR"0"
./primectrl.out:73:2017-09-21 20:45:38:090 [ERROR] wclient1:1096 (PRIME_HOST[0][1]) failed to enter run phase before start of polling interval!
./primectrl.out:76:2017-09-21 20:45:38:091 [ERROR] wclient1:1096 (PRIME_HOST[0][1]) msec after pollStart: 9223372036854775807
./primectrl.out:79:2017-09-21 20:45:38:091 PrimeControl: [ERROR] one or more workloads failed to start runtime before start of polling interval. Aborting.
./primectrl.out:91:2017-09-21 20:45:56:971 PrimeControl: [ERROR] startMasters() failed!
./Clientmgr1_1094.out:70:-> RESULT.IMAP_APPEND.100.ERROR"0"
./Clientmgr1_1094.out:79:-> RESULT.IMAP_FETCH_NUM_RFC822HEADER.100.ERROR"0"
./Clientmgr1_1094.out:88:-> RESULT.IMAP_FETCH_RFC822.100.ERROR"0"
./Clientmgr1_1094.out:97:-> RESULT.IMAP_LOGIN.100.ERROR"0"
./Clientmgr1_1094.out:106:-> RESULT.IMAP_LOGOUT.100.ERROR"0"
./Clientmgr1_1094.out:115:-> RESULT.IMAP_SEARCH_ALL_SUBJECT.100.ERROR"0"
./Clientmgr1_1094.out:124:-> RESULT.IMAP_SEARCH_UNSEEN.100.ERROR"0"
./Clientmgr1_1094.out:133:-> RESULT.IMAP_SELECT_INBOX.100.ERROR"0"
./Clientmgr1_1094.out:142:-> RESULT.IMAP_SELECT_INBOX.SPEC.100.ERROR"0"
./Clientmgr1_1094.out:151:-> RESULT.IMAP_UID_STORE_NUM_UNSET_FLAGS_SEEN.100.ERROR"0"
[root@client1 20170921-201929]# grep -rn Error .
./prime-client1_1094.log:136:Error unmarshaling return header; nested exception is:
./Clientmgr1_1094.out:155:-> Error unmarshaling return header; nested exception is:
[root@client1 20170921-201929]#


I donot think the ./prime-client1_1094.log and ./Clientmgr1_1094.out is the reseaon that due to benchmark failed, because befor the Error(Error unmarshaling return header; nested exception is), it had receive a abort signal which i think caused by wclient1:1096.

paste more info about Clientmgr1_1088w.out:
[root@wclient1 logs]# tail -100 Clientmgr1_1088w.out
-> 2017-09-21 20:46:00:031 WorkloadScheduler[2285]: Scheduler stopped.
-> 2017-09-21 20:46:00:031 WorkloadScheduler[2285]: Stopping workload thread
-> 2017-09-21 20:45:48:536 WorkloadScheduler[1291]: Scheduler stopped.
-> 2017-09-21 20:46:00:031 WorkloadScheduler[1291]: Stopping workload thread
-> 2017-09-21 20:45:48:536 WorkloadScheduler[1289]: Scheduler stopped.
-> 2017-09-21 20:46:00:031 WorkloadScheduler[1289]: Stopping workload thread
-> 2017-09-21 20:45:48:535 WorkloadScheduler[1284]: Scheduler stopped.
-> 2017-09-21 20:46:00:031 WorkloadScheduler[1284]: Stopping workload thread
-> 2017-09-21 20:45:48:535 WorkloadScheduler[1281]: Scheduler stopped.
-> 2017-09-21 20:46:00:032 WorkloadScheduler[1281]: Stopping workload thread
-> 2017-09-21 20:45:48:535 WorkloadScheduler[1280]: Scheduler stopped.
-> 2017-09-21 20:46:00:032 WorkloadScheduler[1280]: Stopping workload thread
-> 2017-09-21 20:45:48:534 SPECweb_Support[1337]: State Switch [5,4]
-> 2017-09-21 20:45:48:533 SPECweb_Support[2020]: State Switch [2,1]
-> 2017-09-21 20:46:00:032 WorkloadScheduler[1337]: Scheduler stopped.
-> 2017-09-21 20:46:00:032 WorkloadScheduler[1337]: Stopping workload thread
-> 2017-09-21 20:46:00:032 WorkloadScheduler[2020]: Scheduler stopped.
-> 2017-09-21 20:46:00:032 WorkloadScheduler[2020]: Stopping workload thread
-> 2017-09-21 20:45:48:533 SPECweb_Support[1303]: STATE 2; RESPONSE LENGTH = 20480
-> 2017-09-21 20:46:00:032 SPECweb_Support[1303]: STATE 2; FILE BYTES READ = 29545
-> 2017-09-21 20:45:48:507 SPECweb_Support[2264]: State Switch [2,3]
-> 2017-09-21 20:46:00:032 WorkloadScheduler[2264]: Scheduler stopped.
-> 2017-09-21 20:46:00:032 WorkloadScheduler[2264]: Stopping workload thread
-> 2017-09-21 20:45:48:507 SPECweb_Support[1442]: STATE 2; RESPONSE LENGTH = 20480
-> 2017-09-21 20:46:00:032 SPECweb_Support[1442]: STATE 2; FILE BYTES READ = 29543
-> 2017-09-21 20:45:48:507 SPECweb_Support[2489]: STATE 3; RESPONSE LENGTH = 20480
-> 2017-09-21 20:46:00:032 SPECweb_Support[2489]: STATE 3; FILE BYTES READ = 57407
-> 2017-09-21 20:45:48:507 SPECweb_Support[2447]: State Switch [1,3]
-> 2017-09-21 20:45:48:507 SPECweb_Support[1487]: State Switch [1,1]
-> 2017-09-21 20:46:00:033 WorkloadScheduler[2447]: Scheduler stopped.
-> 2017-09-21 20:46:00:033 WorkloadScheduler[2447]: Stopping workload thread
-> 2017-09-21 20:46:00:033 WorkloadScheduler[1487]: Scheduler stopped.
-> 2017-09-21 20:46:00:033 WorkloadScheduler[1487]: Stopping workload thread
-> 2017-09-21 20:45:48:506 SPECweb_Support[2121]: STATE 1; RESPONSE LENGTH = 20480
-> 2017-09-21 20:46:00:033 SPECweb_Support[2121]: STATE 1; FILE BYTES READ = 19934
-> 2017-09-21 20:45:48:483 SPECweb_Support[1735]: STATE 1; RESPONSE LENGTH = 20480
-> 2017-09-21 20:46:00:033 SPECweb_Support[1735]: STATE 1; FILE BYTES READ = 19749
-> 2017-09-21 20:46:00:034 SPECweb_Support[2121]: State Switch [1,3]
-> 2017-09-21 20:46:00:034 WorkloadScheduler[2121]: Scheduler stopped.
-> 2017-09-21 20:46:00:034 WorkloadScheduler[2121]: Stopping workload thread
-> 2017-09-21 20:45:48:483 SPECweb_Support[1779]: State Switch [0,2]
-> 2017-09-21 20:45:48:483 SPECweb_Support[2062]: STATE 1; RESPONSE LENGTH = 20480
-> 2017-09-21 20:46:00:034 SPECweb_Support[2062]: STATE 1; FILE BYTES READ = 19753
-> 2017-09-21 20:46:00:034 WorkloadScheduler[1779]: Scheduler stopped.
-> 2017-09-21 20:46:00:034 WorkloadScheduler[1779]: Stopping workload thread
-> 2017-09-21 20:45:48:483 SPECweb_Support[1882]: STATE 1; RESPONSE LENGTH = 20480
-> 2017-09-21 20:46:00:034 SPECweb_Support[1882]: STATE 1; FILE BYTES READ = 20324
-> 2017-09-21 20:45:48:483 SPECweb_Support[2387]: STATE 1; RESPONSE LENGTH = 20480
-> 2017-09-21 20:46:00:034 SPECweb_Support[2387]: STATE 1; FILE BYTES READ = 19928
-> 2017-09-21 20:45:48:483 SPECweb_Support[1402]: STATE 1; RESPONSE LENGTH = 20480
-> 2017-09-21 20:46:00:034 SPECweb_Support[1402]: STATE 1; FILE BYTES READ = 19944
-> 2017-09-21 20:45:48:483 SPECweb_Support[1445]: State Switch [2,3]
-> 2017-09-21 20:45:48:483 SPECweb_Support[1810]: STATE 1; RESPONSE LENGTH = 20480
-> 2017-09-21 20:46:00:034 SPECweb_Support[1810]: STATE 1; FILE BYTES READ = 19932
-> 2017-09-21 20:46:00:034 WorkloadScheduler[1445]: Scheduler stopped.
-> 2017-09-21 20:46:00:034 WorkloadScheduler[1445]: Stopping workload thread
-> 2017-09-21 20:45:48:483 SPECweb_Support[2429]: STATE 1; RESPONSE LENGTH = 20480
-> 2017-09-21 20:46:00:035 SPECweb_Support[2429]: STATE 1; FILE BYTES READ = 20379
-> 2017-09-21 20:45:48:483 SPECweb_Support[1997]: STATE 1; RESPONSE LENGTH = 20480
-> 2017-09-21 20:46:00:035 SPECweb_Support[1997]: STATE 1; FILE BYTES READ = 20318
-> 2017-09-21 20:45:48:483 SPECweb_Support[2112]: STATE 1; RESPONSE LENGTH = 20480
-> 2017-09-21 20:46:00:035 SPECweb_Support[2112]: STATE 1; FILE BYTES READ = 20377
-> 2017-09-21 20:45:48:483 SPECweb_Support[2498]: STATE 1; RESPONSE LENGTH = 20480
-> 2017-09-21 20:46:00:035 SPECweb_Support[2498]: STATE 1; FILE BYTES READ = 20383
-> 2017-09-21 20:45:48:483 SPECweb_Support[1339]: STATE 1; RESPONSE LENGTH = 20480
-> 2017-09-21 20:46:00:035 SPECweb_Support[1339]: STATE 1; FILE BYTES READ = 20322
-> 2017-09-21 20:46:00:036 SPECweb_Support[1339]: State Switch [1,3]
-> 2017-09-21 20:46:00:036 WorkloadScheduler[1339]: Scheduler stopped.
-> 2017-09-21 20:46:00:036 WorkloadScheduler[1339]: Stopping workload thread
-> 2017-09-21 20:45:48:482 SPECweb_Support[1538]: State Switch [1,1]
-> 2017-09-21 20:45:48:458 SPECweb_Support[1830]: State Switch [2,3]
-> 2017-09-21 20:46:00:036 WorkloadScheduler[1538]: Scheduler stopped.
-> 2017-09-21 20:46:00:036 WorkloadScheduler[1538]: Stopping workload thread
-> 2017-09-21 20:46:00:036 WorkloadScheduler[1830]: Scheduler stopped.
-> 2017-09-21 20:46:00:036 WorkloadScheduler[1830]: Stopping workload thread
-> 2017-09-21 20:45:48:458 SPECweb_Support[1384]: STATE 5; RESPONSE LENGTH = 20480
-> 2017-09-21 20:46:00:036 SPECweb_Support[1442]: State Switch [2,3]
-> 2017-09-21 20:46:00:036 SPECweb_Support[1384]: STATE 5; FILE BYTES READ = 115794
-> 2017-09-21 20:46:00:036 WorkloadScheduler[1442]: Scheduler stopped.
-> 2017-09-21 20:46:00:036 WorkloadScheduler[1442]: Stopping workload thread
-> 2017-09-21 20:45:48:456 SPECweb_Support[2480]: State Switch [4,3]
-> 2017-09-21 20:45:48:456 SPECweb_Support[1611]: STATE 1; RESPONSE LENGTH = 20480
-> 2017-09-21 20:46:00:037 SPECweb_Support[1611]: STATE 1; FILE BYTES READ = 19924
-> 2017-09-21 20:46:00:037 WorkloadScheduler[2480]: Scheduler stopped.
-> 2017-09-21 20:46:00:037 WorkloadScheduler[2480]: Stopping workload thread
-> 2017-09-21 20:45:48:456 SPECweb_Support[1823]: State Switch [1,3]
-> 2017-09-21 20:46:00:037 WorkloadScheduler[1823]: Scheduler stopped.
-> 2017-09-21 20:46:00:037 WorkloadScheduler[1823]: Stopping workload thread
-> 2017-09-21 20:45:48:456 SPECweb_Support[1541]: STATE 1; RESPONSE LENGTH = 20480
-> 2017-09-21 20:46:00:037 SPECweb_Support[1541]: STATE 1; FILE BYTES READ = 19940
-> 2017-09-21 20:45:48:456 SPECweb_Support[1750]: STATE 1; RESPONSE LENGTH = 20480
-> 2017-09-21 20:46:00:037 SPECweb_Support[1750]: STATE 1; FILE BYTES READ = 20326
-> 2017-09-21 20:45:48:455 SPECweb_Support[1654]: STATE 1; RESPONSE LENGTH = 20480
-> 2017-09-21 20:46:00:037 SPECweb_Support[1654]: STATE 1; FILE BYTES READ = 20322
2017-09-21 20:46:00:346 Terminating processes. Please wait...
2017-09-21 20:46:00:347 Killing client procs ...
2017-09-21 20:46:00:350 Done killing procs ...

Title: Re: runspecvirt fail
Post by: zgy on September 21, 2017, 09:08:52 AM
ifconfig info :

[root@client1 SPECvirt]# ./ifconfig.sh
Tile 1 VMs:
dbserver1:
eth0      Link encap:Ethernet  HWaddr 52:54:00:67:75:DB
          inet addr:172.24.11.224  Bcast:172.24.11.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe67:75db/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:11419 errors:0 dropped:0 overruns:0 frame:0
          TX packets:731 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:794184 (775.5 KiB)  TX bytes:104432 (101.9 KiB)

eth1      Link encap:Ethernet  HWaddr 52:54:00:19:ED:DC
          inet addr:10.10.1.225  Bcast:10.10.1.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe19:eddc/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4080641 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3982337 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:912088505 (869.8 MiB)  TX bytes:1007856027 (961.1 MiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:12246 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12246 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1633832 (1.5 MiB)  TX bytes:1633832 (1.5 MiB)

appserver1:
eth0      Link encap:Ethernet  HWaddr 52:54:00:8A:70:70
          inet addr:172.24.11.223  Bcast:172.24.11.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe8a:7070/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1386278 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6411295 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:253066345 (241.3 MiB)  TX bytes:8402372287 (7.8 GiB)

eth1      Link encap:Ethernet  HWaddr 52:54:00:A6:BE:29
          inet addr:10.10.1.224  Bcast:10.10.1.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fea6:be29/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4027232 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4039657 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1013511798 (966.5 MiB)  TX bytes:906414850 (864.4 MiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:151645 errors:0 dropped:0 overruns:0 frame:0
          TX packets:151645 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:25168594 (24.0 MiB)  TX bytes:25168594 (24.0 MiB)

batchserver1:
eth0      Link encap:Ethernet  HWaddr 52:54:00:62:D2:6D
          inet addr:172.24.11.225  Bcast:172.24.11.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe62:d26d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:11910 errors:0 dropped:0 overruns:0 frame:0
          TX packets:373 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:777703 (759.4 KiB)  TX bytes:64593 (63.0 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:664 errors:0 dropped:0 overruns:0 frame:0
          TX packets:664 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:65257 (63.7 KiB)  TX bytes:65257 (63.7 KiB)

mailserver1:
eth0      Link encap:Ethernet  HWaddr 52:54:00:5F:8F:10
          inet addr:172.24.11.222  Bcast:172.24.11.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe5f:8f10/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:191942 errors:0 dropped:0 overruns:0 frame:0
          TX packets:289904 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:39773093 (37.9 MiB)  TX bytes:322604850 (307.6 MiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:339 errors:0 dropped:0 overruns:0 frame:0
          TX packets:339 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:29220 (28.5 KiB)  TX bytes:29220 (28.5 KiB)

infraserver1:
eth0      Link encap:Ethernet  HWaddr 52:54:00:DD:AF:B8
          inet addr:172.24.11.220  Bcast:172.24.11.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fedd:afb8/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1374108 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1379470 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:97859836 (93.3 MiB)  TX bytes:274606426 (261.8 MiB)

eth1      Link encap:Ethernet  HWaddr 52:54:00:D6:C5:5A
          inet addr:10.10.1.220  Bcast:10.10.1.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fed6:c55a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2310511 errors:0 dropped:0 overruns:0 frame:0
          TX packets:35061596 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:178925296 (170.6 MiB)  TX bytes:52865583612 (49.2 GiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:439 errors:0 dropped:0 overruns:0 frame:0
          TX packets:439 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:38667 (37.7 KiB)  TX bytes:38667 (37.7 KiB)

webserver1:
eth0      Link encap:Ethernet  HWaddr 52:54:00:0B:6B:A2
          inet addr:172.24.11.221  Bcast:172.24.11.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe0b:6ba2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:9155011 errors:0 dropped:0 overruns:0 frame:0
          TX packets:56127619 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1229199744 (1.1 GiB)  TX bytes:75202972358 (70.0 GiB)

eth1      Link encap:Ethernet  HWaddr 52:54:00:2F:3B:74
          inet addr:10.10.1.221  Bcast:10.10.1.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe2f:3b74/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:35106428 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2265609 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:52871140880 (49.2 GiB)  TX bytes:161157414 (153.6 MiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:14024 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14024 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1323402 (1.2 MiB)  TX bytes:1323402 (1.2 MiB)

client1:
eth0      Link encap:Ethernet  HWaddr 52:54:00:4F:9B:F3
          inet addr:172.21.128.242  Bcast:172.21.128.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe4f:9bf3/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6859780 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1654916 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:8832075123 (8.2 GiB)  TX bytes:388737854 (370.7 MiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:4966 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4966 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:643363 (628.2 KiB)  TX bytes:643363 (628.2 KiB)

wclient1:
eth0      Link encap:Ethernet  HWaddr 52:54:00:8A:CB:C8
          inet addr:172.21.128.241  Bcast:172.21.128.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe8a:cbc8/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:54832580 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7834945 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:75118203239 (69.9 GiB)  TX bytes:1021146124 (973.8 MiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:1192 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1192 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:110359 (107.7 KiB)  TX bytes:110359 (107.7 KiB)




[root@client1 SPECvirt]# ./hostname.sh
infraserver1
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
##

##
# Defaults used for the SPECvirt_sc2013 Example VM Setup Guide.

# External VM-to-client communications
172.24.11.220     infraserver infraserver1
172.24.11.221     webserver webserver1
172.24.11.222     mailserver mailserver1
172.24.11.223     appserver appserver1 specdelivery specemulator
172.24.11.224     dbserver dbserver1
172.24.11.225     batchserver batchserver1

172.21.128.242    client1 specdriver
172.21.128.241    wclient1

# Internal VM-to-VM only communications
10.10.1.220       infraserver1-int
10.10.1.221       webserver1-int
10.10.1.224       appserver1-int
10.10.1.225       dbserver1-int specdb

websserver1
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
##

##
# Defaults used for the SPECvirt_sc2013 Example VM Setup Guide.

# External VM-to-client communications
172.24.11.220     infraserver infraserver1
172.24.11.221     webserver webserver1
172.24.11.222     mailserver mailserver1
172.24.11.223     appserver appserver1 specdelivery specemulator
172.24.11.224     dbserver dbserver1
172.24.11.225     batchserver batchserver1

172.21.128.242    client1 specdriver
172.21.128.241    wclient1

# Internal VM-to-VM only communications
10.10.1.220       infraserver1-int
10.10.1.221       webserver1-int
10.10.1.224       appserver1-int
10.10.1.225       dbserver1-int specdb

appserver1:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
##

##
# Defaults used for the SPECvirt_sc2013 Example VM Setup Guide.

# External VM-to-client communications
172.24.11.220     infraserver infraserver1
172.24.11.221     webserver webserver1
172.24.11.222     mailserver mailserver1
172.24.11.223     appserver appserver1 specdelivery specemulator
172.24.11.224     dbserver dbserver1
172.24.11.225     batchserver batchserver1

172.21.128.242    client1 specdriver
172.21.128.241    wclient1

# Internal VM-to-VM only communications
10.10.1.220       infraserver1-int
10.10.1.221       webserver1-int
10.10.1.224       appserver1-int
10.10.1.225       dbserver1-int specdb


dbserver1:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
##

##
# Defaults used for the SPECvirt_sc2013 Example VM Setup Guide.

# External VM-to-client communications
172.24.11.220     infraserver infraserver1
172.24.11.221     webserver webserver1
172.24.11.222     mailserver mailserver1
172.24.11.223     appserver appserver1 specdelivery specemulator
172.24.11.224     dbserver dbserver1
172.24.11.225     batchserver batchserver1

172.21.128.242    client1 specdriver
172.21.128.241    wclient1

# Internal VM-to-VM only communications
10.10.1.220       infraserver1-int
10.10.1.221       webserver1-int
10.10.1.224       appserver1-int
10.10.1.225       dbserver1-int specdb



batchserver1:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
##

##
# Defaults used for the SPECvirt_sc2013 Example VM Setup Guide.

# External VM-to-client communications
172.24.11.220     infraserver infraserver1
172.24.11.221     webserver webserver1
172.24.11.222     mailserver mailserver1
172.24.11.223     appserver appserver1 specdelivery specemulator
172.24.11.224     dbserver dbserver1
172.24.11.225     batchserver batchserver1

172.21.128.242    client1 specdriver
172.21.128.241    wclient1

# Internal VM-to-VM only communications
10.10.1.220       infraserver1-int
10.10.1.221       webserver1-int
10.10.1.224       appserver1-int
10.10.1.225       dbserver1-int specdb

mailserver1:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
##

##
# Defaults used for the SPECvirt_sc2013 Example VM Setup Guide.

# External VM-to-client communications
172.24.11.220     infraserver infraserver1
172.24.11.221     webserver webserver1
172.24.11.222     mailserver mailserver1
172.24.11.223     appserver appserver1 specdelivery specemulator
172.24.11.224     dbserver dbserver1
172.24.11.225     batchserver batchserver1

172.21.128.242  client1 specdriver
172.21.128.241  wclient1

# Internal VM-to-VM only communications
10.10.1.220       infraserver1-int
10.10.1.221       webserver1-int
10.10.1.224       appserver1-int
10.10.1.225       dbserver1-int specdb

wclient1:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
##

##
# Defaults used for the SPECvirt_sc2013 Example VM Setup Guide.

# External VM-to-client communications
172.24.11.220     infraserver infraserver1
172.24.11.221     webserver webserver1
172.24.11.222     mailserver mailserver1
172.24.11.223     appserver appserver1 specdelivery specemulator
172.24.11.224     dbserver dbserver1
172.24.11.225     batchserver batchserver1

172.21.128.242    client1 specdriver
172.21.128.241    wclient1


client1:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
##

##
# Defaults used for the SPECvirt_sc2013 Example VM Setup Guide.

# External VM-to-client communications
172.24.11.220     infraserver infraserver1
172.24.11.221     webserver webserver1
172.24.11.222     mailserver mailserver1
172.24.11.223     appserver appserver1 specdelivery specemulator
172.24.11.224     dbserver dbserver1
172.24.11.225     batchserver batchserver1

172.21.128.242    client1 specdriver
172.21.128.241    wclient1



Title: Re: runspecvirt fail
Post by: lroderic on September 21, 2017, 10:29:17 AM
Thanks for the details. You've got /etc/hosts and the vNICs correct on both the clients and the workload VMs. You're getting further! :)

Code: [Select]
msec after pollStart: ...
means that somewhere a VM is out of time sync. It happens on occasion if you don't regularly sync everything. Before you start a test, run the helper script timesynctiles.sh to sync the time and date on the clients and workload VMs. I always invoke this in my runspecvirt script so that I never encounter this.

I also think you're at a point where you can stop the verbose output of SPECweb in prime-client1_1094.log. In Control.config, DEBUG_LEVEL = 3 is plenty. If you edited webserver's Test.config to use verbose, you can reset that back to the default.

Lisa
Title: Re: runspecvirt fail
Post by: zgy on September 21, 2017, 11:07:23 AM
Thanks ,  and what do you mean "If you edited webserver's Test.config to use verbose";
How to config it ?
Title: Re: runspecvirt fail
Post by: lroderic on September 21, 2017, 11:42:43 AM
Set each webserver's /opt/SPECweb2005/Test.config so that DEBUG_LEVEL = 0.

Lisa
Title: Re: runspecvirt fail
Post by: zgy on September 21, 2017, 11:49:41 AM
I have run the timesynctiles.sh, set DEBUG_LEVEL=1, and rerun it , I notice that timechecktiles.sh will not sync the wclient.
And May be it is the reason of the pre ERROR. ( [ERROR] wclient1:1096 (PRIME_HOST[0][1]) msec after pollStart: 9223372036854775807)

for i in `seq 1 $tiles`;
    do
echo " "
echo $CLIENT$i:
       ssh $CLIENT$i date
echo dbserver$i:
       ssh dbserver$i date
echo appserver$i:
       ssh appserver$i date
echo batchserver$i:
       ssh batchserver$i date
echo mailserver$i:
       ssh mailserver$i date
echo infraserver$i:
       ssh infraserver$i date
echo webserver$i:
       ssh webserver$i date
    done
Title: Re: runspecvirt fail
Post by: zgy on September 21, 2017, 12:03:13 PM
HI,  I rerun , but it fail again  :'(

Because This time I set DEBUG_LEVEL=1, So no show ( [ERROR] wclient1:1096 (PRIME_HOST[0][1]) msec after pollStart: 9223372036854775807) this time


[root@client1 SPECvirt]# cat primectrl.out
2017-09-21 23:31:37:417 Thu Sep 21 23:31:37 CST 2017
2017-09-21 23:31:37:471 RMI server started: client1:9990
2017-09-21 23:31:37:478 [INFO] This is a perf-only benchmark run. Skipping active idle polling interval.
2017-09-21 23:31:37:478 PrimeControl: preparing client drivers.
2017-09-21 23:31:37:555 PrimeControl: starting clients...
2017-09-21 23:31:37:555 PrimeControl: starting clients...
2017-09-21 23:31:37:581 PrimeControl: PTDs not used for this benchmark run!
2017-09-21 23:31:37:582 PrimeControl: starting 4 masters.
2017-09-21 23:35:23:263 PrimeControl: waiting on 4 prime client(s).
.
2017-09-21 23:35:24:287 setting hostsReady = true
2017-09-21 23:37:39:554 PrimeControl: Workload and prime controller builds: 80
2017-09-21 23:37:51:347 specvirt: clock sync check completed successfully
2017-09-21 23:37:51:348 specvirt: initiating workload ramp-up.
2017-09-21 23:37:51:348 Polling start time = Thu Sep 21 23:57:52 CST 2017
2017-09-21 23:37:51:348 Polling end time   = Fri Sep 22 01:57:52 CST 2017
2017-09-21 23:42:53:126 PrimeControl: all workloads started.
2017-09-21 23:57:54:348 [ERROR] wclient1:1096 (PRIME_HOST[0][1]) failed to enter run phase before start of polling interval!
2017-09-21 23:57:54:349 PrimeControl: [ERROR] one or more workloads failed to start runtime before start of polling interval. Aborting.
2017-09-21 23:57:54:349 PrimeControl: sending abortTest() to prime clients.
2017-09-21 23:58:15:726 PrimeControl: stopping remote client processes
2017-09-21 23:58:17:732 PrimeControl: stopping local client threads
2017-09-21 23:58:17:733 PrimeControl: [ERROR] startMasters() failed!
2017-09-21 23:58:17:733 PrimeControl: terminating run. Please wait...
2017-09-21 23:58:18:738 specvirt: benchmark run failed!
2017-09-21 23:58:18:738 specvirt: Done!









[root@wclient1 SPECvirt]# tail -100 Clientmgr1_1096.out
-> Checking keyname:FILE_4_DATA with value: bar,60,0.8for compliance.
-> Checking keyname:CLASS_6_DIST with value: 0.100for compliance.
-> Checking keyname:FILE_2_DATA with value: ccc,34,0.66for compliance.
-> Checking keyname:FILE_31_DATA with value: H_Service_Tag_Unkown,2170,0.0for compliance.
-> Checking keyname:ITERATIONS with value: 1for compliance.
-> Checking keyname:DYN_CMD_0_EXPECTED_PCT with value: 8.1072for compliance.
-> Checking keyname:IGNORE_WPD_ERRORS with value: 0for compliance.
-> Checking keyname:TIME_TOLERABLE with value: 5000for compliance.
-> Checking keyname:THINK_INTERVAL with value: 1for compliance.
-> Checking keyname:MARKER_FREQ with value: 4096for compliance.
-> 2017-09-21 23:35:25:868 SpecwebControl: Setting client overthink time allowed to 72 sec.
-> 2017-09-21 23:35:25:870 SpecwebControl: **** SPECweb2005 benchmark started
-> 2017-09-21 23:35:25:870 SpecwebControl: Not a multi-load or multi-step run..
-> 2017-09-21 23:35:25:873 SpecwebControl: * Running SPECweb_Support workload
-> 2017-09-21 23:35:25:873 SpecwebControl: wclient1:1010 has 2500 connections.(3)
-> 2017-09-21 23:35:25:873 Configuration: Clearing workload.
-> 2017-09-21 23:35:25:874 SpecwebControl: Adding workload(s)...
-> 2017-09-21 23:35:25:874 Configuration: Adding workload 0
-> 2017-09-21 23:35:25:877 RemoteLoadGen: Total clients: 1
-> 2017-09-21 23:35:25:878 RemoteLoadGen: Adding host wclient1:1010
-> 2017-09-21 23:35:25:895 SPECweb_Support: Entered primeClientInit()
-> 2017-09-21 23:35:25:902 HttpRequestSched: server: webserver; port: 80; Web server init() command:
-> GET /support/init.php?BESIM_HOST=infraserver&BESIM_PORT=81&BESIM_URI=/fcgi-bin/besim_fcgi.fcgi&BESIM_PERSISTENT=false&PADDING_DIR=/var/www/html/support/dynamic_padding/&SMARTY_DIR=/var/www/html/Smarty-2.6.26/libs/&SMARTY_SUPPORT_DIR=/var/www/html/support/&SEND_CONTENT_LENGTH=true HTTP/1.1
-> Accept: */*
-> Host: webserver
->
->
->
-> 2017-09-21 23:35:25:909 HttpRequestSched: Web server init command response:
-> HTTP/1.1 200 OK
-> Date: Thu, 21 Sep 2017 15:35:22 GMT
-> Server: Apache/2.2.15 (CentOS)
-> X-Powered-By: PHP/5.3.3
-> Content-Length: 538
-> Content-Type: text/html; charset=UTF-8
->
-> <html>
->      <head>
->              <title>SPECweb2005 Support Workload Init</title>
->      </head>
->      <body>
-> <P>SERVER_SOFTWARE = Apache/2.2.15 (CentOS)</P>
-> <P>REMOTE_ADDR = 172.21.128.241</P>
-> <P>SCRIPT_NAME = /support/init.php</P>
-> <P>QUERY_STRING = BESIM_HOST=infraserver&BESIM_PORT=81&BESIM_URI=/fcgi-bin/besim_fcgi.fcgi&BESIM_PERSISTENT=false&PADDING_DIR=/var/www/html/support/dynamic_padding/&SMARTY_DIR=/var/www/html/Smarty-2.6.26/libs/&SMARTY_SUPPORT_DIR=/var/www/html/support/&SEND_CONTENT_LENGTH=true</P>
-> <P>SERVER_TIME = 1506008122000</P>
->      </body>
-> </html>
->
-> 2017-09-21 23:35:25:910 HttpRequestSched: Server time is: Thu, 21 Sep 2017 15:35:22 GMT
-> 2017-09-21 23:35:25:910 HttpRequestSched: Server date is: 2017-09-21
-> 2017-09-21 23:35:25:910 HttpRequestSched: condGetReq: If-Modified-Since: Thu, 21 Sep 2017 15:35:22 GMT
->
->
-> 2017-09-21 23:35:25:910 SPECweb_Support: LOAD = 2500
-> 2017-09-21 23:35:25:910 HttpRequestSched: BeSim init() command:
-> GET /fcgi-bin/besim_fcgi.fcgi?3&0&1506009035&2500 HTTP/1.1
-> Accept: */*
-> Host: infraserver
->
->
->
-> 2017-09-21 23:35:27:420 HttpRequestSched: Besim init command response:
-> HTTP/1.1 200 OK
-> Date: Thu, 21 Sep 2017 15:35:23 GMT
-> Server: Apache/2.2.15 (CentOS)
-> Cache-control: no-store
-> Content-Length: 306
-> Connection: close
-> Content-Type: text/html; charset=UTF-8
->
-> <html>
-> <head><title>SPECweb2005 BESIM</title></head>
-> <body>
-> <p>SERVER_SOFTWARE = Apache/2.2.15 (CentOS)
-> <p>REMOTE_ADDR = 172.21.128.241
-> <p>SCRIPT_NAME = /fcgi-bin/besim_fcgi.fcgi
-> <p>QUERY_STRING = 3&0&1506009035&2500
-> <pre>
-> 0
-> DONE ResetDate = 20170921, Time=1506009035,Load=2500,SL=25
-> </pre>
-> </body></html>
->
->
-> 2017-09-21 23:35:27:423 RemoteLoadGen: Initializing clients...
-> 2017-09-21 23:35:34:458 RemoteLoadGen: setting client tile ID to 0
-> 2017-09-21 23:35:34:577 SpecwebControl: Waiting for SPECvirt start signal.
-> 2017-09-21 23:37:53:457 sleeping for 1 seconds.
-> 2017-09-21 23:37:54:458 SpecwebControl: *** Starting Iteration 1 of 1
-> 2017-09-21 23:37:54:458 RemoteLoadGen: 300-second ramp-up starting.
-> 2017-09-21 23:37:54:458 RemoteLoadGen: Setting load to 2500
-> 2017-09-21 23:42:55:231 RemoteLoadGen: Finished starting clients.
-> 2017-09-21 23:42:55:235 SpecwebControl: Warming up for 900 seconds.
-> 2017-09-21 23:57:55:238 SpecwebControl: Clearing results.
-> 2017-09-21 23:57:56:460 Received abort signal from prime controller. Terminating.
-> 2017-09-21 23:57:56:462 SpecwebControl: Terminating run. Please wait...
-> 2017-09-21 23:58:00:053 SpecwebControl: Starting 7200-second runtime.
-> 2017-09-21 23:58:00:065 RemoteLoadGen: stopping client wclient1:1010; abort=true
-> 2017-09-21 23:58:17:512 SpecwebControl: Stopping remote clients.
Title: Re: runspecvirt fail
Post by: lroderic on September 21, 2017, 12:15:38 PM
I have run the timesynctiles.sh, set DEBUG_LEVEL=1, and rerun it , I notice that timechecktiles.sh will not sync the wclient.
And May be it is the reason of the pre ERROR. ( [ERROR] wclient1:1096 (PRIME_HOST[0][1]) msec after pollStart: 9223372036854775807)

for i in `seq 1 $tiles`;
    do
echo " "
echo $CLIENT$i:
       ssh $CLIENT$i date
echo dbserver$i:
       ssh dbserver$i date
...

You forgot to set the time/date on the wclient, so add it to timesynctiles.sh.

In Control.config, keep

Code: [Select]
DEBUG_LEVEL = 3
Title: Re: runspecvirt fail
Post by: zgy on September 21, 2017, 12:28:20 PM
ok,  I have add wclient to timesynctiles.sh last time.

I will reopen the DEBUG_LEVEL=3 this time to see what happen.
Title: Re: runspecvirt fail
Post by: zgy on September 21, 2017, 12:58:25 PM
HI Lisa

The error is same:

[root@client1 SPECvirt]# cat primectrl.out
2017-09-22 00:26:46:464 Fri Sep 22 00:26:46 CST 2017
2017-09-22 00:26:46:464 specvirt: maxPreRunTime = 1201
2017-09-22 00:26:46:464 specvirt: runTime = 7200
2017-09-22 00:26:46:465 specvirt: runTime = 7200
2017-09-22 00:26:46:465 specvirt: runTime = 7200
2017-09-22 00:26:46:465 specvirt: runTime = 7200
2017-09-22 00:26:46:465 specvirt: runTime = 600
2017-09-22 00:26:46:465 specvirt: runTime = 600
2017-09-22 00:26:46:465 specvirt: runTime = 600
2017-09-22 00:26:46:465 specvirt: runTime = 600
2017-09-22 00:26:46:521 RMI server started: client1:9990
2017-09-22 00:26:46:529 [INFO] This is a perf-only benchmark run. Skipping active idle polling interval.
2017-09-22 00:26:46:530 PrimeControl: preparing client drivers.
2017-09-22 00:26:46:530 PrimeControl: PRIME_HOST 0 = client1:1098
2017-09-22 00:26:46:530 PrimeControl: PRIME_HOST 1 = wclient1:1096
2017-09-22 00:26:46:530 PrimeControl: PRIME_HOST 2 = client1:1094
2017-09-22 00:26:46:531 PrimeControl: PRIME_HOST 3 = client1:1092
2017-09-22 00:26:46:532 PrimeControl: Master 1: client1:1098
2017-09-22 00:26:46:532 PrimeControl: Master 2: wclient1:1096
2017-09-22 00:26:46:532 PrimeControl: Master 3: client1:1094
2017-09-22 00:26:46:532 PrimeControl: Master 4: client1:1092
2017-09-22 00:26:46:535 PrimeControl: adding host client1:1098
2017-09-22 00:26:46:542 PrimeControl: adding host wclient1:1096
2017-09-22 00:26:46:550 PrimeControl: adding host client1:1094
2017-09-22 00:26:46:556 PrimeControl: adding host client1:1092
2017-09-22 00:26:46:570 First client for 0: 172.21.128.242:1091
2017-09-22 00:26:46:588 First client for 1: 172.21.128.241:1010
2017-09-22 00:26:46:661 First client for 2: 172.21.128.242:1200
2017-09-22 00:26:46:668 First client for 3: 172.21.128.242:1902
2017-09-22 00:26:46:673 PrimeControl: starting clients...
2017-09-22 00:26:46:673 PrimeControl: starting clients...
2017-09-22 00:26:46:674 PrimeControl: clients.length = 4
2017-09-22 00:26:46:674 PrimeControl: clients.length = 4
2017-09-22 00:26:46:674 PrimeControl: clients[0].length = 1
2017-09-22 00:26:46:674 PrimeControl: clients[0].length = 1
2017-09-22 00:26:46:674 PrimeControl: clients[1].length = 1
2017-09-22 00:26:46:674 PrimeControl: starting clients[0][0]: 172.21.128.242:1091
2017-09-22 00:26:46:674 PrimeControl: starting clients[1][0]: 172.21.128.241:1010
2017-09-22 00:26:46:694 PrimeControl: started client: 172.21.128.242:1091
2017-09-22 00:26:46:694 PrimeControl: clients[1].length = 1
2017-09-22 00:26:46:694 PrimeControl: clients[2].length = 1
2017-09-22 00:26:46:694 PrimeControl: starting clients[2][0]: 172.21.128.242:1200
2017-09-22 00:26:46:697 PrimeControl: started client: 172.21.128.242:1200
2017-09-22 00:26:46:697 PrimeControl: clients[3].length = 1
2017-09-22 00:26:46:697 PrimeControl: starting clients[3][0]: 172.21.128.242:1902
2017-09-22 00:26:46:699 PrimeControl: started client: 172.21.128.242:1902
2017-09-22 00:26:46:699 PrimeControl: started client: 172.21.128.241:1010
2017-09-22 00:26:46:699 PrimeControl: clients[2].length = 1
2017-09-22 00:26:46:699 PrimeControl: clients[3].length = 1
2017-09-22 00:26:46:700 PrimeControl: PTDs not used for this benchmark run!
2017-09-22 00:26:46:700 PrimeControl: starting 4 masters.
2017-09-22 00:30:16:456 PrimeControl: master[0][0] sleeping 20 sec.
2017-09-22 00:30:16:456 PrimeControl: master[0][2] sleeping 20 sec.
2017-09-22 00:30:16:456 PrimeControl: master[0][3] sleeping 20 sec.
2017-09-22 00:30:16:456 PrimeControl: master[0][1] sleeping 20 sec.
2017-09-22 00:30:36:486 PrimeControl: waiting on 4 prime client(s).
2017-09-22 00:30:36:817 Sending config to client1:1098
2017-09-22 00:30:36:818 Sending config to client1:1094
2017-09-22 00:30:36:823 Sending config to client1:1092
2017-09-22 00:30:36:860 Sending config to wclient1:1096
2017-09-22 00:30:36:970 specvirt: waiting on 3 prime clients.
2017-09-22 00:30:37:104 specvirt: waiting on 2 prime clients.
.
2017-09-22 00:30:37:507 setting hostsReady = true
2017-09-22 00:30:45:689 specvirt: waiting on 1 prime clients.
2017-09-22 00:32:52:742 PrimeControl: Workload and prime controller builds: 80
2017-09-22 00:32:52:742 PrimeControl: awaiting runtime started signal from prime clients
2017-09-22 00:33:04:418 specvirt: clock sync check completed successfully
2017-09-22 00:33:04:418 specvirt: initiating workload ramp-up.
2017-09-22 00:33:04:418 Polling start time = Fri Sep 22 00:53:05 CST 2017
2017-09-22 00:33:04:419 Polling end time   = Fri Sep 22 02:53:05 CST 2017
2017-09-22 00:38:06:180 PrimeControl: all workloads started.
2017-09-22 00:53:07:418 [ERROR] wclient1:1096 (PRIME_HOST[0][1]) failed to enter run phase before start of polling interval!
2017-09-22 00:53:07:419 PrimeControl: dumping polling start response times...
2017-09-22 00:53:07:419 client1:1098 (PRIME_HOST[0][0]) msec after pollStart: 7
2017-09-22 00:53:07:419 [ERROR] wclient1:1096 (PRIME_HOST[0][1]) msec after pollStart: 9223372036854775807
2017-09-22 00:53:07:419 client1:1094 (PRIME_HOST[0][2]) msec after pollStart: 20
2017-09-22 00:53:07:419 client1:1092 (PRIME_HOST[0][3]) msec after pollStart: 8
2017-09-22 00:53:07:419 PrimeControl: [ERROR] one or more workloads failed to start runtime before start of polling interval. Aborting.
2017-09-22 00:53:07:419 PrimeControl: sending abortTest() to prime clients.
2017-09-22 00:53:07:419 PrimeControl: id=0, abortID=-1
2017-09-22 00:53:07:419 PrimeControl: id=1, abortID=-1
2017-09-22 00:53:07:419 PrimeControl: masters[0]=client1:1098
2017-09-22 00:53:07:420 PrimeControl: masters[1]=wclient1:1096
2017-09-22 00:53:07:420 PrimeControl: id=3, abortID=-1
2017-09-22 00:53:07:420 PrimeControl: id=2, abortID=-1
2017-09-22 00:53:07:420 PrimeControl: masters[3]=client1:1092
2017-09-22 00:53:07:420 PrimeControl: masters[2]=client1:1094
2017-09-22 00:53:28:167 PrimeControl: stopping remote client processes
2017-09-22 00:53:30:176 PrimeControl: stopping local client threads
2017-09-22 00:53:30:177 PrimeControl: [ERROR] startMasters() failed!
2017-09-22 00:53:30:177 PrimeControl: terminating run. Please wait...
2017-09-22 00:53:31:183 specvirt: benchmark run failed!
2017-09-22 00:53:31:183 specvirt: Done!


But Now I have sync on all vms, 
[root@client1 SPECvirt]# ./synccheck.sh
infraserver1:
Fri Sep 22 00:56:19 CST 2017
websserver1:
Fri Sep 22 00:56:20 CST 2017
appserver1:
Fri Sep 22 00:56:20 CST 2017
dbserver1:
Fri Sep 22 00:56:20 CST 2017
batchserver1:
Fri Sep 22 00:56:20 CST 2017
mailserver1:
Fri Sep 22 00:56:20 CST 2017
wclient1:
Fri Sep 22 00:56:21 CST 2017
client1:
Fri Sep 22 00:56:21 CST 2017



Title: Re: runspecvirt fail
Post by: lroderic on September 21, 2017, 01:05:30 PM
Reboot all VMs, rerun the time sync script, and retry. What platform are you on? Are you using the hypervisor tools to sync the VM clock to the hypervisor? Try using NTP on the server, have both client and client sync to that, then sync the VMs to the clients.

Lisa
Title: Re: runspecvirt fail
Post by: zgy on September 21, 2017, 01:09:38 PM
Of course , everytime I run spec,  I have reboot all VMs.

My host is centros7,  and all VM (client \ workload ) is centros 6,  I use kvm hypervisor

I found My physic host date is not sync with vm,  it that matter?

[root@client1 SPECvirt]# ./synccheck.sh
infraserver1:
Fri Sep 22 01:07:22 CST 2017
websserver1:
Fri Sep 22 01:07:23 CST 2017
appserver1:
Fri Sep 22 01:07:23 CST 2017
dbserver1:
Fri Sep 22 01:07:23 CST 2017
batchserver1:
Fri Sep 22 01:07:23 CST 2017
mailserver1:
Fri Sep 22 01:07:23 CST 2017
wclient1:
Fri Sep 22 01:07:24 CST 2017
client1:
Fri Sep 22 01:07:24 CST 2017
[root@client1 SPECvirt]# exit
logout
Connection to 172.21.128.242 closed.
[root@localhost ~]# date
Thu Sep 21 13:07:57 EDT 2017
Title: Re: runspecvirt fail
Post by: lroderic on September 21, 2017, 01:24:25 PM
It's easiest to have everything in sync - hosts, client VMs, workload VMs. In timesynctiles.sh, use the host's hostname as TIMEHOST, then all VMs will sync to the host.

Lisa
Title: Re: runspecvirt fail
Post by: zgy on September 21, 2017, 02:06:53 PM
My client1\wclient1 and workload vm run on different phy host,

So timesynctiles.sh can still work ?

Example client1\wclient1 run on phyname1 and workload vm run on phyname2,

How to run it ? thanks .
Title: Re: runspecvirt fail
Post by: lroderic on September 21, 2017, 02:11:27 PM
It doesn't matter what the time is on the client host. The time sync script gets and sets the time on the client VMs and workload VMs, not the client host.

Please read the comments at the top of timesynctiles.sh.

You need a source for the time (TIMEHOST), so use the host where the VMs run. Let's say the host where the workload VMs run is 172.16.20.20. Put the following in your runspecvirt.sh script:

Code: [Select]
/opt/SPECvirt/timesynctiles.sh client 172.16.20.20

Lisa
Title: Re: runspecvirt fail
Post by: zgy on September 21, 2017, 02:34:19 PM
I have sync the date on host -- client -- workload,  And I will rerun it .

[root@localhost opt]# ./synccheck.sh
infraserver1:
Fri Sep 22 02:28:55 CST 2017
websserver1:
Fri Sep 22 02:28:55 CST 2017
appserver1:
Fri Sep 22 02:28:55 CST 2017
dbserver1:
Fri Sep 22 02:28:56 CST 2017
batchserver1:
Fri Sep 22 02:28:56 CST 2017
mailserver1:
Fri Sep 22 02:28:56 CST 2017
wclient1:
Fri Sep 22 02:28:56 CST 2017
client1:
Fri Sep 22 02:28:56 CST 2017
client-host:
Fri Sep 22 02:28:56 CST 2017
wrokload-host:
Fri Sep 22 02:28:56 CST 2017
Title: Re: runspecvirt fail
Post by: zgy on September 21, 2017, 03:51:33 PM
HI , still fail

2017-09-22 03:26:09:896 setting hostsReady = true
2017-09-22 03:26:18:125 specvirt: waiting on 1 prime clients.
2017-09-22 03:28:25:119 PrimeControl: Workload and prime controller builds: 80
2017-09-22 03:28:25:120 PrimeControl: awaiting runtime started signal from prime clients
2017-09-22 03:28:36:592 specvirt: clock sync check completed successfully
2017-09-22 03:28:36:593 specvirt: initiating workload ramp-up.
2017-09-22 03:28:36:593 Polling start time = Fri Sep 22 03:48:37 CST 2017
2017-09-22 03:28:36:593 Polling end time   = Fri Sep 22 05:48:37 CST 2017
2017-09-22 03:33:37:699 PrimeControl: all workloads started.
2017-09-22 03:48:39:593 [ERROR] wclient1:1096 (PRIME_HOST[0][1]) failed to enter run phase before start of polling interval!
2017-09-22 03:48:39:593 PrimeControl: dumping polling start response times...
2017-09-22 03:48:39:593 client1:1098 (PRIME_HOST[0][0]) msec after pollStart: 9
2017-09-22 03:48:39:593 [ERROR] wclient1:1096 (PRIME_HOST[0][1]) msec after pollStart: 9223372036854775807
2017-09-22 03:48:39:594 client1:1094 (PRIME_HOST[0][2]) msec after pollStart: 17
2017-09-22 03:48:39:594 client1:1092 (PRIME_HOST[0][3]) msec after pollStart: 5
2017-09-22 03:48:39:594 PrimeControl: [ERROR] one or more workloads failed to start runtime before start of polling interval. Aborting.
2017-09-22 03:48:39:594 PrimeControl: sending abortTest() to prime clients.
2017-09-22 03:48:39:594 PrimeControl: id=0, abortID=-1
2017-09-22 03:48:39:594 PrimeControl: id=1, abortID=-1
2017-09-22 03:48:39:594 PrimeControl: masters[0]=client1:1098
2017-09-22 03:48:39:594 PrimeControl: masters[1]=wclient1:1096
2017-09-22 03:48:39:594 PrimeControl: id=2, abortID=-1
2017-09-22 03:48:39:595 PrimeControl: masters[2]=client1:1094
2017-09-22 03:48:39:594 PrimeControl: id=3, abortID=-1
2017-09-22 03:48:39:595 PrimeControl: masters[3]=client1:1092


I have synced , and before run ,reboot all vms,

[root@localhost opt]# ./synccheck.sh
infraserver1:
Fri Sep 22 03:49:21 CST 2017
websserver1:
Fri Sep 22 03:49:21 CST 2017
appserver1:
Fri Sep 22 03:49:21 CST 2017
dbserver1:
Fri Sep 22 03:49:22 CST 2017
batchserver1:
Fri Sep 22 03:49:22 CST 2017
mailserver1:
Fri Sep 22 03:49:22 CST 2017
wclient1:
Fri Sep 22 03:49:22 CST 2017
client1:
Fri Sep 22 03:49:23 CST 2017
client-host:
Fri Sep 22 03:49:22 CST 2017
wrokload-host:
Fri Sep 22 03:49:22 CST 2017


Is there any other Possibility that due to this error ?

Thanks.
Title: Re: runspecvirt fail
Post by: lroderic on September 21, 2017, 03:54:20 PM
Reboot the webserver VM and wclient1 VM and retry.
Title: Re: runspecvirt fail
Post by: zgy on September 21, 2017, 05:17:20 PM
same error:
2017-09-22 04:54:20:103 specvirt: clock sync check completed successfully
2017-09-22 04:54:20:103 specvirt: initiating workload ramp-up.
2017-09-22 04:54:20:104 Polling start time = Fri Sep 22 05:14:21 CST 2017
2017-09-22 04:54:20:104 Polling end time   = Fri Sep 22 07:14:21 CST 2017
2017-09-22 04:59:25:990 PrimeControl: all workloads started.
2017-09-22 05:14:23:104 [ERROR] wclient1:1096 (PRIME_HOST[0][1]) failed to enter run phase before start of polling interval!
2017-09-22 05:14:23:104 PrimeControl: dumping polling start response times...
2017-09-22 05:14:23:104 client1:1098 (PRIME_HOST[0][0]) msec after pollStart: 9
2017-09-22 05:14:23:104 [ERROR] wclient1:1096 (PRIME_HOST[0][1]) msec after pollStart: 9223372036854775807
2017-09-22 05:14:23:104 client1:1094 (PRIME_HOST[0][2]) msec after pollStart: 18
2017-09-22 05:14:23:104 client1:1092 (PRIME_HOST[0][3]) msec after pollStart: 7
2017-09-22 05:14:23:104 PrimeControl: [ERROR] one or more workloads failed to start runtime before start of polling interval. Aborting.
2017-09-22 05:14:23:104 PrimeControl: sending abortTest() to prime clients.
2017-09-22 05:14:23:105 PrimeControl: id=0, abortID=-1
2017-09-22 05:14:23:105 PrimeControl: id=1, abortID=-1
2017-09-22 05:14:23:105 PrimeControl: masters[0]=client1:1098

/opt/SPECvirt/timesynctiles.sh client 172.24.11.91  as been added to runspecvirt.sh
and 172.24.11.91 is the host ip of workload vm run on .


wclient1 has beed added to timesynctiles.sh also, like this:
# Set time on client and VMs for each tile
for i in `seq 1 $tiles`;
    do
echo  $CLIENT$i:
       ssh $CLIENT$i date `ssh $PRIME date +%m%d%H%M.%S`
echo  wclient1:
       ssh wclient1 date `ssh $PRIME date +%m%d%H%M.%S`
echo dbserver$i
       ssh dbserver$i date `ssh $PRIME date +%m%d%H%M.%S`
echo appserver$i
       ssh appserver$i date `ssh $PRIME date +%m%d%H%M.%S`
echo batchserver$i
       ssh batchserver$i date `ssh $PRIME date +%m%d%H%M.%S`
echo mailserver$i
       ssh mailserver$i date `ssh $PRIME date +%m%d%H%M.%S`
echo infraserver$i
       ssh infraserver$i date `ssh $PRIME date +%m%d%H%M.%S`
echo webserver$i
       ssh webserver$i date `ssh $PRIME date +%m%d%H%M.%S`
    done
Title: Re: runspecvirt fail
Post by: zgy on September 21, 2017, 05:26:26 PM
I wander why every time the pollStart number is always the same ?    Like overflow?
104 [ERROR] wclient1:1096 (PRIME_HOST[0][1]) msec after pollStart: 9223372036854775807
Title: Re: runspecvirt fail
Post by: lroderic on September 21, 2017, 05:44:10 PM
Back in https://www.spec.org/forums/index.php?topic=89.msg540#msg540 (https://www.spec.org/forums/index.php?topic=89.msg540#msg540), for some reason I don't understand, you decided to set RAMP_SECONDS and WARMUP_SECONDS for each individual tile and workload:

Code: [Select]
RAMP_SECONDS[0][0] = 1200
RAMP_SECONDS[0][1] = 1200
..
WARMUP_SECONDS[0][0] = 900
WARMUP_SECONDS[0][1] = 900
..

Are you trying to use tile or workload indexing with POLL_INTERVAL_SEC like this?

Code: [Select]
POLL_INTERVAL_SEC[0][0] = 7200
POLL_INTERVAL_SEC[0][1] = 600

If so, get rid of it. Entirely unnecessary and probably the cause of your timing problems. Set:

Code: [Select]
RAMP_SECONDS = 300
WARMUP_SECONDS = 600
POLL_INTERVAL_SEC = 1200

If this doesn't work, it's probably best for you to restart from scratch with the Example VM guide and use what you're learned here to start fresh.

Lisa
Title: Re: runspecvirt fail
Post by: zgy on September 22, 2017, 08:21:14 AM
Still error.

If I set WORKLOAD_LOAD_LEVEL[1] = 500,  It runs ok.
Title: Re: runspecvirt fail
Post by: lroderic on September 22, 2017, 12:35:49 PM
This huge msec after pollStart means that it is actually returning a time that is before the pollStart. The granularity is too low. Which clock timer do you use? tsc has been used for most submissions. Please review the VM XML definition files in a compliant KVM submission. Also set DEBUG = 10 in Control.config to return the actual timer values in addition to just the msec after pollStart values.

Lisa
Title: Re: runspecvirt fail
Post by: zgy on September 29, 2017, 05:43:56 AM
HI,I rebuild the SPEC, and it can works now .

Is there any config to close the log info that locates in prime-client1_109*.log like follows?  thanks.

prime-client1_1096.log
2017-09-29 13:18:34:344,1194608,1191017,0,3591,527103813688,6439174388,1142178,1235,47604,263,392415,0
2017-09-29 13:18:44:346,1196296,1192705,0,3591,528036629793,6450488767,1143780,1236,47689,263,392415,0
2017-09-29 13:18:54:342,1198075,1194484,0,3591,528837322555,6460290109,1145475,1236,47773,263,392415,0
2017-09-29 13:19:04:343,1199785,1196194,0,3591,529603659846,6469362282,1147118,1237,47839,263,392415,0
2017-09-29 13:19:14:345,1201381,1197790,0,3591,530106108549,6475499332,1148659,1240,47891,263,392415,0
2017-09-29 13:19:24:342,1203066,1199475,0,3591,530756785877,6484256698,1150232,1246,47997,263,392415,0
2017-09-29 13:19:34:342,1204799,1201208,0,3591,531657357060,6495094444,1151876,1247,48085,263,392415,0
2017-09-29 13:19:44:343,1206484,1202893,0,3591,532335631319,6502995513,1153500,1249,48144,263,392415,0

prime-client1_1092.log
2017-09-29 11:20:04:901,1,1,2,2,2,2,1,0
2017-09-29 11:20:14:901,1,2,2,2,2,4,2,0
2017-09-29 11:20:24:902,1,3,3,2,3,7,3,0
2017-09-29 11:20:34:901,1,4,0,0,3,7,4,0
2017-09-29 11:20:44:901,1,5,3,0,3,10,5,0
2017-09-29 11:20:54:901,1,6,2,0,3,12,6,0
2017-09-29 11:21:04:901,1,7,13,0,13,25,7,0
2017-09-29 11:21:14:900,1,8,1,0,13,26,8,0

/prime-client1_1094.log:
2017-09-29 11:36:54:902,125574,125574,0,0,12957367,0,3928
2017-09-29 11:37:04:901,126746,126746,0,0,13060825,0,3928
2017-09-29 11:37:14:901,128061,128061,0,0,13148085,0,3928
2017-09-29 11:37:24:902,129573,129573,0,0,13257560,0,3928
2017-09-29 11:37:34:902,130983,130983,0,0,13383548,0,3928
2017-09-29 11:37:44:902,132530,132530,0,0,13484085,0,3928
2017-09-29 11:37:54:902,134015,134015,0,0,13611187,0,3928
2017-09-29 11:38:04:902,135443,135443,0,0,13890547,0,3928
2017-09-29 11:38:14:902,136845,136845,0,0,13972570,0,3928


/prime-client1_1098.log:
2017-09-29 13:11:34:911,170275,0.2,170144,0.4,339858,0.2,453466,1.25
2017-09-29 13:11:44:912,170571,0.2,170431,0.4,340420,0.2,454204,1.25
2017-09-29 13:11:54:915,170849,0.2,170708,0.4,341028,0.2,454968,1.25
2017-09-29 13:12:04:912,171198,0.2,171051,0.4,341801,0.2,456273,1.25
2017-09-29 13:12:14:910,171661,0.2,171528,0.4,342794,0.2,457823,1.25
2017-09-29 13:12:24:912,172211,0.2,172033,0.4,343896,0.2,459
Title: Re: runspecvirt fail
Post by: lroderic on September 29, 2017, 09:38:48 AM
Congratulations on some excellent response times.

> Is there any config to close the log info that locates in prime-client1_109*.log
> like follows?  thanks.
>
> 2017-09-29 13:11:34:911,170275,0.2,170144,0.4,339858,0.2,453466,1.25

Not sure what you're looking for here. Those entries in primectrl.out are the workload info and response times for each of the workload transactions through the test.

This shows that app server response times is 1.25, which means you most likely still have capacity on the host. You can set DEBUG_LEVEL = 0, but that's as low as logging goes. Does that answer your question?

Lisa
Title: Re: runspecvirt fail
Post by: Miles on October 03, 2017, 04:25:19 AM
Hi
Code: [Select]
HI,I rebuild the SPEC, and it can works now .
I encounter the same failure and still pending on it.
Would you please share how to rebuild the SPEC?
Re-setup specvirt on clients or re-install the VMs and scripts or ... ?

Thanks.