Topic: runspecvirt fail

zgy

  • Newbie
  • *
  • Posts: 42
  • Karma: +0/-0
Re: runspecvirt fail
« Reply #15 on: September 11, 2017, 09:05:51 PM »
Yes, I have set /etc/sysctl.conf:
[root@client1 SPECvirt]# tail -20  /etc/sysctl.conf
# SPECvirt_sc2013 tunings for a client
fs.file-max=1000000

net.core.optmem_max = 20000000
net.core.rmem_default = 20000000
net.core.rmem_max = 20000000
net.core.wmem_default = 20000000
net.core.wmem_max = 20000000
net.core.somaxconn = 8192
net.ipv4.tcp_max_tw_buckets = 500000
net.ipv4.tcp_mem = 20000000 20000000 200000000
net.ipv4.tcp_rmem = 20000000 20000000 200000000
net.ipv4.tcp_wmem = 20000000 20000000 200000000
net.ipv4.ip_local_port_range = 4096 65535
net.ipv4.tcp_tw_reuse=1

vm.swappiness = 0
vm.overcommit_memory = 0
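
Editing /etc/sysctl.conf does not change the running kernel until the file is loaded (for example with `sysctl -p`, or at boot), so it can be worth confirming that the live values match the file. A minimal sketch, assuming a POSIX shell with sed and awk; the function names `sysctl_pairs` and `check_sysctl` are made up for this illustration and are not part of SPECvirt:

```shell
# Hypothetical helper: read a sysctl.conf-style file on stdin and print
# "key value" pairs, dropping comments and blank lines and collapsing
# runs of whitespace inside the value.
sysctl_pairs() {
    sed -e 's/#.*//' -e '/^[[:space:]]*$/d' |
    awk -F= '{
        key = $1; val = $2
        gsub(/^[ \t]+|[ \t]+$/, "", key)
        gsub(/^[ \t]+|[ \t]+$/, "", val)
        gsub(/[ \t]+/, " ", val)
        print key, val
    }'
}

# Hypothetical helper: compare every key in the given file with the value
# the running kernel reports, and print one line per difference.
check_sysctl() {
    sysctl_pairs < "$1" | while read -r key want; do
        have=$(sysctl -n "$key" 2>/dev/null | tr -s ' \t' ' ')
        [ "$have" = "$want" ] || echo "MISMATCH $key: running='$have' file='$want'"
    done
}
```

Running `check_sysctl /etc/sysctl.conf` on the client prints nothing when every listed key is already active. Any MISMATCH line suggests the file was edited but not yet loaded.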


What should I do if I want to reduce SIMULTANEOUS_SESSIONS to 1000?
1. On the webserver, set SIMULTANEOUS_SESSIONS=1000 in /opt/SPECweb2005/wafgen/unix/support_downloads_props.rc.
2. Also on the webserver, run:
     cd /opt/SPECweb2005/wafgen
     ./Wafgen unix/support_downloads_props.rc
     ./Wafgen unix/support_image_props.rc
3. Reboot the VM.
4. Rerun the test.

Is that right?

lroderic

  • Moderator
  • Full Member
  • *****
  • Posts: 134
  • Karma: +6/-0
Re: runspecvirt fail
« Reply #16 on: September 11, 2017, 11:56:45 PM »
The best way is to edit Control.config and set WORKLOAD_LOAD_LEVEL[1]=1000. No need to rerun Wafgen. (That recreates the web store to embed the tile number in the data, which you already did.) This is documented in the Client Harness User Guide at https://www.spec.org/virt_sc2013/docs/SPECvirt_ClientHarnessUserGuide.html#mozTocId783824.

Lisa
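
For readers who prefer to script the Control.config change described above, here is a minimal sketch. It assumes the file uses `KEY = value` lines, as the `PRIME_HOST` and `WORKLOAD_CLIENTS` entries quoted elsewhere in this thread do. The function name `set_load_level` is hypothetical; the harness itself only needs the edited file.

```shell
# Hypothetical helper: copy Control.config from stdin to stdout, replacing
# the value of WORKLOAD_LOAD_LEVEL[<index>] with <level>.
# Usage: set_load_level <index> <level> < Control.config > Control.config.new
set_load_level() {
    awk -v key="WORKLOAD_LOAD_LEVEL[$1]" -v level="$2" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)   # ignore leading indentation
        }
        # Match only lines that start with the exact key followed by "=".
        index(line, key) == 1 && substr(line, length(key) + 1) ~ /^[ \t]*=/ {
            print key " = " level
            next
        }
        { print }
    '
}
```

Writing to a new file and comparing it with `diff Control.config Control.config.new` before replacing the original keeps the change easy to review.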

zgy

Re: runspecvirt fail
« Reply #17 on: September 12, 2017, 04:27:11 AM »
Hi, I have now set WORKLOAD_LOAD_LEVEL[1]=500, but the run still reports errors:

[root@client1 SPECvirt]# tail -70 primectrl.out
0,0,2017-09-12 14:58:58:032,36398,10.0,35428,10.0,74389,10.0,388657,7.25
0,1,2017-09-12 14:58:58:023,243904,243841,0,63,111456794488,1296346276,219810,14840,9191,259,391583,10
0,2,2017-09-12 14:58:58:023,513419,435919,77500,0,1253548202,0,35483
0,3,2017-09-12 14:58:58:022,1,715,1,0,26,3040,715,0

0,0,2017-09-12 14:59:08:032,36493,10.0,35501,10.0,74530,10.0,388889,7.25
0,1,2017-09-12 14:59:08:024,244240,244177,0,63,111569524905,1297613194,220133,14852,9192,259,391583,10
0,2,2017-09-12 14:59:08:024,514055,436502,77553,0,1254389601,0,35483
0,3,2017-09-12 14:59:08:022,1,716,1,0,26,3041,716,0

0,0,2017-09-12 14:59:18:031,36539,10.0,35577,10.0,74559,10.0,389200,7.25
0,1,2017-09-12 14:59:18:024,244590,244527,0,63,111751904747,1299481555,220482,14853,9192,259,391583,10
0,2,2017-09-12 14:59:18:024,514848,437093,77755,0,1256951286,0,35483
0,3,2017-09-12 14:59:18:022,1,717,1,0,26,3042,717,0

0,0,2017-09-12 14:59:28:030,36628,10.0,35661,10.0,74767,10.0,389510,7.25
0,1,2017-09-12 14:59:28:024,244870,244807,0,63,111916854639,1301179513,220762,14853,9192,259,391583,10
0,2,2017-09-12 14:59:28:024,515618,437730,77888,0,1257989336,0,35483
0,3,2017-09-12 14:59:28:022,1,718,2,0,26,3044,718,0

0,0,2017-09-12 14:59:38:031,36708,10.0,35741,10.0,74865,10.0,389832,7.25
0,1,2017-09-12 14:59:38:024,245215,245152,0,63,112034351619,1302531274,221081,14878,9193,259,391583,10
0,2,2017-09-12 14:59:38:024,516376,438408,77968,0,1258743364,0,35483
0,3,2017-09-12 14:59:38:022,1,719,1,0,26,3045,719,0

2017-09-12 14:59:47:991 PrimeControl: client1:1092 (PRIME_HOST[0][3]) run complete; numStarted = 3
2017-09-12 14:59:48:015 PrimeControl: stopping result polling.
2017-09-12 14:59:48:015 PrimeControl: waiting for all workloads to stop...
2017-09-12 14:59:48:059 PrimeControl: client1:1094 (PRIME_HOST[0][2]) run complete; numStarted = 2
2017-09-12 14:59:56:446 PrimeControl: client1:1098 (PRIME_HOST[0][0]) run complete; numStarted = 1
2017-09-12 14:59:58:015 PrimeControl: waiting for 1 masters to finish
2017-09-12 15:00:13:016 PrimeControl: waiting for 1 masters to finish
2017-09-12 15:00:28:017 PrimeControl: waiting for 1 masters to finish
2017-09-12 15:00:43:017 PrimeControl: waiting for 1 masters to finish
2017-09-12 15:00:58:018 PrimeControl: waiting for 1 masters to finish
2017-09-12 15:01:13:018 PrimeControl: waiting for 1 masters to finish
2017-09-12 15:01:28:019 PrimeControl: waiting for 1 masters to finish
2017-09-12 15:01:43:019 PrimeControl: waiting for 1 masters to finish
2017-09-12 15:01:58:020 PrimeControl: waiting for 1 masters to finish
2017-09-12 15:02:13:020 PrimeControl: waiting for 1 masters to finish
2017-09-12 15:02:28:021 PrimeControl: waiting for 1 masters to finish
2017-09-12 15:02:43:021 PrimeControl: waiting for 1 masters to finish
2017-09-12 15:02:56:531 PrimeControl: client1:1096 (PRIME_HOST[0][1]) run complete; numStarted = 0
2017-09-12 15:02:56:531 PrimeControl: all workloads stopped
2017-09-12 15:02:56:579 Workload validation errors reported!:
0-1-0 = Iteration 1 failed 95% TIME GOOD QoS. Achieved: 90.1%
0-1-1 = Iteration 1 failed 99% TIME TOLERABLE QoS. Achieved: 96.2%
0-0-0 = Dealer 90% Response Time FAILED
0-0-1 = Dealer Avg. Response Time FAILED
0-0-2 = Purchase Cycle Time Avg. FAILED
0-0-3 = Manage Cycle Time Avg. FAILED
0-0-4 = Browse Cycle Time Avg. FAILED
0-0-5 = Vehicle Purchasing Rate FAILED
0-0-6 = Largeorder Vehicle Purchase Rate FAILED
0-0-7 = Regular Vehicle Purchase Rate FAILED
0-0-8 = LargeOrderLine Vehicle Rate FAILED
0-0-9 = PlannedLines Vehicle Rate FAILED
0-0-10 = Manufacturing 90% Response Time FAILED
2017-09-12 15:02:56:599 PrimeControl: aggregate audit...
2017-09-12 15:02:56:601 PrimeControl: aggregate audit...
2017-09-12 15:02:56:609 PrimeControl: aggregate audit...
2017-09-12 15:02:56:724 PrimeControl: aggregate audit...
2017-09-12 15:02:56:724 PrimeControl: validating aggregate audit...
2017-09-12 15:02:58:842 PrimeControl: stopping clients.
2017-09-12 15:02:58:842 PrimeControl: stopping remote client processes
2017-09-12 15:03:00:852 PrimeControl: stopping local client threads
  > Loading Raw Result File..

2017-09-12 15:03:00:934 PrimeControl: terminating run. Please wait...
2017-09-12 15:03:01:940 specvirt: Done!

I also found many ERROR entries in Clientmgr1_1088.out.

The log follows in the next posts (perf.html is included as well). Thanks.

zgy

Re: runspecvirt fail
« Reply #18 on: September 12, 2017, 04:27:51 AM »
Clientmgr1_1088.out_part0
Clientmgr1_1088.out_part1

zgy

Re: runspecvirt fail
« Reply #19 on: September 12, 2017, 04:28:19 AM »
Clientmgr1_1088.out_part2
Clientmgr1_1088.out_part3

zgy

Re: runspecvirt fail
« Reply #20 on: September 12, 2017, 04:28:42 AM »
Clientmgr1_1088.out_part4
Clientmgr1_1088.out_part5

zgy

Re: runspecvirt fail
« Reply #21 on: September 12, 2017, 04:29:09 AM »
Clientmgr1_1088.out_part6
Clientmgr1_1088.out_part7

zgy

Re: runspecvirt fail
« Reply #22 on: September 12, 2017, 04:29:33 AM »
Clientmgr1_1088.out_part8
Clientmgr1_1088.out_part9

zgy

Re: runspecvirt fail
« Reply #23 on: September 12, 2017, 04:29:56 AM »
Clientmgr1_1088.out_part10
Clientmgr1_1088.out_part11

zgy

Re: runspecvirt fail
« Reply #24 on: September 12, 2017, 04:30:22 AM »
Clientmgr1_1088.out_part12
Clientmgr1_1088.out_part13

zgy

Re: runspecvirt fail
« Reply #25 on: September 12, 2017, 04:30:35 AM »
Clientmgr1_1088.out_part14

lroderic

Re: runspecvirt fail
« Reply #26 on: September 12, 2017, 10:58:48 AM »
0,1,2017-09-12 14:59:38:024,245215,245152,0,63,112034351619,1302531274,221081,14878,9193,259,391583,10
Congratulations, your web/infraserver test ran @ 500 simultaneous sessions. The harness failure for web/infraserver you see is because you reduced the workload from 2500 to 500, so the SPECvirt harness will report that as a validation error. Do you see any errors in Clientmgr1_1096.out?

Would you please post a sample of errors in Clientmgr1_1088.out? Because I'm not seeing problems with web/infraserver in there. These are normal when you have debugging on:

-> 2017-09-12 13:41:37:847 WorkloadScheduler[162]: Thinking for 1000 msec
-> 2017-09-12 13:41:37:860 SPECweb_Support[34]: STATE 4; RESPONSE LENGTH = 20480
-> 2017-09-12 13:41:37:860 SPECweb_Support[34]: STATE 4; FILE BYTES READ = 96572

Your error running at 2500 sessions is because the network is saturated. I don't think you have it set up correctly for performance because app/dbserver should report in the sub-two second range, but it's taking 7.25 sec. The errors the harness is reporting for that are a result of a shortened run, since there isn't time for the appserver to meet the required mix of transaction types.

If you have no chance at upgrading to 10 GbE, look at Appendix B of Client Harness User Guide at https://www.spec.org/virt_sc2013/docs/SPECvirt_ClientHarnessUserGuide.html#mozTocId969790 for instructions on setting up a dedicated client for web.

On webserver and infraserver, please issue the ifconfig command and post the results here. Also, please post Clientmgr1_1096.out.

Lisa
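
The 7.25 sec figure mentioned above matches the last field of the `0,0,...` polling lines in the posted primectrl.out. A rough sketch for pulling that column out of the log follows. It assumes, based only on this thread, that lines beginning with `0,0,` belong to the app/dbserver workload and that their last field is its response time in seconds. The function name `app_response_times` and the default threshold of 2 are illustrative choices, not harness settings.

```shell
# Hypothetical helper: read primectrl.out on stdin and print
# "<timestamp> <last field>" for every polling line of tile 0, workload 0,
# appending " SLOW" when the value exceeds the limit (default: 2).
app_response_times() {
    awk -F, -v limit="${1:-2}" '
        $1 == "0" && $2 == "0" && NF > 3 {
            flag = ($NF + 0 > limit + 0) ? " SLOW" : ""
            print $3, $NF flag
        }'
}
```

For example, `app_response_times < primectrl.out | tail -5` shows the most recent samples. If the assumption about the column meaning holds, a long run of SLOW lines would point at the same bottleneck described above.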

zgy

Re: runspecvirt fail
« Reply #27 on: September 14, 2017, 07:05:33 AM »
Thanks, I will use a 10 GbE network card.

zgy

Re: runspecvirt fail
« Reply #28 on: September 19, 2017, 09:02:24 AM »
I set up a dedicated client for web, following https://www.spec.org/virt_sc2013/docs/SPECvirt_ClientHarnessUserGuide.html#mozTocId969790

But the run fails:
[root@client1 SPECvirt]# cat primectrl.out
2017-09-19 20:03:21:554 Tue Sep 19 20:03:21 CST 2017
2017-09-19 20:03:21:609 RMI server started: client1:9990
2017-09-19 20:03:21:615 [INFO] This is a perf-only benchmark run. Skipping active idle polling interval.
2017-09-19 20:03:21:616 PrimeControl: preparing client drivers.
2017-09-19 20:03:21:693 PrimeControl: starting clients...
2017-09-19 20:03:21:693 PrimeControl: starting clients...
2017-09-19 20:03:21:846 PrimeControl: PTDs not used for this benchmark run!
2017-09-19 20:03:21:846 PrimeControl: starting 4 masters.
2017-09-19 20:06:58:005 PrimeControl: waiting on 4 prime client(s).
.
2017-09-19 20:06:59:026 setting hostsReady = true
2017-09-19 20:09:14:401 PrimeControl: Workload and prime controller builds: 80
2017-09-19 20:09:26:996 specvirt: clock sync check completed successfully
2017-09-19 20:09:26:996 specvirt: initiating workload ramp-up.
2017-09-19 20:09:26:996 Polling start time = Tue Sep 19 20:29:27 CST 2017
2017-09-19 20:09:26:996 Polling end time   = Tue Sep 19 22:29:27 CST 2017
2017-09-19 20:14:28:005 PrimeControl: all workloads started.
2017-09-19 20:29:28:037 PrimeControl: all workloads in run time.
2017-09-19 20:29:28:037 PrimeControl: checking polling start response times...
2017-09-19 20:29:28:039 PrimeControl: sleeping for 0 sec
2017-09-19 20:29:28:039 PrimeControl: sending results counter reset command.
2017-09-19 20:29:28:039 PrimeControl: polling for 7200 sec
2017-09-19 20:29:38:050 [ERROR] Received abort signal from wclient1:1096. Terminating.
2017-09-19 20:29:38:050 PrimeControl: sending abortTest() to prime clients.
2017-09-19 20:29:38:051 PrimeControl: [ERROR] startMasters() failed!
[root@client1 SPECvirt]#


Then I checked Clientmgr1_1096.out on wclient1. It shows:
-> 2017-09-19 20:29:52:467 RemoteLoadGen: [ERROR] Unable to contact wclient1:1010
-> 2017-09-19 20:29:52:468 RemoteLoadGen: [ERROR] 1 remote clients, but only 0 responded
-> 2017-09-19 20:29:52:468 SpecwebControl: [ERROR] Client(s) not responding. Aborting test.
-> 2017-09-19 20:29:52:468 RemoteLoadGen: [ERROR] Remote exception setting server reset data collection from wclient1:1010
-> java.rmi.ConnectException: Connection refused to host: 172.21.128.241; nested exception is:
->      java.net.ConnectException: Connection refused (Connection refused)
-> 2017-09-19 20:29:52:468 SpecwebControl: Stopping remote clients.
-> 2017-09-19 20:29:52:470 RemoteLoadGen: 180-second ramp-down starting.

I also checked netstat on wclient1, and nothing is listening on port 1010:
[root@wclient1 SPECvirt]# netstat -tunlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name
tcp        0      0 0.0.0.0:111                 0.0.0.0:*                   LISTEN      1714/rpcbind
tcp        0      0 0.0.0.0:38035               0.0.0.0:*                   LISTEN      1758/rpc.statd
tcp        0      0 192.168.122.1:53            0.0.0.0:*                   LISTEN      2551/dnsmasq
tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN      2140/sshd
tcp        0      0 127.0.0.1:631               0.0.0.0:*                   LISTEN      1793/cupsd
tcp        0      0 127.0.0.1:25                0.0.0.0:*                   LISTEN      2226/master
tcp        0      0 :::1088                     :::*                        LISTEN      3863/java
tcp        0      0 :::6596                     :::*                        LISTEN      3797/java
tcp        0      0 :::1096                     :::*                        LISTEN      3797/java
tcp        0      0 :::34346                    :::*                        LISTEN      3863/java
tcp        0      0 :::40459                    :::*                        LISTEN      1758/rpc.statd
tcp        0      0 :::111                      :::*                        LISTEN      1714/rpcbind
tcp        0      0 :::22                       :::*                        LISTEN      2140/sshd
tcp        0      0 ::1:631                     :::*                        LISTEN      1793/cupsd
tcp        0      0 ::1:25                      :::*                        LISTEN      2226/master
udp        0      0 0.0.0.0:67                  0.0.0.0:*                               2551/dnsmasq
udp        0      0 0.0.0.0:993                 0.0.0.0:*                               1645/portreserve
udp        0      0 0.0.0.0:995                 0.0.0.0:*                               1645/portreserve
udp        0      0 0.0.0.0:617                 0.0.0.0:*                               1714/rpcbind
udp        0      0 0.0.0.0:110                 0.0.0.0:*                               1645/portreserve
udp        0      0 0.0.0.0:111                 0.0.0.0:*                               1714/rpcbind
udp        0      0 0.0.0.0:33782               0.0.0.0:*                               1758/rpc.statd
udp        0      0 0.0.0.0:631                 0.0.0.0:*                               1793/cupsd
udp        0      0 0.0.0.0:143                 0.0.0.0:*                               1645/portreserve
udp        0      0 192.168.122.1:53            0.0.0.0:*                               2551/dnsmasq
udp        0      0 127.0.0.1:703               0.0.0.0:*                               1758/rpc.statd
udp        0      0 :::65354                    :::*                                    1758/rpc.statd
udp        0      0 :::617                      :::*                                    1714/rpcbind
udp        0      0 :::111                      :::*                                    1714/rpcbind

I also found that no spec process is running on wclient1:
[root@wclient1 SPECvirt]# ps afx | grep spec
12408 pts/1    S+     0:00          \_ grep spec
[root@wclient1 SPECvirt]#
[root@wclient1 SPECvirt]#
[root@wclient1 SPECvirt]# ps afx | grep client
12416 pts/1    S+     0:00          \_ grep client
 3797 ?        Sl     0:02 java -jar clientmgr.jar -p 1096 -log
 3863 ?        Sl     0:08 java -jar clientmgr.jar -p 1088 -log
[root@wclient1 SPECvirt]#

Why?
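
The manual netstat check above can be turned into a small repeatable test, which is handy when retrying the run several times. This is a sketch only: it parses the column layout of `netstat -tunlp` from net-tools exactly as shown in this post, and the function name `port_listening` is invented for the example. It reports whether a port is listening; it does not explain why the process behind it is missing.

```shell
# Hypothetical helper: read `netstat -tunlp` output on stdin and succeed
# (exit status 0) only if some TCP socket is in LISTEN state on the given port.
port_listening() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            # The local address is field 4; the port follows the last ":".
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }'
}
```

On wclient1 this could be used as `netstat -tunlp | port_listening 1010 && echo listening || echo "not listening"`. Note that `ss -tln` prints its columns in a different order, so the field numbers would need adjusting for it.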

zgy

Re: runspecvirt fail
« Reply #29 on: September 19, 2017, 09:05:33 AM »
My configuration:
[root@client1 SPECvirt]# cat Control.config | grep wclient
PRIME_HOST[0][1] = "wclient1:1096"
WORKLOAD_CLIENTS[1] = "wclient1:1010"


[root@client1 SPECvirt]# cat Clientmgr.sh
# Clientmgr [tile_index]
# Script called from runspecvirt.sh
#
java -jar clientmgr.jar -p 1098 -log > Clientmgr$1_1098.out 2>&1 &
#java -jar clientmgr.jar -p 1096 -log > Clientmgr$1_1096.out 2>&1 &
ssh wclient$1 ". /root/.bash_profile ; cd /opt/SPECvirt ; java -jar clientmgr.jar -p 1096 -log > Clientmgr$1_1096.out 2>&1 & "
ssh wclient$1 ". /root/.bash_profile ; cd /opt/SPECvirt ; java -jar clientmgr.jar -p 1088 -log > Clientmgr$1_1088w.out 2>&1 & "
java -jar clientmgr.jar -p 1094 -log > Clientmgr$1_1094.out 2>&1 &
java -jar clientmgr.jar -p 1092 -log > Clientmgr$1_1092.out 2>&1 &
java -jar clientmgr.jar -p 1088 -log > Clientmgr$1_1088.out 2>&1 &