Topic: Fail to run benchmark with runspecvirt.sh

Tang

  • Newbie
  • *
  • Posts: 22
  • Karma: +0/-0
Fail to run benchmark with runspecvirt.sh
« on: March 16, 2018, 02:54:00 AM »
Hi,

I synchronized the clocks and started the benchmark with ./runspecvirt.sh. When I checked primectrl.out afterwards, it had not been updated for a few hours.
No errors are printed to the logs, but the run has probably hit one. How can I track it down?

Thanks a lot.

Control.config setting:
NUM_WORKLOADS=1

OS: CentOS 6.5
Java version on app servers: java version "1.8.0_112"
Java version on other servers and client: java version "1.7.0_45"
GlassFish: glassfish-4.0
GlassFish log: attached below
Apache Tomcat: apache-tomcat-6.0.44
primectrl.out :

2018-03-16 17:05:55:213 Fri Mar 16 17:05:55 CST 2018
2018-03-16 17:05:55:214 specvirt: maxPreRunTime = 1201
2018-03-16 17:05:55:214 specvirt: runTime = 7200
2018-03-16 17:05:55:214 specvirt: runTime = 600
2018-03-16 17:05:55:217 Validator: [WARNING] NUM_WORKLOADS value is: 1; should be 4
2018-03-16 17:05:55:218 Validator: [WARNING] Non-compliant configuration.
2018-03-16 17:05:55:218 [WARNING] This will be a non-compliant benchmark result!
2018-03-16 17:05:55:256 RMI server started: client1:9990
2018-03-16 17:05:55:261 [INFO] This is a perf-only benchmark run. Skipping active idle polling interval.
2018-03-16 17:05:55:261 PrimeControl: preparing client drivers.
2018-03-16 17:05:55:261 PrimeControl: PRIME_HOST 0 = client1:1098
2018-03-16 17:05:55:262 PrimeControl: Master 1: client1:1098
2018-03-16 17:05:55:264 PrimeControl: adding host client1:1098
2018-03-16 17:05:55:283 First client for 0: 192.1.1.7:1091
2018-03-16 17:05:55:289 PrimeControl: starting clients...
2018-03-16 17:05:55:289 PrimeControl: clients.length = 1
2018-03-16 17:05:55:289 PrimeControl: clients[0].length = 1
2018-03-16 17:05:55:289 PrimeControl: starting clients[0][0]: 192.1.1.7:1091
2018-03-16 17:05:55:305 PrimeControl: started client: 192.1.1.7:1091
2018-03-16 17:05:55:305 PrimeControl: PTDs not used for this benchmark run!
2018-03-16 17:05:55:306 PrimeControl: starting 1 masters.
2018-03-16 17:05:55:309 PrimeControl: master[0][0] sleeping 20 sec.
2018-03-16 17:06:15:327 PrimeControl: waiting on 1 prime client(s).
2018-03-16 17:06:15:516 Sending config to client1:1098
.
2018-03-16 17:06:16:339 setting hostsReady = true
2018-03-16 17:08:31:274 PrimeControl: Workload and prime controller builds: 79
2018-03-16 17:08:31:274 PrimeControl: awaiting runtime started signal from prime clients
2018-03-16 17:08:31:274 PrimeControl: all workloads started.

prime-client1_1098.log:

2018-03-16 17:06:15:428 Looking up SPECvirt controller: client1
2018-03-16 17:06:15:516 masterID: 0, tile: 0, workload: 0
2018-03-16 17:06:15:516 hostname: client1
Hostname of prime client: specclient1
2018-03-16 17:06:15:536 Fri Mar 16 17:05:55 CST 2018
2018-03-16 17:06:15:562 RMI server started: client1:9900
2018-03-16 17:06:15:562 Total clients: 1
2018-03-16 17:06:15:562 Adding host client1:1091
2018-03-16 17:06:15:571 Setting up clients...
2018-03-16 17:06:16:337 calling getHostVM() on jappclient...
2018-03-16 17:08:31:275 Starting drivers and waiting for Steady State...


client-192.1.1.7_1091.log:

2018-03-16 17:05:55:465 Creating jappclient using RMI Registry port 1091
2018-03-16 17:05:55:500 specclient1:1091 ready...
Driver Host: specclient1                Tile Number:0
2018-03-16 17:06:15:597 matchOut() messages set ...
2018-03-16 17:06:15:598 Starting rmiregistry; bindWait = 45000
2018-03-16 17:07:00:598 Starting Controller; bindWait = 45000
2018-03-16 17:07:00:603 waiting for: Binding controller to /
rec'd notifyInterrupt(101) call
Binding controller to //specclient1:2098/Controller
Launcher: done in waitMatch(0)
2018-03-16 17:07:46:168 Starting Agents
---------------//specclient1:2098/Controller
Calling switchLog as master
url[0] is : http://specemulator:8080/Emulator/EmulatorServlet?cmd=switchlog
url[1] is : http://specdelivery:8000/Supplier/DeliveryServlet?cmd=switchlog
calling driver.waitMatch(0)...
2018-03-16 17:08:31:277 waiting for: waiting2ramp
RunID for this run is : 17
Output directory for this run is : /opt/SPECjAppServer2004/output/17
loadFactor=5
changeRate=30
burstyCurve from run.properties=37,72,61,87,132,77,0,49,137,93,187,103,174,138,200,173,153,107,225,44,36,44,48,68,138,125,116,88,38,50
scaleFactor=1.0
Curve avg txRate = 100.0
maxTxRate=225
tileNumber=0
Will run in bursty mode after rampup/warmup phases. Starting at burstPoint:0
WarmUp style = 0 (0=linear only, 1=burstycurve, 2=zigzag)
Phase one of warm up (start of transaction activity) will increase IR from 0 to 100 linearly, over 900 seconds.
Steady-State IR transition stepRate(ms)=40000
Burst Curve StartPoint Tile Multiplier=7
smoothFactor=1
Using default timeSkewTolerance value: 3
Mar 16, 2018 5:08:32 PM com.sun.enterprise.v3.server.CommonClassLoaderServiceImpl findDerbyClient
INFO: Cannot find javadb client jar file, derby jdbc driver will not be available by default.
Mar 16, 2018 5:08:32 PM org.glassfish.enterprise.iiop.impl.GlassFishORBManager getCorbalocURL
INFO: list ==> specdelivery:3700
Mar 16, 2018 5:08:32 PM org.glassfish.enterprise.iiop.impl.GlassFishORBManager getCorbalocURL
INFO: corbaloc url ==> iiop:1.2@specdelivery:3700
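
When a run stalls like this, it can help to see when each log last advanced. The sketch below is a hypothetical helper (`last_stamp` is not part of SPECvirt) that assumes the "YYYY-MM-DD HH:MM:SS:mmm" prefix seen in the logs above and prints the last such timestamp found in a log:

```sh
# last_stamp: read a log on stdin and print the timestamp of the last line
# that starts with "YYYY-MM-DD HH:MM:SS:mmm". Lines without that prefix
# (for example the GlassFish "INFO:" lines) are ignored.
last_stamp() {
    awk '
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9]/ { stamp = $1 " " $2 }
        END { if (stamp != "") print stamp }
    '
}

# Example: last_stamp < primectrl.out
```

For the primectrl.out listing above this prints 2018-03-16 17:08:31:274, the time of the "all workloads started" line.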

Tang

« Reply #1 on: March 16, 2018, 02:56:58 AM »
I also searched the GlassFish server log for 'error' and didn't find any matches.

ChrisFloyd

  • Moderator
  • Jr. Member
  • *****
  • Posts: 52
  • Karma: +2/-0
« Reply #2 on: March 21, 2018, 10:38:50 AM »
Tang,

I see one location where it appears to be connecting to the client with hostname client1, but the logs refer to hostname specclient1 elsewhere, and the client appears to be configured by numeric IP in the Control.config file.  Can you please provide the Control.config file used for this test, and a brief summary of your client topology and hostnames/IPs?  Thanks,

Tang

« Reply #3 on: March 21, 2018, 10:24:27 PM »
Hi ChrisFloyd,

The file below is /etc/hosts on client1. On the other six VM servers the file is the same except for the local hostname at the end of the 127.0.0.1 line; for example, on infraserver it is changed to 'infraserver1'.
All VM servers and the client are connected through a virtual eth0 on 192.1.1.x. Infraserver, webserver, appserver, dbserver and the client are also connected through a virtual eth1 on 192.2.1.x.
Mailserver and batchserver have no virtual eth1. All servers and the client currently run on one physical machine.
Control.config is attached below.

/etc/hosts:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 specclient1
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.1.1.1          infraserver1 infraserver1-ext infraserver
192.1.1.2          webserver1 webserver1-ext webserver
192.1.1.3          mailserver1 mailserver1-ext mailserver
192.1.1.4          appserver1 appserver1-ext specdelivery specemulator appserver
192.1.1.5          dbserver1  dbserver1-ext specdb dbserver
192.1.1.6          batchserver1 batchserver1-ext batchserver

192.2.1.1          infraserver1-int besim infraserver-int
192.2.1.2          webserver1-int webserver-int
192.2.1.3          mailserver1-int mailserver-int
192.2.1.4          appserver1-int appserver-int
192.2.1.5          dbserver1-int dbserver-int
192.2.1.6          batchserver1-int batchserver-int

192.1.1.7   specclient1 client1 specdriver specclient
192.1.2.7   specclient2 client2
192.1.3.7   specclient3 client3
192.1.4.7   specclient4 client4
192.1.5.7   specclient5 client5
192.1.6.7   specclient6 client6
192.1.7.7   specclient7 client7
192.1.8.7   specclient8 client8
192.1.9.7   specclient9 client9

Thanks.

Tang

« Reply #4 on: March 28, 2018, 04:28:15 AM »
I've changed the client hostname to client1: I ran "hostname client1" and set HOSTNAME=client1 in /etc/sysconfig/network.
The problem is still the same.

/etc/hosts:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 client1
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.1.1.1          infraserver1 infraserver1-ext infraserver
192.1.1.2          webserver1 webserver1-ext webserver
192.1.1.3          mailserver1 mailserver1-ext mailserver
192.1.1.4          appserver1 appserver1-ext specdelivery specemulator appserver
192.1.1.5          dbserver1  dbserver1-ext specdb dbserver
192.1.1.6          batchserver1 batchserver1-ext batchserver

192.2.1.1          infraserver1-int besim infraserver-int
192.2.1.2          webserver1-int webserver-int
192.2.1.3          mailserver1-int mailserver-int
192.2.1.4          appserver1-int appserver-int
192.2.1.5          dbserver1-int dbserver-int
192.2.1.6          batchserver1-int batchserver-int

192.1.1.7   specclient1 client1 specdriver specclient
192.1.2.7   specclient2 client2
192.1.3.7   specclient3 client3
192.1.4.7   specclient4 client4
192.1.5.7   specclient5 client5
192.1.6.7   specclient6 client6
192.1.7.7   specclient7 client7
192.1.8.7   specclient8 client8
192.1.9.7   specclient9 client9

ChrisFloyd

« Reply #5 on: March 28, 2018, 02:28:21 PM »
Tang,

We are reviewing the logs to see if we can identify the problem.

While we review, have you disabled the firewall on all systems (or at least the master, clients, appserver, and dbserver in this case)?  Does the test eventually complete but with no appserver result?  Can you try setting POLL_INTERVAL_SEC = 180 and see if the end-of-run messages are seen in the prime controller output?
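
A rough way to run the firewall check on several systems at once is sketched below. It assumes CentOS 6 style `service iptables status` output and password-less ssh to each host; the function names and the `REMOTE_SH` variable are invented for this example and are not part of SPECvirt.

```sh
# firewall_state: read `service iptables status` output on stdin.
# Prints "off" when it contains the CentOS 6 "not running" message,
# otherwise "check" (running, or the status could not be read).
firewall_state() {
    if grep -q 'Firewall is not running'; then
        echo off
    else
        echo check
    fi
}

# check_firewalls HOST...
# Query both services on each host over ssh. REMOTE_SH may name another
# command with the same calling convention (an assumption made here so
# the loop can be tried without real hosts).
check_firewalls() {
    for host in "$@"; do
        for svc in iptables ip6tables; do
            state=$(${REMOTE_SH:-ssh} "$host" service "$svc" status 2>&1 </dev/null | firewall_state)
            printf '%s %s %s\n' "$host" "$svc" "$state"
        done
    done
}

# Example: check_firewalls client1 appserver dbserver
```

Any host reported as "check" needs a manual look, since the sketch cannot tell a running firewall from a failed ssh connection.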

Thanks,

-Chris


Tang

« Reply #6 on: March 29, 2018, 01:48:50 AM »
Hi Chris,

I have disabled the firewall on all VM servers and the client, and double-checked with the commands "service iptables status" and "service ip6tables status" on all VMs.
They return:
iptables: Firewall is not running.
ip6tables: Firewall is not running.
Both services, "iptables" and "ip6tables", are turned off and set not to start automatically.

The firewall on the physical machine that hosts all the VMs is inactive. (It is an Ubuntu machine; checked with "ufw status".)

I set POLL_INTERVAL_SEC=180 and reran; the problem is similar. New logs are attached below.

primectrl.out
2018-03-29 11:24:26:144 Thu Mar 29 11:24:26 CST 2018
2018-03-29 11:24:26:145 specvirt: maxPreRunTime = 1201
2018-03-29 11:24:26:145 specvirt: runTime = 180
2018-03-29 11:24:26:145 specvirt: runTime = 600
2018-03-29 11:24:26:148 Validator: [WARNING] POLL_INTERVAL_SEC value is: 180; should be 7200 or greater.
2018-03-29 11:24:26:148 Validator: [WARNING] NUM_WORKLOADS value is: 1; should be 4
2018-03-29 11:24:26:148 Validator: [WARNING] Non-compliant configuration.
2018-03-29 11:24:26:148 [WARNING] This will be a non-compliant benchmark result!
2018-03-29 11:24:26:245 RMI server started: client1:9990
2018-03-29 11:24:26:250 [INFO] This is a perf-only benchmark run. Skipping active idle polling interval.
2018-03-29 11:24:26:251 PrimeControl: preparing client drivers.
2018-03-29 11:24:26:251 PrimeControl: PRIME_HOST 0 = client1:1098
2018-03-29 11:24:26:252 PrimeControl: Master 1: client1:1098
2018-03-29 11:24:26:259 PrimeControl: adding host client1:1098
2018-03-29 11:24:26:271 First client for 0: 127.0.0.1:1091
2018-03-29 11:24:26:277 PrimeControl: starting clients...
2018-03-29 11:24:26:277 PrimeControl: clients.length = 1
2018-03-29 11:24:26:277 PrimeControl: clients[0].length = 1
2018-03-29 11:24:26:277 PrimeControl: starting clients[0][0]: 127.0.0.1:1091
2018-03-29 11:24:26:293 PrimeControl: started client: 127.0.0.1:1091
2018-03-29 11:24:26:293 PrimeControl: PTDs not used for this benchmark run!
2018-03-29 11:24:26:293 PrimeControl: starting 1 masters.
2018-03-29 11:24:26:297 PrimeControl: master[0][0] sleeping 20 sec.
2018-03-29 11:24:46:313 PrimeControl: waiting on 1 prime client(s).
2018-03-29 11:24:46:552 Sending config to client1:1098
.
2018-03-29 11:24:47:326 setting hostsReady = true
2018-03-29 11:27:02:252 PrimeControl: Workload and prime controller builds: 79
2018-03-29 11:27:02:253 PrimeControl: awaiting runtime started signal from prime clients
2018-03-29 11:27:02:254 PrimeControl: all workloads started.

client-127.0.0.1_1091.log
2018-03-29 11:24:26:448 Creating jappclient using RMI Registry port 1091
2018-03-29 11:24:26:538 client1:1091 ready...
Driver Host: client1      Tile Number:0
2018-03-29 11:24:46:622 matchOut() messages set ...
2018-03-29 11:24:46:623 Starting rmiregistry; bindWait = 45000
2018-03-29 11:25:31:623 Starting Controller; bindWait = 45000
2018-03-29 11:25:31:635 waiting for: Binding controller to /
rec'd notifyInterrupt(101) call
Binding controller to //client1:2098/Controller
Launcher: done in waitMatch(0)
2018-03-29 11:26:17:119 Starting Agents
---------------//client1:2098/Controller
Calling switchLog as master
url[0] is : http://specemulator:8080/Emulator/EmulatorServlet?cmd=switchlog
url[1] is : http://specdelivery:8000/Supplier/DeliveryServlet?cmd=switchlog
calling driver.waitMatch(0)...
2018-03-29 11:27:02:256 waiting for: waiting2ramp
RunID for this run is : 29
Output directory for this run is : /opt/SPECjAppServer2004/output/29
loadFactor=5
changeRate=30
burstyCurve from run.properties=37,72,61,87,132,77,0,49,137,93,187,103,174,138,200,173,153,107,225,44,36,44,48,68,138,125,116,88,38,50
scaleFactor=1.0
Curve avg txRate = 100.0
maxTxRate=225
tileNumber=0
Will run in bursty mode after rampup/warmup phases. Starting at burstPoint:0
WarmUp style = 0 (0=linear only, 1=burstycurve, 2=zigzag)
Phase one of warm up (start of transaction activity) will increase IR from 0 to 100 linearly, over 900 seconds.
Steady-State IR transition stepRate(ms)=40000
Burst Curve StartPoint Tile Multiplier=7
smoothFactor=1
Using default timeSkewTolerance value: 3
Mar 29, 2018 11:27:02 AM org.glassfish.enterprise.iiop.impl.GlassFishORBManager getCorbalocURL
INFO: list ==> specdelivery:3700
Mar 29, 2018 11:27:02 AM org.glassfish.enterprise.iiop.impl.GlassFishORBManager getCorbalocURL
INFO: corbaloc url ==> iiop:1.2@specdelivery:3700

prime-client1_1098.log
2018-03-29 11:24:46:434 Looking up SPECvirt controller: client1
2018-03-29 11:24:46:551 masterID: 0, tile: 0, workload: 0
2018-03-29 11:24:46:551 hostname: client1
Hostname of prime client: client1
2018-03-29 11:24:46:577 Thu Mar 29 11:24:26 CST 2018
2018-03-29 11:24:46:589 RMI server started: client1:9900
2018-03-29 11:24:46:590 Total clients: 1
2018-03-29 11:24:46:590 Adding host client1:1091
2018-03-29 11:24:46:598 Setting up clients...
2018-03-29 11:24:47:324 calling getHostVM() on jappclient...
2018-03-29 11:27:02:254 Starting drivers and waiting for Steady State...

Thanks.

lroderic

  • Moderator
  • Full Member
  • *****
  • Posts: 167
  • Karma: +6/-0
« Reply #7 on: April 11, 2018, 06:02:48 PM »
Is the polling process running on all workload VMs (not on the client)?

Code:
# cat /tmp/pollme.out

Are there any errors in /var/log/messages on the client, appserver, or dbserver? Do you have enough disk space on all three? It's very strange that it hangs without anything being logged.
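
For the disk-space question, a small filter over `df -P` output is one way to spot nearly full filesystems. This is only a sketch: `full_filesystems` and the default threshold of 90 percent are assumptions chosen for illustration.

```sh
# full_filesystems [THRESHOLD]
# Read `df -P` output on stdin and print "mountpoint use%" for every
# filesystem whose use% is at or above THRESHOLD (default 90).
# Mount points containing spaces are not handled.
full_filesystems() {
    awk -v limit="${1:-90}" '
        NR > 1 {                                # skip the header line
            use = $5
            sub(/%/, "", use)
            if (use + 0 >= limit + 0) print $6, $5
        }
    '
}

# Example:
#   for h in client1 appserver dbserver; do
#       ssh "$h" df -P | full_filesystems 90
#   done
```

No output means no filesystem on that host reached the threshold.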

Lisa

Tang

« Reply #8 on: April 12, 2018, 07:39:42 AM »
Hi Lisa,
I restarted all VMs and reran ./runspecvirt.sh to check the polling process.
After "all workloads started" appeared in primectrl.out on the client, I checked pollme.out as follows.

in appserver
[root@appserver1 ~]# cat /tmp/pollme.out
Creating RMI listener using RMI Registry port 8001
appserver1-ext/192.1.1.4:8001 ready...

in dbserver:
[root@dbserver1 ~]# cat /tmp/pollme.out
Creating RMI listener using RMI Registry port 8001
dbserver1-ext/192.1.1.5:8001 ready...

/var/log/messages on the client shows some failure messages, listed below. How can I deal with them?
/var/log/messages on the client:
Apr 12 21:59:23 localhost kernel: platform microcode: firmware: requesting intel-ucode/06-06-03
Apr 12 21:59:23 localhost kernel: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
Apr 12 21:59:23 localhost kernel: parport_pc 00:05: reported by Plug and Play ACPI
Apr 12 21:59:23 localhost kernel: parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
Apr 12 21:59:23 localhost kernel: ppdev: user-space parallel port driver
Apr 12 21:59:23 localhost kernel: EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts:
Apr 12 21:59:23 localhost kernel: Adding 1048568k swap on /dev/mapper/VolGroup-lv_swap.  Priority:-1 extents:1 across:1048568k D
Apr 12 21:59:23 localhost kernel: NET: Registered protocol family 10
Apr 12 21:59:23 localhost kernel: lo: Disabled Privacy Extensions
Apr 12 21:59:23 localhost kernel: e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Apr 12 21:59:23 localhost kernel: ADDRCONF(NETDEV_UP): eth0: link is not ready
Apr 12 21:59:23 localhost kernel: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Apr 12 21:59:23 localhost kernel: e1000: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Apr 12 21:59:23 localhost kernel: eth0: IPv6 duplicate address fe80::5054:ff:fe12:3456 detected!
Apr 12 21:59:23 localhost kernel: eth1: IPv6 duplicate address fe80::5054:ff:fe12:3457 detected!
Apr 12 21:59:24 localhost rpc.statd[1290]: Version 1.2.3 starting
Apr 12 21:59:24 localhost sm-notify[1291]: Version 1.2.3 starting
Apr 12 21:59:24 localhost mcelog: failed to prefill DIMM database from DMI data
Apr 12 21:59:24 localhost kdump: kexec: loaded kdump kernel
Apr 12 21:59:24 localhost kdump: started up
Apr 12 21:59:24 localhost acpid: starting up
Apr 12 21:59:24 localhost acpid: 1 rule loaded
Apr 12 21:59:24 localhost acpid: waiting for events: event logging is off
Apr 12 21:59:24 localhost acpid: client connected from 1500[68:68]
Apr 12 21:59:24 localhost acpid: 1 client rule loaded
Apr 12 21:59:26 localhost automount[1654]: lookup_read_master: lookup(nisplus): couldn't locate nis+ table auto.master
Apr 12 21:59:26 localhost abrtd: Init complete, entering main loop

/var/log/messages in appserver:
Apr 12 21:59:19 localhost kernel: e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Apr 12 21:59:19 localhost kernel: e1000: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Apr 12 21:59:19 localhost kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
Apr 12 21:59:19 localhost kernel: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
Apr 12 21:59:19 localhost kernel: eth1: IPv6 duplicate address fe80::5054:ff:fe12:3457 detected!
Apr 12 21:59:19 localhost rpc.statd[1084]: Version 1.2.3 starting
Apr 12 21:59:19 localhost sm-notify[1085]: Version 1.2.3 starting
Apr 12 21:59:19 localhost kdump: kexec: loaded kdump kernel
Apr 12 21:59:19 localhost kdump: started up
Apr 12 21:59:19 localhost acpid: starting up
Apr 12 21:59:19 localhost acpid: 1 rule loaded
Apr 12 21:59:19 localhost acpid: waiting for events: event logging is off
Apr 12 21:59:19 localhost acpid: client connected from 1285[68:68]
Apr 12 21:59:19 localhost acpid: 1 client rule loaded
Apr 12 21:59:21 localhost automount[1305]: lookup_read_master: lookup(nisplus): couldn't locate nis+ table auto.master
Apr 12 21:59:21 localhost mcelog: failed to prefill DIMM database from DMI data
Apr 12 21:59:21 localhost abrtd: Init complete, entering main loop
Apr 12 22:01:35 localhost kernel: Switching to clocksource tsc

/var/log/messages in dbserver:
Apr 12 21:59:20 localhost kernel: lo: Disabled Privacy Extensions
Apr 12 21:59:20 localhost kernel: e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Apr 12 21:59:20 localhost kernel: ADDRCONF(NETDEV_UP): eth0: link is not ready
Apr 12 21:59:20 localhost kernel: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Apr 12 21:59:20 localhost kernel: e1000: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Apr 12 21:59:20 localhost kernel: eth0: IPv6 duplicate address fe80::5054:ff:fe12:3456 detected!
Apr 12 21:59:20 localhost kernel: eth1: IPv6 duplicate address fe80::5054:ff:fe12:3457 detected!
Apr 12 21:59:20 localhost rpc.statd[1294]: Version 1.2.3 starting
Apr 12 21:59:20 localhost sm-notify[1295]: Version 1.2.3 starting
Apr 12 21:59:20 localhost kdump: kexec: loaded kdump kernel
Apr 12 21:59:20 localhost kdump: started up
Apr 12 21:59:21 localhost acpid: starting up
Apr 12 21:59:21 localhost acpid: 1 rule loaded
Apr 12 21:59:21 localhost acpid: waiting for events: event logging is off
Apr 12 21:59:21 localhost acpid: client connected from 1498[68:68]
Apr 12 21:59:21 localhost acpid: 1 client rule loaded
Apr 12 21:59:22 localhost automount[1652]: lookup_read_master: lookup(nisplus): couldn't locate nis+ table auto.master
Apr 12 21:59:22 localhost mcelog: failed to prefill DIMM database from DMI data
Apr 12 21:59:23 localhost abrtd: Init complete, entering main loop
Apr 12 22:01:04 localhost kernel: mysqld (3417): Using mlock ulimits for SHM_HUGETLB is deprecated
Apr 12 22:01:36 localhost kernel: Switching to clocksource tsc

These three messages logs are attached below.

Memory, disk, and SMP configuration of the VMs:
infraserver memory-1G disk-40G smp-1
webserver   memory-20G smp-2
mailserver  memory-1G disk-20G smp-1
appserver   memory-8G smp-4
batchserver memory-1G smp-1
dbserver    memory-32G disk-20G smp-8
client      memory-8G disk-24G smp-8

Thanks.

Tang

lroderic

« Reply #9 on: April 12, 2018, 10:22:24 AM »
Googling that mcelog error says it's harmless, so I don't think that's it. While the test is hanging, can you ping between dbserver and appserver?

Could you please disable IPV6 and retry?

Also, tell us more about the server. What hypervisor are you using? What does /var/log/messages on the server show? How much physical memory does the server have? And after you start all the VMs, how much free physical memory is still available?
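
To report free physical memory the same way each time, one option is to read the MemFree field of /proc/meminfo on the server. The helper below is a hypothetical sketch (`mem_free_gib` is not an existing tool). Note that MemFree excludes memory used for buffers and page cache, so it understates what is actually reclaimable.

```sh
# mem_free_gib: read /proc/meminfo-format text on stdin and print the
# MemFree value converted from kB to whole GiB (rounded down).
mem_free_gib() {
    awk '$1 == "MemFree:" { printf "%d\n", $2 / 1048576 }'
}

# Example, run on the physical server:
#   mem_free_gib < /proc/meminfo
```

Running it before the VMs start, after they start, and while the run hangs gives three comparable numbers.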

Lisa
« Last Edit: April 12, 2018, 01:59:32 PM by lroderic »

Tang

« Reply #10 on: April 12, 2018, 10:30:36 PM »
Hi Lisa,
I disabled IPv6 on all VMs and reran, but got similar results and logs.
To confirm that IPv6 is disabled, I checked with 'ifconfig' on all VMs; the output on each looks like this:

[root@client1 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 16:20:00:00:00:06
          inet addr:192.1.1.7  Bcast:192.1.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4548 errors:353 dropped:0 overruns:0 frame:353
          TX packets:4087 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1068911 (1.0 MiB)  TX bytes:2158645 (2.0 MiB)

eth1      Link encap:Ethernet  HWaddr 16:20:00:01:00:06
          inet addr:192.2.1.7  Bcast:192.2.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:10 errors:5 dropped:0 overruns:0 frame:5
          TX packets:1 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1803 (1.7 KiB)  TX bytes:90 (90.0 b)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:3964 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3964 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:473166 (462.0 KiB)  TX bytes:473166 (462.0 KiB)

New "/var/log/messages" on client, appserver and dbserver are attached.

I use the KVM hypervisor.
Total memory is 251G. Before the VMs start, free memory is 203G.
After all VMs have started but before the specvirt process runs, free memory is 197G.
While the process is hanging, free memory is 176G.

Thanks,

Tang

lroderic

« Reply #11 on: April 13, 2018, 01:27:38 PM »
Did you install Java JDK or JRE on dbserver and appserver? For convenience, I install the same version of Java JDK on all VMs (even though they don't all need it). Your comment says you're running Java 1.8 on the appserver but Java 1.7 elsewhere. Could you bring them all up to OpenJDK 1.8?

Lisa

Tang

« Reply #12 on: April 14, 2018, 05:22:50 AM »
Hi, Lisa

I installed Java 1.8.0_112 on all servers and the client and reran. The result is the same: the run still hangs. Which part of the VM setup do you think might be wrong? Should I try setting up a new VM?

thanks,

Tang

lroderic

« Reply #13 on: April 16, 2018, 02:45:28 PM »
Was that Java JDK you installed, or JRE? Please make sure the output from the java -version command matches on the dbserver, appserver, and client. Should look something like:

Code:
# java -version
java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
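
A sketch for comparing the versions across hosts without reading each banner by eye follows. `java_version` and `versions_match` are made-up helper names; the only behavior relied on is that `java -version` prints its banner, in the format shown above, to stderr.

```sh
# java_version: read `java -version` output on stdin and print the quoted
# version string from the first line that has one (e.g. 1.8.0_60).
java_version() {
    sed -n 's/^.* version "\([^"]*\)".*$/\1/p' | head -n 1
}

# versions_match: read one version string per line on stdin and succeed
# only if every line is identical (empty input counts as a mismatch).
versions_match() {
    [ "$(sort -u | wc -l)" -eq 1 ]
}

# Example:
#   for h in client1 appserver dbserver; do
#       ssh "$h" java -version 2>&1 | java_version
#   done | versions_match && echo "all match" || echo "versions differ"
```

The `2>&1` is needed because the banner goes to stderr, not stdout.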

Also, during the run, SPECjAppServer writes to a result directory under SPECjAppServer*/result. Would you please post the files in that directory?

Lisa

Tang

« Reply #14 on: April 16, 2018, 10:56:24 PM »
Hi Lisa,

The output of java -version is as follows:

[root@client1 ~]# java -version
java version "1.8.0_112"
Java(TM) SE Runtime Environment (build 1.8.0_112-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.112-b15, mixed mode)
[root@client1 ~]#
[root@client1 ~]# ssh appserver java -version
java version "1.8.0_112"
Java(TM) SE Runtime Environment (build 1.8.0_112-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.112-b15, mixed mode)
[root@client1 ~]# ssh dbserver java -version
java version "1.8.0_112"
Java(TM) SE Runtime Environment (build 1.8.0_112-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.112-b15, mixed mode)

On appserver, while the process is hanging, there is no result directory under SPECjAppServer2004.
I checked all files in SPECjAppServer2004 and found that only the 'output' directory has been updated.
That 'output' directory contains only an empty file, 'README'.
Does this mean that SPECjAppServer2004 is not actually running?

In appserver:
[root@appserver1 SPECjAppServer2004]# pwd
/opt/SPECjAppServer2004
[root@appserver1 SPECjAppServer2004]# ls
ant  bin  build_glassfish.xml  build.xml  ChangeLog.txt  classes  config  docs  errata.txt  jars  License  output  README.txt  reporter  schema  src  tomcat.xml  version
[root@appserver1 SPECjAppServer2004]# cd output/
[root@appserver1 output]# ls
README
[root@appserver1 output]# cat README
[root@appserver1 output]#

thanks,

Tang