Author Topic: "PrimeControl: terminating run" while running 2 tiles  (Read 692 times)

lroderic

  • Moderator
  • Full Member
  • *****
  • Posts: 134
  • Karma: +6/-0
Re: "PrimeControl: terminating run" while running 2 tiles
« Reply #45 on: September 28, 2017, 10:56:36 AM »
Have you confirmed that your 10 GbE network is working? It's simplest and best with a 10 GbE network since it lets you run all workloads against one client.

We need to focus on the client(s). All signs point to network bandwidth on the client NICs/vNICs. Are these physical clients or VMs? If they're VMs, are their vNICs configured correctly? Do they reside on a separate host from the SUT? Do you have a dedicated network for the testbed, or are you sharing it with a busy lab? Is the entire network on the same network switch?

Since you can run with three tiles @ 0.4 but fail @ 0.6, it looks like you're crossing the 1 GbE bandwidth limit on the client. (Each client needs about 1.4 Gb/s.)

Please reboot the client right before a test run and capture the output of ifconfig. Then run the test and run top on the client during the test run. At the end of the test, do another ifconfig on the client to get packets sent/received and any collisions:

Code: [Select]
        RX packets 291259  bytes 201474352 (192.1 MiB)
        RX errors 0  dropped 566  overruns 0  frame 0
        TX packets 111161  bytes 19175356 (18.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

You can use ethtool to drill down on network stats as well.
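
To compare the before-and-after ifconfig counters, something like the following rough Python sketch could help. It assumes the ifconfig output format shown in this thread (older net-tools versions print the counters differently) and that you pass it the stanza for a single interface. The names parse_ifconfig_counters and average_mbps are made up for illustration.

```python
import re

# Counter lines copied from the ifconfig example in this thread.
SAMPLE = """\
        RX packets 291259  bytes 201474352 (192.1 MiB)
        RX errors 0  dropped 566  overruns 0  frame 0
        TX packets 111161  bytes 19175356 (18.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
"""

def parse_ifconfig_counters(text):
    """Pull packet, byte, error, drop and collision counters out of one interface stanza."""
    counters = {}
    for direction in ("RX", "TX"):
        prefix = direction.lower()
        m = re.search(rf"{direction} packets (\d+)\s+bytes (\d+)", text)
        if m:
            counters[f"{prefix}_packets"] = int(m.group(1))
            counters[f"{prefix}_bytes"] = int(m.group(2))
        m = re.search(rf"{direction} errors (\d+)\s+dropped (\d+)", text)
        if m:
            counters[f"{prefix}_errors"] = int(m.group(1))
            counters[f"{prefix}_dropped"] = int(m.group(2))
    m = re.search(r"collisions (\d+)", text)
    if m:
        counters["collisions"] = int(m.group(1))
    return counters

def average_mbps(before, after, seconds, key="rx_bytes"):
    """Average throughput in Mb/s between two counter snapshots taken `seconds` apart."""
    return (after[key] - before[key]) * 8 / seconds / 1e6

if __name__ == "__main__":
    print(parse_ifconfig_counters(SAMPLE))
```

Call parse_ifconfig_counters on the output captured right after the reboot and again at the end of the run, then pass both results and the run length in seconds to average_mbps. An average close to 1000 Mb/s on a 1 GbE link would suggest the link is saturated. Note that this is only an average; short bursts can still hit the limit.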

If the client is a VM, please post its XML definition, and post the output of ifconfig and lspci on the host.

Lastly, get /etc/sysconfig/network-scripts/ifcfg-* on the host for the NIC and bridge settings and post them here.
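
If it is easier than attaching each file, a small Python sketch like this one could gather the ifcfg settings into one summary. It assumes the usual KEY=VALUE layout of ifcfg files and the standard /etc/sysconfig/network-scripts directory. The function names parse_ifcfg and collect_ifcfg are hypothetical helpers, not part of any tool.

```python
import glob
import os

def parse_ifcfg(text):
    """Parse KEY=VALUE lines, skipping blanks and comments and stripping quotes."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip("\"'")
    return settings

def collect_ifcfg(directory="/etc/sysconfig/network-scripts"):
    """Return {file name: settings} for every ifcfg-* file in the directory."""
    result = {}
    for path in sorted(glob.glob(os.path.join(directory, "ifcfg-*"))):
        with open(path) as handle:
            result[os.path.basename(path)] = parse_ifcfg(handle.read())
    return result

if __name__ == "__main__":
    for name, settings in collect_ifcfg().items():
        print(name, settings)
```

In a bridged setup, the file for the physical port would normally carry a BRIDGE= entry naming the bridge, so its absence is a quick hint that no bridge is defined for that port.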

Lisa

Miles

  • Jr. Member
  • **
  • Posts: 63
  • Karma: +0/-0
Re: "PrimeControl: terminating run" while running 2 tiles
« Reply #46 on: October 02, 2017, 09:31:21 PM »
Hi
All my clients and wclients are VMs.
My SPECvirt testbed runs on a dedicated network, not a busy shared lab network.

The NIC used as the VM network device on the host (please refer to HOST_ifconfig.txt for the details):
ens6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 90.90.1.80  netmask 255.255.0.0  broadcast 90.90.255.255
        inet6 fe80::3efd:feff:fe9d:6c70  prefixlen 64  scopeid 0x20<link>
        ether 3c:fd:fe:9d:6c:70  txqueuelen 1000  (Ethernet)
        RX packets 10807669  bytes 11371129452 (10.5 GiB)
        RX errors 0  dropped 377  overruns 0  frame 0
        TX packets 4458658  bytes 300435255 (286.5 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0



Thanks.
« Last Edit: October 02, 2017, 09:38:00 PM by Miles »

DavidSchmidt

  • Moderator
  • Newbie
  • *****
  • Posts: 12
  • Karma: +1/-0
Re: "PrimeControl: terminating run" while running 2 tiles
« Reply #47 on: October 03, 2017, 02:50:17 PM »
Hi Miles. I looked at the files you attached. The networking looks a little odd to me. You say that ens6 is the port you are using for the VM network device. How is this device configured? In the lspci info, I see details about three network devices:
  1 x 2-port I350 Intel 1GbE NIC (eno1 and eno2 in ifconfig data), 
  1 x Intel XL710 1-port 40GbE NIC,
  1 x Intel X540 2-port 10GbE NIC (ens5f0 and ens5f1, along with 32 virtual functions enp5s[xx]).

I actually don't see which device ens6 is. I presume it is the XL710 NIC port, but I don't know for certain since that NIC is in slot 3, not slot 6. I also don't see any bridge information, so I don't know what the layout is. If you don't use SR-IOV, then you should have a bridge configured.
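
One way to settle which PCI device sits behind an interface name is to follow the sysfs links on the host. The sketch below is only an illustration under that assumption: on Linux, /sys/class/net/<iface>/device is a symlink to the PCI device directory, and its driver entry links to the bound driver. The helper names pci_address and driver_name are invented here.

```python
import os

def pci_address(iface, sysfs_root="/sys/class/net"):
    """Return the PCI address behind an interface, or None if it has no device link."""
    device_link = os.path.join(sysfs_root, iface, "device")
    if not os.path.exists(device_link):
        return None  # purely virtual interface, or the name does not exist
    return os.path.basename(os.path.realpath(device_link))

def driver_name(iface, sysfs_root="/sys/class/net"):
    """Return the kernel driver bound to the interface's device, or None if unknown."""
    driver_link = os.path.join(sysfs_root, iface, "device", "driver")
    if not os.path.exists(driver_link):
        return None
    return os.path.basename(os.path.realpath(driver_link))

if __name__ == "__main__":
    for iface in sorted(os.listdir("/sys/class/net")):
        print(iface, pci_address(iface), driver_name(iface))
```

The address printed for ens6 can then be matched against the first column of the lspci output to confirm whether it really is the XL710 port.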

Can you run brctl show to show which devices are attached to which bridges, if any?
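
If the brctl show output gets long, a short Python sketch along these lines could turn it into a bridge-to-interfaces map. It assumes the usual column layout (bridge name, bridge id, STP enabled, interfaces, with extra interfaces on indented continuation lines). The sample text, including br0, vnet0 and the bridge ids, is hypothetical and not taken from your system.

```python
# Hypothetical sample output, for illustration only.
SAMPLE = (
    "bridge name\tbridge id\t\tSTP enabled\tinterfaces\n"
    "br0\t\t8000.000000000001\tno\t\tens6\n"
    "\t\t\t\t\t\t\tvnet0\n"
    "virbr0\t\t8000.000000000000\tyes\n"
)

def parse_brctl_show(text):
    """Map each bridge name to the list of interfaces attached to it."""
    bridges = {}
    current = None
    for line in text.splitlines():
        if not line.strip() or line.startswith("bridge name"):
            continue  # skip blank lines and the header row
        fields = line.split()
        if line[0].isspace():
            # Indented continuation line: more interfaces for the current bridge.
            if current is not None:
                bridges[current].extend(fields)
        else:
            # New bridge row: name, id, STP flag, then an optional first interface.
            current = fields[0]
            bridges[current] = fields[3:]
    return bridges

if __name__ == "__main__":
    print(parse_brctl_show(SAMPLE))
```

A bridge that maps to an empty list has no ports attached, which would mean no VM traffic is going through it.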

You say that your SUT is on an isolated network. Which NIC ports are connected to this network? I am presuming it's ens5f0 at least, but it's not set up as a bridge device as near as I can tell, so I am not sure how your client is actually talking to your SUT.

Thanks,

Miles

  • Jr. Member
  • **
  • Posts: 63
  • Karma: +0/-0
Re: "PrimeControl: terminating run" while running 2 tiles
« Reply #48 on: October 06, 2017, 04:14:08 AM »
Hi
Yes, ens6 is the XL710 NIC port.

I use SR-IOV on the X540, but no bridge is configured.

"Can you run brctl show to show which devices are attached to which bridges, if any?"
Code: [Select]
bridge name   bridge id      STP enabled   interfaces
virbr0      8000.000000000000   yes   

Thanks.

DavidSchmidt

  • Moderator
  • Newbie
  • *****
  • Posts: 12
  • Karma: +1/-0
Re: "PrimeControl: terminating run" while running 2 tiles
« Reply #49 on: October 09, 2017, 09:43:02 AM »
Hi Miles. I have a couple of questions regarding your SR-IOV configuration. Per the lspci info you provided, it looks like only one port of your X540 is configured with VFs (only even-numbered functions show up in the list of Virtual Functions). Did you configure only one port to use VFs? Would you please provide the output of dmesg?

Also, would you mind providing the XML file for the client that is using SR-IOV?
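
As a quick cross-check of how many VFs each port has enabled, the sysfs attributes sriov_numvfs and sriov_totalvfs under the port's device directory can be read directly. The Python sketch below assumes those attributes are present for your NIC driver; sriov_vf_counts is a hypothetical helper name.

```python
from pathlib import Path

def sriov_vf_counts(iface, sysfs_root="/sys/class/net"):
    """Return enabled and maximum VF counts for a port (None where not available)."""
    device = Path(sysfs_root) / iface / "device"

    def read_int(name):
        try:
            return int((device / name).read_text().strip())
        except (OSError, ValueError):
            return None  # attribute missing or unreadable: SR-IOV likely unsupported here

    return {"numvfs": read_int("sriov_numvfs"), "totalvfs": read_int("sriov_totalvfs")}

if __name__ == "__main__":
    # Port names taken from the ifconfig data discussed in this thread.
    for port in ("ens5f0", "ens5f1"):
        print(port, sriov_vf_counts(port))
```

If one port reports 0 for numvfs while the other reports a positive number, that would match what the lspci list suggests, namely that only one port has VFs enabled.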

Finally, can you confirm that your X540 ports are connected to the SUT switch?

Miles

  • Jr. Member
  • **
  • Posts: 63
  • Karma: +0/-0
Re: "PrimeControl: terminating run" while running 2 tiles
« Reply #50 on: October 12, 2017, 09:09:08 PM »
Hi
I have completed 3T1W (web workload) successfully.

I modified TILEINDEX in support_image_props.rc and support_downloads_props.rc on the webservers.

The value is now 0 on webserver1, 1 on webserver2, and 2 on webserver3; in my previous failed run it was 2 on all three.

But my webservers still often hit "Out of memory: Kill process 16465 (httpd) score 1 or sacrifice child", and "connection refused" errors occur.

Should I increase the vRAM on the webservers? (35840 MB is currently allocated.)

Thanks.
« Last Edit: October 12, 2017, 09:15:55 PM by Miles »

DavidSchmidt

  • Moderator
  • Newbie
  • *****
  • Posts: 12
  • Karma: +1/-0
Re: "PrimeControl: terminating run" while running 2 tiles
« Reply #51 on: October 16, 2017, 02:58:46 PM »
Hi Miles. This looks like a tuning issue with the Apache webserver; it appears the webserver application is running out of memory. I would look at the tuning options of a published SPECvirt_sc2013 result that uses Apache and verify that you have the same settings in your httpd.conf files.
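
To make that comparison less error-prone, a rough Python sketch like the one below could list the directives that differ between your httpd.conf and the one from a published result. It is a simplification: it ignores section context such as <IfModule> blocks (a later occurrence of a directive simply overwrites an earlier one) and does not follow Include lines. The function names and the sample values are made up for illustration.

```python
def parse_directives(conf_text):
    """Collect 'Name value' directives; names are lowercased since Apache ignores their case."""
    directives = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("<"):
            continue  # skip blanks, comments and section open/close tags
        parts = line.split(None, 1)
        directives[parts[0].lower()] = parts[1].strip() if len(parts) > 1 else ""
    return directives

def diff_directives(mine_text, reference_text):
    """Return {directive: (my value, reference value)} for every directive that differs."""
    mine = parse_directives(mine_text)
    reference = parse_directives(reference_text)
    return {
        name: (mine.get(name), reference.get(name))
        for name in sorted(set(mine) | set(reference))
        if mine.get(name) != reference.get(name)
    }

if __name__ == "__main__":
    # Hypothetical values, only to show the output shape.
    mine = "MaxRequestWorkers 256\nKeepAlive On\n"
    reference = "MaxRequestWorkers 150\nKeepAlive On\n"
    print(diff_directives(mine, reference))
```

Directives that control the number of worker processes are the first ones worth checking, since each httpd process consumes memory and too many of them can trigger the OOM killer.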