Author Topic: Batchserver workload failed  (Read 6047 times)

Tang

  • Newbie
  • *
  • Posts: 22
  • Karma: +0/-0
Batchserver workload failed
« on: January 09, 2019, 07:14:34 AM »
Hi,
Thank you, David, I have solved this problem. The configuration in glassfish.env and default.env under SPECjAppserver/config/ on the client VM was missing. I have now configured these two files in the appserver.

I reran SPECvirt with all 4 workloads together and hit another problem. The batchserver ran for some time (and seemed to be running correctly), but in the end it returned a failed result.
The results showed: "
batch_interval[0]:FAILED batch interval[0] completed with unexpected result.
batch_interval[1]:FAILED batch interval[1] completed with unexpected result.

"
I checked prime-client1_1092.log in /opt/SPECvirt/logs, and there is no error in it. I'm not sure what the problem is now.
The logs and results from /opt/SPECvirt are attached below.

Thanks,
Tang

Tang

Re: Batchserver workload failed
« Reply #1 on: January 09, 2019, 07:28:04 AM »
These are the results from /opt/SPECvirt.

DavidSchmidt

  • Moderator
  • Newbie
  • *****
  • Posts: 21
  • Karma: +3/-1
Re: Batchserver workload failed
« Reply #2 on: January 09, 2019, 03:12:25 PM »
Hi Tang.

I am sorry that the batchserver workload isn't working properly. Since you completed a run, would you please check the results/[result dir]/1-1.0 directory and see if there are files like 0-3_1_CINT2006.001.train.20190104-025940-11.rsf and 0-3_1_CPU2006.001.20190104-025940-11.log (or 0-3_1_CPU2006.001.20190104-025940-11.log.debug)? These are the detailed output and log files created by the batchserver workload. Depending on how many vCPUs you have defined for the batchserver, you could see up to 10 of each of them. Would you collect and post them to help me debug the problem you are having?
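
If collecting the files by hand is tedious, something like the following Python sketch could gather them into one archive. It is only an illustration: the function names are hypothetical, and the matching rule (file names containing CINT2006 or CPU2006 and ending in .rsf, .log, or .log.debug) is inferred from the example names above, so it may need adjusting for other runs.

```python
import tarfile
from pathlib import Path

# Assumption: batchserver files can be recognized by these markers and
# suffixes, as suggested by the example file names in this thread.
NAME_MARKERS = ("CINT2006", "CPU2006")
BATCH_SUFFIXES = (".rsf", ".log", ".log.debug")


def find_batch_files(run_dir):
    """Return the sorted batchserver output/log files directly inside run_dir."""
    return sorted(
        p for p in Path(run_dir).iterdir()
        if p.is_file()
        and p.name.endswith(BATCH_SUFFIXES)
        and any(marker in p.name for marker in NAME_MARKERS)
    )


def pack_batch_files(run_dir, archive_path):
    """Pack the matching files into a gzip-compressed tar archive.

    Returns the list of file names that were added.
    """
    files = find_batch_files(run_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        for p in files:
            # Store only the file name so the archive has no directory prefix.
            tar.add(p, arcname=p.name)
    return [p.name for p in files]
```

It would be called with the 1-1.0 directory of the run in question and an output path, for example pack_batch_files("results/[result dir]/1-1.0", "batch_logs.tar.gz") with the real result directory substituted.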

Thanks,

Tang

Re: Batchserver workload failed
« Reply #3 on: January 09, 2019, 09:13:28 PM »
Hi David,
Thank you for your help. I have configured 8 vCPUs for the batchserver.
I found ten files with names similar to the ones you mentioned in /opt/SPECvirt/results/20190108-201649/1-1.0.
I packed these ten files into two packages.

Tang

Re: Batchserver workload failed
« Reply #4 on: January 09, 2019, 09:14:42 PM »
This is the second package.
Thank you so much.

DavidSchmidt

Re: Batchserver workload failed
« Reply #5 on: January 10, 2019, 10:47:32 AM »
Hi Tang,

How much memory do you have allocated for the batchserver VM? With 8 vCPUs, you will run 8 copies of the batch workload in parallel and then run 2 copies in parallel. In the .debug log, for the first run, I am seeing errors of the form "Error mallocing memory" and 5 of the 8 copies of the workload terminated after 9 seconds (vs. the expected ~45 sec run on the other 3). The second run of 2 copies completed fine, which is why there was no debug file for it.
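
To check other runs for the same symptom, a rough Python sketch like the one below counts lines containing that error string in each .log.debug file. The function name is hypothetical, and the only thing it assumes about the log format is that the message "Error mallocing memory" appears verbatim on a line.

```python
from pathlib import Path

# Error text as quoted in this thread; the rest of the log format is unknown.
MALLOC_ERROR = "Error mallocing memory"


def count_malloc_errors(run_dir):
    """Map each *.log.debug file name in run_dir to its number of error lines."""
    counts = {}
    for path in sorted(Path(run_dir).glob("*.log.debug")):
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with open(path, errors="replace") as fh:
            counts[path.name] = sum(1 for line in fh if MALLOC_ERROR in line)
    return counts
```

A non-zero count for a file suggests that the corresponding run hit the same out-of-memory condition.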

There are 2 ways to address this: add more memory to the VM or reduce the number of vCPUs to 3 or fewer. For most publications, 1 or 2 vCPUs are used for the batchserver configuration, so I would recommend reducing the vCPU count. The requirement for the batchserver is that all 10 copies of the workload complete in less than 900 seconds, so even with 1 vCPU you should be fine.
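
As a back-of-the-envelope check of that time budget, here is a small Python sketch. It rests on two assumptions that are not guaranteed by the benchmark: that the 10 copies always run in waves of at most one copy per vCPU (generalized from the 8 + 2 split described above), and that each wave takes about 45 seconds, as seen in this run. The function names are hypothetical.

```python
TOTAL_COPIES = 10     # from this thread: 10 copies of the batch workload
TIME_LIMIT_S = 900    # from this thread: all copies must finish within 900 s
ASSUMED_WAVE_S = 45   # assumption: ~45 s per wave, as observed in this run


def batch_waves(vcpus, total=TOTAL_COPIES):
    """Split the copies into waves of at most `vcpus` parallel copies."""
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    waves = []
    remaining = total
    while remaining > 0:
        n = min(vcpus, remaining)
        waves.append(n)
        remaining -= n
    return waves


def estimated_seconds(vcpus, wave_s=ASSUMED_WAVE_S):
    """Rough total run time: number of waves times the assumed time per wave."""
    return len(batch_waves(vcpus)) * wave_s


for vcpus in (1, 2, 8):
    est = estimated_seconds(vcpus)
    print(vcpus, batch_waves(vcpus), est, est < TIME_LIMIT_S)
```

Under these assumptions, 1 vCPU gives ten waves and roughly 450 seconds, which leaves a comfortable margin below the 900 second limit.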

Tang

Re: Batchserver workload failed
« Reply #6 on: January 11, 2019, 01:17:20 AM »
Hi, David

I set the batchserver VM to 2 vCPUs, left all other configuration unchanged, and reran.
The "batchserver fail" problem is solved. Thank you so much.
I am now starting to set up the multi-tile workload. Thank you for your help.

Thanks,
Tang