Serial C version was developed the Center for Manycore Programming at Seoul National University and derived from the serial Fortran versions in "NPB3.3-SER" developed by NAS.
OpenACC version was developed by Rengan Xu and Sunita Chandrasekaran from University of Houston and Mathew Colgrove from NVIDIA.
EP kernel benchmark is an embarrassingly parallel algorithm with a reduction. The algorithm generates n pairs of uniform (0,1) pseudorandom deviates (xj,yj). Then for each j the condition tj = x2j + yj2 <= 1 is checked. If the condition is satisfied, Xk = xj sqrt(-2log(tj))/tj and Yk = yj sqrt(-2log(tj))/tj , where k starts from 1 and increments after each step. Finally Ql (0 <= l <= 9) counts the pairs (Xk,Yk) that lie in the square annulus l <= max(|Xk, Yk|) <= l + 1. Then Sum(Xk) + Sum(Yk) are then calculated. In this algorithm, Ql(0 <= l <= 9) performs the reduction of all the pairs.
The input dataset size is comprised of W, A through E classes. We have used the 3 classes in our experiments:
Class W: reference data for n = 2^25 pairs of (xj,yj) (1 <= j <= n)
Class C: reference data for n = 2^32 pairs of (xj,yj) (1 <= j <= n)
Class D: references data for n = 2^36 pairs of (xj,yj) (1 <= j <= n)
Class W is used by the test workload, Class C by train, and Class D by ref.
Ql (0 <= l <= 9) that counts the pairs (Xk,Yk) that lie in the square annulus l <= max(|Xk, Yk|) <= l + 1, and Sum(Xk) + Sum(Yk).