The serial C version was developed by the Center for Manycore Programming at Seoul National University and is derived from the serial Fortran versions in "NPB3.3-SER" developed by NAS.
Initial port to OpenMP by Alexander Grund
The EP kernel benchmark is an embarrassingly parallel algorithm with a reduction. The algorithm generates n pairs of uniform (0,1) pseudorandom deviates (xj,yj). For each j, it checks the condition tj = xj^2 + yj^2 <= 1. If the condition is satisfied, it computes Xk = xj sqrt(-2 log(tj)/tj) and Yk = yj sqrt(-2 log(tj)/tj), where k starts from 1 and increments after each accepted pair. Ql (0 <= l <= 9) then counts the pairs (Xk,Yk) that lie in the square annulus l <= max(|Xk|,|Yk|) < l + 1, and the sums Sum(Xk) and Sum(Yk) are calculated. The counts Ql (0 <= l <= 9) are accumulated by a reduction over all pairs.
The input data set size is selected by class: W, and A through E. We use three of these classes in our experiments:
Class W: reference data for n = 2^25 pairs of (xj,yj) (1 <= j <= n)
Class C: reference data for n = 2^32 pairs of (xj,yj) (1 <= j <= n)
Class D: reference data for n = 2^36 pairs of (xj,yj) (1 <= j <= n)
Class W is used by the test workload, Class C by train, and Class D by ref.
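As a small illustration of the problem sizes, the following C sketch maps the three classes above to their pair counts. The function names ep_class_log2 and ep_class_pairs are hypothetical and only cover the classes listed here. Note that 2^32 and 2^36 do not fit in a 32-bit integer, so the pair count and loop index need a 64-bit type.

```c
#include <stdint.h>

/* Returns log2(n) for the classes listed above, or -1 otherwise. */
static int ep_class_log2(char cls)
{
    switch (cls) {
    case 'W': return 25;   /* test workload  */
    case 'C': return 32;   /* train workload */
    case 'D': return 36;   /* ref workload   */
    default:  return -1;   /* class not covered by this sketch */
    }
}

/* Returns n = 2^m as a 64-bit value, or -1 for an unknown class. */
static int64_t ep_class_pairs(char cls)
{
    int m = ep_class_log2(cls);
    return (m < 0) ? -1 : ((int64_t)1 << m);
}
```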
The benchmark reports the counts Ql (0 <= l <= 9) of pairs (Xk,Yk) that lie in the square annulus l <= max(|Xk|,|Yk|) < l + 1, and the sums Sum(Xk) and Sum(Yk).