# 452.ep SPECaccel Benchmark Description File

## Benchmark Name

452.ep

## Benchmark Author

The serial C version was developed by the Center for Manycore Programming at Seoul National University and is derived from the serial Fortran versions in "NPB3.3-SER" developed by NAS.

Initial port to OpenMP by Alexander Grund

## Benchmark Program General Category

Embarrassingly Parallel

## Benchmark Description

The EP kernel benchmark is an embarrassingly parallel algorithm with a reduction. The algorithm generates $n$ pairs of uniform (0,1) pseudorandom deviates $(x_j, y_j)$. For each $j$, the condition $t_j = x_j^2 + y_j^2 \le 1$ is checked. If the condition is satisfied, the pair is accepted and $X_k = x_j \sqrt{-2 \log(t_j) / t_j}$ and $Y_k = y_j \sqrt{-2 \log(t_j) / t_j}$ are computed, where $k$ starts from 1 and increments after each accepted pair. Then $Q_l$ ($0 \le l \le 9$) counts the pairs $(X_k, Y_k)$ that lie in the square annulus $l \le \max(|X_k|, |Y_k|) < l + 1$. Finally, the sums $\sum_k X_k$ and $\sum_k Y_k$ are calculated. The counts $Q_l$ and the two sums are accumulated by a reduction over all pairs.

## Input Description

Input data set sizes are organized into classes W and A through E. Three of these classes are used here:

Class W: reference data for $n = 2^{25}$ pairs $(x_j, y_j)$ ($1 \le j \le n$)

Class D: reference data for $n = 2^{36}$ pairs $(x_j, y_j)$ ($1 \le j \le n$)

Class E: reference data for $n = 2^{40}$ pairs $(x_j, y_j)$ ($1 \le j \le n$)

Class W is used by the test workload, Class D by train, and Class E by ref.
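To make the class sizes concrete, the following sketch maps a class letter to its pair count. The function names are hypothetical, and only the three classes listed above are covered; the sizes of the other classes are not given in this document, so the sketch returns 0 for them.

```c
#include <assert.h>
#include <stdint.h>

/* log2(n) for the classes listed in the text; -1 when the size
 * is not stated in this document. Hypothetical helper. */
int ep_class_log2n(char cls)
{
    switch (cls) {
    case 'W': return 25;  /* test workload  */
    case 'D': return 36;  /* train workload */
    case 'E': return 40;  /* ref workload   */
    default:  return -1;
    }
}

/* Number of pairs n = 2^log2n, or 0 for an unlisted class. */
uint64_t ep_class_pairs(char cls)
{
    int e = ep_class_log2n(cls);
    if (e < 0)
        return 0;
    assert(e < 64);          /* shift must stay within 64 bits */
    return (uint64_t)1 << e;
}
```

Classes D and E exceed the range of a 32-bit integer, so pair counts and the $Q_l$ counters need 64-bit types.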

## Output Description

The counts $Q_l$ ($0 \le l \le 9$) of the pairs $(X_k, Y_k)$ that lie in the square annulus $l \le \max(|X_k|, |Y_k|) < l + 1$, and the sums $\sum_k X_k$ and $\sum_k Y_k$.

## Programming Language

C

## Known portability issues

The "BLKSIZE" macro in "ep.c" sets the block size. Smaller block sizes require more iterations; larger block sizes require fewer iterations and are likely to perform better. However, each block uses approximately 1 MiB of memory, or about 1 GiB per thousand blocks. The default block size is 15,000, which takes ~15 GiB of memory.
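The memory arithmetic can be sketched as follows. The helper names and the flat 1 MiB-per-block figure are assumptions taken from the approximate numbers above, not values read from "ep.c".

```c
#include <assert.h>
#include <stdint.h>

#define EP_MIB_PER_BLOCK      1u      /* approximate figure from the text */
#define EP_DEFAULT_BLOCK_SIZE 15000u  /* default block size from the text */

static_assert(EP_MIB_PER_BLOCK > 0, "per-block memory must be positive");

/* Approximate device memory, in MiB, needed for a given block size. */
uint64_t ep_block_memory_mib(uint64_t block_size)
{
    return block_size * EP_MIB_PER_BLOCK;
}

/* Largest block size that fits in device_mib of memory, capped at
 * the default so the result never exceeds 15000. */
uint64_t ep_max_block_size(uint64_t device_mib)
{
    uint64_t fit = device_mib / EP_MIB_PER_BLOCK;
    return fit < EP_DEFAULT_BLOCK_SIZE ? fit : EP_DEFAULT_BLOCK_SIZE;
}
```

For example, under these assumptions a device with 8192 MiB free could hold a block size of at most about 8192, while the default of 15000 needs roughly 15000 MiB (about 14.6 GiB).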

SPECaccel 2023's system requirements expect 16 GB of available device memory. If the benchmark cannot be run because of memory limitations, users may select a smaller block size via the "-DSPEC_BLOCK_SIZE=" macro included with the OPTIMIZE setting in the config file. Sizes smaller than 15000, when used to work around memory constraints of the device, are considered a portability option and can be set in either base or peak runs.
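One plausible way a compile-time override such as "-DSPEC_BLOCK_SIZE=8000" could feed "BLKSIZE" is shown below. This wiring is an assumption for illustration only; the actual preprocessor logic in "ep.c" is not reproduced in this document and may differ, and `ep_block_size` is a hypothetical helper.

```c
#include <assert.h>

/* Assumed wiring, for illustration: use the command-line value
 * when given, otherwise fall back to the documented default. */
#ifdef SPEC_BLOCK_SIZE
#  define BLKSIZE SPEC_BLOCK_SIZE
#else
#  define BLKSIZE 15000
#endif

static_assert(BLKSIZE > 0, "block size must be positive");

/* Returns the block size selected at compile time. */
int ep_block_size(void)
{
    return BLKSIZE;
}
```

Compiled without the flag, `ep_block_size()` returns 15000; compiled with `-DSPEC_BLOCK_SIZE=8000`, it would return 8000.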

Users may not use larger block sizes in base runs. In peak runs, however, -DSPEC_BLOCK_SIZE settings with larger block sizes are considered a tuning parameter and may be used.

## Reference

1. Information on NPB 3.3, including the technical report, the original specifications, source code, results and information on how to submit new results, is available at: http://www.nas.nasa.gov/Software/NPB/
2. Information about the C version developed by the Center for Manycore Programming can be found at: http://aces.snu.ac.kr/Center_for_Manycore_Programming/Home.html