Richard F. Barrett, Courtenay T. Vaughan, and Michael A. Heroux
Center for Computing Research, Sandia National Laboratories
A broad range of scientific computation involves the use of difference stencils. In a parallel computing environment, this computation is typically implemented by decomposing the spatial domain, which induces a "halo exchange" of process-owned boundary data. This approach adheres to the Bulk Synchronous Parallel (BSP) model. Because commonly available architectures provide high inter-node bandwidth relative to their latency costs, many codes "bulk up" these messages, aggregating data into fewer, larger messages to reduce the total number of messages sent. A renewed focus on non-traditional architectures and architecture features provides new opportunities for exploring alternatives to this programming approach.
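To make the halo exchange concrete, here is a serial Python sketch that simulates several processes sharing one 2D grid. It is illustrative only and not miniGhost's code. It assumes a strip decomposition along x, zero ghost values on the global boundary, and a 5-point averaging stencil, and every function name (`halo_exchange`, `split`, and so on) is hypothetical. In the spirit of the aggregated exchange described above, each process packs the boundary rows of all of its variables into a single message per neighbor, then all messages are delivered in one bulk-synchronous phase.

```python
def ghosted(nx, ny):
    """Allocate an (nx + 2) x (ny + 2) grid; the outer ring holds ghost cells."""
    return [[0.0] * (ny + 2) for _ in range(nx + 2)]


def stencil_2d5pt(g, nx, ny):
    """Apply a 5-point averaging stencil to the interior of a ghosted grid."""
    out = ghosted(nx, ny)
    for i in range(1, nx + 1):
        for j in range(1, ny + 1):
            out[i][j] = (g[i][j] + g[i - 1][j] + g[i + 1][j]
                         + g[i][j - 1] + g[i][j + 1]) / 5.0
    return out


def make_global(nvars, nx, ny):
    """Deterministic test data: one ghosted global grid per variable."""
    grids = []
    for k in range(nvars):
        g = ghosted(nx, ny)
        for i in range(1, nx + 1):
            for j in range(1, ny + 1):
                g[i][j] = float((k + 1) * (i * ny + j) % 7)
        grids.append(g)
    return grids


def split(global_grids, nx, ny, nprocs):
    """Scatter global rows into per-process ghosted strips (assumes nx % nprocs == 0)."""
    nxl = nx // nprocs
    procs = []
    for p in range(nprocs):
        local = []
        for g in global_grids:
            lg = ghosted(nxl, ny)
            for i in range(1, nxl + 1):
                lg[i] = g[p * nxl + i][:]  # copy owned row (ghost columns stay 0)
            local.append(lg)
        procs.append(local)
    return procs


def halo_exchange(procs, ny):
    """One BSP exchange: pack all variables per neighbor, deliver, unpack.

    Returns the number of messages sent.
    """
    nprocs = len(procs)
    messages = {}  # (src, dst) -> flat buffer holding every variable's row
    for p, grids in enumerate(procs):
        nxl = len(grids[0]) - 2
        if p > 0:  # first owned row goes to the left neighbor
            messages[(p, p - 1)] = [v for g in grids for v in g[1][1:ny + 1]]
        if p < nprocs - 1:  # last owned row goes to the right neighbor
            messages[(p, p + 1)] = [v for g in grids for v in g[nxl][1:ny + 1]]
    # After the (implicit) barrier, unpack each buffer into ghost rows.
    for p, grids in enumerate(procs):
        nxl = len(grids[0]) - 2
        if p > 0:
            buf = messages[(p - 1, p)]
            for k, g in enumerate(grids):
                g[0][1:ny + 1] = buf[k * ny:(k + 1) * ny]
        if p < nprocs - 1:
            buf = messages[(p + 1, p)]
            for k, g in enumerate(grids):
                g[nxl + 1][1:ny + 1] = buf[k * ny:(k + 1) * ny]
    return len(messages)


if __name__ == "__main__":
    nvars, nx, ny, nprocs = 3, 8, 5, 4
    glob = make_global(nvars, nx, ny)
    procs = split(glob, nx, ny, nprocs)
    print("messages:", halo_exchange(procs, ny))  # 2 * (nprocs - 1) = 6
    nxl = nx // nprocs
    for k in range(nvars):
        ref = stencil_2d5pt(glob[k], nx, ny)
        for p in range(nprocs):
            loc = stencil_2d5pt(procs[p][k], nxl, ny)
            for i in range(1, nxl + 1):
                assert loc[i][1:ny + 1] == ref[p * nxl + i][1:ny + 1]
    print("distributed result matches serial result")
```

With aggregation, the message count is 2 * (nprocs - 1) regardless of how many variables are exchanged; sending each variable separately would multiply that count by the number of variables, which is the latency cost that "bulking up" avoids.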
Command line options:
| Option | Description | Values |
|---|---|---|
| --scaling | Parallel scaling configuration | SCALING_STRONG, SCALING_WEAK* |
| --comm_method | Boundary exchange implementation | COMM_METHOD_BSPMA*, COMM_METHOD_SVAF |
| --stencil | Stencil to be applied | STENCIL_NONE, STENCIL_2D5PT, STENCIL_2D9PT, STENCIL_3D7PT, STENCIL_3D27PT* |
| --nx, --ny, --nz | Grid dimensions in the (x, y, z) directions: global values if strong scaling, local values if weak scaling | >0; 10* |
| --num_vars | Number of grid arrays operated on | 1-40* |
| --percent_sum | (Approximate) percentage of variables that are summation-reduced | 0-100; 0* |
| --num_tsteps | Number of time steps iterated | >0 |
| --num_spikes | Number of source spikes inserted | >0 |
| --npx, --npy, --npz | Logical processor grid in the (x, y, z) directions | >0 |
| --report_diffusion | Write the error to stdout every n time steps | n >= 0* |
| --debug_grid | Initialize grids to 0 and insert a heat source at the center | 0 or 1* |
| --report_perf | Reporting options (ignored in the SPEC version) | 0*, 1, 2 |

Values marked * are defaults.
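How --scaling interacts with the grid dimensions and the logical processor grid, and which neighbor cells each --stencil choice reads, can be sketched as follows. This is a sketch under stated assumptions: the function names are hypothetical, option values are passed as the strings listed in the table, and under strong scaling the sketch simply rejects extents that do not divide evenly by the process count (how miniGhost distributes a remainder is not described here).

```python
from itertools import product


def grid_sizes(scaling, n, nprocs_xyz):
    """Return (local_dims, global_dims) as (x, y, z) tuples.

    n is (--nx, --ny, --nz); nprocs_xyz is (--npx, --npy, --npz).
    """
    if scaling == "SCALING_STRONG":  # n gives the global extents
        if any(g % p for g, p in zip(n, nprocs_xyz)):
            raise ValueError("global extent not divisible by process count")
        local = tuple(g // p for g, p in zip(n, nprocs_xyz))
        return local, tuple(n)
    if scaling == "SCALING_WEAK":  # n gives the per-process extents
        return tuple(n), tuple(l * p for l, p in zip(n, nprocs_xyz))
    raise ValueError(f"unknown scaling mode: {scaling}")


# Stencil name -> (number of dimensions, include diagonal neighbors?)
STENCILS = {
    "STENCIL_2D5PT": (2, False),
    "STENCIL_2D9PT": (2, True),
    "STENCIL_3D7PT": (3, False),
    "STENCIL_3D27PT": (3, True),
}


def stencil_offsets(name):
    """Offsets (including the center point) read by each stencil."""
    if name == "STENCIL_NONE":
        return []
    dims, diagonals = STENCILS[name]
    return [o for o in product((-1, 0, 1), repeat=dims)
            if diagonals or sum(map(abs, o)) <= 1]


if __name__ == "__main__":
    print(grid_sizes("SCALING_STRONG", (100, 100, 100), (2, 2, 4)))
    print(grid_sizes("SCALING_WEAK", (10, 10, 10), (2, 2, 4)))
    for name in STENCILS:
        offs = stencil_offsets(name)
        # Every offset except the center is a direction the halo must supply.
        print(name, len(offs), "points,", len(offs) - 1, "neighbor directions")
```

The same command-line values mean different problem sizes under the two modes: --nx 100 on 2 processes along x gives 50 local cells under strong scaling, while --nx 10 on 2 processes gives 20 global cells under weak scaling. The 5- and 7-point stencils read only face neighbors, whereas the 9- and 27-point stencils also read edge and corner cells, so their halos must contain that diagonal data as well.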
The benchmark is self-validating: it reports an error only if the computed error exceeds the tolerance set by --error_tol.
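The idea of self-validation against a tolerance can be illustrated with a small conservation check. This is a sketch under stated assumptions, not miniGhost's actual check: it uses periodic boundaries so that a 5-point averaging stencil conserves the grid total exactly (up to floating-point round-off), inserts source spikes of random magnitude at evenly spaced time steps, and passes if the relative difference between the final grid sum and the total inserted is within error_tol. The function names are hypothetical.

```python
import random


def step_periodic_2d5pt(g):
    """One 5-point averaging step with periodic (wrap-around) boundaries."""
    nx, ny = len(g), len(g[0])
    return [[(g[i][j] + g[(i - 1) % nx][j] + g[(i + 1) % nx][j]
              + g[i][(j - 1) % ny] + g[i][(j + 1) % ny]) / 5.0
             for j in range(ny)]
            for i in range(nx)]


def run_and_validate(nx, ny, num_tsteps, num_spikes, error_tol, seed=0):
    """Return (relative_error, passed) for a small diffusion run."""
    if num_tsteps <= 0 or num_spikes <= 0:
        raise ValueError("num_tsteps and num_spikes must be > 0")
    rng = random.Random(seed)
    g = [[0.0] * ny for _ in range(nx)]
    injected = 0.0
    interval = max(1, num_tsteps // num_spikes)
    for t in range(num_tsteps):
        # Insert a spike at evenly spaced steps (an assumption of this sketch).
        if t % interval == 0 and t // interval < num_spikes:
            mag = rng.uniform(1.0, 10.0)
            g[rng.randrange(nx)][rng.randrange(ny)] += mag
            injected += mag
        g = step_periodic_2d5pt(g)
    total = sum(sum(row) for row in g)
    rel_err = abs(total - injected) / injected
    return rel_err, rel_err <= error_tol


if __name__ == "__main__":
    err, ok = run_and_validate(16, 16, num_tsteps=50, num_spikes=5, error_tol=1e-10)
    print(f"relative error {err:.3e}, passed: {ok}")
```

Because each cell's value is shared equally among itself and its four neighbors, the total is invariant in exact arithmetic, so any measured drift comes from round-off; a tolerance check of this kind separates that expected drift from a genuine error.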
In the original (non-SPEC) version, performance output is controlled by the --report_perf command-line option. By default it is 0, which writes the problem configuration and performance results to a file named result.yaml, formatted in YAML. Setting it to 1 also writes this information to a text file named result.txt; setting it to 2 additionally adds per-processor communication times to result.txt.
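The three --report_perf levels can be summarized as a function that decides which files are produced and what they contain. This is only a sketch: the function name and result field names are hypothetical, the YAML is emitted as flat `key: value` lines (adequate for simple scalar values only), and the text layout is invented for illustration rather than taken from the benchmark.

```python
def render_reports(report_perf, results, comm_times):
    """Return {filename: contents} for a given --report_perf level.

    results: flat dict of scalar entries (hypothetical field names).
    comm_times: per-process communication times in seconds, indexed by rank.
    """
    if report_perf not in (0, 1, 2):
        raise ValueError("report_perf must be 0, 1 or 2")
    # Level 0 (default): configuration and performance results in result.yaml.
    yaml_text = "".join(f"{k}: {v}\n" for k, v in results.items())
    files = {"result.yaml": yaml_text}
    if report_perf >= 1:
        # Level 1: the same information is also written to result.txt.
        txt = "".join(f"{k:<24}{v}\n" for k, v in results.items())
        if report_perf == 2:
            # Level 2: append per-processor communication times.
            txt += "".join(f"rank {r:>5} comm time (s): {t:.6f}\n"
                           for r, t in enumerate(comm_times))
        files["result.txt"] = txt
    return files


if __name__ == "__main__":
    out = render_reports(2, {"nx": 10, "num_tsteps": 20}, [0.012, 0.015])
    for name, text in out.items():
        print(f"--- {name}\n{text}", end="")
```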