NASA7
NASA Ames FORTRAN Kernels

1.0 General

1.1: Classification:

     These kernels are very heavily floating point intensive.

1.2: Description:

     NASA7 is a collection of 7 kernels. For each kernel, the program
     generates its own input data, performs the kernel and compares the
     result against an expected result. Read the VERSION file for
     details of SPEC modifications. The seven kernels are:

     MXM     - matrix multiply
     CFFT2D  - complex radix 2 FFT on 2D array
     CHOLSKY - Cholesky decomposition in parallel on a set of input
               matrices.
     BTRIX   - Block tridiagonal matrix solution along one dimension
               of a four dimensional array.
     GMTRY   - Sets up arrays for a vortex method solution and performs
               Gaussian elimination on the resulting arrays.
     EMIT    - Creates new vortices according to certain boundary
               conditions.
     VPENTA  - inverts 3 matrix pentadiagonals in a highly parallel
               fashion.

1.3: Source/Author:

     David H. Bailey and John T. Barton, NASA, Ames

1.4: Version/Date:

     09/09/91

1.5: Other Information:

     A complete description of NASA7 can be found in NASA Technical
     Memorandum #86711, "The NAS Kernel Benchmark Program" by
     David H. Bailey and John T. Barton

2. PERFORMANCE

2.1: Metrics:

2.2: Elapsed Time:

     The benchmark runs in about 20 minutes on Apollo's DN10000. On
     less powerful machines, the benchmark can take over 4 hours to
     run. The SPEC reference time (to 3 sig. fig.) is 16800 seconds.

2.3: Report:

     NASA7 produces the report shown below

                THE NAS KERNEL BENCHMARK PROGRAM

      PROGRAM        ERROR          FP OPS       SECONDS      MFLOPS

      MXM          3.4313E-15     4.1943E+08     152.9719       2.74
      CFFT2D       1.9008E-13     4.9807E+08     194.1711       2.57
      CHOLSKY      2.8784E-12     2.2103E+08     203.5035       1.09
      BTRIX        2.5033E-13     3.2197E+08     180.1951       1.79
      GMTRY        2.4207E-13     2.2650E+08     240.8430       0.94
      EMIT         1.0347E-15     2.2604E+08      40.2722       5.61
      VPENTA       8.4523E-15     2.5943E+08     231.0638       1.12

      TOTAL        3.5738E-12     2.1725E+09    1243.0207       1.75
     Fortran STOP

2.4: Additional Performance Considerations:

     The benchmark, as distributed by SPEC, uses double precision data.
     This degree of precision meets the minimum precision requirements
     specified by the benchmark's authors. The total error must be less
     than 5e-10. The SECONDS and MFLOPS fields are all reported as 0.0.

3. SOFTWARE

     NASA7 is written in FORTRAN and is highly portable. The benchmark
     does contain code that is vectorizable, although certain kernels
     may be more difficult to vectorize than others. The source code
     contains no vectorizing compiler directives.

4. HARDWARE

     No special hardware is required. NASA7 runs without paging in
     8 megabytes of physical memory and probably requires much less.
     Any floating point hardware such as vector or array processors
     will have a big impact on performance.

5. OPERATIONAL

5.1: Disk Space

     The source of NASA7 is a little over 1000 lines, so minimal space
     is required to hold the source. The executable image is likely to
     be less than 100k bytes.

5.2: Installation:

     All the material related to the NASA7 benchmark is in the
     directory nasa7. There is a Makefile in that directory that will
     build the benchmark and run it. The Makefile passes '-O' to the
     f77 compiler. This can be overridden by specifying FFLAGS on the
     'make' command line.

5.3: Execution:

     The Makefile runs the benchmark by default.

5.4: Correctness Verification:

     The program should produce a table similar to the one in
     Paragraph 2.3. The total error must be less than 5e-10.
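
     The correctness check above can be sketched as a small script.
     This is only an illustration, not part of the SPEC distribution:
     the function names (parse_report, total_error_ok, mflops) are
     hypothetical, and the report is assumed to have the five-column
     layout shown in Paragraph 2.3. The MFLOPS column is assumed to be
     FP OPS divided by SECONDS and by 1e6, which agrees with the sample
     figures. Because SECONDS may be reported as 0.0, the helper
     returns None in that case instead of dividing.

```python
# Sketch only: helper names are hypothetical, and the report layout is
# assumed to match the table in Paragraph 2.3.

ERROR_LIMIT = 5e-10  # total error bound stated in the description

SAMPLE_REPORT = """\
            THE NAS KERNEL BENCHMARK PROGRAM

 PROGRAM        ERROR          FP OPS       SECONDS      MFLOPS

 MXM          3.4313E-15     4.1943E+08     152.9719       2.74
 CFFT2D       1.9008E-13     4.9807E+08     194.1711       2.57
 CHOLSKY      2.8784E-12     2.2103E+08     203.5035       1.09
 BTRIX        2.5033E-13     3.2197E+08     180.1951       1.79
 GMTRY        2.4207E-13     2.2650E+08     240.8430       0.94
 EMIT         1.0347E-15     2.2604E+08      40.2722       5.61
 VPENTA       8.4523E-15     2.5943E+08     231.0638       1.12

 TOTAL        3.5738E-12     2.1725E+09    1243.0207       1.75
"""


def parse_report(text):
    """Return {name: (error, fp_ops, seconds, mflops)} for each data row."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # blank line, column header, or trailer
        try:
            values = tuple(float(f) for f in fields[1:])
        except ValueError:
            continue  # e.g. the title line, which is not numeric
        rows[fields[0]] = values
    return rows


def total_error_ok(rows, limit=ERROR_LIMIT):
    """True if the TOTAL row's error is below the stated limit."""
    return rows["TOTAL"][0] < limit


def mflops(fp_ops, seconds):
    """Millions of floating point operations per second, or None if
    no elapsed time was reported (SECONDS == 0.0)."""
    if seconds == 0:
        return None
    return fp_ops / seconds / 1e6


if __name__ == "__main__":
    rows = parse_report(SAMPLE_REPORT)
    print("total error within limit:", total_error_ok(rows))
    for name, (error, fp_ops, seconds, _) in rows.items():
        print(name, mflops(fp_ops, seconds))
```

     To check a real run, the captured program output would be passed
     to parse_report in place of SAMPLE_REPORT.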