SPEChpc™ 2021 Large Result

Copyright 2021-2023 Standard Performance Evaluation Corporation

IBM (Test Sponsor: Oak Ridge National Laboratory)

Summit: IBM Power System AC922 (IBM Power9, Tesla V100-SXM2-16GB)

SPEChpc 2021_lrg_base = 41.00

SPEChpc 2021_lrg_peak = Not Run

hpc2021 License: 056A
Test Date: Sep-2021
Test Sponsor: Oak Ridge National Laboratory
Hardware Availability: Nov-2018
Tested by: Oak Ridge National Laboratory
Software Availability: Jul-2021

Benchmark result graphs are available in the PDF report.

Results Table

Base results (peak was not run)

Results appear in the order in which they were run. Bold underlined text indicates a median measurement.

Benchmark      Model  Ranks  Thrds/Rnk  Run 1 Seconds  Run 1 Ratio  Run 2 Seconds  Run 2 Ratio
805.lbm_l      ACC    8400   1          38.6           70.5         27.0           101
818.tealeaf_l  ACC    8400   1          68.3           21.2         68.3           21.2
819.clvleaf_l  ACC    8400   1          37.4           56.2         35.3           59.5
828.pot3d_l    ACC    8400   1          1560           29.1         1410           32.3
834.hpgmgfv_l  ACC    8400   1          1510           22.2         1400           23.9
835.weather_l  ACC    8400   1          39.3           87.2         37.6           91.0

SPEChpc 2021_lrg_base  41.00
SPEChpc 2021_lrg_peak  Not Run
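SPEChpc 2021 reports the overall metric as the geometric mean of the per-benchmark ratios, where each ratio is a reference time divided by the measured time. As a rough cross-check, taking the lower of the two base ratios for each benchmark reproduces the published 41.00 to within rounding, which is consistent with the slower run counting when only two runs are made. The sketch below assumes that selection rule; a small gap from 41.00 is expected because the table shows rounded values.

```python
import math

# Lower (slower-run) base ratio for each benchmark, copied from the Results Table.
# Assumption: with two runs per benchmark, the slower run is the one that counts.
SELECTED_RATIOS = {
    "805.lbm_l": 70.5,
    "818.tealeaf_l": 21.2,
    "819.clvleaf_l": 56.2,
    "828.pot3d_l": 29.1,
    "834.hpgmgfv_l": 22.2,
    "835.weather_l": 87.2,
}


def geometric_mean(values):
    """Return the geometric mean of positive numbers via the mean of their logs."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


overall = geometric_mean(SELECTED_RATIOS.values())
print(f"SPEChpc 2021_lrg_base (recomputed from rounded ratios): {overall:.2f}")
```

Using logarithms avoids forming the product of all six ratios directly, which is the usual way to compute a geometric mean without overflow.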
Hardware Summary
Type of System: Homogeneous Cluster
Compute Node: IBM Power System AC922
Interconnect: Mellanox InfiniBand
Compute Nodes Used: 1400
Total Chips: 2800
Total Cores: 30800
Total Threads: 123200
Total Memory: 700 TB
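The summary totals follow from the node count and the per-node figures. The small check below assumes that the 512 GB per node and the 700 TB total are both binary units (GiB and TiB), which is what makes the memory total come out exactly; it also shows that the totals imply 22 cores per node.

```python
# Figures taken from the Hardware Summary and Node Description.
NODES = 1400
CHIPS_PER_NODE = 2
THREADS_PER_CORE = 4
MEMORY_PER_NODE_GB = 512  # assumed to be GiB
TOTAL_CORES = 30800       # as reported in the Hardware Summary

total_chips = NODES * CHIPS_PER_NODE
cores_per_node = TOTAL_CORES // NODES
total_threads = TOTAL_CORES * THREADS_PER_CORE
# 1400 nodes x 512 GiB, converted to TiB (1 TiB = 1024 GiB).
total_memory_tib = NODES * MEMORY_PER_NODE_GB / 1024

print(total_chips, cores_per_node, total_threads, total_memory_tib)
```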
Software Summary
Compiler: C/C++/Fortran: Version 21.7 of NVHPC Toolkit
MPI Library: Spectrum MPI Version
Other MPI Info: None
Other Software: None
Base Parallel Model: ACC
Base Ranks Run: 8400
Base Threads Run: 1
Peak Parallel Models: Not Run

Node Description: IBM Power System AC922

Number of nodes: 1400
Uses of the node: compute
Vendor: IBM
Model: IBM Power System AC922
CPU Name: IBM POWER9 2.1 (pvr 004e 1201)
CPU(s) orderable: 2 chips
Chips enabled: 2
Cores enabled: 22
Cores per chip: 44
Threads per core: 4
CPU Characteristics: Up to 3.8 GHz
CPU MHz: 2300
Primary Cache: 32 KB I + 32 KB D on chip per core
Secondary Cache: 512 KB I+D on chip per core
L3 Cache: 110 MB I+D on chip per chip
Other Cache: None
Memory: 512 GB (16 x 32 GB RDIMM-DDR4-2666)
Disk Subsystem: 2 x 800 GB (Samsung Electronics Co Ltd NVMe SSD Controller 172Xa/172Xb)
Other Hardware: None
Accel Count: 6
Accel Model: Tesla V100-SXM2-16GB
Accel Vendor: NVIDIA Corporation
Accel Type: GPU
Accel Connection: NVLink 2.0
Accel ECC enabled: Yes
Accel Description: See Notes
Adapter: Mellanox ConnectX-5
Number of Adapters: 2
Slot Type: None
Data Rate: 100 Gb/s (4X EDR)
Ports Used: 2
Interconnect Type: EDR InfiniBand
Accelerator Driver: NVIDIA CUDA 450.80.02
Adapter: Mellanox ConnectX-5
Adapter Driver: 4.9-
Adapter Firmware: 16.29.1016
Operating System: Red Hat Enterprise Linux
Local File System: xfs
Shared File System: 250 PB IBM Spectrum Scale parallel filesystem over 4X EDR InfiniBand
System State: Multi-user, run level 3
Other Software: None
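The adapter entries give each node's nominal injection bandwidth into the InfiniBand fabric. The sketch below multiplies the ports used by the per-port data rate; it is a nominal figure only and ignores protocol overhead and any contention inside the fat-tree.

```python
# Figures from the Node Description (adapter section).
PORTS_USED = 2
DATA_RATE_GBIT_S = 100  # 4X EDR, per port
NODES = 1400

per_node_gbit_s = PORTS_USED * DATA_RATE_GBIT_S   # nominal per-node injection rate
per_node_gbyte_s = per_node_gbit_s / 8            # same rate in GB/s (8 bits per byte)
aggregate_tbit_s = NODES * per_node_gbit_s / 1000 # summed over all nodes used, in Tb/s

print(per_node_gbit_s, per_node_gbyte_s, aggregate_tbit_s)
```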

Interconnect Description: Mellanox InfiniBand

Vendor: Mellanox
Model: Mellanox Switch IB-2
Switch Model: Mellanox IB EDR Switch IB-2
Number of Switches: 1
Number of Ports: 36
Data Rate: 100 Gb/s
Topology: Non-blocking Fat-tree
Primary Use: MPI Traffic and GPFS access

Submit Notes

The config file option 'submit' was used.

General Notes

 MPI startup command:
   The jsrun command was used to launch the job with 1 GPU per rank.
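With one GPU per rank, the rank count fixes how many GPUs each node must supply. The sketch below assumes the 8400 ranks are spread evenly across the 1400 nodes; it does not reproduce the actual jsrun command line, which the report does not give.

```python
# Figures from the Software Summary and Node Description.
NODES = 1400
RANKS = 8400
THREADS_PER_RANK = 1
GPUS_PER_RANK = 1  # from the "1 GPU/rank" startup note

# Assumption: ranks are distributed evenly, so the remainder should be zero.
ranks_per_node, leftover = divmod(RANKS, NODES)
gpus_per_node = ranks_per_node * GPUS_PER_RANK
threads_per_node = ranks_per_node * THREADS_PER_RANK

print(ranks_per_node, leftover, gpus_per_node, threads_per_node)
```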
Detailed information from nvaccelinfo

CUDA Driver Version:           11000
NVRM version:                  NVIDIA UNIX ppc64le Kernel Module  450.80.02  Wed Sep 23 00:55:04 UTC 2020

Device Number:                 0
Device Name:                   Tesla V100-SXM2-16GB
Device Revision Number:        7.0
Global Memory Size:            16911433728
Number of Multiprocessors:     80
Concurrent Copy and Execution: Yes
Total Constant Memory:         65536
Total Shared Memory per Block: 49152
Registers per Block:           65536
Warp Size:                     32
Maximum Threads per Block:     1024
Maximum Block Dimensions:      1024, 1024, 64
Maximum Grid Dimensions:       2147483647 x 65535 x 65535
Maximum Memory Pitch:          2147483647B
Texture Alignment:             512B
Clock Rate:                    1530 MHz
Execution Timeout:             No
Integrated Device:             No
Can Map Host Memory:           Yes
Compute Mode:                  exclusive-process
Concurrent Kernels:            Yes
ECC Enabled:                   Yes
Memory Clock Rate:             877 MHz
Memory Bus Width:              4096 bits
L2 Cache Size:                 6291456 bytes
Max Threads Per SMP:           2048
Async Engines:                 4
Unified Addressing:            Yes
Managed Memory:                Yes
Concurrent Managed Memory:     Yes
Preemption Supported:          Yes
Cooperative Launch:            Yes
  Multi-Device:                Yes
Default Target:                cc70
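Several headline GPU figures can be derived from the raw nvaccelinfo fields above. The sketch below converts the byte counts to binary units, computes the maximum number of resident threads, and estimates peak memory bandwidth under the assumption that the HBM2 interface transfers data on both clock edges (two transfers per cycle).

```python
# Raw values from the nvaccelinfo output for device 0.
GLOBAL_MEMORY_BYTES = 16911433728
MULTIPROCESSORS = 80
MAX_THREADS_PER_SM = 2048
MEMORY_CLOCK_MHZ = 877
MEMORY_BUS_WIDTH_BITS = 4096
L2_CACHE_BYTES = 6291456

global_memory_gib = GLOBAL_MEMORY_BYTES / 2**30
l2_cache_mib = L2_CACHE_BYTES / 2**20
max_resident_threads = MULTIPROCESSORS * MAX_THREADS_PER_SM

# Assumption: double data rate, i.e. 2 transfers per memory clock cycle.
TRANSFERS_PER_CYCLE = 2
peak_bw_gb_s = (MEMORY_CLOCK_MHZ * 1e6 * TRANSFERS_PER_CYCLE
                * MEMORY_BUS_WIDTH_BITS / 8 / 1e9)

print(global_memory_gib, l2_cache_mib, max_resident_threads, round(peak_bw_gb_s, 1))
```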

Compiler Version Notes

 CC  805.lbm_l(base) 818.tealeaf_l(base) 834.hpgmgfv_l(base)
nvc 21.7-0 linuxpower target on Linuxpower 
NVIDIA Compilers and Tools
Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

 FC  819.clvleaf_l(base) 828.pot3d_l(base) 835.weather_l(base)
nvfortran 21.7-0 linuxpower target on Linuxpower 
NVIDIA Compilers and Tools
Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Base Compiler Invocation

C benchmarks:


Fortran benchmarks:


Base Optimization Flags

C benchmarks:

 -O3   -acc=gpu 

Fortran benchmarks:

 -O3   -acc=gpu 
