SPEC® MPIL2007 Result

Copyright 2006-2010 Standard Performance Evaluation Corporation

Intel Corporation

Endeavor (Intel Xeon X5670, 2.93 GHz, DDR3-1333 MHz, SMT on, Turbo off)

MPI2007 license: 13
Test date: Nov-2010
Test sponsor: Intel Corporation
Hardware Availability: Mar-2010
Tested by: Pavel Shelepugin
Software Availability: Nov-2010
Benchmark results graph

Results Table

Results appear in the order in which they were run. Bold underlined text indicates a median measurement.

               Base                                                   Peak
Benchmark      Ranks  Seconds  Ratio  Seconds  Ratio  Seconds  Ratio  Ranks  Seconds  Ratio  Seconds  Ratio  Seconds  Ratio
121.pop2        3072      103   37.7     86.4   45.0     88.7   43.9   2048     77.1   50.4     77.7   50.1     77.3   50.3
122.tachyon     3072      937   2.08     73.5   26.5     73.3   26.5   3072      938   2.07     67.2   28.9     67.1   29.0
125.RAxML       3072      127   23.0      127   22.9      127   22.9   3072      126   23.2      125   23.3      125   23.3
126.lammps      3072     42.2   58.3     39.6   62.1     40.2   61.2   3072     42.9   57.4     38.6   63.6     38.7   63.5
128.GAPgeofem   3072      189   31.4      190   31.3      194   30.5   3072      195   30.5      192   30.9      184   32.3
129.tera_tf     3072     48.6   22.6     44.0   25.0     45.9   24.0   3072     42.5   25.8     41.9   26.2     42.0   26.2
132.zeusmp2     3072     41.8   50.7     43.0   49.3     42.1   50.3   2048     39.7   53.4     40.0   53.0     40.2   52.8
137.lu          3072     38.3    110     38.9    108     39.4    107   2048     34.4    122     34.6    121     34.6    121
142.dmilc       3072     24.6    150     24.8    148     24.7    149   3072     24.2    152     24.1    153     24.4    151
143.dleslie     3072      831   3.73      826   3.75      828   3.74   2048     32.9   94.1     31.9   97.3     31.3   99.2
145.lGemsFDTD   3072      111   39.7      110   40.1      127   34.8   2048      104   42.6      114   38.6      103   42.8
147.l2wrf2      3072      125   65.4      126   64.9      126   64.9   3072      128   64.0      137   59.8      122   67.1
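
The overall base and peak scores are not part of this text. The following Python sketch shows how the per-benchmark medians could be recovered from the three ratios of each run, and how they could be combined, assuming the overall score is the geometric mean of the per-benchmark median ratios. The names base_ratios, peak_ratios, median_ratios and geometric_mean are illustrative only. The printed values are a rough cross-check, not the official metrics.

```python
from math import prod
from statistics import median

# Ratios of the three runs per benchmark, copied from the Results Table.
base_ratios = {
    "121.pop2": [37.7, 45.0, 43.9],
    "122.tachyon": [2.08, 26.5, 26.5],
    "125.RAxML": [23.0, 22.9, 22.9],
    "126.lammps": [58.3, 62.1, 61.2],
    "128.GAPgeofem": [31.4, 31.3, 30.5],
    "129.tera_tf": [22.6, 25.0, 24.0],
    "132.zeusmp2": [50.7, 49.3, 50.3],
    "137.lu": [110, 108, 107],
    "142.dmilc": [150, 148, 149],
    "143.dleslie": [3.73, 3.75, 3.74],
    "145.lGemsFDTD": [39.7, 40.1, 34.8],
    "147.l2wrf2": [65.4, 64.9, 64.9],
}
peak_ratios = {
    "121.pop2": [50.4, 50.1, 50.3],
    "122.tachyon": [2.07, 28.9, 29.0],
    "125.RAxML": [23.2, 23.3, 23.3],
    "126.lammps": [57.4, 63.6, 63.5],
    "128.GAPgeofem": [30.5, 30.9, 32.3],
    "129.tera_tf": [25.8, 26.2, 26.2],
    "132.zeusmp2": [53.4, 53.0, 52.8],
    "137.lu": [122, 121, 121],
    "142.dmilc": [152, 153, 151],
    "143.dleslie": [94.1, 97.3, 99.2],
    "145.lGemsFDTD": [42.6, 38.6, 42.8],
    "147.l2wrf2": [64.0, 59.8, 67.1],
}


def median_ratios(table):
    """Return the median of the three run ratios for each benchmark."""
    return {name: median(runs) for name, runs in table.items()}


def geometric_mean(values):
    """Geometric mean of a non-empty sequence of positive numbers."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


base_medians = median_ratios(base_ratios)
peak_medians = median_ratios(peak_ratios)

# Assumption: the overall score is the geometric mean of the median ratios.
print("base estimate:", round(geometric_mean(base_medians.values()), 1))
print("peak estimate:", round(geometric_mean(peak_medians.values()), 1))
```

Taking the median makes each benchmark's score insensitive to a single slow run, such as the first 122.tachyon run in both base and peak.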
Hardware Summary
Type of System: Homogeneous
Compute Node: Endeavor Node
Interconnects: IB Switch, Gigabit Ethernet
File Server Node: NFS
Total Compute Nodes: 256
Total Chips: 512
Total Cores: 3072
Total Threads: 6144
Total Memory: 6 TB
Base Ranks Run: 3072
Minimum Peak Ranks: 2048
Maximum Peak Ranks: 3072
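
The totals above follow from the per-node figures in the Endeavor Node description (2 chips enabled, 6 cores per chip, 2 threads per core, 24 GB of memory). A small Python sketch of that arithmetic, with illustrative variable names:

```python
# Per-node figures taken from the Endeavor Node description.
nodes = 256
chips_per_node = 2
cores_per_chip = 6
threads_per_core = 2
memory_per_node_gb = 24

total_chips = nodes * chips_per_node
total_cores = total_chips * cores_per_chip
total_threads = total_cores * threads_per_core
total_memory_tb = nodes * memory_per_node_gb / 1024  # GB to TB

print(total_chips, total_cores, total_threads, total_memory_tb)
```

With 3072 base ranks on 3072 cores, the base runs place one MPI rank on each physical core.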
Software Summary
C Compiler: Intel C++ Compiler 12.0.0.072 for Linux
C++ Compiler: Intel C++ Compiler 12.0.0.072 for Linux
Fortran Compiler: Intel Fortran Compiler 12.0.0.072 for Linux
Base Pointers: 64-bit
Peak Pointers: 64-bit
MPI Library: Intel MPI Library 4.0.1.005 for Linux
Other MPI Info: None
Pre-processors: No
Other Software: None

Node Description: Endeavor Node

Hardware
Number of nodes: 256
Uses of the node: compute
Vendor: Intel
Model: SR1600UR
CPU Name: Intel Xeon X5670
CPU(s) orderable: 1-2 chips
Chips enabled: 2
Cores enabled: 12
Cores per chip: 6
Threads per core: 2
CPU Characteristics: Intel Turbo Boost Technology disabled, 6.4 GT/s QPI, Hyper-Threading enabled
CPU MHz: 2934
Primary Cache: 32 KB I + 32 KB D on chip per core
Secondary Cache: 256 KB I+D on chip per core
L3 Cache: 12 MB I+D on chip per chip, 12 MB shared / 6 cores
Other Cache: None
Memory: 24 GB (Dual-rank RDIMM 6x4-GB DDR3-1333 MHz)
Disk Subsystem: Seagate 400 GB ST3400755SS
Other Hardware: None
Adapter: Intel (ESB2) 82575EB Dual-Port Gigabit Ethernet Controller
Number of Adapters: 1
Slot Type: PCI-Express x8
Data Rate: 1Gbps Ethernet
Ports Used: 2
Interconnect Type: Ethernet
Adapter: Mellanox MHQH29-XTC
Number of Adapters: 1
Slot Type: PCIe x8 Gen2
Data Rate: InfiniBand 4x QDR
Ports Used: 1
Interconnect Type: InfiniBand
Software
Adapter: Intel (ESB2) 82575EB Dual-Port Gigabit Ethernet Controller
Adapter Driver: e1000
Adapter Firmware: None
Adapter: Mellanox MHQH29-XTC
Adapter Driver: OFED 1.4.2
Adapter Firmware: 2.7.000
Operating System: Red Hat EL 5.4, kernel 2.6.18-164
Local File System: Linux/ext2
Shared File System: NFS
System State: Multi-User
Other Software: PBS Pro 10.1

Node Description: NFS

Hardware
Number of nodes: 1
Uses of the node: fileserver
Vendor: Intel
Model: S7000FC4UR
CPU Name: Intel Xeon CPU
CPU(s) orderable: 1-4 chips
Chips enabled: 4
Cores enabled: 16
Cores per chip: 4
Threads per core: 2
CPU Characteristics: --
CPU MHz: 2926
Primary Cache: 32 KB I + 32 KB D on chip per core
Secondary Cache: 8 MB I+D on chip per chip, 4 MB shared / 2 cores
L3 Cache: None
Other Cache: None
Memory: 64 GB
Disk Subsystem: 8 disks, 500GB/disk, 2.7TB total
Other Hardware: None
Adapter: Intel 82563GB Dual-Port Gigabit Ethernet Controller
Number of Adapters: 1
Slot Type: PCI-Express x8
Data Rate: 1Gbps Ethernet
Ports Used: 1
Interconnect Type: Ethernet
Software
Adapter: Intel 82563GB Dual-Port Gigabit Ethernet Controller
Adapter Driver: e1000e
Adapter Firmware: N/A
Operating System: Red Hat EL 5 Update 4
Local File System: None
Shared File System: NFS
System State: Multi-User
Other Software: None

Interconnect Description: IB Switch

Hardware
Vendor: Mellanox
Model: Mellanox MTS3600Q-1UNC
Switch Model: Mellanox MTS3600Q-1UNC
Number of Switches: 46
Number of Ports: 36
Data Rate: InfiniBand 4x QDR
Firmware: 7.1.000
Topology: Fat tree
Primary Use: MPI traffic

Interconnect Description: Gigabit Ethernet

Hardware
Vendor: Force10 Networks
Model: Force10 S50, Force10 C300
Switch Model: Force10 S50, Force10 C300
Number of Switches: 15
Number of Ports: 48
Data Rate: 1Gbps Ethernet
Firmware: 8.2.1.0
Topology: Fat tree
Primary Use: Cluster File System

Submit Notes

The config file option 'submit' was used.

General Notes

 MPI startup command:
   The mpiexec.hydra command was used to start MPI jobs. This command does
   not require any daemons to be running beforehand.

 BIOS settings:
   Intel Hyper-Threading Technology (SMT): Enabled (default is Enabled)
   Intel Turbo Boost Technology (Turbo)  : Disabled (default is Enabled)

 RAM configuration:
   Compute nodes have 1x4-GB RDIMM on each memory channel.

 Network:
   Forty-six 36-port switches: 18 core switches and 28 leaf switches.
   Each leaf switch has one link to each core switch. The remaining 18 ports
   on 25 of the 28 leaf switches connect compute nodes. On the other 3 leaf
   switches, these ports serve FS nodes and other peripherals.
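
 The port budget described above can be checked with a short Python sketch.
 All figures come from this report; the variable names are illustrative.

```python
# Switch counts and port counts as stated in the Network note.
core_switches = 18
leaf_switches = 28
ports_per_switch = 36
compute_leaf_switches = 25
compute_nodes = 256

# Each leaf switch has one uplink to every core switch.
uplinks_per_leaf = core_switches
downlinks_per_leaf = ports_per_switch - uplinks_per_leaf

# Ports available for compute nodes across the 25 compute leaf switches.
compute_ports = compute_leaf_switches * downlinks_per_leaf

print(core_switches + leaf_switches)   # total number of switches
print(downlinks_per_leaf)              # node-facing ports per leaf switch
print(compute_ports, compute_ports >= compute_nodes)
```

 Each leaf switch splits its 36 ports evenly between uplinks and node-facing
 ports, and each core switch uses 28 of its 36 ports for leaf links.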

 Job placement:
   Each MPI job was assigned to a topologically compact set of nodes; that
   is, each job used the minimum number of leaf switches it needed: 1 switch
   for 96/192 ranks, 2 switches for 384 ranks, 4 switches for 768 ranks,
   8 switches for 1536 ranks, and 15 switches for 3072 ranks.
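
 The switch counts listed above are consistent with a simple ceiling
 calculation. The Python sketch below assumes 12 ranks per node (one per
 core, as in the 3072-rank base runs on 256 nodes) and 18 compute nodes per
 leaf switch; the function name leaf_switches_needed is hypothetical.

```python
import math

NODES_PER_LEAF = 18   # node-facing ports per leaf switch (Network note)
RANKS_PER_NODE = 12   # assumption: one MPI rank per physical core


def leaf_switches_needed(ranks):
    """Minimum number of leaf switches for a compactly placed job."""
    nodes = math.ceil(ranks / RANKS_PER_NODE)
    return math.ceil(nodes / NODES_PER_LEAF)


for ranks in (96, 192, 384, 768, 1536, 3072):
    print(ranks, leaf_switches_needed(ranks))
```

 Under these assumptions the sketch reproduces the counts in the note: 1, 1,
 2, 4, 8 and 15 leaf switches. Jobs run with fewer ranks per node would need
 a different RANKS_PER_NODE value.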

 PBS Pro was used for job submission. It has no impact on performance.
   It is available from http://www.altair.com

Compiler Invocation

C benchmarks:

 mpiicc 

C++ benchmarks:

126.lammps:  mpiicpc 

Fortran benchmarks:

 mpiifort 

Benchmarks using both Fortran and C:

 mpiicc   mpiifort 

Portability Flags

121.pop2:  -DSPEC_MPI_CASE_FLAG 
126.lammps:  -DMPICH_IGNORE_CXX_SEEK 

Base Optimization Flags

C benchmarks:

 -O3   -xSSE4.2   -no-prec-div 

C++ benchmarks:

126.lammps:  -O3   -xSSE4.2   -no-prec-div 

Fortran benchmarks:

 -O3   -xSSE4.2   -no-prec-div 

Benchmarks using both Fortran and C:

 -O3   -xSSE4.2   -no-prec-div 

Peak Optimization Flags

C benchmarks:

 -O3   -xSSE4.2   -no-prec-div   -ipo 

C++ benchmarks:

126.lammps:  -O3   -xSSE4.2   -no-prec-div   -ipo 

Fortran benchmarks:

 -O3   -xSSE4.2   -no-prec-div   -ipo 

Benchmarks using both Fortran and C:

121.pop2:  -O3   -xSSE4.2   -no-prec-div   -ipo 
128.GAPgeofem:  -O3   -xSSE4.2   -no-prec-div 
132.zeusmp2:  Same as 121.pop2 
147.l2wrf2:  Same as 121.pop2 
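
To make the difference between base and peak flags explicit for the mixed Fortran and C benchmarks, the Python sketch below resolves the "Same as" entries and lists which flags each benchmark adds on top of the base set. The data is copied from the sections above; the names PEAK_FLAGS and resolve_peak_flags are illustrative.

```python
# Base optimization flags shared by all benchmarks in this report.
BASE_FLAGS = ["-O3", "-xSSE4.2", "-no-prec-div"]

# Peak flags for benchmarks using both Fortran and C. A string value
# means "same as" the named benchmark.
PEAK_FLAGS = {
    "121.pop2": ["-O3", "-xSSE4.2", "-no-prec-div", "-ipo"],
    "128.GAPgeofem": ["-O3", "-xSSE4.2", "-no-prec-div"],
    "132.zeusmp2": "121.pop2",
    "147.l2wrf2": "121.pop2",
}


def resolve_peak_flags(name):
    """Follow 'same as' references until a concrete flag list is found."""
    flags = PEAK_FLAGS[name]
    while isinstance(flags, str):
        flags = PEAK_FLAGS[flags]
    return flags


# Flags that peak adds relative to base, per benchmark.
extra_peak_flags = {
    name: [f for f in resolve_peak_flags(name) if f not in BASE_FLAGS]
    for name in PEAK_FLAGS
}

for name, extra in extra_peak_flags.items():
    print(name, extra or "no extra flags")
```

In this listing, 128.GAPgeofem is the only benchmark whose peak flags match the base flags; the others add -ipo.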

The flags file that was used to format this result can be browsed at
http://www.spec.org/mpi2007/flags/EM64T_Intel111_flags.20120720.html.

You can also download the XML flags source by saving the following link:
http://www.spec.org/mpi2007/flags/EM64T_Intel111_flags.20120720.xml.