SPEC® MPIM2007 Result

Copyright 2006-2010 Standard Performance Evaluation Corporation

AMD, QLogic Corporation, Rackable Systems, IWILL

AMD Emerald Cluster: AMD Opteron CPUs,
QLogic InfiniPath/SilverStorm Interconnect

SPECmpiM_peak2007 = Not Run

MPI2007 license: 0018
Test date: May-2007
Test sponsor: QLogic Corporation
Hardware Availability: Nov-2006
Tested by: QLogic Performance Engineering
Software Availability: Jul-2007

Results Table

Results appear in the order in which they were run. Bold underlined text indicates a median measurement.

Base results (Peak: Not Run)

Benchmark      Ranks  Seconds  Ratio  Seconds  Ratio  Seconds  Ratio
104.milc         512     46.7   33.5     45.5   34.4     45.4   34.5
107.leslie3d     512      180   29.0      177   29.5      207   25.3
113.GemsFDTD     512     1338   4.72     1132   5.57     1122   5.62
115.fds4         512     78.4   24.9      116   16.8     76.4   25.5
121.pop2         512      259   16.0      275   15.0      284   14.5
122.tachyon      512     76.2   36.7     76.2   36.7     81.3   34.4
126.lammps       512      232   12.5      231   12.6      232   12.6
127.wrf2         512      205   38.0      187   41.6      186   41.9
128.GAPgeofem    512     64.9   31.8     66.7   31.0     69.4   29.7
129.tera_tf      512      108   25.6      109   25.4      111   24.8
130.socorro      512      150   25.5      151   25.2      156   24.5
132.zeusmp2      512     76.0   40.8     76.0   40.8     76.6   40.5
137.lu           512     60.2   61.0     69.7   52.8     62.5   58.8
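
The median marking mentioned in the table note does not survive in plain text, but the median of each benchmark's three base ratios can be recovered from the numbers themselves. The sketch below does that and then combines the medians with a geometric mean, which is how SPEC suites generally form an overall metric. The names `base_ratios`, `median_ratios` and `geometric_mean` are illustrative, and the value it prints is an unofficial recomputation from rounded ratios, not the published SPECmpiM_base2007 score.

```python
from math import prod
from statistics import median

# Base ratios copied from the results table above (three runs per benchmark).
base_ratios = {
    "104.milc": [33.5, 34.4, 34.5],
    "107.leslie3d": [29.0, 29.5, 25.3],
    "113.GemsFDTD": [4.72, 5.57, 5.62],
    "115.fds4": [24.9, 16.8, 25.5],
    "121.pop2": [16.0, 15.0, 14.5],
    "122.tachyon": [36.7, 36.7, 34.4],
    "126.lammps": [12.5, 12.6, 12.6],
    "127.wrf2": [38.0, 41.6, 41.9],
    "128.GAPgeofem": [31.8, 31.0, 29.7],
    "129.tera_tf": [25.6, 25.4, 24.8],
    "130.socorro": [25.5, 25.2, 24.5],
    "132.zeusmp2": [40.8, 40.8, 40.5],
    "137.lu": [61.0, 52.8, 58.8],
}


def median_ratios(ratios):
    """Return the median ratio of the runs for each benchmark."""
    return {name: median(runs) for name, runs in ratios.items()}


def geometric_mean(values):
    """Return the geometric mean of a non-empty sequence of positive numbers."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


medians = median_ratios(base_ratios)
overall = geometric_mean(medians.values())

for name, value in medians.items():
    print(f"{name:<14} {value:6.2f}")
print(f"geometric mean of median ratios (unofficial): {overall:.1f}")
```

Because a ratio falls as the run time rises, taking the median of the three ratios selects the same run as taking the median of the three times.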
Hardware Summary
Type of System: Homogeneous
Compute Node: Rackable, IWILL, AMD
Interconnects: QLogic InfiniBand HCAs and switches;
Broadcom NICs, Force10 switches
File Server Node: Headnode NFS filesystem
Head Node: Rackable, IWILL, AMD
Other Node: Headnode NFS filesystem
Total Compute Nodes: 128
Total Chips: 256
Total Cores: 512
Total Threads: 512
Total Memory: 1 TB
Base Ranks Run: 512
Minimum Peak Ranks: --
Maximum Peak Ranks: --
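
The totals in the hardware summary follow from the per-node figures given in the compute node description (128 nodes, 2 chips enabled, 2 cores per chip, 1 thread per core, 8 GB per node). A minimal consistency check, assuming 1 TB is counted as 1024 GB and using variable names chosen only for this sketch:

```python
# Per-node values taken from the compute node description in this report.
nodes = 128
chips_per_node = 2
cores_per_chip = 2
threads_per_core = 1
memory_per_node_gb = 8

total_chips = nodes * chips_per_node
total_cores = total_chips * cores_per_chip
total_threads = total_cores * threads_per_core
total_memory_gb = nodes * memory_per_node_gb

print("Total Chips:  ", total_chips)
print("Total Cores:  ", total_cores)
print("Total Threads:", total_threads)
print("Total Memory: ", total_memory_gb, "GB")
```

With one MPI rank per core, the 512 cores match the 512 base ranks reported.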
Software Summary
C Compiler: QLogic PathScale C Compiler 3.0
C++ Compiler: QLogic PathScale C++ Compiler 3.0
Fortran Compiler: QLogic PathScale Fortran Compiler 3.0
Base Pointers: 64-bit
Peak Pointers: 64-bit
MPI Library: QLogic InfiniPath MPI 2.1
Other MPI Info: None
Pre-processors: No
Other Software: None

Node Description: Rackable, IWILL, AMD

Hardware
Number of nodes: 128
Uses of the node: compute, head
Vendor: Rackable Systems, IWILL, AMD
Model: Rackable Systems C1000 chassis, IWILL DK8-HTX motherboard
CPU Name: AMD Opteron 290
CPU(s) orderable: 1-2 chips
Chips enabled: 2
Cores enabled: 4
Cores per chip: 2
Threads per core: 1
CPU Characteristics: --
CPU MHz: 2800
Primary Cache: 64 KB I + 64 KB D on chip per core
Secondary Cache: 1 MB I+D on chip per core
L3 Cache: None
Other Cache: None
Memory: 8 GB (8 x 1 GB DDR400)
Disk Subsystem: 250 GB, SATA
Other Hardware: Nodes custom-built by Rackable Systems. The
Rackable C1000 chassis is half-depth with 450W,
48 VDC Power Supply. Integrated Gigabit Ethernet
for admin/filesystem.
Adapter: Intel 82541PI Gigabit Ethernet controller
Number of Adapters: 1
Slot Type: integrated on motherboard
Data Rate: 1 Gbps Ethernet
Ports Used: 1
Interconnect Type: Ethernet
Adapter: QLogic InfiniPath QHT7140
Number of Adapters: 1
Slot Type: HTX
Data Rate: InfiniBand 4x SDR
Ports Used: 1
Interconnect Type: InfiniBand
Software
Adapter: Intel 82541PI Gigabit Ethernet controller
Adapter Driver: Part of Linux kernel modules
Adapter Firmware: None
Adapter: QLogic InfiniPath QHT7140
Adapter Driver: InfiniPath 2.1
Adapter Firmware: None
Operating System: ClusterCorp Rocks 4.2.1
(Based on Red Hat Enterprise Linux 4.0 Update 4)
Local File System: Linux ext3
Shared File System: NFS
System State: Multi-User
Other Software: Sun Grid Engine 6.0

Node Description: Headnode NFS filesystem

Hardware
Number of nodes: 1
Uses of the node: file server, other
Vendor: Tyan
Model: Thunder K8QSD Pro (S4882) motherboard
CPU Name: AMD Opteron 885
CPU(s) orderable: 1-4 chips
Chips enabled: 4
Cores enabled: 8
Cores per chip: 2
Threads per core: 1
CPU Characteristics: --
CPU MHz: 2600
Primary Cache: 64 KB I + 64 KB D on chip per core
Secondary Cache: 1 MB I+D on chip per core
L3 Cache: None
Other Cache: None
Memory: 16 GB (16 x 1 GB DDR400 DIMMs)
Disk Subsystem: 250 GB, SATA, 7200 RPM
Other Hardware: None
Adapter: Broadcom BCM5704C
Number of Adapters: 2
Slot Type: integrated on motherboard
Data Rate: 1 Gbps Ethernet
Ports Used: 2
Interconnect Type: Ethernet
Software
Adapter: Broadcom BCM5704C
Adapter Driver: Part of Linux kernel modules
Adapter Firmware: None
Operating System: ClusterCorp Rocks 4.2.1
(Based on Red Hat Enterprise Linux 4.0 Update 4)
Local File System: Linux ext3
Shared File System: NFS
System State: Multi-User
Other Software: Sun Grid Engine 6.0

General Notes

"Other" purposes of this node: login, compile, job submission
and queuing.
This node was assembled with a 2U chassis and a 700 watt ATX 12V power supply.

Interconnect Description: QLogic InfiniBand HCAs and switches

Hardware
Vendor: QLogic
Model: InfiniPath and SilverStorm
Switch Model: QLogic SilverStorm 9120 Fabric Director
Number of Switches: 1
Number of Ports: 144
Data Rate: InfiniBand 4x SDR and InfiniBand 4x DDR
Firmware: 3.4.0.5.2
Topology: Single switch (star)
Primary Use: MPI traffic

General Notes

The data rate between InfiniPath HCAs and SilverStorm switches
is SDR. However, DDR is used for inter-switch links.
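
For readers comparing the two link speeds, the sketch below works out the nominal rates of a 4x link. It assumes the standard InfiniBand figures of 2.5 Gbit/s (SDR) and 5 Gbit/s (DDR) signalling per lane with 8b/10b encoding. These figures come from the InfiniBand specification, not from this report, and the function name `link_rates` is illustrative.

```python
# Nominal per-lane signalling rates in Gbit/s (standard InfiniBand values,
# not measured on this cluster).
LANE_SIGNAL_GBPS = {"SDR": 2.5, "DDR": 5.0}


def link_rates(width, speed):
    """Return (signalling rate, data rate) in Gbit/s for a link of `width` lanes."""
    signal = width * LANE_SIGNAL_GBPS[speed]
    data = signal * 8 / 10  # 8b/10b encoding carries 8 data bits per 10 line bits
    return signal, data


for speed in ("SDR", "DDR"):
    signal, data = link_rates(4, speed)
    print(f"4x {speed}: {signal:g} Gbit/s signalling, {data:g} Gbit/s data")
```

These are theoretical link rates only; achieved MPI bandwidth depends on the adapter, driver and message size.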

Interconnect Description: Broadcom NICs, Force10 switches

Hardware
Vendor: Force10
Model: E300
Switch Model: Force10 E300 Gig-E switch
Number of Switches: 1
Number of Ports: 288
Data Rate: 1 Gbps Ethernet
Firmware: N/A
Topology: Single switch (star)
Primary Use: file system traffic

Base Compiler Invocation

C benchmarks:

 /usr/bin/mpicc -cc=pathcc 

C++ benchmarks:

126.lammps:  /usr/bin/mpicxx -CC=pathCC 

Fortran benchmarks:

107.leslie3d:  /usr/bin/mpif90 -f90=pathf90 
113.GemsFDTD:  /usr/bin/mpif90 -f90=pathf90 
115.fds4:  /usr/bin/mpif90 -f90=pathf90 
129.tera_tf:  /usr/bin/mpif90 -f90=pathf90 
132.zeusmp2:  /usr/bin/mpif90 -f90=pathf90 
137.lu:  /usr/bin/mpif90 -f90=pathf90 

Benchmarks using both Fortran and C (except as noted below):

 /usr/bin/mpicc -cc=pathcc   /usr/bin/mpif90 -f90=pathf90 

Base Portability Flags

104.milc:  -DSPEC_MPI_LP64 
121.pop2:  -DSPEC_MPI_DOUBLE_UNDERSCORE   -DSPEC_MPI_LP64 
122.tachyon:  -DSPEC_MPI_LP64 
127.wrf2:  -DF2CSTYLE   -DSPEC_MPI_DOUBLE_UNDERSCORE   -DSPEC_MPI_LINUX   -DSPEC_MPI_LP64 
128.GAPgeofem:  -DSPEC_MPI_LP64 
130.socorro:  -fno-second-underscore   -DSPEC_MPI_LP64 

Base Optimization Flags

C benchmarks:

 -march=opteron   -Ofast 

C++ benchmarks:

126.lammps:  -march=opteron   -O3   -OPT:Ofast   -CG:local_fwd_sched=on 

Fortran benchmarks:

107.leslie3d:  -march=opteron   -O3   -OPT:Ofast   -OPT:malloc_alg=1   -LANG:copyinout=off 
113.GemsFDTD:  -march=opteron   -O3   -OPT:Ofast   -OPT:malloc_alg=1   -LANG:copyinout=off 
115.fds4:  -march=opteron   -O3   -OPT:Ofast   -OPT:malloc_alg=1   -LANG:copyinout=off 
129.tera_tf:  -march=opteron   -O3   -OPT:Ofast   -OPT:malloc_alg=1   -LANG:copyinout=off 
132.zeusmp2:  -march=opteron   -O3   -OPT:Ofast   -OPT:malloc_alg=1   -LANG:copyinout=off 
137.lu:  -march=opteron   -O3   -OPT:Ofast   -OPT:malloc_alg=1   -LANG:copyinout=off 

Benchmarks using both Fortran and C:

121.pop2:  -march=opteron   -Ofast   -O3   -OPT:Ofast   -OPT:malloc_alg=1   -LANG:copyinout=off 
127.wrf2:  Same as 121.pop2 
128.GAPgeofem:  Same as 121.pop2 
130.socorro:  Same as 121.pop2 

Base Other Flags

C benchmarks:

 -IPA:max_jobs=4 

C++ benchmarks:

126.lammps:  -IPA:max_jobs=4 

Fortran benchmarks:

107.leslie3d:  -IPA:max_jobs=4 
113.GemsFDTD:  -IPA:max_jobs=4 
115.fds4:  -IPA:max_jobs=4 
129.tera_tf:  -IPA:max_jobs=4 
132.zeusmp2:  -IPA:max_jobs=4 
137.lu:  -IPA:max_jobs=4 

Benchmarks using both Fortran and C (except as noted below):

 -IPA:max_jobs=4 
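
The invocation, portability, optimization and other flags listed above combine into one compile command per benchmark. The sketch below assembles that command for three benchmarks. The ordering of the flag groups and the helper name `build_command` are assumptions made for illustration, and 104.milc is treated as one of the unnamed "C benchmarks"; the actual command lines are generated by the SPEC tools from the config file.

```python
# Compiler wrappers and flags copied from the Base sections of this report.
INVOCATION = {
    "104.milc": "/usr/bin/mpicc -cc=pathcc",        # assumed to be a C benchmark
    "126.lammps": "/usr/bin/mpicxx -CC=pathCC",
    "107.leslie3d": "/usr/bin/mpif90 -f90=pathf90",
}
PORTABILITY = {
    "104.milc": "-DSPEC_MPI_LP64",
}
OPTIMIZATION = {
    "104.milc": "-march=opteron -Ofast",
    "126.lammps": "-march=opteron -O3 -OPT:Ofast -CG:local_fwd_sched=on",
    "107.leslie3d": "-march=opteron -O3 -OPT:Ofast -OPT:malloc_alg=1 "
                    "-LANG:copyinout=off",
}
OTHER = "-IPA:max_jobs=4"  # same for every benchmark in this report


def build_command(benchmark):
    """Join the flag groups for one benchmark (group order is an assumption)."""
    parts = [
        INVOCATION[benchmark],
        PORTABILITY.get(benchmark, ""),  # some benchmarks list no portability flags
        OPTIMIZATION[benchmark],
        OTHER,
    ]
    return " ".join(part for part in parts if part)


for name in INVOCATION:
    print(f"{name}: {build_command(name)}")
```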

The flags file that was used to format this result can be browsed at
http://www.spec.org/mpi2007/results/flags/MPI2007_flags.20100413.07.html.

You can also download the XML flags source by saving the following link:
http://www.spec.org/mpi2007/results/flags/MPI2007_flags.20100413.07.xml.