About NAMD

NAMD, recipient of a 2002 Gordon Bell Award and a 2012 Sidney Fernbach Award, is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Based on Charm++ parallel objects, NAMD scales to hundreds of cores for typical simulations and beyond 200,000 cores for the largest simulations. NAMD uses the popular molecular graphics program VMD for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR. NAMD is distributed free of charge with source code.

Official website: http://www.ks.uiuc.edu/Research/namd/

Disclaimer: The optimization configuration presented in this article is specific to a particular computational environment. To produce an optimized package for a different combination of hardware, software and parallelization environments, some changes may be required.

Abstract

This document shows how to build NAMD 2.9 using Intel Cluster Studio XE 2013 with Intel MPI and FFTW 2.1.5. The build process explained below was performed on an Intel SandyBridge (E5-2680) node. Several compilers, libraries and compilation options were evaluated to achieve the best performance. In the end, the following configuration was used to build the code:

  • RHEL 6.3
  • Intel Compiler 13.1.0 (Included in the ICS 2013)
  • Intel MPI 4.1.0.024 (Included in the ICS 2013)
  • FFTW 2.1.5 (compiled with the Intel compilers included in the ICS 2013)
  • Charm 6.4.0 (included in the NAMD 2.9 source distribution)
  • TCL 8.5.9 (Downloaded from the NAMD website)

Environment Setup

Both the Intel MPI and Intel Composer packages require configuration options to be set before they can be used. It is important to note that Intel Cluster Studio XE 2013 provides a collection of scripts that set up the compiler, MPI and MKL environment, including the $MKLROOT variable, which is the path to the MKL libraries.

source /share/apps/intel/icsxe/2013.0.028/composer_xe/mkl/bin/mklvars.sh intel64
source /share/apps/intel/icsxe/2013.0.028/composer_xe/bin/compilervars.sh intel64
source /share/apps/intel/icsxe/2013.0.028/mpi/bin64/mpivars.sh
source /share/apps/intel/icsxe/2013.0.028/composer_xe/bin/idbvars.sh intel64
source /share/apps/intel/icsxe/2013.0.028/composer_xe/tbb/bin/tbbvars.sh intel64
source /share/apps/intel/itac/8.1.0.024/intel64/bin/itacvars.sh
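
After sourcing these scripts, a quick sanity check (a minimal sketch; adjust to your installation) confirms that the Intel compilers, the Intel MPI wrappers and the MKL path are visible:

# Verify that the Intel compilers and the Intel MPI wrappers are on the PATH
which icc icpc ifort mpiicc mpiicpc
# MKLROOT should point to the MKL installation configured by mklvars.sh
echo $MKLROOT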

Building dependencies

NAMD can run on many architectures, exploiting different kinds of parallel strategies. This document focuses on the MPI version of NAMD, and we study the scalability and performance of this specific paradigm.

FFTW 2.1.5

FFTW needs to be compiled with the Intel compilers provided by Intel Cluster Studio XE 2013.

wget http://www.fftw.org/fftw-2.1.5.tar.gz
tar -zxvf fftw-2.1.5.tar.gz
cd fftw-2.1.5
./configure F77=ifort CC=icc CFLAGS=-O3 FFLAGS=-O3 --enable-threads --enable-float --enable-type-prefix --prefix=/share/libs/fftw/sandybridge/2.1.5/intel-2013/smp
make
make check | tee fftw_check.log
make install
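
Because FFTW is configured with --enable-float and --enable-type-prefix, the single-precision libraries are installed with an "s" prefix; these are the libraries referenced later in the NAMD arch file. A quick check (a sketch, using the install prefix from the configure line above):

ls /share/libs/fftw/sandybridge/2.1.5/intel-2013/smp/lib
# expect libsfftw.a and libsrfftw.a (plus the *_threads variants) to be present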

CHARM 6.4.0

Several files were modified before building CHARM. This step is required to adapt the configuration scripts to the ICS 2013 compilers. CHARM is included in the NAMD source code tarball. Untar the NAMD and CHARM compressed files (the latter is located inside the main source directory), then apply the changes to conv-mach.sh, cc-mpicxx.sh and smart-build.pl using the corresponding patch files conv-mach.sh.patch, cc-mpicxx.sh.patch and smart-build.pl.patch.

tar -zxvf NAMD_2.9_Source.tar.gz
export NAMD_SRC=$PWD/NAMD_2.9_Source
cd $NAMD_SRC
tar -xvf charm-6.4.0.tar
cd charm-6.4.0/src/arch/mpi-linux-x86_64/
patch conv-mach.sh < ~/src/NAMD/conv-mach.sh.patch
patch cc-mpicxx.sh < ~/src/NAMD/cc-mpicxx.sh.patch
patch smart-build.pl < ~/src/NAMD/smart-build.pl.patch
cd $NAMD_SRC/charm-6.4.0
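
A quick check (a sketch) confirms that the patches were applied and that the wrappers now point at the Intel MPI compiler drivers:

# Both files should now reference mpiicpc/mpiicc instead of mpicxx/mpicc
grep -n "mpiic" src/arch/mpi-linux-x86_64/conv-mach.sh src/arch/mpi-linux-x86_64/cc-mpicxx.sh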

Patches applied to CHARM

conv-mach.sh

diff -uN conv-mach.sh.orig conv-mach.sh
--- conv-mach.sh.orig 2012-03-12 10:00:04.000000000 +1300
+++ conv-mach.sh 2013-06-04 16:38:35.335737000 +1200
@@ -5,8 +5,8 @@
then
. $CHARMINC/MPIOPTS
else
- MPICXX_DEF=mpicxx
- MPICC_DEF=mpicc
+ MPICXX_DEF=mpiicpc
+ MPICC_DEF=mpiicc
fi
test -z "$MPICXX" && MPICXX=$MPICXX_DEF
@@ -22,6 +22,7 @@
CMK_REAL_COMPILER=`$MPICXX -show 2>/dev/null | cut -d' ' -f1 `
case "$CMK_REAL_COMPILER" in
g++) CMK_AMD64="-m64 -fPIC" ;;
+icpc) CMK_AMD64="-O3 -fPIC" ;;
pgCC) CMK_AMD64="-fPIC -DCMK_FIND_FIRST_OF_PREDICATE=1" ;;
charmc) echo "Error> charmc can not call AMPI's mpicxx/mpiCC wrapper! Please fix your PATH."; exit 1 ;;
esac
@@ -36,10 +37,10 @@
CMK_LIBS="-lckqt $CMK_SYSLIBS "
CMK_LD_LIBRARY_PATH="-Wl,-rpath,$CHARMLIBSO/"
-CMK_NATIVE_CC="gcc $CMK_AMD64 "
-CMK_NATIVE_LD="gcc $CMK_AMD64 "
-CMK_NATIVE_CXX="g++ $CMK_AMD64 "
-CMK_NATIVE_LDXX="g++ $CMK_AMD64 "
+CMK_NATIVE_CC="icc $CMK_AMD64 "
+CMK_NATIVE_LD="icc $CMK_AMD64 "
+CMK_NATIVE_CXX="icpc $CMK_AMD64 "
+CMK_NATIVE_LDXX="icpc $CMK_AMD64 "
CMK_NATIVE_LIBS=""
# fortran compiler

cc-mpicxx.sh

diff -uN cc-mpicxx.sh.orig cc-mpicxx.sh
--- cc-mpicxx.sh.orig 2012-03-12 10:00:03.000000000 +1300
+++ cc-mpicxx.sh 2013-06-04 14:04:21.030877000 +1200
@@ -1,8 +1,8 @@
# user enviorn var: MPICXX and MPICC
# or, use the definition in file $CHARMINC/MPIOPTS
-MPICXX_DEF=mpicxx
-MPICC_DEF=mpicc
+MPICXX_DEF=mpiicpc
+MPICC_DEF=mpiicc
MPICXX=$MPICXX_DEF
MPICC=$MPICC_DEF
@@ -17,7 +17,7 @@
CMK_REAL_COMPILER=`$MPICXX -show 2>/dev/null | cut -d' ' -f1 `
case "$CMK_REAL_COMPILER" in
g++) CMK_AMD64="-m64 -fPIC" ;;
-icpc) CMK_AMD64="-m64";;
+icpc) CMK_AMD64="-O3";;
pgCC) CMK_AMD64="-DCMK_FIND_FIRST_OF_PREDICATE=1 " ;;
FCC) CMK_AMD64="-Kfast -DCMK_FIND_FIRST_OF_PREDICATE=1 --variadic_macros";;
esac
@@ -48,8 +48,8 @@
# fortran compiler
# for Intel Fortran compiler 8.0 and higher which is renamed to ifort from ifc
# does not work for ifc 7.0
-CMK_CF77="mpif77 -auto -fPIC "
-CMK_CF90="mpif90 -auto -fPIC "
+CMK_CF77="mpiifort -auto -fPIC "
+CMK_CF90="mpiifort -auto -fPIC "
CMK_CF90_FIXED="$CMK_CF90 -132 -FI "
F90DIR=`which ifort 2> /dev/null`
if test -h "$F90DIR"

smart-build.pl

diff -u ./smart-build.pl.orig ./smart-build.pl
--- ./smart-build.pl.orig 2013-04-11 13:00:19.131994834 +1200
+++ ./smart-build.pl 2013-04-11 11:39:08.230299433 +1200
@@ -108,6 +108,11 @@
$mpi_found = "true";
$mpioption = "mpicxx";
}
+$m = system("which mpicc mpiicpc > /dev/null 2>/dev/null") / 256;
+if($m == 0){
+ $mpi_found = "true";
+ $mpioption = "mpiicpc";
+}
# Give option of just using the mpi version if mpicc and mpiCC are found
if($mpi_found eq "true"){

Build CHARM

The following commands were used, as specified in the NAMD installation guide:

MPICXX=mpiicpc CXX=icpc ./build charm++  mpi-linux-x86_64 mpicxx ifort --with-production --no-shared -O3 -DCMK_OPTIMIZE=1 | tee build_charm_mpi-linux-x86_64-01.log
cd $NAMD_SRC

The expected output is something similar to “charm++ built successfully”. To verify that it is working properly, build and run the megatest suite for charm++.

cd $NAMD_SRC/charm-6.4.0/mpi-linux-x86_64-ifort-mpicxx/tests/charm++/megatest
make pgm
mpirun -n 4 ./pgm

TCL 8.5.9

The TCL libraries used in this configuration were downloaded from the NAMD website.

wget http://www.ks.uiuc.edu/Research/namd/libraries/tcl8.5.9-linux-x86_64.tar.gz
wget http://www.ks.uiuc.edu/Research/namd/libraries/tcl8.5.9-linux-x86_64-threaded.tar.gz
tar xzf tcl8.5.9-linux-x86_64.tar.gz
tar xzf tcl8.5.9-linux-x86_64-threaded.tar.gz
mv tcl8.5.9-linux-x86_64 tcl
mv tcl8.5.9-linux-x86_64-threaded tcl-threaded

Configuration and code patching

In order to compile with Intel Cluster Studio XE 2013, and taking the previously built dependencies into account, some changes need to be applied to several makefiles.

Set up CHARMBASE in Make.charm

cat Make.charm
# Set CHARMBASE to the top level charm directory.
# The config script will override this setting if there is a directory
# called charm-6.4.0 or charm in the NAMD base directory.
CHARMBASE = /share/apps/NAMD/sandybridge/2.9/ics-2013/charm-6.4.0
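
If you prefer to script this step, a one-line edit (a sketch reusing the path shown above; adjust it to your own installation) achieves the same result:

# Point CHARMBASE at the charm-6.4.0 tree built earlier (site-specific path)
sed -i 's|^CHARMBASE = .*|CHARMBASE = /share/apps/NAMD/sandybridge/2.9/ics-2013/charm-6.4.0|' Make.charm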

Set up the architecture file

cat arch/Linux-x86_64-ics-2013.arch
NAMD_ARCH = Linux-x86_64
CHARMARCH = mpi-linux-x86_64-ifort-mpicxx
FLOATOPTS = -ip -fno-rtti -no-vec
CXX = icpc
CXXOPTS = -static-intel -O2 $(FLOATOPTS)
CXXNOALIASOPTS = -O2 -fno-alias $(FLOATOPTS)
CC = icc
COPTS = -static-intel -O2 $(FLOATOPTS)

Set up the FFTW file (using the FFTW 2.1.5 library built above)

cat arch/Linux-x86_64.fftw
FFTDIR=/share/libs/fftw/sandybridge/2.1.5/intel-2013/smp
FFTINCL=-I$(FFTDIR)/include
FFTLIB=-L$(FFTDIR)/lib -lsrfftw -lsfftw
FFTFLAGS=-DNAMD_FFTW
FFT=$(FFTINCL) $(FFTFLAGS)

Set up the TCL file

cat arch/Linux-x86_64.tcl
#TCLDIR=/Projects/namd2/tcl/tcl8.5.9-linux-x86_64
TCLDIR=/share/apps/NAMD/sandybridge/2.9/ics-2013/tcl-threaded
TCLINCL=-I$(TCLDIR)/include
#TCLLIB=-L$(TCLDIR)/lib -ltcl8.5 -ldl
TCLLIB=-L$(TCLDIR)/lib -ltcl8.5 -ldl -lpthread
TCLFLAGS=-DNAMD_TCL
TCL=$(TCLINCL) $(TCLFLAGS)

Building the code

The following commands were used, as specified in the NAMD installation guide:

./config Linux-x86_64-ics-2013 --charm-base ./charm-6.4.0 --charm-arch mpi-linux-x86_64-ifort-mpicxx
cd Linux-x86_64-ics-2013
make | tee ../Linux-x86_64-ics-2013_make.log
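
Once make finishes, the namd2 binary should be present in the build directory. A minimal smoke test (a sketch; it only checks that the binary links and starts under Intel MPI):

# The build should produce the namd2 executable in this directory
ls -l namd2
# Launching a single MPI rank with no input should print the NAMD/Charm++
# startup banner, confirming the binary runs under Intel MPI
mpirun -n 1 ./namd2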

Setting up the modulefile

Modules are an easy and fast way to provide the application to users in a “ready-to-use” fashion. Nevertheless, some applications require special environment variables to run properly, and for that reason we have created an MPIRUN wrapper as a module environment alias.

#%Module1.0
module-whatis "NAMD : Scalable molecular dynamics - Compiled with Intel Cluster Studio 2013, Charm++ 6.4.0 and FFTW 2.1.5"
module load intel/ics-2013
module load fftw2/2.1.5_intel-2013-smp-sandybridge
set root /share/apps/NAMD/sandybridge/2.9/ics-2013/Linux-x86_64-ics-2013
prepend-path PATH $root
set-alias MPIRUN "mpiexec.hydra -machinefile \$LOADL_HOSTFILE -genv I_MPI_FABRICS dapl -genv I_MPI_PIN_PROCESSOR_LIST='grain=cache2,shift=sock' -envall \$*"
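
With the modulefile in place, a job script can simply load the module and launch NAMD through the MPIRUN alias. A sketch (the module name NAMD/2.9_ics-2013 and the input file apoa1.namd are placeholders; the alias relies on $LOADL_HOSTFILE, so it is meant to be used inside a LoadLeveler job):

# Hypothetical module name for the build described in this article
module load NAMD/2.9_ics-2013
# The MPIRUN alias expands to mpiexec.hydra with the suggested Intel MPI flags
MPIRUN namd2 apoa1.namd > apoa1.log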

Advice related to MPI flags

As you can see in the modulefile, the suggested flags for Intel MPI are the following:

-genv I_MPI_FABRICS dapl -genv I_MPI_PIN_PROCESSOR_LIST='grain=cache2,shift=sock'
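
I_MPI_FABRICS dapl selects the DAPL provider (the InfiniBand fabric on this cluster), while I_MPI_PIN_PROCESSOR_LIST controls process pinning, grouping ranks at shared-cache granularity and distributing them across sockets. Outside the modulefile, the same flags can be passed directly to mpiexec.hydra; a sketch (the host file hosts.txt, the rank count and the apoa1 input path are placeholders):

mpiexec.hydra -machinefile hosts.txt \
  -genv I_MPI_FABRICS dapl \
  -genv I_MPI_PIN_PROCESSOR_LIST='grain=cache2,shift=sock' \
  -n 32 ./namd2 apoa1/apoa1.namd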

Code testing, benchmark results and application scalability

Benchmarking can be useful to investigate whether the obtained performance meets your expectations. In this particular case, you can also use the benchmarks to verify the correctness of the results.

In addition, and for efficiency reasons, we need to know and report on the application's scalability for a particular hardware design. This can save a lot of CPU time when users start to submit jobs for the first time, by warning them not to request more cores than the application is able to scale to.

The NAMD implementation demonstrated good scalability on SandyBridge with InfiniBand QDR (the NeSI Pan cluster). The following chart shows the NAMD 2.9 scalability for the popular ApoA1 benchmark, which you can find on the official website.
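
To reproduce the scaling study, the ApoA1 benchmark (a 92,224-atom system) can be downloaded from the NAMD utilities page and run with an increasing number of ranks; a sketch (the download URL reflects the layout of the official site at the time of writing, and the rank count is a placeholder):

wget http://www.ks.uiuc.edu/Research/namd/utilities/apoa1.tar.gz
tar -zxvf apoa1.tar.gz
# Run on an increasing number of cores and compare the "Benchmark time:"
# lines reported in the NAMD log to assess scalability
mpirun -n 32 ./namd2 apoa1/apoa1.namd | tee apoa1_32cores.log
grep "Benchmark time" apoa1_32cores.log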

Author: Jordi Blasco (Landcare Research / NeSI)

Reviewers: Gene Soudlenkov (Center for eResearch - University of Auckland)

Sina Masoud-Ansari (Center for eResearch - University of Auckland)

Alfred Gil (HPCNow!)