From: Revthi Sanker
Date: Thu Sep 05 2013 - 03:47:19 CDT
Dear Sir,
This is the script I am using to run NAMD in my cluster:
#@ output = test.out
#@ error = test.err
#@ job_type = MPICH
#@ class = Medium128
#@ node = 8
#@ tasks_per_node = 16
#@ environment = COPY_ALL
#@ queue
Jobid=`echo $LOADL_STEP_ID | cut -f 6 -d .`
mkdir -p $tmpdir; cd $tmpdir
cp -R $LOADL_STEP_INITDIR/* $tmpdir
cat $LOADL_HOSTFILE > ./host.list
export LD_LIBRARY_PATH=/sware/openmpi1.6/lib:$LD_LIBRARY_PATH
/sware/openmpi1.6/bin/mpirun --mca btl openib,self -np 128 -hostfile \
    $LOADL_HOSTFILE /sware/NAMD_2.9_Source/Linux-x86_64-g++/namd2 md9.namd
mv ../job$Jobid $LOADL_STEP_INITDIR
Am I failing to include something? Kindly provide your valuable suggestions
in this regard.
Thanks in advance.
M.S. Research Scholar
Indian Institute Of Technology, Madras
On Thu, Sep 5, 2013 at 12:02 PM, Norman Geist wrote:
> Hi Revthi,
> you should also have mentioned whether you use a NAMD compiled against
> charm++ or MPI. If charm++, try adding “+idlepoll” to the namd2 command; it
> should additionally improve scaling, sometimes twofold. Furthermore, if you
> have hyperthreading or Magny-Cours CPUs, try to use half of the cores
> claimed per node and bind the processes to real physical cores only. You
> can use “/proc/cpuinfo” to determine that. “processor” entries with the same
> “physical id” and “core id” are usually the same physical core; these
> should not be used together, as they are bottlenecked by memory or the FPU.
> Using “taskset” on the namd2 command, you can easily control which cores
> are allowed.
> Example:
> charmrun +p 64 ++nodelist nodelist taskset -c 0,2,4,6 namd2 +idlepoll
> If you do not have virtual cores, forget about the above for now, but keep
> it in mind for the future, as it has a large impact.
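The /proc/cpuinfo check described above can be sketched as a small shell function. This is an illustration, not part of the original message: the function name physical_cpus is made up here, and it assumes an x86 Linux /proc/cpuinfo in which every "processor" entry carries "physical id" and "core id" fields.

```shell
#!/bin/sh
# Print one logical CPU number per physical core as a comma-separated list,
# in the form "taskset -c" expects. Reads /proc/cpuinfo by default; another
# file can be given as the first argument (handy for trying it on a saved copy).
# Assumption: each "processor" block lists "physical id" before "core id",
# as x86 Linux kernels do.
physical_cpus() {
    awk -F': *' '
        /^processor/   { cpu = $2 }      # logical CPU number of this block
        /^physical id/ { pkg = $2 }      # socket the logical CPU sits on
        /^core id/ {
            key = pkg ":" $2             # (socket, core) identifies a physical core
            if (!(key in seen)) {        # keep only the first logical CPU per core
                seen[key] = 1
                if (list == "") list = cpu
                else            list = list "," cpu
            }
        }
        END { print list }
    ' "${1:-/proc/cpuinfo}"
}
```

On a compute node, something like `taskset -c "$(physical_cpus)" namd2 +idlepoll md9.namd` would then restrict namd2 to one logical CPU per physical core. Whether that helps depends on the hardware, so it is worth benchmarking both ways.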
> Additionally, it is easy to judge how good the scaling is if you just
> compare the speedup to the ideal linear case. Simply divide the time/step
> on 1 node by the time/step on n nodes. This number will usually be <= n.
> The nearer it is to n, the better. Do some benchmarks while
> increasing the number of nodes, and keep in mind that there can be a point
> of outscaling, where the time/step starts rising again. But you do not
> seem to have hit that point yet.
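The comparison above amounts to a speedup S(n) = t(1) / t(n) and a parallel efficiency S(n) / n. A minimal sketch in shell follows; the function name scaling and the timings in the usage line are hypothetical, not measurements from this thread.

```shell
#!/bin/sh
# scaling T1 TN N
#   T1: time/step on 1 node, TN: time/step on N nodes (same units), N: node count.
# Prints the speedup T1/TN and the parallel efficiency (speedup / N) in percent.
scaling() {
    awk -v t1="$1" -v tn="$2" -v n="$3" 'BEGIN {
        speedup = t1 / tn
        printf "speedup=%.2f efficiency=%.0f%%\n", speedup, 100 * speedup / n
    }'
}

# Hypothetical timings: 1.0 s/step on 1 node, 0.25 s/step on 8 nodes.
scaling 1.0 0.25 8
```

With these made-up numbers the speedup is 4 on 8 nodes, i.e. 50% efficiency. Repeating the calculation for each node count in a benchmark series shows where the efficiency starts to drop off.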
> So far I think there’s a little more to squeeze out for a 300K-atom system
> doing about 2.5 ns/day.
> Good luck
> Norman Geist.
> *From:* *On behalf of* Axel Kohlmeyer
> *Sent:* Wednesday, 4 September 2013 10:01
> *To:* Revthi Sanker
> *Cc:*
> *Subject:* Re: namd-l: namd scale-up
> On Wed, Sep 4, 2013 at 9:43 AM, Revthi Sanker wrote:
> Dear all,
> I am running NAMD on the super cluster at my institute. My system consists
> of 3 L atoms roughly.
> please keep in mind that most people on this mailing list (and in the
> world in general) do not know what a lakh is and better talk about 300,000
> atoms instead. what would you think if somebody would talk to you about a
> system with 2000 gross atoms?
> I am aware that the scale-up depends on the configuration of the cluster
> I am currently using. But the people at the computer center would like to
> get a rough estimate of the benchmark (ns/day) for a system size like
> mine. Anybody who is aware of the yield for this system size, please let
> me know, as I am not sure if what I am getting currently (*2.5 ns/day* for
> 8 nodes * 16 processors = 128) is optimum or whether it can be tweaked further.
> the only way to find out the optimum is by doing a (strong) scaling
> benchmark, i.e. use a different number of nodes and plot the resulting
> speedup. the performance depends not only on the hardware (CPU
> (type, generation, clock rate), memory bandwidth, interconnect, BIOS
> configuration (e.g. hyper-threading, turbo boost)), but also on software
> (kernel, NAMD version, compiler, configuration (SMP, MPI, ibverbs)) and
> your system and input. so there is no way to tell from the number of atoms
> in the system and the number of nodes/cores whether you have a good
> performance or a bad performance.
> you can compare your numbers (absolute per cpu core performance and
> speedup) to other published data from other machines (even if much older)
