From: Bin He
Date: Tue Nov 11 2014 - 06:19:06 CST
Thanks a lot for your kind reply.
I am sorry that the timing data I provided was confusing.
So I tested again with the default binaries (downloaded from the NAMD website).
The binary I used:
GPU k20m*2
*1 node with multicore-CUDA version:*
*./namd2 +p16 +devices 0,1 ../workload/f1atpase2000/f1atpase.namd*
*1 node with ibverbs-smp-CUDA*
++p 16 ++ppn 8 ++nodelist nodelist ++scalable-start ++verbose
_64-ibverbs-smp-CUDA/namd2 +devices 0,1
With "++local", the application cannot start. So I have to run with
nodelist content
group main ++shell ssh-
host node330
host node330
*2 node with ibverbs-smp-CUDA*
* /home/gpuusr/binhe/namd/NAMD_2.10b1_Linux-x86_64-ibverbs-smp-CUDA/charmrun
++p 32 ++ppn 8 ++nodelist nodelist2node ++scalable-start ++verbose
x-x86_64-ibverbs-smp-CUDA/namd2 +devices 0,1
nodelist content
group main ++shell ssh-
host node330
host node330
host node329
host node329
*4 node with ibverbs-smp-CUDA *
* /home/gpuusr/binhe/namd/NAMD_2.10b1_Linux-x86_64-ibverbs-smp-CUDA/charmrun
++p 64 ++ppn 8 ++nodelist nodelist4node ++scalable-start ++verbose
x-x86_64-ibverbs-smp-CUDA/namd2 +devices 0,1
nodelist content
group main ++shell ssh-
host node330
host node330
host node329
host node329
host node328
host node328
host node332
host node332
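Each host appears twice in the nodelists above. A plausible reading, assuming charmrun's SMP convention that ++p is the total number of worker threads and ++ppn is the number of worker threads per process, is that one host line is needed per process (for example 32 / 8 = 4 lines for the 2-node run). The Python sketch below only reproduces that arithmetic. The function name nodelist_lines is made up for illustration, and the ++shell option on the group line is left out because it is cut off in the archive.

```python
def nodelist_lines(hosts, total_pes, pes_per_process):
    """Return nodelist lines with one 'host' entry per charmrun process.

    total_pes       -- the value passed to ++p   (assumed: total worker threads)
    pes_per_process -- the value passed to ++ppn (assumed: threads per process)
    """
    if total_pes % pes_per_process != 0:
        raise ValueError("++p must be a multiple of ++ppn")
    processes = total_pes // pes_per_process
    if processes % len(hosts) != 0:
        raise ValueError("processes do not divide evenly across the hosts")
    per_host = processes // len(hosts)

    # The group line in the original files also carries a ++shell option,
    # which is truncated in the archive and therefore omitted here.
    lines = ["group main"]
    for host in hosts:
        # Repeat the host once for every process that should start on it.
        lines.extend([f"host {host}"] * per_host)
    return lines
```

Calling nodelist_lines(["node330", "node329"], 32, 8) gives the four host lines of the 2-node file shown above.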
numsteps 2000; outputEnergies 100
Actually, the ibverbs-smp-CUDA version does not scale badly. BUT the CPU
usage during the run is:
Cpu(s): 53.1%us, 29.0%sy, 0.0%ni, 17.9%id, 0.0%wa, 0.0%hi, 0.0%si,
which means that not all computing resources are used well.
We can also see that ibverbs-smp-CUDA is slower than multicore-CUDA on a
single node. Yes, network bandwidth and latency may contribute, but the
ibverbs version without CUDA scales well, and its CPU usage is perfect
when running on several nodes.
So I do not think network bandwidth and latency are the key reason.
How can I increase the CPU usage and speed up NAMD?
Best Regards!
Bin He
Member of IT
Unique Studio
Room 811,Building LiangSheng,1037 Luoyu Road, Wuhan 430074,P.R. China
☎:(+86) 13163260252
2014-11-11 16:48 GMT+08:00 Norman Geist:
> OK, you actually DON’T have a problem! You are comparing apples with oranges.
> To compare the performance of different binaries, you SHOULD use the same
> hardware. So you would want to test the ibverbs version on the machine with
> 4 GPUs + 16 cores or, vice versa, the multicore binary on one of the
> 2 GPU + 12 core nodes.
> Apart from that, using multiple nodes introduces a new bottleneck, which is
> network bandwidth and latency. So you will always have losses due to the
> additional overhead, with your CPUs spending time waiting for communication
> rather than working. This varies for different system sizes (Amdahl’s law).
> BUT actually your scaling isn’t that bad. From 2 to 4 nodes it scales by
> 46% instead of the ideal 50% (you are missing the 1-node case, btw.).
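How large the scaling figure comes out depends on which definition is used. As a minimal sketch, the Python snippet below computes three common ones (speedup, parallel efficiency, and fraction of time saved) from the 2-node and 4-node timings quoted later in this thread, 57 and 37, assuming both are wall-clock seconds. The function name scaling is made up for illustration.

```python
def scaling(t_small, n_small, t_large, n_large):
    """Compare a run on n_small nodes with a run on n_large nodes.

    Returns (speedup, efficiency, time_saved):
      speedup    -- how many times faster the larger run is
      efficiency -- speedup divided by the ideal speedup (n_large / n_small)
      time_saved -- fraction of the smaller run's time that was removed
    """
    speedup = t_small / t_large
    ideal = n_large / n_small
    efficiency = speedup / ideal
    time_saved = 1.0 - t_large / t_small
    return speedup, efficiency, time_saved


# Timings taken from the thread; the unit (seconds) is an assumption.
speedup, efficiency, time_saved = scaling(57.0, 2, 37.0, 4)
print(f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}, "
      f"time saved {time_saved:.0%}")
# prints: speedup 1.54x, efficiency 77%, time saved 35%
```

With ideal scaling from 2 to 4 nodes the speedup would be 2.00x and the time saved 50%.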
> So don’t care about CPU usage, only about the actual timings. Also try
> passing “+idlepoll” to namd2, which can improve parallel scaling across
> the network.
> Also, for CUDA and small systems, try in the config:
> twoawayx yes
> only if that brings an improvement, try
> twoawayx yes
> twoawayy yes
> only if that brings an improvement, try
> twoawayx yes
> twoawayy yes
> twoawayz yes
> In most cases twoawayx is enough or already too much.
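The procedure above (add one more twoaway option only if the previous one helped) can be written as a small loop. This is only a sketch: time_run is a hypothetical callback that you would have to supply yourself, for example by running a short benchmark with the given extra config lines and returning its wall-clock time. Nothing here is part of NAMD.

```python
def choose_twoaway(time_run):
    """Pick twoaway options by adding them one at a time.

    time_run(options) must return the measured time (lower is better)
    for a run whose config contains the given list of extra lines.
    """
    candidates = ["twoawayx yes", "twoawayy yes", "twoawayz yes"]
    chosen = []
    best = time_run(chosen)  # baseline without any twoaway option
    for option in candidates:
        trial = chosen + [option]
        elapsed = time_run(trial)
        if elapsed >= best:
            # No improvement: stop here and keep what worked so far.
            break
        chosen, best = trial, elapsed
    return chosen, best
```

The function returns the list of config lines to keep together with the best time seen, so an empty list means that even twoawayx alone did not help.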
> Norman Geist.
> *From:* *On behalf of* Bin He
> *Sent:* Monday, 10 November 2014 20:51
> *To:* Norman Geist
> *Cc:*
> *Subject:* Re: namd-l: Why CPU Usage is low when I run ibverbs-smp-cuda
> version NAMD
> 1. Using the servers mentioned above, I got the result:
> Binary            GPUs        CPU cores    Nodes  Time (s)
> multicores-CUDA   4           16           1      64
> ibverbs-smp-cuda  2 per node  12 per node  2      57
> ibverbs-smp-cuda  2 per node  12 per node  4      37
> When running ibverbs-smp-cuda, the CPU user usage is less than 50%, and sys
> usage is about 30%.
> The CPU usage looks bad. *What I want to do is find the reason why
> the CPU usage is so strange*.
> 2. If I want to get the best performance with CUDA, which parameters in the
> config file can I modify?
> ------------------------
> Best Regards!
> Bin He
> Weibo:何斌_HUST
> 2014-11-10 14:53 GMT+08:00 Norman Geist:
> What you observe might be expected, as the CUDA code of NAMD is
> officially tuned for the multicore version. BUT do you actually notice any
> performance difference in time/step?
> Norman Geist.
> *From:* *On behalf of* Bin He
> *Sent:* Saturday, 8 November 2014 08:25
> *To:*
> *Subject:* namd-l: Why CPU Usage is low when I run ibverbs-smp-cuda
> version NAMD
> Hi everyone,
> I am new to NAMD.
> Description of our cluster:
> cpu:E5-2670(8cores)
> memory:32GB
> socket:2
> network:IB
> GPU:k20m*2
> CUDA:6.5
> workload: *f1atpase (numsteps 2000)*
> When I run the multicore NAMD version, the CPU usage is about 100% and
> GPU usage is about 50%;
> *CMD:./namd2 +p16 +devices 0,1 ../workload/f1atpase/f1atpase.namd*
> cpu time is about 88s.
> When I run the ibverbs-smp-cuda version, the CPU usage is only about
> 40% us and 30% sy. GPU usage is about 50%.
> *CMD:/home/gpuusr/binhe/namd/NAMD_2.10b1_Linux-x86_64-ibverbs-smp-CUDA/charmrun
> ++p 60 ++ppn 15 ++nodelist nodelist ++scalable-start ++verbose
> /home/gpuusr/binhe/namd/NAMD_2.10b1_Linux-x86_64-ibverbs-smp-CUDA/namd2
> +devices 0,1 /home/gpuusr/binhe/namd/workload/f1atpase/f1atpase.namd*
> cpu time is about 37s.
> When I try to use setcpuaffinity, the result is worse.
> So what am I doing wrong?
> Thanks
> ------------------------
> Best Regards!
> Bin He
This archive was generated by hypermail 2.1.6 : Wed Dec 31 2014 - 23:22:59 CST