From: Roman Petrenko (rpetrenko_at_gmail.com)
Date: Mon Feb 16 2009 - 22:07:30 CST

sorry, the subject line was meant to be "vmd with cuda1.1"
runtime cpu 101489 seconds
runtime gpu 234 seconds
still a similar speedup: 433.7x
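
For reference, the 433.7 figure is just the ratio of the two runtimes above. A minimal Python sketch of that arithmetic follows, next to the ratio of the GFLOPS rates from the earlier message quoted below. The function names are made up for illustration, and the GFLOPS rates and the runtimes come from two different runs, so the two ratios are not expected to match exactly.

```python
def runtime_speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """Wall-clock speedup: how many times sooner the GPU run finished."""
    return cpu_seconds / gpu_seconds


def gflops_ratio(gpu_gflops: float, cpu_gflops: float) -> float:
    """Ratio of reported arithmetic rates; this is not a runtime ratio."""
    return gpu_gflops / cpu_gflops


# Runtimes reported in this message (seconds).
speedup = runtime_speedup(cpu_seconds=101489.0, gpu_seconds=234.0)

# GFLOPS rates reported in the earlier, quoted message.
rate_ratio = gflops_ratio(gpu_gflops=265.45, cpu_gflops=0.497122)

print(f"runtime speedup: {speedup:.1f}x")
print(f"GFLOPS ratio:    {rate_ratio:.1f}x")
```

The two numbers answer different questions: the first compares time to solution, the second compares how many floating point operations per second each code path reports.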

could it be due to some thread interference?
Using 4 CPUs
thread 0 started...
thread 3 started...
thread 2 started...
thread 1 started...

why is it using 4 threads? i submitted the job with the -lnodes=1:ppn=1 option.
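
One possible cause, not confirmed for this test program: a program can size its thread pool from the number of cores it detects on the node, which is independent of how many cores the batch scheduler was asked for. A minimal Python sketch of that behaviour follows; `threads_to_start` and its `requested` parameter are hypothetical and are not part of the test program or of any scheduler.

```python
import os


def threads_to_start(requested=None):
    """Return how many worker threads a program would start.

    requested: optional explicit cap (hypothetical parameter). When it is
    None, fall back to the number of cores the OS reports for the node.
    """
    detected = os.cpu_count() or 1  # cores visible on the node
    if requested is None:
        return detected
    # Honour the cap, but never go below 1 or above the detected cores.
    return max(1, min(requested, detected))


print("no cap given: ", threads_to_start())
print("capped at one:", threads_to_start(requested=1))
```

On a 4-core node the uncapped call would report 4 even if the job asked for a single core, which would match the "Using 4 CPUs" line if the program works this way.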

i'll update later on what compiler was used.

On Mon, Feb 16, 2009 at 10:56 PM, John Stone <johns_at_ks.uiuc.edu> wrote:
>
> Hi,
> The GFLOPS numbers indicate the number of floating point
> calculations, but not the runtime. The speedup is based on the
> difference in runtime, not on GFLOPS. There are various differences
> in the algorithm used on the GPU vs. the algorithm used on the CPU,
> making the CPU more efficient in GFLOPS/runtime than the GPU is.
> I think if you divide the runtimes out, you'll get a speedup closer
> to the range we see, assuming you run a build done with the Intel
> compiler and not just GCC. GCC doesn't generate very good code for
> this particular kernel, and just using the Intel compilers will
> improve the CPU performance by a large factor vs. GCC.
> Let me know if you have further questions.
>
> Cheers,
> John Stone
> vmd_at_ks.uiuc.edu
>
> On Mon, Feb 16, 2009 at 10:47:16PM -0500, Roman Petrenko wrote:
>> Hi all,
>> i ran time-averaged coulomb potential evaluations (downloaded from vmd
>> cuda website) and got more than 500x speedup on nvidia 9800 gpu vs cpu.
>> gpu speed 265.45 GFLOPS
>> cpu(4 threads) speed 0.497122 GFLOPS
>>
>> how is it possible? According to John Stone's presentations the speedup
>> is expected to be around 30 or 100.
>
> --
> NIH Resource for Macromolecular Modeling and Bioinformatics
> Beckman Institute for Advanced Science and Technology
> University of Illinois, 405 N. Mathews Ave, Urbana, IL 61801
> Email: johns_at_ks.uiuc.edu Phone: 217-244-3349
> WWW: http://www.ks.uiuc.edu/~johns/ Fax: 217-244-6078
>