From: Rene Salmon (rsalmon_at_tulane.edu)
Date: Fri Mar 23 2007 - 17:44:59 CDT
Hello,
We have compiled NAMD-2.6 with MVAPICH to run over InfiniBand, following
the instructions on the NAMD wiki here:
http://www.ks.uiuc.edu/Research/namd/wiki/index.cgi?NamdOnInfiniBand
NAMD seems to work nicely and scale for 2, 4, 8, and 16 CPUs, but once we
try to run on 32 CPUs or more, we start getting this error message:
namd2: viarecv.c:724: viadev_eager_pull_pack: Assertion `rhandle->bytes_copied_to_user == rhandle->len' failed.
namd2: viarecv.c:724: viadev_eager_pull_pack: Assertion `rhandle->bytes_copied_to_user == rhandle->len' failed.
namd2: viarecv.c:724: viadev_eager_pull_pack: Assertion `rhandle->bytes_copied_to_user == rhandle->len' failed.
namd2: viarecv.c:724: viadev_eager_pull_pack: Assertion `rhandle->bytes_copied_to_user == rhandle->len' failed.
[mpirund] rank 18 has got signal 6
[mpirund] rank 19 has got signal 6
.
.
.
[mpirund] rank 28 has got signal 6
Timeout for rank 20 hostname 'compute-01-01-ib'. Job is not finalized there.
Cleaning up all processes ...
Some rank on 'compute-01-13-ib' exited without finalize.
done.
Any ideas as to what might cause this?
Thank you
Rene
--
Rene Salmon
Tulane University
Center for Computational Science
http://www.ccs.tulane.edu
rsalmon_at_tulane.edu
Tel 504-862-8393
Fax 504-862-8392