From: Pietro Amodeo (pamodeo_at_icmib.na.cnr.it)
Date: Tue Jan 19 2010 - 05:03:21 CST
Hi,
I recently installed NAMD 2.7b2, compiled for the x86_64 architecture
with ibverbs, on our cluster. The cluster is based on dual quad-core
Opteron nodes with an Infiniband interconnect and has the following
software configuration:
> (CentOS 5)
> kernel 2.6.18-53.el5
> gcc 4.1.2 20070626 (Red Hat 4.1.2-14) / icc 10.1 (Build 20070913 Pack.ID: l_cc_p_10.1.008)
> fftw 3.2.1
> ofed131 - openmpi 1.2.6
I ran an MD simulation (a solvated protein, 50443 atoms in total,
NPT, PBC, Ewald) on two nodes (16 cores).
After 698200 steps the calculation stopped with the following message in
the nohup.out file:
------------- Processor 12 Exiting: Called CmiAbort ------------
Reason:
Length mismatch!!
Fatal error on PE 12>
Length mismatch!!
and the following last lines in the log file:
TIMING: 698200 CPU: 47830.6, 0.0679197/step Wall: 47840.3,
0.0679338/step, 175.53 hours remaining, 151.316406 MB of memory in use.
ENERGY: 698200 16442.2971 11537.6218 1723.3257
182.0720 -202515.9631 17715.5541 0.0000 0.0000
46767.8964 -108147.1960 311.0399 -154915.0924
-107442.8354 310.6142 2143.1450 -99.1058
486574.9356 -10.2105 -8.0865
size: -1083700960, len:112.
[12] Stack Traceback:
[0] CmiAbort+0x5f [0xabb81b]
[1] /root/NFS/NAMD_2.7b2_Linux-x86_64-ibverbs/namd2 [0xab533e]
[2] /root/NFS/NAMD_2.7b2_Linux-x86_64-ibverbs/namd2 [0xab40ab]
[3] /root/NFS/NAMD_2.7b2_Linux-x86_64-ibverbs/namd2 [0xab252f]
[4] /root/NFS/NAMD_2.7b2_Linux-x86_64-ibverbs/namd2 [0xabf2d2]
[5] CcdCallBacks+0x104 [0xabf400]
[6] CsdScheduleForever+0xd8 [0xabc664]
[7] CsdScheduler+0x1c [0xabc232]
[8] _Z11master_initiPPc+0x2d6 [0x5121f6]
[9] _ZN7BackEnd4initEiPPc+0x31 [0x511f19]
[10] main+0x2f [0x50d80f]
[11] __libc_start_main+0xf4 [0x39fe81d8a4]
[12] _ZNSt8ios_base4InitD1Ev+0x4a [0x508bda]
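A note on the "size: -1083700960, len:112" line above. A message size cannot be negative, so one possible reading (an assumption, not something confirmed by the NAMD or Charm++ sources) is that the value was printed as a signed 32-bit integer and that the underlying header field held a bit pattern unrelated to the 112 bytes reported as "len". The minimal sketch below only reinterprets the two printed numbers under that assumption; the helper name `reinterpret_int32` is hypothetical and is not part of NAMD or Charm++.

```python
import struct

def reinterpret_int32(value):
    """Reinterpret a signed 32-bit integer as unsigned.

    Assumes the logged value was printed from a 32-bit signed field;
    that is a guess about the log format, not a documented fact.
    Returns the unsigned value and its 8-digit hex representation.
    """
    raw = struct.pack("<i", value)          # 4 bytes, signed
    (unsigned,) = struct.unpack("<I", raw)  # same 4 bytes, unsigned
    return unsigned, "0x{:08x}".format(unsigned)

# Values copied from the log line "size: -1083700960, len:112."
size_printed = -1083700960
len_printed = 112

unsigned_size, hex_size = reinterpret_int32(size_printed)
print("size as unsigned 32-bit:", unsigned_size, hex_size)
print("len:", len_printed, "0x{:08x}".format(len_printed))
```

If the hex pattern of the size looks like an address or like payload data rather than a plausible byte count, that would be consistent with a corrupted or misaligned message header, but it does not identify the cause by itself.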
AFAIK, this message is related to Infiniband communication and is
issued by a routine in machine-ibverbs.c.
I couldn't find any closely related message in the NAMD mailing list;
the other CmiAbort errors reported there apparently depended on specific
versions of either Charm++ or NAMD subroutines.
Before running new (lengthy) blind tests with the same or different
input or node usage, I would be grateful if someone could suggest further
diagnostic tests or debug settings to try first.
Sincerely,
Pietro Amodeo
--
Dr. Pietro Amodeo, PhD
Istituto di Chimica Biomolecolare del CNR
Comprensorio "A. Olivetti", Edificio 70
Via Campi Flegrei 34
I-80078 Pozzuoli (Napoli) - Italy
Phone +39-0818675072
Fax +39-0818041770
Email pamodeo_at_icmib.na.cnr.it