From: Jim Phillips
Date: Thu Oct 28 2010 - 14:14:28 CDT
This is very strange. NAMD is using the rename() function to avoid
overwriting the previous output file, and the error returned is saying
that 2mrt_md_extend.restart.coor and 2mrt_md_extend.restart.coor.old are
not on the same filesystem. I have no idea how this could be the case.
You can test the same operation in the shell via:
ln 2mrt_md_extend.restart.coor 2mrt_md_extend.restart.coor.old
(This creates a hard link, not a symbolic link as in "ln -s".)
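If it helps to script the same check, here is a minimal Python sketch (not NAMD code). It compares the device IDs reported by stat() for the restart file and its directory, then attempts the hard link and reports the errno name. The helper names same_device, try_hard_link and diagnose are made up for illustration.

```python
import errno
import os


def same_device(path_a, path_b):
    """Return True if both paths report the same st_dev (device ID)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def try_hard_link(src, dst):
    """Try to hard-link src to dst, as `ln src dst` would.

    Returns None on success, or the errno name (e.g. 'EXDEV') on failure.
    """
    try:
        os.link(src, dst)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


def diagnose(coor_path):
    """Collect both checks for one restart file (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(coor_path))
    return {
        # Compare the file against the directory that contains it.
        "same_device_as_directory": same_device(coor_path, directory),
        # Note: this leaves a hard link behind at <coor_path>.old on success.
        "link_result": try_hard_link(coor_path, coor_path + ".old"),
    }
```

Calling diagnose("2mrt_md_extend.restart.coor") from the run directory should return a link_result of 'EXDEV' if the kernel really sees two filesystems, 'EEXIST' if the .old file is already there, and None if the link works.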
Are you using regular NFS or the new pNFS "Parallel NFS"? (Do you have
multiple file servers for this filesystem, or just multiple clients?)
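To see how a client has the filesystem mounted, something like the following sketch may help. It assumes a Linux client where /proc/mounts is available in the usual "device mountpoint fstype options ..." layout, and the function names parse_mounts and mount_for are made up for illustration. It does not handle escaped characters in mount points, and the fstype field (typically nfs or nfs4) does not by itself tell you whether pNFS is in use.

```python
import os


def parse_mounts(text):
    """Parse /proc/mounts-style text into (mount_point, fstype, options) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fstype, options = fields[:4]
        entries.append((mount_point, fstype, options.split(",")))
    return entries


def mount_for(path, entries):
    """Return the entry whose mount point is the longest prefix of path."""
    path = os.path.abspath(path)
    best = None
    for entry in entries:
        mount_point = entry[0]
        # Append a slash so that /home does not match /homework.
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = entry
    return best
```

On each node, reading /proc/mounts into parse_mounts and passing the result to mount_for together with the restart file path shows which mount the file lives on and with what options, so the nodes can be compared.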
On Thu, 28 Oct 2010, Kwee Hong wrote:
> Hi all,
> I ran my simulation on a 14-node cluster and got this error message:
> ERROR: Error on renaming file 2mrt_md_extend.restart.coor to
> 2mrt_md_extend.restart.coor.old: Invalid cross-device link
> FATAL ERROR: Unable to open binary file 2mrt_md_extend.restart.coor: File
> exists
> ------------- Processor 0 Exiting: Called CmiAbort ------------
> Reason: FATAL ERROR: Unable to open binary file 2mrt_md_extend.restart.coor:
> File exists
> After we posted the error to the mailing list, it was resolved as a
> file permission problem. Some time later, a similar error occurred,
> this time with some extra output:
> ERROR: Error on renaming file ZN_wb_md.restart.coor to
> ZN_wb_md.restart.coor.old: Invalid cross-device link
> FATAL ERROR: Unable to open binary file ZN_wb_md.restart.coor: File exists
> ------------- Processor 0 Exiting: Called CmiAbort ------------
> Reason: FATAL ERROR: Unable to open binary file ZN_wb_md.restart.coor: File
> exists
>
> [0] Stack Traceback:
>   [0:0] CmiAbort+0x5c [0xb4521c]
>   [0:1] _Z8NAMD_errPKc+0x9d [0x520c99]
>   [0:2] _ZN6Output17write_binary_fileEPciP6Vector+0x17e [0x98619e]
>   [0:3] _ZN6Output26output_restart_coordinatesEP6Vectorii+0x1b5 [0x986003]
>   [0:4] _ZN6Output10coordinateEiiP6VectorP11FloatVectorR7Lattice+0x12b
> [0x985c57]
>   [0:5]
> _ZN24CkIndex_CollectionMaster39_call_receivePositions_CollectVectorMsgEPvP16CollectionMaster+0x18f
> [0x533603]
>   [0:6] CkDeliverMessageFree+0x21 [0xa863df]
> Charmrun: error on request socket--
> Socket closed before recv.
> This time I doubt the problem has to do with file permissions. We are
> using an NFS parallel file system on the cluster. We export the NFS
> share with the (rw,sync,no_subtree_check,no_root_squash) options.
> Is there any way to tackle this?
> Thanks
> Regards,
> Joyce
This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:56:17 CST