Re: Interrupted Writing Restart Files

From: Josh Vermaas (vermaasj_at_msu.edu)
Date: Fri Dec 02 2022 - 12:24:18 CST

Hi Collin,

I always get this mixed up, since the answer depends on the order in
which the files were being written when you ran out of disk space. I
*think* NAMD writes the coor first, the vel second, and the xsc third.
The write failed on the first of those (the coor), so I think you are
right that NAMD had not yet renamed the "current" restart.vel and
restart.xsc files. The best way to check this would be to list the file
modification times. Ideally there will be a measurable gap between the
interrupted restart.coor and the other restart files.
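
If it helps, here is a minimal Python sketch of that timestamp check. It
assumes the usual restart naming (prefix + .restart.coor/.vel/.xsc, with
.old copies alongside) and that you run it in the output directory. The
function name restart_mtimes is just something I made up for
illustration, not anything NAMD provides.

```python
import os
import time


def restart_mtimes(prefix, directory="."):
    """Return (filename, mtime) pairs for the restart files, oldest first.

    Assumes files are named <prefix>.restart.{coor,vel,xsc} with optional
    ".old" copies; files that do not exist are skipped.
    """
    suffixes = [".restart.coor", ".restart.vel", ".restart.xsc"]
    found = []
    for suffix in suffixes:
        for variant in ("", ".old"):
            path = os.path.join(directory, prefix + suffix + variant)
            if os.path.exists(path):
                found.append((os.path.basename(path), os.path.getmtime(path)))
    # Sort by modification time so any gap between write cycles is easy to see.
    return sorted(found, key=lambda item: item[1])


if __name__ == "__main__":
    # Prefix taken from the log output quoted below.
    for name, mtime in restart_mtimes("Dec-Myr-Ole-Dag-Bilayer-mineq01i"):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
        print(stamp, name)
```

Files written in the same restart cycle should have timestamps within
moments of each other, so the current .vel and .xsc should sit next to
the .coor.old, with the truncated .coor clearly later.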

That does mean you'd have a modestly discontinuous DCD, though, since
its last frame was written at step 68801000, after the step 68800000
state you would be restarting from. Keep this in mind when doing your
analysis, even if the effect is likely small.
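
To put a number on that offset, you can read the step straight out of
the .xsc and compare it with the last DCD step from the log. This is
only a sketch: it assumes the .xsc is a text file whose comment lines
start with "#" and whose first data column is the step, and the helper
names xsc_step and overlap_steps are hypothetical.

```python
def xsc_step(path):
    """Return the step number from the first data line of an .xsc file.

    Assumes comment/header lines start with '#' and that the first
    whitespace-separated field of the data line is the step.
    """
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line and not line.startswith("#"):
                return int(line.split()[0])
    raise ValueError(f"no data line found in {path}")


def overlap_steps(xsc_path, last_dcd_step):
    """Steps between the restart state and the last frame already in the DCD."""
    return last_dcd_step - xsc_step(xsc_path)
```

For example, overlap_steps("Dec-Myr-Ole-Dag-Bilayer-mineq01i.restart.xsc",
68801000) tells you how many steps before the final DCD frame the
restart state sits.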

-Josh

On 12/2/22 11:21 AM, Collin Nisler wrote:
> Hello NAMD mailing list, I saw some similar discussions about this
> topic but nothing regarding this particular issue. I ran out of
> scratch space in the middle of a run, and the output of the log file
> looks like this:
>
> PRESSURE: 68800950 -58.3105 -62.6528 69.6044 -62.6528 147.068 -31.9313
> 69.6044 -31.9313 -66.2698
> GPRESSURE: 68800950 -63.9582 -86.9157 115.89 -91.6053 165.422 -101.241
> 47.5551 -45.9184 2.22279
> PRESSAVG: 68800950 -61.8699 22.772 8.63488 22.772 -46.7047 28.4571
> 8.63488 28.4571 4.35764
> GPRESSAVG: 68800950 -63.9626 16.8961 7.61863 22.8471 -43.3447 27.5754
> 9.58984 28.387 6.92415
> ENERGY: 68800950      5221.9152 26651.4812     18972.3235      
> 346.9468        -560309.6766   32807.5852         0.0000        
> 0.0000    107498.2178    -368811.2069       302.8579   -476309.4247  
> -368238.5517     301.4446              7.4958        34.5622  
> 1652816.1789       -34.7390       -33.4611
>
> WRITING COORDINATES TO DCD FILE Dec-Myr-Ole-Dag-Bilayer-mineq01i.dcd
> AT STEP 68801000
> WRITING COORDINATES TO RESTART FILE AT STEP 68801000
> FATAL ERROR: Error on write to binary file
> Dec-Myr-Ole-Dag-Bilayer-mineq01i.restart.coor: Disk quota exceeded
> [Partition 0][Node 0] End of program
>
> I was planning on using the
> Dec-Myr-Ole-Dag-Bilayer-mineq01i.restart.coor.old file to restart,
> since it presumably began to write the new one but didn't finish, but
> using the current (not .old) .vel and .xsc files. The current .xsc
> file shows a timestep of 68800000. Would this be the proper way to
> restart the simulation? Is there some way I can be certain, once it
> begins running, that it was restarted properly, since I'm assuming the
> pressure and/or temperature won't show any obvious discontinuities
> even if I used the wrong files? Thanks very much for the help.
>
> Collin

-- 
Josh Vermaas
vermaasj_at_msu.edu
Assistant Professor, Plant Research Laboratory and Biochemistry and Molecular Biology
Michigan State University
vermaaslab.github.io
