Your suggestion of using a different name for the output files worked.
A follow-up question: in a simulation with X replicas one gets X PMFs.
How do you combine all of them? Do you use NAMD (somehow)?
Or maybe just take the average with a simple bash script?
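
To make the "simple script" option concrete, here is a minimal sketch of a plain
point-by-point average, written in Python rather than bash. It is only an
illustration of that naive option, not a statement that averaging is the right
way to combine the PMFs. It assumes each PMF file has comment lines starting
with "#", followed by rows whose last column is the PMF and whose other columns
are the grid point, and that all files share the same grid. The function names
are hypothetical.

```python
from pathlib import Path


def parse_pmf(lines):
    """Return (grid, values) from PMF text lines.

    Assumption: the last column is the PMF, the preceding columns are the
    grid point, and lines starting with '#' are header comments.
    """
    grid, values = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip header comments and blank separator lines
        fields = [float(x) for x in line.split()]
        grid.append(tuple(fields[:-1]))
        values.append(fields[-1])
    return grid, values


def average_pmfs(pmfs):
    """Arithmetic mean, point by point, of several (grid, values) pairs."""
    if not pmfs:
        raise ValueError("no PMFs given")
    ref_grid = pmfs[0][0]
    for grid, _ in pmfs[1:]:
        if grid != ref_grid:
            raise ValueError("PMFs are not defined on the same grid")
    n = len(pmfs)
    mean = [sum(column) / n for column in zip(*(values for _, values in pmfs))]
    return ref_grid, mean


def average_pmf_files(paths):
    """Convenience wrapper: read each file and average the parsed PMFs."""
    return average_pmfs([parse_pmf(Path(p).read_text().splitlines()) for p in paths])
```

With per-replica output names, this would be called as
average_pmf_files(["meta.0.pmf", "meta.1.pmf"]), where those file names are
placeholders for whatever prefix each replica actually used.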
>> Hi,
>> thanks for your comments, outputname is set to "meta" only without a
>> reference to replicas that you mentioned.
> Please make outputName different for each replica as suggested,
> otherwise they'll overwrite each other's output.
>> May I ask you about the tcl function you mentioned, where could I find
>> its description? I get the following output files:
>> mymtd-replicas.txt
>> meta-distance.5.files.txt.BAK
>> meta-distance.5.files.txt
>> meta-distance.0.files.txt.BAK
>> meta-distance.0.files.txt
>> meta.xst.BAK
>> meta.restart.xsc.old
>> meta.restart.vel.old
>> meta.restart.coor.old
>> meta.restart.colvars.state.old
>> meta.restart.colvars.state
>> meta.pmf.BAK
>> meta.partial.pmf.BAK
>> meta.dcd.BAK
>> meta.colvars.traj.BAK
>> meta.colvars.traj
>> meta.colvars.state.old
>> meta.colvars.meta-distance.5.state
>> meta.colvars.meta-distance.5.hills.traj
>> meta.colvars.meta-distance.5.hills
>> meta.colvars.meta-distance.0.hills.traj
>> meta.xst
>> meta.restart.xsc
>> meta.restart.vel
>> meta.restart.coor
>> meta.pmf
>> meta.partial.pmf
>> meta.dcd
>> meta.colvars.state
>> meta.colvars.meta-distance.0.state
>> meta.colvars.meta-distance.0.hills
> This is consistent with your setup: each of those files is being written
> over multiple times, but those that contain the replica ID are different
> (because Colvars detects the replica ID internally from NAMD when you
> launch NAMD with +replicas).
>> plus the log file of NAMD which contains the information of the replicas
>> I used here. Because I requested 8 replicas I expected more output files.
>> The content of mymtd-replicas.txt (written by NAMD not by me) is:
>> 0 meta-distance.0.files.txt
>> 5 meta-distance.5.files.txt
>> This tells me that somehow NAMD is setting up 2 replicas although I
>> requested 8: mpirun -np 112 namd2 +replicas 8 script.inp
> Not quite: normally that list would be populated by the replicas, one by
> one. You ask for 8, but because the replicas all write at the same time
> *onto the same files*, they end up with I/O errors, the simulation does
> not proceed smoothly, and the replicas don't get to the registration step.
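
A quick way to see which replicas never reached the registration step is to
compare the registry file against the expected replica IDs. The following is a
minimal sketch, assuming each registry line has the form
"<replicaID> <file name>" as in the listing above, and assuming that with
+replicas N the IDs run from 0 to N-1 (the second point is an assumption, not
something stated in this thread). The function names are hypothetical.

```python
def registered_replicas(registry_lines):
    """Collect the replica IDs found in the first column of the registry."""
    ids = set()
    for line in registry_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        ids.add(fields[0])
    return ids


def missing_replicas(registry_lines, n_replicas):
    """List the IDs in 0..n_replicas-1 that do not appear in the registry.

    Assumption: replica IDs are the integers 0..n_replicas-1.
    """
    expected = {str(i) for i in range(n_replicas)}
    return sorted(expected - registered_replicas(registry_lines), key=int)
```

For the registry shown above and 8 requested replicas, passing the two lines
to missing_replicas(lines, 8) reports the six IDs that did not register.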
>> The colvars config file contains the lines:
>> metadynamics {
>> name meta-distance
>> colvars distance1
>> hillWeight 0.1
>> newHillFrequency 1000
>> writeHillsTrajectory on
>> hillwidth 1.0
>> multipleReplicas on
>> replicasRegistry mymtd-replicas.txt
>> replicaUpdateFrequency 50000
>> writePartialFreeEnergyFile on
>> }
>> I am running on a parallel file system for hpc. Any comment will be
>> appreciated. Thanks again.
> For now the problem seems to be that the output prefix was not
> differentiated between replicas. If the problem persists after fixing that,
> please also report what kind of parallel file system you are using (NFS,
> GPFS, Lustre, ...).
>>> Jing, you're probably using different values for outputName if you're
>>> using multipleReplicas on (i.e. multiple walkers), but still, please
>>> confirm that that's what you are using.
>>> Note also that by using file-based communication the replicas don't need
>>> to be launched with the same command, but can also be run as independent
>>> jobs:
>>> In that framework, the main advantage of +replicas is mostly that the
>>> value of replicaID is filled automatically, so that your Colvars config
>>> file can be identical for all replicas.
>>> If you are experiencing file I/O issues also when launching replicas
>>> independently (i.e. not with a single NAMD run with +replicas), can you
>>> find out what kind of filesystem you have on the compute nodes?
>>> Thanks
>>> Giacomo
>>>> There is definitely a bug in the 2.14 MPI version. One of my students
>>>> has noticed that anything that calls NAMD_die isn't taking down all the
>>>> replicas, and so the jobs will continue to burn resources until they
>>>> reach their wallclock limit.
>>>> However, the key is figuring out *why* you are getting an error. I'm
>>>> less familiar with metadynamics, but at least for umbrella sampling, it
>>>> is pretty typical for each replica to write out its own set of files.
>>>> This is usually done with something like:
>>>> outputname somename.[myReplica]
>>>> Where [myReplica] is a Tcl function that evaluates to the replica ID
>>>> for each semi-independent simulation. For debugging purposes, it can be
>>>> very helpful for each replica to spit out its own log file. This is
>>>> usually done by setting the +stdout option on the command line.
>>>> mpirun -np 28 namd2 +replicas 2 namd_metadynamics.inp +stdout outputlog.%d.log
>>>> -Josh
>>>> > Hi,
>>>> >
>>>> > I am running a metadynamics simulation with the NAMD 2.14 MPI version.
>>>> > SLURM is used for job scheduling; the command to run it with 2
>>>> > replicas on a 14-core node is as follows:
>>>> >
>>>> > mpirun -np 28 namd2 +replicas 2 namd_metadynamics.inp
>>>> >
>>>> > In fact, I have tried up to 8 replicas and the resulting PMF looks
>>>> > very similar to what I obtain with other methods such as ABF. The
>>>> > problem is that by using the replicas option, the simulation hangs
>>>> > right at the end. I have looked at the output files and it seems that
>>>> > right at the end NAMD wants to access some files (for example, *.xsc,
>>>> > *hills*, ...) that already exist and NAMD throws an error.
>>>> >
>>>> > My guess is that this could be either a misunderstanding from my side
>>>> > in running NAMD with replicas or a bug in the MPI version.
>>>> >
>>>> > Have you observed that issue previously? Any comment is welcome.
>>>> > Thanks
>>>> >
