From: Sebastian S (
Date: Fri Nov 15 2019 - 12:39:29 CST
By the way, I'm using version 2.13, as the administrators of my network
haven't installed the new one yet
On Fri, Nov 15, 2019 at 1:34 PM Sebastian S <> wrote:
> I tried on the same node and I'm getting the same errors. The funny thing
> is that I can run 4 replicas without problems, but when I try 10 they start
> failing.
> module load namd
> mpirun -np 2 namd2 testres.rep1.namd > s1.0.log &
> mpirun -np 2 namd2 testres.rep2.namd > s2.0.log &
> mpirun -np 2 namd2 testres.rep3.namd > s3.0.log &
> mpirun -np 2 namd2 testres.rep4.namd > s4.0.log &
> mpirun -np 2 namd2 testres.rep5.namd > s5.0.log &
> mpirun -np 2 namd2 testres.rep6.namd > s6.0.log &
> mpirun -np 2 namd2 testres.rep7.namd > s7.0.log &
> mpirun -np 2 namd2 testres.rep8.namd > s8.0.log &
> mpirun -np 2 namd2 testres.rep9.namd > s9.0.log &
> mpirun -np 2 namd2 testres.rep10.namd > s10.0.log &
> wait
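The ten launch lines above follow one pattern, so they could be produced by a loop. A minimal sketch, assuming the same file naming as in the message (testres.repN.namd, sN.0.log); the function names replica_cmd and launch_replicas are made up for illustration, and by default the commands are only printed, not executed:

```shell
#!/bin/sh
# Sketch only: replica_cmd and launch_replicas are hypothetical helper names.
# File names follow the pattern used in the message above.

replica_cmd() {
    # $1 = replica index, $2 = MPI ranks per replica
    printf 'mpirun -np %s namd2 testres.rep%s.namd > s%s.0.log\n' "$2" "$1" "$1"
}

launch_replicas() {
    # $1 = number of replicas, $2 = MPI ranks per replica,
    # $3 = "run" to really start the jobs; anything else is a dry run
    i=1
    while [ "$i" -le "$1" ]; do
        if [ "${3:-dry}" = "run" ]; then
            mpirun -np "$2" namd2 "testres.rep$i.namd" > "s$i.0.log" &
        else
            echo "$(replica_cmd "$i" "$2") &"
        fi
        i=$((i + 1))
    done
    if [ "${3:-dry}" = "run" ]; then
        wait    # block until every background replica has exited
    fi
}

launch_replicas 10 2    # dry run: prints the ten commands
```

Calling launch_replicas 10 2 run would start the jobs for real; the dry run makes it easy to check the generated commands first.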
> On Fri, Nov 15, 2019 at 12:54 PM Victor Kwan <> wrote:
>> Try running the 12 replicas on the same node to see whether the problem
>> is related to MPI?
>> Victor
>> On Fri, Nov 15, 2019 at 12:26 PM Canal de Sebassen <
>>> wrote:
>>> I have another question about these simulations. I started running some
>>> yesterday and:
>>> 1) initially some walkers do not start at all. I get messages like
>>> colvars: Metadynamics bias "metadynamics1": failed to read the file
>>> "metadynamics1.rep1.files.txt": will try again after 10000 steps.
>>> and in the same step the walker reads the other replicas and ends with
>>> colvars: Metadynamics bias "metadynamics1": reading the state of
>>> replica "rep1" from file "".
>>> colvars: Error: in reading state configuration for "metadynamics" bias
>>> "metadynamics1" at position -1 in stream.
>>> 2) others run for a while but then give me the message
>>> colvars: Error: in reading state configuration for "metadynamics" bias
>>> "metadynamics1" at position -1 in stream.
>>> FATAL ERROR: Error in the collective variables module: exiting.
>>> 3) in the end, only 3 walkers work; the other 9 that I submitted
>>> die. I'm running these simulations on my local cluster with the
>>> following script:
>>> #!/bin/bash
>>> #$ -pe mpi-24 288 # Specify parallel environment and legal core size
>>> #$ -q long # Specify queue
>>> #$ -N Trial1 # Specify job name
>>> TASK=0
>>> cat $PE_HOSTFILE | while read -r line; do
>>> host=`echo $line|cut -f1 -d" "|cut -f1 -d"."`
>>> echo $host >> hostfile
>>> done
>>> hostfile="./hostfile"
>>> while IFS= read -r host
>>> do
>>> let "TASK+=1"
>>> /usr/kerberos/bin/rsh -F $host -n "uname -a; echo $TASK; cd XXXXXXXXX; pwd; module load namd; mpirun -np 24 namd2 testres.rep$TASK.namd > s$TASK.0.log ; exit" &
>>> done < $hostfile
>>> wait
>>> rm ./hostfile
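To find out which walkers died, the per-replica logs written by the script above (s1.0.log, s2.0.log, ...) could be scanned for the fatal message quoted in point 2. A rough sketch; failed_replicas is a hypothetical helper name, and it assumes the logs all sit in one directory:

```shell
#!/bin/sh
# Sketch only: failed_replicas is a hypothetical helper, not part of NAMD.

failed_replicas() {
    # $1 = directory holding the s<N>.0.log files (default: current directory)
    # Prints the name of each log that contains the fatal Colvars message.
    grep -l "FATAL ERROR: Error in the collective variables module" \
        "${1:-.}"/s*.0.log 2>/dev/null || true
    # "|| true": finding no failed replicas is not treated as an error
}

failed_replicas .
```

This only reports walkers that exited with the fatal error; walkers that never started (point 1) would have to be spotted separately, for example by a missing or empty log.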
>>> Am I doing something wrong? Currently the frequencies in my colvars file are
>>> colvarsTrajFrequency 10000
>>> metadynamics {
>>> colvars d1 d2
>>> useGrids on
>>> hillWeight 0.05
>>> newHillFrequency 10000
>>> dumpFreeEnergyFile on
>>> dumpPartialFreeEnergyFile on
>>> saveFreeEnergyFile on
>>> writeHillsTrajectory on
>>> multipleReplicas yes
>>> replicaID rep9
>>> replicasRegistry replicas.registry.txt
>>> replicaUpdateFrequency 10000
>>> }
>>> and my namd outputs are
>>> numSteps 25000000
>>> outputEnergies 10000
>>> outputPressure 10000
>>> outputTiming 10000
>>> xstFreq 10000
>>> dcdFreq 10000
>>> restartFreq 10000
>>> Thanks,
>>> Sebastian
>>> On Sat, Nov 9, 2019 at 8:03 PM Canal de Sebassen <
>>>> wrote:
>>>> Thanks for your reply, Giacomo. I'll take your suggestions into
>>>> consideration when setting up the system.
>>>> Regards,
>>>> Sebastian
>>>> On Thu, Nov 7, 2019 at 6:37 PM Giacomo Fiorin <>
>>>> wrote:
>>>>> Hi Canal, first of all try upgrading to the latest NAMD nightly
>>>>> build. Thanks to Jim's help, I added extra checks that make the
>>>>> input/output functionality more robust (the same checks are used when
>>>>> writing the NAMD restart files):
>>>>> There is also an important bugfix in the output of the PMF (the
>>>>> restart files are fine):
>>>>> About the exchange rate, on modern hardware optimal performance is
>>>>> around a few milliseconds/step, so 1000 steps is rather short for a full
>>>>> cycle with all replicas reading each other's files. Best to increase it by
>>>>> a factor of 10 or more: I would have made its default value the same as the
>>>>> restart frequency, but there is no telling how long that would be for each
>>>>> user's input.
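As a back-of-the-envelope check of the point above: the wall-clock time between two replica updates is roughly the number of steps times the time per step. The sketch below assumes 3 ms/step as a stand-in for "a few milliseconds"; that number and the function name update_interval_ms are illustrative only.

```shell
#!/bin/sh
# Sketch only: 3 ms/step is an assumed stand-in for "a few milliseconds/step".

update_interval_ms() {
    # $1 = replicaUpdateFrequency (steps), $2 = wall time per step (ms)
    echo $(( $1 * $2 ))
}

echo "1000 steps:  $(update_interval_ms 1000 3) ms between updates"
echo "10000 steps: $(update_interval_ms 10000 3) ms between updates"
```

With these assumed numbers, 1000 steps leaves about 3 seconds for all replicas to write and read each other's files, against about 30 seconds with 10000 steps.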
>>>>> Regarding the PMFs, nothing special is needed. Each replica will
>>>>> write PMFs with the same contents (the PMF extracted from the shared bias),
>>>>> so they will be equal minus the fluctuations arising from synchronization.
>>>>> You are probably confused by the partial output files, which are triggered
>>>>> by dumpPartialFreeEnergyFile (a flag that is off by default).
>>>>> Lastly, Gaussians 0.01 kcal/mol high added every 100 steps is quite a
>>>>> bit of bias, and will be further multiplied by the number of replicas.
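To put rough numbers on this, one crude measure is the sum of the heights of all Gaussians deposited in a given number of steps: hillWeight times the number of hills per replica times the number of replicas. It ignores how the hills spread over the colvar space, so it is only useful for comparing settings against each other. A sketch (summed_hill_height is a hypothetical name; 10 replicas is the walker count from the original question):

```shell
#!/bin/sh
# Sketch only: a crude relative measure, not a quantity reported by Colvars.

summed_hill_height() {
    # $1 = hillWeight (kcal/mol), $2 = newHillFrequency (steps),
    # $3 = number of replicas, $4 = number of steps considered
    LC_ALL=C awk -v w="$1" -v f="$2" -v n="$3" -v s="$4" \
        'BEGIN { printf "%.2f\n", w * (s / f) * n }'
}

summed_hill_height 0.01 100 10 10000     # settings from the original question
summed_hill_height 0.01 1000 10 10000    # same height, hills 10 times less often
```

Over 10000 steps this gives 10.00 kcal/mol for the original settings and 1.00 kcal/mol when hills are added 10 times less often.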
>>>>> Giacomo
>>>>> On Thu, Nov 7, 2019 at 6:06 PM Canal de Sebassen <
>>>>>> wrote:
>>>>>> Hello,
>>>>>> Say I run a metadynamics simulation with 10 walkers. I then get 10
>>>>>> different pmf files. If my simulation was in 2D, how do I get a single
>>>>>> energy landscape? Do I use abf_integrate?
>>>>>> Also, what are some good practices when running these kinds of
>>>>>> simulations?
>>>>>> I haven't found many examples. This is one of my current colvars files.
>>>>>> I plan to get about 1-5 microseconds of data. Is a replicaUpdateFrequency
>>>>>> of 1000 too large? I tried with a smaller one but I get problems because
>>>>>> some files of a replica cannot be found by another one (maybe due to
>>>>>> lagging?).
>>>>>> Thanks,
>>>>>> Sebastian
>>>>>> colvarsTrajFrequency 100
>>>>>> colvar {
>>>>>> name d1
>>>>>> outputAppliedForce on
>>>>>> width 0.5
>>>>>> lowerBoundary 0.0
>>>>>> upperBoundary 30.0
>>>>>> upperWallConstant 100.0
>>>>>> distanceZ {
>>>>>> forceNoPBC yes
>>>>>> main {
>>>>>> atomsFile labels.pdb
>>>>>> atomsCol B
>>>>>> atomsColValue 1.0
>>>>>> }
>>>>>> ref {
>>>>>> atomsFile labels.pdb
>>>>>> atomsCol B
>>>>>> atomsColValue 2.0
>>>>>> }
>>>>>> }
>>>>>> }
>>>>>> colvar {
>>>>>> name d2
>>>>>> outputAppliedForce on
>>>>>> width 1
>>>>>> lowerBoundary 0.0
>>>>>> upperBoundary 10.0
>>>>>> upperWallConstant 100.0
>>>>>> coordNum {
>>>>>> cutoff 4.0
>>>>>> group1 {
>>>>>> atomsFile labels.pdb
>>>>>> atomsCol O
>>>>>> atomsColValue 1.0
>>>>>> }
>>>>>> group2 {
>>>>>> atomsFile labels.pdb
>>>>>> atomsCol B
>>>>>> atomsColValue 2.0
>>>>>> }
>>>>>> }
>>>>>> }
>>>>>> metadynamics {
>>>>>> colvars d1 d2
>>>>>> useGrids on
>>>>>> hillWeight 0.01
>>>>>> newHillFrequency 100
>>>>>> dumpFreeEnergyFile on
>>>>>> dumpPartialFreeEnergyFile on
>>>>>> saveFreeEnergyFile on
>>>>>> writeHillsTrajectory on
>>>>>> multipleReplicas yes
>>>>>> replicaID rep1
>>>>>> replicasRegistry replicas.registry.txt
>>>>>> replicaUpdateFrequency 1000
>>>>>> }
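Since every walker needs its own replicaID (rep1 here, rep9 in the follow-up message), the per-replica colvars files could be generated from one template. A sketch under assumptions: the placeholder @REPLICA_ID@, the template file, and the output names colvars.repN.in are all made up for illustration; only the replicaID keyword and the repN naming come from the thread.

```shell
#!/bin/sh
# Sketch only: @REPLICA_ID@ and the colvars.rep<N>.in names are hypothetical.

make_replica_configs() {
    # $1 = template containing the line "replicaID @REPLICA_ID@"
    # $2 = number of replicas
    n=1
    while [ "$n" -le "$2" ]; do
        # Write one copy of the template per replica with its own ID filled in
        sed "s/@REPLICA_ID@/rep$n/" "$1" > "colvars.rep$n.in"
        n=$((n + 1))
    done
}

# Example (assumes colvars.template.in exists in the current directory):
# make_replica_configs colvars.template.in 10
```

Keeping a single template means shared settings such as replicasRegistry and replicaUpdateFrequency cannot drift apart between walkers.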
>>>>> --
>>>>> Giacomo Fiorin
>>>>> Associate Professor of Research, Temple University, Philadelphia, PA
>>>>> Research collaborator, National Institutes of Health, Bethesda, MD