From: Marcelo C. R. Melo (melomcr_at_gmail.com)
Date: Thu Jan 31 2019 - 16:41:35 CST
They are referring to tasks, not nodes. One could request 8 tasks on a
4-core multi-threaded system, for example. Does that make sense? (Though
that would not be advisable in your case.)
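On a SLURM cluster, for instance, something like the following (only a rough
sketch; the exact options and values depend on your site) reserves 8 MPI
tasks, i.e. 8 slots, on a single node:

   #SBATCH --nodes=1            # one physical node
   #SBATCH --ntasks=8           # 8 MPI tasks ("slots") for ORCA to use
   #SBATCH --cpus-per-task=1    # one core per task

With --ntasks=1, as in your script, the scheduler advertises only a single
slot to MPI, which could be why ORCA cannot find the slots it asks for.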
As I mentioned in my previous e-mail, you should check the commands that
control how ORCA distributes its computations in a cluster, as you may need
to provide a "hostfile" indicating the name(s) of the node(s) where ORCA
will find available processors. This is something every cluster makes
available when the queuing system reserves nodes for a job, so you should
find out how to access that in your cluster.
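On SLURM machines, for example, one common way to turn the job's reserved
nodes into a plain hostfile is the sketch below (the output file name
"orca.hostfile" is just an example, and how your ORCA/OpenMPI installation
is told to use it is something you will need to check in your cluster's and
ORCA's documentation):

   # expand the node list SLURM reserved for this job, one hostname per line
   scontrol show hostnames "$SLURM_JOB_NODELIST" > orca.hostfile

OpenMPI's mpirun can read such a file with its --hostfile option.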
I imagine the cluster's MPI system is not making the tasks available when
ORCA calls MPI.
And yes, ORCA will use MPI even to parallelize within a single node.
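For reference, the core count itself is requested inside the ORCA input that
NAMD generates, e.g. (a sketch reusing the keywords already discussed in this
thread):

   qmConfigLine  "! UKS BP86 RI SV def2/J enGrad PAL8 SlowConv"

ORCA also accepts the equivalent block form "%pal nprocs 8 end", which in a
qmConfigLine would presumably need the same %% escaping you already use for
the %%output line. Whatever number you request there must be matched by the
MPI slots your job actually provides, otherwise you get the "not enough
slots" error you quoted.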
Best
---
Marcelo Cardoso dos Reis Melo, PhD
Postdoctoral Research Associate
Luthey-Schulten Group
University of Illinois at Urbana-Champaign
crdsdsr2_at_illinois.edu
+1 (217) 244-5983

On Thu, 31 Jan 2019 at 15:30, Francesco Pietra <chiendarret_at_gmail.com> wrote:

> Are PAL4 and PAL8 expecting four or eight nodes, respectively, rather than
> cores?
>
> ---------- Forwarded message ---------
> From: Francesco Pietra <chiendarret_at_gmail.com>
> Date: Thu, Jan 31, 2019 at 10:22 PM
> Subject: Re: namd-l: Tuning QM-MM with namd-orca on one cluster node
> To: Marcelo C. R. Melo <melomcr_at_gmail.com>
> Cc: NAMD <namd-l_at_ks.uiuc.edu>
>
> Hi Marcelo:
> First, thanks.
> I moved away from MOPAC as I could not obtain SCF convergence, which was
> not unexpected because of the two iron ions. ORCA reached single-point
> convergence in two runs of 125 iterations each (I was unable to set a flag
> for more iterations; "maxiter #" on the qmConfigLine was not accepted, and a
> perusal of the manual did not help me). I used ORCA extensively years ago
> for CD simulations (excited states), but not since.
> As to the size of the system, I am a biochemist and therefore interested in
> real systems (which is no justification, I admit). Anyway, I used a very
> sloppy DFT setup and loose convergence in the hope that it is still more
> appropriate than a semiempirical method for my system.
>
> I must correct my previous post, as I failed to notice the line
>
>> Charm++> cpu affinity enabled.
>
> In the new runs described below, the affinity info was complete in namd.log:
>
>> Charm++> cpu affinity enabled.
>> [1] pthread affinity is: 1
>> [3] pthread affinity is: 3
>> [4] pthread affinity is: 4
>> [2] pthread affinity is: 2
>> [0] pthread affinity is: 0
>
> I had run into trouble with PAL# before, and then (badly) forgot to
> reactivate it, but in my hands such troubles remain. That is, with either
> PAL8 or PAL4, the error revealed in /0/*TmpOut was
>
>> There are not enough slots available in the system to satisfy the 4 slots
>> that were requested by the application:
>> /cineca/prod/opt/applications/orca/4.0.1/binary/bin/orca_gtoint_mpi
>>
>> Either request fewer slots for your application, or make more slots
>> available for use.
>
> Settings were
>
> qmConfigLine "! UKS BP86 RI SV def2/J enGrad PAL4 SlowConv" (or PAL8)
> qmConfigLine "%%output Printlevel Mini Print\[ P_Mulliken \] 1 Print\[P_AtCharges_M\] 1 end"
>
> #SBATCH --nodes=1
> #SBATCH --ntasks=1
> #SBATCH --cpus-per-task=36
> #SBATCH --time=00:30:00
> module load profile/archive
> module load autoload openmpi/2.1.1--gnu--6.1.0
> (Without loading MPI, the system complains that mpirun is unavailable and
> crashes. I must admit to being confused about that, because for a single
> node MPI should not be needed.)
>
> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2 namd-01.conf +p5 +setcpuaffinity +showcpuaffinity > namd-01.log
>
> It seems that my settings are not providing ORCA with enough hardware,
> despite the full node of 36 cores.
>
> Thanks for advice
>
> francesco
>
> On Thu, Jan 31, 2019 at 8:08 PM Marcelo C. R. Melo <melomcr_at_gmail.com> wrote:
>
>> Hi Francesco,
>>
>> The first line in your namd.log says
>> "Info: Running on 5 processors, 1 nodes, 1 physical nodes."
>> which indicates NAMD is indeed using the 5 cores you requested with
>> "+p5". Sometimes "top" will show just one process, but the CPU usage of
>> that process will show 500%, for example, indicating 5 cores.
>> This happens in some cluster management systems too.
>>
>> As for ORCA, your "qm config line" does not indicate you are requesting
>> it to use multiple cores, so it most likely really is using just one. You
>> should be using the keyword "PAL?", where the question mark indicates the
>> number of requested cores: use "PAL8", for example, to ask for 8 cores.
>> You should become familiar with the commands that control how ORCA
>> distributes its computations in a cluster (their manual is very good), as
>> you may need to provide a "hostfile" indicating the name(s) of the node(s)
>> where ORCA will find available processors. This is something every cluster
>> makes available when the queuing system reserves nodes for a job, so you
>> should find out how to access that in your cluster.
>>
>> As a final note, even in parallel, calculating 341 QM atoms (QM system +
>> link atoms) using DFT will be slow. Really slow. Maybe not 10 hours per
>> timestep, but you just went from a medium-sized semi-empirical (parallel
>> MOPAC) calculation to a large DFT one. Even in parallel, MOPAC could take a
>> couple of seconds per timestep (depending on CPU power). ORCA/DFT will take
>> much more than that.
>>
>> Best,
>> Marcelo
>> ---
>> Marcelo Cardoso dos Reis Melo, PhD
>> Postdoctoral Research Associate
>> Luthey-Schulten Group
>> University of Illinois at Urbana-Champaign
>> crdsdsr2_at_illinois.edu
>> +1 (217) 244-5983
>>
>> On Thu, 31 Jan 2019 at 12:27, Francesco Pietra <chiendarret_at_gmail.com> wrote:
>>
>>> Hello
>>> Having obtained very good performance of NAMD(nightbuild)-MOPAC on one
>>> cluster node with my system (large QM part, see below, including two iron
>>> ions), I am now trying the same with NAMD(nightbuild)-ORCA on the same
>>> cluster (36 cores across two sockets). So far I have been unable to get
>>> namd and orca running on more than one core each.
>>>
>>> namd.conf
>>> qmConfigLine "! UKS BP86 RI SV def2/J enGrad SlowConv"
>>> qmConfigLine "%%output Printlevel Mini Print\[ P_Mulliken \] 1 Print\[P_AtCharges_M\] 1 end"
>>> (SCF already converged by omitting "enGrad")
>>>
>>> namd.job
>>> #SBATCH --nodes=1
>>> #SBATCH --ntasks=1
>>> #SBATCH --cpus-per-task=36
>>> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2 namd-01.conf +p5 +setcpuaffinity +showcpuaffinity > namd-01.log
>>>
>>> namd.log
>>> Info: Running on 5 processors, 1 nodes, 1 physical nodes.
>>> Info: Number of QM atoms (excluding Dummy atoms): 315
>>> Info: We found 26 QM-MM bonds.
>>> Info: Applying user defined multiplicity 1 to QM group ID 1
>>> Info: 1) Group ID: 1 ; Group size: 315 atoms ; Total PSF charge: -1
>>> Info: Found user defined charge 1 for QM group ID 1. Will ignore PSF charge.
>>> Info: MM-QM pair: 180:191 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 208:195 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 243:258 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 273:262 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 296:313 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 324:317 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 358:373 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 394:377 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 704:724 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 742:728 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 756:769 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 799:788 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 820:830 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 864:851 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 1461:1479 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 1511:1500 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 1532:1547 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 1566:1551 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 1933:1946 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 1991:1974 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 2011:2018 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 2050:2037 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 2072:2083 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 2098:2087 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 2139:2154 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 2174:2158 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> TCL: Minimizing for 200 steps
>>> Info: List of ranks running QM simulations: 0.
>>>
>>> Nothing about affinity!! (which was clearly displayed in the MOPAC case)
>>>
>>> /0/qmm_0_input.TmpOut shows SCF ITERATIONS
>>>
>>> "top" showed a single process for both namd and orca.
>>> ___
>>> I had already tried a different job setting:
>>> #SBATCH --nodes=1
>>> #SBATCH --ntasks-per-node=4
>>> #SBATCH --ntasks-per-socket=2
>>> module load profile/archive
>>> module load autoload openmpi/2.1.1--gnu--6.1.0
>>> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2 namd-01.conf +p5 > namd-01.log
>>>
>>> Here too, "top" showed a single process for both namd and orca, so that in
>>> about 20 hours namd.log was only at "ENERGY 2", indicating that about 1400
>>> hours would be needed to complete the simulation.
>>>
>>> Thanks for advice
>>> francesco pietra