Re: FATAL ERROR: CudaTileListKernel::buildTileLists,

From: Tue Boesen (alyflex_at_gmail.com)
Date: Tue Mar 01 2022 - 10:52:46 CST

Thank you for the prompt reply Giacomo!

Let me provide some additional background information about the systems I'm
running:
I am running energy minimization on most of the proteins in RCSB (so around
300k samples).

   1. Each example is sanitised.
   2. It is run through pdb4amber to ensure the correct format.
   3. tleap adds a water box and neutralises the charge.
   4. NAMD minimizes the energy.
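
A rough sketch of steps 2 to 4, for anyone who wants to reproduce a comparable setup. It is not the actual pipeline used here: the force field (ff14SB), the 10 A water buffer, the file names, the `namd3` binary name and both helper functions are assumptions made for illustration. Only the tools themselves (pdb4amber, tleap, NAMD) and the TIP3P water model come from this thread.

```python
def tleap_script(pdb_in: str, prefix: str, buffer_A: float = 10.0) -> str:
    """Return a tleap input that solvates and neutralises one structure.

    Assumed, not taken from the thread: ff14SB and a 10 A buffer.
    TIP3P matches the water model reported in the NAMD log.
    """
    return "\n".join([
        "source leaprc.protein.ff14SB",
        "source leaprc.water.tip3p",
        f"mol = loadPdb {pdb_in}",
        f"solvateBox mol TIP3PBOX {buffer_A}",
        "addIons mol Na+ 0",  # count 0 asks tleap to neutralise the net charge
        "addIons mol Cl- 0",
        f"saveAmberParm mol {prefix}.prmtop {prefix}.inpcrd",
        f"savePdb mol {prefix}.pdb",
        "quit",
    ]) + "\n"


def pipeline_commands(name: str) -> list[list[str]]:
    """Command lines for steps 2-4 (hypothetical file naming scheme)."""
    return [
        ["pdb4amber", "-i", f"{name}.pdb", "-o", f"{name}_clean.pdb"],
        ["tleap", "-f", f"{name}_leap.in"],
        ["namd3", f"{name}.conf"],
    ]
```

Each command list can be passed to `subprocess.run(cmd, check=True)` after writing the output of `tleap_script` to the `.in` file.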

The reason I believe this problem is related to the number of atoms is that
I consistently see it for large systems, while I have never seen it for any
of the smaller protein systems.

It could of course be a density issue, as you suggest, but I find that
unlikely given the way I prepare the systems and how consistently the error
shows up for larger ones. I have uploaded a zip file to my Google Drive:
https://urldefense.com/v3/__https://drive.google.com/file/d/1Zg9LfSQA2ARgyXKSAig3t1zQBNuXdIEP/view?usp=sharing__;!!DZ3fjg!ojQM-5AijDl-shoSDY7mz_qlRLY4YR93bfzA7zY8makMo_jIsTGA1zCJoMZFlWNGQQ$
It contains the NAMD config file, the pdb file, the prmtop file and the
resulting output log file.
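
One way to test the density hypothesis directly from the log is to compare the largest patch with the average patch population. The sketch below is only an illustration and not part of NAMD: the function name `patch_stats` is hypothetical, and the regular expressions assume the exact log wording quoted further down in this thread.

```python
import re


def patch_stats(log_text: str) -> dict:
    """Compare the largest patch with the mean patch population.

    Works on raw logs and on logs with email quote prefixes (">> ").
    """
    # Total atom count: the "Info: <n> ATOMS" line that ends right after ATOMS.
    atoms = int(
        re.search(r"Info:\s+(\d+) ATOMS\s*$", log_text, re.MULTILINE).group(1)
    )
    # Patch grid dimensions, e.g. "14 (PERIODIC) BY 7 (PERIODIC) BY 15 (PERIODIC)".
    grid = re.search(
        r"PATCH GRID IS (\d+) \(\w+\) BY (\d+) \(\w+\) BY (\d+) \(\w+\)", log_text
    )
    nx, ny, nz = (int(g) for g in grid.groups())
    # Atom count of the most populated patch.
    largest = int(
        re.search(r"LARGEST PATCH \(\d+\) HAS (\d+) ATOMS", log_text).group(1)
    )
    n_patches = nx * ny * nz
    mean = atoms / n_patches
    return {
        "patches": n_patches,
        "mean_atoms": mean,
        "largest_patch": largest,
        "ratio": largest / mean,
    }
```

For the log quoted below, the grid is 14 x 7 x 15 = 1470 patches for 1472897 atoms, so about 1000 atoms per patch on average, while the largest patch (736) is reported with 112867 atoms. A ratio far above 1 would point to atoms clustering in one region rather than to overall system size.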

The problem with tracking memory with nvidia-smi is that the memory
overflow usually happens so fast that nvidia-smi doesn't catch it. However,
I can try to find some large systems that do run and see how much memory
they use, as an indicator of whether this could be the problem.
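
A minimal sketch of how the peak memory of a run that does complete could be recorded by polling nvidia-smi while the job runs. The helper names and the polling interval are assumptions; the `--query-gpu=memory.used --format=csv,noheader,nounits` options are standard nvidia-smi query options. Each nvidia-smi call takes some time itself, so very short spikes can still be missed.

```python
import subprocess
import time

QUERY = ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"]


def parse_used_mib(output: str) -> list[int]:
    """Parse one integer (MiB) per GPU from nvidia-smi's CSV output."""
    return [int(line) for line in output.splitlines() if line.strip()]


def peak_gpu_memory(cmd: list[str], interval_s: float = 0.05) -> list[int]:
    """Run cmd and return the peak memory.used (MiB) seen per GPU."""
    proc = subprocess.Popen(cmd)
    peak: list[int] = []
    # Poll until the child process exits, keeping the per-GPU maximum.
    while proc.poll() is None:
        out = subprocess.run(
            QUERY, capture_output=True, text=True, check=True
        ).stdout
        used = parse_used_mib(out)
        peak = [max(a, b) for a, b in zip(peak, used)] if peak else used
        time.sleep(interval_s)
    return peak
```

Usage would look like `peak_gpu_memory(["namd3", "min.conf"])`, where the binary and config names are placeholders.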

Similarly, I will try to run this example on the CPU to see whether it
works there, though depending on the runtime this may not be a viable
solution for me even if the run succeeds.

If anyone has any other ideas or suggestions I would be happy to hear them.

Kind Regards
Tue

On Tue, Mar 1, 2022 at 6:45 AM Giacomo Fiorin <giacomo.fiorin_at_gmail.com>
wrote:

> I think that the key here is not the total number of atoms, but this
> message:
> *Too many atoms in a patch*
> i.e. the local density seems to be a problem.
>
> Without knowing how the system was set up, the best suggestion would be to
> minimize it without the GPU, i.e. with NAMD 2 or the CUDASOA flag off.
> Unlike MD, which can be run in mixed precision under certain conditions,
> minimization is much more reliable if run in double precision. I am not
> sure if the current GPU code allows that.
>
> Giacomo
>
> On Tue, Mar 1, 2022 at 2:16 AM Tue Boesen <alyflex_at_gmail.com> wrote:
>
>> I'm trying to run energy minimization using NAMD 3.0alpha9 for
>> Linux-x86_64-multicore-CUDA, it works well for smaller systems, but I find
>> that for large systems I consistently get this error:
>>
>> FATAL ERROR: CudaTileListKernel::buildTileLists, maximum shared memory
>> allocation exceeded. Too many atoms in a patch
>>
>> I'm running the minimization on a Geforce RTX 3090 with 24GB memory, so I
>> believe I should have enough memory though it doesn't tell me exactly how
>> much it is using.
>>
>> The system I'm minimizing has about 1.5M atoms, and consists of a protein
>> in a box of water with a few Na+ Cl- ions.
>>
>> I have attached the logfile of the error below.
>>
>> Does anyone have any good suggestions for how to run this minimization?
>>
>> Cheers
>> Tue
>>
>>
>>
>> Charm++> No provisioning arguments specified. Running with a single PE.
>> Use +auto-provision to fully subscribe resources or +p1 to
>> silence this message.
>> Charm++: standalone mode (not using charmrun)
>> Charm++> Running in Multicore mode: 1 threads (PEs)
>> Charm++> Using recursive bisection (scheme 3) for topology aware
>> partitions
>> Converse/Charm++ Commit ID: v6.10.2-0-g7bf00fa
>> Warning> Randomization of virtual memory (ASLR) is turned on in the
>> kernel, thread migration may not work! Run 'echo 0 >
>> /proc/sys/kernel/randomize_va_space' as root to disable it, or try running
>> with '+isomalloc_sync'.
>> CharmLB> Load balancer assumes all CPUs are same.
>> Charm++> Running on 1 hosts (1 sockets x 8 cores x 2 PUs = 16-way SMP)
>> Charm++> cpu topology info is gathered in 0.000 seconds.
>> Info: Built with CUDA version 11000
>> Did not find +devices i,j,k,... argument, using all
>> Pe 0 physical rank 0 binding to CUDA device 0 on tue-ubuntu: 'NVIDIA
>> GeForce RTX 3090' Mem: 24265MB Rev: 8.6 PCI: 0:9:0
>> Info: NAMD 3.0alpha9 for Linux-x86_64-multicore-CUDA
>> Info:
>> Info: Please visit http://www.ks.uiuc.edu/Research/namd/
>> Info: for updates, documentation, and support information.
>> Info:
>> Info: Please cite Phillips et al., J. Chem. Phys. 153:044130 (2020)
>> doi:10.1063/5.0014475
>> Info: in all publications reporting results obtained with NAMD.
>> Info:
>> Info: Based on Charm++/Converse 61002 for multicore-linux-x86_64-iccstatic
>> Info: Built Sun Feb 28 21:57:49 CST 2021 by jmaia on manila.ks.uiuc.edu
>> Info: 1 NAMD 3.0alpha9 Linux-x86_64-multicore-CUDA 1 tue-ubuntu tue
>> Info: Running on 1 processors, 1 nodes, 1 physical nodes.
>> Info: CPU topology information available.
>> Info: Charm++/Converse parallel runtime startup completed at 0.283695 s
>> Info: 0 MB of memory in use based on /proc/self/stat
>> Info: Using bitfields in atom data structures.
>> Info: sizeof( CompAtom ) = 32
>> Info: sizeof( CompAtomExt ) = 8
>> CkLoopLib is used in SMP with simple dynamic scheduling (converse-level
>> notification)
>> Info: Configuration file is
>> /media/tue/Data/Data/test_mini/RCSB/../relax_pdb//calc/AF_1W_1WU_1WUZ_1_A/namd/AF_1W_1WU_1WUZ_1_A.conf
>> Info: Changed directory to
>> /media/tue/Data/Data/test_mini/RCSB/../relax_pdb//calc/AF_1W_1WU_1WUZ_1_A/namd
>> TCL: Suspending until startup complete.
>> Warning: The following variables were set in the
>> Warning: configuration file but will be ignored:
>> Warning: paraTypeXplor (parameters)
>> Warning: paraTypeCharmm (parameters)
>> Info: Using TIP3P water model.
>> Warning: The Langevin gamma parameters differ over the particles,
>> Warning: requiring extra work per step to constrain rigid bonds.
>> Info: SIMULATION PARAMETERS:
>> Info: TIMESTEP 1
>> Info: NUMBER OF STEPS 0
>> Info: STEPS PER CYCLE 20
>> Info: PERIODIC CELL BASIS 1 281.56 0 0
>> Info: PERIODIC CELL BASIS 2 0 145.95 0
>> Info: PERIODIC CELL BASIS 3 0 0 286.421
>> Info: PERIODIC CELL CENTER -16.6987 1.34986 1.40648
>> Info: WRAPPING WATERS AROUND PERIODIC BOUNDARIES ON OUTPUT.
>> Info: LOAD BALANCER Centralized
>> Info: LOAD BALANCING STRATEGY New Load Balancers -- DEFAULT
>> Info: LDB PERIOD 4000 steps
>> Info: FIRST LDB TIMESTEP 100
>> Info: LAST LDB TIMESTEP -1
>> Info: LDB BACKGROUND SCALING 1
>> Info: HOM BACKGROUND SCALING 1
>> Info: PME BACKGROUND SCALING 1
>> Info: MIN ATOMS PER PATCH 40
>> Info: INITIAL TEMPERATURE 310
>> Info: CENTER OF MASS MOVING INITIALLY? NO
>> Info: DIELECTRIC 1
>> Info: EXCLUDE SCALED ONE-FOUR
>> Info: 1-4 ELECTROSTATICS SCALED BY 0.833333
>> Info: MODIFIED 1-4 VDW PARAMETERS WILL BE USED
>> Info: DCD FILENAME min1.dcd
>> Info: DCD FREQUENCY 200
>> Info: DCD FIRST STEP 200
>> Info: DCD FILE WILL CONTAIN UNIT CELL DATA
>> Info: NO EXTENDED SYSTEM TRAJECTORY OUTPUT
>> Info: NO VELOCITY DCD OUTPUT
>> Info: NO FORCE DCD OUTPUT
>> Info: OUTPUT FILENAME min1
>> Info: RESTART FILENAME min1.restart
>> Info: RESTART FREQUENCY 200
>> Info: BINARY RESTART FILES WILL BE USED
>> Info: CUTOFF 10
>> Info: PAIRLIST DISTANCE 16
>> Info: PAIRLIST SHRINK RATE 0.01
>> Info: PAIRLIST GROW RATE 0.01
>> Info: PAIRLIST TRIGGER 0.3
>> Info: PAIRLISTS PER CYCLE 2
>> Info: PAIRLIST OUTPUT STEPS 100
>> Info: PAIRLISTS ENABLED
>> Info: MARGIN 0.555
>> Info: HYDROGEN GROUP CUTOFF 2.5
>> Info: PATCH DIMENSION 19.055
>> Info: ENERGY OUTPUT STEPS 200
>> Info: ENERGY EVALUATION STEPS 200
>> Info: CROSSTERM ENERGY INCLUDED IN DIHEDRAL
>> Info: MOMENTUM OUTPUT STEPS 200
>> Info: TIMING OUTPUT STEPS 200
>> Info: PRESSURE OUTPUT STEPS 200
>> Info: LANGEVIN DYNAMICS ACTIVE
>> Info: LANGEVIN TEMPERATURE 310
>> Info: LANGEVIN USING BBK INTEGRATOR
>> Info: LANGEVIN DAMPING COEFFICIENT IS 5 INVERSE PS
>> Info: LANGEVIN DYNAMICS NOT APPLIED TO HYDROGENS
>> Info: LANGEVIN PISTON PRESSURE CONTROL ACTIVE
>> Info: TARGET PRESSURE IS 1.01325 BAR
>> Info: OSCILLATION PERIOD IS 200 FS
>> Info: DECAY TIME IS 100 FS
>> Info: PISTON TEMPERATURE IS 310 K
>> Info: PRESSURE CONTROL IS GROUP-BASED
>> Info: INITIAL STRAIN RATE IS 0 0 0
>> Info: CELL FLUCTUATION IS ISOTROPIC
>> Info: PARTICLE MESH EWALD (PME) ACTIVE
>> Info: PME TOLERANCE 1e-06
>> Info: PME EWALD COEFFICIENT 0.312341
>> Info: PME INTERPOLATION ORDER 4
>> Info: PME GRID DIMENSIONS 288 150 288
>> Info: PME MAXIMUM GRID SPACING 1
>> Info: Attempting to read FFTW data from
>> FFTW_NAMD_3.0alpha9_Linux-x86_64-multicore-CUDA.txt
>> Info: Optimizing 6 FFT steps. 1... 2... 3... 4... 5... 6... Done.
>> Info: Writing FFTW data to
>> FFTW_NAMD_3.0alpha9_Linux-x86_64-multicore-CUDA.txt
>> Info: FULL ELECTROSTATIC EVALUATION FREQUENCY 2
>> Info: USING VERLET I (r-RESPA) MTS SCHEME.
>> Info: C1 SPLITTING OF LONG RANGE ELECTROSTATICS
>> Info: PLACING ATOMS IN PATCHES BY HYDROGEN GROUPS
>> Info: RIGID BONDS TO HYDROGEN : ALL
>> Info: ERROR TOLERANCE : 1e-08
>> Info: MAX ITERATIONS : 100
>> Info: RIGID WATER USING SETTLE ALGORITHM
>> Info: RANDOM NUMBER SEED 1646117723
>> Info: USE HYDROGEN BONDS? NO
>> Info: Using AMBER format force field!
>> Info: AMBER PARM FILE
>> /media/tue/Data/Data/test_mini/RCSB/../relax_pdb//calc/AF_1W_1WU_1WUZ_1_A/leap/AF_1W_1WU_1WUZ_1_A_neutral.prmtop
>> Info: COORDINATE PDB
>> /media/tue/Data/Data/test_mini/RCSB/../relax_pdb//calc/AF_1W_1WU_1WUZ_1_A/leap/AF_1W_1WU_1WUZ_1_A_neutral.pdb
>> Info: Exclusions will be read from PARM file!
>> Info: SCNB (VDW SCALING) 2
>> Info: USING ARITHMETIC MEAN TO COMBINE L-J SIGMA PARAMETERS
>> Reading parm file
>> (/media/tue/Data/Data/test_mini/RCSB/../relax_pdb//calc/AF_1W_1WU_1WUZ_1_A/leap/AF_1W_1WU_1WUZ_1_A_neutral.prmtop)
>> ...
>> PARM file in AMBER 7 format
>> Warning: Skipping ATOMIC_NUMBER in parm file while seeking MASS.
>> Warning: Skipping SCEE_SCALE_FACTOR in parm file while seeking SOLTY.
>> Warning: Skipping SCNB_SCALE_FACTOR in parm file while seeking SOLTY.
>> Warning: Found 485687 H-H bonds.
>> Info: SUMMARY OF PARAMETERS:
>> Info: 67 BONDS
>> Info: 153 ANGLES
>> Info: 198 DIHEDRAL
>> Info: 0 IMPROPER
>> Info: 0 CROSSTERM
>> Info: 0 VDW
>> Info: 153 VDW_PAIRS
>> Info: 0 NBTHOLE_PAIRS
>> Info: Reading pdb file
>> /media/tue/Data/Data/test_mini/RCSB/../relax_pdb//calc/AF_1W_1WU_1WUZ_1_A/leap/AF_1W_1WU_1WUZ_1_A_neutral.pdb
>> Info: TIME FOR READING PDB FILE: 0.900383
>> Info:
>> Info: LONG-RANGE LJ: APPLYING ANALYTICAL CORRECTIONS TO ENERGY AND
>> PRESSURE
>> Info: LONG-RANGE LJ: AVERAGE A AND B COEFFICIENTS 574955 AND 581.291
>> Info: ****************************
>> Info: STRUCTURE SUMMARY:
>> Info: 1472897 ATOMS
>> Info: 1471677 BONDS
>> Info: 26517 ANGLES
>> Info: 65693 DIHEDRALS
>> Info: 0 IMPROPERS
>> Info: 0 CROSSTERMS
>> Info: 1536544 EXCLUSIONS
>> Info: 1464268 RIGID BONDS
>> Info: 2954423 DEGREES OF FREEDOM
>> Info: 494316 HYDROGEN GROUPS
>> Info: 4 ATOMS IN LARGEST HYDROGEN GROUP
>> Info: 494316 MIGRATION GROUPS
>> Info: 4 ATOMS IN LARGEST MIGRATION GROUP
>> Info: TOTAL MASS = 8.89287e+06 amu
>> Info: TOTAL CHARGE = -3.33139e-05 e
>> Info: MASS DENSITY = 1.25465 g/cm^3
>> Info: ATOM DENSITY = 0.125139 atoms/A^3
>> Info: *****************************
>> Info:
>> Info: Entering startup at 44.7864 s, 0 MB of memory in use
>> Info: Startup phase 0 took 0.000248679 s, 0 MB of memory in use
>> Info: ADDED 0 IMPLICIT EXCLUSIONS
>> Info: Startup phase 1 took 0.201876 s, 0 MB of memory in use
>> Info: NONBONDED TABLE R-SQUARED SPACING: 0.0625
>> Info: NONBONDED TABLE SIZE: 705 POINTS
>> Info: ABSOLUTE IMPRECISION IN FAST TABLE FORCE: 2.64698e-22 AT 9.94673
>> Info: RELATIVE IMPRECISION IN FAST TABLE FORCE: 5.64247e-16 AT 9.94673
>> Info: INCONSISTENCY IN FAST TABLE ENERGY VS FORCE: 0.000290479 AT 0.251946
>> Info: ABSOLUTE IMPRECISION IN SCOR TABLE FORCE: 2.11758e-22 AT 9.94673
>> Info: RELATIVE IMPRECISION IN SCOR TABLE FORCE: 5.86184e-16 AT 9.94673
>> Info: INCONSISTENCY IN SCOR TABLE ENERGY VS FORCE: 0.000178193 AT 9.97184
>> Info: ABSOLUTE IMPRECISION IN VDWA TABLE FORCE: 1.00974e-28 AT 9.99687
>> Info: INCONSISTENCY IN VDWA TABLE ENERGY VS FORCE: 0.0040507 AT 0.251946
>> Info: ABSOLUTE IMPRECISION IN VDWB TABLE FORCE: 6.2204e-22 AT 9.99687
>> Info: INCONSISTENCY IN VDWB TABLE ENERGY VS FORCE: 0.00150189 AT 0.251946
>> Info: Startup phase 2 took 0.00026995 s, 0 MB of memory in use
>> Info: Startup phase 3 took 1.1152e-05 s, 0 MB of memory in use
>> Info: Startup phase 4 took 0.00199203 s, 0 MB of memory in use
>> Info: Startup phase 5 took 1.6442e-05 s, 0 MB of memory in use
>> Info: PATCH GRID IS 14 (PERIODIC) BY 7 (PERIODIC) BY 15 (PERIODIC)
>> Info: PATCH GRID IS 1-AWAY BY 1-AWAY BY 1-AWAY
>> Info: REMOVING COM VELOCITY 0.000245969 -0.00383036 -0.00162446
>> Info: LARGEST PATCH (736) HAS 112867 ATOMS
>> Info: TORUS A SIZE 1 USING 0
>> Info: TORUS B SIZE 1 USING 0
>> Info: TORUS C SIZE 1 USING 0
>> Info: TORUS MINIMAL MESH SIZE IS 1 BY 1 BY 1
>> Info: Placed 100% of base nodes on same physical node as patch
>> Info: Startup phase 6 took 0.193281 s, 0 MB of memory in use
>> Info: Use 3D box decompostion in PME FFT.
>> Info: PME using 1 x 1 x 1 pencil grid for FFT and reciprocal sum.
>> Info: Startup phase 7 took 0.000113754 s, 0 MB of memory in use
>> Info: Updated CUDA force table with 4096 elements.
>> Info: Updated CUDA LJ table with 17 x 17 elements.
>> Info: Startup phase 8 took 0.0210923 s, 0 MB of memory in use
>> Info: Startup phase 9 took 3.051e-05 s, 0 MB of memory in use
>> Info: Startup phase 10 took 1.0641e-05 s, 0 MB of memory in use
>> Info: Startup phase 11 took 0.000820673 s, 0 MB of memory in use
>> LDB: Central LB being created...
>> Info: Startup phase 12 took 0.000622233 s, 0 MB of memory in use
>> Info: CREATING 30878 COMPUTE OBJECTS
>> Info: Found 348 unique exclusion lists needing 1216 bytes
>> Info: Startup phase 13 took 0.320426 s, 0 MB of memory in use
>> Info: Startup phase 14 took 4.9448e-05 s, 0 MB of memory in use
>> Info: Startup phase 15 took 0.00141796 s, 0 MB of memory in use
>> Info: Finished startup at 45.5287 s, 0 MB of memory in use
>>
>> TCL: Minimizing for 100 steps
>> FATAL ERROR: CudaTileListKernel::buildTileLists, maximum shared memory
>> allocation exceeded. Too many atoms in a patch
>> [Partition 0][Node 0] End of program
>>
>
