Re: FATAL ERROR: CudaTileListKernel::buildTileLists,

From: Giacomo Fiorin (giacomo.fiorin_at_gmail.com)
Date: Tue Mar 01 2022 - 08:47:11 CST

And btw, nvidia-smi should show memory usage by all processes using the GPU
via CUDA.
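
A minimal sketch of reading that per-process usage from a script, assuming nvidia-smi is on the PATH. The --query-compute-apps and --format options are nvidia-smi's own; the helper names parse_compute_apps and gpu_memory_by_process are hypothetical, chosen only for this example.

```python
import csv
import io
import subprocess

# Fields requested from `nvidia-smi --query-compute-apps`.
FIELDS = ("pid", "process_name", "used_memory")


def parse_compute_apps(csv_text):
    """Parse `--format=csv,noheader,nounits` output into a list of dicts."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(record) != len(FIELDS):
            continue  # skip blank or malformed lines
        pid, name, used = (field.strip() for field in record)
        rows.append({
            "pid": int(pid),
            "process_name": name,
            # Some drivers report a placeholder such as [N/A] instead of a number.
            "used_mib": int(used) if used.isdigit() else None,
        })
    return rows


def gpu_memory_by_process():
    """Run nvidia-smi and return the memory used by each CUDA process."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-compute-apps=" + ",".join(FIELDS),
            "--format=csv,noheader,nounits",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    for proc in gpu_memory_by_process():
        print(proc["pid"], proc["process_name"], proc["used_mib"], "MiB")
```

Note that this reports device (global) memory only. The error in this thread refers to shared memory, a much smaller on-chip resource, so a low reading here would not rule the error out.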

On Tue, Mar 1, 2022 at 9:45 AM Giacomo Fiorin <giacomo.fiorin_at_gmail.com>
wrote:

> I think that the key here is not the total number of atoms, but this
> message:
> *Too many atoms in a patch*
> i.e. the local density seems to be a problem.
>
> Without knowing how the system was set up, the best suggestion would be to
> minimize it without the GPU, i.e. with NAMD 2 or the CUDASOA flag off.
> Unlike MD, which can be run in mixed precision under certain conditions,
> minimization is much more reliable if run in double precision. I am not
> sure if the current GPU code allows that.
>
> Giacomo
>
> On Tue, Mar 1, 2022 at 2:16 AM Tue Boesen <alyflex_at_gmail.com> wrote:
>
>> I'm trying to run energy minimization using NAMD 3.0alpha9 for
>> Linux-x86_64-multicore-CUDA. It works well for smaller systems, but for
>> large systems I consistently get this error:
>>
>> FATAL ERROR: CudaTileListKernel::buildTileLists, maximum shared memory
>> allocation exceeded. Too many atoms in a patch
>>
>> I'm running the minimization on a GeForce RTX 3090 with 24 GB of memory, so
>> I believe I should have enough memory, though it doesn't tell me exactly
>> how much it is using.
>>
>> The system I'm minimizing has about 1.5M atoms, and consists of a protein
>> in a box of water with a few Na+ Cl- ions.
>>
>> I have attached the logfile of the error below.
>>
>> Does anyone have any good suggestions for how to run this minimization?
>>
>> Cheers
>> Tue
>>
>>
>>
>> Charm++> No provisioning arguments specified. Running with a single PE.
>> Use +auto-provision to fully subscribe resources or +p1 to
>> silence this message.
>> Charm++: standalone mode (not using charmrun)
>> Charm++> Running in Multicore mode: 1 threads (PEs)
>> Charm++> Using recursive bisection (scheme 3) for topology aware
>> partitions
>> Converse/Charm++ Commit ID: v6.10.2-0-g7bf00fa
>> Warning> Randomization of virtual memory (ASLR) is turned on in the
>> kernel, thread migration may not work! Run 'echo 0 >
>> /proc/sys/kernel/randomize_va_space' as root to disable it, or try running
>> with '+isomalloc_sync'.
>> CharmLB> Load balancer assumes all CPUs are same.
>> Charm++> Running on 1 hosts (1 sockets x 8 cores x 2 PUs = 16-way SMP)
>> Charm++> cpu topology info is gathered in 0.000 seconds.
>> Info: Built with CUDA version 11000
>> Did not find +devices i,j,k,... argument, using all
>> Pe 0 physical rank 0 binding to CUDA device 0 on tue-ubuntu: 'NVIDIA
>> GeForce RTX 3090' Mem: 24265MB Rev: 8.6 PCI: 0:9:0
>> Info: NAMD 3.0alpha9 for Linux-x86_64-multicore-CUDA
>> Info:
>> Info: Please visit http://www.ks.uiuc.edu/Research/namd/
>> Info: for updates, documentation, and support information.
>> Info:
>> Info: Please cite Phillips et al., J. Chem. Phys. 153:044130 (2020)
>> doi:10.1063/5.0014475
>> Info: in all publications reporting results obtained with NAMD.
>> Info:
>> Info: Based on Charm++/Converse 61002 for multicore-linux-x86_64-iccstatic
>> Info: Built Sun Feb 28 21:57:49 CST 2021 by jmaia on manila.ks.uiuc.edu
>> Info: 1 NAMD 3.0alpha9 Linux-x86_64-multicore-CUDA 1 tue-ubuntu tue
>> Info: Running on 1 processors, 1 nodes, 1 physical nodes.
>> Info: CPU topology information available.
>> Info: Charm++/Converse parallel runtime startup completed at 0.283695 s
>> Info: 0 MB of memory in use based on /proc/self/stat
>> Info: Using bitfields in atom data structures.
>> Info: sizeof( CompAtom ) = 32
>> Info: sizeof( CompAtomExt ) = 8
>> CkLoopLib is used in SMP with simple dynamic scheduling (converse-level
>> notification)
>> Info: Configuration file is
>> /media/tue/Data/Data/test_mini/RCSB/../relax_pdb//calc/AF_1W_1WU_1WUZ_1_A/namd/AF_1W_1WU_1WUZ_1_A.conf
>> Info: Changed directory to
>> /media/tue/Data/Data/test_mini/RCSB/../relax_pdb//calc/AF_1W_1WU_1WUZ_1_A/namd
>> TCL: Suspending until startup complete.
>> Warning: The following variables were set in the
>> Warning: configuration file but will be ignored:
>> Warning: paraTypeXplor (parameters)
>> Warning: paraTypeCharmm (parameters)
>> Info: Using TIP3P water model.
>> Warning: The Langevin gamma parameters differ over the particles,
>> Warning: requiring extra work per step to constrain rigid bonds.
>> Info: SIMULATION PARAMETERS:
>> Info: TIMESTEP 1
>> Info: NUMBER OF STEPS 0
>> Info: STEPS PER CYCLE 20
>> Info: PERIODIC CELL BASIS 1 281.56 0 0
>> Info: PERIODIC CELL BASIS 2 0 145.95 0
>> Info: PERIODIC CELL BASIS 3 0 0 286.421
>> Info: PERIODIC CELL CENTER -16.6987 1.34986 1.40648
>> Info: WRAPPING WATERS AROUND PERIODIC BOUNDARIES ON OUTPUT.
>> Info: LOAD BALANCER Centralized
>> Info: LOAD BALANCING STRATEGY New Load Balancers -- DEFAULT
>> Info: LDB PERIOD 4000 steps
>> Info: FIRST LDB TIMESTEP 100
>> Info: LAST LDB TIMESTEP -1
>> Info: LDB BACKGROUND SCALING 1
>> Info: HOM BACKGROUND SCALING 1
>> Info: PME BACKGROUND SCALING 1
>> Info: MIN ATOMS PER PATCH 40
>> Info: INITIAL TEMPERATURE 310
>> Info: CENTER OF MASS MOVING INITIALLY? NO
>> Info: DIELECTRIC 1
>> Info: EXCLUDE SCALED ONE-FOUR
>> Info: 1-4 ELECTROSTATICS SCALED BY 0.833333
>> Info: MODIFIED 1-4 VDW PARAMETERS WILL BE USED
>> Info: DCD FILENAME min1.dcd
>> Info: DCD FREQUENCY 200
>> Info: DCD FIRST STEP 200
>> Info: DCD FILE WILL CONTAIN UNIT CELL DATA
>> Info: NO EXTENDED SYSTEM TRAJECTORY OUTPUT
>> Info: NO VELOCITY DCD OUTPUT
>> Info: NO FORCE DCD OUTPUT
>> Info: OUTPUT FILENAME min1
>> Info: RESTART FILENAME min1.restart
>> Info: RESTART FREQUENCY 200
>> Info: BINARY RESTART FILES WILL BE USED
>> Info: CUTOFF 10
>> Info: PAIRLIST DISTANCE 16
>> Info: PAIRLIST SHRINK RATE 0.01
>> Info: PAIRLIST GROW RATE 0.01
>> Info: PAIRLIST TRIGGER 0.3
>> Info: PAIRLISTS PER CYCLE 2
>> Info: PAIRLIST OUTPUT STEPS 100
>> Info: PAIRLISTS ENABLED
>> Info: MARGIN 0.555
>> Info: HYDROGEN GROUP CUTOFF 2.5
>> Info: PATCH DIMENSION 19.055
>> Info: ENERGY OUTPUT STEPS 200
>> Info: ENERGY EVALUATION STEPS 200
>> Info: CROSSTERM ENERGY INCLUDED IN DIHEDRAL
>> Info: MOMENTUM OUTPUT STEPS 200
>> Info: TIMING OUTPUT STEPS 200
>> Info: PRESSURE OUTPUT STEPS 200
>> Info: LANGEVIN DYNAMICS ACTIVE
>> Info: LANGEVIN TEMPERATURE 310
>> Info: LANGEVIN USING BBK INTEGRATOR
>> Info: LANGEVIN DAMPING COEFFICIENT IS 5 INVERSE PS
>> Info: LANGEVIN DYNAMICS NOT APPLIED TO HYDROGENS
>> Info: LANGEVIN PISTON PRESSURE CONTROL ACTIVE
>> Info: TARGET PRESSURE IS 1.01325 BAR
>> Info: OSCILLATION PERIOD IS 200 FS
>> Info: DECAY TIME IS 100 FS
>> Info: PISTON TEMPERATURE IS 310 K
>> Info: PRESSURE CONTROL IS GROUP-BASED
>> Info: INITIAL STRAIN RATE IS 0 0 0
>> Info: CELL FLUCTUATION IS ISOTROPIC
>> Info: PARTICLE MESH EWALD (PME) ACTIVE
>> Info: PME TOLERANCE 1e-06
>> Info: PME EWALD COEFFICIENT 0.312341
>> Info: PME INTERPOLATION ORDER 4
>> Info: PME GRID DIMENSIONS 288 150 288
>> Info: PME MAXIMUM GRID SPACING 1
>> Info: Attempting to read FFTW data from
>> FFTW_NAMD_3.0alpha9_Linux-x86_64-multicore-CUDA.txt
>> Info: Optimizing 6 FFT steps. 1... 2... 3... 4... 5... 6... Done.
>> Info: Writing FFTW data to
>> FFTW_NAMD_3.0alpha9_Linux-x86_64-multicore-CUDA.txt
>> Info: FULL ELECTROSTATIC EVALUATION FREQUENCY 2
>> Info: USING VERLET I (r-RESPA) MTS SCHEME.
>> Info: C1 SPLITTING OF LONG RANGE ELECTROSTATICS
>> Info: PLACING ATOMS IN PATCHES BY HYDROGEN GROUPS
>> Info: RIGID BONDS TO HYDROGEN : ALL
>> Info: ERROR TOLERANCE : 1e-08
>> Info: MAX ITERATIONS : 100
>> Info: RIGID WATER USING SETTLE ALGORITHM
>> Info: RANDOM NUMBER SEED 1646117723
>> Info: USE HYDROGEN BONDS? NO
>> Info: Using AMBER format force field!
>> Info: AMBER PARM FILE
>> /media/tue/Data/Data/test_mini/RCSB/../relax_pdb//calc/AF_1W_1WU_1WUZ_1_A/leap/AF_1W_1WU_1WUZ_1_A_neutral.prmtop
>> Info: COORDINATE PDB
>> /media/tue/Data/Data/test_mini/RCSB/../relax_pdb//calc/AF_1W_1WU_1WUZ_1_A/leap/AF_1W_1WU_1WUZ_1_A_neutral.pdb
>> Info: Exclusions will be read from PARM file!
>> Info: SCNB (VDW SCALING) 2
>> Info: USING ARITHMETIC MEAN TO COMBINE L-J SIGMA PARAMETERS
>> Reading parm file
>> (/media/tue/Data/Data/test_mini/RCSB/../relax_pdb//calc/AF_1W_1WU_1WUZ_1_A/leap/AF_1W_1WU_1WUZ_1_A_neutral.prmtop)
>> ...
>> PARM file in AMBER 7 format
>> Warning: Skipping ATOMIC_NUMBER in parm file while seeking MASS.
>> Warning: Skipping SCEE_SCALE_FACTOR in parm file while seeking SOLTY.
>> Warning: Skipping SCNB_SCALE_FACTOR in parm file while seeking SOLTY.
>> Warning: Found 485687 H-H bonds.
>> Info: SUMMARY OF PARAMETERS:
>> Info: 67 BONDS
>> Info: 153 ANGLES
>> Info: 198 DIHEDRAL
>> Info: 0 IMPROPER
>> Info: 0 CROSSTERM
>> Info: 0 VDW
>> Info: 153 VDW_PAIRS
>> Info: 0 NBTHOLE_PAIRS
>> Info: Reading pdb file
>> /media/tue/Data/Data/test_mini/RCSB/../relax_pdb//calc/AF_1W_1WU_1WUZ_1_A/leap/AF_1W_1WU_1WUZ_1_A_neutral.pdb
>> Info: TIME FOR READING PDB FILE: 0.900383
>> Info:
>> Info: LONG-RANGE LJ: APPLYING ANALYTICAL CORRECTIONS TO ENERGY AND
>> PRESSURE
>> Info: LONG-RANGE LJ: AVERAGE A AND B COEFFICIENTS 574955 AND 581.291
>> Info: ****************************
>> Info: STRUCTURE SUMMARY:
>> Info: 1472897 ATOMS
>> Info: 1471677 BONDS
>> Info: 26517 ANGLES
>> Info: 65693 DIHEDRALS
>> Info: 0 IMPROPERS
>> Info: 0 CROSSTERMS
>> Info: 1536544 EXCLUSIONS
>> Info: 1464268 RIGID BONDS
>> Info: 2954423 DEGREES OF FREEDOM
>> Info: 494316 HYDROGEN GROUPS
>> Info: 4 ATOMS IN LARGEST HYDROGEN GROUP
>> Info: 494316 MIGRATION GROUPS
>> Info: 4 ATOMS IN LARGEST MIGRATION GROUP
>> Info: TOTAL MASS = 8.89287e+06 amu
>> Info: TOTAL CHARGE = -3.33139e-05 e
>> Info: MASS DENSITY = 1.25465 g/cm^3
>> Info: ATOM DENSITY = 0.125139 atoms/A^3
>> Info: *****************************
>> Info:
>> Info: Entering startup at 44.7864 s, 0 MB of memory in use
>> Info: Startup phase 0 took 0.000248679 s, 0 MB of memory in use
>> Info: ADDED 0 IMPLICIT EXCLUSIONS
>> Info: Startup phase 1 took 0.201876 s, 0 MB of memory in use
>> Info: NONBONDED TABLE R-SQUARED SPACING: 0.0625
>> Info: NONBONDED TABLE SIZE: 705 POINTS
>> Info: ABSOLUTE IMPRECISION IN FAST TABLE FORCE: 2.64698e-22 AT 9.94673
>> Info: RELATIVE IMPRECISION IN FAST TABLE FORCE: 5.64247e-16 AT 9.94673
>> Info: INCONSISTENCY IN FAST TABLE ENERGY VS FORCE: 0.000290479 AT 0.251946
>> Info: ABSOLUTE IMPRECISION IN SCOR TABLE FORCE: 2.11758e-22 AT 9.94673
>> Info: RELATIVE IMPRECISION IN SCOR TABLE FORCE: 5.86184e-16 AT 9.94673
>> Info: INCONSISTENCY IN SCOR TABLE ENERGY VS FORCE: 0.000178193 AT 9.97184
>> Info: ABSOLUTE IMPRECISION IN VDWA TABLE FORCE: 1.00974e-28 AT 9.99687
>> Info: INCONSISTENCY IN VDWA TABLE ENERGY VS FORCE: 0.0040507 AT 0.251946
>> Info: ABSOLUTE IMPRECISION IN VDWB TABLE FORCE: 6.2204e-22 AT 9.99687
>> Info: INCONSISTENCY IN VDWB TABLE ENERGY VS FORCE: 0.00150189 AT 0.251946
>> Info: Startup phase 2 took 0.00026995 s, 0 MB of memory in use
>> Info: Startup phase 3 took 1.1152e-05 s, 0 MB of memory in use
>> Info: Startup phase 4 took 0.00199203 s, 0 MB of memory in use
>> Info: Startup phase 5 took 1.6442e-05 s, 0 MB of memory in use
>> Info: PATCH GRID IS 14 (PERIODIC) BY 7 (PERIODIC) BY 15 (PERIODIC)
>> Info: PATCH GRID IS 1-AWAY BY 1-AWAY BY 1-AWAY
>> Info: REMOVING COM VELOCITY 0.000245969 -0.00383036 -0.00162446
>> Info: LARGEST PATCH (736) HAS 112867 ATOMS
>> Info: TORUS A SIZE 1 USING 0
>> Info: TORUS B SIZE 1 USING 0
>> Info: TORUS C SIZE 1 USING 0
>> Info: TORUS MINIMAL MESH SIZE IS 1 BY 1 BY 1
>> Info: Placed 100% of base nodes on same physical node as patch
>> Info: Startup phase 6 took 0.193281 s, 0 MB of memory in use
>> Info: Use 3D box decompostion in PME FFT.
>> Info: PME using 1 x 1 x 1 pencil grid for FFT and reciprocal sum.
>> Info: Startup phase 7 took 0.000113754 s, 0 MB of memory in use
>> Info: Updated CUDA force table with 4096 elements.
>> Info: Updated CUDA LJ table with 17 x 17 elements.
>> Info: Startup phase 8 took 0.0210923 s, 0 MB of memory in use
>> Info: Startup phase 9 took 3.051e-05 s, 0 MB of memory in use
>> Info: Startup phase 10 took 1.0641e-05 s, 0 MB of memory in use
>> Info: Startup phase 11 took 0.000820673 s, 0 MB of memory in use
>> LDB: Central LB being created...
>> Info: Startup phase 12 took 0.000622233 s, 0 MB of memory in use
>> Info: CREATING 30878 COMPUTE OBJECTS
>> Info: Found 348 unique exclusion lists needing 1216 bytes
>> Info: Startup phase 13 took 0.320426 s, 0 MB of memory in use
>> Info: Startup phase 14 took 4.9448e-05 s, 0 MB of memory in use
>> Info: Startup phase 15 took 0.00141796 s, 0 MB of memory in use
>> Info: Finished startup at 45.5287 s, 0 MB of memory in use
>>
>> TCL: Minimizing for 100 steps
>> FATAL ERROR: CudaTileListKernel::buildTileLists, maximum shared memory
>> allocation exceeded. Too many atoms in a patch
>> [Partition 0][Node 0] End of program
>>
>
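
The point about local density can be checked with a little arithmetic on the numbers in the quoted log. The sketch below is only an illustration: it assumes that all patches have equal volume (the cell divided evenly by the patch grid), and the reference value for liquid water (about 0.1 atoms/A^3) is an assumption that does not come from the log. The function name patch_stats is hypothetical.

```python
# Values copied from the NAMD log quoted in this thread.
N_ATOMS = 1_472_897                 # STRUCTURE SUMMARY: ATOMS
CELL = (281.56, 145.95, 286.421)    # PERIODIC CELL BASIS 1-3 (orthorhombic), in A
PATCH_GRID = (14, 7, 15)            # PATCH GRID IS 14 BY 7 BY 15
LARGEST_PATCH_ATOMS = 112_867       # LARGEST PATCH (736) HAS 112867 ATOMS

# Assumption (not from the log): approximate atom density of liquid water.
WATER_ATOM_DENSITY = 0.1            # atoms/A^3


def patch_stats(n_atoms, cell, grid, largest_patch_atoms):
    """Compare the largest patch with the average patch, assuming equal-volume patches."""
    n_patches = grid[0] * grid[1] * grid[2]
    cell_volume = cell[0] * cell[1] * cell[2]
    patch_volume = cell_volume / n_patches
    mean_atoms = n_atoms / n_patches
    return {
        "n_patches": n_patches,
        "mean_atoms_per_patch": mean_atoms,
        "largest_to_mean_ratio": largest_patch_atoms / mean_atoms,
        "overall_density": n_atoms / cell_volume,
        "largest_patch_density": largest_patch_atoms / patch_volume,
    }


if __name__ == "__main__":
    stats = patch_stats(N_ATOMS, CELL, PATCH_GRID, LARGEST_PATCH_ATOMS)
    for key, value in stats.items():
        print(f"{key}: {value:.4g}")
    print(f"water reference density: {WATER_ATOM_DENSITY} atoms/A^3")
```

With these inputs the average patch holds about 1000 atoms, so the largest patch holds more than 100 times the average, at a local density of roughly 14 atoms/A^3. That would be consistent with many atoms sitting at nearly the same coordinates, although the log alone cannot confirm the cause.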
