Re: Fwd: nvidia issue with namd12 Debian 11

From: Vermaas, Josh (vermaasj_at_msu.edu)
Date: Mon Jan 17 2022 - 12:24:31 CST

How big is your system? The error being tossed back indicates that you are out of memory. The GTX 680 has only 2 GB of memory, so depending on your system size you may simply be running out.
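One way to check this is to look at how much memory each card reports as total and in use before NAMD starts. The following is a minimal sketch, assuming 'nvidia-smi' is on the PATH and that its CSV query output contains plain integers in MiB (on some cards it may report "[N/A]" instead, which this sketch does not handle). The function names are hypothetical and are not part of NAMD or of any NVIDIA tool.

```python
import subprocess

# nvidia-smi query: one CSV row per GPU, no header, memory values in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.total,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse rows of 'index, name, total, used' into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split the index off the front and the two numbers off the back,
        # so a comma inside the card name would not break the parse.
        index, rest = line.split(",", 1)
        name, total, used = rest.rsplit(",", 2)
        total_mib, used_mib = int(total), int(used)
        gpus.append({
            "index": int(index),
            "name": name.strip(),
            "total_mib": total_mib,
            "used_mib": used_mib,
            "free_mib": total_mib - used_mib,
        })
    return gpus


def query_gpu_memory():
    """Run nvidia-smi and return the parsed per-GPU memory figures."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    for gpu in query_gpu_memory():
        print(f"GPU {gpu['index']} ({gpu['name']}): "
              f"{gpu['free_mib']} MiB free of {gpu['total_mib']} MiB")
```

Running this just before the minimization, and again while it is starting up, gives a rough idea of whether the job is approaching the 2 GB limit. Memory already taken by the X server counts against the same total.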

-Josh

From: <owner-namd-l_at_ks.uiuc.edu> on behalf of Francesco Pietra <chiendarret_at_gmail.com>
Reply-To: "namd-l_at_ks.uiuc.edu" <namd-l_at_ks.uiuc.edu>, Francesco Pietra <chiendarret_at_gmail.com>
Date: Monday, January 17, 2022 at 4:40 AM
To: NAMD <namd-l_at_ks.uiuc.edu>, debian-users <debian-user_at_lists.debian.org>
Subject: namd-l: Fwd: nvidia issue with namd12 Debian 11

I forgot to add that the commands 'nvidia-detect' and 'nvidia-smi' detect both GTX 680 cards as activated and report that they are supported by all driver versions, including those for Tesla 450.
In fact, legacy nvidia drivers are required only for very old nvidia graphics cards, from 400 downwards.

I should also add that the box is at CUDA 11.2.

---------- Forwarded message ---------
From: Francesco Pietra <chiendarret_at_gmail.com>
Date: Mon, Jan 17, 2022 at 4:15 AM
Subject: nvidia issue with namd12 Debian 11
To: NAMD <namd-l_at_ks.uiuc.edu>, debian-users <debian-user_at_lists.debian.org>

On a Debian 11 box with two GTX 680 cards, I am unable to get the cards working. The problem appeared after upgrading from Debian 10 to 11 and from namd 11 to 12 (/NAMD_Git-2021-11-27_Linux-x86_64-multicore-CUDA).

nvidia-driver 460.91.03-1
linux-image-amd64 5.10.84-1
linux kernel 5.10.0-10-amd64

Error when trying a minimization:

TCL: Minimizing for 3000 steps
FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file src/CudaTileListKernel.cu, function sortTileLists, line 1577
 on Pe 2 (gig64 device 0 pci 0:2:0): an illegal memory access was encountered
FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file src/CudaTileListKernel.cu, function sortTileLists, line 1577
 on Pe 2 (gig64 device 0 pci 0:2:0): an illegal memory access was encountered
[Partition 0][Node 0] End of program
FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file src/CudaTileListKernel.cu, function sortTileLists, line 1577
 on Pe 4 (gig64 device 1 pci 0:3:0): an illegal memory access was encountered
FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file src/CudaTileListKernel.cu, function sortTileLists, line 1577
 on Pe 4 (gig64 device 1 pci 0:3:0): an illegal memory access was encountered

I have also reconfigured the xserver, to no avail.

I have noticed reports of namd12/nvidia issues on the web, apparently unresolved.

Thanks for any advice
francesco pietra

