:bulb: This post describes how to resolve the Failed to initialize NVML: Driver/library version mismatch error that appears when running nvidia-smi after the NVIDIA driver is installed.

[01] Situation

  • Ubuntu server with an NVIDIA GPU installed
  • GPU is in use after installing nvidia-driver
  • Running nvidia-smi fails with Failed to initialize NVML: Driver/library version mismatch

    nvidia-error-01

[02] Cause

2-1. Identify the problem

dmesg

nvidia-error-02

  • NVRM: API mismatch ... error message
  • Client version is 470.103.01, but the kernel module version is 470.86
  • unattended-upgrade automatically installed security updates for the user-space NVIDIA libraries, while the kernel module that was already loaded stayed at the old version, which caused the version gap
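The gap can also be checked directly on the server by comparing the version of the loaded kernel module with the version of the module file on disk. This is a minimal sketch, assuming the driver exposes /sys/module/nvidia/version and that the on-disk module was upgraded together with the libraries; the function names loaded_version and compare_versions are illustrative, not part of any NVIDIA tool.

```shell
# Print the version of the NVIDIA module currently loaded in the kernel.
# $1: version file (assumed default: /sys/module/nvidia/version)
# Prints nothing if the file is missing, i.e. the module is not loaded.
loaded_version() {
  local f="${1:-/sys/module/nvidia/version}"
  if [ -r "$f" ]; then
    tr -d '[:space:]' < "$f"
  fi
}

# Compare the loaded version ($1) with the installed version ($2).
compare_versions() {
  local loaded="$1" installed="$2"
  if [ -z "$loaded" ]; then
    echo "not-loaded"
  elif [ "$loaded" = "$installed" ]; then
    echo "match"
  else
    echo "mismatch"
  fi
}

# Usage on the server:
#   compare_versions "$(loaded_version)" "$(modinfo -F version nvidia)"
```

A result of mismatch means the running module is older or newer than what is installed, which is the situation the NVRM message describes.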

2-2. Check what unattended-upgrade ran

cat /var/log/apt/history.log

nvidia-error-03

view /var/log/unattended-upgrades/unattended-upgrades.log.1.gz

nvidia-error-04

nvidia-error-05

  • The log shows that libnvidia-* packages were automatically updated on 2022-02-08.

:bulb: If you can’t find the relevant entries in unattended-upgrades.log, check the older log files (e.g. log.1.gz).
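Instead of opening each rotated log by hand, the current and compressed logs can be searched in one pass. A minimal sketch, assuming zgrep (shipped with gzip) is installed and the logs live in the default directory; the function name nvidia_upgrade_lines is hypothetical.

```shell
# Print log lines that mention NVIDIA packages, including rotated .gz logs.
# $1: log directory (assumed default: /var/log/unattended-upgrades)
nvidia_upgrade_lines() {
  local dir="${1:-/var/log/unattended-upgrades}"
  # zgrep reads both plain and gzip-compressed files
  zgrep -i 'nvidia' "$dir"/unattended-upgrades.log* 2>/dev/null
}

# Usage on the server (reading the logs may require root):
#   nvidia_upgrade_lines
```

The same approach works for /var/log/apt/history.log* by running zgrep against those files.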

[03] Solution 1: Prevent unattended-upgrade

Exclude the NVIDIA-related packages from the unattended-upgrade target list.

Edit /etc/apt/apt.conf.d/50unattended-upgrades:

// Based on Ubuntu 24.04
// Add the following entries to /etc/apt/apt.conf.d/50unattended-upgrades.
// Each entry is a regular expression matched against package names.

Unattended-Upgrade::Package-Blacklist {
  "nvidia-";
  "libnvidia-";
};
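After editing the file, it helps to confirm that apt actually parsed the blacklist. A minimal sketch that filters the output of apt-config dump; the function name blacklist_entries is hypothetical, and the sed pattern assumes list items are dumped in the form Name:: "value";.

```shell
# Print the Package-Blacklist entries found in `apt-config dump` output.
# Reads the dump on stdin and prints one entry per line.
blacklist_entries() {
  sed -n 's/^Unattended-Upgrade::Package-Blacklist:: "\(.*\)";$/\1/p'
}

# Usage on the server:
#   apt-config dump | blacklist_entries
```

If the entries appear in the output, running unattended-upgrade --dry-run --debug as root is one way to check that the matching packages are skipped without installing anything.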


:small_blue_diamond: Reference: link

[04] Solution 2: Remove the NVIDIA module (without rebooting)

If the update has already been installed automatically, unloading the old kernel module that is still loaded resolves the problem without a reboot.

4-1. Check the NVIDIA kernel modules

lsmod | grep nvidia

nvidia-error-06

4-2. Remove the modules

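# Remove the listed modules (as root). Unload dependents such as nvidia_drm,
# nvidia_modeset, and nvidia_uvm before the core nvidia module.
rmmod $module_name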

# ex)
rmmod nvidia_uvm
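Because rmmod refuses to unload a module that another module still depends on, the order matters. A minimal sketch that prints the loaded NVIDIA modules in an order that is safe to unload, assuming the usual four module names; the function name nvidia_unload_order is hypothetical.

```shell
# Print loaded NVIDIA modules in a safe unload order:
# dependent modules first, the core nvidia module last.
# Reads `lsmod` output on stdin (module name in the first column).
nvidia_unload_order() {
  local loaded m
  loaded=$(awk '{ print $1 }')
  for m in nvidia_drm nvidia_modeset nvidia_uvm nvidia; do
    # -x matches the whole line, so "nvidia" does not match "nvidia_uvm"
    if printf '%s\n' "$loaded" | grep -qx "$m"; then
      echo "$m"
    fi
  done
}

# Usage on the server (as root):
#   lsmod | nvidia_unload_order | while read -r m; do rmmod "$m"; done
```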

:warning: If you see an error like ERROR: Module nvidia_drm is in use, terminate the processes that are using it.

nvidia-error-07

# Find and kill processes using NVIDIA
lsof /dev/nvidia*
kill -9 $PID

# ex)
kill -9 449143
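kill -9 gives a process no chance to clean up, so a gentler variant is to send SIGTERM first and use SIGKILL only for processes that are still alive after a grace period. A minimal sketch; the function name stop_pids and the grace period are assumptions, not something the original steps require.

```shell
# Ask each PID to exit with SIGTERM, then send SIGKILL to any that remain.
# $1: grace period in seconds; remaining arguments: PIDs
stop_pids() {
  local grace="$1" pid
  shift
  for pid in "$@"; do
    kill -TERM "$pid" 2>/dev/null
  done
  sleep "$grace"
  for pid in "$@"; do
    # kill -0 only tests whether the process still exists
    if kill -0 "$pid" 2>/dev/null; then
      kill -KILL "$pid" 2>/dev/null
    fi
  done
}

# Usage on the server (as root); lsof -t prints PIDs only:
#   stop_pids 5 $(lsof -t /dev/nvidia* | sort -u)
```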

nvidia-error-08

4-3. Verify

Run nvidia-smi again. The modules are reloaded automatically, now at the updated version, and the command works again.
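For a scripted check, the output of nvidia-smi can be tested for the error text from the beginning of this post. A minimal sketch; the function name nvml_ok is hypothetical.

```shell
# Succeed if the text on stdin does not contain the NVML initialization error.
nvml_ok() {
  ! grep -q 'Failed to initialize NVML'
}

# Usage on the server:
#   nvidia-smi 2>&1 | nvml_ok && echo "driver OK"
```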

nvidia-error-09