NVIDIA Driver

Summary: Author: 张亚飞 | Read Time: 3 minute read | Published: 2016-09-22
Filed under Categories: Linux | Tags: Note

NVIDIA Installation

List the graphics cards

sudo lshw -C display

Install the NVIDIA driver

apt install -y nvidia-driver-535-server

Use nvidia-smi to list the NVIDIA GPUs

nvidia-smi

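Reference: Ubuntu Linux Install Nvidia Driver (Latest Proprietary Driver)

Check the CUDA compiler version (nvcc ships with the CUDA toolkit, not with the driver package)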

nvcc --version

Monitor GPU memory usage in real time

nvidia-smi -l 5  # refresh every 5 seconds

Refresh once per second; change the number after -n to adjust the interval

watch -n 1 -d nvidia-smi
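
For scripted monitoring, nvidia-smi can also emit CSV through its --query-gpu and --format options. The following is a minimal sketch, assuming that CSV is piped in on stdin; the function name gpu_mem_percent is made up for this example and is not part of nvidia-smi.

```shell
# gpu_mem_percent: hypothetical helper, not part of nvidia-smi.
# Reads lines of "index, memory.used, memory.total" (MiB, no units)
# from stdin and prints "GPU <index>: <percent>%" for each line.
# Lines whose total is not a positive number are skipped.
gpu_mem_percent() {
    awk -F', *' 'NF == 3 && ($3 + 0) > 0 {
        printf "GPU %s: %.1f%%\n", $1, ($2 / $3) * 100
    }'
}

# Intended usage on a machine with the driver installed:
#   nvidia-smi --query-gpu=index,memory.used,memory.total \
#              --format=csv,noheader,nounits | gpu_mem_percent
```

Keeping the parsing in a function that reads stdin makes it possible to try it on saved output from another machine.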

Selecting GPUs

With docker run, the --gpus all flag exposes all of the host's GPUs to the container

docker run -it --rm --gpus all proxy.icsay.com/library/ubuntu nvidia-smi

GPU list

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:21:00.0 Off |                  Off |
| 45%   25C    P8              6W /  450W |       1MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        Off |   00000000:81:00.0 Off |                  Off |
| 45%   26C    P8              6W /  450W |       1MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

You can also bind a specific GPU by its index, for example the GPU with index 0

docker run -it --rm --gpus "device=0" ubuntu nvidia-smi

Bind the GPUs with index 0 and 1:

docker run -it --rm --gpus '"device=0,1"' ubuntu nvidia-smi

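Note: when binding multiple GPUs, the device list must be wrapped in double quotes inside the single quotes. Docker parses the --gpus value as comma-separated fields, so an unquoted comma would split the device list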
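
The nested quoting is easy to get wrong when the GPU indices come from a script. The following is a minimal sketch of a helper that builds the value; the name gpus_device_arg is hypothetical and chosen only for this example.

```shell
# gpus_device_arg: hypothetical helper that joins GPU indices with commas
# and wraps the result in literal double quotes, the form used above for
# a device list with more than one entry.
gpus_device_arg() {
    list=$(printf '%s,' "$@")
    # ${list%,} strips the trailing comma left by printf
    printf '"device=%s"\n' "${list%,}"
}

# Intended usage:
#   docker run -it --rm --gpus "$(gpus_device_arg 0 1)" ubuntu nvidia-smi
```

The outer double quotes around the command substitution are consumed by the shell; the inner ones printed by the function are what Docker receives.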

You can also select GPUs with the NVIDIA_VISIBLE_DEVICES environment variable. Note that using this variable requires the --runtime=nvidia flag

docker run -it --rm --runtime=nvidia --env NVIDIA_VISIBLE_DEVICES=0,1 ubuntu nvidia-smi

NVIDIA_VISIBLE_DEVICES can also be set to a GPU bus ID. The following commands bind the GPUs with bus ID 00000000:21:00.0 and 00000000:81:00.0 respectively

docker run -it --rm --runtime=nvidia --env NVIDIA_VISIBLE_DEVICES=00000000:21:00.0 ubuntu nvidia-smi
docker run -it --rm --runtime=nvidia --env NVIDIA_VISIBLE_DEVICES=00000000:81:00.0 ubuntu nvidia-smi

To bind multiple GPUs, separate the bus IDs with commas

docker run -it --rm --runtime=nvidia --env NVIDIA_VISIBLE_DEVICES=00000000:21:00.0,00000000:81:00.0 ubuntu nvidia-smi
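
Rather than copying bus IDs by hand, they can be read from nvidia-smi and joined into one value. The following is a minimal sketch, assuming one bus ID per line on stdin; join_bus_ids is a hypothetical name used only here.

```shell
# join_bus_ids: hypothetical helper. Reads one PCI bus ID per line from
# stdin, ignores blank lines, and prints a single comma-separated value.
join_bus_ids() {
    awk 'NF { ids = ids sep $1; sep = "," } END { print ids }'
}

# Intended usage on a machine with the driver installed:
#   ids=$(nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader | join_bus_ids)
#   docker run -it --rm --runtime=nvidia \
#       --env NVIDIA_VISIBLE_DEVICES="$ids" ubuntu nvidia-smi
```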

Other

Use --device together with NVIDIA_DRIVER_CAPABILITIES to choose the device node and the driver capabilities exposed to the container

docker run -it --rm --runtime=nvidia --device /dev/nvidia0:/dev/nvidia0 --env NVIDIA_DRIVER_CAPABILITIES="compute,utility" --env NVIDIA_VISIBLE_DEVICES=00000000:81:00.0 ubuntu nvidia-smi

The following commands bind all GPUs

docker run --gpus 'all,"capabilities=compute,utility"' --rm -it ubuntu nvidia-smi
docker run -e NVIDIA_DRIVER_CAPABILITIES=all --gpus all --rm -it ubuntu nvidia-smi

References:
Runtime options with Memory, CPUs, and GPUs
浅谈 docker 挂载 GPU 原理 (A brief look at how Docker mounts GPUs)
Failed to specify all capability #1128
nvidia-smi executable file not found in $PATH #1668
如何优雅的管理你的GPU (How to manage your GPUs gracefully)


FAQ

After installing the NVIDIA driver with apt install -y nvidia-driver-535-server, nvidia-smi reported the following error

# nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 535.183

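Searching online suggests that the installed driver libraries do not match the version of the NVIDIA kernel module currently loaded on the system

Check the version of the loaded kernel module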

# cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  550.54.15  Tue Mar  5 22:23:56 UTC 2024
GCC version:  gcc version 12.3.0 (Ubuntu 12.3.0-1ubuntu1~22.04)

The loaded kernel module appears to be 550.54.15, but the version I installed is 535.183
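
The comparison above can be scripted. The following is a rough sketch with two hypothetical helpers; it assumes the NVRM line has the layout shown above, and it compares only the branch number (the part before the first dot), so it tells 550 from 535 but cannot detect a mismatch within one branch.

```shell
# Both helpers are hypothetical names made up for this sketch.

# Prints the version of the loaded kernel module, given the contents of
# /proc/driver/nvidia/version on stdin.
kernel_module_version() {
    awk '/^NVRM version:/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+(\.[0-9]+)+$/) { print $i; exit }
    }'
}

# Succeeds when both versions share the same branch number.
same_driver_branch() {
    [ "${1%%.*}" = "${2%%.*}" ]
}

# Intended usage:
#   loaded=$(kernel_module_version < /proc/driver/nvidia/version)
#   same_driver_branch "$loaded" 535.183 || echo "mismatch: loaded $loaded"
```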

List the installable drivers

ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0 ==
modalias : pci:v000010DEd00002684sv000010DEsd000016F3bc03sc00i00
vendor   : NVIDIA Corporation
driver   : nvidia-driver-535-server - distro non-free
driver   : nvidia-driver-545-open - distro non-free
driver   : nvidia-driver-545 - distro non-free
driver   : nvidia-driver-535 - distro non-free recommended
driver   : nvidia-driver-535-server-open - distro non-free
driver   : nvidia-driver-535-open - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin

The list does not offer a 550 driver, so I reused the install command above and simply changed the version number to 550

apt install -y nvidia-driver-550-server
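
To check from a script whether a given package appears in the ubuntu-drivers devices list, the driver lines can be extracted as follows. This is a minimal sketch that assumes the output layout shown above; offered_driver_packages is a hypothetical name.

```shell
# offered_driver_packages: hypothetical helper. Reads the output of
# "ubuntu-drivers devices" from stdin and prints one package name per line.
offered_driver_packages() {
    awk '$1 == "driver" && $2 == ":" { print $3 }'
}

# Intended usage:
#   ubuntu-drivers devices | offered_driver_packages \
#       | grep -qxF nvidia-driver-550-server || echo "not in the offered list"
```

As the install above shows, a package that is absent from this list may still be installable with apt, so the check reports what the tool offers rather than what the package index contains.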
