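This guide covers installing an NVIDIA Docker environment on CentOS 7 for GPU programming and development work. It uses CUDA 10.1, but the installation steps also apply to CUDA 11.x. It draws mainly on the following official documentation:
The "Installation Guide" in the NVIDIA Cloud Native Technologies documentation. If the official documentation changes, the latest version takes precedence.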
Note: before installing nvidia-docker2, install the NVIDIA driver, CUDA, and cuDNN.
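As a rough pre-flight check for the prerequisites in the note above, the sketch below looks for the command-line tools that normally come with the driver and the CUDA toolkit. The helper name check_tool is made up for this illustration. The sketch assumes nvcc is on PATH (a CUDA install under /usr/local/cuda/bin may need PATH adjusted first), and it does not check cuDNN, which has no command-line tool of its own.

```shell
#!/bin/sh
# check_tool is a hypothetical helper, not part of any NVIDIA tooling.
# It reports whether a command can be found on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

# nvidia-smi ships with the NVIDIA driver; nvcc ships with the CUDA toolkit.
# Assumption: both are expected on PATH once the prerequisites are installed.
for tool in nvidia-smi nvcc; do
    check_tool "$tool" || true
done
```

A "missing" line only means the tool was not found on PATH. It does not by itself prove that the driver or CUDA is absent.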
Setting up Docker on CentOS 7
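If you are on a cloud instance such as EC2, the official CentOS images may not include tools such as iptables that are required for a successful Docker installation. Before proceeding with the remaining steps in this document, try this command to get a more functional VM.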
$ sudo yum install -y tar bzip2 make automake gcc gcc-c++ vim pciutils elfutils-libelf-devel libglvnd-devel iptables
Set up the official Docker CE repository:
$ sudo yum-config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
Now you can list the packages available from the docker-ce repository:
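$ sudo yum repolist -v
CentOS does not provide the specific versions of the containerd.io package that newer versions of Docker-CE require. One option is to install the containerd.io package manually and then proceed to install the docker-ce packages.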
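To make the manual containerd.io step easier to follow, the sketch below shows how the RPM URL used in this guide is put together from a version and a release number. The helper name containerd_rpm_url is invented for illustration. Only version 1.4.3, release 3.1 comes from this guide; whether other versions exist under the same path is an assumption to verify against the repository listing.

```shell
#!/bin/sh
# containerd_rpm_url is a hypothetical helper used only for illustration.
# It builds the URL of a containerd.io RPM in Docker's CentOS 7 stable repo.
containerd_rpm_url() {
    version=$1   # upstream version, e.g. 1.4.3
    release=$2   # package release, e.g. 3.1
    base="https://download.docker.com/linux/centos/7/x86_64/stable/Packages"
    echo "$base/containerd.io-$version-$release.el7.x86_64.rpm"
}

# Reproduce the URL for the version used in this guide.
containerd_rpm_url 1.4.3 3.1
```

The printed URL can be passed directly to sudo yum install -y.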
Install the containerd.io package:
$ sudo yum install -y https://download.docker.com/linux/centos/7/x86_64/stable/Packages/containerd.io-1.4.3-3.1.el7.x86_64.rpm
Now install the latest docker-ce package:
$ sudo yum install docker-ce -y
Make sure the Docker service is running with the following command:
$ sudo systemctl --now enable docker
Finally, test your Docker installation by running the hello-world container:
$ sudo docker run --rm hello-world
This should produce console output like the following:
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
0e03bdcc26d7: Pull complete
Digest: sha256:7f0a9f93b4aa3022c3a4c147a449bf11e0941a1fd0bf4a8e6c9408b2600777c5
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

Setting up NVIDIA Container Toolkit
Set up the stable repository and the GPG key:
Note: type everything after the $ prompt (distribution= .....) directly on the command line, then press Enter.
This automatically creates the nvidia-docker.repo file under /etc/yum.repos.d/.
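The repository command for this step builds its distribution value by sourcing /etc/os-release and concatenating ID and VERSION_ID. The sketch below isolates that one step so the value can be inspected before it goes into the repository URL. The helper name distro_id and its optional path argument are made up for illustration, and the sample file only mimics what a CentOS 7 system is assumed to contain.

```shell
#!/bin/sh
# distro_id is a hypothetical helper. It prints "$ID$VERSION_ID" from an
# os-release style file; the optional path argument exists only so the
# function can be tried against a sample file.
distro_id() {
    # Source the file in a subshell so ID and VERSION_ID do not leak
    # into the calling shell.
    ( . "${1:-/etc/os-release}" && echo "$ID$VERSION_ID" )
}

# Try it against a sample file that mimics CentOS 7 (assumed contents).
sample=$(mktemp)
printf 'ID="centos"\nVERSION_ID="7"\n' > "$sample"
distro_id "$sample"    # prints centos7
rm -f "$sample"
```

Called without an argument, distro_id reads the real /etc/os-release, which is the same file the repository command sources.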
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID) && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
After updating the package listing, install the nvidia-docker2 package (and its dependencies):
$ sudo yum clean expire-cache
$ sudo yum install -y nvidia-docker2
With the default runtime set, restart the Docker daemon to complete the installation:
$ sudo systemctl restart docker
At this point, the setup can be tested by running a base CUDA container:
$ sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
This should produce console output like the following:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
With the cuda:10.0 image, the output looks like this:
The document can be translated into Chinese as follows: