1. Introduction

1.1 CUDA

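CUDA is NVIDIA's parallel computing platform for its own GPUs. It runs only on NVIDIA GPUs, and it pays off only when the problem can be split into a large number of computations that run in parallel.

Supported GPU models and compute capability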

1.2 cuDNN

cuDNN (CUDA Deep Neural Network library) is NVIDIA's GPU-accelerated library for deep neural networks. It is not required for training a model on a GPU, but it is generally used.


2. Preparation

2.1 Stop the desktop (display manager)

$ sudo service lightdm stop
# or
$ sudo service sddm stop 

2.2 Remove third-party drivers

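# Check whether the nouveau driver is currently loaded
$ lsmod | grep nouveau

# Remove previously installed NVIDIA packages (the pattern is quoted so the shell does not expand it)
$ sudo apt-get remove --purge 'nvidia*'
$ sudo apt-get autoremove

# Blacklist nouveau: put the two lines below into this file
$ sudo vi /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0

# Rebuild the initramfs so the blacklist applies at boot
$ sudo update-initramfs -u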

$ sudo reboot
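
After the reboot, it may help to confirm that nouveau is really gone before running any installer. The following is a minimal sketch, not part of the original steps: `nouveau_loaded` is a hypothetical helper name, and it only inspects `lsmod`-style text passed on stdin.

```shell
# nouveau_loaded is a hypothetical helper (not part of any NVIDIA tool).
# It succeeds when lsmod-style text on stdin lists a module named "nouveau".
nouveau_loaded() {
    # lsmod prints the module name in the first column
    awk '$1 == "nouveau" { found = 1 } END { exit found ? 0 : 1 }'
}

# Usage on a real system:
#   if lsmod | nouveau_loaded; then echo "nouveau is still loaded"; else echo "nouveau is not loaded"; fi
```

Matching on the first column avoids false positives from other modules that merely list nouveau in their "Used by" column.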

CentOS 7

NVIDIA GeForce GT 610 GPU installed in this system is supported through the NVIDIA 390.xx legacy Linux graphics drivers

My graphics card is too old, so only older versions can be installed.

Before installing the CUDA Toolkit on Linux, please ensure that you have the latest NVIDIA driver R390 installed. The latest NVIDIA R390 driver is available at: www.nvidia.com/drivers

Download the R390 driver: NVIDIA-Linux-x86_64-390.141.run

CUDA installer: cuda_9.1.85_387.26_linux.run, from https://developer.nvidia.com/zh-cn/cuda-downloads (choose an older release)

Find a driver

Notes

CUDA Toolkit and Minimum Compatible Driver Versions
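
To make the driver/toolkit pairing concrete, here is a small sketch that compares two driver version strings. `version_ge` is a hypothetical helper; it relies on `sort -V`, which is a GNU coreutils extension. Treating the driver version embedded in the runfile name (387.26) as the minimum for CUDA 9.1 is an assumption made for illustration; the compatibility table referenced above is the authority.

```shell
# version_ge is a hypothetical helper: it succeeds when version $1 is greater
# than or equal to version $2, using GNU sort's version ordering (sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# 390.141 is the R390 driver downloaded above; 387.26 is the driver version
# embedded in the cuda_9.1.85_387.26_linux.run file name (assumed minimum).
if version_ge 390.141 387.26; then
    echo "driver is new enough"
else
    echo "driver is too old"
fi
```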

$ sudo rpm -i cuda-repo-rhel7-9-1-local-9.1.85-1.x86_64.rpm
$ sudo yum clean all
$ sudo yum install cuda

$ sudo yum install kmod-nvidia-390xx*

The steps below use the runfile method instead. That attempt failed while compiling the kernel module.

$ lsmod | grep nouveau

$ sudo vi /etc/modprobe.d/nvidia-installer-disable-nouveau.conf
blacklist nouveau
options nouveau modeset=0

$ sudo mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-nouveau.img
$ sudo dracut /boot/initramfs-$(uname -r).img  $(uname -r)

$ sudo reboot
$ sudo ./cuda_9.1.85_387.26_linux.run

Handling the installation failure

Installing the NVIDIA display driver...
The driver installation is unable to locate the kernel source. Please make sure that the kernel source packages are installed and set up correctly.
If you know that the kernel source packages are installed and set up correctly, you may pass the location of the kernel source with the '--kernel-source-path' flag.

===========
= Summary =
===========

Driver:   Installation Failed
Toolkit:  Installation skipped
Samples:  Installation skipped


Logfile is /tmp/cuda_install_27244.log.

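# Cause: the kernel source is missing

# Fix
$ sudo yum --enablerepo=elrepo-kernel install kernel-ml-devel
# I was not running the default kernel, so the kernel source path has to be passed explicitly. Also check the gcc version: the kernel build failed under gcc 10. In the end I restored both the kernel and gcc to their original versions.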
$ sudo ./cuda_9.1.85_387.26_linux.run  --kernel-source-path=/usr/src/kernels/3.10.0-1160.21.1.el7.x86_64

After rebooting, run step 1.1 again.
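
Before re-running the installer, one can check that the kernel source directory for the running kernel actually exists. This is a sketch under the assumption that the devel packages use the same `/usr/src/kernels/<release>` layout as the path passed to `--kernel-source-path` above; `kernel_src_dir` is a hypothetical helper name.

```shell
# kernel_src_dir is a hypothetical helper: it builds the directory where the
# kernel devel package is assumed to place the sources for a given release.
kernel_src_dir() {
    printf '/usr/src/kernels/%s\n' "$1"
}

# Usage on a real system:
#   dir=$(kernel_src_dir "$(uname -r)")
#   if [ -d "$dir" ]; then echo "found $dir"; else echo "missing $dir"; fi
```

If the directory is missing, the installed devel package probably does not match the running kernel, which would lead to the "unable to locate the kernel source" error shown in the log.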


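3. Installation

Download the driver and library installers that match your graphics card from the NVIDIA website. You can also download just the CUDA installer, which includes the NVIDIA driver.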

File                                  Description
NVIDIA-Linux-x86_64-375.20.run        NVIDIA driver installer
cuda_11.1.0_455.23.05_linux.run       CUDA library installer
cudnn-11.1-linux-x64-v8.0.4.30.tgz    cuDNN library archive
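
The CUDA runfile names used in these notes follow the pattern cuda_<toolkit>_<driver>_linux.run, so the bundled driver version can be read straight from the file name. A minimal sketch, where `runfile_versions` is a hypothetical helper and the pattern is assumed to hold for the file being inspected:

```shell
# runfile_versions is a hypothetical helper that splits a CUDA runfile name of
# the form cuda_<toolkit>_<driver>_linux.run into its two version fields.
runfile_versions() {
    name=${1##*/}            # drop any leading directory
    name=${name#cuda_}       # strip the "cuda_" prefix
    name=${name%_linux.run}  # strip the "_linux.run" suffix
    toolkit=${name%%_*}      # text before the first underscore
    driver=${name#*_}        # text after the first underscore
    printf 'toolkit=%s driver=%s\n' "$toolkit" "$driver"
}

runfile_versions cuda_11.1.0_455.23.05_linux.run
# prints: toolkit=11.1.0 driver=455.23.05
```

The driver field should match the "Driver Version" that nvidia-smi reports after the runfile has installed its bundled driver.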

3.1 Install with the CUDA installer

$ chmod a+x cuda_11.1.0_455.23.05_linux.run
$ sudo ./cuda_11.1.0_455.23.05_linux.run

3.2 Start the desktop (display manager)

$ sudo service lightdm start
# or
$ sudo service sddm start

3.3 Verification

$ nvidia-smi

Thu Mar 11 09:10:21 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05    Driver Version: 455.23.05    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce MX150       Off  | 00000000:03:00.0 Off |                  N/A |
| N/A   39C    P0    N/A /  N/A |      0MiB /  2002MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
~/NVIDIA_CUDA-11.1_Samples/1_Utilities/deviceQuery$ ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce MX150"
  CUDA Driver Version / Runtime Version          11.1 / 11.1
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 2003 MBytes (2099904512 bytes)
  ( 3) Multiprocessors, (128) CUDA Cores/MP:     384 CUDA Cores
  GPU Max Clock rate:                            1532 MHz (1.53 GHz)
  Memory Clock rate:                             3004 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        98304 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 3 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.1, CUDA Runtime Version = 11.1, NumDevs = 1
Result = PASS
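
For scripted checks, the last line of the deviceQuery output can be tested instead of reading the whole report. A minimal sketch; `device_query_passed` is a hypothetical helper that only looks for the "Result = PASS" line on stdin.

```shell
# device_query_passed is a hypothetical helper: it succeeds when the text on
# stdin contains a line starting with "Result = PASS", as deviceQuery prints.
device_query_passed() {
    grep -q '^Result = PASS'
}

# Usage on a real system, from the deviceQuery sample directory:
#   if ./deviceQuery | device_query_passed; then echo "CUDA runtime OK"; else echo "deviceQuery failed"; fi
```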

( 3) Multiprocessors, (128) CUDA Cores/MP: 384 CUDA Cores

The GPU has 3 streaming multiprocessors (SMs), each containing 128 stream processors, for a total of 384 CUDA cores.
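
The total is simply the SM count multiplied by the cores per SM (3 × 128 = 384). The sketch below recomputes it from deviceQuery-style text; `total_cuda_cores` is a hypothetical helper, and the pattern assumes the line format shown in the output above.

```shell
# total_cuda_cores is a hypothetical helper: it reads deviceQuery-style text on
# stdin and multiplies the SM count by the cores-per-SM count.
total_cuda_cores() {
    # Extract the two numbers in parentheses, e.g. "( 3)" and "(128)"
    sed -n 's/^ *( *\([0-9][0-9]*\)) Multiprocessors, ( *\([0-9][0-9]*\)) CUDA Cores\/MP:.*/\1 \2/p' |
    while read -r sms per_sm; do
        echo $((sms * per_sm))
    done
}

echo '  ( 3) Multiprocessors, (128) CUDA Cores/MP:     384 CUDA Cores' | total_cuda_cores
# prints: 384
```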


4. Samples

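4.1 CUDA Samples directories

Directory                 Description
Simple Reference          Basic CUDA samples for beginners, illustrating key concepts of CUDA and the CUDA runtime APIs.
Utilities Reference       Sample programs that query device capabilities and measure GPU/CPU bandwidth.
Graphics Reference        Graphics samples that show interoperability between CUDA and OpenGL or DirectX.
Imaging Reference         Image processing, compression, and data analysis.
Finance Reference         Parallel processing for financial computations.
Simulations Reference     Simulation algorithms implemented with CUDA.
Advanced Reference        Advanced algorithms implemented with CUDA.
Cudalibraries Reference   Samples that show how to use the CUDA libraries (NPP, CUBLAS, CUFFT, CUSPARSE, and CURAND).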

Adapted from

5. Related documentation

Thrust library documentation