深度学习 GPU环境 Ubuntu 16.04 + Nvidia GTX 1060 + Python 3.5 + CUDA 9.0 + cuDNN 7.1 + TensorFlow 1.6 环境配置

接上篇ubuntu16.04安装nvidia驱动

按照上篇教程已经把nvidia驱动安装好

安装CUDA 9.0

如果存在之前的旧版本,可以选择先卸载,以免和新的 CUDA 版本产生冲突,在 /usr/local/cuda/bin 目录下有一个uninstallcuda*.pl 文件,可以直接运行卸载,命令如下:

1
sudo ./uninstall_cuda_*.pl

这样即可将 CUDA 全部卸载。
接下来我们再下载 CUDA 9.0,注意 TensorFlow 1.5 和 1.6 版本依然只是兼容 CUDA 9.0,没有兼容 CUDA 9.1,所以不要下载 9.1,CUDA 9.0 的下载地址是:https://developer.nvidia.com/cuda-90-download-archive ,然后依次勾选好系统的版本
这里我们选择 Linux-x86_64-Ubuntu-16.04-runfile 的配置,然后点击 Base Installer 部分的 Download 按钮,下载 CUDA 9.0 安装包。
对应的下载命令是:

1
wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run

执行此命令,等待下载完成即可。
接下来执行安装,运行如下命令:

1
sudo  ./cuda_9.0.176_384.81_linux-run

安装过程需要输入一些确认选项,过程如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
Description

The NVIDIA CUDA Toolkit provides command-line and graphical
tools for building, debugging and optimizing the performance
Do you accept the previously read EULA?
accept/decline/quit: accept

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
(y)es/(n)o/(q)uit: n

Install the CUDA 9.0 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
[ default is /usr/local/cuda-9.0 ]:

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y

Install the CUDA 9.0 Samples?
(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location
[ default is /home/cqc ]:

Installing the CUDA Toolkit in /usr/local/cuda-9.0 ...

最后如果出现这样的提示,就证明 CUDA 安装好了:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Driver:   Not Selected
Toolkit: Installed in /usr/local/cuda-9.0
Samples: Installed in /home/cqc, but missing recommended libraries

Please make sure that
- PATH includes /usr/local/cuda-9.0/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-9.0/lib64, or, add /usr/local/cuda-9.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.0/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.0/doc/pdf for detailed information on setting up CUDA.

***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 384.00 is required for CUDA 9.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run -silent -driver

然后我们需要配置一下环境变量,更改 ~/.bashrc 文件,添加如下几行:

1
2
3
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda

修改完毕之后执行一下使其生效:

1
source ~/.bashrc

这时我们输出 CUDA_HOME、LD_LIBRARY_PATH 就可以看到对应的输出了:

1
2
3
4
echo $CUDA_HOME
/usr/local/cuda
echo $LD_LIBRARY_PATH
/usr/local/cuda/lib64

这样就代表环境变量生效了,CUDA 安装完成。

安装cuDNN 7.1

cuDNN 的全称是 The NVIDIA CUDA® Deep Neural Network library,是专门用来对深度学习加速的库,它支持 Caffe2, MATLAB, Microsoft Cognitive Toolkit, TensorFlow, Theano 及 PyTorch 等深度学习的加速优化,目前最新版本是 cuDNN 7.1,接下来我们来看下它的安装方式。
下载链接:https://developer.nvidia.com/rdp/cudnn-download ,需要注册之后才能打开,这里我们选择 cuDNN v7.1.1 (Feb 28, 2018), for CUDA 9.0,然后选择 cuDNN v7.1.1 Library for Linux

下载下来之后解压安装即可,但是在这里我下载下来之后后缀名不是.tgz,直接暴力改名,把后缀名改成了.tgz

1
2
3
4
5
6
mv cudnn-9.0-linux-x64-v7.1.solitairetheme8 cudnn-9.0-linux-x64-v7.1.tgz
tar -zxf cudnn-9.0-linux-x64-v7.1.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/ -d
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*

执行完如上命令之后,cuDNN 就安装好了,这时我们可以发现在 /usr/local/cuda/include 目录下就多了 cudnn.h 头文件。

安装TensorFlow 1.6

我使用的是ubuntu16.04自带的python3.5,刚开始时可能pip没有安装,执行如下命令安装pip

1
sudo apt install python3-pip

然后pip换清华镜像源,参考我之前的一篇博客
接下来我们直接安装 TensorFlow 1.6 即可,TensorFlow 1.6 版本针对 CUDA 9 和 cuDNN 7 做了优化,可以预构建二进制文件。
这里需要安装的是 TensorFlow 的 GPU 版本,命令如下:

1
pip3 install tensorflow-gpu==1.6.0

安装完成之后验证一下:

1
2
3
4
5
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))
b'Hello, TensorFlow!'

中间输出如下信息说明安装成功

1
2
3
4
5
6
7
8
2018-04-03 10:26:04.737972: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-04-03 10:26:04.974013: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-04-03 10:26:04.974426: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7715
pciBusID: 0000:06:00.0
totalMemory: 5.93GiB freeMemory: 5.75GiB
2018-04-03 10:26:04.974440: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-04-03 10:26:05.219382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5534 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:06:00.0, compute capability: 6.1)

至此,全部环境配置都成功

打赏

请我喝杯咖啡吧~

支付宝
微信