94. TensorFlow-2 Installation

Windows environment

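1. Install CUDA and cuDNN

  1. Install CUDA

    1. Check your graphics card information; see [CUDA version and driver version compatibility table]

    2. Download the matching CUDA version; see [download page for all CUDA versions]

      Do not pick the newest release: choose at least one release below the actual latest version.

      Preferably go a few releases below the latest.

    3. Install CUDA, unchecking the following 3 components:

      1. NVIDIA Geforce Experience ……
      2. CUDAVisual Studio ……
      3. driver ……display driver
    4. Leave everything else at the defaults

    5. Note: for version compatibility and other important details, see the references at the end of this section

      Observation: my graphics card supports a CUDA version higher than 10.1, but that caused no problems.

  2. Install cuDNN

    1. Download the version that matches your CUDA version (avoid the newest release; read the official documentation before deciding which one to download). Click here to download
    2. Extract the archive
    3. Name the extracted folder cudnn
    4. Copy it to the directory: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
  3. Add the environment variables; see the references for details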

2. Install TensorFlow

  1. Install the CPU version of TensorFlow:
    pip install --upgrade tensorflow
  2. Install the GPU version of TensorFlow:
    pip install --upgrade tensorflow-gpu

3. Errors

  1. Error message:
    Traceback (most recent call last):
    File "C:\Users\toy\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 18, in swig_import_helper
    return importlib.import_module(mname)
    File "C:\Users\toy\AppData\Local\Programs\Python\Python35\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
    File "<frozen importlib._bootstrap>", line 986, in _gcd_import
    File "<frozen importlib._bootstrap>", line 969, in _find_and_load
    File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
    File "<frozen importlib._bootstrap>", line 666, in _load_unlocked
    File "<frozen importlib._bootstrap>", line 577, in module_from_spec
    File "<frozen importlib._bootstrap_external>", line 906, in create_module
    File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
    ImportError: DLL load failed: 找不到指定的模块。


    During handling of the above exception, another exception occurred:


    Traceback (most recent call last):
    File "C:\Users\toy\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 41, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
    File "C:\Users\toy\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 21, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
    File "C:\Users\toy\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 20, in swig_import_helper
    return importlib.import_module('_pywrap_tensorflow_internal')
    File "C:\Users\toy\AppData\Local\Programs\Python\Python35\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
    ImportError: No module named '_pywrap_tensorflow_internal'


    During handling of the above exception, another exception occurred:


    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "C:\Users\toy\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\__init__.py", line 24, in <module>
    from tensorflow.python import *
    File "C:\Users\toy\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\__init__.py", line 51, in <module>
    from tensorflow.python import pywrap_tensorflow
    File "C:\Users\toy\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 52, in <module>
    raise ImportError(msg)
    ImportError: Traceback (most recent call last):
    File "C:\Users\toy\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 18, in swig_import_helper
    return importlib.import_module(mname)
    File "C:\Users\toy\AppData\Local\Programs\Python\Python35\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
    File "<frozen importlib._bootstrap>", line 986, in _gcd_import
    File "<frozen importlib._bootstrap>", line 969, in _find_and_load
    File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
    File "<frozen importlib._bootstrap>", line 666, in _load_unlocked
    File "<frozen importlib._bootstrap>", line 577, in module_from_spec
    File "<frozen importlib._bootstrap_external>", line 906, in create_module
    File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
    ImportError: DLL load failed: 找不到指定的模块。


    During handling of the above exception, another exception occurred:


    Traceback (most recent call last):
    File "C:\Users\toy\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 41, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
    File "C:\Users\toy\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 21, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
    File "C:\Users\toy\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 20, in swig_import_helper
    return importlib.import_module('_pywrap_tensorflow_internal')
    File "C:\Users\toy\AppData\Local\Programs\Python\Python35\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
    ImportError: No module named '_pywrap_tensorflow_internal'

    Failed to load the native TensorFlow runtime.

    See https://www.tensorflow.org/install/install_sources#common_installation_problems

    for some common reasons and solutions. Include the entire stack trace
    above this error message when asking for help.
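  2. Fix: installing VS2019 resolves the error
  3. In the installer, select these workloads:
    1. Universal Windows Platform development
    2. .NET desktop development
    3. ASP.NET and web development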

4. Check whether the GPU is being used:

  1. tf.test.is_gpu_available()
  2. Or list the local devices:
    from tensorflow.python.client import device_lib

    print(device_lib.list_local_devices())
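In TensorFlow 2.1 and later, tf.test.is_gpu_available() is marked deprecated in favor of tf.config.list_physical_devices("GPU"). A minimal sketch of that check, assuming TensorFlow may or may not be importable in the current environment; the helper name visible_gpus is hypothetical:

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see, or [] if there are none.

    Hypothetical helper: a failed import is treated as "no GPUs". The DLL load
    failure shown in the traceback above is also raised as an ImportError.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return []
    # tf.config.list_physical_devices is available in TensorFlow 2.1+.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    print("GPUs visible to TensorFlow:", gpus if gpus else "none")
```

An empty result means either that TensorFlow could not be imported or that it runs on the CPU only, so it is worth checking the import separately when debugging.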

References

Reference video 1kc2

Installation files qlgj

Ubuntu environment

Install CUDA

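  1. Check the graphics card information

    $ nvidia-smi

    The output includes the following:

    Component     Version
    GPU driver    450.51.05
    CUDA          11.0

    Note:

    1. If no NVIDIA driver has ever been installed, nvidia-smi reports an error. In that case you can install CUDA directly, because the CUDA package also installs the GPU driver.
    2. If nvidia-smi still reports an error after CUDA is installed, reboot the machine.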
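  2. Download CUDA [click here to download]

    Note: choose the package that matches your system (I installed 11.0)

    The site shows the corresponding install commands, for example: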

    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
    sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
    wget http://developer.download.nvidia.com/compute/cuda/11.0.2/local_installers/cuda-repo-ubuntu1804-11-0-local_11.0.2-450.51.05-1_amd64.deb
    sudo dpkg -i cuda-repo-ubuntu1804-11-0-local_11.0.2-450.51.05-1_amd64.deb
    sudo apt-key add /var/cuda-repo-ubuntu1804-11-0-local/7fa2af80.pub
    sudo apt-get update
    sudo apt-get -y install cuda

    Note: if wget http://developer.download.nvidia.com/compute/cuda/11.0.2/local_installers/cuda-repo-ubuntu1804-11-0-local_11.0.2-450.51.05-1_amd64.deb downloads too slowly, open the URL that follows wget in a browser and download the file there.

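cuDNN

  1. Add the cuDNN environment variables

    1. [cuDNN download page]

      Note: be sure to download the version that matches your CUDA version, e.g. cuDNN Library for Linux (x86_64)

    2. Configure the environment variables

      1. Add the following to .bashrc:

        export PATH="/usr/local/<cuda-11.0>/bin/:$PATH"
        export LD_LIBRARY_PATH="/home/<username>/cuda/lib64:$LD_LIBRARY_PATH" # path where cuDNN was extracted
  2. Check whether CUDA was installed successfully (nvcc reports the CUDA toolkit version only; it does not check cuDNN)

    nvcc -V

    On success it prints output like the following: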

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2020 NVIDIA Corporation
    Built on Thu_Jun_11_22:26:38_PDT_2020
    Cuda compilation tools, release 11.0, V11.0.194
    Build cuda_11.0_bu.TC445_37.28540450_0
  3. Install TensorFlow
    Click here for the GitHub tutorial

    PS: a VPN or proxy is recommended for the downloads

  4. Check whether tensorflow-gpu was installed successfully

    import tensorflow as tf

    print(tf.test.is_gpu_available())
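nvcc -V reports only the CUDA toolkit version, so it says nothing about cuDNN. One hedged way to confirm that the cuDNN shared libraries are reachable is to scan the directories on LD_LIBRARY_PATH for files whose names start with libcudnn. The function name find_cudnn_libs is hypothetical, and the sketch assumes the libraries keep their usual libcudnn prefix:

```python
import os


def find_cudnn_libs(ld_library_path):
    """Return paths of files named libcudnn* in the given search path.

    ld_library_path uses the same format as LD_LIBRARY_PATH: directories
    joined by os.pathsep (":" on Linux).
    """
    found = []
    for directory in ld_library_path.split(os.pathsep):
        if not directory or not os.path.isdir(directory):
            continue  # skip empty entries and directories that do not exist
        for name in sorted(os.listdir(directory)):
            if name.startswith("libcudnn"):
                found.append(os.path.join(directory, name))
    return found


if __name__ == "__main__":
    libs = find_cudnn_libs(os.environ.get("LD_LIBRARY_PATH", ""))
    print("\n".join(libs) if libs else "no libcudnn files found on LD_LIBRARY_PATH")
```

Run it in a new shell after editing .bashrc, so that the updated LD_LIBRARY_PATH is the one being inspected.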

Installing multiple CUDA versions on Linux / switching between versions

Local system information

CUDA version and GPU driver

nvidia-smi

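The nvidia-smi output shows that the current CUDA version is 11.6 and the GPU driver is 510.47.03 (according to the compatibility table below, downgrading to CUDA 10.2 is fine).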
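Reading the two version numbers can also be scripted. The sketch below assumes the nvidia-smi banner contains the labels "Driver Version:" and "CUDA Version:", as recent drivers print them; the function name parse_nvidia_smi is hypothetical:

```python
import re
import subprocess


def parse_nvidia_smi(text):
    """Extract (driver_version, cuda_version) from nvidia-smi's text output.

    Either value is None when its label is not found in the text.
    """
    driver = re.search(r"Driver Version:\s*([\d.]+)", text)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", text)
    return (driver.group(1) if driver else None,
            cuda.group(1) if cuda else None)


if __name__ == "__main__":
    try:
        output = subprocess.run(["nvidia-smi"], capture_output=True,
                                text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("nvidia-smi is unavailable or failed")
    else:
        print(parse_nvidia_smi(output))
```

The CUDA version shown by nvidia-smi is the highest version the driver supports, which is why it can differ from what nvcc -V reports for the installed toolkit.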

Ubuntu version and architecture

$ uname -m
$ lsb_release -a

CUDA version and driver version compatibility table

[Table 3 CUDA Toolkit and Corresponding Driver Versions]

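As the table shows, CUDA 10.2 requires a GPU driver of version 440.33 or later. For example, if the driver version reported by nvidia-smi is at least 440.33, you can downgrade from CUDA 11 to 10.2 without any problem. If your driver version is 396.26, however, you can only switch among CUDA 7/8/9.

So a newer GPU driver is always better. The order of dependencies is:

  • The operating system version comes first
  • The GPU driver comes next
  • Then CUDA, together with cuDNN
  • Finally the Python libraries, such as the PyTorch version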
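The driver requirement described above is a plain version comparison. A minimal sketch, assuming dotted numeric version strings; the names parse_version, driver_meets_minimum and MIN_DRIVER are hypothetical, and the only table entry included is the one quoted in the text:

```python
def parse_version(version):
    """Turn a dotted version string such as "440.33" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def driver_meets_minimum(driver_version, minimum_version):
    """Return True if driver_version is at least minimum_version."""
    return parse_version(driver_version) >= parse_version(minimum_version)


# The only requirement taken from the text: CUDA 10.2 needs driver 440.33 or later.
# Fill in other entries from the official table before relying on this dict.
MIN_DRIVER = {"10.2": "440.33"}

if __name__ == "__main__":
    for driver in ("510.47.03", "396.26"):
        ok = driver_meets_minimum(driver, MIN_DRIVER["10.2"])
        print(f"driver {driver} can run CUDA 10.2: {ok}")
```

Comparing tuples of integers avoids the trap of comparing version strings alphabetically.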

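Download CUDA

Download page: [https://developer.nvidia.com/cuda-toolkit-archive]

Run the wget command for the chosen installer in the directory on the server where CUDA will be installed.

Then do the following with the downloaded file:

  • Change into that directory and make the file executable with chmod 755 cuda_10.2.89_440.33.01_linux.run.
  • I am not an administrator and so cannot install with sudo; running the installer directly is enough: sh cuda_10.2.89_440.33.01_linux.run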

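Install CUDA

  1. Type accept to accept the license agreement
  2. Choose to install ***only the CUDA Toolkit***, then select Options and press Enter
  3. Change the install path

Prerequisite: the two directories used below must be created first

  • /home/<username>/cuda-10.2/
  • /home/<username>/cuda-10.2/mylib/

PS: these two directories can be anywhere you like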
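Creating the two directories can be scripted so that the installer finds them in place. A minimal sketch; the function name make_install_dirs is hypothetical, and the directory names cuda-10.2 and mylib are taken from the list above:

```python
import os


def make_install_dirs(base_dir):
    """Create the toolkit directory and its mylib subdirectory under base_dir.

    Returns (toolkit_dir, library_dir). Existing directories are left untouched.
    """
    toolkit_dir = os.path.join(base_dir, "cuda-10.2")
    library_dir = os.path.join(toolkit_dir, "mylib")
    os.makedirs(library_dir, exist_ok=True)  # also creates toolkit_dir
    return toolkit_dir, library_dir
```

Calling make_install_dirs(os.path.expanduser("~")) creates both directories under the current user's home directory, which needs no administrator rights.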

First select Toolkit Options and press Enter

Select Change Toolkit Install Path, press Enter, and type the custom CUDA install directory:

/home/<username>/cuda-10.2/

Then change the `Library install path`:

/home/<username>/cuda-10.2/mylib

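Installation succeeded

At this point nvcc still reports the previous CUDA version, so the shell configuration also has to be changed, as follows:

Modify the environment configuration file

Use the install directory directly

Edit the shell startup file with the following command:

vim ~/.bashrc

Append the path of the CUDA installation you just made to the end of the file:

export PATH="/home/<username>/cuda-10.2/bin:$PATH"
export LD_LIBRARY_PATH="/home/<username>/cuda-10.2/lib64:/home/<username>/cuda-10.2/mylib/lib64:$LD_LIBRARY_PATH"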

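You can comment out the previous CUDA paths instead of deleting them, so they are easy to restore later.

After saving, run the following command to apply the configuration:

source ~/.bashrc

Check the result: nvcc -V

It reports CUDA 10.2, so the installation succeeded.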
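If nvcc still reports the old version, the usual cause is that a different nvcc comes first on PATH. The sketch below shows which nvcc the shell would pick and which release it reports. The regular expression assumes the "release X.Y" wording that nvcc -V prints, and the function name parse_nvcc_release is hypothetical:

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Return the release number (e.g. "10.2") from `nvcc -V` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


if __name__ == "__main__":
    nvcc_path = shutil.which("nvcc")  # the first nvcc found on PATH, or None
    if nvcc_path is None:
        print("nvcc is not on PATH")
    else:
        output = subprocess.run([nvcc_path, "-V"], capture_output=True,
                                text=True).stdout
        print(nvcc_path, "->", parse_nvcc_release(output))
```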

Map the directory with an ln symlink

$ ln -s /home/<username>/cuda-10.2 /usr/local/cuda

PS: the /usr/local/cuda location can be changed
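With a symlink in place, switching CUDA versions means repointing the link. A minimal sketch of that idea, assuming a Linux filesystem; the function name point_symlink is hypothetical. The link location is a parameter because writing under /usr/local usually requires root, so a link inside the home directory works the same way:

```python
import os


def point_symlink(link_path, target_dir):
    """Make link_path a symlink to target_dir, replacing an existing symlink."""
    if not os.path.isdir(target_dir):
        raise FileNotFoundError(target_dir)
    if os.path.lexists(link_path) and not os.path.islink(link_path):
        # Refuse to overwrite a real file or directory.
        raise FileExistsError(f"{link_path} exists and is not a symlink")
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target_dir, tmp_link)
    # Renaming over the old link swaps it in one step.
    os.replace(tmp_link, link_path)
```

Because PATH and LD_LIBRARY_PATH refer to the link rather than to a versioned directory, .bashrc does not need to change when the link is repointed.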

Edit the shell startup file with the following command:

vim ~/.bashrc

Append the path of the CUDA installation you just made to the end of the file:

export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:/usr/local/cuda/mylib/lib64:$LD_LIBRARY_PATH"
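Both approaches add the same pair of export lines and differ only in the CUDA root directory. The sketch below generates the lines for a given root, which can help avoid typos when switching between the two. The function name bashrc_exports is hypothetical, and the mylib default mirrors the custom library path chosen during installation:

```python
def bashrc_exports(cuda_root, library_subdir="mylib"):
    """Build the two export lines for a CUDA install rooted at cuda_root."""
    root = cuda_root.rstrip("/")
    path_line = f'export PATH="{root}/bin:$PATH"'
    lib_line = (f'export LD_LIBRARY_PATH="{root}/lib64:'
                f'{root}/{library_subdir}/lib64:$LD_LIBRARY_PATH"')
    return [path_line, lib_line]


if __name__ == "__main__":
    for line in bashrc_exports("/usr/local/cuda"):
        print(line)
```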

Reference links