使用 AMD Radeon RX 6600XT 在 Deepin 上运行 Stable Diffusion

最近 AI 绘画十分火爆,我看得也有些心痒痒。于是在移动 SSD 里面装了一个 Deepin 20.7,顺便也能当个随身系统了。

先前我已经进行了多次试验,证明 AMD 显卡采用 DirectML 无法在 Windows 下运行泄露模型,因为没有合理的方式转换为 ONNX 格式模型。而且,DirectML 运行 ONNX 的其他 Stable Diffusion 模型效率也偏低。

于是我只能选择在 Linux 下采用 ROCm 运行,也踩到了不少坑。

指导链接:Install-and-Run-on-AMD-GPUs

环境安装

基于 ROCm 的深度学习环境在 Linux 下环境配置相对复杂,我们分节描述。

ROCm 环境准备

Deepin 官方的软件源缺少部分软件包,所以需要加入 Ubuntu 的 security 源拉取部分内容。首先,从 Ubuntu 服务器和 AMD Radeon 拉取必备的 GPG 公钥。

gpg --keyserver keyserver.ubuntu.com --recv-keys 16126D3A3E5C1192
curl https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -
gpg --armor --export | sudo apt-key add -
echo 'deb http://security.ubuntu.com/ubuntu bionic-security main universe' | sudo tee -a /etc/apt/sources.list
echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/5.1.1 ubuntu main' | sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt update

然后安装必备依赖项:

sudo apt install libnuma-dev libpython3.8 rocm-dev rocm-libs

安装成功后使用 rocm-smi 命令查看 GPU,若出现形如以下内容的 GPU 信息则安装成功,首次安装完成可能需要重启,然后即可在 /etc/apt/sources.list 去除 Ubuntu Security 源。

======================= ROCm System Management Interface =======================
================================= Concise Info =================================
GPU  Temp   AvgPwr  SCLK     MCLK    Fan  Perf  PwrCap  VRA
0    55.0c  18.0W   2735Mhz  541Mhz  0%   auto  130.0W   46%   3%    
================================================================================
============================= End of ROCm SMI Log ==============================

然后配置以下动态库路径,编辑 /etc/ld.so.conf.d/10-rocm.conf

/opt/rocm-5.1.1/lib
/opt/rocm-5.1.1/rocsolver/lib
/opt/rocm-5.1.1/rocblas/lib
/opt/rocm-5.1.1/rocclr/lib

此时 ROCm 的基本环境搭建完成。

部分参考:deepin20.6正确安装rocm,前面一直整不好,今天终于搞定了

Python 环境准备

这里主要是因为 Deepin 默认预装的是 Python 3.7,版本过老,Ubuntu 等发行版可忽略,该部分只是照顾 Linux 基础较差的读者。

下载 Python 3.10 的源码包,解压后进入该目录。

安装必备依赖和编译安装:

sudo apt install -y make build-essential libssl-dev zlib1g-dev liblzma-dev libbz2-dev libreadline-dev libsqlite3-dev llvm libncurses5-dev libncursesw5-dev xz-utils tk-dev
./configure --enable-optimizations --with-ssl
make "-j$(nproc)"
sudo make altinstall

Stable Diffusion WebUI 搭建

首先是基础的环境搭建:

git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui
python3.10 -m venv venv
source venv/bin/activate
python -m pip install --upgrade pip wheel

此时注意,若遵照 Wiki 直接运行后面的指令则会出现以下报错:
cuda error and rocm aborted
这是由于首先我们不是 CUDA 显卡,需要跳过 CUDA 测试,这个老生常谈。

但是 Aborted 呢?查阅资料后我了解到 AMD Radeon RX 6600XT 是没有被 ROCm 官方支持的显卡(代号为 gfx1032)。因此我们需要进行 HSA override,以 gfx1030 仿冒运行,最终启动命令如下:

HSA_OVERRIDE_GFX_VERSION=10.3.0 TORCH_COMMAND='pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.1.1' COMMANDLINE_ARGS='--skip-torch-cuda-test' python3 launch.py

这里对于 RX6600XT 不需要强制使用全精度。

如果一切良好,你应当能够在 http://127.0.0.1:7860 访问你的实例。

然后你就可以把你的模型放入 models/Stable-diffusion 下了。

性能问题解决

首先,AMD 启动后第一次生成时间会相对较长,这是正常现象。

但是,对于没有官方支持的显卡,电脑开机后第一次使用 ROCm 计算时,驱动会自动将显卡锁在低性能。可采用 rocm-smi 查看:
low perf

因此,我们需要执行一次以下指令解除锁定:

rocm-smi -d 0 --setperflevel auto

尾声

最后我写了一个脚本 run.sh

#!/bin/bash
cd "$(dirname $0)"
source ./venv/bin/activate
export HSA_OVERRIDE_GFX_VERSION=10.3.0
export TORCH_COMMAND='pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.1.1'
export COMMANDLINE_ARGS='--skip-torch-cuda-test'
sudo bash -c '(sleep 60 ; rocm-smi -d 0 --setperflevel auto) &'
python3 launch.py "$@"

AMD 显卡采用 ROCm 计算的速度是不慢的,而且具备显存不足时调用内存的能力,即使显存不是很大的显卡也可以用于生成分辨率较高的图像,环境折腾虽然麻烦却也值得。

6 Replies to “使用 AMD Radeon RX 6600XT 在 Deepin 上运行 Stable Diffusion”

  1. 出现了一些错误:_init__.py:88: UserWarning: HIP initialization: Unexpected error from hipGetDeviceCount(). Did you run some cuda functions before calling NumHipDevices() that might have already set an error? Error 101: hipErrorInvalidDevice (Triggered internally at ../c10/hip/HIPFunctions.cpp:110.)
    return torch._C._cuda_getDeviceCount() > 0
    Warning: caught exception ‘Unexpected error from hipGetDeviceCount(). Did you run some cuda functions before calling NumHipDevices() that might have already set an error? Error 101: hipErrorInvalidDevice’, memory monitor disabled
    LatentDiffusion: Running in eps-prediction mode
    rocm的版本是5.2.0的,也安装了对应的torch和torchvision包,请问这是为什么检测不到gpu呢?
    使用的命令是:HSA_OVERRIDE_GFX_VERSION=10.3.0 TORCH_COMMAND=’pip install torch torchvision –extra-index-url https://download.pytorch.org/whl/rocm5.2‘ COMMANDLINE_ARGS=’–skip-torch-cuda-test’ python3 launch.py –no-half
    显卡是RX6600

  2. 大佬求助,我按照步骤做下来成功进入来ui界面,但在生成图片的时就会出错显示“RuntimeError: “LayerNormKernelImpl” not implemented for ‘Half’ ”。
    下面时代码:
    DENIRED@DENIRED-PC:~$ cd stable-diffusion-webui/
    DENIRED@DENIRED-PC:~/stable-diffusion-webui$ source venv/bin/activate
    (venv) DENIRED@DENIRED-PC:~/stable-diffusion-webui$ HSA_OVERRIDE_GFX_VERSION=10.3.0 TORCH_COMMAND=’pip install torch torchvision –extra-index-url https://download.pytorch.org/whl/rocm5.1.1‘ COMMANDLINE_ARGS=’–skip-torch-cuda-test’ python3 launch.py
    Python 3.10.8 (main, Oct 23 2022, 16:59:54) [GCC 8.4.0]
    Commit hash: 1ef32c8b8fa3e16a1e7b287eb19d4fc943d1f2a5
    Installing requirements for Web UI
    Launching Web UI with arguments:
    /home/DENIRED/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/cuda/__init__.py:83: UserWarning: HIP initialization: Unexpected error from hipGetDeviceCount(). Did you run some cuda functions before calling NumHipDevices() that might have already set an error? Error 101: hipErrorInvalidDevice (Triggered internally at ../c10/hip/HIPFunctions.cpp:110.)
    return torch._C._cuda_getDeviceCount() > 0
    Warning: caught exception ‘Unexpected error from hipGetDeviceCount(). Did you run some cuda functions before calling NumHipDevices() that might have already set an error? Error 101: hipErrorInvalidDevice’, memory monitor disabled
    LatentDiffusion: Running in eps-prediction mode
    DiffusionWrapper has 859.52 M params.
    making attention of type ‘vanilla’ with 512 in_channels
    Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
    making attention of type ‘vanilla’ with 512 in_channels
    Loading weights [4470c325] from /home/DENIRED/stable-diffusion-webui/models/Stable-diffusion/wd-v1-3-float32.ckpt
    Global Step: 683410
    Applying cross attention optimization (InvokeAI).
    Model loaded.
    Loaded a total of 0 textual inversion embeddings.
    Embeddings:
    Running on local URL: http://127.0.0.1:7860

    To create a public link, set `share=True` in `launch()`.
    Loading weights [81761151] from /home/DENIRED/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.ckpt
    Global Step: 840000
    Applying cross attention optimization (InvokeAI).
    Weights loaded.
    Error completing request
    Arguments: (‘a lonely girl’, ”, ‘None’, ‘None’, 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 0, 0, 0, False, False, None, ”, 1, ”, 0, ”, True, False, False) {}
    Traceback (most recent call last):
    File “/home/DENIRED/stable-diffusion-webui/modules/ui.py”, line 223, in f
    res = list(func(*args, **kwargs))
    File “/home/DENIRED/stable-diffusion-webui/webui.py”, line 63, in f
    res = func(*args, **kwargs)
    File “/home/DENIRED/stable-diffusion-webui/modules/txt2img.py”, line 48, in txt2img
    processed = process_images(p)
    File “/home/DENIRED/stable-diffusion-webui/modules/processing.py”, line 407, in process_images
    uc = prompt_parser.get_learned_conditioning(shared.sd_model, len(prompts) * [p.negative_prompt], p.steps)
    File “/home/DENIRED/stable-diffusion-webui/modules/prompt_parser.py”, line 138, in get_learned_conditioning
    conds = model.get_learned_conditioning(texts)
    File “/home/DENIRED/stable-diffusion-webui/repositories/stable-diffusion/ldm/models/diffusion/ddpm.py”, line 558, in get_learned_conditioning
    c = self.cond_stage_model(c)
    File “/home/DENIRED/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py”, line 1130, in _call_impl
    return forward_call(*input, **kwargs)
    File “/home/DENIRED/stable-diffusion-webui/modules/sd_hijack.py”, line 334, in forward
    z1 = self.process_tokens(tokens, multipliers)
    File “/home/DENIRED/stable-diffusion-webui/modules/sd_hijack.py”, line 349, in process_tokens
    outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers)
    File “/home/DENIRED/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py”, line 1130, in _call_impl
    return forward_call(*input, **kwargs)
    File “/home/DENIRED/stable-diffusion-webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py”, line 722, in forward
    return self.text_model(
    File “/home/DENIRED/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py”, line 1130, in _call_impl
    return forward_call(*input, **kwargs)
    File “/home/DENIRED/stable-diffusion-webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py”, line 643, in forward
    encoder_outputs = self.encoder(
    File “/home/DENIRED/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py”, line 1130, in _call_impl
    return forward_call(*input, **kwargs)
    File “/home/DENIRED/stable-diffusion-webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py”, line 574, in forward
    layer_outputs = encoder_layer(
    File “/home/DENIRED/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py”, line 1130, in _call_impl
    return forward_call(*input, **kwargs)
    File “/home/DENIRED/stable-diffusion-webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py”, line 316, in forward
    hidden_states = self.layer_norm1(hidden_states)
    File “/home/DENIRED/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py”, line 1130, in _call_impl
    return forward_call(*input, **kwargs)
    File “/home/DENIRED/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/normalization.py”, line 189, in forward
    return F.layer_norm(
    File “/home/DENIRED/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/functional.py”, line 2503, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
    RuntimeError: “LayerNormKernelImpl” not implemented for ‘Half’

发表回复

您的电子邮箱地址不会被公开。 必填项已用 * 标注