Running Stable Diffusion on Deepin with an AMD Radeon RX 6600XT

October 17, 2022

18:41


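AI image generation has been all the rage lately, and watching it made me itch to try it myself. So I installed Deepin 20.7 on a portable SSD, which also doubles as a carry-anywhere system.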

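I had already run several experiments showing that an AMD GPU using DirectML cannot run the leaked model on Windows, because I found no reasonable way to convert it to an ONNX model. On top of that, DirectML is also rather slow when running other Stable Diffusion models in ONNX format.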

So my only option was to run it on Linux with ROCm, and I hit quite a few pitfalls along the way.

Reference guide: Install-and-Run-on-AMD-GPUs

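Environment setup

Setting up a ROCm-based deep learning environment on Linux is fairly involved, so I'll go through it section by section.

Preparing the ROCm environment

Deepin's official repositories are missing some packages, so we need to add Ubuntu's security repository to pull them in. First, fetch the required GPG public keys from the Ubuntu keyserver and from AMD Radeon, then add both package sources.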

# Fetch the Ubuntu signing key and the AMD ROCm repository key
gpg --keyserver keyserver.ubuntu.com --recv-keys 16126D3A3E5C1192
curl https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -
# Export only the key fetched above, not the whole keyring
gpg --armor --export 16126D3A3E5C1192 | sudo apt-key add -
# Add the Ubuntu security source (temporary) and the ROCm 5.1.1 source
echo 'deb http://security.ubuntu.com/ubuntu bionic-security main universe' | sudo tee -a /etc/apt/sources.list
echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/5.1.1 ubuntu main' | sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt update

Then install the required dependencies:

sudo apt install libnuma-dev libpython3.8 rocm-dev rocm-libs

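After installation, run rocm-smi to check the GPU. If it prints GPU information like the following, the installation succeeded. A reboot may be needed after the first install. Once it works, you can remove the Ubuntu security source from /etc/apt/sources.list.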

======================= ROCm System Management Interface =======================
================================= Concise Info =================================
GPU  Temp   AvgPwr  SCLK     MCLK    Fan  Perf  PwrCap  VRA
0    55.0c  18.0W   2735Mhz  541Mhz  0%   auto  130.0W   46%   3%    
================================================================================
============================= End of ROCm SMI Log ==============================
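Removing the temporary Ubuntu source afterwards can be scripted. The following is only a sketch, assuming the line was appended exactly as in the echo command above. The function name remove_bionic_security is made up for illustration, and the example.com line is a placeholder rather than a real Deepin source. The function takes the file path as an argument so it can be tried on a copy first.

```shell
#!/bin/bash
# Hypothetical helper: delete the bionic-security line that was appended earlier.
# Takes the sources file as an argument so it can be tried on a copy first.
remove_bionic_security() {
    local file="$1"
    # -i.bak edits in place and keeps a backup next to the original file.
    sed -i.bak '\|^deb http://security\.ubuntu\.com/ubuntu bionic-security |d' "$file"
}

# Dry run on a throwaway file (no root needed). The first line is a placeholder.
tmp="$(mktemp)"
printf '%s\n' \
    'deb https://example.com/deepin stable main' \
    'deb http://security.ubuntu.com/ubuntu bionic-security main universe' > "$tmp"
remove_bionic_security "$tmp"
cat "$tmp"
```

On the real file, the same sed expression would have to be run with sudo against /etc/apt/sources.list, followed by sudo apt update.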

Next, configure the dynamic library paths by editing /etc/ld.so.conf.d/10-rocm.conf so that it contains:

/opt/rocm-5.1.1/lib
/opt/rocm-5.1.1/rocsolver/lib
/opt/rocm-5.1.1/rocblas/lib
/opt/rocm-5.1.1/rocclr/lib
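If you would rather generate that file than type it, a small sketch follows. The function name rocm_ld_conf is hypothetical, and the default prefix simply mirrors the paths listed above; adjust it if your ROCm version differs.

```shell
#!/bin/bash
# Hypothetical helper: print the library paths listed above for a ROCm prefix.
rocm_ld_conf() {
    local prefix="${1:-/opt/rocm-5.1.1}"
    local sub
    for sub in lib rocsolver/lib rocblas/lib rocclr/lib; do
        printf '%s/%s\n' "$prefix" "$sub"
    done
}

# Preview what would be written (nothing is changed on the system):
rocm_ld_conf

# To apply it (needs root), something like:
#   rocm_ld_conf | sudo tee /etc/ld.so.conf.d/10-rocm.conf
#   sudo ldconfig   # rebuild the linker cache
```

Printing to stdout first keeps the helper harmless; writing the file and refreshing the cache stay explicit, root-only steps.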

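Run sudo ldconfig (or reboot) so the new library paths take effect. At this point the basic ROCm environment is ready.

Partly based on: "deepin20.6正确安装rocm，前面一直整不好，今天终于搞定了" (roughly: "Installing ROCm correctly on deepin 20.6: I couldn't get it right before, but today I finally did")

Preparing the Python environment

This step is needed mainly because Deepin ships Python 3.7 by default, which is too old. Users of Ubuntu and similar distributions can skip it; this section is only here for readers less familiar with Linux.

Download the Python 3.10 source tarball, extract it, and change into the extracted directory.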
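For readers who want the download step spelled out, here is a minimal sketch. The version 3.10.8 is only an example of a 3.10.x release, and the function name python_src_url is made up; the URL follows the usual python.org layout for source tarballs.

```shell
#!/bin/bash
# Hypothetical helper: build the python.org source tarball URL for a release.
# 3.10.8 is only an example version; any 3.10.x release should do.
python_src_url() {
    local version="${1:-3.10.8}"
    printf 'https://www.python.org/ftp/python/%s/Python-%s.tgz\n' "$version" "$version"
}

# Show the URL that would be downloaded:
python_src_url

# Typical use:
#   wget "$(python_src_url 3.10.8)"
#   tar -xf Python-3.10.8.tgz
#   cd Python-3.10.8
```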

Install the build dependencies, then compile and install:

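# libffi-dev is added so that the _ctypes module gets built
sudo apt install -y make build-essential libssl-dev libffi-dev zlib1g-dev liblzma-dev libbz2-dev libreadline-dev libsqlite3-dev llvm libncurses5-dev libncursesw5-dev xz-utils tk-dev
# configure picks up the system OpenSSL by itself; --with-ssl is not a recognized option
./configure --enable-optimizations
make "-j$(nproc)"
# altinstall installs python3.10 without replacing the system python3
sudo make altinstall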

Setting up Stable Diffusion WebUI

Start with the basic environment:

git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui
python3.10 -m venv venv
source venv/bin/activate
python -m pip install --upgrade pip wheel

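Note that if you follow the Wiki and run the next commands directly, you will get the following errors:
cuda error and rocm aborted
The first one appears because we don't have a CUDA GPU, so the CUDA test has to be skipped. That one is well known.

But what about "Aborted"? After some digging I learned that the AMD Radeon RX 6600XT (code name gfx1032) is not officially supported by ROCm. We therefore need an HSA override so that the card masquerades as gfx1030. The final launch command is: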

HSA_OVERRIDE_GFX_VERSION=10.3.0 TORCH_COMMAND='pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.1.1' COMMANDLINE_ARGS='--skip-torch-cuda-test' python3 launch.py
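As far as I understand, the override value is just the gfx target written as major.minor.stepping, so gfx1030 becomes 10.3.0. The sketch below sets the variable only when the target calls for it. The function name hsa_override_for is hypothetical, and only the gfx1032 mapping comes from this article; other targets are deliberately left unset rather than guessed.

```shell
#!/bin/bash
# Hypothetical helper: map a gfx target name to an HSA_OVERRIDE_GFX_VERSION value.
# Only gfx1032 -> 10.3.0 comes from the article; unknown targets are not guessed.
hsa_override_for() {
    case "$1" in
        gfx1030) echo "" ;;          # the target being impersonated, no override needed
        gfx1032) echo "10.3.0" ;;    # RX 6600XT, as described above
        *)       return 1 ;;         # unknown target: leave the decision to the user
    esac
}

# Example with a hard-coded target name. On a real system the name could be
# looked up in the output of rocminfo, which lists targets such as gfx1032.
target="gfx1032"
if override="$(hsa_override_for "$target")" && [ -n "$override" ]; then
    export HSA_OVERRIDE_GFX_VERSION="$override"
fi
echo "HSA_OVERRIDE_GFX_VERSION=${HSA_OVERRIDE_GFX_VERSION:-<unset>}"
```

Returning a non-zero status for unknown targets keeps the script from silently applying an override to a card it was never checked on.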

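For the RX 6600XT there is no need to force full precision here.

If all goes well, you should be able to reach your instance at http://127.0.0.1:7860.

You can then put your models into models/Stable-diffusion.

Fixing performance issues

First, with an AMD GPU the first generation after startup takes noticeably longer. This is normal.

However, on a GPU without official support, the driver automatically locks the card at a low performance level the first time ROCm compute is used after boot. You can check this with rocm-smi: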
low perf

So we need to run the following command once to lift the lock:

rocm-smi -d 0 --setperflevel auto
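To check the level from a script instead of reading the table by eye, something like the following could work. It is a sketch that assumes the concise table layout shown earlier (a header row with a Perf column, data rows starting with the GPU index). The function name perf_levels and the sample rows are made up for the demo.

```shell
#!/bin/bash
# Hypothetical helper: read a rocm-smi concise table on stdin and print
# "<gpu> <perf>" for every GPU row. Assumes a header row containing "Perf".
perf_levels() {
    awk '
        $1 == "GPU" {                       # header row: locate the Perf column
            for (i = 1; i <= NF; i++) if ($i == "Perf") col = i
            next
        }
        col && $1 ~ /^[0-9]+$/ { print $1, $col }   # data rows start with the GPU index
    '
}

# Demo on made-up sample text, so no GPU is needed to try it:
sample='GPU  Temp   AvgPwr  SCLK    MCLK   Fan  Perf  PwrCap  VRAM%  GPU%
0    55.0c  18.0W   500Mhz  96Mhz  0%   low   130.0W   46%   3%'
printf '%s\n' "$sample" | perf_levels

# On a real system:
#   rocm-smi | perf_levels
#   # and if it reports "low" for GPU 0:
#   sudo rocm-smi -d 0 --setperflevel auto
```

Looking the column up by its header name, rather than by a fixed position, should keep the helper working if other columns are added or removed.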

Epilogue

Finally, I wrote a script, run.sh:

#!/bin/bash
cd "$(dirname "$0")"
source ./venv/bin/activate
export HSA_OVERRIDE_GFX_VERSION=10.3.0
export TORCH_COMMAND='pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.1.1'
export COMMANDLINE_ARGS='--skip-torch-cuda-test'
sudo bash -c '(sleep 60 ; rocm-smi -d 0 --setperflevel auto) &'
python3 launch.py "$@"

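ROCm compute on AMD GPUs is not slow, and it can fall back on system memory when VRAM runs out, so even a card with modest VRAM can generate fairly high-resolution images. Setting up the environment is a hassle, but it is worth it.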

6 Replies to "Running Stable Diffusion on Deepin with an AMD Radeon RX 6600XT"

caibirdme says:

March 28, 2023 20:38

A quick question: how can I confirm that it is running on the GPU rather than the CPU?

VanJay says:

March 26, 2023 03:09

I've been at this all day and CUDA is still unavailable. I'm losing it.

malossov says:

October 30, 2022 00:46

I got some errors: _init__.py:88: UserWarning: HIP initialization: Unexpected error from hipGetDeviceCount(). Did you run some cuda functions before calling NumHipDevices() that might have already set an error? Error 101: hipErrorInvalidDevice (Triggered internally at ../c10/hip/HIPFunctions.cpp:110.)
return torch._C._cuda_getDeviceCount() > 0
Warning: caught exception 'Unexpected error from hipGetDeviceCount(). Did you run some cuda functions before calling NumHipDevices() that might have already set an error? Error 101: hipErrorInvalidDevice', memory monitor disabled
LatentDiffusion: Running in eps-prediction mode
My ROCm version is 5.2.0, and I installed the matching torch and torchvision packages. Why can't it detect the GPU?
The command I used: HSA_OVERRIDE_GFX_VERSION=10.3.0 TORCH_COMMAND='pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.2' COMMANDLINE_ARGS='--skip-torch-cuda-test' python3 launch.py --no-half
The GPU is an RX6600.

DENIRED says:

October 23, 2022 19:36

Help, please: I followed the steps and got into the UI successfully, but generating an image fails with "RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'".
Here is the output:
DENIRED@DENIRED-PC:~$ cd stable-diffusion-webui/
DENIRED@DENIRED-PC:~/stable-diffusion-webui$ source venv/bin/activate
(venv) DENIRED@DENIRED-PC:~/stable-diffusion-webui$ HSA_OVERRIDE_GFX_VERSION=10.3.0 TORCH_COMMAND='pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.1.1' COMMANDLINE_ARGS='--skip-torch-cuda-test' python3 launch.py
Python 3.10.8 (main, Oct 23 2022, 16:59:54) [GCC 8.4.0]
Commit hash: 1ef32c8b8fa3e16a1e7b287eb19d4fc943d1f2a5
Installing requirements for Web UI
Launching Web UI with arguments:
/home/DENIRED/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/cuda/__init__.py:83: UserWarning: HIP initialization: Unexpected error from hipGetDeviceCount(). Did you run some cuda functions before calling NumHipDevices() that might have already set an error? Error 101: hipErrorInvalidDevice (Triggered internally at ../c10/hip/HIPFunctions.cpp:110.)
  return torch._C._cuda_getDeviceCount() > 0
Warning: caught exception 'Unexpected error from hipGetDeviceCount(). Did you run some cuda functions before calling NumHipDevices() that might have already set an error? Error 101: hipErrorInvalidDevice', memory monitor disabled
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Loading weights [4470c325] from /home/DENIRED/stable-diffusion-webui/models/Stable-diffusion/wd-v1-3-float32.ckpt
Global Step: 683410
Applying cross attention optimization (InvokeAI).
Model loaded.
Loaded a total of 0 textual inversion embeddings.
Embeddings:
Running on local URL: http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Loading weights [81761151] from /home/DENIRED/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.ckpt
Global Step: 840000
Applying cross attention optimization (InvokeAI).
Weights loaded.
Error completing request
Arguments: ('a lonely girl', '', 'None', 'None', 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 0, 0, 0, False, False, None, '', 1, '', 0, '', True, False, False) {}
Traceback (most recent call last):
  File "/home/DENIRED/stable-diffusion-webui/modules/ui.py", line 223, in f
    res = list(func(*args, **kwargs))
  File "/home/DENIRED/stable-diffusion-webui/webui.py", line 63, in f
    res = func(*args, **kwargs)
  File "/home/DENIRED/stable-diffusion-webui/modules/txt2img.py", line 48, in txt2img
    processed = process_images(p)
  File "/home/DENIRED/stable-diffusion-webui/modules/processing.py", line 407, in process_images
    uc = prompt_parser.get_learned_conditioning(shared.sd_model, len(prompts) * [p.negative_prompt], p.steps)
  File "/home/DENIRED/stable-diffusion-webui/modules/prompt_parser.py", line 138, in get_learned_conditioning
    conds = model.get_learned_conditioning(texts)
  File "/home/DENIRED/stable-diffusion-webui/repositories/stable-diffusion/ldm/models/diffusion/ddpm.py", line 558, in get_learned_conditioning
    c = self.cond_stage_model(c)
  File "/home/DENIRED/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/DENIRED/stable-diffusion-webui/modules/sd_hijack.py", line 334, in forward
    z1 = self.process_tokens(tokens, multipliers)
  File "/home/DENIRED/stable-diffusion-webui/modules/sd_hijack.py", line 349, in process_tokens
    outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers)
  File "/home/DENIRED/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/DENIRED/stable-diffusion-webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 722, in forward
    return self.text_model(
  File "/home/DENIRED/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/DENIRED/stable-diffusion-webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 643, in forward
    encoder_outputs = self.encoder(
  File "/home/DENIRED/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/DENIRED/stable-diffusion-webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 574, in forward
    layer_outputs = encoder_layer(
  File "/home/DENIRED/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/DENIRED/stable-diffusion-webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 316, in forward
    hidden_states = self.layer_norm1(hidden_states)
  File "/home/DENIRED/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/DENIRED/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 189, in forward
    return F.layer_norm(
  File "/home/DENIRED/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2503, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

1. Victor Huang says:
  
  October 23, 2022 21:31
  
  I suggest trying the no-half and related arguments from the original repo's wiki. Your GPU may not have half-precision compute units.
  
  1. DENIRED says:
    
    October 24, 2022 08:06
    
    That really worked!! Thanks a lot!
    