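Video walkthrough: https://www.ixigua.com/7247868403061359165
ChatGPT has set off a wave of AIGC (AI-generated content), and many excellent AI projects have appeared recently. If you browse their project pages, you will notice that many of them officially recommend Linux for training, fine-tuning, and inference. This article shows how to install Docker and a GPU-based AI computing environment (the NVIDIA driver plus the NVIDIA Container Toolkit) on Huawei's openEuler operating system. The goal is to show that a Chinese-developed operating system can run these mainstream projects without problems.
First, set up your openEuler server by following https://www.toutiao.com/article/7248211950994653731. Then log in to the server over SSH and run the commands below as root to complete the installation:
1. Blacklist nouveau, the open-source NVIDIA driver shipped with the kernel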
cat << EOF >> /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF
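Because the heredoc above appends with >>, running step 1 twice leaves duplicate lines in the file (modprobe tolerates this, but it is untidy). The following is a minimal sketch of a hypothetical helper, not part of the original guide, that adds each directive only if it is missing. The target path is a parameter, so it can be tried on a scratch file first.

```shell
# Hypothetical helper: add the nouveau blacklist directives to a modprobe
# config file only if they are not already present.
# Usage: blacklist_nouveau [CONF_FILE]
blacklist_nouveau() {
    local conf="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    local line
    touch "$conf" || return 1
    for line in "blacklist nouveau" "options nouveau modeset=0"; do
        # -x matches the whole line, -F treats the pattern as a fixed string
        grep -qxF "$line" "$conf" || printf '%s\n' "$line" >> "$conf"
    done
}
```

The directives only take effect after the initramfs is regenerated and the machine is rebooted, as in step 2.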
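2. Regenerate the initramfs
Back up the current initramfs
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
Regenerate the initramfs so that nouveau is no longer loaded at boot
dracut -v /boot/initramfs-$(uname -r).img $(uname -r)
Reboot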
reboot
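After the reboot, nouveau should no longer be loaded. The usual check is that lsmod | grep nouveau prints nothing. The hypothetical helper below is a sketch that reads /proc/modules (the data lsmod formats) directly and takes the path as a parameter so it can be tested against a sample file.

```shell
# Hypothetical check: succeed if a kernel module is currently loaded.
# Each line of /proc/modules starts with the module name and a space.
# Usage: module_loaded NAME [MODULES_FILE]
module_loaded() {
    local name="$1" modules="${2:-/proc/modules}"
    grep -q "^${name} " "$modules"
}
```

For example: module_loaded nouveau && echo "nouveau is still loaded; recheck steps 1 and 2".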
3. Install the build dependencies (the NVIDIA installer compiles a kernel module)
yum install -y kernel-devel gcc make g++ libglvnd-devel
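The driver installer in step 4 builds against the kernel-devel headers, which must match the running kernel. If yum installs a newer kernel-devel than the kernel you booted, the build fails. A minimal sketch of a hypothetical check, assuming kernel-devel unpacks to /usr/src/kernels/<release> as on the author's system:

```shell
# Hypothetical helper: print the kernel source directory for a release
# (default: the running kernel) and fail if it does not exist.
# Assumes the /usr/src/kernels/<release> layout used by kernel-devel.
# Usage: kernel_src_dir [RELEASE] [BASE_DIR]
kernel_src_dir() {
    local release="${1:-$(uname -r)}" base="${2:-/usr/src/kernels}"
    local dir="${base}/${release}"
    if [ -d "$dir" ]; then
        printf '%s\n' "$dir"
    else
        echo "missing ${dir}; try: yum install -y kernel-devel-${release}" >&2
        return 1
    fi
}
```

Its output can be passed straight to the installer: --kernel-source-path "$(kernel_src_dir)".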
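4. Install the NVIDIA graphics driver
Download the driver that matches your NVIDIA GPU model from:
https://www.nvidia.cn/Download/index.aspx?lang=cn
Make the installer executable (adjust the file name to the version you downloaded)
chmod 755 NVIDIA-Linux-x86_64-535.54.03.run
Run the installer. --kernel-source-path must point to the kernel-devel headers of the running kernel; on the author's system this was /usr/src/kernels/5.10.0-136.36.0.112.oe2203sp1.x86_64
./NVIDIA-Linux-x86_64-535.54.03.run --kernel-source-path /usr/src/kernels/$(uname -r)
Reboot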
reboot
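After this reboot, nvidia-smi should list your GPU. As an additional check, the loaded kernel module reports its version in /proc/driver/nvidia/version. The helper below is a hypothetical sketch; it assumes the first dotted version number in that file is the driver version (e.g. 535.54.03), which matches the usual "NVRM version: ... Kernel Module 535.54.03 ..." first line.

```shell
# Hypothetical helper: print the NVIDIA kernel module version.
# Assumption: the first dotted number in the file is the driver version.
# Usage: nvidia_driver_version [VERSION_FILE]
nvidia_driver_version() {
    local file="${1:-/proc/driver/nvidia/version}"
    [ -r "$file" ] || { echo "NVIDIA kernel module not loaded" >&2; return 1; }
    grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' "$file" | head -n 1
}
```

If it reports that the module is not loaded, check whether nouveau is still active (steps 1 and 2) before reinstalling.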
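5. Install Docker
Download the latest Docker static package (24.0.2 when this was written)
wget https://download.docker.com/linux/static/stable/x86_64/docker-24.0.2.tgz
Extract the package
tar -xvzf docker-24.0.2.tgz
Copy the binaries into /usr/bin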
cp docker/* /usr/bin/
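To make the Docker version easy to change and to confirm the copy succeeded, here is a minimal sketch with two hypothetical helpers. The URL pattern is the one used above; the binary list covers only the programs the unit files below rely on (dockerd, containerd) plus the CLI and runtime, and is an assumption rather than the full contents of the bundle (tar -tzf docker-24.0.2.tgz shows everything).

```shell
# Hypothetical helper: build the static-package URL used in step 5.
# Usage: docker_static_url VERSION [ARCH]
docker_static_url() {
    local version="$1" arch="${2:-x86_64}"
    printf 'https://download.docker.com/linux/static/stable/%s/docker-%s.tgz\n' "$arch" "$version"
}

# Hypothetical check: report any expected binary that is missing or not
# executable in DIR. Returns non-zero if anything is missing.
# Usage: check_docker_binaries [DIR]
check_docker_binaries() {
    local dir="${1:-/usr/bin}" bin missing=0
    for bin in docker dockerd containerd containerd-shim-runc-v2 runc; do
        if [ ! -x "${dir}/${bin}" ]; then
            echo "missing: ${dir}/${bin}" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example: wget "$(docker_static_url 24.0.2)" downloads the same file as above, and check_docker_binaries prints nothing when the copy is complete.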
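Create the systemd unit files for Docker
(1) Create the docker.service file (quoting 'EOF' keeps $MAINPID literal instead of letting the shell expand it to an empty string; > overwrites any earlier copy instead of appending to it)
cat << 'EOF' > /lib/systemd/system/docker.service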
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target docker.socket firewalld.service containerd.service time-set.target
Wants=network-online.target containerd.service
Requires=docker.socket
[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutStartSec=0
RestartSec=2
Restart=always
# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3
# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
# kill only the docker process, not all processes in the cgroup
KillMode=process
OOMScoreAdjust=-500
[Install]
WantedBy=multi-user.target
EOF
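With an unquoted delimiter (<< EOF), the shell expands $MAINPID while writing the file, so systemd receives "ExecReload=/bin/kill -s HUP " with no PID. The sketch below provides a hypothetical check for an installed docker.service and a demo function that writes the same line through an unquoted and a quoted heredoc to show the difference.

```shell
# Hypothetical check: succeed if the unit file still contains the literal
# systemd variable $MAINPID (i.e. the shell did not expand it).
# Usage: unit_keeps_mainpid [UNIT_FILE]
unit_keeps_mainpid() {
    local unit="${1:-/lib/systemd/system/docker.service}"
    grep -qF 'ExecReload=/bin/kill -s HUP $MAINPID' "$unit"
}

# Demo on scratch files: unquoted vs. quoted heredoc delimiter.
demo_heredoc_quoting() {
    local unquoted quoted
    unquoted=$(mktemp) quoted=$(mktemp)
    cat << EOF > "$unquoted"
ExecReload=/bin/kill -s HUP $MAINPID
EOF
    cat << 'EOF' > "$quoted"
ExecReload=/bin/kill -s HUP $MAINPID
EOF
    unit_keeps_mainpid "$unquoted" && echo "unquoted: kept" || echo "unquoted: expanded"
    unit_keeps_mainpid "$quoted" && echo "quoted: kept" || echo "quoted: expanded"
    rm -f "$unquoted" "$quoted"
}
```

With MAINPID unset in the shell, demo_heredoc_quoting prints "unquoted: expanded" followed by "quoted: kept".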
(2) Create the docker.socket file
cat << 'EOF' > /lib/systemd/system/docker.socket
[Unit]
Description=Docker Socket for the API
[Socket]
# If /var/run is not implemented as a symlink to /run, you may need to
# specify ListenStream=/var/run/docker.sock instead.
ListenStream=/run/docker.sock
SocketMode=0660
SocketUser=root
SocketGroup=docker
[Install]
WantedBy=sockets.target
EOF
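The comment in docker.socket says ListenStream=/run/docker.sock is only correct if /var/run points to /run. A minimal sketch of a hypothetical check for that condition, with both paths as parameters so it can be tested on scratch directories:

```shell
# Hypothetical check: succeed if VAR_RUN resolves to the same directory as
# RUN (normally /var/run is a symlink to /run).
# Usage: var_run_is_run [VAR_RUN] [RUN]
var_run_is_run() {
    local var_run="${1:-/var/run}" run="${2:-/run}"
    [ "$(readlink -f "$var_run")" = "$(readlink -f "$run")" ]
}
```

If var_run_is_run fails on your server, change ListenStream to /var/run/docker.sock as the comment suggests.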
(3) Create the containerd.service file
cat << 'EOF' > /lib/systemd/system/containerd.service
# Copyright The containerd Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target local-fs.target
[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/bin/containerd
Type=notify
Delegate=yes
KillMode=process
Restart=always
RestartSec=5
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
LimitNOFILE=infinity
# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity
OOMScoreAdjust=-999
[Install]
WantedBy=multi-user.target
EOF
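containerd.service runs modprobe overlay before starting (the leading "-" tells systemd to ignore a failure), because containerd's default snapshotter uses the overlay filesystem. A hypothetical sketch for checking kernel support, reading /proc/filesystems; note that overlay only appears there once the module is loaded or built in.

```shell
# Hypothetical check: succeed if a filesystem type is listed in
# /proc/filesystems (lines look like "nodev<TAB>overlay" or "<TAB>ext4").
# Usage: fs_supported NAME [FILESYSTEMS_FILE]
fs_supported() {
    local name="$1" list="${2:-/proc/filesystems}"
    # The filesystem name is always the last field on the line.
    awk -v fs="$name" '$NF == fs { found = 1 } END { exit !found }' "$list"
}
```

For example: modprobe overlay && fs_supported overlay && echo "overlay available".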
(4) Create the configuration directory
mkdir -p /etc/docker
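(5) Create the Docker daemon configuration file, which sets a registry mirror (use > so that rerunning this step does not append a second JSON object)
cat << EOF > /etc/docker/daemon.json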
{
"registry-mirrors": ["http://hub-mirror.c.163.com"]
}
EOF
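Appending to daemon.json twice leaves two JSON objects in one file, which dockerd cannot parse. Below is a minimal sketch of a hypothetical helper that rewrites the file with a single mirror; the mirror URL and the path are parameters. Be aware that it replaces any other settings already in the file.

```shell
# Hypothetical helper: (re)write daemon.json with one registry mirror.
# ">" truncates the file, so repeated runs still yield one JSON object.
# Warning: any other settings in the file are overwritten.
# Usage: write_daemon_json [MIRROR_URL] [CONFIG_FILE]
write_daemon_json() {
    local mirror="${1:-http://hub-mirror.c.163.com}"
    local conf="${2:-/etc/docker/daemon.json}"
    mkdir -p "$(dirname "$conf")"
    # Unquoted EOF on purpose: ${mirror} must be expanded here.
    cat > "$conf" << EOF
{
  "registry-mirrors": ["${mirror}"]
}
EOF
}
```

Run it before step 6, since nvidia-ctk later adds its own entries to the same file.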
(6) Create the docker group (docker.socket uses SocketGroup=docker)
groupadd docker
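groupadd exits with an error if the group already exists, for example when the guide is rerun. A hypothetical sketch that checks a group file first; getent group docker is the equivalent check that also consults other NSS sources.

```shell
# Hypothetical check: succeed if a group is defined in a group file.
# Usage: group_exists NAME [GROUP_FILE]
group_exists() {
    local name="$1" file="${2:-/etc/group}"
    # Group entries start with "name:", so the colon prevents prefix matches.
    grep -q "^${name}:" "$file"
}
```

For example: group_exists docker || groupadd docker.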
(7) Enable the services at boot and start them now
systemctl daemon-reload
systemctl enable --now containerd
systemctl enable --now docker
(8) Check the version information (both the Client and Server sections should appear)
docker version
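6. Install the NVIDIA Container Toolkit
Add the NVIDIA repository definition (the CentOS 8 repo file is used here)
curl -s -L https://nvidia.github.io/libnvidia-container/centos8/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
Install the container toolkit
yum install -y nvidia-container-toolkit
Register the NVIDIA runtime in Docker's configuration (this updates /etc/docker/daemon.json)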
nvidia-ctk runtime configure --runtime=docker
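Before restarting Docker, it can help to confirm that nvidia-ctk actually registered the runtime, which appears as an "nvidia" entry under the "runtimes" key of daemon.json. The helper below is a hypothetical sketch that does a plain text match, not real JSON parsing; after the restart, docker info | grep -i runtimes should also list nvidia.

```shell
# Hypothetical check: succeed if the Docker daemon config mentions both a
# "runtimes" key and an "nvidia" entry. Text match only, not a JSON parser.
# Usage: has_nvidia_runtime [CONFIG_FILE]
has_nvidia_runtime() {
    local conf="${1:-/etc/docker/daemon.json}"
    grep -q '"runtimes"' "$conf" && grep -q '"nvidia"' "$conf"
}
```

If the check fails, rerun the nvidia-ctk command and inspect the file before restarting Docker.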
Restart Docker
systemctl restart docker
Verify the NVIDIA Container Toolkit by running nvidia-smi inside a CUDA container
docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
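If the command prints the nvidia-smi table listing your GPU, Docker has been successfully integrated with the NVIDIA Container Toolkit. In the next article, we will show how to quickly deploy the popular Stable Diffusion on openEuler for AI image generation.
Page updated: 2024-05-02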