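After logging in to the server used by Alibaba Cloud Compute Nest (计算巢), I found that the ollama instance deployed via docker-compose had not started.
The deployment manifest /root/application/docker-compose.yaml reads as follows: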
services:
  ollama:
    volumes:
      - ollama:/root/.ollama
    container_name: ollama
    pull_policy: if_not_present
    tty: true
    restart: unless-stopped
    image: ollama/ollama:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu
    ports:
      - 11434:11434
    environment:
      OLLAMA_ORIGINS: "*"
      OLLAMA_HOST: "0.0.0.0"
  open-webui:
    image: ghcr.io/open-webui/open-webui:ollama
    container_name: open-webui
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama
    ports:
      - 8080:8080
    environment:
      - 'ENABLE_OPENAI_API=False'
      - 'OLLAMA_BASE_URL=http://ollama:11434'
      - 'WEBUI_SECRET_KEY='
    extra_hosts:
      - host.docker.internal:host-gateway
    restart: unless-stopped
volumes:
  ollama: {}
  open-webui: {}
Running docker compose up to redeploy produced the following error:
Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown
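The cause is that this Alibaba Cloud ECS server has no NVIDIA driver installed, so the GPU reservation in the ollama service cannot be satisfied. I had already run into the driver installation problem on this ECS model; see the blog post "Installing the NVIDIA Driver on an Alibaba Cloud Lightweight GPU Instance" (阿里云轻量级 GPU 实例安装 NVIDIA 驱动).
Installing the driver with the command below resolved the problem: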
acs-plugin-manager --exec --plugin grid_driver_install
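Once the plugin finishes, it may be worth confirming that the driver actually responds before redeploying. The following is only a minimal sketch: it assumes nvidia-smi is installed along with the driver and that the compose file sits at the path shown earlier. The function names driver_loaded and redeploy_if_driver_ready are hypothetical helpers, not part of any tool.

```shell
#!/bin/sh
# Sketch only; the helper names below are made up for illustration.

# Returns 0 when nvidia-smi exists and can talk to the driver,
# non-zero otherwise (assumption: nvidia-smi ships with the driver).
driver_loaded() {
    command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1
}

# Starts the stack in the background only when the driver check passes.
redeploy_if_driver_ready() {
    if driver_loaded; then
        docker compose -f /root/application/docker-compose.yaml up -d
    else
        echo "NVIDIA driver not loaded; install it before starting ollama" >&2
        return 1
    fi
}
```

Calling redeploy_if_driver_ready after the install either brings the containers up or prints a hint instead of the long OCI runtime error.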