JT
July 4 2025
Updated July 4 2025

Deploying AI/LLM on your computer

Cloud automation alone does not always cover local needs: running a language model on your own machine both simplifies day-to-day work and can serve as a ready-made module in a larger information system. In this article, we will look at how to deploy a local LLM (well-known open models) via Docker using Ollama. The instructions are suitable for Ubuntu, Windows (WSL2), or macOS.

If your local hardware is not enough for the task, Serverspace provides the latest GPT models, which are just as easy to integrate via an API or use in the web dashboard.

Below is a step-by-step process for launching a language model via Docker on the CPU.

1. Installing Docker

Linux (Ubuntu/Debian)

sudo apt update
sudo apt install docker.io
sudo systemctl enable --now docker

Windows/macOS

Download Docker Desktop from the official website and install it. On Windows, it can also be installed from the command line:

winget install Docker.DockerDesktop

After installation, verify that Docker works by running:

docker run hello-world

2. Running Ollama in Docker (on CPU)

Ollama is a local server for running LLMs (for example, LLaMA, Mistral, etc.).

docker run -d \
--name ollama \
-p 11434:11434 \
-v ollama_data:/root/.ollama \
ollama/ollama

The -v ollama_data:/root/.ollama parameter persists downloaded models between container restarts. By default, Ollama uses the CPU when no GPU drivers are available inside the container.

Check that the container is running, then pull and list models:

docker ps
docker exec -it ollama ollama pull llama3
docker exec -it ollama ollama list

3. Connecting the web interface — Open WebUI

docker run -d \
--name open-webui \
-p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
ghcr.io/open-webui/open-webui:ollama

The interface will be available at: http://localhost:3000

4. Alternative: Text Generation Web UI (on CPU)

Create a Dockerfile:

FROM python:3.11-slim
RUN apt-get update && apt-get install -y git build-essential
RUN pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
RUN git clone https://github.com/oobabooga/text-generation-webui
WORKDIR /text-generation-webui
RUN pip install -r requirements_cpu_only.txt
COPY start.sh /start.sh
RUN chmod +x /start.sh
ENTRYPOINT ["/start.sh"]

The startup script start.sh:

#!/bin/bash
rm -rf models && ln -s /models models
python server.py --listen --cpu

docker-compose.yml:

version: '3.8'
services:
  textgen:
    build: .
    container_name: textgen_cpu
    ports:
      - "7860:7860"
    volumes:
      - ./models:/models

Launch:

docker compose up -d

You can also interact with the model through its HTTP API, which makes it easy to integrate as a module into any information system. The endpoint for text generation is:

POST http://localhost:11434/api/generate

An API request can be sent with any HTTP tool or library, for example curl:

curl http://localhost:11434/api/generate \
-d '{
"model": "llama3",
"prompt": "Tell me what quantum physics is in simple words",
"stream": false
}'
  • model - the name of the model, for example llama3, mistral, gemma, phi, etc.
  • prompt - the text of the request
  • stream - if `true`, the response arrives in parts (convenient for chat); if `false`, the entire response arrives at once
  • options - additional generation parameters
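With "stream": true, the endpoint returns one JSON object per line (NDJSON), each carrying a fragment of the answer in its "response" field. As a minimal sketch, the fragments can be extracted in plain shell; the extract_response helper is our own, and the sample lines below only illustrate the format:

```shell
#!/bin/sh
# extract_response is our own helper, not part of Ollama: it pulls the
# "response" field out of each streamed NDJSON line with sed.
# jq would be more robust if it is available.
extract_response() {
  sed -n 's/.*"response":"\([^"]*\)".*/\1/p'
}

# Illustrative sample of the line format the server streams back;
# in practice you would pipe the output of curl (with "stream": true) here.
printf '%s\n' \
  '{"model":"llama3","response":"Hello","done":false}' \
  '{"model":"llama3","response":" world","done":true}' \
  | extract_response
```

The object with "done": true marks the final chunk; it typically also carries timing statistics for the whole generation.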

Example with generation settings:

curl http://localhost:11434/api/generate \
-d '{
"model": "llama3",
"prompt": "Write a short story about a robot.",
"stream": false,
"options": {
"temperature": 0.8,
"top_k": 40,
"top_p": 0.9,
"num_predict": 200
}
}'
  • temperature - randomness control; higher values give more creative output
  • top_k - limits how many tokens are considered when choosing the next one
  • top_p - probability mass control (nucleus sampling)
  • num_predict - the maximum number of tokens to generate
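The same request can be wrapped in a small shell helper that assembles the JSON body. This is only a sketch under our own naming (generate_payload is not part of Ollama); it prints the payload, which you would pipe to curl against a running container:

```shell
#!/bin/sh
# generate_payload is our own helper name: it builds a /api/generate
# request body with a configurable temperature (default 0.8).
generate_payload() {
  model="$1"; prompt="$2"; temperature="${3:-0.8}"
  printf '{"model":"%s","prompt":"%s","stream":false,"options":{"temperature":%s}}' \
    "$model" "$prompt" "$temperature"
}

generate_payload llama3 "Write a short story about a robot." 0.8
# To actually send the request to a running container:
# generate_payload llama3 "Write a short story about a robot." \
#   | curl -s http://localhost:11434/api/generate -d @-
```

Note that the prompt is inserted verbatim, so quotes inside it would need JSON escaping; jq or a scripting language is safer for untrusted input.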

5. Optimization for CPU

Limit the Docker container's CPU and memory:

docker run --cpus=4 --memory=16g ollama/ollama

Also use quantized models (GGUF/ggml) to reduce CPU and memory load, and note that the CLI interface (via curl) responds faster than the WebUI.
