Take Control of Your Data
Privacy is the primary reason to ditch cloud-based AI. Every prompt sent to a public model is potentially stored, audited, or used for training. For developers handling sensitive client code or proprietary data, this is a massive risk. Open WebUI removes that exposure: it provides a polished, ChatGPT-like interface that runs locally on your own hardware.
I have used this setup to bridge the gap between raw local models and a productive team environment. It turns a command-line tool into a collaborative platform. You aren’t just running a model; you’re building a private AI ecosystem.
Quick Start: Up and Running in 5 Minutes
If Docker is ready and Ollama is already running on your machine, you can launch the interface with one command. This is the fastest way to test the UI before committing to a permanent configuration.
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
ghcr.io/open-webui/open-webui:main
After the container starts, visit http://localhost:3000. The first user to register becomes the administrator. The --add-host=host.docker.internal:host-gateway flag is the secret sauce here: it lets the containerized UI reach the Ollama service running on your host OS.
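If the interface loads but no models appear, the container usually cannot reach Ollama. A quick sanity check from the host (assuming the default ports from the command above):

```shell
# Confirm Ollama is answering on the host (lists installed models)
curl http://localhost:11434/api/tags

# Check the Open WebUI container logs for connection errors
docker logs open-webui --tail 50
```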
Professional Deployment with Docker Compose
A single docker run is fine for testing, but I recommend Docker Compose for long-term use. It centralizes your environment variables, network settings, and data volumes in one version-controlled file. This approach is essential when you begin integrating RAG (Retrieval-Augmented Generation) or connecting multiple backends.
The Optimized Configuration
Create a docker-compose.yaml file. This configuration ensures the UI and the Ollama backend reside on the same virtual network for reliable communication.
services:
  ollama:
    volumes:
      - ./ollama:/root/.ollama
    container_name: ollama
    pull_policy: always
    tty: true
    restart: unless-stopped
    image: ollama/ollama:latest

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    volumes:
      - ./open-webui:/app/backend/data
    depends_on:
      - ollama
    ports:
      - 3000:8080
    environment:
      - 'OLLAMA_BASE_URL=http://ollama:11434'
      - 'WEBUI_SECRET_KEY=change_this_to_a_long_random_string'
    restart: unless-stopped
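The WEBUI_SECRET_KEY signs session tokens, so the placeholder above should never reach a real deployment. One way to generate a suitable value (assuming openssl is installed):

```shell
# Print a 64-character random hex string to use as WEBUI_SECRET_KEY
openssl rand -hex 32
```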
Don’t Lose Your Data
A common mistake is forgetting volume mapping. Open WebUI stores chat history and user settings in a SQLite database inside the container. If you don’t map /app/backend/data to a folder on your hard drive, your data vanishes during the next update. In the example above, I use ./open-webui to keep everything portable and easy to back up.
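With that bind mount in place, a backup is just an archive of the folder. A minimal sketch (stop the stack first so the SQLite file isn't copied mid-write):

```shell
# Stop services, archive the data directory, then restart
docker compose down
tar czf open-webui-backup-$(date +%F).tar.gz ./open-webui
docker compose up -d
```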
Advanced Performance: GPU and RAG
Running a model like Llama 3.1 70B on a CPU is a recipe for frustration. You will likely see speeds as slow as 1-2 tokens per second. To get 50+ tokens per second, you must pass your GPU through to Docker. NVIDIA users will need the NVIDIA Container Toolkit installed on their host system.
Enabling Hardware Acceleration
Update the ollama service in your Compose file to reserve GPU resources:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
This simple addition transforms the experience. Responses go from sluggish, word-by-word generation to near-instantaneous results.
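If Ollama still falls back to the CPU, verify the passthrough. On an NVIDIA host, the usual steps are registering the runtime with Docker and then checking that the GPU is visible inside the container (commands assume a standard NVIDIA Container Toolkit install):

```shell
# Register the NVIDIA runtime with Docker (one-time setup)
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# The GPU should now be visible from inside the Ollama container
docker exec -it ollama nvidia-smi
```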
Making the AI Smarter with RAG
Open WebUI includes a built-in RAG engine. You can drop PDFs or text files directly into the chat. The system indexes them and uses that context to answer questions. For technical documentation, I suggest setting RAG_TOP_K to 5 or 10. This forces the model to look at more document snippets, leading to more accurate technical answers.
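This setting goes in the open-webui service's environment, alongside the variables from the Compose file above. A sketch (value tuned for technical docs; adjust to your corpus):

```yaml
    environment:
      - 'OLLAMA_BASE_URL=http://ollama:11434'
      - 'WEBUI_SECRET_KEY=change_this_to_a_long_random_string'
      # Retrieve more document snippets per query for denser technical material
      - 'RAG_TOP_K=10'
```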
Maintenance and Security Tips
Safe Updates
The Open WebUI team releases updates frequently—often several times a week. To update without losing your chats, follow this three-step workflow:
1. docker compose down to stop services.
2. docker compose pull to grab the latest images.
3. docker compose up -d to restart.
Since we defined persistent volumes earlier, your database remains safe.
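In practice I chain the three steps; with releases landing weekly, old images also pile up fast, so pruning afterwards is worth the extra line:

```shell
# Update the stack in one line, then reclaim disk from superseded images
docker compose down && docker compose pull && docker compose up -d
docker image prune -f
```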
Hardening for Team Use
If you share this instance with colleagues, the default settings are too permissive. Once everyone has an account, set ENABLE_SIGNUP=False in your environment variables to prevent strangers from creating accounts. Also, always use a reverse proxy like Nginx or Traefik to enable HTTPS if the UI is accessible over a network.
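A minimal Nginx server block for TLS termination might look like the sketch below (hostname and certificate paths are hypothetical placeholders). The WebSocket headers matter: without them, streaming chat responses break behind the proxy.

```nginx
server {
    listen 443 ssl;
    server_name webui.example.com;

    ssl_certificate     /etc/ssl/certs/webui.crt;
    ssl_certificate_key /etc/ssl/private/webui.key;

    location / {
        proxy_pass http://localhost:3000;
        # WebSocket support for streaming responses
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}
```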
The Hybrid Hub Strategy
Local models are great, but sometimes you need the raw power of GPT-4o. I often use Open WebUI as a hybrid hub. By adding an OpenAI or Anthropic API key in the settings, you can switch between a free local model for basic tasks and a premium cloud model for complex reasoning—all within the same conversation thread.
Finally, keep an eye on your hardware. If the UI feels laggy, check your RAM allocation. Docker Desktop on Windows and Mac defaults to 2GB or 4GB of RAM. For a smooth experience with LLMs, I recommend bumping this to at least 16GB in the Docker settings.
