Run your own ChatGPT-like assistant locally

Ollama is a tool that enables users to run advanced language models, such as Mistral or Llama (both ChatGPT competitors), on their local machines.

Unlike traditional AI solutions that rely heavily on cloud-based infrastructure, Ollama empowers users to harness AI capabilities locally.

This not only reduces dependency on internet connectivity but also enhances data privacy and security.

Installing Ollama

The tool is available on macOS, Linux, and Windows (in preview). Download and run the installer ⬇️:

Download Ollama on macOS
Download Ollama on Linux
Download Ollama on Windows
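
On Linux you can also install Ollama with the official script. In all cases, a quick check in the terminal confirms the CLI is installed (the version output will of course differ on your machine):

# Linux only: install via the official script
curl -fsSL https://ollama.com/install.sh | sh

# Verify the installation on any platform
ollama --version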

Choosing a model

Multiple models are offered for free on Ollama's website, letting you choose the one that best matches your needs:

library
Get up and running with large language models.

As a starter pack, I suggest using Mistral (🇫🇷) or llama3.1 from Meta, two general-purpose models:

llama3.1
Llama 3.1 is a new state-of-the-art model from Meta available in 8B, 70B and 405B parameter sizes.
mistral
The 7B model released by Mistral AI, updated to version 0.3.
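
Both models are fetched the same way. As a quick sketch, here is how you would pull Mistral instead and check which models are already on your machine (the llama3.1 download is detailed in the next section):

# Download the Mistral 7B model
ollama pull mistral

# List the models available locally
ollama list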

Using a model locally

The first thing to do is to start Ollama: open the application, then run the following command in your terminal to download llama3.1:

jeremy@macbookpro ~ » ollama pull llama3.1:8b
pulling manifest 
pulling 87048bcd5521... 100%  4.7 GB
pulling 11ce4ee3e170... 100%  1.7 KB
pulling f1cd752815fc... 100%  12 KB
pulling 56bb8bd477a5... 100%  96 B
pulling e711233e7343... 100%  485 B
verifying sha256 digest 
writing manifest 
removing any unused layers 
success 

You can interact with the model:

jeremy@macbookpro ~ » ollama run llama3.1:8b
>>> Send a message (/? for help)
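
Beyond the interactive prompt, you can also pass a one-shot prompt directly, or talk to the local REST API that Ollama exposes on port 11434. A minimal sketch (the prompt text is just an example):

# One-shot prompt from the command line
ollama run llama3.1:8b "Explain what a large language model is in one sentence."

# The same request through the local REST API
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Explain what a large language model is in one sentence.",
  "stream": false
}'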

Using a WebUI to interact with Ollama models

As Ollama is designed to be a CLI tool, it is not really user-friendly. However, it can be paired with a WebUI:

GitHub - open-webui/open-webui: User-friendly WebUI for LLMs (Formerly Ollama WebUI)

I recommend using Docker for running the WebUI, while keeping Ollama outside of Docker to optimize performance. Here's a docker-compose stack you can use:

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    volumes:
      - open-webui:/app/backend/data
    ports:
      - 3000:8080
    environment:
      - 'OLLAMA_BASE_URL=http://host.docker.internal:11434'
      - 'WEBUI_SECRET_KEY='
    extra_hosts:
      - host.docker.internal:host-gateway
    restart: unless-stopped

volumes:
  open-webui: {}
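
Start the stack from the directory containing this compose file, then open http://localhost:3000 in your browser (port 3000 is the one mapped above):

# Launch Open WebUI in the background
docker compose up -d

# Check that the container started correctly
docker compose logs -f open-webui
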
🕵️‍♂️
The WebUI requires you to create an account before use.

Here's the WebUI home after login:

As you can see at the top of the screen, llama3.1 is set as my default model. Use the dropdown menu to select a model available on your computer, or to download a new one. If no model is listed, check the following configuration:

Open the settings ⬇️

Then the admin settings ⬇️

And then the connection panel ⬇️

On this panel, make sure the Ollama API URL matches "http://host.docker.internal:11434"
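
If the WebUI still shows no models, a quick sanity check is to query the Ollama API directly from the host; /api/tags lists the models Ollama knows about (assuming Ollama listens on its default port 11434):

# Should return a JSON list of the models pulled earlier (e.g. llama3.1:8b)
curl http://localhost:11434/api/tags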

Configure web browsing

Still in the admin panel, you can configure the web browsing capabilities of the models:

Performance

I'm running Ollama on a 2021 MacBook Pro (M1 Pro, 16 GB of memory), and requests are usually processed in 2-5 seconds, based on my daily usage over several weeks.
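
If you want to measure this on your own machine, the CLI can help. As a sketch (output details may vary between Ollama versions):

# Print timing statistics (load time, eval rate in tokens/s) after each answer
ollama run llama3.1:8b --verbose

# Show which models are currently loaded and how much memory they use
ollama ps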