Running Large Language Models Locally
These instructions are adapted from the Open WebUI tutorials, which can be found here. I used Python and Conda, but other installation methods (e.g., Docker, Kubernetes) are also available.
Downloads
Note: These instructions were done on a Windows machine running WSL 2. They should also work for macOS. If you’re running a Windows machine and don’t have WSL installed, you can find more information about that here.
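If you already have WSL but aren't sure which version your distribution is using, one quick check (my addition, not from the tutorial) is from PowerShell:

# From PowerShell: list installed distributions and their WSL version
wsl -l -v

The VERSION column should read 2 for the distribution you plan to use.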
Install with conda
Create a new Conda environment. It can be named anything, but for the sake of following the tutorials, I'll name it open-webui.
conda create -n open-webui python=3.11
and activate it with
conda activate open-webui
Then install Open WebUI with pip:
pip install open-webui
Start the server:
open-webui serve
Then navigate to http://localhost:8080/ to access the ChatGPT-like UI.
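If the page doesn't load, a quick sanity check (a generic curl probe, not something from the tutorial) is to confirm the server is answering on port 8080:

# Print just the HTTP status code; 200 means the server is up
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/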
To update the Open WebUI package, run
pip install -U open-webui
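To confirm which version you're on before or after updating, pip itself can report it (standard pip, nothing Open WebUI-specific):

# Show the installed Open WebUI version and its dependencies
pip show open-webui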
Ollama
Get Models via Ollama
Go to the Ollama website and navigate to “Models”, or just click here. Each model's page lists its available sizes and the pull command for each variant. For example, if I want to download the 1b variant of llama3.2, I would run
ollama pull llama3.2:1b
which would install the model. To run the model, I would simply type
ollama run llama3.2:1b
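To see everything that has been pulled so far, Ollama ships a list command:

# List installed models along with their size and when they were last modified
ollama list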
We can now run any of the models we have installed from the command line. For example, if I installed llama3.2 via ollama pull llama3.2, I could run it with
ollama run llama3.2
which opens an interactive prompt:
~$ ollama run llama3.2
>>> Send a message (/? for help)
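Type /bye to leave the interactive prompt. Two non-interactive options are also worth knowing (both standard Ollama features, though not covered in the tutorial above): passing the prompt directly on the command line, and hitting Ollama's local REST API, which listens on port 11434 by default:

# One-shot prompt: prints the response and exits instead of opening a REPL
ollama run llama3.2:1b "Explain WSL in one sentence."

# Same request over the local REST API ("stream": false returns one JSON object)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:1b",
  "prompt": "Explain WSL in one sentence.",
  "stream": false
}'

The REST API is what lets front ends like Open WebUI talk to Ollama behind the scenes.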
Citation
@online{gregory2025,
  author = {Gregory, Josh},
  title = {Running {Large} {Language} {Models} {Locally}},
  date = {2025-01-18},
  url = {https://joshgregory42.github.io/posts/2025-09-03-local-llm/},
  langid = {en}
}