Definition
Ollama is an open-source platform for running large language models locally with a single command. Written mainly in Go and using llama.cpp as its inference backend, it packages models, weights, and configuration into a simple workflow so that a command such as ollama run downloads a model and starts a chat session. It aims to make self-hosted AI as approachable as installing any other developer tool.
Model management and API
Ollama treats models a little like container images: you pull a named model, it is cached locally, and it can be swapped or removed cleanly. It serves models over a local REST API on port 11434 (for example a POST to localhost:11434/api/chat), and provides Python and JavaScript client libraries. This lets developers wire local models into their own applications without sending prompts to a third party.
Where it fits
Ollama runs on macOS, Windows, Linux, and Docker, which makes it a practical choice for a home server or a spare workstation alongside other self-hosted infrastructure. Because it builds on llama.cpp, it inherits that engine's broad hardware support and GGUF model format, while hiding most of the lower-level configuration.
For the engine underneath Ollama, see llama.cpp; for the model file format it manages, see GGUF.
Find local-AI runtimes in the sovereign self-hosting catalog.
In Simple Terms
Ollama is an open-source platform for running large language models locally with a single command. Written mainly in Go and using llama.cpp as its inference…
