What You Can Run Locally with Ollama

Learning to run AI locally with Ollama has become the most common entry point into local language models. Only a few years ago running a model on a personal computer meant configuring Python environments and resolving command-line errors. The tool called Ollama has reduced that process to something close to installing any ordinary application. If you have been curious about local AI but assumed it demanded specialist knowledge, this is the right place to start.

What is Ollama?

Ollama is a free open-source tool that downloads and runs open-weight models on local hardware. It manages the components that previously required manual attention behind a single command. That means retrieving the correct files and selecting a quantization appropriate to the available hardware and serving the model to other applications. Once installed, retrieving and conversing with a model is effectively one line:

ollama run llama3.1 downloads Llama 3.1 and begins a conversation.
ollama run qwen2.5-coder runs a programming-focused model instead.

That is the entire procedure. The first invocation downloads the model. Every subsequent run starts within seconds and operates fully offline with no account and no network dependency.

What can you do with Ollama and local AI?

A private local chatbot

The most immediate application is a ChatGPT-style assistant that runs entirely on your own machine. It can answer questions and draft correspondence and assist with brainstorming, with the guarantee that nothing typed leaves the computer. Pairing Ollama with Open WebUI provides a polished browser-based chat interface comparable to commercial offerings.

A local coding assistant

Code-specialised models such as Qwen2.5-Coder perform strongly when run locally, and many editors and development tools can be directed at a local model in place of a paid cloud service. This is particularly valuable for proprietary codebases that should not be transmitted externally and for working without connectivity.

Document question-answering with RAG

Using a method called retrieval-augmented generation (RAG) a local model can answer questions about your own PDFs and notes and documents. The system locates the relevant passages and supplies them to the model, which answers from that material. This forms the basis of a private research assistant that never uploads its source documents. The local AI project ideas guide describes how to begin.

Summarisation and rewriting

Local models are well suited to routine text tasks such as condensing long articles and tightening prose and adjusting tone and translating between languages. These workloads run acceptably even on modest hardware, which makes them a sensible first use for a newly configured system.

Image generation: a clarification

Generating images is another popular local task but it is typically performed with Stable Diffusion through dedicated tools such as ComfyUI or Automatic1111 rather than through Ollama, which is oriented toward language models. The distinction is worth noting because the two are frequently conflated.

Ollama, LM Studio, or llama.cpp?

Ollama is the most widely recommended option but it is not the only one, and the right choice depends on how much control you want:

LM Studio is a graphical application for those who prefer to click rather than type commands. It offers model discovery and a chat interface in a single window at the cost of some flexibility.
Open WebUI is a self-hosted ChatGPT-like front end designed to pair with Ollama, suitable if you want a refined interface served from your own machine.
llama.cpp is the underlying inference engine on which much of this software is built, exposed directly for those who require maximum control over how models run.

For most newcomers Ollama paired with Open WebUI is the path of least resistance, while LM Studio suits those who prefer an entirely graphical workflow.

What runs well depends on your graphics card

Ollama makes starting straightforward but what runs well remains a function of VRAM. An 8 GB card handles 7-to-8-billion-parameter models comfortably. Greater VRAM unlocks larger and more capable ones. Consult the VRAM requirements guide for the size figures and run a build through the WillMyGPURunIt calculator to establish its exact options before downloading anything.