Ollama¶
What is Ollama?¶
Ollama is a lightweight framework for running large language models (LLMs) locally on your own hardware. It provides a simple CLI and REST API for downloading and interacting with open-source models such as Llama, Mistral, Qwen, and Gemma, without sending data to external servers.
Base Environment
Ollama is provided via the environment modules system.
GPU Nodes Only
Ollama is only available on GPU nodes.
You must first start an interactive GPU session or submit a GPU job before loading the module:
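For example, to request an interactive session with one GPU (a sketch using the `GPU` partition from the batch example on this page; adjust resources to your needs):

```shell
# request an interactive shell on a GPU node (adjust resources as needed)
srun --partition=GPU --gres=gpu:1 --mem=16G --time=01:00:00 --pty bash
```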
Loading Ollama¶
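Once on a GPU node, load the module (the version shown matches the batch example on this page; run `module avail ollama` to see all installed versions):

```shell
module load ollama/0.17.7
```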
Available Models¶
To see all models currently available on the cluster:
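With the module loaded, list the models already downloaded:

```shell
ollama list
```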
Browse all models available for download at https://ollama.com/search.
Pulling a Model¶
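To download a model from the Ollama registry (the model name here is illustrative — substitute any model from the registry):

```shell
ollama pull llama3.2
```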
Specify a variant
Many models have multiple size variants. You can specify one explicitly:
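For example (model and tag are illustrative):

```shell
# pull the 8-billion-parameter variant instead of the default tag
ollama pull llama3.1:8b
```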
Running a Model¶
Interactive mode¶
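Start an interactive chat session with a model (illustrative model name):

```shell
ollama run llama3.2
```

Type `/bye` to exit the session.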
Single prompt (non-interactive)¶
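Pass the prompt as an argument to get a single response and return to the shell:

```shell
ollama run llama3.2 "Summarize the difference between a process and a thread."
```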
Using the REST API¶
Ollama exposes a REST API at http://localhost:11434. This is the recommended way to interact with Ollama programmatically.
Basic API request
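A generate request against the native API (model name is illustrative; `"stream": false` returns one complete JSON response instead of a token stream):

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```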
Chat API (OpenAI-compatible)
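The same server also exposes an OpenAI-compatible chat endpoint, so existing OpenAI client code can point at it (model name is illustrative):

```shell
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```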
Use the API over the CLI for scripting
The CLI (ollama run) outputs terminal control characters that may appear as garbled text when captured in scripts or logs. The REST API returns clean JSON and is preferred for any programmatic use.
Using Ollama in Python¶
GPU Nodes Only
Ollama is only available on GPU nodes. Python scripts using Ollama must be submitted via sbatch with the GPU partition.
example.py
Basic Python usage
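A minimal sketch using the official `ollama` Python package, assuming it is installed in your environment (`pip install ollama`), an Ollama server is running at http://localhost:11434 on the node, and the model (illustrative name) has already been pulled:

```python
# example.py -- minimal chat request via the official `ollama` Python package.
# Assumes a running Ollama server on localhost:11434 and a pulled model;
# the model name below is illustrative.
import ollama

response = ollama.chat(
    model="llama3.2",
    messages=[
        {"role": "user", "content": "Explain what a hash table is in one sentence."},
    ],
)

# print the assistant's reply text
print(response["message"]["content"])
```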
ollama_job.sh
Example SLURM job
```bash
#!/bin/bash
#SBATCH --job-name=ollama_test
#SBATCH --partition=GPU
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --mem=16G
#SBATCH --time=01:00:00
#SBATCH --output=ollama_%j.out

module load ollama/0.17.7

# start the Ollama server in the background and give it a moment to come up
# (skip this if your site's module already starts a server for you)
ollama serve &
sleep 10

source ~/miniconda3/bin/activate
conda activate myenv  # change to your environment name, or remove if using base

python example.py
```
Model Information¶
To inspect details about a specific model (architecture, parameters, quantization, capabilities):
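For example (model name is illustrative):

```shell
ollama show llama3.2
```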
Capabilities¶
Models on Ollama may support different capabilities:
| Capability | Description |
|---|---|
| `completion` | Standard text generation |
| `tools` | Function/tool calling support |
| `vision` | Image input support |
| `thinking` | Extended reasoning/chain-of-thought |
| `embedding` | Text embedding generation |
More Information¶
Documentation
- Official Ollama documentation: https://github.com/ollama/ollama
- Available models: https://ollama.com/search