NVIDIA GPU Drivers (Optional)
Skip this section if you have no NVIDIA GPU. CPU-only setups work fine.
💡 Why Add a GPU?
An NVIDIA GPU dramatically accelerates AI inference. A 7B model that takes 15 seconds on CPU can respond in under 1 second with a GPU. Ollama automatically detects and uses NVIDIA GPUs via CUDA.
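To quantify the difference on your own hardware, you can time the same prompt before and after the GPU setup. A minimal sketch, assuming GNU `date` (the `bench` helper name is illustrative, and `llama3.2` is just an example model):

```bash
# Illustrative timing helper: wall-clock milliseconds for any command
bench() {
  start=$(date +%s%N)            # nanoseconds since the epoch (GNU date)
  "$@" >/dev/null 2>&1
  end=$(date +%s%N)
  echo "elapsed: $(( (end - start) / 1000000 )) ms"
}

# Run once now (CPU) and again after the driver install (GPU) to compare
bench ollama run llama3.2 "Hello"
```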
Step 1 — Verify GPU is Detected
bash
# Check if your NVIDIA card is visible
lspci | grep -i nvidia
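# Also confirm the open-source nouveau driver is not in the way; it conflicts
# with NVIDIA's proprietary driver (the install below normally blacklists it)
lsmod | grep -i nouveau || echo "nouveau not loaded (good)"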
# Example output: 01:00.0 VGA: NVIDIA Corporation GA106 [RTX 3060] (rev a1)

Step 2 — Clean Any Existing NVIDIA Packages
bash
# Only run if you had a previous failed installation
sudo apt-get remove --purge 'libnvidia-.*'
sudo apt-get remove --purge '^nvidia-.*'
sudo apt-get remove --purge '^cuda-.*'
sudo apt clean && sudo apt autoremove

Step 3 — Install NVIDIA Drivers
bash
# Add graphics drivers PPA
sudo add-apt-repository ppa:graphics-drivers/ppa --yes
sudo apt-get update
sudo update-pciids
# Install driver (570 is current stable — check nvidia.com for latest)
sudo apt-get install nvidia-driver-570 -y
sudo apt-get reinstall linux-headers-$(uname -r)
sudo update-initramfs -u
# Validate DKMS modules
sudo dkms status
# Reboot
sudo reboot

Step 4 — Verify GPU Driver
bash
# After reboot — check GPU status
nvidia-smi
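# For scripts, nvidia-smi can emit machine-readable output:
nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader
# nvidia-smi reports memory in MiB; convert to GiB when comparing model sizes:
echo "12288" | awk '{ printf "%.0f GiB\n", $1 / 1024 }'   # -> 12 GiB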
# Should show your GPU name, VRAM, driver version, and CUDA version

Step 5 — Install CUDA Toolkit
bash
# Download and install CUDA keyring
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
# Install CUDA toolkit
sudo apt-get install cuda-toolkit -y
sudo apt-get install nvidia-gds -y
# Verify and reboot
sudo dkms status
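# nvcc is installed under /usr/local/cuda (the default toolkit location) and
# is not on PATH by default; add it before checking the compiler version
export PATH=/usr/local/cuda/bin:$PATH
command -v nvcc >/dev/null && nvcc --version || echo "nvcc not on PATH - check the toolkit install"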
sudo reboot

Step 6 — Verify Ollama Detects the GPU
bash
# After reboot — test a model with GPU
ollama run llama3.2 "Hello" --verbose
# Look for "gpu layers" in the output — confirms GPU is being used
# Check which GPU Ollama is using
ollama ps
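# Scripted check (exact PROCESSOR text can vary between Ollama versions)
ollama ps | grep -q "GPU" && echo "GPU offload active" || echo "model is on CPU"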
# PROCESSOR column should show: GPU (not CPU)

GPU Compatibility Table
| GPU | VRAM | Models it Handles | Notes |
|---|---|---|---|
| RTX 3060 (12 GB) / RTX 4060 (8 GB) | 8–12 GB | 7B fully in GPU; 13B quantized on 12 GB | Great entry GPUs for AI |
| RTX 3090 / 4090 | 24 GB | 13B–34B models in GPU | Excellent performance |
| RTX 4060 Ti (16 GB) | 16 GB | 13B–20B models in GPU | Recommended mid-range |
| Tesla P100 | 16 GB | 13B–20B via CUDA | Great used/refurb option |
| Tesla V100 (32 GB) | 32 GB | 34B models in GPU | Used servers — great value |
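A rough rule of thumb for sizing a card against this table: a 4-bit-quantized model needs about 0.75 GB of VRAM per billion parameters, plus 1–2 GB for context and CUDA overhead. A quick sketch (the constants are approximations, not exact requirements):

```bash
# Approximate VRAM for a Q4-quantized model of N billion parameters:
# ~0.75 GB per billion parameters + ~1.5 GB context/CUDA overhead
estimate_vram() {
  awk -v p="$1" 'BEGIN { printf "%.2f GB\n", p * 0.75 + 1.5 }'
}

estimate_vram 7    # -> 6.75 GB  (fits an 8 GB card)
estimate_vram 13   # -> 11.25 GB (needs a 12 GB card)
estimate_vram 34   # -> 27.00 GB (needs 24-32 GB)
```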