GPU Configuration

predictor.sh automatically detects and uses available GPU acceleration.

Supported Backends

| Backend | Platform | Hardware |
|---------|----------|----------|
| Metal | macOS | Apple Silicon (M1/M2/M3) |
| CUDA | Linux/Windows | NVIDIA GPUs (RTX 20/30/40 series) |
| CPU | All | Fallback (slower) |

Automatic Detection

When you run predictor up, it automatically detects available GPUs:

$ predictor up --model ./llama.gguf

Detecting hardware...
  CUDA: NVIDIA RTX 4090 (24GB VRAM) ✓

Loading model: llama-7b-q4.gguf
  Size: 3.8GB
  Loading... ████████████████████ 100%

✓ Tunnel established

Force CPU Mode

If no GPU is detected, predictor.sh requires explicit consent before running on the CPU. Pass the --cpu flag to proceed, for example:
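
$ predictor up --model ./llama.gguf --cpu    # explicit opt-in to CPU inference (slower)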

GPU Selection (Multi-GPU)

On systems with multiple GPUs, use CUDA_VISIBLE_DEVICES:
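
$ CUDA_VISIBLE_DEVICES=0 predictor up --model ./llama.gguf    # pin to the first GPU (indices start at 0)

CUDA_VISIBLE_DEVICES is a standard CUDA environment variable, not a predictor.sh flag: only the listed device indices are visible to CUDA applications, predictor.sh included.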

VRAM Requirements

predictor.sh checks whether your GPU has enough VRAM before loading a model.

Typical VRAM Usage (7B Models)

| Quantization | VRAM Required |
|--------------|---------------|
| Q4_K_M | ~4GB |
| Q5_K_M | ~5GB |
| Q6_K | ~6GB |
| Q8_0 | ~8GB |
| F16 | ~14GB |

Troubleshooting

CUDA Not Detected

  1. Check that the NVIDIA driver is installed and sees the GPU.

  2. Verify the CUDA toolkit installation.

  3. Confirm the GPU is visible and not hidden by an environment variable.
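
The standard commands for these three checks (stock NVIDIA/Linux tooling, not predictor.sh-specific):

$ nvidia-smi                   # 1. should list the GPU and driver version
$ nvcc --version               # 2. prints the installed CUDA toolkit version
$ lspci | grep -i nvidia       # 3. the GPU should appear on the PCI bus
$ echo $CUDA_VISIBLE_DEVICES   # 3. should be unset or include the GPU's index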

Metal Not Detected

Metal is only available on macOS with Apple Silicon. Intel Macs do not support Metal acceleration.

Verify your chip:
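
$ uname -m                              # prints "arm64" on Apple Silicon, "x86_64" on Intel Macs
$ sysctl -n machdep.cpu.brand_string    # prints the chip name, e.g. "Apple M2"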

Out of Memory Errors

If you see CUDA/Metal out-of-memory errors:

  1. Use a smaller quantization (e.g., Q4_K_M instead of Q8_0)

  2. Close other GPU-intensive applications

  3. Try a smaller model
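
For example, stepping down from Q8_0 to Q4_K_M roughly halves VRAM usage for a 7B model (~8GB to ~4GB per the table above). The filename below is hypothetical; use whichever quantized build you have:

$ predictor up --model ./llama-7b-q4_k_m.gguf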

GPU Temperature

Monitor GPU temperature in the predictor.sh TUI.
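
On NVIDIA hardware you can also watch the temperature outside the TUI; this is standard nvidia-smi usage, not a predictor.sh feature:

$ watch -n 1 nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader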

If temperature is high (>80°C), consider:

  • Improving case airflow

  • Reducing request concurrency

  • Using a lower-power model
