# GPU Configuration
predictor.sh automatically detects and uses available GPU acceleration.
## Supported Backends

| Backend | Platforms | Hardware |
| --- | --- | --- |
| Metal | macOS | Apple Silicon (M1/M2/M3) |
| CUDA | Linux/Windows | NVIDIA GPUs (RTX 20/30/40 series) |
| CPU | All | Fallback (slower) |
## Automatic Detection

When you run `predictor up`, it automatically detects available GPUs:
```
$ predictor up --model ./llama.gguf
Detecting hardware...
CUDA: NVIDIA RTX 4090 (24GB VRAM) ✓
Loading model: llama-7b-q4.gguf
Size: 3.8GB
Loading... ████████████████████ 100%
✓ Tunnel established
```

## Force CPU Mode
If no GPU is detected, predictor.sh requires explicit consent before falling back to the CPU; pass the `--cpu` flag to proceed.
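A minimal sketch, combining the `up` invocation from above with the `--cpu` flag:

```bash
# Run inference on the CPU; expect noticeably slower generation
predictor up --model ./llama.gguf --cpu
```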
## GPU Selection (Multi-GPU)

On systems with multiple GPUs, select a card with the `CUDA_VISIBLE_DEVICES` environment variable.
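For example, to expose only the first GPU (indices match the order reported by `nvidia-smi -L`; the predictor invocation mirrors the one above):

```bash
# Only GPU 0 is visible to CUDA; any other GPUs are ignored
CUDA_VISIBLE_DEVICES=0 predictor up --model ./llama.gguf
```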
## VRAM Requirements

predictor.sh checks that your GPU has enough VRAM before loading a model:
### Typical VRAM Usage (7B Models)

| Quantization | VRAM |
| --- | --- |
| Q4_K_M | ~4GB |
| Q5_K_M | ~5GB |
| Q6_K | ~6GB |
| Q8_0 | ~8GB |
| F16 | ~14GB |
## Troubleshooting

### CUDA Not Detected
Check NVIDIA drivers:
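```bash
# nvidia-smi ships with the NVIDIA driver; if it errors,
# the driver isn't installed or loaded
nvidia-smi
```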
Verify CUDA installation:
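```bash
# Prints the CUDA toolkit release; driver-only installs don't include
# nvcc, so a failure here isn't conclusive by itself
nvcc --version
```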
Check for GPU visibility:
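```bash
# One line per GPU CUDA can see; if a card is missing here, check
# whether CUDA_VISIBLE_DEVICES is hiding it
nvidia-smi -L
```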
### Metal Not Detected
Metal is only available on macOS with Apple Silicon. Intel Macs do not support Metal acceleration.
Verify your chip:
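```bash
# Prints the SoC name, e.g. "Apple M1"
sysctl -n machdep.cpu.brand_string
# "arm64" means Apple Silicon; "x86_64" means an Intel Mac
uname -m
```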
### Out of Memory Errors

If you see CUDA or Metal out-of-memory errors, try the following:
- Use a smaller quantization (e.g., Q4_K_M instead of Q8_0; see the example after this list)
- Close other GPU-intensive applications
- Try a smaller model
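As a sketch of the first option, assuming a Q4_K_M build of the same model exists alongside the original (the filename below is illustrative):

```bash
# Q4_K_M needs roughly half the VRAM of Q8_0 for a 7B model (~4GB vs ~8GB)
predictor up --model ./llama-7b-q4_k_m.gguf
```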
### GPU Temperature

Monitor GPU temperature in the TUI.
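On NVIDIA hardware you can also poll the temperature outside the TUI with plain `nvidia-smi`:

```bash
# Report GPU temperature (°C) once per second
nvidia-smi --query-gpu=temperature.gpu --format=csv -l 1
```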
If the temperature is high (>80°C), consider:

- Improving case airflow
- Reducing request concurrency
- Using a lower-power model