Environment Variables

predictor.sh reads these environment variables:

Reference

| Variable | Description | Required | Default |
| --- | --- | --- | --- |
| HF_TOKEN | HuggingFace access token | For gated models | - |
| CUDA_VISIBLE_DEVICES | GPU selection | No | All GPUs |
| RUST_LOG | Log level | No | info |
| ORT_DYLIB_PATH | ONNX Runtime library path | No | Bundled |


HuggingFace Token

A token is required for gated models such as Llama 3 and Mistral.

Get Your Token

  1. Create a new token with "Read" access at https://huggingface.co/settings/tokens

  2. Accept the model license on its HuggingFace page

Set the Token

```shell
# Export for the current session
export HF_TOKEN=hf_xxxxxxxxxxxxx

# Or set inline for a single command
HF_TOKEN=hf_xxx predictor up --hf meta-llama/Llama-3-8B
```

Persistent Setup

Add the export line to your shell profile (~/.bashrc or ~/.zshrc):
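A minimal sketch for bash (use ~/.zshrc for zsh); the token value is a placeholder:

```shell
# Append the export to your profile; replace the placeholder with your real token
echo 'export HF_TOKEN=hf_xxxxxxxxxxxxx' >> ~/.bashrc

# Apply it to the current session:
# source ~/.bashrc
```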


GPU Selection

Set CUDA_VISIBLE_DEVICES to control which GPUs predictor.sh uses:
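For example (GPU indices follow nvidia-smi ordering; the model name reuses the token example above):

```shell
# Restrict predictor.sh to GPUs 0 and 2 for this session
export CUDA_VISIBLE_DEVICES=0,2

# Or scope the setting to a single run:
# CUDA_VISIBLE_DEVICES=0 predictor up --hf meta-llama/Llama-3-8B
```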

This is a standard NVIDIA environment variable. See CUDA documentation for details.


Debug Logging

Set RUST_LOG to enable detailed logging when troubleshooting. Alternatively, pass the --verbose flag on the command line.
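A sketch assuming predictor.sh follows the standard Rust log-level convention (error, warn, info, debug, trace):

```shell
# Raise the log level from the default (info) to debug
export RUST_LOG=debug

# Then start as usual, or pass the flag instead of the variable:
# RUST_LOG=debug predictor up --hf meta-llama/Llama-3-8B
# predictor up --verbose --hf meta-llama/Llama-3-8B
```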


ONNX Runtime Path

Set ORT_DYLIB_PATH to override the bundled ONNX Runtime library:
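For example (the library path below is illustrative; point it at your actual ONNX Runtime build):

```shell
# Use a system-installed ONNX Runtime instead of the bundled copy
export ORT_DYLIB_PATH=/usr/local/lib/libonnxruntime.so
```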

This is rarely needed. Use only if you require a specific ONNX Runtime version.
