predictor.yaml

The predictor.yaml file configures your endpoint. Generate one interactively:

predictor init

Full Reference

# ═══════════════════════════════════════════════════════════════
# Model Source (choose one)
# ═══════════════════════════════════════════════════════════════
model: ./path/to/model.gguf        # Local file path
# OR
hf_model: org/model-id             # HuggingFace model ID

# ═══════════════════════════════════════════════════════════════
# Server Settings
# ═══════════════════════════════════════════════════════════════
port: 8000                         # Local server port
timeout: 60                        # Request timeout (seconds)
max_concurrent: 10                 # Connection pool size

# ═══════════════════════════════════════════════════════════════
# Model Type & API Format
# ═══════════════════════════════════════════════════════════════
type: text                         # text | audio | image (auto-detected)
api_format: auto                   # auto | openai | generic
streaming_format: sse              # sse | raw

# ═══════════════════════════════════════════════════════════════
# Health Check
# ═══════════════════════════════════════════════════════════════
health_check:
  endpoint: /health                # Health check path
  interval: 30                     # Check interval (seconds)

# ═══════════════════════════════════════════════════════════════
# Hardware Configuration
# ═══════════════════════════════════════════════════════════════
hardware:
  backend: auto                    # auto | cuda | metal | cpu

# ═══════════════════════════════════════════════════════════════
# Inference Settings
# ═══════════════════════════════════════════════════════════════
inference:
  max_concurrent: 1                # Sequential by default
  request_queue_size: 100          # Queue depth limit

# ═══════════════════════════════════════════════════════════════
# Security
# ═══════════════════════════════════════════════════════════════
allowed_ips:                       # IP allowlist (single IPs or CIDR ranges)
  - "192.168.1.0/24"
  - "10.0.0.5"

# ═══════════════════════════════════════════════════════════════
# Logging
# ═══════════════════════════════════════════════════════════════
log_level: info                    # info | metadata | headers | bodies

# ═══════════════════════════════════════════════════════════════
# Metadata (displayed in dashboard)
# ═══════════════════════════════════════════════════════════════
metadata:
  name: "My Model"
  version: "v1.0"
  type: "llm"

# ═══════════════════════════════════════════════════════════════
# Audio Settings (Whisper)
# ═══════════════════════════════════════════════════════════════
audio:
  language: auto                   # or a specific language code, e.g. en

# ═══════════════════════════════════════════════════════════════
# TTS Settings
# ═══════════════════════════════════════════════════════════════
tts:
  default_voice: af_bella
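The Model Source block asks you to choose either `model` or `hf_model`. The Python sketch below shows one way a loader could enforce that on an already-parsed config dict. The function name `resolve_model_source` is hypothetical, and rejecting a config that sets both keys is an assumption about the tool's behavior, not something this page documents.

```python
# Hypothetical validator for the "choose one" model source rule.
# Illustrative sketch only; it is not predictor's own validation code.

def resolve_model_source(cfg: dict) -> tuple[str, str]:
    """Return ("local", path) or ("huggingface", model_id) for a parsed config."""
    has_local = "model" in cfg
    has_hf = "hf_model" in cfg
    if has_local and has_hf:
        # Assumption: supplying both sources is treated as an error.
        raise ValueError("set either 'model' or 'hf_model', not both")
    if has_local:
        return ("local", cfg["model"])
    if has_hf:
        return ("huggingface", cfg["hf_model"])
    raise ValueError("no model source: set 'model' or 'hf_model'")


# Parsed equivalents of two of the minimal examples on this page.
print(resolve_model_source({"model": "./llama-7b-q4.gguf"}))
# → ('local', './llama-7b-q4.gguf')
print(resolve_model_source({"hf_model": "openai/whisper-large-v3", "audio": {"language": "en"}}))
# → ('huggingface', 'openai/whisper-large-v3')
```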

Minimal Examples

Text Model (GGUF)

model: ./llama-7b-q4.gguf

HuggingFace Model

hf_model: TheBloke/Llama-2-7B-GGUF

Whisper

hf_model: openai/whisper-large-v3
audio:
  language: en

TTS

hf_model: hexgrad/Kokoro-82M
tts:
  default_voice: af_bella

Configuration Priority

Settings can be specified in several places. When the same setting is defined in more than one, the higher-priority source wins. Priority, from highest to lowest:

1. Command-line arguments and flags (--port, etc.)
2. predictor.yaml file
3. Default values
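This priority order behaves like a layered lookup. As a sketch (not the tool's implementation), Python's `collections.ChainMap` expresses it directly: a key is looked up in the CLI flags first, then in the parsed predictor.yaml, then in the defaults. The helper `effective_settings` and the `DEFAULTS` values are hypothetical placeholders, not predictor's documented defaults.

```python
from collections import ChainMap

# Hypothetical placeholder defaults; the real defaults are not listed here.
DEFAULTS = {"port": 8000, "timeout": 60, "log_level": "info"}


def effective_settings(cli_args: dict, yaml_cfg: dict) -> ChainMap:
    """Layer settings so lookups check CLI flags, then predictor.yaml, then defaults."""
    # Drop flags the user did not pass (None) so they don't mask lower layers.
    cli = {key: value for key, value in cli_args.items() if value is not None}
    return ChainMap(cli, yaml_cfg, DEFAULTS)


settings = effective_settings({"port": 9000, "timeout": None}, {"port": 8080, "timeout": 120})
print(settings["port"])       # 9000: the --port flag wins over predictor.yaml
print(settings["timeout"])    # 120: no flag given, so predictor.yaml applies
print(settings["log_level"])  # info: set nowhere else, so the default applies
```

Filtering out `None` matters: argument parsers such as Python's `argparse` report unset options as `None` by default, and without the filter an unset `--port` would hide the value from predictor.yaml.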

Log Levels

| Level | What's logged |
|---|---|
| info | Basic request info |
| metadata | Timestamp, status, latency, size |
| headers | Everything in metadata, plus request/response headers |
| bodies | Everything in headers, plus the first 1 KB of request/response bodies |

The headers and bodies levels may expose sensitive data, such as credentials in authorization headers or user content in request bodies. Use them with caution.
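The sketch below illustrates how cumulative log levels and a bounded body preview could work. It is hypothetical: the record field names, the choice of method and path as "basic request info", and treating every level as a superset of the one before it are all assumptions (the table above marks only headers and bodies as additive). For brevity it logs a single message body rather than both request and response.

```python
# Hypothetical sketch of cumulative log levels; field names are illustrative.
LEVELS = ["info", "metadata", "headers", "bodies"]
BODY_PREVIEW_BYTES = 1024  # "first 1 KB" of the body


def build_log_record(level: str, req: dict) -> dict:
    """Build a log entry for one request at the given level."""
    rank = LEVELS.index(level)  # raises ValueError for an unknown level
    # info: basic request info (method and path are an assumed choice)
    record = {"method": req["method"], "path": req["path"]}
    if rank >= LEVELS.index("metadata"):
        record.update(
            timestamp=req["timestamp"],
            status=req["status"],
            latency_ms=req["latency_ms"],
            size=len(req["body"]),
        )
    if rank >= LEVELS.index("headers"):
        record["headers"] = dict(req["headers"])
    if rank >= LEVELS.index("bodies"):
        # Keep only a bounded preview of the body, never the full payload.
        record["body"] = req["body"][:BODY_PREVIEW_BYTES]
    return record


req = {
    "method": "POST",
    "path": "/example",  # hypothetical path
    "timestamp": "2024-01-01T00:00:00Z",
    "status": 200,
    "latency_ms": 42,
    "headers": {"Content-Type": "application/json"},
    "body": b"x" * 4096,
}
print(sorted(build_log_record("metadata", req)))
# → ['latency_ms', 'method', 'path', 'size', 'status', 'timestamp']
print(len(build_log_record("bodies", req)["body"]))  # → 1024
```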

IP Allowlist

Restrict access to specific IP addresses or ranges:

allowed_ips:
  - "203.0.113.50"        # Single IP
  - "10.0.0.0/8"          # CIDR range
  - "192.168.1.0/24"      # Another range

Requests from IP addresses outside the allowlist receive a 403 Forbidden response.
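Mixing single IPs and CIDR ranges in one list works because a bare address can be treated as a one-host network. The Python sketch below shows that membership check with the standard `ipaddress` module, using the entries from the example above. The helper `is_allowed` is hypothetical, and the sketch does not model how predictor determines the client address (for example, behind a proxy).

```python
import ipaddress

# Entries from the allowlist example above.
ALLOWED_IPS = ["203.0.113.50", "10.0.0.0/8", "192.168.1.0/24"]

# ip_network() turns a bare address into a /32 (or /128 for IPv6) network,
# so single IPs and CIDR ranges share the same membership test.
ALLOWED_NETS = [ipaddress.ip_network(entry, strict=False) for entry in ALLOWED_IPS]


def is_allowed(client_ip: str) -> bool:
    """Return True if client_ip falls inside any allowlisted network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETS)


# A server would answer 403 Forbidden whenever is_allowed() returns False.
print(is_allowed("10.42.0.7"))     # True: inside 10.0.0.0/8
print(is_allowed("192.168.2.10"))  # False: outside 192.168.1.0/24
```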
