Quickstart

Get your first model endpoint running in under 5 minutes.

Prerequisites

  • predictor.sh CLI installed (Installation)

  • A predictor.sh account

Step 1: Authenticate

predictor login

This opens your browser for OAuth authentication. Once approved, you're ready to go.

Step 2: Serve a Model

Option A: Local Model File

If you have a GGUF model file:

predictor up ./llama-7b-q4.gguf

Option B: HuggingFace Model

Download and serve directly from HuggingFace:
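The command for this option is missing from the page; a sketch, assuming `predictor up` also accepts a Hugging Face repo ID in place of a local file path (the repo ID below is illustrative):

```shell
# Hypothetical usage: repo ID in place of a local path.
# Substitute the Hugging Face model you actually want to serve.
predictor up TheBloke/Llama-2-7B-GGUF
```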

First run will download the model. Subsequent runs use the cached version.

Step 3: Use Your Endpoint

Once running, you'll see output like:
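The original sample output is missing here; an illustrative sketch (the URL and fields are placeholders, not actual CLI output):

```
✔ Model loaded: llama-7b-q4.gguf
✔ Endpoint live at https://<your-subdomain>.predictor.sh
```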

Test with curl
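The curl example is missing from the page; a sketch, assuming the endpoint exposes an OpenAI-compatible chat completions route (the URL, route, and model name are placeholders):

```shell
# Placeholder URL; replace with the endpoint printed by `predictor up`.
curl https://your-endpoint.predictor.sh/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-7b-q4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```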

Use with OpenAI SDK
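The SDK example is missing from the page; a sketch, assuming the endpoint is OpenAI-compatible and the official Python SDK can be pointed at it via `base_url` (the URL, API key, and model name below are placeholders):

```python
# Hypothetical sketch: point the OpenAI Python SDK at your endpoint.
# base_url and model are placeholders; use the values from `predictor up`.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-endpoint.predictor.sh/v1",
    api_key="unused",  # some self-hosted endpoints ignore the key
)

response = client.chat.completions.create(
    model="llama-7b-q4",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

Because the SDK only needs a base URL, any OpenAI-compatible server can be swapped in without changing application code.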

Step 4: Monitor Your Endpoint

View live stats in the terminal UI, or check logs:
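The log command is missing from the page; a hypothetical sketch, since the exact subcommand isn't shown here:

```shell
# Hypothetical subcommand; check `predictor --help` for the actual name.
predictor logs
```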

Step 5: Shutdown

Press Ctrl+C in the terminal, or from another terminal:
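The shutdown command is missing from the page; a hypothetical sketch, since the exact subcommand isn't shown here:

```shell
# Hypothetical subcommand; check `predictor --help` for the actual name.
predictor down
```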

Your URL remains reserved for the next time you bring the endpoint back online.

Next Steps
