Troubleshooting

Common issues and solutions.

Connection Issues

"Tunnel connection failed"

Symptoms: Can't establish tunnel to predictor.sh servers.

Solutions:

  1. Check internet connection:

    curl https://predictor.sh
  2. Check firewall: Ensure outbound HTTPS (443) is allowed.

  3. Corporate proxy: Configure proxy settings:

    export HTTPS_PROXY=http://proxy.company.com:8080
  4. VPN interference: Try disconnecting VPN temporarily.

"Endpoint abc123 is already connected"

Cause: Another predictor instance is using this endpoint.

Solutions:

  1. Stop the other instance:

    predictor down
  2. If the other instance crashed, wait 30 seconds for the stale connection to time out.

  3. Use --force to override:

    predictor down --force

Authentication Issues

"Unauthorized (401)"

Cause: Invalid or expired token.

Solutions:

  1. Check that your token is set (the environment variable name below is a hypothetical example):
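
    # hypothetical variable name -- substitute wherever your deployment stores the token
    echo "$PREDICTOR_TOKEN"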

  2. Ensure requests send the correct Authorization header (the endpoint URL below is a placeholder):
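
    # Bearer scheme is standard; the URL and token variable are placeholders
    curl -H "Authorization: Bearer $PREDICTOR_TOKEN" https://your-endpoint.example.com/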

  3. Re-authenticate (the exact subcommand may vary):
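
    # hypothetical subcommand -- check predictor --help for the exact name
    predictor login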

"HuggingFace: Unauthorized"

Cause: Missing or invalid HF_TOKEN for gated models.

Solution:

Get a token at huggingface.co/settings/tokens, then export it before starting predictor:
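
    export HF_TOKEN=hf_your_token_here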


GPU Issues

"No GPU detected"

Symptoms: predictor reports "No GPU detected" even though a GPU is installed.

Solutions:

  1. Check NVIDIA drivers (Linux/Windows):
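
    # lists detected GPUs and the installed driver version
    nvidia-smi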

    If nvidia-smi is not found, install the NVIDIA drivers for your platform.

  2. Check CUDA:
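
    # prints the CUDA toolkit version; nvidia-smi reports the driver's supported CUDA version
    nvcc --version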

  3. Verify GPU visibility:
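
    # prints "unset" when the variable is not set at all
    echo "${CUDA_VISIBLE_DEVICES-unset}"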

    "unset" means all GPUs are visible. An empty line means the variable is set but empty, which hides every GPU; run unset CUDA_VISIBLE_DEVICES to clear it.

  4. Run on CPU (slower; the exact flag may vary):
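
    # hypothetical command and flag -- check predictor --help for the exact CPU option
    predictor up --device cpu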

"Insufficient VRAM"

Symptoms: The model fails to load and predictor reports insufficient VRAM.

Solutions:

  1. Use smaller quantization:

    • Q4_K_M (~4GB for 7B model)

    • Q5_K_M (~5GB for 7B model)

  2. Use a smaller model.

  3. Close other GPU applications.

  4. Run on CPU:
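
    # hypothetical command and flag -- see "No GPU detected" above
    predictor up --device cpu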

"CUDA out of memory" during inference

Cause: GPU ran out of memory during generation.

Solutions:

  1. Reduce max_tokens in requests.

  2. Lower concurrency in predictor.yaml:
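
    # "concurrency" is named in this guide; placement and value are illustrative
    concurrency: 1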

  3. Use smaller quantization.


Model Issues

"Unsupported model format"

Cause: predictor.sh doesn't support this file format.

Supported formats:

  • .gguf - GGUF quantized models

  • .safetensors - SafeTensors format

  • .onnx - ONNX models

Not supported:

  • .pt, .bin - PyTorch checkpoints

  • .h5 - Keras/TensorFlow

Solution: Find a GGUF or SafeTensors version of your model.

"Model architecture not recognized"

Cause: Missing or invalid config.json.

Solution: Ensure your model directory contains:

  • config.json (model configuration)

  • *.safetensors (model weights)

  • tokenizer.json (for text models)
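
For example, a minimal text-model directory might look like this (file names are illustrative):

    my-model/
    ├── config.json
    ├── model.safetensors
    └── tokenizer.json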

"Download failed / checksum mismatch"

Cause: Corrupted or incomplete download.

Solutions:

  1. Clear the model cache and retry (the cache path below is an assumption):
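
    # hypothetical cache location -- check your predictor configuration for the real path
    rm -rf ~/.cache/predictor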

  2. Check free disk space:
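
    # shows free space per filesystem; large model downloads can need tens of GB
    df -h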

  3. Try a different network connection.


Request Issues

"Request timeout (504)"

Cause: Inference took too long.

Solutions:

  1. Increase timeout in predictor.yaml:
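
    # "timeout" is named in this guide; the value (in seconds) is an assumed example
    timeout: 120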

  2. Reduce max_tokens in your request.

  3. Use a faster model/quantization.

"Too many requests (429)"

Cause: Rate limit exceeded.

Solutions:

  1. Reduce request frequency.

  2. Check your tier limits:
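
    # hypothetical subcommand -- your account dashboard may also show tier limits
    predictor account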

  3. Upgrade to a higher tier.


Logs and Debugging

Enable debug logging
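
    # hypothetical flag -- check predictor --help for the real debug switch
    predictor up --log-level debug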

View log file

Logs are written to /tmp/predictor.log:
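
    # follow the log as it grows
    tail -f /tmp/predictor.log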

Raw log output (no TUI)
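
    # hypothetical flag -- check predictor --help for the exact option
    predictor up --no-tui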


Getting Help

If you're still stuck:

  1. Check the project's GitHub Issues page

  2. Search existing issues for your error message

  3. Open a new issue with:

    • Error message

    • predictor --version output

    • OS and hardware info

    • Steps to reproduce
