Common issues and solutions.
Connection Issues
"Tunnel connection failed"
Symptoms: Can't establish tunnel to predictor.sh servers.
Solutions:
Check internet connection:
curl https://predictor.sh
Check firewall: Ensure outbound HTTPS (443) is allowed.
Corporate proxy: Configure proxy settings:
export HTTPS_PROXY=http://proxy.company.com:8080
VPN interference: temporarily disconnect your VPN and retry.
"Endpoint abc123 is already connected"
Cause: Another predictor instance is using this endpoint.
Solutions:
If the other instance crashed, wait about 30 seconds for its session to time out.
Use --force to override:
predictor down --force
Authentication Issues
"Unauthorized (401)"
Cause: Invalid or expired token.
Solutions:
Ensure correct Authorization header:
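For example, with curl. The Bearer scheme and the endpoint URL below are assumptions; substitute your real token and URL:

```shell
# Placeholder values -- substitute your real token and endpoint URL.
TOKEN="example-token"
ENDPOINT="https://api.predictor.sh/v1/health"
# A 401 usually means this header is missing or malformed:
# `|| true` keeps the sketch from aborting if the endpoint is unreachable.
curl -s -H "Authorization: Bearer ${TOKEN}" "$ENDPOINT" || true
```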
"HuggingFace: Unauthorized"
Cause: Missing or invalid HF_TOKEN for gated models.
Solution:
Get token at: huggingface.co/settings/tokens
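A minimal sketch, assuming predictor.sh reads the token from the HF_TOKEN environment variable as the error message suggests:

```shell
# hf_xxx is a placeholder -- paste the token you created at
# huggingface.co/settings/tokens
export HF_TOKEN="hf_xxx"
```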
"No GPU detected"
Symptoms:
Solutions:
Check NVIDIA drivers (Linux/Windows):
If not found, install NVIDIA drivers.
Verify GPU visibility: check the CUDA_VISIBLE_DEVICES environment variable. If it is unset, all GPUs are visible; an empty value hides every GPU. Unset it if it is masking your GPU.
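The two checks above can be sketched as:

```shell
# 1. Driver check: nvidia-smi ships with the NVIDIA driver, so
#    "command not found" means the driver is not installed.
nvidia-smi || echo "driver not installed or GPU not visible"

# 2. GPU masking: an unset CUDA_VISIBLE_DEVICES means all GPUs are
#    visible; an empty value hides every GPU.
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES-<unset>}"
unset CUDA_VISIBLE_DEVICES
```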
"Insufficient VRAM"
Symptoms:
Solutions:
Use smaller quantization:
Q4_K_M (~4GB for 7B model)
Q5_K_M (~5GB for 7B model)
Close other GPU applications.
"CUDA out of memory" during inference
Cause: GPU ran out of memory during generation.
Solutions:
Reduce max_tokens in requests.
Lower concurrency in predictor.yaml:
Use smaller quantization.
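A sketch of the concurrency change in predictor.yaml; the key name `concurrency` is an assumption here, so verify the exact field against the tool's config reference:

```yaml
# Hypothetical key name -- verify against the predictor.sh config docs.
concurrency: 1   # serve one request at a time to cap peak GPU memory
```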
Unsupported model format
Cause: predictor.sh doesn't support this file format.
Supported formats:
.gguf - GGUF quantized models
.safetensors - SafeTensors format
Not supported:
.pt, .bin - PyTorch checkpoints
Solution: Find a GGUF or SafeTensors version of your model.
"Model architecture not recognized"
Cause: Missing or invalid config.json.
Solution: Ensure your model directory contains:
config.json (model configuration)
*.safetensors (model weights)
tokenizer.json (for text models)
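A quick way to check for those files, sketched in shell (`./my-model` is a placeholder path):

```shell
MODEL_DIR="./my-model"   # placeholder -- point this at your model directory
# Check for the required metadata files listed above:
for f in config.json tokenizer.json; do
  [ -e "$MODEL_DIR/$f" ] || echo "missing: $f"
done
# At least one weights file should be present:
ls "$MODEL_DIR"/*.safetensors 2>/dev/null || echo "missing: *.safetensors"
```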
"Download failed / checksum mismatch"
Cause: Corrupted or incomplete download.
Solutions:
Try a different network connection.
"Request timeout (504)"
Cause: Inference took too long.
Solutions:
Increase timeout in predictor.yaml:
Reduce max_tokens in your request.
Use a faster model/quantization.
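A sketch of the timeout change in predictor.yaml; the key name `timeout` is an assumption, so check the tool's config reference for the exact field:

```yaml
# Hypothetical key name -- verify against the predictor.sh config docs.
timeout: 300   # seconds to wait for a response before returning 504
```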
"Too many requests (429)"
Cause: Rate limit exceeded.
Solutions:
Reduce request frequency.
Upgrade to a higher tier.
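One way to reduce request frequency is client-side retrying with exponential backoff. A minimal sketch; `send_request` is a hypothetical stand-in for your real API call:

```shell
# send_request is a placeholder for your real API call; it should
# return non-zero on a 429 response. TOKEN and URL are placeholders.
send_request() { curl -sf -H "Authorization: Bearer $TOKEN" "$URL"; }

# Retry with exponential backoff: wait 1s, 2s, 4s, ... between attempts.
backoff_retry() {
  for attempt in 0 1 2 3 4; do
    if send_request; then return 0; fi
    sleep $((1 << attempt))
  done
  return 1
}
```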
Logs and Debugging
Enable debug logging
Logs are written to /tmp/predictor.log.
Raw log output (no TUI)
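To inspect the log file named above:

```shell
LOG=/tmp/predictor.log
# Show recent entries; use `tail -f "$LOG"` to follow live output.
[ -r "$LOG" ] && tail -n 100 "$LOG" || echo "no log file yet"
```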
If you're still stuck:
Search existing issues for your error message
Open a new issue with:
predictor --version output