# Troubleshooting

Common issues and solutions.

## Connection Issues

### "Tunnel connection failed"

**Symptoms:** Can't establish a tunnel to the predictor.sh servers.

**Solutions:**
1. Check your internet connection:

   ```bash
   curl https://predictor.sh
   ```

2. Check your firewall: ensure outbound HTTPS (443) is allowed.

3. Corporate proxy: configure proxy settings:

   ```bash
   export HTTPS_PROXY=http://proxy.company.com:8080
   ```

4. VPN interference: try disconnecting the VPN temporarily.
"Endpoint abc123 is already connected"
Cause: Another predictor instance is using this endpoint.
Solutions:
1. Stop the other instance:

   ```bash
   predictor down
   ```

2. If the other instance crashed, wait 30 seconds for the connection to time out.

3. Use `--force` to override:

   ```bash
   predictor down --force
   ```
## Authentication Issues

### "Unauthorized (401)"

**Cause:** Invalid or expired token.

**Solutions:**
1. Check your token (see the sketch below).
2. Ensure requests send the correct `Authorization` header.
3. Re-authenticate.
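A minimal sketch of these checks, assuming the token lives in a `PREDICTOR_TOKEN` environment variable and that the CLI has a `predictor login` subcommand (both assumptions), with an illustrative endpoint URL:

```bash
# Inspect the token currently in use (assumed variable name)
echo "$PREDICTOR_TOKEN"

# Send the token in a Bearer Authorization header (illustrative URL)
curl -H "Authorization: Bearer $PREDICTOR_TOKEN" https://predictor.sh/v1/models

# Re-authenticate (assumed subcommand; check `predictor --help`)
predictor login
```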
"HuggingFace: Unauthorized"
Cause: Missing or invalid HF_TOKEN for gated models.
Solution:
Get token at: huggingface.co/settings/tokens
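A minimal sketch: export the token before starting predictor.sh (the `hf_...` value is a placeholder):

```bash
# Make the HuggingFace token available to the predictor process
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx   # placeholder; paste your real token
```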
## GPU Issues

### "No GPU detected"

**Symptoms:** predictor.sh does not detect your GPU.

**Solutions:**
1. Check NVIDIA drivers (Linux/Windows); if the driver tools are not found, install the NVIDIA drivers.
2. Check CUDA.
3. Verify GPU visibility: an empty `CUDA_VISIBLE_DEVICES` means all GPUs are visible. Unset it if needed.
4. Run on CPU (slower).

All four steps are sketched below.
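A minimal sketch of these checks using the standard NVIDIA tools; the `--cpu` flag and `up` subcommand in the last step are assumptions (check `predictor --help`):

```bash
# 1. Driver check: prints a GPU table if drivers are installed
nvidia-smi

# 2. CUDA toolkit check
nvcc --version

# 3. GPU visibility: empty output means all GPUs are visible
echo "$CUDA_VISIBLE_DEVICES"
unset CUDA_VISIBLE_DEVICES   # clear a stale restriction if one is set

# 4. CPU fallback (assumed flag and subcommand)
predictor up --cpu
```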
"Insufficient VRAM"
Symptoms:
Solutions:
1. Use a smaller quantization:
   - Q4_K_M (~4 GB for a 7B model)
   - Q5_K_M (~5 GB for a 7B model)
2. Use a smaller model.
3. Close other GPU applications.
4. Run on CPU.
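As a rough check: weight memory ≈ parameter count × bits per weight ÷ 8. Q4_K_M averages roughly 4.5-5 bits per weight, so a 7B model lands near the ~4 GB figure above, before KV cache and runtime overhead:

```bash
# Weights-only VRAM estimate for a 7B model at ~4.85 bits/weight (Q4_K_M average)
awk 'BEGIN { params = 7e9; bits = 4.85; printf "%.1f GB\n", params * bits / 8 / 1e9 }'
```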
"CUDA out of memory" during inference
Cause: GPU ran out of memory during generation.
Solutions:
1. Reduce `max_tokens` in requests.
2. Lower concurrency in `predictor.yaml` (see the sketch below).
3. Use a smaller quantization.
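A sketch of the concurrency setting; the key name is an assumption (check the predictor.yaml configuration reference):

```yaml
# predictor.yaml -- key name assumed
concurrency: 1   # serve one request at a time to cap peak VRAM
```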
## Model Issues

### "Unsupported model format"

**Cause:** predictor.sh doesn't support this file format.

**Supported formats:**

- `.gguf` - GGUF quantized models
- `.safetensors` - SafeTensors format
- `.onnx` - ONNX models

**Not supported:**

- `.pt`, `.bin` - PyTorch checkpoints
- `.h5` - Keras/TensorFlow

**Solution:** Find a GGUF or SafeTensors version of your model.
"Model architecture not recognized"
Cause: Missing or invalid config.json.
Solution: Ensure your model directory contains:
config.json(model configuration)*.safetensors(model weights)tokenizer.json(for text models)
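A quick way to verify, with `./my-model` as an illustrative path:

```bash
ls ./my-model
# expected: config.json  model.safetensors  tokenizer.json
```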
"Download failed / checksum mismatch"
Cause: Corrupted or incomplete download.
Solutions:
1. Clear the cache and retry (see the sketch below).
2. Check disk space.
3. Try a different network connection.
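A minimal sketch, assuming the download cache lives under `~/.predictor/cache` (an assumed path; verify it before deleting anything):

```bash
# Check free disk space first
df -h

# Clear the (assumed) cache directory, then retry the download
rm -rf ~/.predictor/cache
```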
## Request Issues

### "Request timeout (504)"

**Cause:** Inference took too long.

**Solutions:**
1. Increase the timeout in `predictor.yaml` (see the sketch below).
2. Reduce `max_tokens` in your request.
3. Use a faster model/quantization.
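A sketch of the timeout setting; the key name and units are assumptions:

```yaml
# predictor.yaml -- key name and units assumed
timeout: 300   # seconds to allow for a single inference request
```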
"Too many requests (429)"
Cause: Rate limit exceeded.
Solutions:
1. Reduce request frequency, e.g. with exponential backoff (see the sketch below).
2. Check your tier limits.
3. Upgrade to a higher tier.
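One way to reduce request frequency is client-side exponential backoff; a minimal sketch with an illustrative endpoint URL:

```bash
# Retry on HTTP 429 with exponential backoff
url="https://abc123.predictor.sh/v1/completions"   # illustrative URL
delay=1
for attempt in 1 2 3 4 5; do
  status=$(curl -s -o /dev/null -w '%{http_code}' "$url")
  [ "$status" != "429" ] && break   # stop on success or a non-rate-limit error
  echo "got 429, retrying in ${delay}s (attempt $attempt)"
  sleep "$delay"
  delay=$((delay * 2))
done
```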
## Logs and Debugging

### Enable debug logging
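A sketch, assuming debug logging is toggled via a `PREDICTOR_LOG_LEVEL` environment variable (an assumption; check `predictor --help`):

```bash
PREDICTOR_LOG_LEVEL=debug predictor up   # assumed variable name and subcommand
```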
### View log file

Logs are written to `/tmp/predictor.log`:
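```bash
# Follow the log as it is written
tail -f /tmp/predictor.log
```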
### Raw log output (no TUI)
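A sketch, assuming the TUI can be disabled with a `--no-tui` flag (an assumption; check `predictor --help`):

```bash
predictor up --no-tui   # assumed flag; streams raw logs to stdout
```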
## Getting Help

If you're still stuck:

1. Check the GitHub Issues.
2. Search existing issues for your error message.
3. Open a new issue with:
   - The error message
   - `predictor --version` output
   - OS and hardware info
   - Steps to reproduce