Text-to-Speech
predictor.sh supports two TTS engines: Kokoro (ONNX) and Parler-TTS (Candle).
Engine Comparison
| Feature | Kokoro | Parler-TTS |
| --- | --- | --- |
| Quality | Best | Good |
| Speed | Faster | Slower |
| Streaming | ✅ Full SSE | ❌ No |
| Voices | 10 built-in | 6 presets + custom |
| Voice Control | Fixed voices | Prompt-based |
The engine is auto-detected from the model path:
Path contains "kokoro" → Kokoro engine
Path contains "parler" → Parler-TTS engine
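The detection rule above can be sketched as a simple substring check. This is an illustration only, not the project's actual code: the function name `detect_engine`, the example paths, the case-insensitive match, the order of the checks, and the `None` fallback are all assumptions.

```python
from typing import Optional


def detect_engine(model_path: str) -> Optional[str]:
    """Return "kokoro" or "parler" based on substrings of the model path."""
    lowered = model_path.lower()  # assumption: matching ignores case
    if "kokoro" in lowered:
        return "kokoro"
    if "parler" in lowered:
        return "parler"
    # Neither substring is present; the real behavior in this case is not documented here.
    return None


# Hypothetical model paths, used only to exercise the function.
print(detect_engine("models/kokoro-v1.onnx"))
print(detect_engine("models/parler-tts-mini"))
```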
Kokoro TTS (Recommended)
High-quality, fast synthesis with streaming support.
Available Voices
| Voice | Description | Gender | Accent |
| --- | --- | --- | --- |
| af_bella | Warm, neutral | Female | American |
| af_sarah | Clear, professional | Female | American |
| af_nicole | Energetic, bright | Female | American |
| af_sky | Soft, gentle | Female | American |
| am_adam | Neutral, clear | Male | American |
| am_michael | Deep, authoritative | Male | American |
| bf_emma | Warm, approachable | Female | British |
| bf_isabella | Elegant, refined | Female | British |
| bm_george | Classic, distinguished | Male | British |
| bm_lewis | Friendly, conversational | Male | British |
OpenAI Voice Mapping
When using OpenAI voice names, they map to Kokoro voices:
| OpenAI Voice | Kokoro Voice |
| --- | --- |
| alloy | af_bella |
| echo | am_adam |
| fable | bm_george |
| onyx | am_michael |
| nova | af_nicole |
| shimmer | af_sky |
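For client code that needs to know which Kokoro voice an OpenAI name resolves to, the mapping can be mirrored as a lookup table. The values come from the mapping above; the helper name `resolve_kokoro_voice` and the choice to pass unknown names through unchanged are assumptions made for this sketch.

```python
# OpenAI voice name -> Kokoro voice, as listed in this section.
OPENAI_TO_KOKORO = {
    "alloy": "af_bella",
    "echo": "am_adam",
    "fable": "bm_george",
    "onyx": "am_michael",
    "nova": "af_nicole",
    "shimmer": "af_sky",
}


def resolve_kokoro_voice(name: str) -> str:
    """Translate an OpenAI voice name; return any other name unchanged."""
    # Assumption: native Kokoro names such as "bf_emma" need no translation.
    return OPENAI_TO_KOKORO.get(name, name)


print(resolve_kokoro_voice("nova"))
print(resolve_kokoro_voice("bf_emma"))
```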
Parler-TTS
Prompt-based voice synthesis using natural language descriptions.
Voice Presets
| Preset | Description |
| --- | --- |
| alloy | A neutral, clear female voice |
| echo | A neutral, clear male voice |
| fable | A warm, storytelling male voice |
| onyx | A deep, authoritative male voice |
| nova | An energetic, bright female voice |
| shimmer | A soft, gentle female voice |
Custom Voice Descriptions
Parler-TTS also accepts a custom voice description written in natural language, in the same style as the preset descriptions above.
API Usage
OpenAI-Compatible API
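A minimal sketch of a request in the OpenAI speech format, using only the Python standard library. The field names (`model`, `input`, `voice`, `response_format`, `speed`) and the `/v1/audio/speech` path follow OpenAI's public API; that predictor.sh serves the same path, the base URL `http://localhost:8080`, and the model name `kokoro` are assumptions, so adjust them to your deployment.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: replace with your server address


def build_speech_request(text, voice="alloy", response_format="mp3",
                         speed=1.0, model="kokoro"):
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    payload = {
        "model": model,  # assumption: hypothetical model identifier
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_speech_request("Hello from predictor.sh", voice="af_bella")
print(req.full_url)
```

To send the request and save the audio, pass `req` to `urllib.request.urlopen` and write the response bytes to a file such as `speech.mp3`.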
With Streaming (Kokoro only)
ElevenLabs-Compatible API
List Available Voices
Python SDK
Output Formats
| Format | Extension | Notes |
| --- | --- | --- |
| MP3 | .mp3 | Default, smaller files |
| WAV | .wav | Uncompressed, larger |
Speed Control
Speech speed is adjustable from 0.5x to 2.0x, where 1.0x is normal speed.
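A client can keep a requested speed inside the documented range before sending it. This is only a client-side sketch: whether the server clamps or rejects out-of-range values is not stated here, and the helper name `clamp_speed` is hypothetical.

```python
MIN_SPEED = 0.5  # lower bound documented above
MAX_SPEED = 2.0  # upper bound documented above


def clamp_speed(speed: float) -> float:
    """Limit a requested speed multiplier to the range 0.5 to 2.0."""
    return max(MIN_SPEED, min(MAX_SPEED, speed))


print(clamp_speed(3.0))
print(clamp_speed(1.25))
```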
Streaming Format
When streaming is enabled, audio is delivered as Server-Sent Events (SSE). Each event carries a chunk of base64-encoded audio that can be decoded and played progressively.
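The decoding step can be sketched as follows. The exact event layout is not shown in this section, so the sketch assumes each `data:` line holds one raw base64 string and that an OpenAI-style `[DONE]` sentinel may end the stream; if the server wraps chunks in JSON, extract the audio field before decoding. The name `iter_audio_chunks` is hypothetical.

```python
import base64
from typing import Iterable, Iterator


def iter_audio_chunks(lines: Iterable[str]) -> Iterator[bytes]:
    """Yield decoded audio bytes from the lines of an SSE stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if not payload or payload == "[DONE]":  # assumption: end-of-stream sentinel
            continue
        yield base64.b64decode(payload)


# Simulated stream with two chunks, standing in for a real HTTP response.
sample = [
    "data: " + base64.b64encode(b"abc").decode("ascii"),
    "",
    "data: " + base64.b64encode(b"def").decode("ascii"),
    "",
    "data: [DONE]",
]
audio = b"".join(iter_audio_chunks(sample))
print(len(audio))
```

In a real client, each yielded chunk would be handed to an audio player or appended to a file as it arrives rather than joined at the end.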