Text-to-Speech

predictor.sh supports two TTS engines: Kokoro (ONNX) and Parler-TTS (Candle).

Engine Comparison

| Feature | Kokoro | Parler-TTS |
| --- | --- | --- |
| Quality | Best | Good |
| Speed | Faster | Slower |
| Streaming | ✅ Full SSE | ❌ No |
| Voices | 10 built-in | 6 presets + custom |
| Voice Control | Fixed voices | Prompt-based |

The engine is auto-detected from the model path:

  • Path contains "kokoro" → Kokoro engine

  • Path contains "parler" → Parler-TTS engine
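
The detection rule above can be sketched as a small function. This is an illustration only, not predictor.sh's actual implementation: the name `detect_engine`, the case-insensitive match, and the choice to check "kokoro" first are all assumptions.

```python
def detect_engine(model_path: str) -> str:
    """Return the TTS engine implied by a model path (hypothetical helper)."""
    # Assumption: matching ignores case, and "kokoro" wins if both appear.
    lowered = model_path.lower()
    if "kokoro" in lowered:
        return "kokoro"
    if "parler" in lowered:
        return "parler"
    raise ValueError(f"cannot infer TTS engine from path: {model_path!r}")


print(detect_engine("models/kokoro-v1.onnx"))
print(detect_engine("models/parler-tts-mini"))
```

A path that matches neither substring raises an error in this sketch; how predictor.sh itself handles that case is not stated here.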

Kokoro

High-quality, fast synthesis with streaming support.

Available Voices

| Voice ID | Description | Gender | Accent |
| --- | --- | --- | --- |
| af_bella | Warm, neutral | Female | American |
| af_sarah | Clear, professional | Female | American |
| af_nicole | Energetic, bright | Female | American |
| af_sky | Soft, gentle | Female | American |
| am_adam | Neutral, clear | Male | American |
| am_michael | Deep, authoritative | Male | American |
| bf_emma | Warm, approachable | Female | British |
| bf_isabella | Elegant, refined | Female | British |
| bm_george | Classic, distinguished | Male | British |
| bm_lewis | Friendly, conversational | Male | British |

OpenAI Voice Mapping

OpenAI voice names map to Kokoro voices as follows:

| OpenAI Voice | Kokoro Voice |
| --- | --- |
| alloy | af_bella |
| echo | am_adam |
| fable | bm_george |
| onyx | am_michael |
| nova | af_nicole |
| shimmer | af_sky |
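
For client code that needs to know which Kokoro voice a request will use, the mapping can be mirrored in a lookup table. The sketch below copies the voice names from this page; the helper name `resolve_kokoro_voice` and the behaviour of rejecting unknown names are assumptions, not documented server behaviour.

```python
# Mapping copied from the OpenAI Voice Mapping list on this page.
OPENAI_TO_KOKORO = {
    "alloy": "af_bella",
    "echo": "am_adam",
    "fable": "bm_george",
    "onyx": "am_michael",
    "nova": "af_nicole",
    "shimmer": "af_sky",
}

# The ten built-in Kokoro voice IDs.
KOKORO_VOICES = {
    "af_bella", "af_sarah", "af_nicole", "af_sky", "am_adam",
    "am_michael", "bf_emma", "bf_isabella", "bm_george", "bm_lewis",
}


def resolve_kokoro_voice(name: str) -> str:
    """Accept either a Kokoro voice ID or an OpenAI voice name (hypothetical helper)."""
    if name in KOKORO_VOICES:
        return name
    if name in OPENAI_TO_KOKORO:
        return OPENAI_TO_KOKORO[name]
    # Assumption: unknown names are an error rather than falling back to a default.
    raise ValueError(f"unknown voice: {name!r}")


print(resolve_kokoro_voice("fable"))
```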

Parler-TTS

Prompt-based voice synthesis using natural language descriptions.

Voice Presets

| Preset | Description |
| --- | --- |
| alloy | A neutral, clear female voice |
| echo | A neutral, clear male voice |
| fable | A warm, storytelling male voice |
| onyx | A deep, authoritative male voice |
| nova | An energetic, bright female voice |
| shimmer | A soft, gentle female voice |

Custom Voice Descriptions

Parler-TTS accepts custom voice descriptions:
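
One way to picture the relationship between presets and custom descriptions is a lookup that falls through to free text. The sketch below is an assumption about how a client might model this, not the server's documented behaviour: the helper name `parler_description` is hypothetical, and the example custom description is made up for illustration.

```python
# Preset descriptions copied from the Voice Presets list on this page.
PARLER_PRESETS = {
    "alloy": "A neutral, clear female voice",
    "echo": "A neutral, clear male voice",
    "fable": "A warm, storytelling male voice",
    "onyx": "A deep, authoritative male voice",
    "nova": "An energetic, bright female voice",
    "shimmer": "A soft, gentle female voice",
}


def parler_description(voice: str) -> str:
    """Return the natural-language description used to prompt Parler-TTS.

    Assumption: a preset name expands to its description, and any other
    string is treated as a custom description and passed through unchanged.
    """
    return PARLER_PRESETS.get(voice, voice)


print(parler_description("onyx"))
print(parler_description("A calm female voice speaking slowly"))
```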

API Usage

OpenAI-Compatible API
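
A minimal sketch of a request, assuming the endpoint mirrors OpenAI's `POST /v1/audio/speech` with its `model`, `input`, `voice`, and `response_format` fields. The base URL, the model name `kokoro`, and the helper names are placeholders for illustration; add an `Authorization` header if your deployment requires one.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: wherever predictor.sh is serving


def build_speech_request(text, voice="alloy", response_format="mp3", model="kokoro"):
    """Build (but do not send) an OpenAI-style speech request."""
    payload = {
        "model": model,  # assumption: placeholder model name
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path, **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(build_speech_request(text, **kwargs)) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)


# Example (requires a running server):
# synthesize("Hello from predictor.sh", "hello.mp3", voice="af_bella")
```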

With Streaming (Kokoro only)

ElevenLabs-Compatible API
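
A minimal sketch, assuming the compatible route follows ElevenLabs' public `POST /v1/text-to-speech/{voice_id}` shape with a JSON body containing `text`. The base URL and helper name are placeholders, and whether other ElevenLabs fields are honoured is not stated here.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: wherever predictor.sh is serving


def build_elevenlabs_request(text, voice_id):
    """Build (but do not send) an ElevenLabs-style text-to-speech request."""
    # Quote the voice ID so it is safe to place in the URL path.
    path = f"/v1/text-to-speech/{urllib.parse.quote(voice_id, safe='')}"
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_elevenlabs_request("Hello there", "bf_emma")
print(req.full_url)

# To send it (requires a running server):
# with urllib.request.urlopen(req) as resp:
#     audio = resp.read()
```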

List Available Voices

Python SDK

Output Formats

| Format | Extension | Notes |
| --- | --- | --- |
| MP3 | .mp3 | Default, smaller files |
| WAV | .wav | Uncompressed, larger |

Speed Control

Adjust speech speed from 0.5x to 2.0x:
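
A client can validate the range before sending a request. The sketch below assumes the value travels in a `speed` field, as in OpenAI's speech API; the helper name is hypothetical, and whether the server clamps or rejects out-of-range values is not stated here.

```python
MIN_SPEED = 0.5
MAX_SPEED = 2.0


def speech_payload(text, voice, speed=1.0):
    """Build a request body, rejecting speeds outside the documented range."""
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}")
    # Assumption: the field is named "speed", following OpenAI's API.
    return {"input": text, "voice": voice, "speed": speed}


print(speech_payload("Slow and steady", "af_sky", speed=0.75))
```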

Streaming Format

When streaming is enabled, audio is delivered as Server-Sent Events:

Each chunk contains base64-encoded audio that can be decoded and played progressively.
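
Decoding such a stream can be sketched as follows. The exact event shape is not shown on this page, so this assumes each `data:` line carries the raw base64 payload and that an OpenAI-style `[DONE]` sentinel may end the stream; if the payload is wrapped in JSON, extract the audio field before decoding.

```python
import base64
from typing import Iterable, Iterator


def iter_audio_chunks(lines: Iterable[str]) -> Iterator[bytes]:
    """Yield decoded audio bytes from the lines of an SSE response."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if not payload or payload == "[DONE]":  # assumption: optional end sentinel
            continue
        yield base64.b64decode(payload)


# Simulated stream; a real client would iterate over the HTTP response lines.
sample = [
    "data: " + base64.b64encode(b"abc").decode("ascii"),
    "",
    "data: " + base64.b64encode(b"def").decode("ascii"),
    "",
    "data: [DONE]",
]
print(b"".join(iter_audio_chunks(sample)))  # b'abcdef'
```

Because each chunk is decoded independently, a player can start on the first chunk instead of waiting for the full response.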
