Audio
Audio transcription and speech synthesis endpoints.
Transcriptions (Speech-to-Text)
Endpoint
POST /v1/audio/transcriptions
Request
curl https://abc123.predictor.sh/v1/audio/transcriptions \
-H "Authorization: Bearer $TOKEN" \
-F "file=@audio.mp3" \
-F "model=whisper-1"
Parameters
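| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| file | file | Yes | Audio file (mp3, wav, m4a, webm, etc.) |
| model | string | Yes | Model ID (use "whisper-1") |
| language | string | No | Language code (e.g., "en", "es") |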
Response
{
"text": "Hello, this is a transcription of the audio file."
}
Supported Audio Formats
WAV (.wav)
MP3 (.mp3)
M4A/AAC (.m4a, .aac)
OGG Vorbis (.ogg, .oga)
FLAC (.flac)
WebM (.webm)
Python Example
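A minimal sketch of the same upload as the curl request above, using only the Python standard library. The helper names `build_transcription_request` and `transcribe` are hypothetical, and the `application/octet-stream` content type for the file part is an assumption; the endpoint, form fields, and `text` response key come from this page.

```python
import json
import os
import urllib.request
import uuid


def build_transcription_request(base_url, token, filename, audio_bytes,
                                model="whisper-1", language=None):
    """Build a multipart/form-data POST for /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    fields = {"model": model}
    if language is not None:
        fields["language"] = language  # optional, e.g. "en" or "es"

    parts = []
    # Plain text form fields.
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    # The audio file itself (content type is an assumption).
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode()
        + audio_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(base_url, token, path, **kwargs):
    """Upload the file at `path` and return the transcribed text."""
    with open(path, "rb") as f:
        audio = f.read()
    request = build_transcription_request(
        base_url, token, os.path.basename(path), audio, **kwargs
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]
```

Calling `transcribe("https://abc123.predictor.sh", token, "audio.mp3", language="en")` would send the request and return the `text` field of the JSON response.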
Speech (Text-to-Speech)
OpenAI-Compatible Endpoint
Request
Parameters
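| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| input | string | Yes | Text to synthesize |
| voice | string | Yes | Voice ID |
| response_format | string | No | mp3 (default) or wav |
| speed | float | No | Speed 0.5 to 2.0 (default 1.0) |
| stream_format | string | No | Set to sse for streaming |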
Available Voices
| Voice | Description |
| --- | --- |
| alloy | Neutral female |
| echo | Neutral male |
| fable | British male |
| onyx | Deep male |
| nova | Energetic female |
| shimmer | Soft female |
Response
Binary audio data (MP3 or WAV).
Example
Python Example
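A hedged sketch of a text-to-speech call built from the parameter table above. This page does not show the endpoint path or the body encoding, so the path `/v1/audio/speech` (the usual OpenAI convention) and the JSON request body are assumptions; the helper names `build_speech_request` and `synthesize` are hypothetical.

```python
import json
import urllib.request

# Assumption: OpenAI-style path; it is not stated on this page.
SPEECH_PATH = "/v1/audio/speech"


def build_speech_request(base_url, token, text, voice,
                         response_format="mp3", speed=1.0):
    """Build a POST request carrying the documented speech parameters."""
    if response_format not in ("mp3", "wav"):
        raise ValueError("response_format must be 'mp3' or 'wav'")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")

    payload = {
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        f"{base_url}{SPEECH_PATH}",
        data=json.dumps(payload).encode(),  # assumption: JSON body
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def synthesize(base_url, token, text, voice, out_path, **kwargs):
    """Request speech and write the binary audio response to `out_path`."""
    request = build_speech_request(base_url, token, text, voice, **kwargs)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

For example, `synthesize("https://abc123.predictor.sh", token, "Hello", "alloy", "hello.mp3")` would save the returned MP3 data to `hello.mp3`, provided the assumed path matches the deployment.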
ElevenLabs-Compatible Endpoints
Generate Speech
Streaming Speech
The response is a Server-Sent Events stream of base64-encoded audio chunks.
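One way a client might reassemble such a stream is sketched below. The event payload shape is not documented here, so this assumes each event's `data` field is a JSON object whose `audio` key (a hypothetical name) holds one base64 chunk; adjust `audio_key` to whatever the service actually sends.

```python
import base64
import json


def iter_sse_data(lines):
    """Yield the data payload of each Server-Sent Event found in `lines`."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[5:]
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Comment lines (starting with ":") and other fields are ignored.
    if buffer:
        yield "\n".join(buffer)


def decode_audio_chunks(lines, audio_key="audio"):
    """Yield raw audio bytes for each event (payload shape is an assumption)."""
    for data in iter_sse_data(lines):
        event = json.loads(data)
        yield base64.b64decode(event[audio_key])
```

Joining the yielded chunks with `b"".join(...)` gives the full audio, or each chunk can be written to a file or player as it arrives.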
List Voices
Response:
Voice IDs (Kokoro)
| Voice ID | Description |
| --- | --- |
| af_bella | American female, warm |
| af_sarah | American female, professional |
| af_nicole | American female, energetic |
| af_sky | American female, soft |
| am_adam | American male, neutral |
| am_michael | American male, deep |
| bf_emma | British female, warm |
| bf_isabella | British female, elegant |
| bm_george | British male, classic |
| bm_lewis | British male, friendly |
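The IDs listed above appear to follow a two-letter prefix pattern: the first letter for accent (`a` American, `b` British) and the second for gender (`f` female, `m` male). This convention is inferred from the table rather than documented, and the helper `describe_voice_id` is hypothetical, so treat the sketch below as illustrative only.

```python
# Prefix meanings inferred from the voice table; not an official specification.
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "female", "m": "male"}


def describe_voice_id(voice_id):
    """Split an ID such as 'bm_george' into (accent, gender, name)."""
    prefix, separator, name = voice_id.partition("_")
    if not separator or len(prefix) != 2 or not name:
        raise ValueError(f"unrecognized voice ID: {voice_id!r}")
    try:
        return ACCENTS[prefix[0]], GENDERS[prefix[1]], name
    except KeyError:
        raise ValueError(f"unrecognized voice prefix: {prefix!r}") from None
```

A client could use this to group voices in a picker, falling back to the raw ID when a `ValueError` is raised.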