Audio

Audio transcription and speech synthesis endpoints.

Transcriptions (Speech-to-Text)

Endpoint

POST /v1/audio/transcriptions

Request

curl https://abc123.predictor.sh/v1/audio/transcriptions \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@audio.mp3" \
  -F "model=whisper-1"

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | Audio file (mp3, wav, m4a, webm, etc.) |
| model | string | Yes | Model ID (use "whisper-1") |
| language | string | No | Language code (e.g., "en", "es") |

Response

{
  "text": "Hello, this is a transcription of the audio file."
}

Supported Audio Formats

  • WAV (.wav)

  • MP3 (.mp3)

  • M4A/AAC (.m4a, .aac)

  • OGG Vorbis (.ogg, .oga)

  • FLAC (.flac)

  • WebM (.webm)
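Because the endpoint accepts only the formats listed above, a client can reject clearly unsupported files before uploading them. This is a minimal sketch that checks the file extension only. It does not inspect the audio itself, so a mislabeled file will still pass. The helper name is illustrative.

```python
from pathlib import Path

# Extensions taken from the "Supported Audio Formats" list above.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".aac", ".ogg", ".oga", ".flac", ".webm"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file's extension (case-insensitive) is in the supported list."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```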

Python Example
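This minimal sketch uses only the Python standard library and mirrors the curl request above. It builds the multipart/form-data body by hand, sends `model=whisper-1` along with the file, and reads `text` from the JSON response. The host is the example host from the curl command. `build_multipart` and `transcribe` are illustrative names, not part of any SDK.

```python
from __future__ import annotations

import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "https://abc123.predictor.sh"  # example host from the curl request above


def build_multipart(fields: dict[str, str], file_field: str, filename: str,
                    content: bytes) -> tuple[bytes, str]:
    """Encode text fields plus one file as multipart/form-data.

    Returns (body, value for the Content-Type header).
    """
    boundary = uuid.uuid4().hex
    chunks: list[bytes] = []
    for name, value in fields.items():
        chunks.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    chunks.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: {mime}\r\n\r\n'.encode("utf-8")
    )
    chunks.append(content + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, token: str, language: str | None = None) -> str:
    """Upload an audio file and return the transcribed text."""
    fields = {"model": "whisper-1"}
    if language:
        fields["language"] = language  # optional, e.g. "en"
    file_path = Path(path)
    body, content_type = build_multipart(fields, "file", file_path.name, file_path.read_bytes())
    request = urllib.request.Request(
        f"{BASE_URL}/v1/audio/transcriptions",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]  # response shape: {"text": "..."}
```

For example, `transcribe("audio.mp3", os.environ["TOKEN"], language="en")` returns the transcript as a string. Building the body by hand avoids third-party dependencies. With the `requests` package, passing the file through `files=` produces the same encoding.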


Speech (Text-to-Speech)

OpenAI-Compatible Endpoint

Request

Parameters

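| Parameter | Type | Required | Description |
|---|---|---|---|
| input | string | Yes | Text to synthesize |
| voice | string | Yes | Voice ID (see Available Voices) |
| response_format | string | No | Output audio format: mp3 (default) or wav |
| speed | float | No | Speech speed, from 0.5 to 2.0 (default 1.0) |
| stream_format | string | No | Set to sse to stream the response as Server-Sent Events |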

Available Voices

| Voice | Description |
|---|---|
| alloy | Neutral female |
| echo | Neutral male |
| fable | British male |
| onyx | Deep male |
| nova | Energetic female |
| shimmer | Soft female |
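The parameter and voice tables map directly onto client-side checks. This sketch validates arguments before anything is sent. It assumes that only the six voices above are valid on this endpoint and that empty input is an error. Loosen those checks if the server accepts more. `build_speech_payload` is an illustrative name.

```python
# Allowed values copied from the parameter and voice tables above.
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "wav"}


def build_speech_payload(text: str, voice: str, response_format: str = "mp3",
                         speed: float = 1.0, stream: bool = False) -> dict:
    """Validate arguments locally and return a JSON-ready request body."""
    if not text:  # assumption: empty input is rejected
        raise ValueError("input must be non-empty")
    if voice not in VOICES:  # assumption: only the listed voices are accepted
        raise ValueError(f"unknown voice: {voice!r}")
    if response_format not in FORMATS:
        raise ValueError("response_format must be 'mp3' or 'wav'")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    payload = {"input": text, "voice": voice,
               "response_format": response_format, "speed": speed}
    if stream:
        payload["stream_format"] = "sse"  # request Server-Sent Events
    return payload
```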

Response

Binary audio data (MP3 or WAV).
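The body is raw audio with no envelope, so a client may want to confirm what it actually received. This heuristic sketch checks the leading bytes: WAV files begin with a RIFF header, and MP3 files begin with an ID3 tag or an MPEG frame-sync pattern. If the server sets a `Content-Type` response header, that header is a more direct signal.

```python
def sniff_audio_format(data: bytes) -> str:
    """Guess 'wav' or 'mp3' from the first bytes of an audio body; else 'unknown'."""
    # WAV is a RIFF container: b"RIFF", a 4-byte size, then b"WAVE".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # MP3 often starts with an ID3v2 metadata tag...
    if data[:3] == b"ID3":
        return "mp3"
    # ...or directly with an MPEG frame header whose first 11 bits are all set.
    # (ADTS AAC shares this sync pattern, but only MP3 and WAV are expected here.)
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"
```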

Example

Python Example
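Here is a standard-library sketch of a synthesis call. This page does not state the request path, so `/v1/audio/speech` (OpenAI's path for this operation) is an assumption. Reusing the Bearer token from the transcription example is also an assumption. The payload is the JSON body described in the parameter table. Function names are illustrative.

```python
import json
import urllib.request

BASE_URL = "https://abc123.predictor.sh"  # example host from the transcription request
SPEECH_PATH = "/v1/audio/speech"  # ASSUMPTION: OpenAI's path; not stated on this page


def make_speech_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the JSON payload."""
    return urllib.request.Request(
        BASE_URL + SPEECH_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(token: str, payload: dict, out_path: str) -> int:
    """Send the request, write the binary audio body to out_path, return bytes written."""
    with urllib.request.urlopen(make_speech_request(token, payload)) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

For example, `synthesize_to_file(os.environ["TOKEN"], {"input": "Hello!", "voice": "alloy", "response_format": "wav"}, "hello.wav")` writes a WAV file and returns its size in bytes.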


ElevenLabs-Compatible Endpoints

Generate Speech
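This page does not give the request paths for these endpoints. The following sketch assumes they mirror ElevenLabs' public API, which uses `POST /v1/text-to-speech/{voice_id}` and appends `/stream` for streaming, with a JSON body containing a `text` field. It also assumes the same Bearer token as the OpenAI-compatible routes, although ElevenLabs itself authenticates with an `xi-api-key` header. All of these are unverified assumptions, and `make_elevenlabs_request` is a hypothetical name.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://abc123.predictor.sh"  # example host used elsewhere on this page


def make_elevenlabs_request(token: str, voice_id: str, text: str,
                            stream: bool = False) -> urllib.request.Request:
    """Build (but do not send) an ElevenLabs-style speech request."""
    # ASSUMPTION: path layout copied from ElevenLabs' public API.
    path = f"/v1/text-to-speech/{urllib.parse.quote(voice_id, safe='')}"
    if stream:
        path += "/stream"
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps({"text": text}).encode("utf-8"),
        # ASSUMPTION: Bearer auth, as on the OpenAI-compatible routes.
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```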

Streaming Speech

Response is Server-Sent Events with base64-encoded audio chunks.
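This minimal sketch consumes that stream. It splits the byte stream into Server-Sent Events, then base64-decodes each event's data. The payload layout is not documented here, so by default the whole `data:` value is treated as base64 text. If each event carries a structured payload instead, pass a different `extract` function; the field name would have to come from the server's actual output.

```python
from __future__ import annotations

import base64
from typing import Callable, Iterable, Iterator


def iter_sse_data(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the data payload of each complete Server-Sent Event.

    Minimal parser: only `data:` fields are kept; `event:`, `id:`, `retry:` and
    comment lines (starting with ':') are skipped. Several data lines in one
    event are joined with newlines, and a blank line dispatches the event.
    """
    data_lines: list[str] = []
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if not line:
            if data_lines:
                yield "\n".join(data_lines)
                data_lines = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data_lines.append(value[1:] if value.startswith(" ") else value)
    # Per the SSE spec, an event not terminated by a blank line is discarded.


def decode_audio_chunks(
    lines: Iterable[bytes],
    extract: Callable[[str], str | None] = lambda data: data,
) -> Iterator[bytes]:
    """Base64-decode the audio carried by each event.

    ASSUMPTION: by default the whole data payload is base64 text. Events for
    which `extract` returns an empty value are skipped.
    """
    for data in iter_sse_data(lines):
        b64 = extract(data)
        if b64:
            yield base64.b64decode(b64)
```

An open `urllib` response object iterates line by line, so `b"".join(decode_audio_chunks(response))` collects the complete audio.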

List Voices

Response:

Voice IDs (Kokoro)

| Voice ID | Description |
|---|---|
| af_bella | American female, warm |
| af_sarah | American female, professional |
| af_nicole | American female, energetic |
| af_sky | American female, soft |
| am_adam | American male, neutral |
| am_michael | American male, deep |
| bf_emma | British female, warm |
| bf_isabella | British female, elegant |
| bm_george | British male, classic |
| bm_lewis | British male, friendly |
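The IDs in this table follow a visible pattern. The first letter gives the accent (`a` American, `b` British), the second letter gives the gender (`f`, `m`), and the part after the underscore is the voice name. This sketch decodes IDs under that pattern. The pattern is inferred only from the ten IDs listed here, so any other prefix is rejected rather than guessed. `describe_kokoro_voice` is an illustrative name.

```python
from __future__ import annotations

# Prefix meaning inferred from the voice table above.
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "female", "m": "male"}

KOKORO_VOICES = [
    "af_bella", "af_sarah", "af_nicole", "af_sky", "am_adam",
    "am_michael", "bf_emma", "bf_isabella", "bm_george", "bm_lewis",
]


def describe_kokoro_voice(voice_id: str) -> tuple[str, str, str]:
    """Split an ID such as 'bf_emma' into (accent, gender, name)."""
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2 or prefix[0] not in ACCENTS or prefix[1] not in GENDERS:
        raise ValueError(f"unrecognized Kokoro voice ID: {voice_id!r}")
    return ACCENTS[prefix[0]], GENDERS[prefix[1]], name


# Example: pick every British male voice from the table.
british_male = [v for v in KOKORO_VOICES if describe_kokoro_voice(v)[:2] == ("British", "male")]
```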
