Audio

Audio transcription and speech synthesis endpoints.

Transcriptions (Speech-to-Text)

Endpoint

POST /v1/audio/transcriptions

Request

curl https://abc123.predictor.sh/v1/audio/transcriptions \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@meeting.mp3" \
  -F "model=whisper-1"

Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| file | file | Yes | Audio file (mp3, wav, m4a, webm, etc.) |
| model | string | Yes | Model ID (use "whisper-1") |
| language | string | No | Language code (e.g., "en", "es") |

Response

{
  "text": "Hello, this is a transcription of the audio file."
}

Supported Audio Formats

WAV (.wav)
MP3 (.mp3)
M4A/AAC (.m4a, .aac)
OGG Vorbis (.ogg, .oga)
FLAC (.flac)
WebM (.webm)
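
A client can check a file's extension against the list above before uploading. The following is a minimal sketch, not part of the API: the helper name `is_supported_audio` and the constant `SUPPORTED_EXTENSIONS` are hypothetical, and the check only looks at the file name, not the actual audio encoding, so the server may still reject a file that passes it.

```python
from pathlib import Path

# Extensions copied from the "Supported Audio Formats" list.
SUPPORTED_EXTENSIONS = {
    ".wav", ".mp3", ".m4a", ".aac", ".ogg", ".oga", ".flac", ".webm",
}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension appears in the supported-format list.

    Hypothetical client-side helper: it inspects only the file name.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_audio("meeting.MP3"))  # True (comparison is case-insensitive)
print(is_supported_audio("notes.txt"))    # False
```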

Python Example

from openai import OpenAI

client = OpenAI(
    base_url="https://abc123.predictor.sh/v1",
    api_key="pred_your_token"
)

with open("meeting.mp3", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio,
        language="en"
    )
print(transcript.text)

Speech (Text-to-Speech)

OpenAI-Compatible Endpoint

POST /v1/audio/speech

Request

{
  "input": "Hello, this is a test of text to speech.",
  "voice": "alloy",
  "response_format": "mp3",
  "speed": 1.0
}

Parameters

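| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| input | string | Yes | Text to synthesize |
| voice | string | Yes | Voice ID |
| response_format | string | No | mp3 (default) or wav |
| speed | float | No | Speed 0.5 to 2.0 (default 1.0) |
| stream_format | string | No | Set to sse for streaming |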
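
The constraints in the parameter table can be checked on the client before a request is sent. This is a sketch under assumptions: `build_speech_payload` is a hypothetical helper, and how the server itself reacts to out-of-range values is not documented here, so the errors below are purely client-side.

```python
def build_speech_payload(text, voice, response_format="mp3", speed=1.0, stream=False):
    """Build the JSON body for POST /v1/audio/speech (hypothetical helper).

    Validation mirrors the parameter table: response_format is mp3 or wav,
    speed is between 0.5 and 2.0, and stream_format is "sse" when streaming.
    """
    if not text:
        raise ValueError("input text must not be empty")
    if response_format not in ("mp3", "wav"):
        raise ValueError("response_format must be 'mp3' or 'wav'")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")

    payload = {
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    if stream:
        # Only included when streaming is requested.
        payload["stream_format"] = "sse"
    return payload


payload = build_speech_payload("Hello world!", "alloy", speed=1.25)
```

The resulting dictionary can be serialized with `json.dumps` and sent as the request body.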

Available Voices

| Voice | Description |
|-------|-------------|
| alloy | Neutral female |
| echo | Neutral male |
| fable | British male |
| onyx | Deep male |
| nova | Energetic female |
| shimmer | Soft female |

Response

Binary audio data (MP3 or WAV).

Example

curl https://abc123.predictor.sh/v1/audio/speech \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello world!",
    "voice": "alloy"
  }' \
  --output speech.mp3

Python Example

from openai import OpenAI

client = OpenAI(
    base_url="https://abc123.predictor.sh/v1",
    api_key="pred_your_token"
)

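# The SDK requires a model argument; "tts-1" is the value used in this example.
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    input="Hello, this is a test.",
    speed=1.0
) as response:
    # Write the audio to disk as it arrives instead of buffering it in memory.
    response.stream_to_file("output.mp3")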

ElevenLabs-Compatible Endpoints

Generate Speech

POST /v1/text-to-speech/{voice_id}

curl https://abc123.predictor.sh/v1/text-to-speech/af_bella \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world"}' \
  --output speech.mp3
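
The same request can be built in Python with only the standard library. This is a sketch, assuming the placeholder host and token from the examples in this page; `build_tts_request` is a hypothetical helper, and the network call itself is left commented out.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://abc123.predictor.sh"  # placeholder host used throughout this page


def build_tts_request(voice_id, text, token):
    """Build a POST request for /v1/text-to-speech/{voice_id} (hypothetical helper)."""
    # Escape the voice ID so it is safe to place in the URL path.
    url = f"{BASE_URL}/v1/text-to-speech/{quote(voice_id, safe='')}"
    return urllib.request.Request(
        url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_tts_request("af_bella", "Hello world", "pred_your_token")

# To send the request and save the returned audio:
# with urllib.request.urlopen(req) as resp, open("speech.mp3", "wb") as out:
#     out.write(resp.read())
```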

Streaming Speech

POST /v1/text-to-speech/{voice_id}/stream

curl https://abc123.predictor.sh/v1/text-to-speech/af_bella/stream \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world"}'

The response is a stream of Server-Sent Events carrying base64-encoded audio chunks.
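
Reassembling the audio on the client means reading each `data:` line, decoding its base64 payload, and concatenating the bytes. The event schema is not documented in this page, so the sketch below makes a loud assumption: each event is a JSON object with the chunk in a field named `audio`. That field name and the helper `decode_sse_audio` are hypothetical; adjust `audio_field` to match the events the server actually sends.

```python
import base64
import json


def decode_sse_audio(lines, audio_field="audio"):
    """Concatenate base64 audio chunks found in SSE `data:` lines.

    ASSUMPTION: each event is a JSON object whose `audio_field` key holds a
    base64 string. The real event schema is not documented here.
    """
    audio = bytearray()
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        try:
            event = json.loads(data)
        except json.JSONDecodeError:
            continue  # non-JSON payloads are ignored in this sketch
        chunk = event.get(audio_field) if isinstance(event, dict) else None
        if chunk:
            audio.extend(base64.b64decode(chunk))
    return bytes(audio)


# Synthetic events for illustration only; not captured from a real response.
sample = [
    'data: {"audio": "' + base64.b64encode(b"abc").decode() + '"}',
    "",
    'data: {"audio": "' + base64.b64encode(b"def").decode() + '"}',
]
audio_bytes = decode_sse_audio(sample)
```

In practice, `lines` would come from iterating over the streamed HTTP response line by line, and the returned bytes would be written to a file or passed to an audio player.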

List Voices

GET /v1/voices

curl https://abc123.predictor.sh/v1/voices \
  -H "Authorization: Bearer $TOKEN"

Response:

{
  "voices": [
    {"voice_id": "af_bella", "name": "Bella"},
    {"voice_id": "am_adam", "name": "Adam"},
    ...
  ]
}
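
A response of this shape is easy to turn into a lookup table from voice ID to display name. The sketch below assumes only the two fields shown above (`voice_id` and `name`); `index_voices` is a hypothetical helper, and the sample body is a shortened stand-in for a real response.

```python
import json


def index_voices(body: str) -> dict:
    """Map each voice_id to its name, given the JSON body of GET /v1/voices."""
    voices = json.loads(body)["voices"]
    return {voice["voice_id"]: voice["name"] for voice in voices}


# Shortened sample body containing only the two entries shown above.
sample_body = """
{
  "voices": [
    {"voice_id": "af_bella", "name": "Bella"},
    {"voice_id": "am_adam", "name": "Adam"}
  ]
}
"""

voices_by_id = index_voices(sample_body)
print(voices_by_id["af_bella"])  # Bella
```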

Voice IDs (Kokoro)

| Voice ID | Description |
|----------|-------------|
| af_bella | American female, warm |
| af_sarah | American female, professional |
| af_nicole | American female, energetic |
| af_sky | American female, soft |
| am_adam | American male, neutral |
| am_michael | American male, deep |
| bf_emma | British female, warm |
| bf_isabella | British female, elegant |
| bm_george | British male, classic |
| bm_lewis | British male, friendly |
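
The IDs in this table appear to follow a pattern: the first letter matches the accent (a for American, b for British) and the second the gender (f for female, m for male). That reading is inferred from the table, not stated as a rule, so the following sketch is an assumption. `describe_voice_id` is a hypothetical helper and may not hold for voices outside this list.

```python
# Prefix meanings inferred from the table; not a documented naming rule.
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "female", "m": "male"}


def describe_voice_id(voice_id):
    """Split an ID such as "bf_emma" into (accent, gender, name)."""
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice ID format: {voice_id!r}")
    try:
        return ACCENTS[prefix[0]], GENDERS[prefix[1]], name
    except KeyError:
        raise ValueError(f"unknown prefix in voice ID: {voice_id!r}") from None


print(describe_voice_id("bf_emma"))  # ('British', 'female', 'emma')
```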


Last updated 1 month ago
