Text to Speech

Endpoint: /audio/speech

Main request parameters:

Parameter   Description
model       Model used for speech synthesis; see the supported model list.
input       Text content to be converted into audio.
voice       Reference voice. Supports system preset voices, user preset voices, and user dynamic voices.

curl https://api.elkapi.com/v1/audio/speech \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini-tts",
    "input": "The quick brown fox jumped over the lazy dog.",
    "voice": "alloy"
  }' \
  --output speech.mp3
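
The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names build_speech_request and synthesize are hypothetical, the API key is assumed to live in the API_KEY environment variable as in the curl example, and the response body is assumed to be the raw audio bytes that curl writes to speech.mp3.

```python
import json
import os
import urllib.request

# Base URL taken from the curl example above.
BASE_URL = "https://api.elkapi.com/v1"


def build_speech_request(text, api_key, model="gpt-4o-mini-tts", voice="alloy"):
    """Build (but do not send) a POST request for /audio/speech."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, out_path="speech.mp3"):
    """Send the request and write the returned bytes to out_path.

    Assumption: the API key is stored in the API_KEY environment variable
    and the response body is the audio file itself.
    """
    request = build_speech_request(text, os.environ["API_KEY"])
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
    return out_path
```

Calling synthesize("The quick brown fox jumped over the lazy dog.") would then mirror the curl command. Building the request separately from sending it keeps the payload easy to inspect before any network call is made.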

Speech to Text

Endpoint: /audio/transcriptions

Content-Type: multipart/form-data

Main request parameters:

Parameter   Description
model       Model used for speech-to-text; see the supported model list.
file        Audio file to be converted to text.

curl https://api.elkapi.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/audio.mp3" \
  -F model="gpt-4o-transcribe"
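
For readers who are not using curl, the multipart upload can be sketched in standard-library Python. This is an illustration under stated assumptions: the helper names encode_multipart, build_transcription_request, and transcribe are hypothetical, the API key is assumed to be in the API_KEY environment variable, and the response is returned as raw text because its format is not described here.

```python
import mimetypes
import os
import urllib.request
import uuid

# Base URL taken from the curl example above.
BASE_URL = "https://api.elkapi.com/v1"


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    # Guess the file's MIME type from its name; fall back to a generic type.
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode("utf-8"),
        ]
    lines += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"'.encode(),
        f"Content-Type: {mime}".encode(),
        b"",
        file_bytes,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(audio_path, api_key, model="gpt-4o-transcribe"):
    """Build (but do not send) a POST request for /audio/transcriptions."""
    with open(audio_path, "rb") as f:
        audio = f.read()
    body, content_type = encode_multipart(
        {"model": model}, "file", os.path.basename(audio_path), audio
    )
    return urllib.request.Request(
        f"{BASE_URL}/audio/transcriptions",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )


def transcribe(audio_path):
    """Send the request and return the raw response body as text.

    Assumption: the API key is stored in the API_KEY environment variable.
    """
    request = build_transcription_request(audio_path, os.environ["API_KEY"])
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Note that the Content-Type header must carry the boundary string; curl adds it automatically when -F is used, while a hand-built request has to set it explicitly as shown.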

Speech to Speech

This scenario is currently supported only by ElevenLabs models. Please refer to the corresponding model documentation.

To use the OpenAI SDK or another OpenAI-compatible client:
  • Set OPENAI_BASE_URL to https://api.elkapi.com/v1.
  • Set OPENAI_API_KEY to your API key.
  • Most models have been adapted to the OpenAI-compatible interface. For models that have not, refer to the model documentation.
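
As a rough sketch of how these settings might be used from the official openai Python package (version 1 or later, installed separately): the function name speak is hypothetical, OPENAI_API_KEY is assumed to be exported in the shell already, and whether a given model works through this interface depends on the model documentation.

```python
import os

# Point OpenAI-compatible clients at the gateway. OPENAI_API_KEY is
# assumed to be exported in the shell, so it is not set here.
os.environ["OPENAI_BASE_URL"] = "https://api.elkapi.com/v1"


def speak(text, out_path="speech.mp3", model="gpt-4o-mini-tts", voice="alloy"):
    """Synthesize text to an audio file through the OpenAI SDK.

    Requires the third-party `openai` package; imported lazily so this
    module loads even where the package is not installed.
    """
    from openai import OpenAI

    # With no arguments, the client reads OPENAI_API_KEY and
    # OPENAI_BASE_URL from the environment.
    client = OpenAI()
    with client.audio.speech.with_streaming_response.create(
        model=model, voice=voice, input=text
    ) as response:
        response.stream_to_file(out_path)
    return out_path
```

Setting the base URL through the environment rather than in code means the same script can be pointed at a different endpoint without modification.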

OpenAI Official Docs
  • OpenAI Audio API
  • OpenAI TTS Guide