Speech SDK
Use speech-sdk (Apache 2.0) to reach additional cloud TTS providers (ElevenLabs, Cartesia, Hume, Deepgram, Google Gemini TTS, Inworld, and more) with your own provider API keys. Requests go from the OpenReader server directly to the provider's API; no extra account or proxy is involved.
Models use the provider/model format. The API key you enter belongs to the provider named by the model prefix: for elevenlabs/eleven_multilingual_v2 enter an ElevenLabs key, for cartesia/sonic-3.5 a Cartesia key, and so on.
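As a rough illustration of the provider/model convention, the sketch below splits a model ID at the first slash to find which provider's key it needs. The helper `splitModelId` is hypothetical and not part of OpenReader or speech-sdk. Splitting only at the first slash is an assumption that keeps any later slashes inside the model name.

```typescript
// Hypothetical helper for illustration; not part of OpenReader or speech-sdk.
interface ModelId {
  provider: string; // prefix before the first "/", e.g. "elevenlabs"
  model: string; // everything after the first "/"
}

function splitModelId(id: string): ModelId {
  const slash = id.indexOf("/");
  // Reject IDs with no slash, an empty prefix, or an empty model name.
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected provider/model, got "${id}"`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

const exampleId = splitModelId("elevenlabs/eleven_multilingual_v2");
console.log(`Enter a key for: ${exampleId.provider}`); // Enter a key for: elevenlabs
```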
Setup
Recommended (auth + admin): Settings → Admin → Shared providers
- Add a shared provider with type speech-sdk.
- Enter the API key for the provider you want to use.
- Set the default model to a matching provider/model (for example elevenlabs/eleven_multilingual_v2).
Users select the enabled shared provider, model, and voice from Settings → TTS Provider.
Built-in models
- openai/gpt-4o-mini-tts (works with your existing OpenAI API key)
- elevenlabs/eleven_multilingual_v2
- cartesia/sonic-3.5
- deepgram/aura-2
- google/gemini-2.5-flash-preview-tts
- inworld/inworld-tts-1.5-max
You can also choose Other and enter any provider/model the SDK supports. Recognized prefixes: openai, elevenlabs, cartesia, hume, deepgram, google, inworld, minimax, fish-audio, murf, resemble, fal-ai, mistral, xai, smallest-ai.
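A custom entry under Other can be sanity-checked before saving. The sketch below tests a model ID against the prefixes listed above. The function `hasRecognizedPrefix` is a hypothetical helper, not OpenReader's actual validation, and `acme/voice-1` is a made-up ID used only to show a rejected prefix.

```typescript
// Prefixes copied from the list above; the helper itself is hypothetical.
const RECOGNIZED_PREFIXES: ReadonlySet<string> = new Set([
  "openai", "elevenlabs", "cartesia", "hume", "deepgram", "google",
  "inworld", "minimax", "fish-audio", "murf", "resemble", "fal-ai",
  "mistral", "xai", "smallest-ai",
]);

function hasRecognizedPrefix(modelId: string): boolean {
  const slash = modelId.indexOf("/");
  // Require a non-empty prefix and a non-empty model name.
  if (slash <= 0 || slash === modelId.length - 1) return false;
  return RECOGNIZED_PREFIXES.has(modelId.slice(0, slash));
}

console.log(hasRecognizedPrefix("deepgram/aura-2")); // true
console.log(hasRecognizedPrefix("acme/voice-1")); // false
```

This only checks the prefix. Whether the model name after the slash exists is up to the provider.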
Voice IDs
ElevenLabs and Cartesia identify voices by opaque IDs. The built-in lists map to these shared library voices:
| ElevenLabs ID | ElevenLabs name | Cartesia ID | Cartesia name |
|---|---|---|---|
| JBFqnCBsd6RMkjVDRZzb | George | a0e99841-438c-4a64-b679-ae501e7d6091 | Barbershop Man |
| IKne3meq5aSn9XLyUdCD | Charlie | 156fb8d2-335b-4950-9cb3-a2d33f0c0c2a | British Lady |
| XB0fDUnXU5powFXDhCwa | Charlotte | 694f9389-aac1-45b6-b726-9d9369183238 | California Girl |
| Xb7hH8MSUJpSbSDYk0k2 | Alice | 87748186-23bb-4571-8b8b-a73da9bf9c4f | Commercial Lady |
| iP95p4xoKVk53GoZ742B | Chris | ee7ea9f8-c0c1-498c-9f62-dc2da49a6f98 | Friendly Reading Man |
| nPczCjzI2devNBz1zQrb | Brian | 248be419-c632-4f23-adf1-5324ed7dbf1d | Hannah |
| onwK4e9ZLuTAKqWW03F9 | Daniel | | |
| pFZP5JQG7iQjIQuC4Bku | Lily | | |
| pqHfZKP75CvOlQylNhV4 | Bill | | |
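One way to picture the built-in lists is as a map from provider prefix to voice options, with a single default entry for any provider that has no list (see Notes). The sketch below is an assumption about shape only: the names `VoiceOption`, `BUILT_IN_VOICES`, and `voicesFor`, the literal ID "default", and the provider `some-provider` are all hypothetical. Only a few rows from the table are included.

```typescript
// Hypothetical data shape; OpenReader's real structures may differ.
interface VoiceOption {
  id: string; // opaque ID sent to the provider
  label: string; // human-readable name shown in the UI
}

// Assumed placeholder meaning "let the provider pick its default voice".
const DEFAULT_VOICE: VoiceOption = { id: "default", label: "Default" };

// A few rows taken from the table above.
const BUILT_IN_VOICES = new Map<string, VoiceOption[]>([
  ["elevenlabs", [
    { id: "JBFqnCBsd6RMkjVDRZzb", label: "George" },
    { id: "Xb7hH8MSUJpSbSDYk0k2", label: "Alice" },
  ]],
  ["cartesia", [
    { id: "a0e99841-438c-4a64-b679-ae501e7d6091", label: "Barbershop Man" },
  ]],
]);

function voicesFor(provider: string): VoiceOption[] {
  // Providers without a built-in list get the single default entry.
  return BUILT_IN_VOICES.get(provider) ?? [DEFAULT_VOICE];
}

console.log(voicesFor("elevenlabs").map((v) => v.label)); // [ 'George', 'Alice' ]
console.log(voicesFor("some-provider").map((v) => v.id)); // [ 'default' ]
```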
Notes
- One voice per request; Kokoro-style multi-voice mixing does not apply to this provider.
- Playback speed is applied client-side, so cached audio segments stay valid when you change speed.
- Providers without a built-in voice list fall back to a default entry, which lets the provider pick its default voice.
- Word-by-word highlighting works the same as with every other provider (alignment runs in OpenReader, not the provider).
- TTS requests are sent from the server, not the browser. The API key is never exposed to clients.
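The note on playback speed can be pictured with a cache key that ignores speed. The sketch below is purely illustrative: `SegmentRequest` and `segmentCacheKey` are hypothetical names, and the choice of key fields is an assumption, not OpenReader's actual caching scheme.

```typescript
// Hypothetical cache key; not OpenReader's actual caching scheme.
interface SegmentRequest {
  model: string; // e.g. "cartesia/sonic-3.5"
  voice: string;
  text: string;
  speed: number; // playback speed chosen in the client
}

function segmentCacheKey(req: SegmentRequest): string {
  // Speed is deliberately left out: it is applied at playback time,
  // so the synthesized audio is the same at every speed.
  return JSON.stringify([req.model, req.voice, req.text]);
}

const segment = { model: "cartesia/sonic-3.5", voice: "default", text: "Hello." };
const keyAtNormal = segmentCacheKey({ ...segment, speed: 1 });
const keyAtFast = segmentCacheKey({ ...segment, speed: 1.5 });
console.log(keyAtNormal === keyAtFast); // true
```

Changing the model, voice, or text produces a different key, so only those changes would trigger a new request in this sketch.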