Real-time CPU speech synthesis with streaming
OpenAI-compatible TTS API. Use any OpenAI TTS client by pointing it to this server.
/health
Health check endpoint for container orchestration and monitoring.
{
  "status": "healthy",
  "model_loaded": true,
  "device": "cpu",
  "sample_rate": 24000
}
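A readiness probe built on this response could look like the following minimal sketch. It assumes the server listens on `http://localhost:49112` (the port used in the curl example on this page). The helper names `fetch_health`, `is_ready`, and `wait_until_ready` are hypothetical and not part of the server.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumption: same host and port as the curl example on this page.
BASE_URL = "http://localhost:49112"


def fetch_health(base_url: str = BASE_URL, timeout: float = 2.0) -> dict:
    """GET /health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)


def is_ready(health: dict) -> bool:
    """Ready means the server reports healthy and the model is loaded."""
    return health.get("status") == "healthy" and health.get("model_loaded") is True


def wait_until_ready(fetch: Callable[[], dict] = fetch_health,
                     attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll until ready or until the attempts run out."""
    for _ in range(attempts):
        try:
            if is_ready(fetch()):
                return True
        except (OSError, ValueError):
            pass  # server unreachable so far, or the body was not valid JSON
        time.sleep(delay)
    return False
```

Passing `fetch` as a parameter keeps the polling logic testable without a running server.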
/v1/voices
List all available voices (built-in and custom).
{
  "object": "list",
  "data": [
    {"id": "alba", "name": "Alba", "object": "voice"},
    {"id": "marius", "name": "Marius", "object": "voice"}
  ]
}
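As a rough sketch, a client might pull the voice IDs out of this response before choosing a value for `voice`. The base URL is an assumption taken from the curl example on this page, and `voice_ids` and `list_voices` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: same host and port as the curl example on this page.
BASE_URL = "http://localhost:49112"


def voice_ids(payload: dict) -> list[str]:
    """Extract the IDs of all entries marked as voices in a /v1/voices body."""
    return [v["id"] for v in payload.get("data", []) if v.get("object") == "voice"]


def list_voices(base_url: str = BASE_URL, timeout: float = 5.0) -> list[str]:
    """GET /v1/voices and return the available voice IDs."""
    with urllib.request.urlopen(f"{base_url}/v1/voices", timeout=timeout) as resp:
        return voice_ids(json.load(resp))
```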
/v1/audio/speech
Generate speech audio from text. OpenAI-compatible endpoint.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| input | string | ✓ | - | The text to generate speech from (max 4096 chars) |
| voice | string | | alba | Voice ID, filename, or URL. See /v1/voices |
| model | string | | - | Ignored (for OpenAI compatibility) |
| response_format | string | | mp3 | Audio format: mp3, wav, pcm, opus, flac |
| stream_format | string | | | Set to audio for raw streaming (SSE not supported) |
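The parameter rules above can be checked on the client before a request is sent. The sketch below is one way to do that. `build_speech_request` is a hypothetical helper, and it assumes the 4096-character limit counts characters the way Python's `len()` does, which the server may not.

```python
# Values taken from the parameter table above.
ALLOWED_FORMATS = ("mp3", "wav", "pcm", "opus", "flac")
MAX_INPUT_CHARS = 4096


def build_speech_request(text: str, voice: str = "alba",
                         response_format: str = "mp3",
                         raw_stream: bool = False) -> dict:
    """Build a JSON-serializable body for POST /v1/audio/speech."""
    if not text:
        raise ValueError("input is required")
    if len(text) > MAX_INPUT_CHARS:  # assumption: limit is counted like len()
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    body = {"input": text, "voice": voice, "response_format": response_format}
    if raw_stream:
        body["stream_format"] = "audio"  # raw streaming; SSE is not supported
    return body
```

`model` is left out because the server ignores it. OpenAI client libraries still require it, so any placeholder string works there.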
curl -X POST http://localhost:49112/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello world!",
    "voice": "alba",
    "response_format": "mp3"
  }' \
  --output speech.mp3
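from openai import OpenAI

# Point the official OpenAI client at this server; the API key is a placeholder.
client = OpenAI(
    base_url="http://localhost:49112/v1",
    api_key="not-needed"
)

# with_streaming_response writes the audio to disk as it arrives.
# Calling stream_to_file() on a plain create() result is deprecated in openai>=1.0.
with client.audio.speech.with_streaming_response.create(
    model="supertonic-2",  # ignored by the server, but required by the client
    voice="alba",
    input="Hello world!"
) as response:
    response.stream_to_file("output.mp3")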
Audio file in the requested format (binary stream).
Content-Type: audio/mpeg, audio/wav, audio/opus, etc.
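Because the body is a binary stream, a client can write it to disk in chunks and choose a file extension from the `Content-Type` header. The following is a minimal sketch with hypothetical helper names. Only the three media types named above are mapped, and anything else falls back to `.bin` rather than guessing.

```python
import io
from typing import BinaryIO

# Only the media types listed on this page; others are deliberately unmapped.
EXTENSIONS = {"audio/mpeg": ".mp3", "audio/wav": ".wav", "audio/opus": ".opus"}


def extension_for(content_type: str) -> str:
    """Map a Content-Type header value to a file extension."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return EXTENSIONS.get(media_type, ".bin")


def copy_chunks(source: BinaryIO, sink: BinaryIO, chunk_size: int = 4096) -> int:
    """Copy a binary stream chunk by chunk and return the number of bytes written."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        sink.write(chunk)
        total += len(chunk)
    return total


# In-memory demo; with a live server, `source` would be the HTTP response object.
demo_sink = io.BytesIO()
copied = copy_chunks(io.BytesIO(b"\x00" * 10000), demo_sink)
```

Any object with a `read(size)` method works as `source`, including the response returned by `urllib.request.urlopen`.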
{
  "error": "Missing required field: input"
}
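A client can surface this message instead of a bare HTTP failure. The sketch below assumes that error responses arrive with a non-2xx status and a JSON body shaped like the one above. The page does not state the status code, so none is assumed. `error_message` and `synthesize` are hypothetical helper names, and the base URL comes from the curl example.

```python
import json
import urllib.error
import urllib.request

# Assumption: same host and port as the curl example on this page.
BASE_URL = "http://localhost:49112"


def error_message(body: bytes, default: str = "unknown error") -> str:
    """Return the "error" string from a JSON error body, or a default."""
    try:
        payload = json.loads(body)
    except ValueError:  # not JSON, or not valid UTF-8
        return default
    if isinstance(payload, dict) and isinstance(payload.get("error"), str):
        return payload["error"]
    return default


def synthesize(body: dict, base_url: str = BASE_URL, timeout: float = 60.0) -> bytes:
    """POST to /v1/audio/speech and return the audio bytes."""
    request = urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as exc:
        # Non-2xx responses raise HTTPError; the error body is still readable.
        detail = error_message(exc.read())
        raise RuntimeError(f"TTS request failed ({exc.code}): {detail}") from exc
```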
These voices work without authentication:
Custom voices support JSON styles or audio prompts when a voice extractor is configured.