Text-to-Speech
Generate natural, human-like speech from text using state-of-the-art TTS models.
Overview
Izwi's text-to-speech converts written text into spoken audio. Features include:
- Natural voices — High-quality, expressive speech
- Multiple formats — WAV, MP3, OGG, FLAC, AAC
- Speed control — Adjust playback speed
- Streaming — Real-time audio generation
- Local processing — No cloud, complete privacy
Getting Started
Download a TTS Model
izwi pull qwen3-tts-0.6b-base
Generate Speech
Command line:
izwi tts "Hello, welcome to Izwi!" --output hello.wav
With playback:
izwi tts "Hello, welcome to Izwi!" --play
Using the CLI
Basic Usage
izwi tts "Your text here" --output output.wav
Options
| Option | Description | Default |
|---|---|---|
| --model, -m | TTS model to use | qwen3-tts-0.6b-base |
| --output, -o | Output file path | stdout |
| --format, -f | Audio format | wav |
| --speed, -r | Speech speed (0.5-2.0) | 1.0 |
| --speaker, -s | Voice/speaker ID | default |
| --temperature, -t | Sampling temperature | 0.7 |
| --play, -p | Play audio after generation | — |
| --stream | Stream output in real-time | — |
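Because `--speed` accepts only values between 0.5 and 2.0, a script that takes the speed from user input may want to check it first. The following is a minimal sketch, not part of Izwi: `speed_in_range` and `tts_checked` are hypothetical helper names, and only the `--speed` and `--output` flags from the table above are used.

```shell
# Hypothetical helpers (not part of Izwi): reject speeds outside the
# documented 0.5-2.0 range before calling the documented `izwi tts` command.
speed_in_range() {
    # awk does the floating-point comparison; exit status 0 means valid.
    awk -v s="$1" 'BEGIN {
        exit !(s ~ /^[0-9]+(\.[0-9]+)?$/ && s + 0 >= 0.5 && s + 0 <= 2.0)
    }'
}

tts_checked() {
    # Usage: tts_checked TEXT SPEED OUTPUT_FILE
    if ! speed_in_range "$2"; then
        echo "speed must be between 0.5 and 2.0, got: $2" >&2
        return 1
    fi
    izwi tts "$1" --speed "$2" --output "$3"
}
```

For example, `tts_checked "Speaking slowly" 0.75 slow.wav` runs the command, while `tts_checked "Too fast" 3 fast.wav` prints an error and returns a non-zero status without calling `izwi`.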
Examples
Different formats:
izwi tts "Hello world" --format mp3 --output hello.mp3
izwi tts "Hello world" --format ogg --output hello.ogg
Adjust speed:
# Slower (0.5x - 1.0x)
izwi tts "Speaking slowly" --speed 0.75 --output slow.wav
# Faster (1.0x - 2.0x)
izwi tts "Speaking quickly" --speed 1.5 --output fast.wav
Read from stdin:
echo "Text from pipe" | izwi tts - --output piped.wav
cat article.txt | izwi tts - --output article.wav
Streaming output:
izwi tts "Long text for streaming" --stream --play
Using the Web UI
- Navigate to Text to Speech in the sidebar
- Enter your text in the input field
- Select a voice (if available)
- Click Generate
- Play or download the audio
Features
- Live preview — Hear audio as it generates
- Download — Save audio files locally
- History — Access recent generations
Using the API
Endpoint
POST /v1/audio/speech
Request
{
  "model": "qwen3-tts-0.6b-base",
  "input": "Hello, this is a test.",
  "voice": "default",
  "speed": 1.0,
  "response_format": "wav"
}
Response
Binary audio data with appropriate Content-Type header.
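The request and response described above could be wrapped in a small shell helper. This is a sketch under stated assumptions: the function names are hypothetical, the server is assumed to listen on `localhost:8080` (the address used in this page's curl example), and the hand-rolled JSON escaping only covers backslashes and double quotes.

```shell
# Escape backslashes and double quotes for use inside a JSON string.
# Assumption: the text has no newlines or control characters; use a real
# JSON tool if it might.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body from the documented fields.
speech_payload() {
    # Usage: speech_payload TEXT [FORMAT] [SPEED]
    printf '{"model": "%s", "input": "%s", "voice": "default", "speed": %s, "response_format": "%s"}' \
        "qwen3-tts-0.6b-base" "$(json_escape "$1")" "${3:-1.0}" "${2:-wav}"
}

# POST the body and save the binary audio response to a file.
# Assumption: the server is reachable at localhost:8080.
speak_to_file() {
    # Usage: speak_to_file TEXT OUTPUT_FILE [FORMAT]
    speech_payload "$1" "${3:-wav}" |
        curl --fail --silent --show-error -X POST \
            http://localhost:8080/v1/audio/speech \
            -H "Content-Type: application/json" \
            --data-binary @- \
            --output "$2"
}
```

With the server running, `speak_to_file "Hello, this is a test." test.mp3 mp3` would write the response body to `test.mp3`. The `--fail` flag makes curl return a non-zero status on an HTTP error, which keeps an error message from being saved as if it were audio.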
Example (curl)
curl -X POST http://localhost:8080/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3-tts-0.6b-base", "input": "Hello world"}' \
  --output speech.wav
Available Models
| Model | Size | Quality | Speed |
|---|---|---|---|
| qwen3-tts-0.6b-base | 1.2 GB | Good | Fast |
| qwen3-tts-1.7b-base | 3.4 GB | Better | Medium |
For voice cloning, use customvoice variants. For voice design, use voicedesign variants.
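One way to judge the quality and speed trade-off in the table is to render the same sentence with both models and listen to the results. The sketch below assumes both models have already been downloaded with `izwi pull`. The `compare_models` name and the `sample-<model>.wav` file names are illustrative choices, not Izwi conventions.

```shell
# Hypothetical helper: render the same text with each listed model so the
# outputs can be compared by ear. Assumes both models are already pulled.
compare_models() {
    # Usage: compare_models TEXT
    for model in qwen3-tts-0.6b-base qwen3-tts-1.7b-base; do
        izwi tts "$1" --model "$model" --output "sample-$model.wav" || return 1
    done
}
```

Calling `compare_models "Hello, welcome to Izwi!"` would leave one WAV file per model in the current directory.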
Audio Formats
| Format | Extension | Notes |
|---|---|---|
| WAV | .wav | Uncompressed, highest quality |
| MP3 | .mp3 | Compressed, widely compatible |
| OGG | .ogg | Open format, good compression |
| FLAC | .flac | Lossless compression |
| AAC | .aac | High efficiency compression |
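To keep the `--format` value and the file extension in agreement, a script can derive one from the other. The sketch below is illustrative only: `format_for_file` and `tts_to_file` are hypothetical names. It also assumes that `flac` and `aac` are accepted as `--format` values in the same lowercase form as the `mp3` and `ogg` values shown in the CLI examples.

```shell
# Map an output file name to the matching --format value from the table.
# Assumption: flac and aac are spelled like the documented mp3/ogg values.
format_for_file() {
    case "$1" in
        *.wav)  echo wav ;;
        *.mp3)  echo mp3 ;;
        *.ogg)  echo ogg ;;
        *.flac) echo flac ;;
        *.aac)  echo aac ;;
        *) echo "unsupported extension: $1" >&2; return 1 ;;
    esac
}

tts_to_file() {
    # Usage: tts_to_file TEXT OUTPUT_FILE
    fmt="$(format_for_file "$2")" || return 1
    izwi tts "$1" --format "$fmt" --output "$2"
}
```

For instance, `tts_to_file "Hello world" hello.ogg` would expand to the documented `izwi tts "Hello world" --format ogg --output hello.ogg`.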
Tips
- Punctuation matters — Use proper punctuation for natural pauses
- Break long text — Split very long text into paragraphs
- Test different speeds — Find the right pace for your use case
- Use appropriate models — Larger models = better quality but slower
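The tip about breaking up long text can be automated. The following sketch splits a text file on blank lines and synthesizes each paragraph into its own file, using the documented `izwi tts -` form that reads from stdin. The `tts_by_paragraph` name and the `part-NNN` file naming are assumptions made for illustration.

```shell
# Hypothetical helper: split a text file on blank lines and synthesize
# each paragraph separately.
tts_by_paragraph() {
    # Usage: tts_by_paragraph INPUT.txt OUTPUT_DIR
    mkdir -p "$2" || return 1
    # awk paragraph mode (RS = "") treats runs of blank lines as separators.
    awk -v dir="$2" 'BEGIN { RS = "" } {
        file = sprintf("%s/part-%03d.txt", dir, NR)
        print > file
        close(file)
    }' "$1" || return 1
    for part in "$2"/part-*.txt; do
        [ -e "$part" ] || continue   # the input contained no paragraphs
        # "-" tells izwi tts to read the text from stdin.
        izwi tts - --output "${part%.txt}.wav" < "$part" || return 1
    done
}
```

Running `tts_by_paragraph article.txt out` would produce `out/part-001.wav`, `out/part-002.wav`, and so on, next to the paragraph text files, so a single failed paragraph can be regenerated without redoing the whole article.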
See Also
- Voice Cloning — Clone custom voices
- Voice Design — Create voices from descriptions
- CLI Reference — Full command documentation