Izwi - Local Audio Inference Engine

Overview

Izwi's text-to-speech converts written text into spoken audio. Features include:

Natural voices — High-quality, expressive speech
Multiple formats — WAV, MP3, OGG, FLAC, AAC
Speed control — Adjust playback speed
Streaming — Real-time audio generation
Local processing — No cloud, complete privacy

Getting Started

Download a TTS Model

izwi pull qwen3-tts-0.6b-base

Generate Speech

Command line:

izwi tts "Hello, welcome to Izwi!" --output hello.wav

With playback:

izwi tts "Hello, welcome to Izwi!" --play

Using the CLI

Basic Usage

izwi tts "Your text here" --output output.wav

Options

Option	Description	Default
`--model`, `-m`	TTS model to use	`qwen3-tts-0.6b-base`
`--output`, `-o`	Output file path	stdout
`--format`, `-f`	Audio format	`wav`
`--speed`, `-r`	Speech speed (0.5-2.0)	`1.0`
`--speaker`, `-s`	Voice/speaker ID	`default`
`--temperature`, `-t`	Sampling temperature	`0.7`
`--play`, `-p`	Play audio after generation	—
`--stream`	Stream output in real-time	—

Examples

Different formats:

izwi tts "Hello world" --format mp3 --output hello.mp3 izwi tts "Hello world" --format ogg --output hello.ogg

Adjust speed:

# Slower (0.5x - 1.0x) izwi tts "Speaking slowly" --speed 0.75 --output slow.wav # Faster (1.0x - 2.0x) izwi tts "Speaking quickly" --speed 1.5 --output fast.wav

Read from stdin:

echo "Text from pipe" | izwi tts - --output piped.wav cat article.txt | izwi tts - --output article.wav

Streaming output:

izwi tts "Long text for streaming" --stream --play

Using the Web UI

Navigate to Text to Speech in the sidebar
Enter your text in the input field
Select a voice (if available)
Click Generate
Play or download the audio

Features

Live preview — Hear audio as it generates
Download — Save audio files locally
History — Access recent generations

Using the API

Endpoint

POST /v1/audio/speech

Request

{
  "model": "qwen3-tts-0.6b-base",
  "input": "Hello, this is a test.",
  "voice": "default",
  "speed": 1.0,
  "response_format": "wav"
}

Response

Binary audio data with appropriate Content-Type header.

Example (curl)

curl -X POST http://localhost:8080/v1/audio/speech \ -H "Content-Type: application/json" \ -d '{"model": "qwen3-tts-0.6b-base", "input": "Hello world"}' \ --output speech.wav

Available Models

Model	Size	Quality	Speed
`qwen3-tts-0.6b-base`	1.2 GB	Good	Fast
`qwen3-tts-1.7b-base`	3.4 GB	Better	Medium

For voice cloning, use customvoice variants. For voice design, use voicedesign variants.

Audio Formats

Format	Extension	Notes
WAV	`.wav`	Uncompressed, highest quality
MP3	`.mp3`	Compressed, widely compatible
OGG	`.ogg`	Open format, good compression
FLAC	`.flac`	Lossless compression
AAC	`.aac`	High efficiency compression

Tips

Punctuation matters — Use proper punctuation for natural pauses
Break long text — Split very long text into paragraphs
Test different speeds — Find the right pace for your use case
Use appropriate models — Larger models = better quality but slower

Text-to-Speech