Text-to-Speech

Generate natural, human-like speech from text using state-of-the-art TTS models.

Overview

Izwi's text-to-speech converts written text into spoken audio. Features include:

  • Natural voices — High-quality, expressive speech
  • Multiple formats — WAV, MP3, OGG, FLAC, AAC
  • Speed control — Adjust playback speed
  • Streaming — Real-time audio generation
  • Local processing — No cloud, complete privacy

Getting Started

Download a TTS Model

izwi pull qwen3-tts-0.6b-base

Generate Speech

Command line:

izwi tts "Hello, welcome to Izwi!" --output hello.wav

With playback:

izwi tts "Hello, welcome to Izwi!" --play

Using the CLI

Basic Usage

izwi tts "Your text here" --output output.wav

Options

Option             Description                  Default
--model, -m        TTS model to use             qwen3-tts-0.6b-base
--output, -o       Output file path             stdout
--format, -f       Audio format                 wav
--speed, -r        Speech speed (0.5-2.0)       1.0
--speaker, -s      Voice/speaker ID             default
--temperature, -t  Sampling temperature         0.7
--play, -p         Play audio after generation
--stream           Stream output in real-time

Examples

Different formats:

izwi tts "Hello world" --format mp3 --output hello.mp3
izwi tts "Hello world" --format ogg --output hello.ogg

Adjust speed:

# Slower (0.5x - 1.0x)
izwi tts "Speaking slowly" --speed 0.75 --output slow.wav

# Faster (1.0x - 2.0x)
izwi tts "Speaking quickly" --speed 1.5 --output fast.wav

Read from stdin:

echo "Text from pipe" | izwi tts - --output piped.wav
cat article.txt | izwi tts - --output article.wav

Streaming output:

izwi tts "Long text for streaming" --stream --play
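When the CLI is driven from a script, the options above can be assembled and checked before the command runs. The following is a minimal Python sketch, assuming the `izwi` binary is on the PATH. `build_tts_command` and `run_tts` are hypothetical helpers, not part of Izwi; they use only the flags listed in the options table.

```python
import subprocess

# Formats listed in this guide.
SUPPORTED_FORMATS = {"wav", "mp3", "ogg", "flac", "aac"}


def build_tts_command(text, output, fmt="wav", speed=1.0, model=None):
    """Build an argument list for `izwi tts` (hypothetical helper)."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    # The options table documents a speed range of 0.5-2.0.
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    cmd = ["izwi", "tts", text,
           "--format", fmt,
           "--speed", str(speed),
           "--output", output]
    if model is not None:
        cmd += ["--model", model]
    return cmd


def run_tts(text, output, **options):
    """Run the command; assumes `izwi` is installed and on the PATH."""
    subprocess.run(build_tts_command(text, output, **options), check=True)
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or other special characters.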

Using the Web UI

  1. Navigate to Text to Speech in the sidebar
  2. Enter your text in the input field
  3. Select a voice (if available)
  4. Click Generate
  5. Play or download the audio

Features

  • Live preview — Hear audio as it generates
  • Download — Save audio files locally
  • History — Access recent generations

Using the API

Endpoint

POST /v1/audio/speech

Request

{
  "model": "qwen3-tts-0.6b-base",
  "input": "Hello, this is a test.",
  "voice": "default",
  "speed": 1.0,
  "response_format": "wav"
}

Response

Binary audio data, returned with a Content-Type header that matches the requested response_format.

Example (curl)

curl -X POST http://localhost:8080/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3-tts-0.6b-base", "input": "Hello world"}' \
  --output speech.wav
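The same request can be sent from Python using only the standard library. This is a sketch, not an official client: the base URL is taken from the curl example, the field names come from the request body shown above, and `build_speech_request` and `synthesize` are hypothetical helper names.

```python
import json
import urllib.request

# Base URL from the curl example; adjust if your server runs elsewhere.
BASE_URL = "http://localhost:8080"


def build_speech_request(text, model="qwen3-tts-0.6b-base", voice="default",
                         speed=1.0, response_format="wav"):
    """Build a POST request for /v1/audio/speech (hypothetical helper)."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "speed": speed,
        "response_format": response_format,
    }
    return urllib.request.Request(
        BASE_URL + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, path, **options):
    """Send the request and write the binary audio response to `path`."""
    request = build_speech_request(text, **options)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

With a server running locally, `synthesize("Hello world", "speech.wav")` would write the returned audio to `speech.wav`.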

Available Models

Model                 Size     Quality   Speed
qwen3-tts-0.6b-base   1.2 GB   Good      Fast
qwen3-tts-1.7b-base   3.4 GB   Better    Medium

For voice cloning, use customvoice variants. For voice design, use voicedesign variants.


Audio Formats

Format   Extension   Notes
WAV      .wav        Uncompressed, highest quality
MP3      .mp3        Compressed, widely compatible
OGG      .ogg        Open format, good compression
FLAC     .flac       Lossless compression
AAC      .aac        High efficiency compression
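Scripts that generate many files may want to derive the --format value from the output filename so the two stay in sync. A small Python sketch based on the table above; `format_for_path` is a hypothetical helper, and this guide does not say whether the CLI infers the format from the extension on its own.

```python
from pathlib import Path

# Extension-to-format mapping taken from the Audio Formats table.
EXTENSION_TO_FORMAT = {
    ".wav": "wav",
    ".mp3": "mp3",
    ".ogg": "ogg",
    ".flac": "flac",
    ".aac": "aac",
}


def format_for_path(path, default="wav"):
    """Return the format name for an output path (hypothetical helper).

    Falls back to `default` (wav, the CLI default) for unknown extensions.
    """
    suffix = Path(path).suffix.lower()
    return EXTENSION_TO_FORMAT.get(suffix, default)
```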

Tips

  1. Punctuation matters — Use proper punctuation for natural pauses
  2. Break long text — Split very long text into paragraphs
  3. Test different speeds — Find the right pace for your use case
  4. Use appropriate models — Larger models = better quality but slower
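The second tip can be automated by splitting text before synthesis. The sketch below breaks text on blank lines and then on sentence-ending punctuation. `split_into_chunks` is a hypothetical helper, and the 500-character limit is an arbitrary assumption, not an Izwi requirement.

```python
import re


def split_into_chunks(text, max_chars=500):
    """Split text into paragraph- and sentence-aligned chunks.

    Chunks stay at or under `max_chars` where possible; a single sentence
    longer than the limit is kept whole rather than cut mid-sentence.
    """
    chunks = []
    # Paragraphs are separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        paragraph = " ".join(paragraph.split())  # collapse internal whitespace
        if not paragraph:
            continue
        current = ""
        # Split after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            if current and len(current) + 1 + len(sentence) > max_chars:
                chunks.append(current)
                current = sentence
            else:
                current = f"{current} {sentence}".strip()
        if current:
            chunks.append(current)
    return chunks
```

Each chunk can then be passed to a separate `izwi tts` call or API request, and the resulting audio files played or joined in order.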

See Also