Voice Design

Create custom voices from text descriptions — no audio samples required.

Overview

Voice design generates a unique voice from a natural-language description. Describe the voice you want, and Izwi creates it. Key benefits:

  • No samples needed — Create voices from scratch
  • Infinite variety — Design any voice you can describe
  • Quick iteration — Rapidly test different voice concepts
  • Creative freedom — Perfect for characters and personas

Getting Started

Download a Voice Design Model

izwi pull qwen3-tts-0.6b-voicedesign

Design a Voice

Describe the voice you want:

A warm, friendly female voice with a slight British accent. Middle-aged, professional but approachable.

Using the Web UI

Step 1: Describe Your Voice

  1. Navigate to Voice Design in the sidebar
  2. Enter a description of your desired voice
  3. Be specific about the characteristics you want

Step 2: Generate Sample

  1. Enter sample text to hear the voice
  2. Click Generate
  3. Listen to the result

Step 3: Iterate

  • Adjust your description
  • Generate again
  • Repeat until satisfied

Voice Description Tips

Effective Descriptions

Include details about:

Aspect      Examples
Gender      Male, female, androgynous
Age         Young, middle-aged, elderly
Tone        Warm, authoritative, playful
Accent      British, Southern US, neutral
Pace        Fast, measured, deliberate
Energy      Energetic, calm, subdued
Character   Professional, friendly, mysterious
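
If you generate descriptions programmatically, the aspects above can be assembled into a single sentence. The following is a minimal sketch only: `compose_description` and its parameter names are hypothetical helpers, not part of Izwi, and the sentence template is just one reasonable phrasing.

```python
def _article(word):
    """Return 'an' before a vowel sound (approximated by first letter), else 'a'."""
    return "an" if word[:1].lower() in "aeiou" else "a"


def compose_description(gender, age=None, tone=None, accent=None,
                        pace=None, energy=None, character=None):
    """Build a one-sentence voice description from individual aspects.

    Hypothetical helper: Izwi only sees the final string, so any
    phrasing that reads naturally should work equally well.
    """
    # Adjectives that go in front of "<gender> voice", skipping unset ones.
    adjectives = [word for word in (tone, character, age) if word]
    head = ", ".join(adjectives)
    phrase = f"{head} {gender} voice" if head else f"{gender} voice"

    # Optional trailing details introduced by "with".
    extras = []
    if accent:
        extras.append(f"{_article(accent)} {accent} accent")
    if pace:
        extras.append(f"{pace} pacing")
    if energy:
        extras.append(f"{energy} delivery")

    sentence = f"{_article(phrase).capitalize()} {phrase}"
    if extras:
        sentence += " with " + ", ".join(extras)
    return sentence + "."


if __name__ == "__main__":
    print(compose_description("female", age="middle-aged", tone="warm",
                              accent="British", pace="measured"))
```

Keeping each aspect as a separate argument makes it easy to change one characteristic at a time while iterating.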

Example Descriptions

News anchor:

A professional male voice, mid-30s, with a clear American accent. Authoritative and trustworthy, with measured pacing.

Children's narrator:

A warm, enthusiastic female voice. Friendly and expressive, perfect for storytelling. Slightly higher pitch with playful energy.

AI assistant:

A calm, neutral voice with no strong accent. Clear and helpful, not robotic but not overly emotional. Professional and efficient.

Audiobook narrator:

A rich, deep male voice with a slight British accent. Mature and sophisticated, with excellent diction and a storytelling quality.

Using the CLI

Generate with Voice Description

izwi tts "Hello, this is my designed voice" \
  --model qwen3-tts-0.6b-voicedesign \
  --speaker "A warm, friendly female voice with a British accent" \
  --output designed.wav
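
To call the CLI from a script, the same command can be built as an argument list. This is a sketch under the assumption that `izwi` is on your `PATH`; it uses only the flags shown above (`--model`, `--speaker`, `--output`), and the function names are hypothetical.

```python
import shutil
import subprocess


def build_tts_command(text, description, output_path,
                      model="qwen3-tts-0.6b-voicedesign"):
    """Return the argv list for the `izwi tts` command shown above.

    Passing a list (rather than a shell string) avoids quoting problems
    when the text or description contains quotes or spaces.
    """
    return [
        "izwi", "tts", text,
        "--model", model,
        "--speaker", description,
        "--output", output_path,
    ]


def run_tts(text, description, output_path):
    """Run the command, failing early if the izwi binary is not installed."""
    if shutil.which("izwi") is None:
        raise RuntimeError("izwi was not found on PATH")
    subprocess.run(build_tts_command(text, description, output_path), check=True)


if __name__ == "__main__":
    print(build_tts_command(
        "Hello, this is my designed voice",
        "A warm, friendly female voice with a British accent",
        "designed.wav",
    ))
```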

Using the API

Endpoint

POST /v1/audio/speech

Request

{
  "model": "qwen3-tts-0.6b-voicedesign",
  "input": "Hello, this is my designed voice.",
  "voice_description": "A warm, friendly female voice with a British accent"
}

Example (curl)

curl -X POST http://localhost:8080/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-tts-0.6b-voicedesign",
    "input": "Hello, this is my designed voice.",
    "voice_description": "A warm, friendly female voice"
  }' \
  --output designed.wav
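
Example (Python)

The same request can be sent with Python's standard library. This is a sketch that mirrors the curl example: the host and port are taken from that example, the function names are hypothetical, and it assumes the response body is the raw audio, as the `--output designed.wav` flag in the curl call suggests.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # taken from the curl example; adjust as needed


def build_speech_request(text, description,
                         model="qwen3-tts-0.6b-voicedesign",
                         base_url=BASE_URL):
    """Build (but do not send) the POST request for /v1/audio/speech."""
    payload = {
        "model": model,
        "input": text,
        "voice_description": description,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(request, output_path, timeout=120):
    """Send the request and write the response body to a file.

    Assumption: the body is the audio itself, not a JSON wrapper.
    Returns the number of bytes written.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(output_path, "wb") as handle:
        handle.write(audio)
    return len(audio)


if __name__ == "__main__":
    req = build_speech_request(
        "Hello, this is my designed voice.",
        "A warm, friendly female voice",
    )
    save_speech(req, "designed.wav")  # requires a running Izwi server
```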

Available Models

Model                        Size     Quality
qwen3-tts-0.6b-voicedesign   1.2 GB   Good
qwen3-tts-1.7b-voicedesign   3.4 GB   Better

The larger model tends to interpret complex, multi-part descriptions more accurately, at the cost of a bigger download.
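
If a script has to choose between the two models, one simple rule is to take the largest one that fits the available disk space. The sketch below hard-codes the sizes from the table and treats them as the only constraint, which is an assumption: real memory and runtime requirements are not covered here. `pick_model` is a hypothetical helper.

```python
import shutil

# (model name, size in GB) as listed in the table above.
MODELS = [
    ("qwen3-tts-0.6b-voicedesign", 1.2),
    ("qwen3-tts-1.7b-voicedesign", 3.4),
]


def pick_model(free_gb):
    """Return the largest listed model whose size fits in free_gb, or None."""
    fitting = [(name, size) for name, size in MODELS if size <= free_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda entry: entry[1])[0]


if __name__ == "__main__":
    # Free space on the current drive, converted from bytes to GB.
    free_gb = shutil.disk_usage(".").free / 1e9
    print(pick_model(free_gb))
```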


Best Practices

Be Specific

❌ "A nice voice"

✅ "A warm, professional female voice in her 40s with a calm, reassuring tone"

Use Comparisons

"Similar to a podcast host — conversational but polished"

Describe the Context

"A voice suitable for meditation apps — slow, soothing, and peaceful"

Iterate

Start broad, then refine:

  1. "A male voice"
  2. "A young male voice with energy"
  3. "A young male voice with energy, like a sports commentator"
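
To compare refinement steps side by side, it helps to keep the sample text fixed and give each step its own output file. The sketch below only plans the requests; it does not contact a server. `plan_iterations` and `slugify` are hypothetical helpers, and the payload fields are the ones shown in the API section.

```python
import re


def slugify(text, max_len=40):
    """Turn a description into a short, filename-safe slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_len].rstrip("-")


def plan_iterations(descriptions, sample_text,
                    model="qwen3-tts-0.6b-voicedesign"):
    """Pair each description with a numbered output file and request payload.

    The sample text is identical for every step, so any difference you
    hear comes from the description alone.
    """
    plan = []
    for step, description in enumerate(descriptions, start=1):
        plan.append({
            "output": f"step{step:02d}-{slugify(description)}.wav",
            "payload": {
                "model": model,
                "input": sample_text,
                "voice_description": description,
            },
        })
    return plan


if __name__ == "__main__":
    steps = [
        "A male voice",
        "A young male voice with energy",
        "A young male voice with energy, like a sports commentator",
    ]
    for entry in plan_iterations(steps, "Hello, this is my designed voice."):
        print(entry["output"])
```

Because the same description may produce slightly different voices, a single comparison per step is a rough guide rather than a definitive result.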

Limitations

  • Consistency — Same description may produce slightly different voices
  • Extreme requests — Very unusual voices may not generate well
  • Accents — Some accents are better supported than others
  • Singing — Designed for speech, not singing

Voice Design vs Voice Cloning

Aspect        Voice Design           Voice Cloning
Input         Text description       Audio sample
Use case      Create new voices      Replicate existing voices
Consistency   May vary slightly      More consistent
Flexibility   Unlimited creativity   Limited to source
