Features
Izwi provides a suite of audio AI capabilities. Features are available through the web UI, desktop app, command line, and API, though not every feature is exposed on every interface (see Feature Comparison).
Core Features
| Feature | Description | Guide |
|---|---|---|
| Voice | Real-time voice conversations with AI | Voice Guide |
| Chat | Text-based AI conversations | Chat Guide |
| Text-to-Speech | Generate natural speech from text | TTS Guide |
| Transcription | Convert audio to text | Transcription Guide |
| Diarization | Identify multiple speakers | Diarization Guide |
| Voice Cloning | Clone voices from audio samples | Voice Cloning Guide |
| Voice Design | Create voices from descriptions | Voice Design Guide |
Feature Comparison
| Feature | Web UI | Desktop | CLI | API |
|---|---|---|---|---|
| Voice | ✓ | ✓ | — | ✓ |
| Chat | ✓ | ✓ | ✓ | ✓ |
| Text-to-Speech | ✓ | ✓ | ✓ | ✓ |
| Transcription | ✓ | ✓ | ✓ | ✓ |
| Diarization | ✓ | ✓ | — | ✓ |
| Voice Cloning | ✓ | ✓ | ✓ | ✓ |
| Voice Design | ✓ | ✓ | ✓ | ✓ |
Getting Started
Start the server:

    izwi serve

Open the web UI:

    http://localhost:8080

Download required models:

    izwi pull qwen3-tts-0.6b-base
    izwi pull qwen3-asr-0.6b

Model Requirements
Different features require different models:
| Feature | Required Models |
|---|---|
| Voice | TTS + ASR + Chat model |
| Chat | Chat model (Qwen3 or Gemma) |
| Text-to-Speech | TTS model |
| Transcription | ASR model (Qwen3 or Parakeet) |
| Diarization | Diarization model (Sortformer) |
| Forced Alignment | Forced aligner model |
| Voice Cloning | TTS CustomVoice model |
| Voice Design | TTS VoiceDesign model |
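
For scripted setups, the Getting Started commands and the table above can be combined into a couple of small helpers. The sketch below is illustrative only: `required_models` and `wait_for_ui` are hypothetical function names, not part of the `izwi` CLI. The lookup simply echoes the table cells, and the readiness check assumes the web UI answers a plain HTTP GET at the given URL with a success status, which may not match how your installation behaves.

```shell
#!/bin/sh
# Sketch only: both functions are hypothetical helpers, not part of the izwi CLI.

# Print the "Required Models" cell for a feature name, copied from the table.
required_models() {
  case "$1" in
    "Voice")            echo "TTS + ASR + Chat model" ;;
    "Chat")             echo "Chat model (Qwen3 or Gemma)" ;;
    "Text-to-Speech")   echo "TTS model" ;;
    "Transcription")    echo "ASR model (Qwen3 or Parakeet)" ;;
    "Diarization")      echo "Diarization model (Sortformer)" ;;
    "Forced Alignment") echo "Forced aligner model" ;;
    "Voice Cloning")    echo "TTS CustomVoice model" ;;
    "Voice Design")     echo "TTS VoiceDesign model" ;;
    *) echo "unknown feature: $1" >&2; return 1 ;;
  esac
}

# Poll a URL until it answers or the attempts run out.
# Args: url, attempts (default 30), delay between attempts in seconds (default 1).
wait_for_ui() {
  url="$1"; attempts="${2:-30}"; delay="${3:-1}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: treat HTTP errors as failure, -s: quiet, --max-time: cap each probe
    if curl -fs --max-time 2 -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example flow, using only the commands shown in Getting Started:
#   izwi pull qwen3-tts-0.6b-base
#   izwi pull qwen3-asr-0.6b
#   izwi serve &
#   wait_for_ui http://localhost:8080 && echo "web UI is reachable"
```

Keeping the lookup as a plain `case` statement means it can be updated by hand whenever the table changes, and polling with a bounded number of attempts avoids a script that hangs forever if the server fails to start.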
Next Steps
Choose a feature to learn more:
- Voice Mode — Real-time conversations
- Text-to-Speech — Generate speech
- Transcription — Convert audio to text