On-Device Voice AI, Built for Production
Production-ready on-device inference for real-time voice applications. Build voice AI that runs locally with sub-100ms latency — no cloud dependencies, no API costs.

Izwi Desktop — Prototype and test voice AI locally
Ship Voice AI to Production
Choose how to integrate Izwi into your stack — desktop app for prototyping, or server SDK for production deployment.
Desktop Playground
GUI for exploration & experimentation
A native desktop application for macOS, Windows, and Linux. Prototype voice AI features, test models, and build audio workflows before deploying to production.
- Visual interface for all audio features
- Real-time voice conversations
- Built-in model management
- Drag & drop audio transcription
Server & API
Production-ready inference engine
A production-ready HTTP API server with OpenAI-compatible endpoints. Deploy voice AI at scale with a simple REST API designed for low latency and high throughput.
- OpenAI-compatible API endpoints
- Drop-in replacement for cloud APIs
- Rust-native for maximum performance
- WebSocket support for streaming
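Because the endpoints follow OpenAI's API shape, you can call the server with nothing but the Python standard library. The sketch below is illustrative: it assumes the server exposes OpenAI's `/v1/audio/transcriptions` endpoint on `localhost:8080` with the `qwen3-asr-0.6b` model (both shown later on this page); the `build_multipart` and `transcribe` helpers are hypothetical names, and field names follow OpenAI's API — verify against Izwi's docs.

```python
import json
import urllib.request
import uuid

def build_multipart(fields, file_field, filename, file_bytes):
    """Encode plain form fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n'
            "\r\n"
            f"{value}\r\n"
        )
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    )
    body = "".join(parts).encode() + file_bytes + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"

def transcribe(path, base_url="http://localhost:8080/v1"):
    """POST an audio file to the OpenAI-compatible transcription endpoint."""
    with open(path, "rb") as f:
        audio = f.read()
    body, content_type = build_multipart(
        {"model": "qwen3-asr-0.6b"}, "file", path, audio
    )
    req = urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(transcribe("audio.wav")["text"])
```

Multipart encoding is done by hand here only to keep the sketch dependency-free; in practice the `openai` client shown below handles this for you.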
Complete Voice AI Stack
Everything you need to build production voice AI applications. From transcription to synthesis, all running locally.
Text-to-Speech
Generate natural, expressive speech from text. Multiple voice options with speed and pitch control.
Speech Recognition
High-accuracy transcription with word-level timestamps. Supports multiple languages and audio formats.
Voice Cloning
Clone any voice from just seconds of audio. Create custom speakers for your applications.
Voice Design
Create unique voices from text descriptions. Design the perfect voice for your brand or application.
Conversational AI
Real-time voice conversations with AI. Natural back-and-forth dialogue with automatic speech detection.
Speaker Diarization
Automatically identify and separate multiple speakers in recordings. Perfect for meetings and interviews.
Your Data Never Leaves Your Machine
On-device inference means your data never leaves the device. Zero external dependencies, no API keys, no usage limits. Full control over your voice AI infrastructure.
$ izwi serve
✓ Models loaded locally
✓ Server running on localhost:8080
✓ GPU acceleration: Metal (Apple Silicon)
Processing: 100% local
Cloud API calls: 0
Data sent externally: None
SDK for Production Deployment
Drop-in OpenAI-compatible API. Deploy voice AI to production without changing your codebase.
from openai import OpenAI

# Point to your local Izwi server
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed",
)

# Generate speech
response = client.audio.speech.create(
    model="qwen3-tts-0.6b-base",
    input="Hello from Izwi!",
    voice="alloy",
)
response.stream_to_file("output.mp3")

# Transcribe audio
transcript = client.audio.transcriptions.create(
    model="qwen3-asr-0.6b",
    file=open("audio.wav", "rb"),
)
print(transcript.text)
Built for Real-Time Performance
Rust-native inference engine optimized for production workloads. Sub-100ms latency for responsive voice experiences.
Apple Silicon Native
Optimized for M1/M2/M3 with Metal GPU acceleration
NVIDIA CUDA Support
Leverage your NVIDIA GPU for maximum performance
Ship On-Device Voice AI Today
Join thousands of developers building production voice AI applications with zero cloud costs and complete data control.
Apache 2.0 License • Free forever