Production ready • Open Source
Izwi

On-Device Voice AI, Built for Production

Production-ready on-device inference for real-time voice applications. Build voice AI that runs locally with sub-100ms latency — no cloud dependencies, no API costs.

<100ms Latency
100% Private
Rust-Native Engine
Star on GitHub • Apache 2.0 License

Izwi Desktop — Prototype and test voice AI locally

Ship Voice AI to Production

Choose how to integrate Izwi into your stack — desktop app for prototyping, or server SDK for production deployment.

Desktop Playground

GUI for exploration & experimentation

A native desktop application for macOS, Windows, and Linux. Prototype voice AI features, test models, and build audio workflows before deploying to production.

  • Visual interface for all audio features
  • Real-time voice conversations
  • Built-in model management
  • Drag & drop audio transcription
Download Desktop App

Server & API

Production-ready inference engine

Production-ready HTTP API server with OpenAI-compatible endpoints. Deploy voice AI at scale with a simple REST API designed for low latency and high throughput.

  • OpenAI-compatible API endpoints
  • Drop-in replacement for cloud APIs
  • Rust-native for maximum performance
  • WebSocket support for streaming
View API Documentation
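Because the server exposes OpenAI-style endpoints under `/v1`, you can hit it with nothing but the standard library. A minimal sketch — the `/v1/audio/speech` path and JSON body mirror OpenAI's speech endpoint, which the SDK example further down also targets; check the API docs for Izwi's exact schema:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # local Izwi server

def speech_request(text: str, voice: str = "alloy",
                   model: str = "qwen3-tts-0.6b-base") -> urllib.request.Request:
    """Build a POST request for the OpenAI-style /audio/speech endpoint."""
    body = json.dumps({"model": model, "input": text, "voice": voice}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = speech_request("Hello from Izwi!")
# Sending it requires a running server:
#   with urllib.request.urlopen(req) as resp:
#       open("output.mp3", "wb").write(resp.read())
```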

Complete Voice AI Stack

Everything you need to build production voice AI applications. From transcription to synthesis, all running locally.

Text-to-Speech

Generate natural, expressive speech from text. Multiple voice options with speed and pitch control.

Speech Recognition

High-accuracy transcription with word-level timestamps. Supports multiple languages and audio formats.
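Word-level timestamps typically arrive as a list of word entries alongside the full text. A sketch of consuming them — the payload below is hypothetical and follows OpenAI's `verbose_json` field names (`words`, `start`, `end`); Izwi's exact schema may differ:

```python
# Hypothetical verbose transcription payload (field names follow
# OpenAI's verbose_json convention; not guaranteed to match Izwi exactly).
transcript = {
    "text": "hello world",
    "words": [
        {"word": "hello", "start": 0.00, "end": 0.42},
        {"word": "world", "start": 0.48, "end": 0.90},
    ],
}

def words_with_timestamps(payload: dict) -> list[tuple[str, float, float]]:
    """Flatten word entries into (word, start_seconds, end_seconds) tuples."""
    return [(w["word"], w["start"], w["end"]) for w in payload.get("words", [])]

print(words_with_timestamps(transcript))
```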

Voice Cloning

Clone any voice from just seconds of audio. Create custom speakers for your applications.

Voice Design

Create unique voices from text descriptions. Design the perfect voice for your brand or application.

Conversational AI

Real-time voice conversations with AI. Natural back-and-forth dialogue with automatic speech detection.
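Izwi handles speech detection internally; for intuition only, here is a naive energy-based voice-activity sketch (not Izwi's implementation) — a frame counts as speech when its RMS energy clears a threshold:

```python
import math

def is_speech(frame: list[float], threshold: float = 0.02) -> bool:
    """Naive VAD: treat a frame as speech if its RMS energy
    exceeds a fixed threshold."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return rms > threshold

silence = [0.001] * 160                                # near-silent frame
speech = [0.1 * math.sin(i / 3) for i in range(160)]   # loud tone, RMS ~0.07
print(is_speech(silence), is_speech(speech))
```

Real systems use trained models with hangover smoothing rather than a single threshold, but the frame-by-frame decision loop is the same shape.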

Speaker Diarization

Automatically identify and separate multiple speakers in recordings. Perfect for meetings and interviews.

Your Data Never Leaves Your Machine

On-device inference processes everything locally: zero external dependencies, no API keys, no usage limits. Full control over your voice AI infrastructure.

No internet connection required
No API keys or authentication
No usage limits or quotas
Fully auditable open-source code
Local Processing
$ izwi serve
✓ Models loaded locally
✓ Server running on localhost:8080
✓ GPU acceleration: Metal (Apple Silicon)

Processing: 100% local
Cloud API calls: 0
Data sent externally: None

SDK for Production Deployment

Drop-in OpenAI-compatible API. Deploy voice AI to production without changing your codebase.

example.py
from openai import OpenAI

# Point to your local Izwi server
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"
)

# Generate speech
response = client.audio.speech.create(
    model="qwen3-tts-0.6b-base",
    input="Hello from Izwi!",
    voice="alloy"
)
response.stream_to_file("output.mp3")

# Transcribe audio
with open("audio.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="qwen3-asr-0.6b",
        file=audio_file
    )
print(transcript.text)

Built for Real-Time Performance

Rust-native inference engine optimized for production workloads. Sub-100ms latency for responsive voice experiences.

<100ms
Time to First Token
10x
Faster than cloud APIs
0
Cloud Dependencies
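Time to first token is easy to verify against your own server: start a timer, pull the first chunk from a streaming response, stop. A sketch with a stand-in generator in place of a live Izwi stream:

```python
import time

def time_to_first_token(chunks) -> float:
    """Seconds until the first chunk arrives from a streaming
    response (any iterable of audio/text chunks)."""
    start = time.perf_counter()
    next(iter(chunks))
    return time.perf_counter() - start

def fake_stream():
    """Stand-in for a live stream: first chunk after ~50 ms."""
    time.sleep(0.05)
    yield b"chunk"

ttft = time_to_first_token(fake_stream())
print(f"TTFT: {ttft * 1000:.0f} ms")
```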

Apple Silicon Native

Optimized for M1/M2/M3 with Metal GPU acceleration

NVIDIA CUDA Support

Leverage your NVIDIA GPU for maximum performance

Ship On-Device Voice AI Today

Join thousands of developers building production voice AI applications with zero cloud costs and complete data control.

Apache 2.0 License • Free forever