Give your OpenClaw agents a truly local voice
OpenClaw has taken the AI world by storm, becoming one of the most popular open-source personal AI assistant frameworks. It runs on your machine, connects to the messaging platforms you already use—WhatsApp, Telegram, Slack, Discord, Signal, iMessage, and more—and can browse the web, control your system, and extend itself with skills and plugins. OpenClaw even has a built-in Talk Mode for voice conversations. But that voice support relies on ElevenLabs, a cloud service. That's where Izwi comes in.
The Local Voice Gap in AI Agents
If you've been using OpenClaw, you know how capable it is. It already supports voice through Talk Mode, but that pipeline sends your audio to ElevenLabs' servers for processing.
Voice isn't just a nice-to-have feature. It's the most natural way humans communicate. When your AI agent can speak and listen, the interaction becomes fluid, intuitive, and remarkably human. But there's a catch: cloud-based voice solutions require sending your audio to remote servers, introducing privacy concerns and latency issues.
Why Local Voice Matters for Agents
When you're building an AI agent that lives on your machine—like OpenClaw—you want it to be truly yours. That means:
- Privacy: Your voice data never leaves your device
- Speed: Sub-50ms latency for real-time conversations
- Reliability: Works offline, no API rate limits
- Control: You own the models and the data
This is exactly why we built Izwi—a local-first audio inference engine that runs entirely on your hardware.
Integrating Izwi with OpenClaw
The integration is surprisingly simple. Since Izwi provides an OpenAI-compatible API, you can drop it right into your existing OpenClaw setup. Here's how.
Step 1: Install and Start Izwi
First, download and install Izwi from GitHub Releases (.dmg for macOS, .deb for Linux, or the Windows installer). Then start the server:
```shell
izwi serve
```
This starts a local HTTP server on localhost:8080 with OpenAI-compatible endpoints.
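Before wiring anything into OpenClaw, it's worth confirming the server is reachable. A minimal sketch in Python (this assumes Izwi exposes the standard OpenAI-compatible `/v1/models` listing endpoint; the host and port match the defaults above):

```python
import json
import urllib.request


def izwi_base_url(host="localhost", port=8080):
    """Build the base URL for Izwi's OpenAI-compatible API."""
    return f"http://{host}:{port}/v1"


def list_models(base_url=None, timeout=5):
    """Return the model ids the server reports, or None if it is unreachable."""
    url = (base_url or izwi_base_url()) + "/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
        return [m["id"] for m in payload.get("data", [])]
    except OSError:
        # Covers connection refused, timeouts, and HTTP errors.
        return None
```

If `list_models()` returns a list, the server is up, and the models you pull in the next step should appear in it.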
Step 2: Pull the Required Models
Izwi needs models for speech recognition and synthesis. Pull them with:
```shell
izwi pull Qwen3-TTS-12Hz-0.6B-Base
izwi pull Qwen3-ASR-0.6B
```
These compact models run efficiently on most hardware while delivering impressive quality.
Step 3: Configure OpenClaw to Use Local Voice
In your OpenClaw configuration, you can now point audio processing to Izwi. Create a custom skill that leverages Izwi's endpoints:
```python
from openai import OpenAI

# Point the standard OpenAI client at the local Izwi server.
izwi_client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed",  # Izwi runs locally and does not check the key
)


def transcribe_audio(audio_path):
    """Transcribe a local audio file with the Qwen3 ASR model."""
    with open(audio_path, "rb") as audio_file:
        transcript = izwi_client.audio.transcriptions.create(
            model="qwen3-asr-0.6b",  # should match the ASR model you pulled
            file=audio_file,
        )
    return transcript.text


def synthesize_speech(text, output_path):
    """Synthesize speech for `text` and write the audio to output_path."""
    response = izwi_client.audio.speech.create(
        model="qwen3-tts-0.6b-base",  # should match the TTS model you pulled
        input=text,
        voice="alloy",
    )
    response.stream_to_file(output_path)
```
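One practical wrinkle: long documents are easier on a local TTS model if you split them into sentence-sized chunks and synthesize each chunk in turn. A small helper sketch (the 400-character budget is an illustrative assumption, not a documented Izwi limit):

```python
import re


def chunk_text(text, max_chars=400):
    """Split text into chunks of at most max_chars, breaking at sentence ends.

    A single sentence longer than the budget is kept whole rather than cut
    mid-sentence.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to the `synthesize_speech` helper above and the resulting audio files played or concatenated in order.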
Step 4: Create Voice Skills for Your Agent
Now you can extend OpenClaw with fully local voice capabilities. Here are some ideas:
- Voice Notes: Record voice messages and have your agent transcribe, summarize, and file them appropriately
- Spoken Notifications: Have your agent announce important events through your speakers using TTS
- Voice Commands: Create a skill that listens for wake words and processes voice commands locally
- Meeting Summaries: Transcribe meeting recordings and generate audio summaries
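The Voice Commands idea above boils down to: transcribe, check for a wake word, and route the rest of the utterance to a skill. A minimal routing sketch (the wake word and skill names are hypothetical placeholders, not OpenClaw built-ins):

```python
WAKE_WORD = "claw"  # hypothetical wake word


def route_command(transcript):
    """Map a transcribed utterance to (skill, argument), or None to ignore."""
    words = transcript.lower().strip().split()
    if not words or words[0] != WAKE_WORD:
        return None  # wake word not detected; ignore the utterance
    command = words[1:]
    if command[:2] == ["take", "note"]:
        return ("voice_notes", " ".join(command[2:]))
    if command[:1] == ["announce"]:
        return ("spoken_notifications", " ".join(command[1:]))
    # Anything else goes to the agent as a free-form request.
    return ("fallback", " ".join(command))
```

In a real skill, the transcript would come from `transcribe_audio` in Step 3, and the returned skill name would dispatch to your agent's handlers.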
Advanced Features
Voice Cloning
Want your agent to have a specific voice? Izwi's voice cloning feature lets you create custom voices from just a few seconds of audio. Perfect for giving your agent a consistent, recognizable personality. You can manage voices through the Izwi desktop app or the API—see the Voice Cloning docs for details.
Speaker Diarization
Processing meeting recordings? Izwi can automatically identify and separate different speakers, making it easy to generate structured transcripts with speaker labels. This is available through both the desktop app and the API.
Voice Design
Create entirely new voices from text descriptions. Describe the voice you want—warm and professional, energetic and friendly—and Izwi will generate it.
Performance Considerations
Running voice AI locally requires some hardware resources, but modern machines handle it well:
| Hardware | Model Size | Latency | Quality |
|---|---|---|---|
| Apple Silicon (M1/M2/M3) | 0.6B | <50ms | Good |
| Apple Silicon (M1/M2/M3) | 1.7B | <100ms | Excellent |
| NVIDIA GPU (8GB+) | 0.6B | <30ms | Good |
| NVIDIA GPU (8GB+) | 1.7B | <60ms | Excellent |
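If you want to pick a model size programmatically, the table above reduces to a small decision rule. An illustrative helper encoding it (the thresholds and latency figures are this post's numbers, not a published Izwi spec):

```python
def recommend_model(prefer_quality=True, gpu_vram_gb=0, apple_silicon=False):
    """Pick a model size and rough latency target from the hardware table."""
    capable = apple_silicon or gpu_vram_gb >= 8
    if not capable:
        return None  # below the hardware the table covers
    if prefer_quality:
        size = "1.7B"
        latency_ms = 60 if gpu_vram_gb >= 8 else 100
    else:
        size = "0.6B"
        latency_ms = 30 if gpu_vram_gb >= 8 else 50
    return {"size": size, "approx_latency_ms": latency_ms}
```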
The Privacy Advantage
Here's what makes local voice processing special: your voice never leaves your machine. Unlike cloud-based solutions that transmit every audio snippet to remote servers for processing, Izwi handles everything locally.
This matters because your voice contains:
- Biometric data that can identify you
- Context about your life, work, and relationships
- Potentially sensitive information you speak without thinking
With Izwi + OpenClaw, you get the best of both worlds: powerful AI agents with voice capabilities, and the peace of mind that comes with true local processing.
Getting Started
Ready to give your OpenClaw agents a truly local voice? Here's what you need:
- Download Izwi from GitHub Releases or the downloads page
- Check our documentation at izwiai.com/docs
- Join the community on GitHub
The combination of OpenClaw's agent capabilities and Izwi's local voice processing opens up entirely new possibilities for private, powerful AI assistants. Your agents can finally speak and listen—all without sending a single byte of audio to the cloud.
Try It Today
Download Izwi for free and start building voice-enabled agents. Join thousands of developers who are building privacy-first AI applications.