Voice Mode
Voice mode enables real-time spoken conversations with AI. Speak naturally and receive spoken responses — all processed locally on your device.
Overview
Voice mode combines:
- Speech recognition — Converts your voice to text
- AI chat — Processes your message and generates a response
- Text-to-speech — Speaks the response back to you
Everything runs locally with no cloud services.
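The three stages above form a simple chain: audio in, text, response text, audio out. The following is a minimal sketch of that flow, not izwi's actual implementation. The `VoicePipeline` class and its `transcribe`, `reply`, and `synthesize` callables are hypothetical names introduced here for illustration, and the toy stand-ins only pass bytes and strings around so the example runs without any models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Chains the three stages of one voice turn (hypothetical, for illustration)."""

    transcribe: Callable[[bytes], str]   # speech recognition: audio -> text
    reply: Callable[[str], str]          # AI chat: user text -> response text
    synthesize: Callable[[str], bytes]   # text-to-speech: response text -> audio

    def turn(self, audio_in: bytes) -> bytes:
        """Run one full turn: recognize, respond, then speak."""
        text = self.transcribe(audio_in)
        answer = self.reply(text)
        return self.synthesize(answer)


# Toy stand-ins so the sketch runs without any models installed.
pipeline = VoicePipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    reply=lambda text: f"You said: {text}",
    synthesize=lambda text: text.encode("utf-8"),
)

audio_out = pipeline.turn(b"hello")
```

Keeping each stage behind its own callable mirrors the way the guide lets you swap the ASR, chat, and TTS models independently.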
Getting Started
Required Models
Download the necessary models:
    # Text-to-speech
    izwi pull qwen3-tts-0.6b-base
    # Speech recognition
    izwi pull qwen3-asr-0.6b
    # Chat (optional, for smarter responses)
    izwi pull qwen3-chat-0.6b-4bit
Start Voice Mode
- Start the server:
    izwi serve
- Open the web UI:
    http://localhost:8080/voice
- Click the microphone button to start speaking
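If you script the startup, it can help to wait until the server accepts connections before opening the voice page. The sketch below polls a TCP port using only the Python standard library. The `wait_for_port` helper is a hypothetical name, and the host and port are assumptions taken from the URL shown above (`localhost`, `8080`). It only confirms that something is listening on the port, not that voice mode itself is ready.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0,
                  interval_s: float = 0.5) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError if nothing is listening yet.
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Example (assumes `izwi serve` was started separately):
# if wait_for_port("localhost", 8080):
#     print("Open http://localhost:8080/voice")
```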
Using Voice Mode
Web UI
- Navigate to Voice in the sidebar
- Click the microphone button to start recording
- Speak your message
- Click again to stop recording (or wait for silence auto-detection to stop it for you)
- Listen to the AI response
Controls
| Control | Action |
|---|---|
| Microphone | Start/stop recording |
| Speaker | Mute/unmute responses |
| Settings | Configure voice and model |
Configuration
Select Voice
Choose from available voices in the settings panel. Different TTS models offer different voice options.
Select Models
Configure which models to use:
- ASR Model — For speech recognition
- TTS Model — For speech synthesis
- Chat Model — For response generation
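One way to keep track of the three model choices is a small record with one field per role. The sketch below is an assumption for illustration only: `VoiceModels` and `pull_commands` are hypothetical names, not part of izwi. The defaults are the model names from Required Models, and the only CLI command it produces is the `izwi pull` command shown there.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VoiceModels:
    """Hypothetical container for the three model choices; not an izwi type."""

    asr: str = "qwen3-asr-0.6b"
    tts: str = "qwen3-tts-0.6b-base"
    # The guide lists the chat model as optional, so None means "skip it".
    chat: Optional[str] = "qwen3-chat-0.6b-4bit"


def pull_commands(models: VoiceModels) -> List[str]:
    """Build the `izwi pull` command line for each selected model."""
    names = [models.asr, models.tts]
    if models.chat is not None:
        names.append(models.chat)
    return [f"izwi pull {name}" for name in names]


# Example: use the larger ASR model and no chat model.
commands = pull_commands(VoiceModels(asr="qwen3-asr-1.7b", chat=None))
```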
Audio Settings
- Auto-detect silence — Automatically stop recording when you stop speaking
- Playback speed — Adjust response playback speed
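The guide does not say how silence is detected. A common generic approach is an energy threshold: recording stops once several consecutive audio frames stay quiet after speech has been heard. The sketch below illustrates that idea under stated assumptions. The function names, the threshold of 500, and the three-frame quiet window are all hypothetical, and samples are assumed to be 16-bit PCM integers.

```python
import math
from typing import Iterable, Sequence


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def frames_until_silence(frames: Iterable[Sequence[int]],
                         threshold: float = 500.0,
                         max_quiet_frames: int = 3) -> int:
    """Count frames consumed before recording would stop.

    Stops after `max_quiet_frames` consecutive frames below `threshold`,
    but only once at least one loud frame (speech) has been seen.
    """
    quiet_run = 0
    heard_speech = False
    count = 0
    for frame in frames:
        count += 1
        if rms(frame) >= threshold:
            heard_speech = True
            quiet_run = 0          # speech resets the quiet counter
        elif heard_speech:
            quiet_run += 1
            if quiet_run >= max_quiet_frames:
                break              # enough trailing silence: stop recording
    return count


# Synthetic frames: a short pause mid-sentence does not end the recording.
loud = [4000, -4000] * 80
quiet = [10, -10] * 80
stream = [quiet, loud, loud, quiet, loud, quiet, quiet, quiet, loud]
stop_at = frames_until_silence(stream)
```

In this run the recording stops at the eighth frame, so the final loud frame is never consumed. A longer quiet window tolerates longer pauses at the cost of a slower stop.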
Tips for Best Results
- Use a good microphone — Built-in laptop mics work but external mics are better
- Minimize background noise — Find a quiet environment
- Speak clearly — Natural pace, clear pronunciation
- Wait for responses — Let the AI finish before speaking again
Keyboard Shortcuts
| Shortcut | Action |
|---|---|
| Space | Toggle recording |
| Escape | Cancel current recording |
| M | Mute/unmute |
Troubleshooting
No audio input detected
- Check your microphone permissions in system settings
- Ensure the correct input device is selected
- Test your microphone in another application
Responses are slow
- Use smaller models for faster responses
- Ensure models are already loaded before you start speaking, rather than being loaded on demand at the first request
- Check system resources (RAM, CPU usage)
Poor transcription accuracy
- Speak more clearly and slowly
- Reduce background noise
- Try a larger ASR model (qwen3-asr-1.7b)
See Also
- Chat — Text-based conversations
- Transcription — Batch audio transcription
- Text-to-Speech — Generate speech from text