Voice Mode

Overview

Voice mode combines:

Speech recognition — Converts your voice to text
AI chat — Processes your message and generates a response
Text-to-speech — Speaks the response back to you

Everything runs locally with no cloud services.
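
The three stages listed above can be pictured as a simple pipeline. The sketch below is only an illustration of that flow: `VoicePipeline`, `transcribe`, `chat`, `synthesize`, and `voice_turn` are hypothetical names, not part of izwi's API, and the placeholder stages pass text around instead of calling real models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Hypothetical three-stage voice pipeline (not izwi's actual API)."""
    transcribe: Callable[[bytes], str]   # speech recognition: audio -> text
    chat: Callable[[str], str]           # AI chat: user text -> reply text
    synthesize: Callable[[str], bytes]   # text-to-speech: reply text -> audio

    def voice_turn(self, audio_in: bytes) -> bytes:
        """Run one spoken exchange through all three stages in order."""
        user_text = self.transcribe(audio_in)
        reply_text = self.chat(user_text)
        return self.synthesize(reply_text)


# Placeholder stages so the sketch runs without any models installed.
pipeline = VoicePipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    chat=lambda text: f"You said: {text}",
    synthesize=lambda text: text.encode("utf-8"),
)

reply_audio = pipeline.voice_turn(b"hello")
print(reply_audio)
```

Each stage is a separate callable, which mirrors the way the settings let you choose the ASR, chat, and TTS models independently.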

Getting Started

Required Models

Download the necessary models:

    # Text-to-speech
    izwi pull Qwen3-TTS-12Hz-0.6B-Base

    # Speech recognition
    izwi pull Qwen3-ASR-0.6B-GGUF

    # Chat
    izwi pull Qwen3-8B-GGUF

    # Optional unified speech model
    izwi pull LFM2.5-Audio-1.5B-GGUF
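
If you prefer to script the downloads, the following is a minimal sketch that runs the same `izwi pull` commands one after another. The helper names (`pull_commands`, `pull_all`) are made up for this example, and the script assumes only that `izwi pull <model>` works as shown above. It uses no other flags or options.

```python
import shutil
import subprocess

# Model names copied from the download commands above.
REQUIRED_MODELS = [
    "Qwen3-TTS-12Hz-0.6B-Base",  # text-to-speech
    "Qwen3-ASR-0.6B-GGUF",       # speech recognition
    "Qwen3-8B-GGUF",             # chat
]
OPTIONAL_MODELS = ["LFM2.5-Audio-1.5B-GGUF"]  # optional unified speech model


def pull_commands(include_optional: bool = False) -> list[list[str]]:
    """Build one `izwi pull <model>` argument list per model."""
    models = REQUIRED_MODELS + (OPTIONAL_MODELS if include_optional else [])
    return [["izwi", "pull", model] for model in models]


def pull_all(include_optional: bool = False) -> None:
    """Run each pull in turn, stopping at the first failure."""
    if shutil.which("izwi") is None:
        print("izwi was not found on PATH; nothing was downloaded.")
        return
    for command in pull_commands(include_optional):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    pull_all()
```

Call `pull_all(include_optional=True)` to fetch the unified speech model as well.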

Start Voice Mode

Start the server:
```
izwi serve
```
Open the web UI:
```
http://localhost:8080/voice
```
Click the microphone button to start speaking
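
If the page does not load, a quick reachability check can tell you whether the server is answering at all. This is a sketch using only the Python standard library. It assumes the server is listening at the address shown above and that a plain GET to `/voice` returns status 200, which this guide does not confirm. `voice_ui_reachable` is a hypothetical helper name.

```python
import http.client
import urllib.error
import urllib.request

VOICE_UI_URL = "http://localhost:8080/voice"  # address given in the steps above


def voice_ui_reachable(url: str = VOICE_UI_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to `url` succeeds with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    if voice_ui_reachable():
        print(f"Voice UI is responding at {VOICE_UI_URL}")
    else:
        print("No response yet; check that `izwi serve` is still running.")
```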

Using Voice Mode

Web UI

Navigate to Voice in the sidebar
Click the microphone button to start recording
Speak your message
Click again to stop recording (or wait for silence auto-detection to stop it for you)
Listen to the AI response

Controls

Control	Action
Microphone	Start/stop recording
Speaker	Mute/unmute assistant responses
Settings	Configure models, playback speed, and speech detection

Configuration

Select Voice

Choose from available voices in the settings panel. Different TTS models offer different voice options.

Select Models

Configure which models to use:

ASR Model — For speech recognition
TTS Model — For speech synthesis
Chat Model — For response generation

Voice Agent Prompt

Use the Voice Agent Prompt section in settings to customize the assistant's speaking style and behavior. Prompt changes are saved locally and apply the next time you start a voice session.

Observational Memory

Use Observational Memory in settings to review, enable, disable, or delete the stable user memories captured from modular voice conversations. You can forget individual memories or clear them all at any time.

Audio Settings

Auto-detect silence — Automatically stop recording when you stop speaking
Playback speed — Adjust response playback speed from 0.75x to 1.75x
Mute output — Silence assistant playback without ending the session
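
To give a feel for what these settings do, here is a small sketch of one common way to detect the end of speech (an energy threshold over short audio frames), plus a clamp for the playback range. It is not izwi's actual algorithm: the threshold, the number of silent frames, and all function names are assumptions chosen for illustration. Only the 0.75x to 1.75x range comes from the settings above.

```python
import math
from typing import Iterable, Optional, Sequence

MIN_SPEED, MAX_SPEED = 0.75, 1.75  # playback range stated in the settings above


def clamp_playback_speed(speed: float) -> float:
    """Keep a requested speed inside the documented 0.75x to 1.75x range."""
    return max(MIN_SPEED, min(MAX_SPEED, speed))


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_stop_frame(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.02,        # assumed level separating speech from silence
    silent_frames_needed: int = 3,  # assumed length of trailing silence
) -> Optional[int]:
    """Return the index of the frame at which recording would stop.

    Recording stops once speech has been heard and then
    `silent_frames_needed` consecutive frames fall below `threshold`.
    Returns None if that never happens.
    """
    heard_speech = False
    silent_run = 0
    for index, frame in enumerate(frames):
        if frame_rms(frame) >= threshold:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= silent_frames_needed:
                return index
    return None
```

Requiring speech before counting silence keeps the recording from stopping during the pause before you start talking, which is why a noisy room (where frames rarely drop below the threshold) can delay the automatic stop.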

Tips for Best Results

Use a good microphone — Built-in laptop mics work but external mics are better
Minimize background noise — Find a quiet environment
Speak clearly — Natural pace, clear pronunciation
Wait for responses — Let the AI finish before speaking again

Keyboard Shortcuts

Shortcut	Action
`Space`	Start or stop the current voice session
`Escape`	Stop the current voice session
`M`	Mute/unmute

Shortcuts are ignored while focus is inside a text field or other editable control.
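
The shortcut rules above can be summarized as a small state model. This sketch is a reading of the table, not code from the web UI. `VoiceState` and `handle_key` are hypothetical names, and the key strings simply reuse the labels from the table.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VoiceState:
    session_active: bool = False
    muted: bool = False


def handle_key(state: VoiceState, key: str, focus_is_editable: bool) -> VoiceState:
    """Return the state after a key press, following the shortcut table."""
    if focus_is_editable:
        return state  # typing in a text field never triggers a shortcut
    if key == "Space":
        return replace(state, session_active=not state.session_active)
    if key == "Escape":
        return replace(state, session_active=False)
    if key == "M":
        return replace(state, muted=not state.muted)
    return state  # any other key leaves the state unchanged
```

Note the difference between the first two shortcuts in this model: `Space` toggles the session, while `Escape` only ever stops it, so pressing `Escape` when no session is running has no effect.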

Troubleshooting

No audio input detected

Check your microphone permissions in system settings
Ensure the correct input device is selected
Test your microphone in another application

Responses are slow

Use smaller models for faster responses
Ensure models are already loaded rather than being loaded on demand
Check system resources (RAM, CPU usage)

Poor transcription accuracy

Speak more clearly and slowly
Reduce background noise
Try a larger ASR model (Qwen3-ASR-1.7B-GGUF)