izwi diarize

Speaker diarization: identify and separate multiple speakers in audio.


Synopsis

izwi diarize <FILE> [OPTIONS]

Description

Analyzes an audio file to determine which speaker is talking and when, and labels each segment with a speaker. With --transcribe, the output also includes the transcribed text for each speaker-labeled segment.


Arguments

Argument    Description
<FILE>      Audio file to analyze

Options

Option                    Description                                  Default
-m, --model <MODEL>       Diarization model                            sortformer-4spk
-n, --num-speakers <N>    Expected number of speakers                  Auto-detect
-f, --format <FORMAT>     Output format: text, json, verbose_json      text
-o, --output <PATH>       Output file (default: stdout)
--transcribe              Include transcription with speaker labels
--asr-model <MODEL>       ASR model for transcription                  qwen3-asr-0.6b

Examples

Basic diarization

izwi diarize meeting.wav

With known speaker count

izwi diarize meeting.wav --num-speakers 3

With transcription

izwi diarize meeting.wav --transcribe

JSON output

izwi diarize meeting.wav --format json --output diarization.json

Full pipeline with custom models

izwi diarize interview.wav \
  --transcribe \
  --asr-model qwen3-asr-1.7b \
  --format verbose_json \
  --output interview_transcript.json
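
Scripted invocation (sketch)

The same invocation can be assembled from a script. The following is a minimal sketch, not part of izwi itself: the helper names build_diarize_command and run_diarize are hypothetical, and it only uses the flags listed in the Options table. Running the command assumes the izwi binary is on PATH.

```python
import shlex
import subprocess
from typing import Optional

# Output formats listed in the Options table.
KNOWN_FORMATS = ("text", "json", "verbose_json")


def build_diarize_command(
    audio_path: str,
    *,
    num_speakers: Optional[int] = None,
    transcribe: bool = False,
    asr_model: Optional[str] = None,
    output_format: Optional[str] = None,
    output_path: Optional[str] = None,
) -> list[str]:
    """Build an argv list for `izwi diarize` (hypothetical helper)."""
    cmd = ["izwi", "diarize", audio_path]
    if num_speakers is not None:
        cmd += ["--num-speakers", str(num_speakers)]
    if transcribe:
        cmd.append("--transcribe")
    if asr_model is not None:
        cmd += ["--asr-model", asr_model]
    if output_format is not None:
        if output_format not in KNOWN_FORMATS:
            raise ValueError(f"unknown format: {output_format!r}")
        cmd += ["--format", output_format]
    if output_path is not None:
        cmd += ["--output", output_path]
    return cmd


def run_diarize(cmd: list[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)


cmd = build_diarize_command(
    "interview.wav",
    transcribe=True,
    asr_model="qwen3-asr-1.7b",
    output_format="verbose_json",
    output_path="interview_transcript.json",
)
# Print the shell-quoted command instead of running it.
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.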

Output Formats

Text

[00:00 - 00:05] Speaker 1: Welcome to the meeting.
[00:05 - 00:12] Speaker 2: Thanks for having me.
[00:12 - 00:20] Speaker 1: Let's start with the agenda.

JSON

{
  "segments": [
    {"speaker": "Speaker 1", "start": 0.0, "end": 5.2},
    {"speaker": "Speaker 2", "start": 5.5, "end": 12.1}
  ],
  "num_speakers": 2
}
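
Reading the JSON output (sketch)

As an illustration of consuming this format, the sketch below sums talk time per speaker. It assumes start and end are offsets in seconds, which the example values suggest but this page does not state. The sample is copied from the example above; in practice it would be read from the file given to --output. The function name talk_time is hypothetical.

```python
import json
from collections import defaultdict

# Sample copied from the JSON example; normally loaded from the --output file.
SAMPLE = """
{
  "segments": [
    {"speaker": "Speaker 1", "start": 0.0, "end": 5.2},
    {"speaker": "Speaker 2", "start": 5.5, "end": 12.1}
  ],
  "num_speakers": 2
}
"""


def talk_time(result: dict) -> dict[str, float]:
    """Sum segment durations per speaker label (assumed to be seconds)."""
    totals: dict[str, float] = defaultdict(float)
    for seg in result["segments"]:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


totals = talk_time(json.loads(SAMPLE))
for speaker, seconds in sorted(totals.items()):
    print(f"{speaker}: {seconds:.1f} s")
```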

Verbose JSON (with transcription)

{
  "segments": [
    {
      "speaker": "Speaker 1",
      "start": 0.0,
      "end": 5.2,
      "text": "Welcome to the meeting."
    },
    {
      "speaker": "Speaker 2",
      "start": 5.5,
      "end": 12.1,
      "text": "Thanks for having me."
    }
  ],
  "num_speakers": 2,
  "duration": 120.5
}
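
Rendering verbose JSON as text (sketch)

The verbose JSON carries the same information as the text format, so one can be derived from the other. The sketch below is an illustration under assumptions: timestamps are taken to be seconds and are truncated to whole seconds, which matches the sample values shown on this page but is not confirmed as izwi's own rounding rule. The function names mmss and render_text are hypothetical.

```python
import json

# Sample copied from the verbose JSON example above.
SAMPLE = """
{
  "segments": [
    {"speaker": "Speaker 1", "start": 0.0, "end": 5.2,
     "text": "Welcome to the meeting."},
    {"speaker": "Speaker 2", "start": 5.5, "end": 12.1,
     "text": "Thanks for having me."}
  ],
  "num_speakers": 2,
  "duration": 120.5
}
"""


def mmss(seconds: float) -> str:
    """Format seconds as MM:SS, truncating the fractional part."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def render_text(result: dict) -> list[str]:
    """Render each segment as '[MM:SS - MM:SS] Speaker: text'."""
    lines = []
    for seg in result["segments"]:
        stamp = f"[{mmss(seg['start'])} - {mmss(seg['end'])}]"
        # "text" is only present when transcription was requested.
        text = seg.get("text", "")
        lines.append(f"{stamp} {seg['speaker']}: {text}".rstrip())
    return lines


for line in render_text(json.loads(SAMPLE)):
    print(line)
```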

Available Models

Model              Description
sortformer-4spk    Streaming Sortformer, up to 4 speakers
