Speaker Diarization: The Feature Nobody Talks About But Everyone Needs
Every transcription tool converts audio to text. That's the baseline.
What most of them skip is telling you who said it.
That gap is speaker diarization. For regulated fields like legal, healthcare, and financial services, knowing who said what isn't a convenience feature. It's the difference between a useful transcript and a compliance liability.
What Speaker Diarization Actually Is
Diarization answers one question: "Who spoke when?"
It takes an audio file with multiple voices and outputs something like this:
[00:00 - 00:15] Speaker 1: Let's discuss the settlement terms.
[00:15 - 00:32] Speaker 2: I've reviewed the draft. We have concerns about clause 4.
[00:32 - 00:48] Speaker 1: Which part specifically?
[00:48 - 01:05] Speaker 2: The indemnification language. It exposes us to...
Instead of a flat wall of text, you get a structured record. You know exactly who said what, when.
The technical process involves:
- Voice activity detection: figuring out when someone is speaking
- Speaker segmentation: splitting speech into chunks at the points where the speaker changes
- Speaker embedding: creating a voice "fingerprint" for each speaker
- Clustering: grouping segments by speaker identity
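The clustering step is the easiest one to show in isolation. The sketch below assumes each segment already has a speaker embedding; in a real system those vectors come from a trained neural model, while the short vectors here are made up for illustration. It uses plain agglomerative clustering on cosine similarity, which is one common approach and not necessarily what any particular tool, Izwi included, does internally. `cluster_speakers`, `threshold`, and `num_speakers` are hypothetical names.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Element-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cluster_speakers(embeddings, num_speakers=None, threshold=0.8):
    """Agglomerative clustering of per-segment speaker embeddings (hypothetical helper).

    Starts with one cluster per segment and repeatedly merges the two clusters
    whose centroids are most similar. If num_speakers is given, merging stops
    when that many clusters remain and the threshold is ignored; otherwise it
    stops when no pair of clusters is at least `threshold` similar.
    Returns one "Speaker N" label per input segment.
    """
    clusters = [[i] for i in range(len(embeddings))]
    while len(clusters) > 1:
        if num_speakers is not None and len(clusters) <= num_speakers:
            break
        cents = [centroid([embeddings[i] for i in c]) for c in clusters]
        best, pair = -2.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = cosine(cents[a], cents[b])
                if sim > best:
                    best, pair = sim, (a, b)
        if num_speakers is None and best < threshold:
            break
        a, b = pair  # b > a, so deleting index b leaves index a valid
        clusters[a].extend(clusters[b])
        del clusters[b]
    # Number speakers in order of their first appearance in the recording.
    clusters.sort(key=min)
    labels = [None] * len(embeddings)
    for k, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = f"Speaker {k}"
    return labels

# Made-up 3-dimensional "embeddings": segments 0 and 2 from one voice, 1 and 3 from another.
embeddings = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.85, 0.15, 0.05], [0.12, 0.88, 0.0]]
print(cluster_speakers(embeddings))  # ['Speaker 1', 'Speaker 2', 'Speaker 1', 'Speaker 2']
```

Passing `num_speakers` swaps the similarity threshold for a fixed stopping point, which is one reason knowing the participant count up front tends to help: the algorithm no longer has to guess when two clusters are "different enough".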
Overlapping voices, background noise, and similar-sounding speakers all degrade accuracy. Good diarization has to hold up on real-world audio, not just studio-quality recordings.
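Background noise hits the very first stage. A minimal energy-threshold voice activity detector makes the failure mode concrete: it flags any frame louder than a fixed cutoff as speech, so a steady hum above that level looks like someone talking. This is a sketch with hypothetical names (`energy_vad`, `frame_ms`, `threshold`) on a synthetic signal; production systems generally use trained VAD models rather than a fixed energy cutoff.

```python
import math

def energy_vad(samples, sample_rate, frame_ms=30, threshold=0.02):
    """Naive energy-based voice activity detection (hypothetical helper).

    samples: mono audio as floats in [-1.0, 1.0].
    Returns (start_sec, end_sec) regions whose per-frame RMS energy is at or
    above `threshold`. Trailing samples that don't fill a whole frame are ignored.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = len(samples) // frame_len
    regions, start = [], None
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        t = f * frame_len / sample_rate
        if rms >= threshold and start is None:
            start = t  # speech region begins
        elif rms < threshold and start is not None:
            regions.append((start, t))  # speech region ends
            start = None
    if start is not None:  # speech runs to the end of the audio
        regions.append((start, n_frames * frame_len / sample_rate))
    return regions

# Synthetic 1-second clip at 1 kHz: silence, a loud "voice", silence.
rate = 1000
clean = [0.0] * 300 + [0.5 if i % 2 == 0 else -0.5 for i in range(500)] + [0.0] * 200
# The same clip with a constant low-level hum added.
noisy = [s + (0.05 if i % 2 == 0 else -0.05) for i, s in enumerate(clean)]

print(energy_vad(clean, rate, frame_ms=100))  # [(0.3, 0.8)]
print(energy_vad(noisy, rate, frame_ms=100))  # [(0.0, 1.0)]
```

The hum alone is enough to make the whole clip look like speech, and those wrong boundaries then flow into segmentation and clustering downstream.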
Why Regulated Industries Think Twice About Cloud Diarization
For law firms, healthcare providers, and financial services firms, where the audio gets processed is a compliance question, not just a technical one.
The Attorney-Client Confidentiality Problem
Cloud transcription sends audio to external servers. That audio is processed, stored (temporarily or permanently), and potentially logged. For attorney-client communications, that creates real risk across two areas: the duty of confidentiality and, depending on the circumstances, privilege itself.
The problem isn't whether a particular provider is trustworthy. Once data leaves the firm's control, a breach, a subpoena, or a misconfigured storage bucket on someone else's infrastructure can expose it, and the firm can no longer fully account for where the data went or who could access it.
ABA Model Rule 1.6(c) requires lawyers to make reasonable efforts to prevent unauthorized disclosure of client information. Sending sensitive call recordings to a third-party cloud service for processing is difficult to square with that obligation, and harder still to document if it's ever challenged.
HIPAA and Medical Conversations
Patient conversations are protected health information (PHI). Under HIPAA, covered entities must have Business Associate Agreements with any vendor handling PHI. When you send audio to a cloud transcription service, you're creating a new vendor relationship with compliance implications.
For large healthcare systems, that's manageable. For private practices, solo practitioners, or therapy groups, it's overhead they don't need. And many patients would rather their medical conversations never touched a cloud server at all.
Financial Services and Compliance
FINRA Rule 4511 and SEC Exchange Act Rule 17a-4 require broker-dealers to retain records of customer communications with specific retention periods, access controls, and audit requirements. Cloud transcription generates records outside the firm's infrastructure: copies the firm may not fully control, may struggle to produce on demand, and may not be able to delete when required.
Local diarization keeps audio and transcripts on-premises, under the firm's custody. Retention, access, and deletion stay within the firm's own compliance workflow.
The Local Advantage
On-device diarization addresses more than privacy. It gives firms direct control over what happens to the data.
When diarization runs locally:
- Audio never leaves your infrastructure
- No third-party data processing to document
- No BAA required for internal tools
- You decide retention, access, and deletion policies
- Works offline: no internet required for sensitive calls
For industries where confidentiality is both an ethical obligation and a competitive advantage, that control matters.
How Izwi Handles Diarization
Izwi runs speaker diarization entirely on-device. No cloud calls, no external API, no third-party processing. Nothing leaves your machine.
Here's what it looks like:
# Process a meeting recording
izwi diarize client-call.wav
Output:
{
  "segments": [
    {"speaker": "Speaker 1", "start": 0.0, "end": 15.2, "text": "Let's discuss the settlement terms."},
    {"speaker": "Speaker 2", "start": 15.5, "end": 32.1, "text": "I've reviewed the draft. We have concerns about clause 4."},
    {"speaker": "Speaker 1", "start": 32.3, "end": 48.0, "text": "Which part specifically?"},
    {"speaker": "Speaker 2", "start": 48.2, "end": 65.8, "text": "The indemnification language. It exposes us to..."}
  ],
  "num_speakers": 2,
  "duration": 120.5
}
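If you need the bracketed plain-text layout from the start of this post instead of JSON, the conversion is short. This is a sketch that assumes only the JSON shape in the sample above; `to_plain_text` and `fmt_time` are hypothetical helper names, not part of Izwi, and Izwi's own plain-text output may be formatted differently.

```python
import json

def fmt_time(seconds):
    """Format seconds as MM:SS, dropping fractional seconds."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"

def to_plain_text(raw_json):
    """Render diarization JSON as '[MM:SS - MM:SS] Speaker: text' lines (hypothetical helper)."""
    data = json.loads(raw_json)
    return "\n".join(
        f"[{fmt_time(seg['start'])} - {fmt_time(seg['end'])}] {seg['speaker']}: {seg['text']}"
        for seg in data["segments"]
    )

sample = """{
  "segments": [
    {"speaker": "Speaker 1", "start": 0.0, "end": 15.2, "text": "Let's discuss the settlement terms."},
    {"speaker": "Speaker 2", "start": 15.5, "end": 32.1, "text": "I've reviewed the draft. We have concerns about clause 4."}
  ],
  "num_speakers": 2,
  "duration": 120.5
}"""
print(to_plain_text(sample))
# [00:00 - 00:15] Speaker 1: Let's discuss the settlement terms.
# [00:15 - 00:32] Speaker 2: I've reviewed the draft. We have concerns about clause 4.
```

Truncating to whole seconds reproduces the timestamps in the earlier example (65.8 s becomes 01:05). To use it on real output, save the JSON from a run and pass its contents to `to_plain_text`.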
A few things we built in:
- Specifying speaker count: if you know there are 3 participants, tell Izwi for better accuracy
- Speaker labeling: rename "Speaker 1" to "Attorney" or "Client" in the output
- Flexible output: JSON for integration, plain text for sharing
- Works offline: process sensitive recordings on an air-gapped machine if needed
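Izwi's built-in labeling is the direct route. If you would rather assign role names later in your own pipeline, for example after someone has confirmed which voice belongs to the attorney, the same relabeling can be applied to the JSON result. A sketch, assuming the JSON shape shown above; `relabel` and the role mapping are hypothetical and not Izwi's API.

```python
import copy

def relabel(diarization, names):
    """Return a copy of a diarization result with speaker labels renamed (hypothetical helper).

    diarization: dict shaped like the sample output ({"segments": [...], ...}).
    names: mapping such as {"Speaker 1": "Attorney"}; labels not in it are kept.
    The input dict is left unmodified.
    """
    result = copy.deepcopy(diarization)
    for seg in result["segments"]:
        seg["speaker"] = names.get(seg["speaker"], seg["speaker"])
    return result

call = {
    "segments": [
        {"speaker": "Speaker 1", "start": 0.0, "end": 15.2, "text": "Let's discuss the settlement terms."},
        {"speaker": "Speaker 2", "start": 15.5, "end": 32.1, "text": "I've reviewed the draft. We have concerns about clause 4."},
        {"speaker": "Speaker 1", "start": 32.3, "end": 48.0, "text": "Which part specifically?"},
    ],
    "num_speakers": 2,
    "duration": 120.5,
}
named = relabel(call, {"Speaker 1": "Attorney", "Speaker 2": "Client"})
print([seg["speaker"] for seg in named["segments"]])  # ['Attorney', 'Client', 'Attorney']
```

Working on a deep copy keeps the original machine-generated labels intact, which is handy if the raw output itself has to be retained as a record.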
The same engine that handles transcription, TTS, and voice chat also handles diarization. One tool. All local.
Before You Send That Audio to the Cloud
If you're transcribing meetings, calls, or interviews in a regulated industry, diarization is useful, but so is knowing where that audio goes during processing.
Law firms, healthcare providers, and financial services firms all operate under rules that make third-party cloud transcription a real compliance concern. Local diarization removes the exposure point: no vendor relationship to document, no third-party breach surface to manage, no third-party records outside your control.
Most transcription tools can't offer that, because the processing happens on the vendor's servers. Izwi is built specifically for it.
Try it. Pull a model. Process a recording. See who said what — without anything leaving your machine.
Try It Today
Download Izwi for free and start building voice-enabled agents. Join thousands of developers who are building privacy-first AI applications.
If you found this useful, consider starring us on GitHub