izwi align
Forced alignment — align text to audio at word level.
Synopsis
izwi align <FILE> <TEXT> [OPTIONS]
Description
Aligns reference text to audio, producing word-level timestamps. Useful for:
- Subtitle generation
- Karaoke timing
- Audio editing
- Pronunciation analysis
Arguments
| Argument | Description |
|---|---|
| `<FILE>` | Audio file to align |
| `<TEXT>` | Reference text to align |
Options
| Option | Description | Default |
|---|---|---|
| `-m, --model <MODEL>` | Alignment model | `qwen3-forcedaligner-0.6b` |
| `-f, --format <FORMAT>` | Output format: `text`, `json`, `verbose_json` | `json` |
| `-o, --output <PATH>` | Output file (default: stdout) | — |
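For scripting, the arguments and options in the tables above map directly onto an argument list. The Python sketch below shows one way to assemble and run the command. `build_align_command` and `run_align` are hypothetical helper names, not part of izwi; the sketch assumes `izwi` is on `PATH` and that JSON is written to stdout when `--output` is omitted, as the options table suggests.

```python
import json
import subprocess


def build_align_command(audio_path, text, model=None, fmt=None, output=None):
    """Build the argv list for `izwi align` using only the documented options."""
    cmd = ["izwi", "align", audio_path, text]
    if model is not None:
        cmd += ["--model", model]
    if fmt is not None:
        cmd += ["--format", fmt]
    if output is not None:
        cmd += ["--output", output]
    return cmd


def run_align(audio_path, text, model=None):
    """Run `izwi align` with JSON output on stdout and return the parsed result.

    Assumption: the command exits non-zero on failure, so check=True raises.
    """
    cmd = build_align_command(audio_path, text, model=model, fmt="json")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

Passing the reference text as a single list element avoids shell quoting problems with long or punctuated scripts.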
Examples
Basic alignment
izwi align audio.wav "Hello world, this is a test."
Save to file
izwi align audio.wav "Hello world" --output alignment.json
Text output
izwi align audio.wav "Hello world" --format text
Output Formats
JSON (default)
{
  "alignments": [
    {"word": "Hello", "start": 0.0, "end": 0.45},
    {"word": "world", "start": 0.50, "end": 0.95},
    {"word": "this", "start": 1.10, "end": 1.30},
    {"word": "is", "start": 1.35, "end": 1.45},
    {"word": "a", "start": 1.50, "end": 1.55},
    {"word": "test", "start": 1.60, "end": 2.00}
  ],
  "duration": 2.0
}
Text
Hello 0.00 - 0.45
world 0.50 - 0.95
this 1.10 - 1.30
is 1.35 - 1.45
a 1.50 - 1.55
test 1.60 - 2.00
Use Cases
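Most of the use cases in this section start from the JSON output format. As a minimal sketch (not part of izwi itself), the Python below turns that JSON into SubRip (SRT) subtitle cues. The function names and the fixed `words_per_cue` grouping rule are illustrative assumptions; the only thing taken from this page is the `alignments` / `word` / `start` / `end` structure.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def alignments_to_srt(alignment, words_per_cue=3):
    """Group consecutive words into fixed-size cues and render them as SRT text."""
    words = alignment["alignments"]
    cues = []
    for i in range(0, len(words), words_per_cue):
        group = words[i:i + words_per_cue]
        text = " ".join(w["word"] for w in group)
        start = srt_timestamp(group[0]["start"])  # cue starts with its first word
        end = srt_timestamp(group[-1]["end"])     # cue ends with its last word
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)
```

A real subtitle pipeline would more likely split cues at sentence boundaries or by maximum line length; fixed-size groups keep the sketch short.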
Subtitle Generation
Generate precise timestamps for subtitles:
izwi align video_audio.wav "$(cat script.txt)" --output subtitles.json
Audio Editing
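For editing, the word-level timestamps in the JSON output can be searched for a phrase to get candidate cut points. The sketch below is one way to do that; `find_phrase` is a hypothetical helper, and the case-insensitive, punctuation-stripping comparison is an assumption about how the aligned words should be matched.

```python
def find_phrase(alignment, phrase):
    """Return a (start, end) pair in seconds for each occurrence of `phrase`."""

    def norm(token):
        # Compare case-insensitively and ignore surrounding punctuation.
        return token.lower().strip(".,!?;:\"'")

    target = [norm(t) for t in phrase.split()]
    if not target:
        return []
    words = alignment["alignments"]
    spans = []
    for i in range(len(words) - len(target) + 1):
        window = words[i:i + len(target)]
        if [norm(w["word"]) for w in window] == target:
            # The span runs from the first word's start to the last word's end.
            spans.append((window[0]["start"], window[-1]["end"]))
    return spans
```

Each returned pair marks the boundaries of one match, which can then be passed to whatever audio tool performs the cut.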
Find exact word boundaries for editing:
izwi align podcast.wav "um actually" --format json
Pronunciation Analysis
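One simple form of timing analysis is to compute each word's duration and the pause that precedes it. The sketch below works on the plain `json` format documented on this page, since the fields of `verbose_json` are not described here; `timing_report` is a hypothetical helper name, and rounding to milliseconds is an arbitrary choice.

```python
def timing_report(alignment):
    """Return per-word duration and the pause before each word, in seconds."""
    words = alignment["alignments"]
    report = []
    previous = None
    for current in words:
        duration = round(current["end"] - current["start"], 3)
        # The first word has no preceding word, so its pause is None.
        pause = None if previous is None else round(current["start"] - previous["end"], 3)
        report.append({"word": current["word"], "duration": duration, "pause_before": pause})
        previous = current
    return report
```

Unusually long durations or pauses in the report can point to hesitations or slow articulation worth reviewing by ear.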
Analyze timing of spoken words:
izwi align recording.wav "The quick brown fox" --format verbose_json
Available Models
| Model | Description |
|---|---|
| `qwen3-forcedaligner-0.6b` | Qwen3-based forced aligner |
See Also
- izwi transcribe — Speech-to-text
- izwi diarize — Speaker diarization