How to transcribe a Podcast with Whisper on an ARM Mac using Homebrew
Goofing around with podcast transcripts today. Here's what I did to transcribe version 26 of the Breaking Change podcast, after a couple hours of being mad at how hard the Internet was making it:
- Run brew install whisper-cpp, because I'm fucking sick of cloning one-off Python repos.
- Download a model and put it somewhere (I chose ggml-large-v3-turbo-q8_0.bin because it's apparently slower but more accurate than "q5", whatever the hell any of this means).
- Since your podcast is probably an MP3, you'll have to convert it to a WAV file for Whisper. Rather than create an intermediate file we'd have to clean up later, we'll just pipe the conversion from ffmpeg. That bit of the command looks like:
  ffmpeg -i "v26.mp3" -ar 16000 -ac 1 -f wav -
- Next is the actual Whisper command, which requires us to reference both the Metal stuff (which ships with whisper-cpp) as well as our model (which I just put in iCloud Drive so I could safely forget about it). I also set it to output SRT (because I wrote a Ruby gem that converts SRT files to human-readable transcripts) and hint that I'm speaking in English. The trailing - tells it to read the audio from the pipe. That bit of the command looks like this:
  GGML_METAL_PATH_RESOURCES="$(brew --prefix whisper-cpp)/share/whisper-cpp" whisper-cpp --model ~/icloud-drive/dotfiles/models/whisper/ggml-large-v3-turbo-q8_0.bin --output-srt --language en --output-file "v26" -
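
One thing the steps above gloss over: they assume both ffmpeg and the Whisper binary are already on your PATH. Here's a minimal preflight sketch, not part of the script itself. The helper names (have_cmd, first_available) are made up for illustration, and the whisper-cli fallback is an assumption: newer Homebrew builds of whisper-cpp may ship the binary under that name, so check what your install actually provides.

```shell
# Return 0 if the named command is on PATH, non-zero otherwise.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Print the first command from the argument list that exists on PATH.
# Returns 1 (and prints nothing) if none of them are found.
first_available() {
  for candidate in "$@"; do
    if have_cmd "$candidate"; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Warn (without exiting) if a dependency is missing.
have_cmd ffmpeg || echo "ffmpeg not found; try: brew install ffmpeg" >&2

# whisper-cli is an assumed alternate name, not something the steps above mention.
whisper_bin=$(first_available whisper-cpp whisper-cli) ||
  echo "No Whisper binary found; try: brew install whisper-cpp" >&2
```

If a binary is found, $whisper_bin holds its name and can stand in for the hard-coded whisper-cpp in the command.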
Here's the above put together into a brief script I named transcribe-podcast that will just transcribe whatever file you pass to it:
#!/bin/bash

# Check if an input file is provided
if [ -z "$1" ]; then
  echo "Usage: $0 input_audio_file"
  exit 1
fi

input_file="$1"
# Strip the directory and the final extension (path/to/v26.mp3 becomes v26)
base_name=$(basename "$input_file" | sed 's/\.[^.]*$//')

# Convert input audio to 16kHz mono WAV and pipe to whisper-cpp
ffmpeg -i "$input_file" -ar 16000 -ac 1 -f wav - | \
  GGML_METAL_PATH_RESOURCES="$(brew --prefix whisper-cpp)/share/whisper-cpp" \
  whisper-cpp --model ~/icloud-drive/dotfiles/models/whisper/ggml-large-v3-turbo-q8_0.bin \
  --output-srt --language en --output-file "$base_name" -
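
A side note on the base_name line: it shells out to basename and sed, which works fine. The same result can be had with plain shell parameter expansion, sketched below. The function name strip_dir_and_ext is hypothetical, and this is an alternative, not what the script uses. One difference worth knowing: unlike basename, it returns an empty string for a path with a trailing slash.

```shell
# Print a path's file name without its directory or final extension,
# using only parameter expansion (no basename, no sed).
strip_dir_and_ext() {
  name="${1##*/}"               # drop everything up to the last slash
  printf '%s\n' "${name%.*}"    # drop the last dot and whatever follows it
}

strip_dir_and_ext "/podcasts/v26.mp3"   # prints v26
```

Like the sed version, it only removes the last extension, so a name like my.show.v26.mp3 comes out as my.show.v26.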
If you're writing a script like this for yourself, just replace the path passed to the --model flag and you too will be able to do cool stuff like this:
$ transcribe-podcast your-podcast.mp3
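
If editing the script every time the model moves sounds annoying, one option is to let an environment variable override the path. This is only a sketch under assumptions: WHISPER_MODEL and resolve_model are names invented here, not anything whisper-cpp itself reads or provides.

```shell
# Print the model path to use: $WHISPER_MODEL if set, otherwise the default
# passed as the first argument. Fails with a message if the file is missing.
# WHISPER_MODEL is a hypothetical variable name chosen for this sketch.
resolve_model() {
  model="${WHISPER_MODEL:-$1}"
  if [ ! -f "$model" ]; then
    echo "Model not found: $model" >&2
    return 1
  fi
  printf '%s\n' "$model"
}
```

In the script, something like model_path=$(resolve_model "$HOME/path/to/your-model.bin") || exit 1 would then feed --model "$model_path", with the placeholder path swapped for wherever your model lives.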
As for performance, on an M4 Pro with 14 CPU cores, the above three-and-a-half-hour podcast took a bit over 11 minutes. On an M2 Ultra with 24 cores, the same file finished in about 8 minutes. Cool.
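
To put those numbers in terms of "times faster than real time", divide the audio length by the wall-clock time. A rough sketch, treating the podcast as 210 minutes and using the approximate timings above (so the results are ballpark figures, not benchmarks); realtime_factor is a made-up helper name:

```shell
# Speed-up over real time = audio minutes / transcription minutes,
# rounded to the nearest whole number.
realtime_factor() {
  awk -v audio="$1" -v wall="$2" 'BEGIN { printf "%.0f\n", audio / wall }'
}

realtime_factor 210 11   # M4 Pro: prints 19
realtime_factor 210 8    # M2 Ultra: prints 26
```

So very roughly 19x real time on the M4 Pro and 26x on the M2 Ultra.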