justin․searls․co

How to transcribe a Podcast with Whisper on an ARM Mac using Homebrew

Goofing around with podcast transcripts today. Here's what I did to transcribe version 26 of the Breaking Change podcast, after a couple hours of being mad at how hard the Internet was making it:

  1. Run brew install whisper-cpp, because I'm fucking sick of cloning one-off Python repos.
  2. Download a model and put it somewhere (I chose ggml-large-v3-turbo-q8_0.bin because it's apparently slower but more accurate than "q5", whatever the hell any of this means)
  3. Since your podcast is probably an MP3, you'll have to convert it to a WAV file for Whisper. Rather than create an interstitial file we'd have to clean up later, we'll just pipe the conversion from ffmpeg. That bit of the command looks like: ffmpeg -i "v26.mp3" -ar 16000 -ac 1 -f wav -
  4. Next is the actual Whisper command, which requires us to reference both the Metal stuff (which ships with whisper-cpp) as well as our model (which I just put in iCloud Drive so I could safely forget about it). I also set it to output SRT (because I wrote a Ruby gem that converts SRT files to human-readable transcripts) and hint that I'm speaking in English. That bit of the command looks like this: GGML_METAL_PATH_RESOURCES="$(brew --prefix whisper-cpp)/share/whisper-cpp" whisper-cpp --model ~icloud-drive/dotfiles/models/whisper/ggml-large-v3-turbo-q8_0.bin --output-srt --language en --output-file "v26.srt"

Here's the above put together into a brief script I named transcribe-podcast that will just transcribe whatever file you pass to it:

# Check if an input file is provided
if [ -z "$1" ]; then
  echo "Usage: $0 input_audio_file"
  exit 1
fi

input_file="$1"
base_name=$(basename "$input_file" | sed 's/\.[^.]*$//')

# Convert input audio to 16kHz mono WAV and pipe to whisper-cpp
ffmpeg -i "$input_file" -ar 16000 -ac 1 -f wav - | \
  GGML_METAL_PATH_RESOURCES="$(brew --prefix whisper-cpp)/share/whisper-cpp" \
  whisper-cpp --model ~/icloud-drive/dotfiles/models/whisper/ggml-large-v3-turbo-q8_0.bin \
  --output-srt --language en --output-file "$base_name" -

If you're writing a script like this for yourself, just replace the path to the --model flag and you too will be able to do cool stuff like this:

$ transcribe-podcast your-podcast.mp3

As for performance, on an M4 Pro with 14 CPU cores, the above three-and-a-half hour podcast took a bit over 11 minutes. On an M2 Ultra with 24 cores, the same file was finished in about 8 minutes. Cool.


Got a taste for hot, fresh takes?

Then you're in luck, because you can subscribe to this site via RSS or Mastodon! And if that ain't enough, then sign up for my newsletter and I'll send you a usually-pretty-good essay once a month. I also have a solo podcast, because of course I do.