Sunday, Dec 15, 2024

How to transcribe a Podcast with Whisper on an ARM Mac using Homebrew

Goofing around with podcast transcripts today. Here's what I did to transcribe version 26 of the Breaking Change podcast, after a couple hours of being mad at how hard the Internet was making it:

Run brew install whisper-cpp, because I'm fucking sick of cloning one-off Python repos.
Download a model and put it somewhere (I chose ggml-large-v3-turbo-q8_0.bin because it's apparently slower but more accurate than "q5", whatever the hell any of this means)
Since your podcast is probably an MP3, you'll have to convert it to a WAV file for Whisper. Rather than create an interstitial file we'd have to clean up later, we'll just pipe the conversion from ffmpeg. That bit of the command looks like: ffmpeg -i "v26.mp3" -ar 16000 -ac 1 -f wav -
Next is the actual Whisper command, which requires us to reference both the Metal stuff (which ships with whisper-cpp) as well as our model (which I just put in iCloud Drive so I could safely forget about it). I also set it to output SRT (because I wrote a Ruby gem that converts SRT files to human-readable transcripts) and hint that I'm speaking in English. That bit of the command looks like this: GGML_METAL_PATH_RESOURCES="$(brew --prefix whisper-cpp)/share/whisper-cpp" whisper-cpp --model ~icloud-drive/dotfiles/models/whisper/ggml-large-v3-turbo-q8_0.bin --output-srt --language en --output-file "v26.srt"

Here's the above put together into a brief script I named transcribe-podcast that will just transcribe whatever file you pass to it:

# Check if an input file is provided
if [ -z "$1" ]; then
  echo "Usage: $0 input_audio_file"
  exit 1
fi

input_file="$1"
base_name=$(basename "$input_file" | sed 's/\.[^.]*$//')

# Convert input audio to 16kHz mono WAV and pipe to whisper-cpp
ffmpeg -i "$input_file" -ar 16000 -ac 1 -f wav - | \
  GGML_METAL_PATH_RESOURCES="$(brew --prefix whisper-cpp)/share/whisper-cpp" \
  whisper-cpp --model ~/icloud-drive/dotfiles/models/whisper/ggml-large-v3-turbo-q8_0.bin \
  --output-srt --language en --output-file "$base_name" -

If you're writing a script like this for yourself, just replace the path to the --model flag and you too will be able to do cool stuff like this:

$ transcribe-podcast your-podcast.mp3

As for performance, on an M4 Pro with 14 CPU cores, the above three-and-a-half hour podcast took a bit over 11 minutes. On an M2 Ultra with 24 cores, the same file was finished in about 8 minutes. Cool.

How to transcribe a Podcast with Whisper on an ARM Mac using Homebrew

Got a taste for hot, fresh takes?