I use an open-source tool called whisper-ctranslate2 to generate subtitles. It's based on OpenAI Whisper but is much faster, uses less memory, and can generate subtitles with word-level highlighting. I use the following settings:
whisper-ctranslate2 --model large-v3 --output_format vtt --highlight_words=True --device cuda --language en --word_timestamps=True video.mp4
Here the model can be tiny, base, small, medium or large (including versioned variants such as large-v3, as used above). I've had good results with small and above. I normally combine the subtitles with the video file and then upload the result to Google Drive. Most of my previous sprint updates should include subtitles; however, I realised that Google Drive won't display them unless you use the Drive interface to associate a subtitle file with the video, so that's what I did this time.
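A minimal sketch of the transcription step wrapped in a bash function, so the model size becomes a parameter (defaulting to small). This is an illustration, not my actual script. It assumes whisper-ctranslate2 accepts `--output_dir` like OpenAI Whisper's CLI and, like Whisper, names the subtitle file after the input's basename (video.mp4 becomes video.vtt). The names `transcribe` and `subtitle_path` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers around the whisper-ctranslate2 command above.
set -euo pipefail

# subtitle_path VIDEO [OUTDIR]
# Prints where the .vtt is expected to land, assuming the tool names it
# after the input file's basename with the extension swapped.
subtitle_path() {
  local video=$1 outdir=${2:-.}
  local base
  base=$(basename -- "$video")
  printf '%s/%s.vtt\n' "$outdir" "${base%.*}"
}

# transcribe VIDEO [MODEL] [OUTDIR]
# Runs whisper-ctranslate2 with the settings above; MODEL defaults to
# "small". Progress output goes to stderr so stdout holds only the path.
transcribe() {
  local video=$1 model=${2:-small} outdir=${3:-.}
  whisper-ctranslate2 --model "$model" --output_format vtt \
    --highlight_words=True --word_timestamps=True \
    --device cuda --language en --output_dir "$outdir" "$video" >&2
  subtitle_path "$video" "$outdir"
}
```

For example, `vtt=$(transcribe recording.mp4 medium)` leaves the subtitle path in `$vtt`, ready to open in an editor.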
My normal sprint recording process is to use OBS to record the update. I then run a simple script that takes the sprint number and the path to the video file, and:
1. creates the subtitles;
2. opens an editor so I can fix mistakes (like my name, which it always gets wrong);
3. uses mkvmerge to combine the subtitles and video into an MKV file;
4. uses rclone to upload it to the right place;
5. uses rclone again to get the link to the file, which includes its id;
6. inserts that id into the iframe code, so the embed code appears directly in the terminal, and copies it to the clipboard to save even more effort.
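A rough bash sketch of the steps above, not the actual script. It assumes an rclone remote for Google Drive named `drive`, a hypothetical `sprint-updates` folder, `xclip` for the clipboard (`pbcopy` or `wl-copy` would replace it on macOS or Wayland), and that `rclone link` prints a Drive URL of the form `https://drive.google.com/open?id=FILE_ID`. The embed uses Drive's `/file/d/FILE_ID/preview` URL. Function and variable names are made up.

```bash
#!/usr/bin/env bash
# Rough sketch of the sprint-upload pipeline (not the original script).
# Assumes whisper-ctranslate2, mkvmerge, rclone and xclip are installed and
# that an rclone remote called "drive" points at Google Drive.
set -euo pipefail

REMOTE_DIR=${REMOTE_DIR:-drive:sprint-updates}   # hypothetical folder
MODEL=${MODEL:-large-v3}

# Print the Drive file id contained in a link such as
# https://drive.google.com/open?id=FILE_ID or .../file/d/FILE_ID/view
drive_id_from_link() {
  local link=$1
  local re_query='[?&]id=([A-Za-z0-9_-]+)'
  local re_path='/d/([A-Za-z0-9_-]+)'
  if [[ $link =~ $re_query ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  elif [[ $link =~ $re_path ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}

# Print iframe markup embedding the Drive file's preview player.
embed_code() {
  printf '<iframe src="https://drive.google.com/file/d/%s/preview" width="640" height="480" allow="autoplay"></iframe>\n' "$1"
}

main() {
  local sprint=$1 video=$2
  local dir base
  dir=$(dirname -- "$video")
  base=$(basename -- "$video")
  base=${base%.*}
  local vtt="$dir/$base.vtt" mkv="sprint-$sprint.mkv"

  # 1. Generate subtitles next to the video (progress output goes to stderr).
  whisper-ctranslate2 --model "$MODEL" --output_format vtt \
    --highlight_words=True --word_timestamps=True \
    --device cuda --language en --output_dir "$dir" "$video" >&2
  # 2. Open the subtitles for manual fixes (misheard names etc.).
  "${EDITOR:-nano}" "$vtt"
  # 3. Mux video and subtitles into one MKV file.
  mkvmerge -o "$mkv" "$video" "$vtt" >&2
  # 4. Upload, then 5. fetch the shareable link, which contains the file id.
  rclone copy "$mkv" "$REMOTE_DIR"
  local link id code
  link=$(rclone link "$REMOTE_DIR/$mkv")
  id=$(drive_id_from_link "$link")
  # 6. Print the embed code and copy it to the clipboard.
  code=$(embed_code "$id")
  printf '%s\n' "$code"
  printf '%s' "$code" | xclip -selection clipboard
}

# Run only when executed with SPRINT and VIDEO; sourcing the file (for
# example to test the helpers) just defines the functions.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -eq 2 ]]; then
  main "$@"
fi
```

Saved as, say, `sprint-upload.sh`, it would be run as `./sprint-upload.sh 42 ~/Videos/recording.mp4`. Keeping the id extraction and iframe formatting in small functions makes them easy to check without touching Drive.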
I also considered using a locally running LLM via ollama to ingest the transcript and generate a summary like the ones Zoom produces, but that turned out to require too much fiddling for very little benefit, so I gave up.