YouTube video summarizer script

One of the best uses for ChatGPT (as of July 2023) is to summarize content. By configuring the API calls to disable randomness, it becomes an incredible tool to compact huge chunks of text. Now, combine that with my dislike for long YouTube videos (almost always artificially extended to get more Ad impressions), and a recent discovery of a YouTube transcript (subtitles) API (this one)... and you'll get where my thinking went: to build a script that fetched the English transcript, and summarized it for me.

ChatGPT 3.5 provides a 16k tokens model, which means that, if the video is not too chatty, you can fit a decent amount of minutes in length. I've set a default of 20, but at times even 15 minutes generate too many tokens, and other times a ~35 min long video fits perfectly, so your mileage may vary (I added an optional argument to override the default).

My prompt is probably not perfect, but works nicely alongside a temperature=0 setup, and I force it to be of a minimum length because if not at times it was way too brief (succinct maybe? because was still good):

Generate a summary of at minimum {SUMMARY_WORDS} words from the content below, delimited by triple @ symbols.

Content: @@@{transcript}@@@

I just pull the english transcript, remove any non-speech lines ([an eerie sound] and the like), join then by new lines, and good to go to feed the LLM. It also showcases how good these models are ingesting unstructured content, because there are almost no punctuation marks, making it hard to read by us humans.

For the moment, the script leaves both the english transcript and the generated summary as .txt files for debugging purposes, but I might change that, or make it also a toggle.

Anyway, the (small) source code can be found at my GitHub. In the folder containing that file there's also a config.py.sample where you should add your OpenAI API key, and rename it to config.py.

Tags: Development

YouTube video summarizer script published @ . Author: