How to write a Shell script to download videos from YouTube
Here is a challenging Shell script exercise.
Write a program to download a video from YouTube.
In Shell script.
From first principles.
Ever since YouTube started video hosting in 2005
it has been possible to write
a Shell script to download a video from YouTube.
As at 2020, it is still possible.
It is a nice demo of the power of Shell scripts.
There are lots of existing programs that do this,
but it is a good exercise
to write one yourself
from first principles.
You may be surprised that one can
be written in Shell.
Usage will be like:
Download it to y.mp4
Many formats may exist.
URL of the page
The URL you see in your browser:
is a permanent "home page" for the movie, with comments, related movies, etc.
It is not the URL of the movie itself.
However if we look inside the
HTML source of this page, we can find the URL of the movie.
The movie may only exist at that URL for the next few minutes.
URL of the video
Where is the URL of the actual video
(as opposed to the URL of the page)?
In fact there are multiple video URLs for different formats.
They are buried deep inside the source code.
They look like this:
They are delimited by double quotes.
Fixing the URL
The URL of the video looks a bit strange.
It seems like we need to make some changes to it:
- "https:\/\/" looks like it should be changed to "https://"
- "\/" looks like it should be changed to "/"
- If you are familiar with how URLs pass arguments,
"\\u0026" looks like it should be changed to an ampersand
(hexadecimal 26 is the character code of "&").
In fact, that is pretty much it.
Do these fixes and you can in fact fetch the video.
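The fixes listed above can be sketched as a small sed filter. This is only a minimal sketch: the function name fix_url and the sample URL are made up for illustration, and the real escaping on the page may vary.

```shell
#!/bin/sh
# fix_url: read escaped URLs on stdin, write cleaned-up URLs on stdout.
# (fix_url is a hypothetical helper name, not part of any tool.)
fix_url() {
    # 1. Turn the escaped "u0026" sequence into "&". In a sed replacement,
    #    "\&" means a literal ampersand (a bare "&" means "the matched text").
    # 2. Delete every remaining backslash, so "\/" becomes "/". This also
    #    mops up a second backslash if the page has "\\u0026".
    sed -e 's/\\u0026/\&/g' -e 's/\\//g'
}

# Example with a made-up URL:
printf '%s\n' 'https:\/\/host.example\/path?id=1\u0026itag=18' | fix_url
# prints: https://host.example/path?id=1&itag=18
```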
Recipe that currently works
This recipe works as at 2020:
- Take in the URL of the page as a command-line argument.
- Use wget to get the web page.
Get the web page once.
Save to a file.
Then when debugging, use that file,
without going to fetch the page from YouTube again.
When you have debugged the program, you can fix it so it always fetches the page.
- Use sed to put a new line in front of every "http".
This is to isolate the "http" lines.
- Use sed to change all double quotes (") to new lines.
- grep for "googlevideo" again.
- grep for
Now we have a short list of URLs.
But the URLs need some editing.
- Use sed to change all "\\u0026" to "&".
Top tip: "&" means something to sed (in the replacement it stands for the matched text).
So use "\&" which means "literally the ampersand character".
Also notice that the first pattern (the pattern to search for) contains "\", which has special meaning to sed.
You can either fix that now, or note that it will be fixed by the next step.
See what the URLs look like now. Do they look normal yet?
- Use sed to delete the "\" character.
OK now do the URLs look normal?
- You now have a listing of multiple video URLs
with different "itag" values.
The itag values look like
Pick the URL to download based on this guide to itags:
"YouTube video stream format codes".
I suggest this one:
- itag=18 (MP4 360).
Save as file.mp4
Not all formats always exist.
- itag=5 (FLV).
Save as file.flv
- itag=22 (MP4 720).
Save as file.mp4
- We pipe the above to a grep for the itag we want.
- We now have a single URL, that looks something like this:
- wget the URL to output to a file like file.mp4 and you are done!
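Putting the steps above together, the recipe might be sketched as follows. This is a sketch under assumptions, not the definitive script: the function names and the page.html cache file are chosen for illustration, tr is used for the double quotes simply because it is short (sed would work too), and the layout of YouTube's page may not match what is assumed here.

```shell
#!/bin/sh
# Sketch of the recipe. Function and file names are illustrative only.

# get_page URL FILE: fetch the page once; reuse FILE on later runs.
get_page() {
    if [ ! -s "$2" ]; then
        wget -q -O "$2" "$1"
    fi
}

# list_video_urls FILE: print the cleaned-up video URLs, one per line.
list_video_urls() {
    # New line in front of every "http" (backslash-newline is the
    # portable way to put a new line in a sed replacement).
    sed 's/http/\
http/g' "$1" |
        tr '"' '\n' |
        grep 'googlevideo' |
        sed -e 's/\\u0026/\&/g' -e 's/\\//g'
}

# pick_itag N: keep the first URL on stdin whose itag is exactly N
# (so that itag=18 does not also match itag=180).
pick_itag() {
    grep -E "[?&]itag=$1(&|\$)" | head -n 1
}

# Usage, assuming the page URL is in "$1":
#   get_page "$1" page.html
#   url=$(list_video_urls page.html | pick_itag 18)
#   wget -O file.mp4 "$url"
```

While debugging, leave page.html in place so the page is not fetched again; delete it (or drop the test in get_page) once the script works.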
Video can be played in various ways, depending on installation:
- In browser.
Or put the video in a web directory and open it through an http:// URL.
Might have to change:
Tools - Preferences - Audio - Output Type - UNIX OSS audio
The script can launch the player automatically:
vlc file &
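If the player might not be installed, the launch can be guarded. A small sketch, assuming vlc is the player as in the line above; the function name play_video and its optional second argument are hypothetical.

```shell
#!/bin/sh
# play_video FILE [PLAYER]: start PLAYER (default: vlc) on FILE in the
# background, but only if PLAYER is installed.
# (play_video is an illustrative name, not a standard command.)
play_video() {
    player=${2:-vlc}
    if command -v "$player" >/dev/null 2>&1; then
        "$player" "$1" &
    else
        echo "$player not found; play $1 by hand" >&2
        return 1
    fi
}

# Usage: play_video file.mp4
```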