Dr. Mark Humphrys

School of Computing. Dublin City University.


How to write a Shell script to download videos from YouTube

Here is a Shell script exercise to test yourself:
Write a program to download a video from YouTube. In Shell script. From first principles.


Ever since YouTube started video hosting in 2005, it has been possible to write a Shell script to download a video from YouTube. It is a nice demo of the power of Shell scripts.

There are lots of YouTube downloaders out there, but it is a good exercise to write one yourself from first principles. You may be surprised that one can be written in Shell.


Usage will be like:
 youtube (url) 
For example:
 youtube https://www.youtube.com/watch?v=rQZfCd9BOJE  

Find the MP4 file. Download it to   y.mp4
Or find the FLV file. Download it to   y.flv
Many formats may exist.



General strategy

  1. The URL you see in your browser:
    is a permanent "home page" for the movie, with comments, related movies, etc.
    It is not the URL of the movie itself.
    However, if we look inside the HTML source of this page, we can find the URL of the movie.
    The movie may only exist at that URL for the next few minutes.

  2. The "https://" form of the page seems to cause problems in my analysis, compared to the "http://" form of the page.
    The solution is simple:
    The start of your program should convert "https://" to "http://" before proceeding.

Current location of URL of video

The URL of the video (as opposed to the URL of the page) is currently in a section that looks like this:
 <script> var ytplayer = ....

There are multiple URLs, and they are found like this:

... url=THEURL ...

THEURL contains googlevideo.com and is percent-encoded.
The URLs are delimited by any or all of the following:

  1. comma
  2. double quotes
  3. ampersand (in fact its escaped Unicode value, which is "\u0026")
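As a quick local check of the three delimiters, here is the splitting applied to a made-up fragment (the sample string is invented for illustration, not real YouTube data):

```shell
# Invented fragment in the style described above.
sample='x"url=http%3A%2F%2Fr1.googlevideo.com%2Fvideoplayback%3Fitag%3D18\u0026quality=medium,y'

# Quotes and commas become newlines via tr; the literal \u0026 via sed
# (the backslash is escaped because "\" is special to sed).
printf '%s\n' "$sample" | tr '",' '\n\n' | sed 's|\\u0026|\n|g' | grep 'url='
```

Only the url= piece survives the final grep.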

Recipe that currently works

For 40%

  1. Take in the URL of the page as a command-line argument. Convert https to http using sed as follows:
    arg=`echo "$1" | sed "s|https://|http://|"`

  2. Use wget to get the web page and output to a file:
     wget -q -O - "$arg" > y.htm

  3. Check this works before proceeding.

    N.B. When debugging the rest of the program, just work with y.htm without going to fetch the page from YouTube again.
    (This is just in case large numbers of requests from this class to YouTube in a short time cause problems.)
    When you have debugged the program, you can fix it so it fetches the page from YouTube again.

  4. When the above is working: grep the file for "url=".
  5. Check this works before proceeding.

  6. When the above is working: Change all double quotes (") to new lines.
    Pipe the output of the grep above to a tr command like this:

    tr '"' '\n' 

  7. grep the result for "url=" again.
  8. Check this works before proceeding.

  9. When the above is working: Change all commas to new lines using tr similar to the above.
  10. grep the result for "url=" again.
  11. Check this works before proceeding.

  12. When the above is working: Change all "\u0026" to new lines ("\n").
    Pipe the above to a sed command like this:

    sed 's|\\u0026|\n|g'
    (Note: "\" has special meaning to sed so I "escaped" it.)

  13. grep the result for "url=" again.
  14. Check this works before proceeding.

  15. When the above is working: grep further for "googlevideo.com".
  16. Check this works before proceeding.

  17. You now have a listing of multiple video URLs. The URLs are a bit messy and we will need to tidy them up. They look like this:

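The whole 40% chain can be dry-run on an invented one-line stand-in for y.htm (made-up data; the real y.htm comes from the wget step above):

```shell
#!/bin/sh
# Write an invented stand-in for y.htm: two encoded URLs, itags 18 and 22.
printf '%s\n' '"url=http%3A%2F%2Fr1.googlevideo.com%2Fvideoplayback%3Fitag%3D18\u0026a=b,url=http%3A%2F%2Fr1.googlevideo.com%2Fvideoplayback%3Fitag%3D22\u0026a=b"' > y.htm

# Steps 4-16: grep for url=, split on the three delimiters, grep again.
grep 'url=' y.htm | tr '"' '\n' | tr ',' '\n' |
  sed 's|\\u0026|\n|g' | grep 'url=' | grep 'googlevideo.com'
```

This prints the two candidate URLs, one per line, still percent-encoded.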

For 60%

  1. When the above is working: Remove the "url=" bits.
    Pipe the above to a sed command like this:
    sed 's|^url=|\n|'
    For the meaning of "^" see string matching / regular expressions.
  2. Check this works before proceeding.

  3. When the above is working:
  4. You now have a listing of multiple video URLs with different "itag" values.
    The URLs are percent-encoded so we will need to decode them.
    The itag values look like itag%3DVALUE, which is the percent-encoded version of itag=VALUE.

  5. Pick one of the URLs to download based on this Guide to itag values (see "Comparison of YouTube media encoding options").

    I suggest this one:
    • itag=18 (MP4 360). Save as file.mp4
    Alternatives include:
    • itag=5 (FLV). Save as file.flv
    • itag=22 (MP4 720). Save as file.mp4
    Not all formats always exist.

  6. We pipe the above to a grep for the itag we want.
  7. We now have a single URL, looking something like this:


  8. Check this works before proceeding.
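The 60% steps can likewise be dry-run on the two URLs the previous stage would produce (again invented data):

```shell
#!/bin/sh
# Strip the url= prefix and keep only the itag 18 (MP4 360) line.
printf 'url=%s\nurl=%s\n' \
  'http%3A%2F%2Fr1.googlevideo.com%2Fvideoplayback%3Fitag%3D18' \
  'http%3A%2F%2Fr1.googlevideo.com%2Fvideoplayback%3Fitag%3D22' |
  sed 's|^url=|\n|' | grep 'itag%3D18'
```

One URL remains, still percent-encoded, ready for the 100% stage.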

For 100%

  1. When the above is working: We percent-decode the URL.
    This is tricky to do in Shell. So I will give you a Perl script to do this.
    Save the following Perl script to a file, call it "percentdecode", and make it executable (chmod +x percentdecode):
    #!/usr/bin/perl
    # Percent-decode the first command-line argument.
    use URI::Escape;
    my $encodedurl = $ARGV[0];
    my $url = uri_unescape($encodedurl);
    print "$url\n";

  2. It decodes the command-line parameter (ARGV[0]).
    To use it, I suggest you pipe the output of our program so far to a new script that looks like this:

    read url
    newurl=`percentdecode "$url"`
    echo "$newurl"

  3. This should now print a percent-decoded (i.e. normal) URL looking something like this:


    This is the URL of the video.

  4. Check this works before proceeding.

  5. When the above is working: Use wget to fetch this URL and store the output in a file like   y.flv or   y.mp4
  6. I thought we might need to convert it to "http://" but "https://" seems to work fine.
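If the URI::Escape Perl module happens not to be installed, a rough shell-only stand-in can be sketched with sed, decoding only the escapes that actually occur in these URLs (%3A %2F %3F %3D %26). This is an illustration only; the Perl decoder is the robust one:

```shell
#!/bin/sh
# Crude percent-decoder: handles only a fixed set of escapes.
# "&" must be escaped in the sed replacement since it is special there.
printf '%s\n' 'http%3A%2F%2Fr1.googlevideo.com%2Fvideoplayback%3Fitag%3D18' |
  sed -e 's|%3A|:|g' -e 's|%2F|/|g' -e 's|%3F|?|g' \
      -e 's|%3D|=|g' -e 's|%26|\&|g'
```

The sample URL here is invented; the real one comes out of the pipeline above.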

Play video

Video can be played in various ways, depending on installation:
  1. In browser.
    Use file://
    Or put video in web directory and use http://

  2. "Videos"

  3. VLC.
    Might have to change: Tools - Preferences - Audio - Output Type - UNIX OSS audio

  4. RealPlayer

The script can launch the player automatically:
vlc y.mp4 &
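Putting all the stages together, the finished youtube script might look something like this. It is shown as a shell function so that nothing runs until it is invoked; the body is exactly what the script file would contain. It assumes the page layout described above and the "percentdecode" Perl script on the PATH, and YouTube changes its page format regularly, so treat it as a sketch:

```shell
#!/bin/sh
# Sketch of the complete downloader, assembled from the steps above.
youtube() {
  # Convert https to http (see General strategy).
  arg=`echo "$1" | sed "s|https://|http://|"`

  # Fetch the page.
  wget -q -O - "$arg" > y.htm

  # Filter down to a single percent-encoded URL for itag 18 (MP4 360).
  grep 'url=' y.htm | tr '"' '\n' | tr ',' '\n' |
    sed 's|\\u0026|\n|g' | grep 'url=' | grep 'googlevideo.com' |
    sed 's|^url=||' | grep 'itag%3D18' | head -1 > y.url

  # Decode and download.
  url=`cat y.url`
  newurl=`percentdecode "$url"`
  wget -q -O y.mp4 "$newurl"

  # Optionally launch the player:
  # vlc y.mp4 &
}
```

Usage is as planned at the top: youtube https://www.youtube.com/watch?v=rQZfCd9BOJE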


