Dr. Mark Humphrys

School of Computing. Dublin City University.



How to write a Shell script to download videos from YouTube

Here is a great Shell script exercise:
Write a program to download a video from YouTube. In Shell script. From first principles.




Introduction

Ever since YouTube started video hosting, it has never been hard to write a Shell script to download videos from YouTube. It is a nice demo of the power of Shell scripts.

There are lots of YouTube downloaders out there, but it is a good exercise to write one yourself from first principles. You may be surprised that one can be written in Shell.




Usage

Usage:
youtube (url)
e.g.:
youtube "http://www.youtube.com/watch?v=OuSdU8tbcHY"
(which is "Titanic in 5 seconds")
The script finds the FLV file and downloads it to y.flv,
or finds the MP4 file and downloads it to y.mp4.
Many formats may exist.



"Titanic in 5 seconds" is a good video to debug with because it is small.

N.B. If I set this as an exercise: Develop it with any other video, not this one.




General strategy

  1. The URL you see in your browser:
      http://www.youtube.com/watch?v=ID
    is a permanent "home page" for the movie, with comments, related movies, etc.
    It is not the URL of the movie itself.
    However if we look inside the HTML source of this page, we can find the URL of the movie.
    The movie may only exist at that URL for the next few minutes.

  2. The "https://" form of the URL may not work, or may be awkward to handle, when using wget or other command-line tools.
    The solution is simple:
    The first line of your program should convert "https://" to "http://" before proceeding.


Current location of URL of video

The URL of the video (as opposed to the URL of the page) is currently in a section that looks like this:
 <script>var ytplayer = ytplayer || {};ytplayer.config = ....

There are multiple URLs, and they are found like this:

... \u0026url=THEURL\u0026 ...

Note that \u0026 is the Unicode escape for the "&" character.

A URL may be at the start or end of these sections, delimited by quotes. So it could appear like:

... "url=THEURL\u0026 ...
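To see what the encoding means in practice, here is a minimal sketch that turns each \u0026 back into "&". The fragment is made up for illustration (the field names quality and type are placeholders, and THEURL is the placeholder used above); it is not real YouTube output.

```shell
# Made-up fragment using the placeholder THEURL from the text above.
fragment='quality=small\u0026url=THEURL\u0026type=video'

# Replace each literal \u0026 with "&". In a sed replacement, a bare "&"
# means "the matched text", so it is written as "\&" here.
decoded=$(printf '%s\n' "$fragment" | sed 's|\\u0026|\&|g')
echo "$decoded"
```

The recipe below does not need this conversion, because it splits on \u0026 instead, but it shows why the fields look the way they do.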


Recipe that currently works


For 40%

  1. Take in the URL of the page as a command-line argument. Convert https to http using sed as follows:
    
    arg=`echo "$1" | sed "s|https://|http://|"`
    
    

  2. Use wget to get the web page and output to a file:
    
     wget -q -O - "$arg" > y.htm
    
    

  3. Check this works before proceeding.


    N.B. When debugging the rest of the program, please try to work with y.htm without going to fetch the page from YouTube again.
    (This is just in case vast numbers of requests from this class to YouTube in a short space of time cause problems!)
    When you have debugged the program, you can restore it so it fetches the page from YouTube again.


  4. When the above is working: grep the file for "url=".
  5. Check this works before proceeding.

  6. When the above is working: Change all double quotes (") to new lines.
    Pipe the output of the grep above to a tr command like this:

    
    tr '"' '\n' 
    
    

  7. grep the result for "url=" again.
  8. Check this works before proceeding.

  9. When the above is working: Change all "\u0026" to new lines ("\n").
    Pipe the above to a sed command like this:

    
    sed 's|\\u0026|\n|g'
    
    
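    (Note: "\" has special meaning to sed so I "escaped" it as "\\". The "\n" in the replacement is understood by GNU sed; other versions of sed may treat it differently.)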

  10. grep the result for "url=" again.
  11. Check this works before proceeding.

  12. You now have a listing of multiple video URLs. The URLs are a bit messy and we will need to tidy them up.
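Steps 4 to 10 above chain into one pipeline. The following is a minimal sketch, assuming GNU sed. To avoid contacting YouTube, it runs on a made-up stand-in for the page held in a variable called page (a hypothetical name); in the real script the first grep would read y.htm instead.

```shell
# Made-up stand-in for the saved page (not real YouTube output).
page='<script>var ytplayer = {"fmt": "url=http%3A%2F%2Fr1.googlevideo.com%2Fv%3Fitag%3D5,type=flv\u0026quality=small\u0026url=http%3A%2F%2Fr2.googlevideo.com%2Fv%3Fitag%3D18"};</script>
<p>no video address on this line</p>'

# Keep lines with url=, split on double quotes, keep url= lines,
# split on \u0026, and keep url= lines once more.
# In the real script the first command would be: grep 'url=' y.htm
urls=$(printf '%s\n' "$page" |
  grep 'url=' |
  tr '"' '\n' |
  grep 'url=' |
  sed 's|\\u0026|\n|g' |
  grep 'url=')
echo "$urls"
```

On this sample the result is two lines, each starting with url=, and the first still carries a trailing ",type=flv" that the next part tidies up.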


For 60%

  1. When the above is working: grep further for "googlevideo.com".
  2. Check this works before proceeding.

  3. When the above is working: Using sed like above, change "url=" to two new lines followed by "url=". This will show the URLs clearly.
  4. Check this works before proceeding.

  5. When the above is working: Remove the "url=" bits.
    Pipe the above to a sed command like this:
    
    sed 's|^url=||'
    
    
    For the meaning of "^" see string matching / regular expressions.
  6. Check this works before proceeding.

  7. When the above is working: Remove any commas and bits after them.
    Pipe the above to a sed command like this:
    
    sed 's|,.*$||'
    
    
    For the meaning of ".*" and "$" see string matching / regular expressions.
  8. Check this works before proceeding.

  9. When the above is working: You now have a clear listing of multiple video URLs with different "itag" values.
  10. The URLs are percent-encoded, so we will need to decode them.
    The itag values look like itag%3DVALUE, which is the percent-encoded version of itag=VALUE.

  11. Pick one of the URLs to download based on this Guide to itag values (see "Comparison of YouTube media encoding options").
    I suggest one of these:
    • itag=5 (lowest common denominator FLV). Save as file.flv
    • itag=18 (MP4 360). Save as file.mp4
    • itag=22 (MP4 720). Save as file.mp4
    Not all formats always exist.
  12. We pipe the above to a grep for the itag we want.
  13. We now have a single URL, looking something like this:

    http%3A%2F%2Fr11---sn-q0c7dn7r.googlevideo.com%2Fvideoplayback%3Fitag%3D43%26mt%3D1394118075%26expire%3D1394143484%26ratebypass%3Dyes%26signature%3DE62B548B55053599CFA362B4DB2756E0252C0FDA.415E89CAB2BC022F7DDA3D326F45C2F11D6CB68F%26sver%3D3%26id%3Dbf1b99b973e8baa3%26ipbits%3D0%26key%3Dyt5%26ms%3Dau%26sparams%3Did%252Cip%252Cipbits%252Citag%252Cratebypass%252Csource%252Cupn%252Cexpire%26mv%3Dm%26upn%3DvnHynJlopWA%26source%3Dyoutube%26fexp%3D917000%252C922520%252C916623%252C937417%252C937416%252C913434%252C936910%252C936913%252C902907%252C934022%26ip%3D136.206.217.17

  14. Check this works before proceeding.
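The steps in this part can be chained in the same way. This is a sketch under assumptions, not a definitive solution: it assumes GNU sed, it runs on a made-up listing held in a variable called listing (a hypothetical name), and it adds two things the recipe does not spell out. The grep pattern requires the itag value to be followed by %26 or the end of the line, so that 18 does not also match a longer value beginning with 18, and head -n 1 keeps a single URL if several match.

```shell
# Made-up listing in the style of the "For 40%" output (not real YouTube data).
listing='url=http%3A%2F%2Fr1.googlevideo.com%2Fvideoplayback%3Fitag%3D5,type=flv
quality=small,url=http%3A%2F%2Fr2.googlevideo.com%2Fvideoplayback%3Fitag%3D18
url=http%3A%2F%2Fother.example%2Fpage'

itag=18   # the format we want; 18 is MP4 360 in the list above

# Keep googlevideo.com lines, put each url= at the start of its own line,
# strip the url= prefix, cut everything from the first comma onwards,
# then pick the URL with the wanted itag.
encoded=$(printf '%s\n' "$listing" |
  grep 'googlevideo.com' |
  sed 's|url=|\n\nurl=|g' |
  sed 's|^url=||' |
  sed 's|,.*$||' |
  grep -E "itag%3D$itag(%26|\$)" |
  head -n 1)
echo "$encoded"
```

On this sample the pipeline leaves only the r2.googlevideo.com address, still percent-encoded.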


For 100%

  1. When the above is working: We percent-decode the URL.
    This is tricky to do in Shell. So I will give you a Perl script to do this.
    Save the following Perl script to a file and call it "percentdecode":
    #!/usr/bin/perl
     
    use URI::Escape;
     
    my $encodedurl = $ARGV[0];
    
    my $url = uri_unescape($encodedurl);
     
    print "$url\n";
    

  2. It decodes the command-line parameter (ARGV[0]).
    To use it, I suggest you pipe the output of our program so far to a new script that looks like this:

     
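    #!/bin/sh
    # Read the percent-encoded URL from standard input.
    read -r url
    # Decode it. percentdecode must be executable and on your PATH.
    newurl=`percentdecode "$url"`
    echo "$newurl"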
     
    

  3. This should now print a percent-decoded (i.e. normal) URL looking something like this:

    http://r11---sn-q0c7dn7r.googlevideo.com/videoplayback?itag=43&mt=1394118075&expire=1394143484&ratebypass=yes&signature=E62B548B55053599CFA362B4DB2756E0252C0FDA.415E89CAB2BC022F7DDA3D326F45C2F11D6CB68F&sver=3&id=bf1b99b973e8baa3&ipbits=0&key=yt5&ms=au&sparams=id%2Cip%2Cipbits%2Citag%2Cratebypass%2Csource%2Cupn%2Cexpire&mv=m&upn=vnHynJlopWA&source=youtube&fexp=917000%2C922520%2C916623%2C937417%2C937416%2C913434%2C936910%2C936913%2C902907%2C934022&ip=136.206.217.17

    This is the URL of the video.

  4. Check this works before proceeding.

  5. When the above is working: Use wget to fetch this URL and store the output in a file like y.flv or y.mp4.
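For the final step, one possible approach is to choose the file name from the itag in the decoded URL and then call wget. The sketch below is only an illustration: outfile_for and download_video are hypothetical helper names, y.video is a placeholder name for formats not listed above, and the URL shown is made up. Only the file name choice is run here, so nothing is downloaded.

```shell
# Hypothetical helper: choose the output file name from the itag in a
# decoded URL, following the suggestions above (5 -> FLV, 18 and 22 -> MP4).
outfile_for() {
  case "$1" in
    *"itag=5&"*|*"itag=5")   echo "y.flv" ;;
    *"itag=18&"*|*"itag=18") echo "y.mp4" ;;
    *"itag=22&"*|*"itag=22") echo "y.mp4" ;;
    *)                       echo "y.video" ;;  # unknown format: placeholder name
  esac
}

# Hypothetical wrapper for the final step: fetch the decoded URL into that file.
download_video() {
  wget -q -O "$(outfile_for "$1")" "$1"
}

# Made-up decoded URL, used only to show which file name would be chosen.
outfile_for "http://r2.googlevideo.com/videoplayback?itag=18&expire=1394143484"
```

In the finished script the last line would instead be something like download_video "$newurl", where newurl holds the decoded URL from the previous step.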





Finish




The file is now on your disk and can be played in a local player.


