Dr. Mark Humphrys

School of Computing. Dublin City University.



How to write a Shell script to download videos from YouTube

Here is a great Shell script exercise:
Write a program to download a video from YouTube. In Shell script. From first principles.




Introduction

For years I have been writing - for classes and for fun - a Shell script to download videos from YouTube.

There are lots of YouTube downloaders out there, but it is a good exercise to write one yourself from first principles. You may be surprised that one can be written in Shell.

This script needs rewriting every year, because YouTube keeps changing its formats.




Usage

Usage:
youtube (url)
e.g.:
youtube "http://www.youtube.com/watch?v=OuSdU8tbcHY"
(which is "Titanic in 5 seconds")
The script should find the FLV file and download it to   y.flv
(or find the MP4 file and download it to   y.mp4, etc.)



We pick this video to debug with because it is small!
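
A minimal sketch of how the script might check its argument before doing any work. The function name check_args is an assumption for illustration; it is not part of the exercise.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if exactly one argument (the URL) was given.
check_args() {
    if [ "$#" -ne 1 ]; then
        # Print the usage line from this page to standard error.
        echo "Usage: youtube (url)" >&2
        return 1
    fi
    return 0
}

# Possible use at the top of the script:
# check_args "$@" || exit 1
```

Printing the usage message to standard error keeps it out of any output that a later step might parse.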




General strategy

  1. The URL you see in your browser:
      http://www.youtube.com/watch?v=ID
    is a permanent "home page" for the movie, with comments, related movies, etc.
    It is not the URL of the movie itself.
    However if we look inside the HTML source of this page, we can find the URL of the movie.
    The movie may only exist at that URL for the next few minutes.

  2. The "https://" form of the URL may not work, or may be awkward to handle, with wget or other command-line tools.
    The solution is simple:
    The first line of your program should use sed to convert "https://" to "http://" before proceeding:
    
    arg=`echo "$1" | [SOME sed COMMAND]`
    
    

    
    
  3. Then you can get the web page:
    
     wget -q -O - "$arg" > y.htm
    
    

    And then parse the web page to find the URL of the video.
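
Steps 2 and 3 above might be sketched as follows. This is one possible sed command, not the only one, and the function names to_http and fetch_page are assumptions made for this example.

```shell
#!/bin/sh
# Step 2 (sketch): rewrite a leading "https://" to "http://".
# URLs that already start with "http://" pass through unchanged.
to_http() {
    printf '%s\n' "$1" | sed 's|^https://|http://|'
}

# Step 3 (sketch): fetch the page quietly and save the HTML in y.htm.
fetch_page() {
    wget -q -O - "$1" > y.htm
}

# Possible use (needs network access, so it is left commented out):
# arg=$(to_http "$1")
# fetch_page "$arg"
```

Using "|" as the sed delimiter avoids having to escape the slashes in the URL.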

    
    


2014 strategy

We parse the web page that we just downloaded.
The URL of the video is on lines like this:
... \u0026url=arg\u0026 ...
It may be at the start or end of the section, delimited by quotes:
"url=arg\u0026 ...
So proceed as follows:


  1. Change all double quotes (") to newlines. Use tr.

  2. Change all "\u0026" to newlines. Use sed.

  3. grep the result for lines containing "url="

  4. grep further for "googlevideo.com".
    This will actually give multiple video URLs with different "itag" values.

    [5 out of 10]

    
    
  5. Pick one of the URLs to download based on this Guide to itag values (see "Comparison of YouTube media encoding options").
    The following might be useful:
    • itag=5 (lowest common denominator FLV). Save as file.flv
    • itag=18 (MP4 360). Save as file.mp4
    • itag=22 (MP4 720). Save as file.mp4
    Warning: Not every format exists for every video.
    You can pick out one of the URLs using grep.
    For testing, I suggest itag=5.

  6. When we have picked one of these URL lines, it should look something like this:

    url=http%3A%2F%2Fr11---sn-q0c7dn7r.googlevideo.com%2Fvideoplayback%3Fitag%3D43%26mt%3D1394118075%26expire%3D1394143484%26ratebypass%3Dyes%26signature%3DE62B548B55053599CFA362B4DB2756E0252C0FDA.415E89CAB2BC022F7DDA3D326F45C2F11D6CB68F%26sver%3D3%26id%3Dbf1b99b973e8baa3%26ipbits%3D0%26key%3Dyt5%26ms%3Dau%26sparams%3Did%252Cip%252Cipbits%252Citag%252Cratebypass%252Csource%252Cupn%252Cexpire%26mv%3Dm%26upn%3DvnHynJlopWA%26source%3Dyoutube%26fexp%3D917000%252C922520%252C916623%252C937417%252C937416%252C913434%252C936910%252C936913%252C902907%252C934022%26ip%3D136.206.217.17,itag=18

  7. Get rid of "url=" at the start. Use sed.

  8. Get rid of ",..." at the end. Use sed.
    Use the regular expression ",.*$"
    (which means "comma, any character any number of times, end of line")

    We now have a percent-encoded URL looking something like this:

    http%3A%2F%2Fr11---sn-q0c7dn7r.googlevideo.com%2Fvideoplayback%3Fitag%3D43%26mt%3D1394118075%26expire%3D1394143484%26ratebypass%3Dyes%26signature%3DE62B548B55053599CFA362B4DB2756E0252C0FDA.415E89CAB2BC022F7DDA3D326F45C2F11D6CB68F%26sver%3D3%26id%3Dbf1b99b973e8baa3%26ipbits%3D0%26key%3Dyt5%26ms%3Dau%26sparams%3Did%252Cip%252Cipbits%252Citag%252Cratebypass%252Csource%252Cupn%252Cexpire%26mv%3Dm%26upn%3DvnHynJlopWA%26source%3Dyoutube%26fexp%3D917000%252C922520%252C916623%252C937417%252C937416%252C913434%252C936910%252C936913%252C902907%252C934022%26ip%3D136.206.217.17

    [7 out of 10]

    
    
  9. We percent-decode the URL with the following Perl script. Run this:

    percentdecode "$url"
    

    where "percentdecode" is this Perl script:

    #!/usr/bin/perl
    # Percent-decode the URL given as the first argument and print it.

    use strict;
    use warnings;
    use URI::Escape;

    my $encodedurl = $ARGV[0];

    my $url = uri_unescape($encodedurl);

    print "$url\n";
    

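    We now have a URL looking something like this (the "%2C" that remains comes from "%252C", which was encoded twice, so one round of decoding leaves "%2C"):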

    http://r11---sn-q0c7dn7r.googlevideo.com/videoplayback?itag=43&mt=1394118075&expire=1394143484&ratebypass=yes&signature=E62B548B55053599CFA362B4DB2756E0252C0FDA.415E89CAB2BC022F7DDA3D326F45C2F11D6CB68F&sver=3&id=bf1b99b973e8baa3&ipbits=0&key=yt5&ms=au&sparams=id%2Cip%2Cipbits%2Citag%2Cratebypass%2Csource%2Cupn%2Cexpire&mv=m&upn=vnHynJlopWA&source=youtube&fexp=917000%2C922520%2C916623%2C937417%2C937416%2C913434%2C936910%2C936913%2C902907%2C934022&ip=136.206.217.17

    This is actually the URL of the video!

  10. Use wget to fetch this URL and store the output in   y.flv or   y.mp4 etc.

    [10 out of 10]
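
The steps above might be sketched in bash as follows. All function names are assumptions made for this example. Two things differ from the text and are worth stating plainly: the sketch picks a URL by the itag found inside the percent-encoded URL ("itag%3D..."), and it decodes with bash's printf rather than with the Perl script. It also assumes GNU sed.

```shell
#!/bin/bash
# Sketch of steps 1 to 9. Function names are hypothetical.

# Steps 1-4: split the page on double quotes and on the literal text
# \u0026, then keep the lines that hold a googlevideo.com URL.
# Assumes GNU sed, where \n in the replacement means a newline.
list_video_urls() {
    tr '"' '\n' < "$1" |
        sed 's/\\u0026/\n/g' |
        grep 'url=' |
        grep 'googlevideo\.com'
}

# Step 5: keep the first line whose encoded URL carries the wanted itag.
# Assumption: the itag is matched inside the URL, as "itag%3D<n>".
pick_itag() {
    grep -E "itag%3D$1(%26|,|\$)" | head -n 1
}

# Steps 7-8: drop the leading "url=" and everything from the first comma.
strip_wrapping() {
    sed -e 's/^url=//' -e 's/,.*$//'
}

# Step 9: decode once by turning each %XX into \xXX for printf '%b'.
percent_decode() {
    local escaped="${1//%/\\x}"
    printf '%b\n' "$escaped"
}

# Possible use (step 10 needs network access, so it is commented out):
# encoded=$(list_video_urls y.htm | pick_itag 5 | strip_wrapping)
# url=$(percent_decode "$encoded")
# wget -q -O y.flv "$url"
```

Keeping each step in its own small function makes it easy to run the pipeline one stage at a time and look at the intermediate output when YouTube changes its format again.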






Finish




The file is now on your disk and can be played in a local player.


