Dr. Mark Humphrys

School of Computing. Dublin City University.



How to write a Shell script to download videos from YouTube

Here is a great Shell script exercise:
Write a program to download a video from YouTube. In Shell script. From first principles.




Introduction

For years I have been writing - for classes and for fun - a Shell script to download videos from YouTube.

There are lots of YouTube downloaders out there, but it is a good exercise to write one yourself from first principles. You may be surprised that one can be written in Shell.

This script needs rewriting every year, because YouTube keeps changing its page format.




Usage

Usage:
youtube (url)
For example:
youtube "http://www.youtube.com/watch?v=OuSdU8tbcHY"
(which is "Titanic in 5 seconds")
The script finds the FLV file and downloads it to   y.flv



We pick this video to debug with because it is small!




Strategy

  1. The URL you see in your browser:
      http://www.youtube.com/watch?v=ID
    is a permanent "home page" for the movie, with comments, related movies, etc.
    It is not the URL of the movie itself.
    However if we look inside the HTML source of this page, we can find the URL of the movie.
    The movie may only exist at that URL for the next few minutes.

  2. The "https://" form of the URL may not work, or may be awkward to handle, with wget and other command-line tools.
    The solution is simple:
    The first line of your program should use sed to convert "https://" to "http://" before proceeding:
    
    arg=`echo "$1" | [SOME sed COMMAND]`
    
    

    
    
  3. Then you can get the web page:
    
     wget -q -O - "$arg" > y.htm
    
    

    And then parse the web page to find the URL of the video.
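
    Steps 2 and 3 could be sketched as follows. This is only one possible way to fill in the sed command, not the required answer. The function names to_http and fetch_page are made up for illustration, and the wget call is left commented out so the sketch runs without a network connection.

```shell
#!/bin/sh
# Sketch of steps 2 and 3 (function names are hypothetical).

# Rewrite a leading "https://" to "http://". Other URLs pass through unchanged.
to_http() {
    printf '%s\n' "$1" | sed 's|^https://|http://|'
}

# Download the watch page to y.htm, using the same wget call as step 3.
fetch_page() {
    wget -q -O - "$1" > y.htm
}

arg=$(to_http "https://www.youtube.com/watch?v=OuSdU8tbcHY")
echo "$arg"
# fetch_page "$arg"    # uncomment to actually download the page
```

    Using | as the sed delimiter avoids having to escape the slashes in the URL.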

    
    


2014 strategy

We parse the web page that we just downloaded.
The URL of the video is on lines like this (here "arg" stands for the percent-encoded video URL):
... \u0026url=arg\u0026 ...
It may also be at the start or end of a section, delimited by double quotes:
"url=arg\u0026 ...
So proceed as follows:


  1. Change all double quotes (") to new lines. Use tr.

  2. Change all "\u0026" to new lines. Use sed.

  3. grep the result for lines containing "url="

  4. grep further for "googlevideo.com".
    This gives you a list of likely-looking URLs. The first one normally seems to be the video.

    [5 out of 10]

    
    
  5. Assuming the first one is the one we want, we can extract it with:
    head -1
    It should look something like this:

    url=http%3A%2F%2Fr11---sn-q0c7dn7r.googlevideo.com%2Fvideoplayback%3Fitag%3D43%26mt%3D1394118075%26expire%3D1394143484%26ratebypass%3Dyes%26signature%3DE62B548B55053599CFA362B4DB2756E0252C0FDA.415E89CAB2BC022F7DDA3D326F45C2F11D6CB68F%26sver%3D3%26id%3Dbf1b99b973e8baa3%26ipbits%3D0%26key%3Dyt5%26ms%3Dau%26sparams%3Did%252Cip%252Cipbits%252Citag%252Cratebypass%252Csource%252Cupn%252Cexpire%26mv%3Dm%26upn%3DvnHynJlopWA%26source%3Dyoutube%26fexp%3D917000%252C922520%252C916623%252C937417%252C937416%252C913434%252C936910%252C936913%252C902907%252C934022%26ip%3D136.206.217.17,itag=18

  6. Get rid of "url=" at the start. Use sed.

  7. Get rid of ",..." at the end. Use sed.
    Use the regular expression ",.*$"
    (which means "comma, any character any number of times, end of line")

    We now have a percent-encoded URL looking something like this:

    http%3A%2F%2Fr11---sn-q0c7dn7r.googlevideo.com%2Fvideoplayback%3Fitag%3D43%26mt%3D1394118075%26expire%3D1394143484%26ratebypass%3Dyes%26signature%3DE62B548B55053599CFA362B4DB2756E0252C0FDA.415E89CAB2BC022F7DDA3D326F45C2F11D6CB68F%26sver%3D3%26id%3Dbf1b99b973e8baa3%26ipbits%3D0%26key%3Dyt5%26ms%3Dau%26sparams%3Did%252Cip%252Cipbits%252Citag%252Cratebypass%252Csource%252Cupn%252Cexpire%26mv%3Dm%26upn%3DvnHynJlopWA%26source%3Dyoutube%26fexp%3D917000%252C922520%252C916623%252C937417%252C937416%252C913434%252C936910%252C936913%252C902907%252C934022%26ip%3D136.206.217.17

    [7 out of 10]

    
    
  8. We percent-decode the URL with the following Perl script. Run this:

    percentdecode "$url"
    

    where "percentdecode" is this Perl script:

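    #!/usr/bin/perl
    # percentdecode: print the percent-decoded form of the first argument.

    use strict;
    use warnings;
    use URI::Escape;    # provides uri_unescape (part of the CPAN URI distribution)

    my $encodedurl = $ARGV[0];

    # Decode once only: %3A becomes ":", and %252C becomes %2C.
    my $url = uri_unescape($encodedurl);

    print "$url\n";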
    

    We now have a URL looking something like this:

    http://r11---sn-q0c7dn7r.googlevideo.com/videoplayback?itag=43&mt=1394118075&expire=1394143484&ratebypass=yes&signature=E62B548B55053599CFA362B4DB2756E0252C0FDA.415E89CAB2BC022F7DDA3D326F45C2F11D6CB68F&sver=3&id=bf1b99b973e8baa3&ipbits=0&key=yt5&ms=au&sparams=id%2Cip%2Cipbits%2Citag%2Cratebypass%2Csource%2Cupn%2Cexpire&mv=m&upn=vnHynJlopWA&source=youtube&fexp=917000%2C922520%2C916623%2C937417%2C937416%2C913434%2C936910%2C936913%2C902907%2C934022&ip=136.206.217.17

    This is actually the URL of the video!

  9. Use wget to fetch this URL and store the output in   y.flv

    [10 out of 10]
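
Steps 1 to 7 above can be chained into a single pipeline. The following is a minimal sketch, not the model answer. The function name extract_url and the sample line are made up for illustration, and the real page text may well differ.

```shell
#!/bin/sh
# Sketch of steps 1-7: read the saved page on stdin, print the encoded URL.
# extract_url is a hypothetical name.
extract_url() {
    tr '"' '\n' |                       # 1. double quotes become newlines
    sed 's/\\u0026/\
/g' |                                   # 2. each \u0026 becomes a newline
    grep 'url=' |                       # 3. keep lines containing url=
    grep 'googlevideo\.com' |           # 4. keep lines that look like video URLs
    head -1 |                           # 5. take the first candidate
    sed -e 's/^url=//' -e 's/,.*$//'    # 6-7. strip leading url= and trailing ,...
}

# Made-up, shortened sample of the page text (not real YouTube output).
sample='"itag=43\u0026url=http%3A%2F%2Fr1.googlevideo.com%2Fvideoplayback%3Fitag%3D43,itag=18\u0026quality=medium"'
printf '%s\n' "$sample" | extract_url
# Real use would be:  url=$(extract_url < y.htm)
```

The backslash followed by a line break inside the first sed command is the portable way to put a newline in a sed replacement. On the sample line the pipeline prints http%3A%2F%2Fr1.googlevideo.com%2Fvideoplayback%3Fitag%3D43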


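If you would rather stay in Shell for step 8, the decoding can also be done without Perl. The sketch below is a pure-shell stand-in for the percentdecode script, not the method used above. The names percent_decode and fetch_video are hypothetical, and the URL in the demo is a made-up short example. The wget call for step 9 is defined but not run.

```shell
#!/bin/sh
# Sketch of steps 8 and 9 in plain shell (function names are hypothetical).

# Decode every %XX in the argument exactly once and print the result.
# Limitation: a decoded newline (%0A) is lost, which does not matter for URLs.
percent_decode() {
    rest=$1
    out=
    while [ -n "$rest" ]; do
        case $rest in
            %[0-9A-Fa-f][0-9A-Fa-f]*)
                hex=${rest#%}              # drop the leading %
                hex=${hex%"${hex#??}"}     # keep only the two hex digits
                # Convert hex to octal, then let printf %b emit that byte.
                out=$out$(printf '%b' "\\0$(printf '%03o' "0x$hex")")
                rest=${rest#???}           # skip past %XX
                ;;
            *)
                out=$out${rest%"${rest#?}"}   # copy one ordinary character
                rest=${rest#?}
                ;;
        esac
    done
    printf '%s\n' "$out"
}

# Step 9: fetch the decoded URL and store it in y.flv.
fetch_video() {
    wget -q -O y.flv "$1"
}

percent_decode 'http%3A%2F%2Fr1.googlevideo.com%2Fvideoplayback%3Fitag%3D43%26sparams%3Did%252Cip'
# fetch_video "$url"    # uncomment once $url holds a real decoded URL
```

The demo prints http://r1.googlevideo.com/videoplayback?itag=43&sparams=id%2Cip. Note that %252C comes out as %2C, not as a comma: the URL should be decoded once only, as in the Perl version.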




Finish




The FLV file is now on your disk and can be played in RealPlayer or other players.


