Dr. Mark Humphrys

School of Computing. Dublin City University.



How to write a Shell script to download videos from YouTube

Here is a great Shell script exercise:
Write a program to download a video from YouTube. In Shell script. From first principles.




Introduction

Ever since YouTube began hosting videos, it has never been hard to write a Shell script that downloads them. It is a nice demo of the power of Shell scripts.

There are lots of YouTube downloaders out there, but it is a good exercise to write one yourself from first principles. You may be surprised that one can be written in Shell.




Usage

Usage:
youtube (url)
e.g.:
youtube "http://www.youtube.com/watch?v=OuSdU8tbcHY"
(which is "Titanic in 5 seconds")
The script should find the FLV file and download it to y.flv
(Or find the MP4 file and download it to y.mp4, etc.)



We pick this video to debug with because it is small!
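Here is a minimal sketch of how such a script might check its arguments, written as a hypothetical "check_usage" function (the name is illustrative, not part of the exercise). Quoting the URL, as in the example above, matters: characters such as ? and &, which often appear in YouTube URLs, are special to the shell.

```shell
# Hypothetical argument check for a script run as:  youtube (url)
check_usage() {
    if [ "$#" -ne 1 ]; then
        echo "Usage: youtube (url)" >&2   # message goes to standard error
        return 1
    fi
}
```

Call it as: check_usage "$@" || exit 1   near the top of the script.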




General strategy

  1. The URL you see in your browser:
      http://www.youtube.com/watch?v=ID
    is a permanent "home page" for the movie, with comments, related movies, etc.
    It is not the URL of the movie itself.
    However, if we look inside the HTML source of this page, we can find the URL of the movie.
    The movie may only exist at that URL for the next few minutes.

  2. The "https://" form of the URL may not work, or may be difficult to handle, with wget or other command-line tools.
    The solution is simple:
    The first line of your program should use sed to convert "https://" to "http://" before proceeding:
    
    arg=`echo "$1" | [SOME sed COMMAND]`
    
    

    
    
  3. Then you can get the web page:
    
     wget -q -O - "$arg" > y.htm
    
    

    And then parse the web page to find the URL of the video.
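As a sketch of one possible answer to the sed step, here is a hypothetical "to_http" function that rewrites a leading "https://" and leaves other URLs alone. It uses printf rather than echo because some shells' echo interprets backslashes, and it uses "|" as the sed delimiter so the slashes in the URL need no escaping.

```shell
# Hypothetical helper for step 2: rewrite a leading "https://" as "http://".
# This is one possible sed command; the exercise leaves it for you to write.
to_http() {
    # printf prints its argument verbatim; "|" as the sed delimiter
    # avoids escaping every "/" in the URL.
    printf '%s\n' "$1" | sed 's|^https://|http://|'
}
```

Used as   arg=$(to_http "$1")   it slots in before the wget call in step 3.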

    
    


2014 strategy

We parse the web page that we just downloaded.
The URL of the video is on lines like this:
... \u0026url=arg\u0026 ...
It may be at the start or end of the section, delimited by quotes:
"url=arg\u0026 ...
So proceed as follows:


  1. Change all double quotes (") to newlines. Use tr.

  2. Change all "\u0026" to newlines. Use sed.
    Warning: "\" has a special meaning in sed and must be escaped (as "\\").

  3. grep the result for lines containing "url="

  4. grep further for "googlevideo.com".
    This will actually give multiple video URLs with different "itag" values.

    [5 out of 10]

    
    
  5. Pick one of the URLs to download, based on a guide to itag values (see "Comparison of YouTube media encoding options").
    For example:
    • itag=5 (lowest common denominator FLV). Save as file.flv
    • itag=18 (MP4 360). Save as file.mp4
    • itag=22 (MP4 720). Save as file.mp4
    Warning: Not all formats will exist.
    You can pick out one of the URLs using grep.

    For this demo, I want you to use itag=5.
    If you can find any YouTube video that does not have itag=5 please tell me.

  6. When we have picked one of these URL lines, it should look something like this:

    url=http%3A%2F%2Fr11---sn-q0c7dn7r.googlevideo.com%2Fvideoplayback%3Fitag%3D43%26mt%3D1394118075%26expire%3D1394143484%26ratebypass%3Dyes%26signature%3DE62B548B55053599CFA362B4DB2756E0252C0FDA.415E89CAB2BC022F7DDA3D326F45C2F11D6CB68F%26sver%3D3%26id%3Dbf1b99b973e8baa3%26ipbits%3D0%26key%3Dyt5%26ms%3Dau%26sparams%3Did%252Cip%252Cipbits%252Citag%252Cratebypass%252Csource%252Cupn%252Cexpire%26mv%3Dm%26upn%3DvnHynJlopWA%26source%3Dyoutube%26fexp%3D917000%252C922520%252C916623%252C937417%252C937416%252C913434%252C936910%252C936913%252C902907%252C934022%26ip%3D136.206.217.17,itag=18

  7. Get rid of "url=" at the start. Use sed.

  8. Get rid of ",..." at the end. Use sed.
    Use the regular expression ",.*$"
    (which means "comma, any character any number of times, end of line")

    We now have a percent-encoded URL looking something like this:

    http%3A%2F%2Fr11---sn-q0c7dn7r.googlevideo.com%2Fvideoplayback%3Fitag%3D43%26mt%3D1394118075%26expire%3D1394143484%26ratebypass%3Dyes%26signature%3DE62B548B55053599CFA362B4DB2756E0252C0FDA.415E89CAB2BC022F7DDA3D326F45C2F11D6CB68F%26sver%3D3%26id%3Dbf1b99b973e8baa3%26ipbits%3D0%26key%3Dyt5%26ms%3Dau%26sparams%3Did%252Cip%252Cipbits%252Citag%252Cratebypass%252Csource%252Cupn%252Cexpire%26mv%3Dm%26upn%3DvnHynJlopWA%26source%3Dyoutube%26fexp%3D917000%252C922520%252C916623%252C937417%252C937416%252C913434%252C936910%252C936913%252C902907%252C934022%26ip%3D136.206.217.17

    [7 out of 10]

    
    
  9. We percent-decode the URL.
    I will give you a Perl script to do this. Run this:

    percentdecode "$url"
    

    where "percentdecode" is this Perl script:

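    #!/usr/bin/perl
    # percentdecode: print the percent-decoded form of the URL given as the first argument
    use strict;
    use warnings;
    use URI::Escape;                        # provides uri_unescape()

    my $encodedurl = $ARGV[0];              # e.g. http%3A%2F%2F...
    my $url = uri_unescape($encodedurl);    # %3A -> ":", %2F -> "/"; decodes one level only, so %252C -> %2C

    print "$url\n";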
    

    We now have a URL looking something like this:

    http://r11---sn-q0c7dn7r.googlevideo.com/videoplayback?itag=43&mt=1394118075&expire=1394143484&ratebypass=yes&signature=E62B548B55053599CFA362B4DB2756E0252C0FDA.415E89CAB2BC022F7DDA3D326F45C2F11D6CB68F&sver=3&id=bf1b99b973e8baa3&ipbits=0&key=yt5&ms=au&sparams=id%2Cip%2Cipbits%2Citag%2Cratebypass%2Csource%2Cupn%2Cexpire&mv=m&upn=vnHynJlopWA&source=youtube&fexp=917000%2C922520%2C916623%2C937417%2C937416%2C913434%2C936910%2C936913%2C902907%2C934022&ip=136.206.217.17

    This is the URL of the video.

  10. Use wget to fetch this URL and store the output in y.flv or y.mp4, etc.

    [10 out of 10]
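Steps 1 to 8 can be chained into a single pipeline. The sketch below is one way to do it, assuming the 2014 page layout described above (YouTube has changed its pages since, so expect to adapt it). "extract_encoded_url" is a hypothetical name; it reads the saved page on standard input and takes the itag as its argument. Two choices differ slightly from the order above: the ",..." tail is stripped before picking the itag, because the tail can name a different itag from the URL itself (in the sample in step 6, the URL says itag%3D43 while the tail says itag=18), and the itag is therefore matched in its percent-encoded form inside the URL. The sed replacement is a backslash followed by a real newline, which portable sed accepts (GNU sed also accepts \n there).

```shell
# Hypothetical function: print the percent-encoded URL for one itag,
# reading the saved watch page (e.g. y.htm) on standard input.
# Assumes the 2014 page layout; the itag defaults to 5 (FLV).
extract_encoded_url() {
    itag="${1:-5}"
    tr '"' '\n' |                        # step 1: quotes become newlines
    sed 's/\\u0026/\
/g' |                                    # step 2: each literal \u0026 becomes a newline
    grep 'url=' |                        # step 3: lines containing url=
    grep -F 'googlevideo.com' |          # step 4: the video-server URLs
    sed 's/^url=//' |                    # step 7: drop the leading url=
    sed 's/,.*$//' |                     # step 8: drop the ",..." tail
    grep -E "itag%3D${itag}(%26|\$)" |   # step 5: keep the chosen itag
    head -n 1                            # first match only
}
```

For example:   encoded=$(extract_encoded_url 5 < y.htm)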






Finish




The file is now on your disk and can be played in a local player.
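Steps 9 and 10, which take the percent-encoded URL to a file on disk, can likewise be wrapped up. In this sketch, "ext_for_itag" and "download_video" are hypothetical names. The extension mapping covers only the three itags listed in step 5, and "percentdecode" is assumed to be the Perl script from step 9, made executable and saved somewhere on your PATH.

```shell
# Hypothetical helpers for steps 9 and 10.

# Map an itag to a file extension (only the itags listed in step 5).
ext_for_itag() {
    case "$1" in
        5)     echo flv ;;
        18|22) echo mp4 ;;
        *)     return 1 ;;   # unknown itag: no extension chosen
    esac
}

# Percent-decode the URL with the percentdecode Perl script (assumed on PATH),
# then fetch it with wget into y.flv or y.mp4.
download_video() {
    encoded="$1"
    itag="$2"
    ext=$(ext_for_itag "$itag") || {
        echo "unknown itag: $itag" >&2
        return 1
    }
    url=$(percentdecode "$encoded")
    wget -q -O "y.$ext" "$url"
}
```

For example,   download_video "$encoded" 5   would save the video as y.flv.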




On Internet since 1987.