Dr. Mark Humphrys

School of Computing. Dublin City University.



How to write a program to download videos from YouTube

How to write a Shell script to download videos from YouTube. From first principles.




Introduction

For some years I have been writing - for classes and for fun - a UNIX Shell script to download videos from YouTube.

There are lots of YouTube downloaders, but it is a good exercise to write one yourself from first principles. You may be surprised that one can be written in Shell.

This script needs re-writing every year, because YouTube keep changing their formats.


Note: This script no longer works. YouTube have changed formats yet again.
The URL is still in the page source, though, inside ytplayer.config.
Can anyone write a Shell script to extract it? If so, let me know.




Usage

Usage:
    youtube (url)
For example:
    youtube "http://www.youtube.com/watch?v=OuSdU8tbcHY"
(which is "Titanic in 5 seconds").
The script finds the FLV file and downloads it to y.flv



We pick this video to debug with because it is small!




Strategy

  1. The URL you see in your browser:
      http://www.youtube.com/watch?v=ID
    is a permanent "home page" for the movie, with comments, related movies, etc.
    It is not the URL of the movie itself.
    However, if we look inside the HTML source of this page, we can find the URL of the movie.
    That URL is temporary: the movie will only exist there for the next few minutes.

  2. The URL of the movie is in fact in the header right at the top of the source code. (View Source to see.) Something like:
    <head> ...
    <script> ...
    some.function("IRRELEVANT_URL");
    some.function("THE_URL");
    </script>
    <title> ...
    
  3. The function name seems to change, but THE_URL can be spotted because it contains "generate_204".
  4. The URL needs some editing once it is extracted.


Shell code - main script

This Shell script gets the "home page" for the video, pipes it into a separate script to extract the URL, and then downloads that URL.


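# Fetch the "home page" quietly and extract the URL of the movie from it.
url=$(wget -q -O - "$1" | extracturl)

# Download the movie itself to y.flv.
wget -O y.flv "$url"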
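A slightly more defensive variant of the main script might look like the sketch below. It is not part of the original script: the function name fetch_video, the argument check and the error messages are illustrative assumptions. It assumes "extracturl" is available as a script on the PATH or as a Shell function.

```shell
#!/bin/sh
# Sketch of a more defensive main script. The name fetch_video and the
# error messages are illustrative, not taken from the original script.
# Assumes "extracturl" exists as a script on the PATH or as a function.

fetch_video() {
    # Require exactly one argument: the URL of the video's "home page".
    if [ "$#" -ne 1 ]; then
        echo "usage: youtube (url)" >&2
        return 2
    fi

    # Fetch the "home page" and extract the URL of the movie from it.
    url=$(wget -q -O - "$1" | extracturl)

    # Stop if nothing was extracted (e.g. the page format has changed).
    if [ -z "$url" ]; then
        echo "could not find the movie URL in the page" >&2
        return 1
    fi

    # Download the movie itself to y.flv.
    wget -q -O y.flv "$url"
}

# When saved as the script "youtube", the file would end with:
#   fetch_video "$@"
```

The empty-URL check matters because the extraction step is the part that breaks whenever the page format changes. Without it, wget would be called with an empty URL.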



Shell code - extracturl

To extract the URL, we make a Shell script "extracturl" consisting of various programs piped together.
("extracturl" could be a script, or a Shell function.)

TIP: See what the first pipe does, then the first two, then the first three, until you have all of them piped together.
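As an illustration of the TIP, the pipeline can be grown one stage at a time against a small saved sample rather than the live page. The sketch below uses a made-up three-line page; the sample text and the host name r1.example.invalid are placeholders, not real YouTube output.

```shell
#!/bin/sh
# Building the pipeline one stage at a time on a made-up sample page.
# The sample text and the host r1.example.invalid are placeholders.

sample='<html><head>
<script>some.function("http:\/\/r1.example.invalid\/crossdomain.xml");some.function("http:\/\/r1.example.invalid\/generate_204?sver=3\u0026id=abc");</script>
<title>sample</title>'

# Stage 1 only: which line of the page mentions generate_204?
printf '%s\n' "$sample" | grep "generate_204"

# Stages 1 and 2: break that line at every double-quote.
printf '%s\n' "$sample" | grep "generate_204" | tr '"' '\n'

# Stages 1 to 3: keep only the fragment that contains generate_204.
printf '%s\n' "$sample" | grep "generate_204" | tr '"' '\n' | grep "generate_204"
```

Looking at the output of each stage in turn makes it much easier to see which stage has stopped matching when the page format changes.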

The following are to be piped together:


  1. grep "generate_204"

    This gives a single line, looking something like this (the URL is the argument of the second yt.preload.start call):

    <script>var yt = yt || {};yt.preload = {};yt.preload.counter_ = 0;yt.preload.start = function(src) {var img = new Image();var counter = ++yt.preload.counter_;yt.preload[counter] = img;img.onload = img.onerror = function () {delete yt.preload[counter];};img.src = src;img = null;};yt.preload.start("http:\/\/r13---sn-q0c7dn7z.c.youtube.com\/crossdomain.xml");yt.preload.start("http:\/\/r13---sn-q0c7dn7z.c.youtube.com\/generate_204?sver=3\u0026source=youtube\u0026expire=1361552880\u0026id=3ae49d53cb5b7076\u0026signature=C11CE1F97394B7D012FDC4A5279B1DF38E5858BA.79A6AC7DB3888987E1B9D7B576D66B23D7A6D25B\u0026mv=m\u0026sparams=algorithm%2Cburst%2Ccp%2Cfactor%2Cid%2Cip%2Cipbits%2Citag%2Csource%2Cupn%2Cexpire\u0026burst=40\u0026ms=au\u0026cp=U0hVRlVQUF9NSkNONV9NSlRJOk5zempyMnY3al9B\u0026ipbits=8\u0026upn=GLEXJMqd2sQ\u0026factor=1.25\u0026algorithm=throttle-factor\u0026ip=MYIP\u0026itag=34\u0026mt=1361529971\u0026key=yt1\u0026fexp=909708%2C914072%2C916611%2C920704%2C912806%2C902000%2C922403%2C922405%2C929901%2C913605%2C925006%2C931202%2C908529%2C920201%2C930101%2C906834%2C926403%2C901451");</script><title>titanic in 5 seconds - YouTube</title><link rel="search" type="application/opensearchdescription+xml" href="http://www.youtube.com/opensearch?locale=en_US" title="YouTube Video Search"><link rel="shortcut icon" href="http://s.ytimg.com/yts/img/favicon-vfldLzJxy.ico" type="image/x-icon"> <link rel="icon" href="//s.ytimg.com/yts/img/favicon_32-vflWoMFGx.png" sizes="32x32"><link rel="canonical" href="/watch?v=OuSdU8tbcHY"><link rel="alternate" media="handheld" href="http://m.youtube.com/watch?v=OuSdU8tbcHY"><link rel="alternate" media="only screen and (max-width: 640px)" href="http://m.youtube.com/watch?v=OuSdU8tbcHY"><link rel="shortlink" href="http://youtu.be/OuSdU8tbcHY"> <meta name="title" content="titanic in 5 seconds">

    [2 out of 5]

  2. To extract the URL, break lines at double-quotes, and extract the line with generate_204:

    	
    tr '"' '\n' |
    grep "generate_204"
    
    

    You now have the extracted URL:

    http:\/\/r13---sn-q0c7dn7z.c.youtube.com\/generate_204?sver=3\u0026source=youtube\u0026expire=1361552880\u0026id=3ae49d53cb5b7076\u0026signature=C11CE1F97394B7D012FDC4A5279B1DF38E5858BA.79A6AC7DB3888987E1B9D7B576D66B23D7A6D25B\u0026mv=m\u0026sparams=algorithm%2Cburst%2Ccp%2Cfactor%2Cid%2Cip%2Cipbits%2Citag%2Csource%2Cupn%2Cexpire\u0026burst=40\u0026ms=au\u0026cp=U0hVRlVQUF9NSkNONV9NSlRJOk5zempyMnY3al9B\u0026ipbits=8\u0026upn=GLEXJMqd2sQ\u0026factor=1.25\u0026algorithm=throttle-factor\u0026ip=MYIP\u0026itag=34\u0026mt=1361529971\u0026key=yt1\u0026fexp=909708%2C914072%2C916611%2C920704%2C912806%2C902000%2C922403%2C922405%2C929901%2C913605%2C925006%2C931202%2C908529%2C920201%2C930101%2C906834%2C926403%2C901451

    [3 out of 5]

  3. Now some obvious editing of the URL:

    1. Use sed to change http:\/\/ to http://
    2. Use sed to change youtube.com\/ to youtube.com/

    Warning: "\" in the string has special meaning to sed, so a literal backslash must be written as "\\".

  4. Some less obvious editing:

    1. Use sed to change \u0026 to &

    Warning: "\" and "&" in the string have special meaning to sed. Write "\\" for a literal backslash in the pattern, and "\&" for a literal "&" in the replacement.

  5. And finally some very non-obvious editing. You might have guessed that the above was the URL, and tidied it up, but then found out it does not work. Further research was needed to do the final edit (thanks to Brian Kane for all of this):

    
    	sed  's|generate_204|videoplayback|g' 
    
    

    This gives the edited URL, looking like this:

    http://r13---sn-q0c7dn7z.c.youtube.com/videoplayback?cp=U0hVRlVQUF9NSkNONV9NSlRJOjg0OHRUaFZKTTBJ&id=3ae49d53cb5b7076&signature=0FDF981AB22BCCB53F4B2059436D6837C58E1BD1.3F69089BEF997527BDE4B91DEFA0902E8DAFB37F&ip=MYIP&ms=au&source=youtube&expire=1361552880&key=yt1&factor=1.25&ipbits=8&mv=m&sver=3&mt=1361530031&upn=xfVDC2jSFOk&fexp=906357%2C916807%2C920704%2C912806%2C902000%2C922403%2C922405%2C929901%2C913605%2C925006%2C908529%2C920201%2C930101%2C906834%2C926403%2C901451&sparams=algorithm%2Cburst%2Ccp%2Cfactor%2Cid%2Cip%2Cipbits%2Citag%2Csource%2Cupn%2Cexpire&itag=34&algorithm=throttle-factor&burst=40

    [4 out of 5]

  6. Your script should now be able to fetch the FLV file to disk.

    [5 out of 5]
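Putting the steps together, one possible "extracturl" is sketched below as a Shell function. The sed expressions for steps 3 and 4 are this sketch's own guess at what is needed, and it simplifies step 3 by changing every "\/" to "/" rather than treating http:\/\/ and youtube.com\/ separately. The sample input and the host name r1.example.invalid are placeholders.

```shell
#!/bin/sh
# One possible extracturl, written as a Shell function (a sketch only).
# Steps 3 and 4: "\/" becomes "/", and "\u0026" becomes "&".
# Using | as the sed delimiter avoids having to escape "/".
# A literal backslash in the pattern is written \\ and a literal & in
# the replacement is written \&.
# Step 5: generate_204 becomes videoplayback, as given in the text.

extracturl() {
    grep "generate_204" |
    tr '"' '\n' |
    grep "generate_204" |
    sed -e 's|\\/|/|g' -e 's|\\u0026|\&|g' |
    sed 's|generate_204|videoplayback|g'
}

# Try it on a made-up line instead of a live page.
printf '%s\n' 'some.function("http:\/\/r1.example.invalid\/crossdomain.xml");some.function("http:\/\/r1.example.invalid\/generate_204?sver=3\u0026id=abc");' | extracturl
# prints: http://r1.example.invalid/videoplayback?sver=3&id=abc
```

The sed expressions are kept in single quotes so that the Shell passes the backslashes through to sed unchanged.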



Notes




The FLV file is now on your disk and can be played in RealPlayer or other players.
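Because the script fails silently when the page format changes, it can be worth checking that what was saved really is an FLV file and not an error page. FLV files begin with the three-byte signature "FLV". The sketch below checks for that; the function name is_flv is illustrative, and it assumes a head that supports -c, as the GNU and BSD versions do.

```shell
#!/bin/sh
# Sketch: check that a file begins with the three-byte signature "FLV".
# The function name is_flv is illustrative.

is_flv() {
    [ "$(head -c 3 "$1" 2>/dev/null)" = "FLV" ]
}

if is_flv y.flv; then
    echo "y.flv looks like an FLV file"
else
    echo "y.flv does not look like an FLV file (perhaps an error page was saved)"
fi
```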


