Dr. Mark Humphrys

School of Computing. Dublin City University.


How to write a Shell script to download videos from YouTube

Here is a challenging Shell script exercise.
Write a program to download a video from YouTube. In Shell script. From first principles.




Introduction

Ever since YouTube started video hosting in 2005, it has been possible to write a Shell script to download a video from YouTube. As at 2020, it is still possible. It is a nice demo of the power of Shell scripts.

There are lots of YouTube downloaders out there, but it is a good exercise to write one yourself from first principles. You may be surprised that one can be written in Shell.




Usage

Usage will be like:
 youtube (url) 
For example:
 youtube https://www.youtube.com/watch?v=rQZfCd9BOJE  

The script should find the MP4 version of the video and download it to   y.mp4
Several formats may exist for the same video.
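
As a starting point, here is a minimal sketch of the command-line handling only. The function name check_args is made up for illustration; in a real script you would call it with "$@" rather than with a fixed URL.

```shell
#!/bin/sh
# Sketch of the argument check only (check_args is a hypothetical helper name).

# Print the page URL if exactly one argument was given.
# Otherwise print a usage message to stderr and return non-zero.
check_args() {
    if [ "$#" -ne 1 ]; then
        echo "Usage: youtube (url)" >&2
        return 1
    fi
    echo "$1"
}

# Example call with the URL from the text.
# In the real script this would be: page_url=$(check_args "$@") || exit 1
page_url=$(check_args "https://www.youtube.com/watch?v=rQZfCd9BOJE") || exit 1
echo "Page URL: $page_url"
```

Always quote the URL variable when you use it later, since characters like "?" and "&" mean something to the shell.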

  





URL of the page

The URL you see in your browser:
      https://www.youtube.com/watch?v=ID
is a permanent "home page" for the video, with comments, related videos, etc.
It is not the URL of the video itself.
However, if we look inside the HTML source of this page, we can find the URL of the video.
The video may only exist at that URL for the next few minutes.



URL of the video

Where is the URL of the actual video (as opposed to the URL of the page)?

In fact there are multiple video URLs for different formats. They are buried deep inside the source code. They look like this:

"https:\/\/SERVERID.googlevideo.com\/videoplayback?ARG=VALUE\\u0026ARG=VALUE\\u0026  ..."

They are delimited by double quotes.

  

Fixing the URL

The URL of the video looks a bit strange. It seems like we need to make some changes to it:
  1. Change every "\\u0026" to "&".
  2. Delete every "\" character.
In fact, that is pretty much it.
Do these fixes and you can in fact fetch the video.
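
The two fixes used in the recipe on this page (turn "\\u0026" into "&", then delete every backslash) can be tried on a sample string before touching a real page. This is only a sketch: the function name fix_url is made up, and SERVERID, ARG and VALUE are the placeholders from the example above, not real values.

```shell
#!/bin/sh
# Sample escaped URL in the form found in the page source.
# Single quotes keep every backslash exactly as written.
escaped='https:\/\/SERVERID.googlevideo.com\/videoplayback?ARG=VALUE\\u0026ARG=VALUE'

# fix_url (hypothetical name): escaped URLs on stdin, cleaned URLs on stdout.
fix_url() {
    # In the sed pattern, \\ matches ONE literal backslash, so \\u0026
    # matches \u0026 and leaves one stray backslash behind.
    # In the replacement, \& means a literal ampersand.
    # tr then deletes every remaining backslash, including the stray one.
    sed 's/\\u0026/\&/g' | tr -d '\\'
}

fixed=$(printf '%s\n' "$escaped" | fix_url)
echo "$fixed"
```

Run the sed step on its own first and look at the output. You should still see backslashes at that point, which is why the tr step is needed.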



Recipe that currently works

This recipe works as at 2020:
  1. Take in the URL of the page as a command-line argument.

  2. Use wget to get the web page.
    Top tip: Get the web page once. Save to a file. Then when debugging, use that file, without going to fetch the page from YouTube again. When you have debugged the program, you can fix it so it always fetches the page.

  3. Use sed to put a newline in front of every "http". This isolates each URL at the start of its own line.

  4. Then grep for "googlevideo"

  5. Use tr to change all double quotes (") to new lines.

  6. grep for "googlevideo" again.
  7. grep for "videoplayback"
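
Steps 2 to 7 above can be sketched as two small functions. This is one possible arrangement, not the only one. The names fetch_page, extract_urls and sample_page.html are made up for illustration, and the sample page is hand-written in the form described earlier so that the demo runs without going to YouTube.

```shell
#!/bin/sh
# Step 2 with the "fetch once" tip: only download if there is no saved copy.
# $1 = page URL, $2 = local file name.
fetch_page() {
    [ -s "$2" ] || wget -q -O "$2" "$1"
}

# Steps 3 to 7 as a filter: page source on stdin, candidate URLs on stdout.
extract_urls() {
    # A backslash followed by a real line break is the portable way
    # to put a newline in a sed replacement.
    sed 's/http/\
http/g' |
        grep googlevideo |
        tr '"' '\n' |
        grep googlevideo |
        grep videoplayback
}

# Hand-made stand-in for a saved page (placeholders, not real data).
cat > sample_page.html <<'EOF'
<html><script>var x = {"url":"https:\/\/SERVERID.googlevideo.com\/videoplayback?itag=18\\u0026ARG=VALUE","other":"https:\/\/www.youtube.com\/about"};</script></html>
EOF

# The saved copy exists, so this call does not touch the network.
fetch_page "https://www.youtube.com/watch?v=ID" sample_page.html

extract_urls < sample_page.html
```

The output is still in escaped form, with "\/" and "\\u0026" in it. That is expected at this stage.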

  
Now we have a short list of URLs. But the URLs need some editing.
  1. Use sed to change all "\\u0026" to "&"
    Top tip: "&" means something to sed. So use "\&" which means "literally the ampersand character".
    Also notice we have "\" in the first pattern (the pattern to search for). That has special meaning. You can either fix that now, or note that it will be fixed by the next step.
    See what the URLs look like now. Do they look normal yet?

  2. Use tr to delete the "\" character.
    OK now do the URLs look normal?

  3. You now have a listing of multiple video URLs with different "itag" values.
    The itag values look like itag=VALUE

  4. Pick URL to download based on this guide to itags: YouTube video stream format codes

    I suggest this one:
    • itag=18 (MP4 360). Save as file.mp4
    Alternatives include:
    • itag=5 (FLV). Save as file.flv
    • itag=22 (MP4 720). Save as file.mp4
    Not all formats always exist.

  5. We pipe the above to a grep for the itag we want.
  6. We now have a single URL, that looks something like this:

    https://r1---sn-q0cedn7s.googlevideo.com/videoplayback?expire=1584569784&ei=V0lyXpLcOpuwxN8Pr8ynuAI&ip=136.206.217.30&id=o-AIqQ_-mxoy2Hncpz_rfUDe5HbfwbhqfhkvNhmTuYIQen&itag=18&source=youtube&requiressl=yes&mh=9w&mm=31%2C26&mn=sn-q0cedn7s%2Csn-5hne6nsr&ms=au%2Conr&mv=m&mvi=0&pl=16&initcwndbps=1503750&vprv=1&mime=video%2Fmp4&gir=yes&clen=752255&ratebypass=yes&dur=22.453&lmt=1559497970109948&mt=1584548075&fvip=1&c=WEB&txp=5431432&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cvprv%2Cmime%2Cgir%2Cclen%2Cratebypass%2Cdur%2Clmt&sig=ADKhkGMwRgIhAIQYDX0NVV_9eQX57RzjTNKe4wPBWAXwdzhGcRGw7fxrAiEA7W2dAd6aZGw9edUHEDgLAanvI5Bm98WWVfrux7O9xmk%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=ABSNjpQwRgIhAM2L6hyS3JFtbQ6M5F7bGi8grfz6MNOb_EZ2cPtLwbB4AiEAy_LiHMqu3DI1_DCqTjTlZ0ykwq7l1wN3vy36Eudgdco%3D

  7. wget the URL to output to a file like file.mp4 and you are done!
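
Steps 5 to 7 of the list above can be sketched like this. The names pick_itag and download_video are made up, and the URLs are placeholders. Two things here go slightly beyond the recipe and are my assumptions: the grep pattern insists that the itag value ends at "&" or at the end of the line, so that itag=18 does not also match itag=181, and head keeps only the first match in case the same itag is listed more than once.

```shell
#!/bin/sh
# pick_itag (hypothetical name): $1 = itag number, cleaned URLs on stdin.
# Prints the first URL that has exactly that itag value.
pick_itag() {
    grep -E "[?&]itag=$1(&|\$)" | head -n 1
}

# download_video (hypothetical name): $1 = video URL, $2 = output file.
# The URL must be quoted, because "&" is special to the shell.
download_video() {
    wget -O "$2" "$1"
}

# Placeholder list standing in for the cleaned-up URLs.
urls='https://SERVERID.googlevideo.com/videoplayback?ARG=VALUE&itag=22&ARG=VALUE
https://SERVERID.googlevideo.com/videoplayback?ARG=VALUE&itag=18&ARG=VALUE
https://SERVERID.googlevideo.com/videoplayback?ARG=VALUE&itag=181&ARG=VALUE'

video_url=$(printf '%s\n' "$urls" | pick_itag 18)
echo "$video_url"

# With a real URL you would now run:
# download_video "$video_url" file.mp4
```

If pick_itag prints nothing, that format does not exist for this video. Try another itag.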



Play video

Video can be played in various ways, depending on installation:
  1. In browser.
    Use file://
    Or put video in web directory and use http://

  2. "Videos"

  3. VLC.
    Might have to change: Tools - Preferences - Audio - Output Type - UNIX OSS audio

  4. RealPlayer


The script can launch the player automatically:
vlc file &
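
If you do not know which player is installed, the script can check first. Here is a small sketch. The function names first_available and play_video are made up. The command name totem for the "Videos" application is my assumption, so check what it is called on your system.

```shell
#!/bin/sh
# first_available (hypothetical name): print the first command in the
# argument list that is actually installed. Returns non-zero if none is.
first_available() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            return 0
        fi
    done
    return 1
}

# play_video (hypothetical name): $1 = video file.
# Launches the player in the background, like "vlc file &".
play_video() {
    # totem is assumed to be the command behind "Videos".
    player=$(first_available vlc totem) || {
        echo "No player found. Open $1 in a browser instead." >&2
        return 1
    }
    "$player" "$1" &
}

# Usage, once the download has finished:
# play_video file.mp4
```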

  
