Dr. Mark Humphrys

School of Computing. Dublin City University.


Grab Flickr data

Write a shell script that fetches a list of Flickr images as XML, parses it, and displays the result as HTML.

Background


The "flickr" shell script

  1. Shell script usage:
    flickr (tag)
  2. Gets the XML feed of the latest photos tagged with this tag.
  3. Extracts these items:
    <content type="html">
     HTML payload (encoded)
    </content>
    
  4. Decodes the payloads to normal HTML.
  5. Outputs just those payloads to a file: file.html
  6. Launches a browser on that file to view it.



40%

  1. The script constructs the correct URL from the command-line argument.
  2. Use wget to download the URL to file.xml

  3. Extract the sections we want.
    See Parsing XML in Shell
    Pipe the XML into:
    xpath '//content'
    or:
    xpath '//content[@type="html"]'
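The steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the definitive solution: the feed URL is an assumption (Flickr's public photo feed endpoint, to be checked against Flickr's own feed documentation), and build_url and fetch_feed are hypothetical helper names.

```shell
#!/bin/sh
# Sketch of the 40% stage. The feed URL below is an assumption; verify it.

# build_url TAG: print the feed URL for one tag (hypothetical helper).
build_url() {
    echo "https://www.flickr.com/services/feeds/photos_public.gne?tags=$1&format=atom"
}

# fetch_feed TAG: download the feed to file.xml, then print the
# <content type="html"> sections (hypothetical helper).
fetch_feed() {
    wget -q -O file.xml "$(build_url "$1")"
    # Pipe the XML into xpath, as described above.
    cat file.xml | xpath '//content[@type="html"]'
}

# Run only when a tag is given on the command line: flickr (tag)
if [ $# -ge 1 ]; then
    fetch_feed "$1"
fi
```

Keeping the URL construction in its own function makes it easy to echo and inspect the URL before any download is attempted.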

60%

  1. You now have the HTML payloads. Unfortunately they are encoded. We need to decode them.
  2. Convert &lt; to <
    Use the reverse of the conversion in the Shell search engine.
  3. Convert &gt; to >
    Use a substitution similar to the previous one.
  4. Convert &quot; to "
    Use a substitution similar to the previous one.
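One way to write the three conversions, assuming sed is the tool used for the substitutions. The function name decode_html is hypothetical. Other entities, such as &amp;, are not handled in this sketch.

```shell
#!/bin/sh
# decode_html: read encoded HTML on stdin, write decoded HTML on stdout.
# Hypothetical helper name; handles only the three entities listed above.
decode_html() {
    sed -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&quot;/"/g'
}

echo '&lt;a href=&quot;url&quot;&gt;text&lt;/a&gt;' | decode_html
# prints: <a href="url">text</a>
```

The g flag matters here: each payload line contains many encoded characters, and without it sed would replace only the first match on each line.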

100%

  1. xpath prints noisy status messages along with its output. Run it silently.

  2. Send output to some file: file.html
  3. Your script automatically launches the browser on that file, detached from the terminal.
  4. The actual images will display. (The HTML has embeds of remote images.)
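A possible sketch of the silent xpath and the detached browser launch. It assumes that xpath writes its status messages to stderr, so discarding stderr silences them; check this on your own system. The names extract_quiet and launch_detached are hypothetical, and firefox is only an example browser.

```shell
#!/bin/sh
# extract_quiet: the same xpath step, with its status messages discarded.
# Assumption: the noisy messages go to stderr, not stdout.
extract_quiet() {
    xpath '//content[@type="html"]' 2>/dev/null
}

# launch_detached CMD ARGS...: start a command in the background with its
# output discarded, so the script can finish without waiting for it.
launch_detached() {
    nohup "$@" >/dev/null 2>&1 &
}

# Example use (firefox is an assumption; substitute your own browser):
#   cat file.xml | extract_quiet > file.html    # then apply the decoding step
#   launch_detached firefox file.html
```

Running the browser in the background returns the shell prompt immediately, instead of blocking until the browser window is closed.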


Testing

  1. flickr dublin
    should fetch latest images for Dublin and display them.


Extra things

  1. When an encoded HTML payload is inside XML tags, and you use xpath to extract it, xpath seems to do some decoding automatically.
    file.xml contains this:
       <tag> &lt;a href=&quot;url&quot;&gt;text&lt;/a&gt; </tag>
    
    You send it through xpath:
       cat file.xml  | xpath '//tag'
    
    It outputs this:
       <tag> &lt;a href="url">text&lt;/a> </tag> 
    
    Some characters are decoded and some are not.
    So you may not need all of the sed lines above, but including them does no harm.
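To illustrate that the extra substitutions are harmless, the sketch below runs all three on the partially decoded line shown above. It assumes the substitutions are written with sed. The variable name partial is hypothetical.

```shell
#!/bin/sh
# Output of xpath as shown above: quotes and > already decoded, &lt; not.
partial='<tag> &lt;a href="url">text&lt;/a> </tag>'

# The same three substitutions; those with nothing to match change nothing.
echo "$partial" | sed -e 's/&lt;/</g' -e 's/&gt;/>/g' -e 's/&quot;/"/g'
# prints: <tag> <a href="url">text</a> </tag>
```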