Dr. Mark Humphrys

School of Computing. Dublin City University.


Grab Flickr data

Write a shell script that fetches a list of Flickr images as XML, parses it, and displays the result as HTML.

Background


The "flickr" shell script

  1. Shell script usage:
    flickr (tag)
  2. Gets the XML feed of the latest photos that carry this tag.
  3. Extract these items:
    <content type="html">
     HTML payload (encoded)
    </content>
    
  4. Decode payloads to normal HTML.
  5. Output just those payloads into some file: file.html
  6. Launch browser with that file to look at it.



40%

  1. The script constructs the feed URL from the command-line argument.
  2. Use wget to download that URL to file.xml

  3. Extract the sections we want.
    See Parsing XML in Shell
    Pipe the XML into:
    xpath '//content'
    or:
    xpath '//content[@type="html"]'
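The 40% steps could be sketched as follows. This is only an outline under stated assumptions: the feed address (Flickr's public Atom feed) is not given in the text above and should be checked, and the function names feed_url and fetch_content are made up for illustration. The tag is assumed to be a single plain word that needs no URL escaping.

```shell
#!/bin/sh
# Assumption: Flickr's public Atom feed lives at this address.
FEED_BASE='https://www.flickr.com/services/feeds/photos_public.gne'

# Build the feed URL for one tag (hypothetical helper).
feed_url() {
    printf '%s?tags=%s&format=atom\n' "$FEED_BASE" "$1"
}

# Download the feed to file.xml, then print the <content type="html"> sections.
fetch_content() {
    wget -q -O file.xml "$(feed_url "$1")" || return 1
    cat file.xml | xpath '//content[@type="html"]'
}
```

Calling fetch_content dublin would then print the encoded payloads, ready for the decoding step.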

60%

  1. You now have the HTML payloads. Unfortunately they are encoded. We need to decode them.
  2. Convert &lt; to <
    Use the reverse of the conversion in the search engine.
  3. Convert &gt; to >
    Use something similar to the previous.
  4. Convert &quot; to "
    Use something similar to the previous.
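One possible way to do the three conversions is a single sed command with one substitution per entity. The function name decode_html is an assumption for illustration. It reads encoded text on standard input and writes decoded text to standard output.

```shell
#!/bin/sh
# Decode the three entities listed above (hypothetical helper name).
decode_html() {
    sed -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&quot;/"/g'
}

# Example: one encoded link, as it might appear inside a payload.
printf '%s\n' '&lt;a href=&quot;url&quot;&gt;text&lt;/a&gt;' | decode_html
```

The example line prints <a href="url">text</a>. If the payloads also contain &amp;, that conversion would probably need to go last, so that it cannot create new entities for the earlier substitutions to pick up.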

100%

  1. Send output to some file: file.html
  2. Launch browser with that file.
  3. The actual images will display, because the HTML embeds remote images.
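The last stage might look like the sketch below. It assumes a Linux desktop where xdg-open opens a file in the default browser. The variable names OUT and BROWSER_CMD and the function save_and_show are hypothetical. Substitute whatever browser command your system uses.

```shell
#!/bin/sh
OUT=${OUT:-file.html}                  # output file named in the assignment
BROWSER_CMD=${BROWSER_CMD:-xdg-open}   # assumption: xdg-open is available

# Read decoded HTML payloads on stdin, save them to $OUT, then open the file.
save_and_show() {
    cat > "$OUT" || return 1
    "$BROWSER_CMD" "$OUT"
}
```

The decoded payloads would be piped into save_and_show as the final command of the script.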


Testing

  1. flickr dublin
    should fetch the latest images tagged "dublin" and display them.


Extra things

  1. xpath generates noisy output messages. How would you do a silent xpath?

  2. When an encoded HTML payload is inside XML tags, and you use xpath to extract it, xpath seems to do some decoding automatically.
    file.xml contains this:
       <tag> &lt;a href=&quot;url&quot;&gt;text&lt;/a&gt; </tag>
    
    You send it through xpath:
       cat file.xml  | xpath '//tag'
    
    It outputs this:
       <tag> &lt;a href="url">text&lt;/a> </tag> 
    
    Some characters are decoded, some are not.
    So you may not need all of the sed conversions above, but running them all does no harm.
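A quick check of the "no harm" claim, as a sketch: the same three sed substitutions are applied to a fully encoded line and to the partly decoded line that xpath printed in the example above. The function name decode is an assumption.

```shell
#!/bin/sh
# The three conversions from the 60% section (hypothetical helper name).
decode() {
    sed -e 's/&lt;/</g' -e 's/&gt;/>/g' -e 's/&quot;/"/g'
}

raw='&lt;a href=&quot;url&quot;&gt;text&lt;/a&gt;'   # as stored in file.xml
partial='&lt;a href="url">text&lt;/a>'             # as printed by xpath

printf '%s\n' "$raw"     | decode
printf '%s\n' "$partial" | decode
```

Both lines come out as <a href="url">text</a>. A substitution that finds nothing to replace leaves the text unchanged.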