Dr. Mark Humphrys

School of Computing. Dublin City University.



Why on earth would I link to you?


Mark Humphrys
Dublin City University, School of Computing

This article appeared in abridged form in the Irish Times on Mon 15th Feb 1999.



Introduction

Most writing about the web is done from a very short-term perspective. Authors create web sites very quickly - and abandon them almost as quickly. Sometimes the sites are pulled down (leaving everyone who linked to them irritated). Sometimes they are left up as dead sites, unmaintained and full of broken links. In either case, the authors have rarely thought about what role their site might play in the long-term unfolding of the web.

I am in the unusual position of knowing something of what it is like to maintain a web site over a long period. This is my sixth year of maintaining a site of several hundred pages on my research and other interests, and a number of surprising lessons have become clear to me over those years.


Links are free ads

First, some background. My site is one of those classic "amateur" web sites, with no commercial purpose other than to promote my research work in computer science and my other interests (mainly history). It gets 30,000 hits a month (the circulation, I suppose, of a small magazine or small local newspaper) and would probably get more if I promoted it or registered it properly with the search engines, none of which I have yet done.

Hits are not really the right way to measure site traffic, since a single page with several included files (images and so on) generates several hits. A better measure would be page views.
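To make the difference concrete, here is a minimal Python sketch. It assumes a simplified, hypothetical log of (client, path) pairs, one per request, and a hypothetical rule that only paths ending in `.html` or `/` count as pages; real server logs and real page-detection rules are more complicated.

```python
# Hypothetical, simplified access log: one (client, requested path) pair per hit.
LOG = [
    ("1.2.3.4", "/index.html"),
    ("1.2.3.4", "/logo.gif"),
    ("1.2.3.4", "/style.css"),
    ("5.6.7.8", "/history/"),
    ("5.6.7.8", "/history/map.jpg"),
]


def is_page(path: str) -> bool:
    """Assumed rule: HTML files and directory URLs are pages; anything else is an included file."""
    return path.endswith(".html") or path.endswith("/")


def count_stats(log):
    """Return (hits, page_views): every request is a hit, only page requests are page views."""
    hits = len(log)
    page_views = sum(1 for _client, path in log if is_page(path))
    return hits, page_views


print(count_stats(LOG))  # (5, 2)
```

Two page visits here produce five hits, which is why a raw hit count overstates readership.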

Because it is non-commercial, I link to other relevant web sites throughout my texts. Linking is the fundamental innovation of the web - the ability to directly reference other works from within your text - yet few commercial sites outside the search engines take any advantage of it. Each one prefers to reinvent the wheel in an (absurd) attempt to keep you within its site for all your needs.

Having established that mine is the kind of site that links, what are the long-term implications of this? It turns out that in what is effectively a small magazine, I currently offer no fewer than 509 free ads for Yahoo Corporation, without being asked to, and without expecting anything from them in return. The profound question is - why do I do this? And why did Yahoo get all these free ads and not anybody else?


In the future, all links will be to Yahoo

Let me explain the evolution of a typical page over the last few hectic years. The example we take is a page in which I refer to William Shakespeare. Now while Shakespeare is of some relevance to the main subject of my page, it would be pointless for me to start maintaining my own (inadequate) biography or collection of data on Shakespeare at this point. The logical thing to do is to point to someone who specialises in Shakespeare. Hyperlinks should be (but rarely are) used like this to strip down pages to their original content and devolve everything else to remote specialised pages.

In the early days of the web this would have involved a link to some Californian computer science student's home page, something like: http://www.cs.stanford.edu/~cs94joeshmo/shakespeare.html, which would have been the only Shakespeare page on the web at that time. As the web developed of course, our friend cs94joeshmo would graduate from school and vanish, and I would (irritated) change my page to link to some other soon-to-vanish site. Most web authors are just beginning to experience what it is like to have your links break year after year, but after a few years of it you either go crazy or you develop specific strategies to future-proof your pages.

First of all, I started linking to more heavy-duty, dedicated pages, something like: http://www.shakespeare.org/, which at least looked as if it would still exist next year. But the web kept exploding, and I soon realised that shakespeare.org was only one of dozens of dedicated Shakespeare sites, and it did not make sense to restrict my readers to one of them. What I wanted was a list of sites, so I started linking to things like: http://www.shakespeare.org/shakespeare-sites.html. But still the web kept changing, and I began to wonder whether this was the best list of sites to link to. What I really wanted to find was the definitive place on the web that the word "Shakespeare" should link to.

The answer, I argue, and the final resting place of all my ceaselessly changing links, is a Yahoo category: http://www.yahoo.com/Literature/Shakespeare/. Yahoo gives you the confidence that this link will work forever, and that the page it refers to will be maintained forever - gradually expanding with sub-categories and sub-sub-categories as the amount of information increases. And I am finally freed of the job of following all those moving sites.

Not any more. The Yahoo directory shut down in 2014.

So I have noticed this effect over the last half-decade - that hundreds of my links have slowly migrated to Yahoo, and linking to specific pages has been slowly replaced by linking to Yahoo categories, as Yahoo introduce them. Even links to specific content that you would imagine Yahoo would be no good for, such as Act 5, Scene 5 of Macbeth, might be surrounded by a general Yahoo Macbeth category, in case the specific site I link to ceases to exist. So while it's not actually true that all my links will someday be to Yahoo, it is true that they provide an almost irreplaceable service for link authors like me. And getting themselves embedded in people's web pages like this will make them hard to displace. For instance, any new web directory is going to have to be a lot better than Yahoo now for me to go and change those 509 links...

And in fact the other web directories, search engines and portals don't even seem to realise what is going on. Catering almost entirely for the transient "surfer", they provide keyword-ranked, chaotic lists of information, on pages that cannot be linked to, or are not designed to be. Try "Shakespeare" on any of them and you will see what I mean. They use frames so you cannot find the address, or the address is a mess of keywords and search parameters ("CGI arguments") or temporary identification numbers ("cookies") - anything, it seems, to make sure you do not link to their site. Only Infoseek seem to provide an actual page that you can link to: http://www.infoseek.com/Books/Shakespeare/ - and it is a far inferior cousin to Yahoo's complex page.

The Infoseek directory has shut down.
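The difference between a linkable category page and a search-results address can be seen by splitting each URL into its path and its query parameters. The sketch below uses Python's standard `urllib.parse`; the search URL, its `p` and `sid` parameters, and the "no query string means linkable" rule of thumb are all hypothetical illustrations, not how any particular search engine actually worked.

```python
from urllib.parse import urlsplit, parse_qs

# The Infoseek category address quoted above, and a hypothetical search-results address.
DIRECTORY = "http://www.infoseek.com/Books/Shakespeare/"
SEARCH = "http://search.example.com/cgi-bin/query?p=Shakespeare&sid=8f3a91"


def describe(url: str) -> dict:
    """Split a URL into its path and its decoded query ("CGI") parameters."""
    parts = urlsplit(url)
    return {"path": parts.path, "params": parse_qs(parts.query)}


def looks_linkable(url: str) -> bool:
    """Hypothetical rule of thumb: an address with no query string is more likely to be stable."""
    return urlsplit(url).query == ""


print(describe(DIRECTORY))  # {'path': '/Books/Shakespeare/', 'params': {}}
print(describe(SEARCH))     # {'path': '/cgi-bin/query', 'params': {'p': ['Shakespeare'], 'sid': ['8f3a91']}}
print(looks_linkable(DIRECTORY), looks_linkable(SEARCH))  # True False
```

A parameter like the hypothetical `sid` changes from visit to visit, so a link copied from such a page may not lead the next reader to the same place.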

This pattern is repeated in other categories. Try the English Civil War. The science of taxonomy. The town of Dun Laoghaire. The Orange Orders. The 1798 Rising. The Holocaust. Fermat's Last Theorem. The TV show Father Ted. For each of these Yahoo has a dedicated category at some point in its vast hierarchy (which is really a network). There's not much competition for who to link to.

Not only has the Yahoo directory shut down, but there is now plenty of new competition for who to link to. Consider Wikipedia on the English Civil War. The science of taxonomy. The town of Dun Laoghaire. The Orange Orders. The 1798 Rising. The Holocaust. Fermat's Last Theorem. The TV show Father Ted.

Infoseek could perhaps, though, mount a serious challenge someday if (a) they made their hierarchical structure easy to navigate, and (b) Yahoo took their eye off the ball. It is quite possible that Yahoo themselves do not realise what their strengths are, and that their success so far has come about by luck. The web design expert Jakob Nielsen has written a column on why Yahoo is good in which - though he does not discuss the linking issue - he makes this very point. The paradox of sites not really understanding why they are successful is in fact common on the Internet, as I shall now discuss with reference to the Irish Times.



On the web, more people will read the archive than the current issue

Another strange fact about my web pages is that I provide 115 free ads for the Irish Times on them, and basically no ads at all for the Irish Independent or The Guardian. Why is this?

The answer is their online archive. Newspapers and magazines on the web have developed online archives almost by accident. When the day's news is over, some simply replace their front page with the next day's news, thereby guaranteeing (perhaps inadvertently) that nobody will ever link to one of their articles. But others, perhaps equally by accident, leave the old pages online somewhere, thereby providing hundreds of useful hooks for web authors to link their pages to.

For instance, on one of my pages I refer to Capt. Percival Lea-Wilson, an RIC man killed in 1921.

Correction: 1920.

Now it happens that there was an exchange of letters and an article by Neil Jordan about him in the Irish Times in October 1996. By the marvels of online archiving, this material is not dead and buried in the National Library's back rooms somewhere, but is alive on the web and I can link to it. I can even link directly to an individual reader's letter that month. Indeed, every time I read a useful article now, I note the URL so I can make a link to it. If a newspaper has no online archive (e.g. the Irish Independent until just recently), well, easy come, easy go.

The Irish Independent and The Guardian both now have full archives online.

So online archives (or rather the lack of them) are another example of sites failing to think long-term. In fact, Jakob Nielsen has found that an article gets the majority of its hits while it sits in the archive, rather than while it is in the current issue (and you will be hitting his archive yourself if you go and read his article). This is another of those surprising facts about the Internet that has only become clear as the years have gone by. Eventually, advertising space in the archive should become more expensive than space in the current issue itself.

Should the Irish Times decide for some reason to take down their archive, I will of course reluctantly have to relocate all those 115 links to point somewhere else. Even if they just demand, say, user registration for marketing purposes (as The Guardian recently did), I would probably, over the long term, reluctantly redirect those 115 free ads - I don't want any barriers to my readers being able to follow links from my pages. Forcing people to remove ads for you would certainly be regarded as odd behaviour in the offline world - yet online it is still commonplace.

The Irish Times is now pay-to-view.

To summarise, after years of re-editing my links, I have developed a specific policy that makes me avoid linking to some sites and favour linking to others. Many (if not all) of the sites I provide free ads for may not know why I do so, and may casually force me to remove those ads in the future. Few people think about the web long-term, but I would argue that what you link to and why will eventually be dominated by long-term thinking. "Surfing" is only a fad - in the long run, I imagine the web becoming the definitive organiser of humanity's information. We will be able to rationally analyse what is the best place on the planet for a particular concept to link to. The reward for the winners in this competition will clearly be immense.



Some relevant links




Follow-up to this article, explaining what has happened to my links since.




In defence of Wikipedia



