Using the HTTP 'Referer'

On the Witches’ Three webpage I started some work on a concept. I’m sure it’s not a new idea, and I’m not sure it can be used successfully, but the basic idea comes from the fact that when people put a link to your site on their site and that link gets clicked, the browser passes along the HTTP ‘Referer’ header, telling the webserver which page the visitor just came from.

Since many links to your site suggest that you have good content, search engines like Google increase its PageRank so your site shows up higher in search results. It’s therefore a good idea to encourage people to add links to your site. Of course the best encouragement is to have great content, but using the HTTP referer to dynamically add links back to the site that links to yours may add some extra incentive.

Links in the Database

On the Dutchie site, I would like every hyperlink to be maintained in a database. I’ve seen too many sites that link to non-existent webpages. By putting most, if not all, of the hyperlinks in a database and periodically verifying with a script whether the links still exist, I can set an attribute on a link once it is no longer valid. Invalid links in the content can then simply be disabled wherever they are used.
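A minimal sketch of that periodic check, written in Python for illustration rather than in whatever language the site itself uses. The `links` table, its `valid` column, and the `is_reachable` callable are all assumptions made for this example; a real check would issue an HTTP request where the callable is invoked.

```python
import sqlite3
from typing import Callable


def create_links_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per hyperlink, plus a validity flag.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links ("
        "id INTEGER PRIMARY KEY, "
        "url TEXT UNIQUE NOT NULL, "
        "valid INTEGER NOT NULL DEFAULT 1)"
    )


def refresh_validity(conn: sqlite3.Connection,
                     is_reachable: Callable[[str], bool]) -> int:
    """Re-check every stored link and return how many are now invalid."""
    rows = conn.execute("SELECT id, url FROM links").fetchall()
    invalid = 0
    for link_id, url in rows:
        ok = is_reachable(url)
        conn.execute("UPDATE links SET valid = ? WHERE id = ?",
                     (1 if ok else 0, link_id))
        if not ok:
            invalid += 1
    conn.commit()
    return invalid
```

Passing the reachability check in as a function keeps the database logic separate from the network code, so the flagging can be tried out without touching the network.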

Using this links-in-a-database concept, I decided to add a link ‘category’ called ‘Referers’. Only a script that gets included on every page of dutchie.org can add links to this category, and it does so whenever it sees that the HTTP_REFERER variable is set. First the HTTP_REFERER needs to be parsed and cleaned up somewhat. If the link to your site is on a page that can be reached in multiple ways (for example on a forum, where the link may appear in an outline of an article but also in the article itself), I don’t want every variant of the referring URL to end up in my link table. So I simply strip off the ?….. part of the URL. At this point I can still afford to be reasonably sloppy, since a referral link will be visible in our weblink category for only one day at most. With the cleaned-up URL, the script does one of two things:

  • Check if the referral link exists in the category; if it does, I’ll increase the hit counter
  • If the referral link does not exist yet, add it with a title that’s recognizable fairly easily (‘Automatically added link’ or something like that) so that users seeing the link will know that it has not been processed yet.
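The clean-up and the two steps above can be sketched as follows. This is an illustration in Python, not the script that runs on the site; the `referers` table layout, the function names, and the choice to start the hit counter at 1 are assumptions.

```python
import sqlite3
from urllib.parse import urlsplit, urlunsplit

PLACEHOLDER_TITLE = "Automatically added link"


def create_referers_table(conn: sqlite3.Connection) -> None:
    # Hypothetical layout for the 'Referers' link category.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS referers ("
        "url TEXT PRIMARY KEY, title TEXT NOT NULL, hits INTEGER NOT NULL)"
    )


def clean_referer(referer: str) -> str:
    """Strip the query string (and any fragment) from the referring URL."""
    parts = urlsplit(referer)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def record_referer(conn: sqlite3.Connection, referer: str) -> str:
    """Bump the hit counter for a known referrer, or add a new one."""
    url = clean_referer(referer)
    cur = conn.execute(
        "UPDATE referers SET hits = hits + 1 WHERE url = ?", (url,))
    if cur.rowcount == 0:
        # Not seen before: add it with the recognizable placeholder title.
        conn.execute(
            "INSERT INTO referers (url, title, hits) VALUES (?, ?, 1)",
            (url, PLACEHOLDER_TITLE))
    conn.commit()
    return url
```

With this, `http://forum.example/thread?id=5` and `http://forum.example/thread?view=outline` both end up as the single entry `http://forum.example/thread`.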

Checking the links

Obviously… not everybody is an honest person. When I see how many automated programs try to get their links on this blog by sending automated responses to the articles, I keep reminding myself that linkage is valuable. So after I add a referral link, it’s probably a good idea to periodically check whether the link to Dutchie still exists on the referring page. I’ve created a simple Perl program for this that uses the Perl LWP library to retrieve the page and the HTML::Parser library to scan the page for its title and to check all the <a href="…"> tags.
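To illustrate the scanning step, here is a small sketch that uses Python’s standard `html.parser` module as a stand-in for the Perl HTML::Parser library; it is not the Perl program itself. The class and function names are made up for this example, and fetching the page is left out so that the sketch works on an HTML string.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class BacklinkScanner(HTMLParser):
    """Collect the page title and note whether any link points at our host."""

    def __init__(self, target_host: str):
        super().__init__()
        self.target_host = target_host.lower()
        self.title = ""
        self.links_back = False
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            host = urlsplit(href).hostname or ""
            # Accept the bare host as well as subdomains such as www.
            if host == self.target_host or host.endswith("." + self.target_host):
                self.links_back = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scan_page(html_text: str, target_host: str):
    """Return (title, links_back) for an already downloaded page."""
    scanner = BacklinkScanner(target_host)
    scanner.feed(html_text)
    scanner.close()
    return scanner.title.strip(), scanner.links_back
```

The title found this way could replace the placeholder title of an automatically added link, and a `links_back` value of `False` would be the signal to drop or disable the referral link.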

Since the linkchecker program is essentially a web crawler, it also uses the WWW::RobotRules module to determine whether it’s okay to crawl somebody’s page. Some of the specifics are on the Linkchecker page. The linkchecker script will also check the validity of the user-contributed links on Dutchie, so there will be no broken links anywhere.
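The robots.txt check can be sketched with Python’s standard `urllib.robotparser` module, which plays roughly the role WWW::RobotRules plays in Perl. The user agent name below is hypothetical, and the robots.txt text is passed in as a string so the example does not need network access.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "DutchieLinkchecker"  # hypothetical crawler name


def may_crawl(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Decide from the text of a robots.txt file whether url may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A crawler would download `/robots.txt` from the referring host once, keep the result around, and call a check like this before retrieving any page from that host.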

Using a database link in contents

So I have a site where people can add links, but they can also add content. Inside that content I want to be able to use the links in an easy way. For this I will rely on a very early decision in the building of Dutchie: every page is built from a number of blocks such as the header, the navigation bar, the content, the ads, the footer, etc. Each of these blocks may append to a variable called $page that’s initialized in the very first block. The $page variable is finally printed to the screen in the last included block.

Since nothing has been output yet when the end of the script is reached, I can still manipulate the end result in one big substitute-and-replace command. This way I can use what I’ll call ‘site macros’. A link macro may for example look something like ‘LNK(123)’ (identifying a link simply by its unique ID), ‘LNK(‘videos/myclip.flv’)’ or any other format I want to use. In the macro processing stage I’ll simply replace these macros with the real links, while keeping the flexibility to turn these links on or off, depending on whether the linkchecker has found the link to be okay.
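A minimal sketch of such a macro pass, in Python and handling only the numeric ‘LNK(123)’ form. The `links` dictionary stands in for the link database, and the decisions to show a disabled link as plain text and to leave unknown IDs untouched are assumptions, not something the site is known to do.

```python
import html
import re

LINK_MACRO = re.compile(r"LNK\((\d+)\)")


def expand_macros(page: str, links: dict) -> str:
    """Replace LNK(id) macros in the finished page with real hyperlinks."""

    def replace(match: re.Match) -> str:
        link = links.get(int(match.group(1)))
        if link is None:
            return match.group(0)  # unknown ID: leave the macro as it is
        title = html.escape(link["title"])
        if not link["valid"]:
            return title  # link is switched off: show the title only
        url = html.escape(link["url"], quote=True)
        return f'<a href="{url}">{title}</a>'

    return LINK_MACRO.sub(replace, page)
```

Because the substitution runs once over the whole page just before it is printed, a link that the linkchecker marks as invalid is switched off everywhere it is used without touching the stored content.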

And Finally…

While building this referral mechanism I had forgotten to ignore referral links coming from Dutchie itself! This resulted in a lot of links in the link database that point to different places on Dutchie, but it also had an unexpected side effect: I have been building a sitemap simply by clicking on all the links as I go about testing everything a few times per day! I had recently submitted a sitemap.xml to the Google webmaster tools and decided that it would be a royal pain in the rear to have to do this frequently. So I added a little bit of code to the linkchecker script to automatically generate a sitemap.xml for Dutchie. So now, when people start adding content to Dutchie and thereby create new URLs, every time they click on a new link we’ll get a new local referrer, which the linkchecker script will pick up every day to automatically generate this sitemap.xml.
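Generating the sitemap from the local referrers could look roughly like this. The sketch is in Python and takes a plain list of URLs instead of reading from the link database; the function name and the host-matching rule are assumptions, and only the `<loc>` element of each sitemap entry is filled in.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(referers, own_host: str = "dutchie.org") -> str:
    """Build sitemap.xml text from the referrers that point at our own site."""

    def is_local(url: str) -> bool:
        host = urlsplit(url).hostname or ""
        return host == own_host or host.endswith("." + own_host)

    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # Deduplicate and sort so the file is stable from one run to the next.
    for url in sorted({u for u in referers if is_local(u)}):
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

External referrers are filtered out here, so the same collection of recorded referrers can feed both the ‘Referers’ category and the sitemap.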

About Fred Leeflang

Hi! I am the website administrator of the Forza website.