Web statistics explained

Two things to consider before we begin. Do you have access to a web-hosting service, and how much access do you have to the server? These may seem like obvious questions, but they are key considerations. For example, if you have a free web host, you usually won’t have access to the server’s raw log files, and will be forced to use ‘counter’ applications instead of processing your own statistics. If you’re on a standard paid hosting contract, you’ll usually have a more heavyweight stats package like Webalizer or AWStats that does this for you. If you want something else, you almost certainly won’t have enough access to the server to install a replacement at this level – such tools need to work with the web server (typically Apache) and its logs, not simply run PHP or Perl scripts on demand. Obviously, if you have complete control over the machine, you can do what you like.
Preparing the ground
We’ll cover the differences between the two types in a moment. Regardless of what you use, your stats package will provide the following key details: the number of visits and unique visitors your site receives, and how they reached your site. Hits – the number of individual elements of a page requested by visitors – are no longer a worthwhile metric, although they are often still included.
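The following Python sketch makes the difference between hits and visits concrete by showing roughly what a log analyser does with raw server logs. It is minimal, and it rests on several assumptions that are not part of any particular package. It expects Apache's ‘combined’ log format, with lines already in chronological order. It identifies a visitor by IP address plus browser string, which is only an approximation because proxies and shared connections blur it. It also treats a visit as ending after 30 minutes of inactivity, a common convention rather than a standard. The names `summarise` and `SAMPLE` are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

# Assumption: a visit ends after 30 minutes with no requests (a convention, not a standard).
VISIT_TIMEOUT = timedelta(minutes=30)


def summarise(lines):
    """Count hits, visits and unique visitors from chronologically ordered log lines."""
    hits = 0
    visits = 0
    last_seen = {}  # visitor -> time of that visitor's most recent request
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip malformed lines
        hits += 1  # every request is a hit: pages, images, stylesheets...
        # Approximate a visitor as IP address + user agent.
        visitor = (match["host"], match["agent"])
        when = datetime.strptime(match["time"], TIME_FORMAT)
        previous = last_seen.get(visitor)
        if previous is None or when - previous > VISIT_TIMEOUT:
            visits += 1  # first request, or back after a long gap: a new visit
        last_seen[visitor] = when
    return {"hits": hits, "visits": visits, "unique_visitors": len(last_seen)}


SAMPLE = [
    '1.2.3.4 - - [10/Oct/2005:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '1.2.3.4 - - [10/Oct/2005:13:55:37 +0000] "GET /logo.png HTTP/1.1" 200 512 "http://www.example.com/" "Mozilla/5.0"',
    '5.6.7.8 - - [10/Oct/2005:14:00:00 +0000] "GET / HTTP/1.1" 200 2326 "-" "Opera/8.5"',
    '1.2.3.4 - - [10/Oct/2005:15:00:00 +0000] "GET /about HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
]

print(summarise(SAMPLE))  # {'hits': 4, 'visits': 3, 'unique_visitors': 2}
```

The logo request counts as a hit but not as a new visit. The return of the first visitor an hour later counts as a second visit from the same unique visitor.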
Beyond that, depending on the package, other information can include the browsers being used to view your pages, the path visitors take through your site, the OS they use, their country of origin, the amount of time they spend on your site, the number who apparently arrived from bookmarks or by typing the URL directly, the number of errors generated (notably 404 – ‘Not Found’), and, usually best of all, a list of the keywords that brought people to your virtual doorstep. These tend to fall into two camps: very bizarre fetishes, and phrases with no connection to anything you might even conceivably have put on your site. So if it was you who got to www.richardcobbett.co.uk in September by typing in ‘lesbian police’, ‘elves stripping’ or ‘infogrames is innovation’, sorry for the disappointment – and seek help immediately.
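Two of those figures, 404 errors and browser share, come straight out of the same logs. This minimal Python sketch again assumes Apache's combined log format. The browser rules are hypothetical and deliberately crude, since real packages match long lists of user-agent patterns. Opera is checked before Internet Explorer because Opera could be set to identify itself as MSIE while still mentioning Opera in its user-agent string.

```python
import re
from collections import Counter

# Apache combined format; only the request path, status and user agent are captured.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical, crude rules: the first substring match wins, so order matters.
BROWSER_RULES = [
    ("Opera", "Opera"),
    ("Firefox", "Firefox"),
    ("MSIE", "Internet Explorer"),
    ("Safari", "Safari"),
]


def browser_name(agent):
    for token, name in BROWSER_RULES:
        if token in agent:
            return name
    return "Other"


def errors_and_browsers(lines):
    """Return (404 count per path, request count per browser)."""
    not_found = Counter()
    browsers = Counter()  # counted per request here, not per visitor
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip malformed lines
        if match["status"] == "404":
            not_found[match["path"]] += 1
        browsers[browser_name(match["agent"])] += 1
    return not_found, browsers
```

A long list of 404s for the same path usually points to a broken link somewhere, either on your own site or on someone else's.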
It’s worth noting that more information isn’t necessarily better. Knowing which browsers people are using is important, and knowing which keywords are attracting visitors is vital for a commercial website, but for many sites the full battery of statistics is overkill. In many cases, you’re better off with a simpler tool that gives you your statistics up front but supports the specific features you want.
Delayed referrals
Referral logs are one of the most agonising parts of running a website, thanks to two problems: incompetence and malevolence. In a nutshell, the logs show you the pages people have linked to you from – be they blog posts, news sites or comment threads – so you can keep an eye on what the Internet is saying. The hitch is that many referral packages only give you part of the story.
If someone mentions your site without actually linking to it, or if some posting coprophage has simply leeched an image from your site (linking directly to your files and draining your bandwidth instead of their own), you may never find out what people are saying about you. Being linked from a protected page doesn’t grant you access to it either: anything on a private LiveJournal, in an email or on other such sites will remain a mystery unless the owner opens up. But that’s not much of a problem.
Worse are referrals like www.freepokertipzhonestguv.org. This kind of site has not linked to you at all. Faking a referral is extremely easy, and the trick is used by two types of web-scum. The first are spammers, who hope that you publish your logs somewhere so that their latest fake sites pick up links, climb the search engine rankings and make a dime on your bandwidth. The second are fraudsters, who exploit your natural interest in seeing what people are saying about you to make you click through to their site. Usually, they’re also selling the software that does this to other people, priced at around $20 and an unused human soul.
And here’s the kicker. You don’t just get one of these, or even ten, but hundreds. Thousands. It’s a constant flood of drivel that completely washes out the genuine information you want to see, and digging yourself out of it is a certified nightmare. As ever, spammers are willing to cause you any amount of frustration for the slight chance that you’ll consider clicking on their crap.
Clearing out the Augean stables
Most stats programs you’ll see just don’t bother trying, although you can often get around this with content management systems or web counters that keep their own referrer logs. These offer much more scope to whittle the list down via blacklists or filters (such as removing anything containing the word ‘poker’ or ‘viagra’). They can also fight off automated scripts by checking whether a web browser actually loaded your page, either by making it download an image (the old-style approach) or run a quick piece of JavaScript before the visit is recognised.
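The blacklist approach is easy to sketch. The Python below is a minimal, hypothetical filter. The word list, the `clean_referrers` name and the example URLs are illustrative and are not taken from any particular package. It drops empty referrers, links from your own site and anything containing a blacklisted word, then counts what remains. Spammers rotate domains constantly, so a word list like this only ever catches part of the flood.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical blacklist, matching the kind of filter described above.
BLACKLIST = ("poker", "viagra")


def clean_referrers(referrers, own_host):
    """Count external referrers, skipping blanks, internal links and blacklisted words."""
    counts = Counter()
    for ref in referrers:
        if ref in ("", "-"):
            continue  # no referrer sent: bookmark, typed URL, or stripped by the browser
        host = (urlsplit(ref).hostname or "").lower()
        if host == own_host or host.endswith("." + own_host):
            continue  # a click from one of your own pages to another
        if any(word in ref.lower() for word in BLACKLIST):
            continue  # looks like referrer spam
        counts[ref] += 1
    return counts
```

For example, `clean_referrers(referrers, "www.richardcobbett.co.uk")` would keep a genuine blog link and silently discard www.freepokertipzhonestguv.org.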
Mint (www.haveamint.com) is the most recent of these. It has two advantages: it is installed on your own server rather than a third party’s, and it runs completely invisibly. It requires PHP and MySQL, and anyone who visits without JavaScript enabled doesn’t get counted – but they won’t be blocked either. It’s a simple tool, restricting itself to a quick look at your stats (Visits, Searches, Pages and Referrers), and is priced at $30 per website.
As referrer spam is generated by scripts rather than by people actually visiting your site, either of these approaches immediately clears up a decent squelch of the problem, even if it doesn’t eradicate it. Only a careful smattering of nuclear bombs is likely to do that.
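The image check works because a script that fakes a referrer usually requests only the page, never the images embedded in it. The Python sketch below illustrates that idea under two assumptions. First, each page embeds a tiny image at the hypothetical path `/beacon.gif`. Second, log entries have already been parsed into (IP address, path, referrer) tuples. Clients behind a shared proxy, or with images turned off, will be misjudged, so treat this as a filter rather than proof.

```python
from collections import Counter

# Hypothetical path of a tiny image embedded in every page of the site.
BEACON_PATH = "/beacon.gif"


def verified_referrers(requests):
    """Count referrers, keeping only those from clients that also fetched the beacon.

    `requests` is an iterable of (client_ip, path, referrer) tuples parsed from
    the access log.
    """
    requests = list(requests)  # two passes are needed
    real_browsers = {ip for ip, path, _ in requests if path == BEACON_PATH}
    counts = Counter()
    for ip, path, referrer in requests:
        if path == BEACON_PATH or referrer in ("", "-"):
            continue  # the beacon's own referrer is just your page
        if ip in real_browsers:
            counts[referrer] += 1
    return counts
```

A JavaScript-based check such as Mint's follows the same logic. The only difference is that the proof of a real browser is a script-triggered request rather than an image download.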
Collecting up keywords
Statistics don’t simply show you how people found your site; they help you draw more traffic to your door. AWStats (www.awstats.org) is the most comprehensive package you’re likely to get with your hosting provider. It’s a strong package in its own right, and it’s open source and free, unlike the likes of Urchin, which can practically tell you the consistency of your visitors’ saliva but charges $199 a month for the privilege. More common still is Webalizer, also free, but about as enjoyable to use as an automated prong that randomly stabs into your eyeball and twists your brain stem like candyfloss.
Keywords are one of the best examples. Search engine strings may be bizarre, but drilling down to the word level makes it easy to see how you’re representing yourself on the web. If the key phrases you want to be associated with aren’t there, or make up only a tiny proportion next to irrelevant results, it could well be time either to overhaul the site as a whole and target your market more directly, or to start investing in Google adverts and other methods of drawing traffic.
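Drilling down to the word level is mostly a matter of pulling the search terms out of referrer URLs. This minimal Python sketch assumes the search engine passes the query in a `q` parameter, as Google does. The host table and the `keyword_counts` name are hypothetical, and other engines would need their own entries.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Which query-string parameter holds the search terms, per search engine host.
# Google uses "q"; this table is illustrative and would need extending.
QUERY_PARAMS = {"www.google.com": "q", "www.google.co.uk": "q"}


def keyword_counts(referrers):
    """Return (whole search phrases, individual words) counted from search referrers."""
    phrases = Counter()
    words = Counter()
    for ref in referrers:
        parts = urlsplit(ref)
        param = QUERY_PARAMS.get(parts.hostname or "")
        if param is None:
            continue  # not a search engine we recognise
        # parse_qs decodes "+" and %-escapes, e.g. "web+statistics" -> "web statistics"
        for query in parse_qs(parts.query).get(param, []):
            phrase = " ".join(query.lower().split())  # normalise case and spacing
            if phrase:
                phrases[phrase] += 1
                words.update(phrase.split())
    return phrases, words
```

Counting whole phrases shows the bizarre searches. Counting single words shows which terms you are actually being found for, and that second view is the one to compare against the phrases you want to be associated with.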
Even on personal sites, statistics are more than just an ego-boost, and they’re life-and-death on commercial pages. More isn’t always better, however: just as you can oversimplify your current situation, you can also bury the information you actually care about in obtuse stats. Web counter or log reader, you’ll get your basic data all the same. It’s what you need beyond the basics that makes your choice so tricky.
Adding an RSS feed to your site is a good idea, regardless of whether you update constantly or once in a blue moon. If you update often, though, you’ll quickly hit one problem when trying to chart your statistics: you can see how many times the feed has been polled, just as with any other page on your site, but rarely anything more than that. If the feed shows only a summary, you may be able to look at the hits for the specific article page, which is fair enough. That doesn’t work, however, if your readers get the whole post up front.
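Raw logs only get you so far with a feed. This Python sketch assumes Apache's combined log format and a hypothetical feed at `/rss.xml`, and it counts polls and distinct polling addresses. It cannot tell you two things. First, a web-based aggregator fetches the feed once on behalf of all its subscribers, so one address may stand for many readers. Second, nothing in the log says which entries anyone actually read.

```python
import re

# Hypothetical feed location; substitute your own.
FEED_PATH = "/rss.xml"

# Start of an Apache common/combined log line: host, ident, user, [time], "METHOD path ...", status
REQUEST = re.compile(r'(?P<host>\S+) \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" (?P<status>\d{3})')


def feed_polls(lines):
    """Return (number of feed requests, number of distinct client addresses)."""
    polls = 0
    clients = set()
    for line in lines:
        match = REQUEST.match(line)
        if match and match["path"] == FEED_PATH:
            # Counts 200s and 304 Not Modified alike: both are polls.
            polls += 1
            # One address may be a web aggregator polling for many subscribers.
            clients.add(match["host"])
    return polls, len(clients)
```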
This is where Feedburner (www.feedburner.com) comes in. It has other useful features too. It can convert your feed into RSS or Atom on demand, and it produces a human-readable version for anyone who enters the URL directly. Most usefully here, it gathers statistics on exactly which entries have been seen and clicked through. You simply point it at your current feed, wherever that happens to be, and it republishes it at a new address, for instance http://feeds.feedburner.com/richardcobbett. If the original feed moves, you update Feedburner with the new location, and everyone subscribed will continue to receive your latest posts without having to re-subscribe. There are some commercial elements to it as well, but the basic service is completely free to use, and highly recommended.
Another way around this is to sign up with a service like Bloglines (www.bloglines.com) and subscribe to your own feed. This will show you how many people have subscribed to it on that service at least, giving you a rough idea of your regular numbers. However, you would need to repeat the process on every feed aggregator in the world to get a full picture of your success.

