Analytics spam: can you trust your website data?

We’ve all suffered with email spam for years.  Despite the combined efforts of Google, Microsoft and our corporate IT departments, we still get unsolicited mail in our inboxes and lose genuine mail to the dreaded junk folder.

For most, it’s an annoying fact of email life. Whilst spam can certainly affect productivity, for most business users it doesn’t pose any major risks.

A new spam in town

Since the beginning of 2015 there has been an increase in a new, altogether more insidious, kind of spam. Unscrupulous individuals and companies are polluting our web analytics data with “ghost” traffic, fake referrals and even bogus goal completions.

Unlike email spam, this is more than a minor annoyance. Left unchecked, the inflated numbers could have a serious effect on existing marketing and business reporting. That in turn could impact business decisions based on Google Analytics data.

Analytics spam, really?

Yes, really. It’s not a small-scale problem either. Take a look at the chart below, which shows a real example from a site with a reasonable level of traffic.

Weekly sessions including spam

If you were the marketing manager responsible for the website, you’d be feeling pretty happy about the traffic profile from week 20 onwards.

However, if we remove the spam, you’ll see that a large proportion of that traffic was fake.

Weekly sessions with spam removed

On a smaller site, the spam traffic could easily overwhelm the real numbers.

What are they doing?

Spammers are hitting your analytics in two ways. First, they are hijacking your Google Analytics code and adding it to a dummy site. That creates genuine-looking “ghost” traffic that is recorded against your Google Analytics account.

Importantly, it also allows them to create false Google Analytics “Events”. If you’re using events to drive your GA goals, this can have serious implications for your KPI reporting.

Secondly, they are setting up links to your site on a dummy page and using an automated “bot” to click on them. This shows in your reporting as a genuine referrer sending traffic to your site.

What are they trying to achieve?

Like all spammers, they want you to visit their site. When you look in your reporting, here’s what you are likely to see:

Referral list

Wow! Look at all the traffic that came from… This leads you to visit the site, where they try to sell you something, usually analytics related.

Genuine companies are being duped by unscrupulous “agencies” that use these techniques to drive traffic to their sites.

What does that mean for my reporting?

The first thing you’ll notice is a big increase in “Referral” traffic, often to your home page. That will increase the number of users, sessions and pageviews:

Top line stats

Because the spam bots visit only one page, and only for a short time, your engagement metrics also suffer. Pages per session and time on site fall, while bounce rate rises:

Engagement stats

How can you fix it?

Fixing the initial problem is relatively simple if you understand Google Analytics filters and segments.

To stop people “ghosting” your website, you need to set up a hostname “Filter” so that Google Analytics only records traffic generated by your website. If you are recording traffic from multiple sites, take care to include all the valid domains.
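As a rough illustration of the idea, here is a minimal Python sketch of the kind of include pattern a hostname filter relies on. The hostnames are placeholders and the helper names are made up for this example, so treat it as a way to sanity-check your own pattern rather than a finished recipe.

```python
import re

# Placeholder hostnames: replace with every domain that legitimately
# carries your tracking code (main site, shop, booking pages and so on).
VALID_HOSTNAMES = ["example.com", "shop.example.com"]


def build_hostname_pattern(hostnames):
    """Build a regex that matches only the listed hostnames, with or without 'www.'."""
    escaped = [re.escape(name) for name in hostnames]
    return r"^(www\.)?(" + "|".join(escaped) + r")$"


def is_valid_hostname(hostname, pattern):
    """Return True if the hostname recorded against a hit is one of ours."""
    return re.match(pattern, hostname) is not None


pattern = build_hostname_pattern(VALID_HOSTNAMES)

# Check the pattern against a few sample hostnames before relying on it.
for host in ["www.example.com", "shop.example.com", "(not set)", "example.com.spam.xyz"]:
    print(host, is_valid_hostname(host, pattern))
```

Anchoring the pattern at both ends matters: without it, a hostname that merely contains one of your domains would slip through.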

Analytics Edge has a great post explaining the technical process.

Filters only apply from the day they are set up, so you should aim to do this as soon as possible.

That leaves your existing data, which will still be polluted with the spam traffic. To clean the historical reports you’ll need to create a new “Segment” in Google Analytics that filters out the spam.

Analytics Edge also explain how to set this up and have created a shared segment in the Google Analytics solutions gallery.
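If you export your referral data, the logic behind such a segment can be sketched in a few lines of Python. Everything here is illustrative: the referrer domains, the session counts and the function names are invented for the example and are not taken from the Analytics Edge segment.

```python
import re

# Placeholder spam referrers; the real list comes from your own reports.
SPAM_REFERRERS = ["spam-referrer.example", "free-traffic.example"]


def build_spam_pattern(domains):
    """Compile a regex matching each listed domain and any of its subdomains."""
    escaped = "|".join(re.escape(domain) for domain in domains)
    return re.compile(r"(^|\.)(" + escaped + r")$", re.IGNORECASE)


def remove_spam(rows, pattern):
    """Keep only the rows whose source does not match the spam pattern."""
    return [row for row in rows if not pattern.search(row["source"])]


# Invented sample rows standing in for an exported referrals report.
rows = [
    {"source": "google", "sessions": 420},
    {"source": "spam-referrer.example", "sessions": 310},
    {"source": "www.free-traffic.example", "sessions": 150},
    {"source": "partner-site.example", "sessions": 35},
]

clean = remove_spam(rows, build_spam_pattern(SPAM_REFERRERS))
total_sessions = sum(row["sessions"] for row in rows)
clean_sessions = sum(row["sessions"] for row in clean)
print("Sessions including spam:", total_sessions)
print("Sessions with spam removed:", clean_sessions)
```

In this made-up data, around half of the reported sessions disappear once the spam referrers are excluded, which is the same effect the segment has on your historical reports.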

Once imported, the new segment will show in the drop-down list:

Segment details

It’s important to remember to use this for future reporting to ensure spam is filtered out.

An ongoing battle

The battle against the new spammers is ongoing. Each week, they find different, more devious ways to pollute our data.

The hostname filter is a one-off fix which all sites should implement as a matter of course. This will remove a large percentage of the fake traffic.

The filtered segment will need to be updated with new domains and referrers periodically to exclude new traffic sources.
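One way to spot candidates for that update is to look for referrers that behave the way spam bots do: one page per session and next to no time on site. The sketch below assumes an exported report with the columns shown; the thresholds, field names and data are assumptions for illustration, and anything it flags should be checked by hand before it goes into the segment.

```python
# Referrers already excluded by the current segment (placeholder value).
KNOWN_SPAM = {"spam-referrer.example"}


def suspicious_referrers(rows, known_spam, min_sessions=50):
    """Flag unknown referrers with bot-like engagement.

    The thresholds are assumptions, not Google Analytics rules:
    at least `min_sessions` sessions, one page per session and
    an average session duration under one second.
    """
    flagged = []
    for row in rows:
        if row["source"] in known_spam:
            continue  # already handled by the segment
        looks_like_bot = (
            row["sessions"] >= min_sessions
            and row["pages_per_session"] <= 1.0
            and row["avg_duration_s"] < 1
        )
        if looks_like_bot:
            flagged.append(row["source"])
    return flagged


# Invented sample rows standing in for an exported referrals report.
rows = [
    {"source": "google", "sessions": 420, "pages_per_session": 3.2, "avg_duration_s": 95},
    {"source": "spam-referrer.example", "sessions": 310, "pages_per_session": 1.0, "avg_duration_s": 0},
    {"source": "new-spam.example", "sessions": 120, "pages_per_session": 1.0, "avg_duration_s": 0},
    {"source": "tiny-blog.example", "sessions": 3, "pages_per_session": 1.0, "avg_duration_s": 0},
]

candidates = suspicious_referrers(rows, KNOWN_SPAM)
print("Referrers to review:", candidates)
```

The minimum session count is there so that a small, genuine referrer with a handful of short visits is not flagged by mistake.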

If you outsource your analytics to us then you can rest easy. We’ve already added the filters to your GA account and will be regularly updating/distributing the filtered segment.

Need advice?

If you are concerned about the integrity of your data then why not drop us a quick line?

If you’d like to keep up to date on the latest thinking, sign up for our email newsletter below.