
Welcome to a Beginner’s Guide to Removing Referral Spam in Google Analytics.

In this guide, I will teach you how to remove or block referrer spam. First, we will start with the basics.

What is Referrer Spam?

Referrer spam occurs when bots send fake referral traffic to your site and Google Analytics (GA) records that fake traffic as if it were real.

What is a Bot and What Does It Do?

A bot is a program developed to perform repetitive tasks with a high degree of accuracy and speed. Bots that browse the web are often called crawlers.

Bots are mostly used for indexing web pages (reading the contents of web pages).

Good Bots:

Googlebot is an example of a good bot. Google uses Googlebot to crawl and index pages on the internet, every day and across web pages of all types. This is how Google keeps its search results so up to date.

Good bots obey the rules in a site’s “robots.txt” file, but bad bots don’t. Bad bots can create fake user accounts, send spam emails, steal email addresses and get around CAPTCHAs.
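To make “obeying robots.txt” concrete, here is a minimal sketch using Python’s standard urllib.robotparser module. The robots.txt content, the /private/ path and the ExampleBot user agent are made-up examples. Notice that the check runs on the bot’s side: obeying the file is voluntary, which is exactly why bad bots can simply skip it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to stay out of /private/
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def polite_can_fetch(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    # A good bot runs this check before every request; a bad bot never does.
    for url in ("https://example.com/blog/", "https://example.com/private/stats"):
        print(url, "->", polite_can_fetch("ExampleBot", url))
```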

Bad Bots:

Bad bots are mostly used for black hat techniques such as:

  • artificially inflating website traffic
  • click fraud
  • scraping websites
  • spreading malware (viruses)
  • harvesting email addresses

Bad bots use many methods to hide from security tools. They can pretend to be a web browser (like Chrome) or disguise their traffic as coming from a legitimate website.

They send HTTP requests to websites with a fake referrer (Referer) header, which also helps them avoid being detected as bots.

The fake referrer header contains the URL of the website the spammer wants to promote and/or build backlinks to.
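To see why a referrer can’t be trusted, remember that the Referer header (HTTP’s historical spelling) is just a value the client chooses to put in its request. This Python sketch builds, but never sends, such a request with the standard urllib.request module. Both URLs and the User-Agent string are made-up placeholders.

```python
from urllib.request import Request

def build_spoofed_request(target_url: str, fake_referrer: str) -> Request:
    """Build (but do not send) a request whose headers are chosen entirely by the client."""
    return Request(
        target_url,
        headers={
            # Placeholder string that pretends to be an ordinary desktop browser
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
            # Nothing stops the client from claiming any referrer it likes
            "Referer": fake_referrer,
        },
    )

if __name__ == "__main__":
    req = build_spoofed_request("https://your-site.example/", "http://spam-site.example/")
    # urllib normalises header names with str.capitalize(), e.g. "User-agent"
    for name, value in req.header_items():
        print(f"{name}: {value}")
```

Your server (and GA, if the bot runs JavaScript) simply records whatever the client sent.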

When they do this, the request is recorded in your server logs. The spammer’s hope is that this referrer value is treated as a backlink that boosts the search engine ranking of the site being promoted.

They can slip past the bot filtering used by Google Analytics (GA), which is why you see spam traffic in your GA ‘Referrals’ reports.

Most bots don’t run JavaScript, but some do. Bots that do run JavaScript show up as hits in GA reports and distort your traffic data and any session-based metric, such as bounce rate and conversion rate.
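A quick worked example shows how much a batch of JavaScript-running bots can distort a session-based metric. The session counts below are hypothetical, and the sketch assumes every bot session is a single-page visit that bounces.

```python
def blended_bounce_rate(real_sessions: int, real_bounces: int,
                        bot_sessions: int, bot_bounce_rate: float = 1.0) -> float:
    """Bounce rate GA would report once bot sessions are mixed in with real ones."""
    # Assumption: bot_bounce_rate of 1.0 means every bot session bounces
    bot_bounces = bot_sessions * bot_bounce_rate
    return (real_bounces + bot_bounces) / (real_sessions + bot_sessions)

if __name__ == "__main__":
    # Hypothetical site: 1,000 real sessions with 400 bounces (40%)
    print(f"{blended_bounce_rate(1000, 400, 0):.0%}")    # 40%
    # The same site after 500 bot sessions that all bounce
    print(f"{blended_bounce_rate(1000, 400, 500):.0%}")  # 60%
```

Nothing about the real visitors changed, yet the reported bounce rate jumps by 20 percentage points.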

Bots That Don’t Use JavaScript

Bots that don’t use JavaScript (like Googlebot), on the other hand, do not mess up your GA data. However, their visits are still recorded in your server log files. They still consume your server resources and bandwidth, and they can even hurt your website’s performance.
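Because these visits only show up in your server logs, it can help to look there directly. Here is a minimal Python sketch that tallies referrers from access-log lines. It assumes your server writes the common Apache/Nginx “combined” log format, and the sample lines, IPs and spam domain are made up.

```python
import re
from collections import Counter

# Assumed "combined" log format:
# host ident user [time] "request" status bytes "referrer" "user-agent"
COMBINED_LOG = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def count_referrers(lines):
    """Count how often each referrer appears in combined-format log lines."""
    counts = Counter()
    for line in lines:
        match = COMBINED_LOG.match(line)
        # "-" means the client sent no referrer at all
        if match and match["referrer"] not in ("", "-"):
            counts[match["referrer"]] += 1
    return counts

if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [10/Oct/2016:13:55:36 +0000] "GET / HTTP/1.1" 200 5120 '
        '"http://spam-site.example/" "Mozilla/5.0"',
        '198.51.100.7 - - [10/Oct/2016:13:56:01 +0000] "GET /about HTTP/1.1" 200 2048 '
        '"-" "Googlebot/2.1"',
    ]
    print(count_referrers(sample).most_common())
```

On a real server you would pass the lines of your access log file instead of the sample list; where that file lives depends on your host.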

If you can’t see a problem in your GA reports but your site is still acting strangely, check out our other article on bots that don’t use JavaScript and how to defend against them.

Can It Get Any Worse? YES! It Can.

Botnets:

A botnet is a network of infected computers, spread across different IPs and countries and sending traffic at different rates, all controlled by one source. The computers act like zombies, if you will, obeying a leader computer (the spammer). The bigger the network, the more IPs it uses, which means you can’t just block IPs or limit the request rate.

Botnets can also rotate through dozens of fake referrer headers, and if they are using a VPN, IP blocking is useless. This means that if you block a spam referrer with a GA filter or in your .htaccess file, there is no guarantee that you have blocked it completely.
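Why rate limiting struggles against a botnet is easy to see in code. Below is a minimal per-IP sliding-window rate limiter in Python; the limit (10 requests per 60 seconds) and the IP addresses are arbitrary examples. The same 100 requests are mostly stopped when they come from one IP, but all get through when spread across 100 IPs.

```python
from collections import defaultdict, deque

class PerIpRateLimiter:
    """Allow at most `limit` requests per IP within a sliding `window` of seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.history = defaultdict(deque)  # ip -> timestamps of recently allowed requests

    def allow(self, ip: str, now: float) -> bool:
        timestamps = self.history[ip]
        # Forget requests that have slid out of the window
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True

def blocked_count(requests, limiter):
    """Count how many (ip, time) requests the limiter rejects."""
    return sum(not limiter.allow(ip, t) for ip, t in requests)

if __name__ == "__main__":
    # One machine sending 100 requests in 10 seconds...
    single_bot = [("203.0.113.5", i * 0.1) for i in range(100)]
    # ...versus a botnet of 100 machines sending one request each
    botnet = [(f"198.51.100.{i}", i * 0.1) for i in range(100)]
    print(blocked_count(single_bot, PerIpRateLimiter(limit=10, window=60)))  # 90
    print(blocked_count(botnet, PerIpRateLimiter(limit=10, window=60)))      # 0
```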

Infection Bots

Botnets get new computers onto their network by infecting them with malware. Those computers become zombies of the botnet, and most of the time their owners don’t even realise it.

Sad Truth:

If you decide to block botnets, you will most likely block traffic from real people as well. Whatever you do, though, don’t click on the links in your ‘Referrals’ reports, as they might try to infect your computer.

What You Can Do About It

Check Your Reports

Go to your Referrals report and sort it by bounce rate in descending order (you can also download the report if you prefer). Referrers with a bounce rate of 100% and 40 or more sessions are probably spam.
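If you download the report, the same check can be scripted. This Python sketch assumes a CSV export with Source, Sessions and Bounce Rate columns (your export’s column names and number formatting may differ) and applies the rule of thumb above: a 100% bounce rate and 40 or more sessions. The domains and figures are made up.

```python
import csv
import io

def likely_spam(rows, min_sessions=40):
    """Return sources with a 100% bounce rate and at least `min_sessions` sessions."""
    suspects = []
    for row in rows:
        # Assumed formats: sessions like "1,234", bounce rate like "100.00%"
        sessions = int(row["Sessions"].replace(",", ""))
        bounce_rate = float(row["Bounce Rate"].rstrip("%"))
        if bounce_rate >= 100.0 and sessions >= min_sessions:
            suspects.append(row["Source"])
    return suspects

if __name__ == "__main__":
    # Hypothetical export; real GA exports may use different column names
    export = io.StringIO(
        "Source,Sessions,Bounce Rate\n"
        'news.example.com,"1,234",45.20%\n'
        "spam-site.example,57,100.00%\n"
        "small-blog.example,3,100.00%\n"
    )
    print(likely_spam(csv.DictReader(export)))  # ['spam-site.example']
```

Treat the result as a shortlist to check, not a verdict: a small legitimate site can also hit a 100% bounce rate.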

Bot Filtering

It’s definitely not foolproof, but try using GA’s “Bot Filtering” feature, which excludes hits from known bots.

If You Can’t Identify It

If you still can’t identify a referrer, you might have to visit the site to make sure it is legitimate. Make sure you have anti-virus/anti-malware protection on both your site and your computer before you visit any website you can’t identify.

List of Known Domains

I have put together a list of suspicious referring sites, linked below. If a referrer is on that list, chances are it is spam, and you don’t need to visit the website to check.

Click Here To View The List (the list is updated every so often)

Block them from appearing in your reports.

You can do this by adding a custom filter in GA that excludes the spam referrers.
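In GA this is typically a custom Exclude filter on the Campaign Source field, and the filter pattern is a regular expression, so a list of spam domains can be joined into a single pattern. This Python sketch builds such a pattern, escaping the dots so they only match a literal “.”, and checks which sources it would catch. The domain list is only an example, and is_excluded is a rough stand-in for GA’s matching, not GA’s actual code.

```python
import re

def build_exclude_pattern(domains):
    """Join spam domains into one regex alternation, escaping the dots."""
    # Assumes plain domain names (letters, digits, hyphens and dots only)
    return "|".join(domain.replace(".", r"\.") for domain in domains)

def is_excluded(source, pattern):
    """Rough stand-in for a filter that excludes any source containing a match."""
    return re.search(pattern, source, re.IGNORECASE) is not None

if __name__ == "__main__":
    pattern = build_exclude_pattern(["luxup.ru", "spam-site.example"])
    print(pattern)  # luxup\.ru|spam-site\.example
    for source in ("www.luxup.ru", "google", "spam-site.example"):
        print(source, "->", is_excluded(source, pattern))
```

Paste the printed pattern into the filter and use GA’s own filter verification, if available, to confirm it behaves the same way there.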

Use a WAF

A web application firewall (WAF) acts as a line of defence between your web server and the internet. It is probably the fastest way to sort out the problem. Also, most WAF services cache your site, so if your server goes down, your site will still function and be viewable.

Use Google Chrome

We recommend browsing with Google Chrome, which warns you when you try to visit websites known to deploy malware.

Block referrer used by a bot

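Go to your .htaccess file and add the following (this assumes an Apache server with mod_rewrite enabled):

CODE:
RewriteEngine On
Options +FollowSymLinks
# Match http:// and https:// referrers from luxup.ru or any subdomain, ignoring case ([NC])
RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)*luxup\.ru [NC]
# Refuse matching requests with 403 Forbidden ([F])
RewriteRule .* - [F]
CODE END:

This blocks HTTP and HTTPS referrals from luxup.ru and its subdomains. To block several domains, add one RewriteCond line per domain and end every one except the last with [NC,OR].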
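Before uploading a rule like this, you can sanity-check the regular expression. Python’s re module treats this particular pattern the same way Apache’s PCRE-based mod_rewrite does, so this sketch runs the core of the RewriteCond pattern against a few sample referrers, with re.IGNORECASE standing in for the [NC] flag. Apart from luxup.ru, the sample URLs are made up.

```python
import re

# Core of the RewriteCond pattern; re.IGNORECASE plays the role of [NC]
REFERRER_PATTERN = re.compile(r"^https?://([^.]+\.)*luxup\.ru", re.IGNORECASE)

def is_blocked(referrer: str) -> bool:
    """True if the referrer would match the RewriteCond pattern."""
    return REFERRER_PATTERN.search(referrer) is not None

if __name__ == "__main__":
    for ref in ("http://luxup.ru/", "https://www.luxup.ru/page",
                "HTTPS://LUXUP.RU", "https://example.com/"):
        print(ref, "->", is_blocked(ref))
```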

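Block the IP address used by the spam bot

To block a single IP, add the following to your .htaccess file:

CODE:
Order Deny,Allow
Deny from 234.45.12.33
CODE END:

These Order/Deny directives are Apache 2.2 syntax. On Apache 2.4 they only work if mod_access_compat is enabled; the native 2.4 equivalent is a Require not ip 234.45.12.33 line inside a <RequireAll> block that also contains Require all granted.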

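Block the IP address range

If you are sure that a range of IPs is bad, you can block the whole range.

CODE:
Order Allow,Deny
Allow from all
Deny from 86.239.34.0/24
CODE END:

With Order Allow,Deny, a client that matches both the Allow and the Deny line is denied, so everyone except the listed range gets through.

CIDR (Classless Inter-Domain Routing) notation is a method for representing a range of IPs.

Blocking by CIDR is better than blocking individual IPs one by one, and it keeps your .htaccess file much shorter.

86.239.34.0/24 is the CIDR range. The /24 means the first 24 bits (86.239.34) are fixed, so it covers the 256 addresses from 86.239.34.0 to 86.239.34.255.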
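Python’s standard ipaddress module is a handy way to check what a CIDR range actually covers before you block it. Keep in mind that an IPv4 prefix length must be between 0 and 32, so a prefix such as /44 is rejected as invalid; the sketch shows that too. The test addresses are arbitrary examples.

```python
import ipaddress

def describe_range(cidr: str):
    """Return the first address, last address and size of an IPv4 CIDR range."""
    network = ipaddress.ip_network(cidr)
    return network.network_address, network.broadcast_address, network.num_addresses

def in_range(ip: str, cidr: str) -> bool:
    """True if the IP address falls inside the CIDR range."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)

if __name__ == "__main__":
    first, last, size = describe_range("86.239.34.0/24")
    print(first, last, size)                           # 86.239.34.0 86.239.34.255 256
    print(in_range("86.239.34.77", "86.239.34.0/24"))  # True
    print(in_range("86.239.35.1", "86.239.34.0/24"))   # False
    try:
        ipaddress.ip_network("86.239.34.0/44")  # prefix longer than 32 bits
    except ValueError as err:
        print("invalid:", err)
```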

Use custom alerts to monitor unusual spikes.

If you are using GA, you can set up custom alerts so that you quickly detect and fix issues and minimise their impact.
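GA’s custom alerts are configured in its interface rather than in code, but the idea behind them is a simple threshold check. As a rough illustration (an assumed rule, not GA’s own logic), this Python sketch flags any day whose sessions jump well above the previous week’s average. The 7-day baseline, the 3-standard-deviation threshold and the session counts are all arbitrary assumptions.

```python
from statistics import mean, stdev

def find_spikes(daily_sessions, baseline_days=7, threshold=3.0):
    """Flag days whose sessions exceed the trailing mean by `threshold` standard deviations."""
    spikes = []
    for i in range(baseline_days, len(daily_sessions)):
        baseline = daily_sessions[i - baseline_days:i]
        avg, spread = mean(baseline), stdev(baseline)
        # max(..., 1.0) stops a perfectly flat baseline from flagging tiny changes
        if daily_sessions[i] > avg + threshold * max(spread, 1.0):
            spikes.append(i)
    return spikes

if __name__ == "__main__":
    # Hypothetical daily sessions with a bot-driven jump on day 9
    sessions = [120, 131, 118, 125, 140, 122, 128, 135, 126, 610, 130]
    print(find_spikes(sessions))  # [9]
```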

Little Tips:

 

  • Don’t try to exclude referrer spam using the ‘Referral Exclusion List’. It won’t remove the spam; excluded referrers are simply reported as direct traffic instead.
  • Create an annotation on your GA charts explaining what caused any unusual spike.

Important Note For PCs

Without the right protection (anti-virus/anti-malware), your machine could be in danger.

Important Note For Macs

Bot infections are less likely on Macs, but you still need to be aware, as a few Mac botnets are emerging (such as iWorm).

Keep your Mac updated with the latest version of OS X and consider investing in some protection (anti-virus/anti-malware) to be safe.
