The past couple of years have seen an increasing number of algorithmic penalties handed out by Google, and they have drastically changed the look of its search results. First we had Google Panda, which algorithmically lowers the search visibility of sites filled with low quality content, whether that is scraped or spun content or just thin pages full of boilerplate text and ads. Then, in April 2012, we had Google Penguin, which algorithmically lowers the search visibility of sites with links from low quality content and heavy use of exact match anchor text.
I’m not going to get into a debate about whether these updates were a good or a bad thing, but we have to accept the facts. With the recent news that Panda updates will now be rolled out automatically rather than manually, and with Penguin 2.0 on the horizon, it’s fair to assume these spam fighters are here to stay.
Since the first iterations of Panda, I have had clients whose blogs were being outranked by websites that had scraped their content, with Google failing to recognise my clients as the original source. I am not talking about syndication, which is a normal part of the web and relies on an agreement between webmasters, but about people who steal your content without any permission or attribution. Now that Penguin has been added to the mix, I am also seeing problems caused by websites that steal your content and then link back to you.
Many webmasters add internal links to pages on their sites using the anchor text they want each page to rank for. This in itself is not a huge issue. However, scrapers often copy those internal links along with your text, so if a few dozen scraper sites steal your content, each copy becomes another external link to your page with the same exact match anchor text, and you can soon find that your site is in trouble.
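To see why a few dozen scraped copies matter, it helps to look at what share of your backlinks use one exact match phrase. This is a purely illustrative sketch: the link data, the domains and the phrase "blue widgets" are hypothetical, and the numbers say nothing about any threshold Google actually uses.

```python
def anchor_share(links, phrase):
    """Return the fraction of backlinks whose anchor text exactly matches phrase.

    links is a list of (source_url, anchor_text) pairs; matching ignores
    case and surrounding whitespace.
    """
    if not links:
        return 0.0
    target = phrase.strip().lower()
    hits = sum(1 for _, anchor in links if anchor.strip().lower() == target)
    return hits / len(links)


# Hypothetical natural link profile: mostly branded or varied anchors.
natural = [
    ("https://example-news.com/review", "Example Widgets"),
    ("https://example-blog.net/post", "this guide"),
    ("https://example-forum.org/thread", "https://example.com/"),
    ("https://example-partner.com/", "blue widgets"),
]

# Each scraped copy keeps your internal link, so it arrives as one more
# external link carrying the same exact match anchor text.
scraped = [(f"https://scraper{i}.example/copy", "blue widgets") for i in range(30)]

print(anchor_share(natural, "blue widgets"))            # 0.25
print(anchor_share(natural + scraped, "blue widgets"))  # about 0.91
```

Thirty scraped copies are enough to turn one exact match anchor out of four into the overwhelming majority of the profile.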
Cleaning Up Scraped Content
When people come to me for help with a ranking penalty, I handle it in a very process-oriented way. Other people will have their own opinions on the steps to take to remove pages that are stealing your content.
This is the exact process I use for cleaning up scraped content for my clients.
1. Find Out Who Is Scraping Your Website
There are a number of paid and free tools you can use to find out who is copying your content. I always suggest starting by checking the links pointing to your website.
Google Webmaster Tools is entirely free and provides a list of the links pointing to your website, and Bing Webmaster Tools also gives you free access to the links Bing finds pointing to your site. If Webmaster Tools doesn’t show enough links, it might be worth investing in a link analysis package such as ahrefs or MajesticSEO. I like to download the links into an Excel file or a Google Docs spreadsheet.
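Once the links are exported, sorting referring domains by how many of their pages link to you is one way to decide where to start checking, since a site that has copied many of your posts tends to link to you many times. A minimal sketch, assuming a CSV export with a column named `source_url`; that column name is hypothetical, so change it to match the header in your own export.

```python
import csv
import io
from collections import Counter
from urllib.parse import urlsplit


def referring_domains(csv_file, url_column="source_url"):
    """Count linking pages per referring domain in a CSV link export.

    url_column is an assumed column name; exports from different tools
    use different headers.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # hostname is lowercased by urlsplit and is None for relative URLs.
        host = urlsplit(row[url_column].strip()).hostname or ""
        if host.startswith("www."):
            host = host[4:]  # treat www.example.com and example.com as one site
        if host:
            counts[host] += 1
    return counts.most_common()


# Usage with an in-memory sample; for a real export, pass
# open("links.csv", newline="", encoding="utf-8") instead.
sample = io.StringIO(
    "source_url,anchor_text\n"
    "https://www.scraper-a.example/p1,blue widgets\n"
    "https://scraper-a.example/p2,blue widgets\n"
    "https://news.example/story,Example Widgets\n"
)
print(referring_domains(sample))
# [('scraper-a.example', 2), ('news.example', 1)]
```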
Next, take a page on your site that has a lot of links pointing to it, copy a unique sentence from it, and search Google for that sentence in quotation marks, so that only pages containing that exact phrase are returned.
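If you have many pages to check, the quoted searches can be prepared in bulk and then opened by hand. A minimal sketch, assuming you already have each page’s text: picking the longest sentence of at least eight words is a hypothetical heuristic of mine for "unique", and the code only builds search URLs rather than fetching Google’s results.

```python
import re
from urllib.parse import urlencode


def pick_sentence(text, min_words=8):
    """Pick the longest sentence with at least min_words words.

    Longer sentences are less likely to appear elsewhere by coincidence;
    this is a rough heuristic, not a guarantee of uniqueness.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    candidates = [s for s in sentences if len(s.split()) >= min_words]
    return max(candidates, key=len) if candidates else None


def exact_match_search_url(sentence):
    """Build a Google search URL for the sentence wrapped in quotation marks."""
    return "https://www.google.com/search?" + urlencode({"q": f'"{sentence}"'})


page_text = (
    "Short intro. Our widget uses a patented double-hinge design "
    "that folds flat in under three seconds. Buy now!"
)
sentence = pick_sentence(page_text)
print(exact_match_search_url(sentence))
```

`urlencode` percent-encodes the quotation marks as `%22` and turns spaces into `+`, so the exact match operator survives in the URL.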
If someone is copying your content, note it in your spreadsheet and move on to the next page. It is also a good idea to record any contact pages or email addresses you find.
(This is a labour intensive process, so it may be more cost effective to hire a freelancer on oDesk or Elance to do this data entry for you.)
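If you would rather keep the tracking sheet as a plain CSV file that Excel or Google Docs can import, a minimal sketch follows. The `ScraperRecord` class, its column names and the status values are my own hypothetical suggestion, not a standard format.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ScraperRecord:
    """One row of the (hypothetical) scraper tracking sheet."""
    original_url: str
    copied_url: str
    contact: str = ""      # contact page URL or email address, if found
    status: str = "found"  # e.g. found, contacted, dmca_filed, removed


def write_log(records, out_file):
    """Write the records as CSV with a header row."""
    writer = csv.DictWriter(out_file, fieldnames=[f.name for f in fields(ScraperRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


out = io.StringIO()
write_log(
    [ScraperRecord("https://example.com/guide", "https://scraper.example/guide",
                   "webmaster@scraper.example", "contacted")],
    out,
)
print(out.getvalue())
```

For a real file, open it with `newline=""` so the csv module controls line endings.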
2. Make Contact with the Scrapers
I always start by contacting the webmaster and asking them to remove the content. If it is a high quality site and a writer has stolen my article to pass off as their own work, I might instead ask the webmaster to replace the content in question with a new version rewritten by me. You can quickly gauge the quality of a site with a browser plugin such as SEOQuake.
If you can’t find an email address, social media account or contact form on the website concerned, use a whois lookup tool to find the domain owner’s contact information.
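A whois lookup is a simple text protocol (RFC 3912: open TCP port 43, send the query followed by CRLF, read until the server closes the connection), so it can also be scripted. A minimal sketch, assuming a .com domain: Verisign’s registry server usually returns only registrar details plus a "Registrar WHOIS Server" line, so you may need to repeat the query against that server, and many domains hide the owner behind a privacy service. The function names are hypothetical.

```python
import re
import socket


def whois_query(domain, server="whois.verisign-grs.com", timeout=10):
    """Send a raw WHOIS query over TCP port 43 and return the text reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
REFERRAL_RE = re.compile(r"Registrar WHOIS Server:[ \t]*(\S+)", re.IGNORECASE)


def extract_emails(whois_text):
    """Return unique email addresses in order of first appearance."""
    return list(dict.fromkeys(EMAIL_RE.findall(whois_text)))


def referral_server(whois_text):
    """Return the registrar's WHOIS server named in the reply, if any."""
    match = REFERRAL_RE.search(whois_text)
    return match.group(1) if match else None
```

A typical run would be `reply = whois_query("scraper-site.com")`, then `whois_query("scraper-site.com", referral_server(reply))` when a referral is present, and finally `extract_emails(...)` on the registrar’s reply.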
3. File a DMCA Request
I always use this option as a last resort, mainly because I believe in being a good web citizen. I find that if people are doing something wrong, it is better to ask them to stop or try to educate them than to hit them with legal notices straight away. Some webmasters, however, will refuse or simply ignore you; if this is the case, I file a complaint with their web host and/or Google.
The best tool I have found for identifying a site’s web host is whoishostingthis.com. Simply enter the domain of the site concerned into the search box, and within seconds it will give you the hosting provider’s name and web address so you can raise a DMCA complaint.
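A rough scripted alternative, not necessarily how whoishostingthis.com works, is to resolve the site’s IP address and look up who holds that block of addresses. A minimal sketch, assuming IPv4: it asks whois.iana.org which regional registry covers the address (the `refer:` line) and then queries that registry. The holder is often, but not always, the web host; sites behind a CDN will show the CDN instead. Function names are hypothetical.

```python
import re
import socket


def raw_whois(query, server, timeout=10):
    """Raw WHOIS query over TCP port 43 (RFC 3912)."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Matches "Key: value" lines; comment lines starting with % or # are skipped.
FIELD_RE = re.compile(r"^[ \t]*([A-Za-z][\w-]*):[ \t]*(\S.*?)[ \t]*$", re.MULTILINE)


def parse_fields(whois_text):
    """Collect 'key: value' lines into a dict of lowercased key -> list of values."""
    result = {}
    for key, value in FIELD_RE.findall(whois_text):
        result.setdefault(key.lower(), []).append(value)
    return result


def hosting_network(domain):
    """Resolve the domain's IPv4 address and return (ip, registry fields)."""
    ip = socket.gethostbyname(domain)
    iana = parse_fields(raw_whois(ip, "whois.iana.org"))
    server = iana.get("refer", ["whois.arin.net"])[0]
    return ip, parse_fields(raw_whois(ip, server))
```

Field names differ between registries: ARIN replies use keys such as `OrgName` and `OrgAbuseEmail`, while RIPE uses lowercase keys such as `org-name`, so inspect the returned dict rather than assuming one layout.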
In your DMCA request, make sure you provide the URL of your original page, the URL of the page that has stolen your content and that you want removed, and details of any attempts you have made to resolve the issue with the webmaster directly.
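Those details, together with the statements a notice under the US DMCA (17 U.S.C. § 512(c)(3)) is expected to include (a good faith belief statement, an accuracy statement under penalty of perjury, your signature and contact details), can be assembled into a plain-text notice. A minimal sketch: the wording, the function name `dmca_notice` and its parameters are hypothetical, and many hosts provide their own form or list extra fields they require.

```python
def dmca_notice(name, email, original_url, infringing_url, attempts):
    """Fill in a plain-text takedown notice for a web host's DMCA/abuse contact.

    attempts is a list of short descriptions of earlier contact with the
    site owner. The template wording is an illustration, not legal text.
    """
    lines = [
        "To the DMCA / abuse team,",
        "",
        f"Original work (my content): {original_url}",
        f"Infringing page to be removed: {infringing_url}",
        "",
        "Attempts to resolve this with the site owner:",
        *[f"- {attempt}" for attempt in attempts],
        "",
        "I have a good faith belief that use of the material in the manner "
        "complained of is not authorized by the copyright owner, its agent, or the law.",
        "The information in this notice is accurate, and under penalty of perjury, "
        "I am the owner, or authorized to act on behalf of the owner, of the "
        "copyright that is allegedly infringed.",
        "",
        f"Signed: {name}",
        f"Contact: {email}",
    ]
    return "\n".join(lines)


print(dmca_notice(
    "Jane Doe", "jane@example.com",
    "https://example.com/guide", "https://scraper.example/guide",
    ["Emailed webmaster@scraper.example on 1 March, no reply"],
))
```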
Many web hosts will act within a matter of hours, and in some cases they will take the whole site offline until the scraped content is removed.
Have you ever had someone scrape your content? What did you do about it? I’d love to talk about this in the comments below.