The past couple of years have seen Google hand out an increasing number of algorithmic penalties, which have drastically changed the look of its search results. First we had Google Panda, which algorithmically lowers the search visibility of sites filled with low-quality content, whether that is scraped or spun content or just thin pages full of boilerplate text and ads. Then in April 2012 we had Google Penguin, which algorithmically lowers the search visibility of sites with links from low-quality content and heavy exact-match anchor text.

Image credit: adactio

I’m not going to get into a debate about whether these updates were a good or a bad thing, but we have to accept the facts: with the recent news that Panda updates will now be rolled out automatically rather than manually, and with Penguin 2.0 on the horizon, it’s fair to assume these spam fighters are here to stay.

Since the first iterations of Panda I have had clients whose blogs were being outranked by websites that had scraped their content, with Google failing to credit the client as the original source of the document. I am not talking about syndication, which is a normal part of the web where an agreement exists between webmasters, but about people who steal your content without any permission or attribution. Now that we have Penguin in the mix, I am also seeing problems caused by websites that steal your content and then link back to you.

Many webmasters add internal links to pages on their sites using the anchor text they want that page to rank for. This in itself is not a huge issue. However, if a few dozen scraper sites then steal your content, the copied internal links become external links from low-quality sites, all pointing to your web pages with the same anchor text. That is exactly the pattern Penguin targets, and you can soon find that your site is in trouble.
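To get a feel for how concentrated your anchor text is, you could tally it from a backlink export. The sketch below is only an illustration: the column names `source_url` and `anchor_text` and the sample rows are assumptions, so adjust them to match whatever your link tool actually exports.

```python
import csv
import io
from collections import Counter


def anchor_text_shares(csv_text, anchor_column="anchor_text"):
    """Return (anchor, count, share) tuples, most common anchor first.

    The column name is an assumption; rename it to match your export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Normalise case and whitespace so "Blue Widgets" and "blue widgets " match.
    anchors = [row[anchor_column].strip().lower() for row in reader]
    counts = Counter(anchors)
    total = sum(counts.values())
    return [(anchor, n, n / total) for anchor, n in counts.most_common()]


# Hypothetical export rows, for illustration only.
sample = """source_url,anchor_text
http://scraper-one.example/post,blue widgets
http://scraper-two.example/post,Blue Widgets
http://partner.example/review,Example Widget Co
http://scraper-three.example/post,blue widgets
"""

for anchor, count, share in anchor_text_shares(sample):
    print(f"{anchor!r}: {count} link(s), {share:.0%} of the total")
```

If one commercial phrase accounts for most of your links, the pages behind those links are a sensible place to start looking for scrapers.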

Cleaning Up Scraped Content

When people come to me for help with a ranking penalty, I handle it in a very process-oriented way. Other people may well have their own opinions on the steps to take to remove pages that are stealing your content.

This is the exact process I use for cleaning up scraped content for my clients.

1. Find Out Who Is Scraping from Your Website

There are a number of paid and free tools you can use to find out who is copying your content. I always suggest you start by checking the links to your website.

Google Webmaster Tools is entirely free and provides you with a list of links pointing to your website; Bing also gives you free access to the links it finds pointing to your site. If Webmaster Tools isn’t showing enough links, it might be worth investing in a link analysis package such as ahrefs or MajesticSEO. I like to download the links into an Excel file or Google Docs.

The next step is to copy a unique sentence from a web page that has a lot of links to it, then search for that sentence in Google wrapped in quotes so that only exact matches come back.
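If you have many pages to check, building the quoted searches by hand gets tedious. Here is a rough sketch of one way to do it; the function names and the "longest sentence of at least eight words" rule are my own assumptions, not a tested heuristic.

```python
import re
from urllib.parse import quote_plus


def pick_distinctive_sentence(text, min_words=8):
    """Pick the longest sentence with at least min_words words, or None.

    "Longest" is just a stand-in for "unique"; choose by hand if you can.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    candidates = [s for s in sentences if len(s.split()) >= min_words]
    return max(candidates, key=len) if candidates else None


def exact_match_search_url(sentence):
    """Wrap the sentence in double quotes and encode it as a Google search URL."""
    # Drop trailing punctuation and any inner double quotes that would
    # break the exact-match phrase.
    phrase = sentence.strip().rstrip(".!?").replace('"', "")
    return "https://www.google.com/search?q=" + quote_plus(f'"{phrase}"')


page_text = (
    "Welcome to the blog. "
    "Our hand-built pond filters use three separate chambers to trap debris "
    "before the water reaches the pump."
)
sentence = pick_distinctive_sentence(page_text)
if sentence:
    print(exact_match_search_url(sentence))
```

Open each URL in your browser and note any results that are not your own site.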

[Screenshot: Google search for copied content]

If someone is copying your content, make a note of it on your spreadsheet and move on to the next one. It is also a good idea to note any contact pages or email addresses.

(This is a labour-intensive process, so it may be more cost-effective to hire a freelancer on oDesk or Elance to do this data entry for you.)

2. Make Contact with the Scrapers

I always try to contact the webmaster first and ask them to remove the content. If it is a high-quality site and a writer has stolen my article to pass off as their own work, I might instead ask the webmaster to replace the content in question with a new version rewritten by me. You can easily gauge the quality of a site with a plugin such as SEOQuake.

If you can’t find an email address, social media account or contact form on the website concerned, use a whois lookup tool to find their contact information.
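Whois records can be long, so once you have pasted the text from a lookup tool into a file, a few lines of code can pull out the email addresses for your spreadsheet. This is a minimal sketch: the sample record is made up, and the pattern is a simple approximation of an email address rather than a full validator.

```python
import re

# A loose pattern for email-like strings; good enough for scanning a record.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def contact_emails(whois_text):
    """Return unique email addresses in order of first appearance."""
    seen = []
    for match in EMAIL_PATTERN.findall(whois_text):
        email = match.lower()
        if email not in seen:
            seen.append(email)
    return seen


# Hypothetical whois output, for illustration only.
sample_record = """Domain Name: scraper-site.example
Registrant Email: Owner@Scraper-Site.example
Admin Email: owner@scraper-site.example
Registrar Abuse Contact Email: abuse@registrar.example
"""

print(contact_emails(sample_record))
```

Many records list a privacy service rather than the owner, in which case the registrar's abuse address is often the more useful contact.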

3. File a DMCA Request

I always use this option as a last resort, mainly because I believe in being a good web citizen. I find that if people are doing something wrong, it is better to ask them to stop or try to educate them than to hit them with legal notices straight away. Some webmasters, however, will refuse or simply ignore you; if this is the case, I will file a complaint with their web host and/or Google.

The best tool I have found for identifying a site’s web host is whoishostingthis.com. Simply enter the domain of the site concerned into the search box and in a matter of seconds it will give you the hosting provider’s name and web address so you can raise a DMCA complaint.

In your DMCA request, make sure you provide details of your original web page, the web page which has stolen your content and which you want removed, and any attempts you have made to resolve the issue with the webmaster directly.
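If you are filing several requests, it helps to keep those details in a consistent shape. The sketch below assumes the request names your original page alongside the copy; the class name, field names and wording are all hypothetical. It only gathers the details mentioned above and is not a complete legal notice, so check what the host or Google asks for on their own form.

```python
from dataclasses import dataclass, field


@dataclass
class TakedownRequest:
    """Details to have ready before filing; field names are illustrative."""
    original_url: str
    infringing_url: str
    contact_attempts: list = field(default_factory=list)

    def render(self):
        """Return the details as plain text to paste into a form or email."""
        lines = [
            "Subject: DMCA complaint regarding copied content",
            "",
            f"Original page (my work): {self.original_url}",
            f"Page with the copied content: {self.infringing_url}",
            "",
            "Attempts to resolve this with the webmaster directly:",
        ]
        if self.contact_attempts:
            lines += [f"- {attempt}" for attempt in self.contact_attempts]
        else:
            lines.append("- None recorded")
        return "\n".join(lines)


# Hypothetical example values.
request = TakedownRequest(
    original_url="http://my-blog.example/uv-sterilization",
    infringing_url="http://scraper-site.example/uv-sterilization-copy",
    contact_attempts=["Emailed the webmaster, no reply", "Sent a reminder, no reply"],
)
print(request.render())
```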

Many web hosts will take action within a matter of hours, and in some cases they will take the whole site offline until the scraped content is removed.

Have you ever had someone scrape your content? What did you do about it? I’d love to talk about this in the comments below.

24 comments

  1. nabil

    ‘The past couple of years have seen an increasing amount of algorithmic penalties being handed out by Google which have changed the look of their search results drastically.’ Penalties are one of the problems our blogs face these days!!!
    I really enjoyed reading this article. Thank you for sharing it!

  2. Sudipto

    Hey Chris,
    Nice post, and thanks for sharing this post on cleaning up scraped content with us. I think first we have to contact the person who is scraping our content, and if he does not listen then we have to file a DMCA.

  3. Prakash

    We should contact the scraper and then file DMCA. Great tips for removing our posts from scrapers site.

  4. John

    To be well optimized in search engines, your website must have quality, updated content, which helps its ranking a lot. If anyone copies your valuable content, just follow the steps above to remove it.

  5. nabil

    Cleaning Up Scraped Content!! I AM REALLY crazy about cleaning!!! So cleaning up scraped content will also be a good thing. It is like cleaning an email list in order to avoid any bad results of not doing so!! “If someone is copying your content make a note of it on your spreadsheet and move on to the next one, it is also a good idea to make a note of any contact pages or emails too.” This is really good information!! Thank you for sharing!!

  6. prabhat

    Great share.
    Recently I saw that somebody was scraping my content, so I contacted him, but he did not remove my post. Then I filed a DMCA but I did not get a reply. I hope they will reply to me.

    • Chris

      prabhat,

      Who did you file the DMCA with? I’m surprised they ignored it, as that’s the worst thing they could do. I would attempt to contact the webmaster again and ask that the content be removed or amended with attribution to the original source.

  7. Jennifer Cunningham

    Thanks for the info on scrapers. I have not had this experience but I would rather be prepared with a plan if this should happen. I can use these tips.

  8. Joe Hart

    I recently noticed that my content was being scraped by a guy who is guest posting everywhere. Since the sites where the content was posted do not belong to him, I had to make an extra effort to get the content removed. I shared the links to the original articles with the owners of the websites and soon they removed all his guest posts. Some respond quickly, but some webmasters log in to their mail once every week or so. In such cases I wait for some time before officially raising a complaint.

  9. Dave

    Excellent information I truly love the free tools you were mentioning here. Anything I can get my hands on that is free is always good in my books.

  10. Reply

    Chris, Thanks for this timely post. Recently, I made a post that was a real hit and some sites linked to it. I was moved by instinct to make a search on the title and I found out one particular site was ranking for the title on Google first page. I visited the site and found my post pasted there word-for-word. But I noticed there was a link acknowledging me as the author. But the link was an inactive one.

    I contacted the webmaster to change the link to an active link and he has not replied to my mail, even after a reminder.

    My question is, if you copy someone’s content and acknowledge him as the right author, what kind of link should you add to the post? Should you add an active link or just any link should do?

    • Chris

      If you didn’t release the content under a Creative Commons license, then they should not take it without permission, let alone without a working attribution link.

      I would give the webmaster another email and ask that they either give you full attribution with a dofollow link of your choosing, allow you to rewrite the piece as a guest blog or remove the article entirely as they do not have permission.

      If they fail to respond then I would consider the DMCA warning via their web host.

      • Reply

        My site has a copyright warning displayed conspicuously. I think I should write a reminder again before taking the DMCA action against the site. Thanks for your advice.

  11. Reply

    Chris, Thanks for this timely article. Recently, I wrote an article and it was a hit. I eventually did a search on the title and discovered a website was ranking for it on Google first page. I visited the site and found my article there word-for-word. I noticed the article was attributed to me, but the link used at the end of the article was an inactive link.
    I then wrote to the webmaster to make the link active, but he has not replied to my mail, even after a reminder.

    My question is, if you copy someone’s content and want to attribute the authorship to the person, what kind of link should you use? Is it an active link or just any link would do?

  12. Salman khan

    well one can also disable the right click on their website if they want to protect their content…

  13. Carl

    Thank you for this informative post
    Scraping, or more to the point, Plagiarism has become a major issue since the Panda and Penguin updates for my many aquarium and pond based articles.
    I am attempting to update anchor text to see if this would help, as per your suggestion.

    Unfortunately I have filed DMCA complaints with no answer EVER forthcoming from Google.
    As well, these scrapers never reply to my emails. The worst of these is About.com, which scraped content from my UV Sterilization article. It now outranks me for much of my content.

    The other problem is that I am constantly researching and updating content and pictures, so my original content is slowly diverging from what was plagiarized. My fear is that while this should be a good thing for search engine ranking (it used to be), I no longer notice it helping, especially with Google.

    • Chris

      Have you tried raising a DMCA with their hosting provider or using a DMCA service provider?

      • Carl

        I filed another DMCA complaint, but other than the offending page, I could not find anywhere to add the hosting provider (about.akadns.net)

        BTW, thanks for the tip

  14. Reply

    I know that Google Panda usually penalizes duplicated content, but I wouldn’t worry too much about scrapers: I think that Google understands that they are bots and, eventually, it will ban them instead of you!

    • Chris

      Yes Google are getting smarter but at the end of the day do you want someone else making money off your content without permission or attribution?

  15. Chris

    Frank

    Those are both good tips for stopping scraped content outranking you as it lets Google know the true source.

    Madras Geek there’s lots of articles out there already and the process might be slightly different for each webhost. You can always pay a 3rd party service to do them for you.

  16. Frank Steiner

    You might also consider adding a PubSubHubbub plugin to your blog, as it helps search engines recognize the original content owner. Google Authorship plugins also help in claiming ownership of content.

  17. Madras Geek

    Great post! This really helps others.

    If you could write another post detailing the DMCA request, it would be a great help.

  18. Chris

    Kharim

    Thanks for hosting my post I hope your readers have lots of questions for me 🙂
