Friday 3 January 2014

Link Audit Formula: What to Do if Google Detected a Pattern of Artificial or Unnatural Links Pointing to Your Site?

Even if you have never done any kind of black hat SEO or been involved in any unnatural link building scheme, you may still find a message from Google telling you that you have been penalized.
I once got this message from Google Webmaster Tools:
Google has detected a pattern of artificial or unnatural links pointing to your site. Buying links or participating in link schemes in order to manipulate PageRank are violations of Google's Webmaster Guidelines.
As a result, Google has applied a manual spam action to gouverneur.com/. There may be other actions on your site or parts of your site.

Recommended action
  • Use the Links to Your Site feature in Webmaster Tools to download a list of links to your site.
  • Ensure that unnatural links pointing to your site are removed.
  • When these changes are made, and you are satisfied that links to your site follow Google's Webmaster Guidelines, submit a reconsideration request. If you're unable to remove links pointing to your site, please provide as much detail as possible in your reconsideration request.
  • For an updated list of manual actions currently applied to your site, visit the Manual Actions page. If no manual actions are listed, there is no longer a need to file a reconsideration request.
If we determine your site is no longer in violation of our guidelines, we'll revoke the manual action.
If you have any questions about how to resolve this issue, please visit the Webmaster Help Forum.


If you got it too and do not know what to do, I believe it is time for a Deep Link Audit.

Let's Start:

Collecting the Link Data
To get a complete backlink profile, you will need a paid subscription to a backlink checker. Everyone seems to have a “favorite”, but any of the “Big 4” (SEOmoz, Majestic SEO, Link Research Tools or Ahrefs) will do the job.

We will be focusing on the following link characteristics:

·         The URL of the page linking to you
·         The URL on your site that is being linked to
·         The IP of the URL linking to you
·         The anchor text used
·         The Percentage (Mix) of Anchor text
·         The follow/nofollow status of the link
·         A measure (rank) of the link’s trust & authority

To begin, enter the URL to audit into the backlink tool. Next, export the data into a CSV file. Sort in ascending order (low to high) by domain, trust, Moz, Cemper or whatever rank your tool provides. In theory this will give you a list of links ordered from weakest to strongest. I say “in theory” because some of the weakest links may be harmless, and some powerful paid links may be killing you. There is no purely algorithmic solution. Doing a link audit correctly requires a manual review.
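The export-and-sort step can also be done outside a spreadsheet. Below is a minimal Python sketch, assuming the export has a numeric column named "rank"; that name, like the other column names and the sample rows, is made up for illustration, and every tool labels its export differently. Rows with a missing rank are placed first so they are not overlooked during the manual review.

```python
import csv
import io


def sort_links_by_rank(csv_text, rank_column="rank"):
    """Return exported backlink rows sorted from weakest to strongest rank."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    def rank_key(row):
        # Missing or non-numeric ranks sort first so they still get reviewed.
        try:
            return float(row[rank_column])
        except (KeyError, TypeError, ValueError):
            return float("-inf")

    return sorted(rows, key=rank_key)


# Hypothetical export; real column names depend on the backlink tool.
sample = """source_url,target_url,anchor,rank
http://a.example/page,http://mysite.example/,widgets,35
http://b.example/links,http://mysite.example/,cheap widgets,2
http://c.example/post,http://mysite.example/about,MySite,
"""

for row in sort_links_by_rank(sample):
    print(row["rank"] or "n/a", row["source_url"])
```

The sorted list is only a reading order for the manual review, not a verdict on any link.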
Analyzing the Link Data
Links that need to be reviewed and considered for removal are the following:
Links that appear on a domain that isn't indexed in Google.
This usually signals a quality problem. A quick way to test for this is to run a “site” command:
Example: “site:haveresults.com”
Sometimes a perfectly good site isn’t indexed because of a bad robots.txt, like:
User-agent: *
Disallow: /
This usually happens when a website leaves the development stage, but the robots.txt isn’t changed to allow the search engines to crawl the site. That’s why a manual review is important.
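If you have already saved a domain's robots.txt, a quick way to tell a blanket block from a real quality problem is to run the file through a parser. The sketch below uses Python's standard urllib.robotparser; the function name and the test URL are illustrative assumptions, and it only checks whether the site root is blocked for a crawler identifying as Googlebot.

```python
from urllib.robotparser import RobotFileParser


def blocks_all_crawlers(robots_txt, test_url="http://example.com/"):
    """Return True if this robots.txt text forbids crawling the site root."""
    parser = RobotFileParser()
    # parse() takes the file's lines; nothing is fetched over the network here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("Googlebot", test_url)


leftover_dev_file = "User-agent: *\nDisallow: /"
print(blocks_all_crawlers(leftover_dev_file))
```

A site that fails this check may be worth keeping on your "good" list even though the site: command returns nothing.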
Links that appear on a website with a malware or virus warning.
This is pretty self-explanatory.

Links that appear on the same page as spammy, unrelated links.
Run the Google search command inurl:links sex,viagra,payday loans and you will find no shortage of hacked pages, too.
Links that appear on a page with Google PageRank that is gray bar or zero.
This usually signals poor quality or low trust, but it could also indicate a new page whose PR bar hasn’t been updated yet. Gray PR is not the same as PR 0 (zero). The gray bar is sometimes a quality indicator, but it doesn’t necessarily mean that the site is penalized or de-indexed. Many low-quality, made-for-SEO directories have a gray bar or PR 0.
Links coming from link networks.
Link networks are groups of websites with common registrars, common IPs, common C-blocks, common DNS, common analytics and/or common affiliate code. Chances are, if a group of websites shares a common IP, you will also find some of the other characteristics of a link network, so that’s where I look first. If using Ahrefs, you would navigate to Domain reports>yourwebsite.com>IPs and review the IP report.
Then drill down to Domain reports>yourwebsite.com>referring domains to discover a crappy network.
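If your tool exports the IP of each referring domain, the shared C-block check is easy to script. Here is a rough Python sketch, assuming you already have a mapping of domain to IPv4 address (the domains and addresses below are placeholders from reserved documentation ranges). It groups domains by their /24 network and keeps only the groups with more than one domain.

```python
from collections import defaultdict
import ipaddress


def group_by_c_block(domain_ips):
    """Group referring domains that share the same IPv4 C-block (/24)."""
    blocks = defaultdict(set)
    for domain, ip in domain_ips.items():
        # strict=False lets us pass a host address and get its /24 network.
        network = ipaddress.ip_network(f"{ip}/24", strict=False)
        blocks[str(network)].add(domain)
    # A block with a single domain tells us nothing; keep shared blocks only.
    return {
        block: sorted(domains)
        for block, domains in blocks.items()
        if len(domains) > 1
    }


# Placeholder data for illustration only.
referring = {
    "a.example": "203.0.113.7",
    "b.example": "203.0.113.200",
    "c.example": "198.51.100.4",
}
print(group_by_c_block(referring))
```

A shared C-block is only a hint, since plenty of unrelated sites sit on the same shared hosting, so treat each group as a candidate for manual review.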
Sitewide Links – especially blogroll and footer links.
Most are unnatural and none pass the juice that they once did.
Watch for exceptions to the rule. After a manual review, I was able to determine that, in this case, the first sitewide link found in the tool is natural and there is no need to remove it. This is just one more example of why human intervention is necessary to get a link audit right.

Paid links.
If you are attempting to recover from a manual penalty, every paid link must be removed. No exceptions. The Google spam team spends all day, every day rooting out paid links. After a while, spotting a paid link becomes second nature. That juicy link that you are certain you can slip by Google will stick out like a sore thumb to the trained eye and will only prolong the agony of a manual penalty.
Beyond specific link types that could be considered “suspicious”, there are new link rules that need to be reviewed and adhered to in the post-Penguin era.
Post-Penguin Link Audit Considerations
Keep in mind that Penguin is just the latest anti link spam algorithm rolled out by Google. They are hammering websites built on link schemes and rewarding sites with a natural backlink profile. A natural profile contains an assortment of link types, pointing to a website. Your audit should turn up a good mix of:
·         Brand links: Variations include: Your Domain, YourDomain.com, www.YourDomain.com, YourDomain.
·         Exact-match anchor text keyword links: These anchor text links should point to the most appropriate page on the website (the one you are optimizing).
·         Partial-match keyword links: It’s important not to over-optimize with exact-match keywords, otherwise you could trip a phrase-based filter.
·         Generic Links: Like “Read More” or “Click Here.” Keep in mind that good content should fill this need with little if any work required on your part.
·         Page title links: Some of your links should be the same as your page title.
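To see what mix you actually have, each anchor can be bucketed into the link types above. The following Python sketch is a rough heuristic and not a substitute for reading the anchors yourself. The list of generic phrases, the function name and the matching rules are all assumptions made for illustration.

```python
import re

# Assumed examples of generic anchors; extend this set for your own niche.
GENERIC = {"click here", "read more", "here", "website", "this site"}


def _squash(text):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def classify_anchor(anchor, brand, domain, keyword, page_title=""):
    """Bucket one anchor text into a rough link type."""
    text = anchor.strip().lower()
    if not text:
        return "image/empty"
    if text in GENERIC:
        return "generic"
    # "Your Domain", "YourDomain", "YourDomain.com", "www.YourDomain.com"
    brand_forms = {_squash(v) for v in (brand, domain, "www." + domain)}
    if _squash(text) in brand_forms:
        return "brand"
    if page_title and text == page_title.strip().lower():
        return "page title"
    if text == keyword.lower():
        return "exact match"
    if any(word in text.split() for word in keyword.lower().split()):
        return "partial match"
    return "other"


print(classify_anchor("www.YourDomain.com", "Your Domain",
                      "yourdomain.com", "blue widgets"))
```

Anything that lands in "other" deserves a look by hand, since that is where unrelated or spammy anchors tend to end up.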

There are some good tools on the market like Link Detox and Remove’em to help you with link audits and even link removals. The key takeaway is that no matter what tool you are using, a human review is going to be necessary to “get it right.” Leaving it to metrics alone is a formula for failure.

What follows is a step-by-step, tactical walkthrough of exactly how to perform a link profile audit, and how to figure out which links should be removed and/or disavowed.
What you’ll need:
  • Scrapebox (A tool every SEO must have in their arsenal)
  • Proxies for Scrapebox (optional, recommended. I recommend going for the “Bonanza” package from the “Exclusive Proxies” section.)
  • Microsoft Excel

Find Your Anchor Text Ratio

To get started, we need to analyze the most important signal that Google’s Penguin algorithm looks for: over-optimization of anchor text.
Step 1: Get a list of your website’s inbound links and put the list in your Excel spreadsheet. You can get this information from the following sources:
For the most complete information, try to combine data from all four sources. However, I recommend just using the data from your Google Webmaster Tools account. It’s free, and usually about as thorough as you’ll get from the other sources. Plus, it’s straight from Google. For this walkthrough, we’ll assume you’re using the list from your Webmaster Tools account.
Note: To get a list of your inbound links from Google Webmaster Tools, follow the steps below:
  1. Login to Google Webmaster Tools
  2. Click your Website
  3. Click “Traffic” on the left navigation
  4. Click “Links to your site”
  5. Click “Who links the most”
  6. Click “Download latest links”
Step 2: Run your list of links through Scrapebox to get the anchor text of each link. For a detailed walkthrough of how to set up Scrapebox, load proxies, etc., please see my post on how to use Scrapebox to find guest blogging opportunities. Depending on how long your list of links is, and how many proxies you’re using, this step could take a long time.
For lists of 1,000 links or fewer, it shouldn’t take more than 10 minutes. But several nights ago, I ran a report on a list of over 43,000 links, and I had to let Scrapebox run overnight in order to complete it.
Step 3: Export the report to Excel on your desktop. You may need to open and re-save the file after you export it, because for some reason it often corrupts immediately after export. Opening and re-saving the spreadsheet should fix it.
Step 4: Within your spreadsheet, arrange your columns as follows:
  • Column A: Source URL
  • Column B: Destination URL
  • Column C: Anchor Text
  • Column D: Found?
Step 5: Sort column D in alphabetical order and remove all rows in which column D’s value is anything other than “Found.” You’ll likely see lots of “Not Found,” “Error 404” and such from the Scrapebox output, which should be removed.
Step 6: Delete Column D (it’s no longer necessary).
Step 7: Add a new Column D with header “Number of Anchor Occurrences.”
Step 8: In cell D2, enter the following formula: =COUNTIF($C$2:$C$6633,C2).
Note: Change “6633” in the above formula to whatever the number of the last row of your data set is.
Step 9: Apply this formula to all rows in column D by clicking in cell D2 and then clicking the box in the lower-right of the cell, and dragging it down the entire length of Column D. You’ll now have a list of the number of occurrences of each anchor text in the spreadsheet.
Step 10: Open a new tab (or worksheet) within your spreadsheet and paste in the data from Columns C and D.
Step 11: That data will still contain the formulas in the cells, so we need to remove them. To do so, copy/paste the data from columns C and D into Notepad. Then, re-copy and paste it back into your new worksheet. The values for “Number of anchor occurrences” will now be absolute values rather than formulas.
Step 12: Now, it’s time to remove duplicates. Remove duplicates by highlighting your two columns, then going to the “Data” tab in Excel and clicking “Remove Duplicates.” In the ensuing popup box, make sure both columns are checked and then click OK.
Step 13: Add a new column C with header “Percent of Total.”
Step 14: Sort by Column B (“Number of anchor occurrences”) from largest to smallest.
Step 15: Scroll down to the last row containing data, and in column B, in the cell directly below the cell containing the last piece of data, enter the following formula: =SUM(B2:B6633).
This will result in the total number of links.
Note: Change “6633” in the above formula to whatever the number of the last row of your data set is.
Step 16: In Column C (“Percent of Total”), click in cell C2 and type the following formula: =B2/$B$422.
Note: Change “422” in the above formula to the number of the row that contains the total number of links, which you created in Step 15.
Step 17: Change the format of the values in Column C to “Percentage” with two decimal places. You can do this by highlighting the column, right-clicking, and selecting “Format Cells” then changing the “Category” setting to “Percentage.”
Step 18: Apply this formula to all rows in column C. You should now have a list of percentages of anchor text as a ratio of the entire link profile.
Step 19: Highlight in red any rows in which the anchor text exceeds 2 percent of the overall link profile, EXCEPT the following anchor types:
  • Brand anchors
  • Naked URLs
  • Images (i.e. no anchor text)
The remaining highlighted anchor text is the anchor text for which your inbound link profile is over-optimized.
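For anyone who would rather script Steps 7 through 19 than do them in Excel, here is a minimal Python sketch of the same arithmetic. It counts each anchor, divides by the total number of links, and flags anything above the 2 percent threshold unless you have listed it as exempt. The function names and the sample data are illustrative; anchors are lowercased before counting, which mirrors the fact that COUNTIF ignores case.

```python
from collections import Counter


def anchor_ratios(anchors):
    """Return (anchor, occurrences, share of total) sorted by occurrences."""
    counts = Counter(a.strip().lower() for a in anchors)
    total = sum(counts.values())
    return [(anchor, n, n / total) for anchor, n in counts.most_common()]


def over_optimized(anchors, exempt, threshold=0.02):
    """Anchors above the threshold, skipping brand, naked-URL and image anchors.

    `exempt` is your own list of brand anchors and naked URLs; include the
    empty string to exempt image links, which have no anchor text.
    """
    exempt = {e.strip().lower() for e in exempt}
    return [
        anchor
        for anchor, _, share in anchor_ratios(anchors)
        if share > threshold and anchor not in exempt
    ]


# Made-up profile of 1,000 links for illustration.
profile = (["MySite"] * 60 + ["cheap widgets"] * 30 + [""] * 5
           + [f"misc {i}" for i in range(905)])
print(over_optimized(profile, exempt=["MySite", ""]))
```

The threshold is a parameter on purpose, so you can try a different cutoff without touching the rest of the logic.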
If you’ve made it this far and found no over-optimized anchor text in your inbound link profile, congratulations! You’re probably not a target of Google Penguin. If you did find over-optimized anchor text, read on.

Analyze Your Referring Domains

Next, it’s time to get a list of referring domains, and gather some metrics on each one so we can determine whether we have any domains that need to be completely disavowed.
Step 20: Copy/paste your list of links into a Notepad file.
Step 21: Load that file into Scrapebox using the “Import URL list” button.
Step 22: Click “Trim to Root”
Step 23: Click “Remove/Filter” then click “Remove Duplicate Domains.”
Step 24: Click “Check PageRank” and “Get Domain PageRank” to get the domain PR of each domain.
Step 25: Export the list of domains using the “Import/Export URLs & PR” button.
Step 26: Copy/paste the output from your newly exported file back into your Excel spreadsheet and sort by PR from largest to smallest.
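If you don't have Scrapebox handy, the "Trim to Root" and "Remove Duplicate Domains" steps can be approximated with Python's standard library. This is a sketch of the idea and not a description of how Scrapebox works internally. One assumption to note is that "www.example.com" and "example.com" are treated as different hosts here. Looking up PageRank is not covered.

```python
from urllib.parse import urlsplit


def trim_to_root(url):
    """Reduce a full URL to its scheme and host, e.g. http://a.example/."""
    # Lines without a scheme are assumed to be plain http URLs.
    parts = urlsplit(url if "://" in url else "http://" + url)
    return f"{parts.scheme}://{parts.netloc.lower()}/"


def unique_domains(urls):
    """Return each referring host once, in order of first appearance."""
    seen = set()
    ordered = []
    for url in urls:
        host = urlsplit(trim_to_root(url)).hostname
        if host and host not in seen:
            seen.add(host)
            ordered.append(host)
    return ordered


links = [
    "http://A.example/page1.html",
    "http://a.example/page2.html",
    "https://b.example/x?y=1",
]
print(unique_domains(links))
```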

Find Out Which Links and Domains Need to Be Disavowed or Removed

Now, it’s time to figure out which links and domains need to be removed or disavowed.
Step 27: Refer to your list of anchor text percentages. Find the first highlighted anchor (from Step 19) and note what the anchor is.
Step 28: Return to your Scrapebox output with the column that includes anchor text, and sort by anchor text, in alphabetical order.
Step 29: Scroll down the list of anchors until you find the first occurrence of the anchor you noted in step 27.
Step 30: Copy/paste all link URLs containing that anchor into a new worksheet titled “links to disavow.”
Step 31: Repeat steps 27-30 for all anchor texts highlighted in red from Step 19.
Step 32: Refer again to your list of anchor text percentages. Go through each anchor and eyeball any that are completely unrelated to the niche or are obviously malicious spam (for example, porn, gambling, or viagra-related anchors). Add all links containing these anchors to your “links to disavow” worksheet in addition to a new, separate list.
Step 33: Load your list of links from the “links to disavow” worksheet into Scrapebox and get the domain PageRank of each link.
Step 34: Copy/paste the output from your newly exported file back into your Excel spreadsheet and sort by PR from largest to smallest.
Step 35: Highlight all links with a PR of 4 or below, and all links with malicious or completely unrelated anchor text.
Step 36: Add the highlighted links to your “links to disavow” list. Now, it’s time to figure out which domains to completely disavow.
Step 37: Copy/paste your list of links from Step 33 (your “links to disavow” spreadsheet) into a Notepad file.
Step 38: Load that Notepad file into Scrapebox and repeat steps 20-26.
Step 39: Add all domains with PR 2 or below to your disavow list.
Step 40: Eyeball the remaining domains and highlight any that don’t end in the following extensions (unless you’re sure you don’t want to remove them):
  • .com
  • .net
  • .org
Step 41: Add the highlighted domains to your “links to disavow” list.
You should now have a list that contains the following:
  • A list of links that contain anchor text for which your inbound link profile is over-optimized, which reside on a domain that’s PR 4 or less
  • A list of links that contain spammy, malicious, or completely unrelated anchor text
  • A list of domains that contain links to your website with over-optimized anchor text and are also PR 2 or less
  • A list of domains with domain extensions that are not .com, .net or .org
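The threshold rules from Steps 35 through 41 can be summarized in a few lines of Python. The sketch below is a simplification under stated assumptions: `links` is your already-flagged "links to disavow" list, `domain_pr` is a hypothetical dictionary of host to domain PageRank that you have collected yourself, and hosts with no known PR are left out for manual review. The default cutoffs are the PR 4 and PR 2 values used in this walkthrough.

```python
from urllib.parse import urlsplit

# Extensions kept by default in Step 40; everything else is flagged.
KEEP_TLDS = (".com", ".net", ".org")


def pick_disavow_targets(links, domain_pr, link_pr_max=4, domain_pr_max=2):
    """Split flagged links into whole domains and single URLs to disavow."""
    domains, urls = set(), set()
    for url in links:
        host = urlsplit(url).hostname or ""
        pr = domain_pr.get(host)
        if not host.endswith(KEEP_TLDS):
            domains.add(host)   # Step 40: unusual extension
        elif pr is None:
            continue            # unknown PR: leave for manual review
        elif pr <= domain_pr_max:
            domains.add(host)   # Step 39: disavow the whole domain
        elif pr <= link_pr_max:
            urls.add(url)       # Step 35: disavow just this link
    return sorted(domains), sorted(urls)


# Placeholder data for illustration only.
flagged = [
    "http://spam1.example.com/a.html",
    "http://mid.example.com/b.html",
    "http://good.example.com/c.html",
    "http://odd.example.info/d.html",
]
pr_lookup = {
    "spam1.example.com": 1,
    "mid.example.com": 3,
    "good.example.com": 6,
    "odd.example.info": 5,
}
print(pick_disavow_targets(flagged, pr_lookup))
```

As with every other step, look over the output before acting on it, especially the domains flagged purely because of their extension.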
To disavow an entire domain, use the following format:
domain:spamdomain1.com
domain:spamdomain2.com
domain:spamdomain3.com
To disavow individual links from a domain, use the following format:
http://spamdomain4.com/contentA.html
http://spamdomain5.com/contentB.html
http://spamdomain6.com/contentC.html
Your disavow list should look like this:
domain:spamdomain1.com
domain:spamdomain2.com
domain:spamdomain3.com
http://spamdomain4.com/contentA.html
http://spamdomain5.com/contentB.html
http://spamdomain6.com/contentC.html
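Once both lists are ready, writing them out in this format is mechanical. Here is a small Python sketch, with illustrative function and variable names, that builds the file contents with the domain: lines first and the individual URLs after. It also skips any URL whose host is already covered by a domain: line. That is a design choice to keep the file short and not something the format requires.

```python
from urllib.parse import urlsplit


def build_disavow_file(domains, urls):
    """Return disavow-file text: domain: lines first, then single URLs."""
    covered = set(domains)
    lines = [f"domain:{d}" for d in sorted(covered)]
    # A URL on an already-disavowed domain would be redundant, so drop it.
    lines += [
        u for u in sorted(set(urls))
        if urlsplit(u).hostname not in covered
    ]
    return "\n".join(lines) + "\n"


text = build_disavow_file(
    ["spamdomain2.com", "spamdomain1.com"],
    [
        "http://spamdomain4.com/contentA.html",
        "http://spamdomain1.com/already-covered.html",
    ],
)
print(text)
```

Save the returned text as a plain .txt file before uploading it.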
Step 42: When you’re ready to submit your list of links to disavow, follow Google’s official instructions on how to do so.

Closing Thoughts

  • If you have access to the SEOMoz API, feel free to substitute domain authority (DA) for PageRank as your metric. DA is a more accurate metric, but it’s expensive to use in bulk. In Step 35, substitute PR 4 with DA 40 or below. In Step 39, substitute PR 2 with DA 30 or below.
  • Why did I choose 2 percent as the threshold for over-optimization? I’ve done at least 50 inbound link profile audits, and in my experience, the sweet spot appears to be about 2 percent.  The 2 percent figure is purely based on my hands-on experience in the field working with real clients who were penalized by Google Penguin.
  • How did I come up with the specific PR and DA thresholds for disavowal? Again, this is based purely on my experience in the field. There’s no textbook that’ll tell you the “right” number(s) or even metrics to use.

Source: 
http://www.searchenginejournal.com/how-to-know-which-links-to-disavow-in-google/50709/
http://searchenginewatch.com/article/2207168/How-to-Conduct-a-Link-Audit