Building Your Own Link Profile Based on Google’s Data

Go Fish Digital is excited to share a post from guest author Kamil Guzdek. Kamil is an SEO expert focusing on technical SEO, lifting link-related penalties, and web analytics. He has more than 8 years of experience in SEO, with an M.Sc. degree in E-business from Oxford Brookes University (UK) and a Pg.Dip. in Online Marketing from the Warsaw School of Economics (PL). He has worked on SEO for all kinds of websites, ranging from online publishers, startups and SaaS platforms to e-commerce. You can reach out and connect with him on LinkedIn, or read more about Kamil on his website (https://guzdek.co/).

Google has a full backlink profile for your domain and uses it as part of its algorithm to determine ranking order in search results. As a result, a change in your backlinks can either increase or decrease your organic traffic.

At most, Google Search Console shows you around 150,000 links at a time, but what if your link profile exceeds that number? We developed a way to track those links so that you can stay on your toes and better track the link profile Google shows you in Search Console. By using it, you can always be prepared to act quickly.

Once an SEO understands how important links are to a search strategy, they realize just how mission-critical it is to keep track of them.

Let’s start from the beginning:

What’s a Link Profile?

A link profile is a dataset containing all the links pointing to any URL in your domain.

Simple, right? Well, it’s not really. We’ll go into the specifics a little later.

Could I Just Pay to Get My Backlink Profile?

There are multiple SEO tools, like Moz, Ahrefs or Majestic, that show you the sites linking to yours, and each of them uses its own resources to find links. Unless you want to track websites with high domain authority or spammy websites, these tools do the job very well.

But, Google has its own dataset of your site’s backlink profile.

Google has an index of all the websites that are credible enough to be displayed in the Search Engine Results Pages (SERPs). It also has a vast collection of data on websites that are less reliable and are, therefore, not included in the index.

As you can imagine, the tools above can’t get the complete story for your link profile. To some extent, they can cope with adding new domains, but if you want to know all of the bad links pointing to your domain, this might be tricky.

There is one tool that does the job well – LinkDetox.com. This tool aggregates links from 25 sources, creating a link profile for you. It also integrates with your Google Analytics to retrieve all referral traffic and evaluates it. This helps find really spammy domains, including the nasty ones that are linking to your site, which you might not find in Google’s index.

Why Do I Need to Have My Link Profile?

I’m sure by now you’ve heard that Google can be pretty generous when decreasing the organic traffic coming to your website. In the case of links, you can get a manual penalty, which you’ll be notified about in Google Search Console, or you might fall under an algorithmic adjustment that assesses the site’s link profile.
This may not seem like a big deal if you don’t have many backlinks, but if the number exceeds 200,000-300,000 inbound links, you’ll need to be proactive so you won’t lose a ton of organic traffic based on links you don’t know exist.

In both cases, to get yourself out of trouble, it’s best if you know about all of the links that point to your website, not just a fraction, and what better place to start if not Google?

How Do I Get My Link Profile from Google?

First, let’s estimate how big your link profile is. The resource I use most often for this is the Link Detox pricing page. You won’t get an exact number, but this page will give you a general idea of what size your link profile could be.

If your domain has fewer than 100,000 links, you can download your backlink profile directly from Google Search Console. If the number of links exceeds 150,000-180,000, however, Google will only provide you with a sample of your link profile. A 150,000-link sample might seem huge, but if your link profile has millions of backlinks, it’s really not enough to assess all the potential problems.

There is, however, something you can do to help get more links than what Google shares, including all the spammy links which might be hurting your search engine rankings.

Download Your Links from Google

Begin downloading your backlinks from Search Console on a weekly basis and make it a habit.

In the Links section of Google Search Console, click ‘Export Sample Links’ > ‘More Sample Links’. When prompted, click ‘Download CSV’. Repeat the process for ‘Latest Links’.

Once the files are downloaded, store them in one location.

Every now and then, Google rotates a small batch of links into your sample files. If you keep downloading them for long enough, you’ll start seeing a bigger picture of exactly what your link profile looks like.
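One way to keep the weekly downloads from overwriting each other is to stamp each file with the download date as you store it. The sketch below is only an illustration: the function name archive_export and the folder name gsc-links are made up for this example, and it assumes the export has already been saved somewhere on your machine.

```shell
#!/bin/sh
# archive_export FILE [ARCHIVE_DIR]
# Moves a downloaded export into ARCHIVE_DIR and puts today's date in front of
# its name, so next week's download cannot overwrite it.
# "gsc-links" is a hypothetical default folder name.
archive_export() {
    src="$1"
    archive_dir="${2:-gsc-links}"
    stamp=$(date '+%Y%m%d')
    mkdir -p "$archive_dir" || return 1
    mv "$src" "$archive_dir/$stamp-$(basename "$src")"
}
```

Calling archive_export with the path of a freshly downloaded file (whatever name your browser gave it) moves it into the archive folder under a dated name.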

For the next section of this blog, I assume that you have a Google Cloud account, you’ve created a project, and have billing turned on.

Build Your Own Link Profile

Once you have at least one month’s worth of data, start merging them into your link profile.

How to Create a Bucket

  1. To do this, log in to Google Cloud and head to the Storage section. Once you’re there, create a bucket – a place where you can store all of your links. You’ll be asked to fill in some data, including a globally-unique bucket name.
  2. Next, pick a physical location where you want your data to be stored. To ensure that the speed of data exchange remains relatively high, select a location that’s near you.
  3. Then, choose the storage class. As I access this data approximately once a month, I chose “nearline”. Depending on your needs, you may want to choose “standard”, or “coldline”. Be aware, though, that this choice will affect your final pricing.
  4. Finally, choose how you want to control access to your bucket. What you choose here won’t affect your final project, so either one will do.
  5. Once you’re done with all the settings, head to your bucket’s ‘Overview’ section and copy your “Link for gsutil”. You can now upload all your link files with a single drag and drop.
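If you prefer the command line to the web console, the same bucket setup can probably be done with gsutil. This is a rough sketch rather than the method used in this post: the function names, the bucket name and the default location are placeholders, and the GSUTIL variable is an assumption added here so the commands can be previewed (for example with GSUTIL=echo) before anything is created.

```shell
#!/bin/sh
# Command used to talk to Cloud Storage. Set GSUTIL=echo to print the
# commands instead of running them.
GSUTIL="${GSUTIL:-gsutil}"

# create_link_bucket NAME [LOCATION] [STORAGE_CLASS]
# Creates a bucket with the chosen location and storage class.
# NAME, "us-east1" and "nearline" are placeholder values for illustration.
create_link_bucket() {
    bucket="$1"
    location="${2:-us-east1}"
    class="${3:-nearline}"
    $GSUTIL mb -c "$class" -l "$location" "gs://$bucket"
}

# upload_exports LOCAL_DIR NAME
# Copies every CSV in LOCAL_DIR to the top level of the bucket.
upload_exports() {
    $GSUTIL -m cp "$1"/*.csv "gs://$2/"
}
```

For example, create_link_bucket my-link-profile europe-west1 nearline followed by upload_exports ./gsc-links my-link-profile would create the bucket and upload the stored exports. Both names are hypothetical.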

How to Complete Your Bucket

Now, you’ll need to create a small shell script that can concatenate all of those links into a single .CSV file.

  1. Open a text editor (I suggest Atom, but really any other developer-friendly editor will work here) and paste the following code:
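    #!/bin/sh
    # Usage: sh profile-builder.sh {your_gsutil_link}
    : "${1?Provide a valid Cloud Storage path}"
    src="${1%/}"    # drop a trailing slash so the paths below stay clean
    mkdir -p profile-builder
    cd profile-builder || exit 1
    # Copy the CSV files from the top level of the bucket (output/ is not synced)
    gsutil -m rsync "$src" .
    dt=$(date '+%Y%m%d')
    dir="output-$dt"
    mkdir -p "$dir"
    # Drop the header row of every file
    awk 'FNR != 1' *.csv > "$dir/$dir-tempawk.csv"
    # Keep only the first column (the link)
    cut -d, -f1 "$dir/$dir-tempawk.csv" > "$dir/$dir-tempcut.csv"
    rm "$dir/$dir-tempawk.csv"
    # Strip the double quotes around each link
    sed 's/^"//; s/"$//' "$dir/$dir-tempcut.csv" > "$dir/$dir-tempsed.csv"
    rm "$dir/$dir-tempcut.csv"
    # Sort and remove duplicates, then upload the result to output/ in the bucket
    sort -u "$dir/$dir-tempsed.csv" > "$dir/$dir.csv"
    gsutil cp "$dir/$dir.csv" "$src/output/"
    cd ..
    rm -r profile-builder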
  2. Save that as a .sh file – for example, as profile-builder.sh.
  3. Upload this script to your Command Line Interface (or Terminal). Simply open the Terminal and upload your script. Now, you’re finally ready to build your very first link profile.
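To see what the text-processing part of the script does without touching Cloud Storage, you can try the same awk, cut, sed and sort sequence on a local folder of exports. This is a minimal sketch: the function name build_profile is made up, it assumes the link sits in the first column of each CSV, and it strips a trailing double quote as well as a leading one.

```shell
#!/bin/sh
# build_profile DIR
# Prints one de-duplicated, sorted list of links from every CSV in DIR.
# Steps: drop line 1 (the header) of each file, keep the first column,
# strip surrounding double quotes, then sort and remove duplicates.
build_profile() {
    awk 'FNR != 1' "$1"/*.csv |
        cut -d, -f1 |
        sed 's/^"//; s/"$//' |
        sort -u
}
```

Running build_profile on a folder holding several weeks of exports prints each link once, however many files it appeared in.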

How to Create a Link Profile

  1. In your opened Terminal, type the following command, replacing {your_gsutil_link} with your bucket’s gsutil link and profile-builder.sh with whatever name you saved your script as:
    sh profile-builder.sh {your_gsutil_link}
  2. Then, press Enter to have the console combine all of the files with all of your link data and delete duplicates. At some point, depending on the amount of data you feed in, the process might look as if it has crashed. Give it a couple of minutes – it’s running a lot of calculations! Once it’s all ready, you’ll see the message “Operation completed over 1 objects/file size.”
  3. At this point, start looking for a new directory in your bucket called “output”. Inside, you’ll find a file called output-date.csv. Congratulations – this is your first Link Profile!

All that’s left to do is delete the files you just processed from your bucket, but leave the “output” directory in place. That way, in the coming months the script only has to process the small number of new files you’ve uploaded since.

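You may need to merge your output files at some point, but luckily, this is a simple process. Remember the command you typed into the Terminal? Add “/output” to the end of your gsutil link and the script will run on all .CSV files in the “output” directory instead. One caveat: the script skips the first line of every file it reads because it expects a header row. Output files have no header, so the first link in each of them is left out of the merged file.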
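Because the output files contain only links and no header row, they can also be merged locally with a plain sort, which keeps every line. The sketch below assumes you have copied the “output” directory to your machine first; the function name merge_profiles is made up for this example.

```shell
#!/bin/sh
# merge_profiles DIR
# Combines every output-*.csv in DIR into one sorted, de-duplicated list.
# No line is skipped, since these files have no header row.
merge_profiles() {
    sort -u "$1"/output-*.csv
}
```

For example, merge_profiles ./output > merged-profile.csv writes the combined profile to a new file (the file name is a placeholder).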

How to Use Your Link Profile

Now that you have a link profile, you can start assessing it for bad links. A good place to start is to look for patterns: paid links, exact-match anchor text, link farms and directories. Look especially for “globe” directories – there are dozens of versions of this directory on various domains, and all of them are really harmful.
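A quick first pass at spotting patterns is to count how many links each domain contributes, since link farms and directories tend to show up as unusually large groups. This is only a sketch of one possible approach: the function name top_domains is made up, and it assumes every line of the profile is a full URL starting with http:// or https://.

```shell
#!/bin/sh
# top_domains FILE [N]
# Prints the N domains (default 20) with the most links in FILE,
# one "count domain" pair per line, largest count first.
top_domains() {
    awk -F/ 'NF >= 3 { print tolower($3) }' "$1" |
        sort |
        uniq -c |
        awk '{ print $1, $2 }' |
        sort -k1,1nr -k2,2 |
        head -n "${2:-20}"
}
```

A plain grep -i globe on the same file lists every link whose URL contains that word.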

If you feel that your link profile is too big to process on your own, you can always look for software that will do it for you. I recommend using Clusteric, which provides you with a lot of insights into analyzing your links and helps to identify all the bad ones.

Happy hunting!
