Creating Internal Linking Opportunities Using Screaming Frog

Over the years, I’ve found that one of the major challenges that can hold back a successful SEO strategy or campaign is an under-optimised internal linking structure.

We can create fantastic pieces of content across a variety of topics, but if those pages sit in isolation, there’s a huge missed opportunity to build topical relevance and to distribute existing page equity across your site.

Building strategic internal links could take poorly ranking content and move it up the rankings with relative ease.

Sprinkle in fantastic on-page optimisation and high-quality backlinks, and you’re onto a winner.

I’m going to walk you through how I find internal linking opportunities based on a site’s existing content using Screaming Frog, custom extractions and Excel formulas.

Methodology

I use a consistent CSS class to extract only the body content across a website. This allows us to find internal links based on keywords or the surrounding content in a paragraph (<p> tag).

This enables me to create not only super-relevant internal links but find natural places for me to create optimised anchor links.

Double winning.

Step 1 – finding the CSS selector using Google Dev Tools

This is the most frustrating part of the process.

It also highlights how lazy developers can be. 

The ideal scenario involves the CSS selector for a site’s entire body of content being the same, using divs and classes like:

“Main_body_content”

“Blog_post_full”

These kinds of classes would make my life a whole lot easier.

But, I digress. It’s not the end of the world.

I’m going to use comparethemarket.com (a UK comparison site) to walk you through how this works.

The first thing you’re going to do is highlight the main section of content as high up the page as possible. This allows you to find the CSS selector much faster.

Highlight your text, right-click and choose ‘Inspect’.

This will now take you to the main Google Dev Tools window. Once you’re here, it’s a process of elimination to find the CSS selector.

Pro tip: the easiest way to do this is to hover your mouse over elements in the code until you see the content highlighted on your page.

Something like this:

In this example, I was lucky. 

The class ID for the page’s content is “content”.

You want to make sure that when you do find the class ID, it doesn’t include the site’s navigation. Otherwise you’ll skew your results by pulling in anchor text from the site’s menu, which you could mistake for an internal linking opportunity.

Now that you’ve found the right ID, right-click and select ‘Copy’ and then ‘Copy Selector’.
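The copied selector can be as short as a single class or ID reference – in this example, something like the line below (purely illustrative; on many sites it will be a much longer chain of elements):

.content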

Step 2 – running a custom extraction in Screaming Frog

If you’re unfamiliar with Screaming Frog, I highly recommend you spend the time learning the tool’s capabilities.

I’m going to walk you through only a single application of how useful this tool can be.

Open up Screaming Frog and navigate to ‘Configuration’ > ‘Custom’ > ‘Extraction’.

The configuration rules are as follows for the extraction to work:

 

[Name your extraction]

CSSPath

Extract Text

Attribute optional: leave empty

This is going to tell Screaming Frog that you want to extract the text only from the CSS selector and not worry about pulling styling or HTML tags.
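Filled in for this example, the settings look something like this (the extraction name is whatever you choose to call it, and the selector is the one you copied in Step 1 – ‘Body Content’ and ‘.content’ here are just illustrative):

Name: Body Content
Mode: CSSPath
Selector: .content
Extract: Extract Text
Attribute: (leave empty)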

Hit OK.

This has now saved your custom extraction before you begin your crawl.

The next step is to add the directory or domain you’d like to crawl.

Personally, for large sites it makes much more sense to focus on one directory at a time; if you have a smaller domain, crawl the entire site and we’ll filter by subfolder in Excel later.

The choice is yours – just remember that the larger the crawl, the larger the dataset you have to manipulate.

For this example, I’m going to crawl Compare The Market from the /car-insurance/ directory. 

I can tell Screaming Frog to only include this directory by setting rules defined in the ‘Include’ section.
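Screaming Frog’s ‘Include’ rules are regex-based, so a rule for this crawl would look something like the line below (assuming the www. version of the UK domain):

https://www.comparethemarket.com/car-insurance/.*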

Or, I can prevent the crawler from leaving the start directory in the ‘Configuration’ settings (shown below).

Now that I’ve defined my crawler restrictions (if any), I can run the crawl.

Once completed, I export the entire dataset where I’ll begin finding opportunities in the extracted content.

 

Step 3 – finding your opportunities in Excel

You should have a full Excel sheet that looks something like this…

Tons of unnecessary data which we need to remove.

We only need a few columns:

 

Address

Status Code

Unique Inlinks

Custom Extraction (using the name you defined)

 

This leaves us with a much smaller dataset to work with. 

The column with your content will look messy right now but, don’t worry, once we start adding headers to the columns, we can hide the content column entirely.

So, now we need to start defining our column headers with the keywords we want to find in the extracted content.

In this example, I’m using insurance-related terms.

You’ll see along the top row that I have set my keywords as column headers.

The matching isn’t case sensitive; however, to be safe, I always set the headers using lower-case phrases.

I’m going to use an ISNUMBER/SEARCH formula in Excel to establish whether or not my target keywords appear in the text.

The formula:

=ISNUMBER(SEARCH(find_text, within_text))

This tells Excel to look for your keyword (find_text) in your extracted content (within_text).

In this example my find_text cell reference is D2 and my content cell reference begins at C3.
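With those references plugged in, the first cell’s formula reads:

=ISNUMBER(SEARCH(D2,C3))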

Providing that you’ve selected the correct cell reference, you should now see TRUE or FALSE in your keyword column.

Awesome, right?

Okay, now, to make this scalable, you need to lock your formula references so that you can simply drag it across an infinite number of keywords without having to re-write it every time.

Revised formula:

=ISNUMBER(SEARCH(D$2,$C3))

This formula means the row number (2) is locked, but the column letter (D) isn’t. 

Meaning if I drag the formula to the right, it will stay on row 2, but move to column E (working its way across your column headers for keyword selection).

Next, we lock the column, but not the row, for the content (meaning the formula will look down your rows, but won’t move to the next column if you drag it across).

Which creates $C3.

The dollar sign highlights which part of the formula is locked.
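To make that concrete, assuming the first formula sits in D3 (keyword header in D2, content starting in C3, as above), dragging it produces:

One column to the right (E3): =ISNUMBER(SEARCH(E$2,$C3))
One row down (D4): =ISNUMBER(SEARCH(D$2,$C4))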

Now, I can drag it across as many columns as I need and, voilà, Excel is finding me tons of internal linking opportunities based on page content.

Step 4 – extracting subfolders for content themes (additional step)

If you didn’t define which subfolder you wanted to scrape at the start but have instead used the entire domain, we can now start filtering subfolders for internal links.

The easiest way to do this is with a ‘TRIM’ formula.

The formula is messy and can be confusing; the easiest way to use it is to replace the cell references with the reference for your ‘Address’ column.

Finding the first subfolder (formula)

The cell references in the formula need to be set to the start of your ‘Address’ column.
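Here’s a version of that formula that does the job, assuming your ‘Address’ column is B and your first URL sits in B3 – it splits the URL on “/” and pulls out the fourth segment, which is the first folder after the domain:

=TRIM(MID(SUBSTITUTE(B3,"/",REPT(" ",LEN(B3))),3*LEN(B3)+1,LEN(B3)))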

Finding the second subfolder (additional step)

If you need to dig deeper than the first subfolder, use the formula below to extract the second subfolder from your ‘Address’ column.
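A matching version for the second subfolder simply moves one segment further along the URL (same assumption that your ‘Address’ column starts at B3); it returns a blank if a URL has no second subfolder:

=TRIM(MID(SUBSTITUTE(B3,"/",REPT(" ",LEN(B3))),4*LEN(B3)+1,LEN(B3)))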

Again, update ‘B3’ to the correct cell reference.

This now gives you:

URLs with content specific to your keyword

Thematic opportunities based on subfolder structures

The ability to scale this across potentially hundreds of keywords

If I wanted to place links to a /car-insurance/ landing page (using the example above), I now have tens, if not hundreds, of URLs I can update really easily.

Step 5 – correlating page authority using RDs and Inlinks (bonus step)

This step only really needs to happen if you’re familiar with page authority and how to effectively distribute it across money pages for the most benefit.

However, without this step, you would’ve still created a solid internal linking strategy that can be scaled.

The first thing I do is extract Ahrefs page level data for RDs from the entire domain.

I navigate to ‘Top Pages’ and export the entire list.

I then take this data and create a separate tab in my existing internal linking worksheet that we’ve been using.

With this data now in my worksheet, all I have to do is VLOOKUP the ‘Address’ column against my backlink export, referencing the column number that holds the RDs (referring domains).

The formula:

=IFERROR(VLOOKUP(B3,'Backlink Data'!A:C,2,FALSE),"0")

Breaking the formula down: B3 is the URL being looked up, 'Backlink Data'!A:C is the range that holds the Ahrefs export, 2 tells Excel which column of that range to return (the one containing the referring domains) and FALSE forces an exact match.

I recommend wrapping all of your formulas in IFERROR to prevent #N/A or #VALUE! errors from ruining your sheet (and your fun).

This data now allows me to highlight power pages to distribute equity from – quickly.

Pro tip: do not abuse how you distribute your page authority. If a page has hundreds of external links, the chances are that you’re diluting your equity.

I’ve filtered my pages by top referring RDs to make this selection easier.

Conclusion

You can advance this process however you want.

If you’re a wizard with Excel (which I am not), I’m sure you could automate this at scale, too.

The purpose here is to allow you to find pages, themes, authority pages and content that you can create links to/from.

Once you get familiar with the process, this will likely save you hours of time.

I hope you enjoyed it!

Ryan Darani – Founder of Launch My Links.

Want to learn more about link building and SEO? We have plans and courses for everyone. Use the button below to get in touch.

 
