Analyze SERP Backlink Profiles in Bulk for SEO Using Python
The importance of backlinks as a quality signal has changed little since the early 2000s. Algorithms have evolved, but backlinks remain a strong ranking factor.
Looking up your backlink stats and comparing them to competitors is an SEO staple. What’s new is operationalizing this process and working smarter. Instead of running one-off competitor checks, use SERP scraping to analyze backlink profiles in bulk and identify who matters most for a given keyword, and why.
In this Python SEO tutorial, I’m going to show you step-by-step how to build a script that scrapes the SERPs for a given keyword and analyzes the top X sites to extract backlink insights.
There are dozens of metrics we can get from Ahrefs to analyze, but in this tutorial, we’re going to focus on:
- Ahrefs URL Rank
- Backlinks count
- Referring Domain Count
- Top anchor text count by backlinks
- Top anchor text count by referring domain
- % where a query one-gram is in an anchor text (I think this one is really cool!)
- Average Ahrefs link rank of backlinks
Requirements and Assumptions
- Python 3 is installed and basic Python syntax is understood
- Access to a Linux installation (I recommend Ubuntu) or Google Colab
- SERPapi or similar service (you will need to modify the API call if not using SERPapi)
- Ahrefs API access
- Be careful when copying code, as indentation is not always preserved.
Install Python Modules
The only module you’ll likely need to install is the SERPapi client, published as google-search-results. If you are in a notebook, don’t forget the leading exclamation mark when running shell commands.
pip3 install google-search-results
Import Python Modules
- requests: for calling both APIs.
- pandas: storing result data.
- json: handling API responses.
- serpapi: interfacing with the SERPapi API.
- urllib.parse: encoding and decoding URLs for passing to APIs.
- seaborn: conditional formatting for pandas tables.
import requests
import pandas as pd
import json
from serpapi import GoogleSearch
import urllib.parse
import seaborn as sns
Setup API Variables
First, set up the API key variables. These keys are typically found in each platform’s account settings — keep them secure.
ahrefs_apikey = ""
serp_apikey = ""
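If you’d rather not paste keys directly into a script or notebook, one option is to read them from environment variables. This is a minimal sketch, not part of the original script; the helper name load_api_key and the variable names AHREFS_API_KEY and SERPAPI_KEY are my own placeholders.

```python
import os

def load_api_key(env_name: str) -> str:
    """Return the API key stored in the given environment variable, or raise."""
    key = os.environ.get(env_name, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {env_name} is not set")
    return key

# Hypothetical variable names; set them in your shell or notebook first.
# ahrefs_apikey = load_api_key("AHREFS_API_KEY")
# serp_apikey = load_api_key("SERPAPI_KEY")
```

Failing early with a clear error is usually easier to debug than an API call that quietly runs with an empty token.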
Let’s set up some variables for the SERPapi call. See their full API documentation for many additional parameters should you need them.
- query: this is the query you want to search for
- location: the geographic location the search should appear to come from
- lang: the language the query search is in
- country: similar to location, you usually want these to align
- result_num: how many results to return. Eight is roughly one page, 16 is two pages, etc.
- google_domain: the country-specific domain you want to search from (ex. google.com or google.fr). This usually aligns with some of the parameters above.
query = ""
location = ""
lang = ""
country = ""
result_num = ""
google_domain = ""
Make SERP API Call
Next, build the API call using the parameters above (add any additional ones as needed), run it, and receive the results in JSON form.
params = {
"q": query,
"location": location,
"hl": lang,
"gl": country,
"num": result_num,
"google_domain": google_domain,
"api_key": serp_apikey}
search = GoogleSearch(params)
results = search.get_dict()
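Before looping over the response, it can help to check that it actually contains organic results. The sketch below assumes the response is a plain dict with an "organic_results" list and, on failure, an "error" key; the helper name extract_organic_links and the sample data are mine, not real API output.

```python
def extract_organic_links(results: dict) -> list:
    """Return the 'link' of each organic result, skipping entries without one."""
    if "error" in results:
        # Assumption: a failed call reports its problem under an "error" key.
        raise RuntimeError(f"SERP API error: {results['error']}")
    organic = results.get("organic_results", [])
    return [r["link"] for r in organic if r.get("link")]

# Hand-written stand-in for search.get_dict(); not real API output.
sample_results = {
    "organic_results": [
        {"position": 1, "link": "https://example.com/page-one"},
        {"position": 2, "title": "entry without a link"},
        {"position": 3, "link": "https://example.org/page-two"},
    ]
}

links = extract_organic_links(sample_results)
print(links)
```

With a check like this, an exhausted plan or a typo in a parameter raises a readable error instead of a KeyError a few steps later.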
Setup Container Lists and Dataframe
Now create list variables for temporary storage and then move the data into the pandas DataFrame where it will ultimately live. Add any extra metrics you want to store as additional columns. Note: I split the query by spaces to create a list of one-grams that we will use for anchor text analysis compared to the query. This helps measure how topically aligned backlink anchor text is with the query.
df = pd.DataFrame(columns=['URL', 'UR', 'BL', 'Avg BL UR', 'RD', 'Top Anchor RD', 'Top Anchor BL', '%KW in Anchors'])
keyword_gram_list = query.split(" ")
urls = []
kw_an_ratio_list = []
backlinks_list = []
backlink_UR_list = []
refdomains_list = []
rank_list = []
top_anchor_rd = []
top_anchor_bl = []
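One thing to watch with query.split(" ") is that a double space produces an empty string, and an empty string is found inside every anchor text, which would inflate the keyword percentage. Here is a small sketch of a more forgiving splitter; the function name query_one_grams is my own, and lowercasing is an assumption about how you want to match.

```python
def query_one_grams(query: str) -> list:
    """Lowercase the query, split on any whitespace, and drop duplicate words."""
    grams = []
    for gram in query.lower().split():
        if gram not in grams:
            grams.append(gram)
    return grams

print(query_one_grams("Python  SEO python"))  # ['python', 'seo']
```

Calling split() with no argument collapses runs of whitespace, so no empty one-grams come out of it.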
Process SERP Result URLs
Loop through the results provided by SERPapi to collect the top URLs for your query. Before calling the Ahrefs API, encode each SERP result URL so it can be passed as an HTTP parameter. Most of the remaining processing in this tutorial happens inside a second loop over these URLs (for x in urls), up to the point where the results are moved into the DataFrame.
for x in results["organic_results"]:
    urls.append(urllib.parse.quote(x["link"]))
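To see what the encoding step does, here is a short round trip on a made-up URL. By default, urllib.parse.quote leaves "/" untouched; passing safe="" encodes it as well. Either form decodes back to the original with unquote.

```python
import urllib.parse

raw = "https://example.com/a b?x=1"  # made-up URL for illustration

default_encoded = urllib.parse.quote(raw)         # "/" is left as-is
fully_encoded = urllib.parse.quote(raw, safe="")  # "/" is encoded too

print(default_encoded)  # https%3A//example.com/a%20b%3Fx%3D1
print(fully_encoded)    # https%3A%2F%2Fexample.com%2Fa%20b%3Fx%3D1
print(urllib.parse.unquote(fully_encoded) == raw)  # True
```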
Retrieve Backlinks Count
Once the URL is encoded (making special characters parameter-friendly), start the first Ahrefs call. We’ll need four different calls because the data we need is spread across four Ahrefs API endpoints. See the full Ahrefs API documentation for details and to modify URL parameters. This first call retrieves backlink and referring-domain counts; after the call, load those values into lists for later use.
for x in urls:
    apilink = "https://apiv2.ahrefs.com/?token="+ahrefs_apikey+"&target="+x+"&limit=1000&output=json&from=metrics_extended&mode=exact"
    get_backlinks = requests.get(apilink)
    getback = json.loads(get_backlinks.text)
    backlinks = getback['metrics']['backlinks']
    backlinks_list.append(backlinks)
    refdomains = getback['metrics']['refdomains']
    refdomains_list.append(refdomains)
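Since all four Ahrefs calls share the same URL shape and differ mainly in the from and limit values, you could build them with one small helper instead of string concatenation. This is a sketch under that assumption; build_ahrefs_url is a hypothetical name, and it expects the raw, unencoded target URL because urlencode does the encoding itself (passing an already-encoded URL would encode it twice).

```python
from urllib.parse import urlencode

AHREFS_ENDPOINT = "https://apiv2.ahrefs.com/"

def build_ahrefs_url(token, target, source, limit=1000, **extra):
    """Assemble an Ahrefs request URL; `source` fills the `from` parameter."""
    params = {
        "token": token,
        "target": target,  # raw URL; urlencode percent-encodes it
        "limit": limit,
        "output": "json",
        "from": source,
        "mode": "exact",
    }
    params.update(extra)  # e.g. order_by="ahrefs_rank:desc"
    return AHREFS_ENDPOINT + "?" + urlencode(params)

url = build_ahrefs_url(
    "TOKEN", "https://example.com/", "backlinks",
    limit=2000, order_by="ahrefs_rank:desc",
)
print(url)
```

The resulting string can be passed to requests.get() exactly like the hand-built apilink values.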
Retrieve URL Rank
Next, call the Ahrefs API to get the URL Rank.
    # Still inside the `for x in urls:` loop
    apilink = "https://apiv2.ahrefs.com/?token="+ahrefs_apikey+"&target="+x+"&limit=1000&output=json&from=ahrefs_rank&mode=exact"
    get_rank = requests.get(apilink)
    getrank = json.loads(get_rank.text)
    rank = getrank['pages'][0]['ahrefs_rank']
    rank_list.append(rank)
Retrieve Backlink Avg Rank
The third API call retrieves the URL’s full backlink profile (you may need to adjust the limit parameter). From that profile, compute the average Ahrefs rank of the linking pages.
    # Still inside the `for x in urls:` loop
    apilink = "https://apiv2.ahrefs.com/?token="+ahrefs_apikey+"&target="+x+"&limit=2000&output=json&from=backlinks&order_by=ahrefs_rank%3Adesc&mode=exact"
    get_bl_rank = requests.get(apilink)
    getblrank = json.loads(get_bl_rank.text)
    all_bl_ratings = []
    for y in getblrank['refpages']:
        all_bl_ratings.append(y['ahrefs_rank'])
    # Guard against URLs with no backlinks to avoid dividing by zero
    bl_avg_ur = round(sum(all_bl_ratings)/len(all_bl_ratings)) if all_bl_ratings else 0
    backlink_UR_list.append(bl_avg_ur)
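The averaging step can also be pulled into a small function, which makes it easy to test on hand-made data. In this sketch the name average_rank and the sample values are mine; it assumes each entry is a dict with an "ahrefs_rank" key, as in the loop above, and returns a default when there is nothing to average.

```python
def average_rank(refpages, default=0):
    """Rounded mean of the 'ahrefs_rank' values, or `default` if there are none."""
    ranks = [p["ahrefs_rank"] for p in refpages if p.get("ahrefs_rank") is not None]
    if not ranks:
        return default
    return round(sum(ranks) / len(ranks))

# Made-up backlink entries for illustration.
sample_refpages = [{"ahrefs_rank": 40}, {"ahrefs_rank": 25}, {"ahrefs_rank": 10}]

print(average_rank(sample_refpages))  # 25
print(average_rank([]))               # 0
```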
Retrieve Anchor Text
The fourth and final API call retrieves all anchor text for the URL’s backlinks. We’ll use this to identify the most frequent anchor text and to compute the percentage of query one-grams in the anchor profile.
    # Still inside the `for x in urls:` loop
    apilink = "https://apiv2.ahrefs.com/?token="+ahrefs_apikey+"&target="+x+"&limit=3000&output=json&from=anchors&mode=exact"
    get_anchor = requests.get(apilink)
    getanchor = json.loads(get_anchor.text)
Query 1-grams in Anchor Text and Top Referring Domain Anchor Text
Next, count how many of the one-gram words from the query appear across all anchor texts. This is a useful measure of topical and keyword signal strength in backlink anchor texts. We’ll use that count to compute a percentage later. We also determine the most frequent anchor text by referring-domain count: as we loop through each anchor, compare it to the current best and update the stored top anchor accordingly.
    # Still inside the `for x in urls:` loop
    kw_an_count = 0
    for count, x in enumerate(getanchor['anchors']):
        # Count anchors that contain at least one query one-gram
        if any(x['anchor'].find(check) > -1 for check in keyword_gram_list):
            kw_an_count += 1
        # Track the anchor with the most referring domains
        if count == 0:
            an_rd = x['refdomains']
            top_an_rd = x['anchor']
        elif an_rd < x['refdomains']:
            an_rd = x['refdomains']
            top_an_rd = x['anchor']
Find Top Anchor Text By Backlinks
Now perform the same comparison using backlink counts. Boilerplate areas on some sites can produce thousands of backlinks from the same source, which can skew simple frequency counts. Comparing top anchors by referring domains alongside total backlinks helps surface more meaningful signals.
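        # Continues inside the anchor loop: track the anchor with the most backlinks
        if count == 0:
            an_bl = x['backlinks']
            top_an_bl = x['anchor']
        elif an_bl < x['backlinks'] or top_an_bl == "":
            an_bl = x['backlinks']
            top_an_bl = x['anchor']
    # After the anchor loop: label empty anchors so the table stays readable
    if top_an_rd == "":
        top_an_rd = "empty anchor"
    if top_an_bl == "":
        top_an_bl = "empty anchor"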
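If you prefer something more compact than the running comparison, Python’s built-in max() with a key function finds the same kind of "top" entry. This is a sketch with made-up anchor data, and the helper name top_anchor is my own. Note one difference from the loop above: it does not skip past an empty anchor that happens to have the most backlinks, it just labels it.

```python
def top_anchor(anchors, metric):
    """Return the anchor text with the highest value for `metric`."""
    if not anchors:
        return "no anchors"
    best = max(anchors, key=lambda a: a[metric])  # first entry wins on ties
    return best["anchor"] or "empty anchor"

# Made-up anchor rows shaped like the entries used in the loop above.
sample_anchors = [
    {"anchor": "python seo", "backlinks": 12, "refdomains": 9},
    {"anchor": "", "backlinks": 400, "refdomains": 2},
    {"anchor": "click here", "backlinks": 30, "refdomains": 5},
]

print(top_anchor(sample_anchors, "refdomains"))  # python seo
print(top_anchor(sample_anchors, "backlinks"))   # empty anchor
```

The sample also shows why comparing both views is useful: one source with 400 links dominates the backlink count but barely registers by referring domains.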
Calculate % Query 1-gram in Anchor Text
With those values collected, compute the percentage of anchors containing a query one-gram and append it to the results list. Also append the top anchor texts by referring domain and by backlinks.
    # Back at the `for x in urls:` loop level, after the anchor loop
    kw_an_ratio = int((kw_an_count/len(getanchor['anchors']))*100)
    kw_an_ratio_list.append(kw_an_ratio)
    top_anchor_rd.append(top_an_rd)
    top_anchor_bl.append(top_an_bl)
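Here is the percentage calculation as a standalone function with a worked example. It’s a sketch with a couple of choices that are mine rather than the script’s: it lowercases the anchor text before matching (the loop above is case-sensitive), it expects lowercase, non-empty one-grams, and it returns 0 when a URL has no anchors instead of dividing by zero. The name keyword_anchor_share is hypothetical.

```python
def keyword_anchor_share(anchors, grams):
    """Percent of anchors (truncated to an int) containing any query one-gram."""
    if not anchors:
        return 0
    hits = sum(
        1 for a in anchors
        if any(g in a["anchor"].lower() for g in grams)
    )
    return int(hits / len(anchors) * 100)

# Made-up anchors: two of the four mention "python" or "seo".
sample_anchors = [
    {"anchor": "Python SEO tips"},
    {"anchor": "click here"},
    {"anchor": "seo tools"},
    {"anchor": ""},
]

print(keyword_anchor_share(sample_anchors, ["python", "seo"]))  # 50
```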
Populate Dataframe and Conditionally Format
Finally, move the data from the lists into the DataFrame created earlier. Because we encoded the URLs for the Ahrefs API, decode them so they are human-friendly.
As an optional final step, use seaborn to apply conditional formatting (a color gradient) to numerical columns for easier scanning. Run all processing before applying seaborn styling, since it converts the DataFrame into a Styler object.
urls = [urllib.parse.unquote(x) for x in urls]
df['URL'] = urls
df['UR'] = rank_list
df['BL'] = backlinks_list
df['Avg BL UR'] = backlink_UR_list
df['RD'] = refdomains_list
df['Top Anchor RD'] = top_anchor_rd
df['Top Anchor BL'] = top_anchor_bl
df['%KW in Anchors'] = kw_an_ratio_list
cm = sns.light_palette("green",as_cmap=True)
df2 = df
df2 = df2.style.background_gradient(cmap=cm)
df2
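One way to make the script more robust is to collect one dict per URL rather than eight parallel lists, so a failed API call can’t leave the lists with different lengths. The sketch below uses only the standard library and made-up values; COLUMNS and make_row are names I’ve introduced for illustration.

```python
COLUMNS = ["URL", "UR", "BL", "RD", "Top Anchor RD", "Top Anchor BL", "%KW in Anchors"]

def make_row(url, ur, bl, rd, top_rd, top_bl, kw_pct):
    """Bundle one URL's metrics into a dict keyed by the column names."""
    return dict(zip(COLUMNS, [url, ur, bl, rd, top_rd, top_bl, kw_pct]))

# Made-up numbers purely for illustration.
rows = [
    make_row("https://example.com/a", 35, 120, 40, "python seo", "click here", 50),
    make_row("https://example.org/b", 28, 60, 22, "seo tools", "seo tools", 33),
]

for row in rows:
    print(row["URL"], row["UR"], row["%KW in Anchors"])
```

A list of dicts like this can be handed straight to pd.DataFrame(rows, columns=COLUMNS) to get the same table layout.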
Sample Output
Below is the output for the query “python seo” and grabbing the first page of results. Nice work from some of my Pythonista buddies JC Chouinard, Ruth Everett, Liraz Postan, and Daniel Heredia Mejais. Looks like I have some work to do with ImportSEM to crack the first page! You can see that even though python.org doesn’t have the highest UR, BL, or RD counts, it has the highest % of the query one-grams in its anchor text. Big signal!

Conclusion
Now you have a framework for analyzing your SERP competitors’ backlink profiles at the query level. Try making the code more efficient and extend it in ways I didn’t consider. I can think of a few ways to extend this script:
- Scrape the SERP URLs for SEO OPF tests to find more patterns and correlations.
- Output top n-grams for anchor text by frequency rather than just the top 1.
- Output average referring-domain rank for each URL.
- Loop through an entire list of keywords instead of just one.
If you’re into SERP analysis, see my other tutorial on calculating readability scores.
Now get out there and try it out! Follow me on Twitter and let me know your Python SEO applications and ideas!
SERP Backlink Profile FAQ
How can Python be utilized to analyze SERP backlink profiles in bulk using Ahrefs for SEO analysis?
Python scripts can interact with the Ahrefs API to perform bulk analysis of SERP backlink profiles and extract SEO insights.
Which Python libraries are commonly used for analyzing SERP backlink profiles with Ahrefs?
Common libraries include requests for API calls, pandas for data manipulation, and other libraries as needed depending on the specific analysis.
What specific steps are involved in using Python to analyze SERP backlink profiles with Ahrefs in bulk?
The process includes connecting to the Ahrefs API, fetching backlink data, preprocessing the data, and using Python to perform in-depth analysis and produce actionable SEO insights.
Are there any considerations or limitations when using Python for bulk analysis of SERP backlink profiles with Ahrefs?
Consider the limits of the Ahrefs API, potential variations in backlink data, and the need for a clear definition of analysis goals and criteria. You may need to update analyses regularly.
Where can I find examples and documentation for analyzing SERP backlink profiles with Python and Ahrefs?
Explore online tutorials, the Ahrefs API documentation, and SEO-specific resources for practical examples and detailed guides on using Python to analyze SERP backlink profiles in bulk with Ahrefs.