
Here’s a step-by-step SEO tutorial for a Python script that retrieves and analyzes domain data using Ahrefs’ API. This tutorial will help SEOs, webmasters, and data analysts monitor domain metrics like broken backlinks, total backlinks, and domain rating.

Note: This script requires an Ahrefs API key. Paid API access is recommended for large-scale or time-sensitive work, since free options typically involve rate limits or delays. Once you have an Ahrefs account, you can get started by obtaining your API key.
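
If you plan to version or share the script, one option is to keep the key out of the file and read it from an environment variable instead of hardcoding it in Step 4. This is a minimal sketch (the variable name AHREFS_API_TOKEN is my own choice, not an Ahrefs requirement):

import os

# Read the Ahrefs API key from the environment instead of hardcoding it.
# Set it beforehand, e.g.: export AHREFS_API_TOKEN="your-key-here"
api_token = os.environ.get("AHREFS_API_TOKEN", "")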

Requirements and Assumptions

  • Basic Python Knowledge: Ensure that Python 3 is installed and you’re comfortable with Python syntax. Alternatively, you can use a notebook like Google Colab.
  • Ahrefs API Access: You’ll need an Ahrefs API key to run the API requests in this script.
  • CSV File of Domains: A CSV file, domains.csv, containing a column named domains with the URLs to analyze (see the sample below).
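
For reference, a minimal domains.csv could look like this (the domains shown are placeholders):

domains
example.com
example.org
example.net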

Step 1: Install Required Libraries

The script relies on pandas and requests for data handling and API requests. Install them with the command below (the leading ! is for notebook environments such as Colab; drop it when running in a terminal):

!pip3 install pandas requests

Step 2: Import Libraries

After installation, import the required Python libraries in your script.

import pandas as pd
import requests
import json
import time
from datetime import date

Step 3: Set Up API and Define Functions

The following functions will connect to Ahrefs and retrieve metrics for each domain:

3.1. Fetch Broken Backlinks

This function retrieves broken backlinks from Ahrefs. Note that you can adjust the “limit” parameter in the query string; I set it to 5000 to save on API costs.

def fetch_broken_backlinks(url,api_token):
    bl_url = "https://api.ahrefs.com/v3/site-explorer/broken-backlinks"
    headers = {
        "Accept": "application/json",
        "Authorization": f"Bearer {api_token}"
    }
    querystring = {"limit": "5000", "select": "http_code", "target": url, "aggregation": "similar_links"}
    
    try:
        response = requests.get(bl_url, headers=headers, params=querystring)
        response.raise_for_status()
        data = response.json()
    except requests.RequestException as e:
        print(f"Request failed: {e}")
        return 0
    
    return len(data.get('backlinks', []))

3.2. Fetch Domain Rating

Retrieve domain rating using Ahrefs’ API.

def fetch_domain_rating(url,api_token):
    dr_url = "https://api.ahrefs.com/v3/site-explorer/domain-rating"
    headers = {
        "Accept": "application/json",
        "Authorization": f"Bearer {api_token}"
    }
    querystring = {"target": url, "date": date.today().strftime("%Y-%m-%d")}

    try:
        response = requests.get(dr_url, headers=headers, params=querystring)
        response.raise_for_status()
        data = response.json()
    except requests.RequestException as e:
        print(f"Request failed: {e}")
        return "n/a"

    return data.get("domain_rating", {}).get("domain_rating", "n/a")

3.3. Fetch Total Backlinks

This function retrieves the total backlinks count.

def fetch_backlinks(url,api_token):
    bl_url = "https://api.ahrefs.com/v3/site-explorer/backlinks-stats"
    headers = {
        "Accept": "application/json",
        "Authorization": f"Bearer {api_token}"
    }
    querystring = {
        "target": url,
        "mode": "exact",
        "output": "json",
        "date": date.today().strftime("%Y-%m-%d")
    }

    try:
        response = requests.get(bl_url, headers=headers, params=querystring)
        response.raise_for_status()
        data = response.json()
    except requests.RequestException as e:
        print(f"Request failed: {e}")
        return 0

    return int(data.get('metrics', {}).get('live', 0))

3.4. Check Domain Status

This function checks the HTTP status of each domain.

def get_status(url):
    try:
        response = requests.get(url, timeout=5)
        return response.status_code
    except requests.RequestException:
        return "n/a"

Step 4: Read the Domain List and Loop Through URLs

Load the domain list from a CSV file and loop through each domain. Note: I'm assuming your domains.csv file doesn't include the protocol in the address. If it does, remove the added protocol from the get_status() call in the code below.

urls = pd.read_csv("domains.csv")["domains"].str.strip().tolist()
results = []
api_token = "" # Enter your ahrefs API key

for count, url in enumerate(urls):
    time.sleep(1)  # Delay to avoid rate limits

    broken_backlinks = fetch_broken_backlinks(url,api_token)
    url_status_code = get_status("https://" + url) #only use if your domains in the CSV don't contain the protocol in the address.
    backlinks = fetch_backlinks(url,api_token)
    rating = fetch_domain_rating(url,api_token)
    
    if count % 10 == 0:
        print(f"{count} - {url_status_code}: {url}")
    
    results.append({
        "URL": url,
        "Status Code": url_status_code,
        "Domain Rating": rating,
        "Broken Backlinks": broken_backlinks,
        "Total Backlinks": backlinks
    })
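
If your CSV mixes bare domains and full URLs, a small normalization helper (a sketch of my own, not part of the original script) lets get_status() work with either form:

def normalize_url(domain):
    # Prepend https:// only when the entry doesn't already include a protocol.
    return domain if domain.startswith(("http://", "https://")) else "https://" + domain

# Usage inside the loop, replacing the hardcoded prefix:
# url_status_code = get_status(normalize_url(url))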

Step 5: Compile Data in a DataFrame and Export

The collected data for each URL is stored in a DataFrame and exported to domain_metrics.csv.

df = pd.DataFrame(results)
df.to_csv("domain_metrics.csv", index=False)
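
As a quick sanity check before opening the exported file, you can preview the first few rows:

print(df.head())   # first five rows of the collected metrics
print(df.shape)    # (number of domains processed, number of columns)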

Example Output

The output CSV file (domain_metrics.csv) will contain the following columns:

  • URL: The domain name.
  • Status Code: The HTTP status code of the domain.
  • Domain Rating: The Ahrefs domain rating.
  • Broken Backlinks: Count of broken backlinks.
  • Total Backlinks: Total live backlinks.

Conclusion

This Python script automates the process of data extraction and analysis from Ahrefs, offering useful insights into domain metrics. It’s valuable for SEO marketers, webmasters, and data analysts to evaluate and improve site health.

Following this guide, you can modify the script to suit various SEO needs or extend its functionality by adding more metrics.
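
As one example of such an extension (a sketch of my own, with an arbitrary 5% threshold), you could flag domains whose broken backlinks make up a large share of their total backlinks, without any extra API calls:

# Share of broken backlinks per domain; domains with zero backlinks become NaN.
df["Broken Share"] = df["Broken Backlinks"] / df["Total Backlinks"].replace(0, float("nan"))

# Keep only domains above the threshold, worst first, and export them.
needs_attention = df[df["Broken Share"] > 0.05].sort_values("Broken Share", ascending=False)
needs_attention.to_csv("domains_needing_attention.csv", index=False)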

Follow me at: https://www.linkedin.com/in/gregbernhardt/

Greg Bernhardt