Audit URLs for SEO Using Ahrefs Backlink API Data
Here’s a step-by-step SEO tutorial for a Python script that retrieves and analyzes domain data using Ahrefs’ API. This tutorial will help SEOs, webmasters, and data analysts monitor domain metrics like broken backlinks, total backlinks, and domain rating.
Note: This script requires an Ahrefs API key. For large-scale or fast data needs, paid APIs are recommended, while free API options might involve rate limits or delays. With an Ahrefs account, you can get started by obtaining your API key.
Requirements and Assumptions
- Basic Python Knowledge: Ensure that Python 3 is installed and you’re comfortable with Python syntax. Alternatively, you can use a notebook like Google Colab.
- Ahrefs API Access: You’ll need an Ahrefs API key to run the API requests in this script.
- CSV File of Domains: A CSV file, domains.csv, containing a column named domains with the URLs to analyze.
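To make the expected input concrete, here is a minimal sketch that writes a sample file with a single domains column and reads it back the way the script does later. The file name domains_sample.csv and the three example domains are placeholders, not part of the original script; rename the file to domains.csv (or point the script at it) once it holds your own list.

```python
import pandas as pd

# Hypothetical sample domains; replace these with your own list.
sample = pd.DataFrame({"domains": ["example.com", "example.org", "example.net"]})

# Written under a separate name so an existing domains.csv is not overwritten.
sample.to_csv("domains_sample.csv", index=False)

# Read the column back and strip stray whitespace, as the main loop does.
domains = pd.read_csv("domains_sample.csv")["domains"].str.strip().tolist()
print(domains)
```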
Step 1: Install Required Libraries
The script relies on pandas and requests for data handling and API requests. Install these libraries with the command below (the leading ! is notebook syntax; drop it when running in a terminal):
!pip3 install pandas requests
Step 2: Import Libraries
After installation, import the required Python libraries in your script.
import pandas as pd
import requests
import json
import time
from datetime import date
Step 3: Set Up API and Define Functions
The following functions will connect to Ahrefs and retrieve metrics for each domain:
3.1. Fetch Broken Backlinks
This function requests broken backlinks from Ahrefs and returns how many came back. Note that you can adjust the “limit” in the query string; I set it to 5000 to save on API costs.
def fetch_broken_backlinks(url, api_token):
    bl_url = "https://api.ahrefs.com/v3/site-explorer/broken-backlinks"
    headers = {
        "Accept": "application/json",
        "Authorization": f"Bearer {api_token}"
    }
    querystring = {"limit": "5000", "select": "http_code", "target": url, "aggregation": "similar_links"}
    try:
        response = requests.get(bl_url, headers=headers, params=querystring)
        response.raise_for_status()
        data = response.json()
    except requests.RequestException as e:
        print(f"Request failed: {e}")
        return 0
    return len(data.get('backlinks', []))
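Because the function counts the rows that come back, the result can never exceed the limit, so a count equal to the limit may mean the list was cut off. The sketch below shows the counting step on a made-up payload and flags that case. The helper name count_broken_backlinks and the sample data are illustrative assumptions; the only thing taken from the script is the "backlinks" key it reads.

```python
def count_broken_backlinks(payload, limit=5000):
    """Return (count, maybe_truncated) for a decoded broken-backlinks response."""
    rows = payload.get("backlinks", [])
    count = len(rows)
    # Hitting the limit exactly suggests there may be more rows than were returned.
    return count, count >= limit


# Made-up payload shaped like what the script expects (a "backlinks" list).
sample_payload = {"backlinks": [{"http_code": 404}, {"http_code": 410}, {"http_code": 404}]}

count, maybe_truncated = count_broken_backlinks(sample_payload, limit=5000)
print(count, maybe_truncated)
```

If the flag comes back true for a domain, raising the limit for that one target is an option, at a higher API cost.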
3.2. Fetch Domain Rating
This function retrieves the domain rating for today’s date using Ahrefs’ API.
def fetch_domain_rating(url, api_token):
    dr_url = "https://api.ahrefs.com/v3/site-explorer/domain-rating"
    headers = {
        "Accept": "application/json",
        "Authorization": f"Bearer {api_token}"
    }
    querystring = {"target": url, "date": date.today().strftime("%Y-%m-%d")}
    try:
        response = requests.get(dr_url, headers=headers, params=querystring)
        response.raise_for_status()
        data = response.json()
    except requests.RequestException as e:
        print(f"Request failed: {e}")
        return "n/a"
    return data.get("domain_rating", {}).get("domain_rating", "n/a")
3.3. Fetch Total Backlinks
This function retrieves the total backlinks count.
def fetch_backlinks(url, api_token):
    bl_url = "https://api.ahrefs.com/v3/site-explorer/backlinks-stats"
    headers = {
        "Accept": "application/json",
        "Authorization": f"Bearer {api_token}"
    }
    querystring = {
        "target": url,
        "mode": "exact",
        "output": "json",
        "date": date.today().strftime("%Y-%m-%d")
    }
    try:
        response = requests.get(bl_url, headers=headers, params=querystring)
        response.raise_for_status()
        data = response.json()
    except requests.RequestException as e:
        print(f"Request failed: {e}")
        return 0
    return int(data.get('metrics', {}).get('live', 0))
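The three Ahrefs functions repeat the same headers and error handling, so one optional refactor is a shared request helper. This is a sketch rather than part of the original script: the names ahrefs_get and total_backlinks, the optional session argument, and the 30-second timeout are all assumptions added for illustration. The endpoint path and query parameters are the ones used above.

```python
from datetime import date

import requests


def ahrefs_get(endpoint, params, api_token, session=None, timeout=30):
    """GET a Site Explorer endpoint; return the decoded JSON, or None on failure."""
    http = session or requests  # a session (or a test double) can be passed in
    url = f"https://api.ahrefs.com/v3/site-explorer/{endpoint}"
    headers = {"Accept": "application/json", "Authorization": f"Bearer {api_token}"}
    try:
        response = http.get(url, headers=headers, params=params, timeout=timeout)
        response.raise_for_status()
        return response.json()
    except requests.RequestException as e:
        print(f"Request failed: {e}")
        return None


def total_backlinks(url, api_token, session=None):
    """Same query as fetch_backlinks, routed through the shared helper."""
    params = {
        "target": url,
        "mode": "exact",
        "output": "json",
        "date": date.today().strftime("%Y-%m-%d"),
    }
    data = ahrefs_get("backlinks-stats", params, api_token, session=session)
    # A failed request yields None, which is treated as zero live backlinks.
    return int((data or {}).get("metrics", {}).get("live", 0))
```

Accepting a session also makes the helper easy to exercise without spending API credits, since any object with a compatible get() method can stand in for the network.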
3.4. Check Domain Status
This function checks the HTTP status of each domain.
def get_status(url):
    try:
        response = requests.get(url, timeout=5)
        return response.status_code
    except requests.RequestException:
        return "n/a"
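get_status() hands the URL straight to requests.get(), which raises a MissingSchema error (a kind of RequestException) for an address without a protocol, so a bare domain like example.com would come back as "n/a". A small helper can add the protocol only when it is missing, which works whether or not the CSV includes it. The name ensure_scheme and the https default are assumptions for this sketch.

```python
def ensure_scheme(url, default="https"):
    """Prefix the URL with a scheme unless it already starts with http:// or https://."""
    url = url.strip()
    if url.lower().startswith(("http://", "https://")):
        return url
    return f"{default}://{url}"


print(ensure_scheme("example.com"))
print(ensure_scheme("http://example.org"))
```

The status check could then be called as get_status(ensure_scheme(url)) regardless of how the domains are stored.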
Step 4: Read the Domain List and Loop Through URLs
Load the domain list from a CSV file and loop over it to process each domain. Note: this assumes the domains in your CSV file don’t include the protocol (for example, example.com rather than https://example.com). If they do, remove the "https://" prefix that the code below adds to the get_status() argument.
urls = pd.read_csv("domains.csv")["domains"].str.strip().tolist()
results = []
api_token = ""  # Enter your Ahrefs API key

for count, url in enumerate(urls):
    time.sleep(1)  # Delay to avoid rate limits
    broken_backlinks = fetch_broken_backlinks(url, api_token)
    # Only add "https://" if the domains in your CSV don't include the protocol.
    url_status_code = get_status("https://" + url)
    backlinks = fetch_backlinks(url, api_token)
    rating = fetch_domain_rating(url, api_token)

    if count % 10 == 0:
        print(f"{count} - {url_status_code}: {url}")

    results.append({
        "URL": url,
        "Status Code": url_status_code,
        "Domain Rating": rating,
        "Broken Backlinks": broken_backlinks,
        "Total Backlinks": backlinks
    })
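The loop leaves api_token as an empty string to fill in by hand. If you would rather keep the key out of the script, one common alternative is to read it from an environment variable. This is a sketch: the variable name AHREFS_API_TOKEN and the helper load_api_token are made up for this example, not something Ahrefs or the original script defines.

```python
import os


def load_api_token(env_var="AHREFS_API_TOKEN"):
    """Read the API key from an environment variable and fail early if it is missing."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable before running the script.")
    return token
```

With that in place, api_token = load_api_token() would replace the hard-coded empty string, and a missing key stops the run before any requests are sent.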
Step 5: Compile Data in a DataFrame and Export
The metrics collected for each URL are loaded into a DataFrame and exported to domain_metrics.csv.
df = pd.DataFrame(results)
df.to_csv("domain_metrics.csv", index=False)
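Once the metrics are in a DataFrame, a first pass at the audit might be to pull out the domains that did not return a 200 status and rank them by broken backlinks. The rows below are invented to mirror the columns the script produces, and the name needs_attention is just illustrative. Note that the "n/a" placeholders have to be converted before the columns can be compared or sorted as numbers.

```python
import pandas as pd

# Made-up rows in the same shape as the script's results list.
results = [
    {"URL": "example.com", "Status Code": 200, "Domain Rating": 71.0,
     "Broken Backlinks": 12, "Total Backlinks": 4800},
    {"URL": "example.org", "Status Code": "n/a", "Domain Rating": "n/a",
     "Broken Backlinks": 0, "Total Backlinks": 0},
    {"URL": "example.net", "Status Code": 404, "Domain Rating": 35.0,
     "Broken Backlinks": 150, "Total Backlinks": 600},
]
df = pd.DataFrame(results)

# Turn "n/a" placeholders into NaN so both columns become numeric.
df["Status Code"] = pd.to_numeric(df["Status Code"], errors="coerce")
df["Domain Rating"] = pd.to_numeric(df["Domain Rating"], errors="coerce")

# Domains that failed the status check or returned something other than 200,
# with the most broken backlinks first.
needs_attention = df[df["Status Code"] != 200].sort_values(
    "Broken Backlinks", ascending=False
)
print(needs_attention[["URL", "Status Code", "Broken Backlinks"]])
```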
- Audit URLs for SEO Using ahrefs Backlink API Data - November 11, 2024