Entries by Greg Bernhardt

Evaluate Subreddit Posts in Bulk Using GPT4 Prompting

Reddit is a goldmine of user-generated content—people share their thoughts, ask questions, and discuss topics of all kinds. By analyzing subreddit posts, you can gain insights into what users care about, emerging trends, and pain points. Pair this with OpenAI’s powerful language models, and you can quickly generate summaries, topic suggestions, or keyword ideas tailored […]

Calculate Similarity Between Article Elements Using spaCy

In this Python SEO tutorial, we’ll walk through a Python script that uses spaCy to calculate similarity metrics between content keywords and the body of an article. This analysis can help SEOs and content creators assess content relevance and keyword alignment. Using Natural Language Processing (NLP), we’ll compute similarity scores to gauge how well keywords […]

Audit URLs for SEO Using ahrefs Backlink API Data

Here’s a step-by-step SEO tutorial for a Python script that retrieves and analyzes domain data using Ahrefs’ API. This tutorial will help SEOs, webmasters, and data analysts monitor domain metrics like broken backlinks, total backlinks, and domain rating. Note: This script requires an Ahrefs API key. For large-scale or fast data needs, paid APIs are […]

Storing CrUX CWV Data for URLs Using Python for SEOs

The CWV panic days appear to be over, but that doesn’t mean keeping tabs on the data isn’t useful. CrUX data is useful for SEOs because it offers performance analysis opportunities for real user Core Web Vital metrics like Largest Contentful Paint (LCP), First Input Delay (FID), and Cumulative Layout Shift (CLS) which are minor […]

Scraping YouTube Video Page Metadata with Python for SEO

In this tutorial, we will explore a Python script designed to scrape and analyze YouTube video metadata for free. This framework can be an excellent starting point for a tool to assist SEOs, content creators, and data analysts. Note that paid APIs such as SerpAPI can process requests much faster and more reliably, but are not free. […]

Calculate SERP Rank Readability Scores Using Python

Readability scores are not a verified SEO ranking factor. I repeat: readability scores are not a verified SEO ranking factor. So why care? You might care because not matching the readability level of your audience may result in higher bounce rates, lower engagement, and fewer conversions. It’s quite simple: your audience expects content to […]

Find Interlinking Opps via Entity N-gram Matches Using Python

Any seasoned SEO will know that finding internal linking opportunities at scale can be a difficult but very important process. This is especially true if your content is not well organized topically. If you have a blog that is a mess or seemingly full of random articles and you’re tasked with internally linking them […]

Collect Domain Security Information with Python

In this tutorial, we will learn how to automate the collection of various domain-related technical information using Python. The script will gather data such as WHOIS details, DNS records, SSL certificates, reverse IP lookup, blacklist status, robots.txt, and more. Using the pandas library, we will also show how to store the collected data in a […]

Build and Run Python Scripts on the Fly With GPT-3

GPT-3 and its variants have taken the world by storm, and for good reason. It’s an exciting time full of possibilities. The limits are being pushed every day. Python as an SEO skill has always been a bit niche due to its learning curve. Today, we start blowing that learning curve out of the water! […]

Compare Keyword SERP Similarity in Bulk with Python

One of SEO’s oldest and still best ways to understand keywords and how Google treats them is to study the search engine result pages. Oftentimes, when doing keyword research, we’ll end up with a large list of keywords. We’re then faced with validating and cleaning the list of prospects. We care about cleaning because […]

Analyze SERP Backlink Profiles in Bulk for SEO Using Python

The importance of the backlink as a quality signal has changed little since the early 2000s. The algorithms have evolved, but backlinks remain a strong ranking factor. Looking up your own backlink stats and comparing them to your competitors’ is an SEO staple. What can be new in this process is leveling up […]

Detect Generic Anchor Text in Links for SEO using Python

Optimizing anchor text for internal links has been a staple activity within SEO for a very long time. Google even has an entry on anchor text in their SEO guidelines. Anchor text provides the user and search engine with valuable contextual tags for the topical nature of the page you’re linking to. This is a […]

Detect Text in Images in Bulk With Tesseract Using Python for SEO

Imagery in articles can be a wonderful communication device when used correctly. One issue that still plagues SEO content teams is how to properly handle text in images. Historically, text within an image is trapped and the contextual message is lost to search engines that didn’t have the processing power to decode (they still likely […]

Extracting Data from PDFs Using PDFMiner

PDF files are ubiquitous in various industries, but programmatically extracting data from them can be complex. PDFMiner, a powerful Python library, helps parse and extract content from PDFs in formats like plain text, HTML, XML, or tagged text. This tutorial explains how to use a comprehensive PDF extraction script. We’ll explore its structure and functionality […]

Classify Anchor Text N-Grams for Interlinking Insights with Python

In this Python SEO tutorial, I’ll show you a programmatic method to start analyzing your internal anchor text for topical relevance. Internal anchor text remains one of the most powerful topical endorsements you can provide. Anchor texts are explicit contextual signals Google can use to help understand and calculate the linked page’s topical authority. Let’s […]

Webpage Word Sense Disambiguation for SEO Using Python and NLTK

In semantics, ambiguity is partially defined as a word having multiple “senses”. A sense is a meaning or definition. Effective content in SEO should be as free of ambiguity as possible. When you have ambiguity in your content, you risk machines (that evaluate your content via natural language understanding) not being able to understand your […]

Calculate GSC CTR Stats By Position Using Python for SEO

Last week seoClarity came out with a new SERP CTR study. The numbers were lower than I expected, even as an average for all queries. It got me thinking. What is MY average CTR by position? Turns out, it’s much higher. This is likely due to good SEO by optimizing the title, meta, and […]

Use Python and Google Trends to Forecast Your Top GSC Keywords

Google Search Console already gives SEOs amazing historical data for how the queries you rank for are performing. Google Trends is also a useful platform that can give insights into a query’s relative popularity within Google’s system (by geo) historically and a little forecasting for the future. What if we could begin to marry these […]

Detect Google SERP Title and Snippet Rewrites with Python

Back in early August of 2021, word began to travel through the industry that titles were being rewritten in the Google SERPs at a frequency and in a manner not seen before. Plenty of SERP analysis has been done to understand the why, how, and what to do about it, but it starts with an analysis of […]

Use Python to Create a GSC to BigQuery Pipeline

Google Search Console is likely the most important source of data for an SEO. However, like most GUI platforms, it suffers from the same large downside. You’re stuck in a GUI that only gives you 16 months of data. You can manually export data to a Google Sheet. Exporting to a Google Sheet is fine, […]

Overlay GSC Data with Google Algo Updates Using Python

Most SEOs’ hearts skip a beat when they hear a Google algorithm update is unfolding, and for a few days they relentlessly check analytics. Then there is a natural lull, the panic or excitement fades, and you get back to your work. Google algorithm updates don’t always result in a dramatic spike one way or another. It […]

Build an N-Gram Text Analyzer for SEO using Python

The days when content SEO was simply copywriting are over. Modern content SEO now employs massive resources for technical analysis of the words you write or manage. Actually, this has been the case for nearly 10 years now with the introduction of machine learning in search engines. The tools are now widely available to SEOs to achieve […]

Bulk IP Filter for Google Analytics Using Python and RegEx

Not every Python script needs to be complex, long, and a work of art. Sometimes it can help with quick mundane tasks. One such opportunity presented itself a couple of weeks ago when a client asked for 50+ IPs to be filtered from a Google Analytics view. We could have taken 20 minutes and manually added […]

Compare Wikipedia Search Data with Google Trends with Python

There are countless ways to understand trends, which are important in understanding the past, present, and future. I’m sure everyone is familiar with Google Trends. No doubt it’s very powerful, but there are other options as well. One is Wikipedia. Wikipedia currently is the 4th most visited website in the US. If only there were a […]

Measure Causal Impact from GSC Data Using Python

Causal Impact is a Bayesian-like statistical algorithm pioneered by Kay Brodersen while working at Google that aims to predict the counterfactual after an event. Say, for example, you make a large SEO change to a website. Sometimes it’s not obvious whether or not the change was beneficial. You can compare against the past, but the past […]

Competitive SEO URL Analysis with Python

Match your URLs to your competitors’ URLs and find title keyword and ranking keyword count differences with this step-by-step Python SEO tutorial. SEO is not an island. You are not simply improving your site/pages in a vacuum. You need to consider your competition, as you all are jockeying for positions in the same SERPs. Some URLs […]

Use Python to Label Query Intent, Entities and Keyword Count

Query analysis is a large topic, but I wanted to focus on intent and entity recognition. Intent and entity recognition are very important concepts to understand in SEO. Google’s use of machine learning has rapidly increased since 2013 when they introduced their Knowledge Graph. For intent, what is important is how Google’s understanding of the […]

Generate a 404 Redirect List for SEO with Polyfuzz Using Python

We’ve all had a client where we pop into their Google Search Console or Ahrefs account and see they have hundreds or thousands of reported 404s. Perhaps from a migration or perhaps a decade of regular pruning. This tutorial won’t cover evaluating whether they are worth redirecting or not, but rather simply the case if […]

Greg Bernhardt Joins Webinar on How to Perform a Content Audit

I had a great time joining this webinar on content audits for SEO. Thanks for the opportunity Authoritas! Was great to see and hear from Laura Monckton and Daniel Heredia Mejias! Sadly no Python in this webinar, but great advice nonetheless! See the outline and video below… When Evaluating Content for SEO, Consider These 7 Core Concepts: Accessibility […]

Analyze Words Using WordsAPI App and Python for SEO

Ask any SEO writer: the words you choose for your copy matter. Sometimes we think we know the attributes, relationships, and universe that words live in, but often we don’t. It can be a challenge to generate ideas and inspiration from singular words. Understanding words can help you explore possibilities for your content that […]

Scraping YouTube Video Pages for SEO with Python

I had a project this week that tasked my team with optimizing YouTube tags for a couple hundred videos. We could do it manually but thought this was a nice chance to use Python. Our idea was we could scrape YouTube video page information, put it into a spreadsheet for easier organization and identification of […]

SEO Data Blending with Python for Beginners

Data is everything for an SEO and it’s all too often scattered across proprietary platforms that do a good job of visualizing and analyzing that data according to how they think you want. Even when these platforms give you export methods you still need to load it into Excel and Google Sheets and perform some […]

Crawl and Optimize All Website Images With Python

Last month I released a tutorial for automating new image optimization over FTP. This time we’re going to crawl an entire website and locally optimize the images we come across, organized by URL. Note that this short but intermediate-level script is not for massive sites as is. For one thing, all images are dumped […]

Automate Image Compression with Python over FTP

Image compression isn’t new to the tech SEO world, but with site performance in the form of Core Web Vitals being a ranking factor, the time is now to start taking action. I’ve done dozens and dozens of site audits and I find that 80% of site performance issues can be bucketed under images or JavaScript. […]

Create a Custom Twitter Tweet Alert System with Python

Do you follow hundreds or even thousands of accounts on Twitter? Do you find yourself missing important announcement tweets from Google Search accounts? Were you late to know about the latest core update? No more! In this tutorial I’m going to show, using Python, how to create a very simple Twitter alert system. We’re going […]

Find Search Volume Ceiling for Keyword Categories Using Python

In this tutorial, I’m going to show, using Python, how to generate broad keyword categories using current ranking keyword data and then auto-label those keywords with the categories. This is useful in getting a broad overview of what topics you’re ranking the most for and what the potential search volume ceilings are for each category. […]

Analyze Crawled PDF Text Using Python for SEO

Google has been indexing PDFs for many years and ranks them among web pages so it’s only logical to analyze your PDFs for optimization opportunities just like a web page. Obviously in general that is a much harder task given the file’s constraints, but we can start this process with the help of Python. I’m […]

Use Python and Brightlocal API to Grab Your Keyword Rankings

BrightLocal has been in the SEO toolbox for local ranking and citations for a number of years now. BrightLocal has an extensive API system allowing you to retrieve info and automate a lot of processes. In this tutorial, I’m going to show you how easy it is to grab your ranking data from their API […]

How SEOs Can Use Python to Automate Lighthouse Reports

Google’s web page scanner Lighthouse has been a fixture as one of the most important tools to use when evaluating a web page. This scanner at a high level measures your page’s performance, SEO, accessibility, and best practices. At a deeper level, it gives more granular metrics for each of those categories and displays recommendations. […]

Detect Web Page Technologies with BuiltWith API and Python

For SEO audits, one thing you may want to do is detect and store the different technologies a website is using. Sure, you can spot-check and run a few console commands, but what if you could have an API do it all for you? The service BuiltWith has just that capability! BuiltWith has several very interesting […]

Getting Started with Google NLP API Using Python

For search engines and SEO, Natural Language Processing (NLP) has been a revolution. NLP is simply the process and methodology for machines to understand human language. This is important for us to understand because machines are doing the bulk of page evaluation, not humans. While knowing at least some of the science behind NLP is […]

Use Python to Scrape Technical Info for Domains

SEOs wear many hats and, from time to time, whether during a technical audit or technical troubleshooting, it’s nice to have public technical information handy for a domain you’re working on. Below are some Python tools you can use to easily grab that available domain information. It would be easy to loop this over your […]

Website Uptime Monitor With LEDs and LCD Screen Using Python

Earlier, in the tutorial SEO Guide to Creating a Website Uptime Monitor Using Python, I showed you how to create a simple uptime monitor and store that information in MySQL. This next phase is a direct extension of the previous tutorial and assumes you have that all set up. The code in the tutorial will […]

SEOs Can Retrieve the Google Cache Date for URLs Using Python

Viewing cached links in Google is often used by SEOs as a troubleshooting or information recovery method. Google caches some of the web pages it crawls and creates a type of snapshot of that page at the time of the crawl. You’ll often notice some resources or images aren’t rendered so it’s rarely a perfect […]

Extract Google Suggestions API Data for SEO Insights with Python

One of the main tenets of SEO is understanding the search climate for the keywords you are targeting. That means understanding what the trends are, what people are searching for and in what volume, and generating ideas from those results. An often overlooked API is Google Suggestions. It’s essentially tapping into Google’s search autocomplete feature. We can […]

Find Keyword Opportunities with Google Trends, Python and Ahrefs

Google Trends has long been a powerful tool at the SEO’s disposal. Understanding historical and present trends, and forecasting future ones, lets us understand things like seasonality and generational events like the Coronavirus. Who back in 2019 would have thought that toilet paper would hit 100 on Google Trends in March 2020? The web interface of Google […]

Submit a WordPress Gravity Form via API with Python

Gravity Forms is a popular WordPress form plugin. If you are a leads-based business you will want to know that your form is working and that leads can contact you. We’ve all had the feeling of looking at the entry log and seeing an unusual gap between the last entry and the current moment. Maybe […]

Use Python and Chrome to Take Webpage Screenshots

In 2017 Chrome released a headless (no GUI) feature that can take a screenshot of a single web page from a specified viewport. This helps in keeping an archive for version comparison, monitoring, and client-facing deliverables. Because it’s a headless feature, it’s perfect for use with Python. In just a few lines we’ll run […]

How to Get Cached Pages From Wayback Machine API

Archive.org’s Wayback Machine has been a staple in the SEO industry for looking back at cached historical web pages. Each cached page is called a snapshot. It’s great for tracking progress, troubleshooting issues, or if you are lucky, recovering data. Using the Wayback Machine GUI is not always quick or frustration-free. Using the steps below, […]