Find Search Volume Ceiling for Keyword Categories Using Python

Estimated Read Time: 4 minute(s)

Common Topics: data, keyword, categories, enlighter, using

In this tutorial, I’m going to show, using Python, how to generate broad keyword categories using current ranking keyword data and then auto label those keywords with the categories. This is useful in getting a broad overview of what topics you’re ranking the most for and what potential search volumes ceilings are for each category. The biggest opportunities could be in those categories with the largest total volume ceilings. The potential search volumes show estimates of what you could expect if you ranked #1 for every keyword and every user clicked your link. Obviously, not realistic, but the upper bound data is worthwhile.

Table of Contents

Requirements and Assumptions

Python 3 is installed and basic Python syntax understood
Access to a Linux installation (I recommend Ubuntu) or Google Colab.
Keyword Data with search volume (Semrush organic research export)

Import Modules and Data Table Extension

Let’s start by importing the pandas module and the data table Google Colab extension (for easier dataframe viewing).

import pandas as pd
%load_ext google.colab.data_table

Now we load the CSV keyword data into two different dataframes. I’m confident there is a clear way to do everything we need using one dataframe, but this was easiest for me. In the first dataframe we just need the Keyword column in order to generate the keyword counts we’ll use to auto label them into categories.

df = pd.read_csv("rckey.csv")['Keyword']
df2 = pd.read_csv("rckey.csv")

With the keywords data loaded in both dataframe we’ll start with the first one to generate the keyword categories. The Keyword column contains mostly phrases and to evaluate the word for word we split() the phrases by whitespace in a list, use explode() to create new rows from the words in that list, count the unique words in that column and reset the dataframe index.

df = df.str.split().explode().value_counts().reset_index()

The column likely contains a lot of stop words, so let’s filter those out. Add words to the list as needed.

stop = ['1','2','3','4','5','6','7','8','9','get','ourselves', 'hers','us','there','you','for','that','as','between', 'yourself', 'but', 'again', 'there', 'about', 'once', 'during', 'out', 'very', 'having', 'with', 'they', 'own', 'an', 'be', 'some', 'for', 'do', 'its', 'yours', 'such', 'into', 'of', 'most', 'itself', 'other', 'off', 'is', 's', 'am', 'or', 'who', 'as', 'from', 'him', 'each', 'the', 'themselves', 'until', 'below', 'are', 'we', 'these', 'your', 'his', 'through', 'don', 'nor', 'me', 'were', 'her', 'more', 'himself', 'this', 'down', 'should', 'our', 'their', 'while', 'above', 'both', 'up', 'to', 'ours', 'had', 'she', 'all', 'no', 'when', 'at', 'any', 'before', 'them', 'same', 'and', 'been', 'have', 'in', 'will', 'on', 'does', 'yourselves', 'then', 'that', 'because', 'what', 'over', 'why', 'so', 'can', 'did', 'not', 'now', 'under', 'he', 'you', 'herself', 'has', 'just', 'where', 'too', 'only', 'myself', 'which', 'those', 'i', 'after', 'few', 'whom', 't', 'being', 'if', 'theirs', 'my', 'against', 'a', 'by', 'doing', 'it', 'how', 'further', 'was', 'here', 'than']

df = df[~df['index'].isin(stop)].reset_index()

Next we take the top 20 keyword categories and put them in a list. You can alter this or leave it out if you’re saving to a spreadsheet, but for charting 20 is a good number.

topkeywords = df['index'][:20].to_list()
print(topkeywords)

Let’s make a quick chart of the counts for each keyword category

df[:20].plot.bar(y='Keyword', x='index', figsize=(15,5), title="Volume", rot=20)

keyword count graph

Next, it’s time to start labeling our keywords with the categories so we can sum up the search volumes. We will be using the second dataframe. We use the function extract() which searches the keyword column for a match using regex and if there is a match, put it in the newly made Marked column. We build our regex using the categories we made from the first dataframe. The variable regex is will end up looking like “word1|word2|word3|word4|word5…” when using join(). For those without a match, we will drop those rows using dropna().

regex = '(' + ('|'.join(topkeywords)) + ')'
df2['Marked'] = df2.Keyword.str.extract(regex)
df2 = df2.dropna()

Now let’s group all the same keyword categories together, sum their search volumes, and sort by search volume descending.

df2 = df2.groupby(['Marked'])['Search Volume'].sum().to_frame().sort_values('Search Volume', ascending=False).reset_index()
df2.head()

dataframe

Lately we plot these columns in a graph and can see where our potential volumes for the keywords we rank for.

df2.plot.bar(y='Search Volume', x='Marked', figsize=(15,5), title="Keyword Search Volume Ceiling", xlabel="Tag", ylabel="Search Volume", rot=20)

keyword colume graph

Conclusion

Now with this framework, you can generate these keyword categories and summed search volumes for any website to mine opportunities. Next tutorial I’ll show how to do this with landing page data and sessions! Now get out there and try it out! Follow me on Twitter and let me know your applications and ideas!

Author
Recent Posts

Follow me

Greg Bernhardt

Sr. SEO Specialist for Shopify. Nearly 20 years of experience in web design, web development, and web marketing. Education in Information Sciences from UW-Milwaukee. Managing the largest online US physics community. Enjoy learning about search engines, SEO, chrome tricks, Python, knowledge graphs, data science, and more!

Follow me

Latest posts by Greg Bernhardt (see all)

Evaluate Subreddit Posts in Bulk Using GPT4 Prompting - December 12, 2024
Calculate Similarity Between Article Elements Using spaCy - November 13, 2024
Audit URLs for SEO Using ahrefs Backlink API Data - November 11, 2024

Requirements and Assumptions

Import Modules and Data Table Extension

Conclusion

Related Python Tutorials