Google Trends, Python, ahrefs

Google Trends has long been a powerful tool at the SEOs disposal. Understanding historical, present and forecasting future trends let’s us understand things like seasonality and generational events like the Coronavirus. Who back in 2019 would have thought toilet paper would hit 100 on Google trends in March 2020? The web interface of Google Trends is super user friendly and you can find out a lot, but it only provides data divorced from your website. What if you could bring in some ranking data to find what URLs rank for trending keywords. These would be opportunities you can focus on now or forecast for in the future. Let’s dive in!

Requirements and Assumptions

  • Python 3 is installed and basic Python syntax understood (Pandas module experience recommended)
  • Access to a Linux installation (I recommend¬†Ubuntu) or Google Colab
  • Ahrefs account (subscription)

Download Ahrefs keyword data

  1. Login at ahrefs.com
  2. Click “Organic Keywords” on the left sidebar
  3. Click “Export” at the top right of the data table
  4. Choose “Full Report” and UTF-8 format
  5. Rename downloaded csv as ranking.csv

Starting the Script

My recommendation is to develop this script in Google Colab as we use a specific Colab library and extension, but can be done elsewhere just fine with a couple minor tweaks.

First let’s install the the module pytrends which will help us tap into Google Trends data. If you are using Colab, put an exclamation mark before pip3 so !pip3.

pip3 install pytrends

Next we import our required modules. Pytrends communicates with Google Trends. The data from Pytrends is part JSON, part Pandas dataframe so we need Pandas to handle that. Time for a script delay to reduce friction with Google Trends. data_table is part of the Google extension you’ll load soon to help enhance dataframe visuals. Obviously if you aren’t running this in Google Colab, you can omit that module and the extension.

from pytrends.request import TrendReq
import pandas as pd
import time
from google.colab import data_table

If you are running this in Google Colab you’ll want to use their data table extension which enhances to normal Pandas dataframe visual into more of a spreadsheet.

%load_ext google.colab.data_table

Assign Keyword and Load Ahrefs Data

Next we create a variable to store our keyword. This is what we will be searching Google Trends for.

keyword = "dog collar"

Now let’s load the Ahrefs keyword data into a Pandas dataframe called rankdf. If you are using Google Colab don’t forget to upload this file from the left sidebar file upload area. Once the csv is in the dataframe we want to remove (drop) everything but the keyword and url columns. Our rankdf will consist of only those two columns.

rankdf = pd.read_csv('ranking.csv')

rankdf.drop(['#', 'Position History','Position History Date','Volume','Difficulty','Traffic (desc)','CPC','Last Update','Page URL inside','SERP Features'], axis=1, inplace=True)

Configure pytrends Module

It’s time to use the pytrends module. First we call the TredReq() function and load it into the pytrends variable as an object. Then we build the kw_list the call payload function will use to search data. This variable must be a list even though in this script we’ll make it a single list item. Others may feel free to extend this into multiple keywords. Then we build the request to send to Google Trends.

pytrends = TrendReq(hl='en-US', tz=360)

kw_list = []
kw_list.append(keyword)

pytrends.build_payload(kw_list, cat=0, timeframe='today 5-y', geo='', gprop='')

Access Related Topics for our Keyword

Now we can access different types of data. Full documentation is here. For the purposes of this script we’re going to access data for “Related Topics” and “Related Queries“. First we’re going to access “Related Queries“. The sub data we want is “formattedValue” which is the breakout %, “topic_title“, which is the name of the related keyword breaking out and then “topic_type“, which gives the keyword (Entity) a category.

To access all of them we need to dig into the data contained in the gettopics object and convert them all into a list.

Lastly we want to wrap the data assignments in a Try/Except because if you send Google Trends a word where it doesn’t return data, because maybe the word is fake or unpopular, it will return an error. We’ll assign a variable to mark if there is no data and use it later.

gettopics = pytrends.related_topics()
stop = 0

try:
  breakout = list(gettopics.values())[0]['rising']['formattedValue']
  title = list(gettopics.values())[0]['rising']['topic_title']
  topic = list(gettopics.values())[0]['rising']['topic_type']
except:
  stop = 1

The module pytrends we are using is an unofficial API. This means there are no official guidelines for how to communicate with Google Trends. If abused it can lead to Google blocking you. It’s not uncommon for me to get a “time out” response and that just means I need to wait a few seconds and try again. To help us not abuse the API we add in a script delay using the time module.

Access Related Queries for our Keyword

We already have our “Related Topics” data, it now time to get the “Related Queries” data. The process is similar to what we’ve already done for “Related Topics“. The data we want to grab is “Top Queries” and “Rising Queries“. We grab both, but as you’ll see later on, for keywords matching with your CSV you loaded, you can only use one or the other to match and display in the final dataframe.

time.sleep(1)

getqueries = pytrends.related_queries()

try:
  top = list(getqueries.values())[0]['top']['query']
  rising = list(getqueries.values())[0]['rising']['query']
except:
  stop = 1

At this point we have all of our data, now we just need to form it into the final dataframe for us to view. Starting our we check if the variable stop equals 1 and if it does, print a message telling us that whatever keyword we used, did not have enough data for Google to send us, so there was an error. Next we need to bundle all of data together into the variable “data“. At the moment they are all in separate list variables. You will notice I am omitting “rising“. This is the point in this script that you need to choose either “top” or “rising” to use and certainly you can switch back and forth. This is because admittedly I am unsure of how to merge dataframes where our keyword column from ahrefs data matches against two columns (top and rising) in the dataframe. Let me know if you know!

Merge Lists and Create Master Dataframe

Lastly we create a dataframe from the master list we’ve created called data and name the columns.

if stop == 1:
  print("No Top or Rising Trends Found. Try a diferent keyword")

data = list(zip(breakout,title,topic,top))

trendsdf = pd.DataFrame(data, columns = ['Breakout %','Related Keyword','Entity Label','Top Queries'])

Merge Google Trends and Ahrefs Dataframes Together

At this point we have our dataframe with Google Trend data. What we want to do now is merge this dataframe into the dataframe we created using the Ahrefs CSV at the beginning of this script based on the Ahrefs keyword column data matching data in the “Top Queries” (or Rising, if you chose it) column in the Google Trends dataframe. We will use the fillna() function to replace any NaN values (no data) with a clear double hypen.

During the merge the Ahrefs “Keyword” column is added, but that is a duplicate of the “Top Queries” (or Rising, if you chose it) column, so let’s drop it from the dataframe.

mergedf = pd.merge(trendsdf, rankdf, left_on='Top Queries', right_on='Keyword', how='left').fillna(value='--')
mergedf.drop(['Keyword'], axis=1, inplace=True)

We now at the last step! Great job so far!

All that is left is to print out the keyword, in case you forgot and to print out the final dataframe. For that we’ll make use of that extension we added in the beginning. We feel the Datatable() the final dataframe and bingo! Feel free to adjust the rows per page value (Google usually returns 15-20 rows) and if you want numbered rows on the left change include_index to True. If you aren’t using Google Colab or aren’t using the data_table module, go to the alternative code below.

print("For Keyword: " + kw_list[0])
data_table.DataTable(mergedf, include_index=False, num_rows_per_page=20)

Final Step Alternative

Adjust the number in head() for how many you rows you want to show.

print("For Keyword: " + kw_list[0]) 
mergedf.head(20)

Here is an example of the expected output. In this example I used a pet store’s Ahrefs data and keyword of “dog collar”. Note the colored sections are columns that are related to each other, they don’t cross relate other than being generated from Google Trends using the same keyword.

Google Colab
Click to enlarge

Conclusion

Google Trends has always been a powerful tool for accessing historical trends and forecasting new ones. The ability to tap into their data from an API perspective opens the door to enormous possibilities. This tutorial gave you a sliver of the data that is accessible to you and a singular application. You can easily extend this into taking in multiple keywords, storing data in a database, blending data in from other sources or accessing other data areas. Your mind should be churning with ideas. So get out there and try it out! Follow me on Twitter and let me know your applications and ideas!

Greg Bernhardt
Follow me

Leave a Reply