Update a Google Sheet with Semrush Position Tracking API Using Python
If you are interested in breaking out your keyword position data from Semrush using their API (paid) and sending it into a Google Sheet using Python then this tutorial is for you! As a bonus, I’ll show you a couple of ways to automate the process! You will be amazed at how easy it is and how little code is required. I am using Semrush API, but really this script can be easily adjusted to use whatever tracking platform you use as long as they offer an API. For this tutorial, there is a little bit of prior setup involved with the Google Cloud Platform. I am considering creating a tutorial for that for those without experience. Until then let’s dive in for those who are comfortable with it.
Table of Contents
Requirements and Assumptions
- Python 3 is installed and basic Python syntax understood
- Access to a Linux installation (I recommend Ubuntu) or Google Colab
- Semrush API with a project tracking keyword positions (you can adjust the script to use a different platform)
- Google Cloud Platform account (GCP)
- Google Drive API and Google Sheets API Enabled in GCP
- Credentials created (service account) in GCP and JSON file downloaded
- Service account name shared with google sheet (open JSON credential file to find it, it has the @ in it)
Import Modules and Set Authentication
Let’s start by installing the gspread module with helps us with using the Google Sheets API. Type this into your terminal. Google Colab has it preinstalled but an old version that we need to update. If you are using Google Colab be sure to add an exclamation mark at the beginning.
pip3 install gspread==3.6
Next we import the modules we’ll be using.
- gspread: interacts with the Google Sheets API
- ServiceAccountCredentials: Google authentication
- requests: for making the API call over HTTP
- json: handling the API response which is in json format
- pandas: organizing the keyword data
- date: getting today’s date
import gspread from oauth2client.service_account import ServiceAccountCredentials import requests import json import pandas as pd from datetime import date
Next we set up some variables. Your project ID (several digits) can be found in the URL when you are viewing its Semrush project dashboard. LIMIT is how many keywords you want to be returned. I am using minimal properties. I would highly suggest reviewing the Semrush API documentation for options to customize the data returned. These options just help build the API URL call.
Setup Key Variables
today = str(date.today()) APIKEY = "" PROJECT_ID = "" LIMIT = "" DOMAIN = "" #example: domain.com creds_file = "" #example: cred.json sheet_name = ""
Now we build the API URL call. Note the URL parameter at the end. It has some wildcards in it that you may want to customize. I am using the default version which will accept data from any variation of the core domain name and pages of it. This is a required field. We then execute the call and store the data in JSON format. After that, we create our dataframe container which will house our keyword data. Lastly, we grab the keyword total from the data. This should match your LIMIT API parameter.
Build API Call and Handle Response
url = "https://api.semrush.com/reports/v1/projects/"+PROJECT_ID+"/tracking/?key="+APIKEY+"&action=report&type=tracking_position_organic&display_limit="+LIMIT+"&url=*."+DOMAIN+"/*" payload = {} headers= {} response = requests.request("GET", url, headers=headers, data = payload) kdata = json.loads(response.text.encode('utf8')) df = pd.DataFrame(columns = ['Date', 'Keyword', 'Current Rank', 'Last Rank','Diff','Diff30','Volume']) total = kdata['total']
Then we simply iterate through the JSON API data. Semrush uses some bizarre key naming schemes and once again I would highly suggest reading the API documentation as there are a lot of options for what data you can grab and definitions for what things are. For this tutorial, we will simply grab the keyword, the current rank, the last rank (based on the beginning date parameter or the default), the one-day difference, the 30-day difference, and the keyword volume.
Once we have that data into variables we can add them to our currently empty dataframe and sort them by volume. Careful sorting by when you have keywords without a rank as they are returned with a hyphen and can throw an error.
Store Keyword Data from API Response
for x in range(total): keyword = kdata['data'][str(x)]["Ph"] current_rank = kdata['data'][str(x)]["Fi"]['*.' + domain + '/*'] last_rank = kdata['data'][str(x)]["Be"]['*.' + domain + '/*'] diff = kdata['data'][str(x)]["Diff"]['*.' + domain + '/*'] diff30 = kdata['data'][str(x)]["Diff30"]['*.' + domain + '/*'] volume = kdata['data'][str(x)]["Nq"] df = df.append({'Date':today,'Keyword':keyword, 'Current Rank':current_rank, 'First Rank':last_rank,'Diff':diff,'Diff30':diff30,'Volume':volume}, ignore_index = True) df.sort_values(by=['volume'], inplace=True, ascending=False)
Now that we have our keyword data nicely in a dataframe it’s time to ship it over to the Google Sheet. The gspread module is very straightforward with a number of options. I would highly suggest reading its documentation. It is possible to update a cell one at a time, but note Google Sheet API has a call limit of 500 calls per 100 seconds. So if you are tracking more keywords than that you’ll run into an issue. The best way, in this case, is to send the whole package over at once which was the point of putting everything into a dataframe.
Update Google Sheet
scope = ['https://www.googleapis.com/auth/spreadsheets',"https://www.googleapis.com/auth/drive.file","https://www.googleapis.com/auth/drive"] creds = ServiceAccountCredentials.from_json_keyfile_name("cred-file", scope) client = gspread.authorize(creds_file) sheet = client.open(sheet_name).sheet1 sheet.update([df.columns.values.tolist()] + df.values.tolist())
When you run the script you will get a response from Google Sheets API indicating the call was successful. The transfer is nearly instant and you can see the result if you have the sheet in another tab. Note the sheet will be overwritten. There are options to make it append in the gspread documentation.
The true power of this method is now you can automate this process to run say once a day or once a week. From here you have two options. You can leverage cron jobs if you are running it on a Linux installation or you can run it in the cloud with Google Cloud Functions and Google Cloud Scheduler. GCF and GCP are not difficult to set up but beyond the scope of this tutorial.
Automate the Process
If your script above is working well when you run it manually, it’s time to automate it. Luckily, Linux already supplies us with a solution by using the crontab. The crontab stores entries of scripts where you can dictate when to execute them (like a scheduler). You have lots of flexibility with how you schedule your script (any time of day, day of the week, day of the month, etc.). To add entries to the crontab, run this command:
crontab -e
It will likely open up the crontab file in vi editor. On a blank line at the bottom of the file, type the code below. This code will run the script at midnight every Sunday. To change the time to something else, use this cronjob time editor. Customize with your path to the script.
0 0 * * SUN /usr/bin/python3 PATH_TO_SCRIPT/filename.py
If you want to create a log file to record each time the script ran, you can use this instead. Customize with your path to the script.
0 0 * * SUN /usr/bin/python3 PATH_TO_SCRIPT/filename.py > PATH_TO_FILE/FILENAME.log 2>&1
Save the crontab file and you’re good to go! Just note, that your computer needs to be on at the time the cronjob is set to run.
Conclusion
That’s it! You’ve now broken out your keyword data from Semrush using their API. From here you can do analysis within Google Sheets, import it into another platform or use it for reporting. The possibilities are endless! Remember I used Semrush API, but this tutorial can be adjusted with little effort to use many other APIs if you use Moz, ahrefs, or Majestic instead. I personally have used this method for Brightlocal API too. Now get out there and try it out! Follow me on Twitter and let me know your applications and ideas!
Special thanks to @TechWithTimm and @jdegbau for the inspiration.
Google Sheets and SemRush API FAQ
How can I update a Google Sheet using the Semrush Position Tracking API with Python?
Python scripts can be employed to make requests to the Semrush Position Tracking API and update a Google Sheet with the obtained position tracking data, facilitating seamless integration for SEO analysis.
What Python libraries are commonly used for interacting with the Semrush Position Tracking API and updating Google Sheets?
Python’s requests
library is typically used for making API requests to Semrush, and the gspread
library facilitates interactions with Google Sheets, allowing for the updating of spreadsheet data.
What specific data can be obtained from the Semrush Position Tracking API and updated in a Google Sheet using Python?
Python scripts can extract a variety of position tracking data from the Semrush API, including keyword rankings, search visibility, and other relevant metrics. This information can then be updated in specific cells of a Google Sheet.
Are there any considerations or limitations to be aware of when using Python to update a Google Sheet with Semrush Position Tracking data?
Consider factors such as API rate limits, proper authentication, and the structure of the Google Sheet. Ensure compliance with Semrush API usage policies during data retrieval.
Where can I find examples and documentation for using Python to update a Google Sheet with Semrush Position Tracking API data?
Explore the official documentation for the Semrush Position Tracking API for comprehensive guides. Additionally, refer to online tutorials and Python resources for practical examples and implementation details in updating Google Sheets with Semrush data using Python.
- Evaluate Subreddit Posts in Bulk Using GPT4 Prompting - December 12, 2024
- Calculate Similarity Between Article Elements Using spaCy - November 13, 2024
- Audit URLs for SEO Using ahrefs Backlink API Data - November 11, 2024