sitemap generator seo python streamlit

This past week I needed to create a few custom sitemaps for a client. They needed to contain nearly 200 URLs. I could have done this manually in an hour, but it seemed like a good opportunity to write a quick script instead. The script to create the sitemap was less than 10 lines of Python and took 10 minutes to write. After finishing the work task, I decided to turn it into another Streamlit app and base a tutorial on it. In this tutorial, I'll give you the framework for creating a simple sitemap generator app in Streamlit. I have purposely included only the required sitemap elements, since the usefulness of optional elements such as priority and lastmod is debatable. The live app is linked at the bottom of this tutorial if you want to give it a try.

Streamlit's basics are quite easy to learn, although the framework has plenty of limitations. I'm happy to say that development is very fast-paced and new features come out every month. I will not be going over the basics of setting up Streamlit; you can find that setup information here. Please take a few minutes to familiarize yourself with the basics before continuing.

Requirements and Assumptions

  • Python 3 is installed and basic Python syntax understood
  • Access to a Linux installation (I recommend Ubuntu)
  • A CSV URL list with the URLs in a column labeled "Address" (the ScreamingFrog default, used in this tutorial)
  • Streamlit installed locally, and optionally an account for deploying the app
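For reference, here is a sketch of the input the app expects, using a hypothetical sample export: any CSV works as long as the URLs sit in a column named Address. The dropna() and drop_duplicates() calls are an optional addition that the app below does not make; they skip blank cells and repeated URLs, which can show up in crawl exports.

```python
import io

import pandas as pd

# Hypothetical sample input: a minimal CSV with the 'Address' column that a
# ScreamingFrog export (or a hand-made URL list) would contain.
sample_csv = io.StringIO(
    "Address,Status Code\n"
    "https://www.example.com/,200\n"
    "https://www.example.com/about/,200\n"
    "https://www.example.com/about/,200\n"
)

df = pd.read_csv(sample_csv)

# Optional hardening: drop empty cells and duplicate URLs, keeping the first occurrence.
urls = df["Address"].dropna().drop_duplicates().tolist()
print(urls)
```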

Import Modules

  • pandas: for importing the CSV file into the app
  • os: optional here, since the built-in open() handles saving and reading the XML file
  • streamlit: the framework that turns the script into a web app
  • base64: for encoding the XML file into a downloadable data link

Let's first import the modules listed above.

import pandas as pd
import os
import streamlit as st
import base64

Style and Set App Header Info

We first use the st.markdown() function, which renders raw HTML when unsafe_allow_html=True is passed, to add some CSS styling for the app heading.

st.markdown("""
<style>
.big-font {
    font-size:50px !important;
}
</style>
""", unsafe_allow_html=True)

Next, we create our app heading and some directions, again using st.markdown().

st.markdown("""
<p class="big-font">Simple Sitemap Generator</p>
<b>Directions: </b><br><ol>
<li>Upload CSV of ScreamingFrog crawl or URL list</li>
<li>CSV must have URLs under a column named 'Address' (Standard in ScreamingFrog)</li>
</ol>
""", unsafe_allow_html=True)

Create App Inputs

I like to let users choose the name of their sitemap file. For this we use st.text_input(), one of Streamlit's input widgets, which creates a text box; its value can be assigned to a variable right away, and the second argument ('ex domain-sitemap') is the default value shown in the box. Spaces in filenames are problematic, so we replace them with hyphens, and lastly we add the .xml file extension. Next, we use st.file_uploader() to create a file upload widget for the CSV URL list. Its type parameter restricts which file extensions are accepted; it takes a list, so you can allow more types by adding comma-separated entries.

filename = st.text_input('Create Sitemap File Name','ex domain-sitemap')
filename = filename.replace(' ','-')
filename = filename + ".xml"
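Replacing single spaces covers the common case, but users can also type tabs, slashes, or the .xml extension themselves. A minimal sketch of a stricter cleanup, assuming you only want letters, digits, hyphens, underscores, and dots in the name; make_sitemap_filename is a hypothetical helper, not part of the app.

```python
import re


def make_sitemap_filename(name: str) -> str:
    """Hypothetical helper: turn free-text input into a safe .xml filename."""
    name = name.strip()
    # Replace any run of whitespace with a single hyphen.
    name = re.sub(r"\s+", "-", name)
    # Drop every character except letters, digits, dots, underscores and hyphens.
    name = re.sub(r"[^A-Za-z0-9._-]", "", name)
    # Avoid "sitemap.xml.xml" if the user already typed the extension.
    if name.lower().endswith(".xml"):
        name = name[:-4]
    # Fall back to a default name if nothing usable is left.
    return (name or "sitemap") + ".xml"


print(make_sitemap_filename("ex domain-sitemap"))
```

In the app, the three filename lines could then collapse into filename = make_sitemap_filename(st.text_input(...)).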

get_csv = st.file_uploader("Upload CSV File",type=['csv'])

Process URL List

Streamlit runs the script from top to bottom, and reruns the whole script every time the user interacts with a widget. We don't want the sitemap code to run until the user has uploaded a file, so we wrap it in a conditional that checks whether the file upload variable is "not None". Until a file is uploaded, st.file_uploader() returns None and the block is skipped; once the user uploads a file, the condition is satisfied and the rest of the script runs. At this point, we read the CSV URL list into a pandas DataFrame and immediately convert its Address column into a list for easier iteration. We're going to build the entire sitemap content in a single string variable called urllist. We start it with the XML declaration and the opening urlset tag carrying the sitemap schema namespace, which the sitemap protocol requires.

if get_csv is not None:
    df = pd.read_csv(get_csv)

    urls = df['Address'].tolist()

    urllist = "<?xml version='1.0' encoding='UTF-8'?>" + "\n"
    urllist += "<urlset xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>" + "\n"

Now we loop through the list of URLs and add each one to the urllist variable. Notice that each entry ends with a line break. Once we exhaust the URL list, we close the urlset tag.

    for i in urls:
        urllist += f"<url><loc>{i}</loc></url>" + "\n"

    urllist += "</urlset>"
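The loop above inserts each URL verbatim. The sitemap protocol requires entity escaping for characters such as & inside loc (so ?a=1&b=2 must become ?a=1&amp;b=2); otherwise the file is not valid XML. A minimal sketch of the same construction with escaping, using the standard library's xml.sax.saxutils.escape; build_sitemap is a hypothetical name, and the extra entities argument also escapes quotes, as the protocol recommends.

```python
from xml.sax.saxutils import escape


def build_sitemap(urls):
    """Hypothetical helper: build sitemap XML as a string with escaped URLs."""
    lines = [
        "<?xml version='1.0' encoding='UTF-8'?>",
        "<urlset xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>",
    ]
    for url in urls:
        # escape() always handles &, < and >; the dict adds ' and " as well.
        safe = escape(url, {"'": "&apos;", '"': "&quot;"})
        lines.append(f"<url><loc>{safe}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines)


print(build_sitemap(["https://www.example.com/?a=1&b=2"]))
```

Joining a list of lines produces the same layout as the += loop while avoiding repeated string concatenation.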

Create XML File to Download

We now have the entire sitemap document in a variable; next we need to turn it into an actual file the user can download. At the time of writing, Streamlit didn't make this very easy, so we use a workaround (newer Streamlit releases also offer st.download_button() for this). We create a function that takes the filename, opens that file (we create it in the next step), and reads its contents into the variable xml. Then we Base64-encode the contents. Base64 is an encoding that represents arbitrary bytes as ASCII characters, which lets us embed the file in a data URI: a link whose URL contains the file data itself, so clicking it downloads the file as if it had come from a server. Define this function near the top of the script, above the if block, so it exists before it is called.

def get_xml_download_link(filename):
    # Read the sitemap file written below and embed it in a base64 data URI
    with open(filename, 'r', encoding='utf-8') as f:
        xml = f.read()
    b64 = base64.b64encode(xml.encode()).decode()
    return f'<a href="data:application/xml;base64,{b64}" download="{filename}">Download Sitemap XML file</a>'
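Since the sitemap already lives in the urllist string, writing it to disk and reading it back is not strictly necessary. A minimal sketch of a variant of get_xml_download_link that encodes the string directly; xml_download_link is a hypothetical name, and it uses the registered application/xml MIME type.

```python
import base64


def xml_download_link(xml_text, filename):
    """Hypothetical variant of get_xml_download_link that skips the file on disk."""
    # Base64-encode the UTF-8 bytes so the XML can live inside a data: URI.
    b64 = base64.b64encode(xml_text.encode("utf-8")).decode("ascii")
    return (
        f'<a href="data:application/xml;base64,{b64}" '
        f'download="{filename}">Download Sitemap XML file</a>'
    )


print(xml_download_link("<urlset/>", "domain-sitemap.xml"))
```

In the app this would be called as st.markdown(xml_download_link(urllist, filename), unsafe_allow_html=True). Keeping everything in memory also avoids leaving files behind on the server when several users generate sitemaps.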

To make the above function work, we first need to create the file and write the contents of urllist to it. Using st.write(), we add a little success notice with emoji shortcodes. The last line calls the function we created above and passes the returned HTML link to st.markdown() so it is rendered on the page.

    with open(filename, "w", encoding="utf-8") as f:
        f.write(urllist)
    st.write(":sunglasses: Sitemap Generation Successful :sunglasses:")
    st.markdown(get_xml_download_link(filename), unsafe_allow_html=True)
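Before handing the file to a search engine, it can be worth confirming that what was written is well-formed XML. A minimal sketch using the standard library's ElementTree, assuming the file uses the sitemaps.org namespace as above; count_sitemap_urls is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# ElementTree addresses namespaced tags as {namespace}tag.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def count_sitemap_urls(path):
    """Parse a sitemap file and return how many <loc> entries it contains.

    Raises xml.etree.ElementTree.ParseError if the file is not well-formed XML,
    for example when a URL contains an unescaped &.
    """
    root = ET.parse(path).getroot()
    return len(root.findall(f"{SITEMAP_NS}url/{SITEMAP_NS}loc"))
```

In the app you could, for example, follow the success notice with st.write(f"{count_sitemap_urls(filename)} URLs written to {filename}").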

This last line is just my author and friends byline; it shows how st.write() renders Markdown links.

st.write('Author: [Greg Bernhardt](https://twitter.com/GregBernhardt4) | Friends: [Rocket Clicks](https://www.rocketclicks.com), [importSEM](https://importsem.com) and [Physics Forums](https://www.physicsforums.com)')

App Finished!


Test out the sitemap generator app here!

For Streamlit help and resources, follow Charly Wargnier and Fanilo Andrianasolo.

Now get out there and try it out! Follow me on Twitter and let me know your applications and ideas!

Greg Bernhardt