Create a Simple Sitemap Generator App for SEO Using Python and Streamlit

This past week I needed to create a few custom sitemaps for a client. They needed to cover nearly 200 URLs. I could have done this manually in an hour, but it seemed like a good opportunity to write a quick script instead. The script to create the sitemap was less than 10 lines of Python and took 10 minutes to write. After finishing the work task, I decided to turn it into another Streamlit app and base a tutorial on it.

In this tutorial, I’ll give you the framework for creating a simple sitemap generator app on Streamlit. I have purposely included only the required sitemap elements, since the usefulness of optional elements like priority and last modified is debatable. The live app is linked at the bottom of this tutorial if you want to give it a try.

The basics of Streamlit are quite easy to learn, although it has plenty of limitations. I’m happy to say that development is fast-paced and new features come out every month. I will not be going over the basics of setting up Streamlit; you can find that setup information in the Streamlit documentation. Please take a few minutes to familiarize yourself with the basics there.

Requirements and Assumptions

Python 3 is installed and basic Python syntax understood
Access to a Linux installation (I recommend Ubuntu)
URL list CSV with URL column labeled “Address” (default from ScreamingFrog in this tutorial)
Streamlit installed locally and optionally sign up for an app deployment account

Import Modules

pandas: for importing the CSV file into the app
os: for saving and opening the XML file
streamlit: a framework for making the script into an app
base64: for encoding the file contents into the download link

Let’s first import the modules listed above.

import pandas as pd
import os
import streamlit as st
import base64

Style and Set App Header Info

We first use the st.markdown() function, which can render HTML when unsafe_allow_html=True is passed, to add some styling to the app’s heading.

st.markdown("""
<style>
.big-font {
    font-size:50px !important;
}
</style>
""", unsafe_allow_html=True)

Next, we create our app heading and some directions, again using st.markdown().

st.markdown("""
<p class="big-font">Simple Sitemap Generator</p>
<b>Directions: </b><br /><ol>
<li>Upload CSV of ScreamingFrog crawl or URL list</li>
<li>CSV must have URLs under a column named 'Address' (Standard in ScreamingFrog)</li>
</ol>
""", unsafe_allow_html=True)

Create App Inputs

I like to let the user choose the name of their sitemap file. For this we use one of Streamlit’s input functions, st.text_input(), which creates a textbox; the second argument is the default value shown in it. The input can be assigned to a variable right away. Spaces in filenames are problematic, so we replace them with hyphens, and lastly we add the XML file extension. Next, we use st.file_uploader() to create a file upload widget for the CSV URL list. One of its parameters specifies which file types are allowed. It’s a list, so you can add more types separated by commas.

filename = st.text_input('Create Sitemap File Name','ex domain-sitemap')
filename = filename.replace(' ','-')
filename = filename + ".xml"

get_csv = st.file_uploader("Upload CSV File",type=['csv'])
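
The filename handling above can also be pulled into a small helper that is easy to test outside Streamlit. This is only a minimal sketch, not part of the original app: the function name make_sitemap_filename and its extra rules (trimming whitespace, avoiding a doubled .xml, falling back to a default name for empty input) are assumptions added for illustration.

```python
def make_sitemap_filename(raw_name: str, default: str = "sitemap") -> str:
    """Turn free-form user input into a sitemap filename (hypothetical helper)."""
    name = raw_name.strip()
    # Drop an extension the user already typed so we never get ".xml.xml".
    if name.lower().endswith(".xml"):
        name = name[:-4].strip()
    # Spaces become hyphens, as in the tutorial; runs of whitespace
    # collapse into a single hyphen.
    name = "-".join(name.split())
    # Fall back to a default when nothing usable was entered.
    if not name:
        name = default
    return name + ".xml"


print(make_sitemap_filename("ex domain-sitemap"))  # ex-domain-sitemap.xml
```

In the app, this could wrap the value returned by st.text_input() in place of the two reassignment lines.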

Process URL List

Streamlit reruns the script from top to bottom every time the user interacts with a widget; it never truly pauses, and until a file is uploaded st.file_uploader() simply returns None. We don’t want the processing code to run before the user takes that action, so we add a conditional that checks whether the variable holding the upload is “not None”. Once the user uploads the file, the conditional is satisfied and the rest of the script runs. At this point, we import the CSV URL list into a pandas dataframe and immediately convert the Address column into a list for easier iteration. We’re going to build the entire sitemap content into a single variable called urllist. We start it off with the XML declaration and the opening urlset tag with its schema namespace, which the sitemap standard requires.

if get_csv is not None:
    df = pd.read_csv(get_csv)

    urls = df['Address'].tolist()

    urllist = "<?xml version='1.0' encoding='UTF-8'?>" + "\n"
    urllist += "<urlset xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>" + "\n"
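
If the uploaded CSV has no Address column, the df['Address'] lookup fails with a KeyError. As a rough illustration of checking for the column first, here is a standard-library sketch that uses csv.DictReader instead of pandas (a deliberate swap so the example runs without extra packages). The helper name read_addresses and the sample data are hypothetical.

```python
import csv
import io


def read_addresses(csv_text: str, column: str = "Address") -> list:
    """Return the non-empty values of `column` from CSV text (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # fieldnames is None for empty input; otherwise it holds the header row.
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"CSV has no column named {column!r}")
    urls = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:  # skip blank cells
            urls.append(value)
    return urls


sample = "Address,Status Code\nhttps://example.com/,200\nhttps://example.com/about,200\n"
print(read_addresses(sample))
```

With pandas, the equivalent guard would be a check that 'Address' is in df.columns before calling tolist().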

Now we just loop through the list of URLs and add each one to the urllist variable. Notice that we add a line break after each URL entry. Once we exhaust our URL list, we close the urlset tag.

    for i in urls:
        urllist += f"<url><loc>{i}</loc></url>" + "\n"

    urllist += "</urlset>"
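
One caveat with plain string concatenation: the sitemap protocol expects URL values to be entity-escaped, so a URL containing & would produce invalid XML as written. The following is a minimal sketch of the same loop with escaping added, using the standard library's xml.sax.saxutils.escape. The function name build_sitemap is an assumption for illustration, not part of the original script.

```python
from xml.sax.saxutils import escape


def build_sitemap(urls) -> str:
    """Build sitemap XML from an iterable of URLs, escaping XML-special characters."""
    lines = [
        "<?xml version='1.0' encoding='UTF-8'?>",
        "<urlset xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>",
    ]
    for url in urls:
        # escape() converts &, < and > so the document stays well-formed.
        lines.append(f"<url><loc>{escape(url)}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines)


print(build_sitemap(["https://example.com/?a=1&b=2"]))
```

Inside the app, urllist = build_sitemap(urls) would stand in for the header lines, the loop, and the closing tag.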

Create XML File to Download

We now have the entire sitemap document in a variable, but we still need to turn it into an actual file the user can download. At the time of writing, Streamlit doesn’t make this very easy, so we need a workaround. We’ll create a function that takes in the filename. Assume the file already exists on disk (we’ll create it in the next step). The function opens that file and assigns its contents to the variable xml, then Base64-encodes them. Base64 encoding converts bytes into ASCII characters. The result is embedded in a data URL that contains the file data itself, so the link downloads it like a virtual file.

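def get_xml_download_link(filename):
    # Read the sitemap file that was written to disk
    with open(filename, 'r', encoding='utf-8') as f:
        xml = f.read()
    # Base64-encode the contents so they can be embedded in a data URL
    b64 = base64.b64encode(xml.encode()).decode()
    # application/xml is the registered MIME type for XML
    return f'<a href="data:application/xml;base64,{b64}" download="{filename}">Download Sitemap XML file</a>'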
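
Since the sitemap text already lives in a variable, the round trip through a file on disk is optional. Here is a hedged sketch of a variant that builds the same kind of link directly from a string. The name xml_download_link is hypothetical, and escaping the filename with html.escape is an extra precaution that the original does not take.

```python
import base64
from html import escape


def xml_download_link(xml_text: str, filename: str) -> str:
    """Return an HTML anchor whose href is a Base64 data URL holding the XML."""
    b64 = base64.b64encode(xml_text.encode("utf-8")).decode("ascii")
    # Escape the filename so quotes in user input cannot break the attribute.
    safe_name = escape(filename, quote=True)
    return (
        f'<a href="data:application/xml;base64,{b64}" '
        f'download="{safe_name}">Download Sitemap XML file</a>'
    )


link = xml_download_link("<urlset></urlset>", "example-sitemap.xml")
print(link)
```

In the app, this would be rendered with st.markdown(xml_download_link(urllist, filename), unsafe_allow_html=True). Newer Streamlit releases also provide st.download_button(), which may make the workaround unnecessary; check the documentation for your installed version.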

To make the above function work, we first need to create a new file and write the contents of the urllist variable to it. Using st.write(), we add a fun little notice with an emoji. The last line, st.markdown(), calls the function we created above and renders the returned download link. Note that in the finished script the function definition must come before this call, so place it above the if block.

    # Write the sitemap to disk; the with block closes the file for us
    with open(filename, "w", encoding="utf-8") as f:
        f.write(urllist)
    st.write(":sunglasses: Sitemap Generation Successful :sunglasses:")
    st.markdown(get_xml_download_link(filename), unsafe_allow_html=True)

This last little line is just my author and friends byline to show how adding links works using st.write().

st.write('Author: [Greg Bernhardt](https://twitter.com/GregBernhardt4) | Friends: [Rocket Clicks](https://www.rocketclicks.com), [importSEM](https://importsem.com) and [Physics Forums](https://www.physicsforums.com)')

App Finished!

Test out the sitemap generator app here!

For Streamlit help and resources follow: Charly Wargnier and Fanilo Andrianasolo

Now get out there and try it out! Follow me on Twitter and let me know your applications and ideas!

Python Sitemap with Streamlit FAQ

How can Python and Streamlit be utilized to build a simple sitemap generator app for SEO purposes?

Python scripts, along with the Streamlit library, can be employed to create a user-friendly app that generates sitemaps. This app simplifies the process of creating sitemaps for SEO optimization.

Which Python libraries are commonly used for building a sitemap generator app?

Python libraries such as xml.etree.ElementTree for XML generation and streamlit for creating interactive web apps are commonly used for developing sitemap generator applications.
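
The tutorial above builds the XML with string concatenation. As an illustration of the xml.etree.ElementTree route, here is a minimal sketch that is not the tutorial's own method. The function name sitemap_with_elementtree is hypothetical. ElementTree escapes special characters in text for you.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_with_elementtree(urls) -> str:
    """Build a sitemap string with ElementTree (hypothetical helper)."""
    # Setting xmlns as a plain attribute yields <urlset xmlns="...">.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        # Assigning .text lets ElementTree handle escaping of & and <.
        ET.SubElement(url_el, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(sitemap_with_elementtree(["https://example.com/?a=1&b=2"]))
```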

What specific steps are involved in building a simple sitemap generator app with Python and Streamlit?

The process includes creating Python scripts to generate XML-based sitemaps, integrating these scripts into a Streamlit app, and deploying the app for user-friendly sitemap generation.

Are there any considerations or limitations when building a sitemap generator app for SEO with Python and Streamlit?

Consider factors like user interface design, app responsiveness, and potential limitations in handling large websites. Ensure that the app meets SEO standards for sitemap formats.
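
One concrete limit for large websites: the sitemaps.org protocol caps a single sitemap file at 50,000 URLs (and 50 MB uncompressed), so bigger URL lists have to be split across several files referenced from a sitemap index. Here is a rough sketch of the splitting step only. The helper name chunk_urls is hypothetical, and the app in this tutorial does not do this.

```python
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit from the sitemaps.org protocol


def chunk_urls(urls, size: int = MAX_URLS_PER_SITEMAP) -> list:
    """Split a URL collection into lists of at most `size` items each."""
    urls = list(urls)
    return [urls[i:i + size] for i in range(0, len(urls), size)]


example_urls = [f"https://example.com/page-{n}" for n in range(120_001)]
chunks = chunk_urls(example_urls)
print([len(chunk) for chunk in chunks])  # [50000, 50000, 20001]
```

Each chunk would then be written to its own sitemap file.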

Where can I find examples and documentation for building a sitemap generator app with Python and Streamlit?

Explore online tutorials, documentation for Streamlit and XML handling in Python, and resources specific to web app development for practical examples and guidance in creating a sitemap generator app for SEO.

Author

Greg Bernhardt

Sr. SEO Specialist for Shopify. Nearly 20 years of experience in web design, web development, and web marketing. Education in Information Sciences from UW-Milwaukee. Managing the largest online US physics community. Enjoy learning about search engines, SEO, chrome tricks, Python, knowledge graphs, data science, and more!