This past week I needed to create a few custom sitemaps for a client. The job involved nearly 200 URLs. I could have done it manually in about an hour, but I thought this was a good opportunity to write a quick script instead. The script to create the sitemap was less than 10 lines of Python and took 10 minutes to write. After finishing the work task, I decided to turn it into another Streamlit app and base a tutorial on it.
In this tutorial, I’ll give you the framework for creating a simple sitemap generator app on Streamlit. I have purposely included only the required elements of a sitemap, because the usefulness of optional elements such as priority and last modified is debatable. The live app is located at the bottom of this tutorial if you want to give it a try.
The basics of Streamlit are quite easy to learn, although it has plenty of limitations. I’m happy to say that development is very fast-paced and new features are coming out every month. I will not be going over the basics of setting up Streamlit. You can find that setup information here. Please take a few minutes to familiarize yourself with the basics here.
Requirements and Assumptions
- Python 3 is installed and basic Python syntax understood
- Access to a Linux installation (I recommend Ubuntu)
- URL list CSV with URL column labeled “Address” (default from ScreamingFrog in this tutorial)
- Streamlit installed locally and, optionally, an app deployment account
Import Modules
- pandas: for importing the CSV file into the app
- os: for saving and opening the XML file
- streamlit: a framework for making the script into an app
- base64: encoding and decoding for file writing
Let’s first import the modules listed above.
import pandas as pd
import os
import streamlit as st
import base64
Style and Set App Header Info
We first use the st.markdown() function, which renders HTML when unsafe_allow_html=True is set, to add some styling to the heading of the app.
st.markdown("""
<style>
.big-font {
    font-size:50px !important;
}
</style>
""", unsafe_allow_html=True)
Next, we create our app heading and some directions, again using st.markdown().
st.markdown("""
<p class="big-font">Simple Sitemap Generator</p>
<b>Directions: </b><br />
<ol>
<li>Upload CSV of ScreamingFrog crawl or URL list</li>
<li>CSV must have URLs under a column named 'Address' (Standard in ScreamingFrog)</li>
</ol>
""", unsafe_allow_html=True)
Create App Inputs
I like to let the user choose the name of their sitemap file. For this we use one of Streamlit’s input functions, st.text_input(), which makes a textbox. You can see the input can be assigned to a variable right away. Spaces in filenames are problematic, so we replace those with hyphens, and lastly we add the XML file extension. Next, we use st.file_uploader() to create a file upload widget for the intake of the CSV URL list. One of its parameters specifies which file types are allowed. It’s a list, so you can add more types by separating them with commas.
filename = st.text_input('Create Sitemap File Name','ex domain-sitemap')
filename = filename.replace(' ','-')
filename = filename + ".xml"
get_csv = st.file_uploader("Upload CSV File",type=['csv'])
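The filename handling above can also be pulled into a small helper, which makes it easier to cover edge cases such as a user typing the extension themselves or leaving the box empty. This is only a sketch: the function name make_sitemap_filename and the "sitemap" fallback are my own assumptions, not part of the original app.

```python
def make_sitemap_filename(raw_name: str) -> str:
    """Turn free-form user input into a sitemap file name ending in .xml.

    Hypothetical helper for illustration; not part of the original app.
    """
    name = raw_name.strip()
    # Drop an extension the user typed so we don't end up with ".xml.xml".
    if name.lower().endswith(".xml"):
        name = name[:-4]
    # Replace each run of whitespace with a single hyphen.
    name = "-".join(name.split())
    # Assumed fallback name when the textbox was left empty.
    return (name or "sitemap") + ".xml"


print(make_sitemap_filename("my client sitemap"))    # my-client-sitemap.xml
print(make_sitemap_filename("  blog sitemap.xml "))  # blog-sitemap.xml
```

In the app, the three filename lines could then be reduced to a single call such as filename = make_sitemap_filename(st.text_input(...)).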
Process URL List
Streamlit reruns the script from top to bottom each time the user interacts with the app, so any code that depends on an upload has to be skipped until the upload exists. After all, we don’t want some code to run until a user takes an action. We handle this by adding a conditional that checks whether the variable holding the file upload is not None. Once the user uploads the file, the conditional is satisfied and the rest of the script runs. At this point, we import the CSV URL list into a pandas dataframe, which we immediately convert into a list for easier iteration. We’re going to build the entire sitemap content in a single variable called urllist. We start it off by adding the XML declaration and the opening urlset tag with its schema namespace, which is required by the sitemap standard.
if get_csv is not None:
    df = pd.read_csv(get_csv)
    urls = df['Address'].tolist()
    urllist = "<?xml version='1.0' encoding='UTF-8'?>" + "\n"
    urllist += "<urlset xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>" + "\n"
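The import step assumes the upload really has an “Address” column. As a rough illustration of checking that before building anything, here is a sketch that uses Python’s standard csv module as a stand-in for the pandas call in the tutorial. The function name extract_addresses and the sample data are made up for the example.

```python
import csv
import io


def extract_addresses(csv_text: str, column: str = "Address") -> list:
    """Return the non-empty values of one CSV column.

    Stdlib stand-in for pd.read_csv(...)[column].tolist(); hypothetical helper.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"CSV has no '{column}' column")
    urls = []
    for row in reader:
        # Short rows yield None for missing cells, so guard before stripping.
        value = (row.get(column) or "").strip()
        if value:
            urls.append(value)
    return urls


sample = "Address,Status Code\nhttps://example.com/,200\n,404\nhttps://example.com/a,200\n"
print(extract_addresses(sample))
```

In the Streamlit app, a failed check like this could be reported with st.error() instead of letting a KeyError surface to the user.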
Now we just loop through the list of URLs and add each one to the urllist variable. Notice we add a line break to the end of each URL. Once we exhaust our URL list, we close the urlset tag.
    # Still inside the "if get_csv is not None:" block
    for i in urls:
        urllist += f"<url><loc>{i}</loc></url>" + "\n"
    urllist += "</urlset>"
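One thing to watch for: the loop inserts each URL verbatim, and a URL containing characters such as & is not valid inside XML unless it is escaped. A minimal sketch of the same string-building approach with escaping added, using the standard library’s xml.sax.saxutils.escape, might look like this (the function name build_sitemap is my own, not from the original script):

```python
from xml.sax.saxutils import escape


def build_sitemap(urls):
    """Build sitemap XML as a string, escaping &, < and > in each URL."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in urls:
        lines.append(f"<url><loc>{escape(url)}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines)


sitemap_xml = build_sitemap(["https://example.com/", "https://example.com/?a=1&b=2"])
print(sitemap_xml)
```

The second URL is written out as https://example.com/?a=1&amp;b=2, which is how the sitemap protocol expects ampersands to appear.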
Create XML File to Download
So we have our entire sitemap document in a variable; now we need to make it an actual file the user can download. Streamlit at the moment doesn’t make this very easy, so we need to use a workaround. For this, we’ll create a function that takes in the filename. Assume the file already exists on disk, as it will once we write it in the next step. We want to open that file and assign the contents to the variable xml. Then we’ll convert it to Base64, an encoding that represents bytes as ASCII characters. The result is a data URL that contains the file data itself, which the user can download like a virtual file.
def get_xml_download_link(filename):
    # Read the sitemap we wrote to disk
    with open(filename, 'r') as f:
        xml = f.read()
    # Encode the file contents so they can be embedded in the link itself
    b64 = base64.b64encode(xml.encode()).decode()
    return f'<a href="data:application/xml;base64,{b64}" download="{filename}">Download Sitemap XML file</a>'
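Since the sitemap already lives in a string, the trip through the filesystem is not strictly required. Here is a sketch of a variant that encodes the string directly. The name make_download_link is hypothetical, and I am assuming application/xml as the MIME type for the data URL.

```python
import base64


def make_download_link(xml_text: str, filename: str) -> str:
    """Return an HTML link whose href embeds xml_text as a Base64 data URL."""
    b64 = base64.b64encode(xml_text.encode("utf-8")).decode("ascii")
    return (
        f'<a href="data:application/xml;base64,{b64}" '
        f'download="{filename}">Download Sitemap XML file</a>'
    )


link = make_download_link("<urlset></urlset>", "demo-sitemap.xml")
print(link)
```

In the app this would be rendered with st.markdown(make_download_link(urllist, filename), unsafe_allow_html=True). Newer Streamlit releases also include st.download_button, which takes the data and a file name directly and may make this workaround unnecessary if your installed version has it.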
To make the above function work, we first need to create a new file and write the contents of the variable urllist to it. Using st.write(), we add a little fun notice with an emoji. The last line, st.markdown(), calls the function we created above and renders the download link it returns.
    # Still inside the "if get_csv is not None:" block
    with open(filename, "w") as f:
        f.write(urllist)
    st.write(":sunglasses: Sitemap Generation Successful :sunglasses:")
    st.markdown(get_xml_download_link(filename), unsafe_allow_html=True)
This last little line is just my author and friends byline to show how adding links works using st.write().
st.write('Author: [Greg Bernhardt](https://twitter.com/GregBernhardt4) | Friends: [Rocket Clicks](https://www.rocketclicks.com), [importSEM](https://importsem.com) and [Physics Forums](https://www.physicsforums.com)')
App Finished!
Test out the sitemap generator app here!
For Streamlit help and resources follow: Charly Wargnier and Fanilo Andrianasolo
Now get out there and try it out! Follow me on Twitter and let me know your applications and ideas!
Python Sitemap with Streamlit FAQ
How can Python and Streamlit be utilized to build a simple sitemap generator app for SEO purposes?
Python scripts, along with the Streamlit library, can be employed to create a user-friendly app that generates sitemaps. This app simplifies the process of creating sitemaps for SEO optimization.
Which Python libraries are commonly used for building a sitemap generator app?
Python libraries such as xml.etree.ElementTree for XML generation and streamlit for creating interactive web apps are commonly used for developing sitemap generator applications.
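The tutorial above builds the XML by string concatenation, so for comparison here is a minimal sketch of what the xml.etree.ElementTree route could look like. The function name build_sitemap_tree is my own. The XML declaration is prepended by hand, because ET.tostring() with encoding="unicode" does not add one by default.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_tree(urls):
    """Build sitemap XML with ElementTree, which escapes URL text on output."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


tree_xml = build_sitemap_tree(["https://example.com/?a=1&b=2"])
print(tree_xml)
```

The trade-off is a little more code in exchange for not having to think about escaping or unbalanced tags.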
What specific steps are involved in building a simple sitemap generator app with Python and Streamlit?
The process includes creating Python scripts to generate XML-based sitemaps, integrating these scripts into a Streamlit app, and deploying the app for user-friendly sitemap generation.
Are there any considerations or limitations when building a sitemap generator app for SEO with Python and Streamlit?
Consider factors like user interface design, app responsiveness, and potential limitations in handling large websites. Ensure that the app meets SEO standards for sitemap formats.
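One concrete limit worth knowing: the sitemap protocol allows at most 50,000 URLs (and 50MB uncompressed) per sitemap file, so a large site needs several files. A rough sketch of splitting a URL list into file-sized batches follows. The helper name chunk_urls is hypothetical, and the app in this tutorial does not do this.

```python
def chunk_urls(urls, max_per_file=50_000):
    """Yield consecutive slices of urls, each small enough for one sitemap file."""
    for start in range(0, len(urls), max_per_file):
        yield urls[start:start + max_per_file]


# Tiny demonstration with a limit of 2 so the batching is visible.
demo_urls = [f"https://example.com/page-{n}" for n in range(5)]
batches = list(chunk_urls(demo_urls, max_per_file=2))
print([len(batch) for batch in batches])  # [2, 2, 1]
```

Each batch could then be written to its own numbered sitemap file and listed in a sitemap index.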
Where can I find examples and documentation for building a sitemap generator app with Python and Streamlit?
Explore online tutorials, documentation for Streamlit and XML handling in Python, and resources specific to web app development for practical examples and guidance in creating a sitemap generator app for SEO.