We can strip out that unneeded data, often reducing file sizes by 80% or more. This can have a profound impact on site performance. Too often I see sites loading images over 1MB. Unless you’re running an art or photography store (where the highest possible quality can matter), that is ridiculous in this age. Many common CMSs now have built-in or plugin capabilities to handle image compression, but I still come across those uncommon or legacy CMSs that don’t, and then you need to use FTP. In that case, you need this tutorial!
In this tutorial, I’m going to show you how easy it is to set up an automated process to download new images for the day, compress them and then upload them back to the server. All for free!
Requirements and Assumptions
- Python 3 is installed and basic Python syntax understood
- Access to a Linux installation (I recommend Ubuntu) or Google Cloud Platform (some alterations will need to be made)
- FTP login to host server and access to the server image folder
Import Modules and Set Authentication
Before we begin, remember to watch the indents of anything you copy here, as sometimes the code snippets don’t copy perfectly. Most of the modules below ship with core Python 3, but dateutil and PIL (provided by the Pillow package) may need to be installed separately. I found I needed to update Pillow to the latest version, 8.2; you can do that via this command in your terminal:
pip3 install Pillow --upgrade
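Pillow reports its version at runtime, so you can confirm the upgrade took effect with a quick check (assuming Pillow is already installed):

```python
import PIL

# Print the installed Pillow version to confirm the upgrade worked
print(PIL.__version__)
```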
- ftplib: handles the FTP connection
- pathlib: helps identify the image extension cleanly
- dateutil: extends the datetime module
- datetime: handles the date and time functions
- PIL: processes the image compression
- os: handles local file paths and sizes
- glob: finds local files matching a pattern
Import Python Modules
Let’s first import the modules described above that this script needs.
from ftplib import FTP
import pathlib
from dateutil import parser
from datetime import date
from datetime import datetime
from PIL import Image
import PIL
import os
import glob
Setup FTP Connection
Next, we set up our FTP connection variables. Note that this script as written uses plain, unencrypted FTP. If you need FTP over TLS, there are a few small modifications, like using the FTP_TLS() class instead, which you can find in the ftplib documentation. Then we make the connection using the variables we just set up. Lastly, we open the folder where the images are. Many systems will create subfolders; in that case, you’ll need to develop some kind of recursive method to loop through the folders, which is beyond this tutorial.
host = "YOUR_SERVER_IP"
port = 21  # your FTP port, as an integer
username = "YOUR_USERNAME"
password = "YOUR_PASSWORD"
img_folder_path = "WHERE_YOUR_IMAGES_ARE_STORED_ON_SERVER"

ftp = FTP()
ftp.set_debuglevel(2)
ftp.connect(host, port)
ftp.login(username, password)
ftp.cwd(img_folder_path)
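If your host does require TLS, the connection block changes only slightly. Here is a sketch using the same placeholder credentials (FTP_TLS is part of core ftplib; the network calls are commented out so the sketch runs without a live server):

```python
from ftplib import FTP_TLS

# Same flow as the plain-FTP block, but over TLS
ftps = FTP_TLS()
ftps.set_debuglevel(2)
# ftps.connect("YOUR_SERVER_IP", 21)
# ftps.login("YOUR_USERNAME", "YOUR_PASSWORD")
# ftps.prot_p()  # also encrypt the data channel, not just the login
# ftps.cwd("WHERE_YOUR_IMAGES_ARE_STORED_ON_SERVER")
print(type(ftps).__name__)
```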
Setup Script Variables
Next, we capture the names of all the images in the folder using the ftp.mlsd() function. We can only optimize JPG and PNG files so let’s put them in a list to match against the files on the server. We’ll then need to define where to locally store the uncompressed images we want to optimize. Finally, we store today’s date to match against the files on the server so we only download what was uploaded today. This prevents you from downloading everything and optimizing them into infinity. Lastly, we create a log file if one doesn’t already exist so we can keep track of our optimizations over time.
names = ftp.mlsd()
imglist = [".jpg", ".jpeg", ".png", ".JPG", ".JPEG", ".PNG"]
rawpath = "UNCOMPRESSED_IMG_PATH_LOCAL"

today = date.today()
todayshort = today.strftime("%Y-%m-%d")  # dashed format, to match the parsed mlsd() date below

# Make sure the subfolder for optimized copies (and the log) exists
os.makedirs(rawpath + "opt/", exist_ok=True)
logfile = open(rawpath + "opt/log.txt", "a")
Loop Through Images
Now that we have captured all the file names in the folder on the server, we can process only those that have a modified date equal to today and an extension that is in our list. For example, files uploaded yesterday, or GIF and WebP files, will not be downloaded for processing.
for name, facts in names:
    mod_date = str(datetime.strptime(facts["modify"], "%Y%m%d%H%M%S"))[:10]
    if pathlib.Path(name).suffix in imglist and mod_date == todayshort:
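To see what that comparison is doing, here is the same parsing applied to a sample (hypothetical) MLSD modify fact. Note the dashed YYYY-MM-DD result: today’s date must be formatted in that same dashed layout for the match to ever succeed.

```python
from datetime import datetime

# A hypothetical "modify" fact, in the YYYYMMDDHHMMSS form ftp.mlsd() returns
fact = "20210504153012"

# str() on a datetime yields "2021-05-04 15:30:12"; slicing keeps the date part
mod_date = str(datetime.strptime(fact, "%Y%m%d%H%M%S"))[:10]
print(mod_date)  # → 2021-05-04
```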
If we encounter a file that satisfies our requirements, we use ftp.retrbinary() to download the file to whatever path we set for rawpath. Next, we make a variable for the uncompressed image named filename that includes the local path of the file. We then save the optimized image into a subfolder. This way you can always revert to the original image if you aren’t happy with the results. Never overwrite your original file; always keep backups.
Finally, we use the Image function in the PIL module to open the image and resave the file to the optimized subfolder, setting the optimize parameter to True and the quality to a reasonable 65. Play around with the quality level. For some images, you can go down as far as 35 and have it be OK, but in general, you’ll find yourself between 65 and 80.
        filename = rawpath + name
        filename_opt = rawpath + "opt/" + name

        # Download the original into our local raw folder
        with open(filename, 'wb') as f:
            ftp.retrbinary("RETR " + name, f.write)

        # Recompress into the opt/ subfolder, keeping the original untouched
        picture = Image.open(filename)
        picture.save(filename_opt, optimize=True, quality=65)
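To get a feel for the quality trade-off, here is a quick standalone experiment. It uses a generated gradient image so the snippet is self-contained; substitute one of your own photos for honest numbers.

```python
import io
from PIL import Image

# Build a synthetic 400x400 gradient so the snippet needs no input file
img = Image.new("RGB", (400, 400))
img.putdata([(x % 256, y % 256, (x + y) % 256)
             for y in range(400) for x in range(400)])

# Save at a few quality levels and compare the resulting byte counts
sizes = {}
for quality in (35, 65, 80):
    buf = io.BytesIO()
    img.save(buf, "JPEG", optimize=True, quality=quality)
    sizes[quality] = len(buf.getvalue())
    print(quality, "->", sizes[quality], "bytes")
```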
Upload and Log
With the new optimized file ready, it’s time to upload it back to the server and overwrite the original file (keep your local original backup until you are sure the result is acceptable). Then we log the details and do some good housekeeping by closing the files and the connection.
        # Upload the optimized file, overwriting the original on the server
        fp = open(filename_opt, 'rb')
        ftp.storbinary('STOR %s' % os.path.basename(filename_opt), fp, 1024)
        fp.close()

        # Log the before/after sizes and the percentage saved
        org_size = os.path.getsize(filename)
        opt_size = os.path.getsize(filename_opt)
        logfile.write(str(today) + " - " + name + " Org: " + str(org_size/1024) + "kb Opt: " + str(opt_size/1024) + "kb " + str(round((opt_size - org_size) / org_size * 100, 1)) + "% savings\n")

ftp.quit()
logfile.close()
That’s it! Now it’s time to automate!
Automating the Compression
I can show you two options for automating this process to run at the end of each day (say, 11:30 pm), so it catches everything uploaded that day:
- Send it to the cloud and use Google Cloud Platform. I have a tutorial on setting up Google Cloud Platform with Cloud Functions and Cloud Scheduler.
- Automate it locally via your cronjob system if you are using a Linux distro or MacOS. See below:
Luckily, Linux already supplies us with a solution by using the crontab. The crontab stores entries of scripts where you can dictate when to execute them (like a scheduler). You have lots of flexibility with how you schedule your script (any time of day, day of the week, day of the month, etc.).
But first, if you are going this route, you should add a shebang to the very top of your script; it tells Linux to run the script using Python3:
#!/usr/bin/python3
Now back to the crontab! To open it and add entries, run this command:
crontab -e
It will likely open up the crontab file in vi editor. On a blank line at the bottom of the file, type the code below. This code will run the script at midnight every Sunday. To change the time to something else, use this cronjob time editor. Customize with your path to the script.
0 0 * * SUN /usr/bin/python3 PATH_TO_SCRIPT/filename.py
If you want to create a log file to record each time the script ran, you can use this instead. Customize with your path to the script.
0 0 * * SUN /usr/bin/python3 PATH_TO_SCRIPT/filename.py > PATH_TO_FILE/FILENAME.log 2>&1
Save the crontab file and you’re good to go! Just note that your computer needs to be on at the time the cronjob is set to run.
So there you have it! You can now optimize those images over FTP, for free and fully automated. Set it and forget it! Don’t forget Core Web Vitals are going to be a ranking factor in May 2021. Future extensions for the tutorial would be to handle multiple folders and recursion. Now get out there and try it out! Follow me on Twitter and let me know your applications and ideas!
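On that note, the recursive folder handling could be sketched roughly like this. It is only an outline, and it assumes the server reports a "type" fact ("dir"/"file") for each mlsd() entry, which most servers do:

```python
# Sketch: collect file paths from a folder and all its subfolders over FTP.
# Relies on the "type" fact that mlsd() returns on most servers.
def walk_ftp(ftp, path, collected):
    ftp.cwd(path)
    for name, facts in ftp.mlsd():
        if facts.get("type") == "dir":
            walk_ftp(ftp, path + "/" + name, collected)
            ftp.cwd(path)  # step back up after recursing
        elif facts.get("type") == "file":
            collected.append(path + "/" + name)
```

You would call walk_ftp(ftp, img_folder_path, files) and then filter files by extension and modified date exactly as before.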
Thanks to James Phoenix for the inspiration after reading his tutorial.