In 2017 Chrome released a headless (no GUI) mode that can take a screenshot of a single web page at a specified viewport size. This is helpful for keeping an archive for version comparison, for monitoring and for client-facing deliverables. Because it runs without a GUI it’s a good fit for scripting with Python. In just a few lines we’ll run headless Chrome, take the screenshot, optimize it, and then we’re ready to do whatever else we want with it. Let’s get started!
Requirements and Assumptions
- Python 3 is installed and basic Python syntax understood
- Access to a Linux installation (I recommend Ubuntu)
- Chrome Browser installed
Starting the Script
First let’s install the optimize-images (command-line) module, which we’ll use to compress the screenshot into an optimized PNG file.
pip3 install optimize-images
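Before going further it can help to confirm that both command-line tools are actually on your PATH. The snippet below is a minimal sketch of such a check. The helper name find_executable and the list of browser binary names are my own assumptions, so adjust them for your distribution.

```python
import shutil

# Candidate binary names are assumptions; adjust for your distribution.
BROWSER_CANDIDATES = ("chromium-browser", "chromium", "google-chrome")


def find_executable(candidates):
    """Return the full path of the first candidate found on PATH, or None."""
    for candidate in candidates:
        path = shutil.which(candidate)
        if path:
            return path
    return None


browser = find_executable(BROWSER_CANDIDATES)
optimizer = find_executable(("optimize-images",))
print("Browser:", browser or "not found")
print("optimize-images:", optimizer or "not found")
```

If either line prints "not found", fix the installation before running the rest of the script.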
Next we import our required modules. All we need are the os, datetime and time modules, which are all part of core Python. We’ll use the os module to run both Chrome and optimize-images. Remember that optimize-images is a command-line module, so we’ll need to run it as an executable.
import os
from datetime import datetime
import time
Let’s set a few variables we’ll need. The name variable will be used when creating the screenshot file name. Reassign the name variable to your website name, no spaces. Reassign the url to the address of the page you want to take a screenshot of.
name = "importsem"
url = "https://importsem.com"
getdate = datetime.now().strftime("%m-%d-%y")
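The same file name pattern gets rebuilt several times later in the script, so here is a small sketch of building it in one place. The function screenshot_path and the /tmp/screenshots directory are hypothetical and only there for illustration.

```python
import os
from datetime import datetime


def screenshot_path(dest_dir, name, label, when=None):
    """Build a path like <dest_dir>/<name>_<label>_<mm-dd-yy>.png."""
    when = when or datetime.now()
    filename = f"{name}_{label}_{when.strftime('%m-%d-%y')}.png"
    return os.path.join(dest_dir, filename)


# Hypothetical destination directory and a fixed date, used only for illustration.
print(screenshot_path("/tmp/screenshots", "importsem", "org", datetime(2021, 3, 9)))
```

Passing "org" or "op" as the label keeps the original and optimized file names consistent with each other.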
Take the Screenshot
Now we can go ahead and use the os module to run Chrome in headless mode, hide the scrollbars for a cleaner image and set the window size. The window size is something you’ll have to play around with depending on your layout. If your browser binary has a different name (for example google-chrome rather than chromium-browser), swap that in, and be sure to swap in your screenshot destination path. os.popen() returns without waiting for Chrome to finish, so we use a 15 second delay to make sure the file is ready before continuing. So that we can compare the original size to the optimized size, we’ll use the os.stat() function to get the file size in bytes.
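try:
    # Run headless Chrome and save the screenshot as the original ("_org_") PNG
    stream = os.popen("chromium-browser --headless --hide-scrollbars --screenshot='/PATH_TO_DESTINATION/" + name + "_org_" + getdate + ".png' --window-size=1920,1200 " + url)
    # Give Chrome time to finish writing the file
    time.sleep(15)
    # Record the original file size in bytes
    org_png = os.stat("/PATH_TO_DESTINATION/" + name + "_org_" + getdate + ".png").st_size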
Optimize the Screenshot Image
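Now that we have our screenshot PNG we should optimize it to save some file size. We’ll use the optimize-images module. It overwrites the file it is given, so we first copy the original to an _op_ file and optimize the copy, which leaves the original in place for comparison. In the documentation there are some configurations you can play with to get better compression. I just leave them on default and generally save about 10-15% of file size. Once again be sure to change the destination path in the code. We will use a 20 second delay to make sure the file is ready before continuing.
    # Copy the original first: optimize-images overwrites the file it is given
    stream = os.popen("cp '/PATH_TO_DESTINATION/" + name + "_org_" + getdate + ".png' '/PATH_TO_DESTINATION/" + name + "_op_" + getdate + ".png' && optimize-images '/PATH_TO_DESTINATION/" + name + "_op_" + getdate + ".png'")
    # Give optimize-images time to finish
    time.sleep(20)
    # Record the optimized file size in bytes
    op_png = os.stat("/PATH_TO_DESTINATION/" + name + "_op_" + getdate + ".png").st_size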
Now we close out the Try/Except catching any errors.
except Exception as err:
    # Most likely one of the PNG files was never written
    print("Screenshot failed: " + str(err))
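The fixed delays above are a guess at how long each command takes. As an alternative sketch (not what the script above does), the standard library’s subprocess.run() waits until the command exits, so no sleep is needed. The helper names build_screenshot_command and run_command are hypothetical, and capture_output assumes Python 3.7 or newer.

```python
import subprocess


def build_screenshot_command(browser, out_path, url, width=1920, height=1200):
    """Return the headless Chrome screenshot command as an argument list."""
    return [
        browser,
        "--headless",
        "--hide-scrollbars",
        f"--screenshot={out_path}",
        f"--window-size={width},{height}",
        url,
    ]


def run_command(args, timeout=60):
    """Run a command, wait for it to exit, and report success as a bool."""
    try:
        result = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    except (FileNotFoundError, subprocess.TimeoutExpired) as err:
        print(f"Command failed to run: {err}")
        return False
    if result.returncode != 0:
        print(f"Command exited with {result.returncode}: {result.stderr.strip()}")
    return result.returncode == 0
```

You could then call run_command(build_screenshot_command("chromium-browser", out_path, url)), with out_path set to your own destination file, and only move on to the optimize step when it returns True. Passing the arguments as a list also avoids shell quoting problems if a URL contains characters such as & or spaces.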
Compare the Two Images
To close out, let’s compare the file sizes and report the bytes saved from compression. First we check if the images exist where we expect them to be. Then we print out each image size and calculate the difference.
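org_path = "/PATH_TO_DESTINATION/" + name + "_org_" + getdate + ".png"
op_path = "/PATH_TO_DESTINATION/" + name + "_op_" + getdate + ".png"
# Only report sizes if both files exist where we expect them
if os.path.isfile(org_path) and os.path.isfile(op_path):
    print("Original Image: " + str(org_png))
    print("Optimized Image: " + str(op_png))
    print("Saved: " + str(org_png - op_png))
else:
    print("One of the files doesn't exist")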
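A raw byte count is easier to judge next to a percentage. Here is a small sketch of that calculation. The function name describe_savings is hypothetical and the sizes in the example call are made-up numbers, not measurements.

```python
def describe_savings(original_bytes, optimized_bytes):
    """Return a one-line summary of bytes and percentage saved."""
    saved = original_bytes - optimized_bytes
    # Guard against dividing by zero if the original file is empty
    percent = (saved / original_bytes * 100) if original_bytes else 0.0
    return f"Saved {saved} bytes ({percent:.1f}%)"


# Made-up sizes for illustration only.
print(describe_savings(200_000, 174_000))  # prints "Saved 26000 bytes (13.0%)"
```

In the script above you would pass org_png and op_png as the two arguments.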
You’ve just learned how easy it is to take screenshots from headless Chrome using Python. You can easily automate this script to store the data in a database, or loop through an entire site by loading a CSV from a Screaming Frog crawl containing every URL on your site. Feel free to comment or Tweet me any cool ways you’ve built headless Chrome screenshots into your own scripts and extended this one.
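As a starting point for the crawl-driven loop mentioned above, here is a rough sketch that reads URLs from a CSV and derives a file-name-safe name for each page. It assumes the export has a column named "Address", so check the header of your own file. The helper names and the sample rows are made up for illustration.

```python
import csv
import io
from urllib.parse import urlparse


def slug_from_url(url):
    """Turn a URL into a file-name-safe slug."""
    parsed = urlparse(url)
    raw = parsed.netloc + parsed.path
    # Replace anything that is not a letter or digit with a dash
    slug = "".join(ch if ch.isalnum() else "-" for ch in raw)
    return slug.strip("-") or "page"


def urls_from_crawl(csv_file, column="Address"):
    """Yield non-empty URLs from the given column of a crawl export."""
    for row in csv.DictReader(csv_file):
        url = (row.get(column) or "").strip()
        if url:
            yield url


# Made-up sample rows standing in for a real crawl export.
sample = io.StringIO(
    "Address,Status Code\n"
    "https://importsem.com/,200\n"
    "https://importsem.com/about/,200\n"
)
for page_url in urls_from_crawl(sample):
    print(slug_from_url(page_url), page_url)
```

With a real export you would pass an open file instead of the sample, for example open("crawl.csv", newline=""), and use each slug as the name variable when taking that page’s screenshot.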