SEOs wear many hats, and from time to time, whether during a technical audit or while troubleshooting, it’s nice to have public technical information handy for a domain you’re working on. Below are some Python tools you can use to grab that domain information. It would be easy to loop this over your entire client list, via a Python list or a database, and schedule it to run every morning so you always have the freshest information at your disposal.
Install Modules
!pip3 install python-whois
!pip3 install dnspython
!pip3 install pyOpenSSL
Note that the Whois module is dependent on having a Whois app on your computer. Neither Windows nor Google Colab ships with one, so this is best run on a Linux distribution such as Ubuntu. To make sure Whois is installed and up to date on Ubuntu, run the commands below.
sudo apt-get update
sudo apt-get install whois
Import Modules
import whois
import json
import requests
import re
import socket
import dns.resolver
import ssl
import OpenSSL
Now that the modules are installed and imported, we need to set the domain variable to the domain you want to look up.
domain="rocketclicks.com"
Get MX Records
mailservers = ""
for x in dns.resolver.resolve(domain, 'MX'):
    mailservers += x.to_text() + "\n"
print(mailservers)
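Each MX record carries a preference number, and mail is delivered to the lowest number first. As a minimal sketch, the printed lines can be sorted by that number. This assumes the "preference host" text form that the loop above prints; `sort_mx` is a hypothetical helper name and the hosts in the sample are made up.

```python
def sort_mx(mx_lines):
    """Return (preference, host) tuples sorted by preference, lowest first."""
    parsed = []
    for line in mx_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines such as a trailing newline
        preference, host = line.split(None, 1)
        # Drop the trailing dot that marks a fully qualified name
        parsed.append((int(preference), host.rstrip(".")))
    return sorted(parsed)


# Sample input in the same text form as the MX output (made-up hosts)
sample = "20 alt1.mail.example.com.\n10 mail.example.com.\n"
for preference, host in sort_mx(sample.splitlines()):
    print(preference, host)
```

With the script above you could pass `mailservers.splitlines()` instead of the sample string.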
Get WHOIS Records
Note, there is a bunch of information you can grab. Uncomment print(w) to see the JSON response and you can pick out what you want.
w = whois.whois(domain)
#print(w)
registrar = w['registrar']
expiration_date = w['expiration_date']
print(registrar)
print(expiration_date)
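If you plan to run this every morning, a "days until expiry" number is often more useful than the raw date. Here is a rough sketch of one way to compute it. It assumes the expiration date may come back as a single datetime, a list of datetimes, or nothing at all, so check that against your own output. `days_until` is a hypothetical helper, not part of the Whois module.

```python
from datetime import datetime


def days_until(expiration_date, now=None):
    """Days from `now` until the earliest expiration date, or None if unknown."""
    if isinstance(expiration_date, (list, tuple)):
        values = expiration_date
    else:
        values = [expiration_date]
    # Keep only real datetimes; drop tzinfo so naive and aware values compare
    dates = [d.replace(tzinfo=None) for d in values if isinstance(d, datetime)]
    if not dates:
        return None
    now = (now or datetime.now()).replace(tzinfo=None)
    return (min(dates) - now).days


# Example with fixed, made-up dates so the result does not depend on today
print(days_until(datetime(2030, 1, 31), now=datetime(2030, 1, 1)))
```

In the script above you would call `days_until(expiration_date)` and flag anything under, say, 30 days.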
Get Domain IP
domainip = socket.gethostbyname(domain)
print(domainip)
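The IP lookup is a handy place to try the "loop over your client list" idea from the introduction. The sketch below is one possible shape for it, not the only one. `resolve_all` and the stand-in resolver are hypothetical names, and the demo uses a made-up address from a documentation range so it runs offline.

```python
import socket


def resolve_all(domains, resolver=socket.gethostbyname):
    """Map each domain to its IP address, or "n/a" if the lookup fails."""
    results = {}
    for name in domains:
        try:
            results[name] = resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = "n/a"
    return results


# Offline demo with a stand-in resolver and a made-up address
fake_dns = {"example.com": "203.0.113.10"}


def fake_lookup(name):
    if name not in fake_dns:
        raise socket.gaierror("name not found")
    return fake_dns[name]


print(resolve_all(["example.com", "missing.example"], resolver=fake_lookup))
```

Drop the `resolver` argument to do live lookups, for example `resolve_all(["rocketclicks.com"])`. The list could just as easily come from a database query.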
Get NameServers
dnsrecords = ""
getresolver = dns.resolver.Resolver()
getns = getresolver.resolve(domain, "NS")
for rdata in getns:
    dnsrecords += str(rdata) + "\n"
print(dnsrecords)
Get Text Records (SPF)
You need the Try/Except because not all domains will have text records. See this SPF/DMARC module to extend this with validation and warning outputs.
textrecords = ""
getresolver = dns.resolver.Resolver()
try:
    gettext = getresolver.resolve(domain, "TXT")
    for rdata in gettext:
        textrecords += str(rdata) + "\n"
except dns.exception.DNSException:
    # No TXT records, or the lookup failed
    textrecords = "n/a"
print(textrecords)
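The TXT lookup returns every text record, not just SPF, so you may want to filter the output. A small sketch of that filter follows. It assumes each line is a quoted string (or several quoted chunks separated by spaces), which is how the records print above. `find_spf` is a hypothetical helper and the sample records are made up.

```python
def find_spf(txt_lines):
    """Return the TXT values that look like SPF policies."""
    spf = []
    for line in txt_lines:
        # Join multi-chunk records ("part1" "part2") and drop the outer quotes
        value = line.strip().replace('" "', "").strip('"')
        lowered = value.lower()
        if lowered == "v=spf1" or lowered.startswith("v=spf1 "):
            spf.append(value)
    return spf


# Made-up sample in the same form as the printed TXT records
sample = [
    '"v=spf1 include:_spf.example.com ~all"',
    '"google-site-verification=abc123"',
]
print(find_spf(sample))
```

With the script above you could call `find_spf(textrecords.splitlines())`. An empty list means no SPF record was found.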
Get Server Response Header Info
# Build the URL from the domain set earlier
url = "https://" + domain
response = requests.head(url, verify=True, timeout=10)
header = dict(response.headers)
headerinfo = ""
for key, value in header.items():
    headerinfo += key + ': ' + value + "\n"
print(headerinfo)
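The full header dump can be noisy, so here is a sketch that pulls out only a few headers. The selection (Server, X-Robots-Tag, Cache-Control, Content-Type) is my own assumption about what an SEO might care about; adjust it to taste. `pick_headers` is a hypothetical helper, and it lowercases the keys because a plain dict lookup is case-sensitive.

```python
def pick_headers(headers, wanted=("server", "x-robots-tag", "cache-control", "content-type")):
    """Return the wanted headers (matched case-insensitively), "n/a" if absent."""
    lowered = {key.lower(): value for key, value in headers.items()}
    return {name: lowered.get(name, "n/a") for name in wanted}


# Made-up sample headers in the same dict form as `header` above
sample = {"Server": "nginx", "Content-Type": "text/html; charset=UTF-8"}
for name, value in pick_headers(sample).items():
    print(name + ": " + value)
```

In the script above, `pick_headers(header)` would take the dict built from the response.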
TLS Version and Certificate Info
try:
    # Fetch the PEM certificate and parse it with pyOpenSSL
    cert = ssl.get_server_certificate((domain, 443))
    x509 = OpenSSL.crypto.load_certificate(OpenSSL.crypto.FILETYPE_PEM, cert)

    # notAfter starts with the date as YYYYMMDD; reformat it as MM-DD-YYYY
    expobj = str(x509.get_notAfter())
    expiredate = re.search(r"[0-9]{8}", expobj)
    date1 = expiredate.group(0)
    datey = date1[0:4]
    datem = date1[4:6]
    dated = date1[6:8]
    date = datem + "-" + dated + "-" + datey

    # Extract the issuer's common name (CN)
    issueobj = str(x509.get_issuer())
    issuer = re.search(r"CN=[a-zA-Z0-9\s'-]+", issueobj)
    issuer1 = issuer.group(0).replace("'", "")
    print(issuer1)
    sslinfo = "Expiry Date: " + date + " \n Issuer: " + issuer1
except Exception:
    sslinfo = "n/a"

# Open a TLS connection to see which protocol version is negotiated
hostname = domain
context = ssl.create_default_context()
try:
    with socket.create_connection((hostname, 443)) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as ssock:
            tls = ssock.version()
            tls = tls.replace("TLSv", "")
            sslerror = "0"
except Exception:
    tls = "0"

print(sslinfo)
print(tls)
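For monitoring, the number of days left on the certificate is easier to alert on than a formatted date. The sketch below parses the notAfter value with `datetime.strptime` instead of a regex. It assumes the value is in the `YYYYMMDDhhmmssZ` form (UTC); other forms would need extra handling. `cert_days_left` is a hypothetical helper and the dates in the demo are made up.

```python
from datetime import datetime, timezone


def cert_days_left(not_after, now=None):
    """Days until expiry for a notAfter value such as b'20301231235959Z'.

    `now`, if given, must be a timezone-aware datetime.
    """
    if isinstance(not_after, bytes):
        not_after = not_after.decode("ascii")
    expires = datetime.strptime(not_after, "%Y%m%d%H%M%SZ").replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # Negative values mean the certificate has already expired
    return (expires - now).days


# Fixed, made-up dates so the result does not depend on today
print(cert_days_left(b"20300131000000Z", now=datetime(2030, 1, 1, tzinfo=timezone.utc)))
```

In the script above you would pass `x509.get_notAfter()` directly, inside the same try block.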
Conclusion
Now you have the framework to begin creating your own uptime monitor using a Raspberry Pi, electrical breadboard, and an LCD screen. Lots more potential on this one! And there you have it! If you find more modules and opportunities to scrape technical information for domains, please let me know and I’ll add them to this list! In a follow-up post, I’ll show how to check for blacklists, reverse IPs, and the technologies a website is using. Stay tuned! Now get out there and try it out! Follow me on Twitter and let me know your Python SEO applications and ideas!