Use Python to Scrape Technical Info for Domains

SEOs wear many hats. During a technical audit or troubleshooting, it’s useful to have a domain’s public technical information on hand. Below are some Python snippets you can use to fetch that information. You can easily loop them over your client list (stored in a Python list or a database) and schedule the script to run every morning so you always have fresh data.

Install Modules

!pip3 install whois

!pip3 install dnspython

!pip3 install pyOpenSSL

Note that the Whois module depends on a whois client installed on your system. Windows and Google Colab do not include one by default. This is best run on Linux distributions such as Ubuntu. To ensure whois is installed and up to date on Ubuntu, run the commands below.

sudo apt-get update
sudo apt-get install whois

Import Modules

import whois
import json
import requests
import re
import socket
import dns.resolver
import ssl
import OpenSSL

After installing and importing the modules, set the domain variable to the domain you want to query.

domain="rocketclicks.com"
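Since the introduction suggests looping over a client list, here is a minimal sketch of one way to structure that. The names clients, collect, and lookup are illustrative and not from the original post; lookup stands in for any per-domain check, and it defaults to a simple IP lookup only as an example.

```python
import socket

# Hypothetical client list; replace with your own domains or a database query.
clients = ["rocketclicks.com"]

def collect(domains, lookup=socket.gethostbyname):
    """Run one lookup per domain and record "n/a" when a lookup fails."""
    results = {}
    for name in domains:
        try:
            results[name] = lookup(name)
        except Exception:
            # One failing domain should not stop the rest of the run.
            results[name] = "n/a"
    return results
```

Calling collect(clients) would return a dictionary keyed by domain. Passing a different function as lookup lets the same loop drive other checks.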

Get MX Records

mailservers = "" 
for x in dns.resolver.resolve(domain, 'MX'): 
    mailservers += x.to_text() + "\n"
print(mailservers)
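Each MX line has the form "preference hostname.", and lower preference values are tried first. As a rough sketch, the helper below (parse_mx is a made-up name, not part of dnspython) turns the mailservers string built above into a sorted list, which may be easier to store or compare between runs.

```python
def parse_mx(mailservers):
    """Turn lines like "10 mail.example.com." into (preference, host) tuples, lowest first."""
    records = []
    for line in mailservers.splitlines():
        parts = line.split()
        # Skip anything that is not "<number> <hostname>"
        if len(parts) == 2 and parts[0].isdigit():
            records.append((int(parts[0]), parts[1].rstrip(".")))
    return sorted(records)
```

For example, parse_mx(mailservers) could be called right after the loop above.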

Get WHOIS Records

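There is a lot of information available. Uncomment print(w) to view the full response and select the fields you need. Note that whois.whois() with dictionary-style access is the interface of the python-whois package on PyPI; the package named whois exposes whois.query() instead, so adjust the install or the call if this line raises an AttributeError.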

w = whois.whois(domain)

#print(w)

registrar = w['registrar']
expiration_date = w['expiration_date']

print(registrar)
print(expiration_date)
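For monitoring, the number of days left before the domain expires is often more useful than the raw date. The sketch below assumes that expiration_date may come back as a single datetime, a list of datetimes, or None, depending on the registry; days_until is a hypothetical helper name.

```python
from datetime import datetime

def days_until(expiration, now=None):
    """Return whole days until expiration, or None if no usable date was returned."""
    # Assumption: the value may be a datetime, a list of datetimes, or None.
    if isinstance(expiration, list):
        dates = [d for d in expiration if isinstance(d, datetime)]
        expiration = dates[0] if dates else None
    if not isinstance(expiration, datetime):
        return None
    # Match the timezone awareness of the expiration value to avoid a TypeError.
    now = now or datetime.now(tz=expiration.tzinfo)
    return (expiration - now).days
```

With the variables above, days_until(expiration_date) gives a number you can compare against a warning threshold of your choosing.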

Get Domain IP

domainip = socket.gethostbyname(domain)
print(domainip)
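socket.gethostbyname returns a single IPv4 address. If a domain sits behind several addresses or also serves IPv6, socket.getaddrinfo can list them all. A minimal sketch, with resolve_all as an illustrative name:

```python
import socket

def resolve_all(host):
    """Return the sorted, de-duplicated IPv4 and IPv6 addresses for host."""
    infos = socket.getaddrinfo(host, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

resolve_all(domain) could replace or complement the single lookup above.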

Get NameServers

dnsrecords=""
getresolver = dns.resolver.Resolver() 
getns = getresolver.resolve(domain, "NS") 
for rdata in getns:
    dnsrecords += str(rdata) + "\n"
print(dnsrecords)

Get Text Records (SPF)

A try/except block is needed because not all domains have TXT records. See the SPF/DMARC module to extend this with validation and warnings.

textrecords = ""
getresolver = dns.resolver.Resolver()

try:
    gettext = getresolver.resolve(domain, "TXT")
    for rdata in gettext:
        textrecords += str(rdata) + "\n"
except Exception:
    # No TXT records were returned, or the lookup failed
    textrecords = "n/a"
print(textrecords)
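TXT records hold more than SPF (site verification tokens, for example), so it can help to pick out the SPF entry specifically. The sketch below works on the textrecords string built above and assumes each line is one TXT record rendered as one or more quoted chunks; find_spf is a hypothetical helper, and records containing escaped quotes are not handled.

```python
import re

def find_spf(textrecords):
    """Return the first TXT value that starts with v=spf1, or None if there is none."""
    for line in textrecords.splitlines():
        # Join the quoted chunks of a record; fall back to the raw line if unquoted.
        value = "".join(re.findall(r'"([^"]*)"', line)) or line.strip()
        if value.lower().startswith("v=spf1"):
            return value
    return None
```

find_spf(textrecords) returns None both when the domain has no SPF record and when textrecords is "n/a".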

Get Server Response Header Info

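# Build the URL from the domain; requests.head() does not follow redirects by default
url = "https://" + domain
response = requests.head(url, verify=True)
header = dict(response.headers)
headerinfo = ""
for key, value in header.items():
    headerinfo += key + ': ' + value + "\n"
print(headerinfo)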
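A full header dump can be noisy. If only a few headers matter for an audit, a small filter like the sketch below may help. The pick_headers name and the default list of headers are illustrative choices, not part of requests; matching is case-insensitive because header names can arrive in any casing once converted to a plain dict.

```python
def pick_headers(header, wanted=("server", "content-type", "cache-control", "x-robots-tag")):
    """Return the wanted headers (matched case-insensitively) that are present."""
    lowered = {key.lower(): value for key, value in header.items()}
    return {name: lowered[name] for name in wanted if name in lowered}
```

For example, pick_headers(header) would keep only the listed headers from the response above.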

TLS Version and Certificate Info

try:
    # Fetch the server's certificate as PEM text and parse it
    cert = ssl.get_server_certificate((domain, 443))
    x509 = OpenSSL.crypto.load_certificate(OpenSSL.crypto.FILETYPE_PEM, cert)

    # get_notAfter() returns bytes such as b'20300111000000Z'; keep the YYYYMMDD part
    expobj = x509.get_notAfter().decode("ascii")
    expiredate = re.search(r"[0-9]{8}", expobj)

    date1 = expiredate.group(0)
    datey = date1[0:4]
    datem = date1[4:6]
    dated = date1[6:8]
    date = datem + "-" + dated + "-" + datey

    # Pull the issuer's common name (CN) out of the issuer string
    issueobj = str(x509.get_issuer())
    issuer = re.search(r"CN=[a-zA-Z0-9\s'-]+", issueobj)
    issuer1 = issuer.group(0).replace("'", "")
    print(issuer1)

    sslinfo = "Expiry Date: " + date + "\nIssuer: " + issuer1
except Exception:
    sslinfo = "n/a"

hostname = domain
context = ssl.create_default_context()

try:
    # Open a TLS connection and read the negotiated protocol version, e.g. "1.3"
    with socket.create_connection((hostname, 443)) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as ssock:
            tls = ssock.version()
            tls = tls.replace("TLSv", "")
except Exception:
    tls = "0"

print(sslinfo)
print(tls)
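If the goal is to be warned before a certificate expires, a day count may be easier to act on than a formatted date. The sketch below assumes the notAfter value has the form YYYYMMDDhhmmssZ (UTC), which is what x509.get_notAfter() typically returns as bytes; cert_days_left is a hypothetical helper name.

```python
from datetime import datetime, timezone

def cert_days_left(not_after, now=None):
    """Parse a time such as b'20300111000000Z' and return whole days remaining."""
    if isinstance(not_after, bytes):
        not_after = not_after.decode("ascii")
    expires = datetime.strptime(not_after, "%Y%m%d%H%M%SZ")
    # Compare against the current UTC time as a naive datetime, like expires.
    now = now or datetime.now(timezone.utc).replace(tzinfo=None)
    return (expires - now).days
```

Inside the first try block above, this could be called as cert_days_left(x509.get_notAfter()).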

 

Conclusion

Now you have a framework for collecting a domain’s public technical information with Python, and there’s plenty of potential to expand. If you discover additional modules or methods for scraping domain technical information, let me know and I’ll add them to this list. In a follow-up post I’ll cover checking blacklists, reverse IPs, and detecting a site’s technologies. Stay tuned. Try this out, and follow me on Twitter to share your Python SEO applications and ideas!

Python and Domain Info FAQ

How can Python be employed to scrape technical information for domains?

Python libraries can query a domain’s public records directly. This post uses whois for registration data, dnspython for DNS records, requests for response headers, and ssl with pyOpenSSL for certificate details.

Are there specific Python libraries commonly used for web scraping technical information from domains?

For domain data, common choices include dnspython, whois, requests, and pyOpenSSL, alongside the standard-library socket and ssl modules. BeautifulSoup is useful when you also need to parse a page’s HTML.

What technical details can be scraped for domains using Python?

Python scripts can gather DNS records, SSL certificate details, server headers, and other data associated with a domain.

Are there any ethical considerations or legal implications when scraping technical information for domains?

Adhere to ethical scraping practices, respect site terms of service, and comply with applicable laws when extracting domain technical information.

Where can I find examples and documentation for using Python to scrape technical information for domains?

See online tutorials and the documentation for dnspython, requests, and pyOpenSSL, along with the Python standard-library documentation for the socket and ssl modules, for examples and guidance on extracting domain technical details.

Greg Bernhardt