7

I have a large list of domains and I need to check whether the domains are available now. I do it like this:

import requests
list_domain = ['google.com', 'facebook.com']
for domain in list_domain:
    result = requests.get(f'http://{domain}', timeout=10)
    if result.status_code == 200:
        print(f'Domain {domain} [+++]')
    else:
        print(f'Domain {domain} [---]')

But the check is too slow. Is there a way to make it faster? Maybe someone knows an alternative method for checking domains for existence?

  • Well, this is somewhat bad manners on the internet, but you could look into doing the requests in parallel (using multiprocessing) or asynchronously (sketched below the comments). – mCoding Nov 25 '20 at 17:21
  • Relevant https://stackoverflow.com/questions/29773003/check-whether-domain-is-registered/29773604 – Equinox Nov 25 '20 at 17:27
  • @venky__ This answer doesn't suit me. These examples do not work for me on Windows + Python3. – Владимир Nov 25 '20 at 17:37
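
For reference, here is a minimal sketch of the asynchronous route mentioned in the first comment. It assumes the third-party aiohttp package is installed (pip install aiohttp); the concurrency cap of 20 and the 10-second timeout are arbitrary example values, and the [+++]/[---] labels just mirror the question's output format.

import asyncio
import aiohttp

list_domain = ['google.com', 'facebook.com']

async def check(session, domain, sem):
    async with sem:
        try:
            # Any response at all means something answered for this domain
            async with session.get(f'http://{domain}',
                                   timeout=aiohttp.ClientTimeout(total=10)):
                print(f'Domain {domain} [+++]')
        except (aiohttp.ClientError, asyncio.TimeoutError):
            print(f'Domain {domain} [---]')

async def main():
    sem = asyncio.Semaphore(20)  # cap on concurrent requests (example value)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(check(session, d, sem) for d in list_domain))

asyncio.run(main())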

5 Answers

3

If you want to check which domains are available, the more correct approach would be to catch the ConnectionError from the requests module, because even if you get a response code that is not 200, the fact that there is a response means that there is a server associated with that domain. Hence, the domain is taken.

This is not foolproof in terms of checking for domain availability, because a domain might be taken but not have an appropriate A record associated with it, or the server may just be down for the time being.

The code below also runs the checks concurrently, using a thread pool.

from concurrent.futures import ThreadPoolExecutor
import requests
from requests.exceptions import ConnectionError

def validate_existence(domain):
    try:
        response = requests.get(f'http://{domain}', timeout=10)
    except ConnectionError:
        print(f'Domain {domain} [---]')
    else:
        print(f'Domain {domain} [+++]')


list_domain = ['google.com', 'facebook.com', 'nonexistent_domain.test']

with ThreadPoolExecutor() as executor:
    executor.map(validate_existence, list_domain)
sarartur
2

You can use the socket library to determine whether a domain has a DNS entry; this is quick and may be a good enough proxy:

>>> import socket
>>> 
>>> addr = socket.gethostbyname('google.com')
>>> addr
'74.125.193.100'
>>> socket.gethostbyname('googl42652267e.com')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
socket.gaierror: [Errno -2] Name or service not known
>>> 

import socket

def check_dns(hostname: str) -> bool:
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.error:
        return False

assert check_dns(hostname='www.google.com') is True
assert check_dns(hostname='www.fhsdfh2462fgfwegryt2g.com') is False

Or you can use the python-whois library to check if the domain is registered.
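
A rough sketch of that second option is below, assuming the python-whois package (pip install python-whois, imported as whois). Exactly what the library returns or raises for an unregistered name varies by TLD and library version, so treat this helper as a starting point rather than a definitive check.

import whois  # provided by the python-whois package

def is_registered(domain: str) -> bool:
    # A registered domain normally comes back with a populated WHOIS record;
    # unregistered ones tend to raise an error or return empty fields, but the
    # exact behaviour differs between TLDs and library versions.
    try:
        record = whois.whois(domain)
    except Exception:
        return False
    return bool(record.domain_name)

print(is_registered('google.com'))                  # expected: True
print(is_registered('fhsdfh2462fgfwegryt2g.com'))   # expected: False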

cleder
1

You can do that via the requests-futures module. requests-futures runs the requests asynchronously; with an average internet connection it can check 8-10 URLs per second (based on my experience).
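
The answer does not include code, so here is a rough sketch of what that could look like, assuming the requests-futures package is installed (pip install requests-futures); max_workers=8 is an example value, and the output format copies the question.

from concurrent.futures import as_completed
from requests_futures.sessions import FuturesSession
from requests.exceptions import RequestException

list_domain = ['google.com', 'facebook.com']

session = FuturesSession(max_workers=8)  # example worker count
futures = {session.get(f'http://{domain}', timeout=10): domain
           for domain in list_domain}

for future in as_completed(futures):
    domain = futures[future]
    try:
        future.result()  # any HTTP response at all means the name resolves
        print(f'Domain {domain} [+++]')
    except RequestException:
        print(f'Domain {domain} [---]')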

Mr. lindroid
0

What you can do is run the script multiple times, giving each run only a limited number of domains, to make it speedy.
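
The answer describes doing this by hand; one way to automate the same idea is to split the work across worker processes with the standard library's multiprocessing.Pool. The sketch below mirrors the question's check, and processes=4 is an arbitrary example value.

from multiprocessing import Pool
import requests
from requests.exceptions import RequestException

list_domain = ['google.com', 'facebook.com']

def check_domain(domain):
    # Same check as in the question, wrapped so each worker process can run it
    try:
        requests.get(f'http://{domain}', timeout=10)
        print(f'Domain {domain} [+++]')
    except RequestException:
        print(f'Domain {domain} [---]')

if __name__ == '__main__':
    # Pool splits list_domain across the worker processes for us
    with Pool(processes=4) as pool:
        pool.map(check_domain, list_domain)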

David
-1

Use Scrapy. It is way faster, and by default it only yields 200 responses unless you override that, so in your case follow these steps.

pip install scrapy 

After installing, use the terminal in your project folder to create a project:

scrapy startproject projectname projectdir

It will create a folder named projectdir.

Now

cd projectdir

Inside projectdir, enter:

scrapy genspider mydomain mydomain.com

Now navigate to the spiders folder and open mydomain.py.

Now add a few lines of code:

import scrapy


class MydomainSpider(scrapy.Spider):
    name = "mydomain"

    def start_requests(self):
        # URLs must include the scheme, otherwise Scrapy rejects the request
        urls = [
            'http://facebook.com',
            'http://google.com',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # Only responses with status code 200 reach this callback by default
        yield {'Available_Domains': response.url}

Now go back to projectdir and run:

scrapy crawl mydomain -o output.csv

You will have all the working domains (those that returned status code 200) in the output.csv file.

For more details, see the Scrapy documentation.

Assad Ali