IP Address lookups using Python

There was a script for each of the C-based, pure python-based and redis-based lookups.

c_based.py:

```python
from random import randint

import GeoIP

if __name__ == "__main__":
    gi = GeoIP.new(GeoIP.GEOIP_MEMORY_CACHE)
    ip_address = '.'.join([str(randint(0, 255)) for _ in range(0, 4)])
    gi.country_code_by_addr(ip_address)
```

pure_python_based.py:

```python
from random import randint

import pygeoip

if __name__ == "__main__":
    gi = pygeoip.GeoIP('/usr/share/GeoIP/GeoIP.dat',
                       flags=pygeoip.const.MMAP_CACHE)
    ip_address = '.'.join([str(randint(0, 255)) for _ in range(0, 4)])
    gi.country_code_by_addr(ip_address)
```

redis_based.py:

```python
from random import randint
import socket
import struct

import redis


def ip2long(ip):
    """
    Convert an IP string to long
    """
    packedIP = socket.inet_aton(ip)
    return struct.unpack("!L", packedIP)[0]


if __name__ == "__main__":
    # decode_responses=True returns str rather than bytes on Python 3,
    # so the .split('@') below works on both Python 2 and Python 3
    redis_con = redis.StrictRedis(host='localhost', port=6379, db=0,
                                  decode_responses=True)
    ip_address = ip2long('.'.join([str(randint(0, 255)) for _ in range(0, 4)]))
    # Fetch the first member whose score is >= ip_address
    resp = redis_con.zrangebyscore(name='countries', min=ip_address,
                                   max='+inf', start=0, num=1)
    country = resp[0].split('@')[0] if resp else None
```

I then created a bash script that runs each script 1000 times and reports how long each batch took:

```
$ cat benchmark.sh
#!/bin/bash

function c_based {
    for i in `seq 1 1000`; do
        python ./c_based.py
    done
}

function pure_python_based {
    for i in `seq 1 1000`; do
        python ./pure_python_based.py
    done
}

function redis_based {
    for i in `seq 1 1000`; do
        python ./redis_based.py
    done
}

time c_based
time pure_python_based
time redis_based
```

Here is the result of running the benchmark:

```
$ ./benchmark.sh
# C-based
real    0m8.407s
user    0m5.688s
sys     0m2.588s

# Pure python-based
real    0m17.498s
user    0m13.492s
sys     0m3.737s

# redis-based
real    0m29.075s
user    0m21.545s
sys     0m6.939s
```

For ad hoc requests, where every lookup starts a fresh Python process, the time it takes to load the database into memory levels the playing field a lot.
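Because every benchmark run starts a new Python process, the timings above mix interpreter start-up, imports, any database loading and a single lookup. A minimal sketch of separating the one-off load from the per-lookup cost inside one process follows. It assumes Python 3 (for `time.perf_counter`), and `time_load_and_lookups` and `random_ip` are hypothetical helpers, not part of the original scripts.

```python
import time
from random import randint


def random_ip():
    """Random dotted quad, built the same way as in the benchmark scripts."""
    return '.'.join(str(randint(0, 255)) for _ in range(4))


def time_load_and_lookups(load, lookup, n=1000):
    """Time a one-off load step separately from n lookups in one process.

    load:   zero-argument callable returning a database handle
    lookup: callable taking (handle, ip_string)
    Returns (load_seconds, average_seconds_per_lookup).
    """
    t0 = time.perf_counter()
    handle = load()
    load_seconds = time.perf_counter() - t0

    # Generate the addresses outside the timed loop
    ips = [random_ip() for _ in range(n)]
    t0 = time.perf_counter()
    for ip in ips:
        lookup(handle, ip)
    per_lookup = (time.perf_counter() - t0) / n
    return load_seconds, per_lookup


if __name__ == "__main__":
    # Stand-in "database" (an empty dict) so the sketch runs without GeoIP
    load_s, lookup_s = time_load_and_lookups(lambda: {},
                                             lambda db, ip: db.get(ip))
    print("load: %.6fs, per lookup: %.9fs" % (load_s, lookup_s))
```

To try it against the C-based script, the callables would be `lambda: GeoIP.new(GeoIP.GEOIP_MEMORY_CACHE)` and `lambda gi, ip: gi.country_code_by_addr(ip)`; the pygeoip and redis versions plug in the same way. Interpreter start-up is deliberately excluded, so these numbers are not directly comparable with the bash timings.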
The C-based approach is still about twice as fast as the pure python-based approach (8.4s vs 17.5s), but the redis-based approach is now only around 1.7 times as slow as the pure python-based approach (29.1s vs 17.5s).

The curious case of

The CSV database I downloaded from MaxMind and the binary one I installed via the libgeoip-dev package had differences between them. One of the test IP addresses I used when I started building these scripts was According to whois, the IP address is mapped to a network in Herndon, VA, USA and sits in the net range – When I ran a redis lookup manually, though, it came back with Romania as the country the IP address is mapped to:

```
$ redis-cli
> ZRANGEBYSCORE countries 2130706433 +inf LIMIT 0 1
1) "RO@2147483648"
```

The redis lookup implementation used in this blog always returns the entry with the closest score at or above the IP address being looked up. That means that if no range in the database covers the IP address, the miss won't be flagged up; the next range in the set is returned instead. I looked at the CSV file and it turns out there are no mappings for any 24.x.x.x ranges before 24.36.x.x:

```
$ grep '^"24.' GeoIPCountryWhois.csv | head
"","","405012480","405143551","CA","Canada"
"","","405143552","405180415","US","United States"
"","","405180416","405184511","CA","Canada"
"","","405184512","405364735","US","United States"
"","","405364736","405372927","CA","Canada"
"","","405372928","405422079","PR","Puerto Rico"
"","","405422080","405798911","US","United States"
"","","405798912","405831679","CA","Canada"
"","","405831680","405843967","US","United States"
"","","405843968","405848063","CA","Canada"
```

At this point I wondered if any IP addresses would return the same results from all three implementations and whois. I picked from the CSV file:

```
$ grep GeoIPCountryWhois.csv
"","","418693120","418709503","CA","Canada"
```

Whois said the IP address is mapped to a network in Richmond Hill, Ontario, Canada. All python-based and redis-based lookups returned Canada as their answer as well.
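One way to make misses visible is to keep both ends of every range. Find the last range that starts at or below the address, then accept it only if the address is also at or below that range's end. The sketch below does this in memory with `bisect`; it is an alternative technique, not the redis layout used in this blog. `RangeTable` and its `lookup` method are hypothetical names, the rows are copied from the CSV excerpts above, and `int(ipaddress.IPv4Address(ip))` stands in for `ip2long` (both give the same integer for a dotted quad, e.g. 2130706433 for 127.0.0.1).

```python
import bisect
import ipaddress

# (start, end, country) rows copied from the GeoIPCountryWhois.csv excerpts
RANGES = [
    (405012480, 405143551, "CA"),
    (405143552, 405180415, "US"),
    (405180416, 405184511, "CA"),
    (418693120, 418709503, "CA"),
]


class RangeTable:
    """Sorted, non-overlapping IPv4 ranges that report misses as None."""

    def __init__(self, ranges):
        self.ranges = sorted(ranges)
        self.starts = [start for start, _end, _cc in self.ranges]

    def lookup(self, ip):
        """Return the country code whose range contains ip, or None."""
        n = int(ipaddress.IPv4Address(ip))  # same value as ip2long(ip)
        # Index of the last range that starts at or below n
        i = bisect.bisect_right(self.starts, n) - 1
        if i < 0:
            return None  # n is below the first range
        _start, end, country = self.ranges[i]
        # n may sit in a gap between this range's end and the next start
        return country if n <= end else None


if __name__ == "__main__":
    table = RangeTable(RANGES)
    print(table.lookup("24.36.0.0"))    # CA: start of the first range
    print(table.lookup("24.24.24.24"))  # None: below every range
    print(table.lookup("24.100.0.0"))   # None: falls in a gap
```

A similar check could possibly be carried into redis by storing range starts as scores, fetching the nearest lower entry with `ZREVRANGEBYSCORE countries <ip> -inf LIMIT 0 1`, and comparing the address against a range end stored in the member; that layout is an assumption, not what the scripts above do.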
