Listing of web crawlers that do not support compression

If you are the author of one of these spiders, please add support for content compression when you crawl the web. It saves bandwidth both on your crawling system and on the servers that you crawl.

Adding compression support can be very simple -- if your spider is coded in Perl using LWP::UserAgent, then the addition of a single line of code will enable compression support.

$ua->default_header('Accept-Encoding' => 'gzip');
Then make sure that you always use 'decoded_content' (rather than 'content') when reading the body from the response object.

For other languages, all you need to do is add the header

Accept-Encoding: gzip
to each HTTP request that you send, and then be prepared to handle a 'Content-Encoding: gzip' header in the response by decompressing the body before you use it.
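As an illustration for one other language, here is a minimal sketch in Python using only the standard library. Note that urllib does not decompress responses on its own, so the body has to be gunzipped explicitly. The function names build_request, decode_body and fetch are hypothetical; the sketch assumes the crawler only ever advertises gzip, so any other Content-Encoding is treated as an error.

```python
import gzip
import urllib.request
from typing import Optional


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that advertises gzip support to the server."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(content_encoding: Optional[str], body: bytes) -> bytes:
    """Undo the Content-Encoding the server applied (gzip only in this sketch)."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        # The server compressed the body because we asked for gzip.
        return gzip.decompress(body)
    if encoding == "identity":
        # The server chose not to compress; the body is already plain.
        return body
    # We only advertised gzip, so any other encoding is unexpected.
    raise ValueError(f"unsupported Content-Encoding: {content_encoding!r}")


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Fetch a URL with gzip enabled and return the decoded body."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return decode_body(resp.headers.get("Content-Encoding"), resp.read())
```

A server is free to ignore the header and reply uncompressed, which is why decode_body accepts a missing Content-Encoding as well as gzip.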

Happily, some of the large spiders do support compression -- Googlebot and Yahoo Slurp, to name but two. Since I started prodding crawler implementors, a couple have added compression support (one within hours), and another reported that the lack of it was a bug that would be fixed shortly.

Crawlers that account for more than 5% of the total (uncompressed) crawling activity are marked in bold below.
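The 5% rule is a simple share-of-total calculation. A minimal sketch, assuming that "activity" is measured as uncompressed bytes served per crawler (the page does not say whether bytes or requests are counted) and using a hypothetical function name heavy_crawlers:

```python
from typing import Dict, List


def heavy_crawlers(uncompressed_bytes: Dict[str, int], threshold: float = 0.05) -> List[str]:
    """Return crawlers whose share of the total uncompressed bytes exceeds threshold.

    uncompressed_bytes maps a crawler name (e.g. its User-Agent) to the number
    of bytes served to it without compression; this input format is an assumption.
    """
    total = sum(uncompressed_bytes.values())
    if total == 0:
        # No traffic at all, so no crawler can exceed the threshold.
        return []
    # Strictly greater than the threshold, matching "more than 5%".
    return sorted(
        name for name, nbytes in uncompressed_bytes.items()
        if nbytes / total > threshold
    )
```

With toy inputs {"A": 90, "B": 6, "C": 4}, crawlers A (90%) and B (6%) would be flagged, while C (4%) would not.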

Crawler (User-Agent) | Site crawled | Last IP used
Aboundex/0.3 (http://www.aboundex.com/crawler/) | www.gladstonefamily.net | 173.192.34.95
DomainStatsBot/1.0 (http://domainstats.io/our-bot) | www.gladstonefamily.net | 136.243.59.237
ia_archiver | pond.gladstonefamily.net | 54.173.35.129
ia_archiver | pond1.gladstonefamily.net | 54.173.35.129
masscan/1.0 (https://github.com/robertdavidgraham/masscan) | - | 128.204.198.119
Mozilla/5.0 (compatible; Dataprovider.com;) | www.gladstonefamily.net | 167.114.65.240
Mozilla/5.0 (compatible; DeuSu/5.0.2; +https://deusu.de/robot.html) | gladstonefamily.net | 85.93.91.84
Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com) | blog.gladstonefamily.net | 216.244.66.202
Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com) | gladstone.name | 216.244.66.241
Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com) | gladstonefamily.net | 216.244.66.249
Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com) | pond.gladstonefamily.net | 216.244.66.242
Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com) | pond1.gladstonefamily.net | 216.244.66.249
Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/) | blog1.gladstonefamily.net | 108.59.8.70
Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/) | charon.gladstonefamily.net | 108.59.8.70
Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/) | gladstonefamily.net | 162.210.196.130
Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/) | www.gladstonefamily.net | 162.210.196.130
Mozilla/5.0 (compatible; SEOkicks-Robot; +http://www.seokicks.de/robot.html) | gladstonefamily.net | 138.201.59.34
Mozilla/5.0 (compatible; SEOkicks-Robot; +http://www.seokicks.de/robot.html) | pond1.gladstonefamily.net | 138.201.59.34
Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36 OPR/36.0.2130.32 | gladstonefamily.net | 50.21.179.20
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6 - James BOT - WebCrawler http://cognitiveseo.com/bot.html | pond1.gladstonefamily.net | 144.76.100.237
Python-urllib/2.7 | pond1.gladstonefamily.net | 52.90.72.154

Comments, problems, etc. to
Philip Gladstone

Last modified Sunday, 19 November 2006