Abusing the internet with popular search engine technologies

by c0ntex | c0ntexb[at]gmail.com


When you think of a search engine, you probably conjure up an image of the handy, html based site that can

help you find those RFC's, wiring diagrams, soft-warez, serial keys or questionable images, depending on your

taste. You've probably used generic search engines like Google, Yahoo, MSN, Lycos & Hotbot a trillion times

without considering even the possibility that it could be used to support an attack attempt.

Due to the sheer volume of HTTP based attacks, one has to consider than anything utilising HTTP is a serious

theoretical attack vector. Too many people feel happy in the knowledge that only HTTP holes are punched in

the firewall, if the server daemon is patched and user input locations are verified as safe then there is no

need to worry. This is a serious misconception that one should avoid.

HTTP fingerprinting, SQL Injection, malformed header injections, XXS, cookie fun, every conceivable form of

web based application or service attack can be managed by your security infrastructure. Yet what is the use

in having any of this preventative security to protect your data from prying eyes, when you have allowed

search bots to come in happily and crawl all web servers.

Taken from the Berkely teaching guide, stating that Spiders are:

"Computer robot programs, referred to sometimes as "crawlers" or"knowledge-bots" or "knowbots" that are used

by search engines to roam the World Wide Web via the Internet, visit sites and databases, and keep the search

engine database of web pages up to date. They obtain new pages, update known pages, and delete obsolete ones.

Their findings are then integrated into the "home" database."

When any powerful technology is utilised by a curious mind, possessing advanced knowledge of the internal

workings, it becomes easy to transform the technology into a potentially malicious weapon. In this case, the

engine soon becomes an advanced data-gathering tool for attackers. Everyone knows about RFP's web mining tool

called Whisker, yet not everyone is aware that a search engine can do the same...and more.

Google is the most powerful search engine online and it probably has been for the last four years or more,

crawling literally millions of websites and public archives daily. The amount of data that it contains would

crush any reasonable human mind due to the sheer scale. When data is being stored on such a vast scale and

the manner in which it's gathered by engines, it is of no wonder that sometimes-sensitive information will be


The spiders that Google use to harvest all these sites have no mind, they are merely an axon connection to a

central soma, acting like a neural network. The data is filtered down the axon if it passes a simple logic

test, the soma then sends it into the central brain. The only energy required to sustain crawling is a diet

similar to that of the pecking pigeons which perform the ranking inside the database clusters.


When testing a company site or domain, it is always useful to use Google as it's crawling software is superb

and will find some pretty interesting stuff you might otherwise overlook. It also has a very nice cache

function that will allow you to see "old" pages that were once available from that site. Attackers can use

this feature to compare pages, documents, structures and the likes.

Some fictional public BBS sits in your DMZ, using flat text and CGI. The BBS contains a plain text or md5 encrypted

password file that sits in the web directory and Mr Security forgets about it. One week later, a search engine

spider creeps along, crawls your site and just happens to find the password text. 1 month later we come along and

search your domain for password.txt and guess what pops up.

Interestingly, it seems a user from the internal network has subscribed to the BBS, using the same password as

his domain password. Now things become more interesting, especially when you find out after a very authentic

sounding phone call to his office receptionist, that the user is a member of IT staff.

IT: domains, routers, switches, servers, databases.

Human: laziness, imperfection, same password everywhere, domain admin account?.

It's possible to block spiders by using the robots file, providing the spider can read. An example robots.txt

file looks like:

User-agent: * # This shoo's all spiders off with a big

Disallow: / # stick, no flies around this server:

User-agent: googlebot # Fend off googlebot:

Disallow: /cgi-bin/

Disallow: /images/

# Tease vulnerable spiders:


Disallow: /GoogleBot PAYLOAD :-)/

If you really suffer from arachnophobia, you could block the crawler bot parasite at your router / firewall.

NOTE: This text is not trying to encourage the use of Google as an attack tool, acting on information found could be

illegal. Yet it is worth considering the ramifications of what data you allow your users to place in their private web


Example usage of the Google advanced search features:

allinurl: # Find all strings in the url

allintitle: # Find all strings in the title

filetype: # Find only specific filetype

intitle: # Find any strings in the title

inurl: # Find any strings in the url

link: # Find strings in link

site: # Find strings on this site


allinurl:session_login.cgi # Nagios/NetSaint/Nagmin

allinurl:/cgi-bin/status.cgi? # Network monioring

allinurl:/cgi-bin/extinfo.cgi # Nagios/NetSaint

inurl:cpqlogin.htm # Compaq Insight Agent

allinurl:"Index of /WEBAGENT/" # Compaq Insight Agent

allinurl:/proxy/ssllogin # Compaq Insight Agent

allintitle:WBEM Login # Compaq Insight Agent

WBEM site:domain.com # Compaq Insight Agent at domain.com

inurl:vslogin OR vsloginpage # Visitor System or Vitalnet

allintitle:/exchange/root.asp # Exchange Webmail

"Index of /exchange/" # Exchange Webmail Directory

netopia intitle:192.168 # Netopia Router Config


site:blah.com filename:cgi # Check cgi source code

allinurl:.gov passwd # A blackhat favorite

allinurl:.mil passwd # Another blackhat favorite

"Index of /admin/"

"Index of /cgi-bin/"

"Index of /mail/"

"Index of /passwd/"

"Index of /private/"

"Index of /proxy/"

"Index of /pls/"

"Index of /scripts/"

"Index of /tsc/"

"Index of /www/"

"Index of" config.php OR config.cgi

intitle:"index.of /ftp/etc" passwd OR pass -freebsd -netbsd -openbsd

intitle:"index.of" passwd -freebsd -netbsd -openbsd

inurl:"Index of /backup"

inurl:"Index of /tmp"

inurl:auth filetype:mdb # MDB files

inurl:users filetype:mdb

inurl:config filetype:mdb

inurl:clients filetype:xls # General spreadsheets

inurl:network filetype:vsd # Network diagrams


Ведете ли вы блог?


Результаты опроса

Новостной блок