Google – the word is synonymous with simplicity in the world of internet search engines. It is quick, easy to use and searches the web based on simple logic that ordinary users understand. Everybody can use Google.
However, not everyone uses Google for the purposes for which it was intended. Recently, Google has also become one of the most effective and popular tools for discreetly obtaining targeting information in order to non-intrusively assess the security of web servers. The search engine has also become the tool of choice for effortlessly obtaining unsecured files, directories, documents and sensitive information available over the web with little or no risk. Hackers use Google.
Most users know Google by its popular one-line search page – but it also offers extended functionality for locating, and then searching through, specific domains, URLs, live and cached web pages, links, news groups, directories, images, files and documents accessible via the web.
For today's cracker, hacker, script kiddie, fraudster, identity thief or even political terrorist, becoming familiar and competent with the advanced search operators of the Google engine adds significant weaponry to the arsenal.
Google site operator searches can be expanded to search whole domains for specific words – the query "site:edu test answers" would return a list of every page in the .edu domain containing the words "test" and "answers," for example. The same type of query, such as "site:gov nuclear" or "site:gov defense," would reveal all sites that share sensitive key words, allowing a hacker to identify specific government, non-profit, country, industry, political group, or subject target websites.
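A site query of this kind is simply a string: the operator, a colon, its value, and then the free-text terms. The following is a minimal sketch of composing one and encoding it into a search URL. The helper names site_query and search_url are hypothetical, and the https://www.google.com/search?q= address is an assumption rather than something stated above.

```python
from urllib.parse import urlencode


def site_query(domain: str, *terms: str) -> str:
    """Build a query restricted to one domain or top-level domain."""
    return " ".join([f"site:{domain}", *terms])


def search_url(query: str) -> str:
    """URL-encode the query string (assumed search endpoint)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = site_query("edu", "test", "answers")
print(query)              # site:edu test answers
print(search_url(query))  # https://www.google.com/search?q=site%3Aedu+test+answers
```

The colon is percent-encoded and the spaces become plus signs, which is how the operator survives the trip from the search box to the URL.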
Invisible reconnaissance
The site operator can also be used to map the contents of a target web server. The query "site:www.scworld.com sc magazine" will return a list of every web page held by the www.scworld.com web server bearing the words sc magazine, which is, of course, all of them.
Since the engine searches not only the web pages, but URL and title pages from a Google server of cached data and images, a hacker can easily obtain a comprehensive website structure without even visiting the site. This query then gives them maximum target reconnaissance with virtually no exposure.
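To see why a list of result URLs amounts to a site map, consider this rough sketch, which folds URLs into a directory tree without contacting the site itself. The URLs on www.example.com are invented placeholders, and build_site_map and render are hypothetical helper names.

```python
from urllib.parse import urlsplit


def build_site_map(urls):
    """Fold a flat list of URLs into a nested dict keyed by path segment."""
    tree = {}
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        node = tree
        for segment in segments:
            node = node.setdefault(segment, {})
    return tree


def render(tree, indent=0):
    """Return the tree as indented lines, sorted by name at each level."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * indent + name)
        lines.extend(render(tree[name], indent + 1))
    return lines


# Hypothetical result URLs, standing in for a page of search results.
results = [
    "http://www.example.com/news/2004/a.html",
    "http://www.example.com/news/b.html",
    "http://www.example.com/about/",
]
print("\n".join(render(build_site_map(results))))
```

Every additional result page fills in more of the tree, which is the sense in which the structure can be recovered from search results alone.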
Locating sensitive directory listings
Most directory listings are posted intentionally for public use, but some are not, and possess sensitive information that is extremely valuable to the hacker trying to discern the content of web pages.
By using the operator "intitle: index.of parent directory," a bad guy can passively identify incorrectly configured or temporarily posted directories containing sensitive information, or other details useful in further attacks, such as web server application versions.
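The query works because automatically generated listings tend to share a recognizable shape: a title beginning "Index of" and a "Parent Directory" link. An administrator could test a page from their own server for that shape with something like the sketch below. The function name and the sample HTML are illustrative assumptions, not output from any particular server.

```python
import re

# Title of an auto-generated listing, e.g. "<title>Index of /backup</title>".
TITLE_RE = re.compile(r"<title>\s*index of\b", re.IGNORECASE)


def looks_like_directory_listing(html: str) -> bool:
    """True if the page has an 'Index of' title and a parent-directory link."""
    return bool(TITLE_RE.search(html)) and "parent directory" in html.lower()


# Hypothetical page content for illustration.
SAMPLE = """<html><head><title>Index of /backup</title></head>
<body><h1>Index of /backup</h1>
<a href="/">Parent Directory</a>
<a href="db.sql">db.sql</a>
</body></html>"""

print(looks_like_directory_listing(SAMPLE))
print(looks_like_directory_listing("<title>Welcome</title>"))
```

Requiring both markers keeps ordinary pages that merely mention an index from being flagged.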
Web server software and versions
Hackers can also quickly identify the web server software run at target websites by using the operator "intitle: server.at target site name.com." The results of this query would indicate the exact version of the web server software running on the target site (indicated on the last line).
While any hacker could determine the same thing by connecting directly to a targeted web server and reading the HTTP headers, the Google query returns this information from cached data. As a result, a hacker can remotely determine which attacks would work against the server without connecting to it.
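The last line referred to above is the server's signature footer, which on Apache-style listings reads along the lines of "Apache/1.3.27 Server at www.example.com Port 80". As a hedged illustration of how much that one line gives away, the sketch below pulls the product, version, host and port out of such a string. The sample text and the parse_signature name are assumptions made for the example.

```python
import re

# Matches "<product>/<version> ... Server at <host> Port <port>".
SIGNATURE_RE = re.compile(
    r"(?P<product>[A-Za-z][\w-]*)/(?P<version>\d[\w.]*)"
    r".*?\bServer at (?P<host>\S+) Port (?P<port>\d+)"
)


def parse_signature(text: str):
    """Return product, version, host and port from a signature line, or None."""
    match = SIGNATURE_RE.search(text)
    if match is None:
        return None
    info = match.groupdict()
    info["port"] = int(info["port"])
    return info


# Hypothetical footer text for illustration.
footer = "<address>Apache/1.3.27 Server at www.example.com Port 80</address>"
print(parse_signature(footer))
```

Administrators can run the same check against their own pages to see what a cached copy would reveal.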
Identifying web servers running vulnerable software can also be accomplished by running Google queries to identify servers with default pages. When a web server is first installed, it contains a set of default web pages with instructions for the site administrator. Administrators failing to remove these default pages will expose their sites (and, by extension, their networks) as poorly maintained – exactly what hackers are searching for.
Identifying all the servers on the web running default versions of Microsoft Internet Information Services (IIS) 5.0 is as simple as running the query "allintitle: Welcome to Windows 2000 Internet Services." The response page provides a hacker's shopping list of vulnerable sites.
Google will provide a list of all the sites running default software when you add the title of the software default page after the query: "allintitle:." For example, do you want a list of all the sites running a default version of Netscape Enterprise Server? If so, simply run the query "allintitle: Netscape Enterprise Server Home Page." The response page should keep you busy for some time with links to servers that bear default pages similar to the one pictured overleaf.
As if this were not enough, the Google engine will also locate poorly administered sites by specifically searching for URLs with default settings indicating manuals, help pages or sample programs for a piece of software. This can be done by running the query "inurl:" with the URL-indicated title page of the default page (software manual, help page or sample program) you are looking for. The response page will list all web servers currently running manuals, say, for the software you might be looking to exploit.
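The title-based searches above reduce to a lookup: a known default page title implies a known product. The sketch below shows that lookup from the defender's side, checking whether a page title from one's own server matches one of the two default titles named in this article. The table is deliberately limited to those two entries, and default_page_product is a hypothetical helper name.

```python
# Default page titles named in this article, mapped to the product they imply.
DEFAULT_TITLES = {
    "Welcome to Windows 2000 Internet Services": "Microsoft IIS 5.0",
    "Netscape Enterprise Server Home Page": "Netscape Enterprise Server",
}


def default_page_product(title: str):
    """Return the product a default page title gives away, or None."""
    normalized = " ".join(title.split()).casefold()
    for known_title, product in DEFAULT_TITLES.items():
        if known_title.casefold() in normalized:
            return product
    return None


print(default_page_product("Welcome to Windows 2000 Internet Services"))
print(default_page_product("SC Magazine"))
```

A match means the page would surface in the corresponding title query, so the default page is worth removing.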
Identifying CGI scanning directories
By far one of the scarier features of the Google engine is its ability to locate "CGI" or "web" scanning directories containing sample software code or vulnerable files. These scanning tools store vulnerabilities they have identified in commonly named files.
By using either the "index.of" or "inurl" operators, and then attaching the desired target directory or file name, a hacker can locate a veritable pirates' treasure of vulnerable targets.
For example, the Google query "allinurl:/cgi-bin/finger.cgi" would return pages of directories from cached Google servers holding files of the results of "finger" profiling commands. The queries can and do produce password files for the able Google hacker.
In fact, the Google web scanning queries have become automated, and are available for downloading from hacker chat and websites. It really doesn't get any easier than that.
Once again, because this target information is pulled from cached Google files instead of live web servers, the exposure to the potential hacker is nil. And if you are a bad guy – it doesn't get any better than that. It is like casing a house you are going to rob and not even having to drive through the neighborhood.
Finding sensitive data
Finally, Google search operators are excellent at finding sensitive information and files that just should not be on the web in the first place. Because Google can search for and through certain types of documents for specific words, it can be used to attempt to locate any permutation you can imagine. If you don't think so, just try running an operator query such as "inurl:backup" and see how many people are storing the backups of their web servers or personal computers online.
A query of "inurl:admin" can find admin directories, and subsequent variations and sub-queries will produce truly scary results. My personal favorite is "inurl:admin filetype:xls password," which produces exactly what you think it does.
Examples of this type of low-hanging fruit are found posted on dozens of sites that preach and practise Google hacking. In fact, if you look for it, the web is filled with sites posting "Googleturds" found by "Googledorks" using "GooScans" and "GooPots."
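Queries such as "inurl:backup" and "inurl:admin" match on nothing more than words in a path, so the same test can be run locally before a search engine runs it for someone else. The sketch below flags paths from a site's own file list that contain the words or the spreadsheet extension mentioned above. The word list, the sample paths and the flag_exposed_paths name are illustrative assumptions.

```python
from pathlib import PurePosixPath

# Words and extension taken from the example queries in this article.
RISKY_WORDS = ("backup", "admin", "password")
RISKY_SUFFIXES = (".xls",)


def flag_exposed_paths(paths):
    """Return the paths that a word-in-URL or file-type query would match."""
    flagged = []
    for path in paths:
        lowered = path.lower()
        suffix = PurePosixPath(lowered).suffix
        if any(word in lowered for word in RISKY_WORDS) or suffix in RISKY_SUFFIXES:
            flagged.append(path)
    return flagged


# Hypothetical file list for illustration.
site_paths = [
    "/index.html",
    "/admin/users.xls",
    "/old/site-backup.zip",
    "/img/logo.png",
]
print(flag_exposed_paths(site_paths))
```

Anything the function returns is a candidate for removal or access control before it is indexed.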
Google hacking is popular for three reasons – it's simple, it works and it is very discreet (a strong attraction for the serious hacker).
Like ninjas, real hackers like to get in and get out of a target without leaving so much as a fingerprint behind. Google hacking allows them to assess targets and design an effective attack from a safe distance before penetrating them.
A skilled Google hacker is limited only by his imagination in using the engine.