Monday, November 26, 2012

How to exploit robots.txt?

What is robots.txt?

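Robots.txt is a file that lists paths that bots should not crawl, most of the time search-engine bots such as Googlebot. It tells search engines that a directory is private and should not be crawled by them. Note that it is only a request: well-behaved bots honor it, but nothing stops anyone else from visiting those paths.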

If you are a site owner and want to make a robots.txt file, go to the following link; it will create a robots.txt file for you.

So, for now, robots.txt is pretty much what websites use to keep certain pages out of search engines.

Here is a sample:
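As a minimal sketch, the hypothetical robots.txt below can be read with Python's standard urllib.robotparser module. The paths /admin/ and /private/ and the example.com host are made up for illustration, and disallowed_paths is a helper written for this sketch, not part of any library.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are made up for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
"""


def disallowed_paths(robots_txt):
    """Return every non-empty Disallow path found in the robots.txt text."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:
                paths.append(path)
    return paths


# The standard-library parser answers "may this bot fetch this URL?"
parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())

if __name__ == "__main__":
    print(disallowed_paths(SAMPLE_ROBOTS_TXT))
    print(parser.can_fetch("*", "http://example.com/admin/page.html"))
```

The Disallow lines are exactly the directories the rest of this post looks at.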

First method

Now, this method rarely works, and the webmaster would have to be stupid to allow it, but you'd be surprised how many stupid people there are in the world.

This one is simple: go to one of the disallowed directories and look at the page source. Sometimes webmasters leave comments there that give hints, such as passwords or user names.

You never know, you might find something juicy. :]

For example, if a comment hints that the password is a football team, you could possibly guess it by entering some of the most famous football teams.

You can also check whether a disallowed directory is actually accessible or has weak permissions. Click here for a Python script to audit a robots.txt file automatically.
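The linked script is not reproduced here. As a rough sketch of what such an audit might look like, the function below requests each Disallow path and records the HTTP status code. The names http_status and audit_disallowed and the example.com host are assumptions made for this sketch, and wildcard patterns are simply skipped.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


def http_status(url, timeout=10):
    """Return the HTTP status code of a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx answers arrive as exceptions; the code is still useful.
        return err.code


def audit_disallowed(base_url, robots_txt, get_status=http_status):
    """Map each Disallow path in robots_txt to the status code it returns."""
    results = {}
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line.lower().startswith("disallow:"):
            continue
        path = line.split(":", 1)[1].strip()
        # Skip empty rules and wildcard patterns, which are not plain paths.
        if path and "*" not in path:
            results[path] = get_status(urljoin(base_url, path))
    return results
```

A 200 on a disallowed path means the directory is open to anyone who asks for it, while a 403 means the server is at least refusing the request. Passing get_status as a parameter keeps the parsing logic testable without touching the network.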


Second method

Directory Traversal

OK, you use directory traversal when you get denied from a web page. Say you go to a disallowed directory and get an error page [403 or 404].

You can sometimes bypass that with directory traversal if the site is insecure. Also, getting denied from a page suggests that there may be some sexy info inside of it. :]

So let's get started.

1. Go to the directory you got denied from. I will be using an example.

2. Once you get denied, append a directory name that does not exist to the URL.

3. Now for the directory traversal part, add /../ after it.

This steps back up one directory, which can sometimes get you access to the disallowed directory.

Keep in mind that you can also use the first method if you get access to the directory.
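To see why the three steps above land back on the original directory, here is a small sketch using Python's posixpath module. The directory /private/ and the name notfound are hypothetical, and both helper functions exist only for this illustration. Whether a real server resolves the path before or after its access check depends on that server.

```python
import posixpath


def build_traversal_path(denied_dir, missing_name="notfound"):
    """Steps 2 and 3: append a nonexistent directory, then step back with /../"""
    return posixpath.join(denied_dir, missing_name, "..") + "/"


def resolve(path):
    """Collapse '..' segments the way a path normalizer would."""
    return posixpath.normpath(path)


if __name__ == "__main__":
    crafted = build_traversal_path("/private/")
    print(crafted)           # /private/notfound/../
    print(resolve(crafted))  # /private
```

The crafted path looks different from the denied one as a string, yet it normalizes to the same directory, which is the whole trick.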

Click here for more detailed path traversal tutorials.

Third method

CGI-BIN exploits

Alright, there are a lot of public exploits for scripts under /cgi-bin/ out right now. So this method only applies if the site has /cgi-bin/.

Anyway, I don't want my tutorial to be too big, so here is a list of CGI-BIN exploits.



Unknown said...

Use this: robots.txt is a simple text file placed in the site's root directory. Its purpose is to help search engines index your web pages properly. This tool lets the search engine navigate every page of your site.
Default robots.txt

Unknown said...

How does this work when you are using a site archived from the "Internet Archive" website?
