
Monday, November 26, 2012

How to exploit robots.txt?

Posted by nirav desai at 7:57 PM Labels: website hack

What is robots.txt?

Robots.txt is a file that lists paths which bots, most of the time search-engine bots like Googlebot, are asked not to crawl. It tells search engines that a directory is private and should not be crawled by them. It is only a request, though: it does not stop anyone from opening those paths directly.

If you are a site owner and want to make a robots.txt file, go to the following link; it will create a robots.txt file for you.

http://www.mcanerin.com/EN/search-engine/robots-txt.asp

So, for now, robots.txt is pretty much what websites use to keep certain pages out of search engines.

Here is a sample: http://www.whitehouse.gov/robots.txt
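To see how a well-behaved bot reads such a file, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt content and the example.com URLs are made up for illustration; they are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# parse() takes the file as a list of lines, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler asks before fetching each URL.
print(parser.can_fetch("*", "https://example.com/private/notes.html"))
print(parser.can_fetch("*", "https://example.com/index.html"))
```

The first check comes back False and the second True. Nothing here enforces anything: the bot simply chooses to obey the file.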

First method

Now, this method rarely works, because the webmaster would have to be careless to allow it, but you'd be surprised how many careless people there are in the world.

This one is simple: go to one of the disallowed directories and look at the page source. Sometimes webmasters leave comments there that give hints, such as passwords or usernames.

You never know, you might find something juicy. :]

With a hint like that, you could possibly guess the password, for example by trying some of the most famous football teams.

You can also check whether any disallowed directory is actually accessible or has weak permissions. Click here for a Python script to audit a robots.txt file automatically.
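As a rough idea of the first step of such an audit, the sketch below pulls the Disallow paths out of a robots.txt body so a site owner can review each one on their own site. It is not the script linked above. The function name disallowed_paths and the sample paths are assumptions made up for this example, and it works on text you paste in, so it makes no network requests.

```python
def disallowed_paths(robots_text):
    """Return the unique Disallow paths from a robots.txt body, in order."""
    paths = []
    for raw_line in robots_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        value = value.strip()
        # An empty Disallow value means "nothing is disallowed", so skip it.
        if field.strip().lower() == "disallow" and value and value not in paths:
            paths.append(value)
    return paths


# Hypothetical file content, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /admin/   # hypothetical path
Disallow: /backup/
Disallow:
Allow: /public/
"""

for path in disallowed_paths(SAMPLE):
    print(path)
```

Each path it prints is one to open on your own site and check that it really asks for a login or returns an error.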

Second method

Directory Traversal

OK, you use directory traversal when you are denied access to a web page, for example when you go to a disallowed directory and get an error page (usually 403 Forbidden, although some sites show a 404 page instead).

You can sometimes bypass that with directory traversal if the server is insecure. Also, getting denied from a page suggests that there may be some juicy info inside it. :]

So let's get started.

1. Go to the directory you got denied from. I will be using an example.

www.slave.com/users/

2. Once you get denied, add a directory name that does not exist.

www.slave.com/users/randomwords&numbers

3. Now, for the directory traversal part, add a /../ to the end.

This goes back up one directory, which can get you access to the disallowed directory if the server is misconfigured.

www.slave.com/users/randomwords&numbers/../
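To see what the /../ in step 3 does to the path, here is a small Python sketch that resolves the dot segments the way a normalising client or server would. The helper name resolved_path is made up for this example, and the URL is the example one from the steps above.

```python
import posixpath
from urllib.parse import urlsplit


def resolved_path(url):
    """Return the URL's path with . and .. segments collapsed."""
    path = urlsplit(url).path
    normalized = posixpath.normpath(path)
    # normpath drops a trailing slash, so put it back for directory URLs.
    if path.endswith("/") and normalized != "/":
        normalized += "/"
    return normalized


print(resolved_path("http://www.slave.com/users/randomwords&numbers/../"))
print(resolved_path("http://www.slave.com/users/"))
```

Both lines print /users/. Many browsers and servers collapse the path like this before handling the request, so the trick only has a chance on a server that checks permissions before it normalises the path.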

Keep in mind that you can also use the first method if you get access to the directory.

Click here for more detailed path traversal tutorials.

Third method

CGI-BIN exploits

Alright, the /cgi-bin/ directory has a lot of public exploits out right now, so this method only applies if the site has /cgi-bin/.

Anyway, I don't want my tutorial to be too big, so here is a list of CGI-BIN exploits.

https://www.hellboundhackers.org/articles/7-complete-set-of-cgi-bin-exploits-and-what-they-do.html


2 comments:

Unknown said...: Use this Robots.txt is a simple text file that is placed in the root directory. The purpose of this text is to help the search engine properly index your webpage. This tool would allow the search engine to navigate every page of your site.
Default robots.txt; November 23, 2015 at 4:46 PM
Unknown said...: How does this work when you are using a site archived from the "Internet Archive" website?; April 6, 2017 at 11:57 PM
