Usage Of 'Allow' In Robots.txt

Recently I saw a site's robots.txt as follows:

User-agent: *
Allow: /login
Allow: /register

I could find only Allow entries and no Disallow entries.

As I understand it, robots.txt is essentially a blacklist: its Disallow entries list the pages that must not be crawled. So Allow is only useful for re-opening part of a site that is otherwise blocked with Disallow, similar to this:

Allow: /crawlthis
Disallow: /

But, that robots.txt has no Disallow entries. So, does this robots.txt let Google crawl all the pages? Or, does it allow only the specified pages tagged with Allow?

Answer

You are right that this robots.txt file allows Google to crawl all the pages on the website. Because it contains no Disallow entries, nothing is blocked, and the Allow lines have no effect. A thorough guide can be found here: http://www.robotstxt.org/robotstxt.html
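To check this for yourself, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The helper name `is_allowed` and the extra paths `/about` and `/` are made up for illustration. Note that this parser is not Googlebot, so treat the result as an approximation of how a typical crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question: only Allow entries, no Disallow.
ROBOTS_TXT = """\
User-agent: *
Allow: /login
Allow: /register
"""

def is_allowed(robots_txt, path, agent="Googlebot"):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# /about and / are hypothetical pages that no rule mentions.
for path in ("/login", "/register", "/about", "/"):
    print(path, is_allowed(ROBOTS_TXT, path))  # every path prints True
```

Since no rule disallows anything, pages that are not listed fall through to the default, which is "allowed".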

If you want Googlebot to be allowed to crawl only the specified pages, then the correct format would be:

User-agent: *
Disallow: /
Allow: /login
Allow: /register

(I would normally disallow those specific pages though as they don't provide much value to searchers.)

It's important to note that the Allow directive is only honored by some robots (including Googlebot).
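Robots can also differ in how they resolve a conflict between Allow and Disallow. As a rough illustration, the sketch below feeds a whitelist-style file to Python's standard-library `urllib.robotparser`, which applies the first rule that matches, in file order. Google documents a different rule (the most specific, i.e. longest, matching path wins), so for Googlebot the order of the lines should not matter. The helper name `can_crawl` and the path `/about` are assumptions for the example.

```python
from urllib.robotparser import RobotFileParser

def can_crawl(robots_txt, path, agent="Googlebot"):
    """Return True if `agent` may fetch `path` according to Python's parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# Allow lines placed before the catch-all Disallow.
ALLOW_FIRST = """\
User-agent: *
Allow: /login
Allow: /register
Disallow: /
"""

# Same rules with the catch-all Disallow placed first.
DISALLOW_FIRST = """\
User-agent: *
Disallow: /
Allow: /login
Allow: /register
"""

print(can_crawl(ALLOW_FIRST, "/login"))     # True
print(can_crawl(ALLOW_FIRST, "/about"))     # False
print(can_crawl(DISALLOW_FIRST, "/login"))  # False: "Disallow: /" matches first
```

If you need the whitelist to work with first-match parsers as well, putting the Allow lines before `Disallow: /` is the safer ordering.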

source: stackoverflow.com