Deny access to sitemap.xml but allow robots (e.g. Google)
Is there a way to allow only search engine robots such as Google or Yahoo to access my sitemap, which is located at http://www.mywebsite.com/sitemap.xml? Is it possible to block direct access by users and allow access only to robots?
Well, basically no, but you could do something with the User-Agent string and deny access based on it (assuming Apache):
    <Location /sitemap.xml>
        # Set GoAway=1 when the User-Agent matches GodBot
        SetEnvIf User-Agent GodBot GoAway=1
        Order allow,deny
        Allow from all
        # Deny every request where GoAway is NOT set,
        # i.e. only matching bots get through
        Deny from env=!GoAway
    </Location>
But as the documentation where I found that syntax warns:
> Access control by User-Agent is an unreliable technique, since the User-Agent header can be set to anything at all, at the whim of the end user.
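If you go this route anyway, note that Order/Allow/Deny were removed in Apache 2.4 in favour of Require. Here is a minimal sketch of the same idea in 2.4 syntax; the crawler names in the regex are illustrative examples, not a complete list:

    <Location /sitemap.xml>
        # Flag requests whose User-Agent contains a known crawler name
        # (case-insensitive match via mod_setenvif)
        SetEnvIfNoCase User-Agent "(Googlebot|bingbot|Slurp)" allow_bot
        # Allow only flagged requests (env provider from mod_authz_core);
        # everyone else gets 403
        Require env allow_bot
    </Location>

The same caveat applies: anyone can send a crawler User-Agent with curl, so this deters casual visitors but not anyone determined to read the sitemap.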