What is robots.txt?
Robots.txt is a text file webmasters create to instruct web robots (typically search engine robots) how to crawl pages on their website.
The robots.txt file gives directives to search engine bots on which pages should be crawled and which ones shouldn't. For example, you probably don't want your admin panel crawled and indexed in Google, since a publicly indexed login page gives attackers an easy starting point.
How do I set up robots.txt?
You can either use an online robots.txt generator or write the file yourself. Here are a few examples of robots.txt in action:
We would start by uploading a robots.txt file (the name must be lowercase) to our root directory, so that it appears at: yourdomain.com/robots.txt
How to block all web crawlers from all content
User-agent: *
Disallow: /
Using this syntax in our robots.txt file would tell all web crawlers not to crawl any pages on the site. You almost certainly don't want to use this option on a live site.
How to allow all web crawlers access to your content
User-agent: *
Disallow:
Using this syntax tells web crawlers to crawl all pages on yourdomain.com, including the homepage.
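As a quick sanity check, Python's standard-library urllib.robotparser can show how a crawler that honors robots.txt interprets the two rule sets above. This is a sketch: yourdomain.com is the placeholder domain from the examples, and "AnyBot" is a made-up user-agent name.

```python
from urllib import robotparser

# The two rule sets from the examples above.
block_all = "User-agent: *\nDisallow: /"   # disallow everything
allow_all = "User-agent: *\nDisallow:"     # empty Disallow = allow everything

rp = robotparser.RobotFileParser()
rp.parse(block_all.splitlines())
print(rp.can_fetch("AnyBot", "https://yourdomain.com/page.html"))  # False

rp = robotparser.RobotFileParser()
rp.parse(allow_all.splitlines())
print(rp.can_fetch("AnyBot", "https://yourdomain.com/page.html"))  # True
```

Note that a fresh RobotFileParser is created for each rule set, because parse() accumulates entries rather than replacing them.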
Blocking a specific web crawler from a specific folder
User-agent: Googlebot
Disallow: /example-subfolder/
This syntax tells only Google’s crawler (user-agent name Googlebot) not to crawl any pages whose URL path begins with yourdomain.com/example-subfolder/. Disallow rules are prefix matches, so everything inside that folder is covered.
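The per-agent behavior can be checked the same way with urllib.robotparser: the rule applies only to Googlebot, and only to paths under the disallowed folder (yourdomain.com and the page names are placeholders).

```python
from urllib import robotparser

rules = "User-agent: Googlebot\nDisallow: /example-subfolder/"
rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Googlebot is blocked inside the folder, but not elsewhere on the site.
print(rp.can_fetch("Googlebot", "https://yourdomain.com/example-subfolder/page.html"))  # False
print(rp.can_fetch("Googlebot", "https://yourdomain.com/other-page.html"))              # True
# A crawler with no matching User-agent group is unrestricted.
print(rp.can_fetch("Bingbot", "https://yourdomain.com/example-subfolder/page.html"))    # True
```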
Blocking a specific web crawler from a specific web page
User-agent: Bingbot
Disallow: /example-subfolder/blocked-page.html
This syntax tells only Bing’s crawler (user-agent name Bingbot) to avoid crawling the specific page at yourdomain.com/example-subfolder/blocked-page.html.
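Again as a sketch with urllib.robotparser and the article's placeholder names: blocking a single page leaves the rest of the folder crawlable, because the Disallow prefix only matches that one path.

```python
from urllib import robotparser

rules = "User-agent: Bingbot\nDisallow: /example-subfolder/blocked-page.html"
rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Only the named page is blocked for Bingbot.
print(rp.can_fetch("Bingbot", "https://yourdomain.com/example-subfolder/blocked-page.html"))  # False
# Other pages in the same folder remain crawlable.
print(rp.can_fetch("Bingbot", "https://yourdomain.com/example-subfolder/other-page.html"))    # True
```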