What is robots.txt?
Robots.txt is a text file webmasters create to instruct web robots (typically search engine bots) how to crawl pages on their websites.
The robots.txt file gives directives to search engine crawlers on which pages or directories they may request and which they shouldn't (e.g. you may not want your admin panel or dashboard crawled). Note that robots.txt controls crawling, not indexing: a disallowed page can still show up in search results if other sites link to it, and the file itself is publicly readable, so don't rely on it to hide sensitive URLs.
How do I set up robots.txt?
You can either use an online robots.txt generator or write your own using the patterns below.
Here are a few examples of robots.txt in action:
Start by uploading the robots.txt file to your site's root web directory, so that crawlers can find it at yourdomain.com/robots.txt.
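If you want to confirm the file is live where crawlers expect it, a quick check from Python's standard library works (a minimal sketch; yourdomain.com is a placeholder for your own domain):

from urllib.request import urlopen

# Fetch the file from the standard location at the site root.
# "yourdomain.com" is a placeholder; substitute your own domain.
with urlopen("https://www.yourdomain.com/robots.txt") as response:
    print(response.status)                  # expect 200 if the file is in place
    print(response.read().decode("utf-8"))  # the exact rules crawlers will see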
How to block all web crawlers from all content
User-agent: *
Disallow: /
Using this syntax in your robots.txt file tells all web crawlers not to crawl any pages on the site. You probably don't want to use this option.
How to allow all web crawlers access to your content
User-agent: *
Disallow:
Using this syntax tells bots that they may crawl every page on your domain, including the homepage (an empty Disallow blocks nothing).
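The two universal rules above differ by a single character, so they are easy to mix up. As a sanity check, Python's standard-library robots.txt parser can show how each rule is interpreted (a minimal sketch; the domain is a placeholder):

from urllib.robotparser import RobotFileParser

def allowed(rules, agent, url):
    # Parse a robots.txt body from a string and ask whether the
    # given user agent may fetch the given URL.
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)

block_all = "User-agent: *\nDisallow: /"
allow_all = "User-agent: *\nDisallow:"

print(allowed(block_all, "AnyBot", "https://yourdomain.com/page.html"))  # False
print(allowed(allow_all, "AnyBot", "https://yourdomain.com/page.html"))  # True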
Blocking a specific web crawler from a specific folder
User-agent: Googlebot
Disallow: /example-subfolder/
This syntax tells only Google's crawler (user-agent name Googlebot) not to crawl any URL whose path begins with /example-subfolder/, e.g. yourdomain.com/example-subfolder/page.html. Other crawlers are unaffected.
Blocking a specific web crawler from a specific web page
User-agent: Bingbot
Disallow: /example-subfolder/blocked-page.html
This syntax tells only Bing's crawler (user-agent name Bingbot) to avoid crawling the specific page at yourdomain.com/example-subfolder/blocked-page.html.
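To see how these per-crawler rules play out together, here is a short sketch using the same standard-library parser against both example entries (the domain and paths are the placeholders from the examples above):

from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Disallow: /example-subfolder/

User-agent: Bingbot
Disallow: /example-subfolder/blocked-page.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot is barred from the entire subfolder...
print(parser.can_fetch("Googlebot", "https://yourdomain.com/example-subfolder/page.html"))        # False
# ...Bingbot only from the one blocked page...
print(parser.can_fetch("Bingbot", "https://yourdomain.com/example-subfolder/blocked-page.html"))  # False
print(parser.can_fetch("Bingbot", "https://yourdomain.com/example-subfolder/other-page.html"))    # True
# ...and a crawler with no matching entry is unrestricted.
print(parser.can_fetch("SomeOtherBot", "https://yourdomain.com/example-subfolder/page.html"))     # True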