Robots.txt for SEO

Robots.txt and its impact on SEO and search.

To understand the impact it has on SEO and search, let's begin with the basics:


What is robots.txt and why is it used?

A robots.txt file tells search engine bots/agents which URLs on your site they may crawl, and in doing so manages crawler traffic to your site.

Why is robots.txt important for SEO?

It is one of the basic SEO tools for telling search bots what to crawl on your site, how, and when. It helps webmasters control which sections of the site bots can access, and it is a clean way to deny access to spam bots. It also reduces duplicate-content issues, especially on large sites where many dynamic URLs are generated on the fly and end up serving the same content.
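For example, a rule like the one below (the parameter name is just a hypothetical illustration) keeps crawlers out of parameterized URLs that would otherwise be crawled as duplicates of the same page:

User-agent: *

Disallow: /*?sessionid=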

Definitely a helpful tool for SEO.

Now let's look at some examples of the rules that go into a robots.txt file. A specific syntax applies when creating the file.

1. If you want to allow all crawlers, the file should contain the following:

User-agent: *

Disallow:

2. The opposite of the above, disallowing all robots:

User-agent: *

Disallow: /

3. If you want to block specific folders:

User-agent: *

Disallow: /junk/

Disallow: /wp-admin/

4. You can disallow media or specific files that you don't want bots to access. Depending on the kind of files, the wildcard (*) and the dollar sign ($) are used, as in the example below:

User-agent: *

Disallow: /*.xls$

5. If you notice a specific bot/agent in your traffic report that is spamming your site, you will likely want to disallow it with the following rule:

User-agent: Actual name of the spam bot

Disallow: /

6. Another element you can add to the file is your sitemap, since it makes it easier for the bots to find it:

User-agent: *

Sitemap: https://seo-bestpractices.com/sitemap
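Putting several of these rules together, a complete robots.txt file might look like the sketch below (the folder names and the sitemap URL are placeholders for illustration):

User-agent: *
Disallow: /junk/
Disallow: /wp-admin/
Disallow: /*.xls$

Sitemap: https://example.com/sitemap.xml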

As mentioned earlier, even if you disallow certain folders in your robots.txt, you may find that those URLs still get indexed, not because agents crawled them directly but because other sites link to them. If you want to keep a page out of search results completely, use noindex.

You will need to include the following meta tag in the head section of your page (for bots to see this tag, the page itself must not be blocked in robots.txt):
<meta name="robots" content="noindex">

You can even password-protect the page to keep search bots out entirely.

Below are examples of what we mean by agents; there are many more besides these:

• Google: Googlebot

• Bing: Bingbot

• Yahoo: Slurp

Now that you have your rules ready, what do you do with them?

1. The first step is to create a plain text file named robots.txt.

2. The second step is to add all the rules applicable to your site (from the examples discussed above).

3. The third step is to upload it to your site's root directory, meaning it should show up when you type your website address as below:

https://abc.com/robots.txt
(replace abc.com with your own domain)

4. The fourth step is to test that it is working properly with the robots.txt Tester in Google Search Console. You can also check it programmatically, as in the sketch below.
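Here is a minimal sketch using Python's built-in urllib.robotparser module (the domain and paths are placeholders; note that this parser follows the original robots.txt specification, so it will not honor Google-style wildcards such as * and $):

from urllib.robotparser import RobotFileParser

# Point the parser at your live robots.txt file (placeholder domain).
rp = RobotFileParser()
rp.set_url("https://abc.com/robots.txt")
rp.read()

# Ask whether a given user agent may fetch a given URL.
print(rp.can_fetch("Googlebot", "https://abc.com/wp-admin/"))  # False if /wp-admin/ is disallowed
print(rp.can_fetch("*", "https://abc.com/blog/"))              # True if /blog/ is not blocked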

I am providing a link to one site's robots.txt file, which I particularly loved because of the graphics; it will also give you an idea of how the syntax is used:

https://www.avvo.com/robots.txt

I hope you found this post useful. Would love to hear from you in the comments field below.

 
