Robots.txt and its impact on SEO and search
To understand the impact it has on SEO and search, let’s begin by understanding what it is:
A robots.txt file informs search engine bots/agents about which URLs they may or may not crawl, and thereby manages the crawler traffic to your site.
It is one of the basic SEO tools for telling search bots what, how, and when to crawl a site, and it helps webmasters control what the bots can access on their sites and when. It is a clean way to specify which sections of your site should be crawled and to deny access to spam bots. It also reduces duplicate-content issues, especially for big sites where a lot of dynamic URLs get generated on the fly, thereby creating duplicate content.
Definitely a helpful tool for SEO.
Now let’s look at some examples of the rules that go into a robots.txt file. The file follows a specific syntax.
1. If you want to allow all crawlers, the file should have the following:
User-agent: *
Disallow:
2. The opposite of the above is below. This disallows all robots:
User-agent: *
Disallow: /
3. If you want to block specific folders:
User-agent: *
Disallow: /junk/
Disallow: /wp-admin/
4. You can disallow media or specific files that you don’t want bots to access. Depending on the kind of files, the * wildcard and the $ end-of-URL anchor are used, as in the example below, which blocks every URL ending in .xls:
User-agent: *
Disallow: /*.xls$
5. If you have noticed a specific bot/agent in your traffic report that is spamming your site, you will likely want to disallow it using the rule below:
User-agent: Actual name of the spam bot
Disallow: /
6. Another element that you can add to the file is your sitemap, since it makes it easier for the bots to find it:
User-agent: *
Sitemap: https://seo-bestpractices.com/sitemap
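Before you deploy rules like these, you can sanity-check the simple ones programmatically. Below is a minimal sketch using Python’s standard-library urllib.robotparser (the abc.com URLs are placeholders). Note that this parser follows the original robots.txt specification and does not understand the * and $ wildcard extensions, so a pattern like the .xls rule above is best verified in Google Search Console’s tester instead:
# Minimal sanity check of simple Disallow rules, using only the standard library.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /junk/
Disallow: /wp-admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot falls under the * group: /junk/ is blocked, ordinary pages are not.
print(parser.can_fetch("Googlebot", "https://abc.com/junk/page.html"))   # False
print(parser.can_fetch("Googlebot", "https://abc.com/blog/post.html"))   # True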
As mentioned earlier, even if you disallow certain folders in your robots.txt, you may find that those URLs still get indexed: not because agents crawled them directly, but because other sites have linked to them. If you want to keep a page out of the index completely, use noindex. You will need to include the following meta tag in the head section of your page:
<meta name="robots" content="noindex">
You may even password protect the page in order to completely shut out search bots.
Below are examples of what we mean by agents; there are many more besides these:
• Google: Googlebot.
• Bing: Bingbot.
• Yahoo: Slurp.
Now that you have a robots.txt ready, what do you do with it?
1. The first step is to create a plain-text file named robots.txt.
2. The second step is to add all the rules applicable to your site (from the examples discussed above).
3. The third step is to upload it to your root directory, meaning it should show up when you type your website address as below:
https://abc.com/robots.txt
(replace abc with your website URL)
4. The fourth step is to test that it is working properly with the robots.txt Tester in Google Search Console.
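Once the file is live, you can also spot-check it from a script. Here is a minimal sketch, again using Python’s urllib.robotparser, that fetches the deployed file and tests a path against the agents mentioned above (abc.com is a placeholder for your domain):
# Fetch the live robots.txt and spot-check a path for several user agents.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("https://abc.com/robots.txt")
parser.read()  # downloads and parses the live file

for agent in ("Googlebot", "Bingbot", "Slurp"):
    print(agent, parser.can_fetch(agent, "https://abc.com/wp-admin/"))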
I am providing a link to one site’s robots.txt that I particularly loved because of the graphics, and it will give you an idea of how to use the syntax in your own robots.txt:
https://www.avvo.com/robots.txt
I hope you found this post useful. Would love to hear from you in the comments field below.