How to Create and Add Rules in WordPress Robots.txt File (A Developer-Friendly Guide)

Updated on August 22, 2023

When it comes to website SEO, simply creating a website is not enough. To be visible in the SERPs for specific keywords, it’s important to get listed in search engines.

Search engine robots crawl and index websites. And webmasters can control how these robots parse their sites by creating instructions in a special file called robots.txt.

Some pages on a WordPress site do not need to be crawled or indexed by search engines, and a properly configured robots.txt file is how you tell crawlers to skip them. In this article, we’ll provide guidance on how to set up a robots.txt file for WordPress to optimize website SEO.

Brief Overview of Robots.txt File

A robots.txt file is a plain-text file located at the root of your website that tells search engine crawlers which URLs they may crawl. It implements the Robots Exclusion Protocol. In short, robots.txt tells search engine bots what they should not crawl on your website.

Before a search engine bot crawls a URL on your site (that is, retrieves the page so it can be indexed), it first looks for your robots.txt file. It then follows the instructions in that file to decide which URLs it may crawl.

Importance of Robots.txt File in SEO

You can create a robots.txt file for your WordPress website to control how search engine crawlers access your site’s content. It can be used in conjunction with the robots meta tag and disallow directive to provide specific instructions to search engine crawlers.

The robots meta tag is a piece of HTML code that can be added to individual web pages to specify whether or not a search engine should index the page or follow links on the page.

On the other hand, the disallow directive is used in a robots.txt file to prevent search engine crawlers from accessing specific pages or directories on your WordPress website. Note that it controls crawling, not indexing: a blocked URL can still show up in search results if other pages link to it.

A robots.txt file cannot rank pages by priority, but by blocking low-value pages and directories you leave crawlers free to spend their time on the content you want crawled and indexed, which can help your site’s SEO performance.

Understand Rules in Robots.txt File

There are various rules that you can add to your WordPress robots.txt file. Some common rules include disallowing specific directories or files, allowing or disallowing specific user-agents, and specifying your sitemap’s location.

It is important to understand the rules in a robots.txt file and use them correctly, as incorrect usage can result in unintended consequences, such as blocking search engines from accessing important pages on your website.

Below is a table that outlines some commonly used rules for robots.txt files and briefly explains their purpose.

Rule | Explanation
User-agent: * | The rules that follow apply to all crawlers.
Disallow: / | Disallows all crawlers from accessing any page on the site.
Disallow: /private/ | Disallows all crawlers from accessing any page under the /private/ directory.
Allow: /public/ | Allows all crawlers to access any page under the /public/ directory.
Sitemap: https://www.example.com/sitemap.xml | Specifies the location of the sitemap file for the site.
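
As a rough illustration of how a crawler might interpret the rules in the table, the sketch below feeds them to Python's standard-library urllib.robotparser module. The ExampleBot name and the example.com URLs are placeholders made up for the demo, and real search engines may evaluate rules differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Rules taken from the table above; the domain is a placeholder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /public/
Sitemap: https://www.example.com/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text in memory (no network request is made)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a made-up crawler name; it falls under the "*" group.
for url in (
    "https://www.example.com/private/report.html",
    "https://www.example.com/public/index.html",
):
    print(url, "->", parser.can_fetch("ExampleBot", url))

# Sitemap lines are collected separately from the crawl rules.
print(parser.site_maps())
```

This parser applies the first rule that matches a URL, so the order of the Allow and Disallow lines matters when you test with it.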

Locate Robots.txt File in WordPress

The robots.txt file is a text file located in your website’s root directory. It instructs web robots, such as search engine crawlers, which pages or files on your site should not be accessed.

  • To view the robots.txt file for your website, simply open a web browser and navigate to the URL your-website.com/robots.txt. If the file is present, its contents will be displayed in the browser.

  • If nothing appears, this means that a robots.txt file has not been created for your website.
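
If you prefer to check from a script instead of a browser, the sketch below derives the robots.txt address from any page URL and tries to download it. Both function names are my own, and the download helper assumes the site answers a missing file with a 404 status.

```python
from typing import Optional
from urllib.error import HTTPError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # robots.txt always lives at the root, so drop the path and query.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots_txt(page_url: str) -> Optional[str]:
    """Download robots.txt, or return None if the server reports 404."""
    try:
        with urlopen(robots_txt_url(page_url), timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except HTTPError as error:
        if error.code == 404:
            return None
        raise


print(robots_txt_url("https://your-website.com/blog/some-post/?s=seo"))
```

Calling fetch_robots_txt needs network access, so it is left out of the demo line at the bottom.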

Create Robots.txt File in WordPress

Here are the easy steps to create a robots.txt file for your WordPress website.

  • Log in to your WordPress hosting dashboard. For example, if you use Cloudways, log in to your account.
  • From the top menu bar, select the Servers tab.
  • Navigate to Server Management and select Master Credentials to obtain your SSH/SFTP access.

  • You can use any FTP/SFTP client application to access your WordPress site files. I use FileZilla and connect to my server using the Master Credentials I get from Cloudways.

  • Once you have connected to your server using FileZilla, navigate to the /applications folder, which holds your WordPress site files. Inside this folder, you will see different subfolders.

  • After navigating back to the Cloudways Platform, select the Applications option from the top left bar. From there, choose the application for which you want to add the robots.txt file.

  • From the left pane of the Cloudways Platform, navigate to Application Management, select Application Settings, and finally, General. Here, you will find the folder name for your selected application.

  • After navigating back to FileZilla, go to the /applications/[FOLDER NAME]/public_html directory, where [FOLDER NAME] is the folder name for your selected application that you found in the Cloudways Platform. In this directory, create a new text file and name it robots.txt.

  • Once you have created the file, right-click on the file and select View/Edit to open it in a text editor. You can use any text editor of your choice, such as Notepad. This will allow you to edit the contents of the robots.txt file for your WordPress website.
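
If you would rather prepare the file on your computer and then upload it with FileZilla, a short script can write it for you. This is only a sketch: the write_robots_txt helper is hypothetical, and the starter rules are an assumption modeled on what WordPress typically serves when no physical file exists, so adjust them to your site.

```python
import tempfile
from pathlib import Path

# Assumed starter rules; replace them with the rules your site needs.
STARTER_RULES = [
    "User-agent: *",
    "Disallow: /wp-admin/",
    "Allow: /wp-admin/admin-ajax.php",
]


def write_robots_txt(directory: Path, rules: list[str]) -> Path:
    """Write the rules, one per line, to robots.txt inside directory."""
    target = directory / "robots.txt"
    target.write_text("\n".join(rules) + "\n", encoding="utf-8")
    return target


# Demo: write into a temporary folder so nothing on disk is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    created = write_robots_txt(Path(tmp), STARTER_RULES)
    print(created.read_text(encoding="utf-8"))
```

To use it for real, pass the local folder you upload from instead of the temporary directory.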

Add Rules to WordPress Robots.txt

Now that you know what rules you can use, here is how to edit the WordPress robots.txt file. Open your robots.txt file and add whichever of the following rules you need:

1. Block Access to Your Entire Site

These directives block all search engine robots from accessing your site:

User-agent: *
Disallow: /

2. Block a Single Bot From Accessing Your Site

These directives block a single bot (in this case, Googlebot) from accessing your site:

User-agent: Googlebot
Disallow: /

3. Block Access to a Specific Folder or File

These directives block access to a specific folder or file (in this case, /private-folder/):

User-agent: *
Disallow: /private-folder/

4. Allow All Bots to Have Full Access to Your Site

These directives give all robots full access to your site. An empty Disallow value blocks nothing:

User-agent: *
Disallow:

5. Allow Access to a Specific File in a Disallowed Folder

These directives allow access to a specific file (in this case, /private-folder/specific-file.html) in a disallowed folder (in this case, /private-folder/):

User-agent: *
Disallow: /private-folder/
Allow: /private-folder/specific-file.html
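
Rule 5 works because crawlers that follow the Robots Exclusion Protocol standard (RFC 9309) apply the most specific matching rule, meaning the one with the longest path, and prefer Allow when two rules tie. The sketch below illustrates that idea in Python. It is a simplified model written for this example: it only handles plain path prefixes, not wildcards, and is_allowed is a hypothetical helper, not part of any library.

```python
# Each rule is (allow, path_prefix); these mirror rule 5 above.
RULES = [
    (False, "/private-folder/"),
    (True, "/private-folder/specific-file.html"),
]


def is_allowed(rules: list[tuple[bool, str]], path: str) -> bool:
    """Apply the longest matching prefix; Allow wins a tie."""
    best_length = -1
    allowed = True  # Nothing matched: crawling is allowed by default.
    for allow, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        longer = len(prefix) > best_length
        tie_broken_by_allow = len(prefix) == best_length and allow
        if longer or tie_broken_by_allow:
            best_length = len(prefix)
            allowed = allow
    return allowed


for path in (
    "/private-folder/specific-file.html",
    "/private-folder/other-file.html",
    "/blog/",
):
    print(path, "->", is_allowed(RULES, path))
```

Because the longest match decides, the result does not depend on the order in which the two lines appear in the file.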

6. Prevent Bots From Crawling WordPress Search Results

These directives stop bots from crawling WordPress search results:

User-agent: *
Disallow: /?s=

7. Create Different Rules for Different Bots in Robots.txt

This rule creates different rules for different bots in robots.txt by using the User-agent: directive followed by the bot’s name and then specifying the rules for that bot using the Disallow: or Allow: directives.
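
Here is a sketch of what separate rule groups might look like and how you could check them with Python's standard-library urllib.robotparser. The choice of bots and paths is an assumption for illustration only, and DuckDuckBot simply stands in for any crawler that has no group of its own.

```python
from urllib.robotparser import RobotFileParser

# One group per bot, plus a catch-all group for everyone else.
PER_BOT_RULES = """\
User-agent: Googlebot
Disallow: /private-folder/

User-agent: Bingbot
Disallow: /

User-agent: *
Disallow:
"""

bot_parser = RobotFileParser()
bot_parser.parse(PER_BOT_RULES.splitlines())

# Placeholder URL used only for the demo.
PRIVATE_URL = "https://www.example.com/private-folder/page.html"

for bot in ("Googlebot", "Bingbot", "DuckDuckBot"):
    print(bot, "->", bot_parser.can_fetch(bot, PRIVATE_URL))
```

Each bot uses the group that names it, and only falls back to the * group when no group matches its name.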

8. Disallow GPTBot to Access Your Content

To disallow GPTBot in the WordPress robots.txt file, you can use the User-agent: directive followed by the bot’s name, GPTBot, and then specify the rules for that bot using the Disallow: directive. Here is an example:

User-agent: GPTBot
Disallow: /

Specify User Agents in Robots.txt File

Let’s now discuss using wildcards in the robots.txt file to allow or disallow specific file types throughout WordPress.

1. Using Wildcards in robots.txt File

Search engines like Google and Bing support using wildcards in the robots.txt file. These wildcards can allow/disallow specific file types throughout the WordPress website.

2. Using Asterisk (*) Wildcard

An asterisk (*) matches any sequence of characters. For example, to stop search engines from crawling every JPG file in the /images/ folder whose name starts with “image”, you can use the following code:

User-agent: *
Disallow: /images/image*.jpg

The power of * is not limited to images only. You can even disallow all files with a particular extension. For example, to disallow all files with extensions “pdf” & “png” found in the downloads folder, you can use the following code:

User-agent: *
Disallow: /downloads/*.pdf
Disallow: /downloads/*.png

You can even disallow WordPress core directories by using *. For example, to ask search engines not to crawl directories starting with “wp-”, you can use the following code:

User-agent: *
Disallow: /wp-*/

3. Using Dollar ($) Wildcard

Another special symbol used in the WordPress robots.txt file is the dollar symbol ($), which marks the end of a URL. Without it, a rule matches every URL that starts with the given path. For example, to ask search engines not to crawl referral.php, referral.php?id=123, and so on, you can use the following code:

User-agent: *
Disallow: /referral.php

If you want to block referral.php only, add the $ symbol right after referral.php. The symbol ensures that only referral.php is blocked, not referral.php?id=123. For example:

User-agent: *
Disallow: /referral.php$

You can use $ for directories too. For example, to instruct search engines to disallow the wp-content folder and all directories inside the wp-content, you can use the following code:

User-agent: *
Disallow: /wp-content/

If you want to disallow only wp-content rather than all sub-folders, you should use the $ symbol. For example:

User-agent: *
Disallow: /wp-content/$

The $ symbol ensures that only wp-content is disallowed. All the directories in this folder are still accessible.
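
To make the behavior of * and $ concrete, the sketch below translates a robots.txt path pattern into a regular expression and tests it against a few of the URLs discussed in this section. The rule_matches helper is hypothetical and only approximates how search engines match patterns, so treat it as a learning aid rather than a reference implementation.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Without '$', the pattern only has to match the start of the path.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape the literal pieces and turn each '*' into the regex '.*'.
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    if anchored:
        return re.fullmatch(regex, path) is not None
    return re.match(regex, path) is not None


checks = [
    ("/downloads/*.pdf", "/downloads/guide.pdf"),
    ("/referral.php", "/referral.php?id=123"),
    ("/referral.php$", "/referral.php?id=123"),
    ("/wp-content/$", "/wp-content/uploads/"),
]
for pattern, path in checks:
    print(pattern, path, "->", rule_matches(pattern, path))
```

The first two checks match, while the last two do not, because $ requires the URL to end exactly where the pattern ends.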

Example of a WordPress Robots.txt

Below is an example of a robots.txt file for a WordPress blog:

User-agent: *
Disallow: /admin/
Disallow: /admin/*?*
Disallow: /admin/*?
Disallow: /blog/*?*
Disallow: /blog/*?

Sitemap: http://www.yoursite.com/sitemap.xml

The first line sets the User-agent, which names the crawler the rules apply to. Here, * means all crawlers. You can also write a separate group of rules for each search engine.

The next few lines stop search engines from crawling the “admin” directory and any “admin” or “blog” URLs that contain a query string. Search engines rarely need to index these URLs.

If your site has a sitemap, adding its URL helps search engine bots find the sitemap file, which can help them discover your pages faster.

Validate WordPress Robots.txt File

Testing your WordPress robots.txt file is very important to ensure it has been set up correctly and isn’t hurting the site’s visibility in search. Here’s how you can test your WordPress robots.txt file:

  • Open the robots.txt Tester tool;
  • Submit your website URL;
  • Click on the Test button.

  • If the Test button changes to ‘Allowed’, the URL you entered is not blocked for Google’s web crawlers. If it changes to ‘Blocked’, the URL is blocked.
  • The tester shows errors and warnings. Fix those, and you’re good to go!
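
Before you reach for an online tester, you can catch simple typos locally. The sketch below is a small, hypothetical linter I put together for this article, not a replacement for a search engine's own tester. The list of known directives is an assumption, and the checks are deliberately basic.

```python
# Directive names this sketch accepts (an assumption, not a full list).
KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def lint_robots_txt(text: str) -> list[str]:
    """Return a list of human-readable problems found in robots.txt text."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # Drop comments and whitespace.
        if not line:
            continue
        if ":" not in line:
            problems.append(f"line {number}: missing ':' separator")
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        name = field.lower()
        if name not in KNOWN_DIRECTIVES:
            problems.append(f"line {number}: unknown directive '{field}'")
        elif name in {"allow", "disallow"} and value and not value.startswith(("/", "*")):
            problems.append(f"line {number}: path '{value}' should start with '/'")
    return problems


SAMPLE = """\
User-agent: *
Disallow: referral.php
Disalow: /tmp/
Sitemap: https://www.example.com/sitemap.xml
"""

for problem in lint_robots_txt(SAMPLE):
    print(problem)
```

An empty list means the sketch found nothing to flag, which is not the same as the file doing what you intended, so still test real URLs afterwards.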

Avoid Common Mistakes in Robots.txt

When you make a Robots.txt file for your website, be sure to avoid these common mistakes:

  • Don’t block pages that should be allowed: If you block pages that should be allowed, search engines won’t be able to find them.
  • Don’t allow pages that should be blocked: Make sure to block pages that shouldn’t be seen by search engines. If you don’t, private information could be exposed.
  • Test your Robots.txt file: After you make your robots.txt file, test it to ensure it works correctly. Check that all the pages you want to block are actually blocked.
  • Update your Robots.txt file: As your website changes, update your robots.txt file too. If you don’t, search engines might not see your new pages or might see pages you don’t want them to see.
  • Understand what Robots.txt does: Make sure you know what the robots.txt file does and how it works with search engines. You might make mistakes when setting it up if you don’t understand it.

Summary

The Robots.txt file is a valuable tool for SEO because it allows you to instruct search engine bots on what to crawl and what to skip on your website.

However, it’s important to be cautious when using it because a misconfiguration can lead to complete deindexation of your site (e.g., using Disallow: /).

Generally, the best practice is allowing search engines to crawl as much of your site as possible while protecting sensitive information and avoiding duplicate content. For example, you can use the Disallow directive to block specific pages or folders or the Allow directive to override a Disallow rule for a particular page.

Not all bots follow the rules outlined in the Robots.txt file, so controlling what gets indexed is not foolproof. Nonetheless, it’s a useful tool to include in your SEO strategy.

Frequently Asked Questions

Q. Why is a Robots.txt file used?

A. The Robots.txt file tells the search engine robots that analyze your website what to crawl and what to ignore. With this file, you can stop some or all robots (also called “crawlers” or “spiders”) from crawling parts of your site.

Q. Do I need Robots.txt for WordPress?

A. You should have a Robots.txt file for your WordPress website to control how search engines crawl your site. You can create one using a text editor or a plugin like Yoast SEO or All in One SEO Pack. Remember to take other security measures to protect your website from malicious bots or hackers.

Q. Where is Robots.txt in WordPress?

A. WordPress does not ship a physical Robots.txt file; when none exists, it serves a virtual one that it generates on the fly. To control the rules yourself, create the file manually and upload it to your site’s root directory, or use a plugin like Yoast SEO or All in One SEO Pack to create and edit your robots.txt file from the plugin’s settings. Test your Robots.txt file to ensure it is working correctly.

Danish Naseer

Danish Naseer is a WordPress Community Manager at Cloudways. He is passionate about designing, developing, and engaging with people to help them. He also actively participates in the community to share his knowledge. Besides that, he loves watching documentaries, traveling, and spending time with family. You can contact him at [email protected]
