When it comes to website SEO, simply creating a website is not enough. To be visible in the SERPs for specific keywords, it’s important to get listed in search engines.
Search engine robots crawl and index websites. And webmasters can control how these robots parse their sites by creating instructions in a special file called robots.txt.
Proper configuration of the robots.txt file is essential for keeping WordPress pages that don’t need to be indexed out of search engine crawls. In this article, we’ll explain how to set up a robots.txt file for WordPress to optimize your website’s SEO.
- Brief Overview of Robots.txt File
- Importance of Robots.txt File in SEO
- Understand Rules in Robots.txt File
- Locate Robots.txt File in WordPress
- Create Robots.txt File in WordPress
- Add Rules to WordPress Robots.txt
- Specify User Agents in Robots.txt File
- Example of a WordPress Robots.txt
- Validate WordPress Robots.txt File
- Avoid Common Mistakes in Robots.txt
Brief Overview of Robots.txt File
A robots.txt file is a text file located at the root of your website that tells search engine crawlers which URLs they can crawl. It implements what is known as the Robots Exclusion Protocol. In short, robots.txt tells search engine bots what they should not crawl on your website.
Before a search engine bot crawls a URL on your site (retrieving its content so the page can be indexed), it first looks for your robots.txt file. Based on the instructions in that file, the crawler then determines which URLs it may or may not crawl.
Importance of Robots.txt File in SEO
You can create a robots.txt file for your WordPress website to control how search engine crawlers access your site’s content. It can be used in conjunction with the robots meta tag and disallow directive to provide specific instructions to search engine crawlers.
The robots meta tag is a piece of HTML code that can be added to individual web pages to specify whether or not a search engine should index the page or follow links on the page.
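For example, a page you don’t want indexed could include a tag like the one below in its <head> section. This is a generic illustration; the exact content values depend on whether you also want links on the page followed:

<meta name="robots" content="noindex, nofollow">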
On the other hand, the disallow directive is used in a robots.txt file to prevent search engine crawlers from accessing specific pages or directories on your WordPress website.
You can optimize your robots.txt file by prioritizing the pages and directories you want search engines to focus on crawling and indexing, helping improve your site’s SEO performance.
Understand Rules in Robots.txt File
There are various rules that you can add to your WordPress robots.txt file. Some common rules include disallowing specific directories or files, allowing or disallowing specific user-agents, and specifying your sitemap’s location.
It is important to understand the rules in a robots.txt file and use them correctly, as incorrect usage can result in unintended consequences, such as blocking search engines from accessing important pages on your website.
Below is a table that outlines some commonly used rules for robots.txt files and briefly explains their purpose.
| Rule | Explanation |
| --- | --- |
| User-agent: * | This rule specifies that the following rules apply to all crawlers. |
| Disallow: / | This rule disallows all crawlers from accessing any page on the site. |
| Disallow: /private/ | This rule disallows all crawlers from accessing any page under the /private/ directory. |
| Allow: /public/ | This rule allows all crawlers to access any page under the /public/ directory. |
| Sitemap: https://www.example.com/sitemap.xml | This rule specifies the location of the sitemap file for the site. |
Locate Robots.txt File in WordPress
The robots.txt file is a text file located in your website’s root directory. It instructs web robots, such as search engine crawlers, which pages or files on your site should not be accessed.
- To view the robots.txt file for your website, simply open a web browser and navigate to the URL your-website.com/robots.txt. If the file is present, its contents will be displayed in the browser.
- If nothing appears, this means that a robots.txt file has not been created for your website.
Create Robots.txt File in WordPress
Here are the easy steps to create a robots.txt file for your WordPress website.
- Log in to your WordPress hosting dashboard. For example, if you use Cloudways, log in to your account.
- From the top menu bar, select the Servers tab.
- Navigate to Server Management and select Master Credentials to obtain your SSH/SFTP access.
- You can use any FTP client to access your WordPress site files. I use FileZilla and connect to my server using the Master Credentials from Cloudways.
- Once you have connected to your server in FileZilla, navigate to the /applications folder. Inside this folder, you will see a subfolder for each application hosted on the server.
- After navigating back to the Cloudways Platform, select the Applications option from the top left bar. From there, choose the application for which you want to add the robots.txt file.
- From the left pane of the Cloudways Platform, navigate to Application Management, select Application Settings, and finally, General. Here, you will find the folder name for your selected application.
- After navigating back to FileZilla, go to the /applications/[FOLDER NAME]/public_html directory, where [FOLDER NAME] is the folder name for your selected application that you found in the Cloudways Platform. In this directory, create a new text file and name it robots.txt.
- Once you have created the file, right-click on the file and select View/Edit to open it in a text editor. You can use any text editor of your choice, such as Notepad. This will allow you to edit the contents of the robots.txt file for your WordPress website.
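With the file open in your editor, you can paste in a minimal starting point and refine it later. The snippet below mirrors the default rules WordPress serves when no physical robots.txt file exists; treat it as a sketch and adapt the paths to your own setup:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php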
Add Rules to WordPress Robots.txt
Now that you know which rules you can use, I’ll show you how to edit the WordPress robots.txt file. Open your robots.txt file and add whichever of the following rules you need:
1. Block Access to Your Entire Site
This rule blocks all search engine robots from accessing your site by using the following directives:

User-agent: *
Disallow: /
2. Block a Single Bot From Accessing Your Site
This rule blocks a single bot (in this case, Googlebot) from accessing your site:

User-agent: Googlebot
Disallow: /
3. Block Access to a Specific Folder or File
This rule blocks access to a specific folder or file (in this case, /private-folder/):

User-agent: *
Disallow: /private-folder/
4. Allow All Bots to Have Full Access to Your Site
This rule allows all robots full access to your site; an empty Disallow value means nothing is blocked:

User-agent: *
Disallow:
5. Allow Access to a Specific File in a Disallowed Folder
This rule allows access to a specific file (in this case, /private-folder/specific-file.html) inside an otherwise disallowed folder (/private-folder/):

User-agent: *
Disallow: /private-folder/
Allow: /private-folder/specific-file.html
6. Prevent Bots From Crawling WordPress Search Results
This rule stops bots from crawling WordPress search result pages:

User-agent: *
Disallow: /?s=
7. Create Different Rules for Different Bots in Robots.txt
To create different rules for different bots in robots.txt, start a separate block with the User-agent: directive followed by that bot’s name, then list the Disallow: or Allow: rules that apply to it.
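Here is a sketch of what separate per-bot blocks might look like; the bot names are real crawlers, but the paths are placeholders you would replace with your own:

User-agent: Googlebot
Disallow: /not-for-google/

User-agent: Bingbot
Disallow: /not-for-bing/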
8. Disallow GPTBot to Access Your Content
To disallow GPTBot in the WordPress robots.txt file, you can use the User-agent: directive followed by the bot’s name, GPTBot, and then specify the rules for that bot using the Disallow: directive. Here is an example:
User-agent: GPTBot
Disallow: /
Specify User Agents in Robots.txt File
Let’s now discuss using wildcards in the robots.txt file to allow or disallow specific file types throughout WordPress.
1. Using Wildcards in robots.txt File
Search engines like Google and Bing support using wildcards in the robots.txt file. These wildcards can allow/disallow specific file types throughout the WordPress website.
2. Using Asterisk (*) Wildcard
An asterisk (*) acts as a wildcard that matches any sequence of characters. For example, to disallow all images whose names start with “image” and have the “jpg” extension from being indexed by search engines, you can use the following code:

User-agent: *
Disallow: /images/image*.jpg
The power of * is not limited to images only. You can even disallow all files with a particular extension. For example, to disallow all files with extensions “pdf” & “png” found in the downloads folder, you can use the following code:
User-agent: *
Disallow: /downloads/*.pdf
Disallow: /downloads/*.png
You can even disallow WordPress core directories by using *. For example, to ask search engines not to crawl directories starting with “wp-”, you can use the following code:
User-agent: *
Disallow: /wp-*/
3. Using Dollar ($) Wildcard
Another wildcard symbol used in the WordPress robots.txt file is the dollar sign ($), which anchors a rule to the end of a URL. First, consider a rule without it: to ask search engines not to crawl referral.php as well as referral.php?id=123 and so on, you can use the following code:

User-agent: *
Disallow: /referral.php

But what if you want to block only referral.php itself? Include the $ symbol right after referral.php. The $ ensures that only /referral.php is blocked, not /referral.php?id=123. For example:

User-agent: *
Disallow: /referral.php$
You can use $ for directories too. For example, to instruct search engines to disallow the wp-content folder and all directories inside the wp-content, you can use the following code:
User-agent: *
Disallow: /wp-content/
If you want to disallow only wp-content rather than all sub-folders, you should use the $ symbol. For example:
User-agent: *
Disallow: /wp-content/$
The $ symbol ensures that only wp-content is disallowed. All the directories in this folder are still accessible.
Example of a WordPress Robots.txt
Below is an example of a robots.txt file for a WordPress blog:
User-agent: *
Disallow: /admin/
Disallow: /admin/*?*
Disallow: /admin/*?
Disallow: /blog/*?*
Disallow: /blog/*?
Sitemap: http://www.yoursite.com/sitemap.xml

The first line specifies the User-agent, i.e., which search engine crawler the rules apply to; * means they apply to all search engines. You can also create a separate block for each search engine.
The next lines prevent search engines from crawling the /admin/ directory and URLs with query strings under /admin/ and /blog/. It is usually unnecessary for search engines to index these URLs.
If your site has a sitemap, adding its URL helps search engine bots find the sitemap file. This results in faster indexing of pages.
Validate WordPress Robots.txt File
Testing your WordPress robots.txt file is very important to ensure it has been set up correctly and isn’t negatively affecting the site’s performance. Here’s how you can test your WordPress robots.txt file:
- Open the robots.txt Tester tool.
- Submit your website URL.
- Click the Test button.
- If the result shows ‘Allowed’, the URL you entered isn’t blocked for Google’s web crawlers; if it shows ‘Blocked’, it is.
- The tester also flags errors and warnings. Fix those, and you’re good to go!
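If you’d rather check your rules programmatically, Python’s standard library ships a robots.txt parser you can point at your live file. Below is a minimal sketch; the example.com URLs and paths are placeholders for your own site and rules:

from urllib.robotparser import RobotFileParser

# Download and parse the live robots.txt file (replace example.com with your domain)
parser = RobotFileParser()
parser.set_url("https://www.example.com/robots.txt")
parser.read()

# Check whether a given crawler may fetch a given URL
print(parser.can_fetch("*", "https://www.example.com/private-folder/page.html"))
print(parser.can_fetch("Googlebot", "https://www.example.com/blog/hello-world/"))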
Avoid Common Mistakes in Robots.txt
When you make a Robots.txt file for your website, be sure to avoid these common mistakes:
- Don’t block pages that should be allowed: If you block pages that should be allowed, search engines won’t be able to find them.
- Don’t allow pages that should be blocked: Make sure to block pages that shouldn’t be seen by search engines. If you don’t, private information could be exposed.
- Test your Robots.txt file: After you make your robots.txt file, test it to ensure it works correctly. Check that all the pages you want to block are actually blocked.
- Update your Robots.txt file: As your website changes, update your robots.txt file too. If you don’t, search engines might not see your new pages or might see pages you don’t want them to see.
- Understand what Robots.txt does: Make sure you know what the robots.txt file does and how it works with search engines. You might make mistakes when setting it up if you don’t understand it.
Summary
The Robots.txt file is a valuable tool for SEO because it allows you to instruct search engine bots on what to index and what not to index on your website.
However, it’s important to be cautious when using it because a misconfiguration can lead to complete deindexation of your site (e.g., using Disallow: /).
Generally, the best practice is allowing search engines to crawl as much of your site as possible while protecting sensitive information and avoiding duplicate content. For example, you can use the Disallow directive to block specific pages or folders or the Allow directive to override a Disallow rule for a particular page.
Not all bots follow the rules outlined in the Robots.txt file, so controlling what gets indexed is not foolproof. Nonetheless, it’s a useful tool to include in your SEO strategy.
Frequently Asked Questions
Q. Why is a Robots.txt file used?
A. The Robots.txt file instructs the search engine robots that analyze your website on what to crawl and what to ignore. With this file, you can prevent certain robots (also called “crawlers” or “spiders”) from exploring and indexing parts of your site.
Q. Do I need Robots.txt for WordPress?
A. You should have a Robots.txt file for your WordPress website to control how search engines crawl your site. You can create one using a text editor or a plugin like Yoast SEO or All in One SEO Pack. Remember to take other security measures to protect your website from malicious bots or hackers.
Q. Where is Robots.txt in WordPress?
A. WordPress does not have a default Robots.txt file. You can create one manually and upload it to your site’s root directory, or use a plugin like Yoast SEO or All in One SEO Pack to create and edit your robots.txt file from the plugin’s settings. Test your Robots.txt file to ensure it is working correctly.