Just creating a website is not enough. Website owners also want their site listed in search engines so that it appears in the search engine results pages (SERPs) for relevant keywords. This listing, and the visibility of your freshest content, depends mainly on search engine robots that crawl and index websites. Webmasters can control how these robots crawl a website by adding instructions to a special file called robots.txt.
In this article, I’ll explain how to set up a WordPress robots.txt file for the best website SEO. Note that several pages of a WordPress website do not need to be indexed by search engines.
What Is a Robots.txt File?
A robots.txt is a text file located at the root of your website that tells search engine crawlers which parts of your website they should not crawl. It implements the Robots Exclusion Protocol, which lets you keep search engines away from useless and/or specific content (e.g. your login page and sensitive files).
In short, robots.txt tells search engine bots what they should not crawl on your website.
Here is how it works! When a search engine bot is about to crawl a URL of your website (that is, it will crawl and retrieve information so it can be indexed), it first looks for your robots.txt file and checks whether the rules in it allow that URL to be crawled.
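To make that check concrete, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt content, the bot name ExampleBot, and the example.com URLs are assumptions for illustration only; a real crawler downloads the file from your site root before running this kind of check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (assumption for this example).
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
"""

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules let this user agent crawl the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# "ExampleBot" is a made-up crawler name.
blocked = is_allowed(ROBOTS_TXT, "ExampleBot", "https://www.example.com/wp-admin/options.php")
allowed = is_allowed(ROBOTS_TXT, "ExampleBot", "https://www.example.com/blog/hello-world/")
print(blocked, allowed)  # False True
```

The first URL sits under /wp-admin/, so the bot skips it. The second URL matches no rule, so it can be crawled.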
Why Create Robots.txt for WordPress?
You usually don’t need to add a robots.txt file for WordPress websites. By default, search engines crawl and index the entire WordPress site. However, for better SEO, you can add a robots.txt file to your root directory to stop search engines from accessing specific areas of your WordPress website.
How to Create Robots.txt for WordPress?
Log in to your web hosting dashboard. In my example, I am using Cloudways – Managed Cloud Hosting platform.
Go to the Servers tab from the top menu bar and get your SSH/SFTP access from Server Management → Master Credentials.
Use any FTP client application to access your WordPress application files. I am using FileZilla for this tutorial. Launch it and connect to your server by using Master Credentials.
Once connected, go to the /applications folder on your server. You will see different folders there.
Now go back to the Cloudways Platform and from the top left bar, go to Applications. Select the application that you want to add the robots.txt file for.
From the left pane, go to Application Management → Application Settings → General. You will find the folder name of your application.
Go back to FileZilla and then navigate to /applications/[FOLDER NAME]/public_html. Create a new text file here and name it robots.txt.
Right click on the robots.txt file, and click View/Edit to open it in a text editor (Notepad is a handy option).
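If you would rather generate the file content than type it by hand, the following Python sketch builds the text you can paste into the editor. The helper name build_robots_txt is hypothetical, and the /wp-admin/ rule is only a starting point taken from the recommendation later in this article.

```python
def build_robots_txt(disallowed_paths, user_agent="*"):
    """Build robots.txt text with one Disallow line per path.

    build_robots_txt is a hypothetical helper for this article,
    not part of WordPress or Cloudways.
    """
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    return "\n".join(lines) + "\n"

content = build_robots_txt(["/wp-admin/"])
print(content)
```

Each directive goes on its own line, and every path starts with a forward slash.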
Advanced Robots.txt for WordPress
Search engines like Google and Bing support the use of wildcards in the robots.txt file. These wildcards can be used to allow/disallow specific file types throughout the WordPress website.
An asterisk (*) matches any sequence of characters, so a single rule can cover many URLs.
User-agent: *
Disallow: /images/image*.jpg
Here, * means that all images in the images folder whose names start with “image” and have the “jpg” extension will not be crawled by search engines.
Example: image1.jpg, image2.jpg, imagexyz.jpg will not be indexed by the search engines.
The power of * is not limited to images only. You can even disallow all files with a particular extension.
User-agent: *
Disallow: /downloads/*.pdf
Disallow: /downloads/*.png
The above statements ask all search engines not to crawl any files with the extensions “pdf” and “png” found in the downloads folder.
You can even disallow WordPress core directories by using *.
User-agent: *
Disallow: /wp-*/
The above line asks search engines not to crawl directories starting with “wp-”.
Example: wp-includes, wp-content, etc will not be indexed by search engines.
Another wildcard symbol used in WordPress robots.txt file is the dollar symbol ($).
User-agent: *
Disallow: /referral.php
The above statement will ask search engines not to crawl /referral.php and also /referral.php?id=123 and so on.
But what if you want to block /referral.php only? You only have to add the $ symbol right after referral.php.
The $ symbol marks the end of the URL, so only /referral.php is blocked but not /referral.php?id=123.
User-agent: *
Disallow: /referral.php$
You can use $ for directories too.
User-agent: *
Disallow: /wp-content/
This instructs search engines not to crawl the wp-content folder or anything located inside it. If you want to disallow only the wp-content folder itself rather than all of its sub-folders, you should use the $ symbol. For example:
User-agent: *
Disallow: /wp-content/$
The $ symbol ensures that only wp-content is disallowed. All the directories in this folder are still accessible.
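To see how the two wildcards behave, here is a rough Python sketch of the matching logic. It is a simplified model of the wildcard rules described above, not the code any search engine actually runs, and rule_matches is a hypothetical helper name. I wrote the matcher by hand because Python's built-in urllib.robotparser only does plain prefix matching.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Check a URL path against a robots.txt pattern with * and $ wildcards."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then let each * stand for "any characters".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        # A trailing $ means the pattern must cover the whole path.
        return re.fullmatch(regex, path) is not None
    # Without $, a rule only has to match the beginning of the path.
    return re.match(regex, path) is not None

print(rule_matches("/images/image*.jpg", "/images/imagexyz.jpg"))  # True
print(rule_matches("/referral.php", "/referral.php?id=123"))       # True
print(rule_matches("/referral.php$", "/referral.php?id=123"))      # False
print(rule_matches("/wp-content/$", "/wp-content/uploads/"))       # False
```

A True result means the Disallow rule applies to that URL, so a bot that respects the rule will not crawl it.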
Below is the robots.txt file for the Cloudways blog.
User-agent: *
Disallow: /admin/
Disallow: /admin/*?*
Disallow: /admin/*?
Disallow: /blog/*?*
Disallow: /blog/*?
The first line indicates the User-agent. This names the search engine bot that the rules below it apply to. Here, * means all search engines, but you can also write a separate group of rules for each search engine. A complete list of all search engine bots is available here.
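As a sketch of per-bot rules, the Python snippet below parses a file with one group for Googlebot and one for everyone else. The rules, paths, and the name OtherBot are assumptions for illustration. Note how a bot that has its own group follows that group instead of the * group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for Googlebot, one for all other bots.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot follows its own group only.
googlebot_private = parser.can_fetch("Googlebot", "https://www.example.com/private/")
googlebot_admin = parser.can_fetch("Googlebot", "https://www.example.com/admin/")
# "OtherBot" is a made-up name; it falls back to the * group.
otherbot_private = parser.can_fetch("OtherBot", "https://www.example.com/private/")
otherbot_admin = parser.can_fetch("OtherBot", "https://www.example.com/admin/")

print(googlebot_private, googlebot_admin)  # False True
print(otherbot_private, otherbot_admin)    # True False
```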
Disallow: /admin/
Disallow: /admin/*?*
Disallow: /admin/*?
These lines stop search engines from crawling the “admin” directory, including admin URLs that carry a query string. Search engines rarely need to index these directories.
Disallow: /blog/*?*
Disallow: /blog/*?
If your WordPress site is a blogging site, it is best practice to stop search engine bots from crawling your search queries. These two lines block any URL under /blog/ that contains a question mark, that is, a query string.
If your site has a sitemap, adding its URL to robots.txt helps search engine bots find the sitemap file. This can result in faster indexing of pages.
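A sitemap is usually declared with a Sitemap line that holds the full URL. The Python sketch below reads that line back with the standard library parser (site_maps() needs Python 3.8 or later). The sitemap URL is a placeholder, not the address of a real file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a placeholder sitemap URL.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the declared sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)  # ['https://www.example.com/sitemap.xml']
```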
What to Include in Robots.txt for WordPress?
You decide which parts of the WordPress site you want to be included in SERP. Everyone has their own views on setting up a WordPress robots.txt file. Some recommend not adding a robots.txt file to WordPress at all. In my opinion, you should add one and disallow the /wp-admin/ folder. Keep in mind that the robots.txt file is public. You can find the robots.txt file of any website by visiting www.example.com/robots.txt.
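Since the file always sits at the root of the site, you can work out its address from any page URL. Here is a small Python sketch. The helper name robots_txt_url and the example page URL are assumptions for illustration.

```python
from urllib.parse import urljoin

def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    # A path starting with "/" replaces the page's path and query string.
    return urljoin(page_url, "/robots.txt")

url = robots_txt_url("https://www.example.com/blog/some-post/?page=2")
print(url)  # https://www.example.com/robots.txt
```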
That covers the robots.txt file in WordPress. If you have any questions about setting up a robots.txt file, feel free to ask in the comment section below.
As you can see, the robots.txt file is an interesting tool for your SEO. It lets you tell search engine robots what to crawl and index, and what to leave alone. But it must be handled with care. A bad configuration can lead to your whole website being dropped from search results (example: if you use Disallow: /). So, be careful!
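Before you upload a new robots.txt file, a quick sanity check can catch that mistake. The Python sketch below is one possible way to do it. The helper name blocks_entire_site is hypothetical, and the check only covers the simple case where the site root itself is disallowed.

```python
from urllib.robotparser import RobotFileParser

def blocks_entire_site(robots_txt: str, user_agent: str = "*") -> bool:
    """Return True if the rules stop the given bot from crawling the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")

risky = "User-agent: *\nDisallow: /\n"
safe = "User-agent: *\nDisallow: /wp-admin/\n"

print(blocks_entire_site(risky))  # True
print(blocks_entire_site(safe))   # False
```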
Now it’s your turn. Tell me if you use this type of file and how you configure it. Share your feedback in the comments.
Q1. What is robots.txt?
The robots.txt is a text file placed at the root of your website. This file is intended to prohibit search engine robots from indexing certain areas of your website. The robots.txt file is one of the first files scanned by spiders (robots).
Q2. Why is a robots.txt file used?
The robots.txt file gives instructions to the search engine robots that analyze your website. It’s an exclusion protocol for robots. Thanks to this file, you can stop some robots (also called “crawlers” or “spiders”) from exploring and indexing your site.
Mustaasam is the WordPress Community Manager at Cloudways - A Managed WordPress Hosting Platform, where he actively works and loves sharing his knowledge with the WordPress Community. When he is not working, you can find him playing squash with his friends, or defending in Football, and listening to music. You can email him at email@example.com