Just creating a website is not enough. Website owners want their sites listed in search engines so that pages appear in SERPs for relevant keywords. Listing, and the visibility of fresh content, depends largely on search engine robots that crawl and index websites. Webmasters can control how these robots crawl a site by placing instructions in a special file called robots.txt.
Today, you will learn how to set up a WordPress robots.txt file for better SEO. Several areas of a WordPress website do not need to be crawled by search engines, and the robots.txt file lets you exclude them.
Understanding Robots.txt File
Many websites have a “robots.txt” file that tells search engine crawlers which parts of the site they may or may not crawl. Crawlers look for this file in the root directory of a website. If your site does not have one, you can easily create it.
Set up WordPress Robots.txt File
Usually, WordPress websites do not need a robots.txt file, because search engines crawl the entire site by default. However, for better SEO, you can add a robots.txt file to your root directory to keep search engines out of specific areas of your WordPress site.
How to Create Robots.txt File
Log in to your hosting dashboard. In this example, I will be using Cloudways, a high-performance managed cloud WordPress hosting platform.
Go to the “Servers” tab at the top left and open your server. You will find your FTP details there.
Open an FTP client to access your WordPress files. I will be using “FileZilla.” Launch it and connect to your server using the “MASTER CREDENTIALS” from the previous step.
Once connected, go to the Applications folder. You will see some folders here. Do not get confused. Go back to the Cloudways platform and from the top left pane, go to Applications.
Select the application that you want to add the robots.txt file for.
From the left pane, go to Application Settings. You will find the folder name of your application.
Come back to FileZilla and navigate to “/applications/[FOLDER NAME]/public_html.” Create a new text file here and name it “robots.txt”.
Open the robots.txt file with any text editor (Notepad is a handy option).
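If you would rather generate the file locally and then upload it with FileZilla, the following Python sketch writes a minimal robots.txt. The function name, the target directory, and the single /wp-admin/ rule are assumptions chosen for illustration, not a recommended final configuration.

```python
from pathlib import Path

# Starter rules (an assumption for illustration): allow every crawler,
# but keep it out of the WordPress admin area.
STARTER_RULES = "\n".join([
    "User-agent: *",
    "Disallow: /wp-admin/",
]) + "\n"


def write_robots_txt(directory: str) -> Path:
    """Write the starter rules to <directory>/robots.txt and return the path."""
    target = Path(directory) / "robots.txt"
    target.write_text(STARTER_RULES, encoding="utf-8")
    return target
```

Calling write_robots_txt(".") creates the file in the current folder; from there you can upload it to public_html.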
Advanced robots.txt for WordPress
Search engines like Google and Bing support the use of wildcards in the robots.txt file. These wildcards can be used to allow/disallow specific file types throughout the WordPress website.
An asterisk (*) matches any sequence of characters.
User-agent: *
Disallow: /images/image*.jpg
Here, * means that all images in the images folder whose names start with “image” and have the “jpg” extension will not be crawled by search engines.
Example: image1.jpg, image2.jpg, and imagexyz.jpg will not be crawled by search engines.
The power of * is not limited to images only. You can even disallow all files with a particular extension.
User-agent: *
Disallow: /downloads/*.pdf
Disallow: /downloads/*.png
The above statements ask all search engines not to crawl files with the “pdf” and “png” extensions in the downloads folder.
You can even disallow WordPress core directories by using *.
User-agent: *
Disallow: /wp-*/
The above rule asks search engines not to crawl directories starting with “wp-”.
Example: wp-includes, wp-content, etc. will not be crawled by search engines.
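To check how a wildcard rule behaves before uploading it, a small matcher can help. The sketch below is a hand-rolled approximation of how wildcard-aware crawlers interpret *, not the code any search engine actually runs; the function name is_blocked and the test paths are assumptions for illustration.

```python
import re


def is_blocked(rule: str, path: str) -> bool:
    """Return True if a Disallow rule containing '*' covers the given path.

    '*' stands for any run of characters; everything else is literal.
    A rule matches any path that starts with the pattern.
    """
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    # re.match anchors only at the start, which gives prefix semantics.
    return re.match(pattern, path) is not None


for candidate in ["/images/image1.jpg", "/images/imagexyz.jpg", "/images/photo.jpg"]:
    print(candidate, is_blocked("/images/image*.jpg", candidate))
```

Under this sketch, /wp-*/ covers /wp-content/uploads/ but not /wp-login.php, because the rule requires a slash somewhere after the wildcard.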
Another wildcard symbol used in WordPress robots.txt file is the dollar symbol ($).
User-agent: *
Disallow: /referral.php
The above statement asks search engines not to crawl referral.php, and also referral.php?id=123 and so on.
But what if you want to block referral.php only? Add the $ symbol right after referral.php.
The $ symbol marks the end of the URL, so only referral.php is blocked and referral.php?id=123 is not.
User-agent: *
Disallow: /referral.php$
You can use $ for directories too.
User-agent: *
Disallow: /wp-content/
This tells search engines not to crawl the wp-content folder or any directories located inside it. If you want to disallow only wp-content itself rather than its sub-folders, use the $ symbol. For example:
User-agent: *
Disallow: /wp-content/$
The $ symbol ensures that only wp-content is disallowed. All the directories in this folder are still accessible.
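The effect of a trailing $ can be checked with a similar hand-rolled matcher. This is a sketch under the assumption that $ is only meaningful as the last character of a rule; matches_rule is a hypothetical helper, not part of any library.

```python
import re


def matches_rule(rule: str, url_path: str) -> bool:
    """Return True if url_path (path plus any query string) matches rule."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = ".*".join(re.escape(piece) for piece in body.split("*"))
    if anchored:
        pattern += r"\Z"  # the URL must end exactly where the rule ends
    return re.match(pattern, url_path) is not None


for rule in ["/referral.php", "/referral.php$"]:
    for url in ["/referral.php", "/referral.php?id=123"]:
        print(rule, url, matches_rule(rule, url))
```

Without the $, both URLs match; with it, only the bare /referral.php does.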
Below is the robots.txt file for the Cloudways blog.
User-agent: *
Disallow: /admin/
Disallow: /admin/*?*
Disallow: /admin/*?
Disallow: /blog/*?*
Disallow: /blog/*?
The first line indicates the User-agent. This names the search engine crawler that the rules below it apply to. A complete list of all search engine bots is available here.
Here, * means all search engines. You can also write separate rules for each search engine.
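To see how per-crawler groups behave, Python's standard urllib.robotparser module can parse a set of rules and answer whether a given crawler may fetch a URL. The rules, paths, and the allowed helper below are hypothetical and only illustrate the idea; Googlebot and Bingbot are used as example crawler names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for Googlebot, one for every other crawler.
RULES = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Ask the parsed rules whether `agent` may fetch `path` on example.com."""
    return parser.can_fetch(agent, "https://www.example.com" + path)


for agent in ["Googlebot", "Bingbot"]:
    for path in ["/admin/page", "/private/report"]:
        print(agent, path, allowed(agent, path))
```

In this parser, a crawler that has its own group follows only that group, so Googlebot is not affected by the /admin/ rule written for *.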
Disallow: /admin/
Disallow: /admin/*?*
Disallow: /admin/*?
This will not allow search engines to crawl the “admin” directory. It is often not necessary for search engines to index these directories.
Disallow: /blog/*?*
Disallow: /blog/*?
If your WordPress site is a blog, it is best practice to keep search engine bots from crawling your search query URLs. The rules above block any URL under /blog/ that contains a question mark (?).
If your site has a sitemap, adding its URL to robots.txt on a Sitemap line helps search engine bots find the sitemap file. This can result in faster indexing of pages.
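As a quick sanity check, the sketch below parses a robots.txt that includes a Sitemap line and reads the sitemap URL back with urllib.robotparser (the site_maps() method needs Python 3.8 or newer). The example.com URL and the /wp-admin/ rule are placeholders, not values taken from a real site.

```python
from urllib.robotparser import RobotFileParser

# Placeholder file contents; replace the URL with your own sitemap location.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of sitemap URLs, or None if the file has none.
sitemaps = parser.site_maps()
print(sitemaps)
```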
What to include in my WordPress robots.txt file?
You decide which parts of your WordPress site should appear in SERPs. Everyone has their own view on setting up the WordPress robots.txt file. Some recommend not adding a robots.txt file to WordPress at all. In my opinion, you should add one and disallow the /wp-admin/ folder. Keep in mind that the robots.txt file is public: you can view the robots.txt file of any website by visiting www.example.com/robots.txt.
We’re done with the robots.txt file in WordPress. If you have any questions about setting up a robots.txt file, feel free to ask in the comment section below.
Frequently Asked Questions
Q1. What is robots.txt in WordPress?
The robots.txt file is a text file placed at the root of your website. It is intended to keep search engine robots from crawling certain areas of your website. It is one of the first files that spiders (robots) request.
Q2. Why is the robots.txt file used?
The robots.txt file gives instructions to the search engine robots that analyze your website; it is an exclusion protocol for robots. With this file, you can ask some robots (also called “crawlers” or “spiders”) not to explore and index your site.