Every brand wants to dominate Google's SERPs and spends a great deal of money and effort to get its pages into the top ten positions.
Search engine optimization (SEO) has become highly technical. If you want to rank in Google's SERPs, you need to understand the dynamics of technical SEO to beat the competition. That means thinking beyond keyword placement and publishing blog posts on third-party websites to generate backlinks.
In this article, I will introduce two important aspects of technical SEO: crawl equity (also known as crawl budget) and server logs. These matter because if Google's bots are not visiting your website regularly, all your other SEO efforts can go down the drain.
- The Crawl Budget Concept
- Factors That Affect Crawl Budget
- PageRank
- Server Response Time
- Site Structure
- Content
- What Are Server Logs?
- How Can This Help Your SEO?
- How to Make the Most of Your Crawl Budget
- Take A Look Under the Hood
The Crawl Budget Concept
Google and other search engines assign each domain a limited daily “crawl budget” that dictates the number of your website’s pages that their spiders will crawl.
The crawl budget is calculated based on two factors:
- Crawl rate limit: how much the search engine can crawl your site without crashing your server.
- Crawl demand: how much the search engine wants to crawl your site.
Generally, smaller websites with a few thousand distinct URLs don't have to worry about crawl budget; search engines can crawl most of their pages with ease. However, if you run a large site with tens of thousands or even millions of pages, you'll want to use your crawl budget strategically to boost your online visibility.
Factors That Affect Crawl Budget
The following factors significantly affect the crawl budget allocated to your website.
- PageRank
Crawl demand (and therefore crawl budget) is directly related to your domain authority and link equity. Strong link equity signals to Google that you are a trusted, authoritative site: the higher your PageRank, the more Google will want to crawl your site for fresh content.
- Server Response Time
Server response time is the time your hosting server takes to respond to a visitor's request; it is also referred to as Time To First Byte (TTFB). Google's guidelines recommend that a website's TTFB shouldn't exceed 200 ms, so test your website with an online speed-testing tool and improve your server response time if it's above the 200-300 ms range.
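If you want a quick number to start from, here is a minimal sketch (Python standard library only, with placeholder URLs) that approximates TTFB by timing how long the first byte of a response takes to arrive. It also includes DNS, connection, and TLS time, so treat it as a rough check rather than a precise benchmark; dedicated speed-testing tools give more reliable figures.

```python
# Rough TTFB check using only the Python standard library.
# The URLs below are placeholders; substitute your own pages.
import time
import urllib.request

def measure_ttfb(url: str) -> float:
    """Return an approximate time-to-first-byte in milliseconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=10) as response:
        response.read(1)  # block until the first byte of the body arrives
    return (time.perf_counter() - start) * 1000

if __name__ == "__main__":
    for page in ("https://www.example.com/", "https://www.example.com/blog/"):
        print(f"{page}: {measure_ttfb(page):.0f} ms")
```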
- Site Structure
Proper site structure makes navigation easier for users and for crawl bots alike. Your navigation and internal linking determine how crawl-friendly your site is.
A simple, logical hierarchy with major categories, subcategories, and individual pages works best for both visitors and crawl bots. Site structure becomes an issue on larger sites with faceted navigation, or when on-site searches filter content through user-selected parameters.
To improve this factor, restructure your site for SEO so that filter and parameter combinations don't create millions of near-duplicate URLs that confuse bots and eat up your crawl budget.
- Content
Low-value pages, outdated content, spam, and duplicate content all squander your valuable crawl equity.
Ensuring that you have original, value-adding, high-quality content on each of your pages prevents crawlers from missing out on your site's most important sections.
What Are Server Logs?
Let’s jump to a different concept for a moment.
Whenever a user (or crawl bot) accesses your site, your server writes an entry to its log: a record of all the requests the server receives during a particular time frame.
Server logs contain a lot of incredibly useful data that you can use to improve your website design and strategy.
Here’s an example of a server log entry from Wikipedia:
- 127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
So here’s a breakdown of what each part means:
- 127.0.0.1: The IP address of the remote host that requested access to your site.
- user-identifier: The client’s RFC 1413 identity.
- frank: The requester’s user ID.
- [10/Oct/2000:13:55:36 -0700]: This is the timestamp of the request, including the date, exact time, and time zone.
- GET /apache_pb.gif HTTP/1.0: The request line. GET is one of the two most common HTTP methods (the other is POST), and both tell you something about the user's behavior: GET means the user tried to retrieve data, in this case the resource at /apache_pb.gif, while POST means the user submitted something to the site, such as a form or a comment. HTTP/1.0 is the version of the HTTP protocol used for the request.
- 200: The status code that your site returned. A 200 code means the request was successful, a 3xx code indicates a redirect, a 4xx code a client error, and a 5xx code a server error.
- 2326: The size, in bytes, of the response returned to the client.
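To make the breakdown concrete, here is a minimal sketch (Python standard library only; the field names are my own) of how you could pull those fields out of a log line in this format with a regular expression:

```python
import re

# Fields of the log entry discussed above:
# host, identity, user, timestamp, request line, status code, response size.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<identity>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

line = '127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'

match = LOG_PATTERN.match(line)
if match:
    entry = match.groupdict()
    method, path, protocol = entry["request"].split()
    print(entry["host"], entry["timestamp"], method, path, entry["status"], entry["size"])
```

The exact layout varies by web server and configuration, so check your own log format before relying on a pattern like this.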
How Can This Help Your SEO?
Server logs give you a wealth of information that you can use to understand crawler bot behavior. If you filter the records to isolate the search engine spiders, you’ll get a detailed and very accurate view of how they crawl pages.
The insights you gain from analyzing server logs will help you make the necessary improvements to your site, rank higher in the SERPs, get more traffic, convert clicks into leads, and convert leads into sales.
Here are some of the things that you can discover during a detailed server log analysis:
- How often Google crawls a specific directory
- Performance issues, long load times, or common server errors
- Broken links and duplicate content
- Pages that are crawled too often
- Pages that aren't crawled often enough
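As an illustration of the first item on that list, here is a minimal sketch that counts how often a visitor identifying itself as Googlebot hits each top-level directory. It assumes your server writes the common "combined" log format (the fields described earlier plus the referrer and user agent in quotes); the log file path is a placeholder, and for a real audit you would also verify Googlebot with a reverse DNS lookup rather than trusting the user-agent string alone.

```python
import re
from collections import Counter

# Combined log format: the fields described earlier, followed by the
# referrer and user-agent strings in quotes.
COMBINED_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

crawled_dirs = Counter()
with open("access.log", encoding="utf-8", errors="replace") as log_file:  # placeholder path
    for line in log_file:
        match = COMBINED_PATTERN.match(line)
        if not match:
            continue
        if "Googlebot" not in match["agent"]:  # naive filter on the user-agent string
            continue
        top_dir = "/" + match["path"].lstrip("/").split("/", 1)[0]
        crawled_dirs[top_dir] += 1

for directory, hits in crawled_dirs.most_common(10):
    print(f"{directory}: {hits} Googlebot requests")
```

The same loop can be extended to tally status codes per URL, which covers the error-spotting points on the list as well.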
How to Make the Most of Your Crawl Budget
Because your crawl budget is finite, it's crucial to optimize how it's spent and steer search engine spiders toward your most important pages.
Here are some basic tips for tweaking your crawl equity to your advantage; a small link-checking sketch follows the list.
- Find and fix broken links, errors, and redirects, including soft 404 error pages that may have duplicate content or pages detached from your site structure.
- Replace temporary 302 redirects with permanent 301 redirects. 301 redirects will prevent Google from re-crawling the page too often.
- Spot and address 400/500 status code errors immediately.
- Use the rel="nofollow" attribute to discourage bots from crawling duplicate content, or use rel="canonical" to consolidate indexing signals on the preferred version of a URL.
- Remove low-value or duplicate content, or return a 404/410 status code on those pages.
- If you’re using faceted navigation, use URL parameters instead of directories or file paths to display filtered content.
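For the first tip above, here is a minimal sketch that checks a handful of URLs (placeholders below) with HEAD requests and flags anything that doesn't return a plain 200. It uses only the Python standard library; a dedicated crawler will of course do this at much larger scale, and a few servers answer HEAD differently from GET, so double-check surprising results.

```python
# Flag broken links and redirects for a short list of URLs (placeholders).
import urllib.request
import urllib.error

URLS_TO_CHECK = [
    "https://www.example.com/",
    "https://www.example.com/old-category/",
    "https://www.example.com/missing-page/",
]

def check(url: str):
    """Return (status code, final URL after any redirects)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.geturl()
    except urllib.error.HTTPError as error:   # 4xx / 5xx responses
        return error.code, url
    except urllib.error.URLError:             # DNS failures, timeouts, etc.
        return 0, url

for url in URLS_TO_CHECK:
    status, final_url = check(url)
    if status != 200:
        print(f"{url} -> {status}")
    elif final_url != url:
        print(f"{url} redirects to {final_url}")
```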
Take A Look Under the Hood
If your site isn’t performing as well as you wish, even after optimizing and redesigning your content, then it might be time to look at your server logs.
Understanding crawl budgets and log analysis is a crucial step towards better SEO. You can’t increase your crawl budget overnight, but you can make the most of the crawl budget you already have.
Disclaimer: This is a guest post by Tony Atkins from CixxFive. The opinions and ideas expressed herein are the author's own and in no way reflect Cloudways' position.
Umair Hussain Siddiqui
Umair Hussain is a Digital Marketer with a Computer Science background working at Cloudways, a managed cloud hosting platform. He is internet savvy and loves to dig into search engine optimization. In his free time, he likes to watch sci-fi and mind-bending time-travel movies and series. You can ping him at [email protected]