How to Block AI Crawlers That Slow Down Your Website and Waste Bandwidth

Updated on April 23, 2025

TL;DR: You can’t fully block AI scrapers, but robots.txt, firewall rules, and Cloudflare’s bot controls help limit them. Cloudways users get built-in bot blocking via the Imunify360 WAF.

Not all website traffic is good traffic. While human visitors engage with your content, AI crawlers quietly scan your pages—often without your permission. These bots, like OpenAI’s GPTBot, Applebot, CCBot, Google-Extended, and Bytespider, are built to collect data for AI models and search tools.

Sure, when AI bots crawl your website, they may include your content in AI-generated responses on platforms like ChatGPT. But when too many of these bots hit your site at once, they don’t just skim your content; they eat up bandwidth and slow everything down for real visitors.

No surprise, then, that over 35% of the world’s top 1,000 websites are now blocking GPTBot, according to data from Originality.ai. Site owners are starting to push back, setting boundaries on how their content is accessed.

If you’re ready to do the same, this guide breaks down four easy and effective ways to stop unwanted AI bots from draining your site’s performance.

But before we discuss how to block AI traffic from your site, let’s get into what AI bots are, their types, and what user agents are.

What Are AI Crawler Bots?

AI crawler bots are automated programs that visit websites to collect content. This data is used to train the large language models (LLMs) behind tools like chatbots and AI-driven search.

Unlike traditional crawlers like Googlebot that index content for search results, AI crawlers are focused on collecting massive volumes of content to improve how AI understands and responds to language.

And their presence is rapidly growing. According to traffic data from Vercel’s network, AI crawlers are now responsible for a large share of automated requests hitting their infrastructure.

In just one month, OpenAI’s GPTBot made 569 million fetches, followed by Anthropic’s Claude at 370 million. Applebot registered 314 million, while PerplexityBot added another 24.4 million.

While these numbers don’t yet rival Googlebot’s 4.5 billion requests across Gemini and Search, the combined volume from these AI crawlers amounts to nearly 1.3 billion fetches—or about 28% of what Googlebot generated—based on Vercel’s traffic insights.
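To make the comparison easy to check, here is a small sketch that redoes the arithmetic with the figures quoted above. The variable names are my own; the numbers are the ones attributed to Vercel’s data.

```python
# Monthly fetch counts quoted above, in millions.
ai_fetches_millions = {
    "GPTBot": 569.0,
    "Claude": 370.0,
    "Applebot": 314.0,
    "PerplexityBot": 24.4,
}
googlebot_millions = 4500.0  # 4.5 billion requests

# Add up the AI crawlers and compare the total with Googlebot.
total_ai = sum(ai_fetches_millions.values())
share = total_ai / googlebot_millions

print(f"Combined AI crawler fetches: {total_ai:.1f} million")
print(f"Share of Googlebot volume: {share:.1%}")
```

The total comes to roughly 1,277 million fetches, which is where the "nearly 1.3 billion" and "about 28%" figures come from.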

The rise in this kind of traffic isn’t slowing down. The DV Fraud Lab reported that bot activity nearly doubled in the second half of 2024, with December alone showing a 70% jump compared to the same month the year before—crossing over 2 billion ad requests.

As more AI crawlers show up, they eat into a site’s bandwidth. That extra load can slow things down for real visitors.

What Is a User Agent?

After talking about AI crawlers, you might wonder how websites even know what crawler is visiting them. That’s why it’s important to talk about user agents.

A user agent is like a name tag for any software that connects to a website. It tells the site what kind of program is making the request—whether it’s a browser like Chrome or Firefox, or a bot like GPTBot or Googlebot. Every time a crawler or browser loads a page, it sends a small string of text that identifies itself. That’s the user agent.

For example, if you’re browsing with Chrome, your browser will send a user agent string saying it’s Chrome, which version it is, and what system you’re on. AI crawlers do the same, which helps website owners understand who—or what—is visiting their site.
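As a rough illustration, the sketch below checks a user agent string for a few of the bot names mentioned in this article. The function name and the sample strings are made up for the example, and real user agent strings vary by version and platform.

```python
from typing import Optional

# Substrings that identify some of the AI crawlers named in this article.
AI_BOT_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider", "PerplexityBot")


def identify_ai_bot(user_agent: str) -> Optional[str]:
    """Return the matching bot token, or None if no known token is present."""
    lowered = user_agent.lower()
    for token in AI_BOT_TOKENS:
        if token.lower() in lowered:
            return token
    return None


# Illustrative strings only, not copied from real traffic.
browser_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0 Safari/537.36"
crawler_ua = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"

print(identify_ai_bot(browser_ua))  # None
print(identify_ai_bot(crawler_ua))  # GPTBot
```

Keep in mind that any client can send any user agent string it likes, so a match is a useful hint rather than proof of who is visiting.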

Knowing the user agent can help site owners block crawlers that are using up too many resources.

Types of AI Bots

Companies now rely on different kinds of AI bots to gather and process information online. Here’s a breakdown of the main types you’ll come across:

| Bot Type | What They Do | Examples |
|---|---|---|
| Chat Bots | Designed to respond to user queries using AI. They rely on content they’ve been trained on. | ChatGPT-User (OpenAI), Meta-ExternalFetcher |
| Data Collectors | Scan websites to collect large sets of written content for training AI models. | Applebot, Common Crawl, ClaudeBot |
| Search Crawlers | Analyze pages for keywords, links, and structure to help power AI search tools. | PerplexityBot, BingPreview |

Why You Might Want to Block AI Crawlers

As mentioned earlier, AI crawlers don’t just visit your site once and leave quietly—they often scan multiple pages, pulling large chunks of content. That repeated activity can eat up bandwidth and slow things down, especially if your server isn’t built to handle a constant stream of automated visits.
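If you want to see how much of your own traffic this is, one option is to tally requests and bytes per bot from your access log. The sketch below assumes the common "combined" log format used by Apache and Nginx. The sample lines, IP addresses, and sizes are invented for the example.

```python
import re
from collections import defaultdict

AI_BOT_TOKENS = ("GPTBot", "ClaudeBot", "Bytespider", "PerplexityBot", "CCBot")

# Tail of a combined-format line: "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(r'"[^"]*" (\d{3}) (\d+|-) "[^"]*" "([^"]*)"$')


def tally_ai_traffic(lines):
    """Return {bot: [request_count, bytes_sent]} for known AI crawlers."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        match = LOG_PATTERN.search(line.strip())
        if not match:
            continue  # skip lines that are not in the expected format
        _status, size, user_agent = match.groups()
        for token in AI_BOT_TOKENS:
            if token.lower() in user_agent.lower():
                totals[token][0] += 1
                totals[token][1] += 0 if size == "-" else int(size)
                break
    return dict(totals)


# Made-up sample lines using reserved documentation IP addresses.
sample_log = [
    '192.0.2.10 - - [23/Apr/2025:10:00:00 +0000] "GET /blog/ HTTP/1.1" 200 51200 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '192.0.2.10 - - [23/Apr/2025:10:00:01 +0000] "GET /pricing/ HTTP/1.1" 200 20480 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [23/Apr/2025:10:00:02 +0000] "GET / HTTP/1.1" 200 10240 "-" "Mozilla/5.0 (Windows NT 10.0) Chrome/124.0"',
    '203.0.113.5 - - [23/Apr/2025:10:00:03 +0000] "GET /docs/ HTTP/1.1" 200 30720 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
]

for bot, (requests, size) in tally_ai_traffic(sample_log).items():
    print(f"{bot}: {requests} requests, {size / 1024:.0f} KiB")
```

To run it against a real log, you could pass an open file object instead of `sample_log`. The log path and format depend on your server setup.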

But that’s not the only concern. When these bots scrape your content, it often ends up powering AI tools—like AI search features or overviews shown directly in search results.

The problem?

If users are already seeing your content summarized or repackaged somewhere else, they may never click through to your site. So even if your content is solid, your traffic can take a hit.

Blocking AI crawlers gives you a way to take back a bit of control—keeping your content from being used where you didn’t intend, and also helping protect your server’s resources at the same time.

Will Blocking AI Bots Hurt My Google Rankings?

No, blocking AI crawlers doesn’t impact how your site ranks in regular Google Search. These bots are different from Googlebot, which is responsible for indexing your pages. As long as Googlebot isn’t blocked, your content will still appear in search results like it normally does. Blocking AI crawlers just stops them from using your content to train language models or power AI summaries.

How to Block AI Crawlers [4 Methods]

There are a few ways to keep AI bots off your site. You can set rules in your robots.txt, block IPs using a firewall, or use a CDN like Cloudflare, which offers Bot Fight Mode and an AI Scrapers and Crawlers blocker.

If you’re on Cloudways managed cloud hosting, bot protection is handled for you as part of the fully managed setup, with no manual effort needed.

Let’s now walk through each option a little more closely.

Method #1: Use Robots.txt (Manual Method)

The most straightforward way to block AI bots is by adding a short rule to your robots.txt file:

User-agent: name-of-bot
Disallow: /

For example, to block OpenAI’s crawler, you’d write:

User-agent: GPTBot
Disallow: /

Let’s quickly go through how to create and upload this file. I’ll use FileZilla for this walkthrough, though there are plenty of other ways you can access your site files.

Steps to Set Up a Robots.txt File with FileZilla

1. Connect to Your Site

  • Open FileZilla and connect using your FTP credentials. I’m using Cloudways, so I’ll navigate to Server Management, select Master Credentials to find my SSH/SFTP details, and enter them to log in.

  • Once connected, go to your website’s root directory—this is usually called public_html or may just be your domain name.

2. Create the File

  • Right-click in the file area and create a new file named robots.txt. If you already have one, just right-click and choose “View/Edit” to open it up. I already have a robots.txt file, so I’ll edit it.

3. Add the Rules

Open the file in a basic text editor like Notepad. Then, type the rule like this:

User-agent: BotName
Disallow: /

Replace BotName with the name of the bot you want to block. For example, to block GPTBot, just write:

User-agent: GPTBot
Disallow: /

4. Save and Upload

Once you’re done, save the file and upload the updated version to your server.

5. Test It

To check if it’s live, go to your browser and type: yourdomain.com/robots.txt. If the file loads and shows the rules you added, you’re all set.
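Loading the file in a browser only confirms that it exists. To check how a parser actually reads the rules, here is a minimal sketch using Python’s standard `urllib.robotparser` module. The domain is a placeholder, and the rules are passed in as text so the example runs without a network connection.

```python
from urllib.robotparser import RobotFileParser

# The GPTBot rule from this guide, as it would appear in robots.txt.
robots_txt = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# example.com is a placeholder domain.
print(parser.can_fetch("GPTBot", "https://example.com/blog/"))     # False
print(parser.can_fetch("Googlebot", "https://example.com/blog/"))  # True
```

To test a live site instead, the same class offers `set_url()` and `read()` to download the file first. Also note that the `User-agent` and `Disallow` lines should sit on adjacent lines, since some parsers treat a blank line as the end of a rule group.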

How to Block the Most Common AI Bots

In the earlier example, we looked at how to block just one crawler (OpenAI’s GPTBot). Let’s now look at how to block the most common AI crawler bots:

1. ChatGPT-User

What it does: Crawls websites when ChatGPT users request summaries or citations.

How to block:

User-agent: ChatGPT-User
Disallow: /

2. Meta-ExternalFetcher

What it does: Fetches web content for Meta’s AI tools (Facebook, Instagram).

How to block:

User-agent: Meta-ExternalFetcher
Disallow: /

3. ClaudeBot

What it does: Gathers data to train Anthropic’s Claude AI.

How to block:

User-agent: ClaudeBot
Disallow: /

4. GPTBot

What it does: Scrapes web content to train OpenAI’s models like ChatGPT.

How to block:

User-agent: GPTBot
Disallow: /

5. Google-Extended

What it does: Controls whether content Google crawls can be used for its AI products (Gemini, Vertex AI). It is a robots.txt token rather than a separate crawler.

How to block:

User-agent: Google-Extended
Disallow: /

6. Bytespider

What it does: Scrapes content for TikTok’s parent company (ByteDance).

How to block:

User-agent: Bytespider
Disallow: /

7. PerplexityBot

What it does: Indexes web pages for Perplexity AI’s search answers.

How to block:

User-agent: PerplexityBot
Disallow: /

8. Applebot-Extended

What it does: Trains Apple’s AI models (Siri, Apple Intelligence).

How to block:

User-agent: Applebot-Extended
Disallow: /

9. Amazonbot

What it does: Powers Alexa’s search results.

How to block:

User-agent: Amazonbot
Disallow: /

10. Diffbot

What it does: Extracts and sells website data for AI training.

How to block:

User-agent: Diffbot
Disallow: /

11. CCBot

What it does: Builds open datasets for AI training (Common Crawl).

How to block:

User-agent: CCBot
Disallow: /

12. Scrapy

What it does: Default user agent of the open-source Scrapy framework, which is often used to scrape sites for datasets.

How to block:

User-agent: Scrapy
Disallow: /

13. YouBot

What it does: Crawls for You.com’s AI search results.

How to block:

User-agent: YouBot
Disallow: /

14. OAI-SearchBot

What it does: Indexes content for OpenAI’s SearchGPT.

How to block:

User-agent: OAI-SearchBot
Disallow: /

15. FacebookBot

What it does: Trains Meta’s AI speech recognition.

How to block:

User-agent: FacebookBot
Disallow: /

16. Applebot

What it does: Indexes web content for Siri’s answers.

How to block:

User-agent: Applebot
Disallow: /

17. Meta-ExternalAgent

What it does: Scrapes data for Meta’s AI projects.

How to block:

User-agent: Meta-ExternalAgent
Disallow: /

18. Omgili

What it does: Sells crawled data for AI training (Webz.io).

How to block:

User-agent: omgili
Disallow: /

19. Anthropic-AI

What it does: Suspected crawler for Anthropic’s AI models.

How to block:

User-agent: anthropic-ai
Disallow: /

20. Claude-Web

What it does: Unconfirmed crawler for Claude AI.

How to block:

User-agent: Claude-Web
Disallow: /

21. Cohere-AI

What it does: Likely crawls for Cohere’s AI tools.

How to block:

User-agent: cohere-ai
Disallow: /

22. Ai2Bot

What it does: Crawls domains to train language models.

How to block:

User-agent: Ai2Bot
Disallow: /

23. Ai2Bot-Dolma

What it does: Gathers web data for AI training (Ai2).

How to block:

User-agent: Ai2Bot-Dolma
Disallow: /

24. FriendlyCrawler

What it does: Unknown purpose, possibly for ML experiments.

How to block:

User-agent: FriendlyCrawler
Disallow: /

25. Timpibot

What it does: Scrapes data for AI model training (Timpi).

How to block:

User-agent: Timpibot
Disallow: /

26. Webzio-Extended

What it does: Sells crawled data for AI training (Webz.io).

How to block:

User-agent: Webzio-Extended
Disallow: /
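If you’d rather not paste 26 separate rules by hand, a short script can build the combined file. This is only a sketch: the function name is my own, and the list simply repeats the user agents covered above.

```python
# User agents from the list above, in the same order.
AI_USER_AGENTS = [
    "ChatGPT-User", "Meta-ExternalFetcher", "ClaudeBot", "GPTBot",
    "Google-Extended", "Bytespider", "PerplexityBot", "Applebot-Extended",
    "Amazonbot", "Diffbot", "CCBot", "Scrapy", "YouBot", "OAI-SearchBot",
    "FacebookBot", "Applebot", "Meta-ExternalAgent", "omgili", "anthropic-ai",
    "Claude-Web", "cohere-ai", "Ai2Bot", "Ai2Bot-Dolma", "FriendlyCrawler",
    "Timpibot", "Webzio-Extended",
]


def build_robots_txt(user_agents):
    """Return robots.txt text with one 'Disallow: /' group per user agent."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in user_agents]
    # A blank line separates one group from the next.
    return "\n\n".join(groups) + "\n"


robots_txt = build_robots_txt(AI_USER_AGENTS)
print(robots_txt)
```

You can trim the list to the bots you actually want to block, then paste the output into your existing robots.txt rather than overwriting any rules you already have.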

How to Block Everything at Once

To block all crawlers, including non-AI bots and search engine crawlers like Googlebot, add this to robots.txt. Only use it if you don’t want your site crawled at all:

User-agent: *
Disallow: /
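One side effect worth checking before you use the wildcard: it applies to every compliant crawler, search engines included. The sketch below uses Python’s standard `urllib.robotparser` to show how the rule is read for a few user agents. The domain is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Parse the wildcard rule exactly as written above.
parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /"])

# Check an AI crawler and two search engine crawlers against the same rule.
for agent in ("GPTBot", "Googlebot", "Bingbot"):
    allowed = parser.can_fetch(agent, "https://example.com/")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

All three come back as blocked, which is why the per-bot rules are usually the safer choice if you still want to show up in regular search.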

Method #2: Use a Firewall

A firewall gives you direct control over what gets through to your site—and what doesn’t. One way to slow down or block AI crawlers is by identifying and denying access to known IP addresses they use. It’s not bulletproof, since bots often rotate IPs, but it’s a decent first layer of defense.
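To show the idea behind IP-based blocking, here is a minimal sketch using Python’s standard `ipaddress` module. The blocklist is hypothetical and uses reserved documentation ranges, not real crawler addresses. In practice you’d take the ranges from the bot operator’s published list or your own logs, and enforce them in the firewall itself.

```python
import ipaddress

# Hypothetical blocklist using reserved documentation ranges, not real crawler IPs.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED_NETWORKS)


print(is_blocked("192.0.2.44"))   # True
print(is_blocked("203.0.113.9"))  # False
```

Matching on whole ranges rather than single addresses helps a little with rotating IPs, but the list still needs regular updates.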

You can also set up your firewall to trigger CAPTCHAs for suspicious traffic. This stops most automated tools from getting through, helping you filter out non-human visits before they reach your site.

If you’re hosting your site on Cloudways, you’re already covered with a managed web application firewall (WAF) powered by Imunify360. It works in the background to block bad bots automatically. We’ll talk more about this further down.

Method #3: Use a CDN (Automated Option)

Aside from speeding up your site, a CDN can also filter out bots. At Cloudways, we’ve partnered with Cloudflare to offer their Enterprise add-on at just $4.99/month. That’s a huge drop from the usual $200+ price tag for the same plan.

As I mentioned earlier, Cloudflare includes two key features for blocking unwanted bot traffic: Bot Fight Mode and AI Scrapers and Crawlers. If you’re using Cloudflare directly, you can turn them on from the Security > Bots section in your Cloudflare dashboard.

But if you’re using the Cloudflare Enterprise add-on through Cloudways, we take care of everything for you. The setup is fully managed, which means we actively monitor and block suspicious bots from the backend—you won’t have to worry about doing anything from your end.

Method #4: Use Cloudways to Block AI Bots From Scraping Your Site (Hands-off Option)

At Cloudways, our managed Web Application Firewall (WAF), powered by Imunify360, is built to keep bots out without getting in the way of real visitors.

It works by using an Anti-Bot Challenge that filters out unwanted traffic before it hits your site.

Most bots fail this step and never reach your WordPress, Drupal, or other web applications—saving your server resources and protecting you from spam, scans, and automated attacks.

Legit users won’t even notice anything. They’ll get your content right away. The system quietly checks for basic browser support like JavaScript and cookies to confirm it’s a human.

Good bots, like Google’s, are left alone. The best part is that Cloudways WAF comes free with our Flexible plan.

Want more control?

You can manually block IPs or countries directly from the Cloudways dashboard.

And if you’re looking to lock things down even tighter, pairing this with our Cloudflare Enterprise add-on gives you an extra firewall layer that catches all kinds of bot traffic.

Conclusion

You can’t yet fully block AI tools from accessing your content, but you can cut down on the number of AI bots hitting your site. Starting with robots.txt is a good step, since many AI crawlers follow the rules you define in your file.

A good CDN like Cloudflare helps too, especially with its bot blocking features that keep low-quality traffic out.

Firewalls also give you control over who gets in, and when paired with CAPTCHA, they’re pretty effective at sorting out humans from bots.

And if you’re using Cloudways, our managed WAF (offered for free with the Cloudways Flexible plan), powered by Imunify360, blocks AI bots behind the scenes, with no manual setup needed.

Combined, these steps help you cut down on unwanted scrapers, save bandwidth, and keep AI crawlers from draining your site’s resources.

Abdul Rehman

Abdul is a tech-savvy, coffee-fueled, and creatively driven marketer who loves keeping up with the latest software updates and tech gadgets. He's also a skilled technical writer who can explain complex concepts simply for a broad audience. Abdul enjoys sharing his knowledge of the Cloud industry through user manuals, documentation, and blog posts.
