A significant portion of the visitors to your site are not human. In 2017, more than 50% of internet traffic came from bots, and while that share has dropped somewhat in recent years, it’s still estimated that more than 40% of all internet traffic comes from bots, with around 25% coming from malicious, bad bots. Cybercriminals use these malicious bots to launch various types of bot attacks that can cause extremely costly and often unmanageable damage to both a business’s finances and its reputation.
Blocking these bots would seem to be the best, most cost-effective way to protect your website and systems from various types of cybersecurity attacks; however, the reality is not that simple.
Why Blocking All Bots Isn’t Always The Best Idea
First, what is a bot? Or to be more exact, what is an internet bot?
In a nutshell, a “bot” is simply a piece of software programmed to perform an automated process. This process is typically simple but repetitive, and the main advantage of a bot is that it can execute the process at a much faster rate than any human user.
That being said, there are bots owned by legitimate businesses that are actually beneficial to your website. Googlebot, for example, has the important task of crawling and indexing your site so it can be ranked on Google’s search results pages.
We wouldn’t want to accidentally block these good bots, and the problem is that differentiating between good bots and malicious bots isn’t always easy.
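For well-known good bots there is at least one reliable check: Google documents that a genuine Googlebot request can be verified with a reverse DNS lookup on the client IP followed by a forward DNS lookup on the resulting hostname. Here is a minimal Python sketch of that check (the function name is illustrative, not part of any library):

```python
import socket

def is_verified_googlebot(ip_address: str) -> bool:
    """Verify a claimed Googlebot IP via reverse DNS, then forward DNS.

    1. Reverse-resolve the IP to a hostname.
    2. Check the hostname belongs to googlebot.com or google.com.
    3. Forward-resolve that hostname and confirm it maps back to the IP.
    """
    try:
        hostname, _, _ = socket.gethostbyaddr(ip_address)
    except socket.herror:
        return False  # no reverse DNS record, so not a real Googlebot

    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False

    try:
        resolved_ips = socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False

    return ip_address in resolved_ips
```

A request that claims to be Googlebot in its user-agent string but fails this check is almost certainly an impersonator, which is a far stronger signal than the user-agent string alone.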
Nowadays, plenty of bot programmers also utilize the latest technologies (including AI and deep learning) to develop very sophisticated bots. These bots can, for example, use legitimate residential proxies to rotate between seemingly legitimate IP addresses while performing humanlike activities, such as non-linear mouse movements and randomized clicks.
As we can see, it’s already challenging enough to distinguish bot activity from legitimate users, let alone determine whether the identified bots are malicious in nature.
This is one of the two reasons why blocking all bots isn’t always a good idea: we wouldn’t want to accidentally block legitimate users or good bots, which is what we refer to as false positives. This is why having an adequately powerful bot management solution is crucial.
The second reason is that even if we’ve successfully identified a malicious bot, blocking it will not stop a persistent bot owner or operator. They will simply modify the bot to bypass your detection and blocking measures more effectively, and if you are not careful, they can use information you’ve accidentally provided (e.g., your error messages) to do so.
Instead, it’s typically better to mitigate and manage the malicious bot’s activities without the operator’s knowledge. This lets the bot keep wasting its resources, which is more effective at discouraging a persistent attacker.
Alternatives to Blocking Bots: Bot Mitigation Techniques
Creating and Configuring robots.txt
All bot management practices should start with creating and properly configuring a robots.txt file. This is a simple text file that lives at the root of your website and specifies rules and policies for bots accessing your site’s or application’s resources.
For instance, you can set rules defining which pages can be crawled by bots, and which ones can’t.
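For example, a minimal robots.txt might look like the following; the paths and sitemap URL are placeholders, not recommendations for your particular site:

```
# Keep all crawlers out of private or low-value areas of the site
User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /internal-search/

# Point well-behaved crawlers to the sitemap (placeholder URL)
Sitemap: https://www.example.com/sitemap.xml
```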
Good bots will follow the rules set by robots.txt, but malicious bots will not. In fact, bad bots might parse the content of the robots.txt file to get more information about your website in an attempt to find vulnerabilities and valuable data.
While robots.txt is not always an effective measure against malicious bots, it’s still very important to configure it properly to manage good bots making too many requests.
Feeding Fake Data
This method is fairly simple in principle; we keep the malicious bot active to waste its resources by replying to its requests with fake data.
For example, we can redirect the bot to a page similar to one it’s requesting but with thin or altered content. We can deliberately mislead the bot with wrong data values to poison its database, so the attacker will take the wrong action.
Operating bots consumes resources, which can be quite expensive for the bot operator. So, by slowing the bot down enough and letting it waste its resources, we can hopefully make the attacker give up and move on to other targets.
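As a rough sketch of what this can look like in practice, the snippet below assumes a Flask application and a hypothetical is_flagged_bot() helper backed by your actual detection logic; flagged requests receive plausible but deliberately wrong data instead of an error or a block:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

REAL_PRODUCTS = [{"sku": "A-1001", "price": 24.99, "stock": 12}]

def is_flagged_bot(req) -> bool:
    """Stand-in for your real bot-detection logic (fingerprinting,
    IP reputation, behavioural signals, etc.)."""
    return req.headers.get("X-Suspected-Bot") == "1"  # illustrative only

@app.route("/api/products")
def products():
    if is_flagged_bot(request):
        # Plausible-looking but deliberately wrong data: the scraper keeps
        # spending resources and poisons its own dataset.
        return jsonify([{"sku": "A-1001", "price": 13.37, "stock": 999}])
    return jsonify(REAL_PRODUCTS)
```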
Throttling or Rate Limiting
This is similar in practice to the approach above: we slow down the bot’s activities so it wastes resources. Here, however, we let the bot access the legitimate resource while throttling the bandwidth it is served. This can be effective against bots that persistently attack certain resources, and the idea remains the same: hopefully, we discourage the attacker enough that they move on to targets other than our website.
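One simple way to implement this, sketched below in Python, is a per-IP request log that adds a growing artificial delay once a client exceeds a small request budget; the thresholds here are illustrative, not recommendations:

```python
import time
from collections import defaultdict

# Hypothetical per-IP throttle: beyond a small request budget per minute,
# we keep serving the real content but add a growing artificial delay
# instead of returning an error that would tip the operator off.
WINDOW_SECONDS = 60
FREE_REQUESTS_PER_WINDOW = 30
DELAY_PER_EXTRA_REQUEST = 0.5  # seconds added per request over budget

_request_log = defaultdict(list)  # ip -> list of recent request timestamps

def throttle_delay(ip: str) -> float:
    """Return how long to sleep before answering this request."""
    now = time.monotonic()
    recent = [t for t in _request_log[ip] if now - t < WINDOW_SECONDS]
    recent.append(now)
    _request_log[ip] = recent

    extra = len(recent) - FREE_REQUESTS_PER_WINDOW
    return max(0, extra) * DELAY_PER_EXTRA_REQUEST

# Example use inside a request handler:
#   time.sleep(throttle_delay(client_ip))
#   ...then serve the normal response at a reduced effective rate.
```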
Challenging The Client With CAPTCHA
This is a common approach when we are not sure whether the client is a bot or a human user. CAPTCHA tests and similar challenges are designed to be very difficult for bots to solve but fairly easy for human users.
However, no matter how good a CAPTCHA is, it will always slow down legitimate users and hurt the user experience. Also, with the presence of CAPTCHA farms, it is no longer a perfect, bulletproof solution.
Still, it is an effective solution in specific use cases to protect your site from less advanced bots.
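If you use Google reCAPTCHA, for instance, the token the on-page widget returns must be verified server-side against Google’s siteverify endpoint. A minimal sketch, assuming the requests library and a placeholder secret key:

```python
import requests

RECAPTCHA_SECRET = "your-secret-key"  # placeholder, keep this server-side

def passes_captcha(captcha_token: str, client_ip: str) -> bool:
    """Verify a reCAPTCHA token submitted by the client."""
    resp = requests.post(
        "https://www.google.com/recaptcha/api/siteverify",
        data={
            "secret": RECAPTCHA_SECRET,
            "response": captcha_token,
            "remoteip": client_ip,  # optional, narrows the check
        },
        timeout=5,
    )
    return resp.json().get("success", False)
```

Ideally, you would only trigger the challenge when your detection signals flag a request as suspicious, so that most legitimate users never see it.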
Closing Thoughts: When To Block Bots and When To Manage Them
As a last resort, and in cases where we are almost 100% sure that the client is a malicious bot, outright blocking may be the best choice. When we block bots, we don’t need to process their traffic or implement any further rules and policies, making this a cost-effective solution.
However, blocking bots is only recommended where we have an adequate bot management and detection solution in place, preferably an AI-powered solution capable of effectively and consistently differentiating between good bots and bad bots.
Emma Yulini