A significant portion of the visitors to your site are not human. In 2017, more than 50% of internet traffic came from bots, and while that share has dropped somewhat in recent years, it’s still estimated that more than 40% of all internet traffic comes from bots, with around 25% coming from malicious, bad bots. Cybercriminals use these malicious bots to launch various types of bot attacks that can cause extremely costly and often unmanageable damage to both a business’s finances and its reputation.
Blocking these bots would seem to be the best, most cost-effective way to protect your website and systems from various types of cybersecurity attacks; however, the reality is not that simple.
Why Blocking All Bots Isn’t Always The Best Idea
First, what is a bot? Or to be more exact, what is an internet bot?
In a nutshell, a “bot” is simply a piece of software programmed to perform an automated process. This process is typically simple but repetitive, and the main advantage of a bot is that it can execute the process at a much faster rate than any human user.
That being said, there are bots owned by legitimate businesses that are actually beneficial to your website. Googlebot, for example, has the important task of crawling and indexing your site so it can be ranked on Google’s search results pages.
We wouldn’t want to accidentally block these good bots, and the problem is that differentiating between good bots and malicious bots isn’t always easy.
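For well-known good bots there is at least one reliable check: Google documents that a genuine Googlebot request can be verified with a reverse DNS lookup on the client IP followed by a forward DNS lookup on the resulting hostname. Here is a minimal Python sketch of that check (the function name is illustrative, not part of any library):

```python
import socket

def is_verified_googlebot(ip_address: str) -> bool:
    """Verify a claimed Googlebot IP via reverse DNS, then forward DNS.

    1. Reverse-resolve the IP to a hostname.
    2. Check the hostname belongs to googlebot.com or google.com.
    3. Forward-resolve that hostname and confirm it maps back to the IP.
    """
    try:
        hostname, _, _ = socket.gethostbyaddr(ip_address)
    except socket.herror:
        return False  # no reverse DNS record, so not a real Googlebot

    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False

    try:
        resolved_ips = socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False

    return ip_address in resolved_ips
```

A request that claims to be Googlebot in its user-agent string but fails this check is almost certainly an impersonator, which is a far stronger signal than the user-agent string alone.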
Nowadays, plenty of bot programmers also utilize the latest technologies (including AI and deep learning) to develop very sophisticated bots. These bots can, for example, use legitimate residential proxies to rotate between seemingly legitimate IP addresses while performing humanlike activities, such as non-linear mouse movements and randomized clicks.
As we can see, it’s already challenging enough to distinguish bot activity from legitimate users, let alone determine whether the identified bots are malicious in nature.
This is one of the two reasons why blocking all bots isn’t always a good idea: we wouldn’t want to accidentally block legitimate users or good bots, which is what we refer to as false positives. This is why having an adequately powerful bot management solution is crucial.
The second reason is that even if we’ve successfully identified a malicious bot, blocking it will not stop a persistent bot owner or operator. They will simply modify the bot to bypass your detection and blocking measures more effectively, and if you are not careful, they can use information you’ve accidentally provided (e.g., your error messages) to do so.
Instead, it’s typically better to mitigate and manage the malicious bot’s activities without the operator’s knowledge. This lets the bot keep wasting its resources, which is more effective at discouraging a persistent attacker.
Alternatives to Blocking Bots: Bot Mitigation Techniques
Creating and Configuring robots.txt
All bot management practices should start with creating and properly configuring a robots.txt file. This is a simple text file that lives at the root of your website and specifies rules and policies for bots accessing your site’s or application’s resources.
For instance, you can set rules defining which pages can be crawled by bots, and which ones can’t.
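For example, a minimal robots.txt might look like the following; the paths and sitemap URL are placeholders, not recommendations for your particular site:

```
# Keep all crawlers out of private or low-value areas of the site
User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /internal-search/

# Point well-behaved crawlers to the sitemap (placeholder URL)
Sitemap: https://www.example.com/sitemap.xml
```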
Good bots will follow the rules set by robots.txt, but malicious bots will not. In fact, bad bots might parse the content of the robots.txt file to get more information about your website in an attempt to find vulnerabilities and valuable data.
While robots.txt is not always an effective measure against malicious bots, it’s still very important to configure it properly to manage good bots making too many requests.
Feeding Fake Data
This method is fairly simple in principle; we keep the malicious bot active to waste its resources by replying to its requests with fake data.
For example, we can redirect the bot to a page similar to one it’s requesting but with thin or altered content. We can deliberately mislead the bot with wrong data values to poison its database, so the attacker will take the wrong action.
Operating bots consumes resources, which can be quite expensive for the bot operator. So, by slowing the bot down enough and letting it waste its resources, we can hopefully make the attacker give up and move on to other targets.
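As a rough sketch of what this can look like in practice, the snippet below assumes a Flask application and a hypothetical is_flagged_bot() helper backed by your actual detection logic; flagged requests receive plausible but deliberately wrong data instead of an error or a block:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

REAL_PRODUCTS = [{"sku": "A-1001", "price": 24.99, "stock": 12}]

def is_flagged_bot(req) -> bool:
    """Stand-in for your real bot-detection logic (fingerprinting,
    IP reputation, behavioural signals, etc.)."""
    return req.headers.get("X-Suspected-Bot") == "1"  # illustrative only

@app.route("/api/products")
def products():
    if is_flagged_bot(request):
        # Plausible-looking but deliberately wrong data: the scraper keeps
        # spending resources and poisons its own dataset.
        return jsonify([{"sku": "A-1001", "price": 13.37, "stock": 999}])
    return jsonify(REAL_PRODUCTS)
```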
Throttling or Rate Limiting
This is similar in practice to the approach above: we slow down the bot’s activities so it wastes resources. Here, however, we let the bot access the legitimate resource while throttling the bandwidth it is served. This can be effective against bots that persistently attack certain resources, and the idea remains the same: hopefully, we discourage the attacker enough that they move on to targets other than our website.
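One simple way to implement this, sketched below in Python, is a per-IP request log that adds a growing artificial delay once a client exceeds a small request budget; the thresholds here are illustrative, not recommendations:

```python
import time
from collections import defaultdict

# Hypothetical per-IP throttle: beyond a small request budget per minute,
# we keep serving the real content but add a growing artificial delay
# instead of returning an error that would tip the operator off.
WINDOW_SECONDS = 60
FREE_REQUESTS_PER_WINDOW = 30
DELAY_PER_EXTRA_REQUEST = 0.5  # seconds added per request over budget

_request_log = defaultdict(list)  # ip -> list of recent request timestamps

def throttle_delay(ip: str) -> float:
    """Return how long to sleep before answering this request."""
    now = time.monotonic()
    recent = [t for t in _request_log[ip] if now - t < WINDOW_SECONDS]
    recent.append(now)
    _request_log[ip] = recent

    extra = len(recent) - FREE_REQUESTS_PER_WINDOW
    return max(0, extra) * DELAY_PER_EXTRA_REQUEST

# Example use inside a request handler:
#   time.sleep(throttle_delay(client_ip))
#   ...then serve the normal response at a reduced effective rate.
```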
Challenging The Client With CAPTCHA
This is a common approach when we are not sure whether the client is a bot or a human user. CAPTCHA tests and similar challenges are designed to be very difficult for bots to solve but fairly easy for human users.
However, no matter how good a CAPTCHA is, it will always slow down legitimate users and hurt the user experience. Also, with the presence of CAPTCHA farms, it is no longer a perfect, bulletproof solution.
Still, it is an effective solution in specific use cases to protect your site from less advanced bots.
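If you use Google reCAPTCHA, for instance, the token the on-page widget returns must be verified server-side against Google’s siteverify endpoint. A minimal sketch, assuming the requests library and a placeholder secret key:

```python
import requests

RECAPTCHA_SECRET = "your-secret-key"  # placeholder, keep this server-side

def passes_captcha(captcha_token: str, client_ip: str) -> bool:
    """Verify a reCAPTCHA token submitted by the client."""
    resp = requests.post(
        "https://www.google.com/recaptcha/api/siteverify",
        data={
            "secret": RECAPTCHA_SECRET,
            "response": captcha_token,
            "remoteip": client_ip,  # optional, narrows the check
        },
        timeout=5,
    )
    return resp.json().get("success", False)
```

Ideally, you would only trigger the challenge when your detection signals flag a request as suspicious, so that most legitimate users never see it.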
Closing Thoughts: When To Block Bots and When To Manage Them
As a last resort, and in cases where we are almost 100% sure that the client is a malicious bot, outright blocking may be the best choice. When we block bots, we don’t need to process their traffic or implement any further rules and policies, making this a cost-effective solution.
However, blocking bots is only recommended where we have an adequate bot management and detection solution in place, preferably an AI-powered solution capable of effectively and consistently differentiating between good bots and bad bots.
Emma Yulini