Diffstat (limited to 'content/guide/beware-of-the-python-crawlers.md')
| -rw-r--r-- | content/guide/beware-of-the-python-crawlers.md | 142 |
1 files changed, 142 insertions, 0 deletions
diff --git a/content/guide/beware-of-the-python-crawlers.md b/content/guide/beware-of-the-python-crawlers.md
new file mode 100644
index 0000000..ad2f333
--- /dev/null
+++ b/content/guide/beware-of-the-python-crawlers.md
@@ -0,0 +1,142 @@
+---
+title: Beware of the Python Crawlers
+date: 2025-11-16T11:02:32+01:00
+deprecated: false
+---
+
+## Intro
+
+### I'll show you the problem
+
+Okay, so you know that when you host a website or have a server open to the internet, a lot of bots will interact with it.
+
+If you own a server or a virtual private server and host a website on it, check the `access.log` of your nginx server:
+
+```sh
+cat /var/log/nginx/access.log
+```
+
+You might find a whole bunch of lines: people browsing your website, Google, OpenAI and others fetching the `robots.txt`, etc.
+
+But there are also people who want to harm you. They are not trying to harm **you** specifically, they just run those awful **python crawlers** that make the same HTTP requests over and over. I will give you a quick example right now.
+
+Do this:
+
+```sh
+grep "404" /var/log/nginx/access.log
+```
+
+You will get a bunch of log lines.
+
+Here are some guys trying to find interesting stuff on my server (I am not going to show all the lines because today alone, in 12 hours, I got 10000 bot requests, no joke):
+
+```sh
+[SOME_IP_ADDRESS] - - [DATE] "GET /cgi-bin/info.php HTTP/1.1" 404 19 "-" "python-httpx/0.24.1"
+[SOME_IP_ADDRESS] - - [DATE] "GET /cgi-bin/phpinfo.php HTTP/1.1" 404 19 "-" "python-httpx/0.24.1"
+[SOME_IP_ADDRESS] - - [DATE] "GET /cgi-bin/info.php.save HTTP/1.1" 404 19 "-" "python-httpx/0.24.1"
+[SOME_IP_ADDRESS] - - [DATE] "GET /.env HTTP/1.1" 404 19 "-" "python-httpx/0.24.1"
+[SOME_IP_ADDRESS] - - [DATE] "GET /.env.local HTTP/1.1" 404 19 "-" "python-httpx/0.24.1"
+[SOME_IP_ADDRESS] - - [DATE] "GET /.git/config HTTP/1.1" 404 19 "-" "Python/3.10 aiohttp/3.13.1"
+[SOME_IP_ADDRESS] - - [DATE] "GET /.gitlab-ci.yml HTTP/1.1" 404 19 "-" "Python/3.10 aiohttp/3.13.1"
+```
+
+They try to get the git configuration of the repository (`.git/config`), your GitLab CI/CD script (`.gitlab-ci.yml`), your environment variables (`.env`), etc.
+
+Fortunately, I use a static site generator called [hugo](https://gohugo.io), so the root of this website is a folder that only contains html, css and images. I don't have any php, `.env` or cgi scripts at all, so they only get `404` errors.
+
+But please be careful about who makes GET and POST requests to your server. You can set up your NGINX server to block specific paths (for example, deny access to `yoursite.com/importantfile.txt`).
+
+### The thing
+
+I am tired of these bot requests. All this computing power could be used for something else, but instead they prefer to waste the resources of our servers.
+
+I knew that having a server would be something "risky" in itself, but I didn't know that so many bots would be pinging it, refreshing the page all the time, and sending POST, DELETE and other requests. It's basically wasted bandwidth.
+
+I even have some dude brute-forcing my email server.
+He has 30 IPs that all start with the same numbers, and he keeps trying to connect as 'common' users like `kevin`, `andrea`, `git`, `root`, `postmaster`, etc.
+
+I am tired of this, so let's block them.
+
+## fail2ban
+
+### Presentation
+
+[fail2ban](https://github.com/fail2ban/fail2ban) is a program that watches the logs of your programs (NGINX, dovecot, postfix, ssh, counter-strike, apache...) and searches them for specific regex patterns. These regexes are defined as filters: they "filter" the content of the logs, looking for matching lines.
+
+When an IP produces log lines that match a filter, Fail2ban uses your system's firewall to ban that IP for a certain amount of time.
+
+### Installation
+
+Install `fail2ban` with your package manager. You already have it if you used the [emailwiz script](https://github.com/Lukesmithxyz/emailwiz) from Luke Smith.
+
+```sh
+# On a Debian-based server (run as root)
+
+apt-get update
+apt-get install fail2ban -y
+```
+
+```sh
+systemctl enable --now fail2ban
+systemctl status fail2ban # should say active (running)
+```
+
+### Configuration
+
+After installing fail2ban, you should have tons of filters in `/etc/fail2ban/filter.d/`.
+
+Fail2ban uses "jails". A jail defines the rules for one service: which log should be analyzed, which filter to apply, whether someone who matches the filter should be banned immediately, for how long, etc.
+
+A default config is provided in `/etc/fail2ban/jail.conf`. Don't edit it, as your package manager will overwrite it if fail2ban's maintainers make changes to that file. Put your own settings in `/etc/fail2ban/jail.local` instead.
+
+> Before doing anything, please read the top of the "/etc/fail2ban/jail.conf" file. Watch some videos, read the ArchWiki, get your feet wet.
+
+### My configuration
+
+This is the configuration I use for fail2ban. You can use it and edit it as much as you want.
+
+```toml
+{{< include-remote url="https://codeberg.org/mielota/dox/raw/branch/main/etc/fail2ban/jail.local" >}}
+```
+
+After saving the changes you made to your `jail.local`, restart fail2ban.
+
+```sh
+systemctl restart fail2ban
+```
+
+### Using fail2ban client
+
+You can use `fail2ban-client` to do some cool things.
+
+1. List the IPs that are currently banned:
+
+```sh
+fail2ban-client banned
+```
+
+2. List the IPs that are banned in a given section (jail):
+
+```sh
+fail2ban-client status SECTION
+```
+
+Example:
+
+```sh
+fail2ban-client status nginx-limit-req
+fail2ban-client status dovecot
+```
+
+3. Unban an IP (useful if your server banned you):
+
+```sh
+fail2ban-client set SECTION unbanip IP_ADDRESS
+```
+
+## Conclusion
+
+It's a bit terrifying to see so many people trying to breach the security of your system. I had no idea that my poor server was suffering that much. This article is only here to warn you: check your logs.
+
+There are tons of tutorials on the web about fail2ban, I only covered the basics.
+
+This is not a guide about fail2ban, it's really just a warning.
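To make the "check your logs" advice a bit more concrete, here is a rough sketch that counts `404` responses per user agent, so the noisy clients stand out. It assumes nginx's default "combined" log format. `count_404_agents` is a made-up helper name for illustration, not part of nginx or fail2ban.

```sh
#!/bin/sh
# Hypothetical helper: count 404 responses per user agent.
# Assumes the nginx "combined" log format:
#   IP - - [DATE] "REQUEST" STATUS BYTES "REFERER" "USER_AGENT"
# Usage: count_404_agents /var/log/nginx/access.log
count_404_agents() {
    # Splitting on double quotes gives:
    #   $2 = request line, $3 = " STATUS BYTES ", $6 = user agent
    awk -F'"' '{
        split($3, f, " ")           # f[1] = status code, f[2] = bytes sent
        if (f[1] == "404") print $6 # keep only the user agent of 404 hits
    }' "$1" |
        sort | uniq -c | sort -rn   # most frequent user agents first
}
```

Unlike a plain `grep "404"`, this only looks at the status field, so a response that happens to be 404 bytes long or a URL containing `404` is not counted.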
