Diffstat (limited to 'content/guide')
-rw-r--r-- content/guide/beware-of-the-python-crawlers.md 142
1 files changed, 142 insertions, 0 deletions
diff --git a/content/guide/beware-of-the-python-crawlers.md b/content/guide/beware-of-the-python-crawlers.md
new file mode 100644
index 0000000..ad2f333
--- /dev/null
+++ b/content/guide/beware-of-the-python-crawlers.md
@@ -0,0 +1,142 @@
+---
+title: Beware of the Python Crawlers
+date: 2025-11-16T11:02:32+01:00
+deprecated: false
+---
+
+## Intro
+
+### I'll show you the problem
+
+Okay, so you know that when you host a website or have a server open to the internet, a lot of bots will interact with it.
+
+If you own a server or a virtual private server and host a website on it, check the `access.log` of your nginx server:
+
+```sh
+cat /var/log/nginx/access.log
+```
+
+You will probably find a whole bunch of lines: people browsing your website, Google, OpenAI and others fetching `robots.txt`, etc.
+
+But there are also people who can harm you. They are not targeting **you** specifically; they just run those awful **python crawlers** that make the same HTTP requests over and over. Here is a quick example.
+
+Run this:
+
+```sh
+grep '" 404 ' /var/log/nginx/access.log
+```
+
+You will get a bunch of log lines.
+
+Here is someone trying to find interesting stuff on my server (I am not going to show all the lines: today alone, in 12 hours, I got 10000 bot requests, no joke):
+
+```sh
+[SOME_IP_ADDRESS] - - [DATE] "GET /cgi-bin/info.php HTTP/1.1" 404 19 "-" "python-httpx/0.24.1"
+[SOME_IP_ADDRESS] - - [DATE] "GET /cgi-bin/phpinfo.php HTTP/1.1" 404 19 "-" "python-httpx/0.24.1"
+[SOME_IP_ADDRESS] - - [DATE] "GET /cgi-bin/info.php.save HTTP/1.1" 404 19 "-" "python-httpx/0.24.1"
+[SOME_IP_ADDRESS] - - [DATE] "GET /.env HTTP/1.1" 404 19 "-" "python-httpx/0.24.1"
+[SOME_IP_ADDRESS] - - [DATE] "GET /.env.local HTTP/1.1" 404 19 "-" "python-httpx/0.24.1"
+[SOME_IP_ADDRESS] - - [DATE] "GET /.git/config HTTP/1.1" 404 19 "-" "Python/3.10 aiohttp/3.13.1"
+[SOME_IP_ADDRESS] - - [DATE] "GET /.gitlab-ci.yml HTTP/1.1" 404 19 "-" "Python/3.10 aiohttp/3.13.1"
+```
+
+They try to get the git configuration of the repository (`.git/config`), your GitLab CI/CD script (`.gitlab-ci.yml`), files holding environment variables (`.env`), etc.
+
+Fortunately, I use a static site generator called [hugo](https://gohugo.io), so the root of this website is a folder that only contains HTML, CSS and images. I don't have any PHP, `.env` files or CGI scripts at all, so they only get `404` errors.
+
+But please be careful about who makes GET and POST requests to your server. You can set up your NGINX server to block specific paths (for example, deny access to `yoursite.com/importantfile.txt`).
+
+### The thing
+
+I am tired of these bot requests. All this computing power could be used for something else, but instead it wastes the resources of our servers.
+
+I knew that having a server would be "risky" in itself, but I didn't know that so many bots would be pinging it, refreshing pages all the time, and sending POST, DELETE and other requests. It's basically wasted bandwidth.
+
+I even have some dude brute-forcing my email server. He has 30 IPs all starting with the same numbers, and he keeps trying to log in as 'common' users like `kevin`, `andrea`, `git`, `root`, `postmaster`, etc.
+
+I am tired of this, so let's block them.
+
+## fail2ban
+
+### Presentation
+
+[fail2ban](https://github.com/fail2ban/fail2ban) is a program that watches the logs of your programs (NGINX, dovecot, postfix, ssh, counter-strike, apache...) and searches them for specific regex patterns. These regexes are defined as filters: they "filter" the content of the logs, looking for matching lines.
+
+Fail2ban uses your system's firewall to ban, for a certain amount of time, the IPs that produce log lines matched by a filter.
+
+### Installation
+
+Install `fail2ban` with your package manager. You already have it if you used the [emailwiz script](https://github.com/Lukesmithxyz/emailwiz) from Luke Smith.
+
+```sh
+# On a Debian-based server
+
+apt-get update
+apt-get install fail2ban -y
+```
+
+```sh
+systemctl enable --now fail2ban
+systemctl status fail2ban # should say active (running)
+```
+
+### Configuration
+
+After installing fail2ban, you should have tons of filters in `/etc/fail2ban/filter.d/`.
+
+Fail2ban uses "jails". They define the rules: which log should be analyzed, which filter to apply, and, if someone matches the filter, whether to ban him immediately and for how long.
+
+A default config is provided in `/etc/fail2ban/jail.conf`. Don't edit it, as your package manager may overwrite it when fail2ban's maintainers change that file. Put your own settings in `/etc/fail2ban/jail.local` instead.
+
+> Before doing anything, please read the top of the `/etc/fail2ban/jail.conf` file. Watch some videos, read the ArchWiki, get your feet wet.
+
+### My configuration
+
+This is the configuration I use for fail2ban. You can use it and edit it as much as you want.
+
+```ini
+{{< include-remote url="https://codeberg.org/mielota/dox/raw/branch/main/etc/fail2ban/jail.local" >}}
+```
+
+After saving the changes you made to your `jail.local`, restart fail2ban.
+
+```sh
+systemctl restart fail2ban
+```
+
+### Using fail2ban client
+
+You can use `fail2ban-client` to inspect and manage bans.
+
+1. To list the IPs that are currently banned:
+
+```sh
+fail2ban-client banned
+```
+
+2. To list the banned IPs per section (jail):
+
+```sh
+fail2ban-client status SECTION
+```
+
+Example:
+
+```sh
+fail2ban-client status nginx-limit-req
+fail2ban-client status dovecot
+```
+
+3. To unban an IP (useful if your server banned you):
+
+```sh
+fail2ban-client set SECTION unbanip IP_ADDRESS
+```
+
+## Conclusion
+
+It's a bit terrifying to see so many people trying to breach the security of your system. I had no idea my poor server was suffering that much. This article is only here to warn you: check your logs.
+
+There are tons of tutorials on the web about fail2ban; I only covered the basics.
+
+This is not a guide about fail2ban, it's really just a warning.