After implementing the new tarragon, the biggest problem I had involved the clamav package and its loading of signatures. If clamd doesn’t come up and open its socket, then amavisd (the daemon that postfix consults to handle all the checking of each piece of mail on input and output) will fail, assuming he is configured to do virus checking. This causes various problems. Amavis will mark the mail as “unchecked”, but worse, it reports failure back to postfix, who gets confused, and very often the message is delivered two or three times.
Clamd, the clamav daemon, now has over 6 million signatures. There are a lot of bad boys out there. On startup, clamd loads the signatures from its database (in /var/lib/clamav) into memory. As a result, clamd has a large memory footprint, almost 800 MB on my system. The first issue, discovered before going live, was that systemd by default (DefaultTimeoutStartSec) expects any daemon he starts to finish starting within 90 seconds. If the daemon fails to check in within that time, systemd considers it broken and terminates it. Clamd takes at least 3 minutes to load, so I had to set a longer TimeoutStartSec value in the systemd unit file for clamd@.service.
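One way to set this without touching the packaged unit file is a systemd drop-in, which survives package updates. The Python sketch below writes one. The 300-second value and the drop-in file name timeout.conf are assumptions (the post only says clamd needs at least 3 minutes), and installing it needs root. TimeoutStartSec and the clamd@.service.d directory convention are standard systemd: a drop-in directory named after a template unit applies to all of its instances.

```python
import subprocess
from pathlib import Path

# Assumption: 300 s (5 min) leaves headroom over the ~3 minutes clamd needs;
# the post does not say which value was actually used.
TIMEOUT_SECONDS = 300

# A drop-in directory named after the template unit covers every clamd@ instance.
DROPIN_DIR = Path("/etc/systemd/system/clamd@.service.d")


def render_dropin(timeout_seconds: int = TIMEOUT_SECONDS) -> str:
    """Return the text of a drop-in that raises the start timeout."""
    return f"[Service]\nTimeoutStartSec={timeout_seconds}\n"


def install_dropin(dropin_dir: Path = DROPIN_DIR) -> Path:
    """Write the drop-in (needs root) and have systemd re-read its unit files."""
    dropin_dir.mkdir(parents=True, exist_ok=True)
    target = dropin_dir / "timeout.conf"  # hypothetical file name
    target.write_text(render_dropin())
    subprocess.run(["systemctl", "daemon-reload"], check=True)
    return target


if __name__ == "__main__":
    # Print only; call install_dropin() as root to actually apply it.
    print(render_dropin(), end="")
```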
Whew, I thought. Boy, am I glad I figured that out. Hah!
The “out of the box” installation has freshclam running every couple of hours, downloading new signatures. Every time it runs, it updates the database in /var/lib/clamav. A short time later, clamd notices that an updated database is available and reloads the signatures from it. I have not looked at the code, but from watching this with top, I think that at the very end of the reload clamd actually forks to daemonize himself and briefly requires double the memory. This assertion is based on seeing it happen only once, though.
Suffice it to say that my next problem was that when clamd attempted to reload signatures, it would fail to obtain enough memory. Sometimes it would fail while reading the signatures from the database and constructing the in-memory structures. Sometimes it would manage all that but then fail to daemonize.
This instance is an old Amazon m3.medium, which has 3.75 GB of RAM. It is a reserved instance with a year left on the clock, so I don’t want to throw it away and buy a newer instance with more memory. I also don’t want to pay for a larger instance. This whole idea of providing services in the cloud for myself and everyone else is predicated on it being inexpensive. Moving to an instance with 8 GB of RAM would double the cost. I’m too cheap for that. I want it to run in what I have.
Using tools like top, I observed that after the system had been running for a while, memory usage would climb into the 90% range. The biggest memory hogs were clamd himself, who wants about 21%, and apache, who had 6 processes, a few of which were using as much as 15% each. In aggregate, apache was using 60%-65% of resident memory. Amavis, Tor and MySQL together accounted for 15%-20%. Do the math: using the minimums, that is 21 + 60 + 15 = 96%. Asking clamd to load new signatures when memory is already at 96% seems an unlikely proposition.
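The arithmetic is worth making explicit. The sketch below uses only figures from the post (3.75 GB of RAM, the observed minimum percentages, and apache under 10% after tuning) plus one assumption: that a signature reload briefly needs a second copy of clamd’s footprint, as the tentative observation about forking suggests.

```python
from typing import Mapping

TOTAL_GB = 3.75  # m3.medium RAM, from the post

# Minimum observed shares of resident memory, in percent (from top).
BEFORE_TUNING = {"clamd": 21, "apache": 60, "amavis+tor+mysql": 15}
# After the MPM changes, apache stays under 10%.
AFTER_TUNING = {"clamd": 21, "apache": 10, "amavis+tor+mysql": 15}

# Assumption: a reload briefly holds a second copy of clamd's signatures.
RELOAD_EXTRA_PCT = BEFORE_TUNING["clamd"]


def pct_to_gb(pct: float) -> float:
    """Convert a percentage of total RAM to gigabytes."""
    return TOTAL_GB * pct / 100


def reload_fits(usage: Mapping[str, float], extra_pct: float = RELOAD_EXTRA_PCT) -> bool:
    """True if current usage plus the reload's extra memory stays within RAM."""
    return sum(usage.values()) + extra_pct <= 100


if __name__ == "__main__":
    print(f"clamd footprint: {pct_to_gb(21):.2f} GB")
    for label, usage in (("before", BEFORE_TUNING), ("after", AFTER_TUNING)):
        used = sum(usage.values())
        print(f"{label}: {used}% used, {100 - used}% free, reload fits: {reload_fits(usage)}")
```

With the original apache settings, 96% plus another 21% overshoots RAM by a wide margin; after tuning, 46% + 21% = 67% leaves about a third of the machine free. The same 21% works out to roughly 0.79 GB, consistent with the “almost 800 MB” clamd footprint.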
An initial test was to do a graceful restart of apache, which dropped his memory usage to about 15%, and then start clamd. Clamd takes several minutes to load, but apache also takes quite a while to grow back, so the clamd load would succeed during that interval.
I set about trying to figure out why apache needed so much memory. There is quite a bit of complexity surrounding the parameters apache uses to decide how many processes to start, how many threads to run in each one, when to start new ones, how to handle keep-alive timers, and how to avoid having to start new processes to handle incoming requests. I read a good deal, and I decided that the out-of-the-box parameters I was using would support far more web activity than my small collection of “not very active” websites required.
For apache configuration files, I keep two additional configuration directories paralleling the standard ones. Apache’s two standard configuration directories are /etc/httpd/conf.d and /etc/httpd/conf.modules.d; in parallel I keep /etc/httpd/conf.d.dee and /etc/httpd/conf.modules.d.dee. When I need to modify something, such as the ssl parameters in /etc/httpd/conf.d/ssl.conf, I create a symlink in the standard directory (e.g. conf.d) pointing to my version in the parallel one (conf.d.dee).
So for this purpose I created a new file in conf.modules.d.dee called 10-mpmconfig.conf and symlinked to it from conf.modules.d. It contains these apache parameters:
<IfModule event.c>
ServerLimit 8
StartServers 1
ThreadsPerChild 10
ThreadLimit 15
MinSpareThreads 5
MaxSpareThreads 20
MaxRequestWorkers 80
MaxConnectionsPerChild 80
</IfModule>
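Several of these directives constrain one another, and apache adjusts or complains about values that conflict. The Python sketch below checks the block above against the relationships described in the Apache MPM documentation, as I understand them: ThreadsPerChild may not exceed ThreadLimit, MaxRequestWorkers may not exceed ServerLimit * ThreadsPerChild and should be a multiple of ThreadsPerChild, and MaxSpareThreads should be at least MinSpareThreads + ThreadsPerChild. The checker is a hypothetical helper, not part of apache.

```python
from typing import List, Mapping

# Values from the 10-mpmconfig.conf block above.
MPM_EVENT = {
    "ServerLimit": 8,
    "StartServers": 1,
    "ThreadsPerChild": 10,
    "ThreadLimit": 15,
    "MinSpareThreads": 5,
    "MaxSpareThreads": 20,
    "MaxRequestWorkers": 80,
    "MaxConnectionsPerChild": 80,
}


def check_event_mpm(c: Mapping[str, int]) -> List[str]:
    """Hypothetical checker: describe each pair of settings that conflict."""
    problems = []
    if c["ThreadsPerChild"] > c["ThreadLimit"]:
        problems.append("ThreadsPerChild exceeds ThreadLimit")
    if c["MaxRequestWorkers"] > c["ServerLimit"] * c["ThreadsPerChild"]:
        problems.append("MaxRequestWorkers exceeds ServerLimit * ThreadsPerChild")
    if c["MaxRequestWorkers"] % c["ThreadsPerChild"] != 0:
        problems.append("MaxRequestWorkers is not a multiple of ThreadsPerChild")
    if c["MaxSpareThreads"] < c["MinSpareThreads"] + c["ThreadsPerChild"]:
        problems.append("MaxSpareThreads is below MinSpareThreads + ThreadsPerChild")
    return problems


if __name__ == "__main__":
    print(check_event_mpm(MPM_EVENT) or "no conflicts found")
```

With ServerLimit 8 and ThreadsPerChild 10, the 80 in MaxRequestWorkers sits exactly at the ceiling, and a MaxSpareThreads of 20 leaves just enough room above 5 + 10 that shutting down one process never pushes the spare count below the minimum.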
Apache server-status shows that I rarely have more than 6-7 threads active, and I’ve never seen more than 10. With the standard out-of-the-box parameters I had something like 90 idle threads across 5 started server processes, plus the root process. With these settings I can accommodate up to 80 connections/threads (8 processes of 10 threads each), but in the normal case I will have only 1-2 server processes, each with 10 threads. If the number of spare threads drops below 5, apache starts a new process with 10 more; if there are more than 20 spare threads, it shuts a process down. The last parameter, MaxConnectionsPerChild, makes each child handle at most 80 connections before it exits and is replaced by another, in case of memory leaks.
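The spare-thread behaviour can be modelled in a few lines. This is a simplified steady-state sketch, not apache’s actual algorithm: the real MPM re-evaluates roughly once a second, ramps up spawning gradually, and retires children gracefully, none of which is modelled here. It only answers “how many processes do I end up with for N busy threads?” using the values from 10-mpmconfig.conf.

```python
# Values from 10-mpmconfig.conf.
SERVER_LIMIT = 8
START_SERVERS = 1
THREADS_PER_CHILD = 10
MIN_SPARE_THREADS = 5
MAX_SPARE_THREADS = 20


def steady_state_processes(busy_threads: int, processes: int = START_SERVERS) -> int:
    """Simplified model: the number of child processes the event MPM settles on
    for a constant number of busy threads (timing and spawn rate are ignored)."""

    def idle(p: int) -> int:
        return p * THREADS_PER_CHILD - busy_threads

    # Fewer than MinSpareThreads idle: add a child (10 more threads), up to ServerLimit.
    while idle(processes) < MIN_SPARE_THREADS and processes < SERVER_LIMIT:
        processes += 1
    # More than MaxSpareThreads idle: retire a child, but always keep one.
    while idle(processes) > MAX_SPARE_THREADS and processes > 1:
        processes -= 1
    return processes


if __name__ == "__main__":
    for busy in (0, 5, 7, 10, 40):
        print(f"{busy:>2} busy threads -> {steady_state_processes(busy)} process(es)")
```

For the 6-7 busy threads that server-status reports, the model settles on 2 processes; with 5 or fewer busy threads a single process suffices, which matches the 1-2 processes observed.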
After implementing these changes, apache typically runs only 1 server process, which uses less than 8% of memory (plus the root process, which uses almost nothing). All of apache consumes less than 10% under ordinary load.
But I went further than this. I have also stopped freshclam from running every couple of hours; in fact, I have stopped it from running automatically at all.
Every night during the backups I do a lot of little housekeeping things. One of them is renewing certificates with certbot. After running certbot, just in case it changed a certificate, I do a graceful apache restart. Otherwise apache doesn’t detect the new certificate, and eventually, barring some other reason to restart, I get a bunch of noise about expired certificates.
So now, after the apache restart, the script runs freshclam “manually” and then forces a clamd restart. This is a conscious decision to renew the virus signatures only once a day. Maybe I will regret this someday. But for my little community, it is hard for me to see why we need to renew virus signatures 6 times a day.
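The tail end of that nightly housekeeping could look something like the Python sketch below. The order is the point: the graceful restart both picks up any new certificate and shrinks apache back down, so clamd reloads its signatures while memory use is at its lowest. The command forms (apachectl graceful, systemctl restart) and the instance name clamd@scan are assumptions; the post names only the template unit clamd@.service. The runner parameter exists so the sequence can be exercised without running anything.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# Hypothetical instance name; the post only names the template unit clamd@.service.
CLAMD_UNIT = "clamd@scan"

NIGHTLY_STEPS: List[List[str]] = [
    ["certbot", "renew"],                  # renew any certificates that are due
    ["apachectl", "graceful"],             # load new certs; also shrinks apache's memory
    ["freshclam"],                         # the one signature update of the day
    ["systemctl", "restart", CLAMD_UNIT],  # reload signatures while apache is small
]


def default_runner(cmd: Sequence[str]) -> int:
    """Run one command and return its exit status (127 if it is not installed)."""
    try:
        return subprocess.run(list(cmd)).returncode
    except FileNotFoundError:
        return 127


def run_nightly(runner: Optional[Callable[[Sequence[str]], int]] = None) -> List[str]:
    """Run every step in order, even if an earlier one fails, and return
    the commands that exited non-zero."""
    run = runner or default_runner
    failures = []
    for cmd in NIGHTLY_STEPS:
        if run(cmd) != 0:
            failures.append(" ".join(cmd))
    return failures
```

Every step runs even if an earlier one fails, so a certbot hiccup does not also cost the day’s signature update; the returned failures can be reported by the backup script.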