There was a fail event reported on rosemary from one side of a pair of 60GB SSDs, which hold Rosemary_Data. Typical of my installations, this mirror set holds the stuff the system needs beyond the OS install: /home, databases, certificates, repositories, mail, samba, local bin, etc. It's a mirror set with an encrypted container, containing a btrfs filesystem. The older versions of these setups contain separate btrfs subvolumes for the different directories; newer ones have only one subvolume for that, and another for snaps. This is an old one.
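Roughly, the stack on such a volume is layered like this (a sketch only: the device names are placeholders, and the md mirror under the LUKS container is an assumption about the build, not a record of the actual commands):

# mirror -> encrypted container -> btrfs (placeholder devices)
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
cryptsetup luksFormat /dev/md0
cryptsetup open /dev/md0 rosemary_data
mkfs.btrfs -L Rosemary_Data /dev/mapper/rosemary_data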
Rosemary doesn't have an extensive set of services – really only /home and the databases; no real need for much of anything else. The local bin comes out of the repo anyway, and there is no mail, no repository, no certs. However, without that volume the system won't come up in a usable way. So lesson one learned here: when you get a fail event, attend to it. I let it go for a few days, because I knew I was going to have to pull the case out of the rack mount to get at the SSDs.
Continue reading Rosemary Recovery 2020
Some of the people for whom I provide support with gateway pis use Macs. For the PC folk – at least those on Windows 10 – I've been setting up File History and putting the File History data onto the /backup drive on the gateway pi. Then it gets sent here overnight. I wanted to do the same for the folks who have Macs, of which there are several.
Continue reading Timemachine on Gateway pi
I don’t know if I actually know enough to write this post. But I want to record what little I do know about this.
The symptom is that my tarragondata volume on this system, tarragon, claims to be out of space. This is a btrfs volume, about which there are other posts. It contains most of the dynamic parts of the system. The root volume '/' is very small, about 20GB – just enough to install the CentOS code and keep a few little things. The great majority of the information needed to run the system is symlinked out of /, which includes /home, mail, databases, websites and their data, the repositories, certificates, local scripts, etc.
This is a 180GB disk, currently running about 55% full, i.e. almost 100GB used. Among the information on this disk are snapshots of all of tarragondata, taken every night and kept for 30 days. This isn't disaster backup/disk failure backup (which is elsewhere), this is "operator error" backup.
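The nightly job amounts to something like this (a sketch only – the /data mount point and .snapshots directory are stand-ins for the real layout):

#!/bin/bash
# take a read-only snapshot of the data volume, named by date
SNAPDIR=/data/.snapshots
btrfs subvolume snapshot -r /data "$SNAPDIR/$(date +%Y-%m-%d)"
# prune snapshots older than 30 days; YYYY-MM-DD names sort
# lexicographically, so a string compare against the cutoff works
cutoff=$(date -d '30 days ago' +%Y-%m-%d)
for s in "$SNAPDIR"/*; do
    [[ "${s##*/}" < "$cutoff" ]] && btrfs subvolume delete "$s"
done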
A couple of weeks ago I began to experience a new kind of failure. In the middle of the night, this btrfs volume would suddenly report that it was out of space – usage 100% – although the amount of storage in use was still the roughly 100GB that it normally uses. It manifestly was not actually out of space.
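For anyone hitting the same symptom, the standard first look is at chunk allocation rather than file usage (the /data mount point is a stand-in):

# show how the allocated chunks compare with the data actually in them
btrfs filesystem usage /data
# compact data chunks that are less than half full, returning the
# freed chunks to the unallocated pool
btrfs balance start -dusage=50 /data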
Continue reading Out of space on btrfs
After implementing the new tarragon, the biggest problem I had involved the clamav package and its loading of signatures. If clamd doesn't come up and open its socket, then amavisd (the daemon that postfix consults to handle all the checking of each piece of mail on input and output) will fail, assuming it is configured to do virus checking. This results in various problems. Amavis will mark the mail as "unchecked", but worse, it will report failure back to postfix, which gets confused, and very often the message is delivered two or three times.
Clamd, the clamav daemon, now has over 6 million signatures. There are a lot of bad boys out there. The signatures are loaded by clamd from its database (in /var/lib/clamav) into memory on startup. As a result, clamd has a large memory footprint, almost 800MB on my system. The first issue, discovered before going live, was that systemd's default parameters expect any daemon it starts to be up within 90 seconds. If the daemon fails to check in within that time, systemd considers it broken and terminates it. Clamd takes at least 3 minutes to load. I had to set a special TimeoutStartSec value in the systemd service script for clamd@.service.
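The override looks like this (the 300-second figure is my choice; anything comfortably beyond the observed load time will do):

# /etc/systemd/system/clamd@.service.d/override.conf
# give clamd time to load its 6M+ signatures before systemd
# gives up on it (the default is 90 seconds)
[Service]
TimeoutStartSec=300

Then systemctl daemon-reload to pick up the change.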
Whew! I thought, boy I’m glad I figured that out. Hah!
Continue reading Clamd signatures and Apache memory
This server, on Amazon, hosts my website and a dozen others; provides mail service for several people's email, including my own, with postfix, dovecot, opendkim, amavis, spamassassin, and clamd; provides contacts and calendar service using radicale; provides vpn service with openvpn; provides a tor relay; provides nextcloud service; and hosts my svn repository.
The server was last rebuilt in 2017. Long, long ago when I built the first version of it, I was most familiar with Red Hat/Fedora, and since then it has been easiest just to upgrade it with Fedora, always grumbling to myself that someday I’m going to change it. The problem with being on Fedora, of course, is that Fedora changes every 6 months, so I’m constantly behind. And after a year I’m at end of life. This is dumb for a server that I don’t want to be messing with all the time.
Continue reading Tarragon Rebuild 2019
I now have 8 of these gateway boxes out there. This morning as I was checking backups on one of them, I observed that it took quite a long time to respond. I ran a top on it and was horrified to see that its memory use was 100% and so was its swap. Holy @#$%!@ Batman!
Most of the memory was being used by the lxpanel. And (hangs head in embarrassment) there were actually two lxpanels running – one for the console and one in the vnc window I launch at startup.
It seems the lxpanels leak. I don’t know how badly, but it doesn’t matter. These boxes are meant to run forever so even a tiny leak is eventually fatal.
Well this was simple. I will seldom, if ever, need to get into a graphical environment remotely, and if I do I can always start vnc from the command line. So I took out the startvnc from the startup script. And I have even LESS need for a graphical console since there is not even a monitor on these things. So I set the default systemd target to multi-user.target.
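For reference, switching the default target is one command:

# boot to the text console instead of the graphical desktop
systemctl set-default multi-user.target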
I did this on all the gateways that are running on pi-zeros. The few running on bigger Ubuntu boxes didn't really have the problem anyway.
After rebooting them they come up with no lxpanels. I’ll watch the memory use, but I think this will fix the problem.
I found out today something I am sure to forget.
In every zfs dataset there is an invisible directory (by invisible, I mean that it does NOT show up with ls -a) named .zfs. Inside this directory are two subdirectories, shares and snapshot.
The snapshot subdirectory provides perfectly serviceable read-only access to all the snapshots. Viz:
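Along these lines (dataset and snapshot names invented for illustration):

# list the snapshots of a dataset mounted at /tank/home
ls /tank/home/.zfs/snapshot
# copy a single file back out of one of them, read-only
cp /tank/home/.zfs/snapshot/nightly-2020-06-01/notes.txt ~/notes.txt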
Continue reading Invisible zfs snapshot directory
I want to be able to get to my wife's Mac, in another city. She is an unsophisticated user, and I'd like to be able to help her when she needs help, but I can't ask her to do very much setup. I also want to be able to provide backup for her files.
The first step was to outfit her with one of the little gateway pis previously described. Once that was done, we managed, together, to enable me to get to her Mac with ssh, by way of the pi tunnel. And we managed to set up an account on her Mac under my name.
Continue reading Setting up a mac remotely
I realized as I was writing a new post that I had never documented the gateway pi undertaking.
This started when a friend in the mountains got a new internet service where the ISP would not allow him (and therefore me) access to his router. As a result I could no longer use ssh to connect to his systems.
I solved this problem by setting his systems up to use a tool called autossh, with which I could have his system start, monitor, and keep alive an ssh connection holding reverse tunnels open to my system. I could then reach him by attaching through the reverse tunnels.
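The invocation amounts to something like this (user, host, and port numbers are placeholders):

# on his system: hold a reverse tunnel open to my server, restarting
# the connection if it drops (-M 0 relies on ssh's own keepalives)
autossh -M 0 -N \
    -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" \
    -R 2201:localhost:22 tunnel@myserver.example.com

# on my system: reach his machine back through the tunnel
ssh -p 2201 user@localhost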
Continue reading Gateway pi
At one point I was getting low on space on tarragondata, so I added an additional physical device to the btrfs filesystem containing tarragondata.
[root@tarragon backup_scripts]# btrfs fi show
Label: 'tarragon_data' uuid: d6e4b6fc-8745-4e6e-b6b4-8548142b5154
Total devices 2 FS bytes used 92.04GiB
devid 1 size 120.00GiB used 120.00GiB path /dev/xvdf1
devid 2 size 30.00GiB used 30.00GiB path /dev/xvdg
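Getting to that state was roughly the following (the /data mount point is a stand-in for the real one):

# add the second device, then spread the existing data across both
btrfs device add /dev/xvdg /data
btrfs balance start /data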
This is fine, but there are a couple of problems. The main one is that I can no longer use the EC2 snapshot capability on tarragondata (EBS snapshots are taken per volume, so there is no consistent point-in-time image of a filesystem that spans two of them), which meant that the nightly EC2 snapshot feature I was using had to be retired.
But now I am about to create a new tarragon instance, and it would be really helpful to be able to snapshot tarragondata (Amazon snapshot, not btrfs snapshot) and then create a new Amazon volume with a consistent snapshot for testing.
Continue reading Adjusting the size of the tarragondata volume