October 31, 2015

ZOL Systemd Pain Points

Happy Halloween ;D

To Systemd Or Not To Systemd…

Another factor to take into consideration when deploying ZOL on boot and root is whether "to systemd or not to systemd". Aye! Yes, that is the question. Yes, you do have a choice! Whatever else systemd may or may not be, it introduces some potential pain points on a ZOL boot and root setup. It’s important to understand them as best one can and make at least some effort to anticipate how they may impact your use case. This is no small feat given the complexity of systemd and the relentless reinvention of Linux by systemd developers. As more and more mainstream distros support zfsonlinux, I’ve no doubt these pain points will be eased by those more adept than I. In the meantime, however, there are considerations and compromises to be made when using zfsonlinux on systemd based distributions.

Systemd/journald and Legacy Mounts.

One of the beauties of ZFS is the elegance it brings to managing file systems, a.k.a. datasets in ZFS lingo, all under the umbrella of ZFS itself. The deep enterprise experience Sun’s engineers brought to the table when designing ZFS really shines through here. Of course legacy mounts via /etc/fstab are also supported. I am of the opinion, however, that mixing and matching zfs datasets with legacy mounts adds complexity and hobbles one of ZFS’s strongest selling points: its management toolset user interface. Let’s take a look at how all this relates to systemd.

Journald provides more information about early boot and late shutdown than old school loggers such as syslog-ng. In so doing, journald dearly wants to flush that data to disk sooner rather than later during boot and later rather than sooner during shutdown. So it enlists systemd to manage mounting and unmounting of the default location for these logs, /var/log/journal, to do just that. Breaking any part of that path out onto a dedicated zfs dataset results in problems. During shutdown, systemd unmounts the zfs pools, then flushes journald to /var/log/journal. During the next boot, at the behest of systemd, a legacy mount point for journald data is mounted early in the boot process even if it is not explicitly listed in /etc/fstab. Moments later, when systemd imports the zfs pools, /var/log/journal is no longer empty. ZFS, seeing data where it expects none, errs on the side of caution and refuses to mount the zfs dataset. Perhaps I am missing something and altering this behavior is not such a difficult task, but I spent some hours trying to do so unsuccessfully. Maybe a systemd guru will come along with the answer.
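
The refusal described above comes down to whether the mountpoint directory already holds anything by the time ZFS gets to it. Here is a minimal sketch for checking that by hand, assuming a POSIX shell. The helper name dir_is_empty is hypothetical and mine, not part of any zfs or systemd tooling.

```shell
#!/bin/sh
# dir_is_empty DIR
# Exit 0 when DIR exists and has no entries at all (dotfiles included),
# exit 1 when it has content, exit 2 when it is not a directory.
# Hypothetical helper for illustration only.
dir_is_empty() {
    [ -d "$1" ] || return 2
    [ -z "$(ls -A "$1")" ]
}

# Usage (pass whatever directory your journal dataset mounts on):
#   dir_is_empty "$journal_dir" && echo "empty" || echo "missing or not empty"
```

Run it against the unmounted directory, for example from a rescue shell, to see whether journald got there first.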

Given the above, one may conclude that breaking out datasets on a ZOL boot and root system is more hassle than it’s worth. And for your use case, you may well be correct. I, however, am reluctant to manage my data that way. I have sampled the holy grail that is ZFS on platforms where it is a first class citizen. One thing I desire is for snapshots of the base platform to be just precisely that, and only that. Disks may be cheap, but I’ve no interest in these snaps occupying more space than they otherwise need to in order to track frequently changing files like system logs and other nonessential and readily reproduced things like /var/cache. Conversely, I do very much want system essentials such as files living under /var/lib to remain consistent with the rest of /. Or not.

Seasoning datasets and mountpoints to taste is trivial under other ZFS implementations but presents a bit of a quagmire for systemd based ZOL platforms. I don’t see any clear best practice here, can only offer my thoughts, and suggest giving the above some analysis in the context of your use case.

Systemd, Extended Attributes, and ACLs

Systemd potentially represents the mother of all security holes in Linux and, in efforts towards due diligence and mitigation of such, really, really, really wants/needs to make use of extended attributes and access control lists. As best I’ve been able to surmise, as of this writing, these are used on /var/lib/systemd and /var/log/journal. I expect that as systemd continues along the relentless path towards world domination, it may well desire to set such on other file systems/mountpoints. No worries. We can accommodate that. ZOL handles xattrs and POSIX ACLs. Two different ways. Different because of the way Linux implements stuff compared to illumos and *BSD based platforms. As such, there are Linux specific dataset properties that help tune the use of such on Linux. As Linux specific features, however, they are also not necessarily portable to other ZFS implementations. If such is a concern, maybe best to stick with the defaults. If less of a concern and performance more of a concern, then I suggest considering use of these features.

Example use of xattr and POSIX ACLs on a ZOL dataset:
# zfs set xattr=sa rpool/ROOT/var/lib
# zfs set acltype=posixacl rpool/ROOT/var/lib
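
To confirm the settings took, and to see which child datasets picked them up by inheritance, zfs get reports each value along with its source. This is a small sketch. The wrapper name show_xattr_acl is hypothetical, and the dataset name is simply the one used in the example above.

```shell
#!/bin/sh
# show_xattr_acl DATASET
# List the xattr and acltype properties for DATASET and everything
# beneath it (-r), including where each value comes from (the "source"
# column reads local, inherited from ..., or default).
# Hypothetical wrapper; needs the zfs CLI and usually root.
show_xattr_acl() {
    zfs get -r -o name,property,value,source xattr,acltype "$1"
}

# Usage:
#   show_xattr_acl rpool/ROOT/var/lib
```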

Concluding Thoughts

I want to break out datasets. Candidates for my short list are:


Depending on your use case, you may want to consider managing /var/lib/lxc, /var/lib/systemd, /var/cache/pacman/pkg, etc. under dedicated zfs datasets as well. Season the list above to taste.

Alas, although I personally have minimal to no use for xattr and/or posixacl on a workstation configuration, all in all if systemd is a must for you then it behooves you to endeavor to go with the systemd flow. You just can’t fight it. And there’s no denying it. You will be assimilated into the collective. I recommend you give this conscious pause when considering your implementation planning.

These datasets will all best be managed as legacy mounts under /etc/fstab. Setting xattr and acltype during creation of the /var dataset gets them inherited automagically by datasets living under /var. This is a convenience feature. Season to taste if not to your liking.
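
The creation step described above might look something like the following sketch. The function name create_var_tree and the child names in the usage line are hypothetical, so substitute your own short list. Only the parent gets xattr and acltype set explicitly, and the children are left to inherit them. I set mountpoint=legacy on each child explicitly anyway, just to keep the intent obvious.

```shell
#!/bin/sh
# create_var_tree PARENT [CHILD...]
# Create PARENT as a legacy-mounted dataset with SA-based xattrs and
# POSIX ACLs, then create each CHILD beneath it as a legacy mount.
# Children are not given xattr/acltype; they inherit them from PARENT.
# Hypothetical helper; needs the zfs CLI and root.
create_var_tree() {
    parent=$1
    shift
    zfs create -o mountpoint=legacy -o xattr=sa -o acltype=posixacl "$parent" || return 1
    for child in "$@"; do
        zfs create -o mountpoint=legacy "$parent/$child" || return 1
    done
}

# Usage (child names are examples only):
#   create_var_tree rpool/ROOT/var log cache
```

Nothing gets mounted by this. Each dataset still needs its own line in /etc/fstab.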

Example /etc/fstab using zol legacy mounts
rpool/ROOT/var      /var      zfs        defaults                   0 0
# ...etc. for any additional zfs datasets mounted under /var
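
If the list of datasets grows, typing those lines by hand gets old. Here is a small sketch that prints an fstab line in the same layout as the example above. The helper name fstab_zfs_line is hypothetical, and the output should be reviewed before it goes anywhere near /etc/fstab.

```shell
#!/bin/sh
# fstab_zfs_line DATASET MOUNTPOINT
# Print one /etc/fstab entry for a legacy-mounted zfs dataset, with
# columns padded to match the hand-written example.
# Hypothetical helper; it only prints and never edits /etc/fstab.
fstab_zfs_line() {
    printf '%-19s %-9s %-10s %-26s %s\n' "$1" "$2" zfs defaults "0 0"
}

# Usage (dataset names are examples only):
#   fstab_zfs_line rpool/ROOT/var     /var
#   fstab_zfs_line rpool/ROOT/var/log /var/log
```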

Alternatively, give it a KISS and manage all of / as a single zfs dataset.

Have fun.

Tags: zfsonlinux