The whole thing started simply because I wanted to host a new service, specifically Immich, which I mentioned in a previous post. I was a bit dissatisfied with my current Docker + VM + ZFS + Proxmox setup: whenever I tinkered with one service, it was easy to bring down other services with it, especially since Docker’s networking has a somewhat annoying issue. So I thought, why not go all out and put each service into its own LXC container? Docker was really only there because I find it convenient for deployment, and managing it through Portainer CE (Community Edition) and its UI is genuinely handier. Thus began this journey of tinkering.
Let’s start with the conclusion: I don’t recommend doing this.
However, as more people start to use NVMe SSDs, the situation has changed somewhat. I’ve been running a 2-way ZFS mirror pool on NVMe SSDs for a while now, and I’ve found that the performance of LXC → VFS → Proxmox ZFS on NVMe is at least good enough for light home use (e.g. Nginx Proxy Manager, Immich, Jellyfin, etc.). I guess the main reason is that NVMe SSDs have much higher bandwidth than SATA SSDs and much, much higher sync-write speed (even consumer-grade NVMe drives are still far faster than older SSDs). But this only holds for NVMe, and probably only for light use. I believe the significant write amplification is still there, so Docker + LXC + ZFS on NVMe is still not a good idea in my eyes: it will shorten the SSD’s lifespan, and the performance will still not match a real server or a VM.
The main reason: Docker’s storage driver on ZFS-backed LXC can basically only use VFS (vfs). This driver is slow, uses a lot of space, and is particularly unfriendly to ZFS on home SSDs (specifically the SATA ones). The Immich stack once failed to start entirely because the disks were too slow; I eventually had to disable ZFS sync on the LXC subvolume (i.e., sync=standard → sync=disabled) just to get it running. And an Nginx Proxy Manager (nginx-proxy-manager) instance required about 20GB of space just to start. Such huge space consumption, slow disk performance, and persistently high I/O delay make running Docker inside LXC on a ZFS system fundamentally unreasonable.
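For reference, this is roughly how I checked which driver Docker fell back to and how the sync setting was changed. The dataset name (rpool/data/subvol-101-disk-0) is a placeholder; check zfs list for your own subvolume.

```bash
# Inside the LXC container: confirm which storage driver Docker picked
docker info --format '{{.Driver}}'      # prints "vfs" in this setup

# On the Proxmox host: find the dataset backing the container's rootfs
zfs list -o name,used,sync | grep subvol

# Disable sync writes on that subvolume (placeholder name; this trades
# crash/power-loss safety for usable write latency)
zfs set sync=disabled rpool/data/subvol-101-disk-0
zfs get sync rpool/data/subvol-101-disk-0
```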
Moreover, part of the point of using LXC was supposed to be better performance, but disk performance this bad basically negates the whole approach. And VFS is documented as being intended only for testing; it should be avoided in production.
Simply put: if you want Docker, use a VM. If you use LXC, don’t use Docker. (It’s also best to avoid NFS, CIFS (Samba), FUSE, etc., and preferably don’t use privileged containers. To be honest, this makes me feel LXC isn’t very useful… for my use case.)
Other Issues Encountered with LXC
Firstly, on LXC, you naturally need to enable Nesting; otherwise, you can’t use Docker.
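On Proxmox this is either a checkbox (Options → Features) or a one-liner on the host; 101 below is a placeholder container ID.

```bash
# On the Proxmox host: enable nesting for container 101 (placeholder ID)
pct set 101 --features nesting=1

# This just ends up as a line in /etc/pve/lxc/101.conf:
#   features: nesting=1
```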
Some people, to save trouble, might just make the LXC privileged. But even then, docker run may still error out, most likely because of AppArmor. One workaround is to uninstall AppArmor. Note that installing Docker itself seems to pull AppArmor back in (I’m not entirely sure about this claim, TBH), so even if you uninstall it before installing Docker, you’ll need to uninstall it again afterwards. The same goes for installing docker-compose.
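A minimal sketch of that removal route, assuming a Debian/Ubuntu-based container; I can’t vouch for exactly which Docker package drags AppArmor back in, hence the repeated check.

```bash
# Inside the (privileged) LXC container: check whether AppArmor is installed
dpkg -l | grep apparmor

# Remove it, install Docker / docker-compose, then check again, since the
# Docker packages may pull apparmor back in as a dependency/recommendation
apt-get purge -y apparmor
# ... install docker / docker-compose ...
dpkg -l | grep apparmor && apt-get purge -y apparmor
```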
NOTE
💡 Another method should be to modify the LXC config and add unconfined, but I didn’t try this myself. It likely requires manually editing the LXC config, which the Proxmox 7 UI probably doesn’t support yet. This should actually be the proper way.
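For completeness, the unconfined route (again, untested by me) would just be a line in the container’s config on the host:

```bash
# On the Proxmox host (101 is a placeholder container ID)
echo "lxc.apparmor.profile: unconfined" >> /etc/pve/lxc/101.conf
pct stop 101 && pct start 101
```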
Additionally, if you need NFS/CIFS, besides enabling the corresponding options, you might also need privileged mode. I recall it being like this before, but I’m not entirely sure now.
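The “corresponding options” live in the same features flag as nesting, roughly:

```bash
# On the Proxmox host: allow NFS and CIFS mounts inside container 101 (placeholder ID)
pct set 101 --features "nesting=1,mount=nfs;cifs"
# resulting config line: features: mount=nfs;cifs,nesting=1
```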
Finally, if you absolutely want to avoid the VFS storage driver on LXC, you’ll sooner or later come across the fuse-overlayfs solution. It’s also a storage driver, its performance seems much better than VFS’s, and it works on any underlying filesystem. So why not use it? Two reasons:
- My containers failed to start: MariaDB wouldn’t come up, and neither would Nginx Proxy Manager.
- A more significant issue: using fuse-overlayfs likely requires enabling the FUSE option on the LXC, and Proxmox’s snapshot/backup can conflict with FUSE, causing snapshots to hang. That basically means no native Proxmox backups.
NOTE
💡 fuse-overlayfs has apparently worked for some people in their experiments, but thorough testing is advised. I personally won’t be touching it again.
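If you still want to experiment, the setup is roughly: enable the FUSE feature on the container, install fuse-overlayfs inside it, and point Docker at that driver. Treat this as an untested sketch given the backup caveat above (101 is a placeholder ID).

```bash
# On the Proxmox host: enable FUSE (and nesting) for the container
pct set 101 --features "nesting=1,fuse=1"

# Inside the container: install the driver and tell Docker to use it
apt-get install -y fuse-overlayfs
cat > /etc/docker/daemon.json <<'EOF'
{
  "storage-driver": "fuse-overlayfs"
}
EOF
systemctl restart docker
docker info --format '{{.Driver}}'   # should now report fuse-overlayfs
```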
Why Can’t the ZFS Storage Driver Be Used?
If it weren’t for LXC, you could probably use this storage driver, and it’s generally the recommended one on ZFS. But it doesn’t work on LXC. Even if you know the filesystem inside your LXC is actually ZFS, Docker doesn’t get to manage it: the ZFS storage driver needs to create datasets and snapshots/clones for image layers, which requires access to the host’s ZFS tooling, and LXC certainly (I figure) won’t let Docker running inside it mess with the host’s filesystem like that. So these two are irreconcilable, and it will likely remain this way for the foreseeable future.
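To make that concrete, here’s what a quick check from inside the container looks like versus what the driver would need on a normal ZFS host (dataset names are illustrative):

```bash
# Inside the LXC container: no ZFS administration is possible
ls -l /dev/zfs     # typically absent in an unprivileged container
zfs list           # fails; the container can't talk to the host's pools

# On a bare-metal or VM Docker host it would work: put /var/lib/docker on
# its own dataset and set "storage-driver": "zfs" in /etc/docker/daemon.json
zfs create -o mountpoint=/var/lib/docker rpool/docker
```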
What Problems Were Encountered on VMs?
Firstly, running multiple Docker Compose stacks on a single VM means that when tinkering, it’s easy for everything to go down together. Especially if infrastructure services (like Reverse Proxy, Git, Portainer) go down, cleaning up is a hassle.
Secondly, I ran into a strange issue before: when the number of Docker Compose stacks got a bit large, the entire network would fail. I couldn’t figure out why at the time, but now I know:
- Docker defaults to bridge mode, and the address ranges in the first-priority pool are 172.17.x.x → 172.31.x.x. Each range is large (a /16), but there are only 15 of them, and the default docker0 bridge already occupies one.
- Each Docker Compose stack creates its own default network, so if you start a dozen or so stacks, the first-priority 172.x pool can be exhausted.
- Once the first pool is used up, Docker falls back to 192.168.x.x, the second-priority pool. As soon as it started allocating from that range, the network on my VM would go down, most likely because it conflicted with the WAN segment of the VM’s upstream router (an OPNsense VM). That was awkward.
The solution should also be quite simple: just increase the number of network ranges in the first phase. Usually, this involves modifying/adding the “default-address-pools” field in /etc/docker/daemon.json.
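Something like the following, assuming the ranges don’t collide with your own LAN/WAN segments (the pool below is just an example):

```bash
# On the Docker VM: carve the 172.16.0.0/12 block into /24s so you get far
# more (smaller) networks before Docker ever touches 192.168.x.x
cat > /etc/docker/daemon.json <<'EOF'
{
  "default-address-pools": [
    { "base": "172.16.0.0/12", "size": 24 }
  ]
}
EOF
systemctl restart docker

# New networks are now allocated from the configured pool
docker network create poolcheck
docker network inspect poolcheck -f '{{range .IPAM.Config}}{{.Subnet}}{{end}}'
docker network rm poolcheck
```

Existing stacks keep whatever subnets they already had; only networks created after the restart come out of the new pool.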
However, even once that’s resolved, using a single VM for all Docker containers might not be the best choice. It’s worth spreading them out, even without Kubernetes: create tiers and place different tiers in different VMs. Whenever I want to tinker with something new, I start with the least important tier and try to avoid touching infrastructure-related services. This should also reduce the probability of breaking things.
Postscript
Finally, let’s talk about LXC and Docker. The two are really more like competitors (I figure, though I’m not sure) than Docker being something that runs on top of LXC. So if you use LXC, it’s best to treat it as a bare-metal-like service platform and not try to do anything cgroup-related inside it. Unfortunately, although LXC is quite appealing on Proxmox, its handful of available images really can’t compete with Docker’s ecosystem. You could argue that on LXC you can deploy any service you need yourself, but sometimes deployment calls for immutability: apart from persistent data, the code itself and the temporary files have an ephemeral lifecycle, and that’s what makes things portable. Plus, with Kubernetes being so popular now, I’m not very optimistic about LXC’s future (of course, I’m nobody and I don’t pay for it either; I still appreciate all the work and effort around LXC, and I believe it’s a great project providing great value. This is just the perspective of a consumer-hardware self-hosting enthusiast).
Actually, when it comes down to it, stateful applications are the root of the problem. If all storage were accessed via APIs, authentication were properly handled with ACLs + IAM, and the storage layer were separated out into object storage or a database cluster, then any container could be stateless and easily migrated. But that’s basically Cloud Native. For home use, you can’t expect all these self-hosted services to reach that level; it’s almost like abandoning local filesystems entirely. Enterprises naturally do it this way; at home it’s better to make do with VM + NFS + Docker bind mounts. If you have the resources/budget, setting up Ceph or iSCSI is also feasible, but in a way these aren’t vastly different from NFS for home users 😓 (well, Ceph is actually very different if you run HA on Proxmox). Perhaps only those specifically experimenting with distributed storage have such needs. I’m just hosting Samba at home; there’s no need for such a big fuss.