
In my home lab I have containers spread across a few devices (mainly a server and a NAS). I have Docker containers running on the server whose main storage is on the NAS, and I use named cifs volumes to expose the SMB shares to the containers. So far so good.
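
For reference, the volumes are declared roughly like this (the NAS hostname, share name, and credentials below are placeholders):

    services:
      jellyfin:
        image: jellyfin/jellyfin
        volumes:
          - media:/media

    volumes:
      media:
        driver: local
        driver_opts:
          type: cifs
          o: "addr=nas.lan,username=svc,password=example,vers=3.0"
          device: "//nas.lan/media"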

The problem occurs when there are partial outages. E.g. the NAS has an update and needs to reboot, or there is a network problem between the server and the NAS. Even after the NAS returns, the containers on the server are still in a bad state since the cifs mount doesn't get restarted. And for most of them (e.g. Jellyfin), it's not worth having the containers running at all if the network share they point to isn't available.

I'm wondering if there is a general practice to handle this kind of dependency short of Kubernetes. If not, it seems like I need a mechanism to stop the containers (or maybe the whole Compose stack) when the share is unavailable and restart them when it returns, i.e. something akin to deunhealth but for network shares rather than container health.[1] I could probably write such a tool pretty easily, but it seems like something that might already exist.

[1] One could imagine adding this to the health check and using deunhealth, but this wouldn't be right for this situation since the container should be stopped when the share is unavailable, not forced into a restart loop.


1 Answer

The problem occurs when there are partial outages [..] Even after the NAS returns, the containers on the server are still in a bad state since the cifs mount doesn't get restarted [..] it seems like I need a mechanism to stop the containers (or maybe the whole Compose stack) when the share is unavailable and restart them when it returns

I've actually had this problem on my side; we get power outages over here.

I ended up writing a script for my own needs that works a lot like what you're describing. It gets scheduled to run periodically as a systemd timer.
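
I'm not pasting the full units here, but the scheduling side is just a oneshot service plus a timer along these lines (unit names and the script path are placeholders):

    # /etc/systemd/system/smb-watchdog.service
    [Unit]
    Description=Stop/start Docker stacks depending on SMB share availability

    [Service]
    Type=oneshot
    ExecStart=/usr/local/bin/smb-watchdog.sh

    # /etc/systemd/system/smb-watchdog.timer
    [Unit]
    Description=Run the SMB watchdog periodically

    [Timer]
    OnBootSec=2min
    OnUnitActiveSec=1min

    [Install]
    WantedBy=timers.target

It gets enabled with systemctl enable --now smb-watchdog.timer.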

To test it thoroughly, I had to write some provisioning with Terraform, Ansible, and Proxmox. I've tested a few scenarios:

  • the Samba share mounted on the host and then bind-mounted into the Docker container
  • the Samba share mounted directly as a cifs volume in Docker Compose
  • long boot times for the NAS

Even though I was advised to mount the Samba share directly into the container, I found it more practical to mount it on the host: with a direct container mount I can't stop the container once the mount point is frozen.
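
With the host-mount approach the container only sees a plain host path, roughly like this (image and paths are placeholders):

    services:
      jellyfin:
        image: jellyfin/jellyfin
        volumes:
          - /mnt/nas/media:/media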

For now, the Docker stack paths are hardcoded, although it would be possible to pick up the relevant stacks (the ones that make use of Samba shares) automatically.

Regarding the long NAS boot times, I tried some systemd fstab options such as x-systemd.device-timeout and x-systemd.mount-timeout, but neither worked as advertised in its documentation.
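
For reference, this is the kind of fstab entry I mean (server, share, credentials file, and mount point are placeholders):

    //nas.lan/media  /mnt/nas/media  cifs  credentials=/etc/samba/nas-creds,_netdev,x-systemd.device-timeout=30s,x-systemd.mount-timeout=30s,vers=3.0  0  0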

code
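
The core logic boils down to something like this stripped-down sketch (share and stack paths are placeholders, and it assumes an fstab entry exists for the mount point):

    #!/usr/bin/env bash
    # Stop Compose stacks while the SMB mount is unusable; remount and
    # start them again once the share is reachable.
    set -u

    MOUNTPOINT="/mnt/nas/media"            # placeholder
    STACKS=("/opt/stacks/jellyfin")        # placeholder, hardcoded for now

    share_ok() {
        # A frozen cifs mount can hang forever, so guard the check with a timeout.
        timeout 10 ls "$MOUNTPOINT" > /dev/null 2>&1
    }

    stop_stacks() {
        for stack in "${STACKS[@]}"; do
            docker compose --project-directory "$stack" down
        done
    }

    start_stacks() {
        for stack in "${STACKS[@]}"; do
            docker compose --project-directory "$stack" up -d
        done
    }

    if share_ok; then
        start_stacks
    else
        stop_stacks
        # Try to recover the mount; a lazy unmount helps when it is frozen,
        # and the remount relies on the fstab entry for the mount point.
        umount -l "$MOUNTPOINT" 2> /dev/null
        mount "$MOUNTPOINT"
    fi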

UPDATE 2025-01-27:

Possible future improvements:

  • automatically detect which Docker containers use Samba shares and only consider those (instead of the hardcoded paths the code uses now)
  • automatically cross-reference Docker containers with the Samba shares they use, so only those shares get remounted
  • measure traffic on the TCP Samba ports to determine whether the shares are actively being used or are in fact frozen