Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] nixos/zfs: Don't force import by default #36882

Closed
wants to merge 1 commit into from

Conversation

infinisil
Copy link
Member

@infinisil infinisil commented Mar 12, 2018

See 12e77fdc3f6f for reasoning of the previous default enabling of these
options. It is quiet a bit later now and networking.hostId being set has
been enforced since e9affb4274a4 in 2014.

This should be properly tested by multiple people before merged. If you have tested this on your zfs setup please make a comment. Edit: And if you have set those options to false already, drop a comment as well.

Edit: Make sure to reboot after setting these to properly test it!

{
  boot.zfs.forceImportRoot = false;
  boot.zfs.forceImportAll = false;
}
Motivation for this change

See 12e77fdc3f6f

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option build-use-sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

See 12e77fd for reasoning of the previous default enabling of these
options. It is quiet a bit later now and networking.hostId being set has
been enforced since e9affb4 in 2014.
@aristidb
Copy link
Contributor

Is this a case for using stateVersion?

@aristidb
Copy link
Contributor

I've been running a ZFS pool on NixOS with these options since 2017-06-16, so from my point of view, this PR "works" 🥇

@fpletz fpletz added this to the 18.09 milestone Mar 12, 2018
@infinisil
Copy link
Member Author

@aristidb No, stateVersion should be used for state, e.g. a postgres database default version update. If we were to use stateVersion for this, all current users wouldn't profit from this new default, because stateVersion should never be changed really.

@infinisil
Copy link
Member Author

infinisil commented Mar 12, 2018

@lheckemann
Copy link
Member

I've "tested" this by manually disabling the options. Works for me, and seems like a sensible change.

@infinisil
Copy link
Member Author

Tested this on 3 machines, works without problems for me.

@sjau
Copy link

sjau commented Mar 13, 2018

So, what happens if you import the pool on a live cd first and then try to normal boot?

@infinisil
Copy link
Member Author

@sjau I wasn't sure of when force importing was necessary myself, so I looked it up in the source code! The important bits of the force import condition is here, I quote:

if (state != POOL_STATE_EXPORTED && hostid != get_system_hostid())
  return (B_TRUE);

So, the pool needs to not be exported and the hostid must not matching for a force import to be required. This means that assuming the previous system properly exported the pool it shouldn't occur.

Whether that's the case with NixOS I'll have to look into yet.

@lheckemann
Copy link
Member

Ah yes, so this would only be problematic for the installer if it shut down uncleanly.

@sjau
Copy link

sjau commented Mar 13, 2018

It seems that a shut down doesn't properly export the pool. At least when I have NixOS installed, then do a proper shutdown and then I boot the live iso and try to import the pool, I'll get the message that I need to force-import it.

@infinisil
Copy link
Member Author

Update: NixOS does indeed seem to not export zpools at all on shutdown (the relevant file would be https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/tasks/filesystems/zfs.nix)!

So.. This is probably a showstopper for this PR, I might verify this, but I'm pretty sure this would indeed break the installation process for anybody using zfs. This PR should therefore be delayed by at least 2 releases, while a change to export pools properly should be merged for this release.

@infinisil
Copy link
Member Author

Actually, this is not half bad because: Installers from previous NixOS versions will still force import pools, so they continue to work. Now given that proper exporting gets implemented along with this PR everything works out perfectly, because the new installer exports properly and doesn't force import (which the installed system also does)

There could however be problems with preexisting zpools and maybe other cases, so I think a period period of 1 release between implementing proper exporting and force importing should be given.

@Mic92
Copy link
Member

Mic92 commented Mar 13, 2018

@infinisil maybe also some end-user documentation just in case.

@infinisil
Copy link
Member Author

@Mic92 Yeah, the options description needs to be adjusted as well

@infinisil
Copy link
Member Author

infinisil commented Mar 13, 2018

tl;dr: Force import should be the default, this PR gets closed, a new PR should adjust the options descriptions with info from this comment.

So I researched some more. What I got is:

  • The only bad thing about force importing is when multiple computers try to use the same pool at the same time, something that almost never occurs.
  • Force importing wouldn't be needed when pools get exported at shutdown, however that's not how export is supposed to be used. export should only be used when you know a different machine is going to use the zpool.
  • The minor problem with exporting the pool on each shutdown is that you can't rollback to a txg (a zpool wide transaction rollback) anymore because these apparently get cleared on every export. This recovery however is only to be used as a last resort.

Conclusions

  • It does not matter that we force import. There are no problems with it in an estimated 99.999% of cases. In the 0.001% that would have problems with it, NixOS should be properly tested on such a system anyways to realize it wouldn't work and the options should need to be switched off.
  • Not using force import by default would "break" installs when the user doesn't zpool export the root zpool for the installed system. Solvable by any of:
    • zpool export
    • Adding zfs_force=1 to the command line in the bootloader
    • Reinstalling with boot.zfs.forceImportRoot = true
    • Implementing units to export zpools automatically on shutdown This doesn't apply in installers, they don't have the installed zpool in their config and the units won't be there.
  • Not using force import by default would also break cases where a single pool is used by different systems (at different times). This is solvable by:
    • Implementing units to export zpools automatically on shutdown. Which however has the disadvantage of breaking txg rollback which might be useful for 0.001% of users at some point.

Proposed actions

  • The options boot.zfs.forceImport{Root,All} should keep a default of true.
  • The description of these options should be changed to explain exactly in which cases force importing can cause problems, and in which cases not force importing can cause problems. The options shouldn't contain a recommendation to turn them off, because there are arguably more cases where this can cause problems.

I did a lot of asking around the past 2 days, so I thought I'd write everything down I discovered for future reference here. If anybody got any more info on this please let me know. Thanks a lot to the people in #zfsonlinux which were really helpful.

My proposal should probably be implemented in a separate PR, so I'll close this in a week or so.

@grahamc
Copy link
Member

grahamc commented Mar 13, 2018

It does not matter that we force import. There are no problems with it in an estimated 99.999% of cases. In the 0.001% that would have problems with it, NixOS should be properly tested on such a system anyways to realize it wouldn't work and the options should need to be switched off.

may be worth documenting the 0.001% scenario and what sorts of things they'd need to look out for?

@infinisil
Copy link
Member Author

@grahamc Yeah this should be documented in the option

@cleverca22
Copy link
Contributor

cleverca22 commented Mar 13, 2018

if you properly umount and zpool export tank the pool from the installer, then the fresh install can boot with forceimport set to false

but if you fail to export the pool and just shutdown -h now, it will just hang in the initrd and never boot

@lheckemann
Copy link
Member

Implementing units to export zpools automatically on shutdown

This is easier said than done, as it will typically require unmounting all the filesystems on it, and if the root filesystem lives in the pool that could be be difficult. Some distros achieve this by pivoting back into the initramfs for shutting down, but that seems to be quite a fuss. I think just continuing to force import by default is sensible.

@aij
Copy link
Contributor

aij commented Mar 17, 2018

you'd need two systems booting from a shared storage device

Isn't that only the case for forceImportRoot? forceImportRoot will force import any ZFS storage that happens to be connected at boot time, won't it?

I think an important question that hasn't been addressed here yet is: What are the worst case consequences of each default?

With forceImport* = false, the worst case is that in (rare?) cases you could end up with a non-bootable system, which would have to be manually fixed. (Eg: by booting with zfs_force=1.)

With forceImport* = true, what bad things can happen? Will it only import unmounted pools? (stealing them from other systems and requiring manual intervention to return them) Or will it also import pools that are currently mounted, potentially resulting in catastrophic data loss?

Per the man page,

          -f      Forces import, even if the pool appears to be potentially active.

which would seem to imply the later. @infinisil are there any additional checks that would prevent pool corruption when using forceImport?

Anyway, if we are really trying to decide between an inconvenient default or a dangerous default, my vote is for the inconvenient default. 👍

FWIW I've been using both options set to false since at least December on 5 machines, with no issues. I reinstalled two or three of those machines and didn't need to use zfs_force=1 even for the first boot.

@lheckemann
Copy link
Member

forceImportRoot will force import any ZFS storage that happens to be connected at boot time, won't it?

No, just the pool that has the root filesystem on it, as far as I understand (that's what forceImportAll is for).

@cleverca22
Copy link
Contributor

id say you shouldnt be exporting the rootfs pool on shutdown
the fact that its not exported is what allows zfs to open i automatically on bootup, as long as the hostid remains the same

@infinisil
Copy link
Member Author

if we are really trying to decide between an inconvenient default or a dangerous default, my vote is for the inconvenient default.

Good point, I didn't think of that.

FWIW I've been using both options set to false since at least December on 5 machines, with no issues. I reinstalled two or three of those machines and didn't need to use zfs_force=1 even for the first boot.

So you either exported the pools before booting into the new system, or you didn't change the host id of the new system. As tested by @cleverca22, the default installation procedure includes a host id change and not exporting it so a force import is needed.

So how about this: Disabling force importing by default and adding something to the installer that reminds the user to export the pool before booting into the new system. Then we have both the benefits of not causing data loss in all cases but new users still being able to install a ZFS based NixOS without problems.

@vcunat vcunat modified the milestones: 18.09, 19.03 Sep 2, 2018
@infinisil
Copy link
Member Author

I don't have any real motivation to work on this (the installer change) right now, if somebody else wants to do this, feel free to make your own PR :)

@infinisil infinisil closed this Nov 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet