-
-
Notifications
You must be signed in to change notification settings - Fork 15.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NixOS/auto-upgrade: offer the possibility to define a reboot window during which the system may be automatically rebooted #77622
Conversation
@danbst, could you review this? |
@danbst, thanks! Anything else needs to happen to have this merged? |
@r-vdp sorry, I wait for more 👀 . But I don't quite know whom to ping for another review. |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/what-are-your-goals-for-20-03/4773/14 |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/nixos-20-03-feature-freeze/5655/29 |
I haven't fully tried it yet but it looks all right. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't show correct behavior when the time span goes through midnight. E.g. with lower = "23:00"
, upper = "02:00"
and let's say it runs on Tuesday 01:00, then the condition makes it only reboot if "Tuesday 01:00 is before Tuesday 02:00 and after Tuesday 23:00", which won't be the case.
nixos/modules/tasks/auto-upgrade.nix
Outdated
''; | ||
${optionalString (cfg.rebootWindow != null) '' | ||
elif [[ "''${current_time}" < "${cfg.rebootWindow.lower}" ]] || \ | ||
[[ "''${current_time}" > "${cfg.rebootWindow.upper}" ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still wrong. Example is lower = 12:00
, upper = 6:00
, current = 18:00
-> This is within the window, but the code doesn't do it then.
After a bit of thinking, I'd say the best way to correctly implement this is to: Determine if lower > upper
, and if so, flip them around. Then check whether lower < cur < upper
, and toggle this result if lower > upper
from before. If the result is true, do the reboot, otherwise don't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@infinisil quite late, but I implemented this change just now.
I marked this as stale due to inactivity. → More info |
854de96
to
f6e06a8
Compare
f6e06a8
to
687196c
Compare
Taking another look at this, unless I'm missing something, I think there's a big flaw with this: It's not useful, because the upgrade time is already fixed by the
As such, the |
@infinisil my use case is that we have many servers that are in very remote locations on often slow uplinks (sometimes satellite). As such, when an upgrade requires downloading a lot of new store paths, it may not finish before working hours start again, and so I want to be able to make sure that under no circumstances our servers reboot during working hours. We also have servers that are turned off outside of working hours, due to power restrictions. For those, we run the upgrade service during working hours when the servers are turned on, but I don't want them to reboot at that time, they will activate the new generation the next morning when they are started again. As our servers are located in volatile contexts, we try to configure them as generically as possible so that they are operational no matter what the local staff decides is best in terms of uptime and power usage. Does that make sense? |
I see, that does make more sense. That does make me think of an alternative approach that could be a bit more deterministic:
This would mean that you can control when both activation and reboots can happen, and if the script makes sure to gcroot the system derivation when it first builds it (even if it can't activate/reboot yet), you can be sure that it will do the activation/reboot the next time it can, since everything is built already. How does that sound? In any case, I don't think this idea conflicts with this PR and can be implemented later. |
That's an interesting suggestion indeed! There is one issue that I see with this approach though. On our servers we make the difference between a deployment of a new config from our git repository (eg addition of a new user or an updated ssh key, updated definition of a service, etc) which never requires a reboot, and an actual upgrade. Only during an upgrade do we we also update the nix channel and so do we potentially need to reboot after. Changes to our git repo are under certain conditions deployed automatically to our servers. However, if we upgrade every hour and then wait until we're in the reboot window to reboot, then the nix channel can already have been updated when a config change is being deployed and so at that point, if a reboot was required, we cannot easily activate this new config without also switching the kernel, restarting services, etc. I also don't think that there's an easy way to roll back the nix channel to the point from which the current generation was derived. All of this would not necessarily be an issue, but I noticed in the past that activating a config with an updated kernel without rebooting can cause issues where we can for instance no longer exec into docker containers, which is quite annoying for our support staff. Besides the fact that restarting services during production hours can also cause issues. So, at least for our use case, it's important that we only do an upgrade when we'll actually be able to reboot the server. When the upgrade service exceeds the reboot window, we obviously also end up in the described situation, but this is assumed to be quite exceptional since such long running updates are not common, usually only on a new release. |
41f6055
to
ecb8ab7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry haven't had a lot of time recently, but I should have some now. This is the only nit that I think should still be addressed, after that I'll merge it :)
…uring which the system may be automatically rebooted Some systems should not be rebooted at just any time. If the upgrade process takes too long, for instance because of a slow internet connection, or if the upgrade service is ran during production hours, we want to allow to define a window outside of which a reboot will not be performed. The system will then reboot on the next run of the upgrade service which finishes inside the reboot window. E.g. we can run the update service twice per week, once during the night and once during the day, but reboots are only allowed during the night. By doing so, a system that is usually shut down during the night will still receive updates and systems that are turned on 24/7 can be rebooted outside of production hours. Co-authored-by: Silvan Mosberger <github@infinisil.com>
24798df
to
39f3eb3
Compare
@infinisil great, thanks! I incorporated the proposed changes, squashed everything together, and rebased the PR on top of master. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, let's merge this finally!
Motivation for this change
Some systems should not be rebooted at just any time. If the upgrade process takes too long, for instance because of a slow internet connection, or if the upgrade service is ran during production hours, we want to allow to define a window outside of which a reboot will not be performed.
The system will then reboot on the next run of the upgrade service which finishes inside the reboot window.
E.g. we can run the update service twice per week, once during the night and once during the day, but reboots are only allowed during the night. By doing so, a system that is usually shut down during the night will still receive updates and systems that are turned on 24/7 can be rebooted outside of production hours.
Things done
sandbox
innix.conf
on non-NixOS linux)nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
./result/bin/
)nix path-info -S
before and after)